Documentation for using Abisko cluster for the SNIC 2018/3-38 computing project

Introduction

NB! New project from Feb 1, 2019 - see SNIC 2019/3-68 for up-to-date info

This page contains instructions for how to use the computing resources granted via the SNIC 2018/3-38 project "Particle Physics at the High Energy Frontier". The project grants 100k CPUh/month for the KTH, Lund, Stockholm and Uppsala ATLAS groups, and the Lund ALICE group. This page contains the ATLAS-specific documentation for how to use these resources at the Abisko cluster which is part of HPC2N.

Collecting this documentation is a work in progress and contributions are most welcome, in particular as people gain experience with the job submission tools, etc. Please feel free to use the egroup mentioned below for discussions.

Getting started

This section explains what you need to do to get set up with an account at HPC2N, check your affiliation with the project, and set up your environment. Experiment-specific instructions for ATLAS and ALICE users follow below.

  • Check your project affiliation by running projinfo on a login node:
        t-an01:~ > projinfo
        Project info for all projects for user: cohm
        Information for project SNIC2018-3-38:
            Christian Ohm: Particle Physics at the High Energy Frontier
            Active from 20180130 to 20190201
            SUPR: https://supr.snic.se/project/SNIC2018-3-38
        Allocations:
            abisko:         100000 CPUhours/month
        Usage on abisko:
            N/A

  • Add the following to your login script (e.g. .bashrc) to automatically have setupATLAS available when you log in:
        # ATLASLocalRootBase
        export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
        alias setupATLAS='. ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh'

  • TODO:
        ALICE commands

After you log out and back in, you should be set up to use the resources and experiment-specific tools.

Quick test of the setup on the interactive node

After logging out and back in, you can call setupATLAS to set up all the standard ATLAS software from cvmfs, just like on lxplus. For example, to set up the analysis release 21.2.16:

setupATLAS -c slc6
asetup AnalysisBase,21.2.16,here

The -c slc6 switch directs setupATLAS to start a SLC6 Singularity container. Once in the container, all the standard ATLAS software will work as if running on SLC6. NB! Once inside the container, the batch system commands no longer work, as they are not compatible with SLC6.

To use the batch system from within the Singularity container, use -c slc6+batch instead, which also enables the batch system tools. For more information see here.
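For example, a quick test that keeps the batch tools available could look like this (the release number is just the example used above):

setupATLAS -c slc6+batch
asetup AnalysisBase,21.2.16,here
squeue -u $USER   # SLURM commands keep working inside this container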

  • TODO:
        ALICE commands

Submitting jobs to the batch system

The interactive login nodes should not be used for heavy computing jobs; these should instead be run via the SLURM-based batch system on Abisko at HPC2N. More complete instructions are available here, but below are some basic commands and simple examples of how to run jobs interactively and submit jobs to the batch system.

Examples of useful commands:

  • Show me my jobs: squeue -u cohm (replace with your user name)
  • Cancel a running job: scancel <jobid> (look up the job ID with squeue)
  • Submit a job to the batch system: sbatch yourjob.slurm, where yourjob.slurm is a file describing your job (see below)
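Put together, a minimal sketch of these commands (the job ID 1234567 is just a placeholder):

squeue -u $USER          # list your jobs and their job IDs
scancel 1234567          # cancel the job with ID 1234567
sbatch yourjob.slurm     # submit the job described in yourjob.slurm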

Running jobs interactively: reserve one batch node with four processors to work on interactively for one hour:

  1. Reserve the node: salloc -A SNIC2018-3-38 -N 1 -n 4 --time=1:00:00 (the -A flag selects the project account to charge)
  2. Run a program on the allocated node interactively: srun -n 1 my_program (this blocks the shell until the program finishes); see the sketch after this list.
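A minimal sketch of such an interactive session (my_program stands in for your own executable):

salloc -A SNIC2018-3-38 -N 1 -n 4 --time=1:00:00   # reserve 1 node, 4 cores, for 1 hour
srun -n 1 ./my_program                             # run on the allocated node; blocks until done
exit                                               # release the allocation when finished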

Submit a program to the batch system: to submit a job that uses ATLAS software on a worker node, a script like the following can be used, with example SBATCH parameters:

  #!/bin/bash
  # project account to charge the CPU time to
  #SBATCH -A SNIC2018-3-38
  # Name of the job (makes easier to find in the status lists)
  #SBATCH -J TestJob
  # name of the output file
  #SBATCH --output=test.out
  # name of the error file
  #SBATCH --error=test.err

  # number of tasks
  #SBATCH -n 1
  # the job can use up to 30 minutes to run
  #SBATCH --time=00:30:00
  export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
  
  export ALRB_SCRATCH="/pfs/nobackup/home/<initial>/<username>/"
  export HOME="/pfs/nobackup/home/<initial>/<username>/"
  
  export ALRB_CONT_RUNPAYLOAD="what you want to do"
  source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh -c slc6

where the ALRB_CONT_RUNPAYLOAD variable contains the commands to be executed after the Singularity container has started (needed since we don't get an interactive prompt on the worker nodes). Assuming that the above script is saved as job.sh, it can be submitted by running:

  sbatch job.sh
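As an illustration, ALRB_CONT_RUNPAYLOAD could for example chain an asetup call and your own script (the release and script name below are placeholders, not part of any official recipe; several commands can be chained with semicolons):

  export ALRB_CONT_RUNPAYLOAD="asetup AnalysisBase,21.2.16,here; ./runMyAnalysis.sh"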

Storage

For more exhaustive info about the file systems at HPC2N, see this page. The most important parts are summarized here:

  • AFS: Your home directory is on AFS and backed up regularly, and is accessible under /afs/hpc2n.umu.se/home/u/user/ (very similar to the CERN AFS; user should be replaced with your user name, and u with its first letter). One difference is that it is accessible under /home/u/user/ on the nodes (and that's also where $HOME points to). Your AFS area is not accessible from batch jobs, so for those you should use PFS. You can check your quota using fs lq.
  • PFS: This is the file system you should use for anything that needs to be read or written in your batch jobs, and your area can be found under /pfs/nobackup/home/u/user.
  • Quotas: you can check your quotas on the above two file systems from any directory like this:
        t-an01:~ > quota
        Disk quotas for cohm:
        Filesystem            usage       quota   remark
        /home/c/cohm       176.00MB      1.91GB   9.0% used
        /pfs/nobackup           4KB      2.00TB   0.0% used
          file count              1     1000000   0.0% used

For info about options for more substantial storage, please see Swestore (Christian has not investigated this further yet but is happy to discuss).

Running MG5_aMC@NLO

Install MG5 by downloading the latest release and unpacking the tarball in your PFS area (/pfs/nobackup/home/<initial>/<username>/). Set up the required environment:

module load GCC/7.3.0-2.30
module load OpenMPI/3.1.1 # needed for ROOT
module load ROOT/6.14.06-Python-2.7.15
module load CMake

export PYTHONPATH=</path/to/mg5>/HEPTools/lhapdf6/lib/python2.7/site-packages:$PYTHONPATH
export LD_LIBRARY_PATH=</path/to/mg5>/HEPTools/lhapdf6/lib:$LD_LIBRARY_PATH

You will need to do the above steps every time before running MG5, so it's a good idea to put them in a script. To run in cluster mode, edit the file input/mg5_configuration.txt and set run_mode = 1. The next thing you need to do is specify the type of cluster. Abisko runs SLURM, but if you simply set cluster_type = slurm it will not work out of the box, because Abisko requires you to specify the account and walltime at submission. To do this you can either edit the submit function in the class SLURMCluster inside the file madgraph/various/cluster.py, or use this PLUGIN made specifically for the Abisko cluster. If you use the plugin you need to set cluster_type = abisko, as in the excerpt below.
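For reference, the relevant settings in input/mg5_configuration.txt would then look roughly like this (illustrative excerpt; exact option names and defaults can vary between MG5 versions):

# excerpt from input/mg5_configuration.txt
run_mode = 1            # cluster mode
cluster_type = abisko   # provided by the plugin; plain 'slurm' needs the cluster.py edit described above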

When you run in cluster mode as described above, you can run MG5 as normal and the appropriate jobs are submitted under the hood. The drawback of this mode is that the whole production will fail if one of the subjobs fails (since the combined results would otherwise be unphysical). You might instead consider running in gridpack mode.

To run in gridpack mode, start MG5 as normal, edit the run_card.dat file when prompted, and set True = gridpack. Now when you "generate" your events, you will instead get a file called run_01_gridpack.tar.gz (or similar) which contains a streamlined copy of MG5. Unpack and compile it:

$ tar xvf run_01_gridpack.tar.gz # contains madevent/ and run.sh
$ cd madevent
$ bin/compile

The gridpack is run by passing the number of events and the seed to run.sh:

$ ./run.sh <nevents> <seed>

To prepare the gridpack for the cluster, repack it after compilation:

$ tar czvf gridpack.tar.gz madevent/ run.sh # gzip-compress to match the .tar.gz name

Prepare a batch file (e.g. single_batch.sh) to send to the cluster for a single run:

#!/bin/bash

#SBATCH -A SNIC2019-3-68
#SBATCH -n 1                # one task
#SBATCH -c 6                # six cores for the task
#SBATCH -t 10               # walltime of 10 minutes
#SBATCH -J gridpack
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

count=$1                    # batch index passed on the command line
nevents=10000
seed=$((RANDOM+count))      # a different random seed for each run
jobdir="gridrun_${SLURM_JOB_NAME}_${SLURM_JOB_ID}_batch${count}"
mkdir $jobdir
cd $jobdir
tar xvf ../gridpack.tar.gz
srun -n 1 ./run.sh $nevents $seed
cd ..

Then, to launch for example 10 jobs, do:

$ for i in {1..10}; do sbatch single_batch.sh $i; done

Of course you do not need to use this exact form of the batch file, but it generates a new seed for each run and puts each run in its own, somewhat sanely named directory.

It is also possible to run the shower in gridpack mode, but you need to call it explicitly by editing run.sh.

Getting help

The egroup atlas-sweden-analysis should be used to report and discuss problems using these resources. If you haven't already, please join here. If you have issues registering with SNIC or joining the project, please contact Christian Ohm directly (PI of the project).

Known issues

There are no known issues with using ATLAS software at Abisko at the moment, but since the cluster doesn't have an OS based on SLC6 there could be problems (see ATLAS s/w readiness for CentOS7 here). Please contact the egroup above if you have problems.

Documentation

Older tutorial slides from 2016 on how to use the Abisko resources (for the earlier SNIC 2016/1-274 project) are attached below.

-- ChristianOhm - 2018-02-03

Topic attachments

  • SNIC_tutorial_201609.pdf (2018-02-19, ChristianOhm): older slides with a tutorial from 2016 for how to use Abisko resources for SNIC 2016/1-274.
  • SNIC_tutorial_201609.pptx (2018-02-19, ChristianOhm): the same slides in pptx format.