587 Cluster

This page is dedicated to the 587 private cluster and contains all the information needed to log in and work on the cluster. Currently the cluster consists of 13 machines, each with an Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz (6 cores, 12 threads). All machines share a common filesystem, which is mounted at /alf. All systems are currently running Ubuntu MATE 18.04 LTS.

Login

Before you can work on the cluster, a user account has to be created for you. Please contact any of the following:
  • Florian Jonas (florian.jonas[at]cern.ch)
  • Markus Fasel (markus.fasel[at]cern.ch)
  • Raymond Ehlers (raymond.ehlers[at]cern.ch)

Once your account has been created, you can access the login server from within the CERN network via ssh:

 
ssh -XY username@pc059
 
Info: The option -XY is optional and enables X11 forwarding, so that windows opened on the cluster are displayed on your local machine.

If you are connecting from outside the CERN network, you can do so via lxplus:

 
 ssh -J CERNUSER@lxplus.cern.ch CLUSTERUSER@pc059 
 
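If you use the jump connection regularly, the ProxyJump option can go into your local ~/.ssh/config instead of typing -J every time. A minimal sketch (the placeholder usernames CERNUSER/CLUSTERUSER must be replaced with your own accounts; pc059 is resolved from the lxplus side):

```
# ~/.ssh/config (sketch; replace CERNUSER/CLUSTERUSER with your accounts)
Host pc059
    HostName pc059
    User CLUSTERUSER
    ProxyJump CERNUSER@lxplus.cern.ch
    ForwardX11 yes
```

Afterwards a plain ssh pc059 connects through lxplus automatically.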

Initial Setup

After logging in to the cluster, do not store all your files in your home directory; instead, create a folder named after your username in one of the /alf/data directories. Store your files there, so they are accessible from all computing nodes (otherwise calculations will fail). If you want your own AliPhysics installation, please do so in the /software directory to avoid heavy load on the alf filesystem. Before you do, please make sure that you have already specified:
export ALIBUILD_WORK_DIR="/software/USER/alice/sw"
in your .bashrc file. aliBuild and all its prerequisites are already installed on the system. Before building for the first time, make sure that you have run the aliBuild init command in the base directory (/software/USER/alice), e.g. for AliPhysics:
aliBuild init AliPhysics@master
Since the resources on the login machine are shared with everyone, please do not simply run the usual aliBuild command! Instead, a tool called melmac is provided to run the build process as a batch job. melmac needs to be run from the software directory (containing alidist), and the environment variable ALIBUILD_WORK_DIR, which is required by alienv in any case, must be set. To build e.g. O2, simply do:
melmac launch O2 -d o2 -z aliceo2 -p -j 3
This will ensure that the resources for building are properly allocated by the cluster. The parameters -z (development tag), -p (always-prefer-system) and -j are optional; -z and -p are however strongly encouraged, so that the disk usage on /software is not increased by unnecessary packages. Note that the option --disable DPMJET is needed in order to avoid the prompt for a password, which will not work when submitting the build as a job to the cluster. If you need DPMJET, run melmac as a local job, e.g.:
melmac run AliPhysics -d o2 -z aliceo2 -p
More information can be found via
melmac -h

If you need to build any other software, please always try to submit this as a job using sbatch. If that is not possible, use at most 1 core of the login machine.

Monitoring cluster status

Monitoring of the batch system status can be found here: https://b587clustermon.cern.ch:3000 (select the search icon and then SLURM Dashboard). The page is visible only from within the CERN network. For access from outside, create an ssh tunnel with port forwarding:

ssh -L 3000:b587clustermon.cern.ch:3000 CERNUSER@lxplus.cern.ch -fN

and open http://localhost:3000 in your web browser. Alternatively you can log in to the CERN Terminal Server via rdesktop (Linux) or Remote Desktop Connection (Windows, macOS).

We also provide a very simple monitoring of the cluster (jobs, disk usage ...) under http://pc059.cern.ch. The monitoring is still in development and will be updated from time to time.

Running Analysis on the cluster

Running analysis code on the cluster is simple, and only requires two bash scripts:
  • A start script that loads all needed libraries and contains the command that runs your analysis
  • A submit script that submits your start script (i.e. your analysis) as jobs to the cluster.
At the moment, 126 jobs can run at the same time; the rest are queued and run as soon as resources become available, on a first-in, first-out basis.

Below you find an example of a start script, which can also be found in the /clusterfs1/examples folder:

#!/bin/bash

#load any variable paths needed like this
# they have to be located on either of the clusterfs storages
export PYTHIA8=/software/flo/alice/AliRoot/PYTHIA8/pythia8243
export LD_LIBRARY_PATH=$PYTHIA8/lib:$LD_LIBRARY_PATH
export PYTHIA8DATA=/software/flo/alice/AliRoot/PYTHIA8/pythia8243/share/Pythia8/xmldoc

#load the AliPhysics, AliRoot etc. modules like this, specifying the path to
#your own AliPhysics installation or the installation of someone else on clusterfs
eval `/usr/local/bin/alienv -w /software/flo/alice/sw --no-refresh printenv AliPhysics/latest`

# run your program here
./CorrelationStudy.exe $1 -1 -1 1 HardQCDMonash_$2
In this case we load all the libraries needed to run a Pythia standalone simulation. Furthermore, AliPhysics is loaded with the alienv command, which needs to be adapted to the location of your AliPhysics installation:
eval `/usr/local/bin/alienv -w /clusterfs1/flo/alice/sw --no-refresh printenv AliPhysics/latest`
All this loading is necessary to ensure that the libraries are accessible on each computing node.

Once the start script is finished, only a submission script is needed. Example:

#!/bin/bash

#create a directory where output will be stored (optional if your output files have unique names)
#in case your output filenames are not unique, make sure to create a subfolder for each job

mkdir $1
cd $1
cp ../CorrelationStudy.exe .


for i in {1..50} #submit 50 jobs
do
    # command to submit one job
    sbatch --job-name="HardPythiaCorr" ../StartPythiaSim.sh 1000000 $i
done
In this case, 50 jobs are submitted by simply running the submission command sbatch in a for loop. A few important notes to keep in mind when submitting jobs:
  • since the output of all jobs will be written to the same folder (the folder from which sbatch was run), you need either a unique filename for the output file of each job, or you must start the jobs from unique folders
  • if part of your analysis relies on random seeds generated from the current unix time, submitting all your jobs at the same time can cause problems: often only 1-second accuracy is taken into account, and the seeds are therefore no longer unique
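One simple way around the seed problem is to mix the SLURM job ID into the seed inside your start script. A minimal sketch (the variable names are illustrative; SLURM_JOBID is only set inside a job, so it falls back to the shell PID for local tests):

```shell
# Hypothetical sketch: combine the unix time with the SLURM job ID so that
# jobs started within the same second still get distinct seeds.
# SLURM_JOBID is set by SLURM inside a job; fall back to the shell PID locally.
JOBID=${SLURM_JOBID:-$$}
SEED=$(( $(date +%s) % 100000 + JOBID * 100000 ))
echo "seed=$SEED"
```

The job-ID term dominates the time term, so two different jobs can never produce the same seed.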
Once your jobs are submitted you can view the queue using the squeue command. The stdout of each job is written to a file slurm-JOBID.out, which is constantly updated during runtime and allows you to check on the progress of a particular job.
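As an alternative to the submission loop above, SLURM job arrays submit all 50 jobs with a single sbatch call; the array index is exposed as $SLURM_ARRAY_TASK_ID and can take the role of the loop variable $i. A sketch, assuming the start script from the example above:

```shell
#!/bin/bash
#SBATCH --job-name=HardPythiaCorr
#SBATCH --array=1-50
# SLURM runs this script once per array index; the index replaces the
# loop variable $i from the submission loop above
./StartPythiaSim.sh 1000000 $SLURM_ARRAY_TASK_ID
```

Submit it once with sbatch; a single scancel on the array job ID then cancels all tasks at once.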

I/O considerations for the cluster

The cluster uses Ceph to combine multiple HDDs spread over several machines into a single mount point (/alf), which improves I/O performance. However, for very I/O-intensive tasks, e.g. reconstruction or analysis tasks, performance can be improved further by creating a local working directory in the /tmp folder of the worker node, copying the files needed for processing there, and copying the output files back to the shared filesystem when the job is done. Remember to always use $SLURM_JOBID in the path of the working directory in order to prevent several jobs from using the same working directory. Example:

...
TMPDIR=/tmp/slurm_$SLURM_JOBID
if [ ! -d $TMPDIR ]; then mkdir -p $TMPDIR; fi
cd $TMPDIR
# copy large raw file there
cp /alf/raw.root .
# do task
...
# remove files not to be copied back
rm raw.root
# copy back results (fls is assumed to hold the list of output files)
for f in ${fls[@]}; do cp $f $OUTPUTDIR/; done
# remove TMPDIR
cd $OUTPUTDIR
rm -rf $TMPDIR
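As a variant of the example above, mktemp and a shell trap can be used so that the scratch directory is also cleaned up when the script aborts early. A sketch, assuming cleanup should run on any exit:

```shell
# create a unique scratch directory; mktemp avoids name collisions even
# beyond the job ID, and the EXIT trap removes it on any exit, including errors
SCRATCH=$(mktemp -d "/tmp/slurm_${SLURM_JOBID:-$$}_XXXXXX")
trap 'rm -rf "$SCRATCH"' EXIT
cd "$SCRATCH"
# ... do task and copy results back as in the example above ...
```

With this pattern the explicit rm -rf at the end of the script is no longer needed.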

Running interactive jobs

The cluster has only one interactive node, shared by all users. Consequently all interactive work is done on that node, which can consume significant resources if many users work at the same time. For resource-intensive work, interactive jobs can be used instead. Interactive jobs can be started via

salloc -N 1 -n 1 -J interactive --partition short
srun --pty /bin/bash

Note that global homes are not available on the cluster; consequently your shell rc files will not be available on the worker nodes. When submitting interactive jobs to the long queue, keep in mind to exit the job (twice, once for srun and once for salloc) as soon as your work is done, in order to free the resources again. A dedicated interactive partition will be considered. Never run interactive jobs in the vip queue.

Important commands for SLURM

Jobs are submitted and handled using SLURM. Below you find a few important commands:
Command | Description
sbatch STARTSCRIPT.sh | submit a job script to the cluster. In addition, -J or --job-name specifies the name shown in the queue; -p PARTITIONNAME submits to a certain partition
squeue | view currently running jobs and jobs in the queue
scancel JOBID | cancel a job that you submitted. Useful example: scancel {1000..1500} to cancel multiple jobs
sinfo | show status of nodes

Several partitions exist on the cluster that you can submit your jobs to; they differ in how long a job may run on them. You can list them using the sinfo command and select one during submission using sbatch -p PARTITIONNAME job.sh

Partition | Description
short (default) | time limit of 2 hours; 13 machines
long | no time limit, use this for long jobs; 7 machines
vip | no time limit; 13 machines; always has higher priority than other partitions (do not use unless absolutely needed)
loginOnly | only pc059, no time limit; use for building AliPhysics etc.

Singularity support on the cluster

Running jobs inside containers via Singularity is supported on all nodes of the cluster via the central software repository. The common software repository is handled via environment modules. To access the modules you need to execute

module use /software/centralsoft/Modules

Now you can load the singularity module:

module load singularity/2.5.2

In order to run a script under singularity you need to first load the singularity module as described above. Then you can run

singularity exec -B /software:/software -B /alf:/alf -B /cvmfs:/cvmfs PATH_TO_MY_CONTAINER COMMAND COMMAND_ARGUMENTS ...

The -B SOURCE:DESTINATION options bind-mount the listed file systems into the container, so that /software, /alf and /cvmfs are visible inside it.

cctools support on the cluster (most notably: makeflow)

The cluster provides the latest version of cctools. You can get it from the centralsoft repository, as done for Singularity:

module use /software/centralsoft/Modules
module load cctools/7.4.2

Suppose you want to run a simulation with 100 jobs. The bash script setting up the environment and launching the simulation is called runner.sh. The task description can be specified in a file tasks.jx:

{
   "rules": [
      {
         "outputs": [format("output.%d", i)],
         "inputs": ["runner.sh"],
         "command": format("./runner.sh DATADIR %d > output.%d", i, i),
      } for i in range(1, 101),
   ],
}
where DATADIR is the data location. In this case the script runner.sh takes two parameters: a data location path as the first and a slot ID as the second. In order to submit to the cluster do
makeflow -T slurm --jx tasks.jx
The parameter -J can be used to limit the amount of slots running in parallel.

If the jobs are very short, it can be more advantageous to use a work queue. A work queue can be created (in this example with 10 workers) via

bash slurm_submit_workers pc059.cern.ch 9123 10
Here 9123 is the port used for communication between the work queue and makeflow. In order to submit to the work queue do
makeflow -T wq --jx tasks.jx

Selecting python version

Several python3 versions, including very recent ones, are installed on the cluster. By default the python version is the one shipped with Ubuntu 18.04 (3.6.8). Two options are available to select a different python version:

  • python_version: In case you want to swap to python 3.9 do
    . python_version 3.9
    
  • Python modules: Dedicated modules are available in the centralsoft repository. In order to swap to python 3.9 do
    module use /software/centralsoft/Modules
    module load python/3.9
    
Users are recommended to use the second method, as it is a lot more flexible: it allows swapping the python version (module swap python/3.9 python/3.10) and unloading the module again.

Mounting cluster storage on your local machine

You can mount/unmount the cluster file storage on your local machine. Just add these aliases to the .bashrc on your local machine, replacing USERNAME with your username and adapting the local mount point:
alias mountServerAlf='sudo sshfs -o allow_other USERNAME@pc059:/alf /media/USERNAME/alf'
alias unmountServerAlf='sudo umount /media/USERNAME/alf'
Please note that the folder you want to mount to needs to exist on your local machine already.

If you want to mount from outside the CERN network, use this instead, going via an lxplus tunnel:

alias mountServerAlf="sudo sshfs USERNAME@pc059:/alf ~/alf -o allow_other,ssh_command='ssh -J CERNUSER@lxplus.cern.ch'"

Data transfer via globus

Globus is a powerful tool to transfer large amounts of data between clusters and from clusters to personal endpoints, e.g. laptops (transfer between two private endpoints needs a premium subscription). CERN has a Globus subscription, therefore your CERN account can be used to log in to Globus (https://www.globus.org). Several big computing centers (e.g. NERSC, CADES, ...) have subscriptions as well and provide data transfer nodes as public endpoints. Currently the 587 cluster does not yet have a public endpoint (we are working on this!), therefore users must run a personal endpoint on pc059. To do so, go to https://app.globus.org/file-manager/gcp and select "Globus Connect Personal for Linux". On pc059 you need to launch globusconnectpersonal; however, when run without arguments it will only include the user home directory. The global file systems need to be mounted as well (alf, software), and you should only mount your own user folder on the global file systems. To do so, do (ideally this should go into a small script which launches globusconnectpersonal via nohup):

globusconnectpersonal -start -restrict-paths rw/home/WHOAMI,rw/software/WHOAMI,rw/alf/data/WHOAMI

To transfer data, e.g. to CADES, go to https://app.globus.org/file-manager and in the file manager select your personal endpoint as the first collection (under "Collection") and the cluster to transfer data to/from as the second (CADES-OR for CADES). The file manager is largely self-explanatory.

Checking used space on the filesystem

If you want to check how much space you are using on alf, we provide a simple tool to check the current usage of the filesystem (updated once per week). Just run the following commands to get an overview:
cd /alf/data/FSInfo
bash readScan.sh export_{DATEOFCHOICE}.gz
It will provide you with an interface to check the size of each folder in alf!

cvmfs

The cluster has cvmfs mounted, which can be used to easily get a working AliPhysics installation with the latest tag. Here are some useful commands:
 * List all packages         --> /cvmfs/alice.cern.ch/bin/alienv q
 * List AliPhysics packages  --> /cvmfs/alice.cern.ch/bin/alienv q | grep -i aliphysics 
 * Enable a specific package --> /cvmfs/alice.cern.ch/bin/alienv enter VO_ALICE@AliPhysics::vAN-20210302_ROOT6-1
 * Enable multiple packages  --> /cvmfs/alice.cern.ch/bin/alienv enter VO_ALICE@AliPhysics::vAN-20210302_ROOT6-1,VO_ALICE@fastjet::v3.3.4_1.042-1 

Useful links

https://support.ceci-hpc.be/doc/_contents/QuickStart/SubmittingJobs/SlurmTutorial.html
https://slurm.schedmd.com/quickstart.html

If you have any questions or problems, don't hesitate to contact me (Florian Jonas) or Markus Fasel.

-- FlorianJonas - 2021-03-03 -- FlorianJonas - 2021-07-21

Topic revision: r27 - 2022-01-20 - MarkusFasel