This page keeps track of some useful information about the usage of the Italian INFN computing centre (CNAF).

Queues

There are three queues available: ams, ams_short and ams_prod

  • ams_prod (CPU time limit 13200.0 min, ~9 days, will be decreased): used for production by golden users only. This queue has the lowest priority but no limit on the number of running jobs;

  • ams (CPU time limit 3300.0 min, ~2.2 days): for general purpose use. This queue has a higher priority (with respect to ams_prod) but is limited to ~75% of the total cores available for AMS;

  • ams_short (CPU time limit 360.0 min, ~6 h, to be reduced to 1 h): set up for short jobs. This queue has the highest priority but is limited to 100 jobs (see below about job efficiency).

Job submission

To submit a job there are a few things that one should know; they are very important:

  • the environment at the moment of submission is kept. It is however a best practice to set up, inside the job, the required environment;
  • when the job starts running it will start in the very same directory (if possible, for example over /storage/gpfs_ams/) from which it was launched;
  • it is however important not to create files (and keep them open) over /storage/gpfs_ams/; it is better to use a local directory on the worker node. The designated place is /data/, a directory available on all the worker nodes and designed for this purpose. In general the approach is to go there, create a sandbox (with a unique name) and do whatever is needed inside it. At the end, before exiting, it is a best practice to remove the created sandbox. An example template can be found here (a minimal sketch is also given after this list):
    • template_job.sh: template submission script example. It uses, as the COMMAND to be executed, the test_node_env.sh below.
    • test_node_env.sh: script used as COMMAND example
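
The following is a minimal sketch of such a job script (not the attached template itself), assuming bash and that /data/ is writable on the worker nodes; the environment setup line and the payload command are placeholders to be adapted:

#!/bin/bash
# Minimal job-script sketch: set up the environment explicitly, work inside a
# private sandbox under /data/ on the worker node, remove it at the end and
# copy back to GPFS only the final output.

# set up the required environment here (placeholder: adapt to your own setup,
# do not rely only on the environment inherited at submission time)
# source /your/environment/setup.sh

SUBMITDIR=$(pwd)                                   # directory the job was launched from
SANDBOX=$(mktemp -d /data/${USER}_sandbox_XXXXXX)  # unique sandbox on the local disk
trap 'rm -rf "$SANDBOX"' EXIT                      # remove the sandbox in any case at the end

cd "$SANDBOX" || exit 1

# run the actual payload (e.g. the attached test_node_env.sh), writing its output locally
"$SUBMITDIR"/test_node_env.sh > output.log 2>&1

# copy only the results back to the GPFS area
cp output.log "$SUBMITDIR"/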

When submitting a lot of jobs (>1000) it is mandatory not to submit all the jobs at once. The maximum number of jobs to be submitted together is ~1000, so that, even in case of multiple users submitting bunches of jobs, the total number of pending jobs is kept at 10k or below.

To do this a submission script is needed. This script should:

  • check, in a loop, how many jobs the user already has PENDING+RUNNING;
  • once the PENDING+RUNNING jobs are below a threshold, submit the new jobs.
When writing this script one should pay attention to the following (a minimal sketch is given after this list):

  • it is generally better to implement a loop with a long sleep (1 or even 5 minutes, absolutely not 1 second);
  • once below the threshold, submit a bunch of jobs (e.g. 100) and not just one.
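
The following is a minimal sketch of such a throttled submission loop, assuming bash and an LSF-style batch system (bsub/bjobs); the queue name, the threshold, the bunch size and the job-list file are placeholders to be adapted to your case:

#!/bin/bash
# Throttled submission sketch: keep the user's PENDING+RUNNING jobs below a
# threshold, submitting in bunches and sleeping for minutes between checks.
QUEUE=ams            # target queue
THRESHOLD=1000       # maximum PENDING+RUNNING jobs before submitting more
BUNCH=100            # how many jobs to submit each time the threshold allows it
JOBLIST=joblist.txt  # placeholder: one job script per line, still to be submitted

while [ -s "$JOBLIST" ]; do
    # count this user's jobs in PEND or RUN state on the queue
    NJOBS=$(bjobs -u "$USER" -q "$QUEUE" 2>/dev/null | grep -cE 'PEND|RUN')
    if [ "$NJOBS" -lt "$THRESHOLD" ]; then
        # submit a bunch of jobs and drop them from the list
        head -n "$BUNCH" "$JOBLIST" | while read -r JOB; do
            bsub -q "$QUEUE" "$JOB"
        done
        sed -i "1,${BUNCH}d" "$JOBLIST"
    fi
    sleep 300   # long sleep (5 minutes), absolutely not 1 second
done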

Job efficiency

Our storage (1.5 PB for 2015, still 1.1 PB as of 10/Feb/2015) is limited, in bandwidth, to ~2 GB/s. The AMS jobs, especially when accessing the AMSRoot data format, have been observed to be very I/O hungry. This is the reason why the number of analysis jobs has been limited.

When the bandwidth is saturated, interactive work is also affected (even editing a text file or doing a simple ls becomes tremendously slow) and in general the job efficiency (namely the number of seconds spent using the CPU divided by the total job running time) is low. A low job efficiency is symptomatic of a file system with saturated bandwidth. A very low job efficiency indicates that the job itself is doing a lot of I/O and is the origin of the saturation and of the low efficiency of the other jobs.

To avoid this, it would be useful if each user verified the efficiency of his own jobs and, if needed, limited the number of running jobs by hand.

If someone has jobs with less than 60% efficiency, he should have at most 100 jobs submitted (PENDING+RUNNING) at the same time on the ams queue.

If those jobs last less than 1 h, they must be submitted to the ams_short queue.

You may verify the efficiency of your running jobs by using the following script, passing as arguments the username and the queue:

/opt/exp_software/ams/bin/job_efficiency_calculation.sh <user> <queue>

Or you can add the directory to your PATH, in your shell configuration file, and use it as a simple command:

in bash

export PATH=$PATH:/opt/exp_software/ams/bin

in csh or tcsh

setenv PATH ${PATH}:/opt/exp_software/ams/bin

then simply:

job_efficiency_calculation.sh <user> <queue>

The script will give you back the efficiency for all of your running jobs:

<JOBID> <user> <efficiency>

And the mean efficiency of all of your running jobs.

Common software

A shared installation of the common software is available under

/opt/exp_software/ams/

This directory is managed by Matteo Duranti (matteo.duranti_ACT_pg.infn.it).

It contains the AMSsoft, some links and copies of the AMSDataDir, and other material:

lrwxrwxrwx 1 mduranti ams 39 Feb 21 2012 AMSDataDir_afs -> /afs/cern.ch/exp/ams/Offline/AMSDataDir/
lrwxrwxrwx 1 mduranti ams 36 Feb 21 2012 AMSsoft_afs -> /afs/cern.ch/exp/ams/Offline/AMSsoft/
drwxr-xr-x 4 mduranti ams 4.0K Feb 26 2012 additional_libs/
-rw-r--r-- 1 mduranti ams 223M Sep 13 2012 qt-everywhere-opensource-src-4.8.3.tar.gz
lrwxrwxrwx 1 mduranti ams 12 Oct 22 2012 AMSsoft -> AMSsoft_v0.5/
lrwxrwxrwx 1 mduranti ams 37 Nov 23 2012 AMSDataDir -> /cvmfs/ams.cern.ch/Offline/AMSDataDir/
drwxr-xr-x 14 mduranti ams 4.0K Nov 25 2012 qt-4.8.3/
drwxr-xr-x 21 mduranti ams 4.0K Jun 23 2013 qt-everywhere-opensource-src-4.8.3/
drwxr-xr-x 15 mduranti ams 4.0K Aug 22 2013 AMSsoft_cern/
-rwxr-xr-- 1 mduranti ams 75 Aug 23 2013 rsyncAMSsoft_cern.sh*
lrwxrwxrwx 1 mduranti ams 17 Aug 24 2013 AMSsoft_tars -> AMSsoft_cern/tars/
drwxr-xr-x 38 mduranti ams 4.0K Nov 17 2013 root_v5.27ams_patched/
drwxr-xr-x 6 mduranti ams 4.0K Nov 17 2013 AMSsoft_v0.3/
drwxr-xr-x 3 mduranti ams 4.0K Nov 17 2013 AMSsoft_v0.3_modified/
drwxr-xr-x 7 mduranti ams 4.0K Nov 17 2013 AMSsoft_v0.5/
drwxr-xr-x 4 mduranti ams 4.0K Nov 17 2013 AMSsoft_v0.5_modified/
drwxr-xr-x 11 mduranti ams 4.0K Nov 17 2013 AMSsoft_cern_17Nov2013_modified/
lrwxrwxrwx 1 mduranti ams 31 Nov 18 2013 AMSsoft_modified -> AMSsoft_cern_17Nov2013_modified/
drwxr-xr-x 4 mduranti ams 512 Jan 26 12:55 pyfits/
drwxr-xr-x 18 mduranti ams 4.0K Jan 27 17:32 galprop/
drwxr-xr-x 2 mduranti ams 512 Jan 29 14:50 bin/

For the AMSsoft the suggestion is 'AMSsoft_modified' (where 'modified' means that the useless libshift dependency has been purged), so:

lrwxrwxrwx 1 mduranti ams 31 Nov 18 2013 AMSsoft_modified -> AMSsoft_cern_17Nov2013_modified/

For the AMSDataDir the suggestion is 'AMSDataDir' (via cvmfs), so:

lrwxrwxrwx 1 mduranti ams 37 Nov 23 2012 AMSDataDir -> /cvmfs/ams.cern.ch/Offline/AMSDataDir/

For Qt, if you want to compile gbatch with the Aachen stuff, the suggestion is

drwxr-xr-x 14 mduranti ams 4.0K Nov 25 2012 qt-4.8.3/
Environment variables

TODO: Add the explanation of amsvar.sh. Attach an example (maybe...) and show the additional variables.

GALPROP

An installation of the GALPROP software (simulation of cosmic-ray propagation in the Galaxy, version 54.r2504) is available in /opt/exp_software/ams/galprop.

Some information on how to run the software is included in Guide and README.

The required environment variables are:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/exp_software/ams/galprop/CLHEP/2.2.0.3/CLHEP/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/exp_software/ams/galprop/CCfits/lib

export PYTHONPATH=/opt/exp_software/ams/pyfits/pyfits/lib64/python2.6/site-packages/

An example of how to run GALPROP:

/opt/exp_software/ams/galprop/bin/galprop -r my_datacard -g your_home/GALDEF/ -f /opt/exp_software/ams/galprop/FITS -o your_home/GP_output

where:

  • your_home/GALDEF/: is your own directory that will host your galdef files;
  • my_datacard: is the name of the galdef file (without the 54_ prefix);
  • your_home/GP_output: is your own dir where the results will be written.

Inside /opt/exp_software/ams/galprop/GALDEF you can find several example datacards.

To extract the spectra from the nuclei_54_my_datacard.gz files produced by GALPROP in the GP_output dir, you can use the python plot_galprop.py script. For example:

python /opt/exp_software/ams/galprop/plot_galprop.py your_home/GP_Output 54_my_datacard.gz spectra 500 2 4 2.7 > Helia4.txt

Where:
Usage: /opt/exp_software/ams/galprop/plot_galprop.py GP_Output galdefid parser [arg1 [arg2 [ ... ]]]
GP_Output : directory where the data files are located
galdefid : the common suffix of all the fits files (including .gz if archived)
parser : one of the following: spectra, abundances, ratios, gamma, synchrotron
arg1, arg2, etc: depends on the parser
For parser 'spectra', arg1 is the modulation potential in MV, arg2 is Z, arg3 is A and arg4 is alpha
For parser 'abundances', arg1 is the modulation potential in MV, arg2 is the kinetic energy per nucleon in MeV
For parser 'ratios', arg1 is the modulation potential, arg2 and arg3 are Z1 and A1 (numerator), arg4 and arg5 are Z2 and A2 (denominator)
For parsers 'gamma' and 'synchrotron', arg1 is alpha

For more details have a look also at http://galprop.stanford.edu/

Preselection

A much smarter way of "pre-selecting" the files has been implemented, in order to:

  • not be analysis-dependent;
  • not need to analyze the full passX to study the efficiency of the cuts used in the preselection itself.
The idea is to create a "database" (based on ROOT TTrees) with the information about which cuts have been passed for:

  • each event
  • each particle of each event
The "database" can be found here:
/storage/gpfs_ams/ams/users/mduranti/PreselectionDB
/storage/gpfs_ams/ams/users/mduranti/PreselectionDB.B1043.MCpos
/storage/gpfs_ams/ams/users/mduranti/PreselectionDB.B620.pass4.12May2015
/storage/gpfs_ams/ams/users/mduranti/PreselectionDB.B950.pass6.21Sep2015
/storage/gpfs_ams/ams/users/mduranti/PreselectionDB.B1043.MCele
/storage/gpfs_ams/ams/users/mduranti/PreselectionDB.B1043.MCpos.9Oct2015
/storage/gpfs_ams/ams/users/mduranti/PreselectionDB.B620.pass4.26Oct2015
/storage/gpfs_ams/ams/users/mduranti/PreselectionDB.B954.BT
/storage/gpfs_ams/ams/users/mduranti/PreselectionDB.B1043.MCele.9Oct2015
/storage/gpfs_ams/ams/users/mduranti/PreselectionDB.B620.pass4
/storage/gpfs_ams/ams/users/mduranti/PreselectionDB.B950.pass6
/storage/gpfs_ams/ams/users/mduranti/PreselectionDB.B954.BT.7Oct2015

where:

/storage/gpfs_ams/ams/users/mduranti/PreselectionDB

is a link to the one corresponding to the latest pass.

Each db directory contains files like:

preseldb_1383255982.root

(one for each RUN, i.e. the different files of the same run are merged).

The pre-selection code can be checked out from SVN here:

svn co https://svn.code.sf.net/p/amsacommonsw/code/trunk amsacommonsw

(the old Group-A repo)

and in particular inside

NewPreSelection
Code to be implemented to use the preselection, looping on the db itself (previously created and filled) and retrieving the AMSEventR only for the interesting events:
// on top
#include "preselection.h"

// outside the event loop
AMSChain* chain = new AMSChain("AMSRoot");
chain->Add(<file to add>);
AMSEventR* pev = 0;
Preselection* presel = new Preselection();
presel->SetProcessHowever(true);
presel->SetDebugLevel(0);
// 3 means 'electron' (a la GEANT). This affects some cuts (like the value of beta used to cut)
presel->SetPartType(3);
// the mask to require per event (0xFFFFFF60) and per particle (0xFFFFFFFFFFFFFC48); bits that are up mean the corresponding cut is NEGLECTED
presel->SetMask(0xFFFFFF60, 0xFFFFFFFFFFFFFC48);
presel->SetDBPath("/storage/gpfs_ams/ams/users/mduranti/PreselectionDB");
//presel->SetWriteList(false);
presel->SetAMSChain(chain);
TTree* dbtree = presel->GetDBTreeFromAMSChain(chain);
int events = (int)dbtree->GetEntries();

// event loop
for (int iev = 0; iev < events; iev++) {
    int partno = presel->GetPartIndexInDBTree(iev);
    if (partno < 0) continue;    // no particle of this event passes the preselection
    pev = chain->GetEvent(iev);  // ONLY now the real AMS event is retrieved
    presel->SetAMSEvent(pev);
    if (!presel->AMSEventMatchWithDBTreeEvent(pev->Event())) {
        printf("Loop-On-DB-Tree: %d-%d does not match with DB tree\n", pev->Run(), pev->Event());
        exit(9);
    }
    ...
}

Code to be implemented to use the preselection, looping on a custom tree (ntuples with only selected events) and asking the db which preselection cuts have been passed or not:

A compact and smart way to do this is not yet implemented.

A "workaround" that can be used, so far, is to still loop on the db tree and IN PARALLEL on the custom tree (that MUST have ordered events): one loops on the custom tree and on the db tree (as in the previous snippet of code) and "accept" an event only if it has the same run/event of the one in the custom tree. In that case (i.e. the event on the custom tree has been "found" in the db tree) one can ask is a preselection cut has been passed or not. If the events in the custom tree are ordered once the event is found on the db tree THERE'S NO NEED TO RESTART THE LOOP, for the next custom tree event, on the db tree but is ok to continue from the current one.

To retrieve the db tree corresponding to the run of the current custom tree, one could use:

Preselection::GetDBTreeFromRun(unsigned int run)
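
The following is a minimal sketch (not official code) of this parallel bookkeeping. The branch names of the custom tree and of the db tree ("Run"/"Event" below) are assumptions to be checked against the real trees, and the exact Preselection method used to query the individual cuts is not documented on this page, so it must be looked up in NewPreSelection:

// Minimal sketch (NOT official code) of the "workaround": loop on the custom
// tree and, in parallel, on the db tree of the corresponding run, without
// restarting the db-tree loop for each custom-tree event.
#include "preselection.h"
#include "TTree.h"

void LoopOnCustomTree(TTree* customtree)
{
    unsigned int run = 0, event = 0;      // run/event of the custom-tree entry
    unsigned int dbrun = 0, dbevent = 0;  // run/event of the db-tree entry
    customtree->SetBranchAddress("Run", &run);      // assumed branch name
    customtree->SetBranchAddress("Event", &event);  // assumed branch name

    Preselection* presel = new Preselection();
    presel->SetDBPath("/storage/gpfs_ams/ams/users/mduranti/PreselectionDB");

    TTree* dbtree = 0;
    unsigned int currentrun = 0;
    long long idb = 0;  // db-tree index, kept across custom-tree events

    for (long long iev = 0; iev < customtree->GetEntries(); iev++) {
        customtree->GetEntry(iev);

        // when the run changes, retrieve the db tree of the new run and
        // restart the db index (only in this case)
        if (!dbtree || run != currentrun) {
            dbtree = presel->GetDBTreeFromRun(run);
            dbtree->SetBranchAddress("Run", &dbrun);      // assumed branch name
            dbtree->SetBranchAddress("Event", &dbevent);  // assumed branch name
            currentrun = run;
            idb = 0;
        }

        // advance on the db tree, WITHOUT restarting it, until the run/event
        // of the custom-tree entry is found
        while (idb < dbtree->GetEntries()) {
            dbtree->GetEntry(idb);
            if (dbrun == run && dbevent == event) break;
            idb++;
        }
        if (idb >= dbtree->GetEntries()) continue;  // event not found in the db

        // here query the Preselection class for the cuts of interest of entry
        // idb (the exact method is to be checked in NewPreSelection)
    }
}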

-- MatteoDuranti - 2015-02-10

Topic attachments
  • template_job.sh (shell script, 2.1 K, 2015-02-10, MatteoDuranti): template submission script
  • test_node_env.sh (shell script, 0.6 K, 2015-02-10, MatteoDuranti): script used as COMMAND example by template_job.sh