Remark: Samples which have been produced on the GRID are all replicated to this directory:
/castor/ific.uv.es/grid/atlas/datafiles/wildauer/
If not in my local CASTOR space yet, they will be copied there eventually.


This description is not meant to replace the various how-tos for running jobs on the grid. It is merely a summary of what I have to do to run jobs there. If you want to use it as your own how-to, go ahead.

This guide uses the jdlgen, status, ... scripts written by Alessandro de Salvo. Please have a look at the webpage written by James Catmore and set up the right environment first.

Step by step to the grid:

  • Get a certificate.
  • Log in to lxplus and type:
source /afs/cern.ch/project/gd/LCG-share/2.3.0/sl3/etc/profile.d/grid_env.sh
  • Get a long-lived proxy! If your grid proxy expires before the jobs are finished, they will all be aborted.
grid-proxy-init -valid 100:00
This gives you a proxy which lasts for 100 hours.
  • cd to your grid environment (as set up with James's guide) and type:
source setup.sh
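
Since jobs are aborted once the proxy expires, it can be worth comparing the remaining proxy lifetime with the expected job duration before submitting. The following is only a minimal sketch: the helper name proxy_covers is made up for this page, and it assumes the remaining lifetime is given in seconds, which is what grid-proxy-info -timeleft prints.

```shell
# proxy_covers SECONDS_LEFT HOURS_NEEDED
# Hypothetical helper: succeeds if the proxy outlives the expected job duration.
proxy_covers() {
    seconds_left=$1
    hours_needed=$2
    [ "$seconds_left" -ge $((hours_needed * 3600)) ]
}

# Example with fixed numbers: 360000 s (100 hours) left, jobs expected to take 48 hours.
if proxy_covers 360000 48; then
    echo "proxy ok"
else
    echo "proxy too short, rerun grid-proxy-init"
fi
```

On lxplus the first argument would come from the real proxy, for example proxy_covers "$(grid-proxy-info -timeleft)" 48.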

Simulation jobs

  • Create JDL job description files with the command jdlgen:
jdlgen -r 9.0.4 -d 4020 -D tth120_bb --in-step evgen904 --out-step g4initial904 \
-p wildauer -t 09-00-04-06 -e 2 --in-part 1:1 --out-part 1:5 -s castorgrid.ific.uv.es \
--eta-min=-4. --eta-max=4.
This will give you jobs with the following structure:
  • Storage directory (on castorgrid.ific.uv.es):
wildauer/g4initial904/wildauer.004020.g4initial904.tth120_bb
  • The LFN of the input file has to be:
wildauer.004020.evgen904.tth120_bb._00001.pool.root
  • Output files will be called:
wildauer.004020.g4initial904.tth120_bb._00001.pool.root
wildauer.004020.g4initial904.tth120_bb._00001.job.log
... (same for partitions 2 to 5)
This jdlgen command creates JDL files that need ONE input file (--in-part 1:1) registered with the logical file name (LFN) wildauer.004020.evgen904.tth120_bb._00001.pool.root. It produces 5 output files (--out-part 1:5) with 2 events each (-e 2). You also need a template file in your templates directory with the name wildauer.wildauer.g4initial904.jdl.template.
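
The file names above follow a fixed pattern, so the LFNs that need to exist (or will be produced) can be listed before submitting. This is a small sketch, not part of the jdlgen scripts: make_lfn is a hypothetical helper, and the padding widths (6 digits for the dataset number, 5 for the partition) are inferred from the example names on this page.

```shell
# make_lfn PROJECT DATASET STEP NAME PART
# Hypothetical helper reproducing the naming pattern seen above:
#   <project>.<6-digit dataset>.<step>.<name>._<5-digit partition>.pool.root
make_lfn() {
    printf '%s.%06d.%s.%s._%05d.pool.root\n' "$1" "$2" "$3" "$4" "$5"
}

# Input file for --in-part 1:1
make_lfn wildauer 4020 evgen904 tth120_bb 1

# Output files for --out-part 1:5
for part in 1 2 3 4 5; do
    make_lfn wildauer 4020 g4initial904 tth120_bb "$part"
done
```

Pass the dataset and partition numbers without leading zeros (4020, not 004020), because printf would otherwise read them as octal.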

Digitization jobs

  • Create JDL job description files with the command jdlgen:
jdlgen -r 9.0.4 -d 4020 -D tth120_bb --in-step g4initial904 --out-step g4digi904 \
-p wildauer -t 09-00-04-06 -e 2 --in-part 1:2 --out-part 1:1 -s castorgrid.ific.uv.es
This will give you jobs with the following structure:
  • Storage directory (on castorgrid.ific.uv.es):
wildauer/g4digi904/wildauer.004020.g4digi904.tth120_bb
  • The LFNs of the input files have to be:
wildauer.004020.g4initial904.tth120_bb._00001.pool.root.1
wildauer.004020.g4initial904.tth120_bb._00002.pool.root.1
  • Output files will be called:
wildauer.004020.g4digi904.tth120_bb._00001.pool.root.1
wildauer.004020.g4digi904.tth120_bb._00001.job.log.1
... (same for partition 2)

This jdlgen command creates JDL files that need TWO input files registered with the logical file names (LFNs) wildauer.004020.g4initial904.tth120_bb._0000X.pool.root (X=1,2). You want 2 output files with 2 events each, that is, one output file per input file. You also need a template file in your templates directory with the name wildauer.wildauer.g4digi904.jdl.template.

Miscellaneous

Register a file

Note that files created on the grid are already registered. To register a file yourself, you can use:
lcg-cr --vo atlas -d castorgrid.cern.ch -l <LFNofFile> -g guid:<PoolFileID> \
gsiftp://castorgrid.cern.ch/castor/cern.ch/grid/atlas/wildauer/<file>
This will make a copy of your file and put it under some generic directory. It is better to copy the files to the ATLAS CASTOR grid space and register them with:
lcg-rf --vo atlas -g guid:44021A5E-CBB3-D911-8575-0002B3AF7CDB -l wildauer.004020.evgen904.tth120_bb._00001.pool.root \
sfn://castorgrid.cern.ch/castor/cern.ch/grid/atlas/wildauer/evgen/data/wildauer.004020.evgen904.tth120_bb._00001.pool.root 
Make sure the file is already in the ATLAS CASTOR grid space (/castor/cern.ch/grid/atlas/). That way the SFN of the file will just be the file location and not some generic copied name.

The LFN of the file is arbitrary. As a convention it is simply the filename itself.
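
When many files have to be registered, the lcg-rf command line can be assembled from the GUID, the LFN and the CASTOR directory. The sketch below only prints the command so it can be inspected first; print_register_cmd is a hypothetical helper, and it assumes the file is stored under its LFN on castorgrid.cern.ch, as in the example above.

```shell
# print_register_cmd GUID LFN CASTOR_DIR
# Hypothetical helper: prints (does not run) an lcg-rf command for a file
# that already sits in CASTOR_DIR under the same name as its LFN.
print_register_cmd() {
    guid=$1
    lfn=$2
    dir=${3%/}    # drop a trailing slash, if any
    host=castorgrid.cern.ch
    printf 'lcg-rf --vo atlas -g guid:%s -l %s sfn://%s%s/%s\n' \
        "$guid" "$lfn" "$host" "$dir" "$lfn"
}

print_register_cmd 44021A5E-CBB3-D911-8575-0002B3AF7CDB \
    wildauer.004020.evgen904.tth120_bb._00001.pool.root \
    /castor/cern.ch/grid/atlas/wildauer/evgen/data
```

Once the printed line looks right, it can be pasted into the shell or piped to sh.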

Unregister a file

To unregister a file, type:
lcg-uf --vo atlas guid mySFN
where guid and mySFN have to be replaced accordingly. mySFN can be obtained via:
lcg-lr --vo atlas lfn:myLFN
and the guid via:
lcg-lg --vo atlas lfn:myLFN
Note that this only unregisters the file. It does not delete it! As a "shortcut" you can also do:
lcg-uf --vo atlas \
`lcg-lg --vo atlas lfn:private.004020.g4initial904.tth120_bb._00062.pool.root.1` \
`lcg-lr --vo atlas lfn:private.004020.g4initial904.tth120_bb._00062.pool.root.1`
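
The backtick shortcut passes whatever the two lookups return straight to lcg-uf, which is fragile if a lookup comes back empty or if the file has more than one replica. A more defensive variant might look like the sketch below. It only prints the lcg-uf commands instead of running them, print_unregister_cmds and the LCG_LG / LCG_LR variables are names made up here, and it assumes lcg-lr prints one replica per line.

```shell
# Commands used for the lookups. They can be overridden so the function
# can be tried without a grid environment (variable names are hypothetical).
LCG_LG=${LCG_LG:-lcg-lg}
LCG_LR=${LCG_LR:-lcg-lr}

# print_unregister_cmds LFN
# Prints one lcg-uf command per replica instead of running it.
print_unregister_cmds() {
    lfn=$1
    guid=$("$LCG_LG" --vo atlas "lfn:$lfn") || return 1
    [ -n "$guid" ] || { echo "no GUID found for $lfn" >&2; return 1; }
    "$LCG_LR" --vo atlas "lfn:$lfn" | while read -r sfn; do
        if [ -n "$sfn" ]; then
            echo "lcg-uf --vo atlas $guid $sfn"
        fi
    done
}
```

Usage would be print_unregister_cmds private.004020.g4initial904.tth120_bb._00062.pool.root.1, followed by running the printed lines once they have been checked.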

Checking the ranking expression

If you want to know to which site a job would most likely go, type:
edg-job-list-match --rank myfile.jdl
The answer will look something like this:
lxplus011,11:30,~/grid/jobs >edg-job-list-match --rank jdl/wildauer.004020.g4initial904.tth120_bb._01302.jdl

Selected Virtual Organisation name (from JDL): atlas
Connecting to host lxn1177.cern.ch, port 7772

***************************************************************************
                         COMPUTING ELEMENT IDs LIST
 The following CE(s) matching your job requirements have been found:

                   *CEId*                             *Rank*

 golias25.farm.particle.cz:2119/jobmanager-lcgpbs-lcgatlasprod  11300
 lcgce01.triumf.ca:2119/jobmanager-lcgpbs-atlas         1800
 lcg2ce.ific.uv.es:2119/jobmanager-lcgpbs-atlasL        1650
 t2-ce-01.lnl.infn.it:2119/jobmanager-lcglsf-atlas      1250
 lcg-ce.lps.umontreal.ca:2119/jobmanager-lcgpbs-atlas   1000
 lcg2ce.ific.uv.es:2119/jobmanager-lcgpbs-atlas         173
***************************************************************************
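
To pick out the best-ranked CE without reading the whole listing, the output can be filtered with awk. This is a rough sketch under a few assumptions: best_ce is a hypothetical helper, ranks are taken to be integers, and any line that does not split cleanly into a CE ID and a rank is skipped.

```shell
# best_ce: reads edg-job-list-match --rank output on stdin and prints the
# CE with the highest rank. Lines that do not split into exactly
# "CEId rank" are ignored.
best_ce() {
    awk 'NF == 2 && $1 ~ /:[0-9]+\/jobmanager-/ && $2 ~ /^-?[0-9]+$/ {
             if (!seen || $2 + 0 > max) { max = $2 + 0; ce = $1; seen = 1 }
         }
         END { if (seen) print ce }'
}

# Example with three lines taken from the listing above
best_ce <<'EOF'
 lcgce01.triumf.ca:2119/jobmanager-lcgpbs-atlas         1800
 lcg2ce.ific.uv.es:2119/jobmanager-lcgpbs-atlasL        1650
 t2-ce-01.lnl.infn.it:2119/jobmanager-lcglsf-atlas      1250
EOF
```

With real output this would be used as edg-job-list-match --rank myfile.jdl | best_ce.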

Excluding or requiring a certain site

If you want to exclude a site, or restrict your jobs to certain sites, add the following to the Requirements expression in your JDL file (site selection is a boolean condition, hence the trailing &&):

Exclude:

(!(RegExp("address_of_troublesome_site",other.CEId))) &&

Select all sites in the UK:

RegExp(".*uk:*",other.CEId) &&
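
Before putting a pattern into the JDL, it can be tried out locally against a list of CE IDs. The sketch below uses grep -E, which only approximates the matching done by the resource broker, so treat the result as a hint rather than a guarantee. preview_match is a hypothetical helper.

```shell
# preview_match PATTERN: prints the CE IDs from stdin that contain PATTERN.
# grep -E only approximates the RegExp matching done by the resource broker.
preview_match() {
    grep -E -- "$1"
}

# CE IDs copied from the list-match output earlier on this page
preview_match '\.ca:' <<'EOF'
lcgce01.triumf.ca:2119/jobmanager-lcgpbs-atlas
lcg2ce.ific.uv.es:2119/jobmanager-lcgpbs-atlas
lcg-ce.lps.umontreal.ca:2119/jobmanager-lcgpbs-atlas
EOF
```

Anchoring the pattern on the dot and the colon (\.ca:) restricts the match to the end of the host name, so an unrelated "ca" elsewhere in the CE ID is not picked up.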

Troubleshooting

lxplus011,17:24,~/grid/jobs >echo $LCG_GFAL_INFOSYS
lxn1178.cern.ch:2170
lxplus011,17:25,~/grid/jobs >export LCG_GFAL_INFOSYS=atlas-bdii.cern.ch:2170
lxplus011,17:25,~/grid/jobs >./registerFiles.sh

In case you cannot get a PFN for your file via the LFN

In that case your file is irretrievably lost, since no catalog entry points to the PFN of the file (unless you know the PFN by some means other than the file catalog). The only solution is therefore to delete the GUID-LFN mapping. You can do that with the lcg-ra command, which you will find on the User Interface.

The synopsis is

$ lcg-ra --vo vo guid lfn

See the man pages for details. -- DerSchrecklicheSven - 25 Apr 2005


Topic revision: r10 - 2005-06-20 - AndreasWildauer
 