Panda Athena

As an alternative to Ganga, you can use Panda for your distributed analysis. The information here should be enough to get you started, but for full documentation you should consult the PandaAthena TWiki here: https://twiki.cern.ch/twiki/bin/view/Atlas/PandaAthena.

Initial Setup

Before you use pathena for the first time, you will need to make sure that your nickname is registered with the ATLAS VO. To do this go to https://lcg-voms.cern.ch:8443/vo/atlas/vomrs, click on the '+' by 'Member Info', then go to 'Edit Personal Info', tick the First name, Last name and nickname boxes and then click Search. If you don't already have a nickname then you can add this in the appropriate field (this should be of the format 'firstnamelastname').

You're now ready to start pathena-ing!

Submitting Pathena Jobs

Set up your favourite ATLAS release, e.g.

asetup 16.0.2.3

Set up the grid:

source /afs/cern.ch/project/gd/LCG-share/current/etc/profile.d/grid_env.sh

export PATH=$PATH:/afs/cern.ch/atlas/offline/external/GRID/ddm/pro02/

export PYTHONPATH=$PYTHONPATH:/afs/cern.ch/atlas/offline/external/GRID/ddm/pro02/

voms-proxy-init -voms atlas -valid 90:00

Set up Panda:

source /afs/cern.ch/atlas/offline/external/GRID/DA/panda-client/latest/etc/panda/panda_setup.sh
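Before submitting, it can be handy to confirm that the setup scripts above actually put the expected tools on your PATH. The following is only a minimal sketch: the helper name missing_tools is made up for illustration, and the list of tools checked is simply the set of commands used on this page.

```python
import shutil


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Commands used on this page; adjust to whatever your setup provides.
    required = ["voms-proxy-init", "pathena", "pbook", "dq2-get"]
    missing = missing_tools(required)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All tools found, ready to submit.")
```

Run it in the same shell in which you sourced the setup scripts, since it only inspects the environment it is started from.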

With the environment set up, run your job as you would with athena (e.g. athena myJobOptions.py), but replace 'athena' with 'pathena' and specify the input and output dataset names using --inDS and --outDS, like so:

pathena myJobOptions.py --inDS myFavouriteDataset --outDS user..myTestNtuple.root

If you're running over a large number of files, you can specify how many sub-jobs the job should be split into by adding "--split N" to the end of the command (where N is the number of sub-jobs you want).
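If you build the command from a script, the pieces described above fit together roughly as follows. This is a sketch, not part of pathena: the function build_pathena_command is a hypothetical helper, and 'yournickname' is a placeholder for your own VO nickname. Only the options mentioned on this page (--inDS, --outDS, --split) are used.

```python
import shlex


def build_pathena_command(job_options, in_ds, out_ds, n_split=None):
    """Return the pathena command line as a list of arguments.

    n_split is optional; when given, "--split N" is appended at the end.
    """
    cmd = ["pathena", job_options, "--inDS", in_ds, "--outDS", out_ds]
    if n_split is not None:
        if n_split < 1:
            raise ValueError("n_split must be a positive integer")
        cmd += ["--split", str(n_split)]
    return cmd


if __name__ == "__main__":
    cmd = build_pathena_command(
        "myJobOptions.py",
        "myFavouriteDataset",
        "user.yournickname.myTestNtuple.root",  # placeholder nickname
        n_split=20,
    )
    # Print the command in a form that can be pasted into the shell.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the command as a list of arguments makes it easy to hand to subprocess later without worrying about shell quoting.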

Submitting Multiple Jobs

Use a simple script which takes a list of datasets and runs the pathena command on each one.

An example can be found here: PathenaSub.py.txt (remove the .txt suffix).

It can also be easily modified to run over several good run lists (GRLs) instead of datasets (see below).
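For illustration, a loop of this kind might look like the sketch below. It is not the attached PathenaSub.py; the function names, the 'yournickname' placeholder and the output naming scheme (user.<nickname>.<second field of the input dataset name>.<tag>) are all assumptions made for the example. By default it only prints the commands (dry run) rather than submitting anything.

```python
import subprocess


def make_out_ds(nickname, in_ds, tag="ntuple"):
    """Build an output dataset name from the input one (hypothetical scheme)."""
    fields = in_ds.rstrip("/").split(".")
    # Use the second dot-separated field (the dataset number in the examples
    # on this page) if present, otherwise fall back to the first field.
    ident = fields[1] if len(fields) > 1 else fields[0]
    return "user.%s.%s.%s" % (nickname, ident, tag)


def submit_all(job_options, datasets, nickname, dry_run=True):
    """Run (or just print, if dry_run) one pathena command per input dataset."""
    commands = []
    for in_ds in datasets:
        cmd = ["pathena", job_options,
               "--inDS", in_ds,
               "--outDS", make_out_ds(nickname, in_ds)]
        commands.append(cmd)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.check_call(cmd)  # raises if pathena exits with an error
    return commands


if __name__ == "__main__":
    datasets = [
        "mc09_7TeV.109281.J5_pythia_jetjet_1muon.merge.AOD.e534_s765_s767_r1302_r1306/",
    ]
    submit_all("myJobOptions.py", datasets, "yournickname")
```

Check the printed commands first, then call submit_all with dry_run=False once they look right.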

Monitoring Jobs

You can check the status of your jobs by going to the Panda Monitor page (http://panda.cern.ch:25980/server/pandamon/query) and entering the PandaID of the job(s) in the 'Job' field. Once the jobs have completed you will get an email detailing the number of jobs submitted, and how many succeeded, failed or were cancelled.

You can also display the status of your jobs on the command line. In the terminal, type:

pbook

It will then retrieve information about all of the jobs you have submitted (it may take a few seconds to load if you've recently submitted a lot). Then you can display the status of a given job by doing:

>>> show(JobID)

For example:

Start pBook 0.2.50
>>> show(33)
INFO : Getting status for JobID=33 ...
INFO : Updated JobID=33
======================================
          JobID : 33
           type : pathena
        release : Atlas-15.6.10
          cache :
        PandaID : 1080485641-1080485664,1080485667-1080485678,1080485680-1080485692,1080485694-1080485695
          nJobs : 50 + 1(build)
           site : ANALY_LYON_DCACHE
          cloud : FR
           inDS : mc09_7TeV.109281.J5_pythia_jetjet_1muon.merge.AOD.e534_s765_s767_r1302_r1306/
          outDS : user.katharineleney.J5muon.r1306.root
          libDS : user.katharineleney.0614094720.367819.lib._000033
        retryID : 0
   provenanceID : 0
   creationTime : 2010-06-14 09:47:23
     lastUpdate : 2010-06-14 16:22:32
         params : ../share/Htautau_jobOptions.py --inDS mc09_7TeV.109281.J5_pythia_jetjet_1muon.merge.AOD.e534_s765_s767_r1302_r1306/ --outDS user.katharineleney.J5muon.r1306.root
      jobStatus : running
             finished : 49
              running : 2
>>>

If you don't specify a JobID (i.e. simply do 'show()'), then the status of all uncompleted jobs will be displayed.
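If you save the text printed by show() and want to work with it in a script (for example, to list the individual PandaIDs to paste into the Panda Monitor), the "key : value" layout is simple to parse. The sketch below assumes the output looks like the example above; both function names are hypothetical helpers, not part of pbook.

```python
def parse_show_output(text):
    """Turn the 'key : value' lines printed by show() into a dictionary."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or not key or key == "INFO":
            continue  # skip separators, prompts and INFO log lines
        info[key] = value.strip()
    return info


def expand_panda_ids(spec):
    """Expand a string such as '5-7,9' into [5, 6, 7, 9]."""
    ids = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        first, sep, last = part.partition("-")
        if sep:
            ids.extend(range(int(first), int(last) + 1))
        else:
            ids.append(int(first))
    return ids


if __name__ == "__main__":
    sample = """\
INFO : Updated JobID=33
          JobID : 33
        PandaID : 1080485641-1080485664,1080485667-1080485678
      jobStatus : running
"""
    info = parse_show_output(sample)
    print(info["jobStatus"], len(expand_panda_ids(info["PandaID"])))
```

Values are kept as strings; convert them (e.g. with int) only where you need to.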

Killing Jobs

Start pbook, as above, and then do:

>>> kill(JobID)

Resubmit Failed Sub-Jobs

Again, it's dead easy: if a job or some of its sub-jobs failed, simply go to pbook and do:

>>> retry(JobID)

It will then pick out the sub-jobs which failed last time and resubmit just those for you.

Retrieving the Output

Once your jobs have finished, you can retrieve the output files by doing:

dq2-get user./myTestNtuple.root

And you're done!

A few extra notes...

* Use the merged AOD files - it complains if you don't.

* You can choose a specific site to run on by adding "--site" followed by the site name to your command, e.g. pathena -c 'Events=1000' ../share/Htautau_jobOptions.py --site UKI-NORTHGRID-LIV-HEP_MCDISK --inDS mc09_7TeV.109910.SherpabbAtautaulhMA120TB20.merge.AOD.e534_s765_s767_r1250_r1260/ --outDS user.katharineleney.myTestNtuple.root

* You can tell pathena to run over your GRL by doing: pathena myJobOptions.py --goodRunListXML MyLBCollection.xml --outDS user... It will then translate your GRL into a list of datasets and use this in place of the --inDS option. See https://twiki.cern.ch/twiki/bin/view/Atlas/PandaAthena#example_10_How_to_run_on_a_good for details.

-- KatharineLeney - 22-Feb-2011
