Scope of the Tutorial:

The goal of this tutorial is that, by the end of the workshop, participants will be able to run an ATLAS analysis on their own. Two requirements must be satisfied before the start of the tutorial:

  • Have an account on lxplus.
  • Have a valid grid certificate to be able to run analysis on the GRID.

We will cover the following main areas:
  • Set up the directory structure.
  • Set up a particular version of the ATLAS software release.
  • Understand the various physics analysis objects (and how to access more information about each object).
  • Use the physics objects in an AOD analysis.
  • Use the GRID (PANDA, Ganga) to analyze data.

Session 1:

Set up your work area on lxplus

In your home directory (${HOME}/):

  • Create a directory called cmthome:
    mkdir cmthome
  • Create a directory called IctpTutorial:
    mkdir IctpTutorial
  • Create a directory under IctpTutorial which corresponds to the ATLAS release being used:
    mkdir IctpTutorial/14.2.21
    This will hold your algorithms and any checked-out code.
  • Copy the requirements file from my home area:
    cp ~chaouki/scratch0/cmthome/requirements ${HOME}/cmthome/
    The requirements file defines the Athena release version together with the necessary environment (CAUTION: make sure that $HOME/scratch0 is replaced by $HOME in requirements and in the script):
       set CMTSITE CERN
       set SITEROOT /afs/
       macro ATLAS_DIST_AREA ${SITEROOT}/atlas/software/dist
       # use optimised version by default
       macro ATLAS_TEST_AREA    "${HOME}/IctpTutorial/14.2.21" \
       14.2.21            "${HOME}/IctpTutorial/14.2.21"
       use AtlasLogin AtlasLogin-* $(ATLAS_DIST_AREA)
  • Copy and run the script which sets up CMT:
       cp ~chaouki/scratch0/cmthome/ ${HOME}/cmthome/
       cd ${HOME}/cmthome/
    TIP: this command needs to be run only once.
  • Finally, copy and run the script which sets up the release environment:
       cp ~chaouki/scratch0/cmthome/ ${HOME}/cmthome/
    TIP: this command needs to be run every time you open a new shell/xterm.

Browse/Retrieve Datasets over the GRID

Although some of you do not yet have a GRID certificate, we will present the steps needed to browse and retrieve the data. Once the GRID certificate is granted (ATLAS VO registration), it should be placed in the directory:
After installing the certificate, you can initiate a proxy session by typing:
Next, you need to set up the environment required to use the DQ2 tools for browsing/retrieving data over the GRID, namely:
source /afs/
Now you are ready to view ATLAS datasets, which are stored in multiple locations on the GRID. The most useful commands are:
  • dq2-ls: used to browse a particular dataset, or all datasets whose names contain a particular string, for example: dq2-ls '*.PythiaZee.*'
  • dq2-get: used to retrieve the data over the GRID and store it locally.
For more information about the DQ2 tools, see the following page: Atlas DDM page

Let's look for a particular dataset: valid1.005144.PythiaZee.recon.AOD.e322_s412_r583, which is a validation dataset. To find out which containers are stored in the dataset, one can use the Python file "" which can be obtained using get_file:

Get the file from CASTOR with rfcp /castor/ /tmp/$USER/. Then run it:
 ./ /tmp/$USER/AOD.029110._00001.pool.root.1 

General Information about ATLAS data

ATLAS data Model

ATLAS uses several data formats, designed to handle a large and diverse amount of information in an efficient way, ranging from the raw data collected with the detector to the final reconstructed objects used in physics analysis.


ESD (Event Summary Data):

  • Full output of reconstruction in object (POOL/ROOT) format:
    • Tracks (and their hits), Calo Clusters, Calo Cells, combined reconstruction objects etc.
  • Nominal size 1 MB/event initially, to decrease as the understanding of the detector improves
    • Compromise between “being able to do everything on the ESD” and “not enough disk space to store too large events”.

AOD (Analysis Object Data):

  • Summary of event reconstruction with “physics” (POOL/ROOT) objects:
    • Contains electrons, muons, jets, etc.
  • Nominal size 100 kB/event (now 200 kB/event including MC truth)

DPD (Derived Physics Data):

  • Skimmed/slimmed/thinned events + other useful “user” data derived from AODs and conditions data
  • Nominally 10 kB/event on average
    • Large variations depending on physics channels


TAG (Event Tags):

  • Database (or ROOT files) used to quickly select events in AOD and/or ESD files
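
To get a feeling for what the nominal event sizes mean in practice, here is a rough back-of-the-envelope sketch. The recording rate of 200 Hz is the figure quoted in the trigger section of this tutorial; the 1e7 seconds of data taking per year is an assumption made only for this illustration, so treat the numbers as orders of magnitude.

```python
# Rough yearly data volume per format, from the nominal event sizes.
# ASSUMPTION (not from the tutorial): 1e7 seconds of data taking per year.
EVENT_RATE_HZ = 200.0          # recording rate after the Event Filter
LIVE_SECONDS_PER_YEAR = 1e7    # assumed live time per year

# Nominal event sizes in kB, as listed in the data model description.
NOMINAL_SIZE_KB = {"ESD": 1000.0, "AOD": 100.0, "DPD": 10.0}


def yearly_volume_tb(size_kb, rate_hz=EVENT_RATE_HZ, seconds=LIVE_SECONDS_PER_YEAR):
    """Return the yearly data volume in terabytes (1 TB = 1e9 kB)."""
    return size_kb * rate_hz * seconds / 1e9


if __name__ == "__main__":
    for fmt, size_kb in NOMINAL_SIZE_KB.items():
        print(f"{fmt}: about {yearly_volume_tb(size_kb):.0f} TB/year")
```

Under these assumptions the AOD alone comes to a few hundred terabytes per year, which is why copying AOD files locally quickly becomes impractical.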

Dataset Naming convention

A run number (or dataset number) is unique and refers to a sample that contains particular physics. It can be assigned to either a Monte Carlo sample or a data sample, and it cannot be recycled.

Datasets have the following name format: Project.datasetNumber.physicsShort.prodStep.dataType.TAG. Example: mc08.106300.PythiaH120zz4l.recon.AOD.e352_s462_r541

Project is a string that indicates a production or processing series, such as mc08, valid1, fdr08_run2...

datasetNumber is a 6-digit number assigned to the run number.

physicsShort is a string that provides a short description of the data.

prodStep describes the production step. It can have one of the following values:

  • evgen corresponding to event generation.
  • simul corresponding to events processed by Geant-4 simulation
  • digit corresponding to digitized events
  • recon corresponding to outputs from reconstruction

dataType describes the data format produced at a particular production step:

  • EVNT: at event generation
  • HITS: at detector simulation
  • RDO: at event digitization
  • ESD, AOD, DPD : at reconstruction
  • TAG: at event data tag building
  • log: logfile produced with each step.

TAG is a series of tags describing each step of the production. For Monte Carlo data they are defined as follows:

  • e XXX: evgen configuration
  • s XXX: simulation configuration
  • d XXX: digitization configuration
  • r XXX: reconstruction configuration
  • a XXX: atlfast (either I or II) configuration
  • t XXX: tag production configuration
  • b XXX: bytestream production configuration

The digits following the letter indicate a unique software configuration. For example: using a different athena release, or using a different geometry tag, or using a different set of jobOptions fragments requires a new tag.
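
The naming convention above can be made concrete with a small parser. This is only an illustrative sketch: the function name parse_dataset_name and the dictionary layout are hypothetical, and the code assumes that every name has exactly six dot-separated fields, which may not hold for all real datasets.

```python
import re

# Field names follow the format string given in this section.
FIELDS = ("project", "datasetNumber", "physicsShort", "prodStep", "dataType", "tag")

# Meaning of the leading letter of each configuration tag (Monte Carlo case).
TAG_MEANING = {
    "e": "evgen",
    "s": "simulation",
    "d": "digitization",
    "r": "reconstruction",
    "a": "atlfast",
    "t": "tag production",
    "b": "bytestream production",
}


def parse_dataset_name(name):
    """Split a dataset name into its fields and decode the configuration tags."""
    parts = name.split(".")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {name!r}")
    info = dict(zip(FIELDS, parts))
    if not re.fullmatch(r"\d{6}", info["datasetNumber"]):
        raise ValueError(f"dataset number must have 6 digits: {info['datasetNumber']!r}")
    steps = []
    for token in info["tag"].split("_"):
        match = re.fullmatch(r"([a-z])(\d+)", token)
        if match is None or match.group(1) not in TAG_MEANING:
            raise ValueError(f"unrecognised configuration tag: {token!r}")
        steps.append((TAG_MEANING[match.group(1)], int(match.group(2))))
    info["steps"] = steps
    return info


if __name__ == "__main__":
    info = parse_dataset_name("mc08.106300.PythiaH120zz4l.recon.AOD.e352_s462_r541")
    for key in FIELDS:
        print(f"{key:14s} {info[key]}")
    for step, number in info["steps"]:
        print(f"  {step}: configuration {number}")
```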

Searching for ATLAS Dataset and interpreting TAGs

Tags can be interpreted in three possible ways:

  • Using AMI (recommended). From the main page:
    • Click on Nomenclature
    • Enter the tag in the configurationTag field
    • Click on Interpret
  • Using the Atlas Production page: click on Production Tags, then enter the tag in the field, and click Go.
  • Using the Panda Monitor: do a Tasks - search, enter the tag in the Configuration Tag field, and click on any task ID that appears. Then click on the active link to get information about the production configuration.

Using AMI

AMI can also be used to retrieve dataset files and get the full path to access them using DQ2 tools such as dq2-get. As an example, consider retrieving the AOD files in the mc08.106300.PythiaH120zz4l.recon.XXX.e352_s462_r541 sample. Go to AMI, then to Dataset Search. Fill in mc08.106300.PythiaH120zz4l.recon.%.e352_s462_r541 in the "datasets search" field and click on it.

NB: Use % for wildcard: example "mc08.106300.PythiaH120zz4l.%" to search for all run 106300 samples produced in the mc08 series, including event generation, simulation, digitization, reconstruction...
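
To illustrate how the % wildcard selects dataset names, here is a small local sketch. It is not how AMI is implemented: it simply translates % into the * of Python's fnmatch module, and the helper name ami_like_match is hypothetical. The second dataset name in the example list is made up for illustration.

```python
import fnmatch


def ami_like_match(pattern, names):
    """Return the names matching a pattern in which '%' stands for any string.

    Local illustration only; '%' is mapped onto the '*' wildcard of fnmatch.
    """
    glob_pattern = pattern.replace("%", "*")
    return [name for name in names if fnmatch.fnmatchcase(name, glob_pattern)]


if __name__ == "__main__":
    names = [
        "mc08.106300.PythiaH120zz4l.recon.AOD.e352_s462_r541",
        "mc08.106300.PythiaH120zz4l.evgen.EVNT.e352",  # made-up name for illustration
        "valid1.005144.PythiaZee.recon.AOD.e322_s412_r583",
    ]
    for name in ami_like_match("mc08.106300.PythiaH120zz4l.%", names):
        print(name)
```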

Please refer to the AMI Tutorial for more relevant details.

Start Running a simple job (Z(ee) reconstruction):

Now that we know the objects of interest, we will use the Z --> ee MC sample and reconstruct the invariant mass of the resonance. Copy the code in ~chaouki/scratch0/IctpTutorial/14.2.21/Z_Analysis to your area $HOME/IctpTutorial/14.2.21, then follow these steps to configure and compile the code:

  • In Z_Analysis/cmt, do the following:
    cmt config
You need to repeat the last two commands whenever you change the code to recompile it.
  • In Z_Analysis/run, do the following:
The output is a ROOT file which contains a tree with one leaf representing the Z invariant mass.

The AOD file to run on is specified in the jobOptions file:

import AthenaPoolCnvSvc.ReadAthenaPool
ServiceMgr.EventSelector.InputCollections = ["rfio:/castor/"]

Session 2: Basic Tools used in a Physics Analysis (1)

Monte-Carlo Information

When looking at real data, it is useful to take theoretical models and simulate the detector response to their predictions. This is a very important step in guiding the expected discoveries: theory and experiment must work together to achieve physics results that leave no room for doubt. One can therefore generate MC data samples and study the specific features of the models currently on the market. The process of generating MC has 4 main steps:

  • Generate events using a particular model.
  • Simulate the detector response by generating Hits in the different sub-detectors. Here we need to know the detector geometry and possible misalignments.
  • Then the digitization phase simulates the response of the electronics to the Hits.
  • Finally, using the digitization information, one can reconstruct the basic quantities which will be described later (this is the only step one has to run for real data).

The benefit that one gets with MC samples is the fact that the underlying/initial information is stored. That is you know for sure whether an identified object matches an electron, a muon, a tau or a jet using the truth information (hence the name "truth"). Exercise: try to access this information by adding the following lines into the Z_Analysis package:

In the header file:

#include "McParticleEvent/TruthParticleContainer.h"


std::string _truthParticleContainerName;
const TruthParticleContainer *_truthList ;

Then in the cxx file, add the following:

  • In the ctor:
declareProperty("MCParticleContainer", _truthParticleContainerName = "SpclMC");
  • Then for each event in the execute() method, you can retrieve the truth information as follows:
// --- Retrieve truth list ---
sc = _storeGate->retrieve(_truthList, _truthParticleContainerName);
if (sc.isFailure() || !_truthList) return sc;   // no truth container in this event

double truth_px, truth_py, truth_pz, truth_e;
int truth_id, truth_stat, truth_bcode, truth_ndg;

for (unsigned int i = 0; i < _truthList->size(); ++i) {
  const TruthParticle* tp = (*_truthList)[i];
  // --- Kinematics info of this truth particle ---
  truth_px = tp->px();
  truth_py = tp->py();
  truth_pz = tp->pz();
  truth_e  = tp->e();
  truth_id = tp->pdgId();
  truth_stat  = tp->genParticle()->status();
  truth_bcode = tp->genParticle()->barcode();
  // --- Daughter information ---
  truth_ndg = tp->nDecay();
  for (int j = 0; j < truth_ndg; ++j) {
    const TruthParticle* truth_dg = tp->child(j);
    // ... access the properties of the daughter truth_dg here
  }
}
Your assignment is to print out the truth information of the events generated in the Z(ee) simulation.


Trigger Information

This is another crucial part of the data-taking process. Depending on the effectiveness of the trigger, you might or might not discover your beloved signal. Picture this (the numbers are not real, but reality must be of the same order of magnitude): almost 99% of the expected events are of no interest to new physics, so you are left with about 1% of potentially interesting data. If something in the trigger does not work well, you might lose part of that 1% (and maybe all of it). Hence, it is of great importance to make sure that the trigger is highly efficient. In ATLAS, we have two main trigger levels, namely:
  • Level 1 trigger: LVL1.
  • High Level Trigger (HLT) which in turn is divided into two parts:
    • Level 2 trigger: LVL2.
    • Event Filter: EF.

For early data, the aim is to have a fast decision based on the information gathered at 40 MHz at L1, which would allow data to be recorded at ~200 Hz after the EF (equivalent to storing about 300 MB/s).
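
The numbers quoted above imply an overall rejection factor and an average event size, which the following short sketch works out. It uses only the three figures from the text (40 MHz, 200 Hz, 300 MB/s); the function names are hypothetical.

```python
# Figures quoted in the text.
L1_INPUT_RATE_HZ = 40e6         # rate at which L1 gathers information
EF_OUTPUT_RATE_HZ = 200.0       # recording rate after the Event Filter
STORAGE_BANDWIDTH_MB_S = 300.0  # corresponding storage bandwidth


def rejection_factor(input_rate_hz, output_rate_hz):
    """Number of input events seen for every event that is kept."""
    return input_rate_hz / output_rate_hz


def average_event_size_mb(bandwidth_mb_s, output_rate_hz):
    """Average size of a recorded event implied by the bandwidth and the rate."""
    return bandwidth_mb_s / output_rate_hz


if __name__ == "__main__":
    print("rejection factor:", rejection_factor(L1_INPUT_RATE_HZ, EF_OUTPUT_RATE_HZ))
    print("event size (MB):", average_event_size_mb(STORAGE_BANDWIDTH_MB_S, EF_OUTPUT_RATE_HZ))
```

In other words, the full trigger chain keeps roughly one event in 200,000, and each recorded event is about 1.5 MB.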

As an exercise: extract the LVL1 trigger signatures of the Z(ee) process. You might need to add the following code into the Z_Analysis package:

  • In the header file:
#include "TrigDecision/TrigDecisionTool.h"


/** get a handle to the TrigDecision helper */
ToolHandle<TrigDec::TrigDecisionTool> _trigDec;

  • In the .cxx file, in execute() method:
sc = _trigDec.retrieve();
// retrieve all TriggerDecision objects
const std::vector<const LVL1CTP::Lvl1Item*> L1Items = _trigDec->getL1Items();
std::vector<const LVL1CTP::Lvl1Item*>::const_iterator itItem;
for (itItem = L1Items.begin(); itItem != L1Items.end(); ++itItem) {
   if (!*itItem) continue;
   std::string name = (*itItem)->name();
   if (name == "") continue;
   // _trigDec->isPassed(name) returns a bool which tells you whether the
   // trigger signature called "name" has fired or not
}

// Get all configured chain names from config
const std::vector< const TrigConf::HLTChain * > confChains = _trigDec->getConfigurationChains();
std::vector< const TrigConf::HLTChain *>::const_iterator iter;
for (iter = confChains.begin(); iter != confChains.end(); ++iter) {
  std::string name = (*iter)->chain_name();
  std::string tmp_trigLevel = name.substr(0,3);
  float prescale = (*iter)->prescale();
  if (name == "") continue;
  if (tmp_trigLevel == "L2_") {
    // Level 2 chain
  } else if (tmp_trigLevel == "EF_") {
    // Event Filter chain
  }
}

  • In the jobOptions file, add:

############################# Set up trigger configuration service and the metadata service it relies on, for analysis jobs without RecExCommon
from AthenaCommon.GlobalFlags import GlobalFlags
import IOVDbSvc.IOVDb
from IOVDbSvc.CondDB import conddb
conddb.addFolder("TRIGGER","/TRIGGER/HLT/Menu <tag>HEAD</tag>")
conddb.addFolder("TRIGGER","/TRIGGER/HLT/HltConfigKeys <tag>HEAD</tag>")
conddb.addFolder("TRIGGER","/TRIGGER/LVL1/Lvl1ConfigKey <tag>HEAD</tag>")
conddb.addFolder("TRIGGER","/TRIGGER/LVL1/Menu <tag>HEAD</tag>")
conddb.addFolder("TRIGGER","/TRIGGER/LVL1/Prescales <tag>HEAD</tag>")

## set up trigger decision tool
from TrigDecision.TrigDecisionConf import TrigDec__TrigDecisionTool
tdt = TrigDec__TrigDecisionTool()
ToolSvc += tdt
from RecExConfig.RecFlags  import rec
from TriggerJobOpts.TriggerFlags import TriggerFlags
TriggerFlags.doTriggerConfigOnly = True
TriggerFlags.configurationSourceList = ['ds']

## setup configuration service
from TrigConfigSvc.TrigConfigSvcConfig import DSConfigSvc
from TrigConfigSvc.TrigConfigSvcConfig import SetupTrigConfigSvc
trigcfg = SetupTrigConfigSvc()

################################## END of trigger setup

Session 3: Basic Tools Used in a Physics Analysis (2)

Basic Objects (Outputs of the reconstruction)

At the AOD level, the user will be able to access the basic information about each object needed for physics analysis. Below, we show code snippets which would help you understand the process of retrieving the object's properties/reconstruction variables. In general, for each object we can access the kinematics information namely (charge is available for all charged objects):

Variable   Implementation
px         (*Itr)->hlv().x()
py         (*Itr)->hlv().y()
pz         (*Itr)->hlv().z()
energy     (*Itr)->hlv().t()
pT         (*Itr)->hlv().perp()
eta        (*Itr)->hlv().eta()
phi        (*Itr)->hlv().phi()
charge     (*Itr)->charge()
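
The derived quantities in this table (pT, eta, phi) follow from the four-momentum components by standard formulas. The sketch below shows them in plain Python, independently of Athena; the function name kinematics is hypothetical and the returned dictionary is just one convenient layout.

```python
import math


def kinematics(px, py, pz, e):
    """Return pT, eta and phi computed from the four-momentum components.

    pT  = sqrt(px^2 + py^2)
    phi = atan2(py, px), in the range (-pi, pi]
    eta = asinh(pz / pT), the pseudorapidity
    """
    pt = math.hypot(px, py)
    phi = math.atan2(py, px)
    if pt > 0.0:
        eta = math.asinh(pz / pt)
    elif pz == 0.0:
        eta = 0.0
    else:
        # A particle exactly along the beam axis has infinite pseudorapidity.
        eta = math.copysign(math.inf, pz)
    return {"pt": pt, "eta": eta, "phi": phi, "e": e}


if __name__ == "__main__":
    print(kinematics(3.0, 4.0, 0.0, 5.0))
```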

Before retrieving the different objects, you need to get a handle on StoreGate (_storeGate):

StatusCode sc = service("StoreGateSvc", _storeGate);

then retrieve different basic objects, which are defined below:

egamma objects

These are objects identified in the Liquid Argon calorimeter. They are a mixture of electrons and photons. One can distinguish between the two species using the information that the inner detector would provide namely track-shower matching information. Therefore and based on this information, egamma objects are further separated into photon and electron collections.

  • electrons: the electron objects, which are stored in the ElectronContainer, satisfy very loose criteria and can be accessed in the following way.
const ElectronContainer *_elecList ;   
sc=_storeGate->retrieve( _elecList, _electronContainerName);   
ElectronContainer::const_iterator elecItr  = _elecList->begin();
ElectronContainer::const_iterator elecItrE = _elecList->end();
from which one can extract further information, for example:
elecAuthor = (*elecItr)->author();
IsEM = (*elecItr)->isem() ;
emWeight = (*elecItr)->egammaID(egammaPID::ElectronWeight) ;
piWeight = (*elecItr)->egammaID(egammaPID::BgWeight) ;
From the EM/Track matching, one can access the following information:
  • E/p and Et isolation:
Variable         Implementation
E/p              trkmatch->parameter(egammaParameters::EoverP)
Et (Cone 0.45)   trkmatch->parameter(egammaParameters::etcone)
Et (Cone 0.20)   trkmatch->parameter(egammaParameters::etcone20)
Et (Cone 0.30)   trkmatch->parameter(egammaParameters::etcone30)
Et (Cone 0.40)   trkmatch->parameter(egammaParameters::etcone40)

  • Et isolation in a ring 0.1 < DeltaR < D (D = 0.2 or 0.3), above 3 sigma of total noise, available from release 14.0.0:
Variable         Implementation
Et (Cone 0.20)   trkmatch->parameter(egammaParameters::etconoise20)
Et (Cone 0.30)   trkmatch->parameter(egammaParameters::etconoise30)

where trkmatch is obtained as follows:

const EMTrackMatch* trkmatch = (*elecItr)->detail<EMTrackMatch>(_trkMatchContainerName);

  • photons: the photon objects, which are stored in the PhotonContainer, satisfy very loose criteria and can be accessed in the following way.
const PhotonContainer *_photonList ;   
sc=_storeGate->retrieve( _photonList, _photonContainerName);   
PhotonContainer::const_iterator photItr  = _photonList->begin();
PhotonContainer::const_iterator photItrE = _photonList->end();

from which one can extract further information, for example:

author   = (*photItr)->author() ;
IsEM     =  (*photItr)->pid()->isEM() ;
emwgt    =  (*photItr)->egammaID(egammaPID::ElectronWeight) ;
piwgt    =  (*photItr)->egammaID(egammaPID::BgWeight) ;

From the EM shower, one can access the following information:

  • Et isolation:
Variable         Implementation
Et (Cone 0.45)   p_EMShower->parameter(egammaParameters::etcone)
Et (Cone 0.20)   p_EMShower->parameter(egammaParameters::etcone20)
Et (Cone 0.30)   p_EMShower->parameter(egammaParameters::etcone30)
Et (Cone 0.40)   p_EMShower->parameter(egammaParameters::etcone40)

  • Et isolation in a ring 0.1 < DeltaR < D (D = 0.2 or 0.3), above 3 sigma of total noise, available from release 14.0.0:
Variable         Implementation
Et (Cone 0.20)   p_EMShower->parameter(egammaParameters::etconoise20)
Et (Cone 0.30)   p_EMShower->parameter(egammaParameters::etconoise30)

where p_EMShower is obtained from:

const EMShower* p_EMShower = (*photItr)->detail<EMShower>(_egDetailContainerName);

Muon objects

There are a variety of muon identification algorithms, which lead to two different muon containers: StacoMuons and MuidMuons.

StacoMuons (from StacoMuonCollection) are muon candidates found by combining the information from the Inner Detector (ID) and the Muon Spectrometer (MS) at the Interaction Point (IP). The packages involved are:

  • Muonboy which is a muon spectrometer "standalone" track reconstruction code.
  • MuTag: is an algorithm to tag low Pt muons (starts from the ID tracks at the IP).
  • STACO for STAtistical COmbination.

MuidMuons (from MuidMuonCollection) are muon candidates found by a global re-fit of the hits from the ID and the MS. These muons are found by the following packages:

  • MOORE (Muon Object Oriented REconstruction): a track fit is performed on the collection of recorded hits, and a separate package (MuIDStandalone) is used to back-propagate the MOORE track through the calorimeter to the IP.
  • MuGirl (similar to MuTag): it associates an inner detector track with the muon spectrometer. It uses a pattern recognition algorithm based on Hough transforms and incorporates reasonable assumptions about the MDT low-level performance.
  • MUIDCombined: this algorithm performs a global fit of all hits associated with the tracks, unlike STACO, which statistically merges the two independently found tracks.

// --- Retrieve the Muid muon list ---
const Analysis::MuonContainer *_muidMuonList ;
sc=_storeGate->retrieve( _muidMuonList, _muidMuonContainerName);
MuonContainer::const_iterator muonItr  = _muidMuonList->begin();
MuonContainer::const_iterator muonItrE = _muidMuonList->end();
// --- OR retrieve the Staco muon list (use one pair of iterators at a time) ---
const Analysis::MuonContainer *_stacoMuonList ;
sc=_storeGate->retrieve( _stacoMuonList, _stacoMuonContainerName);
MuonContainer::const_iterator muonItr  = _stacoMuonList->begin();
MuonContainer::const_iterator muonItrE = _stacoMuonList->end();
from which one can extract further information, for example:
author   = (*muonItr)->author() ;
fChi2OverDoF     =  (*muonItr)->fitChi2OverDoF() ;
mChi2OverDoF     = (*muonItr)->matchChi2OverDoF() ;
fChi2     = (*muonItr)->fitChi2();
mChi2     = (*muonItr)->matchChi2() ;
bestM   = (*muonItr)->bestMatch();

One can also get the following isolation information:

  • Et isolation:
Variable         Implementation
Et (Cone 0.10)   (*muonItr)->parameter(egammaParameters::etcone10)
Et (Cone 0.20)   (*muonItr)->parameter(egammaParameters::etcone20)
Et (Cone 0.30)   (*muonItr)->parameter(egammaParameters::etcone30)
Et (Cone 0.40)   (*muonItr)->parameter(egammaParameters::etcone40)

  • Et isolation in a ring DeltaR < D (D = 0.1, 0.2, 0.3 or 0.4), above 3 sigma of total noise:
Variable         Implementation
Et (Cone 0.10)   (*muonItr)->parameter(egammaParameters::etconoise10)
Et (Cone 0.20)   (*muonItr)->parameter(egammaParameters::etconoise20)
Et (Cone 0.30)   (*muonItr)->parameter(egammaParameters::etconoise30)
Et (Cone 0.40)   (*muonItr)->parameter(egammaParameters::etconoise40)

Muons are divided into three categories:

  • Combined muons: isCombinedMuon()
  • Standalone muons: isStandAloneMuon()
  • LowPt muons: isLowPtReconstructedMuon()

Jet objects

In general there are two main jet algorithms: Cone and Kt. The two have completely different ways of associating CaloClusters to jets. Jets are formed from nearby objects, where "nearby" refers to a distance measure. This distance can be either angular, delta_R = sqrt(delta_eta**2 + delta_phi**2), for Cone algorithms, or the relative transverse momentum K_T for the Kt algorithms. The Cone algorithm successively merges pairs of nearby objects within a cone of size delta_R, in order of decreasing pT. To avoid double counting of energy, a merging/splitting method is employed. Even so, the Cone algorithms are neither infrared nor collinear safe.

The Kt algorithm successively merges pairs of nearby objects in order of increasing relative transverse momentum. A single parameter "D" determines when this merging stops ("D" characterizes the size of the resulting jets). There are two modes of the Kt algorithm: inclusive mode and exclusive mode.
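
The two distance measures can be written down in a few lines. The sketch below is not the ATLAS jet code: it uses the textbook definition of delta_R (with delta_phi wrapped into [-pi, pi]) and the standard inclusive Kt measure d_ij = min(pT_i, pT_j)**2 * delta_R**2 / D**2 with beam distance d_iB = pT_i**2. The function names and the default D = 1.0 are assumptions made for illustration.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into the range [-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)   # now in [0, 2*pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance sqrt(delta_eta**2 + delta_phi**2) used by Cone algorithms."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def kt_distance(pt1, eta1, phi1, pt2, eta2, phi2, d=1.0):
    """Inclusive Kt distance between two objects (standard textbook form)."""
    return min(pt1, pt2) ** 2 * delta_r(eta1, phi1, eta2, phi2) ** 2 / d ** 2


def kt_beam_distance(pt):
    """Kt distance between an object and the beam."""
    return pt ** 2


if __name__ == "__main__":
    # Two objects separated by 0.5 in delta_R; merge if d_ij is below both beam distances.
    d_ij = kt_distance(10.0, 0.0, 0.0, 20.0, 0.3, 0.4, d=1.0)
    print("d_ij =", d_ij, " d_iB =", kt_beam_distance(10.0))
```

In the inclusive mode, the pair with the smallest d_ij is merged unless some beam distance d_iB is smaller, in which case that object is promoted to a jet; a larger D therefore produces wider jets.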

const JetCollection* _jetList ;
// --- Retrieve the cone jet list ---
sc=_storeGate->retrieve( _jetList, _jetContainerName);
// --- OR  Retrieve the cone4 jet list ---
sc=_storeGate->retrieve( _jetList, _jet4ContainerName);
// --- OR Retrieve the kt jet list ---
sc=_storeGate->retrieve( _jetList, _jetkContainerName);

JetCollection::const_iterator jetItr = _jetList->begin() ;
JetCollection::const_iterator jetItrE = _jetList->end() ;

Tau objects

// --- Retrieve taujet list ---
const Analysis::TauJetContainer *_tauList ;
sc=_storeGate->retrieve( _tauList, _taujetContainerName);
TauJetContainer::const_iterator tauItr  = _tauList->begin();
TauJetContainer::const_iterator tauItrE = _tauList->end();
For taus, there are different reconstruction algorithms, each with its own tau-identification parameters. For example, we have:

  • Algorithm 1: tauRec for which one can extract the following information:
Variable Implementation
  • Algorithm 2: tau1P3P
Variable Implementation
(*tauItr)->parameter(TauJetParameters::etChargedHadCells )
(*tauItr)->parameter(TauJetParameters::etOtherEMCells )
(*tauItr)->parameter(TauJetParameters::discriminant )

The algorithm used can be determined through the following method:

author = (*tauItr)->author(); // can be either TauJetParameters::tauRec or TauJetParameters::tau1P3P

MissingEt objects

To get information about missing ET measurement, you need to extract the MissingET object with the key: "MET_RefFinal" at the ctor level:

declareProperty("MissingEtObject_RefFinal", _missingEtObjectNameRefFinal = "MET_RefFinal");

And then, in the execute() method, add the following lines:

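// --- Retrieve missing ET RefFinal object ---
sc = _storeGate->retrieve(_etMissRefFinal,_missingEtObjectNameRefFinal);
MissingEt = _etMissRefFinal->et();      // magnitude of the missing transverse energy
SumEt     = _etMissRefFinal->sumet();   // scalar sum of the transverse energy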
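
It helps to keep in mind the difference between the missing transverse energy (the magnitude of the negative vector sum of the visible transverse momenta) and the scalar sum of transverse energy. The sketch below illustrates the two definitions on a list of (px, py) pairs; it is a simplified stand-alone picture, not the MET_RefFinal calculation, and the function name is hypothetical.

```python
import math


def missing_et_from_visible(objects):
    """Compute missing ET quantities from a list of visible (px, py) pairs.

    Returns (etx, ety, et, sumet) where
      etx, ety : components of the missing transverse energy vector
      et       : its magnitude
      sumet    : scalar sum of the transverse momenta of the visible objects
    """
    etx = -sum(px for px, _ in objects)
    ety = -sum(py for _, py in objects)
    et = math.hypot(etx, ety)
    sumet = sum(math.hypot(px, py) for px, py in objects)
    return etx, ety, et, sumet


if __name__ == "__main__":
    # Two visible objects at right angles: the imbalance is smaller than the scalar sum.
    etx, ety, et, sumet = missing_et_from_visible([(30.0, 0.0), (0.0, 40.0)])
    print(f"etx={etx}, ety={ety}, missing ET={et}, sum ET={sumet}")
```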

Modify the Z_Analysis package, Display and Analyse the Results (Histograms and Ntuples using Root)

Modify the code

The current code loops over the objects in the electron container and combines them in pairs. The invariant mass of each pair is then stored in the ROOT file. Your assignment is to add more variables to the tree and to clean up the invariant mass plot using a combination of cuts. The files to modify are:
  • Z_Analysis/src/Z_Analysis.cxx
  • Z_Analysis/Z_Analysis/Z_Analysis.h
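
As a reference for what the pairing does, here is a stand-alone sketch of the invariant mass calculation, m**2 = (E1 + E2)**2 - |p1 + p2|**2, applied to every pair of candidates. It is written in plain Python rather than as Athena code, and the tuple layout (px, py, pz, E) and the function names are assumptions made for this illustration.

```python
import itertools
import math


def invariant_mass(p1, p2):
    """Invariant mass of two four-vectors given as (px, py, pz, E) tuples."""
    px, py, pz, e = (a + b for a, b in zip(p1, p2))
    m2 = e * e - (px * px + py * py + pz * pz)
    # Rounding can make m2 slightly negative for nearly massless systems.
    return math.sqrt(max(m2, 0.0))


def pair_masses(candidates):
    """Invariant mass of every unordered pair of candidates."""
    return [invariant_mass(a, b) for a, b in itertools.combinations(candidates, 2)]


if __name__ == "__main__":
    # Two back-to-back massless electrons; units are whatever the inputs use.
    electrons = [(45.6, 0.0, 0.0, 45.6), (-45.6, 0.0, 0.0, 45.6)]
    print(pair_masses(electrons))
```

With more than two candidates in an event, every pair contributes an entry, which is one reason why selection cuts are needed to clean up the mass plot.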

Session 4: Running jobs on the grid with GANGA and pathena/PANDA

Note: following this session will require you to have a valid grid certificate.

Having successfully tested the analysis code on a local AOD file, you can now run it on large datasets on the grid. The ATLAS policy is that jobs should go to the sites where the data are stored, rather than copying the data files locally, which is impractical given the size of the AODs: at nominal luminosity, tens or hundreds of terabytes of AOD files will be produced in a single year. There are two ways of sending jobs to the grid: via GANGA and via pathena (also called PANDA). Roughly speaking, GANGA is the framework for the European part of the grid while PANDA is its American counterpart. GANGA and PANDA are independent frameworks. However, Europe-based users can still send jobs to PANDA, and sometimes this may be the preferred option, for example when the required datasets are available at BNL and CERN only.

We start by discussing GANGA.

Running jobs on the grid with Ganga

First set up Athena in the usual way. Then from the run directory of Z_Analysis type:

source /afs/  (or .csh)

Ganga is now set up and running - it can be exited with Ctrl-D. In order to run our job we have to write a small script which tells Ganga which job options to use, which dataset to run on, how many subjobs to split the job into and which site to execute on. A simple example of a Ganga script is:

j = Job()
j.name = 'MyGridAnalysis'


j.backend.requirements.sites= ['NAPOLI']
The important parts are:
which specifies how many events to run on,
which indicates how many subjobs the job is split into. Remember that running over 100,000 events takes a very long time on a single grid node; it is therefore preferable to split the job into (for example) 20 subjobs, each of which then runs over 5000 events. The lines
tell Ganga to save the output in a dq2 dataset, which will be called
Lastly, we specify which Tier-2 site to run on in the line:
j.backend.requirements.sites= ['NAPOLI']
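
The arithmetic behind the splitting (100,000 events over 20 subjobs gives 5000 events each) can be sketched as follows. This is a plain Python illustration of the bookkeeping only; split_events is a hypothetical helper and not part of the Ganga API, which performs the splitting for you.

```python
def split_events(total_events, n_subjobs):
    """Divide total_events among n_subjobs as evenly as possible.

    Returns a list of (first_event, n_events) tuples, one per subjob.
    Any remainder is spread over the first subjobs, one extra event each.
    """
    if n_subjobs <= 0:
        raise ValueError("n_subjobs must be positive")
    if total_events < 0:
        raise ValueError("total_events must not be negative")
    base, extra = divmod(total_events, n_subjobs)
    ranges = []
    first = 0
    for index in range(n_subjobs):
        n_events = base + (1 if index < extra else 0)
        ranges.append((first, n_events))
        first += n_events
    return ranges


if __name__ == "__main__":
    for first, n_events in split_events(100000, 20)[:3]:
        print(f"subjob starts at event {first} and runs over {n_events} events")
```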

Now copy/paste this script into your favourite editor and save the file as in the run subdirectory of Z_Analysis. Then run ganga as before and once it's started up, type:


You can now type

and there will be one job, whose status will eventually change from running to completed. In case of problems, the job will fail. This may be due to a number of reasons, the most frequent being buggy code or the grid site not having been set up properly, in which case you should try sending the job to a different location.

Once the job completes successfully, we can retrieve the output by setting up dq2 and issuing the following command:

dq2-get -H /tmp/<yourUserName> user08.<yourUserName>.Zjets_v14_test

dq2 will now retrieve the

files from each individual subjob and put them in the local /tmp/ directory. The log files from the run, including the Athena log files for each separate subjob, may be found in the directory
$HOME/UserName/Local/<job number>/output/

More information on GANGA can be found at

Running jobs on the grid with pathena/PANDA

First set up Athena as usual. Then type:
cd ${HOME}/IctpTutorial/14.2.21
cmt co PhysicsAnalysis/DistributedAnalysis/PandaTools
cd PhysicsAnalysis/DistributedAnalysis/PandaTools/cmt
cmt config
rehash  # (if you are using zsh/csh/tcsh)

pathena should be set up now. Typing

should result in:
ERROR : no outDS
   pathena [--inDS input] --outDS output

Every time you log in and set up Athena from then on, the pathena command should be recognised automatically.

Now, to run a job with pathena, we run the command:

pathena --split 20 --inDS valid1.005144.PythiaZee.recon.AOD.e322_s412_r583 --outDS user08.<yourUserName>.Zjets_v14_pathenatest

By default, jobs are sent to the BNL site, ANALY_BNL_ATLAS_1. This can be changed by adding a

option. The number of subjobs is controlled by the
option. To specify the number of events each subjob runs on, use the

There are two ways to monitor the progress of a PANDA job. One is to use the pandamonitor website. The other is to run

Then issuing the command
gives a list of jobs, which nodes they were assigned to, their JobID and other useful information. Now the command
may be used to see the status of an individual job.

Conveniently, PANDA sends an email out to your CERN address once the job execution is finished, with the status (completed/failed) and the names of input and output datasets.

More help on PANDA is available at

-- ChaoukiB - 24 Nov 2008

Topic revision: r32 - 2010-10-12 - JohannesElmsheuser