Analysis Tutorial
There are different ways to analyse the data from a simulation.
You can analyse the data locally on your computer,
or you can use GRID or PROOF to process a larger amount of data.
Local Analysis
INTRODUCTION
When you need to analyse data, it is better to test the analysis locally first.
Locally means that the analysis runs on your computer, but the data must still be in the GRID database.
Afterwards you can, if you want, submit your program to GRID or PROOF and be confident that
it does not contain a mistake.
For a local analysis you need: an xml file that indicates which files you want to analyse; a macro that loads the libraries and gets the data;
a class deriving from TSelector (this class contains the analysis code), together with the header file of this class;
and finally a PAR archive, which is a kind of library.
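As a quick sanity check before starting, the short shell sketch below verifies that the pieces listed above are present in one working directory. It is only an illustration: the function name check_local_inputs is hypothetical, and the file names (collection.xml, AlienBatchAnalysis.C, esdAna.C, esdAna.h, AliESD.par) are simply the example names used on this page and should be replaced by your own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the files needed for a local
# analysis are missing from a directory (default: current directory).
check_local_inputs() {
    local dir="${1:-.}"
    local missing=0
    local f
    # Example names taken from this tutorial; adapt them to your analysis.
    for f in collection.xml AlienBatchAnalysis.C esdAna.C esdAna.h AliESD.par; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling check_local_inputs with the path of your analysis folder prints one line per missing file and returns a non-zero status if anything is absent.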
1) xml file
The xml file indicates where the data are located in the grid
database. To create an xml file you
can use the command "find" implemented in the alien shell.
find -x collection /alice/cern.ch/user/m/mdujardi/data/ AliESDs.root > /afs/in2p3.fr/group/alice/tempo4/mdujardi/xml/collection.xml
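Before using the collection it can be worth checking that the query actually matched something. The sketch below counts the file entries in the xml file from an ordinary (non-alien) shell. It assumes that each matched file appears as a <file ... /> element in the collection, which you should verify against your own collection.xml; the function name count_collection_files is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the <file ...> elements in a collection file.
# Assumption: every matched file is stored as one <file ... /> element.
count_collection_files() {
    local xml="$1"
    if [ ! -r "$xml" ]; then
        echo "cannot read $xml" >&2
        return 1
    fi
    # grep -o prints each match on its own line, so wc -l counts matches
    # even when several elements share one line.
    grep -o '<file ' "$xml" | wc -l | tr -d '[:space:]'
    echo
}
```

A result of 0 usually means that the path or the file name pattern given to "find" did not match anything.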
2) The Macro
The main macro gets your data, gets the trees containing the events and
applies your analysis to each event. You can download this simple macro from this wiki page.
Don't forget to change, inside the macro, the path that indicates where your TSelector and your
xml file are.
The macro is very general; you can remove the parts you don't need.
In detail, the macro performs the following steps:
- Unzip the .par archive
- Build the class into the folder of the .par
- Execute the setup to use PROOF
- Connect to GRID
- Create a collection of files with the xml file
- Convert the collection into a TChain
- Use the TSelector on each file of the TChain
You need to use this macro each time you want to do an analysis.
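The first two steps of the list can also be tried by hand from an ordinary shell, which is useful when the macro fails early. The sketch below assumes that the .par file is a gzipped tar archive that unpacks into a directory of the same name, and that this directory contains a build script at PROOF-INF/BUILD.sh, as is usual for PROOF packages; check your own archive if in doubt. The function name unpack_and_build_par is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack a PAR archive and run its build script.
# Assumptions: NAME.par is a gzipped tarball that unpacks into NAME/,
# and NAME/PROOF-INF/BUILD.sh exists and builds the package.
unpack_and_build_par() {
    local par="$1"
    local name
    name="$(basename "$par" .par)"
    # Unpack into the current directory.
    tar -xzf "$par" || return 1
    if [ ! -f "$name/PROOF-INF/BUILD.sh" ]; then
        echo "no PROOF-INF/BUILD.sh in $name" >&2
        return 1
    fi
    # Run the build inside the package directory, in a subshell so the
    # caller's working directory is left unchanged.
    ( cd "$name" && sh PROOF-INF/BUILD.sh )
}
```

For the example archive of this page the call would be unpack_and_build_par AliESD.par, run from the folder that holds the archive.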
3) The TSelector
You need to put your analysis code in a class that inherits from TSelector. I attach
an example of a TSelector called "esdAna.C". The code you want to apply to your
data must be in the method "Process". This method is called for each event of
all the TTrees contained in the files listed in the xml file.
4) The header file
Don't forget to put the header file of your TSelector in the folder of your analysis.
I attach the header file "esdAna.h".
5) The PAR archive
The PAR archive contains all the classes that are necessary to do the analysis. You can make the PAR archive
with this script,
MakePar. You can also download the file called
AliESD.par.
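If the MakePar script is not at hand, a PAR archive can in principle be assembled manually. The sketch below is not the MakePar script; it only illustrates the idea, under the assumption that a PAR archive is a gzipped tarball of a package directory containing a PROOF-INF subdirectory. The function name make_par is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pack a package directory into NAME.par in the
# current directory.
# Assumption: a PAR archive is a gzipped tarball of a directory that
# contains a PROOF-INF subdirectory with the build and setup files.
make_par() {
    local pkg="${1%/}"
    local name
    name="$(basename "$pkg")"
    if [ ! -d "$pkg/PROOF-INF" ]; then
        echo "$pkg has no PROOF-INF directory" >&2
        return 1
    fi
    # Archive the directory itself so that it unpacks into NAME/.
    tar -czf "$name.par" -C "$(dirname "$pkg")" "$name"
}
```

The content of the resulting archive can be listed with tar -tzf to confirm that the PROOF-INF files are included.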
CONCLUSION
Now you can start an analysis locally by starting AliRoot and typing the command:
root[0] .x AlienBatchAnalysis.C("collection.xml","esdAna.C","AliESD")
You can find here a macro which copies data files into the GRID database.
Analysis on GRID
Now that you have run your analysis locally and are sure that your program
is correct, you can do your analysis on GRID by submitting one or more jobs.
To submit a job you need a jdl file, the PAR archive, an executable, the same TSelector
and header file, and a file merger (only if you have split your jobs).
You must copy all these files into alien with the copy command. It's better to create
in alien a directory bin for the executable, a directory jdl for the jdl files
and a directory macros for your macros.
1) JDL file
This file contains all the information related to the job.
You can find here a commented example of a jdl file.
This time you don't need to create an xml file, and you can create your jdl file using
this little macro "Makejdl.C".
It's possible to split your job and process a given number of files or a given size of data per job.
To do that you have to remove the line with the output directory from your jdl file and add the lines
split="se";
SplitMaxInputFileNumber="number of files you want";
or
SplitMaxInumpuFile
The output of the job will then go in the directory /proc/
/
You can then merge your data using this macro "HistoMerger.C".
Splitting your jobs will save you a lot of time.
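The jdl change described above can be scripted. The sketch below works on a local copy of the jdl file before it is copied to alien: it drops the output directory line and appends the split lines for a maximum number of files per job. It assumes that the output directory is set on a single line beginning with OutputDir, which you should check against your own jdl file; the function name add_split_to_jdl is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a split-enabled copy of a jdl file to stdout.
# Assumption: the output directory is given on one line starting with
# "OutputDir" (case-insensitive); adapt the pattern to your jdl file.
add_split_to_jdl() {
    local jdl="$1"
    local nfiles="$2"
    # Copy every line except the output directory line.
    grep -v -i '^[[:space:]]*OutputDir' "$jdl"
    # Append the split settings described in this tutorial.
    echo 'split="se";'
    echo "SplitMaxInputFileNumber=\"$nfiles\";"
}
```

The result is written to standard output, so redirect it to a new file and keep the original jdl file unchanged until you have checked the copy.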
3) The executable
In the jdl file you have to specify the executable. It's just a script that you have to put
in your bin directory in alien. This script starts aliroot and executes your macro, which loads
the libraries and runs the TSelector.
Here is an example of the script:
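#!/usr/bin/env bash
# Start ROOT in batch mode and feed it the commands up to EOF:
# load the main macro, then run the analysis.
root -b <<EOF
.L AlienBatchAnalysis.C
AlienBatchAnalysis("collection.xml","esdAna.C","AliESD");
EOF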
4) PAR archive
You just need to copy the PAR archive into alien.
5) The file merger
The file merger will merge the output histograms or trees of your different jobs into one file.
You can use this simple macro.
6) Submitting a job
Once you have done all the steps, you can start your job using the command submit .jdl
You can follow your job(s) with different commands.
ps
ps -trace
ps -f RUNNING
ps -f WAITING
...
gbbox ...
Analysis with PROOF
-- MarcDujardin - 31 May 2006