SFrame

Idea, motivation, history of the SFrame package

SFrame is intended as a general HEP analysis package based on ROOT trees. An analysis in HEP is often performed in cycles. Each cycle usually corresponds to a reduction or, more generally, to a new treatment of the data (e.g. the calculation of new quantities for each event). SFrame follows this cycle-based approach by splitting an analysis into several cycles. Each cycle takes a number of ROOT trees in a certain format as input (e.g. from different sources, such as data or different MC generators) and produces ROOT trees in a different output format. In addition, control histograms are produced. SFrame can combine multiple input ROOT trees, potentially coming from different physics processes, taking into account the different luminosity values to which these trees correspond. It can also process multiple trees containing multiple views of the same event, and it properly handles cuts applied at MC generator level to the input trees. The output trees of one cycle are meant to be the input trees of the next.

SFrame is a framework in which users implement their analysis after they have produced the ROOT trees for the various data sources. All analysis-specific steps remain under the control of the user. In particular, the implementation of all analysis cycles is in the hands of the user, who only has to provide an ExecuteEvent() method for each cycle, in which the selection/calculation steps and the histogram filling are done. All the framework functionality is provided by SFrame: I/O of trees, I/O of histograms, the loop over events, weighting, etc. As steering parameters, the user has to provide some metadata describing the cycle (e.g. the integrated luminosity the background should be weighted to), the input ROOT trees (type of physics process, integrated luminosity, cuts applied at generator level) and the output format. This metadata is provided in a certain XML format.

Some introductory transparencies can be found here.

The work on SFrame started in the context of the CERN ATLAS trigger group's efforts in Supersymmetry analyses. Initially, SFrame was intended to work on the ROOT trees provided by the SusyView package. However, SFrame developed into a SUSY-independent package that can be used for any HEP analysis based on ROOT trees. The institutes involved in the development are CERN, University Hamburg and University Manchester. Meanwhile, it is used by several groups (DESY, CERN, Manchester, University Hamburg).

As SFrame is a new project, it is still in the development phase.

Technicalities

SFrame is a C++ package that is independent of the ATLAS software. It only depends on ROOT and works with all ROOT versions starting with 5.14/00. The code is usually being developed against the newest production release of ROOT.

Note: Some features might only be available when compiling against the 5.17 development version of ROOT, or newer versions.

SFrame is currently supported/used on both Linux and MacOS X platforms. (Support for Windows has been dropped at one point.)

CVS repository

The SFrame code can be found in the ATLAS CVS repository under: /atlas/groups/sframe/SFrame.

The structure of the code

The SFrame package builds 3 separate libraries, and an executable. Short descriptions of the libraries follow:

  • SFrameCore: Built from the sources under SFrame/core. It holds the main classes of the framework for controlling the execution of an analysis cycle.
  • SFramePlugIns: Built from the sources under SFrame/plug-ins. It holds classes that can be useful in physics analyses. (Quite empty at the moment.)
  • SFrameUser: Built from the sources under SFrame/user. This library is a skeleton for physics analysis code built on top of SFrame. It holds two example analysis cycles, which demonstrate the features of SFrame. It also holds example configurations of these cycles.

The main SFrame executable is called sframe_main, and is built from SFrame/core/app/sframe_main.cxx. The executable expects the name of exactly one XML file describing the analysis cycle as a command line parameter. It is only linked against SFrameCore, as the executable only uses code from this library. Any additional libraries which are needed for the analysis (additional ROOT libraries, the user's own library holding his/her cycles) are loaded dynamically, and have to be specified in the configuration XML file.

Obtaining and compiling the package

SFrame is supported on two platforms:

  • Linux systems with a standard ROOT installation: The testing/development is done mainly on SLC4 using the AFS installation of ROOT (/afs/cern.ch/sw/lcg/external/root/).
  • MacOS X systems with either "standard" or Fink ROOT installation: The testing is done on 10.4 (Tiger) with a Fink installed ROOT.

If you're on MacOS X and using Fink to install ROOT, you only have to make sure that the Fink directories (from /sw) are correctly configured in your environment (usually by sourcing /sw/bin/init.sh in your login script). When using a "standard" installation of ROOT (either on Linux or MacOS X), you have to set the following environment variables:

  • ROOTSYS = /location/of/root
  • PATH = $ROOTSYS/bin:$PATH
  • LD_LIBRARY_PATH = $ROOTSYS/lib:$LD_LIBRARY_PATH

Since SFrame-02-01-00, SFrame also depends indirectly on Python (through the ROOT bindings). Because of this, you also have to make sure that your environment is set up to use the same version of Python that your version of ROOT was compiled against. For instance, ROOT 5.20 and newer versions are compiled against Python 2.5, while the default Python version on SLC4 is 2.3.

All in all, on an Intel SLC4 machine you could make the following settings:

> export ROOTSYS=/afs/cern.ch/sw/lcg/app/releases/ROOT/5.22.0/slc4_ia32_gcc34/root
> export PYTHONDIR=/afs/cern.ch/sw/lcg/external/Python/2.5/slc4_ia32_gcc34
> export LD_LIBRARY_PATH=$ROOTSYS/lib:$PYTHONDIR/lib:$LD_LIBRARY_PATH
> export PATH=$ROOTSYS/bin:$PYTHONDIR/bin:$PATH
> export PYTHONPATH=$ROOTSYS/lib:$PYTHONDIR/lib:$PYTHONPATH

Wherever you do the setup, most importantly $ROOTSYS/bin/root-config has to be in your $PATH.

To check out the code, select the directory where you want to put/compile it. Set up CVS access to the ATLAS offline repository:

  • CVSROOT = :kserver:atlas-sw.cern.ch:/atlascvs from lxplus and other SLC machines
  • CVSROOT = :ext:@atlas-sw.cern.ch:/atlascvs; CVS_RSH = ssh from outside/other platforms.

Check out the code with:

> cvs co -r SFrame-02-01-08 -d SFrame groups/sframe/SFrame

Note: When starting to work with SFrame, it's always a good idea to start using the newest tagged version from the main branch in CVS. Bugfixes and small improvements are usually added more often than this page is updated...

Go to the SFrame directory, and execute:

> source setup.[c]sh
> make

This will build all libraries and the executable of the package. The libraries are put under SFrame/lib/ and the executable is put under SFrame/bin/. (They are both put in your environment by the setup.[c]sh script.)

Running the examples

There are two example cycles in the SFrame/user/ directory. They are called FirstCycle and SecondCycle. There are two XML files configuring these cycles under SFrame/user/config/. They are meant to work out-of-the-box on lxplus.

To run the first example, go to SFrame/user/config/, and execute:

> sframe_main FirstCycle_config.xml

The executable will read in the FirstCycle_config.xml file, and execute the FirstCycle cycle. The cycle will create two output ROOT files named: FirstCycle.Data1.root and FirstCycle.Data2.root.

The second example reads in one of the files created by the first example (FirstCycle.Data2.root) and creates some histograms from the data stored in the file, thus demonstrating how the idea of multiple cycles works. To run the example, execute the following in the same directory:

> sframe_main SecondCycle_config.xml

Notice that the second cycle runs over fewer events for Data2 than the first cycle did. This is because FirstCycle executed some basic event selection, and only the events that passed the selection were written into FirstCycle.Data2.root.

Structure of the input XML file

todo

Implementation of a cycle

The best strategy for newcomers is to take the examples (preferably FirstCycle) and rename/extend those. All cycles have to inherit from SCycleBase, and implement a few functions which are called by the framework when executing the cycle:

  • virtual void BeginCycle() throw( SError ): Function called once at the beginning of execution. You can use it to perform initial configuration of the cycle and all the code that it uses.
  • virtual void EndCycle() throw( SError ): Function called once at the end of the cycle execution. Any finalisation steps should be done here. (Closure of some helper files opened by the user code for instance.)
  • virtual void BeginInputData( const SInputData& ) throw( SError ): Function called once before processing each of the input data types. SFrame creates one output file per input data type. By the time BeginInputData(...) is called, the output file is already open. This means that if you need to initialise output objects (histograms, etc.) before the event-by-event execution, you should do that here. Also the declaration of the output variables has to be done here.
  • virtual void EndInputData( const SInputData& ) throw( SError ): Function called last for each input data type, just before its processing is finished and the output file is closed. Any histogram finalisation should be performed here.
  • virtual void BeginInputFile( const SInputData& ) throw( SError ): For each new input file the user has to connect his input variables. (More on this later.) This has to be performed in this function.
  • virtual void ExecuteEvent( const SInputData&, Double_t ) throw( SError ): This is the main analysis function that is called for each event. It receives the weight of the event, as it is calculated by the framework from the luminosities and generator cuts defined in the XML configuration.
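
The order in which these functions are called can be illustrated with a small stand-alone sketch. It does not use SFrame itself: InputData, CycleSketch, eventWeight and runCycle are hypothetical stand-ins invented for this illustration, and the weight formula (target luminosity divided by sample luminosity, ignoring generator-level cuts and any special treatment of real data) is an assumption rather than the framework's actual calculation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for SInputData; NOT the real SFrame class.
struct InputData {
    std::string type;               // e.g. a Monte Carlo process name
    double luminosity;              // integrated luminosity of the sample
    std::vector<int> eventsPerFile; // number of events in each input file
};

// Hypothetical stand-in for SCycleBase, without the exception specifications.
class CycleSketch {
public:
    virtual ~CycleSketch() {}
    virtual void BeginCycle() {}
    virtual void EndCycle() {}
    virtual void BeginInputData(const InputData&) {}
    virtual void EndInputData(const InputData&) {}
    virtual void BeginInputFile(const InputData&) {}
    virtual void ExecuteEvent(const InputData&, double weight) = 0;
};

// Assumed weighting: scale the sample to the target luminosity.
// Generator-level cuts are not taken into account in this sketch.
double eventWeight(double targetLumi, double sampleLumi) {
    assert(sampleLumi > 0.0);
    return targetLumi / sampleLumi;
}

// Assumed driver loop: one Begin/EndInputData pair per input data type,
// one BeginInputFile per file, one ExecuteEvent per event.
void runCycle(CycleSketch& cycle, const std::vector<InputData>& inputs,
              double targetLumi) {
    cycle.BeginCycle();
    for (const InputData& id : inputs) {
        cycle.BeginInputData(id);
        const double weight = eventWeight(targetLumi, id.luminosity);
        for (int nEvents : id.eventsPerFile) {
            cycle.BeginInputFile(id);
            for (int i = 0; i < nEvents; ++i) {
                cycle.ExecuteEvent(id, weight);
            }
        }
        cycle.EndInputData(id);
    }
    cycle.EndCycle();
}

// Example cycle that records the call order and sums the event weights.
class RecordingCycle : public CycleSketch {
public:
    std::string calls;
    double sumOfWeights = 0.0;
    void BeginCycle() override { calls += "BeginCycle "; }
    void EndCycle() override { calls += "EndCycle"; }
    void BeginInputData(const InputData&) override { calls += "BeginInputData "; }
    void EndInputData(const InputData&) override { calls += "EndInputData "; }
    void BeginInputFile(const InputData&) override { calls += "BeginInputFile "; }
    void ExecuteEvent(const InputData&, double weight) override {
        calls += "ExecuteEvent ";
        sumOfWeights += weight;
    }
};
```

Running a RecordingCycle through runCycle with one input data type made of two files (two events and one event) would record BeginCycle, BeginInputData, then BeginInputFile followed by the ExecuteEvent calls for each file, and finally EndInputData and EndCycle. In a real cycle the bodies of these functions would hold the ConnectVariable, DeclareVariable and histogram calls described in the following sections.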

NTuple reading and writing

The branches of the input and output ntuples are handled individually. This means that for each input branch that you want to use in your analysis, you have to declare a variable (a "simple" variable in case of primitives [ Int_t, Double_t ] or a pointer in case of STL containers) and connect this variable to the appropriate branch in the BeginInputFile(...) function. The function for connecting variables to an input branch is:

template< typename T >
void ConnectVariable( const char* treeName, const char* branchName, T& variable ) throw( SError )

Let's say you have a branch in your input tree which is of type std::vector< double >. You can use this branch in your analysis by creating a member variable in your cycle with a pointer to such an object, and connecting it to the branch like this:

 In the header:
   std::vector< double >* m_variable;

 In BeginInputFile(...):
   ConnectVariable( "Reco0", "vec_var", m_variable );   

Output variables are handled similarly. For each output primitive or object you have to create the object as a member of your cycle class, then you can declare it to the base class with the function:

template< typename T >
TBranch* DeclareVariable( T& obj, const char* name, const char* treeName = 0 ) throw( SError )

To write out a simple Double_t variable to the output TTree, you have to do the following:

 In the header:
   Double_t m_out_var;

 In BeginInputData(...):
   DeclareVariable( m_out_var, "out_var" );

Note that if you only declared one output TTree in your XML, then you don't have to specify the tree name for the function.

Note: For all the data types that you want to read or write from/to a TTree, you have to load the appropriate dictionary. For the basic STL classes (std::vector< double >, std::vector< int >, ...) ROOT has a built in dictionary. But if you want to write out a custom object for instance, you have to create a dictionary for this object, and load it in your SFrame job.

Histogram handling

You can put basically any kind of ROOT object (inheriting from TObject) into the output ROOT file. There are two functions that you can use to put or retrieve ROOT object to/from the output file:

template< typename T >
T* Book( const T& obj, const char* directory = 0 ) throw( SError )

template< typename T >
T* Retrieve( const char* name, const char* directory = 0 ) throw( SError )

You can use the first in the following way to declare a 1 dimensional output histogram:

   TH1* hist = Book( TH1D( "hist", "Histogram", 100, 0.0, 100.0 ) );

To access this histogram somewhere else in your code, you could do:

   TH1* hist = Retrieve< TH1 >( "hist" );

Note: These functions (because of the underlying ROOT implementations) are quite slow. So it's good practice to store the pointers to the output histograms in your cycle, and possibly never use SCycleBase::Retrieve(...).

Processing EventView NTuples containing multiple views

The possibility was added to process multiple views per event, as desired for example in top-quark analyses. When EventView stores multiple views for each event in the output NTuple, it creates one tree for each view, named EVxx, where xx is the "view number". Since not all the views exist for each event, only EV0, the tree with the first view, has as many entries as there are events in the NTuple. The other EV trees will have fewer entries. A special synchronisation mechanism is thus required in this case, because the multiple input trees that SFrame lets you define otherwise have to have the same number of entries.

Another input-tree class was therefore introduced for the EV trees. It is configured in the XML steering file through an EVInputTree field with the parameters BaseName (in this case "EV"), Number (the number of EV input trees that you want to analyse) and CollTreeName (the name of the collection tree, usually "CollectionTree"). If the user specifies these fields, the EventView trees are read in from the input file and synchronised.

The synchronisation is done based on the variable EVInstance of the CollectionTree, a special tree of EventView NTuples that is needed for the synchronisation. If this tree does not exist, the program aborts. EVInstance gives for each event the number of views that are contained in the event. The corresponding EV trees can hence be incremented based on this information.
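
The bookkeeping behind such a synchronisation can be sketched in a few lines of stand-alone C++. This is not the SFrame implementation: the function name synchroniseViews is made up for this illustration, and the sketch assumes that an event with n views has exactly one entry in each of the trees EV0 to EV(n-1) and none in the higher-numbered trees.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For every event, compute which entry has to be read from each EV tree.
// evInstance[i] is the number of views stored for event i (as read from the
// collection tree). The result holds, per event, one entry number per tree,
// or -1 if the event has no such view.
// Assumption of this sketch: views are numbered without gaps, so an event
// with n views occupies one entry in each of EV0 ... EV(n-1).
std::vector<std::vector<long>> synchroniseViews(const std::vector<int>& evInstance,
                                                int nTrees) {
    assert(nTrees > 0);
    // Next unread entry in each EV tree.
    std::vector<long> nextEntry(static_cast<std::size_t>(nTrees), 0L);
    std::vector<std::vector<long>> result;
    result.reserve(evInstance.size());
    for (int nViews : evInstance) {
        std::vector<long> entries(static_cast<std::size_t>(nTrees), -1L);
        for (int k = 0; k < nTrees && k < nViews; ++k) {
            // Only trees that hold a view for this event advance by one entry.
            entries[k] = nextEntry[k]++;
        }
        result.push_back(entries);
    }
    return result;
}
```

For three events with 1, 3 and 2 views and three EV trees, the second event would read entry 1 of EV0 but entry 0 of EV1 and EV2, since the first event left no entry in those trees.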

The function to connect a variable that exists in the multiple views is the following:

template< typename T >
Int_t* ConnectEventViewVariable( const char* baseName, const char* branchName, std::vector< T >& variables ) throw( SError )

For instance, let's say you have a simple Int_t variable calculated in multiple views by EventView that you would like to read for each event, for each available view. You could do that with the following code:

 In the header:
   std::vector< Int_t > m_ev_variables;

 In BeginInputFile(...):
   ConnectEventViewVariable( "EV", "HadTop_N", m_ev_variables );

Note: The view synchronisation has not really been tested with the new ntuple reading mechanism. The code probably holds a few bugs at the moment, but since we've not been using the multiple view feature of EventView for the CSC analyses, we didn't have time to debug it...

Using the additional scripts

The SFrame package has recently been extended (>= SFrame-02-01-01) with some Python code. It is meant to aid analyses by automating some things.

sframe_input.py

This executable script can be used to create the XML nodes for a set of Monte Carlo ROOT files. SFrame needs the integrated luminosity of the input files specified either file-by-file, or for a complete InputData. The script calculates the luminosity of each input file separately, and creates the nodes accordingly. After setting up your environment for compiling/running SFrame, it can be called like:

> sframe_input.py -x 23.45 -o test_input.xml *.root

This command would open all the files ending with .root in the current directory, calculating their integrated luminosity from the 23.45 pb cross section given as a command line option. In this example the results of the script would be written in the test_input.xml file.
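
The underlying arithmetic is the usual relation L = N / sigma between the number of generated events, the cross section and the integrated luminosity. The following C++ sketch only illustrates that relation; the function name is hypothetical, and how the script itself counts the events in a file (or whether it applies any further corrections) is not described here.

```cpp
#include <cassert>

// Integrated luminosity corresponding to a Monte Carlo sample:
//   L = N / sigma
// With the cross section given in pb, the result is in pb^-1.
// Hypothetical helper for illustration; not part of SFrame.
double integratedLuminosity(long nEvents, double crossSectionPb) {
    assert(nEvents >= 0);
    assert(crossSectionPb > 0.0);
    return static_cast<double>(nEvents) / crossSectionPb;
}
```

With the cross section from the example above, a file holding 10000 events would correspond to about 426 pb^-1.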

The script prints a short help text when called as sframe_input.py -h.

sframe_create_cycle.py

This script is a "revival" of the python code that was developed in Hamburg a while ago for a previous version of SFrame. It is able to create a template for a new user analysis cycle in a simple-to-use manner. For instance if the user would like to add a new cycle called MyAnalysis to the SFrameUser library, (s)he just has to go in the SFrame/user directory and execute:

> sframe_create_cycle.py -n MyAnalysis

After this the script creates a header file under SFrame/user/include/MyAnalysis.h, a source file under SFrame/user/src/MyAnalysis.cxx and an XML configuration file under SFrame/user/config/MyAnalysis_config.xml. It also extends the already existing SFrame/user/include/SFrameUser_LinkDef.h file with a line about the new cycle.

Since I like to put my code into namespaces, the script even supports creating an analysis cycle in a namespace. For instance you can write:

> sframe_create_cycle.py -n Ana::MyNamespacedAnalysis

This creates all the files with the MyNamespacedAnalysis prefix (removing Ana::), but puts the C++ code into the correct namespace.
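
The name handling described here amounts to splitting the qualified name at the scope operator. The sketch below shows one way to do that in C++; it is not taken from the script (which is written in Python), the function name is hypothetical, and splitting at the last "::" for nested namespaces is an assumption.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Split a possibly namespace-qualified cycle name into
// (namespace, class name). The class name is what would be used as the
// file prefix. Hypothetical helper for illustration only.
std::pair<std::string, std::string> splitCycleName(const std::string& qualified) {
    assert(!qualified.empty());
    const std::string::size_type pos = qualified.rfind("::");
    if (pos == std::string::npos) {
        // No namespace given: the whole string is the class name.
        return std::make_pair(std::string(), qualified);
    }
    return std::make_pair(qualified.substr(0, pos), qualified.substr(pos + 2));
}
```

For "Ana::MyNamespacedAnalysis" this gives the namespace "Ana" and the file prefix "MyNamespacedAnalysis"; a plain "MyAnalysis" is returned unchanged with an empty namespace.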

Notes:

  • I've tried to make the code smart enough to work in many different scenarios, but I wouldn't be surprised if it still had some bugs.
  • The created XML file is very poorly done at the moment. It's probably a better idea to forget about it for now, and write the configuration from scratch...
  • Just like the other script, this can also give some help by calling sframe_create_cycle.py -h.

SFrameARA

SFrameARA is a relatively thin layer above SFrame that binds it to the ATLAS offline software. Its purpose is to enable the user to use AthenaROOTAccess with SFrame. Detailed information about it is given on the page: SFrameARA.

Running SFrame with Ganga

It is now possible to run SFrame on the GRID by using the SFrameApp plugin for the GRID job management application Ganga.

The code of the plugin is available here via CVS. As a prerequisite for the SFrameApp plugin, you must already have Ganga installed; for help, please refer to the GangaTutorial44.

Installation of the SFrameApp plugin

  • If you are using Ganga version 4.4.3, the plugin is already available as GangaSFrame.
  • Otherwise, retrieve the SFrameApp package from WebCVS or via the shell:

export CVSROOT=:ext:isscvs.cern.ch:/local/reps/atdesyz
export CVS_RSH=ssh
cvs co -d <$path of your choice> TopPhysics/mbarison/SFrameApp

  • If you don't have a ~/.gangarc config file, create one by typing ganga -g, then edit the config file:
    • In line 34, add

RUNTIME_PATH = GangaAtlas:GangaSFrame 

    • If you downloaded the plugin yourself, instead of getting it with Ganga 4.4.3, you have to type this instead:

RUNTIME_PATH = GangaAtlas:<$path of your choice>  

    • In line 240, uncomment VirtualOrganisation = atlas

If you set up the environment correctly, by starting Ganga you should get a message like this:

ATLAS User Support is provided by the Hypernews Forum Ganga User and Developers
 You find the forum at
 https://hypernews.cern.ch/HyperNews/Atlas/get/GANGAUserDeveloper.html
 or you can send an email to hn-atlas-GANGAUserDeveloper@cern.ch

SFrame Plugin -- Copyleft 2007 Marcello Barisonzi. All Wrongs Reserved.
Got a bug? Write to marcello.barisonzi@desy.de

Now you can test the SFrame plugin!

A Sample SFrameApp Job

Ganga is based on the Python shell, so you can enter commands interactively or write a script file.

Create a new job object:

j = Job()

Assign SFrame as the application used by the job:

j.application = SFrameApp()

Set the directory where the SFrame sources are located:

j.application.sframe_dir = '/path/to/SFrame'

Which SFrame configuration file do you want to use?

j.application.xml_options = '/path/to/my_example_top.xml'

(This is tricky) Which ATLAS SW release contains the ROOT version you want to compile SFrame with?

j.application.atlas_release = '13.0.40'

Now prepare the tarball with the SFrame libraries:

j.application.prepare()

SFrameApp : INFO Creating SFrame archive:
/afs/ifh.de/user/m/mbarison/gangadir/workspace/Local/sframe-00005.tar.gz ...
SFrameApp : INFO From /afs/ifh.de/user/m/mbarison/SFrame

Choose a DQ2 dataset from the ATLAS Wiki and assign it to the job

j.inputdata=DQ2Dataset()

j.inputdata.dataset = 'name.of.your.dataset'

j.inputdata.type = 'DQ2_LOCAL'

This option is a workaround for incomplete datasets (as most datasets are):

j.inputdata.match_ce_all = True

If you want to know how many files there are in the dataset, or their locations, you can use

j.inputdata.list_contents() or j.inputdata.list_locations()

Suppose you want to split the dataset into 16 jobs and merge the output afterwards (WARNING: you need at least 16 files in the dataset to do it!):

j.splitter = SFrameAppSplitterJob()
j.splitter.numsubjobs = 16
j.merger = SFrameAppOutputMerger()

Choose the GRID backend (Panda is not working with the plugin yet):

j.backend = LCG()

If you want to use a specific CE:

j.backend.CE = "your_favourite_CE"

Some standard CEs you might want to use:

  • Karlsruhe : "ce-fzk.gridka.de:2119/jobmanager-pbspro-atlasS"
  • Zeuthen : "lcg-ce0.ifh.de:2119/jobmanager-lcgpbs-atlas"
  • Hamburg0 : "grid-ce0.desy.de:2119/jobmanager-lcgpbs-atlas"
  • Hamburg2 : "grid-ce2.desy.de:2119/jobmanager-lcgpbs-atlas"
  • Hamburg3 (SLC4): "grid-ce3.desy.de:2119/jobmanager-lcgpbs-atlas"

You can get more CEs by using the command lcg-infosites (from the command shell)

Now that the job is defined, give it a name and let it run:

j.name = 'My SFrame Test'
j.submit()

Ganga.GPIDev.Adapters : INFO submitting job 170.0 to LCG backend
 Ganga.GPIDev.Lib.Job : INFO job 170.0 status changed to "submitted"
(etcetera)

You can check the status of the job with j.status or jobs. When the status is completed, you can run the merger with j.merge().

You will find a ROOT file with all your output data in:

~/gangadir/workspace/Local/<job_id>/output/

A slightly longer version of this tutorial with some background info can be found in this (almost obsolete) presentation

Developers

The following people have contributed to the SFrame package:

  • Stefan Ask <STEFAN.ASK@CERN.CH> - CERN
  • David Berge <DAVID.BERGE@CERN.CH> - CERN
  • Nicolas Berger <NICOLAS.BERGER@CERN.CH> - CERN
  • Till Eifert <TILL.EIFERT@CERN.CH> - U. of Geneva, Switzerland
  • Andreas Hoecker <ANDREAS.HOCKER@CERN.CH> - CERN

Currently active developers are:

  • Attila Krasznahorkay <Attila.Krasznahorkay@cern.ch> - CERN/Debrecen
  • David Berge <DAVID.BERGE@CERN.CH> - CERN
  • Johannes Haller <JOHANNES.HALLER@CERN.CH> - U. of Hamburg, Germany

To Do List

  • Test/debug code handling multiple views.

Comments, questions, suggestions, contributions

If you have comments, questions or suggestions, or you would like to see additional functionality in SFrame, please send an email to one of the developers mentioned above or post to atlas-sframe-users@cern.ch.


Major updates:

-- JohannesHaller - 11 Sep 2006
-- DavidBerge - 26 Sep 2006
-- DavidBerge - 06 Oct 2006
-- MarcelloBarisonzi - 28 Aug 2007
-- MarcelloBarisonzi - 13 Nov 2007
-- AttilaKrasznahorkay - 21 Nov 2007

Topic attachments
SFrame.ppt (PowerPoint), r1, 506.0 K, 2007-03-05 12:31, JohannesHaller
Topic revision: r28 - 2009-03-18 - AkiraShibata
 