How to run on stripped DC06 data
Introduction
This page explains all you need to know to run on DC06 stripped data. If there's something missing, just add it!
It supersedes the obsolete DC04 page.
Everything below assumes that you, or someone else, have provided a preselection suitable for selecting your favourite signal. If this is not the case, the stripped data is useless to you and you have to provide a preselection for the next stripping. The procedure is explained here.
The last official round of stripping was done with DaVinci v19r7. The selected events were reconstructed with two versions of Brunel, v30r17 and v31r11. Unfortunately, because of a bug, it was not possible to recover which preselection had selected a particular event. For this reason Lesya Shchutska re-ran the stripping on the stripped data and regenerated all files.
This page first explains how the official stripping worked, and then what Lesya did and where to get the data.
How the official stripping is done
Before using stripped data you need to understand a little bit of how it was done. A BB stripping job consists of
- A DaVinci step that runs on the output of Brunel, the rDST (reduced DST). It runs each of the released preselections and outputs an event-tag collection (ETC) containing for every event the result of each preselection. This file is the full ETC (FETC) which is available in the bookkeeping.
- A Brunel v30 step that takes the FETC as input and reconstructs all events which are accepted by at least one preselection. It saves a full DST and another event-tag collection, the stripped ETC (SETC). Both these files are available in the bookkeeping.
- A Brunel v31 step that takes the same FETC, does the analogous job and saves yet another full DST and SETC, also available in the bookkeeping.
Unlike with DC04 there is no scaling of preselections. The working groups are responsible for the output of their preselections.
Output data
In the bookkeeping you will find the following data formats:
- rDST, the reduced DST - This is the input data for the stripping. You probably don't want to look at this data except when you design a new preselection. There's no MC truth and not all tracks are available.
- DST, the full stripped DST - This is the main output of the stripping: the complete DST with everything you need in it.
- SETC, the stripped event-tag collection - This event-tag collection allows you to navigate directly to the events selected by your preselections. Examples are given below. Because of a bug in DaVinci v19r7, the SETC does not contain the information necessary to recover the initial event. See the section on re-stripped DC06 data below for more details.
- FETC, the full event-tag collection - This event-tag collection points back to the rDST. You should hence not use it as input to DaVinci jobs, as you would have to stage in the world. But you can read it as a ROOT file to find out what the accept rate of your preselection was.
Which preselections are run?
The general procedure to find out what has been run:
- The Bookkeeping will tell you which version of DaVinci has been run for your particular data sample. Say, v19r7 (which is the most likely).
- On a machine where the releases are installed do SetupProject DaVinci v19r7 (or whichever version).
- Look at the actual stripping file $STRIPPINGROOT/options/Presel.opts.
In the case of DaVinci v19r7 the file is here.
Signal stripping and reprocessing
Signal will also be stripped using the same preselections as the BB stripping. It has not started yet. The procedure is explained in Thomas' talk at software week.
How to run on re-stripped DC06 data with the new format DSTs
The stripping was rerun with DaVinci v19r11 on the stripped DSTs. Since there are two sets of DSTs, corresponding to reconstruction with Brunel v30r17 or Brunel v31r11 (the corresponding configurations are DC06 - stripping-v30-lumi2 and DC06 - stripping-v31-lumi2), two sets of SETCs, newDSTs and newSETCs were produced. Because of the differences between the two reconstructions, the efficiency of all preselections is ~94% on v30r17 reconstructed stripped data but only ~60% on v31r11 reconstructed data. These numbers can vary for individual preselections. The safe way is therefore to use v30r17 reconstructed data and then check your final selection on v31r11 data.
Output data of the re-stripping
Now it is possible to use the following data formats:
- SETC, the stripped event-tag collection - This event-tag collection was produced because the SETC in the bookkeeping does not contain the results of the preselections. Examples are given below.
- newDST, the new stripped DST - This is the output data of the re-stripping. It contains MC truth information, /Event/Phys containers with all the intermediate particles and B candidates selected by the preselections, protoparticles, the selection results and links to the DSTs in case other information is needed. Usage examples are given below.
- newSETC, the stripped event-tag collection for newDST - This event-tag collection allows you to navigate to the new DSTs containing events selected by your preselections. Examples are given below.
Why were the new DSTs produced?
When analysing stripped events one often needs to rerun the preselection or selection algorithm in order to include features that were absent in the previous version. Waiting for the output then makes the analysis really time consuming. The new concept of stripped DSTs provides faster access to the stripped events and to the results of the preselections. For example, it was checked that the Bs to phi gamma selection algorithm runs about four times faster on the newSETC than on the SETC, so that processing the stripped events takes about 1.5 hours.
Timing example for the different approaches with the Bs to phi gamma selection:
- DST: 3310 s on ~36k events, i.e. 21 hours for all data (note that here both PreselXb2GammaX and SelBs2PhiGamma ran in sequence)
- SETC: 1630 s on ~3.2k events selected by PreselXb2GammaX out of ~72k stripped events, i.e. 5.1 hours for all data
- newDST: 1120 s on ~72k events, i.e. 3.5 hours for all data
- newSETC: 460 s on ~3.2k events selected by PreselXb2GammaX out of ~72k stripped events, i.e. 1.4 hours for all data
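The "all data" figures are consistent with scaling each measured time to the full stripped sample. The sketch below reproduces that arithmetic. It assumes that the processing time scales linearly with the number of stripped events and uses the total of 804584 stripped events quoted in the statistics section of this page; the event counts of the tests are only approximate (~36k, ~72k), and the function name is made up for illustration.

```python
# Linear extrapolation of the measured timings to the full stripped sample.
# Assumption: processing time scales linearly with the number of stripped events.

TOTAL_STRIPPED_EVENTS = 804584  # total quoted in the statistics section of this page


def hours_for_all_data(seconds, stripped_events):
    """Scale a timing measured on `stripped_events` events to the full sample."""
    return seconds * TOTAL_STRIPPED_EVENTS / float(stripped_events) / 3600.0


# (seconds measured, approximate number of stripped events covered by the test)
measurements = {
    "DST": (3310, 36000),
    "SETC": (1630, 72000),
    "newDST": (1120, 72000),
    "newSETC": (460, 72000),
}

estimates = dict(
    (name, hours_for_all_data(seconds, events))
    for name, (seconds, events) in measurements.items()
)

for name in ("DST", "SETC", "newDST", "newSETC"):
    print("%-8s %.1f hours" % (name, estimates[name]))
```

The same function can be used to estimate the running time of your own selection from a short test job.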
How to use stripped data with the new files
It's unlikely you want to look at all events. You only care about events selected by your preselections. There are three ways to get access to the result of your preselection. The third one is the fastest.
Run on stripped ETC
Here's the recipe to get the result of your preselection and navigate to the interesting events in the DSTs.
Get the files with SETC
You can find the file to be included in your options here: setc_v30.opts or setc_v31.opts.
Add the name of your preselection as a SEL='PreselXXX>0' requirement at the end of each line. You can add as many selections as you like, combined with && or ||. It's basically a ROOT TCut. For example:
EventSelector.Input = {
"COLLECTION='TagCreator/1'
DATAFILE='PFN:castor:/castor/cern.ch/user/l/lshchuts/stripping/SETC_v30/1.root' TYP='POOL_ROOT'
SEL='(PreselXb2GammaX>0)'",
[...]
};
ApplicationMgr.ExtSvc += { "TagCollectionSvc/EvtTupleSvc" };
This last line tells DaVinci what to use to understand the ETC.
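Adding the SEL requirement to every line by hand is error prone when there are many files. The following is a minimal sketch of a Python helper that generates the EventSelector.Input block. The helper names are hypothetical, and only the first file path is taken from the example above: replace the list with the files given in setc_v30.opts or setc_v31.opts.

```python
# Build an EventSelector.Input block with a SEL requirement on every ETC file.
# The helper names are hypothetical; only the first file path is taken from
# the example above, so replace `etc_files` with the entries of setc_v30.opts.


def sel_string(preselections, operator="||"):
    """Combine preselection names into a TCut-like SEL expression."""
    if operator not in ("&&", "||"):
        raise ValueError("operator must be '&&' or '||'")
    terms = ["(%s>0)" % name for name in preselections]
    return (" %s " % operator).join(terms)


def event_selector_input(etc_files, preselections, operator="||"):
    """Return the options text selecting events that pass the preselections."""
    sel = sel_string(preselections, operator)
    entries = [
        "\"COLLECTION='TagCreator/1' DATAFILE='%s' TYP='POOL_ROOT' SEL='%s'\""
        % (path, sel)
        for path in etc_files
    ]
    return "EventSelector.Input = {\n" + ",\n".join(entries) + "\n};"


etc_files = [
    "PFN:castor:/castor/cern.ch/user/l/lshchuts/stripping/SETC_v30/1.root",
]
options_text = event_selector_input(etc_files, ["PreselXb2GammaX"])
print(options_text)
```

The printed text can be pasted into your options file in place of the hand-edited EventSelector.Input block; the ApplicationMgr.ExtSvc line is still needed.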
Include a file catalogue for DSTs in your options
Next you need to tell DaVinci where to find the files the ETC points to. This is done by declaring a file catalogue, which you can copy from here: dst_v30.xml or dst_v31.xml, and then adding this line to your options:
FileCatalog.Catalogs = { "xmlcatalog_file:dst_v30.xml" } ;
Run on new re-stripped DST
Use the new re-stripped DST as input but look only at events that have passed your preselection. To achieve this you have to read the SelResult tag of each event to find the ones relevant for you. This is faster than running on the SETC, which navigates to the full stripped DSTs.
Get the files with new DSTs
You can find the file to be included in your options here: newDST_v30.opts or newDST_v31.opts.
Using the CheckSelResult algorithm
You should put all your analysis code in a sequencer starting with a CheckSelResult algorithm.
#include "$DAVINCIROOT/options/DaVinciCommon.opts"
ApplicationMgr.TopAlg = { "GaudiSequencer/MainSeq" };
MainSeq.Members = { "CheckSelResult/CheckPreselXb2GammaX"
, "PrintHeader/PrintOfficialPreselXb2GammaX"
, "GaudiSequencer/SeqSelBs2PhiGamma" };
CheckPreselXb2GammaX.Algorithms = {"PreselXb2GammaX"};
SeqSelBs2PhiGamma.Members = { "SelBs2PhiGamma" };
// Run Bs to phi gamma selection
#include "$RADIATIVEOPTS/options/DVSelBs2PhiGamma.opts"
SelBs2PhiGamma.PhysDesktop.InputLocations ={
"Phys/StdLooseAllPhotons",
"Phys/StdLooseKaons" ,
"Phys/StdLoosePhi2KK",
"Phys/PreselXb2GammaX"
};
Here, for example, you look for events that have passed the PreselXb2GammaX preselection and run the selection only on those.
Note the absence of any + signs in the sequence declarations. You don't have to make the standard particles again since they are already stored in the corresponding containers. You can also take the B candidate from the "Phys/PreselXXX" container, so there is no need to re-run your preselection: you can apply the selection algorithm directly.
Include a file catalogue in your options
These new DSTs also have links to the full stripped DSTs. To tell DaVinci where these DSTs can be found, add the same file catalogue as when using the SETC (dst_v30.xml or dst_v31.xml):
FileCatalog.Catalogs = { "xmlcatalog_file:dst_v30.xml" } ;
#include "newDST_v30.opts"
Run on new DSTs with ganga
To run with ganga you need logical file names (LFN) as input: newDST_v30_lfn.opts or newDST_v31_lfn.opts, and a file catalogue that contains the IDs of the newDST files: newDST_v30.xml or newDST_v31.xml.
FileCatalog.Catalogs = { "xmlcatalog_file:dst_v30.xml" } ;
FileCatalog.Catalogs += { "xmlcatalog_file:newDST_v30.xml" } ;
#include "newDST_v30_lfn.opts"
You will also need to add these two files to the input sandbox:
j.inputsandbox=[File('$DAVINCIROOT/options/dst_v30.xml'),File('$DAVINCIROOT/options/newDST_v30.xml')]
This works on lxbatch, i.e. with the LSF() backend, but will not work with Dirac().
Run on stripped ETC for new stripped DSTs
This is the fastest way and, depending on the selection algorithm, one can even run through all the stripped bb data interactively. Here's the recipe to get the result of your preselection and navigate to the interesting events in the new stripped DSTs. Again, one doesn't need to make the standard particles and can put only the selection algorithm in the sequence:
ApplicationMgr.TopAlg = { "GaudiSequencer/SeqBs2PhiGamma" } ;
/// the selection sequence itself
SeqBs2PhiGamma.Members = {
"SelBs2PhiGamma" // the selection algo
} ;
SelBs2PhiGamma.PhysDesktop.InputLocations ={
"Phys/StdLooseAllPhotons",
"Phys/StdLooseKaons" ,
"Phys/StdLoosePhi2KK",
"Phys/PreselXb2GammaX"
};
Always apply cuts on your B's
An even safer way of making sure you don't get anything wrong is to start your offline selection from your preselected candidate. Otherwise:
- You might have missed a cut, or made a mistake and actually applied looser cuts. This would result in applying looser cuts on your signal (where you have not run the preselection) than on the stripped sample (where you did).
- DaVinci evolves, which affects the results: you might find some events that passed the stripping but do not actually pass your selection.
In both cases you are likely to get a too optimistic result.
Always apply your cuts on the B candidate from the preselection:
AnaHeavyDimuonSeq.Members += { "FilterDesktop/MyOfflineSelection" };
MyOfflineSelection.PhysDesktop.InputLocations = { "PreselHeavyDimuon" };
For L0 stripped MB
Ignore all the above. All events are L0-accepted and there's no further granularity.
Get the files with new SETC
You can find the file to be included in your options here: setc_newdst_v30.opts or setc_newdst_v31.opts.
Add the name of your preselection as a SEL='PreselXXX>0' requirement at the end of each line. You can add as many selections as you like, combined with && or ||. In our example that would be:
EventSelector.Input = {
"COLLECTION='TagCreator/1'
DATAFILE='PFN:castor:/castor/cern.ch/user/l/lshchuts/stripping/SETC_newDST_v30/1.root' TYP='POOL_ROOT' SEL='(PreselXb2GammaX>0)'",
[...]
};
ApplicationMgr.ExtSvc += { "TagCollectionSvc/EvtTupleSvc" };
This last line tells DaVinci what to use to understand the ETC.
Include file catalogues for new DSTs and DSTs in your options
Next you need to tell DaVinci where to find the files the SETC points to, and where to find the files the newDSTs point to. This is done by declaring file catalogues, which you can take from the corresponding directory: dst_v30.xml or dst_v31.xml, and newDST_v30.xml or newDST_v31.xml. Then add these lines to your options:
FileCatalog.Catalogs = { "xmlcatalog_file:dst_v30.xml" } ;
FileCatalog.Catalogs += { "xmlcatalog_file:newDST_v30.xml" } ;
#include "setc_newdst_v30.opts"
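Since the same three lines are needed for both reconstruction versions, they can be generated rather than edited by hand. This is a small sketch under the assumption that the file names follow the pattern used on this page (dst_v30.xml, newDST_v30.xml, setc_newdst_v30.opts); the function name is a hypothetical convenience, not part of DaVinci.

```python
# Produce the catalogue and include lines for running on the newSETC.
# File names follow the pattern used on this page; the function itself is a
# hypothetical convenience and not part of any LHCb software.


def newsetc_options(reco):
    """Return the options lines for reco = 'v30' or 'v31'."""
    if reco not in ("v30", "v31"):
        raise ValueError("reco must be 'v30' or 'v31'")
    lines = [
        'FileCatalog.Catalogs = { "xmlcatalog_file:dst_%s.xml" } ;' % reco,
        'FileCatalog.Catalogs += { "xmlcatalog_file:newDST_%s.xml" } ;' % reco,
        '#include "setc_newdst_%s.opts"' % reco,
    ]
    return "\n".join(lines)


print(newsetc_options("v31"))
```

Switching between the two reconstructions this way makes it easy to run the v30r17 analysis first and then repeat it on v31r11 as a cross-check.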
The corresponding bb statistics
All the data sets were produced from the DC06 - Stripping-v30-lumi2 and the DC06 - Stripping-v31-lumi2 stripped DST data available in the bookkeeping. This means that the new files correspond to 804584 stripped events and 22085009 bb inclusive events, so this number should be used for the final evaluation. It is worth emphasising once more that the safe way is to use the v30r17 configuration, since it reproduces the original stripping correctly. Given the time gained with the new DSTs, however, it is not a problem to run on both sets of events and check the consistency of the results.
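As an illustration of how these numbers enter a final evaluation, the sketch below normalises an event count to the inclusive sample. It only uses the two totals quoted above; the function name is hypothetical, and any further factors needed for a physics result are deliberately left out.

```python
# Normalise a selected-event count to the inclusive sample quoted above.

STRIPPED_EVENTS = 804584      # stripped events in the new files
INCLUSIVE_EVENTS = 22085009   # inclusive events they correspond to


def rate_per_inclusive_event(n_selected):
    """Fraction of the inclusive sample that survives the selection."""
    return float(n_selected) / INCLUSIVE_EVENTS


# Retention of the stripping itself (all preselections together)
stripping_retention = rate_per_inclusive_event(STRIPPED_EVENTS)
print("stripping retention: %.2f%%" % (100.0 * stripping_retention))
```

Passing the number of events that survive your own selection gives its rate on the same inclusive sample.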
--
LesyaShchutska - 15 Apr 2008
- setc_v30.xml: File catalogue for SETC for v30r17 reconstructed DSTs
- dst_v30.xml: File catalogue for v30r17 reconstructed DSTs
- newDST_v30.xml: File catalogue for new DSTs made from v30r17 reconstructed DSTs
- setc_v31.xml: File catalogue for SETC for v31r11 reconstructed DSTs
- dst_v31.xml: File catalogue for v31r11 reconstructed DSTs
- newDST_v31.xml: File catalogue for new DSTs made from v31r11 reconstructed DSTs
Tagging: Beware of the opposite B bug!
If you are doing B tagging make sure you read this first.
MC association problem in the Background Categorization tool!
There is a problem in the MC association causing a segmentation fault in the Background Categorization tool. This can be corrected just by adding
#include "$DAVINCIROOT/options/DaVinciMainSeqFixes.opts"
just after DaVinciCommon.opts.
Looking at events with Panoramix
NOT TESTED YET, but should work.
The user-specific option files are prepared as explained above. It is recommended to start Panoramix with python. The SetupProject Panoramix script will define an environment variable myPanoramix pointing to the Panoramix startup script. Panoramix is then launched by python $myPanoramix (linux) or python %myPanoramix% (windows).
Run on stripped DST with Panoramix, not recommended way
Assume the name of the user option file is mySel.opts, with the options explained above BUT without DaVinciCommon.opts. Then the event display can be launched with:
python $myPanoramix -f none -u mySel.opts
The -f none option is necessary if the event selector is specified in mySel.opts. In the next version of Panoramix, v15r6, the logic will change: if there is a user option file and no input file specified on the command line, it is assumed that the input file is specified by the user options. The default file can still be used with the option -f default.
The next version of Panoramix, v15r6, will also have a command Event/Loop over events that allows you to go to the next selected event. Currently, the next event is simply the next event in the file.
Run on stripped ETC with Panoramix, recommended way
Take again the same option file as for running DaVinci, BUT without DaVinciCommon.opts. Alternatively, you could specify the file catalogue separately (only for the future version v15r6):
python $myPanoramix -u mySelETC.opts -x myFileCatalog.xml
When using an ETC as input, "next event" automatically goes to the next event selected by the SEL string in the definition of the ETC.
User Experience
Nothing yet.
To do list for next stripping, and some ideas
- The SelResults container is now created from the ETC and some information, like the decay descriptor, is lost.
- Maybe the preselections should be declared to the DataOnDemandSvc so that the preselection is triggered by a call to the SelResult object (?)
- The StreamingTaskForce would like the B candidates to be stored on the DST. This can be done rerunning the selection(s) on the DST after the Brunel step (simple), or by storing the necessary information in a format that could be similar to what is proposed to save the trigger candidates (needs more work).
- One should also not forget the procedure to include preselections in the stripping suggested by this task force.
- All preselections should be rewritten using the HLT syntax.
--
PatrickKoppenburg - 18 Jul 2007 - 19 Jul 2007 - 08 Dec 2007 - 04 Jul 2008