Adding information to the events with EDProducer

Introduction

This tutorial shows how to put data into the Event. There are two parts to putting data into the Event:

  1. Creating the package that holds the C++ class for the data. This includes the code needed to generate the dictionaries (used for storing the class in a file). More details are given in SWGuideCreatingNewProducts. In the simplest (and most common) case, where your data are in an already existing format, you will not need to do this. An example of how to store data of a known format is given in SWGuideEDProducerAnalysis.
  2. Creating an EDProducer, a framework module that creates the data and then places them into the Event.

The EDProducer is a top-level, reconstruction-developer written CMSSW framework component that performs a quantifiable step of reconstruction by creating and storing persistable data in the Event. An EDProducer instance produces an EDProduct instance. Any EDProducer class may be used for scheduled and unscheduled applications.

Steps

  1. create a scram project area (using a 0_6_0* release or greater)
  2. in the src directory of the project, mkdir Analysis
  3. cd Analysis

Recipe for creating the data package for your product

We will create two different data classes to put into the Event: MyStuff and std::vector<MyOtherStuff>.
  1. mkdir MyStuff
  2. mkdir MyStuff/interface
  3. mkdir MyStuff/src
  4. create the file BuildFile in the directory MyStuff and put the following inside
    <use name=DataFormats/Common>
    <use name=rootrflx>
    <export>
      <lib name=AnalysisMyStuff>
      <use name=DataFormats/Common>
    </export>
    
  5. cd interface
  6. create the header files for the class using the mkskel command
    1. $CMSSW_RELEASE_BASE/src/FWCore/Skeletons/scripts/mkskel MyStuff.h
    2. $CMSSW_RELEASE_BASE/src/FWCore/Skeletons/scripts/mkskel MyOtherStuff.h
  7. edit the header files: comment out the copy constructor and operator=() which are found in the private: section of the class declaration.
  8. cd ../src
  9. create the source files for the classes
    1. $CMSSW_RELEASE_BASE/src/FWCore/Skeletons/scripts/mkskel MyStuff.cc
    2. $CMSSW_RELEASE_BASE/src/FWCore/Skeletons/scripts/mkskel MyOtherStuff.cc
  10. create the files used to generate the dictionaries
    1. create the file classes.h containing the following
      #include "Analysis/MyStuff/interface/MyStuff.h"
      #include "Analysis/MyStuff/interface/MyOtherStuff.h"
      #include <vector>
      #include "DataFormats/Common/interface/Wrapper.h"
      
      namespace { namespace {
        //say which template classes should have dictionaries
        edm::Wrapper<MyStuff> dummy1;
        edm::Wrapper<std::vector<MyOtherStuff> > dummy2;
      } }
      
    2. create the file classes_def.xml containing the following:
      <lcgdict>
        <class pattern="edm::Wrapper<*>"/>
        <class name="std::vector<MyOtherStuff>"/>
        <class name="MyStuff"/>
        <class name="MyOtherStuff"/>
      </lcgdict>
      

Create the EDProducer

Now we will create the EDProducer that creates the new data, in the formats defined above, and puts them into the Event.
  1. use the command 'mkedprod' to create the package and skeleton code
         mkedprod MyStuffProducer
    
  2. edit the MyStuffProducer/BuildFile and add
        <use name=Analysis/MyStuff>
    
    both outside and inside the <export></export> block
  3. edit the MyStuffProducer/src/MyStuffProducer.cc file
    1. add the following includes
      #include "Analysis/MyStuff/interface/MyStuff.h"
      #include "Analysis/MyStuff/interface/MyOtherStuff.h"
      #include <vector>
      
    2. Do not put any code between any #ifdef THIS_IS_AN_*_EXAMPLE #endif block since that is just an example which will not be compiled.
    3. in the constructor definition ( i.e., MyStuffProducer::MyStuffProducer(...) ), add the code
         produces<MyStuff>();
         produces<std::vector<MyOtherStuff> >();
      
    4. in the MyStuffProducer::produce(...) method, add the code
         std::auto_ptr<MyStuff> myStuff( new MyStuff );
         std::auto_ptr<std::vector<MyOtherStuff> > otherStuffs( new std::vector<MyOtherStuff> );
         iEvent.put( myStuff);
         iEvent.put(otherStuffs);
      

Testing

  1. go to the src directory of the project and run scramv1 b
  2. create the file test.cfg which contains
    process TEST = {
        //just a dummy source to give us some empty events
       source = EmptySource {
          untracked int32 maxEvents=2
       }
    
       //load our module
       module stuff = MyStuffProducer {}
    
       //make sure our module is called every event
       path p = { stuff }
    
        //just to see that something is happening
       service = Tracer {}
    }
    
  3. run the job by doing cmsRun test.cfg
The output of the job should look like
++ Job started
++++ processing event:run: 1 event: 1 time:5000000
++++++ module:stuff
++++++ finished:stuff
++++ finished event:
++++ processing event:run: 1 event: 2 time:10000000
++++++ module:stuff
++++++ finished:stuff
++++ finished event:
++ Job ended

Further Exercises

Make the classes more interesting

  1. edit the MyOtherStuff class (Analysis/MyStuff/interface/MyOtherStuff.h) so that it has an int member data and the method int value() const which returns that member data
  2. edit MyStuffProducer::produce() function to add an entry in the otherStuffs container
    1. otherStuffs->push_back( MyOtherStuff( 1 ) );
  3. edit the configuration file so that we can print the data in the Event
    1. add module print = EventContentAnalyzer {untracked bool verbose = true }
    2. modify the path statement to be path p = { stuff&print }
  4. rerun the cmsRun job.
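
As a rough illustration of step 1, the class might look like the sketch below. It is written inline for brevity and leaves out the include guards and comments that mkskel generates. Two details are assumptions rather than requirements stated on this page: the member is named value_ because that is the name shown in the EventContentAnalyzer printout that follows, and a default constructor is included because persistent classes generally need one to be read back from a file.

```cpp
#include <cassert>

// Sketch of Analysis/MyStuff/interface/MyOtherStuff.h after step 1.
class MyOtherStuff {
public:
  // Default constructor (assumed to be needed for reading the object from a file).
  MyOtherStuff() : value_(0) {}
  // Constructor used by the producer, e.g. MyOtherStuff(1).
  explicit MyOtherStuff(int iValue) : value_(iValue) {}

  // Accessor requested in step 1.
  int value() const { return value_; }

private:
  int value_;  // appears as "value_=1" in the EventContentAnalyzer printout
};

// Quick self-check of the accessor.
inline void checkMyOtherStuff() {
  assert(MyOtherStuff(1).value() == 1);
  assert(MyOtherStuff().value() == 0);
}
```

If you add the int constructor to the header generated by mkskel, remember to remove or update the matching definitions in MyOtherStuff.cc so that each constructor is defined only once.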

The results of the job should look something like

++ Job started
++++ processing event:run: 1 event: 1 time:5000000
++++++ module:stuff
++++++ finished:stuff
++++++ module:print

++Event     0 contains 2 products with friendlyClassName, moduleLabel and productInstanceName:
++MyStuff "stuff" ""
++  (MyStuff) (MyStuff)
++MyOtherStuffs "stuff" ""
++  (std::vector<MyOtherStuff>)=[size=1]
++    [0] (MyOtherStuff)
++      value_=1
++++++ finished:print
++++++ module:out
++++++ finished:out
++++ finished event:
++++ processing event:run: 1 event: 2 time:10000000
++++++ module:stuff
++++++ finished:stuff
++++++ module:print

++Event     1 contains 2 products with friendlyClassName, moduleLabel and productInstanceName:
++MyStuff "stuff" ""
++  (MyStuff) (MyStuff)
++MyOtherStuffs "stuff" ""
++  (std::vector<MyOtherStuff>)=[size=1]
++    [0] (MyOtherStuff)
++      value_=1
++++++ finished:print
++++++ module:out
++++++ finished:out
++++ finished event:

Summary for key being the concatenation of friendlyClassName, moduleLabel and productInstanceName
     2 occurrences of key MyOtherStuffs + "stuff" + ""
     2 occurrences of key MyStuff + "stuff" + ""
++ Job ended

Storing the data to a ROOT file

So far, all the work is done in the cmsRun job but is then lost. Let's save our results.
  1. edit the configuration file to include
    1. module out = PoolOutputModule { untracked string fileName = "test.root"}
    2. endpath o = {out}
  2. run the job
  3. run ROOT and open the test.root file. All the data is stored in the 'Events' TTree.

Having only one EDProducer in the job is not that interesting, so let's add a second one.

  1. edit the configuration file to include
    1. module thing = ThingProducer {}
    2. untracked PSet options = { untracked bool allowUnscheduled = true }
      1. this command starts unscheduled execution which just means the framework figures out the proper order of the EDProducers.
      2. if you want to specify the execution explicitly, do not use the last command and instead modify the path statement to be
        1. path p={thing, stuff & print }
  2. run the cmsRun job
  3. look at the test.root file using root
    1. you should see a new branch named edmtestThings_thing__TEST

But say we need ThingProducer in our job but we only want to store what data is created by MyStuffProducer? In that case, we can tell the PoolOutputModule exactly what it should store (or even what it should ignore). See SWGuideSelectingBranchesForOutput for details.

  1. edit the configuration file by adding the following to the PoolOutputModule's configuration block
    1. untracked vstring outputCommands = { "keep *_stuff_*_*" }
  2. run the job
  3. look at the test.root file using root
    1. you should no longer see the branch edmtestThings_thing__TEST.
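
To see why "keep *_stuff_*_*" selects only the MyStuffProducer output, it can help to treat a branch name as four fields joined by underscores (friendly class name, module label, product instance name, process name) and match the pattern against it. The sketch below is a simplified stand-in and not the framework's own selection code: globMatch is a hypothetical helper, and the two "stuff" branch names are inferred from the friendlyClassName and moduleLabel values printed earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if 'text' matches 'pattern', where '*' matches any run of
// characters (including an empty one) and every other character is literal.
bool globMatch(const std::string& pattern, const std::string& text,
               std::size_t p = 0, std::size_t t = 0) {
  if (p == pattern.size()) return t == text.size();
  if (pattern[p] == '*') {
    // Let '*' absorb 0, 1, 2, ... characters of the remaining text.
    for (std::size_t next = t; next <= text.size(); ++next) {
      if (globMatch(pattern, text, p + 1, next)) return true;
    }
    return false;
  }
  return t < text.size() && pattern[p] == text[t] &&
         globMatch(pattern, text, p + 1, t + 1);
}

// The keep pattern from the step above, applied to the branches in this job.
inline void checkKeepPattern() {
  const std::string keep = "*_stuff_*_*";
  assert(globMatch(keep, "MyStuff_stuff__TEST"));         // kept (inferred name)
  assert(globMatch(keep, "MyOtherStuffs_stuff__TEST"));   // kept (inferred name)
  assert(!globMatch(keep, "edmtestThings_thing__TEST"));  // label is "thing"
}
```

The second field is the module label, so only products made by the module labelled stuff match; SWGuideSelectingBranchesForOutput describes the actual rules.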

Getting data from the Event

Most algorithms need data as input in order to do their calculations. For an EDProducer this usually means getting data from the edm::Event. So now let's have our MyStuffProducer get the std::vector< edmtest::Thing > data which is created by the ThingProducer.
  1. edit the MyStuffProducer/src/MyStuffProducer.cc file
    1. #include "DataFormats/TestObjects/interface/Thing.h"
    2. in the produce(...) method add the following (see SWGuideEDMGetDataFromEvent for details)
      1.   Handle<std::vector<edmtest::Thing> > things;
          iEvent.getByLabel("thing",things);
                 
      2. then replace otherStuffs->push_back( MyOtherStuff(1) ); with
           for( std::vector<edmtest::Thing>::const_iterator itThing=things->begin();
           itThing != things->end();
           ++itThing) {
              otherStuffs->push_back( MyOtherStuff(itThing->a) );
           }
                 
  2. edit the MyStuffProducer/BuildFile
    1. add <use name=DataFormats/TestObjects> both inside and outside the <export></export> block
  3. compile the code and run the job
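
The loop above can be tried outside CMSSW with stand-in types. In this sketch, Thing and MyOtherStuff are hypothetical minimal replacements for edmtest::Thing and the class from Analysis/MyStuff, and convert is an illustrative free function rather than part of the producer.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins so the loop compiles without the framework.
struct Thing { int a; };  // plays the role of edmtest::Thing

class MyOtherStuff {
public:
  explicit MyOtherStuff(int iValue = 0) : value_(iValue) {}
  int value() const { return value_; }
private:
  int value_;
};

// Same loop as in produce(): one MyOtherStuff per Thing, carrying Thing::a.
std::vector<MyOtherStuff> convert(const std::vector<Thing>& things) {
  std::vector<MyOtherStuff> otherStuffs;
  for (std::vector<Thing>::const_iterator itThing = things.begin();
       itThing != things.end(); ++itThing) {
    otherStuffs.push_back(MyOtherStuff(itThing->a));
  }
  return otherStuffs;
}

// Usage check: the output has one entry per input, in the same order.
inline void checkConvert() {
  std::vector<Thing> things(2);
  things[0].a = 7;
  things[1].a = 9;
  const std::vector<MyOtherStuff> out = convert(things);
  assert(out.size() == 2);
  assert(out[0].value() == 7 && out[1].value() == 9);
}
```

In the real producer the input comes from the Handle filled by getByLabel, and the output vector is the one handed to iEvent.put.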

Printing debug messages

When debugging, it is sometimes useful to have the code print messages about its progress. Once you have finished debugging, you may want to keep those debug messages in the code but silence them. The MessageLogger service can help you do so.
  1. edit the MyStuffProducer/src/MyStuffProducer.cc file
    1. #include "FWCore/MessageLogger/interface/MessageLogger.h"
    2. in the produce(...) method, add the following statements in the appropriate place
      1. LogDebug("Trace") << "entered produce";
      2. LogDebug("Values") << "MyOtherStuff vector has size " << otherStuffs->size();
      3. LogDebug("Trace") << "exiting produce";
  2. compile the code and run the job
In this case, you should see nothing in the job's output. This is because by default, no DEBUG messages get printed.

Now let's turn on the debug messages

  1. edit the configuration file to include the following
       service = MessageLogger {
          vstring destinations =  {"debug.txt"}
          PSet debug.txt = { 
             string threshold = "DEBUG" 
             PSet DEBUG = { int32 limit = -1}    
          }
          vstring debugModules = { "stuff"} 
       }
         
  2. run the job
This time, you should see a debug.txt file which contains the output from the MessageLogger which includes your debug print outs.

NOTE In release CMSSW_0_6_0_pre2 (and beyond?) the MessageLogger does not work correctly with the 'unscheduled' execution mechanism. To see all your debug messages you will need to use fully 'scheduled' execution (i.e., you must explicitly place every module in the path). Alternatively, you can tell the MessageLogger to print debug messages for all modules by setting vstring debugModules={"*"}.

Adding Parameters to the Producer

It is often useful to be able to modify values used by one's algorithms without having to change the C++ code. The Framework provides the ParameterSet system to accomplish that task.

Let's modify the MyStuffProducer so that the values of the MyOtherStuff objects it creates are shifted by an offset with respect to the values of the Thing objects from which they are created.

  1. edit the MyStuffProducer/src/MyStuffProducer.cc file
    1. in the class declaration, add a member data int offset_
    2. change the class constructor implementation to be
      • MyStuffProducer::MyStuffProducer(const edm::ParameterSet& iConfig) : offset_(iConfig.getParameter<int>("offset") )
    3. in the produce(...) method, add the value of offset_ to the value from Thing
      • otherStuffs->push_back(MyOtherStuff(itThing->a + offset_) );
  2. compile the code and run the job

The job should now fail because we have not set the value of the offset parameter.

  1. edit the configuration file so that the configuration for 'stuff' is
    • module stuff = MyStuffProducer { int32 offset = 5 }
  2. run the job
This time the job should work just fine.
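
The failure and the fix can be mimicked with a toy model. ToyParameterSet and ToyProducer below are hypothetical stand-ins, not the real edm::ParameterSet or the generated producer; the sketch only assumes that looking up a required parameter that was never set is reported as an error, which is why constructing the module fails before any event is processed.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a parameter set; NOT the real edm::ParameterSet.
class ToyParameterSet {
public:
  void addInt(const std::string& name, int value) { ints_[name] = value; }

  // Required lookup: asking for a parameter that was never set is an error.
  int getInt(const std::string& name) const {
    std::map<std::string, int>::const_iterator it = ints_.find(name);
    if (it == ints_.end()) {
      throw std::runtime_error("missing parameter: " + name);
    }
    return it->second;
  }

private:
  std::map<std::string, int> ints_;
};

// Mimics the MyStuffProducer constructor reading "offset" in its initializer list.
class ToyProducer {
public:
  explicit ToyProducer(const ToyParameterSet& iConfig)
      : offset_(iConfig.getInt("offset")) {}
  int offset() const { return offset_; }

private:
  int offset_;
};

inline void checkToyProducer() {
  ToyParameterSet empty;  // like: module stuff = MyStuffProducer {}
  bool failed = false;
  try {
    ToyProducer producer(empty);
    (void)producer;
  } catch (const std::runtime_error&) {
    failed = true;  // construction fails because "offset" is not set
  }
  assert(failed);

  ToyParameterSet configured;  // like: module stuff = MyStuffProducer { int32 offset = 5 }
  configured.addInt("offset", 5);
  assert(ToyProducer(configured).offset() == 5);
}
```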

untracked parameters

The system keeps track of which parameters are used to create each data item in the Event. This can be used later to help understand how the data were made. However, sometimes a parameter has no effect on the final objects created, e.g., the parameter just sets how much debugging information should be printed to the log. Such parameters are considered 'untracked', and it is best to tell the ParameterSet system that they are untracked.

Let's use an untracked parameter to decide how much debug information we should print.

  1. edit the MyStuffProducer/src/MyStuffProducer.cc file
    1. in the class declaration, add the member data bool verbose_
    2. change the class constructor implementation to have
      • ... offset_(....), verbose_(iConfig.getUntrackedParameter<bool>("verbose",false))
    3. change the produce(...) method so that if verbose_ is true, you print out all the values in the otherStuffs vector.
  2. compile and run the job
The job will run and you will not see any changes in the debug output. This is because we did not set verbose in the configuration file and therefore the default value of false was used.
  1. edit the configuration file, adding the following to the stuff configuration block
    1. untracked bool verbose = true
  2. run the job
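
One possible shape for the verbose printout in step 3 of the first list is sketched below. describeValues is a hypothetical helper that works on plain ints so it can be compiled on its own; in the producer the values would come from calling value() on each entry of otherStuffs, and the resulting text could be sent to LogDebug or std::cout.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Builds the text a verbose produce() might print: one line per entry.
// Returns an empty string when verbose is false, so nothing is printed.
std::string describeValues(const std::vector<int>& values, bool verbose) {
  std::ostringstream message;
  if (verbose) {
    for (std::vector<int>::size_type i = 0; i < values.size(); ++i) {
      message << "otherStuffs[" << i << "] = " << values[i] << "\n";
    }
  }
  return message.str();
}

// Usage check for both settings of the flag.
inline void checkDescribeValues() {
  std::vector<int> values;
  values.push_back(5);
  values.push_back(6);
  assert(describeValues(values, false).empty());
  assert(describeValues(values, true) ==
         "otherStuffs[0] = 5\notherStuffs[1] = 6\n");
}
```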

cfi file

As you add more and more parameters to your module, it becomes increasingly painful to write every single one of them in the job's configuration file. Instead, you should write a configuration fragment include (cfi) file which just says how your particular module should be configured. These cfi files go into the data directory of the package which contains the module. The name of each file should match the 'module label' assigned to the module within the cfi file and should end with the suffix .cfi.

Let's create a cfi file for our MyStuffProducer.

  1. in the Analysis/MyStuffProducer package make the directory data
  2. in the data directory, create the file stuff.cfi and fill it with the following
    module stuff = MyStuffProducer { 
        int32 offset = 5
        untracked bool verbose = false
    }
           
  3. in the job configuration file, replace the stuff configuration block with
    include "Analysis/MyStuffProducer/data/stuff.cfi"
           
  4. run the job

Review status

Reviewer/Editor and Date (copy from screen) Comments
JennyWilliams - 05 March 2007 Moved entire content of WorkBookEDMTutorialProducer here, replaced its content with a more fundamental tutorial
JennyWilliams - 08 Jan 2007 Moved page to workbook, put in manual TOC and other stuff needed for wb
ChrisDJones - 5 Apr 2006 Page made outside workbook

Responsible: ChrisDJones
Last reviewed by:

Topic revision: r2 - 2013-03-27 - HomerWolfe
