Creating New Products
Goal of this page
This page contains instructions on how to introduce a new product into the system so that it can be stored in the edm::Event and later in an EDM/ROOT file.
We distinguish between a product put directly into the edm::Event (e.g. a JetCollection), which we call an EDProduct, and a product that is a constituent of an EDProduct (e.g. a single Jet, a Hit), which we refer to as a constituent.
Making the package containing your product:
The package that defines your product must be separate from the package of the EDProducer that will actually create instances of your product. If this is not done, then errors may occur when running cmsRun.
The package defining your product needs to contain:
- The code defining your product
- The classes_def.xml file
- The classes.h file
Multiple related products can be placed in the same package.
Restrictions on the package defining your persistence capable product:
These restrictions apply to the package defining your products that will be persisted in an EDM/ROOT file.
Note: Products that will be persisted only in non-EDM ROOT files need to follow the rules for transient products given in the next section.
The package defining your product should be located in a subsystem devoted to product definitions. Examples of such subsystems are AnalysisDataFormats, DataFormats, FastSimDataFormats, SimDataFormats, and TBDataFormats.
In your package's BuildFile.xml, declare any dependencies on other packages or external tools.
Your BuildFile.xml must not contain a plugin directive.
The files defining your products must be in the src or interface directories.
If your package includes any headers in DataFormats/Common, you must declare a dependency on DataFormats/Common.
If your package throws any exceptions, you must explicitly declare a dependency on FWCore/Utilities.
You must not write directly to cerr or cout. If you need text output, use the MessageLogger. If your package uses the MessageLogger, you must explicitly declare a dependency on FWCore/MessageLogger. Limit the use of MessageLogger to source files, i.e. avoid using MessageLogger in headers in the interface directory.
The package containing your product must not be declared dependent on any other package in CMSSW, except as stated above, and also except other packages defining lower level products that are constituents of your product.
There are also restrictions on external dependencies (dependencies on tools/packages outside of CMSSW).
In most cases, there should be none.
A dependency on clhep is permitted if your product contains a clhep product (e.g. HepLorentzVector).
A dependency on boost is permitted if necessary.
A dependency on eigen is permitted if necessary and the use of eigen does not have any side effects (like starting new threads).
Dependencies on the ROOT I/O package are prohibited. Dependencies on ROOT non-I/O packages (e.g. Math, Histograms) are permitted.
Dependencies on geant4 are prohibited.
Each persistence capable class must have a public default constructor.
If your package directly throws any exceptions, they must be of the class "cms::Exception" or "edm::Exception". Do not throw a std::exception or define your own exception classes.
The methods of your class should be kept short and simple.
NOTE: if additional dependencies are permitted in the future, the ROOT_INCLUDE_PATH will very likely need to be updated.
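The default-constructor requirement listed above can be checked at compile time with the standard type traits. The sketch below is only an illustration: ExampleHit, BrokenHit and hasPublicDefaultConstructor are hypothetical names, and the check covers this one restriction only, not the dictionary or dependency rules.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical product class that satisfies the requirement.
struct ExampleHit {
  explicit ExampleHit(int v) : value_(v) {}
  ExampleHit() : value_(0) {}  // public default constructor
  int value_;
};

// Hypothetical counter-example: no default constructor is available.
struct BrokenHit {
  explicit BrokenHit(int v) : value_(v) {}
  int value_;
};

// True if T can be default constructed from outside the class,
// which is false when the default constructor is missing or not public.
template <typename T>
constexpr bool hasPublicDefaultConstructor() {
  return std::is_default_constructible<T>::value;
}

// Fails the build if someone later removes the default constructor.
static_assert(hasPublicDefaultConstructor<ExampleHit>(),
              "persistence capable classes need a public default constructor");
```

Placing such a static_assert next to the class definition is one possible way to catch the problem at build time rather than when the product is read back.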
Restrictions on a package defining your transient only product:
In rare cases, you may wish to put a product in the event that you wish NEVER to be output into an EDM/ROOT file.
Also, you may need to use ROOT dictionaries for products that will be persisted only in non-EDM ROOT files. These follow the same rules as transient only products.
If the package defining your product is located in a subsystem devoted to product definitions (e.g. DataFormats, SimDataFormats), your package must meet all the same restrictions as given above for persistence capable products. However, for a transient only product, you may choose to locate your package outside of one of these subsystems. If so, the requirements on external dependencies of your package are eased somewhat. Your package may be dependent on other CMSSW packages outside of the subsystems devoted to product definitions, but you still must avoid dependencies on packages defining plug-ins (e.g. producers, filters, analyzers, etc.). Your package must still meet all of the other restrictions given for persistence capable products above. In particular, your product still does need a public default constructor.
Restrictions on the package producing your product:
You must not write directly to cerr or cout. If you need text output, use the MessageLogger. If your package uses the MessageLogger, you must explicitly declare a dependency on FWCore/MessageLogger. (Applies to all packages).
If your package directly throws any exceptions, they must be of the class "cms::Exception" or "edm::Exception". Do not throw a std::exception. If you define your own exception class, it must publicly inherit from cms::Exception. (Applies to all packages).
Your producer should do as little as possible other than producing the product(s) and putting them into the event. The producer should be dependent only on packages necessary for this task. Therefore, dependencies on ROOT (other than non-I/O packages, such as math packages) are strongly discouraged.
Make your product
For this example, our product will be very simple.
#ifndef DataFormats_SampleProd_h
#define DataFormats_SampleProd_h
#include <vector>
// a simple class
struct SampleProd
{
explicit SampleProd(int v):value_(v) { }
SampleProd():value_(0) { }
int value_;
};
// this is our new product, it is simply a
// collection of SampleProd held in an std::vector
typedef std::vector<SampleProd> SampleCollection;
#endif
The first step is to add this header and the associated implementation file to the interface and src directories of your package.
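Since SampleCollection is an ordinary std::vector, it can be filled and read like any other container. The following sketch repeats the definitions from the header so that it compiles on its own. The helper makeSamples is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Repeated from the header above so that this sketch is self-contained.
struct SampleProd {
  explicit SampleProd(int v) : value_(v) {}
  SampleProd() : value_(0) {}
  int value_;
};
typedef std::vector<SampleProd> SampleCollection;

// Hypothetical helper: build a collection holding the values 0 .. n-1.
SampleCollection makeSamples(int n) {
  assert(n >= 0);  // precondition of this sketch
  SampleCollection result;
  result.reserve(static_cast<std::size_t>(n));
  for (int i = 0; i < n; ++i) {
    result.emplace_back(i);  // uses the explicit SampleProd(int) constructor
  }
  return result;
}
```

For example, makeSamples(3) yields three elements whose value_ members are 0, 1 and 2.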
Create a classes_def.xml in the src directory
The classes_def.xml file directs scram dictionary generation and must contain the names of the classes that you will want to put into the event.
The EDM always adds a wrapper around an EDProduct that you put directly into the event.
You will need to:
- Specify the product and any of its (non-transient) constituents defined in your package.
- If the product or a constituent defined in your package has any transient (i.e. non-persistent) data members, these members must be indicated as such in the classes_def.xml file. A dictionary is not usually needed for the transient member itself or its constituents.
- If the product is an EDProduct, mention the wrapped EDProduct.
- If a dictionary for an std::map is specified, also specify the dictionary for the corresponding std::pair.
- In some exceptional cases, due to the requirements of schema evolution, a dictionary for a transient member or a component of a transient member may be required. If such a needed dictionary is missing, a run time exception should occur indicating what dictionary or dictionaries are needed.
In addition, you will likely need to mention any of the class template instantiations used in the product.
Some important rules must be followed:
- The namespace of each class must be fully spelled out
- If the class is an instance of a template, the values of all template parameters, including defaulted parameters, must be fully specified.
- A typedef may often be used in place of the fully specified type, but the typedef must expand to the fully defined type.
- In some cases a typedef may not work.
Finally, here are the contents of the file for our sample product:
<lcgdict>
<class name="SampleProd"/>
<class name="std::vector<SampleProd>"/>
<class name="edm::Wrapper<std::vector<SampleProd> >"/>
</lcgdict>
classes_def.xml entries for an EDProduct that stores only some of its data members
Suppose there is a data member of your class that you do not want to store because it can be calculated from other stored data members. E.g.
// a simple class
struct SampleProd
{
private:
//do not store
mutable int square_;
public:
explicit SampleProd(int v):value_(v),square_(0) { }
SampleProd():value_(0),square_(0) { }
int value_;
int square() const {
if ( 0 == square_) {
square_=value_*value_;
}
return square_;
}
};
In the entry for the class in the classes_def.xml file, use a <field name="..." transient="true"/> node, where the name attribute is the name of the member you do not want to store. E.g.,
<class name="SampleProd">
<field name="square_" transient="true"/>
</class>
<class name="std::vector<SampleProd>"/>
<class name="edm::Wrapper<std::vector<SampleProd> >"/>
In addition, one needs to be sure ROOT clears the value each time it reads a new object back from storage. This clearing must be done since ROOT may reuse the same memory area over and over for different events. The clearing can be done using the same tools as ROOT's explicit schema evolution handling (http://root.cern.ch/root/html532/io/DataModelEvolution.html). This entails writing an ioread rule in the classes_def.xml file. E.g.
<ioread sourceClass = "SampleProd" version="[1-]" targetClass="SampleProd" source="" target="square_">
<![CDATA[ square_=0;
]]>
</ioread>
The rule says that it applies to the class SampleProd, which does not change type between the file and the job (sourceClass is identical to targetClass), that there is no data to read from the file (source is empty), and that the rule is applied when ROOT must read the square_ data member.
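To see why the reset matters, the following toy model mimics an object whose memory is reused for the next entry. It is only a sketch under stated assumptions: readPersistentOnly stands in for ROOT overwriting the stored member, and resetCache stands in for the effect of the ioread rule. Both are hypothetical names, not ROOT or CMSSW functions.

```cpp
#include <cassert>

// Same shape as the SampleProd above, with the lazily cached square_.
struct SampleProd {
private:
  mutable int square_;  // transient cache, not stored
public:
  explicit SampleProd(int v) : square_(0), value_(v) {}
  SampleProd() : square_(0), value_(0) {}
  int value_;
  int square() const {
    if (0 == square_) {
      square_ = value_ * value_;
    }
    return square_;
  }
  // Hypothetical stand-in for the effect of the ioread rule.
  void resetCache() { square_ = 0; }
};

// Hypothetical stand-in for reading a new entry into an existing object:
// only the persistent member is overwritten.
void readPersistentOnly(SampleProd& obj, int storedValue) { obj.value_ = storedValue; }

// Without a reset, the cache from the previous entry survives.
int squareWithoutReset() {
  SampleProd p(3);
  p.square();                // caches 9
  readPersistentOnly(p, 4);  // next entry: value_ is now 4
  return p.square();         // stale: still 9
}

// With a reset, the cache is recomputed from the new value.
int squareWithReset() {
  SampleProd p(3);
  p.square();
  readPersistentOnly(p, 4);
  p.resetCache();
  return p.square();  // 16
}
```

In this model squareWithoutReset() returns the stale value 9, while squareWithReset() returns 16.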
It is also possible to just do the calculation right at readback time. In that case the class would not need a mutable data member and would include a function which could be used both by the constructor and the ROOT readback code. E.g.,
// a simple class
struct SampleProd
{
private:
//do not store
int square_;
public:
explicit SampleProd(int v):value_(v),square_(0) { initSquare(); }
SampleProd():value_(0),square_(0){}
int value_;
int square() const {
return square_;
}
//called by constructor and ROOT read back
void initSquare() { square_=value_*value_;}
};
The declaration of the member as transient remains the same, but we do need a new ioread rule, which would be
<ioread sourceClass = "SampleProd" version="[1-]" targetClass="SampleProd" source="int value_" target="square_">
<![CDATA[ newObj->initSquare();
]]>
</ioread>
The difference between this rule and the previous one is that we have to tell ROOT which data member needs to be read from disk, source="int value_", in order for our call to initSquare() to work properly.
Class versioning
The above examples of classes_def.xml files ignore class versioning. Class versioning is used to support schema evolution by giving the current version of a persistent class a version number, and incrementing the version number if and when a persistent non-static data member of the class is added, removed, renamed, or has its type changed.
Up to and including CMSSW_4_4_X, CMSSW uses class versioning only sporadically. However, in CMSSW_5_0_X, class versioning has been implemented throughout CMSSW as much as is feasible. Most of this initial implementation was done centrally by the Core Software group. Users should add class versioning for any classes added in 5_0_X or subsequent releases. Class versioning need not be added to 4_4_X and prior releases, but it is harmless if done correctly.
Class versioning is used in classes_def.xml files only for classes that are not instances of templates. For example, here is the classes_def.xml for a persistent product with a transient data member, using class versioning:
<class name="SampleProd" ClassVersion="10">
<version ClassVersion="10" checksum="238838498"/>
<field name="square_" transient="true"/>
</class>
<class name="std::vector<SampleProd>"/>
<class name="edm::Wrapper<std::vector<SampleProd> >"/>
Version numbers 0, 1, and 2 have special meaning to ROOT and should not be used as version numbers. By convention, CMS uses 3 as the initial version number of a class that has never been stored before. If a class was previously stored without a class version assigned, use 10 as the initial version instead of 3: ROOT automatically assigns a version number to each unversioned class, starting with 3, and increments the version number each time it finds an instance of the class with a different checksum while writing to the file. By using 10 we can accommodate 7 unversioned instances of a stored class (versions 3 through 9), which should be more than enough to avoid conflicts. The checksum is a number generated automatically from the class. The tool currently used to add class versioning to a file, edmAddClassVersion, will generate the checksums automatically. scram build will fail if any class with a class version does not have a correct checksum. The error message will indicate the correct value for the checksum.
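The arithmetic behind the choice of 10 can be written out as a small toy model. This is only a sketch of the numbering as described in the text above, not ROOT's implementation. The names kFirstAutomaticVersion, unversionedSlotsBelow and ToyAutoVersioner are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// First version number that, per the description above, is assigned
// automatically to an unversioned class.
constexpr int kFirstAutomaticVersion = 3;

// Number of automatically assigned versions that fit below a chosen
// initial explicit version (hypothetical helper).
constexpr int unversionedSlotsBelow(int initialExplicitVersion) {
  return initialExplicitVersion - kFirstAutomaticVersion;
}

// Toy model: every checksum not seen before gets the next version number,
// starting at 3; a checksum seen before keeps its number.
class ToyAutoVersioner {
public:
  int versionFor(unsigned int checksum) {
    for (std::size_t i = 0; i < seen_.size(); ++i) {
      if (seen_[i] == checksum) {
        return kFirstAutomaticVersion + static_cast<int>(i);
      }
    }
    seen_.push_back(checksum);
    return kFirstAutomaticVersion + static_cast<int>(seen_.size()) - 1;
  }

private:
  std::vector<unsigned int> seen_;  // checksums in order of first appearance
};
```

With this model, unversionedSlotsBelow(10) is 7, matching the versions 3 through 9 that remain available to unversioned instances.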
Note that the templated classes do not have any versioning information in classes_def.xml. Versioning of CMS provided templates will be done by instrumenting a function in the template code itself.
Should the class SampleProd be modified by adding, removing, renaming, or changing the type of any non-transient non-static data member, the new classes_def.xml file might look like this:
<class name="SampleProd" ClassVersion="11">
<version ClassVersion="11" checksum="169027539"/>
<field name="square_" transient="true"/>
<version ClassVersion="10" checksum="238838498"/>
<field name="square_" transient="true"/>
</class>
<class name="std::vector<SampleProd>"/>
<class name="edm::Wrapper<std::vector<SampleProd> >"/>
The new class version number is 11. The information about the previous version remains in the file for backward compatibility.
A set of instructions applicable to recent releases (tested in 11_0) is available here. The instructions above in this section need updating accordingly.
Schema evolution
ROOT is able to deal automatically with some changes to a class (i.e. schema evolution), such as
- dropping a data member
- adding a data member (when reading back old versions, the new data member will have the value it was assigned in the class's default constructor)
- changing a data member from one builtin type to another builtin type, e.g. from float to double or unsigned short to unsigned long.
However, it is possible to explicitly handle schema changes by adding a snippet of code to the classes_def.xml file. ROOT's documentation for that feature can be found at http://root.cern.ch/root/html532/io/DataModelEvolution.html.
classes_def.xml entries for a transient only EDProduct
In rare cases, you may wish to put a product in the event that you wish NEVER to be output into an EDM/ROOT file.
If you specify persistent="false" in the classes_def.xml file in the appropriate place (an example is just below), the framework will guarantee that any EDProduct of this class will never be written to an EDM/ROOT file. You do not need a dictionary for constituents of the transient only class. In CMSSW_4_3_0_pre3 and prior releases, you do not need to specify a dictionary for a Wrapper for the transient only class. In CMSSW_4_3_0_pre4 and later releases, a dictionary for the wrapper must be specified.
CMSSW_4_3_0_pre4 or later release example:
<lcgdict>
<class name="SampleProd"/>
<class name="std::vector<SampleProd>"/>
<class name="edm::Wrapper<std::vector<SampleProd> >" persistent="false"/>
</lcgdict>
CMSSW_4_3_0_pre3 or prior release example:
<lcgdict>
<class name="SampleProd"/>
<class name="std::vector<SampleProd>" persistent="false"/>
</lcgdict>
Note that this directive does not prevent the output of an EDProduct containing objects of type std::vector<SampleProd>. It only prevents the output of an EDProduct of the exact type std::vector<SampleProd>.
Create classes.h
This C++ file must contain an #include (direct or indirect) of a header containing a full definition of each class or template used in classes_def.xml.
#include "DataFormats/<mypackage>/interface/SampleProd.h"
#include "DataFormats/Common/interface/Wrapper.h"
Do a build
You can now run "scram b". You will see the dictionary generation step.
Watch for errors or warnings during this step, because the build might continue even if there are messages. If you see a message like "WARNING: dictionary not generated for XX", you must go back to classes_def.xml and classes.h and fix the problem. The problem could be as simple as a missing or misspelled namespace name, or a class used in a later declaration that you forgot to mention. Remember to make instances of any template used in the classes.h file.
After a successful build, run edmPluginDump to make sure your new product and the associated entries from classes_def.xml are included in the list.
Producing the EDProduct
NOTE: The producer must not be in the same package as the EDProduct.
The producer package is a plug-in, and must follow the rules for plug-ins:
https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideDeclarePlugins
This section only contains code relevant to producing and using a new EDProduct. Any producer that makes one of these new products must announce that it makes it.
This section is not directly relevant to constituents, as only EDProducts have producers.
Here is an example EDProducer constructor and produce function that makes one instance of a SampleCollection. Because we are inserting only one product of each type (and, in this case, we are dealing with only one type), there is no need to add an extra instance name. The class of your EDProducer must publicly inherit from class EDProducer.
Note that Event::put() does not immediately add the product to the event. It simply places the product in a queue. Immediately after the EDProducer returns successfully, the framework will put any queued products into the event. This means that a product cannot be obtained from the event by the same EDProducer instance that produced it.
[NOTE: You can use the skeleton code generator mkedprod to generate the initial C++ source files, which include examples.]
SamplesProducer::SamplesProducer(edm::ParameterSet const& ps) {
// note: no argument in the call to produces
// The product instance name will be empty.
// The "setBranchAlias()" for setting the ROOT branch alias is optional.
// If "setBranchAlias()" was not used, the aliases would default to the module label.
produces<SampleCollection>().setBranchAlias("SampleCollection");
}
void SamplesProducer::produce(edm::Event& e, edm::EventSetup const&) {
auto result = std::make_unique<SampleCollection>();
// ... fill the collection ...
e.put(std::move(result));
}
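The deferred behaviour of Event::put() described before this example can be pictured with a toy model. ToyEvent below is a hypothetical stand-in written for illustration only. It is not the edm::Event interface, and its commit() merely plays the role of what the framework does after the producer returns successfully.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Repeated from the product header so that this sketch is self-contained.
struct SampleProd {
  explicit SampleProd(int v) : value_(v) {}
  SampleProd() : value_(0) {}
  int value_;
};
typedef std::vector<SampleProd> SampleCollection;

// Hypothetical toy event: put() only queues, commit() makes products visible.
class ToyEvent {
public:
  void put(std::unique_ptr<SampleCollection> product) {
    pending_.push_back(std::move(product));
  }
  // Stand-in for the step performed after produce() returns successfully.
  void commit() {
    for (auto& p : pending_) {
      committed_.push_back(std::move(p));
    }
    pending_.clear();
  }
  // Number of products a consumer could see.
  std::size_t visibleCount() const { return committed_.size(); }

private:
  std::vector<std::unique_ptr<SampleCollection>> pending_;
  std::vector<std::unique_ptr<SampleCollection>> committed_;
};

// Plays the role of produce(): returns how many products are visible
// right after put(), i.e. before the commit step.
std::size_t visibleDuringProduce(ToyEvent& e) {
  auto result = std::make_unique<SampleCollection>();
  result->emplace_back(42);
  e.put(std::move(result));
  return e.visibleCount();
}
```

In this model nothing is visible inside visibleDuringProduce(). The product only becomes visible once commit() has run, which is why a producer cannot read back its own product.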
Here is an example producer constructor and produce function that makes two named instances of SampleCollection. Each call to put is now given an instance name, so that the two instances of the same class, created by the same producer instance, can be distinguished within the Event.
SamplesProducer::SamplesProducer(edm::ParameterSet const& ps)
{
// notes: The argument string (e.g. "one") identifies the instance name.
// The "setBranchAlias()" for setting the ROOT branch alias is optional.
// If "setBranchAlias()" was not used, the aliases would default to "one" and "two".
produces<SampleCollection>("one").setBranchAlias("SampleCollectionOne");
produces<SampleCollection>("two").setBranchAlias("SampleCollectionTwo");
}
void SamplesProducer::produce(edm::Event& e, edm::EventSetup const&)
{
auto result1 = std::make_unique<SampleCollection>();
auto result2 = std::make_unique<SampleCollection>();
// ... fill the collections ...
e.put(std::move(result1),"one");
e.put(std::move(result2),"two");
}
Starting from CMSSW_12_1_0_pre2, the data product type can be omitted from the call to produces() when its return value is assigned to an edm::EDPutTokenT. The token can (and should) also be used in conjunction with instance names.
class SamplesProducer {
...
edm::EDPutTokenT<SampleCollection> putToken_;
};
SamplesProducer::SamplesProducer(edm::ParameterSet const& ps) :
putToken_{produces()} {}
void SamplesProducer::produce(edm::Event& e, edm::EventSetup const&) {
auto result = std::make_unique<SampleCollection>();
// ... fill the collection ...
e.put(putToken_, std::move(result));
}
If a module does not put the product in the event itself but instead uses
a helper class, then it can allow the helper class to also make the
calls to the "produces" function. This is done by calling the function
"producesCollector()" which returns a ProducesCollector. Then the
ProducesCollector is passed as an argument into the constructor of
the helper class. The ProducesCollector class defines "produces" functions
and calling them as member functions of the ProducesCollector will
have the same effect as calling the module "produces" member functions
directly. (Note ProducesCollector was added to the repository in the
11_0_X release series. It didn't exist before then.)
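The delegation described above can be pictured with a toy model in plain C++. All class names below (ToyProducesCollector, ToyHelper, ToyModule) are hypothetical stand-ins and not the edm classes. The sketch only shows the shape of the pattern: the module hands a collector to its helper, and the helper's declarations end up registered with the module.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for ProducesCollector: forwards each declaration
// to the list owned by the module.
class ToyProducesCollector {
public:
  explicit ToyProducesCollector(std::vector<std::string>* declared)
      : declared_(declared) {}
  void produces(std::string const& instanceName) {
    declared_->push_back(instanceName);
  }

private:
  std::vector<std::string>* declared_;  // not owned
};

// Hypothetical helper class that declares the products it will later put.
class ToyHelper {
public:
  explicit ToyHelper(ToyProducesCollector collector) {
    collector.produces("one");
    collector.produces("two");
  }
};

// Hypothetical module: passes its collector to the helper's constructor.
class ToyModule {
public:
  ToyModule() : helper_(producesCollector()) {}
  std::vector<std::string> const& declared() const { return declared_; }

private:
  ToyProducesCollector producesCollector() {
    return ToyProducesCollector(&declared_);
  }
  std::vector<std::string> declared_;  // declared before helper_ so it is constructed first
  ToyHelper helper_;
};
```

After constructing a ToyModule, its declared() list holds "one" and "two", just as if the module had made the two declarations itself.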
Review Status
Responsible: WilliamTanenbaum
Last reviewed by: Reviewer