Generic Fit Function Tools

Complete: 3

Purpose

The most widely and recommended fitting tool in CMS is CMS.RooFit.

This section describes an alternative toolkit that allows to easily define in C++ function models to be used in fit problems.

Those fit tools have been originally developed for some BaBar analyses, then partially ported to CMSSW and used for Z → μ+μ- studies.

The toolkit has two main features:

  • unlike CMS.RooFit it is based on template specialization, and is implemented without using polymorphism in fit functions, which are built from inlined code that gets optimized by the compiler and doesn't cause run-time overhead;
  • functions are described using a "natural" language that allows users to easily define custom fit models.

Disclaimer

Many fit applications are based on RooFit, which is supported under ROOT distribution. This toolkit is not intended to be a competitor of RooFit, it is just a possible alternative for the cases where defining a customized application may be more convenient.

This tool was ported to CMSSW to make easier the study of Z → μ+μ- inclusive cross section, and support for more general features will be considered only if there will be interest in the CMS community.

Package and Namespaces

The toolikit is all contained in the package:

The following namespaces are used in this toolkit:

funct:: Function definitions
fit:: Fit tools
root:: Root utilities and wrappers

Functions and Expressions

Functions are specified as C++ function objects taking one or more arguments (up to two or three in some cases currently supported, but extensible). A simple example is the following:

struct MyFunct {
  double operator()(double x) { return exp(-x)*x; }
};

So, one can call the following code:

MyFunct f;
double y = f(x);

Expressions are like functions, but take no argument. One example is the following:

struct PiOverTwo {
  double operator()() { return M_PI / 2; }
};

Functions and expressions become more useful than the above simple examples if parameters can be specified, as described in the following section.

Function Parameters

Functions may be defined in terms of parameters that can be passed to function/expression constructors. A convenient way to allow more functions to share the same parameter is to use boost::shared_ptr in the implementation. The utility funct::Paramter helps passing parameters adding more functionalities to the bare boost::shared_ptr (adds a string for the name, conversions to and from double). A simple example of parametric function is the Gaussian defined below:

namespace funct {
  struct Gaussian {
    static  const double oneOverSqrtTwoPi = 1/sqrt(2*M_PI);
    Gaussian(const Parameter & m, const Parameter & s) :
      mean(m.ptr()), sigma(s.ptr()) { }
    double operator()(double x) const {
      double z = (x - *mean)/ *sigma;
      return oneOverSqrtTwoPi/ *sigma * exp(-z*z/2);
    }
    boost::shared_ptr<double> mean, sigma;
  };
}

To force, for instance, two Gaussians to have the same mean value, user code may look like the following:

Parameter mean("mean", 0.0), sigma1("sigma1", 1.0), sigma2("sigma2", 2.0);
Gaussiam g1(mean, sigma1), g2(mean, sigma2);
double x = 1.2345;
double y1 = g1(x), y2 = g2(x);

Function Composition and Manipulation

Functions and expressions can be composed using:

  • algebraic operations (+, -, *, /, power raising ^; beware, of the unexpected precedence of the ^ operator in C++!)
  • math functions
  • composition (assuming that the first function takes one single argument)
  • convolution (a basic numerical integration algorithm is provided in the current implementation)

A new function type is generated by the function argument types. Examples are:

typedef
  Sum<Product<Parameter,Gaussian>::type,
      Product<Parameter,Gaussian>::type>::type DoubleGaussian;

Parameter a("a"), b("b"), mean("mean"), sigma1("sigma1"), sigma2("sigma2");
Gaussian g1(mean, sigma1), g2(mean, sigma2);
DoubleGaussian g = a * g1 + b * g2;

Symbolic Expressions

Symbolic expressions are useful to define with simple C++ code complex functions without explicitly defining a new type. Symbolic expression manipulations, including simplifications, are evaluated using the compiler, and involve no polymorphism, and produce inlined code that is fully optimized by the compiler.

Symbolic Variables

One has to first define the interesting variables. Variable types X, Y and Z are defined in the namespace funct:: by default:

X x;
Y y;

More variable types are easy to add:

// in the .h file:
DEFINE_VARIABLE(W, "w");
// in the .cc file:
IMPLEMENT_VARIABLE(W);
// in user code:
W w;

Symbolic manipulations

Expressions like the following can be easily defined:

X x; Y y;
Expression expr = sin(x) * cos(y);

x = 1.234; y = 4.321;
double value = expr();

Numerical Constants

The only caveat is that you can't use explicitly numbers in your expression, but you have to use provided utilities:

X x;
Numerical<1> _1; // this is a constant equal to 1
Numerical<2> _2; // this is a constant equal to 2

Expression expr = ((x - _1)^_2);
x = 3;
double value = expr(); // evaluates to (3-1)^2 = 4

Fractions can be used as follows:

Fraction<1,2> _1_2; // this evaluates to 1/2

Numerical constants and fractions can be generated on the fly with helper function templates:

Expression expr1 = x ^ num<2>(); // x ^2
Expression expre2 = x = fract<1, 2>(); // x ^ (1/2)

Symbolic Derivatives

Symbolic derivatives are computed at compile time, including term simplifications, and involve no run-time overhead. Higher order derivative can also be computed, according to the following example:

cout << derivative<X>(sin(x)*cos(x)) << endl;
cout << nth_derivative<2, X>(sin(x)*cos(x)) << endl;
cout << nth_derivative<3, X>(sin(x)*cos(x)) << endl;
cout << nth_derivative<4, X>(sin(x)*cos(x)) << endl;
This will print the following:
  cos(x)^2 - sin(x)^2
  -4 sin(x) cos(x)
  -4 ( cos(x)^2 - sin(x)^2
  16 sin(x) cos(x)
More examples are provided below:

Converting Expressions into Functions

Expressions can be converted into functions using the utility Function, which has as template parameters the variable(s) that need to be taken from the expression and converted into C++ arguments. One example is:
funct::Parameter yield(kYield, commands.par(kYield));
funct::Parameter mean(kMean, commands.par(kMean));
funct::Parameter sigma(kSigma, commands.par(kSigma));
funct::Parameter c("C", 1./sqrt(2*M_PI));

funct::Numerical<2> _2;
    
funct::Function<funct::X> f = yield * c * exp(-(((x-mean)/sigma)^_2)/_2);

double y = f(1.2345);

Fit Tools

ROOT version of Minuit is wrapped in utilities that make easy to write fit code. One example of fit is the following:
typedef funct::Product<funct::Parameter, funct::BreitWigner>::type FitFunction;
typedef fit::HistoChiSquare<FitFunction> ChiSquared;

funct::Parameter yield("Yield", 1000);
funct::Parameter mass("Mass", 91.2);
funct::Parameter gamma("Gamma", 2.50);
funct::BreitWigner bw(mass, gamma);

FitFunction f = yield * bw;
TF1 startFun = root::tf1("startFun", f, 0, 200, yield, mass, gamma);
TH1D histo("histo", "Z mass (GeV/c)", 200, 0, 200);
histo.FillRandom("startFun", yield);
ChiSquared chi2(f, &histo, 80, 120);
fit::RootMinuit<ChiSquared> minuit(chi2, true);
minuit.addParameter(yield, 100, 0, 10000);
minuit.addParameter(mass, 2, 70, 120);
minuit.addParameter(gamma, 1, 0, 5);
minuit.minimize();
minuit.migrad();

Chi-squared fits

Any parametric expression can be used to be minimized. What is released so far is:

  • fit::HistoChiSquare: chi-squared function based on an histogram and a fit function model.
  • fit::MultiHistoChiSquare: simultaneous chi-squared on multiple histograms (up to three are suported) each with a fit function model.

Unbinned Maximum Likelihood fits

Likelihood fits have been tested for a single dimension so far. They can be performed using the class template:

  • fit::Likelihood<Sample, PDF>
where Sample is the container that holds the data values. Could be std::vector for a single dimensional problem.

For Extended unbinned Maximum Likelihood fits, an extra parameter specifying the type of yield to befitted has to be specified. Typically it is just funct::Parameter:

  • fit::Likelihood<Sample, PDF, funct::Parameter>

Driving the Fit with Data-Cards

The sequence of minuit commands can also be passed via data-cards, in order to avoid recompiling in case of small changes, like initial parameter values, fixing and releasing parameters, and so on. The last example of the section below uses the following data-card file as input:

The supported commands are:

  • par <name> <value> <min> <max> [free|fixed]
  • fix <name>
  • release <name>
  • set <name> <value>
  • minimize
  • migrad
  • print_all

Fit Examples

Complete examples are released in CVS:

Integration with CMS.RooFit

Expressions can be integrated in RooFit with the adapter tool root::RooFitFunction. An example of how to do it is shown below:

Interactive use of such adapters requires the generation of dictionaries, which should be trivial, but has not been tested yet.

Possible Extensions

The presented approach was extended in BaBar version with the following additional features, not ported (yet?) to CMSSW:
  • Symbolic integration, in the simplest cases
  • PDF (Probability Density Function) manipulation
  • Random Number generation from PDFs

References

Edit | Attach | Watch | Print version | History: r8 < r7 < r6 < r5 < r4 | Backlinks | Raw View | WYSIWYG | More topic actions
Topic revision: r8 - 2009-05-27 - LucaLista
 
    • Cern Search Icon Cern Search
    • TWiki Search Icon TWiki Search
    • Google Search Icon Google Search

    CMSPublic All webs login

This site is powered by the TWiki collaboration platform Powered by PerlCopyright & 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback