Upsilon Analysis Tutorial @ PKU


CMS group at Fermilab LPC, Data Analysis School

(scope of this twiki)

This twiki page presents a tutorial for beginners of CMS at Peking University.The attendees are supposed to be familiar with basic ROOT and CMSSW.


In these Exercises we will measure the Upsilon resonance cross section as presented in arXiv:1012.5545 (CMS-AN). We will reproduce the same results as obtained in the paper using the 3/pb real CMS data. We will learn how to carry out a complete analysis from soup to nuts.

The two principal motivations for the study of the inclusive differential production cross section of the Υ(nS) family of resonances at CMS are: (i) elucidation of the physical processes (hadroproduction) that produce the Υ(nS) family in proton-proton collisions, and (ii) calibration of the CMS detector for low pT muons.

The differential cross section is given by

   \begin{displaymath} \frac{d\sigma (pp \to \Upsilon(nS))}{dp_T} \times Br(\Upsilon(nS) \to \mu^{+}\mu^{-}) = \frac{N^{fit}_{\Upsilon(nS)}(p_T; {\cal{A}}, \epsilon)}{\int {\cal{L}} dt \cdot \Delta p_T}   \end{displaymath}

The determination of the l.h.s is obtained from the r.h.s measured inputs:

The implementation presented is based on the analysis of dimuon events. Online selection is based on the detection of two muons at the hardware trigger level, without an explicit pT requirement and without further selection at the HLT. Such criteria, forming the L1 trigger path, was sufficient to maintain a dimuon trigger without prescaling at the instantaneous luminosities of the LHC start-up. All three muon systems, DT, CSC and RPC, take part in the trigger decision.

The basic objects, referred to as tracker muons, correspond to tracks reconstructed in the silicon tracker and associated with a compatible signal in the muon detectors. (Read more about tracker muons, global muons, standalone muons, and Calo muons)

Along with executing the various steps involved in a physics analysis, the attendee will have become familiar with:

  • determining detector acceptances
  • data based methods for measuring reconstruction efficiencies
  • mc generation and sample manipulations, reconstruction, fitting
  • evaluation of systematic uncertainties
  • using various standard CMS software tools and packages.

The Upsilon Analysis Workflow

Analysis Steps

Exercise 1: Setup the working area

Release & Software

The CMSSW release used in this tutorial relies on SLC5.x. Till now, the available nodes are node02, node09, node10, node11, node12.

Login to or hepfarm01. And ceate working directory, set cmssw release:

ssh -Y node02
source /home/zhlinl/

PLEASE make sure that you are not doing this tutorial at HOME directory , its disk space is very limited :-)

mkdir UpsilonAna 
cd UpsilonAna
cmsrel CMSSW_3_8_7
cd CMSSW_3_8_7/src

The shell file is to set the Environment.

SLCversion=`grep Scientific /etc/redhat-release | sed 's/.*[rR]elease \([0-9]*\)\..*/\1/'`

if [ $SLCversion = 4 ]
export VO_CMS_SW_DIR=/scratch/cms/sw
export SCRAM_ARCH=slc4_ia32_gcc345
source $VO_CMS_SW_DIR/

if [ $SLCversion = 5 ]
export VO_CMS_SW_DIR=/home/zhlinl/cluster2/sw
export SCRAM_ARCH=slc5_ia32_gcc434
source $VO_CMS_SW_DIR/

#export  (this outdated)
#cvs login

To checkout the required software tags, you need login to the CVS system, the password is 98passwd :

cvs login

Then do

cvs co -d zhlinl/UpsilonAna  UserCode/zhlinl/PKUTutorial
addpkg PhysicsTools/TagAndProbe V02-06-11 
addpkg MuonAnalysis/MuonAssociators  V01-11-00
addpkg MuonAnalysis/TagAndProbe V07-02-00
addpkg PhysicsTools/PatAlgos
cvs update -r V08-01-02 PhysicsTools/PatAlgos/python/tools/
cvs co -r V00-12-01 HeavyFlavorAnalysis/Onia2MuMu
cvs co -d pytnp  UserCode/zhlinl/TagAndProbe/pytnp
cvs co -d zhlinl/UpsAcc UserCode/zhlinl/UpsAcc
cvs co -r CMSSW_3_8_7 IOMC/ParticleGuns
tar xzvf zhlinl/UpsAcc/test/upsiAcc_patch361p4.tgz

cp zhlinl/UpsilonAna/test/ pytnp/
cp zhlinl/UpsilonAna/test/ pytnp/

scramv1 b -j7                                         
this takes a few minutes.

Backup of ROOT files, that will be used in this tutorial, are located on hepfarm @ PKU:


Data Samples



Exercise 2: Skim and PAT-tify the Data Samples

We use the standard HeavyFlavorAnalysis/Onia2MuMu package. The aim of Onia2MuMuPAT is to provide to the CMS analysis community:

  • A flexible module that can be used to make di-muon candidate objects starting from muons filling in some analysis level variables which would be tricky or tedious for everybody to compute by themselves
  • A standard configuration to produce di-muon candidates using the above mentioned module, so that it can be saved in group skims and used consistently among the different analyzes.

Go to the following directory:

 cd HeavyFlavorAnalysis/Onia2MuMu/test/

Use the example root file, MuOnia_Run2010A_ReReco-v1.root, as input in the configuration file As you see, we will only run on 1000 events as a test. Execute the following command:

 cmsRun >& log &
tail -f log

it will take several minutes.

When the job is done, you will see a summary report like the example below and an output PAT-tuple. In the PAT-tuple, muons are referred to as pat muons.



Question Question 2-1: How many events failed to pass the selection criteria in Onia2MuMuPAT?

Open the PAT-tuple via ROOT:

root -l onia2MuMuPAT.root

new TBrowser


Question Question 2-2: What is the mean pt and eta of the pat muons with trigger?

Read more about rapidity y and pseudorapidity eta.

Question Question 2-3: Is there any mass cut on the dimuon candidates? (mention how you verified this)

    process.onia2MuMuPatTrkTrk = cms.EDProducer('Onia2MuMuPAT',
        muons = cms.InputTag("patMuonsWithTrigger"),
        beamSpotTag = cms.InputTag("offlineBeamSpot"),
        primaryVertexTag = cms.InputTag("offlinePrimaryVertices"),
        # At least one muon must pass this selection
        higherPuritySelection = cms.string("(isGlobalMuon || isTrackerMuon || (innerTrack.isNonnull && genParticleRef(0).isNonnull)) && abs(innerTrack.dxy)<4 && abs(<35"),
        # BOTH muons must pass this selection
        lowerPuritySelection  = cms.string("(isGlobalMuon || isTrackerMuon || (innerTrack.isNonnull && genParticleRef(0).isNonnull)) && abs(innerTrack.dxy)<4 && abs(<35"),
        dimuonSelection  = cms.string("2 < mass"), ## The dimuon must pass this selection before vertexing
        addCommonVertex = cms.bool(True), ## Embed the full reco::Vertex out of the common vertex fit
        addMuonlessPrimaryVertex = cms.bool(True), ## Embed the primary vertex re-made from all the tracks except the two muons
        addMCTruth = cms.bool(MC),      ## Add the common MC mother of the two muons, if any
        resolvePileUpAmbiguity = cms.bool(False)   ## Order PVs by their vicinity to the J/psi vertex, not by sumPt                            

For more information, check the Onia2MuMuPAT twiki page. As running this step on the whole dataset is time consuming, we will provide you the output skimmed PAT-tuples for the data samples directly.

skimmed PAT-tuple: /MuOnia/zgecse-Run2010A-Nov4ReReco_v1-Onia2MuMu-v6-52b0ce2a29246dfcd7fb39d814d6fb33/USER

skimmed PAT-tuple: /MuOnia/yzheng-Run2010B-Nov4ReReco-v1-Onia2MuMu-v6-Dec2-52b0ce2a29246dfcd7fb39d814d6fb33/USER

Exercise 3: Upsilon Selection and Yield Tree

The purpose of this exercise is to create a ROOT TTree with the Upsilon candidate information to be used for fitting the signal yield. The input is the skimmed data sample (obtained from Exercise 2) that contains the PAT objects, produced in the previous exercise. The output TTree has to contain the basic information (e.g. mass, pt) of the Dimuon/Upsilon candidates, as well as the information of the daughter muons.

cd $CMSSW_BASE/src/zhlinl/UpsilonAna/test/EventSelection

Replace the input root files with the one you created in Exercise 2 or the backup file onia2MuMuPAT-Run2010A-ReReco_v1.root in the configuration file


Inspect the output root file:

root -l selection.root


Question Question 3-1: How many Upsilon candidates are in the output tree?

Below is a summary of the selection criteria used in the analysis:

The dimuon candidate satisfies:

  • opposite sign
  • Vertex Probability > 0.001
  • | -| < 2.
  • abs(rapidity) < 2

Each muon satisfies:

  • > 3.5 if |mu.eta| <1.6, >2.5 if 1.6<|mu.eta|<2.4
  • is tracker muon
  • mu.innerTrack.numberOfValidHits > 11
  • mu.innerTrack.hitPattern.pixelLayersWithMeasurement > 0
  • mu.innerTrack.normalizedChi2 < 5
  • < 25
  • mu.dB < 0.2
  • both muons matched to trigger objects

Question Question 3-2: Please go to CMSSW Reference Manual to understand the meaning of each variables listed above. "innerTrack" means the part of track in the inner tracker. Similarly, "outerTrack" means the part of track in the muon detector. What's the meaning of

Choose the CMSSW release first. In this case, it is "CMSSW_3_8_7". Then choose the "Namespaces" as "pat". Click on class "Muon", you will see a list of member functions, like "dB" and "innerTrack". Click on the member function that you want to understand more.

Question Question 3-3: Please inspect the python file python/ and compare the SELECTIONCUT with the above cuts, add the missing ones.

SELECTIONCUT  = " charge = 0"
SELECTIONCUT += " && userFloat('vProb') > 0.001"
SELECTIONCUT += " && abs(rapidity) < 2."
SELECTIONCUT += " && abs(daughter('muon1') - daughter('muon2') < 2."

SELECTIONCUT += " && ((daughter('muon1').pt > 3.5 && abs(daughter('muon1').eta) < 1.6) || (daughter('muon1').pt > 2.5 && 1.6 < abs(daughter('muon1').eta) < 2.4))"
SELECTIONCUT += " && daughter('muon1').isTrackerMuon "
SELECTIONCUT += " && daughter('muon1').innerTrack.numberOfValidHits > 11 "
SELECTIONCUT += " && daughter('muon1').innerTrack.hitPattern.pixelLayersWithMeasurement > 0 "
SELECTIONCUT += " && daughter('muon1').innerTrack.normalizedChi2 < 5 "
SELECTIONCUT += " && abs(daughter('muon1') < 25 "
SELECTIONCUT += " && abs(daughter('muon1').dB) < 0.2 "

SELECTIONCUT += " && ((daughter('muon2').pt > 3.5 && abs(daughter('muon2').eta) < 1.6) || (daughter('muon2').pt > 2.5 && 1.6 < abs(daughter('muon2').eta) < 2.4))"
SELECTIONCUT += " && daughter('muon2').isTrackerMuon "
SELECTIONCUT += " && daughter('muon2').innerTrack.numberOfValidHits > 11 "
SELECTIONCUT += " && daughter('muon2').innerTrack.hitPattern.pixelLayersWithMeasurement > 0 "
SELECTIONCUT += " && daughter('muon2').innerTrack.normalizedChi2 < 5 "
SELECTIONCUT += " && abs(daughter('muon2') < 25 "
SELECTIONCUT += " && abs(daughter('muon2').dB) < 0.2 "

SELECTIONCUT += " && !daughter('muon1').triggerObjectMatchesByFilter('SelectionTrigger').empty() && daughter('muon1').userFloat('muonL1Info:deltaR')<0.3"
SELECTIONCUT += " && !daughter('muon2').triggerObjectMatchesByFilter('SelectionTrigger').empty() && daughter('muon2').userFloat('muonL1Info:deltaR')<0.3"

After adding all the cuts, change the output root file name to "selection_cut.root" and re-run the

Question Question 3-4: Now how many Upsilon candidates are in the new output tree?

Question Question 3-5: What is the eventID of the Upsilon candidate that is closest to the real Upsilon(1S) mass?

Now we are ready to submit some test crab jobs. BUT this can only be done on the UI ( or

Edit the $CMSSW_BASE/src/zhlinl/UpsilonAna/test/EventSelection/multicrab.cfg, comment out the [Jpsi] part as we don't need it in this exercise, and use the correct datasetpath. Change the CMSSW.total_number_of_lumis = -1 ("-1" means all the events contained in the dataset) to CMSSW.total_number_of_lumis = 30 and CMSSW.number_of_jobs = 500 to CMSSW.number_of_jobs = 10. Inspect the file again before submitting the jobs.

multicrab -create -submit                                 

You can check the job status any time by doing:

crab -status -c MuOnia  
crab -status -c MuOniaB      

After all jobs are done, you can use crab -getoutput to retrieve the output files and then use hadd to merge them.

In the exercise, we prepared the results directly for everybody, MuOnia.root. The Exercises 5&6 will rely on the output from this exercise.


Exercise 4: Calculation of Acceptance


We separate the total efficiency to reconstruct a muon in CMS into acceptance and efficiency. The acceptance includes effects such as detector geometry, kinematics criteria, pT and eta of each muon and the rapidity of the dimuon. We calculate the acceptance in Exercise 4 and use data based technique to measure the efficiencies in Exercise 5.


The Υ → μ+μ− acceptance of the CMS detector is defined as the product of two terms. The first is the fraction of upsilons of given pT and |η| such that each of the two muons satisfies:

   \begin{displaymath}     {p_{T}^{\mu} > 3.5 {\rm GeV/}c \qquad {\rm if} \quad |\eta^{\mu}| < 1.6, \qquad     {\rm or} \qquad p_{T}^{\mu} > 2.5 {\rm GeV/}c \qquad {\rm if} \quad 1.6 < |\eta^{\mu}| < 2.4}   \end{displaymath}
these kinematic cuts are implemented in step 3. below

The second is the probability that each muon can be reconstructed in the tracker using the CMS software, in the absence of further event activity in the detector and without quality cuts imposed. Both components are evaluated by simulation and parametrized as a function of the pT and rapidity of the Υ.

The acceptance is calculated from the ratio

   \begin{displaymath}     {\mathcal A}^\Upsilon \left(p_T,y \right) = \frac{N_{\rm reco}^\Upsilon \left( p_T,y \right)}{N_{\rm gen}^\Upsilon \left( p_T,y \right)}   \end{displaymath}
where the denominator is the number of upsilons generated in a (pT, y) bin, while the numerator is the number reconstructed in the same (pT, y) region but now using the reconstructed variables rather than the generated ones.

The acceptance is calculated for two-dimensional (2-D) bins of size (1 GeV/c × 0.1) in the reconstructed pT and y of the Υ and used in candidate-by-candidate yield corrections.

Step 1 - Generate edm input source

We use an "Upsilon gun" that produces events that contain only the Upsilons, distributed flat in pT and Rapidity, and the daughter muons. The events are generated with the EVTGEN package including the effects of electromagnetic final-state radiation (FSR). This sample is then fully simulated and reconstructed with the CMS detector simulation software to assess the effects of multiple scattering and finite resolution of the detector.

The job configuration in the $CMSSW_BASE/src/zhlinl/UpsAcc/test directory performs the above mentioned steps. We are planning to measure the Upsilon cross section in the Rapidity range (-2,2) and pT range (0,30) GeV. So we should adjust the kinematic phase-space of the Upsilon to somewhat larger range as, due to resolution effects, an Upsilon generated outside the measured range can be reconstructed within. Please edit the as shown below

process.generator = cms.EDProducer("FlatRapidFlatPtGunProducer",
                                   PGunParameters = cms.PSet(
    # you can request more than 1 particle
    PartID = cms.vint32(553),
    MinRapidity = cms.untracked.double(-2.6),
    MaxRapidity = cms.untracked.double(2.6),
    # for some reason these are tracked now...this gun doesnt use eta though
    # phi must be given in radians
    MinPhi = cms.double(-3.14159265358979323846),
    MaxPhi = cms.double( 3.14159265358979323846),
    MinPt  = cms.double(0),
    MaxPt  = cms.double(30),

The process schedule clearly shows the executed steps: generation, simulation,...,reconstruction:

process.pstart = cms.Path(process.generator)
process.p0 = cms.Path(process.pgen)
process.p1 = cms.Path(process.psim)
process.p2 = cms.Path(process.pdigi)
process.p3 = cms.Path(process.L1Emulator)
process.p4 = cms.Path(process.DigiToRaw)
process.p5 = cms.Path(process.RawToDigi)
process.p6 = cms.Path(process.reconstruction)
process.outpath = cms.EndPath(process.RECOSIM)

process.schedule = cms.Schedule(process.pstart,process.evtgen,process.p0,process.p1,process.p2,process.p3,process.p4,process.p5,process.p6,

Question Question 4-1: How do we ensure in that the Upsilon(1S) decay is only to muons?

Inspect the parameter user_decay_file in the configuration file.

Adjust the number of events to 100:

process.maxEvents = cms.untracked.PSet(
    input = cms.untracked.int32(100)
Test the configuration by running
cd $CMSSW_BASE/src/zhlinl/UpsAcc/test
A root file named ups1s_gun.root will be created used as input to the next step.

The large (40M Y(1S) events) samples used in the paper are located at T2_MIT and published locally on cms_dbs_ph_analysis_02(multiple batches of 10M):

  • Y(1S) pt 0-30
    • /Upsi1s-Acc-Gun-v1/mrudolph-Upsi1s-Acc-Gun-v1-4d722db526b65ca73eb58b833a090059/USER
    • /Upsi1s-Acc-Gun-v2/mrudolph-Upsi1s-Acc-Gun-v2-4d722db526b65ca73eb58b833a090059/USER
    • /Upsi1s-Acc-Gun-v3/mrudolph-Upsi1s-Acc-Gun-v3-4d722db526b65ca73eb58b833a090059/USER
    • /Upsi1s-Acc-Gun-v5/mrudolph-Upsi1s-Acc-Gun-v5-4d722db526b65ca73eb58b833a090059/USER
  • Y(1S) pt 30-50
    • /Upsi1s-Acc-Gun-v7/mrudolph-Upsi1s-Acc-Gun-v7-e4f0da8313faec637f0cfedd6e235f33/USER
  • Y(2S)
    • /Upsi2s-Acc-Gun-v1/mrudolph-Upsi2s-Acc-Gun-v1-5a3719f53c17d40ee772f6c63d6bad37/USER
  • Y(3S)
    • /Upsi3s-Acc-Gun-v1/mrudolph-Upsi3s-Acc-Gun-v1-f7919ff6bba81db039a3d38e5b5f0b53/USER

Step 2 - Create TTree

It would require significant space to save every object in the output file. Therefore, to save space, an analyzer has been prepared in the ../src directory that dumps only the necessary variables into a TTree. This basic code will also allow you to understand how to access information from a data sample root file. Nineteen parameters are saved into a TTree, including the generated momenta of Upsilons and muons along with the momenta of reconstructed tracks.

Inspect the python file, use "file:ups1s_gun.root" as the input. Then execute the command:

cd $CMSSW_BASE/src/zhlinl/UpsAcc/test/

To check the output, open the resulting upsilonTree.root file:

root -l upsilonTree.root

TBrowser a
Click on "ROOT Files", and then "upsilonTree.root" and then "UpsTree", you will find 19 branches inside. Click on each branch, to see the distribution of each parameter. Or you can scan the contents of the tree with the command UpsTree->Scan("*") in root. It will display all the branches in the tree like this:


Notice: The generated values of the Y and muons depends on the random number seed and so differ from the above example. However, the structure of the tree is the same.

You can submit a CRAB job to produce larger statistics. To save time, we have prepared a tree with nearly 10M events. Please copy it to your work directory.

ln -s /home/zhlinl/cluster2/UpsiAna2011/upsilonTree.root  $CMSSW_BASE/src/zhlinl/UpsAcc/test/
This will create a soft link to the real path of the root file.

Step 3 - Kinematic Cuts

In this step, we will show you how we decide the kinematic cuts. In Step 2, you've already produced a TTree for the "Upsilon Gun" events. Now, we'd like you to repeat Step 2 but using a MC sample generated via PYTHIA.

The MC samples used in the step are from the official Summer10 MC production: /Upsilon1SToMuMu_2MuEtaFilter_7TeV-pythia6-evtgen/Summer10-START36_V9_S09-v1/GEN-SIM-RECO

Please use DBS to find the sample. And run on one of the data files(you can download one file using this tool: File Mover). Run on 10000 events and use MCupsilonTree.root as the name of output.

After it has finished, inspect the macro kinematic.C and do the following:

root -l -b -q kinematic.C

Open the output file:

root -l kinematic.root

TBrowser a;

After looking at the histograms and understanding the macro,

Question Question 4.2: What is the meaning of the variables in the histogram "genHMuUpsPt" and "genLMuUpsPt"?

To make more informative histograms, we need a larger statistics tree. Please copy a tree with ~2M events to your work directory:

ln -s /home/zhlinl/cluster2/UpsiAna2011/MCupsilonTree.root $CMSSW_BASE/src/zhlinl/UpsAcc/test/

Please use it as input and re-run kinematic.C. In the output file, you will find three histograms similar to the following plots:

kinematic1_v2.png kinematic2_v2.png

Question Question 4-3: Please produce the following plots: (i) mass of the generated Upsilons, (ii) eta-pt occupancy map for generated and reconstructed muons, (iii) muon reconstruction efficiency. Is the generated upsilon mass single-valued? are the distributions in eta-pt of generated and reconstructed muons uniform? how does the efficiency for reconstructing low-pt muons vary between endcap and barrel detector regions, what are the detection pt thresholds? Please offer explanations for the observed effects.

Now you are familiar with the basic kinematics of the Upsilon to dimuon events. The plots below, based on which we decide the kinematic cuts, show the trigger efficiency and reconstruction efficiency.

trigger_mc.png reco_mc.png

Step 4 - Calculate Acceptance

Since the acceptance depends on the pT and Rapidity of the Upsilon, we calculate it in bins of pT and Rapidity. We are still in the directory $CMSSW_BASE/src/UpsAcc/test. There is a ROOT macro named acceptance.C. It defines a two dimensional histogram for the generated Upsilon pT and Rapidity and it is filled for each event of the upsilonTree.root sample generated in the previous step.

Run the macro:

root -l acceptance.C
This step may take 10 minutes. After it has finished, using the resulting root file as input, run the following macro:
root -l divide.C
This macro divides the histogram with kinematic cuts by the generator level histogram. Resulting root file acceptance.root has 2D acceptance maps as used by weighting code in Exercise 6. Weighting means applying the acceptance and efficiency corrections to the Upsilon yield tree. We will perform weighting in Exercise 6.

Question Question 4-4: Please calculate and draw the 1D acceptances as functions of pT and rapidity of Upsilon. Why is there a dip in the plot?

Question Question 4-5: At the beginning of Exercise 4, we stated that the acceptance was a product of two items. One is kinematic, the second is the probability that each muon can be reconstructed in the tracker using CMSSW. Does the acceptance depend on the MC faithfully reproducing the resolution for each kinematic quantity observed in the data?

Question Question 4-6: Because the acceptance assumes the absence of further event activity in the detector, are there additional effects that must be accounted for to obtain the cross section related to event activity?

Exercise 5: Using Tag and Probe to measure efficiencies

Tag and Probe methodology

Tag and Probe (T&P) is a data driven method for efficiency calculations of electrons and muons. It utilizes resonances to identify probe objects. Assuming no efficiency correlation between the two decay products of the resonance it identifies one of the leptons, called a Tag, using tight identification criteria and measures the efficiency of the other lepton, referred to as a Probe. In the Tag and Probe twikipage you can find generic documentation about the method and on the existing software infrastructure to perform these measurements in CMS. In this page, we will discuss the efficiency of muons with Tag and Probe method. And we will utilize a Jpsi resonance as a source of muons because it produced more copiously than the Upsilon.

Efficiency components

In the Upsilon analysis, the total muon efficiency is composed with three conditional terms:

   \begin{displaymath} \epsilon (total) = \epsilon (trig|id) \cdot \epsilon (id|track) \cdot \epsilon (track|accepted) = \epsilon_{trig} \cdot \epsilon_{id} \cdot \epsilon_{track}    \end{displaymath}

  • Tracking efficiency - it combines the efficiency that the accepted track of a muon from the Y(nS) decay is reconstructed in the presence of other activity in the silicon tracker, as determined with a track-embedding technique, and the efficiency for the track to satisfy quality criteria, as determined with the tag-and-probe technique
  • Muon identification efficiency - the probability that the track in the silicon tracker is identified as a muon
  • Trigger efficiency - an identified muon satisfies the trigger

The component of the tracking efficiency measured with the track-embedding technique is well described by a constant value of (99.64 ± 0.05)%. The efficiency of the track quality criteria measured by the T&P method is likewise nearly uniform and has an average value of (98.66 ± 0.05)%. As a result, in this exercise, we will only measure the muon identification efficiency and trigger efficiency with the standard T&P tool in this exercise. The code is in the PhysicsTools/TagAndProbe package.

Track Embedding Method

A sample of pp collision events is analyzed and the position of the primary vertex in each event is determined. Simulated muons with pT ranging from 0.5 to 500 GeV/c and with pseudo-rapidity in the range |η| < 3 were generated with initial positions corresponding to the primary vertices reconstructed in the data. These simulated muons were processed using the Monte Carlo detector simulation and event reconstruction procedure and were classified as being within the detector acceptance if a reconstructed track was found that had at least 75% of its hits originating from the generated muon. The pixel and strip detector hits from the pp collision were then merged with the simulated muon hits and the event reconstruction was repeated. The tracking inefficiency is calculated by dividing the fraction of embedded tracks that fail to be reconstructed after superimposing hits from pp collision events.

muonRecoEff_2d.gif muonTrackRecoEff_1d_projection_bin.gif

Step 1 - Produce tag-and-probe Ntuples

This step has already been finished in Exercise 3. It is performed using the TagProbeFitTreeProducer tool. In a CMSSW job running on EDM files like RECO/AOD or a skim in PAT format, one:

  • Defines the tags and the probes. Normally the tag is a high quality muon, matched to a trigger object. But there are different kinds of probes that you can consider for muon efficiencies. We use tracks in the inner tracker in this analysis.
  • The tags and probes are combined into pairs (usually with the CandViewShallowCloneCombiner package)
  • Probe variables and "passing probes" can be defined in two ways: (The generic documentation is in the Passing probe flag definitions)
    • In a simple way, using string expressions and cuts on the variables of the muon objects
    • Associating one object to another. (e.g. matching one muon object to another by ΔR)
  • On simulated events, the generator level information can also be used to mark which entries in the dataset correspond to true di-muons from the selected resonance

The output of this step is a set of ROOT trees containing the probe variables, which can then be combined with the hadd command and passed as input to the next step. We have already created these trees in Exercise 3.

Question Question 5-1: Open the root file produced in Exercise 3. There are 7 trees in it. Which one is created with the TagProbeFitTreeProducer tool?

looking at the root file with _file0->ls()

Question Question 5-2: Which kinds of objects are defined as tag, probe, and passing probe when calculating the muon identification efficiency?

Question Question 5-3: Which kinds of objects are defined as tag, probe, and passing probe when calculating the muon trigger efficiency?

Step 2 - Fit for the tag-and-probe efficiences

This step is performed using the TagProbeFitTreeAnalyzer tool. The tool could be used also standalone, but is normally wrapped as a CMSSW analyzer for the ease of parsing configuration files. The module takes as input:
  • A set of rootfiles produced from Step 1
  • Information about which columns to use inside of the dataset:
    • for the binning variables for the efficiency (e.g. pT, or (η, φ))
    • to select which are the passing probes, as the same dataset can contain multiple columns e.g. for different muon identification algorithms
    • p.d.f. of the signal, the passing background and the failing background
To extract the efficiency on the signal the module performs an unbinned maximum likelihood fit simultaneously on passing and failing probes.

Find the python scripts in the following directory

cd $CMSSW_BASE/src/zhlinl/UpsilonAna/test/TagAndProbe                                           

Inspect the configuration file and Set the input directory with the root file produced in Exercise 3. Change the output root files' names to "JPsi_DATA_MuonID.root" and "JPsi_DATA_Trigger.root".

cmsRun >& log_id &
cmsRun  >& log_trig &                               

it will take ~20 minutes to finish.

The outputs are RooDataSet JPsi_DATA_MuonID.root and JPsi_DATA_Trigger.root containing the efficiencies in each bin, plus a set of control plots.

Question Question 5-4: We use the double muon trigger for the analysis. But when measuring the efficiencies why do we require the tag muon to pass a single muon trigger?

Question Question 5-5: There is another python file under the same directory. Which efficiency does it fit for?

Step 3 - Generate efficiency plots

In this step we generate efficiency plots with the root files produced in Step 2 by using a python package pytnp.

This python package is intended to generate some useful plots related with efficiencies in an automatic way using the outputs provided by the CMSSW Tag & Probe package. The package contains the effPlots script to do the jobs which executes the pytnp.libPytnp module and the class pytnp.libPytnp.pytnpclass.pytnp which uses the root files provided by the CMSSW Tag & Probe package.

cd $CMSSW_BASE/src/pytnp                          
python install --user
export PATH=$PATH:$HOME/.local/bin    (setenv PATH ${PATH}:${HOME}/.local/bin # for csh) 

You can check your installation launching:

effPlots -h                               

Before execute the "effPlots" command, please copy the root files that you produced in Step 3 in this directory:

mv $CMSSW_BASE/src/zhlinl/UpsilonAna/test/TagAndProbe/JPsi_DATA_MuonID.root .
mv $CMSSW_BASE/src/zhlinl/UpsilonAna/test/TagAndProbe/JPsi_DATA_Trigger.root  .             

To produce the muonId efficiency maps:

effPlots -i JPsi_DATA_MuonID.root -m TM pt,abseta -c  
effPlots -i JPsi_DATA_Trigger.root -m L1DoubleMuOpen pt,abseta -c           
Resulting root files include 2D efficiency maps (as used by weighting code in Exercise 6).

Question Question 5-6: Please draw the 1D efficiency as functions of pT and eta.

Question Question 5-7: Why do we use Tag and Probe to measure efficiencies instead of using MC efficiencies directly?


Question Question 5-8: Change the binning to "pt = cms.vdouble(2.5, 3.5, 4.5, 6.0, 10.0, 50.0)," and "abseta = cms.vdouble(0.0, 0.8, 1.6, 2.4),", repeat step 2 and step 3 and compare the results with the previous ones that you obtained. What do you conclude? (note: The previous binning is the default one used in the whole analysis.)

Exercise 6: Cross Section

In this section of the exercise we extract the signal yield and apply the corrections determined in the previous sections, thus performing the measurement of Υ cross section.

We start from the generic expression L.σ=N where the number of events is given by the product of the luminosity and the cross section. There are several important precisions required here, in particular that experimentally we observe only a fraction of the produced Υs. This follows from the limited geometric acceptance of the detector, and inefficiencies with the which the #Upsilon;s are selected and reconstructed. In addition, only the dimuon final state is considered, and a correction to the respective branching ratio is needed.

In the following, we start from the Upsilon yield tree (obtained in Exercise 3), apply the acceptance and efficiency corrections (obtained in Exercises 4 and 5), then proceed with extracting the signal yield from the data sample, and finally achieve the cross section measurement.

Copy the upsilon yield tree computed above (in Ex.3) to the working directory, eg:

cd $CMSSW_BASE/src/zhlinl/UpsilonAna/test/xsection                                       
# copy the data file
ln -s /home/zhlinl/cluster2/UpsiAna2011/MuOnia.root ./
# copy the maps
mkdir maps
ln -s $CMSSW_BASE/src/zhlinl/UpsAcc/test/acceptance.root maps/
ln -s $CMSSW_BASE/src/pytnp/effMaps_JPsi_DATA_Trigger.root maps/
ln -s $CMSSW_BASE/src/pytnp/effMaps_JPsi_DATA_MuonID.root maps/
ln -s $CMSSW_BASE/src/zhlinl/UpsilonAna/test/maps/effMap_track.root maps/
# create output dir.s
mkdir figs; mkdir log;

Step 1 - Acceptance and efficiency corrections

The acceptance and efficiency 2D maps are applied to the Upsilon candidates, and for each the efficiency and acceptance correction values are computed and appended to the data yield tree. The ROOT macro makeWeights.C will take the yield file, MuOnia.root , as well as the efficiency and acceptance files as input and produce an output file upsilonYieldWeighted.root that contains the same upsilonYield tree but with extra column(s) included in addition with the candidates correction factors (evaluated based on the upsilon and decay muons kinematic properties).

We use the macro makeWeights.C to append the eff. and acc. correction factors to the data tree. Inspect the macro and change the input file names and histograms names to those you obtained in the previous exercises, i.e. change the following lines:

TString accHistoName     = "acc_pt35_eta16_pt25_eta24_trk_unpol"; //(pt,y)
TString effTrckHistoName = "h_hits2d"; //(eta,pt)
//TString effTrigHistoName = "TH2F_tagAndProbe_Trigger_TM_pt_abseta_fit_eff"; //(pt,eta)
//TString effMuidHistoName = "TH2F_tagAndProbe_TM_pt_abseta_fit_eff"; //(Pt,eta)
TString effTrigHistoName = "TH2F_tagAndProbe_LiDoubleMuOpen_Tight_fit_eff"; //(pt,eta)
TString effMuidHistoName = "TH2F_tagAndProbe_TM_fit_eff"; //(Pt,eta)

TString treeDir = "yieldUpsilonTree";
TString treeName = "probe_tree";

TString inFile  ="MuOnia.root";
TString accFile ="acceptance.root";
TString trkFile ="effMap_track.root";
TString idFile  ="effMaps_JPsi_DATA_MuonID.root";
TString trigFile="effMaps_JPsi_DATA_Trigger.root";
TString outFile ="upsilonYieldWeighted.root";
TString mapDir ="maps/";

root -l -b -q makeWeights.C >& log/wei.log &

Step 2 - Mass spectrum and signal yield

The number of upsilon events (signal yield) is determined from a fit to the dimuon invariant mass, in the range containing all three 1S, 2S and 3S states. The fitter uses the Roofit (Root) framework and is implemented in oniafitter.

The default Root version set in the CMSSW release (3_8_7) contains an old version of Rootfit. Please set a more recent root version as:

 source /home/zhlinl/ 
The recommended ROOT version is 5.26

Notice: If you want to use the command cmsRun, you have to reset the environment again using cmsenv. Inspect the oniafitter fitting code. Relevant methods are highlighted below. The signal and background PDFs are defined in the buildPdf() method.

  //input and output files
 const TString finput  = "upsilonYieldWeighted_nominal.root";
 const TString foutput = "xsection.root";

  // pt and rapidity binning (for differential measurements)
  const double rapBinArr [] = {-2.0, 0.0, 2.0};
  const double ptBinArr  [] = {0,1,2,3,4,5,6,7,8,9,10,12,14,17,20,30};

  // define workspace owing the relevant data and fit objects 
  RooWorkspace* ws = new RooWorkspace("ws","upsilon mass");

  // define the signal and background PDFs
  // read in the yield data tree

  // perform a  fit to the full dataset  

  // plot the mass and fitted model

These instructions have been gathered for convenience in simpleFit.C.

Question Question 6-1: Fit the mass spectrum and determine the Υ(nS) yields observed in 3/pb.

Question Question 6-2: From the fit to the invariant mass (i) quote the measured Υ mass and compare with the world average value, and (ii) determine the mass resolution (mention, and/or verify, whether it is expected to vary between central and endcap region and why).

Step 3 - First estimation of the integrated cross section

We have now the various ingredients in place for estimating the cross section. The cross section times branching ratio may be determined as

   \begin{displaymath} \sigma_{\rm{tot}}  \times Br(\Upsilon \to \mu\mu) =  \frac{N_{\Upsilon}}{L \cdot {\cal{A}} \cdot \epsilon}    \end{displaymath}

  • L denotes the luminosity of the sample
  • N is the observed yield
  • A is the detector acceptance
  • ε is the trigger and reconstruction efficiency.

The integrated luminosity of the dataset is obtained by running the LumiCalc script over the json file obtained from the crab data-processing job (exercise above) eg: -i lumiSummary.json --nowarning overview

Question Question 6-3: Perform a first estimate of the total Υ(1S) total cross section. (i) compute the average efficiency and acceptance values (by applying the previously computed A and ε maps to the selected upsilon candidates in the ttree); (ii) use the world average value of the relevant branching ratio; (iii) compare to measurements previously performed at lower energies.

Step 4 - The Υ(nS) differential cross sections

In exercise (6.3) we have obtained a first estimation of the Υ cross section, by applying average corrections. A more thorough method is to apply each of the acceptance and efficiency factors on a per-event basis as part of the mass fitting procedure. In this fashion, each event carries a weight which corresponds to the inverse of the product of the Υ-acceptance and μ-efficiencies. These weights correspond to the correcting factors computed and stored in the data tree in Step 1.

The outcome of the fit to the weighted dataset is the corrected signal yield. The cross section is accordingly given simply by

   \begin{displaymath} \sigma_{\rm{tot}} \times Br(\Upsilon \to \mu\mu) =  \frac{N_{\rm{fit}}^{\rm corr.}}{L}   \end{displaymath}

For comparison to theory, and to best allow discrimination amongst different models, the cross section measurement is performed in bins of transverse momentum:

   \begin{displaymath} \frac{d\sigma}{dp_T} \times Br(\Upsilon \to \mu\mu) = \frac{N_{\rm{fit}}^{\rm corr.}(p_T)}{L \cdot \Delta p_T}   \end{displaymath}

where a bin width normalization is included in the computation of the differential cross section.

A dataset weighted by the variable 'weight' is specified through the roofit WeightVar method as in

 RooDataSet * data_w =  new RooDataSet("data_w", "data_w", *(ws->data("data")->get()), Import(*((RooDataSet *)ws->data("data"))), WeightVar("weight"));

The method xsectDif() performs the fitting according to the specified differential binning.

Question Question 6-4: Produce the differential cross sections.

Step 5 - Produce the complete set of 3/pb cross section results

The following set of instructions will allow to execute the full-fledged analysis machinery.

#link the eff and acc maps
cd $CMSSW_BASE/src/zhlinl/UpsilonAna/test/xsection
ln -s ../maps 

#append the eff and acc corrections

#execute the full set of fits 

#extract the cross sections and uncertainties from the fit results 

#compute cross section ratios and display the differential results
root -l -b -q ratio.C
root -l -b -q peakOverlay.C\(true,0\)
root -l -b -q peakOverlay.C\(false,1\)
root -l -b -q peakOverlay.C\(false,2\)
root -l -b -q peakOverlay.C\(false,3\) 

#gather the detailed results in .pdf format

Question Question 6-5: Re-produce the detailed analysis results. (append the automated analysis results .pdf file)

Exercise 7: Evaluate the Systematic Uncertainties

Systematic errors are our attempts to quantify the effects of the assumptions that went into our measurement. Every measurement has some sort of assumptions. Usually starting with the assumption that the measuring device is measuring properly.

Question Question 7-1: As you have gone through this exercise what were the assumptions and intermediary results that contributed to the cross-section measurement?

Each analysis has its own assumptions that go into it and so will have its own set of systematic uncertainties. In general systematic errors are much more important for a measurement of an actual signal than for setting a limit. Some sources of error are common to a large class of measurements such as uncertainty on the integrated luminosity; some will be more specific such as any bias in the fitting procedure used to extract the signal. A systematic uncertainty should quantify the degree to which a model, intermediary result, or other assumption has an effect on your result.

We will consider a few of the sources of systematic uncertainty for the Upsilon cross-section.

Fitter validation

Many analyses use a statistical fitter to separate signal and background or to extract parameters from a distribution. For the Upsilon cross-section we employ an unbinned extended maximum likelihood fit to the dimuon mass spectrum to separate the sample into four components Y(1S) signal, Y(2S) signal, Y(3S) signal and background. The fitting model is implemented in RooFit and the minimization is done with Minuit. When doing a fit it is important to validate that your fitter returns results that are not biased and that the errors from the procedure are valid.

The most basic test is to validate the fitter using pseudo-data generated from the models themselves. This is relatively straight forward to accomplish. We will do a single pseudo-experiment.

root -l

fitterVal(1, "rawFitIntegrated.txt", "rawFitIntegrated.txt");

Question Question 7-2: In our pseudo-data, what was the “true” number of Y(1S) generated? How many were found in the final fit? Are these two numbers consistent?

Doing a single pseudo experiment isn't enough to validate the fitter. Typically you will need to do a thousand or more such fits to ensure that the fitter is reliable. Our pseudo-experiment takes about a minute to run. We don't have time to do a 1000 experiments right now, but to validate a fitter you would look at the distribution of the parameter of interest with respect to its “true” value and its error. This is quickly accomplished by looking at the pull distribution

 \[ x_\mathrm{pull} = \frac{x-x_\mathrm{true}}{\sigma_x} \]
For an unbiased fitter and reasonable errors the pull distribution should be a unit Gaussian distribution, ie mean consistent with 0 and rms consistent with 1. The macro you ran above, fitterVal, can do more than one pseudo-experiment and then create a text file named validation.asc with the results that can be parsed and analyzed. We have prepared a file that has the results of 1000 pseudo-experiments.

mkdir toyPlots
python --plotdir toyPlots -p -b validation1k.asc
display toyPlots/nsig1_pull.png &

Question Question 7-3: Is the fitter unbiased? Does it have reasonable errors?

Question Question 7-4: What sort of a systematic error would you associate with this type of test?

Upsilon polarization

We determined our acceptance using MC under the assumption that the Upsilon was unpolarized. If we polarize our Upsilon sample this changes kinematic distribution of the muons and the acceptance. If the acceptance changes then the cross-section must also.

Open the file and edit it to comment out all of the lines which start out files.append(“... except for the one with the nominal root file and the helT root file. Save this file. Back at the command line execute 1 0

After a couple of minutes you will have done the nominal 1S fit and the fit using the transverse polarization in the helicity frame.

display mode0/fitres_1s_upsilonYieldWeighted_nominal/massfit_rap0_pt1.pdf &
display mode0/fitres_1s_upsilonYieldWeighted_helT/massfit_rap0_pt1.pdf &

Question Question 7-5: What is the difference in the weighted yield of the Y(1S)? Is this large compared to the statistical uncertainty?

Question Question 7-6: What could be done to reduce this source of uncertainty?

In the final Upsilon paper it was decided not to use the helicity as a systematic source; instead the result is quoted under 5 different helicity assumptions. Nevertheless the helicity uncertainty is the largest single factor to the cross-section measurement.

Fit model

To separate the signal and the background we have to choose a shape to use as the model for each component. For the Upsilon measurement we have chosen to use a Crystal Ball shape for the three signal distributions and a 2nd order polynomial as the background. The Crystal Ball shape has a radiative tail that captures the FSR tail of the distribution. The tail is not well constrained using the data so these parameters are fixed from the MC. This simplification of the model is a potential source of systematic error. In the paper we used the statistical error on the tail parameters from the MC and their correlations to do a study on the effect of the tail parameters on the yield. For our purposes we will take a look at the more extreme case of neglecting the tail all together.

root -l

fitterVal(1, "rawFitIntegrated.txt", "rawFitIntegrated.txt");
//write down the Y(1S) yield and error and look at the fit projection
fitterVal(1, "rawFitIntegrated.txt", "rawNoTail.txt");
//write down the Y(1S) yield and error and look at the new fit projection

Question Question 7-7: What happened to the yield? Does the fit projection look as good? (pay attention to the FSR tail)

Additional systematic sources

The paper has an extensive list of systematic uncertainties and how they were obtained. In step 5 of the previous exercise there were the instructions for running the full analysis. This also evaluated many of the systematic uncertainties. If you ran those commands and they successfully completed then you should have a mode0 folder in addition to others in your working directory. Within this folder you will find folders for many of the different variations on the analysis which are used as cross-checks and allow us to quantify our systematic errors. Take some time to look through the paper and the analysis note to see the sections on the systematic errors and understand the different methods used to obtain them.

Question Question 7-8: What are the dominant systematic errors in the 3 inverse pb analysis? Where will the improvement come with 40 inverse pb? What systematic errors won't gain much or may suffer from the increased statistics?

Discussion of the Results

Now we have reproduced the whole Upsilon analysis as published in the paper. Please compare the Y(1S) cross section results obtained from Exercise 6 with the plots in the paper.

xsec1S_overlay.gif Table1S_allRap.GIF

Question Question 7-9: Plot the results for the differential cross section from the table in the paper and the table you made in Exercise 6 to compare the results.

Additional Materials


Please contact LinlinZhang for any questions, comments or suggestions about this tutorial.

-- LinlinZhang - 29-Mar-2011

Topic attachments
I Attachment History Action Size Date Who Comment
PNGpng PATtuple_screen.PNG r1 manage 87.9 K 2011-04-12 - 05:26 UnknownUser  
GIFgif ScanTree.gif r1 manage 28.4 K 2011-04-12 - 05:49 UnknownUser  
GIFgif Table1S_allRap.GIF r1 manage 28.1 K 2011-04-12 - 06:01 UnknownUser  
GIFgif UpsilonWorkFlow.gif r2 r1 manage 28.4 K 2011-04-10 - 12:48 UnknownUser  
PNGpng fig_Y1S.png r1 manage 57.3 K 2011-03-29 - 14:15 UnknownUser  
PNGpng kinematic1_v2.png r1 manage 62.5 K 2011-03-29 - 14:15 UnknownUser  
PNGpng kinematic2_v2.png r1 manage 36.4 K 2011-03-29 - 14:16 UnknownUser  
GIFgif muonRecoEff_2d.gif r1 manage 11.0 K 2011-03-29 - 14:16 UnknownUser  
GIFgif muonTrackRecoEff_1d_projection_bin.gif r1 manage 5.8 K 2011-03-29 - 14:16 UnknownUser  
PNGpng reco_mc.png r1 manage 82.7 K 2011-03-29 - 14:17 UnknownUser  
PNGpng report_onia2mumu.PNG r1 manage 107.1 K 2011-04-13 - 11:17 UnknownUser  
PNGpng summary_onia2mumu.png r1 manage 31.0 K 2011-03-29 - 14:17 UnknownUser  
PNGpng table_Y1S.png r1 manage 50.5 K 2011-03-29 - 14:18 UnknownUser  
PNGpng trigger_mc.png r1 manage 90.7 K 2011-03-29 - 14:18 UnknownUser  
PNGpng trigger_tnp_mc.png r1 manage 129.9 K 2011-03-29 - 14:18 UnknownUser  
GIFgif xsec1S_overlay.gif r1 manage 11.1 K 2011-04-12 - 05:57 UnknownUser  
Edit | Attach | Watch | Print version | History: r19 < r18 < r17 < r16 < r15 | Backlinks | Raw View | WYSIWYG | More topic actions
Topic revision: r19 - 2012-09-10 - unknown
    • Cern Search Icon Cern Search
    • TWiki Search Icon TWiki Search
    • Google Search Icon Google Search

    Main All webs login

This site is powered by the TWiki collaboration platform Powered by PerlCopyright &© 2008-2021 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
or Ideas, requests, problems regarding TWiki? use Discourse or Send feedback