HSCP Search Recipes
Marissa's Tutorial on 8 TeV analysis
Related Links:
• https://cmsweb.cern.ch/das/
• https://cmsweb.cern.ch/phedex/prod/Request::Create?type=xfer
• https://cmswbm.web.cern.ch/cmswbm/cmsdb/servlet/RunSummary
• https://cms-service-dqm.web.cern.ch/cms-service-dqm/CAF/certification/Collisions12/8TeV/
• https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookLocatingDataSamples
• https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideFrontierConditions
• https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookCRABTutorial
Setting up the environment - ntuples
(NOTE: the StoppedHSCP/Analysis package is too large to submit to crab, so you must have only StoppedHSCP/Ntuples in your directory to create ntuples)
cmsrel CMSSW_5_3_11
cd CMSSW_5_3_11/src
cmsenv
git clone https://github.com/rodenm/StoppedHSCP StoppedHSCP
rm -rf StoppedHSCP/Analysis/
rm -rf StoppedHSCP/Lumi/
rm -rf StoppedHSCP/Simulation/
rm -rf StoppedHSCP/Statistics/
rm -rf StoppedHSCP/ToyMC/
chmod u+x StoppedHSCP/Ntuples/scripts/*py
scram b
rehash
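After the build, the package scripts should be available as ordinary commands; a quick sanity check (assuming the chmod and scram b steps above succeeded):
which pyGetFillScheme.py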
Finding the dataset
Go to:
https://cmsweb.cern.ch/das/
DAS is the service used to catalog all datasets available to CMS. You want to find the most recently created version of our dataset.
Search for: dataset=/NoBPTX/Run2012*/RECO
• NoBPTX is the name for all datasets with the BPTX veto in the trigger
• Run2012* gives you datasets from 2012
• RECO is the format of data we want
For this exercise, we will only use Run2012C. It's the smallest dataset, so it will be the fastest to process.
/NoBPTX/Run2012C-22Jan2013-v1/RECO
For this dataset, "22Jan2013" refers to when it was last re-reco'd into the latest version of CMSSW (at the time, CMSSW_5_3_7_patch5).
Click the "runs" link to see all of the runs in the dataset. In particular, we need to know the first and last run numbers. Sort the list by run.run_number. For this dataset, the runs range from 198022-203742
If you click the "sites" link below the dataset name, you will see the T2 and T3 sites that have this dataset stored. We really hope that it's on Purdue; otherwise you have to request to have it moved there via PhEDEx:
https://cmsweb.cern.ch/phedex/prod/Request::Create?type=xfer
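The same lookups can also be done from the command line with the DAS client; a minimal sketch, assuming das_client.py is available in your CMSSW environment and you have a valid grid proxy:
das_client.py --query="dataset=/NoBPTX/Run2012*/RECO"
das_client.py --query="run dataset=/NoBPTX/Run2012C-22Jan2013-v1/RECO" --limit=0
das_client.py --query="site dataset=/NoBPTX/Run2012C-22Jan2013-v1/RECO"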
Now that we have the full dataset name and have ensured it's at an appropriate T2, we can get on with things.
Update fills.txt and fillingSchemes.txt
During data-taking periods, when additional fills are added to the dataset each week, these files need to be updated with the new fill numbers and fill schemes.
Go to this webpage:
https://cmswbm.web.cern.ch/cmswbm/cmsdb/servlet/RunSummary
Click "Recent LHC Fills"
Right now the fills that are displayed are test fills from just before LS1 and don't need to be added to the files. In the future, during data-taking, you will need to update fills.txt with the info on this page.
Excerpt from fills.txt:
3363 50ns_1374_1368_0_1262_144bpi12inj_V2 208427,208428,208429
3370 50ns_1374_1368_0_1262_144bpi12inj_V2 208487
3372 50ns_72_60_0_6_36bpi4inj 208509
3374 50ns_1374_1368_0_1262_144bpi12inj_V2 208538,208540,208541
3375 50ns_1374_1368_0_1262_144bpi12inj_V2 208551,208553
3378 50ns_1374_1368_0_1262_144bpi12inj_V2 208686
Using only fills marked "stable", enter the fill number, "injection scheme", and the list of runs for the fill at the bottom of the file.
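For example, a hypothetical new stable fill 3400 with runs 208700 and 208701 (the fill number, scheme, and runs here are made up for illustration) would get the line:
3400 50ns_1374_1368_0_1262_144bpi12inj_V2 208700,208701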
Now run (replacing "rodenm" with your CERN username):
pyGetFillScheme.py -u rodenm -i $CMSSW_BASE/src/StoppedHSCP/Ntuples/data/fills.txt -o $CMSSW_BASE/src/StoppedHSCP/Ntuples/data/fillingSchemes.txt
This updates fillingSchemes.txt with details from any filling schemes not already listed there. NOTE: this doesn't work for all filling schemes. If there is an important filling scheme you need (i.e. not one of the "Single_10b_4_2_4"-type schemes for fills from back in 2010), then you will have to try a bunch of other varsity-level stuff. We'll skip that for now.
Finding the appropriate JSON file for the certified data
Go to:
https://cms-service-dqm.web.cern.ch/cms-service-dqm/CAF/certification/Collisions12/8TeV/
There are two links we care about. "Prompt" lists the certificates for new data as it comes off the machine. During LHC running, this is where you'll go for certified data.
"Reprocessing" has the certificates for data after it's been reprocessed into a more recent version of CMSSW. That's what we want.
You are looking for a file of the format:
Cert_190456-196531_8TeV_22Jan2012ReReco_Collisions12_JSON_v2.txt
• 190456-196531 gives the run range certified. Make sure this run range includes the range of runs found for the dataset.
• 22Jan2012ReReco tells you when the rereco occurred. This should match the label in the dataset name, or be more recent (if the original certificate isn't available any longer).
The correct file is:
Cert_190456-203742_8TeV_22Jan2013ReReco_Collisions12_JSON.txt
This one also works (it just contains a wider run range which includes Run2012D):
Cert_190456-208686_8TeV_22Jan2013ReReco_Collisions12_JSON.txt
Using this cert file, execute the following:
GetRunFillInfo.py -u rodenm -j https://cms-service-dqm.web.cern.ch/cms-service-dqm/CAF/certification/Collisions12/8TeV/Prompt/Cert_190456-208686_8TeV_22Jan2013ReReco_Collisions12_JSON.txt -d /NoBPTX/Run2012C-22Jan2013-v1/RECO
EXCEPT! This relies on a service that no longer exists. We used to use dbs to search for info on datasets, but we can't any longer. Now there is a service called das. It would be SUPER helpful if you could edit GetRunFillInfo.py so that it uses das instead of dbs.
Location of the file:
StoppedHSCP/Ntuples/scripts/GetRunFillInfo.py
Information on the new command-line interface for das is at:
https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookLocatingDataSamples
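For reference, this is the kind of DAS query the edited script would need to issue to get the run list for a dataset (a sketch using the DAS CLI; the exact call inside GetRunFillInfo.py is up to you):
das_client.py --query="run dataset=/NoBPTX/Run2012C-22Jan2013-v1/RECO" --format=json --limit=0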
(For now, this step isn't necessary since we already have the latest certified data json file in the repository)
Global tags
Global tags are used whenever we access a dataset on the grid. These are used to connect the dataset with the most recent values in a conditions database that describes things like detector alignment and calibration constants. Getting the wrong global tag can cause all kinds of weirdness, so you should do this carefully.
The lists of available global tags are here:
https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideFrontierConditions
In the table of contents, below "Global Tags used in official data reprocessing / MC productions", there is a link for the 2012 datasets:
https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideFrontierConditions#Winter13_2012_A_B_C_D_datasets_r
You are looking for analysis tags for data. For Run2012C, the correct tag is:
FT_53_V21_AN6
Making ntuples!
mkdir June_week1
cd June_week1
The following command creates all of the things required to submit crab jobs to the grid to make ntuples (again, replacing rodenm with your username):
makeTreeJob.py -u rodenm -s T2_US_Purdue -j Run2012C_5311_V29 2797_3102_v1 /NoBPTX/Run2012C-22Jan2013-v1/RECO FT_53_V21_AN6::All ../StoppedHSCP/Ntuples/data/runs_22JanReReco_198049_203742.json
To get details of what each of these arguments means, run:
makeTreeJob.py -h
For historical reasons, there are a bunch of files that we don't need and files that need to be edited for them to work properly. It would be super great if you guys edited makeTreeJob.py with the correct values!
Without editing it, you will have to make these changes:
rm reduced
emacs -nw crab_tree_Run2012C_5311_V29_2797_3102_v1.cfg
• change "scheduler = condor" to "scheduler = remoteglidein"
• Add "dbs_url = phys03" under the [CMSSW] section
If you haven't gotten your Purdue storage space yet, you need to change the [USER] section to:
[USER]
return_data = 1
copy_data = 0
ui_working_dir = stoppedHSCP_tree_Run2012C_5311_V29_2797_3102_v1
This will put all of the ntuples in your local directory. This is NOT how you want to do things normally, but just this once, it should be ok.
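For the record, once you do have your Purdue storage space, the usual setup writes the output to the T2 instead of returning it locally. A sketch of what that [USER] section typically looks like in crab2 (the option values here are illustrative; see the CRAB tutorial linked above):
[USER]
return_data = 0
copy_data = 1
storage_element = T2_US_Purdue
user_remote_dir = stoppedHSCP_tree_Run2012C_5311_V29_2797_3102_v1
ui_working_dir = stoppedHSCP_tree_Run2012C_5311_V29_2797_3102_v1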
Finally we get to submit the jobs.
crab -create -cfg crab_tree_Run2012C_5311_V29_2797_3102_v1.cfg
crab -c stoppedHSCP_tree_Run2012C_5311_V29_2797_3102_v1 -submit
To monitor the status of the jobs run
crab -c stoppedHSCP_tree_Run2012C_5311_V29_2797_3102_v1 -status
Once all jobs are finished, run
crab -c stoppedHSCP_tree_Run2012C_5311_V29_2797_3102_v1 -get
crab -c stoppedHSCP_tree_Run2012C_5311_V29_2797_3102_v1 -report
Finally, to make sure that crab didn't quietly drop lumi sections from your ntuples, run:
compareJSON.py --diff stoppedHSCP_tree_Run2012C_5311_V29_2797_3102_v1/res/lumiSummary.json ../StoppedHSCP/Ntuples/data/runs_22JanReReco_198049_203742.json
An empty diff means the processed lumi sections match the certified JSON exactly.
Retrieving your ntuples from the T2 at Purdue
cmsrel CMSSW_5_3_11
cd CMSSW_5_3_11/src
cmsenv
git clone https://github.com/rodenm/StoppedHSCP StoppedHSCP
rm -rf StoppedHSCP/Lumi/
rm -rf StoppedHSCP/Simulation/
chmod u+x StoppedHSCP/Ntuples/scripts/*py
chmod u+x StoppedHSCP/Analysis/scripts/*py
chmod u+x StoppedHSCP/ToyMC/scripts/*py
scram b
rehash
Initialize your grid certificate:
voms-proxy-init -voms cms
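If you want to double-check that the proxy was created, run:
voms-proxy-info --all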
Check that your ntuples are at the T2 (replace "rodenm" with your username):
lcg-ls -b -D srmv2 "srm://srm.rcac.purdue.edu:8443/srm/v2/server?SFN=/mnt/hadoop/store/user/rodenm"
You should see your remote_dir from your crab file in the list. For example, I'm looking for the ntuples from the previous tutorial, which would be:
/mnt/hadoop/store/user/rodenm/stoppedHSCP_tree_Run2012C_5311_V29_2797_3102_v1
We have written a special script to copy files from a remote storage location to your local server. You can find it at:
StoppedHSCP/Ntuples/scripts/copyFiles.py
When you set the files in this directory to be executable (the chmod command above) and compile, this script can be used like any other command. Right now there are a couple of things in the script that need to be changed for it to work.
To see how the script works, execute:
copyFiles.py -h
(If this doesn't work, make sure you did chmod on scripts and compiled.)
To pull down your ntuples locally execute something like this command, but change the username and output directories:
copyFiles.py -u [username] -o [output directory] -s PUR -d stoppedHSCP_tree_Run2012C_5311_V29_2797_3102_v1
Your output directory should be something like:
/store/user/[username]/stoppedHSCP/data/
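A concrete example, with a made-up username jdoe (substitute your own username and paths):
copyFiles.py -u jdoe -o /store/user/jdoe/stoppedHSCP/data/ -s PUR -d stoppedHSCP_tree_Run2012C_5311_V29_2797_3102_v1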
It can take a few minutes to move all of the ntuples locally, so go get a cup of coffee in the meantime.
Once it's done, you need to copy all ntuples from all run eras into a single directory. I typically name this directory something like:
/store/user/[username]/stoppedHSCP/data/stoppedHSCP_tree_AllRun2012_5311_V29_June_week1_v1
where "June_week1" corresponds to the time I made all of the ntuples. (Notice it is also the directory name in which we submitted the crab jobs…)
We put them all into one directory because the rest of our analysis requires it.
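A sketch of that gathering step (the wildcard source path is illustrative; adjust it to the run-era directories you actually copied):
mkdir /store/user/[username]/stoppedHSCP/data/stoppedHSCP_tree_AllRun2012_5311_V29_June_week1_v1
cp /store/user/[username]/stoppedHSCP/data/stoppedHSCP_tree_Run2012*_5311_V29_*/*.root /store/user/[username]/stoppedHSCP/data/stoppedHSCP_tree_AllRun2012_5311_V29_June_week1_v1/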
Running the analysis - Part 1: selected events
NOTE: these instructions assume you've already executed the first steps at the beginning of the tutorial on copying your files from the T2.
From now on [ntuple dir] is the full local address of the directory containing the ntuples. Ex:
/store/user/[username]/stoppedHSCP/data/stoppedHSCP_tree_AllRun2012_5311_V29_June_week1_v1
And [analysis dir] is the name of the local directory in which analysis results go. You can pick whatever name you want. A good example is:
AllRun2012_5310_V29_June_week1_v1
To start:
mkdir [analysis dir]
Next, run the basic analysis script. This takes all of the ntuples, runs over each event, produces some summary plots, and determines which events pass all cuts.
search -i [ntuple dir] -o [analysis dir] >& [analysis dir]/summary.txt
This step can take a while if you're running over the full 4.5M events in 2012. After it's done:
cat [analysis dir]/summary.txt
This shows all of the logging information for the search. The last bit is the most important:
Total livetime : 1.01144e+06
Final rate : 9.8869e-06 +/- 3.12651e-06
[TABLE border=1]
|Cut |N |Rate (Hz) |N-1 |N-1 (Hz) |-
|0 trigger | 3275614 | 3.24e+00 +/- 1.79e-03 | 10 | 9.89e-06 +/- 3.13e-06 |-
|1 BPTX veto | 3275614 | 3.24e+00 +/- 1.79e-03 | 10 | 9.89e-06 +/- 3.13e-06 |-
|2 BX veto | 3262671 | 3.23e+00 +/- 1.79e-03 | 10 | 9.89e-06 +/- 3.13e-06 |-
|3 Vertex veto | 3251175 | 3.21e+00 +/- 1.78e-03 | 98 | 9.69e-05 +/- 9.79e-06 |-
|4 Halo veto | 1690877 | 1.67e+00 +/- 1.29e-03 | 12463 | 1.23e-02 +/- 1.10e-04 |-
|5 Cosmic veto | 737121 | 7.29e-01 +/- 8.49e-04 | 2110 | 2.09e-03 +/- 4.54e-05 |-
|6 Noise veto | 217139 | 2.15e-01 +/- 4.61e-04 | 1575 | 1.56e-03 +/- 3.92e-05 |-
|7 E30 | 29648 | 2.93e-02 +/- 1.70e-04 | 10 | 9.89e-06 +/- 3.13e-06 |-
|8 E70 | 6119 | 6.05e-03 +/- 7.73e-05 | 88 | 8.70e-05 +/- 9.27e-06 |-
|9 n60 | 6119 | 6.05e-03 +/- 7.73e-05 | 10 | 9.89e-06 +/- 3.13e-06 |-
|10 n90 | 111 | 1.10e-04 +/- 1.04e-05 | 26 | 2.57e-05 +/- 5.04e-06 |-
|11 nTowiPhi | 28 | 2.77e-05 +/- 5.23e-06 | 13 | 1.29e-05 +/- 3.56e-06 |-
|12 iPhiFrac | 12 | 1.19e-05 +/- 3.42e-06 | 20 | 1.98e-05 +/- 4.42e-06 |-
|13 R1 | 11 | 1.09e-05 +/- 3.28e-06 | 10 | 9.89e-06 +/- 3.13e-06 |-
|14 R2 | 11 | 1.09e-05 +/- 3.28e-06 | 10 | 9.89e-06 +/- 3.13e-06 |-
|15 Rpeak | 10 | 9.89e-06 +/- 3.13e-06 | 11 | 1.09e-05 +/- 3.28e-06 |-
|16 Router | 10 | 9.89e-06 +/- 3.13e-06 | 10 | 9.89e-06 +/- 3.13e-06 |-
[/TABLE]
================
JES uncertainty: LOW
==================
|Cut |N |cum % |N-1 |-
|0 trigger | 3275614 | 1.00e+02 | |-
|1 BPTX veto | 3275614 | 1.00e+02 | |-
|2 BX veto | 3262671 | 9.96e+01 | |-
|3 Vertex veto | 3251175 | 9.93e+01 | |-
|4 Halo veto | 1690877 | 5.16e+01 | |-
|5 Cosmic veto | 737121 | 2.25e+01 | |-
|6 Noise veto | 217139 | 6.63e+00 | |-
|7 E30 | 29461 | 8.99e-01 | |-
|8 E70 | 5350 | 1.63e-01 | |-
|9 n60 | 5350 | 1.63e-01 | |-
|10 n90 | 96 | 2.93e-03 | |-
|11 nTowiPhi | 22 | 6.72e-04 | |-
|12 iPhiFrac | 11 | 3.36e-04 | |-
|13 R1 | 10 | 3.05e-04 | |-
|14 R2 | 10 | 3.05e-04 | |-
|15 Rpeak | 9 | 2.75e-04 | |-
|16 Router | 9 | 2.75e-04 | |-
================
JES uncertainty: HIGH
==================
|Cut |N |cum % |N-1 |-
|0 trigger | 3275614 | 1.00e+02 | |-
|1 BPTX veto | 3275614 | 1.00e+02 | |-
|2 BX veto | 3262671 | 9.96e+01 | |-
|3 Vertex veto | 3251175 | 9.93e+01 | |-
|4 Halo veto | 1690877 | 5.16e+01 | |-
|5 Cosmic veto | 737121 | 2.25e+01 | |-
|6 Noise veto | 217139 | 6.63e+00 | |-
|7 E30 | 29864 | 9.12e-01 | |-
|8 E70 | 7002 | 2.14e-01 | |-
|9 n60 | 7002 | 2.14e-01 | |-
|10 n90 | 128 | 3.91e-03 | |-
|11 nTowiPhi | 30 | 9.16e-04 | |-
|12 iPhiFrac | 12 | 3.66e-04 | |-
|13 R1 | 11 | 3.36e-04 | |-
|14 R2 | 11 | 3.36e-04 | |-
|15 Rpeak | 10 | 3.05e-04 | |-
|16 Router | 10 | 3.05e-04 | |-
End of analysis
For now, disregard the last two sections. The first section is important.
Total livetime : 1.01144e+06
Final rate : 9.8869e-06 +/- 3.12651e-06
[TABLE border=1]
|Cut |N |Rate (Hz) |N-1 |N-1 (Hz) |-
|0 trigger | 3275614 | 3.24e+00 +/- 1.79e-03 | 10 | 9.89e-06 +/- 3.13e-06 |-
|1 BPTX veto | 3275614 | 3.24e+00 +/- 1.79e-03 | 10 | 9.89e-06 +/- 3.13e-06 |-
|2 BX veto | 3262671 | 3.23e+00 +/- 1.79e-03 | 10 | 9.89e-06 +/- 3.13e-06 |-
|3 Vertex veto | 3251175 | 3.21e+00 +/- 1.78e-03 | 98 | 9.69e-05 +/- 9.79e-06 |-
|4 Halo veto | 1690877 | 1.67e+00 +/- 1.29e-03 | 12463 | 1.23e-02 +/- 1.10e-04 |-
|5 Cosmic veto | 737121 | 7.29e-01 +/- 8.49e-04 | 2110 | 2.09e-03 +/- 4.54e-05 |-
|6 Noise veto | 217139 | 2.15e-01 +/- 4.61e-04 | 1575 | 1.56e-03 +/- 3.92e-05 |-
|7 E30 | 29648 | 2.93e-02 +/- 1.70e-04 | 10 | 9.89e-06 +/- 3.13e-06 |-
|8 E70 | 6119 | 6.05e-03 +/- 7.73e-05 | 88 | 8.70e-05 +/- 9.27e-06 |-
|9 n60 | 6119 | 6.05e-03 +/- 7.73e-05 | 10 | 9.89e-06 +/- 3.13e-06 |-
|10 n90 | 111 | 1.10e-04 +/- 1.04e-05 | 26 | 2.57e-05 +/- 5.04e-06 |-
|11 nTowiPhi | 28 | 2.77e-05 +/- 5.23e-06 | 13 | 1.29e-05 +/- 3.56e-06 |-
|12 iPhiFrac | 12 | 1.19e-05 +/- 3.42e-06 | 20 | 1.98e-05 +/- 4.42e-06 |-
|13 R1 | 11 | 1.09e-05 +/- 3.28e-06 | 10 | 9.89e-06 +/- 3.13e-06 |-
|14 R2 | 11 | 1.09e-05 +/- 3.28e-06 | 10 | 9.89e-06 +/- 3.13e-06 |-
|15 Rpeak | 10 | 9.89e-06 +/- 3.13e-06 | 11 | 1.09e-05 +/- 3.28e-06 |-
|16 Router | 10 | 9.89e-06 +/- 3.13e-06 | 10 | 9.89e-06 +/- 3.13e-06 |-
[/TABLE]
This shows the effective livetime for the full set of data, in seconds (1.011e+06 s is about 281 hours).
Next is what we call a cutflow table. It's formatted to be copied into an elog, but we'll get to that later. For each line, we list the name of the cut, the total number of events passing all cuts up to and including that one, and the N-1 value for the cut. The N-1 value is the number of events passing all cuts but that one. Looking at the halo cut:
|Cut |N |Rate (Hz) |N-1 |N-1 (Hz) |-
|4 Halo veto | 1690877 | 1.67e+00 +/- 1.29e-03 | 12463 | 1.23e-02 +/- 1.10e-04 |-
1690877 events passed cuts 0-4.
12463 events pass all cuts excluding the halo cut. This implies that there could be up to 12463 halo events.
The "N" value on the final line of the table shows how many events passed all cuts, i.e. the number of selected events. For 2012, this number is 10.
Running the analysis - Part 2: backgrounds
NOTE: The code for all of the following programs is in StoppedHSCP/Analysis/bin/ and the source code file always has the same name as the executable.
To determine if any selected events constitute signal, we have to first calculate our expected background. Obviously, if our background estimate is the same as our number of selected events, we have good reason to believe that these selected events are just background events that escaped all of our cuts. For each of these estimates, we're running over the full set of data, so each step takes a little while.
There are 3 sources of background we have to consider. First, the halo background. The details of how this is calculated are in the analysis note. You should be able to read through the source file and figure out the mechanics. The source file is at:
StoppedHSCP/Analysis/bin/HaloBackground.cpp
To run it:
haloBackground -i [ntuple dir] -o [analysis dir]
cat [analysis dir]/HaloBackground.txt
The output of this file is pretty complicated (blame me, I like a lot of output). It basically shows the expected halo background using various different binning schemes and we can go over the rest of it later. The important part is the final line:
Final = 8.02641 +/- 0.192619 +/- 0.241676
That is the number of halo events +/- statistical error +/- systematic error.
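(If you ever need to quote a single uncertainty, the statistical and systematic errors can be combined in quadrature: sqrt(0.193^2 + 0.242^2) ≈ 0.31.)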
Now that the halo estimate is complete, let's move on to the cosmic background estimate. This starts by running over some cosmic muon Monte Carlo that I've made in the past. First, you run "backgrounds", which produces some plots we need later. Next is "cosmicInefficiency", which calculates the cosmic inefficiency based on cosmic muon MC.
mkdir Cosmic12_5310_V29_All
backgrounds -i /store/user/rodenm/mcgluino/stoppedHSCP_tree_Summer12_5310_V29_Cosmic_All -o Cosmic12_5310_V29_All/
cosmicInefficiency -i /store/user/rodenm/mcgluino/stoppedHSCP_tree_Summer12_5310_V29_Cosmic_All -o Cosmic12_5310_V29_All/
cat Cosmic12_5310_V29_All/CosmicInefficiency.txt
Now that we have histograms with the cosmic inefficiencies, we apply them to the actual cosmic data.
cosmicBackground -i [ntuples dir] -o [analysis dir] --ineffPlots=Cosmic12_5310_V29_All/CosmicInefficiency.root >& [analysis dir]/CosmicBackground.txt
cat [analysis dir]/CosmicBackground.txt
Again, there's tons of output and the key part is at the very end of [analysis dir]/CosmicBackground.txt:
N-1 entries: 1753
DT by RPC background: 3.05437 +/- 1.29556
Smeared background: 5.2124 +/-1.49521
Uncertainty background: 6.2546 +/-1.50815
End of analysis
We use the "Smeared background" for our background estimate. We then take the "Uncertainty background" to estimate the systematic by taking the difference between the two central values. That means that the final cosmic background estimate is 5.21 +/- 1.50(stat) +/- 1.04.
And now, the noise background.
We currently use the 2010A run to estimate our noise background. I'm fairly certain this will change next year (to what, I don't know), but for now, let's do things the way we have in the past. First, you need to run the usual search on the 2010A run.
mkdir Run2010A-v1_5310_V29_1114_1309_v5_final
search -i /store/user/rodenm/data/gluino/stoppedHSCP_tree_Run2010A-v1_5310_V29_1114_1309_v5/ -o Run2010A-v1_5310_V29_1114_1309_v5_final >& Run2010A-v1_5310_V29_1114_1309_v5_final/summary.txt
Then run the cosmic background estimate for 2010A.
backgrounds -i /store/user/rodenm/data/gluino/stoppedHSCP_tree_Run2010A-v1_5310_V29_1114_1309_v5/ -o Run2010A-v1_5310_V29_1114_1309_v5_final
cosmicBackground -i /store/user/rodenm/data/gluino/stoppedHSCP_tree_Run2010A-v1_5310_V29_1114_1309_v5/ -o Run2010A-v1_5310_V29_1114_1309_v5_final --ineffPlots=Cosmic12_5310_V29_All/CosmicInefficiency.root >& Run2010A-v1_5310_V29_1114_1309_v5_final/CosmicBackground.txt
cat Run2010A-v1_5310_V29_1114_1309_v5_final/CosmicBackground.txt
Calculate the Run2010A background just as we did for the Run2012 background above.
To be continued...
--
WeifengJi - 2015-04-08