HSCP Search Recipes

Marissa's Tutorial on 8 TeV analysis


Related Links:

• https://cmsweb.cern.ch/das/

• https://cmsweb.cern.ch/phedex/prod/Request::Create?type=xfer

• https://cmswbm.web.cern.ch/cmswbm/cmsdb/servlet/RunSummary

• https://cms-service-dqm.web.cern.ch/cms-service-dqm/CAF/certification/Collisions12/8TeV/

• https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookLocatingDataSamples

• https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideFrontierConditions

• https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookCRABTutorial

Setting up the environment - ntuples (NOTE: the StoppedHSCP/Analysis package is too large to submit with CRAB, so you must have only StoppedHSCP/Ntuples in your directory when creating ntuples)

cmsrel CMSSW_5_3_11
cd CMSSW_5_3_11/src
cmsenv

git clone https://github.com/rodenm/StoppedHSCP src/StoppedHSCP
rm -rf StoppedHSCP/Analysis/
rm -rf StoppedHSCP/Lumi/
rm -rf StoppedHSCP/Simulation/
rm -rf StoppedHSCP/Statistics/
rm -rf StoppedHSCP/ToyMC/
chmod u+x StoppedHSCP/Ntuples/scripts/*py
scram b
rehash

Finding the dataset

Go to: https://cmsweb.cern.ch/das/

DAS is the service used to catalog all datasets available to CMS. You want to find the most recently created version of our dataset.

Search for: dataset=/NoBPTX/Run2012*/RECO

• NoBPTX is the name for all datasets with the BPTX veto in the trigger

• Run2012* gives you datasets from 2012

• RECO is the format of data we want

For this exercise, we will only use Run2012C. It's the smallest dataset, so it will be the fastest to process.

/NoBPTX/Run2012C-22Jan2013-v1/RECO

For this dataset, "22Jan2013" refers to the date it was last re-reco'd into the latest version of CMSSW (at the time, CMSSW_5_3_7_patch5).

Click the "runs" link to see all of the runs in the dataset. In particular, we need to know the first and last run numbers. Sort the list by run.run_number. For this dataset, the runs range from 198022 to 203742.

If you click the "sites" link below the dataset name, you will see the T2 and T3 sites that have this dataset stored. We really hope that it's at Purdue; otherwise you have to request to have it moved there via PhEDEx: https://cmsweb.cern.ch/phedex/prod/Request::Create?type=xfer

Now that we have the full dataset name and have ensured it's at an appropriate T2, we can get on with things.

Update fills.txt and fillingschemes.txt

During data-taking periods, when additional fills are added to the dataset each week, these files need to be updated with the new fill numbers and fill schemes.

Go to this webpage: https://cmswbm.web.cern.ch/cmswbm/cmsdb/servlet/RunSummary

Click "Recent LHC Fills"

Right now the fills that are displayed are test fills from just before LS1 and don't need to be added to the files. In the future, during data-taking, you will need to update fills.txt with the info on this page.

Excerpt from fills.txt:

3363 50ns_1374_1368_0_1262_144bpi12inj_V2 208427,208428,208429
3370 50ns_1374_1368_0_1262_144bpi12inj_V2 208487
3372 50ns_72_60_0_6_36bpi4inj 208509
3374 50ns_1374_1368_0_1262_144bpi12inj_V2 208538,208540,208541
3375 50ns_1374_1368_0_1262_144bpi12inj_V2 208551,208553
3378 50ns_1374_1368_0_1262_144bpi12inj_V2 208686

Using only fills marked "stable", enter the fill number, "injection scheme", and the list of runs for the fill at the bottom of the file.
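Each fills.txt row is just three whitespace-separated fields, so it is easy to check your additions mechanically. A minimal sketch (the `parse_fills` helper is my own, not part of the package):

```python
def parse_fills(lines):
    """Parse fills.txt rows: fill number, injection scheme, comma-separated runs."""
    fills = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        fill, scheme, runs = parts
        fills[int(fill)] = {
            "scheme": scheme,
            "runs": [int(r) for r in runs.split(",")],
        }
    return fills

# Two rows from the excerpt above
sample = [
    "3363 50ns_1374_1368_0_1262_144bpi12inj_V2 208427,208428,208429",
    "3372 50ns_72_60_0_6_36bpi4inj 208509",
]
fills = parse_fills(sample)
```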

Now run (replacing "rodenm" with your CERN username):

pyGetFillScheme.py -u rodenm -i $CMSSW_BASE/src/StoppedHSCP/Ntuples/data/fills.txt -o $CMSSW_BASE/src/StoppedHSCP/Ntuples/data/fillingSchemes.txt

This updates fillingSchemes.txt with details from any filling schemes not already listed there. NOTE: this doesn't work for all filling schemes. If there is an important filling scheme you need (i.e. not one of the "Single_10b_4_2_4" type schemes for fills from back in 2010), then you will have to try a bunch of other varsity-level stuff. We'll skip that for now.

Finding the appropriate JSON file for the certified data

Go to: https://cms-service-dqm.web.cern.ch/cms-service-dqm/CAF/certification/Collisions12/8TeV/

There are two links we care about. "Prompt" lists the certificates for new data as it comes off the machine. During LHC running, this is where you'll go for certified data.

"Reprocessing" has the certificates for data after it's been reprocessed into a more recent version of CMSSW. That's what we want.

You are looking for a file of the format: Cert_190456-196531_8TeV_22Jan2012ReReco_Collisions12_JSON_v2.txt

• 190456-196531 gives the certified run range. Make sure this range includes the range of runs found for the dataset.

• 22Jan2012ReReco tells you when the re-reco occurred. This should match the label in the dataset name, or be more recent (if the original certificate isn't available any longer).

The correct file is: Cert_190456-203742_8TeV_22Jan2013ReReco_Collisions12_JSON.txt

This one also works (it just contains a wider run range which includes Run2012D) Cert_190456-208686_8TeV_22Jan2013ReReco_Collisions12_JSON.txt
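The certification file is an ordinary JSON map from run number (as a string) to certified lumi-section ranges, so the run-range check can be done programmatically. A sketch with a toy certificate (the `covers_dataset` helper is hypothetical):

```python
import json

def covers_dataset(cert_json_text, first_run, last_run):
    """True if the certification JSON's run range spans the dataset's runs.
    The JSON maps run numbers (strings) to lists of [first, last] lumi ranges."""
    cert = json.loads(cert_json_text)
    runs = sorted(int(r) for r in cert)
    return runs[0] <= first_run and last_run <= runs[-1]

# Toy certificate standing in for Cert_190456-203742_..._JSON.txt
toy = '{"190456": [[1, 50]], "203742": [[1, 120]]}'
ok = covers_dataset(toy, 198022, 203742)  # NoBPTX Run2012C run range from DAS
```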

Using this cert file, execute the following:

GetRunFillInfo.py -u rodenm -j https://cms-service-dqm.web.cern.ch/cms-service-dqm/CAF/certification/Collisions12/8TeV/Prompt/Cert_190456-208686_8TeV_22Jan2013ReReco_Collisions12_JSON.txt -d /NoBPTX/Run2012C-22Jan2013-v1/RECO

EXCEPT! This relies on a service that no longer exists. We used to use dbs to search for info on datasets, but we can't any longer. Now there is a service called das. It would be SUPER helpful if you could edit GetRunFillInfo.py so that it uses das instead of dbs.

Location of the file: /StoppedHSCP/Ntuples/scripts/GetRunFillInfo.py

Information on the new command-line interface for das is at: https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookLocatingDataSamples
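As a starting point for that edit, building the das query for run numbers and parsing its JSON reply might look like the sketch below. The reply shape used here is an assumption about a minimal das JSON response; verify it against actual das client output before relying on it.

```python
import json

def das_runs_query(dataset):
    """Build the das query string that lists the runs in a dataset."""
    return "run dataset=%s" % dataset

def parse_das_runs(reply_text):
    """Extract run numbers from a das JSON reply.
    NOTE: this reply structure is an assumption, not the documented format."""
    reply = json.loads(reply_text)
    runs = set()
    for item in reply.get("data", []):
        for rec in item.get("run", []):
            runs.add(rec["run_number"])
    return sorted(runs)

query = das_runs_query("/NoBPTX/Run2012C-22Jan2013-v1/RECO")

# Toy reply standing in for what the das client would return
toy_reply = '{"data": [{"run": [{"run_number": 198022}]}, {"run": [{"run_number": 203742}]}]}'
runs = parse_das_runs(toy_reply)
```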

(For now, this step isn't necessary since we already have the latest certified data json file in the repository)

Global tags

Global tags are used whenever we access a dataset on the grid. They connect the dataset with the most recent values in a conditions database that describes things like detector calibrations and alignment. Getting the wrong global tag can cause all kinds of weirdness, so you should do this carefully.

The lists of available global tags are here: https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideFrontierConditions

In the table of contents, below "Global Tags used in official data reprocessing / MC productions", there is a link to the section for the Winter13 re-reco of the 2012 datasets: https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideFrontierConditions#Winter13_2012_A_B_C_D_datasets_r

You are looking for analysis tags for data. For Run2012C, the correct tag is: FT_53_V21_AN6

Making ntuples!

mkdir June_week1
cd June_week1

The following command creates all of the things required to submit crab jobs to the grid to make ntuples (again, replacing rodenm with your username)

makeTreeJob.py -u rodenm -s T2_US_Purdue -j Run2012C_5311_V29 2797_3102_v1 /NoBPTX/Run2012C-22Jan2013-v1/RECO FT_53_V21_AN6::All ../StoppedHSCP/Ntuples/data/runs_22JanReReco_198049_203742.json

Run:

makeTreeJob.py -h

to see what each argument means.

For historical reasons, there are a bunch of files that we don't need and files that need to be edited for them to work properly. It would be super great if you guys edited makeTreeJob.py with the correct values!

Without editing it, you will have to make these changes:

rm reduced

emacs -nw crab_tree_Run2012C_5311_V29_2797_3102_v1.cfg

• change "scheduler = condor" to "scheduler = remoteglidein"

• Add "dbs_url = phys03" under the [CMSSW] section

If you haven't gotten your purdue storage space yet, you need to change the [USER] section to:

[USER]
return_data = 1
copy_data = 0
ui_working_dir = stoppedHSCP_tree_Run2012C_5311_V29_2797_3102_v1

This will put all of the ntuples in your local directory. This is NOT how you want to do things normally, but just this once, it should be ok.

Finally, we get to submit the jobs:

crab -create -cfg crab_tree_Run2012C_5311_V29_2797_3102_v1.cfg
crab -c stoppedHSCP_tree_Run2012C_5311_V29_2797_3102_v1 -submit

To monitor the status of the jobs, run:

crab -c stoppedHSCP_tree_Run2012C_5311_V29_2797_3102_v1 -status

Once all jobs are finished, run:

crab -c stoppedHSCP_tree_Run2012C_5311_V29_2797_3102_v1 -get
crab -c stoppedHSCP_tree_Run2012C_5311_V29_2797_3102_v1 -report

Finally, to make sure that crab didn't quietly drop lumi sections from your ntuples, run

compareJSON.py --diff stoppedHSCP_tree_Run2012C_5311_V29_2797_3102_v1/res/lumiSummary.json ../StoppedHSCP/Ntuples/data/runs_22JanReReco_198049_203742.json
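compareJSON.py does the subtraction at lumi-section granularity; a simplified, run-level version of the same check (the `missing_runs` helper is my own) looks like this:

```python
import json

def missing_runs(processed_text, certified_text):
    """Runs present in the certified JSON but absent from the processed
    lumiSummary -- a run-level simplification of compareJSON.py --diff."""
    processed = json.loads(processed_text)
    certified = json.loads(certified_text)
    return sorted(int(r) for r in certified if r not in processed)

# Toy inputs: one certified run never made it into the crab lumi summary
cert = '{"198022": [[1, 10]], "198050": [[1, 5]]}'
summ = '{"198022": [[1, 10]]}'
dropped = missing_runs(summ, cert)
```

If `dropped` is non-empty, crab silently lost lumi sections and those jobs should be resubmitted.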


Retrieving your ntuples from the T2 at Purdue

cmsrel CMSSW_5_3_11
cd CMSSW_5_3_11/src
cmsenv

git clone https://github.com/rodenm/StoppedHSCP src/StoppedHSCP
rm -rf StoppedHSCP/Lumi/
rm -rf StoppedHSCP/Simulation/
chmod u+x StoppedHSCP/Ntuples/scripts/*py
chmod u+x StoppedHSCP/Analysis/scripts/*py
chmod u+x StoppedHSCP/ToyMC/scripts/*py
scram b
rehash

Initialize your grid certificate: voms-proxy-init -voms cms

Check that your ntuples are at the T2 (replace "rodenm" with your username):

lcg-ls -b -D srmv2 "srm://srm.rcac.purdue.edu:8443/srm/v2/server?SFN=/mnt/hadoop/store/user/rodenm"

You should see your remote_dir from your crab file in the list. For example, I'm looking for the ntuples from the previous tutorial, which would be:

/mnt/hadoop/store/user/rodenm/stoppedHSCP_tree_Run2012C_5311_V29_2797_3102_v1

We have written a special script to copy files from a remote storage location to your local server. You can find it at:

StoppedHSCP/Ntuples/scripts/copyFiles.py

When you set the files in this directory to be executable (the chmod command above) and compile, this script can be used like any other command. Right now there are a couple of things in the script that need to be changed for it to work.

To see how the script works, execute:

copyFiles.py -h

(If this doesn't work, make sure you did chmod on scripts and compiled.)

To pull down your ntuples locally execute something like this command, but change the username and output directories:

copyFiles.py -u [username] -o [output directory] -s PUR -d stoppedHSCP_tree_Run2012C_5311_V29_2797_3102_v1

Your output directory should be something like:

/store/user/[username]/stoppedHSCP/data/

It can take a few minutes to move all of the ntuples locally, so go get a cup of coffee in the meantime.

Once it's done, you need to copy all ntuples from all run eras into a single directory. I typically name this directory something like:

/store/user/[username]/stoppedHSCP/data/stoppedHSCP_tree_AllRun2012_5311_V29_June_week1_v1

where "June_week1" corresponds to the time I made all of the ntuples. (Notice it is also the directory name in which we submitted the crab jobs…)

We put them all into one directory because the rest of our analysis requires it.
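Merging the per-era directories is just a file copy; here is a hedged sketch using Python's standard library (the helper name and the era-prefix naming convention are mine, not the analysis code's):

```python
import shutil
import tempfile
from pathlib import Path

def merge_ntuple_dirs(era_dirs, merged_dir):
    """Copy every .root file from the per-era directories into one merged
    directory, prefixing the era directory name to avoid filename collisions."""
    merged = Path(merged_dir)
    merged.mkdir(parents=True, exist_ok=True)
    for era_dir in era_dirs:
        era = Path(era_dir)
        for f in sorted(era.glob("*.root")):
            shutil.copy(str(f), str(merged / ("%s_%s" % (era.name, f.name))))

# Demonstrate on a throwaway directory tree
base = Path(tempfile.mkdtemp())
(base / "Run2012C").mkdir()
(base / "Run2012C" / "tree_1.root").write_text("dummy")
merge_ntuple_dirs([base / "Run2012C"], base / "merged")
merged_files = sorted(p.name for p in (base / "merged").iterdir())
```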

Running the analysis - Part 1: selected events

NOTE: these instructions assume you've already executed the first steps at the beginning of the tutorial on copying your files from the T2.

From now on [ntuple dir] is the full local address of the directory containing the ntuples. Ex:

/store/user/[username]/stoppedHSCP/data/stoppedHSCP_tree_AllRun2012_5311_V29_June_week1_v1

And [analysis dir] is the name of the local directory in which analysis results go. You can pick whatever name you want. A good example is:

AllRun2012_5310_V29_June_week1_v1

To start:

mkdir [analysis dir]

Next, run the basic analysis script. This takes all of the ntuples, runs over each event, produces some summary plots, and determines which events pass all cuts.

search -i [ntuple dir] -o [analysis dir] >& [analysis dir]/summary.txt

This step can take a while if you're running over the full 4.5M events in 2012. After it's done:

cat [analysis dir]/summary.txt

This shows all of the logging information for the search. The last bit is the most important:

Total livetime : 1.01144e+06
Final rate : 9.8869e-06 +/- 3.12651e-06

[TABLE border=1]
|Cut |N |Rate (Hz) |N-1 |N-1 (Hz) |-
|0 trigger | 3275614 | 3.24e+00 +/- 1.79e-03 | 10 | 9.89e-06 +/- 3.13e-06 |-
|1 BPTX veto | 3275614 | 3.24e+00 +/- 1.79e-03 | 10 | 9.89e-06 +/- 3.13e-06 |-
|2 BX veto | 3262671 | 3.23e+00 +/- 1.79e-03 | 10 | 9.89e-06 +/- 3.13e-06 |-
|3 Vertex veto | 3251175 | 3.21e+00 +/- 1.78e-03 | 98 | 9.69e-05 +/- 9.79e-06 |-
|4 Halo veto | 1690877 | 1.67e+00 +/- 1.29e-03 | 12463 | 1.23e-02 +/- 1.10e-04 |-
|5 Cosmic veto | 737121 | 7.29e-01 +/- 8.49e-04 | 2110 | 2.09e-03 +/- 4.54e-05 |-
|6 Noise veto | 217139 | 2.15e-01 +/- 4.61e-04 | 1575 | 1.56e-03 +/- 3.92e-05 |-
|7 E30 | 29648 | 2.93e-02 +/- 1.70e-04 | 10 | 9.89e-06 +/- 3.13e-06 |-
|8 E70 | 6119 | 6.05e-03 +/- 7.73e-05 | 88 | 8.70e-05 +/- 9.27e-06 |-
|9 n60 | 6119 | 6.05e-03 +/- 7.73e-05 | 10 | 9.89e-06 +/- 3.13e-06 |-
|10 n90 | 111 | 1.10e-04 +/- 1.04e-05 | 26 | 2.57e-05 +/- 5.04e-06 |-
|11 nTowiPhi | 28 | 2.77e-05 +/- 5.23e-06 | 13 | 1.29e-05 +/- 3.56e-06 |-
|12 iPhiFrac | 12 | 1.19e-05 +/- 3.42e-06 | 20 | 1.98e-05 +/- 4.42e-06 |-
|13 R1 | 11 | 1.09e-05 +/- 3.28e-06 | 10 | 9.89e-06 +/- 3.13e-06 |-
|14 R2 | 11 | 1.09e-05 +/- 3.28e-06 | 10 | 9.89e-06 +/- 3.13e-06 |-
|15 Rpeak | 10 | 9.89e-06 +/- 3.13e-06 | 11 | 1.09e-05 +/- 3.28e-06 |-
|16 Router | 10 | 9.89e-06 +/- 3.13e-06 | 10 | 9.89e-06 +/- 3.13e-06 |-
[/TABLE]

================ JES uncertainty: LOW ==================
|Cut |N |cum % |N-1 |-

0 trigger 3275614 1.00e+02
1 BPTX veto 3275614 1.00e+02
2 BX veto 3262671 9.96e+01
3 Vertex veto 3251175 9.93e+01
4 Halo veto 1690877 5.16e+01
5 Cosmic veto 737121 2.25e+01
6 Noise veto 217139 6.63e+00
7 E30 29461 8.99e-01
8 E70 5350 1.63e-01
9 n60 5350 1.63e-01
10 n90 96 2.93e-03
11 nTowiPhi 22 6.72e-04
12 iPhiFrac 11 3.36e-04
13 R1 10 3.05e-04
14 R2 10 3.05e-04
15 Rpeak 9 2.75e-04
16 Router 9 2.75e-04

================ JES uncertainty: HIGH ==================
|Cut |N |cum % |N-1 |-

0 trigger 3275614 1.00e+02
1 BPTX veto 3275614 1.00e+02
2 BX veto 3262671 9.96e+01
3 Vertex veto 3251175 9.93e+01
4 Halo veto 1690877 5.16e+01
5 Cosmic veto 737121 2.25e+01
6 Noise veto 217139 6.63e+00
7 E30 29864 9.12e-01
8 E70 7002 2.14e-01
9 n60 7002 2.14e-01
10 n90 128 3.91e-03
11 nTowiPhi 30 9.16e-04
12 iPhiFrac 12 3.66e-04
13 R1 11 3.36e-04
14 R2 11 3.36e-04
15 Rpeak 10 3.05e-04
16 Router 10 3.05e-04
End of analysis

For now, disregard the last two sections. The first section is important.

Total livetime : 1.01144e+06
Final rate : 9.8869e-06 +/- 3.12651e-06

[TABLE border=1]
|Cut |N |Rate (Hz) |N-1 |N-1 (Hz) |-
|0 trigger | 3275614 | 3.24e+00 +/- 1.79e-03 | 10 | 9.89e-06 +/- 3.13e-06 |-
|1 BPTX veto | 3275614 | 3.24e+00 +/- 1.79e-03 | 10 | 9.89e-06 +/- 3.13e-06 |-
|2 BX veto | 3262671 | 3.23e+00 +/- 1.79e-03 | 10 | 9.89e-06 +/- 3.13e-06 |-
|3 Vertex veto | 3251175 | 3.21e+00 +/- 1.78e-03 | 98 | 9.69e-05 +/- 9.79e-06 |-
|4 Halo veto | 1690877 | 1.67e+00 +/- 1.29e-03 | 12463 | 1.23e-02 +/- 1.10e-04 |-
|5 Cosmic veto | 737121 | 7.29e-01 +/- 8.49e-04 | 2110 | 2.09e-03 +/- 4.54e-05 |-
|6 Noise veto | 217139 | 2.15e-01 +/- 4.61e-04 | 1575 | 1.56e-03 +/- 3.92e-05 |-
|7 E30 | 29648 | 2.93e-02 +/- 1.70e-04 | 10 | 9.89e-06 +/- 3.13e-06 |-
|8 E70 | 6119 | 6.05e-03 +/- 7.73e-05 | 88 | 8.70e-05 +/- 9.27e-06 |-
|9 n60 | 6119 | 6.05e-03 +/- 7.73e-05 | 10 | 9.89e-06 +/- 3.13e-06 |-
|10 n90 | 111 | 1.10e-04 +/- 1.04e-05 | 26 | 2.57e-05 +/- 5.04e-06 |-
|11 nTowiPhi | 28 | 2.77e-05 +/- 5.23e-06 | 13 | 1.29e-05 +/- 3.56e-06 |-
|12 iPhiFrac | 12 | 1.19e-05 +/- 3.42e-06 | 20 | 1.98e-05 +/- 4.42e-06 |-
|13 R1 | 11 | 1.09e-05 +/- 3.28e-06 | 10 | 9.89e-06 +/- 3.13e-06 |-
|14 R2 | 11 | 1.09e-05 +/- 3.28e-06 | 10 | 9.89e-06 +/- 3.13e-06 |-
|15 Rpeak | 10 | 9.89e-06 +/- 3.13e-06 | 11 | 1.09e-05 +/- 3.28e-06 |-
|16 Router | 10 | 9.89e-06 +/- 3.13e-06 | 10 | 9.89e-06 +/- 3.13e-06 |-
[/TABLE]

This shows the effective livetime for the full set of data in seconds (1.01144e+06 seconds is about 281 hours).
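As a quick consistency check, the two headline numbers tie together with the 10 selected events:

```python
livetime_s = 1.01144e+06             # total livetime from the summary, in seconds
livetime_h = livetime_s / 3600.0     # seconds -> hours, about 281 h
final_rate = 9.8869e-06              # final rate in Hz, from the same summary
expected_n = final_rate * livetime_s # rate x livetime ~ selected-event count
```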

Next is what we call a cutflow table. It's formatted to be copied into an elog, but we'll get to that later. For each line, we list the name of the cut, the total number of events passing all cuts up to that one, and the N-1 value for the cut. The N-1 value is the number of events passing all cuts but that one. Looking at the halo cut:

|Cut |N |Rate (Hz) |N-1 |N-1 (Hz) |-
|4 Halo veto | 1690877 | 1.67e+00 +/- 1.29e-03 | 12463 | 1.23e-02 +/- 1.10e-04 |-

1690877 events passed cuts 1-4. 12463 events pass all cuts excluding the halo cut. This implies that there could be up to 12463 halo events.

The "N" value at the end shows how many events passed all cuts, i.e. the number of selected events. For 2012, this number is 10.
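The N and N-1 columns can be reproduced directly from per-event cut decisions. A toy sketch (the cut names and data structure here are illustrative, not the analysis code's):

```python
def cutflow(events, cuts):
    """events: list of dicts mapping cut name -> pass/fail; cuts: ordered names.
    For each cut, N = events passing this cut and every cut before it,
    N-1 = events passing every cut except this one."""
    rows = []
    for i, cut in enumerate(cuts):
        n = sum(all(ev[c] for c in cuts[: i + 1]) for ev in events)
        n_minus_1 = sum(all(ev[c] for c in cuts if c != cut) for ev in events)
        rows.append((cut, n, n_minus_1))
    return rows

# Three toy events, two cuts
evts = [
    {"halo": True, "cosmic": True},
    {"halo": False, "cosmic": True},
    {"halo": True, "cosmic": False},
]
rows = cutflow(evts, ["halo", "cosmic"])
```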

Running the analysis - Part 2: backgrounds

NOTE: The code for all of the following programs is in StoppedHSCP/Analysis/bin/ and the source code file always has the same name as the executable.

To determine if any selected events constitute signal, we first have to calculate our expected background. Obviously, if our background estimate is the same as our number of selected events, we have good reason to believe that these selected events are just background events that escaped all of our cuts. For each of these estimates, we're running over the full set of data, so each step takes a little while.

There are 3 sources of background we have to consider. First, the halo background. The details of how this is calculated are in the analysis note. You should be able to read through the source file and figure out the mechanics. The source file is at:

StoppedHSCP/Analysis/bin/HaloBackground.cpp

To run it:

haloBackground -i [ntuple dir] -o [analysis dir]
cat [analysis dir]/HaloBackground.txt

The output of this program is pretty complicated (blame me, I like a lot of output). It basically shows the expected halo background using various different binning schemes; we can go over the rest of it later. The important part is the final line:


Final = 8.02641 +/- 0.192619 +/- 0.241676

That is the number of halo events +/- statistical error +/- systematic error.

Now that the halo estimate is complete, let's move on to the cosmic background estimate. This starts by running over some cosmic muon Monte Carlo that I've made in the past. First you run "backgrounds", which produces some plots we need later. Next is "cosmicInefficiency", which calculates the cosmic inefficiency based on cosmic muon MC.

mkdir Cosmic12_5310_V29_All
backgrounds -i /store/user/rodenm/mcgluino/stoppedHSCP_tree_Summer12_5310_V29_Cosmic_All -o Cosmic12_5310_V29_All/
cosmicInefficiency -i /store/user/rodenm/mcgluino/stoppedHSCP_tree_Summer12_5310_V29_Cosmic_All -o Cosmic12_5310_V29_All/
cat Cosmic12_5310_V29_All/CosmicInefficiency.txt

Now that we have histograms with the cosmic inefficiencies, we apply them to the actual cosmic data.

cosmicBackground -i [ntuples dir] -o [analysis dir] --ineffPlots=Cosmic12_5310_V29_All/CosmicInefficiency.root >& [analysis dir]/CosmicBackground.txt
cat [analysis dir]/CosmicBackground.txt

Again, there's tons of output and the key part is at the very end of [analysis dir]/CosmicBackground.txt:

N-1 entries: 1753
DT by RPC background: 3.05437 +/- 1.29556
Smeared background: 5.2124 +/- 1.49521
Uncertainty background: 6.2546 +/- 1.50815
End of analysis

We use the "Smeared background" for our background estimate. We then estimate the systematic by taking the difference between the two central values ("Uncertainty background" minus "Smeared background"). That means that the final cosmic background estimate is 5.21 +/- 1.50 (stat) +/- 1.04 (syst).
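Spelled out, the arithmetic behind that final number is just:

```python
smeared = 5.2124       # central value used as the background estimate
stat = 1.49521         # its statistical error
uncertainty = 6.2546   # central value of the uncertainty variation
syst = uncertainty - smeared  # systematic = difference of the two central values
# final estimate: 5.21 +/- 1.50 (stat) +/- 1.04 (syst)
```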

And now, the noise background.

We currently use the 2010A run to estimate our noise background. I'm fairly certain this will change next year (to what, I don't know), but for now, let's do things the way we have in the past. First, run the usual search on the 2010A run:

mkdir Run2010A-v1_5310_V29_1114_1309_v5_final
search -i /store/user/rodenm/data/gluino/stoppedHSCP_tree_Run2010A-v1_5310_V29_1114_1309_v5/ -o Run2010A-v1_5310_V29_1114_1309_v5_final >& Run2010A-v1_5310_V29_1114_1309_v5_final/summary.txt

Then run the cosmic background estimate for 2010A:

backgrounds -i /store/user/rodenm/data/gluino/stoppedHSCP_tree_Run2010A-v1_5310_V29_1114_1309_v5/ -o Run2010A-v1_5310_V29_1114_1309_v5_final

cosmicBackground -i /store/user/rodenm/data/gluino/stoppedHSCP_tree_Run2010A-v1_5310_V29_1114_1309_v5/ -o Run2010A-v1_5310_V29_1114_1309_v5_final --ineffPlots=Cosmic12_5310_V29_All/CosmicInefficiency.root >& Run2010A-v1_5310_V29_1114_1309_v5_final/CosmicBackground.txt

cat Run2010A-v1_5310_V29_1114_1309_v5_final/CosmicBackground.txt

Calculate the Run2010A background just as we did for the Run2012 background above.

To be continued...

-- WeifengJi - 2015-04-08
