First Exercise: Prerequisites and Simple Editing Exercise
Introduction
Prerequisites of the School
You must be a registered student and must know your registration ID
NCP server login and password
Laptop registration
Basics of Linux and installation of MobaXterm for Linux
Obtain a GitHub account:
Since Summer 2013, most of the CMS software has been hosted on GitHub. GitHub is a web-based hosting service for Git repositories, while Git is a distributed revision control system. In your future analysis work, version control of your analysis code will become a very important task, and Git will be very useful. A short Git tutorial awaits you in the fifth exercise set. In order to check out and develop CMS software, you will need a GitHub account, which is free.
NOTE:
Legend of colors for this tutorial:
GRAY background for the commands to execute (cut&paste)
GREEN background for the output sample of the executed commands
BLUE background for the configuration files (cut&paste)
PINK background for the code (EDAnalyzer etc.) (cut&paste)
Exercise 1 - Cut and Paste
This exercise is designed to run only on
cmslpc-sl6 as copies of the scripts are present there.
Login to the
cmslpc-sl6 cluster.
If you are preparing for CMSDAS@LPC2018, note that cmslpc-sl6 is the cluster you are supposed to use. By now you should have an FNAL account that you can use to get Kerberos credentials; follow the instructions on how to log in to the LPC cluster.
Because the exercises often require copying and pasting from the instructions, we first make sure that this works for you. To verify that cut and paste to/from a terminal window works, first copy the script runThisCommand.py as follows.
To connect to cmslpc-sl6 at Fermilab, try the following commands (Mac/Linux; on Windows, use the PuTTY or Cygwin instructions above):
kinit YourUsername@FNAL.GOV
Enter the Kerberos principal password for your account and then connect:
ssh -Y YourUsername@cmslpc-sl6.fnal.gov
Once connected (Mac/Linux/Windows):
cp ~cmsdas/runThisCommand.py .
chmod +x runThisCommand.py
If you are working on lxplus at CERN instead:
ssh -Y USERNAME@lxplus6.cern.ch
USERNAME@lxplus6.cern.ch's password:
Enter the password and then do:
cp /afs/cern.ch/cms/Tutorials/TWIKI_DATA/runThisCommand.py .
Then cut and paste the following and hit return:
./runThisCommand.py "asdf;klasdjf;kakjsdf;akjf;aksdljf;a" "sldjfqewradsfafaw4efaefawefzdxffasdfw4ffawefawe4fawasdffadsfef"
The response should be your username followed by an alphanumeric string of characters unique to your username, for example, for a user named gbenelli:
success: gbenelli toraryyv
QUESTION 1 - Post the alphanumeric string of characters unique to your username.
For CMSDAS@LPC2018, please submit your answers using the CMSDAS@LPC2018 Google Form first set.
If you executed the command without copy-pasting:
./runThisCommand.py
the command will return:
Error: You must provide the secret key
Alternately, copying incorrectly will return
Error: You didn't paste the correct input string
Running the command anywhere other than cmslpc-sl6 (for example, locally on a laptop) will result in:
bash: ./runThisCommand.py: No such file or directory
OR:
Unknown user: gbenelli.
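The four outcomes listed above can be summarised in a small sketch. This is not the code of runThisCommand.py, which is not shown in this tutorial; the function name check_paste, the list of known users and the SHA-1 based token are assumptions made purely for illustration. Only the two pasted strings and the messages are taken from the text above.

```python
import hashlib

# The two strings that the tutorial asks you to paste.
EXPECTED_ARGS = (
    "asdf;klasdjf;kakjsdf;akjf;aksdljf;a",
    "sldjfqewradsfafaw4efaefawefzdxffasdfw4ffawefawe4fawasdffadsfef",
)


def check_paste(args, user, known_users):
    """Return the message a paste-check script could print (hypothetical logic)."""
    if not args:
        return "Error: You must provide the secret key"
    if tuple(args) != EXPECTED_ARGS:
        return "Error: You didn't paste the correct input string"
    if user not in known_users:
        return "Unknown user: %s." % user
    # Assumption: any deterministic per-user token would do here;
    # the real script's scheme is not documented in this tutorial.
    token = hashlib.sha1(user.encode("utf-8")).hexdigest()[:8]
    return "success: %s %s" % (user, token)
```

Calling check_paste([], "gbenelli", ["gbenelli"]) reproduces the "secret key" error, while passing the two expected strings for a known user yields a line starting with "success:".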
Exercise 2 - Simple Edit Exercise
This exercise is designed to run only on
cmslpc-sl6.
The purpose of this exercise is to ensure that the user can edit files.
This means that you need to be able to use one of the standard text editors (emacs, pico, nano, vi, vim, etc.) available on the cluster you are running (cmslpc-sl6), open a file, edit it and save it!
On the
cmslpc-sl6 cluster:
cp ~cmsdas/editThisCommand.py .
Then open editThisCommand.py with your favorite editor (e.g. emacs -nw editThisCommand.py) and make sure that the 11th line has # (hash character) as the first character of the line. If not, explicitly change the following three lines:
# Please comment the line below out by adding a '#' to the front of
# the line.
raise RuntimeError, "You need to comment out this line with a #"
to:
# Please comment the line below out by adding a '#' to the front of
# the line.
#raise RuntimeError, "You need to comment out this line with a #"
Save the file (e.g. in emacs, CTRL+x CTRL+s to save and CTRL+x CTRL+c to quit the editor) and execute the command:
./editThisCommand.py
If this is successful, the result will be:
success: gbenelli 0x6D0DB4E0
QUESTION 2 - Paste the line beginning with "success" into the form provided.
If the file has not been successfully edited, an error message such as the following will result:
Traceback (most recent call last):
File "./editThisCommand.py", line 11, in ?
raise RuntimeError, "You need to comment out this line with a #"
RuntimeError: You need to comment out this line with a #
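If you want to double-check your edit before running the script, the condition described above (line 11 must start with a hash character) can be tested with a few lines of Python. This is only a minimal sketch; the helper name line_is_commented is hypothetical and is not part of the exercise scripts.

```python
def line_is_commented(text, line_number=11):
    """Return True if the given 1-based line of `text` starts with '#'."""
    lines = text.splitlines()
    if line_number < 1 or line_number > len(lines):
        return False
    return lines[line_number - 1].startswith("#")


# Usage on the cluster (the file name is taken from the exercise):
#   with open("editThisCommand.py") as handle:
#       print(line_is_commented(handle.read()))
```

The check looks only at the first character of the line, matching the wording of the exercise, so a line with leading spaces before the # would be reported as not commented.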
At Fermilab cmslpc one can use the nobackup area linked from your home directory at cmslpc (nobackup -> /uscms_data/d2/YOURUSERNAME) for the exercises.
source /cvmfs/cms.cern.ch/cmsset_default.csh #or .sh for bash
cd ~/nobackup
mkdir YOURWORKINGAREA
cd YOURWORKINGAREA
### If you are using csh shell
setenv SCRAM_ARCH slc6_amd64_gcc481
### If you are using Bash shell
export SCRAM_ARCH=slc6_amd64_gcc481
cmsrel CMSSW_7_3_0_pre1
cd CMSSW_7_3_0_pre1/src
cmsenv
git cms-init
source /afs/cern.ch/user/n/ndefilip/public/logincmsdas.sh
cmscvsroot CMSSW
export SCRAM_ARCH=slc6_amd64_gcc481
scram p CMSSW CMSSW_7_3_0_pre1
cd CMSSW_7_3_0_pre1/src
cmsenv
source /cvmfs/cms.cern.ch/cmsset_default.sh #or .csh for csh/tcsh
### If you are using csh shell
setenv SCRAM_ARCH slc6_amd64_gcc481
### If you are using Bash shell
export SCRAM_ARCH=slc6_amd64_gcc481
cmsrel CMSSW_7_3_0_pre1
cd CMSSW_7_3_0_pre1/src
cmsenv
source /afs/pi.infn.it/grid_exp_sw/cms/scripts/setcms.sh #or .csh for csh/tcsh
cmscvsroot CMSSW
### once forever:
mkdir -p /gpfs/gpfsddn/cms/user/`id -un`
cd /gpfs/gpfsddn/cms/user/`id -un`
mkdir YOURWORKINGAREA
cd YOURWORKINGAREA
setenv SCRAM_ARCH slc5_amd64_gcc462
scram p CMSSW CMSSW_5_3_11
cd CMSSW_5_3_11/src
cmsenv
source /home/cmsdas/env.csh
mkdir YOURWORKINGAREA
cd YOURWORKINGAREA
setenv SCRAM_ARCH slc5_amd64_gcc462
scram p CMSSW CMSSW_5_3_11
cd CMSSW_5_3_11/src
cmsenv
Exercise 3 - Set up a CMSSW release area like CMSSW_9_3_2
module use -a /afs/desy.de/group/cms/modulefiles/
module load cmssw
mkdir YOURWORKINGAREA
cd YOURWORKINGAREA
export SCRAM_ARCH=slc6_amd64_gcc493
cmsrel CMSSW_7_6_0
cd CMSSW_7_6_0/src
## or for a release where gcc530 is needed
export SCRAM_ARCH=slc6_amd64_gcc530
cmsrel CMSSW_8_0_6
cd CMSSW_8_0_6/src
#
cmsenv
CMSSW is the CMS SoftWare framework used in our collaboration to process and analyze data. In order to use it, you need to set up your environment and set up a local CMSSW release.
At Fermilab, cmslpc-sl6 users have a 2 GB home area at /uscms/homes/Y/YOURUSERNAME and a larger mass storage area called the nobackup area, which is linked from your home directory at cmslpc-sl6 (if you run ls -alh |grep nobackup you will see something like nobackup -> /uscms_data/d3/YOURUSERNAME) and which you can use for the exercises. In both of these cases YOURUSERNAME is a placeholder for your actual username (run whoami to see your actual username). You will first want to set up the proper environment by entering the following command.
source /cvmfs/cms.cern.ch/cmsset_default.csh #or .sh for bash
You should add the above command to your ~/.tcshrc file (or ~/.bash_profile if bash is your default shell), creating the file if you do not have one, so that you do not have to execute it each time you log into the cluster.
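Adding the line by hand in an editor is all that is needed. For illustration only, the same idea (append the line once, and never twice) can be sketched in Python. The helper ensure_line is hypothetical, and the demonstration deliberately writes to a temporary file rather than to your real ~/.tcshrc.

```python
import os
import tempfile

# The setup command from the tutorial (tcsh flavour).
SETUP_LINE = "source /cvmfs/cms.cern.ch/cmsset_default.csh"


def ensure_line(path, line):
    """Append `line` to `path` unless it is already present; return True if appended."""
    content = ""
    if os.path.exists(path):
        with open(path) as handle:
            content = handle.read()
    if line in content.splitlines():
        return False
    with open(path, "a") as handle:
        # Keep the new entry on its own line even if the file lacks a final newline.
        if content and not content.endswith("\n"):
            handle.write("\n")
        handle.write(line + "\n")
    return True


# Demonstration on a throwaway file, not on a real login script.
demo_rc = os.path.join(tempfile.mkdtemp(), "demo_tcshrc")
first_call = ensure_line(demo_rc, SETUP_LINE)   # file is created, line appended
second_call = ensure_line(demo_rc, SETUP_LINE)  # line already there, nothing written
```

Because the second call leaves the file untouched, running such a helper repeatedly would not pile up duplicate source lines in the login script.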
Then proceed with the creation of a working area (called YOURWORKINGAREA in the following):
cd ~/nobackup
mkdir YOURWORKINGAREA
cd YOURWORKINGAREA
### If you are using the default tcsh shell (or csh shell)
setenv SCRAM_ARCH slc6_amd64_gcc630
### Alternatively, If you are using Bash shell
export SCRAM_ARCH=slc6_amd64_gcc630
cmsrel CMSSW_9_3_2
cd CMSSW_9_3_2/src
cmsenv
git cms-init
This last command will take some time to execute and will produce long output; be patient.
source /etc/profile.d/modules.sh
mkdir YOURWORKINGAREA
cd YOURWORKINGAREA
export SCRAM_ARCH=slc6_amd64_gcc493
cmsrel CMSSW_7_6_0
cd CMSSW_7_6_0/src
## or for a release where gcc530 is needed
export SCRAM_ARCH=slc6_amd64_gcc530
cmsrel CMSSW_8_0_6
cd CMSSW_8_0_6/src
#
cmsenv
When you get the prompt again, run the following command:
echo $CMSSW_BASE
QUESTION 3 - Paste the result of executing the above command in the form
Note: The directory (on cmslpc-sl6) ~/nobackup/YOURWORKINGAREA/CMSSW_9_3_2/src is referred to as your WORKING DIRECTORY.
Every time you log out or exit a session, you will need to set up your environment in your working directory again.
To do so, once you have completed the steps above (and assuming you have added the
source /cvmfs/cms.cern.ch/cmsset_default.csh #or .sh for bash
line to your ~/.tcshrc or ~/.bash_profile file), simply run:
cd ~/nobackup/YOURWORKINGAREA/CMSSW_9_3_2/src
cmsenv
And you are ready to go!
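The SCRAM_ARCH values used in this exercise, such as slc6_amd64_gcc630, pack three fields separated by underscores: the operating system, the CPU architecture and the compiler version. A minimal sketch that splits such a string is shown below; the names ScramArch and parse_scram_arch are illustrative and not part of CMSSW.

```python
from collections import namedtuple

# Illustrative container for the three underscore-separated fields.
ScramArch = namedtuple("ScramArch", ["os", "cpu", "compiler"])


def parse_scram_arch(value):
    """Split a SCRAM_ARCH string such as 'slc6_amd64_gcc630' into its fields."""
    parts = value.split("_")
    if len(parts) != 3 or not all(parts):
        raise ValueError("unexpected SCRAM_ARCH format: %r" % value)
    return ScramArch(*parts)


arch = parse_scram_arch("slc6_amd64_gcc630")
# arch.os is "slc6", arch.cpu is "amd64", arch.compiler is "gcc630"
```

Reading the fields this way makes it easier to see why different releases in this tutorial need different SCRAM_ARCH settings: only the compiler field changes between, for example, gcc493, gcc530 and gcc630.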
Exercise 4 - Find data in the DAS (Data Aggregation Service)
In this exercise we will locate the MC dataset RelValZMM and the collision dataset /DoubleMuon/Run2017C-PromptReco-v3/MINIAOD using the Data Aggregation Service (not to be confused with the Data Analysis School in which you are partaking!). Also be aware that DAS is an improved version of the database access service known many years ago as DBS (Dataset Bookkeeping System).
Go to the URL DAS. NOTE that you will be asked for your Grid certificate, which you should have loaded into your browser by now (there may also be a security warning message, which you will need to ignore in order to load the page). Then type in the space provided:
dataset release=CMSSW_9_3_0_pre5 dataset=/RelValZMM*/*CMSSW_9_3_0*/MINIAOD*
This will search for datasets processed with release CMSSW_9_3_0_pre5 whose names match /RelValZMM*/*CMSSW_9_3_0*/MINIAOD*. The syntax for searches is found here, with many useful common search patterns under "CMS Queries".
For this query, several results should be displayed (you may be asked to accept security exceptions in the process). Click on the dataset name
/RelValZMM_13/CMSSW_9_3_0_pre5-93X_mc2017_realistic_v2-v1/MINIAODSIM and after a few seconds another page will appear.
QUESTION 4.1a - What is the size of this dataset?
QUESTION 4.1b - Click on "Sites" to get a list of sites hosting this data. Is this data at FNAL? Is this data at DESY?
Back in the main dataset page, click on the link "Files" to get a list of the root files in our selected dataset. One of the files it contains should look like this:
/store/relval/CMSSW_9_3_0_pre5/RelValZMM_13/MINIAODSIM/93X_mc2017_realistic_v2-v1/00000/96FBB6F5-0E92-E711-841B-0025905B85C0.root
If you want to find the name of a dataset from the name of one of its files, go to DAS and type
dataset file=/store/relval/CMSSW_9_3_0_pre5/RelValZMM_13/MINIAODSIM/93X_mc2017_realistic_v2-v1/00000/96FBB6F5-0E92-E711-841B-0025905B85C0.root
in the command line and hit "Enter".
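Comparing the file name above with the dataset name selected earlier shows that, for this RelVal example, the dataset name can be read off the directory layout of the file path. The sketch below encodes that observation. It is inferred from this single example only, the function name guess_relval_dataset is hypothetical, and the DAS query remains the authoritative way to find the dataset.

```python
def guess_relval_dataset(lfn):
    """Guess the dataset name from a /store/relval/... logical file name.

    Assumed layout (inferred from the single example in this tutorial):
    /store/relval/<release>/<primary>/<tier>/<conditions>/<block>/<file>.root
    """
    parts = lfn.strip("/").split("/")
    if len(parts) != 8 or parts[:2] != ["store", "relval"]:
        raise ValueError("not a recognised RelVal file name: %r" % lfn)
    release, primary, tier, conditions = parts[2:6]
    return "/%s/%s-%s/%s" % (primary, release, conditions, tier)


example_lfn = (
    "/store/relval/CMSSW_9_3_0_pre5/RelValZMM_13/MINIAODSIM/"
    "93X_mc2017_realistic_v2-v1/00000/96FBB6F5-0E92-E711-841B-0025905B85C0.root"
)
guessed = guess_relval_dataset(example_lfn)
```

For the example file, the guess agrees with the dataset name shown by DAS earlier in this exercise; files stored under other directory layouts are rejected with an error rather than guessed.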
Now we will locate a collision dataset skim using the keyword search, which is sometimes more convenient if you know the dataset you are looking for.
In DAS, type
dataset=/DoubleMu*/*Run2017C*/MINIAOD*
and hit Enter. Answer the following question:
QUESTION 4.2 - What release was the dataset containing 12Sep2017 collected in? (If you see more than one release, just answer one)
Having set up your CMSSW environment, you can also search for the dataset
/DoubleMuon/Run2017C-PromptReco-v3/MINIAOD
by invoking the DAS command in your WORKING DIRECTORY.
The DAS commands das_client.py and dasgoclient are in the path for CMSSW_9 versions and above, so you do not need to download anything additional. More about das_client.py can be found here.
The query we're interested in is /DoubleMuon/Run2017C-PromptReco-v3/MINIAOD; see the commands below for how to execute it from the command line. This assumes that you have installed your CERN grid certificate on cmslpc-sl6; if not, follow Step 5 to install it.
NOTE: For cmslpc-sl6 at the LPC at Fermilab you will need to initialize your Grid proxy beforehand:
voms-proxy-init --voms cms
(You will be asked for your grid certificate passphrase). Then you can execute the query with:
das_client.py --query="dataset=/DoubleMuon*/Run2017C-PromptReco-v3/MINIAOD" --format=plain
You will see something like
das_client.py --query="dataset=/DoubleMuon*/Run2017C-PromptReco-v3/MINIAOD" --format=plain
Showing 1-10 out of 2 results, for more results use --idx/--limit options
/DoubleMuon/Run2017C-PromptReco-v3/MINIAOD
/DoubleMuonLowMass/Run2017C-PromptReco-v3/MINIAOD
More information about accessing data in the Data Aggregation Service can be found in WorkBookDataSamples.
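The query above returned two datasets because of the * wildcard after DoubleMuon. The effect can be illustrated offline with Python's fnmatch module, under the assumption that the DAS wildcard behaves like a shell-style glob for these names. The helper match_datasets is hypothetical, and the third candidate name is included only as an illustrative non-match.

```python
from fnmatch import fnmatchcase


def match_datasets(pattern, names):
    """Return the names matching a shell-style wildcard pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


candidates = [
    "/DoubleMuon/Run2017C-PromptReco-v3/MINIAOD",
    "/DoubleMuonLowMass/Run2017C-PromptReco-v3/MINIAOD",
    "/SingleMuon/Run2017C-PromptReco-v3/MINIAOD",  # illustrative non-match
]
matches = match_datasets("/DoubleMuon*/Run2017C-PromptReco-v3/MINIAOD", candidates)
```

Here matches contains the two DoubleMuon names from the sample output above and omits the third. Dropping the * from the pattern narrows the match to the single dataset /DoubleMuon/Run2017C-PromptReco-v3/MINIAOD.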
Exercise 5 - EDM (Event Data Model framework) standalone utilities - edmFileUtil, edmDumpEventContent, edmProvDump, edmEventSize
Make sure CMSSW has been set up as in
Exercise 3.
The overall collection of CMS software, referred to as CMSSW, is built around a Framework, an Event Data Model (EDM), and Services needed by the simulation, calibration and alignment, and reconstruction modules that process event data so that physicists can perform analysis. The primary goal of the Framework and EDM is to facilitate the development and deployment of reconstruction and analysis software. The CMS Event Data Model (EDM) is centered around the concept of an Event. An Event is a C++ object container for all RAW and reconstructed data related to a particular collision. To understand what is in a data file and more, several EDM utilities are available. In this exercise, you will use four of these EDM utilities. They will be very useful at CMSDAS and afterwards. More about these EDM utilities can be found at WorkBookEdmUtilities. These, together with the GitHub web interface for CMSSW and the CMS LXR Cross Referencer, are very useful for understanding and writing CMS code.
edmFileUtil
First we will use edmFileUtil to find the physical file name (PFN), where the file is actually stored at your site, given the logical file name (LFN), which is an alias that can be used in CMSSW at any site.
- Use edmFileUtil to find the physical file name (PFN) corresponding to the logical file name (LFN) of a MiniAOD file.
- To do this, execute
edmFileUtil -d /store/relval/CMSSW_9_3_0_pre5/RelValZMM_13/MINIAODSIM/93X_mc2017_realistic_v2-v1/00000/96FBB6F5-0E92-E711-841B-0025905B85C0.root
- Since you are working on cmslpc-sl6 this will return:
root://cmsxrootd-site.fnal.gov//store/relval/CMSSW_9_3_0_pre5/RelValZMM_13/MINIAODSIM/93X_mc2017_realistic_v2-v1/00000/96FBB6F5-0E92-E711-841B-0025905B85C0.root
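In the output above, the PFN is simply the LFN with an XRootD prefix in front of it. The sketch below reproduces that one mapping. It is a simplification: the real translation is done by edmFileUtil using the site configuration, and the prefix constant and the function name lfn_to_pfn are assumptions taken from this single example.

```python
# Prefix seen in the cmslpc-sl6 output above; other sites use different rules.
FNAL_PREFIX = "root://cmsxrootd-site.fnal.gov/"


def lfn_to_pfn(lfn, prefix=FNAL_PREFIX):
    """Prepend an XRootD prefix to a logical file name starting with /store/."""
    if not lfn.startswith("/store/"):
        raise ValueError("expected a logical file name under /store/: %r" % lfn)
    return prefix + lfn


pfn = lfn_to_pfn(
    "/store/relval/CMSSW_9_3_0_pre5/RelValZMM_13/MINIAODSIM/"
    "93X_mc2017_realistic_v2-v1/00000/96FBB6F5-0E92-E711-841B-0025905B85C0.root"
)
```

The double slash after the host name in the result is expected: the prefix ends with a slash and the LFN itself begins with one.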
edmDumpEventContent
Next we will use edmDumpEventContent to dump a summary of the products that are contained within the file we're interested in, on cmslpc-sl6:
This will return:
Type                 Module            Label    Process    Full Name
--------------------------------------------------------------------------------------
vector<pat::Muon>    "slimmedMuons"    ""       "RECO"     patMuons_slimmedMuons__RECO
- The output of edmDumpEventContent has information divided into four variable-width columns. The first column is the C++ class type of the data, the second is the module label, the third is the product instance label and the fourth is the process name. More information is available at Identifying Data in the Event.
- QUESTION 5.1a - How many modules produce products of type vector?
- QUESTION 5.1b - What are the names of three of the modules that produce products of type vector?
- NOTE: Instead of the above, try without the option --regex slimmedMuons. This will dump the entire event content, an output with many lines. For this reason we'll send the output to a file called EdmDumpEventContent.txt using UNIX output redirection.
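The "Full Name" shown in the sample output, patMuons_slimmedMuons__RECO, joins four fields with underscores: a shortened class name, the module label, the product instance label (empty here, hence the double underscore) and the process name. A minimal sketch that splits such a name follows. It assumes that none of the fields contains an underscore itself, and the names BranchName and split_branch_name are illustrative only.

```python
from collections import namedtuple

# Illustrative container for the four underscore-separated fields.
BranchName = namedtuple("BranchName", ["friendly_type", "module", "instance", "process"])


def split_branch_name(full_name):
    """Split an EDM branch name such as 'patMuons_slimmedMuons__RECO'."""
    parts = full_name.split("_")
    if len(parts) != 4:
        raise ValueError("expected four underscore-separated fields: %r" % full_name)
    return BranchName(*parts)


muons = split_branch_name("patMuons_slimmedMuons__RECO")
# muons.module is "slimmedMuons", muons.instance is "" and muons.process is "RECO"
```

The same split applies to the other branch names that appear later in this exercise set, such as patJets_slimmedJetsPuppi__RECO.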
edmProvDump
To aid in understanding the full history of an analysis, the framework accumulates provenance for all data stored in the standard ROOT output files. Using the command edmProvDump one can print out all the tracked parameters used to create the data file. For example, one can see which modules were run and the CMSSW version used to make the MiniAOD file. In executing the command below it is important to follow the instructions carefully, otherwise a large number of warning messages may appear. The ROOT warning messages can be ignored.
- NOTE: EdmProvDump.txt is a very large file, of the order of 40000-60000 lines. Open this file and locate Processing History (about 20-40 lines from the top).
- QUESTION 5.2 - Which version of CMSSW_?_?_? was used to produce the MiniAOD file?
edmEventSize
Finally we will execute edmEventSize to determine the size of different branches in the data file. Further details may be found here: SWGuideEdmEventSize. edmEventSize isn't actually a 'Core' helper function (anyone can slap 'edm' on the front of a program in CMSSW). You can use edmFileUtil to get a PFN from an LFN (as shown above), so you can combine the two calls in one command.
Execute at cmslpc-sl6:
edmEventSize -v `edmFileUtil -d /store/user/cmsdas/2018/pre_exercises/0EE14BA8-41BB-E611-AD2F-0CC47A4D760A.root` > EdmEventSize.txt
QUESTION 5.3 - What is the number of events if you execute the command at cmslpc-sl6?
Open the file EdmEventSize.txt and locate the line containing the text patJets_slimmedJetsPuppi__RECO. There are two numbers following this text that measure the plain and the compressed size of this branch.
QUESTION 5.4 - What are these two numbers?
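Instead of scrolling through the file, the two numbers can also be picked out with a short script. The sketch below assumes only what is stated above, namely that the branch name is followed on the same line by two numbers. The helper branch_sizes is hypothetical, and the sample text uses made-up branch names and values rather than real edmEventSize output.

```python
def branch_sizes(report_text, branch):
    """Return (plain, compressed) sizes for `branch`, or None if it is not found."""
    for line in report_text.splitlines():
        if branch not in line:
            continue
        # Everything after the branch name; the last two tokens are the numbers.
        tail = line.split(branch, 1)[1].split()
        if len(tail) >= 2:
            return float(tail[-2]), float(tail[-1])
    return None


# Made-up sample in the assumed "name number number" layout.
SAMPLE_REPORT = """demoType_demoModule__DEMO. 1200.5 300.25
otherType_otherModule__DEMO. 80 20
"""
sizes = branch_sizes(SAMPLE_REPORT, "demoType_demoModule__DEMO")
```

On the cluster you would read EdmEventSize.txt into a string and pass patJets_slimmedJetsPuppi__RECO as the branch argument.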
Exercise 6 - Get Familiar with the MiniAOD Format
Analyzing physics data at CMS is a very complicated task involving multiple steps, sharing of expertise, cross checks, and comparing different analyses. To maximize physics productivity, CMS developed a new high-level data tier, MiniAOD, in Spring 2014 to serve the needs of the mainstream physics analyses while keeping a small event size (30-50 kB/event), with easy access to the algorithms developed by the Physics Objects Groups (POGs) in the framework of the CMSSW offline software. The production of MiniAODs will be done centrally for common samples. Its goal is to centralize the production of the PAT tuples that were used among the Physics Analysis Groups (PAGs) in Run 1. (Information about PAT can be found in SWGuidePAT and in a CMS conference note.) MiniAOD samples will be used in the Run 2 analysis. Hence it is important to know about this tool. More information about MiniAOD can be found in WorkBookMiniAOD.
The main contents of the MiniAOD are:
- High level physics objects (leptons, photons, jets, ETmiss), with detailed information in order to allow e.g. retuning of identification criteria, saved using PAT dataformats.
Some preselection requirements are applied on the objects, and objects failing these requirements are either not stored or stored only with a more limited set of information.
Some high level corrections are applied: L1+L2+L3(+residual) corrections to jets, type1 corrections to ETmiss.
- The full list of particles reconstructed by the ParticleFlow, though only storing the most basic quantities for each object (4-vector, impact parameter, pdg id, some quality flags), and with reduced numerical precision; these are useful to recompute isolation, or to perform jet substructure studies.
For charged particles with pT > 0.9 GeV, more information about the associated track is saved, including the covariance matrix, so that they can be used for b-tagging purposes.
- MC Truth information: a subset of the genParticles sufficient to describe the hard scattering process, jet flavour information, and final state leptons and photons; GenJets with pT > 8 GeV are also stored, and so is the other MC summary information (e.g. event weight, LHE header, PDF, PU information).
In addition, all the stable genParticles with mc status code 1 are also saved, to allow reclustering of GenJets with different algorithms and substructure studies.
- Trigger information: MiniAOD contains the trigger bits associated to all paths, and all the trigger objects that have contributed to firing at least one filter within the trigger. In addition, we store all objects reconstructed at L1 and the L1 global trigger summary, and the prescale values of all the triggers.
Please note that the files used in the following are from older releases, but they still illustrate the points they are intended to. Because RelVal files (produced to validate new releases in the rapid CMSSW development cycle) become unavailable on a short (month) timescale, a small set of files has been copied to the LPC EOS storage. They are available at
root://cmseos.fnal.gov//store/user/cmsdas/2018/pre_exercises/
.
The Z to dimuon MC file
root://cmseos.fnal.gov//store/user/cmsdas/2018/pre_exercises/CMSDataAnaSch_MiniAODZMM730pre1.root
was made in the CMSSW_7_3_0_pre1 release, and the data file
root://cmseos.fnal.gov//store/user/cmsdas/2018/pre_exercises/CMSDataAnaSch_Data_706_MiniAOD.root
was made from the collision data skim
/DoubleMu/CMSSW_7_0_6-GR_70_V2_AN1_RelVal_zMu2011A-v1/MINIAOD.
In your working directory, try to open the root file
root://cmseos.fnal.gov//store/user/cmsdas/2018/pre_exercises/CMSDataAnaSch_MiniAODZMM730pre1.root
root -l
Note: if you already have a custom .rootrc or .rootlogon.C, you can start ROOT without them with
root -l -n
At the ROOT prompt type the following:
gSystem->Load("libFWCoreFWLite.so");
FWLiteEnabler::enable();
gSystem->Load("libDataFormatsFWLite.so");
gROOT->SetStyle ("Plain");
gStyle->SetOptStat(111111);
TFile *theFile = TFile::Open("root://cmseos.fnal.gov//store/user/cmsdas/2018/pre_exercises/CMSDataAnaSch_MiniAODZMM730pre1.root");
TBrowser b;
Note: TBrowser is a graphical browser. It runs on the computer where you started ROOT, so its graphical interface needs to be forwarded to your computer. This can be very slow. You need either a lot of patience or a good connection, or you can try to run ROOT locally on copies of the ROOT files that are to be inspected. Since everyone runs a different operating system on their local computer, we do not support the setup of ROOT on your local computer. However, instructions exist on the official ROOT website.
To be able to use the member functions of a CMSSW data class from within ROOT, a 'dictionary' for that class needs to be available to ROOT. To obtain that dictionary, it is necessary to load the proper library into ROOT. The first three lines of the code above do exactly that. More information is at
WorkBookFWLiteExamples. Note that
gROOT->SetStyle ("Plain");
sets a plain white background for all the plots in ROOT.
NOTE: If a rootlogon.C file is created in the home area and the first four lines of the code above are in that file, the dictionary will be loaded and all the plots will have a white background automatically upon starting ROOT.
Now a
ROOT
browser window opens and looks like this ("Root Files" may or may not be selected):
In this window click on
ROOT Files
on the left menu and now the window looks like this:
Double-click on the ROOT file you opened:
root://cmseos.fnal.gov//store/user/cmsdas/2018/pre_exercises/CMSDataAnaSch_MiniAODZMM730pre1.root
, then Events, then scroll down and click patMuons_slimmedMuons__PAT (or the little + that appears next to it) and then patMuons_slimmedMuons__PAT.obj.
A window appears that looks like this:
Scroll a long way down the list (not too fast) and click on pt(). A PAT muon pt distribution will appear. These muons were produced in Z to mumu interactions, as the name of the data sample implies.
QUESTION 6.1 - What is the mean value of the muon pt() for the MC data?
Note: To exit ROOT simply type .q at the command line.
Now open the data file
root://cmseos.fnal.gov//store/user/cmsdas/2018/pre_exercises/CMSDataAnaSch_Data_706_MiniAOD.root
. Similarly run the following command, and answer the following question:
root -l
At the ROOT prompt type the following:
gSystem->Load("libFWCoreFWLite.so");
FWLiteEnabler::enable();
gSystem->Load("libDataFormatsFWLite.so");
gROOT->SetStyle ("Plain");
gStyle->SetOptStat(111111);
TBrowser b;
TFile *theFile = TFile::Open("root://cmseos.fnal.gov//store/user/cmsdas/2018/pre_exercises/CMSDataAnaSch_Data_706_MiniAOD.root");
QUESTION 6.2 - What is the mean value of the muon pt() for the collision data?
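If the forwarded TBrowser window is too slow, the same mean can in principle be computed without any graphics. The sketch below uses the FWLite Python classes Events and Handle from DataFormats.FWLite, which are importable only inside a CMSSW environment (after cmsenv). The helper names mean and muon_pts are hypothetical, and the label slimmedMuons is taken from the branch name used above. Treat this as an untested outline, not as the official way to answer the questions.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    values = list(values)
    if not values:
        raise ValueError("no values to average")
    return sum(values) / float(len(values))


def muon_pts(file_name, label="slimmedMuons", max_events=None):
    """Collect pt() of all PAT muons in a MiniAOD file (needs CMSSW)."""
    # Imported lazily so that this module still loads outside CMSSW.
    from DataFormats.FWLite import Events, Handle

    handle = Handle("std::vector<pat::Muon>")
    pts = []
    for index, event in enumerate(Events(file_name)):
        if max_events is not None and index >= max_events:
            break
        event.getByLabel(label, handle)
        pts.extend(muon.pt() for muon in handle.product())
    return pts


# Usage inside CMSSW (file path taken from this exercise):
#   path = ("root://cmseos.fnal.gov//store/user/cmsdas/2018/pre_exercises/"
#           "CMSDataAnaSch_MiniAODZMM730pre1.root")
#   print(mean(muon_pts(path)))
```

With max_events left at None, every muon in the file enters the average, which is the same population that fills the TBrowser histogram.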
Be sure to submit your answers to the Google Form first set, then proceed to the second set. Links to all exercises below:
Link to NcpSlpExerciseSecondSet
Link to NcpSlpExerciseThirldSet
Link to NcpSlpExerciseFourthSet
Link to NcpSlpExerciseFifthSet
Link to NcpSlpExerciseSixthSet
--
MuhammadAhmad - 2018-05-19
--
MuhammadAhmad - 2018-07-20