The CMS Newcomer Handbook


This is a collection of useful programming tips, physics knowledge, and resources to assist you in the CMS environment.
Use the Table of Contents below or your browser's built-in search function to look for keywords of interest.



Text background color convention

In this tutorial page, the following text background color convention is used:

GREY: For commands.
GREEN: For the output of executed commands.
PINK: For CMSSW parameter-set configuration files.
YELLOW: For any other type of file.


Acronyms

Acronym   Stands For   More Details
ACLiC Automatic Compiler of Libraries for CINT  
AFEB Anode Front-End Board  
ALCA ALignment and CAlibration  
ALCT Anode Local Charged Track A trigger board that sits on top of the CSC chamber.
ALICE A Large Ion Collider Experiment  
AOD Analysis Object Data  
API Application Programming Interface  
APS American Physical Society  
APV Analogue Pipeline Voltage  
ARC Analysis Review Committee  
ASIC Application-Specific Integrated Circuit  
ASO Asynchronous Stage Out  
B2G Beyond 2nd Generation  
BEST Boosted Event Shape Tagger  
BPH B-physics and general quarkonia  
BR Background Region  
BRIL Beam Radiation Instrumentation and Luminosity  
CADI CMS Analysis Database Interface  
CAF CERN Analysis Facility  
CB Crystal Ball function  
CDS CERN Document Server  
CFEB Cathode Front-End Board An electronic board found on a CSC that stores analog signals from strips in SCA for digitization.
CINCO CMS Information on Conferences  
CJLST CMS Joint 4Lepton Study Team  
CLCT Cathode Local Charged Track A trigger board
CMSSDT CMS Software Development Tools  
CMSSW CMS SoftWare  
CR Control Region  
CRAB CMS Remote Analysis Builder  
CSV Combined Secondary Vertex  
CVMFS Cern Virtual Machine File System  
CWR Collaboration-Wide Review  
DAS Data Aggregation System (used to be DBS, Data Bookkeeping Service)  
DB Data Base  
DMB DAQ MotherBoard  
DOC Detector On-Call  
DPF Division of Particles and Fields  
DPG Detector Performance Group  
DPS    
DQM Data Quality Monitoring  
ECF Energy Correlation Functions  
EDM Event Data Model  
EE ECAL Endcap  
EMTF Endcap Muon Track Finder  
EOS    
EPR Experimental Physics Responsibilities You need a certain amount of EPR credit to get authorship on CMS papers.
ES ECAL preShower  
EVE Event Visualization Environment  
EWPT ElectroWeak Precision Tests  
EWSB ElectroWeak Symmetry Breaking  
FAST Final ASsembly and Test  
FC Flavor, Charge  
FGSO Flammable Gas Safety Officer  
FOUT Fraction Out (outside fiducial cuts, for example)  
FSR Final-State Radiation  
FTR    
FUSE Filesystem in USErspace  
FWL Frame Work Lite  
GEM Gas Electron Multiplier  
GFAL Grid File Access Library  
GMT Global Muon Trigger  
GSF Gaussian-Sum Filter  
GT Global Tag  
HEFT Higgs Effective Field Theory  
HELAS HELicity Amplitude Subroutine library  
HEP High Energy Physics  
HF Heavy Flavor  
HI Heavy Ion  
HSCP Heavy Stable Charged Particle  
JEC Jet-Energy Corrections  
JER Jet-Energy Resolution  
JP Jet Probability  
JSON JavaScript Object Notation  
KF Kalman Filter  
LCG LHC Computing Grid  
LFN Logical File Name  
LFV Lepton Flavor Violation  
LHE Les Houches Events  
LLP Long-Lived Particle  
LPC LHC Physics Center  
LSF Load Sharing Facility  
LTT Long-Term Testing refers to CMS electronics that require more testing
LVDB Low Voltage Distribution Board  
LXPLUS Linux Public Login User Service  
MAD Material Access Device  
MCFM Monte Carlo for FeMtobarn processes  
MCM Monte Carlo Manager  
ME MadEvent  
MELA Matrix Element Likelihood Approach (or Analysis)  
MG MadGraph  
MGM    
MLM M. L. Mangano's jet matching scheme (matches matrix-element partons to parton-shower jets)  
MIP Minimum Ionizing Particle  
MPO Multi-Fiber Push On a type of fiber used in indoor conditions
MSSM Minimal Supersymmetric Standard Model  
MTCC Magnet Test and Cosmic Challenge  
MTD MIP Timing Detector  
MTP Multi-fiber Termination Push-on a type of fiber
NNLL Next-to-Next-to-Leading-Logarithmic  
ODH Oxygen Deficiency Hazard  
OMTF Overlap Muon Track Finder  
OSSF Opposite Sign, Same Flavor  
PAD Personnel Access Device  
PAG Physics Analysis Groups  
PAS Physics Analysis Summary  
PAT Physics Analysis Toolkit  
PAW Physics Analysis Workstation  
PDF Parton Distribution Function  
PDG Particle Data Group  
PD Primary Dataset  
PF Particle Flow  
PFN Physical File Name  
PhEDEx Physics Experiment Data Export service  
POG Physics Objects Groups  
PPD Physics Performance and Dataset  
PPE Personal Protective Equipment  
PPS Precision Proton Spectrometer  
PROOF Parallel ROOt Facility  
PS Parton Shower  
PU Pile Up  
PUPPI Pileup Per Particle Identification  
PV Primary Vertex  
PWG Physics Working Group  
RAID Redundant Array of Independent Disks  
RAISIN Radiological Risk Areas Inventory  
RECO Reconstructed  
RelVal Release Validation  
ROC ReadOut Chip  
ROC Receiver Operating Characteristic  
RP Radiation Protection  
RPO Radiation Protection Officer  
RPV R-Parity Violation  
RSO Radiation Safety Officer  
RSS Resident Set Size  
SB Signal, Background  
SCA Switched Capacitor Array  
SCRAM Source Configuration, Release and Management  
SF Scale Factor  
SIP Significance of the 3D Impact Parameter (e.g. < 4)  
SLC Scientific Linux CERN  
SR Signal Region  
SSL Secure Sockets Layer  
STEP Simultaneous Track and Error Propagation  
STEP System Test of Endcap Peripheral (electronics) Validation tests done on the ME chambers to analyze their performance
SUSY SuperSymmetry  
SV Secondary Vertex  
SWAN Service for Web-based ANalysis  
TMB Trigger MotherBoard  
TOTEM TOtal, Elastic and diffractive cross-section Measurement  
TTC Timing, Trigger, and Control system Distributes the system clock, first level triggers and synchronization commands for the experiment
T2 Tier 2 server  
UE Underlying Event  
UFO Unidentified Falling Object --or-- Universal FeynRules Output  
VO Virtual Organization  
VOMS Virtual Organization Membership Service  
WLCG Worldwide LHC Computing Grid  
WN Worker Node  
2HDM Two-Higgs-Doublet Model  
2P2F 2 Pass 2 Fail (2 leptons pass tight selection, 2 fail)  
3P1F 3 Pass 1 Fail (3 leptons pass tight selection, 1 fails)  


Frequent Abbreviations

cmd = command
dir = directory
var = variable

Programming Languages

ROOT/PyROOT

Getting Started

We primarily use ROOT to access .root files and all the tasty data stored inside.

Honestly, I find ROOT to be difficult to work with. So instead I use PyROOT, which is just Python with ROOT's libraries imported.
In your shell, do:

python
from ROOT import *
  • Congrats! You are now working in PyROOT.
  • You can use all the typical ROOT commands (like, make histograms, canvases, open TTrees, etc.) all within the comfort of friendly Python syntax!
While inside this "PyROOT" interpreter, you can open files locally or remotely:
python
from ROOT import *
f1 = TFile.Open("root://cmsio5.rc.ufl.edu//cms/data/store/user/Path/To/File/GluGluHToZZTo4L_M125_13TeV_powheg2_JHUGenV7011_pythia8.root")
f1.ls()
TFile**      root://cmsio5.rc.ufl.edu//cms/data/store/user/t2/users/rosedj1/ForPeeps/ForFilippo/GluGluHToZZTo4L_M125_13TeV_powheg2_JHUGenV7011_pythia8.root   
 TFile*      root://cmsio5.rc.ufl.edu//cms/data/store/user/t2/users/rosedj1/ForPeeps/ForFilippo/GluGluHToZZTo4L_M125_13TeV_powheg2_JHUGenV7011_pythia8.root   
  KEY: TDirectoryFile   Ana;1   Ana
A couple of points:
  • root: selects the XRootD protocol, which lets ROOT read the file over the network
  • // is a separator
  • The host name that follows (cmsio5.rc.ufl.edu in this example) is the XRootD server. To access remote CMS files from anywhere, use the global "redirector" cms-xrd-global.cern.ch instead.
This root file in particular has a "directory" object called Ana from the TDirectoryFile class. Let's see what's inside:
f1.Get("Ana").ls()
TDirectoryFile*      Ana   Ana
 KEY: TTree   passedEvents;1   passedEvents
 KEY: TH1F   nEvents;1   nEvents in Sample
 KEY: TH1F   sumWeights;1   sum Weights of Sample
 KEY: TH1F   nVtx;1   Number of Vertices
 KEY: TH1F   nInteractions;1   Number of True Interactions
We see some 1-dimensional histograms (the TH1F dudes) and a TTree, which stores most of the juicy data we want.
Let's go into that TTree, check out the 0th event, and see what branches the TTree contains:
t1 = f1.Get("Ana/passedEvents")
t1.Show(0)
======> EVENT:0
 Run             = 1
 Event           = 772131
 LumiSect        = 8044
 nVtx            = 31
 nInt            = 26
 finalState      = -1
 triggersPassed  = HLT_IsoMu20_v12HLT_L1SingleMu18_v3HLT_L1SingleMu25_v2HLT_L2Mu10_v7HLT_Mu17_TrkIsoVVL
 passedTrig      = 1
 passedFullSelection = 0
 genWeight       = 1
 pileupWeight    = 1.27664
 dataMCWeight    = 1
 eventWeight     = 1.27664
 crossSection    = 1
 lep_ecalDriven  = (vector<int>*)0x50d6660
 lep_tightId     = (vector<int>*)0x4a091b0
 nisoleptons     = 0
 H_mass          = (vector<float>*)0x50cd530
 mass4l          = -1
 pTZ1            = -1
 pTZ2            = -1
 met             = 24.7942
 nFSRPhotons     = 0
 allfsrPhotons_dR = (vector<float>*)0xfb5eaf0
 fsrPhotons_lepindex = (vector<int>*)0xfdcced0
 fsrPhotons_pt   = (vector<float>*)0xfe95950
 passedFiducialSelection = 1
 GENlep_id       = (vector<int>*)0x11244610
 GENmass4l       = 125
 GENmassZ1       = 42.1127
 GENmassZ2       = 18.0732
 D_bkg_kin       = 999
 ...
This shows you what information is stored inside the TTree. You can use all sorts of TTree methods to sift through the data.
The most useful methods that I know of are:
t.Show(2)                                       # Shows all branches and values of entry number 2 (the third entry; counting starts at 0).
t.Scan()                                        # Scans the first 25 entries. Press Enter to show another 25 entries.
t.Scan("<branch_name>")                         # Scan across a specific branch in the TTree.
t.Scan("Event:triggersPassed:GENmass4l")        # Simultaneously scan across multiple branches. SUPER USEFUL.
t.GetEntries()                                  # Gives total number of entries in N-Tuple.
t.GetEntries("passedFullSelection==1")          # Only count entries with passedFullSelection==1.
t.GetEntries("Sum$(abs(GENlep_id[])==11)==4")   # Sum$ adds up over the elements of a vector branch: here, count entries with exactly 4 GEN leptons that have |id|==11.
t.GetEntry(2)                                   # Loads entry number 2 (the third entry; counting starts at 0) and allows you to extract branch info.
  Then: t.eventWeight                           # Get value of eventWeight for that entry.
t.Print()                                       # Another way to see what branches your tree has.
t.Draw("pTZ1")                                  # Make a histogram of the pTZ1 branch.
t.Draw("pTZ1","pTZ2 > 80")                      # Make a histogram of pTZ1 but apply cuts on pTZ2.
t.Draw("pTZ1","pTZ2 > 80 && nVtx < 5")          # Can combine selection criteria.
t.Draw("ebeam","(1/e)*(sqrt(z)>3.2)")           # Apply a weight of 1/e to all entries whose sqrt(z)>3.2
t.Draw("patMuons_slimmedMuons__PAT.obj.eta()","abs(patMuons_slimmedMuons__PAT.obj.eta())<1.2","")
t.Draw("some_branch>>h1", "", "goff")    # Store the values from <some_branch> into a histogram called h1.
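
The string-based methods above cover most quick checks. For anything more involved, a plain Python loop over the tree is often easier to read. The sketch below is written for Python 3 and assumes that looping over a PyROOT TTree yields one entry at a time, with branches available as attributes (e.g. event.eventWeight). The helper name sum_weights and the stand-in entries are hypothetical; they are used so that the snippet runs without a ROOT file.

```python
from types import SimpleNamespace

def sum_weights(tree, selection=None):
    """Sum eventWeight over the entries of `tree` that pass `selection`.

    `tree` is any iterable of entries whose branches are attributes,
    which is the behavior assumed here for `for event in tree:` in PyROOT.
    """
    total = 0.0
    for event in tree:
        if selection is None or selection(event):
            total += event.eventWeight
    return total

# Stand-in entries (hypothetical values) so the example runs without ROOT.
fake_tree = [
    SimpleNamespace(eventWeight=1.5, passedFullSelection=1),
    SimpleNamespace(eventWeight=2.0, passedFullSelection=0),
    SimpleNamespace(eventWeight=0.5, passedFullSelection=1),
]

all_weights = sum_weights(fake_tree)
selected = sum_weights(fake_tree, lambda ev: ev.passedFullSelection == 1)
print(all_weights, selected)
```

With a real file, the same function would be called on the tree itself, e.g. sum_weights(t1, lambda ev: ev.passedFullSelection == 1).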

Histograms

A histogram ("histo") is a kind of frequency plot; it shows your data in "bins", based on how often certain data values occur.
  • The most common kind of histo is a TH1F (1-dimensional Histogram of Floats).

Make a histo in Python:

f = TFile("histos.root", "NEW")    # Create a new ROOT file to hold the histo.
h = TH1F("hgaus", "histo from a gaussian", 100, -3, 3)    # Arguments: name, title, number of bins, x min, x max.
h.FillRandom("gaus", 10000)    # Pull 10,000 values from a normalized Gaussian distribution.

Make a histo in C++:

TH1F *h = new TH1F("h", "My Histogram", 100, -20, 20);
h->FillRandom("gaus", 5000);    // Fill histo with 5000 random points pulled from a Gaussian distribution.

Frequent Histogram Methods:

h->Fill(gRandom->Gaus(4,2));    // Fill histo with a single point pulled from a Gaussian with mu=4, sigma=2.
for (int i=0; i<1000; i++) {h->Fill(gRandom->Gaus(40,15));}    // Do a for loop to fill histo with many points.
h->GetEntries();            // Returns how many total values have been put into the bins.
h->GetMaximum();            // Returns the content of the bin which holds the most entries.
h->GetMinimum();            // Returns the content of the bin which holds the fewest entries.
h->GetBinContent(<int bin_num>);   // Returns the content of bin number bin_num.
h->GetMaximumBin();         // Tells you which bin holds the most entries; returns the bin number (not the x value of the bin!).
h->Draw();    // Draws the histo with the default style (points with error bars if errors are stored, e.g. after Sumw2).
h->Draw("HIST");   // Draws the histo using rectangles. Looks more professional!
h->Draw("HIST e");   // Draw histo with error bars ( where err = sqrt(num_entries_in_bin) ).
h->GetMaximumStored();            // Returns the maximum set with SetMaximum() (-1111 if none was set).
h->GetMean();                     // Get average of histogram.
h->GetStdDev();                   // Get standard deviation of histo.
h->GetXaxis()->GetBinCenter(<int bin>);   // Returns the x value where the center of bin is located.
h->GetNbinsX();                   // Returns the number of bins along x axis.
h->Fill(<double x>, <double w>);      // Adds weight w to the bin that contains the value x.
h->SetBinContent(<int bin>, <double val>);   // Replaces whatever is in bin number <bin> with value <val>
                              (counts as adding a NEW entry!)
h->SetAxisRange(double <xmin>, double <xmax>, "<X or Y>");    // Restrict the displayed range of the chosen axis.
h->SetMaximum(max_y * 1.2);    // Set the top of the y axis, e.g. 20% above the tallest bin.
h->GetXaxis()->SetRangeUser(80, 250);    // Show only 80 < x < 250.
h->IntegralAndError(<bin1>,<bin2>,<err>);               // Calculates the integral from bin1 to bin2.
    - err will store the error that gets calculated
    - so in PyROOT, before you execute the IntegralAndError, first do err = Double(2) to create the err variable
      (newer ROOT releases removed ROOT.Double; there, use err = ctypes.c_double(0.0) and read err.value)
gPad->SetLogy();                  // Set y axis to log scale. This is a pad/canvas method, not a histogram method.
h->Write();    // Writes the histo to the currently open ROOT file.
h->GetXaxis()->GetBinCenter( h->GetMaximumBin() );      // Returns most-probable value of histo.
h->Sumw2();    // Store the sum of squared weights to ensure proper error propagation. Call before filling.
h->Rebin(10);    // Merge every 10 bins into one. Errors are automatically recalculated.

"Pretty up" your plot:
h.GetXaxis().SetTitle("massZ (GeV)")   # Put title on X axis. Can also use: h.SetXTitle()
h.GetYaxis().SetTitleOffset(1.3)      # Move the Y axis title away from the axis a bit.
h.SetAxisRange(0.0, 0.1, "X")                 # Sets range on X axis.
h.SetLabelSize(0.03, "Y")                        
h.SetLineColor(1)

TH2F: (2-dimensional Histogram of Floats)

  • Makes "heatmaps".
h2 = TH2F("h2","h2 with latex titles",40,0,40,20,0,10)
h2.Integral()                           # calculate integral over ALL bins
h2.IntegralAndError(xbin1, xbin2, ybin1, ybin2, err)   # calculates the integral over square region, specified by bins
- err will store the error that gets calculated
- so before you execute the IntegralAndError, first do err = Double(2) to create the err variable 
h2.Draw("COLZ1")    # "COL" means color, "Z" means draw the color bar, "1" makes all cells<=0 white!

EXTRA HISTOGRAM TIPS

Load histo from a root file:
TFile f("histos.root");
TH1F *h = (TH1F*)f.Get("hgaus");

Bin Convention:

  • bin = 0; underflow bin
  • bin = 1; first bin with low-edge xlow INCLUDED
  • bin = nbins; last bin with upper-edge xup EXCLUDED
  • bin = nbins+1; overflow bin
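
The convention above can be spelled out in a few lines of plain Python (written for Python 3). This is only an illustrative sketch for equal-width bins, not ROOT's implementation; the function name find_bin is hypothetical, and in real code you would ask the histogram itself (for example with h.FindBin(x)).

```python
def find_bin(x, nbins, xlow, xup):
    """Return the bin number for x, following the convention listed above."""
    if x < xlow:
        return 0                      # underflow bin
    if x >= xup:
        return nbins + 1              # overflow bin (upper edge xup is EXCLUDED)
    index = int((x - xlow) * nbins / (xup - xlow))
    return min(index, nbins - 1) + 1  # regular bins are numbered 1..nbins

# Six bins of width 1 between -3 and 3.
for value in (-3.5, -3.0, 0.0, 2.999, 3.0):
    print(value, "-> bin", find_bin(value, 6, -3.0, 3.0))
```

Note that x = -3.0 (the low edge) lands in bin 1, while x = 3.0 (the upper edge) lands in the overflow bin.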

Fitting functions to a histo:

h->Fit("gaus")    // Fit a Gaussian curve to the histo.

// Create your own fitting function:
TF1 *fitfunc = new TF1("m1","gaus",85,95);    // OK this one is still a Gaus, but it is only valid from 85 < x < 95.
fitfunc->SetLineColor(1);
fitfunc->SetLineWidth(2);
fitfunc->SetLineStyle(2);
h->Fit(fitfunc, "R");    // "R" restricts the fit to the range of the function.
TFitResultPtr param = h->Fit(fitfunc, "S");    // "S" returns the fit result, which gets saved into the variable param.
gStyle->SetOptFit(1111);    // Show the fit results in the statistics box.

To learn more about fitting functions, check out:

Combine similar histograms together using the hadd command (hadd = "histogram add"):

  • hadd  <newfile>.root  <oldfile1>.root  <oldfile2>.root
  • Example: hadd  -f  newfile.root  oldfile1.root  oldfile2.root  oldfile3.root
    • The -f flag forces the new file to be produced, even if the file already exists.

For histos with equal bin width,
it is probably better to set the bin width rather than setting min_bin, max_bin, and the number of bins!

There is an "overflow bin" beyond the right edge of the histogram that collects entries which lie above the histogram x range.
There is also an "underflow bin" beyond the left edge for entries which lie below the x range.

  • Entries in the underflow and overflow bins DO count towards the total number of entries (GetEntries()).
  • They are NOT counted in the statistics (mean, stdev) or, by default, in Integral().

Normalizing Histos: h->Scale(1.0 / h->Integral())

TGraphs

The most common kinds of graphs:
  • TGraphErrors: plot data points with error bars.
  • TF1: plot any function that you can create.
  • TMultiGraph: draw several graphs on the same axes.

tg = TGraph(<int n_points>, <x_array>, <y_array>)
tg.GetXaxis().SetTitle("<x_title>")
tg.Draw("APC")

Drawing Options   Description
"A"   Axis are drawn around the graph
"I"   Combine with option 'A' it draws invisible axis
"L"   A simple polyline is drawn
"F"   A fill area is drawn ('CF' draw a smoothed fill area)
"C"   A smooth Curve is drawn
"*"   A Star is plotted at each point
"P"   The current marker is plotted at each point
"B"   A Bar chart is drawn
"1"   When a graph is drawn as a bar chart, this option makes the bars start from the bottom of the pad. By default they start at 0.
"X+"   The X-axis is drawn on the top side of the plot.
"Y+"   The Y-axis is drawn on the right side of the plot.
"PFC"   Palette Fill Color: graph's fill color is taken in the current palette.
"PLC"   Palette Line Color: graph's line color is taken in the current palette.
"PMC"   Palette Marker Color: graph's marker color is taken in the current palette.
"RX"   Reverse the X axis.
"RY"   Reverse the Y axis.

How to use TMultiGraph:

mg = TMultiGraph("<internal_name>", "<title>")
mg.SetMaximum(<maxval>)      # set the maximum of the y-axis to <maxval>

LaTeX in ROOT

Instead of using \, use #.
h.GetXaxis().SetTitleOffset(1.4)
h.GetXaxis().SetTitle("#left| #frac{1}{1 - #Delta#alpha} #right|^{2} (1+cos^{2}#theta)")
h.GetYaxis().SetTitle("#frac{2s}{#pi#alpha^{2}}  #frac{d#sigma}{dcos#theta}")
h.Draw()

Legends

leg = TLegend(xmin, ymin, xmax, ymax)    # (all floats between 0 and 1, as a proportion of the x or y dimension)

Example:
leg = TLegend(0.60,0.7,0.8,0.9)                                           
leg.AddEntry(h1, "Mass = %s" % mZd,"lpf")                                                         
leg.SetLineWidth(3)                                                       
leg.SetBorderSize(0)                                                      
leg.SetTextSize(0.03)    
leg.Draw("same")        

Python

Take in user input:

usrinput = raw_input("Process which file?")    # User input will be stored in usrinput as a string. (Python 2; in Python 3, use input().)

help(<object>)         # brings up a help menu (docstring?) for <object>
e.g.   help(os.makedirs)   

It is often useful to debug a python script by doing: 
python -i <script.py>
- this executes the script and then puts you in the python interpreter
- This is beneficial because now all variables have been initialized and you can play around!

Printing
print "<text>"    # Python 2 print statement; in Python 3, use print("<text>")
"Hello, %s. You are %s." % (name, age)      # called "%-formatting"; the docs recommend the newer styles instead
"Hello, {1}. You are {0}.".format(age, name)   # "str.format", more flexible!
- {0} is replaced by the 0th argument of format() (age), and {1} by the 1st argument (name)

print "{0:.1f}\t{1:15.4E}\t{2:15.4E}".format( 
    mZdList[k],                                                     
    sigseleffList_4e[k],                                                                             
    sigseleffList_4mu[k],                                                                           
    )
- {<index>:<width>.<decimalplaces><type>}
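
To see what the format specification does, here is a small sketch written for Python 3 (so print is a function). The helper name format_row and the numbers passed to it are made up for illustration.

```python
def format_row(mass, eff_4e, eff_4mu):
    """Format one table row: mass with 1 decimal place, then two
    efficiencies in scientific notation, each padded to 15 characters."""
    return "{0:.1f}\t{1:15.4E}\t{2:15.4E}".format(mass, eff_4e, eff_4mu)

row = format_row(4.0, 0.5, 0.25)
print(row)

# Each efficiency column is exactly 15 characters wide (right-aligned).
columns = row.split("\t")
print([len(col) for col in columns])
```

Numbers are right-aligned inside the requested width by default, which is what keeps the columns lined up when many rows are printed.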


The most common objects in Python:

Dictionaries
mydict = {}
- Mutable

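Iterate over values:
for val in mydict.values():
   print val
Iterate over keys and values:
for key,val in mydict.items():
   print "The key is:", key, "and the value is:", val
Iterate over keys (iterkeys() is Python 2 only; in Python 3, use mydict.keys()):
for key in mydict.iterkeys():
   print key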

Lists
- Ordered and mutable!
mylist = [1,3,'hey']         # lists can hold different data types
mylist.append(4.6)         # permanently appends value to mylist


Tuples
- Ordered and immutable!
Very similar to lists... except tuples are immutable!
They are processed faster than lists


for loops
for item1,item2 in zip( list1,list2 ):   # will iterate through item1 at same time as item2
   # do stuff

range
- creates an iterable object
- useful for "for loops"
xrange (Python 2 only) is faster and requires less memory, but has less versatility; in Python 3, range itself behaves like xrange

Functions:
Variable number of arguments:
def asManyAsYouWant(var, *argv):   # pass in as many arguments into argv as you want
   for arg in argv:               # each one will be iterated over
      print "do stuff"
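
As a concrete illustration of a variable number of arguments, here is a small sketch written for Python 3. The function name total_length is hypothetical.

```python
def total_length(first, *rest):
    """Add up the lengths of one or more sequences."""
    total = len(first)
    for seq in rest:          # `rest` is a tuple holding all extra positional arguments
        total += len(seq)
    return total

one = total_length("ab")
three = total_length([1, 2], (3,), "xyz")

# A list of arguments can be "unpacked" into the call with a leading *.
pieces = ["ab", "c"]
unpacked = total_length(*pieces)
print(one, three, unpacked)
```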

Lambda functions
A way to write quick functions

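square = lambda x: x**2
pythag = lambda x,y: np.sqrt(x**2 + y**2)    # requires: import numpy as np
ls = lambda : os.listdir('.')                # requires: import os
printstuff = lambda *args: print(*args)      # Python 3 (in Python 2, first do: from __future__ import print_function)
- can handle any number of arguments
Call these functions just like regular functions: 
- square(3), pythag(3,4), ls(), printstuff('a','b'), etc. 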

Parentheses around the condition of an if statement are optional, but they can make a complicated condition easier to read:
if (not os.path.exists(<path_to_dir>)): print "this dir doesn't exist"


Passing arguments to script:
myscript.py  arg1  arg2
sys.argv[0]   # name of script (myscript.py)
sys.argv[1]   # first argument passed to script (arg1)
sys.argv[2]   # second argument passed to script (arg2)
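
A minimal sketch of how a script might pick apart sys.argv, written for Python 3. The function name describe_args is hypothetical; it takes the argument list as a parameter so that it can also be tried out interactively.

```python
import sys

def describe_args(argv):
    """Split an argv-style list into the script name and its arguments."""
    script_name = argv[0]
    arguments = argv[1:]      # everything after the script name
    return script_name, arguments

if __name__ == "__main__":
    name, args = describe_args(sys.argv)
    print("script:", name)
    print("arguments:", args)
```

For example, describe_args(["myscript.py", "arg1", "arg2"]) gives back the script name and the list of the two arguments. All elements of sys.argv are strings, so numbers must be converted with int() or float().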


Useful string methods:
<string>.lstrip()         # returns a copy of <string> with leading whitespace removed (the original string is unchanged)
<string>.rstrip('/')         # returns a copy of <string> with any trailing '/' characters removed
<string>.startswith('#')      # returns True if <string> starts with '#', otherwise False

module    = container for code, e.g. a .py file (a module that lives inside a package is called a submodule!)
package = a module that contains other modules, e.g. a directory with an __init__.py file

Classes:

import numpy as np

class Vector():
   def __init__(self,x,y,z):
      self.x = x
      self.y = y
      self.z = z
   
   def length(self):
      return np.sqrt(self.x**2+self.y**2+self.z**2)    # use self.x etc. to reach the object's own attributes

Now you can create objects:
myobj = Vector(9,4,2)
myobj.x         # get x coord
myobj.length()      # get length of myobj

Built-in methods:
__doc__
__init__
__module__
__dict__
dir(myobj)      # show all the attributes of myobj
myobj.__dict__      # returns a dictionary of {attributes:values}

Save your objects in a pickle:
import pickle
mylist = [1,2,3]
fileobj = open(<file_to_write_to>,'wb')
pickle.dump(mylist, fileobj)
fileobj.close()
Easily restore the pickled object:
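anotherfileobj = open(<file_with_pickled_obj>, 'rb')    # open in binary mode, matching the 'wb' used for writing
restoredlist = pickle.load(anotherfileobj)
anotherfileobj.close()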


Packages:

glob
import glob
glob.glob("/raid/raid7/rosedj1/Higgs/*/*/Data.root")   # stores matched files in a list object!

Remember that it's not regex! It's standard UNIX path expansion.
How to use wildcards:
*      # matches 0 or more characters
?      # matches 1 character in that position
[0-9]   # matches any single digit

glob.glob("/home/file?.txt")   # `?' will match a single character
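
The wildcard rules can be tried out without touching the filesystem by using the standard-library fnmatch module, which glob relies on for matching names. The sketch below is written for Python 3, and the file names are made up.

```python
import fnmatch

names = ["file1.txt", "file2.txt", "file10.txt", "fileA.txt", "notes.txt"]

single_char = fnmatch.filter(names, "file?.txt")      # ? matches exactly one character
digit_only = fnmatch.filter(names, "file[0-9].txt")   # [0-9] matches exactly one digit
anything = fnmatch.filter(names, "file*.txt")         # * matches zero or more characters

print(single_char)
print(digit_only)
print(anything)
```

Note that "file10.txt" is matched only by the * pattern, because ? and [0-9] each stand for a single character.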

sys
import sys
print sys.version         # find out what version of python is running the script
sys.exit()               # Immediately ends program. Useful for debugging. 

os
import os
os.getcwd()               # returns string of current working dir (equivalent to `pwd`)
os.system(<cmd>)          # not recommended, since the output is not stored in a variable; only the exit status (0 for success, nonzero for failure) is returned; use module: subprocess instead
os.path.join(<dir1>, <dir2>)      # joins <dir1> and <dir2> into a single path, supplying '/' as needed
os.path.split(<path/to>/<file>)   # returns a 2-tuple with (<path/to>, <file>) (good for finding the parent dir of <file>)
os.path.exists(<path>)            # returns True if <path> exists, otherwise False
os.makedirs(<dirpath>)         # make directory <dirpath>, recursive
os.environ['USER']            # returns string of current user (same as doing `echo $USER` in bash)
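
Several of these os functions are typically used together, for example to create an output directory only if it is missing. The sketch below is written for Python 3 and works inside a temporary directory (removed at the end) so that it does not touch your real files; the directory names are made up.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                      # scratch area for the demo
new_dir = os.path.join(base, "plots", "2017")  # '/' separators are supplied as needed

if not os.path.exists(new_dir):
    os.makedirs(new_dir)                       # creates "plots" and "plots/2017" in one go

created = os.path.isdir(new_dir)
parent, last = os.path.split(new_dir)          # split off the last path component
print(created, last, parent == os.path.join(base, "plots"))

shutil.rmtree(base)                            # clean up the scratch area
```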

subprocess
Python can run shell commands
import subprocess
subprocess.call( ['<cmd>', '<arg1>', ...] )      # runs <cmd> with the given arguments (no shell is involved unless shell=True) and returns its exit status
var = subprocess.check_output(<cmd>)      # allows you to store output of <cmd> in var
var = subprocess.check_output(['ls', '-a'])

ret_output = subprocess.check_output('date')
print ret_output.decode("utf-8")
- Thu Oct  5 16:31:41 IST 2017

Clean way:
import shlex, subprocess
command_line = "ls -a"
args = shlex.split(command_line)
p = subprocess.Popen(args)

Example:
import subprocess, shlex                                                       
                                                                               
def processCmd(cmd):                                                                                            
    args = shlex.split(cmd)                                                    
    sp = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = sp.communicate()                                                                           
    return out, err                                                            
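
A variation on the same idea, sketched for Python 3, where communicate() returns bytes that need decoding. The helper name run_and_capture is hypothetical. It also returns the exit code, and the demo runs the Python interpreter itself so that it does not depend on any particular shell command being installed.

```python
import subprocess
import sys

def run_and_capture(args):
    """Run a command given as a list of strings.

    Returns (exit_code, stdout_text, stderr_text).
    """
    sp = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = sp.communicate()   # waits for the command to finish
    return sp.returncode, out.decode("utf-8"), err.decode("utf-8")

code, out, err = run_and_capture([sys.executable, "-c", "print('hello')"])
print(code, out.strip())
```

Checking the exit code is a cheap way to notice that a command failed before trying to parse its output.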


argparse
# Note that it is difficult to use bools as input arguments with type=bool!
# A quick hack is to pass in '0' and '1' instead. Python is forgiving. :-)
# argparse also offers action='store_true' for simple on/off flags.
import argparse
def ParseOption():
    parser = argparse.ArgumentParser(description='submit all')                                                         
    parser.add_argument('--min', dest='min_relM2lErr', type=float, help='min for relMassZErr')                                                                                  
    parser.add_argument('--filename', dest='filename', type=str, help='')                                                                                                                                                                 
    parser.add_argument('--zWidth', dest='Z_width', type=float, help='Z width in MC or pdg value')                     
    parser.add_argument('--plotBinInfo', dest='binInfo', nargs='+', help='', type=int)#, required=True)                       
    parser.add_argument('--doubleCB_tail',dest='doubleCB_tail', nargs='+', help='', type=float)#, required=True) 
    parser.add_argument('--pTErrCorrections', dest='pTErrCorrections', nargs='+', help='', type=float)#, required=True)                                                 
    args = parser.parse_args()                                                                                         
    return args                

# Call the function.                                                                                                              
args=ParseOption()                    
# Get values from args.
args.Z_width                                                                          
massZErr_rel_min = args.min_relM2lErr
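
For on/off options, argparse's action='store_true' avoids the bool problem: the flag is False unless it appears on the command line. The sketch below is written for Python 3; the option names --verbose and --nBins are hypothetical.

```python
import argparse

def parse_options(argv=None):
    """Parse options; pass a list of strings to test without a command line."""
    parser = argparse.ArgumentParser(description="boolean flag demo")
    parser.add_argument("--verbose", dest="verbose", action="store_true",
                        help="print extra information")
    parser.add_argument("--nBins", dest="n_bins", type=int, default=100,
                        help="number of histogram bins")
    return parser.parse_args(argv)

quiet = parse_options([])
chatty = parse_options(["--verbose", "--nBins", "5"])
print(quiet.verbose, quiet.n_bins)
print(chatty.verbose, chatty.n_bins)
```

Calling parse_options() with no argument reads from sys.argv as usual.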



Matplotlib.pyplot

fig, ax  = plt.subplots()

f = plt.figure(figsize=(12,3))
ax1 = f.add_subplot(121)
ax2 = f.add_subplot(122)
-----
f,(ax1,ax2) = plt.subplots(1,2)      # 1 row, 2 col

plt.axis( [<xvals>.min() , <xvals>.max() , <yvals>.min() , <yvals>.max()] )   # control axis range; the four limits must be passed as a single list

Axes:
ax.set_ylim([<ymin>,<ymax>])      # sets y bounds from <ymin> to <ymax>
ax.tick_params(axis='y', labelsize=8)   # adjust size of numbers on y-axis
plt.xscale('log')               # make x-axis log scale
ax.set_xscale('log')
ax.set_xlabel(r'$<LaTeX!!!>$')      # Use LaTeX commands 
ax.set_xlabel(r"$lep$: $p_{T}$ / GeV")      # text inside $...$ uses the LaTeX (math) font; text outside uses the regular font
https://matplotlib.org/users/mathtext.html

Font sizes:
plt.rc('font', size=BIGGER_SIZE)          # controls default text sizes
plt.rc('axes', titlesize=BIGGER_SIZE)     # fontsize of the axes title
plt.rc('axes', labelsize=BIGGER_SIZE)    # fontsize of the x and y labels
plt.rc('xtick', labelsize=BIGGER_SIZE)    # fontsize of the tick labels
plt.rc('ytick', labelsize=BIGGER_SIZE)    # fontsize of the tick labels
plt.rc('legend', fontsize=MEDIUM_SIZE)    # legend fontsize
plt.rc('figure', titlesize=BIGGER_SIZE)  # fontsize of the figure title

Find indices of the minimum of arr:
np.unravel_index(np.argmin(<arr>, axis=None), <arr>.shape)



Importing Packages and Modules
If you get ImportError, then most likely the python interpreter doesn't know the path to your package.
1. Do echo $PYTHONPATH to see which paths the python interpreter knows about
2. Do export PYTHONPATH=$PYTHONPATH:<path/to/package>    # append <path/to/package> to $PYTHONPATH for the current shell session; put this line in your ~/.bashrc to make it permanent
3. Make sure that you have the file __init__.py in each dir of your package.
    - can do: find <path/to/package> -type d -exec touch '{}/__init__.py' \;
More on this: https://askubuntu.com/questions/470982/how-to-add-a-python-module-to-syspath/471168

sys.path                      # list of directories; python searches these file paths for packages to use
sys.path.append(<path/to/package>)   # append <path/to/package> to the search path, for the current python session only
sys.path.insert(0, <path/to/package>)   # same, but insert <path/to/package> as the 0th element in the sys.path list so that it is searched first
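
The effect of sys.path can be checked with a throwaway module. The sketch below is written for Python 3; the module name mytools_demo and its contents are hypothetical and are created in a temporary directory.

```python
import importlib
import os
import sys
import tempfile

# Write a tiny module into a directory that python does not know about yet.
package_dir = tempfile.mkdtemp()
with open(os.path.join(package_dir, "mytools_demo.py"), "w") as f:
    f.write("def double(x):\n    return 2 * x\n")

sys.path.insert(0, package_dir)        # now python searches this directory first
importlib.invalidate_caches()          # make sure the new file is noticed
mytools_demo = importlib.import_module("mytools_demo")

result = mytools_demo.double(21)
print(result)

sys.path.remove(package_dir)           # undo the change to the search path
```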



import os,ROOT,pickle                                                                   
from .Utils.processCmds import processCmd                                               
                                                                                        
class Hadder(object):                                                                   
    def haddSampleDir(self,dir_path):                                                   
        processCmd('sh '+os.path.join(dir_path,"hadd.sh"))                              
                                                                                        
    def makeHaddScript(self,dir_path,sampleNames,outputInfo):                           
        haddText="hadd -f {0} ".format(dir_path+"/"+outputInfo.TFileName)               
        basedir = os.path.dirname(dir_path)+"/"                                         
        for sampleName in sampleNames:                                                  
            haddText += " {0}/*_{1}".format(basedir+sampleName,outputInfo.TFileName)+" "
            #haddText += "\n"                                                           
        outTextFile = open(dir_path+"/hadd.sh","w")                                     
        outTextFile.write(haddText)                                                     


numpy
import numpy as np
np.log10(x)
dir(np)
- gives big list of all available functions in numpy
lists:
x = [3,-1,5.5,0]
np.mean(x)      # returns 1.875
map(np.exp,x)
- ^maps a function to each element of a list (in Python 3, use list(map(np.exp,x)) to see the values)
Define an array of 'r' values and one of 'theta' values
a = np.arange(1,10).reshape(3,3)   # makes a 3x3 array
a.size   # total number of elements: 9
a.shape   # dimensions of the array: (3, 3)
np.linspace(start,stop,number_of_values)   # evenly spaced values from start to stop (inclusive)

np.arctan2(y,x)
Element-wise arc tangent of y/x, choosing the quadrant correctly.


Meanings of underscores in variable names:
By the way, a double underscore is often called a 'dunder'!
_var      # when you import using, say: 'from ROOT import *', then '_var' will not be imported
                - single underscores are meant for variables for internal use only (within classes, e.g.); not enforced by interpreter
var_      # this one's easy: a trailing underscore is used simply to avoid naming conflicts (e.g., class_ = 'just a regular string')
__var      # inside a class, the interpreter will intentionally name-mangle this var (to _ClassName__var) so that it doesn't get overwritten by subclasses
__var__   # only used for special vars native to the Python language; don't define these yourself!
_      # used as a placeholder var in a function or something; a 'throw-away' variable
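
The difference between the single and double leading underscore can be seen directly. The sketch below is written for Python 3; the class name Detector and its attribute values are made up.

```python
class Detector(object):
    def __init__(self):
        self.name = "CSC"        # public attribute
        self._voltage = 3600     # "internal use" by convention only; still accessible
        self.__serial = "A1"     # name-mangled to _Detector__serial

d = Detector()
public = d.name
internal = d._voltage                  # works, but signals "hands off" to other users
mangled = d._Detector__serial          # the mangled name is how python stores it
has_plain = hasattr(d, "__serial")     # the plain name does not exist on the object

# '_' as a throw-away variable: we only care about the second item of each pair.
second_items = [b for _, b in [(1, "x"), (2, "y")]]
print(public, internal, mangled, has_plain, second_items)
```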



Reading from and writing to files:
Read lines from a file:
with open(<filename>) as f:
   content = f.readlines()
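
Each string returned by readlines() still ends with its newline character, so the lines usually need cleaning before use. The sketch below is written for Python 3; the file contents are made up and are written to a temporary file first so that the example is self-contained.

```python
import os
import tempfile

# Create a small text file to read back (hypothetical contents).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("# comment\nrootPath: /some/path\n\nrootfile: Data.root\n")

with open(path) as f:
    lines = [line.strip() for line in f]   # strip() removes the trailing newline

# Keep only non-empty lines that are not comments.
useful = [line for line in lines if line and not line.startswith("#")]
print(useful)

os.remove(path)                            # clean up the temporary file
```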


with open(savePath + saveName + ".txt", "w") as myfile:
   myfile.write('rootPath: ' + rootPath1 + '\n')       
   myfile.write('rootfile: ' + rootfile1 + '\n')       
   for i in range(len(vars1_x)):                       
       myfile.write('var_x: ' + vars1_x[i] + '\n')     
       myfile.write('var_y: ' + vars1_y[i] + '\n')     
       myfile.write('cut: ' + cuts1[i] + '\n')         
# No myfile.close() is needed: the with block closes the file automatically.

IPython
IPython is like a quick jupyter notebook for your terminal.
Extremely useful for its "magic" commands, tab completion, 
and ability to go back and edit blocks of code.

?         # Intro and overview of IPython
%quickref   # quick reference

A command that starts with % is called a "line magic" and %% is called a "cell magic"
- these are non-native to C++ or python, but understood by IPython for really cool effects!

%magic         # bring up tutorial on magics
%lsmagic         # bring up magic commands
%<magicname>?   # get help on <magicname>

Cell Magic:
%%!      
<commands>      # begins a cell magic and then passes the cell to the shell 

If you need to pass in arguments into a script using ipython:
ipython <script.py> -- --arg1 --arg2   # note the '--' between <script.py> and --arg1

------
Topics to research:
import multiprocessing

Bash/Linux

Linux Tutorial: https://ryanstutorials.net/linuxtutorial/commandline.php

Remember the philosophy of Unix: "small, sharp tools"

MUST KNOW Bash commands:
ls                   # list most contents in current directory
ls -a                # list all contents (including hidden files) in current dir
ls -l                # list contents in a long format (just more detailed way)
pwd
cd <path/to/files>
cd ..
cd -                 # go back to previous dir
cp
mkdir

Some Bash magic:
!                # this is the 'bang' operator, an iconic part of bash
!!               # execute the last command from history
sudo !!          # run last command with sudo privileges
!cat             # run the most recent command in your history that started with 'cat'
!cat:p           # print the last cat command you used to stdout; also adds that command to your history
history          # check your history; displays command numbers
!<number>        # execute command number <number>
!$               # means the last argument of the last command
cd !$            # cd's into the last command's argument; e.g.
    mkdir /new/awesome/folder/
    cd !$        # would cd you into /new/awesome/folder/
^ls^rm           # if the last command used 'ls', it copies the command, replaces the first 'ls' with 'rm' and executes the new command
 command         # (note the leading space) will not add 'command' to history, provided HISTCONTROL includes 'ignorespace'

bash brace expansion:
cp /etc/rc.conf{,-old}      # will make copy of 'rc.conf' called 'rc.conf-old'
mkdir newdir{1,2,3}         # will make newdir1, newdir2, newdir3
- it's as if the filepath "gets distributed" over the braces
- this is a good way to mv files and make backups

Difference between 'source' and 'export':
source <script.sh>      # effectively the same as: . <script.sh>; executes the script in the current shell
export VAR=value        # saves value as a new environmental VAR available to child processes

computer cluster: folders aren’t contained on just one computer, but network mounts can make it look like they are

If a file path begins with /
- this is an absolute path (it starts at the "root" directory)
- relative paths start from the current directory and can use: . (this dir) and .. (parent dir)

server uses a "load balancer" - puts each user on a variety of nodes to balance the load of resource usage

man -k <keyword> - shows you the man pages whose name or short description matches <keyword>

> is the redirection operator
| is the pipe operator

e.g., ls -l | grep Apr > somefile.txt
- piping the output of ls into the grep command, then redirecting grep's output into somefile.txt

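Common Commands:
scp <source> <destination>      # the remote <source> or <destination> should be of the form: user@server:/path/to/file
scp -r <dir> <destination>      # copy a directory recursively
history                         # shows you all previous commands you've entered
more <file>                     # pages through the file on stdout, one screen at a time
less <file>                     # scrollable pager; uses a temporary screen, so nothing is left in your terminal afterwards
wc <file>                       # word count, useful flags: -l (lines) -w (words)
sort [-nN] [-r] <file>          # usually sorts alphabetically; sort numerically with -n; -r reverses the order
uniq                            # removes adjacent duplicate lines (sort first to get unique values)
diff <file1> <file2>            # see the differences between <file1> and <file2>; '<' indicates <file1>; '>' indicates <file2>
diff -r <dir1> <dir2>           # compares differences between all files in <dir1> and <dir2>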

top -n 1 -b | grep chenguan      # see system summary and running processes; -n flag is iterations; -b is batch mode
                                 # can grep to see a user's processes
free [-g]                        # displays the amount of free and used memory in the system; -g reports the values in gigabytes
read [-s] [-p <prompt>] <var>    # stores user input into <var>; -s=silent text, -p=text shown as the prompt
cut -d' ' -f2-4 <file>           # use whitespace as delimiter, and cut (print to screen) only fields (columns) 2 through 4
cat <file> | tee [-a] <outfile>  # tee writes its input to both <outfile> and stdout (it pipelines the info into a 'T' shape); -a appends instead of overwriting
ln -s <target> <link>            # creates a symbolic link (a reference) called <link> that points to <target>
- If you modify <link> then you WILL MODIFY <target>!
- Except for 'rm'; deleting <link> does NOT delete <target>
printf                           # appears to just be a fancier and more reliable echo

Less common, but possibly helpful commands:
uname -a      # look at your Linux kernel architecture, server, etc.
uname -n      # find out what node you're on
env           # print all your environmental variables to stdout
gdb           # GNU DeBugger (not sure how this works yet)
basename      # strips off directory part of name and suffix
- basename /usr/bin/sort           # returns: 'sort'
- basename include/stdio.h .h      # returns: stdio
date          # prints the date

Less important but still really cool commands!
say [-v] [name] "<phrase>"
write <user>      # start a chat with <user> on your server
- You are immediately put into "write mode". Now you can send messages back and forth.
- Press 'Ctrl+C' or 'Esc' to exit write mode.
mesg [y|n]        # allow [y] people to send you messages using 'write' or not [n]
Command line language translator: https://www.ostechnix.com/use-google-translate-commandline-linux/

Control Statements
if [ ! -x <path/to/file> ]; then
    <commands>
fi

while true; do
    <commands>
done

Many different flags:
! EXPRESSION                 The EXPRESSION is false.
-n STRING                    The length of STRING is greater than zero.
-z STRING                    The length of STRING is zero (i.e. it is empty).
STRING1 = STRING2            STRING1 is equal to STRING2.
STRING1 != STRING2           STRING1 is not equal to STRING2.
INTEGER1 -eq INTEGER2        INTEGER1 is numerically equal to INTEGER2.
INTEGER1 -gt INTEGER2        INTEGER1 is numerically greater than INTEGER2.
INTEGER1 -lt INTEGER2        INTEGER1 is numerically less than INTEGER2.
-d FILE                      FILE exists and is a directory.
-e FILE                      FILE exists.
-r FILE                      FILE exists and the read permission is granted.
-s FILE                      FILE exists and its size is greater than zero (i.e. it is not empty).
-w FILE                      FILE exists and the write permission is granted.
-x FILE                      FILE exists and the execute permission is granted.

Defining Functions:
function <name> { <cmd1>; <cmd2>; ... }

Can be one liners:
function cdl { cd $1; ls; }
cdl mydir      # cd into mydir and then ls
________________
grep (global regular expression print)
grep <pattern> <file>

grep -E -r "<pattern>" ./*      # search the contents of every file for <pattern>, recursively starting from ./*

rsync      # rsync -av <source> <destination>

ps aux | grep <name>      # see active processes matching <name>

GNU screen! (a terminal multiplexer)

  • Start a persistent remote terminal. That way, you won't lose your work if you get disconnected!
  • Once inside a new screen, you should do: source ~/.bash_profile to get your normal settings.

Start up a screen

screen -S <screen_name>   # start a bash environment session ("screen")
ctrl+a, then Esc         # enters "copy/scrollback mode", which lets you scroll! 
- navigate copy mode using Vim commands!
- Hit `Enter` to highlight text. Hit `Enter` again to copy it. Paste with Ctrl+a, then `]`
- Hit `q` or `Esc` to exit copy mode.
ctrl+a, then d            # detach from the session (remember though that it's still active!)
screen -ls            # see what sessions are active
screen -r <name>         # reattach to active session (instead of <name> can also use: <screen_pid>)
exit                  # terminate session
kill <screen_pid>         # kill frozen screen session
ctrl+a then S            # (capital S) split screens horizontally; a lowercase s freezes the terminal instead (undo with ctrl+a then q)
ctrl+a then |            # split screens vertically
ctrl+a then Tab         # switch between split screen regions
ctrl+a then c            # begins a virtual window in a blank screen session
ctrl+a then "            # see list of all active windows inside session
________________

wget <url>      # download whatever url from the web
wget -r --no-parent -A.pdf http://tier2.ihepa.ufl.edu/~rosedj1/DarkZ/MG5vsJHUGen_bestkinematics_GENlevel_WITHfidcuts/
- Downloads recursively, without looking at parent directories, and globbing all .pdf

tar -cf <tarball.tar> foo bar               # create <tarball.tar> using files foo and bar
tar -xf <tarball.tar>                       # extract all of <tarball.tar>
tar -xvf <tarball.tar> <file1> <file2>      # untar specific files from tarball
- c=create, x=extract, v=verbose, f=file

Disk usage

du             # "disk usage"; good for finding which files or dirs are taking up the most space
du -h <dir>      # print size of <dir> in human-readable format
du -sh ./         # sums up the total of current workspace and all subdirs

df -h            # "disk free"; shows used and available disk space on every mounted filesystem

find
find ./ -name "*plots*"                        # find all files with name plots in this dir and subsequent dir
find /<path> -mtime +180 -size +1G               # find files with mod times >180 days and size>1GB
find . -type d -exec touch '{}/__init__.py' \;            # create (touch) a __init__.py file in every dir and subsequent dir
find . -type f -printf '%s\t%p\n' | sort -nr | head -n 30      # find the 30 biggest files in your working area, sorted 
find . -name "*.css" -exec sed -i -r 's/MASS/mass/g' {} \;   # use sed on every found file (the {} indicates a found file)
find ~/src/ -newer main.css                     # find files newer than main.css

locate
locate -i <file_to_be_found>      # searches computer's database. -i flag means case insensitive

Copy multiple files from remote server to local:
scp <user>@<server>:/path/to/files/\{file1,file2,file3\} .      # no spaces inside the braces

***Learn more about these commands:
rcp
set
set -e      # exit the script as soon as a command returns a non-zero status
set -u      # treat the use of an unset variable as an error

Use a specific interpreter to execute a file: #!/usr/bin/env python

Environment variables can be modified using 'export':
export VARIABLE=value
export PYTHONPATH=${PYTHONPATH}:</path/to/modules>      # appending ':</path/to/modules>' to PYTHONPATH env var

Interesting env vars:
SHELL          # hopefully bash
HOSTNAME       # host
SCRAM_ARCH     # cmssw architecture
USER           # You!
PWD            # current working dir
PS1            # bash prompt
LS_COLORS      # colors you see when you do 'ls'
MAIL
EDITOR

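Customize your prompt (you can even add a command to be executed INSIDE the prompt):
PS1='[\d \t] $(uptime) \u@\h\n\w\$ '
- use single quotes so that $(uptime) is re-run every time the prompt is drawn; inside double quotes (or with backticks in double quotes) it would run only once, at the moment PS1 is set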

Prompt settings:
* A bell character: \a
* The date, in “Weekday Month Date” format (e.g., “Tue May 26”): \d
* The format is passed to strftime(3) and the result is inserted into the prompt string; an empty format results in a locale-specific time representation. The braces are required: \D{format}
* An escape character: \e
* The hostname, up to the first ‘.’: \h
* The hostname: \H
* The number of jobs currently managed by the shell: \j
* The basename of the shell’s terminal device name: \l
* A newline: \n
* A carriage return: \r
* The name of the shell, the basename of $0 (the portion following the final slash): \s
* The time, in 24-hour HH:MM:SS format: \t
* The time, in 12-hour HH:MM:SS format: \T
* The time, in 12-hour am/pm format: \@
* The time, in 24-hour HH:MM format: \A
* The username of the current user: \u
* The version of Bash (e.g., 2.00): \v
* The release of Bash, version + patchlevel (e.g., 2.00.0): \V
* The current working directory, with $HOME abbreviated with a tilde (uses the $PROMPT_DIRTRIM variable): \w
* The basename of $PWD, with $HOME abbreviated with a tilde: \W
* The history number of this command: \!
* The command number of this command: \#
* If the effective uid is 0, #, otherwise $: \$
* The character whose ASCII code is the octal value nnn: \nnn
* A backslash: \\
* Begin a sequence of non-printing characters. This could be used to embed a terminal control sequence into the prompt: \[
* End a sequence of non-printing characters: \]
FINISH GETTING THE REST OF THESE PROMPT SETTINGS! e.g. colors

open plots while ssh'ed:
display <file>
eog <file>      # quickly open png files

sleep 7 # make the shell sleep for 7 seconds

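Sexy Bash Tricks:
quickly rename a bunch of files in a dir:
for file in *.pdf; do mv "$file" "${file/.pdf/_standardsel.pdf}"; done
- ${file/.pdf/_standardsel.pdf} is bash's built-in parameter expansion: it replaces the first '.pdf' in $file with '_standardsel.pdf'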

Make a bunch of dir's quickly: mkdir newdir{1..20} # make newdir1, newdir2, ..., newdir20

iterate over floats: for k in $(seq 0 0.2 1); do echo "$k"; done # seq - seq has all kinds of flags for formatting!

Check if a dir exists. If it doesn't, then make it:
[ -d possibledir ] || mkdir possibledir
- The LHS checks if the directory is there. If it is, the test succeeds (exit status 0), so the OR statement ('||') is already satisfied and mkdir is skipped. Else, mkdir runs.

Terminal Shortcuts:
Ctrl-A                 # quickly go to BEGINNING of line in terminal
Ctrl-E                 # quickly go to END of line in terminal
Ctrl-W                 # delete whole WORD behind cursor
Ctrl-U                 # delete whole LINE BEHIND cursor
Ctrl-K                 # delete whole LINE AFTER cursor
Ctrl-R, then <text>    # reverse-search your command history for <text>
Option-Left            # move quickly to the next word to the left
Cmd-Right              # switch between terminal WINDOWS
Cmd-Shift-Right        # switch between terminal TABS within window

alias                        # check your aliases
alias <name>="<command>"     # add <name> to your aliases

time ./<script.sh> # time how long a script takes to run - this will send three times to stdout: real, user, sys (real = actual run time)

Background Jobs:
<command> &      # runs <command> in a background subshell
fg               # bring a background process to foreground
jobs             # see list of all background processes
Ctrl+Z           # pause current job and return to shell
Ctrl+S           # freeze the terminal output (the job itself is not suspended), and DON'T return to shell
Ctrl+Q           # unfreeze the terminal output after Ctrl+S
bg               # resume current job in background
(sleep 3 && echo 'I just woke up') >/tmp/output.txt &      # group commands and redirect stdout!
- here the '&&' means to do the second command ONLY IF the first command was successful

Learn more about nohup: nohup ./gridpack_generation_patched06032014.sh tt 1nd > tt.log &

.bash_profile is executed for login shells, while .bashrc is executed for interactive non-login shells.

Execute shell script in current shell, instead of forking into a subshell:
. ./<script.sh>      # note: dot space dot forward-slash
- N.B. this is nearly the same as doing: source <script.sh>

watch -n 10 '<command>'      # repeats <command> every 10 seconds
- default is 2 seconds

##########################
sed      # stream editor
echo "1e-2" | sed "s#^+*[^e]#&.000000#;s#.*e-#&0#"      # makes 1e-2 become 1.000000e-02
sed "s#^[0-9]*[^e]#&.000000#;s#.*e-#&0#"                # equivalently

sed, in place, on a Mac: sed -i '' -e "s|STORAGESITE|${storageSiteGEN}|g" DELETEFILE.txt

Strip python/bash comments from a file:
sed -i -e 's/#.*$//g' -e '/^$/d' <file>
- each '-e' adds another editing command to the same sed run (applied in order, a bit like piping). '-i' is "in place" so it modifies <file>
________________
Check to see if some command succeeded: (N.B. a command returns 0 if it succeeds!)
some_command
if [ $? -eq 0 ]; then
    echo OK
else
    echo FAIL
fi
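
The same "0 means success" check can be made from a Python script that launches shell commands. This is only a sketch: the helper name succeeded is made up, and the two commands are stand-ins that simply ask the Python interpreter to exit with a chosen status.

```python
import subprocess
import sys


def succeeded(command):
    """Run command (a list of strings) and report whether it exited with status 0."""
    result = subprocess.run(command)
    return result.returncode == 0


# Stand-in commands: the Python interpreter exiting with status 0 and with status 3.
ok = succeeded([sys.executable, "-c", "import sys; sys.exit(0)"])
failed = succeeded([sys.executable, "-c", "import sys; sys.exit(3)"])

print("OK" if ok else "FAIL")
print("OK" if failed else "FAIL")
```

Here result.returncode plays the role of bash's $?.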

awk An extremely powerful file-processing language

General format: awk 'BEGIN{begincmds} {cmd_applied_to_each_line} END{endcmds}'

Sum up the second column in a file, and specifying the delimiter of columns as a comma: awk -F',' '{sum+=$2} END{print sum}' bigfiles.txt
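
For comparison, a rough Python equivalent of that awk one-liner. The file contents below are invented for illustration (a name and a size per line); a real script would open bigfiles.txt instead of using the in-memory string.

```python
import csv
import io

# Hypothetical contents of bigfiles.txt: "<name>,<size>" on each line.
text = "a.root,120\nb.root,30.5\nc.root,7\n"

total = 0.0
for row in csv.reader(io.StringIO(text)):   # comma is the default delimiter
    total += float(row[1])                  # awk's $2 is index 1 in Python

print(total)
```

The awk version is shorter, which is why it is handy on the command line; the Python version is easier to extend once the processing gets more involved.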

Study this code below and see if syntax is useful:
# if rnum allows, multiply by 10 to avoid multiple runs
# with the same seed across the workflow
run_random_start=$(($rnum*10))
# otherwise don't change the seed and increase number of events as 10000 if n_evt<50000 or n_evt/9 otherwise
if [ $run_random_start -gt "89999990" ]; then
    run_random_start=$rnum
    max_events_per_iteration=$(( $nevt > 10000*9 ? ($nevt / 9) + ($nevt % 9 > 0) : 10000 ))
fi

You can "divide out" strings:
MG="MG5_aMC_v2.6.0.tar.gz"
MG_EXT=".tar.gz"
echo ${MG%$MG_EXT}
- Prints: MG5_aMC_v2.6.0      # so effectively MG has been "divided by" MG_EXT

Need to learn about: tr "." "_" # translates all "." chars into "_" (used with piping) perl -ne 'print if /pattern1/ xor /pattern2/'

What does this do? model=HAHM_variablesw_v3_UFO.tar.gz if [[ $model = [!\ ] ]]; then...

Bash Scripting
$#      # number of arguments passed to script
$@      # the arguments themselves which were passed to script
$?      # exit status of last command: 0 is successful

LaTeX

% # comments

Table of Contents (ToC) are very easily built by LaTeX!

Every document must have:
\documentclass[12pt]{extarticle}
\begin{document}

\end{document}

Referencing
One section in a ToC may be in a file: sec-015-model.tex
Inside this file might be:
\section{Dark photon model}      % title of whole section
\label{sec:model}                % internal reference name
- when other parts of the document need to reference this section (using something like: \ref{sec:model}), they need to know the label of the section!

C++

Take in user input:

TString usrinput;
std::cin >> usrinput;    // User input will be stored in usrinput var

To determine size of a C++ array:
int myarr[] = {4, 6, 8, 9};
sizeof(myarr)/sizeof(*myarr)       // number of elements: 4
VECTORS ARE BETTER THAN ARRAYS!    // more flexibility!

Vectors:
Similar to arrays, but dynamically sized

#include <vector>
std::vector<type> vecName;
vecName.push_back(value);      // append value at the end of vecName
vecName[index];                // index vecName, just like arrays
vecName;                       // show all entries in vecName (at the ROOT prompt)
vecName.size();                // number of elements in vecName

make a pointer to the vector: (PROBABLY UNNECESSARY)
std::vector<type> * vecPtr = &vecName;      // initialize pointer to point to address of vecName
*vecPtr                                     // dereference the pointer to get the vector back

Vim

Why use Vim?
  • Vim (Vi IMproved) is a powerful text editor that is found on most Unix systems.
  • It is a light-weight program that lets you edit files with lightning-fast speed and is highly customizable.
  • Vim commands are frequently used in other commands, like: less, man, screen, info, etc.

Did you know?

  • Type vimtutor in your shell to go to a helpful Vim tutorial!

Essentials in Command Mode:

h,j,k,l         # left, down, up, right.
i               # Enter Insert Mode.
Esc             # Go back to Command Mode.
u               # Undo.
Ctrl+r          # Redo.
:w              # Write (save) your file.
:q              # Quit Vim.
yy              # Copy entire line.
dd              # Delete entire line.
p               # Paste what was recently copied or deleted.
w               # Jump forward 1 word.
b               # Jump backward 1 word.
0               # Bring cursor to start of line.
$               # Bring cursor to end of line.
gg              # Go to top of file.
G               # Go to bottom of file.
Ctrl+f          # Jump forward one page.
Ctrl+b          # Jump backward one page.
o (O)           # Enter a new line below (above) cursor.
.               # Repeat last action.
/hello          # Search for the string 'hello'.
n (N)           # Search for the next (previous) matched string.
:19             # Jump to line 19.
:set nu         # Add line numbers.
:set nonu       # Remove line numbers.
:set paste      # Prevents the horrendous indentation that sometimes happens when pasting.
:set nopaste    # Go back to a world of misery.

Some cool tricks:

c4w             # Delete next 4 words. Enter Insert Mode.
9o              # Enter 9 newlines below cursor.
ggdG            # Go to top of page, then delete contents of file.
di)             # Deletes the text inside a pair of ().
ca"             # Deletes the pair of "" and all text inside. Go straight into Insert Mode.
qk<cmds>q       # Begin recording a macro of <cmds> into register 'k'. Last q ends the recording.
@k              # Execute macro stored in register 'k'.
5@k             # Execute macro stored in register 'k' 5 times!
mj              # Set a mark (checkpoint) in register 'j'.
`j              # Bring cursor to the mark stored in register 'j'.
:9,18fo         # Fold (collapse) lines 9-18.
za              # Fold/unfold lines.
Shift+r         # Enter Replace mode: replaces text as you type.
Ctrl+o          # Return to previous position of cursor.
Shift+d         # Delete all text on line after cursor.
Shift+c         # Delete all text on line after cursor. Enter Insert Mode.
Shift+i         # Bring cursor to start of line. Enter Insert Mode.
Shift+a         # Bring cursor to end of line. Enter Insert Mode.
Ctrl+v          # Enter Visual Block Mode.
Shift+i         # While in Visual Block Mode, go into Insert Mode. Changes affect all highlighted lines.
:%s/word/neat/g # Substitute every instance of "word" with "neat", globally (in whole file).

Try adding Python comments to 3 lines at once:

0, Ctrl+V, jj, Shift+i, #, Esc


CMSSW (CMS SoftWare):

https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookConfigFileIntro

***If you want to use CMSSW, you must be in an environment that can reach the CMSSW libraries.
Example servers:
- UF (ihepa)
- CERN (lxplus)
- HiPerGator (hpg)
- Fermilab (fnal)

Get 'cmsenv' and 'cmsrel'
export VO_CMS_SW_DIR=/cvmfs/cms.cern.ch
source $VO_CMS_SW_DIR/cmsset_default.sh

cmsrel CMSSW_X_Y_Z
- install the CMSSW environment in a new dir, version X_Y_Z, like e.g. 9_4_2
    - 8_0_X = 2016 data
    - 9_4_X = 2017 data
    - 10_2_X = 2018 data
- once inside, be sure to do: cmsenv
  to "load" the environment variables (sets up your runtime environment)!
- You will have to use different CMSSW versions for different years' data!
- By the way "cmsrel" stands for "CMSSW Release"!

See what versions of CMSSW are available:
scram list -a
scram list -a | egrep "CMSSW_9_4_X" > cmssw.txt

See what scram architecture you are running:
scram arch
- or -
echo $SCRAM_ARCH

Set scram arch to something different:
export SCRAM_ARCH=slc3_ia32_gcc
export SCRAM_ARCH=slc6_amd64_gcc491

Configuration Files, cmsDriver, and cmsRun:
https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCmsDriver
Two kinds of config files:
1. CRAB config files: options for submitting CRAB jobs
- You need crab_config files to tell CRAB how to deal with the jobs you want processed.
2. Parameter Set Config Files: sets all the parameters for generating MC events
- cmsDriver is the main tool to create these param_config files
- View help options: cmsDriver.py --help

Example to generate a param_set_config file called "CMSDAS_MC_generation_cfg.py":
cmsDriver.py MinBias_13TeV_pythia8_TuneCUETP8M1_cfi --conditions auto:run2_mc -n 10 --era Run2_2016 --eventcontent FEVTDEBUG --relval 100000,300 -s GEN,SIM --datatier GEN-SIM --beamspot Realistic50ns13TeVCollision --fileout file:step1.root --no_exec --python_filename CMSDAS_MC_generation_cfg.py

Use cmsRun to load modules stored in a configuration file: cmsRun CMSDAS_MC_generation_cfg.py

You can just make sure everything properly compiles by doing:
python CMSDAS_MC_generation_cfg.py
- if it returns no errors, you should be good to go!
- Do this before submitting CRAB jobs

It's a good idea to check for errors in your "python generator fragment", like:
python -i externalLHEProducer_and_PYTHIA8_Hadronizer_cff.py

A config file allows you to set all the parameters you want for a job.
- They usually start with this line:
import FWCore.ParameterSet.Config as cms      # imports our CMS-specific Python classes and functions
- And have these as the guts:
    - A source (which might read Events from a file or create new empty events)
    - A collection of modules (e.g. EDAnalyzer, EDProducer, EDFilter) which you wish to run
    - An output module to create a ROOT file which stores all the event data
    - A path which will list in order the modules to be run

A configuration file written using the Python language can be created as:
- a top level file, which is a full process definition (naming convention is _cfg.py ) which might import other configuration files
- external Python file fragment, which are of two types:
    - those used for module initialization (naming convention is _cfi.py)      # configuration fragment include
    - those used as configuration fragment (naming convention is _cff.py)      # configuration fragment file

process.load() # Import fragment to top level, also attaches imported objects

Standard fragments are available in the CMSSW release's Configuration/StandardSequences/python/ area. They can be read in using syntax like process.load("Configuration.StandardSequences.Geometry_cff")

The word "module" has two meanings. A Python module is a file containing Python code and the word also refers to the object created by importing a Python file. In the other meaning, EDProducers, EDFilters, EDAnalyzers, and OutputModules are called modules.

Standard Steps for full simulation and real data
Building blocks of the created configurations are the standard processing steps:

* GEN : the generator plus the creation of GenParticles and GenJets
* SIM : Geant4 simulation of the detector (energy deposits in the detector volumes)
* DIGI : simulation of detector signal response to the energy deposits
* L1 : simulation of the L1 trigger
* DIGI2RAW : data format conversion of the digi signals into the RAW format that will be provided in the online system
* HLT : high level trigger
Usually all the above steps are executed in one single job. Remaining building blocks are:

* RAW2DIGI : data format conversion of the RAW format into digi signals
* RECO : full event reconstruction
* ALCA : production of alignment and calibration streams
* DQM : code run for DQM
* VALIDATION : code run for validation
The above list is usually referred to as 'step2'.

Use PhEDEx to transfer datasets between storage areas: https://cmsweb.cern.ch/phedex/prod/Request::Create?type=xfer#

ED Analyzer == .cc file python config file

git cms-merge-topic <user>:<branch>
- I think this merges the branch and its contents into local directory?

xrootd

https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookXrootdService
xrootd is a command to retrieve Any data, Anytime, Anywhere (AAA) on the CMS servers from anywhere in the world.
Uses logical file names (LFN) to find files

For the US, use:
cmsxrootd.fnal.gov

Must first do:
voms-proxy-init -voms cms

e.g.
TFile *f = TFile::Open("root://cmsxrootd.fnal.gov//store/mc/SAM/GenericTTbar/GE.root");   // TFile::Open returns a pointer

If you wish to check if your desired file is actually available through AAA, execute the command:
`xrdfs cms-xrd-global.cern.ch locate /store/path/to/file`
(xrd = xrootd, fs = file system)

In a MC cfg.py file, use:
fileNames = cms.untracked.vstring('root://cmsxrootd.fnal.gov//store/myfile.root')
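
The URLs above all follow one pattern: root://, then the redirector, then the LFN with its own leading slash (which is where the double slash comes from). A minimal sketch of that pattern follows; the function name xrootd_url is made up for illustration and is not part of any CMS tool.

```python
def xrootd_url(lfn, redirector="cmsxrootd.fnal.gov"):
    """Build a root:// URL from a logical file name (LFN) that begins with /store/."""
    if not lfn.startswith("/store/"):
        raise ValueError("expected an LFN starting with /store/, got: " + lfn)
    # 'root://<redirector>/' plus the LFN's own leading slash gives the '//'.
    return "root://" + redirector + "/" + lfn


url_us = xrootd_url("/store/myfile.root")
url_global = xrootd_url("/store/myfile.root", redirector="cms-xrd-global.cern.ch")

print(url_us)
print(url_global)
```

Swapping the redirector argument is all it takes to go from the US redirector to the global one named in the xrdfs example.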

Change password
In command line, do:
yppasswd

CRAB Utility

a utility to submit CMSSW jobs to distributed computing resources
https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCrab
CRAB Tutorial
https://twiki.cern.ch/twiki/bin/view/CMSPublic/CRAB3AdvancedTutorial
CRAB FAQ
https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCrabFaq
CRAB job errors:
https://twiki.cern.ch/twiki/bin/view/CMSPublic/JobExitCodes
CRAB Commands:
https://twiki.cern.ch/twiki/bin/view/CMSPublic/CRAB3Commands

There is ONE Tier0 site:
1. CERN
Seven T1 sites:
1. USA
2. France
3. Spain
4. UK
5. Taiwan
6. Germany
7. Italy
~55 T2 sites
You must specify config.Site.storageSite, which will depend on which center is hosting your area, and user_remote_dir which is the subdirectory of /store/user/ you want to write to.
* Caltech storage_element = T2_US_Caltech
* Florida storage_element = T2_US_Florida
* MIT storage_element = T2_US_MIT
* Nebraska storage_element = T2_US_Nebraska
* Purdue storage_element = T2_US_Purdue
* UCSD storage_element = T2_US_UCSD
* Wisconsin storage_element = T2_US_Wisconsin
* FNAL storage_element = T3_US_FNALLPC

Check out all the tiers here:
https://cmsweb.cern.ch/sitedb/prod/sites


You need a CRAB config file in order to run an MC event generation code.
- The cmsDriver.py tool helps to generate config files
- examples of crab_cfg.py files:
crab_GEN-SIM.py
crab_PUMix.py
crab_AODSIM.py
crab_MINIAODSIM.py

A typical CRAB config file looks like:
====================================
from WMCore.Configuration import Configuration
config = Configuration()

config.section_("General")
config.General.requestName = 'CMSDAS_Data_analysis_test0'
config.General.workArea = 'crab_projects'

config.section_("JobType")
config.JobType.pluginName = 'Analysis'
config.JobType.psetName = 'slimMiniAOD_data_MuEle_cfg.py'
config.JobType.allowUndistributedCMSSW = True

config.section_("Data")
config.Data.inputDataset = '/DoubleMuon/Run2016C-03Feb2017-v1/MINIAOD'
config.Data.inputDBS = 'global'
config.Data.splitting = 'LumiBased'
config.Data.unitsPerJob = 50
config.Data.lumiMask = 'https://cms-service-dqm.web.cern.ch/cms-service-dqm/CAF/certification/Collisions16/13TeV/Cert_271036-275783_13TeV_PromptReco_Collisions16_JSON.txt'
config.Data.runRange = '275776-275782'

config.section_("Site")
config.Site.storageSite = 'T2_US_Florida'
====================================

The /store/user/ area at LPC is commonly used for the output storage from CRAB jobs

How to make CRAB commands available: (must be in CMSSW environment)
cmsenv
source /cvmfs/cms.cern.ch/crab3/crab.sh #.csh for c-shells

To check that it worked successfully, do:
which crab
> /cvmfs/cms.cern.ch/crab3/slc6_amd64_gcc493/cms/crabclient/3.3.1707.patch1/bin/crab
or:
crab --version
> CRAB client v3.3.1707.patch1

crab checkusername
> Retrieving username from SiteDB...
> Username is: drosenzw

Can also test your EOS area grid certificate link:
crab checkwrite --site=T3_US_FNALLPC # checks to see if you have write permission at FNAL
crab checkwrite --site=T2_US_Nebraska
crab checkwrite --site=T2_US_Florida

***N.B. It is better to use: low num jobs, high num events/job!***

First you can run a job locally, to make sure all is well:
cmsRun <step1_cfg.py>

Submitting CRAB job:
crab submit -c crabConfig_MC_generation.py

Resubmitting a CRAB job:
crab resubmit --siteblacklist='T2_US_Purdue' <crab_DIR>/<crab_job> # don't submit to Purdue
- N.B. only failed jobs get resubmitted
- There are lots of flags to call to change things like memory usage, priority, sitewhitelist, etc.
- --sitewhitelist=T2_US_Florida,T2_US_MIT
- Can also use wildcards: --siteblacklist=T1_*

Resubmitting SPECIFIC CRAB jobs:
crab resubmit --force --jobids=1,5-10,15 <crabdir1/crabdir2> # N.B. you must --force successful jobs to resubmit

Check number of events from CRAB job:
crab report <crab_DIR>/<crab_job>

Check status:
crab status
crab status <crab_DIR>/<crab_job>

Kill a job:
crab kill -d <crab_DIR/crab_job>

For help with MC generation (step1):
https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookCRAB3Tutorial#2_CRAB_configuration_file_to_run

LXBATCH

https://twiki.cern.ch/twiki/bin/view/Main/BatchJobs
http://www.slac.stanford.edu/exp/atlas/computing/batchDesc.html

echo "cd $CMSSW_BASE &&" `eval $(scram ru -sh)` "cd $(pwd) && ./JHUGen ....." | bsub -q 1nd -J jobname

Useful commands:
bjobs [-l] # check job status
bpeek # check stdout so far
bkill # kill a job

OPTIONS:
-c <[hh:]mm> # sets CPU time limit, supposedly default is no-limit, but I don't trust that

CONDOR

https

3 MAIN FILES:
bash script # a wrapper, "condor.sh", call the code that you want to run, kind of sets parameters
- e.g. condor.sh
submit script # typical cluster parameters, memory, universe, queue, group
- condor.sub
DAG script # contains the jobs, children, parents needed for condor
- this dag script is produced from a perl/bash command
- then condor reads this DAG script

Run condor:
condor_submit <submitfile.sub>
condor_submit_dag <file.dag> # this submits the dag file to condor (i.e. submits your jobs!)

log/
- one of which is:
output.log # has stdout from code that you want condor to process!

Kill ALL jobs under your username:
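condor_rm <username>      # remove all of <username>'s jobs from the queue
condor_q <username>       # list <username>'s jobs that are still in the queue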

CASTOR

https://twiki.cern.ch/twiki/bin/view/Main/HowtoUseLxplus#Helpful_linux_commands
CASTOR is a big storage space for lxplus

New commands!
ls ==> nsls
mkdir ==> nsmkdir
cp ==> rfcp
rm ==> rfrm
chmod ==> rfchmod

LXPLUS

Your user area: /afs/cern.ch/user/ (10 GB storage)
Your work area: /afs/cern.ch/work/ (100 GB storage, also allows you to share files with others!)

EOS Storage

Must be logged into the LPC machines (Fermilab) or on lxplus
https://uscms.org/uscms_at_work/computing/LPC/usingEOSAtLPC.shtml
Big storage area for big files

Past Jake says: DON'T store big files in EOS
They are easier to access from Tier2 on HiPerGator through UF: /cms/data/store/user/drosenzw/
- use uberftp or gfal-copy to access them
if on IHEPA, can store big files in /raid/raid{5,6,7,8,9}

Only lxplus accounts can access EOS storage!

See if you have an eos area on an LPC machine: eosls -d /store/user/drosenzw/

Tier2 Storage is better: /cms/data/store/user/drosenzw/ # HiPerGator at UF. ONLY WRITABLE BY CRAB. Output of CRAB stored here. /cms/data/store/user/t2/users/rosedj1/ # HPG at UF. Put NTuples here.

There are different eos storage areas: /eos/uscms/store/user/drosenzw/ # My allocated EOS area. LPC's Tier3 eos storage (also: /store/user/drosenzw/ ). Use: eosls /eos/cms/ # lxplus /eos/user/d/drosenzw/ # easily accessible from lxplus. SWAN also uses this /uscms_data/d1/drosenzw/ # normal LPC area /eos/uscms_data/d1/drosenzw/ # What even is this?

MAIN COMMANDS:
On lxplus, do:
ls /eos/cms/
ls -l /eos/user/d/drosenzw/
mkdir /eos/user/d/drosenzw/
eos ls -l /eos/user/d/drosenzw/ # different kind of listing?
eos mkdir /eos/user/d/drosenzw/ # different kind of mkdir?
xrdcp root://eosuser.cern.ch//eos/user/d/drosenzw/ # copy files
- SWAN is also connected to /eos/user/d/drosenzw/SWAN_projects

Set up your environment: export EOS_MGM_URL=root://eoscms.cern.ch

File Names:
MGM: root://cmseos.fnal.gov/
LFN (shortcut name): /store/user/drosenzw/
- the LFN (Logical File Name) is an alias which can be used at ANY site (The LFN is Lenient, i.e. uses a short path like /store/user/...)
- the PFN (Physical File Name) is the actual file path
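
To make the LFN idea concrete, here is a minimal Python sketch that turns an LFN into the kind of root:// URL used by xrdcp in these notes. The function name is made up for illustration; the default server is the MGM address quoted above, and other sites would need a different one.

```python
def lfn_to_xrootd_url(lfn, server="cmseos.fnal.gov"):
    """Prefix a logical file name (LFN) with an XRootD server address."""
    if not lfn.startswith("/store/"):
        raise ValueError(f"expected an LFN starting with /store/, got: {lfn}")
    # The double slash after the host is intentional: the URL separator '/'
    # is followed by the LFN, which itself starts with '/'.
    return f"root://{server}/{lfn}"


url = lfn_to_xrootd_url("/store/user/drosenzw/whateverFile.txt")
print(url)
```

The same LFN works with a different server argument, which is the point of the LFN being site-independent.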

eosquota returns the amount of storage space used/available in personal EOS area

eosgrpquota lpctau # checks the storage space for the group "lpctau"

LISTING
eosls /LFN # lists files (NEVER USE ls!)
eosls -d /store/user/drosenzw/ # lists directory entries
- the -l option gives a long listing
- the -a option lists hidden entries

DON'T USE WILDCARDS OR TAB-COMPLETION!
DON'T USE TRADITIONAL COMMANDS! (ls, rm, cd, etc.)

COPYING
xrdcp <localFile> root://cmseos.fnal.gov//store/user/drosenzw/newNameOfFile.txt # local file to EOS
xrdcp root://cmseos.fnal.gov//store/user/drosenzw/whateverFile.txt ~/newName.txt # EOS to local file
xrdcp root://cmseos.fnal.gov//store/user/drosenzw/whateverFile.txt root://cmseos.fnal.gov//store/user/drosenzw/newFile.txt # EOS to EOS
- the -f option can overwrite existing files
- the -s option gives a silent copy

MAKE DIR
eosmkdir /store/user/drosenzw/newDir
eosmkdir -p /store/user/drosenzw/newDir1/newDir2/newDir3 # the -p option makes parent directories as needed

REMOVING
eosrm /store/user/drosenzw/EOSfile.txt # removes a file
eosrm -r /store/user/drosenzw/dir1 # removes the directory and all its contents

if you get scram b errors, first run: cmsrel CMSSW_X_Y_Z

MUST set up environment in working directory (YOURWORKINGAREA):
cd ~/nobackup/YOURWORKINGAREA/CMSSW_9_3_2/src
cmsenv

For condor batch jobs:
xrdcp outputfile.root root://cmseos.fnal.gov//store/user/username/outputfile.root
or
xrdfs root://cmseos.fnal.gov ls /store/user/username

Attaching files:
root -l root://cmsxrootd.fnal.gov//store/user/jjesus/rootFile.root
or
TFile *theFile = TFile::Open("root://cmsxrootd.fnal.gov//store/user/jjesus/rootFile.root");

On IHEPA, put root://cmsio5.rc.ufl.edu//store/user/ at the front of the file path and open the file with TFile::Open() instead of TFile(path,"READ").

LPC / Fermilab / CMSDAS

LPC contact for CMS DAS problems: cmsdasatlpc@fnal.gov
USCMS T1 Facility Support Team: uscms-t1@fnal.gov
Fireworks problems: fireworks-support@cern.ch
Mattermost problems: service-desk@cern.ch

Subscribe to hypernews (I may already be subscribed): https://hypernews.cern.ch/HyperNews/CMS/login.pl?&url=%2fHyperNews%2fCMS%2fcindex

For CRAB Issues: CMSDASATLPC@fnal.gov

Get Kerberos ticket:
kinit <username>@FNAL.GOV
To check: klist

Log onto cmslpc-sl6 cluster:
ssh -Y drosenzw@cmslpc-sl6.fnal.gov
ssh -Y drosenzw@cmslpcN.fnal.gov # where N is whatever node you want to join

Initialize your proxy:
voms-proxy-init -voms cms --valid 168:00 # makes the proxy valid for a week instead of a day!
source /cvmfs/cms.cern.ch/cmsset_default.sh # or put this in .bash_profile

Storage Areas:
/uscms/homes/d/drosenzw # 2 GB storage area
/nobackup/ # larger mass storage area

For DAS, each time I log into the sl6 cluster, I need to:
cd ~/nobackup/YOURWORKINGAREA/CMSSW_10_2_0/src
cmsenv

Switch default shell from tcsh to bash: To permanently change your default login shell, use the LPC Service Portal, login with your Fermilab Services username and password. Choose the " Modify default shell on CMS LPC nodes" ticket and fill it out.

* If you want to get the nice command line after a switch to bash, put source /etc/bashrc in your cmslpc ~/.bash_profile file *

Fireworks doesn't work locally. Located in: /Users/Jake/Desktop/cmsShow-9.2-HighSierra
For help, contact Basil Schneider: basil.schneider@cern.ch

I may still have issues pushing to GitHub.

Keep getting this error in ROOT plots: AutoLibraryloader::enable() and AutoLibraryLoader.h are deprecated. Use FWLiteEnabler::enable() and FWLiteEnabler.h instead Info in <TCanvas::MakeDefCanvas>: created default TCanvas with name c1

Go through CRAB3 tutorial in THE WORKBOOK when finished with pre-exercises

FWLite (found in PhysicsTools): Frame Work Lite is an interactive analysis tool integrated with the CMSSW EDM (Event Data Model) Framework. It allows you to automatically load the shared libraries defining CMSSW data formats and the tools provided, to easily access parts of the event in the EDM format within ROOT interactive sessions. It reads produced ROOT files, has full access to the class methods and there is no need to write full-blown framework modules. Thus having FWLite distribution locally on the desktop one can do CMS analysis outside the full CMSSW framework.

Example command: FWLiteHistograms inputFiles=slimMiniAOD_MC_MuEle.root outputFile=ZPeak_MC.root maxEvents=-1 outputEvery=100

Fireworks: turns EDM collections into visual representations, i.e., turns .root files into event displays!
cmsShow DoubleMuon_n100.root
cmsShow --no-version-check root://cmseos.fnal.gov//store/user/cmsdas/2017/pre_exercises/DYJetsToLL.root

For help with: process.maxEvents = cms.untracked.PSet - https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuidePoolInputSources

An Event is a C++ object container for all RAW and reconstructed data related to a particular collision.

DAS (Data Aggregation Service)

Big database to hold MC and data samples: https://cmsweb.cern.ch/das/
FAQ: https://cmsweb.cern.ch/das/faq
Examples: https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookDataSamples
More info: https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookLocatingDataSamples

Different ways to interpret the dataset names:
/<primary-dataset>/<CERN-username_or_groupname>-<publication-name>-<pset-hash>/USER
/object_type/campaign/datatier
/Primary/Processed/Tier
//<Campaign-ProcessString-globalTag-Ext-Version>/

Given a file, DAS can return a dataset! Given a dataset, DAS can return all the associated files.

Datasets (whether MC or actual data) are published on DAS
- A dataset is made up of many ROOT files
- Find the name of a dataset based on the file name:
dataset file=/store/relval/CMSSW_10_2_0/RelValZMM_13/MINIAODSIM/PUpmx25ns_102X_upgrade2018_realistic_v9_gcc7-v1/10000/3017E7A1-178D-E811-8F63-0025905A6070.root
>>> /RelValZMM_13/CMSSW_10_2_0-PUpmx25ns_102X_upgrade2018_realistic_v9_gcc7-v1/MINIAODSIM

If you have trouble finding a file that you KNOW is on DAS: - change the dbs instance to something other than global, e.g. "prod/phys03"

Example DAS Searches:
dataset release=CMSSW_9_3_0_pre5 dataset=/RelValZMM*/*CMSSW_9_3_0*/MINIAOD*
dataset release=CMSSW_10_2_0 dataset=/RelValZMM*/*CMSSW_10_2_0*/MINIAOD*
dataset=/DoubleMu*/*Run2017C*/MINIAOD* # /object_type/campaign/datatier (/Primary/Processed/Tier)

Can search for datasets from the command line using dasgoclient:
dasgoclient --query="dataset=/DoubleMuon*/Run2018A-PromptReco-v1/MINIAOD" --format=plain
- must first do: voms-proxy-init -voms cms
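
If you want to call dasgoclient from a script, a thin wrapper like the following may be enough. This is only a sketch: the function names are invented here, and it uses just the two options (--query and --format=plain) that appear in the command above. Actually running a query needs dasgoclient in your PATH and a valid proxy.

```python
import shutil
import subprocess


def das_command(query):
    """Build the dasgoclient argument list for a DAS query string."""
    return ["dasgoclient", f"--query={query}", "--format=plain"]


def run_das_query(query):
    """Run the query and return one result per output line."""
    if shutil.which("dasgoclient") is None:
        raise RuntimeError("dasgoclient was not found in PATH")
    result = subprocess.run(
        das_command(query), capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


# Only build and show the command here; run_das_query() needs the grid setup.
cmd = das_command("dataset=/DoubleMuon*/Run2018A-PromptReco-v1/MINIAOD")
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with the wildcard characters in the query.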

Get the LFNs of the files in a dataset by doing a DAS search, like:
file dataset=/GenericTTbar/HC-CMSSW_5_3_1_START53_V5-v1/GEN-SIM-RECO
which will retrieve the following LFN:
/store/mc/HC/GenericTTbar/GEN-SIM-RECO/CMSSW_5_3_1_START53_V5-v1/0010/00CE4E7C-DAAD-E111-BA36-0025B32034EA.root
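
The two file/dataset pairs quoted in this section follow the same pattern, so the dataset name can often be guessed straight from the LFN. The sketch below encodes that pattern. It is a heuristic inferred only from those two examples (it will not work for /store/user/ files), the function name is made up, and DAS remains the authority.

```python
def guess_dataset_from_lfn(lfn):
    """Guess /Primary/Processed/Tier from a centrally produced LFN.

    Assumed layout (inferred from the examples in these notes):
    /store/<kind>/<era>/<primary>/<tier>/<processing>/<block>/<file>.root
    """
    parts = lfn.strip("/").split("/")
    if len(parts) != 8 or parts[0] != "store":
        raise ValueError(f"LFN does not match the assumed layout: {lfn}")
    _, _, era, primary, tier, processing, _, _ = parts
    return f"/{primary}/{era}-{processing}/{tier}"


lfn = ("/store/mc/HC/GenericTTbar/GEN-SIM-RECO/"
       "CMSSW_5_3_1_START53_V5-v1/0010/00CE4E7C-DAAD-E111-BA36-0025B32034EA.root")
print(guess_dataset_from_lfn(lfn))
```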

MCM (Monte Carlo Manager)

Not the same thing as DAS!
Use /mcm/ to find the correct info to find MC samples on DAS:
/mcm/ is the bookkeeping of all produced MC samples
- tells you details of how the MC samples were produced
- e.g., tells you location of makecards.sh and data sets
Put into mcm: GluGluHToZZTo4L_M125_13TeV_powheg2_JHUGenV7011_pythia8

MCM tutorial with David:
Did a search on DAS: /GluGluHToZZ*4L*125*/*Fall17*94X*/MINIAODSIM (/Primary/Processed/Tier)

David noticed that the location of the MC files couldn't be found here, so he checked MCM.
David's MCM user profile: https://cms-pdmv.cern.ch/mcm/users?prepid=dsperka&page=0&shown=51
- click: Request > Navigation > dataset_name
- Here, type in the name of the dataset from DAS (without the leading forward slash!), e.g. GluGluHToZZTo4L_M125_13TeV_powheg2_JHUGenV7011_pythia8
- may have to click: Select view > Fragment
- Then go back to Navigation and scroll to the right to click the "enlarge" button
- This will bring up important information from "rawGitHub" about the MC samples
The most notable of these is: https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_fragment/HIG-RunIIFall17wmLHEGS-00607/0
- It has "Links to cards"; these are MC generation cards
- may have to manually search for a specific URL to get the template.input: https://raw.githubusercontent.com/cms-sw/genproductions/fd7d34a91c3160348fd0446ded445fa28f555e09/bin/Powheg/production/2017/13TeV/Higgs/gg_H_ZZ_quark-mass-effects_NNPDF31_13TeV/gg_H_ZZ_quark-mass-effects_NNPDF31_13TeV_template.input

svn

svn ("Subversion") is a version control system, similar in purpose to git.
Use this to edit Analysis Notes in CMS.

Excellent tutorial on svn: http://cmsdoc.cern.ch/cms/cpt/tdr/notes_for_authors_temp.pdf

https://twiki.cern.ch/twiki/bin/view/Main/HowtoNotesInCMS

https://twiki.cern.ch/twiki/bin/viewauth/CMS/Internal/TdrProcessing

To get your AN/paper started:
svn co -N svn+ssh://svn.cern.ch/reps/tdr2 myDir
cd myDir
svn update utils
svn update -N [papers|notes] # choose one, papers or notes
svn update [papers|notes]/XXX-YY-NNN # enter your AN or paper code
eval `[papers|notes]/tdr runtime -sh`

To modify: cd [papers|notes]/XXX-YY-NNN/trunk

To build the document: tdr --style=pas b XXX-YY-NNN # --style=paper for papers

Git-like commands to update files:
svn add <file> # YOU ONLY NEED TO DO THIS ONCE FOR ANY FILE
svn commit -m '<message>' # This will update the file

svn status
svn status -u # (--show-updates)

Figures should reside in the fig/ directory.

Figure~\ref{fig:test} shows a figure prepared with the TDR template and illustrates how to include a picture in a document and refer to it using a symbolic label.
\begin{figure}[!Hhtb]
\centering
\includegraphics[width=0.55\textwidth]{c1_BlackAndWhite}
\caption[Caption for TOC]{Test of graphics inclusion.\label{fig:test}}
\end{figure}

The result of the above is roughly as follows:
Figure 1 shows a figure prepared with the TDR template and illustrates how to include a picture in a document and refer to it using a symbolic label.

Colour versions of figures can be provided for PDF output using the \combinedfigure macro in place of the \includegraphics command. This takes two arguments corresponding respectively to the black and white and the coloured versions of the same picture, for example:
Figure~\ref{fig:test} shows a figure prepared with the TDR template and illustrates how to include a picture in a document and refer to it using a symbolic label.
\begin{figure}[!Hhtb]
\centering
\combinedfigure{width=0.4\textwidth}{c1_BlackAndWhite}{c1_Colour}
\caption[Caption for TOC]{Test of graphics inclusion.\label{fig:test}}
\end{figure}

The recommended procedure is to use multiple instances of the \includegraphics command, combined with the tabular environment if needed.

Lucien ditched svn and switched to git for our AN-18-194. https://twiki.cern.ch/twiki/bin/viewauth/CMS/Internal/TdrProcessing

Compare what version of the AN you have: git log # shows recent commits

Certificate Stuff:

Followed instructions on: https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookStartingGrid#ObtainingCert
Anytime you want to access data on a Tier site, create a temporary proxy:
voms-proxy-init --rfc --voms cms
voms-proxy-init --voms cms --valid 168:00 # makes the proxy valid for a week instead of a day!
voms-proxy-init -debug

voms-proxy-info # check your info

When your grid certificate expires, you get an error like: “Error during SSL handshake:Either proxy or user certificate are expired.”

Request a new grid user certificate: https://ca.cern.ch/ca/help/?kbid=024010

Must have these permissions:

usercert.pem -rw-r--r--
userkey.pem -r--------

So do:

chmod 644 path/to/usercert.pem
chmod 400 path/to/userkey.pem
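
To double-check the result of the chmod commands, a small script can compare the actual modes against the required ones (644 and 400, as listed above). This is a sketch with invented names; the demonstration runs on throwaway files in a temporary directory, not on a real certificate.

```python
import os
import stat
import tempfile

# Required permission bits, taken from the two chmod commands above.
REQUIRED_MODES = {"usercert.pem": 0o644, "userkey.pem": 0o400}


def check_cert_permissions(cert_dir):
    """Return {file name: (actual mode, required mode)} for every mismatch."""
    problems = {}
    for name, required in REQUIRED_MODES.items():
        mode = stat.S_IMODE(os.stat(os.path.join(cert_dir, name)).st_mode)
        if mode != required:
            problems[name] = (oct(mode), oct(required))
    return problems


# Demonstration: create empty stand-in files with deliberately wrong modes.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in REQUIRED_MODES:
        path = os.path.join(demo_dir, name)
        open(path, "w").close()
        os.chmod(path, 0o600)
    print(check_cert_permissions(demo_dir))
```

An empty dictionary means both files have the required permissions. To check a real certificate, pass the directory that holds your usercert.pem and userkey.pem.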

Put certificate into browser and then into VOMS: https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideVomsFAQ

Also need to link certificate to your account in SiteDB: https://resources.web.cern.ch/resources/Manage/Accounts/MapCertificate.aspx
Check if it worked: https://cmsweb.cern.ch/sitedb/prod/mycert

Potentially Useful:
https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookStartingGrid#ObtainingCert
https://twiki.cern.ch/twiki/bin/viewauth/CMS/SiteDBForCRAB

Successfully registered for VO CMS membership on 2018-06-26
Able to submit crab jobs on 2018-07-17

Your certificate subject (DN):
/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=*yourusername*/CN=820970/CN=
/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=drosenzw/CN=820970/CN=Jake Rosenzweig # NEW
The CA that issued your certificate: /DC=ch/DC=cern/CN=CERN Grid Certification Authority

Local System

In case you ever get this kind of error: “Could not install packages due to an EnvironmentError: [Errno 13] Permission denied: '/private/var/folders/zj/mnvc1p6542bgc5j7npt_2jkh0000gn/T/pip-install-1uj6p02b/tabula-py/tabula/tabula-1.0.2-jar-with-dependencies.jar' Check the permissions.”

This is because Homebrew doesn't play nicely with pip. So do:
python -m pip install --user --install-option="--prefix=" <package>

If you ever get the following error:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
Then the simple fix is:
ssh-keygen -f ~/.ssh/known_hosts -R <hostname>
- for example, <hostname> = lxplus.cern.ch

Version Control

GitHub

GitHub BitBucket svn

Get help on any git command: git help <command>

The most frequent use of git commands:
git add <file>
git commit -m "<message>"
git push origin master
You can also add and commit in one step (this stages all modified and deleted tracked files):
git commit -am "<title>" -m "<description>"

A good collaborative workflow:
1. Fork the group's repo so that you have your own repo.
2. Make your own, new branch in the forked repo that you can work on.
3. Keep the master branch of the forked repo synced with the group's master branch (upstream).

git config --global user.name [Name]
git config --global user.email [Email]
git config --global user.github [Account]
git config --global core.editor [your preferred text editor]

Make the print log easier to read: git config --global alias.lol 'log --graph --decorate --pretty=oneline --abbrev-commit'

Pull a specific file from the GitHub repo:
git fetch # downloads all the recent changes, but does not put them in your currently checked-out code (working area)
git fetch origin
git checkout origin/master -- <path/to/file> # git checkout <remote>/<branch> -- <path/to/file> checks out that particular file from the downloaded changes (origin/master)

git cherry-pick <commit-ID> # applies the changes introduced by a specific commit on top of your current branch

If a Core folder gets updated, do:
git submodule init # may not need to do this every time
git submodule update

If you need to pull down more recent code from a repo, you can stash your current changes:
git stash # saves your modifications for later (so now you can: git pull)
git stash apply # brings those saved modifications back to life!

How to pull down changes from a repo that you're following:
git fetch
git merge

git merge:
1. First make sure local repo is up to date with remote repo: git fetch
2. Then do: git checkout master
3. Make sure master has latest updates: git pull
4. Then check out the branch that should receive the changes
5. Finally: git merge master # merges the updated master into the current branch

Remotes:
Add a remote called "upstream" that points to the original (not forked) repo:
git remote add upstream git@github.com:GitHATSLPC/GitHATS.git
Then pull in its changes with:
git pull upstream master
- this is equivalent to doing:
git fetch upstream master
git merge upstream/master

To keep from having to put in your password each time you push:
git remote show origin # This shows you your repo_name
git remote set-url origin git+ssh://git@bitbucket.org/<username>/<repo_name>.git
Can also remove remotes:
git remote rm origin
Rename a remote:
git remote rename <old_name> <new_name>

Two ways to make a repo:
1. Create repo in terminal:
   1. git init
   2. git add .
   3. git commit -m 'Commit message'
      - undo with: git reset --soft HEAD~1
   4. git remote add origin <URL>, where the <URL> is:
      - GitHub: git@github.com:<username>/<repo_name>.git # make whatever you want!
      - bitbucket: https://username@your.bitbucket.domain:7999/yourproject/repo.git
   5. git push -u origin master
2. Create repo online and clone into terminal:
   1. make repo on BitBucket or GitHub
   2. git clone <URL>

Check the status of latest changes in your own repo:
git status
git status -s # short format
Also useful:
git diff # shows edits between old and new files, line by line
git diff <file> # specifically, compares the changes you have made to the last committed version of <file>

If you get the following error:
error: The requested URL returned error: 403 Forbidden while accessing https://github.com/rosedj1/
Then do:
1. edit the .git/config file under your repo directory
2. find the url= entry under section [remote "origin"]
3. change all text before the @ symbol to ssh://git
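
Step 3 is a simple string rewrite, sketched below in Python. The function name and the example URL (including the repo name myrepo) are hypothetical. It assumes the url= entry contains an @ symbol, as the instructions above imply.

```python
def to_ssh_url(url):
    """Replace everything before the '@' in a remote URL with 'ssh://git'."""
    if "@" not in url:
        raise ValueError(f"no '@' found in URL: {url}")
    # Split only on the first '@' so the host and path are kept untouched.
    _, host_and_path = url.split("@", 1)
    return "ssh://git@" + host_and_path


old_url = "https://rosedj1@github.com/rosedj1/myrepo.git"  # hypothetical repo
print(to_ssh_url(old_url))
```

Instead of editing .git/config by hand, the resulting URL can also be applied with git remote set-url origin, which appears elsewhere in these notes.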

USEFUL! To remove a file from your remote git repo:
git rm <file> # this also deletes the file locally!
git rm --cached <file> # keeps the file locally; only removes it from the repo
then do:
git commit -m "removing <file>"
git push origin <branch>

Remove a directory: git rm -r <dir>

Say you have made a pull request and a bunch of commits which you can see on GitHub. Now you want to remove those files from the PR.
Doing rm from your local computer won't take it away from GitHub. # may not be true
So you can REMOVE previously committed files by doing: # also may not be true
git rm <file>

If you have a file in a PR that you want to delete, or say you have sensitive info in a PR which must be deleted, you should 'rewrite' the commit:
git commit --amend # just do this if your most recent commit is local (not online)
git push --force origin <branch> # otherwise, include this part too to rewrite the history online

If you move a repo to a new location:
git remote set-url origin ssh://git@gitlab.cern.ch:7999/cms-rcms-artifacts/gitlab-maven.git
or
git remote set-url origin https://gitlab.cern.ch/cms-rcms-artifacts/gitlab-maven.git

GitHub Markdown

*<words>*   or    _<words>_   # make <words> italic      (called "emphasis")
**<words>**    or    __<words>__   # make <words> bold       (called "strong emphasis")
**<words> and _<newwords>_**   # <words> and <newwords>    (called "combined emphasis")
~~<words>~~             # make <words> strikethrough
{code}<words>{code}    # make <words> monospace and code-like
!!<space>         # make entire message monospace by beginning message with '!!' and then a space!
@@<space>         # ignore all special formatting by beginning message with '@@' and then a space

Code
`<code>`         # inline <code>

```python
<code>
```            # block <code> with python syntax highlighting

Headers
# H1         # biggest text (used for headings)
## H2
### H3
#### H4
##### H5   # smallest text
###### H6   # smallest text, but greyed out

Lists   ('⋅' is a whitespace)
1. First ordered list item
2. Another item
⋅⋅* Unordered sub-list. 
1. Actual numbers don't matter, just that it's a number
⋅⋅1. Ordered sub-list
⋅⋅1. Second item in the sub-list. Remember, GitHub Markdown has automatic numbering
4. And another item.

⋅⋅⋅You can have properly indented paragraphs within list items. Notice the blank line above, and the leading spaces (at least one, but we'll use three here to also align the raw Markdown).

⋅⋅⋅To have a line break without a paragraph, you will need to use two trailing spaces.⋅⋅   # two trailing spaces keeps you in same paragraph
⋅⋅⋅Note that this line is separate, but within the same paragraph.⋅⋅

Unordered Lists
* Unordered list can use asterisks
- Or minuses
+ Or pluses


Tables
| Tables        | Are           | Cool  |
| ------------- |:-------------:| -----:|
| col 3 is      | right-aligned | $1600 |
| col 2 is      | centered      |   $12 |
| *zebra stripes* | `are neat`      |    $1 |

- Colons can be used to align columns.
- There must be at least 3 dashes separating each header cell.
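
The alignment rule can be sketched with a small Python helper that builds such a table. The function name is made up. The colon placement (none for left, both sides for centered, right side for right-aligned) follows the example table above.

```python
def markdown_table(headers, rows, align=None):
    """Build a GitHub Markdown table; align holds 'left', 'center' or 'right'."""
    align = align or ["left"] * len(headers)
    # Three dashes is the minimum; colons mark the alignment of each column.
    rules = {"left": "---", "center": ":---:", "right": "---:"}
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join(rules[a] for a in align) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


table = markdown_table(
    ["Tables", "Are", "Cool"],
    [["col 3 is", "right-aligned", "$1600"], ["col 2 is", "centered", "$12"]],
    align=["left", "center", "right"],
)
print(table)
```

The cells do not need to be padded to equal width in the raw text. The padding in the example above only makes the source easier to read.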


Blockquotes         # look like quotes from a forum or email
> <quoted_text>

Hyperlink:
[<words>](<URL>)         # inserts a hyperlink at the string <words>
Image:
![<Image>](<URL>)         # inserts an image

You can also add: Images, Hyperlinks, inline HTML, and YouTube videos


Make a horizontal line (all methods are the same):
***    or    ___    or    ---   

TWiki Markdown:

Intro: https://twiki.cern.ch/twiki/bin/view/TWiki/ATasteOfTWiki?slideshow=on;skin=print#GoSlide1
User guide: https://twiki.cern.ch/twiki/bin/view/TWiki/TWikiUsersGuide
Markdown help (long version): https://twiki.cern.ch/twiki/bin/view/TWiki/TextFormattingRules
Markdown help (short version): https://twiki.cern.ch/twiki/bin/view/TWiki/TWikiShorthand
TWiki variables: https://twiki.cern.ch/twiki/bin/view/TWiki/TWikiVariables
TWiki plugins: twiki.org/cgi-bin/view/Plugins

  • Easiest to edit the TWiki using Raw Edit.

Markdown:
_italics_
*bold*
__bold italic__
=monospace=
==bold monospace==

<verbatim class="cmd">
block of code</verbatim>

Disable formatted text:
<nop>*word*
!*word*

Separate paragraphs with a blank line

---+    # This is a heading
---++   # Deeper heading
---     # horizontal bar

%TOC{title="Goodies:"}%      # Table of Contents

   * text # three spaces, then * starts a bulleted list
    - (further bullets are indented via whitespace triplets)
   1 text # three spaces, then some number, starts a numbered list
    - doesn't matter what number you put!
    - Use the %BR% variable to add a paragraph without renumbering the list

| Cat | Dog |
| boo | yah! |    # creates a table

%RED% your_text %ENDCOLOR%      # color your text red

BumpyWord   # using CamelCase like this creates an auto-hyperlink to BumpyWord 's TWiki
[[BumpyWords][bumpy words]] appears as bumpy words
[[http://www.google.com/][Google]] appears as Google

%SEARCH   # This is an interface to a sophisticated search engine that embeds the results of the search in your page

Three kinds of documents on the TWiki:
1. DocumentMode = community property, anyone can edit
2. ThreadMode = Q&A
3. StructuredMode = has definite structure and rules to follow

Import an image:

<verbatim class="cmd"><img align="right" alt="CRAB Logo" src="http://cmsdoc.cern.ch/cms/ccs/wm/www/Crab/img/crab_logo_3.png" width="154" /> </verbatim>

Important Particle Physics Stuff:

MadGraph5_aMC@NLO

MadGraph5 (MG5) is a leading order (LO) and next-to-leading order (NLO) Monte Carlo event generator.

  • "Mad" stands for Madison-Wisconsin

Excellent MadGraph5 tutorial:
https://twiki.cern.ch/twiki/bin/view/CMSPublic/MadgraphTutorial

You can also check out the built-in tutorial by typing tutorial into the MadGraph5 interpreter.

  • See next section on how to install MadGraph5.

How to install MadGraph5 (MG5):

  1. Go to https://launchpad.net/mg5amcnlo and right-click a green button with your favorite version of MG5.
  2. Click Copy Link Location.
  3. Download MG5 by going to your shell and pasting in the link:
wget https://launchpad.net/mg5amcnlo

Untar the downloaded tarball and display the contents:

tar -zxf MG5_aMC_v2.6.5.tar.gz
ls -l MG5_aMC_v2_6_5/

You should see something like this:

total 7.4M
drwxr-xr-x.  4 drosenzw zh 2.0K Jul 12 00:49 Delphes
drwxr-xr-x.  3 drosenzw zh 2.0K Jul 12 00:49 ExRootAnalysis
drwxr-xr-x.  3 drosenzw zh 6.0K Jul 12 00:49 HELAS
drwxr-xr-x. 15 drosenzw zh 2.0K Jul 12 00:49 HEPTools
-rw-r--r--.  1 drosenzw zh 1.9K Jul 12 00:48 INSTALL
lrwxr-xr-x.  1 drosenzw zh   16 Jul 12 00:48 LICENSE -> madgraph/LICENSE
drwxr-xr-x.  3 drosenzw zh 2.0K Jul 12 00:48 MadSpin
drwxr-xr-x.  2 drosenzw zh 2.0K Jul 12 00:48 PLUGIN
-rw-r--r--.  1 drosenzw zh 2.2K Jul 12 00:49 README
drwxr-xr-x.  8 drosenzw zh 2.0K Jul 12 00:48 Template
-rw-r--r--.  1 drosenzw zh 122K Jul 12 00:48 UpdateNotes.txt
-rw-r--r--.  1 drosenzw zh   41 Jul 12 00:49 VERSION
drwxr-xr-x.  4 drosenzw zh 2.0K Jul 12 01:01 aloha
drwxr-xr-x.  2 drosenzw zh 2.0K Jul 12 00:49 apidoc
drwxr-xr-x.  2 drosenzw zh 2.0K Jul 12 00:48 bin
drwxr-xr-x.  2 drosenzw zh 2.0K Jul 12 00:49 doc
-rw-r--r--.  1 drosenzw zh 7.2M Jul 12 00:48 doc.tgz
drwxr-xr-x.  2 drosenzw zh 2.0K Jul 12 00:48 input
drwxr-xr-x. 10 drosenzw zh 2.0K Jul 12 01:01 madgraph
drwxr-xr-x.  2 drosenzw zh 2.0K Jul 12 01:01 mg5decay
drwxr-xr-x. 10 drosenzw zh 2.0K Jul 12 01:01 models
-rw-r--r--.  1 drosenzw zh 1.8K Jul 12 00:49 proc_card.dat
-rw-r--r--.  1 drosenzw zh    0 Jul 12 00:49 pythia-pgs.tgz
drwxr-xr-x.  7 drosenzw zh 2.0K Jul 12 00:48 tests
drwxr-xr-x.  7 drosenzw zh 2.0K Jul 12 01:01 vendor

Notes about a couple of these:

  • The models/ dir contains different theoretical models which MG5 can import.
    • Put new models inside this dir so that import model <model_name> can find them.
  • The proc_card.dat file contains the default process to be generated.
  • The bin/ dir contains the executable mg5_aMC. Let's play with that next.

Boot up the MG5 interpreter:

./MG5_aMC_v2_6_5/bin/mg5_aMC
  • Now you can type tutorial for a built-in tutorial or continue reading this TWiki.
Note that by default the Standard Model gets imported:
Loading default model: sm

See what particles MG5 currently knows about:

display particles
Look at the particles with a little more detail:
display multiparticles
Look at the possible vertices:
display interactions

Let's generate a Drell-Yan process:

generate p p > z > l+ l-

Draw the Feynman diagrams associated with this process: (If you have graphics-forwarding set up correctly on your system, MG5 will draw some purdy-lookin' Feynman diagrams for you.)

display diagrams

Save this generated process in a newly-created dir:

output <new_dir_name>
  • Note: Executing output automatically writes the Feynman diagrams to the subprocess/matrix.ps file

Calculate the cross section of the process:

launch
  • Now type 0 to bypass extraneous programs to run.
  • Then press 1 to modify the param_card.dat using Vim. Change anything you want and then do :wq to write (save) and quit.
  • Now press 2 to modify the run_card.dat. Change whatever run conditions and then write and quit out of Vim.
  • Finally, press 0 to calculate the cross section.

EXTRA INFO ON MG5

Bring up the help menu or help on a specific command:

help
help <cmd>

Syntax for generate :

generate INITIAL STATE > REQ S-CHANNEL > FINAL STATE $ EXCL S-CHANNEL / FORBIDDEN PARTICLES COUP1=ORDER1 COUP2=ORDER2 @N

### Examples:
generate g g > h > l- l+ l- l+ [QCD]    # loop process
generate l+ vl > w+ > l+ vl a $ z / a h QED=3 QCD=0 @1
generate p p > h , (h > hs hs, (hs > zp zp, (zp > l+ l-)))
generate p p > h > j j e+ e- vm vm~ QCD=0 QED=6

Specify the number of vertices:

generate p p > h > j j e+ e- vm vm~ QCD=0 QED=6

p p > t t~    # Gives only dominant QCD vertices; ignores QED vertices
p p > t t~ QED=2    # Gives both QCD and QED vertices

Add new processes to current process:

add process p p > h > j j mu+ mu- ve ve~ QCD=0 QED=6

Define new particles (or groups of particles):

define v = w+ w- z a    # Define the vector bosons
define p = p b b~    # Redefine the proton

Import a new model:

import model mssm
  • Note: The model must exist in the MG5_aMC_v2_6_5/models/ dir.

Modify the model:

customize_model 
customize_model --save=<new_model_name>    # Save new model
  • Useful for setting a mass to zero, or removing some interaction, etc.

Save MG5 commands from interactive session:
history <filename>.dat

Execute commands stored in history file:
import command <filename>.dat # from MG5 CLI
./bin/mg5_aMC my_mg5_cmd.dat # from your shell
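
A command file is just the MG5 commands written one per line, so it can also be generated by a script. The sketch below assembles one from the commands used in this tutorial (import model, generate, output, launch). The function name and the output directory name are hypothetical.

```python
def mg5_command_file(process, output_dir, model="sm", launch=True):
    """Return the text of an MG5 command file for a single process."""
    lines = [
        f"import model {model}",
        f"generate {process}",
        f"output {output_dir}",
    ]
    if launch:
        lines.append("launch")  # runs with the default cards
    return "\n".join(lines) + "\n"


# The Drell-Yan process from the tutorial above; the directory name is made up.
commands = mg5_command_file("p p > z > l+ l-", "drell_yan_test")
print(commands)
```

Saving this text as my_mg5_cmd.dat and passing it to ./bin/mg5_aMC from your shell should replay the same steps without typing them interactively.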

Execute shell commands from MG5 CLI:
! <cmd> # option 1
shell <cmd> # option 2

./bin/madevent do: pythia run_01

Rerun a launch command from a dir that was produced using output

./bin/generate_events

After you do output <new_dir>, inside that dir you will find a very useful README file that shows you how to:
  A. generate events
  B. run in cluster/multi-core mode
  C. launch sequential run (called multi-run)
  D. launch Pythia/PGS/Delphes
  E. prevent automatic opening of html pages
  F. link to lhapdf
  G. run in gridpack mode

import model HAHM_variablesw_v3_UFO
define q = u d s c t b u~ d~ c~ s~ t~ b~
generate q q > z z / g h h2 , z > l+ l-
output
launch

How to make a gridpack

Run gridpack_generation.sh.
  • I think you get this from cmssw/genproductions github but I need to double check.

Tips:
- Usually good to put: ptj = 0.01 (= 0 has caused problems)
- qscale at ME level is controlled by ptj at NLO and by xqcut at LO
- draj = 0.05 (this is the deltaR between gamma and jets)
- jetradius = 0.7 (for non-FXFX merging samples)
- lhaid = 292000 (for 4 fermion final state)

Appending [QCD] to a process applies NLO QCD corrections to it.

generate p p > w+, w+ > ell+ vl @0 # '@0' is still leading order...

How to fix certain errors:
Error detected in "import model <model_name>"
- Must put a "model dir" with all the model cards inside MG5_aMC_v2_6_5/models/
- a model dir has files like: "couplings.py", "vertices.py", "decays.py"

For help on using MCM or php:
https://indico.cern.ch/event/807778/contributions/3362163/attachments/1826349/2989132/mccmTutorial.pdf

How to install LHAPDF sets:
Open up a MG5 interpreter and do:
install lhapdf6
BEWARE! IT'S NOT GUARANTEED TO WORK!
While doing 'install lhapdf6', some errors are encountered; specifically, the desired dir is never created:
/20190422_HAHM_qqZZ4L/MG5_aMC_v2_6_5/HEPTools/lhapdf6/share/LHAPDF/

Instead it only creates:
/20190422_HAHM_qqZZ4L/MG5_aMC_v2_6_5/HEPTools/lhapdf6/

Need to MANUALLY put these files into .../share/LHAPDF/:
- pdfsets.index
- lhapdf.conf
/cvmfs/cms.cern.ch/lhapdf/pdfsets/6.2/pdfsets.index

Then download the desired pdfs into .../share/LHAPDF/:
wget "https://lhapdf.hepforge.org/downloads?f=pdfsets/6.1/NNPDF23_lo_as_0130_qed.tar.gz" -O NNPDF23_lo_as_0130_qed.tar.gz
tar xvfz NNPDF23_lo_as_0130_qed.tar.gz

If you want to view the code that fails: MG5_aMC_v2_6_5/HEPTools/HEPToolsInstallers/installLHAPDF6.sh

value '230000' for entry 'pdlabel' is not valid. Preserving previous value: 'nn23nlo'. allowed values are lhapdf, cteq6_m, cteq6_d, cteq6_l, cteq6l1, nn23lo, nn23lo1, nn23nlo

Change Fortran compiler to "gfortran": MG5_aMC_v2_6_5/input/mg5_configuration.txt

LHAPDF_DATA_PATH=/cvmfs/cms.cern.ch/lhapdf/pdfsets/6.2/NNPDF30_nlo_nf_5_pdfas PATH PYTHONPATH LD_LIBRARY_PATH /afs/cern.ch/work/d/drosenzw/DarkZ/MG5_gridpacks_practice/CMSSW_10_2_0/biglib/slc6_amd64_gcc700:/afs/cern.ch/work/d/drosenzw/DarkZ/MG5_gridpacks_practice/CMSSW_10_2_0/lib/slc6_amd64_gcc700:/afs/cern.ch/work/d/drosenzw/DarkZ/MG5_gridpacks_practice/CMSSW_10_2_0/external/slc6_amd64_gcc700/lib:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_2_0/biglib/slc6_amd64_gcc700:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_2_0/lib/slc6_amd64_gcc700:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_2_0/external/slc6_amd64_gcc700/lib:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/llvm/6.0.0-gnimlf2/lib64:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/gcc/7.0.0-omkpbe2/lib64:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/gcc/7.0.0-omkpbe2/lib:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/cuda/9.2.88-gnimlf/drivers

Maybe need to do this: export PATH=$PATH:/HEPTools/lhapdf6/bin /afs/cern.ch/work/d/drosenzw/DarkZ/MG5_gridpacks_practice/HAHM_LO/HAHM_variablesw_v3/HAHM_variablesw_v3_gridpack/work/LHAPDF-6.2.1/bin

/afs/cern.ch/work/d/drosenzw/DarkZ/MG5_gridpacks_practice/HAHM_LO/HAHM_variablesw_v3/HAHM_variablesw_v3_gridpack/work/LHAPDF-6.2.1/bin/lhapdf

Les Houches Events (LHE) Files

When an LHE file is made, inside you will find something like this:
<event>
 8   1  0.6793200E-07  0.9147429E+02  0.7818608E-02  0.1356938E+00
   particleID    1    4    4    0    0       px                 py               pz                  E                   mass
        2   -1    0    0  501    0  0.00000000000E+00  0.00000000000E+00  0.68369358717E+03  0.68369358717E+03  0.00000000000E+00 0. -1.
       -2   -1    0    0    0  501 -0.00000000000E+00 -0.00000000000E+00 -0.69562716985E+01  0.69562716985E+01  0.00000000000E+00 0.  1.
       23    2    1    2    0    0 -0.39165990322E+01  0.82379263440E+01  0.32833201641E+03  0.34083640999E+03  0.91018361847E+02 0.  0.
     1023    2    1    2    0    0  0.39165990322E+01 -0.82379263440E+01  0.34840529906E+03  0.34981344888E+03  0.29999890451E+02 0.  0.
      -11    1    3    3    0    0  0.17753759019E+02  0.35304624130E+01  0.31380369896E+03  0.31432534356E+03  0.51100000000E-03 0. -1.
       11    1    3    3    0    0 -0.21670358051E+02  0.47074639310E+01  0.14528317452E+02  0.26511066425E+02  0.51100000000E-03 0.  1.
      -11    1    4    4    0    0  0.85238449545E+01  0.88209596645E+01  0.94289826123E+02  0.95084365554E+02  0.51100000000E-03 0.  1.
       11    1    4    4    0    0 -0.46072459224E+01 -0.17058886009E+02  0.25411547294E+03  0.25472908333E+03  0.51100000000E-03 0. -1.
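As a rough guide to reading the event block above: each particle line follows the Les Houches column order, namely PDG ID, status, two mother indices, two color-flow tags, then px, py, pz, E, mass, lifetime, and spin. The sketch below is a minimal Python illustration (the function and field names are made up for this example). It parses one particle line copied from the event and compares its mass column with sqrt(E^2 - |p|^2). The row of column labels inside the block appears to be an annotation and is not parsed here.

```python
import math

# Column order of a particle line in an LHE <event> block.
FIELDS = ("pdg_id", "status", "mother1", "mother2", "color1", "color2",
          "px", "py", "pz", "E", "mass", "lifetime", "spin")
INT_FIELDS = FIELDS[:6]  # the first six columns are integers


def parse_particle(line):
    """Turn one whitespace-separated particle line into a dict."""
    tokens = line.split()
    if len(tokens) != len(FIELDS):
        raise ValueError("expected %d columns, got %d" % (len(FIELDS), len(tokens)))
    particle = {}
    for name, token in zip(FIELDS, tokens):
        particle[name] = int(token) if name in INT_FIELDS else float(token)
    return particle


def invariant_mass(p):
    """Mass recomputed from the four-momentum: sqrt(E^2 - |p|^2)."""
    m2 = p["E"] ** 2 - (p["px"] ** 2 + p["py"] ** 2 + p["pz"] ** 2)
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative round-off


# The particle with ID 1023, copied from the event above.
line_1023 = ("1023    2    1    2    0    0  0.39165990322E+01 -0.82379263440E+01 "
             "0.34840529906E+03  0.34981344888E+03  0.29999890451E+02 0.  0.")
particle = parse_particle(line_1023)
print(particle["pdg_id"], particle["status"], particle["mass"], invariant_mass(particle))
```

The recomputed mass should agree with the mass column to within rounding, which is a handy check that the columns were read in the right order.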

UF-Specific Stuff

DON'T store big files in EOS; they are easier to access from Tier2: /cms/data/store/user/
  • Use uberftp or gfal-copy to access them.
If you are on IHEPA, you can store big files in one of the native /raid/ storage areas:

Path Server
/raid/raid5/ gainesville
/raid/raid6/ newberry
/raid/raid7/ alachua
/raid/raid8/ melrose
/raid/raid9/ archer
(Mnemonic: Ga-New-Ala-M-Ar, pronounced "GNU Alamar")

Use Jupyter Notebooks on your remote server:

  • First, on the remote server, do: jupyter notebook --no-browser --port=8880
  • Then, on your local computer, do: ssh -N -L localhost:8888:localhost:8880 yourusername@melrose.ihepa.ufl.edu
  • Finally, type this into your browser: localhost:8888

Notes:

  • If you started the jupyter notebook after doing cmsenv, then the notebook will know about ROOT.
  • You still have to do: import ROOT, etc.
  • Add your own packages and modules to your python path: sys.path.append('/path/to/modules')

For general tips on CMS analysis basics:

For excellent code structure and interesting ideas like PyROOT and Jupyter Notebooks:

UF Tier2 Commands (uberftp, xrootd, etc.)

Get an IHEPA account:

  • Email Yu Fu (yfu at ufl.edu) and ask him to make you an account.

Renew your CERN account:

Interact with UF Tier 2 storage directly:

uberftp cmsio.rc.ufl.edu "ls /cms/data/store/user/drosenzw/"
uberftp cmsio.rc.ufl.edu "help"    # See available commands.
uberftp cmsio.rc.ufl.edu "mkdir /cms/data/store/user/drosenzw/mynewdir/"
uberftp cmsio.rc.ufl.edu "rename /cms/data/store/user/drosenzw/olddir/ /cms/data/store/user/drosenzw/newdir/"

Copy files to/from UF Tier 2 storage:

gfal-copy <source_dir> gsiftp://cmsio.rc.ufl.edu//cms/data/store/user/drosenzw/<dest_dir>
gfal-copy  -r  gsiftp://cmsio.rc.ufl.edu//cms/data/store/user/<filepath>  file:///home/rosedj1/<dest_dir>
  • Use the -r flag (recursive) to copy entire directories.
  • With gfal-copy, you CAN'T USE WILDCARDS! Just end with .../dir/
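The two gfal-copy patterns above can be wrapped in a small helper so that the long gsiftp prefix is typed only once. This is only a sketch: build_gfal_copy_cmd, t2_url, and T2_PREFIX are hypothetical names, the user names and paths are placeholders, and the command is printed rather than executed.

```python
# Prefix taken from the gfal-copy examples above.
T2_PREFIX = "gsiftp://cmsio.rc.ufl.edu//cms/data/store/user/"


def t2_url(username, relative_path):
    """Build the gsiftp URL of a path inside a user's Tier 2 area."""
    return T2_PREFIX + username + "/" + relative_path.lstrip("/")


def build_gfal_copy_cmd(source, dest, recursive=False):
    """Return the gfal-copy command as a list of arguments."""
    cmd = ["gfal-copy"]
    if recursive:
        cmd.append("-r")  # needed to copy entire directories
    cmd += [source, dest]
    return cmd


# Placeholder example: copy a whole directory from Tier 2 to a local area.
cmd = build_gfal_copy_cmd(t2_url("drosenzw", "plots/"),
                          "file:///home/rosedj1/plots/",
                          recursive=True)
print(" ".join(cmd))
```

On a machine where gfal-copy is set up, the returned list could be passed to subprocess.run(cmd, check=True) instead of being printed.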

You can write to your T2 area on HiPerGator:

  • /cms/data/store/user/t2/users//

Two methods to attach a file from T2:

  • root -l root://cmsio5.rc.ufl.edu//store/user/drosenzw/myfile.root
  • Once in PyROOT: f = TFile.Open("root://cmsio5.rc.ufl.edu//store/user/drosenzw/myfile.root","READ")

*You may have to use one of the following paths instead:*
To access CMS data from IHEPA, please use:

  • root://cmsio5.rc.ufl.edu//store/...
  • root://cms-xrd-global.cern.ch//store/... # access the file at any site
  • root://cms-xrd-global.cern.ch//store/test/xrootd/T2_US_Florida//store/...
  • gsiftp://cmsio.rc.ufl.edu/cms/data/store/... ( if cmsio5.rc.ufl.edu does not work for some reason )
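One way to use the xrootd alternatives above is to try the prefixes in order until a file opens. The sketch below only builds the candidate URLs from a logical file name that begins with /store/; candidate_urls is a hypothetical helper and the example path is a placeholder. Each URL could then be handed to TFile.Open in turn.

```python
# xrootd prefixes from the list above, in the order they would be tried.
XROOTD_PREFIXES = [
    "root://cmsio5.rc.ufl.edu/",
    "root://cms-xrd-global.cern.ch/",
    "root://cms-xrd-global.cern.ch//store/test/xrootd/T2_US_Florida/",
]


def candidate_urls(lfn):
    """Return the xrootd URLs to try for a logical file name like /store/..."""
    if not lfn.startswith("/store/"):
        raise ValueError("expected a path starting with /store/, got: " + lfn)
    return [prefix + lfn for prefix in XROOTD_PREFIXES]


urls = candidate_urls("/store/user/drosenzw/myfile.root")
for url in urls:
    print(url)
```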

The best way to display plots on IHEPA for easy viewing:

mkdir -p /home/<your_UN>/public_html/<dest_dirs>
cp /home/rosedj1/index.php /home/<your_UN>/public_html/<dest_dirs>/
Then the plots will show up at your website:

HiPerGator (HPG)

HiPerGator lectures given by Matt Gitzendanner

Find notes on HiPerGator (find Matt Gitzendanner's presentations): training.it.ufl.edu
Find SLURM commands at: help.rc.ufl.edu
Interactive Jupyter Notebook session that uses HiPerGator: jhub.rc.ufl.edu

Location of SLURM example scripts: /ufrc/data/training/SLURM/*.sh
  • For single jobs, grab: single_job.sh
  • For parallel jobs, grab: parallel_job.sh

You have a couple of main directories:
/ # where HPG first drops you off
/home// # CANNOT handle big files (only has 20 GB of storage)
/ufrc// # can handle 51000 cores!

/ufrc/phz5155/$USER
  • Parallel file system
  • CAN handle 51000 cores reading and writing to it
  • 2 TB limit per group
After ssh’ing into HPG, it will take you to: /home/$USER
  • For me this is: /home/rosedj1
  • Get 20 GB of space
  • Hosted by one server (node)

My groups:
/ufrc/korytov/rosedj1/ # for particle physics research
/ufrc/phz5155/ # for computing course
So I'm part of two different groups.

To use class resources instead of Korytov’s resources, run this command each time you want to submit a job:
module load class/phz5155

It is useful to use the extension: .slurm for SLURM scripts

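######################
## Basic SLURM job script (in the actual file, #!/bin/bash must be the first line):
#!/bin/bash
#SBATCH --job-name=test          # Name for job
#SBATCH -o job_%j.out            # Output file; %j is replaced by the job ID
#SBATCH --mail-type=ALL          # Send email on job state changes (begin, end, fail, ...)
#SBATCH --mail-user=rosedj1@ufl.edu
#SBATCH --ntasks=1               # Number of tasks
#SBATCH --mem-per-cpu=100mb      # Memory per core; alternatively use --mem=1gb
#SBATCH --time=02:00:00          # Time limit (hh:mm:ss); short form: -t 00:01:00

# SCRIPT STUFF BELOW, e.g.
hostname
module load python
python -V
######################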

SLURM sbatch directives
Multi-letter directives are double dashes:
--nodes=1 # processors
--ntasks
--ntasks-per-node
--ntasks-per-socket
--cpus-per-task (cores per task)
Memory usage:
--mem=1gb
--mem-per-cpu=1gb
--distribution

Long option Short option Description
--nodes=1 -N request num of servers
--ntasks=1 -n num tasks that job will use (useful for MPI applications)
--cpus-per-task=8 -c

If you invest in 10 cores, burst qos can use up to 90 cores!
#SBATCH --nodes=1

Task Arrays
#SBATCH --array=1-200%10 # 200 tasks, but run at most 10 at a time to be nice
$SLURM_ARRAY_TASK_ID # environment variable holding the index of the current task
%A: job id
%a: task id
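Inside an array job, each task usually reads SLURM_ARRAY_TASK_ID to decide which piece of the work is its own. Here is a minimal Python sketch, assuming a hypothetical list of 2000 input files split across the 200 tasks of --array=1-200 (the file names and the helper files_for_task are illustrative only):

```python
import os


def files_for_task(all_files, task_id, n_tasks):
    """Return the slice of all_files handled by the 1-based task_id."""
    if not 1 <= task_id <= n_tasks:
        raise ValueError("task_id must be between 1 and %d" % n_tasks)
    chunk = -(-len(all_files) // n_tasks)  # ceiling division
    start = (task_id - 1) * chunk
    return all_files[start:start + chunk]


# Hypothetical inputs: input_0000.root ... input_1999.root
all_files = ["input_%04d.root" % i for i in range(2000)]

# SLURM sets this variable for each array task; default to 1 for local tests.
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "1"))
my_files = files_for_task(all_files, task_id, n_tasks=200)
print("Task %d processes %d files" % (task_id, len(my_files)))
```

Defaulting the task ID to 1 lets the same script be tested interactively before it is submitted with sbatch.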

HPG COMMANDS:
id # see your user id, your group id, etc.
sbatch # submit script.sh to scheduler
sbatch --qos=phz5155-b #
squeue # see ALL jobs running
squeue -u rosedj1 # just see your jobs
squeue -j
scancel # kill a job
sacct #
sstat #
slurmInfo # see info about resource utilization; must do: module load ufrc
slurmInfo -p # partition, a better summary
slurmInfo -g #
srun --mpi=pmix_v2 myApp

Memory utilization = MAX amount used at one point
Memory request = aim for 20-50% of total use

BE WISE ABOUT USING RESOURCES! - Users have taken up 16 cores and TOOK MORE TIME than just using 1 core!!!

It would be interesting to write a SLURM script that submits the same job many times with different numbers of cores and plots the efficiency vs. number of cores.
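The analysis half of that idea is simple once the timings exist. Here is a minimal sketch, assuming made-up wall-clock times (the numbers below are invented purely for illustration and are not measurements): speedup is the 1-core time divided by the n-core time, and efficiency is the speedup divided by n.

```python
def parallel_efficiency(timings):
    """Map number of cores -> (speedup, efficiency).

    timings: dict mapping number of cores to wall time in seconds;
    it must contain an entry for 1 core, which serves as the baseline.
    """
    t1 = timings[1]
    results = {}
    for n_cores in sorted(timings):
        speedup = t1 / timings[n_cores]
        results[n_cores] = (speedup, speedup / n_cores)
    return results


# Hypothetical timings in seconds, including a 16-core run slower than 1 core.
hypothetical_timings = {1: 1000.0, 2: 520.0, 4: 290.0, 8: 200.0, 16: 1100.0}

for n_cores, (speedup, eff) in parallel_efficiency(hypothetical_timings).items():
    print("%2d cores: speedup %.2f, efficiency %.2f" % (n_cores, speedup, eff))
```

An efficiency well below 1 means the extra cores are mostly idle, which is the situation the warning above describes.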

QOS or burstQOS: "Quality of Service"
When you do sbatch, the -b at the end of the qos name is “burst capacity”, which allows 9x allocation of resources when resources are idle:
--qos=phz5155-b
--qos=

In the job summary email, the memory usage is talking about RAM efficiency

Time: -t
  • The time limit is 31 days.
  • It is to our benefit to be accurate with job time.
  • Infinite loops will just waste resources and make you think your job is actually working.
  • The scheduler might postpone your job if it sees it will delay other people's jobs.

Module system organizes file paths
If you want to use common modules on HPG, you must load them first:
module load
module load python
module load python3
module load = ml # already aliased automagically into HPG
module list # list currently loaded modules
module spider # list all available modules
module spider cl # list everything with cl in name
module purge # unloads all modules
ml intel # allows you to do "make" commands
module load intel/2018 openmpi/3.1.0 # compiling

Learning about Xpra:
module load gui
launch_gui_session -h # shows help options
  • This will load a session on a graphical node on the cluster.
  • Default time on server is 4 hrs.
  • Use the -a option to use a secondary account.
  • Use the -b option to use burst SLURM qos.

Paste the xpra url into your local terminal

Do:
module load gui
launch_gui_session -e (e.g., launch_rstudio_gui)
xpra attach ssh:
xpra_list_sessions
scancel

ln -s <file_path_that's_way_far_away> # makes a symbolic link from to

Development Sessions
When to use a dev session?
  • When a job requires multiple cores and maybe a few days to run
  • There are 6 dev nodes!

module load ufrc
srundev -h # help!
srundev --time=04:00:00 # begin a 4 hr dev session, with the default 1 processor core and 2 GB of memory
srundev --time=60 --ntasks=1 --cpus-per-task=4 --mem=4gb # additional flags
srundev -t 3-0 # session lasts 3 days
srundev -t 60 # session lasts 60 min
  • Default time is 00:10:00 (10 min) and max time is 12:00:00.
These are all wrappers for: srun --partition=hpg2-dev --pty bash -i
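The time values above mix several of the formats SLURM accepts: a bare number means minutes, and a dash separates days from hours. As a quick sanity check on what a given string means, here is a small sketch (slurm_time_to_seconds is a hypothetical helper, not part of SLURM) that covers minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes, and days-hours:minutes:seconds:

```python
def slurm_time_to_seconds(spec):
    """Convert a SLURM time string such as '60', '04:00:00', or '3-0' to seconds."""
    days = 0
    if "-" in spec:
        # Day prefix: the remaining fields are hours[:minutes[:seconds]].
        day_part, rest = spec.split("-", 1)
        days = int(day_part)
        parts = [int(p) for p in rest.split(":")]
        if len(parts) > 3:
            raise ValueError("too many fields in: " + spec)
        parts += [0] * (3 - len(parts))
        hours, minutes, seconds = parts
    else:
        parts = [int(p) for p in spec.split(":")]
        if len(parts) == 1:      # minutes
            hours, minutes, seconds = 0, parts[0], 0
        elif len(parts) == 2:    # minutes:seconds
            hours, minutes, seconds = 0, parts[0], parts[1]
        elif len(parts) == 3:    # hours:minutes:seconds
            hours, minutes, seconds = parts
        else:
            raise ValueError("too many fields in: " + spec)
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


for spec in ("60", "04:00:00", "3-0", "00:10:00"):
    print(spec, "->", slurm_time_to_seconds(spec), "seconds")
```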

Getting CMSSW on HPG!!!
  1. Start a dev session.
  2. source /cvmfs/cms.cern.ch/cmsset_default.sh # this makes cmsrel and cmsenv two new aliases for you!
  3. Now cmsrel your favorite CMSSW_X_Y_Z.

Misc Info on HPG:
Terminology:
  • 8 cpus/task = 8 cores on that one server
  • 1 node has: RAM, maybe 2 sockets (processors), each with 2 cores
  • A node is a physical server
  • Each server has either 2 or 4 sockets
  • You can also specify the number of tasks per processor: ntasks-per-socket
  • Each processor only has a certain bandwidth to memory
  • processor = cpu = core
  • 15000 servers all networked together; each node has either 32 or 64 cores on it

Entire PHZ5155 course is allocated a whole node!
  • This is 32 cores on HPG2.
The slowdown of your job may be in the bandwidth!

Data center near Satchel’s where HiPerGator resides:
  • 51000 cores in HPG2 cluster (only 14000 cores in HPG1)
  • World-class cluster
  • 3 PB of storage

It’s essentially just a big rack of computers, where each computer has:
HPG2: 2 servers, 32 cores per server, 128 GB RAM/core?
HPG1: 1 server, 64 cores per server, 256 GB RAM/core?

HPG1 is 3 years old and has older nodes
3.5 GB/core available

threaded=parallel=open MPI

Can do parallel applications:
OpenMP, Threaded, Pthreads applications
  • All cores on ONE server, shared memory
  • CAN'T talk to other servers
MPI (Message Passing Interface)
  • Applications which can run across multiple servers

ntasks = # of MPI ranks
Say you want to run 100 tasks across 10 nodes: that is 100 MPI ranks.
You might think the scheduler would put 10 MPI ranks on each node, but it won't necessarily be so equal per node!
The scheduler may put 30 tasks on one node, and distribute the remaining 70 tasks on other nodes.
You can control this with ntasks-per-node, though.
Two processors, each processor has 2 cores
16 cores per processor
64 cores per node

For Windows users who need a Terminal:
  • MobaXterm
  • or the Ubuntu subsystem
Need an SFTP client to move files to and from your computer:
  • Cyberduck
  • FileZilla
Text editor:
  • BBedit(?)

Cluster basics:
ssh’ing puts you into a login node.
Then submit a job to the scheduler.
  • The scheduler submits the job to the 51000 cores!
  • You must prepare a file to tell the scheduler what to do (BATCH script):
    • number of CPUs
    • RAM
    • how long to process the job

There are also compute nodes; this is where the money is!
  • They are optimized to distribute jobs across different computers efficiently.

Extra Stuff:

Mantra:
"GUIs make easy tasks easier; CLIs make difficult tasks possible."

Neat commenting styles:

@@@@@@@@@@@@@@@@@@@@@@
@ IMPORTANT MESSAGE  @
@@@@@@@@@@@@@@@@@@@@@@

ccccccccccc
c message c
ccccccccccc

-*-*-*-*-* title *-*-*-*-*-

#________________________|
Section 1:
#________________________|
Section 2:

==========
 My Title
==========

// ============ Initialize Variables ============= //
// ------------ other title ------------

Teach by showing. Learn by doing.

Good TWiki Layouts:

Someone else's intro to CMS and its software

How to be more efficient with your coding:

  • Use lots of print statements. It's not professional, but it gets the job done!
  • Do sys.exit() to test sections of Python code.
  • Use an interpreter (Python, ROOT, bash command prompt) to test dummy examples. Make sure your code does what you think it will do.
  • Rubber Ducky Method: Talk about your code out loud to the "rubber ducky" sitting on your desk. Or go find a real human.
  • Comment heavily at first, and then trim it down. Remember that your code might not do what you intended for it to do!
  • How to understand someone else's code:
    • Just run the code and see what it produces. Find out which part of the code produces which stuff.
    • Comment their code and explain it in plain English.
    • Start from some point that you understand and work your way backwards
    • Go line by line very carefully figuring out what each line does.
    • Try rewriting their code in chunks to reproduce what it is, but in your own way
Topic revision: r41 - 2019-12-08 - JakeRosenzweig
 

Copyright © 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.