This is a collection of useful programming tips, physics knowledge, and resources to assist you in the CMS environment.
Use the Table of Contents below or your browser's built-in search function to look for keywords of interest.
We primarily use ROOT to access .root files and all the tasty data stored inside.
Honestly, I find ROOT to be difficult to work with. So instead I use PyROOT, which is just Python with ROOT's libraries imported.
In your shell, do:
python
from ROOT import *
Congrats! You are now working in PyROOT.
You can use all the typical ROOT commands (make histograms and canvases, open TTrees, etc.) all within the comfort of friendly Python syntax!
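For example, a minimal PyROOT session (all names here are illustrative):
from ROOT import *
c1 = TCanvas("c1", "my canvas")
h = TH1F("h", "demo histo", 50, -4, 4)  # name, title, nbins, xmin, xmax
h.FillRandom("gaus", 1000)              # fill with 1000 Gaussian-distributed values
h.Draw("HIST")
c1.SaveAs("demo.png")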
While inside this "PyROOT" interpreter, you can open files locally or remotely:
python
from ROOT import *
f1 = TFile.Open("root://cmsio5.rc.ufl.edu//cms/data/store/user/t2/users/rosedj1/ForPeeps/ForFilippo/GluGluHToZZTo4L_M125_13TeV_powheg2_JHUGenV7011_pythia8.root")
f1.ls()
TFile** root://cmsio5.rc.ufl.edu//cms/data/store/user/t2/users/rosedj1/ForPeeps/ForFilippo/GluGluHToZZTo4L_M125_13TeV_powheg2_JHUGenV7011_pythia8.root
TFile* root://cmsio5.rc.ufl.edu//cms/data/store/user/t2/users/rosedj1/ForPeeps/ForFilippo/GluGluHToZZTo4L_M125_13TeV_powheg2_JHUGenV7011_pythia8.root
KEY: TDirectoryFile Ana;1 Ana
A couple of points:
root:// specifies that the file should be opened over the XRootD protocol
// is a separator between the server and the file path
the hostname (here cmsio5.rc.ufl.edu; cms-xrd-global.cern.ch is the global "redirector") tells XRootD which server to ask. Use this to access remote files.
This root file in particular has a "directory" object called Ana from the TDirectoryFile class. Let's see what's inside:
f1.Get("Ana").ls()
TDirectoryFile* Ana Ana
KEY: TTree passedEvents;1 passedEvents
KEY: TH1F nEvents;1 nEvents in Sample
KEY: TH1F sumWeights;1 sum Weights of Sample
KEY: TH1F nVtx;1 Number of Vertices
KEY: TH1F nInteractions;1 Number of True Interactions
We see some 1-dimensional histograms (the TH1F dudes) and a TTree, which stores most of the juicy data we want.
Let's go into that TTree, check out the 0th event, and see what branches the TTree contains:
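A minimal sketch of that in PyROOT (the Ana/passedEvents names come from the listing above):
t = f1.Get("Ana").Get("passedEvents")  # grab the TTree from the TDirectoryFile
t.Show(0)                              # dump every branch and its value for the 0th event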
This shows you what information is stored inside the TTree. You can use all sorts of TTree methods to sift through the data.
The most useful methods that I know of are:
t.Show(2) # Shows all branches and values of entry 2 (entries are 0-indexed).
t.Scan() # Scans the first 25 entries. Press Enter to show another 25 entries.
t.Scan("<branch_name>") # Scan across a specific branch in the TTree.
t.Scan("Event:triggersPassed:GENmass4l") # Simultaneously scan across multiple branches. SUPER USEFUL.
t.GetEntries() # Gives total number of entries in N-Tuple.
t.GetEntries("passedFullSelection==1") # Only count entries with passedFullSelection==1.
t.GetEntries("Sum$(abs(GENlep_id[])==11)==4") # Can do cool sums and stuff. I need to learn more about this.
t.GetEntry(2) # Loads entry 2 and allows you to extract branch info.
Then: t.eventWeight # Get value of eventWeight for entry 2.
t.Print() # Another way to see what branches your tree has.
t.Draw("pTZ1") # Make a histogram of the pTZ1 branch.
t.Draw("pTZ1","pTZ2 > 80") # Make a histogram of pTZ1 but apply cuts on pTZ2.
t.Draw("pTZ1","pTZ2 > 80 && nVtx < 5") # Can combine selection criteria.
t.Draw("ebeam","(1/e)*(sqrt(z)>3.2)") # Apply a weight of 1/e to all entries whose sqrt(z)>3.2
t.Draw("patMuons_slimmedMuons__PAT.obj.eta()","abs(patMuons_slimmedMuons__PAT.obj.eta())<1.2","")
t.Draw("some_branch>>h1", "", "goff") # Store the values from <some_branch> into a histogram called h1.
Histograms
A histogram ("histo") is a kind of frequency plot; it shows your data in "bins", based on how often certain data values occur.
The most common kind of histo is a TH1F (1-dimensional Histogram of Floats).
Make a histo in Python:
f = TFile("histos.root", "new")
h = TH1F("hgaus", "histo from a gaussian", 100, -3, 3) # name, title, nbins, xmin, xmax
h.FillRandom("gaus", 10000) # Pull 10,000 values from a normalized Gaussian distribution.
Make a histo in C++:
TH1F *h = new TH1F("h", "My Histogram", 100, -20, 20);
h->FillRandom("gaus", 5000); // Fill histo with 5000 random points pulled from a Gaussian distribution.
Frequent Histogram Methods:
h->Fill(gRandom->Gaus(4,2)) # Fill histo with a single point pulled from Gaussian with mu=4, sigma=2
for (int i=0; i<1000; i++) {h->Fill(gRandom->Gaus(40,15));} // Do a for loop to fill histo with many points.
h->GetEntries() # Returns how many total values have been put into the bins
h->GetMaximum() # Returns the number of entries inside the bin which holds the most entries
h->GetMinimum() # Returns the number of entries inside the bin which holds the fewest entries
h->GetBinContent(<int bin_num>) # Returns the number of entries inside bin number bin_num
h->GetMaximumBin() # Tells you which bin holds the most entries; Returns the bin number(not x value of bin!)
h->Draw() # Draws the histo using points. Looks ugly.
h->Draw("HIST") # Draws the histo using rectangles. Looks more professional!
h->Draw("HIST e") # Draw histo with error bars ( where err = sqrt(num_entries_in_bin) )
h->GetMaximumStored() # Returns the maximum set by SetMaximum() (-1111 if none was set)
h->GetMean() # Get average of histogram
h->GetStdDev() # Get standard deviation of histo
h->GetXaxis()->GetBinCenter(<int bin>) # returns the x value where the center of bin is located
h->GetNbinsX() # Returns the number of bins along x axis
h->Fill(<double x>, <double weight>) # Fills the bin containing x with weight (NOT "fill bin number x"!)
h->SetBinContent(<int bin>, <double val>) # Replaces whatever is in bin number <bin> with value <val>
(counts as adding a NEW entry!)
h->SetAxisRange(<double xmin>, <double xmax>, "<X or Y>") # Restrict the displayed range of the chosen axis
h->SetMaximum(max_y * 1.2) // Set the y-axis maximum, e.g. 20% above the tallest bin
h->GetXaxis()->SetRangeUser(80, 250) // Zoom the x axis to [80, 250]
h->IntegralAndError(<bin1>,<bin2>,<err>) # calculates the integral over bins bin1..bin2
- err is passed by reference and gets filled with the calculated error
- so before you execute IntegralAndError, first do err = Double(0.) to create the err variable (in PyROOT)
gPad->SetLogy() # set y axis to be log scale (SetLogy lives on the pad/canvas, not the histogram)
h->Write() # Writes the histo to the currently open ROOT file.
h->GetXaxis()->GetBinCenter( h->GetMaximumBin() ) # returns most-probable value of histo
h->Sumw2() // Ensure proper error propagation.
h->Rebin(10) // Rebinning should be easy! Errors are automatically recalculated.
<b>"Pretty up" your plot:</b>
h.GetXaxis().SetTitle("massZ(GeV)") # Put title on X axis. Can also use: h.SetXTitle()
h.GetYaxis().SetTitleOffset(1.3) # Move Y axis up or down a bit.
h.SetAxisRange(0.0, 0.1, "X") # Sets range on X axis.
h.SetLabelSize(0.03, "Y")
h.SetLineColor(1)
TH2F: (2-dimensional Histogram of Floats)
Makes "heatmaps".
h2 = TH2F("h2","h2 with latex titles",40,0,40,20,0,10)
h2.Integral() # calculate integral over ALL bins
h2.IntegralAndError(xbin1, xbin2, ybin1, ybin2, err) # calculates the integral over a rectangular region, specified by bins
- err is passed by reference and gets filled with the calculated error
- so before you execute IntegralAndError, first do err = Double(0.) to create the err variable (in PyROOT)
h2.Draw("COLZ1") # "COL" means color, "Z" means draw the color bar, "1" makes all cells<=0 white!
EXTRA HISTOGRAM TIPS
Load histo from a root file:
TFile f("histos.root");
TH1F *h = (TH1F*)f.Get("hgaus");
Bin Convention:
bin = 0; underflow bin
bin = 1; first bin with low-edge xlow INCLUDED
bin = nbins; last bin with upper-edge xup EXCLUDED
bin = nbins+1; overflow bin
Fitting functions to a histo:
h->Fit("gaus") // Fit a Gaussian curve to the histo.
// Create your own fitting function:
fitfunc = new TF1("m1","gaus",85,95); // OK this one is still a Gaus, but it is only valid from 85 < x < 95.
fitfunc->SetLineColor(1)
fitfunc->SetLineWidth(2)
fitfunc->SetLineStyle(2)
h->Fit(fitfunc, "R");
param = h->Fit(fitfunc, "S"); // "S" makes Fit return a TFitResultPtr, saved here into param.
ROOT.gStyle.SetOptFit(1111) // Set statistics box.
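A PyROOT version of the range-limited fit above, as a minimal sketch (the histogram contents are made up):
from ROOT import *
gStyle.SetOptFit(1111)
hfit = TH1F("hfit", "fit demo", 100, 60, 120)
for _ in xrange(10000):
    hfit.Fill(gRandom.Gaus(90, 3))          # fake a Z-like peak
fitfunc = TF1("m1", "gaus", 85, 95)
hfit.Fit(fitfunc, "RS")                     # "R" respects the TF1 range, "S" returns a TFitResultPtr
print "mean = %.2f, sigma = %.2f" % (fitfunc.GetParameter(1), fitfunc.GetParameter(2))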
The -f flag (e.g. in hadd -f) forces the new file to be produced, even if the file already exists.
For histos with equal bin width,
it is probably better to set the bin width rather than setting min_bin, max_bin, and the number of bins!
There is an "overflow bin" on the right edge of the histogram that collects entries which lie outside the
histogram x range.
These entries are NOT counted in the statistics (mean, stdev), BUT they are counted as new entries!
There's also an underflow bin
Therefore, entries in overflow bins DO count towards total entries, but not towards statistics, like Integral()
Normalizing Histos: h->Scale(1./h->Integral())
TGraphs
The most common kinds of graphs:
TGraphErrors: plot data points with error bars.
TF1: plot any function that you can create.
TMultiGraph: draw several graphs on the same plot (see below).
tg = TGraph(<int n_points>, <x_array>, <y_array>)
tg.GetXaxis().SetTitle("<x_title>")
tg.Draw("APC")
Drawing Options Description
"A" Axis are drawn around the graph
"I" Combine with option 'A' it draws invisible axis
"L" A simple polyline is drawn
"F" A fill area is drawn ('CF' draw a smoothed fill area)
"C" A smooth Curve is drawn
"*" A Star is plotted at each point
"P" The current marker is plotted at each point
"B" A Bar chart is drawn
"1" When a graph is drawn as a bar chart, this option makes the bars start from the bottom of the pad. By default they start at 0.
"X+" The X-axis is drawn on the top side of the plot.
"Y+" The Y-axis is drawn on the right side of the plot.
"PFC" Palette Fill Color: graph's fill color is taken in the current palette.
"PLC" Palette Line Color: graph's line color is taken in the current palette.
"PMC" Palette Marker Color: graph's marker color is taken in the current palette.
"RX" Reverse the X axis.
"RY" Reverse the Y axis.
How to use TMultiGraph:
mg = TMultiGraph("<internal_name>", "<title>")
mg.SetMaximum(<maxval>) # set y-axis to <maxval>
leg = TLegend(xmin, ymin, xmax, ymax) # (all floats between 0 and 1, as a proportion of the x or y dimension)
Example:
leg = TLegend(0.60,0.7,0.8,0.9)
leg.AddEntry(h1, "Mass = %s" % mZd,"lpf")
leg.SetLineWidth(3)
leg.SetBorderSize(0)
leg.SetTextSize(0.03)
leg.Draw("same")
Python
Take in user input:
usrinput = raw_input("Process which file?") # User input will be stored in usrinput as a string. (Python 2; in Python 3 use input().)
help(<object>) # brings up a help menu (docstring?) for <object>
e.g. help(os.makedirs)
It is often useful to debug a python script by doing:
python -i <script.py>
- this executes the script and then puts you in the python interpreter
- This is beneficial because now all variables have been initialized and you can play around!
Printing
print "".
"Hello, %s. You are %s." % (name, age) # called "%-formatting", not suggested by the docs!
"Hello, {1}. You are {0}.".format(age, name) # "str.formatting", more flexible!
- "1" is the 1st var, "0" is the 0th var in format(0,1)
print "{0:.1f}\t{1:15.4E}\t{2:15.4E}".format(
mZdList[k],
sigseleffList_4e[k],
sigseleffList_4mu[k],
)
- {<var>:<spaces>.<decimalplaces><type>}
The most common objects in Python:
Dictionaries
mydict = {}
- Mutable
Iterate over values:
for val in mydict.values():
Iterate over keys and values:
for key,val in mydict.items():
print "The key is:", key, "and the value is:", val
mydict.iterkeys() # returns an iterator over the keys (Python 2)
Lists
- Ordered and mutable!
mylist = [1,3,'hey'] # lists can hold different data types
mylist.append(4.6) # permanently appends value to mylist
Tuples
- Ordered and immutable!
Very similar to lists... except tuples are immutable!
They are processed faster than lists
for loops
for item1,item2 in zip( list1,list2 ): # will iterate through item1 at same time as item2
# do stuff
range
- creates an iterable object
- useful for "for loops"
xrange is faster and requires less memory, but has less versatility
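A quick sketch of the difference (Python 2):
for i in range(3):    # range builds the whole list [0, 1, 2] in memory
    print i
for i in xrange(3):   # xrange yields one value at a time (lazy)
    print i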
Functions:
Variable number of arguments:
def asManyAsYouWant(var, *argv): # pass in as many arguments into argv as you want
for arg in argv: # each one will be iterated over
print "do stuff"
Lambda functions
A way to write quick functions
square = lambda x: x**2
pythag = lambda x,y: np.sqrt(x**2 + y**2)
ls = lambda : os.listdir()
printstuff = lambda *args: sys.stdout.write(str(args) + "\n") # print is a statement in Python 2, so it can't go inside a lambda (needs import sys)
- can handle any number of arguments
Call these functions with:
- square(2), pythag(3,4), etc.
Probably safer to put the arguments of an if statement in parentheses:
if (not os.path.exists(<path_to_dir>)): print "this dir doesn't exist"
Passing arguments to script:
myscript.py arg1 arg2
sys.argv[0] # name of script (myscript.py)
sys.argv[1] # first argument passed to script (arg1)
sys.argv[2] # second argument passed to script (arg2)
Useful string methods:
<string>.lstrip() # returns a copy of <string> with leading whitespace removed (strings are immutable, so the original is untouched)
<string>.rstrip('/') # returns a copy with any trailing '/' removed
<string>.startswith('#') # returns True if the string starts with '#'
module = container for code, e.g. a .py file (which is called a submodule!)
package = modules that contain other modules, e.g. a directory with an __init__.py file
Classes:
import numpy as np

class Vector(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def length(self):
        return np.sqrt(self.x**2 + self.y**2 + self.z**2)
Now you can create objects:
myobj = Vector(9,4,2)
myobj.x # get x coord
myobj.length() # get length of myobj
Built-in methods:
__doc__
__init__
__module__
__dict__
dir(myobj) # show all the attributes of myobj
myobj.__dict__ # returns a dictionary of {attributes:values}
Save your objects in a pickle:
import pickle
mylist = [1,2,3]
fileobj = open(<file_to_write_to>,'wb')
pickle.dump(mylist, fileobj)
fileobj.close()
Easily restore the pickled object:
anotherfileobj = open(<file_with_pickled_obj>, 'rb')
mylist = pickle.load(anotherfileobj)
Packages:
glob
import glob
glob.glob("/raid/raid7/rosedj1/Higgs/*/*/Data.root") # stores matched files in a list object!
Remember that it's not regex! It's standard UNIX path expansion.
How to use wildcards:
* # matches 0 or more characters
? # matches 1 character in that position
[0-9] # matches any single digit
glob.glob("/home/file?.txt") # `?' will match a single character
sys
import sys
print sys.version # find out what version of python is running the script
sys.exit() # Immediately ends program. Useful for debugging.
os
import os
os.getcwd() # returns string of current working dir (equivalent to `pwd`)
os.system(<cmd>) # not recommended, since the output is not stored in a variable;
 only the exit status (0 = success) gets returned; use the subprocess module instead
os.path.join(<dir1>, <dir2>) # joins <dir1> and <dir2> into a single path, supplying '/' as needed
os.path.split(<path/to>/<file>) # returns a 2-tuple with (<path/to>, <file>) (good for finding the parent dir of <file>)
os.path.exists(<path>) # returns True if <path> exists
os.makedirs(<dirpath>) # make directory <dirpath> recursively (intermediate dirs are created as needed)
os.environ['USER'] # returns string of current user (same as doing `echo $USER` in bash)
subprocess
Python can run shell commands
import subprocess
subprocess.call( ['<cmd>', '<arg1>', ...] ) # runs <cmd> with its arguments and returns the exit code
var = subprocess.check_output(<cmd>) # allows you to store output of <cmd> in var
var = subprocess.check_output(['ls', '-a'])
ret_output = subprocess.check_output('date')
print ret_output.decode("utf-8")
- Thu Oct 5 16:31:41 IST 2017
Clean way:
import shlex, subprocess
command_line = "ls -a"
args = shlex.split(command_line)
p = subprocess.Popen(args)
Example:
import subprocess, shlex
def processCmd(cmd):
    args = shlex.split(cmd)
    sp = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = sp.communicate()
    return out, err
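Usage of the helper above, e.g.:
out, err = processCmd('ls -a')
print out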
argparse
# Note that it is difficult to use bools as input arguments!
# A quick hack is to pass in '0' and '1' instead (or use action='store_true'; see the sketch after this example). Python is forgiving. :-)
import argparse
def ParseOption():
    parser = argparse.ArgumentParser(description='submit all')
    parser.add_argument('--min', dest='min_relM2lErr', type=float, help='min for relMassZErr')
    parser.add_argument('--filename', dest='filename', type=str, help='')
    parser.add_argument('--zWidth', dest='Z_width', type=float, help='Z width in MC or pdg value')
    parser.add_argument('--plotBinInfo', dest='binInfo', nargs='+', help='', type=int)    #, required=True)
    parser.add_argument('--doubleCB_tail', dest='doubleCB_tail', nargs='+', help='', type=float)    #, required=True)
    parser.add_argument('--pTErrCorrections', dest='pTErrCorrections', nargs='+', help='', type=float)    #, required=True)
    args = parser.parse_args()
    return args
# Call the function.
args=ParseOption()
# Get values from args.
args.Z_width
massZErr_rel_min = args.min_relM2lErr
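Regarding the bool warning above: a cleaner alternative is action='store_true'. A minimal sketch (the --verbose flag is made up):
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--verbose', action='store_true', help='True if the flag is given, False otherwise')
args = parser.parse_args()
if args.verbose:
    print "verbose mode on"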
Matplotlib.pyplot
fig, ax = plt.subplots()
f = plt.figure(figsize=(12,3))
ax1 = f.add_subplot(121)
ax2 = f.add_subplot(122)
-----
f,(ax1,ax2) = plt.subplots(1,2) # 1 row, 2 col
plt.axis([<xvals>.min(), <xvals>.max(), <yvals>.min(), <yvals>.max()]) # control axis range (takes a single list [xmin, xmax, ymin, ymax])
Axes:
ax.set_ylim([<ymin>,<ymax>]) # sets y bounds from <ymin> to <ymax>
ax.tick_params(axis='y', labelsize=8) # adjust size of numbers on y-axis
plt.xscale('log') # make x-axis log scale
ax.set_xscale('log')
ax.set_xlabel(r'$<LaTeX!!!>$') # Use LaTeX commands
xlabel="$lep$: $p_{T}$ / GeV") # can separate LaTeX font from regular font
https://matplotlib.org/users/mathtext.html
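Putting the axis pieces together, a minimal sketch (all data made up):
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(1, 100, 50)
fig, ax = plt.subplots()
ax.plot(x, x**2)
ax.set_xscale('log')                  # log-scale x axis
ax.set_ylim([0, 11000])               # y bounds
ax.set_xlabel(r'$p_{T}$ / GeV')       # LaTeX in the label
ax.tick_params(axis='y', labelsize=8)
fig.savefig('demo.png')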
Font sizes:
plt.rc('font', size=BIGGER_SIZE) # controls default text sizes
plt.rc('axes', titlesize=BIGGER_SIZE) # fontsize of the axes title
plt.rc('axes', labelsize=BIGGER_SIZE) # fontsize of the x and y labels
plt.rc('xtick', labelsize=BIGGER_SIZE) # fontsize of the tick labels
plt.rc('ytick', labelsize=BIGGER_SIZE) # fontsize of the tick labels
plt.rc('legend', fontsize=MEDIUM_SIZE) # legend fontsize
plt.rc('figure', titlesize=BIGGER_SIZE) # fontsize of the figure title
Find indices of the minimum of arr:
np.unravel_index(np.argmin(<arr>, axis=None), <arr>.shape)
Importing Packages and Modules
If you get ImportError, then most likely the python interpreter doesn't know the path to your package
1. Do echo $PYTHONPATH to see which paths the python interpreter knows about
2. Do export PYTHONPATH=$PYTHONPATH:<path/to/package> # append <path/to/package> to $PYTHONPATH for this shell session (put it in ~/.bashrc to make it permanent)
3. Make sure that you have the file __init__.py in each dir of your package.
- can do: find <path/to/package> -type d -exec touch '{}/__init__.py' \;
More on this: https://askubuntu.com/questions/470982/how-to-add-a-python-module-to-syspath/471168
sys.path # list of python packages; python searches these file paths for packages to use
sys.path.append(<path/to/package>) # temporarily append <path/to/package> to PYTHONPATH
sys.path.insert(0, <path/to/package>) # temporarily insert <path/to/package> to PYTHONPATH as the 0th element in the sys.path list
Example of a module inside a package, using a relative import:
import os, ROOT, pickle
from .Utils.processCmds import processCmd

class Hadder(object):
    def haddSampleDir(self, dir_path):
        processCmd('sh ' + os.path.join(dir_path, "hadd.sh"))
    def makeHaddScript(self, dir_path, sampleNames, outputInfo):
        haddText = "hadd -f {0} ".format(dir_path + "/" + outputInfo.TFileName)
        basedir = os.path.dirname(dir_path) + "/"
        for sampleName in sampleNames:
            haddText += " {0}/*_{1}".format(basedir + sampleName, outputInfo.TFileName) + " "
        #haddText += "\n"
        outTextFile = open(dir_path + "/hadd.sh", "w")
        outTextFile.write(haddText)
        outTextFile.close()  # close so the script is flushed to disk
np.log10(x)
dir(numpy)
- gives big list of all available functions in numpy
lists:
x = [3, -1, 5.5, 0]
np.mean(x) # returns 1.875
map(np.exp, x)
- maps a function onto each element of a list
a = np.arange(1,10).reshape(3,3) # makes a 3x3 array from the values 1..9
a.size # total number of elements (here 9)
a.shape # dimensions as a tuple (here (3, 3))
np.linspace(start, stop, number_of_values)
np.arctan2(y, x)
- element-wise arc tangent of y/x, choosing the quadrant correctly
- handy when you define an array of 'r' values and one of 'theta' values
Meanings of underscores in variable names:
By the way, a double underscore is often called a 'dunder'!
_var # when you import using, say: 'from ROOT import *', then '_var' will not be imported
- single underscores are meant for variables for internal use only (within classes, e.g.); not enforced by interpreter
var_ # this one's easy: a trailing underscore is used simply to avoid naming conflicts (e.g., class_ = 'just a regular string')
__var # interpreter will intentionally name-mangle this var so that it doesn't get overwritten
__var__ # only used for special vars native to the Python language; don't define these yourself!
_ # used as a placeholder var in a function or something; a 'throw-away' variable
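A tiny demo of the name mangling mentioned above:
class Foo(object):
    def __init__(self):
        self.__secret = 42    # mangled by the interpreter to _Foo__secret

f = Foo()
# f.__secret would raise an AttributeError...
print f._Foo__secret          # ...but the mangled name still works: prints 42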
Reading from and writing to files:
Read lines from a file:
with open(<filename>) as f:
    content = f.readlines()
with open(savePath + saveName + ".txt", "w") as myfile:
    myfile.write('rootPath: ' + rootPath1 + '\n')
    myfile.write('rootfile: ' + rootfile1 + '\n')
    for i in range(len(vars1_x)):
        myfile.write('var_x: ' + vars1_x[i] + '\n')
        myfile.write('var_y: ' + vars1_y[i] + '\n')
        myfile.write('cut: ' + cuts1[i] + '\n')
# no need for myfile.close(): the 'with' block closes the file automatically
IPython
IPython is like a quick jupyter notebook for your terminal.
Extremely useful for its "magic" commands, tab completion,
and ability to go back and edit blocks of code.
? # Intro and overview of IPython
%quickref # quick reference
A command that starts with % is called a "line magic" and one that starts with %% is called a "cell magic"
- these are not native to python; they're understood by IPython/Jupyter for really cool effects!
%magic # bring up tutorial on magics
%lsmagic # bring up magic commands
%<magicname>? # get help on <magicname>
Cell Magic:
%%!
<commands> # begins a cell magic and then passes the cell to the shell
If you need to pass in arguments into a script using ipython:
ipython <script.py> -- --arg1 --arg2 # note the '--' between <script.py> and --arg1
------
Topics to research:
import multiprocessing
Bash/Linux
Linux Tutorial: https://ryanstutorials.net/linuxtutorial/commandline.php
Remember the philosophy of Unix: "small, sharp tools"
MUST KNOW Bash commands:
ls # list most contents in current directory
ls -a # list all contents (including hidden files) in current dir
ls -l # list contents in a long format (just more detailed way)
pwd
cd <path/to/files>
cd ..
cd - # go back to previous dir
cp
mkdir
Some Bash magic:
! # this is the 'bang' operator, an iconic part of bash
!! # execute the last command from history
sudo !! # run last command with sudo privileges
!cat # run the last command from your history that started with 'cat'
!cat:p # print the last 'cat' command to stdout without running it; also adds that command to your history
history # check your history; displays command numbers
!<n> # execute command number <n> from your history
!$ # means the argument of the last command
cd !$ # cd's into the last command's argument; e.g.
mkdir /new/awesome/folder/
cd !$ # would cd you into /new/awesome/folder/
^ls^rm # takes the last command that used 'ls', replaces 'ls' with 'rm', and executes the new command
<space>command # a command preceded by a space will not be added to history!
bash brace expansion
cp /etc/rc.conf{,-old} # will make copy of 'rc.conf' called 'rc.conf-old'
mkdir newdir{1,2,3} # will make newdir1, newdir2, newdir3
- it's as if the filepath "gets distributed" over the braces
- this is a good way to mv files and make backups
Difference between 'source' and 'export':
source <script.sh> # effectively the same as: . <script.sh>; executes the script in the current shell
export VAR=value # saves value as a new environmental VAR available to child processes
computer cluster:
folders aren’t contained on just one computer,
but network mounts can make it look like they are
If a file path begins with /
- this is an absolute path ("root")
- relative paths use: ..
server uses a "load balancer"
- puts each user on a variety of nodes to balance the load of resource usage
man -k <keyword>
- shows you the man pages matching <keyword> (same as apropos)
> is the redirection operator
| is the pipe operator
e.g.,
ls -l | grep Apr > somefile.txt
- piping the output of ls into the grep command
Common Commands:
scp <src> <dest> # secure copy between machines
- the remote <src> or <dest> should be of the form: user@server:/path/to/file
scp -r <dir> <dest> # recursively copy a whole directory
history # shows you all previous commands you’ve entered
more <file> # prints the entire file to stdout, one page at a time
less <file> # prints to a temporary screen, not to stdout
wc # word count, useful flags: -l -w
sort [-n] [-r] <file> # sorts alphabetically by default; sort by number size with -n; -r reverses the order
uniq # returns unique values
diff <file1> <file2> # see the differences between <file1> and <file2>;
'<' indicates a line from <file1>; '>' indicates a line from <file2>
diff -r <dir1> <dir2> # compares differences between all files in <dir1> and <dir2>
top -n 1 -b | grep chenguan # see system summary and running processes; -n flag is iterations; -b is batch mode
- can grep for a username to see that user's processes
free [-g] # displays the amount of free and used memory in the system; -g to make it more readable
read [-s] [-p <prompt>] <var> # stores user input into <var>; -s hides the typed text, -p prints <prompt> first
cut -d' ' -f2-4 # use whitespace as delimiter, and cut (print to screen) only fields (columns) 2 through 4
cat <file> | tee [-a] <outfile> # tee writes the stdout to both <outfile> and the screen (it pipelines the info into a 'T' shape); -a appends
ln -s <target> <linkname> # creates a symbolic link (a reference) from <linkname> to <target>
- If you modify <linkname> then you WILL MODIFY <target>!
- Except for 'rm': deleting <linkname> does NOT delete <target>
file <file> # report what kind of file <file> is
printf # appears to just be a fancier and more reliable echo
Less common, but possibly helpful commands:
uname -a # look at your Linux kernel architecture, server, etc.
uname -n # find out what node you're on
env # print all your environmental variables to stdout
gdb # GNU DeBugger (not sure how this works yet)
basename <path> [suffix] # strips off the directory part of the name and, optionally, a suffix
- basename /usr/bin/sort # returns: 'sort'
- basename include/stdio.h .h # returns: stdio
date # prints the date
Less important but still really cool commands!
say [-v <voice>] "<text>" # text-to-speech (macOS)
write <user> # start a chat with <user> on your server
- You are immediately put into "write mode". Now you can send messages back and forth.
- Press 'Ctrl+C' or 'Esc' to exit write mode.
mesg [y|n] # allow [y] people to send you messages using 'write' or not [n]
Command line language translator:
https://www.ostechnix.com/use-google-translate-commandline-linux/
Control Statements
if [ ! -x <path/to/file> ]; then
    <commands>
fi
while true; do
    <commands>
done
Many different flags:
! EXPRESSION The EXPRESSION is false.
-n STRING The length of STRING is greater than zero.
-z STRING The length of STRING is zero (ie it is empty).
STRING1 = STRING2 STRING1 is equal to STRING2
STRING1 != STRING2 STRING1 is not equal to STRING2
INTEGER1 -eq INTEGER2 INTEGER1 is numerically equal to INTEGER2
INTEGER1 -gt INTEGER2 INTEGER1 is numerically greater than INTEGER2
INTEGER1 -lt INTEGER2 INTEGER1 is numerically less than INTEGER2
-d FILE FILE exists and is a directory.
-e FILE FILE exists.
-r FILE FILE exists and the read permission is granted.
-s FILE FILE exists and its size is greater than zero (i.e. it is not empty).
-w FILE FILE exists and the write permission is granted.
-x FILE FILE exists and the execute permission is granted.
Defining Functions:
function <name> {
<command>;
<command>; ...
}
Can be one liners:
function cdl { cd $1; ls; }
cdl mydir # cd into mydir and then ls
________________
grep (global regular expression print)
grep "<pattern>" <file> # print the lines of <file> that match <pattern>
grep -E -r "<pattern>" ./* # search the contents of every file for <pattern>, recursively starting from ./*
rsync
rsync -av <src> <dest> # sync files/dirs; -a archive mode, -v verbose
ps aux | grep <name> # see active processes matching <name>
GNU screen! (a terminal multiplexer)
Start a persistent remote terminal. That way, you won't lose your work if you get disconnected!
Once inside a new screen, you should do: source ~/.bash_profile to get your normal settings.
Start up a screen
screen -S <screen_name> # start a bash environment session ("screen")
ctrl+a, then Esc # enters "copy/scrollback mode", which lets you scroll!
- navigate copy mode using Vim commands!
- Hit `Enter` to highlight text. Hit `Enter` again to copy it. Paste with Ctrl+a, then `]`
- Hit `q` or `Esc` to exit copy mode.
ctrl+a, then d # detach from the session (remember though that it's still active!)
screen -ls # see what sessions are active
screen -r <name> # reattach to active session (instead of <name> can also use: <screen_pid>)
exit # terminate session
kill <screen_pid> # kill frozen screen session
ctrl+a then S (capital) # split screens horizontally
ctrl+a then | # split screens vertically (lowercase 's' instead freezes the terminal!)
ctrl+a then Tab # switch between split screen regions
ctrl+a then c # begins a virtual window in a blank screen session
ctrl+a then " # see list of all active windows inside session
________________
wget # download whatever url from the web
wget -r --no-parent -A.pdf http://tier2.ihepa.ufl.edu/~rosedj1/DarkZ/MG5vsJHUGen_bestkinematics_GENlevel_WITHfidcuts/
- Downloads recursively, without looking at parent directories, and globbing all .pdf
tar -cf <archive>.tar foo bar # create <archive>.tar containing files foo and bar
tar -xf <archive>.tar # extract all of <archive>.tar
tar -xvf <archive>.tar <file1> ... # untar specific files from the tarball
- x=extract, v=verbose, f=file,
Memory usage
du # "disk usage"; good for find which files or dirs are taking up the most space
du -h <dir> # print size of <dir> in human-readable format
du -sh ./ # sums up the total of current workspace and all subdirs
df -h # "disk filesystem", shows usage of memory on entire filesystem
find
find ./ -name "*plots*" # find all files with name plots in this dir and subsequent dir
find /<path> -mtime +180 -size +1G # find files with mod times >180 days and size>1GB
find . -type d -exec touch '{}/__init__.py' \; # create (touch) a __init__.py file in every dir and subsequent dir
find . -type f -printf '%s\t%p\n' | sort -nr | head -n 30 # find the 30 biggest files in your working area, sorted
find . -name "*.css" -exec sed -i -r 's/MASS/mass/g' {} \; # use sed on every found file (the {} indicates a found file)
find ~/src/ -newer main.css # find files newer than main.css
locate
locate -i <file_to_be_found> # searches computer's database. -i flag means case insensitive
Copy multiple files from remote server to local:
scp <user>@<server>:/path/to/files/\{file1,file2,file3\} .
***Learn more about these commands:
rcp
set
set -e # exit the script on the first error
set -u # treat unset variables as an error
Use a specific interpreter to execute a file:
#!/usr/bin/env python
Environment variables can be modified using 'export':
export VARIABLE=value
export PYTHONPATH=${PYTHONPATH}:</path/to/modules> # appending ':</path/to/modules>' to PYTHONPATH env var
Interesting env vars:
SHELL # hopefully bash
HOSTNAME # host
SCRAM_ARCH # cmssw architecture
USER # You!
PWD # current working dir
PS1 # bash prompt
LS_COLORS # colors you see when you do 'ls'
MAIL
EDITOR
Customize your prompt (you can even add a command to be executed INSIDE the prompt):
PS1="[\d \t] `uptime` \u@\h\n\w\$ "
Prompt settings:
* A bell character: \a
* The date, in “Weekday Month Date” format (e.g., “Tue May 26”): \d
* The format is passed to strftime(3) and the result is inserted into the prompt string; an empty format results in a locale-specific time representation. The braces are required: \D{format}
* An escape character: \e
* The hostname, up to the first ‘.’: \h
* The hostname: \H
* The number of jobs currently managed by the shell: \j
* The basename of the shell’s terminal device name: \l
* A newline: \n
* A carriage return: \r
* The name of the shell, the basename of $0 (the portion following the final slash): \s
* The time, in 24-hour HH:MM:SS format: \t
* The time, in 12-hour HH:MM:SS format: \T
* The time, in 12-hour am/pm format: \@
* The time, in 24-hour HH:MM format: \A
* The username of the current user: \u
* The version of Bash (e.g., 2.00): \v
* The release of Bash, version + patchlevel (e.g., 2.00.0): \V
* The current working directory, with $HOME abbreviated with a tilde (uses the $PROMPT_DIRTRIM variable): \w
* The basename of $PWD, with $HOME abbreviated with a tilde: \W
* The history number of this command: \!
* The command number of this command: \#
* If the effective uid is 0, #, otherwise $: \$
* The character whose ASCII code is the octal value nnn: \nnn
* A backslash: \\
* Begin a sequence of non-printing characters. This could be used to embed a terminal control sequence into the prompt: \[
* End a sequence of non-printing characters: \]
FINISH GETTING THE REST OF THESE PROMPT SETTINGS!
e.g. colors
open plots while ssh'ed:
display <image> # open an image over X11
eog <image.png> # quickly open png files
sleep 7 # make the shell sleep for 7 seconds
Sexy Bash Tricks:
quickly rename a bunch of files in a dir:
for file in *.pdf; do mv "$file" "${file/.pdf/_standardsel.pdf}"; done # is this bash's native renaming?
Make a bunch of dir's quickly:
mkdir newdir{1..20} # make newdir1, newdir2, ..., newdir20
iterate over floats:
for k in $(seq 0 0.2 1); do echo "$k"; done # seq <start> <step> <stop>
- seq has all kinds of flags for formatting!
Check if a dir exists. If it doesn't, then make it:
[ -d possibledir ] || mkdir possibledir
- The LHS checks if the directory is there. If it is, the test succeeds (returns 0) and the '||' short-circuits; otherwise mkdir runs.
Terminal Shortcuts:
Ctrl-A # quickly go to BEGINNING of line in terminal
Ctrl-E # quickly go to END of line in terminal
Ctrl-W # delete whole WORD behind cursor
Ctrl-U # delete whole LINE BEHIND cursor
Ctrl-K # delete whole LINE AFTER cursor
Ctrl-R, then <text> # reverse-search your command history for <text>
Option-Left # move quickly to the next word to the left
Cmd-Right # switch between terminal WINDOWS
Cmd-Shift-Right # switch between terminal TABS within window
alias # check your aliases
alias ="" # add to
time ./<script.sh> # time how long a script takes to run
- this will send three times to stdout: real, user, sys (real = actual run time)
Background Jobs:
<command> & # runs <command> in a background subshell
fg # bring a background process to foreground
jobs # see list of all background processes
Ctrl+Z # pause current job and return to shell
Ctrl+S # freeze terminal output (the job keeps running), but DON'T return to shell
Ctrl+Q # un-freeze the terminal output again
bg # resume the current (stopped) job in the background
(sleep 3 && echo 'I just woke up') >/tmp/output.txt & # group commands and redirect stdout!
- here the '&&' means to do the second command ONLY IF the first command was successful
Learn more about nohup:
nohup ./gridpack_generation_patched06032014.sh tt 1nd > tt.log &
.bash_profile is executed for login shells, while .bashrc is executed for interactive non-login shells.
Execute shell script in current shell, instead of forking into a subshell:
. ./<script.sh> # note: dot space dot forward-slash
- N.B. this is nearly the same as doing: source <script.sh>
watch -n 10 '<command>' # repeats <command> every 10 seconds
- default is 2 seconds
##########################
sed # stream editor
echo "1e-2" | sed "s#^+*[^e]#&.000000#;s#.*e-#&0#" # makes 1e-2 become 1.000000e-02
sed "s#^[0-9]*[^e]#&.000000#;s#.*e-#&0#" # equivalently
sed, in place, on a Mac:
sed -i '' -e "s|STORAGESITE|${storageSiteGEN}|g" DELETEFILE.txt
Strip python/bash comments from a file:
sed -i -e 's/#.*$//g' -e '/^$/d' <file> # each '-e' adds another sed expression, like piping; '-i' is "in place", so it modifies <file> directly
________________
Check to see if some command succeeded:
(N.B. a command returns 0 if it succeeds!)
some_command
if [ $? -eq 0 ]; then
echo OK
else
echo FAIL
fi
awk
An extremely powerful file-processing language
General format:
awk 'BEGIN{begincmds} {cmd_applied_to_each_line} END{endcmds}'
Sum up the second column in a file, and specifying the delimiter of columns as a comma:
awk -F',' '{sum+=$2} END{print sum}' bigfiles.txt
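For comparison, the same comma-delimited column sum as a minimal Python sketch (file name taken from the awk example above):
total = 0.0
with open('bigfiles.txt') as f:
    for line in f:
        total += float(line.split(',')[1])  # second column (index 1), comma-delimited
print total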
Study this code below and see if syntax is useful:
# if rnum allows, multiply by 10 to avoid multiple runs
# with the same seed across the workflow
run_random_start=$(($rnum*10))
# otherwise don't change the seed and increase number of events as 10000 if n_evt<50000 or n_evt/9 otherwise
if [ $run_random_start -gt "89999990" ]; then
run_random_start=$rnum
max_events_per_iteration=$(( $nevt > 10000*9 ? ($nevt / 9) + ($nevt % 9 > 0) : 10000 ))
fi
You can "divide out" strings:
MG="MG5_aMC_v2.6.0.tar.gz"
MG_EXT=".tar.gz"
echo ${MG%$MG_EXT}
- Prints: MG5_aMC_v2.6.0 # so effectively MG has been "divided by" MG_EXT
Need to learn about:
tr "." "_" # translates all "." chars into "_" (used with piping)
perl -ne 'print if /pattern1/ xor /pattern2/'
What does this do?
model=HAHM_variablesw_v3_UFO.tar.gz
if [[ $model = [!\ ] ]]; then...
Bash Scripting
$# # number of arguments passed to script
$@ # the arguments themselves which were passed to script
$? # return statement of last command: 0 is successful
LaTeX
% # comments
A Table of Contents (ToC) is very easily built by LaTeX!
Every document must have:
\documentclass[12pt]{extarticle}
\begin{document}
\end{document}
Referencing
One section in a ToC may be in a file: sec-015-model.tex
Inside this file might be:
\section{Dark photon model} # title of whole section
\label{sec:model} # internal reference name
- when other pieces of code, like the ToC, need to reference this section
(using something like: \ref{sec:model}), they need to know the label of the section!
C++
Take in user input:
TString usrinput;
std::cin >> usrinput; // User input will be stored in usrinput var
To determine size of a C++ array:
int myarr[] = {4, 6, 8, 9};
sizeof(myarr)/sizeof(*myarr)
VECTORS ARE BETTER THAN ARRAYS! # more flexibility!
Vectors:
Similar to arrays, but dynamically sized
#include <vector>
vector<int> vecName; // or any other element type
vecName.push_back(value); // append value at the end of vecName
vecName[index]; // index vecName, just like arrays
vecName; // typing the name in the ROOT interpreter prints all entries in vecName
vecName.size(); // number of elements in vecName
make a pointer to the vector: (PROBABLY UNNECESSARY)
vector<int> *vecPtr = &vecName; // initialize pointer to point to the address of vecName
(*vecPtr)[index]; // dereference the pointer to use the vector
Vim
Why use Vim?
Vim (Vi IMproved) is a powerful text editor that is found on most Unix systems.
It is a light-weight program that lets you edit files with lightning-fast speed and is highly customizable.
Vim commands are frequently used in other commands, like: less, man, screen, info, etc.
Did you know?
Type vimtutor in your shell to go to a helpful Vim tutorial!
Essentials in Command Mode:
h,j,k,l # left, down, up, right.
i # Enter Insert Mode.
Esc # Go back to Command Mode.
u # Undo.
Ctrl+r # Redo.
:w # Write (save) your file.
:q # Quit Vim.
yy # Copy entire line.
dd # Delete entire line.
p # Paste what was recently copied or deleted.
w # Jump forward 1 word.
b # Jump backward 1 word.
0 # Bring cursor to start of line.
$ # Bring cursor to end of line.
gg # Go to top of file.
G # Go to bottom of file.
Ctrl+f # Jump forward one page.
Ctrl+b # Jump backward one page.
o (O) # Enter a new line below (above) cursor.
. # Repeat last action.
/hello # Search for the string 'hello'.
n (N) # Search for the next (previous) matched string.
:19 # Jump to line 19.
:set nu # Add line numbers.
:set nonu # Remove line numbers.
:set paste # Prevents the horrendous indentation that sometimes happens when pasting.
:set nopaste # Go back to a world of misery.
Some cool tricks:
c4w # Delete next 4 words. Enter Insert Mode.
9o # Enter 9 newlines below cursor.
ggdG # Go to top of page, then delete contents of file.
di) # Deletes the text inside a pair of ().
ca" # Deletes the pair of "" and all text inside. Go straight into Insert Mode.
qk<cmds>q # Begin recording a macro of <cmds> into register 'k'. Last q ends the recording.
@k # Execute macro stored in register 'k'.
5@k # Execute macro stored in register 'k' 5 times!
mj # Set a mark (checkpoint) in register 'j'.
`j # Bring cursor to the mark stored in register 'j'.
:9,18fo # Fold (collapse) lines 9-18.
za # Fold/unfold lines.
Shift+r # Enter Replace mode: replaces text as you type.
Ctrl+o # Return to previous position of cursor.
Shift+d # Delete all text on line after cursor.
Shift+c # Delete all text on line after cursor. Enter Insert Mode.
Shift+i # Bring cursor to start of line. Enter Insert Mode.
Shift+a # Bring cursor to end of line. Enter Insert Mode.
Ctrl+v # Enter Visual Block Mode.
Shift+i # While in Visual Block Mode, go into Insert Mode. Changes affect all highlighted lines.
:%s/word/neat/g # Substitute every instance of "word" with "neat", globally (in whole file).
https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookConfigFileIntro
***If you want to use CMSSW, you must be in an environment that can reach the CMSSW libraries.
Example servers:
- UF (ihepa)
- CERN (lxplus)
- HiPerGator (hpg)
- Fermilab (fnal)
Get 'cmsenv' and 'cmsrel'
export VO_CMS_SW_DIR=/cvmfs/cms.cern.ch
source $VO_CMS_SW_DIR/cmsset_default.sh
cmsrel CMSSW_X_Y_Z
- install the CMSSW environment in a new dir, version X_Y_Z, like e.g. 9_4_2
- 8_0_X = 2016 data
- 9_4_X = 2017 data
- 10_2_X = 2018 data
- once inside, be sure to do: cmsenv to "load" the environment variables (sets up your runtime environment)!
- You will have to use different CMSSW versions for different years' data!
- By the way "cmsrel" stands for "CMSSW Release"!
See what versions of CMSSW are available:
scram list -a
scram list -a | egrep "CMSSW_9_4_X" > cmssw.txt
See what scram architecture you are running:
scram arch
- or -
echo $SCRAM_ARCH
Set scram arch to something different:
export SCRAM_ARCH=slc3_ia32_gcc
export SCRAM_ARCH=slc6_amd64_gcc491
Configuration Files, cmsDriver, and cmsRun:
https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCmsDriver
Two kinds of config files:
1. CRAB config files: options for submitting CRAB jobs
- You need crab_config files to tell CRAB how to deal with the jobs you want processed.
2. Parameter Set Config Files: sets all the parameters for generating MC events
- cmsDriver is the main tool to create these param_config files
- View help options: cmsDriver.py --help
Example to generate a param_set_config file called "CMSDAS_MC_generation_cfg.py":
cmsDriver.py MinBias_13TeV_pythia8_TuneCUETP8M1_cfi --conditions auto:run2_mc -n 10 --era Run2_2016 --eventcontent FEVTDEBUG --relval 100000,300 -s GEN,SIM --datatier GEN-SIM --beamspot Realistic50ns13TeVCollision --fileout file:step1.root --no_exec --python_filename CMSDAS_MC_generation_cfg.py
Use cmsRun to load modules stored in a configuration file:
cmsRun CMSDAS_MC_generation_cfg.py
You can just make sure everything properly compiles by doing:
python CMSDAS_MC_generation_cfg.py
- if it returns no errors, you should be good to go!
- Do this before submitting CRAB jobs
It's a good idea to check for errors in your "python generator fragment", like:
python -i externalLHEProducer_and_PYTHIA8_Hadronizer_cff.py
A config file allows you to set all the parameters you want for a job.
- They usually start with this line:
import FWCore.ParameterSet.Config as cms # imports our CMS-specific Python classes and functions
- And have these as the guts:
- A source (which might read Events from a file or create new empty events)
- A collection of modules (e.g. EDAnalyzer, EDProducer, EDFilter) which you wish to run
- An output module to create a ROOT file which stores all the event data
- A path which will list in order the modules to be run
A configuration file written using the Python language can be created as:
- a top level file, which is a full process definition (naming convention is _cfg.py ) which might import other configuration files
- external Python file fragment, which are of two types:
- those used for module initialization (naming convention is _cfi.py) # configuration fragment include
- those used as configuration fragment (naming convention is _cff.py) # configuration fragment file?
process.load() # Import fragment to top level, also attaches imported objects
Standard fragments are available in the CMSSW release's Configuration/StandardSequences/python/ area. They can be read in using syntax like
process.load("Configuration.StandardSequences.Geometry_cff")
The word "module" has two meanings. A Python module is a file containing Python code and the word also refers to the object created by importing a Python file. In the other meaning, EDProducers, EDFilters, EDAnalyzers, and OutputModules are called modules.
Standard Steps for full simulation and real data
Building blocks of the created configurations are the standard processing steps:
* GEN : the generator plus the creation of GenParticles and GenJets
* SIM : Geant4 simulation of the detector (energy deposits in the detector volumes)
* DIGI : simulation of detector signal response to the energy deposits
* L1: simulation of the L1 trigger
* DIGI2RAW : data format conversion of the digi signals into the RAW format that will be provided in the online system
* HLT : high level trigger
Usually all the above steps are executed in one single job. Remaining building blocks are:
* RAW2DIGI : data format conversion of the RAW format into digi signals
* RECO : full event reconstruction
* ALCA : production of alignment and calibration streams
* DQM : code run for DQM
* VALIDATION : code run for validation
The above list is usually referred to as 'step2'.
Use PhEDEx to transfer datasets between storage areas:
https://cmsweb.cern.ch/phedex/prod/Request::Create?type=xfer#
An EDAnalyzer == a .cc file, steered by a python config file.
git cms-merge-topic <user>:<branch>
- I think this merges the branch and its contents into the local directory?
e.g. TFile *f = TFile::Open("root://cmsxrootd.fnal.gov///store/mc/SAM/GenericTTbar/GE.root");
If you wish to check if your desired file is actually available through AAA, execute the command: `xrdfs cms-xrd-global.cern.ch locate /store/path/to/file' (xrd = xrootd, fs = filesystem)
In a MC cfg.py file, use: fileNames = cms.untracked.vstring('root://cmsxrootd.fnal.gov//store/myfile.root')
Change password: in the command line, do: yppasswd
CRAB Utility
A utility to submit CMSSW jobs to distributed computing resources: https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCrab
CRAB Tutorial: https://twiki.cern.ch/twiki/bin/view/CMSPublic/CRAB3AdvancedTutorial
CRAB FAQ: https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideCrabFaq
CRAB job errors: https://twiki.cern.ch/twiki/bin/view/CMSPublic/JobExitCodes
CRAB Commands: https://twiki.cern.ch/twiki/bin/view/CMSPublic/CRAB3Commands
There is ONE Tier0 site: CERN.
Seven T1 sites: USA, France, Spain, UK, Taiwan, Germany, Italy.
~55 T2 sites.
You must specify config.Site.storageSite, which will depend on which center is hosting your area, and user_remote_dir, which is the subdirectory of /store/user/ you want to write to.
* Caltech storage_element = T2_US_Caltech
* Florida storage_element = T2_US_Florida
* MIT storage_element = T2_US_MIT
* Nebraska storage_element = T2_US_Nebraska
* Purdue storage_element = T2_US_Purdue
* UCSD storage_element = T2_US_UCSD
* Wisconsin storage_element = T2_US_Wisconsin
* FNAL storage_element = T3_US_FNALLPC
Check out all the tiers here: https://cmsweb.cern.ch/sitedb/prod/sites
You need a CRAB config file in order to run an MC event generation code.
- The cmsDriver.py tool helps to generate config files
- examples of crab_cfg.py files: crab_GEN-SIM.py, crab_PUMix.py, crab_AODSIM.py, crab_MINIAODSIM.py
A typical CRAB config file starts like:
====================================
from WMCore.Configuration import Configuration
config = Configuration()
The /store/user/ area at LPC is commonly used for the output storage from CRAB jobs
How to make CRAB commands available (must be in a CMSSW environment):
cmsenv
source /cvmfs/cms.cern.ch/crab3/crab.sh # .csh for c-shells
To check that it worked successfully, do:
which crab
> /cvmfs/cms.cern.ch/crab3/slc6_amd64_gcc493/cms/crabclient/3.3.1707.patch1/bin/crab
or: crab --version
> CRAB client v3.3.1707.patch1
crab checkusername
> Retrieving username from SiteDB...
> Username is: drosenzw
Can also test your EOS area grid certificate link:
crab checkwrite --site=T3_US_FNALLPC # checks to see if you have write permission at FNAL
crab checkwrite --site=T2_US_Nebraska
crab checkwrite --site=T2_US_Florida
***N.B. It is better to use: low num jobs, high num events/job!***
First you can run a job locally, to make sure all is well: cmsRun <step1_cfg.py>
Resubmitting a CRAB job:
crab resubmit --siteblacklist='T2_US_Purdue' <crab_dir> # don't submit to Purdue
- N.B. only failed jobs get resubmitted
- There are lots of flags to call to change things like memory usage, priority, sitewhitelist, etc.
- --sitewhitelist=T2_US_Florida,T2_US_MIT
- Can also use wildcards: --siteblacklist=T1_*
Resubmitting SPECIFIC CRAB jobs:
crab resubmit --force --jobids=1,5-10,15 <crabdir1/crabdir2> # N.B. you must --force successful jobs to resubmit
Check number of events from a CRAB job:
crab report <crab_dir>
Check status:
crab status <crab_dir>
Kill a job:
crab kill -d <crab_DIR/crab_job>
For help with MC generation (step1): https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookCRAB3Tutorial#2_CRAB_configuration_file_to_run
Useful commands (LSF):
bjobs [-l] # check job status
bpeek <jobid> # check stdout so far
bkill <jobid> # kill a job
OPTIONS: -c <[hh:]mm> # sets CPU time limit; supposedly the default is no limit, but I don't trust that
CONDOR
CASTOR is a big storage space for lxplus.
3 MAIN FILES:
bash script # a wrapper, e.g. "condor.sh"; calls the code that you want to run and sets some parameters
submit script # typical cluster parameters: memory, universe, queue, group; e.g. condor.sub
DAG script # contains the jobs, children, parents needed for condor
- this dag script is produced from a perl/bash command
- then condor reads this DAG script
Run condor:
condor_submit <submitfile.sub>
condor_submit_dag <file.dag> # this submits the dag file to condor (i.e. submits your jobs!)
log/ # contains, among others, output.log, which has the stdout from the code you wanted condor to process!
Kill ALL jobs under your username: condor_rm <username>
condor_q <username> # check your jobs in the queue
New commands for CASTOR!
ls ==> nsls
mkdir ==> nsmkdir
cp ==> rfcp
rm ==> rfrm
chmod ==> rfchmod
LXPLUS
Your user area: /afs/cern.ch/user/ (10 GB storage)
Your work area: /afs/cern.ch/work/ (100 GB storage, also allows you to share files with others!)
EOS Storage
Must be logged into the LPC machines (Fermilab) or on lxplus
https://uscms.org/uscms_at_work/computing/LPC/usingEOSAtLPC.shtml
Big storage area for big files
Past Jake says: DON'T store big files in EOS
They are easier to access from Tier2 on HiPerGator through UF: /cms/data/store/user/drosenzw/
- use uberftp or gfal-copy to access them
if on IHEPA, can store big files in /raid/raid{5,6,7,8,9}
Only lxplus accounts can access EOS storage!
See if you have an eos area on an LPC machine:
eosls -d /store/user/drosenzw/
Tier2 Storage is better:
/cms/data/store/user/drosenzw/ # HiPerGator at UF. ONLY WRITABLE BY CRAB. Output of CRAB stored here.
/cms/data/store/user/t2/users/rosedj1/ # HPG at UF. Put NTuples here.
There are different eos storage areas:
/eos/uscms/store/user/drosenzw/ # My allocated EOS area. LPC's Tier3 eos storage (also: /store/user/drosenzw/ ). Use: eosls
/eos/cms/ # lxplus
/eos/user/d/drosenzw/ # easily accessible from lxplus. SWAN also uses this
/uscms_data/d1/drosenzw/ # normal LPC area
/eos/uscms_data/d1/drosenzw/ # What even is this?
MAIN COMMANDS:
On lxplus, do:
ls /eos/cms/
ls -l /eos/user/d/drosenzw/
mkdir /eos/user/d/drosenzw/
eos ls -l /eos/user/d/drosenzw/ # different kind of listing?
eos mkdir /eos/user/d/drosenzw/ # different kind of mkdir?
xrdcp root://eosuser.cern.ch//eos/user/d/drosenzw/ # copy files
- SWAN is also connected to /eos/user/d/drosenzw/SWAN_projects
Set up your environment:
export EOS_MGM_URL=root://eoscms.cern.ch
File Names:
MGM: root://cmseos.fnal.gov/
LFN (Logical File Name, the shortcut name): /store/user/drosenzw/
- the LFN is an alias which can be used at ANY site (it uses a short path like /store/user/...)
- the PFN (Physical File Name) is the actual file path
eosquota
returns the amount of storage space used/available in personal EOS area
`eosgrpquota lpctau'
checks the storage space for the group "lpctau"
LISTING
`eosls /LFN'
lists files (NEVER USE `ls'!)
`eosls -d /store/user/drosenzw/'
lists directory entries
-l option for long listing
-a option for listing hidden entries
DON'T USE WILDCARDS OR TAB-COMPLETION!
DON'T USE TRADITIONAL COMMANDS! ls, rm, cd, etc.
COPYING
`xrdcp <localFile.txt> root://cmseos.fnal.gov//store/user/drosenzw/newNameOfFile.txt' (Local file to EOS)
`xrdcp root://cmseos.fnal.gov//store/user/drosenzw/whateverFile.txt ~/newName.txt' (EOS to local file)
`xrdcp root://cmseos.fnal.gov//store/user/drosenzw/whateverFile.txt root://cmseos.fnal.gov//store/user/drosenzw/newFile.txt' (EOS to EOS)
-f option can overwrite existing files
-s option for silent copy
MAKE DIR
`eosmkdir /store/user/drosenzw/newDir'
-p option will make parent directories as needed
`eosmkdir -p /store/user/drosenzw/newDir1/newDir2/newDir3'
REMOVING
`eosrm /store/user/drosenzw/EOSfile.txt' - removes files
`eosrm -r /store/user/drosenzw/dir1' - removes directory and all contents
if you get scram b errors, first run:
cmsrel CMSSW_X_Y_Z
MUST set up environment in working directory (YOURWORKINGAREA):
cd ~/nobackup/YOURWORKINGAREA/CMSSW_9_3_2/src
cmsenv
For condor batch jobs:
xrdcp outputfile.root root://cmseos.fnal.gov//store/user/username/outputfile.root
or
xrdfs root://cmseos.fnal.gov ls /store/user/username
Attaching files:
root -l root://cmsxrootd.fnal.gov//store/user/jjesus/rootFile.root
or
TFile *theFile = TFile::Open("root://cmsxrootd.fnal.gov//store/user/jjesus/rootFile.root");
On IHEPA, prepend root://cmsio5.rc.ufl.edu/ to the /store/user/ path.
Use TFile::Open() instead of TFile(path,"READ") (only Open understands remote XRootD URLs).
LPC / Fermilab / CMSDAS
LPC Contact for CMS DAS problems
cmsdasatlpc@fnal.gov
USCMS T1 Facility Support Team
uscms-t1@fnal.gov
Fireworks Problems:
fireworks-support@cern.ch
Mattermost Problems:
service-desk@cern.ch
Subscribe to hypernews (I may already be subscribed):
https://hypernews.cern.ch/HyperNews/CMS/login.pl?&url=%2fHyperNews%2fCMS%2fcindex
For CRAB Issues: CMSDASATLPC@fnal.gov
Get Kerberos ticket: kinit <username>@FNAL.GOV
to check: klist
Log onto cmslpc-sl6 cluster:
ssh -Y drosenzw@cmslpc-sl6.fnal.gov
ssh -Y drosenzw@cmslpcN.fnal.gov, where N is whatever node you want to join
Initialize your proxy:
voms-proxy-init -voms cms --valid 168:00 (makes the proxy valid for a week instead of a day!)
source /cvmfs/cms.cern.ch/cmsset_default.sh # or put this in .bash_profile
Storage Areas:
/uscms/homes/d/drosenzw # 2 GB storage area
/nobackup/ # larger mass storage area
For DAS, each time I log into the sl6 cluster, I need to:
cd ~/nobackup/YOURWORKINGAREA/CMSSW_10_2_0/src
cmsenv
Switch default shell from tcsh to bash:
To permanently change your default login shell, use the LPC Service Portal, login with your Fermilab Services username and password. Choose the " Modify default shell on CMS LPC nodes" ticket and fill it out.
If you want to get the nice command line after a switch to bash, put source /etc/bashrc in your cmslpc ~/.bash_profile file
Fireworks doesn't work locally. Located in: /Users/Jake/Desktop/cmsShow-9.2-HighSierra
For help, contact Basil Schneider: basil.schneider@cern.ch
I may still have issues pushing to GitHub.
Keep getting this error in ROOT plots:
AutoLibraryloader::enable() and AutoLibraryLoader.h are deprecated.
Use FWLiteEnabler::enable() and FWLiteEnabler.h instead
Info in <TCanvas::MakeDefCanvas>: created default TCanvas with name c1
Go through CRAB3 tutorial in THE WORKBOOK when finished with pre-exercises
FWLite (found in PhysicsTools):
Frame Work Lite is an interactive analysis tool integrated with the CMSSW EDM (Event Data Model) Framework. It allows you to automatically load the shared libraries defining CMSSW data formats and the tools provided, to easily access parts of the event in the EDM format within ROOT interactive sessions. It reads produced ROOT files and has full access to the class methods, and there is no need to write full-blown framework modules. Thus, with the FWLite distribution locally on the desktop, one can do CMS analysis outside the full CMSSW framework.
Example command:
FWLiteHistograms inputFiles=slimMiniAOD_MC_MuEle.root outputFile=ZPeak_MC.root maxEvents=-1 outputEvery=100
Fireworks: turns EDM collections into visual representations… i.e., turns .root files into event displays!
cmsShow DoubleMuon_n100.root
cmsShow --no-version-check root://cmseos.fnal.gov//store/user/cmsdas/2017/pre_exercises/DYJetsToLL.root
For help with:
process.maxEvents = cms.untracked.PSet
- https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuidePoolInputSources
An Event is a C++ object container for all RAW and reconstructed data related to a particular collision.
DAS (Data Aggregation Service)
Big database to hold MC and data samples
https://cmsweb.cern.ch/das/
FAQ: https://cmsweb.cern.ch/das/faq
Examples:
https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookDataSamples
More info:
https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookLocatingDataSamples
Different ways to interpret the dataset names:
/<primary-dataset>/<CERN-username_or_groupname>-<publication-name>-<pset-hash>/USER
/object_type/campaign/datatier
/Primary/Processed/Tier
//<Campaign-ProcessString-globalTag-Ext-Version>/
Given a file, DAS can return a dataset!
Given a dataset, DAS can return all the associated files.
Datasets (whether MC or actual data) are published on DAS
- A dataset is comprised of many root files
- Find the name of a dataset based on the file name:
dataset file=/store/relval/CMSSW_10_2_0/RelValZMM_13/MINIAODSIM/PUpmx25ns_102X_upgrade2018_realistic_v9_gcc7-v1/10000/3017E7A1-178D-E811-8F63-0025905A6070.root
>>> /RelValZMM_13/CMSSW_10_2_0-PUpmx25ns_102X_upgrade2018_realistic_v9_gcc7-v1/MINIAODSIM
If you have trouble finding a file that you KNOW is on DAS:
- change the dbs instance to something other than global, e.g. "prod/phys03"
Example DAS Searches:
dataset release=CMSSW_9_3_0_pre5 dataset=/RelValZMM*/*CMSSW_9_3_0*/MINIAOD*
dataset release=CMSSW_10_2_0 dataset=/RelValZMM*/*CMSSW_10_2_0*/MINIAOD*
dataset=/DoubleMu*/*Run2017C*/MINIAOD* # /object_type/campaign/datatier (/Primary/Processed/Tier)
Can search for datasets from the command line using dasgoclient:
dasgoclient --query="dataset=/DoubleMuon*/Run2018A-PromptReco-v1/MINIAOD" --format=plain
- must first do: voms-proxy-init -voms cms
Get the LFN of a dataset by doing a DAS search, like:
file dataset=/GenericTTbar/HC-CMSSW_5_3_1_START53_V5-v1/GEN-SIM-RECO
which will retrieve the following LFN:
/store/mc/HC/GenericTTbar/GEN-SIM-RECO/CMSSW_5_3_1_START53_V5-v1/0010/00CE4E7C-DAAD-E111-BA36-0025B32034EA.root
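More dasgoclient examples (a hedged sketch; the dataset name just reuses the GenericTTbar one above, and you need a valid proxy first):
dasgoclient --query="file dataset=/GenericTTbar/HC-CMSSW_5_3_1_START53_V5-v1/GEN-SIM-RECO" # list all files (LFNs) in the dataset
dasgoclient --query="site dataset=/GenericTTbar/HC-CMSSW_5_3_1_START53_V5-v1/GEN-SIM-RECO" # list the sites hosting the dataset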
MCM (Monte Carlo Manager)
Not the same thing as DAS!
Use /mcm/ to find the info needed to locate MC samples on DAS:
/mcm/ is the bookkeeping of all produced MC samples
- tells you details of how the MC samples were produced
- e.g., tells you location of makecards.sh and data sets
Put into mcm:
GluGluHToZZTo4L_M125_13TeV_powheg2_JHUGenV7011_pythia8
MCM tutorial with David:
Did a search on DAS:
/GluGluHToZZ*4L*125*/*Fall17*94X*/MINIAODSIM
/Primary/Processed/Tier
David noticed that the location of the MC files couldn't be found here.
So then he checked mcm (Monte Carlo Manager):
David's MCM user profile: https://cms-pdmv.cern.ch/mcm/users?prepid=dsperka&page=0&shown=51
- click: Request > Navigation > dataset_name
- Here, type in the name of the dataset from DAS (without leading forward slash!)
e.g. GluGluHToZZTo4L_M125_13TeV_powheg2_JHUGenV7011_pythia8
- may have to click: Select view > Fragment
- Then go back to Navigation and scroll to right to click the "enlarge" button
- This will bring up important information from "rawGitHub" about the MC samples
Of these, most notably is:
https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_fragment/HIG-RunIIFall17wmLHEGS-00607/0
- It has "Links to cards" - these are MC generation cards
- may have to manually search for a specific URL to get the template.input:
https://raw.githubusercontent.com/cms-sw/genproductions/fd7d34a91c3160348fd0446ded445fa28f555e09/bin/Powheg/production/2017/13TeV/Higgs/gg_H_ZZ_quark-mass-effects_NNPDF31_13TeV/gg_H_ZZ_quark-mass-effects_NNPDF31_13TeV_template.input
svn
"subversion" - seems like the lxplus version of git and version control
Use this to edit Analysis Notes in CMS
Excellent tutorials on svn:
http://cmsdoc.cern.ch/cms/cpt/tdr/notes_for_authors_temp.pdf
https://twiki.cern.ch/twiki/bin/view/Main/HowtoNotesInCMS
https://twiki.cern.ch/twiki/bin/viewauth/CMS/Internal/TdrProcessing
To get your AN/paper started:
svn co -N svn+ssh://svn.cern.ch/reps/tdr2 myDir
cd myDir
svn update utils
svn update -N [papers|notes] # choose one, papers or notes
svn update [papers|notes]/XXX-YY-NNN # enter your AN or paper code
eval `utils/tdr runtime -sh` # (use -csh if your shell is tcsh)
To modify:
cd [papers|notes]/XXX-YY-NNN/trunk
To build the document:
tdr --style=pas b XXX-YY-NNN # --style=paper for papers
Git-like commands to update files:
svn add # YOU ONLY NEED TO DO THIS ONCE FOR ANY FILE
svn commit -m '' # This will update the file
svn status
svn status -u (--show-updates)
Figures should reside in the fig/ directory
Figure~\ref{fig:test} shows a figure prepared with the TDR
template and illustrates how to include a picture in a document
and refer to it using a symbolic label.
\begin{figure}[!Hhtb]
\centering
\includegraphics[width=0.55\textwidth]{c1_BlackAndWhite}
\caption[Caption for TOC]{Test of graphics inclusion.\label{fig:test}}
\end{figure}
The result of the above is roughly as follows:
Figure 1 shows a figure prepared with the TDR template and illustrates how to
include a picture in a document and refer to it using a symbolic label.
Colour versions of figures can be provided for PDF output using the \combinedfigure macro in place of the \includegraphics
command. This takes two arguments, corresponding respectively to the black-and-white and the coloured versions of the same picture, for example:
Figure~\ref{fig:test} shows a figure prepared with the TDR
template and illustrates how to include a picture in a document
and refer to it using a symbolic label.
\begin{figure}[!Hhtb]
\centering
\combinedfigure{width=0.4\textwidth}{c1_BlackAndWhite}{c1_Colour}
\caption[Caption for TOC]{Test of graphics inclusion.\label{fig:test}}
\end{figure}
To include several pictures in one figure, the recommended procedure is to use multiple instances of the
\includegraphics command, combined with the tabular environment if needed.
Lucien ditched svn and switched to git for our AN-18-194.
https://twiki.cern.ch/twiki/bin/viewauth/CMS/Internal/TdrProcessing
Compare what version of the AN you have:
git log # shows recent commits
Certificate Stuff:
Followed instructions on:
https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookStartingGrid#ObtainingCert
Anytime you want to access data on a Tier site (T1/T2/T3), create a temporary proxy:
voms-proxy-init --rfc --voms cms
voms-proxy-init --voms cms --valid 168:00 # makes the proxy valid for a week instead of a day!
voms-proxy-init -debug
voms-proxy-info # check your info
When your grid certificate expires, you get an error like:
“Error during SSL handshake:Either proxy or user certificate are expired.”
Request a new grid user certificate:
https://ca.cern.ch/ca/help/?kbid=024010
Must have these permissions (typically): chmod 400 ~/.globus/userkey.pem and chmod 644 ~/.globus/usercert.pem
In case you ever get this kind of error:
“Could not install packages due to an EnvironmentError: [Errno 13] Permission denied: '/private/var/folders/zj/mnvc1p6542bgc5j7npt_2jkh0000gn/T/pip-install-1uj6p02b/tabula-py/tabula/tabula-1.0.2-jar-with-dependencies.jar'
Check the permissions.”
This is because Homebrew doesn’t play nicely with pip. So do:
python -m pip install --user --install-option="--prefix=" <package>
If you ever get the following error:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
Then the simple fix is:
ssh-keygen -f ~/.ssh/known_hosts -R <hostname>
- for example, <hostname> = lxplus.cern.ch
GitHub / BitBucket
Get help on any git command:
git help <command>
The most frequent use of git commands:
git add <file>
git commit -m "<message>"
git push origin master
You can also add and commit in one step (adds all modified and deleted files):
git commit -am "<title>" -m "<description>"
A good collaborative workflow:
1. Fork the group's repo so that you have your own repo.
2. Make your own, new branch in the forked repo that you can work on.
3. Keep the master branch of the forked repo synched up with the group's master branch. (upstream)
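A minimal sketch of that workflow (the repo and branch names here are placeholders, not a recipe from these notes):
git clone git@github.com:<you>/<repo>.git # 1. clone your fork
cd <repo>
git checkout -b <my_new_branch> # 2. make your own branch to work on
git remote add upstream git@github.com:<group>/<repo>.git # 3. track the group's repo...
git fetch upstream # ...and keep your master synched with it:
git checkout master
git merge upstream/master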
git config --global user.name [Name]
git config --global user.email [Email]
git config --global user.github [Account]
git config --global core.editor [your preferred text editor]
Make the print log easier to read:
git config --global alias.lol 'log --graph --decorate --pretty=oneline --abbrev-commit'
Pull a specific file from the GitHub repo:
git fetch # downloads all the recent changes, but it will not put it in your current checked out code (working area).
git fetch origin
git cherry-pick <commit-ID> # apply the changes from a specific commit on top of your current branch
git checkout origin/master -- <path/to/file>
- this checks out that particular file from the downloaded changes (origin/master)
If a Core folder gets updated, do:
git submodule init # may not need to do this every time
git submodule update
If you need to pull down more recent code from a repo, you can stash your current changes:
git stash # saves your modifications for later (so now you can: git pull)
git stash apply # brings those saved modifications back to life!
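A typical sequence (just a sketch):
git stash # set your local edits aside
git pull # bring in the newer commits
git stash apply # re-apply your edits on top (resolve any conflicts by hand)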
How to pull down changes from a repo that you're following:
git fetch
git merge
git merge:
1. First make sure local repo is up to date with remote repo: git fetch
2. Then do: git checkout master
3. Make sure master has latest updates: git pull
4. Then check out the branch that should receive the changes: git checkout <branch>
5. Finally: git merge master
Remotes:
Add a remote called "upstream" to push to original (not forked) repo:
git remote add upstream git@github.com:GitHATSLPC/GitHATS.git
- then git pull upstream master is equivalent to doing:
git fetch upstream master
git merge upstream/master
To keep from having to put in your password each time you push:
git remote show origin # This shows you your repo_name
git remote set-url origin git+ssh://git@bitbucket.org/<username>/<repo_name>.git
Can also remove remotes:
git remote rm origin
Rename a remote:
git remote rename <old_name> <new_name>
Two ways to make a repo:
1. Create repo in terminal:
1. git init
2. git add .
3. git commit -m 'Commit message’
1. undo with: git reset --soft HEAD~1
4. git remote add origin <URL>, where the <URL> looks like:
1. GitHub: git@github.com:<username>/<repo_name>.git # name the repo whatever you want!
2. BitBucket: https://<username>@your.bitbucket.domain:7999/yourproject/repo.git
5. git push -u origin master
2. Create repo online and clone into terminal:
1. make repo on BitBucket or GitHub
2. git clone <URL>
Check the status of latest changes in your own repo:
git status
git status -s # short format
Also useful:
git diff # shows edits between old and new files, line by line
git diff <file> # specifically, compares your changes to the last committed version of <file>
If you get the following error:
error: The requested URL returned error: 403 Forbidden while accessing https://github.com/rosedj1/
Then do:
1. edit .git/config file under your repo directory
2. find the url= entry under the section [remote "origin"]
3. change all text before @ symbol to ssh://git
USEFUL!
To remove a file from your remote git repo:
git rm <file> # I think this also deletes the file locally!
git rm --cached <file> # does NOT delete the file locally; only removes it from the repo!
then do:
git commit -m "removing <file>"
git push origin <branch>
Remove a directory:
git rm -r <dir>
Say you have made a pull request and a bunch of commits which you can see on GitHub.
Now you want to remove those files from the PR.
Doing rm from your local computer won't take it away from GitHub. # may not be true
So you can REMOVE previously committed files by doing: # also may not be true
git rm <file>
If you have a file in a PR that you want to delete, or say you have sensitive info in a PR
which must be deleted, you should 'rewrite' the commit:
git commit --amend # just do this if your most recent commit is local (not online)
git push --force origin <branch> # otherwise, include this part too to rewrite the history online
If you move a repo to a new location:
git remote set-url origin ssh://git@gitlab.cern.ch:7999/cms-rcms-artifacts/gitlab-maven.git
or
git remote set-url origin https://gitlab.cern.ch/cms-rcms-artifacts/gitlab-maven.git
*<words>* or _<words>_ # make <words> italic (called "emphasis")
**<words>** or __<words>__ # make <words> bold (called "strong emphasis")
**<words> and _<newwords>_** # <words> and <newwords> (called "combined emphasis")
~~<words>~~ # make <words> strikethrough
{code}<words>{code} # make <words> monospace and code-like
!!<space> # make entire message monospace by beginning message with '!!' and then a space!
@@<space> # ignore all special formatting by beginning message with '@@' and then a space
Code
`<code>` # inline <code>
```python
<code>
``` # block <code> with python syntax highlighting
Headers
# H1 # biggest text (used for headings)
## H2
### H3
#### H4
##### H5 # smallest text
###### H6 # smallest text, but greyed out
Lists ('⋅' is a whitespace)
1. First ordered list item
2. Another item
⋅⋅* Unordered sub-list.
1. Actual numbers don't matter, just that it's a number
⋅⋅1. Ordered sub-list
⋅⋅1. Second item in the sub-list. Remember, GitHub Markdown has automatic numbering
4. And another item.
⋅⋅⋅You can have properly indented paragraphs within list items. Notice the blank line above, and the leading spaces (at least one, but we'll use three here to also align the raw Markdown).
⋅⋅⋅To have a line break without a paragraph, you will need to use two trailing spaces.⋅⋅ # two trailing spaces keeps you in same paragraph
⋅⋅⋅Note that this line is separate, but within the same paragraph.⋅⋅
Unordered Lists
* Unordered list can use asterisks
- Or minuses
+ Or pluses
Tables
| Tables | Are | Cool |
| ------------- |:-------------:| -----:|
| col 3 is | right-aligned | $1600 |
| col 2 is | centered | $12 |
| *zebra stripes* | `are neat` | $1 |
- Colons can be used to align columns.
- There must be at least 3 dashes separating each header cell.
Blockquotes # look like quotes from a forum or email
> <quoted_text>
Hyperlink:
[<words>](<URL>) # inserts a hyperlink at the string <words>
Image:
 # inserts an image
You can also add: Images, Hyperlinks, inline HTML, and YouTube videos
Make a horizontal line (all methods are the same):
*** or ___ or ---
TWiki formatting:
_italics_
*bold*
__bold italic__
=monospace=
==bold monospace==
<verbatim class="cmd">
block of code</verbatim>
Disable formatted text:
<nop>*word*
!*word*
Separate paragraphs with a blank line
---+ # This is a heading
---++ # Deeper heading
--- # horizontal bar
%TOC{title="Goodies:"}% # Table of Contents
* text # three spaces, then * starts a bulleted list
- (further bullets are indented via whitespace triplets)
1 text # three spaces, then some number, starts a numbered list
- doesn't matter what number you put!
- Use the %BR% variable to add a paragraph without renumbering the list
| Cat | Dog |
| boo | yah! | # creates a table
%RED% your_text %ENDCOLOR% # color your text red
BumpyWord # using CamelCase like this creates an auto-hyperlink to BumpyWord's TWiki
[[BumpyWords][bumpy words]] appears as bumpy words
[[http://www.google.com/][Google]] appears as Google
%SEARCH{...}% # an interface to a sophisticated search engine that embeds the results of the search in your page
Three kinds of documents on the TWiki:
1. DocumentMode = community property, anyone can edit
2. ThreadMode = Q&A
3. StructuredMode = has definite structure and rules to follow
Import an image:
<verbatim class="cmd"><img align="right" alt="CRAB Logo" src="http://cmsdoc.cern.ch/cms/ccs/wm/www/Crab/img/crab_logo_3.png" width="154" /> </verbatim>
MadGraph5 (MG5)
The proc_card.dat file contains the default process to be generated.
The bin/ dir contains the executable mg5_aMC. Let's play with that next.
How to play with MadGraph5:
Boot up the MG5 interpreter:
./MG5_aMC_v2_4_2/bin/mg5_aMC
Now you can type tutorial for a built-in tutorial or continue reading this TWiki.
Note that by default the Standard Model gets imported:
Loading default model: sm
See what particles MG5 currently knows about:
display particles
Look at the particles with a little more detail:
display multiparticles
Look at the possible vertices:
display interactions
Let's generate a Drell-Yan process:
generate p p > z > l+ l-
Draw the Feynman diagrams associated with this process:
(If you have graphics-forwarding set up correctly on your system, MG5 will draw some purdy-lookin' Feynman diagrams for you.)
display diagrams
Save this generated process in a newly-created dir:
output <new_dir_name>
Note: Executing output automatically writes the Feynman diagrams to the subprocess/matrix.ps file
Calculate the cross section of the process:
launch
Now type 0 to bypass extraneous programs to run.
Then press 1 to modify the param_card.dat using Vim. Change anything you want and then do :wq to write (save) and quit.
Now press 2 to modify the run_card.dat. Change whatever run conditions and then write and quit out of Vim.
Finally, press 0 to calculate the cross section.
EXTRA INFO ON MG5
Bring up the help menu or help on a specific command:
help
help <cmd>
Syntax for generate:
generate INITIAL STATE > REQ S-CHANNEL > FINAL STATE $ EXCL S-CHANNEL / FORBIDDEN PARTICLES COUP1=ORDER1 COUP2=ORDER2 @N
Examples:
generate g g > h > l- l+ l- l+ [QCD] # loop process
generate l+ vl > w+ > l+ vl a $ z / a h QED=3 QCD=0 @1
generate p p > h , (h > hs hs, (hs > zp zp, (zp > l+ l-)))
generate p p > h > j j e+ e- vm vm~ QCD=0 QED=6
p p > t t~ # Gives only dominant QCD vertices; ignores QED vertices
p p > t t~ QED=2 # Gives both QCD and QED vertices
Add new processes to current process:
add process p p > h > j j mu+ mu- ve ve~ QCD=0 QED=6
Define new particles (or groups of particles):
define v = w+ w- z a # Define the vector bosons
define p = p b b~ # Redefine the proton
Import a new model:
import model mssm
Note: The model must exist in the /MG5_aMC_v2_4_2/models/ dir.
Modify the model:
customize_model
customize_model --save=<new_model_name> # Save new model
Useful for setting a mass to zero, or removing some interaction, etc.
Save MG5 commands from interactive session:
history <filename>.dat
Execute commands stored in a history file:
import command <filename>.dat # from MG5 CLI
./bin/mg5_aMC my_mg5_cmd.dat # from your shell
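For example, a minimal sketch of such a command file (the file name and output dir are made up) that reproduces the Drell-Yan example above non-interactively:
cat > my_mg5_cmd.dat << 'EOF'
generate p p > z > l+ l-
output my_drell_yan
launch
EOF
./MG5_aMC_v2_4_2/bin/mg5_aMC my_mg5_cmd.dat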
Execute shell commands from MG5 CLI:
! <shell_command> # option 1
shell <shell_command> # option 2
Run Pythia on a previous run (from inside a dir produced using output):
./bin/madevent
then at the prompt do: pythia run_01
Rerun a launch command from a dir that was produced using output:
./bin/generate_events
After you do
output <new_dir>
, inside that dir you will find a very useful README file that shows you how to:
A. how to generate events
B. how to run in cluster/multi-core mode
C. how to launch a sequential run (called multi-run)
D. how to launch Pythia/PGS/Delphes
E. how to prevent automatic opening of html pages
F. how to link to lhapdf
G. how to run in gridpack mode
import model HAHM_variablesw_v3_UFO
define q = u d s c t b u~ d~ c~ s~ t~ b~
generate q q > z z / g h h2 , z > l+ l-
output
launch
How to make a gridpack
Run gridpack_generation.sh.
You get this from the cms-sw/genproductions GitHub repo (I think; I need to double-check).
Tips:
- Usually good to put: ptj = 0.01 (= 0 has caused problems)
- qscale at ME level is controlled by ptj at NLO and by xqcut at LO
- draj = 0.05 (this is the deltaR between gamma and jets)
- jetradius = 0.7 (for non-FXFX merging samples)
- lhaid = 292000 (for 4 fermion final state)
Appending [QCD] # applies NLO QCD corrections to the process
generate p p > w+, w+ > ell+ vl @0 # '@0' is still leading order...
How to fix certain errors:
Error detected in "import model"
Must put a "model dir" with all the model cards inside MG5_aMC_v2_6_5/models/
- a model dir has files like: "couplings.py", "vertices.py", "decays.py"
For help on using MCM or php: https://indico.cern.ch/event/807778/contributions/3362163/attachments/1826349/2989132/mccmTutorial.pdf
How to install LHAPDF sets:
Open up a MG5 interpreter and do:
install lhapdf6
BEWARE! IT'S NOT GUARANTEED TO WORK!
While doing 'install lhapdf6', some errors are encountered;
specifically, the desired dir is never created:
/20190422_HAHM_qqZZ4L/MG5_aMC_v2_6_5/HEPTools/lhapdf6/share/LHAPDF/
instead it only creates:
/20190422_HAHM_qqZZ4L/MG5_aMC_v2_6_5/HEPTools/lhapdf6/
Need to MANUALLY put these files into .../share/LHAPDF/:
- pdfsets.index
- lhapdf.conf
- e.g., copy pdfsets.index from: /cvmfs/cms.cern.ch/lhapdf/pdfsets/6.2/pdfsets.index
Then download the desired pdfs into .../share/LHAPDF/:
wget https://lhapdf.hepforge.org/downloads?f=pdfsets/6.1/NNPDF23_lo_as_0130_qed.tar.gz -O NNPDF23_lo_as_0130_qed.tar.gz
tar xvfz NNPDF23_lo_as_0130_qed.tar.gz
If you want to view the code that fails:
MG5_aMC_v2_6_5/HEPTools/HEPToolsInstallers/installLHAPDF6.sh
If you see an error like:
value '230000' for entry 'pdlabel' is not valid. Preserving previous value: 'nn23nlo'.
allowed values are lhapdf, cteq6_m, cteq6_d, cteq6_l, cteq6l1, nn23lo, nn23lo1, nn23nlo
- to use an LHAPDF set by number, set pdlabel=lhapdf in the run_card and put the set number in lhaid.
Change the Fortran compiler to "gfortran" by editing:
MG5_aMC_v2_6_5/input/mg5_configuration.txt
Environment variables to check:
LHAPDF_DATA_PATH=/cvmfs/cms.cern.ch/lhapdf/pdfsets/6.2/NNPDF30_nlo_nf_5_pdfas
PATH
PYTHONPATH
LD_LIBRARY_PATH, which may look like:
/afs/cern.ch/work/d/drosenzw/DarkZ/MG5_gridpacks_practice/CMSSW_10_2_0/biglib/slc6_amd64_gcc700:/afs/cern.ch/work/d/drosenzw/DarkZ/MG5_gridpacks_practice/CMSSW_10_2_0/lib/slc6_amd64_gcc700:/afs/cern.ch/work/d/drosenzw/DarkZ/MG5_gridpacks_practice/CMSSW_10_2_0/external/slc6_amd64_gcc700/lib:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_2_0/biglib/slc6_amd64_gcc700:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_2_0/lib/slc6_amd64_gcc700:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_2_0/external/slc6_amd64_gcc700/lib:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/llvm/6.0.0-gnimlf2/lib64:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/gcc/7.0.0-omkpbe2/lib64:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/gcc/7.0.0-omkpbe2/lib:/cvmfs/cms.cern.ch/slc6_amd64_gcc700/external/cuda/9.2.88-gnimlf/drivers
Maybe need to do this:
export PATH=$PATH:<MG5_dir>/HEPTools/lhapdf6/bin # <MG5_dir> = wherever your MG5 lives
For example, the lhapdf executable ended up in:
/afs/cern.ch/work/d/drosenzw/DarkZ/MG5_gridpacks_practice/HAHM_LO/HAHM_variablesw_v3/HAHM_variablesw_v3_gridpack/work/LHAPDF-6.2.1/bin
/afs/cern.ch/work/d/drosenzw/DarkZ/MG5_gridpacks_practice/HAHM_LO/HAHM_variablesw_v3/HAHM_variablesw_v3_gridpack/work/LHAPDF-6.2.1/bin/lhapdf
Les Houches Events (LHE) Files
When an LHE file is made, inside you will find XML-like markup: an <init> header block, followed by one <event> block per generated event listing each particle's PDG ID, status, and four-momentum.
DON'T store big files in EOS; they are easier to access from Tier2: /cms/data/store/user/
- use uberftp or gfal-copy to access them
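For example (a hedged sketch; the redirector and path just follow the UF Tier-2 examples elsewhere in these notes, and <user>/<file> are placeholders):
voms-proxy-init -voms cms # need a valid grid proxy first
gfal-copy root://cmsio5.rc.ufl.edu//cms/data/store/user/<user>/<file>.root file://$PWD/<file>.root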
If on IHEPA, you can store big files in one of the native /raid/ storage areas:
Path          Server
/raid/raid5/  gainesville
/raid/raid6/  newberry
/raid/raid7/  alachua
/raid/raid8/  melrose
/raid/raid9/  archer
(Mnemonic: Ga-New-Ala-M-Ar, pronounced "GNU Alamar")
Use Jupyter Notebooks on your remote server:
Note: You'll need two terminals to make this work:
On the first terminal, ssh into a remote server (like the melrose IHEPA server) and do: jupyter notebook --no-browser --port=8884
The output should look like this:
[I 10:50:30.632 NotebookApp] Serving notebooks from local directory: /home/rosedj1/HiggsMeasurement/CMSSW_10_2_15/src/HiggsMassMeasurement/d0_Studies/d0_Analyzers
[I 10:50:30.632 NotebookApp] 0 active kernels
[I 10:50:30.632 NotebookApp] The Jupyter Notebook is running at:
[I 10:50:30.633 NotebookApp] http://localhost:8884/?token=5659ec4939cf8978b56f82329d0c0a465ea0df536bf4ed24
[I 10:50:30.633 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 10:50:30.646 NotebookApp]
Copy/paste this URL into your browser when you connect for the first time,
to login with a token:
http://localhost:8884/?token=5659ec4939cf8978b56f82329d0c0a465ea0df536bf4ed24
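On the second terminal (your local machine), forward the port. This step isn't shown in the output above, so take it as the usual sketch, with <user>@<remote_server> as placeholders:
ssh -N -f -L localhost:8884:localhost:8884 <user>@<remote_server> # tunnel local port 8884 to the server's port 8884
Then paste the http://localhost:8884/?token=... URL into your local browser.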
HiPerGator lectures given by Matt Gitzendanner
Find notes on HiPerGator (Find Matt Gitzendanner's presentations):
training.it.ufl.edu
Find SLURM commands at:
help.rc.ufl.edu
Interactive Jupyter Notebook session that uses HiPerGator!:
jhub.rc.ufl.edu
Location of SLURM example scripts:
/ufrc/data/training/SLURM/*.sh
- for single jobs, grab: single_job.sh
- for parallel jobs, grab: parallel_job.sh
You have a couple of main directories:
/home/<user>/ # where HPG first drops you off; CANNOT handle big files (only has 20 GB of storage)
/ufrc/<group>/<user>/ # can handle 51000 cores reading and writing!
- e.g., /ufrc/phz5155/$USER
- parallel file system
- CAN handle 51000 cores, reading and writing to it
- 2 TB limit per group
After ssh'ing into HPG, it drops you in:
/home/$USER
- for me this is: /home/rosedj1
- you get 20 GB of space
- hosted on a single server (node)
My groups:
/ufrc/korytov/rosedj1/ # for particle physics research
/ufrc/phz5155/ # for computing course
- so I'm part of two different groups
To use class resources, instead of Korytov’s resources:
module load class/phz5155
- each time you want to submit a job, do this command^
It is useful to use the extension: .slurm for SLURM scripts
######################
## Basic SLURM job script:
#!/bin/bash
#SBATCH --job-name=test # Name for job
#SBATCH -o job_%j.out # output file; %j is replaced by the job ID
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<rosedj1@ufl.edu>
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=100mb # or: #SBATCH --mem=1gb
#SBATCH --time=2:00:00 # (hh:mm:ss), or: #SBATCH -t 00:01:00
# Script stuff below, e.g.:
hostname
module load python
python -V
######################
SLURM sbatch directives
multi-letter directives are double dashes:
--nodes=1 # number of nodes (servers)
--ntasks
--ntasks-per-node
--ntasks-per-socket
--cpus-per-task (cores per task)
Memory usage:
--mem=1gb
--mem-per-cpu=1gb
--distribution
Long option          Short option   Description
--nodes=1            -N             request number of servers
--ntasks=1           -n             number of tasks the job will use (useful for MPI applications)
--cpus-per-task=8    -c             cores per task
If you invest in 10 cores, burst qos can use up to 90 cores!
#SBATCH --nodes=1
Task Arrays
#SBATCH --array=1-200%10 # run on 10 jobs at a time to be nice
$SLURM_ARRAY_TASK_ID # each task's index within the array
%A: job id (for use in output-file names)
%a: task id
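A minimal task-array sketch putting these together (the analyzer script and input-file naming are hypothetical):
#!/bin/bash
#SBATCH --job-name=array_test
#SBATCH -o job_%A_%a.out # %A = master job ID, %a = task ID
#SBATCH --array=1-200%10 # 200 tasks, at most 10 running at once
#SBATCH --ntasks=1
#SBATCH --mem=100mb
#SBATCH --time=00:10:00
echo "Task ${SLURM_ARRAY_TASK_ID} running on $(hostname)"
python my_analyzer.py input_${SLURM_ARRAY_TASK_ID}.txt # hypothetical script and input files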
HPG COMMANDS:
id # see your user id, your group id, etc.
sbatch <script.sh> # submit script.sh to the scheduler
sbatch --qos=phz5155-b <script.sh> # submit using the burst QOS
squeue # see ALL jobs running
squeue -u rosedj1 # just see your jobs
squeue -j <job_id>
scancel <job_id> # kill a job
sacct # accounting info for your recent/finished jobs
sstat <job_id> # status info for a running job
slurmInfo # see info about resource utilization; must do: module load ufrc
slurmInfo -p # partition, a better summary
slurmInfo -g <group> # summary for a specific group
srun --mpi=pmix_v2 myApp
Memory utilization = MAX amount used at one point
Memory request = aim for 20-50% of total use
BE WISE ABOUT USING RESOURCES!
- Users have taken up 16 cores and TOOK MORE TIME than just using 1 core!!!
It would be interesting to write a SLURM script that submits
the same job with different numbers of cores and plots efficiency vs. number of cores.
QOS or burstQOS
"Quality of Service"
When you do sbatch, the -b option is “burst capacity”, which allows up to 9x your allocation when resources are idle:
--qos=phz5155-b
Note: in the job summary email, the memory usage refers to RAM efficiency.
Time:
-t
time limit is 31 days
- It is to our benefit to be accurate with job time
- infinite loops will just waste resources and make you think your job is actually working
- the scheduler might postpone your job if it sees it will delay other people's jobs
Module system organizes file paths
If you want to use common modules on HPG, you must load them first:
module load <module_name>
module load python
module load python3
ml = module load # already aliased automagically on HPG
module list # list modules
module spider # list everything?
module spider cl # list everything with cl in name
module purge # unloads all modules
ml intel # allows you to do "make" commands
module load intel/2018 openmpi/3.1.0 # compiling
Learning about Xpra:
module load gui
launch_gui_session -h # shows help options
- This will load a session on a graphical node on the cluster
- Default time on server is 4 hrs
- use the -a option to use secondary account
- use the -b option to use burst SLURM qos
Paste the xpra url into your local terminal
Do:
module load gui
launch_gui_session -e (e.g., launch_rstudio_gui)
xpra attach ssh:<host>:<display>
xpra_list_sessions
scancel <job_id> # cancel the GUI job when you're done
ln -s <far_away_file_path> <link_name> # makes a symbolic link <link_name> pointing to <far_away_file_path>
Development Sessions
When to use a dev session?
- When a job requires multiple cores and maybe a few days to run
- There are 6 dev nodes!
module load ufrc
srundev -h # help!
srundev --time=04:00:00 # begin a 4 hr dev session, with the default 1 processor core and 2 GB of memory
srundev --time=60 --ntasks=1 --cpus-per-task=4 --mem=4gb # additional flags
srundev -t 3-0 # session lasts 3 days
srundev -t 60 # session lasts 60 min
- default time is 00:10:00 (10 min) and max time is 12:00:00
These are all wrappers for:
srun --partition=hpg2-dev --pty bash -i
Getting CMSSW on HPG!!!
1. Start a dev session
2. source /cvmfs/cms.cern.ch/cmsset_default.sh # this makes cmsrel and cmsenv two new aliases for you!
3. Now cmsrel your favorite CMSSW_X_Y_Z
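Putting those three steps together (the release version is just an example):
srundev --time=04:00:00 # 1. start a dev session
source /cvmfs/cms.cern.ch/cmsset_default.sh # 2. defines cmsrel and cmsenv
cmsrel CMSSW_10_2_0 # 3. check out a release...
cd CMSSW_10_2_0/src
cmsenv # ...and set up its environment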
Misc Info on HPG:
Terminology:
8 cpus/task = 8 cores on that one server
1 node has: RAM and maybe 2 sockets (processors), each with 2 cores
A node is a physical server
Each server has either 2 or 4 sockets
You can also specify the number of tasks per processor: ntasks-per-socket
Each processor only has a certain bandwidth to memory
processor = cpu = core
15000 servers all networked together; each node has either 32 or 64 cores
Entire PHZ5155 course is allocated a whole node!
- This is 32 cores on HPG2
The slowdown of your job may be in the bandwidth!
Data center near Satchel’s where HiPerGator resides
51000 cores in HPG2 cluster
(only 14000 cores in HPG1)
world-class cluster
3 PB of storage
It’s essentially just a big rack of computers, where each computer has:
HPG2: 2 servers, 32 cores per server, 128 GB RAM/core?
HPG1: 1 servers, 64 cores per server, 256 GB RAM/core?
hpg1 is 3 years old and has older nodes
3.5 GB/core available
threaded = parallel = OpenMP (shared memory)
Can do parallel applications:
- OpenMP, Threaded, Pthreads applications
- all cores on ONE server, shared memory
- CAN'T talk to other servers
MPI (Message Passing Interface)
- applications which can run across multiple servers
ntasks = # of MPI ranks
say you want to run 100 tasks across 10 nodes
100 MPI ranks
You might think the scheduler would put 10 MPI ranks on each node,
- but it won't be so equal per node, necessarily!
The scheduler may put 30 tasks on one node, and distribute the remaining 70 tasks on other nodes.
Though you can control the ntasks-per-node
Two processors, each processor has 2 cores
16 cores per processor
64 cores per node
For Windows users who need a Terminal:
- MobaXterm
- or the Ubuntu subsystem
Need an SFTP client to move files from the cluster to your computer:
- Cyberduck
- FileZilla
Text editor:
- BBedit(?)
Cluster basics:
ssh’ing puts you into a login node
Then submit a job to the scheduler.
- The scheduler submits the job to the 51000 cores!
- You must prepare a file to tell scheduler what to do (BATCH script)
- number of CPUS
- RAM
- how long to process the job
There are also compute nodes
- this is where the money is!
- They are optimized to distribute jobs across different computers efficiently
Extra Stuff:
Mantra:
"GUIs make easy tasks easier; CLIs make difficult tasks possible."
Neat commenting styles:
@@@@@@@@@@@@@@@@@@@@@@
@ IMPORTANT MESSAGE @
@@@@@@@@@@@@@@@@@@@@@@
ccccccccccc
c message c
ccccccccccc
-*-*-*-*-* title *-*-*-*-*-
#________________________|
Section 1:
#________________________|
Section 2:
==========
My Title
==========
// ============ Initialize Variables ============= //
// ------------ other title ------------
Teach by showing. Learn by doing.
Good TWiki Layouts: