March 2, 2017

Direct srm paths for various T2 sites

  • Purdue: srm://srm.rcac.purdue.edu:8443/srm/v2/server?SFN=/mnt/hadoop/store/user/kjung (soon to be decommissioned)
  • Purdue (alternate): gsiftp://cms-gridftp.rcac.purdue.edu/store/user/kjung (for use with the new 'gfal' tools)

  • MIT: srm://se01.cmsaf.mit.edu:8443/srm/v2/server?SFN=/mnt/hadoop/cms/store/user/
  • MIT (alternate): gsiftp://se01.cmsaf.mit.edu:2811//cms/store/user/
  • LLR: srm://polgrid4.in2p3.fr:8446/srm/managerv2?SFN=/dpm/in2p3.fr/home/cms/trivcat/store/user/
  • CERN EOS: srm://srm-eoscms.cern.ch:8443/srm/v2/server?SFN=/eos/cms/store

The gfal-* commands can be used with these paths, as can xrdcp and the old-school lcg-cp command, via "lcg-cp -b -v -n 5 -D srmv2 "

Copying to or from local disk can be done by using the "file:////" prefix on the source or destination

Aug. 15, 2016

tdr_diff instructions, here: tdrDiffInstr

Feb. 1, 2016

Recursive remove from a hadoop space for CMS:

 srmrmdir -recursive=true "srm://srm.rcac.purdue.edu:8443/srm/v2/server?SFN=/mnt/hadoop/..." 

or, using the new gfal tools

 gfal-rm -r "srm://srm.rcac.purdue.edu:8443/srm/v2/server?SFN=/mnt..." 

May 29, 2014

Post-QM Code Documentation

| Type | Name | Location(s) | Notes |
| MC Gen | Pythia Generator with/without neutrinos | [Purdue] /home/jung68/NoNuGen/CMSSW_5_3_15/src | Produces Gen-Only test files with and without including neutrinos in the gen-jet definition |
| MC Gen | Pythia Bare-Bones Generator | [Purdue] /home/jung68/PythiaGen/CMSSW_5_3_11/src | Produces Gen-Only Pythia MC for testing tune cfgs |
| MC Gen | pPb Embedding Creation | [Purdue] /home/jung68/CMSSW_5_3_15/src/PbpGen | Produces RECO pPb MC embedded files |
| Forest Producer | Forest Unofficial embedded MC | [MIT] /net/hidsk0001/d00/scratch/kjung/PbpJECs | Spits out forest files using condor at MIT from existing embedded RECO files |
| Forest Producer | CVS-style forest with b-tagging info | [Purdue] /home/jung68/btagForest/CMSSW_5_3_8_HI | Older-style forest with working b-tagger implementation on all jet collections |
| Forest Producer | Git-style forest without b-tag info | [Purdue] /home/jung68/gitForest/CMSSW_5_3_15 | New-style forest to be updated with b-tagger info |
| Analysis | pPb B-Tag nTuple Creator | [Purdue] /home/jung68/scratch/macros/bTagging/subscripts | Creates b-tagged ntuples from forest files |
| Analysis | b-jet purity and eff. fitter | [Purdue] /home/jung68/scratch/macros/bTagging/bfractionVsJetPtPP.C | Creates small TH1D's from analysis ntuples for b-jet spectra creation |
| Utility | LumiCalc Setup | [Purdue] /home/jung68/lumiCalc/CMSSW_5_3_9/src | Simply cmsenv this directory and then use lumiCalc2.py via examples here |
| Utility | Duplicate Forest File Checker | [Purdue] /home/jung68/scratch/utils/duplicateForestCheck.C | Checks for duplicate forest files in collection from filelist |
| Utility | Check Forest Files for pPb or Pbp | [Purdue] /home/jung68/scratch/utils/findForestRun.C | Creates a new filelist of just Pbp files from an overall collection |
| Utility | Forest File Merger | [MIT] /net/hidsk0001/d00/scratch/kjung/FileMerger/carefulMergeBatch.pl | Carefully merges lists of forest files 100 at a time so as to not break the hadoop disk |

November 25, 2013

pPb OpenHF + BTag Forest Production

Found some duplicates in the production of the OpenHF + B-Tagging HiForest that Wei created. I wrote a quick checkDuplicates script to figure out which files were duplicates - it relies on the jobs having the same job id in the output directory. The code is here:

(Purdue) $scratch/macros/bJetTools/bJetTools/subscripts/checkForDuplicates.C

and it should be generic for any crab output.
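
For reference, the same idea can be sketched in a few lines of Python. This is not the checkForDuplicates.C macro itself, just an illustration of grouping output files by job id. The filename pattern `<prefix>_<jobid>_<retry>_<hash>.root` is an assumption about the crab output naming, and `find_duplicate_jobs` is a made-up name; adjust the regular expression if the files are named differently.

```python
import re
from collections import defaultdict

# ASSUMED crab-style name: <prefix>_<jobid>_<retry>_<hash>.root
JOB_PATTERN = re.compile(r"_(\d+)_(\d+)_(\w+)\.root$")


def find_duplicate_jobs(filenames):
    """Return {job_id: [files]} for every job id that has more than one file."""
    by_job = defaultdict(list)
    for name in filenames:
        match = JOB_PATTERN.search(name)
        if match:
            by_job[int(match.group(1))].append(name)
    return {job: files for job, files in by_job.items() if len(files) > 1}


if __name__ == "__main__":
    example = [
        "forest_1_1_abc.root",
        "forest_2_1_def.root",
        "forest_2_2_xyz.root",  # resubmission of job 2
    ]
    for job, files in sorted(find_duplicate_jobs(example).items()):
        print(job, files)
```

Files whose names do not match the pattern are silently skipped, so it is worth checking how many files were actually grouped before trusting an empty result.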

pPb B-tag study

Something is sort of strange with the unfolding currently. The MC closure is somewhat poor - very different than the closure that Yaxian shows for inclusive pPb jets here: pdf. This will need to be investigated further. Both akPu3PF and ak3PF jets show pretty poor closure after unfolding. Perhaps the boost is screwing things up or perhaps the JECs are not quite right?

Other than that, pPb B-tagged jet study is coming along. Some issues in the MC unfolding to take care of. Documenting the locations of all the samples:

| Type | Pthat bins | Location | Analysis Code |
| pp MC at 5.02 TeV | 30, 50, 80, 120, 170, 220, 280, 370 | (MIT) $hadoop/pp_BTagForestMerged_502TeV/pp_BTagForestXXX_502TeVoutput.root | (MIT) $scratch/analyzeTrees.C |
| pp MC at 2.76 TeV (QCD) | 30, 50, 80, 120, 170, 220, 280, 370, 460, 540 | (MIT) $hadoop/pp_Fix3_QCDJetMerged/* | (MIT) $scratch/analyzeTrees.C careful here! Modded for 5 TeV |
| pp MC at 2.76 TeV (C-Jet) | 30, 50, 80, 120, 170 | (MIT) $hadoop/pp_Fix3_CJetMerged/* | (MIT) $scratch/analyzeTrees.C (MC=2) careful! |
| pp MC at 2.76 TeV (B-Jet) | 30, 50, 80, 120, 170 | (MIT) $hadoop/pp_Fix3_BJetMerged/* | (MIT) $scratch/analyzeTrees.C (MC=3) careful! |
| pp Data at 5.02 TeV | N/A | *NONE* | *NONE* |
| pPb MC at 5.02 TeV (boosted) | 30, 50, 80, 120, 170, 220, 280 | (MIT) $hadoop/pPb_Fix2_MCBForestXXX/ | (MIT) $scratch/bJetTools/subscripts |
| pPb Data at 5.02 TeV | N/A | (Purdue) /store/user/kjung/PAHighPt/pPb_BTagForest_MiniForest/6da38a6c0d9d314487fc89e07594eb29 | (Purdue) $scratch/macros/bJetTools/bJetTools/subscripts |

April 11, 2013

Some additional caveats for the b-fraction calculation (first step):

(1) Note that for the pp samples, the standard reconstruction sequence does not include neutrinos as part of the genJet definition. Since b-jets are more likely than inclusive jets to contain neutrinos, this causes an imbalance in the reconstruction efficiency of the b-jets relative to the inclusive jets. Furthermore, neutrinos ARE included (by accident) in the PbPb genJet definition, so this correction becomes even more imperative. You can see in AN2012_126 (b-fraction calculation in PbPb and pp, 2012) that once the correction is applied, the Jet Energy Scales again match.

(2) Perhaps more important is a small caveat involving the regional tracking. In the standard HI reconstruction procedure, the jet energies are evaluated before the regional tracking is run. This is because in order to define the regions over which to run the regional tracking, you have to know where the jets are (kind of a catch-22). Because the b-jet secondary tracks are often far from the primary vertex, they are usually ignored in the first round of tracking due to computing constraints. Therefore, the secondary tracks are generally not included in the b-jet energies but generally are included in the non-b-jet energies. As a result, the energy found within b-jets is deficient compared to inclusive jets, so the jet energy corrections applied to all jets will not completely correct the b-jets. An additional correction is needed to restore the energy lost by ignoring the regional tracks. In effect, this is akin to sliding the b-jet pT spectrum (dN/dpT) to the left (or, equivalently, cutting at a higher pT for the b-jets than for the inclusive jets). This amounts to a b-jet number deficiency equal to $(1-x)^{|\alpha|-1}$, where x is the fractional energy deficit of the b-jets (usually ~10%) and $\alpha$ is the exponent of the falling pT spectrum (usually ~5.8). Therefore, our b-jet fraction will be off by a factor of 0.9^4.8 ~ 0.6. This also needs to be corrected for. The procedure is again described in AN2012_126, located here on CADi.
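
As a quick numerical check of the estimate in (2), the deficiency factor can be evaluated directly. This is only a sketch: the function name `bjet_yield_deficit` is made up for illustration, it assumes a pure power-law spectrum, and the inputs are the approximate values quoted above (x ~ 10%, $\alpha$ ~ 5.8).

```python
def bjet_yield_deficit(x, alpha):
    """Fraction of b-jets surviving a fixed pT cut when their energy is low by x.

    ASSUMPTION: a pure power-law spectrum dN/dpT ~ pT^(-|alpha|), so the yield
    above a threshold scales as threshold^-(|alpha| - 1).
    """
    return (1.0 - x) ** (abs(alpha) - 1.0)


if __name__ == "__main__":
    # Approximate values quoted in the note: 10% energy deficit, alpha ~ 5.8
    print(round(bjet_yield_deficit(0.10, 5.8), 3))
```

The exponent is $|\alpha|-1$ rather than $|\alpha|$ because the quantity of interest is the yield integrated above the pT cut, not the differential spectrum at a single pT.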

January 10, 2013

Discussed the addition of the relevant information for the C-tagging (via D-meson reconstruction) and B-tagging into the official HiForest release. The word was that it's probably best to do the first round of HiForest production as is and complete the proposed fast analyses, and then once the first round is finished, then work on adding the relevant information into the revised HiForest. The main suggestion was that we should look into creating a D-meson tree (perhaps with branches for D and D*, etc).

The presentation that was given in the high-pt group is here

December 19, 2012

Detailed and complete (mostly) outline for b-jet cross-section study (Further elaboration at the attached pdf)

  • Creation of the B-Jet and inclusive Jet MC Samples
    • use the fixed 50K Hijing sample I created for testing to avoid any possible weird errors that may pop up in AMPT. Hijing at this point is much more trustworthy and reliable. In addition, Hijing is able to do more than 20 events at a time.
    • Use pythia dijet embedding in a Hijing background - script in CMSSW_5_3_3_patch3
    • PYSTAT in log files should contain the tried number of events for each successful embedding event. If that doesn't work, it might be useful to simply count the number of times that PYTHIA was initialized, since it's printed every time in the logfile. Using either of these methods, it should be more or less trivial to obtain the total number of events tried before embedding. This allows us to find the equivalent number of min-bias events used to obtain the luminosity corrections for each pt-hat bin.
    • Need 50K events in pt-hat bins of:
      • 10+ (completed 12/19)
      • 20+ (completed 1/3)
      • 30+ (completed 1/15... recoing)
      • 50+ (postponed until after pPb run ((March?)) )
    • Official Production of the sample has also been requested: here
    • Basic reconstruction should work for these samples
    • Subevents are probably not treated properly in the Hijing version from 12/17, but it might be good enough for a first pass
    • Finally, create the HiForest sample using the modified B-Jet HiForest ready in CMSSW_5_3_3_patch3
    • Ensure the GenJet collections are kept!!
  • Calculation of purity
    • Extract purity from the MC samples
    • Purity = N(b-jets) / N(total jets) in a b-tagged sample (using the SSVHPT, simple secondary vertex, high purity algorithm) = $F_{b}E_{b} / (F_{b}E_{b} + F_{c}E_{c} + F_{l}E_{l})$, where $F_{x}$ = relative fraction of x jets and $E_{x}$ = tagging efficiency of x jets.
    • Then we plot the secondary vertex mass distribution for the MC samples and use the shapes of the B-jets, C-jets and light jets as templates. For this particular study, we will plan on creating only a b-jet and a light jet template.
    • Then we use these templates and apply them to the shape of the secondary vertex mass distribution in data.
      • The light flavor contribution is expected to be almost negligible after this tagger is applied
    • These templates are applied to a btagged and an anti-btagged sample for the efficiency study.
    • Using the templates, we can then calculate purity in Data and MC and fit it to a functional form:
    • After fitting the MC and Data to this function, we can then define the Scale Factor as the ratio of the Data purity fit / MC purity fit. This is to be done within 5 distinct rapidity bins and plotted as a function of $p_{T}$.
  • Calculation of efficiency
    • Once purity is calculated, finding efficiency is fairly straightforward.
    • Essentially, we have a b-tagged and an anti-b-tagged sample based on the b-tagging algorithm of our choosing (at first, we will use SSVHPT; if statistics are very poor, we will try SSVHE, or even JP).
    • Once we have the b-jet fraction in each sample (analogous to purity, derived in the previous step), it is a simple matter to calculate the efficiency:
      • and .
      • This is the calculation of the number of bjets I actually have total from both samples (in my entire sample).
      • Then, . This is easily calculated for both data and MC.
    • Once we have the b-jet efficiency for data and MC, it's a similar procedure as the purity to calculate the Data/MC Scale Factor, instead the functional form is slightly different:
  • Corrections
    • Unfolding / Smearing Corrections ($C_{smear}$) - Bin-by-bin Method. Yen-Jie has discussed that this may not be the best way for publication, but for a first look, it's fine.
      • The idea is to smear the true $p_{T}$ spectrum using the known $p_{T}$ resolutions and fit to the data using an ansatz function.
      • First we have to assume a gaussian width of the smearing, and convolute it with the ansatz function.
      • We should do this in the same 5 eta bins as everything else, and again, as a function of $p_{T}$.
      • I don't think the ansatz function itself matters too much... it should be cancelled out by the division of the data/MC, but I need to ask about this!.
      • The procedure is to find the convolution $f \otimes g$, where f is the ansatz function and g is the smearing gaussian. I believe the smearing gaussian can simply be looked up or asked for, but again it's something to be asked about.
      • For completeness' sake, the ansatz function they recommend in AN-2011/279 (formula not reproduced here) uses $\alpha$ = 5, motivated by the parton model described by Bjorken, Feynman, et al. in 1971 and 1978.
      • Finally, once we find the $f(p_{T})$ in data and MC, the smearing factor is simply f(data) / f(MC) as a function of $p_{T}$.
    • Basic Jet Energy Corrections
      • These corrections can be broken down into four parts and need to be well understood:
        • Offset Correction:
          • Corrects for background noise and pileup. Corrected via the Jet Area method:
          • Need to calculate the average pT density per unit area in bins of number of primary vertices. We then assume events that have exactly 1 primary vertex are not pile-up events and use this as the underlying event measurement.
          • Then,
          • This is calculated as a function of the leading jet pT.
        • MC Calibration
          • Corrects the average reco jet pT to the average pT of the Gen Jets. Measured as a function of the reconstructed pT and done, again in the 5 eta bins.
        • Relative Eta Correction
          • Corrects for the fact that the jet energy response depends on $\eta$ because the detector coverage depends on $\eta$.
          • Somewhat complicated correction using the asymmetry in positive and negative $\eta$ along with comparing the response in data and MC. Unfortunately this correction can affect the jet energy up to ~10% in pp, so it's probably worth the effort.
          • Start by creating an analysis sample enriched with dijets balanced in the transverse plane by applying a cut on the third jet (should one exist). Normally, this cut is applied on the ratio , where $\alpha$ usually equals 0.2.
          • Measure this response in data and MC and find the ratio of the two: .
          • Then, since the final state radiation and underlying event mechanics are not perfectly modeled in MC, we apply a correction where we extrapolate alpha to 0:
          • Finally, we define the asymmetry with respect to eta in a final equation:
          • , similar to the dijet quantity $A_{J}$.
          • Then, putting it all together gives us a really messy (but unfortunately necessary) equation:
        • Jet Resolution Effects
          • These correct any mean pT shift due to the resolution effects. It's slightly different than C(smear) because C(smear) corrects the width, and not the mean of the average jet energy response gaussian.
          • Essentially, the point is to use the photon or Z boson for calibration because the response of these two observables is much tighter than that of jets. Unfortunately, it's not clear yet whether these methods are going to be worthwhile in pPb collisions, due to possible dijet suppression. These studies rely on the fact that the total jet energy is roughly symmetric in all directions, and if jet suppression exists, this assumption is no longer true. If it is discovered that suppression is small in the Dijet paper released shortly after the pPb run, these effects will be studied. Otherwise, for now, they can be absorbed into systematic errors.
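
For the purity and efficiency steps in the outline above, a minimal Python sketch may help fix the bookkeeping. The purity function follows the formula given in the outline. The efficiency formulas did not survive in these notes, so `tagging_efficiency` is an assumption: it uses the usual counting form built from the b-jet fractions in the tagged and anti-tagged samples. All function names and example numbers are made up for illustration.

```python
def purity(f_b, e_b, f_c, e_c, f_l, e_l):
    """Purity = F_b E_b / (F_b E_b + F_c E_c + F_l E_l)."""
    tagged_b = f_b * e_b
    return tagged_b / (tagged_b + f_c * e_c + f_l * e_l)


def tagging_efficiency(n_tag, frac_b_tag, n_antitag, frac_b_antitag):
    """b-tag efficiency from tagged and anti-tagged samples.

    ASSUMPTION: the original formulas are missing from the notes; this uses
    eff = N_b(tagged) / (N_b(tagged) + N_b(anti-tagged)), where each N_b is
    the fitted b-jet fraction times the number of jets in that sample.
    """
    n_b_tag = frac_b_tag * n_tag
    n_b_antitag = frac_b_antitag * n_antitag
    return n_b_tag / (n_b_tag + n_b_antitag)


if __name__ == "__main__":
    # Made-up numbers, purely illustrative
    print(purity(f_b=0.03, e_b=0.5, f_c=0.10, e_c=0.05, f_l=0.87, e_l=0.001))
    print(tagging_efficiency(n_tag=1000, frac_b_tag=0.8,
                             n_antitag=50000, frac_b_antitag=0.02))
```

Running the same two functions on data and on MC, in each rapidity bin and as a function of pT, would give the inputs to the Data/MC scale factors described above.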

ISSUES YET TO BE SOLVED!

  • Subevent tracking in Hijing / AMPT is not yet implemented... This might lead to issues with the jet smearing, like Pelin and Yaxian are having now with the Jet Shapes paper (HIN-12-002).
  • pt-hat bins for the private study are different from those requested for official production (30+, 50+, 80+, 120+, 170+). This is probably okay, since we just want to get a feel for the procedure now, and specifically try out calculation of b-jet cross-section in the low pT range in order to overlap with possible RHIC studies (also produced by Purdue?).
  • Jet Resolution - Is suppression small enough in pPb such that dijet balancing is possible?

December 14, 2012

Uncovered the true source of the CONDOR issues at MIT. The problem is that the PYTHONPATH variable fails to pick up the new python 2.6 dependencies when sending to a remote node. The fix, then, is to add this right before you call cmsRun in your remote script:

 
PYLIB=/osg/app/cmssoft/cms/slc5_amd64_gcc462/external/python/2.6.4/lib
export PYTHONPATH=$PYTHONPATH:$PYLIB/python2.6:$PYLIB/python26.zip:$PYLIB/python2.6/plat-linux2:$PYLIB/python2.6/lib-tk:$PYLIB/python2.6/lib-old:$PYLIB/python2.6/lib-dynload:$PYLIB/python2.6/site-packages

If that fails, you might need additional dependencies that I don't have. What you can do is run your cfg via python -i cfg.py. Then call:

import sys
sys.path
And check for differences between what gets printed and what's stored in $PYTHONPATH. Whatever is missing should be appended to PYTHONPATH before running on a remote node.
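
The comparison can also be automated. The sketch below is an illustration only (the helper name `missing_from_pythonpath` is made up): it lists the `sys.path` entries that are absent from `$PYTHONPATH` and prints an export line that could be pasted into the remote script. It should be run inside the interactive session where the imports work.

```python
import os
import sys


def missing_from_pythonpath(sys_path=None, pythonpath=None):
    """Return sys.path entries that are absent from $PYTHONPATH, in order."""
    if sys_path is None:
        sys_path = sys.path
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    exported = set(p for p in pythonpath.split(os.pathsep) if p)
    return [p for p in sys_path if p and p not in exported]


if __name__ == "__main__":
    missing = missing_from_pythonpath()
    if missing:
        # Line to add before cmsRun in the remote script
        print("export PYTHONPATH=$PYTHONPATH:" + os.pathsep.join(missing))
```

Not every entry reported this way is necessarily needed on the remote node; the list is a starting point for finding the missing dependency.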

November 29, 2012

Condor Fixes due to MIT updating on 11/19 -

So it seems that I can no longer run some of the required python imports via CONDOR. Everything imports correctly when running interactively, but all the CONDOR jobs fail to import either python.os or python.cStringIO. The fixes I implemented were:

  • Removed any dependencies on ivars - this includes condor input and output filenames and the random number seeds
  • Removed any dependencies on RandomServiceHelper. Instead, you can use the bash shell $RANDOM to generate a random number in a script and then replace CONDOR_SEED with the randomly generated number in the python script. If you do this, MAKE SURE you reset the initialSeed of the random number generator in the python script after calling the generator you want
  • Removed any dependencies on FileUtils. This posed a serious problem toward reading in a filelist as an argument to a job. Instead, what I ended up doing was copying the "loadListFromFile" function in FileUtils and using the bare code to do what I want. This bare code is (thankfully) pure python and doesn't require any imports or dependencies. Essentially you just parse the list you create as a long string and cms knows how to work with it. The code is here:

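# Read the job's file list (one path per line), skipping blank lines.
# CONDOR_INPUTFILENAME is a placeholder for the actual filelist name.
retval = []
source = open('CONDOR_INPUTFILENAME', 'r')
for line in source:
    line = line.strip()
    if line:
        retval.append(line)
source.close()
# 'cms' is the config module already imported at the top of the cfg
filenames = cms.untracked.vstring(*retval)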

These fixes allow me to now run a full Gen-Sim-Digi -> Reco -> hiForest generation chain without (serious) problems as of 11/29.

November 2, 2012

Notes on the TMVA framework

This framework is particularly useful as a method to find a combination of variables which leads to strong discrimination power between c-jets and b-jets. The idea behind this framework is that it is a suite of tools to run a multi-variable analysis to separate any signal (in our case, c-jets) from any background (in our case, everything else). The one downside is that it seems to take quite some time, especially when one runs more than the simple "Cuts" and "Fisher" algorithms, which do some basic correlation analysis. There exist much more complicated methods in the TMVA suite, like neural networks, etc., but these take on the order of 100 hours to complete using 40K events and 6 or 7 variables, mainly due to the fact that the algorithms are somewhat intelligent, as they try to "learn" about the data given to increase the discrimination power.

October 30, 2012

Flavor Matching Tree

Just some notes on the information contained in the jet trees of the modified hiForest.

  • "refparton_flavor" looks for the highest pT parton and does a delta-R matching with the nearest jet. If the highest pT parton is not within delta-R of a jet, it's not flavor tagged.
  • "refparton_flavorForB" looks at all partons (not just the highest pT) and does the same delta-R matching with the jets. If more than one parton is found to correspond to a single jet, the algorithm picks the heaviest-flavor parton to assign to the jet. This is because a heavy parton and its decay products are often both seen, as in $ B \rightarrow C+e/\mu+\nu$.
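
The "refparton_flavorForB" logic can be sketched in Python as follows. This is an illustration of the matching idea, not the hiForest code: the function names, the tuple layout, and the delta-R cut of 0.3 are assumptions, and -999 is used as the "no match" value because that is the value appearing in the refparton_flavor cuts elsewhere in these notes.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in (eta, phi), with delta-phi wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def flavor_for_b(jet, partons, max_dr=0.3):
    """PDG id of the heaviest-flavor parton matched to the jet, or -999.

    jet: (eta, phi); partons: list of (eta, phi, pdg_id).
    ASSUMPTIONS: max_dr = 0.3 and the tuple layouts are illustrative choices.
    """
    def weight(pdg_id):
        # Quarks up to b are ranked by |PDG id| (b=5 > c=4 > lighter quarks);
        # gluons (21) and anything else rank lowest.
        flavor = abs(pdg_id)
        return flavor if flavor <= 5 else 0

    matched = [p for p in partons
               if delta_r(jet[0], jet[1], p[0], p[1]) < max_dr]
    if not matched:
        return -999
    return max(matched, key=lambda p: weight(p[2]))[2]
```

With this ranking, a jet containing both a b and the c from its decay is labeled as a b-jet, which is the behavior described for "refparton_flavorForB".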

Physics topic for B/C disassociation at the LHC

  • B/D mesons' properties will change as they're influenced by the hadronic medium. A disassociation allows us to look at the time evolution of the in-medium modification, as the c decays well before the b parton. This analysis will be difficult because we need to make sure we disentangle pure c-production from B to C decay.
  • Perhaps the D/B direct reconstruction could solve this decay issue.
  • Currently available theoretical models predict different behavior for B/C mesons while in-medium.
    • One from Baier, Kharzeev, Djordjevic, Wiedemann, et al. predicts in medium modification through gluon radiation
    • One from Teaney, Rapp, Molnar, Gossiaux, et al. predicts collisional energy loss (parton-parton interaction in medium)
    • One from Vitev, et al. predicts a mesonic energy loss, as the mesons recombine within the medium.
    • Finally, the AdS/CFT model from Gubser, Herzog, Horowitz, Gyulassy, et al. predicts energy loss through string interactions (string theory model?) This model is of particular interest, because comparing the $R_{AA}^{c}$ to the $R_{AA}^{b}$ gives very different results for this model than a simple radiative energy loss mechanism, as shown on Wei's slides (slide 10), linked in the October 27th section of my notes. This measurement could provide strong evidence for or against the AdS/CFT and AdS/QCD models currently used as a string picture of strong force interactions.

October 27, 2012

Talk with Matt Nguyen about doing B-Jet studies at CMS

Had a helpful conversation with Matt today about various methods to try and look at the b-jets in CMS. Here is a summary of what we talked about:

Basically, there are really two ways to do b-jet tagging in CMS. The first way is to look at the secondary vertex reconstruction. This is what many of the b-jet tagging algorithms already do. In fact, the combined secondary vertex algorithm (simpleCSV) will revert to using ghost vertices if it can't find a secondary vertex, and even then, after that, it will try to reconstruct the b-jets based on jet probability. Anyway, the secondary vertex allows a great deal of discrimination power between the b-jets and the light jets, since the tracker has such a good resolution. The main problem with this is that we don't always reconstruct all secondary vertices in the reco algorithms, so it's possible to miss some stuff, especially in the face of really heavy pile-up. In pA, I don't think this will be a problem, but it's always best to prepare for unforeseen consequences (half-life, anyone?).

The second method of b-jet tagging is done through muon tagging. For the most part, the presence of a muon pretty much ensures the presence of a heavy flavor jet, especially when you can correlate the muons to electrons. In fact, the only way that you can get an e/$\mu$ pair is to have a heavy flavor decay process happen. Using the muon triggers, then, is a good way to easily filter on events that are likely to contain heavy flavor jets. In fact, in the pA run, we expect to have a few good muon triggers, but more interestingly, we expect to have cross triggers as well. According to Matt, we should be able to have

  • PFJet20
  • PFJet40
  • PFJet20

triggers in the HLT menu for the run in Jan/Feb 2013. These types of triggers will be ideal for a heavy flavor process.

As for actually getting the b-jets / c-jets out once we have the collection of likely heavy-flavor events, there are a couple of ways to do this as well. The first way is to actually go in and do the direct $D^{0}$ reconstruction. When you do this, though, you have to worry about your statistics and your MC comparison. If the MC describes the data poorly, it could potentially not describe the $D^{0}$ reconstruction well at all. If that's the case, it might be more straightforward to use a different method, though you can always use data-driven methods of estimating efficiencies and purities. It actually might be useful to try and reconstruct all the particles I can (D, $V^{0}$, K, $\pi$, etc).

In addition, comparing the various b-tagging methods is another good data-driven way to estimate some of your efficiencies. If you can pick two completely orthogonal tagging techniques (e.g. one that relies strictly on the secondary vertex, and one that does not), you can get a handle on how efficient your secondary vertex reconstruction is. I got the impression that secondary vertices are not all that common in these events, and reconstructing them is going to be a bear. I think this should be more straightforward in pp and pA than it was in PbPb collisions, just due to the fact that we have so many fewer tracks.

It is also important to note that the Jet Energy Corrections (JEC's) are not done yet in pA (or in 5_3_3 for that matter). Are they going to be the same for light jets, c-jets and b-jets? Something else that we have to keep in mind. I can table this for now, but I need to make sure I keep an eye on it in the future.

Lastly, it would be good to have a physics interest driving this measurement. After talking with Wei, it seems like the charm and bottom mesons may interact with the medium differently because they have different masses. Wei's talk here describes the heavy-flavor interaction with the medium. Essentially, because the decay lifetimes of c's and b's are different, we can obtain another differential measurement of the evolution of the medium in time by deconvoluting the b's and c's from our measurements of medium properties.

Papers to look at:
BTV-11-004 (b-Jet Identification in the CMS Experiment)
BPH-11-022 (Inclusive b-jet production in pp collisions at $\sqrt{s}$ = 7 TeV)

October 25, 2012

Normalizing the inclusive jet samples

So to make comparisons between the heavy flavor embedded jet samples and the inclusive jet sample we have to normalize by the luminosities of the sample. Of course, the embedded B-jets will have the largest luminosity per jet, since those jets are the most rare. For my particular case, my QCD sample seems to be large enough that I can normalize by simply counting the B-jets / C-jets in each sample and scaling each sample by that particular factor. In my case, my normalization factors were:

Cuts: |jet eta| < 2.4 && jtpt > 30 && refparton_flavor = -999 && flavor Cut

QCD (b-jets): 1119 jets
B-Jet Sample (b-jets): 5579 jets
Norm. Factor: 0.2006 (1/5)

QCD (c-jets): 2844
C-Jet Sample (c-jets): 22235 jets
Norm. Factor: 0.1279 (1/8)

This is ok because recall that my C-Jet sample is 4x bigger than my B-jet sample. I should probably check for self-consistency and see the overlap between the two samples.

B-Jet Sample (light-jets): 6077
C-Jet Sample (light-jets): 23469
QCD Sample (light-jets): 33357
Norm. Factor (b/QCD): 0.1822, Norm. Factor (c/QCD): 0.7035.
But if we remember that the b-jet Nevts is 1/4 of the c-jet Nevts, 4*0.1822 = 0.7288, roughly equal to the c-jet overlap.
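
The arithmetic above can be reproduced with a few lines of Python. The jet counts are the ones quoted above; the dictionary keys and variable names are made up for this sketch.

```python
# Jet counts quoted above, after the eta, pT and flavor cuts
counts = {
    "qcd_b": 1119, "bsample_b": 5579,
    "qcd_c": 2844, "csample_c": 22235,
    "bsample_light": 6077, "csample_light": 23469, "qcd_light": 33357,
}

# Heavy-flavor normalization: QCD yield relative to the enriched sample
norm_b = counts["qcd_b"] / float(counts["bsample_b"])
norm_c = counts["qcd_c"] / float(counts["csample_c"])

# Self-consistency check with the light jets in each sample
light_b = counts["bsample_light"] / float(counts["qcd_light"])
light_c = counts["csample_light"] / float(counts["qcd_light"])

if __name__ == "__main__":
    print("norm_b = %.4f, norm_c = %.4f" % (norm_b, norm_c))
    print("light b/QCD = %.4f, light c/QCD = %.4f" % (light_b, light_c))
    # The b-jet sample has 1/4 the events of the c-jet sample
    print("4 * (light b/QCD) = %.4f" % (4 * light_b))
```

The last line is the consistency check from the text: four times the b-sample light-jet ratio should land close to the c-sample light-jet ratio.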

This obviously isn't the most robust way to correct the luminosity difference between the samples, but for a first look, it should be good enough. With these normalization factors, we should be able to obtain the purity of the samples. The way we should do this is the following:

  1. Apply the b-tag on the b-jet sample and calculate efficiency vs pT.
  2. Apply the b-tag on the QCD sample and calculate efficiency vs pT.
  3. Normalize the b-jet sample according to cross-section.
  4. Purity = b-jet sample size / (b-jet + QCD jet sample size).

In other words:

but if we recall that , then we have:

Awesome.

October 17, 2012

B-Jet Status

So I have my own sample of 10K B-Jets, 40K C-Jets and 40K QCD (inclusive) jets. As a reminder, Efficiency = Jets Passing Discriminator / Total Jets =

$N_{jets}(\mathrm{discriminator} > x) / N_{jets}$,

where x is the threshold desired for that particular algorithm. In my sample, I've been using x=0.8 as a baseline for the csvSimple algorithm and x=0.55 for the jet probability algorithm. Of course, this is really only useful if you want the efficiency as a function of some independent variable like jet $p_T$. Instead, if you want the more useful plot, where you plot the algorithm efficiency as a function of the discriminator threshold itself, it's a little more tricky. What I did was I created an array and counted the number of jets that passed various levels of the threshold, i.e. for each jet, for each x, where x goes from 0->1 in steps of 1/50, increment a counter if the jet passes. Then divide each element of the array by the total number of jets in the sample. For a good algorithm, the efficiency should decrease for all types of jets as a function of discriminator, but the light jet efficiency should decrease faster than the heavy flavor jet efficiency.
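
The threshold scan described above can be sketched as follows. This is an illustration, not the macro actually used: the function name is made up, and the choice of a strict inequality for "passing" the threshold is an assumption.

```python
def efficiency_vs_threshold(discriminators, n_steps=50):
    """Return [(x, efficiency)] for thresholds x = 0, 1/n_steps, ..., 1.

    discriminators: one discriminator value per jet.
    ASSUMPTION: a jet passes threshold x when its value is strictly > x.
    """
    n_jets = len(discriminators)
    if n_jets == 0:
        return []
    passed = [0] * (n_steps + 1)
    for value in discriminators:          # for each jet
        for i in range(n_steps + 1):      # for each threshold
            if value > i / float(n_steps):
                passed[i] += 1
    # Divide each counter by the total number of jets in the sample
    return [(i / float(n_steps), passed[i] / float(n_jets))
            for i in range(n_steps + 1)]


if __name__ == "__main__":
    # Toy discriminator values, purely illustrative
    for x, eff in efficiency_vs_threshold([0.1, 0.5, 0.9], n_steps=10):
        print("%.1f %.3f" % (x, eff))
```

Running this separately on the b-jet, c-jet and light-jet collections gives the three curves to compare; by construction each curve can only stay flat or decrease as the threshold increases.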

October 16, 2012

Some embedding notes

When embedding, we need to provide an inclusive Jet sample - not just B-jets, because we need to take into account the mistagging of the light quark jets as B-jets. In principle, a Minimum Bias sample could do the same job as an inclusive jet sample, but would require many millions of events.

So now, I'm creating embedded b-jets using a pyquen setup with Matt's help. Have to remember to check the pyquen parameters - it is unclear whether aBeamTarget = cms.double(208.0) means a pA or an AA collision. Whether it's AA or pp probably won't make much difference right now when looking at embedded b-jets.

To make light jets (for next time), it's better (easier) to re-execute the driver command. There is a cfi modification required to do this. Copy over Matt's directory here: /net/hisrv0001/home/mnguyen/scratch/CMSSW_5_3_3_patch3/src/Configuration/Generator/python and remove the bjetTrigSetting import and bjetTrigCommon. Then remake the cfg with the cmsDriver command on the existing cfg.

October 9, 2012

HiForest fixes to get the flavor information in the hiForest:

Required:

  • Check out HLTrigger/Configuration/python/HLT_PIon_cff.py for CMSSW_5_3_3_patch3
  • Check out HeavyIonsAnalysis/Configuration/python/CollisionEventSelection_cff.py for CMSSW_5_3_3_patch3 (for some reason cmsRun has a problem finding these files)

Optional (for adding b-jets):

  • Remove all non b-tagged processes from the pat_step. I have only akPu3PF and icPu5Calo jets
  • Change hiSelectedTracks -> generalTracks
  • Change hiSelectedVertex -> offlinePrimaryVertices (since there's no regional tracking in pPb)
  • Comment out all non-btagging collections b/c they're all clones of akPu3PF anyway.
  • Change DoRegitForBjets = True in runForest cfg ONLY if running PbPb. This turns on regional tracking, which isn't necessary in p-Pb.

October 8, 2012

B-tagging at CMS - Samples with a fixed hiForest such that they contain the gen parton flavor information

| Source Type | Location | Size | Short Description |
| Hijing (no embedding) | $hadoop/Hijing_BJet/RECO | 10K events | Preliminary hijing heavy flavor generation. No b-jets at all |
| AMPT (no embedding) | | | |
| Pythia q/qbar production | $scratch/PythiaBJets/PythiaBAll.root | 29K events | Test sample with pythia q-qbar production. |
| AMPT with embedding (10/17/12) | $scratch/AMPTBJetRECO | 40K events | First embedded self-production. Used Lingshan's realdata1 collection |

October 2, 2012

Hijing Generation of Heavy quarks (no embedding) -

(all these flags are on line 5833)

IHPR2(3) = 3 //Sets Hijing to heavy quark production (instead of min bias). This induces charm production by default.
IHPR2(18) = 0 //= 0 for charm, = 1 for bottom production

HIPR1(10) = 0 //turns on inclusive sample jet production
HIPR1(7) = heavy quark mass used in calculation. = 1.5 for charm (default), = 4.2 for bottom.

September 28, 2012

Hijing Notes -

- Hijing needs to be debugged in CMSSW. The biggest problem we've run into with Hijing is that we can get memory overflow errors and array overflow errors with very high multiplicity events. I'm very surprised this wasn't an issue when they were doing the Pb-Pb analysis. I think they used Pyquen exclusively, so the Hijing and AMPT errors weren't an issue then. Nevertheless, the Hijing implementation in CMSSW is not well understood. The bug fixes that Yue-shi implemented are here. They mainly relate to some array overflow errors and handling events with thousands of tracks (which is rare, but happens even in p-Pb).

- Additional note: flag IHPR2(21) = 1 (line 5833) turns on the retaining of all decayed particle information. This means that decayed B's and D's, for example, will be passed to Geant. It's not clear yet whether Geant interprets these as real particles and decays them again, but it doesn't seem likely. I think Hijing particles have a "decayed" flag, which Geant picks up on. Without this flag, you don't see any B and D particles in a standalone hijing. They do show up in geant, but the question is: is this the behavior that we want? I think yes - the CMS rule is to decay all particles with a lifetime shorter than the distance to any detector component within the generator itself, i.e. not rely on Geant to do any decays of very short-lived particles.

September 24, 2012

So I've been here for two weeks now - just need to make a quick addendum:

Places to Stay - If you're going to stay less than 2 weeks, definitely just stay in the CERN hostels. You'll spend a little more because you'll be eating either at the CERN restaurant or out, but the rooms are nice enough and the internet is fast-ish and it's really convenient to be on site.

- If you're staying for longer than 2 weeks, but less than 3 or 4 months, I really recommend the Aparthotel in St. Genis - Pouilly, France. Make sure you ask about the CERN rates. If there are two people (or 3 if you don't mind sharing a room), this is a better deal than the hostel. It's slightly cheaper on rent (though not by much), but you save a lot of money on food. You'll get a fridge and a stove top, and there's a Carrefour right down the street that has enough food to get you through a couple of months. Their link is here.

- If you're staying for longer than a few months (ideally 6 months+), look into renting a real apartment. The CERN Users' Office site has a bunch of good information on this front. I would check that out.

September 13, 2012

Arrived at CERN and started to settle in. Getting here was a hassle, so I want to put a reference here regarding the things that need to take place before you get here as a US citizen:

Check your passport - Make sure your passport has at least 6 months of validity left from the departure date.

Request a "Convention d'Accueil" - For this you'll need to contact Yasmin Yazgan at yasemin.uzunefe.yazgan@SPAMNOTcernNOSPAMPLEASE.ch. Obviously remove the spamnot part. Just tell her how long you're staying at CERN and she'll request that you provide her with a bunch of other information like name and birthdate, etc and then will draft and send you the Convention. Print it, sign it, and bring it with you.

Request a letter of employment from your university - Make sure you bring a letter signed by the department that gives your START AND END dates of employment. Just so long as the end date is after you're planning on leaving CERN, you should be fine. Print it and bring it with you.

Fill out this form - Make sure your group head signs it before you go. It's seriously a hassle if you have to fax stuff around because of the +6 hour time change. Print it and fill it out, have it signed, and bring it with you.

Make sure your Visa situation is in order - If you're staying for longer than 90 consecutive days, you'll need to apply for a visa. I don't think this is a big deal (I didn't have to do it since I'm only staying for 60 days), but I think there is documentation online for this here.

Book your flight - Geneva has very few direct flights from the United States, so unless you live on the east coast, expect to transfer planes at least once. We went from Chicago -> Washington DC -> Geneva through United.

Find a place to live - As far as I can tell, it's extremely difficult to find a place to live unless you know someone currently at CERN willing to help you out. I believe most of the time a visit to the apartment is mandatory, as most hosts want to meet their clients before signing a lease. I decided to stay a week at the CERN hostel (BOOK EARLY!) and am currently trying to find a place while here, within that first week. Expect to spend at least 1000 CHF / month for a sublease. Anything under this is either a pretty good deal, or you're about to get ripped off. Also note that many landlords require a 3 month stay for apartments, so if you're going to be at CERN for less than that amount of time, make sure you say that before you make arrangements to travel all the way out to an apartment.

Request a travel advance - Purdue offers travel advances, so if you're a broke graduate student (like me!), you'll need to pick up some spending cash before you go so you can afford to pay for an apartment or lodging or food, etc. I know Purdue requires 5 business days to process one of these, so make sure you do this at least a week or two ahead of time.

Make sure your credit card companies know you'll be abroad - Most credit card companies assume a withdrawal from Europe on a US credit card means that your card's been stolen. To avoid this, just call them and tell them you'll be in Switzerland and/or France. They'll take care of it by putting a travel notice on your card.

August 28, 2012

So Quark Matter 2012 really interrupted my CMS work. I'm planning on now moving all production to MIT since Purdue's systems give me the most annoying errors. I guess most jobs can't seem to stage-out the files that I need, so they simply crash and burn. It seems to happen on most nodes, and though they all usually fail, sometimes a few will make it through. I think there must be some kind of memory or hard drive space constraint when using the grid through Purdue, but the local submission through condor is not worth my time to figure out. It makes more sense to move all production to MIT instead.

It seems a lot has changed since this last update. For starters, we've migrated to CMSSW_5_3_3 for the AMPT production, since this will be the CMSSW version used for the Pilot Run in Sept. This was quite a bitch to get working, since they've migrated to using symbolic global tags and, therefore, it makes much more sense to use the cmsDriver.py function for each successive update of CMSSW and global tag. Some of the required data needs to be pulled in via the frontier://FrontierProd/... sourcing. In the end, you'll need to do something like this:

 cmsDriver.py Hydjet_Quenched_MinBias_2760GeV_cfi -s GEN,SIM,DIGI,L1,DIGI2RAW,HLT:GRun -n 10 --conditions auto:startup_GRun --datatier GEN-SIM-RECODEBUG --eventcontent FEVTDEBUGHLT --scenario HeavyIons --no_exec 

The auto:startup_GRun automatically pulls in the appropriate global tag that corresponds to the CMSSW version that you've established via the cmsenv command (which I'm sure sets a bunch of environment variables).

I'm travelling to CERN from Sept. 10 - Nov. 9, so hopefully I'll be able to at least see the pilot run and start work on my jet tagging code using the pilot run data. In addition, I'd really like to be trained to be an HLT on call person for the real p-Pb run in January / February.

Once my MIT account is assigned, my plan is to then work on hiForesting the data that Lingshan currently has produced on the MIT disk. Then I can start to make plots using the generated data with trigger decisions and come up with some various analyses to aid in development of the trigger menu for the production run in January.

June 27, 2012

Update on the production of AMPT-generated events. We need to have:

  • 5.02 TeV center of mass energy
  • Include the ZDC information
    • To do this, you need to use the GeometryDB_cff file
    • Along with the updated global tag. For Heavy Ion, it should be STARTHI50_V16::All
  • Include the Castor information (the very forward calorimeter, not the storage system)
    • Need to add the following lines:
      process.g4SimHits.Generator.MinEtaCut = -7.0
      process.g4SimHits.Generator.MaxEtaCut = 5.5
      process.g4SimHits.CastorSD.nonCompensationFactor = cms.double(0.77) 
  • Want to make the Pb the projectile and p the target (that way the particle spray is in the forward eta direction).
  • Need to include the rapidity shift code (written by Lingshan), check-outable from cvs via cvs co UserCode/lixu/PPbBoost/
  • Need the process to run on CMSSW_5_1_3 (5_0_1 won't submit jobs properly since it's not in the grid database)

I've included all these things in a py script, located on Purdue at ~/CMSSW_5_1_3/src/GeneratorInterface/AMPTInterface/test/pPb_vtxBoost.py

When the Simulation is complete, the raw and reco data will be posted on the PPb Simulation twiki for 2012.

The next step I need to accomplish (once this generation step is completed and verified) is to do reconstruction and then run the hiForest on the reconstructed data. This should allow me to get the L1 trigger decisions, from which I can get a good feel for the trigger rates, etc. It has already been verified that the OpenHLT produces the same information as the hiForest production, and since many more people are working on the hiForest as opposed to the HLT, I feel like it's a better decision to move to hiForest.

More updates coming soon. I'd like to have ~5K events produced by the end of the week to verify.

May 24, 2012

More hints - I figured out a way to dump all the input tags from an EDAnalyzer to a file, for easy searching if you have lots of input tags (like in the case of the OpenHLT framework). To do this, you need to:

Run python -i <configuration file> | tee output.txt , then type whatever expression you need at the interactive prompt. In my case, just process.hltanalysis.

This copies everything that is printed to the screen into a file as well, which you can then search for what you need.
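Searching the dump can also be scripted. A minimal sketch, assuming the dump prints input tags in the usual cms.InputTag("label","instance","process") form; the module and parameter names in the usage note are made up for illustration:

```python
import re

# Matches lines such as:  name = cms.InputTag("label","instance","process")
INPUT_TAG = re.compile(r'(\w+)\s*=\s*cms\.InputTag\(([^)]*)\)')

def find_input_tags(dump_text):
    """Return {parameter name: tuple of InputTag fields} found in dump_text."""
    tags = {}
    for name, args in INPUT_TAG.findall(dump_text):
        # Split the argument list and drop the surrounding quotes.
        fields = tuple(a.strip().strip('"\'') for a in args.split(','))
        tags[name] = fields
    return tags

# Hypothetical excerpt of output.txt (names are illustrative only).
sample = '''
cms.EDAnalyzer("SomeAnalyzer",
    recjets = cms.InputTag("someJetCollection"),
    PrimaryVertices = cms.InputTag("someVertexCollection","","RECO")
)
'''
for name, fields in sorted(find_input_tags(sample).items()):
    print(name, fields)
```

For a real dump, read the text with open("output.txt").read() and pass it to find_input_tags.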

In addition, if you want to dump Root TTree Content to a file, Phillipe's explanation on this page works pretty well.

May 22, 2012

Couple of hints and tips from yesterday and this morning: Using srmls and srmcp at Purdue requires a very specific voms proxy. It requires you to ensure you call the cms voms server via voms-proxy-init -voms cms . Also be sure to encapsulate any wild cards or logins in single quotes (especially if you're using a csh environment, as I am). I am considering switching to the .sh environment, because it seems like most scripts are written with this shell in mind. Translation is a simple matter though - most of the time you just need to change any export commands to a setenv and you'll be all set. A simple find and replace in vi usually takes care of this. Maybe I'll write a shell translation script. Might be useful.
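A rough sketch of what such a translation script could look like. It only handles the simple "export NAME=value" form described above and passes every other line through untouched; the function names are my own and anything more complicated (functions, if blocks, redirections) would still need to be translated by hand:

```python
import re

# export NAME=value  ->  setenv NAME value
EXPORT = re.compile(r'^(\s*)export\s+([A-Za-z_][A-Za-z0-9_]*)=(.*)$')

def sh_to_csh(line):
    """Translate one sh-style 'export' line to csh 'setenv'."""
    match = EXPORT.match(line)
    if match is None:
        return line  # not a simple export: leave the line alone
    indent, name, value = match.groups()
    return "{0}setenv {1} {2}".format(indent, name, value)

def translate(script_text):
    """Apply sh_to_csh to every line of a script."""
    return "\n".join(sh_to_csh(line) for line in script_text.split("\n"))

print(translate("export SCRAM_ARCH=slc5_amd64_gcc462\necho done"))
```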

As for the trigger primitive matching, it's been significantly more challenging than I expected. For some reason, I can't get the code that Matt Nguyen provided, here, to work. There's a mismatch between the reconstructed vertex input tag that I have in the AMPT-generated files and the input tag the code is looking for. I changed the names to match just fine, but the code is looking for an object of type std::vector<reco::Vertex>, while the ntuple has type edm::Wrapper< vector<reco::Vertex> >. Maybe I can remove the wrapper somehow? I guess that's today's project. I emailed Matt and Frank asking for a solution, but no response yet.

Finally, I have to make sure to be very careful not to exceed quota at the purdue home directory. Returned files from crab jobs very quickly fill up a directory and I have to remember to crab -clean or at least delete the root files after I've retrieved them. I only have 4.6 GB, which goes fast.

May 17, 2012

I need to look into this trigger reconstruction further. The presentation I had last week was a simple ratio of events that fired the trigger to total events, which is nice if you're looking at pure trigger rates, but if you want a more robust analysis of what's actually firing the triggers, you need to be able to match the reconstructed jets to their L1 triggered primitives, i.e. did this jet actually fire this trigger, or was it a random other physics process?

My thoughts were to include the MC data into the OpenHLT framework, but this is proving more difficult than I originally thought. It looks like the OpenHLT doesn't do any native reconstruction, so I might need to add that in manually if I'd like to ensure the MC data gets filled into the OpenHLT ntuple. Everything I've seen online here makes it look like OpenHLT should be able to run on RAW data. Further study is needed. I emailed Frank Ma and Matt Nguyen to confirm. Hopefully this can be resolved by the end of the week.

May 16, 2012

Looked at $\tau$ tagging and b-tagging of jets for the past couple of days. I'll start with b-tagging.

B-tagging of jets is useful because it allows us direct access to a few very interesting physics channels, including top quarks, Higgs bosons, and Supersymmetric (SUSY) particles. All these primarily decay into bottom quarks (top quarks through the CKM matrix, Higgs because its decays depend on the fermion mass, and SUSY particles ??). There are 5 basic algorithms used to do b-tagging of jets. Most of these tagging algorithms use the distinctive lifetime properties of the b-hadrons ($\tau \approx 1.5$ ps, $c\tau \approx 450$ $\mu$m). These tagging schemes are usually applied offline, but some that are quick enough can be applied at the HLT Level. I will do my best to denote each algorithm that can be applied at HLT (usually level 2.5 or 3).

Physics Object Reconstruction

Many of the b-tagging algorithms require properly tagged lower-level physics objects. Since most b's create jets, jet tagging is crucial, especially since we're looking for b's within jets to begin with. Most b-tagging algorithms requiring jets use the iterative cone algorithm with a cone size R = 0.5. In addition, we require extremely precise track reconstruction, especially close to the interaction point. Indeed, the track measurement precision in terms of the impact parameter is probably the most crucial measurement required to effectively tag b-particles. These measurements allow us to directly measure the long lifetime of the b-particle.

Track Impact Parameter-based Tags

b tagging graphic.png
B-Tagged Jet Example [1]

These types of algorithms rely heavily on the efficient track reconstruction mentioned above. In essence, this algorithm sorts all tracks from an event in terms of a three-dimensional "impact parameter," which characterizes how positively displaced (along the jet direction, from the primary vertex) the particle track is. For example, if a jet is strongly characteristic of a b-particle, we would find a few tracks that have a vertex displaced a bit from the primary vertex and, most importantly, these tracks would be displaced in the same direction as the jet. See picture to the right. This secondary vertex is the point of decay of the b-particle, so the more "along the jet" the secondary vertex is, the more characteristic the jet is of a b-jet. In addition, we usually sign the displacement: positive if the track is displaced along the jet direction, negative if it is displaced against it. Negatively displaced tracks often arise from the displacement calculation. They have many causes, but primarily the finite detector resolution, which can yield a poor measurement of the track parameters or even of the primary vertex. Beam pipe and pixel layer scattering can also affect these measurements. These negative parameters are useful, however, as one usually uses them to obtain the b-tagging efficiency for jets that contain no b-particles or c-particles.
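The signing convention can be sketched in a few lines. This is only an illustration, assuming the track's point of closest approach to the primary vertex is already known; the function names and the units are my own choices:

```python
import math

def signed_impact_parameter(primary_vertex, closest_point, jet_direction):
    """Return the 3D impact parameter, signed with respect to the jet.

    primary_vertex, closest_point: (x, y, z) positions in the same units.
    closest_point is the point on the track closest to the primary vertex.
    jet_direction: (x, y, z) vector along the jet axis (any normalization).
    """
    displacement = [c - p for c, p in zip(closest_point, primary_vertex)]
    magnitude = math.sqrt(sum(d * d for d in displacement))
    along_jet = sum(d * j for d, j in zip(displacement, jet_direction))
    # Positive if the displacement points into the jet hemisphere.
    return magnitude if along_jet >= 0 else -magnitude

def significance(impact_parameter, uncertainty):
    """Impact parameter divided by its measurement uncertainty."""
    return impact_parameter / uncertainty

# Track displaced along a jet pointing in +x, then against it.
print(signed_impact_parameter((0, 0, 0), (0.03, 0.04, 0.0), (1, 0, 0)))
print(signed_impact_parameter((0, 0, 0), (-0.03, 0.04, 0.0), (1, 0, 0)))
```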

Track Counting Algorithm (HLT-capable). Inputs: Impact Parameter Significance, N tracks per jet

Very simple algorithm, though it requires fast online jet reconstruction (usually iterative cone). First, the tracks in a jet are ordered by decreasing impact parameter significance. Then the Nth track in this ordering is examined, where N is usually either 2 (for higher speed) or 3 (for higher purity). If the Nth track passes the designated impact parameter significance threshold, the jet is tagged as a b-jet.
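A minimal sketch of the track counting tag, assuming the impact parameter significances of the tracks in a jet are already available as a plain list. The threshold value in the example is arbitrary, not a CMS working point:

```python
def track_counting_discriminator(significances, n=2):
    """Return the n-th highest impact parameter significance in the jet,
    or None if the jet has fewer than n tracks."""
    if len(significances) < n:
        return None
    ordered = sorted(significances, reverse=True)
    return ordered[n - 1]

def is_b_tagged(significances, threshold, n=2):
    """Tag the jet if its n-th track passes the significance threshold."""
    discriminator = track_counting_discriminator(significances, n)
    return discriminator is not None and discriminator > threshold

jet_tracks = [0.5, 6.1, -1.2, 3.4, 2.0]   # illustrative significances
print(is_b_tagged(jet_tracks, threshold=3.0, n=2))
print(is_b_tagged(jet_tracks, threshold=3.0, n=3))
```

Requiring the 3rd track instead of the 2nd to pass the same threshold is the stricter condition, which is where the gain in purity comes from.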

Jet Probability Algorithm Inputs: Impact Parameter Significance

This algorithm is significantly more complicated, though it runs with the same basic idea as the track counting algorithm. In this case, though, instead of simply having a track pass a threshold, we require the total jet to pass some certain probability threshold. First, we use the tracks with a negative impact parameter to extract a resolution function R. The negative tracks are used here because they are mainly primary tracks (almost assuredly not b-jet tracks). This resolution function, then, is used as a distribution of impact parameter significance, binned in a fine enough grain such that we avoid any artificial binning effects. Once the resolution function is determined, we then say that the signed probability of a track to have come from the primary vertex is expressed as $P_{tr}(S) = \mathrm{sgn}(S) \int_{|S|}^{\infty} R(x) dx$, where S is the impact parameter significance of the track. Since jets have more than one track, however, we must also express the probability that a jet (as a collection of tracks) comes from the primary vertex. We express the probability that any of a collection of N tracks did not come from a long lived particle (or, equivalently, the probability that all of the N tracks came from the primary vertex) as $P_{jet} = \Pi \sum_{j=0}^{N-1} \frac{(-\ln \Pi)^{j}}{j!}$, where $\Pi = \prod_{i=1}^{N} \tilde{P}_{tr}(i)$. $\tilde{P}_{tr}$ is used to weight for negative track impact parameters, as it is more likely that the track is a primary track if the impact parameter is negative. Therefore, we define it as $\tilde{P}_{tr} = P_{tr}/2$ for positive $P_{tr}$, and $\tilde{P}_{tr} = 1 + P_{tr}/2$ for negative $P_{tr}$. Finally, using this ordered probability, we pick out the jet candidates that are most likely from b-jets.
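As a sketch of how the jet-level number could be computed, assume the signed track probabilities P_tr (in [-1, 1]) have already been obtained from the resolution function, and assume the usual form P_jet = Pi * sum_{j=0}^{N-1} (-ln Pi)^j / j!, with Pi the product of the rescaled track probabilities (P_tr/2 for positive values, 1 + P_tr/2 for negative ones). The function names are my own:

```python
import math

def rescaled_track_probability(p_track):
    """Map a signed track probability in [-1, 1] onto [0, 1]."""
    return p_track / 2.0 if p_track > 0 else 1.0 + p_track / 2.0

def jet_probability(track_probabilities):
    """Probability that all tracks in the jet come from the primary vertex."""
    n = len(track_probabilities)
    if n == 0:
        return 1.0  # no tracks: nothing hints at a long-lived particle
    product = 1.0
    for p in track_probabilities:
        product *= rescaled_track_probability(p)
    if product <= 0.0:
        return 0.0
    log_term = -math.log(product)
    total = sum(log_term ** j / math.factorial(j) for j in range(n))
    return product * total

# Jet with very displaced tracks vs. a jet with mostly negative tracks.
print(jet_probability([0.01, 0.02, 0.01]))
print(jet_probability([-0.5, -0.8, 0.9]))
```

A small jet probability means the jet is unlikely to be made of primary tracks only, so b-jet candidates are the ones with the lowest values.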

Other Algorithms Inputs: Vary

There also exist some leptonic algorithms for b-tagging jets that exploit the fact that the branching ratio for b into electrons and muons is ~19% for each family. These algorithms are more rare, and are usually used in conjunction with the impact parameter significance algorithms described above, so I won't go into too much detail. Basically, the idea is you find the leptons among the tracks that are associated with a jet. This requires a very pure sample of electrons and muons, though various track cuts exist to obtain pure samples of both these particles. The majority of the work and CPU power associated with this method is non-lepton rejection and elimination of sample contamination.

One can also reconstruct the secondary vertex and apply various selection cuts on it to exclude $V^{0}$ decays, etc. This is the combined secondary vertex tag and is usually used in conjunction with other algorithms.

Finally, we can do some b-tagging at the HLT Level. Generally, this technique uses only the pixel hits to do reconstruction, which provides a significant decrease in sensitivity, but enough of a boost in speed to allow for the online tagging to occur at all. Nevertheless, it turns out that this technique is sufficiently reliable that meaningful online b-tagging can be done. Using this shortcut, along with taking only the top 2 jets in $E_{T}$, gives us enough of a performance enhancement that we can do the track counting algorithm (described above) at the HLT level.

May 14, 2012

The HLT and Reco Jet reconstruction algorithms are not incredibly complicated, which is nice. I aim to give a summary below. We also must remember that the HLT trigger moves the 3 kHz L1 trigger rate down to the ~100 Hz write to disk rate. As an overall theme, if the reconstruction fails or the event is suddenly deemed uninteresting, the reconstruction is stopped at that point and the event is discarded. This includes both the HLT and Reco steps.

HLT Jet Finding Overview

Even with a fast iterative cone method, the HLT algorithm cannot keep up with the L1 readout unless there's a very high jet energy threshold (1 Hz = 650 GeV (one jet)). Usually you need something else besides the pure jet $E_{T}$ to have an acceptable rate with an acceptable threshold. Also note that each calorimeter tower corresponds to 1 HCal cell and one 5x5 array of PbWO4 ECal crystals. Noise elimination thresholds are usually in accordance with "Scheme T" -> $E_{T}$ > 0.5 GeV && E > 0.8 GeV.

Finally, we must mention the recombination schemes. Since information is stored at the particle level, the jet algorithms eventually need to combine the individual particles' momenta and define the collection of particles as a single "jet" entry. There are two schemes to do this. In all cases, the Jet $E_{T} = \sum E_{T}$ (particle).

  • Single Angle Recombination: $\sin\theta = \sum E_{T} / E$, where E is the jet energy (usually used with cone algorithms)
  • Vector Recombination: $\eta = \sum E_{T,i} \eta_{i} / \sum E_{T,i}$ and $\phi = \sum E_{T,i} \phi_{i} / \sum E_{T,i}$ (usually used with the kT algorithms)

In all cases, of course, the Jet $E_{T} = p_{T} c$.

Online Reconstruction

HLT Algorithm - Iterative Cone. Inputs: R = size of cone, $\Delta E$ = iteration energy change

This is the fastest cone finding algorithm, which also means that it's the least robust. Offline reconstruction is much more accurate, but this iterative cone works well for a first pass. Also keep in mind that the iterative cone algorithm can take a maximum of 4 jets per event, as that's what the L1 trigger can maximally find. The iterative cone method works by:

  • Order a list of all particles in an event by $E_{T}$.
  • Cast a cone of size R in $\eta$ and $\phi$ space around the input object with the largest $E_{T}$.
  • Find the "proto-jet" using all particles in the size R cone using one of the recombination schemes (usually Single Angle).
  • Use the computed proto-jet direction to seed a new proto-jet.
  • Iterate until the proto-jet energy change is less than the desired $\Delta E$ (usually 1%) and the direction change $\Delta R$ < 0.01.
  • Remove the particles in the stable proto-jet from the list, add the proto-jet to the jet list, and repeat with the largest remaining $E_{T}$.
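The iteration can be sketched as follows. This is a toy version, not the CMSSW implementation: particles are plain (E_T, eta, phi) tuples with positive E_T, the proto-jet is built with an E_T-weighted average of eta and phi rather than the Single Angle scheme, the seed threshold is a made-up parameter, and the second stopping cut is assumed to be a direction change below 0.01. Constituents of a stable proto-jet are removed before the next seed is taken.

```python
import math

def delta_phi(phi1, phi2):
    """Return phi1 - phi2 wrapped into (-pi, pi]."""
    d = phi1 - phi2
    while d > math.pi:
        d -= 2.0 * math.pi
    while d <= -math.pi:
        d += 2.0 * math.pi
    return d

def delta_r(eta1, phi1, eta2, phi2):
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))

def recombine(cone, phi_ref):
    """E_T-weighted proto-jet (et, eta, phi); phi averaged around phi_ref."""
    et = sum(p[0] for p in cone)
    eta = sum(p[0] * p[1] for p in cone) / et
    phi = phi_ref + sum(p[0] * delta_phi(p[2], phi_ref) for p in cone) / et
    return et, eta, phi

def iterative_cone(particles, radius=0.5, seed_et=1.0, max_iter=100):
    """Return a list of jets (et, eta, phi) from (et, eta, phi) particles."""
    remaining = sorted(particles, reverse=True)  # highest E_T first
    jets = []
    while remaining and remaining[0][0] >= seed_et:
        et, eta, phi = remaining[0]
        cone = [remaining[0]]
        for _ in range(max_iter):
            trial = [p for p in remaining
                     if delta_r(p[1], p[2], eta, phi) < radius]
            if not trial:
                break
            cone = trial
            new_et, new_eta, new_phi = recombine(cone, phi)
            stable = (abs(new_et - et) < 0.01 * et and
                      delta_r(new_eta, new_phi, eta, phi) < 0.01)
            et, eta, phi = new_et, new_eta, new_phi
            if stable:
                break
        jets.append((et, eta, phi))
        # Remove the proto-jet constituents before picking the next seed.
        remaining = [p for p in remaining if p not in cone]
    return jets

event = [(50.0, 0.0, 0.0), (10.0, 0.1, 0.1), (30.0, 2.0, 3.0),
         (5.0, 2.1, 3.1), (0.5, -3.0, 1.0)]
for jet in iterative_cone(event):
    print(jet)
```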

Offline Reconstruction

These next few reconstruction methods are all significantly slower than the iterative cone algorithm, but seem to be significantly more robust, especially in terms of jet splitting and merging. In fact, this area is where the next algorithm thrives.

Midpoint Cone Algorithm - Mid speed. Inputs (size of cone R, jet energy overlap threshold f)

When proto-jets are calculated here, the objects in the proto-jets are not removed from the particle collection, as they were in the iterative cone method. Instead, for each pair of protojets within R, a midpoint is defined. Then, this midpoint is used as a seed to find more protojets. Essentially, what we do is:

  • Starting with the highest particle $E_{T}$, discover if other particles lie within a distance R of the particle
    • If not, then the proto-jet is defined as a jet and we remove this particle from the list
    • If yes, then continue
  • Looking at our two overlapping proto-jets, we calculate the shared energy between the jets
    • If the shared energy > f (f is usually ~50%), the proto-jets are merged.
    • Else, the particles are individually assigned to the closest proto-jet in $\eta$ and $\phi$ space.
  • This repeats, starting with the proto-jet with the highest remaining $E_{T}$, until all jets have been located.

Inclusive $k_{T}$ Algorithm - Mid speed. Inputs (single particle weight $R^{2}$)

This is a cluster-based algorithm, meaning that it does not require a single particle seed to start with, as do the iterative and midpoint cone algorithms. Instead, what we do is we look at all particles in an event, along with all pairs of particles in an event and sort them by isolation. Essentially, the algorithm works like this:

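  • For each object i and each pair ij, we find:
    • $d_{i} = E_{T,i}^{2} R^{2}$, where $R^{2}$ is a dimensionless weighting parameter, usually equal to 1.
    • $d_{i,j} = \min(E_{T,i}^{2}, E_{T,j}^{2}) R_{i,j}^{2}$, where $R_{i,j}^{2} = (\eta_{i} - \eta_{j})^{2} + (\phi_{i} - \phi_{j})^{2}$
  • If $d_{i}$ is the smallest, then the object is designated as a jet, removed from the particle list, and filled into the jet list.
  • If $d_{i,j}$ is the smallest, then the objects are merged (usually using the vector recombination method) and re-added to the particle list as a single particle.
  • This repeats until the particle list is empty.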
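A brute-force sketch of the clustering loop, assuming the standard inclusive kT distances d_i = E_T,i^2 R^2 and d_ij = min(E_T,i^2, E_T,j^2) R_ij^2 with R_ij^2 = (delta eta)^2 + (delta phi)^2, and an E_T-weighted merge of (eta, phi). Particles are (E_T, eta, phi) tuples; this is O(N^3) and only meant to show the logic, not to replace a real jet library:

```python
import math

def delta_phi(phi1, phi2):
    """Return phi1 - phi2 wrapped into (-pi, pi]."""
    d = phi1 - phi2
    while d > math.pi:
        d -= 2.0 * math.pi
    while d <= -math.pi:
        d += 2.0 * math.pi
    return d

def merge(a, b):
    """E_T-weighted recombination of two objects (et, eta, phi)."""
    et = a[0] + b[0]
    eta = (a[0] * a[1] + b[0] * b[1]) / et
    phi = a[2] + b[0] * delta_phi(b[2], a[2]) / et
    return (et, eta, phi)

def kt_cluster(particles, r_param=1.0):
    """Cluster (et, eta, phi) particles; return jets sorted by E_T."""
    objects = list(particles)
    jets = []
    while objects:
        best = None  # (distance, i, j); j is None for a single-object d_i
        for i, a in enumerate(objects):
            d_i = a[0] ** 2 * r_param ** 2
            if best is None or d_i < best[0]:
                best = (d_i, i, None)
            for j in range(i + 1, len(objects)):
                b = objects[j]
                r2 = (a[1] - b[1]) ** 2 + delta_phi(a[2], b[2]) ** 2
                d_ij = min(a[0] ** 2, b[0] ** 2) * r2
                if d_ij < best[0]:
                    best = (d_ij, i, j)
        _, i, j = best
        if j is None:
            jets.append(objects.pop(i))      # d_i smallest: final jet
        else:
            b = objects.pop(j)               # j > i, so pop j first
            a = objects.pop(i)
            objects.append(merge(a, b))      # d_ij smallest: merge pair
    return sorted(jets, reverse=True)

event = [(50.0, 0.0, 0.0), (10.0, 0.1, 0.1), (30.0, 2.0, 3.0), (5.0, 2.1, 3.1)]
for jet in kt_cluster(event):
    print(jet)
```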

SIS-Cone Algorithm - Mid to Slow Speed. Inputs (Cone radius R, iteration steps N)

This is a more complicated algorithm, and one that needs to be looked at further. From what I can understand, this SIS-Cone (Seedless Infrared Safe Cone) is seedless, meaning that the highest $E_{T}$ particle is not influencing where the jet will likely be. Instead, the algorithm searches for "stable jets," which are jets that have the same direction and energy regardless of whether the particles located on the edge of the cone radius are included or not. It seems to work a little better than the pure Midpoint Cone and the kT algorithms. The general method seems to be:

  • For all particles in event
  • Find Stable Cones
  • Add each stable cone to the list of jet objects
  • Remove particles within a stable cone from the candidate list
  • Repeat N times or until no new cones are found.

For finding stable cones, the procedure is outlined as such:

  • For each particle i
    • Find all particles j within a distance 2R of i.
    • For each particle j, identify the two circles where i and j lie on the circumference
    • Compute angle (in y and $\phi$ space) of each circle's center to the location of particle i. (y = rapidity)
    • Call this angle $\zeta$, where $\zeta = \arctan(\Delta\phi_{iC} / \Delta y_{iC})$ and C is the circle's center
    • Sort circles into increasing $\zeta$.
    • For all four permutations of including and excluding the edge points i and j
      • If cone containing inside particles has not been found (using bitwise XOR to increase speed), add it to the list of cones.
      • If the cone momentum is the same with or without the edge points, define it as stable.
    • Do this for all circles
  • For each stable cone, check explicitly its stability, then add it to the list of proto-jets.

Then, we take our list of proto-jets and apply the midpoint cone algorithm to obtain a list of true jets.

Anti-kT Algorithm - Mid speed. (Inputs: single particle weight $R^{2}$)

antiKtPerf.gif
Anti-kT Algorithm Performance

This algorithm is essentially the same as the kT algorithm, but the way it computes the distances $d_{i}$ and $d_{i,j}$ works a little differently. In the algorithm paper, it is shown that using $E_{T}^{2}$ weights works quite well, but it is also shown that using $E_{T}^{-2}$ may work even better. In the plot to the right (taken from the anti-kT paper, published here), we find that the anti-kT algorithm has a significantly better jet structure - i.e. the largest jets have the most conical shape, while the smaller jets tend to get mixed with the background. To use this algorithm, we simply have to modify the distances $d_{i}$ and $d_{i,j}$ from the kT algorithm to be:

  • $d_{i} = E_{T,i}^{-2} R^{2}$, where $R^{2}$ is a dimensionless weighting parameter, usually equal to 1.
  • $d_{i,j} = \min(E_{T,i}^{-2}, E_{T,j}^{-2}) R_{i,j}^{2}$, where, as before, $R_{i,j}^{2} = (\eta_{i} - \eta_{j})^{2} + (\phi_{i} - \phi_{j})^{2}$
and follow the same steps as we do in the kT algorithm.
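A small numerical illustration of why the sign of the exponent matters, assuming the pair distance has the form min(E_T,i^p, E_T,j^p) R_ij^2 with p = 2 for kT and p = -2 for anti-kT. The particle energies and separations are made up:

```python
def pair_distance(et_i, et_j, r_ij, power):
    """Pair distance with exponent 'power' (2 for kT, -2 for anti-kT)."""
    return min(et_i ** power, et_j ** power) * r_ij ** 2

hard, soft = 100.0, 1.0        # illustrative E_T values in GeV
r_soft_soft = 0.2              # two soft particles close to each other
r_hard_soft = 0.4              # a soft particle further from the hard one

for name, power in (("kT", 2), ("anti-kT", -2)):
    d_ss = pair_distance(soft, soft, r_soft_soft, power)
    d_hs = pair_distance(hard, soft, r_hard_soft, power)
    first = "soft-soft" if d_ss < d_hs else "hard-soft"
    print(name, "merges the", first, "pair first")
```

With p = 2 the soft pair is merged first, while with p = -2 the soft particle is attached to the hard one first, which is why anti-kT jets grow outward from the hardest particles and end up conical.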

May 11, 2012

Looked for a long time at Jet Trigger Algorithms today. Found a good summary from this paper:

Page 693:

L1 Jet-finding algorithm. The jet trigger is based on sums of the transverse energies of electromagnetic and hadron calorimeters in non-overlapping towers 4 × 4 in size (0.35η × 0.35φ region). Simulation has shown that the jet-finding trigger based on summing $E_{T}$ from 16 trigger towers in the region of 0.35 × 0.35 in the η and φ directions quite satisfies the LHC experimental conditions. An adequately acceptable result was obtained for both jets with high momentum pT, generated by standard QCD processes, and jets from decays of SUSY particles with masses above 300 GeV. The relatively small jet regions provide certain advantages for implementation of triggers with many jets, since jets are more easily resolved in the η and φ directions. The jet-trigger region has a 10-bit dynamic range covering energies up to 1000 GeV. The values of local jet sums are sorted according to their transverse sums to obtain the top ranks of jets. Moreover, the jet trigger simplifies the search for isolated τ leptons, which can be produced by decays of MSSM (Minimal Supersymmetric Standard Model) Higgs particles. The data on jet candidates are obtained by simple summation of the values of $E_{T}$ from 16 electromagnetic and 16 hadron towers. The same sums are used as input data for the total and missing energy missing $E_{T}$ of sums, which together with the tower center position are used to calculate various $E_{T}$ components. Neutrino identification consists in calculation of the missing $E_{T}$ vector and checking this value by the preset threshold.

Gave a presentation today on the L1 trigger spectra and threshold curves. This is located here.

Had an extensive read today on the L1 Jet Trigger Algorithm. Ideally, I'd like to know the entire process from triggering through HLT through reconstruction. Wei then asked me to write an internal Purdue analysis note for the Jet Trigger. The discussion that follows will be enhanced by using the diagrams on the right hand side of the page.

JetTrgAlgo.gif
Calorimeter Tower Configuration

L1JetTrgConfig.gif
L1 Jet Configuration

The Jet Trg. Algorithm uses the $E_{T}$ in each 4x4 tower section (calorimeter "region"). The Jets are then characterized by the $E_{T}$ in each 3x3 block of calorimeter regions (12x12 towers). The jets can be defined as "$\tau$ - like" if no $\tau$ veto bits are set in any region. These $\tau$ veto bits are set if there exist > 2 active Ecal & Hcal towers in any region.

Overall, there are 72 $\phi$ towers and 56 $\eta$ towers (|$\eta$| < 5, full 2 $\pi$ of coverage). 20 regional crates control the individual towers: 18 control the barrel and endcaps (|$\eta$| < 3), 1 crate handles the Hadronic Forward Calorimeter (HF, 3 < |$\eta$| < 5), and finally 1 last crate collects the regional information from the other crates. This "Master" crate sends the jet/tau candidates with location info and $E_{T}$. Tau rejection is nice for jet-tagging as tau events often look just like jets. I don't fully understand how the veto rejection bits work, so I might add this to the to do list (done). The tower size of the HF cal is larger than that of the barrel, such that a 12x12 tower region (in hcal or ecal) corresponds to a 3x3 tower region in the HF cal. The center of the jet is picked out using a simple algorithm for the 4x4 regions. If the region's $E_{T}$ > that of the regions to the right and bottom, and its $E_{T}$ >= that of the regions to the left and top, that region is a center region for jets. Finally, the algorithm can keep only the highest 4 energies for central and forward jets, and (I think) the highest 4 energies of central taus. Therefore, up to quad jet triggers are possible.
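The center-finding rule and the 3x3 sum can be sketched on a toy grid of region $E_{T}$ values. Several things here are my own assumptions for illustration: only the four direct neighbors are compared, "right" means the next $\phi$ index and "bottom" the next $\eta$ index, $\phi$ wraps around, and regions outside the $\eta$ range count as zero. It also ignores the central / forward / tau split and just keeps the top 4.

```python
def find_l1_jets(region_et, max_jets=4):
    """region_et: 2D list indexed [eta][phi] of region E_T values.
    Returns up to max_jets tuples (jet_et, ieta, iphi), highest E_T first."""
    n_eta = len(region_et)
    n_phi = len(region_et[0])

    def et(ieta, iphi):
        if ieta < 0 or ieta >= n_eta:
            return 0.0                      # outside the eta coverage
        return region_et[ieta][iphi % n_phi]  # phi is periodic

    jets = []
    for ieta in range(n_eta):
        for iphi in range(n_phi):
            center = et(ieta, iphi)
            if center <= 0.0:
                continue
            # Strict on one side, non-strict on the other, so that two
            # equal neighboring regions give exactly one jet center.
            is_center = (center > et(ieta, iphi + 1) and
                         center > et(ieta + 1, iphi) and
                         center >= et(ieta, iphi - 1) and
                         center >= et(ieta - 1, iphi))
            if not is_center:
                continue
            jet_et = sum(et(ieta + de, iphi + dp)
                         for de in (-1, 0, 1) for dp in (-1, 0, 1))
            jets.append((jet_et, ieta, iphi))
    jets.sort(reverse=True)
    return jets[:max_jets]

grid = [[0.0] * 6 for _ in range(5)]
grid[2][3] = 20.0
grid[2][4] = 5.0
grid[1][3] = 3.0
print(find_l1_jets(grid))
```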

Things left to do:
  • Obtain information about the HLT Triggers (Specifically the Jet HLT Trigger Algorithms)
  • Figure out how to change the Jet Reconstruction algorithm in the OpenHLT framework. Seems like ak5CalCorJets is not a common algorithm for HI.
  • Ensure we're doing Heavy Ion reconstruction (not pp reco) when we build the jets using OpenHLT. This needs to be revisited.
  • Further study on the tau veto bits.

May 2, 2012

Calculating Trigger Efficiency Curves

Need various trigger thresholds to obtain this spectrum. The current ntuples (as of 5/1) only contain 1 trigger threshold. I think the OpenHLT production can be adjusted for any trigger threshold for certain triggers (specifically Jet triggers and EG triggers). Each Trigger Menu (located here) contains a fairly extensive set of triggers, so I decided to build the plots of the efficiency curves for the various triggers that were in the collection. The plots of the ratio of passed events to total events vs trigger threshold can be made, but with only a few points at a time (i.e. Jet 16, Jet 36, Jet 52, etc...). These plots will be shown in a presentation on May 11.
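The ratio itself is simple to compute once the per-event information is in hand. A sketch, assuming (for illustration only) that a jet trigger fires whenever the leading jet $E_{T}$ is at or above its threshold; with a real ntuple one would read the stored trigger bits instead. The event values are made up:

```python
def trigger_pass_fraction(leading_jet_et, thresholds):
    """Return [(threshold, passed events / total events)]."""
    n_total = len(leading_jet_et)
    if n_total == 0:
        raise ValueError("no events")
    points = []
    for threshold in thresholds:
        n_pass = sum(1 for et in leading_jet_et if et >= threshold)
        points.append((threshold, n_pass / float(n_total)))
    return points

events = [10, 20, 40, 60, 80]          # leading jet E_T per event (GeV)
for threshold, fraction in trigger_pass_fraction(events, [16, 36, 52]):
    print("Jet", threshold, fraction)
```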

March 28, 2012

L1 Trigger Algorithm

Mostly written by MIT students (MIT and Vanderbilt basically run the HI @ CMS).

CMS Working Groups (HI) PinG

  • Spectra
  • Flow / correlations
  • Dilepton
  • Jet
  • Dimuon

Tracking needs to be improved before charm-tagged jets can be observed.

  • Start from p+p -> Move to Heavy Ion?

March 6, 2012

The CMS Experiment

cmsSchematic.gif
Layout of the CMS Experiment

Dimensions

  • 21.6 m long
  • 14.6 m diameter
Magnet
  • 4 T inner field, 2 T return yoke through muon detector
  • 13 m long
  • 6 m inner diameter
Tracking Volume
  • 5.8 m length
  • 2.6 m diameter
Design Luminosity
  • $10^{34} cm^{-2} s^{-1}$
  • 25 ns bunch crossing
Inner Tracking System
  • $| \eta |$ < 2.5, efficient to $| \eta |$ < 2.0

February 29, 2012

Generating MinBias events from AMPT (impact parameter (b) from 0 to 10)

  • Needs:
    • MC Info
      • Particle Corresponding to each track
      • Particle ID Distribution
    • Reco Info
      • Reco Track <--> MC track correspondence.
      • pT Distribution for each particle
      • |eta| for each particle
      • (pT (Reco) - pT (MC)) / pT (MC) vs pT (MC) (Resolution)
      • No. of Silicon tracker hits associated with each track for each PID

  • Schematic:

Pass generator -> [Virtual Experiment (Geant 4)] -> [MC Info] -> [Reconstruction, based on hit smearing] -> [Reco Hits] -> [Reco Tracks]
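
The resolution quantity in the Needs list, (pT (Reco) - pT (MC)) / pT (MC), can be sketched as below once reco tracks have been matched to MC tracks. The matching itself is not shown; the function name `pt_resolution` and the input format (a list of already-matched (reco pT, MC pT) pairs) are assumptions for illustration only.

```python
def pt_resolution(matched_pairs):
    """Return (pt_mc, (pt_reco - pt_mc) / pt_mc) for each matched pair.

    matched_pairs is a list of (pt_reco, pt_mc) tuples; pairs with
    pt_mc <= 0 are skipped to avoid dividing by zero.
    """
    return [
        (pt_mc, (pt_reco - pt_mc) / pt_mc)
        for pt_reco, pt_mc in matched_pairs
        if pt_mc > 0
    ]


# toy matched tracks: (reco pT, MC pT) in GeV
pairs = [(1.5, 1.0), (4.0, 5.0), (2.0, 0.0)]
for pt_mc, res in pt_resolution(pairs):
    print(f"pT(MC) = {pt_mc:.1f}  resolution = {res:+.2f}")
```

Plotting the second value against the first gives the resolution vs pT (MC) distribution.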

January 30, 2012

GRID Takes care of file finding and shipping

p-Pb Simulation

  • Trigger analysis -> which trigger fires and how often
  • L1 Trigger
  • HLT (not L0 / L1)

To Do

  • Study GRID
  • Copy any root file - Draw pT, eta, multiplicity spectrum

January 26, 2012

CMSSW-SW Hierarchy

Collision -> (L1 Trigger + HLT Trigger) -> RAW -> (tracking, vertexing, Particle ID) -> RECO -> (Organization + RECO Summary) -> AOD

PAT - Physics Analysis Toolkit (Pat-tuples)

  • Menu in restaurant -> People order different things, but all from the same menu
  • Allows easy interaction w/ RECO and AOD data.
  • PATs are different depending on physics being analyzed
  • Configured using python

cmsRun - Main software module -> extensions are loaded onto the main substructure. cmsRun is usually used for creating known data, whereas FWLite has improved interactivity.

!!TIP!! - Protect #includes with

#if !defined (__CINT__) && !defined (__MAKECINT__)
...
#endif

Using FWLite requires the use of fwlite::Handles, which act like branches in TTrees -> Handle.getByLabel(). edm::EventBase works the same way but integrates into cmsRun.

References

[1]: B-tagging of Jets Example: http://www-d0.fnal.gov/Run2Physics/top/singletop_observation/

-- KurtJung - 11-May-2012

Topic attachments
Attachment History Size Date Who Comment
BJet-HiForestAdditions-kjung.pdf r1 157.9 K 2013-01-10 - 21:54 KurtJung
BTagJEC-kjung.pdf r1 202.6 K 2013-01-10 - 21:45 KurtJung
JetTrgAlgo.gif r1 33.7 K 2012-05-13 - 21:23 KurtJung Calorimeter Jet Trigger Algorithm
L1JetTrgConfig.gif r1 58.5 K 2012-05-13 - 21:28 KurtJung L1 Jet Trigger Configuration
antiKtPerf.gif r1 103.8 K 2012-05-15 - 18:06 KurtJung Anti-kT algorithm performance
bJetProdRequest.pdf r1 32.1 K 2013-01-10 - 21:49 KurtJung
b_tagging_graphic.png r1 80.9 K 2012-05-16 - 19:50 KurtJung Shows how b-tagging wrt impact parameters works
cmsSchematic.gif r1 96.2 K 2012-05-13 - 20:05 KurtJung CMS Schematic
Topic revision: r34 - 2017-08-28 - KurtJung
 