This page details work done in the Summer of 2010 as part of the CMG top quark analysis group. It was originally located on the KUCMSGroupStuff page but was moved here to clear space and make the page more usable.

Top Production

Setting up the work area

ssh -Y YOURUSERNAME@lxplus5.cern.ch
cd Top
cmsrel CMSSW_3_6_1_patch3
This is the version currently used for production; it will change when the official ICHEP release is announced.
cd CMSSW_3_6_1_patch3/src

cvs co -d CMGEDMMicroTuple UserCode/cmgtop/CMGEDMMicroTuple
cvs co -d xsection UserCode/cmgtop/xsection
If you are running on a CMSSW_3_5_X release instead of a CMSSW_3_6_X release, you will also need to check out the following two packages:
cvs co -r V00-07-08 FWCore/RootAutoLibraryLoader
addpkg TrackingTools/IPTools V00-01-05

scramv1 b

List of samples to run

Priority 1


Priority 2

Single Top:
/SingleTop_sChannel-madgraph/Spring10-START3X_V26_S09-v1/GEN-SIM-RECO (*)
/SingleTop_tChannel-madgraph/Spring10-START3X_V26_S09-v1/GEN-SIM-RECO (*)



Priority 3

alpgen W+jets:
/W X Jets_Pt Y -alpgen/Spring10-START3X_V26_S09-v1/GEN-SIM-RECO
with X=0,1,2,3,4,5 and Y = 0to100, 100to300, 300to800, 800to1600; see here
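The full list of W+jets names implied by the X and Y values above can be generated rather than typed by hand. The following is a minimal sketch; it assumes that X and Y are substituted directly into the name with no spaces, and the function name alpgen_wjets_datasets is hypothetical.

```python
from itertools import product

# Assumption: X and Y are inserted into the dataset name without spaces.
TEMPLATE = "/W{x}Jets_Pt{y}-alpgen/Spring10-START3X_V26_S09-v1/GEN-SIM-RECO"
JET_MULTIPLICITIES = [0, 1, 2, 3, 4, 5]
PT_BINS = ["0to100", "100to300", "300to800", "800to1600"]


def alpgen_wjets_datasets():
    """Return one dataset name per (X, Y) combination listed above."""
    return [
        TEMPLATE.format(x=x, y=y)
        for x, y in product(JET_MULTIPLICITIES, PT_BINS)
    ]


for name in alpgen_wjets_datasets():
    print(name)
```

With six jet multiplicities and four Pt bins this yields 24 names, each of which should still be checked against DBS before use.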

sherpa W+Jets:

alpgen ttbar:
/TTbar X Jets_40Gevthreshold-alpgen/Spring10-START3X_V26_S09-v1/GEN-SIM-RECO (*)
with X = 0,1,2,3,4

alpgen W+heavy:
/Wbb X Jets-alpgen/Spring10-START3X_V26_S09-v1/GEN-SIM-RECO (*) with X = 0,1,2,3,4
/Wc X Jets-alpgen/Spring10-START3X_V26_S09-v1/GEN-SIM-RECO (*) with X = 0,1,2,3,4
/Wcc X Jets-alpgen/Spring10-START3X_V26_S09-v1/GEN-SIM-RECO (*) with X = 0,1,2,3,4

(*) not yet available at T2 sites

Setting up config files

mkdir ../work
cd ../work
cp ../src/CMGEDMMicroTuple/CMGEDMMicroTuple/ttbar_syncex_2stepscombinded.py ttbar_syncex_2stepscombined_YOURNAME.py

Now edit the config file to run on the datasets listed above:

emacs ttbar_syncex_2stepscombined_YOURNAME.py

Starting with the first one for an example: /TTbarJets-madgraph/Spring10-START3X_V26_S09-v1/GEN-SIM-RECO
You need to go to the DBS interface to look for the data. In the text field enter:

find dataset where dataset = /TTbarJets-madgraph/Spring10-START3X_V26_S09-v1/GEN-SIM-RECO and dataset.status like VALID*
Or just enter the dataset name alone in the text field.

On the results page, click the "Conf.files" link to look up the GlobalTag used for the production. The dataset name usually gives it away, but the configuration file is the most reliable place to get the actual one. The link opens a Python configuration file; copy the line containing the GlobalTag, in our case:
process.GlobalTag.globaltag = 'START3X_V26:All'

Now edit the ttbar_syncex_2stepscombined_YOURNAME.py configuration file to use the GlobalTag copied above. You may also want to save the file under its final name at this point. For bookkeeping purposes, files are named after the sample being run: the sample name with the leading slash dropped and the remaining slashes replaced by double underscores (e.g. TTbarJets-madgraph__Spring10-START3X_V26_S09-v1__GEN-SIM-RECO.py).
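The naming convention described above can be written as a small helper. This is only an illustrative sketch; the function name dataset_to_basename is hypothetical and not part of any CMSSW package.

```python
def dataset_to_basename(dataset_path):
    """Turn a dataset path into the bookkeeping name used on this page."""
    # Drop the leading slash, then replace each remaining slash with "__".
    return dataset_path.strip("/").replace("/", "__")


dataset = "/TTbarJets-madgraph/Spring10-START3X_V26_S09-v1/GEN-SIM-RECO"
print(dataset_to_basename(dataset) + ".py")
# TTbarJets-madgraph__Spring10-START3X_V26_S09-v1__GEN-SIM-RECO.py
```

The same base name is reused with the .root, .log and .cfg extensions in the steps that follow.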

Next, edit the output file name to match the sample being run, using the same naming convention (you can also add your initials to it during testing):

process.out.fileName = cms.untracked.string('TTbar_TopPAT.root') 
will become:
process.out.fileName = cms.untracked.string('TTbarJets-madgraph__Spring10-START3X_V26_S09-v1__GEN-SIM-RECO.root') 

Going back to the DBS interface search results, click on the LFNs "py" link to display a python configuration file containing a list of root files like:


You can pick the first couple of root files (in many cases one is enough) and paste them after the lines:

process.source = cms.Source("PoolSource",
                            fileNames = cms.untracked.vstring(
Comment out the root files that are currently inside that () clause. NOTE: remember to remove the extra comma after the last file name.
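One way to avoid a stray comma after the last entry is to generate the list body instead of editing it by hand. The sketch below is only an illustration: format_file_names is a hypothetical helper, and the second file path is a made-up placeholder rather than a real LFN.

```python
def format_file_names(lfns):
    """Return the quoted, comma-separated body for cms.untracked.vstring(...)."""
    quoted = ["'%s'" % lfn for lfn in lfns]
    # Joining with ",\n" puts a comma between entries but not after the last one.
    return ",\n".join(quoted)


files = [
    "/store/relval/CMSSW_3_5_0_pre1/RelValTTbar/GEN-SIM-RECO/"
    "STARTUP3X_V14-v1/0006/14920B0A-0DE8-DE11-B138-002618943926.root",
    "/store/HYPOTHETICAL/second_file.root",  # placeholder, not a real file
]
print(format_file_names(files))
```

The printed text can be pasted between the parentheses of the vstring.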

If these lines do not already exist, add them (after the Source lines above is OK)

process.maxEvents = cms.untracked.PSet(
        input = cms.untracked.int32(10)
)
This will limit the number of events processed to 10 for now, to allow for quick command line execution of the configuration file.

Also add the following lines if they are not already present (this should be temporary, since in the production some jet collections were dropped by accident)


A few changes must be made if running 36X code on a 35X sample. Directly after the line reading from PatAlgos.tools.jetTools import * insert the following lines.

from PhysicsTools.PatAlgos.tools.cmsswVersionTools import *

Also, the line switchJECSet( process, "Summer09_7TeV_ReReco332") can be commented out, since this is not needed in CMSSW_3_6_X.

It is also necessary to add another line. Just before the line that says addPfMET(process, 'PF') you will need to add the following line:

addTcMET(process, 'TC')

When processing MC and not data (like in our current case) you will need to comment out the following portion of the configuration file:

## require physics declared
#process.physDecl = cms.EDFilter("PhysDecl",
#    applyfilter = cms.untracked.bool(True)
#)
## require scraping filter
#process.scrapingVeto = cms.EDFilter("FilterOutScraping",
#                                    applyfilter = cms.untracked.bool(True),
#                                    debugOn = cms.untracked.bool(False),
#                                    numtrack = cms.untracked.uint32(10),
#                                    thresh = cms.untracked.double(0.2)
#                                    )
#process.primaryVertexFilter = cms.EDFilter("GoodVertexFilter",
#                                           vertexCollection = cms.InputTag('offlinePrimaryVertices'),
#                                           minimumNDOF = cms.uint32(4) ,
#                                           maxAbsZ = cms.double(15), 
#                                           maxd0 = cms.double(2) 
#                                           )
## configure HLT
#process.hltLevel1GTSeed.L1TechTriggerSeeding = cms.bool(True)
#process.hltLevel1GTSeed.L1SeedsLogicalExpression = cms.string('0 AND (40 OR 41) AND NOT (36 OR 37 OR 38 OR 39)')

Finally, to complete the jet collection fix you need to add the modules in the Path by adding the following lines to the process.p = cms.Path section, after the * process.makeGenEvt line and before the * process.flavorHistorySeq one.

* process.genParticlesForJets
* process.ak5GenJets

Testing the code

First, test the code interactively on a few events. This is straightforward if the root files for the sample being processed are available at CERN. To check, go back to the DBS search results and click the "show" link, which lists all of the sites where the sample is currently located. If the sample is not located at CERN, test with a different file instead: comment out the root files you added to the process.source section of the configuration file and add the following file to the list (or simply uncomment it if it was one of the files previously listed):

'/store/relval/CMSSW_3_5_0_pre1/RelValTTbar/GEN-SIM-RECO/STARTUP3X_V14-v1/0006/14920B0A-0DE8-DE11-B138-002618943926.root'

Now, if you named your configuration file as was written above, to test your file interactively do:
cmsRun TTbarJets-madgraph__Spring10-START3X_V26_S09-v1__GEN-SIM-RECO.py >& TTbarJets-madgraph__Spring10-START3X_V26_S09-v1__GEN-SIM-RECO.log&

This should produce two things:
A root file: TTbarJets-madgraph__Spring10-START3X_V26_S09-v1__GEN-SIM-RECO.root
And a log file: TTbarJets-madgraph__Spring10-START3X_V26_S09-v1__GEN-SIM-RECO.log

You should check the log file for any warnings or errors that occurred while running the configuration file. Once you are confident your configuration is what you want and does not crash when run over a few events you can prepare to launch a first test job on the grid.
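A first pass over the log file can be automated with a simple keyword search. The sketch below is an assumption about what is worth flagging, not an official list of CMSSW message categories; the helper name find_problem_lines and the sample log lines are hypothetical.

```python
import re

# Assumption: these keywords are a reasonable first filter for a cmsRun log.
PROBLEM_PATTERN = re.compile(r"warning|error|exception", re.IGNORECASE)


def find_problem_lines(log_text):
    """Return (line_number, line) pairs for lines mentioning a problem keyword."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if PROBLEM_PATTERN.search(line)
    ]


# Hypothetical log content, for illustration only.
sample_log = "Begin processing\nSome Warning here\nEnd processing\n"
for number, line in find_problem_lines(sample_log):
    print("%d: %s" % (number, line))
```

To scan a real log, read the .log file into a string and pass it to the function. A clean result from this filter does not replace reading the log itself.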

Setting up CMST3 directory space

You will need to set up the directories to store the datasets after they have been processed on the grid. For bookkeeping purposes, the directory names on CMST3 will be the same as the sample names with the slashes in the path name replaced by underscores. For example, the directory for the first sample (/TTbarJets-madgraph/Spring10-START3X_V26_S09-v1/GEN-SIM-RECO) will become /TTbarJets-madgraph__Spring10-START3X_V26_S09-v1__GEN-SIM-RECO. To create this directory, use the command:

rfmkdir /castor/cern.ch//user/r/rovere/cmst3/TTbarJets-madgraph__Spring10-START3X_V26_S09-v1__GEN-SIM-RECO

It will also be necessary to change the permissions so that other members of the group can write to it, using the command:

rfchmod 775 /castor/cern.ch//user/r/rovere/cmst3/TTbarJets-madgraph__Spring10-START3X_V26_S09-v1__GEN-SIM-RECO
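As a quick check of what mode 775 grants, the octal value can be expanded into the familiar ls-style string. This sketch uses standard POSIX permission semantics and assumes CASTOR interprets the mode the same way; describe_mode is a hypothetical helper name.

```python
import stat


def describe_mode(octal_mode):
    """Return the ls-style permission string for a directory with this mode."""
    return stat.filemode(stat.S_IFDIR | octal_mode)


print(describe_mode(0o775))  # drwxrwxr-x
```

The owner and the group get read, write and execute; all other users get read and execute only.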

To check that the directory is now there, use:

nsls /castor/cern.ch//user/r/rovere/cmst3/

In this directory, you need to create the following standard subdirectories, and change their permissions:

rfmkdir /castor/cern.ch//user/r/rovere/cmst3/TTbarJets-madgraph__Spring10-START3X_V26_S09-v1__GEN-SIM-RECO/CMGtuple
rfmkdir /castor/cern.ch//user/r/rovere/cmst3/TTbarJets-madgraph__Spring10-START3X_V26_S09-v1__GEN-SIM-RECO/CMGPAT
rfmkdir /castor/cern.ch//user/r/rovere/cmst3/TTbarJets-madgraph__Spring10-START3X_V26_S09-v1__GEN-SIM-RECO/log
rfchmod 775 /castor/cern.ch//user/r/rovere/cmst3/TTbarJets-madgraph__Spring10-START3X_V26_S09-v1__GEN-SIM-RECO/CMGtuple
rfchmod 775 /castor/cern.ch//user/r/rovere/cmst3/TTbarJets-madgraph__Spring10-START3X_V26_S09-v1__GEN-SIM-RECO/CMGPAT
rfchmod 775 /castor/cern.ch//user/r/rovere/cmst3/TTbarJets-madgraph__Spring10-START3X_V26_S09-v1__GEN-SIM-RECO/log

Again, to check that these subdirectories were created correctly, use:

nsls /castor/cern.ch//user/r/rovere/cmst3/TTbarJets-madgraph__Spring10-START3X_V26_S09-v1__GEN-SIM-RECO/
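Since the same commands are repeated for every sample, they can be generated from the dataset name. This is a minimal sketch that only builds the command strings and does not execute anything; castor_setup_commands is a hypothetical helper, and the base path is copied from the commands above.

```python
# Base path as written in the commands on this page.
CMST3_BASE = "/castor/cern.ch//user/r/rovere/cmst3"
SUBDIRS = ["CMGtuple", "CMGPAT", "log"]


def castor_setup_commands(dataset_path, base=CMST3_BASE):
    """Return the rfmkdir and rfchmod commands for one sample."""
    name = dataset_path.strip("/").replace("/", "__")
    top = base + "/" + name
    # The top-level directory comes first so that it exists before its subdirectories.
    directories = [top] + [top + "/" + sub for sub in SUBDIRS]
    commands = ["rfmkdir " + d for d in directories]
    commands += ["rfchmod 775 " + d for d in directories]
    return commands


dataset = "/TTbarJets-madgraph/Spring10-START3X_V26_S09-v1/GEN-SIM-RECO"
for command in castor_setup_commands(dataset):
    print(command)
```

The printed lines can be reviewed and then pasted into the shell on lxplus.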

Running on the grid

From lxplus, set up the crab environment by sourcing the following files:

source /afs/cern.ch/cms/LCG/LCG-2/UI/cms_ui_env.csh
source /afs/cern.ch/cms/ccs/wm/scripts/Crab/crab.csh

You can test your grid certificate with: grid-proxy-init -debug -verify

Now you will need to copy over the crab configuration file

cp -pR ~gbenelli/public/ForDanny/crab_example.cfg . 

This will need to be edited for the correct datasets, configuration file name, and output file name. Open up the crab configuration file.

emacs crab_example.cfg

And edit the [CMSSW] section to look like the example below:

datasetpath             = /TTbarJets-madgraph/Spring10-START3X_V26_S09-v1/GEN-SIM-RECO
pset                    = TTbarJets-madgraph__Spring10-START3X_V26_S09-v1__GEN-SIM-RECO.py
events_per_job          = 10000
total_number_of_events  = 10000
#runselection            = 132440,132442,132458,132471,132473,132474,132476,132477,132478
output_file             = TTbarJets-madgraph__Spring10-START3X_V26_S09-v1__GEN-SIM-RECO.root

This sets the job to run on the correct sample with the configuration file edited and tested above, to run 1 job over 10000 events as a test, and to save the output file using the naming convention from earlier. You also need to change user_remote_dir to match the CMST3 directory created above, and ui_working_dir to follow the same naming convention. This will look like:

user_remote_dir         = /user/r/rovere/cmst3/TTbarJets-madgraph__Spring10-START3X_V26_S09-v1__GEN-SIM-RECO/CMGtuple
ui_working_dir          = TTbarJets-madgraph__Spring10-START3X_V26_S09-v1__GEN-SIM-RECO

The file can now be saved using a similar naming convention as before, but with "crab_" preceding the sample name (i.e. crab_TTbarJets-madgraph__Spring10-START3X_V26_S09-v1__GEN-SIM-RECO.cfg).
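All of the per-sample values in the crab configuration follow from the dataset name, so they can be derived in one place. The sketch below is illustrative only: crab_settings and format_settings are hypothetical helpers, the keys are the ones shown on this page, and the output still has to be placed in the right sections of the .cfg file by hand.

```python
def crab_settings(dataset_path, events_per_job=10000, total_events=10000,
                  remote_base="/user/r/rovere/cmst3"):
    """Return (key, value) pairs for the per-sample crab settings."""
    name = dataset_path.strip("/").replace("/", "__")
    return [
        ("datasetpath", dataset_path),
        ("pset", name + ".py"),
        ("events_per_job", events_per_job),
        ("total_number_of_events", total_events),
        ("output_file", name + ".root"),
        ("user_remote_dir", remote_base + "/" + name + "/CMGtuple"),
        ("ui_working_dir", name),
    ]


def format_settings(settings):
    """Render the pairs as 'key = value' lines with aligned '=' signs."""
    return "\n".join("%-24s= %s" % (key, value) for key, value in settings)


dataset = "/TTbarJets-madgraph/Spring10-START3X_V26_S09-v1/GEN-SIM-RECO"
print(format_settings(crab_settings(dataset)))
```

Deriving every value from the dataset name keeps the pset, output file and directory names consistent with each other.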

Now the job can be created and submitted to the grid to be run:

crab -create -cfg crab_TTbarJets-madgraph__Spring10-START3X_V26_S09-v1__GEN-SIM-RECO.cfg
crab -submit all -c TTbarJets-madgraph__Spring10-START3X_V26_S09-v1__GEN-SIM-RECO

To check the status of the submitted jobs, use the command

crab -status -c TTbarJets-madgraph__Spring10-START3X_V26_S09-v1__GEN-SIM-RECO

Depending on the settings in the crab configuration file, the output will have either been directly written to cmst3 or will be available to download once the jobs are finished using the command

crab -getoutput -c TTbarJets-madgraph__Spring10-START3X_V26_S09-v1__GEN-SIM-RECO
If the line return_data = 1 is not commented out, the data will be returned to your work area by the crab -getoutput command.

If this crab job runs successfully, the configuration can be edited to run over the full sample. To do this, simply change the total number of events from 10000 to -1 (all events). So,

total_number_of_events  = 10000

Will become,
total_number_of_events  = -1

After this change is made, simply follow the same steps from above to create and submit the jobs.
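The change from 10000 to -1 described above can also be applied programmatically, which is convenient when many samples are involved. This is a minimal sketch that works on the text of the configuration file; set_total_events is a hypothetical helper, not part of CRAB.

```python
import re

# Matches an uncommented "total_number_of_events = ..." line and keeps its left-hand side.
_TOTAL_EVENTS = re.compile(
    r"^([ \t]*total_number_of_events[ \t]*=[ \t]*).*$", re.MULTILINE
)


def set_total_events(cfg_text, total):
    """Return cfg_text with total_number_of_events set to the given value."""
    new_text, count = _TOTAL_EVENTS.subn(
        lambda match: match.group(1) + str(total), cfg_text
    )
    if count != 1:
        raise ValueError(
            "expected exactly one total_number_of_events line, found %d" % count
        )
    return new_text


test_cfg = "events_per_job          = 10000\ntotal_number_of_events  = 10000\n"
print(set_total_events(test_cfg, -1))
```

Lines that start with "#" are left alone, so a commented-out setting is never modified by accident.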

-- DannyNoonan - 27-Jun-2011

Topic revision: r5 - 2011-09-27 - NilsHoeimyr