Alpgen CMS

Complete: 5

Introduction

Alpgen is a collection of codes (executables) for the calculation of hard multiparton processes in hadronic collisions. It produces matrix element (ME) level events, which are subsequently passed to a parton shower (PS) / hadronization code (like Pythia or HERWIG) for further event development. Using the MLM ME/PS matching procedure, one can combine the matrix element calculations with parton showers while avoiding double-counting. At the time of writing, this gives the best available Monte Carlo prediction of multi-jet final states.

This page documents the usage of Alpgen within the CMS Experiment.

Event Generation

Alpgen is a matrix element generator whose output is a set of files containing parton-level events, i.e., data representing only the hard scattering for a given process (W+jets, for instance). Alpgen itself is distributed as a collection of executables, which have to be built and run independently for each process. To obtain complete Monte Carlo events, those parton-level events have to be passed to another piece of software, which takes care of aspects like initial-state radiation, final-state radiation, parton showering and hadronization. In CMS, we use PYTHIA as this second piece of software, through the interface described in SWGuideAlpgenInterface, which is a set of CMSSW modules. Alpgen itself, however, cannot be run in CMSSW, because of its structure as a collection of executables. So, event generation with Alpgen happens in a three-step process:

  1. Generation of weighted events using an Alpgen executable.
  2. Generation of unweighted, parton-level events using the same executable.
  3. Generation of full-fledged Monte Carlo events using the auxiliary parton shower code.
The first two steps are generally called "Alpgen standalone", since they are done with Alpgen code only. The third step is generally called "showering and matching", and in CMS it is the only step performed within the CMSSW Framework proper. In the following figure, the red and blue arrows refer to the Alpgen standalone step, and the green arrows and black lines refer to the showering and matching step.

[Figure: Alpgen event generation workflow, showing the Alpgen standalone steps followed by the showering and matching step]

Running Alpgen standalone

To run Alpgen standalone, one must either download and compile the codes available from the Alpgen homepage, or use the precompiled ones available within CMSSW. To access the precompiled Alpgen executables, one must prepare the CMSSW environment and search in the subdirectories of the CMS external software directory:
$ cmsenv

$ ls $CMS_PATH/sw/$SCRAM_ARCH/external/alpgen
212  212-cms  212-CMS18  212-CMS19  212-CMS2  213  213-cms  213-cms2

$ ls $CMS_PATH/sw/$SCRAM_ARCH/external/alpgen/213-cms2/bin
2Qgen                phjet_300_5000bingen  wjet_1600ptwgen         zjet_100ptz300gen
2Qphgen              phjet_60_120bingen    wjet_300ptw800gen       zjet_1600ptzgen
4Qgen                phjetgen              wjet_800ptw1600gen      zjet_300ptz800gen
hjetgen              QQhgen                wjetgen                 zjet_800ptz1600gen
Njetgen              topgen                wjet_VBFHiggsTo2Taugen  zjetgen
phjet_120_180bingen  vbjetgen              wphjetgen               zjet_VBFHiggsTo2Taugen
phjet_180_240bingen  wcjetgen              wphqqgen                zqqgen
phjet_20_60bingen    wjet_0ptw100gen       wqqgen
phjet_240_300bingen  wjet_100ptw300gen     zjet_0ptz100gen

Each of those executables is for a different process / final state, which can be determined, as a rule of thumb, from its name:

Name part  Meaning
Q          Heavy quark (b or t)
h          Higgs
jet        Extra jet
ph         Photon
top        Single top
w          W boson
z          Z boson

So, 2Qphgen is the executable for the "heavy-quark pair + photon" final state. Note that this is just a rule of thumb; please check the complete Alpgen documentation. Also note that there are some special executables, like wjet_300ptw800gen, which impose kinematic cuts in order to allow for better exploration of the tails of pT distributions.

All those executables can be run interactively, or be driven by input files. A minimal example of an input file is given below.

1     ! imode
w2j   ! label for files
0     ! start with: 0=new grid, 1=previous warmup grid, 2=previous generation grid
10000 2  ! Nevents/iteration,  N(warm-up iterations)
100000   ! Nevents generated after warm-up
*** The above 5 lines provide mandatory inputs for all processes
*** (Comment lines are introduced by the three asterisks)
*** The lines below modify existing defaults for the hard process under study
*** For a complete list of accessible parameters and their values,
*** input 'print 1' (to display on the screen) or 'print 2' to write to file
njets 2
ptjmin 20
drjmin 0.7 

This example can be fed to an executable as follows: wjetgen < file.input. It will generate weighted events for W+2j in proton-antiproton collisions at 980 GeV per beam (Tevatron configuration). It will also produce the following additional files (run it and look): cnfg.dat  w2j.grid1  w2j.grid2  w2j.mon  w2j.par  w2j.stat  w2j.top  w2j.wgt

So, to change the parameters to proton-proton collisions at 7000 GeV per beam, one needs to include the following lines in the input file:

ih2 1
ebeam 7000
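
As an illustration of how the pieces above fit together, the following sketch writes two input cards for a W+2 jets sample in proton-proton collisions at 7000 GeV per beam: one for the weighted generation (imode 1) and one for the unweighting step (imode 2). The file names w2j_step1.input and w2j_step2.input are hypothetical. That the imode 2 card needs only the mode and the label is an assumption and should be checked against the complete Alpgen documentation.

```shell
# Step 1 card: weighted events (imode 1), W+2 jets, pp at 7000 GeV per beam.
# File names are hypothetical; the card content follows the minimal example.
cat > w2j_step1.input <<'EOF'
1     ! imode
w2j   ! label for files
0     ! start with: 0=new grid, 1=previous warmup grid, 2=previous generation grid
10000 2  ! Nevents/iteration,  N(warm-up iterations)
100000   ! Nevents generated after warm-up
*** Beam settings: proton-proton, 7000 GeV per beam
ih2 1
ebeam 7000
*** Hard-process settings
njets 2
ptjmin 20
drjmin 0.7
EOF

# Step 2 card: unweighting (imode 2) with the same label.
# ASSUMPTION: only the first two lines are needed in this mode.
cat > w2j_step2.input <<'EOF'
2     ! imode
w2j   ! label for files
EOF

# With the CMSSW environment set up, the two steps would then be run as:
#   wjetgen < w2j_step1.input
#   wjetgen < w2j_step2.input
```

Keeping the same label in both cards matters, since the second step reads the files written by the first one.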

Here is a table containing the most commonly used options of the input files. Again, note that the full array of options can be found in the complete Alpgen documentation.

Name     Meaning
njets    Number of light jets in the final state
ebeam    Beam energy (GeV per beam)
ih2      Second beam identity (1 = proton, -1 = antiproton)
ickkw    Switch for ME/PS matching (0 = off, 1 = on)
iqopt    Choice of energy scale Q for the process
qfac     Q scale rescaling factor
etajmax  Maximum |eta| of the light jets
etalmax  Maximum |eta| of the charged leptons
ptlmin   Minimum pT of the charged leptons (GeV)
ptjmin   Minimum pT of the light jets (GeV)
drjmin   Minimum delta R between jets

Running showering and matching

In CMS, showering and matching are done within the CMSSW Framework. Please refer to the SWGuideAlpgenInterface page.

Centralized productions in CMS

For centralized productions in CMS, the workflow is the following:

  1. Run event generation for Alpgen up to the point where it outputs unweighted events.
  2. Convert the events it outputs to a standard format (Les Houches Accord Events, a.k.a. LHE). The AlpgenSource module, described in the SWGuideAlpgenInterface page, together with the LHEWriter module, is suited to this step.
  3. Send these LHE events to the central database known as MCDB: http://mcdb.cern.ch/
  4. The Production team takes over from there, and runs standard grid jobs to do the rest of the workflow (running PYTHIA, running detector simulation, reconstruction, PAT...)

If you are interested in having a large centralized production in CMS, there are two possibilities. The preferred one is that you make the request yourself. That means producing and uploading the LHE files yourself, with guidance from the Alpgen contact in the Generators Group. Alternatively, if you can't do that, you can contact the Generators Group and ask them to make the request on your behalf and to produce the LHE files for you.

Delegating LHE files production to the Generators Group

You must comply with the following guidelines:
  1. You must describe completely and exactly what sample you want. That means you will provide a set of input files, executables, and desired number of events for each channel and/or bin you want in your request.
  2. You must provide the grid files for that sample.
  3. You must provide values for all three efficiencies involved (weighted generation, unweighting and matching).
  4. You must provide a clear way to prioritize the samples, in case you request more than one.

Also, please note that the resources (both human and computing) of the Generators Group are very limited, and that the time for completion of a request can be longer than what would be ideal for you. Please always consider producing the LHE files yourself.

Producing the LHE files yourself

The Alpgen contact in the Generators Group can give you advice on how to set up a computing environment suitable for the task. You will still want to have the input files, executables and grid files for your own convenience. There are two main steps:
  1. For the generation itself, you will want to have access to a computing farm (a standard CMS Tier2 centre will do, and even a Tier3), and some sort of job scheduler system, like Condor, PBS or LSF. Alpgen itself is not parallelizable, but many instances of Alpgen can be run separately, each as a different job. The unweighted event files can then be concatenated and given as input to AlpgenSource, which converts them to LHE.
  2. For uploading the LHE file to the MCDB database, you will have to register yourself as an Author at http://mcdb.cern.ch. The MCDB team provides a standard script to upload files to the database, but since what happens behind the scenes is really an lcg-cp command, you will need to have access to both a Grid UI and Grid credentials.
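
The concatenation mentioned in step 1 can be sketched as a small shell function. The function name merge_unw and the directory layout in the usage line are hypothetical. The sketch only joins the event files and does not handle the accompanying _unw.par bookkeeping, which has to be taken care of separately.

```shell
# merge_unw OUTPUT INPUT...
# Concatenate unweighted-event files, in the order given, into OUTPUT.
merge_unw() {
    if [ "$#" -lt 2 ]; then
        echo "usage: merge_unw OUTPUT INPUT..." >&2
        return 1
    fi
    out=$1
    shift
    # Plain byte-wise concatenation; OUTPUT is overwritten if it exists.
    cat "$@" > "$out"
}

# Hypothetical layout: one directory per job, each holding a w2j.unw file.
#   merge_unw w2j_all.unw job_*/w2j.unw
```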

Using CRAB to produce ALPGEN files

The Generators Group provides a way to harness the power of the Grid to produce ALPGEN unweighted files. The AlpgenInGrid.tar.gz tarball attached to this page provides all files needed to submit ALPGEN jobs to the Grid using CRAB. The files contained are the following:
  • crab_genAlpgen_grid.cfg: the CRAB configuration. In this file you have to do the usual setting of total_number_of_events and events_per_job. You also have to set up the input and output files.
    • To set up the output files: we suggest retrieving all four output files, LABEL.wgt, LABEL.par, LABEL.unw and LABEL_unw.par, where LABEL is the label given to the files in the ALPGEN input card (see below). The files can be rather large, so we suggest using the CRAB options copy_data and user_remote_dir to retrieve them to a Storage Element.
    • To set up the input files: use the additional_input_files option. The input files needed are the ALPGEN input card alpgen.input and the ALPGEN grid file LABEL.grid2.
  • alpgen.input: the ALPGEN input card. This is a standard ALPGEN card, in which you set up all the process parameters. Pay special attention to the second line, which defines the LABEL mentioned above. So, for instance, if the second line is allure   ! Some comment, the output files will be allure.wgt, allure.par, allure.unw, allure_unw.par. Edit this file for your particular process.
  • runAlpgen.sh: the wrapper file that runs ALPGEN on the remote computer. The only thing you need to edit here is the ALPGEN executable that is going to be run. Edit the line export ALPGEN_EXECUTABLE=wjetgen to the executable related to the process you want to run (wjetgen for W+jets processes, 2Qgen for ttbar processes, etc.).
  • dummy.py: a dummy CMSSW configuration file to help generate the Framework Job Report. No need to edit it.
  • w1j.grid2: an example of an ALPGEN grid file, with LABEL "w1j".
To use the tarball, set up your Grid environment, then create and submit the CRAB jobs. On lxplus, for instance:
source /afs/cern.ch/cms/LCG/LCG-2/UI/cms_ui_env.sh
cmsenv
source /afs/cern.ch/cms/ccs/wm/scripts/Crab/crab.sh
crab -create -cfg crab_genAlpgen_grid.cfg
crab -submit -continue
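
The edit of runAlpgen.sh described in the file list can also be scripted before the jobs are created. The following is a minimal sketch, assuming the wrapper contains a line that starts exactly with export ALPGEN_EXECUTABLE= at the beginning of the line; the function name set_alpgen_executable is hypothetical.

```shell
# set_alpgen_executable FILE NAME
# Rewrite the "export ALPGEN_EXECUTABLE=..." line of FILE to use NAME.
# ASSUMPTION: the line starts at the beginning of the line in FILE.
set_alpgen_executable() {
    file=$1
    name=$2
    # Refuse to touch a file that has no such line.
    grep -q '^export ALPGEN_EXECUTABLE=' "$file" || return 1
    # Write to a temporary file, then copy back so that the permissions
    # (e.g. the executable bit) of the original file are kept.
    sed "s|^export ALPGEN_EXECUTABLE=.*|export ALPGEN_EXECUTABLE=$name|" \
        "$file" > "$file.tmp" &&
        cat "$file.tmp" > "$file" &&
        rm -f "$file.tmp"
}

# Example: switch the wrapper to the Z+jets executable.
#   set_alpgen_executable runAlpgen.sh zjetgen
```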

The following figure illustrates the process:

LargeScaleProduction.png

Summary of past centralized productions

Production Summer 2009 (at 7 TeV center of mass)

Production Winter 2009

Production Summer 2008

Production CSA07

Utilities

Translating _unw.par files to HepML files

ALPGEN saves the information about a given run in a file with the _unw.par suffix, in its own private format. To allow for better bookkeeping, it is useful to save that information in a standard format like the HepML standard. We provide a small C++ program, partially derived from the implementation of the AlpgenSource module in CMSSW, that is able to do the _unw.par --> HepML translation. In order to use this code, you need the libhepml library and headers (version 0.2.6 or later) installed on your system, and you must be able to compile and link a C++ program against it. In that case, you can compile the translation program in the following way:
tar -xzvf ALPGEN_hepml.tar.gz
cd ALPGEN_hepml
g++ -I$HEPML_INCLUDE_DIR -I. AlpgenHeader.cpp ALPGEN_hepml.cpp -L$HEPML_LIB_DIR -lhepml-writer -o ALPGEN_hepml.exe
and run it with
./ALPGEN_hepml.exe somefile_unw.par
which will produce a file named hepml.out. This file contains the same information as the _unw.par file, but in HepML format. The C++ code is in the ALPGEN_hepml.tar.gz tarball attached to this page, and libhepml is available at http://mcdb.cern.ch/distribution/ .

Documentation

Review status

Reviewer/Editor and Date (copy from screen) Comments
-- ThiagoTomei - 23-Nov-2010 Added content on using libhepml
-- ThiagoTomei - 28-Jul-2010 Added content on using ALPGEN in the Grid
-- ThiagoTomei - 31-May-2010 Update in centralized productions
-- ThiagoTomei - 23 May 2009 Copied from Maria Spiropulu's CMS Alpgen page

Responsible: FlaviaDias
Last reviewed by: ThiagoTomei

Topic attachments
Attachment                Size     Date              Who          Comment
ALPGEN_hepml.tar.gz       5.6 K    2010-11-23 20:17  ThiagoTomei  Code for translating _unw.par ALPGEN files to HEPML files.
AlpgenInGrid.tar.gz       4.4 K    2014-06-23 14:54  ThiagoTomei  Tarball for ALPGEN production in grid
AlpgenUsage.png           61.3 K   2009-05-23 09:23  ThiagoTomei
LargeScaleProduction.png  185.1 K  2010-06-01 16:24  ThiagoTomei
Topic revision: r18 - 2014-06-24 - ThiagoTomei



 