Showering Alpgen events with Herwig++ in Athena
Contact:
m.k.bugge@fysNOSPAMPLEASE.uio.no
This page describes how to shower Alpgen events (and do MLM matching) with Herwig++ in Athena. Showering Alpgen events with Herwig++ in standalone mode requires two steps. First, one runs a small standalone program called AlpGenToLH (AlpGenToLH.cc is found in the Contrib/Alpgen directory of the Herwig++ source tree), which converts the Alpgen output files to the modern LHE file format and produces a suitable steering file for the subsequent Herwig++ run. Then, one runs Herwig++ with the provided steering file. Inside Athena, the user can run directly on the Alpgen output files (which must be slightly modified following ATLAS convention, see below), in which case the AlpGenToLH conversion step is handled automatically by Athena. Alternatively, the user can run a simpler Athena implementation, in which case a proper LHE event file must be provided. At present, this requires the user to run the AlpGenToLH conversion step manually, but a new version of Alpgen that produces modern LHE files directly is expected soon. Once that version is released, the conversion step will no longer be needed, and the simpler Athena implementation will be the recommended approach.
Running Athena on the Alpgen outputs directly
The user can run Athena directly on the Alpgen output files, in which case the AlpGenToLH conversion step is handled by the Herwig++ interface in Athena. The only changes the user must make to the files are to rename the unweighted events file from *.unw to *.events and the parameters file from *_unw.par to *.dat, and to add some lines containing the required merging parameters to the parameters file (which is now called *.dat). This follows exactly the convention of inputs production for Alpgen+HERWIG (Fortran version) in ATLAS, and more details (e.g. which extra lines to add to the .dat file) can be found here. An advantage of this compatibility is that one can use existing input event samples from the grid.
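The renaming step can be scripted. The following is a minimal sketch, assuming the Alpgen outputs sit in one directory as <prefix>.unw and <prefix>_unw.par; the function names are hypothetical and not part of any ATLAS tool. It does not add the merging-parameter lines to the .dat file, which still has to be done by hand.

```python
import glob
import os


def atlas_names(unw_path):
    """Return (old, new) path pairs for one Alpgen output set."""
    prefix = unw_path[: -len(".unw")]
    return [
        (unw_path, prefix + ".events"),        # unweighted events file
        (prefix + "_unw.par", prefix + ".dat"),  # parameters file
    ]


def rename_alpgen_outputs(directory):
    """Rename every <prefix>.unw / <prefix>_unw.par pair in `directory`."""
    renamed = []
    for unw_path in sorted(glob.glob(os.path.join(directory, "*.unw"))):
        pairs = atlas_names(unw_path)
        # Check both files exist before touching anything.
        for old, _new in pairs:
            if not os.path.isfile(old):
                raise IOError("expected Alpgen output is missing: " + old)
        for old, new in pairs:
            os.rename(old, new)
            renamed.append(os.path.basename(new))
    return renamed
```

Calling rename_alpgen_outputs(".") in the directory holding the Alpgen outputs returns the list of new file names.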
Update 27.05.2013: The information below should still mostly be correct, but the correct way to run Herwig++ with Alpgen inputs has changed a bit in the latest version, see the subsection "Running with the latest version". Following the description in that subsection, you should be able to run this "out of the box", after only setting a few parameters in the top job options.
Update 25.02.2013: The following description will still work, but you can get things running more easily following the recipe under the subsection "Getting things running starting from Herwigpp_i-00-06-06-01".
First, set up an Athena release which has Herwig++ version 2.6, e.g. 17.2.7.6. Alpgen processing will not work with earlier versions of Herwig++.
asetup 17.2.7.6,here
The framework for showering Alpgen events with Herwig++ is not currently part of any Athena release, so you will need to check out the Herwig++ package, and recompile with code attached to this page. Check out the Herwig++ package:
cmt co Generators/Herwigpp_i
Now, replace the files
Generators/Herwigpp_i/Herwigpp_i/Herwigpp.h
and
Generators/Herwigpp_i/src/Herwigpp.cxx
with the corresponding ones attached to this page. Then, recompile by going into the cmt directory under Generators/Herwigpp_i/ and typing "make".
Your checked out version of Herwigpp_i should now have the necessary functionality to process Alpgen events. Now, go into some run directory under your test area (where you set up Athena), and place the following files there (all are found attached to this page):
HerwigppAlpgen.py
AlpGenHandler.so
BasicLesHouchesFileReader.so
The last two files are compiled library files for the Alpgen functionality in Herwig++. Obviously, these should come as part of the Athena release, but this is not currently the case. The files provided here are compiled for 32-bit setups, and have been tested under a "i686-slc5-gcc43-opt" setup on a RHEL5 computer. The first file, HerwigppAlpgen.py, is the job options script for processing Alpgen events with Herwig++. This script includes the MC12JobOptions/Herwigpp_UEEE4_CTEQ6L1_Common.py setup, then looks for the input files named *.dat and *.events, extracting the filename prefix from the .events file, and finally sets the filename prefix and the "doAlpgen" switch in the Herwigpp instance. The last line in the script can be uncommented to make a steering file for standalone Herwig++. The job options are run through Generate_trf.py, so after placing a pair of input files (or one tar file containing the pair) from Alpgen in your run directory, you do something like
Generate_trf.py ecmEnergy=8000 runNumber=100001 firstEvent=1 maxEvents=100 randomSeed=1234 jobConfig=./HerwigppAlpgen.py outputEVNTFile=./test.pool.root
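The input discovery performed by the job options can be pictured roughly as follows. This is only a sketch of the idea and not the code in HerwigppAlpgen.py: the function name is hypothetical, and the tarball case is left out.

```python
import glob
import os


def find_alpgen_prefix(run_dir="."):
    """Return the common prefix of the single .events/.dat pair in run_dir."""
    events = glob.glob(os.path.join(run_dir, "*.events"))
    if len(events) != 1:
        raise RuntimeError(
            "expected exactly one .events file, found {0}".format(len(events)))
    prefix = events[0][: -len(".events")]
    if not os.path.isfile(prefix + ".dat"):
        raise RuntimeError("no matching .dat file for " + events[0])
    return os.path.basename(prefix)


# In job options the result would then be passed on along these lines
# (property names as described on this page):
#   topAlg.Herwigpp.alpgenFilenamePrefix = find_alpgen_prefix()
#   topAlg.Herwigpp.doAlpgen = True
```

Failing early when zero or several .events files are present avoids silently picking the wrong input set.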
Note that you should only place one pair of input files (one .dat and one .events file or one .tar file) in the run directory at a time. Running over a large number of relatively small input sets would then need to be done by a script. I attach an example of such a script, runHerwigppAlpgen.py, which may in particular be useful if you dq2-get an input event dataset from the grid (these datasets consist of relatively small samples, with .dat and .events files together in a tarball). You will need to change some paths in that script. You may also want to change the number of events for each Athena run, called "evtsBatch", although you may run into problems if you increase it too much, due to the limited number of available events in each tarball.
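For orientation, a batch driver of this kind can be sketched as below. This is not the attached runHerwigppAlpgen.py: the function names, the run_NNNN directory layout and the choice of incrementing the random seed per run are assumptions made for illustration.

```python
import os
import shutil
import subprocess


def generate_trf_command(job_config, max_events, random_seed,
                         output_file="./test.pool.root",
                         ecm_energy=8000, run_number=100001, first_event=1):
    """Build the Generate_trf.py argument list used on this page."""
    return [
        "Generate_trf.py",
        "ecmEnergy={0}".format(ecm_energy),
        "runNumber={0}".format(run_number),
        "firstEvent={0}".format(first_event),
        "maxEvents={0}".format(max_events),
        "randomSeed={0}".format(random_seed),
        "jobConfig={0}".format(job_config),
        "outputEVNTFile={0}".format(output_file),
    ]


def plan_batch(tarballs, work_root, evts_batch=100, first_seed=1234):
    """Assign each input tarball its own run directory and command."""
    plan = []
    for index, tarball in enumerate(sorted(tarballs)):
        run_dir = os.path.join(work_root, "run_{0:04d}".format(index))
        command = generate_trf_command(
            "./HerwigppAlpgen.py", evts_batch, first_seed + index)
        plan.append((run_dir, tarball, command))
    return plan


def run_plan(plan, job_options_path):
    """Execute the plan; requires an Athena setup providing Generate_trf.py."""
    for run_dir, tarball, command in plan:
        os.makedirs(run_dir)
        shutil.copy(tarball, run_dir)  # exactly one input set per directory
        shutil.copy(job_options_path,
                    os.path.join(run_dir, "HerwigppAlpgen.py"))
        subprocess.check_call(command, cwd=run_dir)
```

Keeping one tarball per run directory respects the one-input-set restriction, and evts_batch should stay below the number of events available in each tarball.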
Running with the latest version
In the latest versions of Herwigpp_i, the alpgenFilenamePrefix property of Herwigpp has been removed. Instead, some file names have been hard coded, and these should always work as long as the "inputGeneratorFile" argument to Generate_trf.py has been specified. With the addition of some new common job options in MC12JobOptions since MC12JobOptions-00-08-02, only a few lines are needed in the top job options. An example is given here:
evgenConfig.description = "Alpgen+Herwig++ ttbar fully leptonic + 1 jets with UEEE4 tune and CTEQ6L1 PDF"
evgenConfig.keywords = ["ttbar","dileptonic"]
evgenConfig.inputfilecheck = "ttbarlnlnNp1"
evgenConfig.contact = ["m.k.bugge@fys.uio.no"]
evgenConfig.minevents = 500
include('MC12JobOptions/Herwigpp_UEEE4_CTEQ6L1_Alpgen_Common.py')
These job options can be run through a command such as
Generate_trf.py ecmEnergy='7000' runNumber='110021' firstEvent='1' maxEvents='100' randomSeed='1234' jobConfig='MC12.110021.AlpgenHerwigpp_UEEE4_CTEQ6L1_ttbarlnlnNp1.py' outputEVNTFile='./test.pool.root' inputGeneratorFile='group09.phys-gener.alpgen.105891.ttbarlnlnNp1.TXT.v1._00062.tar.gz'
(with the above lines placed in the file MC12.110021.AlpgenHerwigpp_UEEE4_CTEQ6L1_ttbarlnlnNp1.py).
The latest version of Herwigpp_i and the latest Herwig++ version should be available in Athena release 17.2.10.4, and production tests have been carried out with release 17.2.10.5. The necessary .so files should already be part of the Herwig++ installation used by these releases.
Update: A problem related to running on grid sites without AFS access has been fixed, and some small improvements in Herwigpp_i have been implemented. Release 17.2.13 or later is recommended.
To run on grid with Pathena, issue a command such as
pathena --trf "Generate_trf.py ecmEnergy=7000 runNumber=110024 firstEvent=1 maxEvents=1000 randomSeed=%RNDM:1234 jobConfig=MC12.110024.AlpgenHerwigpp_UEEE4_CTEQ6L1_ttbarlnqqNp0.py outputEVNTFile=%OUT.pool.root evgenJobOpts=MC12JobOpts-00-08-93_v0.tar.gz inputGeneratorFile=%IN" --inDS group09.phys-gener.alpgen.105894.ttbarlnqqNp0.TXT.v1/ --outDS user.mkbugge.testAlpgenHerwigppGridNew-01 --nFiles 3 --nFilesPerJob 1
Here, MC12.110024.AlpgenHerwigpp_UEEE4_CTEQ6L1_ttbarlnqqNp0.py is a top job options file in the same style as the one given above. Only 3 files were processed for this test job.
Getting things running starting from Herwigpp_i-00-06-06-01
The Alpgen processing functionality in Herwigpp_i is now available in a branch. All you need to do is set up Athena,
asetup 17.2.7.6,here
and then do
cmt co -r Herwigpp_i-00-06-06-01 Generators/Herwigpp_i
in your $TestArea directory. Then, go into Generators/Herwigpp_i/cmt and do a "make". Finally, place the file HerwigppAlpgen.py from this page in your working directory, together with a pair of input files (one .dat and one .events file, or one .tar file), and run:
Generate_trf.py ecmEnergy=8000 runNumber=100001 firstEvent=1 maxEvents=100 randomSeed=1234 jobConfig=./HerwigppAlpgen.py outputEVNTFile=./test.pool.root
The necessary .so files are inside the Herwigpp_i-00-06-06-01 branch that you checked out. The latest version of HerwigppAlpgen.py will then automatically find them and place them in your working directory. This has been tested under both "i686-slc5-gcc43-opt" and "x86_64-slc5-gcc43-opt" setups on a RHEL5 computer.
Running on grid with Herwigpp_i-00-06-06-01
After setting up Herwigpp_i-00-06-06-01 as described in the above section, you can run on the grid using pathena, issuing from your run directory (where you have the job options) a command such as the following:
pathena --trf "Generate_trf.py ecmEnergy=8000 runNumber=100001 firstEvent=1 maxEvents=1000 randomSeed=%RNDM:1234 jobConfig=HerwigppAlpgen.py outputEVNTFile=%OUT.pool.root evgenJobOpts=MC12JobOpts-00-07-39_v5.tar.gz" --inDS group.phys-gener.alpgen214.107683.WenuNp3_CTEQ6L1_8TeV.TXT.mc12_v1_i11 --outDS user.mkbugge.testAlpgenHerwigppGrid-15 --nFiles 3 --nFilesPerJob 1 --extFile $TestArea/Generators/Herwigpp_i/run/lib/*.so,$TestArea/Generators/Herwigpp_i/run/lib64/*.so
The only things to note here are that we need to tell pathena to send the compiled library files (*.so) to the grid site, and that we use only one input file per subjob, since the Herwig++ interface only handles one input file at a time. As the command stands here, both 32-bit and 64-bit .so files are sent to the grid site, so this will work for both 32-bit and 64-bit setups, since the job options will fetch the right .so files at run time. Also, in this particular example, I ran over only 3 input files from the input dataset, as this was just a test job.
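The run-time selection of the library files can be sketched as follows. This is an illustration only: it assumes that the lib directory holds the 32-bit files and lib64 the 64-bit ones (as the paths in the pathena command suggest), and that the platform tag is read from the CMTCONFIG environment variable. The function names are hypothetical, and the actual job options may decide differently.

```python
import glob
import os
import shutil


def library_subdir(cmtconfig):
    """Return 'lib64' for x86_64 platform tags and 'lib' otherwise."""
    return "lib64" if cmtconfig.startswith("x86_64") else "lib"


def fetch_libraries(search_root, work_dir, cmtconfig=None):
    """Copy the .so files matching the current platform into work_dir."""
    if cmtconfig is None:
        # Assumption: the Athena setup exports the platform tag here.
        cmtconfig = os.environ.get("CMTCONFIG", "")
    source_dir = os.path.join(search_root, library_subdir(cmtconfig))
    copied = []
    for so_file in sorted(glob.glob(os.path.join(source_dir, "*.so"))):
        shutil.copy(so_file, work_dir)
        copied.append(os.path.basename(so_file))
    return copied
```

With a tag such as "i686-slc5-gcc43-opt" the files are taken from lib, and with "x86_64-slc5-gcc43-opt" from lib64.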
How to change the configuration of Herwig++
The only Alpgen-related parameters that are ultimately meant to be set by the user are the merging parameters, which are set through the .dat file, as described in the link above. However, as the Alpgen+Herwig++ combination is still at a validation/testing stage, you might want to try changing the setup yourself. The configuration lines related to Alpgen in the Herwig++ setup are written in the function Herwigpp::writeHWPPinFile in the file Generators/Herwigpp_i/src/Herwigpp.cxx. Here, you can easily change the setup for any tests you might want to perform. After changing anything in this function, you need to do "make" again in the cmt directory before running Generate_trf.py.
Update (18.02.2013): The settings described below can now be changed through the job options, with no need to recompile. You just need to uncomment one of the lines
#topAlg.Herwigpp.useTwoLoopAlphas = False
#topAlg.Herwigpp.useShowerImprovement = False
(The default values are True for both these switches.)
One part of the setup which you may in particular want to change is here:
m_herwigCommandVector.push_back(string("########################################################## "));
m_herwigCommandVector.push_back(string("# Shower improvement # "));
m_herwigCommandVector.push_back(string("########################################################## "));
m_herwigCommandVector.push_back(string("cd /Herwig/Shower"));
m_herwigCommandVector.push_back(string("set Evolver:ColourEvolutionMethod 1"));
m_herwigCommandVector.push_back(string("set PartnerFinder:PartnerMethod 1"));
m_herwigCommandVector.push_back(string("set GtoGGSplitFn:SplittingColourMethod 1"));
These lines change the shower setup in a way that is in principle an improvement (changing the rate of wide angle emissions in the shower). My study of the W+jets process shows that these lines indeed improve agreement with data as well as with Fortran HERWIG.
Another part which you may want to play with is related to the strong coupling constant. The theoretically "correct" thing to do is to use the leading order coupling used in the Alpgen matrix element calculation (determined by the PDF used in Alpgen). However, using the same two-loop coupling constant as in Fortran HERWIG, we get better agreement with Fortran HERWIG as well as with data. You can comment in/out the relevant lines in this part of the code:
//Use these lines to get the "proper" alpha_s as governed by the PDF choice
// sprintf(buf,"%f",aqcdup);
// m_herwigCommandVector.push_back(string("set AlphaQCD:AlphaMZ "+string(buf)));
// sprintf(buf,"%d",nloop);
// m_herwigCommandVector.push_back(string("set AlphaQCD:NumberOfLoops "+string(buf)));
// m_herwigCommandVector.push_back(string("set AlpGenHandler:ShowerAlpha AlphaQCD"));
//Use these lines to get the two-loop alpha_s used by fHerwig
m_herwigCommandVector.push_back(string("#Two-loop alpha_s as used in fHerwig"));
m_herwigCommandVector.push_back(string("set /Herwig/Shower/AlphaQCD:InputOption LambdaQCD"));
m_herwigCommandVector.push_back(string("set /Herwig/Shower/AlphaQCD:LambdaQCD 180.0*MeV"));
m_herwigCommandVector.push_back(string("set /Herwig/Shower/AlphaQCD:NumberOfLoops 2"));
m_herwigCommandVector.push_back(string("set /Herwig/Shower/AlphaQCD:LambdaOption Convert"));
m_herwigCommandVector.push_back(string("set /Herwig/Shower/AlpGenHandler:ShowerAlpha /Herwig/Shower/AlphaQCD"));
As the code stands here, the Fortran HERWIG coupling will be used, which means that Lambda_QCD = 180 MeV and the running is at two-loop order.
Details on the implementation
In order to make the AlpGenToLH conversion happen automatically inside Athena, the code from the standalone conversion tool has been copied into Herwigpp.cxx (and necessary additions to Herwigpp.h have been made as well). The code from "main()" in AlpGenToLH.cc has been put into the function "Herwigpp::readAlpgen()". Other functions from AlpGenToLH.cc have been copied directly, with some alterations. The main alterations are that the merging parameters (including whether the matching should be inclusive or exclusive) are read from the parameters file (*.dat), and that the strong coupling constant (value at Z pole and loop order for evolution) is set based on the PDF used in Alpgen instead of being read from the .par file. The latter alteration eliminates the need for the .par file from Alpgen, thus making full compatibility with existing ATLAS input datasets possible. (The electroweak coupling was also read from the .par file, but it is not actually passed on to Herwig++.) In fact, the strong coupling constant could also have been hard coded, as the job options have already read the tune for CTEQ6L1, so the setup should not be used with any other PDF. In addition to these main changes, some minor changes of the code were necessary, and some alteration was also done to make sure that some important errors were propagated properly to Athena instead of simply producing a call to "exit()".
Two "properties" (that can be set from the job options) were added to the Herwigpp class, namely the "doAlpgen" switch (default value false) and the "alpgenFilenamePrefix" string. The function "Herwigpp::readAlpgen()" is called from "Herwigpp::genInitialize()" if the "doAlpgen" switch has been set to true. The .events file is then converted to a .lhe file in the modern LHE format, and the necessary lines for Alpgen events processing are added to "m_herwigCommandVector" in the function "Herwigpp::writeHWPPinFile".
Validation
The Athena implementation was validated with Athena release 17.2.7.6 by comparing the results from showering W+jets -> e nu + jets inside Athena with the results from showering events from the same input dataset using AlpGenToLH and Herwig++ standalone. Good agreement was observed, as can be seen here.
Running Athena on LHE files
Instead of running directly on the Alpgen output files, the user can run Athena on LHE files which have been produced by the AlpGenToLH conversion tool. The same procedure can be used to run on the LHE files produced by Alpgen version 2.2, once it is released.
Update 24.06.2013: With newer Athena versions, such as 17.2.10.4 and 17.2.10.5, you should be able to run the attached HerwigppAlpgen_LHE.py job options "out of the box". There is no need to check out Herwigpp_i unless you want to change something, and the .so files should automatically be available. The exact agreement between this implementation and the one described above was last validated in release 17.2.10.5 using both W+jets and ttbar+jets events.
First, set up Athena and check out the Herwig++ package as described above. Replace the file
Generators/Herwigpp_i/python/config.py
with the corresponding one attached to this page. Then, go into the cmt directory under Generators/Herwigpp_i/ and type "make". (The Python code need not be compiled, but it seems that doing "make" once is necessary for Athena/CMT to recognize your local version of Generators/Herwigpp_i/.) Now, go into some run directory under your test area (where you set up Athena), and place the following files there (all are found attached to this page):
HerwigppAlpgen_LHE.py
AlpGenHandler.so
BasicLesHouchesFileReader.so
(See comments about the .so files above.) Also put the LHE event file in your working directory. Open the file HerwigppAlpgen_LHE.py, and find the following line:
topAlg.Herwigpp.Commands += hw.alpgen_cmds("events.lhe",0,20.0,0.7,6.0).splitlines()
For the first argument, insert the file name of your LHE event file. Set the second argument to 1 if you are processing the highest jet multiplicity sample; otherwise leave it as 0. The last three arguments are the merging parameters, and you may need to change those depending on the Alpgen settings for the sample you want to process. The values given here are the values needed for ATLAS W+jets samples. Finally, you run Athena by doing something like:
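When processing several samples, it can help to generate this job options line instead of editing it by hand. The helper below is a hypothetical sketch that only formats the text of the line; it does not call hw.alpgen_cmds, and the three merging parameters are passed through unnamed, in the order shown above.

```python
def alpgen_cmds_line(lhe_file, highest_multiplicity, merging=(20.0, 0.7, 6.0)):
    """Format the alpgen_cmds line of HerwigppAlpgen_LHE.py for one sample."""
    if len(merging) != 3:
        raise ValueError("exactly three merging parameters are expected")
    flag = 1 if highest_multiplicity else 0
    args = ['"{0}"'.format(lhe_file), str(flag)]
    args += [repr(float(value)) for value in merging]
    return ("topAlg.Herwigpp.Commands += hw.alpgen_cmds("
            + ",".join(args) + ").splitlines()")


# Default arguments reproduce the line quoted above:
print(alpgen_cmds_line("events.lhe", False))
```

For the highest jet multiplicity sample, pass True as the second argument so that the flag becomes 1.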
Generate_trf.py ecmEnergy=8000 runNumber=100001 firstEvent=1 maxEvents=100 randomSeed=1234 jobConfig=./HerwigppAlpgen_LHE.py outputEVNTFile=./test.pool.root
How to change the configuration of Herwig++
If you want to change the configuration of Herwig++, simply go into Generators/Herwigpp_i/python/config.py. All Alpgen specific commands are set in the function alpgen_cmds, and you can easily edit them here. Recompilation should not be necessary for the changes to take effect.
Details on the implementation
The function alpgen_cmds in Generators/Herwigpp_i/python/config.py does not simply return a predefined string. Certain parameters are extracted from the LHE file in order for the Herwig++ configuration to be set up properly. You may get an error from within alpgen_cmds if your LHE file does not look as expected.
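To illustrate what extracting parameters from an LHE file involves, here is a minimal sketch that reads the first line of the <init> block, whose layout is fixed by the Les Houches event file standard (beam IDs, beam energies, PDF information, weighting strategy, number of processes). Which parameters alpgen_cmds actually reads is not stated here, so the choice of fields is an assumption, and read_lhe_init is a hypothetical name.

```python
import re


def _to_float(text):
    """Parse a float, accepting Fortran-style 'D' exponents."""
    return float(text.replace("D", "E").replace("d", "e"))


def read_lhe_init(lhe_text):
    """Return beam information from the <init> block of an LHE file."""
    match = re.search(r"<init>(.*?)</init>", lhe_text, re.DOTALL)
    if match is None:
        raise ValueError("no <init> block found; is this an LHE file?")
    lines = [line for line in match.group(1).splitlines() if line.strip()]
    if not lines:
        raise ValueError("<init> block is empty")
    fields = lines[0].split()
    if len(fields) < 10:
        raise ValueError("unexpected <init> header: " + lines[0])
    return {
        "beam_ids": (int(fields[0]), int(fields[1])),
        "beam_energies_gev": (_to_float(fields[2]), _to_float(fields[3])),
        "n_processes": int(fields[9]),
    }
```

Raising an error on an unexpected layout mirrors the behaviour described above, where a malformed LHE file leads to an error from within alpgen_cmds.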
Validation
This implementation was tested against the C++ implementation described above using a small test sample of W+jets events. The results were identical.
Choice of strong coupling constant in the Herwig++ shower
In the initial validation process for the Alpgen+Herwig++ setup, two different choices have been explored for the strong coupling constant used in the Herwig++ shower. Some results can be found here. The 1-loop alpha_s which is used for the PDF and in the Alpgen matrix element calculation is theoretically preferred for a more consistent MLM matching. Indeed, this is the choice that gives the best result in terms of the total normalization (meaning that the cross sections of all the parton multiplicities after matching add up to the same as the cross section of the "+0jets" sample before matching). However, the 2-loop alpha_s which is used by Fortran HERWIG seems to give slightly better agreement with data in terms of the shapes of jet distributions. The user can easily switch between these alpha_s choices from the top job options, by inserting the line
topAlg.Herwigpp.useTwoLoopAlphas = False
The default value of this switch is True, meaning that the 2-loop alpha_s will be used by default.
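The normalization criterion mentioned above can be written as a simple ratio. The helper below is a hypothetical sketch of that check; any numbers passed to it must come from your own samples.

```python
def normalisation_ratio(matched_xsecs, zero_jet_xsec_before_matching):
    """Ratio of summed post-matching cross sections to the '+0jets' one.

    matched_xsecs: cross sections of all parton multiplicities after matching.
    zero_jet_xsec_before_matching: cross section of the '+0jets' sample
    before matching. All values must be in the same units.
    """
    if zero_jet_xsec_before_matching <= 0:
        raise ValueError("reference cross section must be positive")
    return sum(matched_xsecs) / zero_jet_xsec_before_matching
```

A ratio close to 1 indicates that the total normalization is preserved by the matching.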
Physics validation
Physics validation of the Alpgen+Herwig++ setup has been performed using both W+jets (7 TeV and 8 TeV) and ttbar+jets (7 TeV only) events. The setup has been validated against Fortran HERWIG samples and against data (7 TeV only) using Rivet.
For details on the W+jets validation, please see these slides: HerwigppAlpgen.pdf, HerwigppAlpgenUpdate.pdf. Also, you can access some Rivet plots directly here (directories named kfac contain plots where all samples are scaled to a higher order cross section).
Plots from the validation of ttbar+jets events can be found here. Here, the only analysis containing data is the "gap fraction" analysis, and in this analysis very good agreement is seen between Fortran HERWIG, Herwig++, and data. The agreement between Fortran HERWIG and Herwig++ in the pure MC analyses is not perfect, but perfect agreement is not expected here. As one can also see in the W+jets validation plots, there are significant differences between Fortran HERWIG and Herwig++ in the pure MC analysis even though agreement in the ATLAS W+jets analysis is excellent. The disagreement in MC_WJETS is understood to come mainly from underlying event, as agreement without underlying event (see slides) is much better. We can thus expect to see similar effects of underlying event in the ttbar+jets events, and we should not worry too much about disagreements in pure MC analyses.
Attached files: