FASER MDC Production
Overview
This page explains how to produce FASER Monte Carlo samples, in particular the samples used for the MDC. All production samples are generated at CERN using the fasermc service account. If you need to use this account and don't know the password, ask one of the usual suspects like Dave Casper, Eric Torrence or Carl Gwilliam.
Code Releases (tags)
Information about creating and installing production tags can be found at ProductionTags.
Preparing Jobs
Most production jobs should be run through the CERN Batch Service. This is a condor system, and there is good documentation along with many web resources to solve common problems. Jobs must be submitted using a condor job submission file that specifies executable shell scripts with optional arguments. The shell scripts are available in git, while the condor submission scripts are now kept in their own MonteCarloProduction repository.
The MDC scripts all take two directories as inputs: the release directory and the working directory. The release directory should be the full path to where the tag was installed, while the work directory should be the /tmp area on the condor worker nodes. The scripts also take optional arguments specifying the EOS directories into which the output file and log file are copied at the conclusion of the job.
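As an illustrative sketch only (all names and paths here are hypothetical, and the argument order is inferred from the condor queue example below), running one of these job scripts by hand might look like:
submit_faserMDC_particlegun.sh my_config.json 0 /path/to/release/dir /tmp /eos/path/for/output /eos/path/for/logs
Here the second argument is the segment number, and the last two arguments are the optional EOS output and log directories.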
The condor system produces some independent logging output. I have used subdirectories in the submit directory to keep this organized. These logs aren't useful unless there is a job submission problem.
The job submission file must contain the job to run, the directories needed (which are passed as arguments to the job script), and the location for condor output. Any directories referenced must be created before the jobs are submitted. Most MDC jobs have a json configuration file which contains the default parameters used to control the actual job python script. These are all in git, so they can be referenced by their location relative to the release directory. Currently, generation and simulation of ParticleGun and Foresee signal samples are done using the submit_faserMDC_particlegun.sh and submit_faserMDC_foresee.sh scripts available in the Generation/scripts area.
One thing I am quite proud of is figuring out the following syntax to allow submission of multiple jobs without having to write a separate line for each:
queue 25 arguments from (
$(config) $(Step) $(rel_dir) $(scratch_dir)
)
The Step variable is an integer that increments for each queue instance (so in this example it runs from 0 to 24). As the second argument to the bash job script, it specifies the segment number, so each output file will be unique. Note that I have also used the hashed output file name to seed the random number generator, so each output file will contain different events, while regenerating a given file within a release remains reproducible. To extend an existing sample, an offset can be added to the Step variable like so:
queue 25 arguments from (
$(config) $$([$(Step)+25]) $(rel_dir) $(scratch_dir)
)
This will extend the previous production by another 25 files. The job execution script should automatically detect which release is being used and append this to the output file name.
The CERN batch system has several different queues with different run times. The naming of these queues was done in a 'cutesy' way with the following logic:
# Job queue (espresso is default)
# espresso = 20 minutes
# microcentury = 1 hour
# longlunch = 2 hours
# workday = 8 hours
# tomorrow = 1 day
# testmatch = 3 days
# nextweek = 1 week
The queue is specified in the condor submission file using the line:
+JobFlavour = "longlunch"
If in doubt about the running time, use 1 day (tomorrow), although for most sim jobs 8 hours (workday) is enough. Run some test jobs if you don't know what to expect.
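Putting these pieces together, a complete submission file might look something like the following sketch. This is illustrative only: the executable, config file name, paths, and log file layout are assumptions, while the queue statement and JobFlavour line follow the examples on this page.
# Illustrative condor submission file (names and paths are hypothetical)
executable   = submit_faserMDC_particlegun.sh
config       = my_config.json
rel_dir      = /path/to/release/dir
scratch_dir  = /tmp
# condor logging output, kept organized under the submit directory
output       = submit/logs/job.$(ClusterId).$(ProcId).out
error        = submit/logs/job.$(ClusterId).$(ProcId).err
log          = submit/logs/job.$(ClusterId).log
+JobFlavour  = "workday"
queue 25 arguments from (
$(config) $(Step) $(rel_dir) $(scratch_dir)
)
Remember that any directories referenced here, such as submit/logs, must be created before the jobs are submitted.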
Submitting Jobs
Once you have a condor submission file, you are ready to submit some jobs:
condor_submit myfile.condor
To see the status of your jobs, condor_q is helpful.
The output will be written to the working directory, in a subdirectory named after the run number. There will be a separate log file for each job, and a subdirectory with the actual calypso output.
Simulation
Simulation can be done with integrated 4-vector generation, or by using external 4-vector files as inputs. A few different methods are described here.
Particle Gun
The particle gun combines 4-vector generation with simulation. The main script to generate particle gun samples is faserMDC_particlegun.py.
This script has a large number of options, and to make these easier to manage, the options can be read from or saved to an XML file. Standard samples can be found in the config area on git. So instead of specifying many options at the command line, it is enough to specify the config file:
faserMDC_particlegun.py --conf config.xml
Any additional command-line options will override the options in the config file.
To produce a new configuration file, or just to see what the configured options are, the --dump option can be specified. If --noexec is also provided, the job won't actually run, but the configuration will still be dumped; this is convenient for testing configurations. Another testing option is --nevts, which sets the total number of events that will be generated, overriding the --file_length specifier from a config file.
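As a quick illustration (config.xml is a placeholder, and the exact value syntax for --nevts is an assumption), the testing options described above can be combined like this:
faserMDC_particlegun.py --conf config.xml --dump --noexec
faserMDC_particlegun.py --conf config.xml --nevts 100
The first command just prints the configuration without running anything; the second runs a short 100-event test job.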
Dark Photons (foresee)
The foresee-based generators use a precompiled production phase-space library, and use the decay-in-flight generator to produce the final 4-vectors. The main script to generate these samples is faserMDC_foresee.py. This is very similar to the particle gun script described above, and the parameters can be specified in XML configuration files. By default, the IP1 crossing angle and the shift to the FASER coordinate system are applied.
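Since it mirrors the particle gun interface, a typical invocation (config file name hypothetical) is simply:
faserMDC_foresee.py --conf my_foresee_config.xml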
External 4-vectors
The script to read external 4-vectors and simulate them is faserMDC_simulation.py. This is used for simulating Genie, FLUKA, and also Foresee samples (with HEPEvt output). Because the input 4-vectors can come in a variety of naming schemes, the required arguments to this script are the full input file path and the output file name.
The run number to use in the simulated events will be extracted from the output file name if it follows the FASER naming convention. Crossing angles and shifts can be specified at the command line. There is also the option to skip some number of events in the input file with --skip. This is useful when large 4-vector files have been produced and we want to split them up into more manageable sizes in the simulated HITS files.
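As a sketch (paths and file names are placeholders, and the exact form --skip expects is an assumption), simulating a chunk of a large 4-vector file might look like:
faserMDC_simulation.py /eos/path/to/large_4vector_file.hepevt my_sample-00001-HITS.root --skip 1000
Here the first 1000 input events are skipped, so this job picks up where a previous segment left off.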
Digitization
Digitization from HITS files to RDO output is performed using the faserMDC_digi.py script. The only required argument is the full path to the input HITS file. The output file name will automatically be generated following the FASER naming conventions. A subset of the events in the HITS file can be processed using a command-line switch.
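A minimal invocation (input path hypothetical) is then just:
faserMDC_digi.py /eos/path/to/my_sample-00000-HITS.root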
As the HITS files are quite large, the digitization step offers the option to merge multiple HITS files together into one output RDO file. This is done in the faserMDC_digi_merge.py script. This takes an input directory as an argument (rather than a file path), and the --files parameter controls how many files to merge. The specific output file written is controlled using the --slice parameter. So with --files=5 --slice=0, the job will merge input HITS files 0 through 4 into a single output file, while --files=5 --slice=1 will merge input files 5 through 9.
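For example, to produce the second merged RDO file from a directory of HITS files (directory path hypothetical):
faserMDC_digi_merge.py /eos/path/to/hits_directory --files=5 --slice=1
This merges input HITS files 5 through 9 into a single output RDO file.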
Submitting these jobs to condor is controlled using the submit_faserMDC_digi_merge.sh script, and example condor submission scripts can be found in the mdc directories under the fasermc account.
Reconstruction
-- EricTorrence - 2022-05-20