The MillePede Production System (MPS)


Complete: 4


Goal of the page

Explain the use of the MPS to run parallelized production of MillePede alignment constants.

Contacts

Original Authors:

Introduction

The MillePede Production System (MPS) is an ad-hoc implementation of a tool for running tracker alignment with MillePede-II in a production environment. It builds on the MillePede Alignment Producer (see SWGuideMillepedeIIAlgorithm) and an underlying batch system (presently LSF). The general workflow of MillePede production is displayed in the following figure.

millepede-workflow-700.png

The MillePede-II alignment procedure consists of two steps. The first step, Mille, processes all tracks and hit residuals and prepares the information for the final global fit, which is written out to a "Millepede binary file". The MPS parallelizes this step by splitting the set of input files across a large number of Mille jobs. The second step, Pede, reads the information from all binary files and performs the global fit that results in the full set of alignment constants.

The MPS supports this process by

  • splitting the set of input files across a given number of parallel jobs
  • preparing individual run directories for all Mille jobs and the Pede job
  • preparing run scripts for all jobs from templates
  • preparing configuration (py) files for all jobs from a template
  • submitting the Mille jobs, all at once or in several steps
  • displaying the status of running jobs
  • checking the output of finished jobs
  • cancelling and resubmitting jobs if needed
  • submitting the Pede job when all Mille jobs have successfully finished
  • storing relevant results into a separate directory for conservation

MPS is presently configured to work with the LSF installation at CERN, assuming the user is logged in on a machine of the lxplus cluster.

Setting up Alignments using the new version of MPS (-> version 2, documentation under development)

For setting up an alignment, some sort of CMSSW config template is necessary. MPS fills in the necessary information via placeholders; doing this by hand would be too tedious, because what needs to be filled in differs for each of the jobs. Until January 2016 a different config template was used for each track type (cosmics, minimum bias, Z -> mu mu, isolated muons). These templates differ mainly in their parameters for track refitting and selection, but large parts are identical. To improve the workflow of setting up alignments, a universal config template for all track types was constructed, based on an already existing unified track selection and refitting tool.

This universal config template comes in conjunction with a new script called mps_alisetup.py, which serves a similar purpose as the old setup_align.pl script. The difference is that the new script does not need to be modified for each alignment: the variables that have to be adjusted are stored separately in a .ini configuration file (alignment_config.ini), which is passed to mps_alisetup.py as an argument. This way the script itself can be shipped in the CMSSW release together with the other MPS scripts.

Recipe

The recipe to set up an alignment consists of the following steps (a condensed command sketch is given after the list):

  • go to the MillePede production (MPproduction) area: /afs/cern.ch/cms/CAF/CMSALCA/ALCA_TRACKERALIGN/MP/MPproduction
  • source a CMSSW release provided in MPproduction
  • create a new campaign using mps_setup_new_align.py -t <type of data> -d <description>
    • currently the release areas for CMSSW_8_0_3_patch1 contain version 2 of MPS (this might be outdated at the time of reading)
    • mps_setup_new_align.py has an optional argument -c/--copy to which the name of a previous campaign can be supplied, e.g. mp1996
      • copies all files from this campaign with the following extensions .py, .ini, .txt
      • if this option is omitted, a default configuration will be created that has to be filled with the desired settings
  • change to the newly created campaign directory (indicated by the output of mps_setup_new_align.py)
  • modify the configuration files to your needs
    • you might need to create file lists for the datasets used in your alignment campaign (best stored under MPproduction/datasetfiles)
  • setup the alignment campaign with mps_alisetup.py
  • check the number of mille jobs by mps_stat.py
  • submit the mille jobs using mps_fire.py
  • check their status regularly with mps_stat.py
    • if the jobs are in DONE state, fetch the results with mps_fetch.py
    • if the jobs succeeded, they are now in the OK state (check this with mps_stat.py)
  • if all mille jobs are OK, submit the pede (merge) job with mps_fire.py -m
    • wait...
    • proceed with the job check as for the mille jobs
  • if the pede job finished and is (after fetching) in the OK state, you are ready to run the validations using the All-In-One tool
    • typically a pede job ends with a warning about bad measurements
      • in real data this is often fine, but look at the output files and, of course, validate the resulting database sqlite file
      • if in doubt, ask the MillePede experts under cms-millepede@cern.ch
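For orientation, the recipe above condenses into roughly the following command sequence. This is a sketch only: the campaign name mp2000, the data type, the description and the file names are placeholders, and the exact command-line syntax should be checked with the scripts themselves.

cd /afs/cern.ch/cms/CAF/CMSALCA/ALCA_TRACKERALIGN/MP/MPproduction
cd CMSSW_8_0_3_patch1/src && cmsenv && cd -            # release name may differ; cmsenv sets up the CMSSW environment
mps_setup_new_align.py -t <type of data> -d "<short description>" -c mp1996   # -c copies the .py/.ini/.txt files of an earlier campaign
cd mp2000                                              # placeholder: use the directory reported by the previous command
# edit alignment_config.ini (and, if needed, the config template and file lists), then:
mps_alisetup.py alignment_config.ini
mps_stat.py                                            # check the number of mille jobs
mps_fire.py                                            # submit the mille jobs
mps_stat.py                                            # repeat until the jobs are DONE
mps_fetch.py                                           # fetch results; successful jobs go to the OK state
mps_fire.py -m                                         # once all mille jobs are OK: submit the pede (merge) job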

Here's an overview of the procedure (note: alignment_setup.py is now called mps_alisetup.py):

overviewWorkflow.png

Slides from the MP-Alignment-Meeting

Configuration of an alignment campaign

Typically one just needs to modify the ini file. Such a file has different sections; the possible options are explained in the following.

[general] section

| option | expected value | required | description | example |
| classInf | <mille job queue>:<pede job queue> | yes | pair of CAF queues to be used for mille and pede jobs | cmscaf1nd:cmscafspec1nw |
| jobname | <name> | yes | identifier used for batch jobs | MP_2015B_ReReco |
| pedeMem | <memory in MB> | yes | memory requirement for the pede job | 32000 |
| datasetdir | <path> | no | path to where the file lists are stored; can be referenced in [dataset:<name>] sections using ${datasetdir} | /afs/cern.ch/cms/CAF/CMSALCA/ALCA_TRACKERALIGN/MP/MPproduction/datasetfiles |
| configTemplate | <path to file> | no | default config template to be used in [dataset:<name>] sections | universalConfigTemplate.py |
| globaltag | <global tag> | no | default global tag to be used in [dataset:<name>] sections | 80X_dataRun2_v9 |
| json | <path to file> | no | default JSON file to be used in [dataset:<name>] sections | /afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions15/13TeV/Reprocessing/Cert_13TeV_16Dec2015ReReco_Collisions15_50ns_JSON.txt |
| pedesettings | <comma separated list of files> | no | expert option; if provided, the content of those configuration snippet files is appended to the created pede configuration file; one pede job per configuration snippet is created | pede_plain.txt,pede_1.txt,pede_2.txt |

[dataset:<name>] sections

| option | expected value | required | description | example(s) |
| collection | <track collection> | yes | describes the kind of tracks and determines the track selection applied | ALCARECOTkAlMinBias, ALCARECOTkAlCosmicsCTF0T (for both cosmics at 0T and 3.8T), ALCARECOTkAlMuonIsolated, ALCARECOTkAlZMuMu |
| inputFileList | <path to file> | yes | path to the file list created for this dataset; 'datasetdir' defined in the [general] section can be referenced here using ${datasetdir} | inputFileList = ${datasetdir}/MC/CMSSW_7_6_X/TKCosmics_38T.CosmicWinter15DR-TkAlCosmics0T-DECO_76X_mcRun2cosmics_asymptotic_deco_v0-v1_1p8M-tracks.txt |
| globaltag | <global tag> | yes, if not given in [general] section | global tag to be used | 80X_dataRun2_v9 |
| configTemplate | <path to file> | yes, if not given in [general] section | config template to be used | universalConfigTemplate.py |
| json | <path to file> | yes, if not given in [general] section | JSON file to be used | /afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions15/13TeV/Reprocessing/Cert_13TeV_16Dec2015ReReco_Collisions15_50ns_JSON.txt |
| cosmicsZeroTesla | <boolean> | no | explicitly state if the 0T magnetic field is used; applies only to collection ALCARECOTkAlCosmicsCTF0T; defaults to false | true |
| cosmicsDecoMode | <boolean> | no | explicitly state if cosmics were taken in deconvolution mode; applies only to collection ALCARECOTkAlCosmicsCTF0T; defaults to false, i.e. peak mode | true |
| primaryWidth | <float> | no | sets a different primaryWidth for the AlignmentProducer; for the ALCARECOTkAlZMuMu collection it is already set, so this variable might be completely useless | -1.0 |
| njobs | <integer> | no | number of jobs to be used for this dataset; defaults to the number of files in inputFileList, but should be set to a lower value to avoid a too large number of files on EOS; if njobs is larger than the number of files, one job per file is created | 10 |
| weight | <comma separated floats or names from [weights] section> | no | weight(s) assigned to this dataset in the final pede job(s), defaults to 1.0 | 0.5 |

[weights] section

This section is optional and is used to assign common weights for some datasets. The syntax is as follows

[weights]
<name> = <comma separated list of floats>
<another name> = <comma separated list of floats>
...

The names <name>, <another name>, etc. can be used in the value for the weight option in the dataset sections.
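Putting the pieces together, a complete alignment_config.ini could look like the following sketch. All values, dataset names and file-list paths are placeholders chosen for illustration (loosely based on the examples in the tables above), not a recommended configuration; in practice you edit the file created for your campaign.

# alignment_config.ini -- illustrative sketch only; all values are placeholders
[general]
classInf       = cmscaf1nd:cmscafspec1nw
jobname        = MP_test_campaign
pedeMem        = 32000
datasetdir     = /afs/cern.ch/cms/CAF/CMSALCA/ALCA_TRACKERALIGN/MP/MPproduction/datasetfiles
configTemplate = universalConfigTemplate.py
globaltag      = 80X_dataRun2_v9

[dataset:Cosmics38T]
collection      = ALCARECOTkAlCosmicsCTF0T
inputFileList   = ${datasetdir}/MC/CMSSW_7_6_X/TKCosmics_38T.txt
cosmicsDecoMode = true
njobs           = 10
weight          = cosmicsWeight

[dataset:MinBias]
collection    = ALCARECOTkAlMinBias
inputFileList = ${datasetdir}/Data2015/MinBias_ReReco.txt
njobs         = 50

[weights]
cosmicsWeight = 0.5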

Missing features or Problems

Ideas for future developments:

  • Think of a better connection between the config template and mps_alisetup.py/alignment_config.ini. Currently it uses regular expression matching and substitution.
  • Overwrite conditions from ini-file (startgeometry).
  • Set pedesettings from ini-file.
  • Set alignables from ini-file.
The last three points would make the config template completely static, so it could just stay in the release directory. Possible downside: losing sight of what is happening under the surface.

Installation of MPS

The MPS is released as part of CMSSW, but the most up-to-date version might have to be installed following the recipes at SWGuideMillepedeIIProductionEnvironment. The scripts that run mille and pede jobs on the farm (mps_runMille_template.sh and mps_runPede_rfcp_template.sh) are meant to work generically, except that the CMSSW version needs to be adjusted. The alignment_cfg.py has to be adapted to the specific needs.

Further information is provided in these tutorials:

Recipe to build and test Pede :

Here are the steps to build pede for 80X. This can be done for other releases as well, keeping in mind the correct architecture and release name. Before starting to build pede, fork cmsdist and pkgtools to your GitHub repository if you have not done so already.


0) Login and set up initial parameters:

First log in to your lxplus account and then log in to a CMS development machine using `ssh cmsdev07`.
CMSSW=CMSSW_8_0_X
ARCH=slc6_amd64_gcc493
BUILDDIR=/build

1) Create build area and prepare CMSDIST and PKGTOOLS :

HERE=${BUILDDIR}/${USER}
DATETIME=$(date +'%Y%m%d_%H%M')
TOPDIR=${HERE}/ext/${CMSSW}/${DATETIME}
mkdir -p $TOPDIR
cd $TOPDIR
URL=https://raw.githubusercontent.com/cms-sw/cms-bot/master/config.map
eval $(curl $URL | grep "SCRAM_ARCH=$ARCH;" | grep "RELEASE_QUEUE=$CMSSW;")
git clone -b $CMSDIST_TAG git@github.com:cms-sw/cmsdist.git CMSDIST
git clone -b $PKGTOOLS_TAG git@github.com:cms-sw/pkgtools.git PKGTOOLS

2) Edit millepede.spec file, build and test the external :

pushd CMSDIST
<update the millepede.spec file by changing the "tag" (#) in the very first (commented) line to the latest one>
git commit millepede.spec   (an editor opens; add a commit message, then save and close it)
popd
screen -L time PKGTOOLS/cmsBuild -i a -a $ARCH --builders 4 -j $(($(getconf _NPROCESSORS_ONLN) * 2)) build cmssw-tool-conf

This `screen` command will compile and build using PKGTOOLS (will take some time).

(#) all the tags can be found here : https://svnsrv.desy.de/public/MillepedeII/tags/

3) Testing your builds :

Once everything has compiled successfully, the `pede` executable can be found in the directory $TOPDIR/a/slc6_amd64_gcc493/external/millepede/V04-03-03/

Here "V04-03-03" is the latest tag. it should be same as the tag you changed into the millepede.spec file.

  1. To test the newly built pede, first copy the pede executable to some directory in your lxplus account.
  2. Create a millepede project in the MPproduction area: /afs/cern.ch/cms/CAF/CMSALCA/ALCA_TRACKERALIGN/MP/MPproduction/
  3. Either borrow the configuration from mp1915 for high-level structure alignment or create your own. Similarly, repeat the above step for module-level alignment by borrowing the configuration from mp1910 or creating your own.
  4. After all mille jobs are done, run the pede job twice, once with the older pede and once with the pede you built. To run the job with the newer pede executable, go to the jobm or jobm1 (whichever applies) directory and change the following line in the alignment_merge.py config file: process.AlignmentProducer.algoConfig.pedeSteerer.pedeCommand = "/path/to/your/newer/pede/executable" . Following advice from Claus, the "pedesetting.txt" file was further updated by changing 'bandwidth 6 1' to 'bandwidth 6' for the pede job run with the newer pede; for this kind of change, keep in touch with the experts.
  5. When the jobs with both the older and the newly built pede are done, go to their respective directories and compare the results with each other: compare the fit parameters in "pede.dump.gz" from both the older and the newer pede, and optionally also "millepede.res.gz" (a comparison sketch is given below). If you do not find significant differences within some tolerance, the testing part is done.
  • One can also do a geometry comparison in addition, if needed.
If you and all the experts are satisfied with the above test, go ahead with the pull request.
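A minimal way to carry out the comparison mentioned in step 5 (assuming the two merge jobs ran in jobData/jobm and jobData/jobm1) could be:

zdiff jobData/jobm/pede.dump.gz jobData/jobm1/pede.dump.gz | less          # compare fit parameters
zdiff jobData/jobm/millepede.res.gz jobData/jobm1/millepede.res.gz | head -n 50   # compare result files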

4) Push your changes to your cmsdist fork and then create a pull request from your repository to the official one:

cd CMSDIST
git remote add $USER <your-cmsdist-fork>
git push $USER $CMSDIST_TAG
create pull request

For reference one can browse through :

Usage Environment of MPS

The main idea of MPS is that it will set up and control a job series performing one MillePede alignment task. Every job series should be set up in its own particular directory, and the MPS commands will create and maintain a subdirectory structure and additional files in this directory. All MPS commands referring to a job series must be issued from the same UNIX working directory. The setup parameters and the current state of each job are maintained in a file named "mps.db"; all job-specific information is stored in a directory tree named "jobData". For each job a directory jobData/jobNNN is created, where NNN stands for the three-digit job number. The maximum number of Mille jobs MPS sets up in parallel is 999. The Pede job is assigned a special directory jobData/jobm. It is also possible to run parallel Mille jobs with different configurations, or later on run different Pede jobs, e.g. with slightly varied configuration (the latter will get directory names jobData/jobm1, jobData/jobm2, etc.).

As a result, an arbitrary number of alignment job series can be run in parallel without interfering, but one should never attempt to set up two job series in the same directory. For example, the mps_stat.pl command will always show the jobs related to the job series that was initiated in the same directory.
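For illustration, the working directory of a job series with, say, three Mille jobs and one Pede job would contain roughly the following structure (a sketch, not a complete listing):

mps.db                       job-series database: setup parameters and current job states
jobData/
  ScriptsAndCfg000.tar       original scripts and configuration files (see mps_setup below)
  job001/ job002/ job003/    one run directory per Mille job (at most 999)
    theScript.sh             batch script generated from the milleScript template
    (plus the job's configuration file, logs, ...)
  jobm/                      run directory of the Pede job
  jobm1/ jobm2/ ...          additional Pede jobs, if set up later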

Job States

The status of each job is specified by a named job state, which also determines which transitions are possible. A diagram illustrating this is shown in the following figure:

MPS-states.png

The meaning of the states is listed in the following:

| Job state | Meaning |
| SETUP | Job has been set up, and is ready to be submitted |
| SUBTD | Job has been submitted, control turned over to batch system |
| PEND | Job is pending with batch system |
| RUN | Job is running under batch system |
| SUSP | Job has been suspended by batch system |
| DONE | Job has left control of batch system |
| OK | Job has passed checks successfully |
| FAIL | Job has failed at least one of the (crucial) checks |

Explanation of Commands

The mpedegui Command

Usage:
mpedegui.pl
This command starts a GUI for the mps_setup, mps_fire, mps_retry, mps_fetch, mps_stat and mps_save commands.

mpedegui-1.11.png

The working directory is shown at the top.

The following few lines contain the parameters and options for the mps_setup command; if a database file (mps.db) is found in the working directory, they are recovered from there. For every needed input file, you can either write the name in the field, or browse it with the browse button. When you run the mps_setup command by pressing the button, the result is shown in the bottom part of the GUI.

For the mps_fire and mps_retry commands you have to choose the same parameters and options as for the command line; once you run the commands the result is shown in the bottom part of the GUI.

The mps_fetch and mps_stat commands have no parameters.

The mps_save command needs as input the name of the directory where the output files should be saved.

The mps_setup Command

Usage:
mps_setup.pl [-m] [-a] milleScript cfgTemplate infiList nJobs class jobname [pedeScript [mssDir]]
mps_setup.pl -h
Parameters:
milleScript Template script for Mille batch job
cfgTemplate Template for CMSSW configuration file
infiList List of input files, one line each (as it could be within the quotes in a configuration file). If its first line starts with CastorPool=<pool>, everything behind CastorPool= will be interpreted as the CASTOR disk pool where the CMSSW input files should be read from. When input files are on the CAF disk pool, CastorPool=cmscaf should be used, so that the correct setup can be executed. Lines starting with '#' (prepended by any amount of whitespace) are ignored - it is recommended to use this to comment which dataset the files belong to etc.
nJobs Number of parallel jobs for Mille
class The batch system queue/class. If class contains a ':', it will be split: the part before the ':' defines the class for Mille jobs, the part after it the class for the Pede job (e.g. cmscaf1nd:cmscaf1nw). The class can be:
- any of the normal LSF queues (8nm, 1nh, 8nh, 1nd, 2nd, 1nw, 2nw)
- special CAF queues (cmscaf1nh, cmscaf1nd, cmscaf1nw)
- special CAF queues, for pede job (cmscafspec1nh, cmscafspec1nd, cmscafspec1nw). E.g. cmscafspec1nh corresponds to "-q cmscaf1nh -R cmscafspec" (note that due to option '-M <mem>' you can safely use simple cmscaf queues for pede jobs)
jobname Jobname for batch system
pedeScript Template script for Pede batch job
mssDir Mass storage directory (Castor) for output of binary files. If 'mssDir' contains a colon ':', the part before ':' defines the pool and the part after ':' the directory (e.g. cmscafuser://castor/cern.ch/cms/store/caf/user/<username>/...).
Options:
-h Help on arguments.
-N name Give the jobs the name "name".
-m Set up a Pede job in addition to Mille jobs
-a Set up additional Mille jobs with other cfgTemplate etc. A following Pede job will merge the previous and the new Mille jobs.
-M pedeMem The allocated memory (MB) for pede (min: 1024 MB); if the parameter is not given, the memory is extracted from the pede executable name (if in the form pede_*GB); the memory is set to 2560 MB if neither of the two is available.
This is the main command that sets up a job series and defines what should be done. After mps_setup.pl, the jobs are in SETUP state.

The mps_setup command creates the mps.db status file and the jobData directory tree within the directory where it is invoked, hence the user must have write access there. An existing jobData tree will be removed, unless the -a option is used. The LSF system usually also needs access to the directory, so use AFS space.

The command also creates a tarfile 'ScriptsAndCfgNNN.tar' in the jobData directory, which contains the original scripts and configuration (py) files. NNN stands for a three-digit number, increased by one every time mps_setup is invoked with the -a option.

A word of warning is in order regarding the mssDir parameter. While the input and output spaces for each job are neatly separated within the individual job directories, the mass storage directory must necessarily be outside of this tree. It is thus very important to avoid at all costs that two job series write to the same mass storage directory.

As the mps_setup command is by far the most complex directive, it can be helpful to store each instance in a single-line script. If a faulty mps_setup command has been issued, it is usually best to just execute another mps_setup command with the correct parameters.
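As an illustration of such a single-line setup script, the following sketch (file names, job name, queues and mass storage path are placeholders) would set up 100 Mille jobs plus a Pede job with 4 GB of memory:

# setup.sh -- example mps_setup invocation; adapt all arguments to your own campaign
mps_setup.pl -m -M 4096 mps_runMille_template.sh alignment_cfg.py fileList.txt 100 \
    cmscaf1nd:cmscafspec1nw myAlignment mps_runPede_rfcp_template.sh \
    cmscafuser://castor/cern.ch/cms/store/caf/user/<username>/myAlignmentBinaries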

The mps_stat Command

Usage:
mps_stat.pl
Parameters:
none
The mps_stat command displays the updated status of all jobs belonging to the job series. In detail, it works in three steps:

  • mps_stat reads the last status stored in the mps.db file
  • mps_stat updates the status for those jobs that are under control of the batch system
  • mps_stat displays the updated status and saves it back to the mps.db file

An example for the output is shown in the following picture.
mgstat.gif

The meaning of the columns displayed is as follows:

| Column | Description |
| ### | Job sequence number |
| dir | Name of job directory |
| jobid | Job number assigned by batch system (LSF) |
| stat | Job status |
| ntry | Number of retries |
| rtime | CPU time (available after job termination) |
| nevt | Number of events processed (available after job termination) |
| time / evt | CPU time per event (available after job termination) |
| remark | additional information |
While a job is running, the CPU time information is taken from the output of the "bjobs" command. When a job has finished, this information is replaced by the number of NCU seconds taken from the job's standard output, divided by a conversion factor (assumed to be 3).

The remark field, which is not shown in the figure above, has versatile uses. Some of the checks performed during mps_fetch place additional information here reminding about the nature of a failure. For the time being, for jobs in RUN state the CPU increment since the last invocation of mps_stat is shown, which allows an easy identification of stalled or hanging jobs.

It is important to realize that the mps_stat command is the only means for MPS to find out which jobs are finished. It is therefore important to invoke mps_stat before issuing a mps_fetch command.

The mps_fire Command

Usage:
mps_fire.pl [-m[f]] [nJobsSubmit]
Parameters:
nJobsSubmit Number of Mille jobs to be submitted (default is one).
Options:
-m Submit the Pede job
-mf Force the submission of the Pede job in the case when some Mille jobs are not in the "OK" state
Submit the first nJobsSubmit jobs that are in SETUP state.

If option -m is given, submit the Pede job(s). Usually you need all the Mille jobs to be in the "OK" state to run the Pede job; you can force the Pede job to run anyway with the -mf option (only the jobs in the "OK" state will be used in this case).

When the force option is used, the Pede configuration file and script are regenerated, after a backup is created in the Pede run directory. The template for the new configuration file is the one contained in the Pede directory; for the new script the template is the one passed originally to the mps_setup command (whose name is stored in the mps.db file).

The mps_fetch Command

Usage:
mps_fetch.pl
Parameters:
none
Fetch all jobs that are in DONE state, including the Pede job if there is one, and perform checks on the output. After mps_fetch, the jobs will be moved either to OK or to FAIL state.

The mps_kill Command

Usage:
mps_kill.pl [-a] [-m] jobSpec
Parameters:
jobSpec A specification of the job(s) to be cancelled. This can be a job number (without leading 0's), or a job state (FAIL, RUN,...).
Options:
-a Cancel all jobs
-m Cancel the Pede job
Cancel all or a selection of jobs. The cancelled jobs will be moved to FAIL state.

The mps_retry Command

Usage:
mps_retry.pl [-a] [-f] [-m] jobSpec
Parameters:
jobSpec A specification of the job(s) to be retried. This can be a job number (without leading 0's), or a job state (FAIL, RUN,...).
Options:
-f Force the retry (e.g. if job state is OK)
-m Retry the Pede job(s)
Retry all or a selection of jobs. The retried jobs will once more go to SETUP state, and can be resubmitted with the mps_fire command. The number of tries (which is reported by the mps_stat command) is incremented by one.

The mps_setupm Command

Usage:
mps_setupm.pl [-h] [-a] [-d] [mergeJobId]
Parameters:
mergeJobId A positive number to specify 'template' merge job.
Options:
-h Prints help.
-d Ignore "disabled" jobs.
-a Consider only "active" Mille jobs (see mps_disablejob and mps_enablejob commands)
Sets up an additional merge job (so at least one merge job needs to have been set up already). The configuration is copied from
  • last previous merge job if 'mergeJobId' is NOT specified,
  • job directory 'jobm<mergeJobId>' otherwise.
Edit the jobData/jobm<n>/alignment_merge.py configuration file before submitting the new job by mps_fire.pl -m.

The mps_auto Command

Usage:
mps_auto.pl [-h] [seconds]
Parameters:
seconds Time interval in seconds to test whether merge job can be submitted, default is 60 s, enforced to be at least 20 s.
Options:
-h Prints help.
Calls mps_stat.pl and mps_fetch.pl in intervals of 'seconds' seconds to update the status of the mille jobs. Then it tries mps_fire.pl -m to submit the merge job and quits if that is successful.

The mps_save Command

Usage:
mps_save.pl saveDir [n]
Parameters:
saveDir Name of the directory into which output files are to be copied.
n  
mps_save will copy the following files from the jobData/jobm directory to the specified target directory: treeFile_merge.root, histograms_merge.root, millePedeMonitor_merge.root, alignment_merge.py, alignment.log[.gz], millepede.log[.gz], millepede.res[.gz], millepede.his[.gz], pede.dump[.gz], alignments_MP.db, pedeSteer*.txt[.gz], theScript.sh. The command also copies the jobData/ScriptsAndCfg???.tar files, which contain the original scripts and configuration files.

If the directory does not exist, it will be created, otherwise the existing directory will be reused.

The mps_weight Command

Usage:
mps_weight.pl [-c] [-N name] [weight] [jobids]
Parameters:
-h Some help.
-l List all weights
-c Remove all weights from the data base
-N name Assign a weight to the dataset "name". See the option "mps_setup.pl -N".
weight The weight to be used
jobids A list of job ids to which the weight is assigned. This option is only used if neither -N nor the -c option is specified.
The command mps_weight.pl can be used to associate weights with individual Mille binary files. Some examples are given below.
# Examples:
#
# % mps_weight.pl -N ztomumu 5.7
# Assign weight 5.7 to Mille jobs which are called "ztomumu" ("mps_setup.pl -N ztomumu ..." has to be used during job creation).
#
# % mps_weight.pl 6.7 3 4 102
# Assign weight 6.7 to Mille binaries with numbers 3, 4, and 102, respectively.
#
# % mps_weight.pl -c
# Remove all assigned weights.

The mps_disablejob Command

Usage:
mps_disablejob.pl [-h] [-N name] [jobids]
Parameters/Options:
-h The help.
-N name Disable Mille jobs with name "name".
jobids A list of Mille job ids which should be disabled. Does not work together with option -N.
The script mps_disablejob.pl can be used to disable several Mille jobs, using either their associated name or their ids. See the mps_enablejob command for further details or this presentation.
#Examples:

#first example:
#create a new Pede job:
% mps_setupm.pl
# disable some Mille jobs:
% mps_disablejob.pl -N ztomumu
# submit the Pede job (works only if the "force" option is used):
% mps_fire.pl -mf
# enable everything
% mps_enablejob.pl

#second example:
#create a new Pede job:
% mps_setupm.pl
# disable some Mille jobs
% mps_disablejob.pl 3 5 6 77 4
# submit the Pede job (works only if the "force" option is used):
% mps_fire.pl -mf

#third example:
# disable a sequence of jobs
% mps_disablejob.pl `seq 2 300`
#create and submit new Pede job. Note if you want to omit the "force" option when the Pede job is submitted, you need to use the -a option for mps_setupm.pl
% mps_setupm.pl -a
% mps_fire.pl -m
% mps_enablejob.pl

The mps_enablejob Command

Usage:
mps_enablejob.pl [-h] [-N name] [jobids]
Parameters/Options:
-h Some help.
-N name Enable Mille jobs with name "name".
jobids A list of Mille job ids which should be enabled. Does not work together with option -N.
The command mps_enablejob can be used to turn on Mille jobs which have previously been turned off. If no option is provided, all jobs are enabled. See the mps_disablejob command for further details or this presentation.

Input Information for a MPS run

The milleScript Template

This template is used by the mps_setup command to create the batch script for each Mille job. In this transformation, the following changes are performed:

  • if the template contains a line starting with the token "RUNDIR=", the text after the token is replaced by the directory that the actual job will run in. Thus, the variable "RUNDIR" can be used in subsequent script commands.
  • if the template contains a line starting with the token "MSSDIR=", the text after the token is replaced by the mass storage directory specified in the mps_setup command. Thus, the variable "MSSDIR" can be used in subsequent script commands.
  • if the template contains a line starting with the token "MSSDIRPOOL=", the text after the token is replaced by the Castor pool specified for MSSDIR.
  • the string after a cmsRun command will be replaced by the name of the actual configuration file produced for this job
  • any occurrence of the token ISN will be replaced by the three-digit job sequence number (1...nJobs).
  • for each data file assigned to the job, a Castor prestage directive will be added at the beginning of the script, taking care of the Castor pool if defined in the file list (see CastorPool=<pool> above).
An example milleScript template is provided with MPS in Alignment/MillePedeAlignmentAlgorithm/scripts/mps_runMille_template.sh.
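Purely for illustration (this is not the content of the shipped template), a minimal sketch of how the tokens described above might appear in a milleScript template:

#!/bin/sh
# minimal milleScript sketch -- illustrative only, not mps_runMille_template.sh
RUNDIR=replaced_by_mps_setup     # text after "RUNDIR=" is replaced by the actual job run directory
MSSDIR=replaced_by_mps_setup     # text after "MSSDIR=" is replaced by the mass storage directory
MSSDIRPOOL=                      # text after "MSSDIRPOOL=" is replaced by the Castor pool for MSSDIR

cd $RUNDIR
cmsRun the.py                    # the string after cmsRun is replaced by the job's actual config file
# copy the Mille binary to mass storage; ISN is replaced by the three-digit job number
# (the binary file name here is illustrative)
rfcp milleBinaryISN.dat $MSSDIR/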

An outdated version is linked to this Twiki.

The pedeScript Template

This template is used by the mps_setup command to create the batch script for the Pede job. In this transformation, the following changes are performed:

  • if the template contains a line starting with the token "RUNDIR=", the text after the token is replaced by the directory that the actual job will run in. Thus, the variable "RUNDIR" can be used in subsequent script commands.
  • if the template contains a line starting with the token "MSSDIR=", the text after the token is replaced by the mass storage directory specified in the mps_setup command. Thus, the variable "MSSDIR" can be used in subsequent script commands.
  • if the template contains a line starting with the token "MSSDIRPOOL=", the text after the token is replaced by the Castor pool specified for MSSDIR.
  • the string after a cmsRun command will be replaced by the name of the actual configuration file produced for this job
  • any line containing the token ISN will be replicated once for each Mille job, and ISN will be replaced by the three-digit job sequence number of that job.

An example pedeScript template is provided with MPS in Alignment/MillePedeAlignmentAlgorithm/scripts/mps_runPede_rfcp_template.sh. The suffix _rfcp indicates that in this script the milleBinary files will first be copied using rfcp from MSSDIR to a local disk of the LSF node. How to set up MPS so that pede reads directly from Castor is still in preparation.

An outdated version of such a script is linked to this Twiki.

The Configuration File Template

This template is used by the mps_setup command to create the configuration files for both the Mille and Pede jobs. In the transformation for the Mille jobs, the following changes are performed:

  • the list of input files given in the "fileNames" directive of the template is replaced by the actual list of files designated to this particular job
  • any occurrence of the token ISN will be replaced by the three-digit job sequence number (1...nJobs)
  • the MillePede mode is set to "mille"

The python configuration alignment_cfg.py serves as a template for MPS.

Automatic Creation of Pede Configuration File

Based on the configuration file template, the mps_setup command will also attempt to create a configuration file for the Pede job if invoked with the -m option. This is a complex operation, in particular since some key directives may not be listed in the file, but rather in an underlying cff file, to which MPS has no access. In this case, mps_setup will try to insert additional lines. For this operation, the file template MUST contain a line


#MILLEPEDEBLOCK

as an appropriate placeholder to indicate where such lines can be safely inserted. In detail, the creation process of the Pede configuration file will perform the following operations:

  • comment lines will be removed (but the #MILLEPEDEBLOCK directive will be protected)
  • in a directive setting the MillePedeAlignmentAlgorithm.mode, the mode will be changed to "pede"
  • a directive setting the "binaryFile" parameter will be blanked
  • the "mergeBinaryFiles" parameter will be set to the actual list of binary files for this Pede job (GF question: Currently as local files - how to make them be read from Castor?)
  • the name of the "treeFile" parameter will be set to "treeFiles_merge.root"
  • the parameter "mergeTreeFiles" will be set to the actual list of tree files produced by the Mille jobs
  • the parameter "monitorFile" will be set to "millePedeMonitor_merge.root"
  • the parameter "pedeSteerFile" will be set to "pedeSteer_merge" (GF: not true?)
  • the parameter "pedeDump" will be set to "pede_merge.dump" (GF: not true?)
  • the "source=" directive will be set for empty input
  • if there is no directive setting AlignmentProducer.saveToDB to "false", directives will be added setting this to "true" and adding the necessary PoolDBOutputService storing the alignment constants in an sqlite file alignments_MP.db as 'Alignments' and 'AlignmentErrors'
  • a temporary hack (by Jula Draeger et al.) has been introduced in mps_merge.pl, in order to run event setup in CMSSW >= 351. The script introduces "process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(1) )" in the configuration file

In most of the cases above, failure to find the target directive in the file template will lead to generation of a new directive that is inserted after the #MILLEPEDEBLOCK placeholder. (For a detailed explanation of the meaning of these parameters, see SWGuideMillepedeIIAlgorithm).

It is clear that successful operation of such a complex pattern-replacement procedure cannot be guaranteed for any kind of input. In practice, it may therefore be necessary to tailor the file template such that it contains the properly replaceable phrases, or to adapt the corresponding MPS code to newer versions of the alignment code.
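As a quick sanity check (illustrative only), one can verify that a given config template contains the required placeholder before running mps_setup with the -m option:

grep -n '#MILLEPEDEBLOCK' alignment_cfg.py || echo "WARNING: #MILLEPEDEBLOCK placeholder missing from the template"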

Placement of Binary Files

The MPS assumes that most of the intermediate output of the Mille jobs will be placed in the run directory of each job. The MillePede binary files are a special case: while it would in principle be possible to place them into the run directory as well, they tend to consume too much disk space when millions of events are processed. The same is true for the treeFile.root.

For this reason, the examples given above assume that each binary file of a Mille job is first stored in a local directory of the respective batch machine (do not use /tmp, but the job directory in which the script starts) and, at the end of the job, copied to a mass storage directory under Castor. The Pede job first copies them back to a local directory. rfcp commands are normally used to access the mass storage; cmsStageIn/cmsStageOut are used instead when reading from or writing to the cmscafuser pool.

Goal

The Pede job will then read the binary files directly from Castor. It is important to note that this requires a special RFIO-aware version of Pede, an example of which can be found in the directory /afs/cern.ch/user/r/rmankel/pede-rfio and as one of the /afs/cern.ch/user/f/flucke/cms/pede/versWebEndMay2007/pede_*GB_rfio* executables. The MPS itself supports this process by propagating a mass storage directory name into both the Mille and Pede job scripts.

Use Cases

A Normal Production Session

During normal production, the user will typically perform the following steps (a schematic command session is sketched after the list):

  • set up the production with the mps_setup command
  • submit all or parts of the Mille jobs with the mps_fire command
  • intermittently check the status with the mps_stat command
  • fetch & check jobs that have finished with the mps_fetch command
  • if necessary, resubmit failed jobs with the mps_retry and mps_fire commands
  • when all Mille jobs are OK, submit the Pede job with the mps_fire -m command
  • if necessary, resubmit a failed Pede job with the mps_retry -m and mps_fire -m commands
  • fetch & check the finished Pede job with the mps_fetch command
  • save output with the mps_save command
  • delete the jobData tree to save space
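The list above, expressed as a schematic command session, might look as follows; the number of jobs per mps_fire call and the save directory name are arbitrary examples:

./setup.sh                    # single-line mps_setup script (see the mps_setup section above)
mps_fire.pl 50                # submit the first 50 Mille jobs (repeat for the rest)
mps_stat.pl                   # check the status from time to time
mps_fetch.pl                  # fetch and check jobs that are DONE
mps_retry.pl FAIL             # put failed jobs back into SETUP state ...
mps_fire.pl 50                # ... and resubmit them
mps_fire.pl -m                # once all Mille jobs are OK: submit the Pede job
mps_stat.pl                   # wait for the Pede job to be DONE ...
mps_fetch.pl                  # ... then fetch and check it
mps_save.pl mySavedResults    # store the relevant output for conservation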

Running different pede jobs on the same mille binaries

New, elegant way

Simply make use of the mps_setupm command described above.

Old way

If the user wants to run different pede jobs on the same mille binaries, the procedure is as follows:

  • save the previous pede results with the mps_save command
  • go into the jobData/jobm directory and modify the mps_template.cfg.bak (if present, otherwise the mps_template.cfg) as desired
  • Force the pede job to the SETUP state with "mps_retry.pl -f -m" command
  • Run again the pede job (just "mps_fire.pl -m" or "mps_fire.pl -mf")

Testing

After successful mps_setup, an arbitrary Mille job can be tested interactively by running the corresponding job script, e.g. jobData/job001/theScript.sh. This may be useful to check e.g. for cfg file errors that might prevent the program from starting properly.
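For example, for the first Mille job (a minimal sketch):

cd jobData/job001
sh ./theScript.sh             # run the job interactively, e.g. to spot config file errors early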

Expert Mode

The structure of the mps.db file is intentionally straightforward in ASCII format. Experts who know what they are doing can edit the file and effect modifications of the production that are beyond the functionality of the existing commands.

Some useful commands besides MPS

The following commands may be useful when interacting with MPS jobs:

  • bhosts -Rcmscafexclusive
    Lists the status of the special pede machines. If everything is healthy, status should be ok for all machines.
  • bjobs -u all -q cmscafalcamille
    Shows the queue status of the special pede machines.

Links



Review status

Reviewer/Editor and Date (copy from screen) Comments
RainerMankel - 30 Nov 2007 page author
RainerMankel - 7 Dec 2007 page author
GeroFlucke - 10 Apr 2008 adjust to MPS now being in CVS (renamed scripts)
AndreaParenti - 17 Apr 2008 no more needs to mps_install the scripts
AndreaParenti - 26 Nov 2010 New responsible: Joerg Behr
MatthiasSchroederHH - 2014-12-03 update with current recipes and github links
FrankMeier - 2015-03-10 added queue commands for reference

Responsible: JoergBehr

Last reviewed by: AndreaParenti
Topic attachments
I Attachment History Action Size Date Who Comment
PDFpdf LatestPede_Arun.pdf r1 manage 61.8 K 2016-01-20 - 14:00 SumitKeshri Arun's Slides
PDFpdf MPS-states.pdf r1 manage 10.4 K 2007-12-03 - 17:40 RainerMankel States used within MPS
PNGpng MPS-states.png r1 manage 39.4 K 2007-12-03 - 17:42 RainerMankel States used within MPS
Unknown file formatcfg alignment-template.cfg r1 manage 4.8 K 2007-12-04 - 19:15 RainerMankel Cfg file template
GIFgif mgstat.gif r1 manage 41.1 K 2007-12-05 - 14:24 RainerMankel Example for output from mps_stat command
PNGpng millepede-workflow-700.png r1 manage 90.8 K 2007-12-03 - 17:27 RainerMankel MillePede workflow
PNGpng mpedegui-1.11.png r1 manage 25.4 K 2009-07-15 - 17:58 AndreaParenti A screenshot of the MPS graphical user interface.
PNGpng mpedegui-1.9.png r2 r1 manage 28.7 K 2009-06-04 - 14:08 AndreaParenti A screenshot of the MPS graphical user interface.
PNGpng mpedegui.png r4 r3 r2 r1 manage 26.4 K 2008-12-12 - 12:22 AndreaParenti A screenshot of the MPS graphical user interface.
PNGpng overviewWorkflow.png r1 manage 59.7 K 2016-01-13 - 17:54 MartinDescher Overview of worklfow using universal-config-template
PDFpdf presentation_alignmentsetup.pdf r1 manage 71.6 K 2016-01-19 - 15:38 MartinDescher Presentation from the MP-Alignement-Meeting on new alignment setup workflow
Unix shell scriptsh runscript-mille.sh r1 manage 0.9 K 2007-12-04 - 19:12 RainerMankel Run script template for Mille
Unix shell scriptsh runscript-pede.sh r1 manage 1.1 K 2007-12-04 - 19:14 RainerMankel Pede run script template