Difference: DownloadAndBuild (1 vs. 37)

Revision 37 2019-07-09 - AlexGrecu

Line: 1 to 1
 
META TOPICPARENT name="LHCbSimulation"

Building the statistics tables for Monte-Carlo simulations.

Line: 198 to 198
  Due to changes in setting up the environment for LHCbDIRAC v9r3 and later, MC liaisons will have to make a temporary clone of the MCStatTools data package in order to use it (as detailed here):
Changed:
<
<
$ git lb-clone-pkg -b v4r7p4 MCStatTools
>
>
$ git lb-clone-pkg -b v4r7p5 MCStatTools
 $ cd MCStatTools/scripts
 The latest production version of LHCbDIRAC (v9r3 and later) should be set up with:
Line: 207 to 207
 

ALERT! The validation of XML file LFNs with the Log-SE may take quite a long time. Although the script tries to show that it is not stuck, please be aware of this peculiarity of the current algorithm.

Added:
>
>
ALERT! A subshell with the same environment as set by the lb-run command may be obtained by issuing the following command (after setting up the LHCbDIRAC environment):
$ ( eval $(xenv --sh -x /cvmfs/lhcb.cern.ch/lib/lhcb/DBASE/MCStatTools/v4r7p5/MCStatTools.xenv); bash -i )
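The outer parentheses in the command above matter: `eval` and the interactive `bash -i` run in a subshell, so the environment exported by `xenv` exists only there and is discarded when you exit. A minimal, self-contained illustration of that scoping, where `DEMO_ENV` is a hypothetical variable standing in for the MCStatTools environment:

```shell
#!/usr/bin/env bash
# DEMO_ENV is a hypothetical stand-in for the variables xenv would export.
unset DEMO_ENV
setup_line='export DEMO_ENV=from-xenv'

# Same shape as the command above, minus `bash -i`: eval runs inside ( ... ).
( eval "$setup_line"; printf 'inside the subshell: %s\n' "${DEMO_ENV:-unset}" )

# The parent shell never saw the export.
printf 'after the subshell: %s\n' "${DEMO_ENV:-unset}"
```

Leaving the interactive subshell therefore returns you to the plain LHCbDIRAC environment.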

Revision 36 2019-06-27 - AlexGrecu

Line: 198 to 198
  Due to changes in setting up the environment for LHCbDIRAC v9r3 and later, MC liaisons will have to make a temporary clone of the MCStatTools data package in order to use it (as detailed here):
Changed:
<
<
$ git lb-clone-pkg -b v4r7p3 MCStatTools
>
>
$ git lb-clone-pkg -b v4r7p4 MCStatTools
 $ cd MCStatTools/scripts
 The latest production version of LHCbDIRAC (v9r3 and later) should be set up with:

Revision 35 2019-06-03 - AlexGrecu

Line: 13 to 13
 The machinery is most effective when the statistics pages are produced shortly after the productions are finished. There are several steps to be followed, detailed below.
Changed:
<
<
The current version of MCStatTools, v4r7p*, is "frozen" following the discussion with experts during the Simulation Meeting on May 21, 2019. The package is available in CERN GitLab to be checked out and used locally as detailed at the bottom of this page. Any issues related to parsing of production XML logs remaining in .tgz archives on CASTOR should be reported on LHCBGAUSS-1677 (this JIRA task will be closed, with prior notice, upon a decision announced at the Simulation Meeting). Any issues found to affect a larger number of productions should be reported on LHCBGAUSS-1676. Both types of issues may still trigger minor patch releases of MCStatTools.
>
>
The current version of MCStatTools, v4r7p*, is "frozen" following the discussion with experts during the Simulation Meeting on May 21, 2019. The package is available in CERN GitLab to be checked out and used locally as detailed at the bottom of this page. Any issues related to parsing of production XML logs remaining in .tgz archives on CASTOR should be reported on LHCBGAUSS-1677 (this JIRA task will be closed, with prior notice, upon a decision announced at the Simulation Meeting). Any issues found to affect a larger number of productions should be reported on LHCBGAUSS-1676. Both types of issues may still trigger minor patch releases of MCStatTools. Therefore, please use the latest available patch release from the v4r7p* series unless another version is specifically indicated by experts.
  TIP Parts of this documentation may be obsolete; they are kept mainly to provide insight into the algorithms involved in retrieving and merging generator statistics for MC productions, and into how these evolved across package releases.
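Picking "the latest available patch release from the v4r7p* series" is a version-sort problem. A small sketch, assuming GNU `sort -V` and that you already have the list of tag names (how you obtain it is left open here); `latest_v4r7_patch` is a hypothetical helper, not part of MCStatTools:

```shell
#!/usr/bin/env bash
# Print the newest tag of the form v4r7p<N> among the arguments.
# Hypothetical helper: the tag list is an input, it is not fetched here.
latest_v4r7_patch() {
  printf '%s\n' "$@" \
    | grep -E '^v4r7p[0-9]+$' \
    | sort -V \
    | tail -n 1
}

# Example tag list (illustrative only).
tag=$(latest_v4r7_patch v4r6 v4r7 v4r7p1 v4r7p5 v4r7p3)
# Print, rather than run, the clone command used on this page.
echo "git lb-clone-pkg -b $tag MCStatTools"
```

Version sorting is used instead of plain lexical sorting so that, for example, v4r7p10 would rank above v4r7p9.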
Line: 198 to 198
  Due to changes in setting up the environment for LHCbDIRAC v9r3 and later, MC liaisons will have to make a temporary clone of the MCStatTools data package in order to use it (as detailed here):
Changed:
<
<
$ git lb-clone-pkg -b v4r7p2 MCStatTools
>
>
$ git lb-clone-pkg -b v4r7p3 MCStatTools
 $ cd MCStatTools/scripts
 The latest production version of LHCbDIRAC (v9r3 and later) should be set up with:

Revision 34 2019-05-29 - AlexGrecu

Line: 13 to 13
 The machinery is most effective when the statistics pages are produced shortly after the productions are finished. There are several steps to be followed, detailed below.
Changed:
<
<
The current version of MCStatTools, v4r7p1, is "frozen" following the discussion with experts during the Simulation Meeting on May 21, 2019. The package is available in CERN GitLab to be checked out and used locally as detailed at the bottom of this page. Any issues related to parsing of production XML logs remaining in .tgz archives on CASTOR should be reported on LHCBGAUSS-1677 (this JIRA task will be closed, with prior notice, upon a decision announced at the Simulation Meeting). Any issues found to affect a larger number of productions should be reported on LHCBGAUSS-1676. Both types of issues may still trigger minor patch releases of MCStatTools.
>
>
The current version of MCStatTools, v4r7p*, is "frozen" following the discussion with experts during the Simulation Meeting on May 21, 2019. The package is available in CERN GitLab to be checked out and used locally as detailed at the bottom of this page. Any issues related to parsing of production XML logs remaining in .tgz archives on CASTOR should be reported on LHCBGAUSS-1677 (this JIRA task will be closed, with prior notice, upon a decision announced at the Simulation Meeting). Any issues found to affect a larger number of productions should be reported on LHCBGAUSS-1676. Both types of issues may still trigger minor patch releases of MCStatTools.
  TIP Parts of this documentation may be obsolete; they are kept mainly to provide insight into the algorithms involved in retrieving and merging generator statistics for MC productions, and into how these evolved across package releases.
Line: 198 to 198
  Due to changes in setting up the environment for LHCbDIRAC v9r3 and later, MC liaisons will have to make a temporary clone of the MCStatTools data package in order to use it (as detailed here):
Changed:
<
<
$ git lb-clone-pkg -b v4r7p1 MCStatTools
>
>
$ git lb-clone-pkg -b v4r7p2 MCStatTools
 $ cd MCStatTools/scripts
 The latest production version of LHCbDIRAC (v9r3 and later) should be set up with:

Revision 33 2019-05-22 - AlexGrecu

Line: 13 to 13
 The machinery is most effective when the statistics pages are produced shortly after the productions are finished. There are several steps to be followed, detailed below.
Added:
>
>
The current version of MCStatTools, v4r7p1, is "frozen" following the discussion with experts during the Simulation Meeting on May 21, 2019. The package is available in CERN GitLab to be checked out and used locally as detailed at the bottom of this page. Any issues related to parsing of production XML logs remaining in .tgz archives on CASTOR should be reported on LHCBGAUSS-1677 (this JIRA task will be closed, with prior notice, upon a decision announced at the Simulation Meeting). Any issues found to affect a larger number of productions should be reported on LHCBGAUSS-1676. Both types of issues may still trigger minor patch releases of MCStatTools.

TIP Parts of this documentation may be obsolete; they are kept mainly to provide insight into the algorithms involved in retrieving and merging generator statistics for MC productions, and into how these evolved across package releases.

 

1. Get the production ID numbers for your request(s).

Use the Dirac webpage (alternate webpage) to retrieve the Dirac Production ID associated with your request.

Revision 32 2019-05-14 - AlexGrecu

Line: 202 to 202
 $ source /cvmfs/lhcb.cern.ch/lib/lhcb/LHCBDIRAC/lhcbdirac
Deleted:
<
<
ALERT! Until v4r7p1 is properly tested with the help of MC liaisons, please ignore -b v4r7p1 in the instructions above and check out the master branch of MCStatTools instead.
 ALERT! The validation of XML file LFNs with the Log-SE may take quite a long time. Although the script tries to show that it is not stuck, please be aware of this peculiarity of the current algorithm.

Revision 31 2019-05-11 - AlexGrecu

Line: 194 to 194
  Due to changes in setting up the environment for LHCbDIRAC v9r3 and later, MC liaisons will have to make a temporary clone of the MCStatTools data package in order to use it (as detailed here):
Changed:
<
<
$ git lb-clone-pkg -b v4r7 MCStatTools
>
>
$ git lb-clone-pkg -b v4r7p1 MCStatTools
 $ cd MCStatTools/scripts
 The latest production version of LHCbDIRAC (v9r3 and later) should be set up with:
Line: 202 to 202
 $ source /cvmfs/lhcb.cern.ch/lib/lhcb/LHCBDIRAC/lhcbdirac
Deleted:
<
<
ALERT! When running the script in DEBUG verbosity mode, please be advised that, especially for productions with a large number of jobs, the LFNs of the compressed XML files are only partially validated (to speed up the downloading procedure), and brute-force attempts to download the XML files from the Log-SE will yield a sizable number of error messages. When the number of jobs in the production is equal to or below the number of logs to download (parameter -n), the complete validation of the XML LFNs may take quite a long time. Although the script tries to show that it is not stuck, please be aware of this peculiarity of the current algorithm.
Added:
>
>
ALERT! Until v4r7p1 is properly tested with the help of MC liaisons, please ignore -b v4r7p1 in the instructions above and check out the master branch of MCStatTools instead.

ALERT! The validation of XML file LFNs with the Log-SE may take quite a long time. Although the script tries to show that it is not stuck, please be aware of this peculiarity of the current algorithm.

Revision 30 2019-05-07 - AlexGrecu

Line: 201 to 201
 
$ source /cvmfs/lhcb.cern.ch/lib/lhcb/LHCBDIRAC/lhcbdirac
Added:
>
>
ALERT! When running the script in DEBUG verbosity mode, please be advised that, especially for productions with a large number of jobs, the LFNs of the compressed XML files are only partially validated (to speed up the downloading procedure), and brute-force attempts to download the XML files from the Log-SE will yield a sizable number of error messages. When the number of jobs in the production is equal to or below the number of logs to download (parameter -n), the complete validation of the XML LFNs may take quite a long time. Although the script tries to show that it is not stuck, please be aware of this peculiarity of the current algorithm.

Revision 29 2019-04-23 - AlexGrecu

Line: 187 to 187
('/MC/2012/Beam4000GeV-MayJune2012-MagUp-Nu2.5-EmNoCuts/Sim06a/Trig0x0097003dFlagged/Reco13a/Stripping19aNoPrescalingFlagged/11102003/ALLSTREAMS.DST', 'head-20120413', 'sim-20120727-vc-mu100', 17, 207000) [...] %ENDSYNTAX%
Added:
>
>

LHCbDIRAC-based versions

Starting with v4r7, MCStatTools includes a new algorithm for downloading the XML files needed to create the generator statistics tables. This new algorithm is fully based on the LHCbDIRAC interfaces in v9r3 and later. The previous downloading methods can still be used via the -k, --compat command-line flag (although these methods should be regarded as obsolete and will be removed from the package in the next major version). Also, the older downloading algorithm will only work in the environment set up by LHCbDIRAC v9r2p9 or earlier, which provides a working Python interface to the XRootD library. DownloadAndBuildStat.py should fail with a clear message if this interface is corrupted.
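Since the -k, --compat methods only work with LHCbDIRAC v9r2p9 or earlier, a quick guard can compare the version you have set up against that limit before choosing the flag. A sketch using GNU `sort -V`; `compat_supported` is a hypothetical helper, and you must supply the version string yourself (how to query it is not covered here):

```shell
#!/usr/bin/env bash
# Succeed if the LHCbDIRAC version given as $1 is v9r2p9 or older, the range
# in which the old -k/--compat download methods are said to work.
compat_supported() {
  local limit=v9r2p9 oldest
  # Version-sort the two strings; the first one printed is the older.
  oldest=$(printf '%s\n%s\n' "$1" "$limit" | sort -V | head -n 1)
  [ "$oldest" = "$1" ]
}

for v in v9r2p8 v9r2p9 v9r3; do
  if compat_supported "$v"; then
    echo "$v: --compat usable"
  else
    echo "$v: use the default LHCbDIRAC-based download"
  fi
done
```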

Due to changes in setting up the environment for LHCbDIRAC v9r3 and later, MC liaisons will have to make a temporary clone of the MCStatTools data package in order to use it (as detailed here):

$ git lb-clone-pkg -b v4r7 MCStatTools
$ cd MCStatTools/scripts
The latest production version of LHCbDIRAC (v9r3 and later) should be set up with:
$ source /cvmfs/lhcb.cern.ch/lib/lhcb/LHCBDIRAC/lhcbdirac

Revision 28 2019-01-28 - AlexGrecu

Line: 118 to 118
  --use-local-logs Use already downloaded logs, for debuging purpose.
Added:
>
>
ALERT! MCStatTools v4r6 (see LHCBGAUSS-1586) introduces a rigorous validation of the ProdIDs provided on the command line, using (and caching) metadata from LHCbDIRAC. Also, the full Sim version (including the letter for the minor version) is now included in the statistics table comments, to allow users to identify the associated LFN path in BKK.
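The v4r6 validation itself relies on LHCbDIRAC metadata. Before launching the script it can still help to catch a malformed ProdID list early, since the tools expect a comma-separated list with no blank spaces. A hypothetical, syntax-only check (it does not ask DIRAC whether the IDs exist):

```shell
#!/usr/bin/env bash
# Hypothetical helper: positive integers separated by commas, no blanks,
# no empty items. Syntax only; existence is checked by MCStatTools itself.
valid_prodid_list() {
  local re='^[1-9][0-9]*(,[1-9][0-9]*)*$'
  [[ $1 =~ $re ]]
}

for list in 32262,32263 '32262, 32263' 32262,; do
  if valid_prodid_list "$list"; then
    echo "ok:  '$list'"
  else
    echo "bad: '$list'"
  fi
done
```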
 ALERT! MCStatTools v4r5 (see LHCBGAUSS-1415) makes the HTML table generation interactive. If no existing HTML table is found in the working directory for the simulation conditions corresponding to a given ProdId, the user is asked to select the WG (working group) whose tables should then be searched automatically in the current EOS repository of HTML tables. If the search is successful, the latest file in the repository is copied locally before the merging operations are triggered.

Also, yet another JSON file is saved (or overwritten!) for each generated table, containing a dictionary with all the quantities that are output in the HTML table. This feature is experimental for now and would require further work in order to be integrated into a future dynamic UI for the production statistics tables.

Revision 27 2018-06-26 - AlexGrecu

Line: 15 to 15
 

1. Get the production ID numbers for your request(s).

Changed:
<
<
Use the Dirac webpage (alternate webpage) to retrieve the Dirac Production ID associated with your request.
>
>
Use the Dirac webpage (alternate webpage) to retrieve the Dirac Production ID associated with your request.
 Follow 'Production' -> 'Request manager', and then use the left-hand panel to filter the displayed requests and pin down the one you are interested in. Click on the request, and select 'Production monitor'. This will bring you to another Dirac webpage, which you can use directly if you know the request ID number.

Revision 26 2018-06-14 - AlexGrecu

Line: 52 to 52
 

To begin, cd to a directory with a lot of free space.

Added:
>
>
Tip, idea In order to assess the amount of free disk space required and the status of log files for archived productions, please make use of the check_for_staged.py helper script:
Usage: check_for_staged.py [options] <ProdIDs>

       <ProdIDs> : list of ProdID(s), comma-separated, with no blank spaces.

Options:
  -h, --help            show this help message and exit
  -s, --stage           stage the files (do NOT use repeatedly)
  -c, --check           check whether log file was staged
  -r, --copy            copy file to current dir
  -v <VERB_LEVEL>, --verb-level=<VERB_LEVEL>
                        case insensitive verbosity level [CRIT, ERROR, WARN,
                        INFO, DEBUG; default: info]
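Once check_for_staged.py has given you an idea of the size of the log files, the free space of the current partition can be compared with it before downloading anything. A minimal sketch using POSIX `df -Pk`; `has_free_kib` is a hypothetical helper, and the 20 GiB figure is an arbitrary placeholder for the size you actually need:

```shell
#!/usr/bin/env bash
# Succeed if the file system holding the current directory has at least
# $1 KiB available. `df -Pk` gives POSIX output in 1024-byte blocks; the
# 4th field of the second line is the available space.
has_free_kib() {
  local needed_kib=$1 avail_kib
  avail_kib=$(df -Pk . | awk 'NR == 2 { print $4 }')
  [ "$avail_kib" -ge "$needed_kib" ]
}

# Placeholder threshold: 20 GiB, expressed in KiB.
if has_free_kib $((20 * 1024 * 1024)); then
  echo "enough free space in $PWD"
else
  echo "not enough free space in $PWD: cd to a larger partition" >&2
fi
```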
 Copy the current stats table of <your_WG> corresponding to the MC type <MC_type> (MC2012, SIM08STAT, SIM09STAT) into this directory:

%CODE{ lang="bash" num="on" }%

Line: 91 to 108
  -h, --help            show this help message and exit
  -n , --number-of-logs=
                        number of logs to download [default : 1000]
Deleted:
<
<
--save-html force generation of Html report page (default action when all prod. logs in JSON)
  --delta-size=         log files smaller by from largest file of sample will be deleted [default : 0.07]
  -t, --tape            get files from ARCHIVE (on disk/tape) directly
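The --delta-size help text above is incomplete, so the following is only an assumed reading: a relative threshold, where files more than a fraction DELTA smaller than the largest file of the sample are dropped as likely truncated. A sketch of that reading which only lists candidates and deletes nothing (`too_small` is hypothetical and not the script's code):

```shell
#!/usr/bin/env bash
# ASSUMED reading of --delta-size (not confirmed by the tool's help text):
# a file is dropped when its size is below (1 - DELTA) * largest size.
# Input on stdin: one "<size> <name>" pair per line. Nothing is deleted.
too_small() {
  local delta=${1:-0.07}
  awk -v d="$delta" '
    { size[NR] = $1 + 0; name[NR] = $2; if ($1 + 0 > max) max = $1 + 0 }
    END {
      for (i = 1; i <= NR; i++)
        if (size[i] < (1 - d) * max) print name[i]
    }'
}

printf '%s\n' '1000 a.xml' '950 b.xml' '900 c.xml' | too_small 0.07
```

With the default 0.07, anything under 93% of the largest file would be listed; here that is only c.xml.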
Line: 103 to 118
  --use-local-logs Use already downloaded logs, for debuging purpose.
Changed:
<
<
ALERT! As of MCStatTools v4r1, the default behaviour of DownloadAndBuildStat.py is to generate a JSON file for each ProdId. HTML tables are generated by default once all statistical information for all ProdIds has been gathered into JSON files. To recover/force generation of HTML tables, users may use the --save-html option flag on the command line. The JSON files should be compressed when used as supplemental information for debugging issues reported via JIRA.
>
>
ALERT! MCStatTools v4r5 (see LHCBGAUSS-1415) makes the HTML table generation interactive. If no existing HTML table is found in the working directory for the simulation conditions corresponding to a given ProdId, the user is asked to select the WG (working group) whose tables should then be searched automatically in the current EOS repository of HTML tables. If the search is successful, the latest file in the repository is copied locally before the merging operations are triggered.

Also, yet another JSON file is saved (or overwritten!) for each generated table, containing a dictionary with all the quantities that are output in the HTML table. This feature is experimental for now and would require further work in order to be integrated into a future dynamic UI for the production statistics tables.

ALERT! As of MCStatTools v4r4, the "old" default behaviour of DownloadAndBuildStat.py, generating an HTML table for each ProdId, has been restored. However, for each ProdId an additional JSON file is generated which caches the generator counters for that ProdId. This JSON file may be used for debugging and/or for faster rebuilding of HTML tables. The JSON files should be compressed when used as supplemental information for debugging issues reported via JIRA. The ProdId JSON file needs to be removed if you want to extract more production logs or if the script fails when re-run (though it would be nice of you to inform the developers about such issues). All changes in v4r4 are discussed in LHCBGAUSS-1352.
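Compressing the JSON files before attaching them to a JIRA report can be done with a single tar call. A sketch; `bundle_json`, the archive name and the `*.json` glob are placeholders, so select the ProdId JSON files you actually want to share:

```shell
#!/usr/bin/env bash
# Pack the given files into one gzip-compressed tarball named by $1.
bundle_json() {
  local archive=$1
  shift
  tar -czf "$archive" "$@"
}

# Example (placeholder names), run in the directory holding the JSON files:
#   bundle_json prod_json_for_jira.tgz ./*.json
```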

  ALERT! MCStatTools is evolving towards automating the log generation procedure. The status of this process can be followed on this JIRA task: LHCBGAUSS-929.
Line: 148 to 167
 

3. Publish the tables.

Changed:
<
<
Create a JIRA task at https://its.cern.ch/jira/browse/LHCBGAUSS (you have to log in with your CERN SSO account).
>
>
Create a JIRA task at LHCBGAUSS (you have to log in with your CERN SSO account).
 Choose Generators Statistics as the Component, and either upload the statistics pages or give a pointer to a folder containing the updated tables in a publicly readable area. The tables will then be added to the Gauss website.

Revision 25 2018-06-02 - GiulioDujany

Line: 31 to 31
 Mind that if there are spaces in the BK path, you should enclose it in quotes.
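The quotes are needed because the shell splits an unquoted value at every blank, so the path would reach --BK as several separate arguments. A short bash demonstration with the BK path used in this example (`count_args` is only a helper for the demo):

```shell
#!/usr/bin/env bash
bkpath='/MC/2012/Beam4000GeV-2012-MagDown-Nu2.5-Pythia8/Sim08c/Digi13/Trig0x409f0045/Reco14a/Stripping20NoPrescalingFlagged/41900006 ( ttbar_gg_1l17GeV ) /ALLSTREAMS.DST'

# Print how many arguments the shell actually passed.
count_args() { echo "$#"; }

count_args $bkpath     # unquoted: split at each blank -> 5
count_args "$bkpath"   # quoted: a single argument -> 1

# Hence, keeping the value in a variable:
#   lb-run LHCbDirac/prod dirac-bookkeeping-prod4path --BK "$bkpath"
```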

%SYNTAX{"tcsh"}%

Changed:
<
<
$ SetupProject LHCbDirac
$ dirac-bookkeeping-prod4path --BK '/MC/2012/Beam4000GeV-2012-MagDown-Nu2.5-Pythia8/Sim08c/Digi13/Trig0x409f0045/Reco14a/Stripping20NoPrescalingFlagged/41900006 ( ttbar_gg_1l17GeV ) /ALLSTREAMS.DST'
>
>
$ lb-run LHCbDirac/prod dirac-bookkeeping-prod4path --BK '/MC/2012/Beam4000GeV-2012-MagDown-Nu2.5-Pythia8/Sim08c/Digi13/Trig0x409f0045/Reco14a/Stripping20NoPrescalingFlagged/41900006 ( ttbar_gg_1l17GeV ) /ALLSTREAMS.DST'
 For BK path /MC/2012/Beam4000GeV-2012-MagDown-Nu2.5-Pythia8/Sim08c/Digi13/Trig0x409f0045/Reco14a/Stripping20NoPrescalingFlagged/41900006 ( ttbar_gg_1l17GeV ) /ALLSTREAMS.DST:
Productions found (Merge): 32263
Parent productions (MCSimulation): 32262
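The IDs to pass on are those of the simulation steps, i.e. the MCSimulation parent productions in the output above. A sketch that extracts them from the prod4path output read on stdin; `simulation_prodids` is a hypothetical helper, and the line format is assumed from this single example, so check it against your own output:

```shell
#!/usr/bin/env bash
# Print whatever follows "Parent productions (MCSimulation):" with blanks
# removed (assumed output format, taken from the example above).
simulation_prodids() {
  sed -n 's/^.*Parent productions (MCSimulation): *//p' | tr -d ' '
}

printf '%s\n' 'Productions found (Merge): 32263' \
              'Parent productions (MCSimulation): 32262' | simulation_prodids

# With the real command:
#   lb-run LHCbDirac/prod dirac-bookkeeping-prod4path --BK "$bkpath" | simulation_prodids
```

The expression also works when the whole answer is printed on a single line.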

Revision 24 2018-03-12 - AlexGrecu

Line: 126 to 126
 

Known MCStatTools issues

Changed:
<
<
As the MCStatTools code is restructured and rewritten to attain the level of functionality described in LHCBGAUSS-929, there are a number of known issues (Thumbs-down) that will be (or already are) solved in the development version of the package. We list them with possible workarounds (Thumbs-up) until the code gets released officially.
>
>
As the MCStatTools code is restructured and rewritten to attain the level of functionality described in LHCBGAUSS-929, there are a number of known issues (Thumbs-down) that will be (or already are) solved in the development version of the package. The issues are presented together with workarounds whenever these are available. The list below also contains some (Thumbs-up) tips and features that should improve the user's experience and give a better understanding of the scripts' workflow.

Thumbs-down A bug detected in versions v4r1 through v4r3 prevents generation of HTML tables when production data is loaded from JSON. A possible workaround until the code gets patched is to use the --save-html flag (and --use-local-logs in case you still have the XML logs on disk) to force generation of HTML tables on the first run.

  Thumbs-up Use the check_for_staged.py script to ensure that for each Prod ID at least some of the production logs are staged on disk (ALERT! Recommended procedure for archived productions!).
Changed:
<
<
Since v3r3 the user gets both the summed size of log files for each Prod ID and the summed size of all staged log files, to ease the selection of a partition with the necessary free disk space.
>
>
Since v3r3 the user gets both the summed size of log files for each Prod ID and the summed size of all staged log files, to ease the selection of a partition with the necessary free disk space.
 Thumbs-up The newly introduced feature to initialise the GRID proxy at run time requires the script to restart for the LHCbDirac API calls to succeed. Note that the script will issue a warning when doing so.
Changed:
<
<
If the debug logging level is set, the command being executed is printed before the old instance of the script exits.
>
>
If the debug logging level is set, the command being executed is printed before the old instance of the script exits.
 
Changed:
<
<
Please open a JIRA task under the LHCBGAUSS project for the Generator Statistics component if you are sure you found a bug or if, as MC liaison, you really need a new feature to be implemented.
>
>
Tip, idea Please open a JIRA task under the LHCBGAUSS project for the Generator Statistics component if you are sure you found a bug or if, as MC liaison, you really need a new feature to be implemented.
 

<!--

Revision 23 2018-03-12 - AlexGrecu

Line: 92 to 92
  -h, --help            show this help message and exit
  -n , --number-of-logs=
                        number of logs to download [default : 1000]
Deleted:
<
<
--save-json save output in JSON file and inhibit generation of Html report page (default action)
  --save-html force generation of Html report page (default action when all prod. logs in JSON)
Deleted:
<
<
--load-json look for and load production statistics from JSON file(s); if missing process production ID according to rest of arguments
  --delta-size=         log files smaller by from largest file of sample will be deleted [default : 0.07]
  -t, --tape            get files from ARCHIVE (on disk/tape) directly
Line: 109 to 104
  --use-local-logs Use already downloaded logs, for debuging purpose.
Changed:
<
<
ALERT! As of MCStatTools v4r1, the default behaviour of DownloadAndBuildStat.py is to generate a JSON file for each ProdId. HTML tables are generated by default once all statistical information for all ProdIds has been gathered into JSON files. To recover/force generation of HTML tables, users may use the --save-html option flag on the command line. The JSON files should be compressed and used as supplemental information for debugging issues reported via JIRA.
>
>
ALERT! As of MCStatTools v4r1, the default behaviour of DownloadAndBuildStat.py is to generate a JSON file for each ProdId. HTML tables are generated by default once all statistical information for all ProdIds has been gathered into JSON files. To recover/force generation of HTML tables, users may use the --save-html option flag on the command line. The JSON files should be compressed when used as supplemental information for debugging issues reported via JIRA.
  ALERT! MCStatTools is evolving towards automating the log generation procedure. The status of this process can be followed on this JIRA task: LHCBGAUSS-929.

Revision 22 2018-02-23 - AlexGrecu

Line: 63 to 63
  %CODE{ lang="bash" num="on" }%
  lb-run --use MCStatTools LHCbDirac/prod bash --norc
Changed:
<
<
python $MCSTATS_AREA/DownloadAndBuildStat.py [opts..] XXXX,YYYY,ZZZZ
>
>
python $MCSTATTOOLSSCRIPTS/DownloadAndBuildStat.py [opts..] XXXX,YYYY,ZZZZ
 %ENDCODE%

where XXXX, YYYY, ZZZZ are the ProdIDs of the simulation steps for the requests.
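When the IDs come from a variable or a loop rather than being typed by hand, the comma-separated argument can be assembled by the shell. A sketch with a hypothetical `join_prodids` helper and example IDs; it prints the command line instead of running it:

```shell
#!/usr/bin/env bash
# Join all arguments with commas ("$*" uses the first character of IFS).
join_prodids() {
  local IFS=,
  printf '%s\n' "$*"
}

prodids=$(join_prodids 12345 12346 12347)   # example IDs only
# Print, rather than run, the resulting command line.
printf 'python $MCSTATTOOLSSCRIPTS/DownloadAndBuildStat.py %s\n' "$prodids"
```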

Changed:
<
<
This will try to get the logs first from the web, then from the CASTOR archive.
>
>
This will try to get the logs first from the web, then from the "CASTOR archive" (a.k.a. ARCHIVE in some script log messages).
 It will also filter out obviously malformed and incomplete XML files. Then it calls the script that builds the tables, and finally updates the existing tables with the information corresponding to the ProdIDs passed.
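The filtering of malformed files happens inside the script. If you want a rough check of your own on downloaded compressed logs (for example *.xml.gz files), gzip can test each archive's integrity. This is a sketch with a hypothetical `corrupt_gz` helper, not the check DownloadAndBuildStat.py performs:

```shell
#!/usr/bin/env bash
# Print the names of the given files that fail gzip's integrity test
# (truncated or non-gzip content). Hypothetical helper, not MCStatTools code.
corrupt_gz() {
  local f
  for f in "$@"; do
    gzip -t "$f" 2>/dev/null || printf '%s\n' "$f"
  done
}

# Example: corrupt_gz ./*.xml.gz
```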

Revision 21 2018-02-20 - AlexGrecu

Line: 7 to 7
 

Foreword

The user of a Monte-Carlo sample is interested in a number of efficiency values to make sense of this sample.

Changed:
<
<
This information is available in XML files produced by the simulation jobs and associated with each production job.
>
>
This information is available in XML files produced by the simulation jobs and associated with each production job. Statistical quantities are computed according to the formulae detailed here.
 The statistics tables gather this information in a well-formatted way, and are published so that the information is easily accessible to users.
Changed:
<
<
It is the task of the MC contact to generate the statistics tables for the production of his WG.
>
>
It is the task of the MC contact to generate the statistics tables for the production of his/her WG.
 The machinery is most effective when the statistics pages are produced shortly after the productions are finished. There are several steps to be followed, detailed below.
Line: 93 to 93
  -n , --number-of-logs=
                        number of logs to download [default : 1000]
  --save-json           save output in JSON file and inhibit generation of
Changed:
<
<
Html report page
>
>
                        Html report page (default action)
  --save-html           force generation of Html report page (default action when all prod. logs in JSON)
  --load-json look for and load production statistics from JSON file(s); if missing process production ID according to rest of arguments
Line: 107 to 109
  --use-local-logs Use already downloaded logs, for debuging purpose.
Added:
>
>
ALERT! As of MCStatTools v4r1, the default behaviour of DownloadAndBuildStat.py is to generate a JSON file for each ProdId. HTML tables are generated by default once all statistical information for all ProdIds has been gathered into JSON files. To recover/force generation of HTML tables, users may use the --save-html option flag on the command line. The JSON files should be compressed and used as supplemental information for debugging issues reported via JIRA.
 ALERT! MCStatTools is evolving towards automating the log generation procedure. The status of this process can be followed on this JIRA task: LHCBGAUSS-929.

Revision 20 2017-11-29 - MickMulder

Line: 63 to 63
  %CODE{ lang="bash" num="on" }%
  lb-run --use MCStatTools LHCbDirac/prod bash --norc
Changed:
<
<
python $MCSTATTOOLSSCRIPTS/DownloadAndBuildStat.py [opts..] XXXX,YYYY,ZZZZ
>
>
python $MCSTATS_AREA/DownloadAndBuildStat.py [opts..] XXXX,YYYY,ZZZZ
 %ENDCODE%

where XXXX, YYYY, ZZZZ are the ProdIDs of the simulation steps for the requests.

Revision 19 2017-11-28 - AlexGrecu

Line: 92 to 92
  -h, --help            show this help message and exit
  -n , --number-of-logs=
                        number of logs to download [default : 1000]
Changed:
<
<
--filtering retrieve filtering statistics rather than generator logs (experimental!)
>
>
--save-json           save output in JSON file and inhibit generation of Html report page
--load-json           look for and load production statistics from JSON file(s); if missing process production ID according to rest of arguments
  --delta-size=         log files smaller by from largest file of sample will be deleted [default : 0.07]
  -t, --tape            get files from ARCHIVE (on disk/tape) directly
  -v , --verb-level=    case insensitive verbosity level [CRIT, ERROR, WARN,
Changed:
<
<
INFO, DEBUG; default: info]
>
>
INFO, DEBUG, VERBOSE; default: info]
  --usage               show this help message and exit
  --use-local-logs      Use already downloaded logs, for debuging purpose.
Line: 109 to 113
  TIP In MCStatTools v3r2 and later, the fall-back to parsing parameters from jobDescription.xml files, used when the Dirac request for Simulation conditions parameters fails, was removed, since it led to invalid naming of the generated HTML page anyway. Instead, the script will fail completely for the specific ProdId.
Changed:
<
<
ALERT! A new feature to retrieve *summaryDaVinci*.xml files for MCReconstruction/MCMerge productions (transformations) is available in v3r3. It is activated by the --filtering command line argument and could be used for retrieving custom statistics for filtering productions. Nevertheless, to get the basic filtering efficiency, the new LHCbDirac versions include a dedicated script, e.g.
>
>
ALERT! To get the basic filtering efficiency, the new LHCbDirac versions include a dedicated script, e.g.
 %SYNTAX{ syntax="sh" numbered="1000" numstep="10"}%
$ dirac-bookkeeping-rejection-stats -P 57611
Using BK query {'Visible': 'Yes', 'Production': 57611, 'ReplicaFlag': 'Yes'}
Line: 119 to 123
 EventInputStat: 27302528 from 121 jobs
Retention: 7.17 %
%ENDSYNTAX%
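The Retention figure is presumably the fraction of input events kept by the filtering, expressed in percent; that definition is an assumption here, not taken from the script's documentation. As a worked sketch with made-up counts (`retention_pct` is hypothetical):

```shell
#!/usr/bin/env bash
# Retention in percent = 100 * output events / input events (assumed definition).
retention_pct() {
  awk -v n_out="$1" -v n_in="$2" 'BEGIN { printf "%.2f\n", 100 * n_out / n_in }'
}

retention_pct 250 1000   # made-up counts -> 25.00
```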
Changed:
<
<
Beware that this --filtering feature is experimental (and might not be supported in future versions). Also, the retrieved summary log files could be gzipped, with extension .xml.gz. It is the user's responsibility to decompress them and to design further processing scripts to get the final required statistics for the production.
>
>
 

Known MCStatTools issues

Revision 18 2017-07-04 - ThomasLatham

Line: 62 to 62
 Launch DownloadAndBuildStat.py, passing it the list of prodIDs you have just found:

%CODE{ lang="bash" num="on" }%

Changed:
<
<
lb-run --use MCStatTools LHCbDirac bash $MCSTATTOOLSSCRIPTS/python DownloadAndBuildStat.py [opts..] XXXX,YYYY,ZZZZ
>
>
lb-run --use MCStatTools LHCbDirac/prod bash --norc
python $MCSTATTOOLSSCRIPTS/DownloadAndBuildStat.py [opts..] XXXX,YYYY,ZZZZ
 %ENDCODE%

where XXXX, YYYY, ZZZZ are the ProdIDs of the simulation steps for the requests.

Revision 17 - 2017-03-16 - AlexGrecu

Line: 46 to 46
 The latter operation is done by a script called GaussStat.py. The full operation is steered by DownloadAndBuildStat.py, which runs GaussStat.py internally.
Changed:
<
<
The following instructions are updated to work with the latest LHCbDirac and MCStatTools versions.
>
>
The following instructions are updated for the latest LHCbDirac and MCStatTools versions.
 For obsolete older versions, one could use the old method of setting up the run-time environment for a compatible LHCbDirac package:
SetupProject LHCbDirac
Line: 92 to 92
  -h, --help show this help message and exit -n , --number-of-logs= number of logs to download [default : 1000]
Added:
>
>
--filtering retrieve filtering statistics rather than generator logs (experimental!)
  --delta-size= log files smaller by from largest file of sample will be deleted [default : 0.07] -t, --tape get files from ARCHIVE (on disk/tape) directly
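The help text above only briefly describes the --delta-size cut. One plausible reading, assumed here and not taken from the MCStatTools code, is a relative cut: a file is dropped when it is smaller than (1 - delta) times the size of the largest file in the sample. The bash sketch below lists (but does not delete) such files; `list_undersized` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a --delta-size style cut (assumed semantics):
# print every file whose size is below (1 - delta) * (size of the largest file).
# Nothing is deleted; the output is only a list of candidates.
list_undersized() {
  local delta=$1; shift
  local max=0 size f
  # First pass: find the largest file size in bytes
  for f in "$@"; do
    size=$(wc -c < "$f")
    if (( size > max )); then max=$size; fi
  done
  # Second pass: compare each size with the relative threshold
  for f in "$@"; do
    size=$(wc -c < "$f")
    # awk handles the floating-point comparison that bash arithmetic cannot
    if awk -v s="$size" -v m="$max" -v d="$delta" 'BEGIN { exit !(s < (1 - d) * m) }'; then
      echo "$f"
    fi
  done
}
```

For example, `list_undersized 0.07 *.xml` applies the documented default of 0.07 to the xml files in the current directory.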
Line: 106 to 107
 The status of this process can be followed on this JIRA task: LHCBGAUSS-929.
Changed:
<
<
TIP In MCStatTools v3r0 and later, the Dirac request of Simulation conditions parameters hangs for a long time for older ProdIds. This is to be addressed in the next version of the package. However, if the query is left to time out, the fall-back method of retrieving these parameters from jobDescription.xml files is used, which may succeed for Sim08 and earlier productions. For Sim09 productions this fall-back procedure is known to fail to identify the Sim code name correctly, so the generated web page would have an invalid file name.
>
>
TIP In MCStatTools v3r2 and later, the fall-back to parsing parameters from jobDescription.xml files, used when the Dirac request of Simulation conditions parameters fails, was removed, since it led to invalid naming of the generated HTML page anyway. Instead, the script will fail completely for the specific ProdId.
 
Added:
>
>
ALERT! A new feature to retrieve *summaryDaVinci*.xml files for MCReconstruction/MCMerge productions (transformations) is available in v3r3. It is activated by the --filtering command line argument and could be used for retrieving custom statistics for filtering productions. Nevertheless, to get the basic filtering efficiency, the new LHCbDirac versions include a dedicated script, e.g.
 $ dirac-bookkeeping-rejection-stats -P 57611
 Using BK query {'Visible': 'Yes', 'Production': 57611, 'ReplicaFlag': 'Yes'}
 Getting metadata for 121 files  : completed in 0.2 seconds
 Getting metadata for 121 jobs : completed in 0.1 seconds
 Event stat: 1957882 on 121 files
 EventInputStat: 27302528 from 121 jobs
 Retention: 7.17 %
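The retention printed above appears to be the ratio of the output event count (Event stat) to the input event count (EventInputStat): 1957882 / 27302528 gives 7.17 %. The bash sketch below reproduces that arithmetic as a quick cross-check; `retention` is a hypothetical helper, not part of LHCbDirac.

```bash
#!/usr/bin/env bash
# Hypothetical cross-check: retention = 100 * output events / input events.
# Arguments: $1 = Event stat (output events), $2 = EventInputStat (input events).
retention() {
  awk -v o="$1" -v i="$2" 'BEGIN { printf "Retention: %.2f %%\n", 100 * o / i }'
}

retention 1957882 27302528   # prints "Retention: 7.17 %"
```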
Beware that this --filtering feature is experimental (and might not be supported in future versions). Also, the retrieved summary log files could be gzipped (extension .xml.gz). It is the user's responsibility to gunzip them and to design further processing scripts to get the final required statistics for the production.
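A minimal bash sketch for decompressing the retrieved summary files before further processing. It assumes they all sit in one directory and end in .xml.gz; `decompress_summaries` is a hypothetical name. The .gz copies are kept, so the originals remain available.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decompress every *.xml.gz in a directory (default: .)
# next to the original, keeping the .gz file. An existing .xml is overwritten.
decompress_summaries() {
  local dir=${1:-.} f
  for f in "$dir"/*.xml.gz; do
    [ -e "$f" ] || continue      # no match: the glob stays literal, skip it
    gzip -dc "$f" > "${f%.gz}"   # write to stdout, strip the .gz suffix
  done
}
```

For example, `decompress_summaries .` run in the download directory leaves one plain .xml file beside each .xml.gz.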
 

Known MCStatTools issues

Line: 122 to 131
 Thumbs-up The newly introduced feature to initialise the GRID proxy at run-time requires the script to restart in order for LHCbDirac API calls to succeed. Mind that the script will issue a warning when doing so. If the debug logging level is set, the command being executed is printed before the old instance of the script exits.
Deleted:
<
<
Thumbs-down The script in v3r2 fails to process older (buggier) generator logs. The issue has been addressed (adding compatibility with this format to the generator statistics code) and working code is committed to the MCStatTools head, pending release after February 20, 2017.
 
Deleted:
<
<
Please open a JIRA task under the LHCBGAUSS project for the Generator Statistics component if you are sure you found a bug.
 
Changed:
<
<
<!-- DownloadAndBuildStat.py terminates with Exception: "ValueError: max() arg is an empty sequence". 
This indicates that either you tried to process the archived production logs which are not staged yet or that the given production ID does not correspond to a valid Simulation production (check that you did not confuse the request ID and production ID or whether you did pass the ID of another production type (e.g. Reconstruction/Merging) corresponding to the same request)
-->
>
>
Please open a JIRA task under the LHCBGAUSS project for the Generator Statistics component if you are sure you found a bug or, as MC liaison, you really need a new feature to be implemented.
 

<!--

Revision 16 - 2017-02-21 - AlexGrecu

Line: 117 to 117
 As the MCStatTools code is restructured and rewritten to attain the level of functionality described in LHCBGAUSS-929, there are a number of known issues (Thumbs-down) that are (or will be) solved in the development version of the package. We list them with possible work-arounds (Thumbs-up) until the code is released officially.
Changed:
<
<
Thumbs-up Use the check_for_staged.py script to ensure that for each ProdId at least some of the production logs are staged (ALERT! Recommended procedure for archived productions!). Since v3r1 the user gets both the summed size of the staged log archive and an estimate of the total space needed to unpack these archives, assuming a multiplication factor of 11.
>
>
Thumbs-up Use the check_for_staged.py script to ensure that for each Prod ID at least some of the production logs are staged on disk (ALERT! Recommended procedure for archived productions!). Since v3r3 the user gets both the summed size of log files for each Prod ID and the summed size of all staged log files, to ease the selection of a partition with enough free disk space.
 Thumbs-up The newly introduced feature to initialise the GRID proxy at run-time requires the script to restart in order for LHCbDirac API calls to succeed. Mind that the script will issue a warning when doing so. If the debug logging level is set, the command being executed is printed before the old instance of the script exits.
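Once check_for_staged.py has reported the summed size, a quick pre-flight check can confirm that the chosen partition has enough free space. This is a hypothetical bash sketch, not part of MCStatTools: it assumes you have converted the reported size to KiB yourself, and it takes an optional multiplication factor (1 for the v3r3 summed log-file size, 11 for the older v3r1 archive-unpacking estimate).

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of MCStatTools).
#   $1  size reported by check_for_staged.py, converted to KiB by you
#   $2  multiplication factor (default 1; the v3r1 estimate used 11)
#   $3  directory that will hold the logs (default: .)
# Returns 0 if the file system holding that directory has enough free space.
enough_space() {
  local need_kb=$(( $1 * ${2:-1} ))
  local dir=${3:-.}
  local free_kb
  # -P: POSIX output format, -k: 1024-byte blocks;
  # the 4th field of the second line is the available space.
  free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  (( free_kb >= need_kb ))
}
```

For example, `enough_space 2048 11 /tmp && echo "enough space"` checks whether /tmp can hold 11 times a 2 MiB staged archive.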

Revision 15 - 2017-02-20 - MarcOlivierBettler

Line: 6 to 6
 

Foreword

Changed:
<
<
The user of a Monte-Carlo sample is interested in a number of efficiency values to make sense of this sample. This information is available in the logfile - and the xml file - associated with each production job. The statistics tables gather this information in a well-formatted way, and are published so that the information is easily accessible to users. It is the task of the MC contact to generate the statistics tables for the productions of their WG. There are several steps
>
>
The user of a Monte-Carlo sample is interested in a number of efficiency values to make sense of this sample. This information is available in xml files produced by the simulation jobs and associated with each production job. The statistics tables gather this information in a well-formatted way, and are published so that the information is easily accessible to users. It is the task of the MC contact to generate the statistics tables for the productions of their WG. The machinery is most effective when the statistics pages are produced shortly after the productions are finished. There are several steps to follow, detailed below.

1. Get the production ID number for your request(s).

Use the Dirac webpage (alternate webpage) to retrieve the Dirac Production ID associated with your request. Follow 'Production' -> 'Request manager', and then use the left-hand panel to filter the displayed requests and pin down the request you are interested in. Click on the request, and select 'Production monitor'. This will bring you to another Dirac webpage, which you could use directly if you know the request ID number. Each request has several steps, each with a Dirac Production ID. You are only interested in the step of type 'MCSimulation'. Write down its ProdID, shown in the first column.

 
Changed:
<
<

Get the production ID number for your request(s).

>
>
Do this for all the requests you want to process, as the rest of the workflow can be performed on several prodIDs in one go.
 
Deleted:
<
<
Use the Dirac webpage (alternate webpage) to retrieve the Dirac Production ID associated with your request. Follow 'Production' -> 'Request manager', and then use the left-hand panel to filter the displayed requests and pin down the request you are interested in. Click on the request, and select 'Production monitor'. This will bring you to another Dirac webpage, which you could use directly if you know the request ID number. Each request has several steps, each with a Dirac Production ID. You are only interested in the step of type 'MCSimulation'. Write down its ProdID, shown in the first column.
 
Deleted:
<
<
Do this for all the requests you want to process, as the rest of the workflow can be performed on several prodIDs in one go.
 
Changed:
<
<

Produce the tables.

>
>
Another way to get this information from a bookkeeping path is to use the LHCbDirac command dirac-bookkeeping-prod4path. Mind that if there are spaces in the BK path, you should enclose it in quotes.

$ SetupProject LHCbDirac
$ dirac-bookkeeping-prod4path --BK '/MC/2012/Beam4000GeV-2012-MagDown-Nu2.5-Pythia8/Sim08c/Digi13/Trig0x409f0045/Reco14a/Stripping20NoPrescalingFlagged/41900006 ( ttbar_gg_1l17GeV ) /ALLSTREAMS.DST'
For BK path /MC/2012/Beam4000GeV-2012-MagDown-Nu2.5-Pythia8/Sim08c/Digi13/Trig0x409f0045/Reco14a/Stripping20NoPrescalingFlagged/41900006 ( ttbar_gg_1l17GeV ) /ALLSTREAMS.DST: 
Productions found (Merge): 32263 
Parent productions (MCSimulation): 32262 

You are interested only in the prodID for the 'MCSimulation'.
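When several BK paths have to be checked, the MCSimulation prodID can be extracted from the command output automatically. This bash sketch assumes the output keeps the "Parent productions (MCSimulation): <id>" line format shown above; `sim_prod_ids` is a hypothetical name and is not part of LHCbDirac.

```bash
#!/usr/bin/env bash
# Hypothetical filter: read dirac-bookkeeping-prod4path output on stdin and
# print the value after "(MCSimulation):" (trailing whitespace removed).
sim_prod_ids() {
  awk -F': *' '/MCSimulation/ { sub(/[[:space:]]+$/, "", $2); print $2 }'
}
```

For example: `dirac-bookkeeping-prod4path --BK '<BK path>' | sim_prod_ids` prints 32262 for the BK path used above.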

 
Changed:
<
<
This step consists of retrieving the logs of a set of ProdIDs and constructing statistics tables for them. The latter operation is done by a script called GaussStat.py. The full operation is steered by DownloadAndBuildStat.py, which runs GaussStat.py internally.
>
>

2. Produce the tables.

 
Changed:
<
<
The following instructions are updated to work with the latest LHCbDirac and MCStatTools versions. For obsolete older versions, one could use the old method of setting up the run-time environment for a compatible LHCbDirac package:
>
>
This step consists of retrieving the xml files of a set of ProdIDs and constructing statistics tables for them. The latter operation is done by a script called GaussStat.py. The full operation is steered by DownloadAndBuildStat.py, which runs GaussStat.py internally.

The following instructions are updated to work with the latest LHCbDirac and MCStatTools versions. For obsolete older versions, one could use the old method of setting up the run-time environment for a compatible LHCbDirac package:

 
SetupProject LHCbDirac
Changed:
<
<
To begin, cd to a directory with a lot of free space. Copy the current stats tables of <your WG> corresponding to the MC type (MC11, MC2012) into this directory:
>
>
To begin, cd to a directory with a lot of free space. Copy the current stats tables of <your_WG> corresponding to the MC type <MC_type> (MC2012, SIM08STAT, SIM09STAT) into this directory:
  %CODE{ lang="bash" num="on" }%
Changed:
<
<
cp $LHCBDOC/STATISTICS/SIM08STAT/-WG/*.html .
>
>
cp $LHCBDOC/STATISTICS//-WG/*.html .
 %ENDCODE%
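The copy step can be wrapped so that a mistyped WG or MC type fails loudly instead of silently copying nothing. A minimal bash sketch, assuming the $LHCBDOC/STATISTICS/<MC_type>/<your_WG>-WG layout described above; `copy_stat_tables` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: copy the current statistics pages of one WG for one
# MC type into the current directory. LHCBDOC must already be set.
#   $1  MC type directory, e.g. SIM09STAT
#   $2  WG name, without the -WG suffix
copy_stat_tables() {
  local mctype=$1 wg=$2
  local src="$LHCBDOC/STATISTICS/$mctype/$wg-WG"
  if [ ! -d "$src" ]; then
    echo "No such directory: $src" >&2
    return 1
  fi
  cp "$src"/*.html .
}
```

For example, `copy_stat_tables SIM09STAT <your_WG>` for a Sim09 production.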
Deleted:
<
<
SIM08 changes to SIM09 in case the production was done using Sim09* simulation software.
 Launch DownloadAndBuildStat.py, passing it the list of prodIDs you have just found:
Deleted:
<
<
  • Before MCStatTools is released (expected by the end of March 2013), you need to follow this procedure. Once MCStatTools is released, this won't be necessary anymore unless you want (or are instructed) to use the HEAD of the package.

    cd cmtuser
    getpack MCStatTools
    lb-run LHCbDirac bash
    cd MCStatTools/scripts
    python DownloadAndBuildStat.py [opts..] XXXX,YYYY,ZZZZ

where XXXX, YYYY, ZZZZ are the prodIDs of the simulation steps for the requests

  • Once MCStatTools is released, only do:
 
    lb-run --use MCStatTools LHCbDirac bash
    python $MCSTATTOOLSSCRIPTS/DownloadAndBuildStat.py [opts..] XXXX,YYYY,ZZZZ
Changed:
<
<
This will try to get the logs first from the web, then from the CASTOR archive. It will also filter the logs for obviously malformed and incomplete log files. Then it calls the script that builds the tables. It will then update the existing tables with the information corresponding to the ProdIDs passed.
>
>
where XXXX, YYYY, ZZZZ are the prodIDs of the simulation steps for the requests

This will try to get the logs first from the web, then from the CASTOR archive. It will also filter the xml files for obviously malformed and incomplete files. Then it calls the script that builds the tables. It will then update the existing tables with the information corresponding to the ProdIDs passed.

 
Changed:
<
<
ALERT! In case you do not have a valid GRID proxy initialized on the running machine, you will be prompted to create such a proxy at run-time. This is done using the dirac-proxy-init command.
>
>
ALERT! In case you do not have a valid GRID proxy initialized on the running machine, you will be prompted to create such a proxy at run-time. This is done using the dirac-proxy-init command.
 
Changed:
<
<
ALERT! If execution appears to hang, try running the script from an lxplus node. When using your local machine there is a chance the incoming packets from archive (XRootD) server will be rejected by your firewall.
>
>
ALERT! If execution appears to hang, try running the script from an lxplus node. When using your local machine there is a chance the incoming packets from archive (XRootD) server will be rejected by your firewall.
  Interesting options are --verbose, --number-of-logs=<NB_LOGS>.
Line: 81 to 102
  --use-local-logs Use already downloaded logs, for debuging purpose.
Changed:
<
<
ALERT! MCStatTools is evolving towards automating the log generation procedure. The status of this process can be followed on this JIRA task: LHCBGAUSS-929.
>
>
ALERT! MCStatTools is evolving towards automating the log generation procedure. The status of this process can be followed on this JIRA task: LHCBGAUSS-929.
 
Changed:
<
<
TIP In MCStatTools v3r0 and later, the Dirac request of Simulation conditions parameters hangs for a long time for older ProdIds. This is to be addressed in the next version of the package. However, if the query is left to time out, the fall-back method of retrieving these parameters from jobDescription.xml files is used, which may succeed for Sim08 and earlier productions. For Sim09 productions this fall-back procedure is known to fail to identify the Sim code name correctly, so the generated web page would have an invalid file name.
>
>
TIP In MCStatTools v3r0 and later, the Dirac request of Simulation conditions parameters hangs for a long time for older ProdIds. This is to be addressed in the next version of the package. However, if the query is left to time out, the fall-back method of retrieving these parameters from jobDescription.xml files is used, which may succeed for Sim08 and earlier productions. For Sim09 productions this fall-back procedure is known to fail to identify the Sim code name correctly, so the generated web page would have an invalid file name.
 

Known MCStatTools issues

Changed:
<
<
As the MCStatTools code is restructured and rewritten to attain the level of functionality described in LHCBGAUSS-929, there are a number of known issues (Thumbs-down) that are (or will be) solved in the development version of the package. We list them with possible work-arounds (Thumbs-up) until the code is released officially.
>
>
As the MCStatTools code is restructured and rewritten to attain the level of functionality described in LHCBGAUSS-929, there are a number of known issues (Thumbs-down) that are (or will be) solved in the development version of the package. We list them with possible work-arounds (Thumbs-up) until the code is released officially.
 
Changed:
<
<
Thumbs-up Use the check_for_staged.py script to ensure that for each ProdId at least some of the production logs are staged (ALERT! Recommended procedure for archived productions!). Since v3r1 the user gets both the summed size of the staged log archive and an estimate of the total space needed to unpack these archives, assuming a multiplication factor of 11.
Thumbs-up The newly introduced feature to initialise the GRID proxy at run-time requires the script to restart in order for LHCbDirac API calls to succeed. Mind that the script will issue a warning when doing so. If the debug logging level is set, the command being executed is printed before the old instance of the script exits.
Thumbs-down The script in v3r2 fails to process older (buggier) generator logs. The issue has been addressed (adding compatibility with this format to the generator statistics code) and working code is committed to the MCStatTools head, pending release after February 20, 2017.
>
>
Thumbs-up Use the check_for_staged.py script to ensure that for each ProdId at least some of the production logs are staged (ALERT! Recommended procedure for archived productions!). Since v3r1 the user gets both the summed size of the staged log archive and an estimate of the total space needed to unpack these archives, assuming a multiplication factor of 11.
Thumbs-up The newly introduced feature to initialise the GRID proxy at run-time requires the script to restart in order for LHCbDirac API calls to succeed. Mind that the script will issue a warning when doing so. If the debug logging level is set, the command being executed is printed before the old instance of the script exits.
Thumbs-down The script in v3r2 fails to process older (buggier) generator logs. The issue has been addressed (adding compatibility with this format to the generator statistics code) and working code is committed to the MCStatTools head, pending release after February 20, 2017.
  Please open a JIRA task under the LHCBGAUSS project for the Generator Statistics component if you are sure you found a bug.
Changed:
<
<
<!-- DownloadAndBuildStat.py terminates with Exception: "ValueError: max() arg is an empty sequence". This indicates that either you tried to process the archived production logs which are not staged yet or that the given production ID does not correspond to a valid Simulation production (check that you did not confuse the request ID and production ID or whether you did pass the ID of another production type (e.g. Reconstruction/Merging) corresponding to the same request)
-->
>
>
<!-- DownloadAndBuildStat.py terminates with Exception: "ValueError: max() arg is an empty sequence". 
This indicates that either you tried to process the archived production logs which are not staged yet or that the given production ID does not correspond to a valid Simulation production (check that you did not confuse the request ID and production ID or whether you did pass the ID of another production type (e.g. Reconstruction/Merging) corresponding to the same request)
-->
 

<!--
Line: 106 to 138
 
-->
Changed:
<
<

Publish the tables.

<!-- The up-to-date procedure is detailed here. -->
>
>

3. Publish the tables.

 
Changed:
<
<
<!-- Send a mail to Gloria, with a pointer to a folder containing the updated tables in a public-readable area. -->
Create a JIRA task at https://its.cern.ch/jira/browse/LHCBGAUSS (you have to log in to your CERN SSO account). Choose Generators Statistics as the Component and either upload the statistics pages or give a pointer to a folder containing the updated tables in a public-readable area. The tables will then be added to the Gauss web site.
>
>
Create a JIRA task at https://its.cern.ch/jira/browse/LHCBGAUSS (you have to log in with your CERN SSO account). Choose Generators Statistics as the Component and either upload the statistics pages or give a pointer to a folder containing the updated tables in a public-readable area. The tables will then be added to the Gauss web site.
 

Look for available files in the bookkeeping given an event type.

Changed:
<
<
This is automated in the following script from Vanya. For each returned tuple, the first entry is the bkk path, the last two ones are the number of files and the overall number of events.
>
>
This is automated in the following script from Vanya. For each returned tuple, the first entry is the bkk path, the last two ones are the number of files and the overall number of events.
  %SYNTAX{ syntax="sh" numbered="1000" numstep="10"}% > lhcb-proxy-init

Revision 14 - 2017-02-10 - AlexGrecu

Line: 92 to 92
  Thumbs-up Use the check_for_staged.py script to ensure that for each ProdId at least some of the production logs are staged (ALERT! Recommended procedure for archived productions!). Since v3r1 the user gets both the summed size of the staged log archive and an estimate of the total space needed to unpack these archives, assuming a multiplication factor of 11.
Thumbs-up The newly introduced feature to initialise the GRID proxy at run-time requires the script to restart in order for LHCbDirac API calls to succeed. Mind that the script will issue a warning when doing so. If the debug logging level is set, the command being executed is printed before the old instance of the script exits.
Changed:
<
<
Thumbs-down No known issues for v3r2 or later. Please, open a JIRA task under LHCBGAUSS project for Generator Statistics component if you are sure you found one.
>
>
Thumbs-down The script in v3r2 fails to process older (buggier) generator logs. The issue has been addressed (adding compatibility with this format to the generator statistics code) and working code is committed to the MCStatTools head, pending release after February 20, 2017.

Please open a JIRA task under the LHCBGAUSS project for the Generator Statistics component if you are sure you found a bug.

 
<!-- DownloadAndBuildStat.py terminates with Exception: "ValueError: max() arg is an empty sequence". This indicates that either you tried to process the archived production logs which are not staged yet or that the given production ID does not correspond to a valid Simulation production (check that you did not confuse the request ID and production ID or whether you did pass the ID of another production type (e.g. Reconstruction/Merging) corresponding to the same request)
-->

Revision 13 - 2017-02-06 - AlexGrecu

Line: 91 to 91
 As the MCStatTools code is restructured and rewritten to attain the level of functionality described in LHCBGAUSS-929, there are a number of known issues (Thumbs-down) that are (or will be) solved in the development version of the package. We list them with possible work-arounds (Thumbs-up) until the code is released officially.

Thumbs-up Use the check_for_staged.py script to ensure that for each ProdId at least some of the production logs are staged (ALERT! Recommended procedure for archived productions!). Since v3r1 the user gets both the summed size of the staged log archive and an estimate of the total space needed to unpack these archives, assuming a multiplication factor of 11.

Changed:
<
<
Thumbs-down DownloadAndBuildStat.py terminates with Exception: "ValueError: max() arg is an empty sequence". This indicates either that you tried to process archived production logs which are not staged yet, or that the given production ID does not correspond to a valid Simulation production (check that you did not confuse the request ID with the production ID, and that you did not pass the ID of another production type (e.g. Reconstruction/Merging) corresponding to the same request).
>
>
Thumbs-up The newly introduced feature to initialise the GRID proxy at run-time requires the script to restart in order for LHCbDirac API calls to succeed. Mind that the script will issue a warning when doing so. If the debug logging level is set, the command being executed is printed before the old instance of the script exits.
Thumbs-down No known issues for v3r2 or later. Please, open a JIRA task under LHCBGAUSS project for Generator Statistics component if you are sure you found one.

<!-- DownloadAndBuildStat.py terminates with Exception: "ValueError: max() arg is an empty sequence". This indicates that either you tried to process the archived production logs which are not staged yet or that the given production ID does not correspond to a valid Simulation production (check that you did not confuse the request ID and production ID or whether you did pass the ID of another production type (e.g. Reconstruction/Merging) corresponding to the same request)
-->
 

<!--

Revision 12 - 2017-02-02 - AlexGrecu

Line: 23 to 23
 SetupProject LHCbDirac
Changed:
<
<
To begin, cd to a directory with a lot of free space. Copy the current stats tables of your WG corresponding to the MC type (MC11, MC2012) into this directory. Launch DownloadAndBuildStat.py, passing it the list of prodIDs you have just found:
>
>
To begin, cd to a directory with a lot of free space. Copy the current stats tables of <your WG> corresponding to the MC type (MC11, MC2012) into this directory:

    cp $LHCBDOC/STATISTICS/SIM08STAT/<your WG>-WG/*.html .

SIM08 changes to SIM09 in case the production was done using Sim09* simulation software.

Launch DownloadAndBuildStat.py, passing it the list of prodIDs you have just found:

 
  • Before MCStatTools is released (expected by the end of March 2013), you need to follow this procedure. Once MCStatTools is released, this won't be necessary anymore unless you want (or are instructed) to use the HEAD of the package.
Changed:
<
<
>
>
%CODE{ lang="bash" num="on" }%
 cd cmtuser
 getpack MCStatTools
 lb-run LHCbDirac bash
 cd MCStatTools/scripts
 python DownloadAndBuildStat.py [opts..] XXXX,YYYY,ZZZZ
Changed:
<
<
>
>
%ENDCODE%
  where XXXX, YYYY, ZZZZ are the prodIDs of the simulation steps for the requests

  • Once MCStatTools is released, only do:
Changed:
<
<
>
>
%CODE{ lang="bash" num="on" }%
 lb-run --use MCStatTools LHCbDirac bash
 python $MCSTATTOOLSSCRIPTS/DownloadAndBuildStat.py [opts..] XXXX,YYYY,ZZZZ
Changed:
<
<
>
>
%ENDCODE%
  This will try to get the logs first from the web, then from the CASTOR archive. It will also filter the logs for obviously malformed and incomplete log files. Then it calls the script that builds the tables. It will then update the existing tables with the information corresponding to the ProdIDs passed.
Line: 82 to 90
  As the MCStatTools code is restructured and rewritten to attain the level of functionality described in LHCBGAUSS-929, there are a number of known issues (Thumbs-down) that are (or will be) solved in the development version of the package. We list them with possible work-arounds (Thumbs-up) until the code is released officially.
Changed:
<
<
Thumbs-down DownloadAndBuildStat.py terminates with Exception: "Could not find any log files for the production ID(s)". Also --delta argument value has no effect.
Thumbs-up Use check_for_staged.py script to ensure that for each ProdId at least some of the production logs are staged.
TODO Add to check_for_staged.py a feature to compute total disk space required by production log files so that user can choose a partition with enough disk space when running DownloadAndBuildStat.py.
>
>
Thumbs-up Use the check_for_staged.py script to ensure that for each ProdId at least some of the production logs are staged (ALERT! Recommended procedure for archived productions!). Since v3r1 the user gets both the summed size of the staged log archive and an estimate of the total space needed to unpack these archives, assuming a multiplication factor of 11.
Thumbs-down DownloadAndBuildStat.py terminates with Exception: "ValueError: max() arg is an empty sequence". This indicates either that you tried to process archived production logs which are not staged yet, or that the given production ID does not correspond to a valid Simulation production (check that you did not confuse the request ID with the production ID, and that you did not pass the ID of another production type (e.g. Reconstruction/Merging) corresponding to the same request).
 
<!--
Thumbs-down  
Line: 95 to 103
 

Publish the tables.

Added:
>
>
<!-- The up-to-date procedure is detailed here. -->
 
<!-- Send a mail to Gloria, with a pointer to a folder containing the updated tables in a public-readable area. -->
Changed:
<
<
Create a JIRA task at https://its.cern.ch/jira/browse/LHCBGAUSS (you have to log in with your CERN SSO account). Choose Generators Statistics as the Component and either upload the statistics pages or give a pointer to a folder containing the updated tables in a public-readable area. The tables will then be added to the Gauss web site.
>
>
Create a JIRA task at https://its.cern.ch/jira/browse/LHCBGAUSS (you have to log in to your CERN SSO account). Choose Generators Statistics as the Component and either upload the statistics pages or give a pointer to a folder containing the updated tables in a public-readable area. The tables will then be added to the Gauss web site.
 

Look for available files in the bookkeeping given an event type.

Revision 11 - 2016-12-14 - AlexGrecu

Line: 18 to 18
  This step consists of retrieving the logs of a set of ProdIDs and constructing statistics tables for them. The latter operation is done by a script called GaussStat.py. The full operation is steered by DownloadAndBuildStat.py, which runs GaussStat.py internally.
Added:
>
>
The following instructions are updated to work with the latest LHCbDirac and MCStatTools versions. For obsolete older versions, one could use the old method of setting up the run-time environment for a compatible LHCbDirac package:
SetupProject LHCbDirac
 To begin, cd to a directory with a lot of free space. Copy the current stats tables of your WG corresponding to the MC type (MC11, MC2012) into this directory. Launch DownloadAndBuildStat.py, passing it the list of prodIDs you have just found:
Changed:
<
<
  • Before MCStatTools is released (expected by the end of March 2013), you need to follow this procedure. Once MCStatTools is released, this won't be necessary anymore.
>
>
  • Before MCStatTools is released (expected by the end of March 2013), you need to follow this procedure. Once MCStatTools is released, this won't be necessary anymore unless you want (or are instructed) to use the HEAD of the package.
 
cd cmtuser
getpack MCStatTools
Changed:
<
<
SetupProject LHCbDirac
cd MCStatTools/cmt
source setup.csh
>
>
lb-run LHCbDirac bash
cd MCStatTools/scripts
 python DownloadAndBuildStat.py [opts..] XXXX,YYYY,ZZZZ
Changed:
<
<
where XXXX, YYYY, ZZZZ are the prodIDs of the simulation steps for the requests
>
>
where XXXX, YYYY, ZZZZ are the prodIDs of the simulation steps for the requests
 
  • Once MCStatTools is released, only do:
Changed:
<
<
SetupProject LHCbDirac --runtime DBASE --use MCStatTools
>
>
lb-run --use MCStatTools LHCbDirac bash
 python $MCSTATTOOLSSCRIPTS/DownloadAndBuildStat.py [opts..] XXXX,YYYY,ZZZZ
Changed:
<
<
This will try to get the logs first from the web, then from castor. It will also filter the logs for obviously malformed and incomplete log files. Then it calls the script that builds the tables up. It will then update existing tables with the information corresponding to the ProdIDs passed.
>
>
This will try to get the logs first from the web, then from the CASTOR archive, and filter out obviously malformed and incomplete log files. It then calls the script that builds the tables and updates the existing tables with the information corresponding to the ProdIDs passed.

ALERT! If you do not have a valid Grid proxy on the machine running the script, you will be prompted at run time to create one with the dirac-proxy-init command.

 
Changed:
<
<
Note: If execution appears to hang, try running the script from an lxplus node. When using your local machine there is a chance the incoming packets from CASTOR will be rejected by your firewall.
>
>
ALERT! If execution appears to hang, try running the script from an lxplus node. When running on your local machine, the incoming packets from the archive (XRootD) server may be rejected by your firewall.
 
Changed:
<
<
Interesting options are --verbose, --number-of-logs=<NB_LOGS>. More subtle options allow to use local copies of the scripts that are used to get logs from CASTOR and to build the tables, --local-download-script and --local-gaussstat, respectively. Those are used when either of the aforementioned script becomes buggy and a quick local fix is available.
>
>
Useful options are --verb-level and --number-of-logs=<NB_LOGS>.
  Full help:
Line: 56 to 62
 Options:
  -h, --help            show this help message and exit
  -n <NB_LOGS>, --number-of-logs=<NB_LOGS>
Changed:
<
<
number of logs to download [default : 300]
>
>
number of logs to download [default : 1000]
  --delta-size=<DELTA>  log files smaller by <DELTA> from largest file of sample will be deleted [default : 0.07]
Changed:
<
<
  --local-download-script
                        use local version of dirac-production-download-logs-withoutcheck.py, if False, the script of MCStatTools will be used [default : False]
  --local-gaussstat     use local version of GaussStat.py, if False, the script of MCStatTools will be used [default : False]
  -c, --castor          get files from CASTOR directly
  -v, --verbose         talk to me baby
>
>
  -t, --tape            get files from ARCHIVE (on disk/tape) directly
  -v , --verb-level=    case insensitive verbosity level [CRIT, ERROR, WARN, INFO, DEBUG; default: info]
  --usage show this help message and exit
Changed:
<
<
--use-local-logs Use already downlaoded logs, for debuging purpose.
>
>
--use-local-logs Use already downloaded logs, for debuging purpose.
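The --delta-size option describes a relative size cut with respect to the largest log file of the sample (default 0.07). The bash sketch below assumes, as our own reading rather than something taken from the MCStatTools code, that a file is dropped when it is smaller than (1 - DELTA) times the largest file size; the hypothetical helper small_logs only prints the affected files and deletes nothing.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MCStatTools): print the files among the
# arguments that are smaller than (1 - DELTA) times the largest one.
# This is an assumed reading of --delta-size; nothing is deleted.
small_logs() {
  local delta="$1" f
  shift
  for f in "$@"; do
    # $(( )) normalises wc output, which may carry leading blanks.
    printf '%s %s\n' "$(( $(wc -c < "$f") ))" "$f"
  done | awk -v d="$delta" '
    {
      size[NR] = $1 + 0
      name[NR] = substr($0, length($1) + 2)   # file name, spaces allowed
      if (size[NR] > max) max = size[NR]
    }
    END {
      for (i = 1; i <= NR; i++)
        if (size[i] < (1 - d) * max) print name[i]
    }'
}
```

Usage: `small_logs 0.07 *.log` lists the logs that would fall below the cut; with sizes of 1000, 950 and 900 bytes only the 900-byte file is listed, since the cut is at about 930 bytes.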
 
Added:
>
>
ALERT! MCStatTools is evolving towards an automated log generation procedure. Progress can be followed in the JIRA task LHCBGAUSS-929.

TIP In MCStatTools v3r0 and later, the DIRAC query for the simulation conditions parameters hangs for a long time for older ProdIDs. This is to be addressed in the next version of the package. If the query is left to time out, the parameters are instead retrieved from the jobDescription.xml files; this fall-back may succeed for Sim08 and earlier productions. For Sim09 productions, however, it is known to misidentify the Sim code name, so the generated web page gets an invalid file name.

Known MCStatTools issues

While the MCStatTools code is being restructured and rewritten to reach the functionality described in LHCBGAUSS-929, there are a number of known issues (Thumbs-down) that are (or will be) solved in the development version of the package. They are listed below with possible work-arounds (Thumbs-up) until the code is officially released.

Thumbs-down DownloadAndBuildStat.py terminates with the exception "Could not find any log files for the production ID(s)". In addition, the value of the --delta argument has no effect.
Thumbs-up Use the check_for_staged.py script to ensure that at least some of the production logs are staged for each ProdID.
TODO Add to check_for_staged.py a feature computing the total disk space required by the production log files, so that the user can choose a partition with enough free space when running DownloadAndBuildStat.py.

 

Publish the tables.

Changed:
<
<
Send a mail to Gloria, with a pointer to a folder containing the updated tables in a public-readable area.
>
>
<!-- Send a mail to Gloria, with a pointer to a folder containing the updated tables in a public-readable area. -->
Create a JIRA task at https://its.cern.ch/jira/browse/LHCBGAUSS (you have to log in with your CERN SSO account). Set the Component to "Generators Statistics" and either upload the statistics pages or give a pointer to a folder containing the updated tables in a publicly readable area. The tables will then be added to the Gauss web site.
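If you upload the statistics pages rather than pointing to a folder, packing them into a single archive keeps the attachment manageable. A minimal sketch, assuming the pages sit in one local directory; bundle_stat_pages and the paths in the usage line are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: pack every file of DIR into one gzip-compressed tar
# archive OUT, e.g. to attach the statistics pages to the JIRA task.
bundle_stat_pages() {
  local dir="$1" out="$2"
  if [ ! -d "$dir" ]; then
    echo "no such directory: $dir" >&2
    return 1
  fi
  # -C makes the archive paths relative to DIR.
  tar -czf "$out" -C "$dir" .
}
```

Usage: `bundle_stat_pages ./stat_pages stat_pages.tar.gz`, then attach stat_pages.tar.gz to the task. Write the archive outside the source directory so that it does not include itself.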
 

Look for available files in the bookkeeping for a given event type.

Revision 10 - 2016-10-28 - VanyaBelyaev

Line: 1 to 1
 
META TOPICPARENT name="LHCbSimulation"

Building the statistics tables for Monte-Carlo simulations.

Line: 80 to 80
  This is automated in the following script from Vanya. For each returned tuple, the first entry is the bkk path, the last two ones are the number of files and the overall number of events.
Changed:
<
<
$ python /afs/cern.ch/user/i/ibelyaev/public/scripts/dirac-bookkeeping-get-prodinfo-eventtype.py 11102003
>
>
%SYNTAX{ syntax="sh" numbered="1000" numstep="10"}% > lhcb-proxy-init
> PATH=/afs/cern.ch/user/i/ibelyaev/public/scripts:$PATH
> get_bookkeeping_info 11102003
 ('/MC/2012/Beam4000GeV-MayJune2012-MagDown-Nu2.5-EmNoCuts/Sim06a/Trig0x0097003dFlagged/Reco13a/Stripping19aNoPrescalingFlagged/11102003/ALLSTREAMS.DST', 'head-20120413', 'sim-20120727-vc-md100', 17, 203599)
('/MC/2012/Beam4000GeV-MayJune2012-MagUp-Nu2.5-EmNoCuts/Sim06a/Trig0x0097003dFlagged/Reco13a/Stripping19aNoPrescalingFlagged/11102003/ALLSTREAMS.DST', 'head-20120413', 'sim-20120727-vc-mu100', 17, 207000)
[...]
Deleted:
<
<
 \ No newline at end of file
Added:
>
>
%ENDSYNTAX%
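Assuming the bookkeeping script prints one tuple per line as in the example output above (path, two condition tags, number of files, number of events), a short awk filter can total the files and events over all returned paths. sum_bkk_tuples is a hypothetical helper name, not part of the bookkeeping tools:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the bookkeeping tools): read tuples of the
# form (path, tag, tag, n_files, n_events), one per line, on stdin and print
# the totals. Lines that do not start with "(" (e.g. "[...]") are skipped.
sum_bkk_tuples() {
  awk -F', ' '
    /^ *\(/ {
      nev_line = $NF
      sub(/\).*$/, "", nev_line)   # drop the closing parenthesis
      nfiles += $(NF - 1)
      nevents += nev_line
    }
    END { printf "%d files, %d events\n", nfiles, nevents }'
}
```

Usage: `get_bookkeeping_info 11102003 | sum_bkk_tuples`. For the two tuples shown above this prints `34 files, 410599 events`.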

Revision 9 - 2014-12-09 - MarcOlivierBettler

Line: 1 to 1
 
META TOPICPARENT name="LHCbSimulation"

Building the statistics tables for Monte-Carlo simulations.

Line: 74 to 75
 

Publish the tables.

Send a mail to Gloria, with a pointer to a folder containing the updated tables in a public-readable area.

Added:
>
>

Look for available files in the bookeeping given an eventype.

This is automated in the following script from Vanya. For each returned tuple, the first entry is the bkk path, the last two ones are the number of files and the overall number of events.

$ python /afs/cern.ch/user/i/ibelyaev/public/scripts/dirac-bookkeeping-get-prodinfo-eventtype.py 11102003
('/MC/2012/Beam4000GeV-MayJune2012-MagDown-Nu2.5-EmNoCuts/Sim06a/Trig0x0097003dFlagged/Reco13a/Stripping19aNoPrescalingFlagged/11102003/ALLSTREAMS.DST', 'head-20120413', 'sim-20120727-vc-md100', 17, 203599)
('/MC/2012/Beam4000GeV-MayJune2012-MagUp-Nu2.5-EmNoCuts/Sim06a/Trig0x0097003dFlagged/Reco13a/Stripping19aNoPrescalingFlagged/11102003/ALLSTREAMS.DST', 'head-20120413', 'sim-20120727-vc-mu100', 17, 207000)
[...]
 \ No newline at end of file

Revision 8 - 2014-09-30 - JacksonSmith

Line: 1 to 1
 
META TOPICPARENT name="LHCbSimulation"

Building the statistics tables for Monte-Carlo simulations.

Line: 16 to 16
 

Produce the tables.

Changed:
<
<
This step consists into retrieving the logs of a set of ProdIDs and constructing statistics tables for them. The latter operation of done by a script called GaussStat.py. The full operation is steered by DownloadAndBuild.py, which runs GaussStat.py internally.
>
>
This step consists into retrieving the logs of a set of ProdIDs and constructing statistics tables for them. The latter operation is done by a script called GaussStat.py. The full operation is steered by DownloadAndBuild.py, which runs GaussStat.py internally.
 
Changed:
<
<
You need to separate your list of ProdIDs according to the MCtype (MC11, MC2012).
>
>
To begin, cd to a directory with a lot of free space. Copy the current stats table of your WG and corresponding to the MCtype (MC11, MC2012) in this directory. Launch DownloadAndBuildStat.py, passing it the list of prodIDs you have just found:
 
  • Before MCStatTools is released (should be done end of March 2013), you need to follow this procedure. Once MCStatTools will be released, this won't be necessary anymore.
Line: 28 to 28
 SetupProject LHCbDirac
cd MCStatTools/cmt
source setup.csh
Deleted:
<
<
lhcb-proxy-init
 python DownloadAndBuildStat.py [opts..] XXXX,YYYY,ZZZZ
Line: 38 to 37
 
SetupProject LHCbDirac --runtime DBASE --use MCStatTools
Deleted:
<
<
lhcb-proxy-init
 $MCSTATTOOLSSCRIPTS/python DownloadAndBuildStat.py [opts..] XXXX,YYYY,ZZZZ
Deleted:
<
<
Then cd to a directory with a lot of free space. Copy the current stats table of your WG and corresponding to the MCtype (MC11, MC2012) in this directory. Launch DownloadAndBuildStat.py, specifying the MC type and giving a list of ProdID, comma-separated. E.g. for a MC11 productions, and for the ProdIDs 23001,23004,23007:
$MCSTATTOOLSSCRIPTS/python DownloadAndBuildStat.py --MCtype=MC11 23001,23004,23007
 This will try to get the logs first from the web, then from castor. It will also filter the logs for obviously malformed and incomplete log files. Then it calls the script that builds the tables up. It will then update existing tables with the information corresponding to the ProdIDs passed.
Changed:
<
<
Note: If execution appears to hang, try running the script from an lxplus node. When using your local machine there is a chance the incoming packets from CASTOR will be rejected by your firewall.
>
>
Note: If execution appears to hang, try running the script from an lxplus node. When using your local machine there is a chance the incoming packets from CASTOR will be rejected by your firewall.
 
Changed:
<
<
Interesting options are --MCtype=MC_TYPE, --verbose, --number-of-logs=<NB_LOGS>. More subtle options allow to use local copies of the scripts that are used to get logs from CASTOR and to build the tables, --local-download-script and --local-gaussstat, respectively. Those are used when either of the aforementioned script becomes buggy and a quick local fix is available.
>
>
Interesting options are --verbose, --number-of-logs=<NB_LOGS>. More subtle options allow to use local copies of the scripts that are used to get logs from CASTOR and to build the tables, --local-download-script and --local-gaussstat, respectively. Those are used when either of the aforementioned script becomes buggy and a quick local fix is available.
  Full help:
Line: 66 to 59
  number of logs to download [default : 300]
  --delta-size=<DELTA>  log files smaller by <DELTA> from largest file of sample will be deleted [default : 0.07]
Deleted:
<
<
--MCtype=MC_TYPE MC nickname used to path to pick up the correct stat tables to produce. Recognised values are ['MC10', 'MC11', 'MC2012', 'DEV', 'Upgrade'] [default : MC2012]
  --local-download-script use local version of dirac-production-download-logs- withoutcheck.py, if False, the script of MCStatTools

Revision 7 - 2014-09-30 - JacksonSmith

Line: 1 to 1
 
META TOPICPARENT name="LHCbSimulation"

Building the statistics tables for Monte-Carlo simulations.

Line: 32 to 32
 python DownloadAndBuildStat.py [opts..] XXXX,YYYY,ZZZZ
Changed:
<
<
where XXXX, YYYY, ZZZZ are the prodId's of the relevant requests
>
>
where XXXX, YYYY, ZZZZ are the prodId's of the simulation steps for the requests
 
  • Once MCStatTools is released, only do:

Revision 6 - 2014-09-25 - JacksonSmith

Line: 1 to 1
 
META TOPICPARENT name="LHCbSimulation"

Building the statistics tables for Monte-Carlo simulations.

Line: 29 to 29
 cd MCStatTools/cmt
source setup.csh
lhcb-proxy-init
Added:
>
>
python DownloadAndBuildStat.py [opts..] XXXX,YYYY,ZZZZ
 
Added:
>
>
where XXXX, YYYY, ZZZZ are the prodId's of the relevant requests
 
  • Once MCStatTools is released, only do:

SetupProject LHCbDirac --runtime DBASE --use MCStatTools
lhcb-proxy-init
Added:
>
>
$MCSTATTOOLSSCRIPTS/python DownloadAndBuildStat.py [opts..] XXXX,YYYY,ZZZZ
 

Then cd to a directory with a lot of free space. Copy the current stats table of your WG and corresponding to the MCtype (MC11, MC2012) in this directory. Launch DownloadAndBuildStat.py, specifying the MC type and giving a list of ProdID, comma-separated. E.g. for a MC11 productions, and for the ProdIDs 23001,23004,23007:

Changed:
<
<
python DownloadAndBuildStat.py --MCtype=MC11 23001,23004,23007
>
>
$MCSTATTOOLSSCRIPTS/python DownloadAndBuildStat.py --MCtype=MC11 23001,23004,23007
 

This will try to get the logs first from the web, then from castor. It will also filter the logs for obviously malformed and incomplete log files. Then it calls the script that builds the tables up. It will then update existing tables with the information corresponding to the ProdIDs passed.

Added:
>
>
Note: If execution appears to hang, try running the script from an lxplus node. When using your local machine there is a chance the incoming packets from CASTOR will be rejected by your firewall.
 Interesting options are --MCtype=MC_TYPE, --verbose, --number-of-logs=<NB_LOGS>. More subtle options allow to use local copies of the scripts that are used to get logs from CASTOR and to build the tables, --local-download-script and --local-gaussstat, respectively. Those are used when either of the aforementioned script becomes buggy and a quick local fix is available.

Full help:

Revision 5 - 2014-02-11 - JacksonSmith

Line: 1 to 1
 
META TOPICPARENT name="LHCbSimulation"

Building the statistics tables for Monte-Carlo simulations.

Line: 8 to 8
  The user of a Monte-Carlo sample is interested in a number of efficiency values to make sense of this sample. This information is available in the logfile - and the xml file - associated to each production job. The statistics tables gather this information in a well-formatted way, and are published so that the information is easily accessible to users. It is the task of the MC contact to generate the statistics tables for the production of his WG. There are several steps
Deleted:
<
<
 

Get the productions ID number for your request(s).

Changed:
<
<
Use the Dirac webpage (alternate webpage) to retrieve the Dirac Production ID associated to your request. Follow 'Production' -> 'Request manager', and then use the left hand panel to filter the displayed request and pin down the request you are interested in. Click on the request, and select 'Production monitor'. This will bring you to another Dirac webpage, which you could use directly if you know the request ID number. Each request have several steps, each with a Dirac Production ID. You are interested only in the step of type 'MCSimulation'. Write down its ProdID, shown in the first column.
>
>
Use the Dirac webpage (alternate webpage) to retrieve the Dirac Production ID associated to your request. Follow 'Production' -> 'Request manager', and then use the left hand panel to filter the displayed request and pin down the request you are interested in. Click on the request, and select 'Production monitor'. This will bring you to another Dirac webpage, which you could use directly if you know the request ID number. Each request have several steps, each with a Dirac Production ID. You are interested only in the step of type 'MCSimulation'. Write down its ProdID, shown in the first column.
  Do it for all the requests you want to process as the rest of the work-flow can be performed on several prodIDs in one go.
Line: 39 to 34
 
  • Once MCStatTools is released, only do:

Changed:
<
<
SetupProject LHCbDirac --use MCStatTools
>
>
SetupProject LHCbDirac --runtime DBASE --use MCStatTools
 lhcb-proxy-init
Line: 47 to 42
 
python DownloadAndBuildStat.py --MCtype=MC11 23001,23004,23007
Deleted:
<
<
This will try to get the logs first from the web, then from castor. It will also filter the logs for obviously malformed and incomplete log files. Then it calls the script that builds the tables up. It will then update existing tables with the information corresponding to the ProdIDs passed.

Interesting options are --MCtype=MC_TYPE, --verbose, --number-of-logs=<NB_LOGS>. More subtle options allow to use local copies of the scripts that are used to get logs from CASTOR and to build the tables, --local-download-script and --local-gaussstat, respectively. Those are used when either of the aforementioned script becomes buggy and a quick local fix is available.

 
Added:
>
>
This will try to get the logs first from the web, then from castor. It will also filter the logs for obviously malformed and incomplete log files. Then it calls the script that builds the tables up. It will then update existing tables with the information corresponding to the ProdIDs passed.
 
Added:
>
>
Interesting options are --MCtype=MC_TYPE, --verbose, --number-of-logs=<NB_LOGS>. More subtle options allow to use local copies of the scripts that are used to get logs from CASTOR and to build the tables, --local-download-script and --local-gaussstat, respectively. Those are used when either of the aforementioned script becomes buggy and a quick local fix is available.
  Full help:

Revision 4 - 2013-03-13 - MarcOlivierBettler

Line: 1 to 1
 
META TOPICPARENT name="LHCbSimulation"

Building the statistics tables for Monte-Carlo simulations.

Line: 21 to 21
 

Produce the tables.

Changed:
<
<
  • Before MCStatTools is released, you need to follow this procedure. Once MCStatTools will be released, this won't be necessary anymore.
>
>
This step consists into retrieving the logs of a set of ProdIDs and constructing statistics tables for them. The latter operation of done by a script called GaussStat.py. The full operation is steered by DownloadAndBuild.py, which runs GaussStat.py internally.

You need to separate your list of ProdIDs according to the MCtype (MC11, MC2012).

  • Before MCStatTools is released (should be done end of March 2013), you need to follow this procedure. Once MCStatTools will be released, this won't be necessary anymore.
 
cd cmtuser
Line: 32 to 36
 lhcb-proxy-init
Changed:
<
<
>
>
  • Once MCStatTools is released, only do:
 
SetupProject LHCbDirac --use MCStatTools
lhcb-proxy-init
Changed:
<
<
This step consists into retrieving the logs of a set of ProdIDs and constructing statistics tables for them. The latter operation of done by a script called GaussStat.py. The full operation is steered by DownloadAndBuild.py, which runs GaussStat.py internally.
>
>
Then cd to a directory with a lot of free space. Copy the current stats table of your WG and corresponding to the MCtype (MC11, MC2012) in this directory. Launch DownloadAndBuildStat.py, specifying the MC type and giving a list of ProdID, comma-separated. E.g. for a MC11 productions, and for the ProdIDs 23001,23004,23007:
python DownloadAndBuildStat.py --MCtype=MC11 23001,23004,23007
This will try to get the logs first from the web, then from castor. It will also filter the logs for obviously malformed and incomplete log files. Then it calls the script that builds the tables up. It will then update existing tables with the information corresponding to the ProdIDs passed.

Interesting options are --MCtype=MC_TYPE, --verbose, --number-of-logs=<NB_LOGS>. More subtle options allow to use local copies of the scripts that are used to get logs from CASTOR and to build the tables, --local-download-script and --local-gaussstat, respectively. Those are used when either of the aforementioned script becomes buggy and a quick local fix is available.

 
Deleted:
<
<
You need to separate your list of ProdIDs according to the MCtype (MC11, MC2012).
 
Added:
>
>
Full help:
 
$ python DownloadAndBuildStat.py --help
Usage: DownloadAndBuildStat.py [options] <ProdIDs>
Line: 69 to 80
  -v, --verbose         talk to me baby
  --usage               show this help message and exit
  --use-local-logs      Use already downlaoded logs, for debuging purpose.
Deleted:
<
<
--no-use-MCStatTools Hack to allow prerelease debugging. +verbatim+
 

Publish the tables.

Revision 3 - 2013-03-13 - MarcOlivierBettler

Line: 1 to 1
 
META TOPICPARENT name="LHCbSimulation"

Building the statistics tables for Monte-Carlo simulations.

Line: 21 to 21
 

Produce the tables.

Added:
>
>
  • Before MCStatTools is released, you need to follow this procedure. Once MCStatTools will be released, this won't be necessary anymore.

cd cmtuser
getpack MCStatTools
SetupProject LHCbDirac
cd  MCStatTools/cmt
source setup.csh
lhcb-proxy-init

SetupProject LHCbDirac --use MCStatTools
lhcb-proxy-init
 This step consists into retrieving the logs of a set of ProdIDs and constructing statistics tables for them. The latter operation of done by a script called GaussStat.py. The full operation is steered by DownloadAndBuild.py, which runs GaussStat.py internally.

You need to separate your list of ProdIDs according to the MCtype (MC11, MC2012).

Deleted:
<
<
+verbatim+
 
$ python DownloadAndBuildStat.py --help
Usage: DownloadAndBuildStat.py [options] <ProdIDs>

Revision 2 - 2013-03-11 - MarcOlivierBettler

Line: 1 to 1
 
META TOPICPARENT name="LHCbSimulation"

Building the statistics tables for Monte-Carlo simulations.

Line: 21 to 21
 

Produce the tables.

Added:
>
>
This step consists into retrieving the logs of a set of ProdIDs and constructing statistics tables for them. The latter operation of done by a script called GaussStat.py. The full operation is steered by DownloadAndBuild.py, which runs GaussStat.py internally.

You need to separate your list of ProdIDs according to the MCtype (MC11, MC2012).

+verbatim+

$ python DownloadAndBuildStat.py --help
Usage: DownloadAndBuildStat.py [options] <ProdIDs>

       <ProdIDs> : list of ProdID(s), comma-separated, with no blank spaces.

Options:
  -h, --help            show this help message and exit
  -n <NB_LOGS>, --number-of-logs=<NB_LOGS>
                        number of logs to download [default : 300]
  --delta-size=<DELTA>  log files smaller by <DELTA> from largest file of
                        sample will be deleted [default : 0.07]
  --MCtype=MC_TYPE      MC nickname used to path to pick up the correct stat
                        tables to produce. Recognised values are ['MC10',
                        'MC11', 'MC2012', 'DEV', 'Upgrade'] [default : MC2012]
  --local-download-script
                        use local version of dirac-production-download-logs-
                        withoutcheck.py, if False, the script of MCStatTools
                        will be used [default : False]
  --local-gaussstat     use local version of GaussStat.py, if False, the
                        script of MCStatTools will be used [default : False]
  -c, --castor          get files from CASTOR directly
  -v, --verbose         talk to me baby
  --usage               show this help message and exit
  --use-local-logs      Use already downlaoded logs, for debuging purpose.
  --no-use-MCStatTools  Hack to allow prerelease debugging.
+verbatim+
 

Publish the tables.

Send a mail to Gloria, with a pointer to a folder containing the updated tables in a public-readable area.

Revision 1 - 2013-03-11 - MarcOlivierBettler

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="LHCbSimulation"

Building the statistics tables for Monte-Carlo simulations.

Foreword

The user of a Monte-Carlo sample is interested in a number of efficiency values to make sense of this sample. This information is available in the logfile - and the xml file - associated to each production job. The statistics tables gather this information in a well-formatted way, and are published so that the information is easily accessible to users. It is the task of the MC contact to generate the statistics tables for the production of his WG. There are several steps

Get the productions ID number for your request(s).

Use the Dirac webpage (alternate webpage) to retrieve the Dirac Production ID associated to your request. Follow 'Production' -> 'Request manager', and then use the left hand panel to filter the displayed request and pin down the request you are interested in. Click on the request, and select 'Production monitor'. This will bring you to another Dirac webpage, which you could use directly if you know the request ID number. Each request have several steps, each with a Dirac Production ID. You are interested only in the step of type 'MCSimulation'. Write down its ProdID, shown in the first column.

Do it for all the requests you want to process as the rest of the work-flow can be performed on several prodIDs in one go.

Produce the tables.

Publish the tables.

Send a mail to Gloria, with a pointer to a folder containing the updated tables in a public-readable area.

 
Copyright © 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.