How to work with files for Good Luminosity Sections in JSON format
Introduction
This TWiki describes the files that specify which luminosity sections in which runs are considered good and should be processed. In CMS, these files are in JSON format (JSON stands for JavaScript Object Notation). To find the most current good luminosity section files in JSON format, please visit
NOTE: Legend of colors for this twiki:
GRAY background for the commands to execute (cut&paste)
GREEN background for the output sample of the executed commands
BLUE background for the configuration files (cut&paste)
PINK background for the lines of C++ code (cut&paste)
How to understand the text of Good Luminosity files in JSON format
A typical good lumisection file looks like:
{"132440": [[157, 378]],
"132596": [[382, 382], [447, 447]],
"132598": [[174, 176]],
"132599": [[1, 379], [381, 437]]
}
So the general format is
{"Run Number":[Lumi range, Lumi range, Lumi range, ...],
"Run Number":[Lumi range, Lumi range, Lumi range, ...],
...}
where "Lumi range" can be a single lumi section like [382,382] or a range like [157,378].
In this good lumi section file:
- 132440 is the Run Number and [[157, 378]] means good lumisections from 157 to 378 (both inclusive) in Run 132440.
- "132596": [[382, 382], [447, 447]] - Means good lumisections 382 and 447 in Run 132596. An entry is always a range, so the single lumisections 382 and 447 are written as [382,382] and [447,447].
- "132599": [[1, 379], [381, 437]] - Means good lumisections 1 to 379 and 381 to 437 in Run 132599.
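To illustrate how the format is read, here is a minimal Python sketch that parses the example text above with the standard json module and checks whether a given run and lumisection is listed as good. The helper name is_good_lumi is hypothetical and is not part of any CMSSW tool.

```python
import json

# The example good lumisection file shown above, as a string.
GOOD_LUMIS_TEXT = """
{"132440": [[157, 378]],
 "132596": [[382, 382], [447, 447]],
 "132598": [[174, 176]],
 "132599": [[1, 379], [381, 437]]}
"""


def is_good_lumi(good_lumis, run, lumi):
    """Return True if (run, lumi) lies inside one of the listed ranges.

    Hypothetical helper for illustration only. Run numbers are JSON
    keys, so they are strings in the parsed dictionary.
    """
    ranges = good_lumis.get(str(run), [])
    # Both ends of each range are inclusive.
    return any(first <= lumi <= last for first, last in ranges)


good_lumis = json.loads(GOOD_LUMIS_TEXT)
print(is_good_lumi(good_lumis, 132440, 157))  # first lumi of a range
print(is_good_lumi(good_lumis, 132599, 380))  # gap between the two ranges
print(is_good_lumi(good_lumis, 132601, 1))    # run that is not listed
```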
How to compare Good Luminosity files in JSON format
The python script compareJSON.py will run different comparisons between two different files.
With this script you can check
- The union of two files (--or)
- The intersection of two files (--and)
- The subtraction of two files (--sub) - all of the lumi sections in the first file that are not in the second file.
- The difference of two files (--diff) - all lumi sections that appear in one of the files and not the other (i.e., compareJSON.py --diff alpha.json beta.json is equivalent to compareJSON.py --sub alpha.json beta.json and compareJSON.py --sub beta.json alpha.json)
The Good Luminosity files in JSON format used as examples in this twiki are here: alpha.json beta.json
For all options except --diff, you can specify a third file which will contain the output. For example, to see the intersection between alpha.json and beta.json, execute the following command:
compareJSON.py --and alpha.json beta.json
and you see the output as follows
{"132596": [[382, 382], [447, 447]],
"132598": [[174, 176]],
"132599": [[1, 379], [381, 437]]}
To have the results saved in output.json:
compareJSON.py --and alpha.json beta.json output.json
And here are the other options and their corresponding output in use:
compareJSON.py --or alpha.json beta.json
{"132440": [[157, 378]],
"132596": [[382, 382], [447, 447]],
"132598": [[174, 176]],
"132599": [[1, 379], [381, 437]],
"132601": [[1, 207]]}
compareJSON.py --sub alpha.json beta.json
{"132440": [[157, 378]]}
compareJSON.py --sub beta.json alpha.json
{"132601": [[1, 207]]}
compareJSON.py --diff alpha.json beta.json
'alpha.json'-only lumis:
{"132440": [[157, 378]]}
'beta.json'-only lumis:
{"132601": [[1, 207]]}
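The four comparisons can be thought of as ordinary set operations on (run, lumi) pairs. The Python sketch below is not the compareJSON.py implementation; it only mirrors the idea. The helper names expand and compact are hypothetical, and the contents of beta are an assumption reconstructed from the outputs shown above.

```python
def expand(compact_list):
    """Turn {"run": [[first, last], ...]} into a set of (run, lumi) pairs."""
    return {(int(run), lumi)
            for run, ranges in compact_list.items()
            for first, last in ranges
            for lumi in range(first, last + 1)}


def compact(pairs):
    """Turn a set of (run, lumi) pairs back into the compact range format."""
    result = {}
    for run, lumi in sorted(pairs):
        ranges = result.setdefault(str(run), [])
        if ranges and ranges[-1][1] == lumi - 1:
            ranges[-1][1] = lumi          # extend the current range
        else:
            ranges.append([lumi, lumi])   # start a new range
    return result


alpha = {"132440": [[157, 378]],
         "132596": [[382, 382], [447, 447]],
         "132598": [[174, 176]],
         "132599": [[1, 379], [381, 437]]}
# Assumed contents of beta.json, inferred from the outputs above.
beta = {"132596": [[382, 382], [447, 447]],
        "132598": [[174, 176]],
        "132599": [[1, 379], [381, 437]],
        "132601": [[1, 207]]}

alpha_set, beta_set = expand(alpha), expand(beta)
print(compact(alpha_set & beta_set))  # like --and
print(compact(alpha_set | beta_set))  # like --or
print(compact(alpha_set - beta_set))  # like --sub alpha.json beta.json
print(compact(beta_set - alpha_set))  # like --sub beta.json alpha.json
```

Expanding every range into individual pairs is simple but memory-hungry for long runs; it is used here only to keep the correspondence with set operations obvious.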
Other Utilities for JSON formatted files
All of these scripts (compareJSON.py and all following scripts) have a --help option for further details.
Python Library
The compareJSON.py script and all the scripts below are based on the Python library in FWCore/PythonUtilities. If you are programming in Python (including in the CMSSW configuration language), you may find it easier to use the Python library directly.
To use the code, first import it:
from FWCore.PythonUtilities.LumiList import LumiList
Now you can construct a LumiList object in several different ways:
ll1 = LumiList(filename = 'myFile.json')
ll2 = LumiList(url = 'https://cern.ch/path/to/file_json.txt') # Not available in all versions
ll3 = LumiList(lumis = [[1001,1], [1001, 2], [1003, 1], [1003, 3]]) # Pairs of run number, lumi number
ll4 = LumiList(runsAndLumis = {'1001' : [1, 2], '1003' : [1, 3]}) # Dictionaries where the key is the run number and the value is a list of lumis
ll5 = LumiList(runsAndLumis = [{'1001' : [1, 2], '1003' : [1, 3]}]) # A list of objects like above. This is a fast way to construct a LumiList from outputs from lots of files, etc.
ll6 = LumiList(compactList = {'1001' : [[1,2]], '1003' : [[1,1], [3,3]]}) # The same format as the regular good lumi file
ll7 = LumiList(runs = [1001, 1003]) # This corresponds to every lumi in the listed runs
Once you have a LumiList object (or two), you can easily do lots of things with them:
nl1 = ll1 - ll2 # Give me all the lumis in ll1 not in ll2
nl1 = ll1 + ll2 # Give me all the lumis in ll1 or in ll2
nl1 = ll1 | ll2 # Same as ll1 + ll2, just different notation
nl1 = ll1 & ll2 # Give me all the lumis that are in both ll1 and ll2
len(ll1) # How many runs are in the LumiList?
ll1.removeRuns([1001,2002]) # Remove runs and all their lumis from a LumiList (not available in all versions)
ll1.selectRuns([1001,2002]) # Select only these runs if they exist in LumiList (not available in all versions)
ll5.getDuplicates() # Get a list of all the duplicates found during construction (not available in all versions)
You can also get various representations of the data in a LumiList:
print(ll1) # Give a nice representation of the LumiList (in Python 2: print ll1)
ll1.getCompactList() # In the same format as the regular good lumi file
ll1.getLumis() # Pairs of run number, lumi number
ll1.getCMSSWString() # CMSSW representation: '1001:1-2,1003:1,1003:3'
ll1.getVLuminosityBlockRange() # a VLuminosityBlockRange suitable for configuring CMSSW
nl1.writeJSON(fileName='myNewFile.json') # Write out the results of your modifications to a new .json file
printJSON.py
printJSON.py in FWCore/PythonUtilities prints out these files in a much more human-readable fashion:
printJSON.py alpha.json
{"132440": [[157, 378]],
"132596": [[382, 382], [447, 447]],
"132598": [[174, 176]],
"132599": [[1, 379], [381, 437]]}
instead of a single line as they usually are:
cat alpha.json
{"132440": [[157, 378]], "132596": [[382, 382], [447, 447]], "132598": [[174, 176]], "132599": [[1, 379], [381, 437]]}
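For readers without a CMSSW environment at hand, the one-run-per-line layout can be imitated with the standard json module. This is only a sketch of the output layout shown above, not the printJSON.py code, and the function name format_one_run_per_line is hypothetical.

```python
import json

# alpha.json as it is usually stored: everything on a single line.
SINGLE_LINE = ('{"132440": [[157, 378]], "132596": [[382, 382], [447, 447]], '
               '"132598": [[174, 176]], "132599": [[1, 379], [381, 437]]}')


def format_one_run_per_line(compact_list):
    """Return the JSON text with one run per line, runs in numeric order."""
    runs = sorted(compact_list, key=int)
    lines = ['"%s": %s' % (run, json.dumps(compact_list[run])) for run in runs]
    return "{" + ",\n ".join(lines) + "}"


pretty = format_one_run_per_line(json.loads(SINGLE_LINE))
print(pretty)
```

The result is still valid JSON, so it can be read back by any of the tools described here.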
fjr2json.py
fjr2json.py in FWCore/PythonUtilities will read cmsRun framework job reports (FJR) and print the list of lumis that have been processed in JSON format.
fjr2json.py somedir/*.xml
will run over all fjr in somedir and print out the JSON format to the screen.
fjr2json.py --output=ran.json somedir/*.xml
will save the results to ran.json.
Note that if you have used CRAB, you can just use the crab -report option to retrieve the same file.
edmLumisInFiles.py
edmLumisInFiles.py in DataFormats/FWLite (tag V01-11-00 or greater) takes a list of EDM files as input and prints out the list of lumis they contain in JSON format.
edmLumisInFiles.py data_Run14*
{"140362": [[29, 31], [60, 61]],
"141961": [[62, 64], [85, 85], [87, 87]]}
A working example is:
edmLumisInFiles.py /afs/cern.ch/cms/Tutorials/TWIKI_DATA/CMSDataAnaSch/CMSDataAnaSch_Data_387.root
which gives the following output:
{"149011": [[575, 576], [699, 699]]}
--intLumi will print the total integrated luminosity (recorded and delivered) to the screen, together with a note pointing out that lumiCalc.py is the official method to calculate integrated luminosities (see the LumiCalc TWiki).
As with fjr2json.py, you can also use the --output option to save the results in a file.
filterJSON.py
filterJSON.py in FWCore/PythonUtilities (tag V01-04-00 or later) will read in a JSON formatted file and keep only the runs that meet the requested minimum or maximum run number.
filterJSON.py --min 140380 old.json
will print to the screen all runs greater than or equal to 140380.
filterJSON.py --min 140380 --max 141220 old.json --output new.json
will save to new.json all runs greater than or equal to 140380 and less than or equal to 141220.
You can also specify individual runs to be removed.
filterJSON.py --min 140380 --max 141220 --runs 140381,140385 --runs 140388 old.json --output new.json
will save to new.json all runs greater than or equal to 140380 and less than or equal to 141220, with runs 140381, 140385, and 140388 explicitly removed. You can either give many runs in a comma-separated list (e.g., 140381,140385) or use multiple --runs options.
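The selection logic described above can be sketched in a few lines of Python. This is an illustration of the behaviour, not the filterJSON.py source; the function name filter_runs and the sample contents of old_lumis are hypothetical.

```python
def filter_runs(compact_list, min_run=None, max_run=None, removed_runs=()):
    """Keep runs inside [min_run, max_run] that are not explicitly removed."""
    removed = {int(run) for run in removed_runs}
    kept = {}
    for run, ranges in compact_list.items():
        number = int(run)  # run numbers are strings in the JSON file
        if min_run is not None and number < min_run:
            continue
        if max_run is not None and number > max_run:
            continue
        if number in removed:
            continue
        kept[run] = ranges
    return kept


# Hypothetical stand-in for the contents of old.json.
old_lumis = {"140379": [[1, 10]],
             "140381": [[5, 20]],
             "140385": [[1, 3]],
             "140400": [[1, 50]],
             "141221": [[1, 5]]}

# Mirrors: --min 140380 --max 141220 --runs 140381,140385 --runs 140388
new_lumis = filter_runs(old_lumis, min_run=140380, max_run=141220,
                        removed_runs=[140381, 140385, 140388])
print(new_lumis)
```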
csv2json.py
csv2json.py in FWCore/PythonUtilities (tag V01-05-00 or later) will extract the run and luminosity section from a CSV file and print the output in JSON format. Use the --output option to save the output to a file (instead of printing to the screen).
csv2json.py input.csv --output output.json
By default, the script assumes that the 0th column is the run number and the 1st column is the lumi section. You can control this with the --runIndex and --lumiIndex options.
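As a rough picture of the conversion, the sketch below reads run and lumi columns with the standard csv module and groups consecutive lumis into ranges. It is not the csv2json.py source: the function name csv_to_compact, the sample CSV text, and the choice to skip rows whose run or lumi column is not an integer (such as a header line) are all assumptions of this example.

```python
import csv
import io


def csv_to_compact(lines, run_index=0, lumi_index=1):
    """Build {"run": [[first, last], ...]} from CSV rows."""
    pairs = set()
    for row in csv.reader(lines):
        try:
            pairs.add((int(row[run_index]), int(row[lumi_index])))
        except (ValueError, IndexError):
            continue  # assumption: silently skip headers and short rows
    result = {}
    for run, lumi in sorted(pairs):
        ranges = result.setdefault(str(run), [])
        if ranges and ranges[-1][1] == lumi - 1:
            ranges[-1][1] = lumi          # consecutive lumi: extend range
        else:
            ranges.append([lumi, lumi])   # gap or new run: start a range
    return result


# Hypothetical CSV content; the third column is ignored.
CSV_TEXT = """run,lumi,value
140362,29,1.0
140362,30,1.1
140362,31,0.9
140362,60,1.2
141961,85,0.5
"""

converted = csv_to_compact(io.StringIO(CSV_TEXT))
print(converted)
```

The run_index and lumi_index arguments play the role of the --runIndex and --lumiIndex options.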
mergeJSON.py
mergeJSON.py in FWCore/PythonUtilities (tag V01-07-00 or later) will merge different JSON files together.
mergeJSON.py first.json second.json --output=total.json
will take the runs in first.json as well as the runs in second.json and save the result to total.json.
mergeJSON.py first.json:132000-140999 second.json:141000- --output=total.json
will take the runs in first.json between 132000 and 140999 as well as the runs in second.json greater than or equal to 141000.
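The file:min-max notation can be read as "file name, then an optional run window". The Python sketch below shows one way to interpret it. It is not the mergeJSON.py source: the names parse_spec and select_runs and the in-memory file contents are hypothetical, and the sketch assumes the selected run windows do not overlap, so runs are simply copied rather than having their lumi ranges combined.

```python
def parse_spec(spec):
    """Split 'name.json:min-max' into (name, min_run, max_run).

    A missing bound (or a missing ':' part) is returned as None.
    """
    name, _, run_range = spec.partition(":")
    if not run_range:
        return name, None, None
    low, _, high = run_range.partition("-")
    return name, (int(low) if low else None), (int(high) if high else None)


def select_runs(compact_list, min_run, max_run):
    """Keep only runs inside the inclusive window [min_run, max_run]."""
    return {run: ranges for run, ranges in compact_list.items()
            if (min_run is None or int(run) >= min_run)
            and (max_run is None or int(run) <= max_run)}


# Hypothetical file contents, kept in memory to stay self-contained.
files = {
    "first.json": {"132440": [[157, 378]], "141500": [[1, 10]]},
    "second.json": {"132440": [[1, 5]], "141961": [[62, 64]]},
}

merged = {}
for spec in ["first.json:132000-140999", "second.json:141000-"]:
    name, min_run, max_run = parse_spec(spec)
    merged.update(select_runs(files[name], min_run, max_run))
print(merged)
```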
filterCSVwithJSON.py
filterCSVwithJSON.py in FWCore/PythonUtilities (tag V01-07-00 or later) will filter CSV files (e.g., those created by lumiCalc.py using option lumibyls or lumibylsXing), keeping only lumi sections that are in the JSON file.
filterCSVwithJSON.py short.json long.csv short.csv
will take the lumi sections listed in short.json from long.csv and create short.csv.
How to use Good Luminosity Section files in
CRAB
For this, please see "Running over selected luminosity" in the CRAB documentation.
Warning: When running CRAB do not follow the instructions for cmsRun. In other words do not mix Framework methods with the CRAB settings.
cmsRun
This section tells you how to use a file of good lumi sections to configure CMSSW. Usually the files in JSON format of luminosity sections are used as inputs to CRAB. But if you want to run interactively on the same lumi sections, you can use the short code snippet below. With CMSSW 3.8 and higher it works out of the box. For earlier releases, one can check out FWCore/PythonUtilities tag V01-00-02 and begin using it. You might also have to check out PhysicsTools/PythonAnalysis (needed for the LumiList module):
import FWCore.ParameterSet.Config as cms
import PhysicsTools.PythonAnalysis.LumiList as LumiList
myLumis = LumiList.LumiList(filename = 'goodList.json').getCMSSWString().split(',')
process.source.lumisToProcess = cms.untracked.VLuminosityBlockRange()
process.source.lumisToProcess.extend(myLumis)
A more compact syntax is available starting with CMSSW 5.0.0. For CMSSW 4.x releases it can be used after checking out PhysicsTools/PythonAnalysis V00-05-03 and building it.
import PhysicsTools.PythonAnalysis.LumiList as LumiList
process.source.lumisToProcess = LumiList.LumiList(filename = 'goodList.json').getVLuminosityBlockRange()
For Run 2 (CMSSW_7_4_X), the tool should be imported from a new location:
import FWCore.PythonUtilities.LumiList as LumiList
process.source.lumisToProcess = LumiList.LumiList(filename = 'goodList.json').getVLuminosityBlockRange()
FWLite
The code below works with CMSSW 3.8 and higher. For earlier releases, one can check out FWCore/PythonUtilities tag V01-00-02 and begin using it.
You can find another complete example in CMSSW of how to access good run/lumi lists in FWLite here.
To use a good luminosity file in FWLite, you first need a configuration file that loads whichever file in JSON format you want.
cat loadJson.py
The loadJson.py file is HERE and is also shown below:
import FWCore.PythonUtilities.LumiList as LumiList
import FWCore.ParameterSet.Types as CfgTypes
import FWCore.ParameterSet.Config as cms
# setup process
process = cms.Process("FWLitePlots")
process.inputs = cms.PSet (
lumisToProcess = CfgTypes.untracked(CfgTypes.VLuminosityBlockRange())
)
# get JSON file correctly parsed
JSONfile = 'Cert_132440-139790_7TeV_StreamExpress_Collisions10_JSON.txt'
myList = LumiList.LumiList (filename = JSONfile).getCMSSWString().split(',')
process.inputs.lumisToProcess.extend(myList)
In FWLite, you want to load in that configuration file. If a good luminosity file in JSON format is present, load it.
PythonProcessDesc builder (argv[1], argc, argv); // or "myConfigFile.py"
edm::ParameterSet const& inputs =
builder.processDesc()->getProcessPSet()->
getParameter<edm::ParameterSet>("inputs");
std::vector<edm::LuminosityBlockRange> jsonVector;
if ( inputs.exists("lumisToProcess") )
{
std::vector<edm::LuminosityBlockRange> const & lumisTemp =
inputs.getUntrackedParameter<std::vector<edm::LuminosityBlockRange> > ("lumisToProcess");
jsonVector.resize( lumisTemp.size() );
copy( lumisTemp.begin(), lumisTemp.end(), jsonVector.begin() );
}
Finally, you want to be able to check if this given event is part of the good luminosity file or not. If no good luminosity file is loaded, this function will always return true.
bool jsonContainsEvent (const std::vector< edm::LuminosityBlockRange > &jsonVec,
const edm::EventBase &event)
{
// if the jsonVec is empty, then no JSON file was provided so all
// events should pass
if (jsonVec.empty())
{
return true;
}
bool (* funcPtr) (edm::LuminosityBlockRange const &,
edm::LuminosityBlockID const &) = &edm::contains;
edm::LuminosityBlockID lumiID (event.id().run(),
event.id().luminosityBlock());
std::vector< edm::LuminosityBlockRange >::const_iterator iter =
std::find_if (jsonVec.begin(), jsonVec.end(),
boost::bind(funcPtr, _1, lumiID) );
return jsonVec.end() != iter;
}
Where you would call this from inside your event loop:
for( event.toBegin(); ! event.atEnd(); ++event )
{
if ( ! jsonContainsEvent (jsonVector, event) )
{
// this event is not in a good lumi section
continue;
}
} // event loop
-- SudhirMalik - 30-Jul-2010