Monitoring using JSON metadata files (JSON Collector)
Metadata file formats for the FileBasedEVF
BU
Simple data file (.jsn) and definition / legend file (.jsd)
Simple data file (.jsn)
- File type: JSON
- Output by: BU
- Data fields
- number of events
- total size
- others...
- Other fields
- definition: the location of the legend file for this format
- source: a string representing the source of this data
Example:
{
    "data" : [ "1022", "122122" ],
    "definition" : "/path/to/def.jsd",
    "source" : "bu-1"
}
Definition / Legend file (.jsd)
- File type: JSON
- Output by: ??
- Fields
- legend: array of name/operation values for each monitored field
- file: file path of this legend; used as a reference by the data JSON files
Example:
{
    "legend" : [
        {
            "name" : "Events",
            "operation" : "sum"
        },
        {
            "name" : "Total size",
            "operation" : "sum"
        }
    ],
    "file" : "/path/to/def.jsd"
}
NB: The number of elements in the data array of the DATA file must match the number of elements in the legend array of the LEGEND file!
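The rule above can be checked before any aggregation is attempted. Below is a minimal sketch, assuming both files have already been parsed into plain C++ containers. The `LegendEntry` struct and the `matchesLegend` function are hypothetical names used for illustration only; they are not part of JSONCollector.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory form of one entry of the "legend" array.
struct LegendEntry {
    std::string name;       // e.g. "Events"
    std::string operation;  // e.g. "sum"
};

// Returns true when the data array has exactly one value per legend entry.
bool matchesLegend(const std::vector<std::string>& data,
                   const std::vector<LegendEntry>& legend) {
    return data.size() == legend.size();
}
```

For the example files above, a data array of two values matches the two-entry legend, while a data array with only one value would be rejected.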
FU
Fast output file (.fast), Data+histogram file (.jsh) and definition/legend file (.jsd)
Fast output file (.fast)
- File type: CSV
- Output by: CMSSW process
- Data fields
- the first line is the path to the definition file, which has the same format as above; microstates use a "histogram" operation instead of a regular one (sum, avg, etc.)
- each following line holds the comma-separated values of the different counters; new lines are appended to the end of the file
- number of events processed
- number of events accepted by any stream
- ministate
- microstate
- more...
Example:
/path/to/def.jsd
100,50,12,15
200,70,12,800
...
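To make the layout concrete, here is a minimal sketch of a reader for this format. It assumes that every counter is an integer and that the file contains no elided lines; `FastFile` and `parseFast` are hypothetical names, not part of JSONCollector.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the contents of one .fast file.
struct FastFile {
    std::string definitionPath;             // first line of the file
    std::vector<std::vector<long> > rows;   // one vector of counters per following line
};

// Reads the definition path from the first line, then one comma-separated
// row of integer counters per remaining line. Empty lines are skipped.
FastFile parseFast(std::istream& in) {
    FastFile result;
    std::getline(in, result.definitionPath);
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        std::vector<long> row;
        std::istringstream cells(line);
        std::string cell;
        while (std::getline(cells, cell, ',')) {
            row.push_back(std::stol(cell));  // throws std::invalid_argument on non-numeric input
        }
        result.rows.push_back(row);
    }
    return result;
}
```

The function takes any `std::istream`, so it can be fed from a `std::ifstream` opened on the .fast file or from a `std::istringstream` when testing.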
Data+histogram file (.jsh)
- File type: JSON
- Output by: Aggregating .fast files
- Data fields (same as .fast files, but with histogram vectors for the states)
- number of events processed
- number of events accepted by any stream
- ministate histogram array
- microstate histogram array
- more...
Example:
{
    "data" : [ "100", "50" ],
    "ministates" : "[0,0,0,0,4,0...]",
    "microstates" : "[0,0,0,0,4,0...]",
    "source" : "fu-pid"
}
Definition/legend file (.jsd)
- Same as the BU-type legend file, but with additional arrays of ministate and microstate names
{
    "legend" : [
        {
            "name" : "Events processed",
            "operation" : "sum"
        },
        {
            "name" : "Events accepted by any stream",
            "operation" : "sum"
        }
    ],
    "ministates" : ["Mname1", "Mname2", ...],
    "microstates" : [ "mname1", "mname2", ...],
    "file" : "/path/to/def.jsd"
}
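The name arrays in this legend can be used to label a histogram from a .jsh file. The sketch below assumes that bin i of the histogram corresponds to the i-th name in the array, which this page does not state explicitly. `labelHistogram` is a hypothetical helper, not part of JSONCollector.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Pairs each non-empty histogram bin with the state name at the same index.
std::vector<std::pair<std::string, long> > labelHistogram(
        const std::vector<std::string>& stateNames,
        const std::vector<long>& counts) {
    std::vector<std::pair<std::string, long> > labelled;
    for (std::size_t i = 0; i < counts.size(); ++i) {
        if (counts[i] == 0) continue;
        // Fall back to the bin number when the definition has no name for it.
        std::string name = i < stateNames.size() ? stateNames[i] : std::to_string(i);
        labelled.push_back(std::make_pair(name, counts[i]));
    }
    return labelled;
}
```

Empty bins are dropped so that the result stays short for display, and bins without a name are reported by number.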
Handling these formats
- Simple data files (BU-type) are aggregated using the operations specified in the definition file
- Fast output files are aggregated by reading the definition (first line) and then:
- if the field has a regular operation, taking just the last value
- if the field is a microstate, taking the corresponding value from each row
- placing these values in a Data+histogram type JSON file
- Data+histogram files are aggregated using the operations specified in the definition file
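The fast-file rules above can be sketched as follows. This is an illustration under stated assumptions, not the JSONCollector implementation: the rows are assumed to be already parsed into integers, the caller says which columns are state columns (instead of reading operation names from the definition), and each state value is counted into the histogram bin with the same number. `FastSummary` and `summarizeFast` are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical result of reducing the rows of one .fast file.
struct FastSummary {
    std::map<std::size_t, long> lastValues;                // column index -> value in the last row
    std::map<std::size_t, std::vector<long> > histograms;  // column index -> count per state number
};

// Regular columns keep only the value from the last row; state columns
// are turned into a histogram built from the value in every row.
// Rows shorter than isStateColumn make at() throw std::out_of_range.
FastSummary summarizeFast(const std::vector<std::vector<long> >& rows,
                          const std::vector<bool>& isStateColumn) {
    FastSummary out;
    for (std::size_t col = 0; col < isStateColumn.size(); ++col) {
        if (isStateColumn[col]) {
            std::vector<long>& bins = out.histograms[col];
            for (std::size_t r = 0; r < rows.size(); ++r) {
                const long state = rows[r].at(col);
                if (state < 0) continue;  // ignore invalid state numbers
                const std::size_t bin = static_cast<std::size_t>(state);
                if (bin >= bins.size()) bins.resize(bin + 1, 0);
                ++bins[bin];
            }
        } else if (!rows.empty()) {
            out.lastValues[col] = rows.back().at(col);
        }
    }
    return out;
}
```

With the two rows of the .fast example above and the last two columns marked as state columns, the regular columns reduce to 200 and 70, and state 12 is counted twice in the third column's histogram.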
Writing JSON metadata files
The following C++ types are currently monitorable:
- IntJ: wraps int
- DoubleJ: wraps double
- StringJ: wraps std::string
Below is an example showing how to use the JSONCollector output API to generate monitoring files in the formats above. This approach is useful when we want to reconfigure the monitoring output by changing only the definition (an external file), without touching the code.
Writing simple data files (BU-type)
#include "JSONCollector/interface/JsonMonitorable.h"
#include "JSONCollector/interface/DataPointMonitor.h"
#include "JSONCollector/interface/JSONSerializer.h"
#include <iostream>
#include <vector>
#include <string>
using namespace std;
using namespace jsoncollector;
class Monitored {
public:
    // some variables to monitor
    // types defined in JsonMonitorable.h
    IntJ nEvents;
    DoubleJ totalSize;
    StringJ someString;
};

int main() {
    Monitored mon;
    // set names of the variables to be matched with the JSON definition
    mon.nEvents.setName("Events");
    mon.totalSize.setName("Sizes");
    mon.someString.setName("SomeString");
    // create a vector of all monitorable parameters to be passed to the monitor
    vector<JsonMonitorable*> monParams;
    monParams.push_back(&mon.nEvents);
    monParams.push_back(&mon.totalSize);
    monParams.push_back(&mon.someString);
    // create a DataPointMonitor using the vector of monitorable parameters and a path to a JSON definition file
    DataPointMonitor monitor(monParams, "/path/to/simple_def.jsd");
    // give some values to the monitored parameters
    mon.nEvents = 1023;
    mon.totalSize = 512223;
    // create a DataPoint object and take a snapshot of the monitored data into it
    DataPoint dp;
    monitor.snap(dp);
    // serialize the DataPoint and output it
    string output;
    JSONSerializer::serialize(&dp, output);
    cout << output << endl;
    // write this string to a file
    // ...
    return 0;
}
For the above code we need a definition file at the specified path. The output format is given by the definition at /path/to/simple_def.jsd:
{
    "legend" : [
        {
            "name" : "Sizes",
            "operation" : "sum"
        },
        {
            "name" : "Events",
            "operation" : "sum"
        }
    ],
    "file" : "/path/to/simple_def.jsd"
}
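The example registers its variables in one order while the definition lists "Sizes" before "Events", and "SomeString" does not appear in the definition at all. This page does not document how DataPointMonitor resolves this. One plausible reading, sketched below purely as an assumption, is that values are looked up by name and emitted in legend order, so variables not named in the legend are left out. `orderByLegend` is a hypothetical helper, not part of JSONCollector.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Builds a "data" array in legend order from name -> value pairs.
// Values whose names are not in the legend are not emitted.
std::vector<std::string> orderByLegend(
        const std::vector<std::string>& legendNames,
        const std::map<std::string, std::string>& valuesByName) {
    std::vector<std::string> data;
    for (std::size_t i = 0; i < legendNames.size(); ++i) {
        std::map<std::string, std::string>::const_iterator it =
            valuesByName.find(legendNames[i]);
        if (it == valuesByName.end()) {
            throw std::runtime_error("no monitored variable named " + legendNames[i]);
        }
        data.push_back(it->second);
    }
    return data;
}
```

Under this assumption, the values set in the example would be emitted with the size first and the event count second, following the definition rather than the order of the `push_back` calls.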
Writing fast files (FU-type)
These files are not JSON, but CSV. They will be converted to JSON by the aggregation process.
#include "JSONCollector/interface/JsonMonitorable.h"
#include "JSONCollector/interface/FastMonitor.h"
#include "JSONCollector/interface/JSONSerializer.h"
#include <vector>
using namespace std;
using namespace jsoncollector;
class Monitored {
public:
    IntJ processedEvents;
    IntJ acceptedEvents;
    IntJ microstate;
    IntJ macrostate;
};

int main() {
    Monitored mon;
    // set names of the variables to be matched with the JSON definition
    mon.processedEvents.setName("Processed Events");
    mon.acceptedEvents.setName("Accepted Events");
    mon.microstate.setName("Microstate");
    mon.macrostate.setName("Macrostate");
    // create a vector of all monitorable parameters to be passed to the monitor
    vector<JsonMonitorable*> monParams;
    monParams.push_back(&mon.processedEvents);
    monParams.push_back(&mon.acceptedEvents);
    monParams.push_back(&mon.macrostate);
    monParams.push_back(&mon.microstate);
    // create a FastMonitor using the vector of monitorable parameters,
    // a path to a JSON definition file and the output file path
    FastMonitor monitor(monParams, "/path/to/histo_def.jsd", "/path/to/output.fast");
    // change the monitored parameters
    mon.processedEvents = 100;
    mon.acceptedEvents = 76;
    mon.microstate = 3;
    mon.macrostate = 1;
    monitor.snap();
    // change the monitored parameters again
    mon.processedEvents = 200;
    mon.acceptedEvents = 150;
    mon.microstate = 9;
    mon.macrostate = 1;
    monitor.snap();
    // do something else ...
    monitor.snap();
    return 0;
}
For the above code we need a definition file at the specified path. The output format is given by the definition at /path/to/histo_def.jsd:
{
    "legend" : [
        {
            "name" : "Processed Events",
            "operation" : "sum"
        },
        {
            "name" : "Accepted Events",
            "operation" : "sum"
        },
        {
            "name" : "Microstate",
            "operation" : "mHisto"
        },
        {
            "name" : "Macrostate",
            "operation" : "MHisto"
        }
    ],
    "file" : "/path/to/histo_def.jsd"
}
Aggregating JSON metadata files
API
Below is an example of using the API to aggregate JSON metadata files read from a directory.
#include "JSONCollector/interface/JSONFileCollector.h"
#include <vector>
#include <string>
using std::vector;
using std::string;
int main() {
    string inputFolder = "/path/to/jsnfiles/";
    string outputFile = "/output/path/out.jsn";
    vector<string> inputJSONFilePaths;
    string outcomeString;
    // get the list of .jsn files whose names match the regular expression "mon.*"
    JSONFileCollector::getJSONFileList(inputFolder, inputJSONFilePaths, outcomeString, "mon.*");
    // aggregate the JSON files and write the output file
    // the last argument (formatForDisplay) is false to keep the input format for the output;
    // set it to true for a more readable output (the output file will then no longer respect the input format)
    int outcome = JSONFileCollector::collectAndOutput(inputJSONFilePaths, outputFile, false);
    return outcome;
}
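The file selection step in the example can be pictured with the standard library alone. The sketch below assumes that the whole file name must match the regular expression and that only names ending in .jsn are kept. Whether getJSONFileList uses exactly these semantics is not documented on this page. `selectJsnFiles` is a hypothetical helper, not part of JSONCollector.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Keeps the names that end in ".jsn" and fully match the given pattern.
std::vector<std::string> selectJsnFiles(const std::vector<std::string>& fileNames,
                                        const std::string& pattern) {
    const std::regex nameFilter(pattern);
    const std::string extension = ".jsn";
    std::vector<std::string> selected;
    for (std::size_t i = 0; i < fileNames.size(); ++i) {
        const std::string& name = fileNames[i];
        const bool hasExtension =
            name.size() >= extension.size() &&
            name.compare(name.size() - extension.size(), extension.size(), extension) == 0;
        if (hasExtension && std::regex_match(name, nameFilter)) {
            selected.push_back(name);
        }
    }
    return selected;
}
```

With the pattern "mon.*", a name such as mon_1.jsn is kept, while other.jsn (pattern mismatch) and mon_3.txt (wrong extension) are dropped.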
Command line tool
The aggregator is also available as a command line tool that loads JSON data files from a directory (optionally filtering file names with a regex) and outputs the aggregated result according to the legend. All input files must share the same legend and be consistent with it.
Usage:
./JSONCollector [-d] [-r <regex>] -o <outfile> -i <indir1> <indir2>...<infileN>
where:
- [-d] if set, formats the output for display by merging the Data and Legend files into one. The resulting file can no longer be re-aggregated.
- [-r <regex>] regular expression that the names of the input JSON files must satisfy
- -o <outfile> the single output file of the operation
- -i <indir1> <indir2>...<infileN> list of inputs for aggregation; each may be an individual file or a directory containing files
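Aggregating "according to the legend" can be sketched as applying each legend operation to the matching column of all input data arrays. The sketch assumes numeric values that have already been parsed, and operation strings "sum" and "avg". Only "sum" appears in the example definition files on this page, and "avg" is mentioned only in prose, so its exact spelling in a .jsd file is an assumption. `applyOperation` and `aggregate` are hypothetical names, not part of JSONCollector.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Combines one column of values according to a legend operation.
double applyOperation(const std::string& operation, const std::vector<double>& values) {
    double total = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i) total += values[i];
    if (operation == "sum") return total;
    if (operation == "avg") {
        if (values.empty()) throw std::invalid_argument("avg of no values");
        return total / static_cast<double>(values.size());
    }
    throw std::invalid_argument("unsupported operation: " + operation);
}

// Aggregates several data arrays (one per input file) column by column.
// Every data array must have one value per legend operation.
std::vector<double> aggregate(const std::vector<std::string>& operations,
                              const std::vector<std::vector<double> >& dataArrays) {
    std::vector<double> result;
    for (std::size_t col = 0; col < operations.size(); ++col) {
        std::vector<double> column;
        for (std::size_t f = 0; f < dataArrays.size(); ++f) {
            if (dataArrays[f].size() != operations.size()) {
                throw std::invalid_argument("data array does not match the legend");
            }
            column.push_back(dataArrays[f][col]);
        }
        result.push_back(applyOperation(operations[col], column));
    }
    return result;
}
```

The size check inside `aggregate` reflects the requirement that input files be consistent with their shared legend; an inconsistent input is rejected rather than silently padded.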
CODE
Code is available here:
/afs/cern.ch/user/a/aspataru/public/JSONCollector
Open issues
- Meaning of microstate numbers: only required for visualization, so the mapping between state number and name will be defined somewhere else
--
AndreiSpataru - 28-Nov-2012