Things that I am or may be working on, as a part of DIANA-HEP.

ROOT 7 graphics/plotting user interface

Cannibalize ideas from the SVGFig and Plotsmanship projects: well-defined grammar, functional user interface, extremely fluid composition, write-once JavaScript with many language bindings, headless server. See the Plotsmanship design document for details.


Presented to Philippe, Axel, and Bertrand on 2016-01-28. They said there is some duplication with what already exists and gave me some JSRoot links so that I can investigate the overlap.

Still haven't compiled ROOT trunk: this bug hasn't been resolved.

ROOT 7 functional data processing chains


Replace ROOT's ntuple → histogram syntax with C++11 inline functions for better access to non-flat ntuples (containing structures and arrays). Make these expressive enough that physicists don't have to resort to imperative for-loops over ntuple contents, which are not automatically parallelizable.


Steal concept of functional data processing chains from Spark (which comes from R, Numpy, SQL, and ultimately Lisp). Use infix notation for



Philippe pointed me to some ROOT unit tests that demonstrate the kinds of operations that must be expressible. I developed a toy example in Python and showed what each of these operations would look like. I wrote it up in a pseudo-talk and sent it to him.
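
To make the idea concrete, here is a minimal sketch in plain Python of what such a chain might look like. Everything in it is an assumption for illustration: the `Chain` class, its method names, and the event layout are hypothetical and are not taken from ROOT, Spark, or the toy example mentioned above.

```python
from functools import reduce

# Hypothetical non-flat "ntuple": each event holds a variable-length list of muons.
events = [
    {"run": 1, "muons": [{"pt": 25.0, "eta": 0.3}, {"pt": 8.0, "eta": -1.9}]},
    {"run": 1, "muons": []},
    {"run": 2, "muons": [{"pt": 41.5, "eta": 2.7}]},
    {"run": 2, "muons": [{"pt": 33.0, "eta": -0.8}, {"pt": 52.0, "eta": 1.1}]},
]

class Chain:
    """Illustrative wrapper (not a ROOT or Spark class) for chaining operations."""

    def __init__(self, items):
        self.items = items

    def flat_map(self, f):
        # One input item may yield zero or more output items (events -> muons).
        return Chain(x for item in self.items for x in f(item))

    def filter(self, predicate):
        return Chain(x for x in self.items if predicate(x))

    def map(self, f):
        return Chain(f(x) for x in self.items)

    def histogram(self, nbins, low, high):
        # Terminal operation: fold the values into fixed-width bin counts.
        width = (high - low) / nbins

        def fill(counts, value):
            if low <= value < high:
                counts[int((value - low) / width)] += 1
            return counts

        return reduce(fill, self.items, [0] * nbins)

# No explicit for-loop over events: each step is a pure function of one item,
# so a real implementation would be free to run the steps in parallel.
pt_hist = (Chain(events)
           .flat_map(lambda event: event["muons"])
           .filter(lambda mu: abs(mu["eta"]) < 2.4)
           .map(lambda mu: mu["pt"])
           .histogram(nbins=3, low=0.0, high=60.0))
```

The point of the sketch is only the shape of the user code: the physicist supplies small inline functions and never writes the loop over ntuple contents.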

Access ROOT in Java


Almost all of the Big Data tools are Java projects and almost all of the HEP data is in ROOT files. There will need to be a bridge.

Possible scope:

  1. JAR that reads ROOT files
  2. JAR that reads and writes ROOT files
  3. JAR that provides full access to all ROOT functions (like PyROOT for Java)
  4. Apply one of the above to Spark (Scala)
  5. Connection from CMSSW to Java (for use as a Spark "smart" RDD)


Possible methods:

  1. Compile C/C++ code into Java with JNI (rejected: too buggy)
  2. Dynamically link shared object file (.so) with JNA (currently favored)
  3. Re-implement ROOT file reading for a pure Java solution; see JSRoot for inspiration (currently disfavored: too complicated)
  4. Use FreeHEP-ROOTIO, an existing reimplementation used at SLAC (could be good).

Decision between #2 and #4 could come down to performance tests.

Even if we use FreeHEP-ROOTIO for ROOT, the JNA technique could be useful for CMSSW or other C++ code.


  • Initial tests with JNA.
  • Working in scaroot on GitHub.
  • Presented introductory talk on 2016-02-08.

Accelerate data skims with NoSQL database

From informal conversations with Jin Chang.


Physicists typically pull data from a central source, like Mini-AODs in CMS, by running a filter over the whole dataset. With sufficient indexing in a database of event numbers, it should be possible to reduce disk access and processing in three steps: propose the cuts to the database, get back an event list that corresponds to a coarser version of those cuts, and pass that event list to the skim job so that the exact filter runs over a smaller number of events.

For this to be an improvement, four conditions must be met:

  • The features chosen for indexing in the database must be frequently used in user queries.
  • The database must not do a full filter itself (although even that might be quicker than loading whole physics events).
  • The user queries must apply reasonably tight cuts (they may have wide margins, but they must eliminate a large fraction of the total data).
  • Event-list processing must be faster than a straight read (it must seek through the file, a capability that ROOT provides).
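
The workflow can be sketched with an in-memory stand-in for both the dataset and the database. This is a toy under stated assumptions: the single indexed feature (`met`), the bin width, and the function names are all hypothetical, and a dictionary plays the role of the database.

```python
import random
from collections import defaultdict

random.seed(12345)

# Hypothetical stand-in for the full dataset: event number -> event record.
full_events = {n: {"met": random.uniform(0.0, 500.0)} for n in range(10000)}

BIN_WIDTH = 50.0  # coarse granularity of the index (an assumption)

# Build the index once: coarse MET bin -> list of event numbers.
index = defaultdict(list)
for number, event in full_events.items():
    index[int(event["met"] // BIN_WIDTH)].append(number)

def candidate_events(met_min):
    """Coarse version of the cut 'met > met_min': every bin that could
    contain a passing event, so no passing event is ever lost."""
    first_bin = int(met_min // BIN_WIDTH)
    return sorted(n for b, numbers in index.items() if b >= first_bin
                  for n in numbers)

def skim(met_min):
    """Apply the exact cut, but only to events named in the event list."""
    event_list = candidate_events(met_min)
    passing = [n for n in event_list if full_events[n]["met"] > met_min]
    return event_list, passing

event_list, passing = skim(met_min=430.0)

# Reference result: the exact cut applied to the whole dataset.
brute_force = [n for n, e in full_events.items() if e["met"] > 430.0]
```

Here the event list covers only the bins from 400 upward, roughly a fifth of the events, and the exact cut applied to that list selects the same events as the full scan. The saving in a real skim would depend on the last condition above, namely on how cheaply the job can seek to the listed events.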


Possible implementations:

  • Rely on a NoSQL database's built-in indexing capabilities. This works only if the particular database was designed for a need similar to ours. (My reading of MongoDB's indexing capabilities isn't promising. Our chosen features would be continuous and many-dimensional: a lexicographic sort wouldn't help!)
  • Do a multidimensional version of GeoHashing and use a simple key-value store. At what level of precision? Adaptive?
  • Build a decision tree over the index variables to make the bucket contents more uniformly distributed. User cuts would need to be passed through this decision tree to generate database queries.
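
One way to picture the multidimensional GeoHash option is a Z-order (Morton) key: quantize each index variable to a few bits and interleave the bits, so that a key prefix names a coarser cell. The sketch below assumes three hypothetical index variables with made-up ranges and a fixed 4 bits per variable; it does not address the adaptive-precision question.

```python
def quantize(value, low, high, bits):
    """Map a value in [low, high) to an integer cell index in [0, 2**bits)."""
    cells = 1 << bits
    cell = int((value - low) / (high - low) * cells)
    return min(max(cell, 0), cells - 1)  # clamp out-of-range values to edge cells

def interleave(cells, bits):
    """Interleave the bits of each cell index, most significant bit first."""
    key = 0
    for bit in range(bits - 1, -1, -1):
        for cell in cells:
            key = (key << 1) | ((cell >> bit) & 1)
    return key

# Hypothetical index variables and ranges (assumptions for illustration).
RANGES = [("met", 0.0, 512.0), ("ht", 0.0, 2048.0), ("njets", 0.0, 16.0)]
BITS = 4

def hash_key(event, bits=BITS):
    """Key under which the event number would be stored in a key-value store."""
    cells = [quantize(event[name], low, high, bits) for name, low, high in RANGES]
    return interleave(cells, bits)

def coarsen(key, bits, coarse_bits):
    """Drop the finest levels: the key of the enclosing, larger cell."""
    return key >> (len(RANGES) * (bits - coarse_bits))

a = {"met": 100.0, "ht": 600.0, "njets": 3}
b = {"met": 110.0, "ht": 650.0, "njets": 3}   # close to a
c = {"met": 400.0, "ht": 1500.0, "njets": 9}  # far from a

same_fine_cell = hash_key(a) == hash_key(b)
same_coarse_cell = coarsen(hash_key(a), BITS, 3) == coarsen(hash_key(b), BITS, 3)
far_apart = coarsen(hash_key(a), BITS, 3) != coarsen(hash_key(c), BITS, 3)
```

Events `a` and `b` fall in different cells at 4 bits per variable but share a cell at 3 bits, which is the property that would let a query choose its precision by choosing a prefix length. If the store keeps keys sorted, each coarse cell is also a contiguous key range.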


Drawbacks:

  • Have to repopulate the database for each Mini-AOD reprocessing.
  • Have to analyze user cut formulae to generate database queries.
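
As a rough illustration of the cut-analysis step, the sketch below uses Python's standard `ast` module to turn a very restricted cut string into per-variable ranges that could seed a database query. The function names and the supported grammar (simple comparisons joined by `and`) are assumptions; real cut formulae are richer, and anything the sketch does not understand raises `ValueError` so that a caller could fall back to a full scan.

```python
import ast
import math

def literal_number(node):
    """Return the value of a numeric literal such as 200 or -2.4,
    or raise ValueError if the node is anything else."""
    try:
        value = ast.literal_eval(node)
    except (ValueError, TypeError):
        raise ValueError("cut bound is not a numeric literal")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError("cut bound is not a numeric literal")
    return float(value)

def cut_to_ranges(cut):
    """Translate 'name > number' style terms joined by 'and' into
    {variable: (low, high)}. Strict and non-strict comparisons are treated
    alike, because the exact cut is reapplied later by the skim job."""
    tree = ast.parse(cut, mode="eval").body
    if isinstance(tree, ast.BoolOp) and isinstance(tree.op, ast.And):
        terms = tree.values
    else:
        terms = [tree]
    ranges = {}
    for term in terms:
        if not (isinstance(term, ast.Compare) and len(term.ops) == 1
                and isinstance(term.left, ast.Name)):
            raise ValueError("unsupported cut term")
        bound = literal_number(term.comparators[0])
        low, high = ranges.get(term.left.id, (-math.inf, math.inf))
        op = term.ops[0]
        if isinstance(op, (ast.Gt, ast.GtE)):
            low = max(low, bound)
        elif isinstance(op, (ast.Lt, ast.LtE)):
            high = min(high, bound)
        else:
            raise ValueError("unsupported comparison operator")
        ranges[term.left.id] = (low, high)
    return ranges

ranges = cut_to_ranges("met > 200 and njets >= 4 and met < 450")
```

In this example `ranges` maps `met` to the interval 200 to 450 and `njets` to 4 and above. A term such as `abs(eta) < 2.4` is rejected by this sketch, which hints at how much work the real analysis of cut formulae would be.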

Present Spark how-to to MicroBooNE

Jason St. John identified a calibration-style analysis (iterative map-reduce) for which Spark could provide a speed-up. I would give a workshop and help them set up a Spark workflow for that calibration.
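
For readers unfamiliar with the pattern, this is a toy illustration of iterative map-reduce in plain Python. It is not the MicroBooNE calibration: the data, the single gain constant, and the gradient-descent update are all invented for the example. In Spark, the built-in `map` and `functools.reduce` calls would be replaced by the corresponding operations on a distributed dataset.

```python
from functools import reduce

# Hypothetical calibration data: (raw reading, reference energy) pairs.
hits = [(8.0, 10.0), (16.0, 20.0), (24.0, 30.0), (32.0, 40.0)]

def map_step(gain):
    """Per-hit work: this hit's contribution to the gradient of
    sum((gain * raw - ref)**2) / 2 with respect to gain."""
    return lambda hit: hit[0] * (gain * hit[0] - hit[1])

def reduce_step(a, b):
    """Combine partial results; addition is associative, so partial sums
    could be computed on different workers."""
    return a + b

gain = 1.0                   # starting guess for the calibration constant
learning_rate = 1.0 / 500.0  # step size chosen by hand for this toy data

# Each iteration is one full map-reduce pass over the same data, which is
# why keeping the data in memory between passes matters.
for iteration in range(20):
    gradient = reduce(reduce_step, map(map_step(gain), hits)) / len(hits)
    gain -= learning_rate * gradient
```

With this data every reference energy is 1.25 times the raw reading, so `gain` converges toward 1.25.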

Limited high-level mini-language to generate fast math functions

Longer timeline. I'll fill this in later, but it could become relevant in 2017 or 2018.

Presentations and papers


Not presented:


People I've talked to regarding DIANA-HEP.


  • Philippe Canal
  • Axel Naumann
  • Bertrand Bellenot

CMS computing

  • Oliver Gutsche

Non-CMS computing

  • Jin Chang

CMS analysts

  • Lovedeep Kaur, Kansas State
  • Artur Apresyan and Si Xie, Caltech
  • Nhan Tran, Fermilab postdoc
  • Kevin Pedro, Fermilab postdoc
  • Jim Hirschauer, Wilson fellow

Non-CMS analysts

  • Jason St. John, Cincinnati postdoc, MicroBooNE
FirstName Jim
LastName Pivarski
Telephone 312 448 0672


Topic attachments

  • Plotsmanship.pdf (r1, 362.5 K, 2016-02-03 21:16, JimPivarski): Plotsmanship design document
Topic revision: r6 - 2016-02-06 - JimPivarski