4.1 Data Analysis in the Full Framework
Goals of this page
This page provides an up-to-date overview of the role of the CMSSW Framework (also known as the Full Framework) in user analysis.
Introduction
The Full Framework is CMS's main tool for data processing and is thus closely connected with user analysis in a number of ways:
- it is used to produce data 'upstream' of the user's analysis: in HLT, Reconstruction, and Skimming
- it is used in PAT-tuple production in group and user skims
- it could also be used for making histograms and plots
- it could be used by the user to further adjust the content of the PAT files used in the analysis
The objective of this chapter is to demonstrate the key applications of the Full Framework in the last two points above: making histograms and plots, and adjusting the content of the PAT files used in the analysis.
Making plots in Full Framework, and the Interactive Analysis
The Full Framework can create and fill histograms, and can manage many histograms produced by a number of EDAnalyzers. There are two ways to use this capability:
- monitoring and validation of Full Framework jobs
- interactive use
The CMS Framework is well suited for creating and filling many histograms that keep track of what is happening in the event processing. Histograms produced this way can be used to check quickly whether the job is working correctly and whether the output file makes sense, without analyzing the output file more thoroughly in FW Lite. This can be extended to a more detailed validation as well. In fact, the Full Framework is a great tool for making many known plots, that is, plots whose definition is already settled.
However, this approach suffers from reduced interactivity. In FW Lite, one can iterate rapidly through the think-click-plot-think cycle. In cmsRun, the same cycle takes longer (sometimes many times longer, depending on the application), which ultimately slows down the progress of the analysis.
That being said, if the analysis requires the use of detailed Geometry, Alignment, and other kinds of Calibration Constants (possibly requiring database access), the only way to achieve it is in the Full Framework.
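The monitoring use case described above can be sketched in plain Python, without any CMSSW dependency. This is only a toy illustration, not the framework's implementation: the classes Hist1D and HistStore, the module labels muonMonitor and jetMonitor, and the event contents are all hypothetical. The store keeps one namespace per module label, which mirrors the way a shared histogram service keeps the histograms of different analyzers apart.

```python
from collections import defaultdict


class Hist1D:
    """Fixed-width 1D histogram with underflow and overflow counters (toy)."""

    def __init__(self, nbins, lo, hi):
        self.nbins, self.lo, self.hi = nbins, lo, hi
        self.counts = [0] * nbins
        self.underflow = 0
        self.overflow = 0

    def fill(self, x):
        if x < self.lo:
            self.underflow += 1
        elif x >= self.hi:
            self.overflow += 1
        else:
            width = (self.hi - self.lo) / self.nbins
            # Clamp to the last bin to guard against floating-point rounding.
            index = min(int((x - self.lo) / width), self.nbins - 1)
            self.counts[index] += 1

    def entries(self):
        return sum(self.counts) + self.underflow + self.overflow


class HistStore:
    """Hypothetical shared store: one namespace per module label."""

    def __init__(self):
        self.dirs = defaultdict(dict)

    def book(self, module, name, nbins, lo, hi):
        hist = Hist1D(nbins, lo, hi)
        self.dirs[module][name] = hist
        return hist


store = HistStore()
h_pt = store.book("muonMonitor", "pt", 10, 0.0, 100.0)
h_njets = store.book("jetMonitor", "nJets", 5, 0.0, 5.0)

# Made-up events standing in for the event loop of a job.
events = [
    {"muon_pt": [12.5, 47.0], "n_jets": 2},
    {"muon_pt": [250.0], "n_jets": 0},
    {"muon_pt": [], "n_jets": 7},
]
for event in events:
    for pt in event["muon_pt"]:
        h_pt.fill(pt)
    h_njets.fill(event["n_jets"])

# Quick sanity summary: entries and overflow per module and histogram.
for module, hists in sorted(store.dirs.items()):
    for name, hist in hists.items():
        print(module, name, "entries:", hist.entries(), "overflow:", hist.overflow)
```

A large overflow count or an unexpectedly empty histogram in such a summary is the kind of quick signal that a job is misconfigured, well before the output file is inspected interactively.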
Adjusting the content of the user's PAT-tuple
Users will naturally want to control what is in their analysis data sample (e.g., PAT-tuples), as that defines what is possible in the interactive stage of the analysis. In addition, a data sample that is too large slows down the interactive analysis, simply because the larger I/O makes it take longer to process the same number of events. So the key is to learn how to
- add the necessary information (in terms of other ED products, or UserData attached to PAT objects)
- remove the unnecessary information (in terms of making PAT objects which are 'just right' in size)
These topics will be covered in detail in the PAT workbook sections below.
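As a rough illustration of removing unnecessary information, the sketch below mimics ordered keep/drop statements of the kind used in CMSSW output module configurations, where each statement carries a wildcard pattern over branch names and a later statement overrides an earlier one. It is a plain-Python stand-in, not the framework's code: the function select_branches, the use of fnmatch for the wildcard matching, the default of keeping unmatched branches, and the branch names themselves are assumptions made for the example.

```python
from fnmatch import fnmatchcase


def select_branches(branches, commands):
    """Apply 'keep'/'drop' commands in order; the last matching command wins."""
    kept = []
    for branch in branches:
        keep = True  # assumption of this sketch: keep unless told otherwise
        for command in commands:
            action, pattern = command.split()
            if action not in ("keep", "drop"):
                raise ValueError(f"unknown action in {command!r}")
            if fnmatchcase(branch, pattern):
                keep = action == "keep"
        if keep:
            kept.append(branch)
    return kept


# Hypothetical branch names in the form type_label_instance_process.
branches = [
    "patMuons_selectedPatMuons__PAT",
    "patElectrons_selectedPatElectrons__PAT",
    "patJets_selectedPatJets__PAT",
    "recoTracks_generalTracks__RECO",
]

# Drop everything first, then keep only what the analysis needs.
commands = [
    "drop *",
    "keep *_selectedPatMuons_*_*",
    "keep *_selectedPatJets_*_*",
]

kept = select_branches(branches, commands)
print(kept)
```

Starting from "drop *" and adding explicit keep statements makes the retained content an explicit list, so the data sample does not grow silently when new products appear upstream.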
-- Main.Altan Cakir - 09 Oct 2017