Collected feedback for the upcoming PAT-tuple and data flow workshop on June 19 (Friday).

The questions:

  • What is your analysis topic, and what are the crucial issues related to using the PAT tools?
  • Are you using PAT datasets, or are you creating your own private n-tuple? What is the exact input of this n-tupling step?
  • Does the PAT fit your needs in terms of convenience?
  • Do you think your analysis needs can be translated into a custom tuple for your subgroup, or for Exotica in general, using PAT data formats? What are the reasons for your conclusion?

The answers:

Boosted top / muon+jets and all-jets (Francisco Yumiceva)

  • Boosted top analyses: muon+jets and all-jets channels. We need to keep several jet and MET collections. Manipulation of the JEC is also important.

  • We do not use the PAT datasets; we create our own pat-tuples. The pat-tuples have a loose preselection and include additional collections for our analyses, such as several jet and MET collections (a configuration sketch is given after these answers).

  • Yes.

  • I don't think that common PAT tuples in the Exotica group will help us. We are already sharing pat-tuples in our boosted top subgroup because we need additional objects, and our preselection helps us to keep our pat-tuples small. If we find problems or need some changes, it is much easier and faster for us to re-create our pat-tuples than to wait for the official ones.
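
A minimal sketch of what such a private pat-tuple configuration could look like, assuming a CMSSW 3.X-style PAT setup. The muon-based preselection, the 20 GeV threshold, and the kept collections are illustrative assumptions, not the group's actual configuration; conditions and geometry setup are omitted.

    # Sketch of a private pat-tuple: run the PAT sequence, apply a loose
    # preselection, and keep extra collections in the output.
    import FWCore.ParameterSet.Config as cms

    process = cms.Process("PATTUPLE")
    # Module and sequence labels below follow the CMSSW 3.X PAT naming
    # (selectedPatMuons, patDefaultSequence); older releases use different names.
    process.load("PhysicsTools.PatAlgos.patSequences_cff")

    process.source = cms.Source("PoolSource",
        fileNames = cms.untracked.vstring("file:input_AOD.root")   # placeholder input
    )
    process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(-1))

    # Loose preselection: at least one muon above an illustrative 20 GeV threshold.
    process.looseMuons = cms.EDFilter("PATMuonSelector",
        src = cms.InputTag("selectedPatMuons"),
        cut = cms.string("pt > 20.")
    )
    process.muonFilter = cms.EDFilter("CandViewCountFilter",
        src = cms.InputTag("looseMuons"),
        minNumber = cms.uint32(1)
    )

    process.out = cms.OutputModule("PoolOutputModule",
        fileName = cms.untracked.string("patTuple.root"),
        SelectEvents = cms.untracked.PSet(SelectEvents = cms.vstring("p")),
        outputCommands = cms.untracked.vstring(
            "drop *",
            "keep *_selectedPat*_*_*",   # standard PAT objects
            "keep *_patMETs*_*_*"        # MET collections, if several are configured
        )
    )

    process.p = cms.Path(process.patDefaultSequence * process.looseMuons * process.muonFilter)
    process.outpath = cms.EndPath(process.out)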

Boosted top jets / ttbar resonance (Salvatore Rappoccio)

  • I'm using boosted top jets for all-hadronic ttbar resonance searches. For this, I created a custom tag info, which I successfully included in the tag infos for the pat::Jets. I also created a custom jet collection which I used as an input for my pat::Jets (a configuration sketch is given after these answers). Both operations were completely successful and are in place for my current analysis.

  • Private pat-tuples, which take RECO or AOD as input.

  • Yes, I designed some aspects of it with my own analysis in mind, so I kind of cheated ;).

  • It could, indeed, but it might not be of general interest since the input is a non-standard jet collection. The Exotica conveners should have input here. I'm happy to include it if there is general interest.
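
A hedged sketch of how a non-standard jet collection and a custom TagInfo might be wired into pat::Jets, assuming a CMSSW 3.X-style patJets producer. The labels myCATopJets and myCATopTagInfos are hypothetical placeholders for the analysis-specific producers, and the cfi path varies between PAT releases.

    import FWCore.ParameterSet.Config as cms
    # patJets producer template; the cfi path and label differ in older PAT versions.
    from PhysicsTools.PatAlgos.producersLayer1.jetProducer_cfi import patJets

    # Point the PAT jet producer at a custom jet collection and embed a custom
    # TagInfo; "myCATopJets" and "myCATopTagInfos" are hypothetical module labels.
    myPatJets = patJets.clone(
        jetSource      = cms.InputTag("myCATopJets"),
        addTagInfos    = cms.bool(True),
        tagInfoSources = cms.VInputTag(cms.InputTag("myCATopTagInfos"))
    )

This module would then replace the standard patJets in the PAT sequence (or the jetTools helpers such as switchJetCollection could be used); the details depend on the PAT release.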

HEEP group (Pascal Vanlaer)

  • Search for resonances in di-electron final states (High-Energy Electron Pairs / HEEP group). The crucial issues are:
    • that the standard electron identification and isolation variables are made available in the PAT;
    • that the other objects are also standard (i.e. POG-validated).

  • We are using a two-step process:
  1. we produce standard PAT trees with the following exceptions:
    • the electron user isolation variable is used to patch the PAT for the missing HCAL 2nd segment in the endcaps in CMSSW 2.X. This required special code to be developed (by Sam Harper). The issue should be solved in CMSSW 3.X, where PAT should provide all the standard egamma isolation variables.
    • we keep some additional RECO collections (mainly superclusters, for electron efficiency studies). These PAT trees are created with a CRAB job and stored on our local Tier-2 disk space.
  2. we copy the values of some of the PAT variables into very simple ROOT trees that allow fast interactive analysis and plot making (an example of this second step is sketched after these answers). These are quick to reproduce from the locally stored PAT trees whenever a variable has to be added. In this way we have achieved both a high level of traceability (electron selection efficiencies were checked event by event and were found to agree between CMSSW and interactive ROOT analysis) and a high level of flexibility with a fast analysis turn-around.

  • PAT acts as a single place (namespace) to look for high-level objects, which is very convenient: using pat::Muon or pat::genMET, one gets a reasonable definition of muon and genMET objects. We feel this will make analysis in CMS more efficient by lowering the learning threshold for non-experts in certain objects (we are primarily electron experts). Documentation is important, e.g. PAT-POG contact persons clearly mentioned on the POG wiki pages, with links to the actual PAT config files for cross-checks of the parameters actually used.

  • For the HEEP subgroup, yes, with the exception of the additional RECO data needed (mainly superclusters). For a general Exotica analysis, the additional RECO data needed is most likely different. Of course, detailed data quality checks need RECO and will always need RECO.
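
A minimal FWLite sketch of the second step described above, assuming a locally stored PAT file and a standard selectedPatElectrons collection. The file name, module label, and choice of branches are illustrative assumptions, not the actual HEEP tree format.

    # Copy a few pat::Electron quantities from a locally stored PAT tree into a
    # small flat ROOT tree for fast interactive analysis and plot making.
    import ROOT
    from array import array
    from DataFormats.FWLite import Events, Handle

    events = Events("patTuple.root")                  # placeholder PAT file
    handle = Handle("std::vector<pat::Electron>")
    label  = "selectedPatElectrons"                   # label depends on the PAT release

    fout = ROOT.TFile("heepFlatTree.root", "RECREATE")
    tree = ROOT.TTree("electrons", "flat electron tree")
    pt, eta, hcalIso = array('f', [0.]), array('f', [0.]), array('f', [0.])
    tree.Branch("pt",      pt,      "pt/F")
    tree.Branch("eta",     eta,     "eta/F")
    tree.Branch("hcalIso", hcalIso, "hcalIso/F")

    for event in events:
        event.getByLabel(label, handle)
        for ele in handle.product():
            pt[0]      = ele.pt()
            eta[0]     = ele.eta()
            hcalIso[0] = ele.hcalIso()                # or a user isolation variable
            tree.Fill()

    fout.Write()
    fout.Close()

Adding a variable then amounts to adding a branch here and rerunning over the local PAT trees, which is what gives the fast turn-around described above.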

MUSiC/model-independent search group (Carsten Hof)

  • MUSiC. We pretty much rely on standard object IDs, which are fairly well supported by PAT.

  • We run the PAT layers on the fly and store the needed information in a custom ROOT-like format using PXL, the successor of PAX, which was developed at our institute and was part of the ORCA package.

  • In principle, yes. However, things are currently moving quite fast, and versions and bugs change frequently. Even as an experienced end-user I have to say that I am sometimes lost. PAT shields one from the framework, but it also makes things difficult because it hides complex things. Through the additional layer of PAT it is even harder to get the feeling that one is doing the right thing. As an example: previously, when something went wrong in the reconstruction of, say, electrons, you looked at the electron code. Now you have to work all the way through the PAT code before finally digging into the electron reconstruction code to find what is going wrong. Bottom line: either use PAT as a black box (a bad thing) or take the hard road of digging through PAT and further into the framework code when trying to understand the system.

  • The MUSiC analysis is quite specific. However, a standard PAT tuple might be convenient, as we can easily switch from running PAT on the fly to running on already produced PAT n-tuples. Still, at LHC start-up things will change rapidly, and n-tuples might not appear as fast as re-recoed samples. Therefore the approach of running PAT on the fly might be the most suitable. Once LHC/CMS runs smoothly, using PAT n-tuples only might be a valuable option.

Fourth Generation Quarks (Kai-Feng Chen)

  • It is very nice to have PAT provide a unified interface to the reconstructed objects. This analysis relies heavily on the standard object tools.

  • For most of the studies we are able to run on the lepton+jets PAT tuples; however, the fast-changing software still causes big trouble for PAT-tuple production, which never catches up with the developments. In most cases we are still running PAT on the fly.

  • PAT fits our needs in terms of convenience. However, simplified non-PAT ROOT trees are still needed for larger-scale optimization work and plot making, in terms of data size and speed. PAT-tuples are still too big for this purpose (e.g. doing a cut scan over 10M or more events).

-- KaiFengChen - 17 Jun 2009
