SWGuidePATJet38X

The pat::Jet has undergone a major overhaul in 38x. Because content embedding makes it the largest object in the PAT, it is slow to read, and this has made users hesitant to use PAT for analysis. The overhaul is a data format refactoring that considerably speeds up read access.

pat_jet_refactor.gif

The refactoring makes heavy use of edm::FwdRef and edm::FwdPtr. This has an advantage over the RECO strategy for association (edm::ValueMap): the user can thin the collection without losing access to the keys of the association.
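
The idea behind such a reference can be illustrated with a toy model. The sketch below is not the edm::FwdRef implementation and does not use its API: Tower, ToyFwdRef and sumEnergy are hypothetical names, and plain std::vector pointers stand in for framework references. It assumes that the reference first tries a "forward" (thinned) collection and falls back to a "backward" (original) one when the forward one is unavailable.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a jet constituent; not a CMSSW type.
struct Tower {
  double energy;
};

// Toy reference with a "forward" (thinned) and a "backward" (original) target.
// A null pointer models a collection that was dropped from the event.
struct ToyFwdRef {
  const std::vector<Tower>* forward;
  std::size_t forwardIndex;
  const std::vector<Tower>* backward;
  std::size_t backwardIndex;

  // Try the forward collection first, then fall back to the backward one.
  const Tower* get() const {
    if (forward != nullptr && forwardIndex < forward->size())
      return &(*forward)[forwardIndex];
    if (backward != nullptr && backwardIndex < backward->size())
      return &(*backward)[backwardIndex];
    return nullptr;  // neither collection is available
  }
};

// Sum the energy of all constituents that can still be resolved.
double sumEnergy(const std::vector<ToyFwdRef>& refs) {
  double total = 0.0;
  for (std::size_t i = 0; i < refs.size(); ++i) {
    const Tower* tower = refs[i].get();
    if (tower != nullptr) total += tower->energy;
  }
  return total;
}
```

In this toy picture, references whose backward pointer is null still resolve as long as the thinned collection is present, so the original collection can be dropped without breaking the association.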

Producing

The new PATJetProducer will create collections of CaloTower, TagInfo, PFCandidate, and GenJet. The objects associated with each pat::Jet are stored in these new collections, which carry the same module label as the jets and are told apart by their instance label.

The event content will therefore look something like this:

edm::OwnVector<reco::BaseTagInfo,edm::ClonePolicy<reco::BaseTagInfo> >     "selectedPatJets"       "tagInfos"    "PAT."         
edm::SortedCollection<CaloTower,edm::StrictWeakOrdering<CaloTower> >     "selectedPatJets"       "caloTowers"    "PAT."         
vector<pat::Electron>             "cleanPatElectrons"     ""            "PAT."         
vector<pat::Jet>                  "cleanPatJets"          ""            "PAT."         
vector<pat::MET>                  "patMETs"               ""            "PAT."         
vector<pat::Muon>                 "cleanPatMuons"         ""            "PAT."         
vector<pat::Photon>               "cleanPatPhotons"       ""            "PAT."         
vector<pat::Tau>                  "cleanPatTaus"          ""            "PAT."         
vector<reco::GenJet>              "selectedPatJets"       "genJets"     "PAT."         
vector<reco::PFCandidate>         "selectedPatJets"       "pfCandidates"    "PAT." 

The default PAT event content has changed to reflect this refactoring. If the user drops these extra collections, the usual restriction applies: information that is not in the event cannot be accessed.

Filtering

Because of the changed event content, care must be taken when filtering these objects. To make this easier, a new pat::Jet-specific filter with string-parser support has been created. It handles the appropriate thinning of the secondary objects of the pat::Jet collection.

The Python configuration for this filter is the same as before, so users of this tool do not need to take any action.

Other selectors can also be used, but they will not thin the secondary collections along with the main jet collection.
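
What "thinning the secondary collections along with the jets" means can be sketched with toy types. This is only an illustration under stated assumptions, not the code of the PAT filter: ToyJet, ToyConstituent, ThinnedEvent and selectAndThin are hypothetical names, jets refer to their constituents by plain indices, the selection is a simple pt cut, and each constituent is assumed to belong to at most one jet.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy types; not the pat::Jet data format.
struct ToyConstituent {
  double energy;
};

struct ToyJet {
  double pt;
  std::vector<std::size_t> constituentIndices;  // indices into the secondary collection
};

struct ThinnedEvent {
  std::vector<ToyJet> jets;
  std::vector<ToyConstituent> constituents;
};

// Keep jets with pt above minPt and copy only their constituents,
// rewriting each kept jet's indices to point into the thinned collection.
ThinnedEvent selectAndThin(const std::vector<ToyJet>& jets,
                           const std::vector<ToyConstituent>& constituents,
                           double minPt) {
  ThinnedEvent out;
  for (std::size_t i = 0; i < jets.size(); ++i) {
    if (jets[i].pt <= minPt) continue;  // jet fails the selection
    ToyJet kept;
    kept.pt = jets[i].pt;
    for (std::size_t j = 0; j < jets[i].constituentIndices.size(); ++j) {
      const std::size_t oldIndex = jets[i].constituentIndices[j];
      if (oldIndex >= constituents.size()) continue;  // skip a dangling index
      kept.constituentIndices.push_back(out.constituents.size());
      out.constituents.push_back(constituents[oldIndex]);
    }
    out.jets.push_back(kept);
  }
  return out;
}
```

A selector that only copies the passing jets, without the inner loop, would leave the secondary collection at its full size, which is the situation described for other selectors.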

Performance

The performance increase can be seen in this unit test:

#include <memory>
#include <string>
#include <vector>
#include <sstream>
#include <fstream>
#include <iostream>
#include <cstdio>

#include <TH1F.h>
#include <TROOT.h>
#include <TFile.h>
#include <TSystem.h>

#include "DataFormats/Common/interface/Handle.h"
#include "DataFormats/FWLite/interface/Event.h"
#include "DataFormats/PatCandidates/interface/Jet.h"
#include "FWCore/FWLite/interface/AutoLibraryLoader.h"
#include "PhysicsTools/FWLite/interface/TFileService.h"
#include "TStopwatch.h"


int main(int argc, char* argv[])
{
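  // ----------------------------------------------------------------------
  // Read the pat::Jet collection from every event and time the loop:
  //
  //  * enable the AutoLibraryLoader
  //  * open the input file (argv[1])
  //  * get the jet collection with label argv[3] in each event
  // ----------------------------------------------------------------------

  // argv[1] and argv[3] are used below, so at least three arguments are required
  if ( argc < 4 ) return 0;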

  // load framework libraries
  gSystem->Load( "libFWCoreFWLite" );
  AutoLibraryLoader::enable();
 
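  TFile* inFile = TFile::Open( argv[1] );
  // stop early if the input file could not be opened
  if ( !inFile || inFile->IsZombie() ) {
    std::cerr << "Could not open input file " << argv[1] << std::endl;
    return 1;
  }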

  unsigned int iEvent=0;
  fwlite::Event ev(inFile);
  TStopwatch timer;
  timer.Start();

  unsigned int nEventsAnalyzed = 0;
  for(ev.toBegin(); !ev.atEnd(); ++ev, ++iEvent){
    edm::EventBase const & event = ev;

    // Handle to the jet collection
    edm::Handle<std::vector<pat::Jet> > jets;
    edm::InputTag jetLabel( argv[3] );
    event.getByLabel(jetLabel, jets);
   
    ++nEventsAnalyzed;
  } 

  inFile->Close();

  timer.Stop();

  // print some timing statistics
  Double_t rtime = timer.RealTime();
  Double_t ctime = timer.CpuTime();
  printf("Analyzed events: %u \n",nEventsAnalyzed);
  printf("RealTime=%f seconds, CpuTime=%f seconds\n",rtime,ctime);
  printf("%4.2f events / RealTime second .\n", (double)nEventsAnalyzed/rtime);
  printf("%4.2f events / CpuTime second .\n", (double)nEventsAnalyzed/ctime);
  

  return 0;
}

In 3.6.2, we see the following performance on 1000 events:

RealTime=6.149763 seconds, CpuTime=6.140000 seconds
162.61 events / RealTime second .
162.87 events / CpuTime second .

In 3.8.0 with the same input we see the following performance on 1000 events:

Analyzed events: 1000 
RealTime=2.682342 seconds, CpuTime=2.670000 seconds
372.81 events / RealTime second .
374.53 events / CpuTime second .

This speeds up the reading of pat::Jet by more than a factor of 2 (about 2.3 for the timings above).
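
The factor follows directly from the two sets of timings. As a minimal sketch, the helpers below (speedup, eventsPerSecond and printComparison are names chosen here for illustration, not part of the unit test) redo the arithmetic with the numbers quoted above.

```cpp
#include <cstdio>

// Ratio of the old time to the new time: how many times faster the new format reads.
double speedup(double secondsBefore, double secondsAfter) {
  return secondsBefore / secondsAfter;
}

// Event throughput, computed the same way as in the unit test above.
double eventsPerSecond(unsigned int nEvents, double seconds) {
  return static_cast<double>(nEvents) / seconds;
}

// Print the comparison for the timings quoted in this section.
void printComparison() {
  const double real362 = 6.149763, cpu362 = 6.140000;  // 3.6.2 timings
  const double real380 = 2.682342, cpu380 = 2.670000;  // 3.8.0 timings
  std::printf("RealTime speedup: %.2f\n", speedup(real362, real380));
  std::printf("CpuTime speedup:  %.2f\n", speedup(cpu362, cpu380));
}
```

Calling printComparison() from a main function prints both ratios; each comes out between 2.29 and 2.30.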

-- Main.SalvatoreRoccoRappoccio - 22-Jul-2010
Topic revision: r1 - 2010-07-22 - SalvatoreRRappoccio
 