Reprocessing and Production Team Meeting - June 9, 4 PM CERN time, 9 AM FNAL time


Vidyo Link

Attending

  • FNAL: Jen, Jorge, Gaston, Dima, Matteo
  • US: Alli
  • CERN: Alan, Sebastian, Paola, Andrew
  • Korea :

Personnel

  • Jen Vacation July 25-29
  • SeangChan Jun 2-July 5
  • Jorge June 13-24

News - Dima

  • Nothing major
    • for DR80: if we get miniAOD v1 done we can close whatever we have; what they really need is miniAOD v2, so if we have something stuck in tails for
    • JR is putting the conditions into the Unified closeout script; they live in a config file that is easy to clear out when we are done

Top issues affecting production

  • Duplicates: https://cms-logbook.cern.ch/elog/Workflow+processing/24533
    • duplicates showed up after things were approved
    • get the files with the duplicate lumis, then do a search in PhEDEx for file creation
      • Jen will get the list of files, Jorge will do the PhEDEx search
  • premix: https://cms-logbook.cern.ch/elog/Workflow+processing/24577
    • workflows have been cloned and resubmitted
    • submit failed: why are we getting submit failures when HLT ...? Alan will figure out why it happened
    • 771101 (no sites available) is happening in the ...; Alan and Paola are looking at it
  • backfill: https://cms-logbook.cern.ch/elog/Workflow+processing/24553
  • High number of workflows in assist-man again. We were caught up last Friday, so what happened?
    • Wallclock time: error status in Condor; the 2nd is for jobs running longer than 2 days or pending longer than 3 days
    • we don't understand why we have so many held jobs
    • we need to pass that info to the GlideinWMS people (Farruk); it may also be a site with a config that moves jobs to held, but it looks more like a buggy Condor issue
    • Paola will talk to Farruk and get this worked out.
  • wrong config on miniAOD - https://cms-logbook.cern.ch/elog/Workflow+processing/24606
    • if it's the same issue: reject, add to HyperNews and the elog, and done
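
For the duplicates item above, the first step (finding which files share a lumi section) could be sketched roughly as follows. This is only an illustration: the function name, the input layout (a mapping from file name to (run, lumi) pairs) and the example file names are assumptions, since the minutes do not say how the file/lumi list is obtained. The PhEDEx search itself is not shown.

```python
from collections import defaultdict


def find_duplicate_lumis(file_lumis):
    """Return {(run, lumi): [files]} for every lumi that appears in more than one file.

    file_lumis is a hypothetical mapping {file_name: [(run, lumi), ...]}.
    """
    seen = defaultdict(set)
    for lfn, lumis in file_lumis.items():
        for run, lumi in lumis:
            seen[(run, lumi)].add(lfn)
    # Keep only the lumis that are shared by at least two files
    return {key: sorted(files) for key, files in seen.items() if len(files) > 1}


# Made-up example input; real file names and lumis would come from the dataset
example = {
    "/store/example/fileA.root": [(1, 10), (1, 11)],
    "/store/example/fileB.root": [(1, 11), (1, 12)],
    "/store/example/fileC.root": [(2, 5)],
}

duplicates = find_duplicate_lumis(example)
# Flat, de-duplicated list of files to hand over for the PhEDEx search
files_to_check = sorted({lfn for files in duplicates.values() for lfn in files})
print(files_to_check)
```

The resulting file list is what would then be looked up in PhEDEx.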

Site support -

| Date | Site | Into the Waiting Room | Out of the Waiting Room | Into the morgue | Out of the morgue |
| 2016-05-29 00:00:01 | T2_RU_IHEP | | x | | |
| 2016-06-01 00:00:01 | T2_TW_NCHC | | | x | |
| 2016-06-01 00:00:01 | T2_IT_Rome | x | | | |
| 2016-06-03 00:00:01 | T2_BR_UERJ | x | | | |
| 2016-06-03 00:00:01 | T2_BE_UCL | x | | | |
| 2016-06-05 00:00:01 | T2_US_UCSD | | x | | |

Transfers - Jorge

  • stuck transfers from CERN to Nebraska because the link isn't fully commissioned yet
  • the update of the central agents to the new PhEDEx version is already done; backlog of 600 TB at CERN MSS

Workflows

ReDigi

MiniAOD

TaskChains

StepChain

  • NA

Rereco

Store Results

MonteCarlo

Agent Issues

Agent redeployment

RequestMgr2 Migration

Merging Scripts

  • assign.py script

RelVal - Andrew

AOB

-- JenniferAdelmanMcCarthy - 2016-06-07

Topic revision: r3 - 2016-06-09 - JenniferAdelmanMcCarthy