Reprocessing and Production Team Meeting - May 19, 4 PM CERN time, 9 AM FNAL time

Vidyo Link

Attending

  • FNAL: Jen, Jorge, Jesus, Matteo, SeangChan
  • US: Allie
  • CERN : Paola, Dima, Alan, Andrew, Svenja, Mykola, JeanRoch
  • Korea :

Personnel

  • Gaston to Colombia Early May 12-28, talk on the 13
  • Jen 1/2 day May 20, June 3
  • SeangChan Jun 2-July 5
  • Jorge June 13-24

News - Dima

  • New Shifters! Mykola and Svenja from DESY University.
    • Paola & Alan will do some training with them next week
  • Why are we running so low?

Top issues affecting production

  • Lots of submit failures at FNAL, KIT, RWTH, and Pisa needing manual cleanup
    • JR - I know you sort workflows by failure type to determine which WFs we can automatically run ACDCs on. Is there a way to also tell where the workflow failed? If a workflow is failing due to submit failures, we can run ACDC once the site is back up and healthy. While we are waiting, it would be useful to sort the waiting workflows by site, so we can decide what to do based on what the site support team tells us is going on.
      • Knowing the location, error code, and task that have been failing, so as to make more elaborate automatic recovery possible, is what we have been discussing over the last couple of weeks (https://github.com/dmwm/WMCore/issues/6858). I hope we can converge on this very soon: as I mentioned last week, this is the top-priority development as far as operations are concerned.
      • SeangChan will look at the issue and see if he can get it going in testbed next week.
  • Where are we in testing merge issues at T0_CH_CERN?
    • T0 merge: how come permission issues come and go?
  • Workflows with intermediate outputs missing in PhEDEx
    • Here is one case, but I am pretty sure there will be others if we've ACDC'd 3 times: pdmvserv_task_SUS-RunIISpring16DR80-00039__v1_T_160403_123422_6253
    • Could this be related to PhedexInjector not catching up with job submission? Was the agent that saw this file produced asked what it did with the file?
  • Workflows that we are waiting on MINIAODs for
  • Workflows exceeding maxRSS
  • Failed workflow pdmvserv_task_HIG-RunIISpring16DR80-00066__v1_T_160405_075229_6042 https://cms-logbook.cern.ch/elog/Workflow+processing/24201
    • Can someone please take a look?
    • elog is not the proper tool for operations; what else can we switch to?
  • Workqueue element with no chance of running (https://cms-logbook.cern.ch/elog/Workflow+processing/24303): this will be a major source of delay
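
The request above to sort submit-failure workflows by site could be sketched roughly as follows. This is only an illustration, not WMCore code: the record fields (`workflow`, `task`, `site`, `error_code`, `failure_type`), the function names, and the example values are all assumptions made up for the sketch.

```python
from collections import Counter, defaultdict


def summarize_failures(failures):
    """Count failure records per (site, error_code, task)."""
    return Counter((f["site"], f["error_code"], f["task"]) for f in failures)


def split_submit_failures(failures, healthy_sites):
    """Split workflows with submit failures into two groups.

    Returns (ready, waiting):
      ready   - workflows whose failing sites are all healthy again,
                so an ACDC could be run on them
      waiting - dict mapping each still-unhealthy site to the workflows
                that are waiting on it
    """
    sites_by_workflow = defaultdict(set)
    for f in failures:
        # Hypothetical convention: failure_type == "submit" marks a submit failure.
        if f["failure_type"] == "submit":
            sites_by_workflow[f["workflow"]].add(f["site"])

    ready = []
    waiting = defaultdict(list)
    for workflow, sites in sorted(sites_by_workflow.items()):
        unhealthy = sorted(sites - set(healthy_sites))
        if not unhealthy:
            ready.append(workflow)
        for site in unhealthy:
            waiting[site].append(workflow)
    return ready, dict(waiting)


# Made-up example records; workflow names, tasks, and error codes are placeholders.
example_failures = [
    {"workflow": "wf_A", "task": "taskX", "site": "T1_US_FNAL",
     "error_code": 1, "failure_type": "submit"},
    {"workflow": "wf_B", "task": "taskY", "site": "T1_DE_KIT",
     "error_code": 1, "failure_type": "submit"},
]
ready, waiting = split_submit_failures(example_failures, healthy_sites={"T1_US_FNAL"})
```

With a listing like `waiting`, the decision per site can follow whatever the site support team reports, while `ready` holds the candidates for ACDC.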

Site support - Gaston is on his way to Colombia

  • T2_CH_CSCS - where are we in the testing?
    • Last week's test failed but this week's is OK, so Paola will report her findings in a GGUS ticket

| Date | Site | Into the Waiting Room | Out of the Waiting Room | Into the morgue | Out of the morgue |
| 2016-04-21 00:00:01 | T2_US_Caltech | x | | | |
| 2016-04-21 00:00:01 | T2_PL_Swierk | x | | | |
| 2016-04-22 00:00:01 | T2_TW_NCHC | x | | | |
| 2016-04-22 00:00:01 | T2_IT_Bari | | x | | |
| 2016-04-24 00:00:01 | T2_IN_TIFR | | x | | |
| 2016-04-25 00:00:01 | T2_US_Caltech | x | | | |
| 2016-04-25 00:00:01 | T2_UK_London_Brunel | x | | | |
| 2016-04-25 00:00:01 | T2_US_UCSD | x | | | |
| 2016-04-25 00:00:01 | T2_BR_SPRACE | x | | | |

Transfers - Jorge

  • There is a GenSim dataset stuck in transfers that Jorge is working on

Workflows

  • Lots of workflows are completing, lots of recoveries pending
    • Of the many smaller workflows that just finished, about 25% went into recovery; it would be interesting to keep track of this number over time.
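
One way the recovery fraction mentioned above could be tracked over time is a per-week tally. The sketch below is only an assumption about how that might look: the input format (pairs of completion date and a flag saying whether the workflow went into recovery) and the function name are invented for illustration.

```python
from collections import defaultdict
from datetime import date


def recovery_rate_by_week(completions):
    """Fraction of completed workflows that went into recovery, per ISO week.

    completions: iterable of (completion_date, needed_recovery) pairs,
                 where completion_date is a datetime.date and
                 needed_recovery is a bool.
    Returns a dict mapping (iso_year, iso_week) to a fraction between 0 and 1.
    """
    totals = defaultdict(int)
    recovered = defaultdict(int)
    for day, needed_recovery in completions:
        iso = day.isocalendar()
        key = (iso[0], iso[1])  # (ISO year, ISO week number)
        totals[key] += 1
        if needed_recovery:
            recovered[key] += 1
    return {key: recovered[key] / totals[key] for key in sorted(totals)}


# Made-up sample: four workflows finishing in the same week, one needing recovery.
sample = [
    (date(2016, 5, 16), False),
    (date(2016, 5, 17), True),
    (date(2016, 5, 18), False),
    (date(2016, 5, 19), False),
]
rates = recovery_rate_by_week(sample)
```

Keeping the weekly values side by side would show whether the 25% figure is stable or drifting.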

ReDigi

MiniAOD

TaskChains

StepChain

  • NA

Rereco

Store Results

MonteCarlo

Agent Issues

Agent redeployment

RequestMgr2 Migration

Merging Scripts

  • Paola - what do you need to turn over to Allie still?

RelVal - Andrew

AOB

-- JenniferAdelmanMcCarthy - 2016-05-18
Topic revision: r6 - 2016-05-19 - JenniferAdelmanMcCarthy
 