Workflow Team Meeting - April 7, 4 PM CERN time, 9 AM FNAL time

Vidyo Link

Attending

  • FNAL: Jen, Gaston, Jorge, Jesus
  • US:
  • CERN : Paola, Dima, Alan
  • Korea : Youn

Personnel

  • Youn on shift
  • Jorge to Columbia April 15-May 2, Talk on April 27

News - Dima

  • keep an eye on pdmvserv_task_TOP-RunIISpring16DR80-00001__v1_T_160331_151408_3872
    • submit failures
  • We need to prioritize running at the T0 and fully saturate it (13K jobs); jobs were crashing.
    • Merge jobs were still failing as of Wed AM.
    • Still needs fresh testing. We have fixed the issue with the thresholds, and the workflow is running in the testbed.
    • possible problem in Global Pool in testbed - Paola will send same test to production instead
  • Database rotation has been put off until Monday at 10 AM CERN time, all WMStats info will be stale for
  • The people in PR are working on something mixing Multicore and Single Core which could make things interesting
    • Similar to the overflow situation that we've been dealing with for the last couple of months.
    • Changes will have to be made to WMStats before this can happen. Alan won't make it official until it's been thought through and tested. They are currently hacking around in the testbed.
  • we didn't have all the input prestaged so we are waiting for that to happen before we can really get going hard on
  • Watch DR80 stuff carefully

3 top issues affecting production

  • assignProdTaskChain.py is not assigning the merge ACDCs. TaskChain ACDCs need to be assigned via script... what can we do?
    • The problem is that there is a None in the acquisition era, so the script is not assigning them. Alan and Paola will look at the assignProdTaskChain script.
  • pdmvserv_SUS-RunIISummer15GS-00172_00293_v0__160331_161259_9392 - input data is at Sprace and Legnaro, workflow is only whitelisted to Sprace and Legnaro, but it has overflowed to many other sites that are failing it because of file read issues.
  • Higher than normal stage out errors, basically everywhere. Has something changed?
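
For the overflow issue above, a minimal sketch of how jobs that ran outside a workflow's site whitelist could be picked out. The job records, field names and the non-whitelisted site are made up for illustration; real records would come from whatever monitoring source is in use.

```python
def jobs_outside_whitelist(jobs, whitelist):
    """Return the jobs whose run site is not in the workflow's site whitelist."""
    allowed = set(whitelist)
    return [job for job in jobs if job["site"] not in allowed]


# Hypothetical job records (field names are assumptions, not a real schema)
jobs = [
    {"id": 1, "site": "T2_BR_SPRACE", "status": "success"},
    {"id": 2, "site": "T2_IT_Legnaro", "status": "success"},
    {"id": 3, "site": "T2_XX_Example", "status": "failed"},
]
whitelist = ["T2_BR_SPRACE", "T2_IT_Legnaro"]

overflowed = jobs_outside_whitelist(jobs, whitelist)
for job in overflowed:
    print(job["id"], job["site"], job["status"])
```

Comparing the failure rate of the overflowed jobs with that of the whitelisted ones would show whether the file read failures are limited to the overflow sites.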

Site support - Gaston

Transfers - Jorge

  • Do we have a way to watch transfers by campaign? Not really, but it would be useful for Dima to have. Russian sites are having some problems with transferring files.
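
One possible starting point for a per-campaign view, sketched under the assumption that the campaign name is the middle token of the prepid embedded in the workflow name (for example RunIISpring16DR80 in TOP-RunIISpring16DR80-00001). The regular expression and helper names are illustrative only and not part of any existing tool.

```python
import re
from collections import defaultdict

# Assumed pattern: _<GROUP>-<Campaign>-<number>, e.g. _TOP-RunIISpring16DR80-00001
CAMPAIGN_RE = re.compile(r"_[A-Z0-9]+-([A-Za-z0-9]+)-\d+")


def campaign_of(workflow_name):
    """Return the campaign token from a workflow name, or None if it does not match."""
    match = CAMPAIGN_RE.search(workflow_name)
    return match.group(1) if match else None


def group_by_campaign(workflow_names):
    """Map each campaign to the workflow names that belong to it."""
    groups = defaultdict(list)
    for name in workflow_names:
        groups[campaign_of(name)].append(name)
    return dict(groups)


workflows = [
    "pdmvserv_task_TOP-RunIISpring16DR80-00001__v1_T_160331_151408_3872",
    "pdmvserv_SUS-RunIISummer15GS-00172_00293_v0__160331_161259_9392",
]
by_campaign = group_by_campaign(workflows)
for campaign, names in by_campaign.items():
    print(campaign, len(names))
```

Transfer requests could then be looked up per workflow and summed within each group.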

Workflows

  • The work backlog is current to within the past month!
  • There are a bunch of HARVEST workflows in failed, can we clean them up?

ReDigi

MiniAOD

TaskChains

StepChain

  • NA

Rereco

Store Results

MonteCarlo

Agent Issues

Agent redeployment

  • Can we make sure that all the agents are updated on the twiki?
  • We have 2 left to update: 311 and 219.

RequestMgr2 Migration

Merging Scripts

RelVal - Andrew

AOB

-- JenniferAdelmanMcCarthy - 2016-04-06

Topic revision: r4 - 2016-04-07 - JenniferAdelmanMcCarthy
 