Workflow Team Meeting - March 17, 4PM CERN time, 10AM FNAL time

Vidyo Link

Attending

  • FNAL: Jen, Jorge, SeangChan
  • US: Ali
  • CERN: Alan, Dima, Paola, Andrew

Personnel

  • Jorge March 24-25
  • Jorge to Columbia April 15-May 2, Talk on April 27
  • Gaston off on Thursday 17 and Friday 18.

News - Dima

  • For the premixing samples, we want to run 2 identical workflows, 14M events. The first one we run as normal: it runs wherever, with ACDCs, etc.
  • The 2nd runs in a clean, deterministic way: it goes to the HP agent and the T1s, with no ACDCs etc. on it, and we let Physics do the validation and decide what level of
    • We need to make sure we have the whole PU sample locally or we will end up with issues, and the flag needs to be set for site lists.
    • Alan will send around an email with the details we need to run the workflow deterministically.
  • The April Fools Campaign is coming and they are testing right now.

3 top issues affecting production

  • pdmvserv_FSQ-RunIISpring15PrePremix-00004_00008_v2_AVE_50_BX_25ns_160309_183357_8225
    • Lots of merge jobs caused the networking and storage at the T1s to melt down when we attempted to run this at high priority.

Site support - Gaston


Transfers - Jorge

Workflows

ReDigi

MiniAOD

TaskChains

StepChain

  • NA

Rereco

Store Results

MonteCarlo

Agent Issues

  • Replication has been having issues all week.
    • On the CMS server side, replication keeps starting over. The patch is not sending the deleted documents to the CMS web site. Replication is working, but it restarts from 0% before it finishes.
  • ErrorHandler has been unstable on all agents all week. Alan patched all the production agents, which should mitigate the crashes seen with ErrorHandler (the patch ONLY applies to workflows assigned from now on).

Agent redeployment

  • Alan and Paola are taking care of draining agents vocms303, vocms0308, cmssrv218 and cmssrv217.
    • If we can wait until the redeployment to do the reboots for the security patch, that is best.
    • The above agents are ready to reboot.
  • We will be using a different interface to Condor, so be aware of weird issues and alert the developers ASAP when we see them.
  • plan is to redeploy all the agents before the April

RequestMgr2 Migration

RelVal - Andrew

  • In ReqMgr2, resubmit.py doesn't work for RelVal: it tries to submit to the injection API what should be submitted to the workflow assignment API. Andrew needs help going through this.

L3 discussion - Ajit, Jean-Roch, Matteo

Opportunistic Resources

Automatic Assignment And Unified Software

AOB

-- JenniferAdelmanMcCarthy - 2016-03-16

Topic revision: r3 - 2016-03-17 - JenniferAdelmanMcCarthy
 