Workflow Team Meeting - May 15

Vidyo Link



May 8 -> May 15: Jasper
May 15 -> May 22: Sara


  • Do we need to change the closeout script requirements for PhEDEx? Dave, this will be your discussion to lead.
  • Status of syncing up and recovery after the WMStats problems last week. SeangChan, this will be your part of the discussion to lead.
    • add any missing permanent documents by hand:
      • the definitive list of missing documents has yet to be generated
      • we copy them by hand if they exist in the current database, otherwise we re-generate them from ReqMgr DB (with loss of some document information).
  • UPG not "highest priority" anymore, but still High Priority.
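
The recovery rule for missing permanent documents noted above (copy if the document exists in the current database, otherwise regenerate it from ReqMgr DB) could be sketched roughly as follows. This is only an illustration: plain dictionaries stand in for the databases, and the function name, the `regenerated` flag and the document layout are assumptions, not the actual WMStats or ReqMgr API.

```python
def recover_missing_documents(missing_ids, current_db, reqmgr_db, target_db):
    """Fill target_db with each missing document, copying where possible.

    All four arguments are hypothetical stand-ins: missing_ids is the list of
    missing document ids, and the three *_db arguments are dicts keyed by id.
    """
    copied, regenerated, unrecoverable = [], [], []
    for doc_id in missing_ids:
        if doc_id in current_db:
            # Preferred path: the full document still exists, so copy it as-is.
            target_db[doc_id] = dict(current_db[doc_id])
            copied.append(doc_id)
        elif doc_id in reqmgr_db:
            # Fallback: rebuild from the request record only. Anything that was
            # not stored in ReqMgr DB is lost, so mark the document accordingly.
            target_db[doc_id] = {"request": dict(reqmgr_db[doc_id]),
                                 "regenerated": True}
            regenerated.append(doc_id)
        else:
            unrecoverable.append(doc_id)
    return copied, regenerated, unrecoverable
```

Returning the three lists separately makes it easy to report which documents lost information and which could not be recovered at all.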

Jasper's notes

Agent Issues

  • Jobs couldn't be aborted:
    • Alan & SeangChan applied a patch, and now it's working (Elog discussion)
  • General network problem at CERN: affected GlideIn Collector (generalized job drop on Friday).
  • cmssrv98 and cmssrv112: host certificate expired, and couch replication issues (Elog discussion)
  • Step0 agent's hack:
    • Applies when a request asks for more events than the input LHE file has
    • This situation produces a 'Products Unmergeable Error'
    • If the hack is applied, empty files are ignored and events are merged.
    • The hack is not compatible with the most recent WMStats version.
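
As a rough illustration of the behaviour described for the Step0 hack (empty files are ignored and the remaining events are merged), a minimal sketch is given below. It is not the agent's actual patch: the function name and the `lfn` / `events` fields are hypothetical, chosen only to show the idea.

```python
def merge_ignoring_empty(files):
    """Merge a list of output files, skipping those with no events.

    Each file is a dict with hypothetical keys 'lfn' (file name) and
    'events' (number of events it contains).
    """
    # Drop empty files so they cannot block the merge.
    non_empty = [f for f in files if f["events"] > 0]
    return {
        "inputs": [f["lfn"] for f in non_empty],
        "events": sum(f["events"] for f in non_empty),
    }
```

With three inputs holding 100, 0 and 50 events, the sketch merges only the first and third file.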

Workflow issues

  • Workflow status mismatch 14488
    • I only checked the Redigi. Julian, did you check the MC for this?
    • Julian didn't find anything weird, just the usual hiccups and missing job information
  • Redigi
    • EXO-Summer12DR53X: WFs that failed 100% due to input pileup datasets being deleted. 65 of them!
    • Yes, I know I'm way behind going through the rest. I'm tackling the big issues first, then will drill down into the ones that need to be looked at one by one.
    • Redigi's in failed state
      • Let's discuss the question here: what is the origin of this failed status? We don't see it in the GENSIM/FSIM.
      • Why do we keep ending up with "good workflows" stuck in failed, while WFs like the EXO-Summer12DR53X ones that failed out miserably move through to complete?
  • Julian's question: What is this "StoreResult" thing that we are supposed to take over?

Site issues - John, please fill this in!

Andrew's questions

  • Last week's meeting notes stated that Andrew would be back today! Welcome back. So what questions do you have for us? ;)
-- JenniferAdelmanMcCarthy - 15 May 2014
Topic revision: r2 - 2014-05-15 - JulianBadillo