Vidyo Link


  • John, Seangchan, Dave, Jen, Luis
  • Julian, Andrew


Feb 11 -> Feb 18 Sunil
Feb 18 -> Feb 25 ???


  • The return to production with DBS3 went amazingly well! Thank you again, everyone, for your hard work! Hope everyone managed to grab a little breather.
    • Lessons learned?

Agent Issues

  • PhEDEx/DBS3 error?
    • Fix made and changes applied to the WMAgents.
  • All the agents were reinstalled with a few changes:
    • No mc-highprio team.
    • vocms237 with no step0 patch (zero-event files will be reported as a request error)
    • vocms85 in the mc team, not started yet, will be started as a backup
  • DBS3Upload crashing on Thursday 12804:
    • Incomplete SE information reported by jobs.
    • Seangchan and Luis ran a recovery for the missing blocks
    • GitHub issue: 4964
  • DBS3 Upload time-lag, blocks may be uploaded after the workflow is complete:
    • Wait 18h to resubmit workflows?
    • How to check the completion-time?
    • Add this wait to the closeout script?
    • Seangchan will open a GitHub issue. We will eventually have a new state; for now the workaround is to wait until the time the WF moves to closeout + 18 hrs.
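
The interim workaround above (time the WF moves to closeout + 18 hrs) could be sketched as a simple check. This is only an illustration: the function name, the constant, and the example timestamps are hypothetical, and how the closeout time is actually obtained is still the open question noted above.

```python
from datetime import datetime, timedelta

# Assumed wait, taken from the "closeout + 18 hrs" workaround in these notes.
DBS3_UPLOAD_LAG = timedelta(hours=18)


def safe_to_resubmit(closeout_time, now, lag=DBS3_UPLOAD_LAG):
    """Return True once `lag` has elapsed since the workflow moved to closeout.

    Hypothetical helper: `closeout_time` must be supplied by the caller.
    """
    return now - closeout_time >= lag


# Made-up timestamps, for illustration only.
closeout = datetime(2014, 2, 17, 9, 0)
early = safe_to_resubmit(closeout, datetime(2014, 2, 17, 20, 0))  # 11 h elapsed
later = safe_to_resubmit(closeout, datetime(2014, 2, 18, 3, 0))   # 18 h elapsed
print(early, later)
```

Keeping the lag as a parameter would make it easy to change once the new state replaces the fixed 18 hrs wait.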

Workflow Issues

  • Closeout script problems - Julian, how far did you get on this?
    • Already fixed and running; it was an error in the way I used the DBS3 API.
  • The monitoring script that reports to dashboard:
    • Julian is working on that
  • MonteCarlo
    • NTR
  • ReDigi/ReReco
    • Cloned WFs assigned
    • Need to go through them and make sure everything is OK now. We have 1 with duplicates and several that PhEDEx is not updating.

Site Issues:


  • When are we doing this? April sounds good? Validate with PPD
  • Plan (Scratch):
    1. Drain the reproc_highprio agents
    2. Create a "production" team:
      • vocms85 + ex_reproc_highprio
    3. Drain one of the reproc_lowprio and one of the mc
    4. Switch assignment to the new team.
    5. Add the agents to the new team as they are drained
    6. Drain the rest of the agents
    7. At the end:
      • One team with 5 agents + 2 backup (will they be enough?)

Release Validation

  • WMAgent parameters issue
  • High-memory RelVal workflows
    • How can JINR set a limit of 8 GB per core if each worker node has 12 cores and 48 GB RSS total?
  • Slow transfers from FNAL
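
For the JINR memory question above, the arithmetic behind the concern can be written out as a small sketch. It uses only the numbers quoted in the notes (12 cores, 48 GB RSS per worker node, 8 GB limit); the function names are hypothetical, and it assumes memory is shared evenly across cores.

```python
import math


def memory_per_core_gb(total_gb, cores):
    """Average memory available per core if the node is fully occupied."""
    return total_gb / cores


def cores_needed_for_job(job_memory_gb, total_gb, cores):
    """Cores' worth of memory that one job of the given size consumes."""
    return math.ceil(job_memory_gb / memory_per_core_gb(total_gb, cores))


def max_concurrent_jobs(job_memory_gb, total_gb, cores):
    """Jobs of the given size that fit on one node, limited by memory and cores."""
    return min(cores, int(total_gb // job_memory_gb))


per_core = memory_per_core_gb(48, 12)       # average GB per core
blocked = cores_needed_for_job(8, 48, 12)   # cores' worth of memory per 8 GB job
fit = max_concurrent_jobs(8, 48, 12)        # 8 GB jobs that fit on one node
print(per_core, blocked, fit)
```

Under these assumptions an 8 GB job takes two cores' worth of memory, so a node could run at most six such jobs, leaving half of its cores idle.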

-- JenniferAdelmanMcCarthy - 18 Feb 2014

Topic revision: r4 - 2014-02-18 - JenniferAdelmanMcCarthy