%TOC{title="Workflow Team Meeting July 15, 2013"}

INDICO Link: https://indico.cern.ch/conferenceDisplay.py?confId=

Attending

Personnel

Jul 9 --> Jul 16 Sunil
Jul 16 --> Jul 23 Xavier
  • Jen will be taking some vacation time in mid-August; exact dates TBD

Issues last week

  • Over the weekend, Couch replication went down on 216, causing MC production to come to a halt

  • Diego was able to fix it Monday morning. How hard is the fix? Do we need Diego/Seanchan to do this, or is this something operations should do? We can't be losing an entire weekend of processing on a regular basis

  • Problems with updates to the closeout script

    • The closeout script was marking WFs as duplicates when they had no duplicate issues

    • Jacob and Jen spent Friday debugging and verifying that it is OK to close out ReDigi WFs that "only failed duplication"

    • This issue is also affecting Step0. Diego fixed the logic for Step0, but ReDigi is still broken

Site Issues

Agents

Workflows

  • Continuing work on the resubmission script
  • Need to address Step0/1 issues

IEEE Paper

Draft Outline #1

  • Introduction (why we need to run so many simulations, why we need to do a re-reconstruction of the data) (Edgar/Jen)
  • A brief discussion of what the different types of workflows are, and how they are processed differently (Diego/Jen/Edgar)
  • Monitoring for T1 & T2 sites (Diego/Jen/Edgar)
  • How we ran prior to 2011
    • ProdAgent vs WMAgent (Diego/Alan) (focus on differences and improvements)
    • Reprocessing and Production (Jen/Xavier) (how this was handled with ProdAgent and why we needed to move to another framework)
  • How we ran with WMAgent (after 2011)
    • WMAgent/ReqMgr/WorkQueue (Diego/Edgar/Alan) (general comment on how it works)
    • PREP/ReqMgr interaction (Vincenzo?)
    • Organization of the workflow team and operations around it (Edgar)
  • Achievements
    • Events reconstructed (L3s)
    • Usage of the grid (Edgar/Jen/L3s)
  • Conclusions / Outlook (Edgar/Jen)

Action Items

  • Recovery workflows - Jen - ongoing
    • Diego got us an updated recovery workflow script!
    • Discovered that the recovery workflows were creating some duplicates
  • We need to add a daily report on workflow stats
    • How many workflows are running, pending, waiting, or stuck:
      • Jen - come up with a template report
      • Edgar - please comment on workflow statuses. I feel like we are not always communicating which workflows are in a waiting status, and for which issues
      • Diego - how hard would it be to have a "manual switch" that sets workflows to "waiting"? If we are waiting to hear back from a site or the requesters before closing out a group of workflows, we could put those workflows in "waiting", so that anything in "complete" really is ready to be closed or needs to be looked at
  • Diego - Can we have the script you wrote for finding stuck workflows?
    • Diego will put it in a public place so we can add it to svn
    • Is it documented yet?
      • Need to pull documentation out of the e-log and put it on the twiki - Jen
  • Problems with dbsTest.py - done???
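
As a starting point for the daily report template, here is a minimal sketch of the counting step. It assumes the workflow list is already available as (name, status) pairs; how that list is obtained is not covered. The status strings, `summarize`, and `format_report` are hypothetical names chosen for illustration and are not existing ReqMgr or WMAgent identifiers.

```python
from collections import Counter

# Assumption: these status names come from the action item wording,
# not from the real ReqMgr/WMAgent status list.
REPORT_STATUSES = ("running", "pending", "waiting", "stuck")


def summarize(workflows):
    """Count workflows per reported status.

    `workflows` is an iterable of (name, status) pairs. Any status that is
    not in REPORT_STATUSES is added to a single "other" bucket.
    """
    counts = Counter(status for _name, status in workflows)
    summary = {status: counts[status] for status in REPORT_STATUSES}
    summary["other"] = sum(
        n for status, n in counts.items() if status not in REPORT_STATUSES
    )
    return summary


def format_report(date, summary):
    """Render the summary as plain text suitable for an e-mail or e-log entry."""
    lines = ["Workflow status report for %s" % date]
    for status in REPORT_STATUSES + ("other",):
        lines.append("  %-8s %d" % (status, summary[status]))
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up example input; real workflow names and statuses will differ.
    example = [("wf_a", "running"), ("wf_b", "waiting"), ("wf_c", "stuck")]
    print(format_report("2013-07-15", summarize(example)))
```

Keeping "waiting" as its own bucket matches the request above to separate workflows that are blocked on a site or requester from those that are really ready to close out.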

AOB

Topic revision: r1 - 2013-07-16 - JenniferAdelmanMcCarthy
 