• Thurs is a CERN Holiday so Edgar and Alan will be off Thurs & Fri
  • Jen needs to attend the release planning meeting Friday at 4 CERN time


Jen, John, Luis (FNAL), Sunil, Edgar, Andrew, Alan

Issues last week

  • 216 blew up on Friday. Do we know how this happened? What can we do to prevent it from happening again?
    • Why did it create so many pending jobs? It was still set for pre-late-binding levels.
    • All agents have been updated.
    • The workflow is HUGE: it has too many lumis, so we will have to throttle it through manually.
  • In2P3 sites - opened ticket
  • 174 back? It is not being used except for submission, so don't worry about it.
  • 201, 235 - should be fine


  • Coming off Shift - Sunil
  • Coming on Shift - Sunil

Site Issues

Sites for Production

| Site in MC | Slots | Status | Notes | Issues |
| T2_RU_PNPI | 176 | skip | to be commissioned | |
| T2_RU_SINP | 50 | drain | to be commissioned | |

  • How to add sites to drain list?
  • How to add sites to SSB Pledges view?


  • How are we going to get 216 running stably again? We need to work through the backlog.
    • When the number of pending jobs drops below 20K, turn JobSubmitter back on; when it rises above 100K, turn it back off. Keep cycling like that until we get through the workflow that is causing the problems.
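
The on/off cycling above is a two-threshold (hysteresis) rule. A minimal Python sketch of that rule follows, for illustration only: the function and constant names are hypothetical, the pending-job counts are made-up sample values, and nothing here talks to a real agent or to the actual JobSubmitter component.

```python
# Thresholds taken from the notes above; the names are illustrative only.
LOW_WATER = 20_000    # pending jobs below this: turn JobSubmitter back on
HIGH_WATER = 100_000  # pending jobs above this: turn JobSubmitter off


def next_submitter_state(pending_jobs, currently_on):
    """Return True if the submitter should be on after seeing pending_jobs."""
    if pending_jobs > HIGH_WATER:
        return False
    if pending_jobs < LOW_WATER:
        return True
    # Between the two thresholds: keep whatever state we are already in.
    return currently_on


def simulate(pending_samples, start_on=False):
    """Apply the rule to a sequence of pending-job counts (hypothetical data)."""
    state = start_on
    history = []
    for pending in pending_samples:
        state = next_submitter_state(pending, state)
        history.append(state)
    return history
```

Keeping the current state between the two thresholds is what stops the submitter from flapping on and off while the backlog sits somewhere between 20K and 100K.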


IEEE Paper

Draft Outline #1

  • Introduction: why we need to run so many simulations, and why we need to do a re-reconstruction of the data (Edgar/Jen)
  • A brief discussion of what the different types of workflows are, and how they are processed differently (Diego/Jen/Edgar)
  • Monitoring for T1 & T2 sites (Diego/Jen/Edgar)
  • How we ran prior to 2011
    • ProdAgent vs WMAgent (Diego/Alan) (focus on differences and improvements)
    • Reprocessing and Production (Jen/Xavier) (how this was handled with ProdAgent and why we needed to move to another framework)
  • How we ran with WMAgent (after 2011)
    • WMAgent/ReqMgr/WorkQueue (Diego/Edgar/Alan): general comment on how it works
    • PREP/ReqMgr interaction (Vincenzo?)
    • Organization of the workflow team and operations around it (Edgar)
  • Achievements
    • Events reconstructed (L3s)
    • Usage of the grid (Edgar/Jen/L3s)
  • Conclusions / Outlook (Edgar/Jen)

Action Items

  • Write twiki on disk/tape separation for T1_IT_CNAF - Edgar
  • Recovery workflows - Jen - suspend
    • The first 2 workflows are completely through, and we are now waiting for people to really look and make sure there are no show stoppers before we do the other 50.
    • Guillemo is bothering JeanRoc about whether people have actually looked at the data.
  • We need to add a daily report on Workflow stats - needs work on debugging
  • A new state for completed and already dealt with ACDC.
  • How many workflows running, pending, waiting, stuck
    • Is it documented yet? yes
    • Luis is working on a script to pull these numbers automatically. But we will still need to manually look at the workflows that may need help being pushed along.
  • Solve the problem of how to use a non-production scram architecture (waiting for Alan to come back)
  • Updating documentation on scripts with github now that we aren't using svn anymore
    • Documentation needs to be updated and everyone needs to start ramping up on github


  • Diego will continue to work on the paper
  • Problem with the creation of WFs if you change the number of files per job
  • CouchDB will be rotated on Wed, so we will be running without being able to watch WMStats on Wed/Thurs; it should be back Friday
Topic revision: r3 - 2013-09-03 - JenniferAdelmanMcCarthy