• Jen, John, Dorian, Andrew, Diego, Edgar


  • Coming off shift - Sara
  • Coming on shift - Sunil
  • Jen will be on vacation Aug 9-21 and will have very little Internet access during this time.
    • Edgar will not be taking any time off during this time so we are covered
  • Dorian will be watching things during US hours while Jen is on vacation.
    • Dorian will start watching this so you can ramp back up
  • Diego: gone 7-11, around 12-21, then gone for good

Issues last week

  • 98, 112: Couch replication down; the host cert was wrong
    • Seangchan will work with Krista and Tony to fix this issue and find out what changed last Friday that left things broken over the weekend

Site Issues

Sites for Production

| Site in MC     | Slots | Status | Notes                   | Issues                         |
| T2_KR_KNU      | 200   | -      | Jul 30: re-commissioned | Ok                             |
| T2_RU_IHEP     | 700   | -      | Jul 29: re-commissioned | Ok                             |
| T2_AT_Vienna   | 212   | skip   | under commissioning     | 25% failure rate               |
| T2_GR_Ioannina | 94    | skip   | under commissioning     | 100% failure rate              |
| T2_UA_KIPT     | 200   | skip   | under commissioning     | SAM swinst solved on Jul 31    |

  • Issues with commissioning T2_AT_Vienna, T2_GR_Ioannina, T2_UA_KIPT
    • Workflows are assigned but take too long (agent max 50 jobs/site); jobs are killed due to timeout


  • 237 is waiting for an update, but it has a big workflow to go through


IEEE Paper

Draft Outline #1

  • Introduction (why we need to run so many simulations, why we need to do a re-reconstruction of the data) (Edgar/Jen)
  • A brief discussion of what the different types of workflows are, and how they are processed differently (Diego/Jen/Edgar)
  • Monitoring for T1 & T2 sites (Diego/Jen/Edgar)
  • How we ran prior to 2011
    • ProdAgent vs WMAgent (Diego/Alan) (focus on differences and improvements)
    • Reprocessing and Production (Jen/Xavier) (how this was handled with ProdAgent and why we needed to move to another framework)
  • How we ran with WMAgent (after 2011)
    • WMAgent/ReqMgr/Workqueue (Diego/Edgar/Alan): general comment on how it works
    • PREP/ReqMgr interaction (Vincenzo?)
    • Organization of the workflow team and operations around it (Edgar)
  • Achievements
    • Events reconstructed (L3s)
    • Usage of the grid (Edgar/Jen/L3s)
  • Conclusions / Outlook (Edgar/Jen)

Action Items

  • Write twiki on disk/tape separation at T1_IT_CNAF - Edgar
  • SVN - IEEE paper - Edgar. Ongoing.
  • Recovery workflows - Jen - suspend
    • The first 2 workflows are completely through; we are now waiting for people to look at them and confirm there are no show stoppers before we do the other 50.
    • Guillemo is asking JeanRoc whether people have actually looked at the data
  • We need to add a daily report on workflow stats - still needs debugging work
    • A new state for completed and already dealt with ACDC.
    • How many workflows running, pending, waiting, stuck
    • Is it documented yet?
      • Need to pull documentation out of the e-log and put it on the twiki - Jen - Done
    • Do we understand how to find the stuck workflows and fix them without Diego's help? SeangChan will have to do surgery now
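
As a starting point for the daily report item above, here is a minimal sketch of the per-state tally. It assumes each workflow is available as a dict with a "state" key; the state names are taken from the bullet list above and the function names are hypothetical, so the actual ReqMgr/WMAgent status names and data source will differ.

```python
from collections import Counter

# Hypothetical state names taken from the action item above; the real
# ReqMgr/WMAgent status names may differ.
REPORT_STATES = ("running", "pending", "waiting", "stuck")


def summarize_workflows(workflows):
    """Count workflows per state; anything else is grouped under 'other'."""
    counts = Counter()
    for wf in workflows:
        state = wf.get("state", "other")
        counts[state if state in REPORT_STATES else "other"] += 1
    # Always report every state, even when its count is zero.
    return {s: counts.get(s, 0) for s in REPORT_STATES + ("other",)}


def format_report(summary):
    """Render the summary as one line per state."""
    return "\n".join(f"{state:<8} {count}" for state, count in summary.items())


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        {"name": "wf_a", "state": "running"},
        {"name": "wf_b", "state": "stuck"},
        {"name": "wf_c", "state": "running"},
    ]
    print(format_report(summarize_workflows(sample)))
```

The proposed new state for completed, already-handled ACDC could be added to REPORT_STATES once its name is settled.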


Topic revision: r4 - 2013-08-14 - EdgarFajardo