Workflow Team Meeting June 25, 2013
Attending
Personnel
Jun 18 --> Jun 25: Sara + Sunil (trainee)
Jun 25 --> Jul 2: ???? We talked about Sunil doing it, but there is nothing on the web page.
Site Issues
Agents
- The GlideIn team asked for all schedds to be moved to the CERN collector as the first step toward a global Condor pool. cmssrv98 was the first one, and it was correctly prioritized.
- cmssrv98 was the first agent to be upgraded to the new version 0.9.69a; vocms234 will follow.
- vocms235 - set to drain for upgrade
- vocms202 - upgrade to 0.9.69a requested Monday. Is it done yet?
- We had workflows that were stuck for a couple of days waiting to complete, without anyone noticing. Diego, can you please write us a script that looks at workflows and lets us know if they don't move for 24 hrs?
- cmssrv112 - ran out of disk space for CouchDB; Sara cleaned up the disk and restarted. The old version of the agent was not removed from the machine after the upgrade. Who is responsible for that?
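A minimal sketch of the 24-hour check requested above, not the script Diego is asked to write. It assumes each workflow is available as a record with a name, a status, and the time of its last status change; how those records are fetched is left open. The function name `find_stalled`, the record keys, and the set of end states are all hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: a workflow counts as stuck after 24 hours without a status change.
STALL_THRESHOLD = timedelta(hours=24)

# Hypothetical set of end states; workflows in these states are never reported.
TERMINAL_STATES = {"completed", "closed-out", "announced", "aborted", "rejected"}


def find_stalled(workflows, now, threshold=STALL_THRESHOLD):
    """Return (name, status, idle_time) for each non-terminal workflow whose
    last status change is older than `threshold`, longest idle first."""
    stalled = []
    for wf in workflows:
        if wf["status"] in TERMINAL_STATES:
            continue
        idle = now - wf["last_update"]
        if idle > threshold:
            stalled.append((wf["name"], wf["status"], idle))
    return sorted(stalled, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    # Illustrative records only; names and timestamps are made up.
    now = datetime(2013, 6, 25, 12, 0)
    sample = [
        {"name": "workflow_A", "status": "running",
         "last_update": now - timedelta(hours=50)},
        {"name": "workflow_B", "status": "running",
         "last_update": now - timedelta(hours=3)},
        {"name": "workflow_C", "status": "completed",
         "last_update": now - timedelta(hours=90)},
    ]
    for name, status, idle in find_stalled(sample, now):
        hours = idle.total_seconds() / 3600
        print(f"{name}: no change in status '{status}' for {hours:.0f} hrs")
```

Run from cron once or twice a day and mailed to the shift person, a check like this would have flagged the workflows before they sat for a couple of days.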
Workflows
- Stuck workflows
- R2692_B304 & R2711_B305
IEEE Paper
Draft Outline #1
- Introduction (Why we need to run so many simulations, why we need to re-reconstruct the data) (Edgar/Jen)
- How we ran prior to 2011
- ProdAgent vs WMAgent (Diego/Alan) (Focus on differences and improvements)
- Reprocessing and Production (Jen/Xavier) (How this was handled with ProdAgent and why we needed to move to another framework)
- How we ran with WMAgent (after 2011)
- WMAgent/ReqMgr/WorkQueue (Diego/Edgar) (General comment on how it works)
- PREP/ReqMgr interaction (Vincenzo?)
- Organization of the workflow team and operations around it (Edgar)
- Achievements
- Events reconstructed
- Usage of the grid
- Conclusions / Outlook (Edgar/Jen)
Action Items
- Recovery workflows - Jen - ongoing
- Updating missing lumis - Jen - ongoing
- Making new shift schedule - Xavier
- GitHub issue for the required-OS parameter (REQUIRED_OS = "rhel5", "rhel6", or "any").
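To make the REQUIRED_OS item concrete, here is a minimal sketch of how the three values could be validated and matched against a worker's OS. It only illustrates the intended semantics and is not the WMAgent or Condor implementation; the function names `validate_required_os` and `slot_is_eligible` and the `slot_os` argument are hypothetical.

```python
# The three values listed in the action item.
ALLOWED_REQUIRED_OS = ("rhel5", "rhel6", "any")


def validate_required_os(value):
    """Return `value` unchanged if it is an allowed setting, else raise ValueError."""
    if value not in ALLOWED_REQUIRED_OS:
        raise ValueError(
            "REQUIRED_OS must be one of %s, got %r" % (ALLOWED_REQUIRED_OS, value)
        )
    return value


def slot_is_eligible(required_os, slot_os):
    """Assumed semantics: "any" matches every slot; otherwise the slot's OS
    must equal the requested one exactly."""
    required_os = validate_required_os(required_os)
    return required_os == "any" or required_os == slot_os


if __name__ == "__main__":
    for required in ALLOWED_REQUIRED_OS:
        print(required, "on an rhel6 slot:", slot_is_eligible(required, "rhel6"))
```

Rejecting unknown values up front means a typo in the request fails immediately, rather than producing jobs that never match any slot.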