Workflow Team Meeting - May 28, 4PM CERN time, 9AM FNAL time
Vidyo Link
Attending
- FNAL: Jen, Jorge, Luis, SeangChan
- US: Ajit, Dima, Ian, Jean-Roch
- CERN : Julian, Andrew, Alan
- EU:
Personnel
- Jen unavailable from noon Fri-Monday morning
- This will be Ian's last week; he is switching to ATLAS
- Jen needs to start bugging people about more US operators
News - DIMA
- Need to invalidate ~200 requests. This should be done on the PPD side; they will reset and resubmit.
- The requests were listed last night and the invalidation is already done. New work has not been resubmitted yet.
- We have no way of moving requests from announced to rejected in the ReqMgr.
- The state can be changed manually if really needed, but as written announced-archived is a final resting place.
- We do not have a timeline for the replacements.
- We may have some urgent requests for first paper. Probably nothing significant.
- We will need some MC on first data; it will be small.
- Workflows assigned to T2_CH_CERN can have jobs sent to AI and HLT. That is OK for HLT but not for AI.
3 top issues affecting production
- Ongoing issues with workflows being stuck in acquired for long periods of time.
- Alan and Seangchan solved it! - 70K jobs running (100K in the whole pool)
- Global WQ not updating block location (thinks stuff is only at T0_CH_CERN) https://cms-logbook.cern.ch/elog/Workflow+processing/20515
- Symptoms: Workflows that have 0 errors but < 100% lumis (in all datasets). Stuff stuck in acquired and running-closed.
- Alan is debugging it.
- Workflows get stuck close to completion and need to be force-completed. This is happening more often than it used to. Are we being impatient, or is this related to the agents-getting-stuck problem? Is force-completing things causing issues?
- Julian: IMO, not an easy answer to that. We already discussed this.
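The symptom noted for the Global WQ block-location issue (0 errors but < 100% lumis in all datasets, stuck in acquired or running-closed) could be written as a simple filter when scanning workflow summaries. This is only a sketch: the `WorkflowSummary` record and its field names are hypothetical and do not correspond to any ReqMgr or WMStats schema.

```python
from dataclasses import dataclass, field

# States named in the notes as where affected workflows get stuck.
STUCK_STATES = {"acquired", "running-closed"}


@dataclass
class WorkflowSummary:
    # Hypothetical record; field names are illustrative only.
    name: str
    status: str
    error_count: int
    # Output dataset name -> percent of lumis produced so far.
    lumi_percent: dict = field(default_factory=dict)


def matches_symptom(wf: WorkflowSummary) -> bool:
    """True if the workflow shows the symptom described in the notes."""
    if wf.status not in STUCK_STATES:
        return False
    if wf.error_count != 0:
        return False
    # Every output dataset must be below 100% lumis. A workflow with no
    # recorded datasets also passes this check (nothing produced yet).
    return all(pct < 100.0 for pct in wf.lumi_percent.values())


def find_candidates(workflows):
    """Return the names of workflows matching the symptom."""
    return [wf.name for wf in workflows if matches_symptom(wf)]
```

A workflow with errors, or with any dataset already at 100%, is deliberately excluded, since the notes describe the symptom as error-free workflows that are incomplete everywhere.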
Site support - John
Workflows
- RunIISpring15DR74: 495 assig, 27 acq, 336 runn, 16 comp, 58 ann (1130 at the beginning), so around 40% through.
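The notes do not say how the "around 40%" figure was derived. As a rough check, one possible reading is to count workflows that are running, completed, or announced, and compare them either with the currently tracked total or with the initial 1130. The dictionary keys below just reuse the abbreviations from the notes, and the choice of which states count as "through" is an assumption.

```python
# Status counts as quoted in the notes for RunIISpring15DR74.
counts = {"assig": 495, "acq": 27, "runn": 336, "comp": 16, "ann": 58}
initial_total = 1130  # number of workflows at the beginning


def progress(counts, initial_total):
    """Return (tracked, started, fraction_of_tracked, fraction_of_initial)."""
    tracked = sum(counts.values())
    # Assumption: "through" means running, completed, or announced.
    started = counts["runn"] + counts["comp"] + counts["ann"]
    return tracked, started, started / tracked, started / initial_total


tracked, started, frac_tracked, frac_initial = progress(counts, initial_total)
print(f"tracked now: {tracked}, running or beyond: {started}")
print(f"fraction of tracked: {frac_tracked:.0%}")
print(f"fraction of initial: {frac_initial:.0%}")
```

Under this reading the two fractions come out at roughly 44% and 36%, which brackets the quoted figure.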
Rereco
Store Results
- new store results popped into the system this week: T3_US_Cornell
Agent Issues
Redeployment Plan
L3 discussion - Ajit, Jean-Roch, Matteo
Opportunistic Resources - Stefan
HLT
SDSC
Automatic Assignment And Unified Software
AOB
Last Agent status
- Next plan: balance FNAL and CERN agents (drain vocms0311, wake one of the cmssrv's)
--
JenniferAdelmanMcCarthy - 2015-05-27