Workflow Team Meeting - May 28, 4 PM CERN / 9 AM FNAL time

Vidyo Link


  • FNAL: Jen, Jorge, Luis, Seangchan
  • US: Ajit, Dima, Ian, Jean-Roch
  • CERN: Julian, Andrew, Alan
  • EU:


  • Jen is unavailable from Friday noon until Monday morning.
  • This will be Ian's last week; he is switching to ATLAS.
    • Jen needs to start bugging people about more US operators.

News - Dima

  • We need to invalidate ~200 requests. This should be done on the PPD side; they will reset and resubmit them.
    • The list was provided last night and the invalidation is already done. The new work has not been resubmitted yet.
    • We have no way of moving requests from announced to rejected in ReqMgr.
    • The state can be changed manually if really needed, but as designed, announced-archived is a final state.
    • We do not have a timeline for the replacements.
  • We may get some urgent requests for the first paper, probably nothing significant.
    • We will need some MC for the first data; it will be small.
  • Workflows assigned to T2_CH_CERN can have jobs sent to both AI and HLT resources. This is OK for HLT but not for AI.

Top 3 issues affecting production

  • Ongoing issues with workflows being stuck in acquired for long periods of time.
    • Alan and Seangchan solved it! 70K jobs are running (100K in the whole pool).
  • Global WorkQueue is not updating block locations (it thinks the data is only at T0_CH_CERN).
    • Symptoms: workflows with 0 errors but < 100% of lumis (in all datasets); work stuck in acquired and running-closed.
    • Alan is debugging it.
  • Workflows get stuck close to completion and need to be force-completed. This is happening more often than it used to. Are we being impatient, or is this related to the agents getting stuck? Is force-completing causing issues?
    • Julian: IMO, there is no easy answer to that; we already discussed this.

Site support - John



  • RunIISpring15DR74: 495 assigned, 27 acquired, 336 running, 16 completed, 58 announced (1130 at the beginning), so around 40% through.



Store Results

  • New Store Results requests came into the system this week: T3_US_Cornell


Agent Issues

Redeployment Plan

RelVal Andrew

L3 discussion - Ajit, Jean-Roch, Matteo

Opportunistic Resources - Stefan



Automatic Assignment And Unified Software


Last Agent status

  • Next plan: balance FNAL and CERN agents (drain vocms0311, wake one of the cmssrv machines).

Production SL6
  • FNAL: cmsgwms-submit1 (up), cmsgwms-submit2 (up), cmssrv217 (ready to wake), cmssrv218 (ready to wake), cmssrv219 (ready to wake)
  • CERN: vocms0308 (up), vocms0309 (up), vocms0310 (up), vocms0311 (drain)

-- JenniferAdelmanMcCarthy - 2015-05-27

Topic revision: r5 - 2015-05-28 - JenniferAdelmanMcCarthy