Workflow Team Meeting - Nov 20 4PM CERN time & US Meeting Tues Nov 11 at 1PM FNAL time

Vidyo Link



Nov 14 -> Nov 20 Sara
  • About the chat-schedule poll (we should discuss this by the end of the meeting):
    • Most popular slot is still the same (Thursdays 4-5pm CERN time)
    • Sean, Xavier and Ian may not be able to attend, so it would be better to fix separate meeting times for EU and US operators
  • Julian will be in Colombia 25th Nov - 25th Dec; he plans to work online.
    • Unavailable on 25th Nov
  • Luis will be in Columbia from Dec 20 through New Year; he will get us exact dates soon


Site support

EU shift notes

US Shift notes

Agent Issues & Redeployment plan

  • A cronjob on cmst2 is doing 8M queries/day to DBS - shut it down?
    • it runs:
      python -d '/*/*/GEN-SIM' -s 2014-04-05 &
      python -d '/*/*/AODSIM' -s 2014-04-05 &
      python -m -d '/*/*/USER' -u -s 2014-06-20 &
  • FNAL agents - trouble connecting to couch (network errors)
  • cmssrv21x are almost drained (no jobs running) - check and stabilize before switching
  • Production Pool:
    • production (SL6): cmssrv217 (drain), 218 (drain), 219 (drain)
    • mc (SL5): vocms216 (up), 201 (up), 235 (up)
    • reproc_lowprio (SL5): vocms202 (up), 234 (up)
    • step0 (SL5): vocms237 (up - will be abandoned)
  • Global Pool:
    • backfill+production (SL6): submit1 (up), submit2 (up)
  • cmssrv218 and 219 are without any jobs, so Dave will ask Krista to switch them to the global pool.
    • once this is done, Alan will set the proxy
    • we'll keep in mind that different agents on the same teamname should have different agent numbers
  • cmssrv217 still has some jobs, so we'll wait till Monday.
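
A minimal sketch of the agent-number rule noted above (agents on the same teamname need different agent numbers). The host/team/number triples below are hypothetical, written out by hand for illustration; they are not read from any real agent configuration, and the helper name is made up for this example.

```python
from collections import defaultdict


def find_agent_number_clashes(agents):
    """Return {team: {agent_number: [hosts]}} for numbers shared within a team.

    `agents` is an iterable of (host, team, agent_number) tuples.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for host, team, number in agents:
        seen[team][number].append(host)

    clashes = {}
    for team, by_number in seen.items():
        shared = {n: hosts for n, hosts in by_number.items() if len(hosts) > 1}
        if shared:
            clashes[team] = shared
    return clashes


# Hypothetical assignment after moving cmssrv218/219 to the global pool.
# Team name and agent numbers are assumptions, not the real values.
agents = [
    ("submit1", "production", 0),
    ("submit2", "production", 1),
    ("cmssrv218", "production", 1),  # same number as submit2: a clash
    ("cmssrv219", "production", 2),
]

print(find_agent_number_clashes(agents))
```

An empty result means no two agents in the same team share a number; the same number used in different teams is not reported.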



  • Phys14DR (top priority):
    • Duplicated lumis elog
    • Jen will kill and clone a couple of those to test if the failure is consistent.
  • miniaods:
    • 13 recently assigned, waiting for T1 slots (keep an eye).
    • still 26 running
    • 11 completed
    • Jen will try the old-school acdc on the Merge task to see if any acdc documents were produced


  • NTR

Store Results

  • Three store results still running (3rd acdc) - they had performance kills (RSS) 97% of events


  • TP2023HGCALGS and TP2023SHCALGS (top priority):
    • 26/66 requests still running. Most of them have been handled.
    • Some of them affected by site issues.
  • task_SMP-Summer12WMLHE: running with minor failures.
    • they are running on step0 to limit the number of concurrent jobs.
    • keep an eye because they cannot be acdc'd
  • Dima asked about RunIIFall14GS: 1 acquired (assigned today), 2 at 99% (force-completed) and almost ready to be announced.

RelVal Andrew

-- JenniferAdelmanMcCarthy - 2014-11-11

Topic revision: r4 - 2014-11-20 - JulianBadillo