Workflow Team Meeting - Oct 23 4PM CERN time

Vidyo Link


  • FNAL:
  • CERN:



Oct 16 -> Oct 23 Sara
Oct 23 -> Oct 30 Jasper


Oct 22 -> Oct 30 Ian
  • New US Operator Ian Dyckes


  • Power outage!
    • We somehow managed to survive the power outage rather unscathed. Agents needed to be restarted but it doesn't appear that they lost their minds.
    • This does bring up the fact that we should at least think about what to do in case of a longer, more catastrophic outage. We have agents at FNAL that can run the data, but we would then be running blind. Something to think about.
    • Two key points to work on:
      • Documentation backup: google cache example
      • Alternative communication channels: Gtalk, Skype, AIM, etc.
  • wmLHE+GEN-SIM and DIGI-RECO 53x (Phys14 MC for next run)
  • Possible urgent data coming late in the week? So far all we have is a rumor that something is coming, and we have no idea what!
    • Urgent upgrade, 4 new campaigns starting
    • Week number 3 of having this in our news notes
  • Monitoring scripts: have to point to the global pool. Can we ignore analysis jobs?
    • Production jobs are showing in Dashboard and are being monitored, but the backfill going to the global pool still is not, and should be watched via the WMAgent.
  • Ian, new US operator, is at FNAL for training this week.
  • vocms174 and vocms227 will be given back to Ivan on October 31st. Make sure to copy your stuff before this date.
    • vocms049 (already available) is the replacement for vocms174.
  • Julian is working on a new WorkflowPercentage and closeOut script:
    • Include taskchains and deal with FilterEfficiency
    • Better / faster - run as a cronjob with html output.
    • Waiting for requests to test
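
As an illustration of the FilterEfficiency point above, here is a minimal sketch of how a completion percentage could take the filter efficiency into account and be emitted as an HTML table row. This is not Julian's script: the function names, parameters, and the formula (expected events = requested events × filter efficiency) are assumptions made for this example only.

```python
import html


def completion_percentage(events_produced, events_requested, filter_efficiency=1.0):
    """Hypothetical helper: percentage of the expected output already produced.

    Assumes the expected output is events_requested * filter_efficiency.
    """
    if not 0 < filter_efficiency <= 1:
        raise ValueError("filter_efficiency must be in (0, 1]")
    expected = events_requested * filter_efficiency
    if expected <= 0:
        return 0.0
    return 100.0 * events_produced / expected


def to_html_row(workflow_name, percentage):
    """Hypothetical helper: format one workflow as an HTML table row."""
    return "<tr><td>%s</td><td>%.1f%%</td></tr>" % (
        html.escape(workflow_name),
        percentage,
    )
```

For example, a workflow that requested 1000 events with a filter efficiency of 0.5 and has produced 450 events would be reported as 90% complete rather than 45%.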

Site support

  • John is on vacation. Not sure if there is any site news.

Sara's notes

Agent Issues

Redeployment plan

Production Pool

production mc
cmssrv217 (drain)
218 (drain)
219 (up/new)
vocms216 (drain/redeployed soon)
201 (up/new)
235 (drain)
cmssrv98(up - will be abandoned)
reproc_lowprio step0
vocms202 (drain)
234 (up/new)
85 (up - will be abandoned)
cmssrv112 (up - will be abandoned)
vocms237 (up/new - will be abandoned)

Global Pool

submit1 (up/new)
submit2 (up/new)
  • vocms216 caught a few reproc_lowprio jobs; they will be over soon.
  • Any word on new SL6 machines for CERN?
  • What machine did we finally decide to reshoot to SL6 for retesting?
    • cmssrv95 (old StoreResults)
    • cmssrv112 and cmssrv98 are also good candidates once we get our new machines.




  • cleared out


  • nothing... literally

Store Results

  • NTR


  • running smoothly - had to extend a couple of workflows, but that is it

SL6 testing/backfill

  • Monitoring scripts are not grabbing the Production (aka SL6) WFs properly and need to be modified - Luis

RelVal Andrew

-- JenniferAdelmanMcCarthy - 22 Oct 2014

Topic revision: r3 - 2014-10-23 - JulianBadillo