Workflow Team Meeting - Oct 23 4PM CERN time

 

Vidyo Link

Attending

  • FNAL: Jen, Dave, Luis, Ian, SeangChan, Jorge, Juan
  • CERN: Andrew, Julian, Dima (New Dave)
 

Personnel

EU

Oct 16 -> Oct 23 Sara
Oct 23 -> Oct 30 Jasper

US

Oct 22 -> Oct 30 Ian
  • New US Operator Ian Dyckes

News

  • Dima - first to help make the process more streamlined, also San Diego
  • Power outage!
    • We somehow managed to survive the power outage rather unscathed. Agents needed to be restarted but it doesn't appear that they lost their minds.
    • This does bring up the fact that we should at least think about what to do in case of a longer, more catastrophic outage. We have agents at FNAL that can run the data, but we would then be running blind. Something to think about.
    • Two key points to work on:
      • Documentation backup: google cache example
      • Alternative communication channels: Gtalk, Skype, AIM, etc.
  • wmLHE+GEN-SIM and DIGI-RECO 53x (Phys14 MC for next run)
  • Possible Urgent data coming late in the week? So far all we have is a rumor that something is coming and we have no idea what!
    • Urgent upgrade, 4 new campaigns starting
    • Week number 3 of having this in our news notes
    • Still no news or dates. We think this is a re-run of data we ran in Sept, so the data should be on disk.
      • Dima is tasked with the job of getting more information for us.
  • Monitoring scripts: have to point to the global pool. Can we ignore analysis jobs?
    • Production jobs are showing in Dashboard and being monitored, but the backfill going to the global pool still is not and should be watched via the WMAgent
    • Backfill is now showing up as well to the global pool so we can start using it.
  • Ian, the new US operator, is at FNAL for training this week.
  • vocms174 and vocms227 will be given back to Ivan on October 31st. Make sure to copy your stuff before this date.
    • vocms049 (already available) is the replacement for vocms174.
      • Does not have git. It's an SL6 machine, but it isn't big: a small virtual machine just for running scripts.
      • Let's have Julian ask for git, xrd, and xrootd so we can fetch our logs, and for cvmfs mounting to be added to the puppet for SL6 machines.
  • Julian is working on a new WorkflowPercentage and closeOut script:
    • Include taskchains and deal with FilterEfficiency
    • Better / faster - run as a cronjob with html output.
    • Waiting for requests to test

Site support

  • John is on vacation. Not sure if there is any site news
  • Problems with cpu bound on the site status board: sites would disappear. Adli is catching up but thinks it has been taken care of.

Sara's notes

 

Agent Issues

 

Redeployment plan

Production Pool 

production mc
reproc_lowprio step0
cmssrv217 (drain)
218 (drain)
219 (up/new)
vocms216 (drain/redeployed soon)
201 (up/new)
235 (drain)
cmssrv98 (up - will be abandoned)
vocms202 (drain)
234 (up/new)
85 (up - will be abandoned)
cmssrv112 (up - will be abandoned)
vocms237 (up/new - will be abandoned)

Global Pool

backfill
submit1 (up/new)
submit2 (up/new)
  • vocms216 caught a few reproc_lowprio jobs; they will be over soon.
  • Any word on new SL6 machines for CERN?
  • What machine did we finally decide to reshoot to SL6 for retesting?
    • cmssrv95 (old StoreResults)
    • also cmssrv112, 98 are good candidates once we get our new machines.

Workflows

 

ReDigi

miniaod's

  • cleared out

Rereco

  • nothing... literally

Store Results

  • NTR

MonteCarlo

  • running smoothly - had to extend a couple workflows but that is it

SL6 testing/backfill

  • Monitoring scripts are not grabbing the Production (aka SL6) WFs properly and need to be modified - Luis

RelVal Andrew

  • Why is it possible to move aborted to rejected? Andrew says it is possible but it shouldn't be. Aborted should only move to aborted archived.

-- JenniferAdelmanMcCarthy - 22 Oct 2014

Topic revision: r4 - 2014-10-23 - JenniferAdelmanMcCarthy
 