Workflow Team Meeting - Feb 12 4PM CERN time

Vidyo Link

Attending

  • FNAL: Jen, SeanChan, Dave, Mattio
  • US: Ajit
  • EU: Vincenzo, Alan

Personnel

  • Julian will be off Feb 10-13 (in Istanbul); calendar already updated.

News

  • About EU Operators:
    • Jason Lee - has requested accounts
    • Youn Jung Roh: does not have FNAL accounts? We should make sure she is set up properly.
    • Also news from Belgium.
      • in principle, we will still have Sara for 2 more months; she will leave before summer
      • Xavier will have 4 wks for us
      • new postdoc in group that we will have for next year
      • Xavier will put us in contact with the people he is arranging; we need to work on giving better feedback to the operators
      • give shifters more specific tasks.
  • recovery from FNAL downtime
    • All FNAL agents were rebooted and brought back up
      • submit1, 217 and 219 were giving disk errors, so we left them in drain for troubleshooting now that we have the CERN agents up
        • The RAM disk had been filling up: it was too small for the number of jobs we had in the system. Fixing it meant shutting down condor, resizing the partition, etc., so the only good time to do it was during yesterday's downtime. On submit2 and 217, contact was lost with the jobs in the running state, and they moved from running to idle. The agent should handle this; the jobs will just run longer. We don't think we actually lost any jobs. About 20,000 running jobs were kicked back to idle. Otherwise, we did pretty well.
    • Site brought out of drain at ~4PM (11PM CERN time)
  • WMStats is rotating, so it may not be reliable today.
  • vocms310: cannot log in
  • PhEDEx issues over the weekend

Site support

  • Site Support
    • new "Prod status - manual changes" metric (drain list) dedicated only for manual changes made directly in SSB.
    • "Prod status" will be controlled automatically and will continue feeding WMagents
    • FNAL out of drain

[Figure: prod.png]

    • T2_EE_Estonia had some problems on Tue, Feb 10.
      • Test workflows: how were they before that date? We will have to resend them once the site fixes its problems.

Agent Issues

Redeployment plan

  • Submit2 redeployed on Wed
    • Global Pool, production SL6:
      • submit1 (drain)
      • submit2 (up)
      • cmssrv217 (drain)
      • 218 (up)
      • 219 (up)
      • vocms0308 (down)
      • vocms0309 (up)
      • vocms0310 (up)
      • vocms053 (relVal) (up)

Workflows

  • we are in the tails of everything; 3 workflows are still stuck because of dataset issues

ReDigi

  • a number of workflows will need ACDC due to submission errors at FNAL
  • other Upgrade workflows are running smoothly

MiniAODs

Rereco

Store Results

MonteCarlo

RelVal Andrew

AOB

-- JenniferAdelmanMcCarthy - 2015-02-11

Topic revision: r4 - 2015-02-12 - JenniferAdelmanMcCarthy
 