Workflow Team Meeting - Feb 5 4PM CERN time

Vidyo Link

Attending

  • FNAL:
  • US:
  • EU:

Personnel

  • Julian will be off Feb 10-13 (in Istanbul); calendar already updated.
  • Jen may be off Feb 13-16, pending weather.

News

  • About EU Operators:
    • Two new operators from Seoul Univ. will start training in the next two weeks.
    • Also news from Belgium.
  • FNAL will have a downtime on the 11th
  • Old requests monitoring (Dima)
    • We need to check periodically for requests that have been in production for too long.
    • Typical issues:
      • rejected workflows that were not properly communicated back to PPD and are still shown as "submitted" in McM
      • lost/forgotten workflows. Example: pdmvserv_HIG-Summer12DR53X-01991_T1_US_FNAL_MSS_00212_v0__140502_155516_4571
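
A minimal sketch of how such a periodic check could be automated. It assumes that request names end in `_YYMMDD_HHMMSS_<number>`, as the example above does, and that this stamp approximates when the request was created; both are assumptions, not confirmed behaviour of the request manager. The function names and the 90-day cutoff are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumption: names end in _YYMMDD_HHMMSS_<number>, like the example above.
_STAMP = re.compile(r"_(\d{6})_(\d{6})_\d+$")


def injection_time(request_name):
    """Return the timestamp embedded in a request name, or None if absent."""
    match = _STAMP.search(request_name)
    if match is None:
        return None
    return datetime.strptime(match.group(1) + match.group(2), "%y%m%d%H%M%S")


def stale_requests(request_names, now, max_age=timedelta(days=90)):
    """Return (name, age) pairs for requests older than max_age, oldest first.

    max_age is a hypothetical cutoff; the notes do not define "too long".
    """
    stale = []
    for name in request_names:
        stamp = injection_time(name)
        if stamp is not None and now - stamp > max_age:
            stale.append((name, now - stamp))
    return sorted(stale, key=lambda item: item[1], reverse=True)
```

Feeding it the names of all requests still in production, with `now=datetime.now()`, would yield a list to cross-check against McM. Names without a recognisable stamp are skipped rather than guessed at.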

Site support

  • SAM and HC problems at all sites.
  • The drain list script had some issues, so sites have not been updated. The SSB list is OK, so we can update manually.

Agent Issues

  • JobAccountant was unstable on Feb 3. SeangChan had to run Alan's script a number of times to get things running again. The cause is not yet known, but he spent some time looking into it.

Redeployment plan

  • Submit2 redeployed on Wed
    • Global Pool (production, SL6):
      • submit1 (up)
      • submit2 (up)
      • cmssrv217 (up)
      • 218 (up)
      • 219 (up)
      • vocms0308 (down)
      • vocms0309 (down)
      • vocms0310 (down)
    • Production Pool: all production machines have been retired.
  • CERN machines installed and tested:
    • We will wake them up when one of the FNAL agents reaches 75% disk usage, most probably submit1 (at 61% now).
    • The idea: drain submit1 and submit2 and use them as backup.
    • Please check that you have access to the machines!
    • Also check access to vocms049 (for running scripts).
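
The 75% trigger could be checked with something like the sketch below. It assumes each agent measures its own disk usage locally and reports it as a fraction; how the numbers would actually be collected is not covered in these notes, and the function names are hypothetical.

```python
import shutil

DISK_THRESHOLD = 0.75  # the 75% trigger mentioned in the notes


def used_fraction(path="/"):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def agents_over_threshold(usage_by_agent, threshold=DISK_THRESHOLD):
    """Return the names of agents whose used fraction is at or above threshold."""
    return sorted(
        name for name, fraction in usage_by_agent.items() if fraction >= threshold
    )
```

With the figure quoted above, `agents_over_threshold({"submit1": 0.61})` returns an empty list, so the CERN machines would stay asleep for now.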

Workflows

  • Backfill again: two TMNT (Teenage Mason's Nuclear Trolls)
    • 1x10^9 events, ~770K jobs, priority 4.5K
  • Priorities are working fine; however, see the elog.

ReDigi

MiniAOD

Rereco

Store Results

  • 1 wf waiting for resources at FNAL

MonteCarlo

  • A huge load of RunIIWinter15GS was injected yesterday: 61 in acquired, 9 running.

RelVal Andrew

AOB

-- JenniferAdelmanMcCarthy - 2015-02-04

Topic revision: r5 - 2015-02-05 - JohnArtieda