Workflow Team Meeting - March 19, 4 PM CERN time / 10 AM FNAL time

Vidyo Link

Attending

  • FNAL: Jen, Luis
  • US: Ian
  • CERN: Julian, Andrew, Dima, maybe Alan later
  • EU: JeanRoc

Personnel

  • Luis to Colombia around May 1-10 (dates tentative), talk on the 6th
  • Julian out over Easter Holidays March 28-April 4
  • Jen may be taking time off around Easter, not sure of the dates yet
  • Seangchan off a few days March 28-April 4
  • Matteo is at CERN this week, Moriond next week

News

  • New EU Operators!
    • James Keaveney - Belgium
    • Haneol Lee - SNU Korea
  • JeanRoc will be joining to automate putting workflows into the system; the decision was made by Christoph and Christoph
  • Certs e-mail: there was an issue with certs. How many people got hit? I think the conclusion is that most people didn't need to do anything
  • Files disappearing at CNAF Buffer: https://hypernews.cern.ch/HyperNews/CMS/get/comp-ops/2190/1/3/1/1/1/1/1/1/1/1.html
    • I'm seeing a similar issue at KIT for pdmvserv_B2G-Phys14DR-00071_00109_v0__150305_210045_8232. Since this is a PhEDEx issue, could it be the same problem?
    • Seems like it was transient and the issue is understood; we should kill and clone.

3 top issues affecting production

  • Couch issue - central couch went down, local couches had issues when it came back
    • workqueue issues - Seangchan deleted some elements, so there was an error that got cleared when we restarted all the couches
    • workqueues started pulling work; we still do not fully understand the problem, so we aren't sure how to prevent it in the future
  • disappearing files at CNAF https://hypernews.cern.ch/HyperNews/CMS/get/comp-ops/2190/1/3/1/1/1/1/1/1/1/1.html
  • glidein problems: we were running few jobs at CERN, and for 3-4 days we didn't run anything at CERN. Now that it's fixed we have lots of backfill running

Site support

Opportunistic sites

Workflows

  • Found a problem with the recovery script: when you run it on workflows that have whitelists, it ignores the whitelist when it makes the recovery workflows.
    • This still needs attention. Seangchan wants to do it himself along with the closeout script, and he wants to port it to reqmgr2. We need to keep a config that ops can change at any time, but pull the rest into Request Manager 2.
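
The intended whitelist behavior could be sketched roughly as below. This is only an illustration under assumptions, not the actual recovery script: the function name make_recovery_request, the dictionary keys (RequestName, OriginalRequestName, SiteWhitelist) and the "_recovery" suffix are all hypothetical and chosen for the example.

```python
def make_recovery_request(original, overrides=None):
    """Build a recovery request dict that keeps the original site whitelist.

    `original` is a plain dict standing in for a workflow description.
    All key names here are assumptions made for this sketch.
    """
    recovery = {
        "OriginalRequestName": original["RequestName"],
        # Hypothetical naming scheme for the recovery workflow.
        "RequestName": original["RequestName"] + "_recovery",
    }
    # Copy the whitelist instead of dropping it, so the recovery
    # workflow is restricted to the same sites as the original.
    recovery["SiteWhitelist"] = list(original.get("SiteWhitelist", []))
    # Let an operator override any field explicitly if needed.
    if overrides:
        recovery.update(overrides)
    return recovery


original = {
    "RequestName": "example_workflow",
    "SiteWhitelist": ["T1_IT_CNAF", "T1_DE_KIT"],
}
recovery = make_recovery_request(original)
print(recovery["SiteWhitelist"])
```

Copying the list rather than sharing it means later changes to the recovery request do not alter the original, and the optional overrides argument leaves room for a setting ops can change at any time.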

ReDigi

  • alahiff_HCA-Fall14DR73-00003_00007_v0__150318_074242_9537 failed because of lumi_list (some bug in DBS3 API)
    • The permanent solution needs to wait for the next cmsweb upgrade.
    • Alan suggests splitting into multiple requests (https://cms-logbook.cern.ch/elog/Workflow+processing/19280) if this is really urgent. Julian will do it if he has to (to be confirmed)
    • reject and ask for a better solution in staging wf
  • IN2P3 is still having staging issues. They put the files I needed onto disk manually, but now we are dealing with workflows exceeding MaxRSS.

TaskChains

  • Two TaskChains ready to be announced.
  • One running.

MiniAODs

Rereco

Store Results

  • One ticket this week: 112454

MonteCarlo

  • Nothing to report

Agent Issues

  • Couch issues at CERN, followed by stuck local couches, with no complaints from WMStats, only from operators.
    • why did it take so long to debug?

Redeployment Plan

production SL6

FNAL                             CERN
cmsgwms-submit1 (up)             vocms0308 (up)
cmsgwms-submit2 (ready to wake)  vocms0309 (ready to wake)
cmssrv217 (up)                   vocms0310 (ready to wake)
cmssrv218 (up)
cmssrv219 (up)

RelVal Andrew

  • What is the status of the batches - talk to Valintine
    • should be grouped by person
    • 390 issues for WMCore - should we stop making issues? do we need a cleanup? Somebody needs to prioritize? CompOps?
      • Julian, Andrew, Alan and Dima will get together, look at issues, prioritize and clean them up

AOB

-- JenniferAdelmanMcCarthy - 2015-03-18

Topic revision: r4 - 2015-03-19 - JenniferAdelmanMcCarthy
 