Workflow Team Meeting - April 9, 4 PM CERN / 9 AM FNAL time

Vidyo Link

Attending

  • FNAL: Jen, Luis, Matteo, SeangChan
  • US: Ajit, Stefan
  • CERN: Julian, Andrew, Alan, Dima
  • EU:

Personnel

  • Luis going to Colombia around May 1-15; will work remotely, but only a little
  • Jorge to Colombia around the same time
  • Sara is the shifter

News

  • Not much is changing. We need to clean up the digi-reco workflows that are sitting around; they are trying to get their act together. The 25 ns code is coming.
  • 73 digi-reco campaign: the upgrade work is going OK.
  • Any low-priority gen-sim workflow could be used for opportunistic workflow testing at SDSC. Note that the data will have to be read from FNAL, so running PU workflows could saturate our ability to do this.
    • We want to be able to run redigi/reco on the opportunistic sites. Can we really do this, or is reading the input from other sites going to give us headaches?
    • Testing redigi/rereco on T2s
    • Are we already using some? How are they doing?

Top 3 issues affecting production

  • RAL - still having ongoing problems with this site. Nobody I've talked to seems to know anything. Need to double-check: is it just the hot-file issue? If so, why are jobs succeeding at FNAL and JINR?

Site support

  • RAL - reassign all workflows that use the problematic CMSSW version to other sites, then send some "easy" work to RAL and see if we can get it to stay green
    • Jen
  • TIFR - maxwalltime in the config does not match their job requirements; that is why jobs are not going there.

Opportunistic sites

  • Not much to add: workflows are going slowly and encountering some issues, but the problems are all minor. NERSC: no progress on the CVMFS issue. They are trying a local mirror, but it isn't working, so we are trying to get them to follow the SDSC example of working over NFS. In principle it works.

Workflows

  • Cleanup of files that were sent to _MSS
    • Do we have a full list? Who is going to do the cleanup?
    • We need to produce this list and do the cleanup - Matteo/JeanRoc

ReDigi

  • Upgrade work is ongoing; some of the non-upgrade workflows are getting pretty old

TaskChains

  • ACDCs are creating duplicate lumis (when run on T1s); see the elog
    • Not sure it will be easy to reproduce.
    • Without looking... does this only affect MC, or do we need to worry about it for redigi as well?
      • We cannot do ACDC for any workflow that does event splitting. Julian needs to invalidate files to clean this up.

MiniAOD

  • nothing to report

Rereco

  • nothing to report

Store Results

  • One open ticket; Julian is going to work on it

MonteCarlo

  • Nothing to report beyond some workflows with efficiency problems; the issue is understood and e-logged. Somebody needs to take action on them and announce it.

Agent Issues

Redeployment Plan

  • We'll start draining these four:
    • cmssrv217, cmssrv218, cmssrv219 and cmsgws-submit1
    • vocms0308 (still draining)
  • And we are waking these four up:
    • vocms0309, 0310, 0311 and submit2

RelVal (Andrew)

  • What about the other GitHub issues? Any progress?

L3 discussion

  • Changes to the assignment scripts
    • changes to subscriptions
    • changes to reporting to activity

AOB

-- JenniferAdelmanMcCarthy - 2015-04-13

Topic revision: r6 - 2015-04-16 - JenniferAdelmanMcCarthy
 