Workflow Team Meeting - Sept 17, 4 PM CERN time, 9 AM FNAL time

Vidyo Link

Attending

  • FNAL: Jen, SeangChan, Matteo, Eliana, Jorge, Gaston
  • US: Ajit
  • CERN: Alan, Dima
  • EU:

Personnel

  • Julian is off Sept 14 - Oct 2

News - Dima

  • MiniAOD campaign - a bunch of quick jobs, ~3000 requests in total
    • single input events
    • they are testing now; the requests will probably hit Friday at 5, as always. Soon, but not exactly sure when
    • it's been a week; where are our requests?
      • they are coming in but not all at once
  • 200 high priority black hole events https://hypernews.cern.ch/HyperNews/CMS/get/prep-ops/1971.html
    • did not arrive before Wed afternoon FNAL time
    • it's being staged, and we're starting to run
  • nothing else exciting in the pipeline.

Top 3 issues affecting production

  • TaskChain ACDCs failing
    • known issue: they must be assigned via script, but then the ACDCs fail. What can we do?
    • Matteo will look
  • exit code: 8001
    • https://cms-logbook.cern.ch/elog/Workflow+processing/21669
    • attempting to get log files; there are 8 workflows with this issue. Can we return all of them?
    • will leave in complete for now, and wait to hear back from Jean-Roch/requestors
  • still backed up at T2_CH_CERN with GENSIM
    • moving things to T0_CH_CERN, but they are going to a different storage element; T0 can share them but
    • there are 3 sites for T2_CH_CERN; added up they show 16K slots, but in actuality we only have 10K slots, and we are using them all. We need to get this cleaned up so Unified knows how many slots we really have. Dirk has already opened a ticket to clean some of this up.

Site support - Gaston

  • Need to clear the VO machines out of T2_CH_CERN; _T0 and _HLT, and possibly _AI, need to go away

Waiting Room

  • T2_EE_Estonia, T2_TH_CUNSTDA, T2_IT_Bari, T2_UA_KIPT, T2_IN_TIFR.

Morgue

  • No changes.

Workflows

ReDigi

TaskChains

  • apparently unable to assign TaskChain ACDCs via script or manually. What do we do now? Matteo will look at it

StepChain

  • need to work on getting logs, running tests
  • Matteo will open elog with request names and questions on them and we'll see what we can find
    • the last step will then produce AOD and MiniAOD, so this can work

Rereco

  • testing its functionality with Unified; it closed with a 40% failure rate. The good news is that we may be able to run them through the Unified system. We just need to figure out what is going on.

Store Results

MonteCarlo

Agent Issues

  • waiting for a few workflows to complete, then need to redeploy the agents that had couch views missing etc., 304
  • resource control is off; we had workflows that were in acquired but not running

Redeployment Plan

RelVal - Andrew

L3 discussion - Ajit, Jean-Roch, Matteo

  • stageout succeeds but files are missing. Alan is working on it as highest priority; he has been debugging the issue for 2 days and has not found where the file is being deleted.

Opportunistic Resources - Stefan

HLT

SDSC

Automatic Assignment And Unified Software

AOB

-- JenniferAdelmanMcCarthy - 2015-09-23

Topic revision: r4 - 2015-09-26 - JulianBadillo