Workflow Team Meeting - June 25 4PM CERN, 9AM FNAL time

Vidyo Link

Attending

  • US: Jen, Ajit, Jorge, Matteo, SeangChan, Luis
  • EU: Julian, Andrew

Personnel

  • Jen off June 26-July 5 - will have e-mail access but painfully slow internet
  • Jen off Aug 10-26 (tentative)
  • SeangChan off July 27-31
  • Julian off Sept 14-30

News

  • A few upgrades in the system but nothing major

3 top issues affecting production

  • File read issues at FNAL (xrootd problems) - both Julian and Jen submitted ACDCs on Wednesday; check in the morning to see if they worked
  • RunIISpring15DR74: (Exit Code: 8003) = Step3 miniaod problem, across sites, across workflows for
    • getting somewhere with ACDCs but not to full recovery
    • Julian will let them run over the weekend, then report back to PPD on Monday
  • EXO-RunIIWinter15GS workflows with high failure rates (31 so far):
    • https://cms-logbook.cern.ch/elog/Workflow+processing/20900
    • maxRSS exceeded; I'm testing finer splitting. Andrew says he didn't see any problem with these in RelVal
    • using too much RSS memory and will probably need to be reset; finer splitting isn't fixing the problem. Should we send them back? elog and HyperNews discussions are already in the works
  • ACDCs stuck in acquired, stuck in Global Queue - SeangChan needs to look at them - Global Queue got wrong information from PhEDEx

Site support - John

Waiting Room

Workflows

ReDigi

  • Problems discussed above in the top issues section

TaskChains

  • There are 2 task chains in complete. Julian, have you looked at them?
    • Waiting for reset

Rereco

Store Results

  • NTR

MonteCarlo

  • MC WFs with >95% failure rate are exceeding memory; sending them back

Agent Issues

Redeployment Plan

  • Deploying 1.0.8.pre6 version.
  • Also we want another opportunistic WMAgent for FNAL+Amazon
    • probably will use one of the submit machines
  • vocms0304 - backfill team - needs a few tests before we use it for backfill and scale testing; try to increase maxjobs running for CERN agents
  • Deploy new CERN agent 308; then we have all new agents and can start draining the old ones

RelVal Andrew

  • Discussing injecting log files into PhEDEx

L3 discussion - Ajit, Jean-Roch, Matteo

  • nothing special, no more sending redigi/rereco to SDSC
  • GEN_SIM not moving, seems to be a site issue

Opportunistic Resources - Stefan

  • Old backfills sitting in running-closed but not closing out; what should we do?

HLT

  • HLT testing - T2_CH_CERN_HLT is not in the menu in WMStats; are you just assigning via script? Answer: it needs to be assigned via script
  • perhaps we should do a more systematic campaign like what we are doing with SDSC
  • let's make sure that we submit reasonable workflows

SDSC

Automatic Assignment And Unified Software

AOB

-- JenniferAdelmanMcCarthy - 2015-06-24

Topic revision: r6 - 2015-06-25 - JenniferAdelmanMcCarthy
 