Workflow Team Meeting - Nov 5, 4PM CERN / 9AM FNAL time

Vidyo Link

Attending

  • FNAL:
  • US:
  • CERN:
  • EU:

Personnel

  • Jorge off Nov 9-13
  • SeangChan off Nov 3-10
  • Julian leaves for Colombia Dec 14 and will work remotely from there until Jan 1, when his contract ends
  • Ajit out till Nov

News - Dima

Top 3 issues affecting production

  • See ReReco notes
  • StepChain -
  • cmsweb upgrade: already finished, see https://cms-logbook.cern.ch/elog/Workflow+processing/22103
    • Watch out for PNNs (PhEDEx node names)
  • Do we need to revive the recovery script for ReReco?
  • Lots of workflows are in status failed, with no reason given in the file for why they failed. Is something broken? I found several that had been reset or cloned while the original was still stuck in failed. There was nothing in the elog about these workflows being cloned or reset; the only way I knew what had happened was to look for output datasets with higher version numbers. Is it really that hard to post an elog when you clone/reset workflows? Is there a better way to check on the workflows in failed?

Site support - Gaston

News & Issues

Workflows

ReDigi

TaskChains

  • Workflows with AcqEra set to None - https://cms-logbook.cern.ch/elog/Workflow+processing/22100
  • We have already discussed why this happened, and outlined the procedure to fix it
    • This is a communication and documentation problem: Alan and Julian knew what to do, I sort of knew, but nothing was documented anywhere!
    • What other exceptions to the regular ACDC and assignment procedures are there?

StepChain

Rereco

  • Need to start returning workflows that are not at 100%
  • How much have we talked to the developers about why we get missing lumis that do not show up as errors in WMStats? We need a better way to recover these!

Store Results

MonteCarlo

Agent Issues

Redeployment Plan

  • Everything is up to date. Redeploy in mid-November; after that we'll have major changes to iron out before Christmas production.
  • All sites need proper local site node name info or they won't work. Reposting last week's list here to track where we are in getting this done!
    • Previously the site config returned node names, but now both the SE name and the PhEDEx node name need to be returned or jobs will fail. Gaston needs to verify that the CERN local configs are right.

As discussed in the meeting, some sites are not providing the phedex-node value in site-local-config.xml. T2_CH_CERN, for example, is OK.

  • We need all the processing sites to properly provide this info otherwise we'll start having problems in the next WMAgent release.
  • Also, the site we are using has to be listed here. If it is not, we either update SiteDB or get the site name from other sources.
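The check described above (phedex-node present, site listed) could be scripted per site. Below is a minimal sketch, assuming a site-local-config.xml layout with a `<site name="...">` element and a `<phedex-node value="..."/>` element somewhere below it, and a hypothetical `known_sites` set standing in for the SiteDB site list. The function name `check_site_local_config` and the sample site `T2_XX_Example` are hypothetical and not part of any CMS tool.

```python
import xml.etree.ElementTree as ET


def check_site_local_config(xml_text, known_sites):
    """Return a list of problems found in a site-local-config.xml document.

    Assumed (hypothetical) layout: a <site name="..."> element containing
    a <phedex-node value="..."/> element somewhere below it.
    """
    problems = []
    root = ET.fromstring(xml_text)
    # The root may itself be <site>, or <site> may be nested inside it.
    site = root if root.tag == "site" else root.find(".//site")
    if site is None:
        return ["no <site> element found"]
    name = site.get("name")
    # known_sites stands in for the SiteDB list (assumption).
    if name not in known_sites:
        problems.append(f"site {name!r} is not in the known-sites list")
    node = site.find(".//phedex-node")
    if node is None or not node.get("value"):
        problems.append(f"site {name!r} does not provide a phedex-node value")
    return problems


# Hypothetical sample configs, not copied from any real site.
GOOD = """<site-local-config>
  <site name="T2_XX_Example">
    <local-stage-out>
      <phedex-node value="T2_XX_Example"/>
    </local-stage-out>
  </site>
</site-local-config>"""

MISSING = """<site-local-config>
  <site name="T2_XX_Example">
    <local-stage-out/>
  </site>
</site-local-config>"""

known = {"T2_XX_Example"}
print(check_site_local_config(GOOD, known))     # []
print(check_site_local_config(MISSING, known))  # ["site 'T2_XX_Example' does not provide a phedex-node value"]
```

Returning a list of problems rather than raising on the first one lets a single pass over all processing sites report everything that needs fixing before the next WMAgent release.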

RelVal - Andrew

L3 discussion - Ajit, Jean-Roch, Matteo

Opportunistic Resources

Automatic Assignment And Unified Software

  • In the past, Julian has "extended" MC workflows that did not reach the requested statistics. Is this something Unified can take over?

AOB

  • PNN change - have Gaston check sitedb list that maps

-- JenniferAdelmanMcCarthy - 2015-10-28

Topic revision: r3 - 2015-11-05 - JulianBadillo