Workflow Team Meeting - Nov 5, 4PM CERN / 9AM FNAL time

Vidyo Link

Attending

  • FNAL:
  • US:
  • CERN :
  • EU:

Personnel

  • Jorge - Nov 9-13
  • SeangChan off Nov 3-10
  • Julian to Colombia Dec 14; will be working remotely from Colombia until Jan 1, then contract ends
  • Ajit out till Nov
  • JR unavailable until Nov 23rd

News - Dima

3 top issues affecting production

Site support - Gaston

News & Issues

  • (JR) Several sites are over the dataops DDM quota. Reasons:
    • heavy gen-sim in production or in use
    • 4 or 5 open campaigns with secondary input: 40+40+25+10+10 TB = 125 TB of secondary input on disk
  • (JR) Large TaskChains with a large job blow-up ratio: now assigned to sites with >4k slots if the ratio is >5
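
The assignment rule in the last item can be sketched roughly as follows. This is only an illustration: the function name, the site names and the slot counts are hypothetical, and only the two thresholds (ratio >5, more than 4k slots) come from the notes. It is not the actual assignment code.

```python
# Hypothetical sketch of the rule noted above. Only the two thresholds come
# from the meeting notes; names and slot numbers are invented for illustration.
BLOWUP_RATIO_THRESHOLD = 5    # "if ratio >5"
MIN_SLOTS_LARGE_SITE = 4000   # ">4k slots sites"


def eligible_sites(blowup_ratio, site_slots):
    """Return the sorted site names a TaskChain may be assigned to.

    site_slots maps a site name to its number of job slots.
    """
    if blowup_ratio > BLOWUP_RATIO_THRESHOLD:
        # Large blow-up: restrict to sites with more than 4k slots.
        return sorted(
            site for site, slots in site_slots.items()
            if slots > MIN_SLOTS_LARGE_SITE
        )
    # Otherwise no restriction is applied in this sketch.
    return sorted(site_slots)


if __name__ == "__main__":
    slots = {"T2_EXAMPLE_A": 6000, "T2_EXAMPLE_B": 1500}  # invented numbers
    print(eligible_sites(8, slots))
    print(eligible_sites(2, slots))
```

With the invented numbers above, a TaskChain with ratio 8 would only be eligible for the 6000-slot site, while one with ratio 2 would keep both sites.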

Workflows

ReDigi

*

TaskChains

  • Workflows with None AcqEra - https://cms-logbook.cern.ch/elog/Workflow+processing/22100
  • We have already discussed why this happened, and outlined the procedure to fix it
    • This is a problem of communication and documentation: Alan and Julian knew what to do and I sort of knew what to do, but nothing was documented anywhere!
    • What other exceptions to regular ACDC production and assignment are there?
  • (JR) What can be done about the very long-running TaskChains for now?

StepChain

Rereco

  • Need to start returning WFs that are not 100% complete
  • How much have we talked to the developers about why we are getting missing lumis that do not show up in WMStats as errors? We need to find a better way of recovering this!

Store Results

MonteCarlo

Agent Issues

Redeployment Plan

  • Everything is up to date. Redeploy in mid-November, and then we'll have major changes to iron out before Christmas production
  • All sites need proper local site node name info or they won't work. Reposting last week's list here to see where we are in getting things done!
    • Previously the site config returned node names, but now the SE name and PhEDEx node name need to be returned or they will fail. Gaston needs to verify that the CERN local configs are right

* As discussed in the meeting, some sites are not providing the phedex-node value in site-local-config.xml. For example, T2_CH_CERN is OK:

  • We need all the processing sites to provide this info properly; otherwise we'll start having problems in the next WMAgent release.
  • Also, the site we are using has to be listed here. If not, we either update SiteDB or get the site name from other sources.
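
A quick way to spot configs that lack the phedex-node value could look like the sketch below. It assumes the value is carried by a `phedex-node` element with a `value` attribute; the sample XML, the site name `T2_XX_Example` and the function names are hypothetical and not copied from any real site configuration.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only: the layout and all names are assumptions,
# not a copy of a real site-local-config.xml.
SAMPLE_CONFIG = """
<site-local-config>
  <site name="T2_XX_Example">
    <local-stage-out>
      <se-name value="se.example.org"/>
      <phedex-node value="T2_XX_Example"/>
    </local-stage-out>
  </site>
</site-local-config>
"""


def phedex_nodes(xml_text):
    """Return all non-empty phedex-node values found anywhere in the config."""
    root = ET.fromstring(xml_text.strip())
    values = (node.get("value", "").strip() for node in root.iter("phedex-node"))
    return [value for value in values if value]


def has_phedex_node(xml_text):
    """True if the config provides at least one non-empty phedex-node value."""
    return bool(phedex_nodes(xml_text))


if __name__ == "__main__":
    print(phedex_nodes(SAMPLE_CONFIG))
    print(has_phedex_node("<site-local-config><site name='x'/></site-local-config>"))
```

Searching the whole tree with `iter` avoids depending on where exactly the element sits, which this sketch does not claim to know.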

RelVal - Andrew

L3 discussion - Ajit, Jean-Roch, Matteo

Opportunistic Resources

Automatic Assignment And Unified Software

  • In the past Julian has "extended MC workflows" that didn't meet statistics. Is this something Unified can take over?

AOB

  • PNN change - have Gaston check sitedb list that maps

-- JenniferAdelmanMcCarthy - 2015-10-28

Topic revision: r4 - 2015-11-05 - JeanrochVlimant
 