Workflow Team Meeting - Oct 29 4PM CERN, 10 FNAL time

Vidyo Link

Attending

  • FNAL: Jen, SeangChan, Eliana, Gaston, Jorge
  • US: Matteo, Dima
  • CERN : Julian, Alan, Andrew, JeanRoc
  • EU:

Personnel

  • Jorge - Nov 9-13
  • Gaston - working remotely 30th
  • SeangChan off Nov 3-10
  • Julian goes to Colombia Dec 14 and will work remotely from there until Jan 1, when his contract ends
  • Ajit out till Nov

News - Dima

  • Things are really ramping down and not much is going on right now. What is ready to go in?
  • The main issue is getting a clear closure in processing.

3 top issues affecting production

  • Issue is conceptual
    • Run2015B: lots of things are in a complete state but not at 100%. The data is not reprocessable, the lumisections are too big, and T0 is not always processing the data.
    • We did splitting by events, but we can't have a lumi split across different files.
    • This is super urgent. We've done what we can; we need to send things back to computing and let them figure it out.

Site support - Gaston

News & Issues

  • Current waiting room:
    • T2_IN_TIFR, T2_TH_CUNSTDA, T2_PK_NCP, T2_EE_Estonia, T2_RU_INR, T2_BR_UERJ.
    • RWTH and Ioannina are showing failures from yesterday and today, maybe a permission issue. Gaston will follow up.

  • Notes
    • T2_TH_CUNSTDA: in unscheduled downtime.
    • T2_PT_NCG_Lisbon: was in downtime last week; the site is OK now.

    • Sites in Waiting Room: 6
    • Sites in Morgue: 7

Workflows

ReDigi

*

TaskChains

StepChain

Rereco

Store Results

MonteCarlo

Agent Issues

Redeployment Plan

  • everything is up to date - redeploy in mid-November, and then we'll have major changes to iron out before Christmas Production
  • all sites need proper local site node name info or they won't work
    • Previously the site config returned node names, but now the SE name and the PhEDEx node name both need to be returned or the sites will fail. Gaston needs to verify that the CERN local configs are right.

* As discussed in the meeting, some sites are not providing the phedex-node value in the site-local-config.xml. For example, T2_CH_CERN is OK:

  • We need all the processing sites to properly provide this info otherwise we'll start having problems in the next WMAgent release.
  • Also, the site we are using has to be listed here. If not, we either update SiteDB or get the site name from other sources.
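
A check like the one Gaston is asked to do could be sketched as below. This is only a minimal sketch: the XML layout (a local-stage-out block holding se-name and phedex-node elements with a value attribute), the sample site name T2_XX_Example, and the function name missing_stageout_fields are assumptions for illustration, not the actual CERN config or WMAgent code.

```python
import xml.etree.ElementTree as ET

# Hypothetical samples; the real site-local-config.xml layout may differ.
SAMPLE_OK = """<site-local-config>
  <site name="T2_XX_Example">
    <local-stage-out>
      <se-name value="se.example.org"/>
      <phedex-node value="T2_XX_Example"/>
    </local-stage-out>
  </site>
</site-local-config>"""

SAMPLE_MISSING = """<site-local-config>
  <site name="T2_XX_Example">
    <local-stage-out>
      <se-name value="se.example.org"/>
    </local-stage-out>
  </site>
</site-local-config>"""


def missing_stageout_fields(xml_text, required=("se-name", "phedex-node")):
    """Return the required stage-out fields that are absent or have an empty value."""
    root = ET.fromstring(xml_text)
    found = set()
    # Assumed layout: required fields are direct children of <local-stage-out>.
    for stageout in root.iter("local-stage-out"):
        for child in stageout:
            if child.tag in required and child.get("value"):
                found.add(child.tag)
    return [tag for tag in required if tag not in found]


if __name__ == "__main__":
    print("complete config, missing:", missing_stageout_fields(SAMPLE_OK))
    print("incomplete config, missing:", missing_stageout_fields(SAMPLE_MISSING))
```

Running something like this against each processing site's config would give a list of sites to chase before the next WMAgent release.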

RelVal Andrew

  • Why can't SeangChan find the disk space? Andrew wants the disk space size, not the rate, and it needs to be kept updated. The job is writing the output files, and if the output gets bigger than a certain size we want to kill the job. Need to find out how condor is finding the size.
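
For reference, the size-versus-limit check Andrew describes could look roughly like this. It is a sketch under assumptions: the helper names output_size_bytes and exceeds_limit are hypothetical, the job's output is assumed to live under one directory, and this is not how condor measures disk usage (that is still to be found out).

```python
import os


def output_size_bytes(path):
    """Total size in bytes of all regular files under path (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def exceeds_limit(path, limit_bytes):
    """True if the output under path is larger than limit_bytes (hypothetical kill condition)."""
    return output_size_bytes(path) > limit_bytes


if __name__ == "__main__":
    # Example: check the current directory against a 1 GB limit.
    print(exceeds_limit(".", 10**9))
```

The point is that the check uses the current total size, re-read each time, rather than a write rate.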

L3 discussion - Ajit, Jean-Roch, Matteo

Opportunistic Resources

Automatic Assignment And Unified Software

  • Stuff is closing out at 98% (not at 95% anymore).
    • for everything except for LHE, Dima is arguing that we stick with 95%
  • What workflows will be automatically assigned from assignment-approved?
    • Clones? Yes! Kill and clone, don't assign it by hand, and Unified will assign it.
    • ACDCs, if we make them by hand? No! They must be assigned.
  • Let's start updating the info in the documentation.

AOB

  • PNN change - have Gaston check sitedb list that maps

-- JenniferAdelmanMcCarthy - 2015-10-28

Topic revision: r5 - 2015-10-29 - JenniferAdelmanMcCarthy