Workflow Team Meeting - Aug 20th, 4PM CERN / 9AM FNAL time

Vidyo Link

Attending

  • US: Ajit, Matteo, Seangchan, Jorge
  • EU: Alan, Andrew, Dima, Jean-Roch, Julian

Personnel

  • Jen off Aug 8-26
  • Julian next week at FNAL
  • Julian Sept 14-30
  • Luis is leaving us Sept 5
  • John will be leaving us Aug 28
  • Jorge off Aug 27-Sept 4
  • Alan off Sept 07-12

News

  • Storage management
    • Dima: We need to identify use cases for this (e.g. test workflows, RelVals, etc.)
    • Alan: input for test workflows
    • We'll create a file in the ~cmst1/public AFS area, using the same .json format as Unified.

3 top issues affecting production

  • MaxWallTimeMins: some jobs had very large values and therefore were not matching in the global pool.
    • Proposed solution: estimate MaxWallTimeMins from the job workload (as it's done now) and apply configurable Min and Max boundaries, so no job gets spawned with a value beyond those limits.
  • TP2023 workflows are delaying everything
    • 3.5GB was assigned to KIT, Wisconsin, Nebraska, Legnaro, Caltech: only 60% done after 8 days
      • Reassigned to high-memory sites only (FNAL, KIT, Wisconsin, Nebraska, etc.): after 1 day, already half of the jobs are done.
    • 3.0GB was assigned to sites including Legnaro
      • Cloned with high-memory sites only: 96% done after 3 days.
    • This was affecting a lot of lower-priority workflows that had jobs waiting for those sites (more than 20 workflows with no progress in the last 5 days).
      • After the reassignment they got unstuck.
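
The MaxWallTimeMins proposal above could look roughly like the sketch below. This is only an illustration, not the agent code: the function name and the default boundaries (60 and 2880 minutes) are hypothetical placeholders, and the real limits would come from the agent configuration.

```python
def clamp_max_wall_time(estimated_mins, min_mins=60, max_mins=2880):
    """Clamp a workload-based wall time estimate (in minutes) into
    the range [min_mins, max_mins].

    The default boundaries are made-up placeholders; in practice they
    would be configurable values.
    """
    if min_mins > max_mins:
        raise ValueError("min_mins must not exceed max_mins")
    # Never go below the Min boundary, never above the Max boundary.
    return max(min_mins, min(int(estimated_mins), max_mins))


# A job whose estimate is far too large gets capped at the Max boundary,
# so it can still match in the pool.
print(clamp_max_wall_time(100000))
```

The estimate itself stays as it is computed today; only the final value handed to the job is bounded.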

Site support - John

Transfer - Jorge

  • NTR

Workflows

ReDigi

TaskChains

MonteCarlo

ReReco's

  • Matteo will take care of them

Store Results

Agent Issues

Redeployment Plan

production SL6

FNAL                              CERN
cmsgwms-submit1 (ready to wake)   vocms0308 (up)
cmsgwms-submit2 (ready to wake)   vocms0309 (up)
cmssrv217 (drain)                 vocms0310 (ready to wake)
cmssrv218 (up)                    vocms0311 (ready to redeploy)
cmssrv219 (up)

  • next to drain -> cmssrv218

RelVal - Andrew

  • Issue with log staging to EOS. Andrew will ask Dirk.

L3 discussion - Ajit, Jean-Roch, Matteo

  • New auto-ACDCs are being commissioned.

Opportunistic Resources

Automatic Assignment And Unified Software

AOB

-- JulianBadillo - 2015-08-20

Topic revision: r3 - 2015-08-20 - JulianBadillo
 