Workflow Team Meeting - Oct 22, 4 PM CERN time, 9 AM FNAL time

Vidyo Link

Attending

  • FNAL: Jorge, Gaston, Jen, SeangChan
  • US: Matteo, Ajit
  • CERN: Julian, Andrew, Jean-Roch, Dima (joining soon)
  • EU:

Personnel

  • Jorge - Nov 9-13
  • Gaston - working remotely on the 30th
  • Julian to Colombia Dec 14, and then contract ends
  • Ajit out until November

News - Dima

  • Need to finish the work we already have
  • The plan is to have a big Rereco in November

Top 3 issues affecting production

  • Sites failing and having to move workflows
    • TIFR: Latest is that they had permissions issues on /store/mc and are testing
      • production and merge jobs working, but cleanup still failing
      • Kill and clone, and let unified take care of it from there. When the site comes back online the workflows should just run.
    • Submit Failures at Bristol - in Jorge's court
      • We think the problem may be solved; put in a Hail Mary clone to see if it works: https://cms-logbook.cern.ch/elog/Workflow+processing/21957
      • Tests are all failing. Gaston will put Bristol into drain; Jen will kill and clone all workflows set to run at Bristol and leave them in assignment-approved
      • when ticket is closed, Gaston will remove manual drain
    • File read/merge issues at KIPT
      • Testing; Gaston to verify that permissions are set correctly on the files
      • drain, kill and clone
  • Product not found: could not find HcalNoiseSummary.
  • discussions about files that are transferred and then disappear 30 min later

Site support - Gaston

News & Issues

  • About the sites:
    • Into the Waiting Room: T2_EE_Estonia
    • Out of the Waiting Room: T2_FI_HIP, T2_ES_IFCA, T2_IT_Bari
    • Sites in Waiting Room: 4
    • Sites in Morgue: 7

Workflows

ReDigi

  • Julian redirected a few MiniAOD workflows to sites with lower load; he suggests doing the same when we have site failures.
    • He did so manually and let unified take care of the rest

TaskChains

  • One very large SUS-RunIIWinter15wmLHE workflow (1B events, 1M jobs) is running; 75% done.

StepChain

Rereco

Store Results

MonteCarlo

Agent Issues

Redeployment Plan

  • Everything is up to date. Redeploy in November; after that we'll have major changes to iron out before Christmas production

RelVal - Andrew

  • Random seed (GitHub issue 5534): looking it up, so the answer is no
  • New issue 5697 is basically the same thing; 6285 is a disk space issue.
  • Alan updated the RelVal agent already

L3 discussion - Ajit, Jean-Roch, Matteo

  • Not much. Quick question for Gaston: to add a site to site conf, write a ticket to SiteDB and the site support team will create the folder, which the site admins can then update
  • Auto force complete
    • more than 2 weeks old without adding work
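
The auto force-complete rule above could be sketched as a simple age check. This is only an illustration under stated assumptions: the `Workflow` record, its field names, and the reading of the rule as "no new work added for more than 14 days" are hypothetical and are not taken from the actual unified code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: "more than 2 weeks old without adding work" is read as
# "no new work has been added for more than 14 days".
MAX_IDLE = timedelta(weeks=2)


@dataclass
class Workflow:
    """Hypothetical record for illustration; not a real unified/WMCore class."""
    name: str
    last_work_added: datetime


def should_force_complete(wf: Workflow, now: datetime) -> bool:
    """Return True if the workflow has gone longer than MAX_IDLE without new work."""
    return now - wf.last_work_added > MAX_IDLE


def select_for_force_complete(workflows, now):
    """Return the names of all workflows that meet the idle-time rule."""
    return [wf.name for wf in workflows if should_force_complete(wf, now)]
```

The threshold is kept as a single named constant so that the cutoff agreed in the meeting can be changed in one place.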

Opportunistic Resources

  • Ran out of time at UCSD, so we are done with this

Automatic Assignment And Unified Software

  • Merged some of the changes in unified; the next requests will try unified

AOB

-- JenniferAdelmanMcCarthy - 2015-10-21

Topic revision: r5 - 2015-10-22 - JenniferAdelmanMcCarthy