Workflow Team Meeting - Oct 8, 4PM CERN time, 9AM FNAL time

Vidyo Link

Attending

  • FNAL: Jen, Eliana, Gaston, Jorge, SeangChan
  • US: Matteo, Ajit, Dima
  • CERN: Julian, JeanRoc, Alan
  • EU:

Personnel

  • everybody chained to their desks for the time being
  • Jorge - Nov 9-13

News - Dima

  • MiniAOD has finally started: a huge number of requests, but low load.
  • We need to rerun 10% of the main campaign for Run 2.
    • it will be discussed with Christoph and JeanRoc. We don't yet know how intense it will be or when it will hit.
  • We lost data because of how DDM was handling datasets. A big bug was found and fixed, but the affected datasets are now incomplete. PPD needs to identify those datasets, decide whether the data is still needed, and come up with the list of datasets to invalidate and rerun. It will most likely be at least a week before we get this back.
  • There is a large number of requests for which the dataset transfer to tape has not been approved.
    • Jorge will look into this
  • 2000 MiniAOD samples are in; we need to keep a close eye on the system to make sure it all runs smoothly.

3 top issues affecting production

Site support - Gaston

Waiting Room

  • Estonia and FI_HIP are both in the waiting room
    • Estonia is looking better: it moved out of the waiting room yesterday and is back in production.
    • FI_HIP hasn't found the problem yet, but they are at least working on it.

Morgue

  • No movements

Workflows

ReDigi

  • Open issue: MiniAODs failing due to input location. https://cms-logbook.cern.ch/elog/Workflow+processing/21802
    • The input was at 2 sites, but the jobs tried to run at sites other than those 2. We need to check why WorkQueue is picking the wrong site.
    • The workflows were whitelisted to a broader list. The tape location was seen as a disk location in the global queue. Alan will take a look.
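
As a rough illustration of the check discussed above, the sketch below restricts a workflow to sites that are both whitelisted and hold the input on disk. It is not the actual WorkQueue logic: the function name, the site names, and the assumption that tape endpoints can be recognised by an "_MSS" or "_Buffer" suffix are all hypothetical and used here only for illustration.

```python
def eligible_sites(whitelist, input_locations):
    """Return whitelisted sites that hold the input on disk.

    Assumption (illustrative only): tape endpoints are identified by
    an "_MSS" or "_Buffer" suffix and must not count as disk locations.
    """
    disk_sites = {
        site for site in input_locations
        if not site.endswith(("_MSS", "_Buffer"))
    }
    # Only sites that are whitelisted AND have the input on disk qualify.
    return sorted(set(whitelist) & disk_sites)


# Hypothetical example: a broad whitelist, input on one disk site and one tape endpoint.
whitelist = ["T1_XX_SiteA", "T2_XX_SiteB", "T2_XX_SiteC"]
input_locations = ["T1_XX_SiteA_MSS", "T2_XX_SiteB"]
print(eligible_sites(whitelist, input_locations))
```

If the tape endpoint were mistaken for a disk location, the filter above would not remove it, which is the kind of mismatch to look for in the global queue.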

TaskChains

StepChain

Rereco

  • There are 2 requests that ended with 0 inputs and 0 outputs. We tried ACDCs and got to 99.8% for the test; we still have a few things to check to see whether this will work with the unified model.

Store Results

  • Only one ticket, already closed.

MonteCarlo

  • Open Issue: pLHE with no errors but lower % https://cms-logbook.cern.ch/elog/Workflow+processing/21751
    • Also being discussed: it's a read error that is not showing up as a failure in CMSSW.
    • For now, report these in the elog. While we figure out how to handle this, Ajit will look, post to HyperNews, and return to the requestor.

Agent Issues

Redeployment Plan

RelVal Andrew

L3 discussion - Ajit, Jean-Roch, Matteo

Opportunistic Resources - Stefan

Automatic Assignment And Unified Software

AOB

JulianBadillo - 2015-10-08

Topic revision: r4 - 2015-10-14 - JenniferAdelmanMcCarthy
 