Workflow Team Meeting - Sept 10, 4PM CERN / 9AM FNAL time

Vidyo Link

Attending

  • FNAL: Jen, SeangChan, Ajit, Matteo, Gaston
  • US:
  • CERN : Julian
  • EU:

Personnel

  • Julian is off Sept 14-30
  • Alan off Sept 07-12

News - Dima

  • All FNAL machines have been rebooted for the new kernel. We will also need to upgrade condor on the schedds soon, but not this week. Krista will let us know when they are ready.

3 top issues affecting production

  • We have a fair number of workflows in acquired, but jobs are ramping down.

Site support - Gaston

Waiting Room

T2_AT_Vienna into the Waiting Room

  • T2_AT_Vienna was put in the Waiting Room due to a mistake in the dashboard, so it is coming out today.
  • What happened to the Brazilian site? Still waiting for a response.
  • Is there a text file or database that lets us cross-check where EU sites report their downtimes against the dashboard?
    • docDB - Julian will send Gaston a link.

Morgue

  • nothing new

Workflows

  • We need to review and update the stuck-workflow scripts, which tell us why workflows are stuck in acquired for so long.
    • This was Luis's script; somebody else needs to adopt it.
  • Clones were failing; Julian has already fixed the script and submitted the changes.
  • Let's get the rest of the assignment-approved WFs into the system.

ReDigi

  • NTR

TaskChains

  • Jen had issues with makeACDC.py: you only have to assign with the script, and it gets assigned with the TaskChain.
    • Jen will elog the issues for Julian to look at.

Rereco

  • Everything went fine with the last requests, which were handled manually; we are working toward running them through the Unified scripts.

Store Results

  • ntr

MonteCarlo

  • 313 LHE workflows are sitting in acquired and can only run at CERN.
  • Auto-assigning from ACDC is enabled, so keep an eye on things.

Agent Issues

  • Issue with 0304: Alan redeployed it and attached it to production, but something is missing in the agent because it is not replicating. The agent is up, but it hasn't acquired a single job or workflow.
  • 311 issues: SeangChan is finding some corrupt documents.

Redeployment Plan

  • all the agents have been redeployed except the RelVal agent

RelVal - Andrew

  • NTR

L3 discussion - Ajit, Jean-Roch, Matteo

  • NTR
  • Thanks for all the help from Ajit
  • Dima has pulled in a new monitoring page; according to Ajit, its numbers do not match the dashboard, so he is looking into it.

Opportunistic Resources - Stefan

  • Ajit is starting to look at opportunistic resources in OSG in general; this is Ajit's new focus. He is doing some tests to find out what we can run on them.
  • He needs to talk to people at CERN about the global pool etc.: talk to Brian B, Krista, or Furruck.
  • Concentrating on verification of WFs: we ran WFs at 3 different sites, and the outputs were almost, but not quite, identical. They are looking at the outputs to make sure these differences are not significant. So far only 500 events have been run; now upping to 10K events to see if the differences disappear in the noise.

HLT

SDSC

Automatic Assignment And Unified Software

  • ACDCs are assigned automatically.

AOB

  • Julian will send Jen the directions for making the plots for the Monday meeting.
  • Agent issues will not be created, so leave that out.

-- JenniferAdelmanMcCarthy - 2015-09-09

Topic revision: r3 - 2015-09-10 - JenniferAdelmanMcCarthy