Workflow Team Meeting - Sept 17, 4PM CERN / 9AM FNAL time

Vidyo Link

Attending

Personnel

  • Julian is off Sept 14-30
  • Jen will be taking a 1/2 day on the 22nd

News - Dima

  • MiniAOD campaign - bunch of quick jobs, everything produced, ~3000 requests
    • single input events
    • they are testing now; it will start soon, not exactly sure when. Probably will hit Friday at 5, as always.
  • nothing urgent happening

3 top issues affecting production

  • Workflows in running with agent url: undefined
    • https://cms-logbook.cern.ch/elog/Workflow+processing/21624
    • the issue is with submit1; workflows are moving along and starting to report, but not properly
    • there were 2 couches running; the couch that was deployed doesn't have job or FWJR views, so the monitor processing isn't working with it properly
    • we lost the data that would have gone into those views, so job accounting didn't go into FWJR. It was fixed yesterday.
    • for workflows in complete that require ACDC, we will just have to clone and run over again.
    • since there is no record, the completion record is messed up.
    • MySQL data is OK; it's just the couch data that is messed up.
    • drain the agent, redeploy, and restart it with all pieces in place.
    • data uptime was messed up; it looked like a replication issue but it wasn't. Time update is broken, and has been for 4-5 months
    • couch is up and rest of agents are happy
  • 304 is having issues; it is in drain, and Alan will redeploy it once it is done
  • 311 couch: there was a corrupt couch view; if it is completely drained, Alan will shut it down and redeploy
  • Inconsistent Status of workflow
    • https://cms-logbook.cern.ch/elog/Workflow+processing/21618
    • it was in running-closed but request manager showed completed; not sure how it happened. When you click on the request manager, the page it shows goes to the Oracle DB, while WMStats refers to couch, and the couch and Oracle were not consistent. Not sure why this has happened.
    • SeangChan will run his script to make sure everything is consistent. We won't worry about fixing request manager 1 right now and will just push to get 2 in place
  • is there a change in the REST API interface for the injection scripts? The interface for reading changed 6-7 months ago; think it's already been changed. Reading REST interfaces from the ops side is changing, which is a concern if they are using the interface within scripts.
    • SeangChan thinks Julian already made changes in WMAgent scripts to go to request manager 2 but we need to look at them to verify.

  • Goal - move to request manager 2 by end of year!
  • transfers to IN2P3 are slow
    • several workflows waiting for input to be available there
    • there was an expired proxy there, it was renewed and transfers are running again.
  • "TotalInputEvents" missing from the workflow workload cache

Site support - Gaston

  • NTR

Waiting Room

Morgue

Workflows

  • other than issues reported above things are moving smoothly
  • lots of WFs waiting to run at T2_CH_CERN
    • raise the priority of those workflows that can only run there so they run faster, and stop assigning other work there until the LHE is done. On Jen's list of things to do; set to 100K

ReDigi

TaskChains

Rereco

Store Results

MonteCarlo

Agent Issues

Redeployment Plan

  • see above - nothing major since we just upgraded all machines

RelVal - Andrew

  • when will EOS be ready? Running tests and seeing errors that have to be fixed. Hope to finish next week after testbed deployment
  • can we raise merge job thresholds for FNAL? Raise threshold to 300: Alan will raise it, and we will keep an eye on things to make sure we aren't breaking anything
  • new max walltime limits are causing priority problems for RelVal; setting them to 36 hrs means we are only getting 200 jobs going. Need to discuss this with the glideinWMS team

L3 discussion - Ajit, Jean-Roch, Matteo

Opportunistic Resources - Stefan

HLT

SDSC

Automatic Assignment And Unified Software

AOB

-- JenniferAdelmanMcCarthy - 2015-09-16

Topic revision: r3 - 2015-09-17 - JenniferAdelmanMcCarthy