Workflow Team Meeting - July 30th, 4PM CERN, 9AM FNAL time

Vidyo Link

Attending

  • US: Jen, Jorge, Luis, Gaston, Eliana, John, Ajit and Matteo
  • EU: Dima, Julian, Alan

Personnel

  • Jen off Aug 8-26
  • Julian off Sept 14-30
  • Luis is leaving us Sept 5
  • John will be leaving us Aug 28
  • Jorge off Aug 23
  • Gaston (new FNAL site support person) and Eliana (new FNAL T0 operator) are both now at FNAL

News

  • Eliana and Gaston have arrived
  • Dima:
    • We need to redo all GEN-SIM, all DIGI-RECO and all data
    • Hopefully all upgrades will stop
    • We need to find better ways of monitoring things to determine that we are fully using resources
  • Downtime next week: Krista will shut down glideins the afternoon before, so we will not be draining FNAL. We want to keep the pressure up on the site so that, once we are back in production, our jobs can take over the resources instead of opportunistic jobs taking them over.

3 top issues affecting production

  • cleaning up all the failed submission workflows - ongoing
  • FNAL downtime
  • vocms217 and vocms311 are bottlenecked due to the issues we had last week. Both are in drain and almost ready; once they have completely drained we will redeploy them

Site support - John

  • Two sites are currently unavailable, Estonia and Bari; they are already in drain. Downtimes are declared directly in DocDB and OSG. Right now we check for downtimes lasting longer than 24 hours; if there is one, we put the site in drain one day in advance. This works automatically: when the site comes out of downtime and passes all metrics, it automatically moves out of drain. For Tier-1s the team needs to look at things by hand, more carefully.
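The automatic drain rule described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual site-support tooling: the Site and Downtime types, the passes_metrics flag, the desired_state function and the assumption that a site failing metrics stays in drain are all hypothetical. Only the 24-hour threshold, the one-day lead time and the manual handling of Tier-1s come from the discussion.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Values taken from the rule discussed in the meeting.
LONG_DOWNTIME = timedelta(hours=24)  # only downtimes longer than this trigger drain
DRAIN_LEAD_TIME = timedelta(days=1)  # drain the site this far ahead of the downtime


@dataclass
class Downtime:
    start: datetime
    end: datetime


@dataclass
class Site:
    name: str             # e.g. "T2_EE_Estonia"
    tier: int             # 1 for Tier-1, 2 for Tier-2, ...
    passes_metrics: bool  # hypothetical: result of the site metric checks
    downtimes: list = field(default_factory=list)  # list of Downtime


def desired_state(site: Site, now: datetime) -> str:
    """Return "drain", "enabled" or "manual" (hypothetical policy sketch)."""
    if site.tier == 1:
        # Tier-1s are looked at by hand, not by the automatic rule.
        return "manual"
    for dt in site.downtimes:
        is_long = (dt.end - dt.start) > LONG_DOWNTIME
        # Drain from one day before the start until the downtime ends.
        if is_long and dt.start - DRAIN_LEAD_TIME <= now < dt.end:
            return "drain"
    # Assumption: a site leaves drain only if it passes all metrics;
    # a site failing metrics is kept in drain.
    return "enabled" if site.passes_metrics else "drain"
```

For example, a Tier-2 with a 48-hour downtime starting tomorrow morning is already reported as "drain" today, while a 6-hour downtime leaves it "enabled". Keeping the decision a pure function of the site record and the current time makes it easy to rerun on a schedule.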

Transfer - Jorge

  • Transfers from/to T2_US_Wisconsin have been failing since last night (for the last 6-8 hours). The error is "Couldn't submit to FTS ..". Reported by Ajit: what can he look at to fix it? Jorge will look and let Ajit know.
  • The /GluGluToHToGG_M-125_14TeV-powheg-pythia6/TP2023HGCALNoTRKExtGS-DES23_62_V1-v1/GEN-SIM dataset transfer to T2_CH_CERN has been stuck at 72% for the last few days. Please take a look.

Workflows

ReDigi

  • cleaning up workflows with timeouts and submission failures, making slow but steady progress

TaskChains

  • Also affected by the submission failures

MonteCarlo

  • A few workflows with big lumis had failures anyway; the ones that needed to be resubmitted were cleared out
  • Julian is reporting on these only in HyperNews, not in the e-log, so requesters know what is going on

ReRecos

  • Two are done but are not showing up in closed-out; the script is not closing out ReRecos
  • cerminar_Run2015B-HighMultiplicity85-PromptAmd-22Jul2015_747p2_150722_185014_5549 is stuck in acquired; Jen and Matteo will look at it

Store Results

  • NTR

Agent Issues

  • Already discussed

Redeployment Plan

  • Already discussed. Any new releases coming up? End of Aug/Sept

RelVal - Andrew

  • Andrew went home because SeangChan is on vacation

L3 discussion - Ajit, Jean-Roch, Matteo

Opportunistic Resources

  • Three problems
    • At NERC: stage-out issues. Jobs are staging out to the wrong directory at FNAL, so the merge step fails; this is understood and being fixed
    • Problem with a specific CMSSW version: do an e-log search to determine what is going on
    • Two workflows are stuck in acquired and it is not understood why; we need to figure out what is going on. One has an input dataset and the other doesn't
      • Look at 214 and make sure everything is running; there may be a problem with thresholds at NERC

Automatic Assignment And Unified Software

AOB

-- JenniferAdelmanMcCarthy - 2015-07-30
Topic revision: r4 - 2015-07-30 - JenniferAdelmanMcCarthy