Workflow Team Meeting - Oct 15, 4PM CERN time, 9AM FNAL time

Vidyo Link

Attending

  • FNAL: Jen, Gaston, Jorge, Eliana, SeangChan
  • US:
  • CERN: Andrew, Julian
  • EU:

Personnel

  • Jorge - Nov 9-13

News - Dima

  • Plenty of work to keep us out of trouble
  • KIT downtime ended 6 min ago ;)
    • Site readiness has been below 80% for the last couple of days. Do we force them on, or wait for them to have a "good day" via tests so they go on automatically?

Top 3 issues affecting production

  • Sites failing and having to move workflows
    • TIFR: https://cms-logbook.cern.ch/elog/Workflow+processing/21866
      • The site has been drained and some test workflows have been sent there. Waiting for the test workflows to run.
    • Submit Failures at Bristol
    • File read/merge issues at KIPT
      • Same errors at Lisbon. Julian dug deeper; it looks like it might be a permissions error.
    • Stage-out failures at RALPP
    • File read issues at Lisbon
  • Workflows with no errors but not 100%

Site support - Gaston

News & Issues

  • About the sites:
    • KIPT: SAM, HC, and links OK. Waiting for resolution of ticket 116876. The site was in downtime from Oct 7 through Oct 9.
    • TIFR: Problems could be related to a problematic DPM node; the issue is reported as solved (116867). SR OK during the last 2 weeks.
    • RALPP: Link issues on Monday and Tuesday. Site appears in good condition now according to SR. Apparent overload on SRM according to HN thread.
      • Double-check the date/time of the failures to see if it's still a problem.
    • Bristol: Waiting for resolution of 116683 and 116873. No answer from the site admin yet. Site appears in good shape according to SAM, HC, and link metrics.
      • Possible problem with the TFC? Jorge will check that the files are actually there. PhEDEx thinks the files are there, but the agent is failing the submits because it can't find them.

  • New morgue controller script: if a site has better than 80% site readiness for more than a week, it goes back into the waiting room, but it will not go back into production until we move it manually.
  • Out of the morgue: T2_BR_UERJ, T2_RU_INR, T2_PK_NCP
  • Into the waiting room: T2_FI_HIP, T2_IT_Bari, T2_ES_IFCA
    • low site readiness
  • Out of the waiting room: T2_EE_Estonia.

  • Sites in Waiting Room: 6
  • Sites in Morgue: 7
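
The morgue controller rule noted above could be sketched roughly as follows. This is only an illustration, not the actual script: the function name, the state labels, and the reading of "more than a week" as more than 7 consecutive daily readiness values are all assumptions.

```python
from typing import Sequence

READINESS_THRESHOLD = 0.80  # "better than 80%" from the minutes
MIN_GOOD_DAYS = 7           # assumption: "more than a week" = more than 7 daily values


def next_state(state: str, daily_readiness: Sequence[float]) -> str:
    """Hypothetical helper: return the next state for a site.

    daily_readiness holds one readiness fraction per day, oldest first.
    Only the morgue -> waiting room move is automatic; moving a site from
    the waiting room back into production stays a manual step.
    """
    if state != "morgue":
        return state
    # Look at the most recent MIN_GOOD_DAYS + 1 daily values.
    recent = daily_readiness[-(MIN_GOOD_DAYS + 1):]
    enough_days = len(recent) > MIN_GOOD_DAYS
    if enough_days and all(r > READINESS_THRESHOLD for r in recent):
        return "waiting_room"
    return state
```

With this sketch, a site in the morgue with eight consecutive days above 80% moves to the waiting room, while a site already in the waiting room is never promoted automatically.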

Workflows

ReDigi

TaskChains

StepChain

Rereco

Store Results

MonteCarlo

Agent Issues

Redeployment Plan

  • Everything is up to date. Redeploy in November; after that we'll have major changes to iron out before Christmas production.

RelVal - Andrew

L3 discussion - Ajit, Jean-Roch, Matteo

Opportunistic Resources - Stefan

Automatic Assignment And Unified Software

AOB

-- JenniferAdelmanMcCarthy - 2015-10-14

Topic revision: r5 - 2015-10-21 - JenniferAdelmanMcCarthy
 