Workflow Team Meeting - Jan 21, 4PM CERN time, 9AM FNAL time

Vidyo Link

Attending

  • FNAL: Jen, Jorge, Gaston, Matteo, SeangChan
  • US:
  • CERN: Dima, JeanRoc
  • Brazil: Alan

Personnel

  • Alan - In Brazil Dec 21-Jan 21; will be working from Brazil Jan 14-20. SeangChan has Alan's grandma's number.
  • New Julian starting Feb 1
  • JR in Zurich 18-20
  • Jen to CERN 22-26???
  • Possible training sessions Feb 8-12 - ND student Alison, Matteo, Paola/Kathrine

News - Dima

  • We need to get data done! This is very serious.
    • Workflows that are in assist-man and show 100% are actually at 99.999-something percent, so run recovery on them.
Hi everyone,

I'm so sorry, my internet connection is quite unstable this afternoon. I just wanted to comment on two questions that were raised:

1) Recovery procedure: as Jen said, it checks input against output and then creates a special resubmission workflow for a single dataset, ignoring all the other output tiers. Thus, duplicates will be seen only if someone creates and assigns exactly the same JSON/request.

2) Intermediate output for StepChain: I think the goal was to get it done for February; unfortunately, due to holidays and me being a vagabundo (slacker), it was not accomplished. It will be done for March cmsweb.

Let me know in case you guys have any additional questions for me; I'll start replying tomorrow evening.

Cheers, Alan.

  • Long term we will get more and more "monsters" running so we need to learn to manage them.

Top 3 issues affecting production

  • Manpower
  • Site issues at Ioannina and NCP
  • "Too many files open" error, a temporary CVMFS issue. Try resubmitting and see if the jobs just run.

Site support - Gaston

  • Problems at T2_EE_Estonia were caused by missing kernel sources. The site should be functional now.

  • We are still investigating issues with T2_GR_Ioannina

  • T2_UK_SGrid_RALPP issues were caused by an overload of the storage system; the site has requested a reduction of DIGI-RECO workflows.

    • Current Waiting Room: T2_IN_TIFR, T2_RU_IHEP, T2_RU_SINP, T2_TH_CUNSTDA, T2_RU_INR

    • Current Morgue: T2_PL_Warsaw, T2_TR_METU, T2_RU_PNPI, T2_RU_ITEP, T2_MY_UPM_BIRUNI, T2_RU_RRC_KI

    • Out of the Waiting Room: T2_ES_IFCA, T2_RU_IHEP

    • Sites in Waiting Room: 5
    • Sites in Morgue: 6

Transfers - Jorge

  • Nothing to report
  • JR - Please look at the transfers that are needed for the miniaodsim
    • New JSONs: the transfer team still needs to figure out how to deal with these.
    • Will look at it and discuss options/solutions at the Monday meeting

Workflows

  • filesmismatch - SeangChan will look into what is going on there.

ReDigi

TaskChains

StepChain

Rereco

  • Highest Priority

Store Results

MonteCarlo

Agent Issues

Agent redeployment

  • cmssrv218 and cmssrv219 are in drain (workqueues overloaded). SeangChan will check whether they are ready for redeployment.

production SL6

FNAL                                   CERN
cmsgwms-submit1 (up)                   vocms0308 (up)
cmsgwms-submit2 (ready to redeploy)    vocms0309 (up)
cmssrv217 (up)                         vocms0310 (up)
cmssrv218 (drain - overloaded)         vocms0311 (ready to redeploy)
cmssrv219 (drain - overloaded)         vocms0304 (on HLT tests)
                                       vocms0303 (up / highprio)

RelVal - Andrew

L3 discussion - Ajit, Jean-Roch, Matteo

Opportunistic Resources

Automatic Assignment And Unified Software

  • We need documentation! Matteo is working on it and will continue to look at it.
  • Will be worked on while we are training Alison

AOB

-- JenniferAdelmanMcCarthy - 2016-01-20

Topic revision: r4 - 2016-01-27 - JenniferAdelmanMcCarthy
 