Workflow Team Meeting - Jan 21, 4PM CERN / 9AM FNAL time
Vidyo Link
Attending
- FNAL: Jen, Jorge, Gaston, Matteo, SeangChan
- US:
- CERN: Dima, Jean-Roch
- Brazil: Alan
Personnel
- Alan - In Brazil Dec 21-Jan 21; will be working from Brazil Jan 14-20. SeangChan has Alan's grandma's number.
- New Julian starting Feb 1
- JR in Zurich 18-20
- Jen to CERN 22-26???
- Possible training sessions Feb 8-12 - ND student Alison, Matteo, Paola/Kathrine
News - Dima
- We need to get data done! Very serious
- The workflows that are in assist-man and show 100% are actually at 99.999-something percent, so run recovery.
Hi everyone,
I'm so sorry, my internet connection is quite unstable this afternoon.
I just wanted to comment on two questions that were raised:
1) Recovery procedure: as Jen said, it checks input x output and then creates
a special resubmission workflow for a single dataset, ignoring all the other
output tiers. Thus, duplicates will be seen only if someone creates and
assigns exactly the same json/request.
2) Intermediate output for StepChain: I think the goal was to get it done for
February; unfortunately, due to holidays and me being vagabundo, it was not
accomplished. It will be done for March cmsweb.
Let me know in case you guys have any additional questions for me; I'll start
replying tomorrow evening.
Cheers,
Alan.
- Long term we will get more and more "monsters" running so we need to learn to manage them.
Top 3 issues affecting production
- manpower
- site issues at Ioannina and NCP
- "too many open files" error, a temporary CVMFS issue. Try resubmitting and see if they just run.
Site support - Gaston
- Problems at T2_EE_Estonia were caused by missing kernel sources. The site should be functional now.
- We are still investigating issues with T2_GR_Ioannina
- T2_UK_SGrid_RALPP issues were caused by an overload of the storage system; they've requested a reduction of DIGI-RECO workflows.
- Current Waiting Room: T2_IN_TIFR, T2_RU_IHEP, T2_RU_SINP, T2_TH_CUNSTDA, T2_RU_INR
- Current Morgue: T2_PL_Warsaw, T2_TR_METU, T2_RU_PNPI, T2_RU_ITEP, T2_MY_UPM_BIRUNI, T2_RU_RRC_KI
- Out of the Waiting Room: T2_ES_IFCA, T2_RU_IHEP
- Sites in Waiting Room: 5
- Sites in Morgue: 6
Transfers - Jorge
- Nothing to report
- JR - please look at the transfers that are needed for the miniaodsim
- New JSONs: the transfer team still needs to figure out how to deal with this.
- Will look at it and discuss options/solutions at the Monday meeting
Workflows
- filesmismatch - SeangChan will look into what is going on there.
Rereco
* Highest Priority
Store Results
Agent Issues
Agent redeployment
- cmssrv218 and 219 are in drain (workqueues overloaded). SeangChan will check whether they are ready for redeployment.
L3 discussion - Ajit, Jean-Roch, Matteo
Opportunistic Resources
Automatic Assignment And Unified Software
- We need documentation! Matteo is working on it and will continue to look at it.
- will be worked on when we are training in Alison
AOB
--
JenniferAdelmanMcCarthy - 2016-01-20