Reprocessing and Production Team Meeting - May 12, 4 PM CERN, 9 AM FNAL time
Vidyo Link
Attending
- FNAL: Jen, Jorge, Jesus, Matteo, SeangChan
- US: Allie
- CERN : Alan, Paola, Andrew, JeanRoch
- Korea :
Personnel
- Gaston to Colombia May 12-28, talk on the 13th
- Paola May 13
- Jen 1/2 day May 13 and 20
News - Dima
- New Shifters! Mykola and Svenja from DESY University.
- Paola & Alan will do some training with them next week
3 top issues affecting production
- Lots of submit failures at FNAL, KIT, RWTH, and PISA, needing manual cleanup
- Where are we in testing merge issues at T0_CH_CERN?
- Script has been integrated, I downloaded it Wed and it is working perfectly! Thanks Paola!
- filemismatch: delay in uploading is bad for bookkeeping. pdmvserv_SUS-RunIISummer15GS-00131_00284_v0__160323_032949_5198 has been in the list for a couple of weeks
- running* => completed transition only when all files are in? Is there an API to check that this is all done? Any way to have this automated? Please provide the script that checks the agents for files pending injection.
- T0 merge: why do permission issues come and go?
- /reqmgr2/data/request?status= is failing repeatedly, causing errors in Unified.
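One possible way to make a client more tolerant of the repeated /reqmgr2/data/request failures is to retry the call with exponential backoff. The following is only a minimal sketch, not how Unified actually handles it: `fetch_with_retry`, its parameters, and the default urllib fetcher are all hypothetical names introduced for illustration, and real ReqMgr2 access would also need grid certificate handling, which is left out here.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, fetch=None, attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url), retrying on I/O errors with exponential backoff.

    All names here are hypothetical; this is not an existing Unified helper.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    if fetch is None:
        # Assumed default: a plain GET. Certificate handling is omitted.
        def fetch(u):
            with urllib.request.urlopen(u, timeout=30) as resp:
                return resp.read()

    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except (urllib.error.URLError, OSError) as err:
            last_error = err
            # Wait 2s, 4s, 8s, ... but not after the final attempt.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

Passing `fetch` and `sleep` as arguments keeps the retry logic testable without touching the network; a caller would pass whatever full request URL it already uses.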
Site support - Gaston is on his way to Colombia
- T2_CH_CSCS - where are we in the testing?
- Last week's test failed but this week's is OK, so Paola will report her findings in a GGUS ticket
| Date | Site | Into the Waiting Room | Out of the Waiting Room | Into the morgue | Out of the morgue |
| 2016-04-21 00:00:01 | T2_US_Caltech | x | | | |
| 2016-04-21 00:00:01 | T2_PL_Swierk | x | | | |
| 2016-04-22 00:00:01 | T2_TW_NCHC | x | | | |
| 2016-04-22 00:00:01 | T2_IT_Bari | | x | | |
| 2016-04-24 00:00:01 | T2_IN_TIFR | | x | | |
| 2016-04-25 00:00:01 | T2_US_Caltech | x | | | |
| 2016-04-25 00:00:01 | T2_UK_London_Brunel | x | | | |
| 2016-04-25 00:00:01 | T2_US_UCSD | x | | | |
| 2016-04-25 00:00:01 | T2_BR_SPRACE | x | | | |
Transfers - Jorge
- there is a GenSim dataset stuck in transfers that Jorge is working on
Workflows
- lots of workflows are completing, lots of recoveries pending
- lots of the smaller workflows that just finished went into recovery, at a rate of about 25%; it would be interesting to keep track of this number over time.
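Tracking the recovery rate over time could start as simply as appending one row per day to a CSV file. This is a minimal sketch under assumptions: the function names, the CSV layout, and the idea that the counts of finished and recovered workflows are collected by hand or by another tool are all illustrative, not an existing script.

```python
import csv
from datetime import date


def recovery_rate(finished, recovered):
    """Fraction of finished workflows that went into recovery."""
    if finished == 0:
        return 0.0
    return recovered / finished


def append_rate(path, day, finished, recovered):
    """Append one row (date, finished, recovered, rate) to a CSV log."""
    rate = recovery_rate(finished, recovered)
    with open(path, "a", newline="") as handle:
        csv.writer(handle).writerow(
            [day.isoformat(), finished, recovered, f"{rate:.3f}"]
        )
    return rate
```

For example, `append_rate("recovery_rate.csv", date.today(), finished, recovered)` would add the day's figure, and the resulting file can be plotted later to see the trend.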
Rereco
Store Results
Agent Issues
Agent redeployment
Merging Scripts
- The merging changes are in this pull request: https://github.com/CMSCompOps/WmAgentScripts/pull/139
- reject.py/resubmit.py tested OK.
- Do we want to merge resubmitUnprocessedBlocks.py and resubmitWithBlockBlacklist.py? Has Unified picked up the below functionality?
- assignProdTaskChain.py was changed according to https://github.com/CMSCompOps/WmAgentScripts/pull/140, script tested OK. Are these changes propagated to the new merged assign.py?
- What is left?
- Automatic ACDC, step 1: start with a script that drills down through the various wmstats calls for a given workflow and exposes the main issues: error codes, sites, etc. Paola will be on top of that.
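The "expose the main issues" part of step 1 could be sketched as a simple count of failures per error code and site. This is only an illustration under assumptions: the wmstats calls themselves are not shown, and the input format (a list of dicts with keys `exit_code` and `site`) and the function name `summarize_errors` are hypothetical, not the actual wmstats output.

```python
from collections import Counter


def summarize_errors(job_errors, top=3):
    """Return the most frequent (exit_code, site) pairs with their counts.

    job_errors is an assumed, already-fetched list of dicts with the
    hypothetical keys "exit_code" and "site".
    """
    counts = Counter((err["exit_code"], err["site"]) for err in job_errors)
    return counts.most_common(top)


# Invented sample records, for illustration only.
sample = [
    {"exit_code": 8021, "site": "T2_US_Caltech"},
    {"exit_code": 8021, "site": "T2_US_Caltech"},
    {"exit_code": 8021, "site": "T2_US_Caltech"},
    {"exit_code": 99109, "site": "T2_CH_CSCS"},
]

for (code, site), n in summarize_errors(sample):
    print(f"exit code {code} at {site}: {n} failures")
```

Keeping the summary separate from the fetching step means the same function could later feed the decision on whether a workflow is a candidate for automatic ACDC.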
AOB
--
JenniferAdelmanMcCarthy - 2016-05-12
Topic revision: r6 - 2016-05-12 - JenniferAdelmanMcCarthy