Workflow Team Meeting - March 17, 4PM CERN time, 10AM FNAL time
Vidyo Link
Attending
- FNAL: Jen, Gaston, Scarlet, Jesus, SeangChan
- US: Allie, Matteo
- CERN: Paola, Dima, Alan
Personnel
- Jorge March 24-25
- Jorge to Columbia April 15-May 2, Talk on April 27
- Welcome to Scarlet and Jesus - Workflow Team Operators
- Good Friday and Easter Monday off at CERN
- SeangChan taking 1/2 day next Thurs and Fri off
News - Dima
- So far it looks like the schedule is on! Big Campaign April 1
- Clean up what we have, finish tests, and finalize the T0 configuration. Alan and John are working on this; the challenge is getting the agent
- JR's "accidental test" of the T0 by using it in overflow worked
- Testing will likely happen next Tuesday
- Premixing is done and tests are running to see what it does, but it is out of the question to run it for the April 1 round
- multicore - what sites can we use
- We don't have the list of sites for multicore yet. Once we do, we need to test
- Jen and Matteo will have to figure this out next week
3 top issues affecting production
Site support - Gaston
News & Issues
| Date | Site | Into the Waiting Room | Out of the Waiting Room | Into the morgue | Out of the morgue |
| 2016-03-13 00:00:01 | T2_RU_INR | x | | | |
| 2016-03-16 00:00:01 | T2_FI_HIP | x | | | |
| 2016-03-17 00:00:01 | T2_AT_Vienna | x | | | |
| 2016-03-22 00:00:01 | T2_IT_Bari | x | | | |
Transfers - Jorge
Workflows
Rereco
- fabozzi_Commissioning2015-Cosmics-Boff-01Mar2016_763p2_160302_100618_8061
- File mismatch between PhEDEx and DAS due to file invalidation. I think the correct course of action is to kill and clone now that we know what happened, but it is a big workflow, so I wanted to discuss it first. Could recovery work here?
- vote is to kill and clone
Store Results
Agent Issues
- After next Tues, when we redeploy FNAL machines we should reboot
- Jen needs to poke Dave about redeploying agents
Agent redeployment
Summary of the scripts to take into account:
Scripts that modify workflows or datasets in the system:
1. Abort, clone, reject, assign, announce, close out, force complete, set status, change priority (they use reqMgrClient.py):
abortWorkflows (uses also dbs3Client.py)
abortAndClone (uses also dbs3Client.py and resubmit.py)
assignProdTaskChain.py
assignWorkflow.py
rejectAndClone.py (uses also dbs3Client.py and resubmit.py)
rejectWorkflows.py (uses also dbs3Client.py)
resubmit.py
announceWorkflows.py
closeOutWorkflows.py (uses also dbs3Client.py and phedexClient.py)
closeOutWorkflowsFiltered.py (uses all closeOutWorkflows*)
closeOutWorkflowsManual.py (uses all closeOutWorkflows*)
closeOutWorkflowsWeb.py (uses all closeOutWorkflows*)
forceCompleteWorkflows.py
setCascadeStatus.py
changePriorityWorkflow.py
2. Other operations:
makeACDC.py
makeAllACDC.py (uses makeACDC.py)
changeSplittingWorkflow.py
DBS3SetDatasetStatus.py (uses dbs3Client.py)
extendWorkflow.py (uses dbs3Client.py)
resubmitUnprocessedBlocks.py (uses changeSplittingWorkflow.py and assignWorkflow.py)
createSitesBackfill.py [Used for testing]
deleteInvalidOutput.py (uses also dbs3Client.py and phedexClient.py)
setDatasetStatusDBS3.py
Scripts that query information from the system:
duplicateEvents.py (uses also dbs3Client.py)
stuckRequestDiagnosis.py
WorkflowPercentage.py
getDatasetStatus.py (uses also dbs3Client.py)
getInputLocation.py (uses also dbs3Client.py and phedexClient.py)
condor_global_overview.py
condor_overview.py
findWorkflows.py
Clients:
dbs3Client
reqMgrClient
phedexClient
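When a client changes during redeployment, the "uses" notes above indicate which scripts need rechecking. The following is a minimal sketch of that lookup, not an existing tool: the `USES` table and the `affected_by` helper are hypothetical names introduced here, and the table holds only a subset of the scripts, with dependencies copied from the lists above. An empty list means no dependency is listed in these notes, not that the script has none.

```python
# Hypothetical dependency table: a subset of the scripts listed above.
# Group 1 scripts are listed as using reqMgrClient.py.
USES = {
    "assignWorkflow.py": ["reqMgrClient.py"],
    "resubmit.py": ["reqMgrClient.py"],
    "rejectAndClone.py": ["reqMgrClient.py", "dbs3Client.py", "resubmit.py"],
    "closeOutWorkflows.py": ["reqMgrClient.py", "dbs3Client.py", "phedexClient.py"],
    "makeACDC.py": [],  # no dependency listed in the notes
    "makeAllACDC.py": ["makeACDC.py"],
    "changeSplittingWorkflow.py": [],  # no dependency listed in the notes
    "resubmitUnprocessedBlocks.py": ["changeSplittingWorkflow.py", "assignWorkflow.py"],
    "extendWorkflow.py": ["dbs3Client.py"],
    "getInputLocation.py": ["dbs3Client.py", "phedexClient.py"],
}


def affected_by(target, uses=USES):
    """Return scripts that depend on `target`, directly or via another script."""
    affected = set()
    changed = True
    # Repeat until a full pass adds nothing, so indirect users are also found.
    while changed:
        changed = False
        for script, deps in uses.items():
            if script in affected:
                continue
            if any(dep == target or dep in affected for dep in deps):
                affected.add(script)
                changed = True
    return sorted(affected)


if __name__ == "__main__":
    for client in ("reqMgrClient.py", "dbs3Client.py", "phedexClient.py"):
        print(client, "->", ", ".join(affected_by(client)))
```

With this table, resubmitUnprocessedBlocks.py is reported for reqMgrClient.py even though it does not use that client directly, because it goes through assignWorkflow.py.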
L3 discussion - Ajit, Jean-Roch, Matteo
Opportunistic Resources
Automatic Assignment And Unified Software
AOB
--
JenniferAdelmanMcCarthy - 2016-03-24