WMAgent End-to-End Validation Tests for the HG1409 cmsweb Upgrade
Upgrade schedule
- 18 August: release candidate RPMs due for pre-prod deployment *deadline for requests*
- 19 August: cmsweb-testbed pre-prod release candidate deployment
- 31 August: validation results due *deadline for validation*
- 2 September: production deployment
Release changes trac ticket
Validation results trac ticket
Versions tested
HG1409a
ReqMgr version 0.9.97.pre2
Global WQ version 0.9.97.pre2
WMStats version 0.9.97.pre2
WMAgent version used for the testing: v0.9.97.pre2
Release Notes (UPDATE THEM)
RequestManager
Global_WorkQueue
WMStats
Observed changes from previous versions
| Test | Tester | Completed | Status | Comments |
| Lumi mask input | | | | |
| MC from scratch - Lumi mask input | | | | |
| MC with input - Lumi mask input | | | | |
| ReDigi - Lumi mask input | | | | |
| ReReco - Lumi mask input | | | | |
| TaskChain - Lumi mask input | | | | |
| Wrong run number - Lumi mask input | | | | |
| Wrong lumi range - Lumi mask input | | | | |
| Force completing workflows #5249 | | | | |
| PhEDEx node naming | | | | |
| Disk subscription #5142 | | | | |
Tests
| Test | Tester | Completed | Status | Comments |
| Bug fixes / New features in WMStats | | | | |
| Bug fixes in WMAgent | | | | |
| New features in WMAgent | | | | |
| LheInputFiles feature added to TaskChain requests #4871 | Alan | | | |
| EventsPerLumi capability added to TaskChain requests #4872 | Alan | | | |
| Bug fixes in ReqMgr | | | | |
| MC from scratch workflow extension | | | | |
| Change the permission for the agent to update the ReqMgr status | | | | |
| New features in ReqMgr | | | | |
| Bug fixes in WorkQueue | | | | |
| Standard workflows | | | | |
| Old request moved from completed to closed-out and announced | | | | |
| Old request moved from completed to rejected | | | | |
| Old request moved from assignment-approved to rejected | | | | |
| Request moved from assigned to aborted | Alan | | | |
| Request moved from assigned to rejected | Alan | | | |
| Request moved from acquired to aborted | Alan | | | |
| Request moved from acquired to rejected | Alan | | | |
| Request moved from running to aborted | Alan | | | |
| MonteCarlo workflow | Justas | | | |
| MonteCarlo LHE workflow | Justas | | | |
| MonteCarloFromGEN workflow | Justas | | | |
| ReDigi workflow | Justas | | | |
| ReReco+skim workflow | Justas | | | |
| ACDC for Production | Justas | | | |
| High Scale Test | Alan | | | |
| TaskChain: MC recycling | Alan | | | |
| TaskChain: MC from scratch | Alan | | | |
| TaskChain: FastSim workflow + event splitting | Alan | | | |
| TaskChain: Data workflow | Alan | | | |
| TaskChain: Pileup workflow by recycling | Alan | | | |
| TaskChain: Pileup workflow from scratch | Alan | | | |
| TaskChain: Pileup Pyquen workflow (PrimaryDataset override) | Alan | | | |
| TaskChain: automatic harvesting | Alan | | | |
| TaskChain: different ProcessingString per task | Alan | | | |
| TaskChain: KeepOutput = False feature (single and cascade) | Alan | | | |
| TaskChain: 'TransientOutputModules': ['RAWoutput'] and TransientOutputModules = ['RECOSIMoutput'] | Alan | | | |
| TaskChain: ACDC via WMStats | Alan | | | |
| TaskChain: MC Pre-Mixing workflow | Alan | | | |
Optional things to test
| Test | Tester | Completed | Status | Comments |
| TaskChain: cascade "closed-out" and "announced" changes via script | Alan | | | |
| Propagate Memory (RequestMemory in MB), Disk (RequestDisk in KB) and Job length (MaxWallTimeMins in minutes) estimates to Condor through the JDL #4472 | | | | |
| Apply smart error handling for jobs that failed due to high memory usage or excessive run time #4473 | | | | |
| Robust merge jobs: add missing merge files to ACDC, proceed with existing files #4476 | | | | |
| Fixed timeouts when connecting to the ReqMgr which prevented workflows from being acquired #4660 | | | | |
| Track pileup location and do NOT fail out requests #3733 and #4507 | | | | |
| Priority: now a required parameter; it can only take values up to 1 million | Alan | | | |
| Argument validation is stricter: a parameter must either have a valid value or be absent; dummy values will most likely fail validation | Alan | | | |
| JobSplitting can now be specified at request creation, using "SplittingAlgo" and related parameters | Alan | | | |
| Do not allow rejection of requests in "assigned" state #4976 | Alan | | | |
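The Priority and argument-validation items above can be illustrated with a small client-side check run before submitting a test request. This is only a sketch, not the actual ReqMgr validation code. The helper names validate_priority and drop_unset are hypothetical, and the lower bound of 0 and the integer-only rule are assumptions; only the upper bound of 1 million and the "valid value or not present" rule come from the checklist.

```python
MAX_PRIORITY = 1000000  # upper bound quoted in the checklist


def validate_priority(request_args):
    """Return the request priority, or raise ValueError if it is missing or invalid."""
    if "Priority" not in request_args:
        raise ValueError("Priority is a required parameter")
    value = request_args["Priority"]
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("Priority must be an integer, got %r" % (value,))
    # Lower bound of 0 is an assumption; the checklist only states the upper bound
    if value < 0 or value > MAX_PRIORITY:
        raise ValueError("Priority must be between 0 and %d" % MAX_PRIORITY)
    return value


def drop_unset(request_args):
    """Omit unset parameters instead of sending dummy values such as None or ''."""
    return {key: val for key, val in request_args.items() if val not in (None, "")}


if __name__ == "__main__":
    # Illustrative request fragment; the values are made up for this example
    request = drop_unset({"Priority": 90000, "SplittingAlgo": "EventBased", "Memory": None})
    print(sorted(request))
    print(validate_priority(request))
```

Dropping unset keys before validation follows the stricter rule described in the table: a parameter that has no meaningful value is left out of the request rather than filled with a placeholder.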
--
AlanMalta - 14 Aug 2014