Disk / Tape Separation plan for PIC - Workflow Team

| When | What | Who |
| Feb-21 | Start draining the PIC site | Workflow Team |
| Feb-24 | Site is drained; announce in the CompOps meeting | Workflow Team |
| Feb-25 (delayed 2 days) | Switch TFC configuration, site points | PIC Managers |
| | Request link updates to xrootd, PhEDEx et al. | PIC Managers |
| | Notify the Workflow Team by email when the config changes are done | PIC Managers |
| | "Send the new se-name from site-local-config.xml to PhEDEx admins (even if unchanged), asking them to change the SE_NAMEs for T1_ES_PIC_Disk/Buffer/MSS in TMDB accordingly". After the change, the SE_NAMEs in TMDB should be: for T1_ES_PIC_Disk, the same value as se-name in the PIC site-local-config.xml; for T1_ES_PIC_Buffer/MSS, a fake value TAPE.se-name | |
| Feb-26 (delayed 2 days) | Switch the config in the testbed agents and assign tests to PIC (Disk Tape separation procedure) | Workflow Team |
| Feb-28 (delayed 2 days) | Notify by email whether the tests were successful | Workflow Team |
| | Apply the config changes to the Production Agents | Workflow Team |
| Mar-3 (delayed 2 days) | Bring PIC back into production; notify in the CompOps meeting | Workflow Team |
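The TMDB rule in the Feb-25 step can be expressed as a small Python check. This is a minimal sketch, not part of the PhEDEx or WMAgent tooling. It assumes that the se-name is stored as `<se-name value="..."/>` somewhere in site-local-config.xml. The sample XML, the function names (`read_se_name`, `expected_tmdb_se_names`, `check_tmdb`) and the dict of current TMDB values passed to the check are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal site-local-config.xml fragment. The real PIC file holds
# much more; this sketch only assumes <se-name> keeps its value in a "value" attribute.
SAMPLE_SITE_LOCAL_CONFIG = """
<site-local-config>
  <site name="T1_ES_PIC">
    <local-stage-out>
      <se-name value="srmcms.pic.es"/>
    </local-stage-out>
  </site>
</site-local-config>
"""


def read_se_name(xml_text):
    """Return the value of the first <se-name value="..."> element in the document."""
    root = ET.fromstring(xml_text.strip())
    for elem in root.iter("se-name"):
        value = elem.get("value")
        if value:
            return value
    raise ValueError("no <se-name value=...> element found")


def expected_tmdb_se_names(se_name, site="T1_ES_PIC"):
    """Expected SE_NAME per PhEDEx node after the disk/tape separation."""
    tape_se = "TAPE." + se_name  # fake value used for the tape-backed nodes
    return {
        site + "_Disk": se_name,  # same value as se-name in site-local-config.xml
        site + "_Buffer": tape_se,
        site + "_MSS": tape_se,
    }


def check_tmdb(actual, se_name, site="T1_ES_PIC"):
    """Return {node: (actual, expected)} for every node whose SE_NAME is wrong or missing."""
    expected = expected_tmdb_se_names(se_name, site)
    return {
        node: (actual.get(node), want)
        for node, want in expected.items()
        if actual.get(node) != want
    }


if __name__ == "__main__":
    se = read_se_name(SAMPLE_SITE_LOCAL_CONFIG)
    print(expected_tmdb_se_names(se))
```

An empty result from `check_tmdb` means that the (hypothetical) current TMDB values match the rule. Any entry names the node that still needs to be changed by the PhEDEx admins.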

Report

  • On Tuesday and Wednesday the plan was delayed because pilots were still running; agent vocms235 had problems flushing its jobs even after the site was down.
  • On Thursday morning the site admins announced that the site was finally free of pilots. They applied a temporary DN filter to keep new pilots from arriving.
  • On Thursday afternoon the TFC and TMDB (PhEDEx) changes were applied. The site "srmcms.pic.es" was also added as a disk/tape-separated site in the Production and TestBed agent configurations.
  • On Thursday afternoon this test MC workflow was assigned: jbadillo_RequestString-OVERRIDE-ME_140227_173649_1290
    • Running on testbed - vocms142 (testbed-dataops)
    • Run Whitelist = [T1_ES_PIC]
    • Custodial subs = []
  • On Friday afternoon (2pm) the status of the test was:
    • 418 successful production jobs
    • 60 jobs failed due to xrootd failures
    • 7827 jobs queued or missing; the workflow was force-completed since no merge jobs succeeded and the dataset was empty
    • This workflow was aborted due to merge failures
  • On Friday afternoon (3pm) another test was assigned: jbadillo_PICDiskTapeSepTest_140228_142249_4259
    • Running on testbed vocms230 (t1-testbed)
    • Run Whitelist = [T1_ES_PIC]
    • Custodial subs = []
    • This workflow was aborted due to incorrect parameters
  • On Friday afternoon (4pm) a third test was assigned: jbadillo_ES_PICDiskTapeSepTest_140228_162011_7284

-- JulianBadillo - 19 Feb 2014

Topic revision: r8 - 2014-03-03 - JulianBadillo