Reprocessing and Production Team Meeting - April 28, 4 PM CERN time, 9 AM FNAL time

Vidyo Link

Attending

  • FNAL:
  • US:
  • CERN :
  • Korea :

Personnel

  • Youn on shift
  • CERN holidays in the next week, May 5 and 6.
  • Jorge to Colombia April 15-May 2; talk on April 27
  • Gaston to Colombia in early May; dates yet to be determined.
  • Allie will be gone April 29 through May 8.
  • Alan on holidays from May 5 to 11 (included).
  • Paola May 4, May 9 and May 13

News - Dima

  • The meeting has a new name! Christoph has decided that "Reprocessing and Production Team" is a better description of what we do. For now our functionality remains the same, and Dave Mason has already taken the name "bunnies" so we couldn't take that one.

3 top issues affecting production

  • Unified is not creating ACDCs for every step of a workflow, only for some of them, and then submitting those. We need to work out which steps have been ACDC'd and which have not, and create the missing ones by hand.
  • Lower level work taking over the system
  • Agents silently having components stuck and not submitting jobs.
    • How do we identify this faster? What are our tools for unsticking things? When should we restart components, and when should we ask SeangChan or Alan to look?
  • Where are we in testing the merge issues at T0_CH_CERN?
  • ACDC in assignment-approved: what is the situation?
  • assistance-*filemismatch: what is the situation with having this taken care of? Is there a way to identify
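
For the first issue above, finding which steps still need an ACDC created by hand amounts to a set difference between the tasks that have failures and the tasks that already have an ACDC. The following is only a minimal sketch: the function name `tasks_missing_acdc` and the task paths are made up for illustration, and it does not query Unified, ReqMgr, or any other real service.

```python
def tasks_missing_acdc(failed_tasks, acdc_tasks):
    """Return the failed task paths that have no ACDC yet, sorted.

    Hypothetical helper for illustration; both arguments are plain
    iterables of task-path strings collected by some other means.
    """
    return sorted(set(failed_tasks) - set(acdc_tasks))


# Made-up task paths, for illustration only.
failed = ["/wf/StepOne", "/wf/StepOne/StepOneMerge", "/wf/StepTwo"]
already_acdc = ["/wf/StepOne"]

missing = tasks_missing_acdc(failed, already_acdc)
print(missing)
```

The result lists the tasks for which an ACDC would still have to be created manually.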

Site support - Gaston

  • T2_CH_CSCS - where are we in the testing?

| Date | Site | Into the Waiting Room | Out of the Waiting Room | Into the morgue | Out of the morgue |
| 2016-04-21 00:00:01 | T2_US_Caltech | x |  |  |  |
| 2016-04-21 00:00:01 | T2_PL_Swierk | x |  |  |  |
| 2016-04-22 00:00:01 | T2_TW_NCHC | x |  |  |  |
| 2016-04-22 00:00:01 | T2_IT_Bari |  | x |  |  |
| 2016-04-24 00:00:01 | T2_IN_TIFR |  | x |  |  |
| 2016-04-25 00:00:01 | T2_US_Caltech | x |  |  |  |
| 2016-04-25 00:00:01 | T2_UK_London_Brunel | x |  |  |  |
| 2016-04-25 00:00:01 | T2_US_UCSD | x |  |  |  |
| 2016-04-25 00:00:01 | T2_BR_SPRACE | x |  |  |  |

Transfers - Jorge

Workflows

  • Submitting ACDCs only to the sites in the white list is not enough now that we have overflow; we need a wider list.
  • Is T2_CH_CERN_HLT the only site that is an "exception" where we can overflow to a site in drain?
    • Drain means drain the site: let any processing jobs already there finish, then merge everything else and clean up, because the site is going down. Overflowing to sites in drain without following these rules is dangerous!
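
The drain rule above can be sketched as a filter on the widened ACDC site list. This is an illustration only, not code from Unified or WmAgentScripts: the function `acdc_site_list`, its parameters, and the placeholder site names and states are hypothetical, and whether T2_CH_CERN_HLT is the only allowed exception is still the open question raised above.

```python
def acdc_site_list(whitelist, overflow_sites, draining, drain_exceptions=()):
    """Widen a white list with overflow sites, skipping sites in drain.

    Hypothetical helper: sites in `draining` are excluded unless they
    are explicitly listed in `drain_exceptions`.
    """
    allowed_in_drain = set(drain_exceptions)
    result = []
    for site in list(whitelist) + list(overflow_sites):
        if site in result:
            continue  # keep first occurrence, preserve order
        if site in draining and site not in allowed_in_drain:
            continue  # do not send new work to a draining site
        result.append(site)
    return result


# Placeholder sites and states, made up for illustration.
sites = acdc_site_list(
    whitelist=["T1_XX_SiteA", "T2_XX_SiteB"],
    overflow_sites=["T2_XX_SiteC", "T2_CH_CERN_HLT", "T2_XX_SiteB"],
    draining={"T2_XX_SiteC", "T2_CH_CERN_HLT"},
    drain_exceptions={"T2_CH_CERN_HLT"},
)
print(sites)
```

Keeping the exceptions as an explicit argument means a draining site is never used by accident; it has to be named deliberately.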

ReDigi

MiniAOD

TaskChains

StepChain

  • NA

Rereco

Store Results

MonteCarlo

Agent Issues

Agent redeployment

RequestMgr2 Migration

Merging Scripts

  • TaskChain Merge jobs are not being submitted using the script that is in the main directory
  • Allie reviewed the Unified script rejector.py and the new reject.py script that Paola made. They seem to have all of the same functionalities. We should start using rejector.py instead.
  • Summary of merged scripts:
    • assign.py = assignProdTaskChain.py + assignWorkflow.py. Needs fixing (https://github.com/CMSCompOps/WmAgentScripts/pull/135). Pick info from siteInfo() class in utils.py. Needs testing.
    • reject.py = rejectAndClone.py + rejectWorkflows.py + abortWorkflows.py + abortAndClone.py. No additional functionalities with the Unified rejector.py module.
    • The shared core of rejector.py and reject.py should be factored into common code, and the Unified-specific part factored out, so that both use the same code while reject.py does not touch anything in Unified. People who need to reject or clone workflows (for backfill, for example) will require this decoupling.
    • condor_overview2.py = condor_global_overview.py + condor_overview.py. Needs testing.
  • Remaining proposed merging:
    • extendWorkflow.py + resubmit.py
  • Next steps:
    • finish merging
    • test
    • clean the repository from unused/unmerged scripts
    • further merge with Unified
  • ACDC:
    • Automate. Paola will start reporting any recurring ACDC patterns that we could fully automate.

RelVal - Andrew

AOB

-- JenniferAdelmanMcCarthy - 2016-04-27

Topic revision: r9 - 2016-04-28 - JeanrochVlimant
 