Reprocessing and Production Team Meeting - Aug 4, 4PM CERN time, 9AM FNAL time


Vidyo Link

Attending

  • FNAL: SeangChan, Jorge, Jen, Gaston, Matteo, Allie, Jesus, Scarlet
  • CERN: Paola, Sebastian, Dima

Personnel

  • Workshop Week of Aug 16-19
  • Alan - on holidays from Aug 2-9
  • Jen off 23-24

News - Dima

  • Testing Premix!
  • We want to get 40K CPUs and 100K multicore jobs. The failure rate is high, so we need to figure out what is going on (all the AAA)
  • We need to give more feedback early on and find the errors. It will be a few weeks before we are doing this for real, but we need to focus on this now
  • Agendas for the workshop need to be finalized

Top issues affecting production

  • Agents - stability: 311 was having real issues and not creating new jobs. SeangChan looked at it; it is in drain, and jobs are being created now, slower than normal, but it is working
  • 219 - there are no jobs to pull down. The 12 acquired workflows are going to Vandy and CSCS, and both sites are in drain
  • CSCS - fixed their problem (one of their libraries was out of date), but they are still working on it; keep it in drain
  • Vandy - is leaving its downtime today, so we will be able to start running once they come out of the waiting room in 3 days
  • http://dabercro.web.cern.ch/dabercro/unified/showlog/?search=critical - need to start checking this

Site support

  • CSCS - still in drain
  • Vandy - in waiting room but should be out later today
  • Bristol - left downtime earlier last week but has been failing tests; tickets are open
  • Brunel - just went in today, problems with links
  • Vienna and UERJ are in drain manually for merge issues

Transfer Team

  • The team has been working on stuck custodial transfers; Sebastian is working on that
  • lostBlocksDatasets.json - new. What is the difference between that and the stuck transfers JSON? It is JR's file; he is not on the call, but hopefully we can ask in person later today

Workflows

  • A large workflow was aborted, but it is still creating jobs that need to be killed by hand.

Agent Issues

Agent redeployment

  • Mid Aug - start redeploying everything

RequestMgr2 Migration

Scripts

  • Modified the makeAllACDC script; need to update the twiki for that, as well as for the assign script changes that JR made

RelVal - Andrew

AOB

-- JenniferAdelmanMcCarthy - 2016-08-04
