Reprocessing and Production Team Meeting - May 26, 4PM CERN time, 9AM FNAL time


Vidyo Link

Attending

  • FNAL: Jen, Jorge, Matteo, Jesus, SeangChan, Scarlet
  • US: Allie
  • CERN: JeanRoc, Andrew, Paola, Alan, Sebastian, Dima
  • Korea:

Personnel

  • Gaston in Colombia May 12-28, talk on the 13th
  • Jen half days June 3 and 6 (to be confirmed)
  • Jen vacation July 25-29
  • SeangChan June 2-July 5
  • Jorge June 13-24
  • US Holiday Monday May 30

News - Dima

  • The plan is to finalize production for the summer by the end of May. We still have a huge number of requests in our hands, and we need to get them out of our hands and back into the hands of Physics.
  • Force-complete workflows that are over 95% and bypass workflows over 90%. Dima told us to do this ONLY for DR80; everything else needs to have normal standards and miniAODs.
  • We have requests for new post-ICHEP samples with premixing. We need to get together to figure out how to test this and prepare so that we are ready.

Top issues affecting production

  • assistance-manual: lots of workflows to look at
  • High percentage of wallclock failures at T2_CH_CERN
  • The connection to CERN from FNAL has been miserable for the last week and a half: elogs time out, the dashboard won't load, and WMStats has been slow to load, which made doing anything from our end difficult this week.
  • CSCS is down, but lots of failing merge jobs are still going to it.
  • Lower-priority workflows are running while we still have high-priority work that needs to run. Why? Alan and SeangChan are looking into this.
  • Long workflows that move into failed: Alan will take care of them after the meeting. ACDCs need to be made properly.
  • Some workflows were assigned with the wrong version number because of a problem in a script, which has now been fixed. So we move on.

Site support -

  • Problems loading the site support pages all week made it hard to determine what was up and down.

Transfers - Jorge

  • Slow transfers between FNAL and CERN
  • Blocks in the stuck-transfer JSON: the blocks are still open. SeangChan is looking at why the agent didn't close them automatically, but they can be closed.

Workflows

  • Lots of workflows are completing; lots of recoveries are pending.
  • We have several "The job was killed by the WMAgent for using too much wallclock time" failures at T2_CH_CERN; we need to go through these!

ReDigi

MiniAOD

TaskChains

StepChain

  • NA

Rereco

Store Results

MonteCarlo

Agent Issues

Agent redeployment

RequestMgr2 Migration

Merging Scripts

  • It was necessary to make some changes to the assignProdTaskChain.py script to prevent mistakes such as assigning a wrong processing version.
  • The resubmit.py script was improved; Jen, please run some tests. The documentation is being updated.
  • The reject.py script was changed to prevent dataset invalidations when we need to discard a useless ACDC.

RelVal - Andrew

AOB

-- JenniferAdelmanMcCarthy - 2016-05-26

Topic revision: r4 - 2016-06-02 - JenniferAdelmanMcCarthy
 