Workflow Team Meeting - Sept 11 4PM CERN time

Vidyo Link

Attending

  • FNAL - Jen, Dave, Seangchan, Luis
  • CERN - On Holiday

Personnel

Sep 8 -> Sep 11 Jasper
Sep 11 -> Sep 18 Jasper
  • Dave is on jury duty on Tuesdays until the end of Sept

News

  • Quiet shifts continue until Oct, with occasional bursts of activity
    • hyper-urgent work is still waiting: the test workflows are failing and have been sent back

Site support

  • nothing to report

Jasper's notes

Agent Issues

  • cert issues over the weekend
  • Luis found a problem with LogCollect: when a WF is running on multiple machines, LogCollect sometimes fails to collect some of the jobs. This has been happening for a long time and was only just discovered.

Redeployment plan

  • everything is up to date
  • SL6 agents - Burt gave us 3 new machines cmssrv217, cmssrv218, and cmssrv219 to start working with.
    • Seangchan said things are ready to go; as soon as the machines are up, let's get backfill going.
    • the development team is a bit stuck on a 1.5 couch replication issue that needs to get fixed.
    • after some discussion we agreed that, as well as testing the agents on SL6, we will test whether we can effectively run processing using one team
    • Let's make them team production
      • MC, production, backfill and maybe even relval will be thrown at these machines to test whether we can handle everything with one team by adjusting priorities.
      • During our "test period" we will run backfill, as well as low priority workflows on the SL6 machines to test things out.
      • we will have to remap the priorities that PPD gives us. We have asked them to give us more reasonable priorities, but they haven't yet, so we will just have to do it ourselves.
        • Oli gave us rules back in Feb that PPD never followed, but we will.
          • we will work with one team and use the priorities outlined here: https://indico.cern.ch/event/300036/contribution/2/material/slides/0.pdf see pages 5-6
            • RelVal/Task Chain will have priority levels 70-90K
              • the meaning of "TaskChain" has changed a bit over the last couple weeks, we are doing work there which should really be lower priority, more like in the range of the MC WF's. This needs more discussion
            • Redigi/Rereco - will have priority levels of 40-60K
            • MC - has priorities 10-50K
            • Backfill will have Priorities set less than 10K
        • we will try to work this way for the next couple months, treat it as production and when it's running smoothly announce the new scheme to Monday meeting
        • we will set up the new FNAL machines now; when we get the new machines at CERN in Oct we will broaden our pool.
        • plan to retire old machines as soon as new system is up and running smoothly. All SL6 by the end of the year!
        • need to start checking all our scripts to make sure they run on SL6 without problems.
        • Dave still needs to come up with a condor priority for these machines. He'll talk to Tony about how they are set up and let us know.
  • we need to figure out what is in couch and get it to clear out properly, so that it works when we are in full-blown production
  • Force complete is now an option from running-closed, but it is not functional yet. As it is now, if there are other datasets that are using
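
The priority bands listed under the redeployment plan could be applied with something like the sketch below. This is only an illustration: the type names used as keys, the clamping rule and the function name remap_priority are assumptions and not part of any agreed tool. The notes only fix the numeric ranges. The bands for MC and Redigi/Rereco overlap (40-50K), which the sketch keeps as given.

```python
# Priority bands taken from the notes above ("K" = 1000).
# The dictionary keys and the clamping behaviour are illustrative assumptions.
PRIORITY_BANDS = {
    "RelVal": (70_000, 90_000),
    "TaskChain": (70_000, 90_000),
    "ReDigi": (40_000, 60_000),
    "ReReco": (40_000, 60_000),
    "MonteCarlo": (10_000, 50_000),
    "Backfill": (0, 9_999),  # "less than 10K"
}


def remap_priority(workflow_type, requested):
    """Clamp a requested priority into the band for its workflow type."""
    try:
        low, high = PRIORITY_BANDS[workflow_type]
    except KeyError:
        raise ValueError(f"unknown workflow type: {workflow_type!r}") from None
    # Values below the band are raised to its floor, values above are
    # lowered to its ceiling, and values inside the band are left alone.
    return max(low, min(high, requested))
```

For example, remap_priority("MonteCarlo", 100_000) returns 50000, while a RelVal request of 80000 is already inside its band and is returned unchanged.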

Workflows

miniaod's

  • next round in Sept - but not here yet
  • There were 2 WFs with duplicates. We need to track how far back the duplicates go and invalidate the affected output. Still no response from PPD on what they want us to do with the input datasets.

ReDigi

  • Work is ongoing, but everything is running smoothly

T1's summary

Rereco

  • nothing... literally

Store Results

MonteCarlo

  • next round of upgrade MC is coming; still waiting
  • MC validation campaign in Oct/Nov

RelVal Andrew

-- JenniferAdelmanMcCarthy - 10 Sep 2014
Topic revision: r3 - 2014-09-11 - JenniferAdelmanMcCarthy