Workflow Team Meeting - Sept 18 4PM CERN time

Vidyo Link

Attending

  • FNAL - Dave, Luis, Jen, Luis, SeangChan, Jorge
  • CERN - Julian, Maric, Alan

Personnel

Sep 11 -> Sep 18 Jasper
Sep 18 -> Sep 25 ...
  • Dave is on jury duty on Tuesdays until the end of Sept
  • Looks like we are back in the hunt for an EU operator?

News

  • Quiet shifts continue until Oct, with occasional bursts of activity
  • Hyper-urgent Upg2023SHCAL14DR is running; keep a very close eye on it
    • RSS is set at 2.8
    • events per lumi cut by 1/2
    • no time for clone, ACDC and send back

Site support

  • nothing to report

Jasper's notes

Agent Issues

  • CMSWeb down over weekend, unstable again on Wed
  • High load on one of the backends for stageout

Redeployment plan

  • everything is up to date
  • SL6 agents: new machines cmssrv217, cmssrv218, and cmssrv219
    • Julian: How is the installation going? Is there a time estimate?
      • Alan was testing the deployment script; at least one of the machines will be deployed tomorrow
      • We could send backfill tomorrow
        • Dave and Jen will come up with candidate WFs
    • Development team: any progress on the couch 1.5 replication issues?
      • We will run for 3-6 weeks with backfill and low-priority real work to make sure it works; once it does, we go live
  • Single team: "production"
    1. Test-Phase: new FNAL (SL6) machines, backfill and low-priority MC.
      1. Use new prio-schema (schema remapping): see Prio schema, pages 5-6
        RelVal/Task Chain: 70-90K
        Redigi/Rereco: 40-60K
        MC: 10-50K
        Backfill: < 10K
      • We need to include Step0's and StoreResults.
      • I (Julian) would still keep RelVal in a separate team.
    2. Migration-Phase: When we think it's working
      1. slowly start assigning things to "production"
      2. slowly drain "mc" and "reproc_lowprio" machines
      3. If we get new SL6's from CERN, we plug them into team "production"
      4. If we don't (or the ones we get are not good enough), we plug in the SL5 machines.
    3. Details to define:
      • Time frames
      • Condor priorities (in the collector).
      • Our production scripts
    • We will try to work this way for the next couple of months, treat it as production, and once it is running smoothly announce the new scheme at the Monday meeting
    • We will set up the new FNAL machines now; when we get the new virtual machines at CERN in Oct, we will broaden our pool.
    • Plan to retire the old machines as soon as the new system is up and running smoothly. All SL6 by the end of the year!
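
A rough sketch of the new prio-schema bands listed above, useful for sanity-checking a priority before assigning a WF to team "production". This is illustration only: the names PRIORITY_BANDS, in_band and overlapping_bands are made up here and are not part of WMAgent or any production script. It assumes integer priorities, inclusive band edges, and a lower bound of 0 for backfill; none of these were settled in the meeting.

```python
# Hypothetical helper, not an existing CMS/WMAgent API.
# Bands are taken from the notes; edges are ASSUMED inclusive,
# and "Backfill < 10K" is ASSUMED to mean 0..9999 for integer priorities.
PRIORITY_BANDS = {
    "RelVal/TaskChain": (70_000, 90_000),
    "ReDigi/ReReco": (40_000, 60_000),
    "MC": (10_000, 50_000),
    "Backfill": (0, 9_999),
}


def in_band(workflow_type, priority):
    """Return True if priority falls inside the band for workflow_type."""
    low, high = PRIORITY_BANDS[workflow_type]
    return low <= priority <= high


def overlapping_bands():
    """Return pairs of workflow types whose priority bands intersect."""
    names = sorted(PRIORITY_BANDS)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            first_low, first_high = PRIORITY_BANDS[first]
            second_low, second_high = PRIORITY_BANDS[second]
            # Two closed intervals intersect when each starts before the other ends.
            if first_low <= second_high and second_low <= first_high:
                pairs.append((first, second))
    return pairs


if __name__ == "__main__":
    print(in_band("MC", 45_000))
    print(in_band("Backfill", 10_000))
    print(overlapping_bands())
```

As written in the notes, the MC and Redigi/Rereco bands share the 40-50K range, so a check like overlapping_bands() would flag that pair; whether that overlap is intended is one of the details still to define.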

Dataset Pre-Staging

  • Problem to solve: have GEN-SIM staged at T1_disk before ReDigi is assigned.
  • Solution: Meriç from the Transfer Team is going to explain it to us: download the slides
    • We have agreed that solution #1 is what needs to happen, and will demand that the cmsweb release be delayed to get this implemented. Dave will pass this up the food chain.

Workflows

MiniAODs

  • next round in Sept - but not here yet
  • There was 1 WF with duplicates; we need to trace how far back the duplicates go and invalidate what is affected. Still hearing crickets from PPD on what they want us to do with the input datasets.

ReDigi

  • Some of the high-priority Upg2023SHCAL WFs have a high (30-50%) failure rate due to timeouts.
  • Some WFs were cloned with the wrong ProcessingString: see elog report
    • waiting for requestor's answer.
  • Very old request in assignment-approved: alahiff_EWK-Summer12DR53X-00157_T1_US_FNAL_MSS_00121_v0__140819_183535_3703
    • please kill or assign this
  • Why wasn't the data staged? Because we weren't told the data was important.

T1's summary

Rereco

  • nothing... literally

Store Results

MonteCarlo

  • Next round of upgrade MC is coming; still waiting
  • MC validation campaign in Oct/Nov

RelVal Andrew

Topic revision: r4 - 2014-09-18 - JenniferAdelmanMcCarthy