Workflow Team Meeting - Oct 2 4PM CERN time

Vidyo Link

Attending

  • FNAL: Jen, Jorge, Dave, Seangchan, Luis
  • CERN : Andrew and Julian

Personnel

Sep 25 -> Oct 2 Sara
Oct 2 -> Oct 9 Xavier
  • Dave is done with jury duty. He can now tell you which parts of Kane County to avoid.

News

  • hyper urgent Upg2023SHCAL14DR is down to one last ACDC2
  • SL6 stress testing, backfill - more on this further down the agenda
    • resubmit.py script has been modified to create a backfill wf from any given workflow (except TaskChain)
    • to use it: python resubmit.py WORKFLOW USER GROUP -b
    • it adds "Backfill" everywhere (acquisition era, processing string, etc.)
    • it resets the request date, so the clone is treated as a new request
    • example:
[vocms174] /afs/cern.ch/user/j/jbadillo > python WmAgentScripts/resubmit.py pdmvserv_TSG-Spring14dr-00048_00226_v0__140924_200502_3926 jbadillo DATAOPS -b
Submitting workflow
RESPONSE ....
Cloned workflow: jbadillo_TSG-Spring14dr-Backfill-00048_00226_v0__141002_110514_6144
    • to add: removal of the prep-id
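
The backfill behaviour described above (tag the clone with "Backfill", reset the request date) can be illustrated with a minimal sketch. This is not the actual resubmit.py code: the function name, the field names (AcquisitionEra, ProcessingString, Campaign, Requestor, Group, RequestDate), the choice to append "Backfill" as a suffix, and the ISO date format are all assumptions made for illustration.

```python
import copy
from datetime import datetime, timezone

# Hypothetical field names; the real request schema may use different keys.
TAGGED_FIELDS = ("AcquisitionEra", "ProcessingString", "Campaign")


def make_backfill_schema(schema, user, group, now=None):
    """Return a copy of `schema` marked as a backfill request.

    The original schema is left untouched, so the source workflow
    can still be inspected or cloned again.
    """
    now = now or datetime.now(timezone.utc)
    clone = copy.deepcopy(schema)

    # Tag every string field of interest (assumed suffix convention).
    for field in TAGGED_FIELDS:
        value = clone.get(field)
        if isinstance(value, str) and "Backfill" not in value:
            clone[field] = value + "Backfill"

    # The clone belongs to whoever resubmits it.
    clone["Requestor"] = user
    clone["Group"] = group

    # Resetting the date makes the clone look like a new request.
    clone["RequestDate"] = now.isoformat()
    return clone
```

Working on a deep copy keeps the original request unchanged, which matters if the same workflow is used as the source for several backfills.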

Site support

  • we began AAA testing with SL6 but need to stop doing so...

Sara's notes

Agent Issues

  • Agents have been well behaved... that's what happens when you aren't running much

Redeployment plan

  • cmssrv112 - in drain for redeployment because its disk filled up

Workflows

  • Processing string - We had 2 tasks assigned last week. Where do we stand?
    • We need to talk to MCM about how they want the policy set: Dave will reply to the e-mail about them postponing it and ask for clarification on how we will know that we are using it.
      • Dave is working with them; we just need to handle the responsibility of the request to request manager2
    • Julian will make a list of WFs already in the system that have a processing string in the schema.
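
One way to build such a list is sketched below, assuming the request schemas are already available as Python dicts. The key names RequestName and ProcessingString, and the idea that per-task settings live in nested dicts, are assumptions for illustration and not taken from the request manager itself.

```python
def has_processing_string(schema):
    """True if the schema, or any nested task dict, sets a non-empty ProcessingString."""
    if schema.get("ProcessingString"):
        return True
    # Look one level deeper at a time, in case a task carries its own setting.
    return any(
        isinstance(value, dict) and has_processing_string(value)
        for value in schema.values()
    )


def workflows_with_processing_string(requests):
    """Return the sorted names of requests that use a processing string."""
    return sorted(r["RequestName"] for r in requests if has_processing_string(r))
```

An empty string is treated as "not set", so only workflows that actually carry a value end up in the list.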

ReDigi

  • slow... just wrapping up old work; the new stuff dribbling in is behaving itself

miniaod's

  • Workflow with duplicates was resubmitted by PPD - the new miniaod ran with no duplicates; the old workflows were rejected and their outputs deleted
  • New miniaod at 300%: pdmvserv_SUS-Spring14miniaod-00012_00049_v2__140916_141253_9688
https://cmslogbook.cern.ch/elog/Workflow+processing/17065

Rereco

  • nothing... literally

Store Results

MonteCarlo

  • running smoothly

SL6 testing/backfill

  • We have begun ramping up the load testing on the 3 SL6 agents. Only one suspicious crash so far, seen on both cmssrv217 and cmssrv218 (see elog here)
    • had issues with couch replication
    • problems overnight were solved. A new one appeared afterwards; Alan will talk to Seangchan about it after the meeting.
  • Fri-Weekend - the submitted Backfill ran at a nice steady state with no crashes, but the thresholds were improperly set (use the new resubmit.py)
  • Mon-Tues - realized that the thresholds were set incorrectly when we couldn't get the 2nd Backfill to go. Once the thresholds were reset, things ramped up nicely and held steady state
    • We are running T1s at threshold using these 3 agents
  • Wed - Started another redigi backfill with higher priority than the others. It is currently taking over slots, as we want it to!
  • Wed - Julian started an MC backfill and is ramping up more jobs
    • cmssrv217.fnal.gov cmssrv217. 11618 27143 0
    • cmssrv218.fnal.gov cmssrv218. 13986 20560 1
    • cmssrv219.fnal.gov cmssrv219. 10986 23072 0
  • We think we are ready to start running low-priority real work on these machines. Anybody want to throw some work at us?
  • Tests to run
    • replication didn't work on the first tries, and then it worked. Seangchan can't think of any specific tests
    • need to ask Alan or Justice if they have any specific tests they think we should run - Julian will work with Alan to come up with a list of things we need to run
  • need to come up with a good variety of WFs to put through: multi step, miniaods, mixing.... what else?
    • get something big for each site
    • get some hard workflows: multistep gensim+raw + gen_sim_Reco
    • the Frankenflow should run through - made aodsim and miniaodsim
    • miniaods that run on reco
    • some of the CSA14 workflows produced gen sim raw and gen sim reco; they actually had the reco format rather than AOD, then did miniaod from reco

RelVal Andrew

Topic revision: r5 - 2014-10-02 - JenniferAdelmanMcCarthy