Workflow Team Meeting - Nov 13 4PM CERN time & US Meeting Tues Nov 11 at 1PM FNAL time

Vidyo Link

Attending

  • Tues Meeting : Jen, Ian and Sean
  • Ian
  • FNAL: Jen, Luis, Dave
  • CERN : Julian, Andrew, Alan, Dima

Personnel

EU

Nov 7 -> Nov 13 ?
Nov 14 -> Nov 20 Sara

US

US shifters have decided it works better for their schedule to commit to 10 hrs a week of monitoring. We will give it a try and see how it goes!

  • Julian will be in Colombia 25th Nov - 25th Dec. He will be working and plans on being online.
  • Luis will be in Columbia from Dec 20 through New Year; he will get us exact dates soon.

News

  • Christmas Production hints are coming
    • MC has started nailing things down so that when we do the full-on reconstruction it will be properly calibrated
    • startup digi-reco in March
    • 3-4 MC generation with reconstruction between now and then.
  • Upgrade MC just came in, which is the generation step
  • The real start of MC of 1 billion events is in the spring. There will be several high-pressure cycles between now and then; we don't know the scale or timeframe, but we should be ready to take it on when it comes.
  • FNAL will have a downtime for a storage upgrade on Thurs; we will put the site in drain
    • should be up at end of day FNAL time
  • Has everyone filled out the doodle poll?
  • GlideIn WMS is not running new jobs - https://cms-logbook.cern.ch/elog/GlideInWMS/892 - Farrukh and Krista are working on it. Things are stuck everywhere!
    • Production pool: some jobs are running now, but matching is really slow. We need to keep an eye on it.
    • CERN schedd's
    • Condor plots in the dashboard are not showing all jobs. There are a lot pending and not a lot running at T2's
  • cmsweb migration to SL6 VM's impact:
    • Slow down on couchdb replication
    • Slow down on wmstats (monitoring, debugging), reqmgr (assigning, aborting, cloning, etc)
    • tests on physical machines are ongoing, so hopefully we can be back on physical machines next

Site support

  • Recommissioning for production
    • T2_PL_Swierk (new site), Julian sent test workflow. Is it ready?
      • Tests stuck in acquired - site not created in the resource-control
      • Julian resent the test today; we should have results tomorrow
    • T2_RU_INR (200 cores) moving out of the Morgue. Please test it for production
      • will test tomorrow

EU shift notes

US Shift notes

  • Sean had to update his FNAL passwords
  • Ian is still having issues logging into SL6 or CERN. I forwarded the FNAL error messages to Lisa G. Ian will re-bug Julian and Ivan about the CERN machines.
    • He can log in but can't authenticate, so he can't do anything with the stuck WF's on CERN machines.

Agent Issues

  • Error Handler crashes -
  • crashes due to slow connection with couch
  • couch replication stopped. All of these should be fixed with the cmsweb upgrade to physical machines
    • disabled compactions so we are running, but we can't survive long this way

Redeployment plan

  • Production Pool:
    • production SL6: cmssrv217 (up), 218 (up), 219 (up)
    • mc SL5: vocms216 (up), 201 (up), 235 (up)
    • reproc_lowprio SL5: vocms202 (up), 234 (up), 85 (drain - will be abandoned)
    • step0 SL5: vocms237 (up - will be abandoned)
  • Global Pool
    • backfill SL6: submit1 (up), submit2 (up)
  • All agents have latest version.
  • cmssrv98 and 112 agents shut down
  • vocms216 was rebooted on Thursday. No major impact.

Workflows

  • let's put low priority data to the submit :backfill team to further test the global pool

ReDigi

  • Top Priority WF's Phys14DR then miniaod's
    • wf's were not given a custodial site so I manually subscribed them and then they closed out.

miniaod's

Rereco

Store Results

  • Shutdown of Savannah is making it so we can't do store results anymore
    • Julian is working on scripts to get things working; waiting for the FNAL downtime to end

MonteCarlo

SL6 testing/backfill

RelVal Andrew

-- JenniferAdelmanMcCarthy - 2014-11-11
Topic revision: r7 - 2014-11-13 - JenniferAdelmanMcCarthy