https://indico.cern.ch/conferenceDisplay.py?confId=254682

Attending

Adli, Julian - CERN
John, Luis, Jen, SeangChan - FNAL
Andrew

Congratulations to Dave on his new baby. We are still waiting for cute baby photos.

Personnel:

  • coming off shift: Xavier
  • coming on shift: Xavier
  • coming on shift: Sunil
  • list of everyone's holidays? (CERN closeout)
    • US will be having the Thanksgiving Holiday Thursday-Sunday
    • Jen will be pulling "best effort" days Wed-Sunday. In other words, I'll log in, run the close out script and make sure the machines aren't on fire during US/Asia shifts, but will not be spending a lot of time tracking down issues unless they cannot be ignored.
    • SeangChan will be taking off completely Wed-Sun
    • Julian's holidays: (Dec 27th to 29th) and (Jan 7th to 11th)
    • Dec 23-Jan 1 - Xavier and Sunil will be on shift but working from home
    • CERN closed Dec 22-Jan 6
    • Luis will be gone Dec 11-13; Dec 16-20 Luis will be working remotely

Issues

  • Dashboard is being unreliable this week - Julian will make sure Sunil understands the Dashboard issues we are having so we are not relying on it for debugging
    • time plots: # of jobs is unreliable; we are running 60K jobs but the dashboard plots say 120K jobs
    • Efficiency plots - John will look into why we keep going green/yellow/green
  • Couch issues on 201, 216: couch keeps going down; we need to keep a close eye on it

Agents

  • vocms201: Issues with couch. Becoming usual under heavy load.
  • vocms235: Sandbox problem solved. 11386 fwjr with missing task field, manually added
  • vocms85: Workflows stuck in Acquired. Oracle connection problem, solved. 11359
  • Stuck MonteCarlos: Why are there so many? They are not reaching "complete"
    • the acquired WF's were waiting for resources
    • 2-3 WF's with many queued jobs and everything is piled up behind them
    • When you make ACDC's they need to have higher priority so that they go in.
  • Blocks were not being closed in DBS. Problem identified. There will be a procedure change
    • already discussed procedure changes, we need to make sure the twiki is updated
    • when workflow is force completed ACDC's also need to be force completed
    • Julian will update the twikis; Jen or Sunil will test

Site Issues that affected workflows

  • Xavier - good job going through the sites with issues and working with the site support team!
  • We need to get the EU operators working closer with the EU site support team. Right now the EU site support team is still on a steep learning curve, but we aren't going to get them ramped up unless we get them working!

Workflows

Monte Carlo

ReDigi
Reprocessing
Workload Summary/Problems in the config file
AOB
  • Dashboard Alarms (Adli and Julian working on it) 40%
  • Site status script, tested on vocms201. Successful for now; the migration for vocms216 and vocms85 is this week.
  • Emails to workflow HN: is the TaskChain working properly? Luis has read the emails but hasn't replied or looked into it yet. All jobs are running over the same event.
  • Are we doing DQM harvesting on ACDC's?
-- JenniferAdelmanMcCarthy - 25 Nov 2013
Topic revision: r7 - 2013-11-26 - JenniferAdelmanMcCarthy
 