* Jen, Luis C, John, Seangchan, Dave

* Andrew, Julian, Alan


  • I think everyone is back from vacation, and just in time: it's going to be a busy couple of weeks.
  • Julian will be off Thursday and Friday, and will work remotely on Monday.
Jan 7 -> Jan 14 Sunil
Jan 14 -> Jan 21 Sara
Jan 21 -> Jan 28


We have ~2 weeks to clear all workflows out of the agents. We will stop submitting this coming weekend and spend next week just draining agents and clearing out workflows. We need EVERYONE to spend some time checking which workflows we have in which states and clearing them out.

* We might have to cancel some workflows:
    • Canceled workflows will have to run again after the migration.
    • Canceling them lets the other workflows run faster.
    • The ones to cancel could be those with a lower completion % or a lower priority.

Issues of the week (as reported to the Comp Ops meeting)

  • Site thresholds:
    • There are different scripts for different agent teams
    • New script should:
      • Use the site slots from the Site Status Board (SSB). Do we need to add another column for merge slots?
      • Set different thresholds according to the agent team. How do we set thresholds now that agents don't have teams anymore?
      • Luis and Julian will run the site thresholds tune-up.
      • Regarding the new SSB column: ask for one column for production jobs and one for other jobs.
  • StageOut issue from last week: all the agents are patched. Closed.
  • Service certificates updated on vocms201, vocms202 and vocms216
  • Central CouchDB crashed over the weekend and affected all the agents; it has already recovered (12046).
    • We need to update our notes on whom to contact for this: the cms-web-tools team, or email the CRC on duty.
  • Purge old aborted workflows: this needs to be done by hand. Seangchan, is this something we can hand off to the operators?
    • Also, 6 step0 workflows have been in "complete" since March 2013.
  • PhedexInjector stuck in an infinite loop (1206), which is affecting PhEDEx subscription creation. The PhEDEx fix was run several times during the last week.
  • PhedexInjector problem with sites that have the _Disk suffix in their name (12012).
    • Not using _Disk or _MSS anymore
    • We need to make sure that:
      • Custodial --> MSS
      • Non-custodial --> Disk
      • Luis is working on that.
  • Changes to closeout script:
    • Not yet pushed to GitHub.
    • Checks that no block is open before closing a workflow out.
    • It runs about 30-40% slower because it queries DAS.
    • A mail thread with the developers has been started. Is this the get_data query?
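The custodial/non-custodial rule for PhedexInjector can be sketched as follows. This is a minimal sketch, not the agents' actual PhedexInjector code: the helper name phedex_node is hypothetical, and it assumes that only Tier-1 sites (T1_*) have separate _MSS and _Disk nodes, so any other site name is returned unchanged.

```python
def phedex_node(site: str, custodial: bool) -> str:
    """Pick the PhEDEx node for a subscription (hypothetical helper).

    Rule from the notes: custodial -> _MSS, non-custodial -> _Disk.
    Assumption: only Tier-1 sites (T1_*) have split _MSS/_Disk nodes.
    """
    if not site.startswith("T1_"):
        # Assumed: non-Tier-1 sites have a single node, so leave the name as-is.
        return site
    # Drop an existing _MSS/_Disk suffix so the result never gets a double suffix.
    for suffix in ("_MSS", "_Disk"):
        if site.endswith(suffix):
            site = site[: -len(suffix)]
            break
    return site + ("_MSS" if custodial else "_Disk")


if __name__ == "__main__":
    print(phedex_node("T1_US_FNAL", custodial=True))       # T1_US_FNAL_MSS
    print(phedex_node("T1_US_FNAL_MSS", custodial=False))  # T1_US_FNAL_Disk
    print(phedex_node("T2_US_MIT", custodial=True))        # T2_US_MIT
```

Stripping an existing suffix first means a site name that already carries _Disk (the case behind 12012) still ends up on the right node.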

Site Issues that affected workflows

Workflow Issues


  • ACDCs for IN2P3 ReDigi workflows (12006)
  • ACDCs for FNAL ReDigi workflows (12047)
  • All other sites have been addressed in 11996.
  • Running jobs decreasing at FNAL (10K at midnight yesterday, only 4K now); see the dashboard plot.
    • Analysis jobs are running there.
    • Julian will reply to the HN.


  • Question: what do we do with backfill workflows stuck in "complete"? (12052)

RelVal (Andrew's Questions)

DBS Migration Plan

-- JenniferAdelmanMcCarthy - 14 Jan 2014

Topic revision: r9 - 2014-01-19 - JulianBadillo