Workflow Team Meeting - April 10

Vidyo Link

  • Jen, can you please add me (Julian) as Vidyo moderator?


  • Jen, SeangChan, John, Luis - FNAL
  • Julian, Andrew - CERN


April 3 -> April 10 Jasper
April 10 -> April 17 Sara

  • Luis is going to Colombia for a seminar April 21-25 then on vacation April 28-May 2
  • Dave & Oli at SLAC April 6-10
  • Dave at CERN April 21-29
  • Julian will be on vacation May 16-18


  • FNAL is now fully disk/tape separated.
  • Throttle script is in place. It looks like we have MC production under control.

Jasper's Notes

  • How did looking for stuck WF's go?

Agent Issues

  • 201 & 216 seem to be having issues with TaskArchiver
    • Problem with Oracle: a long-running query makes Oracle lock. Dirk and Yuyi are supposed to look at it.
    • We have a release meeting tomorrow at 4PM CERN time, 9AM FNAL time.
  • Problems with T1_US_FNAL_Disk: we were sending jobs there; a temporary fix is in place. 13907 Thresholds by default should be 0 for all sites; should the default status be down?
    • I believe John was also setting the _Disk endpoints in SSB. In that case we can set *_Disk to down and thresholds = 0 in SSB and keep using the scripts.
    • The factory doesn't have the node name at all, so when you specify a whitelist, specify just the site, NOT _DISK. When we get the initial population of the site list, it tries to send to _DISK.
    • The plan is to remove the disk names from SiteDB and from the Dashboard.
    • For resource control to populate the database, we should remove _DISK from the scripts. Julian is using the atallsites command; we need to move to Luis's threshold script, but it doesn't populate the site.
      • Julian, with John's help, will modify the script to read from SSB and pull the list of sites that we need to populate: no _DISK sites, and no sites that are not currently in service.
      • Let's remove the "bad choices" from the request manager page to eliminate the human-mistake element.
  • Agent redeployment plan:
    • vocms85 attached to reproc_lowprio team
    • vocms234 and vocms216 in drain
  • We need to update our status plots to point to the right information. John will post to Vincenzo and Xavier.
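
The site-list filtering agreed above for the resource control script (no _DISK sites, no sites that are not currently in service) could look roughly like the sketch below. This is only an illustration: the record layout (dicts with "name" and "in_service" keys) and the function name sites_to_populate are assumptions, not the actual SSB output format or the existing threshold script.

```python
def sites_to_populate(ssb_records):
    """Return the sorted site names that should be added to resource control.

    ssb_records is assumed (hypothetically) to be a list of dicts such as
    {"name": "T1_US_FNAL", "in_service": True}; the real SSB feed may differ.
    """
    selected = set()
    for record in ssb_records:
        name = record["name"]
        # Skip the storage endpoints: the factory only knows plain site names.
        if name.upper().endswith("_DISK"):
            continue
        # Skip sites that are not currently in service.
        if not record["in_service"]:
            continue
        selected.add(name)
    return sorted(selected)


if __name__ == "__main__":
    # Made-up records, for illustration only.
    example = [
        {"name": "T1_US_FNAL", "in_service": True},
        {"name": "T1_US_FNAL_Disk", "in_service": True},
        {"name": "T2_XX_Example", "in_service": False},
    ]
    print(sites_to_populate(example))
```

Matching on the suffix case-insensitively covers both the _Disk and _DISK spellings used in these notes.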

Site Issues

  • Drain list
  • Shifter instructions if a site is in downtime:
    • nothing if short (<2 days)
    • drain if long (>2 days); set it a couple of days in advance
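
The downtime rule above can be written down as a small helper, shown here as a sketch only. The notes do not say what to do for a downtime of exactly 2 days, nor how many days "a couple" is; the code assumes exactly 2 days counts as long and that the drain starts 2 days before the downtime. The name shifter_action is hypothetical.

```python
from datetime import date, timedelta

LONG_DOWNTIME_DAYS = 2   # threshold from the notes; exactly 2 days is treated as long (assumption)
DRAIN_LEAD_DAYS = 2      # "a couple of days in advance", taken as 2 (assumption)


def shifter_action(downtime_start, downtime_end):
    """Return (action, drain_date) for a site downtime.

    action is "nothing" for short downtimes and "drain" for long ones;
    drain_date is the day the drain should be set, or None if no drain is needed.
    """
    length_days = (downtime_end - downtime_start).days
    if length_days < LONG_DOWNTIME_DAYS:
        return ("nothing", None)
    return ("drain", downtime_start - timedelta(days=DRAIN_LEAD_DAYS))


if __name__ == "__main__":
    # Made-up dates, for illustration only.
    print(shifter_action(date(2014, 4, 21), date(2014, 4, 22)))
    print(shifter_action(date(2014, 4, 21), date(2014, 4, 25)))
```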


  • Workflows have been re-injected or acdc'd so we are back on track
  • Running HI WFs, and they are doing OK


  • Things are under better control than they have been in a while. The re-ramp-up of Redigi at FNAL is going OK.
  • Issues with Fall13 StepTwo - what is the current status?
    • Jobs are being sent to FNAL_DISK. The fix is to do a condor_edit command and change the sites. The main problem is in 85: JINR, RAL and FNAL are having this issue, and there are maybe 10 jobs on other agents while the rest are on 85. SeangChan will talk to Krista about how to fix this, and the two of them will work on it.

Relval == Andrew's Questions

  • The major issue is that all the WFs are stuck in running-closed. They look like they are done, with all jobs successful, but they are not moving on. There are on the order of 60 WFs in this state. SeangChan will look at them and debug.
  • The RelVal agent environment is messed up. When Andrew logs in to vocms142, the home directory is alevin, but whoami reports cmst1. You need to do a cmst1 and then an agentenv. Andrew and Julian will work this out offline.
-- JenniferAdelmanMcCarthy - 09 Apr 2014
Topic revision: r6 - 2014-04-10 - JenniferAdelmanMcCarthy