Reprocessing and Production Team Meeting - July 21, 4PM CERN time, 9AM FNAL time


Vidyo Link

Attending

  • FNAL: Jen, Jesus, Jorge, Matteo, Scarlett, Gaston, SeangChan
  • CERN : Alan, Paola, JR
  • Korea :

Personnel

  • Jen Vacation July 25-29, Aug 23-24
  • Workshop Week of Aug 16-19
  • Alan - taking Aug 5 off for sure, maybe a few more days

News - Dima

  • Premix testing
  • data reprocessing is running everywhere
    • getting file read errors, running multiple ACDCs gets messy fast
    • raise a ticket at RAL - not sending AAA

Top issues affecting production

  • Vandy has been down - no pilots are running, so we are still waiting. It is failing the Job Submit test, with an error connecting to the schedd and an authentication error connecting to CE1. It was working for half the day, then it stopped for both CEs. Gaston will work with Andrew to fix the issues.
  • merge at Vienna, CSCS, Estonia - passing all the tests. Matteo is going to set Unified so that we aren't sending new work there, then Gaston will take the sites out of drain and we will get the ACDCs running. If we get things to behave, we will discuss sending new work to the sites.
  • Workflows stuck in acquired - SeangChan is working on a patch to make this easier to figure out.
    • there were problems with the dashboard that are messing with thresholds, which is why we have workflows that don't think we have slots. The values were set to null. They have updated the values manually, and there shouldn't be an issue for the next 7 days. In this case the thresholds saw 50, or it could also be due to data location.
  • workflows stuck in running-closed - closed out
  • having to restart agents - PhEDEx injector and JobAccountant are locking and timing out. We solved this for a few components. It's a long-standing issue that seems to be happening more now, but we aren't sure why. In the past we have changed queries that were long and causing database issues. Patches have been made for catching the error and restarting the database; a bit dangerous, but it's in and we've patched. Warnings go to LogDB; there is no interface, so you need to look at the request to see them.
  • Error Log - we should look here first if we think workflows are stuck

Site support -

About the values missing in the pledges view: the dashboard team is currently debugging the dashboard code. They have inserted the values manually.

| Date | Site | Into the Waiting Room | Out of the Waiting Room | Into the morgue | Out of the morgue |
| 2016-07-19 00:00:01 | T2_UK_SGrid_Bristol | x | | | |
| 2016-07-21 00:00:01 | T2_GR_Ioannina | | x | | |
| 2016-07-21 00:00:01 | T2_BE_UCL | x | | | |
| 2016-07-21 00:00:01 | T2_US_Caltech | x | | | |

Transfer Team

Workflows

Agent Issues

Agent redeployment

RequestMgr2 Migration

Merging Scripts

RelVal Andrew

AOB

-- JenniferAdelmanMcCarthy - 2016-07-20

Topic revision: r3 - 2016-07-21 - JenniferAdelmanMcCarthy
 