Waiting Room

  • what constitutes being in the waiting room
    • site has not been working correctly for an extended period despite support effort by CompOps
    • tickets are not handled with the highest urgency, unlike those of other sites => the site has already shown that it does not work, and we don't want its tickets to get in the way of normal support tickets for working sites (first goal: keep the working sites working, then bring waiting room sites back to a working state)
    • ESP credit is reduced according to the time the site spends in the waiting room
    • the site should be excluded from MC production and analysis
  • how do you get in the waiting room
    • should be a slower process to get in
  • how do you get out of the waiting room
    • fast way out
  • analysis suggestion: analyze site readiness for the last 3 months, 2 months, 1 month
  • create 3 lists of sites below 60% readiness, one per period
  • determine by trend analysis which sites should be in the waiting room
  • if a site is bad for 1 month and shows no trend of getting better => waiting room => if a site is good again for 1 week, move it back up?
  • ATLAS: sites move in and out of production within days, all tracked by HammerCloud => we should consider this in the medium term; we need more automatic procedures anyway, and the idea of site readiness was always that sites could be automatically included in and excluded from the systems (WMAgent, CRAB, etc.)
  • maybe we need a disabled state in between production and waiting room (because waiting room means ESP credit reduction)
  • steps towards an automated system:
    • develop procedures for moving sites in and out of the waiting room
    • for 1 month, move sites in and out manually without this having an effect on the usage of the sites
    • report on statistics and tell the sites what we found and what we are going to do
    • allow WMAgents and CRAB to exclude sites because of readiness, still move sites manually in and out of the waiting room
    • then automate moving sites in and out
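
A minimal sketch of the 3/2/1 month analysis and the in/out rule discussed above, for illustration only. Assumptions that are not in the notes: readiness is available as one value per day between 0 and 1 (oldest first), a month is taken as 30 days, "no trend in getting better" is read as the last 30 days not being better than the last 60 days, and "good again for 1 week" is read as reaching the same 60% threshold over the last 7 days. All function names are hypothetical.

```python
from statistics import mean

THRESHOLD = 0.60  # "below 60%" from the notes

def window_readiness(daily, days):
    """Mean readiness over the most recent `days` entries (daily is oldest first)."""
    recent = daily[-days:]
    return mean(recent) if recent else 0.0

def lists_below(sites, threshold=THRESHOLD):
    """Build the three lists: sites below threshold over the last 90, 60, 30 days."""
    return {
        days: sorted(name for name, daily in sites.items()
                     if window_readiness(daily, days) < threshold)
        for days in (90, 60, 30)
    }

def classify(daily, in_waiting_room=False, threshold=THRESHOLD):
    """Return 'waiting_room' or 'production' for one site.

    Slow way in: bad over the last month with no improving trend (assumed rule).
    Fast way out: good again over the last week (assumed rule).
    """
    if in_waiting_room:
        good_week = window_readiness(daily, 7) >= threshold
        return "production" if good_week else "waiting_room"
    r30 = window_readiness(daily, 30)
    r60 = window_readiness(daily, 60)
    improving = r30 > r60  # assumed trend test, not defined in the notes
    if r30 < threshold and not improving:
        return "waiting_room"
    return "production"
```

During the manual phase, the output of `classify` would only be a proposal to be reviewed by the team, not something acted on automatically.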

daily business

  • GGUS bridge not working for some Savannah tickets => check in SiteDB whether the LCG name is filled in properly => for specific tickets, contact Guenter Grein (SCC) <guenter.grein@kit.edu> to check the bridging. EDIT May 4: it seems I just needed to wait a day or so for the bridge to be made properly
  • RelVals on SL6: Pisa has bootstrapped and installed SL6 and is running SL6 worker nodes => mail to Alan and Andrew
  • to work on:
    • bad performance of T2_EE_Estonia
    • what happened to CERN's site readiness last week
    • SL5/SL6 resource mapping => talk to workflow team, then submission infrastructure team
    • clarify status of Cracow => look into SiteDB, REBUS, PhEDEx and talk to Giuseppe/Ken (as T2 liaisons) and Peter/Ken (as resource office)
  • discussion amongst team
    • systems we are using
    • ticket systems we are following
    • how do we prepare the plots for the CompOps TWiki => Duncan will write instructions on the TWiki, John will try them

action items

  • prepare the 3-, 2-, 1-month analysis (Duncan/John)
    • list of sites that should go into the waiting room and list of sites that should go out of the waiting room
    • start discussion with Sten, the monitoring team and the DashBoard team: plot showing how long a site was in the waiting room during a given period
  • document current procedures (Duncan)
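
A possible starting point for the "how long in the waiting room" plot: a small sketch that counts the days a site spent in the waiting room within a reporting period. The input format (a list of entry/exit dates per site, with `None` meaning the site is still in the waiting room) and the function names are assumptions for illustration; intervals are assumed not to overlap. How the resulting fraction would translate into an ESP credit reduction is not defined in these notes.

```python
from datetime import date

def days_in_waiting_room(intervals, period_start, period_end):
    """Days spent in the waiting room within [period_start, period_end).

    intervals: list of (entered, left) dates; left is None if the site
    has not left the waiting room yet.
    """
    total = 0
    for entered, left in intervals:
        start = max(entered, period_start)
        end = min(left or period_end, period_end)
        if end > start:  # ignore intervals entirely outside the period
            total += (end - start).days
    return total

def fraction_in_waiting_room(intervals, period_start, period_end):
    """Fraction of the reporting period spent in the waiting room."""
    length = (period_end - period_start).days
    return days_in_waiting_room(intervals, period_start, period_end) / length
```

For example, a site that entered on 2013-03-20, left on 2013-04-05, and entered again on 2013-04-25 would count 4 + 6 = 10 days for April 2013.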
Topic revision: r1 - 2013-05-04 - DuncanRalph