23 Jan 2007

  • Gridview frontend will move to new hardware next Wednesday. Fully Quattor managed. Other parts of the service will move over the next 2 weeks.

16 Jan 2007

  • Nothing to report

21 Nov 2007

  • Hardware acquired for new Gridview service - these are mid-range server nodes already

14 Nov 2007

  • OSG 0.8.0 released Nov 1st. With the release we will get full SAM tests for OSG sites appearing in the production SAM instance.

10 Oct 2007

  • SAM Unavailability (Tue, 02 Oct 16:30 - Wed, 03 Oct 12:00) - DB problems - understood and resolved
  • A new version of Gridview (new summarization algorithm) slipped into production last week
    • Work starting on turning Gridview into a production service - based on work done for SAM service
  • lxdpm101 (DPM for SAM testing infrastructure) now seems to be important - should move into a more "production" state
  • Successful demo of Nagios prototype & gridmaps prototype at EGEE conference (http://gridmap.cern.ch/gm)
  • First data published directly from OSG into SAM

22 Aug 2007

  • Nothing to report

15 Aug 2007

  • GridView new service availability calculation published and awaiting approval by MB/GDB.
  • WG telecon discussed deployment issues and testing of OSG's own and Nagios-based probe runners for OSG custom probes.
  • EDS consultant prototyping heatmap display for high-level view of site availability/reliability.
  • Testing SLC4/glite-3.1 UI/Nagios monitoring combination - some minor issues to be resolved.

11 July 2007

  • Work ongoing on new service availability calculation for GridView. Will be presented to MB/GDB for approval/notification.

30 May 2007

  • R-GMA based grid publication removed from CERN disk servers. Working now on rolling out new publication mechanism (SAM ws-client) and removing R-GMA based publication from other sites.

24 Apr 2007

  • The R-GMA team believe they have solved the TIME_WAIT socket problem, which required frequent reboots of mon boxes.

18 Apr 2007

  • We have a consultant on-site from EDS, who is 60% on monitoring for 1 year. He may help out on the architecture of the new monitoring system.
  • Memory leak found in Gridview producer for gridftp logs on Castor - fixed and fix deployed.
  • R-GMA not scaling for JobWrapper tests. Turned off R-GMA publication - Piotr provided instructions to site-admins in the operations meeting.

14 March 2007

  • At the OSG All-Hands meeting, there were discussions about OSG deploying SAM for testing OSG sites using their own custom probes. They have a target of June for this.

28 Feb 2007

Issues

  • Our monbox monb001 is completely overloaded and cannot cope with the number of requests it gets. We see more and more timeouts and we have to reboot the machine quite often. Lemon shows that the number of processes is constantly between 1000 and 2000+.

How can we make this service more robust? Is it possible to load balance this service and if so, how?

21 Feb 2007

Gridview WS client deployed on Castor seems to be running well - R-GMA is still losing data.

14 Feb 2007

Nothing to Report

31 Jan 2007

WLCG Workshop
  • Monitoring BOF and monitoring session on reliable services held with large participation (80 people in BOF)

17 Jan 2007

Work in Progress
  • Site survey done for monitoring tools currently used. Presentation to be made next week at Monitoring BOF
Topic revision: r1 - 2008-10-15 - SteveTraylen
 