LCGSCM Monitoring, Logging & Reporting Status

29 April 2009

  • Ongoing work to get the latest MoU pledges into a DB for use by the other monitoring applications. Updating with the latest numbers post-RRB.

15 October 2008

Nothing to report.

17 September 2008

  • Regular (intermittent) failures of the messaging brokers during the night; investigating. The investigation is hampered by a bug which makes recovery take a long time. The effect is missing OSG tests (due to the 2-hour Gridview timeout), which will need resummarisation.
  • CMS will set up an elog service on their vobox. We will provide expertise in the Quattor and elog configuration for them.
  • SAM running ok (last outage 9th Sept)

9 July 2008

  • Tue 2nd July - CERN network problem prevented the SAM BDII from reading site BDIIs (1 hour)
  • Wed 3rd July - All services in an 'ERROR' status due to host-cert tests failing - package missing on SAM UI
  • Some June results corrected in Gridview (6th, 22nd-23rd) to account for outages. A new general procedure is being put in place to 'mask' results

25 June 2008

  • SAM - CERN hit limit of 100 nodes over weekend which stopped gstat tests working.
  • DB issues in GV to be covered in meeting with DB Devs.
  • Deployed new messaging-based gridftp producers on all CERN disk servers. Testing the message-based L&B reporting system; it will be sent to certification in the next few days. When it is deployed outside of CERN we'll turn off R-GMA and WS-based publication at the same time.
    • monb001 - R-GMA box to be shut off.

30 Apr 2008

  • SAM - DB intervention yesterday (Tue) to fix some tables in the GV schema which had problems last time. All went ok - SAM turned off for 2 test submission cycles (2 hours)
  • elog - moved from VM to a 'real' machine for duration of CCRC'08 phase 2

19 Mar 2008

  • SAM - downtime Friday lunchtime - Monday
    • Due to bad config + human error (didn't check)

  • Both:
    • DB team want some space back.
    • Short term - delete some CLOBs (~150GB?)
    • Mid-term - produce a policy on data expiration, to be approved by the MB
    • Finally - move to a solution where we purge the data from the schemas daily/monthly
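
The final step above (a periodic purge driven by an expiration policy) could look roughly like the following. This is only a minimal sketch: it uses an in-memory SQLite database as a stand-in for the real schemas, and the table name `test_results`, the column `recorded_at`, and the 90-day retention period are all hypothetical, since no policy has been agreed yet.

```python
import sqlite3
from datetime import datetime, timedelta


def purge_expired(conn, table, ts_column, retention_days, now):
    """Delete rows older than the retention window; return rows removed."""
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    # Table and column names cannot be bound as SQL parameters, so they are
    # interpolated here; pass trusted identifiers only.
    cur = conn.execute(f"DELETE FROM {table} WHERE {ts_column} < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


def demo():
    # Hypothetical schema and data, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test_results "
        "(id INTEGER PRIMARY KEY, recorded_at TEXT, detail TEXT)"
    )
    rows = [
        (1, datetime(2007, 11, 1).isoformat(), "old"),
        (2, datetime(2008, 1, 15).isoformat(), "recent"),
        (3, datetime(2008, 3, 18).isoformat(), "new"),
    ]
    conn.executemany("INSERT INTO test_results VALUES (?, ?, ?)", rows)

    # Assumed 90-day retention, evaluated as of 19 March 2008.
    now = datetime(2008, 3, 19)
    removed = purge_expired(conn, "test_results", "recorded_at", 90, now)
    remaining = [r[0] for r in conn.execute("SELECT id FROM test_results ORDER BY id")]
    return removed, remaining
```

Run daily or monthly from a scheduler, a job of this shape keeps the table bounded by the retention window rather than relying on one-off cleanups.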

27 Feb 2008

  • SAM UI upgrade still ongoing - SAM tests running at 50% frequency
  • Problem with SAM SRM Tests - weren't run for 3 days (they had been scheduled only on the SAM UI which was out of service)

13 Feb 2008

  • Final Gridview services moved to new hardware. Old machines will be returned next week.
  • Gridview/SAM will need a downtime to clean up the old entries in the table. The advantage of using the downtime is that we can also partition the table for the future.

Old Reports

Topic revision: r30 - 2009-04-29 - JamesCasey
 