---+ LCGSCM Monitoring, Logging & Reporting Status

   * SAM UnAvailabilities : SAMProdServUnavail

---++ 29 April 2009
<!--STARTLCGSCM-->
   * Ongoing work to get the latest MoU pledges into a DB for use by the other monitoring applications. Updating with the latest numbers post-RRB.
<!--ENDLCGSCM-->

---++ 15 October 2008
<!--STARTLCGSCM-->
Nothing
<!--ENDLCGSCM-->

---++ 17 September 2008
<!--STARTLCGSCM-->
   * Regular (intermittent) failures of messaging brokers during the night; investigating. Hampered by a bug which makes recovery take a long time. The effect is missing OSG tests (due to the 2-hour Gridview timeout), which will need resummarisation.
   * CMS will set up an elog service on their vobox. We will provide expertise in the quattor and elog configuration for them.
   * SAM running OK (last outage 9th Sept).
<!--ENDLCGSCM-->

---++ 9 July 2008
<!--STARTLCGSCM-->
   * Tue 2nd July - CERN network problem prevented SAM BDII from reading site BDIIs (1 hour).
   * Wed 3rd July - All services in an 'ERROR' status due to host-cert tests failing - package missing on SAM UI.
   * Some results corrected for June in Gridview (22-23, 6th) for outages. New general procedure being put in place to 'mask' results.
<!--ENDLCGSCM-->

---++ 25 June 2008
<!--STARTLCGSCM-->
   * SAM - CERN hit the limit of 100 nodes over the weekend, which stopped the gstat tests working.
   * DB issues in GV to be covered in a meeting with the DB Devs.
   * Deployed new messaging-based gridftp producers on all CERN disk servers. Testing the message-based L&B reporting system. Will be sent to certification in the next few days. When deployed outside of CERN we'll turn off R-GMA and WS-based publication at the same time.
   * monb001 - R-GMA box to be shut off.
<!--ENDLCGSCM-->

---++ 30 Apr 2008
<!--STARTLCGSCM-->
   * SAM - DB intervention yesterday (Tue) to fix some tables in the GV schema which had a problem last time. All went OK - SAM turned off for 2 test submission cycles (2 hours).
   * elog - moved from VM to a 'real' machine for the duration of CCRC'08 phase 2.
<!--ENDLCGSCM-->

---++ 19 Mar 2008
   * SAM - downtime Friday lunchtime - Monday
      * Due to bad config + human error (didn't check)
   * Gridview
      * FTS stats for CERN in pre-prod - http://gvdev.cern.ch/GRIDVIEW/fts_index.php
   * Both:
      * DB want some space back. Short term - delete some CLOBs - ~150GB ?
      * Mid-term - produce policy on data expiration and get approval by MB
      * Finally - move to a solution where we purge the data from the schemas daily/monthly

---++ 27 Feb 2008
   * SAM UI upgrade still ongoing - SAM tests running at 50% frequency.
   * Problem with SAM SRM tests - they weren't run for 3 days (they had been scheduled only on the SAM UI which was out of service).

---++ 13 Feb 2008
   * Final Gridview services moved to new hardware. Old machines will be returned next week.
   * Gridview/SAM will need a downtime to clean up the old entries in the table. The advantage of using the downtime is that we can partition for the future.

---++ Old Reports
   * LcgScmStatusMLR2007
Topic revision: r30 - 2009-04-29 - JamesCasey
Copyright © 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.