Q2 2012

| Site | Service Area | Date | Duration | Service | Impact | Report |
| PIC | CE | 21 Jun | 1 h | PIC Tier1 Computing | About 17% of the WN capacity switched off due to cooling incident | https://twiki.cern.ch/twiki/pub/LCG/WLCGServiceIncidents/20120621_SIR_Cooling_Incident_at_PIC.pdf |
| CERN | Storage | 18 Jun | ~1 h | CASTOR | c2atlas diskservers were not reachable for ~1 h | https://twiki.cern.ch/twiki/bin/view/CASTORService/IncidentsRmNodeMisconfiguration20120618 |
| CERN | Storage | 5 Jun | 1 h | CASTOR | Communication problems and client timeouts | https://twiki.cern.ch/twiki/bin/view/CASTORService/IncidentsNameServerContention20120605 |
| PIC | CE | 3-4 Jun | 18 h | PIC Tier1 Computing | 18 h of service degradation: number of cores reduced by 60% due to cooling incident | https://twiki.cern.ch/twiki/pub/LCG/WLCGServiceIncidents/20120603_SIR_Cooling_Incident_at_PIC.pdf |
| CERN | DB | 22 May | 1.5 h | CMS online DB | 1.5 hours of high luminosity data lost | https://twiki.cern.ch/twiki/bin/view/DB/PostMortem22May12 |
| CERN | Storage | 22 May | 5-40 min | CASTOR | ~1k unavailable files after transparent DB intervention | https://twiki.cern.ch/twiki/bin/view/CASTORService/IncidentsDegradationDBIntervention20120522 |
| CERN | Infrastructure | 19-20 April | 1 day | batch | Batch system down | https://twiki.cern.ch/twiki/bin/view/PESgroup/IncidentBatchDown190412 |
| CERN | Infrastructure | 18-20 April | 2 days | batch | ATLAS Tier-0 job submission system could not keep up with incoming RAW data | https://twiki.cern.ch/twiki/bin/view/PESgroup/IncidentBatchSlow180412 |
| ASGC | Storage | 11-12 April | 24 h | CASTOR | Hardware failure, DB crashed | https://twiki.cern.ch/twiki/pub/LCG/WLCGServiceIncidents/ASGC_SIR_2012-04-11.pdf |
| TRIUMF | All Tier-1 services | 10-11 April | 20 h | All Tier-1 services | Two site-wide power failures | https://twiki.cern.ch/twiki/pub/LCG/TempArea/TRIUMF-incident-report-april10-2012.pdf |
| CERN | Storage | 4 April | 1.5 h | CASTOR | Name Server stuck, 3 CMS files had to be rewritten | https://twiki.cern.ch/twiki/bin/view/CASTORService/IncidentsCentralNSStuck20120404 |
| CERN | Storage | 2 April | several days | CASTOR | 1 LHCb diskserver hardware issue (files unavailable, finally 3 file systems lost) | https://twiki.cern.ch/twiki/bin/view/CASTORService/IncidentsDiskOnlyDataLoss20120402 |

-- JamieShiers - 13-Jul-2012
