Topic attachments
Attachment History Size Date Who Comment
2009-11-26_CMS_CCIN2P3_Report.pdf r1 78.4 K 2009-12-01 15:52 DirkDuellmann CMS Data Loss Incident at FR-CCIN2P3
20090411_SIR_SRM_PIC.pdf r1 152.9 K 2009-04-16 15:10 OlofBarring
20091219_PIC_Service_Incident_Report.pdf r1 23.3 K 2009-12-23 11:20 GonzaloMerino SIR of the cooling incident at PIC on 19 Dec 2009
20100521_SIR_PIC_PowerCut.pdf r1 123.4 K 2010-05-31 09:39 GonzaloMerino SIR for the power cut affecting the PIC Tier-1 on 21-22 May 2010
20100727_SIR_PIC_CoolingModule.pdf r1 75.0 K 2010-07-27 18:14 GonzaloMerino Cooling problem in a PIC WN module causing about 50% of WNs to be shut down (running jobs killed)
20100727_SIR_PIC_DDN.pdf r1 63.4 K 2010-07-27 17:25 GonzaloMerino ATLAS SRM problems at PIC on 22 Jul due to wrong dCache configuration, lasting about 10 h
20100727_SIR_PIC_Gridmapdir.pdf r1 48.0 K 2010-07-27 17:23 GonzaloMerino 3-hour CE failure at PIC on 20 Jul due to a faulty gridmapdir migration
20100831_SIR_ASGC_STAGERFTS_DB.pdf r2 r1 246.7 K 2010-09-13 18:26 JhenWeiHuang 20100831_SIR_ASGC_STAGERFTS_DB.pdf
20100924_SIR_ASGC_DCPOWERCUT.pdf r1 249.2 K 2010-10-16 22:40 JhenWeiHuang 20100924_SIR_ASGC_DCPOWERCUT
20110211_SIR_PIC_ATLAS_lost_files.pdf r1 45.7 K 2011-02-11 13:29 GonzaloMerino Incident with lost ATLAS files at PIC, 21/1/2011
20110310_SIR_PIC_ATLAS_lost_Files.pdf r1 74.2 K 2011-03-10 13:44 GonzaloMerino Update to the PIC SIR on lost ATLAS files (21-Jan-2011 to 8-Feb-2011)
20110501_SIR_ASGC_10GbLINKDOWN.pdf r1 187.2 K 2011-05-11 19:03 JhenWeiHuang 20110501_SIR_ASGC_10GbLINKDOWN.pdf
20110521_SIR_ASGC_DCPOWERCUT.pdf r1 114.7 K 2011-05-27 07:56 JhenWeiHuang SIR for ASGC DC power cut on 21 May 2011
20110727GGUS_Service_Incident_Report.pdf r1 51.0 K 2011-07-27 12:00 DirkDuellmann
20120122SIRPowerandCoolingProblematPIC.pdf r1 313.2 K 2012-02-03 14:50 GonzaloMerino SIR of the power and cooling incident at PIC on 22 Jan 2012
20120310SIRATLASlostfileonLTO5tapeG05918.doc r1 34.0 K 2012-04-13 21:16 AlexeySedov ATLAS tape incident at PIC
20120603_SIR_Cooling_Incident_at_PIC.pdf r1 54.4 K 2012-06-04 23:58 GonzaloMerino Cooling incident at PIC on 3-Jun-2012: computing service degraded
20120621_SIR_Cooling_Incident_at_PIC.pdf r1 86.0 K 2012-06-28 11:57 GonzaloMerino Cooling incident at PIC on 21-Jun-2012: 17% of WNs switched off
20120729_SIR_ASGC_STAGERDB.pdf r1 293.2 K 2012-11-21 18:55 JhenWeiHuang 20120729_SIR_ASGC_STAGERDB
20121009_PIC_SIR_ATLAS_deleted_files.pdf r1 61.3 K 2012-10-26 13:40 GonzaloMerino SIR for accidental deletion of ATLAS files at PIC
20121210SIRPICLHCblostfilesontape2.pdf r1 58.7 K 2012-12-14 15:54 GonzaloMerino SIR for the lost LHCb tape files at PIC in Dec 2012
20170321_SIR_CERN_PHEDEX.pdf r1 49.0 K 2017-04-03 13:24 KateDziedziniewicz CMS PhEDEx not working at CNAF/WISCONSIN after CMSR migration
3D-DB-incident-20100629.pdf r1 43.4 K 2010-06-30 20:22 FelixLee ASGC 3D DB incident report 20100629
ASGC-DB-Sep28.pdf r1 22.5 K 2009-10-12 17:05 JamieShiers
ASGC-SIR20130324-Atlas_file_lost.pdf r2 r1 31.4 K 2013-05-08 00:40 FelixLee ASGC file loss to ATLAS MCTAPE
ASGC_DATA_LOSS_SIR-NOV_2013.pdf r1 36.5 K 2013-11-21 14:41 FelixLee ASGC_DATA_LOSS_SIR-NOV2013
ASGC_SIR_2012-04-11.pdf r1 281.0 K 2012-05-03 10:35 JhenWeiHuang ASGC_SIR_2012-04-11.pdf
ASGC_incident_report_Jan18_2010.pdf r2 r1 16.6 K 2010-02-02 02:56 HorngLiangShih
CCIN2P3-WLCGT1SCM-LHCB-SW-Problem-Report-20101111.pdf r1 78.3 K 2011-01-18 10:40 JamieShiers CCIN2P3 shared software area interim report
CERN_OCSP_incident_report.pdf r1 46.9 K 2020-06-29 21:40 MaartenLitmaath CERN Grid CA OCSP incident report, June 24-25, 2020
Fibre_Cut_June_2009.pdf r1 177.9 K 2009-07-06 08:30 JamieShiers
GOCDB_Outage_5th_March_2014.doc r1 30.5 K 2014-03-13 15:52 MaartenLitmaath GOCDB outage, 5th March 2014
GridKa_SIR_20100612.pdf r1 28.5 K 2010-06-15 15:19 UnknownUser CMS dCache down for approx. 3 h 15 min
GridKa_SIR_20100706.pdf r1 34.2 K 2010-07-07 23:44 JosVanWezel
GridKa_SIR_PBS-Jan11.pdf r1 47.7 K 2011-02-07 14:57 AndreasHeiss SIR about GridKa local batch system problems, January 2011
GridKa_SIR_lost_files_alice_20110526.pdf r1 8.2 K 2011-06-06 17:23 JosVanWezel KIT SIR: lost ALICE files, 5/2011
GridKa_Service_Incident_Report_12082011.pdf r1 461.9 K 2011-12-12 15:00 XavierMol
KDC-SIR.pdf r2 r1 66.1 K 2011-08-23 14:52 DirkDuellmann
KIT_SIR_CMSChimeraDatabase_2018-08.pdf r1 196.1 K 2018-08-20 10:15 XavierMol Database incident, CMS dCache, Aug 2018
KIT_SIR_StorageFTS_20121127.pdf r1 298.0 K 2013-01-22 16:01 XavierMol SIR about offline FTS and dCache pool nodes at GridKa at the end of Nov 2012
KIT_SIR_Storage_20131028.pdf r2 r1 429.2 K 2014-04-08 08:29 XavierMol 130 files lost for CMS
KIT_SIR_Storage_20141023.pdf r1 203.8 K 2014-10-31 13:06 ThomasHartmann KIT SIR: identification of file losses from tape due to wrong end-of-tape markers
KIT_SIR_TapeStorage_2017-12.pdf r1 195.6 K 2018-03-13 08:57 XavierMol SIR KIT tape storage Q4 2017
LHCb_Databases_Upgrade_Migration_Incident_report.pdf r1 43.1 K 2018-03-21 18:27 IgnacioCoterillo
POSTmortem-CMS-Oct2010.docx r1 117.8 K 2010-10-15 13:51 MaartenLitmaath CMS storage down at CNAF Oct 6-10, 2010
PostMortemTier-1ServiceIncidentRAIDCORRUPTIONAdaptec644515-03-2012.doc r1 52.5 K 2012-04-13 21:13 AlexeySedov ATLAS data loss incident at PIC
Post_Mortem_PIC_Tier-1_SIR_Computing_SSC5_20110525.pdf r1 94.9 K 2011-06-01 16:18 UnknownUser SIR for the computing incident at PIC on 25/26 May 2011
Post_Mortem_Tier-1_Service_Incident_dCache_PNFS_overload_10-June-2011.pdf r1 129.5 K 2011-06-14 17:14 UnknownUser
Post_Mortem_Tier-1_Service_Incident_dCache_PNFS_overload_10-June-2011_f.pdf r1 129.9 K 2011-06-16 15:53 UnknownUser
Post_mortem_LFC_indicent_23-26_May_2009_-_WikiPIC.pdf r1 163.7 K 2009-05-27 17:28 JamieShiers
SIR-2018-CCIN2P3-DiskServerFailure.pdf r1 416.0 K 2018-10-05 16:26 EricFede SIR for CCIN2P3 data lost on xrootd storage
SIR-ALICE-KIT-overload-v2.pdf r1 78.8 K 2014-05-07 18:52 MaartenLitmaath SIR about KIT firewall and OPN overload by ALICE jobs
SIR-CNAF--AtlasSRMoutage-April-2010.pdf r1 112.5 K 2010-05-10 14:22 HarryRenshall CNAF ATLAS SRM blockage on 28 April, then MCDISK full; StoRM bug
SIR-FZK-20090907.pdf r1 74.9 K 2009-09-29 14:42 HarryRenshall SIR of FZK degraded ATLAS RAC, 7 to 16 Sep 2009
SIR-IN2P3-CC-AFSoutage-2010-04-26.pdf r1 12.0 K 2010-05-07 11:14 HarryRenshall SIR for IN2P3 AFS outage
SIR-IN2P3-CC-BatchOutage-2010-04-24.pdf r1 15.4 K 2010-05-04 09:48 HarryRenshall SIR of IN2P3 batch outage of 24/25 April 2010
SIR-IN2P3-CC-CVMFS-2012-07-03-v0.pdf r1 6.9 K 2012-07-18 23:06 MaartenLitmaath IN2P3-CC CVMFS inconsistency
SIR-IN2P3-CC-CVMFSSquid-2012-06-24-v2.pdf r1 8.7 K 2012-08-29 22:17 MaartenLitmaath Software area unavailable at IN2P3 on 24-Jun-2012
SIR-IN2P3-CC-LHCb-AFS-Latency-2010-S2-v2.pdf r1 212.3 K 2011-02-14 22:14 MaartenLitmaath Slow AFS response causing environment setup timeouts for LHCb jobs
SIR-IN2P3-CC-Network-2011-02-13-v0.pdf r1 6.8 K 2011-03-01 15:45 MaartenLitmaath IN2P3-CC core network switch outage due to CPU card failure
SIR-IN2P3-CC-Network-2011-03-14-v1.pdf r1 6.2 K 2011-03-25 16:07 MaartenLitmaath IN2P3-CC hardware failure on network equipment
SIR-IN2P3-CC-OperationsPortal-2010-04-22v2.pdf r2 r1 17.2 K 2010-05-07 11:14 HarryRenshall SIR for IN2P3: downtime notification impossible
SIR-IN2P3-CC-PowerIncident-2011-04-08-v0.pdf r1 8.1 K 2011-04-14 11:29 MaartenLitmaath IN2P3-CC power incident Apr 8
SIR-IN2P3-CC-PowerIncident-2011-08-26-v2.pdf r1 24.3 K 2011-09-14 20:50 MaartenLitmaath IN2P3-CC cooling system failure Aug 26
SIR-IN2P3-CC-WNs-disconnected-2010-02-15-2.pdf r1 10.5 K 2010-02-25 14:28 HarryRenshall Worker node network connectivity loss at IN2P3, 15 Feb 2010
SIR-IN2P3-CC-dCache-2012-07-01-v1.pdf r1 6.7 K 2012-07-18 22:59 MaartenLitmaath IN2P3-CC dCache downtime due to leap second
SIR-IN2P3-CC-lbms-DB-overload-2010-01-04.pdf r1 30.1 K 2010-01-11 16:08 DirkDuellmann IN2P3 local batch system database server overload
SIR-IN2P3-CC-network-2012-06-29-v0.pdf r1 5.7 K 2012-07-16 20:04 MaartenLitmaath IN2P3-CC network outage
SIR-IN2P3-CC-network-2014-11-26-v0.pdf r1 31.6 K 2014-12-01 10:00 AndreaSciaba
SIR-IN2P3-CC-network-2015-11-03-v3.pdf r1 33.1 K 2015-11-12 14:18 AndreaSciaba
SIR-IN2P3-Dcache-ATLAS-Transfer-Degradation-2010-Q4-v3.pdf r1 281.6 K 2011-02-11 19:27 MaartenLitmaath IN2P3-CC dCache transfer degradation for ATLAS
SIR20120921.pdf r1 31.9 K 2012-10-16 18:31 MaartenLitmaath CNAF LHCb SE 6-day downtime
SIR_201705.pdf r1 127.2 K 2017-06-06 12:11 MaartenLitmaath GGUS outage of 2017-05-31
SIR_ASGC_July_2012.pdf r1 292.8 K 2012-11-21 18:42 JhenWeiHuang SIR_ASGC_July_2012
SIR_BNL_CONDB.pdf r1 58.3 K 2011-09-29 15:12 MariaGirone
SIR_BNL_DB_CFG.pdf r2 r1 50.6 K 2011-09-20 10:01 MariaGirone
SIR_CCIN2P3_15aug2011.pdf r1 32.8 K 2011-08-22 17:12 JamieShiers
SIR_CCIN2P3_19july2011.pdf r1 37.0 K 2011-08-01 15:53 MaartenLitmaath IN2P3-CC database incidents due to disk drive failures
SIR_CCIN2P3_SRM_incident_08oct2009.doc r1 71.5 K 2009-10-12 14:22 JamieShiers
SIR_CCIN2P3_cooling_outage_03nov2009.doc r1 12.5 K 2009-11-06 17:37 DirkDuellmann IN2P3 cooling outage Nov 3rd
SIR_CNAF_20190829.pdf r1 49.9 K 2019-08-29 18:42 MaartenLitmaath CNAF site outage Aug 6-21, 2019
SIR_COOLING_OUTAGE_2009_05_03.pdf r1 26.7 K 2009-05-22 14:05 HarryRenshall SIR for PIC cooling failure of 14 May 2009
SIR_FZK-LCG2_2010-01-13.pdf r1 28.5 K 2010-01-15 12:58 UnknownUser SIR FZK-LCG2 (GridKa/KIT): information system problems on 13 and 14 January 2010
SIR_GRID-FTP_OUTAGE_2009_06_11-1.pdf r1 73.9 K 2009-06-16 11:06 JamieShiers
SIR_PIC_ATLAS_T10KD_20160519.pdf r1 24.3 K 2016-05-19 10:05 AreshVedaee T10KD issue at PIC affecting ATLAS
SIR_PIC_COOLING_OUTAGE_2009_04_14.pdf r1 32.0 K 2009-05-22 14:21 HarryRenshall SIR for PIC cooling failure of 2009.05.14
SIR_PIC_COOLING_OUTAGE_2009_05_14.pdf r1 32.0 K 2009-05-22 14:26 HarryRenshall SIR for PIC cooling outage of 14 May 2009
SIR_ROBOTIC_LIBRARY_OUTAGE_2009_04_22.pdf r1 22.8 K 2009-04-25 10:06 DirkDuellmann
SIR_ROBOTIC_LIBRARY_OUTAGE_2009_04_26-3.pdf r1 17.6 K 2009-04-30 11:50 JamieShiers
SIR_SARA_TAPEBACKEND_OUTAGE_2009_05_04.pdf r1 22.0 K 2009-05-07 15:27 HarryRenshall SIR for SARA tape back-end outage, 4 to 6 May 2009
SIR_cooling_failure_20100710.pdf r1 53.4 K 2010-07-19 14:28 UnknownUser SIR of the cooling incident at KIT on July 10
SIR_storage_FZK_GridKa.pdf r1 51.7 K 2009-07-02 14:17 JamieShiers
SIRondatalossinASGCinOct.2016.pdf r1 32.1 K 2016-11-11 14:21 MaartenLitmaath ASGC: loss of ATLAS data, 18 Oct 2016
SIRs-by-Q-2012.xlsb r1 43.8 K 2012-11-23 14:06 JamieShiers Spreadsheet for producing SIR plots for WLCG QRs
SURFsara_SIR_network_outage_30-6-2016.pdf r1 57.0 K 2016-07-13 14:36 RonTrompert1
SURFsara_Service_Incident_Report_-_bw32-1_backplane.pdf r1 4267.2 K 2015-02-09 16:58 AndreaSciaba
Service_Incident_Report.pdf r1 177.2 K 2014-01-14 12:09 SimoneCampana Service instabilities in the SURFsara grid storage cluster
Service_Incident_Report_for_BNL_Tier1-06-2013.pdf r1 28.3 K 2013-06-26 21:56 MichaelErnst Service Incident Report for US ATLAS Tier-1 Center
Storage_incident_report_at_TRIUMF_Sep-16-2013.pdf r1 46.6 K 2013-09-25 00:46 RedaTafirout TRIUMF incident report (lost files)
TRIUMF-dcs08lun0_incident_20161218.pdf r1 41.7 K 2017-01-25 18:05 DiQing ATLAS lost files at TRIUMF due to a hardware/firmware issue on December 18, 2016
TRIUMF-incident-report-april10-2012.pdf r1 29.8 K 2012-04-27 02:36 RedaTafirout TRIUMF incident report
post-mortem-CNAF-CE-Problem-Sept-2016.pdf r1 141.2 K 2016-10-17 20:22 MaartenLitmaath
power_cut_ASGC.txt r1 0.6 K 2009-07-31 16:19 GangQin Power cut at ASGC on July 17th
power_surge_ASGC_20090118.txt r1 0.8 K 2010-02-01 12:59 GangQin Po
sir-in2p3-cc-dcachesrmincident-2011-03-19-v2.pdf r1 7.1 K 2011-03-28 14:08 MaartenLitmaath IN2P3-CC dCache SRM overload
sir-in2p3-cc-powerincident-2011-02-25-v0.pdf r1 7.8 K 2011-03-07 19:18 MaartenLitmaath IN2P3-CC power incident Feb 25
sir-kit-atlas-dcache-20110728.pdf r1 25.9 K 2011-07-28 14:18 AndreasPetzold SIR: ATLAS dCache data loss at KIT, July 2011
sir_BatchIncident_15_10_09.pdf r1 29.9 K 2009-10-15 16:07 JamieShiers
sir_in2p3network_outage_10_12_2009.pdf r1 48.8 K 2009-12-14 10:01 HarryRenshall SIR of IN2P3 DNS load-balancing failure, 8 December 2009
uscmsT1_SIR_042015.pdf r2 r1 46.7 K 2015-05-04 15:00 LucaMascetti 2015-05 FNAL uscms lost files
Topic revision: r291 - 2020-06-29 - MaartenLitmaath
This site is powered by the TWiki collaboration platform. Copyright © 2008-2020 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.