Replication of LHCb conditions to SARA stopped because the Streams apply process aborted on a data inconsistency (ORA-01403: no data found). Some LHCb and ATLAS conditions tables had missing rows. LFC data were consistent.
Impact
Conditions data of ATLAS and LHCb at SARA were partly inconsistent from the SARA recovery on 10-Sep-10 until the damaged tables were repaired on 28-Oct-10.
Replication of conditions data (LHCb and ATLAS) to SARA did not work for 2 days after the incident.
Time line of the incident
10-Sep-10 - SARA database and replication were recovered according to the manual.
26-Oct-10 17:07:25 - STREAMS_APPLY_LHCB@SARA process crashed with error: ORA-01403: no data found
26-Oct-10 17:07:25 - CERN DBAs realized that some tables were broken (missing rows)
27-Oct-10 16:00:00 - After an investigation held at CERN, the root cause was understood.
27-Oct-10 17:00:00 - SARA recovery started
28-Oct-10 17:00:00 - SARA recovery finished
Analysis
The root cause of the problem was a missing step in the SARA recovery procedure (re-instantiation of the whole schema). As a result, all new changes to tables that had been created in the replicated schemas between the SARA crash and the recovery (from 2010-08-18 to 2010-09-09) were discarded by the apply process without any warning on the SARA database.
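Because the changes were discarded without warnings, one way to notice such gaps earlier is to compare table contents between the source and the replica. The following is a minimal sketch of a row-count comparison, not the check used by the CERN DBAs. It assumes generic Python DB-API connections, and it uses in-memory SQLite databases as stand-ins for the T0 and T1 Oracle databases. The function name, the helper, and the table name "conditions" are hypothetical.

```python
import sqlite3


def find_row_count_mismatches(source_conn, replica_conn, tables):
    """Return {table: (source_rows, replica_rows)} for tables whose row counts differ."""
    mismatches = {}
    for table in tables:
        # Table names cannot be bound as parameters, so they must come
        # from a trusted, hard-coded list (assumption of this sketch).
        query = f"SELECT COUNT(*) FROM {table}"
        counts = []
        for conn in (source_conn, replica_conn):
            cursor = conn.cursor()
            cursor.execute(query)
            counts.append(cursor.fetchone()[0])
        if counts[0] != counts[1]:
            mismatches[table] = (counts[0], counts[1])
    return mismatches


def make_demo_db(rows):
    """Build an in-memory SQLite database with one hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE conditions (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany("INSERT INTO conditions (id, payload) VALUES (?, ?)", rows)
    conn.commit()
    return conn


# Stand-ins for the source (T0) and the replica (T1); the replica lacks one row.
source = make_demo_db([(1, "a"), (2, "b"), (3, "c")])
replica = make_demo_db([(1, "a"), (2, "b")])
print(find_row_count_mismatches(source, replica, ["conditions"]))
# prints {'conditions': (3, 2)}
```

Equal row counts do not prove that two tables are identical, so a check like this can only flag missing rows, not confirm consistency.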
Follow up
Damaged tables were recovered with a Data Pump import using data from T0 (afternoon of 28.10.2010).
The procedure for resynchronization of a T1 has been updated with the missing step.