Detector |
Description of the problem |
Actions to be taken by next shifters |
Author |
Date |
LAr |
INFORMATION rcc::ActionsInvalidated PU device "PU_EMECA3_08_01" is OFF. Actions with Resources/Segments panel are invalidated for this object. |
Expected for 6 PUs (see here). Ignore; do not report (not even in the shift summary).
Steffen |
31/10/2016 |
LAr |
ERROR LArNoiseBurstsUploader rc::BadVetoDB Writing Event Veto to COOL failed because Exception caught while uploading to database. Will retry later |
If these errors do not appear very often (i.e. not every two minutes), just include them in your shift summary. If they appear very often, call the LAr RC.
Tulin |
25/10/2016 |
LAr |
WARNING RODC_EMECA2 rcc::ActionMessage Action 'CONFIG': FEB_EMECA2_10L_F2 Scac::CheckConf() - Readback error |
This FEB is known to produce readback errors at CONFIG (e.g. this elog). According to Stefan Simion, the FEB is configured correctly; only the readback failed. This error, for this specific FEB, might be masked in future. No need to call; just put it in the shift summary.
Alexis |
25/10/2016 |
LAr |
WARNING rc::HardwareError PU_XXXX RODC_XXXX |
LAr has gone busy. Call the LAr Run Coordinator IMMEDIATELY and do a dump of the ROS/RODs status logs. Post an elog with the details (see this entry for a good example of what to include). Note: if the busy persists for over a minute, the offending PUs will be removed automatically via stopless removal (if in stable beams, flat top, squeeze or adjust); otherwise the Run Control shifter will get a popup. Run Control should NOT click "yes" for stopless removal before you have spoken to the LAr Run Coordinator (more info on LAr.LArTroubleShooting).
Claire |
10/06/2016
LAr |
WARNING RODC_EMBCX rcc::ActionMessage Action 'MONITOR': FEB_EMBCX_XXX_FX reg. error +3.3V SCA or -1.7V SCA or +5V Analog [Left/Right] |
If it is a single instance, add it to the shift summary only; if the warnings come in bunches, call the LAr RC (details: LAr.LArTroubleShooting).
Steffen |
14/07/2016 |
LAr |
WARNING ROS::ROSRobinNPExceptions RobinNP::clearRequest: The RobinNP could not delete 100 events because they were not in its buffer |
Action required if you see a large number of these warnings at the start of a run: call the LAr RC immediately (more info on LAr.LArTroubleShooting).
Claire |
10/06/2016 |
LAr |
WARNING ROS-LAR-XXXX-XX ROS::ROSRobinNPExceptions Duplicate fragment: RobinNP::processIncomingFragment: Fragment for L1ID 0xXXXXXXX already exists in index for ROL 0x9 replacing with newer version for RobinNP |
If this message occurs for all systems, just note it in the shift summary. However, if only one system complains, contact the LAr RC.
Ian |
02/08/2016 |
LAr |
ERROR XXXXX ESInterface::CannotNotify The "CentralES" Expert System cannot be properly contacted: a communication problem ("TIMEOUT") occurred |
This error should be ignored. It is understood to be a transient network condition that causes no further issues, as stated on the DAQ Whiteboard: https://atlasop.cern.ch/twiki/bin/view/Main/DAQWhiteBoard
Tulin |
02/11/2015 |
LAr |
ERROR LArHistogramming is::HangingSubscriber The number of timeouts exceeds the maximum tolerated configuration parameter (100) the 'corbaloc:iiop=XXXX' receiver will no longer be serviced |
This error should be ignored. It has no effect on the histogramming in the ATLAS partition.
Kirill |
21/10/2015 |
LAr |
WARNING RODC_XXXXXX rcc::ActionMessage Action 'MONITOR': FEB_XXXXX_XX_XX QPLL Unlocked |
See instructions for shifters on LAr.LArTroubleShooting |
Emma |
03/10/2015 |
LAr |
WARNING l1calo-trigger-monitor-app trigmon::VeryHotTower HEC L1Calo tower 0x051c0500 is MUCH hotter than its neighbours. Eta=-3.15 Phi=-2.11 (-32,42). Please check L1 Empty trigger rates for unusual activity (EM3,J10,Tau8 etc.) Beam state=STABLE BEAMS |
See instructions for shifters on LAr.LArTroubleShooting |
Emma |
16/09/2015 |
LAr |
RODC_XX rcc::Generic XX DSP(s) have no received TTC event => ROD_XX_YY_NbTtcEvents |
Action required! Call the LAr RC immediately! See info on LAr.LArTroubleShooting.
Emma |
10/07/2015 |
LAr |
WARNING RODC_EMBA4 rcc::Timeout Waiting for state running on controller "RootController" accompanied by WARNING RODC_EMECC2 rcc::Timeout Waiting for state running on controller "RootController" |
These sometimes appear at the start of a run if the run is slow to start or does not start smoothly. No action needed.
Emma |
10/07/2015 |
LAr |
WARNING EM L1Calo tower 0x001d0201 experienced a LARGE spike in its PPM rate. Eta=0.65 Phi=-1.23 (6,51). Please check L1Calo Mapping Tool and L1 Empty trigger rates (EM3,J10,TAU8 etc.) for unusual activity. Beam state=STABLE BEAMS |
The thresholds are not tuned yet; if there is no busy, this does not affect data taking. See here for the required shifter actions.
Ryne |
11/06/2015 |
LAr |
ddcdt_LAR_xxxx ddc::AppWarning Service crate_YXXFSMState.dcs unavailable |
Misconfiguration of a timeout (to be fixed in the ATLAS partition too). No need to report.
Eleni
15/05/2015 |
LAr |
WARNING RODC_EMBA1 rcc::ActionMessage Action 'STATUS': Staging Fpga FIFO OK and WARNING RODC_EMBA1 rcc::ActionMessage Action 'STATUS': Staging Fpga FIFO not in Good state... |
Appears once at the beginning of the run. No need to report.
Manuela |
13/04/2015 |
LAr ABBA |
Warning Action 'STATUS': ipbusudp-2.0://10.145.91.xx:y {L1A_received, packet_built, packet_transmitted} counter is not increasing |
Clean them. This is expected when ATLAS is not running at all, or is running with the trigger item L1_LAR-EM disabled. Otherwise, check the trigger rate: if the rate is non-zero, call.
Steffen |
11.10.2016 |
LAr ABBA |
Warning Action 'START': ipbusudp-2.0://10.145.91.xx:y TIMEOUT reading ABBA board |
Clean these messages periodically. They may occur in single instances from time to time (i.e. once every five minutes); ignore them: no call, no elog, no mention in the shift summary.
Steffen |
12.08.2016 |
LAr ABBA |
Error Failed to lookup TTCC_EMF in IPC. |
Ignore. |
Steffen |
03.08.2016 |
LAr ABBA |
Warning Cannot read RunParams.ConfigVersions from Information Service (possible reason: oks2coral is disabled or failed)
Ignore. |
Steffen |
03.08.2016 |
LAr ABBA |
Warning Action 'STATUS': ipbusudp-2.0://10.145.91.xx:y Only n fibres are locked in Ppod{0/1} |
During the day, inform the LAr RC; otherwise, report it at the next run meeting.
Steffen |
12.08.2016 |
LAr ABBA |
Error Action 'STATUS': ipbusudp-2.0://10.145.91.xx:y Only n fibres are locked in Ppod{0/1} |
During the day, inform the LAr RC; otherwise, report it at the next run meeting.
Steffen |
12.08.2016 |
LAr ABBA |
Warning Action 'STATUS': ipbusudp-2.0://10.145.91.xx:y In Ppod{0/1} Fibre Rx{z} lost locked state. |
Note in shift summary. |
Steffen |
12.08.2016 |
LAr ABBA |
Warning Master Trigger not defined. |
Ignore. |
Alessandra |
31.08.2016 |
LAr ABBA |
Error Error from call to select(): fd flagged but no data avialable probable EOF -- 0 similar messages suppressed, last occurrence was at |
This message appears when the ABBA controller is dead. During the day, inform the LAr RC; otherwise, report it at the next run meeting.
Alessandra |
08.11.2016 |
LAr ABBA |
Error Action 'CONFIG': ipbusudp-2.0://10.145.91.18:2 Error in ABBA reset : TIMEOUT |
This message appears if the connection to a board is lost while passing through the CONFIG state. During the day, inform the LAr RC; otherwise, report it at the next run meeting.
Alessandra |
08.11.2016 |
LAr ABBA |
Error Failed to notify the parent controller "LArABBA" about changes in my status |
This message only appears when the machine where LAr_ABBA runs is dead. During the day, inform the LAr RC; otherwise, report it at the next run meeting.
Alessandra |
08.11.2016 |
LAr ABBA |
Fatal The transition "CONNECT" has not been properly completed |
This message only appears when the application fails to restart. During the day, inform the LAr RC; otherwise, report it at the next run meeting.
Alessandra |
08.11.2016 |
LAr ABBA |
Error Error from send on a TCP connection : a call to send returned 104 |
This message only appears when the application fails to restart. During the day, inform the LAr RC; otherwise, report it at the next run meeting.
Alessandra |
08.11.2016 |
LAr ABBA |
Error Application "LArABBA" cannot be re-started. Reason: errors occurred trying to start it |
This message only appears when the application fails to restart. During the day, inform the LAr RC; otherwise, report it at the next run meeting.
Alessandra |
08.11.2016 |
LAr ABBA |
Error The transition "STOPROIB" could not be propagated to application "EB_ABBA" |
This message only appears when the application fails to restart. During the day, inform the LAr RC; otherwise, report it at the next run meeting.
Alessandra |
08.11.2016 |
LAr ABBA |
Error unknown |
This message appears when the application fails to restart. During the day, inform the LAr RC; otherwise, report it at the next run meeting.
Alessandra |
08.11.2016 |
Tile |
WARNING TileLaserRCD TileLaserII::Warning TileLaserII Warning NoLaserTT |
If you see this warning during a cosmics or collisions run, check with the trigger desk whether L1_CALREQ is enabled (i.e. it is in the menu and its prescale is not -1). If it is not, they need to correct this: L1_CALREQ2 should be included. If it is included, the shaft might be stuck; call the DAQ on-call. Report in a separate eLog once it is resolved.
Arely |
June 2015 |
Tile |
l1calo-trigger-monitor-trigmon::LowRate Low L1Calo rate in Tile XXXYY Tower #. RECONFIGURE THIS MODULE |
If you see this ERS error, first check the L1Calo map and make sure there are no anomalies for the corresponding tower. If there are none, do not reconfigure the module; put the error in your shift summary (no separate eLog needed). If you see a problem in the L1Calo map, check with the trigger shifter and inform the Tile RC.
Silvia |
20/07/2016 |
Tile |
WARNING ROS-TIL-EBC-00 ROS::ROSRobinNPExceptions Duplicate fragment: RobinNP::processIncomingFragment: Fragment for L1ID 0x1f043d57 already exists in index for ROL 0x5 replacing with newer version for RobinNP 0 , WARNING ROS-TIL-EBC-00 ROS::ROSRobinNPExceptions : RobinNP::clearRequest: The RobinNP could not delete ### events because they were not in its buffer (...) , WARNING ROS-TIL-EBC-00 Fragment error: RobinNP::processIncomingFragment: ROL ### Fragment out of sequence: L1 ID = 0x2b009d51, Most Recent ID 0x2b009d4b for RobinNP 1 |
We might get some of these if the L1 rate is relatively high (under investigation). If you observe them, let the run go on for a couple of hours so you can get an impression of how often they appear. Then post an eLog noting how frequent they are, which partition they come from (LBA, EBA, LBC, EBC) and the L1 and HLT rates of the run (found in 'counters' in the TDAQ Igui), and include a few examples. Note: they may come in bunches of 5-10, followed by nothing for 1-2 hours. If they appear much more often than that, call the DAQ on-call or the run coordinator.
Arely |
08.08.15 |
Tile |
Tile L1Calo tower 0x06160f03 is MUCH hotter than its neighbours. Eta=0.95 Phi=3.09 (9,31). Please check L1 Empty trigger rates for unusual activity (EM3,J10,Tau8 etc.) |
The message can read "MUCH hotter", "Very hot" or "LARGE spike" for these towers: 0x06140b01, 0x06160f03 and 0x07130f00. These are known hot towers. Check the rates, and if you see a correlation, post a separate eLog (including LVL1). Otherwise you can ignore them (post the warnings only once in your summary). For any other tower, check with the trigger shifter that the rates are ok and, if not, call the L1Calo on-call (and post a separate eLog with the trigger rates and warnings, including LVL1).
Silvia |
03.06.16 |
Tile |
Tile L1Calo tower 0x061c0601 is hotter than its neighbours. Eta=-0.95 Phi=-0.83 (-10,55). Please check L1 Empty trigger rates for unusual activity (EM3,J10,Tau8 etc.) |
The message can appear for these towers: 0x061c0601, 0x061c0603, 0x061c0701, 0x061c0703, 0x061d0701, 0x061d0703. These are known hot towers. Check the rates, and if you see a correlation, post a separate eLog (including LVL1). Otherwise you can ignore them (post the warnings only once in your summary). The same applies to the following towers when the LHC status is NOT Stable Beams: 0x071d0601, 0x061a0703, 0x061e0703, 0x061e0803, 0x071d0600, 0x061a0802, 0x07190901, 0x07190600. For any other tower, check with the trigger shifter that the rates are ok and, if not, call the L1Calo on-call (and post a separate eLog with the trigger rates and warnings, including LVL1).
Silvia |
03.06.16 |
Tile |
Digital Errors/Corruption |
LBA01, LBA52 and EBC06 are known to report digital errors. You do not have to power cycle them, even if the digital error fraction is >5%.
Silvia |
20.07.16 |
Tile |
Hot spot at eta=1, phi=1 |
If the DQ online shifter reports a hot spot at eta=1, phi=1, note that this is a known hot spot and can be ignored.
Silvia |
05.05.16 |
Tile |
Recurring calibration errors which can be ignored |
CHIP ERROR (example); No valid clock detected (example); STOPGATHERING (example).
Silvia |
11.05.16 |
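The Tile hot-tower rows above amount to a lookup against two lists of known towers, one of which only counts outside Stable Beams. The following is a minimal sketch of that triage, assuming the warning text contains "tower 0x........" as in the examples quoted above; the helper names (`parse_tower_id`, `is_known_hot_tower`, `shifter_action`) are hypothetical and not part of any ATLAS tool. It is an illustration of the rule, not a replacement for checking the rates with the trigger shifter.

```python
import re
from typing import Optional

# Known hot Tile towers for "MUCH hotter", "Very hot" or "LARGE spike" warnings.
KNOWN_STRONG = {"0x06140b01", "0x06160f03", "0x07130f00"}

# Known hot Tile towers for "hotter than its neighbours" warnings, any beam state.
KNOWN_HOTTER = {
    "0x061c0601", "0x061c0603", "0x061c0701",
    "0x061c0703", "0x061d0701", "0x061d0703",
}

# Towers treated as known only when the LHC status is NOT Stable Beams.
KNOWN_HOTTER_NOT_STABLE = {
    "0x071d0601", "0x061a0703", "0x061e0703", "0x061e0803",
    "0x071d0600", "0x061a0802", "0x07190901", "0x07190600",
}

# Assumed message layout: "... tower 0x06160f03 is ..."
TOWER_RE = re.compile(r"tower (0x[0-9a-fA-F]{8})")


def parse_tower_id(message: str) -> Optional[str]:
    """Extract the tower ID (lower-case hex) from a warning text, if present."""
    match = TOWER_RE.search(message)
    return match.group(1).lower() if match else None


def is_known_hot_tower(message: str, beam_state: str) -> bool:
    """True if the warning concerns a tower listed as a known hot tower."""
    tower = parse_tower_id(message)
    if tower is None:
        return False
    text = message.lower()
    # Check the stronger wordings first: "MUCH hotter" also contains "hotter than".
    if any(key in text for key in ("much hotter", "very hot", "large spike")):
        return tower in KNOWN_STRONG
    if "hotter than its neighbours" in text:
        if tower in KNOWN_HOTTER:
            return True
        return beam_state.upper() != "STABLE BEAMS" and tower in KNOWN_HOTTER_NOT_STABLE
    return False


def shifter_action(message: str, beam_state: str) -> str:
    """Map a hot-tower warning to the action described in the table."""
    if is_known_hot_tower(message, beam_state):
        return ("Known hot tower: check the rates; post a separate eLog only if you "
                "see a correlation, otherwise note the warning once in the summary.")
    return ("Check with the trigger shifter that the rates are ok; if not, call the "
            "L1Calo on-call and post a separate eLog.")
```

For example, `shifter_action("Tile L1Calo tower 0x06160f03 is MUCH hotter than its neighbours.", "STABLE BEAMS")` returns the "Known hot tower" text. The tower lists are kept per message wording because the table gives separate lists for the two wordings.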
Detector |
Name of the alarm |
Actions to be taken by next shifters |
Author |
Date |
LAr |
LAR TERMO LArTermoCan* E***_* NOT OPERATIONAL |
Call the HW on-call to inform Sergey (during the day) and document it. Very rare.
Richard |
13.05.16 |
LAr |
LAR TERMO LArTermoCan* E***_* emergency |
No need to call; report the full alarm text in the shift summary.
Sergey |
01.04.16 |
LAr |
LAR FECLV * FECCur crate_* LowVoltageSupply Current or Status WARNING |
No need to call; report the full alarm text in the shift summary.
Aaron/Manuela/Jose |
12.08.15 |
LAr |
VNotAtVop WENT WARNING in an HV channel for ~10 s only
Check the voltage trend plot: the voltage should have changed for only one reading (~10 s) and the alarm should be in WENT. No need to call; just put it in the shift summary. The alarm will be acknowledged by the HW on-call. NOTE: If it is accompanied by a second alarm for TripAutoRecovery, call the RC.
Jose |
01.07.15 |
LAr |
Example: WENT LAR FCAL C HV FCAL C 01 0 S3 M192-C7 Not at Op Voltage |
Only for FCAL. These sometimes happen at ADJUST or during background spikes. IMPORTANT: The alarm should go to WENT almost immediately. If it does not, call the HW on-call (70137) as soon as possible and write an elog. Otherwise, just put it in the shift summary.
Claire |
04.08.16 |
LAr |
LAr Cryogenic system |
If the Slimos or the shift leader tell you that there is a problem with ATLAS>EXT>CRYO ARGON, please note that this is NOT under your responsibility. The Slimos should inform the piquet.
S. Mazza |
20.05.15 |
LUCID |
ATLLCD02.ELMBE/ELMBCanBus_1/ELMB_X/AI/PT_4W_Y_Z.value |
If the alert refers to only one isolated value, i.e. there is no trend, it can be acknowledged. The cause of these hiccups is under investigation (most probably electronic noise). In case of doubt, call the DCS on-call at any time: 16 1981.
D. Caforio |
17.04.2015 |
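The LUCID row distinguishes a single isolated out-of-range value from a trend. The following is a minimal sketch of that distinction, assuming a window of recent readings of the alarmed value and a hypothetical acceptable band `[low, high]`; the table gives no numerical limits or window length, so both are placeholders, and the function name `is_isolated_excursion` is illustrative only. In case of doubt the table's instruction still applies: call the DCS on-call.

```python
from typing import Sequence


def is_isolated_excursion(values: Sequence[float], low: float, high: float) -> bool:
    """True if exactly one reading lies outside [low, high] and it is surrounded
    by in-band readings (a single hiccup rather than a trend).

    `low`, `high` and the length of `values` are hypothetical placeholders;
    the shifter table does not specify them.
    """
    outside = [i for i, v in enumerate(values) if not low <= v <= high]
    if len(outside) != 1:
        # No excursion at all, or several out-of-band readings (possible trend).
        return False
    i = outside[0]
    # Require an in-band reading before and after the excursion; an excursion at
    # the edge of the window could be the start of a trend.
    return 0 < i < len(values) - 1
```

For example, `is_isolated_excursion([20.1, 20.2, 35.0, 20.1, 20.0], 15.0, 25.0)` is `True` (one spike, then back in band), while a window ending in a rising sequence of out-of-band values is not treated as isolated.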