Responsible: VincentPascuzzi

CaloFwdKnownIssuesSandbox

ERS warnings and errors

LAr+ABBA

Detector Description of the problem Actions to be taken by next shifters Author Date
LAr INFORMATION rcc::ActionsInvalidated PU device "PU_EMECA3_08_01" is OFF. Actions with Resources/Segments panel are invalidated for this object. Expected for 6 PUs (see here). Ignore, do not report (not even in shift summary). Steffen 31/10/2016
LAr ERROR LArNoiseBurstsUploader rc::BadVetoDB Writing Event Veto to COOL failed because Exception caught while uploading to database. Will retry later If these errors are not given very often (i.e if not in every two minutes), then just include it in your shift summary. If it appears very often, call LAr RC. Tulin 25/10/2016
LAr WARNING RODC_EMECA2 rcc::ActionMessage Action 'CONFIG': FEB_EMECA2_10L_F2 Scac::CheckConf() - Readback error This FEB is known to produce readback errors at CONFIG (e.g. this elog). According to Stefan Simion the FEB is well configured, only the readback failed. This error, for this specific FEB, might be masked in future. No need to call just put in summary. Alexis 25/10/2016
LAr WARNING rc::HardwareError PU_XXXX RODC_XXXX LAr has gone busy. Call the LAr Run Coordinator IMMEDIATELY and do a dump of the ROS/RODs status logs. Post an elog with the details (see this entry for a nice example of what to include). Note: If the busy persists for over a minute, the offending PUs will be stopless-removed automatically (if in stable beams, flat top, squeeze or adjust); otherwise the Run Control shifter will get a popup. Run Control should NOT click "yes" for stopless removal before you have spoken to the LAr Run Coordinator. (More info on LAr.LArTroubleShooting) Claire 10/06/26
LAr WARNING RODC_EMBCX rcc::ActionMessage Action 'MONITOR': FEB_EMBCX_XXX_FX reg. error +3.3V SCA or -1.7V SCA or +5V Analog [Left/Right] If single instance, add to shift summary only, if in bunches, call LAr RC. (details: LAr.LArTroubleShooting) Steffen 14/07/2016
LAr WARNING ROS::ROSRobinNPExceptions RobinNP::clearRequest: The RobinNP could not delete 100 events because they were not in its buffer If you see a large number of these warnings at the start of a run, action is required: call LAr RC immediately. (More info on LAr.LArTroubleShooting) Claire 10/06/2016
LAr WARNING ROS-LAR-XXXX-XX ROS::ROSRobinNPExceptions Duplicate fragment: RobinNP::processIncomingFragment: Fragment for L1ID 0xXXXXXXX already exists in index for ROL 0x9 replacing with newer version for RobinNP If this message occurs for all systems then just note it in the shift summary. However if only one system complains then contact the LAr RC. Ian 02/08/2016
LAr ERROR XXXXX ESInterface::CannotNotify The "CentralES" Expert System cannot be properly contacted: a communication problem ("TIMEOUT") occurred This error should be ignored. It is understood as a transient network condition that does not cause further issues, as stated in DAQ Whiteboard: https://atlasop.cern.ch/twiki/bin/view/Main/DAQWhiteBoard Tulin 02/11/2015
LAr ERROR LArHistogramming is::HangingSubscriber The number of timeouts exceeds the maximum tolerated configuration parameter (100) the 'corbaloc:iiop=XXXX' receiver will no longer be serviced This error has to be ignored. It has absolutely no effect on the Histogramming in the ATLAS partition. Kirill 21/10/2015
LAr WARNING RODC_XXXXXX rcc::ActionMessage Action 'MONITOR': FEB_XXXXX_XX_XX QPLL Unlocked See instructions for shifters on LAr.LArTroubleShooting Emma 03/10/2015
LAr WARNING l1calo-trigger-monitor-app trigmon::VeryHotTower HEC L1Calo tower 0x051c0500 is MUCH hotter than its neighbours. Eta=-3.15 Phi=-2.11 (-32,42). Please check L1 Empty trigger rates for unusual activity (EM3,J10,Tau8 etc.) Beam state=STABLE BEAMS See instructions for shifters on LAr.LArTroubleShooting Emma 16/09/2015
LAr RODC_XX rcc::Generic XX DSP(s) have no received TTC event => ROD_XX_YY_NbTtcEvents Action required! Call LAr RC immediately! See info on LAr.LArTroubleShooting Emma 10/07/2015
LAr WARNING RODC_EMBA4 rcc::Timeout Waiting for state running on controller "RootController" accompanied by WARNING RODC_EMECC2 rcc::Timeout Waiting for state running on controller "RootController" These sometimes appear at the start of the run if the run is slow to start / does not start smoothly - no action needed Emma 10/07/2015
LAr WARNING EM L1Calo tower 0x001d0201 experienced a LARGE spike in its PPM rate. Eta=0.65 Phi=-1.23 (6,51). Please check L1Calo Mapping Tool and L1 Empty trigger rates (EM3,J10,TAU8 etc.) for unusual activity.  Beam state=STABLE BEAMS Thresholds not tuned yet, if not busy this doesn't affect data taking. See here for shifter actions required. Ryne 11/06/2015
LAr ddcdt_LAR_xxxx ddc::AppWarning Service crate_YXXFSMState.dcs unavailable Misconfiguration of timeout - to be fixed in the ATLAS partition too - no need to report Eleni 15/05/2015
LAr WARNING RODC_EMBA1 rcc::ActionMessage Action 'STATUS': Staging Fpga FIFO OK and WARNING RODC_EMBA1 rcc::ActionMessage Action 'STATUS': Staging Fpga FIFO not in Good state... once at the beginning of the run, no need to report Manuela 13/04/2015
LAr ABBA Warning Action 'STATUS': ipbusudp-2.0://10.145.91.xx:y {L1A_received, packet_built, packet_transmitted} counter is not increasing Clean them. This is expected when ATLAS is not running at all, or is running with the trigger item L1_LAR-EM disabled. Otherwise consult the trigger rate: if the rate is non-zero, call. Steffen 11.10.2016
LAr ABBA Warning Action 'START': ipbusudp-2.0://10.145.91.xx:y TIMEOUT reading ABBA board Clean those messages from time to time. May occur from time to time in single instances (i.e. once every five minutes) → ignore it: no call, no elog, not reported in shift summary. Steffen 12.08.2016
LAr ABBA Error Failed to lookup TTCC_EMF in IPC. Ignore. Steffen 03.08.2016
LAr ABBA Warning Cannot read RunParams.ConfigVersions from Information Service (possible reason: oks2coral is disabled or failed) Ignore. Steffen 03.08.2016
LAr ABBA Warning Action 'STATUS': ipbusudp-2.0://10.145.91.xx:y Only n fibres are locked in Ppod{0/1} During day, inform LAr RC, otherwise report at next run meeting. Steffen 12.08.2016
LAr ABBA Error Action 'STATUS': ipbusudp-2.0://10.145.91.xx:y Only n fibres are locked in Ppod{0/1} During day, inform LAr RC, otherwise report at next run meeting. Steffen 12.08.2016
LAr ABBA Warning Action 'STATUS': ipbusudp-2.0://10.145.91.xx:y In Ppod{0/1} Fibre Rx{z} lost locked state. Note in shift summary. Steffen 12.08.2016
LAr ABBA Warning Master Trigger not defined. Ignore. Alessandra 31.08.2016
LAr ABBA Error Error from call to select(): fd flagged but no data avialable probable EOF -- 0 similar messages suppressed, last occurrence was at This message appears when the ABBA Controller is dead. During day, inform LAr RC, otherwise report at next run meeting. Alessandra 08.11.2016
LAr ABBA Error Action 'CONFIG': ipbusudp-2.0://10.145.91.18:2 Error in ABBA reset : TIMEOUT This message appears if the connection with a board is lost while passing through the CONFIG state. During day, inform LAr RC, otherwise report at next run meeting. Alessandra 08.11.2016
LAr ABBA Error Failed to notify the parent controller "LArABBA" about changes in my status This message only appears when the machine where LAr_ABBA runs is dead. During day, inform LAr RC, otherwise report at next run meeting. Alessandra 08.11.2016
LAr ABBA Fatal The transition "CONNECT" has not been properly completed This message only appears when there is a failure in restarting the application. During day, inform LAr RC, otherwise report at next run meeting. Alessandra 08.11.2016
LAr ABBA Error Error from send on a TCP connection : a call to send returned 104 This message only appears when there is a failure in restarting the application. During day, inform LAr RC, otherwise report at next run meeting. Alessandra 08.11.2016
LAr ABBA Error Application "LArABBA" cannot be re-started. Reason: errors occurred trying to start it This message only appears when there is a failure in restarting the application. During day, inform LAr RC, otherwise report at next run meeting. Alessandra 08.11.2016
LAr ABBA Error The transition "STOPROIB" could not be propagated to application "EB_ABBA" This message only appears when there is a failure in restarting the application. During day, inform LAr RC, otherwise report at next run meeting. Alessandra 08.11.2016
LAr ABBA Error unknown This message appears when there is a failure in restarting the application. During day, inform LAr RC, otherwise report at next run meeting. Alessandra 08.11.2016
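
The note in the "LAr has gone busy" row (rc::HardwareError) describes a small decision rule: after the busy has persisted for over a minute, the offending PUs are either removed automatically or the Run Control shifter gets a popup, depending on the beam state. A minimal Python sketch of that rule follows. The function name, the state strings and the exact 60-second threshold are assumptions made for illustration; they are not taken from the TDAQ software.

```python
# Beam states in which the table says the offending PUs are removed automatically.
# The spelling of the state strings is an assumption for this sketch.
AUTO_REMOVAL_BEAM_STATES = {"STABLE BEAMS", "FLAT TOP", "SQUEEZE", "ADJUST"}

# The table says "over a minute"; the exact threshold is an assumption.
BUSY_PERSIST_SECONDS = 60


def lar_busy_outcome(busy_seconds: float, beam_state: str) -> str:
    """Hypothetical helper: what the table says happens to the offending PUs."""
    if busy_seconds <= BUSY_PERSIST_SECONDS:
        return "busy not yet persistent"
    if beam_state.strip().upper() in AUTO_REMOVAL_BEAM_STATES:
        return "automatic stopless removal"
    return "Run Control popup"


if __name__ == "__main__":
    print(lar_busy_outcome(90, "stable beams"))
    print(lar_busy_outcome(90, "INJECTION"))
```

Whatever the outcome, the shifter action in the table is unchanged: call the LAr Run Coordinator immediately, and Run Control should not confirm the popup before the LAr RC has been consulted.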

Tile

Tile WARNING TileLaserRCD TileLaserII::Warning TileLaserII Warning NoLaserTT If you see this warning during a cosmics/collisions run, check with the trigger desk whether L1_CALREQ is enabled (it is in the menu and the prescale is not -1). If not, they need to correct this: L1_CALREQ2 should be included. If it is included, the shaft might be stuck: call the DAQ on-call. Report in a separate eLog once it is resolved. Arely June 2015
Tile l1calo-trigger-monitor-trigmon::LowRate Low L1Calo rate in Tile XXXYY Tower #. RECONFIGURE THIS MODULE If you see this ERS error, first check the L1Calo map and make sure you don't see anomalies for the corresponding tower. If there are none, do not reconfigure the module; put the error in your shift summary, no need for a separate eLog. If you see a problem in the L1Calo map, check with the trigger shifter and inform the Tile RC. Silvia 20/07/2016
Tile WARNING ROS-TIL-EBC-00 ROS::ROSRobinNPExceptions Duplicate fragment: RobinNP::processIncomingFragment: Fragment for L1ID 0x1f043d57 already exists in index for ROL 0x5 replacing with newer version for RobinNP 0, WARNING ROS-TIL-EBC-00 ROS::ROSRobinNPExceptions : RobinNP::clearRequest: The RobinNP could not delete ### events because they were not in its buffer (...), WARNING ROS-TIL-EBC-00 Fragment error: RobinNP::processIncomingFragment: ROL ### Fragment out of sequence: L1 ID = 0x2b009d51, Most Recent ID 0x2b009d4b for RobinNP 1 We might get some of these if the L1 rate is relatively high (we are investigating). If you observe them, wait for the run to go for a couple of hrs, so you can get an impression of how often they appear. Then, post an eLog commenting on how frequent they are, which partition they are coming from (LBA, EBA, LBC, EBC) and the L1 and HLT rates of the run (found in 'counters' in the TDAQ Igui), and posting a few examples. NB They may come in bunches of 5-10, and then nothing for 1-2hrs. If the rate at which they show is much higher, call DAQ on-call or run coordinator. Arely 08.08.15
Tile Tile L1Calo tower 0x06160f03 is MUCH hotter than its neighbours. Eta=0.95 Phi=3.09 (9,31).  Please check L1 Empty trigger rates for unusual activity (EM3,J10,Tau8 etc.) The message can be MUCH hotter, Very hot or LARGE spike, for these towers: 0x06140b01, 0x06160f03 and 0x07130f00. These are known hot towers. Check the rates and if you see a correlation post a separate eLog (including LVL1). Otherwise you can ignore them (post the warnings only once in your summary). For any other towers check with the trigger shifter that the rates are ok and, if not, call L1Calo on-call (and post a separate eLog with trigger rates and warnings including LVL1). Silvia 03.06.16
Tile Tile L1Calo tower 0x061c0601  is hotter than its neighbours. Eta=-0.95 Phi=-0.83 (-10,55).  Please check L1 Empty trigger rates for unusual activity (EM3,J10,Tau8 etc.) The message can appear for these towers: 0x061c0601, 0x061c0603, 0x061c0701, 0x061c0703, 0x061d0701, 0x061d0703. These are known hot towers. Check the rates and if you see a correlation post a separate eLog (including LVL1). Otherwise you can ignore them (post the warnings only once in your summary). The same applies for the following towers when the LHC status is NOT Stable Beams: 0x071d0601, 0x061a0703, 0x061e0703, 0x061e0803, 0x071d0600, 0x061a0802, 0x07190901, 0x07190600. For any other towers check with the trigger shifter that the rates are ok and, if not, call L1Calo on-call (and post a separate eLog with trigger rates and warnings including LVL1). Silvia 03.06.16
Tile Digital Errors/Corruption LBA01, LBA52 and EBC06 are reporting digital errors and are known. You don't have to power cycle them if the digital error fraction is >5%. Silvia 20.07.16
Tile Hot spot at eta=1, phi=1 If the DQ online shifter reports a hot spot at eta=1, phi=1, this is a known hot spot and can be ignored. Silvia 05.05.16
Tile Recurring calibration errors which can be ignored CHIP ERROR Example; No valid clock detected Example; STOPGATHERING Example. Silvia 11.05.16
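
The two Tile hot-tower rows above amount to a lookup against lists of known towers followed by a check of the trigger rates. The sketch below encodes that in Python, as an illustration only. The tower IDs are copied from the rows; the function name, the boolean inputs and the returned strings are hypothetical. As a simplification, it merges the tower lists of both rows and ignores the exact wording of the message ("hotter", "MUCH hotter", "LARGE spike").

```python
# Tower IDs copied from the two Tile hot-tower rows (known hot towers).
KNOWN_HOT_TOWERS = {
    0x06140B01, 0x06160F03, 0x07130F00,
    0x061C0601, 0x061C0603, 0x061C0701,
    0x061C0703, 0x061D0701, 0x061D0703,
}

# Towers treated as known only when the LHC status is NOT Stable Beams.
KNOWN_HOT_OUTSIDE_STABLE_BEAMS = {
    0x071D0601, 0x061A0703, 0x061E0703, 0x061E0803,
    0x071D0600, 0x061A0802, 0x07190901, 0x07190600,
}


def tile_hot_tower_action(tower_id: int, stable_beams: bool, rates_unusual: bool) -> str:
    """Hypothetical helper returning the action listed in the table.

    rates_unusual stands for "a correlation is seen in the L1 rates" (known
    towers) or "the trigger shifter says the rates are not OK" (other towers).
    """
    known = tower_id in KNOWN_HOT_TOWERS or (
        not stable_beams and tower_id in KNOWN_HOT_OUTSIDE_STABLE_BEAMS
    )
    if known:
        if rates_unusual:
            return "post a separate eLog (including LVL1)"
        return "ignore; post the warning once in the shift summary"
    if rates_unusual:
        return "call L1Calo on-call; post a separate eLog (including LVL1)"
    return "rates OK; no call listed in the table"


if __name__ == "__main__":
    print(tile_hot_tower_action(0x06160F03, stable_beams=True, rates_unusual=False))
    print(tile_hot_tower_action(0x071D0601, stable_beams=True, rates_unusual=True))
```

Keeping the second set separate makes the beam-state dependence explicit: the same tower ID can be "known" during injection or adjust, yet be handled like any other tower in Stable Beams.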

LUCID

LUCID ReadoutModuleLUCROD::Lucrod-PMTSideC::connect: still on internal clock: no orbits from TTC system If this error appears only for one side it can be safely ignored, otherwise call the LUCID On-call Davide 02.09.2015
LUCID WARNING LUCID-SideA ers::Message ReadoutModuleLUCROD::Lucrod-PMTSideA::move2externalclock: still on internal clock If this error appears only once at the beginning of the run and only for one side it can be safely ignored, otherwise call the LUCID On-call Davide 19.09.2016
LUCID ERROR LUCID-SideA ers::Message ReadoutModuleLUCROD::Lucrod-PMTSideA::connect: TTCrq status not as expected. Err code =1 If this error appears only once at the beginning of the run and only for one side it can be safely ignored, otherwise call the LUCID On-call Davide 19.09.2016
LUCID ERROR [LUCID-SideA-LED] Mean threshold in Channel 2 of Lucrod Lucrod-PMTSideA=59.62: check for baseline shifts (Thres RMS is 0.485386) It's OK if error messages of this type appear at the end of the calibration in the "Messages" section of the Combined Calibration panel. No need to call the expert. Sasha 17.06.2016
LUCID PMT publish: Local LB changed during reading. Previous: 0; current: 1; skipping this data If there are just a few messages like this during the run --> no action needed, but all of them have to be reported in shift summary. If these messages keep appearing --> call tdaq on-call Sasha 01.07.2016
LUCID IS infomation 'XXX' does not exist, where XXX is DF.ROS.LUCID-EB-RCD-SIDEA-CALIB (or side C) or DF.ROS.LUCID-SideC-LED or RunParams.ConfigVersions It's OK if such messages are present in the log after calibration. No action is needed. Sasha 01.07.2016
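
The "move2externalclock" and "TTCrq status" rows above share one rule: the message can be ignored only if it appears once, at the beginning of the run, and for one side; otherwise the LUCID on-call is called. A minimal Python sketch of that check follows. The function name and the representation of an occurrence as a `(side, at_run_start)` pair are assumptions for illustration.

```python
from typing import List, Tuple

# One occurrence of the message: (side, seen_at_run_start), e.g. ("A", True).
Occurrence = Tuple[str, bool]


def lucid_startup_message_action(occurrences: List[Occurrence]) -> str:
    """Hypothetical helper applying the 'once, at run start, one side' rule."""
    if not occurrences:
        return "nothing to do"
    only_once = len(occurrences) == 1
    at_run_start = all(seen_at_start for _, seen_at_start in occurrences)
    one_side = len({side for side, _ in occurrences}) == 1
    if only_once and at_run_start and one_side:
        return "ignore"
    return "call LUCID on-call"


if __name__ == "__main__":
    print(lucid_startup_message_action([("A", True)]))
    print(lucid_startup_message_action([("A", True), ("C", True)]))
```

The first LUCID row ("connect: still on internal clock: no orbits from TTC system") is looser, since it only requires that the error appears for one side, so it is not covered by this helper.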

AFP

AFP FATAL AFP-RCD-RCE-P1 rc::TransitionFailed The transition "RESYNCH" has not been properly completed If this error appears, call the On-call AFP expert Ivan 06/05/2016
AFP WARNING ResInfoProvider ResourcesInfo::ConfigError Configuration problem: the detector with id 133 linked with 'ROS-FWD-AFP-00@ROS' [..] is out of ranges defined by resources info configuration This warning may appear after start of run or a TTCrestart of any system. No action is required. Ivan 06/05/2016
AFP WARNING ROS-FWD-AFP-00 ROS::ROSRobinNPExceptions : RobinNP::clearRequest: The RobinNP could not delete X events because they were not in its buffer. [...] A large number of these warnings may appear after a TTC restart. Check if the FATAL 'The transition "RESYNCH" has not been properly completed' occurred immediately after the TTC restart (see above). If not, check that after 6 minutes these warnings stop appearing. Ivan 06/05/2016

DCS warnings and errors

LAr

Detector Name of the alarm Actions to be taken by next shifters Author Date
LAr LAR TERMO LArTermoCan* E***_* NOT OPERATIONAL call HW oncall to inform Sergey (during day), document, very rare Richard 13.05.16
LAr LAR TERMO LArTermoCan* E***_* emergency no need to call, report full alarm text in shift summary Sergey 01.04.16
LAr LAR FECLV * FECCur crate_* LowVoltageSupply Current or Status WARNING no need to call, report full alarm text in shift summary Aaron/Manuela/Jose 12.08.15
LAr VNotAtVop WENT WARNING, in HV channel for ~10s only Check voltage trend/plot, voltage should have changed only for one reading ~10s, alarm should be in WENT. No need to call, just put in the summary. Alarm will be acknowledged by HW on-call. NOTE: If accompanied by a second alarm for TripAutoRecovery call RC. Jose 01.07.15
LAr Example: WENT LAR FCAL C HV FCAL C 01 0 S3 M192-C7 Not at Op Voltage Only for FCAL. These happen sometimes at ADJUST or during background spikes. IMPORTANT: The alarm should go to WENT almost immediately. If it does not, please call HW on call (70137) asap and write an elog. Otherwise, just put in Shift Summary. Claire 04.08.16
LAr LAr Cryogenic system If the Slimos or the shift leader tell you that there is a problem with ATLAS>EXT>CRYO ARGON, please note that this is NOT under your responsibility. The Slimos should inform the piquet S. Mazza 20.05.15
LUCID ATLLCD02.ELMBE/ELMBCanBus_1/ELMB_X/AI/PT_4W_Y_Z.value If the alert refers only to one isolated value, i.e. there is no trend, it can be acknowledged. The reason(s) for these hiccups are under investigation, most probably electronic noise. In case of doubt the DCS on-call must be called at any time: 16 1981 D. Caforio 17.04.2015

Tile

Tile Warning ATLTILLV01:ELMB/LVCAN3/LVPS_56/AI/15VMB_TEMP3 Known issue, see: eLog. No need to call expert, but report in a separate eLog if the warning re-appears. Arely Cortes-Gonzalez 12.05.2015
Tile Tile LBA DRAWER 61 is "HALF-ON" Known communication problem, it can cause Tile DCS in NOT_READY mode but it should disappear shortly afterwards. No need to call the expert, but report in shift summary. Silvia Fracchia 06.05.2016
Tile ATLTILLV02_LVCAN4 LVPS_60 emergency error code emergency message received Known issue, experts working on it. For the moment, no need to call the expert, but report in shift summary. Silvia Fracchia 11.07.2016

-- VincentPascuzzi - 2017-04-03

Topic revision: r1 - 2017-04-05 - VincentPascuzzi
 