CarloTest
Introduction
This page describes the debugging procedures for FELIX systems. The first part focuses on the hardware and is relevant for the on-call readout expert; the second part is dedicated to the software and is relevant for the on-call DAQ/HLT experts.
Note on logging: whenever an intervention is made on a system at P1, a dedicated elog entry should be made describing what was done and why, plus the expected impact on the system.
Note on jargon, cards and devices: one physical FELIX card is seen as two PCI devices by the computer. In many tools the card is selected with option -c and the device with option -d. The enumeration starts at 0: card 0 includes devices 0 and 1, card 1 corresponds to devices 2 and 3. Each device serves the e-links corresponding to one MTP connector (i.e. 12 GBT links per device on 24-channel cards). The TTC connector and the BUSY output exist once per card.
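The card/device numbering above is plain arithmetic. A minimal Python sketch, assuming only the two-devices-per-card rule stated here; the function names are hypothetical helpers, not part of the FELIX software:

```python
from __future__ import annotations


def devices_of_card(card: int) -> tuple[int, int]:
    """PCI device numbers (option -d) belonging to FELIX card `card` (option -c)."""
    if card < 0:
        raise ValueError("card numbers start at 0")
    # Each physical card appears as two consecutive PCI devices.
    return 2 * card, 2 * card + 1


def card_of_device(device: int) -> int:
    """FELIX card number (option -c) hosting PCI device `device` (option -d)."""
    if device < 0:
        raise ValueError("device numbers start at 0")
    return device // 2
```

For example, `devices_of_card(1)` gives `(2, 3)`, and `card_of_device(3)` gives `1`, matching the enumeration described above.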
Hardware
Diagnostic tools
Card identification, physical location
Some FELIX PCs are equipped with two cards. Card #0 is installed in the bottom slot of the FELIX PC, card #1 in the top slot.
The location of the FELIX PCs is reported in the table at the end of the page.
Optical power on GBT links
Each FELIX card is equipped with four optical transceivers called MiniPODs. Two MiniPODs emit light (TX), two receive light (RX). The emitted and received optical power can be visualised with the command
flx-info -c <card number>
An example output is shown below; the optical power is reported in the second table.
How to read the link status (first) table:
# = FLX link endpoint OK (no LOS)
- = FLX link endpoint not OK (LOS)
First letter: Current channel status
Second letter: Latched channel status
Example: #(-) means that the channel lost the signal in the past but the signal is present now.
Latched / current link status of channel:
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|======|======|======|======|======|======|======|======|======|======|=======|=======|
1st TX | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) |
1st RX | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) |
2nd TX | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) |
2nd RX | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) |
Optical power (rx or tx) of channel in uW:
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|
1st TX | 845.10 | 872.70 | 873.30 | 888.70 | 898.90 | 852.70 | 923.10 | 852.40 | 978.20 | 862.30 | 911.90 | 773.30 |
1st RX | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
2nd TX | 911.40 | 964.50 | 899.80 | 1016.60 | 940.30 | 929.50 | 976.30 | 954.80 | 1065.40 | 974.00 | 965.20 | 902.70 |
2nd RX | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
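To scan many channels, the two tables can be parsed mechanically. The sketch below is a hypothetical Python helper, assuming the row layout of the example output (a label such as "1st RX", then one `|`-separated cell per channel). Only the `flx-info -c <card number>` invocation comes from this page; the rest of the flx-info output format is not assumed, so rows are passed in one at a time.

```python
from __future__ import annotations

import re
import subprocess
from typing import NamedTuple

# A status cell looks like "#(-)": current state first, latched state in brackets.
_STATUS_CELL = re.compile(r"^([#-])\(([#-])\)$")


class LinkStatus(NamedTuple):
    current_ok: bool  # '#' outside the brackets: no LOS now
    latched_ok: bool  # '#' inside the brackets: no LOS recorded in the past


def _split_row(line: str) -> tuple[str, list[str]]:
    """Split 'label | a | b |' into the label and the non-empty cells."""
    label, *cells = [field.strip() for field in line.split("|")]
    return label, [c for c in cells if c]


def parse_status_row(line: str) -> tuple[str, list[LinkStatus]]:
    """Parse one row of the 'Latched / current link status' table."""
    label, cells = _split_row(line)
    statuses = []
    for cell in cells:
        m = _STATUS_CELL.match(cell)
        if m is None:
            raise ValueError(f"unexpected status cell {cell!r}")
        statuses.append(LinkStatus(m.group(1) == "#", m.group(2) == "#"))
    return label, statuses


def parse_power_row(line: str) -> tuple[str, list[float]]:
    """Parse one row of the optical power table; values are in uW."""
    label, cells = _split_row(line)
    return label, [float(c) for c in cells]


def dark_channels(powers: list[float]) -> list[int]:
    """Channels reporting 0.00 uW, i.e. no light at all."""
    return [ch for ch, p in enumerate(powers) if p == 0.0]


def run_flx_info(card: int) -> str:
    """Run 'flx-info -c <card number>' and return its text output."""
    result = subprocess.run(["flx-info", "-c", str(card)],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Note that labels such as "1st RX" occur in both tables, so the caller has to pick the row from the right section. A channel listed by `dark_channels` for an RX row corresponds to the first row (RX power is 0) of the Observation/Conclusion/Action table.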
Observation | Conclusion | Action |
RX power of a channel is 0 | No light coming from the corresponding fibre. | Cross-check with the sub-detector expert that light is expected. If yes, swap two MTP cables and check whether the dark channel follows the cable or the connector on the card. If the problem is on the FLX card, replace the FELIX PC. |
RX power close to or below 100 uW | Too little light for safe data transmission. | Swap two MTP cables and check whether the weak channel follows the cable or the connector on the card. |
TX power significantly below 800 uW | Laser (MiniPOD) ageing. | Replace the PC at the next occasion without beam. |
RX light received but link status (first table) not #(#) | Link not aligned. | Run "flx-init -c <card number>" once more. Cross-check with the detector expert that the front-end is in a good state. If the problem persists, replace the PC. |
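The decision table above can be encoded as a small function for scripted checks. This is a sketch only: the 100 uW and 800 uW values come from the table, while the margins standing in for "close to" and "significantly below" are hypothetical and must be agreed with the readout experts before use.

```python
from __future__ import annotations

# Reference values quoted in the table.
RX_LOW_UW = 100.0      # "RX power close to or below 100 uW"
TX_NOMINAL_UW = 800.0  # "TX power significantly below 800 uW"
# ASSUMPTION: the table does not quantify "close to" or "significantly";
# these margins are hypothetical placeholders.
RX_CLOSE_MARGIN_UW = 20.0
TX_SIGNIFICANT_MARGIN_UW = 100.0


def diagnose_rx(rx_uw: float, status_cell: str) -> list[str]:
    """Map one RX channel's power and its status cell (e.g. '#(-)') to table actions."""
    if rx_uw <= 0.0:
        return ["No light from fibre: cross-check with sub-detector expert, then swap MTP cables"]
    findings = []
    if rx_uw <= RX_LOW_UW + RX_CLOSE_MARGIN_UW:
        findings.append("Too little light: swap MTP cables to locate the weak channel")
    if status_cell.strip() != "#(#)":
        findings.append("Link not aligned: run flx-init again, check the front-end with the detector expert")
    return findings


def diagnose_tx(tx_uw: float) -> list[str]:
    """Flag a TX channel whose power points to MiniPOD laser ageing."""
    if tx_uw < TX_NOMINAL_UW - TX_SIGNIFICANT_MARGIN_UW:
        return ["MiniPOD laser ageing: replace the PC at the next occasion without beam"]
    return []
```

With these placeholder margins, the lowest TX value in the example output (773.30 uW) is not flagged; a stricter margin would flag it, which is why the margins have to be set deliberately.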
TTC fibre diagnostic
The status of the TTC optical connection can be checked with
flx-info -c <card number>
The output contains a section called "TTC (ADN2814) status" reporting one of the messages listed below.
Message | Action |
The TTC optical connection is up and working | Nothing to do. |
No light arriving. Check the fibre connection to the FLX-712 | Swap the TTC fibre with a fibre from a neighbouring FLX-712 reporting no issues. If no light is detected, replace the FELIX PC. |
Light is arriving but the FLX-712 may have an internal problem | 1. Swap the TTC fibre with a fibre from a neighbouring FLX-712. 2. Log on to the neighbouring FELIX (from where you have taken the fibre) and run "flx-info -c <card number> ADN2814". 3. If on that FELIX you get "The TTC optical connection is up and working", you can conclude that the FLX-712 in the first FELIX has a problem: replace the entire FELIX PC. 4. If you get the same error on the other FELIX as well, the problem is upstream in the TTC system. You cannot fix this. |
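For scripted checks, the message-to-action table can be expressed as a lookup. A minimal sketch, assuming the messages appear verbatim in the flx-info text; only the message strings are taken from this page, and the function name is hypothetical.

```python
from __future__ import annotations

from typing import Optional

# Messages of the "TTC (ADN2814) status" section, as listed in the table above.
TTC_ACTIONS = {
    "The TTC optical connection is up and working": "Nothing to do.",
    "No light arriving": (
        "Swap the TTC fibre with one from a healthy neighbouring FLX-712; "
        "if there is still no light, replace the FELIX PC."
    ),
    "Light is arriving but the FLX-712 may have an internal problem": (
        "Swap fibres with a neighbour and run flx-info there to decide between "
        "the FLX-712 and the upstream TTC system."
    ),
}


def ttc_action(flx_info_output: str) -> Optional[str]:
    """Return the action for the first known TTC message found in the output, else None."""
    for message, action in TTC_ACTIONS.items():
        if message in flx_info_output:
            return action
    return None
```

A `None` result means none of the documented messages was found, which itself deserves a closer look at the raw output.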
BUSY LEMO diagnostic
The BUSY state can be switched manually. To verify that the BUSY output works correctly:
1. assert BUSY with
fttcbusy -d <device number> -m 1 -i 1
2. measure the output of the LEMO connector with a voltmeter. The output must be 0 V (BUSY on = logical zero).
3. de-assert BUSY with
fttcbusy -d <device number> -m 1 -i 0
4. use the voltmeter to check that you have a logical 1 (between 3.3 and 5 V)
If you do not get the expected logic levels something is wrong with the FLX-712. Replace the entire PC by a spare.
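The pass/fail criterion of this test can be written down explicitly. In this sketch the 3.3 to 5 V window comes from the procedure above, while the 0.4 V tolerance accepted as "0 V" is an assumption, not a FELIX specification; the function name is hypothetical.

```python
HIGH_MIN_V = 3.3  # logical 1 window given in step 4
HIGH_MAX_V = 5.0
LOW_MAX_V = 0.4   # ASSUMPTION: tolerance accepted as "0 V" in step 2


def busy_lemo_ok(v_asserted: float, v_deasserted: float) -> bool:
    """True if the BUSY LEMO is active-low: ~0 V with BUSY asserted, 3.3-5 V de-asserted."""
    low_ok = abs(v_asserted) <= LOW_MAX_V
    high_ok = HIGH_MIN_V <= v_deasserted <= HIGH_MAX_V
    return low_ok and high_ok
```

An inverted result (high voltage with BUSY asserted, low with BUSY de-asserted) also fails, as does a de-asserted level below 3.3 V.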
Issues and solutions
flx-init reports "Lock not found" (failed recovery of TTC clock)
The output of flx-init contains the following lines:
Card type: FLX-712
Configuring Si5345...
Si5345 hard reset
Si5345 configuration done
Enabling Si5345 output
Si5345: LOS register = 0x20
Si5345: Sticky LOS register = 0xf0
Si5345: LOL register = 0x02
Si5345: LOL register = 0x02
Si5345: LOL register = 0x02
Si5345: LOL register = 0x02
Si5345: LOL register = 0x02
Si5345 ERROR: Lock not found in 5 secs
Si5345: Sticky LOL register = 0x06
Follow the instructions in section "TTC fibre diagnostic".
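When many hosts are initialised at once, the flx-init log can be screened automatically. A minimal sketch that only extracts what is visible in the example above (the register values and the "Lock not found" message); it does not decode the meaning of individual register bits, and the function name is hypothetical.

```python
from __future__ import annotations

import re

# Matches lines such as "Si5345: Sticky LOL register = 0x06".
_REGISTER_LINE = re.compile(
    r"Si5345: (Sticky LOS|LOS|Sticky LOL|LOL) register = (0x[0-9a-fA-F]+)"
)


def parse_si5345(flx_init_output: str) -> dict:
    """Collect the Si5345 register readings and whether the lock was found."""
    registers = {}
    for name, value in _REGISTER_LINE.findall(flx_init_output):
        # Later lines overwrite earlier ones, so the last reading is kept.
        registers[name] = int(value, 16)
    return {
        "lock_not_found": "Lock not found" in flx_init_output,
        "registers": registers,
    }
```

If `lock_not_found` is true, the next step is the TTC fibre diagnostic described above.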
FELIX card not detected
If a FELIX card disappears from the system, query the driver with
cat /proc/flx
and look for errors such as
Error message | Action |
ERROR: 1 card(s) were ignored because of a problem with the power status | Reboot the PC. If the problem persists, note down the output of "lspci | grep CERN" and replace the PC. |
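A sketch for screening the driver report, assuming only that errors appear on lines containing "ERROR" as in the example above; the rest of the /proc/flx format is not assumed and the function names are hypothetical.

```python
from __future__ import annotations

POWER_STATUS_ERROR = "were ignored because of a problem with the power status"


def driver_errors(proc_flx_text: str) -> list[str]:
    """Return the lines of the /proc/flx report that contain 'ERROR'."""
    return [line.strip() for line in proc_flx_text.splitlines() if "ERROR" in line]


def needs_reboot(proc_flx_text: str) -> bool:
    """True if the driver reports cards ignored because of their power status."""
    return any(POWER_STATUS_ERROR in line for line in driver_errors(proc_flx_text))
```

On a FELIX host the text would be read with `with open("/proc/flx") as f: text = f.read()` and passed to these functions.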
Monitoring via IS, ERS
Software
Diagnostics for the commissioning phase
List of FELIX nodes
The table below lists all the FELIX nodes in USA15.
The acronym HCA stands for Host Channel Adapter and indicates the type of Mellanox network card installed.
hostname | location | installed | firmware | HCA | notes |
NSW others | | | | | |
pc-tdq-flx-nsw-spare-00 | 2-5-1 U37 | YES | GBT | x1 25 GbE | |
pc-tdq-flx-nsw-tp-a-00 | 2-5-1 U34 | YES | GBT | x1 25 GbE | (was pc-tdq-flx-nsw-stgc-tp-00) |
pc-tdq-flx-nsw-tp-c-00 | 2-5-1 U32 | YES | GBT | x1 25 GbE | (was pc-tdq-flx-nsw-mm-tp-00) |
pc-tdq-flx-nsw-spare-01 | 3-5-1 U42 | YES | GBT | x1 25 GbE | |
NSW mm | | | | | |
pc-tdq-flx-nsw-mm-06 to 11 | 3-5-1 U29/40 | NO | GBT | x1 25 GbE | |
pc-tdq-flx-nsw-mm-00 to 05 | 2-5-1 U19/30 | YES | GBT | x1 25 GbE | |
NSW stgc | | | | | |
pc-tdq-flx-nsw-stgc-08 to 15 | 4-5-1 U24/39 | NO | GBT | x1 25 GbE | |
pc-tdq-flx-nsw-stgc-00 to 07 | 4-5-1 U06/21 | YES | GBT | x1 25 GbE | |
BIS 7/8 | | | | | |
pc-tdq-flx-rpc-bis-00 | 5-5-1 U17 | YES | GBT | x1 25 GbE | |
LAr LDPB | | | | | |
pc-tdq-flx-lar-ldpb-07 to 13 | 5-16-2 U26/39 | YES | FULL | x1 100 GbE | |
pc-tdq-flx-lar-ldpb-00 to 06 | 4-16-2 U26/39 | YES | FULL | x1 100 GbE | |
L1Calo | | | | | |
pc-tdq-flx-l1c-trex-01 | 7-11-2 U24 | YES | FULL | x1 100 GbE | |
pc-tdq-flx-l1c-trex-00 | 7-11-2 U22 | YES | FULL | x1 100 GbE | |
pc-tdq-flx-l1c-gfex-00 | 7-11-2 U20 | YES | FULL | x1 100 GbE | |
pc-tdq-flx-l1c-jfex-00 | 7-11-2 U18 | YES | FULL | x1 100 GbE | |
pc-tdq-flx-l1c-efex-00 | 7-11-2 U16 | YES | FULL | x1 100 GbE | |
Tile | | | | | |
pc-tdq-flx-til-00 | 5-5-1 U25 | YES | FULL | x1 100 GbE | |
Spares | | | | | |
pc-tdq-flx-spare-00 to 01 | 5-5-1 U22/24 | NO | - | - | |
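Entries such as "pc-tdq-flx-nsw-mm-06 to 11" abbreviate ranges of hosts. A hypothetical helper that expands them into individual hostnames, assuming the zero-padding width is given by the first number of the range:

```python
from __future__ import annotations

import re

# "prefix-NN to MM": the digits after the last '-' start the range.
_RANGE = re.compile(r"^(?P<prefix>.*-)(?P<first>\d+) to (?P<last>\d+)$")


def expand_hosts(entry: str) -> list[str]:
    """Expand 'name-06 to 11' into ['name-06', ..., 'name-11']; single hosts pass through."""
    entry = entry.strip()
    m = _RANGE.match(entry)
    if m is None:
        return [entry]  # single host, e.g. "pc-tdq-flx-til-00"
    width = len(m["first"])
    first, last = int(m["first"]), int(m["last"])
    return [f"{m['prefix']}{n:0{width}d}" for n in range(first, last + 1)]
```

The same helper applies to the SW ROD list below, e.g. "pc-tdq-swrod-lar-00 to 05" expands to six hostnames.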
List of SW ROD nodes
hostname | location | HCA | notes |
pc-tdq-swrod-rpc-bis-00 | 5-5-1 U37 | x1 25 GbE, x1 40 GbE | |
pc-tdq-swrod-til-00 | 5-5-1 U36 | x1 100 GbE, x1 40 GbE | |
pc-tdq-swrod-nsw-00 to 08 | 5-5-1 U27/35 | x1 25 GbE, x1 40 GbE | |
pc-tdq-swrod-lar-06 to 13 | 7-16-2 U23/30 | x1 100 GbE, x1 40 GbE | |
pc-tdq-swrod-lar-00 to 05 | 6-16-2 U27/32 | x1 100 GbE, x1 40 GbE | |
pc-tdq-swrod-l1c-00 to 05 | 7-11-2 U32/38 | x1 100 GbE, x1 40 GbE | |
Major updates:
-- MarkusJoos - 2020-05-25
Responsible: CarloAlbertoGottardo
Last reviewed by: Never reviewed