%CERTIFY%
CarloTest
Introduction
This page describes the debugging procedures for FELIX systems. The first part of the twiki focuses on the hardware and is relevant for the on-call readout expert; the second is dedicated to the software and is relevant for the on-call DAQ/HLT experts.
Note on logging: whenever an intervention is made on a system at P1, a dedicated elog entry should be made describing what was done and why, plus the expected impact on the system.
Note on jargon: one physical FELIX card is seen by the computer as two PCI devices. In many tools the card is selected with option -c and the device with option -d. The enumeration starts at 0: card 0 comprises devices 0 and 1, card 1 comprises devices 2 and 3. Each device serves the e-links corresponding to one MTP connector (i.e. 12 GBT links per device on 24-channel cards). There is one TTC connector and one BUSY output per card.
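The card/device numbering described above can be written down as a small helper. This is only an illustrative sketch: the function names are hypothetical and are not part of any FELIX tool; the arithmetic simply restates "card 0 includes devices 0 and 1, card 1 corresponds to devices 2 and 3".

```python
from typing import Tuple


def devices_for_card(card: int) -> Tuple[int, int]:
    """Return the two PCI device numbers (-d) that belong to card number `card` (-c)."""
    if card < 0:
        raise ValueError("card number must be non-negative")
    return (2 * card, 2 * card + 1)


def card_for_device(device: int) -> int:
    """Return the card number (-c) that hosts PCI device number `device` (-d)."""
    if device < 0:
        raise ValueError("device number must be non-negative")
    return device // 2


print(devices_for_card(0))  # (0, 1)
print(devices_for_card(1))  # (2, 3)
print(card_for_device(3))   # 1
```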
Hardware
Diagnostic tools
Optical power on GBT links
The
TTC fibre diagnostic
BUSY LEMO diagnostic
Issues
Software
GBT link diagnostic
Each FELIX-PC is equipped with one or two FLX-712 PCIe cards. For the purpose of the S/W tools they are enumerated with 0 and 1. This is the [X] parameter in e.g. flx-info.
Each card has two connectors for MTP fibres. Depending on the number (4 or 8) of MiniPOD devices installed on the FLX-712, these fibres have (together) 24 or 48 channels.
Before drawing any conclusions about the status of these channels, run the "flx-init -c [X]" program to (re)initialize the links.
Then run "flx-info -c [X] POD". You will see something like this:
How to read the table below:
# = FLX link endpoint OK (no LOS)
- = FLX link endpoint not OK (LOS)
First letter: Current channel status
Second letter: Latched channel status
Example: #(-) means channel had lost the signal in the past but the signal is present now.
Latched / current link status of channel:
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|======|======|======|======|======|======|======|======|======|======|=======|=======|
1st TX | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) |
1st RX | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) |
2nd TX | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) |
2nd RX | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) |
Optical power (rx or tx) of channel in uW:
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|
1st TX | 845.10 | 872.70 | 873.30 | 888.70 | 898.90 | 852.70 | 923.10 | 852.40 | 978.20 | 862.30 | 911.90 | 773.30 |
1st RX | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
2nd TX | 911.40 | 964.50 | 899.80 | 1016.60 | 940.30 | 929.50 | 976.30 | 954.80 | 1065.40 | 974.00 | 965.20 | 902.70 |
2nd RX | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
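The status notation explained in the legend can be decoded mechanically. The sketch below is only an illustration of how to read a cell such as "#(-)"; the function names are hypothetical, and the row layout is assumed to match the sample output shown on this page rather than a documented format.

```python
from typing import Dict, List, Tuple


def parse_status_cell(cell: str) -> Dict[str, bool]:
    """Decode one cell such as '#(-)': current status, then latched status in brackets."""
    text = cell.strip()
    valid = (
        len(text) == 4
        and text[0] in "#-"
        and text[1] == "("
        and text[2] in "#-"
        and text[3] == ")"
    )
    if not valid:
        raise ValueError(f"unexpected status cell: {cell!r}")
    current_ok = text[0] == "#"   # '#' means no LOS right now
    latched_ok = text[2] == "#"   # '#' means no LOS was latched in the past
    return {
        "current_ok": current_ok,
        "latched_ok": latched_ok,
        # signal was lost at some point but is present now
        "recovered": current_ok and not latched_ok,
    }


def parse_status_row(line: str) -> Tuple[str, List[Dict[str, bool]]]:
    """Split a row like '1st RX | -(-) | ... |' into its label and decoded cells."""
    fields = [field.strip() for field in line.split("|")]
    label = fields[0]
    cells = [parse_status_cell(field) for field in fields[1:] if field]
    return label, cells


label, cells = parse_status_row("1st RX | -(-) | -(-) | #(-) | #(#) |")
print(label)                                   # 1st RX
print([c["current_ok"] for c in cells])        # [False, False, True, True]
print([c["recovered"] for c in cells])         # [False, False, True, False]
```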
How to interpret this and what to do:
| Observation | Conclusion | Action |
| RX power (second table) of a channel is "0" | No light coming into the FLX-712 | Swap two MTP cables and check whether the dark channel follows the cable or stays with the connector on the card. If the problem is on the FLX card, replace the FELIX PC. |
| RX power close to or below 100 uW | Too little light for safe data transmission | Swap two MTP cables and check whether the weak channel follows the cable or stays with the connector on the card. |
| TX power significantly below 800 uW | Laser (MiniPOD) ageing | Replace the PC at the next occasion without beam. |
| Link status (first table) not #(#) | Link down (Note: some links may be down because they are not used. The shift leader in the control room should know which links have to be up.) | Run "flx-init -c [X]" once more. If the problem persists, exchange the PC. |
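The power thresholds in the table above can be expressed as a small classification helper. This is a sketch under stated assumptions, not an official tool: the function names are hypothetical, and because the page does not quantify "close to" or "significantly below", the `low_uw` and `margin_uw` defaults are placeholders to be tuned by the expert.

```python
def classify_rx(power_uw: float, low_uw: float = 100.0) -> str:
    """Classify an RX optical power reading in uW.

    low_uw defaults to the 100 uW quoted on this page; raise it to also
    catch readings that are merely "close to" that value.
    """
    if power_uw <= 0.0:
        return "no light"
    if power_uw <= low_uw:
        return "too little light"
    return "ok"


def classify_tx(power_uw: float, nominal_uw: float = 800.0, margin_uw: float = 50.0) -> str:
    """Classify a TX optical power reading in uW.

    margin_uw is a hypothetical stand-in for "significantly below 800 uW";
    the page does not give a number.
    """
    if power_uw < nominal_uw - margin_uw:
        return "possible laser ageing"
    return "ok"


# Values taken from the sample output on this page
print(classify_rx(0.00))     # no light
print(classify_tx(773.30))   # ok
print(classify_tx(1065.40))  # ok
```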
TTC fibre diagnostic
- Execute "flx-info -c [X] ADN2814" (where X is the card number, i.e. "0" or "1").
- If you see the message "The TTC optical connection is up and working", no action is needed.
- If you see the message "No light arriving. Check the fibre connection to the FLX-712", swap the TTC fibre with a fibre from a neighbouring FLX-712 and check whether the problem follows the fibre or stays with the card.
- If you see the message "Light is arriving but the FLX-712 may have an internal problem":
  - Swap the TTC fibre with a fibre from a neighbouring FLX-712.
  - Log on to the neighbouring FELIX (the one you took the fibre from) and run "flx-info -c [X] ADN2814".
  - If that FELIX reports "The TTC optical connection is up and working", you can conclude that the FLX-712 in the first FELIX has a problem. Replace the entire FELIX PC.
  - If you get the same error on the other FELIX as well, the problem is upstream in the TTC system. You cannot fix this.
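The decision procedure above can be summarised as a small function. This is only a sketch: the function name is hypothetical, and it assumes the tool output contains the messages exactly as quoted on this page, which has not been checked against the flx-info source.

```python
from typing import Optional

MSG_OK = "The TTC optical connection is up and working"
MSG_NO_LIGHT = "No light arriving"
MSG_INTERNAL = "Light is arriving but the FLX-712 may have an internal problem"


def ttc_next_step(local_output: str, neighbour_output: Optional[str] = None) -> str:
    """Suggest the next step from the ADN2814 message seen on the FELIX under test.

    neighbour_output is the message seen on the neighbouring FELIX after the
    TTC fibres have been swapped (None if that check has not been done yet).
    """
    if MSG_OK in local_output:
        return "No action needed."
    if MSG_NO_LIGHT in local_output:
        return "Swap the TTC fibre with one from a neighbouring FLX-712 and compare."
    if MSG_INTERNAL in local_output:
        if neighbour_output is None:
            return "Swap the TTC fibre with a neighbour and run the check on the neighbouring FELIX."
        if MSG_OK in neighbour_output:
            return "FLX-712 in the first FELIX is faulty: replace the entire FELIX PC."
        if MSG_INTERNAL in neighbour_output:
            return "Problem is upstream in the TTC system."
        return "Inconclusive: neighbour shows a different message."
    return "Unrecognised message."


print(ttc_next_step(MSG_OK))
print(ttc_next_step(MSG_INTERNAL, neighbour_output=MSG_OK))
```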
Busy LEMO diagnostic
- Execute "fttcbusy -m 1 -i 0"
- Connect the voltmeter with the LEMO adapter to the Busy output and check that you have a logical 0. This means 0 V.
- Execute "fttcbusy -m 0 -i 1"
- Use the voltmeter to check that you have a logical 1. This means 3.3 V.
If you do not get the expected logic levels, something is wrong with the FLX-712. Replace the entire PC with a spare.
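The two-step check above pairs each command with an expected voltage. The sketch below records voltmeter readings against those expectations; the function names and the 0.3 V tolerance are assumptions made for illustration, since the page only gives the nominal levels of 0 V and 3.3 V.

```python
from typing import List, Sequence, Tuple

# (command to run, expected voltage on the Busy LEMO output), as listed above
BUSY_STEPS: List[Tuple[str, float]] = [
    ("fttcbusy -m 1 -i 0", 0.0),
    ("fttcbusy -m 0 -i 1", 3.3),
]

TOLERANCE_V = 0.3  # hypothetical tolerance; not specified on this page


def busy_level_ok(measured_v: float, expected_v: float, tolerance_v: float = TOLERANCE_V) -> bool:
    """True if the voltmeter reading is within tolerance of the expected level."""
    return abs(measured_v - expected_v) <= tolerance_v


def busy_output_ok(readings: Sequence[float]) -> bool:
    """Check one reading per step, in the order of BUSY_STEPS."""
    if len(readings) != len(BUSY_STEPS):
        raise ValueError("one reading per step is required")
    return all(
        busy_level_ok(measured, expected)
        for measured, (_command, expected) in zip(readings, BUSY_STEPS)
    )


print(busy_output_ok([0.02, 3.28]))  # True
print(busy_output_ok([0.02, 1.20]))  # False
```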
List of FELIX nodes
The table below reports the list of all the FELIX nodes in USA15.
The acronym HCA stands for Host Channel Adapter and indicates the type of Mellanox network card installed.
| hostname | location | installed | firmware | HCA | notes |
| NSW others | | | | | |
| pc-tdq-flx-nsw-spare-00 | 2-5-1 U37 | YES | GBT | x1 25 GbE | |
| pc-tdq-flx-nsw-tp-a-00 | 2-5-1 U34 | YES | GBT | x1 25 GbE | (was pc-tdq-flx-nsw-stgc-tp-00) |
| pc-tdq-flx-nsw-tp-c-00 | 2-5-1 U32 | YES | GBT | x1 25 GbE | (was pc-tdq-flx-nsw-mm-tp-00) |
| pc-tdq-flx-nsw-spare-01 | 3-5-1 U42 | YES | GBT | x1 25 GbE | |
| NSW mm | | | | | |
| pc-tdq-flx-nsw-mm-06 to 11 | 3-5-1 U29/40 | NO | GBT | x1 25 GbE | |
| pc-tdq-flx-nsw-mm-00 to 05 | 2-5-1 U19/30 | YES | GBT | x1 25 GbE | |
| NSW stgc | | | | | |
| pc-tdq-flx-nsw-stgc-08 to 15 | 4-5-1 U24/39 | NO | GBT | x1 25 GbE | |
| pc-tdq-flx-nsw-stgc-00 to 07 | 4-5-1 U06/21 | YES | GBT | x1 25 GbE | |
| BIS 7/8 | | | | | |
| pc-tdq-flx-rpc-bis-00 | 5-5-1 U17 | YES | GBT | x1 25 GbE | |
| LAr LDPB | | | | | |
| pc-tdq-flx-lar-ldpb-07 to 13 | 5-16-2 U26/39 | YES | FULL | x1 100 GbE | |
| pc-tdq-flx-lar-ldpb-00 to 06 | 4-16-2 U26/39 | YES | FULL | x1 100 GbE | |
| L1Calo | | | | | |
| pc-tdq-flx-l1c-trex-01 | 7-11-2 U24 | YES | FULL | x1 100 GbE | |
| pc-tdq-flx-l1c-trex-00 | 7-11-2 U22 | YES | FULL | x1 100 GbE | |
| pc-tdq-flx-l1c-gfex-00 | 7-11-2 U20 | YES | FULL | x1 100 GbE | |
| pc-tdq-flx-l1c-jfex-00 | 7-11-2 U18 | YES | FULL | x1 100 GbE | |
| pc-tdq-flx-l1c-efex-00 | 7-11-2 U16 | YES | FULL | x1 100 GbE | |
| Tile | | | | | |
| pc-tdq-flx-til-00 | 5-5-1 U25 | YES | FULL | x1 100 GbE | |
| Spares | | | | | |
| pc-tdq-flx-spare-00 to 01 | 5-5-1 U22/24 | NO | - | - | |
List of SW ROD nodes
| hostname | location | HCA | notes |
| pc-tdq-swrod-rpc-bis-00 | 5-5-1 U37 | x1 25 GbE, x1 40 GbE | |
| pc-tdq-swrod-til-00 | 5-5-1 U36 | x1 100 GbE, x1 40 GbE | |
| pc-tdq-swrod-nsw-00 to 08 | 5-5-1 U27/35 | x1 25 GbE, x1 40 GbE | |
| pc-tdq-swrod-lar-06 to 13 | 7-16-2 U23/30 | x1 100 GbE, x1 40 GbE | |
| pc-tdq-swrod-lar-00 to 05 | 6-16-2 U27/32 | x1 100 GbE, x1 40 GbE | |
| pc-tdq-swrod-l1c-00 to 05 | 7-11-2 U32/38 | x1 100 GbE, x1 40 GbE | |
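Several entries in the node lists abbreviate a group of hosts as "name-00 to 05". The sketch below expands such an entry into individual hostnames, for example to loop over them in a script. The function name is hypothetical, and it assumes that the numbering inside a range is contiguous and zero-padded to the width of the first number, which the tables suggest but do not state.

```python
import re
from typing import List

_ENTRY = re.compile(r"(?P<stem>.*?-)(?P<first>\d+)(?:\s+to\s+(?P<last>\d+))?")


def expand_hosts(entry: str) -> List[str]:
    """Expand 'pc-...-06 to 11' into the individual hostnames it stands for."""
    match = _ENTRY.fullmatch(entry.strip())
    if match is None:
        raise ValueError(f"cannot parse hostname entry: {entry!r}")
    stem = match.group("stem")
    width = len(match.group("first"))          # keep the zero padding, e.g. 2 digits
    first = int(match.group("first"))
    last = int(match.group("last")) if match.group("last") else first
    if last < first:
        raise ValueError(f"descending range in entry: {entry!r}")
    return [f"{stem}{number:0{width}d}" for number in range(first, last + 1)]


for host in expand_hosts("pc-tdq-flx-nsw-mm-06 to 11"):
    print(host)
```

A single hostname such as "pc-tdq-flx-til-00" is returned unchanged as a one-element list.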
Major updates:
--
MarkusJoos - 2020-05-25
%RESPONSIBLE%
CarloAlbertoGottardo
%REVIEW%
Never reviewed