%CERTIFY%

CarloTest

Introduction

This page describes the debugging procedures for FELIX systems. The first part focuses on the hardware and is relevant for the on-call readout expert; the second part is dedicated to the software and is relevant for on-call DAQ/HLT experts.

ALERT! Note on logging: whenever an intervention is made on a system at P1, a dedicated elog entry should be made describing what was done, why it was done and the expected impact on the system.

ALERT! Note on jargon, cards and devices: one physical FELIX card is seen by the computer as two PCI devices. In many tools the card is selected with option -c and the device with option -d. The enumeration starts at 0: card 0 comprises devices 0 and 1, card 1 comprises devices 2 and 3. Each device serves the e-links of one MTP connector (i.e. 12 GBT links per device on 24-channel cards). There is one TTC connector and one BUSY per card.
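The enumeration rule can be written down as a small helper. This is only an illustrative sketch: the function names are hypothetical and not part of any FELIX tool; the only input taken from this page is that card N owns devices 2N and 2N+1.

```python
def devices_for_card(card: int) -> tuple[int, int]:
    """Return the two PCI device numbers (-d) that belong to one card (-c)."""
    if card < 0:
        raise ValueError("card number must be non-negative")
    return (2 * card, 2 * card + 1)


def card_for_device(device: int) -> int:
    """Return the card number (-c) that a PCI device number (-d) belongs to."""
    if device < 0:
        raise ValueError("device number must be non-negative")
    return device // 2


print(devices_for_card(0))  # (0, 1)
print(devices_for_card(1))  # (2, 3)
print(card_for_device(3))   # 1
```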

Hardware

Diagnostic tools

Card identification, physical location

Some FELIX PCs are equipped with two cards. Card #0 is installed in the bottom slot of the FELIX PC, card #1 in the top slot.

The location of the FELIX PCs is reported in the Table at the end of the page.

Optical power on GBT links

Each FELIX card is equipped with four optical transceivers called MiniPODs. Two MiniPODs emit light (TX) and two receive light (RX). The emitted and received optical power can be visualised with the command

flx-info -c <card number>

The output is pasted below. The power is reported in the second table.


How to the read the table below:
# = FLX link endpoint OK     (no LOS)
- = FLX link endpoint not OK (LOS)
First letter:  Current channel status
Second letter: Latched channel status
Example: #(-) means channel had lost the signal in the past but the signal is present now.
 
Latched / current link status of channel:
        |   0  |   1  |   2  |   3  |   4  |   5  |   6  |   7  |   8  |   9  |   10  |   11  |
        |======|======|======|======|======|======|======|======|======|======|=======|=======|
1st TX | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) |  #(#) |  #(#) |
1st RX | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) |  -(-) |  -(-) |
2nd TX | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) |  #(#) |  #(#) |
2nd RX | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) |  -(-) |  -(-) |
 
 
Optical power (rx or tx) of channel in uW:
        |       0 |       1 |       2 |       3 |       4 |       5 |       6 |       7 |       8 |       9 |      10 |      11 |
        |=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|
1st TX |  845.10 |  872.70 |  873.30 |  888.70 |  898.90 |  852.70 |  923.10 |  852.40 |  978.20 |  862.30 |  911.90 |  773.30 |
1st RX |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |
2nd TX |  911.40 |  964.50 |  899.80 | 1016.60 |  940.30 |  929.50 |  976.30 |  954.80 | 1065.40 |  974.00 |  965.20 |  902.70 |
2nd RX |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |


| Observation | Conclusion | Action |
| RX power of a channel is 0 | No light coming from the corresponding fibre. | Cross-check with the sub-detector expert that light is expected. If yes, swap two MTP cables and check if the dark channel correlates with the cable or the connector on the card. If the problem is on the FLX card, replace the FELIX PC. |
| RX power close to or below 100 uW | Too little light for safe data transmission. | Swap two MTP cables and check if the dark channel correlates with the cable or the connector on the card. |
| TX power significantly below 800 uW | Laser (MiniPOD) ageing. | Replace the PC at the next occasion without beam. |
| RX light received but link status (first table) not #(#) | Link not aligned. | Run "flx-init -c <card number>" once more. Cross-check with the detector expert that the front-end is in a good state. If the problem persists, exchange the PC. |
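The power checks in the table above can be sketched as a small script that reads the "Optical power" rows in the layout pasted earlier. This is a hedged illustration, not an official tool: the function names are hypothetical, the row layout is assumed to match the sample output on this page, and "significantly below 800 uW" and "close to or below 100 uW" are simplified here to plain cut-offs.

```python
RX_MIN_UW = 100.0  # from the table above; treated as a hard cut-off (assumption)
TX_MIN_UW = 800.0  # from the table above; "significantly below" simplified (assumption)

# Two rows copied from the sample flx-info output on this page.
SAMPLE = """\
1st TX |  845.10 |  872.70 |  873.30 |  888.70 |  898.90 |  852.70 |  923.10 |  852.40 |  978.20 |  862.30 |  911.90 |  773.30 |
1st RX |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |
"""


def parse_power_rows(text):
    """Map row labels such as '1st TX' to a list of per-channel powers in uW."""
    rows = {}
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        label = cells[0]
        if not label.endswith(("TX", "RX")) or len(cells) < 3:
            continue  # header, separator or unrelated line
        try:
            rows[label] = [float(cell) for cell in cells[1:] if cell]
        except ValueError:
            continue  # same label but non-numeric cells: link status table
    return rows


def classify(rows):
    """Return (row label, channel, finding) for every channel that needs attention."""
    findings = []
    for label, powers in rows.items():
        for channel, power in enumerate(powers):
            if label.endswith("RX") and power == 0.0:
                findings.append((label, channel, "no light"))
            elif label.endswith("RX") and power <= RX_MIN_UW:
                findings.append((label, channel, "low RX power"))
            elif label.endswith("TX") and power < TX_MIN_UW:
                findings.append((label, channel, "low TX power"))
    return findings


for finding in classify(parse_power_rows(SAMPLE)):
    print(finding)
```

A flagged channel is only a pointer to the corresponding row of the table above; the action (cable swap, cross-check with the sub-detector expert, PC replacement) still has to be decided by the expert.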

TTC fibre diagnostic

The status of the TTC optical connection can be checked with

flx-info -c <card number>

The output contains a section called "TTC (ADN2814) status" reporting one of the messages listed below

| Message | Action |
| The TTC optical connection is up and working | Nothing to do. |
| No light arriving. Check the fibre connection to the FLX-712 | Swap the TTC fibre with a fibre from a neighbouring FLX-712 reporting no issues. If still no light is detected, replace the FELIX PC. |
| Light is arriving but the FLX-712 may have an internal problem | Follow steps 1 to 4 below. |

1. Swap the TTC fibre with a fibre from a neighbouring FLX-712

2. Log on to the neighbouring FELIX (from which you have taken the fibre) and run "flx-info -c [X] ADN2814"

3. If on that FELIX you get "The TTC optical connection is up and working" you can conclude that the FLX-712 in the first FELIX has a problem. Replace the entire FELIX PC.

4. If you get the same error in the other FELIX as well, the problem is upstream in the TTC system. You cannot fix this.
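The conclusion drawn in steps 3 and 4 can be summarised as a small decision function. This is a sketch only: the function name and the returned strings are hypothetical, and the message texts are the ones quoted on this page.

```python
OK_MESSAGE = "The TTC optical connection is up and working"


def conclude_after_fibre_swap(original_error: str, neighbour_message: str) -> str:
    """Interpret the TTC status seen on the neighbouring FELIX after the fibre swap."""
    if OK_MESSAGE in neighbour_message:
        # Step 3: the fibre is fine, so the fault follows the first card.
        return "first FLX-712 faulty: replace the entire FELIX PC"
    error = original_error.strip()
    if error and error in neighbour_message:
        # Step 4: the fault follows the fibre, so it sits upstream.
        return "problem upstream in the TTC system"
    # Any other combination is not covered by the procedure on this page.
    return "inconclusive"


first_error = "Light is arriving but the FLX-712 may have an internal problem"
print(conclude_after_fibre_swap(first_error, OK_MESSAGE))
print(conclude_after_fibre_swap(first_error, first_error))
```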

BUSY LEMO diagnostic

The BUSY state can be switched manually. To verify that the BUSY output works correctly:

1. assert BUSY with

fttcbusy -d <device number> -m 1 -i 0

2. measure the output of the LEMO connector with a voltmeter. The output must be 0 V (BUSY on = logical zero)

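3. de-assert BUSY with (as written, the command below is identical to the one in step 1; check the fttcbusy usage text for the option values that release BUSY)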

fttcbusy -d <device number> -m 1 -i 0

4. use the voltmeter to check that you have a logical 1 (between 3.3 and 5 V)

If you do not get the expected logic levels, something is wrong with the FLX-712. Replace the entire PC with a spare.
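The pass/fail criterion for the two voltmeter readings can be sketched as follows. The function names and the 0.2 V tolerance are assumptions made for illustration; the page itself only specifies 0 V for BUSY on and 3.3 to 5 V for logical 1.

```python
def busy_level(voltage: float, tolerance: float = 0.2) -> str:
    """Classify a LEMO voltage reading; the tolerance is an assumed margin."""
    if abs(voltage) <= tolerance:
        return "logical 0 (BUSY on)"
    if 3.3 - tolerance <= voltage <= 5.0 + tolerance:
        return "logical 1 (BUSY off)"
    return "unexpected level"


def lemo_ok(v_asserted: float, v_deasserted: float) -> bool:
    """True if both readings match the levels expected by the procedure above."""
    return (busy_level(v_asserted) == "logical 0 (BUSY on)"
            and busy_level(v_deasserted) == "logical 1 (BUSY off)")


print(lemo_ok(0.0, 3.3))  # True
print(lemo_ok(1.5, 3.3))  # False
```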

Issues and solutions

flx-init reports "Lock not found" (failed recovery of TTC clock)

In this case the output of flx-init looks like the following:

Card type: FLX-712
Configuring Si5345...
Si5345 hard reset
Si5345 configuration done
Enabling Si5345 output
Si5345: LOS register = 0x20
Si5345: Sticky LOS register = 0xf0
Si5345: LOL register = 0x02
Si5345: LOL register = 0x02
Si5345: LOL register = 0x02
Si5345: LOL register = 0x02
Si5345: LOL register = 0x02
Si5345 ERROR: Lock not found in 5 secs
Si5345: Sticky LOL register = 0x06

Follow the instructions in section "TTC fibre diagnostic".
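When the flx-init output is captured to a file or a log, the failure can be spotted automatically. The sketch below assumes the output has exactly the wording shown above; the function names are hypothetical and the register values are only extracted, not interpreted.

```python
import re

# Excerpt of the flx-init output shown above.
SAMPLE = """\
Si5345: LOS register = 0x20
Si5345: Sticky LOS register = 0xf0
Si5345: LOL register = 0x02
Si5345: LOL register = 0x02
Si5345: LOL register = 0x02
Si5345: LOL register = 0x02
Si5345: LOL register = 0x02
Si5345 ERROR: Lock not found in 5 secs
Si5345: Sticky LOL register = 0x06
"""

LOL_LINE = re.compile(r"^Si5345: LOL register = (0x[0-9a-fA-F]+)\s*$", re.MULTILINE)


def lock_failed(output: str) -> bool:
    """True if flx-init reported that the Si5345 did not lock."""
    return "Si5345 ERROR: Lock not found" in output


def lol_values(output: str) -> list[int]:
    """Return the successive (non-sticky) LOL register readings as integers."""
    return [int(value, 16) for value in LOL_LINE.findall(output)]


print(lock_failed(SAMPLE))  # True
print(lol_values(SAMPLE))   # [2, 2, 2, 2, 2]
```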

FELIX card not detected

If a FELIX card disappears from the system, query the driver with

cat /proc/flx

and look for errors such as

Error message: ERROR: 1 card(s) were ignored because of a problem with the power status
Action: Reboot the PC. If the problem persists, note down the output of "lspci | grep CERN" and replace the PC.
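Looking for error lines in the driver report can be scripted. This is a minimal sketch under the assumption that errors appear on lines starting with "ERROR:", as in the message quoted above; the function names are hypothetical and the second line of the example text is a placeholder, not real driver output.

```python
from pathlib import Path


def driver_errors(text: str) -> list[str]:
    """Return all lines of the driver report that start with 'ERROR:'."""
    return [line.strip() for line in text.splitlines()
            if line.strip().startswith("ERROR:")]


def read_driver_errors(path: str = "/proc/flx") -> list[str]:
    """Read the driver report from the FELIX PC and return its error lines."""
    return driver_errors(Path(path).read_text())


example = (
    "ERROR: 1 card(s) were ignored because of a problem with the power status\n"
    "placeholder line standing in for the rest of the report\n"
)
for error in driver_errors(example):
    print(error)
```

On a FELIX PC, read_driver_errors() would be called without arguments; an empty list means no line matching the assumed pattern was found.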

Monitoring via IS, ERS

Software

Diagnostic for the commissioning phase

List of FELIX nodes

The table below lists all the FELIX nodes in USA15. The acronym HCA stands for Host Channel Adapter and indicates the type of Mellanox network card installed.

| hostname | location | installed | firmware | HCA | notes |
| NSW others | | | | | |
| pc-tdq-flx-nsw-spare-00 | 2-5-1 U37 | YES | GBT | x1 25 GbE | |
| pc-tdq-flx-nsw-tp-a-00 | 2-5-1 U34 | YES | GBT | x1 25 GbE | (was pc-tdq-flx-nsw-stgc-tp-00) |
| pc-tdq-flx-nsw-tp-c-00 | 2-5-1 U32 | YES | GBT | x1 25 GbE | (was pc-tdq-flx-nsw-mm-tp-00) |
| pc-tdq-flx-nsw-spare-01 | 3-5-1 U42 | YES | GBT | x1 25 GbE | |
| NSW mm | | | | | |
| pc-tdq-flx-nsw-mm-06 to 11 | 3-5-1 U29/40 | NO | GBT | x1 25 GbE | |
| pc-tdq-flx-nsw-mm-00 to 05 | 2-5-1 U19/30 | YES | GBT | x1 25 GbE | |
| NSW stgc | | | | | |
| pc-tdq-flx-nsw-stgc-08 to 15 | 4-5-1 U24/39 | NO | GBT | x1 25 GbE | |
| pc-tdq-flx-nsw-stgc-00 to 07 | 4-5-1 U06/21 | YES | GBT | x1 25 GbE | |
| BIS 7/8 | | | | | |
| pc-tdq-flx-rpc-bis-00 | 5-5-1 U17 | YES | GBT | x1 25 GbE | |
| LAr LDPB | | | | | |
| pc-tdq-flx-lar-ldpb-07 to 13 | 5-16-2 U26/39 | YES | FULL | x1 100 GbE | |
| pc-tdq-flx-lar-ldpb-00 to 06 | 4-16-2 U26/39 | YES | FULL | x1 100 GbE | |
| L1Calo | | | | | |
| pc-tdq-flx-l1c-trex-01 | 7-11-2 U24 | YES | FULL | x1 100 GbE | |
| pc-tdq-flx-l1c-trex-00 | 7-11-2 U22 | YES | FULL | x1 100 GbE | |
| pc-tdq-flx-l1c-gfex-00 | 7-11-2 U20 | YES | FULL | x1 100 GbE | |
| pc-tdq-flx-l1c-jfex-00 | 7-11-2 U18 | YES | FULL | x1 100 GbE | |
| pc-tdq-flx-l1c-efex-00 | 7-11-2 U16 | YES | FULL | x1 100 GbE | |
| Tile | | | | | |
| pc-tdq-flx-til-00 | 5-5-1 U25 | YES | FULL | x1 100 GbE | |
| Spares | | | | | |
| pc-tdq-flx-spare-00 to 01 | 5-5-1 U22/24 | NO | - | - | |

List of SW ROD nodes

| hostname | location | HCA | notes |
| pc-tdq-swrod-rpc-bis-00 | 5-5-1 U37 | x1 25 GbE, x1 40 GbE | |
| pc-tdq-swrod-til-00 | 5-5-1 U36 | x1 100 GbE, x1 40 GbE | |
| pc-tdq-swrod-nsw-00 to 08 | 5-5-1 U27/35 | x1 25 GbE, x1 40 GbE | |
| pc-tdq-swrod-lar-06 to 13 | 7-16-2 U23/30 | x1 100 GbE, x1 40 GbE | |
| pc-tdq-swrod-lar-00 to 05 | 6-16-2 U27/32 | x1 100 GbE, x1 40 GbE | |
| pc-tdq-swrod-l1c-00 to 05 | 7-11-2 U32/38 | x1 100 GbE, x1 40 GbE | |


Major updates:
-- MarkusJoos - 2020-05-25

%RESPONSIBLE% CarloAlbertoGottardo
%REVIEW% Never reviewed


Topic revision: r2 - 2021-10-17 - CarloAlbertoGottardo
 