CarloTest

Introduction

This page describes the debugging procedures for FELIX systems. The first part of the twiki focuses on the hardware and is relevant for the on-call readout expert; the second part is dedicated to the software and is relevant for the on-call DAQ/HLT experts.

Note on logging: whenever an intervention is made on a system at P1 a dedicated elog entry should be made describing what was done and why, plus the expected impact on the system.

Note on jargon: the computer sees one physical FELIX card as two PCI devices. In many tools the card is selected with the option -c and the device with -d. Enumeration starts at 0: card 0 comprises devices 0 and 1, card 1 comprises devices 2 and 3. Each device serves the e-links of one MTP connector (i.e. 12 GBT links per device on 24-channel cards). There is one TTC connector and one BUSY output per card.
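The card/device numbering described above can be sketched in a few lines of Python. The helper names are hypothetical and not part of any FELIX tool; the sketch only encodes the rule that card N owns devices 2N and 2N+1.

```python
from typing import Tuple


def devices_for_card(card: int) -> Tuple[int, int]:
    """Return the two PCI device numbers (-d) that belong to one card (-c)."""
    if card < 0:
        raise ValueError("card index must be non-negative")
    return (2 * card, 2 * card + 1)


def card_for_device(device: int) -> int:
    """Return the card number (-c) that a PCI device number (-d) belongs to."""
    if device < 0:
        raise ValueError("device index must be non-negative")
    return device // 2
```

For example, devices_for_card(1) gives devices 2 and 3, and card_for_device(3) gives card 1.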

Hardware

Diagnostic tools

Optical power on GBT links

The

TTC fibre diagnostic

BUSY LEMO diagnostic

Issues

Software

GBT link diagnostic

Each FELIX-PC is equipped with one or two FLX-712 PCIe cards. For the purpose of the S/W tools they are enumerated with 0 and 1. This is the [X] parameter in e.g. flx-info.

Each card has two connectors for MTP fibres. Depending on the number (4 or 8) of MiniPOD devices installed on the FLX-712, these fibres have (together) 24 or 48 channels.

Before drawing any conclusions about the status of these channels run the "flx-init -c [X]" program to (re)initialize the links. Then run "flx-info -c [X] POD". You will see something like this:

How to the read the table below:
# = FLX link endpoint OK     (no LOS)
- = FLX link endpoint not OK (LOS)
First letter:  Current channel status
Second letter: Latched channel status
Example: #(-) means channel had lost the signal in the past but the signal is present now.
 
Latched / current link status of channel:
        |   0  |   1  |   2  |   3  |   4  |   5  |   6  |   7  |   8  |   9  |   10  |   11  |
        |======|======|======|======|======|======|======|======|======|======|=======|=======|
1st TX | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) |  #(#) |  #(#) |
1st RX | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) |  -(-) |  -(-) |
2nd TX | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) | #(#) |  #(#) |  #(#) |
2nd RX | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) | -(-) |  -(-) |  -(-) |
 
 
Optical power (rx or tx) of channel in uW:
        |       0 |       1 |       2 |       3 |       4 |       5 |       6 |       7 |       8 |       9 |      10 |      11 |
        |=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|=========|
1st TX |  845.10 |  872.70 |  873.30 |  888.70 |  898.90 |  852.70 |  923.10 |  852.40 |  978.20 |  862.30 |  911.90 |  773.30 |
1st RX |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |
2nd TX |  911.40 |  964.50 |  899.80 | 1016.60 |  940.30 |  929.50 |  976.30 |  954.80 | 1065.40 |  974.00 |  965.20 |  902.70 |
2nd RX |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |    0.00 |
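When many cards have to be checked, the link status table can be read by a script instead of by eye. The following is a minimal Python sketch, assuming the output layout is exactly as in the sample above (rows labelled e.g. "1st TX", cells of the form #(#)). The function names are hypothetical and not part of the FELIX software.

```python
import re
from typing import Dict, List, Tuple

# One status cell: current status, then latched status in brackets, e.g. "#(-)".
CELL = re.compile(r"([#-])\(([#-])\)")


def parse_link_status(output: str) -> Dict[str, List[Tuple[str, str]]]:
    """Map each row label to a list of (current, latched) pairs, one per channel."""
    rows = {}
    for line in output.splitlines():
        label, sep, rest = line.partition("|")
        cells = CELL.findall(rest)
        # Header, separator and optical power rows contain no status cells.
        if sep and cells:
            rows[label.strip()] = cells
    return rows


def links_not_ok(rows: Dict[str, List[Tuple[str, str]]]) -> List[Tuple[str, int, str, str]]:
    """List (row, channel, current, latched) for every cell that is not #(#)."""
    problems = []
    for label, cells in rows.items():
        for channel, (current, latched) in enumerate(cells):
            if (current, latched) != ("#", "#"):
                problems.append((label, channel, current, latched))
    return problems
```

The channel index is simply the position of the cell within its row, so it matches the column header of the table only as long as no column is missing from the output.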

How to interpret this and what to do:

| Observation | Conclusion | Action |
| RX power (second table) of a channel is "0" | No light coming into the FLX-712 | Swap two MTP cables and check if the dark channel correlates with the cable or the connector on the card. If the problem is on the FLX card, replace the FELIX PC |
| RX power close to or below 100 uW | Too little light for safe data transmission | Swap two MTP cables and check if the dark channel correlates with the cable or the connector on the card. |
| TX power significantly below 800 uW | Laser (MiniPOD) ageing | Replace PC at next occasion without beam |
| Link status (first table) not #(#) | Link down (Note: some links may be down because they are not used. The shift leader in the control room should know which links have to be up) | Run "flx-init -c [X]" once more. If the problem persists, exchange the PC. |
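The optical power criteria above can be expressed as a small classifier. This is a sketch only: the function name is hypothetical, and because the table says "close to" 100 uW and "significantly below" 800 uW without giving exact numbers, the two cut values are assumptions exposed as parameters.

```python
def classify_power(direction: str, power_uw: float,
                   rx_min_uw: float = 100.0, tx_min_uw: float = 800.0) -> str:
    """Classify one optical power reading (in uW) for an RX or TX channel.

    rx_min_uw and tx_min_uw are assumed cut values, not official limits.
    """
    if direction == "RX":
        if power_uw == 0.0:
            return "no light"       # swap MTP cables to localise the fault
        if power_uw <= rx_min_uw:
            return "low light"      # too little light for safe transmission
        return "ok"
    if direction == "TX":
        if power_uw < tx_min_uw:
            return "possible ageing"  # candidate for replacement without beam
        return "ok"
    raise ValueError("direction must be 'RX' or 'TX'")
```

With these default cuts, the 773.30 uW reading on channel 11 of the first TX row in the sample would be flagged, whereas an expert might still consider it acceptable; adjust tx_min_uw accordingly.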

TTC fibre diagnostic

  1. Execute "flx-info -c [X] ADN2814" (where X is the card number, i.e. "0" or "1")
  2. If you see the message "The TTC optical connection is up and working", no action is needed
  3. If you see the message "No light arriving. Check the fibre connection to the FLX-712", swap the TTC fibre with a fibre from a neighbouring FLX-712 to determine whether the fault follows the fibre or stays with the card.
  4. If you see the message "Light is arriving but the FLX-712 may have an internal problem":
    1. Swap the TTC fibre with a fibre from a neighbouring FLX-712
    2. Log on to the neighbouring FELIX (from which you have taken the fibre) and run "flx-info -c [X] ADN2814"
    3. If on that FELIX you get "The TTC optical connection is up and working", you can conclude that the FLX-712 in the first FELIX has a problem. Replace the entire FELIX PC
    4. If you get the same error on the other FELIX as well, the problem is upstream in the TTC system. You cannot fix this.
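The decision procedure above can be summarised as a small Python function. This is an illustrative sketch, not an existing tool: the function name and the returned strings are hypothetical, and only the three messages quoted in the steps are recognised.

```python
from typing import Optional

UP = "The TTC optical connection is up and working"
NO_LIGHT = "No light arriving"
INTERNAL = "Light is arriving but the FLX-712 may have an internal problem"


def ttc_next_step(local_output: str, neighbour_output: Optional[str] = None) -> str:
    """Suggest the next step from the flx-info ADN2814 message(s).

    neighbour_output is the message seen on the neighbouring FELIX after
    the TTC fibres have been swapped (None if not yet done).
    """
    if UP in local_output:
        return "no action needed"
    if NO_LIGHT in local_output:
        return "swap the TTC fibre with one from a neighbouring FLX-712"
    if INTERNAL in local_output:
        if neighbour_output is None:
            return "swap the TTC fibre and run flx-info on the neighbouring FELIX"
        if UP in neighbour_output:
            return "replace the entire FELIX PC"
        if INTERNAL in neighbour_output:
            return "problem is upstream in the TTC system"
        return "unrecognised neighbour output"
    return "unrecognised output"
```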

Busy LEMO diagnostic

  1. Execute "fttcbusy -m 1 -i 0"
  2. Connect the voltmeter with the LEMO adapter to the Busy output and check that you read a logical 0, i.e. 0 V
  3. Execute "fttcbusy -m 0 -i 1"
  4. Use the voltmeter to check that you read a logical 1, i.e. 3.3 V
If you do not get the expected logic levels, something is wrong with the FLX-712. Replace the entire PC with a spare.
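If the two voltmeter readings are recorded, for instance for the elog entry, the pass/fail check can be written down as follows. This is a sketch with hypothetical function names; the 0.3 V tolerance is an assumption and not a documented specification of the FLX-712.

```python
NOMINAL_V = {0: 0.0, 1: 3.3}  # logic level -> nominal voltage from the steps above


def busy_level_ok(measured_v: float, expected_logic: int,
                  tolerance_v: float = 0.3) -> bool:
    """Check one reading against the nominal voltage of the expected logic level."""
    if expected_logic not in NOMINAL_V:
        raise ValueError("expected_logic must be 0 or 1")
    return abs(measured_v - NOMINAL_V[expected_logic]) <= tolerance_v


def busy_output_ok(v_logic0: float, v_logic1: float,
                   tolerance_v: float = 0.3) -> bool:
    """Both readings (steps 2 and 4) must match; otherwise replace the PC."""
    return (busy_level_ok(v_logic0, 0, tolerance_v)
            and busy_level_ok(v_logic1, 1, tolerance_v))
```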

List of FELIX nodes

The table below lists all the FELIX nodes in USA15. The acronym HCA stands for Host Channel Adapter and indicates the type of Mellanox network card installed.

| hostname | location | installed | firmware | HCA | notes |
| NSW others | | | | | |
| pc-tdq-flx-nsw-spare-00 | 2-5-1 U37 | YES | GBT | x1 25 GbE | |
| pc-tdq-flx-nsw-tp-a-00 | 2-5-1 U34 | YES | GBT | x1 25 GbE | (was pc-tdq-flx-nsw-stgc-tp-00) |
| pc-tdq-flx-nsw-tp-c-00 | 2-5-1 U32 | YES | GBT | x1 25 GbE | (was pc-tdq-flx-nsw-mm-tp-00) |
| pc-tdq-flx-nsw-spare-01 | 3-5-1 U42 | YES | GBT | x1 25 GbE | |
| NSW mm | | | | | |
| pc-tdq-flx-nsw-mm-06 to 11 | 3-5-1 U29/40 | NO | GBT | x1 25 GbE | |
| pc-tdq-flx-nsw-mm-00 to 05 | 2-5-1 U19/30 | YES | GBT | x1 25 GbE | |
| NSW stgc | | | | | |
| pc-tdq-flx-nsw-stgc-08 to 15 | 4-5-1 U24/39 | NO | GBT | x1 25 GbE | |
| pc-tdq-flx-nsw-stgc-00 to 07 | 4-5-1 U06/21 | YES | GBT | x1 25 GbE | |
| BIS 7/8 | | | | | |
| pc-tdq-flx-rpc-bis-00 | 5-5-1 U17 | YES | GBT | x1 25 GbE | |
| LAr LDPB | | | | | |
| pc-tdq-flx-lar-ldpb-07 to 13 | 5-16-2 U26/39 | YES | FULL | x1 100 GbE | |
| pc-tdq-flx-lar-ldpb-00 to 06 | 4-16-2 U26/39 | YES | FULL | x1 100 GbE | |
| L1Calo | | | | | |
| pc-tdq-flx-l1c-trex-01 | 7-11-2 U24 | YES | FULL | x1 100 GbE | |
| pc-tdq-flx-l1c-trex-00 | 7-11-2 U22 | YES | FULL | x1 100 GbE | |
| pc-tdq-flx-l1c-gfex-00 | 7-11-2 U20 | YES | FULL | x1 100 GbE | |
| pc-tdq-flx-l1c-jfex-00 | 7-11-2 U18 | YES | FULL | x1 100 GbE | |
| pc-tdq-flx-l1c-efex-00 | 7-11-2 U16 | YES | FULL | x1 100 GbE | |
| Tile | | | | | |
| pc-tdq-flx-til-00 | 5-5-1 U25 | YES | FULL | x1 100 GbE | |
| Spares | | | | | |
| pc-tdq-flx-spare-00 to 01 | 5-5-1 U22/24 | NO | - | - | |
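Several rows in the table use the shorthand "NN to MM" for a range of hosts. A script that loops over nodes needs the full hostnames, so the shorthand can be expanded as sketched below. The function name is hypothetical, and the sketch assumes the zero-padding width is given by the first number of the range.

```python
import re
from typing import List

# "<prefix>-<first> to <last>", e.g. "pc-tdq-flx-nsw-mm-06 to 11"
_RANGE = re.compile(r"^(?P<prefix>.*?-)(?P<first>\d+) to (?P<last>\d+)$")


def expand_hosts(entry: str) -> List[str]:
    """Expand a hostname entry from the table into individual hostnames."""
    entry = entry.strip()
    match = _RANGE.match(entry)
    if match is None:
        return [entry]  # a single hostname, nothing to expand
    width = len(match.group("first"))
    first = int(match.group("first"))
    last = int(match.group("last"))
    if last < first:
        raise ValueError("range end is smaller than range start")
    prefix = match.group("prefix")
    return [f"{prefix}{index:0{width}d}" for index in range(first, last + 1)]
```

For instance, expand_hosts("pc-tdq-flx-spare-00 to 01") yields pc-tdq-flx-spare-00 and pc-tdq-flx-spare-01.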

List of SW ROD nodes

| hostname | location | HCA | notes |
| pc-tdq-swrod-rpc-bis-00 | 5-5-1 U37 | x1 25 GbE, x1 40 GbE | |
| pc-tdq-swrod-til-00 | 5-5-1 U36 | x1 100 GbE, x1 40 GbE | |
| pc-tdq-swrod-nsw-00 to 08 | 5-5-1 U27/35 | x1 25 GbE, x1 40 GbE | |
| pc-tdq-swrod-lar-06 to 13 | 7-16-2 U23/30 | x1 100 GbE, x1 40 GbE | |
| pc-tdq-swrod-lar-00 to 05 | 6-16-2 U27/32 | x1 100 GbE, x1 40 GbE | |
| pc-tdq-swrod-l1c-00 to 05 | 7-11-2 U32/38 | x1 100 GbE, x1 40 GbE | |


Major updates:
-- MarkusJoos - 2020-05-25

%RESPONSIBLE% CarloAlbertoGottardo
%REVIEW% Never reviewed

Topic revision: r1 - 2021-10-15 - CarloAlbertoGottardo
 