HCAL DCS Basics
The purpose of this page is to provide a basic understanding of the layout and usage of the HCAL DCS/PVSS system.
It is assumed the reader has some knowledge of the CMS DCS system and conventions. The organization of the HCAL FSM tree and the device state definitions are described HERE.
Common error and alarm conditions and operator actions are described here.
Remote Connection to PVSS
For non-development work, PVSS does not need to be installed on your machine. You can connect to the Windows Terminal Server at P5 and run PVSS from there. Instructions:
- Linux
- If not already installed on your machine, install the "rdesktop" package via your package manager.
- From CERN
- In a terminal, type:
rdesktop -g 100% -a24 -u niceusername -d CERN cerntscms.cern.ch
(replace niceusername with your NICE user name.)
- Then log in to "CERN" with your NICE login and password.
- Click on HCAL DCS; the PVSS panel should open.
- From outside CERN
- Log in to lxplus with X11 forwarding enabled (ssh -X username@lxplus.cern.ch).
- Once logged in, type:
rdesktop -g 100% -a24 -u niceusername -d CERN cerntscms.cern.ch
(replace niceusername with your NICE user name.)
- Then log in to "CERN" with your NICE login and password.
- Click on HCAL DCS; the PVSS panel should open.
- Mac
- If not already installed on your machine, install a remote desktop client for Mac of your choice. A good option is the following: http://www.microsoft.com/mac/remote-desktop-client
- From CERN
- Open your remote desktop application and type
cerntscms.cern.ch
in the field.
- Then log in to "CERN" with your NICE login and password.
- Click on HCAL DCS; the PVSS panel should open.
- From outside CERN
- Open a terminal and type
ssh -L 50000:cerntscms.cern.ch:3389 niceusername@lxplus.cern.ch
(replace niceusername with your NICE user name.)
- Open your remote desktop application and type
localhost:50000
in the field.
- Then log in to "CERN" with your NICE login and password.
- Click on HCAL DCS; the PVSS panel should open.
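The connection commands above differ only in the user name and, for the tunnel, the local port. As a minimal sketch, the Python helpers below assemble the same two command lines as argument lists. The function names and the `local_port` parameter are illustrative assumptions; the host names, port numbers and flags are copied from the instructions above.

```python
import shlex

TERMINAL_SERVER = "cerntscms.cern.ch"  # terminal server named in the instructions
RDP_PORT = 3389                        # remote port used in the tunnel command


def rdesktop_command(nice_username):
    """Argument list for the Linux rdesktop connection (hypothetical helper)."""
    return ["rdesktop", "-g", "100%", "-a24",
            "-u", nice_username, "-d", "CERN", TERMINAL_SERVER]


def ssh_tunnel_command(nice_username, local_port=50000):
    """Argument list for the ssh tunnel used from outside CERN (hypothetical helper)."""
    forward = f"{local_port}:{TERMINAL_SERVER}:{RDP_PORT}"
    return ["ssh", "-L", forward, f"{nice_username}@lxplus.cern.ch"]


if __name__ == "__main__":
    # Print the commands so they can be checked before running them by hand.
    print(shlex.join(rdesktop_command("niceusername")))
    print(shlex.join(ssh_tunnel_command("niceusername")))
```

With the tunnel open, the remote desktop client is pointed at `localhost:<local_port>`, as described in the Mac instructions.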
Panels Overview
Main PVSS panel:
The main PVSS panel provides a general overview of HCAL's High Voltage (HV), Low Voltage (LV), and RBX status. Here are some brief descriptions of the more important services the Main panel provides:
- HCAL "System" (Top) node, "Sub-System" (Subdetectors) Nodes, and "Partitions" Nodes: These displays provide information on the current status of a given subdetector's (partition's) electronics.
- Power: Allows you to clear errors after an LV trip ("Recover LV" button), monitor and turn individual ACDCs on or off ("ACDCs" button), and turn off all of HCAL ("Shutdown" button).
- A series of displays on the right and lower part of the panel show the status of the different components of all the subdetectors that compose HCAL, including ZDC and CASTOR. A layout of the central components of HCAL can be seen HERE.
* HCAL Main View:
HCAL Subdetectors:
HEP, HEM, HBP, HBM
- Both subdetectors (HE and HB) are separated into a Plus (P) side and a Minus (M) side.
- Each side is separated into 18 "wedges" that mimic the actual construction of HB and HE.
- The background color represents the current temperature status and the actual temperature can be seen by hovering the mouse pointer over the wedge.
- Each wedge, or Sector Node, has one HV module device, one RBX device, and one LV node with two LV channel devices. These are represented graphically by the three "dots" on each wedge.
Inner Dot: LV status (click for more detailed LV info)
Middle Dot: HV status (click for more detailed HV info)
Outer Dot: RBX status (click for more detailed RBX info)
HFP and HFM
- The subdetector is separated into a Plus (P) side and a Minus (M) side.
- Each side is separated into 12 "wedges" that mimic the actual construction of HF.
- The background color represents the current temperature status and the actual temperature can be seen by hovering the mouse pointer over the wedge.
- The dot within each "wedge" corresponds to the RBX status.
- There is one HV module per quarter of the subdetector (click on the corresponding button for more detailed info on the HV status).
Q1: wedges 1-3
Q2: wedges 4-6
Q3: wedges 7-9
Q4: wedges 10-12
- There are two LV modules per side, Near and Far (click on the corresponding button for more detailed info on the LV status).
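The HF quarter assignment listed above (three consecutive wedges per HV module) reduces to an integer division. A minimal sketch, where the function name `hf_quarter` is an illustrative assumption:

```python
def hf_quarter(wedge):
    """Return the HV quarter label ("Q1".."Q4") for an HF wedge numbered 1-12."""
    if not 1 <= wedge <= 12:
        raise ValueError(f"HF wedge must be between 1 and 12, got {wedge}")
    # Wedges 1-3 -> Q1, 4-6 -> Q2, 7-9 -> Q3, 10-12 -> Q4
    return f"Q{(wedge - 1) // 3 + 1}"
```

The same mapping applies to both HFP and HFM, since each side has its own set of 12 wedges.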
HO0, HO1, and HO2
- HO0 is separated into 12 sections, each with its own LV, HV, and RBX module.
- HO1 and HO2 are separated into 12 sections, but a single LV/HV/RBX module covers two sections of the subdetector. Hence there are only six modules in total for each subsection of HO1 and HO2.
- HO is made up of five rings which fit side by side in the detector (HO2P, HO1P, HO0, HO1M, HO2M). In order to save space in the display screen, HO2P and HO2M are shown "inside" HO1P and HO1M respectively.
- The dots again represent LV/HV/RBX status. Hover over a dot in order to find out which module it represents (click the dot for more info).
- The background color represents the current temperature status and the actual temperature can be seen by hovering the mouse pointer over the box.
ZDC and CASTOR displays
- The frame color of the rectangular box represents the general status of the subdetector.
- The background color represents the current temperature status and the actual temperature can be seen by hovering the mouse pointer over the box.
- The dots inside show the LV/HV/RBX status (click the dot for more info).
Warning/Alerts Screen
- This screen shows any HCAL local warning or alert messages. Warnings are shown in yellow font and alerts in red font.
- The message is updated every minute or so.
Common Error and Alert Conditions.
- Common HV incidents.
- Trips: HV trips are error conditions that are propagated to the FSM tree and shown in red on the LED representing the affected HV device. HV trips are frequent during ramping and in the first moments of stable (HV ON) voltages, but extremely rare under normal stable conditions. Trips during ramping are in general recovered automatically by the HV system (unless the trip is recurrent). Trips that are not recovered automatically should be recovered with a RECOVER command. An expert should be contacted in case of recurrent trips.
- Communication errors: Occasionally an HV device can go into error because of a communication problem, but this should not be considered a problem unless it persists for more than a minute (the readout cycle is ~30 s).
- HV module hangs in busy state: The ramp-up time of HCAL HV devices is 3 to 4 min, which can be extended in case of trips. Occasionally some channels may hang at an intermediate voltage, or reach the nominal value without reaching the official ON state. The operator should check for such incidents if an HV device stays in the "Busy" state (yellow in the HV state representation) for an unreasonably long time. A new ON command, or if needed a cycle of STANDBY/ON commands, usually cures these incidents.
- Bad voltages: Sometimes, usually at the end of the ramping stage or shortly after it, some HV channels may go to a voltage different from their nominal value while staying in the ON state. The DCS system continuously monitors the voltages and reports any channel voltage found outside tolerances as a red alert in the local alert messages panel, indicating the device, the channel, and the expected and read values. The operator should always watch for such messages and send a RELOAD command to fix the incident (however, do not send the command without expert advice in the middle of a run).
- Magnet disabling HV: To protect the HPDs from potentially dangerous discharges at intermediate fields, DCS will not allow the HV of HE, HB and HO to be turned on if the magnet current reads between 100 and 17500 A, and the HCAL Safety System (HSS) will disable the same HV devices if a magnet ramping or discharge DSS signal is sensed. DCS will re-enable the HV automatically when the veto condition has disappeared, but the HSS HV interlock has to be cleared with the "Reset HV" button of the HCAL HSS panel ("HSS" button of the main display). In case of a Fast or Slow discharge or TDPC, the CMS DSS will also switch off the racks of the same HV devices, and the power of the racks needs to be re-established through the CDCS Rack Manager.
- Common LV incidents.
- CAEN Errors: HCAL Readout Boxes (RBXs) are powered by two LV channels, nominally at 6.5 and 5 volts. The LV system is based on the CAEN Easy system. Occasionally some LV channels may go to error state on a state transition, and they will not respond to any command unless the error state is cleared at the mainframe level. The "Recover LV" button of the main panel clears all HCAL LV error conditions (it may also clear ZDC HV error conditions, as the ZDC HV is also based on the CAEN platform).
- ACDC Power: The CAEN LV modules of HCAL receive service power from an ACDC. No communication is therefore possible with a power supply of a subdetector if its ACDC powering channel is off, and all the LV channels of the module are then by definition in error state. The power of the ACDCs should be checked if the LV is in error for a large number of RBXs of a subdetector. The "ACDCs" button on the main display opens a panel that allows you to check and control the state of the HCAL ACDCs. A red LED for a unit in the "Main Power" column indicates that the ACDC channel cannot be powered because three-phase AC power is missing.
- HSS Action: The LV of a set of RBXs may be disabled by an HSS interlock, for example in case of overheating. Current HSS alarms are shown in the local alerts box and in the panel opened by the HSS button of the main display. Once the condition that triggered the HSS action has disappeared, the LV interlock can be cleared with the Reset LV button of the HSS panel. LV incidents caused by the HSS system should in any case be reported to the HCAL expert.
- Readout Warnings: The local alerts box may sometimes show warnings about bad LV readouts. The LV settings of all channels have been optimized to provide good voltages, so these warnings are almost certainly caused by a bad digitization or a communication error. HBM03 and HEP14 are particularly prone to bad readouts. Operators should not worry too much about this kind of warning unless the bad readout is stable and persists for a long time (hours).
- Temperature Warnings.
- RBX Overheating: The local alerts display will show a warning if the temperature of a readout box exceeds 35 °C and an alert if it exceeds 40 °C. At the same time, the background color of the RBX representation will show the condition in yellow or red. This is not a major source of concern if the temperature readout is stable, but may indicate a cooling problem otherwise. DCS and DSS protect the system against dangerous overheating situations. The DCS system will switch an RBX off if it exceeds 41 °C for two consecutive readouts, and DSS would do so at around 45 °C.
- LV Power supply overheating: Similar to RBX overheating. The warning threshold for LV power supply overheating is 40 °C and the alarm threshold is 50 °C. DCS will turn off the module if it exceeds 51 °C for two consecutive readouts, and an internal safety protection will trip the module if it exceeds 80 °C.
- RBX Errors.
- RBX status errors concern mainly DAQ and do not compromise the DCS status of the system.
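The temperature thresholds quoted under Temperature Warnings can be summarized in a short Python sketch. This is only an illustration of the numbers given above, not the actual DCS code: the names `TempLimits`, `classify` and `dcs_should_switch_off` are assumptions, and so is the choice of a strict "greater than" comparison at each threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TempLimits:
    warning: float  # degrees C, yellow warning above this value
    alert: float    # degrees C, red alert above this value
    dcs_off: float  # degrees C, DCS switches off after two readouts above this


# Values taken from the text; the comparison convention is an assumption.
RBX_LIMITS = TempLimits(warning=35.0, alert=40.0, dcs_off=41.0)
LV_PS_LIMITS = TempLimits(warning=40.0, alert=50.0, dcs_off=51.0)


def classify(temp_c, limits):
    """Return "OK", "WARNING" or "ALERT" for a single temperature readout."""
    if temp_c > limits.alert:
        return "ALERT"
    if temp_c > limits.warning:
        return "WARNING"
    return "OK"


def dcs_should_switch_off(readouts, limits):
    """True if the two most recent readouts both exceed the DCS switch-off limit."""
    recent = list(readouts)[-2:]
    return len(recent) == 2 and all(t > limits.dcs_off for t in recent)
```

For example, `classify(36.0, RBX_LIMITS)` gives a warning for a readout box, while the same reading for an LV power supply is still within limits. The DSS action at around 45 °C and the 80 °C internal trip are separate hardware protections and are not modeled here.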
LV, HV, and RBX Panels
- LV_Panel.png:
- HV_Panel.png:
- RBX_Panel.png:
HCAL Layout
Here's a slice of a quarter of the detector showing the location of the subdetectors with respect to the beamline (not pictured: HF is located in the endcaps and ZDC is located in the LHC tunnel, 140 m on either side of the interaction point).
Finite State Machine Tree (FSM Tree).
Now that you've seen the picture, here's some more detailed information:
Control Units
- Can be "taken" and controlled as an independent tree by different users.
- HCAL has one control unit per trigger partition: HEHBa, HEHBb, HEHBc, HF and HO.
- Status (Error, ON, OFF, etc...) defined by nodes below.
- Commands (ON, STANDBY, OFF, RECOVER) propagated to nodes below.
Logical Units
- Cannot be taken independently of the control unit to which they belong.
- Status defined by nodes below.
- Commands propagated to nodes below.
Hardware Devices
- Status given by the state of the device they represent (e.g. ON, OFF, Ramping).
- Commands act to change the hardware state.
Note: Commands propagate downwards. Statuses and alarms propagate upwards. For example, let's refer back to the picture of the FSM Tree and say HEM02_LV1 "trips" and goes into "ERROR". Since statuses propagate upwards, the HEM_LV, HEM02, HEMa, HEHBa, and CMS-HCAL displays will also go into "ERROR". To solve this problem you could navigate all the way to the HEM_LV status screen (through the HCAL Main Panel) and issue a "Recover" command; or, since commands propagate downwards, you could issue the "Recover" at the CMS-HCAL node located on the HCAL Main Panel.
Commands and Statuses
Logic for Control Unit and Logical Unit Status
- If any (enabled) Node below in ERROR => ERROR
- Else if any (enabled) Node below in OFF => OFF
- Else if any (enabled) Node below in Standby => Standby
- Else ON (All enabled Nodes below in ON).
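The priority rule above can be written as a short Python sketch. The function name `derive_state` and the `(state, enabled)` pair representation are illustrative assumptions, not the actual FSM implementation. Only the four states named in the rule are handled; transient states such as "Ramping" are outside this sketch.

```python
# States checked in order of priority, as in the list above.
PRIORITY = ("ERROR", "OFF", "STANDBY")


def derive_state(children):
    """Derive a Control Unit or Logical Unit state from its child nodes.

    `children` is an iterable of (state, enabled) pairs. Disabled nodes
    are ignored, as in the rule above.
    """
    enabled_states = {state.upper() for state, enabled in children if enabled}
    for state in PRIORITY:
        if state in enabled_states:
            return state
    # No enabled node is in ERROR, OFF or STANDBY.
    return "ON"
```

For example, `derive_state([("ON", True), ("Standby", True), ("ERROR", False)])` yields `"STANDBY"`, because the node in error is disabled and is therefore not counted. Applying the function level by level reproduces the upward propagation described in the note: one enabled device in ERROR puts every node above it in ERROR.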
Control Unit and Logical Unit Commands
- ON: Send ON Command to all nodes in the branch. (HV ON, LV ON).
- STANDBY: Send Standby Command to all nodes in the branch. (HV OFF, LV ON).
- OFF: Send OFF Command to all nodes in the branch. (HV OFF, LV OFF).
- RECOVER: Send RECOVER Command to all nodes in the branch. (Try to recover any device in error).
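As a sketch of how these commands fan out, the Python example below walks a small tree and collects the devices a command would reach. The tree shape follows the sector description given earlier (one HV device, one RBX device, and one LV node with two LV channels). The node names other than HEM02 and HEM02_LV1 are hypothetical, and so are `COMMAND_TARGETS` and `send_command`.

```python
# Target HV/LV states for each command, as listed above.
# RECOVER has no fixed target: it tries to recover any device in error.
COMMAND_TARGETS = {
    "ON": {"HV": "ON", "LV": "ON"},
    "STANDBY": {"HV": "OFF", "LV": "ON"},
    "OFF": {"HV": "OFF", "LV": "OFF"},
    "RECOVER": None,
}

# Hypothetical sector node; only HEM02 and HEM02_LV1 are names from the text.
SECTOR = {
    "name": "HEM02",
    "children": [
        {"name": "HEM02_LV", "children": [
            {"name": "HEM02_LV1"},
            {"name": "HEM02_LV2"},
        ]},
        {"name": "HEM02_HV"},
        {"name": "HEM02_RBX"},
    ],
}


def send_command(node, command):
    """Return the names of all devices (leaf nodes) reached by `command`."""
    if command not in COMMAND_TARGETS:
        raise ValueError(f"unknown command: {command}")
    children = node.get("children", [])
    if not children:
        return [node["name"]]  # hardware device: the command acts here
    reached = []
    for child in children:     # control/logical unit: pass the command down
        reached.extend(send_command(child, command))
    return reached
```

Issuing the command at a higher node reaches the same devices plus those in sibling branches, which is why a "Recover" sent from the top node also recovers a tripped LV channel deep in the tree.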
--
CristobalCeron, German Martinez - 15-March-2010