Cosmic Rack Commissioning Guide

For information on commissioning the XY Table, please contact Stefano Mersi.

Step 0. Create commissioning runs for a new partition and migrate them to RCMS:

  • This step only needs to be done once (or whenever you want to change something in the configuration).
  • Login to cmstracker035 with username xdaqtk.
  • Go to directory /home/xdaqtk/autotest/Crack2013/
  • Here you'll find the following files:
      • nukeDuckCad (a program to delete existing configurations from the DuckCad)
      • DevConfiguration_slc5.xml (file containing configuration settings)
      • commissioningSequence (a list of executables for each of the commissioning runs)
      • CRACKPart.xml (the partition file)
    • Edit the file commissioningSequence. Change the list of commissioning runs to create for the partition by commenting/uncommenting lines such as 'tkconf11mc 01-CrackCratescan... '.
    • Also change the author and folder columns so these runs will be distinct and separate from those created by others.
    • Edit the file CRACKPart.xml. Give it a new partition name based on today's date, in the format 'CR_DAY-MONTH-YEAR_1'.
    • Once the changes are made and you are happy with the setup, create the commissioning runs for the new partition, i.e. 'source commissioningSequence'.
    • Check that there were no errors during the creation of the commissioning runs
  • Migrate the commissioning runs
    • Open the manager (type 'manager').
    • Connect to cmstracker035 and select rcms user 'CRACKdev'.
    • Go to the Migrator and connect to the Duck.
    • Manually move each of the individual configurations down from the Duck to the RS3 database below.
    • After you finish moving the configurations, disconnect from both the RS3 and the Duck.
  • Once the commissioning runs are successfully created and migrated you are ready to commission the CRack using your new partition
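
As a sketch of the commissioningSequence edit described above — the file name comes from this guide, but the run entries below are invented samples, not the real file contents — an unwanted run can be disabled by commenting out its line:

```shell
# Work on a throwaway copy; the entries are made-up samples.
f=/tmp/commissioningSequence
printf '%s\n' \
  'tkconf11mc 01-CrackCratescan ...' \
  'tkconf11mc 02-CrackConnection ...' \
  'tkconf11mc 03-CrackTiming ...' > "$f"
# Disable the timing run by prefixing its line with '#'
sed -i 's/^tkconf11mc 03-/# &/' "$f"
grep -c '^#' "$f"   # number of commented-out lines
```

After the edit, 'source commissioningSequence' would only create the runs that remain uncommented.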

Step 1. CRack CrateScan

  • Warning: Do not make a cratescan more than once per partition.
  • Open a web browser and go to http://cmstracker035:8080/rcms
  • Login using the username CRACKdev.
  • Click on 'Configuration Chooser', then click on the partition name 'CRACK_DAY-MON-YEAR', then choose '01-CrackCratescan'.
  • 'Create' the CrateScan and 'Initialize' it. Do NOT 'Configure' until you have set the FEC parameters.
  • Set the FEC parameters by first clicking on 'Status Display', then going to the FEC Crate Controller.
  • Set the Scan Slot Range from 4 to 4.
  • After the parameter has been applied, go to "state machine" on the crate controller (not rcms) and click on 'Configure'.
  • Next configure the FEDs. In the Status Display, open the FED crate controller. We can scan over all slots.
  • After looking over the parameters (the defaults are usually OK), go to the state machine and click on 'Configure'.
  • The CrateScan is now completed.

  • To look at the Crates, Slots, FED/FECs, APVs in more detail:
  • Login as xdaqdev on cmstracker029
  • export CONFDB=cms_tracker_tif3/xxxxxxx@cms_sstracker
  • ./tkconfigurationdb.sh.new $CONFDB
  • open a web browser and go to cmstracker029.cern.ch:15000 for the DB frontend
  • Choose the Database you wish to look at in the Database Parameter page.
  • Then click on Partition/Version, FEC Partition Parameters, Modules & Parameters, and FED Parameters.
  • Under FED Parameters, select a FED and look through each APV's TD (trim diag). These should each be less than 100 when the Lasers are off.
  • You can also change the firmware version of the configuration here if needed. The FED might have updated firmware that your configuration will need to match.
  • To do this, change the firmware version to your desired version and hit apply.
  • Then go back to Strip Tracker Configuration Database Interface and select Create a major version and 'Apply' it.

Step 2. CRack Connection Run

  • NB. It is assumed that the connections do not exist in the database before the run is taken.
  • You will have problems if you try to take a connection run using a partition in which connections already exist.
  • If you need to do this, check with the experts.
  • It is possible to disable the connections in the database before attempting to take a connection run.
  • Before we can start the connection we need to make sure the FEDs are configured correctly.
    • Use the DB frontend stated above, and click on Database Parameter and choose your Partition.
    • Click on Configure Database then open FED Parameters.
    • Click on Edit Fed Mode Parameters, and make sure the Trigger = TTC, Mode = Scope and Read Out = VME are set correctly.
    • After all the parameters are set, make sure to save them as either a major or minor revision in the partition (under [FED Parameters], click on "Create a major/minor version").
  • Log in to the run control at http://cmstracker035:8080/rcms
  • Click on configuration chooser, then click on the partition name, e.g. CRACK_27-FEB-2012_3, then choose 02-CrackConnection
  • Initialize the configuration.
  • After the run is configured, start the run and collect 34 events. Click on the StorageManager to see the collected event information.
  • During the run we can look at what's going on with the Tracker Supervisor in the status display.
    • The trigger is sent to the Fed9U Supervisor, then to the Data Sender, and Finally to the StorageManager.
  • After collecting 34 events, Halt and Destroy the run.
  • Log in to xdaqtk@cmstracker029. Look for the data in /opt/cmssw/Data/closed/<RUN#>.
  • To run tkCommissioner: 1) source /opt/trackerDAQ/config/user.sh 2) export CONFDB=cms_tracker_tif3/xxxxxxxx@cms_sstracker
  • In tkCommissioner, select your partition (left hand column) and then the run (right hand column) and click on Analyze
  • Go to /opt/cmssw/Data/<Run#> and cat analysis_<RunNumber>_xxxxx.info.log
  • Information in the log file should be presented like this:

[FastFedCablingHistosUsingDb::connections] Summary of connections:
"Good" connections     : 5
"Dirty" connections    : 7
"Bad" TrimDAQ settings : 0
("Missing" connections : 0)
("Missing" APV pairs   : 0)
("Missing" APVs        : 0)
[FastFedCablingHistosUsingDb::connections] List of "dirty" connections:
FED:crate/slot/id/unit/chan/apv= -/-/52/6/1/0 FEC:crate/slot/ring/CCU/module/LLD/I2C= 2/2/8/1/23/3/0 DcuId= 0037fe6b DetId= ffffffff
FED:crate/slot/id/unit/chan/apv= -/-/52/6/2/0 FEC:crate/slot/ring/CCU/module/LLD/I2C= 2/2/8/1/23/1/0 DcuId= 0037fe6b DetId= ffffffff
FED:crate/slot/id/unit/chan/apv= -/-/52/6/8/0 FEC:crate/slot/ring/CCU/module/LLD/I2C= 2/2/8/1/26/1/0 DcuId= 00d7fe67 DetId= ffffffff
FED:crate/slot/id/unit/chan/apv= -/-/52/6/9/0 FEC:crate/slot/ring/CCU/module/LLD/I2C= 2/2/8/1/19/3/0 DcuId= 00defacb DetId= ffffffff
FED:crate/slot/id/unit/chan/apv= -/-/52/6/10/0 FEC:crate/slot/ring/CCU/module/LLD/I2C= 2/2/8/1/17/1/0 DcuId= 00dbfef1 DetId= ffffffff
FED:crate/slot/id/unit/chan/apv= -/-/52/6/11/0 FEC:crate/slot/ring/CCU/module/LLD/I2C= 2/2/8/1/17/3/0 DcuId= 00dbfef1 DetId= ffffffff
FED:crate/slot/id/unit/chan/apv= -/-/52/6/12/0 FEC:crate/slot/ring/CCU/module/LLD/I2C= 2/2/8/1/19/1/0 DcuId= 00defacb DetId= ffffffff

  • 7 dirty connections seemed "bad" to me, so I made another connection run. The next connection run came up with 8 dirty connections.
    • A dirty connection corresponds to a channel with a high light level below 800 ADC counts.
    • A channel with a bad trimDAC setting corresponds to one with a trimDAC value below 10 ADC counts.
    • Good, dirty and bad trimDAC connections will all be uploaded to the database. But dirty and bad trimDAC connections will need to be investigated by the connection team.
    • Any missing connections or devices will not be uploaded. "Missing connections" are devices identified in the crate scan that could not be connected to a FED channel (ie. DCU ID could not be extracted from the histogram for some reason - miscabling or broken fibre). "Missing APV pairs" or "Missing APVs" are devices from the DCU-DetID static table that were not found during the crate scan.
    • Make sure there are no missing connections or devices. If there are, you must document them fully in the elog.
    • The FEC crate scan results are used to construct the "control view" of the tracker (in a cabling object). This is compared with the DCU-DetID static database table, which allows the identification of devices missed by the FEC crate scan. These appear as missing devices in the connection run.
    • I did not find the warning.log file described here. The log file warning.log contains brief summaries of the "dirty" connections and the "bad trimDAC" settings, as well as detailed summaries of each problem channel. The "location" (crate, slot, ID, unit etc.) is provided for each problem channel, so that the individual histograms can be identified easily. This is done automatically in kLogRead, as implemented in tkCommissioner.
    • I did not find the info.log file described here. info.log also contains information about the number of valid channels. A valid channel is one for which a connection has been established, regardless of the quality of the connection. Typically, an unconnected channel will be flagged with the error "SmallRangeInRawData". The summaries for each problem (unconnected, dirty or bad trimDAC) channel are also available in info.log.
    • I did not find the debug.log file described here. In debug.log, full details of the analysis can be found, including a complete listing of the dummy cabling map. It also gives the summaries for all channels, good or bad. There is also a dump of enabled FED channels, which are channels that have been enabled in the FED descriptions, based on the results of the connection run. All channels are disabled initially. Lists of all good, dirty, bad trimDAC and missing channels are also given.
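
As an illustration of reading the summary block shown above (the log lines are taken from the example output; the scratch file path is a placeholder for the real analysis info.log), the connection counts can be pulled out with awk:

```shell
# Write a sample of the summary block to a scratch file
# (placeholder path; in practice use the analysis info.log).
log=/tmp/connection_summary.log
cat > "$log" <<'EOF'
[FastFedCablingHistosUsingDb::connections] Summary of connections:
"Good" connections     : 5
"Dirty" connections    : 7
"Bad" TrimDAQ settings : 0
EOF
# Extract the dirty-connection count (the field after the colon)
dirty=$(awk -F': *' '/"Dirty" connections/ {print $2}' "$log")
echo "dirty=$dirty"
```
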
  • An example of a good channel was also found in the same log file:
[FastCablingAnalysis] Monitorables (65535 means "invalid"):
 Crate/FEC/Ring/CCU/Mod/LLD     : 2/2/8/1/19/1
 FedId/FeUnit/FeChan/FedChannel : 52/6/12/24
 FecKey/Fedkey (hex)            : 0x10a00484 / 0x0000d1b0
 DcuId (hex/dec)                : 0x00defacb /   14613195
 DetId (hex/dec)                : 0xffffffff / 4294967295
 DCU id extracted from histo     : 0x00defacb
 LLD chan extracted from histo   : 1
 "High" level (mean+/-rms) [ADC] : 690.87 +/- 2.59
 "Low" level (mean+/-rms)  [ADC] : 37.94 +/- 0.74
 Median "high" level       [ADC] : 690.78
 Median "low" level        [ADC] : 37.77
 Range                     [ADC] : 652.93
 Mid-range level           [ADC] : 364.40
 Maximum level             [ADC] : 695.37
 Minimum level             [ADC] : 36.99
 isValid                         : true
 isDirty                         : true
 badTrimDac                      : false
 Error codes (found  0)          : (none)
    • The FED ID/channel information will always be set, regardless of whether a connection was established or not.
    • The DCU ID (and hence the Det ID and the location in the control system) will only be given for channels which were successfully connected. Otherwise they appear as 0xFFFFFFFF.
    • Three flags are given to show if the channel was valid, dirty or had a bad trimDAC setting.
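
The dirty / bad-trimDAC criteria quoted earlier (high light level below 800 ADC counts → dirty; low level below 10 ADC counts → bad trimDAC) can be checked against the example channel's measured levels — a sketch, using the values from the printout above:

```shell
# Levels taken from the example "good channel" printout above (ADC counts)
high=690.87
low=37.94
awk -v h="$high" -v l="$low" 'BEGIN {
  printf "isDirty=%s badTrimDac=%s\n",
         (h < 800 ? "true" : "false"),
         (l < 10  ? "true" : "false")
}'
# prints: isDirty=true badTrimDac=false
```

This agrees with the isDirty/badTrimDac flags in the printout, even though the connection itself is valid.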

  • An example of a bad connection is:
[FastCablingAnalysis] Monitorables (65535 means "invalid"):
 Crate/FEC/Ring/CCU/Mod/LLD     : (invalid)
 FedId/FeUnit/FeChan/FedChannel : 124/3/6/66
 FecKey/Fedkey (hex)            : 0x3fffffe8 / 0x0001f0d8
 DcuId (hex/dec)                : 0xffffffff / 4294967295
 DetId (hex/dec)                : 0xffffffff / 4294967295
 DCU id extracted from histo     : 0xffffffff
 LLD chan extracted from histo   : 65535
 "High" level (mean+/-rms) [ADC] : 65535.00 +/- 65535.00
 "Low" level (mean+/-rms)  [ADC] : 65535.00 +/- 65535.00
 Median "high" level       [ADC] : 65535.00
 Median "low" level        [ADC] : 65535.00
 Range                     [ADC] : 0.82
 Mid-range level           [ADC] : 38.12
 Maximum level             [ADC] : 38.53
 Minimum level             [ADC] : 37.71
 isValid                         : false
 isDirty                         : false
 badTrimDac                      : false
 Error codes (found  1)          : SmallRangeInRawData 

  • You should check the following summary histograms:
  • cd into /opt/cmssw/Data/<Run#> and execute 'root -l SiStripCommissioningClient_00082160.root' to find these histograms.
  • SummaryHisto_Histo2DSum_FastCabling_ReadoutView_ConnectionsPerFed (1D): Despite the "2D" in its name, this is a 1D histogram. The X-axis is the FE unit number and the Y-axis is the number of good+dirty connections on that FE unit. For my run, I only see 1 bin with a value of 12, indicating 1 FE unit with 12 connections on it (12 is the maximum per FE unit).
  • SummaryHisto_Histo1D_FastCabling_ReadoutView_HighLightLevel (1D): The saturation light level for all connected channels (average of the saturation level observed in the DCU bit pattern). A large spike should be visible at 1023 counts (corresponding to the 10-bit ADC). Ideally all channels will have a high light level of 1023 counts, but it is possible that a small number will be slightly lower. Any channel with a high light level below 800 ADC counts is flagged as dirty. The number of entries corresponds to all available FED channels. The integral is the number of valid connections and the overflow corresponds to the unconnected channels (65535).
  • SummaryHisto1D_FastCabling_readoutView_LowLightLevel (1D): The low light level for all connected channels (average of the low light levels observed in the DCU bit pattern). The low light level should be approximately the trimDAC setting determined during the FED crate scan, around 30-40 ADC counts. A value below 10 ADC counts is flagged as a bad trimDAC setting.
  • SummaryHisto_Histo2DScatter_FastCabling_ReadoutView_HighLightLevel (2D): The saturation light level for all connected channels versus FED FE unit. Useful for debugging.
  • SummaryHisto_Histo2DScatter_FastCabling_ReadoutView_LowLightLevel (2D): The low light level for all connected channels versus FED FE unit. Useful for debugging.

For each channel there is also an expert-level histogram that shows the DCU ID and the LLD channel as a bit pattern in high and low light levels. Bins 0-31 correspond to the DCU ID, while bins 32-33 encode the LLD channel. These histograms should be checked for any channels flagged as dirty or as having a bad trimDAC setting. NB. The LLD channel is numbered 1-3, but appears as 0-2 in the histogram.

  • After examining the histograms I thought the run was good enough to upload into the analysis and HW database.
  • I attempted to do this using tkCommissioner, but it kept complaining that the analysis could not be run.
  • So I used /opt/cmssw/scripts/run_analysis.sh 82160 true true CR_DAY-MON-YEAR true false true instead, which completed without complaint.
  • To make sure that the connection run data was actually uploaded, go to cmstracker029.cern.ch:15000 and see that the FEDs are now connected to modules.

Step 3: Timing Run

  • Log in to http://cmstracker035:8080/rcms
  • Select the CRack partition and then click on 03-CrackTiming.
  • Initialize.
  • Configure the run.
  • Start the run and collect 4800 events. Then Halt and Destroy the run.
  • Run the analysis as usual through tkCommissioner, and check the following histograms.
  • If everything checks out, we can upload config to the database.
  • In the timing run we try to set the delay of each channel in the FED so that they are all synchronized. This is done by triggering each channel of the FED so it will send out a TickMark. The TickMark is the signal that the FED will send out to tell that there's a Frame of data coming. The timing run determines the delay required for each channel on the FED for this TickMark.
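
The per-channel adjustment described above can be sketched numerically: the system is aligned to the latest tick-mark edge, so each channel's required delay is its gap to that edge. The edge times below are made-up sample values in ns, purely for illustration:

```shell
# Made-up per-channel tick-mark edge times (ns), one per line
edges=/tmp/edges.txt
printf '%s\n' 101.2 99.8 103.4 100.5 > "$edges"
# First pass finds the latest edge; second pass prints each channel's delay
awk 'NR==FNR { if ($1 > max) max = $1; next }
     { printf "delay=%.1f\n", max - $1 }' "$edges" "$edges"
```

The channel already sitting on the latest edge gets a delay of 0; every other channel is delayed by its gap to that edge.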

The summary histograms are as follows:

  • SummaryHisto_Histo1D_ApvTiming_ControlView_TimeOfTickMarkEdge (1D): The measured position of the tick mark rising edge. The spread will be a few ns on the first timing run.
  • summaryHisto_Histo1D_ApvTiming_ControlView_RequiredDelayAdjustment (1D): The delay adjustment required to synchronise the system. On the first timing run, there will be a spread of a few ns.
  • SummaryHisto_Histo1D_ApvTiming_ControlView_TickMarkHeight (1D): The measured tick mark height. Once the system is synchronised (after the bias and gain scan), the mean should be about 640 ADC counts, but after the first timing run it will normally be lower. The tick mark height is about 8 MIPs and 1 MIP corresponds to 80 ADC counts, giving a mean of 640 ADC counts.
  • SummaryHisto_Histo2DScatter_ApvTiming_ControlView_TickMarkHeight (2D): The heights of successfully identified tick marks as a function of module.
  • SummaryHisto_Histo2DScatter_ApvTiming_ControlView_TimeOfTickMarkEdge (2D): The times of the rising edges of all the successfully identified tick marks as a function of module. The distribution will have a structure that reflects the control structure of the tracker. Differences should be observed between CCUs, as well as within CCUs.
  • SummaryHisto_Histo2DScatter_ApvTiming_ControlView_RequiredDelayAdjustment (2D): The applied timing adjustment as a function of module. Outliers observed in the previous plots will have a correspondingly large delay adjustment.

There are also expert-level histograms that show the reconstructed tick marks for each channel. These should be checked for any problem channels to see if the tick mark has been properly reconstructed.

Some points to bear in mind

The timing of the channels is adjusted to the "latest" tick mark edge. If there is a problem with the "latest" tick mark, then this will be detected by the analysis job and an error will be written in <run_number>_error.log.

If any channels have a missing tick mark because the rising edge has moved out of range, then the latency can be adjusted. This is done by opening the TrackerSupervisor from the status display and selecting Commissioning Setting. The value that needs to be changed is the APV reset latency. To make the tick mark move to the right, increase the value of the latency. To make it move to the left, decrease the value of the latency. If this is necessary, then make sure that you document what you did fully in the elog.

Step 4: Bias and Gain scan

Use the gainscan run control configuration to take the bias and gain scan in the normal way. It will consist of 2000 events. Use tkCommissioner to check the results in the normal way.

* I cannot find these summary histograms. There was only a series of OptoScan_Gain measurements for each channel of the FEDs in the client file. The summary histograms are as follows:

  • SummaryHisto_Histo1D_OptoScan_ControlView_MeasuredGain (1D): The distribution of the measured gain values for all channels. There is currently a bug in the histogram, such that the structure of the measured gain is not visible. The mean of the distribution should, however, be about 0.8 V/V. All channels should fall into the 0-1 V/V bin, although a small number of entries in the 1-2 V/V bin is acceptable, providing the mean is good.
  • SummaryHisto_Histo1D_OptoScan_ControlView_TickHeight (1D): The measured tick mark heights, which should therefore have a mean of approximately 640 ADC counts. The spread should typically be less than 10%.
  • SummaryHisto_Histo1D_OptoScan_ControlView_LldGainSetting (1D): The distribution of the selected gain settings for all channels. The majority of channels should have a gain setting of 1. An excess at 3 indicates that many channels needed the highest possible gain, suggesting a problem, e.g. dirty fibres.
  • SummaryHisto_Histo1D_OptoScan_ControlView_LldBiasSetting (1D): The distribution of the selected bias settings for all channels. The mean value should be about 20-24.
  • SummaryHisto_Histo1D_OptoScan_ControlView_ZeroLightLevel (1D): The zero light level, basically the same as seen in the connection run. It should therefore be above 10 ADC counts; if below, it indicates a problem with the trimDAC calibration performed during the FED crate scan.
  • SummaryHisto_Histo2DScatter_OptoScan_ControlView_ZeroLightLevel (2D): The low light level versus CCU.

There are also three expert-level histograms for each channel. The first shows the baseline level on which the tick mark sits within the raw APV data stream as a function of bias setting; the point at which light is seen (lift-off) will be at a bias setting of about 20. The second shows the tick mark height as a function of bias setting; lift-off occurs earlier in this plot and saturation is correspondingly reached earlier. The measured gain is calculated from these two histograms. The third is the noise in the baseline sample as a function of bias setting.

Some Tricks

  • Manually convert a raw file to a source file.
  • It is convenient to copy sourcefromraw_cfg.py from /opt/cmssw/scripts into that run's directory, add the root files to the input, and cmsRun it.
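
A sketch of preparing that edit, assuming the cfg's input source takes the usual comma-separated "file:" list (a CMSSW convention; the run directory and raw file names below are invented samples, and the actual edit to sourcefromraw_cfg.py is still done by hand):

```shell
# Invented sample run directory with two empty stand-in raw files
dir=/tmp/run82160
mkdir -p "$dir" && cd "$dir"
: > RU0001.root
: > RU0002.root
# Build the comma-separated "file:..." list to paste into the cfg's input
files=$(ls *.root | sed 's/.*/"file:&"/' | paste -sd, -)
echo "$files"
```
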

  • Using sqlplus to change the scope length.
    • On xdaqdev@cmstracker029, set the $CONFDB variable and source /opt/trackerDAQ/config/oracle.env.bash afs
    • Select the part of the database you want to edit:

        select distinct scopelength, versionmajorid, versionminorid, readroute, fedmode, supermode
        from fedvalues, fed, partition
        where fedvalues.fedid = fed.fedid
          and fed.partitionid = partition.partitionid
          and partitionname = 'CR_20-MAY-2009_1'
        order by versionmajorid, versionminorid;

    • Change the scopelength:

        exec TestPackages.changeFedParameter('CR_20-MAY-2009_1','FEDVALUES','SCOPELENGTH','',71);

    • Save the changes:

        exec TestPkgFedDownload('CR_20-MAY-2009_1');

    • Type quit to exit sqlplus.

PSX server configuration

The PSX server enables communication between XDAQ and PVSS. It is installed on pccmstrdcs11 and runs automatically as a service. If you intend to migrate the PSX server to another computer, you need to make sure that PSX is installed and that the remote PVSS project is registered on the new computer. Afterwards you can untar the files "psx_CRACK.tar" and "psx_XY.tar", which are currently stored in the "/root" directory of pccmstrdcs11. These files contain all the configuration needed for the PSX server, so you only have to go into the directory "/etc/rc.d/init.d" and define the two PSX servers as services by typing "chkconfig --add psxCRACK" and "chkconfig --add psxXY". Afterwards you can start the services by typing "service psxCRACK start" and "service psxXY start".


PVSS/DCS restart if needed

  • Always make sure the C-Rack DCS is running and set to the correct voltage/ctrl.
  • Log in to the DCSTRACKER machine, pccmstrdcs1.cern.ch; the username should be on the machine and the password needs to be obtained from someone. Log in to the CERN domain.
  • Open the PVSS Console either from the start menu or the desktop. In the menu called "Project", choose CRack2009 and start the project. This will start the DCS console you're familiar with.
  • The CRack may show up as Dead. In this case, you have to restart the DNS server, which can be done by just clicking the DNS shortcut on the desktop.
  • Once everything has started, you should be able to "take" the CRack and operate it normally.

Below is a copy of P5's Tracker Commissioning from Jo Cole.

Shifter instructions for TKCC

The up-to-date version of the information that used to be on this page (written by RB) can be found here. This page now contains the latest version of the procedures for commissioning the tracker for non-expert users.

Available pool of "expert shifters"

  • Alessandro Giassi
  • Erik Butz
  • Derek Strom
  • Matthew Chan
  • Jonathan Fulcher
  • Francesco Palmonari
  • Steven Lowette
  • Stefano Mersi

General Instructions

This section contains general instructions that might be needed at any point during the checkout procedure. You should read this before going to the section labelled Standard DAQ Checkout procedure.

Golden rules for the commissioning

In order to ensure that we minimize the number of hidden problems, the following rules have been set up:

  • NEVER disable a FED
  • During the run, the console output of the FEC must be looked at. Any FEC error must be understood, and the appropriate reaction must be taken.
  • After each run, a fine analysis of the procedure must be done to identify the outliers. Results must be documented in the elog, including the relevant plots.
  • Problems that can be fixed immediately should be fixed before going on with the next commissioning procedure. Not a single device/fibre should be disabled without an acknowledge by the DAQ expert in charge (so far, Stefano, Christophe or Laurent).

How to use tracker machines at P5

The tracker PCs at P5 are on the private CMS network, while the machines in the green barrack are on the standard network. This means that you need to make sure your computer and your browser are set up correctly. First of all, set up a tunnel using one of the head nodes (cmsusr0, cmsusr1, cmsusr2):

  • Open a terminal on the computer. Then type ssh -ND 1080 trackerpro@cmsusr1. This will make the terminal appear to hang.

In order to access the run control, you also need to make sure that your browser is set up correctly. The following instructions are for firefox:

  • Open Edit -> Preferences -> Connection
  • Select Manual Proxy connection (socks v5, host: localhost, port: 1080)
  • Go to the web address about:config, type dns in the filter field and switch the option network.proxy.socks_remote_dns to TRUE.

You will also have to use the head nodes to log on to the rack PCs. Simply log on to one of the head nodes as trackerpro and you will be able to log in to the tracker PCs. The tracker PCs are those actually connected to the tracker FECs and FEDs and generally have a name of the form vmepcs2bXX-YY.

How to use the run control

Make sure that your computer and browser are set up correctly. The run control is located at http://cmsrc-tracker:18000/rcms (user name = trackerpro). The standard procedure for taking a run is:

  • Check that there are no running configurations on the partition you want to use (TIB, TOB, TEC+, TEC-).
  • Click on Configuration chooser and find the configuration you are supposed to be using. Click on it and then press create. The main run control page will appear.
  • Press initialize.
  • Once initialization is complete, press configure. Once it is configured, press start to begin the run.
  • When the run has finished, press halt. Each commissioning run has a pre-defined number of events, so make sure that this number of events has been processed through the Storage Manager (the final stage of the event building process) before halting the run. The StorageManager can be accessed via the status display. The number of events processed during earlier stages of event building can be monitored via TrackerSupervisor, FUEventProcessor and the FUResourceBroker.
  • Once the system is halted, perform the commissioning run analysis. Once you are happy with the results and they have been uploaded to the database, press destroy.

Examining the contents of the TKCC database

The main page for accessing the TKCC database is cern.ch/test-tkcheckout/page.html

Examining the contents of the online configuration database

A XDAQ application exists for viewing the contents of the online configuration database. Log on to vmepcs2b18-39 and then do the following:

     export TNS_ADMIN=/etc
     export CONFDB=<DB account>; ./tkconfigurationdb.sh.new $CONFDB
<DB account> can be determined from the xml file used to make the RCMS configuration or using the history command on vmepcs2b18-39.

Running the script makes the terminal appear to hang, as it produces log messages in the terminal. You can then open the webpage vmepcs2b18-39:15000 and click on the right-hand link labelled TkConfigurationDb. Select the link Database Parameters and then click the apply button to connect to the database. Select your favourite database partition from the drop-down list and press apply again.

To view the contents of the partition, select Configure Database. Modules and parameters gives a list of the modules with associated FECs and FEC-FED connections, while FED parameters gives a list of FEDs associated to the database partition.

Using ELOG

The ELOG for TKCC can be found at http://tacweb.cern.ch:8080/elog/Tracker+Commissioning+and+Operations/ You will need to make an account for this ELOG if you do not already have one. For EVERY run that you take, you should submit a report giving a detailed summary of the run, including the results and plots of all the useful summary histograms.

Analyzing the results of a commissioning run

Open tkCommissioner on vmepcs2b18-39.

Old instructions (just in case!)

Although a number of runs must be taken during the commissioning process, the procedure for analyzing the results is the same, even if different output histograms are produced each time. The analysis is performed on trackerpro@vmepcs2b18-39. Go to the directory /opt/cmssw/shifter. There you will find a script called run_analysis.sh.

The script is used as follows:

./run_analysis.sh <run_number> <HWDbFlag> <AnalysisDbFlag> <DB_Partition_Name> <UseClientFile> 

  • run_number is self-explanatory. You can find the run number on the main run control page. Note that you will lose the run number if you destroy the configuration, so make sure you make a note of it.
  • HWDbFlag If set true, the analysis job will automatically upload the hardware configuration results to the online configuration database. When you run this script for the first time for each run, you must explicitly set the flag to FALSE. The script creates the necessary cfg file and runs the jobs using cmsRun. The output of the run is written to the $SCRATCH directory defined in the script. This should be set to /opt/cmssw/Data/ at P5.
  • AnalysisDbFlag If set true, the analysis job will automatically upload the analysis results to the online configuration database. As with the previous flag, when you run this script for the first time for each run you must explicitly set the flag to FALSE.
  • DB_Partition_Name This should be given as the DB partition name that was generated from the TKCC DB and was used in the RCMS configurations. Make sure you get it right, as it will be used to upload the results to the online configuration database.
  • UseClientFile This should be set to false normally. It allows the analysis to be performed using the Client root file, rather than the Source file, so can only be set true if the analysis has already been run at least once. For the moment, always set this flag to false, although in future this option may be brought into use.

For each run the contents of the log files will be written to the $SCRATCH directory, along with the ROOT files. The SCRATCH directory is set to /opt/cmssw/Data/<PartitionName>/<RunNumber> . The contents of both should be examined, as described in the following sections. If the results are good, run_analysis.sh should be run again, this time with both DB flags set true. The contents of the online configuration database should then be checked to ensure that the results have been successfully uploaded.
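
The two-pass flow above can be sketched as follows. The helper functions are hypothetical, not part of the official tools; they only print the command line so the flag order (run number, HWDbFlag, AnalysisDbFlag, partition, UseClientFile) can be checked before running for real:

```shell
# Hypothetical wrappers around run_analysis.sh (they echo rather than run).
# First pass: both DB flags must be false
first_pass_cmd() {
  echo "./run_analysis.sh $1 false false $2 false"
}
# Second pass: upload to both the hardware and analysis databases
upload_cmd() {
  echo "./run_analysis.sh $1 true true $2 false"
}
first_pass_cmd 82160 CR_20-MAY-2009_1
upload_cmd 82160 CR_20-MAY-2009_1
```
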

The output of a commissioning run analysis

Click view results in tkCommissioner; this will open kLogRead.

More detailed information for understanding the output

Each run produces a root file with a name of the form SiStripCommissioningSource_<run_number>_<ID>.root. On a small system only one file will be produced. However, on a large system, there will be many Filter Units and each will produce a separate source root file. The analysis job collates all these files to produce a single output file SiStripCommissioningClient_<run_number>.root. This contains top-level summary histograms, as well as containing expert histograms for each individual channel. The analysis job also produces several log files: info.log, error.log, debug.log and warning.log.

The first thing to check after running the analysis job is error.log. This will show if there were any general problems, e.g. accessing the database (even if you are not uploading the results, information will be downloaded from the online configuration database). warning.log contains more detailed information about individual channels. info.log contains a summary of the job.

The output root file has a standard structure and can be studied using the TBrowser in ROOT. The top level contains a variable to tell you which version of CMSSW was used to produce the plot, plus a DQMdata folder, which contains a Collate folder. Inside this is another folder called SiStrip. Inside this, there are two variables: the run number and the run type, plus another folder. This folder tells you which "view" has been chosen for the plots. ControlView means that all the histograms are done assuming that the basic monitoring object is a control channel. ReadoutView means that the basic monitoring object is a readout (FED) channel. Most commissioning runs are analyzed in the ControlView; the exception is the connection run, which is done in ReadoutView, since its purpose is to associate FED channels with the control system.

This last folder contains the actual histograms. For each run the top level contains a series of summary histograms, which must be checked (see details below). There is also a directory structure that leads you to the individual plots for each channel (aka expert-level histograms).

Standard DAQ Checkout procedure

This section contains the instructions for checking out one or more cooling loops. You should have read the section entitled General Instructions before proceeding with checkout.

Selecting cooling loops for commissioning

This section is no longer required, as the first step is now the DCU/PSU map. However, the commands described in this section are generally useful in the event of problems with the DCU/PSU map or crate scan (and at other times, when you think there might be a FEC problem).

Use the TKCCDB to identify which machine the FEC(s) you are interested in are connected to. Log on to that machine and execute the following commands (make sure the ring is powered before you start):

  • To reset the crate: ProgramTest.exe -crateReset
  • To see which rings are available: ProgramTest.exe -fecver

This command outputs the FEC hardware ID, the FEC slot.ring, the mFEC version, the VME version and the trigger version (SR0). The rings that are usable will be marked with "<<<" and will have SR0 = 4c90.

  • To reset a particular ring: ProgramTest.exe -fec <slot> -ring <ring> -reset. In the output the value of Status Register 0 should remain 4c90.
  • To check the status of the CCUs connected to the ring: ProgramTest.exe -fec <slot> -ring <ring> -scanring
  • To check the modules/devices attached to a ring: ProgramTest.exe -fec <slot> -ring <ring> -scantrackerdevice
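The ring-status rule above (usable rings are marked "<<<" and show SR0 = 4c90) can be sketched as a small parser. This is an illustrative Python sketch only: the key=value output format used in `sample` is an assumption for the example, not the real ProgramTest.exe output, which should be read by eye as described above.

```python
# Sketch: pick out usable FEC rings from -fecver style output.
# Only the rule from the text is real: a usable ring has SR0 == 0x4c90.
# The line format below is hypothetical, for illustration.

def usable_rings(lines):
    """Return (slot, ring) pairs whose SR0 field equals 0x4c90."""
    good = []
    for line in lines:
        fields = dict(f.split("=") for f in line.split() if "=" in f)
        if "slot" in fields and "ring" in fields and "SR0" in fields:
            if int(fields["SR0"], 16) == 0x4C90:
                good.append((int(fields["slot"]), int(fields["ring"])))
    return good

# Hypothetical example output lines:
sample = [
    "fecid=0x1a slot=4 ring=7 mfec=2.0 vme=1.0 SR0=4c90",
    "fecid=0x1b slot=4 ring=8 mfec=2.0 vme=1.0 SR0=0000",
]
print(usable_rings(sample))  # -> [(4, 7)]
```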

Generating the Database partition and the RCMS configurations

To generate the RCMS configurations, do the following:

  • log on to cmsusr1 as trackerpro
  • change to autotest directory
  • run the script ./generateConfTKCC
All the RCMS configurations will then be inserted into the DuckCAD database. They are then migrated to the RCMS database using the configurator, as follows:
  • Log in to cmsusr1 as trackerpro.
  • Then run the configurator using the command: java -jar manager.jar.
  • Click on the satellite dish symbol to connect to the oracle database.
  • Then select trackerpro user from the right hand list.
  • Then select the migrator tab from the top of the window.
  • Click the satellite dish symbol to connect to the DuckCAD database.
  • Then find the correct folder in the top right hand list ( DuckCAD configuration chooser). For checkout the folder is TKCC/<PartitionName> . Inside there will be a set of configurations called <PartitionName> <RunType>.
  • In the bottom left window ( Destination: RS3 database), find the folder TKCC and inside create a new folder using the new folder button and label it with the partition name. Select one of the configurations you want to migrate and then select the folder you just created and press the green down arrow in the middle of the window to migrate it. It will ask you to add a comment; just press OK.

Repeat this procedure for each configuration you want to migrate. You should migrate all the configurations that start with the partition name in the DuckCAD database folder into the RS3 database folder.

Now follow the instructions to use the run control.

Commissioning Runs

The details of each of the commissioning runs are given below. The LV must be ON for all commissioning runs. The second pedestal run should be taken with the HV ON as well. Details of which histograms to check and what they should look like for each run are given below.

For each standard run (connection run onwards), once the run has successfully completed, halt the run and, when it has halted, run the CMSSW analysis job using tkCommissioner. Check all the bad channels and record the problems in the elog. You should also check the summary histograms in the output ROOT file.

Once you are happy with the results of the run, update the configuration and then destroy it.

If you encounter problems, make sure that you document them fully in the elog. You should, however, continue regardless of problems, unless the problems actively prevent you from continuing. If you cannot continue, contact the appropriate expert.

DCU/PSU Map

To take the DCU/PSU map, create and then initialize the DcuMap RCMS configuration. Once initialized, configure. This will take a long time and, once it is configured, the DCU/PSU map is complete. If there are no errors in the FECSupervisor job control log file, then the run worked. During the run, the power supplies will be switched on briefly, so make sure the DCS shifter knows what you are doing, so they can watch. Once the scan is complete, halt the run. You can also watch the log file from the FECSupervisor in real-time by logging on to vmepcs2b18-39 and executing the command: ./watch_xdaq_app_log.sh <FEC-PC> <ProcID>, where <FEC-PC> is the name of the machine on which the FECSupervisor is running and <ProcID> is the process ID. Both of these pieces of information can be seen on the status display for the run.

Once the run has configured, check the TKCC database. Go to the main page and select the link DAQ. Press the List Partition List button and select your partition by following the link. Check the table labelled full summary. All the entries should be green, apart from the DOHM DCUs, which will be red and labelled bad missing. This is expected. If any other DCUs are red, make a note of these in your elog report so that they can be investigated.

FED and FEC crate scans

Before starting the crate scans, ask the DCS shifter to switch on the LV for the partition you are using.

Create the CrateController run configuration, initialize, and open each CrateController. Go to CrateController0 and configure. Once it is configured, ask the DCS to switch off the detector. Once everything is off, you can configure CrateController1. When CrateController1 is configured, continue with the others. If configure completes without error, then both the FEC and FED crate scans should have completed successfully. Check in the online configuration database that all the expected FEDs and FECs are there. If so, you can halt, destroy and continue. If not, open the CrateController application for the FECs and FEDs from the status display and see what the information box on the state machine says. This may provide some useful information. If the solution to the problem is not obvious, contact the expert shifter for advice.

For this run only, the online configuration database can be checked quickly as follows:

  • log on to trackerpro@cmsusr1
  • cd autotest
  • ./checkscan

If the crate scans have worked, the database partition name for both FEDs and FECs should be the correct one. If either is wrong, then a problem occurred during the crate scan. Once you are happy with the results of the crate scans, you should destroy the RCMS configuration.


Connection run

Use the Connection run control configuration in the configuration chooser to take the connection run. After initializing the configuration, take some DCU data. Set up the DCUFilter and FECSupervisor applications, as described in the section "Reading out DCU data" at the bottom of this page, and then leave the system without configuring for about five minutes. You can look at the DCU readings using the DCU monitoring webpage: select the time interval and press "submit query". The page contains an automatically-updating display of the temperatures.

After five minutes, continue the run in the normal way. It will consist of 34 events. NB. It is assumed that the connections do not exist in the database before the run is taken. You will have problems if you try to take a connection run using a partition in which connections already exist. If you need to do this, check with the experts; it is possible to disable the connections in the database before attempting to take a connection run.

Once you have taken the connection run, the results can be checked using the tkCommissioner as normal.

You can also check the log files by hand, in which case start with error.log. If there have been any general problems, e.g. database access, they should appear in this log file. If you think you have database problems, the first thing to check is the variable TNS_ADMIN. This can be defined as an environment variable, but it will be overridden by the default value in OnlineDB/SiStripConfigDb/data/SiStripConfigDb.cfi. At P5, it should be set to /etc. In info.log you will find information about the database connection, the input ROOT files and run information. It also contains a summary of the cabling object (see below). In the case of a connection run this is a dummy map because the connections are not yet known. In other runs, the summary will reflect the real cabling. It contains a connection summary table, which should look something like this:

[FastFedCablingHistosUsingDb::connections] Summary of connections: 
 "Good" connections     : 163
 "Dirty" connections    : 1
 "Bad" TrimDAC settings : 4
 "Missing" connections  : 0
 "Missing" APV pairs    : 0
 "Missing" APVs         : 0
A dirty connection corresponds to a channel with a high light level below 800 ADC counts. A channel with a bad trimDAC setting corresponds to one with a trimDAC value below 10 ADC counts. Good, dirty and bad trimDAC connections will all be uploaded to the database, but dirty and bad trimDAC connections will need to be investigated by the connection team. Any missing connections or devices will not be uploaded.

"Missing connections" are devices identified in the crate scan that could not be connected to a FED channel (i.e. the DCU ID could not be extracted from the histogram for some reason - miscabling or a broken fibre). "Missing APV pairs" or "Missing APVs" are devices from the DCU-DetID static table that were not found during the crate scan. Make sure there are no missing connections or devices; if there are, you must document them fully in the elog.

The FEC crate scan results are used to construct the "control view" of the tracker (in a cabling object). This is compared with the DCU-DetID static database table, which allows the identification of devices missed by the FEC crate scan. These appear as missing devices in the connection run.

The log file warning.log contains brief summaries of the "dirty" connections and the "bad trimDAC" settings, as well as detailed summaries of each problem channel. The "location" (crate, slot, ID, unit etc.) is provided for each problem channel, so that the individual histograms can be identified easily. This is done automatically in kLogRead, as implemented in tkCommissioner.
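The classification rules quoted above (thresholds of 800 and 10 ADC counts) can be summarised in a short sketch. This is illustrative Python only; the function name and return values are invented for the example and are not part of the real analysis code.

```python
# Sketch of the connection classification rules from the text:
# - "dirty": connected, but high light level below 800 ADC counts
# - "badTrimDac": low light level below 10 ADC counts
# - "missing": no connection established (not uploaded to the database)

def classify_connection(high_adc, low_adc, connected=True):
    if not connected:
        return "missing"          # not uploaded to the database
    flags = []
    if high_adc < 800:
        flags.append("dirty")     # uploaded, but needs investigation
    if low_adc < 10:
        flags.append("badTrimDac")
    return ",".join(flags) if flags else "good"

print(classify_connection(893.63, 39.57))         # the good channel example
print(classify_connection(650.0, 5.0))            # dirty + bad trimDAC
print(classify_connection(0, 0, connected=False))
```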

info.log also contains information about the number of valid channels. A valid channel is one for which a connection has been established, regardless of the quality of the connection. Typically, an unconnected channel will be flagged with the error "SmallRangeInRawData". The summaries for each problem (unconnected, dirty or bad trimDAC) channel are also available in info.log.

In debug.log, full details of the analysis can be found, including a complete listing of the dummy cabling map. It also gives the summaries for all channels, good or bad. There is also a dump of enabled FED channels, which are channels that have been enabled in the FED descriptions, based on the results of the connection run. All channels are disabled initially. Lists of all good, dirty, bad trimDAC and missing channels are also given.

An example of a good channel summary is as follows:

[FastCablingAnalysis] Monitorables (65535 means "invalid"):
 Crate/FEC/Ring/CCU/Mod/LLD     : 2/4/7/96/16/3
 FedId/FeUnit/FeChan/FedChannel : 124/1/1/95
 FecKey/Fedkey (hex)            : 0x111d802c / 0x0001f044
 DcuId (hex/dec)                : 0x00def137 /   14610743
 DetId (hex/dec)                : 0x1a01e039 /  436330553
 DCU id extracted from histo     : 0x00def137
 LLD chan extracted from histo   : 3
 "High" level (mean+/-rms) [ADC] : 893.63 +/- 8.21
 "Low" level (mean+/-rms)  [ADC] : 39.57 +/- 0.65
 Median "high" level       [ADC] : 892.33
 Median "low" level        [ADC] : 39.24
 Range                     [ADC] : 854.05
 Mid-range level           [ADC] : 466.60
 Maximum level             [ADC] : 918.52
 Minimum level             [ADC] : 38.81
 isValid                         : true
 isDirty                         : false
 badTrimDac                      : false
 Error codes (found  0)          : (none)

The FED ID/channel information will always be set, regardless of whether a connection was established or not. The DCU ID (and hence the Det ID and the location in the control system) will only be given for channels which were successfully connected. Otherwise they appear as 0xFFFFFFFF. Three flags are given to show if the channel was valid, dirty or had a bad trimDAC setting. The error code, if there was a problem, is also given. An example of a bad connection is:

[FastCablingAnalysis] Monitorables (65535 means "invalid"):
 Crate/FEC/Ring/CCU/Mod/LLD     : (invalid)
 FedId/FeUnit/FeChan/FedChannel : 124/3/6/66
 FecKey/Fedkey (hex)            : 0x3fffffe8 / 0x0001f0d8
 DcuId (hex/dec)                : 0xffffffff / 4294967295
 DetId (hex/dec)                : 0xffffffff / 4294967295
 DCU id extracted from histo     : 0xffffffff
 LLD chan extracted from histo   : 65535
 "High" level (mean+/-rms) [ADC] : 65535.00 +/- 65535.00
 "Low" level (mean+/-rms)  [ADC] : 65535.00 +/- 65535.00
 Median "high" level       [ADC] : 65535.00
 Median "low" level        [ADC] : 65535.00
 Range                     [ADC] : 0.82
 Mid-range level           [ADC] : 38.12
 Maximum level             [ADC] : 38.53
 Minimum level             [ADC] : 37.71
 isValid                         : false
 isDirty                         : false
 badTrimDac                      : false
 Error codes (found  1)          : SmallRangeInRawData 

You should also check the following summary histograms:

Histogram Name Type Description
SummaryHisto_Histo2DSum_FastCabling_ReadoutView_ConnectionsPerFed 1D This shows how many good connections were found for each FED, plotted as a function of FED FE unit. Check the total number of entries in the histogram and look for any FEDs that don't have enough connections. A fully-connected FE unit will have 12 connections, but in some cases fewer than 12 is also OK. This depends on the control ring under investigation. Note that the entries include those flagged as dirty or having a bad trimDAC setting.
SummaryHisto_Histo1D_FastCabling_ReadoutView_HighLightLevel 1D The saturation light level for all connected channels (average of the saturation level observed in the DCU bit pattern). A large spike should be visible at 1023 counts (corresponding to the 10-bit ADC). Ideally all channels will have a high light level of 1023 counts, but it is possible that a small number will be slightly lower. Any channel with a high light level below 800 ADC counts is flagged as dirty. The number of entries corresponds to all available FED channels. The integral is the number of valid connections and the overflow corresponds to the unconnected channels (65535).
SummaryHisto_Histo1D_FastCabling_ReadoutView_LowLightLevel 1D The low light level for all connected channels (average of the low light levels observed in the DCU bit pattern). The low light level should be approximately the trimDAC setting determined during the FED crate scan. This should be around 30-40 ADC counts. A value below 10 ADC counts is flagged as a bad trimDAC setting.
SummaryHisto_Histo2DScatter_FastCabling_ReadoutView_HighLightLevel 2D This shows the saturation light level for all connected channels versus FED FE unit. Useful for debugging purposes.
SummaryHisto_Histo2DScatter_FastCabling_ReadoutView_LowLightLevel 2D This shows the low light level for all connected channels versus FED FE unit. Useful for debugging purposes.

For each channel there is also an expert-level histogram that shows the DCU ID and the LLD channel as a bit pattern in high and low light levels. Bins 0-31 correspond to the DCU ID, while bins 32-33 encode the LLD channel. These histograms should be checked for any channels flagged as dirty or as having a bad trimDAC setting. NB. The LLD channel is numbered 1-3, but appears as 0-2 in the histogram.
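Under one possible reading of the bit-pattern description above — a bin sitting at the "high" light level encodes a 1, bin 0 is the least significant DCU ID bit, and bins 32-33 encode the histogram's 0-2 LLD channel value in binary — the decode looks like the sketch below. The bin-to-bit convention is an assumption for illustration and should be checked against the real analysis code.

```python
# Illustrative decode of the 34-bin DCU ID / LLD channel bit pattern.
# Assumption: True = bin at high light level = bit 1; bin 0 = LSB.

def decode_bit_pattern(bins):
    """bins: list of 34 booleans (True = high level)."""
    dcu_id = sum(1 << i for i in range(32) if bins[i])
    lld_hist = (2 if bins[33] else 0) + (1 if bins[32] else 0)  # 0-2 in histo
    return dcu_id, lld_hist + 1  # LLD channels are numbered 1-3 in the text

# Reconstruct the good-channel example: DCU ID 0x00def137, LLD channel 3
bins = [bool(0x00DEF137 >> i & 1) for i in range(32)] + [False, True]
dcu, lld = decode_bit_pattern(bins)
print(hex(dcu), lld)  # -> 0xdef137 3
```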

Timing run (1)

Use the Timing run control configuration in the configuration chooser to take the timing run in the usual way. It will consist of 480 events. Use tkCommissioner to check the results in the normal way. The log files for all different run types contain similar information to those for the connection run. However, once the connections are in place, the summary of the connected channels will be the true connection map rather than the dummy seen in the connection run log files. In info.log the minimum and maximum values for the time of the tick mark rising edge are given. For the first timing run, the difference is likely to be large. Check the log files for any tick marks that have not been found; note the error string and check the expert-level histograms for the corresponding channels.

Once again, debug.log contains summaries of each channel. An example of a good channel is:

[ApvTimingAnalysis] Monitorables (65535 means "invalid"):
 Crate/FEC/Ring/CCU/Mod/LLD     : 2/4/7/1/17/1
 FedId/FeUnit/FeChan/FedChannel : 127/2/7/77
 FecKey/Fedkey (hex)            : 0x111c0444 / 0x0001fc9c
 DcuId (hex/dec)                : 0x007bf617 /    8123927
 DetId (hex/dec)                : 0x1a002058 /  436215896
 Tick mark: time of rising edge     [ns] : 710.42
 Last tick: time of rising edge     [ns] : 742.71
 Tick mark: time of sampling point  [ns] : 725.42
 Last tick: time of sampling point  [ns] : 757.71
 Last tick: adjusted sampling point [ns] : 774.71
 Delay required to synchronise      [ns] : 49.29
 Tick mark bottom (baseline)       [ADC] : 370.85
 Tick mark top                     [ADC] : 842.10
 Tick mark height                  [ADC] : 471.25
 isValid                                 : true
 Error codes (found  0)                  : (none)

The last tick is the reference tick for synchronization (see "points to bear in mind" below). The sampling point is chosen to be 15 ns after the rising edge of the tick mark, so as to ensure it is sitting on the plateau. The sampling point for the last tick is adjusted to make sure that the FED captures the data correctly. The delay that must be applied to each channel is therefore determined so that the tick mark sampling point matches the adjusted last tick sampling point. The dimensions of the tick mark are also given. The height should be around 640 ADC counts after the bias and gain scan has been performed, but will most likely be less on the first timing run (as in the example above).
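The arithmetic above can be checked directly against the numbers in the channel summary. This is just a restatement of the printout (the size of the last-tick adjustment itself is taken from the printout, not derived here):

```python
# Worked timing example using the numbers from the good-channel summary.
# Sampling point = rising edge + 15 ns (to sit on the tick mark plateau);
# delay to apply = adjusted last-tick sampling point - channel sampling point.

tick_rising = 710.42          # ns, this channel's tick mark rising edge
last_tick_adjusted = 774.71   # ns, adjusted sampling point of the last tick

sampling_point = tick_rising + 15.0       # 725.42 ns
delay = last_tick_adjusted - sampling_point
print(round(delay, 2))  # -> 49.29, matching "Delay required to synchronise"
```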

The summary histograms are as follows:

Histogram Name Type Description
SummaryHisto_Histo1D_ApvTiming_ControlView_TimeOfTickMarkEdge 1D This shows the measured position of the tick mark rising edge. The spread will be a few ns on the first timing run.
SummaryHisto_Histo1D_ApvTiming_ControlView_RequiredDelayAdjustment 1D This shows the delay adjustment required to synchronise the system. On the first timing run, there will be a spread of a few ns.
SummaryHisto_Histo1D_ApvTiming_ControlView_TickMarkHeight 1D This shows the measured tick mark height. Once the system is synchronised (after bias and gain scan), the mean should be about 640 ADC counts, but after the first timing run, it will normally be lower. The tick mark height is about 8 MIPs and 1 MIP corresponds to 80 ADC counts, giving a mean of 640 ADC counts.
SummaryHisto_Histo2DScatter_ApvTiming_ControlView_TickMarkHeight 2D This shows the heights of successfully identified tick marks as a function of module.
SummaryHisto_Histo2DScatter_ApvTiming_ControlView_TimeOfTickMarkEdge 2D This shows the times of the rising edges of all the successfully-identified tick marks as a function of module. The distribution will have a structure that reflects the control structure of the tracker. Differences should be observed between CCUs, as well as within CCUs.
SummaryHisto_Histo2DScatter_ApvTiming_ControlView_RequiredDelayAdjustment 2D This shows the applied timing adjustment as a function of module. Outliers observed in the previous plots will have a correspondingly large delay adjustment.

There are also expert-level histograms that show the reconstructed tick marks for each channel. These should be checked for any problem channels to see if the tick mark has been properly reconstructed.

Some points to bear in mind

The timing of the channels is adjusted to the "latest" tick mark edge. If there is a problem with the "latest" tick mark, then this will be detected by the analysis job and an error will be written in <run_number>_error.log.

If any channels have a missing tick mark because the rising edge has moved out of range, then the latency can be adjusted. This is done by opening the TrackerSupervisor from the status display and selecting Commissioning Setting. The value that needs to be changed is the APV reset latency. To make the tick mark move to the right, increase the value of the latency. To make it move to the left, decrease the value of the latency. If this is necessary, then make sure that you document what you did fully in the elog.

Bias and Gain scan

Use the gainscan run control configuration to take the bias and gain scan in the normal way. It will consist of 2000 events. Use tkCommissioner to check the results in the normal way.

The summary histograms are as follows:

Histogram Name Type Description
SummaryHisto_Histo1D_OptoScan_ControlView_MeasuredGain 1D This shows the distribution of the measured gain values for all channels. There is currently a bug in the histogram, such that the structure of the measured gain is not visible. The mean of the distribution should, however, be about 0.8 V/V. All channels should fall into the 0-1 V/V bin, although a small number of entries in the 1-2 V/V bin is acceptable, provided the mean is good.
SummaryHisto_Histo1D_OptoScan_ControlView_TickHeight 1D This shows the measured tick mark heights and should therefore have a mean of approximately 640 ADC counts. The spread should be typically less than 10%.
SummaryHisto_Histo1D_OptoScan_ControlView_LldGainSetting 1D This shows the distribution for the selected gain settings for all channels. The majority of channels should have a gain setting of 1. If you see an excess at 3, this indicates that many channels needed the highest possible gain, suggesting that there is a problem, eg. dirty fibres.
SummaryHisto_Histo1D_OptoScan_ControlView_LldBiasSetting 1D This shows the distribution of the selected bias settings for all channels. The mean value should be about 20-24.
SummaryHisto_Histo1D_OptoScan_ControlView_ZeroLightLevel 1D This shows the zero light level, basically the same as seen in the connection run. It should therefore be above 10 ADC counts. If below, it indicates a problem with the trimDAC calibration performed during the FED crate scan.
SummaryHisto_Histo2DScatter_OptoScan_ControlView_ZeroLightLevel 2D This shows the low light level versus CCU.

There are also three expert-level histograms for each channel. The first shows the baseline level on which the tick mark sits within the raw APV data stream as a function of bias setting; the point at which light is seen (lift-off) will be at a bias setting of about 20. The second shows the tick mark height as a function of bias setting; lift-off occurs earlier in this plot and saturation is correspondingly reached earlier. The measured gain is calculated from these two histograms. The third shows the noise in the baseline sample as a function of bias setting.

Timing run (2)

Take another timing run, using the same run control configuration as for the first run. Use tkCommissioner to check the results.

The summary histograms are as follows:

Histogram Name Type Description
SummaryHisto_Histo1D_ApvTiming_ControlView_TickMarkHeight 1D This distribution should have a mean of about 640 ADC counts (this is the tick mark height that corresponds roughly to a measured gain of 0.8 V/V).
SummaryHisto_Histo1D_ApvTiming_ControlView_TimeOfTickMarkEdge 1D This should be narrower than that observed in the previous timing run and have a width less than about 1 ns.
SummaryHisto_Histo2DScatter_ApvTiming_ControlView_TickMarkHeight 2D This distribution should be much flatter than that observed in the previous run.
SummaryHisto_Histo2DScatter_ApvTiming_ControlView_TimeOfTickMarkEdge 2D The distribution should be much flatter than that observed in the previous run and it should no longer show the control structure.
SummaryHisto_Histo1D_ApvTiming_ControlView_RequiredDelayAdjustment 1D This distribution should be narrower than that observed in the previous run and it should have a mean of about 25 ns.

VPSP scan

Use the vpspscan run control configuration to take this run in the usual way. It will consist of 4720 events. Use tkCommissioner to check the results.

The digital "0" level is roughly equal to the trim DAC value (30-40 ADC counts) plus a margin to allow for the effects of temperature fluctuations. The value of the digital "0" level is determined for a particular bias setting. It is taken to be the baseline lift-off plus 2-3 bias settings.

The baseline is defined as the median pedestal value, which is approximately the digital "0" level, plus a third of the tick mark height (~640 ADC counts, as set in the second timing run). The highest possible value for the baseline should be approximately the digital "0" level, plus the tick mark height, plus 5-10%. The lowest possible level should be around the digital "0" level.
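As a rough numeric check of the description above, assuming the nominal 640-count tick mark height and an illustrative digital "0" level of 45 ADC counts (the real value comes from the scan itself, and the 5-10% excess is taken here as 7.5% of the tick height):

```python
# Rough arithmetic behind the baseline window described in the text.
digital_zero = 45.0        # ADC: assumed trimDAC level (~30-40) plus margin
tick_height = 640.0        # ADC: set by the second timing run

baseline = digital_zero + tick_height / 3      # target median pedestal level
highest = digital_zero + tick_height * 1.075   # digital "0" + height + 5-10%
lowest = digital_zero                          # roughly the digital "0" level
print(round(baseline, 1), round(highest, 1), round(lowest, 1))
```

The resulting baseline (~258 ADC counts) is consistent with the 250-400 range expected for the BaselineLevel summary histogram.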

The definitions of the summary histograms are as follows:

Histogram Name Type Description
SummaryHisto_Histo1D_VpspScan_ControlView_ApvVpspSettings 1D This shows the distribution of VPSP settings that determine the level of the APV baseline. Make sure the mean of the distribution lies between around 30 and 50 and that there are no obvious outliers.
SummaryHisto_Histo1D_VpspScan_ControlView_BaselineLevel 1D This shows the baseline levels determined during the run. The majority of entries should lie approximately in the range 250 - 400, but a small number of higher entries is OK. Obviously low entries may indicate a problem.
SummaryHisto_Histo1D_VpspScan_ControlView_DigitalHigh 1D This corresponds to the highest possible level for the baseline for each channel. Check for obvious outliers.
SummaryHisto_Histo1D_VpspScan_ControlView_DigitalLow 1D This corresponds to the lowest possible level for the baseline for each channel. Check for obvious outliers.
SummaryHisto_Histo2DScatter_VpspScan_ControlView_ApvVpspSettings 2D This shows the VPSP settings versus channel number. The distribution should be flat.
SummaryHisto_Histo2DScatter_VpspScan_ControlView_BaselineLevel 2D This shows the baseline level versus channel number. Again, the distribution should be flat.

Pedestal runs

Use the pedestal run control configuration to take two runs in the standard way, one with only the LV on and the other with the HV on as well. Each will consist of 2040 events. Analyze the first run before taking the second. Use the tkCommissioner to check the results of both runs.

The definition of the summary histograms is below. All the quantities shown in the summary histograms are shown per APV pair. Each APV pair is connected to 256 strips and therefore the total number of strips can be determined by multiplying the number of entries in the histogram by 256. To convert noise in ADC counts into a number of electrons: 1 MIP = 80 ADC counts; 1 MIP = ~25000 electrons.
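The conversion above amounts to 312.5 electrons per ADC count; a quick sketch (the helper names are invented for the example):

```python
# Conversion quoted in the text: 1 MIP = 80 ADC counts ~ 25000 electrons.
ELECTRONS_PER_ADC = 25000 / 80          # = 312.5 electrons per ADC count

def noise_in_electrons(noise_adc):
    return noise_adc * ELECTRONS_PER_ADC

def total_strips(apv_pair_entries):
    # each APV pair reads out 256 strips
    return apv_pair_entries * 256

# The 2000-electron noise ceiling corresponds to 6.4 ADC counts:
print(noise_in_electrons(6.4))
print(total_strips(100))  # -> 25600
```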

Histogram Name Type Description
SummaryHisto_Histo1D_Pedestals_ControlView_NumOfDeadStrips 1D The distribution of the number of dead strips (low noise) per APV. There should be only a small number and there should be no big tail on the distribution. There should be << 1% dead strips (to be checked against the TIF paper), preferably in both runs, but definitely with the HV on.
SummaryHisto_Histo1D_Pedestals_ControlView_NumOfNoisyStrips 1D The distribution of the number of noisy strips per APV. This should also be a small number, < 1% of the total number of strips, again, preferably in both runs, but definitely with the HV on.
SummaryHisto_Histo1D_Pedestals_ControlView_StripNoise 1D The distribution of the noise level per strip. The binning is wrong, but with the HV on, the aim is to have no more than 2000 electrons/detector channel. Outliers should be noted. The profile versions of this histogram and the pedestals can be used to identify the location of outliers.
SummaryHisto_Histo1D_Pedestals_ControlView_StripPedestals 1D The distribution of pedestal values per strip. The mean should be approximately the value determined during the VPSP scan. The spread should be approximately 10% of the mean.
SummaryHisto_Histo2DScatter_Pedestals_ControlView_StripPedestals 2D The distribution of pedestal values per module - check for obvious outliers.
SummaryHisto_Histo2DScatter_Pedestals_ControlView_StripNoise 2D The distribution of noise versus APV - check for obvious outliers.
SummaryHisto_Histo1D_Pedestals_ControlView_NoiseMin(Max) 1D These are additional histograms for identifying outliers in the noise distributions. Each histogram has one entry per APV. The lowest(highest) noise value observed for each APV is added to the histogram.

The results of the two runs should be compared and the differences noted. The noise level will go down with the HV on (the HV here is 300 V).

Calibration scan

The calibration scan analysis is not integrated into the tkCommissioner.
Select the CalibrationScan run and start as usual. This time, several source files will be produced on each filter unit, one for each value of ISHA/VFS. To run the client, there is a script in DQM/SiStripCommissioningAnalysis/test. Invoke it as

./step1.sh runnumber partition ishalow ishahigh ishastep vfslow vfshigh vfsstep
That will run the client n times, once per commissioning source file. When this is done, a level-2 analysis has to be performed in an interactive ROOT session. Still in DQM/SiStripCommissioningAnalysis/test, do:
./step2.sh runnumber partition ishalow ishahigh ishastep vfslow vfshigh vfsstep
The resulting ishavfsScan.txt has to be uploaded to the db via tkConfigurationDb.
Details of the analysis procedure may be tuned in step2.C. That procedure is not automatic and should not be performed without the advice of an expert.
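The bookkeeping behind the step1.sh arguments can be sketched as follows; this is an illustrative reconstruction (function name and example values are hypothetical), showing that the client runs once per (ISHA, VFS) point of the scan grid:

```python
# Sketch of the ISHA/VFS grid implied by step1.sh's arguments (names hypothetical).
def scan_points(isha_low, isha_high, isha_step, vfs_low, vfs_high, vfs_step):
    """List the (ISHA, VFS) settings scanned; the client runs once per point."""
    isha_values = range(isha_low, isha_high + 1, isha_step)
    vfs_values = range(vfs_low, vfs_high + 1, vfs_step)
    return [(i, v) for i in isha_values for v in vfs_values]

# Example: ISHA 30..50 step 10, VFS 40..60 step 10 -> 3 x 3 = 9 client invocations
points = scan_points(30, 50, 10, 40, 60, 10)
print(len(points))  # -> 9
```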

IMPORTANT: After the db update, a new pedestal run must be done, since it affects the noise values.

Calibration

The calibration run is special in that it does not produce new settings to be uploaded to the db. It is rather a standalone measurement to be used offline for performance studies.
Offline "client" processing is also a lengthy operation, since fits are performed for each strip individually, but it can be done independently of any other commissioning operation.
To start the run, select Calibration and proceed as usual.
The run will produce 8 different source files for 8 different strip subsets. These source files have to be analyzed offline individually (this is the lengthy part).

Latency

Running the latency scan and latency scan analysis is not different from timing and pedestal runs. Just select Latency in a configuration built with an external physics trigger.
The run produces a single histogram for the whole partition. In the client debug.log, you will find the extracted parameters for PLL delays, FED delays, and latency, as well as a rough estimate of the precision achieved.

Reading out DCU data

In order to read out DCU data during a run, open all FecSupervisor applications. For each one, do the following:

  • Select "More parameters"
  • Find the tab called "DCU and Device work loop"
  • Check the box marked "DCU work loop used" and set the "Time between 2 DCU reads" to 30 (seconds).
  • Click apply

To change the default settings, for example the time interval between readings or the output destination (the default is the configuration DB), open the DCUFilter XDAQ application from the status display before accessing the FecSupervisor applications, change what you need, and finally click apply.

(Re)Analyzing raw data and root files

Converting from streamer (.dat) to EDM (.root) format

  • The output .dat file uses a format that can change from one CMSSW release to another, so conversion to EDM must be done
  • The conversion must be done within the same release as used by the DAQ (presently CMSSW_2_1_0_pre3)
  • The converted EDM file contains only the FEDRawDataCollection, as this DataFormat is guaranteed not to change between releases
  • Nota bene: once the EDM file is created, the raw event data can be analyzed in any CMSSW release
    • Creation and analysis of commissioning histograms can be done in an "offline" two-step process
  • A .cfg file that performs the conversion is given below:
process Convert = {

    source = NewEventStreamFileReader {
        untracked vstring fileNames = { "file:./USC.00036863.0041.A.storageManager.0.0000.dat" }   // <-- input .dat file
    }

    untracked PSet maxEvents = { untracked int32 input = -1 }   // <-- "-1" means analyze all events in file

    module anal = EventContentAnalyzer { }   // <-- utility module that prints summary of what is contained in Event

    module out = PoolOutputModule
    {
        untracked string fileName = "./USC.00036863.0041.A.storageManager.0.0000.root"   // <-- output EDM .root file
        untracked vstring outputCommands =
        {
            "drop *",   // <-- "drop" all DataFormat collections...
            "keep FEDRawDataCollection_*_*_*"   // <-- ...but keep FEDRawDataCollection
        }
    }

    path p = { anal }

    endpath e = { out }

}

Framework bug in reading EDM (.root) files after conversion from streamer (.dat) format

  • Some early pre-releases within the 2_0_0 cycle are unable to read the converted EDM file
    • The runtime error message complains of lumi section and event numbers being equal to zero
  • A possible fix is to check out the package below and rebuild:
cvs co -r V06-12-00 IOPool/Input
cvs update -r 1.117 IOPool/Input/src/RootFile.cc

Creating root files containing the commissioning histograms using a "DQM source"

  • The EDM .root file containing the raw event data is processed by a "DQM source" (defined by an EDAnalyzer)
  • The "DQM source" produces a root file containing the low-level commissioning histograms
  • The "DQM source" requires a cabling object from the EventSetup to correctly book and fill the histograms
  • The cabling object requires access to the configuration database
  • An example cfg file is given below:
process Source = {

    include "DQM/SiStripCommon/data/MessageLogger.cfi"

    include "DQM/SiStripCommon/data/DaqMonitorROOTBackEnd.cfi"

    include "OnlineDB/SiStripConfigDb/data/SiStripConfigDb.cfi"
    replace SiStripConfigDb.UsingDb   = true   // <-- should be true!
    replace SiStripConfigDb.ConfDb    = "user/password@account"   // <-- obtain details from expert!
    replace SiStripConfigDb.Partition = "my_partition"   // <-- must specify a detector "partition"
    replace SiStripConfigDb.RunNumber = 12345   // <-- a run number must be given 

    es_source FedCablingFromConfigDb = SiStripFedCablingBuilderFromDb {
        untracked string CablingSource = "UNDEFINED"   // <-- this should be replaced by "DEVICES" for a connection run!
    }

    source = PoolSource {
        untracked vstring fileNames = { "file:./USC.00036863.0041.A.storageManager.0.0000.root" }   // <-- input EDM .root file
    }

    untracked PSet maxEvents = { untracked int32 input = -1 }   // <-- "-1" means analyze all events in file

    include "EventFilter/SiStripRawToDigi/data/FedChannelDigis.cfi"
    replace FedChannelDigis.TriggerFedId = -1   // <-- don't change unless expert!

    include "DQM/SiStripCommissioningSources/data/CommissioningHistos.cfi"
    replace CommissioningHistos.CommissioningTask = "UNDEFINED"   // <-- run type taken from event data, but can be overridden

    path p = { FedChannelDigis, CommissioningHistos }

}

Analyzing the commissioning histograms using a "DQM client"

  • The "DQM source" .root file containing the commissioning histograms is processed by a "DQM client"
  • The client performs "histogram collation" in the case of multiple "DQM source" root files
  • The client then analyses all low-level histograms and extracts tuned hardware configurations and calibration constants
  • The client can be configured to upload the hardware configurations and/or calibration constants
  • An example cfg file is given below:

process DbClient = {

    include "DQM/SiStripCommon/data/MessageLogger.cfi"   // <-- optional config for message logger

    include "DQM/SiStripCommon/data/DaqMonitorROOTBackEnd.cfi"

    include "OnlineDB/SiStripConfigDb/data/SiStripConfigDb.cfi"
    replace SiStripConfigDb.UsingDb   = true   // <-- should be true!
    replace SiStripConfigDb.ConfDb    = "user/password@account"   // <-- obtain details from expert!
    replace SiStripConfigDb.Partition = "my_partition"   // <-- must specify a detector "partition"
    replace SiStripConfigDb.RunNumber = 12345   // <-- a run number must be given 

    include "IORawData/SiStripInputSources/data/EmptySource.cff"   // <-- "Empty" input source (as don't need event loop!)
    replace maxEvents.input = 2   // <-- only need 2 events (as don't need event loop!)

    module db_client = SiStripCommissioningOfflineDbClient
    {
        untracked string     FilePath       = "."   // <-- directory location of input DQM source root files!
        untracked uint32     RunNumber      = 12345   // <-- a run number must be given (SAME AS ABOVE!)
        untracked bool       UseClientFile  = false   // <-- can read histograms from a "DQM client" root file if necessary
        untracked FileInPath SummaryXmlFile = "DQM/SiStripCommissioningClients/data/summary.xml"   // <-- location of XML describing format of summary plots
        untracked bool       UploadHwConfig = false   // <-- Should be "false". DO NOT USE UNLESS EXPERT!
        untracked bool       UploadAnalyses = false   // <-- Should be "false". DO NOT USE UNLESS EXPERT!
    }

    path p = { db_client }

}

Using DQM applications

Online DQM

Two ways of accessing the DQM are intended to be available. One based on Lassi's GUI (implemented by Volker and Matthias) and one based on Suchandra's GUI (implemented by Laura). Currently, only Suchandra's GUI works. We hope to get Lassi's GUI working in the very near future.

Using Lassi's GUI (Volker & Matthias)

The online DQM consists of four separate components:

  • the Storage Manager (SM)
  • the consumer: This is a CMSSW job running with a DQMHttpSource, which listens to the running XDAQ application, picks up the produced histograms and ships them to the locally running applications.
  • the collector: This application "collects" the histograms shipped by the consumer.
  • the web server application: This finally visualises the histograms on the web.
The installation procedure is described here.

Warning: So far, the following procedure has been tested only for the CRACK system at the TAC!

It is run as described in the following:

  1. Make sure your environment is set up properly, e.g. with
          source /exports/slc4/CMSSW/scripts/setup.sh
          cd <CMSSW release area, where the VisMonitoring/DQMServer has been installed>
          eval `scramv1 ru -sh`
  2. Start the collector with
          DQMCollector
    The output should look like
          DaqMonitorROOTBackEnd: verbose parameter set to 1
          DaqMonitorROOTBackEnd: reference file name set to 
          DQM Server (aka Collector) started at port 9090
  3. Start the web server application:
    • check the configuration by typing
      visDQMControl show all from /exports/slc4/CMSSW/DQMGUI/config/server-conf.py
      The output should look like
      Backends:
        collector:
          type:        Collector
          params:      ['--listen 9091 --collector localhost:9090']
          label:       collector
        dqm:
          type:        Client
          params:      ['--listen 9092 --collector localhost:9091']
          label:       dqm
        dt:
          type:        Layout
          params:      ['/exports/slc4/CMSSW/DQMGUI/config/dt-layouts.py']
          label:       dt
      Server:
        port:          8030
        localBase:     cmstracker029:8030
        baseUrl:       /dqm/online
        serverDir:     /exports/slc4/CMSSW/DQMGUI/gui
        title:         CMS data quality
        serviceName:   Online
        services:      /exports/slc4/CMSSW/DQMGUI/config/dqm-services.py
        workspaces:    /exports/slc4/CMSSW/DQMGUI/config/online-workspaces.py
    • modify /exports/slc4/CMSSW/DQMGUI/config/server-conf.py , if necessary.
    • start the servers with
      visDQMControl start all from /exports/slc4/CMSSW/DQMGUI/config/server-conf.py
      The output should look like
      Starting backends: collector dqm
      Starting server at port 8030 in /exports/slc4/CMSSW/Development/DQM/gui
      At the same time, additional output of the collector should show up, like
       Added socket connection at localhost.localdomain, # of alive connections = 1
       Added client identified as IGUANA DQM Proxy at localhost.localdomain
       Sending node name: <SourceName;Collector>
       *** Warning! No monitoring objects available!
      The servers run in the background and can be stopped with
      visDQMControl stop all from /exports/slc4/CMSSW/DQMGUI/config/server-conf.py
      respectively restarted with
      visDQMControl restart all from /exports/slc4/CMSSW/DQMGUI/config/server-conf.py
  4. Spy the logs with
          tail -f /exports/slc4/CMSSW/DQMGUI/gui/*/log
  5. Check the configuration of the consumer in /exports/slc4/CMSSW/DQMGUI/config/OnlineDQMFromSM.cfg for:
    • correct SM in DQMHttpSource.sourceURL
    • correct collector in MonitorDaemon.DestinationAddress and MonitorDaemon.SendPort (these should match the "--collector" entry under Backends.collector.params in the web server application configuration shown above)
    • correct update frequency in DQMShipMonitor.period (see below: "Start run")
  6. Start run including SM:
    • Make sure that the following service is part of your actual "EventProcessor" configuration file (path is found in your *.xml run configuration file):
      service = FUShmDQMOutputService {
        untracked int32 initialMessageBufferSize = 1000000
        double lumiSectionsPerUpdate = 1.0
        bool useCompression = true
        int32 compressionLevel = 1
        untracked int32 lumiSectionInterval=20
      }
      The parameter FUShmDQMOutputService.lumiSectionInterval steers the size of the generated fake lumi sections. The parameter DQMShipMonitor.period mentioned above steers the number of lumi sections before the next update. This means that you get updated histograms every FUShmDQMOutputService.lumiSectionInterval * DQMShipMonitor.period events.
  7. Start consumer with
          cmsRun /exports/slc4/CMSSW/DQMGUI/config/OnlineDQMFromSM.cfg
    If you don't stop (kill) it, this job will die as soon as your run is destroyed in the RCMS.
  8. Finally, point your web browser to the web address specified by the parameters localBase and baseUrl (so, in the example shown above: http://cmstracker029.cern.ch:8030/dqm/online/). You should then be forwarded to the subdirectory session/XXXXXX, where XXXXXX stands for some random code. You should be able to see the histogram tree/histograms now. The page is updated automatically from time to time, but you can force an update by pressing the "refresh" button of your browser.
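The update-rate arithmetic from step 6 can be written out explicitly. The numeric values here are illustrative only: lumiSectionInterval = 20 as in the snippet above, plus an assumed DQMShipMonitor.period of 5:

```python
# Sketch of the histogram-update rate described in step 6 (values hypothetical).
def events_per_update(lumi_section_interval, period):
    """Events between successive histogram updates:
    FUShmDQMOutputService.lumiSectionInterval * DQMShipMonitor.period."""
    return lumi_section_interval * period

# e.g. fake lumi sections of 20 events, client update every 5 lumi sections
print(events_per_update(20, 5))  # -> 100
```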

Using Suchandra's GUI (implemented by Laura)

  • Log on the vmepcs2b18-39 in the standard way.
  • Execute the script /opt/cmssw/scripts/runDQM.sh. This script does not (and should not) terminate! If it has started successfully, you will see XDAQ log file output, with the last line saying "Ready".
  • Open a second window on vmepcs2b18-39 and execute the script /opt/cmssw/scripts/changeDQMState.sh. This takes one argument, which is the state you wish to change to. The first state is "configure". When configuration has completed, the first window will display the message "Finished configuring!"
  • The second state is "enable". If this succeeds, the first window will show data from the run being processed.
  • The histograms can then be viewed at http://server_name:40000/temporary/Online.html, where server_name is the name of the machine on which the DQM process is running (normally the one where the StorageManager is also running). Select the tab on the right-hand side labelled "Non-Geom View", then select the run type or "Hardware Error" from the dropdown list below. To retrieve the histograms, press "Get Selected Histo" and wait a while for them to appear on the left-hand side. You can configure the display to show different numbers of plots; if you move the mouse over a given plot and click, that plot will be displayed individually.
  • Once the run is over, in the second window, use changeDQMState.sh to first "stop" and then "destroy" the DQM process. Always wait for a state change to complete, before executing the next one!

Trouble-shooting guide

This is NOT an exhaustive list, but here are some pointers if things go wrong

  • If you are trying to initialize a configuration and the trigger state machine on the status display looks strange, i.e. there are xdaq executive and job control entries but none of the other applications have names, this is a known trigger issue. Please contact the expert shifter and inform them of the problem.
  • Generally, if you think there are problems, open the logReader xdaq application and press "read/refresh received logs" and a list of error messages will appear. Check the red entries and contact the expert shifter.
  • If you have problems with XDAQ executives hanging and it appears that JobControl is at fault (this can happen sometimes when destroying an RCMS configuration), then simply login to the offending machine and restart the XDAQ processes: sudo /etc/init.d/xdaqd restart

-- WingTo - 15 Apr 2009

Topic revision: r26 - 2013-05-10 - DerekAxelStrom
 