This TWiki gives a high-level overview of how to operate the sTGC DAQ and how to run baselines.

If you want to know more, check out my RongkunSandbox. You can also check whether a command below is an alias with

alias <cmd>

or look in ~/.bashrc for the function definitions.

If you see an error, first check the Q&A section; if that doesn't solve your problem, contact the experts.

Basic Setups

FELIX

ssh -XY nswdaq@pcatlnswfelix05.cern.ch

Make sure FELIX is set up correctly, which includes the following steps.

Check that the cards are mounted under /dev/flx*. You should see four cards: 0, 1, 2, 3. If not, start the flx driver manually by doing:

  1. sudo /etc/init.d/drivers_flx start 
  2. [after rebooting FELIX] Check the GBT alignment status with
    flx-info GBT; flx-info GBT -c 1
    Then do the following to initialize and align the phase between FELIX and the GBTx
    flx-init; flx-init -c 1; # do this for both FPGAs
  3. [after rebooting FELIX] Set up the elinks with one of these options: HOIP for readout of two wedges, HOIPSca for a baseline scan, HO or IP for single-wedge readout.
    1. elinkconfig_sTGC_191 <option> # an alias
  4. For a new sector: [after powering up the FEs and L1DDC] GBTx phase training. This is the phase between the GBTx and the FE boards.
    • For new wedges, create a corresponding directory

      mkdir /afs/cern.ch/user/n/nswdaq/public/config-files-sTGC/191_XXX
    • Go to the corresponding sector directory
      1. cd /afs/cern.ch/user/n/nswdaq/public/config-files-sTGC/191_XXX 
    • For a completely new wedge, copy ../config_gbtx_191_full.py to the corresponding sector directory.
    • Then run the training for the new wedge. Do this after you "Config" the FEs (see the operation on SWROD below)
      ./config_gbtx_191_full.py -t
  5. For an existing sector: GBTx phase uploading.
    1. Go to the corresponding sector directory
      1. cd /afs/cern.ch/user/n/nswdaq/public/config-files-sTGC/191_XXX 
    2. For existing wedges, all phase parameters are already tuned properly, so uploading them is enough. If you think the wedge is not stable, you can retrain.
      1. ./config_gbtx_191_full.py   #  this uploads the existing phases
  6. You are ready to go. Now you can run the two essential programs for operating FELIX: felixcore and OpcServer. Each server can run only one felixcore and one OpcServer. Use "ps aux | grep XXX" to check whether an instance is already running, and kill it with option -9 if there is one. Both commands should run without errors and without continuous messages; then you can proceed to the next step. If there is an error, stop both with <CTRL>+C and try this step again.
    1. In one terminal, do
      1. felixcore_191_sTGC_HOIP
    2. In another terminal, start the OPC server with the opc_xml_file sector_A12_Uncalibration_STG_191_ONLY_HOIP.xml for two wedges. If felixcore was started with ip=1, use sector_A12_Uncalibration_STG_191_ONLY_HOIP_FlxId1.xml instead.
      1. cd /opt/OpcUaScaServer/bin/; ./OpcUaScaServer <opc_xml_file>
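
The "ps aux | grep XXX" check in step 6 can also be scripted. The following is only a minimal Python sketch, not part of the existing tooling. The helper names (find_running, current_ps_output) are made up for illustration, and it assumes that the command lines of the running programs contain the strings "felixcore" and "OpcUaScaServer".

```python
import subprocess


def current_ps_output():
    """Return the text printed by `ps aux` on this machine."""
    return subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    ).stdout


def find_running(ps_output, names=("felixcore", "OpcUaScaServer")):
    """Map each name to the PIDs of processes whose command line mentions it.

    Assumption: `ps aux` prints 11 columns, with the PID in column 2 and
    the full command line in the last column.
    """
    found = {name: [] for name in names}
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        pid, command = fields[1], fields[10]
        if "grep" in command:
            continue  # ignore the grep process itself
        for name in names:
            if name in command:
                found[name].append(int(pid))
    return found
```

Calling find_running(current_ps_output()) on the FELIX machine should return empty lists for both names before you start a new felixcore or OpcServer.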

TTC - Alti

The TTC setup should be done automatically at power-up.

However, when you take a cosmic run with an external trigger, you need to unplug the NIM cable on L1A for the Alti to prevent it from sending L1A to FELIX. Plug it back in only after you click "Run".

If an issue points to the Alti, contact the experts.

SWROD

The SWROD machine is used to run the partition and possibly the baseline (which can also run on the FELIX machine or lxplus if there is an issue). Ideally no setup is needed here, but if you see glitches with the network, the partition (RootController issues), or AFS/EOS access, consider `sudo reboot now` as a possible recovery.

Here is the logic of the OKS database in brief. When you source a specific setup.sh file, you alias many commands by pointing them to a specific OKS db "partition" file. This file lives in muons/partitions, under the name defined in the setup.sh file. It includes many segment files, each defining a "segment" of the partition. The includes use relative paths; open those files for detailed OKS db information.

Segments - Alti

Here you define the trigger pattern sent in the TTC bitstream. Search for ".dat" and you will find a "pattern file", which is uploaded during the "Config" step of the partition. Modify this file to decide what to send. This segment runs on the sbcl1ct-191.cern.ch machine.

The most common use case is switching the test pulse on or off, for a pulser run or a noise run. You just need to comment out the line with TP (see the comments inside the file). To comment out a line correctly, its first character must be #; it cannot be a space.
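
The commenting rule above (the # must be the very first character) is easy to get wrong by hand. Below is a minimal Python sketch of it. The function name comment_out and the idea of finding the line through a marker string such as "TP" are assumptions for illustration; check the comments in the real pattern file before relying on any marker.

```python
def comment_out(lines, marker):
    """Comment out every line containing `marker`, with '#' in column 0.

    Lines without the marker are returned unchanged. Lines that are
    already commented but have spaces before the '#' are fixed so that
    '#' becomes the first character.
    """
    result = []
    for line in lines:
        stripped = line.lstrip()
        if marker not in line:
            result.append(line)
        elif stripped.startswith("#"):
            result.append(stripped)  # already a comment: drop leading spaces
        else:
            result.append("#" + stripped)
    return result
```

To use it on a file, read the lines with readlines(), pass them through comment_out, and write the result to a new file so that the original pattern file is kept for the next pulser run.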

Segments - Config

The front-end parameters are now defined in json files. In the segment file, search for "json" to see where the config file is located. The parameters in this file are sent to the FEs through FELIX by this segment.

By default this runs on the SWROD machine.

For each sector, the config needs to be taken from

/afs/cern.ch/user/n/nswdaq/public/sw/config-ttc/config-files/config_json/180

You need to merge the config json files from the two wedges. If the board names don't contain the HO/IP and L/R information, modify them so that they follow the files in the 191 directory (e.g. https://gitlab.cern.ch/atlas-muon-nsw-daq/config-files/blob/master/config_json/191/A12/wedge1B191_A12.json )

Merging by hand is potentially dangerous, as the common parts of the two files can differ. Please consider using the GUI described here: https://twiki.cern.ch/twiki/bin/view/Sandbox/Rongkun_FELIXGUI. Load the config with the "Add Config" check box checked, so that the json files are loaded additively.
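
To see why a manual merge is risky, here is a minimal Python sketch that merges two wedge configs and reports the entries that exist in both files with different content. It is not the FELIXGUI logic. It assumes that each file is a single json object whose top-level keys are either board names or common sections; the function names merge_configs and merge_files are hypothetical.

```python
import json


def merge_configs(first, second):
    """Merge two config dicts and list the conflicting top-level keys.

    Keys present in only one dict are copied over. For keys present in
    both with different values, the value from `first` is kept and the
    key is reported, so that a human can decide which one is right.
    """
    merged = dict(first)
    conflicts = []
    for key, value in second.items():
        if key in merged and merged[key] != value:
            conflicts.append(key)
        else:
            merged[key] = value
    return merged, sorted(conflicts)


def merge_files(path_first, path_second):
    """Load two json files from disk and merge them with merge_configs."""
    with open(path_first) as handle:
        first = json.load(handle)
    with open(path_second) as handle:
        second = json.load(handle)
    return merge_configs(first, second)
```

If the conflict list is not empty, the common parts of the two wedges disagree and the merged file should not be used before the differences are understood.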

For small sectors, HO (away from the interaction point) is pivot and IP (close to the interaction point) is confirm. For big sectors, it's reversed.

Below are the essential parameters needed for pulser data. For switching to baseline/cosmics, see "Caveats - must read".

You need to make sure that:

  • for all vmm: l0offset is 900, offset is 964 for pulser, bc rollover is 3564.
  • for all roc: BC offset is 900, rollover is 3564, and busy_enable_sroc0/1/2/3 = 0.
  • OpcServerIp is changed to the corresponding FELIX server.
Please save the file under /afs/cern.ch/user/n/nswdaq/public/sw/config-ttc/config-files/config_json/191, in the corresponding sector, for bookkeeping purposes.

Segments - SWROD

This segment assembles the data coming out of the ROCs. Each of the 4 sROCs in a ROC has a data path called an e-link, which is the unit of assembly.

This file defines the plugin, processor, etc. The most important thing to know is where the data is written. Scan through the file and you will find that it is under /opt/localdisk3/data/nswdaq.

By default this runs on the SWROD machine.

Operation

FELIX

Ideally, you don't need to touch FELIX at this step. But if FELIX keeps giving you junk output, there could be an issue, e.g. someone sneezed near your FE.

TTC - Alti

Most of the time, the Alti is controlled by the partition. It should only be used for test pulse (pulser data) or noise runs.

When you take a cosmic run with an external trigger, you need to unplug the NIM cable on L1A for the Alti to prevent it from sending L1A to FELIX. Plug it back in only after you click "Run".

Data-taking

Log in to our SWROD machine

ssh -XY nswdaq@pcatlnswswrod03

Then do the following:

cd /afs/cern.ch/work/n/nswdaq/public/tdaq-08-03-01/db/NSW_OKS_Test_DB_191/NSW_OKS_Test_DB
source setup_oks_NSW-191-TGC-SWROD.sh
run

If all goes smoothly, the partition GUI should open after a few minutes. If not, run "killit" in the directory and retry "run". If it still doesn't work, try deleting ipc_init_stgc.txt in the directory. In the worst case, reboot the SWROD.

With the GUI open, operation is quite intuitive.

"Initialize" sets up the software-level control over the network for the segments, which may run on different machines and are controlled centrally from the partition's machine.

"Config", then wait until the felixcore and OpcServer output settles and stops.

"Run", and wait until you're satisfied with the data statistics. Then you can do

"Stop", after which you can do "Run" again to take another run, or

"Unconfig" if you are changing something like config.json, pattern.dat...

"Shutdown" if you want to go home, if something goes wrong, or if you changed something in the OKS xml files.

If you change xml files while your partition is open, or if you disable/enable some of the components in a segment, you need to reload/commit.

Baseline & trimmers - the future official tool

Go to directory /afs/cern.ch/work/n/nswdaq/public/vlad/stgc_b191,

source setup.sh

and read the readme there for how to run it.

The readme can also be found here:

https://gitlab.cern.ch/atlas-muon-nsw-daq/NSWCalibration # to run it

https://gitlab.cern.ch/vplesano/nswcalibrationdataplotter # plotting

Please bug Vlad if there is an issue.

Baseline & trimmers - home-brew from Alex, modified by dozens

Go to /eos/atlas/atlascerngroupdisk/det-nsw-stgc/b191/191_quick_and_dirty_baselines

source setup.sh

then modify and run the file NSWConfiguration/dev/stgc_threshold_calib.sh accordingly.

This file defines the boards to be run.

Check the results at cern.ch/stgc-trimmer/ with the postfix "_191"

Parameter tuning

Tuning usually happens during a pulser run.

First of all, try rerunning the GBTx phase upload, and perhaps have an expert check elinkconfig and the config in the OKS db.

If you still see a problem, continue below.

The rate is zero!

If the swROD doesn't start recording data immediately (the rate is zero), some elinks are not sending any data at all. You can use netio_cat instead to grab the elinks directly from your FELIX server. Use the `felix-buslist -e` command to find out which port to grab.

For example, our felixcore currently runs with felix-id = 1, so the elinks are offset by 2^16 = 65536. We can use

netio_cat subscribe -H pcatlnswfelix05 -p 12350 -e raw -t 65544 -t 65552 -t 65560 -t 65569 -t 65928 -t 65944 -t 65953 -t 65608 -t 65624 -t 65633 -t 65992 -t 66000 -t 66008 -t 66017 -t 65672 -t 65680 -t 65688 -t 65697 -t 66056 -t 66072 -t 66081 -t 66120 -t 66128 -t 66136 -t 66145 -t 65736 -t 65752 -t 65761 -t 65800 -t 65808 -t 65816 -t 65825 -t 66184 -t 66200 -t 66209 -t 66248 -t 66256 -t 66264 -t 66273 -t 65864 -t 65880 -t 65889

netio_cat subscribe -H pcatlnswfelix05 -p 12353 -e raw -t 72072 -t 72080 -t 72088 -t 72097 -t 72264 -t 72280 -t 72289 -t 72328 -t 72336 -t 72344 -t 72353 -t 72136 -t 72152 -t 72161

to grab data from the two possible ports. If some data is being sent, it means that at least some elinks are working and that the L1A and test pulse are being sent in the TTC stream, but some elinks are missing. You can use my FELIXGUI to figure out which elinks are present and which are missing. Start by checking the missing elinks, following the steps in 'If you don't see any hits on one vmm' below.
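
The elink numbers in the commands above follow from the felix-id offset. As a minimal Python sketch of that arithmetic (the function names global_elink and netio_cat_command are made up here, and "local elink" simply means the number without the felix-id offset):

```python
def global_elink(felix_id, local_elink):
    """Add the felix-id offset (felix_id * 2^16) to a local elink number."""
    return (felix_id << 16) + local_elink


def netio_cat_command(host, port, elinks):
    """Build a netio_cat subscribe command line for the given elinks.

    Only the options already used on this page appear:
    -H (host), -p (port), -e raw, and one -t per elink.
    """
    tags = " ".join(f"-t {elink}" for elink in elinks)
    return f"netio_cat subscribe -H {host} -p {port} -e raw {tags}"


# First four elinks of the first command above, for felix-id = 1.
example_elinks = [global_elink(1, local) for local in (8, 16, 24, 33)]
example_command = netio_cat_command("pcatlnswfelix05", 12350, example_elinks)
```

Here example_elinks evaluates to 65544, 65552, 65560 and 65569, which are the first four -t values of the first command. The port still has to come from `felix-buslist -e`.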

If you don't see any hits on one vmm

1) Change to bypass mode for the ROC on that board. In the sROC to VMM mapping, unmap all vmm except the problematic one. Go to 2)

2) Run the partition with only the corresponding elink open. You can use netio_cat on the FELIX machine to see if there's data. If there is data, go to 3); if not, go to 4)

3) This means you have an incorrect L1 matching between this vmm and the roc, and bypass mode lets the roc assemble the data without matching. Turn "bypass" off and adjust the "offset" a little; after some tuning, you should see data.

4) This means some parameter is wrong! The first thing to consider is the ROC to VMM 160 MHz clock. Under rocPllCoreAnalog there are ePllVmm0 and ePllVmm1, and ePllPhase160MHz_X in ePllVmmY refers to vmm Y*4 + X. For example, ePllPhase160MHz_2[4] in reg066ePllVmm0 is bit 4 for vmm 2, and ePllPhase160MHz_2[3:0] in reg069ePllVmm0 is bits 3 to 0 for vmm 2. It's a 5-bit parameter, and in the json it is represented in decimal. Now you know the principle; if you feel it's complicated, you can use the FELIXGUI to make the modification, too. Try changing it in steps of 5, alternating up and down. If you still don't see data after this, go to 5)

5) There's a chance that the roc internal clock phase is a bit off. Contact an expert if in doubt. You usually shouldn't need to come this far. Are you sure 4) doesn't work for you?
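
The bit bookkeeping in step 4) can be sketched in a few lines of Python. This is only an illustration of the arithmetic described there: the function names are invented, and the choice to skip values outside 0..31 during the scan (rather than wrapping around) is an assumption.

```python
def vmm_index(y, x):
    """vmm number addressed by ePllPhase160MHz_X in ePllVmmY."""
    return y * 4 + x


def split_phase(phase):
    """Split a 5-bit phase into (bit 4, bits 3:0), both as decimals."""
    if not 0 <= phase <= 31:
        raise ValueError("phase must fit in 5 bits (0..31)")
    return phase >> 4, phase & 0xF


def join_phase(high_bit, low_bits):
    """Rebuild the 5-bit phase from the [4] and [3:0] fields."""
    return (high_bit << 4) | low_bits


def scan_order(start, step=5, max_value=31):
    """List phases to try: start, then alternately up and down by `step`.

    Candidates outside 0..max_value are skipped (an assumption; the
    phase is not wrapped around here).
    """
    order = [start]
    k = 1
    while start + k * step <= max_value or start - k * step >= 0:
        for candidate in (start + k * step, start - k * step):
            if 0 <= candidate <= max_value:
                order.append(candidate)
        k += 1
    return order
```

For example, a phase of 21 is stored as 1 in the [4] field and 5 in the [3:0] field, and a scan starting from 15 tries 15, 20, 10, 25, 5, 30, 0 in that order.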

Some vmm has a much lower hit rate than others

This is a new phenomenon seen only in 191, not in 180. Try setting the CTRL_PHASE for the corresponding vmm; 3 is often a reasonable value. This controls the phase of the TTC bits, except the test pulse bit. The logic is similar to that of the 160 MHz clock parameter. You can also use the FELIXGUI to modify it. Once the hit rate is normal, you can tune the offset a little to bring the bcid to the middle.

Caveats - must read

Cosmics

Remember to change the offset to 933 (current ideal value) from the ~964 used for pulser runs.

Remember to change the source of L1A in the OKS db (search for it).

Remember to turn on neighbor mode for strips, to record clusters with ADC under threshold.

Remember to keep L1A unplugged from the Alti at all times until you start the run in the partition.

Turn off "sth" and "st" for all channels.

Noise run

Remember to modify the *.dat file defined in the OKS db. Comment out the Test Pulse bit (see the comments in the file) so that it only sends L1A.

Turn off "sth" and "st" for all channels.

Baseline & trimmers

To keep the OPC server stable, close all the data elinks. The elinkconfig_sTGC_191 command has the option "HOIPSca" for this.

Pulser data

The pulse is generated by the vmm itself, on request from the ROC. The roc requests it because the TP bit is in the TTC stream!

Therefore, you need to make sure that the generated pulse height is higher than your threshold. An empirical setting that works is

"sth" and "st" on for all channels, which amplifies the test pulses; threshold sdt_dac = 400; pulse height sdp_dac = 320.

Q&A

Below are some more errors that have been observed in 191:

Q: The OPC server is very unstable, especially when there is L1A after I click start.

A: Try these in sequence: restart the server --> re-upload the GBTx phases --> power cycle the L1DDC & FEs.

Q: I see a lot of CMEM errors when configuring elink/felixcore.

A: The reason is not understood. Several reboots might help.

Q: I don't see data after I click "Run" in the partition.

A: First kill felixcore and OpcServer, then redo the GBTx phase uploading from the FELIX setup. This could be a phase issue: phases might drift away from the correct values.

Many other errors are summarized in Rongkun_FELIXMan for FELIX.

Experts

Melike

Rongkun

Xu

Emil

Brigitte

-- RongkunWang - 2020-02-17

Topic revision: r10 - 2020-03-09 - RongkunWang
 