Disclaimer: If anything in this document is unclear, please let me know without hesitation! I may not be very clear in places, because English is not my first language and this is a very complicated system with complex ideas!

But if you think this document is too long and not worth reading, please just try to solve the problem yourself. Otherwise I will likely throw this document back at you.

This TWiki describes, at a basic level, how to operate the sTGC DAQ and how to run baselines.

If you want to know more, check out my RongkunSandbox. You can also check whether a command used below is an alias with

alias <cmd>

or look in ~/.bashrc for the function definitions.

If you see any error, please check the Q&A first; if that doesn't solve your problem, contact the experts.

Basic Setups

Overview (details can be found in other specific sections)

  1. Log in to the FELIX server and check that the GBT links are aligned. If not, check the FPGA connection, try flx-init, reboot the server, or even reload the FELIX firmware.
  2. Set up the proper elink configuration.
  3. Configure GBTx1 to the correct rate settings through IC (without felixcore and OpcServer running).
  4. Start felixcore and wait until it is stable.
  5. Configure GBTx2 to the correct rate settings through SCA/I2C (with felixcore running but without OpcServer).
  6. Start the OPC server and wait until it is stable. If there are a few small, non-recurring errors, you can restart the OPC server to see if they go away.
    1. It's always good to try configuring the frontend at this step, to see if it's successful and stable. The options -r, -t and -v configure the ROC, TDS and VMM, respectively.
  7. You can now check the monitoring and carry out baseline measurements. If you want to do a readout test, see the next steps.
  8. For readout to work, you need the correct GBTx phase between the ROC on the frontend and the GBTx on the L1DDC.
    • For a new wedge that has not been set up yet, you need to train the phase after you configure the ROC. This has to be done without felixcore and OpcServer running.
    • For an existing wedge, you can already upload the existing phases in the first step, when you configure GBTx1.
    • If you want to test readout through GBTx2, you need to train the GBTx2 phase as well. Currently phase uploading is not implemented for GBTx2. The sROC mapping in the frontend configuration also needs to be changed.
  9. For how to take data, refer to the "Operation" section.

FELIX

ssh -XY nswdaq@pcatlnswfelix05.cern.ch

Make sure FELIX is set up correctly, which includes:

  1. [after a power cycle of the FPGA] This can happen if power to the FELIX server box is cut entirely.
    1. Run progFELIX to reload the FPGA firmware of the FLX card over JTAG.
    2. We have multiple cards, and this command only programs the one that is connected to the JTAG cable. You need to plug the cable into every FLX card in use and run the command again.
    3. Reboot the server.
  2. [after rebooting FELIX] Check that /dev/flx* is mounted. You should see four cards: 0, 1, 2 and 3. If not, manually mount the FLX cards by doing:
    1. sudo /etc/init.d/drivers_flx start 
  3. [after rebooting FELIX] You can check the GBT alignment status with
    flx-info GBT; flx-info GBT -c 1
    If `flx-info GBT` shows "NO" for a connected fiber link, do the following to initialize and align the phase between FELIX and the GBTx:
    flx-init; flx-init -c 1; # do this for both FPGAs
  4. [after rebooting FELIX] Set up the elinks with option HOIP to read out two wedges for A14. (HOIPNoPT can be used to leave the Pad Trigger elinks closed. The following options are not available for A14: HOIPSca for baseline scan, and HO or IP for single-wedge readout.)
    1. elinkconfig_sTGC_191 <option> # an alias 
  5. For a new sector: [after powering up the FE and L1DDC] GBTx phase training. This phase is between the GBTx and the FE boards.
    • For new wedges, create the corresponding directory:
        mkdir /afs/cern.ch/user/n/nswdaq/public/sw/config-ttc/config-files/gbtx_config/191_sTGC_XXX 
    • Go to the corresponding sector directory:
      1. cd /afs/cern.ch/user/n/nswdaq/public/sw/config-ttc/config-files/gbtx_config/191_sTGC_XXX 
    • For a completely new wedge, copy the two files ../config_gbtx*.py into the corresponding sector directory.
    • For a new wedge, first do the following to configure the correct rate:
      ./config_gbtx1.py -i 
    • To take data with the SWROD, you need to train the phases. Do this after you "Config" the FEs (you can use a partition; see the SWROD operation section for how to do this) and after closing felixcore and OpcServer:
      ./config_gbtx1.py -t
  6. For an existing sector whose GBTx phases are already trained, just upload the GBTx phases as follows:
    1. Go to the corresponding sector directory:
      1. cd /afs/cern.ch/user/n/nswdaq/public/sw/config-ttc/config-files/gbtx_config/191_sTGC_XXX 
    2. For existing wedges all phase parameters are already tuned properly, so the following is enough. If you think the wedge is not stable, you can retrain with "-t".
      1. ./config_gbtx1.py   #  this uploads the existing phases
  7. Now you can run the two essential programs for operating FELIX: felixcore and OpcServer. Each server can run only one felixcore and one OpcServer. Use "ps aux | grep XXX" to check whether an instance is already running. If there is one, kill it with option -9 (first ask whether someone else is using it). Both commands should start without errors and without recurring error messages; then you can proceed to the next step. If there are errors, kill both programs with Ctrl+C or Ctrl+\ (not Ctrl+Z!) and try this step again.
    1. Open felixcore in one terminal with the following command, which runs with felixID=1 so that it does not interfere with other FELIX servers on the same network.
       felixcore_191_sTGC_HOIP 
      Alternatively, for testing 640 MHz, you need to run:
       felixcore_191_sTGC_HOIP640 
    2. To configure GBTx2, run
        ./config_gbtx2.py -i 
      with felixcore running and OpcServer not running. (For GBTx2 there is currently no way to save the correct phases, so if you want to read out through GBTx2, remember to configure the ROC first and then run with -t every time, as for GBTx1 training.)
      1. If felixcore crashes, just restart it and try uploading again.
      2. If the configuration does not reach 100% (it failed), this is sometimes because the GBTx on the two FLX logic cards cannot be trained at the same time (reason unclear). Use this script to configure one card only:
         config_gbtx2_card1.py 
      3. To be safe, you can restart felixcore before configuring each card and before going on to the next step.
    3. In another terminal, go to the appropriate directory for your sector:
      1.  cd /afs/cern.ch/user/n/nswdaq/public/sw/config-ttc/config-files/opc_xml/191/<sector> 
      2. Use the OPC xml file sector__Uncalibration_STG_191_ONLY_HOIP.xml for two wedges.
        • If felixcore was started with FelixId = 1, you need to use the xml with the _FlxId1 postfix. Inside, you can see that all elinks are shifted by 2^16 (0x10000 in hex).
      3.   OpcUaScaServer <opc.xml file>  

TTC - Alti

The TTC setup should be done automatically at power-up.

However, when you are taking a cosmic run with an external trigger, you need to unplug the L1A NIM cable on the Alti to prevent it from sending L1A to FELIX. Only plug it back in after you click "Run".

Please refer to the Q&A if you have problems.

Automatic transmitter setup at start-up: https://its.cern.ch/jira/browse/ATLNSWDAQ-108

If an issue points to the Alti, contact the experts.

SoftWare ReadOut Driver (SWROD)

The SWROD machine is used to run the partition and possibly baselines (baselines could also run on the FELIX machine or lxplus if there's an issue). Ideally no setup is needed here, but if you see glitches with the network, the partition (RootController issues) or AFS/EOS access, please consider `sudo reboot now` as a possible recovery.

I will briefly describe the logic of the OKS database. When you source a specific setup.sh file, it aliases many commands by pointing them to a specific OKS db "partition" file. This file is located in muons/partitions, and its name is defined in the setup.sh file. It includes many segment files, each defining a "segment" of the partition. The includes use relative paths; you can open those files for detailed OKS db information.

Segments - Alti

This runs on the sbcl1ct-191-1.cern.ch machine.

Here you can define the pattern (L1 trigger to request readout, test pulse, etc.) that is sent in the TTC bitstream.

Search for "pattern_filename" and you will find a file with the extension ".dat". This pattern configuration is uploaded during the "Config" step of the partition. Modify the content of the .dat file to decide what is sent.

The most common use case is with/without test pulse, for a pulser run and a noise run, respectively. To disable the test pulse, just comment out the line with TP (see the comments inside the file). Remember that to comment out a line correctly, the first character must be #; it cannot be a space.
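The "# must be the first character" rule is easy to get wrong when editing by hand. The snippet below is only a sketch of that rule: the helper name `disable_test_pulse`, the marker string "TP" and the example lines are assumptions for illustration, and the real .dat syntax should be checked in the file itself.

```python
def disable_test_pulse(lines, marker="TP"):
    """Return a copy of the pattern lines with every active line that
    contains `marker` commented out.

    The '#' is placed at column 0, in front of any leading whitespace,
    because an indented '#' is not treated as a comment.
    """
    result = []
    for line in lines:
        already_commented = line.startswith("#")
        if marker in line and not already_commented:
            result.append("#" + line)
        else:
            result.append(line)
    return result


# Hypothetical pattern content, for illustration only.
example = [
    "# comment describing the pattern",
    "  L1A_LINE 0 1",
    "  TP_LINE 0 1",
]
patched = disable_test_pulse(example)
for line in patched:
    print(line)
```

Running the helper a second time leaves the lines unchanged, so it is safe to apply before every noise run.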

Segments - Config

The front-end parameters are now defined in json files. In the segment file, you can search for "json" to see where the config file is located. The parameters in this file are sent to the FE through FELIX by this segment.

By default this runs on the SWROD machine.

For each sector, the json needs to be taken from

/afs/cern.ch/user/n/nswdaq/public/sw/config-ttc/config-files/config_json/180

You need to merge the config json files from the two wedges. If the board names don't contain the HO/IP and L/R information, please modify them so that they follow the naming in the files in the 191 directory (e.g. https://gitlab.cern.ch/atlas-muon-nsw-daq/config-files/blob/master/config_json/191/A12/wedge1B191_A12.json )

Modifying the files by hand is potentially dangerous, as the common part of the two files can differ. So please consider the GUI described here: https://twiki.cern.ch/twiki/bin/view/Sandbox/Rongkun_FELIXGUI. You can load a config with the "Add Config" check box checked, so that the json files are loaded additively.

For small sectors, HO (away from the interaction point) is pivot and IP (close to the interaction point) is confirm. For big sectors, it's reversed.
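Since the pivot/confirm assignment flips between small and big sectors, a tiny lookup can help when renaming boards. This is only a sketch of the rule stated above; the function name `wedge_role` and the string labels are made up for illustration.

```python
def wedge_role(sector_size, side):
    """Return 'pivot' or 'confirm' for a wedge.

    sector_size: 'small' or 'big'
    side: 'HO' (away from the interaction point) or 'IP' (close to it)
    """
    if sector_size not in ("small", "big"):
        raise ValueError("sector_size must be 'small' or 'big'")
    if side not in ("HO", "IP"):
        raise ValueError("side must be 'HO' or 'IP'")
    # Small sectors: HO is pivot. Big sectors: the assignment is reversed.
    ho_is_pivot = sector_size == "small"
    if side == "HO":
        return "pivot" if ho_is_pivot else "confirm"
    return "confirm" if ho_is_pivot else "pivot"


print(wedge_role("small", "HO"))
print(wedge_role("big", "HO"))
```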

Below I describe the essential parameters needed for pulser data. For switching to baseline/cosmics, please see "Caveats - must read".

You need to make sure that:

  • for all VMMs: l0offset is 900, offset is 964 for pulser, and BC rollover is 3564.
  • for all ROCs: BC offset is 900, rollover is 3564, and busy_enable_sroc0/1/2/3 = 0.
  • OpcServerIp is set to the corresponding FELIX server.
Please save the file here: /afs/cern.ch/user/n/nswdaq/public/sw/config-ttc/config-files/config_json/191, under the corresponding sector, for bookkeeping.
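A quick scripted check of the merged json can catch a wrong value before you start a run. The sketch below walks a nested json structure and reports keys whose values differ from what you expect. The key names l0offset, offset, rollover and busy_enable_sroc0/1 come from the list above, but the nesting and the block names in the example are placeholders, not the real file layout.

```python
import json


def find_values(node, key):
    """Collect every value stored under `key` anywhere in a nested structure."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                found.append(value)
            found.extend(find_values(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_values(item, key))
    return found


def check_expected(config, expected):
    """Return (key, unexpected_values) for every key with a wrong value."""
    problems = []
    for key, want in expected.items():
        bad = [v for v in find_values(config, key) if v != want]
        if bad:
            problems.append((key, bad))
    return problems


# Placeholder layout, for illustration only.
config = json.loads("""
{
  "example_vmm_block": {"l0offset": 900, "offset": 964, "rollover": 3564},
  "example_roc_block": {"busy_enable_sroc0": 0, "busy_enable_sroc1": 1}
}
""")
expected = {
    "l0offset": 900,
    "offset": 964,
    "rollover": 3564,
    "busy_enable_sroc0": 0,
    "busy_enable_sroc1": 0,
}
problems = check_expected(config, expected)
print(problems)
```

Note that a key such as rollover may appear in both the VMM and the ROC parts of a real file; here both happen to share the expected value 3564, but keys with different meanings in different blocks would need a per-block check.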

Segments - SWROD

This assembles the data coming out of the ROCs. Each of the 4 sROCs in a ROC has a data path called an e-link, which is the unit of assembly.

This file defines the plugin, processor, etc. The most important thing is to know where the data is written. If you scan through this file, you will find that it is under /opt/localdisk3/data/nswdaq.

By default this runs on the SWROD machine.

Operation

FELIX

Ideally, you don't need to touch FELIX at this step. But if FELIX constantly gives you junk, there could be an issue, like someone sneezing near your FE, etc.

TTC - Alti

Here the Alti is controlled by the partition most of the time. It should only need to be operated for test pulse (pulser data) or noise runs. It provides the clock all the time, without you having to ask for it.

When you are taking a cosmic run with an external trigger, you need to unplug the L1A NIM cable on the Alti to prevent it from sending L1A to FELIX. Only plug it back in after you click "Run".

Configuration

You can call
configure_frontend -h
to see what options you can use. This is enough for simple operation and baseline taking.

Alternatively, in the partition the configuration is done during the "Config" step. This calls the correct TTC sequences to allow smooth data-taking.

Data-taking

Log in to our SWROD machine (in a fresh terminal):

ssh -XY nswdaq@pcatlnswswrod03

Then do the following:

cd /afs/cern.ch/work/n/nswdaq/public/tdaq-08-03-01/db/NSW_OKS_Test_DB_191/NSW_OKS_Test_DB
source setup_oks_NSW-191-TGC-SWROD.sh
run

If all goes smoothly, the partition GUI should open after a few minutes. If not, run "killit" in the directory and retry "run". If it still doesn't work, consider deleting ipc_init_stgc.txt in the directory. In the worst case, reboot the SWROD.

Once the GUI is open, it's very intuitive to operate. You first need to switch to "control" under the "Access Control" menu. Only one person can have control at a time.

  • "Initialize" sets up the software-level control over the network for the segments, which may run on different machines and are controlled centrally from the partition's machine.

  • "Config", then wait until the felixcore and OpcServer output is stable and stops.

  • "Run", then wait until you're satisfied with the data statistics. After that you can do

  • "Stop", then you can do "Run" again to take another run, or

  • "Unconfig" if you are changing something like config.json, pattern.dat...

  • "Shutdown" if you want to go home, if something goes wrong, or if you changed something in the OKS xml files.

If you change xml files while your partition is open, or if you disable/enable some of the components in a segment, you need to reload/commit.

The location where the logs are saved is defined in the partition (search for LogRoot). Right now it should be /tmp/nswdaq/log/. The log for NSWConfiguration is called `NSWConfigApp` (defined in the config segment file).

Data analysis

In a brand new shiny terminal on the pcatlnswswrod03 machine, do

 setup_nsw_process 

This takes you to the directory where the data is saved. There, the script "quick.sh" does a basic analysis: it uses nsw_process to decode the data from binary to a TTree, and hit_RDF.py to make plots (this can take a while).

To be filled: The full-scale data analysis code is provided by Prachi. The comparison between 180 and 191 data is by Tongbin.

Baseline & trimmers - the future official tool

Go to the directory /afs/cern.ch/work/n/nswdaq/public/vlad/stgc_b191, then

source setup.sh

Read the readme there for how to run it.

The readme can also be found here:

https://gitlab.cern.ch/atlas-muon-nsw-daq/NSWCalibration # to run it

https://gitlab.cern.ch/vplesano/nswcalibrationdataplotter # plotting

Please bug Vlad if there's an issue.

Baseline & trimmers - home-brew from Alex, modified by dozens

Go to /eos/atlas/atlascerngroupdisk/det-nsw-stgc/b191/191_quick_and_dirty_baselines_NEW

source setup.sh

For threshold tuning, modify and run the file scripts/stgc_threshold.sh accordingly

For baseline, modify and run the file scripts/stgc_baselines.sh accordingly.

This file defines the board to be run.

Check the results at cern.ch/stgc-trimmer/ with the postfix "_191"

You can compare with the 180 results summarized here (the column "Wedge MTF" indicates whether it's confirm or pivot; for small sectors HO is pivot): https://docs.google.com/spreadsheets/d/1edrOf_MO08i0owHAcdrR9_jnZt31ZOs1XpOdIYsHGGk/edit#gid=0

Parameter tuning

This usually happens during a pulser run.

OK, first of all, try rerunning the GBTx phase upload a few times, and perhaps have an expert check elinkconfig and the config in the OKS db.

If you still see problems, read on.

I prefer to use the FELIXGUI mentioned above to change parameters.

The rate is zero!

If the SWROD doesn't start recording data immediately (the rate is zero), some whole elinks are not sending data.

You can use netio_cat instead to grab elinks directly from your FELIX server. Use the `felix-buslist -e` command to find out which port to grab.

For example, our felixcore now runs with felix-id = 1, so the elink IDs are (felix-id, 1 for our sTGC setup as discussed) * 2^16 + (logic card number, 0 - 1 in our case for GBTx1) * 2^11 + (link number, from 0 to 11) * 2^6 + (the corresponding elink: 8/16/24/33 for sFEB or 8/24/33 for pFEB). We can use the commands below to check the phase 1 elinks:

netio_cat subscribe -H pcatlnswfelix05 -p 12350 -e raw -t 65544 -t 65552 -t 65560 -t 65569 -t 65608 -t 65616 -t 65624 -t 65633 -t 65672 -t 65680 -t 65688 -t 65697 -t 65736 -t 65744 -t 65752 -t 65761 -t 65800 -t 65808 -t 65816 -t 65825 -t 65864 -t 65872 -t 65880 -t 65889 -t 65928 -t 65936 -t 65944 -t 65953 -t 65992 -t 66000 -t 66008 -t 66017 -t 66056 -t 66064 -t 66072 -t 66081 -t 66120 -t 66128 -t 66136 -t 66145 -t 66184 -t 66192 -t 66200 -t 66209 -t 66248 -t 66256 -t 66264 -t 66273 | tee /afs/cern.ch/user/n/nswdaq/public/sw/config-ttc/config-files/config_json/191/A14/test1

netio_cat subscribe -H pcatlnswfelix05 -p 12351 -e raw -t 67976 -t 67984 -t 67992 -t 68001 -t 68040 -t 68048 -t 68056 -t 68065 -t 68168 -t 68176 -t 68184 -t 68193 -t 68232 -t 68240 -t 68248 -t 68257 | tee /afs/cern.ch/user/n/nswdaq/public/sw/config-ttc/config-files/config_json/191/A14/test2

netio_cat subscribe -H pcatlnswfelix05 -p 12353 -e raw -t 71688 -t 71696 -t 71704 -t 71713 -t 71752 -t 71760 -t 71768 -t 71777 -t 71816 -t 71824 -t 71832 -t 71841 -t 71880 -t 71888 -t 71896 -t 71905 -t 72072 -t 72080 -t 72088 -t 72097 -t 72136 -t 72144 -t 72152 -t 72161 -t 72200 -t 72208 -t 72216 -t 72225 -t 72264 -t 72272 -t 72280 -t 72289 | tee /afs/cern.ch/user/n/nswdaq/public/sw/config-ttc/config-files/config_json/191/A14/test3

For phase 2 640 MHz elinks (only sFEB Q1 for now):

netio_cat subscribe -H pcatlnswfelix05 -p 12350 -e raw -t 65547 -t 65555 -t 65561 -t 65569 -t 65611 -t 65619 -t 65625 -t 65633 -t 65675 -t 65683 -t 65689 -t 65697 -t 65739 -t 65747 -t 65753 -t 65761 -t 65803 -t 65811 -t 65817 -t 65825 -t 65867 -t 65875 -t 65881 -t 65889 -t 65931 -t 65939 -t 65945 -t 65953 -t 65995 -t 66003 -t 66009 -t 66017 -t 66059 -t 66067 -t 66073 -t 66081 -t 66123 -t 66131 -t 66137 -t 66145 -t 66187 -t 66195 -t 66201 -t 66209 -t 66251 -t 66259 -t 66265 -t 66273 | tee /afs/cern.ch/user/n/nswdaq/public/sw/config-ttc/config-files/config_json/191/A14/test1

netio_cat subscribe -H pcatlnswfelix05 -p 12351 -e raw -t 67979 -t 67987 -t 67993 -t 68001 -t 68043 -t 68051 -t 68057 -t 68065 -t 68171 -t 68179 -t 68185 -t 68193 -t 68235 -t 68243 -t 68249 -t 68257 | tee /afs/cern.ch/user/n/nswdaq/public/sw/config-ttc/config-files/config_json/191/A14/test2

netio_cat subscribe -H pcatlnswfelix05 -p 12353 -e raw -t 71691 -t 71699 -t 71705 -t 71713 -t 71755 -t 71763 -t 71769 -t 71777 -t 71819 -t 71827 -t 71833 -t 71841 -t 71883 -t 71891 -t 71897 -t 71905 -t 72075 -t 72083 -t 72089 -t 72097 -t 72139 -t 72147 -t 72153 -t 72161 -t 72203 -t 72211 -t 72217 -t 72225 -t 72267 -t 72275 -t 72281 -t 72289 | tee /afs/cern.ch/user/n/nswdaq/public/sw/config-ttc/config-files/config_json/191/A14/test3

These commands grab data from the possible ports. If some data is being sent, it means that at least some elinks are working and that the L1A and test pulse are being sent in the TTC stream, but some elinks are missing. You can use my FELIXGUI to figure out which elinks are there and which are missing. Start by checking the missing elinks, following the steps below in "If you don't see any hits on one vmm".
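The elink numbering used in the netio_cat commands above can be reproduced with a few lines of Python. This is a minimal sketch based only on the formula given in the text; the function names `elink_id`, `decode_elink_id` and `netio_topics` are made up for illustration and are not part of the FELIX software.

```python
def elink_id(felix_id, card, link, elink):
    """Compose an elink ID: felix_id*2^16 + card*2^11 + link*2^6 + elink."""
    return (felix_id << 16) + (card << 11) + (link << 6) + elink


def decode_elink_id(value):
    """Split an elink ID back into (felix_id, card, link, elink)."""
    return value >> 16, (value >> 11) & 0x1F, (value >> 6) & 0x1F, value & 0x3F


def netio_topics(felix_id, card, links, elinks):
    """Build the '-t <id>' arguments for one card, link by link."""
    ids = [elink_id(felix_id, card, link, e) for link in links for e in elinks]
    return " ".join(f"-t {i}" for i in ids)


# sFEB elinks on the first two links of card 0, with felix-id = 1
print(netio_topics(1, 0, range(2), (8, 16, 24, 33)))

# Decode an ID taken from one of the commands above
print(decode_elink_id(71688))
```

The first print reproduces the start of the first command (-t 65544 -t 65552 -t 65560 -t 65569 -t 65608 ...). Decoding is handy when FELIXGUI or a log shows a bare number: 71688, for instance, decodes to felix-id 1, card 3, link 0, elink 8.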

If you don't see any hits on one vmm

1) Change the ROC to bypass mode for that board. In the sROC to VMM mapping, unmap all VMMs except the problematic one. Go to 2).

2) Run the partition with only the corresponding elink open. You can use netio_cat on the FELIX machine to see if there's data. If there is data, go to 3); if not, go to 4).

3) This means the L1 matching between this VMM and the ROC is incorrect, and bypass mode lets the ROC assemble the data without matching. Turn "bypass" off and change the "offset" a bit; after some tuning, you should see data.

4) This means some parameter is wrong! The first thing to consider is the ROC to VMM 160 MHz clock. Under rocPllCoreAnalog there are ePllVmm0 and ePllVmm1; ePllPhase160MHz_X in ePllVmmY refers to VMM Y*4 + X. For example, ePllPhase160MHz_2[4] in reg066ePllVmm0 is bit 4 for VMM 2, and ePllPhase160MHz_2[3:0] in reg069ePllVmm0 is bits 3 to 0 for VMM 2. It's a 5-bit parameter, and in the json it is represented in decimal. Now you know the principle; if you find it complicated, you can use the FELIXGUI to make the modification, too. Try changing it in steps of 5, alternating up and down. If you still don't see data after this, go to 5).

5) Well, there's a chance that the ROC internal clock phase is a bit off. Contact an expert if in doubt. You shouldn't usually need to come this far. Did 4) really not work for you?
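The bit bookkeeping in step 4) can be sketched in Python as follows. This assumes the 5-bit phase is stored as two separate fields (bit 4 and bits 3:0), as the register names suggest; the helper names are made up, and candidate phases outside 0-31 are simply dropped, since whether wrapping around is meaningful is not stated here.

```python
def vmm_location(vmm):
    """Map a VMM number (0-7) to (Y, X), i.e. ePllPhase160MHz_X in ePllVmmY."""
    if not 0 <= vmm <= 7:
        raise ValueError("vmm must be in 0..7")
    return divmod(vmm, 4)


def split_phase(phase):
    """Split a 5-bit phase (0-31) into (bit 4, bits 3:0)."""
    if not 0 <= phase <= 31:
        raise ValueError("phase must be in 0..31")
    return phase >> 4, phase & 0xF


def join_phase(msb, low):
    """Inverse of split_phase."""
    return (msb << 4) | low


def scan_order(start, step=5, limit=32):
    """Phases to try: start, start+step, start-step, start+2*step, ..."""
    order = [start]
    k = 1
    while True:
        up, down = start + k * step, start - k * step
        added = False
        if up < limit:
            order.append(up)
            added = True
        if down >= 0:
            order.append(down)
            added = True
        if not added:
            return order
        k += 1


print(vmm_location(6))   # which ePllVmmY / _X pair belongs to VMM 6
print(split_phase(21))   # the two field values to write for phase 21
print(scan_order(15))    # order in which to try phases, starting from 15
```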

Some vmm has a much lower hit than others

This is a new phenomenon only seen in 191, not in 180. Try setting the CTRL_PHASE for the corresponding VMM; 3 is often a reasonable value. This controls the phase of the TTC bits, except the test pulse bit. The logic is similar to that of the 160 MHz clock parameter. You can also use FELIXGUI to modify it. Once the hit rate is normal, you can tune the offset a little to move the BCID to the middle.

Caveats - must read

Cosmics

Remember to change the offset to 933 (current ideal value) from the ~964 used for pulser runs.

Remember to change the source of L1A in the OKS db (search for it).

Remember to turn on neighbor mode for strips, to record clusters with ADC under threshold.

Remember to keep L1A unplugged from the Alti at all times until you start the run in the partition.

Turn off "sth" and "st" for all channels.

Noise run

Remember to modify the *.dat file defined in the OKS db, as instructed above. Comment out the Test Pulse bit (see the comments in the file) so that it only sends L1A.

Turn off "sth" and "st" for all channels.

Baseline & trimmers

Only if you have an unstable OPC server: you can try closing all the data elinks and leaving only the SCA elinks open. To do so, open elinkconfig and turn off the to-host elinks for group1-4. TODO: we could add the option "HOIPSca" to the elinkconfig_sTGC_191 command if needed.

Pulser data

The pulse is generated by the VMM itself, at the request of the ROC. And the ROC requests it because the TP bit is in the TTC stream!

Therefore, you need to make sure that the generated pulse height is higher than your threshold. Empirical values that work are:

"sth" and "st" on for all channels (this amplifies the test pulses), threshold sdt_dac = 400, pulse height sdp_dac = 320.

Or "st" on for all channels, sdt_dac = 750, sdp_dac = 350.
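To keep track of what has to change when switching run types, the caveats in this section can be collected into one small table. This is just a bookkeeping sketch: the values are taken from the text above, but the key names are hypothetical labels, not the field names used in the real json or OKS files, and settings that the text does not mention for a run type are left out.

```python
# Values from the caveats above; key names are hypothetical labels.
RUN_PRESETS = {
    "pulser": {
        "vmm_offset": 964,
        "sth": True,
        "st": True,
        "sdt_dac": 400,
        "sdp_dac": 320,
        "test_pulse_in_pattern": True,
    },
    "cosmics": {
        "vmm_offset": 933,
        "sth": False,
        "st": False,
        "neighbor_mode_strips": True,
    },
    "noise": {
        "sth": False,
        "st": False,
        "test_pulse_in_pattern": False,
    },
}


def settings_to_change(current, run_type):
    """Return the preset entries that differ from the current settings."""
    preset = RUN_PRESETS[run_type]
    return {key: value for key, value in preset.items() if current.get(key) != value}


# Example: what to touch when going from a pulser run to a cosmic run
current = dict(RUN_PRESETS["pulser"])
changes = settings_to_change(current, "cosmics")
for key, value in changes.items():
    print(key, "->", value)
```

The table does not cover the manual steps (unplugging L1A, changing the L1A source in the OKS db), which still have to be done by hand.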

Q&A

Below are some more errors that are observed in 191:

Q: After I click "Config" in a partition, the config gets stuck. There's also no response on the OpcServer side.

A: Typically the connection between the OpcServer and the configuration client is not set up correctly. Check that the OPC server config xml file and the FE json file are correct!

Q: The OPC server is very unstable, especially when there's L1A after I click start.

A: Try these in sequence: restart the server --> re-upload the GBTx phases --> power cycle the L1DDC & FEs.

Q: I see a lot of CMEM errors when configuring elinks/felixcore.

A: Please follow: https://its.cern.ch/jira/browse/ATLNSWDAQ-164

Q: All the config_gbtx calls time out!

A: Check whether `flx-init` runs properly. If not, try `flx-reset`. Do these with `-c 1` for the second card as well. If that still doesn't solve the problem, reboot the machine.

Q: I see recurring LOL register = 0x02 during flx-init

A: Please check whether your TTC module (Alti/Ltp/ttcvi/ttcvx) is sending the clock correctly. Are the transmitters enabled? On our SBC server `sbcl1ct-191-1.cern.ch`:

setupSBC
menuAltiModule
9
1
We should enable at least TX00 and TX01 for readout.

Q: I don't see data after I click "Run" in partition

A: First try killing felixcore and OpcServer, then redo the GBTx phase upload from the FELIX setup. This could be a phase issue: phases might drift away from the correct values.

Q: How to reset Alti?

A: You can find this information here: https://espace.cern.ch/NSWCommissioning/_layouts/15/WopiFrame2.aspx?sourcedoc=/NSWCommissioning/Shared%20Documents/sTGC-integration-commissioning/sTGC-readout/HOW-TO-take-baselines.docx&action=default

Many other FELIX errors are summarized in Rongkun_FELIXMan.

Experts

Melike: SBC/TTC, swrod

Rongkun: felix, readout, baseline, swrod

Prachi: baseline, felix

Xu: readout, frontend, L1DDC

Panos: L1DDC

Polyneikis: monitoring, readout, baseline

Emil: LV,

Brigitte

Other info

Useful info in: https://twiki.cern.ch/twiki/bin/viewauth/Atlas/NSWTriggerVS

-- RongkunWang - 2020-10-26

Topic revision: r43 - 2020-10-26 - RongkunWang
 