Tips

An artdaq process crashed

As of this writing (Apr-22-2018), if an artdaq process crashes, the rest of the system will attempt to continue running; this is something to be fixed rather than a feature. In the meantime, if JCOP shows a process in an unexpected state and/or the process doesn't respond to transition requests, the process may no longer exist. You can confirm this by logging into the relevant node and executing "ps aux | grep <process_name>", where <process_name> could be eventbuilder, datalogger, boardreader, or dispatcher. You can also look at the process's logfile; when a crash occurs, it may contain helpful error messages which don't show up in Kibana, since they come from lower-level libraries and thus aren't MessageFacility messages.
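The process check described above can be repeated for all four process types in one go. The following is only a sketch: the function name report_artdaq_procs is made up for illustration, and the case-insensitive substring match assumes the process names appear somewhere in the command line shown by ps, which may not hold for every installation.

```shell
# Hypothetical helper (not part of artdaq): reads a "ps aux"-style listing
# on stdin and reports, for each artdaq process type named on this page,
# whether any line of the listing mentions it.
report_artdaq_procs() {
    local listing name
    listing=$(cat)    # capture the whole listing before searching it
    for name in eventbuilder datalogger boardreader dispatcher; do
        # -i: ignore case; -q: only the exit status matters
        if printf '%s\n' "$listing" | grep -i -q -- "$name"; then
            echo "$name: found"
        else
            echo "$name: MISSING"
        fi
    done
}

# Typical use on the node in question:
#   ps aux | report_artdaq_procs
```

A "MISSING" line only means that no such process runs on the node where the command was executed; not every node is expected to host every process type.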

If a process has crashed, a likely cause is leftover processes or shared memory segments from a previous run which didn't end cleanly. Running "root_out_zombies.sh" will tell you whether this is the case on np04-srv-001 (where the eventbuilders and datalogger run). Be aware that a clean result now doesn't rule this out as the cause: leftover processes or shared memory segments may have been present at the time of the run in which the crash occurred.
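For a manual look at shared memory on a node, the standard "ipcs -m" command lists the segments. The sketch below is not what root_out_zombies.sh does (its contents aren't shown here); it simply assumes that a segment with zero attached processes is a candidate leftover, and it assumes the usual Linux column layout of "ipcs -m" (key, shmid, owner, perms, bytes, nattch, status). The function name is made up for illustration.

```shell
# Hypothetical helper: reads "ipcs -m" output on stdin and prints the shmid
# and size in bytes of every segment that no process is attached to.
orphaned_shm_segments() {
    # Data rows start with a hex key (0x...); column 6 is nattch.
    awk '$1 ~ /^0x/ && $6 == 0 { print $2, $5 }'
}

# Typical use:
#   ipcs -m | orphaned_shm_segments
```

A segment listed this way isn't necessarily from artdaq; check its owner in the full "ipcs -m" output before removing anything.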

DIM metrics aren't appearing

Take a look at the file which is sourced before artdaq processes are launched; the name of this file is specified by the "DAQ setup script" parameter in the file passed on the boot transition (see /nfs/sw/artdaq/run_records/1093/boot.txt for an example of a boot file). In the DAQ setup script, uncomment the following lines if you see them commented out:

#        export DIM_INC=/nfs/sw/dim/dim_v20r20
#        export DIM_LIB=/nfs/sw/dim/dim_v20r20/linux
#        export LD_LIBRARY_PATH=$DIM_LIB:$LD_LIBRARY_PATH

#        source /nfs/sw/work_dirs/dune-artdaq-dim-dev/localProducts_artdaq_v3_00_03a_e14_prof_s50/setup
#        setup artdaq_dim_plugin v0_02_03 -q e14:prof:s50
...where the actual versions given (v0_02_03, etc.) may be different. Be aware that these lines get commented out by developers, since it's not possible to perform a dune-artdaq build in an environment with artdaq_dim_plugin set up.
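To see quickly whether the DIM lines are currently commented out, the setup script can be searched for comment lines that mention "dim". This is a rough sketch: the function name is made up, and the pattern is deliberately loose (any commented line containing "dim" in either case), so unrelated comments may show up too.

```shell
# Hypothetical helper: prints, with line numbers, every commented-out line
# in the given file that mentions "dim" (case-insensitive).
commented_dim_lines() {
    grep -inE '^[[:space:]]*#.*dim' "$1"
}

# Typical use, with the path taken from the "DAQ setup script" parameter
# of the boot file:
#   commented_dim_lines /path/to/daq_setup_script.sh
```

If nothing is printed, the DIM-related lines are either already active or absent from the file altogether.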

Check multicast transmission/reception

sudo tcpdump -x -nn -iem1 udp and multicast and not broadcast and src host 10.73.136.34

where 10.73.136.34 is the IP address of a computer running an EventBuilder. The command can be run on EB (EventBuilder) nodes or BR (BoardReader) nodes to check both sides.
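When checking several source hosts, it can help to build the filter expression from a variable. The sketch below only assembles the same filter used in the command above; the function name is made up, and the "-c 20" packet limit in the usage comment is an added convenience rather than part of the original command.

```shell
# Hypothetical helper: prints the tcpdump filter for multicast UDP traffic
# sent by the given source host.
multicast_filter() {
    printf 'udp and multicast and not broadcast and src host %s' "$1"
}

# Typical use (em1 is the interface from the command above; -c 20 stops
# after 20 packets):
#   sudo tcpdump -x -nn -i em1 -c 20 "$(multicast_filter 10.73.136.34)"
```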

-- JohnChristianFreeman - 2018-04-10

Topic revision: r3 - 2018-04-22 - JohnChristianFreeman
 