-- MarcusDeBeurs - 2018-01-31

Network Monitoring

Introduction

This short tutorial shows how to monitor the network usage of a job. It uses the tool written by Friedrich Höning, which can be found on GitLab.

The tool works as a wrapper around your job and records all network-related read and write operations. It is written in plain C and contains only about 250 lines.
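
The real tool does its wrapping in C, at the level of the library calls made by the job. Purely to illustrate the wrapping idea, the Python sketch below wraps a file-like object and records a time stamp and a byte count for every read. The class name CountingReader and the record layout are hypothetical and are not taken from nw2.c.

```python
import io
import time


class CountingReader(object):
    """Wrap a file-like object and record every read that passes through it."""

    def __init__(self, wrapped):
        self._wrapped = wrapped
        self.records = []  # list of (timestamp, operation, number of bytes)

    def read(self, size=-1):
        data = self._wrapped.read(size)  # forward the call unchanged
        self.records.append((time.time(), "read", len(data)))
        return data


# Example: the caller reads as usual, the wrapper keeps the bookkeeping.
reader = CountingReader(io.BytesIO(b"x" * 1000))
reader.read(600)
reader.read(600)  # only 400 bytes are left at this point
total = sum(nbytes for _, _, nbytes in reader.records)
```

The code that reads the data does not need to change; only the object it reads from is replaced, which is the same principle the preloaded library relies on.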

Getting the tool

We get the tool with git, which can be set up via the ATLAS software:

setupATLAS

If you are not working on lxplus, you first need to define these variables:

export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
alias setupATLAS='source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh'
setupATLAS

Now we can set up git and clone the tool locally:


lsetup git
git clone https://gitlab.cern.ch/fhoenig/nw2

If you have never configured git before, you may run into trouble. If that happens, please have a look at the GitLab tutorial.

A directory called nw2 should have appeared. It contains several files, of which three are important for what we want to do:

  • Makefile
  • nw2.c
  • nw2_format_log.py

The file nw2.c is the source code of the tool, Makefile is used to compile it and build the dependencies, and nw2_format_log.py is a script that formats the output of the tool in a convenient way.

How to run the monitor

Using the tool is simple. First compile the source and build the dependencies with the provided Makefile. From inside the nw2 directory, simply run:

make

The shared object nw2.so should have appeared. You monitor a process by preloading this file during execution, in the following way:

LD_PRELOAD=/path/to/nw2.so ./YourProgram

Replace /path/to with the correct path (which may be relative) and ./YourProgram with an existing program. This can be any dynamically linked program!

As an example, we fetch this TWiki page. Run the following command from within the directory where the nw2.so file is located. The ./ prefix makes the dynamic loader take the path literally; a name without a slash is instead looked up in the library search path:

LD_PRELOAD=./nw2.so curl https://twiki.cern.ch/twiki/bin/view/Sandbox/NetworkMonitoring

Afterwards a new file called NW2.log has been created. It lists all network-related operations, each with the number of bytes involved and a time stamp. The provided Python script formats this output in the desired cumulative fashion:

python nw2_format_log.py NW2.log > NW2.cumulative.log
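
To make "cumulative" concrete: each log entry carries the bytes of a single operation, and the cumulative form replaces this with the running total up to that time. The sketch below shows the idea on in-memory records. The (timestamp, operation, bytes) layout and the function name are assumptions for illustration; the actual format read and written by nw2_format_log.py may differ.

```python
def cumulative(records):
    """Turn per-operation byte counts into running totals per operation.

    `records` is an iterable of (timestamp, operation, nbytes) tuples,
    a hypothetical layout chosen for this example.
    """
    totals = {}
    result = []
    for timestamp, operation, nbytes in sorted(records):
        totals[operation] = totals.get(operation, 0) + nbytes
        # Store a snapshot of the totals after this entry.
        result.append((timestamp, dict(totals)))
    return result


sample = [
    (0.0, "read", 100),
    (0.5, "write", 20),
    (1.2, "read", 300),
]
rows = cumulative(sample)
```

The last row holds the totals for the whole run.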

The following commands show the summary of the network logfile, namely the first and the last line:

head -n 1 NW2.cumulative.log
tail -n 1 NW2.cumulative.log

Changing the logfile

Note that the tool appends its output to the logfile. So when you run the tool multiple times without changing the logfile, the network usage of all runs is combined in one file.

This can be avoided by changing the name of the logfile for each run. The tool supports this through the NW2_LOGFILE environment variable:

export NW2_LOGFILE=NW_twiki.log
LD_PRELOAD=./nw2.so curl https://twiki.cern.ch/twiki/bin/view/Sandbox/NetworkMonitoring

You will find that the file NW_twiki.log contains the output from the tool.
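
When many jobs are monitored, setting these variables by hand becomes error-prone. Below is a minimal Python sketch that builds the environment for one monitored run. LD_PRELOAD and NW2_LOGFILE are the variables used above; the helper names, the label argument, and the time-stamped naming scheme are assumptions made for this example.

```python
import os
import subprocess
import time


def monitored_env(library, label, base_env=None):
    """Return an environment that preloads `library` and logs to a fresh file."""
    env = dict(os.environ if base_env is None else base_env)
    # An absolute path avoids depending on the directory the job runs in.
    env["LD_PRELOAD"] = os.path.abspath(library)
    # One logfile per run, so output from different runs is not combined.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    env["NW2_LOGFILE"] = "NW_{0}_{1}.log".format(label, stamp)
    return env


def run_monitored(command, library, label):
    """Run `command` (a list of strings) under the monitor.

    Returns the name of the logfile that was requested for this run.
    """
    env = monitored_env(library, label)
    subprocess.call(command, env=env)
    return env["NW2_LOGFILE"]
```

A call such as run_monitored(["curl", "https://twiki.cern.ch/twiki/bin/view/Sandbox/NetworkMonitoring"], "nw2/nw2.so", "twiki") would then write to a file like NW_twiki_20180131-143200.log. Two runs with the same label started within the same second would still share a file.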

Plotting the output

A plot is always a nice way of showing the outcome of your monitor. The attached files plot_nw.py, AtlasStyle.C and AtlasStyle.h let you do just that. Download these files locally (remove the .txt from plot_nw.py) and run:

python plot_nw.py NW2.cumulative.log

Note that this plotting script only works on the log file that has been formatted in the cumulative way with the nw2_format_log.py script.

To prevent issues when running this script from different locations, it can be useful to change the relative path to the AtlasStyle.C file into an absolute one. The path is set at the bottom of plot_nw.py.
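
One way to do this without hard-coding a directory is to derive the path from the location of the script itself. The sketch below assumes that AtlasStyle.C sits in the same directory as plot_nw.py. The function name is hypothetical, and the actual loading code at the bottom of plot_nw.py is not reproduced here.

```python
import os


def style_macro_path(script_file, macro_name="AtlasStyle.C"):
    """Return the absolute path of `macro_name` next to `script_file`."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(script_dir, macro_name)


# Inside plot_nw.py one would pass __file__ as `script_file`.
example = style_macro_path("/user/marc/NetworkMonitor/plot_nw.py")
```

The result no longer depends on the directory from which the script is started.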

Athena example

Of course you want to do something more interesting than monitoring a curl command. Here I show how to run an Athena job inside the monitor. Nothing new is introduced, but it may be useful to have all the ingredients together. I start from a fresh shell and run the Athena test job called q431:

mkdir monitor_athena_q431
cd monitor_athena_q431
setupATLAS
asetup Athena,21.0.60
export NW2_LOGFILE=NW_athena_q431.log
LD_PRELOAD=/user/marc/NetworkMonitor/nw2/nw2.so Reco_tf.py --AMI q431
python /user/marc/NetworkMonitor/nw2/nw2_format_log.py NW_athena_q431.log > NW_athena_q431.cumulative.log
python /user/marc/NetworkMonitor/plot_nw.py NW_athena_q431.cumulative.log

This Athena job takes about 45 minutes to run and reads about 135 MB from the network (as an input file).

Topic attachments

  • AtlasStyle.C (r1, 2.5 K, 2018-01-31 14:32, MarcusDeBeurs): ATLAS style for plotting
  • AtlasStyle.h (r1, 0.4 K, 2018-01-31 14:32, MarcusDeBeurs): ATLAS style for plotting
  • plot_nw.py.txt (r1, 5.1 K, 2018-01-31 14:31, MarcusDeBeurs): Python script for plotting cumulative log files

Topic revision: r2 - 2018-01-31 - MarcusDeBeurs
 