Introduction

This is a small CREAM stress test suite written in Python. It provides a stress test for the CREAM service and a utility script that monitors target hosts and/or services, gathers the data and then plots it. It is a work in progress and more features will be included in the future. The test suite supports Scientific Linux 6. If an SL5-compliant version is required, please feel free to contact me. It can also be easily ported to any Linux platform; anyone wanting to use the test suite in an unsupported environment is welcome to contact me.

For news and more info, you can follow us on Twitter and Posterous!

Scientific Linux 6 Installation

The test suite has to be run on a host that is able to submit jobs to a CREAM endpoint; an EMI-UI is ideal for the job. To install the test suite, execute as root:

wget http://yum.gridctb.uoa.gr/repository/robot_testing_sl6.repo -O /etc/yum.repos.d/robot_testing_sl6.repo
yum install cream_stress_test
(Please note that the dependencies are taken care of automatically. Also, in previous versions the robot_testing repository had to take precedence over the standard SL repos. This is no longer the case.)

Scientific Linux 6 Dependencies



The available packages and their dependencies are:

Package Dependency
cream_stress_test python-argparse
  python-matplotlib
  pexpect

All dependencies are either installed by default or available in the standard repos (SL, EPEL, etc.).

Installed files

The CREAM test suite installs the following files:
File Description
/opt/cream_stress_test/bin/cream_stress_test.py The stress test script
/opt/cream_stress_test/bin/cream_stress_statistics.py The monitor script
/opt/cream_stress_test/docs/cream_stress_test.7.gz and /usr/share/man/man7/cream_stress_test.7.gz The stress test's man page
/opt/cream_stress_test/docs/cream_stress_statistics.7.gz and /usr/share/man/man7/cream_stress_statistics.7.gz The monitor's man page
/opt/cream_stress_test/docs/COPYING Copyright document
/opt/cream_stress_test/docs/CHANGELOG The test suite's changelog
Also, once installed, the man pages "cream_stress_test" and "cream_stress_statistics" are available.

Deployment

No special steps are needed to deploy the test.

Test Execution

Stress

Arguments

First of all, let's check the script's arguments:
Argument Description
-h --help Show the help message
-n --numjobs The total number of jobs to run. Note that this value is actually ignored when running in constant mode (see below).
-c --concurrentSubmitters The total number of submitter processes to create. Defaults to 1. WATCH OUT: if this argument is very high, the system will deadlock due to memory shortage (e.g. 160 processes on a host with 512 MB of RAM).
-s --slots The number of jobs each submitter process will submit/manage concurrently. Defaults to 1.
-t --testype The type of test to run. Either "submit" or "cancel". More test types will be added in the future.
-j --jdl Path to the jdl file to use. If the argument is omitted, a sample built-in jdl running uname will be used. If "sleep" is given, a jdl which sleeps for --sleep seconds will be created. Note that the --sleep argument defaults to 0, so it must be specified if the jdl is supposed to actually sleep for some time.
--sleep The time to sleep, in case of sleep jdl being used. Defaults to 0 seconds.
-e --endpoint The endpoint where the jobs are submitted. Example: cream.uoa.gr:8443/cream-pbs-dteam
-d --delegation The type of delegation to use for the job submission. Either "multiple" or "single". Defaults to "multiple".
--nocolor Specify this flag to turn off colored output (on by default)
--noprint Specify this flag to turn off printing to the screen.
--constant Specify this option to keep a constant queue of the number of jobs given to the -n option. Its integer argument is the time, in minutes, for which the script will submit jobs. Note that the queue will be sustained for an undetermined amount of time, since many variables affect the total number of submitted jobs. Generally, if the number of submitters is large, the queue grows slowly; with a single submitter and a large -s (slot) argument, the queue grows quickly. Note that if the number of concurrent submitters isn't a divisor of the number of jobs, the number of slots (-s) is forcibly set to numJobs/concSubmitters and numJobs is then forcibly set to slots*concSubmitters.
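
The adjustment of -s and -n described for --constant can be illustrated with a short Python sketch. It only mirrors the arithmetic stated above and is not the script's actual code: the function name normalize_jobs is hypothetical, and the use of integer (floor) division is an assumption, since the text does not say how numJobs/concSubmitters is rounded.

```python
def normalize_jobs(num_jobs, conc_submitters, slots):
    """Return (num_jobs, slots) after the adjustment described above.

    Assumption: numJobs/concSubmitters is an integer (floor) division.
    """
    if conc_submitters < 1:
        raise ValueError("at least one submitter is required")
    if num_jobs % conc_submitters != 0:
        # The submitters do not divide the job count evenly:
        # slots is forced first, then the job count is recomputed from it.
        slots = num_jobs // conc_submitters
        num_jobs = slots * conc_submitters
    return num_jobs, slots


if __name__ == "__main__":
    print(normalize_jobs(1000, 30, 10))  # 30 does not divide 1000 -> (990, 33)
    print(normalize_jobs(1000, 50, 20))  # 50 divides 1000 -> (1000, 20)
```

Under these assumptions, asking for 1000 jobs with 30 submitters would end up as 33 slots per submitter and 990 jobs in total, whatever value was passed to -s.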

Test Types

The test types supported are "submit" and "cancel", in two modes (simple and "constant"), giving a total of 4 basic types of tests.

Simple Submit During this test, a total of -n jobs will run and then the test will exit.
Constant Submit During this test, jobs will be submitted for --constant minutes, and then they will be collected (this step takes as long as it needs and is independent of the --constant argument). Then the test will exit.
Simple Cancel During this test, a total of -n jobs will be submitted and then cancelled. Then the test will exit.
Constant Cancel During this test, jobs will be submitted and then cancelled for --constant minutes, and then they will be collected (this step takes as long as it needs and is independent of the --constant argument, though it is shorter than in the "constant submit" counterpart). Then the test will exit.

Submitters and Slots

The number of submitters and slots can lead to very different behaviour (and load applied to the CREAM subsystems). As previously stated, the number of submitter processes must not be too large, or the system will eventually run out of memory. Also, with fast-exiting jobs, if there are too many slots per submitter, finished jobs are left unaccounted for until the submitter's slots are filled (submitting jobs always takes priority over other operations). Furthermore, perhaps contrary to first impressions, many submitters with many slots will not fill a CREAM endpoint more quickly than a single submitter with a huge slot count. This is because all submitter processes run on the same host and share the same resources: the more of them are running, the more they compete for these resources, spending more and more time on function overhead and context switching. A scenario not yet explored is the effect that multiple actual users submitting jobs concurrently would have on the test. All in all, the best setup for these two arguments varies per site and has to be found by experiment for the testing scenario you want to implement for your site and/or service.
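
Since the best values have to be found by experiment, it can help to generate the candidate command lines up front. The following Python sketch is one possible way to do that, not part of the test suite: the helper names divisor_pairs and build_command are hypothetical, and keeping submitters * slots equal to the job count is an assumption made only to keep the runs comparable. The flags themselves are the ones documented in the arguments table.

```python
import shlex


def divisor_pairs(num_jobs):
    """Yield (submitters, slots) pairs with submitters * slots == num_jobs."""
    for submitters in range(1, num_jobs + 1):
        if num_jobs % submitters == 0:
            yield submitters, num_jobs // submitters


def build_command(endpoint, num_jobs, submitters, slots, sleep_seconds=60):
    """Build a simple-submit command line from the documented arguments."""
    args = [
        "cream_stress_test.py",
        "-d", "single",
        "-n", str(num_jobs),
        "-c", str(submitters),
        "-s", str(slots),
        "-e", endpoint,
        "-t", "submit",
        "--jdl", "sleep",
        "--sleep", str(sleep_seconds),
    ]
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    endpoint = "ctb04.gridctb.uoa.gr:8443/cream-pbs-see"
    max_submitters = 125  # hypothetical cap, chosen to stay clear of memory exhaustion
    for submitters, slots in divisor_pairs(5000):
        if submitters <= max_submitters:
            print(build_command(endpoint, 5000, submitters, slots))
```

Each printed line can then be run in turn while the monitor script records the load, so that the configurations can be compared afterwards.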

Examples

1. Use 125 submitter processes, each managing 40 job slots, submitting 60 second sleep jobs, using a single delegation, until 5000 jobs are submitted.

cream_stress_test.py -d single -n 5000 -c 125 -s 40 -e ctb04.gridctb.uoa.gr:8443/cream-pbs-see -t submit --jdl sleep --sleep 60

2. Same as above, but instead of only submitting, also cancel each job.

cream_stress_test.py -d single -n 5000 -c 125 -s 40 -e ctb04.gridctb.uoa.gr:8443/cream-pbs-see -t cancel --jdl sleep --sleep 60

3. Use 127 submitters, each managing 7 jobs concurrently, submitting and then cancelling the jdl in the given path, until 1924 jobs are cancelled.

cream_stress_test.py  -d  multiple  -n  1924  -c  127  -s   7   -e   ctb04.gridctb.uoa.gr:8443/cream-pbs-see   -t   cancel   -j ~/path/to/jdl/a_jdl.jdl

4. Run in constant submit mode for 100 minutes (note that the -n 1 argument is ignored), using 50 submit processes with slot size 15, using single delegation and the given jdl.

cream_stress_test.py -d single -n 1 -c 50 -s 15 -e ctb04.gridctb.uoa.gr:8443/cream-pbs-see -t submit -j ~/path/to/jdl/a_jdl.jdl --constant 100

Monitor

Arguments

These are the script's arguments:
-h --help Show help message
-d --delay The delay between monitoring operations
-s --savestats If this flag is set, the gathered statistics will be saved in a text file under /tmp, in a parse-friendly format. Data are stored on a per-host basis.
watchlist This is the main argument to the script. First and foremost, it must be given inside single quotes (') in order to prevent the shell from interpreting the metacharacters it uses as delimiters. It has the following format: 'user,host,port{commands}[procs]:...'
Where "commands" and "procs" should be like: {vmstat,iostat,sar,qstat}[java,BUpdaterPBS].
The only available options for the "commands" part of the argument are "vmstat", "iostat", "sar" and "qstat".
The "procs" argument can contain any process running on the target host (actually any string; if it doesn't exist as a process on the target host, no data are gathered).
The user, host, port and "commands" arguments are mandatory; "procs" is optional.
The brackets "[", "]" and curly brackets "{", "}" MUST be used!
The watchlist must ALWAYS be passed inside single quotes.
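
To make the watchlist format concrete, here is a minimal Python sketch that splits a watchlist into its entries and checks them against the rules above. It is an illustration only and not the monitor's own parser: the function name parse_watchlist and the returned dictionary layout are hypothetical, the port is assumed to be numeric, and since "procs" is optional the sketch accepts an entry with or without the [procs] part.

```python
import re

ALLOWED_COMMANDS = {"vmstat", "iostat", "sar", "qstat"}

# One entry: user,host,port{commands}[procs], where [procs] may be absent.
ENTRY_RE = re.compile(
    r"^(?P<user>[^,{}\[\]:]+),(?P<host>[^,{}\[\]:]+),(?P<port>\d+)"
    r"\{(?P<commands>[^{}]*)\}"
    r"(?:\[(?P<procs>[^\[\]]*)\])?$"
)


def parse_watchlist(watchlist):
    """Parse 'user,host,port{commands}[procs]:...' into a list of dicts."""
    entries = []
    for raw in watchlist.split(":"):  # entries are separated by ':'
        match = ENTRY_RE.match(raw.strip())
        if match is None:
            raise ValueError("malformed watchlist entry: %r" % raw)
        commands = [c for c in match.group("commands").split(",") if c]
        unknown = sorted(set(commands) - ALLOWED_COMMANDS)
        if not commands or unknown:
            raise ValueError("bad commands in entry %r: %s" % (raw, unknown))
        procs = [p for p in (match.group("procs") or "").split(",") if p]
        entries.append({
            "user": match.group("user"),
            "host": match.group("host"),
            "port": int(match.group("port")),
            "commands": commands,
            "procs": procs,
        })
    return entries


if __name__ == "__main__":
    for entry in parse_watchlist(
        "root,cream.gridctb.uoa.gr,22{vmstat,iostat,sar}[java,BUpdaterPBS]"
        ":root,db.gridctb.uoa.gr,22{vmstat,iostat,sar}[mysqld]"
    ):
        print(entry)
```

Inside Python the string needs no shell quoting; the single quotes are only required when the watchlist is typed on a command line.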

Gathered Statistics

These are the available gathered statistics:
iostat Read and write in KB per second, for all the used hard disk devices
vmstat Number of processes in ready and in sleeping state. Free, cached, buffered and swap memory. User, system, iowait and idle cpu utilization. System interrupts and context switches per second.
sar The receive and transmit rates in KBps for all registered interfaces
ps The cpu and size of a process, as reported by the ps program
qstat The number of jobs queued at Torque, at the given time.

Examples

1. Gather data from vmstat, iostat, sar and monitor java, BUpdaterPBS, BNotifier on cream.gridctb.uoa.gr. Gather data from vmstat, iostat, sar and monitor mysqld on db.gridctb.uoa.gr. Gather data from vmstat, iostat, sar, qstat and monitor pbs_server, maui, munged on lrms.gridctb.uoa.gr. Update the monitor data every 5 seconds. Also, at the end, save the collected data.

cream_stress_statistics.py 'root,cream.gridctb.uoa.gr,22{vmstat,iostat,sar}[java,BUpdaterPBS,BNotifier]:root,db.gridctb.uoa.gr,22{vmstat,iostat,sar}[mysqld]:root,lrms.gridctb.uoa.gr,22{vmstat,iostat,sar,qstat}[pbs_server,maui,munged]' -d 5 -s

Combine

Finally, if you want to use both scripts simultaneously (which is the best available option!), start the monitor script in one terminal, wait a couple of minutes so that it gathers the idle statistics of the system under test, and then start the stress test in another terminal. Once the stress test has finished, wait again for a couple of minutes (or longer, depending on the load applied to the subsystems), so that the monitor records how long resource utilization takes to return to its idle levels.
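
The same sequence can also be driven from a single Python script instead of two terminals. The sketch below is only one possible arrangement: the function name run_combined is hypothetical, and it assumes that the monitor keeps running until it is interrupted (as with Ctrl-C in a terminal), which should be checked against the monitor's man page before relying on it.

```python
import signal
import subprocess
import time


def run_combined(monitor_cmd, stress_cmd, idle_seconds=120):
    """Run the monitor around a stress test; return the stress test's exit code.

    Assumption: the monitor runs until it receives SIGINT (Ctrl-C).
    """
    monitor = subprocess.Popen(monitor_cmd)
    try:
        time.sleep(idle_seconds)                 # gather idle statistics first
        stress_rc = subprocess.call(stress_cmd)  # blocks until the test exits
        time.sleep(idle_seconds)                 # gather recovery statistics
    finally:
        if monitor.poll() is None:               # monitor is still running
            monitor.send_signal(signal.SIGINT)
        try:
            monitor.wait(timeout=30)
        except subprocess.TimeoutExpired:
            monitor.kill()                       # last resort if SIGINT is ignored
            monitor.wait()
    return stress_rc
```

Both commands are passed as argument lists, for example ["cream_stress_statistics.py", watchlist, "-d", "5", "-s"] for the monitor. Because no shell is involved, the watchlist string needs no extra quoting here.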

See Also

Cream User Guide https://wiki.italiangrid.it/twiki/bin/view/CREAM/UserGuide

Cream Admin Guide https://wiki.italiangrid.it/twiki/bin/view/CREAM/SystemAdministratorGuideForEMI2

Python matplotlib www.matplotlib.org

Man pages of ps, iostat, vmstat, sar, qstat

Topic revision: r1 - 2012-11-02 - DimosthenesFioretosExCern
 