CE Stress Testing

Overview

The original acceptance criteria for a CE implementation were outlined by the TCG in the following document. A summary of the basic acceptance criteria follows:

  • Performance
    • 5000 simultaneous jobs per CE node
    • 50 user/role/submission node combinations supported on a single CE node
  • Reliability
    • Job failure rate due to the CE in normal operation: < 0.5%
    • Job failures due to a restart of CE services or a reboot: < 0.5%
    • 5 days of unattended running, with performance on day 5 equal to that on day 1
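
Both < 0.5% reliability criteria can be checked from simple job counts. The following is only a sketch and is not part of the TCG criteria or the EGEE procedures: the function name meets_failure_criterion is hypothetical, and it assumes the numbers of failed and submitted jobs have already been extracted from the test logs. It uses integer arithmetic, since failed/total < 0.5% is equivalent to 200 * failed < total.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the original test suite): returns 0 if the
# failure rate failed/total is strictly below 0.5%, 1 if it is not, and 2 if
# total is not positive.
meets_failure_criterion() {
    local failed=$1 total=$2
    if (( total <= 0 )); then
        echo "meets_failure_criterion: total must be positive" >&2
        return 2
    fi
    # failed/total < 0.5/100  <=>  failed * 200 < total (integers only)
    (( failed * 200 < total ))
}

# Example: 29 failures out of 6000 jobs (about 0.48%) passes;
# 30 out of 6000 is exactly 0.5% and fails the strict "< 0.5%" test.
if meets_failure_criterion 29 6000; then echo PASS; else echo FAIL; fi
```

The strict inequality mirrors the "< 0.5%" wording; a threshold of "at most 0.5%" would need `<=` instead.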

The formal testing procedures for EGEE certification can be found here.

Initial Testing by Di Qing

The initial testing of various interfaces was overseen by Di Qing. The results of these tests can be found on the following pages.

In summary, there are two kinds of tests: how many unique users (DN/Group/Role) the CE can handle, and how many jobs the CE can handle simultaneously with a given number of users. The testing scripts used to test the LCG-CE can be found in /afs/cern.ch/user/d/dqing/public/multipleuser-andrey7. The submit.sh script there shows how to use the utilities, for example:

"traffic-simulator -max-time 91 -proxy user_proxies/ -wms-list wms_list -users 20 30 75 0"
This results in jobs being submitted for 20 users. The interval between two submission rounds is 30 minutes, and in each round 75 jobs are submitted per user. When the script reaches the max-time of 91 minutes it exits, so there are 4 submission rounds (at 0, 30, 60 and 90 minutes) and in total 4*20*75=6000 jobs are submitted.
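
The job count above can be reproduced with a little shell arithmetic. This is an illustrative sketch only: the function expected_jobs is hypothetical and not part of traffic-simulator, and it assumes the first round is submitted at time 0 and that a new round starts only while the elapsed time is still below max-time. It takes the first three values of -users (users, interval, jobs per user per round) and ignores the trailing 0.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate how many jobs a traffic-simulator run submits.
# Arguments: max_time (minutes), users, interval (minutes), jobs per user per round.
# Assumes rounds start at t = 0, interval, 2*interval, ... while t < max_time.
expected_jobs() {
    local max_time=$1 users=$2 interval=$3 jobs_per_user=$4
    # Number of k >= 0 with k*interval < max_time, i.e. ceil(max_time/interval)
    local rounds=$(( (max_time + interval - 1) / interval ))
    echo $(( rounds * users * jobs_per_user ))
}

expected_jobs 91 20 30 75   # prints 6000 (4 rounds * 20 users * 75 jobs)
```

Under the same assumption, a max-time of exactly 90 minutes would give only 3 rounds, so the choice of 91 minutes matters for the total.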

For the CREAM CE, since it was not possible to submit jobs through the WMS, the tests were done using scripts provided by INFN, which can be found in /afs/cern.ch/user/d/dqing/public/cream.

LSF Testing Environment

The machine bes.cern.ch has been installed with LSF 7.0 by Chris Smith from Platform Computing. VMware has been used to set up a number of virtual machines that act as Worker Nodes. The BES interface from LSF has been configured on this machine and can be used to submit jobs to the LSF batch system. An LCG CE, a CREAM CE and an ARC CE have also been installed on the machine, all of which use the underlying LSF batch system. The machine vtb-generic-52 has been installed with a version of the WMS which can submit to all of these CEs, but not via the BES interface.

PNPI Testing

A testing framework has been written which can be used to stress test CEs. More details on the test suite and results from testing can be found here and here.

Test Plan

  1. Testing of CE performance via WMS submission
  2. Testing that the CREAM CE meets the official acceptance criteria
  3. Testing of CE performance via direct submission
  4. Testing the performance of Condor-G submission to the CREAM CE
  5. Testing the performance of ICE submission to many CREAM CEs

Direct Submission

LSF

Installation
mount lxbra1908.cern.ch:/share /share
. /share/lsf/conf/profile.lsf
cd /share/lsf/7.0/linux2.6-glibc2.3-x86/bin

LSF Direct Job Submission

bsub -m bes.cern.ch -o std-1.out -e std-1.err ./testjob.sh
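
For stress testing, the single submission above would normally be repeated many times. A minimal bash sketch, assuming LSF has been set up via profile.lsf and that testjob.sh exists in the current directory; the function name submit_lsf_jobs is hypothetical. Each job gets its own stdout/stderr files so that output from different jobs is not overwritten.

```bash
#!/usr/bin/env bash
# Hypothetical helper: submit n copies of a test script to bes.cern.ch with
# bsub, writing std-<i>.out / std-<i>.err for job i. Stops at the first
# failed submission.
submit_lsf_jobs() {
    local n=$1 script=${2:-./testjob.sh} i
    for (( i = 1; i <= n; i++ )); do
        bsub -m bes.cern.ch -o "std-$i.out" -e "std-$i.err" "$script" || return 1
    done
}

# Usage (after sourcing /share/lsf/conf/profile.lsf):
#   submit_lsf_jobs 500
```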

LSF Status Query

bjobs -m bes.cern.ch -u all -a
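
During a stress test, a count of jobs per state is often easier to follow than the full job list. A small sketch, assuming the default bjobs output layout (a header line, then one line per job with the state in the third column, STAT); the function name count_job_states is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `bjobs` output on stdin and print one
# "STATE count" line per job state (e.g. PEND, RUN, DONE, EXIT), sorted.
# Skips the header line and lines with fewer than three fields
# (e.g. continuation lines listing extra execution hosts).
count_job_states() {
    awk 'NR > 1 && NF >= 3 { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

# Usage:
#   bjobs -m bes.cern.ch -u all -a | count_job_states
```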

BES LSF

Installation
mount lxbra1908.cern.ch:/share /share
cd /share/hpcp/

BES Job Submission

<?xml version="1.0" encoding="UTF-8"?>
<JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/11/jsdl">
    <JobDescription>
        <JobIdentification>
            <JobName>Sleep</JobName>
            <JobProject>BES</JobProject>
        </JobIdentification>
        <Application>
            <HPCProfileApplication xmlns="http://schemas.ggf.org/jsdl/2006/07/jsdl-hpcpa">
                <Executable>sleep</Executable>
                <Argument>300</Argument>
                <Output>/dev/null</Output>
                <WorkingDirectory>/tmp</WorkingDirectory>
            </HPCProfileApplication>
        </Application>
        <Resources>
            <TotalCPUCount>
                <Exact>1</Exact>
            </TotalCPUCount>
        </Resources>
    </JobDescription>
</JobDefinition>
 
./besclient -u csmith -p xxxxxxx create sleep.xml sleep-1.epr
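
To create many BES jobs, the sleep JSDL document above can be generated once per job. A sketch under these assumptions: besclient is run from /share/hpcp as above, the hypothetical variables BES_USER and BES_PASS hold the credentials, and make_sleep_jsdl and submit_bes_jobs are illustrative names only. Each job gets its own JSDL and EPR file (sleep-1.xml and sleep-1.epr, and so on).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the sleep JSDL document with a caller-supplied
# job name and sleep time (default 300 seconds).
make_sleep_jsdl() {
    local name=$1 seconds=${2:-300}
    cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/11/jsdl">
    <JobDescription>
        <JobIdentification>
            <JobName>$name</JobName>
            <JobProject>BES</JobProject>
        </JobIdentification>
        <Application>
            <HPCProfileApplication xmlns="http://schemas.ggf.org/jsdl/2006/07/jsdl-hpcpa">
                <Executable>sleep</Executable>
                <Argument>$seconds</Argument>
                <Output>/dev/null</Output>
                <WorkingDirectory>/tmp</WorkingDirectory>
            </HPCProfileApplication>
        </Application>
        <Resources>
            <TotalCPUCount>
                <Exact>1</Exact>
            </TotalCPUCount>
        </Resources>
    </JobDescription>
</JobDefinition>
EOF
}

# Hypothetical helper: create n BES jobs, writing sleep-<i>.xml and saving
# the returned endpoint reference as sleep-<i>.epr. BES_USER and BES_PASS
# are placeholder variables for the besclient credentials.
submit_bes_jobs() {
    local n=$1 i
    for (( i = 1; i <= n; i++ )); do
        make_sleep_jsdl "Sleep-$i" > "sleep-$i.xml"
        ./besclient -u "$BES_USER" -p "$BES_PASS" create "sleep-$i.xml" "sleep-$i.epr" || return 1
    done
}
```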

BES Job Status

./besclient -u csmith -p xxxxxxxx status sleep-1.epr

Cream

Installation
mount lxbra1908.cern.ch:/share /share
yum install expat log4cpp
export LD_LIBRARY_PATH=/share/glite/lib:/share/glite/globus/lib:/share/glite/external/opt/c-ares/lib/:/share/glite/external/opt/
cd /share/glite/bin

Job Submission

[
JobType = "Normal";
Executable = "/bin/date";
Arguments = "";
StdOutput="out.txt";
StdError="err.txt";
OutputSandbox = {"out.txt", "err.txt"};
OutputSandboxBaseDestUri = "gsiftp://lxbra1908.cern.ch/tmp/";
]

glite-ce-job-submit  --autm-delegation --resource lxbra1908.cern.ch:8443/cream-lsf-normal example.jdl
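
A bulk-submission loop for the CREAM CE might look like the sketch below. It assumes that glite-ce-job-submit prints only the CREAM job ID (an https://...:8443/CREAM... URL) on standard output; the function name submit_cream_jobs and the file name cream-jobids.txt are hypothetical. With --autm-delegation a proxy is delegated on each call, which may add per-job overhead at high submission rates.

```bash
#!/usr/bin/env bash
# Hypothetical helper: submit a JDL n times to the cream-lsf-normal queue on
# lxbra1908.cern.ch and append each returned job ID to a file, one per line.
# Assumes glite-ce-job-submit prints only the job ID on stdout.
submit_cream_jobs() {
    local n=$1 jdl=${2:-example.jdl} ids=${3:-cream-jobids.txt} i
    : > "$ids"    # start with an empty job-ID file
    for (( i = 1; i <= n; i++ )); do
        glite-ce-job-submit --autm-delegation \
            --resource lxbra1908.cern.ch:8443/cream-lsf-normal "$jdl" >> "$ids" || return 1
    done
}

# Usage (with the environment set up as above and a valid proxy):
#   submit_cream_jobs 100 example.jdl
```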

Job Status

glite-ce-job-status https://lxbra1908.cern.ch:8443/CREAM830810312
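
When many CREAM jobs are running, a per-state summary is easier to follow than individual status reports. The sketch below assumes that glite-ce-job-status reports each job's state on a line of the form "Status = [STATE]" (for example [DONE-OK]) and that the job IDs were saved one per line in a file; the function name cream_state_counts and the file name cream-jobids.txt are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read glite-ce-job-status output on stdin and print one
# "STATE count" line per job state, sorted by state name.
# Assumes lines like:   Status        = [DONE-OK]
cream_state_counts() {
    sed -n 's/^[[:space:]]*Status[[:space:]]*=[[:space:]]*\[\([^]]*\)].*/\1/p' \
        | sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage: query each saved job ID and summarise the states
#   while read -r id; do glite-ce-job-status "$id"; done < cream-jobids.txt \
#       | cream_state_counts
```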

ARC

Installation
mount lxbra1908.cern.ch:/share /share
yum install libxml2 libtool-libs  openldap
export LD_LIBRARY_PATH=/share/nordugrid/lib
export GLOBUS_LOCATION=/share/glite/globus/
cd /share/nordugrid/bin/

Job Submission

&
(executable=/bin/echo)
(arguments="Hello World" )
(stdout="hello.txt")
(stderr="hello.err")
(* Grid Manager auxiliary logs will be stored in this directory: *)
(* gmlog="gridlog")
(jobname="My Hello Grid")

./ngsub -c bes.cern.ch -f arc-rsl
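
The same approach can be used for the ARC CE. A sketch, assuming ngsub is run from /share/nordugrid/bin as above; the functions make_hello_xrsl and submit_arc_jobs and the file names arc-rsl-<i> are hypothetical. Each job gets its own xRSL file with a distinct jobname and output file names, so the jobs can be told apart.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print an xRSL description of the Hello World job,
# with a job index in the job name and output file names.
make_hello_xrsl() {
    local i=$1
    cat <<EOF
&
(executable=/bin/echo)
(arguments="Hello World")
(stdout="hello-$i.txt")
(stderr="hello-$i.err")
(jobname="My Hello Grid $i")
EOF
}

# Hypothetical helper: write arc-rsl-1 ... arc-rsl-<n> and submit each one
# to bes.cern.ch with ngsub, stopping at the first failed submission.
submit_arc_jobs() {
    local n=$1 i
    for (( i = 1; i <= n; i++ )); do
        make_hello_xrsl "$i" > "arc-rsl-$i"
        ./ngsub -c bes.cern.ch -f "arc-rsl-$i" || return 1
    done
}
```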

Job Status

./ngstat  gsiftp://bes.cern.ch:2811/jobs/5131235049518197584921
Topic revision: r7 - 2009-02-20 - LaurenceField
 