SCAS stress test results

This page collects the results of stress tests run on the first SCAS/glexec patches. The scripts used to run the tests are explained in the README file in CVS.

10 March 2009

This test lasted 4 days from 6 Feb 2009 to 10 Feb 2009 09:30. It was executed using 10 WNs and one SCAS server, all deployed on VMs. The Worker Nodes were activated in sequence, at intervals of 2 hours.

This test used the new scripts, which support multiple user credentials. On each worker node, each glexec call chooses a random proxy from a set of 10 available proxies. In total, 100 proxies are used across all the worker nodes.
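The per-call proxy selection described above can be sketched as follows. This is a minimal illustration and not the script kept in CVS: the function name `timed_call`, the proxy file names, and the use of the `GLEXEC_CLIENT_CERT` environment variable to hand the chosen proxy to glexec are assumptions, and the command to run is left as a parameter.

```python
import os
import random
import subprocess
import sys
import time


def timed_call(cmd, proxies, rng=random):
    """Run cmd once with a randomly chosen proxy; return (proxy, seconds, exit code)."""
    proxy = rng.choice(proxies)          # uniform choice among the local proxy set
    env = dict(os.environ)
    env["GLEXEC_CLIENT_CERT"] = proxy    # assumption: how the proxy reaches glexec
    start = time.monotonic()
    result = subprocess.run(cmd, env=env, check=False)
    return proxy, time.monotonic() - start, result.returncode


if __name__ == "__main__":
    # Hypothetical proxy paths; a harmless Python no-op stands in for the glexec call.
    proxies = [f"/tmp/proxy_{i:02d}.pem" for i in range(10)]
    print(timed_call([sys.executable, "-c", "pass"], proxies))
```

Recording the elapsed time and exit code of every call is what would allow response time and error statistics to be derived afterwards.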

The new glexec patch with the new SCAS client proved tolerant of the SCAS internal refresh, which no longer causes glexec errors as it did in previous versions.

The hostnames were:

  • lxb7606v1 to lxb7606v5 and lxb7605v1 to lxb7605v5 (WNs)
  • vtb-generic-83 (SCAS)

The patches installed were:

The total requests and the error rate were:

  • Total requests: 3306508
  • Frequency achieved: 10.02 Hz
  • Total errors: 1 (glexec failed after 172915 seconds with the message "[gLExec]: LCMAPS failed, see '/var/log/glexec/lcas_lcmaps.log' for more info")

GLEXEC response time on the Worker Nodes

The response time on the first host (lxb7606v1) is shown in the following graph:

A frequency histogram of the same data, with the Y axis in logarithmic scale, is here:

The same plot with a linear Y axis is here:

Categorizing the response times into 3 zones, we have:

  • zone 1 [0,2): 98.16%
  • zone 2 [2,10): 1.80%
  • zone 3 [10,+inf): 0.04%
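The zone percentages above amount to binning each response time against two thresholds. A minimal sketch of that computation, with a hypothetical function name `zone_shares` and a synthetic sample rather than the measured data:

```python
def zone_shares(times, bounds=(2.0, 10.0)):
    """Return the percentage of response times in [0,b0), [b0,b1), [b1,+inf)."""
    if not times:
        raise ValueError("no response times given")
    low, high = bounds
    counts = [0, 0, 0]
    for t in times:
        if t < low:
            counts[0] += 1
        elif t < high:
            counts[1] += 1
        else:
            counts[2] += 1
    return [100.0 * c / len(times) for c in counts]


# Synthetic sample, not measured data: 8 fast calls, 1 medium, 1 slow.
sample = [0.3] * 8 + [6.1, 12.4]
print(zone_shares(sample))
```

The thresholds are a parameter, so the same function covers a different zone split by passing, for example, `bounds=(2.0, 8.0)`.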

Memory consumption on the SCAS server

The memory consumption on the SCAS server is shown in the following graphs:

19 February 2009

This test lasted 6 days from 13 Feb 2009 14:25 to 19 Feb 2009 08:00.

It was executed using 10 WNs and one SCAS server, all deployed on VMs. The Worker Nodes were activated in sequence, at intervals of 1 hour. The hostnames were:

  • lxb7606v1 to lxb7606v5 and lxb7605v1 to lxb7605v5 (WNs)
  • vtb-generic-83 (SCAS)

The patches installed were:

The total requests and the error rate were:

  • Total requests: 6264443
  • Total errors: 14200
  • Error rate: 0.2267% (an error being a glexec failure with an error message)
  • Requests per second: 12.65 (this is the frequency achieved, considering that WNs make requests continuously)
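As a cross-check, the error rate and the request frequency follow directly from the totals and from the start and end times stated for this test. The variable names in this short sketch are illustrative:

```python
from datetime import datetime

total_requests = 6264443
total_errors = 14200

# Start and end as stated for this test (local time, year 2009).
start = datetime(2009, 2, 13, 14, 25)
end = datetime(2009, 2, 19, 8, 0)
duration_s = (end - start).total_seconds()

error_rate_pct = 100.0 * total_errors / total_requests
requests_per_s = total_requests / duration_s

print(f"duration: {duration_s:.0f} s")
print(f"error rate: {error_rate_pct:.4f}%")
print(f"requests per second: {requests_per_s:.2f}")
```

The values reproduce the 0.2267% and 12.65 Hz figures listed above.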

GLEXEC response time on the Worker Nodes

The response time on the first host (lxb7606v1) is shown in the following graph:

Zooming in on a 1-hour period in the middle of the test, we get:

Two levels are visible in the response time graph. Most executions have a response time below 1 second, but a considerable number of executions (~40 per hour) take around 6 seconds. Some spikes appear around 10 seconds and, very rarely, at higher values. Using a three-zone categorization, the results are:

  • zone 1 [0,2): 99.49%
  • zone 2 [2,8): 0.50%
  • zone 3 [8,+inf): 0.02%

A frequency histogram plot is available here:

Breaking the Y axis at 5000, we can see the smaller contributions:

Memory consumption on the SCAS server

The memory consumption on the SCAS server is shown in the following graphs:

The trend is more visible in the following graph, zoomed in on a 1-hour period:

The memory leak problem (see patch #2684) has been fixed by killing the SCAS child process every 5 minutes. This keeps the SCAS server from crashing, but it introduces periodic errors that occur while the child process restarts (see the Error rate and distribution section). This problem is known to the SCAS developers and is tracked in bug #47148. Some memory leak is still present and is visible in the first graph (bug #47149).

Error rate and distribution

The error rate of glexec executions, which was around 0.03% with 2 WNs, reaches 0.2% with 10 WNs. The error distribution graph, zoomed in on the same 1-hour period as before, shows that errors occur at the time of the child process switch in the SCAS server:

12 February 2009

This test lasted 18 hours, starting on Thu Feb 12 13:38:13 2009.

It was executed using 10 WNs and one SCAS server, all deployed on VMs. The hostnames were:

  • lxb7606v1 to lxb7606v5 and lxb7605v1 to lxb7605v5 (WNs)
  • vtb-generic-83 (SCAS)

The patches installed were:

The total requests and the error rate were:

  • Total requests: 765937
  • Total errors: 1360
  • Requests per second: 11.59
  • Error rate: 0.1775%

The response time on the first host (lxb7606v1) is shown in the following graph:

The error distribution on the same WN is shown in:

The memory consumption on the SCAS server is shown in the following graphs:

The total error rate computed each hour is shown in the following graph:
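An hourly error rate of this kind can be computed by bucketing calls by the hour in which they started. The sketch below assumes each glexec call is available as a `(unix_time, ok)` pair; that record format and the function name `hourly_error_rate` are assumptions, and the records shown are synthetic.

```python
from collections import defaultdict


def hourly_error_rate(records, t0):
    """records: iterable of (unix_time, ok) pairs; t0: unix time of the test start.

    Returns {hour_index: error rate in percent}, hour 0 being the first hour.
    """
    calls = defaultdict(int)
    errors = defaultdict(int)
    for ts, ok in records:
        hour = int((ts - t0) // 3600)
        calls[hour] += 1
        if not ok:
            errors[hour] += 1
    return {h: 100.0 * errors[h] / calls[h] for h in sorted(calls)}


# Synthetic records, not measured data: 4 calls in hour 0 (1 failure), 2 in hour 1.
t0 = 1000
records = [(1000, True), (1500, False), (2000, True), (4599, True),
           (4600, True), (5000, True)]
print(hourly_error_rate(records, t0))
```

Hours with no calls are simply absent from the result, which avoids dividing by zero.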

06 February 2009

This test lasted almost 3 days (67 hours), from Fri Feb 6 12:47:14 to Mon Feb 9 08:00:00 (in Unix time, from 1233920839 to 1234162799). It was executed using 2 WNs and one SCAS server, all deployed on VMs. The hostnames were:

  • vtb-generic-111 and lxb7606v1 (WNs)
  • vtb-generic-83 (SCAS)

The patches installed were:

The total requests and the error rate were:

  • Total requests: 2475288
  • Total errors: 907
  • Requests per second: 10.23
  • Error rate: 0.03664%
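The duration and the rates for this test can be re-derived from the Unix timestamps and the totals given above. In this sketch the variable names are illustrative, and the conversion to wall-clock time assumes the times on this page are CET (UTC+1).

```python
from datetime import datetime, timedelta, timezone

start_ts, end_ts = 1233920839, 1234162799
total_requests, total_errors = 2475288, 907

duration_s = end_ts - start_ts
# Assumption: the wall-clock times on the page are CET (UTC+1).
cet = timezone(timedelta(hours=1))
print(datetime.fromtimestamp(start_ts, cet), "->", datetime.fromtimestamp(end_ts, cet))
print(f"duration: {duration_s} s = {duration_s / 3600:.1f} h")
print(f"requests per second: {total_requests / duration_s:.2f}")
print(f"error rate: {100.0 * total_errors / total_requests:.5f}%")
```

The 241960-second span corresponds to the 67 hours quoted, and the two rates match the listed values.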

The response time on each host is shown in the following graphs:

The memory consumption on the SCAS server is shown in the following graphs:

-- GianniPucciani - 09 Feb 2009

Topic attachments

Attachment                       History   Size       Date              Who
090212hourlyErrorRate.ps         r3 r2 r1  19.5 K     2009-02-13 16:37  GianniPucciani
090212lxb7606v1.ps               r1        1683.9 K   2009-02-13 15:36  GianniPucciani
090212lxb7606v1_error.ps         r1        1694.1 K   2009-02-13 15:36  GianniPucciani
090212scas-mon.ps                r1        429.4 K    2009-02-13 15:41  GianniPucciani
090219chop_lxb7606v1.ps          r1        76.2 K     2009-02-19 11:02  GianniPucciani
090219chop_lxb7606v1_error.ps    r1        75.3 K     2009-02-19 11:42  GianniPucciani
090219chop_scas-mon.ps           r1        28.9 K     2009-02-19 11:08  GianniPucciani
090219lxb7606v1.ps               r1        8658.0 K   2009-02-19 11:02  GianniPucciani
090219scas-mon.ps                r1        1355.3 K   2009-02-19 11:06  GianniPucciani
090306histo_filt_lxb7606v1.png   r1        4.3 K      2009-03-10 11:45  GianniPucciani
090306histo_nolog_lxb7606v1.png  r1        3.0 K      2009-03-10 17:13  GianniPucciani
090306lxb7606v1.ps               r1        5485.8 K   2009-03-10 11:45  GianniPucciani
090306scas-mon.ps                r1        1795.6 K   2009-03-10 12:10  GianniPucciani
histo.png                        r1        3.1 K      2009-03-09 17:49  GianniPucciani
histoYbroken.png                 r1        8.0 K      2009-03-09 17:49  GianniPucciani
lxb7606v1.ps                     r1        15735.6 K  2009-02-09 12:01  GianniPucciani
scas-mon.ps                      r1        662.9 K    2009-02-09 12:06  GianniPucciani
scas-mon_2h.ps                   r2 r1     34.3 K     2009-02-09 12:10  GianniPucciani
vtb-generic-111.ps               r1        15722.5 K  2009-02-09 12:00  GianniPucciani
Topic revision: r10 - 2009-03-11 - GianniPucciani
 