WARNING: This web is not used anymore. Please use PDBService.QuadCoreTests instead!
 

Quadcore Tests - evaluation of a 2 x quadcore Intel x86_64 machine for PDB

This page reports on the results of the tests on a latest-generation server equipped with dual quadcore Intel CPUs, for evaluation at the PDB group in March 2007.

Main findings, Executive Summary

  • A machine with 8 cores and 16GB of RAM (Quadcore) has been tested and compared with current PDB RACs (2x P-IV CPUs)
  • Tests of CPU and concurrent memory access (performed using Oracle SQL) show a speedup of a factor of 5 (i.e. 1 quadcore = 5 RAC nodes, for this type of workload and within the tested system loads). Scalability results show different behavior between P-IV and quadcore
  • Streams tests on the quadcore show a measured performance increase of 60% compared to P-IV when the quadcore is used as the apply (receiving) site
  • CMS PhEDEx tests showed that one quadcore box performs like a 6-node RAC
  • Measured power consumption per core showed an efficiency gain of a factor of 2 for the quadcore compared to the current production RACs
  • Installation procedures for quadcore and RHEL 4 need no additional effort compared to the existing procedure (there are minor differences from the current production RACs)
  • SAN configuration and I/O throughput have been tested and are unchanged from the existing RAC configuration.


Systems main characteristics

  • 2x Intel quadcore CPUs - Xeon E5345 @ 2.33GHz - L1 cache = 128kB, L2 cache = 8MB
  • Intel_5000p Chipset Memory Controller Hub, RAM = 16GB -> 8 x 2GB FB DIMM 667MHz (1.5 ns) - memory bandwidth = 21GB/s
  • 2 x e1000 + Qlogic HBAs 2312
  • installed with RHEL 4 U4 kernel 2.6.9-42.0.8-ELsmp x86_64
  • Oracle 10.2.0.3 for x86_64 with RAC option (1-node RAC) and ASM

  • For comparison, RAC4 hardware is: 2x Pentium IV @ 3GHz
  • E7520 Memory Controller, RAM = 4GB DDR2 400MHz (2.5 ns) - memory bandwidth = 6.4GB/s
  • 2 x e1000 + Qlogic HBAs 2312
  • installed with RHEL 4 U4 kernel 2.6.9-42.0.3-ELsmp i386 (32 bit)
  • Oracle 10.2.0.3 for i386 with RAC option (6-node RAC) and ASM

Installation

  • Practically the same installation procedure as for PDB RACs on RHEL4 (Installation_verbose); the change to 64 bit is almost transparent
  • Notable changes: Oracle binaries for x86_64 and the RAM configuration parameters kernel.shmmax, vm.nr_hugepages, /etc/security/limits.conf and swap size
  • Contingency: the box under test would not boot with the default configuration. It was necessary to add the boot options nopic and selinux=0
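
The memory parameters listed above scale with the size of the Oracle SGA. As a rough illustration only, the Python sketch below derives candidate values for vm.nr_hugepages and kernel.shmmax from an SGA size. The 2 MiB huge page size is the usual x86_64 default; the 8 GiB SGA, the function names and the rule that kernel.shmmax should be at least as large as the SGA are assumptions made for this example, not values taken from the tested machine.

```python
# Assumption: 2 MiB huge pages, the common x86_64 default
# (verify with "grep Hugepagesize /proc/meminfo" on the target box).
HUGEPAGE_BYTES = 2 * 1024**2


def hugepages_for_sga(sga_bytes, hugepage_bytes=HUGEPAGE_BYTES):
    """Return the number of huge pages needed to hold the SGA, rounded up."""
    return (sga_bytes + hugepage_bytes - 1) // hugepage_bytes


def sysctl_settings(sga_bytes):
    """Return candidate /etc/sysctl.conf lines for a given SGA size.

    Assumption: kernel.shmmax is set to the SGA size so that the SGA
    fits in a single shared memory segment.
    """
    return [
        "kernel.shmmax = %d" % sga_bytes,
        "vm.nr_hugepages = %d" % hugepages_for_sga(sga_bytes),
    ]


# Hypothetical SGA size, chosen only for illustration.
EXAMPLE_SGA_BYTES = 8 * 1024**3

for line in sysctl_settings(EXAMPLE_SGA_BYTES):
    print(line)
```

In practice a few extra pages are usually added on top of the computed value, and the memlock limits in /etc/security/limits.conf must allow the Oracle software owner to lock that much memory.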

Power Consumption

  • Quadcore data measured by Alexander Iribarren:
    • Loaded: 453 VA
    • Idle: 326 VA
    • with the Qlogic HBA: +5 VA
  • For reference:
    • servers on RAC4 (Pentium IV Xeon): 260 W (loaded)
    • storage arrays on RAC4: 200 W (loaded)

Memory throughput

  • Memory throughput has been measured with an Oracle-based benchmark, a modified version of the JLOCI benchmark (see attachment). The result is a speedup of a factor of 3 on the quadcore machine. This is consistent with the specs of the FB DIMM 667MHz.
  • jloci.sql: logical IO throughput measurement script
  • core4_vs_rac4.txt: logical IO measurements: RAC4 vs quadcore
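
The consistency with the memory specifications can be checked against the nominal bandwidth figures listed under Systems main characteristics. The short Python sketch below only divides the two nominal values; the variable names are illustrative.

```python
# Nominal memory bandwidth from the hardware specifications (GB/s).
QUADCORE_BANDWIDTH_GBS = 21.0  # FB DIMM 667MHz, quadcore machine
RAC4_BANDWIDTH_GBS = 6.4       # DDR2 400MHz, RAC4 node

nominal_ratio = QUADCORE_BANDWIDTH_GBS / RAC4_BANDWIDTH_GBS
print("nominal bandwidth ratio: %.1f" % nominal_ratio)
```

The nominal ratio of about 3.3 is close to the measured speedup of a factor of 3.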

CPU speed, single-thread

  • The CPU + cache speed has been tested with a (single-threaded) PLSQL loop. The results show a 25% performance gain for this type of workload
  • plssqlloop_res.txt: plsqlloop rac4 vs quadcore

CPU speed, thread scalability

  • CPU-bound jobs on the quadcore have been tested to scale up to 8 threads of simultaneous/parallel execution without response time degradation. The test used a simple workload consisting of plsqlloop executed in parallel via parallel query. As the degree of parallelism increases, the response time does not change up to 8 parallel threads. At the OS level, 8 threads are seen scheduled on CPU. We conclude that there is no internal contention for simple CPU-bound jobs up to the number of cores (8).
  • Stress_test_parall_query.sql: multi thread scalability tested with Oracle parallel query
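
The scalability criterion used above (response time unchanged as parallelism grows) can be expressed as a small helper. The Python sketch below is not the attached test script; the function name, the input format (a mapping from degree of parallelism to measured response time) and the 10% tolerance are assumptions made for illustration.

```python
def flat_scaling_limit(response_times, tolerance=0.10):
    """Return the highest degree of parallelism with no response time degradation.

    response_times maps degree of parallelism -> measured response time and
    must contain an entry for degree 1, which serves as the baseline.
    Degrees are checked in ascending order and the scan stops at the first
    one whose response time exceeds the baseline by more than `tolerance`.
    """
    baseline = response_times[1]
    threshold = baseline * (1.0 + tolerance)
    limit = 1
    for degree in sorted(response_times):
        if response_times[degree] <= threshold:
            limit = degree
        else:
            break
    return limit
```

Fed with response times measured at increasing degrees of parallelism, the helper would be expected to return 8 for the behavior described above.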

I/O throughput, sanity check

  • I/O throughput for sequential I/O has been measured as a sanity check. No change is expected from measurements on the current RAC system, see RAC_storage_performance.pdf
  • The sequential I/O throughput is limited by the HBA as expected. Random I/O tests have not been performed since we don't expect any change from current RAC configuration.
  • SeqIO_stress_test.sql: Stress test for sequential IO throughput

Memory access speed and scalability test

  • Response time and scalability of logical IO have been measured against increasing workload
  • The results show that, for the server loads of interest, the quadcore machine performs like a 5-node RAC
  • Contention for memory access can be seen at high load
  • quadcore_memory_test.pdf: memory access performance and scalability
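
As a rough cross-check, the 5-node equivalence can be converted into a per-core figure and compared with the single-threaded result. The Python sketch below assumes that each RAC4 node contributes two single-core CPUs and that the workload uses all CPUs evenly; both are simplifications.

```python
RAC_NODES_EQUIVALENT = 5  # from the result above
CPUS_PER_RAC4_NODE = 2    # 2x Pentium IV per node (assumed single-core)
QUADCORE_CORES = 8        # 2x quadcore CPUs

rac4_cpu_equivalents = RAC_NODES_EQUIVALENT * CPUS_PER_RAC4_NODE
per_core_speedup = rac4_cpu_equivalents / QUADCORE_CORES
print("one quadcore core ~ %.2f Pentium IV CPUs" % per_core_speedup)
```

The value of 1.25 agrees with the 25% gain measured in the single-threaded PLSQL loop test.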

Streams performance tests

  • Streams apply (receiving end) has been configured on the quadcore. The CMS replication workload (Marcin's test) shows a speedup of about 60%

Phedex (CMS data transfer application) performance tests

  • Overall performance of the application has been measured using two database backends: a 6-node RAC of dual Xeon CPU servers, and 1 dual quadcore CPU server
  • The results show that for PhEDEx-like load the quadcore machine performs even slightly better than a 6-node RAC
  • However, due to limitations in the client hardware resources, the CPUs of neither the 6-node RAC nor the quadcore server were saturated.

COOL performance tests

  • COOL performance tests show a beneficial effect on performance/throughput when using the quadcore server. More data are being collected.

Topic attachments
  • PSS_quadcore_tests_April07.doc (r1, 218.0 K, 2007-05-08 11:38, LucaCanali): Quad-core servers for Oracle PDB services
  • SeqIO_stress_test.sql (r2, 0.5 K, 2007-03-26 21:37, LucaCanali): Stress test for sequential IO throughput
  • Stress_test_parall_query.sql (r1, 1.2 K, 2007-03-21 22:04, LucaCanali): multi thread scalability tested with Oracle parallel query
  • core4_vs_rac4.txt (r1, 4.5 K, 2007-03-21 21:31, LucaCanali): logical IO measurements: RAC4 vs quadcore
  • jloci.sql (r1, 1.2 K, 2007-03-21 21:30, LucaCanali): logical IO throughput measurement script
  • plssqlloop_res.txt (r1, 0.6 K, 2007-03-21 21:43, LucaCanali): plsqlloop rac4 vs quadcore
  • quadcore_memory_test.pdf (r5, 51.7 K, 2007-03-30 11:09, LucaCanali): memory access performance and scalability
Topic revision: r16 - 2007-06-13 - DirkDuellmann
 