Quadcore Tests - evaluation of a 2 x quadcore Intel x86_64 machine for PDB
This page reports on the results of the tests on a latest-generation server equipped with dual quadcore Intel CPUs, for evaluation at the PDB group in March 2007.
Main findings, Executive Summary
- A machine with 8 cores and 16GB of RAM (Quadcore) has been tested and compared with current PDB RACs (2x P-IV CPUs)
- Tests of CPU and concurrent memory access (performed using Oracle SQL) show a speedup by a factor of 5 (i.e. 1 quadcore server = 5 RAC nodes, for this type of workload and within the tested system loads). Scalability results show different behavior for the PIV and the quadcore
- Streams tests show a measured performance increase of 60% for the quadcore compared to PIV, when the quadcore is used as the apply (receiving) site
- CMS PhEDEx tests showed that a single quadcore box performs like a 6-node RAC.
- Measured power consumption per core showed an efficiency gain of a factor of 2 for the quadcore compared to the current production RACs
- Installation procedures for the quadcore on RHEL 4 need no additional effort compared to the existing procedure (there are minor differences with the current production RACs)
- SAN configuration and I/O throughput have been tested and are unchanged from the existing RAC configuration.
Systems main characteristics
- 2x Intel quadcore CPUs - Xeon E5345 @ 2.33GHz - L1 cache =128kB, L2 cache=8MB
- Intel 5000P Chipset Memory Controller Hub, RAM = 16GB (8 x 2GB FB-DIMM 667MHz, 1.5 ns), memory bandwidth = 21GB/s
- 2 x e1000 + Qlogic HBAs 2312
- installed with RHEL 4 U4 kernel 2.6.9-42.0.8-ELsmp x86_64
- Oracle 10.2.0.3 for x86_64 with RAC option (1-node RAC) and ASM
- For comparison, the RAC4 hardware is: 2x Pentium IV @ 3GHz
- E7520 Memory Controller, RAM = 4GB DDR2 400MHz (2.5 ns), memory bandwidth = 6.4GB/s
- 2 x e1000 + Qlogic HBAs 2312
- installed with RHEL 4 U4 kernel 2.6.9-42.0.3-ELsmp i386 (32 bit)
- Oracle 10.2.0.3 for i386 with RAC option (6-node RAC) and ASM
Installation
- The installation procedure is practically the same as for the PDB RACs on RHEL4 (Installation_verbose); the change to 64 bit is almost transparent
- The notable changes are the Oracle binaries (x86_64) and the RAM-related configuration: kernel.shmmax, vm.nr_hugepages, /etc/security/limits.conf and the swap size
- Contingency: the box under test would not boot with the default configuration. It was necessary to add nopic and selinux=0 to the boot parameters
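As an illustration of how the RAM-related parameters relate to each other, the following is a minimal Python sketch, not the procedure used on the test box. It assumes the default 2 MiB huge page size of x86_64 Linux and a hypothetical SGA size of 8 GiB; the actual values used in the tests are not recorded on this page, and the helper name `hugepage_settings` is invented for the example.

```python
HUGEPAGE_BYTES = 2 * 1024**2  # default huge page size on x86_64 Linux (2 MiB)


def hugepage_settings(sga_bytes: int) -> dict:
    """Return candidate sysctl values for a given SGA size (illustrative only)."""
    # Round up so the whole SGA fits into huge pages.
    pages = -(-sga_bytes // HUGEPAGE_BYTES)
    return {
        # Largest single shared memory segment, in bytes; at least the SGA size.
        "kernel.shmmax": sga_bytes,
        # Number of huge pages reserved by the kernel.
        "vm.nr_hugepages": pages,
    }


# Hypothetical SGA size (assumption, not a value taken from the tests).
settings = hugepage_settings(8 * 1024**3)
for key, value in settings.items():
    print(f"{key} = {value}")
```

The memlock limit in /etc/security/limits.conf would normally need to be at least as large as the memory reserved for huge pages, so that the oracle user can lock the SGA.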
Power Consumption
- Quadcore data measured by Alexander Iribarren:
- Loaded: 453 VA
- Idle: 326 VA
- with the Qlogic HBA: +5VA
- For reference:
- servers on RAC4 (Pentium IV Xeon): 260 W (loaded)
- storage arrays on RAC4: 200 W (loaded)
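The per-core efficiency gain can be checked with a short calculation from the figures above. This is a rough sketch under two assumptions that are not stated in the measurements: VA is treated as equivalent to W (power factor close to 1), and each Pentium IV Xeon CPU counts as one core. Storage arrays are left out of the comparison.

```python
# Loaded power figures quoted in the list above.
QUADCORE_LOADED_VA = 453   # dual quadcore server, loaded
RAC4_NODE_LOADED_W = 260   # one RAC4 server (2x Pentium IV Xeon), loaded

QUADCORE_CORES = 8
RAC4_NODE_CORES = 2        # assumption: one core per Pentium IV Xeon CPU

# Power drawn per core; VA is treated as W (assumption: power factor ~1).
quadcore_per_core = QUADCORE_LOADED_VA / QUADCORE_CORES
rac4_per_core = RAC4_NODE_LOADED_W / RAC4_NODE_CORES

gain = rac4_per_core / quadcore_per_core
print(f"quadcore: {quadcore_per_core:.1f} per core")
print(f"RAC4:     {rac4_per_core:.1f} per core")
print(f"efficiency gain: {gain:.1f}x")
```

Under these assumptions the ratio comes out slightly above 2, in line with the factor of 2 quoted in the summary.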
Memory throughput
- Memory throughput has been measured with an Oracle-based benchmark, a modified version of the JLOCI benchmark (see attachment). The result is a speedup by a factor of 3 on the quadcore machine. This is consistent with the specs of the FB-DIMM 667MHz memory.
- jloci.sql: logical IO throughput measurement script
- core4_vs_rac4.txt: logical IO measurements: RAC4 vs quadcore
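The consistency with the memory specs can be illustrated by comparing the nominal bandwidth figures listed under the system characteristics. This is only a sketch of the arithmetic; nominal bandwidth is a theoretical peak, and the measured logical IO speedup also depends on CPU and cache.

```python
# Nominal figures quoted in the hardware description.
QUADCORE_BANDWIDTH_GBS = 21.0   # FB-DIMM 667MHz
RAC4_BANDWIDTH_GBS = 6.4        # DDR2 400MHz

bandwidth_ratio = QUADCORE_BANDWIDTH_GBS / RAC4_BANDWIDTH_GBS


def cycle_time_ns(freq_mhz: float) -> float:
    """Cycle time in nanoseconds for a clock frequency given in MHz."""
    return 1000.0 / freq_mhz


print(f"nominal bandwidth ratio: {bandwidth_ratio:.2f}")
print(f"cycle time at 667MHz: {cycle_time_ns(667):.1f} ns")
print(f"cycle time at 400MHz: {cycle_time_ns(400):.1f} ns")
```

The nominal ratio is about 3.3, so a measured speedup of about 3 sits just below the theoretical ceiling.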
CPU speed, single-thread
- The CPU + cache speed has been tested with a (single-threaded) PL/SQL loop. The results show a 25% performance gain for this type of workload
- plssqlloop_res.txt: plsqlloop rac4 vs quadcore
CPU speed, thread scalability
- CPU-bound jobs on the quadcore have been shown to scale up to 8 threads of simultaneous/parallel execution without response time degradation. The test used a simple workload consisting of plsqlloop executed in parallel using parallel query. As the degree of parallelism increases, the response time does not change up to 8 parallel threads. At the OS level, 8 threads are seen scheduled on CPU. We conclude that there is no internal contention for simple CPU-bound jobs up to the number of cores (8).
- Stress_test_parall_query.sql: multi thread scalability tested with Oracle parallel query
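Taken together, the single-thread gain and the flat scaling up to 8 threads give a back-of-envelope estimate of how many RAC4 nodes one quadcore server could replace for simple CPU-bound work. This is a sketch under the assumption of one core per Pentium IV CPU and perfect scaling on both systems. It is not a measured result; the 5-node figure in the summary comes from the CPU and memory access tests.

```python
QUADCORE_CORES = 8          # cores that scale without contention (test above)
SINGLE_THREAD_GAIN = 1.25   # +25% per thread vs RAC4 (single-thread test)
RAC4_CPUS_PER_NODE = 2      # assumption: one core per Pentium IV CPU

# Quadcore capacity expressed in Pentium IV CPU equivalents.
piv_cpu_equivalents = QUADCORE_CORES * SINGLE_THREAD_GAIN

# The same capacity expressed in dual-CPU RAC4 nodes.
equivalent_rac4_nodes = piv_cpu_equivalents / RAC4_CPUS_PER_NODE

print(f"PIV CPU equivalents: {piv_cpu_equivalents}")
print(f"equivalent RAC4 nodes: {equivalent_rac4_nodes}")
```

The estimate of 5 nodes agrees with the equivalence reported for the memory access workload.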
I/O throughput, sanity check
- I/O throughput for sequential I/O has been measured as a sanity check. No change is expected from measurements on the current RAC system, see RAC_storage_performance.pdf
- The sequential I/O throughput is limited by the HBA as expected. Random I/O tests have not been performed since we don't expect any change from current RAC configuration.
- SeqIO_stress_test.sql: Stress test for sequential IO throughput
Memory access speed and scalability test
- Response time and scalability of logical IO have been measured against increasing workload
- The results show that, for the server loads of interest, the quadcore machine performs like a 5-node RAC.
- Contention for memory access can be seen at high load
- quadcore_memory_test.pdf: memory access performance and scalability
Streams performance tests
- Streams apply (receiving end) has been configured on the quadcore. The CMS replication workload (Marcin's test) shows a speedup of about 60%
Phedex (CMS data transfer application) performance tests
- Overall performance of the application has been measured with two alternative database backends: a 6-node RAC consisting of dual Xeon CPU servers, and 1 dual quadcore CPU server
- The results show that for a PhEDEx-like load the quadcore machine performs even slightly better than a 6-node RAC
- However, due to limitations in the client hardware resources, the CPUs of neither the 6-node RAC nor the quadcore server were saturated.
COOL performance tests
- COOL performance tests show a beneficial effect on performance/throughput when using the quadcore server. More data are being collected.