DB12 dependency on Python version and OS

Open questions (that may yet be answered)

  1. Manfred: Is it possible to enable batch queues on the various test environments, in order to compare with the performance of ALICE/ATLAS/CMS/LHCb applications? Are user jobs also boosted when using newer Python releases? - Remark (Manfred): at GridKa the static benchmark scores are available to users via the MJF interface. Benchmark jobs should be submitted to the default ARCs; there are currently no dedicated batch queues configured for benchmark jobs.
  2. Manfred: AFAIK the experiments use private Python installations from their CVMFS areas. Which Python release is used by production jobs? (Remark (Manfred): GridKa does not look into user jobs and is therefore not aware of the Python release being used inside the jobs. This question should be answered by the experiments.)

DB12.py running in different containers (D. Giordano)

Done on a physical server at CERN (Haswell; see the bare-metal server specs below).

Configurations tested

Docker containers were used to run DB12 with different Python versions. In addition, the performance of DB12 was measured directly with the Python 2.7.5 installed on this server. Versions tested:

| Type | OS / image | Python version |
| Container | slc6-base | 2.6.6 |
| Server | CC7 | 2.7.5 |
| Container | cc7-base | 2.7.5 |
| Container | python:2.7 (from Docker Hub) | 2.7.13 |
| Container | python:3 (from Docker Hub) | 3.6.0 |

GCC versions

| Image | Python library | GCC version |
| cern/slc6-base | /usr/lib64/libpython2.6.so.1.0 | GCC 4.4.7 20120313 (Red Hat 4.4.7-17) |
| cern/cc7-base | /lib64/libpython2.7.so.1.0 | GCC 4.8.5 20150623 (Red Hat 4.8.5-11) |
| python:2.7 | /usr/local/lib/libpython2.7.so.1.0 | GCC (Debian 4.9.2-10) 4.9.2 |
| python:3 | /usr/local/lib/libpython3.6m.so.1.0 | GCC (Debian 4.9.2-10) 4.9.2 |

DB12 running approach

The DB12 version available on GitHub (https://github.com/DIRACGrid/DB12/blob/master/DIRACbenchmark.py) was run with the flag --extra-iteration (DB12 modified to run 2 extra iterations, in order to make sure that all benchmarked processes finish while the machine is still fully loaded).

NB: a few parts of the Python script had to be fixed to comply with Python 3.6.0.
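
The actual patch is not reproduced here; as a purely illustrative (hypothetical) example, the Python-2-only constructs that typically need adapting for Python 3 are of this kind:

    # Hypothetical illustration only -- not the actual changes made to DIRACbenchmark.py.
    # These constructs work in both Python 2 and Python 3:
    from __future__ import division, print_function

    n = 16
    print("copies =", n)        # print() as a function (Python 2 print statements fail in 3.x)
    print(n / 3, n // 3)        # true vs. floor division must be chosen explicitly
    for _ in range(n):          # range() instead of xrange(), which was removed in Python 3
        pass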

Example of command line

  • docker run -it --rm --name my-running-script -v /root/DB12:/usr/src/myapp -w /usr/src/myapp $IMAGE python DIRACbenchmark.py --iterations=1 --extra-iteration
    • with $IMAGE in: cern/slc6-base, cern/cc7-base, python:2.7, python:3
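
To collect all measurements in one go, the docker invocation can be wrapped in a small driver script. A minimal sketch (assuming Docker is available on the host and DB12 is checked out under /root/DB12, as in the command above):

    #!/usr/bin/env python
    # Minimal sketch of a driver that repeats the docker command above for each image.
    # Assumes Docker is installed and DIRACbenchmark.py is in /root/DB12 on the host.
    # The -it flags are dropped because no interactive TTY is needed inside a script.
    import subprocess

    IMAGES = ["cern/slc6-base", "cern/cc7-base", "python:2.7", "python:3"]

    for image in IMAGES:
        print("### running DB12 in %s" % image)
        subprocess.check_call([
            "docker", "run", "--rm",
            "-v", "/root/DB12:/usr/src/myapp", "-w", "/usr/src/myapp",
            image, "python", "DIRACbenchmark.py",
            "--iterations=1", "--extra-iteration",
        ])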

Results

Results are reported in the attached table DB12_vs_PythonVersion.png.

  • Python 2.7.5 is ~10% faster than Python 2.6.6.
  • Python 2.7.13 is even 18% faster, but this could also depend on the different OS of that container.

The study was performed by benchmarking the whole node (32 processes) and half of it (16 processes). The ratio between the DB12 values for 16 and 32 processes shows the usual lack of gain from hyper-threading: the values are comparable within a few %.

Measurements with 16 processes are less stable: there always seem to be some processes that are slower than the average. This can also be seen by comparing the average value with the median, e.g. with the short check shown after the following example:

  • Example: python DIRACbenchmark.py --iterations=1 --extra-iteration 16
    • DB12 output: (16, 244.2583650301175, 15.266147814382343, 14.986596160813637, 16.622340425531913)
    • Per-process scores: 8.78117316473 9.52743902439 12.0714630613 14.409221902 16.5453342158 16.5672630881 16.5782493369 16.6223404255 16.6223404255 16.6333998669 16.6333998669 16.6444740346 16.6444740346 16.655562958 16.655562958 16.6666666667
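
The average/median comparison can be checked directly from the per-process scores above; a minimal snippet (scores copied from the example):

    # Compare mean and median of the 16 per-process DB12 scores listed above:
    # a mean well below the median indicates a tail of slow processes.
    scores = [8.78117316473, 9.52743902439, 12.0714630613, 14.409221902,
              16.5453342158, 16.5672630881, 16.5782493369, 16.6223404255,
              16.6223404255, 16.6333998669, 16.6333998669, 16.6444740346,
              16.6444740346, 16.655562958, 16.655562958, 16.6666666667]

    mean = sum(scores) / len(scores)
    ordered = sorted(scores)
    mid = len(ordered) // 2
    median = 0.5 * (ordered[mid - 1] + ordered[mid])
    print("total = %.2f, mean = %.2f, median = %.2f" % (sum(scores), mean, median))

With these numbers the mean (~15.3) sits clearly below the median (~16.6), consistent with a tail of slow processes.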

DB12 C++ (D. Giordano)

A DB12 implementation in C/C++, used to decouple the Python interpreter from the random-number generation.

Results

Results are reported in the attached table DB12.cpp_vs_OS.png.

Done on the same physical server at CERN used for the Python study. Containers were adopted to run on different OSes (SLC6, CC7, Debian 8).

  • No major score change across configurations:
    • +3% in CC7 (it was 9% for the Python version)
    • +5% in Debian (it was 18% with Python 2.7.13)
  • The ratio DB12 (32 processes) / DB12 (16 processes) is ~1.5, i.e. a 50% gain with HT on

Specs of the Bare metal server

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
Stepping: 2
CPU MHz: 1631.906
BogoMIPS: 4793.86
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K
NUMA node0 CPU(s): 0-7,16-23
NUMA node1 CPU(s): 8-15,24-31

DB12np.py: a Python version based on NumPy (from V. Innocente)

Vincenzo has slightly modified the original DIRACbenchmark.py code to adopt NumPy and profit from its optimized (C-based) implementation.

  • Repository: https://github.com/VinInn/pyTools/blob/master/DB12np.py
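
The core idea is to replace the per-sample calls to Python's random module with vectorized NumPy calls. The following is only an illustrative sketch of that idea, under the assumption that the kernel accumulates normally distributed random numbers as in the original DIRACbenchmark.py; it is not the code from the repository above:

    # Illustrative sketch only -- see the DB12np.py repository for the real code.
    # Assumption: the kernel accumulates normally distributed random numbers.
    import random
    import numpy as np

    def kernel_pure_python(n):
        # One interpreter-level iteration per random sample (original DB12 style).
        total = 0.0
        for _ in range(n):
            total += random.normalvariate(10, 1)
        return total

    def kernel_numpy(n, chunk=1000000):
        # Draw the samples in large chunks so the work is done in NumPy's C code.
        total = 0.0
        for start in range(0, n, chunk):
            size = min(chunk, n - start)
            total += np.random.normal(10, 1, size).sum()
        return total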

Comparison of perf profile

A comparison of the perf profiles of the three implementations is reported here.

  • perf record has been used to sample where the time is spent among the shared objects.
  • The data file has been analysed with perf report.
  • The fraction of time (Overhead) spent in each distinct shared object has then been computed, summing over the different symbols (a scripted sketch of this aggregation is shown at the end of this subsection):
    • e.g. the three mtrand.so entries are summed together.
# Overhead  Command  Shared Object        Symbol                                        
# ........  .......  ...................  ..............................................
#
    23.15%  python   libm-2.17.so         [.] __ieee754_log_avx
    13.19%  python   mtrand.so            [.] rk_random
    12.89%  python   mtrand.so            [.] rk_gauss
     5.83%  python   mtrand.so            [.] rk_double

  • Results are reported in the attached table DB12_perf_comparison.png.
  • It is evident that:
    • the DB12 NumPy and C++ versions are dominated by calls to libm-2.17.so, which contains the log function (__ieee754_log_avx);
    • on the contrary, the original DIRACbenchmark ("DB12 standard" in the table) is dominated by calls to libpython2.7 (86% of the time), and about half of that time is spent in PyEval_EvalFrameEx.
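
For reference, a minimal sketch of how the per-shared-object aggregation can be scripted, assuming a plain-text report produced with perf report --stdio in the format shown above (the file name perf_report.txt is just a placeholder):

    # Sketch: sum the perf "Overhead" percentages per shared object.
    # Assumes a plain-text report, e.g.: perf report --stdio > perf_report.txt
    import re
    from collections import defaultdict

    totals = defaultdict(float)
    with open("perf_report.txt") as report:
        for line in report:
            line = line.strip()
            if not line or line.startswith("#"):
                continue                      # skip headers and comment lines
            match = re.match(r"([\d.]+)%\s+\S+\s+(\S+)", line)
            if match:
                overhead, shared_object = match.groups()
                totals[shared_object] += float(overhead)

    # Print the aggregated overhead, largest contribution first.
    for shared_object, overhead in sorted(totals.items(), key=lambda kv: -kv[1]):
        print("%6.2f%%  %s" % (overhead, shared_object))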

Comparison of DB12 flavors on grid nodes at GridKa

| Hardware model | Benchmark copies | Ratio (copies per physical core) | HS06 | DB12 | DB12-cpp | DB12-np |
| E5-2630v4 | 20 | 1.0 | 333 | 276 | 338 | 372 |
| E5-2630v4 | 32 | 1.6 | 390 | 290 | 436 | 445 |
| E5-2630v4 | 40 | 2.0 | 416 | 289 | 500 | 498 |
| E5-2630v3 | 16 | 1.0 | 278 | 241 | 272 | 303 |
| E5-2630v3 | 24 | 1.5 | 328 | 246 | 335 | 356 |
| E5-2630v3 | 32 | 2.0 | 352 | 230 | 401 | 392 |
| E5-2660v3 | 20 | 1.0 | 374 | 334 | 380 | 409 |
| E5-2660v3 | 32 | 1.6 | 447 | 330 | 494 | 501 |
| E5-2660v3 | 40 | 2.0 | 467 | 329 | 567 | 551 |
| E5-2665 | 16 | 1.0 | 261 | 155 | 200 | 214 |
| E5-2665 | 24 | 1.5 | 305 | 173 | 290 | 289 |
| E5-2665 | 32 | 2.0 | 322 | 194 | 348 | 334 |
| E5630 | 8 | 1.0 | 112 | 67 | 123 | 112 |
| E5630 | 12 | 1.5 | 132 | 73 | 146 | 129 |
| E5630 | 16 | 2.0 |  | 81 | 165 | 143 |
| 6168 (2 sockets) | 24 | 1.0 | 193 | 147 | 179 | 206 |
| 6174 (4 sockets) | 48 | 1.0 | 430 | 347 | 412 | 490 |
| 6376 (4 sockets) | 32 | 0.5 | 331 | 223 | 302 | 319 |
| 6376 (4 sockets) | 64 | 1.0 | 499 | 330 | 548 | 537 |

Commands used to run the DB12 benchmarks:

| Benchmark | Command sequence | Comments |
| DB12 | /usr/sbin/DIRACbenchmark.py -i 10 --extra-iteration $n_copies | Python script from the mjf-db12 package |
| DB12-cpp | DB12.exe -n $n_copies | |
| DB12-np | ~/benchmarks/dirac/DB12np.py -i 10 --extra-iteration $n_copies | |

Note: there were two typos in the first release of the table: the correct DB12 score of the E5-2630v3 system running 24 benchmark copies in parallel is 246, not 264 (transposed digits, fixed 2017-04-25), and there was another typo in the HS06 score of the E5-2660v3 running 40 copies (fixed 2017-04-26).

-- DomenicoGiordano - 2017-03-24

Topic attachments
| Attachment | Size | Date | Who | Comment |
| DB12.cpp_vs_OS.png | 96.6 K | 2017-04-10 | DomenicoGiordano | DB12.cpp results for different OS versions |
| DB12_perf_comparison.png | 135.4 K | 2017-04-10 | DomenicoGiordano | perf comparison of the different DB12 versions |
| DB12_vs_PythonVersion.png | 36.8 K | 2017-03-24 | DomenicoGiordano | DB12 results for different python configurations |