Experiment parameters

Physical PC: pcitsdc04, running centos7-cern

Identical, unburdened AI nodes: ikadochn-es-c, ikadochn-es-c3, nova-large (4 CPU, 8GB) with slc6

Unless noted otherwise, the aggregation queries shown were run from ikadochn-es-c3.

Queries were constructed in Python to test different aggregations. The model aggregation (SRC) is the same as the one at https://twiki.cern.ch/twiki/bin/view/ArdaGrid/ElasticSearchEvaluation#Aggregation_query

For each aggregation and time range, the query was performed 12 times.
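The repeated measurement can be sketched roughly as follows. This is only an illustration, not the script used for these tests: `time_query`, `run_query` and `fake_query` are hypothetical names, and `fake_query` stands in for a real call to the cluster. Note that this sketch times everything `run_query` does, so if the callable also parses the response, client-side deserialization ends up in the total.

```python
import time


def time_query(run_query, repeats=12):
    """Call run_query() `repeats` times.

    Returns a list of (total_seconds, took_seconds) pairs, where
    took_seconds comes from the "took" field of the response.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        response = run_query()
        total = time.perf_counter() - start
        # Elasticsearch reports "took" in milliseconds
        took = response.get("took", 0) / 1000.0
        samples.append((total, took))
    return samples


def fake_query():
    # Hypothetical stand-in for a real search request
    return {"took": 5, "aggregations": {}}


samples = time_query(fake_query)
```

Each pair keeps the wall-clock time next to the ES-reported time, so the two can later be plotted against each other.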


  • Performance on the 3 tested clients is very similar
  • Performance of similar SRC and DST aggregations is almost identical
  • Indexing load causes rare 1-2 second delays in aggregation queries, but the overall effect is small
  • SRC input and output size is linear in the date range
  • SRC aggregation time is linear with respect to input size and output size
  • Matrix output size is not linear in the date range
  • Matrix aggregation time is not linear with respect to either input size or output size
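One way to put a number on "linear" versus "not linear" is a least-squares line fit together with its R² value. The sketch below is an assumption about how such a check could be done, not the method used for the plots on this page; `linear_fit` is a hypothetical helper and all the numbers are synthetic.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r_squared


# Synthetic data: one perfectly linear series, one that flattens out
days = [1, 2, 3, 4]
linear_series = [3, 5, 7, 9]
saturating_series = [10, 18, 20, 20.5]

slope, intercept, r2_linear = linear_fit(days, linear_series)
_, _, r2_saturating = linear_fit(days, saturating_series)
```

An R² close to 1 supports a linear relationship; a series that flattens out, as the matrix output does, gives a visibly lower value.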

More detailed aggregation timings

Model SRC aggregation times plotted against time range, bucket count and record count.

0_SRC_timing.png 1_SRC_timing.png 2_SRC_timing.png

The same plots for the smaller range (1 week), which shows the realistic use case better:

4_SRC_timing_1_week.png 5_SRC_timing_1_week.png 6_SRC_timing_1_week.png

Not exactly linear here.

Total time vs ES reported time:

3_SRC_total_vs_ES_time.png 7_SRC_total_vs_ES_time_1_week.png

Unexpectedly linear! Does the fluctuation in transfer time correlate with the fluctuation in aggregation time? Or is most of the difference JSON serialization time rather than transfer time?

What does ES report:

The time reported by Elasticsearch in the "took" field is the time it took Elasticsearch to process the query on its side. It doesn't include:
- serializing the request into JSON on the client
- sending the request over the network
- deserializing the request from JSON on the server
- serializing the response into JSON on the server
- sending the response over the network
- deserializing the response from JSON on the client

In our case, the aggregation request is small, and client-side deserialization is not included in the total time (unlike in the linked discussion, where it is). This leaves server-side serialization and transfer time to account for the difference.
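The question of whether the difference between total time and "took" moves together with "took" can be checked with a plain correlation coefficient. The sketch below is only an illustration under stated assumptions: `overheads` and `pearson` are hypothetical helpers, and the sample values are made up, not measurements from these tests.

```python
import math


def overheads(samples):
    """samples: list of (total_ms, took_ms). Returns total - took for each."""
    return [total - took for total, took in samples]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one series is constant
    return cov / math.sqrt(var_x * var_y)


# Made-up (total_ms, took_ms) pairs where the overhead scales with "took"
example = [(110, 100), (220, 200), (330, 300)]
took_values = [took for _, took in example]
r = pearson(took_values, overheads(example))
```

A coefficient near 1 would mean the non-ES part of the time grows with the ES part, which is what one would expect if it is dominated by serializing and transferring a response whose size grows with the aggregation.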

Different client machines

This check makes sure that changing the client the tests run from does not drastically change the outcome. VMs may have different network performance than the PC the initial tests ran from, and that may affect aggregation time.

Also, virtual machines might show less variance because they are idle and have no GUI running, or they might show more variance because they run a Puppet agent and the fluctuating load on other VMs might affect them.

8_Clients_comparison.png 10_Clients_comparison_1_week.png

Changing the client machine has almost no effect. Either the network conditions and performance are close for all clients, or the client has a comparatively small effect on aggregation performance.

Plots with standard deviation, to see whether the variance changes noticeably:

9_Clients_AVG.png 11_Clients_AVG_1_week.png

No apparent difference here.
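The per-client average and standard deviation behind plots like these can be computed along the following lines. This is a minimal sketch, not the original plotting code: `summarize` is a hypothetical helper, and the client names and timings are placeholders rather than measured values.

```python
import statistics


def summarize(timings_by_client):
    """Map each client name to (mean, sample standard deviation)."""
    return {
        client: (statistics.mean(values), statistics.stdev(values))
        for client, values in timings_by_client.items()
    }


# Placeholder timings in seconds, one list of repeats per client
placeholder_timings = {
    "client_a": [1.0, 2.0, 3.0],
    "client_b": [2.0, 2.0, 2.0],
}
summary = summarize(placeholder_timings)
```

`statistics.stdev` needs at least two values per client, which the 12 repeats per query satisfy.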


A simple check that equivalent aggregation by DST instead of SRC is similar in performance:

12_SRC_vs_DST.png 13_SRC_vs_DST_1_week.png

Effect of indexing load on cluster performance

To check how the index update operation affects aggregation requests, repeated indexing of 3 days of data was started from ikadochn-es-c, then aggregation timings were measured from ikadochn-es-c3.

14_Load_comparison.png 17_Load_comparison_1_week.png

15_Load_comparison.png 18_Load_comparison_1_week.png

16_Load_timing.png 19_Load_timing_1_week.png

It looks like indexing has no effect on average aggregation times, but in rare cases it produces slower outliers.

Different SRC aggregations

Comparison of different SRC aggregations:

  • src_plot: hourly bins, aggregations from the top of hierarchy: SRC_DOMAIN, IS_REMOTE_ACCESS, IS_TRANSFER, ACTIVITY, PERIOD_END_TIME
  • src_plot_10m: SRC, but with 10-minute bins (no time aggregation at all, only sum over DST)
  • src_plot_10m_nohist: Instead of histogram aggregation, use terms aggregation for PERIOD_END_TIME
  • src_plot_daily: same as SRC, but with 24-hour bins
  • src_plot_timefirst: SRC, but change aggregation order to PERIOD_END_TIME, SRC_DOMAIN, IS_REMOTE_ACCESS, IS_TRANSFER, ACTIVITY
  • src_plot_minimal: hourly bins, aggregations are: SRC_DOMAIN, PERIOD_END_TIME, leading to fewer buckets
  • src_plot_minimal_timefirst: reverse the previous to PERIOD_END_TIME, SRC_DOMAIN
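The variants above differ mainly in the order of the nested aggregations and in the histogram interval. A query body for such a hierarchy could be generated from an ordered field list roughly as shown below. This is a sketch under assumptions, not the query used here: `build_nested_aggs` is a hypothetical helper, the leaf metrics of the model aggregation are left out, terms aggregations are left at their default bucket limit, and the name of the histogram interval parameter (`interval` here) depends on the Elasticsearch version.

```python
def build_nested_aggs(fields, time_field="PERIOD_END_TIME", interval="1h"):
    """Nest one aggregation per field, outermost first.

    The time field gets a date_histogram, every other field a terms
    aggregation. Each aggregation is named after its field.
    """
    body = None
    # Build from the innermost level outwards
    for field in reversed(fields):
        if field == time_field:
            agg = {"date_histogram": {"field": field, "interval": interval}}
        else:
            agg = {"terms": {"field": field}}
        if body is not None:
            agg["aggs"] = body
        body = {field: agg}
    # Top-level "size": 0 asks for aggregation results only, no hits
    return {"size": 0, "aggs": body}


minimal = build_nested_aggs(["SRC_DOMAIN", "PERIOD_END_TIME"])
minimal_timefirst = build_nested_aggs(["PERIOD_END_TIME", "SRC_DOMAIN"])
```

Changing the order of the list is enough to go from a src_plot_minimal-style query to the timefirst variant, which makes it easy to compare orders without rewriting the query by hand.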

The number of records traversed depends only on the aggregated time range:

3_SRC_record_count_1.png 10_SRC_record_count_1_week_1.png

The size of the returned buckets depends on the aggregation structure, but not on the interval, as expected:

5_SRC_bucket_size_1.png 5_SRC_bucket_size_1_zoom.png

The number of buckets returned depends on aggregation and interval, but not aggregation order, as expected:

4_SRC_bucket_count_1.png 11_SRC_bucket_count_1_week_1.png
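A bucket count can be obtained from a response by walking the nested "aggregations" part. The sketch below counts buckets at every nesting level; whether the plots here count all levels or only the innermost one is not stated, so treat this as one possible definition. `count_buckets` is a hypothetical helper, the sample response is hand-written, and only list-style "buckets" are counted.

```python
def count_buckets(node):
    """Count all buckets at every nesting level below `node`."""
    total = 0
    if isinstance(node, dict):
        buckets = node.get("buckets")
        if isinstance(buckets, list):
            total += len(buckets)
            # Each bucket may hold further sub-aggregations
            for bucket in buckets:
                total += count_buckets(bucket)
        else:
            for value in node.values():
                total += count_buckets(value)
    return total


# Hand-written response fragment: 2 outer buckets with 3 inner buckets each
inner_level = {"buckets": [{"key": k, "doc_count": 1} for k in (1, 2, 3)]}
sample_aggregations = {
    "SRC_DOMAIN": {
        "buckets": [
            {"key": "a", "doc_count": 3, "PERIOD_END_TIME": inner_level},
            {"key": "b", "doc_count": 3, "PERIOD_END_TIME": inner_level},
        ]
    }
}
bucket_total = count_buckets(sample_aggregations)
```

Swapping the nesting order of the same two fields changes how the buckets are grouped, but the innermost combinations stay the same, which is consistent with the count not depending on aggregation order.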

As a result, data size over time range is different for all aggregations:

6_SRC_data_size_1.png 13_SRC_data_size_1_week_1.png tophits_days_size.png

Timing results:

0_SRC_comparison_1.png 7_SRC_comparison_1_week_1.png

tophits_days_total_0.png tophits_days_total.png

1_SRC_comparison_1.png 8_SRC_comparison_1_week_1.png

2_SRC_comparison_1.png 9_SRC_comparison_1_week_1.png

Matrix aggregations

Matrix aggregations are a curious case because data does not grow for longer aggregation periods.
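One plausible reading, stated here as an assumption rather than a measured fact: a matrix aggregation has no time dimension, so its output is bounded by the number of distinct (source, destination) pairs, and most pairs already show up in the first days. The toy sketch below illustrates that saturation with made-up records; `distinct_pairs_over_time` is a hypothetical helper.

```python
def distinct_pairs_over_time(records):
    """records: iterable of (day, src, dst), sorted by day.

    Returns a list of (day, cumulative number of distinct (src, dst) pairs).
    """
    seen = set()
    growth = []
    current_day = None
    for day, src, dst in records:
        if current_day is not None and day != current_day:
            growth.append((current_day, len(seen)))
        current_day = day
        seen.add((src, dst))
    if current_day is not None:
        growth.append((current_day, len(seen)))
    return growth


# Made-up records: new pairs stop appearing after day 2
toy_records = [
    (1, "a", "b"),
    (1, "a", "c"),
    (2, "a", "b"),
    (2, "b", "c"),
    (3, "a", "c"),
]
growth = distinct_pairs_over_time(toy_records)
```

In this toy data the record count keeps growing with the date range while the pair count levels off, which is the kind of non-linear output size described for the matrix case.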

Compared 2 aggregations with examples of SRC aggregations for scale:

  • matrix_minimal: just SRC_DOMAIN, DST_DOMAIN

Bucket size depends on aggregation time, on a scale comparable to SRC:

38_Matrix_vs_SRC_bucket_size.png 44_Small_matrix_bucket_size.png

Bucket count for the matrix is not linear:

37_Matrix_vs_SRC_bucket_count.png 43_Small_matrix_bucket_count.png

As a result, data size is also not linear:

39_Matrix_vs_SRC_data_size_.png 45_Small_matrix_data_size_.png

Matrix time is not linear with respect to date range (input data size):

34_Matrix_vs_SRC_timings.png 40_Small_matrix_timings.png

Matrix time is not linear with respect to bucket count (which is proportional to output data size):

35_Matrix_vs_SRC_timings.png 41_Small_matrix_timings.png

Matrix time is not linear with respect to output data size:

36_Matrix_vs_SRC_timings.png 42_Small_matrix_timings.png

-- IvanKadochnikov - 2015-05-18

Topic revision: r7 - 2015-06-04 - IvanKadochnikov