WARNING: This web is not used anymore. Please use PDBService.StreamsPerformance instead!
 

Streams Performance and Scalability Tests

Goals

This wiki page documents and shares the results of the Streams performance and scalability testing in progress at CERN. The aim is a scalable and robust replication system for sharing data between Oracle databases at CERN and outside CERN (Tier 1 sites), and also within CERN (online-to-offline data transfer).

People involved

CERN: Dirk, Eva, Luca
Oracle: Patricia, Graeme

Test Architecture and setup

Replication has been set up between two Oracle RAC databases at the CERN datacenter on the same Gbps LAN. This reduces complexity by excluding network latency issues and heterogeneous configurations. Two-node Oracle RAC clusters on Linux are used (see Architecture_description.pdf and Installation_verbose). Streams replication is set up as described in CONFIG_STREAMS_for_TEST_10Jul.txt. The test workload is a simple insert into a single table replicated from the source (d3r database) to the destination (test1 database). This workload and the structure of the table (a short row composed mainly of numeric fields) were chosen to reflect the expected behavior of an important class of applications at CERN, where 'conditions data' (numeric detector parameters and the like) are logged into one table. The test script can be found in test_script.py.txt.

Test 10-Jul, Luca

The test is: insert 1M rows into a single table with one index (the PK). Insert statements are grouped in transactions of 100 rows. A delay of 2 cs (20 ms) has been added between transactions.
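A minimal sketch of this workload pattern (the actual generator is in test_script.py.txt; the table and column names here are made up, and a stub cursor stands in for a real Oracle connection so the sketch runs anywhere):

```python
import time

class FakeCursor:
    """Stand-in for a DB-API cursor/connection so the sketch runs without Oracle."""
    def __init__(self):
        self.rows = 0
        self.commits = 0
    def execute(self, sql, params):
        self.rows += 1
    def commit(self):
        self.commits += 1

def run_workload(cur, total_rows=1000, txn_size=100, delay_s=0.0):
    """Insert total_rows in transactions of txn_size rows, sleeping delay_s
    between transactions (the real test used 1M rows and delay_s = 0.02)."""
    for start in range(0, total_rows, txn_size):
        for i in range(start, start + txn_size):
            # hypothetical table/columns; the real schema is mostly numeric fields
            cur.execute("INSERT INTO cond_data (id, val) VALUES (:1, :2)", (i, i * 0.5))
        cur.commit()
        time.sleep(delay_s)

cur = FakeCursor()
run_workload(cur)
print(cur.rows, cur.commits)  # 1000 10
```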

Notes

  • The delay is new; it has been added to slow down the insert speed and bring it just above the measured replication speed.
  • The test now runs for about 5 minutes, instead of 1 minute.
  • Apply parallelism is set to 1
  • Replication source is d3r, target is test1
  • on both DBs, node 1 owns the Streams queues during the test (KST traces were gathered on node 1)

Collected logs and test sequence:

  • scripts for KST traces are prepared (see KST_trace.txt)
  • the test is started from a client on the same LAN as the DB servers (Mon Jul 10 19:27:42 2006)
  • a few seconds pass with no further actions
  • AWR snapshots are taken (on d3r and test1)
  • strmmon data collection is started (d3r and test1)
  • KST trace data is collected
  • The test program stops inserting (reached 1M rows) at Mon Jul 10 19:31:40 2006
  • streams replication continues until 2006-07-10 19:32:45
  • AWR reports are taken
  • strmmon collection is stopped
  • streams health check (HC) reports are gathered
  • test10Jul_d3r.tar.gz: source DB trace files (test 10jul)
  • test10Jul_test1.tar.gz: target DB trace files (10Jul)

Measured throughput:

  • in rows/LCRs: 1M rows in 5 min -> ~3.3K rows/sec
  • in bytes: 50 MB in 5 min -> 10 MB/min, ~170 KB/sec
  • Note: this result is consistent with what was measured and discussed previously for similar tests
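The arithmetic behind these figures, as a quick sanity check:

```python
rows, seconds = 1_000_000, 5 * 60
print(round(rows / seconds))       # 3333 rows/sec, i.e. the ~3.3K above
mb = 50
print(mb / (seconds / 60.0))       # 10.0 MB/min
print(round(mb * 1024 / seconds))  # 171 KB/sec, i.e. the ~170 KB/sec above
```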

Notes on Performance and scalability:

  • The measured performance of 'Test 10-Jul' is already usable by some of the applications in development and test at CERN
  • However, some write-intensive applications that we foresee in our environment will most likely need more than that.
  • For this reason we would like to better understand the scalability limits of Streams replication in its current form and, where possible, tune for performance and scalability.
  • Other observations related to scalability and 'bottleneck hunting':
    • We see that systems resource utilization (CPU, IO) is not the bottleneck
    • From this and other tests we see that 'capture' is not the bottleneck
    • From other tests we have seen that increasing apply parallelism does not increase throughput for this particular workload (the peculiarity here is that we insert into one table)
    • During the test we could see that the replication pace was slowly drifting behind the insert (test harness) workload, so we believe a steady state can be reached at a somewhat lower workload than the one used for Test 10-Jul.
    • Note 11 Jul: this has been confirmed by further tests:
      • delay = 1 cs (was 2 cs in Test 10-Jul) -> workload generator run time = 2 min -> streams replication run time = 5 min
      • delay = 3 cs (was 2 cs in Test 10-Jul) -> workload generator run time = 5 min -> streams replication run time = 5 min (it stops about 10 seconds after the workload generator)
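These runtimes are consistent with the floor that the inter-transaction sleep puts under the generator: 1M rows in 100-row transactions is 10,000 sleeps, so the sleep time alone caps the insert rate (a back-of-the-envelope check that ignores the insert time itself):

```python
TXNS, ROWS_PER_TXN = 10_000, 100
for delay_cs in (1, 2, 3):
    sleep_floor_s = TXNS * delay_cs / 100.0           # total sleep time per run
    rate_ceiling = TXNS * ROWS_PER_TXN / sleep_floor_s
    print(delay_cs, sleep_floor_s, round(rate_ceiling))
# delay = 3 cs caps the generator at ~3333 rows/sec, right at the ~3.3K
# rows/sec replication speed measured above - hence the steady state
```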

Test 12-Jul, Luca, Eva

  • similar to Test 10-Jul, with the following differences, as requested by Patricia:
  • KST tracing events changed:
ALTER SYSTEM SET trace_enabled = TRUE;
ALTER TRACING DISABLE "ALL";
ALTER TRACING ENABLE "10304:255:ALL:NORMAL";
ALTER TRACING ENABLE "10005:255:ALL:NORMAL";
  • collected ash report too
  • added a script to collect pstack info for Apply Server in a loop (100 times with a delay of 1 sec)
  • Test12Jul_d3r.tar.gz: test12jul - logs and reports from the source DB
  • Test12Jul_test1.tar.gz: test12jul - logs and reports from the destination DB

Reply from Patricia (22-Jul):

We would like to see more detailed information for the apply server process (P001), session id 505, from the ASH information:

  Sid, Serial# | % Activity | Event                                           | % Event | User      | Program                        | # Samples Active | XIDs
  509, 4539    | 48.84      | Streams AQ: enqueue blocked due to flow control | 30.37   | STRMADMIN | oracle@itrac37...h (TNS V1-V3) | 222/606 [37%]    | 0
               |            | CPU + Wait for CPU                              | 13.27   |           |                                | 97/606 [16%]     | 0
               |            | SQL*Net more data from client                   | 4.79    |           |                                | 35/606 [6%]      | 0
  505, 2       | 36.53      | CPU + Wait for CPU                              | 34.75   | SYS       | oracle@itrac25.cern.ch (P001)  | 254/606 [42%]    | 192

Just FYI... you can specify a specific session by defining the target_session_id variable within ashrpt.sql. Make a copy of ashrpt.sql from the rdbms/admin directory, define the target_session_id variable in the copy, and use the modified file to generate the ASH report.

If the ASH data is no longer available for this AWR report and you need to rerun the load, can you supply some sar data that shows the disk activity during the run - and identify the disks where the database files and logfiles are stored. Also, the KST tracing should not be collected until you see the apply consistently showing as a bottleneck (at least 6-10 times in a row).
Just to clarify... we will need to see 2 ASH reports for the target database. The first one must be run without the specific target_session_id setting. Then, generate a 2nd ASH report with the APPLY SERVER sid specified, so we can get more detailed information. Basically we want detailed event info on any session that is taking significant time on the database (other than the propagation receiver). In the earlier example, session id 509 (the first session listed in the table) is the propagation receiver.

Added 27-Jul: can you add to the data collected pstack output for the reader process of the CAPTURE at the source database. Take multiple pstack outputs at the source. In addition, obtain the ashrpt for the capture reader process session at the source database. There have been some additional discussions on the output supplied so far, and additional info has been requested.

Added 4-Aug: I just want to let you know that the last report showed the logminer reader to be 90% active. Based on this information, it doesn't seem likely that the apply is the bottleneck. If you are going to do a re-run, please make sure to get an ASH report targeting the logminer reader session, on the source, as well. Actually, I think we can make the information collection a bit simpler on this re-run. Just give us the STRMMON and the ASH report for the source and destination. Also, re-run the ASH for the logminer reader and the apply server session. In other words, it is not necessary to generate the KST and the AWR reports.

Test 04-Aug Luca, Eva

  • repeated the same test (insert into 1 table and replication from source to destination RAC on a LAN; 1 million rows inserted in transactions of 100 rows, with a delay of 2 cs between transactions; see also the Python script)
  • test04aug.zip: streams replication performance test - test04 aug
  • note on the ASH reports: they were run a few seconds (estimated up to 1 min) after replication had finished, with start = -9 min relative to the current time (this introduces some 'idle time' into the calculation)
  • note on apply parallelism: it was set to 1
  • comment: we don't see capture (LogMiner) as the bottleneck.

Test 15-Aug Web Conference

Test 12-Nov

  • From Patricia: alter system set events '26749 trace name context forever, level 2';
  • Test with apply parallelism = 2 (parallelism 3 and 4 were also tested: contention reduces throughput; parallelism = 1 also gives lower throughput, for a different reason)
    • using the same test script as the July tests above (with delay = 0.1 cs)
    • in rows/LCRs: 1M rows in 160 sec -> 6.2K rows/sec
    • capture is still blocked by flow control most of the time, and the capture (OS) process is active with 20% of CPU (on a dual-CPU machine) -> capture is not the bottleneck
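For comparison with the July figure, the arithmetic:

```python
rows, seconds = 1_000_000, 160
print(round(rows / seconds))  # 6250 rows/sec with apply parallelism 2,
                              # vs ~3333 rows/sec with parallelism 1 in July
```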

CMS Online/Offline replication tests, Marcin

The test is: Streaming 7 ecalpedestals objects in a single commit

Test conditions and setup

  • Streams setup described in Test Architecture and setup (above)
  • Inserting data into 2 tables - st_ecalpedestals(1 row per object) and st_ecalpedestals_item (child)
  • Each object's size is 1.63 MB; after relational mapping it occupies 61,200 rows in the st_ecalpedestals_item table
  • We have 7 x 61200 row inserts in a single commit
  • The number of LCRs is 428,400
  • It must be 7 objects in a single commit; otherwise there is a risk of data corruption
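The LCR count follows directly from the relational mapping:

```python
objects, rows_per_object = 7, 61_200
lcrs = objects * rows_per_object
print(lcrs)                               # 428400 LCRs per commit
mb_per_object = 1.63
print(round(objects * mb_per_object, 2))  # ~11.41 MB of object data per commit
```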

Test procedure

PL/SQL streamtest procedure generates the random data and does the table inserts. During the tests we're monitoring the LCRs flow, redo generated on source and destination databases and streams pool size. AWR reports are taken.

Performance

  • Performance is very low - the replication time is 300 - 350 seconds
  • Both processors run at 50%
  • 100 - 120 secs of "Streams AQ: enqueue blocked due to flow control" wait events on the destination database
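That replication time corresponds to roughly 1.2-1.4K LCRs/sec (a figure derived from the numbers above, not measured directly):

```python
lcrs = 428_400
for seconds in (300, 350):
    print(round(lcrs / seconds))  # 1428 and 1224 LCRs/sec
```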

Unresolved or partially unresolved problems

We've noticed that for a single run of streamtest(7) the redo volume generated by the destination database is much higher than on the source (716 MB vs. 60 MB, a ~12x factor).
The redo traffic distribution is also interesting. Redo activity is around 3.5 MB/s until the apply writer starts, and 1.5 MB/s during the apply (as indicated by strmmon, when the AXXX process activity changes to a non-zero value); see lcrs.png and redo.png attached.
Having mined the logs on the destination database, we discovered a lot of inserts into STREAMS$_APPLY_SPILL_MSGS_PART - precisely, the number of inserts into that table equals the number of inserts into st_ecalpedestals_item, which is 428,400 (7 x 61,200).
We dumped the log and observed that the redo entry size for STREAMS$_APPLY_SPILL_MSGS_PART is 591 bytes vs. 75 bytes for an st_ecalpedestals_item insert entry. Each replicated row therefore generates 591 + 75 = 666 bytes of redo on the destination, a factor of 666/75 = 8.88 over the source. So the redo volume generated by the st_ecalpedestals_item and STREAMS$_APPLY_SPILL_MSGS_PART inserts is around 300 MB, which leaves about 400 MB of redo unaccounted for (we've only checked the most frequent redo entries).
We suspect that the inserts into STREAMS$_APPLY_SPILL_MSGS_PART are related to the TXN_LCR_SPILL_THRESHOLD apply parameter. We tried setting this parameter to values other than the default of 10,000. With values > 80,000 the performance dropped slightly and the source database redo volume doubled (so far we haven't mined those logs). The performance drop could be observed in the strmmon output - there were 10-20 sec gaps in LCR propagation (source and destination queues as well as propagation indicated 0 LCRs/sec). Setting the parameter to 300,000 caused spilling from the queues on the destination database, which increased the redo volume to 1.2 GB on the destination and also increased the redo on the source side. The free streams pool size never dropped below 30% (of 1.2 GB).
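A quick check of the redo accounting above, using the per-entry sizes from the log dump:

```python
lcrs = 428_400
spill_bytes, row_bytes = 591, 75
per_row_dest = spill_bytes + row_bytes     # 666 bytes of redo per row on the destination
print(round(per_row_dest / row_bytes, 2))  # 8.88 amplification factor vs the source
accounted_mib = lcrs * per_row_dest / 1024**2
print(round(accounted_mib))                # ~272 MiB, i.e. the 'around 300 MB' above
```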

Topic attachments
I Attachment History Action Size Date Who Comment
Texttxt CONFIG_STREAMS_for_TEST_10Jul.txt r1 manage 2.7 K 2006-07-10 - 22:08 LucaCanali streams configuration for Test 10Jul - main steps
Compressed Zip archivezip KST_output.zip r1 manage 1925.6 K 2006-08-15 - 20:50 EvaDafonte  
Texttxt KST_trace.txt r1 manage 3.3 K 2006-07-10 - 23:46 LucaCanali KST trace, details
Unknown file formatgz Test12Jul_d3r.tar.gz r2 r1 manage 1621.8 K 2006-07-12 - 15:35 LucaCanali test12jul - logs and reports from the source DB
Unknown file formatgz Test12Jul_test1.tar.gz r2 r1 manage 1671.9 K 2006-07-12 - 15:36 LucaCanali test12jul - logs and reports from the destination DB
Texttxt addmrpt_1_2174_2176_test12jul_d3r.txt r1 manage 7.7 K 2006-07-13 - 14:11 EvaDafonte test12jul - addm report from the source DB
Texttxt addmrpt_1_5030_5032_test12jul_test1.txt r1 manage 5.2 K 2006-07-13 - 14:11 EvaDafonte test12jul - addm report from the destination DB
Compressed Zip archivezip ash_reports.zip r1 manage 44.8 K 2006-08-15 - 20:48 EvaDafonte  
Unknown file formatsql ecalpedestals_ddl.sql r1 manage 0.8 K 2007-01-16 - 15:56 MarcinBogusz  
Compressed Zip archivezip hc_reports.zip r1 manage 187.2 K 2006-08-15 - 20:49 EvaDafonte Health Check Reports
PNGpng lcrs.png r1 manage 15.4 K 2007-01-16 - 15:55 MarcinBogusz LCR throughput - standard case, 2 runs
PNGpng redo_dest.PNG r1 manage 29.8 K 2007-01-16 - 15:56 MarcinBogusz Redo generated on the destination, 2 runs
Unknown file formatsql streamtest.sql r1 manage 1.4 K 2007-01-16 - 15:56 MarcinBogusz  
Compressed Zip archivezip test04aug.zip r1 manage 30.8 K 2006-08-04 - 15:41 LucaCanali streams replication performance test - test04 aug
Unknown file formatgz test10Jul_d3r.tar.gz r2 r1 manage 1584.0 K 2006-07-10 - 23:44 LucaCanali source DB trace files (test 10jul)
Unknown file formatgz test10Jul_test1.tar.gz r2 r1 manage 1717.6 K 2006-07-10 - 23:45 LucaCanali target DB trace files (10Jul)
Texttxt test_script.py.txt r1 manage 1.6 K 2006-07-10 - 22:09 LucaCanali workload generator
Topic revision: r15 - 2007-01-17 - MarcinBogusz
 