WARNING: This web is not used anymore. Please use PDBService.CMSHCALAndPixelDatabases instead!
 

General info

The author of the application is Gennadiy Lukhanin (lukhanin@fnal.gov), working at Fermilab.
Some documents describing the project can be found here:

Recommendations

(Besides the email exchange, I also had a few phone chats with Gennadiy.)

14-JUL-2005 Jacek

Just a few points to think about before the chat.

1. Update operations:

In theory, DML operations are not very efficient in a cluster environment (such as the one you use for the integration tests). There are many reasons for this: more complicated locking, cluster interconnect latency, and extensive use of the global cache, which is slower than a local cache. This theory seems to be confirmed in practice. Below you can find a slice of an AWR report, pertaining to wait events, collected during yesterday's tests on the first instance of the cluster.

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                        % Total
Event                                 Waits    Time (s)   DB Time     Wait Class
------------------------------ ------------ ----------- --------- --------------
gc buffer busy                      896,127      25,694     32.00        Cluster
CPU time                                         19,089     23.77
gc current block busy               210,144       7,606      9.47        Cluster
enq: TX - index contention          220,310       5,999      7.47    Concurrency
gc current grant busy               203,944       2,847      3.55        Cluster
          -------------------------------------------------------------
As you can see, almost 50% of DB time was spent waiting for cluster events, which is definitely not good. Statistics for the second instance of the cluster are very similar. In conclusion, it is very likely that inserts through only one instance of the cluster would be similarly efficient while using much less cluster resources at the same time. Would it be possible to execute the same workload as yesterday, but this time against a single instance of the cluster? I have already created and put in the tnsnames.ora file on AFS a new connection descriptor that always points to the first instance of the cluster, as long as this instance is available. The connection descriptor is named INTR_NLB and looks as follows:
INTR_NLB=(DESCRIPTION=
           (ADDRESS_LIST=
           (FAILOVER=on)
           (LOAD_BALANCE=off)
            (ADDRESS=
                (PROTOCOL=TCP)
                (HOST=itrac15-v.cern.ch)
                (PORT=1521)
            )
            (ADDRESS=
                (PROTOCOL=TCP)
                (HOST=itrac16-v.cern.ch)
                (PORT=1521)
            )
           )
           (CONNECT_DATA=
              (SERVER=DEDICATED)
              (SERVICE_NAME=intr_nlb)
              (FAILOVER_MODE=
                 (TYPE=SELECT)
                 (METHOD=BASIC)
              )
           )
       )
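As a quick sanity check (just a sketch; the account name below is only a placeholder), the descriptor can be exercised from SQL*Plus and you can verify which instance the session lands on:

CONNECT cms_user@INTR_NLB
-- confirm that the session really runs on the first instance
SELECT INSTANCE_NAME, HOST_NAME FROM V$INSTANCE;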
The other remedy for block contention in a cluster environment would be index and table hash partitioning, but as it can have a negative impact on query performance, we shouldn't consider it, in my opinion.
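For completeness only, since we agreed not to pursue it, such a remedy would look roughly like the sketch below: a global hash-partitioned index that spreads hot, right-growing index blocks across several partitions. All names and the partition count here are purely illustrative, not taken from the actual schemas:

-- illustrative names only: hash-partitioning an index on a hypothetical
-- right-growing key column distributes the hot blocks over 8 partitions
CREATE INDEX some_table_id_hash_idx
    ON some_owner.some_table (id_column)
    GLOBAL PARTITION BY HASH (id_column) PARTITIONS 8;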

2. I discovered that the following query is executed quite often and performs full table scans on a fairly big table:

SELECT A0.COMMENT_DESCRIPTION, A0.COND_RUN_ID, A0.IS_RECORD_DELETED,
       A0.RECORD_INSERTION_USER, A0.RECORD_LASTUPDATE_USER,
       A0.RUN_BEGIN_TIMESTAMP, A0.RUN_END_TIMESTAMP, A0.RUN_NAME
  FROM CMS_YLG_CORE_CONDITION_OWNER.COND_RUNS A0
 WHERE (A0.IS_RECORD_DELETED = 'F') AND A0.RUN_NAME = '342';
An index on the RUN_NAME column should help a lot.
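A minimal sketch of what I have in mind (the index name is only a suggestion; whether to also include IS_RECORD_DELETED in the key is up to you):

CREATE INDEX cond_runs_run_name_idx
    ON CMS_YLG_CORE_CONDITION_OWNER.COND_RUNS (RUN_NAME);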

Aha! And one more very important thing that I forgot to mention during our chat. Could you please use bind variables in your queries? In principle, all changing literals in the WHERE clauses of your queries should be replaced with bind variables. In the query above, at least the value '342' should be replaced with a bind variable. The 'F' can also be replaced if you have similar queries with 'T' instead of 'F'. Using bind variables improves performance (less query parsing) and scalability (less library cache locking).
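For illustration, this is how the same query looks with a bind variable when run from SQL*Plus (the variable name :run_name and its declared size are arbitrary); OCI, OCCI and JDBC have their own equivalents of prepared statements:

-- declare and set the bind variable, then reuse the one cursor
VARIABLE run_name VARCHAR2(100)
EXEC :run_name := '342'

SELECT A0.COMMENT_DESCRIPTION, A0.COND_RUN_ID, A0.IS_RECORD_DELETED,
       A0.RECORD_INSERTION_USER, A0.RECORD_LASTUPDATE_USER,
       A0.RUN_BEGIN_TIMESTAMP, A0.RUN_END_TIMESTAMP, A0.RUN_NAME
  FROM CMS_YLG_CORE_CONDITION_OWNER.COND_RUNS A0
 WHERE (A0.IS_RECORD_DELETED = 'F') AND A0.RUN_NAME = :run_name;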

14-JUL-2005 Jacek

I have found a few more full table scan statements. Only the first one is really problematic at the moment. If the tables in the three other statements are expected to grow a lot in the future, I would also suggest creating appropriate indexes there (see the sketch after the statements below).

And one more time... Please change all your SQL so that it uses bind variables. Without bind variables, two statements that differ only in the values of literals in the WHERE clause will be treated by Oracle as completely different ones. This means hard parsing of both, and hard parsing is extremely expensive in Oracle (CPU, memory, latches).
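You can see the effect yourself after a test run: literal-only variants of the same statement pile up as separate cursors in V$SQL. A rough check (adjust the LIKE filter to the statement you are interested in):

SELECT COUNT(*) AS cursor_count, SUM(EXECUTIONS) AS total_execs
  FROM V$SQL
 WHERE SQL_TEXT LIKE 'SELECT A0.COMMENT_DESCRIPTION%COND_RUNS%RUN_NAME%';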

Cheers Jacek

SELECT A0.BARCODE, A0.COMMENT_DESCRIPTION, A0.INSTALLED_BY_USER, A0.INSTALLED_DATE,
       A0.IS_RECORD_DELETED, A0.KIND_OF_PART_ID, A0.LOCATION_ID, A0.MANUFACTURER_ID,
       A0.NAME_LABEL, A0.RECORD_INSERTION_USER, A0.RECORD_LASTUPDATE_USER, A0.PART_ID,
       A0.REMOVED_BY_USER, A0.REMOVED_DATE, A0.SERIAL_NUMBER, A0.VERSION
  FROM CMS_YLG_CORE_CONSTRUCT_OWNER.PARTS A0
 WHERE (A0.IS_RECORD_DELETED = 'F') AND A0.NAME_LABEL = 'CMS-ROOT'

SELECT A0.COMMENT_DESCRIPTION, A0.EXTENSION_TABLE_NAME, A0.IS_RECORD_DELETED,
       A0.KIND_OF_CONDITION_ID, A0.NAME, A0.RECORD_INSERTION_USER,
       A0.RECORD_LASTUPDATE_USER
  FROM CMS_YLG_CORE_CONDITION_OWNER.KINDS_OF_CONDITIONS A0
 WHERE (A0.IS_RECORD_DELETED = 'F') AND A0.NAME = 'HCAL Pedestals'

SELECT A0.COMMENT_DESCRIPTION, A0.EXTENSION_TABLE_NAME, A0.IS_DETECTOR_PART,
       A0.IS_IMAGINARY_PART, A0.IS_RECORD_DELETED, A0.KIND_OF_PART_ID, A0.LPNAME,
       A0.DISPLAY_NAME, A0.RECORD_INSERTION_USER, A0.RECORD_LASTUPDATE_USER,
       A0.SUBDETECTOR_ID
  FROM CMS_YLG_CORE_CONSTRUCT_OWNER.KINDS_OF_PARTS A0
 WHERE (A0.IS_RECORD_DELETED = 'F') AND A0.DISPLAY_NAME = 'HCAL QIE ADC'

SELECT A0.COMMENT_DESCRIPTION, A0.IS_RECORD_DELETED, A0.MANUFACTURER_ID,
       A0.MANUFACTURER_NAME, A0.RECORD_INSERTION_USER, A0.RECORD_LASTUPDATE_USER
  FROM CMS_YLG_CORE_CONSTRUCT_OWNER.MANUFACTURERS A0
 WHERE (A0.IS_RECORD_DELETED = 'F') AND A0.MANUFACTURER_NAME = 'Fermilab'
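If you decide to index the other tables as well, something along these lines would cover the four statements above (the index names are only suggestions):

CREATE INDEX parts_name_label_idx
    ON CMS_YLG_CORE_CONSTRUCT_OWNER.PARTS (NAME_LABEL);

CREATE INDEX kinds_of_conditions_name_idx
    ON CMS_YLG_CORE_CONDITION_OWNER.KINDS_OF_CONDITIONS (NAME);

CREATE INDEX kinds_of_parts_disp_name_idx
    ON CMS_YLG_CORE_CONSTRUCT_OWNER.KINDS_OF_PARTS (DISPLAY_NAME);

CREATE INDEX manufacturers_name_idx
    ON CMS_YLG_CORE_CONSTRUCT_OWNER.MANUFACTURERS (MANUFACTURER_NAME);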

22-JUL-2005 Gennadiy Lukhanin

> How are things? I have noticed that last week, when you were running
> your test suite against the connection string pointing to the first
> node of the cluster, the performance was actually better than before.
> During a similar test period (5 hours) you were able to perform 64.5
> transactions per second on average, while before, against the whole
> cluster, you could perform only about 40 transactions per second.
> Other statistics also confirm this better performance. Did you have
> similar observations on the client side?

[Gennadiy Lukhanin] Thank you for your feedback! I did notice some performance improvement, but it's hard to measure since the lxplus workstations were under heavy load from other users. You've measured it better on the server side!

> What is also quite important, I have the impression that we can still
> improve the performance of "single instance mode" for write operations
> with instance tuning.
>
> Although DML performance in "single instance mode" is better, we should
> still keep in mind that for reading, "cluster mode" will probably be
> more efficient (more memory available in total). Do you have the
> possibility in your application to use different connection strings
> for writing and reading?
> Have you created the missing indexes in the integration database? If
> yes, could we rerun the workload to check the influence?

[Gennadiy Lukhanin] I carefully went over all the core schemas and created 23 additional indexes. I also did some tests on our local dev server. It seems that all queries now go through indexes.

[Gennadiy Lukhanin] I've started 15 clients and am planning to call you first thing in the morning (~9:15 Fermilab time) so we can discuss the results.

> What about a more read-oriented workload? Is there any chance to run
> such a workload during this integration test period, or should we
> schedule something in the future?

[Gennadiy Lukhanin] Well, we have some developments on this front as well. We are converting the EDM code to POOL, so the data access pattern has changed significantly. Now we would need to run those queries only once, while transferring data from the on-line to the off-line database. The off-line database (we don't have the schema for it yet) will be accessed from POOL, so we will need to run different performance tests. I'll keep you updated...
