Oracle Benchmark Testing

A case study by Burleson Consulting

Oracle databases do not run in a vacuum, and it is the job of the DBA to know the thresholds at which an Oracle database becomes dependent on external factors such as RAM, CPU and network latency.

I highly recommend the book "Oracle Benchmarking" for more details on conducting and interpreting Oracle benchmarks.

This book, written by several Oracle tuning experts, presents a complete method for Oracle and SQL Server benchmarking.  Every Oracle database will experience declining response time as server resources are exhausted, and it is important that every Oracle professional know the point at which their response time will suffer.

This whitepaper is a guide to understanding the methods used to load down your Oracle server and determine the point at which you will experience declining response time.  Also see this Oracle IBM terabyte RAM benchmark.

Beware of artificial test cases

Before we begin our discussion of Oracle benchmarking, beware of the limited value of artificial test cases, which observe Oracle from only one perspective. As we see below, a single view of Oracle has led to vastly different results, like the difference between a circle and a square. Always remember that there is no substitute for a real-world test that simulates the actual behavior of your production systems, using your own production SQL.




Diagram showing the limited value of test cases

Top-Down benchmark analysis

At the instance level, the database becomes dependent on the hardware environment when any of these conditions is true:

  • CPU-Bound - When the CPU run queue exceeds the number of processors on the Oracle server and there are few or no I/O wait conditions.  For example, a dedicated Oracle instance on a 32-processor server becomes CPU-bound when the run queue exceeds 32 and I/O waits are not contributing to the run queue.
  • RAM-Bound - When the Oracle server experiences RAM swapping operations.  We know that vmstat page-ins are a normal part of program start-up, but we can correlate the scan rate with page-ins to determine real server RAM swapping conditions.
  • Network-Bound – You can measure network latency with a wide variety of Oracle metrics (SQL*Net roundtrips to client, SQL*Net message to client, SQL*Net more data to client, SQL*Net message from client and SQL*Net break/reset to client, from the V$SYSTEM_EVENT and V$SESSION_EVENT views) and determine whether your database is spending an inordinate amount of time waiting for network packets; see the query sketch after this list. Note that the SQL*Net wait events are only relevant if you have control of the keying and think-time components of wait time.
  • I/O-Bound – As the disk devices become saturated with read/write requests, we can measure device enqueues and quickly see disk-related system bottlenecks. Internally, we can measure the amount of time Oracle waits for both reads and writes to help determine disk or I/O subsystem stress.
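Here is a minimal sketch of such a network check against V$SYSTEM_EVENT (the unit conversions assume the centisecond columns of this Oracle 10g release; the ordering and filter are illustrative):

select event,
       total_waits,
       time_waited / 100 as seconds_waited,   -- time_waited is in centiseconds
       average_wait * 10 as avg_wait_ms       -- average_wait is in centiseconds
  from v$system_event
 where event like 'SQL*Net%'
 order by time_waited desc;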

Once we determine the response-time threshold, we can find the associated system bottleneck and be prepared to send alerts as we approach that threshold.  For example, assume that we have a response-time curve where we start to see a pronounced drop-off in response time at a certain rate of transactions per second (TPS), as shown in Figure 1:

 

Figure 1 – A typical Oracle response-time drop-off

In this example we see a dramatic increase in response time as we pass 11 transactions per second.  Once we have "broken" the system (by exceeding the maximum transaction rate), we correlate the increase with various internal Oracle metrics (wait events from ASH and AWR reports) and external metrics (CPU, RAM, disk and network enqueues).
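When the drop-off appears, a quick look at the Active Session History shows what sessions were doing at that moment. A minimal sketch (ASH requires the Diagnostics Pack license; the five-minute window is illustrative):

select session_state, event, count(*) as samples
  from v$active_session_history
 where sample_time > sysdate - 5/1440     -- last five minutes of samples
 group by session_state, event
 order by samples desc;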

In essence, we place progressively higher load on the Oracle database, observe this response-time drop-off, and then correlate it with an associated server stress condition.  Let's look at a methodology for performing this test using the Quest Benchmark Factory tool.

Benchmark Factory (BF) is a tool from Quest Software that allows us to load our Oracle database with either industry-standard benchmarks or real transactions drawn from our applications or Oracle trace files.  The BF tool allows us to simulate the load from multiple end users and to crank up the volume of active users in distinct steps.

A Methodology for Oracle Server Stress Testing

The goal of an Oracle server stress test is to place an increasing load on the Oracle instance until it becomes dependent on an external resource.  Of course, an application may have many patterns of SQL access, and the bottleneck will change over time as access patterns change, but the primary goal of server stress testing is to determine the "threshold" at which the server will "break", as evidenced by hardware-related enqueues and/or internal Oracle bottlenecks.

This is an actual case study using a pair of Oracle servers with the following resources:

  • Servers – Two Oracle servers connected via Oracle Streams n-way replication
    • CPUs – Dual Intel 64-bit processors
    • RAM – 16 gigabytes of RAM
  • Disk – Six 72 gigabyte disks using RAID 10
  • Network – TCP/IP using Oracle*Net with n-way Oracle Streams replication
  • Oracle Release - Oracle 10g release 2, Enterprise Edition
  • Operating System - Red Hat Linux Advanced Server
  • Application - Oracle Application Express (formerly HTML-DB)
  • Client Software – Microsoft Internet Explorer

Figure 2 shows a model of the Oracle server architecture:

 

Figure 2 – The Oracle server architecture

 

This system architecture provides Oracle Streams cross-replication between two production servers in widely separated locations (more than 100 miles apart) with high-speed interconnects.  This approach has the dual benefit of providing Oracle disaster recovery while allowing both servers to process transactions.  Since all transactions run through APEX, redirecting traffic is as simple as changing the domain name in the requesting URL.

Should one server location fail, end users simply click into the surviving server at the other location and continue processing as normal.

About Benchmark Factory

Benchmark Factory is a Windows-based GUI tool that allows users to easily configure and run standard benchmarks such as TPC-C or TPC-H, or to generate server stress and goal tests using custom SQL. An example of a goal test is to increase the server transaction load until the response time or TPS falls below that of the previous two test runs.

Figure 3 shows the Benchmark Factory GUI screen.

 

Figure 3 – The Benchmark Factory GUI

As you can see in Figure 3, the test shown in the left-hand window uses several custom SQL transactions to perform a goal test; in this case it ramps from 1 to 200 users in 25-user increments and executes the various SQL statements at each user level until the TPS is less than that of the last two consecutive user runs.

Benchmark Factory uses agents, which run on any Windows-based platform and can handle up to 2,000 users each (or whatever saturates your network connection or the agent box's CPU!). The Agent screen is shown in Figure 4.

Figure 4 – The Benchmark Factory Agent Screen

Only the Agent software is installed on the client machines, and all agents are coordinated through the main test-coordinator GUI.  Without agents, only 20 user processes can be run from the main GUI.

Now let's take a look at the response metrics that we will collect during our stress test.

Collection Metrics

This test will run against a real-world Oracle database with a variety of SQL and DML statements.  Benchmark Factory allows us to capture the following metrics for each user, and these will determine the slow-down point where overall response time begins to decline:

  • BF Response time – This is the end-to-end SQL execution time, measured from SQL submission until the end of the fetch of the SQL result set.  This metric includes all components of response time except the HTML screen map-out time, which is negligible.
  • BF TPS – This is the number of transactions per second, based on the simulated load on each Benchmark Factory client PC.
  • BF Transaction time – This is the average time to execute a single transaction.

For Oracle and Linux metrics, we will follow the standard TPC benchmark protocol for stress testing the Oracle server, increasing transaction rates while watching all system metrics using:

  • AWR Reports – We will collect AWR snapshots at one-minute intervals and create elapsed-time reports using the data collected in the Active Session History (ASH) tables (see the snapshot sketch after this list).  Within the AWR report we will examine the top-5 timed events and pay special attention to specific wait events during the elapsed-time period.
  • Linux vmstat – We will run vmstat collections at 10-second intervals for the entire duration of the testing.  This gives us real-time data on RAM paging and CPU enqueues.
  • Linux iostat – We will run iostat against each of our three 72-gigabyte mirrored devices to monitor disk enqueues.
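Because the default AWR snapshot interval is 60 minutes, the one-minute snapshots noted above are taken explicitly. A minimal sketch, run from a DBA account (typically driven by a shell loop or a dbms_scheduler job):

exec dbms_workload_repository.create_snapshot;

-- afterward, identify the snapshot IDs that bracket the test window:
select snap_id, begin_interval_time, end_interval_time
  from dba_hist_snapshot
 order by snap_id;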

The stress-testing plan involves extracting a representative sample of SQL and DML from the production application in order to simulate a typical user load.  The testing framework allows us to supply host-variable values for our SQL in order to simulate a real-world load on the system. The SQL statements used are shown in Appendix A. Benchmark Factory contains special functions ($BF functions) that generate data inputs for the SQL statements before sending them to the user processes. For this test we used the $BFRandList function to randomly select text-based entries and the $BFRandRange function to randomly choose integer values from a designated range.

We then use this framework to turn up the rate of the randomized transactions in one-minute intervals, taking AWR, vmstat, iostat and netstat snapshots before each stepped increase in the transaction rate.  Benchmark Factory measures actual end-to-end response times and notes when the response time drops off because of the high transaction load.

The Experiment Steps

To be realistic, the transaction load should closely parallel the real-world transactions, and this benchmark will use the Goal test framework from Benchmark Factory.  There are several phases in preparation for the benchmark:

  1. SQL extraction – We extract a representative sample of actual SQL and DML from the production database.
  2. SQL preparation – We create PL/SQL wrappers around the SQL to pass in valid values for the host variables, using data from the application tables.
  3. Load preparation – Benchmark Factory allows us to submit up to 2,000 transaction threads per TNS client; however, the practical limit is set by the network bandwidth of the client (for example, 60 users tends to saturate a 100-megabit line).  We will configure two clients, installing the BF Agent software and TNS, to set up a load of 200 transaction threads.
  4. Test execution – We start the server data capture (vmstat, iostat) and take an AWR snapshot.  After a pre-designated interval we step up the user load by 10 users at each step.
  5. Post-hoc analysis – We plot the overall response time and correlate the drop-off with internal Oracle metrics (e.g. wait events) and external server metrics (CPU enqueues, RAM swapping).

Let’s take a closer look at each step.

1. SQL Extraction

To simulate a full production load we start by extracting SQL and DML from the live production application.  You can gather the most resource-intensive SQL directly from the library cache based on any of these criteria:

  • Disk_reads – This extracts the SQL that typically exceeds the data buffer capacity, generally non-repeating data fetches that are unlikely to be cached.
  • Executions – This is a good metric because it simulates the actual load on the database and tests the most frequently used transactions.
  • Buffer_gets – This measures logical I/O, a good proxy for CPU and memory stress.

We must remember that some of the most frequently executed SQL will run against data blocks that are already cached (high consistent gets), whereas SQL with high disk_reads will generate a disk-I/O-intensive load (high physical reads).

If we wanted to extract the top SQL by executions we could issue the following command:

set lines 2000;
select sql_text from v$sql order by executions desc;
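The same query can be reordered to rank the SQL by the other criteria; a minimal sketch (the rownum cut-off of 20 is illustrative):

select sql_text, disk_reads, buffer_gets, executions
  from (select sql_text, disk_reads, buffer_gets, executions
          from v$sql
         order by disk_reads desc)      -- or: order by buffer_gets desc
 where rownum <= 20;

Here are two statements extracted from the production application: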
 
select  unsub_first_name, unsub_middle_name, unsub_last_name
   into :post_unsub_first_name, :post_unsub_middle_name,
        :post_unsub_last_name
   from
        fbi_person g,
        imm_arture i,
        fbi_immig_c gic,
        imm_fli if
  where
        g.fbi_person_id = i.fbi_person_id
        and g.fbi_imcat_cd = gic.fbi_imcat_cd
        and i.flight_seq_nbr_key = if.flight_seq_nbr_key
        and i.ed_card_nbr = :post_lookup_card;
 
insert into
   imm_argssr
      (fbi_person_id, flight_seq_nbr_key, edd_nbr,
       actual_imm, immcate,   
       est_immre_time,
       actual_immige, hotel_name,   
       flight_crew_indicator_flag, purpose_of_visit)
 values
  (:post_fbi_person_id,
   :post_ad_flight_key,
   :post_ed_nbr_save,
   to_date(:post_actl_date, 'dd/mm/yyyy'),
   :post_ad_passetegory,
   to_date(:post_ad_eparture_date,'dd/mm/yyyy'),
   sysdate, null, 'n', null);

2. Preparing the SQL for Benchmark Factory

For application SQL that uses the PL/SQL INTO clause, we must rewrite the statement in its generic SQL*Plus form.  If you want your test to keep the PL/SQL INTO clause, you can build anonymous PL/SQL blocks and execute them as prepared statements (a sketch follows the example below).  For our purposes, we only need the data transferred back to the calling machine, so we alter the SQL to remove the INTO clause:

select  unsub_stuff
   from
        fbi_person g,
        imm_arr_departure i,
        fbi_immigration_category gic,
        imm_flight if
  where
        g.fbi_person_id = i.fbi_person_id
        and g.fbi_imcat_cd = gic.fbi_imcat_cd
        and i.flight_seq_key = if.flight_seq_key
        and i.ed_card_nbr = :post_LOOKUP_BY_NAME;
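If the INTO clause must be kept, the statement can be wrapped in an anonymous PL/SQL block and executed as a prepared statement. A minimal sketch (the variable type is an assumption, and the "if" alias is renamed to "fl" because IF is a PL/SQL keyword):

declare
   v_stuff varchar2(4000);   -- assumed type for the fetched column
begin
   select unsub_stuff
     into v_stuff
     from fbi_person g,
          imm_arr_departure i,
          fbi_immigration_category gic,
          imm_flight fl
    where g.fbi_person_id = i.fbi_person_id
      and g.fbi_imcat_cd = gic.fbi_imcat_cd
      and i.flight_seq_key = fl.flight_seq_key
      and i.ed_card_nbr = :post_LOOKUP_BY_NAME;
end;
/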

Providing data values for the benchmark

There are two ways to generate random testing data for the SQL that we have extracted:

  1. Random Values - We can use the Benchmark Factory functions $BFRandList and $BFRandRange to generate random lists of data values.  We can control all ranges of numeric values with valid ID ranges and the list substitutions with valid values.
  2. Application Values - We can extract actual host-variable data from the application tables. By examining the tables and actual values we can provide a realistic list of values to feed into the functions. For nearly all of the SQL selected for this test there were a limited number of values, with most selections based on integer keys easily generated using the $BFRandRange function.

Random Range Values

Here we see an example of SQL using the $BFRandRange function to generate random values for host variables:

select stuff
  from fbi_person g, fbi_immigration ic
 where g.fbi_person_id = $BFRandRange(10000,100000)
   and g.fbi_imcat_cd = ic.fbi_imcat_cd;

Application Values

By carefully selecting our SQL we can use selection values that are easily loaded into the $BFRandList function.  To use actual application data you would need to create a function that returns random values to the SQL statement; however, this would generate additional load on the database, possibly causing skewed or invalid results.
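A minimal sketch of such a function, reusing the ed_card_nbr column from the earlier example (the function name rand_ed_card and the column width are assumptions):

create or replace function rand_ed_card return varchar2
as
   v_card varchar2(30);   -- assumed width for the card number column
begin
   -- pick one existing value at random; note that this query itself adds
   -- load to the database under test, which is why it can skew results
   select ed_card_nbr
     into v_card
     from (select ed_card_nbr
             from imm_arr_departure
            order by dbms_random.value)
    where rownum = 1;
   return v_card;
end;
/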

We are now ready to configure the client machines, installing the TNS layer and the Benchmark Factory software that will load down the database.  With the SQL ready, we can set up the BF clients.

3. Load Preparation

Stressing the database server involves setting up multiple Oracle*Net clients.  Benchmark Factory allows you to install an "agent" on an Oracle*Net client machine and launch the SQL from many client PCs.

How many users do we want to simulate? A single agent process can drive up to 2,000 or so (subject to CPU and network bandwidth limits); beyond that, we need to install one or more additional agents to submit requests through. Any Windows-based system can be used as an agent, but it must have Oracle's network stack installed.

4. Test Execution & Capturing Results

Here is an example of the results from a "Goal" test, where the goal was to push the system to the point where the TPS dropped below that of the previous two runs.

Benchmark Factory collects all transaction data and places it in a spreadsheet for easy access and summarization.

 

 

Experiment Predictions

In our typical 64-bit Oracle system with 32 gigabytes of RAM we would predict results as follows:

  • The RAM data buffer cache (14 gigabytes in each of the two servers) will be sufficient to cache the working set of frequently referenced data blocks, so disk enqueues will not be the bottleneck.
  • Because of the low DML rate of this system and the fact that Application Express performs whole-screen map-outs in a single Oracle*Net round trip, we do not expect network latency to be responsible for the response-time drop-off.

Because of the architecture and the nature of the transactions against this system, we expect the main source of contention to be CPU, along with Oracle latch-related wait events.  We should start seeing CPU enqueues (in our case, the run queue exceeding 2 on each server) at about the same transaction rate at which total response time declines precipitously.  We predict that this configuration will bottleneck on CPU at approximately 30 transactions per second and that RAM, network and disk enqueues will not be the bottleneck.

Our Results

For the initial test run we used only a single node, as the full-scale replication was not yet implemented. In this test we set up a run from 1 to 100 users executing the SQL transactions from Appendix A; this phase of testing does not exercise the replication.

In this goal test, the criterion for ending the test was for the transactions per second at the current user load to fall below that of the two previous runs. The graph in Figure 5 shows the TPS results.

Figure 5 – The Preliminary Test TPS Results

The results show that the system TPS peaks at 60 users in this configuration, but what about other signs of strain in the system? Is it CPU-, disk-, or network-related? Look at Figure 6 for the CPU usage results.

 

Figure 6 – The Preliminary Test CPU Results

Note that user (us) CPU usage peaks at around 65 percent; with system (sy) usage peaking at only 5 percent, this still leaves about 30 percent CPU idle time. Clearly, based on Figure 6, we do not have CPU stress. A secondary indicator of CPU stress is run-queue length. Figure 7 shows the plot of the run queue for the same time period.

 

Figure 7 – The Preliminary Test Runqueue Results

As you can see, the run queue rarely rises above 4, indicating that CPU stress was not a factor. Next let's examine the disk stress results. Look at Figure 8, which shows the disk writes.

 

Figure 8 – The Preliminary Test Disk Write Results

As can be seen from Figure 8, other than a single large burst of more than 30,000 I/Os in a 10-second interval near the mid-range of the test, disk I/O was well within the disks' operational limits. Figure 9 shows that disk reads were also well within specifications.

 

Figure 9 – The Preliminary Test Disk Read Results

 

The system reads are clearly below any stress level that would indicate the disks were the cause of the low TPS values. This is also supported when we examine the system wait statistics in Figure 10.

 

Figure 10 – The Preliminary Test Wait Results

 

As can be seen from examining Figure 10, other than a peak that corresponds to the reads on disk hbc, system waits are not causing the TPS drop-off. The final possible bottlenecks are memory and network. Examination of the AWR report for the test period shows some interesting network statistics. Look at Figure 11 for an excerpt from the applicable AWR report for the test run.

 

Shared Pool Statistics        Begin    End
                              ------  ------
             Memory Usage %:   88.59   91.21
    % SQL with executions>1:   35.30   41.99
  % Memory for SQL w/exec>1:   44.32   45.20
 
Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
CPU time                                            555          67.3
latch: library cache                  7,068          69     10    8.3 Concurrenc
latch: shared pool                    4,291          15      4    1.9 Concurrenc
log file sync                         5,520          11      2    1.4     Commit
log file parallel write               6,464           9      1    1.1 System I/O
          -------------------------------------------------------------
 
Statistic Name                                       Time (s) % of DB Time
------------------------------------------ ------------------ ------------
DB CPU                                                  554.8         67.3
sql execute elapsed time                                395.7         48.0
parse time elapsed                                      384.1         46.6
hard parse elapsed time                                 311.9         37.9
 
                                                                  Avg
                                       %Time       Total Wait    wait     Waits
Wait Class                      Waits  -outs         Time (s)    (ms)      /txn
-------------------- ---------------- ------ ---------------- ------- ---------
Concurrency                    12,201     .0               96       8       2.0
System I/O                     10,306     .0               15       1       1.7
Commit                          5,520     .0               11       2       0.9
Network                     2,798,166     .0                8       0     451.2
Other                             247     .0                1       5       0.0
Application                       400     .0                0       0       0.1
User I/O                        1,407     .0                0       0       0.2
          -------------------------------------------------------------
 
 
Wait Events                              DB/Inst: FBIGT/fbigt  Snaps: 383-384
-> s  - second
-> cs - centisecond -     100th of a second
-> ms - millisecond -    1000th of a second
-> us - microsecond - 1000000th of a second
-> ordered by wait time desc, waits desc (idle events last)
 
                                                                   Avg
                                             %Time  Total Wait    wait     Waits
Event                                 Waits  -outs    Time (s)    (ms)      /txn
---------------------------- -------------- ------ ----------- ------- ---------
latch: library cache                  7,068     .0          69      10       1.1
latch: shared pool                    4,291     .0          15       4       0.7
log file sync                         5,520     .0          11       2       0.9
log file parallel write               6,464     .0           9       1       1.0
latch: cache buffers chains             597     .0           7      12       0.1
SQL*Net message to client         2,782,891     .0           5       0     448.8
SQL*Net message from client       2,782,946     .0     190,370      68     448.8
          -------------------------------------------------------------
 
Background Wait Events                   DB/Inst: FBIGT/fbigt  Snaps: 383-384
-> ordered by wait time desc, waits desc (idle events last)
 
                                                                   Avg
                                             %Time  Total Wait    wait     Waits
Event                                 Waits  -outs    Time (s)    (ms)      /txn
---------------------------- -------------- ------ ----------- ------- ---------
rdbms ipc message                    17,804   65.0      37,185    2089       2.9
          -------------------------------------------------------------

 

Figure 11 – The Preliminary Test AWR Results

 

The AWR report shows that we are experiencing both shared pool stress and network stress. A review of the alert log entries shows the database was also getting ORA-04031 errors, indicating that the shared pool was becoming fragmented. For this test the CURSOR_SHARING parameter had been left at EXACT; as a result, numerous non-sharable SQL statements were loaded into the shared pool, causing the ORA-04031 errors.

When we retest using the dual nodes with replication, we will change CURSOR_SHARING away from EXACT to reduce memory thrashing and parsing. In addition, we will make sure the clients used are on separate network segments. A sketch of the parameter change follows.
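A minimal sketch of the change, plus a sanity check (FORCE is an assumption here; this 10g release also offered SIMILAR):

alter system set cursor_sharing = FORCE scope = both;

-- statements that differ only in their literals share a force_matching_signature,
-- so large groups here are the literal-heavy SQL that fragments the shared pool:
select force_matching_signature, count(*) as versions
  from v$sql
 where force_matching_signature <> 0
 group by force_matching_signature
having count(*) > 10
 order by versions desc;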

 

Conclusions for Phase 1 Testing

Phase one showed that even with limited network bandwidth, the system easily supported 50-60 users on a single node, more than is anticipated to be needed during normal operations. With better memory parameters and better network bandwidth, it is anticipated that the nodes will easily support 80-100 users.

Disk Load – Other than a single large burst of more than 30,000 I/Os in a 10-second interval near the mid-range of the test, disk I/O was well within the disks' operational limits.

CPU – User (us) CPU usage peaked at around 65 percent; with system (sy) usage peaking at only 5 percent, this still left about 30 percent CPU idle time.

Memory and Network – Examination of the AWR report for the test period shows some interesting network and shared pool statistics. Within Oracle, CPU time dominated the top-5 timed events as active users rose above 60:

 

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
CPU time                                            555          67.3
latch: library cache                  7,068          69     10    8.3 Concurrenc
latch: shared pool                    4,291          15      4    1.9 Concurrenc
log file sync                         5,520          11      2    1.4     Commit
log file parallel write               6,464           9      1    1.1 System I/O

 
