Before we begin our discussion of Oracle
benchmarking. beware of the limited value of artificial test cases,
which only observe Oracle from one perspective. As we see
below,
a single view of Oracle has lead to vastly different
results, like the difference between a circle and a square.
Always remember that there is no substitute for a real-world test
that simulates the actual behavior of your production systems,
using your own production SQL.
At the instance level, the database becomes
dependent on the hardware environment when any of theses conditions
is true:
- CPU-Bound -
When the CPU run queue exceeds the number of processors on the
Oracle server and there are little or no IO wait conditions.
For example, a dedicated Oracle instance on a server would
become CPU bound on a 32 processor box when the run queue
exceeds 32 when there are no other causes such as IO waits for
the runqueue.
- RAM-Bound -
When the Oracle server experiences RAM swapping operations. We
know that vmstat page-in’s are a normal party of program
start-up, but we can correlate the scan-rate with page-in to
determine real server RAM swapping conditions.
- Network-Bound
– You can measure network latency with a wide variety of Oracle
metrics (e.g. SQL*Net roundtrips to client, SQL*Net message to
client, SQL*Net more data to client, SQL*Net message from
client, SQL*Net break/reset to client for example from the
V$SYSTEM_EVENT and V$SESSION_EVENT tables), and determine if
your database is experiencing an inordinate amount of time
waiting for network packets. Note that the SQL*Net wait events
are only relevant if you have control of the keying and think
time components of wait time.
- I/O bound – As
the disk devices become saturated with read/write requests, we
can measure device enqueues and quickly see disk-related system
bottlenecks. Internally, we can measure the amount of time
Oracle is waiting for both reads and writes to help determine
disk or IO subsystem stress.
Once we determine the response-time threshold
we can find the associated system bottleneck and be prepared to send
alerts as we approach that threshold. For example, assume that we
have a response-time curve like this where we start to see a
pronounced drop-off in response time at a certain rate of
transactions per second TPS as shown in Figure 1):

Figure 1 – A typical Oracle
response-time drop-off
In this example we see a dramatic increase in
response time as we pass 11 transactions per second. Once we have
“broken the systems (by exceeding the maximum transaction rate), we
then correlate the increase with various internal Oracle metrics
(wait events from ASH, AWR reports) and external metrics (CPU, RAM,
disk and network enqueues).
In essence, we place progressively higher load
on the Oracle database and observe this response-time drop-off, and
then correlate this with an associated server stress condition.
Let’s look at a methodology for performing this test using the Quest
Benchmark Factory tool.
Benchmark Factory (BF) is a tool provided by
Quest Software that allows us to load our Oracle database with
either industry standard benchmarks or real transactions drawn from
our applications or Oracle trace files. The BF tool allows us to
simulate the load from multiple end-users and to crank-up the volume
of active users in distinct steps.
A Methodology for Oracle Server Stress Testing
The goal of an Oracle server stress test is to
place an increasing load on the Oracle instance until the instance
becomes dependent on an external resource. Of course, an
applications may have many patterns of SQL access and the bottleneck
will change over time as access patterns change, but the primary
goal of server stress testing is to determine the “threshold” when
the server will “break”, as evidenced by hardware-related enqueues
and/or internal Oracle bottlenecks.
This is an actual case study using a pair of
Oracle servers with the following resources
- Servers
– Two Oracle servers connected via Oracle Streams n-way
replication
- CPU’s
– Dual Intel 64-bit processors
- RAM
– 16 gigabytes of RAM
- Disk
– Six 72 gigabyte disks using RAID 10
- Network
– TCP/IP using Oracle*Net with n-way Oracle Streams replication
- Oracle
Release - Oracle 10g release 2, Enterprise Edition
- Operating
System - Red Hat Linux Advanced Server
-
Application - Oracle Application Express (formerly HTML-DB)
- Client
Software – Microsoft Internet Explorer
Figure 1 shows a model of the Oracle server
architecture:

Figure 1 – The Oracle server
architecture
This system architecture provides Oracle
Streams cross-replication for two production servers in
widely-separated locations (> 100 miles apart) with high-speed
interconnects. This approach has the dual benefit from Oracle
disaster recovery, and also allows both servers to be used to
process transactions. Since all transactions are using APEX,
redirecting traffic is as simple as changing the domain name in the
requesting URL.
Should one server location fail, end-user
simply clicks-into the surviving servers at the other locations and
continues processing, as normal.
About Benchmark Factory
Benchmark Factory is a Windows based GUI tool
that allows users to easily configure and run either standard
benchmarks such as TPC-C or TPC-H or generate server stress and goal
tests using custom SQL. An example of a goal test is to increase
server transaction load until the response time or TPS fall to less
than the previous two tests.
Figure 2 shows the Benchmark Factory GUI
screen.
Figure 2 – The Benchmark
Factory GUI
As you can see in Figure 2 the test shown in
the left hand window utilizes several custom SQL transactions in
order to perform a goal test, in this case it will run users up from
1 to 200 in 25 user increments and execute the various SQL
statements in each user until the TPS in the test is less than the
last two consecutive user runs.
The Benchmark Factory utilizes agents which run
on any Windows based platform and can handle up to 2000 users each
(or whatever saturates your network connection or the agent boxes
CPU!) The Agent screen is shown in Figure 3.

Figure 3 – The Benchmark
Factory Agent Screen
Only the Agent software is installed on the
client machines and all agents are coordinated through the main test
coordinator GUI. Without the agents, only 20 user processes
can be run from the main GUI.
Now let’s take a look at the response metrics
that we will collect during out stress test.
Collection Metrics
This test will be a real-world Oracle database
with a variety of SQL and DML statements. The benchmark factory
allows us to measure the following metrics, and this will determine
the slow-down point where overall response time begins to decline.
The benchmark factory allows you to capture metrics for each user,
showing
- BF
Response time – This is the end-to-end SQL execution time,
measured from SQL submission until the end of the fetch of the
SQL result set. This metric includes all components of response
time except the HTML screen map-out time, which is negligible.
- BF TPS
– This is the number of transaction per second, based on the
simulated load on each Benchmark factory client PC.
- BF
Transaction_time – This is the average time to execute a
single transaction.
For Oracle and Linux metrics, we will use the
standard TPC benchmark protocol for stress testing the Oracle server
and increase transaction rates while watching all system metrics
using:
- AWR
Reports – We will be collecting AWR snapshots at one minute
intervals, and create elapsed-time reports using the data
collected in the Automated Session History (ASH) tables. Within
the AWR report we will examine the top-5 timed events and pay
special attention to specific wait events during the
elapsed-time period
- Linux
vmstat – We will run vmstat collections at 10 second
intervals during the entire duration of the testing. This will
give us real-time data on RAM paging and CPU enqueues.
- Linux
iostat – We will run iostat against each of our three 72 gig
mirrored devices to monitor disk enqueues.
The stress testing plan involves extracting a
representative sample of SQL and DML from the production application
with the purpose of simulating a typical use load. The TCP testing
framework will allow us to select host variables for our SQL
in-order to simulate a real-world load on the system. The SQL
statements used are shown in Appendix A. The Benchmark Factory
contains special functions ($BF functions) that generate data inputs
to the SQL statements before sending them to the user processes. For
this test we utilized the $BFRandList function to randomly select
text based entries and the $BFRandRange function to randomly choose
integer values for a designated range of integers.
We then use the TCP framework to turn-up the
rate of the randomized transaction in one-minute intervals, taking
an AWR snapshot, vmstat, iostat and netstat snapshot before each
stepped increase in the transaction rate. The benchmark factory
will measure actual end-to-end response times and note when the
response time drops-off because of high transaction load.
The Experiment Steps
To be realistic, the transaction load should
closely parallel the real-world transactions, and this benchmark
will use the Goal benchmark framework from the “benchmark factory”.
There are several phases in preparation for the benchmark:
- SQL
extraction – We extract a representative sample of actual
SQL and DML from the production database.
- SQL
preparation – We create PL/SQL wrappers around the SQL to
pass-in valid values for the host variables, using data from the
application tables.
- Load
Preparation – Benchmark factory allow us to submit up to
2000 transaction threads per TNS client, however the practical
limit is based on network bandwidth of the client, for example
60 users tends to saturate a 100Mbit line. We will configure 2
clients, installing BF Agent software and TNS in order to set-up
a load of 200 transaction threads.
- Test
Execution – We then start the server data capture (vmstat,
iostat) and take an AWR snapshot. After a pre-designated
interval we will step up the user load by 10 users at each step.
- Post-hoc
Analysis – We plot the overall response_time and
correlate the drop-off with internal Oracle metrics (e.g. wait
events) and external server metrics (CPU enqueues, RAM
swapping).
Let’s take a closer look at each step.
1. SQL Extraction
To simulate a full production load we start by
extracting SQL and DML from the live production application. You
can gather the most resource intensive SQL directly from the library
cache based on any of these criteria:
-
Disk_reads – This will extract the SQL that typically
exceeds the data buffer capacity, generally non-repeating data
fetches that are unlikely to be cached.
-
Executions – This is a good metric because it simulates the
actual load on the database
-
Buffer_gets – This measures logical IO and CPU and memory
stress
- Number of
Executions – This tests the most used transactions
We must remember that some of the most
frequently-executed SQL will be against data blocks that will
already be cached (high consistent gets) whereas SQL with
high disk_reads will generate a load that is disk I/O
intensive (high physical reads).
If we wanted to extract the top SQL by
executions we could issue the following command:
set lines 2000;
select sql_text from v$sql order by executions desc;
select unsub_first_name, unsub_middle_name, unsub_last_name,
into :post_unsub_first_name, :post_unsub_middle_name,
:post_unsub_last_name,
from
fbi_person g,
imm_arture i,
fbi_immig_c gic,
imm_fli if
where
g.fbi_person_id = i.fbi_person_id
and g.fbi_imcat_cd = gic.fbi_imcat_cd
and i.flight_seq_nbr_key = if.flight_seq_nbr_key
and i.ed_card_nbr = :post_lookup_card;
insert into
imm_argssr
(fbi_person_id, flight_seq_nbr_key, edd_nbr,
actual_imm, immcate,
est_immre_time,
actual_immige, hotel_name,
flight_crew_indicator_flag, purpose_of_visit)
values
(:post_fbi_person_id,
:post_ad_flight_key,
:post_ed_nbr_save,
to_date(:post_actl_date, 'dd/mm/yyyy'),
:post_ad_passetegory,
to_date(:post_ad_eparture_date,'dd/mm/yyyy'),
sysdate, null, 'n', null);
2. Preparing the SQL for Benchmark Factory
For application SQL with the PL/SQL into
clause, we must replace the PL/SQL into clause syntax with
its generic SQLPLUS form. If you want your test to keep the PL/SQL
into clause, you can build anonymous PL/SQL blocks for them
and execute them as prepared statements. For our purposes, we only
need the data transferred back to the calling machine, so we alter
the SQL to remove the PL/SQL into clause:
select unsub_stuff
from
fbi_person g,
imm_arr_departure i,
fbi_immigration_category gic,
imm_flight if
where
g.fbi_person_id = i.fbi_person_id
and i.flight_seq_key = if.flight_seq_key
and i.ed_card_nbr = :post_LOOKUP_BY_NAME
Providing data values for the benchmark
There are two ways to generate random testing
data form the SQL that we have extracted:
- Random
Values - We can use the benchmark factory functions $BFRandList
and $BFRandRange to generate a random lists of data
values. We can control all ranges of numeric values with valid
ID ranges and the list substitutions with valid values.
-
Application Values - We can extract actual host variable
data from the application tables. By examining tables and actual
values we can provide a realistic list of values to feed into
the functions. For nearly all of the SQL selected for this test
there were a limited number of values with most selections being
based on integer value keys easily generated using the $BFRandRange
function.
Random Range Values
Here we see an example of SQL using the $BFRandRange
function to generate random values for host variables:
select stuff
from fbi_person g, fbi_immigration ic
where g.fbi_person_id = $bfrandrange(10000,100000)
and g.fbi_imcat_cd = ic.fbi_imcat_cd;
Application Values
By carefully selecting our SQL we can utilize
selection values easily loaded into the $BFRandList
function. To use actual application data you would need to create a
function that would return random values to the SQL statement,
however, this would generate additional load on the database
possibly causing skewed or invalid results.
We are now ready to configure the client
machines, placing the TNS layer and the Benchmark Factory software
that will load-down the database. Now that we have the SQL ready we
can pursue the set-up on the BF clients.
3. Load Preparation
Stressing the database server involves a set-up
of multiple Oracle*Net clients. The benchmark factory allows you to
install an “agent” on an Oracle*net client machine and launch the
SQL from many client PC’s.
How many users do we want to simulate? Up to
2000 or so (up to CPU or network bandwidth limits) can be done from
a single agent process, more than that we will need to install a
second or more agents to submit requests through. Any windows based
system can be used as an agent, but it must have Oracles network
stack installed.
4. Test Execution & Capturing Results
Here is an example of the results from a “Goal”
test where the goal was to push the system to where the TPS dropped
below the previous two runs.
The Benchmark Factory collects all transaction
data and places it inside a spreadsheet for easy access and
summarization:


Experiment Predictions
In our typical 64-bit Oracle system with 32
gigabytes of RAM we would predict results as follows:
- The RAM data buffer cache (14 gigs in each
of the two servers) will be sufficient to cache the working-set
of frequently-referenced data blocks and disk enqueues will not
be the bottleneck.
- The low-DML of this system and the fact
that OAE does whole-screen data map-outs in one Oracle*Net trip,
we do not expect to see network latency being responsible for
the response-time drop-off.
Because of the architecture and the nature of
transactions against this system we expect that the main source of
contention in this system should be CPU, and Oracle latch-elated
wait events. We should start seeing CPU enqueues (in our case, the
run queue exceeding “2” on each server) at about the same
transaction rate when total response time declines precipitously.
We predict that this configuration will bottleneck on CPU at
approximately 30 transactions per second and that RAM, network and
disk enqueues will not be on the bottleneck.
Our Results
For the initial test run we utilized only a
single Node as the full scale replication was not yet implemented.
In the test we set up a run from 1-100 users executing the SQL
transactions from Appendix A, in this phase of testing we will not
test the replication.
In this goal test the criteria for ending the
test was for the transactions per second at the current user load to
fall behind that of the two previous runs. The graph in Figure 4
shows the TPS results.

Figure 4 – The Preliminary
Test TPS Results
The results show that at 60 users the system
TPS seems to peak in this configuration, but what about other signs
of strain in the system? Is it CPU, Disk, or Network related? Look
at Figure 5 for CPU usage results.

Figure 4 – The Preliminary
Test TPS Results
Note that the user (us) CPU usage peaks at
around 65 percent; with System (sy) peaking at only 5 percent this
still indicates that we have 30 percent CPU idle time. Clearly based
on Figure 5 we do not have CPU stress. A secondary indicator of CPU
stress is runqueue length. Figure 5 shows the plot of runqueue for
the same time period.

Figure 5 – The Preliminary
Test Runqueue Results
As you can see the runqueue has rarely rises
above 4, this indicates that CPU stress was not a factor. Next let’s
examine disk stress results. Look at Figure 6 showing Disk Writes.

Figure 6 – The Preliminary
Test Disk Write Results
As can be seen from Figure 6, other than a
single large burst of greater than 30,000 IO in a 10 second interval
near the mid-range of the test, disk IO was well within the disks
operational limits. Figure 7 shows that Disk Reads were also well
within specifications.

Figure 7 – The Preliminary
Test Disk Read Results
The system reads are clearly below any stress
levels that would indicate the disks were the cause of the low TPS
values. This is also supported when we examine the system wait
statistics in Figure 8.

Figure 8 – The Preliminary
Test Wait Results
As can be seen from examining Figure 8, other
than a peak that corresponds to the reads on disk hbc, system waits
are not causing the TPS drop off. The final possible bottle necks
are memory and network. Examination of the AWR report for the test
period shows some interesting network statistics. Look at Figure 9
for an excerpt from the applicable AWR report from the test run.
Shared Pool Statistics Begin End
------ ------
Memory Usage %: 88.59 91.21
% SQL with executions>1: 35.30 41.99
% Memory for SQL w/exec>1: 44.32 45.20
Top 5 Timed Events Avg
%Total
~~~~~~~~~~~~~~~~~~ wait
Call
Event Waits Time (s) (ms)
Time Wait Class
------------------------------ ------------ ----------- ------
------ ----------
CPU time 555
67.3
latch: library cache 7,068 69
10 8.3 Concurrenc
latch: shared pool 4,291 15
4 1.9 Concurrenc
log file sync 5,520 11
2 1.4 Commit
log file parallel write 6,464 9
1 1.1 System I/O
-------------------------------------------------------------
Statistic Name Time (s) %
of DB Time
------------------------------------------ ------------------
------------
DB CPU
554.8 67.3
sql execute elapsed time
395.7 48.0
parse time elapsed
384.1 46.6
hard parse elapsed time
311.9 37.9
Avg
%Time Total Wait
wait Waits
Wait Class Waits -outs Time (s)
(ms) /txn
-------------------- ---------------- ------ ----------------
------- ---------
Concurrency 12,201 .0
96 8 2.0
System I/O 10,306 .0
15 1 1.7
Commit 5,520 .0
11 2 0.9
Network 2,798,166 .0
8 0 451.2
Other 247 .0
1 5 0.0
Application 400 .0
0 0 0.1
User I/O 1,407 .0
0 0 0.2
-------------------------------------------------------------
Wait Events DB/Inst: FBIGT/fbigt
Snaps: 383-384
-> s - second
-> cs - centisecond - 100th of a second
-> ms - millisecond - 1000th of a second
-> us - microsecond - 1000000th of a second
-> ordered by wait time desc, waits desc (idle events last)
Avg
%Time Total
Wait wait Waits
Event Waits -outs Time (s)
(ms) /txn
---------------------------- -------------- ------ -----------
------- ---------
latch: library cache 7,068 .0
69 10 1.1
latch: shared pool 4,291 .0
15 4 0.7
log file sync 5,520 .0
11 2 0.9
log file parallel write 6,464 .0
9 1 1.0
latch: cache buffers chains 597 .0
7 12 0.1
SQL*Net message to client 2,782,891 .0
5 0 448.8
SQL*Net message from client 2,782,946 .0
190,370 68 448.8
-------------------------------------------------------------
Background Wait Events DB/Inst: FBIGT/fbigt
Snaps: 383-384
-> ordered by wait time desc, waits desc (idle events last)
Avg
%Time Total
Wait wait Waits
Event Waits -outs Time (s)
(ms) /txn
---------------------------- -------------- ------ -----------
------- ---------
rdbms ipc message 17,804 65.0
37,185 2089 2.9
-------------------------------------------------------------
Figure 9 – The Preliminary
Test AWR Results
What the AWR report is showing is that we are
seeing some shared pool stress and network stress. Review of alert
log entries shows the database was also getting ORA-04031 errors,
indicating the shared pool was becoming fragmented. It was decided
that the CURSOR_SHARING parameter would be left at EXACT; as a
result this caused numerous not sharable SQL statements to be loaded
into the shared pool causing the ORA-04031 errors.
When we retest using the dual nodes with
replication, we will turn on the CURSOR_SHARING to reduce memory
thrashing and parsing. In addition we will make sure the clients
used are on separate network strands.
Conclusions for Phase 1 Testing
Phase one showed that even with limited network
bandwidth the system easily supported 50-60 users on a single node,
more than is anticipated than will be needed during normal
operations. With better memory parameters and better network
bandwidth it is anticipated that the nodes will easily support
80-100 users.
Disk Load - other than a single large
burst of greater than 30,000 IO in a 10 second interval near the
mid-range of the test, disk IO was well within the disks operational
limits.
CPU – The user (us) CPU usage peaks at
around 65 percent; with System (sy) peaking at only 5 percent this
still indicates that we have 30 percent CPU idle time.
Memory and Network - Examination of the
AWR report for the test period shows some interesting network
statistics. As predicted, the server became CPU-bound as active
users rose above 60:
Top 5 Timed Events Avg
%Total
~~~~~~~~~~~~~~~~~~ wait
Call
Event Waits Time (s) (ms)
Time Wait Class
------------------------------ ------------ ----------- ------
------ ----------
CPU time 555
67.3
latch: library cache 7,068 69 10
8.3 Concurrenc
latch: shared pool 4,291 15 4
1.9 Concurrenc
log file sync 5,520 11 2
1.4 Commit
log file parallel write 6,464 9 1
1.1 System I/O
Other Oracle benchmark notes: