| |
 |
|
Oracle Streams Benchmark Testing
Oracle Tips by Burleson Consulting |
Benchmarking Oracle Streams
Oracle Streams is a popular free alternative to
Oracle RAC for failover and disaster recovery, but it is important
to note that Oracle Streams is not a panacea, and its not
appropriate for all databases. Oracle Streams has no licensing
costs (RAC costs extra) and it is less complex to configure than a
RAC database. This disadvantage over RAC is that there is no
continuous availability, update collisions may occur, replication is
not instant, and application failover is usually done manually.
When using Oracle Streams for n-way replication
it is critical to implode each instance with DML to determine the
point where Oracle will experience contention.
The paper below describes a DML stress test on
Oracle 10g Streams, using the Quest Benchmark factory software
We also need to understand the limits of
cross-replicating Oracle systems with Oracle Streams. For more
information on Streams vs. RAC for failover, see my notes here:
Oracle Streams Stress test Executive Summary
The purpose of this DML stress test was to
implode an Oracle instance with insert transactions and observe the
breaking-point as high-volumes of data replicate to the to a remote
server using n-way Oracle Streams.
This Oracle Streams stress test was a
tremendous success and shows these exact areas where the database
will experience contentions at high loads. The test also revealed
maximum rates (per server) for I/O, including disk reads, disk
writes and maximum updates per second.
- 400
writes per second
-
7,000 disk reads per second
-
12,000 updates per second
-
19,000 logical buffer reads per second
- UGA
memory access at 18 million per second
We now know the
areas where we will have future problems as system update activity
increases:
- We
need to monitor the log switch interval and re-size the online
redo logs if switches occur more frequently than every 15
minutes.
- We
need to monitor Streams spills to disk and increase the streams
pool to ensure that we never have disk enqueues.
- Hot
tables may eventually have to be moved from ASSM to standard,
multiple freelists.
Oracle Streams Stress Test Overview
The subject of this test is an Oracle server
that is using Oracle10g 2-way Streams replication across a 90-mile
high-speed interconnect. Figure 1 shows a model of the Oracle
server architecture:

Figure 1 The Oracle server architecture
This test imploded the Oracle instance with
insert statements in order to locate the system bottleneck, focusing
on the ability of Oracle to replicate the data across the Oracle
Streams interface to the remote server. The default block size is
8k with a special 32k tablespace for the indexes.
This test used the Quest Benchmark Factory to
generate the insert statements, not using bind variables.
Predictions
As DML transactions increase we would expect
the stress to increase on those instance components associated with
DML:
-
Redo log contention We expect to see very-frequent log
switches and waits on redo
-
Streams replication enqueues We expect to see the Oracle
Streams back-up and enqueue (to disk).
-
Object contention As inert rates increase we expect to
note segment contention on the bitmap freelists (this test has
tablespaces with ASSM).
-
RAM exhaustion We expect to see contention within the
shared_pool and db_cache_size.
Streams Stress Experiment Results
This test was a tremendous success and shows
these exact areas where the database will experience contentions at
high loads and also reveals maximum rates for I/O, including disk
reads, disk writes and maximum updates per second.
The AWR report from the stress test is in
Appendix A. We noted excellent rates for data block access:
- 400 writes per second
- 7,000 disk reads per second
- 12,000 updates per second
- 19,000 logical buffer reads per second
- UGA memory access at 18 million per second
Instance
Activity Stats DB/Inst: CATGT/catgt Snaps:
911-912
Statistic Total per
Second per Trans
-------------------------------- ------------------
-------------- -------------
db block
changes 1,888,093
11,989.6 26.0
db block
gets 2,697,885
17,131.8 37.2
physical read
bytes 1,122,304 7,126.7
15.5
physical read
total IO requests 3,451 21.9
0.1
physical read
total bytes 610,102,272 3,874,206.4
8,410.8
physical
write IO requests 57,900
367.7 0.8
physical
write bytes 528,932,864 3,358,773.1
7,291.8
physical
write total IO requests 73,400
466.1 1.0
physical
write total bytes 1,864,753,152 11,841,356.6
25,707.3
session uga
memory 12,884,963,888 81,820,723.5
177,630.5
During the heavy load on the system, we
observed that the redo and Steams became system bottlenecks:
Top 5 Timed
Events Avg %Total
~~~~~~~~~~~~~~~~~~ wait
Call
Event Waits Time (s) (ms)
Time Wait Class
------------------------------ ------------ ----------- ------
------ ----------
log file
sync 73,876 1,977 27 44.3
Commit
enq: HW -
contention 554 874 1578 19.6
Configurat
enq: SQ -
contention 3,169 680 215 15.3
Configurat
CPU
time 235
5.3
latch:
library cache 8,022 172 21 3.9
Concurrenc
-------------------------------------------------------------
Redo Log contention
For this load level the size of the redo logs
is insufficient, at the very start we were doing 197 switches per
hour at the end 205.
Statistic Total per Hour
-------------------------------- ------------------ ---------
log
switches (derived) 9 205.74
For recoverability we want to keep the redo
size at the default value (512k) so that we can get a few log
switches per day. If we make them too large we risk loosing
multiple days work.
Object Contention
Our biggest stress point for blocks appears to
be row lock waits on the indexes. This was probably because we were
doing a high volume of updates along with the inserts, in normal
operation we probably won't see this mix at this stress level. We
used ASSM and noted issues with the initrans value. We may want to
switch to manual for some tables if we expect this level of stress.
At max stress we were also seeing sequence and
object high watermark enqueues. The sequence enqueues probably
indicate we need to increase the caching level for the sequences
(I'll wager they are at the default of 20, we may wish to bump that
up to 100 or so). The HWM enqueues are because we started expanding
the table quickly...we may want to look at the tablespace settings
for extent size.
Enqueue Type
(Request Reason)
------------------------------------------------------------------------------
Requests Succ Gets Failed Gets Waits Wt Time (s) Av Wt
Time(ms)
------------
------------ ----------- ----------- ------------ --------------
SQ-Sequence
Cache
6,671 6,660 0 3,050 631
207.00
TX-Transaction (index contention)
1,589 1,588 0 1,565
8 5.35
We also note that the bitmap freelists (as
implemented via Automatic Segment Storage Management ASSM) become
overloaded, resulting in buffer busy waits:
Buffer
Tablespace Subobject Obj. Busy
% of
Owner
Name Object Name Name Type Waits
Capture
---------- ---------- -------------------- ---------- -----
------------ -------
CAT_SCHEMA
CAT_SCHEMA XPKIM_DONALDS
INDEX 2,768 73.85
Streams Contention
As predicted, we noted enqueue contention
within Oracle Streams as transactions backed-up to be replicated to
the remote server:
Avg
%Time Total
Wait wait Waits
Event Waits -outs Time (s)
(ms) /txn
---------------------------- -------------- ------ -----------
------- ---------
enq: HW -
contention 554 47.8 874
1578 0.0
Streams
capture: waiting for 85 50.6 112
1321 0.0
Streams AQ:
waiting for mess 91 96.7 330 3627
0.0
virtual
circuit status 6 100.0 176
29294 0.0
Streams AQ:
waiting for time 7 42.9 166 23752
0.0
Streams AQ:
qmn coordinator 78 37.2 161 2067
0.0
At this level of activity streams is spilling,
this is causing it to not keep up as well as it should (spilling
means going to disk) we need a larger streams pool. At maximum
(according to the report) we will need about a 400 megabyte streams
pool. We may also want to look at multiple streams processes.
Streams Pool
Advisory DB/Inst: CATGT/catgt Snap:
912
Size
for Size Est Spill Est Spill Est Unspill Est Unspill
Est (MB)
Factor Count Time (s) Count Time (s)
----------
--------- ----------- ----------- ----------- -----------
48
0.2 2,133,680 658 394,800 108
80 0.4 1,215,434 382 169,076 46
112 0.5 581,667 192 49,401 13
144 0.7 146,351 45 26,205 7
176 0.8 50,491 15 15,787 4
208 1.0
27,981 8 10,845 3
240 1.2 14,558 4 7,167 2
272 1.3 5,839 1 3,506 1
304 1.5 1,390 0 1,260 0
336 1.6 126 0 701 0
-------------------------------------------------------------
We also see high CPU activity associated with
Streams during the high update period
Streams
CPU/IO Usage DB/Inst: CATGT/catgt Snaps:
911-912
-> Streams
processes ordered by CPU usage
-> CPU and
I/O Time in micro seconds
Session
Type CPU Time User I/O Time Sys I/O Time
------------------------- -------------- --------------
--------------
STREAMS Apply
Reader 6,099,450 20,501 13,413
Logminer
Reader 970,096 0 8,876,876
Lessons Learned
-
Streams pool size and multiple streams processes (see Mike for
advice)
-
Increase sequence caching to 100
-
Review extent sizes for the most active tables (fbi_donald,
fbi_establishment)
We now know what
will happen as the load expands and we can prepare for future
high-stress conditions:
- We
need to monitor the log switch interval and re-size the online
redo logs if switches occur more frequently than every 15
minutes.
- We
need to monitor Streams spills to disk and increase the streams
pool to ensure that we never have disk enqueues.
- Hot
tables (fbi_donald) may eventually have to be moved from ASSM to
standard, multiple freelists.
Appendix A - AWR Report
WORKLOAD
REPOSITORY report for
DB Name
DB Id Instance Inst Num Release RAC Host
------------
----------- ------------ -------- ----------- --- ------------
CATGT
2438152563 catgt 1 10.2.0.1.0 NO catgmis-gt.g
Snap Id Snap Time Sessions Curs/Sess
--------- ------------------- -------- ---------
Begin
Snap: 911 16-Mar-06 22:30:32 81 7.7
End Snap: 912
16-Mar-06 22:33:10 80 7.7
Elapsed: 2.62 (mins)
DB Time: 74.32 (mins)
Cache Sizes
~~~~~~~~~~~ Begin End
---------- ----------
Buffer Cache: 3,376M 3,376M Std Block Size: 8K
Shared Pool Size: 624M 624M Log Buffer:
14,400K
Load Profile
~~~~~~~~~~~~ Per Second Per
Transaction
---------------
---------------
Redo size: 2,292,249.69
4,976.41
Logical reads: 19,080.45 41.42
Block changes:
11,989.57 26.03
Physical reads: 0.70 0.00
Physical writes:
409.84 0.89
User calls:
1,252.97 2.72
Parses:
739.47 1.61
Hard parses: 5.81 0.01
Sorts:
2.37 0.01
Logons:
0.04 0.00
Executes:
943.47 2.05
Transactions: 460.62
% Blocks
changed per Read: 62.84 Recursive Call %: 64.98
Rollback per
transaction %: 0.03 Rows per Sort: 36.06
Instance
Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.86 Redo NoWait %: 99.96
Buffer Hit %: 100.00 In-memory Sort %: 100.00
Library Hit %: 99.80 Soft Parse %: 99.21
Execute to Parse %: 21.62 Latch Hit %: 99.83
Parse CPU to
Parse Elapsd %: 21.32 % Non-Parse CPU: 91.63
Shared Pool
Statistics Begin End
------ ------
Memory Usage %: 72.94 81.20
% SQL with
executions>1: 97.06 82.84
% Memory for
SQL w/exec>1: 94.55 79.61
Top 5 Timed
Events Avg %Total
~~~~~~~~~~~~~~~~~~ wait
Call
Event Waits Time (s) (ms)
Time Wait Class
------------------------------ ------------ ----------- ------
------ ----------
log file sync 73,876 1,977 27
44.3 Commit
enq: HW - contention 554 874 1578
19.6 Configurat
enq: SQ - contention 3,169 680 215
15.3 Configurat
CPU time
235 5.3
latch: library cache 8,022 172
21 3.9 Concurrenc
-------------------------------------------------------------
Time Model
Statistics DB/Inst: CATGT/catgt Snaps:
911-912
-> Total time
in database user-calls (DB Time): 4459.1s
-> Statistics
including the word "background" measure background process
time, and so
do not contribute to the DB time statistic
-> Ordered by %
or DB time desc, Statistic name
Statistic
Name Time (s) % of DB Time
------------------------------------------ ------------------
------------
sql execute elapsed time
2,152.0 48.3
sequence load elapsed time
748.3 16.8
DB
CPU
234.6 5.3
PL/SQL
execution elapsed time 177.9
4.0
parse time
elapsed 91.7 2.1
hard parse
elapsed time 7.1 .2
repeated bind
elapsed time 0.1 .0
PL/SQL
compilation elapsed time 0.1
.0
failed parse
elapsed time 0.0 .0
hard parse
(sharing criteria) elapsed time 0.0 .0
connection
management call elapsed time 0.0 .0
hard parse
(bind mismatch) elapsed time 0.0 .0
DB
time
4,459.1 N/A
background
elapsed time 235.9 N/A
background cpu
time 11.3 N/A
-------------------------------------------------------------
Wait
Class DB/Inst: CATGT/catgt
Snaps: 911-912
-> s - second
-> cs -
centisecond - 100th of a second
-> ms -
millisecond - 1000th of a second
-> us -
microsecond - 1000000th of a second
-> ordered by
wait time desc, waits desc
Avg
%Time Total Wait
wait Waits
Wait
Class Waits -outs Time (s)
(ms) /txn
-------------------- ---------------- ------ ----------------
------- ---------
Commit 73,876 .1
1,977 27 1.0
Configuration 4,220 11.9
1,773 420 0.1
Concurrency 16,589 .8
370 22 0.2
Network 117,711 .0
147 1 1.6
System
I/O 14,444 .0 124
9 0.2
Other 5,945 2.5
77 13 0.1
Application 2,478 .0
4 1 0.0
User I/O
567 .0 1 2
0.0
-------------------------------------------------------------
Wait
Events DB/Inst: CATGT/catgt Snaps:
911-912
-> s - second
-> cs -
centisecond - 100th of a second
-> ms -
millisecond - 1000th of a second
-> us -
microsecond - 1000000th of a second
-> ordered by
wait time desc, waits desc (idle events last)
Avg
%Time Total Wait wait
Waits
Event Waits -outs Time (s)
(ms) /txn
---------------------------- -------------- ------ -----------
------- ---------
log file
sync 73,876 .1 1,977
27 1.0
enq: HW - contention 554 47.8
874 1578 0.0
enq: SQ -
contention 3,169 3.7 680
215 0.0
latch: library
cache 8,022 .0 172 21
0.1
buffer busy
waits 4,316 2.9 169 39
0.1
SQL*Net more
data to dblink 18,425 .0 146 8
0.3
Streams capture: waiting for 85 50.6
112 1321 0.0
log file
parallel write 10,451 .0 99
9 0.1
log file switch
(checkpoint 181 39.2 97 535 0.0
rdbms ipc
reply 339 5.9 48
143 0.0
control file
parallel write 359 .0 13 35
0.0
latch
free 1,463 .0 10
7 0.0
log file
sequential read 521 .0 10
19 0.0
latch: enqueue
hash chains 1,312 .0 9 7
0.0
latch: library
cache pin 958 .0 8 9
0.0
log file switch
completion 187 2.7 8 43 0.0
enq: TX - index
contention 1,776 .0 8 4 0.0
latch: cache
buffers chains 1,214 .0 7 6
0.0
buffer
deadlock 104 100.0 5
46 0.0
latch: shared
pool 207 .0 4 21
0.0
SQL*Net
break/reset to clien 2,478 .0 4
1 0.0
Log archive
I/O 431 .0 2 6
0.0
LGWR wait for
redo copy 2,078 1.3 2 1
0.0
latch: session
allocation 264 .0 1 6
0.0
Data file init
write 257 .0 1 4
0.0
latch: redo
copy 38 .0 1 28
0.0
latch: library
cache lock 71 .0 1 11
0.0
SQL*Net message
to client 99,091 .0 1 0 1.4
latch: cache
buffers lru cha 66 .0 1 9
0.0
enq: US -
contention 58 .0 0
8 0.0
row cache
lock 5 .0 0
83 0.0
os thread
startup 3 .0 0
109 0.0
latch: redo
allocation 166 .0 0 2
0.0
enq: CF -
contention 7 .0 0
36 0.0
latch: object
queue header o 36 .0 0 6
0.0
cursor: mutex
S 9 .0 0 20
0.0
db file single
write 31 .0 0 6
0.0
db file
sequential read 75 .0 0
1 0.0
latch: undo
global data 43 .0 0 2
0.0
control file
sequential read 2,646 .0 0 0
0.0
latch: row
cache objects 8 .0 0
5 0.0
SQL*Net more
data to client 109 .0 0 0
0.0
wait list latch
free 2 .0 0 11 0.0
log file single
write 36 .0 0 1 0.0
enq: TX -
contention 2 .0 0
7 0.0
undo segment
extension 1 100.0 0 13
0.0
enq: TX -
allocate ITL entry 2 .0 0
6 0.0
enq: FB -
contention 5 .0 0
2 0.0
latch: redo
writing 3 .0 0 0
0.0
direct path
read 99 .0 0 0
0.0
direct path
write 103 .0 0 0
0.0
SQL*Net message
to dblink 72 .0 0 0 0.0
db file
scattered read 2 .0 0
0 0.0
SQL*Net more
data from clien 14 .0 0 0
0.0
SQL*Net message
from client 99,093 .0 4,468 45 1.4
Streams AQ: waiting for mess 91 96.7
330 3627 0.0
virtual circuit status 6 100.0 176
29294 0.0
Streams AQ: waiting for time 7 42.9 166
23752 0.0
Streams AQ: qmn coordinator 78 37.2
161 2067 0.0
LogMiner:
wakeup event for p 877 20.0 161
183 0.0
Wait
Events DB/Inst: CATGT/catgt Snaps:
911-912
-> s - second
-> cs -
centisecond - 100th of a second
-> ms -
millisecond - 1000th of a second
-> us -
microsecond - 1000000th of a second
-> ordered by
wait time desc, waits desc (idle events last)
Avg
%Time Total
Wait wait Waits
Event Waits -outs Time
(s) (ms) /txn
---------------------------- -------------- ------ -----------
------- ---------
Streams AQ: qmn
slave idle w 48 .0 159 3318 0.0
LogMiner:
wakeup event for b 22,391 .4 157
7 0.3
wait for unread
message on b 52,923 .2 157 3 0.7
Streams AQ:
delete acknowled 42 50.0 156 3725
0.0
LogMiner:
wakeup event for r 1,312 8.8 151
115 0.0
jobq slave
wait 47 100.0 138
2931 0.0
SQL*Net message
from dblink 72 .0 15 207 0.0
LogMiner:
client waiting for 1 .0 0
0 0.0
-------------------------------------------------------------
Background Wait
Events DB/Inst: CATGT/catgt Snaps: 911-912 |