As a tuning professional, I can tell you
that, for the vast majority of non-scientific systems, the primary
bottleneck is disk I/O. Back in the days before RAID and giant
db_cache_size, the DBA had to manually load balance the disk I/O
sub-system to relieve contention on the disks and the disk
controllers. For details on Linux I/O, see
Oracle
disk I/O on Linux Tips.
Many DBAs would like to believe that this
disk technology has changed. Sadly, the only major changes to disk
technology since the 1970s are these hardware and software changes:
-
Large data buffers — Today the DBA can cache large portions
of the data blocks in the db_cache_size reducing disk I/O.
-
Disks with on-board cache — Many of the newer disks have an
on-board RAM cache to hold the most frequently-referenced data
blocks.
-
RAID — The randomizing of data blocks with RAID 0+1 (and
RAID 5 for low-write systems) has removed the need for disk load
balancing by scrambling the data blocks across many disk spindles.
In Oracle10g, the Automatic Storage Management (ASM) feature
requires SAME disk "Stripe And Mirror Everywhere," which is
essentially RAID 1+0.
However, other than these advances, basic
disk technology has not changed since the 1970s. The Oracle DBA must
remember that disk I/O remains an important issue and understand the
internals of disk management to maximize the performance of their
I/O-bound systems.

The
Ion tool is the easiest way to analyze STATSPACK disk
I/O data in Oracle and
Ion allows you to spot hidden I/O trends.
Remember, even with lots of caching, the
goal of almost all Oracle tuning activities is to reduce I/O.
Tuning the instance parameters, sizing the library cache, tuning SQL
statements all have the common goal of reducing I/O. Even when
physical disk I/O (POI) is minimized, logical I/O (LIO) is still a
major contributor to response time. Even fully-cached
databases will run slowly if sub-optimal SQL statements force
thousands of unnecessary Logical I/Os against the buffer cache.
Time and time again, I see Oracle DBAs
tuning a component of their database that is not a top wait event,
and they are surprised to find that their change did not make a huge
difference in performance. For example, faster CPU does not help an
I/O-bound system, and moving to faster disk does not help a
CPU-bound system. All Oracle databases have some physical
constraint, and it is not always disk. The best way to find the
constraints for your system is to examine the top five wait events
on your STATSPACK report:
Disk bound — The majority of the wait time is spent
accessing data blocks. This can be db file sequential read waits
(usually index access) and db file scattered read waits (usually
full-table scans): Top 5
Timed Events
% Total
Event
Waits Time (s) Ela Time
--------------------------- ------------ ----------- --------
db file sequential read
2,598 7,146
48.54
db file scattered read
25,519 3,246
22.04
library cache load lock
673 1,363
9.26
CPU time
44 1,154
7.83
log file parallel write
19,157 837
5.68
CPU bound — The majority of the wait time is spent
performing computations. You can also see this when the CPU run
queue exceeds the number of CPUs on your database server (using the
"r" column in vmstat UNIX). Having CPU as the top wait event
is a good thing because you can add faster CPUs or more CPUs and
relieve the bottleneck:
Top 5 Timed Events
% Total
Event
Waits Time (s) Ela Time
---------------------------------- ------------ ----------- --------
CPU time
4,851 4,042
55.76
db file sequential read
1,968 1,997
27.55
log file sync
299,097 369
5.08
db file scattered read
53,031 330
4.55
log file parallel write
302,680 190
2.62
Network Constrained — Network bottlenecks are very common
in distributed systems and those with high network traffic. They are
manifested as SQL*Net wait events:
% Total
Event Waits Time (cs) Wt Time
--------------------------------- ------------ ------------ -------
SQL*Net more data to client 3,914,935 9,475,372 99.76
db file sequential read 1,367,659 6,129 .06
db file parallel write 7,582 5,001 .05
rdbms ipc reply 26 4,612 .05
db file scattered read 16,886 2,446 .03
Once you
have determined that your Oracle database is I/O-bound, you must fully
understand the internals of your disk devices, layers of caching, and
the configuration of your disk controllers and physical disk spindles.
Let's start with a review of disk architecture and then return to disk
tuning and see how to troubleshoot and repair disk I/O problems.
Back to the future
Back in
the days of Oracle7, almost all databases were I/O-bound. Because most
applications are data intensive, the database industry came-up will
all sorts of elaborate methods for load-balancing the I/O subsystem to
relieve disk contention.
On the IBM
mainframes, the DBA could specify the absolute track number for any
data file with the "absolute track" (ABSTR) JCL argument. This allows
the mainframe DBA to control placement of the data files on their IBM
3380 disks. Prior to RAID, data file placement rules were quite
different (or were they?):
Let's take
a look at each of these approaches. Today's disks still have three
sources of delay, and though the "average" disk latency is 8-15
milliseconds, there can be wide variations in disk access speed.
Inside Disk Architecture
Internally, disks have changed little since the 1970s. They still have
spinning magnetic-coated platters, read-write heads, and I/O channels.
Let's take a minute to understand how disk I/O works at the device
level.
-
Read-write head delay (seek delay)
— This is the time required to place the read-write head under the
appropriate disk cylinder, and this time can be 90 percent of disk
access time. It is especially bad then competing files are placed in
outermost cylinders. Back in the days of 3350 disks, you could load
an ISAM file into a 3350, and watch the device shake as the
read-write heads swung back-and-forth. The probability of traveling
an arbitrary distance across the disk (pdi) is:

-
Rotational delay
— Once on the proper cylinder, the read-write heads must wait until
the track header passes beneath them. The rotational delay is the
speed of rotation divided by two, assuming that a platter will have to
spin a half-revolution to access any given track header.
-
Data transmission Delay — This delay can be huge for
distributed databases and database on the Internet. For example, many
worldwide Oracle shops use replication techniques and place systems
staggered across the world to reduce data transmission time.
Disk enqueues can occur when the disk is unable to quickly service concurrent
requests. Super-large disks can be problematic, and the most popular
Oracle data files can be placed on the middle absolute track of the device to
minimize read-write head movement.

Even today
we still see these three components of disk access latency. Back
before large data buffer caches and RAID, the DBA had to manually
place data files on the disk and monitor I/O patterns to ensure that
there was no disk contention.
Remember,
if you are not using RAID striping, these rules remain important. The
manual disk placement rules included:
-
File Placement — The DBA would place high I/O data files in
the "middle" absolute track number to minimize read-write head
movement. This is a situation in which the low access data files
reside on the inner and outer cylinders of the disk. In Figure 1, we
see where high impact data files should be placed to minimize
read-write head movement. You want to place your "hot" data
files near the middle absolute track:
Figure
1: Hot file placement of non-RAID data file.
-
File Striping — High impact data files were supposed to be
spread across many disk drives to spread the load and relieve disk
and channel contention:

Figure 2: Striping a non-RAID data file across many disks.
-
File Segregation — The redo files and a data files should
be placed on separate disk spindles to relieve contention. This is
also true for the archived redo log file directory, and the undo log
files.

Figure
3: A segregated non-RAID Oracle file system.
Remember,
if you are not using hardware or software RAID,
RAM SAN, or
100 percent data caching, all of these disk I/O rules still applies.
Now that
we see the manual methods for Oracle data file management, let's see
how advances of the past decade have simplified this important task.
The Age of Change
As we can
see, the manual file placement rules are cumbersome and complex.
During the 1980s, many DBAs spent a great deal of their time managing
the disk I/O sub-system. There are three main technologies that have
changed this approach:
Solid State Disk
The new
solid-state disks retrieve data thousands of time faster than
traditional disks at a cost of about $10k per gigabyte. Many Oracle
shops are using RAM SAN technology for their TEMP tablespace, undo
files, and redo log files. The noted Oracle author James Morle has a
great
whitepaper on solid state disk.
Large RAM Caching
In 64-bit
Oracle, the db_cache_size is only limited by the server, and many
shops are running fully-cached databases. In the next five years,
prices of RAM should fall such that most systems can be fully cached,
making disk management obsolete.
Oracle8i
provides a utility called v$db_cache_advice that allows you to
predict the benefit of adding additional RAM buffers (refer to Figure
4).
Figure 4:
Output from the v$db_cache_advice utility.
In Figure
4 we see that Oracle estimates the physical reads for different sizes
of the db_cache_size. In this example, we see that doubling the
db_cache_size from 856 to 1,672 will reduce disk I/O by more
than 80 million disk reads. However, as we approach full-caching, less
frequently referenced data becomes cached, and the marginal benefit of
caching decreases:
The marginal gains from large RAM caches.
As we approach full data caching, the marginal benefit of
blocks to db_cache_size decreases. While Oracle has not published the
mechanism, this approach is likely the foundation of the new Oracle10g
Automatic Memory Management (AMM) feature. With AMM in Oracle10g, you
can tell Oracle to track the usage of RAM within the
shared_pool_size, pga_aggregate_target and db_cache_size,
and Oracle10g will automatically adjust the sizes of these SGA regions
based on current usage.
However, there are some imitation to
the Oracle data buffer cache advisor. Setting the dba_cache_advice=on
while the database is running will cause Oracle to grab RAM pages from the
shared pool RAM area, and the buffer cache advisory uses this RAM in a simple
simulation to help capture elapsed time data for logical I/O and physical disk
reads, using different buffer sizes.
While the internals of the utility
are not disclosed, we see these parameters that effect the execution
of the v$db_cache_advice. (Note: These may not be
used within the historical data buffer reports):
-
_db_cache_advice_batch_size
-
_db_cache_advice_sample_factor
-
dba_cache_advice=on or
ready
First, there are several versions
of the data buffer cache advisor:
-
Real-time - The
v$db_cache_advice utility samples data from the SGA in
real-time, and creates a limited elapsed-time report from the
SGA accumulators and addition memory from the
dba_cache_advice settings.
-
Historical with AWR and
STATSPACK - It appears that the AWR and STATSPACK data
buffer advisories use the DBA_HIST and STATSPACK table data for
the report.
We also see some limitations to
the accuracy of the data buffer advisor because of the architecture:
-
Limited
deltas - Using only two observations for
logical reads and physical I/O are not enough
data for a meaningful prediction. The
"current workload" assumption has a wide
variance, and the numbers for a one minute
report will be quite different from a one
hour report.
-
Limited
metrics - All of the advice from the data
buffer cache advisory is limited to logical I/O
and physical I/O at the system-wide level.
-
Assumption
of optimization? - The AWR data buffer cache
advisor (and possibly the related
v$db_cache_advice utility), only uses
limited
data points and some experts suggest that it assumes that the
existing data buffer size is already optimal (the point
at which the working set of frequently-used data
blocks are cached, and additions to the data
buffer result in marginally declining reductions
in physical reads).

Hence, on the margin, the data buffer cache
advisory is inaccurate for database with an undersized
db_cache_size (and db_keep_cache_size, etc.).
RAID Technology
The advent
of hardware and software RAID has relieved the need for the DBA to
manually stripe data files across multiple disk spindles. The two most
common "levels" of RAID are RAID 10 (mirroring and striping) and
RAID5.
Oracle
recommends using (SAME) Stripe And Mirror Everywhere (a.k.a., RAID 1+0) for all systems that
experience significant updates. This is because of the update penalty
with RAID 5 architectures.
Use caution with RAID 5
Using RAID
5 for a high-update Oracle system can be disastrous to performance,
yet many disk vendors continue to push RAID 5 as a viable solution for
highly updated systems.
Note: This has changed as of 2015,
and there are Oracle RAID5 configurations that are able to support
high DML rates. See
RAID5 now acceptable for Oracle data files.
In sum, using some implementations of
RAID5 can be problematic for any company with high volumes of updates to
use RAID 5, and Oracle10g with Automatic Storage Management (ASM)
wants Oracle customers to use RAID 1+0.
Conclusion
Until such
time that solid state disk is cheap enough to fully cache large
databases, the Oracle DBA will still be concerned about their most
critical performance area, the disk I/O sub-system. The main points of
this article include the following:
-
If you
are not using RAID, the old-fashion file placement rules still
apply, and you must manually place Oracle data files across you disk
spindles to relieve I/O contention.
-
Using a
RAID 10 approach (striping and mirroring) distributes data blocks
across all of your disk spindles, making "hot" disks a random and
transient event.
-
RAID 5
is not recommended for high-update Oracle systems. The performance
penalty from the parity checking will clobber Oracle performance.
-
Oracle9i
and Oracle10g continue to refine the ability to support very large
RAM data buffers with the v$db_cache_advice utility and
Oracle10g Automatic Memory Management (AMM).
-
Solid
state disk is getting cheaper, and may soon replace traditional disk
devices. Many Oracle customers are using solid state disk for high
I/O data files such as TEMP, UNDO, and REDO files.
-
If disk
is not your bottleneck, improving disk access speed will not help.
Be sure to check your top-5 STATSPACK wait events to ensure that
disk I/O is your bottleneck before undergoing expensive changes to
your disk I/O subsystem.
In sum,
Oracle DBAs must continue to fully understand their disk I/O
sub-system and make sure that disk I/O does not impede the high
performance of their systems.
Also see:
|