There are many Oracle experts who
have very different approaches to Oracle tuning and
database optimization. There are Oracle
scientists, pragmatists and statisticians, all of whom
are successful at Oracle tuning, but with very different
approaches to solving the Oracle tuning problem.
The problem occurs when some Oracle experts become
zealous about their method and a "dogma" emerges. Some
Oracle gurus even develop a cult-like following, each
chanting the mantras of their opinion leader and fostering
an almost religious fervor in their belief system about
Oracle tuning. Cults are fine, but in the real world, we
use every approach that has validity!
Many Oracle gurus publish "rules" for the best way to
tune an Oracle database, and Oracle professionals accept
these rules as gospel. This acceptance of rules stems from
their high respect for the Oracle guru, but it is also
a function of the tuning professional's need to have
"firm" guidelines and checklists to follow. If only it
were that simple! Of course, these rules are often little
more than tautologies, and they only serve to confuse the
Oracle community.
In reality, Oracle tuning success is a semi-structured
task, and the successful tuner must have an excellent
understanding of all of the interacting components to be
successful. There are no hard-and-fast rules; everything
is in flux.
Let's take a close look at some of these Oracle tuning
fallacies and understand their weaknesses.
Fallacy: The sole purpose of Oracle tuning should be
to reduce logical I/O
This is a dangerous fallacy because it ignores
important external factors such as disk, network and CPU
bottlenecks and internal factors such as the library
cache. While reducing logical I/O will minimize impact on
the data buffer cache, there is no guarantee that physical
disk I/O will be reduced.
Accessing Oracle data via the data buffers is at least
10x faster (sometimes hundreds of times faster) than disk
access, so one should not ignore disk enqueues and assume
that tuning down logical I/O alone will remove the problem.
Oracle created the KEEP pool so that the DBA can identify
and cache high-use tables and indexes.
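As an illustration, assigning a hot table and its index to
the KEEP pool is a one-line change per segment. This is only
a sketch: the segment names (cust_orders, cust_orders_pk)
are hypothetical, and the KEEP pool must first be sized with
the db_keep_cache_size (or older buffer_pool_keep) parameter.

```sql
-- Hypothetical segment names; size the KEEP pool first
-- (db_keep_cache_size in Oracle 9i and later).
ALTER TABLE cust_orders    STORAGE (BUFFER_POOL KEEP);
ALTER INDEX cust_orders_pk STORAGE (BUFFER_POOL KEEP);

-- Verify which segments are now assigned to the KEEP pool:
SELECT owner, segment_name, buffer_pool
  FROM dba_segments
 WHERE buffer_pool = 'KEEP';
```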
Fallacy: Hourly tuning averages are meaningless
Some Oracle scientists advocate taking point-in-time
samples of performance and argue that any elapsed-time
average of Oracle performance longer than 15 minutes is
too long to be valid. However, many shops recognize that
hourly STATSPACK and AWR metrics (average
buffer cache hit ratio, library cache miss ratio) are often
critical for long-term proactive tuning.
For example, collecting hourly STATSPACK or AWR
averages over many months allows the DBA to develop
"signatures" of Oracle performance. These signatures will
clue the Oracle DBA to times when their system can be
proactively reconfigured to anticipate changes in
processing modes. I know of one shop that detected a
hidden batch job every Tuesday evening, and they were able
to cut the run duration in half by reconfiguring the SGA.
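As a sketch of how such signatures can be built, the query
below averages the change in physical reads between
successive STATSPACK snapshots, grouped by hour of day. It
assumes the standard stats$snapshot and stats$sysstat tables
and a single instance; note that STATSPACK statistics are
cumulative counters, so only the deltas between snapshots
are meaningful.

```sql
SELECT TO_CHAR(snap_time, 'HH24') AS hour_of_day,
       ROUND(AVG(delta_reads))    AS avg_physical_reads
  FROM (SELECT sn.snap_time,
               st.value
               - LAG(st.value) OVER (ORDER BY sn.snap_id)
                 AS delta_reads
          FROM stats$snapshot sn,
               stats$sysstat  st
         WHERE sn.snap_id = st.snap_id
           AND st.name    = 'physical reads')
 GROUP BY TO_CHAR(snap_time, 'HH24')
 ORDER BY 1;
```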
Fallacy: Only transaction-level tuning will be
successful
Some Oracle tuning professionals collect Oracle 10046
trace dumps and use these cryptic hex dumps to locate
poorly-performing SQL statements. They shun the
system-level tuning approach as treating the "symptom" and
not the cause of the performance problem.
They argue that if they tune at the application and SQL
level, then the system-wide issues will self-optimize. In
reality, both transaction-level and system-level tuning
are critical activities. Of course you must optimize your
SQL to ensure that it is gathering the result set with the
minimum amount of logical I/O, but the overall
configuration is also critical. I have made a single
change to an Oracle init.ora parameter and seen the
performance of hundreds of SQL statements improve.
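For readers who have not used it, a 10046 trace is enabled
per session; this is the standard event syntax (level 12
captures both bind variables and wait events), and the
resulting trace file can be formatted with tkprof.

```sql
-- Level 12 = SQL + bind variables + wait events.
ALTER SESSION SET EVENTS
  '10046 trace name context forever, level 12';

-- ... run the suspect SQL here ...

ALTER SESSION SET EVENTS '10046 trace name context off';

-- The raw trace lands in user_dump_dest; format it with:
--   tkprof <tracefile> report.txt sort=exeela
```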
System-level tuning is a reality. Some tuning
professionals treat the system-wide issues first and then
drill down into the sub-optimal PL/SQL and SQL. Other
shops start at the system level because they do not have
the large amount of funds required to pay for detailed
application-level changes (tweaking application code and
SQL). Here are some examples of real-world system-level
Oracle tuning activities:
- Replace sub-optimal RAID - The wrong RAID
level for the Oracle application can be disastrous. For
example, write-intensive Oracle databases (high DML)
will often perform faster with RAID 0+1 than with RAID 5.
For details, see the Battle Against Any RAID Five at
www.baarf.com.
- Fix disk bottlenecks - High waits on
background events like "db file parallel write" and "log
file parallel write" are great indicators of disk
enqueues. If your top wait event relates to file
reads ("db file sequential read" and "db file scattered
read") then you may want to check for disk enqueues and
see if striping (RAID 0+1, RAID 10, Stripe And Mirror
Everywhere, or SAME) might relieve the bottleneck. Also,
some shops move high-impact data files onto solid-state
disks.
- Change sub-optimal optimizer parameters -
Resetting a sub-optimal setting for several optimizer
parameters (optimizer_mode, optimizer_index_cost_adj)
can double the performance of some systems. Changing
optimizer_mode=rule to optimizer_mode=first_rows (or
vice-versa, sometimes!) can influence the performance of
every SQL statement in the application.
- Fix poor-quality optimizer statistics - The
Oracle cost-based SQL optimizer (CBO) will only make
good decisions if it has good-quality samples from the
schema objects.
- Reconfigure the SGA - The settings of the SGA
region sizes and sub-areas will have a major impact on
overall performance.
- Use a faster network - On systems where the top
wait event is consistently related to SQL*Net waits,
you can add faster network resources. However, remember
that many network bottlenecks can be fixed by tuning the
application to make fewer Oracle calls, or by adjusting
the TNS parameters (sqlnet.ora, tnsnames.ora),
specifically the tcp.nodelay, SDU and TDU parameters.
- Get faster processors - Faster (or more) CPUs
may greatly improve the performance of Oracle systems
that are constrained by CPU (as determined by Oracle
wait event reports). These types of systems may perform
up to 10x faster depending upon the speed and SMP
capabilities of the server. The latest trend is to run
Oracle on the new 64-bit Itanium2 servers using Windows
or Linux.
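A quick way to decide which of the activities above applies
to a given system is to rank the non-idle wait events. The
sketch below queries v$system_event and filters out a few
common idle events; the full idle-event list varies by
Oracle release.

```sql
SELECT event,
       total_waits,
       time_waited    -- in centiseconds
  FROM v$system_event
 WHERE event NOT IN ('SQL*Net message from client',
                     'rdbms ipc message',
                     'pmon timer',
                     'smon timer')
 ORDER BY time_waited DESC;
```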
Fallacy: If an Oracle tuning metric has an exception,
it is not valid
Many scientifically minded Oracle professionals are
thrilled when they find that a general rule of Oracle
performance can be challenged.
For example, showing how the data buffer cache hit ratio
can be manipulated at will does not negate the fact that
RAM caching (and the ratios that monitor caching) is a
critical part of Oracle tuning. Data blocks in RAM can be
accessed at speeds that are several orders of magnitude
faster than traditional disk access.
One of the most obtuse arguments I've heard in quite
a while is that a high buffer cache hit ratio (95%) may be
symptomatic of a poorly-tuned database I/O subsystem
because of sub-optimal SQL that is causing high logical
I/O! Of course, a high buffer cache hit ratio may also
signify a well-tuned database.
A high buffer cache hit ratio does not really tell you
much, but a low buffer cache hit ratio for an OLTP
database almost always indicates that the buffer is
too small to cache the working set of
frequently-referenced blocks. In fact, ratios form the
basis for the Oracle10g AMM facility and the Oracle
v$db_cache_advice utility.
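As a sketch, the prediction data behind that facility can
be queried directly. This assumes db_cache_advice is ON (it
is populated automatically when statistics_level is TYPICAL
or ALL); the view estimates physical reads at a range of
candidate cache sizes.

```sql
SELECT size_for_estimate,          -- candidate size (MB)
       buffers_for_estimate,
       estd_physical_read_factor,  -- 1.0 = current size
       estd_physical_reads
  FROM v$db_cache_advice
 WHERE name          = 'DEFAULT'
   AND advice_status = 'ON'
 ORDER BY size_for_estimate;
```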