Is the Oracle buffer hit ratio a useless metric for
monitoring and tuning?
Oracle Database Tips by Donald Burleson, updated 22 May 2013
Ever since the Oracle7 performance tuning class in the
early 1990's recommended
keeping the buffer hit ratio above 95% (advice later adopted by SAP and other
ERP vendors), there has been a great deal of misunderstanding about the buffer
cache hit ratio (BCHR), and about the value of ratios in general as a tool for Oracle
monitoring and tuning.
To learn more about optimizing the data buffer hit ratio, see
Oracle data
buffer hit ratio advisory and
the
importance of improving your data buffer hit ratio.
Let's start by separating fact from fiction.
Whether we are talking about the buffer cache hit ratio or the data buffer cache
advisory (see
v$db_cache_advice), we need to remember that they are just different
formulas for twiddling the same metrics, namely the elapsed-time deltas for
logical I/O (consistent gets) and physical I/O (disk reads). Hence, any
such computations are fundamentally flawed:
- One point - A single delta value does not
provide enough information for any meaningful estimates.
- Too long - Using the standard one-hour snapshots
loses much detail, making most extrapolations meaningless. Of course,
it's a different story when we take multiple snapshots, especially at
high granularity (e.g. 10 minutes or less).
If we decompose the problem, we are really talking about
elapsed logical I/O vs. physical reads. The formulas are incidental.
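To make this concrete, here is a minimal sketch of the classic hit-ratio formula
against v$sysstat (statistic names as in Oracle 10g and later; note that these
counters are cumulative since instance startup, so a meaningful value requires
two samples and a subtraction, exactly the delta problem described above):

-- Classic BCHR formula: 1 - (physical reads / logical reads)
-- Counters are cumulative; sample twice and subtract for a delta.
SELECT ROUND(
         (1 - phys.value / NULLIF(blk.value + cons.value, 0)) * 100, 2
       ) AS buffer_cache_hit_ratio
FROM   v$sysstat phys, v$sysstat blk, v$sysstat cons
WHERE  phys.name = 'physical reads cache'
AND    blk.name  = 'db block gets from cache'
AND    cons.name = 'consistent gets from cache';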
Most experts agree that the buffer cache hit ratio, by
itself, is of limited value, only one of hundreds of metrics that indicate
database performance. In general, all ratios are limited because they
only measure a probability, in this case the overall probability that a
data block will be found in the data buffer cache upon "re-read" (i.e. the BCHR
has no bearing on blocks that are "brand new" to the data cache).
Another issue with the BCHR is its high variability.
For example, while an overall hourly BCHR may be 96%, the value swings wildly
when examined at a finer level of granularity: the BCHR could be 20% one
minute and 95% the next, all depending on the current workload.

Mythmakers and the buffer hit ratio
The myths about the data buffer hit ratio are spread
mainly by people who do not understand the fundamental nature of ratios.
For example, a DBA once published a demo script which accepts a "target" BCHR and then floods the database with requests for pre-cached data
blocks until the desired ratio is achieved!
From this obvious
demonstration of ratios in action, many people falsely concluded that because a
ratio can be manipulated, it is totally meaningless!
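The trick itself is trivial. Here is a hypothetical sketch of how such a script
inflates the ratio (the table name small_tab and the loop count are illustrative
assumptions): re-reading an already-cached table piles up logical reads with no
new physical reads, pushing the ratio toward any target:

-- Assumes small_tab is a small table that fits entirely in the cache.
-- Each full scan adds logical I/O but no physical I/O once cached.
BEGIN
  FOR i IN 1 .. 100000 LOOP
    FOR r IN (SELECT /*+ FULL(t) */ * FROM small_tab t) LOOP
      NULL;  -- discard the rows; only the buffer gets matter
    END LOOP;
  END LOOP;
END;
/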
Bob Jones comments on the assertion that the BCHR is
meaningless, simply because it can be changed by artificially varying the
database workload:
BCHR can be manipulated. That is nothing new. All stats
can be inflated in similar manners. But that doesn't make them all
meaningless. Given everything else being equal, a high BCHR is always better
than a low BCHR . . .
BCHR alone does not tell you about overall performance.
It simply tells you the disk I/O percentage. It is an indicator, a very
meaningful one.
But can we therefore conclude that the BCHR is totally
useless? Let's take a closer look.
The BCHR for performance tuning
The buffer cache hit ratio is no longer touted as a
standalone measure of system performance, but it's not completely useless
either.
There has been some question about whether the data buffer cache advisory is
a ratio-based tool (and hence subject to the same limitations as the data buffer hit ratio).
MOSC
Note:148511.1 says that the estd_physical_read_factor is a ratio:
"Physical read factor for this cache size, which is the ratio of the
number of estimated physical reads to the number of reads in the real
cache."
See how to predict I/O reduction
from a larger data cache for more details.
From the
10g documentation on memory usage we see that Oracle continues to recommend
using the buffer cache hit ratio in conjunction with other metrics, namely the
predictive ratios displayed in the 10g buffer cache advisory:
"The buffer cache hit ratio can be used to verify the
physical I/O as predicted by V$DB_CACHE_ADVICE"
In a well-tuned production database, adding RAM to the data
buffers can make a difference in overall throughput, via a reduction in physical
disk reads, one of the most time-consuming operations in any Oracle database.
The Oracle University performance tuning course for Oracle9i (dated 2001) notes
that you should increase the cache size when the following conditions hold (a sketch of the resize command follows the list):
- The cache hit ratio is less than 0.9 on an OLTP
system.
- There is no undue page faulting.
- The previous increase was effective.
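If those conditions hold, the default buffer cache can be grown dynamically. A
minimal sketch (the 2g target is an arbitrary illustration, and under automatic
SGA management Oracle may already be resizing this component for you):

-- Grow the default buffer cache online; SCOPE=BOTH also persists
-- the change in the spfile. The 2g value is illustrative only.
ALTER SYSTEM SET db_cache_size = 2g SCOPE=BOTH;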
The Oracle AWR report contains a buffer advisory utility
that shows predictions of the marginal changes to physical disk reads with
changes to the buffer size:
Buffer Pool Advisory
-> Only rows with estimated physical reads >0 are displayed
-> ordered by Block Size, Buffers For Estimate
Size for Size Buffers for Est Physical Estimated
P Estimate (M) Factr Estimate Read Factor Physical Reads
--- ------------ ----- ---------- ------------- ------------------
D 4 .1 501 2.10 1,110,930
D 8 .2 1,002 1.84 970,631
D 12 .2 1,503 1.75 924,221
D 16 .3 2,004 1.62 857,294
D 20 .4 2,505 1.61 850,849
D 24 .5 3,006 1.59 837,223
D 28 .5 3,507 1.58 831,558
D 32 .6 4,008 1.57 829,083
D 36 .7 4,509 1.56 825,336
D 40 .8 5,010 1.56 823,195
D 44 .8 5,511 1.06 557,204
D 48 .9 6,012 1.01 534,992
D 52 1.0 6,513 1.00 527,967
D 56 1.1 7,014 0.78 411,218
D 60 1.2 7,515 0.35 186,842
D 64 1.2 8,016 0.28 148,305
D 68 1.3 8,517 0.26 134,969
D 72 1.4 9,018 0.23 123,283
D 76 1.5 9,519 0.23 121,878
D 80 1.5 10,020 0.23 120,317
-----------------------------------------------------
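The same projections can be queried directly from v$db_cache_advice, the view
that feeds this AWR section. A sketch, restricted to the DEFAULT pool at the
database's standard block size:

SELECT size_for_estimate         AS cache_mb,
       buffers_for_estimate      AS buffers,
       estd_physical_read_factor AS read_factor,
       estd_physical_reads       AS est_phys_reads
FROM   v$db_cache_advice
WHERE  name = 'DEFAULT'
AND    block_size = (SELECT value FROM v$parameter
                     WHERE name = 'db_block_size')
AND    advice_status = 'ON'
ORDER  BY size_for_estimate;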
For a well-tuned database, the goal of setting the data buffer
size is to cache the "working set" of frequently referenced data blocks, up to the
point of diminishing returns, where each additional increment of RAM yields a
smaller reduction in disk reads.

However, there are some serious limitations to the Oracle
data buffer cache advisor:
- Only one delta - Using only two observations for
logical reads and physical I/O is not enough data for a meaningful
prediction. The "current workload" assumption has a wide variance, and
the numbers for a one-minute report will be quite different from a one-hour
report.
- Only two metrics - All of the advice from the
data buffer cache advisory is limited to logical I/O and physical I/O at the
system-wide level.
- Assumption of optimization - The AWR data buffer
cache advisor (and possibly the related v$db_cache_advice utility)
only has two data points to consider, and it assumes that the existing data
buffer size is optimal: the point at which the working set of
frequently-used data blocks is cached, and additions to the data buffer
result in marginally declining reductions in physical reads.

Hence, on the margin, the data buffer cache advisory is
inaccurate for databases with an undersized db_cache_size (and
db_keep_cache_size, etc.). With the data
buffer set to a very small size, a small increase to the size of the RAM data
buffers results in a large reduction in disk I/O.

The data buffer cache advisory may underestimate the benefits
with a too-small cache
However, this large reduction in disk I/O does not continue ad infinitum. As
the RAM size approaches the size of the database, the marginal reduction in disk I/O
becomes smaller, because all databases have infrequently accessed data.

The data buffer cache advisory does
not know that a cache is oversized
In sum, the usefulness of a data buffer cache advisory is
undisputed, but the true way to a successful predictive model is to use the
scripts from
my book "Oracle
Tuning: The Definitive Reference".
These provide a valid time-series analysis since the single delta
values in the advisory are not sufficient. Using STATSPACK data for
consistent gets and physical reads, statistically significant trends can be
established.
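A sketch of such a time-series query, computing a per-snapshot BCHR from the
deltas between consecutive STATSPACK snaps (for brevity it omits the dbid and
instance_number join keys that a multi-instance repository would require):

SELECT TO_CHAR(sn.snap_time, 'YYYY-MM-DD HH24:MI') AS snap_time,
       ROUND((1 -
         (phys.value - LAG(phys.value) OVER (ORDER BY sn.snap_id)) /
         NULLIF((cons.value - LAG(cons.value) OVER (ORDER BY sn.snap_id))
              + (blk.value  - LAG(blk.value)  OVER (ORDER BY sn.snap_id)), 0)
       ) * 100, 2) AS bchr_pct
FROM   stats$snapshot sn, stats$sysstat phys,
       stats$sysstat cons, stats$sysstat blk
WHERE  phys.snap_id = sn.snap_id AND phys.name = 'physical reads'
AND    cons.snap_id = sn.snap_id AND cons.name = 'consistent gets'
AND    blk.snap_id  = sn.snap_id AND blk.name  = 'db block gets'
ORDER  BY sn.snap_id;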
See my related notes on v$db_cache_advice for more details.
The BCHR for monitoring
Oracle has the v$db_cache_advice utility and has
incorporated a buffer cache advisory into the standard AWR report, ostensibly to
provide recommendations about the projected reduction in expensive disk I/O with
the addition of more data buffers.
But the question remains about the value of this metric to
the DBA.
Once I've tuned and stabilized my systems, I notice that the metrics create
repeatable "signatures", patterns of usage that form the baselines for
exception alerts.
First, we establish an "exception threshold" for the BCHR (e.g. ±20%), and
compare the current deviation to the historical average, normalized by hour of
the day and day of the week.
When the alert fires, an e-mail notifies the DBA.
This alert does not have any corrective action; it merely tells the DBA that
there has been a significant change to the I/O patterns within the workload.
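A hypothetical sketch of the exception check (bchr_history is an assumed helper
table or view holding the per-snapshot BCHR values from the trend query above,
and the 20-point band is the illustrative threshold):

-- Flag snapshots whose BCHR deviates more than 20 percentage points
-- from the historical average for the same hour and day-of-week.
SELECT cur.snap_time, cur.bchr_pct, hist.avg_bchr
FROM   bchr_history cur,
       (SELECT TO_CHAR(snap_time, 'DY')   AS dow,
               TO_CHAR(snap_time, 'HH24') AS hr,
               AVG(bchr_pct)              AS avg_bchr
        FROM   bchr_history
        GROUP BY TO_CHAR(snap_time, 'DY'), TO_CHAR(snap_time, 'HH24')) hist
WHERE  TO_CHAR(cur.snap_time, 'DY')   = hist.dow
AND    TO_CHAR(cur.snap_time, 'HH24') = hist.hr
AND    ABS(cur.bchr_pct - hist.avg_bchr) > 20;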
Getting the optimal threshold value is an iterative
process. If I get a false positive, I simply increase the threshold, and
sometimes turn the alert off completely.
No two databases will have the same BCHR alert thresholds. For example, while
OLTP databases have predictable patterns of usage, a data warehouse is likely to
have fewer repeatable signatures, often making the BCHR a totally useless metric there.
My other notes on buffer hit ratios
Here are my related notes on the buffer cache hit ratio; please note that some
of this advice may not apply to later releases of Oracle.