|
|
Oracle Disk Access Speeds for Full Scan Operations
Oracle Database Tips by Donald BurlesonOctober 28, 2015
|
Prerequisite background reading:
A quick history of Oracle storage
Today we see a radical disconnect
between the traditional "transparent disk access" where you
could measure the "real" physical access speed and today's
storage arrays.
The most popular disk of the 1980's was the
refrigerator-sized 3380 disks, which contained only 1.2 gig of storage at the
astronomical cost of over $200,000.
In today's 2015 dollars, disk in the
1980's costs more than $4,000.00 per megabyte.
Today, you can buy 100
GB PC disks for $200, and 100 GB of RAM Disk (solid-state disk) for
$100,000.
In today's age of SAN, NAS and mass
storage devices, the "real" latency of a disk read/write is
hidden. For example, during a data block write, many disk
storage devices return a "false ack" (acknowledgement message)
to Oracle, when in reality, the data remains within an onboard
RAM cache and has not truly been written to the platters.
Introduction to Oracle full scan
I/O
To properly tune the Oracle I/O
subsystem we must carefully distinguish between Oracle's
perception of disk latency and the reality. The back-end
disk devices are "black boxes" and the only "real" information
we have about disk latency are the salient metrics collected by
Oracle. The Oracle metrics can easily be skewed by the
back-end devices but Oracle only knows the end-to-end latencies:
-
physical write waits, physical read waits
(physical reads/sec, physical reads direct, &c)
-
"db file scattered reads" waits
-
"db file sequential reads" waits
Traditionally, multiblock disk I/O
operations (e.g. table access full, index fast full scan, index
range scans) were considered faster because the access to
physically contiguous blocks required only a single seek delay,
the largest component of platter latency.
Once aligned under the proper cylinder, the read-write head
sits idle, sucking in many data blocks as the platter spins
beneath the read-write head. See my
notes on
Oracle db file sequential scattered read disk I/O speed tuning
for more details.
The Oracle optimizer is very
important because it is the component that decides between index
access vs. full-scan access, and there are several important
optimizer settings that influence this behavior.
It's all about the relative cost
of sequential vs. scattered read waits, and
analyzing system statistics provides the optimizer with
real-world empirical timings to help the CBO make better
decisions about the "best" access method. We also see
other important parameters such as the
db_file_multiblock_read_count and the
_optimizer_cost_model parms. To learn more, see my
notes on
global SQL optimization with parameters.
Read-ahead caching
The read ahead caching has many names on different operating
systems, and it started on IBM mainframes as "sequential
prefetch".
The concept of read-ahead caching is simple. Over 90% of I/O
latency is consumed in the read-write head movement, as the
heads are placed under the target cylinder. Once in-place, the
disk continues to spin and the read-write head can
simultaneously transmit back the original block request at the
same time as the next sequential block passes below the
read-write heads.
For scan operations (index range scans, index fast full scans,
and full-table scans), a read-ahead cache can be very useful for
speeding up these "scattered read" operations.
In sum, if your database is requesting adjacent data blocks, the
read-ahead cache may improve I/O throughput. However, using
RAID10 (SAME, stripe and mirror everywhere), like with ASM, the
blocks are not adjacent, and a read-ahead cache may not be as
useful as a database where the data blocks are laid-out
sequentially. Of course, the stripe width influences this
decision.
But not all disk configurations
are created equal. There are some Oracle
conditions that may adverse effect disk access speed for
full-scan operations (e.g. "db file scatter reads").
What factors
effect multi-block disk access speeds?
It generally accepted that
scattered read waits (multiblock reads from full-scans) will be
reported as higher latency than sequential read waits (single
block requests) because the multi-block I/O pulls-in many data
blocks as opposed to a single block. However, we must
remember that the first I/O that locates the read-write heads
under the proper cylinder can be 70%-80% of total access time,
and many databases can do multiblock I/O faster than
sequential I/O (See
Oracle
STATSPACK & AWR reports showing scattered read timings).
Why the big difference in DB file
wait times? Could this be due to internal
Oracle configurations (e.g. Automatic Storage Management (ASM)
of bitmap freelists (ASSM)? Here is a
write-up on Linux kernel I/O for large Oracle systems in
Linux.
In it, the author suggests that full-scan
access speed is aggravated by Oracle willy-nilly block placement
in Automated Storage Management (ASM) and using bitmap freelists
(Automated Segment Storage Management).
By doing a workload analysis of an Oracle database, we should
expect to see that any of these external and internal factors
will influence the speed of full-scan and range scan access:
- An unreasonable workload - As system
workload increases, it is possible to detect disk enqueues, especially where
the data blocks are not "randomized".
- Automatic Storage Management - ASM place data blocks in a
non-sequential fashion within the logical tablespace.
- Automatic segment Space Management - ASSM (bitmap freelists) have
been associated with slower multi-block access in large data warehouses (See
Aldridge link).
- The db_file_multiblock_read_count (MBRC) effects the number of
blocks accessed within a "db file scattered read" wait event.
- Blocksize - The OS block size and Oracle blocksize effect the bit
stream packet sizes and frequency during an I/O call.
- Platter disk vs. RAM disk - For example, solid-state disk has no spinning
platters and research shows no difference between single block and
multi-block access speeds (which can be up to 300x faster than antiquated
platter disk)
- RAID level - Oracle standard "Stripe and Mirror Everywhere" SAME
(a.k.a. RAID 10) allows you to define a stripe size, the number of physical
data blocks that are placed contiguously on disk. Other RAID levels
(RAID 5), may scramble the data blocks, negating the benefits of multi-block
reads, since logically contiguous bocks are not co-located on disk.
So, what can we assume? It would be interesting to review
some real-world databases and see if there are any correlations
between internal setup (ASM, ASSM), physical disk architecture,
and multi-block read performance.
Expert Observations on scattered
read I/O speed:
Steve Karam (OCP, OCM, Oracle ACE)
notes that the disk latency is primarily governed externally to
Oracle:
"In my experience, an I/O is an I/O?whether it's a single
block read or a multiblock read, it is one I/O and will take the same amount
of time.
I suppose there may be some credible cases where multiblock reads were
slower?perhaps having too small a segment size in the RAID volume or
something like that.
The problem is that if databases do show slower full-scan I/O latency, the
problem could be at so many levels?it could be the disk array, the RAID
type, the amount of disks in the volume, the HBAs, the multipathing
software?you just can't generalize something like that.
John Garmany, a West Point graduate in Electrical Engineering, notes the the
total number of blocks requested in a multi-block read will effect the overall
average disk latency:
"Another problem is that I can't really tell
the level at which you can state that multiblock access are slower.
For instance, if the choice is 50 scattered reads of a table
(a full-table scan) or 3 sequential reads of an index/table (index range
scan or index unique scan)
then of course the sequential I/O is going to be faster than the scattered."
A closer look at Oracle scattered
read waits
10gr2 Note: Starting in
Oracle 10g release2, Oracle recommends not setting the
db_file_multiblock_read_count parameter, allowing Oracle to
empirically determine the optimal setting. For more details,
see my notes on
10gR2 automatically tuned multi-block reads.
Lets examine the effect of
the "db_file_multiblock_read_count"
(MBRC) of the reported speeds of "db file scattered read".
At the DMCL level, many systems
implement a "sequential prefetch" read-ahead mechanism to
suck-in data blocks fast. The Oracle Documentation notes
this on db file scattered reads:
"The db file scattered Oracle metric event
signifies that the user process is reading buffers into the SGA buffer cache
and is waiting for a physical I/O call to return.
A db file scattered read issues a scatter-read to read the data into
multiple discontinuous memory locations. A scattered read is usually a
multiblock read. It can occur for a fast full scan (of an index) in addition
to a full table scan.
The db file scattered read wait event identifies that a full table scan is
occurring. When performing a full table scan into the buffer cache, the
blocks read are read into memory locations that are not physically adjacent
to each other.
Such reads are called scattered read calls, because the blocks are scattered
throughout memory.
This is why the corresponding wait event is called 'db file scattered read'.
Multiblock (up to DB_FILE_MULTIBLOCK_READ_COUNT blocks) reads due to full
table scans into the buffer cache show up as waits for 'db file scattered
read'."
My References:
Also see my related notes on Oracle scattered read
access speeds: