Oracle Disk Access Speeds for Full Scan Operations

Oracle Database Tips by Donald BurlesonOctober 28, 2015

Prerequisite background reading:

A quick history of Oracle storage

Today we see a radical disconnect between the traditional "transparent disk access" where you could measure the "real" physical access speed and today's storage arrays.

The most popular disk of the 1980's was the refrigerator-sized 3380 disks, which contained only 1.2 gig of storage at the astronomical cost of over $200,000.

[shrinking+gigabytes.jpg]

In today's 2015 dollars, disk in the 1980's costs more than $4,000.00 per megabyte. Today, you can buy 100 GB PC disks for $200, and 100 GB of RAM Disk (solid-state disk) for $100,000.

In today's age of SAN, NAS and mass storage devices, the "real" latency of a disk read/write is hidden. For example, during a data block write, many disk storage devices return a "false ack" (acknowledgement message) to Oracle, when in reality, the data remains within an onboard RAM cache and has not truly been written to the platters.

Introduction to Oracle full scan I/O

To properly tune the Oracle I/O subsystem we must carefully distinguish between Oracle's perception of disk latency and the reality. The back-end disk devices are "black boxes" and the only "real" information we have about disk latency are the salient metrics collected by Oracle. The Oracle metrics can easily be skewed by the back-end devices but Oracle only knows the end-to-end latencies:

physical write waits, physical read waits (physical reads/sec, physical reads direct, &c)
"db file scattered reads" waits
"db file sequential reads" waits

Traditionally, multiblock disk I/O operations (e.g. table access full, index fast full scan, index range scans) were considered faster because the access to physically contiguous blocks required only a single seek delay, the largest component of platter latency.

Once aligned under the proper cylinder, the read-write head sits idle, sucking in many data blocks as the platter spins beneath the read-write head. See my notes on Oracle db file sequential scattered read disk I/O speed tuning for more details.

The Oracle optimizer is very important because it is the component that decides between index access vs. full-scan access, and there are several important optimizer settings that influence this behavior.

It's all about the relative cost of sequential vs. scattered read waits, and analyzing system statistics provides the optimizer with real-world empirical timings to help the CBO make better decisions about the "best" access method. We also see other important parameters such as the db_file_multiblock_read_count and the _optimizer_cost_model parms. To learn more, see my notes on global SQL optimization with parameters.

Read-ahead caching

The read ahead caching has many names on different operating systems, and it started on IBM mainframes as "sequential prefetch".

The concept of read-ahead caching is simple. Over 90% of I/O latency is consumed in the read-write head movement, as the heads are placed under the target cylinder. Once in-place, the disk continues to spin and the read-write head can simultaneously transmit back the original block request at the same time as the next sequential block passes below the read-write heads.

For scan operations (index range scans, index fast full scans, and full-table scans), a read-ahead cache can be very useful for speeding up these "scattered read" operations.

In sum, if your database is requesting adjacent data blocks, the read-ahead cache may improve I/O throughput. However, using RAID10 (SAME, stripe and mirror everywhere), like with ASM, the blocks are not adjacent, and a read-ahead cache may not be as useful as a database where the data blocks are laid-out sequentially. Of course, the stripe width influences this decision.

But not all disk configurations are created equal. There are some Oracle conditions that may adverse effect disk access speed for full-scan operations (e.g. "db file scatter reads").

Observations of disk speed and the optimizer
What factors effect multi-block disk access speeds?
Samples from the real world
Oracle STATSPACK & AWR reports showing scattered read timings

What factors effect multi-block disk access speeds?

It generally accepted that scattered read waits (multiblock reads from full-scans) will be reported as higher latency than sequential read waits (single block requests) because the multi-block I/O pulls-in many data blocks as opposed to a single block. However, we must remember that the first I/O that locates the read-write heads under the proper cylinder can be 70%-80% of total access time, and many databases can do multiblock I/O faster than sequential I/O (See Oracle STATSPACK & AWR reports showing scattered read timings).

Why the big difference in DB file wait times? Could this be due to internal Oracle configurations (e.g. Automatic Storage Management (ASM) of bitmap freelists (ASSM)? Here is a write-up on Linux kernel I/O for large Oracle systems in Linux.

In it, the author suggests that full-scan access speed is aggravated by Oracle willy-nilly block placement in Automated Storage Management (ASM) and using bitmap freelists (Automated Segment Storage Management).

By doing a workload analysis of an Oracle database, we should expect to see that any of these external and internal factors will influence the speed of full-scan and range scan access:

- An unreasonable workload - As system workload increases, it is possible to detect disk enqueues, especially where the data blocks are not "randomized".

- Automatic Storage Management - ASM place data blocks in a non-sequential fashion within the logical tablespace.

- Automatic segment Space Management - ASSM (bitmap freelists) have been associated with slower multi-block access in large data warehouses (See Aldridge link).

- The db_file_multiblock_read_count (MBRC) effects the number of blocks accessed within a "db file scattered read" wait event.

- Blocksize - The OS block size and Oracle blocksize effect the bit stream packet sizes and frequency during an I/O call.

- Platter disk vs. RAM disk - For example, solid-state disk has no spinning platters and research shows no difference between single block and multi-block access speeds (which can be up to 300x faster than antiquated platter disk)

- RAID level - Oracle standard "Stripe and Mirror Everywhere" SAME (a.k.a. RAID 10) allows you to define a stripe size, the number of physical data blocks that are placed contiguously on disk. Other RAID levels (RAID 5), may scramble the data blocks, negating the benefits of multi-block reads, since logically contiguous bocks are not co-located on disk.

So, what can we assume? It would be interesting to review some real-world databases and see if there are any correlations between internal setup (ASM, ASSM), physical disk architecture, and multi-block read performance.

Expert Observations on scattered read I/O speed:

Steve Karam (OCP, OCM, Oracle ACE) notes that the disk latency is primarily governed externally to Oracle:

"In my experience, an I/O is an I/O?whether it's a single block read or a multiblock read, it is one I/O and will take the same amount of time.

I suppose there may be some credible cases where multiblock reads were slower?perhaps having too small a segment size in the RAID volume or something like that.

The problem is that if databases do show slower full-scan I/O latency, the problem could be at so many levels?it could be the disk array, the RAID type, the amount of disks in the volume, the HBAs, the multipathing software?you just can't generalize something like that.

John Garmany, a West Point graduate in Electrical Engineering, notes the the total number of blocks requested in a multi-block read will effect the overall average disk latency:

"Another problem is that I can't really tell the level at which you can state that multiblock access are slower.

For instance, if the choice is 50 scattered reads of a table (a full-table scan) or 3 sequential reads of an index/table (index range scan or index unique scan) then of course the sequential I/O is going to be faster than the scattered."

A closer look at Oracle scattered read waits

10gr2 Note: Starting in Oracle 10g release2, Oracle recommends not setting the db_file_multiblock_read_count parameter, allowing Oracle to empirically determine the optimal setting. For more details, see my notes on 10gR2 automatically tuned multi-block reads.

Lets examine the effect of the "db_file_multiblock_read_count" (MBRC) of the reported speeds of "db file scattered read". At the DMCL level, many systems implement a "sequential prefetch" read-ahead mechanism to suck-in data blocks fast. The Oracle Documentation notes this on db file scattered reads:

"The db file scattered Oracle metric event signifies that the user process is reading buffers into the SGA buffer cache and is waiting for a physical I/O call to return.

A db file scattered read issues a scatter-read to read the data into multiple discontinuous memory locations. A scattered read is usually a multiblock read. It can occur for a fast full scan (of an index) in addition to a full table scan.

The db file scattered read wait event identifies that a full table scan is occurring. When performing a full table scan into the buffer cache, the blocks read are read into memory locations that are not physically adjacent to each other.

Such reads are called scattered read calls, because the blocks are scattered throughout memory.

This is why the corresponding wait event is called 'db file scattered read'. Multiblock (up to DB_FILE_MULTIBLOCK_READ_COUNT blocks) reads due to full table scans into the buffer cache show up as waits for 'db file scattered read'."