Tuning Disk I/O in Oracle
Oracle Tips by Burleson Consulting
Disk-I/O bottlenecks can be a major culprit in slowing
database response times unless you balance the load on your disks.
For related I/O tuning details, see
my book "Oracle
Tuning: The Definitive Reference".
With terabyte-sized and Web-enabled Oracle8 databases becoming more
commonplace, the task of disk-load balancing has never been more critical. These
databases are too massive to be cached in a buffer, yet they often serve
thousands of users who expect instant response times. To minimize response time,
you need to do all you can to shorten the most time-consuming aspect of a
transaction: disk I/O. And the most important thing you can do to minimize disk
I/O is to balance the load on your disks.
By placing data files strategically on the physical disks, you can minimize
the likelihood of any one disk becoming stalled while handling simultaneous I/O
requests. This article provides a strategy for collecting I/O information into
Oracle tables and generating reports, to deliver maximum guidance in the
load-balancing process for multiple-physical-disk systems. (Be sure to tune SQL
statements before modifying the disk layout. Also note that alternative hardware
configurations, such as disk striping, spread the load across multiple disks
automatically to improve performance.)
Identify Possible Bottlenecks
The first step in balancing the load on disks is to find out where they're
out of balance by identifying possible bottlenecks. Start by identifying hot
disks: those with a disproportionate amount of activity. For example, if one
disk in a ten-disk system were experiencing 25 percent of the I/O (measured as
the number of reads, writes, or both), you would consider it hot.
Once you've identified the hot disks, look closely to find out which files
and tables on the disks experience most of the activity, so that you can move
them to less-active disks as needed.
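As a quick illustration of the 25-percent rule of thumb above, the following sketch flags hot disks from a file of per-disk read and write counts. The input file, its format, and the flag_hot_disks helper are hypothetical, invented for illustration; they are not part of the article's scripts.

```shell
#!/bin/sh
# Hypothetical sample: per-disk I/O totals as "disk reads writes".
cat > /tmp/disk_io.txt <<EOF
hdisk1 100 100
hdisk2 10 10
hdisk3 10 10
EOF

# Flag any disk whose share of total I/O (reads + writes) exceeds a
# percentage threshold; print the disk and its rounded share.
flag_hot_disks() {
  awk -v pct="$2" '
    { io[$1] += $2 + $3; total += $2 + $3 }
    END {
      for (d in io)
        if (total > 0 && io[d] * 100 / total > pct)
          printf("%s %.0f\n", d, io[d] * 100 / total)
    }' "$1"
}

flag_hot_disks /tmp/disk_io.txt 25
```

Here hdisk1 carries 200 of the 240 total I/Os (about 83 percent), so it alone exceeds the 25-percent threshold and is flagged.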
The actual process of identifying hot files and disks involves running
data-collection utilities, such as Oracle's utlbstat and utlestat and the UNIX
iostat utility, and then using the collected I/O data to pinpoint the sources of
excessive I/O measurements.
In a UNIX environment, you would generally navigate the hierarchy shown in Figure
1 to identify the problem. Each physical disk contains many UNIX mount
points (directory locations where system administrators attach the disk
storage); each mount point contains many Oracle data files, and each data file
may contain many database tables. After using data collected by iostat to identify a
hot disk, you would use data collected by the Oracle utilities to identify which
mount point and file contain the table causing the excessive I/O activity.
(Note: Although the hierarchy in Figure
1 is the usual situation, it is also possible for a mount point to contain
multiple disks, rather than vice versa; in such cases, you would identify the
hot mount point before narrowing in on the hot disk.)
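When tracing a hot file back up the hierarchy, you need to know which mount point holds a given data file. A longest-prefix match against the mount-point list (as the first column of df output might show it) is enough. This is a hypothetical sketch; the mount points and file path below are made up.

```shell
#!/bin/sh
# Hypothetical mount-point list, as "df" might report it.
cat > /tmp/mounts.txt <<EOF
/
/u01
/u01/oradata
/u02
EOF

# Longest-prefix match: the deepest mount point that prefixes the file path.
find_mount() {
  awk -v f="$1" '
    index(f, $1) == 1 && length($1) > best { best = length($1); mp = $1 }
    END { print mp }' /tmp/mounts.txt
}

find_mount /u01/oradata/prod/system01.dbf
```

The deepest matching prefix wins, so the data file above resolves to /u01/oradata rather than /u01 or /.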
FIGURE 1: To identify the cause of excessive I/O
activity on a disk in a UNIX environment, you would typically navigate
the hierarchy of I/O structures shown here. You would trace the cause of
the activity to a particular mount point on the disk, then to a file on
the mount point, and even to a particular table within the file.
To get the most-accurate results, you should collect I/O statistics at
frequent intervals (preferably no more than ten minutes between samples) over
a representative time period, such as a week. Because individual application
processes have different I/O patterns, bottlenecks may appear on different disks
at various times during each day. And because Oracle transactions happen very
quickly, a disk may experience an I/O bottleneck for a very short period, but a
short-duration bottleneck may nonetheless repeat itself thousands of times each
day. If you make the mistake of summarizing I/O by the hour, as many DBAs do,
you won't see these bottlenecks because the I/O spikes will not be evident in
the hourly average, as shown in Figure 2.
FIGURE 2: The spikes in this I/O activity data for Disk
123 show the importance of measuring I/O at frequent intervals. They
represent short-duration bottlenecks that you might not notice if the
system had gathered data at less-frequent intervals, or if it had
simply been averaged over the hour, rather than examined in detail.
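The effect in Figure 2 is easy to reproduce. In the sketch below, twelve five-minute samples for one disk include a single 600-I/O spike; the hourly average comes out at an unremarkable 68 I/Os even though the peak sample is almost nine times higher. The numbers are invented for illustration.

```shell
#!/bin/sh
# Twelve five-minute I/O counts for one disk; the fourth sample is a spike.
samples="20 20 20 600 20 20 20 20 20 20 20 20"

# The peak five-minute sample ...
peak=$(echo $samples | tr ' ' '\n' | sort -n | tail -1)

# ... versus the hourly average of the same data.
avg=$(echo $samples | tr ' ' '\n' | awk '{ s += $1 } END { print int(s / NR) }')

echo "peak=$peak avg=$avg"
```

An hourly rollup would report only the 68, and the 600-I/O bottleneck would vanish from the trend data.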
A Table-Based Approach
Rather than running the off-the-shelf utilities, which generate a printed
report for a single time period, you can modify the utilities to collect the I/O
data over five-minute intervals and store the results in Oracle database tables
for easy access and report generation.
(Note: This example assumes a UNIX environment, but you can use the same
strategy in other environments, with appropriate modifications.)
An understanding of the table structures is helpful in reading through the
listings that go with this article.
Figure
3 shows the structure of the tables used in our example; the two key tables
are filestat and iostat.
FIGURE 3: Several summary tables work together to help
in the load-balancing process: After populating the vol_grp table with
information about mount points and their associated disks, you can
relate the disk-level data to the file-level data and identify which
files are causing excessive activity on a disk. You can easily create
summary tables such as sum_iostat and sum_filestat (which aggregate the
disk- and file-level data by mount point) from the vol_grp, filestat,
and iostat tables.
- The filestat table contains the I/O data collected by modified versions of
bstat and estat; it includes mount point, filename, and table name, as well
as date and specific I/O information. The I/O data includes the actual
number of physical reads and writes, the number of block reads and writes,
and the time required to perform them.
- The iostat table also includes read and write times corresponding to
specific dates, but at the disk level. It collects information from the
iostat utility, using the script get_iostat.ksh.
To provide a cross-reference between the
filestat and iostat tables, we added the vol_grp (volume/group) table, which
links mount points to physical disks. You need to populate this table
manually, based on how your disks are partitioned into mount points. The
design of the vol_grp, filestat, and iostat tables lets you aggregate, or
average, I/O data over time and summarize it by disk, mount point,
tablespace, or data file.
In the example, we've added two summary
tables: sum_filestat and sum_iostat.
- The sum_filestat table is created (in the estat_file.sql script) by
summing I/O data from the filestat table over all files and tables for each
mount point.
- The sum_iostat table is created (in the get_iostat.ksh script) by
aggregating the I/O data from the iostat table by mount point, rather than
by disk, using the disk-to-mount-point relationships from the vol_grp table.
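The sum_iostat rollup that the scripts perform inside the database can be sketched at the command line with join and awk. The two input files, their formats, and the mount-point names below are assumptions for illustration; the real scripts work against the vol_grp and iostat tables.

```shell
#!/bin/sh
# vol_grp analogue: disk -> mount point (populated manually).
cat > /tmp/vol_grp.txt <<EOF
hdisk1 /u01
hdisk2 /u01
hdisk3 /u02
EOF

# iostat analogue: per-disk samples as "disk reads writes".
cat > /tmp/iostat.txt <<EOF
hdisk1 40 10
hdisk2 60 30
hdisk3 5 5
EOF

# Join on the disk name, then sum reads and writes per mount point.
sort /tmp/vol_grp.txt > /tmp/vg.sorted
sort /tmp/iostat.txt  > /tmp/io.sorted
join /tmp/vg.sorted /tmp/io.sorted |
awk '{ reads[$2] += $3; writes[$2] += $4 }
     END { for (m in reads) printf("%s %d %d\n", m, reads[m], writes[m]) }' |
sort
```

The two disks behind /u01 collapse into a single row of 100 reads and 40 writes, which is exactly the disk-to-mount-point aggregation the vol_grp table makes possible.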
Collecting the Data
This data-collection approach is based on existing UNIX and Oracle utilities
(iostat and bstat/estat, respectively), but it does require slight modifications
to these utilities. The example scripts shown in Listings 1 through 4 will help
you use these utilities to collect data into the tables shown in Figure 3.
- The get_iostat.ksh script, in
Listing
1, is a UNIX shell script that collects disk-level I/O information at
five-minute intervals. It runs the iostat utility and captures the output into
the iostat table, using the data from the vol_grp table to create the
sum_iostat table as well. Once you've run this script, you have the data
required to identify your system's hot disks and mount points.
- The get_filestat.ksh script, in Listing
2, is a UNIX shell script that collects I/O information at the file and
table levels, as well as by mount point, also at five-minute intervals. It
calls two SQL snippets, bstat_file.sql and estat_file.sql (shown in Listings
3 and 4),
which are customizations of Oracle's utlbstat and utlestat utilities, modified
to store the I/O statistics in Oracle tables, rather than in report.txt
output. The UNIX script runs bstat_file.sql first, waits five minutes, and
then runs estat_file.sql, causing the system to insert the resulting data into
the filestat and sum_filestat tables shown in Figure 3. After running this
script, you have the data needed to track hot disks down to the file and table
level.
#!/bin/ksh
while true
do
   iostat -x 300 1 |\
   sed 1,2d |\
   awk '{ printf("%s %s %s\n", $1, $4, $5) }' |\
   while read HDISK VMSTAT_IO_R VMSTAT_IO_W
   do
      echo $HDISK
      echo $VMSTAT_IO_R
      echo $VMSTAT_IO_W
      sqlplus -s / <<!
      insert into perfstat.stats\$iostat
      values (SYSDATE, 300, '$HDISK', $VMSTAT_IO_R, $VMSTAT_IO_W);
      exit
!
   done
   sleep 300
done
Generally, you should run get_filestat.ksh concurrently with get_iostat.ksh,
so that the file- and disk-level data are collected during the same time period.
You can run both scripts as often as you like, and you can collect data over
long periods of time without adverse effects on database performance. File I/O
information is generally collected very quickly from the Oracle database's
system global area (SGA) memory. (The actual memory structures that contain the
file I/O data are called v$filestat and file$.) Disk I/O data collection is also
very fast, usually taking less than one second.
#!/bin/ksh
# First, we must set the environment . . .
ORACLE_SID=prodb1
ORACLE_HOME=`cat /var/opt/oracle/oratab|grep $ORACLE_SID|cut -f2 -d':'`
PATH=$ORACLE_HOME/bin:$PATH
MON=`echo ~oracle/iostat`
#----------------------------------------
# If it is not running, then start it . . .
#----------------------------------------
check_stat=`ps -ef|grep get_iostat|grep -v grep|wc -l`;
oracle_num=`expr $check_stat`
if [ $oracle_num -lt 1 ]
then
   nohup $MON/get_iostat_solaris.ksh > /dev/null 2>&1 &
fi
One drawback of this approach is that the data-collection tables will
eventually become very large. However, you can manage table size by deleting
low-I/O data-file entries. For example, you could delete inactive-file entries
with the following SQL (which is included in Listing
1):
delete from sys.iostat where phys_read < 10 and phys_write < 10;
Bear in mind that deleting these entries will skew long-term averages, since
the averages will be based only on higher-activity entries.
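The size of that skew is easy to see with made-up numbers: pruning the two low-activity rows in the sketch below nearly doubles the apparent average read count. The sample values are invented for illustration.

```shell
#!/bin/sh
# Hypothetical phys_read values for one data file across five samples.
reads="2 3 50 60 70"

# Average over all rows ...
all=$(echo $reads | tr ' ' '\n' | awk '{ s += $1 } END { print int(s / NR) }')

# ... versus the average after deleting rows with fewer than 10 reads,
# as the SQL delete above would do.
pruned=$(echo $reads | tr ' ' '\n' | awk '$1 >= 10 { s += $1; n++ } END { print int(s / n) }')

echo "all=$all pruned=$pruned"
```

The full data set averages 37 reads per sample, but once the quiet intervals are deleted the same file appears to average 60, so long-term trend reports built on pruned data will overstate activity.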
Although the information collected by iostat and filestat is somewhat
redundant, the two sets of results work well together. When the iostat results
identify a hot mount point, you can turn to the filestat results to look at the
activity for each data file residing on the mount point. The filestat results
also provide more-in-depth information, including the overall time required to
perform the reads and writes. From elapsed-time information, you can quickly
identify the files that are waiting on disk I/O and see the actual number of
physical reads and writes.
If the I/O load on your disks isn't well balanced, you probably have
some disk I/O bottlenecks slowing your database response times.
The scripts below generate a variety of activity and alert reports based
on the I/O statistics, so that you can move the culprit bottleneck files
to less-active disk devices. Download each SQL script, then run it at
the SQL*Plus command prompt.
- Daily activity: This trend report aggregates I/O activity for each
  UNIX mount point in five-minute intervals. It extracts and displays
  the overall activity for each mount point. You can present the data
  in a spreadsheet chart of the daily activity for each mount point.
- Top mount point: This alert report identifies the mount point
  that experiences the most I/O over the course of the day.
- Top mount point each hour: This alert report aggregates I/O by
  mount point per hour and displays the top mount point for each hour
  during the day.
- Hot data files for each mount point: This alert report
  identifies the most-active data files on each mount point and
  compares their activity with the average activity for all files on
  that mount point. You can split those files that cause too much of
  the total I/O into many smaller data files and distribute them
  across the disks, or you can move them to less-active disks.
Generating I/O Reports
Using the I/O information from the tables, you can generate trend and alert
reports. Both types of reports are easy to generate using SQL, and the
combination of the reports will help you identify current bottlenecks as well as
spot potential future ones.
The trend reports provide a picture of the overall I/O situation, which helps
identify potential problems. If you continue collecting I/O data, you can
generate reports at regular intervals and distribute them to management as
spreadsheets. For example, Listing
5 provides sample SQL code for generating a report of hot data files, with
the output of the report shown in Figure
4.
For each five-minute interval, this report displays the name of any Oracle
database file with an I/O value (defined in this case by number of reads) that
is more than 50 percent of the total I/O during that interval. The actual
percentage is shown in the right-hand column.
select
to_char(sn.END_INTERVAL_TIME,'YYYY-MM-DD HH24:MI:SS') "End snapshot time",
sum(after.PHYRDS+after.PHYWRTS-before.PHYWRTS-before.PHYRDS) "number of IOs",
trunc(10*sum(after.READTIM+after.WRITETIM-before.WRITETIM-before.READTIM)/
sum(1+after.PHYRDS+after.PHYWRTS-before.PHYWRTS-before.PHYRDS)) "ave IO time (ms)",
trunc((select value from v$parameter where name='db_block_size')*
sum(after.PHYBLKRD+after.PHYBLKWRT-before.PHYBLKRD-before.PHYBLKWRT)/
sum(1+after.PHYRDS+after.PHYWRTS-before.PHYWRTS-before.PHYRDS)) "ave IO size (bytes)"
from DBA_HIST_FILESTATXS before, DBA_HIST_FILESTATXS after, DBA_HIST_SNAPSHOT sn
where
after.file#=before.file# and
after.snap_id=before.snap_id+1 and
before.instance_number=after.instance_number and
after.snap_id=sn.snap_id and
after.instance_number=sn.instance_number
group by to_char(sn.END_INTERVAL_TIME,'YYYY-MM-DD HH24:MI:SS')
order by to_char(sn.END_INTERVAL_TIME,'YYYY-MM-DD HH24:MI:SS');
End snapshot time number of IOs ave IO time (ms) ave IO size (bytes)
------------------- ------------- ---------------- -------------------
2015-06-15 15:00:59 359254 20 711636
2015-06-15 16:00:59 805884 16 793033
2015-06-15 17:00:13 516576 13 472478
2015-06-15 18:00:27 471098 6 123565
2015-06-15 19:00:41 201820 9 294858
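One detail worth noting in the query above: READTIM and WRITETIM in DBA_HIST_FILESTATXS are recorded in centiseconds, so the trunc(10 * ...) expression converts the per-I/O time to milliseconds, and the 1+ in the divisor guards against division by zero. The snapshot deltas below are invented simply to check the arithmetic.

```shell
#!/bin/sh
# Hypothetical snapshot deltas: 500 I/Os that took 1000 centiseconds in total.
ios=500
time_cs=1000

# trunc(10 * time / (1 + ios)), as in the "ave IO time (ms)" column.
ms=$(awk -v t="$time_cs" -v n="$ios" 'BEGIN { print int(10 * t / (1 + n)) }')
echo "ave IO time: $ms ms"
```

A thousand centiseconds spread over 501 I/Os works out to roughly 19 ms per I/O, matching the scale of the values in the sample output.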
The alert reports, on the other hand, are intended to identify current
bottleneck possibilities: that is, specific data files experiencing a
disproportionate amount of I/O (for example, those experiencing 20 percent more
activity than the average for the mount point).
Here is an example of a disk I/O performance report over time.
Often, companies use automated procedures to generate the alert reports and
e-mail them to the DBA staff, so that they can move these files to less-active
mount points as soon as possible.
Some specific scripts for generating these reports are available at Oracle
Publishing Online and include reports for daily activity, top mount point, top
mount point each hour, and hot data files for each mount point. (See "Sample
Reports Online.")
A Delicate Balance
Having a wealth of I/O data will take you far in the process of load
balancing, but you also need ingenuity and patience. An application's
disk-access patterns can vary greatly according to daily or weekly processing
needs, so the optimal file placement may not always be obvious. (For example,
hdisk32 might be very busy during evening batch processing but largely idle
during daytime processing.) And it's possible that relocating a data file may
relieve I/O contention for one process, only to cause contention for an
unrelated process.
It may take several iterations of moving files (a nontrivial action, since
it must be done while the database is stopped) to find the most workable
arrangement. Generally, however, the process of load balancing is well worth the
time it takes. Once you have achieved a fairly balanced load, you won't need to
move the files unless new processes change the I/O pattern for the disks.
Nonetheless, isolating bottlenecks can be time-consuming and elusive. With
the scripts and tables detailed in this article, you can quickly set up a
procedure to provide the maximum information you can use for balancing your I/O
load and minimizing disk I/O, which is key to keeping response times low. There
are cases where although all indications seem to point to a file as a bottleneck
that should be moved, the file is one that should stay put (see "Look
Before You Leap").
Strategies such as load balancing can go a long way toward improving the
speed of your applications and keeping users happy across your network.
A file with a high I/O rate is not always a bottleneck. Strictly
speaking, a disk bottleneck is defined as two or more processes
competing to read from or write to the same disk device. However, it can
be very difficult to discover which processes are responsible for high
I/O on a disk. (Oracle Enterprise Manager can help: look at Top
Sessions ordered by physical reads to find out which sessions are
currently doing lots of I/O.)
In addition, you can't always move files with high I/O rates. For
example, redo log files will always experience high I/O and you
shouldn't move them as part of load balancing (although you can stripe
your redo logs across separate disks if archiving, so that the archiver
reads from a different disk than the active log). Remember, too, that for
databases placed on a small number of disks, some amount of I/O
contention may be unavoidable. Consider a 30GB database spread across
two disks with 20 processes competing for data. On average, 10 processes
would be queued waiting for I/O from each of the two disks. Clearly,
these systems will always experience I/O contention.
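The arithmetic above generalizes: with p processes spread across d disks, roughly p/d processes queue per disk on average. A one-line check with the article's numbers:

```shell
#!/bin/sh
# 20 processes competing for data on 2 disks -> about 10 queued per disk.
procs=20
disks=2
queued=$(expr $procs / $disks)
echo "about $queued processes queued per disk"
```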
FIGURE 4: The script shown in Listing 5 generated a
sample of a hot-file report. The files shown here have I/O values (measured as
number of reads) that are higher than 50 percent of the total I/O operations for
the mount point during the elapsed period (the actual percentage is shown in
the right-hand column).
Date          MountPt.   MpReads   FileName             FileReads   Pct
09/22 13:41   file1            5   tablbtabi.dbf                3    60
              file10           4   tablespace13d.dbf            4   100
              file4          169   tablespace12d.dbf          168    99
              file5            1   tablbtabd.dbf                1   100
              file7            2   tablsourced.dbf              2   100
              file8           57   tablloadd.dbf               53    93
              file9            2   tablprotd.dbf                2   100
09/22 13:46   file1           26   sat_system.dbf              18    69
              file10           2   tablespace13d.dbf            2   100
              file2            2   tablespace8i.dbf             2   100
              file3            3   tablpooli.dbf                3   100
              file4           11   tablstabd.dbf               11   100
              file5            3   tablddicd.dbf                3   100
              file7            3   tablsourced.dbf              3   100
              file9            4   tablpoold.dbf                4   100
09/22 13:51   file1            5   tablespace12i.dbf            3    60
              file10           3   tablespace14d.dbf            2    67
              file4            1   tablstabd.dbf                1   100
              file8            7   tablloadd.dbf                5    71
              file9            4   tabldocud.dbf                4   100