Tuning Disk I/O in Oracle
Oracle Tips by Burleson Consulting
Disk-I/O bottlenecks can be a major culprit in slowing
database response times unless you balance the load on your disks.
For related I/O tuning details, see
my book "Oracle
Tuning: The Definitive Reference".
With terabyte-sized and Web-enabled Oracle8 databases becoming more
commonplace, the task of disk-load balancing has never been more critical. These
databases are too massive to be cached in a buffer, yet they often serve
thousands of users who expect instant response times. To minimize response time,
you need to do all you can to shorten the most time-consuming aspect of a
transaction: disk I/O. And the most important thing you can do to minimize disk
I/O is to balance the load on your disks.
By placing data files strategically on the physical disks, you can minimize
the likelihood of any one disk becoming stalled while handling simultaneous I/O
requests. This article provides a strategy for collecting I/O information into
Oracle tables and generating reports, to deliver maximum guidance in the
load-balancing process for multiple-physical-disk systems. (Be sure to tune SQL
statements before modifying the disk layout. Also note that alternative hardware
configurations, such as disk striping, spread the load across multiple disks
automatically to improve performance.)
Identify Possible Bottlenecks
The first step in balancing the load on disks is to find out where they're
out of balance by identifying possible bottlenecks. Start by identifying hot
disks: those with a disproportionate amount of activity. For example, if one
disk in a ten-disk system were experiencing 25 percent of the I/O (measured as
the number of reads, writes, or both), you would consider it hot.
Once you've identified the hot disks, look closely to find out which files
and tables on the disks experience most of the activity, so that you can move
them to less-active disks as needed.
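As a quick illustration of the 25-percent rule of thumb above, the following sketch flags hot disks from a file of per-disk read and write counts. The input file, its format, and the flag_hot_disks helper are hypothetical, invented for illustration; they are not part of the article's scripts.

```shell
#!/bin/sh
# Hypothetical sample: per-disk I/O totals as "disk reads writes".
cat > /tmp/disk_io.txt <<EOF
hdisk1 100 100
hdisk2 10 10
hdisk3 10 10
EOF

# Flag any disk whose share of total I/O (reads + writes) exceeds a
# percentage threshold; print the disk and its rounded share.
flag_hot_disks() {
  awk -v pct="$2" '
    { io[$1] += $2 + $3; total += $2 + $3 }
    END {
      for (d in io)
        if (total > 0 && io[d] * 100 / total > pct)
          printf("%s %.0f\n", d, io[d] * 100 / total)
    }' "$1"
}

flag_hot_disks /tmp/disk_io.txt 25
```

Here hdisk1 carries 200 of the 240 total I/Os (about 83 percent), so it alone exceeds the 25-percent threshold and is flagged.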
The actual process of identifying hot files and disks involves running
data-collection utilities, such as Oracle's utlbstat and utlestat and the UNIX
iostat utility, and then using the collected I/O data to pinpoint the sources of
excessive I/O measurements.
In a UNIX environment, you would generally navigate the hierarchy shown in Figure
1 to identify the problem. Each physical disk contains many UNIX mount
points (directory locations where system administrators attach the disk
storage); each mount point contains many Oracle data files, and each data file
may contain many database tables. After using data collected by iostat to identify a
hot disk, you would use data collected by the Oracle utilities to identify which
mount point and file contain the table causing the excessive I/O activity.
(Note: Although the hierarchy in Figure
1 is the usual situation, it is also possible for a mount point to contain
multiple disks, rather than vice versa; in such cases, you would identify the
hot mount point before narrowing in on the hot disk.)
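When tracing a hot file back up the hierarchy, you need to know which mount point holds a given data file. A longest-prefix match against the mount-point list (as the first column of df output might show it) is enough. This is a hypothetical sketch; the mount points and file path below are made up.

```shell
#!/bin/sh
# Hypothetical mount-point list, as "df" might report it.
cat > /tmp/mounts.txt <<EOF
/
/u01
/u01/oradata
/u02
EOF

# Longest-prefix match: the deepest mount point that prefixes the file path.
find_mount() {
  awk -v f="$1" '
    index(f, $1) == 1 && length($1) > best { best = length($1); mp = $1 }
    END { print mp }' /tmp/mounts.txt
}

find_mount /u01/oradata/prod/system01.dbf
```

The deepest matching prefix wins, so the data file above resolves to /u01/oradata rather than /u01 or /.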
FIGURE 1: To identify the cause of excessive I/O
activity on a disk in a UNIX environment, you would typically navigate
the hierarchy of I/O structures shown here. You would trace the cause of
the activity to a particular mount point on the disk, then to a file on
the mount point, and even to a particular table within the file.
To get the most-accurate results, you should collect I/O statistics at
frequent intervals (preferably no more than ten minutes between samples) over
a representative time period, such as a week. Because individual application
processes have different I/O patterns, bottlenecks may appear on different disks
at various times during each day. And because Oracle transactions happen very
quickly, a disk may experience an I/O bottleneck for a very short period, but a
short-duration bottleneck may nonetheless repeat itself thousands of times each
day. If you make the mistake of summarizing I/O by the hour, as many DBAs do,
you won't see these bottlenecks because the I/O spikes will not be evident in
the hourly average, as shown in Figure 2.
FIGURE 2: The spikes in this I/O activity data for Disk
123 show the importance of measuring I/O at frequent intervals. They
represent short-duration bottlenecks that you might not notice if the
system had gathered data at less-frequent intervals, or if it had
simply been averaged over the hour, rather than examined in detail.
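The effect in Figure 2 is easy to reproduce. In the sketch below, twelve five-minute samples for one disk include a single 600-I/O spike; the hourly average comes out at an unremarkable 68 I/Os even though the peak sample is almost nine times higher. The numbers are invented for illustration.

```shell
#!/bin/sh
# Twelve five-minute I/O counts for one disk; the fourth sample is a spike.
samples="20 20 20 600 20 20 20 20 20 20 20 20"

# The peak five-minute sample ...
peak=$(echo $samples | tr ' ' '\n' | sort -n | tail -1)

# ... versus the hourly average of the same data.
avg=$(echo $samples | tr ' ' '\n' | awk '{ s += $1 } END { print int(s / NR) }')

echo "peak=$peak avg=$avg"
```

An hourly rollup would report only the 68, and the 600-I/O bottleneck would vanish from the trend data.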
A Table-Based Approach
Rather than running the off-the-shelf utilities, which generate a printed
report for a single time period, you can modify the utilities to collect the I/O
data over five-minute intervals and store the results in Oracle database tables
for easy access and report generation.
(Note: This example assumes a UNIX environment, but you can use the same
strategy in other environments, with appropriate modifications.)
An understanding of the table structures is helpful in reading through the
listings that go with this article.
Figure
3 shows the structure of the tables used in our example; the two key tables
are filestat and iostat.
FIGURE 3: Several summary tables work together to help
in the load-balancing process: After populating the vol_grp table with
information about mount points and their associated disks, you can
relate the disk-level data to the file-level data and identify which
files are causing excessive activity on a disk. You can easily create
summary tables such as sum_iostat and sum_filestat (which aggregate the
disk- and file-level data by mount point) from the vol_grp, filestat,
and iostat tables.
- The filestat table contains the I/O data collected by modified versions of
bstat and estat; it includes mount point, filename, and table name, as well
as date and specific I/O information. The I/O data includes the actual
number of physical reads and writes, the number of block reads and writes,
and the time required to perform them.
- The iostat table also includes read and write times corresponding to
specific dates, but at the disk level. It collects information from the
iostat utility, using the script get_iostat.ksh.
To provide a cross-reference between the
filestat and iostat tables, we added the vol_grp (volume/group) table, which
links mount points to physical disks. You need to populate this table
manually, based on how your disks are partitioned into mount points. The
design of the vol_grp, filestat, and iostat tables lets you aggregate, or
average, I/O data over time and summarize it by disk, mount point,
tablespace, or data file.
In the example, we've added two summary
tables: sum_filestat and sum_iostat.
- The sum_filestat table is created (in the estat_file.sql script) by
summing I/O data from the filestat table over all files and tables for each
mount point.
- The sum_iostat table is created (in the get_iostat.ksh script) by
aggregating the I/O data from the iostat table by mount point, rather than
by disk, using the disk-to-mount-point relationships from the vol_grp table.
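The sum_iostat rollup that the scripts perform inside the database can be sketched at the command line with join and awk. The two input files, their formats, and the mount-point names below are assumptions for illustration; the real scripts work against the vol_grp and iostat tables.

```shell
#!/bin/sh
# vol_grp analogue: disk -> mount point (populated manually).
cat > /tmp/vol_grp.txt <<EOF
hdisk1 /u01
hdisk2 /u01
hdisk3 /u02
EOF

# iostat analogue: per-disk samples as "disk reads writes".
cat > /tmp/iostat.txt <<EOF
hdisk1 40 10
hdisk2 60 30
hdisk3 5 5
EOF

# Join on the disk name, then sum reads and writes per mount point.
sort /tmp/vol_grp.txt > /tmp/vg.sorted
sort /tmp/iostat.txt  > /tmp/io.sorted
join /tmp/vg.sorted /tmp/io.sorted |
awk '{ reads[$2] += $3; writes[$2] += $4 }
     END { for (m in reads) printf("%s %d %d\n", m, reads[m], writes[m]) }' |
sort
```

The two disks behind /u01 collapse into a single row of 100 reads and 40 writes, which is exactly the disk-to-mount-point aggregation the vol_grp table makes possible.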
Collecting the Data
This data-collection approach is based on existing UNIX and Oracle utilities
(iostat and bstat/estat, respectively), but it does require slight modifications
to these utilities. The example scripts shown in Listings 1 through 4 will help
you use these utilities to collect data into the tables shown in Figure 3.
- The get_iostat.ksh script, in
Listing
1, is a UNIX shell script that collects disk-level I/O information at
five-minute intervals. It runs the iostat utility and captures the output into
the iostat table, using the data from the vol_grp table to create the
sum_iostat table as well. Once you've run this script, you have the data
required to identify your system's hot disks and mount points.
- The get_filestat.ksh script, in Listing
2, is a UNIX shell script that collects I/O information at the file and
table levels, as well as by mount point, also at five-minute intervals. It
calls two SQL snippets, bstat_file.sql and estat_file.sql (shown in Listings
3 and 4),
which are customizations of Oracle's utlbstat and utlestat utilities, modified
to store the I/O statistics in Oracle tables, rather than in report.txt
output. The UNIX script runs bstat_file.sql first, waits five minutes, and
then runs estat_file.sql, causing the system to insert the resulting data into
the filestat and sum_filestat tables shown in Figure 3. After running this
script, you have the data needed to track hot disks down to the file and table
level.
#!/bin/ksh
while true
do
   iostat -x 300 1 |\
   sed 1,2d |\
   awk '{ printf("%s %s %s\n", $1, $4, $5) }' |\
   while read HDISK VMSTAT_IO_R VMSTAT_IO_W
   do
      echo $HDISK
      echo $VMSTAT_IO_R
      echo $VMSTAT_IO_W
      sqlplus -s / <<!
      insert into perfstat.stats\$iostat
      values (SYSDATE, 300, '$HDISK', $VMSTAT_IO_R, $VMSTAT_IO_W);
      exit
!
   done
   sleep 300
done
Generally, you should run get_filestat.ksh concurrently with get_iostat.ksh,
so that the file- and disk-level data are collected during the same time period.
You can run both scripts as often as you like, and you can collect data over
long periods of time without adverse effects on database performance. File I/O
information is generally collected very quickly from the Oracle database's
system global area (SGA) memory. (The actual memory structures that contain the
file I/O data are called v$filestat and file$.) Disk I/O data collection is also
very fast, usually taking less than one second.
#!/bin/ksh
# First, we must set the environment . . .
ORACLE_SID=prodb1
ORACLE_HOME=`cat /var/opt/oracle/oratab|grep $ORACLE_SID|cut -f2 -d':'`
PATH=$ORACLE_HOME/bin:$PATH
MON=`echo ~oracle/iostat`
#----------------------------------------
# If it is not running, then start it . . .
#----------------------------------------
check_stat=`ps -ef|grep get_iostat|grep -v grep|wc -l`;
oracle_num=`expr $check_stat`
if [ $oracle_num -lt 1 ]
then
   nohup $MON/get_iostat_solaris.ksh > /dev/null 2>&1 &
fi
One drawback of this approach is that the data-collection tables will
eventually become very large. However, you can manage table size by deleting
low-I/O data-file entries. For example, you could delete inactive-file entries
with the following SQL (which is included in Listing
1):
delete from sys.iostat where phys_read < 10 and phys_write < 10;
Bear in mind that deleting these entries will skew long-term averages, since
the averages will be based only on higher-activity entries.
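The size of that skew is easy to see with made-up numbers: pruning the two low-activity rows in the sketch below nearly doubles the apparent average read count. The sample values are invented for illustration.

```shell
#!/bin/sh
# Hypothetical phys_read values for one data file across five samples.
reads="2 3 50 60 70"

# Average over all rows ...
all=$(echo $reads | tr ' ' '\n' | awk '{ s += $1 } END { print int(s / NR) }')

# ... versus the average after deleting rows with fewer than 10 reads,
# as the SQL delete above would do.
pruned=$(echo $reads | tr ' ' '\n' | awk '$1 >= 10 { s += $1; n++ } END { print int(s / n) }')

echo "all=$all pruned=$pruned"
```

The full data set averages 37 reads per sample, but once the quiet intervals are deleted the same file appears to average 60, so long-term trend reports built on pruned data will overstate activity.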
Although the information collected by iostat and filestat is somewhat
redundant, the two sets of results work well together. When the iostat results
identify a hot mount point, you can turn to the filestat results to look at the
activity for each data file residing on the mount point. The filestat results
also provide more-in-depth information, including the overall time required to
perform the reads and writes. From elapsed-time information, you can quickly
identify the files that are waiting on disk I/O and see the actual number of
physical reads and writes.
If the I/O load on your disks isn't well balanced, you probably have
some disk I/O bottlenecks slowing your database response times.
The scripts below generate a variety of activity and alert reports based
on the I/O statistics, so that you can move the culprit bottleneck files
to less-active disk devices. Download each SQL script, then run it at
the SQL*Plus command prompt.
- Daily activity: This trend report aggregates I/O activity for each
  UNIX mount point in five-minute intervals. It extracts and displays
  the overall activity for each mount point. You can present the data
  in a spreadsheet chart of the daily activity for each mount point.
- Top mount point: This alert report identifies the mount point
  that experiences the most I/O over the course of the day.
- Top mount point each hour: This alert report aggregates I/O by
  mount point per hour and displays the top mount point for each hour
  during the day.
- Hot data files for each mount point: This alert report
  identifies the most-active data files on each mount point and
  compares their activity with the average activity for all files on
  that mount point. You can split those files that cause too much of
  the total I/O into many smaller data files and distribute them
  across the disks, or you can move them to less-active disks.
Generating I/O Reports
Using the I/O information from the tables, you can generate trend and alert
reports. Both types of reports are easy to generate using SQL, and the
combination of the reports will help you identify current bottlenecks as well as
spot potential future ones.
The trend reports provide a picture of the overall I/O situation, which helps
identify potential problems. If you continue collecting I/O data, you can
generate reports at regular intervals and distribute them to management as
spreadsheets. For example, Listing
5 provides sample SQL code for generating a report of hot data files, with
the output of the report shown in Figure
4.
For each five-minute interval, this report displays the name of any Oracle
database file with an I/O value (defined in this case by number of reads) that
is more than 50 percent of the total I/O during that interval. The actual
percentage is shown in the right-hand column.
select
to_char(sn.END_INTERVAL_TIME,'YYYY-MM-DD HH24:MI:SS') "End snapshot time",
sum(after.PHYRDS+after.PHYWRTS-before.PHYWRTS-before.PHYRDS) "number of IOs",
trunc(10*sum(after.READTIM+after.WRITETIM-before.WRITETIM-before.READTIM)/
sum(1+after.PHYRDS+after.PHYWRTS-before.PHYWRTS-before.PHYRDS)) "ave IO time (ms)",
trunc((select value from v$parameter where name='db_block_size')*
sum(after.PHYBLKRD+after.PHYBLKWRT-before.PHYBLKRD-before.PHYBLKWRT)/
sum(1+after.PHYRDS+after.PHYWRTS-before.PHYWRTS-before.PHYRDS)) "ave IO size (bytes)"
from DBA_HIST_FILESTATXS before, DBA_HIST_FILESTATXS after, DBA_HIST_SNAPSHOT sn
where
after.file#=before.file# and
after.snap_id=before.snap_id+1 and
before.instance_number=after.instance_number and
after.snap_id=sn.snap_id and
after.instance_number=sn.instance_number
group by to_char(sn.END_INTERVAL_TIME,'YYYY-MM-DD HH24:MI:SS')
order by to_char(sn.END_INTERVAL_TIME,'YYYY-MM-DD HH24:MI:SS');
End snapshot time number of IOs ave IO time (ms) ave IO size (bytes)
------------------- ------------- ---------------- -------------------
2015-06-15 15:00:59 359254 20 711636
2015-06-15 16:00:59 805884 16 793033
2015-06-15 17:00:13 516576 13 472478
2015-06-15 18:00:27 471098 6 123565
2015-06-15 19:00:41 201820 9 294858
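One detail worth noting in the query above: READTIM and WRITETIM in DBA_HIST_FILESTATXS are recorded in centiseconds, so the trunc(10 * ...) expression converts the per-I/O time to milliseconds, and the 1+ in the divisor guards against division by zero. The snapshot deltas below are invented simply to check the arithmetic.

```shell
#!/bin/sh
# Hypothetical snapshot deltas: 500 I/Os that took 1000 centiseconds in total.
ios=500
time_cs=1000

# trunc(10 * time / (1 + ios)), as in the "ave IO time (ms)" column.
ms=$(awk -v t="$time_cs" -v n="$ios" 'BEGIN { print int(10 * t / (1 + n)) }')
echo "ave IO time: $ms ms"
```

A thousand centiseconds spread over 501 I/Os works out to roughly 19 ms per I/O, matching the scale of the values in the sample output.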
The alert reports, on the other hand, are intended to identify current
bottleneck possibilities: that is, specific data files experiencing a
disproportionate amount of I/O (for example, those experiencing 20 percent more
activity than the average for the mount point).
Here is an example of a disk I/O performance report over time.
Often, companies use automated procedures to generate the alert reports and
e-mail them to the DBA staff, so that they can move these files to less-active
mount points as soon as possible.
Some specific scripts for generating these reports are available at Oracle
Publishing Online and include reports for daily activity, top mount point, top
mount point each hour, and hot data files for each mount point. (See "Sample
Reports Online.")
A Delicate Balance
Having a wealth of I/O data will take you far in the process of load
balancing, but you also need ingenuity and patience. An application's
disk-access patterns can vary greatly according to daily or weekly processing
needs, so the optimal file placement may not always be obvious. (For example,
hdisk32 might be very busy during evening batch processing but largely idle
during daytime processing.) And it's possible that relocating a data file may
relieve I/O contention for one process, only to cause contention for an
unrelated process.
It may take several iterations of moving files (a nontrivial action, since
it must be done while the database is stopped) to find the most workable
arrangement. Generally, however, the process of load balancing is well worth the
time it takes. Once you have achieved a fairly balanced load, you won't need to
move the files unless new processes change the I/O pattern for the disks.
Nonetheless, isolating bottlenecks can be time-consuming and elusive. With
the scripts and tables detailed in this article, you can quickly set up a
procedure to provide the maximum information you can use for balancing your I/O
load and minimizing disk I/O, which is key to keeping response times low. There
are cases where although all indications seem to point to a file as a bottleneck
that should be moved, the file is one that should stay put (see "Look
Before You Leap").
Strategies such as load balancing can go a long way toward improving the
speed of your applications and keeping users happy across your network.
A file with a high I/O rate is not always a bottleneck. Strictly
speaking, a disk bottleneck is defined as two or more processes
competing to read from or write to the same disk device. However, it can
be very difficult to discover which processes are responsible for high
I/O on a disk. (Oracle Enterprise Manager can help: look at Top
Sessions ordered by physical reads to find out which sessions are
currently doing lots of I/O.)
In addition, you can't always move files with high I/O rates. For
example, redo log files will always experience high I/O and you
shouldn't move them as part of load balancing (although you can stripe
your redo logs across separate disks if archiving, so that the archiver
reads from a different disk than the active log). Remember, too, that for
databases placed on a small number of disks, some amount of I/O
contention may be unavoidable. Consider a 30GB database spread across
two disks with 20 processes competing for data. On average, 10 processes
would be queued waiting for I/O from each of the two disks. Clearly,
these systems will always experience I/O contention.
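The arithmetic above generalizes: with p processes spread across d disks, roughly p/d processes queue per disk on average. A one-line check with the article's numbers:

```shell
#!/bin/sh
# 20 processes competing for data on 2 disks -> about 10 queued per disk.
procs=20
disks=2
queued=$(expr $procs / $disks)
echo "about $queued processes queued per disk"
```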
FIGURE 4: The script shown in Listing 5 generated a
sample of a hot-file report. The files shown here have I/O values (measured as
number of reads) that are higher than 50 percent of the total I/O operations for
the mount point during the elapsed period (the actual percentage is shown in
the right-hand column).
Date          MountPt.   MpReads   FileName             FileReads   Pct
09/22 13:41   file1            5   tablbtabi.dbf                3    60
              file10           4   tablespace13d.dbf            4   100
              file4          169   tablespace12d.dbf          168    99
              file5            1   tablbtabd.dbf                1   100
              file7            2   tablsourced.dbf              2   100
              file8           57   tablloadd.dbf               53    93
              file9            2   tablprotd.dbf                2   100
09/22 13:46   file1           26   sat_system.dbf              18    69
              file10           2   tablespace13d.dbf            2   100
              file2            2   tablespace8i.dbf             2   100
              file3            3   tablpooli.dbf                3   100
              file4           11   tablstabd.dbf               11   100
              file5            3   tablddicd.dbf                3   100
              file7            3   tablsourced.dbf              3   100
              file9            4   tablpoold.dbf                4   100
09/22 13:51   file1            5   tablespace12i.dbf            3    60
              file10           3   tablespace14d.dbf            2    67
              file4            1   tablstabd.dbf                1   100
              file8            7   tablloadd.dbf                5    71
              file9            4   tabldocud.dbf                4   100