If an Oracle database server has a hardware
problem, such as an overloaded CPU, excessive memory swapping, or disk I/O
bottleneck, no amount of tuning within the Oracle database itself is going to
improve performance. So the first thing an Oracle professional must examine
during tuning is the server's hardware.
Because Oracle databases run on dozens of different platforms, from mainframes
to personal computers, it's very difficult to give guidelines for tuning every
type of server environment. However, since more than 85 percent of Oracle
databases run in the UNIX environment, we're going to discuss some of the UNIX
tools for determining whether a UNIX database server is undergoing stress.
The vmstat utility
One of the most common tools used in monitoring hardware for an Oracle database
is the UNIX vmstat utility. This utility takes an elapsed-time snapshot of
what's going on in the server environment. By issuing the vmstat command, we can
take a look at the run queue for the CPU, the memory page and page out of
activity, and the activity within each CPU on the database server.
Some UNIX system administrators and Oracle DBAs have figured out methods for
capturing vmstat output and placing it in the Oracle tables for more detailed
analysis. But in order to conduct meaningful analysis, you must know what to
look for. It's important to recognize that in each of the different UNIX flavors
(Solaris, AIX, HP/UX), vmstat has different display formats. Although the vmstat
output is different depending on the flavor of UNIX, all flavors share some
important server metrics. In Table 1 below, the important metrics appear
in bold.
Table 1 |
 |
As you can see, each UNIX flavor of vmstat reports provides different
information about the current status of the server. But despite these
differences, only a small number of metrics is important for server monitoring.
These include:
- rrunque: Shows the number of tasks waiting for CPU resources.
When this number exceeds the number of CPUs on the server, a CPU bottleneck
exists.
- pipage-in: Occurs when the server is experiencing a shortage of
RAM. Any nonzero value for pi indicates excessive swap activity.
- usUser CPU: Shows the amount of CPU used by the user state.
- sy: Shows the percentage of CPU being used to service system tasks.
- ididle: Shows the percentage of CPU that is idle.
- wawait (AIX only): Shows the percentage of CPU that is waiting
on external operations such as disk I/O.
Note that the CPU metrics are expressed as percentages. Hence, all of the CPU
values (us+sy+id+wa) will always sum to 100.

The popular
Ion tool is
the easiest way to analyze Oracle OS parameter performance and Ion
allows you to spot hidden OS related performance trends.
Ion is
our favorite Oracle tuning tool.
Let's examine each of the important measures of server hardware performance.
Run queue (r) - The run queue values are usually the very first values displayed on the left of
all vmstat outputs. In the UNIX environment, the run queue is used to display
the number of active tasks that are currently waiting for CPU resources. Hence,
it is important that the run queue values never exceed the number of CPUs on the
processor. For example, for an Oracle database server with eight CPUs (8-way
SMP), any run queue value in excess of 8 would indicate that the CPU is being
overloaded and tasks are waiting for service.
RAM page-in (pi) - The use of RAM on the database server is also very important. In any virtual
memory environment, it is not uncommon to see RAM pages moved out to a swap
disk. This is a special disk area in UNIX that's reserved for holding memory
pages so that the processor is capable of addressing RAM in excess of its full
capability. While page-out of operations are a normal part of any server's
operations, page-in operations indicate that the real amount of available RAM
has been exceeded and that additional RAM pages are required on the server.
Whenever page-ins occur, the memory contents must be read in from the swap disk.
This page-in activity is very time-consuming, and memory page-in operations can
cause significant slowdowns of any Oracle database.
Disk bottlenecks
Some versions of vmstat, such as IBM's AIX environment, allow the DBA to get a
general idea that an I/O process is slowing down the system. In AIX, there's a
wait (wa) column that indicates the percentage of CPU that is in the wait-state,
waiting for other resources on the server.

An IBM disk I/O monitor
For other UNIX environments, such as Solaris, HP/UX, and Linux, the UNIX iostat
utility is generally used to take a look at the I/O activity on each device
that's been hooked up to the Oracle server. For more information on using the
iostat utility with Oracle, see my article "Tuning
Disk I/O in Oracle8" in Oracle Magazine.
In today's disk architectures, we commonly find that disk striping is used to
alleviate these types of disk I/O bottlenecks. The RAID 0+1 method effectively
does block-level striping across the disk devices, thereby ensuring that the
disk load rises and decreases at the same level. However, it's important to
recognize that individual disk devices for Oracle databases are becoming larger
all the time. Today it's not uncommon to see an individual disk with 73
gigabytes of storage. In such an environment, it's possible to store an entire
Oracle database on a single disk. Without disk array caching tools such as EMC
disk caching, the Oracle DBA might find the small, frequently referenced area of
disk is constantly being recalled, causing the read-write heads on the disk
device to have a significant impact on the overall performance of the database.
This type of I/O bottleneck can also commonly be seen in cases where the DBA is
running hot backups while the database is active. The Oracle hot backups task is
attempting to read the database from disk, cylinder by cylinder, while the
online database engine is attempting to randomly access information on the disk.
The contention is seen in the demand for the read-write heads on the disk, as
they relocate themselves under competing cylinders, and in the significant
slowdown of the system during hot backups.
Database server trend analysis
Many UNIX system administrators have utilities that can capture and plot the
server stress over a period of time. These server statistics are very useful for
the Oracle DBA because they can show periods where the server is overloaded. In
the example in Figure A, we see a plot of CPU and CPU waits, averaged by
the hour of the day. It is clear that this database is experiencing periods of
high activity each day at 5:00 A.M. and at 11:00 P.M.

Plotting database server statistics by hour of the
day
We can also plot abnormal conditions such as page-in operations. Alert reports,
such as the one shown in Figure B, can be created and e-mailed to the DBA
to show when the Oracle server is short on memory.

This type of page-in information can also be used to predict the overall load on
the database server. One of the jobs of the Oracle DBA is to predict when
additional hardware resources are needed for the database. By plotting the
consumption of memory and CPU and performing a linear regression, the Oracle DBA
can accurately predict when additional RAM or CPU will be required.

A predictive report for RAM memory on an Oracle
server
Summary
This article is intended to give you a high-level, overall understanding of
monitoring and tuning the hardware on your Oracle database server. The main
points are to carefully look for shortages in CPU, RAM, and disk I/O
bottlenecks. Next week's article will look at tuning within the confines of the
Oracle database, beginning with an overview of Oracle instance tuning.
|