Oracle UNIX Administration: CPU Bottlenecks with vmstat
Oracle UNIX/Linux Tips by Burleson Consulting
Identifying CPU Bottlenecks with vmstat
Tasks waiting for CPU resources are shown in the first two columns of UNIX vmstat output, under the kthr (kernel thread state change) heading (see Listing 2-1). Tasks that are waiting on a resource are placed in the wait queue (the "b" column), while tasks that are ready to run appear in the run queue (the "r" column). As we see in Figure 2-15, server tasks are queued for execution by the server.
Figure 15: Tasks queuing for service by the CPUs
In short, the server is experiencing a CPU bottleneck when "r" is greater than the number of CPUs on the server. To see the number of CPUs on the server, you can use one of the following UNIX commands.
Remember that we need to know the number of CPUs on our server because the vmstat run queue value should not exceed the number of CPUs. A run queue value of 32 is perfectly acceptable for a 36-CPU server, while the same value of 32 would be a serious problem for a 24-CPU server.
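The comparison above is simple enough to express directly. The sketch below is in Python purely for illustration and is not one of the UNIX commands the text refers to. It assumes that `os.cpu_count()`, which reports the logical CPUs visible to the operating system (hyperthreads included), is an acceptable stand-in for the CPU count, and that the run queue value is the vmstat "r" figure.

```python
import os


def run_queue_exceeds_cpus(run_queue: float, cpu_count: int) -> bool:
    """Return True when the vmstat run queue ("r") exceeds the CPU count.

    Encodes the rule of thumb from the text: once there are more runnable
    tasks than processors, some tasks must wait for a CPU.
    """
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return run_queue > cpu_count


if __name__ == "__main__":
    # Logical CPUs visible to the OS; os.cpu_count() may return None.
    ncpu = os.cpu_count() or 1
    print(f"logical CPUs on this host: {ncpu}")
    # The two cases from the text: a run queue of 32 on 36 vs. 24 CPUs.
    print(run_queue_exceeds_cpus(32, 36))  # False: acceptable
    print(run_queue_exceeds_cpus(32, 24))  # True: serious problem
```

A run queue exactly equal to the CPU count is treated as acceptable here, matching the "greater than" wording in the text.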
In the example below, we run the vmstat utility. For our purposes, we are interested in the first two columns: the run queue "r" and the kthr wait "b" column. In the listing below we see that there is an average of about eight tasks in the run queue during each five-second interval (the "r" column), while five other tasks are waiting on resources (the "b" column). A nonzero value in the "b" column may also indicate a bottleneck.
root> vmstat 5 5
kthr     memory               page                 faults             cpu
----- ------------- ------------------------- ------------------ ---------------
 r  b     avm   fre  re  pi  po   fr   sr  cy    in     sy    cs  us  sy  id  wa
 7  5  220214   141   0   0   0   42   53   0  1724  12381  2206  19  46  28   7
 9  5  220933   195   0   0   1  216  290   0  1952  46118  2712  27  55  13   5
13  5  220646   452   0   0   1   33   54   0  2130  86185  3014  30  59   8   3
 6  5  220228   672   0   0   0    0    0   0  1929  25068  2485  25  49  16  10
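To make the two columns concrete, here is a minimal Python sketch that parses data rows like those in the listing above and averages the "r" and "b" columns. It assumes the AIX-style column order shown in the listing. Because "sy" appears twice (system calls under faults, system CPU time under cpu), the first one is renamed `sy_calls`, a name chosen only for this sketch.

```python
from statistics import mean

# Column order from the listing; the faults "sy" is renamed to keep keys unique.
COLUMNS = ["r", "b", "avm", "fre", "re", "pi", "po", "fr", "sr", "cy",
           "in", "sy_calls", "cs", "us", "sy", "id", "wa"]


def parse_vmstat(text: str) -> list[dict[str, int]]:
    """Turn vmstat data rows (lines made up only of integers) into dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Header, heading and dash lines fail this test and are skipped.
        if len(fields) == len(COLUMNS) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows


SAMPLE = """\
 r  b     avm   fre  re  pi  po   fr   sr  cy    in     sy    cs  us  sy  id  wa
 7  5  220214   141   0   0   0   42   53   0  1724  12381  2206  19  46  28   7
 9  5  220933   195   0   0   1  216  290   0  1952  46118  2712  27  55  13   5
13  5  220646   452   0   0   1   33   54   0  2130  86185  3014  30  59   8   3
 6  5  220228   672   0   0   0    0    0   0  1929  25068  2485  25  49  16  10
"""

if __name__ == "__main__":
    rows = parse_vmstat(SAMPLE)
    print("average run queue (r):", mean(row["r"] for row in rows))
    print("average blocked (b):  ", mean(row["b"] for row in rows))
```

In practice the text would come from running vmstat rather than from a string literal; the literal keeps the sketch self-contained.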
The rule for identifying a server with CPU resource problems is quite simple. Whenever the value of the run queue "r" column exceeds the number of CPUs on the server, tasks are forced to wait for execution. There are several solutions to managing CPU overload, and these alternatives are presented in their order of desirability:
1. Add more processors (CPUs) to the server.
2. Load balance the system tasks by rescheduling large batch tasks to execute during off-peak hours.
3. Adjust the dispatching priorities (nice values) of existing tasks.
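On UNIX the usual tools for the third option are `nice` (to start a task at a lower priority) and `renice` (to change a running task). As a hedged illustration of the same idea from a program, the sketch below raises the nice value of an existing process with Python's `os.getpriority` and `os.setpriority` (UNIX only). The child process it starts is a hypothetical stand-in for a large batch task; unprivileged users can normally only raise a nice value, not lower it, and the cap of 19 reflects Linux.

```python
import os
import subprocess
import sys


def renice(pid: int, increment: int) -> int:
    """Raise the nice value of an existing process; return the new value.

    A higher nice value means a lower dispatching priority. The value is
    capped at 19 (the Linux maximum); without privileges it can only go up.
    """
    current = os.getpriority(os.PRIO_PROCESS, pid)
    os.setpriority(os.PRIO_PROCESS, pid, min(current + increment, 19))
    return os.getpriority(os.PRIO_PROCESS, pid)


if __name__ == "__main__":
    # Hypothetical "batch job": a child process that just sleeps.
    job = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(5)"])
    try:
        print("batch job nice value now:", renice(job.pid, 10))
    finally:
        job.terminate()
        job.wait()
```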
To understand how dispatching priorities work, we must remember that incoming tasks are placed in the execution queue according to their nice value (see Figure 2-15). As the figure below shows, tasks with a low nice value are scheduled for execution ahead of tasks with a higher nice value.
Figure 15: Tasks queued for execution according to their nice value
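The ordering in the figure can be modelled as a priority queue keyed on nice value. This is a deliberately simplified model, not how any particular UNIX kernel is implemented: real schedulers also adjust priorities over time and share CPUs in time slices, so the sketch only illustrates the "lower nice value runs first" ordering. The task names are hypothetical.

```python
import heapq
from itertools import count


def dispatch_order(tasks: list[tuple[str, int]]) -> list[str]:
    """Return task names in the order a strict nice-value queue would run them.

    Lower nice value = higher priority; tasks with equal nice values keep
    their arrival (FIFO) order thanks to the sequence-number tie-breaker.
    """
    seq = count()
    heap = [(nice, next(seq), name) for name, nice in tasks]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


if __name__ == "__main__":
    # Hypothetical tasks as (name, nice value), listed in arrival order.
    arrivals = [("batch_report", 10), ("oracle_fg", 0),
                ("backup", 19), ("user_query", 0)]
    print(dispatch_order(arrivals))
```

With these inputs the two nice-0 tasks run first (in arrival order), followed by the nice-10 and nice-19 tasks.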
Now that we can see when the CPUs are overloaded, let's look further into vmstat and see how we can tell when the CPUs are running at full capacity.
Identifying High CPU Usage with vmstat
We can also easily detect when the CPUs on the Oracle database server are busy. Whenever the "us" (user) column plus the "sy" (system) column under the cpu heading approaches 100 percent, the CPUs are operating at full capacity.
Please note that it is not uncommon to see
the CPU approach 100 percent even when the server is not overwhelmed
with work. This is because the UNIX internal dispatchers will always
attempt to keep the CPUs as busy as possible. This maximizes task
throughput, but it can be misleading for a neophyte.
Remember, it is not a cause for concern when the user + system CPU values approach 100 percent. This just means that the CPUs are working to their full potential. The only metric that identifies a CPU bottleneck is when the run queue ("r" value) exceeds the number of CPUs on the server.
root> vmstat 5 1
kthr     memory               page                 faults             cpu
----- ------------- ------------------------- ------------------ ---------------
 r  b     avm   fre  re  pi  po   fr   sr  cy    in     sy    cs  us  sy  id  wa
 0  0  217485   386   0   0   0    4   14   0   202    300   210  20  75   3   2
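The two rules in this section, busy CPUs versus a true bottleneck, can be combined into one check per vmstat sample. In the minimal sketch below, the 90 percent threshold for "approaching 100 percent" and the CPU count of 4 are assumptions for illustration. The sample values come from the listing above, where us + sy is 95 percent but the run queue is empty.

```python
BUSY_THRESHOLD = 90  # assumed cut-off for "approaching 100 percent"


def assess_cpu(r: int, us: int, sy: int, ncpu: int) -> str:
    """Classify one vmstat sample using the two rules from the text.

    r  -- run queue (kthr "r" column)
    us -- user CPU percent; sy -- system CPU percent (cpu "sy" column)
    """
    if r > ncpu:
        return "bottleneck: run queue exceeds CPU count"
    if us + sy >= BUSY_THRESHOLD:
        return "busy: CPUs near full capacity, no run-queue backlog"
    return "normal"


if __name__ == "__main__":
    ncpu = 4  # hypothetical CPU count; the listing does not say
    # r=0, us=20, sy=75 from the "vmstat 5 1" sample above
    print(assess_cpu(0, 20, 75, ncpu))
```

The run queue check comes first because, in the text's terms, it is the only true bottleneck signal; high CPU usage alone is reported as merely busy.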
Please note that in Chapter 9 we will
describe a detailed method for capturing vmstat information inside
STATSPACK extension tables. The approach of capturing server
information along with Oracle information provides the Oracle DBA
with a complete picture of the operation of the system.
The UNIX w command
One common method for watching UNIX server load is to monitor the load average for the server. The load average is a rough measure of how many processes are running or waiting to run, averaged over time. Most load average displays show three values: the load averages for the past minute, the past 5 minutes, and the past 15 minutes. A low load average is ideal. Whenever the value exceeds 1 on a single-CPU server (or, more generally, exceeds the number of CPUs), there may be a CPU overload problem.
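The load average is not recomputed from scratch each time; most kernels keep an exponentially damped moving average of the number of runnable tasks (on Linux, tasks in uninterruptible I/O wait are counted too). The Python sketch below approximates the Linux approach under stated assumptions: a sample every 5 seconds, 1-, 5- and 15-minute windows, and floating point instead of the kernel's fixed-point arithmetic. It shows why the three values differ: a sudden, steady load of two tasks shows up in the 1-minute figure long before the 15-minute one.

```python
import math

SAMPLE_INTERVAL = 5.0           # seconds between samples (Linux uses 5 s)
WINDOWS = (60.0, 300.0, 900.0)  # 1-, 5- and 15-minute averages, in seconds


def update_loadavg(loads: tuple[float, ...], runnable: int) -> tuple[float, ...]:
    """Blend the current runnable-task count into each damped average."""
    updated = []
    for load, window in zip(loads, WINDOWS):
        decay = math.exp(-SAMPLE_INTERVAL / window)
        updated.append(load * decay + runnable * (1.0 - decay))
    return tuple(updated)


if __name__ == "__main__":
    loads = (0.0, 0.0, 0.0)
    for _ in range(120):  # 10 minutes of 5-second samples, 2 tasks always runnable
        loads = update_loadavg(loads, 2)
    print(tuple(round(x, 2) for x in loads))  # → (2.0, 1.73, 0.97)
```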
root> w
  5:54pm  up 2 days, 22:45,  29 users,  load average: 0.08, 0.14, 0.22
User      tty     login@   idle   JCPU   PCPU  what
root      ttyp1   7:11pm  25:47                tee -a /u01/home/crup
triha     ttyp2   4:48pm     20      3      3  runmenu50 pamenu
lpayne    ttyp3   5:29pm     24                runmenu50 pamenu
burleson  ttyp5   5:50pm                       -sh
tteply    ttyp6   5:05pm     10      1      1  runmenu50 pamenu
kjoslin   ttyp7   1:29pm     30     38     38  runmenu50 pamenu
jperry    ttyp8   6:48am      1     51     51  runmenu50 pamenu
kharstad  ttype   3:38pm   2:16                -sh
cmconway  ttyqc  11:53am     17      5      5  runmenu50 pamenu
jhahn     ttyr7   1:43pm     10      2      2  runmenu50 pamenu
tbailey   ttyrb  12:12pm   1:38      4      4  runmenu50 pamenu
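The load average figures can also be pulled out of the w header line programmatically. The sketch below is a minimal Python illustration; the regular expression is an assumption that covers the "load average: a, b, c" form shown above and the comma-less "load averages: a b c" form some systems print. Dividing by the CPU count follows the same reasoning as the run queue rule earlier in this section. On UNIX, Python's `os.getloadavg()` returns the same three numbers for the local host directly.

```python
import os
import re

# Matches "load average: 0.08, 0.14, 0.22" and "load averages: 0.08 0.14 0.22".
LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def parse_load_average(header: str) -> tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages from a w/uptime header."""
    match = LOAD_RE.search(header)
    if match is None:
        raise ValueError("no load average found")
    one, five, fifteen = (float(x) for x in match.groups())
    return one, five, fifteen


if __name__ == "__main__":
    header = "5:54pm up 2 days, 22:45, 29 users, load average: 0.08, 0.14, 0.22"
    one, five, fifteen = parse_load_average(header)
    ncpu = os.cpu_count() or 1
    # Values above 1.0 per CPU suggest more runnable work than processors.
    print(f"1-minute load per CPU: {one / ncpu:.3f}")
```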
Now, let's conclude this chapter with a review of the main concepts and tools.