Oracle UNIX CPU Bottlenecks with vmstat Administration

Oracle UNIX/Linux Tips by Burleson Consulting

Identifying CPU Bottlenecks with vmstat

Tasks waiting for CPU resources appear in UNIX vmstat command output in the two columns under the kthr (kernel thread state change) heading (see Listing 2-1). Tasks that are ready to run but waiting for a CPU appear in the run queue ("r") column, the first column, while tasks that are blocked waiting on a resource are placed in the wait queue ("b"), the second column. As we see in Figure 2-15, tasks are queued for execution by the server's CPUs.

Figure 2-15: Tasks queuing for service by the CPUs

In short, the server is experiencing a CPU bottleneck when "r" is greater than the number of CPUs on the server. To see the number of CPUs on the server, you can use a platform-specific UNIX command, such as nproc on Linux, psrinfo on Solaris, or lsdev -Cc processor on AIX.

Remember that we need to know the number of CPUs on our server because the vmstat run queue value should not exceed the number of CPUs. A run queue value of 32 is perfectly acceptable for a 36-CPU server, while the same value of 32 would be a serious problem for a 24-CPU server.
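The comparison described above can be sketched in a few lines of Python. This is only an illustration: the function name is_cpu_bottleneck is hypothetical, and os.cpu_count() reports the logical CPUs visible to Python, which may differ from what a platform-specific command reports.

```python
import os

def is_cpu_bottleneck(run_queue, cpu_count=None):
    """Return True when the run queue ("r") exceeds the number of CPUs."""
    if cpu_count is None:
        # os.cpu_count() can return None when the count is undetermined,
        # so fall back to 1 (an assumption made for this sketch).
        cpu_count = os.cpu_count() or 1
    return run_queue > cpu_count

# The two servers from the text: a run queue of 32 on 36 CPUs and on 24 CPUs.
print(is_cpu_bottleneck(32, 36))
print(is_cpu_bottleneck(32, 24))
```

Calling is_cpu_bottleneck(r) without a CPU count compares against the machine the script runs on.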

In the example below, we run the vmstat utility. For our purposes, we are interested in the first two columns: the run queue "r" column and the kthr wait "b" column. In the listing below, sampled at five-second intervals, we see an average of about nine tasks in the run queue (the "r" column), while five other tasks are waiting on resources (the "b" column). A persistently nonzero value in the "b" column may also indicate a bottleneck.

root> vmstat 5 5

kthr     memory              page                  faults             cpu
----- ------------ ------------------------- ------------------ ---------------
 r  b    avm   fre  re  pi  po   fr   sr  cy    in     sy    cs  us  sy  id  wa
 7  5 220214   141   0   0   0   42   53   0  1724  12381  2206  19  46  28   7
 9  5 220933   195   0   0   1  216  290   0  1952  46118  2712  27  55  13   5
13  5 220646   452   0   0   1   33   54   0  2130  86185  3014  30  59   8   3
 6  5 220228   672   0   0   0    0    0   0  1929  25068  2485  25  49  16  10
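To check the run queue and wait queue figures in a listing like the one above without reading them by eye, the data rows can be parsed programmatically. The following is a minimal sketch, assuming the AIX-style layout shown, where "r" and "b" are the first two columns; the helper name kthr_samples is hypothetical.

```python
# The sample rows are copied from the listing above.
SAMPLE = """\
r b   avm    fre re pi po fr   sr cy in     sy cs us sy id wa
7 5 220214   141   0   0   0 42   53   0 1724 12381 2206 19 46 28 7
9 5 220933   195   0   0   1 216 290   0 1952 46118 2712 27 55 13 5
13 5 220646   452   0   0   1 33   54   0 2130 86185 3014 30 59 8 3
6 5 220228   672   0   0   0   0    0   0 1929 25068 2485 25 49 16 10
"""

def kthr_samples(vmstat_text):
    """Return a list of (r, b) pairs, one per numeric data row."""
    samples = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        # Header and separator lines contain non-numeric fields; skip them.
        if len(fields) >= 2 and all(f.isdigit() for f in fields):
            samples.append((int(fields[0]), int(fields[1])))
    return samples

samples = kthr_samples(SAMPLE)
avg_r = sum(r for r, _ in samples) / len(samples)
avg_b = sum(b for _, b in samples) / len(samples)
print(f"average r = {avg_r:.2f}, average b = {avg_b:.2f}")
```

The same function could be fed the captured output of a live vmstat run, provided the column order on that platform matches this layout.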

The rule for identifying a server with CPU resource problems is quite simple. Whenever the value of the run queue "r" column exceeds the number of CPUs on the server, tasks are forced to wait for execution. There are several solutions to managing CPU overload, and these alternatives are presented in their order of desirability:

1. Add more processors (CPUs) to the server.

2. Load balance the system tasks by rescheduling large batch tasks to execute during off-peak hours.

3. Adjust the dispatching priorities (nice values) of existing tasks.

To understand how dispatching priorities work, we must remember that incoming tasks are placed in the execution queue according to their nice value (see Figure 2-15). Below we see that tasks with a low nice value are scheduled for execution above those tasks with a higher nice value.

Figure 2-15: Tasks queued for execution according to their nice value
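The ordering described above can be illustrated with a small priority queue. This is a simplified model of the idea only: real UNIX schedulers weigh other factors besides the nice value, and the task names and nice values below are hypothetical.

```python
import heapq

def dispatch_order(tasks):
    """Return task names ordered by nice value, lowest first.

    tasks is a list of (name, nice) pairs in arrival order; tasks with
    equal nice values keep their arrival order.
    """
    heap = [(nice, arrival, name) for arrival, (name, nice) in enumerate(tasks)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

# Hypothetical tasks; nice values conventionally range from -20 to 19.
tasks = [("batch_report", 10), ("listener", 0), ("backup", 19), ("monitor", -5)]
print(dispatch_order(tasks))
```

In this model, raising the nice value of a large batch task moves it behind interactive work, which is the effect the third option in the list above aims for.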

Now that we can see when the CPUs are overloaded, let's look into vmstat further and see how we can tell when the CPUs are running at full capacity.

Identifying High CPU Usage with vmstat

We can also easily detect when the CPUs on the Oracle database server are busy. Whenever the sum of the "us" (user) column and the "sy" (system) column approaches 100 percent, the CPUs are operating at full capacity.

Please note that it is not uncommon to see the CPU approach 100 percent even when the server is not overwhelmed with work. This is because the UNIX internal dispatchers will always attempt to keep the CPUs as busy as possible. This maximizes task throughput, but it can be misleading for a neophyte.

Remember, it is not a cause for concern when the user + system CPU values approach 100 percent. This just means that the CPUs are working to their full potential. The only metric that identifies a CPU bottleneck is when the run queue ("r" value) exceeds the number of CPUs on the server.

root> vmstat 5 1

kthr     memory              page                  faults             cpu
----- ------------ ------------------------- ------------------ ---------------
 r  b    avm   fre  re  pi  po   fr   sr  cy    in     sy    cs  us  sy  id  wa
 0  0 217485   386   0   0   0    4   14   0   202    300   210  20  75   3   2
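The distinction between a busy CPU and a CPU bottleneck can be expressed as a small classification function. This is a sketch under stated assumptions: the 90 percent threshold for "busy" and the 4-CPU server are illustrative choices, not values taken from the text.

```python
def cpu_status(r, us, sy, cpu_count, busy_threshold=90):
    """Classify one vmstat sample as 'bottleneck', 'busy', or 'normal'.

    r         -- run queue length (first vmstat column)
    us, sy    -- user and system CPU percentages
    cpu_count -- number of CPUs on the server
    busy_threshold -- us + sy percentage treated as "busy" (an assumption)
    """
    if r > cpu_count:
        # Tasks are waiting for a CPU: the only true bottleneck signal.
        return "bottleneck"
    if us + sy >= busy_threshold:
        # CPUs are near full capacity but nothing is queued behind them.
        return "busy"
    return "normal"

# The sample above: r=0, us=20, sy=75 on a hypothetical 4-CPU server.
print(cpu_status(0, 20, 75, cpu_count=4))
```

With the sample values, us + sy is 95 percent while the run queue is empty, so the server is busy but not bottlenecked.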

Please note that in Chapter 9 we will describe a detailed method for capturing vmstat information inside STATSPACK extension tables. The approach of capturing server information along with Oracle information provides the Oracle DBA with a complete picture of the operation of the system.

The UNIX w command

One common method for watching UNIX server load is to monitor the load average for the server, which the w command displays on its first line. The load average is a summary number that reflects how many tasks are running or waiting to run, and so shows the overall demand on the server. Most load average displays show three values: the load averages for the past minute, the past 5 minutes, and the past 15 minutes. A low load average is ideal, and the load average should stay below 1. Whenever the value exceeds 1 (or, on a multiprocessor server, the number of CPUs), there may be a CPU overload problem.

root> w

5:54pm up 2 days, 22:45, 29 users, load average: 0.08, 0.14, 0.22
User     tty      login@   idle   JCPU   PCPU  what
root     ttyp1    7:11pm  25:47                tee -a /u01/home/crup
triha    ttyp2    4:48pm     20      3      3  runmenu50 pamenu
lpayne   ttyp3    5:29pm     24                runmenu50 pamenu
burleson ttyp5    5:50pm                       -sh
tteply   ttyp6    5:05pm     10      1      1  runmenu50 pamenu
kjoslin  ttyp7    1:29pm     30     38     38  runmenu50 pamenu
jperry   ttyp8    6:48am      1     51     51  runmenu50 pamenu
kharstad ttype    3:38pm   2:16                -sh
cmconway ttyqc   11:53am     17      5      5  runmenu50 pamenu
jhahn    ttyr7    1:43pm     10      2      2  runmenu50 pamenu
tbailey  ttyrb   12:12pm   1:38      4      4  runmenu50 pamenu
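The load average check can also be scripted. The sketch below parses the first line of the w output shown above and compares the values against a limit. The function names are hypothetical, and the default limit of 1 follows the rule of thumb in the text; on a multi-CPU server it may make more sense to pass the number of CPUs instead.

```python
import re

def parse_load_average(line):
    """Extract the three load average values from a w or uptime header line."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", line
    )
    if match is None:
        raise ValueError("no load average found in line")
    return tuple(float(value) for value in match.groups())

def load_is_high(loads, limit=1.0):
    """Return True when any of the load averages exceeds the limit."""
    return any(value > limit for value in loads)

header = "5:54pm up 2 days, 22:45, 29 users, load average: 0.08, 0.14, 0.22"
loads = parse_load_average(header)
print(loads, load_is_high(loads))
```

On UNIX systems, Python's os.getloadavg() returns the same three values for the local machine without any parsing.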

Now, let's conclude this chapter with a review of the main concepts and tools.
