A common question is "why is my CPU
utilization at 100%?". There is a great deal of concern about the measurement
of CPU at the Oracle server level.
If you suspect a CPU utilization problem, see
these important notes on 100%
CPU and Oracle. Also see
Oracle and CPU utilization metrics,
and my notes on OS Busy scripts.
Once we understand that CPU resources are
scarce (just like RAM resources) and not to be wasted, we need to understand
how to tell if our Oracle server is making optimal use of its computing
hardware.
There are many OS utilities that allow us to
see CPU utilization statistics, including those covered below, as well as
uptime and procinfo.
Each of these tools displays CPU processor
metrics at a finer level of detail than Oracle. This is because the
OS does not reveal all processor details to applications (to UNIX, Oracle is
just another application), so the best place to see what's going on inside your
server is the operating system's CPU monitors. These report
different metrics on CPU utilization:
- The runqueue - This is
the far left-hand column of the vmstat command display (labeled with
an "r"). It reports the total length of the CPU dispatcher queue.
When the runqueue exceeds the number of CPUs on the server, you have
an overloaded server with a CPU bottleneck.
- The load average -
This is defined as the sum of the run queue length and the number of jobs
currently running on the CPUs. Each display of the load average consists
of three numbers. Most often, the load average numbers are shown in
descending order from left to right, giving the load average for the past 1,
5, and 15 minutes.
Occasionally, however, an ascending order appears (e.g. like that shown in
the top output).
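The runqueue rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `is_cpu_bound` is hypothetical, the host is assumed to be UNIX-like so that Python's `os.getloadavg()` is available, and the 1-minute load average is used as a rough stand-in for the vmstat "r" column rather than an exact equivalent.

```python
import os

def is_cpu_bound(runqueue, ncpus):
    """Return True when the run queue exceeds the number of CPUs."""
    return runqueue > ncpus

if __name__ == "__main__":
    ncpus = os.cpu_count() or 1
    one, five, fifteen = os.getloadavg()  # 1-, 5- and 15-minute averages
    print(f"CPUs: {ncpus}  load averages: {one:.2f} {five:.2f} {fifteen:.2f}")
    # The load average is only an approximation of the vmstat 'r' value.
    if is_cpu_bound(one, ncpus):
        print("1-minute load exceeds the CPU count: possible CPU bottleneck")
```

For example, a run queue of 100 on a 32-CPU server is flagged, while a run queue of 4 on a 4-CPU server is not.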
There are a host of UNIX
commands that display CPU and memory consumption. While there are
dialect-specific utilities such as glance, we will look at the common vmstat and
top utilities.
Using top to monitor CPU
The "top" command can be used to display CPU
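utilization. The metrics are:
- load average - The
load average is computed as the sum of the run queue length and the number of
jobs currently running on the CPUs, as described above.
- CPU states - These show
percentage metrics for current processor usage.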

System: corp-hp1                                  Thu Jul 6 09:14:23 2000
Load averages: 0.04, 0.03, 0.03
340 processes: 336 sleeping, 4 running
Cpu states:
CPU   LOAD   USER   NICE    SYS    IDLE  BLOCK  SWAIT   INTR   SSYS
  0   0.06   5.0%   0.0%   0.6%   94.4%   0.0%   0.0%   0.0%   0.0%
  1   0.06   0.0%   0.0%   0.8%   99.2%   0.0%   0.0%   0.0%   0.0%
  2   0.06   0.8%   0.0%   0.0%   99.2%   0.0%   0.0%   0.0%   0.0%
  3   0.06   0.0%   0.0%   0.2%   99.8%   0.0%   0.0%   0.0%   0.0%
  4   0.00   0.0%   0.0%   0.0%  100.0%   0.0%   0.0%   0.0%   0.0%
  5   0.00   0.2%   0.0%   0.0%   99.8%   0.0%   0.0%   0.0%   0.0%
---   ----  -----  -----  -----   -----  -----  -----  -----  -----
avg   0.04   1.0%   0.0%   0.2%   98.8%   0.0%   0.0%   0.0%   0.0%

Memory: 493412K (229956K) real, 504048K (253952K) virtual, 767868K free  Page# 1/49

CPU TTY   PID USERNAME PRI NI   SIZE    RES STATE    TIME %WCPU  %CPU COMMAND
  0   - 26835 applmgr  154 20 30948K 11936K sleep    0:49  3.91  3.90 f45runw
  2   - 27210 applmgr  154 20 31316K 12836K sleep    0:49  1.91  1.91 f45runw
  5   ?    36 root     152 20     0K     0K run     56:28  1.16  1.16 vxfsd
  1   ?   347 root     154 20    32K    96K sleep  567:15  1.11  1.11 syncer
  5   - 27429 oracle   154 20 20736K  2608K sleep    0:23  0.39  0.38 oraclePROD
  4   - 27067 oracle   154 20 21984K  3792K sleep    1:31  0.36  0.36 oraclePROD
Using svmon on AIX
root@AIX1 [/]#svmon
               size      inuse       free        pin    virtual
memory      1048566    1023178       4976      55113     251293
pg space     524288      10871
               work       pers       clnt
pin           55116          0          0
in use       250952     772224          2
Where:
size = the number of real memory frames (size of real memory)
inuse = the number of frames containing pages
pin = the number of frames containing pinned pages
The svmon command can also be used with the -P option to
display characteristics for a specific process ID (PID):
Root> svmon -P 26060
-------------------------------------------------------------------------------
     Pid Command          Inuse      Pin     Pgsp  Virtual 64-bit Mthrd
   26060 pr                6871     1607     1022     6001      N     N

    Vsid  Esid Type Description              Inuse   Pin  Pgsp Virtual Addr Range
   24029     d work shared library text       3992     0    22    2779 0..65535
       0     0 work kernel seg                2509  1606   926    2897 0..32767 :
                                                                       65475..65535
   105e4     2 work process private            188     1    48     230 0..273 :
                                                                       65298..65535
   285ea     f work shared library data         92     0    26      95 0..919
   185e6     1 pers code,/dev/lvs001:301        81     0     -       - 0..149
   6c59b     - pers /dev/lvs001:92402            6     0     -       - 0..9
   744fd     - pers /dev/lvs001:763909           3     0     -       - 0..9
   7c5ff     - pers /dev/lvs001:1327130          0     0     -       - 0..29
The w command
The w command shows the "load average", which is computed
from the current runqueue values. The w command also shows the same
information that uptime does.
$ w
 22:42:14 up 2:34, 2 users, load average: 0.00, 0.00, 0.00
USER     TTY      LOGIN@   IDLE    JCPU   PCPU   WHAT
terry    :0       20:10    ?xdm?   5:24   1.49s  gnome-session
terry    pts/1    22:22    0.00s   0.24s  0.04s  /usr/sbin/sshd
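If the load averages need to be captured by a script rather than read by eye, the header line printed by w or uptime can be parsed. The sketch below is a minimal example; the function name `parse_load_averages` is hypothetical, and the pattern assumes the "load average:" or "Load averages:" wording seen in the samples in this article, which may differ on other platforms.

```python
import re

# Matches "load average: a, b, c" as well as "Load averages: a, b, c".
LOAD_RE = re.compile(
    r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", re.IGNORECASE
)

def parse_load_averages(header):
    """Extract the three load averages from a w/uptime/top header line."""
    match = LOAD_RE.search(header)
    if match is None:
        raise ValueError("no load average found in: " + header)
    return tuple(float(value) for value in match.groups())

header = "22:42:14 up 2:34, 2 users, load average: 0.00, 0.00, 0.00"
print(parse_load_averages(header))
```

The same function accepts the "Load averages: 0.04, 0.03, 0.03" line from the top display shown earlier.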
Using SAR
The sar utility (System
Activity Reporter) is quite popular on HP/UX,
and is becoming widely available for AIX and Solaris systems. sar has much of
the same functionality as vmstat, but provides additional details.
There are four major flags
in sar:
sar -u = to see CPU
sar -w = for swapping
sar -b = for buffer activity
sar -d = for disk usage
sar -w (memory switching and swapping activity)
swpin/s   Number of process swapins per second
swpot/s   Number of process swapouts per second
bswin/s   Number of 512-byte swap-ins per second
bswot/s   Number of 512-byte swap-outs per second
pswch/s   Number of process context switches per second
ROOT-/ >sar -w 5 5
HP-UX corp-hp1 B.11.00 U 9000/800    08/09/00
19:37:57 swpin/s bswin/s swpot/s bswot/s pswch/s
19:38:02    0.00     0.0    0.00     0.0     222
19:38:07    0.00     0.0    0.00     0.0     314
19:38:12    0.00     0.0    0.00     0.0     280
19:38:17    0.00     0.0    0.00     0.0     295
19:38:22    0.00     0.0    0.00     0.0     359
Average     0.00     0.0    0.00     0.0     294
sar -u (CPU report)
cpu     CPU number (only on a multi-processor system with the -M option)
%usr    user mode
%sys    system mode
%wio    idle with some process waiting for I/O
%idle   otherwise idle
ROOT-/ >sar -u 2 5
HP-UX burleson B.11.00 U 9000/800    08/09/00
08:37:06    %usr    %sys    %wio   %idle
08:37:07      43      57       0       0
08:37:08      45      55       0       0
08:37:09      44      56       0       0
08:37:10      44      56       0       0
08:37:11      43      57       0       0
08:37:12      52      48       0       0
08:37:13      49      51       0       0
08:37:14      49      51       0       0
08:37:15      57      43       0       0
08:37:16      65      35       0       0
08:37:17      40      29      12      19
08:37:18      23      20      12      44
08:37:19       0       1       0      99
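A report like the one above is easy to post-process. The following sketch parses sar -u data rows and picks out the samples where the CPUs had no idle time at all. The function names `parse_sar_u` and `saturated` are hypothetical, and the parser assumes the whole-number, four-column layout of this HP/UX sample; other platforms print different columns or decimal values.

```python
def parse_sar_u(lines):
    """Parse 'HH:MM:SS %usr %sys %wio %idle' data rows into dicts."""
    samples = []
    for line in lines:
        fields = line.split()
        # Keep only data rows: a timestamp followed by four whole numbers.
        if len(fields) == 5 and all(f.isdigit() for f in fields[1:]):
            usr, sys_pct, wio, idle = (int(f) for f in fields[1:])
            samples.append({"time": fields[0], "usr": usr, "sys": sys_pct,
                            "wio": wio, "idle": idle})
    return samples

def saturated(samples):
    """Return the samples in which the CPUs were neither idle nor waiting on I/O."""
    return [s for s in samples if s["idle"] == 0 and s["wio"] == 0]

report = """\
08:37:06 %usr %sys %wio %idle
08:37:15 57 43 0 0
08:37:16 65 35 0 0
08:37:17 40 29 12 19
08:37:19 0 1 0 99
""".splitlines()

samples = parse_sar_u(report)
print(len(samples), "samples,", len(saturated(samples)), "fully busy")
```

A run of fully busy samples shows that the CPUs are saturated, but as noted for vmstat, only the run queue tells you whether work is actually waiting.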
sar -b (buffer activity report)
bread/s   Number of physical reads per second from disk
bwrit/s   Number of physical writes per second
lread/s   Number of reads per second from buffer cache
lwrit/s   Number of writes per second to buffer cache
%rcache   Buffer cache hit ratio for read requests
%wcache   Buffer cache hit ratio for write requests
pread/s   Number of reads per second from character (raw) devices
pwrit/s   Number of writes per second to character (raw) devices
root>sar -b 1 6
HP-UX corp-hp1 B.11.00 U 9000/800    08/09/00
19:44:53 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
19:44:54       0      91     100       9      19      53       0       0
19:44:55       0       0       0       0       5     100       0       0
19:44:56       0       6     100       9       8       0       0       0
19:44:57       0      30     100       9      20      55       0       0
19:44:58       0       1     100       0       3     100       0       0
19:44:59       0       1     100       9       4       0       0       0
Average        0      22     100       6      10      39       0       0
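The two hit-ratio columns can be recomputed from the physical and logical counts. The sketch below assumes the conventional definition, 1 - physical/logical expressed as a percentage, floored at zero and reported as zero when there were no logical requests. That assumption reproduces every per-second row in the sample above, but the function name `cache_hit_pct` and the rounding are illustrative choices, not taken from the sar source.

```python
def cache_hit_pct(physical, logical):
    """Buffer cache hit ratio as a whole percentage.

    Assumed formula: (1 - physical / logical) * 100, never below 0,
    and 0 when no logical requests were made in the interval.
    """
    if logical == 0:
        return 0
    return max(0, round((1 - physical / logical) * 100))

# (bwrit/s, lwrit/s, reported %wcache) taken from the sample rows above
rows = [(9, 19, 53), (0, 5, 100), (9, 8, 0), (9, 20, 55), (0, 3, 100), (9, 4, 0)]
for bwrit, lwrit, reported in rows:
    print(bwrit, lwrit, cache_hit_pct(bwrit, lwrit), reported)
```

The Average line will not always match this calculation exactly, because it is derived from values that have already been rounded for display.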
Using sadc
The sadc utility (system activity data collector), part of the System
Activity Report package, is popular because it can be run from cron to
schedule collections of server statistics.
All of the sadc scripts
are located in the /usr/lbin/sa directory. These scripts must be run as root
and provide detailed server information. One of the most popular sadc scripts
is sa1:
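#! /usr/bin/sh
# @(#) $Revision: 72.3 $
# sa1.sh
# Collect system activity data into a daily binary file.
DATE=`date +%d`               # day of the month
ENDIR=/usr/lbin/sa            # directory that holds sadc
DFILE=/var/adm/sa/sa$DATE     # daily data file
cd $ENDIR
if [ $# = 0 ]
then
    # No arguments: take one sample at a 1-second interval
    exec $ENDIR/sadc 1 1 $DFILE
else
    # Otherwise pass the caller's interval and count through to sadc
    exec $ENDIR/sadc $* $DFILE
fi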
Using glance to monitor Oracle CPU
For complete details, see my notes on monitoring Oracle with glance.
The glance utility is
provided on HP/UX systems to provide a graphical display of server performance.
It displays current CPU, memory, disk and swap consumption, and also reports on
the top processes.

Using the vmstat utility to monitor Oracle
The UNIX vmstat utility is especially useful
for monitoring the performance of Oracle databases. You'll find vmstat on almost
all implementations of UNIX, including Linux. For details, see the notes on
monitoring Oracle CPU with vmstat and on
building a CPU monitor for Oracle.
The vmstat utility is the
most common UNIX monitoring utility. It is found on virtually all dialects of UNIX
(vmstat is called osview on IRIX), and it quickly displays server values.
These values include:
r = runqueue -
When this value exceeds the number of CPUs (on AIX, lsdev -C|grep Proc|wc -l),
the server is experiencing a CPU bottleneck.
pi = page in - Any
non-zero value suggests that the server is short on memory and that memory
pages are being exchanged with the swap disk. However, this can also occur when
numerous programs are accessing their memory for the first time, so always
remember to check the scan rate 'sr' column. If both are non-zero, then you are
short on RAM.
sr = scan rate - If we see 'sr' rising steadily, we
know that the paging daemon is busy scanning for memory pages to free.
For AIX and HP/UX, vmstat
provides the following CPU values. These values are expressed as percentages
and will sum to 100:
us = user CPU percentage
sy = system CPU percentage
id = idle CPU percentage
wa = wait CPU percentage
When us+sy approaches 100,
the CPUs are busy, but not necessarily overloaded. Only the run queue
value determines CPU overload, and only when 'r' exceeds the number of CPUs on
the server.
When 'wa' values exceed 20,
more than 20% of the processing time is spent waiting for a resource, usually I/O. It is
common to see high wa values during backups and exports, but high wa values can
also indicate an I/O bottleneck.
>vmstat 3
kthr     memory             page              faults        cpu
----- ----------- ------------------------ ------------ -----------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 0  0 84283   207   0   1   1  59  174   0 178   40 142 18  4 75  4
 0  0 84283   187   0   4   0   0    0   0 144  294  70  2  1 91  6
 0  0 84283   184   0   0   0   0    0   0 171  740  99  5  2 89  4
 0  0 84283   165   0   0   0   0    0   0 173  193  98  1  8 52 40
 0  0 84283   150   0   3   0   0    0   0 205  615 136  4  2 87  6
 0  0 84283   141   0   1   0   0    0   0 281  935 192  5  0 91  4
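The rules of thumb given above for r, pi, sr and wa can be applied mechanically to each vmstat sample. The sketch below is a minimal illustration that assumes the 17-column AIX layout shown in this listing. The names `parse_vmstat_row` and `diagnose` are hypothetical, the first "sy" column (system calls) is renamed `sy_calls` to avoid clashing with the CPU "sy" column, and the CPU count of 4 is an assumed value for the example.

```python
AIX_COLUMNS = "r b avm fre re pi po fr sr cy in sy_calls cs us sy id wa".split()

def parse_vmstat_row(line):
    """Map one AIX-style vmstat data row onto its 17 column names."""
    values = [int(v) for v in line.split()]
    if len(values) != len(AIX_COLUMNS):
        raise ValueError(
            "expected %d columns, got %d" % (len(AIX_COLUMNS), len(values))
        )
    return dict(zip(AIX_COLUMNS, values))

def diagnose(row, ncpus, wa_limit=20):
    """Apply the rules of thumb from the text to one parsed sample."""
    findings = []
    if row["r"] > ncpus:                    # run queue longer than CPU count
        findings.append("cpu bottleneck")
    if row["wa"] > wa_limit:                # more than 20% waiting, usually on I/O
        findings.append("high I/O wait")
    if row["pi"] > 0 and row["sr"] > 0:     # paging in while the scanner is active
        findings.append("short on RAM")
    return findings

row = parse_vmstat_row("0 0 84283 165 0 0 0 0 0 0 173 193 98 1 8 52 40")
print(diagnose(row, ncpus=4))
```

Run against the fourth sample above, this reports only the high I/O wait (wa = 40). A single sample proves little, so in practice the checks are more useful when applied across a series of intervals.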
vmstat for Solaris
The display format for
vmstat in Solaris is quite different from AIX and HP/UX. In Solaris the 'vmstat
-n' command is used to display server stats. The relevant columns are:
pi = page-ins
us = CPU user time
sy = CPU system time
id = CPU idle time
r = runqueue - If this exceeds the number of CPUs then you are CPU-bound
In the example below, we
sample an overstressed Oracle server. Note that us + sy = 100, and that the r
value far exceeds the 32 CPUs on this server:
root> vmstat -n 1
       memory                    page                      faults
    avm   free   re   at   pi   po   fr   de   sr     in     sy    cs
  41128 118400 4424   92    0   11   90    0    0   1124  77234  4113
CPU
    cpu        procs
 us sy id    r  b  w
 49 51  0  100  2  0
 46 54  0
 49 51  0
 42 58  0
  54719 115379 4508  105    0   10  102    0    0   1107  78021  3912
 44 56  0   67  2  0
 56 44  0
 58 42  0
 45 55  0
  54719 118479 4305  113    0   10  116    0    0   1070  75044  4085
 41 59  0   67  2  0
 56 44  0
 50 50  0
 50 50  0
  54719 125113 4088  124    0   10  124    0    0   1055  75103  4520
 52 48  0   67  2  0
 50 50  0
 65 35  0
 53 47  0
  54719 141189 3659  116    0    9  127    0    0   1065  71355  4882
 60 40  0   67  2  0
 60 40  0
 61 39  0
 61 39  0
  54719 178306 3113  104    0    9  309    0    0   1075  64446  4741
  4 15 81   67  2  0
  9 13 78
 16  9 75
 10  9 81