 |
|
Identifying High CPU Usage with vmstat
Oracle Database Tips by Donald Burleson |
We can also easily detect
when we are experiencing a busy CPU on the Oracle database server. Whenever the
us (user) plus sy (system) times approach 100, the CPUs are operating at full
capacity as shown in the next listing.
Also see tips for
100% CPU utilization.
Please note that it is not uncommon to see the CPU
approach 100 percent even when the server is not overwhelmed with work. This is
because the UNIX internal dispatchers will always attempt to keep the CPUs as
busy as possible. This maximizes task throughput, but it can be misleading for a
neophyte.
Remember, it is not a cause for concern when the user +
system CPU values approach 100 percent. This just means that the CPUs are
working to their full potential. The only metric that identifies a CPU
bottleneck is when the run queue (r value) exceeds the number of CPUs on the
server.
vmstat 5 1
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa
0 0 217485 386 0 0 0 4 14 0 202 300 210 20 75 3 2
Now let's take a look at
measuring the memory utilization on the server.
Identifying
RAM Memory Bottlenecks
Contention for random access
memory (RAM) has always been a problem for Oracle. All database servers have a
limited amount of available RAM, and this RAM must be shared between the Oracle
database and all external sessions that connect to the server and consume RAM in
their Program Global Area (PGA).
Understanding
UNIX RAM memory paging
As we have noted in chapter
2, most Oracle DBA's rely on the pi column in vmstat to signal
when the server is swapping RAM memory. However, there is more to the story.
There are times when the pi column will be non-zero, even
though there is no real RAM swapping. To illustrate this let's take a simple
example. Suppose that we invoke a 20 Megabyte Oracle executable program, such as
a Pro*C program. We don't need to load all 20MB of the executable into RAM all
at once. Rather, we just want to load those pieces of the executable code that
require immediate execution. Hence, UNIX will use memory frames as necessary
later and rely on the principle of spatial locality to minimize the amount of
pages in our RAM working set.
To manage the memory segments, the UNIX kernel builds a
memory map of the entire program when it starts. Included in this map is a note
on whether the storage is ?in memory? or ?on swap disk.?
As the program starts, it begins accessing some of its
pages that have never been loaded into RAM memory. Hence, you may see vmstat
page-in's when a large number of programs are starting and allocating their RAM
memory.
During normal operation we may see various points in time
when paging in happens a lot and this is not always a cause for concern.
Remember, a UNIX process may page-in when the UNIX program is starting or is
accessing parts of its code that it had not used before.
Paging out (the po column in vmstat) happens frequently
as UNIX prepares for the possibility of a page-in. With UNIX virtual memory we
are always anticipating running out of RAM memory, and a page-out is a method
for being ready for a subsequent page-in. Also, as UNIX processes end they call
the free() system call to free the RAM pages so they can be used by new
processes.
Internals of RAM memory paging
So if RAM paging in pi may
be acceptable and paging out po may be acceptable, how do we tell when the RAM
on a server is overstressed and swapping? One answer is to correlate the UNIX
scan rate with page-in operations. When an Oracle server begins to run low on
RAM memory, the page stealing daemon process awakens and UNIX begins to treat
the RAM memory as a sharable resource, moving memory frames to the swap disk
with paging operations.
The page-stealing daemon operates in two modes. When RAM
memory shortages are not critical, the daemon will steal small chunks of
least-recently-used RAM memory from a program. As RAM resource demands continue
to increase, the page-stealing daemon escalates and begins to page-out entire
programs RAM regions. In short, we cannot always tell if the page-in operations
that we see are normal housekeeping or a serious memory shortage unless we
correlate the activity of the page stealing daemon with the page-in output.
To aid in this, the vmstat utility gives the sr column to
designate the memory page scan rate. If we see the scan rate rising steadily, we
will have hit the page-stealing daemon's first threshold, indicating that entire
program's RAM memory regions are being paged-out to the swap disk. As the
process continues, we will begin to see high page-in numbers as the
entire process in paged-back into RAM memory.
Carefully review the list below from HP/UX vmstat. The
scan rate is the furthest right column, and here we see the value of sr
rising steadily as the page stealing daemon prepares for a page-in. As the sr
value peaks, we see the page-in operation (pi) as the real RAM memory on the
Oracle server is exceeded.
root>
vmstat 2
procs memory page
r b w avm free re at pi po fr de sr
3 0 0 144020 12778 17 9 0 14 29 0 3
3 0 0 144020 12737 15 0 1 34 4 0 8
3 0 0 144020 12360 9 0 1 46 2 0 13
1 0 0 142084 12360 5 0 3 17 0 0 21
1 0 0 142084 12360 3 0 8 0 0 0 8
1 0 0 140900 12360 1 0 10 0 0 0 0
1 0 0 140900 12360 0 0 9 0 0 0 0
1 0 0 140900 12204 0 0 3 0 0 0 0
1 0 0 137654 12204 0 0 0 0 0 0 0
However, before we go into
detecting if your server memory is exceeded, we must first give you a tool for
determining how much memory you have available on your server. Next is a command
that you can issue to see how much RAM exists on your server.
Display RAM Size on DEC-UNIX
In DEC-UNIX, you can use the
uerf command in conjunction with grep to display memory size. For
example:
L 5-11
uerf -r 300 | grep -i
mem
Here, the output of the
uerf command is piped to grep to filter out and display the segments
relating to memory. The ?i option causes grep to find both uppercase and
lowercase strings. With respect to the example shown here, grep ?i mem
looks for both ?MEM? and ?mem.?
Display RAM Size on HP/UX
In HP/UX, the dmesg
command can display memory information:
L 5-12
dmesg
Memory Information:
physical page size = 4096 bytes, logical page size = 4096 bytes
Physical: 5242880 Kbytes,lockable: 4051216 Kbytes,available: 4651796 Kbytes
Display RAM Size on AIX
In IBM's AIX dialect of UNIX, you must issue two separate
commands. You start with the lsdev command followed by the lsattr
command to display the amount of memory on a server. First, execute lsdev
to list all devices. Then pipe that output through grep to filter out everything
not related to memory. That will get you the name of the memory devices that are
installed. For example:
L 5-13
>lsdev -C|grep mem
mem0 Available 00-00 Memory
Here you can see that mem0
is the name of the memory device. Now that you have the name, you can issue the
lsattr ?El command to see the amount of memory on the server. In the
following example, the server has 3GB of RAM installed.
L 5-14
>lsattr -El mem0
size 3064 Total amount of physical memory ?
You must issue the lsattr
?El command separately for each memory device.
Display RAM Size on Solaris
The prtconf command
can also be used on all Solaris servers to quickly see the amount of available
memory:
L 5-15
>prtconf|grep -i mem
Memory size: 2048 Megabytes
memory (driver not attached)
virtual-memory (driver not attached)
Display RAM Size in Linux
In Linux, the free
command can be used to quickly display the amount of RAM memory on the server:
L 5-16
>free
total used free shared buffers cached
Mem: 3728668 504688 3223980 41316 430072 29440
-/+ buffers/cache: 45176 3683492
Swap: 265032 608 264424
Using the Top Utility for Displaying RAM
While each dialect has
unique mechanisms and commands to display RAM, most UNIX servers support the top
utility. The top utility is invoked by issuing the top command from the
UNIX prompt. This will display the total amount of RAM available, and will also
show the usage of both RAM and virtual memory:
L 5-17
root> top
9:43am up 16 days, 22:33, 24 users, load average: 0.00, 0.00, 0.00
123 processes: 122 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: 0.0% user, 0.9% system, 0.0% nice, 99.0% idle
Mem: 257568K av, 244988K used, 12580K free, 88732K shrd, 179772K buff
Swap: 530104K av, 9972K used, 520132K free 15452K cached
PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND
22417 oracle 14 0 904 904 668 R 0 0.9 0.3 0:00 top
1 root 1 0 160 120 88 S 0 0.0 0.0 0:11 init
2 root 0 0 0 0 0 SW 0 0.0 0.0 0:00 kflushd
3 root 0 0 0 0 0 SW 0 0.0 0.0 0:08 kupdate
4 root 0 0 0 0 0 SW 0 0.0 0.0 0:00 kpiod
In this example, we see from
the top utility that we have 257,568 kilobytes of RAM on the server. The top
utility is also great for seeing the top CPU consumer tasks on your server.
Using Glance to See Memory
In HP/UX and Solaris, you can enter the glance
utility in order to see the amount of RAM available. The glance utility displays
a screen showing CPU and memory utilization for the system as a whole, and for
individual processes (see Figure 5-3). For more information on glance, look to
the man pages on your UNIX server. Glance is started by entering glance
from the UNIX prompt, and exited by entering
ctrl-c.
Figure 5-25: A sample glance screen
Glance is a great tool for reactive tuning because it
shows the current status of your Oracle server, along with consumption details
for the CPU and RAM.
RAM Memory and the Swap Disk
Now that we know the amount of RAM on our server,
we are ready to investigate the RAM and swap disk usage for the Oracle server.
Whenever the memory demands of the server exceed that amount of RAM, the virtual
memory facility is invoked. With virtual memory, segments of RAM are moved onto
a special disk segment called the swap disk. The swap disk is a
special segment of disk defined by the systems administrator to hold excess RAM
memory contents. The virtual memory system commonly pages-out memory segments,
and this is not an indicator of a memory problem. However, a page-in operation
indicates that the server has exceeded the amount of available RAM and is
recalling memory segments from the swap disk (see Figure 5-4).
Figure 5-26: RAM memory paging from the swap disk
Swapping (pi) slows down a server because it takes a
large amount of time to copy memory segments from the swap disk back into RAM.
On an Oracle database server, the solution to a page-in problem involves:
-
Smaller SGA Reduce the demand for RAM by
making the SGA smaller. The SGA can be made smaller by reducing the
db_block_buffers on Oracle8i and earlier versions, db_cache_size,
sga_max_size, db_xK_cache_size on Oracle9i or shared_pool_size
or java_pool_size init.ora parameters.
-
More RAM memory Add
additional RAM memory to the server. (Remember that some 32-bit versions of
Oracle cannot use more than 1.7GB of RAM.)
-
Reduce RAM demand Reduce
RAM consumption of a database server by reducing the demands on Program Global
Area (PGA) memory. Oracle parameters such as sort_area_size can greatly
increase the amount of RAM allocated to each connected user's PGA.
A memory-bound database server always experiences paging
from the swap disk. This is displayed in the vmstat utility as the po
(page-out) and pi (page-in) columns of vmstat. Here we see that the database
server is experiencing five page-in and nine page-out operations. You can tell
that a server is having excessive memory requests by looking at page-in
operations.
L 5-18
Root> vmstat 1 2
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi po fr sr cy in sy cs us sy id
wa
0 0 218094 166 0 4 0 4 16 0 202 301 211 14 19 45
22
0 0 218094 166 0 5 9 4 14 0 202 301 211 14 19 45
22
In sum, page-out (po)
operations are a normal part of virtual memory operation, but page-in (pi)
operations indicate that the server has excessive RAM demands. Now let's take a
look at a vmstat metric that is sometimes associated with a disk I/O bottleneck.
Detecting Potential I/O Bottlenecks in AIX
Using IBM's AIX dialect of
UNIX, an I/O bound database server is usually evidenced by a high value in the
wa (wait) column of vmstat. In the next listing, we see that 22 percent of the
CPU time is being used waiting for database I/O.
L 5-19
root> vmstat
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa
0 0 217485 386 0 0 0 4 14 0 202 300 210 14 19 45 22
It is important to note that
a high wait value does not always signal an I/O bottleneck. The CPU wait could
be due to other factors, and this metric simply means that the CPU is waiting
for external OS services.
Now that we understand the basics of the vmstat utility,
let's explore an easy method for capturing vmstat data, storing it inside an
Oracle table, and producing server performance reports.
This is an excerpt from "Oracle9i
High Performance tuning with STATSPACK" by Oracle Press.