Question: I checked by server and was dismayed to
find my CPU consumption at 100%. How do I reduce Oracle
CPU consumption?
Answer: Remember, all virtual memory servers are designed
to drive CPU to 100% as soon as possible. Hence, 100% CPU utilization
may not be sub-optimal because 100% CPU simply means that all
the processors are busy (that's how the server SMP architecture is designed).
You only have CPU shortages when there are more tasks waiting for
CPU, than you have CPU's (your cpu_count).
Also see Oracle SQL causing
high CPU.
The only way to tell if your server has a CPU bottleneck is when
the CPU runqueue values (per vmstat) exceeds the number of processors on the
server (cpu_count).
There is a "Chicken Little" myth among neophytes who panic when
they see that CPU is 100%, and they do not
understand that 100% CPU utilization this is the desired (and optimal) behavior.
Let's take a closer look at CPU and Oracle databases.
There is a big differences between a database with 100% CPU
usage and a system with CPU enqueues. You can see ?real?
enqueues on CPU resources when the runqueue (r) column in vmstat
exceeds the cpu_count parameter value, and you can also detect
an overloaded CPU when you see the ?resmgr:cpu quantum? event in
a top-5 timed event on a AWR or STATSPACK report.
The myth of 100% CPU being a bottleneck
Billy Verreynne, a practicing software engineer in South
Africa notes that an Oracle database experiencing 100% CPU, by itself, does not indicate a
CPU problem:
"100% CPU alone not an indication of a problem and can indicate an optimal
state"
The book Oracle Performance Tuning 101" (2001) by Gaja
Vaidyanatha notes this common misperception about the 100% CPU
utilization myth:
"One of the classic myths about CPU utilization is that a system with 0
percent idle is categorized as a system undergoing CPU bottlenecks...
It is perfectly okay to have a system with 0 percent idle, so long as the
average runnable queue for the CPU is less than (2 x number of CPUs)."
Please note that it is not uncommon to see the CPU
approach 100 percent even when the server is not overwhelmed with work. This is
because the UNIX internal dispatchers will always attempt to keep the CPUs as
busy as possible. This maximizes task throughput, but it can be misleading for a
neophyte.
Remember, it is not a cause for concern when the user +
system CPU values approach 100 percent. This just means that the CPUs are
working to their full potential. The only metric that identifies a CPU
bottleneck is when the run queue (r value) exceeds the number of CPUs on the
server. The this output below, the 16 CPU server is not experiencing a CPU
shortage because the (r) "runqueue" value is 12, meaning that all Oracle asks
are getting prompt service.
vmstat 5 1
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi po fr sr cy in sy cs
us sy id wa
12 0 217485 386 0 0 0 4 14 0 202 300 210
20 75 3 2
Here is justification from IBM that 100% CPU is "optimal"
(from the
IBM documentation):
"Optimum use would have the CPU working 100 percent of the time.
This holds true in the case of a single-user system with no need to
share the CPU. Generally, if us + sy time is below 90 percent, a
single-user system is not considered CPU constrained. However, if us +
sy time on a multiuser system exceeds 80 percent, the processes may
spend time waiting in the run queue. Response time and throughput might
suffer.
To check if the CPU is the bottleneck, consider the four cpu columns and
the two kthr (kernel threads) columns in the vmstat report. It may also
be worthwhile looking at the faults column:"
Viewing CPU utilization for Oracle
Server statistics can be viewed in a variety of
ways using standard server-side UNIX and Linux tools such as
vmstat,
glance ,
top and
sar . The goal is to
ensure that the database server has enough CPU and RAM resources at
all times in order to manage the Oracle requests.
Oracle has many operations that are CPU intensive, and
tuning can reduce CPU:
-
Logical I/O (consistent gets) has high CPU overhead, and
buffer touches can be reduced via SQL tuning (adding more selective indexes,
materialized views).
-
Library cache contention (high parses) drives-up CPU
Having 100% CPU is not always a problem, it's normal for virtual
memory servers to drive CPU consumption to 100%. Also note that in Oracle10g
and beyond,
we have the _optimizer_cost_model which is set to CPU, from the default
in 9i and earlier of IO. This parameter is for Oracle databases that are
CPU-bound, and it tells Oracle to create the CBO decision tree weights with
estimated CPU consumption, not estimated I/O costs.
See
here for details.
When analyzing vmstat output, there are several metrics to
which you should pay attention. For example, keep an eye on the CPU run queue
column. The run queue should never exceed the number of CPUs on the server.
If
you do notice the run queue exceeding the amount of CPUs, it's a good indication
that your server has a CPU bottleneck. Inside
Oracle, you can display CPU for any Oracle user session with this script:
select
ss.username,
se.SID,
VALUE/100 cpu_usage_seconds
from
v$session ss,
v$sesstat se,
v$statname
sn
where
se.STATISTIC# = sn.STATISTIC#
and
NAME like '%CPU used by this
session%'
and
se.SID = ss.SID
and
ss.status='ACTIVE'
and
ss.username is not null
order by VALUE desc;
For complete scripts to monitor CPU, see the
Oracle script download.See these related notes on CPU consumption: