On most operating systems, CPU time is broken into user, system, wait
and idle components, and it's important to understand that
Oracle does not have direct access to CPU utilization
statistics. While this note discusses CPU
utilization from the server side (because many instances share
the same CPUs), the Oracle documentation refers to Oracle's
"secondary" CPU metrics, essentially how Oracle "perceives" its
requests for CPU services.
It's an instrumentation issue, and it's all about how Oracle
captures the CPU consumption information from the OS.
Remember, unless you are using VMware or a processor
affinity tool, any instance shares the same pool of CPU
resources with the application and possibly dozens of other instances.
- It is possible to have a "server-side" CPU bottleneck (runqueue > CPU
count) while the instance is not even aware of it.
- It is possible for an instance to experience an "application-side"
CPU bottleneck, even though the OS runqueues are low.
The gap here is clear: we must separate the server-side CPU
metrics from the Oracle instance-centric perceptions of the CPU
state.
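The server-side test in the first bullet (runqueue > CPU count) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes the 1-minute load average is an acceptable stand-in for the run queue length. On Linux the load average also counts tasks in uninterruptible sleep, so it can overstate the run queue.

```python
import os


def server_side_cpu_bottleneck(run_queue: float, cpu_count: int) -> bool:
    """Return True when the run queue is longer than the number of CPUs."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return run_queue > cpu_count


def run_queue_estimate() -> float:
    """Approximate the run queue with the 1-minute load average.

    Assumption: load average is used as a proxy for run queue length.
    os.getloadavg() is only available on Unix-like systems.
    """
    return os.getloadavg()[0]
```

On a Unix host, server_side_cpu_bottleneck(run_queue_estimate(), os.cpu_count() or 1) gives the server-wide answer. Nothing in this check looks inside any Oracle instance, which is exactly the separation described above.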
The OS does not provide full CPU metrics to executing tasks (it
would slow down the processors), so by definition, Oracle's
"perception" of CPU is limited by what the external environment
can offer. Oracle may only be a small percentage of the CPU
load, and CPU utilization is wholly external to the instance.
References on Oracle CPU consumption
Charles Hooper offers these notes on high CPU consumption:
Interestingly, "Oracle Performance Tuning 101" (2001) by
Gaja Vaidyanatha states:
"One of the classic myths about CPU utilization is that a system with 0 percent
idle is categorized as a system undergoing CPU bottlenecks... It is perfectly
okay to have a system with 0 percent idle, so long as the average runnable queue
for the CPU is less than (2 x number of CPUs)."
The above quote seems to at least partially support your suggestion.
However, here are just a handful of references that state 100% utilization is not optimal:
"Optimizing Oracle Performance" page 264, by Cary Millsap:
"But be careful: pegging CPU utilization at 100% over
long periods often causes OS scheduler thrashing, which can reduce
throughput. On interactive-only systems, CPU utilization that stays to the
right of the knee over long periods is bad. The goal of an interactive-only
system user is minimized response time. When CPU utilization exceeds
the knee in the response time curve, response time fluctuations become
unbearable."
"Forecasting Oracle Performance" (Page 71)
"With the CPU subsystem shown in Figure
3-7, queuing does not set in (that is response time does not significantly
change) until utilization is around 80% (150% workload increase). The CPU queue
time is virtually zero and then skyrockets because there are 32 CPUs. If the
system had fewer CPUs, the slope, while still steep, would have been more
gradual."
"Forecasting Oracle Performance" (Page 195)
"The high-risk solution would need to
contain at least 22 CPUs. Because the reference ratios came from a 20 CPU
machine, scalability is not significant. However, recommending a solution at 75%
utilization is significant and probably reckless. At 75% utilization, the
arrival rate is already well into the elbow of the curve. It would be extremely
rare to recommend a solution at 75% utilization."
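The "knee" and "elbow" behavior in the quotes above can be illustrated with a textbook multi-server queue. The sketch below assumes an M/M/m model (Erlang C) with identical CPUs and a service time of 1.0; this is an assumption made for illustration and is not necessarily the exact model behind the book's figures. The function names are hypothetical.

```python
def erlang_c(servers: int, utilization: float) -> float:
    """Probability that an arriving request must queue in an M/M/m system."""
    if servers < 1:
        raise ValueError("servers must be at least 1")
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    offered_load = servers * utilization
    # Build Erlang B iteratively (numerically stable), then convert to Erlang C.
    blocking = 1.0
    for k in range(1, servers + 1):
        blocking = offered_load * blocking / (k + offered_load * blocking)
    return blocking / (1.0 - utilization * (1.0 - blocking))


def queue_time(servers: int, utilization: float, service_time: float = 1.0) -> float:
    """Mean time spent waiting for a CPU, in the same units as service_time."""
    wait_probability = erlang_c(servers, utilization)
    return wait_probability * service_time / (servers * (1.0 - utilization))


if __name__ == "__main__":
    # Compare a 32-CPU and a 4-CPU system at increasing utilization.
    for pct in (50, 70, 80, 90, 95):
        rho = pct / 100.0
        print(pct, round(queue_time(32, rho), 4), round(queue_time(4, rho), 4))
```

Under this model the 32-CPU queue time stays close to zero at moderate utilization and then climbs sharply, while the 4-CPU system starts queuing much earlier and rises more gradually, which matches the shape the quotes describe.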
MOSC Note:148176.1 Diagnosing hardware configuration
induced performance problems (very short snippet):
"In general your utilization on anything should never be
over 75-80%..."
10g R2 Performance Tuning Guide (paraphrase):
"During peak workload hours, 90% CPU utilization is acceptable."
http://download.oracle.com/docs/cd/A95434_01/a86676/sizing.htm
"In addition to the minimum installation recommendations, your hardware
resources need to be adequate for the requirements of your specific
applications. To avoid hardware-related performance bottlenecks, each hardware
component should operate at no more than 80% of capacity."
http://www.oracle.com/technology/products/ias/portal/pdf/oow_10gr2_1337_pepper.pdf
Page 26: "When CPU utilization rises above 80%, the system overhead increases
significantly to handle other tasks. The lifespan of each child process is
longer and, as a result, the memory usage supporting those active concurrent
processes increases significantly. At stable load, 10% login, and CPU
utilization below 80%, the memory usage formula is as follows..."
Page 27: "When system load generates a high CPU utilization (>90%) some of the
constituent processes do not have enough CPU resource to complete within a
certain time and remain 'active'."
http://download-west.oracle.com/docs/cd/B19306_01/server.102/b28051.pdf
"Workload is an important factor when evaluating the level of resource
utilization for your system. During peak workload hours, 90 percent utilization
of a resource, such as a CPU with 10 percent idle and waiting time, can be
acceptable. However, if your system shows high utilization at normal workload,
then there is no room for additional workload."
http://download.oracle.com/docs/cd/B19306_01/server.102/b25159/configbp.htm#sthref371
"If you are experiencing high load (excessive CPU utilization of over 90%,
paging and swapping), then you need to tune the system before proceeding with
Data Guard. Use the V$OSSTAT or V$SYSMETRIC_HISTORY view to monitor system usage
statistics from the operating system."
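As a rough sketch of how the V$OSSTAT figures mentioned in the last quote might be turned into a utilization percentage: the BUSY_TIME and IDLE_TIME statistics are cumulative, so utilization over an interval comes from the difference between two snapshots. The helper names and the sample numbers below are hypothetical, and the sketch assumes BUSY_TIME plus IDLE_TIME accounts for all CPU time in the interval.

```python
# Each snapshot is assumed to come from a query such as:
#   SELECT stat_name, value FROM v$osstat
#   WHERE stat_name IN ('BUSY_TIME', 'IDLE_TIME');
# loaded into a dict keyed by stat_name.


def cpu_utilization_pct(first: dict, second: dict) -> float:
    """Percent CPU busy between two cumulative V$OSSTAT snapshots."""
    busy = second["BUSY_TIME"] - first["BUSY_TIME"]
    idle = second["IDLE_TIME"] - first["IDLE_TIME"]
    total = busy + idle
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    return 100.0 * busy / total


def exceeds_threshold(first: dict, second: dict, threshold_pct: float = 90.0) -> bool:
    """True when utilization over the interval is above threshold_pct."""
    return cpu_utilization_pct(first, second) > threshold_pct


if __name__ == "__main__":
    # Hypothetical sample values, not taken from a real system.
    earlier = {"BUSY_TIME": 1000, "IDLE_TIME": 9000}
    later = {"BUSY_TIME": 1950, "IDLE_TIME": 9050}
    print(cpu_utilization_pct(earlier, later), exceeds_threshold(earlier, later))
```

Note that this is the server-wide figure as the OS reports it, shared by every instance and application on the host, not the CPU consumed by one instance.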