Oracle and CPU utilization metrics

Oracle Database Tips by Donald Burleson

If you suspect a CPU utilization problem in your Oracle database, see these important notes on 100% CPU and Oracle. Also see Monitoring CPU with UNIX.

CPU utilization and Oracle

On most operating systems, CPU time is broken into user, system, wait and idle states, and it's important to understand that Oracle does not have direct access to CPU utilization statistics. While this note discusses CPU utilization from the server side (because many instances share the same CPUs), the Oracle documentation refers to Oracle's "secondary" CPU metrics, essentially how Oracle "perceives" its requests for CPU services. It's an instrumentation issue, and it's all about how Oracle captures the CPU consumption information from the OS.

Remember, unless you are using VMware or a processor affinity tool, any instance shares the same pool of CPU resources as the application and dozens of other instances.

- It is possible to have a "server-side" CPU bottleneck (runqueue > cpu count), and the instance is not even aware of it.

- It is possible that an instance may be experiencing an "application side" CPU bottleneck, even though the OS runqueues are low.

The gap here is clear: we must separate the server-side CPU metrics from the Oracle instance-centric perceptions of the CPU state.

The OS does not provide full CPU metrics to executing tasks (doing so would slow down the processors), so by definition, Oracle's "perception" of CPU is limited by what the external environment can offer. Oracle may be only a small percentage of the CPU load, and CPU utilization is wholly external to the instance.
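The server-side condition in the first bullet above (runqueue > cpu count) can be sketched as a small check. This is only an illustration: the function name and the `factor` parameter are hypothetical, and the run queue value is assumed to come from an OS tool such as the `r` column of vmstat, not from the Oracle instance.

```python
def server_side_cpu_bottleneck(run_queue: float, cpu_count: int,
                               factor: float = 1.0) -> bool:
    """Return True when the OS run queue exceeds factor * cpu_count.

    run_queue : average number of runnable tasks reported by the OS
                (assumed to be sampled outside Oracle, e.g. from vmstat).
    cpu_count : number of CPUs on the server.
    factor    : hypothetical tuning knob; 1.0 matches "runqueue > cpu count",
                and 2.0 would match the looser rule of thumb quoted later
                in this note.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return run_queue > factor * cpu_count


# Example: 9 runnable tasks on an 8-CPU server.
print(server_side_cpu_bottleneck(9, 8))
```

Note that this test says nothing about what any single instance perceives; it only describes the server as a whole.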

References on Oracle CPU consumption

Charles Hooper offers these notes on high CPU consumption:

Interestingly, "Oracle Performance Tuning 101" (2001) by Gaja Vaidyanatha states:
"One of the classic myths about CPU utilization is that a system with 0 percent idle is categorized as a system undergoing CPU bottlenecks... It is perfectly okay to have a system with 0 percent idle, so long as the average runnable queue for the CPU is less than (2 x number of CPUs)."

The above quote seems to at least partially support your suggestion.

However, here are just a handful of references that state 100% utilization is not optimal:
"Optimizing Oracle Performance" page 264, by Cary Millsap:

"But be careful: pegging CPU utilization at 100% over long periods often causes OS scheduler thrashing, which can reduce throughput. On interactive-only systems, CPU utilization that stays to the right of the knee over long periods is bad. The goal of an interactive-only system user is minimized response time. When CPU utilization excees the knee in the response time curve, response time fluctuations become unbearable."

"Forecasting Oracle Performance" (Page 71)

"With the CPU subsystem shown in Figure 3-7, queuing does not set in (that is response time does not significantly change) until utilization is around 80% (150% workload increase). The CPU queue time is virtually zero and then skyrockets because there are 32 CPUs. If the system had fewer CPUs, the slope, while still steep, would have been more gradual."

"Forecasting Oracle Performance" (Page 195)

"The high-risk solution would need to contain at least 22 CPUs. Because the reference ratios came from a 20 CPU machine, scalability is not significant. However, recommending a solution at 75% utilization is significant and probably reckless. At 75% utilization, the arrival rate is already well into the elbow of the curve. It would be extremely rare to recommend a solution at 75% utilization."

MOSC Note:148176.1 Diagnosing hardware configuration induced performance problems (very short snippet):

"In general your utilization on anything should never be over 75-80%..."

10g R2 Performance Tuning Guide (paraphrase):

"During peak workload hours, 90% CPU utilization is acceptable. "In addition to the minimum installation recommendations, your hardware resources need to be adequate for the requirements of your specific applications. To avoid hardware-related performance bottlenecks, each hardware component should operate at no more than 80% of capacity."

http://download.oracle.com/docs/cd/A95434_01/a86676/sizing.htm

"In addition to the minimum installation recommendations, your hardware resources need to be adequate for the requirements of your specific applications. To avoid hardware-related performance bottlenecks, each hardware component should operate at no more than 80% of capacity."

http://www.oracle.com/technology/products/ias/portal/pdf/oow_10gr2_1337_pepper.pdf

Page 26: "When CPU utilization rises above 80%, the system overhead increases significantly to handle other tasks. The lifespan of each child process is longer and, as a result, the memory usage supporting those active concurrent processes increases significantly. At stable load, 10% login, and CPU utilization below 80%, the memory usage formula is as follows..."

Page 27: "When system load generates a high CPU utilization (>90%) some of the constituent processes do not have enough CPU resource to complete within a certain time and remain 'active'."

http://download-west.oracle.com/docs/cd/B19306_01/server.102/b28051.pdf

"Workload is an important factor when evaluating the level of resource utilization for your system. During peak workload hours, 90 percent utilization of a resource, such as a CPU with 10 percent idle and waiting time, can be acceptable. However, if your system shows high utilization at normal workload, then there is no room for additional workload."

http://download.oracle.com/docs/cd/B19306_01/server.102/b25159/configbp.htm#sthref371

"If you are experiencing high load (excessive CPU utilization of over 90%, paging and swapping), then you need to tune the system before proceeding with Data Guard. Use the V$OSSTAT or V$SYSMETRIC_HISTORY view to monitor system usage statistics from the operating system."
