All DBA's are required to set-up an Oracle monitoring infrastructure and the
architecture of the monitoring system is directly dependent on the demands of
management. Oracle system cost the end-user community millions of dollars,
and the end-user often demand service-level agreements (SLA's) that impose
strict monitoring tasks for the Oracle professional.
In many shops, the
end-user community is demanding Oracle performance statistics at a detailed
level:
- A financial services company demands that 95% of their Oracle
transactions complete within one wallclock second. The DBA most
develop a monitoring strategy to ensure compliance. (Note: there are
special techniques that can be employed to get
end-to-end
response time within Oracle).
- A stock brokerage demands that the DBA write a sophisticated real-time
proactive monitor. This monitor will quickly diagnose and pinpoint
Oracle bottlenecks, but getting the data is a problem. It requires
real-time access to all internal Oracle control structures. Running
the monitor imposes a measurable burden on overall system load.
- A manufacturing plant requires measuring response time by
hour-of-the-day to ensure fast throughput for all shifts.
These are just a few examples of the business demands that drive the Oracle
professional to create sophisticated monitoring infrastructures. Let's
example some of the most important issues when making these decisions.
The art of gathering Oracle performance data
There are several choices that need to be made when architecting an Oracle
monitoring strategy:
- Proactive or reactive monitoring? - Do we collect real-time data
for crisis management, or collect time-series snapshots for predictive
analytics?
- Collect external statistics? - Oracle does not run in a
vacuum, and the speed of disk, network and CPU have a profound influence on
overall response time.
- What statistics? - Oracle provides hundreds of statistics that
are cryptic and undocumented (i.e.
calls to kcmgas), and
you must decide what metrics are best suited to your business need.
For example, a requirement to measure end-to-end response time would benefit
from monitoring network latency (SQL*Net Roundtrips), but only tangentially
from monitoring data buffer cache efficiency.
- What level of granularity? - What are the tradeoffs between
super-details statistics and system performance? How much data is
required for a statistically significant sample size?
- Build or Buy? - While the structures of Oracle performance
collections are well documented, the Oracle professional needs to determine
whether to purchase off-the-shelf Oracle monitoring tools or build their own
custom interface.

A monitoring report for server-side metrics
So, where do we start? Let's start with Oracle own metadata and cover
these topics:
- Inside Oracle's time-series monitoring infrastructure (AWR, STATSPACK
and ASH)
- Inside the Oracle v$ performance monitoring views
- A review of extra cost monitoring tools and options
- Building your own monitoring architecture
Inside Oracle's monitoring infrastructure
Oracle has already developed a great data collection infrastructure within
the Automated Workload Repository (AWR) which is built-in to the Oracle kernel.
AWR snapshots will rarely impose any measurable system load and by default,
Oracle collects AWR data in hourly snapshots for long periods of time
(determined by the DBA). To complement AWR, Oracle has created the
Automated Session History data collection mechanism, which keeps highly-detailed
session information for a short period of time (Usually an hour, or 30 minutes,
as specified by the DBA).
Within the Oracle core software we see Oracle Corporation balance the issue
of statistics detail and overall system performance with AWR (long term
proactive) vs. ASH (short-term reactive). Oracle's own
monitoring and statistics collection mechanisms (STATSPACK and AWR) are built to
be very unobtrusive because they quickly extract snapshots of accumulators from
the x$ fixed tables.

The real-time ASH table for Oracle monitoring
Using predictive analytics, the AWR and ASH data can be used to predict
future outages and changes in performance, an important benefit for a DBA who
must meet strict SLA requirements.

Using linear regression with AWR data
Most Oracle professionals will monitor for several dozen metrics. To
get more information on AWR and ASH:
Inside Oracle's v$ performance views
The V$ views provide continually metrics at internal statistics. You may hear
the V$ views called dynamic performance views or tables for this reason and
there are over 150 of them. The v$ views are created on top of Oracle x$
fixed tables (which are in-reality C structures) and you can see the mapping of
the x$ tables to the v$ views by examining the v$ creation scripts in
$ORACLE_HOME/rdbms/admin/catalog.sql.
The v$ data is similar to the AWR and ASH data in that they raw data are
accumulators, measuring a metric since startup time. To make v$ information
useful, it's helpful to take repeated samples over a fixed time period and
compute the delta values.
A review of extra cost Oracle monitoring options & tools
Within the world of molecular physics, the scientists say that the act of
observing a sub-atomic behavior will alter the behavior, and we see the same
problem with Oracle system monitoring. In the nanosecond world of the CPU, the
act of collecting statistics about a processing operation can easily exceed the
cost of the instruction itself.
The same is true of some real-time monitors that sample SGA information
directly from the SGA and other third party tools that use server-side agent
processes to monitor Oracle. Before you invest in real-time monitors,
beware of several potential problems:
- On stressed systems, greedy real-time monitors can create more
performance problems than they reveal.
- There are few options to remedy an acute performance problem - It can be
difficult if not impossible to correct some acute performance problems.
I personally recommend the following real-time Oracle monitors
- Confio - Confio Ignite
software is a low footprint tool that allow you to identify perforance
issues in real time.
- Oracle performance pack and diagnostic pack - While not
best-of-breed, these tools are fully integrated with Oracle and fully
supported by Oracle Corporation.
For proactive monitoring tools, there are several choices. Because they
use pre-collected data from STATSPACK and AWR, they impose no burden on the
production system. I personally endorse and sell these proactive Oracle
monitors:
Building your own Oracle monitoring infrastructure
In the classic build vs. buy decision you must weigh the relative costs of
developing a customized Oracle monitoring solution in-house vs. the time costs
for the Oracle professional to cobble together a solution. The relative
benefits of each approach involve:
- Build - You get the benefit of a customized solution that can be
easily changed and enhanced, with customized alert mechanisms. The
downside is that they require significant knowledge for Oracle internals.
Sophisticated Oracle monitors
can take hundreds of hours to develop.
|
- Buy - Buying a tool has several advantages, foremost that you get
quick access to performance data. The downsides are the high costs and
the difficulty in customizing the tool to meet the business monitoring
requirements of your shop.
In sum, Oracle monitoring addresses a critical business need to quality
and response time assurance, and it is the challenge of the Oracle professional
to leverage upon the performance data from the v$ views, and AWR/ASH tables to
monitor and analyze important performance trends within Oracle.
Oracle Monitoring References:
See these related references on Oracle monitoring: