This is an excerpt from the book
Oracle Grid & Real Application Clusters.
Most of the events reported in
the dynamic performance views or in a STATSPACK report that show a
high total time are actually normal. However, if monitored response
times increase and the STATSPACK report shows a high proportion of
wait time for cluster accesses, the cause of these waits needs to be
determined. STATSPACK reports provide a breakdown of the wait
events, listing the five events with the highest wait times as
percentages of the total. The following RAC-specific events should
be monitored:
* global cache open s: A block
was selected for SELECT.
* global cache open x: A block
was selected for IUD (INSERT, UPDATE, or DELETE).
* global cache null to s: A
block was transferred for SELECT.
* global cache null to x: A
block was transferred for IUD.
* global cache cr request: A
block was requested for consistent read purposes.
* Global Cache Service
Utilization for Logical Reads.
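As a first check, the cumulative waits for these events can be
pulled directly from the dynamic performance views. The following
query is a sketch against v$system_event; the event names shown are
those used in Oracle 9i and vary by release:

```sql
-- Sketch: report the RAC-related wait events with their total and
-- average wait times. time_waited is in centiseconds.
SELECT event,
       total_waits,
       time_waited,
       ROUND(time_waited / GREATEST(total_waits, 1), 2)
         AS avg_wait_cs
  FROM v$system_event
 WHERE event LIKE 'global cache%'
 ORDER BY time_waited DESC;
```

Comparing successive snapshots of this query brackets the waits to
a specific measurement interval, much as STATSPACK does.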
The following sections provide
more information on these events and explain why they are important
to monitor.
The global cache open s and
global cache open x Events
The initial access of a
particular data block by an instance generates these events. The
duration of the wait should be short, and the completion of the wait
is most likely followed by a read from disk. This wait occurs
because the requested blocks are not cached in any instance of the
cluster database, which necessitates a disk read.
When these events are associated
with high totals or high per-transaction wait times, it is likely
that data blocks are not cached in the local instance and that the
blocks cannot be obtained from another instance, which results in a
disk read. At the same time, suboptimal buffer cache hit ratios may
also be observed. Unfortunately, other than preloading heavily used
tables into the buffer caches, there is little that can be done
about this type of wait event.
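Preloading a heavily used table can be sketched as follows. The
table name sales is illustrative, and the KEEP pool must first be
sized with the buffer_pool_keep (or db_keep_cache_size) parameter:

```sql
-- Assign the hot table to the KEEP buffer pool so its blocks
-- tend to remain cached (table name is illustrative).
ALTER TABLE sales STORAGE (BUFFER_POOL KEEP);

-- A full scan then loads the blocks into the cache ahead of need.
SELECT /*+ FULL(s) */ COUNT(*) FROM sales s;
```

The scan is typically run at instance startup, before the heavy
workload begins, so that first-touch waits fall on the warm-up run
rather than on user sessions.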
The global cache null to s
and global cache null to x Events
These events are generated by
inter-instance block ping across the network. Inter-instance block
ping occurs when two instances repeatedly exchange the same block
back and forth over the interconnect.
Processes waiting for global cache null to s events are waiting for
a block to be transferred from the instance that last changed it.
When one instance repeatedly requests cached data blocks from the
other RAC instances, these events consume a greater proportion of
the total wait time. The only method for reducing these events is to
reduce the number of rows per block to eliminate the need for block
swapping between two instances in the RAC cluster.
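Reducing the rows per block is usually done by rebuilding the
contended object with a high PCTFREE. The following sketch uses
illustrative object names and values:

```sql
-- Rebuild the contended table so each block holds fewer rows,
-- leaving most of the block empty (PCTFREE 90 reserves 90% of
-- each block as free space).
ALTER TABLE orders MOVE PCTFREE 90 PCTUSED 5;

-- ALTER TABLE ... MOVE marks the table's indexes UNUSABLE,
-- so they must be rebuilt afterward.
ALTER INDEX orders_pk REBUILD;
```

The trade-off is deliberate: the table consumes more disk and
buffer space, but fewer sessions on different instances contend for
the same block.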
The global cache cr request Event
This event is generated when an
instance has requested a consistent read data block and the block
has not yet arrived at the requesting instance. Other than
examining the cluster interconnects for possible problems, little
can be done about this event beyond modifying objects to reduce the
possibility of contention.
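To see whether one node in particular is waiting on consistent read
transfers, the cluster-wide gv$ views can be queried from any
instance. A sketch, using the Oracle 9i event name:

```sql
-- Compare cr-request wait time per instance to localize the
-- problem to a particular node.
SELECT inst_id,
       total_waits,
       time_waited
  FROM gv$system_event
 WHERE event = 'global cache cr request'
 ORDER BY inst_id;
```

A single instance dominating the wait time points at that node's
workload or its interconnect link rather than at the cluster as a
whole.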
Global Cache Service Times
When global cache waits
constitute a large proportion of the wait time, as listed on the
first page of the STATSPACK or AWRRPT report, and if response time
or throughput does not conform to service level requirements, the
Global Cache Service workload characteristics on the cluster
statistics page of the STATSPACK or AWRRPT reports should be
examined. The STATSPACK or AWRRPT reports should be taken during
heavy RAC workloads.
If the STATSPACK report shows
that the average GCS time per request is high, it is the result of
one of the following:
* Contention for blocks.
* System loads.
* Network issues.
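The average GCS times that STATSPACK prints can also be computed
directly from v$sysstat. The statistic names below are those of
Oracle 9i and differ in later releases; times in v$sysstat are in
centiseconds, hence the factor of 10 to convert to milliseconds:

```sql
-- Sketch: average GCS convert and get times in milliseconds.
SELECT ROUND(10 * cvt.value / GREATEST(cvts.value, 1), 2)
         AS avg_convert_ms,
       ROUND(10 * gett.value / GREATEST(gets.value, 1), 2)
         AS avg_get_ms
  FROM v$sysstat cvt,
       v$sysstat cvts,
       v$sysstat gett,
       v$sysstat gets
 WHERE cvt.name  = 'global cache convert time'
   AND cvts.name = 'global cache converts'
   AND gett.name = 'global cache get time'
   AND gets.name = 'global cache gets';
```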
Operating system logs and
operating system statistics indicate whether a network link is
congested. A network link can be congested if:
* Packets are being routed
through the public network instead of the private interconnect.
* The sizes of the CPU run queues
are increasing, indicating that the servers are overloaded.
If CPU usage is maxed out and
processes are queuing for the CPU, the priority of the GCS processes
(LMSn) can be raised over other processes to lower GCS times. The
load on the server can also be alleviated by reducing the number of
processes on the database server, increasing capacity by adding CPUs
to the server, or adding nodes to the cluster database.
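To raise the priority of the LMSn processes, their operating system
process IDs must first be identified. A sketch using the background
process views; the OS administrator can then raise the priority of
the returned PIDs (for example, with renice on Unix):

```sql
-- Find the OS process IDs (spid) of the GCS background
-- processes (LMS0, LMS1, ...).
SELECT p.spid,
       b.name
  FROM v$process   p,
       v$bgprocess b
 WHERE b.paddr = p.addr
   AND b.name LIKE 'LMS%';
```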