 |
|
Oracle RAC GCS Monitoring
Oracle Database Tips by Donald Burleson |
Global Cache Services
(GCS) Monitoring
The use of the GCS relative to the number of
buffer cache reads, or logical reads can be estimated by dividing
the sum of GCS requests (global
cache gets + global
cache converts +
global cache cr blocks received
+ global cache current
blocks received ) by the number of logical reads (consistent
gets + db block gets
) for a given statistics collection interval.
A global cache service
request is made in Oracle when a user attempts to access a buffer
cache to read or modify a data block and the block is not in the
local cache. A remote cache read, disk read or change access
privileges is the inevitable result. These are logical read related.
Logical reads form a superset of the global cache service
operations. The calculation for
global cache hit ratio since instance startup is:
SELECT
a.inst_id "Instance",
(A.VALUE+B.VALUE+C.VALUE+D.VALUE)/(E.VALUE+F.VALUE)
"GLOBAL CACHE HIT RATIO"
FROM
GV$SYSSTAT
A,
GV$SYSSTAT
B,
GV$SYSSTAT
C,
GV$SYSSTAT
D,
GV$SYSSTAT
E,
GV$SYSSTAT F
WHERE
A.NAME='gc
gets'
AND
B.NAME='gc converts'
AND
C.NAME='gc cr blocks received'
AND D.NAME='gc current
blocks received'
AND E.NAME='consistent gets'
AND F.NAME='db block gets'
AND B.INST_ID=A.INST_ID
AND C.INST_ID=A.INST_ID
AND D.INST_ID=A.INST_ID
AND E.INST_ID=A.INST_ID
AND F.INST_ID=A.INST_ID;
SEE CODE DEPOT FOR MORE SCRIPTS
Instance GLOBAL CACHE HIT RATIO
---------- ----------------------
1 .02403656
2 .014798887
The instance with the best access to the drives,
or the faster I/O path, will likely have the best
cache hit ratio. This is
due to the way Oracle's RAC caching algorithm works as it may decide
that the cost of doing a local read is higher than reading into the
other cache and siphoning it across the cluster interconnect. In
formula form:
(gc gets + gc converts + gc cr blocks
received +
gc current blocks received) / (consistent
gets + db block gets)
Blocks frequently requested by local and remote
users will be very hot. If a block is hot, its transfer is delayed
for a few milliseconds to allow the local users to complete their
work. The following ratio provides a rough estimate of how prevalent
this is:
SELECT
A.INST_ID "Instance",
A.VALUE/B.VALUE "BLOCK
TRANSFER RATIO"
FROM
GV$SYSSTAT A, GV$SYSSTAT B
WHERE
A.NAME='gc defers'
AND B.NAME='gc current
blocks served'
AND B.INST_ID=A.INST_ID;
SEE CODE DEPOT FOR MORE SCRIPTS
Instance BLOCK TRANSFER RATIO
---------- --------------------
1 .052600105
2 .078004479
If the above SELECT generates a ratio of more
than 0.3, a fairly hot data set is indicated. If this is the case,
blocks involved in busy waits should be analyzed. The following
columns should be queried to find the blocks involved in busy waits:
n
name
n
kind
n
forced_reads
n
forced_writes
For example:
col
instance format 99999999
col name format a20
col kind format a10
set lines 80 pages 55
Select
INST_ID "Instance",
NAME,
KIND,
sum(FORCED_READS) "Forced
Reads",
sum(FORCED_WRITES) "Forced
Writes"
FROM GV$CACHE_TRANSFER
WHERE owner#!=0
GROUP BY INST_ID,NAME,KIND
ORDER BY 1,4 desc,2
/
SEE GRID CODE DEPOT FOR DOWNLOAD
Instance NAME
KIND Forced Reads Forced Writes
--------- --------------------
---------- ------------ -------------
1 MOD_TEST_IND
INDEX 308 0
1 TEST2
TABLE 64 0
1 AQ$_QUEUE_TABLES
TABLE 5 0
2 TEST2
TABLE 473 0
2 MOD_TEST_IND
INDEX 221 0
2 AQ$_QUEUE_TABLES
TABLE 2 0
These values come from the
gv$cache_transfer
view. Alternatively, the
cr_requests and
current_requests columns in
gv$cr_block_server can be examined. Also, the values shown
for the global cache busy ,
buffer busy global cache ,
and buffer busy global cr
statistics from the
gv$sysstat view should be examined.
SELECT
INST_ID,
sum(CR_REQUESTS) "CR
Requests",
sum(CURRENT_REQUESTS)
"Current Requests"
FROM
GV$CR_BLOCK_SERVER
GROUP BY
INST_ID;
INST_ID CR Requests Current
Requests
---------- -----------
----------------
1
28940 2244
2
31699 837
SELECT
inst_id "Instance",
event "Wait Event",
total_waits,
time_waited
FROM
GV$SYSTEM_EVENT
WHERE
event in (
'global cache busy',
'buffer busy global
cache',
'buffer busy global CR')
ORDER BY
INST_ID;
SEE CODE DEPOT FOR MORE SCRIPTS
Instance Wait Event TOTAL_WAITS
TIME_WAITED
--------- ------------------------
----------- -----------
1 buffer busy global CR
1 0
1 global cache busy
1073 7171
2 global cache busy
973 7524
If a problem is discovered, the object causing
the problem should be identified along with the instance that is
accessing the object, and how the object is being accessed. If
necessary, the contention can be alleviated by:
n
Reducing hot spots by spreading the accesses to index
blocks or data blocks.
n
Using Oracle hash or range partitions wherever
applicable, just as it would be done in single instance Oracle
databases.
n
Reducing concurrency on the object by implementing
load balancing or resource management. For example, decrease the
rate of modifications to that object by using fewer database
processes.
In RAC, as in a single instance Oracle database,
blocks are only written to disk for aging, cache replacement, or
checkpoints. When a data block is replaced from the cache due to
aging or when a checkpoint occurs and the block was previously
changed in another instance but not written to disk, Oracle sends a
message to notify the other instance that Oracle will perform a
fusion write to move the data block to disk.
These fusion writes are monitored with the
following ratio. It reveals the proportion of writes that Oracle
manages.
SELECT
a.inst_id "Instance",
A.VALUE/B.VALUE "Cache Fusion
Writes Ratio"
FROM
GV$SYSSTAT A,
GV$SYSSTAT B
WHERE
a.name='DBWR fusion
writes'
AND b.name='physical writes'
AND b.inst_id=a.inst_id
ORDER BY
A.INST_ID;
Instance Cache Fusion Writes
Ratio
---------
-------------------------
1
.216290958
2
.131862042
The larger this ratio is, the higher the number
of written blocks that have been copied with their previous changes
between the RAC instances. A large ratio is the result of:
n
Insufficiently sized caches.
n
Insufficient checkpoints.
n
Large numbers of buffers written due to cache
replacement or checkpointing.
For example, 0.21 means that 21% of the buffers
written to disk were globally dirty. A fusion write does not involve
an additional write to disk. A fusion write does require messaging
to arrange the transfer with the other instances. This indicates
that fusion writes are in fact a subset of all the instance's
physical writes.
|