This is an excerpt from the bestselling book
Oracle Grid & Real Application Clusters. To get immediate
access to the code depot of working RAC scripts, buy it
directly from the publisher and save more than 30%.
The following are undesirable
statistics or statistics for which the values should always be as
near to zero as possible:
* global cache blocks lost: This
statistic shows block losses during transfers. High values indicate
network problems. The use of an unreliable IPC protocol such as UDP
may result in the value for global cache blocks lost being non-zero.
When this occurs, take the ratio of global cache blocks lost divided
by global cache current blocks served plus global cache cr blocks
served. This ratio should be as small as possible. Many times, a
non-zero value for global cache blocks lost does not indicate a
problem because Oracle will retry the block transfer operation until
it is successful.
* global cache blocks corrupt:
This statistic shows if any blocks were corrupted during transfers.
If high values are returned for this statistic, there is probably an
IPC, network, or hardware problem.
An example SELECT to determine
if further examination is needed would be:
SELECT
A.VALUE "GC BLOCKS LOST 1",
B.VALUE "GC BLOCKS CORRUPT 1",
C.VALUE "GC BLOCKS LOST 2",
D.VALUE "GC BLOCKS CORRUPT 2"
FROM GV$SYSSTAT A, GV$SYSSTAT B, GV$SYSSTAT C, GV$SYSSTAT D
WHERE A.INST_ID=1 AND A.NAME='gc blocks lost'
AND B.INST_ID=1 AND B.NAME='gc blocks corrupt'
AND C.INST_ID=2 AND C.NAME='gc blocks lost'
AND D.INST_ID=2 AND D.NAME='gc blocks corrupt';
A sample result from the above
select should look like the following:
GC BLOCKS
LOST 1 GC BLOCKS CORRUPT 1 GC BLOCKS LOST 2 GC BLOCKS CORRUPT 2
---------------- ------------------- ----------------
-------------------
0
0
652
0
In this result, instance 2 is
showing some problems with lost blocks, so it might be useful to
look at the ratio described above:
SELECT
A.INST_ID "INSTANCE", A.VALUE "GC BLOCKS LOST",
B.VALUE "GC CUR BLOCKS SERVED",
C.VALUE "GC CR BLOCKS SERVED",
A.VALUE/(B.VALUE+C.VALUE) RATIO
FROM GV$SYSSTAT A, GV$SYSSTAT B, GV$SYSSTAT C
WHERE A.NAME='gc blocks lost' AND
B.NAME='gc current blocks served' AND
C.NAME='gc cr blocks served' and
B.INST_ID=a.inst_id AND
C.INST_ID = a.inst_id;
Instance gc blocks lost gc cur blocks served gc cr blocks served
RATIO
---------- -------------- -------------------- -------------------
----------
1
0
3923
2734 0
2
652
3008
4380 .088251218
The question now becomes, how
small is "as small as possible"? In this example, database instance
one takes 22 seconds to perform a series of tests and instance two
takes 25 minutes.
Investigation showed that the
TCP receive and send buffers on instance two were set at 64K. Since
this is an 8k-block size instance with a
db_file_multiblock_read_count of 16, this was causing excessive
network traffic because the system was using full table scans
resulting in a read of 128K. In addition, the actual TCP buffer area
was set to a small number. Setting these values for the TCP receive
and send buffers is an operating specific operation. DBA?s should
talk this over with the system operator or check out:
http://www.psc.edu/networking/perf_tune.html
The following section covers
example commands for various operating systems.
DEC Alpha (Digital UNIX)
$ dbx -k /vmunix
(dbx) assign sb_max = (u_long) 524288
(dbx) patch sb_max = (u_long) 524288
The first command changes the
value for the operating system and the second patches the value into
the kernel so it will not be lost at the next reboot. mbclusters can
be modified to at least 832, and the kernel variables tcp_sendspace
and tcp_recvspace cam also be modified in this manner. Under version
4.0, use the sysconfig -r inet <variable> <value> command to do this
type of modification.