GC Block Lost Wait Event
No network is perfect. Data transmitted from point A to point B may
occasionally get lost. The same is true for global cache transfers
along the Cluster Interconnect: global cache block transfers can get
lost. If a requested block is not received by the instance within 0.5
seconds, the block is considered lost. Because most block transfers
complete in milliseconds, too many lost global cache block transfers
can hamper application performance: the block must be re-sent, and
the session has to wait again for the second transfer to complete.
Lost global cache block transfers can be seen in two different areas.
The first is the wait events named gc cr block lost and gc current
block lost, which are raised when a consistent read block transfer or
a current block transfer is lost and the session must wait for the
block to be re-sent. The second is the Oracle statistic named gc
blocks lost, which can be seen at the system or session level.
Examples of these two metrics are shown below.
select
   inst_id,
   event,
   total_waits,
   time_waited
from
   gv$system_event
where
   event in ('gc current block lost',
             'gc cr block lost')
order by
   event,
   inst_id;

   INST_ID EVENT                          TOTAL_WAITS TIME_WAITED
---------- ------------------------------ ----------- -----------
         1 gc cr block lost                        50        3029
         2 gc cr block lost                        75        4516
         1 gc current block lost                   26        1467
         2 gc current block lost                   36        2060
select
   sn.inst_id,
   sn.name,
   ss.value
from
   gv$statname sn,
   gv$sysstat ss
where
   sn.inst_id = ss.inst_id
and
   sn.statistic# = ss.statistic#
and
   sn.name = 'gc blocks lost'
order by
   sn.inst_id;

   INST_ID NAME                      VALUE
---------- -------------------- ----------
         1 gc blocks lost               90
         2 gc blocks lost              164
The output above shows the metrics on a per-instance basis. One can
certainly summarize the values across all instances if desired.

The presence of lost blocks in wait events or a system statistic is
not, by itself, cause for great concern. As with any network, an
occasional hiccup may lead to lost block transfers that appear in the
gv$sysstat view. As with any wait event, the metric by itself is
essentially meaningless because the output above provides no context.
Is the wait event a 'top 5' wait event? Were the waits accumulated
over a 1-hour time period or over 1 month? Since we do not know the
answers to these questions, we cannot determine whether the metrics
indicate a problem. More information is needed. An AWR report
covering a 1-hour snapshot interval gives a better indication of
whether a real problem exists.
Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                       Avg
                                                      wait   % DB
Event                             Waits     Time(s)   (ms)   time Wait Class
-------------------------- ------------ ----------- ------ ------ ----------
DB CPU                                        6,975          32.1
db file sequential read       3,831,277       5,809      2   26.8 User I/O
gc current block lost             3,819         942    247    4.3 Cluster
db file parallel read           145,588         854      6    3.9 User I/O
gc cr multi block request       535,685         498      1    2.3 Cluster
Above, the gc current block lost wait event is in the Top 5 list, and
the listing now provides context for the event in question. Among
the wait events (DB CPU is not a wait), it contributes the second
longest total wait time for the instance during the one-hour time
period, behind db file sequential read. However, if the wait event
were totally eliminated, only 4.3% of the total database time would
be recovered. From a performance tuning perspective, where the end
goal is often to reduce processing time, it would be better to focus
on the db file sequential read wait event, which contributes 26.8% of
the total database time, or to determine whether CPU utilization can
be decreased, since DB CPU accounts for 32.1% of the total time. That
being said, it is never a good sign when lost global cache blocks
appear as a top wait event.
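The Avg wait column in the listing can be reproduced from the Waits and Time(s) columns, which is a handy sanity check when reading a report by hand. The following is only an illustrative sketch: the function name avg_wait_ms is made up for this example and is not part of any Oracle tooling, and it assumes a POSIX shell with awk available.

```shell
#!/bin/sh
# Hypothetical helper: average wait in milliseconds, computed from an
# AWR "Waits" count and a "Time(s)" total.
avg_wait_ms() {
    # $1 = number of waits, $2 = total wait time in seconds
    awk -v waits="$1" -v secs="$2" 'BEGIN {
        if (waits <= 0) exit 1              # avoid dividing by zero
        printf "%.0f\n", secs * 1000 / waits
    }'
}

# Figures copied from the Top 5 listing (thousands separators removed).
avg_wait_ms 3819 942        # gc current block lost
avg_wait_ms 3831277 5809    # db file sequential read
```

With these figures the two calls print 247 and 2, matching the report. Each lost current block costs roughly a quarter of a second, against about 1 ms for the gc cr multi block request transfers in the same listing.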
The most common reason for lost global cache blocks is a faulty
private network, i.e. one that is dropping packets. If global cache
lost blocks are seen as a problem, then work with the network
administrator to ensure the switch is functioning correctly, cables
are secure and seated properly, firmware levels are up to date, and
that other network configuration issues are not a problem. The
network administrator should be able to use network tools like
netstat and anything else in their arsenal to check for dropped
packets on the private network.
[root@host01 ~]# netstat -su
IcmpMsg:
    InType0: 91
    InType3: 723
    InType8: 23
    OutType0: 23
    OutType3: 928
    OutType8: 103
Udp:
    664034038 packets received
    983 packets to unknown port received.
    20150 packet receive errors
    654621700 packets sent
UdpLite:
IpExt:
    InMcastPkts: 18041
    OutMcastPkts: 8745
    InBcastPkts: 102377
    OutBcastPkts: 119
    InOctets: 4678332299675
    OutOctets: 2652878623355
    InMcastOctets: 1401313
    OutMcastOctets: 636504
    InBcastOctets: 19312376
    OutBcastOctets: 49090
The netstat utility is reporting UDP packet receive errors,
indicating global cache lost block transfers for this node of the
cluster. In addition to verifying the hardware is correct, the
network administrator should investigate the following:
Private network is truly private
Oversaturated bandwidth due to too much traffic on the network
Quality of Service (QoS) settings that may be downgrading performance
Incorrect Jumbo Frames configuration
Multiple hops between the nodes and the private network switch
Mismatched MTU settings between devices
Mismatch in duplex mode settings between devices
Incorrect bonding/teaming configuration
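The netstat counters are cumulative, so a single reading does not show whether errors are still occurring. One possible approach, sketched below, is to extract the UDP packet receive errors counter and compare two readings taken some time apart. The function names udp_rx_errors and udp_rx_error_delta are invented for this illustration, and the parsing assumes the netstat -su layout shown in the sample output above (an unindented section name such as Udp: followed by indented counters); other netstat versions may format the output differently or omit the line entirely.

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -su` output on stdin and print the
# "packet receive errors" counter from the Udp: section only.
udp_rx_errors() {
    awk '
        /^[A-Za-z]+:/ { section = $1 }   # unindented "Name:" starts a section
        section == "Udp:" && /packet receive errors/ { print $1; found = 1 }
        END { if (!found) exit 1 }       # non-zero status if the counter is absent
    '
}

# Hypothetical helper: errors accumulated between two readings.
udp_rx_error_delta() {
    # $1 = earlier reading, $2 = later reading
    echo $(( $2 - $1 ))
}

# Example use on a cluster node (the 60-second interval is arbitrary):
#   before=$(netstat -su | udp_rx_errors)
#   sleep 60
#   after=$(netstat -su | udp_rx_errors)
#   udp_rx_error_delta "$before" "$after"
```

A delta that keeps growing while the gc lost block wait events are accumulating suggests packets are still being dropped on this node, whereas a flat delta suggests the errors date from an earlier incident.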
If everything on the network side checks out, then consider
increasing the UDP socket buffer sizes, as discussed in the previous
section of this chapter. Global cache lost blocks are not always a
network issue. After the network has been verified and the UDP socket
sizes are correct, look to see if CPU resources are in short supply.
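On Linux, the current socket buffer limits can be read from /proc/sys/net/core and compared against a target. The sketch below is only an assumption-laden illustration: the function names are invented, and the minimum values are the ones commonly listed in Oracle installation guides for Linux (rmem_default 262144, rmem_max 4194304, wmem_default 262144, wmem_max 1048576), not values taken from this chapter. Substitute whatever sizes the earlier section or your Oracle release recommends.

```shell
#!/bin/sh
# Hypothetical helper: compare one setting against a minimum (bytes).
check_min() {
    # $1 = setting name, $2 = current value, $3 = minimum value
    if [ "$2" -ge "$3" ]; then
        echo "OK $1=$2"
    else
        echo "LOW $1=$2 (minimum $3)"
    fi
}

# Hypothetical helper: report the four net.core socket buffer settings.
# The minimums below are assumed values; adjust them for your release.
report_udp_buffers() {
    for pair in rmem_default:262144 rmem_max:4194304 \
                wmem_default:262144 wmem_max:1048576; do
        name=${pair%%:*}                  # text before the colon
        min=${pair##*:}                   # text after the colon
        file=/proc/sys/net/core/$name
        if [ -r "$file" ]; then
            check_min "net.core.$name" "$(cat "$file")" "$min"
        fi
    done
}

# Example use on a cluster node:
#   report_udp_buffers
```

Any line flagged LOW is a candidate for adjustment with sysctl, after confirming the target value against the documentation for the release in use.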
This is an excerpt from the landmark book
Oracle RAC Performance tuning,
a book that provides real world advice for resolving
the most difficult RAC performance and tuning issues.
Copyright © 1996 - 2020
All rights reserved by
Burleson
Oracle ®
is the registered trademark of Oracle Corporation.