Jumbo Frames and RAC
By default, Ethernet has a variable frame size with an upper bound of 1,500 bytes. The Maximum Transmission Unit (MTU) defines this upper bound and defaults to the 1,500-byte limit. When data is sent across the network, it is broken into pieces no larger than the MTU. Right away, we can see a problem with the MTU limitation for Oracle RAC's Cluster Interconnect. Many Oracle databases are configured with a database block size of 8KB. If one block needs to be transferred across the private network for Cache Fusion purposes, the 8KB block will be broken into six frames. Even with a 2KB block size, the block will be broken into two frames. Those pieces must be reassembled when they arrive at the destination. To make matters worse, the maximum amount of data Oracle will attempt to transmit is the db_block_size initialization parameter multiplied by the db_file_multiblock_read_count parameter. A block size of 8KB read 128 blocks at a time leads to 1 megabyte of data needing to be transferred.
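To make the frame arithmetic concrete, here is a back-of-the-envelope count. It assumes roughly 28 bytes of IP/UDP header overhead per frame; the exact overhead depends on the protocol stack in use.

```shell
# Frames needed to move one database block across the interconnect:
# the usable payload per frame is the MTU minus per-frame header overhead.
mtu=1500
overhead=28                      # assumed IP + UDP headers
payload=$((mtu - overhead))      # 1472 usable bytes per frame
block=8192                       # 8KB database block
echo $(( (block + payload - 1) / payload ))   # rounds up: prints 6
```

With a 9,000-byte MTU, the same arithmetic yields a single frame per 8KB block.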
Jumbo Frames allows an MTU of up to 9,000 bytes. Unfortunately, Jumbo Frames is not available on all platforms. Not only must the OS support Jumbo Frames, but the network cards in the servers and the switch behind the private network must support it as well. Many of today's NICs and switches do support Jumbo Frames, but Jumbo Frames is not an IEEE standard, and as such, different implementations may not all work well together. Not all configurations will support the larger MTU size. When configuring the network pieces, it is important to remember that the smallest MTU of any component along the route is the effective maximum MTU from point A to point B. You can have the network cards configured to support 9,000 bytes, but if the switch is configured for an MTU of 1,500 bytes, then Jumbo Frames won't be used. InfiniBand supports even larger frames, up to 65,000 bytes.
It is outside the scope of this book to provide direction on enabling Jumbo Frames in the network switch. You should talk with your network administrator, who may, in turn, have to consult the switch vendor's documentation for more details. On the OS side, configuring the larger frame size on a network interface is easy. The following examples are from Oracle Linux 6. First, we need to determine which device is used for the Cluster Interconnect.
[root@host01 ~]# oifcfg getif
eth0  192.168.56.0  global  public
eth1  192.168.10.0  global  cluster_interconnect
The eth1 device supports the private network. Now we configure the larger MTU size.
[root@host01 ~]# ifconfig eth1 mtu 9000
[root@host01 ~]# vi /etc/sysconfig/network-scripts/ifcfg-eth1
In the ifcfg-eth1 file, one line is added that says "MTU=9000" so that the setting persists when the server is restarted.
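For reference, the edited file might look like the following. The addresses match this example system; other directives (HWADDR, UUID, and so on) vary by installation, so treat this as an illustrative sketch rather than a complete file.

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth1 (illustrative)
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.10.1
NETMASK=255.255.255.0
ONBOOT=yes
MTU=9000
```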
The interface is verified to ensure the larger MTU is used.
[root@host01 ~]# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 08:00:27:98:EA:FE
          inet addr:192.168.56.71  Bcast:192.168.56.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe98:eafe/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3749 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3590 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:743396 (725.9 KiB)  TX bytes:623620 (609.0 KiB)

eth1      Link encap:Ethernet  HWaddr 08:00:27:54:73:8F
          inet addr:192.168.10.1  Bcast:192.168.10.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe54:738f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:268585 errors:0 dropped:0 overruns:0 frame:0
          TX packets:106426 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1699904418 (1.5 GiB)  TX bytes:77571961 (73.9 MiB)
Notice that device eth1 has the larger MTU setting. The traceroute utility can be used to verify the largest possible packet size.
[root@host01 ~]# traceroute --mtu host02-priv
traceroute to host02-priv (192.168.10.2), 30 hops max, 9000 byte packets
 1  host02-priv.localdomain (192.168.10.2)  0.154 ms F=9000  0.231 ms  0.183 ms
Next, a 9,000-byte packet is sent along the route. The -F option ensures the packet is not fragmented into smaller frames.
[root@host01 ~]# traceroute -F host02-priv 9000
traceroute to host02-priv (192.168.10.2), 30 hops max, 9000 byte packets
 1  host02-priv.localdomain (192.168.10.2)  0.495 ms  0.261 ms  0.141 ms
The route worked successfully.
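As a cross-check (assuming the standard Linux ping utility), the same verification can be done with ping. The -M do flag sets the don't-fragment bit, and -s sets the ICMP payload size, which must leave room for the 20-byte IP header and the 8-byte ICMP header.

```shell
# 8972 = 9000 (MTU) - 20 (IP header) - 8 (ICMP header). With -M do, the
# packet is rejected instead of fragmented if any hop's MTU is smaller.
ping -M do -s 8972 -c 3 host02-priv
```

If any component along the path is still at a 1,500-byte MTU, this ping fails rather than silently fragmenting.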
Now a packet one byte larger is sent along the route.
[root@host01 ~]# traceroute -F host02-priv 9001
too big packetlen 9001 specified
The error from the traceroute utility shows that a packet of 9,001 bytes is too big. These steps verify that Jumbo Frames is working. Let's verify that the change improved the usable bandwidth on the Cluster Interconnect. To do that, the iperf utility is used. Its -l parameter sets the read/write buffer length for the test. The public interface is not configured for Jumbo Frames, and no applications are connecting to the nodes, so the public network can be used as a baseline.
[root@host02 ~]# iperf -c host01 -l 9000
------------------------------------------------------------
Client connecting to host01, TCP port 5001
TCP window size: 22.9 KByte (default)
------------------------------------------------------------
[  3] local 192.168.56.72 port 18222 connected with 192.168.56.71 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   923 MBytes   774 Mbits/sec
The same test is repeated for the private network with Jumbo Frames enabled.
[root@host02 ~]# iperf -c host01-priv -l 9000
------------------------------------------------------------
Client connecting to host01-priv, TCP port 5001
TCP window size: 96.1 KByte (default)
------------------------------------------------------------
[  3] local 192.168.10.2 port 40817 connected with 192.168.10.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.28 GBytes  1.10 Gbits/sec
Here we see that the bandwidth increased from 774 Mbits/sec to 1.10 Gbits/sec, roughly a 42% increase! For the same 10-second interval, the amount of data transferred increased from 923 megabytes to 1.28 gigabytes, the same roughly 42% improvement.
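As a quick arithmetic check, both runs cover the same 10-second interval, so the bandwidth ratio and the transfer ratio should agree:

```shell
# 1.10 Gbits/sec vs 774 Mbits/sec, and 1.28 GBytes vs 923 MBytes:
# both ratios work out to about 1.42, i.e. roughly a 42% improvement.
awk 'BEGIN { printf "bandwidth gain: %.0f%%\n", (1.10*1000/774 - 1)*100
             printf "transfer gain:  %.0f%%\n", (1.28*1024/923 - 1)*100 }'
```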
If the Oracle RAC system is using Ethernet (Gig-E or 10Gig-E) for the Cluster Interconnect, then the recommendation is to leverage Jumbo Frames for the private network. It is less common to employ Jumbo Frames on the public network interfaces. Jumbo Frames requires that all network components from end to end support the larger MTU size. In some cases, it can be tricky to diagnose why Jumbo Frames will not work in the system, but even then, the effort is well worth the cost.
This is an excerpt from the book Oracle RAC Performance Tuning.
Copyright © 1996 - 2020 Burleson. Oracle ® is the registered trademark of Oracle Corporation.