Many of the cluster vendors have designed very competitive
interconnect technology, and many interconnect products come close
to the latency levels of the SMP bus. Technologies such as Memory
Channel, SCI, and Myrinet support a virtual shared memory space by
mapping memory address space across nodes. Connections between nodes
are established by mapping part of each node's virtual address space
onto the interconnect interface. Because of the memory-mapped nature
of the interface, the overhead of transmitting or receiving is
similar to that of an access to local main memory. This mechanism is
the fundamental reason for the low latency levels seen in Memory
Channel, SCI, and Myrinet.
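To make the memory-mapped style of communication concrete, the
following is a minimal C sketch. The device node /dev/ic0, the 8 KB
window size, and the assumption that a plain store is forwarded into
a peer node's memory are all hypothetical; each product (Memory
Channel, SCI, Myrinet) exposes its own device names and mapping API.

/* Sketch: treating a mapped interconnect window like local memory.
 * "/dev/ic0" and the 8 KB window are hypothetical; a real driver
 * defines its own device node, mapping granularity, and semantics. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define WINDOW_SIZE 8192   /* one 8 KB page, as in Memory Channel */

int main(void)
{
    int fd = open("/dev/ic0", O_RDWR);   /* hypothetical interconnect device */
    if (fd < 0) { perror("open"); return 1; }

    /* Map a window of the cluster-wide address space into this process. */
    char *win = mmap(NULL, WINDOW_SIZE, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (win == MAP_FAILED) { perror("mmap"); return 1; }

    /* A plain store into the mapping is, in effect, a transmit: the
     * adapter forwards the write to the peer node's memory, so the
     * cost is close to a local memory access and no system call is
     * made per message. */
    strcpy(win, "hello from node A");

    munmap(win, WINDOW_SIZE);
    close(fd);
    return 0;
}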
Table 3.1 summarizes the various interconnect technologies,
comparing characteristics such as CPU overhead (µs) and messages per
second (millions).

Table 3.1: Various Interconnects
Some of the popular interconnect
products and technologies will be examined next.
- The Memory Channel interconnect is a high-speed network
interconnect that provides applications with a cluster-wide address
space. Applications map portions of this address space into their
own virtual address space as 8 KB pages and then read from and write
to it just like normal memory. It is available for Alpha-based
Compaq (now HP) clusters.
- Myrinet is a cost-effective, high-performance packet communication
and switching technology. It is widely used in Linux clusters.
Myrinet software supports most common hosts and operating systems,
and the software is supplied as open source. Myrinet implements host
interfaces that execute a control program to interact directly with
host processes (OS bypass) for low-latency communication, and
directly with the network to send, receive, and buffer packets (a
sketch of this user-level send path appears after this list).
- Scalable Coherent Interface (SCI) is Sun's highest performing
cluster interconnect for Sun Cluster hardware/software because of
its high data rate and low latency. Applications that stress the
interconnect scale better with SCI than with lower performing
alternatives. Sun SCI implements Remote Shared Memory (RSM), a
feature that bypasses the TCP/IP communication overhead of Solaris
and thereby improves cluster performance. It is available for the
Sun Fire 4800 and 6800 servers.
- VERITAS Database Edition/Advanced Cluster communications consist
of Low Latency Transport (LLT) and Group Membership Services/Atomic
Broadcast (GAB). LLT provides kernel-to-kernel communications and
functions as a high-performance replacement for the IP stack. Using
LLT rather than IP reduces the latency and overhead associated with
the IP stack.
- HP HyperFabric (HMP): HP HyperFabric supports standard TCP/UDP
over IP as well as HP's proprietary HyperMessaging Protocol (HMP).
HyperFabric extends the scalability and reliability of TCP/UDP by
providing transparent load balancing of connection traffic across
multiple network interface cards. HMP, coupled with OS bypass
capability and hardware support for protocol offload, provides low
latency and extremely low CPU utilization.
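Myrinet's OS bypass and HP's HMP both keep the operating system out
of the fast path. The C sketch below shows the general shape of such
a user-level send: the adapter's queue and doorbell are mapped into
the process, so posting a message is just a few stores. The device
node /dev/nic_uq0, the descriptor layout, and the doorbell offset
are invented for illustration and do not correspond to any real
driver's interface.

/* Sketch: an OS-bypass send path in the style used by interconnects
 * such as Myrinet and HMP. All names and offsets are hypothetical. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

struct send_desc {            /* hypothetical send descriptor */
    uint64_t buf_addr;        /* payload address (a real NIC needs a pinned, registered buffer) */
    uint32_t length;          /* payload length in bytes */
    uint32_t dest_node;       /* target node id */
};

int main(void)
{
    int fd = open("/dev/nic_uq0", O_RDWR);   /* hypothetical user-queue device */
    if (fd < 0) { perror("open"); return 1; }

    /* Map the adapter's send ring and doorbell register into user space. */
    void *bar = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (bar == MAP_FAILED) { perror("mmap"); return 1; }

    struct send_desc *ring  = (struct send_desc *)bar;
    volatile uint32_t *bell = (uint32_t *)((char *)bar + 2048);

    static char payload[64] = "lock message to node 2";

    /* Post the send: fill in a descriptor, then ring the doorbell.
     * The kernel is never entered; the NIC's control program picks
     * the descriptor up and moves the payload onto the wire. */
    ring[0].buf_addr  = (uint64_t)(uintptr_t)payload;
    ring[0].length    = sizeof(payload);
    ring[0].dest_node = 2;
    *bell = 1;                               /* doorbell: one new descriptor */

    munmap(bar, 4096);
    close(fd);
    return 0;
}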
InfiniBand is an emerging standard that will help businesses build
an ideal cluster database platform. The standard specifies channels
that are created by attaching Host Channel Adapters (HCAs) to
servers for direct server-to-server communication. Some of the
significant features of InfiniBand include:
* InfiniBand allows the HCA to transfer data directly into or out of
application buffers.
* Data transfers are initiated directly from user mode, eliminating
a costly context switch into the kernel. As a result, no system
buffer is involved.
* Supports up to 3.0 Gbit/s of bandwidth. Because it allows bundling
of multiple channels, effective bandwidth can grow appreciably.
* Memory Windows provide a way for the application to grant remote
read/write access to a specified buffer at byte-level granularity.
RDMA write and read allow a clustered application, such as Oracle
RAC, to transfer data blocks directly from the cache of one instance
to the cache of another instance in the cluster (a verbs-level
sketch of an RDMA write follows this list).
* Reliability built into the hardware eliminates the need for
additional checking in the application and thus saves CPU cycles at
the process level.
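The last two bullets describe the pattern a one-sided RDMA write
follows. Below is a minimal C sketch using the OpenFabrics verbs
library (libibverbs), which the original text does not name; the
helper rdma_write_block is illustrative, and queue pair creation,
memory registration, and the out-of-band exchange of the remote
address and rkey are assumed to have been done elsewhere.

/* Sketch: posting an RDMA write with libibverbs. Only the data-path
 * step is shown; connection setup is assumed to exist already. */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* Write 'len' bytes from a local registered buffer straight into the
 * peer's memory. No kernel copy is involved; the HCA reads the local
 * buffer and places it in the remote buffer. */
int rdma_write_block(struct ibv_qp *qp, struct ibv_mr *local_mr,
                     void *local_buf, size_t len,
                     uint64_t remote_addr, uint32_t remote_rkey)
{
    struct ibv_sge sge = {
        .addr   = (uint64_t)(uintptr_t)local_buf,
        .length = (uint32_t)len,
        .lkey   = local_mr->lkey,
    };

    struct ibv_send_wr wr;
    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_WRITE;   /* one-sided write */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;   /* request a completion */
    wr.wr.rdma.remote_addr = remote_addr;         /* peer address, exchanged earlier */
    wr.wr.rdma.rkey        = remote_rkey;         /* key granting remote access */

    struct ibv_send_wr *bad_wr = NULL;
    return ibv_post_send(qp, &wr, &bad_wr);       /* 0 on success */
}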
For building a high performance Oracle Real Application Cluster, the
selection of the right interconnect is important. Care should be
taken to select the highest speed, most appropriate technology
suitable to one's platform and requirements.