This is an excerpt from the book
Oracle Grid & Real Application Clusters.
It is reasonable to expect that
server components will fail. It is the responsibility of the cluster
to detect such failures and to monitor and stabilize the application
running on it. Cluster systems are designed to handle peculiar
situations such as Amnesia and Split Brain conditions.
Amnesia occurs when the cluster
restarts after a shutdown using cluster data that is older than the
data in effect at the time of the shutdown. This can happen if
multiple versions of the framework data are stored on disk and a new
incarnation of the cluster is started while the latest version is
not available.
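One way to guard against Amnesia, shown here as a sketch rather than any vendor's mechanism, is to tag every stored copy of the framework data with a generation number and refuse to start a new incarnation unless the newest committed version is guaranteed to be among the copies that can be read. The Python below assumes, purely for illustration, that every configuration change is written to a majority of the stored copies before it counts as committed; the `ConfigCopy` class and `choose_config` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigCopy:
    """One on-disk copy of the cluster framework data (hypothetical model)."""
    location: str
    generation: int  # incremented on every committed configuration change
    data: str


def choose_config(reachable: list[ConfigCopy], total_copies: int) -> ConfigCopy:
    """Return the configuration a new cluster incarnation may safely start with.

    Assumption: every committed change was written to a majority of the
    total_copies locations. Any two majorities overlap, so a reachable
    majority always contains at least one copy of the latest change.
    """
    if len(reachable) <= total_copies // 2:
        # Too few copies to be sure the newest version is among them;
        # starting now could bring back stale data (Amnesia).
        raise RuntimeError("refusing to start: latest configuration may be unavailable")
    # The highest generation in a reachable majority is the latest committed one.
    return max(reachable, key=lambda c: c.generation)
```

With three copies in total, two reachable copies at generations 7 and 6 are enough to start safely with generation 7, whereas a single reachable copy causes the start to be refused.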
A Split Brain condition occurs
when a failure in a single cluster results in the cluster being
reconfigured into multiple partitions, with each partition forming
its own sub-cluster without knowing that the others exist. Because
each sub-cluster assumes ownership of the shared data, this leads to
conflicting updates and corruption of that data.
As an example, when two systems
have access to the shared storage, the integrity of the data depends
on the systems communicating through heartbeats over the private
interconnects. If the private links fail, or if one of the systems
hangs or is too busy to send and receive heartbeats, each system
concludes that the other has exited the cluster. Each then tries to
become the master or form a sub-cluster and claim exclusive access
to the shared storage. This condition leads to Split Brain.
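The two-node heartbeat scenario can be simulated in a few lines. In this Python sketch the `Node` class, the one-second step, and the three-second timeout are illustrative assumptions; real clusterware uses its own timeout settings. Once the interconnect goes down, neither node can tell a dead peer from a lost link, so both claim the shared storage.

```python
class Node:
    """A cluster node that tracks heartbeats from its single peer (simplified)."""

    def __init__(self, name: str, timeout: float) -> None:
        self.name = name
        self.timeout = timeout      # seconds without a heartbeat before the peer is presumed gone
        self.last_heartbeat = 0.0   # time the peer's most recent heartbeat arrived
        self.owns_storage = False

    def receive_heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def check_peer(self, now: float) -> None:
        # Silence from the peer looks the same whether the peer died or the
        # interconnect failed, so the node assumes the peer left the cluster
        # and claims exclusive access to the shared storage.
        if now - self.last_heartbeat > self.timeout:
            self.owns_storage = True


def simulate(link_up_until: float, end: float, step: float = 1.0,
             timeout: float = 3.0) -> list[Node]:
    a, b = Node("a", timeout), Node("b", timeout)
    t = 0.0
    while t <= end:
        if t <= link_up_until:
            # Both nodes are healthy; heartbeats flow only while the private link is up.
            a.receive_heartbeat(t)
            b.receive_heartbeat(t)
        a.check_peer(t)
        b.check_peer(t)
        t += step
    return [a, b]
```

Calling `simulate(link_up_until=5.0, end=10.0)` leaves both nodes with `owns_storage` set to `True`, which is the Split Brain condition that fencing must prevent; with the link up for the whole run, neither node claims the storage.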
There are well-defined methods,
collectively known as fencing, to avoid this undesirable situation.
The two basic approaches to fencing are resource-based fencing and
system-reset fencing, also called STOMITH or STONITH fencing.
Resource-based fencing includes
I/O fencing and the maintenance of quorum disks. In resource-based
fencing, a hardware mechanism immediately disables or disallows the
problem node's access to shared resources. If the shared resource is
a SCSI disk or disk array, one can use SCSI reserve/release or,
better yet, persistent reserve/release operations. If the shared
resource is a Fibre Channel disk or disk array, one can instruct a
Fibre Channel switch to deny the problem node access to the shared
resources. In general, the errant node itself is left undisturbed;
instead, its resources are instructed to deny access to it. If the
node is later able to become part of a cluster with quorum, it then
goes through the normal channels to reacquire its resources.
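Resource-based fencing can be illustrated with a toy disk that accepts writes only from registered nodes, loosely modeled on SCSI-3 persistent reservations. The `SharedDisk` and `FencedError` names are hypothetical and no real SCSI commands are issued; the sketch only shows the idea that the surviving node removes the errant node's registration, so the disk rejects the errant node's I/O while the node itself keeps running.

```python
class FencedError(Exception):
    """Raised when a node without a valid registration touches the shared disk."""


class SharedDisk:
    """Toy disk enforcing registration-based access (loosely SCSI-3 PR style)."""

    def __init__(self) -> None:
        self.registered: set[str] = set()   # keys of nodes allowed to write
        self.blocks: dict[int, str] = {}

    def register(self, key: str) -> None:
        self.registered.add(key)

    def preempt(self, own_key: str, victim_key: str) -> None:
        # Only a node that is itself still registered may fence another node.
        if own_key not in self.registered:
            raise FencedError(own_key)
        self.registered.discard(victim_key)

    def write(self, key: str, block: int, data: str) -> None:
        # The disk, not the errant node, enforces the fence.
        if key not in self.registered:
            raise FencedError(key)
        self.blocks[block] = data
```

After `disk.preempt("node1", "node2")`, writes from `node1` succeed and writes from `node2` raise `FencedError`. If `node2` later rejoins a cluster with quorum, calling `register` again plays the role of reacquiring access through the normal channels.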
STOMITH stands for Shoot the
Other Machine in the Head (STONITH, Shoot the Other Node in the
Head, names the same technique). STOMITH fencing takes a completely
different approach: the errant cluster node is simply reset and
forced to reboot. When it rejoins the cluster, it acquires resources
in the normal way. In many cases, STOMITH operations are performed
via smart power switches, which simply remove power from the errant
node for a brief period of time.
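STOMITH fencing through a smart power switch can be sketched in the same style. The `PowerSwitch` and `ClusterNode` classes are hypothetical stand-ins for a vendor's power-switch interface; the sketch models only the sequence described above: cut power to the errant node, wait briefly, restore power, and let the node reacquire resources through normal cluster membership.

```python
import time


class ClusterNode:
    """Simplified node: power state plus the shared resources it holds."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.powered = True
        self.resources: set[str] = set()

    def rejoin(self, granted: set[str]) -> None:
        # After rebooting, the node holds only what the cluster grants it,
        # not whatever it held before the reset.
        self.resources = set(granted)


class PowerSwitch:
    """Hypothetical smart power switch with one outlet per node."""

    def __init__(self, nodes: list[ClusterNode]) -> None:
        self.outlets = {n.name: n for n in nodes}
        self.log: list[str] = []

    def power_cycle(self, name: str, off_seconds: float) -> ClusterNode:
        node = self.outlets[name]
        node.powered = False
        node.resources.clear()    # a powered-off node can no longer touch shared data
        self.log.append(f"{name}: off")
        time.sleep(off_seconds)   # keep power removed for a brief period
        node.powered = True       # the node now reboots from a clean state
        self.log.append(f"{name}: on")
        return node
```

For example, `PowerSwitch([n1, n2]).power_cycle("node2", off_seconds=0.1)` drops everything `node2` held; when it comes back it calls `rejoin` with only the resources the cluster assigns it.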
However, the implementation of
split brain avoidance varies from vendor to vendor and also depends
on the type of shared storage the cluster uses. For example, Sun
Cluster avoids split brain by using the majority vote principle
together with quorum disks, while Linux clusters running PolyServe
Matrix Server employ fabric fencing. The next section examines these
techniques in detail.
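The majority vote principle can be reduced to a vote count. This sketch assumes one vote per node and a quorum disk worth one vote fewer than the number of nodes, the rule Sun Cluster applies to a quorum device connected to every node. Which partition holds the quorum device is assumed to be settled already, for example by a race to reserve the disk. The `has_quorum` function is hypothetical.

```python
def has_quorum(partition: set[str], all_nodes: set[str],
               holds_quorum_device: bool) -> bool:
    """Return True if this partition may continue as the cluster.

    Assumptions: each node has one vote; the quorum disk has
    len(all_nodes) - 1 votes and counts only for the partition holding it.
    """
    device_votes = len(all_nodes) - 1
    total_votes = len(all_nodes) + device_votes
    votes = len(partition & all_nodes)
    if holds_quorum_device:
        votes += device_votes
    # A strict majority means at most one partition can ever qualify.
    return votes > total_votes // 2
```

In a two-node cluster, the node that wins the quorum disk has 2 of 3 votes and survives, while the other has 1 of 3 and must stop, so only one sub-cluster owns the shared data.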