This is an excerpt from the book Oracle Grid & Real Application Clusters.
The Global Cache Service (GCS) maintains the status of the resources and keeps an inventory of the access requests for data blocks. After blocks are transferred from one instance to another to satisfy requests, the requesting processes must be notified that the block is actually available. Processes therefore use interrupts to signal the arrival or completion of block transfers. The GCS uses the following interrupts to manage resource allocation:
* Blocking Interrupt - When a requestor needs exclusive access, the GCS sends a blocking interrupt to the process that currently holds the shared resource, notifying it that a request for exclusive access is waiting.
* Acquisition Interrupt - When the requested access mode (e.g., exclusive) becomes available because an earlier access mode has been released, the GCS sends an acquisition interrupt to alert the process that requested it.
* Block Arrival Interrupt - When a process requests a block from the GCS, the request is forwarded to the instance holding the block. That instance sends the block to the requesting process, which then informs the GCS that it has received the block. This notification is called a block arrival interrupt.
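To make the three interrupts concrete, the following minimal Python sketch records them in a log as a block changes hands. It is a toy model, not Oracle's implementation: the class ToyGCS, the process and block names, and the assumption that holders give up their access immediately after a blocking interrupt are all hypothetical simplifications.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    SHARED = "S"
    EXCLUSIVE = "X"


class Interrupt(Enum):
    BLOCKING = "blocking"            # holder told a conflicting request is waiting
    ACQUISITION = "acquisition"      # requester told its mode is now available
    BLOCK_ARRIVAL = "block_arrival"  # requester tells the GCS the block arrived


@dataclass
class ToyGCS:
    """Hypothetical toy model of GCS interrupts, not Oracle's internal API."""
    holders: dict = field(default_factory=dict)  # block -> {process: Mode}
    log: list = field(default_factory=list)      # (Interrupt, process, block)

    def request_exclusive(self, block, requester):
        current = self.holders.setdefault(block, {})
        others = [p for p in current if p != requester]
        # Blocking interrupt to every other process holding the block.
        for holder in others:
            self.log.append((Interrupt.BLOCKING, holder, block))
        # Assumption: each holder releases its access after the blocking interrupt.
        for holder in others:
            del current[holder]
        # Acquisition interrupt: exclusive mode is now available to the requester.
        current[requester] = Mode.EXCLUSIVE
        self.log.append((Interrupt.ACQUISITION, requester, block))

    def block_arrived(self, block, requester):
        # Block arrival interrupt: the requester confirms receipt to the GCS.
        self.log.append((Interrupt.BLOCK_ARRIVAL, requester, block))


if __name__ == "__main__":
    gcs = ToyGCS()
    gcs.holders["blk42"] = {"inst1.p1": Mode.SHARED, "inst2.p7": Mode.SHARED}
    gcs.request_exclusive("blk42", "inst3.p2")
    gcs.block_arrived("blk42", "inst3.p2")
    for kind, proc, blk in gcs.log:
        print(kind.value, proc, blk)
```

Running it prints two blocking interrupts (for inst1.p1 and inst2.p7), followed by an acquisition interrupt and a block arrival interrupt for inst3.p2.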
Block requests from many processes can arrive at the same time, so they follow a queuing mechanism. The GCS maintains two types of queues for resource requests:
* Convert Queue - If the GCS cannot grant a resource request immediately, it places the request in the convert queue, where it tracks all waiting requests.
* Granted Queue - Once a resource is granted to the requesting process, the request is kept in the granted queue, where the GCS continues to track it.
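The interaction of the two queues can be illustrated with a small Python sketch. ToyResource, the two-mode compatibility set, and the strict first-in, first-out policy for the convert queue are assumptions made for illustration; the real GCS uses a richer set of lock modes and rules.

```python
from collections import deque
from dataclasses import dataclass, field

# Toy compatibility set (assumption): only shared/shared can coexist.
COMPATIBLE = {("S", "S")}


@dataclass
class ToyResource:
    """Hypothetical block resource with a granted queue and a convert queue."""
    granted: list = field(default_factory=list)    # (process, mode) holding access
    convert: deque = field(default_factory=deque)  # (process, mode) still waiting

    def _fits(self, mode):
        # A mode fits if it is compatible with every mode already granted.
        return all((held, mode) in COMPATIBLE for _, held in self.granted)

    def request(self, process, mode):
        # Grant at once only if nobody is waiting and the mode is compatible.
        if not self.convert and self._fits(mode):
            self.granted.append((process, mode))
            return True
        # Otherwise the request waits in the convert queue (FIFO in this sketch).
        self.convert.append((process, mode))
        return False

    def release(self, process):
        self.granted = [(p, m) for p, m in self.granted if p != process]
        # Promote waiters from the head of the convert queue while they fit.
        while self.convert and self._fits(self.convert[0][1]):
            self.granted.append(self.convert.popleft())
```

With this model, two shared requests are granted at once, a later exclusive request waits in the convert queue, and it reaches the granted queue only after both shared holders release the resource.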
Cache Fusion and Recovery
In a RAC system, whenever a node fails, the instance running on that node crashes and becomes unusable. There can be several reasons for such a failure. This section focuses on the changes that take place in the global cache and on how one of the surviving instances recovers the failed instance.
Recovery Features
Only the cache resources that reside on the failed nodes, or that are mastered by the GCS on the failed nodes, need to be rebuilt or re-mastered. Rebuilding or re-mastering does not mean rebuilding a block; only the lock ownership changes, as explained later with examples.
All resources previously
mastered at the failed instance are redistributed across the
remaining instances. These resources are reconstructed at their new
master instance. All other resources previously mastered at
surviving instances remain unaffected.
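The effect of re-mastering can be sketched as a change to a resource-to-master map. The function remaster and its round-robin placement are hypothetical stand-ins for Oracle's own placement logic; the point is only that resources mastered at the failed instance move, while all others keep their master. No block contents are touched, which mirrors the remark that only lock ownership changes.

```python
def remaster(masters, failed, survivors):
    """Return a new resource -> master map after `failed` leaves the cluster.

    Round-robin placement is a placeholder assumption, not Oracle's policy.
    Only resources mastered at the failed instance are moved.
    """
    ordered_survivors = sorted(survivors)
    new_masters = {}
    moved = 0
    for resource in sorted(masters):
        master = masters[resource]
        if master == failed:
            # Orphaned resource: rebuild it at the next surviving instance.
            new_masters[resource] = ordered_survivors[moved % len(ordered_survivors)]
            moved += 1
        else:
            # Resources mastered at surviving instances stay where they are.
            new_masters[resource] = master
    return new_masters


if __name__ == "__main__":
    masters = {"blk1": "inst1", "blk2": "inst2", "blk3": "inst2",
               "blk4": "inst3", "blk5": "inst2"}
    print(remaster(masters, "inst2", ["inst1", "inst3"]))
```

Here blk2, blk3, and blk5 move off the failed inst2, while blk1 and blk4 keep their original masters.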
The cluster manager first detects the node and instance failure and communicates the failure status to the GCS through the LMON process. At this stage, one of the surviving instances in the cluster initiates the recovery process.
Remember that instance recovery does not include restarting the failed instance or recovering the applications that were running on it. Also note that, even after a node failure and instance loss, the redo log files of the failed instance remain available to the recovering instance, because they are located on the shared cluster file system or on a shared raw partition. This is an important feature of the RAC system.
Because of past images (PIs), instance recovery is performed differently in a RAC implementation. The SMON process of a surviving instance performs recovery of the failed instance's redo thread. By contrast, in a stand-alone instance, the foreground process performs recovery.
Recovery Methodology and Steps
Oracle performs the following
steps to recover:
1. In the initial phase of
recovery, GES enqueues are reconfigured and the global resource
directory is frozen. All GCS resource requests and writes are
temporarily halted.
2. GCS resources are reconfigured among the surviving instances. One of the surviving instances becomes the recovering instance, and its SMON process starts the first pass, reading the failed instance's redo thread.
3. Block resources that need to
be recovered are identified and the global resource directory is
reconstructed. Pending requests or writes are cancelled or replayed.
4. Resources identified in the
previous log read phase are defined as recovery resources. Buffer
space for recovery is allocated.
5. Assuming that past images of the blocks to be recovered exist in other caches in the cluster, source buffers are requested from the other instances. These source buffers are the starting point of recovery for a particular block.
6. All resources and enqueues
required for subsequent processing have been acquired and the global
resource directory is now unfrozen. Any data blocks that are not in
recovery can now be accessed. At this time, the system is partially
available.
7. SMON merges the redo threads in SCN order to ensure that changes are applied in an orderly fashion. This matters most when multiple instances fail simultaneously: neither the PI buffers nor the current buffers for a data block may then be found in any surviving instance's cache, so the redo logs of the failed instances are merged.
8. The second pass of recovery now begins: redo is applied to the data files, and each recovery resource is released immediately after its block is recovered, so that more and more blocks become available as cache recovery proceeds.
9. After all blocks have been
recovered and recovery resources have been released, the system is
available for normal use.
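Steps 2 through 9 can be approximated by a toy two-pass recovery in Python. This is a simplified sketch under several assumptions: redo records are hypothetical (scn, block, value) tuples, each record overwrites the whole block, and a past image, when present, supplies the starting point (step 5) so that only newer redo is applied. The function name recover_thread is illustrative only.

```python
def recover_thread(redo, disk, past_images):
    """Toy two-pass recovery of one failed instance's redo thread.

    redo:        list of (scn, block, value) redo records (hypothetical format)
    disk:        block -> (scn, value) as stored in the data files
    past_images: block -> (scn, value) PI buffers in surviving caches
    Returns (recovered buffers, blocks usable before pass 2, release order).
    """
    # Pass 1: read the redo only to find the blocks that need recovery.
    recovery_set = {block for _, block, _ in redo}
    # Directory unfrozen: blocks outside the recovery set are usable now.
    available_early = set(disk) - recovery_set

    # Source buffer for each block: a PI if one survives, else the disk copy.
    buffers = {b: past_images.get(b, disk.get(b, (0, None))) for b in recovery_set}
    last_scn = {}
    for scn, block, _ in redo:
        last_scn[block] = max(scn, last_scn.get(block, 0))

    # Pass 2: apply redo newer than each starting point, in SCN order, and
    # release each block as soon as its last redo record has been handled.
    released = []
    for scn, block, value in sorted(redo):
        if scn > buffers[block][0]:
            buffers[block] = (scn, value)
        if scn == last_scn[block]:
            released.append(block)
    return buffers, available_early, released


if __name__ == "__main__":
    redo = [(101, "A", "a1"), (102, "B", "b1"), (105, "A", "a2")]
    disk = {"A": (100, "a0"), "B": (90, "b0"), "C": (80, "c0")}
    past_images = {"A": (102, "a1")}
    print(recover_thread(redo, disk, past_images))
```

In this example, block C is usable as soon as the first pass finishes, the redo at SCN 101 for block A is skipped because the past image at SCN 102 already contains it, and block B is released before block A because its last redo record comes first.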
Figure 7.7 shows the basic steps in the recovery.
Figure 7.7: Online Instance Recovery Steps
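Step 7 of the recovery merges the redo threads of failed instances by SCN. The minimal Python sketch below uses heapq.merge from the standard library to interleave threads that are each already in SCN order; the (scn, thread_id, change) record format and the apply_changes helper are hypothetical.

```python
import heapq


def merge_threads(*threads):
    """Merge redo threads, each already in SCN order, into one SCN-ordered list.

    Records are hypothetical (scn, thread_id, change) tuples.
    """
    return list(heapq.merge(*threads, key=lambda rec: rec[0]))


def apply_changes(records):
    """Apply 'key=value' changes in the given order and return the final state."""
    state = {}
    for _, _, change in records:
        key, value = change.split("=")
        state[key] = value
    return state


if __name__ == "__main__":
    thread1 = [(100, 1, "A=1"), (104, 1, "A=3")]
    thread2 = [(102, 2, "A=2"), (106, 2, "B=9")]
    print(apply_changes(merge_threads(thread1, thread2)))  # {'A': '3', 'B': '9'}
    print(apply_changes(thread1 + thread2))                # {'A': '2', 'B': '9'}
```

Applying the merged stream leaves A at 3, the value of the newest change, whereas replaying thread 1 and then thread 2 leaves the stale value 2, which is why the merge by SCN matters.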