This is an excerpt from the bestselling book
Oracle Grid & Real Application Clusters.
Failover clusters are normally implemented in one of two
architectures: Active/Passive or Active/Active.
Active/Passive Clusters: This type comprises two nearly identical
infrastructures, logically sitting side by side. One node hosts the
database service or application, while the other sits idle, waiting
in case the primary system goes down. The nodes share a storage
component, and control of that storage passes from the primary
server to the other node when the primary fails. On failure of the
primary node, the inactive node becomes the primary and hosts the
database or application.
Active/Active Clusters: In this type, one node acts as primary for a
database instance and another node acts as the secondary for
failover purposes. At the same time, the secondary node acts as
primary for another instance, with the first node acting as its
backup/secondary node.
Figure 3.9 shows an example of an Active/Active architecture.
Figure 3.9: Two-Node Cluster with Active/Active Resource Groups
The Active/Passive architecture is the most widely used, and many
administrators prefer it for its simplicity and manageability.
Unfortunately, it is usually capital intensive, because the standby
server sits idle. Active/Active looks attractive and is more
cost-effective because the backup server is put to use. However, it
can result in performance problems when both database services (or
applications) fail over to a single node: the surviving node must
carry its own load plus the load of the failed node.
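The trade-off between the two architectures can be illustrated with a
small model. This is only a sketch: the Node class, the capacity
figure of 100 units, and the load of 60 units per database are
hypothetical values chosen for illustration, not measurements from
any real cluster.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity: float  # hypothetical units of work the node can serve
    up: bool = True
    services: dict = field(default_factory=dict)  # service name -> load


def fail_over(failed: Node, survivor: Node) -> None:
    """Move every service from the failed node onto the survivor."""
    failed.up = False
    survivor.services.update(failed.services)
    failed.services.clear()


def utilization(node: Node) -> float:
    """Fraction of the node's capacity consumed by its hosted services."""
    return sum(node.services.values()) / node.capacity


# Active/Passive: node B hosts nothing until node A fails.
a = Node("A", capacity=100, services={"db1": 60})
b = Node("B", capacity=100)
fail_over(a, b)
active_passive = utilization(b)

# Active/Active: each node is primary for one instance and backup
# for the other.
c = Node("C", capacity=100, services={"db1": 60})
d = Node("D", capacity=100, services={"db2": 60})
fail_over(c, d)
active_active = utilization(d)

print(f"Active/Passive survivor utilization: {active_passive:.0%}")
print(f"Active/Active survivor utilization:  {active_active:.0%}")
```

With these assumed numbers, the Active/Passive survivor runs at the
same load the primary carried, while the Active/Active survivor is
asked for more work than its capacity allows, which is the
performance problem described above.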
Oracle Database Service in HA Cluster
The Oracle database is a widely used database system. Large numbers
of critical applications and business operations depend on its
availability. Most cluster products provide agents to support the
database failover process.
The implementation of an Oracle Database service with failover in an
HA cluster has the following general features:
* A single instance of Oracle runs on one of the nodes in the
cluster. The Oracle instance and listener depend on other resources
such as file systems, mount points, and IP addresses.
* The node running the instance has exclusive access to the set of
database disk groups on a storage array that is shared among the
nodes.
* Optionally, an Active/Active
architecture of Oracle databases can be established. One node acts
as the primary node to an Oracle instance and another node acts as a
secondary node for failover purposes. At the same time, the
secondary node acts as primary for another database instance and the
primary node acts as the backup/secondary node.
* When the primary node suffers
a failure, the Oracle instance is restarted on the surviving or
backup node in the cluster.
* The failover process involves moving the IP address, volumes, and
file systems containing the Oracle data files. In other words, on
the backup node, the IP address is configured, the disk group is
imported, the volumes are started, and the file systems are mounted.
* The restart of the database automatically performs crash recovery,
returning the database to a transactionally consistent state.
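The ordering of the failover steps above follows from the resource
dependencies. The sketch below derives a start order and a stop
order from a dependency map. The resource names and the exact
dependency edges are assumptions made for illustration; a real
cluster product defines its own resource types and agents. It uses
graphlib from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical resource group: each resource maps to the resources
# that must be online before it can start.
DEPENDS_ON = {
    "ip_address": set(),
    "disk_group": set(),
    "volumes": {"disk_group"},
    "file_systems": {"volumes"},
    "listener": {"ip_address"},
    "oracle_instance": {"file_systems", "ip_address"},
}


def online_order(depends_on):
    """Order in which the backup node can bring resources online."""
    return list(TopologicalSorter(depends_on).static_order())


def offline_order(depends_on):
    """Order in which resources are taken offline: the reverse."""
    return list(reversed(online_order(depends_on)))


start = online_order(DEPENDS_ON)
stop = offline_order(DEPENDS_ON)

print("bring online:", " -> ".join(start))
print("take offline:", " -> ".join(stop))
```

Resources with no dependency between them, such as the IP address
and the disk group, may appear in either order. Only the relative
order of dependent resources is guaranteed.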
There are some issues with Oracle Database failover to be aware of:
* On restart of the database, a fresh database cache (SGA) is
established, and all of the previous instance's SGA contents are
lost. The frequently used packages and the parsed images of
statements are gone and must be loaded again.
* Once the new instance is created and made available on the backup
node, all the client connections seeking the database service
attempt to connect at the same time. This could result in a lengthy
waiting period.
* The impact of the outage may be felt for an extended duration
during the failover process. When there is a failure at the primary
node, all the relevant resources, such as mount points, disk groups,
the listener, and the database instance, have to be logically taken
offline or shut down. This process may take considerable time,
depending on the failure situation.
However, when the Oracle database is implemented in a parallel,
scalable cluster such as Oracle RAC, there are many advantages,
including transparent failover for the clients. The main high
availability features include:
* Multiple instances exist at the same time, accessing a single
database. The data files are common to all instances.
* Multiple nodes have read/write
access to the shared storage at the same time. Data blocks are read
and updated by multiple nodes.
* Should a node fail, or its Oracle instance crash or become
unusable, the surviving node performs recovery for the crashed
instance. There is no need to restart the instance on the surviving
node, since a parallel instance is already running there.
* All the client connections
continue to access the database through the surviving node/instance.
With the help of the Transparent Application Failover (TAF)
facility, clients will be able to move over to the surviving
instance near instantaneously.
* No volumes or file systems need to be moved to the surviving node,
because every node already has access to the shared storage.
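On the client side, TAF is typically enabled through the
FAILOVER_MODE clause of an Oracle Net connect descriptor. The
sketch below builds such a descriptor as a string. The alias, host
names, service name, and the retry and delay values are hypothetical
placeholders; the appropriate settings depend on the Oracle version
and the application.

```python
def taf_descriptor(alias, hosts, service_name, port=1521,
                   retries=20, delay=5):
    """Build a tnsnames.ora style entry with a TAF FAILOVER_MODE clause.

    All argument values are supplied by the caller; nothing here
    contacts a database.
    """
    # One ADDRESS entry per cluster node the client may connect to.
    addresses = "".join(
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))"
        for host in hosts
    )
    return (
        f"{alias}=(DESCRIPTION="
        f"(ADDRESS_LIST=(LOAD_BALANCE=ON)(FAILOVER=ON){addresses})"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name})"
        f"(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC)"
        f"(RETRIES={retries})(DELAY={delay}))))"
    )


# Hypothetical two-node cluster and service name.
descriptor = taf_descriptor("RACDB", ["node1-vip", "node2-vip"],
                            "racdb_svc")
print(descriptor)
```

In this sketch, TYPE=SELECT asks that in-flight queries be resumed
on the surviving instance, and METHOD=BASIC means the connection to
the surviving instance is made only at failover time.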
Server Redundancy
The database resides within a server. The server or host is an
important component in the provision of the data service. Any
failure in the host system causes the database to go down.
Necessity of Server Redundancy
Clustered servers utilize two or more nodes, essentially
keeping the extra nodes as standby or sometimes as extra computing
power, as in the case of the RAC system. The additional nodes
ensure that a standby node can provide the same database service
to the user community. However, a node held purely in standby
contributes none of the performance and scalability for which it
was intended.
Clustering servers assures the administrators and the
application users that at least one node is alive. A cluster, in
its most general form, comprises two or more interconnected
computers that are viewed and used as a single, unified computing
resource. By using multiple systems, the impact of the failure of
any individual system is kept low by passing the failed system's
workload to the remaining members of the cluster.
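The benefit of extra nodes can be estimated with a simple
calculation. If each node is available a fraction a of the time and
the nodes fail independently, the chance that at least one of n
nodes is alive is 1 - (1 - a)^n. The sketch below evaluates this.
The independence assumption and the 99% figure are illustrative
only; the model ignores failover time and shared points of failure
such as storage.

```python
def cluster_availability(node_availability: float, nodes: int) -> float:
    """Probability that at least one of `nodes` independent nodes is up."""
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    # All nodes are down at once with probability (1 - a) ** n.
    return 1.0 - (1.0 - node_availability) ** nodes


# Hypothetical figure: each node is up 99% of the time.
for n in (1, 2, 3):
    print(f"{n} node(s): {cluster_availability(0.99, n):.6f}")
```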
The standby node becomes functional, or becomes the primary
host, when the failed host is unable to provide any host services.
When some of the internal components fail and the failure is
non-recoverable without intervention, the server is declared not
available or simply "failed". This indicates that there is a lot
of scope for keeping the internal components safe or redundant.
Before losing the server and resorting to the use of the
clustered backup node, there are many things we can do to keep the
components from failing. Let us examine these methods that act as
the first level of redundancy. Some people call it "high
availability without clustering." In contrast to clustering,
system availability can be improved without adding
servers.
Redundancy Features
There are many features or options that add value to the
redundancy at the server level. Taking advantage of such features
helps avoid failures and avoids degraded cluster performance in
systems like the RAC system. These features address different
subsystems of the server, such as the memory and processors.
Redundant components such as fans, power supplies, and adapters
can also provide higher availability, particularly when used with
software that provides monitoring and alerting capability to the
system administrators.
To make the servers more reliable, we should use
high-reliability components and follow best practices for the
system. Let us examine some of the redundancy features that
administrators need to focus on.