Donald Burleson is one of the world's most widely-read Oracle experts in parallel
server systems. Burleson is now available perform Oracle9i consulting for
Real Application Clusters (RAC) and Transparent Application Failover
(TAF).
Burleson is available for
Oracle9i Real Application Clusters Consulting involving all areas:
-
Determining the optimal Oracle9i RAC architecture
-
Choosing the best MPP RAC servers and hardware
-
Configuring the Distributed Lock Manager (DLM)
-
Configuring the raw disks for Oracle RAC
-
Installing and configure Oracle9i RAC
-
Installing and testing Oracle9i Transparent Application
Failover (TAF)
-
Performance tuning for Oracle Real Application
Clusters (RAC)

The Best Oracle
Resource on the Web |
Using Oracle9i Real Application Clusters for Continuous Availability
by Donald K. Burleson
|
The Evolution of RAC
Oracle Parallel
Server (OPS) has been around for many years. Now, Oracle has taken the general
idea of OPS and improved upon it, offering a new product, which they call Cache
Fusion. That term describes the ability of OPS databases to allow multiple
instances to share data buffer storage between the instances while still
accessing a single Oracle database.
Here's an
example of an actual TNSNAMES.ORA file for a Real Application Clusters system.
Note in this example that you can specify a failover tide off from one of the
two Oracle failover modes, and that you also have a re-try parameter.
prod.world =
(DESCRIPTION_LIST =
(FAILOVER = TRUE)
(LOAD_BALANCE = FALSE)
(DESCRIPTION =
(ADDRESS =
(PROTOCOL = TCP)
(HOST = arkum)(PORT = 1521))
(CONNECT_DATA =
(SERVICE_NAME = ARK1)
(SERVER = DEDICATED)
(FAILOVER_MODE =
(TYPE = SELECT)
(METHOD=PRECONNECT)
(BACKUP=bkup)
)
)
)
(DESCRIPTION =
(ADDRESS =
(PROTOCOL = TCP)
(HOST = diamond)(PORT = 1523))
(CONNECT_DATA =
(SERVICE_NAME = DIA3)
(SERVER = DEDICATED)
)
)
)
The fact that
the continuously available solution employs a re-try parameter is very scary to
many continuous availability professionals. Consumers are demanding systems that
will automatically and reliably restart any in-flight transactions that might be
running during the system failure, and the idea of delayed retries are onerous
to people who are counting on continuous availability.
Another
important note is that the RAC solution requires downtime to upgrade the Oracle
software. Oracle is currently working to create a rolling update technology, but
for now, RAC systems must be stopped when Oracle upgrades are applied.
- RAC is far
simpler and more reliable than OPS (because of cache fusion).
- RAC and OPS
only protect against instance failure. A failure of disk or the live cache
(RAM) will cause catastrophic failure.
Inside
Transparent Application Failover
Once Oracle
developed the RAC technology to prevent instance outages, they had to develop a
method for database connections to restart on the surviving instance. This
technology was dubbed "transparent application failover," and to understand how
it works we must take a close look inside the internal methods of TAF.
TAF has several
failover types:
1 - SELECT
Failover
When SELECT
failover is used, Net8 keeps track of any SELECT statements issued in the
current transaction. SELECT failover keeps track of how many rows have been
fetched back to the client for each cursor associated with a SELECT statement.
If connection to the instance is lost, Net8 establishes a connection to a backup
instance, re-executes the SELECT statements, and positions the cursors so the
client can continue fetching rows as if nothing had happened. SELECT failover
can be useful for reporting applications, but that's as sophisticated as TAF
gets.
2 - SESSION Failover
When the
connection to an instance is lost, SESSION failover results only in the
establishment of a new connection to a backup instance. Any work in progress is
lost.
Within this
failover type, Oracle offers two sub-methods:
- BASIC
failover - In this approach, Oracle connects to backup instance only after
primary connection fails.
- PRECONNECT
failover - In this approach, Oracle connects to backup database and primary
database. This offers faster failover, but it does so at the expense and added
overhead for duplicating the Oracle connections.
The
limitations of TAF are as follows:
- The effect of
any ALTER SESSION statements will be lost.
- Global
temporary tables will be lost.
- Any PL/SQL
package states will be lost.
- Transactions
involving INSERT, UPDATE, or DELETE statements cannot be handled automatically
by TAF.
By default, when
a TAF-initiated failover occurs, Net8 will make only one attempt to connect to
the backup instance.
Using the RETRIES and DELAY
parameters, you can change that behavior so that Net8 makes multiple attempts to
connect to the backup database.
In the
following, we see 20 retries at 30-second intervals:
(FAILOVER_MODE =
(TYPE = SELECT)
(METHOD=PRECONNECT)
(BACKUP=bkup)
(RETRIES=20)(DELAY=30)
)
Monitoring
Transaction Application Failover in Oracle
With widespread
usage of Real Application Clusters with
Transparent Application Failover, the Oracle or databases made significant
enhancements to the internal v$ views to allow the Oracle administrator to keep
track of what's going on within reconnected transactions.
The biggest
enhancement is to the v$ process view. Several new columns have been added to
the dollar process view to allow the DBA to see exactly what's going on within
the Oracle database.
- FAILOVER_TYPE
- indicates the type of failover.
-
FAILOVER_METHOD - indicates the method used to establish the backup
connection.
- FAILED_OVER -
Indicates whether or not a session has failed over to the backup connection.
Here we see a sample an Oracle query to display the new TAF columns.
SQL> SELECT username, sid, serial#,
failover_type, failover_method, failed_over
FROM v$session;
USERNAME
|
SID
|
SERIAL#
|
FAILOVER_TYPE
|
FAILOVER_M
|
FAI
|
|
1
|
1
|
NONE
|
NONE |
NO |
|
2
|
1
|
NONE
|
NONE |
NO |
|
3
|
1
|
NONE
|
NONE |
NO |
|
4
|
1
|
NONE
|
NONE |
NO |
|
5
|
1
|
NONE
|
NONE |
NO |
|
6
|
1
|
NONE
|
NONE |
NO |
SYSTEM
|
7
|
688
|
SELECT |
PRECONNECT |
YES |
|
9
|
110
|
NONE
|
NONE |
NO |
|
10
|
109
|
NONE
|
NONE |
NO |
|
11
|
110
|
NONE
|
NONE |
NO |
|
12
|
1
|
NONE
|
NONE |
NO |
|
15
|
84
|
NONE
|
NONE |
NO |
|
16
|
1729
|
NONE
|
NONE |
NO |
Alternatives to RAC for Continuous Availability
As we've already
noted, here are several alternatives to using Oracle's Real Application Clusters
for continuous availability and transaction failover. Most Oracle companies in
the real world will choose one of the following approaches:
Cross
database links -- using a cross database link approach,
all Oracle are encapsulated inside stored procedures.
Within each stored procedure, distributed transactions are set up such that
transactions are automatically sent to all of the replicated databases in the
form of a two-phase COMMIT transaction. Doing updates as a two-phase COMMIT
transaction ensures that all of the databases are either updated or rolled-back
as a single unit within the Oracle database. Many companies of been very
successful with this approach and they can ensure 100 percent reliability that
all transactions are synchronized in real and distributed in real time to
all of the different systems.
Using undo
log replication -- the undo log replication technique first became popular
in the year 2001 when Quest software introduced their Shareplex product. The
Shareplex product reads directly from Oracle's log buffer structure, and builds
SQL statements that are immediately fed to all of the replicated systems. While
this product worked quite reliably, Oracle Corp. made it clear that this product
would not be officially supported by Oracle because Oracle reserves the right to
change the structure in future releases of how information was stored within the
RAM region of law buffer. However, later in 2001, the release of Oracle9i
logical databases codified this concept, and now in Oracle9i, customers can
confidently use redo log-based replication to keep many Oracle databases
synchronized in real time.
When building
your own replicated database:
- Place all SQL
inside stored procedures.
- Create
back-and-forth database links for updates.
- Have the
stored procedures call both databases as a two-phase commit.

Custom Failover Approaches
- Customized
WebServers are written with Apache extensions
- Replicated
databases with WebServers connecting based on load
- Slow response
from Oracle causes WebServer to re-connect to another database and re-start
transaction
- Requires
knowledge of Apache code
|