This is an excerpt from the book
Oracle Grid & Real Application Clusters.
TAF is controlled by processes external to the Oracle RAC cluster
control. The failover type and method can be set individually for
each Oracle Net client. In more complex environments, application
code may have to be modified, using Oracle Call Interface (OCI)
calls, to fully support TAF. In its most basic form, TAF handles the
following elements:
* Active Transactions: Any
active, non-committed transaction (INSERT, UPDATE, or DELETE) is
rolled back at the time of failure, because TAF cannot preserve
active transactions after failover. The application will receive an
error message until a rollback command is submitted.
* Client-Server Database
Connections: TAF automatically reestablishes the client-server
database connection using the same connect string or an alternate
connect string specified when configuring TAF in the tnsnames file.
* Executed Commands: If a
command had completely executed at the time of the connection failure
and the command changed the state of the database, TAF will not
reissue the command. If TAF reconnects in response to a command
that may have changed the database, TAF issues an error message to
the application.
* Open Cursors Used for
Fetching: TAF enables applications that began fetching rows
from a cursor before the failover event to continue fetching after
failover. This type of failover is called select failover. TAF
accomplishes this by re-executing the cursor's SELECT statement using
the same snapshot, discarding the rows already fetched and returning
the rows that were not fetched initially. TAF verifies that the
discarded rows are those that were returned initially, or it returns
an error message to the application.
* Server-Side Program Variables:
Server-side program variables, such as PL/SQL package states, are
lost during failures. TAF cannot recover them. They can be
re-initialized by calling the server-side code from an OCI failover
callback function.
* Users' Database Sessions:
TAF logs the users in with the same user IDs that were in use prior
to failure. If multiple users were using the connection, TAF
automatically logs them in as they attempt to process database
commands. Unfortunately, TAF cannot automatically restore
non-persistent session properties. These properties can, however, be
restored by invoking an OCI callback function that notifies the
calling transaction of the switchover and requests the
non-persistent session properties.
In a nutshell, the above means
that at the time of failover, in-progress queries are reissued and
processed from the beginning. Rows already fetched are discarded.
All of this discarding of rows and re-fetching can delay the
completion of the original transaction. However, TAF can also be
configured to issue two connections at a time, one to the main
instance and one to the standby instance. This speeds the processing
by eliminating the reconnection penalty. DDL operations are not
reissued. Committed transactions are not reissued.
Uncommitted INSERT, UPDATE and
DELETE commands are rolled back and must be resubmitted after
reconnection. Again, the application must use OCI failover callbacks
if the DML operations are to be reissued automatically.
The Oracle Net process carries
out TAF functionality. The failover is configured in the tnsnames
file. The TAF settings are placed in the net service name area,
within the connect_data section of the tnsnames, using the
failover_mode and instance_role parameters.
failover_mode subparameter: Description
BACKUP: Sets a different net service name for backup instance
connections. A backup should be specified when using preconnect to
pre-establish connections.
TYPE: Specifies the type of connection failover. Three types of
Oracle Net failover functionality are available to Oracle Call
Interface (OCI) applications:
* session: Fails over the session. When a user's connection is
lost, a new session is automatically created for the user on the
backup. This type of failover will not recover selects.
* select: Enables users with open cursors (selects) to continue
fetching on them after failure. Note that this mode involves some
overhead on the client side during normal select operations.
* none: This setting is the default. With none, no failover
functionality is provided. Use none explicitly when the goal is to
prevent failover.
METHOD: Determines how failover occurs from the primary node to the
backup node:
* basic: Establishes connections only at failover time. Since no
preconnection is done, basic requires virtually no work on the
backup server until failover occurs.
* preconnect: Pre-establishes connections to a backup server. The
preconnect setting provides faster failover but requires that the
backup instance be capable of supporting all connections from all
supported instances.
RETRIES: Sets the number of times Oracle Net will attempt to connect
after a failover. If DELAY is specified but RETRIES is not, RETRIES
defaults to five retry attempts.
DELAY: Specifies the number of seconds to wait between connection
attempts. If RETRIES is specified but DELAY is not, DELAY defaults
to one second.
Table 10.1: failover_mode Options
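As a rough illustration of how the subparameters in Table 10.1 fit
together, the following tnsnames entry is a minimal sketch, not a
tested configuration. The net service name ault_taf, the host names
ault1-host and ault2-host, the service name ault, and the RETRIES and
DELAY values are assumptions chosen for the example; only the
parameter names come from the text above.

```
# tnsnames.ora sketch: select failover, reconnecting only at failover time.
# ault_taf, ault1-host, ault2-host and the numeric values are hypothetical.
ault_taf =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = ault1-host)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = ault2-host)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = ault)
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
        (RETRIES = 20)
        (DELAY = 5)
      )
    )
  )
```

With these assumed values, a failed session would attempt to
reconnect up to 20 times at 5-second intervals, and open cursors
could resume fetching once the connection is re-established.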
Setting load_balance=YES (or ON)
instructs Oracle Net to progress through the list of listener
addresses in a random sequence, balancing the load on the various
listeners. When set to OFF, load_balance instructs Oracle Net to try
the addresses sequentially until one succeeds. This parameter must be
correctly coded in the net service name or connect descriptor. By
default, this parameter is set to ON for a description_list. Load
balancing can be specified for an address_list or associated with a
set of addresses or a set of descriptions. If address_list is used,
load_balance=YES should be placed within the (address_list=) portion.
If address_list is not used, load_balance=YES should be placed within
the DESCRIPTION clause.
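The two placements described above might look like the following
sketch. The net service names ault_lb and ault_lb2 and the host names
are hypothetical, and the entries are illustrative, not taken from a
working system.

```
# Placement 1: an address_list is used, so load_balance goes inside it.
# ault_lb, ault_lb2, ault1-host and ault2-host are hypothetical names.
ault_lb =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = YES)
      (ADDRESS = (PROTOCOL = TCP)(HOST = ault1-host)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = ault2-host)(PORT = 1521))
    )
    (CONNECT_DATA = (SERVICE_NAME = ault))
  )

# Placement 2: no address_list, so load_balance goes in the description.
ault_lb2 =
  (DESCRIPTION =
    (LOAD_BALANCE = YES)
    (ADDRESS = (PROTOCOL = TCP)(HOST = ault1-host)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = ault2-host)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = ault))
  )
```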
failover=ON is the default for an address_list, a description_list
and a set of descriptions; therefore, it does not have to be
specified. This applies only to connect-time failover, not to
transparent application failover (TAF).
The failover_mode parameter must
be included in the connect_data portion of a net_service_name. If
failover_mode contains no BACKUP subparameter, the entry's own net
service name is implied as the backup. For a net service name called
failover, for example, (failover_mode=(TYPE=SELECT)(METHOD=BASIC))
implies (failover_mode=(TYPE=SELECT)(METHOD=BASIC)(BACKUP=failover)),
meaning that whenever failover occurs, the connected session will
fail over to the net_service_name failover again. A backup should be
specified when using PRECONNECT to pre-establish connections.
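A PRECONNECT configuration with an explicit BACKUP could be sketched
as a pair of entries that name each other as backup. The net service
names ault1_taf and ault2_taf and the host names are assumptions made
for illustration; the instance names ault1 and ault2 follow the
example server setup used in this section.

```
# Each entry connects to one instance and pre-connects to the other.
# ault1_taf, ault2_taf, ault1-host and ault2-host are hypothetical names.
ault1_taf =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = ault1-host)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = ault)
      (INSTANCE_NAME = ault1)
      (FAILOVER_MODE =
        (BACKUP = ault2_taf)
        (TYPE = SELECT)
        (METHOD = PRECONNECT)
      )
    )
  )

ault2_taf =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = ault2-host)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = ault)
      (INSTANCE_NAME = ault2)
      (FAILOVER_MODE =
        (BACKUP = ault1_taf)
        (TYPE = SELECT)
        (METHOD = PRECONNECT)
      )
    )
  )
```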
For remote instances to be
registered with the listener, local_listener must be set in the
init.ora file, even if the listener is using the default port 1521.
Otherwise, with remote_listener="<remote_listener>" alone, the
remote instances will not be registered with the listener, and there
will be no server-side listener connection load balancing.
This is due to bug 2194549, which is fixed in 10g.
If the configuration is not
using the default port 1521, the local_listener parameter in the
initialization file is required. If the output of the hostname
command resolves to the interconnect IP address as opposed to the
public Ethernet IP address, the PMON process will register the
service and instance with the listener at the hostname's address. In
this case, the local_listener parameter should be set to instruct
PMON to register the service and instance with the listener on the
public Ethernet IP address.
The following listing
shows the initialization parameters that would be set for the
example server setup. Both nodes' init.ora files would have the
following parameters:
ault1.local_listener="LISTENER_ault1"
ault2.local_listener="LISTENER_ault2"
db_name='ault'
ault1.instance_name='ault1'
ault2.instance_name='ault2'
remote_listener='LISTENERS_AULT'
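The names assigned to local_listener and remote_listener are normally
resolved through tnsnames entries on each node. A sketch of what
those entries might look like follows; the host names ault1-host and
ault2-host are assumptions, and port 1521 is used only because it is
the default.

```
# tnsnames.ora aliases referenced by the init.ora parameters above.
# ault1-host and ault2-host are hypothetical host names.
LISTENER_ault1 =
  (ADDRESS = (PROTOCOL = TCP)(HOST = ault1-host)(PORT = 1521))

LISTENER_ault2 =
  (ADDRESS = (PROTOCOL = TCP)(HOST = ault2-host)(PORT = 1521))

LISTENERS_AULT =
  (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = ault1-host)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = ault2-host)(PORT = 1521))
  )
```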
The TAF parameters must be
manually added to the tnsnames file, since the Network Configuration
Assistant (NETCA) cannot configure them. Once configured, Oracle Net
will fail over the connection, transparently to the user in many
cases, with the exceptions noted in the list of failover elements
above.
In order to configure TAF, the
static service information in the SID_LIST_<listener_name> entry
must be removed from the listener.ora, allowing the instance to
self-register. This is known as Dynamic Service Registration and has
been available since Oracle8i. In particular, the global_dbname
parameter must not be set in that static entry, or TAF will be
disabled.
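To make the change concrete, a static registration block of the kind
that would need to be removed from listener.ora might look like the
sketch below. The listener name LISTENER, the ORACLE_HOME path and the
host name are assumptions made for illustration; GLOBAL_DBNAME appears
inside the static SID_DESC, which is why removing the block also
removes that parameter.

```
# listener.ora sketch. LISTENER, ault1-host and the ORACLE_HOME path
# are hypothetical values.

# Static entry to delete so that the instance self-registers:
SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = ault)
      (ORACLE_HOME = /u01/app/oracle/product/db_1)
      (SID_NAME = ault1)
    )
  )

# Listener definition that remains after the static entry is removed:
LISTENER =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = ault1-host)(PORT = 1521))
  )
```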