Introduction
The choice of implementing
Oracle's Real Application Clusters
(RAC) product is a great temptation for DBAs and businesses alike. What could
be better than 24x7 availability, true scalability, rock-bottom hardware prices
due to commodity servers, high performance, and maximum user concurrency? It
all sounds like a miracle piece of software, well worth the extra cost and
However, there are things you need
to know about RAC before you commit your business to using it. We will divide
these concerns into three main sections:
For each of these sections, there
are definite pros and cons. This presentation aims at providing a non-biased
view based on years of experience with Oracle RAC to help you choose the best
path for your business.
We need RAC to
Find out what RAC is and report back immediately.
What is Oracle RAC:
Perhaps you heard about RAC from
an Oracle sales sheet and were captivated by the wide range of benefits
covered. Or maybe you heard about it from a colleague or an article in
InfoWorld, where you saw another business implemented it and cut their downtime
right to the 0.00001% mark.
Wherever you hear about RAC, you
usually only receive a single viewpoint. Some people really love it because of
the benefits they've gotten, though at a cost, while others absolutely hate it
because of the trouble it has caused them. I have worked with companies that
have felt both ways.
Before we actually delve into all
these pros and cons to gain a whole view of a RAC implementation, let's take a
look at what RAC is from a high-level perspective.
The Oracle RAC System
Though we all refer to our
implementations of Oracle as 'the database,' a complete Oracle system is
actually formed of two parts: the database and the instance.
Component 1: The Database
The database is simply our files
on disk. An Oracle database consists of three specific required file types.
Datafiles in RAC
Oracle Datafiles are the final
storage location of our data. All data that is inserted, updated, or deleted
will make its way to the datafiles (eventually) once the changes are committed.
These files are physically stored on disk resources.
These files make up what are known
as tablespaces. A tablespace is a logical disk area that Oracle objects such as
tables and indexes can be stored in. When a DBA or developer creates an object,
the object is placed logically in the tablespace and physically in a data file.
Objects are further broken down into extents and blocks, but this is beyond the
scope of this explanation.
In Oracle 10g and beyond, there are two
tablespaces that are absolutely required: SYSTEM and SYSAUX.
Control Files in RAC
The Control File is the record
keeper of the Oracle Database. It keeps track of the current state of the
datafiles and redo logs, archive logs, and the database itself. In a standard
Oracle system (one database, one instance) you may have multiple control files,
but they are all copies of each other. This is known as multiplexing.
Oracle's Control File is a
required file. If you lose a control file, the instance will crash and stay down
until a recovery of some sort is performed.
Redo Logs in RAC clusters
Think of redo logs as a tape
recorder that records every change in the Oracle database. As changes occur,
they are regularly recorded in the online redo logs, just like you might record
a movie on your VCR.
Also like VCRs, Oracle can
'replay' the saved transactions in the redo logs,
and re-apply lost transactions back into the database. Many times, this means
that Oracle can recover from a crash without the DBA having to do anything other
than telling the database to start up.
At a minimum, Oracle requires that
you have two redo logs assigned to the database.
Oracle will write redo to the first log, and when the first log is full, Oracle
will switch to the second log and continue writing redo there. Each of these
individual online redo logs is known as an online redo log group.
The reason we call them groups is
that there can be mirrored copies of the online redo log files in each group.
Like control files, it's a good idea to have multiplexed copies of the redo
logs. Each copy of a redo log file within a log group is called a 'redo log
member'. Each redo log group can have one or more members.
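The way Oracle cycles through redo log groups can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class names, the file names, and the "capacity" measured in records are all made up for illustration, and archiving a full log before it is reused is left out entirely.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RedoLogGroup:
    """One online redo log group with one or more mirrored members."""
    group_id: int
    members: list[str]                       # hypothetical file names
    records: list[str] = field(default_factory=list)


class RedoLog:
    """Toy model: write redo to the current group, switch when it is full."""

    def __init__(self, groups: list[RedoLogGroup], capacity: int) -> None:
        if len(groups) < 2:
            raise ValueError("at least two redo log groups are required")
        self.groups = groups
        self.capacity = capacity             # records per group (made-up unit)
        self.current = 0                     # index of the group being written
        self.switches = 0                    # number of log switches so far

    def write(self, change: str) -> None:
        group = self.groups[self.current]
        if len(group.records) >= self.capacity:
            # Log switch: move to the next group in the circle and reuse it.
            self.current = (self.current + 1) % len(self.groups)
            self.switches += 1
            group = self.groups[self.current]
            group.records.clear()
        group.records.append(change)


log = RedoLog(
    [RedoLogGroup(1, ["redo01a.log", "redo01b.log"]),
     RedoLogGroup(2, ["redo02a.log", "redo02b.log"])],
    capacity=3,
)
for n in range(7):
    log.write(f"change {n}")
print(log.switches, log.current)
```

With two groups of three records each, seven changes force two log switches, and the writer ends up back on the first group.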
Component 2: The Instance
The Oracle Instance is the actual
runtime aspect of Oracle. The instance is made up of:
Binary Processes in RAC
Oracle actually runs as five
critical and required binary processes that are activated when you start your
SMON - The System Monitor. SMON is
primarily used to recover a crashed instance.
PMON - The Process Monitor. PMON
cleans up dead processes and registers network services for the instance.
DBWR - Database Writer. DBWR is
used to write blocks to datafiles (transition from instance to database).
LGWR - Log Writer. LGWR writes redo
information to the redo log files.
CKPT - Checkpoint. CKPT assists in
keeping all files in sync.
Please note that on Windows, these
five separate processes are threaded under a single process called ORACLE.EXE.
If any of these processes fail,
the entire instance of Oracle crashes. In a single instance environment, this
results in downtime.
RAM Memory and RAC
Oracle stores data in RAM in an
area called the System Global Area (SGA). The SGA is broken down into pools
where data can be temporarily stored before being discarded, overwritten, or
flushed to disk. These pools, or memory areas, are:
Buffer Cache: Stores cached blocks
of data from Oracle datafiles when queried. Also stores data written with
inserts, updates, and deletes (called Data Manipulation Language, or DML). Data
is flushed from this pool via DBWR to the datafiles.
Shared Pool: Caches the means by
which SQL can be executed, called an execution plan. When SQL is run, it must
be parsed; if the execution plan is cached in the shared pool, the parse phase
is sped up considerably.
Log Buffer: Stores change data to
be flushed to the current redo log file. Flushing occurs every commit, every
three seconds, when the buffer is 1/3rd full, when it reaches 1MB, on
checkpoint, or when required by DBWR.
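The Log Buffer flush conditions listed above can be written down as a simple check. This is a minimal sketch, not Oracle's actual logic: the `LogBufferState` fields and the `flush_reasons` function are hypothetical names, and the buffer sizes in the example are made up.

```python
from dataclasses import dataclass

ONE_MB = 1024 * 1024


@dataclass
class LogBufferState:
    size_bytes: int                  # total size of the log buffer
    used_bytes: int                  # redo currently waiting in the buffer
    seconds_since_flush: float
    commit_issued: bool = False
    checkpoint: bool = False
    dbwr_waiting: bool = False       # DBWR requires the redo to be flushed


def flush_reasons(state: LogBufferState) -> list:
    """Return every condition from the list above that currently applies."""
    reasons = []
    if state.commit_issued:
        reasons.append("commit")
    if state.seconds_since_flush >= 3:
        reasons.append("three seconds elapsed")
    if state.used_bytes * 3 >= state.size_bytes:
        reasons.append("buffer one third full")
    if state.used_bytes >= ONE_MB:
        reasons.append("1MB of redo buffered")
    if state.checkpoint:
        reasons.append("checkpoint")
    if state.dbwr_waiting:
        reasons.append("required by DBWR")
    return reasons


quiet = LogBufferState(size_bytes=6 * ONE_MB, used_bytes=10_000,
                       seconds_since_flush=0.5)
busy = LogBufferState(size_bytes=6 * ONE_MB, used_bytes=1_500_000,
                      seconds_since_flush=0.5, commit_issued=True)
print(flush_reasons(quiet))
print(flush_reasons(busy))
```

An empty list means nothing forces a flush yet; any non-empty list means the buffer would be written to the current redo log.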
Note that the Buffer Cache is very
important for RAC. I will explain this in a moment.
Cache Fusion for RAC
RAC provides us a multiple
instance, single database system. In a RAC environment, there is one shared set
of datafiles. Each instance in the 'cluster' will have its own SGA (RAM areas)
and binary processes. Each instance also has its own set of redo log files,
while the control files are shared by all instances; all of these files must be
viewable by all nodes, or systems, in the cluster.
A RAC environment uses something
called Cache Fusion to bring all the instances in the cluster together. Each
instance has its own Buffer Cache, as we saw in the previous section; however,
Oracle fuses these caches together into a single global buffer cache. This
occurs over a private network called the cluster interconnect (or private
interconnect).
This cluster interconnect allows
each node of the RAC cluster to share cached data located in the buffer cache
with any other node on the cluster.
Figure 1. A simple view of Cache Fusion at Work
Notice in the image above that
Instance 1 (server 1) queries the centralized storage to find all employees
between 1 and 10. Once this query has been executed and fetched, the data will
be cached in Instance 1's Buffer Cache. If Instance 1 were to require any of
this data again, it would have to look no further than local RAM. RAM is much
faster than disk, and so the query would return much quicker.
Now notice that Instance 2 runs a
query that wants a row that Instance 1 already has cached. In this case,
Instance 2 would receive the data over the high speed network interconnect using
Cache Fusion. This RAM to RAM transfer over the network isn't as fast as local
RAM, but it definitely beats going to disk for it!
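The lookup order described above (local RAM first, then another instance's cache over the interconnect, then disk) can be sketched as a toy model. The `Instance` class and its fields are hypothetical, and simply asking each peer in turn is a simplification for illustration; it is not how Oracle actually locates a cached block.

```python
class Instance:
    """Toy RAC instance with its own buffer cache and shared storage."""

    def __init__(self, name, disk):
        self.name = name
        self.disk = disk              # shared storage: block id -> data
        self.buffer_cache = {}        # this instance's local cache
        self.peers = []               # other instances in the cluster

    def read(self, block_id):
        """Return (data, source), trying the cheapest source first."""
        # 1. Local RAM: fastest.
        if block_id in self.buffer_cache:
            return self.buffer_cache[block_id], "local cache"
        # 2. Another instance's cache over the interconnect: slower than
        #    local RAM, but faster than disk.
        for peer in self.peers:
            if block_id in peer.buffer_cache:
                data = peer.buffer_cache[block_id]
                self.buffer_cache[block_id] = data
                return data, f"interconnect from {peer.name}"
        # 3. Shared disk: slowest.
        data = self.disk[block_id]
        self.buffer_cache[block_id] = data
        return data, "disk"


disk = {emp_id: f"employee {emp_id}" for emp_id in range(1, 11)}
inst1 = Instance("instance1", disk)
inst2 = Instance("instance2", disk)
inst1.peers = [inst2]
inst2.peers = [inst1]

first = inst1.read(5)     # nobody has it cached yet
second = inst1.read(5)    # now cached locally
third = inst2.read(5)     # cached on instance1, fetched over the interconnect
print(first, second, third)
```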
High Availability and RAC
RAC also gives us the benefit of
High Availability. If Instance 2 above crashes (for instance, due to a power
plug being kicked loose, or a fatal error of some sort on the system), Instance 1
will take over the user load. All connections that would have pointed to
Instance 2 (and in some cases connections that were already pointing at Instance
2) will fail over to Instance 1.
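The basic idea of failing over to a surviving instance can be illustrated with a short sketch. The function and exception names here are hypothetical, and Oracle's own failover machinery is not modelled; the code only shows a client preferring one instance and falling back to another that is still up.

```python
class ClusterDown(Exception):
    """Raised when no instance in the cluster is available."""


def pick_instance(preferred, instances, is_up):
    """Return the preferred instance if it is up, else the first survivor."""
    candidates = [preferred] + [i for i in instances if i != preferred]
    for name in candidates:
        if is_up.get(name, False):
            return name
    raise ClusterDown("no instance is available")


instances = ["instance1", "instance2"]
is_up = {"instance1": True, "instance2": False}   # instance2 has crashed

target = pick_instance("instance2", instances, is_up)
print(target)
```

A connection that would have gone to the crashed instance lands on the surviving one; only when every instance is down does the client see an error.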
Scalability and RAC
There are two ways to scale your
hardware: horizontally and vertically. We all know about vertical scaling; we
build up. We add CPUs, RAM, etc until the system we are on is full. To
visualize scaling vertically, think of Manhattan. There is no more room on the
horizontal plane; they cannot build new buildings. However, they can build up,
taller, and therefore have the skyscrapers we all know, love, and sometimes
Scaling horizontally is the
practice of adding new systems to the cluster. For example, think Oklahoma.
There is a lot of land available, acres and acres of spare room. When new
developments are needed, they do not need to build taller buildings. Instead,
they build out, scaling upon the horizontal plane (or plains, as the case may
be).
Is That It?
No, of course RAC does much more
internally. There is software called Clusterware that must make the bridge
between the nodes, or servers, of the cluster. Disks must be set up properly in
order to allow this shared storage. Networks must be set up just so to allow
data to transfer freely from node to node. Complex locking mechanisms must be
in place to make sure data is reliable and secure.
There is much more than this
diagram to a fully functional RAC system. However, it provides us enough meat
to start talking about the pros and cons of using Oracle 10g RAC.
What Does RAC Do For My
The primary goal of RAC can be
summed up in a single word: Uptime.
Uptime and Oracle RAC
Data drives business.
Applications, DSS, expert systems, reporting, analytics, they all require a
steady stream of data to keep them alive; and thus, your business requires data
to stay above ground.
If a bank loses its core
transaction database for even a single hour, it can and will cause massive
amounts of error, possible data corruption, and millions of dollars lost. And
though this seems like a horrible loss, others can be more horrible still.
Imagine if the data powering the FAA's air traffic control systems was suddenly
lost, with the hundreds of planes in the air at all times? Or if the database
powering a just in time provider of organs for transplant were to suddenly crash
because the janitor pulled the plug? It sounds incredibly dramatic, but a
crashed database could end lives.
Oracle RAC is a High Availability
(HA) system. It makes downtime more bearable by providing multiple nodes to
connect to. If you have a four node RAC cluster and a single node crashes, the
three remaining nodes will take over immediately, without a single second of
downtime, and allow your business to continue.
Not all downtime is 'bad.'
Downtime comes in two categories: planned and unplanned.
Unplanned Downtime and RAC databases
Unplanned downtime was mentioned
above, and is generally regarded as the worst type. It can last from seconds to
hours in extreme situations, and can happen because of some of the most simple
or unexpected issues.
Some examples of events causing
unplanned downtime:
Overheating server room
Fat fingered mistake (for instance, a systems administrator kills a required
process such as SMON)
Oracle Internal errors
Localized disasters (coffee spill on the new Sun server)
Planned Downtime and RAC
Planned downtime is more graceful
than unplanned of course, but in some ways can be worse than unplanned
downtime. Depending on the software on the server, it could require frequent
restarts in order to keep things updated. Some developers and administrators
want daily maintenance periods, which can cause planned downtime to be the bulk
of your total downtime.
RAC alleviates these issues by
allowing you to have a single server down at a time. Work can progress in a
'rolling' fashion, where one server at a time comes down, thereby allowing your
operation to remain online.
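Rolling maintenance can be sketched as a loop that takes down one node at a time. This is an illustration only: the node names and the `patch` callback are made up, and a real rolling procedure involves far more than removing a name from a set.

```python
def rolling_maintenance(nodes, patch):
    """Patch nodes one at a time; return (patch order, fewest nodes online)."""
    online = set(nodes)
    min_online = len(online)
    order = []
    for node in nodes:
        online.remove(node)                  # take a single node down
        min_online = min(min_online, len(online))
        patch(node)                          # do the maintenance work
        order.append(node)
        online.add(node)                     # bring it back before moving on
    return order, min_online


patched = []
nodes = ["node1", "node2", "node3", "node4"]
order, min_online = rolling_maintenance(nodes, patched.append)
print(order, min_online)
```

In a four node cluster, at least three nodes stay online throughout, so the operation as a whole never goes down.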
Scalability and RAC
Oracle, other vendors, and
consultants may mention that RAC is good because of the price. Though at first
glance it seems expensive, an added cost per CPU on top of what you are already
paying for Oracle, it can actually decrease costs by decreasing hardware
We've all seen spreadsheets for
new project implementations where we list all the new hardware we will need to
purchase. We have all seen the requests for huge multi-processor systems that
are upgradeable to somewhere around 128 gigabytes of RAM and over 90 CPUs. They usually
end up costing hundreds of thousands of dollars, and even run into the range of
RAC allows us to connect multiple
low cost machines together in order to provide the same capability of a single
large system, with the added benefit of high availability. For instance, we can
use four 16-CPU systems instead of a single 64-CPU server. We will probably save
money using the lower-cost hardware, and now we can add new servers if we run
out of capacity, whereas our 64 CPU system may be maxed out.
In addition, a single system may
have underutilized resources. If the system is waiting on a RAM resource, but
the CPUs are at only 50% capacity, you are wasting half your CPUs. In a RAC
environment, we can utilize every server to the max. The concurrent processes
will be balanced across all the nodes of your cluster, and will therefore have a
better chance to use otherwise unclaimed resources.
Possibly Go Wrong?
After the previous section, you
may be thinking 'Where do I sign up?'
It's not that simple. RAC has its
drawbacks as well, from Implementation up to Usage.
Implementation of Oracle RAC
RAC is a complex system to
implement. Most companies I have worked with require a consultant to come in to
help plan their move to RAC and for the actual installation itself. There are
many different pieces to the RAC environment, from networking to disk drives to
Clusterware to Oracle itself. On top of that, there are some costly disk
In order to implement a RAC
system, you must use some sort of shared storage device. Whereas a single
instance database can use Direct Attached Storage (DAS), which is an array of
inexpensive disks connected to a single server, you must now use what is known
as a Storage Area Network (SAN). A SAN is much more expensive, capable of
connecting to many servers, usually through fibre-channel connections. This
requires a unique set of hardware, ranging from Host Bus Adapters (HBA) to the
SAN itself, and it can get very costly.
Redundancy can also be costly.
Even though you have multiple servers to fail over to, most administrators
require redundancy within each server as well. This means doubling up on
hardware, and double the hardware equals double the cost. For each server, you
will want multiple Host Bus Adapters, multiple network cards, multiple power
sources, etc. The multiple HBA cards are used in case a single one fails; but
this usually requires expensive software to manage.
Yet another cost is the network
connection. Earlier we learned that the RAC system requires a cluster
interconnect in order to accommodate RAM-to-RAM transfers of data blocks. This
interconnect must be very fast, high bandwidth with low latency. Interconnects
such as Infiniband and Myrinet can accommodate this, but are very expensive.
Though RAC does provide horizontal scalability, if your cluster interconnect
cannot handle the traffic, extra servers will actually degrade your performance
instead of helping it. The only way around this issue is to change your entire
application to accommodate RAC, or to purchase other means of disk storage such
as Solid State Disk.
RAC learning for DBAs & System Administrators
There is a definite learning curve
when it comes to RAC. Because of all the different components that make up a
RAC environment, multiple levels of training may be required.
System Administrators will have to
learn how to work with the disk resources. Complex SAN environments such as EMC
and NetApp can require training of their own. In addition, Oracle RAC will only
function when using specific disk setups (ASM, OCFS, RAW, or a third-party
CFS), and the administrator will have to assist in setup. Setting up and
administering the hardware mentioned in the previous section on Implementation
is no small task!
Network Administrators will have
to learn how to work with the new interconnect. If you use a specialized
interconnect such as Infiniband, training and consulting may be required.
Of all the staff, DBAs will have
the greatest learning curve. They will have to understand how to set up and
administer Clusterware, your volume manager or filesystem of choice, the RAC
specific features of Oracle, and troubleshooting for clusters. While this does
not sound like much, it makes up many days of training, lots of trial and error,
and even a little bit of 'miracle work' at times.
Heck, by the time you're done, you,
the manager, may require some training in dealing with setting up training
sessions, consulting, and dealing with employees with some great new marks on
Usage for RAC
Thankfully, once a RAC system has
been implemented, it behaves much like a normal database. Oracle's goal is to
provide transparency for all users, so no one ever knows they're even touching a
complex RAC environment.
However, this does not apply to
the DBA. The DBA must keep everything in the RAC environment monitored, up to
date, and running perfectly. With so many components, it is possible for more
things to go wrong.
The DBA must monitor the cluster,
the shared disk setup, ASM or OCFS if they're in use, the database, all
instances, listeners, and more in-depth metrics such as cache coherency,
interconnect latency, disk times from multiple systems, and many other things.
While tools such as Grid Control help perform this monitoring, they cost more
money, require more implementation, and possibly even training and consulting.
Remember also that humans are fast
becoming the most expensive part of the IT environment. With hardware costs
falling on a daily basis while manpower costs remain the same, you may pay a
hefty fee for the administration of this complex environment. DBAs that are RAC
proficient are usually better paid. In addition, you may need more DBAs than
you previously did to keep everything in top notch shape.
Another note on usage comes from
the architecture of RAC as a whole. Remember the Cache Fusion component we
learned about in the last section? Well, it's nice, but it's not always a
surefire winner. While RAM-to-RAM transfers over the network are indeed faster
than reading from disk, they're still not as fast as a local RAM read. You may
notice key queries slowing down where they used to be lightning fast due to the
application pointing at varying nodes of the cluster.
In addition, we learned in the
last section on Implementation that the interconnect MUST be very fast with low
latency in order to sustain your RAC cluster. If you bog down the interconnect
with too many nodes, it could be that your performance hits rock bottom; this
time may come sooner than you think. RAC is scalable, and it performs well, but
it's not the be-all and end-all of performance. In fact, most database
professionals find it easier to tune a single instance system than a RAC
environment, due to the lower level of complexity and resources required for
High Availability, Yes. Disaster Recovery,
We have learned about instance
failure, which is roughly the same as server failure. RAC protects us against
this issue by providing multiple servers to which we may connect. However,
remember that all data will be in centralized storage. There is still a
possibility of data failure or data center loss.
Data failure is the worst of the
three we have seen thus far (instance and system failure), resulting in the loss
or corruption of data. Some disk failures are non-disastrous; for instance, if
a disk is mirrored with hardware or software RAID. Even then, if excessive
disks are lost it is possible that production data could be lost as well,
requiring some form of recovery. User error can also cause data loss if an
operating system user removes database files with a command such as rm.
In this case, the file will be removed, and the disk mirror will provide no
protection. Lastly, corruption can occur if hardware or software bugs result in
inappropriate data being written to the datafiles.
Data Center Loss occurs when a
system is completely lost, usually as the result of some sort of natural
disaster. A hurricane, flood, or tornado may destroy or seriously disable an
entire data center resulting in a combined loss of servers and disk. This is by
far the worst unplanned-downtime scenario, and can only be protected against
with extensive (and usually expensive) disaster recovery methods.
Oracle provides many options for
preventing downtime and data loss, all of which make up the Maximum Availability
Architecture (MAA). The MAA provides us with redundancy on all components and
employs different Oracle tools. RAC only makes up one piece of the MAA; it does
not account for all possible problems.
As we have seen in the previous
section, these tools must protect us from planned and unplanned downtime. In
addition, they must protect us from varying levels of unplanned downtime ranging
from single server outages (which RAC covers) to entire data center loss (which
RAC does not cover).
Some businesses choose not to
follow all the guidelines for maximum availability. When considering a high
availability strategy, the DBA must consider:
Recovery Time Objective (RTO)
Recovery Point Objective (RPO)
The RTO defines the allowable
downtime for the database. An advertising company may allow hours of downtime;
however, a bank will usually allow no downtime whatsoever. RPO defines the
allowable data loss if a failure occurs. If batch processes load our data, it
may be that hours or even days of data could be reloaded. However, for a system
that allows direct access by the end user, such as an online store or ATM, no
data loss is acceptable.
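One way to make RTO and RPO concrete is to compare a recovery strategy against both objectives. The sketch below is illustrative only: the class names, the "nightly backup" strategy, and every duration are made-up assumptions, not measurements.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Objectives:
    rto: timedelta                # allowable downtime
    rpo: timedelta                # allowable window of lost data


@dataclass(frozen=True)
class Strategy:
    name: str
    recovery_time: timedelta      # expected time to restore service
    data_loss_window: timedelta   # worst-case data lost on failure


def meets(strategy: Strategy, objectives: Objectives) -> bool:
    """A strategy qualifies only if it satisfies both RTO and RPO."""
    return (strategy.recovery_time <= objectives.rto
            and strategy.data_loss_window <= objectives.rpo)


# Hypothetical figures for illustration.
nightly_backup = Strategy("nightly backup",
                          recovery_time=timedelta(hours=4),
                          data_loss_window=timedelta(hours=24))
ad_company = Objectives(rto=timedelta(hours=8), rpo=timedelta(hours=24))
bank = Objectives(rto=timedelta(0), rpo=timedelta(0))

print(meets(nightly_backup, ad_company), meets(nightly_backup, bank))
```

The same strategy can be perfectly adequate for one business and unacceptable for another; the objectives, not the technology, decide.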
Downtime can be expensive.
Depending on the system, costs can range from dollars per minute to
tens-of-thousands of dollars lost for every minute the database is unavailable.
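To get a feel for these numbers, an availability percentage can be turned into minutes of downtime per year and then into a cost. The function names and the cost-per-minute figure in this sketch are assumptions chosen for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600, ignoring leap years


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Minutes of downtime per year implied by an availability percentage."""
    return MINUTES_PER_YEAR * (100.0 - availability_pct) / 100.0


def annual_downtime_cost(availability_pct: float, cost_per_minute: float) -> float:
    """Yearly cost of downtime at a given cost per minute."""
    return downtime_minutes_per_year(availability_pct) * cost_per_minute


# Hypothetical example: 99.9% availability at $10,000 per minute of downtime.
minutes = downtime_minutes_per_year(99.9)
cost = annual_downtime_cost(99.9, 10_000)
print(round(minutes, 1), round(cost))
```

Even 99.9% availability leaves roughly 525.6 minutes of downtime a year, which is why the cost per minute matters so much when weighing what to spend on uptime.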
However, as we have seen here,
uptime is expensive as well. In the previous sections we've talked about how
costly RAC can be for your business; now we see that even more may be required
for a fully bulletproof system.
Figure 2: Example of an HA Configuration using MAA Best Practices
(property of Oracle Corporation)
RAC provides businesses with some
outstanding benefits. Not only can you be much closer to 100% uptime, but you
can also enjoy scalability with lower priced hardware, and possibly even a
higher user load.
But do not forget that these
things come with a cost. The cost will not only be in licensing; it will be in
the form of employees, training, consultants, software, hardware, and other
little mentioned components of a RAC system. In addition, RAC only provides
support for part of the availability spectrum. Other costs will have to be
endured to become fully bulletproof.
It is important for managers to
understand these concepts before embarking on the RAC quest; remember that while
your employees are hopefully top notch and know what they are doing, it is your
credibility if you jump into a project without having a full view of its