An Oracle RAC review by Microsoft SQL Server architect
Oracle Tips by Burleson Consulting
5 February 07
This 2004 research paper, titled "Oracle
Real Application Clusters and Industry Trends in Cluster Parallelism
and Availability" by James Hamilton, Architect, Microsoft SQL
Server, at first appears to be an "unbiased" review of Oracle v. SQL Server
high availability options.
Like all marketing papers, it takes
a few quotes out of context and takes some fairly extreme liberties,
especially with its conclusions.
Here are the conclusions of the Microsoft paper, which
suggests that other Oracle HA tools (Data Guard, Streams) may somehow be superior to
RAC:
"This paper focuses on
two important attributes of high-scale, data-intensive applications: 1)
application availability and 2) affordable performance.
The original design
point for RAC was multinode scalability and it remains a less-than-ideal choice to
address application availability.
Fortunately, all major
DBMS providers including Oracle offer technologies better suited to achieve
this goal. We recommend these alternatives be used.
Focusing on performance
where RAC is a viable option, it has been shown that there exist more cost
effective architectural alternatives that should be considered when
deploying high-scale, data-intensive application workloads."
As with all vendor
whitepapers, we expect a bias, and this whitepaper is no exception.
To level the playing field, I'm interjecting some
pro-Oracle points into this commentary, so that the reader is not completely
misled by the paper's quotes and conclusions.
RAC has a single point of failure - NOT
We see this non sequitur, quoting James Morle, suggesting
that RAC somehow has a single point of failure:
"The
original design point for RAC when the technology was first conceived and
implemented nearly a decade ago [MORL02] was multinode scalability.
Therefore, it should not be surprising that RAC suffers from single points
of failure that make it a poor choice as a primary availability mechanism."
However, Mr. Hamilton does not go into detail to support this
conclusion. By the way, Oracle RAC's only shared component is the disk,
and disk HA techniques (such as triple-mirroring) remove the disk as a single point
of failure.
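To put a rough number on the mirroring point, here is a minimal Python sketch of the standard redundancy arithmetic. It assumes that disk failures are independent and that the mirror stays available as long as at least one copy survives. The function name mirrored_availability and the 0.99 per-disk availability are hypothetical, chosen only for illustration; they are not measurements of any real RAC or storage system.

```python
def mirrored_availability(disk_availability: float, copies: int) -> float:
    """Availability of an n-way mirror.

    Assumption: disk failures are independent, and the mirror is
    available whenever at least one copy is available.
    """
    if not 0.0 <= disk_availability <= 1.0:
        raise ValueError("disk_availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The mirror is down only when every copy is down at the same time.
    all_copies_down = (1.0 - disk_availability) ** copies
    return 1.0 - all_copies_down


# Hypothetical per-disk availability, for illustration only.
HYPOTHETICAL_DISK_AVAILABILITY = 0.99

for copies in (1, 2, 3):
    value = mirrored_availability(HYPOTHETICAL_DISK_AVAILABILITY, copies)
    print(f"{copies} copies: {value:.6f}")
```

Under these assumptions each extra mirror multiplies the chance of losing the storage by the per-disk failure probability. Real arrays can share failure modes (controller, power, firmware), so treat the result as an upper bound, not a prediction.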
RAC imposes a significant performance overhead - NOT
This whitepaper goes on to cite the overhead of RAC cache
fusion as if it were some sort of performance problem:
"Performance impact:
James Morle of Scale Abilities investigated the overhead of running a RAC
system in his paper Unbreakable [MORL02]. In this paper, he
benchmarks an order entry application running under non-RAC
Oracle and the same workload under a single node
RAC
deployment on the same hardware.
What he found was an 18
percent overhead in moving to RAC
running the same workload on exactly the same hardware."
As a working RAC consultant, I find this a
tad misleading. Of course, a continuous availability solution is going to
have some overhead, but the whitepaper does not note that the overhead is
directly proportional to the amount of update activity. A well-tuned RAC
database has minimal overhead.
You probably don't need RAC - NOT
The most damning research cited in this paper is titled "You
probably don't need RAC". It appears that the author's intent was
to show that most shops do not need continuous availability, and hence do not
need RAC, not that there is anything wrong with RAC.
"I’ve seen many clusters that just froze for
no apparent reason in my time. It’s always possible to make the OS or
Cluster software dump a trace/log file when it happens. . . .
Then the files (often with sizes measured in
GB) are shipped to the vendor and some months later they will report back
that it wasn’t possible to pinpoint the exact reason for the complete
cluster freeze or crash, but that this parameter was probably a bit low and
this parameter was probably a bit high.
That’s what always happens. I have
never—really: never—seen a vendor who could correctly diagnose and explain a
hanging cluster or a cluster that kept crashing."
[YDNR02]
In my experience with RAC, a
properly installed and configured database can run for many years without any
node crashes, and the only reason for RAC is to protect against hardware
failures, not these alleged software failures. Also, I've noted that
Oracle RAC technical support has been amazing, and they provide high-quality
support even faster than the standard Oracle support.
Diagnosing RAC is difficult - NOT
Hamilton goes on to cite this condemnation
of RAC, in which the author suggests that introducing redundant instances will
somehow REDUCE availability!
"One way of looking at availability is this:
If you have a standalone Unix box it will usually give you 99.9%
availability over a year (some say 99.5, some say 99.9). It just runs. And
so does Oracle usually.