This paper examines proven Oracle scalability
strategies for large eCommerce systems that support thousands of end-users
and process thousands of transactions per second.
The paper will explain the economic imperatives in Oracle scalability
and examine the scale-up, scale-out strategy for applying just-in-time
resources to a growing database workload.
We will also explore Oracle Real Application Clusters
and see how RAC fits into a global strategy for seamless growth planning of
VLDB systems. The paper will outline a method for adding just-in-time RAC
nodes to a growing system in order to accommodate increasing demands for
processing power at the web server, application server and database server
levels.
Introduction to Oracle scalability
One of the reasons that Oracle became a leader in
database technology was its flexible tools and utilities, which allow a
database to grow seamlessly from a small departmental system to a giant
multinational behemoth. As
the world's most flexible and robust database, Oracle offers a wide array of
tools and techniques for scaling, and it is the IT manager's challenge
to apply these tools at the proper time to ensure seamless growth while
minimizing the investment in hardware resources.
Even though hardware prices fall by an order of
magnitude every decade, the investment in computing resources remains large,
a critical expense that requires careful management.
Hardware depreciates regardless of use, and a too-much, too-soon
approach can be wasteful. On
the other hand, waiting until the system experiences stress-related response
time delays is also bad, especially since today's end-user community has
very little tolerance for sluggish response time.
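To make the depreciation point concrete, here is a minimal Python sketch that assumes straight-line depreciation over a 36-month life (an illustrative assumption, not an accounting rule taken from this paper). The function names `book_value` and `idle_depreciation` and the dollar figures are hypothetical; the sketch only shows how much value hardware loses while it sits idle because it was bought before the workload needed it.

```python
# Hypothetical model: straight-line depreciation over an assumed 36-month life.
def book_value(cost: float, age_months: float, life_months: float = 36.0) -> float:
    """Remaining value under straight-line depreciation; age, not usage, drives the loss."""
    remaining_fraction = max(0.0, 1.0 - age_months / life_months)
    return cost * remaining_fraction


def idle_depreciation(cost: float, months_early: float, life_months: float = 36.0) -> float:
    """Value lost while hardware sits unused because it was bought before it was needed."""
    return cost - book_value(cost, months_early, life_months)


if __name__ == "__main__":
    # Hypothetical: $120,000 of servers bought 12 months before the workload needs them.
    lost = idle_depreciation(120_000, months_early=12)
    print(f"Value lost before first real use: ${lost:,.0f}")
```

Under these assumptions, buying the servers a year early forfeits about a third of their value before they do any real work, which is the hidden cost of the too-much, too-soon approach.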
Goals for Oracle scalability management
The overall goals for any IT manager are twofold:
maximize end-user satisfaction while minimizing expenses.
· Monitor end-user satisfaction - The primary objective of any
information system is end-user happiness, and assuming that a user's data is
correct and complete, the number one factor is response time.
The IT manager must implement end-user response time monitoring and
ensure that end-user access speeds remain fast.
· Manage economic resources - Hardware is expensive, and the IT
manager must devise a plan to add new hardware only when it is needed.
Advance planning with Oracle tool standards can also reduce the DBA
costs involved in growth.
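One way to act on the first goal is to track a high percentile of end-user response times rather than an average, since a few very slow requests are what users notice. The sketch below is a minimal illustration that assumes response-time samples (in milliseconds) have already been collected, for example from application logs or synthetic transactions; the 2,000 ms threshold and the names `p95` and `check_sla` are hypothetical.

```python
import statistics


def p95(samples_ms: list[float]) -> float:
    """95th percentile of end-user response times, in milliseconds."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(samples_ms, n=20)[-1]


def check_sla(samples_ms: list[float], sla_ms: float = 2000.0) -> tuple[float, bool]:
    """Return the 95th percentile and whether it exceeds the (hypothetical) SLA."""
    value = p95(samples_ms)
    return value, value > sla_ms


if __name__ == "__main__":
    # Hypothetical batch: 95 fast requests and 5 very slow ones.
    samples = [100.0] * 95 + [5000.0] * 5
    value, breached = check_sla(samples)
    print(f"p95 = {value:.0f} ms, SLA breached: {breached}")
```

An average of this batch would look acceptable, while the 95th percentile exposes the slow tail that end users actually experience.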
Given these goals, there are several important tools
and techniques that will come into play as we design a scalable architecture
for an Oracle database.
Techniques for Oracle scalability management
There are many proven techniques and approaches for
ensuring your success in a rapidly growing database:
· Enforce standards - Oracle has many standard conventions that facilitate
seamless growth. These include disk standards such as SAME (Stripe And
Mirror Everything), Oracle's "RAID-10" standard for datafile layouts on disk.
Also, be sure to follow the Oracle Optimal Flexible Architecture
(OFA) for all external files and directories.
Following these standards makes it much easier to grow the database
when the time comes.
· Perform capacity planning - Growth monitoring and planning are
critical to seamless growth.
Like any project, you must know exactly how long it takes to add resources
to your infrastructure and plan accordingly.
For example, if it takes a vendor 72 hours to install more disks,
your predictive monitoring must alert you more than 72 hours before you reach
a disk-full condition. A good
IT manager will install capacity planning and monitoring tools that alert
them well in advance of any resource shortage.
In a rapidly growing database, the idea is to fix the problem before
it cripples the database and affects end-user response time.
· Don't skimp on resource quality - When choosing a hardware vendor
and DBA staff, don't look solely at the costs.
Top quality resources are expensive, and a penny-wise, pound-foolish
approach can backfire. For
example, a DBA with 10 years of experience who charges $300/hr is often a
better value than a $75/hr DBA because they can work many times faster.
With hardware, choosing from the "major" vendors (Sun, HP, IBM,
UNISYS, EMC) is always a wise approach.
You will pay more up front, but "you get what you pay for".
· Use the right tools for the job - Because Oracle is the world's most
flexible database, there are many tools that do similar jobs.
For example, high availability can be provided with Real
Application Clusters (RAC), Oracle Streams, or Oracle Data Guard.
The savvy IT manager will hire Oracle experts to advise them about
the right tool for their specific requirements.
While this may seem self-evident, we must remember that
the stakes are high, and hundreds of millions of dollars are riding on an IT
manager making the right decisions for any mission-critical database.
When designing Oracle systems, it's important to
understand a few very important realities:
1 - Hardware depreciates rapidly - Hardware becomes worthless quickly,
depreciating as a function of age, not of usage.
All CPUs, disks, and RAM depreciate to near-worthless in just a
few years, regardless of how much they are used.
2 - Today's servers allow for internal expansion - Many vendors offer
servers that can quickly accept additional RAM and CPUs, so that you can
scale up within a single server environment.
3 - Human costs
now exceed hardware costs - With hardware costs falling rapidly, Oracle
DBA costs will frequently exceed hardware costs.
For example, instead of paying a DBA $200,000 to tune the I/O for a
large database, you may choose to deploy solid-state disks for $100,000.
4 - Independent
advisors are more reliable - Obviously, hardware vendors will have a
built-in bias, as will Oracle Corporation consultants, each pushing their
own hidden agendas. It's not
hard to find experts with a proven track record of success in architecting
scalable systems.
Now that we have covered the basic concepts, let's examine a
proven approach to scalability: the scale-up, scale-out approach to
infinite growth.
Oracle scalability solutions
While Oracle has a host of tools that facilitate
scalability (such as online reorganization tools), Oracle RAC is most
commonly associated with scalable Oracle solutions.
RAC is marketed for two purposes, scalability and
continuous availability. While
RAC is a superb 24x7 availability solution (when used in conjunction with
other redundant components), using RAC for scalability is widely
misunderstood.
It's important to know that RAC only prevents outages
that are due to an Oracle instance failure; a complete HA solution also
requires redundant disks and other hardware components.
Where RAC shines is its ability to allow you to quickly
add an entire server to a cluster, adding horsepower without affecting
end-user response time. And
Oracle Grid Control is not just for adding database resources.
By using pre-loaded servers, you can use Oracle Grid Control to add
servers to any layer in the architecture: the web server,
application server or database server tier (Figure 1):
Figure 1 - Using Oracle Grid Control to add servers
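The idea of a pool of pre-loaded servers that can be placed into any tier can be modeled with a short sketch. This is not the Oracle Grid Control interface; it is a hypothetical Python model that assumes each tier's demand is expressed in "server units" and that a 75% utilization threshold triggers moving a spare server into the busiest tier. The classes `Tier` and `SparePool` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    """One layer of the architecture (web, application or database)."""
    name: str
    servers: int
    load: float  # current demand, in "server units" (hypothetical measure)

    def utilization(self) -> float:
        return self.load / self.servers


@dataclass
class SparePool:
    """Hypothetical pool of pre-loaded servers that can join any tier."""
    spares: int

    def allocate(self, tiers: list[Tier], threshold: float = 0.75) -> list[str]:
        """Add spares to the busiest tier until all tiers are under the threshold
        or the pool is empty; return the names of the tiers that received a server."""
        moves = []
        while self.spares > 0:
            busiest = max(tiers, key=lambda t: t.utilization())
            if busiest.utilization() < threshold:
                break  # every tier is below the threshold
            busiest.servers += 1
            self.spares -= 1
            moves.append(busiest.name)
        return moves


if __name__ == "__main__":
    tiers = [Tier("web", 4, 3.6), Tier("app", 2, 1.0), Tier("db", 2, 1.9)]
    pool = SparePool(spares=3)
    print(pool.allocate(tiers))  # the database tier is served first, then the web tier
    print(pool.spares)
```

Because the spares are generic, the same pool can relieve whichever layer saturates first, which is the point of keeping pre-loaded servers on hand.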
Using the scale-up, scale-out approach, RAC only comes
into the picture when you have saturated a single server.
Let's take a closer look at how the scale-up, scale-out approach
works.
Scale up, scale out
To achieve seamless growth, you need to be able to
start with a server environment in which additional computing resources can
be added without service interruption.
You must be able to add RAM, CPUs and disk as the workload grows.
Eventually, you will saturate the capacity of even the largest single
server, and then you begin the scale-out process, adding servers
to accommodate your growing workload.
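The scale-up, scale-out sequence can be expressed as a small sizing function: grow a single server until its CPU slots are full, and only then add cluster nodes, each itself fully populated. This is a hypothetical Python sketch that assumes throughput scales linearly with CPUs within a server; the per-CPU throughput figure is made up, and `node_efficiency` is an assumed factor for throughput lost to cluster coordination on each node beyond the first, not a measured RAC overhead.

```python
import math


def plan_capacity(required_tps: float, tps_per_cpu: float, max_cpus_per_server: int,
                  node_efficiency: float = 1.0) -> tuple[int, int]:
    """Return (servers, total_cpus): scale up a single server first, then scale out.

    node_efficiency (hypothetical) is the fraction of a full node's throughput
    that each node beyond the first actually delivers once clustered.
    """
    cpus_needed = math.ceil(required_tps / tps_per_cpu)
    if cpus_needed <= max_cpus_per_server:
        return 1, cpus_needed  # scale up: one server still has free CPU slots
    # Scale out: the first server is full; each added node is itself fully populated.
    remaining_tps = required_tps - max_cpus_per_server * tps_per_cpu
    per_node_tps = max_cpus_per_server * tps_per_cpu * node_efficiency
    extra_nodes = math.ceil(remaining_tps / per_node_tps)
    servers = 1 + extra_nodes
    return servers, servers * max_cpus_per_server


if __name__ == "__main__":
    # Hypothetical figures: 200 TPS per CPU, 64 CPU slots per server.
    print(plan_capacity(3_000, 200, 64))   # (1, 15): scale up only
    print(plan_capacity(30_000, 200, 64))  # (3, 192): scale out to three nodes
```

Lowering `node_efficiency` below 1.0 shows why scale-out is reserved for workloads that have truly saturated a single large server: every clustered node delivers less than a full server's worth of throughput.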
Let's start by understanding how scale-up (vertical scaling) works
with Oracle systems.
Scale Up (vertical scaling)
Oracle hardware vendors promise on-demand computing
resources, lower TCO, and easy scalability. Their huge servers offer savings
from CPU and RAM consolidation, far lower human management costs, and
seamless allocation of resources.
In the "scale up" approach, server resources (CPU, RAM,
disk) can be added into a single, monolithic server, which can have slots
for up to 64 CPUs and over 256 gigabytes of RAM.
Examples include the HP Superdome (64 CPUs), the Unisys ES-7000 Series
(32 processors), the Sun Microsystems SunFire and the IBM X and Regatta
class servers.
Adding resources to a single server frame is simple and
effective because machine resources (especially CPU) are instantly available
to the growing application. The
scale up approach is simple and yields immediate benefits, without the
complexity of the scale-out approach:
· On-demand resource allocation by sharing CPU and RAM among many workloads.
· Less maintenance and fewer human resources required to manage fewer servers.
· Optimal utilization of RAM and CPU resources.
· High availability through fault-tolerant components.
· The expense and management of RAC are not required.
But we cannot "scale up" forever, and as our processing
demands grow, we need to look at scaling out: the "horizontal" scaling
of many large servers in a RAC cluster environment.
Scale Out (horizontal scaling)
Grid vendors offer solutions where server blades can be
added to Oracle as processing demand increases. While Grid computing offers
infinite scalability, no central point of failure, and the use of fast, cheap
server blades, it does not have the same in-the-box parallelism that is found
within a monolithic server. Unlike the scale up approach, Oracle10g Grid
computing is not automatic and requires additional costs, additional
training, and sophisticated monitoring and management software.
The "scale out" approach is designed for super large
Oracle databases that support many thousands of concurrent users. Unless the
system has a need to support more than 10,000 transactions per second, it is
likely that the system will benefit more from a scale up approach.
For these huge shops, which rely on on-demand server
allocation with Oracle Grid Control, we see the ability to "gen in" (add)
new servers on an as-needed basis.
Combining horizontal scalability and vertical scalability
In the real world, savvy corporations combine vertical
scalability and horizontal scalability.
They start with a large vertical architecture server, adding
resources as needed. If
continuous availability is also required, they may have a mirrored server
using long-distance RAC or Oracle Streams.
When the single server is approaching capacity, they
then "scale out" with horizontal scalability, employing RAC and adding
additional servers, each with a vertical scaling architecture.
Conclusions
The scale-up and scale-out approaches are simple in
concept, but difficult to deploy in practice.
A common misconception is that everyone will eventually need to
scale out. However, in the real
world, very few applications have workloads that saturate the capacity of
million-dollar servers with 64 CPUs and hundreds of gigabytes of RAM.
The scale-out approach using RAC and Grid Control is
designed for super large Oracle databases that support many thousands of
concurrent users. Unless a
system needs to support more than 10,000 transactions per second, it is
likely that the scale up approach will be more than adequate.
Amazon is an excellent example of a scale-out Oracle
shop. Amazon announced plans to
move its 14-trillion-byte Oracle database to Oracle RAC on Linux, and it
uses load-balanced Linux Web servers to horizontally scale its Web
presence to millions of connected users.
Remember, large-scale RAC databases use large servers,
each with 32 or 64 processors and over a hundred gigabytes of RAM.
As the capacities of the large servers are exceeded, a new server is
genned (added) into the RAC cluster.