Comparing a Sun E4500 and a Linux Red Hat 3.0 AS Cluster Using Oracle10g
by Michael R. Ault
In many Oracle shops, there is a move afoot from the larger, older mid-range UNIX servers to the commodity-based Linux model of computing. Whether this move is motivated by finance, by technology, or simply because Linux is the new kid on the block, it is important to understand how it may affect the performance and behavior of your system.
Why Did I Choose Linux to Test Against?
There are many operating systems available for the data center, including Windows-based systems and other UNIX versions such as HPUX, AIX, and TRU64, so why Linux? Let's look at a few of the most compelling reasons:
• Linux is one of the fastest-growing operating systems; in 2002, its market share grew 18 percent, as opposed to Microsoft's three percent
• Linux provides enterprise-level OS features for a fraction of the cost
• Linux utilizes a non-centralized support model in which bugs are tracked, reported, and fixed by thousands of developers worldwide
• Linux outperforms most other operating systems, especially with Oracle (most new development at Oracle will be done on Linux, replacing Solaris)
Thus, Linux seems to be one of the up-and-coming operating systems, and in my consulting practice I have had more requests concerning conversion to Linux than to any other operating system.
What We Will Cover
This article discusses a testing methodology and the results of a test that pitted a Sun E4500 (four 400 MHz RISC CPUs, 2 GB RAM, and an AS5100 8 x 73 GB FC disk array) against a Linux Red Hat 3.0 AS database server cluster. Each cluster node had dual 2.8 GHz Xeon CPUs, 2 x 36 GB internal SCSI disks, 2 GB RAM, and a 2 Gb/s FC HBA connected to a 12-disk SATA array (12 x 36 GB, 10K RPM), for a total system of four Linux servers, two SATA arrays, eight FC ports, and eight Gig-E ports.
Now, some of you are no doubt saying, "Geesh, the E4500 is already out-gunned," and I would have to agree with you. However, this is a typical configuration of those being replaced, and that is why it was chosen.

Figure 1: Linux server configuration.
None of the servers were preinstalled with Oracle; all were "Greenfield" boxes with just the operating system installed.
Basic Project Plan
The basic project plan consisted of setting up both systems utilizing the same release of Oracle10g (10.1.0.2), configuring the Sun Solaris system as a single-instance Oracle10g database and the Linux cluster as an Oracle10g RAC system. I picked this design because a majority of clients will be operating in single-instance mode on Oracle before they decide to go to Oracle10g using RAC.
We configured the RAC cluster by following this general procedure:
1. Install CRS
2. Install ASM and database software
3. Create example RAC database on Linux
4. Utilize TPCH dbgen to generate 10GB database load files
5. Load example Linux database using SQLLoader
6. Create indexes on Linux
Next, we configured the Solaris system:
1. Install CRS
2. Install ASM and database software
3. Create single-instance database on Solaris
4. Utilize TPCH dbgen to generate 10GB database load files
5. Load example Solaris database using SQLLoader
6. Create indexes on Solaris (a sketch of typical indexes follows this list)
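The article does not spell out exactly which indexes were built in step 6; purely as an illustration, indexes on the join columns used by the test query (shown later in Listing 1) might look like the following sketch, in which the index names are hypothetical:

-- Illustrative only: the TPC-H schema defines the keys, but the exact indexes
-- created for the test are not listed, and the names below are hypothetical.
create index partsupp_suppkey_idx   on partsupp (ps_suppkey);
create index supplier_nationkey_idx on supplier (s_nationkey);
create index nation_name_idx        on nation   (n_name);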
Note that the Solaris and Linux configurations progressed in parallel as much as possible.
Once we had the basic database loaded, we used the qgen routine from TPCH to generate the standard queries; after reviewing the queries, one was chosen as the test query for our purposes. This query is shown in Listing 1.
select
    ps_partkey,
    sum(ps_supplycost * ps_availqty) value
from
    partsupp,
    supplier,
    nation
where
    ps_suppkey = s_suppkey
    and s_nationkey = n_nationkey
    and n_name = 'INDIA'
group by
    ps_partkey
having
    sum(ps_supplycost * ps_availqty) > (
        select
            sum(ps_supplycost * ps_availqty) * 0.0001000000
        from
            partsupp,
            supplier,
            nation
        where
            ps_suppkey = s_suppkey
            and s_nationkey = n_nationkey
            and n_name = 'INDIA'
    )
order by
    value desc;
Listing 1: Query chosen for test.
Notice that the query uses a three-table join, a sub-select, a group
by, and an order by, making it a fairly complex query that utilizes
sorts, indexes, and summaries, and will therefore exercise much of
the Oracle query engine.
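For readers who want to see how the optimizer handles this mix of joins, grouping, and sorting on their own systems, a quick plan check is easy to run. The following is a minimal sketch, not part of the original test, that wraps the Listing 1 query in EXPLAIN PLAN and displays the result with the DBMS_XPLAN package:

-- A minimal plan check, not part of the original test: explain the Listing 1
-- query and display the optimizer's plan with DBMS_XPLAN (available in 10g).
explain plan for
select ps_partkey, sum(ps_supplycost * ps_availqty) value
from partsupp, supplier, nation
where ps_suppkey = s_suppkey
  and s_nationkey = n_nationkey
  and n_name = 'INDIA'
group by ps_partkey
having sum(ps_supplycost * ps_availqty) > (
    select sum(ps_supplycost * ps_availqty) * 0.0001000000
    from partsupp, supplier, nation
    where ps_suppkey = s_suppkey
      and s_nationkey = n_nationkey
      and n_name = 'INDIA')
order by value desc;

select * from table(dbms_xplan.display);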
Tools Used
We used the TPCH benchmark dbgen program (which had to be relinked for Solaris) to create a 10GB load set. The load set was generated as five flatfiles per major table, except for the nation and region tables, which had one flatfile each. The databases were loaded using SQLLoader. (TPCH, the TPC Decision Support benchmark, is available from www.tpc.org.)
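A 10GB load set corresponds roughly to TPC-H scale factor 10, so after the SQLLoader runs complete it is worth a quick sanity check of the big tables. The sketch below, not taken from the article, simply counts rows; at this scale, lineitem should hold roughly 60 million rows (matching the five 12-million-row files mentioned later) and partsupp about 8 million:

-- Quick post-load sanity check (not from the article): row counts for the
-- larger TPC-H tables after the flatfile loads complete.
select 'lineitem' table_name, count(*) row_count from lineitem
union all
select 'orders',   count(*) from orders
union all
select 'partsupp', count(*) from partsupp;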
One item that is not well documented is that, to use the Oracle ASM product most effectively, you should use the ASMLIB set of routines from Oracle, so be sure that the oracleasm libraries and utilities are downloaded and installed. (Side note: Fedora cannot load this package, as it is branded for the commercial kernel.) The oracleasm libraries provide the capability to mark entire disks (either actual or virtual) as ASM devices, making them easier to discover and manage. Both the Oracle Cluster File System (OCFS) and ASM can be used in the same database, but not for the same files.
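As a quick way to confirm that the ASMLIB-marked disks are actually visible, a query like the following sketch (not from the original test) can be run while connected to the ASM instance:

-- Run against the ASM instance: disks labelled through oracleasm show up with
-- ORCL:-style paths once discovery finds them.
select name, path, header_status, state
from   v$asm_disk
order by path;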
Issues Resolved
We had several issues during the software install and load process:
Disk Reconfiguration
We established the required RAW devices (two at 150MB each) on the Linux cluster; this required a rebuild of the disk farm. If you need to rebuild CRS, you must remove the ocr.loc file from /etc/oracle (this step is overlooked by the deinstall procedure from MOSC).
Missing Library
The Linux box was missing a library that provides backward compatibility to 7.3, which caused the load to fail and required analysis to resolve. The library is provided on the Linux install disk, which is why I usually use the ITHDT method of install on Linux boxes ("install the whole darn thing"). As with any other install, I usually end up searching the disks for that one RPM or utility that didn't get installed, and the difference in install size between the maximum typical install and the whole darn thing is usually only a gigabyte or two.
Script Mis-run
Also, do not run the rootdeinstall.sh script in CRS_HOME/install if you only want to remove a single node from CRS; instead, just run rootdeletenode.sh. The first time through, we ran both when all that was needed was rootdeletenode.sh, which then required another reload of CRS to resolve.
ASM Issue
Once CRS was installed, we attempted to use the general-purpose database from the start; however, you need to choose the advanced (custom) database option, or the installer assumes that you already have ASM installed. If you choose the advanced, custom database, the dbca utility will allow you to install ASM. During the install, you will be asked for a discovery path for the ASM disks; this is "ORCL:" and nothing else.
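For reference, the dbca discovery prompt maps to the ASM instance's asm_diskstring parameter. The sketch below shows the commonly documented ASMLIB wildcard form; the plain "ORCL:" value above is what the dbca field accepted in our install:

-- Assumption: run against the ASM instance. 'ORCL:*' is the usual ASMLIB
-- discovery pattern, equivalent in spirit to the "ORCL:" value typed into dbca.
alter system set asm_diskstring = 'ORCL:*';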
OUI GUI Lack of Warnings
Once the database was installed (note that there were several restarts during this process, two of them caused by hitting the "Back" button one too many times, something the OUI GUI does not warn you about), we began the data loading.
Solaris Load Stall
The first system running and ready for loading was the Solaris system, so we started the load process there. (This concluded the first day.)
The next morning, we noted that the Solaris load seemed to have stalled. We bounced the database and resumed the load, which ran to completion.
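When a large load appears to hang, it helps to check whether anything is still progressing before bouncing the instance. A general-purpose check (not something we captured at the time) looks like this:

-- Stall check: any long-running operation still short of its total work shows
-- up here, along with Oracle's estimate of the time remaining.
select sid, opname, sofar, totalwork,
       round(sofar / totalwork * 100, 1) pct_done,
       time_remaining
from   v$session_longops
where  totalwork > 0
and    sofar < totalwork;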
Power Failure to Boxes
We finished the Linux install and started its load. By early afternoon, the Linux box had caught up with the Solaris box, right up to the point where we lost power to the systems (they were plugged into the same circuit, with no UPS). When the databases came back up, the Solaris database was OK, but the Linux database showed that its mirrors were out of sync. We attempted several times to restore the database, but it was not recoverable. Investigation showed this was because the array cache was not protected by battery backup (after all, this was just a demo box). We had to drop the database, reload CRS, reload the database software, and recreate the database. (This experience underscores a very important tip: if you are using memory caches, be sure they are battery-backed caches!)
On the third day, we rebuilt the Linux database and restarted its data load.
Possible ASM Configuration Issue
During the Linux data loads, I had to restart them frequently because the control files would report bad blocks and the loading instance would crash, but would then immediately restart. I theorize that this is because we used the RAID controller to stripe the drives and present them as one large logical volume to the ASM instance. It probably would have been better to use individual disks, either mapped as raw devices or marked as ASM disks, and allow Oracle to perform the striping.
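A sketch of that alternative layout, assuming the disks had been labelled individually through oracleasm (the DATA1 through DATA4 labels are hypothetical), would hand each disk to ASM and let it do the striping:

-- Sketch of the suggested alternative: several individually labelled disks in
-- one disk group, with ASM doing the striping (labels DATA1-DATA4 are hypothetical).
create diskgroup data external redundancy
  disk 'ORCL:DATA1', 'ORCL:DATA2', 'ORCL:DATA3', 'ORCL:DATA4';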
The drive arrays were divided into four logical volumes each: two 150MB raw partitions and two 50GB raw partitions. The 50GB partitions were mapped into the DATA and FLASH disk groups at the ASM level, with DATA being used for all data, index, undo, rollback, and temporary files, and FLASH being used as a backup area. The two 150MB partitions were used for the CRS cluster configuration disk area and the cluster quorum disk area.
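With disk groups named DATA and FLASH, the usual way to steer new datafiles and backups to them is through the Oracle-managed-files parameters. The following is a minimal sketch; the recovery-area size is an illustrative value, not one taken from the test:

-- Point new datafiles at +DATA and the flash recovery area at +FLASH.
-- The size below is an illustrative placeholder, not a value from the test.
alter system set db_create_file_dest        = '+DATA'  scope=both;
alter system set db_recovery_file_dest_size = 100g     scope=both;
alter system set db_recovery_file_dest      = '+FLASH' scope=both;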
Results
After everything was loaded, all indexes built, and schemas analyzed, we were ready to start. Before we look at the query-testing results, let's look at a few build statistics. First, consider the load times for the tables: the Line Item table was loaded from five files of 12 million rows each. The results comparing the Sun, with its eight drives striped by ASM, against the Linux cluster, with its 12-disk SATA array, are shown in Figure 2. The loads on the Linux cluster were done serially using SQLLoader from a single Linux node.

Figure 2: Example load time comparison.
As you can see, the load times were considerably less for the Linux cluster using the SATA arrays versus the SCSI-based load on the Sun.
Query Testing
We selected an example query from the query set generated by the qgen utility from the TPCH benchmark and used the Quest benchmark tool to submit it as 100 simultaneous queries; we then ran this test several times on each box. The results are shown in Figure 3.

Figure 3: Query results.
As you can see, the Linux server once again beat the Sun server in simultaneous access from 100 users executing the query. The fluctuation in the fifth run for the Linux box was due to all users being assigned to a single node in the cluster.
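The concurrent load was driven entirely by the Quest tool; for anyone who just wants a quick, single-session spot check of the test query's elapsed time, a SQL*Plus session will do. This was not part of the benchmark runs, and q11.sql is a hypothetical file name holding the Listing 1 text:

-- Single-session spot check only, not the Quest-driven 100-user test.
-- q11.sql is a hypothetical script file containing the Listing 1 query.
set timing on
set autotrace traceonly statistics
@q11.sql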
One interesting result from the testing was that, using client-side load balancing in Oracle Net, we could never get the load-balancing algorithm to assign users to the fourth node in the cluster. We got user-to-node spreads of 60/40/0/0, 33/33/34/0, 100/0/0/0, and 50/50/0/0, but in all seven runs, we never got any users assigned to the fourth node. This highlights the fact that Oracle's RAC load balancing must be done at the server level rather than the client level if you want equal use of the machines.
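A simple way to see this skew for yourself during a run is to count sessions per instance from any node; the TPCH username below is a hypothetical schema name, so substitute whatever account the benchmark sessions use:

-- How the benchmark sessions are spread across RAC instances; the TPCH
-- username is a hypothetical schema name for the benchmark account.
select inst_id, count(*) session_count
from   gv$session
where  username = 'TPCH'
group by inst_id
order by inst_id;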
We assume we could have gotten even more interesting results if we had used parallel query as well as RAC, but we wanted to simulate just the placement of an existing system into a RAC environment.
Conclusions
All of this information leads to the following conclusions:
- Linux on Intel performance beats typical Sun installations
- SATA array performance keeps up with native FC arrays
- Placing an application onto RAC can provide performance improvements without requiring excessive modification of the application
- Oracle ASM is very sensitive to disk array arrangement
I hope you find this article useful in helping you decide whether conversion to Oracle RAC on commodity hardware is the right decision. However, each application is different, and you should thoroughly test your application in a RAC environment before committing to such a venture.