by Mike Ault

One big topic I always hear about at conferences and
with clients is which RAID is best for Oracle. Some advocate RAID5 (disks
not mirrored but striped with parity), others RAID10 (disks mirrored then
striped) or RAID0+1 (disks striped then mirrored), while others want
RAID50 (disks striped with parity, then mirrored). So what is the best
RAID for Oracle? The answer is: it depends.
For determining the optimal stripe size, see
the discussion of optimal disk RAID stripe size below.
If you want the best performance and assurance
against disk failure, go with hardware-based RAID10 with multi-read
capability (the ability to read/write from any disk in the array that
isn't busy). If you want to minimize disks purchased but maximize
protection from failure, and don't worry much about performance, go
with RAID5. If you are totally paranoid about disk failure, go with
RAID50. In order of increasing cost for the same capacity: RAID5,
RAID10/01, RAID50. In order of decreasing performance: RAID10/01, RAID5/50.
And in order of decreasing protection level: RAID50, RAID10, RAID5, RAID01.
Mike's Recommendation
What do I recommend? I support RAID10, or SAME
(stripe and mirror everything) RAID 0+1, with proper array tuning. Proper array
tuning means aligning the array stripe width so that the most
frequently expected maximum IO can be satisfied by a read from a single
disk. For example, in some early RAID setups the stripe width was set at
8K per disk. An 8K-per-disk stripe width meant that a full table scan
(db_file_multiblock_read_count of 16 and a db_block_size of 4K)
blocked access to 8 disks with a single read. If the stripe
width per disk was instead set at 64K or greater, in the best case only one disk
was blocked per read. On many UNIX systems the standard physical IO size
was 64K anyway; now it is 1 megabyte on most UNIX systems.
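To illustrate the arithmetic above, here is a minimal sketch of my own (the function name is hypothetical, not from the article) that counts how many stripe units one multiblock read spans:

```python
def disks_blocked(multiblock_read_count, block_size_kb, stripe_width_kb):
    """Return how many stripe units (disks) one full-scan IO spans."""
    io_size_kb = multiblock_read_count * block_size_kb
    # Ceiling division: an IO larger than one stripe unit touches several disks.
    return max(1, -(-io_size_kb // stripe_width_kb))

# The article's example: 16 * 4K = 64K IO over an 8K-per-disk stripe.
print(disks_blocked(16, 4, 8))    # 8 disks blocked by a single read
# With a 64K stripe unit the same IO fits on a single disk.
print(disks_blocked(16, 4, 64))   # 1 disk
```

The same function shows why a 1-megabyte stripe unit keeps even very large multiblock reads on one disk.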
Table 3 shows what Oracle suggests
for RAID usage.
RAID | Type of RAID                  | Control File | Database File                 | Redo Log File | Archive Log File
0    | Striping                      | Avoid        | OK                            | Avoid         | Avoid
1    | Shadowing                     | Best         | OK                            | Best          | Best
1+0  | Striping and shadowing        | OK           | Best                          | Avoid         | Avoid
3    | Striping with static parity   | OK           | OK                            | Avoid         | Avoid
5    | Striping with rotating parity | OK           | Best if RAID0+1 not available | Avoid         | Avoid
Note: Oracle redo logs can be run with RAID
(RAID0 or RAID0+1), but it is important that they be segregated onto
different disk spindles, or placed on SSD.
A Bit of Math (or is that a Math of Bits?)
In the 1990s Peter Chen of the University of
California, Berkeley published a series of papers on tuning disk
array stripe unit size based on expected concurrency. In these papers,
Chen and his associates determined that a disk's IO speed (as
measured by average seek time) and IO rate (as measured in megabytes per
second) determine the stripe size for performance in an
array, even when the number of concurrent accesses is not known. Three
formulae were derived from these papers:
For non-RAID5 arrays when concurrency is known:
SU = (S*APT*DTR*(CON-1)*1.024)+.5K
Where:
SU - Striping unit per disk
S - Concurrency slope coefficient (~.25)
APT - Average positioning time (milliseconds)
DTR - Data transfer rate (Megabyte/sec)
CON - number of concurrent users.
1.024 = 1s/1000ms*1024K/1M (conversion factors for
units)
So for a drive that has an average seek time of 5.6
ms and a transfer rate of 20 Mbyte/second the calculated stripe unit for
a 20 concurrent user base would be:
(.25*5.6*20*(19)*1.024)+.5 = 545K (or ~512K)
For a system where you don't know the concurrency,
the calculation becomes:
SU = (2/3*APT*DTR*1.024)
So for the same drive:
2/3*5.6*20*1.024 = 76.46K, so ~128K rounding up or
64K rounding down
And from Chen's final paper, the formula for RAID5
arrays is:
SU = (1/2*APT*DTR*1.024)
So for the same drive:
0.5*5.6*20*1.024 = 57.34K (rounding up, 64K)
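The three formulae can be collected into a short sketch (my own; the function names are illustrative, and the variable names follow the article's definitions):

```python
def su_known_concurrency(apt_ms, dtr_mb_s, users, slope=0.25):
    """Stripe unit (KB) for non-RAID5 arrays when concurrency is known."""
    return slope * apt_ms * dtr_mb_s * (users - 1) * 1.024 + 0.5

def su_unknown_concurrency(apt_ms, dtr_mb_s):
    """Stripe unit (KB) for non-RAID5 arrays when concurrency is unknown."""
    return (2.0 / 3.0) * apt_ms * dtr_mb_s * 1.024

def su_raid5(apt_ms, dtr_mb_s):
    """Stripe unit (KB) for RAID5 arrays."""
    return 0.5 * apt_ms * dtr_mb_s * 1.024

# The article's drive: 5.6 ms average positioning time, 20 MB/s transfer rate.
print(round(su_known_concurrency(5.6, 20, 20)))   # 545 -> use ~512K
print(round(su_unknown_concurrency(5.6, 20), 2))  # 76.46 -> 64K or 128K
print(round(su_raid5(5.6, 20), 2))                # 57.34 -> 64K
```

Plugging in a faster drive's seek time and transfer rate shows immediately how the recommended stripe unit scales with the hardware.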
The values for average access time and transfer rate
used in these examples are actually fairly low compared to more
advanced drives, so the stripe sizes shown above are probably low by at
least a factor of 2. I say this because while average seek times
drop, transfer rates increase: on an Ultra3 SCSI 15K drive, for example,
the spec for average seek may drop to 4.7 ms, but the transfer rate
leaps to 70 Mbyte per second, so the overall value of the combined
factor goes from 112 to 329, nearly a threefold increase.
So, what should you derive from all of the above
talk of RAID and stripe size? First, RAID10 with a multi-read
capable controller is generally the best-performing RAID option. Second, in RAID10
(or any other striped RAID) use a stripe width aligned with the
database's most prevalent maximum IO size (generally
db_file_multiblock_read_count times db_block_size) or with the maximum
physical IO size for your system (most modern UNIX systems use 1 megabyte).
In studies with disk arrays and Oracle conducted by EMC and Oracle, a
1-megabyte stripe width performed best as long as the product of
db_file_multiblock_read_count and db_block_size was equal to, or an even
fraction of, the stripe width.
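The even-fraction condition is easy to check with a quick sketch (my own illustration, not from the EMC/Oracle study; the function name is hypothetical):

```python
def io_fits_stripe(multiblock_read_count, block_size_kb, stripe_width_kb=1024):
    """True if the maximum IO size is the stripe width or an even fraction of it."""
    io_kb = multiblock_read_count * block_size_kb
    return io_kb <= stripe_width_kb and stripe_width_kb % io_kb == 0

print(io_fits_stripe(128, 8))     # 1024K IO exactly matches a 1M stripe -> True
print(io_fits_stripe(16, 4))      # 64K divides 1024K evenly -> True
print(io_fits_stripe(16, 4, 96))  # 64K does not divide a 96K stripe -> False
```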
Another Wee Bit of Math
When I was in the Navy we were home-ported out of Holy
Loch, Scotland. One of the Chiefs on the submarine decided to walk one
of the local lasses home; after all, she said it was only a "wee walk".
Five miles later he learned the difference between a "wee walk" in a
non-car-based culture and ours. I hope you won't consider these
formulas the way the Chief considered a wee walk from that night onward.
Let's look at some formulas that may help in these capacity-versus-IO-rate
decisions.
RAID10:
DISKS = CEILING(MAX(APP_IO_RATE/(DISK_IO_RATE*RAID_FACTOR*MIRRORS), 1), 1) * MIRRORS
RAID5:
DISKS = CEILING(MAX(APP_IO_RATE/(DISK_IO_RATE*RAID_FACTOR), 2) + 1, 1)
For RAID10 the RAID_FACTOR is simply the ratio:
ACTUAL_DISK_IO_RATE/MAX_DISK_IO_RATE
(non-linear IO rate to linear IO rate). If you can't get this value from the
manufacturer, assume a factor of 0.6 to 0.7 (in the above data, this
would be 0.58!).
For RAID5 you
have to account for the extra overhead of the additional parity writes,
usually about 25% but possibly less or more. This means the factor
becomes:
(ACTUAL_DISK_IO_RATE/MAX_DISK_IO_RATE)*0.75
If you use
mirrored RAID5, just use the RAID10 calculation with the RAID5 factor as
the RAID_FACTOR, adding one disk for each level of mirroring.
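These sizing formulas can be sketched as code (my own rendering of the formulas above; the workload numbers at the bottom are hypothetical, and the 0.75 parity penalty is the article's rough default):

```python
import math

def raid10_disks(app_io_rate, disk_io_rate, raid_factor, mirrors=2):
    """Disks for RAID10: per-mirror need, rounded up, times the mirror count."""
    per_side = app_io_rate / (disk_io_rate * raid_factor * mirrors)
    return math.ceil(max(per_side, 1)) * mirrors

def raid5_disks(app_io_rate, disk_io_rate, raid_factor):
    """Disks for RAID5: at least two data disks, plus one for parity."""
    raid5_factor = raid_factor * 0.75   # extra parity-write overhead
    need = app_io_rate / (disk_io_rate * raid5_factor)
    return math.ceil(max(need, 2) + 1)

# Hypothetical workload: 2000 IO/s application peak, 150 IO/s per disk,
# and a 0.6 non-linear-to-linear RAID_FACTOR.
print(raid10_disks(2000, 150, 0.6))  # 24 disks (12 per mirror side)
print(raid5_disks(2000, 150, 0.6))   # 31 disks (30 data + 1 parity)
```

Note how RAID5 needs more spindles for the same IO rate once the parity penalty is applied, even before mirroring is considered.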
By the way, I
have an example spreadsheet showing these calculations; if you ask
nicely, I will send it to you.
If you have an
existing system, capture the IO statistics using operating-system-level
tools or from the Oracle V$FILESTAT and V$TEMPSTAT views, sampled
during your application's peak activity times, to feed into these
equations. The rest of the information should be available from the
specification sheets for your disks.
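A sketch of how such samples feed the formulas (my own illustration: the snapshot shape and counter values are hypothetical stand-ins for per-file read/write counts as you would sample them from V$FILESTAT):

```python
def peak_io_rate(snap_start, snap_end, interval_seconds):
    """Each snapshot maps file name -> (physical_reads, physical_writes).
    Returns total IOs per second over the sampling interval."""
    total = 0
    for fname, (r1, w1) in snap_start.items():
        r2, w2 = snap_end[fname]
        total += (r2 - r1) + (w2 - w1)
    return total / interval_seconds

# Two hypothetical snapshots taken 60 seconds apart during peak activity.
start = {"users01.dbf": (1000, 200), "system01.dbf": (500, 50)}
end   = {"users01.dbf": (91000, 5200), "system01.dbf": (3500, 350)}
print(round(peak_io_rate(start, end, 60), 2))  # APP_IO_RATE in IO/s
```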
In Summary
Use RAID10 when possible, RAID5 when it is not. Size
the array based on IO needs first, then storage capacity, and you can't
go wrong.