Popular RAC Storage Options in 11g
Oracle Tips by Burleson Consulting
By Steve Karam, the world's youngest Oracle ACE and Oracle Certified Master.
When one of my clients is working on a new RAC configuration, I'm guaranteed to receive tons of questions. One of the most common is: what is the best storage option for RAC? Despite the plethora of articles and information regarding storage options, most companies end up going with the advice of their storage vendor.
In this article we will explore some of these options:

- Raw storage, which is often demanded yet rarely used. Popular misconceptions and difficult management make this a fading technology.
- ASM, the new standard for RAC storage. As a one-time skeptic of this technology, I have found myself consistently pleased with it.
- Direct NFS, 11g's new networked storage feature, sure to excite users of NAS filers.
- OCFS2, a cluster file system developed for Oracle RAC environments.

We will also discuss udev rules, a device management solution which replaces traditional methods in RHEL5.
RAC Using Raw Storage
Some Oracle files can be written to unformatted disk
areas known as raw devices.
Note: Some
sources may also call these raw volumes, raw partitions, or raw disks.
The Oracle files which can be written to raw devices are:
- OCR
- Voting Disk
- Datafiles
- Redo Logs
- Control File
- SPFILE
It is worth noting why archive logs and RMAN backups do not make the "raw storage" list: raw devices cannot handle files created at runtime.
Given a partition with no filesystem, there are three
available options: format the partition for a particular filesystem, use the
partition in an ASM diskgroup (discussed later), or use the partition as a
raw device on which a single file may be placed.
One reason behind the popularity of raw devices is
performance. In the past, raw
devices were the only way by which a system could be set up to take
advantage of Direct I/O (DIO); that is, I/O that bypasses the filesystem
cache. However, Direct I/O has actually been supported in the ext3 filesystem since Enterprise Linux 2.1.
Support for enhanced Asynchronous I/O (AIO) with Direct I/O was added
in Enterprise Linux 4, even when using an
ext based filesystem.
According to Red Hat, ext3 filesystem access with AIO and DIO can
perform within 3% of raw I/O performance.
Direct I/O is also enabled when using OCFS/OCFS2.
Note: The filesystemio_options parameter allows a DBA to direct how Oracle will perform I/O. A setting of "directio" will allow Direct I/O access, "asynch" allows Asynchronous I/O access, and "setall" allows both. Consult your OS specific documentation to determine if your system is optimized for both DIO and AIO.
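For example, the current setting can be checked and changed from SQL*Plus (a minimal sketch; filesystemio_options is a static parameter, so the instances must be restarted for the change to take effect):

SQL> show parameter filesystemio_options
SQL> alter system set filesystemio_options=setall scope=spfile sid='*';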
In Oracle 11g it is very common to find the OCR and
Voting Disk of a RAC cluster on raw devices.
This is because those two files are 1) very small, 2) very static in
size, and 3) cannot be placed in ASM.
However, according to Oracle MOSC (Oracle's support system), raw
device support will be completely unavailable in Oracle 12g.
This may be due to the fact that raw devices have been declared
obsolete in Linux since kernel version 2.6.3, and support for raw devices
will soon be gone.
However, there is no need to fear this change.
Instead, it is only necessary to make room for a few changes in
vocabulary.
Those used to using rawdevices on Linux may get a shock when using Red Hat Enterprise Linux 5 (RHEL5) or Oracle Enterprise Linux 5 (OEL5), as they will not find the traditional raw device configuration. As mentioned above, in kernel version 2.6.3 this support was officially deprecated. However, it is still possible to configure a /dev/raw volume using udev rules.

In RHEL4 it was possible to simply place entries in /etc/sysconfig/rawdevices which mapped a block device (e.g. /dev/sda1) to a raw device (e.g. /dev/raw/raw1). Using the "rawdevices" service, the mapping would take effect and /dev/raw would be a usable area.
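For instance, a hypothetical RHEL4 mapping (the device names are placeholders) would have looked like this in /etc/sysconfig/rawdevices:

/dev/raw/raw1 /dev/sda1
/dev/raw/raw2 /dev/sdb1

followed by a restart of the service: service rawdevices restart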
Note: In a Windows environment, a raw device is simply a logical partition created in Disk Manager that is not formatted and has no drive letter.
In RHEL and OEL 5, entries must be made under the rules
specified in /etc/udev. "udev"
is responsible for managing the /dev area in Linux, and udev rules determine
how /dev will be presented.
Udev rules were also allowed in
RHEL4, though not required.
While /bin/raw can be used to bind a block device to a
raw device, /bin/raw binding alone is not meant to be a long term
configuration. One of the
primary purposes of udev is to keep disk areas and naming consistent.
To create a udev rule that maps block device /dev/sda1 to raw device /dev/raw/raw1:

1. Create a file called /etc/udev/rules.d/60-raw.rules (any number greater than or equal to 60 may be used).
2. Add the line:

ACTION=="add", KERNEL=="sda1", RUN+="/bin/raw /dev/raw/raw1 %N"
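Once the rule file is saved, the mapping can typically be applied without a reboot by re-running udev (a sketch assuming RHEL5's standard udev utilities; verify the command for your release):

# /sbin/start_udev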
Despite this ability, in Oracle 11g there is really no point in creating /dev/raw devices unless it is being done for comfort value. This is because in Oracle 10.2.0.2 and up, block devices are accessible by Oracle using the O_DIRECT flag, meaning they are able to perform direct I/O without using the rawio interface. OUI and ASMlib will both accept a block device (e.g. /dev/sda1) as input for file placement in Oracle 11g on Linux.
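For example, with ASMlib installed, a block device can be marked as an ASM disk with a single command (a sketch; the disk name DATA1 and device /dev/sdc1 are placeholders):

# /etc/init.d/oracleasm createdisk DATA1 /dev/sdc1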
Note: In 10gR2, even though Oracle allowed block devices to be used in 10.2.0.2 and up, OUI was not able to handle a block device name. Instead, symbolic links had to be created to map the block device to a different name under /dev. While effective, this approach does not follow the udev rules.
Even with direct support for block devices, in order to
configure a block device for Oracle's use udev rules must still be created.
Since udev manages the /dev area, the rules will need to state
ownership of your block devices in order to grant Oracle the permissions
necessary to use them.
1. Edit the file: /etc/udev/rules.d/50-udev.rules
2. At the bottom of the file, add the new rules in the following format (a complete example appears after the list below):

KERNEL=="blockdevicename", OWNER="deviceowner", GROUP="devicegroup", MODE="4digitpermissions"
- blockdevicename is the name of the block device. For instance, if the device is listed as /dev/sda1, the block device name is sda1.
- deviceowner should be set to the name of the OS user that will own the block device. For instance, if the device is going to be used for placement of the OCR, root should be the owner. For the voting disk or ASM disks, oracle should be the owner.
- devicegroup should be set to the name of the group which owns the block device. This will usually be oinstall or dba.
- 4digitpermissions should be set to the permissions mask of the block device. For the OCR and ASM devices this will be 0640. For the Voting Disk it will usually be 0644.
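For example, a hypothetical set of rules for a cluster whose OCR, Voting Disk, and ASM disk sit on /dev/sda1, /dev/sdb1, and /dev/sdc1 (the device names and groups are placeholders for your own) might read:

KERNEL=="sda1", OWNER="root", GROUP="oinstall", MODE="0640"
KERNEL=="sdb1", OWNER="oracle", GROUP="oinstall", MODE="0644"
KERNEL=="sdc1", OWNER="oracle", GROUP="dba", MODE="0640"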
It is important to note that even though Oracle is
writing to a block device instead of a raw device, this is still technically
raw storage. Instead of using
the rawio interface, a direct interface to the block device has been
provided by the Linux kernel and Oracle.
The limitations of raw are still a factor when writing to
block devices. Only one file
may be present on any single block device when the device is unformatted.
As such, raw devices are usually not recommended for most Oracle
files.
Creating the OCR and Voting Disk on block devices is a
popular option, as the only other storage method available is a cluster file
system such as OCFS2 which presents yet another layer of dependency for an
Oracle installation. For
datafiles, control files, SPFILEs, redo logs, RMAN backups, and archive
logs, ASM is the new de facto standard for RAC storage.
RAC using Automatic Storage Management (ASM)
ASM was introduced in Oracle 10g, and is widely used in
both RAC and single instance environments.
Oracle created ASM as a way for DBAs to simplify their storage
options, especially when using a cluster environment.
ASM will work directly with block devices and provide a
combination of software RAID and volume management specifically built for
Oracle files. However, files
stored within ASM are not available to the operating system without the use
of special tools.
This means that ASM volumes (called 'diskgroups') cannot
simply be mounted at the OS level and browsed, copied, edited, or otherwise
managed. However, a whole host
of commands have been created which can be performed through SQL*Plus.
Tools such as ASMlib and ASMCMD simplify management of files inside
of ASM volumes. For example,
ASMCMD allows an ASM volume to be browsed as if it were a standard
filesystem:
bash-3.00$ asmcmd
ASMCMD> ls -ltr
State    Type    Rebal  Name
MOUNTED  EXTERN  N      DATA/
ASMCMD> cd DATA
ASMCMD> ls -ltr
Type  Redund  Striped  Time  Sys  Name
                             Y    RACDB/
ASMCMD> cd RACDB
ASMCMD> ls -ltr
Type  Redund  Striped  Time  Sys  Name
                             Y    CONTROLFILE/
                             Y    DATAFILE/
                             Y    ONLINELOG/
                             Y    PARAMETERFILE/
                             Y    TEMPFILE/
                             N    spfileRACDB.ora => +DATA/RACDB/PARAMETERFILE/spfile.269.679922899
ASMCMD> cd DATAFILE
ASMCMD> ls -ltr
Type      Redund  Striped  Time             Sys  Name
DATAFILE  UNPROT  COARSE   JAN 15 11:00:00  Y    SYSTEM.260.679921433
DATAFILE  UNPROT  COARSE   JAN 15 11:00:00  Y    UNDOTBS1.263.679921435
DATAFILE  UNPROT  COARSE   JAN 15 11:00:00  Y    UNDOTBS2.259.679922629
DATAFILE  UNPROT  COARSE   JAN 15 11:00:00  Y    USERS.267.679921435
DATAFILE  UNPROT  COARSE   JAN 15 13:00:00  Y    SYSAUX.268.679921433
In addition, ASM gives storage administrators and DBAs
the option to add or remove disks from the configuration as needed, allowing
easy scalability at the storage level while remaining online.
This level of granularity was previously not possible with most
Logical Volume Managers (LVMs).
Enhanced striping is available as well, allowing a database to stripe not
only across multiple disks, but multiple trays and storage arrays.
Block devices at the OS level are recognized by ASM as 'ASM disks.' Even if a volume is built from twelve disks in a RAID 10 configuration and presented to ASM, it is still considered a single disk within ASM. ASM disks can then be added to ASM diskgroups, which take on the format '+NAME'.
The plus sign (+) is used in naming an ASM diskgroup, and when
creating files inside of an ASM diskgroup.
For example:
SQL> create tablespace ASM_TBS datafile '+DATA' size
100M;
Information about ASM Disks can be found in the
V$ASM_DISK view, while information about diskgroups can be found in
V$ASM_DISKGROUP.
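For a quick look at what ASM sees (a minimal sketch run against the ASM instance):

SQL> select name, total_mb, free_mb from v$asm_diskgroup;
SQL> select path, name, mount_status from v$asm_disk;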
When multiple disks are added to an ASM diskgroup, ASM
will automatically rebalance data between the disks in the diskgroup.
For instance, if a shelf of 14 disks is made into a single RAID 10
volume (7 mirrored disks striped), and another four disks are made into a
RAID 10 volume, it would be possible to combine the two into an ASM
diskgroup. ASM will rebalance
the data across both volumes to optimize I/O throughput.
Additionally, ASM adds no overhead to standard raw device I/O; as a
result, ASM works at 'the speed of raw'.
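For example, a new disk could be added to a diskgroup and the resulting rebalance monitored as follows (a sketch; the diskgroup name DATA and the device /dev/sdd1 are placeholders):

SQL> alter diskgroup DATA add disk '/dev/sdd1' rebalance power 4;
SQL> select operation, state, power, est_minutes from v$asm_operation;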
RAC Using NFS with Direct NFS (DNFS)
Oracle 11g comes with enhanced support for Oracle storage over the network using the new Direct NFS feature. Direct NFS allows for cost savings by sticking with one connection model: the network. This allows for multipathing and unified storage. In addition, Direct NFS works even on Windows, which has no native NFS support.
The reason this support is available is that DNFS does not rely on the operating system's NFS client. It is a completely new network storage model built specifically for and within Oracle. DNFS takes the fundamentals of NFS and strips away much of the overhead (such as data cache copying between user and kernel space), adding features specifically required for the enterprise Oracle database.
The result is a storage method that is convenient for
Oracle shops who prefer NAS devices or those who are simply browsing for a
new platform. DNFS allows for
Direct I/O and Asynchronous I/O out of the box, and provides a familiar
filesystem environment for storage administrators and DBAs.
Whereas Oracle over NFS was a possible option, Oracle over DNFS is
more viable in resource intensive environments.
Note: Oracle 11g Direct NFS only works with NFS V3 compatible NAS devices.
To use DNFS, the Direct NFS Client which ships with
Oracle 11g must be configured on all necessary nodes.
On Linux, this can be done in a few easy steps:
- Add your mount point details to /etc/mtab or $ORACLE_HOME/dbs/oranfstab (a sample oranfstab entry is shown after this list)
- Shut down the database (all nodes in a RAC environment)
- cd $ORACLE_HOME/lib
- mv libodm11.so libodm11.so.old
- ln -s libnfsodm11.so libodm11.so
- Start your database
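A hypothetical oranfstab entry might look like the following (the server name, IP address, export, and mount point are placeholders for your own filer details; multiple path lines can be listed per server for multipathing):

server: mynas1
path: 192.168.1.50
export: /vol/oradata mount: /u02/oradata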
While DNFS has not yet stood the test of multiple releases, it is quickly gaining ground in the Oracle world.
While RAC environments sharing storage with a SAN are more likely to
use ASM for management, environments using or considering NAS filers can
benefit quickly and easily from DNFS support.
Additionally, it is possible to blend DNFS with ASM in order to
stripe across multiple filers if necessary.
OCFS2
OCFS2 is a shared-disk cluster file system (CFS)
available for Linux which provides a shared environment for RAC.
Previous releases of OCFS were incapable of storing
standard non-Oracle files, a level of support some DBAs found inconvenient.
OCFS2 is much more robust and offers not only Oracle shared storage,
but standard filesystem capabilities which provide clustering for a wide
range of server needs such as webservers, mailservers, and file servers.
As noted on the project page at
http://oss.oracle.com/projects/ocfs2, OCFS2 offers some notable features
associated with complex filesystems such as:
- Variable Block sizes
- Flexible Allocations (extents, sparse, unwritten extents with the ability to punch holes)
- Journaling (ordered and writeback data journaling modes)
- Endian and Architecture Neutral (x86, x86_64, ia64 and ppc64)
- In-built Clusterstack with a Distributed Lock Manager
- Support for Buffered, Direct, Asynchronous, Splice and Memory Mapped I/Os
- Comprehensive Tools support
OCFS2 installation is as simple as installing RPMs on
your Linux server. As of the
time of this writing, OCFS2 is at version 1.4, with three specific RPM files
required for installation:
- ocfs2-tools
- ocfs2console
- ocfs2
Once installed, OCFS2 can be configured manually or by using the ocfs2console graphical tool, pictured below:

[Figure: the ocfs2console configuration GUI]
The benefits of OCFS2 are much the same as DNFS except
that it does not require Network Attached Storage.
An OCFS2 filesystem is usable as a shared environment for RAC while
providing standard filesystem capabilities and commands along with high
performance through low-overhead DIO and AIO.
For users who wish to use their SAN for RAC but require the use of
filesystem commands such as 'ls', 'cp', 'mv', et al., OCFS2 is a viable
alternative to ASM.
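For those who prefer to configure OCFS2 manually rather than through the console, the cluster layout lives in /etc/ocfs2/cluster.conf. A minimal two-node sketch follows (the node names, IP addresses, and the /dev/sdd1 device are placeholders):

cluster:
        node_count = 2
        name = ocfs2

node:
        ip_port = 7777
        ip_address = 192.168.1.101
        number = 0
        name = rac1
        cluster = ocfs2

node:
        ip_port = 7777
        ip_address = 192.168.1.102
        number = 1
        name = rac2
        cluster = ocfs2

Once the o2cb cluster stack is online (service o2cb configure), the filesystem can be created and mounted on each node:

# mkfs.ocfs2 -L racdata /dev/sdd1
# mount -t ocfs2 /dev/sdd1 /u02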
Conclusion
There are many options for data storage in a RAC
environment. This is much
different from the days where the only options were raw volumes or third
party cluster file systems.
With these options, it is possible for the DBA and System
Administrators to work together to find an optimal environment for their RAC
cluster. Between ASM, OCFS2,
and DNFS, Oracle offers high performance solutions for any need: ASM and
OCFS for direct attached methods depending upon the need for filesystem
access, DNFS for network attached storage requiring high performance, raw
storage for required files such as the OCR and Voting disk, or a combination
of all of these technologies to suit the needs of the environment.