File System Mount Options and Oracle

Oracle Tips by Burleson Consulting
Mike Ault

In UNIX you can control whether a file system uses buffered or unbuffered IO. With Oracle, a buffered filesystem is redundant, because Oracle already caches data blocks in its own buffer cache, and dangerous. An example of the danger is loss of power. The buffer in a buffered filesystem depends on the cache battery to provide enough power for the buffer to be written to disk before the disk spins down. However, many shops fail to monitor the cache battery's lifetime or fail to change the batteries at all. This can result in loss of data in a buffered filesystem when power is lost.

You can turn off buffered writes in several ways (buffered reads aren't an issue, but you should always use write-through caching). One is to mount the filesystems used with Oracle files as non-buffered using such options as:

AIX: "dio", "rbrw", "nointegrity"

SUN: "delaylog", "mincache=direct", "convosync=direct", "nodatainlog"

LINUX: "async", "noatime"

HP: Use VxFS with: "delaylog", "nodatainlog", "mincache=direct", "convosync=direct"

Using Direct IO at the Oracle Level

For information about Oracle direct I/O, refer to this URL by Steve Adams:


Checking Your Server

Methods for configuring the OS will vary depending on the operating system and file system in use. Here are some examples of quick checks that anyone can perform to ensure that you are using direct I/O: 

• Solaris - Look for a "forcedirectio" option.  Oracle DBAs find this option often makes a huge difference in I/O speed for Sun servers.  Here is the Sun documentation:

• AIX - Look for a "dio" option.  Here is a great link for AIX direct I/O:

• Veritas VxFS - (including HP-UX, Solaris and AIX), look for "convosync=direct".  It is also possible to enable direct I/O on a per-file basis using Veritas QIO; refer to the "qiostat" command and corresponding man page for hints.  For HPUX, see Oracle on HP-UX - Best Practices

• Linux - Linux systems support direct I/O on a per-filehandle basis (which is much more flexible), and I believe Oracle enables this feature automatically.  Someone should verify at what release Oracle started to support this feature (it is called O_DIRECT). See Kernel Asynchronous I/O (AIO) Support for Linux and this great OTN article: Talking Linux: OCFS Update.
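For the mount-option checks above (Solaris, AIX and VxFS), the test boils down to looking for one name in the comma-separated option list that the OS reports for the file system. The following is only a minimal sketch under stated assumptions: the helper name has_mount_opt and the three sample option strings are hypothetical, made up for illustration, and on a real server you would feed in the option list reported by your own mount command instead.

```shell
#!/bin/sh
# has_mount_opt OPTIONS WANTED   (hypothetical helper, not a standard command)
# Succeeds (exit status 0) when WANTED appears as a whole entry in the
# comma-separated OPTIONS string, and fails (exit status 1) otherwise.
has_mount_opt() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Hypothetical option strings, written the way mount output usually lists them.
ufs_opts="rw,forcedirectio,logging"
vxfs_opts="rw,delaylog,mincache=direct,convosync=direct"
jfs_opts="rw,log=/dev/loglv00"

# Each entry is "option-to-look-for:option-list".
for entry in "forcedirectio:$ufs_opts" "convosync=direct:$vxfs_opts" "dio:$jfs_opts"; do
    want=${entry%%:*}    # text before the first colon
    opts=${entry#*:}     # text after the first colon
    if has_mount_opt "$opts" "$want"; then
        echo "$want: present"
    else
        echo "$want: MISSING"
    fi
done
```

Matching on whole comma-delimited entries avoids a false hit when one option name is a substring of another. Remember that on Linux direct I/O is requested per file handle, so a mount-option check like this one does not tell you whether Oracle is using O_DIRECT.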

I'm Using LINUX and ATA Arrays, no Stress, but IO is slow!

Don't panic! Most LINUX kernels will take the default ATA interface setpoints that were the "standard" when the kernel was built (or even older ones).  This can be corrected.

In LINUX there is the hdparm command, which allows you to reset how ATA drives are accessed by the operating system. Using hdparm is simple, and with it I have seen 300% improvement in access speeds of various ATA drives. Let's go through a quick tuning sequence.

First, we will run the hdparm command with no options, giving it only the full path to the disk device:

[root@aultlinux2 root]# hdparm /dev/hdb


 multcount    = 16 (on)

 IO_support   =  0 (default 16-bit)

 unmaskirq    =  0 (off)

 using_dma    =  0 (off)

 keepsettings =  0 (off)

 readonly     =  0 (off)

 readahead    =  8 (on)

 geometry     = 77557/16/63, sectors = 78177792, start = 0


Running hdparm with no options other than the disk device gives the current settings for the disk drive. You should compare these to the specifications for your drive. You may find that direct memory access (DMA) is not being used, that readahead is too small, that you are only using 16-bit IO when you should be using 32-bit, etc.
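If you have many drives to look at, the comparison can be partly scripted. This is a rough sketch, not a definitive check: the function name check_hdparm_settings is hypothetical, the input is the listing shown above pasted in as sample data, and the script only flags settings that are still at 0. Whether a different value is right for your drive still has to come from the drive's specifications.

```shell
#!/bin/sh
# check_hdparm_settings (hypothetical helper name): reads the output of
# "hdparm <device>" on stdin and prints one note for each of three settings
# that is still reported as 0.
check_hdparm_settings() {
    awk -F'=' '
        NF < 2 { next }    # skip lines that are not "name = value"
        /IO_support/ && $2 + 0 == 0 { print "IO_support is 0: 16-bit IO in use" }
        /unmaskirq/  && $2 + 0 == 0 { print "unmaskirq is 0: interrupt unmasking is off" }
        /using_dma/  && $2 + 0 == 0 { print "using_dma is 0: DMA is off" }
    '
}

# Sample input copied from the listing above. On a live system you would
# pipe the real output in instead: hdparm /dev/hdb | check_hdparm_settings
check_hdparm_settings <<'EOF'
 multcount    = 16 (on)
 IO_support   =  0 (default 16-bit)
 unmaskirq    =  0 (off)
 using_dma    =  0 (off)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    =  8 (on)
 geometry     = 77557/16/63, sectors = 78177792, start = 0
EOF
```

For the sample listing this prints three notes, one each for IO_support, unmaskirq and using_dma, in the order the settings appear.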

Next, let's do a basic benchmark of the current performance of the drive. You do this using the hdparm -Tt option (for all options, do a "man hdparm" at the command line).

[root@aultlinux2 root]# hdparm -Tt /dev/hdb


Timing buffer-cache reads:   128 MB in  1.63 seconds = 78.53 MB/sec

Timing buffered disk reads:  64 MB in 14.20 seconds =  4.51 MB/sec
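Since we will be repeating this benchmark after each change, it helps to pull just the MB/sec figure out of the output so that runs can be compared. The sketch below assumes the output layout shown above, where the rate is the next-to-last field on the line; the function name hdparm_rate is hypothetical and the input is the sample output from this run, not a live measurement.

```shell
#!/bin/sh
# hdparm_rate LABEL   (hypothetical helper name)
# Reads "hdparm -Tt" output on stdin and prints the MB/sec figure from every
# line that contains LABEL. Assumes the rate is the next-to-last field.
hdparm_rate() {
    awk -v label="$1" 'index($0, label) { print $(NF - 1) }'
}

# Sample output copied from the benchmark run above.
sample='Timing buffer-cache reads:   128 MB in  1.63 seconds = 78.53 MB/sec
Timing buffered disk reads:  64 MB in 14.20 seconds =  4.51 MB/sec'

echo "buffer-cache: $(printf '%s\n' "$sample" | hdparm_rate 'buffer-cache') MB/sec"
echo "disk:         $(printf '%s\n' "$sample" | hdparm_rate 'buffered disk') MB/sec"
```

On a live system the same function would be used as hdparm -Tt /dev/hdb | hdparm_rate 'buffered disk'.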

Now let's adjust the settings. The -c option, when set to 1, enables 32-bit IO. The -u option is used to get or set the interrupt-unmask flag for the drive. A setting of 1 permits the driver to unmask other interrupts during processing of a disk interrupt, which greatly improves Linux's responsiveness and eliminates "serial port overrun" errors. Use this feature with caution on older kernels: some drive/controller combinations do not tolerate the increased I/O latencies possible when this feature is enabled, resulting in massive filesystem corruption. However, most versions (RedHat 2.1 and greater) using modern controllers don't have this issue. The -p option is used to autoset the PIO mode and -d is used to set or unset the DMA flag.

[root@aultlinux2 root]# hdparm -c1 -u0 -p -d0 /dev/hdb


 attempting to set PIO mode to 0

 setting 32-bit IO_support flag to 1

 setting unmaskirq to 0 (off)

 setting using_dma to 0 (off)

 IO_support   =  1 (32-bit)

 unmaskirq    =  0 (off)

 using_dma    =  0 (off)

So we turned on 32-bit mode, set the PIO mode to 0, and left DMA off (-d0). Let's see the resulting performance change using our previous -Tt option.

[root@aultlinux2 root]# hdparm -Tt /dev/hdb


Timing buffer-cache reads:   128 MB in  1.63 seconds = 78.53 MB/sec

Timing buffered disk reads:  64 MB in  9.80 seconds =  6.53 MB/sec

So we didn't change the buffer-cache read timings; however, we improved the buffered disk reads by 45%. Let's tweak some more and see if we can do better. The -m option sets the multi-sector IO count on the drive, the -c option sets the 32-bit IO mode (3 selects 32-bit with sync), -X mdma2 sets the transfer mode to multiword DMA mode 2, the -d1 option turns on direct memory access, the -a8 option sets the filesystem readahead to 8 sectors, which helps large reads, and -u1 turns on the interrupt unmasking described above.

[root@aultlinux2 root]# hdparm -m16 -c3 -X mdma2 -d1 -a8 -u1 /dev/hdb


 setting fs readahead to 8

 setting 32-bit IO_support flag to 3

 setting multcount to 16

 setting unmaskirq to 1 (on)

 setting using_dma to 1 (on)

 setting xfermode to 34 (multiword DMA mode2)

 multcount    = 16 (on)

 IO_support   =  3 (32-bit w/sync)

 unmaskirq    =  1 (on)

 using_dma    =  1 (on)

 readahead    =  8 (on)


So now let's see what we have done to performance using the -Tt option.

[root@aultlinux2 root]# hdparm -Tt /dev/hdb


Timing buffer-cache reads:   128 MB in  1.56 seconds = 82.05 MB/sec

Timing buffered disk reads:  64 MB in  4.29 seconds = 14.92 MB/sec

Not bad! We improved buffer-cache reads by about 4.5% and buffered disk reads by 231%! These options can then be added to a startup file so that they are applied at every system startup.
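The percentages quoted in this walkthrough are simple relative gains, (after - before) / before * 100, applied to the MB/sec figures that hdparm -Tt reported. Here is a small sketch of that arithmetic; the function name pct_gain is hypothetical, and the numbers are the ones from the three benchmark runs above.

```shell
#!/bin/sh
# pct_gain BEFORE AFTER   (hypothetical helper name)
# Prints the relative improvement from BEFORE to AFTER as a percentage,
# rounded to one decimal place.
pct_gain() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (after - before) / before * 100 }'
}

# MB/sec figures taken from the hdparm -Tt runs above.
echo "disk reads, first tweak:    $(pct_gain 4.51 6.53)%"     # 44.8
echo "disk reads, all tweaks:     $(pct_gain 4.51 14.92)%"    # 230.8
echo "buffer-cache reads, total:  $(pct_gain 78.53 82.05)%"   # 4.5
```

Keeping the baseline figure from the very first run is what makes the final comparison meaningful, so save that output before you start changing settings.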

I'm Really Feeling SCSI About Disk Performance, what then?

Sorry for the bad pun (well, actually I'm not). What can be done with SCSI interfaces? To tell you the truth, not a lot; however, there are some items which you may find useful. Most interfaces will buffer commands and issue them in batches. For example, most SCSI interfaces use a 32-command buffer that stacks commands until it has 32 of them and then fires them off. This can be reset in LINUX using options in the modules.conf file for the SCSI interface module.

In other UNIX flavors there are many settings which can be changed, but you need an exact understanding of the interface and its limitations, as well as of current system loads, before changing any of the SCSI settings. If you feel you need to have them checked, ask your SA.

Disk Stress in a Nutshell

In summary, to determine if a disk or array is undergoing IO-related stress, perform an IO balance and an IO timing analysis. If the IO timing analysis shows excessive read or write times, investigate the causes. Generally speaking, poor IO timings will result when:

• A single disk exceeds 110 to 150 IOs per second

• An entire multi-read capable RAID10 array exceeds #MIRRORS*#DPM*110 IOs per second

• An entire non-multi-read capable RAID10 array exceeds #DPM*110 IOs per second

• A RAID5 array exceeds (#DISKS-1)*66 IOs per second

In addition:

• Make sure Oracle is using direct IO at both the OS and Oracle levels

• Make sure your disk interface is tuned to perform optimally

 *DPM=Disks per mirror
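To make the array rules of thumb concrete, here is a small sketch that turns the three array formulas into numbers. The function names are hypothetical, and the example array sizes (2 mirrors of 4 disks, a 5-disk RAID5 set) are made up purely for illustration; the 110 and 66 IOs per second per disk are the figures given in the list above.

```shell
#!/bin/sh
# Rule-of-thumb IO-per-second ceilings from the summary list.
# Function names are illustrative only.

# Multi-read capable RAID10: #MIRRORS * #DPM * 110
raid10_multiread_limit() { echo $(( $1 * $2 * 110 )); }

# Non-multi-read capable RAID10: #DPM * 110
raid10_limit() { echo $(( $1 * 110 )); }

# RAID5: (#DISKS - 1) * 66
raid5_limit() { echo $(( ($1 - 1) * 66 )); }

# Hypothetical arrays: 2 mirrors of 4 disks each, and a 5-disk RAID5 set.
echo "RAID10, multi-read:     $(raid10_multiread_limit 2 4) IOs/sec"   # 880
echo "RAID10, non-multi-read: $(raid10_limit 4) IOs/sec"               # 440
echo "RAID5, 5 disks:         $(raid5_limit 5) IOs/sec"                # 264
```

If the IO rates from your timing analysis sit above these ceilings, the array itself is the likely bottleneck and no amount of interface tuning will fix it.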




