Oracle UNIX/Linux Tips by Burleson Consulting

Mapping Oracle Disk Architectures

Today's disk devices are normally delivered as complete I/O subsystems, each with its own memory cache, channels, disk adapters and SCSI adapters. Understanding the architecture requires mapping the number of ports, the size of the disk cache, the number of disk adapters, and the I/O channels between the disks and the disk cache. Figure 4-10 shows a sample of a disk architecture map for a disk array.

Figure 10: A sample architecture of a disk array

Developing this type of disk map is very important to load balancing within Oracle because there are many possible bottlenecks within the disk array subsystem that can cause slowdowns. In addition to monitoring for disk waits, we also need to monitor for SCSI contention, channel contention and contention between the disk adapters. Fortunately, many of the major disk vendors (EMC, IBM) provide their own proprietary disk utilities (e.g., NaviStar, Open Symmetrix Manager) to perform these disk monitoring functions.

The Multiple RAM Buffer Issue

We are also seeing disk arrays being delivered with a separate RAM cache for the disk arrays, as shown in Figure 4-11. These RAM caches can be many gigabytes in size and contain special software tools for performing asynchronous writes and minimizing disk I/O.

Figure 11: Multiple RAM caches with an Oracle database

The Oracle DBA needs to consider the RAM cache on the disk array, because it changes the basic nature of disk I/O. As you know, when Oracle cannot find a data block in one of the data buffers in the SGA, Oracle will issue a physical read request to the disk array. This physical read request is received by the disk array, and the disk RAM cache is checked for the desired block. If the desired block is in the RAM cache, the disk array will return the block to Oracle without making a physical disk I/O.
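The read path described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: both the SGA data buffer and the disk array cache are modeled as plain LRU caches of block numbers, and the names `LRUCache` and `simulate_reads` are hypothetical, not part of any Oracle or vendor tool.

```python
from collections import OrderedDict


class LRUCache:
    """A tiny LRU cache of block ids, standing in for a RAM buffer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def lookup(self, block_id):
        """Return True on a hit; on a miss, load the block and return False."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)
            return True
        self.blocks[block_id] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return False


def simulate_reads(block_requests, sga_blocks, array_cache_blocks):
    """Count the reads seen at each layer for a sequence of block requests."""
    sga = LRUCache(sga_blocks)
    array_cache = LRUCache(array_cache_blocks)
    counts = {"logical_reads": 0, "oracle_physical_reads": 0, "disk_reads": 0}
    for block_id in block_requests:
        counts["logical_reads"] += 1
        if sga.lookup(block_id):
            continue  # block found in the SGA data buffer
        counts["oracle_physical_reads"] += 1  # Oracle issues a physical read
        if not array_cache.lookup(block_id):
            counts["disk_reads"] += 1  # the array had to go to a spindle
    return counts


counts = simulate_reads([1, 2, 3, 1, 2, 3], sga_blocks=2, array_cache_blocks=4)
print(counts)
```

In this toy workload the SGA is too small to hold the working set, so every request becomes an Oracle physical read, yet the array cache satisfies the second pass without touching the disks.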

The fact that Oracle's physical read requests may not correspond to actual disk reads is a very important point, because it can lead to misleading statistics. For example, the stats$filestatxs table shows the number of reads and writes to files. If you are using a disk array such as EMC, these I/O statistics will not correspond to the actual disk reads and writes. The only conclusive way to check "real" disk I/O is to compare the physical I/O as measured on the disk array with Oracle's read and write statistics. In many cases, the disks are performing less than half the I/O reported by Oracle, and this discrepancy is due to the caching of data blocks in the disk array RAM.
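The comparison itself is simple arithmetic. The sketch below assumes you have already collected two counters for the same time interval: the physical reads reported by Oracle and the disk reads reported by the array vendor's utility. The function name and the sample figures are hypothetical and are not measurements.

```python
def cache_absorption(oracle_phys_reads, array_disk_reads):
    """Fraction of Oracle's physical reads that never reached a disk.

    Both counters must cover the same interval. The result can be negative
    if the array reads more than Oracle asked for (for example, because of
    read-ahead or other hosts sharing the array).
    """
    if oracle_phys_reads <= 0:
        raise ValueError("oracle_phys_reads must be positive")
    return 1.0 - array_disk_reads / oracle_phys_reads


# Hypothetical counters for one snapshot interval.
absorbed = cache_absorption(oracle_phys_reads=1_000_000, array_disk_reads=450_000)
print(f"{absorbed:.0%} of Oracle's physical reads were served from the array cache")
```

A value above 50% corresponds to the case mentioned above, where the disks perform less than half the I/O that Oracle reports.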

Next, let's look at file striping and see how it can be used to load balance the I/O subsystem.

File Striping with Oracle

File striping is the process of splitting a tablespace into small datafiles and placing these datafiles across many disks. With the introduction of RAID (redundant arrays of inexpensive disks), we also have the option of block-interleaved striping (RAID 0), which distributes the data blocks in the tablespace across the disks, one block at a time.

Other methods of Oracle file striping involve taking a large tablespace and splitting it into many Oracle datafiles. These files may then be spread across many disks to reduce I/O bottlenecks, as shown in Figure 4-12.

Figure 12: Striping a tablespace across multiple disks
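As a rough illustration of manual striping, the following sketch splits one tablespace into equal-sized datafiles, one per mount point, and renders the matching CREATE TABLESPACE statement. The mount points, the tablespace name, the file naming pattern and the helper names are all hypothetical, and each mount point is assumed to sit on a separate physical disk.

```python
def stripe_datafiles(tablespace, total_mb, mount_points):
    """Split a tablespace into one equal-sized datafile per mount point."""
    if not mount_points:
        raise ValueError("need at least one mount point")
    size_mb = -(-total_mb // len(mount_points))  # ceiling division
    return [
        (f"{mount}/{tablespace.lower()}_{i:02d}.dbf", size_mb)
        for i, mount in enumerate(mount_points, start=1)
    ]


def create_tablespace_ddl(tablespace, datafiles):
    """Render a CREATE TABLESPACE statement that lists every datafile."""
    clauses = ",\n  ".join(f"'{path}' SIZE {size}M" for path, size in datafiles)
    return f"CREATE TABLESPACE {tablespace} DATAFILE\n  {clauses};"


# Hypothetical layout: four mount points, each on its own disk.
mounts = ["/u01/oradata", "/u02/oradata", "/u03/oradata", "/u04/oradata"]
files = stripe_datafiles("SALES", 4000, mounts)
print(create_tablespace_ddl("SALES", files))
```

Note that Oracle allocates extents within the datafiles according to its own rules, so spreading the files does not by itself guarantee an even I/O load.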

However, manual file striping has become obsolete because of the large size of disks. In 1990, a 20GB database would probably have been composed of 20 physical disks, each with 1GB of storage. With many disks in a database, the Oracle DBA could improve throughput by manually striping the busiest tablespaces across many disks.

Commercial disks are getting larger every year, and it is very difficult to find small disk devices that contain less than 36GB of storage. Just ten years ago, the IBM 3380 disk was considered huge at 1GB of storage. Today, the smallest disks available are 18GB. The larger disks mean that there are fewer disk spindles, and fewer opportunities for manual file striping. Since it is often not possible to isolate Oracle tablespaces on separate disks without wasting a huge amount of disk space, the Oracle administrator must balance active and inactive tablespaces across the available disks.
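One simple way to balance active with inactive tablespaces is a greedy placement: take the tablespaces from busiest to quietest and put each one on the disk that currently carries the least I/O. The sketch below is only one possible heuristic, not a method prescribed here; the I/O rates are made-up numbers, and disk capacity is ignored.

```python
import heapq


def balance_tablespaces(io_rates, disk_count):
    """Greedy placement: busiest tablespace first, onto the least-loaded disk.

    io_rates maps tablespace name -> observed I/O rate (any consistent unit).
    Returns {disk index: (total load, [tablespace names])}.
    """
    if disk_count < 1:
        raise ValueError("disk_count must be at least 1")
    # Heap entries are (current load, disk index, tablespaces on that disk).
    heap = [(0, disk, []) for disk in range(disk_count)]
    heapq.heapify(heap)
    busiest_first = sorted(io_rates.items(), key=lambda kv: kv[1], reverse=True)
    for name, rate in busiest_first:
        load, disk, names = heapq.heappop(heap)  # least-loaded disk so far
        names.append(name)
        heapq.heappush(heap, (load + rate, disk, names))
    return {disk: (load, names) for load, disk, names in heap}


# Hypothetical I/O rates (reads plus writes per second) for five tablespaces.
rates = {"ORDERS": 900, "CUSTOMERS": 600, "LOOKUP": 400, "HISTORY": 50, "ARCHIVE": 20}
for disk, (load, names) in sorted(balance_tablespaces(rates, 2).items()):
    print(disk, load, names)
```

With these sample rates, the busiest tablespace ends up sharing a disk with the two quiet ones, while the two mid-range tablespaces share the other disk.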

Note: There is a new feature in Oracle8i called "single table clusters." By using a cluster, the keys are grouped in the same physical block, reducing I/O and speeding data retrieval by key.

Using RAID with Oracle

As you may know, there are more than six different types (called "levels") of RAID architectures, and each has its own relative advantages and disadvantages. Many of the RAID schemes do not possess the high performance required for an Oracle database, and are omitted from this discussion.

Please note that RAID 5 is not considered for databases that perform significant write activity, since the processing overhead for updates makes it too slow for most Oracle applications. Below are the most commonly used RAID architectures for Oracle databases:

* RAID 0 - RAID 0 is commonly referred to as block-level striping. This is an excellent method for performing load balancing of the Oracle database on the disk devices, but it does nothing for high availability since none of the data is duplicated. Unlike manual datafile striping, where the Oracle professional divides an Oracle tablespace into small datafiles, with RAID 0, the Oracle datafile is automatically striped one block at a time across all of the disk devices. In this fashion, every datafile has pieces residing on each disk, and the disk I/O load will become very well balanced. Note that a disk failure in RAID 0 results in the loss of the datafiles stored on this device. A good recommendation is to place only data that can be easily recovered after a disk failure, such as temporary tables, on RAID 0.

* RAID 1 - RAID 1 is commonly called disk mirroring. Since the disks are replicated, RAID 1 may involve double or triple mirroring. The RAID 1 architecture is designed such that a disk failure will cause the I/O subsystem to switch to one of the replicated disks with no service interruption. RAID 1 is used when high availability is critical, and with triple mirroring, the mean time to failure (MTTF) for an Oracle database is measured in decades. (Note that disk controller errors may cause RAID 1 failures, although the disks remain healthy.)

* RAID 0+1 - RAID 0+1 is the combination of block-level striping and disk mirroring. The advent of RAID 0+1 has made Oracle-level striping obsolete since RAID 0+1 stripes at the block level, dealing out the table blocks, one block per disk, across each disk device. RAID 0+1 is also a far better striping alternative since it distributes the load evenly across all of the disk devices, and the load will rise and fall evenly across all of the disks. This relieves the Oracle administrator of the burden of manually striping Oracle tables across disks and provides a far greater level of granularity than Oracle striping, because adjacent data blocks within the same table are on different disks.

* RAID 5 - Some of the newer hardware-based RAID 5 storage performs extremely well in data warehouses. RAID 5 is a good approach for Oracle data warehouses where the load speeds are not important and where the majority of the system I/O is read-only activity.
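The block-level striping described for RAID 0 and RAID 0+1 can be sketched as a simple mapping from a logical block number to a disk. This follows the simplified one-block-per-disk description used above; real arrays typically stripe in larger units and use their own layouts, and the function names here are hypothetical.

```python
def raid0_location(block_number, disk_count):
    """Map a logical block to (disk index, block offset on that disk)."""
    if disk_count < 1:
        raise ValueError("disk_count must be at least 1")
    return block_number % disk_count, block_number // disk_count


def raid01_locations(block_number, disk_count, mirrors=2):
    """Return one (mirror, disk, offset) tuple per mirrored copy of a block."""
    disk, offset = raid0_location(block_number, disk_count)
    return [(mirror, disk, offset) for mirror in range(mirrors)]


# Adjacent blocks of the same datafile land on different disks.
for block in range(6):
    print(block, raid0_location(block, disk_count=4))

# With mirroring, the same block exists once in every mirror set.
print(raid01_locations(5, disk_count=4))
```

Because consecutive block numbers map to consecutive disks, a scan of adjacent blocks spreads its I/O across every spindle, which is the granularity advantage over datafile-level striping.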

Note that the use of RAID does not guarantee against catastrophic disk failure. Oracle specifically recommends that all production databases be run in archivelog mode regardless of the RAID architecture, and that periodic Oracle backups be performed. Remember that there are many components to I/O subsystems (including controllers, channels, disk adapters and SCSI adapters), and a failure of any of these components could cause an unrecoverable failure of your database. RAID should only be used as an additional level of insurance, and not as a complete recovery method.
