Oracle RAID configuration


By Michael R. Ault

Welcome to My Nightmare - The Common Performance Errors in Oracle Databases

RAID—Redundant Arrays of Inexpensive Disks 

The main strengths of RAID technology are its dependability and IO bandwidth. For example, in a RAID5 array the data is stored along with checksums and other information about the contents of each disk in the array; if one disk is lost, the others can use this stored information to re-create the lost data. However, this on-the-fly rebuild of data causes a massive performance hit. In RAID 1, RAID 10 and RAID 01, a failed disk is immediately replaced by its mirror with no performance hit. This makes RAID very attractive. RAID 5 provides the same advantages as shadowing and striping at a lower cost. It has been suggested that if manufacturers would use slightly more expensive disks (RASMED, redundant array of slightly more expensive disks), performance gains could be realized. A RAID system appears to the CPU as one very large, reliable disk. There are several levels of RAID to date:

·        RAID 0. Known as disk striping.

·        RAID 1. Known as disk shadowing or mirroring.

·        RAID 1/0. Combination of RAID 0 and RAID 1. It may be written RAID 10 or RAID 01 depending on whether the disks are mirrored then striped, or striped then mirrored. It is generally felt that RAID 10 (mirror then stripe) performs better than RAID 01.

·        RAID 2. Data is distributed in extremely small increments across all disks and adds one or more disks that contain a Hamming code for redundancy. RAID 2 is not considered commercially viable due to the added disk requirements (10 to 20 percent must be added to allow for the Hamming disks).

·        RAID 3. This also distributes data in small increments but adds only one parity disk. This results in good performance for large transfers, but small transfers show poor performance.

·        RAID 4. In order to overcome the small-transfer performance penalties in RAID 3, RAID 4 uses large data chunks distributed over several disks and a single parity disk. This results in a bottleneck at the parity disk. Due to this performance problem RAID 4 is not considered commercially viable. RAID 3 and 4 are usually used for video streaming or large LOB storage.

·        RAID 5. This solves the parity bottleneck by distributing the parity data across the disk array. The major problem is that it requires several write operations to update parity data. The performance hit is only moderate, and the other benefits may outweigh this minor problem. However, the write penalty can exceed 20 percent and must be weighed against the benefits.

·        RAID 6. This adds a second redundancy disk that contains error-correction codes. Read performance is good due to load balancing, but write performance suffers due to RAID 6 requiring more writes than RAID 5 for data update.

For the money, I would suggest RAID 0/1 or RAID 1/0, that is, striped and mirrored. It provides nearly all of the dependability of RAID 5 with much better write performance; you will usually take at least a 20 percent write performance hit using RAID 5. For read-only applications RAID 5 is a good choice, but in high-transaction, high-performance environments the write penalties may be too high. Figure 18 shows RAID 1/0 or 0/1, depending on whether you stripe then mirror or mirror then stripe. In most situations you get better performance from RAID 1/0 (mirroring then striping).

Figure 18: Mirroring and Striping

Table 1 shows how Oracle suggests RAID should be used with Oracle database files.

RAID   Type of RAID                    Control File   Database File                    Redo Log File   Archive Log File
-----  ------------------------------  -------------  -------------------------------  --------------  ----------------
0      Striping                        Avoid          OK                               Avoid           Avoid
1      Shadowing                       Best           OK                               Best            Best
1+0    Striping and shadowing          OK             Best                             Avoid           Avoid
3      Striping with static parity     OK             OK                               Avoid           Avoid
5      Striping with rotating parity   OK             Best if RAID0-1 not available    Avoid           Avoid

Table 1: RAID Recommendations (from MOSC Note 45635.1)

Disk Speed and Size Selection

To get maximum performance from your disk system, you must understand the IO characteristics (the profile) of your database system, be it Oracle, SQL Server, Informix, UDB or MySQL. You must tune your disk architecture to support the expected IO profile, and tune the database system to take advantage of the disk architecture. For example, an Oracle database has different IO characteristics depending on whether it is reading or writing data and on what type of read or write it is doing. Other databases have fixed read/write sizes.

You must determine the IO profile for your database and then use it to determine the maximum and minimum IO size. The IO profile will tell you what percentage of IO is large IO and what percentage is small IO; it will also give you the expected IO rate in IO/second.

Once you have the IO per second you can determine the IO capacity (number of drives) needed to support your database.
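As a starting point for building an IO profile on Oracle, the dynamic performance views give the physical read/write mix per datafile. A minimal sketch (counters are cumulative since instance startup; the timing columns are in hundredths of a second and require timed_statistics):

rem sketch: physical read/write mix per datafile
select df.name,
       fs.phyrds   physical_reads,
       fs.phywrts  physical_writes,
       fs.readtim  read_time_cs,
       fs.writetim write_time_cs
  from v$filestat fs, v$datafile df
 where fs.file# = df.file#
 order by fs.phyrds + fs.phywrts desc;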

The first rule of tuning your disk system is:

Size first for IO capacity, then for volume.

Some back-of-the-envelope calculations for the number of spindles needed to support an IO rate are:

RAID10 with active read/write to all mirrors:

MAX(CEILING(IOR/(NSIOR*M),M),2*M)

Where:

·        IOR is expected maximum IO rate in IO/sec

·        NSIOR is the average non-sequential IO rate of the disks in IO/sec (range of 90-100 for RAID10)

·        M is the number of mirrors

(That is, the maximum of the IO rate divided by the average non-sequential IO rate per disk, rounded up to the nearest multiple of M, or 2*M, whichever is greater.)
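As a worked example, assume a hypothetical peak rate of 2,000 IO/sec, disks rated at 100 non-sequential IO/sec, and 2 mirrors, and read CEILING(x,M) as rounding x up to the next multiple of M. The arithmetic sketches out as:

rem hypothetical inputs: IOR = 2000, NSIOR = 100, M = 2
rem ceiling(x, m) is taken here to mean ceil(x/m) * m
select greatest( ceil( (2000/(100*2)) / 2 ) * 2,  -- ceiling(IOR/(NSIOR*M), M)
                 2*2 )                            -- floor of 2*M spindles
       as raid10_spindles
  from dual;   -- returns 10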

RAID5 assuming 1 parity disk:

MAX((IOR/CNSIOR)+1,3)

Where:

·        IOR is expected maximum IO rate in IO/sec

·        CNSIOR is the corrected average non-sequential IO rate of the disks in IO/sec (range of 60-90 for RAID5)

(That is, the maximum of the IO rate divided by the average non-sequential IO rate per disk, corrected for RAID5 penalties, plus 1 disk for the parity disk, with a floor of 3 disks.)
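Running the same hypothetical 2,000 IO/sec load against RAID5 disks corrected to 80 non-sequential IO/sec:

rem hypothetical inputs: IOR = 2000, CNSIOR = 80
select greatest( ceil(2000/80) + 1,  -- (IOR/CNSIOR) + 1 parity disk
                 3 )                 -- floor of 3 disks
       as raid5_spindles
  from dual;   -- returns 26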

The correction to the non-sequential IO rate for RAID5 is due to the up-to-400 percent penalty on writes (writes take 4 times longer than reads on the same drive). In some cases on RAID5 I have seen this go as high as 6,400 percent (writes taking 64 times as long as reads for the same file) when combined with other problems such as fragmentation.

A case in point: early RAID architectures utilized a "stripe shallow and wide" mindset, where files were broken into small pieces and spread over a large number of disks. For example, stripe units per disk as small as 8K were common, while many systems read in IO sizes of 64K or larger. This means that to satisfy a single IO request, 8 disks of the RAID set were required; if there were fewer than 8 disks in the set, disks would have to undergo 2 or more IOs to satisfy the request. This sounds fine if you are talking about a single user wanting to read a large file from a large group of disks very quickly, but what happens when you have 10 or 100 or 1,000 concurrent users all wanting to do the same thing?

Tune for Concurrency

This problem with concurrent access to RAID arrays is one of the most prevalent in the industry. The ubiquitous IO wait is usually the predominant wait event in any database system, simply because IO to memory is in the nanosecond range while IO to disk is in the millisecond range; when you add in blocked access due to multi-disk IO requests, you get a snowball effect that can cripple your IO subsystem.

Array manufacturers have begun to recognize this concurrent-access problem and have increased the base stripe unit per disk to 64K, matching the IO unit of many systems. Of course, systems such as Sun and Windows servers now utilize maximum IO sizes of 1 megabyte or larger, so again the array manufacturers are playing catch-up with the server manufacturers.

So what is our second rule of tuning disks? Based on the above information the rule is:

Always ensure that the primary IO size for your database system is matched to the IO size of the disk array system.

Of course the inverse also holds true:

Always match the stripe unit per disk to the expected majority IO request from your (database) application.

In the 1990s Peter Chen of the University of California, Berkeley published a series of papers on tuning disk array stripe unit size based on expected concurrency. In these papers Chen and his associates determined that the IO speed (as measured by average positioning time) and IO rate (as measured in megabytes per second) of a disk determine the best stripe size for an array, even when the number of concurrent accesses is not known. Three formulae were derived from these papers:

For non-RAID5 arrays when concurrency is known:

SU = (S*APT*DTR*(CON-1)*1.024)+.5K

Where:

·        SU - Striping unit per disk

·        S - Concurrency slope coefficient (~.25)

·        APT - Average positioning time  (milliseconds)

·        DTR - Data transfer rate (Megabyte/sec)

·        CON - number of concurrent users.

·        1.024= 1s/1000ms*1024K/1M (conversion factors for units)

So for a drive that has an average seek time of 5.6 ms and a transfer rate of 20 Mbyte/second the calculated stripe unit for a 20 concurrent user base would be:

(.25*5.6*20*(19)*1.024)+.5 = 545K (or ~512K)

For a system where you didn't know the concurrency the calculation becomes:

SU = (2/3*APT*DTR*1.024)

So for the same drive:

2/3*5.6*20*1.024 = 76.46K, so round up to ~128K or down to 64K

And from Chen's final paper, the formula for RAID5 arrays uses a coefficient of one half:

SU = (1/2*APT*DTR*1.024)

So for the same drive:

0.5*5.6*20*1.024 = 57.34K (rounding up, 64K)

The values for average positioning time and transfer rate used in these examples are actually fairly low compared to more advanced drives, so the stripe sizes shown above are probably low by at least a factor of 2. I say this because while average seek times drop, transfer rates increase: on an Ultra3 SCSI 15K drive, for example, the average seek may drop to 4.7 ms while the transfer rate leaps to 70 Mbyte per second, so the combined APT*DTR factor goes from 112 to 329, nearly a threefold increase.
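The formulas are easy to recompute as drive specs change. A throwaway sketch covering all three, against both sets of drive numbers used above (the 5.6 ms/20 MB/s drive and the 4.7 ms/70 MB/s 15K drive):

rem stripe-unit sketches in K; 1.024 converts ms * MB/s into K
select (0.25 * 5.6 * 20 * (20-1) * 1.024) + 0.5 as known_con_20_users_k, -- ~545K
       (2/3 * 5.6 * 20 * 1.024)                 as unknown_con_k,        -- ~76K
       (2/3 * 4.7 * 70 * 1.024)                 as unknown_con_15k_k,    -- ~225K
       (0.5 * 4.7 * 70 * 1.024)                 as raid5_15k_k           -- ~168K
  from dual;

Rounding each result to a power-of-two stripe size gives the values quoted in the text; as the text notes, the faster drive roughly triples the computed stripe units.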

The 100% Myth

Many system administrators are guilty of perpetuating the 100% myth. This myth states that you don't need more assets (be it disk, CPU, or memory) until the existing asset is 100 percent utilized. With disks, this leads to performance issues. Due to disk physics, the best performance for a disk is at the outer edges; once you get toward the inner sectors, performance decreases because of the distance the head must travel to read the data, among other factors. In the good old days, administrators spent much time positioning frequently used files on the outer edges of disks.

While physically positioning files on disks is difficult if not impossible in modern RAID systems, you should endeavor not to fill the disks to 100 percent of capacity. Some experts say not to use more than 30 percent if you want maximum performance, others say 50 percent. I say it depends on how the system is used, the operating system, and the RAID array. For example, the CLARiiON from EMC promises to tune the placement of files such that frequently used files are in the best locations.

So, what can we summarize about disk size and speed?

Get the fastest drives you can and plan capacity based on concurrency requirements as well as IO requirements. The more fast disks, the better.

Improper Initialization File Parameter Settings

For Oracle7 version 7.3 there are 154 initialization parameters; for Oracle8 version 8.0.5, 184; in Oracle8i, 194; in Oracle9i version 9.0.1, 251; and in 9iR2, 257. In 10g the number of documented parameters actually dropped, to 254, but the number of undocumented parameters increased, rising from 583 in 9iR2 to 918 in 10gR1.
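You can count these yourself on whatever release you are running. A quick sketch; note the second query reads the internal x$ tables, so it must be run as SYS and the layout is version-dependent:

rem documented parameters visible to the instance
select count(*) as documented_params from v$parameter;

rem undocumented (underscore) parameters; run as SYS
select count(*) as hidden_params
  from x$ksppi
 where ksppinm like '\_%' escape '\';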

Fortunately there are very few that you need to adjust to tune Oracle. Table 2 lists the major tuning parameters; it is by no means a complete list.

Parameter                           Definition
----------------------------------  ------------------------------------------------------------
create_bitmap_area_size             Sets the memory area for bitmap creation
bitmap_merge_area_size              Sets the memory area used for bitmap merges
create_stored_outlines              Allows Oracle to create stored outlines
cursor_sharing                      Enables automated literal replacement
db_file_multiblock_read_count       Sets the read size for full table and index scans
filesystemio_options                Sets direct or asynchronous IO options for filesystem reads
optimizer_index_caching             Used to tune index access
optimizer_index_cost_adj            Used to tune index access
query_rewrite_enabled               Allows queries to be rewritten to use materialized views or FBIs
query_rewrite_integrity             Sets the criteria for when materialized views are used
session_cached_cursors              Sets the number of cached cursors at the session level
sga_max_size                        Sets the maximum SGA memory size
sga_target                          Sets the baseline SGA memory size
star_transformation_enabled         Allows Oracle to use star transformation
transactions_per_rollback_segment   Sets the number of transactions that will use a single rollback (undo) segment
pga_aggregate_target                Sets the total PGA memory usage limit
workarea_size_policy                Determines how workareas (sort and hash) are sized
buffer_pool_keep                    Sets the size of the KEEP buffer pool for tables and indexes
buffer_pool_recycle                 Sets the size of the RECYCLE buffer pool for tables and indexes
cursor_space_for_time               Sacrifices memory to speed cursor access
db_2k_cache_size                    Sets the size of the 2K block buffer cache
db_4k_cache_size                    Sets the size of the 4K block buffer cache
db_8k_cache_size                    Sets the size of the 8K block buffer cache
db_16k_cache_size                   Sets the size of the 16K block buffer cache
db_32k_cache_size                   Sets the size of the 32K block buffer cache
db_block_size                       Sets the default block size for the database
db_cache_size                       Sets the default buffer cache size

Table 2: Tuning Parameters

How to determine proper settings for all of these is beyond the scope of this paper. However, the Oracle tuning guides provide many good tips, as does the Burleson Consulting web site: www.remote-dba.net.
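To see where an instance currently stands on any of these, and whether a value has been changed from its default, query v$parameter; the list below is just an illustrative subset of Table 2:

rem sketch: current values of a few of the Table 2 parameters
select name, value, isdefault
  from v$parameter
 where name in ('pga_aggregate_target', 'sga_target', 'cursor_sharing',
                'db_file_multiblock_read_count', 'workarea_size_policy')
 order by name;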

Improper PGA Setup

I don't believe anyone out there thinks disk-based sorts and hashes are good things. A disk-based operation will take anywhere from 17 to hundreds of times as long as a memory-based operation, depending on buffering, IO bandwidth, memory, and disk speeds.

Oracle provides AWRRPT (AWR) or Statspack reports to track and show the number of sorts. Unfortunately, hashes are not so easily tracked. Oracle tracks disk and memory sorts, the number of sort rows, and other sort-related statistics; hashes, on the other hand, can usually be tracked only through execution plans for cumulative values and through various views for live values.

In versions prior to 9i the individual areas were set using the sort_area_size and hash_area_size parameters; from 9i onward the PGA_AGGREGATE_TARGET parameter allows automated sizing of the sort and hash areas. For currently active sorts or hashes, the script in Figure 19 can be used to watch the growth of temporary areas.

For complete scripts, see my Oracle script collection at www.dba-oracle.com/oracle_scripts.htm.

rem Live sorts and hashes: active workareas from v$sql_workarea_active
rem (title132 is a formatting helper from the author's script library;
rem  tempseg_size is non-null only when a workarea has spilled to disk)
column now format a14
column operation format a15
rem capture a timestamp into &&td for use in the spool file name
column dt new_value td noprint
set feedback off
select to_char(sysdate,'ddmonyyyyhh24miss') dt from dual;
set lines 132 pages 55
@title132 'Sorts and Hashes'
spool rep_out\&&db\sorts_hashes&&td
select sid, work_area_size, expected_size, actual_mem_used, max_mem_used, tempseg_size,
       to_char(sysdate,'ddmonyyyyhh24miss') now, operation_type operation
  from v$sql_workarea_active
/
spool off
clear columns
set lines 80 feedback on
ttitle off
Figure 19: Live Sorts and Hashes Report
Figure 20 shows an example output from this report.
 
Date: 01/04/06                                                           Page:   1
Time: 01:27 PM                    Sorts and Hashes                       SYS
                                  whoville database
     Work Area Expected Actual Mem Max Mem Tempseg
SID       Size     Size       Used    Used    Size Now             Operation
---- --------- -------- ---------- ------- ------- --------------- ---------------
1176   6402048  6862848          0       0         04jan2006132711 GROUP BY (HASH)
 582    114688   114688     114688  114688         04jan2006132711 GROUP BY (SORT)
 568   5484544  5909504     333824  333824         04jan2006132711 GROUP BY (HASH)
1306   3469312  3581952    1223680 1223680         04jan2006132711 GROUP BY (HASH)

Figure 20: Example Sorts and Hashes Report

As you can see, the whoville database had no hashes going to disk at the time the report was run. We can also look at the v$sysstat view for cumulative sort statistics.
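The cumulative report shown in Figure 21 can be produced with a query along these lines (a minimal sketch, with the title formatting omitted):

rem cumulative sort statistics since instance startup
select name  as type_sort,
       value as number_sorts
  from v$sysstat
 where name in ('sorts (memory)', 'sorts (disk)', 'sorts (rows)');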

Date: 12/09/05                                                   Page:   1
Time: 03:36 PM                  Sorts Report                PERFSTAT
                            sd3p database
Type Sort              Number Sorts
-------------------- --------------
sorts (memory)           17,213,802
sorts (disk)                    230
sorts (rows)          3,268,041,228

Figure 21: Cumulative Sorts

Another key indicator that hashes are occurring is excessive IO to the temporary tablespace when there are few or no disk sorts.

PGA_AGGREGATE_TARGET is the target total amount of space for all PGA memory areas. However, only 5 percent, up to a maximum of 200 megabytes, can be assigned to any single process. The limit for PGA_AGGREGATE_TARGET is supposedly 4 gigabytes, though you can increase the setting above this point. The 200-megabyte limit is set by the undocumented parameter _pga_max_size; this parameter can be reset, but only under the guidance of Oracle support. So what size should PGA_AGGREGATE_TARGET be set to? The AWRRPT report in 10g provides a sort histogram that can help with this decision. Figure 22 shows an example of this histogram.

 

PGA Aggr Target Histogram                       DB/Inst: OLS/ols  Snaps: 73-74
-> Optimal Executions are purely in-memory operations
  Low     High
Optimal Optimal    Total Execs  Optimal Execs 1-Pass Execs M-Pass Execs
------- ------- -------------- -------------- ------------ ------------
     2K      4K      1,283,085      1,283,085            0            0
    64K    128K          2,847          2,847            0            0
   128K    256K          1,611          1,611            0            0
   256K    512K          1,668          1,668            0            0
   512K   1024K         91,166         91,166            0            0
     1M      2M            690            690            0            0
     2M      4M            174            164           10            0
     4M      8M             18             12            6            0
          -------------------------------------------------------------

 Figure 22: Sort Histogram

In this case we are seeing 1-pass executions, indicating disk sorts are occurring, with the maximum size being in the 4M to 8M range. For an 8M sort area, PGA_AGGREGATE_TARGET should be set at 320 megabytes (a serial sort gets 0.5*(0.05*PGA_AGGREGATE_TARGET)). For this system the setting was 160 megabytes, so 4 megabytes was the maximum sort size; as you can see, at 160M we were seeing 1-pass sorts in the 2-4M range as well.
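The back-calculation is simple: divide the largest sort you want kept in memory by 0.025 (that is, 0.5 * 0.05):

rem pga_aggregate_target (in MB) needed for an 8 MB in-memory sort
select 8 / (0.5 * 0.05) as pga_target_mb from dual;   -- returns 320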

By monitoring the real-time (live) hashes and sorts and looking at the sort histograms from the AWRRPT reports, you can get a very good idea of the needed PGA_AGGREGATE_TARGET setting. If you need sort areas larger than 200 megabytes, you may need approval from Oracle support, through the iTAR process, to set the _pga_max_size parameter to greater than 200 megabytes.
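To verify the current ceiling, the usual query against the internal x$ parameter tables is shown below; run it as SYS, and treat it as a sketch, since the x$ layouts are undocumented and version-dependent:

rem show the hidden _pga_max_size setting (run as SYS)
select i.ksppinm  as name,
       v.ksppstvl as value
  from x$ksppi i, x$ksppcv v
 where i.indx = v.indx
   and i.ksppinm = '_pga_max_size';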

Summary

This paper has presented the major tuning issues I have seen at many sites during tuning engagements. I have presented ways of determining whether the issues exist and how to determine settings that help mitigate them in an active database.

 


 

 
