The Power of Indexing

Home
E-mail Us
Oracle Articles
New Oracle Articles

Oracle Training
Oracle Tips
Oracle Forum
Class Catalog

Remote DBA
Oracle Tuning
Emergency 911
RAC Support
Apps Support
Analysis
Design
Implementation
Oracle Support

SQL Tuning
Security
Oracle UNIX
Oracle Linux
Monitoring
Remote support
Remote plans
Remote services
Application Server
Applications
Oracle Forms
Oracle Portal
App Upgrades
SQL Server
Oracle Concepts
Software Support
Remote Support
Development
Implementation

Consulting Staff
Consulting Prices
Help Wanted!

Oracle Posters
Oracle Books
Oracle Scripts
Ion
Excel-DB

Don Burleson Blog

The Power of Indexing

Oracle Database Tips by Donald Burleson

THE POWER OF INDEXING

John Garmany, Burleson Consulting

Indexes are the easiest way to improve the performance of long running queries with full table scans. By properly indexing large tables, query completion time can go from hours to minutes, minutes to milliseconds. Indexes allow the database to search the smaller indexes as apposed to searching the large table. This can improve not only SELECT queries but also UPDATEs and DELETEs. Anytime the database needs to locate one or a small number of rows from a table, an index will provide it with a fast path to the data it needs.

Indexes are also help in enforcing referential integrity. All primary keys and unique columns are enforced using indexes. These indexes are created by the database when the constraint is created. Foreign Keys can also benefit from indexes. When a child row is inserted (or updated), the parent key must first be verified. By placing an index on the column of the key in the parent table, the database can use the index to validate the child.

Otherwise, a full table scan of the parent table is required to validate the child. Even features like Advanced Replication benefit from proper indexing. If you have a replicated table that has a foreign key on another table, you can replication just the index on the Foreign Key column, and not replication the base tables.

This improves replication response times, reduced database and network loads, and still allows the maintenance of the FK constraint. Finding the most efficient index is often hard. Most indexes will be the default B-Tree type. This type of index has been in use for years and had been highly optimized. It is effectively used by both the old RULE and the now standard COST based optimizers. However, if your query uses a function in the where clause, the database will not use a standard B-Tree index. For that you need a Function Based Index (FBI). A FBI creates the index by applying the function to each value as the index is built and maintained.

This allows the database to make direct comparisons with the function in the WHERE clause. But FBIs pay the penalty of having to execute the function for each change in the index's base column(s), and the old RULE based optimizer cannot use FBIs. There is another problem with function based indexes; the database may not use them.

An FBI can be created on any deterministic function, basically a function that will return the same out put for a specific input. This includes user created functions. Below is a rather complicated function based on Oracle's REGEX function.

create index card_idx on dumbtbl (REGEXP_SUBSTR (lots_data,?(([0-9]{4})([[:space:]])){3}[0-9]{4}?));

Here I am indexing a large VARCHAR column based on a credit card number that it contains. The database will create this index, but even with a hint, the optimizer will not use it?and it is expensive to maintain. Then there are Bitmap Indexes. Computer CPUs are incredibly fast at comparing bits.

By having a bit for each value in the index, Oracle can create a mask of the value wanted and OR down the bitmapped index to find the values that match. Bitmapped indexes are very fast but they also have drawbacks. First, to work efficiently, the index values are reduced to bits. The width of the bitmap index is basically the number of values in the index. Thus the indexed column needs to be of low cardinality (few distinct values).

Also, there is a penalty to change or add values to a bitmapped index. This penalty has been significantly reduced in Oracle 10gR2, but the change penalty is so great that bitmapped indexes are most useful in read only or data warehouse implementations. Also, the RULE based optimizer does not know what a bitmapped index is and will not use them. Lastly, every index added to a database will have a performance impact on INSERTs, UPDATEs and DELETEs to the underlying table.

This is because the database must update the indexes when ever the underlying tables data changes. If a table has 6 indexes and you insert a row, the table and all six indexes will have to be updated. Thus adding indexes is always a tradeoff between SELECT performance and DML performance.

Too many Indexes will also slow down performance of updates and inserts. Too few indexes and all types of queries may run slower (even updates and deletes). If it were just a trade off, performance of updates and inserts verses performance of selects, optimizing Indexes in a database would be fairly easy. However, it is not that simple.

First is the question of is the index being used? If it is used, is it the correct index? Is there a better index? How can a DBA with a database choking on indexes, focus his efforts to optimize index usage? The question is not just ?are my indexes being used?? but ?do I have the right indexes on the right data to optimize my database??

Database Parameters

Two parameters introduced in Oracle9i effect the COST based optimizer's index usage decisions. The optimizer_index_caching parameter tells the optimizer how much of the index it can expect to find in the buffer cache. It defaults to 0 (turned off). Changing this parameter to account for indexes in the buffer cache will cause the optimizer to favor using indexes. I normally start with this parameter set to 75 (assume 75% of the index is in the cache). The other parameter, optimizer_index_cost_adj) directly impacts the COST based optimizer's calculation of the using an index. It defaults to 100.

This tells the optimizer to calculate the cost of using an index at 100% its actual cost. Setting this parameter lower will lower the cost for all indexes that the COST based optimizer evaluates. This parameter has to most potential for misuse because the DBA may not fully understand its impact. If you set the optimizer_index_cost_adj to a low value, you run the risk of removing the ability of the optimizer to determine which index is better for a specific query.

Important Note: Prior to Oracle 10g, adjusting these optimizer parameters was the only way to compensate for sample size issues with dbms_stats. As of 10g, the use of dbms_stats.gather_system_stats and improved sampling within dbms_stats had made adjustments to these parameters far less important.

Ceteris Parabus, always adjust CBO statistics before adjusting optimizer parms. For more details on optimizer parameters, see my latest book "Oracle Tuning: The Definitive Reference".

Basically, if the database has three indexes that can be used in a query, and the optimizer_index_cost_adj is set to 10 (10%), the optimizer will examine each of the three indexes available. It will evaluate each index and if the index has a lower cost (is better) than the current index, the optimizer will select the new index and use it for the remaining index comparisons. But the new index but be better to be selected, not just as good. With the cost cut by 90% all the indexes in the comparison will have the same cost, leaving the optimizer to only choose the first index, because the other indexes can not be better, they all have the same cost.

Thus setting this parameter too low can result in the optimizer making bad choices on which index to use. If the optimizer needs a push toward using indexes, lowering the optimizer_index_cost_adj parameter can help but values below 50 (50%) should be monitored closely.

Evaluating Current Indexes

Finding and removing indexes that are not used will improve database performance by removing the maintenance overhead. However, finding and removing duplicate indexes can have just as great an effect if not more. The plan of attack is to locate all indexes that the database is currently not using. These unused indexes are then removed from the system. The second step is to locate possible duplicate indexes. Duplicate indexes in this context are multiple indexes that can satisfy the queries; the database will not let you create an actual duplicate index.

Duplicate indexes can be removed or modified to meet the requirements of the SQL statements. In reviewing possible duplicate indexes, tables that could benefit from reorganization into Index Organized Tables (IOT) are also identified. The last step is identifying tables/indexes that are candidates for Function Based Indexes (FBI).

Unused Indexes.

Finding unused indexes in Oracle 8i is difficult to say the least. Unlike Oracle 9i and 10g, there is no flag to tell you that an index was used. In Oracle 8i you must do it the hard way, fishing in the library cache. Basically, you exaplain each query in the library cache into the plan table, then query the indexes out of the plan table. Oracle9i and 10g has made finding unused indexes easy. The database will monitor for index use and update a view called v$object_usage.

When you turn monitoring on with: alter index <index_name> monitoring usage; The database begins monitoring the index and updates the USED column when the index is first used. When you want to stop monitoring simply execute: alter index <index_name> nomonitoring usage; This only tells you that the index was used during the monitoring period (YES or NO). Each time you start monitoring an index the USED column is reset to NO until the index is used. This is great for identifying those indexes used only for periodic reports.

Although index monitoring does not in itself impact database performance, turning it on does. Each time you turn on index monitoring, all execution plans in the library cache that use the index are invalidated. You need to monitor index usage over a long enough period of time to identify all indexes being used. By restarting monitoring before running periodic reports, etc, you will locate indexes that are only used in specific task or times.

The database may benefit by removing these indexes, rebuilding them when needed and dropping them again. In this way you are not always paying the overhead of maintaining indexes that are only used at a specific time. If the penalty caused by index montoring's SQL invalidation is too high, you can also look to STATSPACK and the AWR for index information. In Oracle9i, STATSPACK gathers execution plans and stores them in the STATS$SQL_PLAN table.

From this table you can determine which indexes where being used by the execution plan during that snap. Oracle 10g introduced the AWR, which also captures index use and can be used for time series analysis. We cover AWR in more detail later in the paper.

Step Two - Remove the unused indexes.

Once you have a list of indexes that have not been used, I recommend that you use a tool (or script) and punch out the DDL required to rebuild them. Also watch for Foreign Key indexes. Hopefully you have a naming protocol so that you can easily identify PK and FK indexes as you will not see a PK or FK index in the explain plan if it is only used for referential integrity. Primary keys can't be dropped (and you don't want to drop them) but FK indexes can. I recommend that you not drop FK indexes until you can closely monitor the system.

Missing FK indexes can cause large numbers of full table scans during inserts and updates. Now you can start dropping indexes and monitoring the system for increased response time or increases in full table scans. Again I use one of Don Burleson's scripts plan9i.sql (plan10g.sql) to monitor full table scans. The script basically generates a report that details full table scans and index full and range scans.

Small tables that having high full table scans may benefit more by being placed in the KEEP pool, rather than having an index added. When I perform index tuning for a client I always find that I need to create a few indexes after the unused indexes are dropped to reduce full table scans. However, the added index is always different that ones that were dropped.

Too Many Indexes?

Most of the systems I work on are under indexed. That said, databases supporting custom applications are often over indexed. Many developers believe that all columns in all WHERE clauses should be indexed. While this philosophy may work for the developer, it often drives the DBA nuts! As a DBA you focus on tuning the system, thin fine tune the problem queries individually. Sometimes I find a system that is choking on the overhead of indexes. Even after removing unused indexes, the system still is impacted by the overhead of the remaining indexes. In this case I start reviewing the indexes to see if fewer (less perfect) indexes can relieve the overhead wile maintaining acceptable query performance.

I start by looking for indexes on the same table with the same leading column, then for indexes with the same two leading columns. This provides a good starting point for trying to reduce redundancy in indexes. Indexes with the same three leading columns (yes, I find many of those) will be in the results from the two leading column report. Oracle looks for an index that will satisfy the ?WHERE? statement in the query. The Rule Based Optimizer will select the index it thinks will be most efficient based on the columns in the WHERE clause and uses it. The Cost Based Optimizer selects an index based on the columns it is looking for, the index statistics and whether it believes using the index is more efficient than a full table scan.

If an index exist that will completely satisfy the query then Oracle will select that index, otherwise Oracle will try and select the index that will return the least number of rows (i.e. most restrictive) and use that index. Oracle will only use one index unless the query contains an ?and_equal? hint telling it to use more than one index before accessing the table. The ?and_equal? hint is depreciated in Oracle 10g and in real life was not as useful as it would appear to be. Basically here are some scenarios where you may be able to reduce the number of indexes and still maintain performance. Performance is always the deciding factor.

Primary Keys: A

ny table with a PK has an index on that PK. If you do not specify a PK index, the system will create one itself. It is very common to find redundant indexes based on the assumption that there is no index on the table's PK when in fact there always is.

Table with 3 or more columns:

Index1 column1_PK Index2 column1 column2 Index3 column1 column2 column3 In this case index2 is redundant because it is completely contained in index3. If index1 were not a primary key index, it could possibly be redundant.

Using ?and_equal? Hints To Concatenate Indexes:

Index1 column1_PK Index2 column2 Index3 column1 column2 - Index4 column2 column1 -

In this case index3 and 4 were created for multiple column queries with some lead columns on 1 and some lead columns on 2. In this case Index 3 and 4 could be removed and multi-column queries that use column1 and 2 can use the ?and_equal? hint to cause the optimizer to use index1 and index2 before accessing the table with rowids. Using the ?and_equal? hint does have additional overhead since two indexes must be read.

You have to balance the overhead of two reading two indexes against the overhead of the additional indexes. I also find cases where Index3 and Index4 were added to support queries that contained both column1 and column2 in separate WHERE clauses. These indexes were added because in some queries column1 is more restrictive (so it is the lead column) and in other queries column2 is more restrictive (and thus the lead column). In fact either Index3 or Index4 (or using the ?and_equal? hint) will satisfy the query and will have equivalent performance. Y

ou are also relying on the optimizer to pick the correct index to use for each type of query, which is unlikely. Having the lead column the most restrictive column will not affect the index performance. In this case you can drop either Index3 or Index4 and not affect query performance. Remember that leading columns are only important in that they must be used in the WHERE clause for the index to be used. Oracle 9i introduced the index skip scan that allows you to use an index where the lead column is not in the where clause, but this method normally requires more overhead than resulting performance benefit.

Large Multi-column Indexes:

Multi-column indexes with more than 3 columns may not provide more efficient access than a two-column index. The objective of the index is to reduce the amount of rows returned from a table access. Therefore each added column must substantially reduce the number of returned rows to be effective. For example, assuming a large table, on a query with 5 or more WHERE (AND) clauses using a 5-column index may return only 1 row. However using a 3-column index may return only 50 rows.

A two-column index returns 200 rows. The time it takes to extract the one row from the 200 rows using nested-loops is negligible. Thus the two-column index may be almost as efficient (fast) as the 5-column index. The key is to index the most restrictive columns. Another tradeoff is a table with multiple column indexes where the leading column(s) are the same.

For instance, a table with four 3-column indexes where the leading two columns are the same may work very efficiently on select statements but cause a heavy penalty on inserts and updates. Just one 2-column index on the leading two columns may provide acceptable query performance while greatly improving DML. Small tables with two or three columns may benefit by being rebuilt as an Index Organized Table (IOT).

A 2-column table with a PK and a two-column index has 1.5 times the data in indexes that are in the table. Making the table an Index Organized Table reduced the need for indexes because the table is the index. Also IOTs can have indexes on non-leading columns if required. Again this has to be balanced with the overhead of maintaining the IOT. Lastly, do not be afraid to use temporary indexes.

If you run a nightly report that requires 6 hours to run, but will run in 30 mins with a specific index, you might want to create the index before running the report and drop it upon completion. I work with clients that drop certain indexes to expedite the bill run, then recreate then for the normal application. They create indexes each night and drop them in the morning. There is nothing wrong with dynamically changing you database to respond to varying tasks if it results in efficiency.

The question of rebuilding indexes raises people's ire more that talking politics. Everyone has an opinion, everyone has rules and if you don't agree with this or that rule you?re just plain dumb. The rules go from never to always with sometimes in between. I am not going to provide any rule, rather I am going to discuss what rebuilding an index does and when you might want to rebuild an index.

Creating an Index

This is a basic description of how Oracle builds indexes. A B-Tree index starts as two blocks. The first block contains the pointers to the rows in the second block. As the index grows, the data/rowids go in the second block. So to find a key requires two reads. The first block is read to determine which block contains the data/rowids, and the second block is read to retrieve the necessary information. As the index grows the second block will fill up. When it hits PCTFREE, it is logically full. Oracle evaluates how it has been growing the current block.

If it has been growing sequentially, then Oracle adds another block and continues to add data (middle diagram below). If the data has been growing randomly, then Oracle splits the first block into two approximately half filled blocks and continues to add data (bottom diagram below). (Figure 1)

Figure 1, Splitting and Growing Blocks

Notice that when you grow a block, the data remains tightly packed but when you split a block you end up with two half empty blocks. This empty space is referred to as index fluff (actually there are many names, I will use fluff). Lastly, since the header block contains only references to the other blocks (references are very small) the index will need to be very large before that block is filled. When it fills, a new header block is added and the current header block moves down and is split or grown. Now the index is a three level index.

Why it is good to rebuild indexes

When you rebuild an index, Oracle uses the current index to create the new index. Thus, the new index is created sequentially and is tightly packed on the blocks (no fluff other than PCTFREE). This is good because it may take a three level index and rebuilding it into a two level index, reducing a read. The tightly packed blocks are also more effect in the buffer cache.

Why rebuilt indexes are problems

A tightly packed index is more efficient for the database as long as no changes are made in the underlying table. Once you start making changes, the index blocks start to split and reintroduce the fluff. Not only is fluff reintroduced, but there is redo created as blocks are added to the index. The higher the rate of change, the faster your nicely packed index will return to a steady state of fluff.

When to rebuild indexes

The only time you need to rebuild indexes is before placing them and their underlying tables into readonly tablespaces or if the underlying tables have been rebuilt. Changes to the underlying tables can cause you to need to rebuild related indexes. When you delete from the underlying table, the database will mark rows as deleted rather than clearing them out. It does this to save time. If you perform a large amount of deletes, then you can end up with a large number of del_lf_rows in the index.

When you are using incremental keys (such as a sequence as a primary key) you can end up with an index that is carrying around a lot of mostly empty blocks. For example, if I create an index on the stats$snapshot.snap_id column, the index will always be adding numbers to the right side of the index as snapshots are taken (thus the term ?right-sided?).

If each week I delete those snapshots older than 3 days, I in effect delete rows from the left while adding rows on the right. Although by definition the index is still balanced, it has to keep adding blocks (on the right) while there are empty blocks on the left. Rebuilding the index will fix the problem. Actually, this is the only reason I know of that would warrant scheduled index rebuilding.

Again, cost vs benefit. The other time to rebuild is when the cost to rebuild is less that the performance gained. First, it is a continuing process since the index will move toward fluff with use. Second, there is the cost of rebuilding the index AND the cost of the additional redo as the index changes. There is only one method to determine if rebuilding an indexes benefits your database, TESTING!.

Counting index usage inside SQL

Prior to Oracle9i it was very difficult to see if an index was being used by the SQL in your database. It required explaining all of the SQL in the library cache into a holding area and then parsing through the execution plans for the index name. Things were simplified slightly in Oracle9i when we got the primitive alter index xxx monitoring usage command and the ability to see if the index was invoked. The problem has always been that it is very difficult to know what indexes are the most popular. In Oracle10g we can easily see what indexes are used, when they are used and the context where they are used. Here is a simple AWR query to plot index usage:

-- ************************************************

-- Copyright - 2004 by Rampant TechPress Inc.

-- Free for non-commercial use!

-- To license, e-mail info@rampant.cc

-- ************************************************

col c1 heading ?Begin|Interval|time? format a20

col c2 heading 'search Columns? format 999

col c3 heading ?Invocation|Count? format 99,999,999

break on c1 skip 2

accept idxname char prompt ?Enter Index Name: ?

ttitle ?Invocation Counts for index|&idxname?

select

to_char(sn.begin_interval_time,'yy-mm-dd hh24') c1,

p.search_columns c2,

count(*) c3

from

dba_hist_snapshot sn,

dba_hist_sql_plan p,

dba_hist_sqlstat st

where

st.sql_id = p.sql_id

and

sn.snap_id = st.snap_id

and

p.object_name = ?&idxname'

group by

begin_interval_time,search_columns;

As we can see, knowing the signature for large-table full-table scans can help us in both SQL tuning and instance tuning. For SQL tuning, this report will tell us when to drill-down to verify that all of the large table full-table scans are legitimate, and once verified, this same data can be used to dynamically reconfigure the Oracle instance to accommodate the large scans.

Conclusion

Index efficiency is hard. Many times finding the best index is a matter of trial and error. Removing unused indexes is the easy part. Finding the most efficient index for your system is more of an art and the DBA must always remember that changing an index can have a cascading effect on many SQL statement's execution plans. Developing an index use signature will also help you determine what indexes are being used and when. All of this information goes into determining which are the best indexes for you particular system.

About the Author:

John Garmany is a graduate of West Point and a retired LTC with more than 20 years of IT experience. He is an OCP Certified Oracle DBA with a master degree in Information Systems, a graduate certificate in Software Engineering, and a BS degree (electrical engineering) from West Point. John is a senior consultant and trainer with Burleson Consulting and author of Oracle Replication Handbook, Easy Oracle SQL, Easy Oracle PL/SQL by Rampant TechPress and Oracle Application Server10g Administration Handbook by Oracle Press.

Get the Complete
Oracle SQL Tuning Information

The landmark book "Advanced Oracle SQL Tuning The Definitive Reference" is filled with valuable information on Oracle SQL Tuning. This book includes scripts and tools to hypercharge Oracle 11g performance and you can buy it for 30% off directly from the publisher.

��