Question: I have a SQL with multiple
columns in my where clause. I know that Oracle can only choose
one index, and I know about multi-column composite indexes, but I do
not know how to determine the optimal column order for a composite
index with multiple column values. What is the secret for
creating a composite index with the columns in the proper sequence?
Answer: You are correct that the column sequence
matters! This is an empirical question, and you need to run
scripts against your SQL workload (STATSPACK or AWR) to examine
how frequently a specific index column was needed by SQL.
Remember, it's the SQL workload that drives your choice of composite
indexes, and the order of the columns within the index.
See these important scripts
to display multi-column index usage using AWR.
- In general, when using a multi-column index, you want to put
the most restrictive column value first (the column with the
highest unique values) because this will trim-down the result
- Because Oracle can only access one index, your job is to
examine your historical SQL workload and build a single
composite index that satisfies the majority of the SQL queries.
- The Oracle optimizer may try to make single column
indexes behave as-if they were a single composite index.
Prior to 10g, this could be done with the "and_equal" hint.
- Beware that indexes have overhead and see my notes on
detecting duplicate index columns.
- You can run scripts to monitor the invocation count for each
column in a multiple column composite index (see
counting column usage from a SQL workload)
I have more complete
details on composite index usage monitoring in
Advanced Oracle SQL Tuning: The Definitive Reference.
Also, see my related notes on
tuning with composite bitmap indexes and my scripts to monitor
which columns of a composite index are used, and
counting index column usage from AWR and STATSPACK.
Large Multi-column Composite Indexes
Multi-column indexes with more than 3 columns may not provide more
efficient access than a two-column index.
The objective of the index is to reduce the amount of rows
returned from a table access.
Therefore each added column must substantially reduce the
number of returned rows to be effective.
For example, assuming a large table, on a query with 5 or
more WHERE (AND) clauses using a 5-column index may return only 1
row. However using a
3-column index may return only 50 rows.
A two-column index returns 200 rows.
The time it takes to extract the one row from the 200 rows
using nested-loops is negligible.
Thus the two-column index may be almost as efficient (fast) as the
5-column index. The key
is to index the most restrictive columns.
Another tradeoff is a table with multiple column indexes
where the leading column(s) are the same.
For instance, a table with four 3-column indexes where the
leading two columns are the same may work very efficiently on select
statements but cause a heavy penalty on inserts and updates.
Just one 2-column index on the leading two columns may
provide acceptable query performance while greatly improving DML.
Small tables with two or three columns may benefit by being rebuilt
as an Index Organized Table (IOT).
A 2-column table with a primary key and a two-column index
has 1.5 times the data in indexes that are in the table.
Making the table an Index Organized Table reduced the need
for indexes because the table is the index.
Also IOTs can have indexes on non-leading columns if
this has to be balanced with the overhead of maintaining the IOT.
Lastly, do not be afraid to use temporary indexes.
If you run a nightly report
that requires 6 hours to run, but will run in 30 mins with a
specific index, you might want to create the index before running
the report and drop it upon completion.
I work with clients that drop certain indexes to expedite the
bill run, then recreate then for the normal application.
They create indexes each night and drop them in the morning.
There is nothing wrong with dynamically changing you database
to respond to varying tasks if it results in efficiency.
Script for tracking composite index column usage
These scripts will only track SQL that you have
directed Oracle to capture via your threshold settings in AWR or
STATSPACK. STATSPACK and AWR will not collect "transient SQL" that
did not appear in v$sql at snapshot time. Hence, not all SQL
will appear in these reports. See my notes here on
adjusting the SQL capture thresholds.
col c1 heading 'Begin|Interval|time'
col c2 heading 'Search
Columns' format 999
col c3 heading 'Invocation|Count'
break on c1 skip 2
accept idxname char prompt
'Enter Index Name: '
ttitle 'Invocation Counts
SEE CODE DEPOT FOR FULL SCRIPTS
The query will produce an output showing a
summary count of the index specified during the snapshot interval.
This can be compared to the number of times that a table was invoked
from SQL. Here is a sample of the output from the script.
Invocation Counts for
Search Columns Count
For more complete details on creating a custom composite index
monitoring infrastructure, see my latest book
Advanced Oracle SQL Tuning: The Definitive Reference.
Get the Complete
Oracle SQL Tuning Information
The landmark book
SQL Tuning The Definitive Reference" is
filled with valuable information on Oracle SQL Tuning.
This book includes scripts and tools to hypercharge Oracle 11g
performance and you can
for 30% off directly from the publisher.