 |
|
The Power of Indexing
Oracle Database Tips by Donald Burleson |
THE POWER OF INDEXING
John Garmany, Burleson Consulting
Indexes are the easiest way to improve the performance of long
running queries with full table scans. By properly indexing large tables,
query completion time can go from hours to minutes, minutes to milliseconds.
Indexes allow the database to search the smaller indexes as apposed to searching
the large table. This can improve not only SELECT queries but also UPDATEs and
DELETEs. Anytime the database needs to locate one or a small number of rows from
a table, an index will provide it with a fast path to the data it needs.
Indexes
are also help in enforcing referential integrity. All primary keys and unique
columns are enforced using indexes. These indexes are created by the database
when the constraint is created. Foreign Keys can also benefit from indexes. When
a child row is inserted (or updated), the parent key must first be verified. By
placing an index on the column of the key in the parent table, the
database can use the index to validate the child.
Otherwise, a
full table scan of the parent table is required
to validate the child. Even features like Advanced Replication
benefit from proper indexing. If you have a
replicated table that has a foreign key on another table, you
can replication just the index on the Foreign
Key column, and not replication the base tables.
This improves
replication response times, reduced
database and network loads, and still allows the maintenance of
the FK constraint.
Finding the most efficient index is often hard. Most indexes
will be the default B-Tree type. This type
of index has been in use for years and had been highly
optimized. It is effectively used by both the old
RULE and the now standard COST based optimizers. However, if
your query uses a function in the where
clause, the database will not use a standard B-Tree index. For
that you need a Function Based Index (FBI).
A FBI creates the index by applying the function to each value
as the index is built and maintained.
This
allows the database to make direct comparisons with the function
in the WHERE clause. But FBIs pay the
penalty of having to execute the function for each change in the
index's base column(s), and the old RULE
based optimizer cannot use FBIs. There is another problem with
function based indexes; the database may
not use them.
An FBI can be created on any deterministic
function, basically a function that will return the
same out put for a specific input. This includes user created
functions. Below is a rather complicated
function based on Oracle's REGEX function.
create index card_idx on dumbtbl
(REGEXP_SUBSTR
(lots_data,?(([0-9]{4})([[:space:]])){3}[0-9]{4}?));
Here I am indexing a large VARCHAR column based on a credit card
number that it contains. The
database will create this index, but even with a hint, the
optimizer will not use it?and it is expensive to
maintain.
Then there are Bitmap Indexes. Computer CPUs are incredibly fast
at comparing bits.
By having a bit for
each value in the index, Oracle can create a mask of the value
wanted and OR down the bitmapped index to
find the values that match. Bitmapped indexes are very fast but
they also have drawbacks. First, to work
efficiently, the index values are reduced to bits. The width of
the bitmap index is basically the number of
values in the index. Thus the indexed column needs to be of low
cardinality (few distinct values).
Also,
there is a penalty to change or add values to a bitmapped index.
This penalty has been significantly
reduced in Oracle 10gR2, but the change penalty is so great that
bitmapped indexes are most useful in read only
or data warehouse implementations. Also, the RULE based
optimizer does not know what a
bitmapped index is and will not use them.
Lastly, every index added to a database will have a performance
impact on INSERTs, UPDATEs and
DELETEs to the underlying table.
This is because the database
must update the indexes when ever the
underlying tables data changes. If a table has 6 indexes and you
insert a row, the table and all six indexes
will have to be updated. Thus adding indexes is always a
tradeoff between SELECT performance and
DML performance.
Too many Indexes will also slow down performance of updates and
inserts. Too few indexes and all
types of queries may run slower (even updates and deletes). If
it were just a trade off, performance of
updates and inserts verses performance of selects, optimizing
Indexes in a database would be fairly easy.
However, it is not that simple.
First is the question of is the
index being used? If it is used, is it the correct
index? Is there a better index? How can a DBA with a database
choking on indexes, focus his efforts to
optimize index usage? The question is not just ?are my indexes
being used?? but ?do I have the right
indexes on the right data to optimize my database??
Database Parameters
Two parameters introduced in Oracle9i effect the COST based
optimizer's index usage decisions. The
optimizer_index_caching parameter tells the optimizer how much
of the index it can expect to find in the
buffer cache. It defaults to 0 (turned off). Changing this
parameter to account for indexes in the buffer
cache will cause the optimizer to favor using indexes. I
normally start with this parameter set to 75
(assume 75% of the index is in the cache). The other parameter,
optimizer_index_cost_adj) directly
impacts the COST based optimizer's calculation of the using an
index. It defaults to 100.
This tells the
optimizer to calculate the cost of using an index at 100% its
actual cost. Setting this parameter lower will
lower the cost for all indexes that the COST based optimizer
evaluates. This parameter has to most
potential for misuse because the DBA may not fully understand
its impact. If you set the
optimizer_index_cost_adj to a low value, you run the risk of
removing the ability of the optimizer to
determine which index is better for a specific query.
Important
Note: Prior to Oracle 10g, adjusting these optimizer parameters was the
only way to compensate for sample size issues with dbms_stats. As of
10g, the use of dbms_stats.gather_system_stats and improved sampling
within dbms_stats had made adjustments to these parameters far less
important.
Ceteris
Parabus, always adjust CBO statistics before adjusting optimizer parms. For
more details on optimizer parameters, see my latest book "Oracle
Tuning: The Definitive Reference".
Basically,
if the database has three indexes that can be
used in a query, and the optimizer_index_cost_adj is set to 10
(10%), the optimizer will examine each of
the three indexes available. It will evaluate each index and if
the index has a lower cost (is better) than the
current index, the optimizer will select the new index and use
it for the remaining index comparisons. But
the new index but be better to be selected, not just as good.
With the cost cut by 90% all the indexes in the
comparison will have the same cost, leaving the optimizer to
only choose the first index, because the other
indexes can not be better, they all have the same cost.
Thus
setting this parameter too low can result in the
optimizer making bad choices on which index to use. If the
optimizer needs a push toward using indexes,
lowering the optimizer_index_cost_adj parameter can help but
values below 50 (50%) should be monitored
closely.
Evaluating Current Indexes
Finding and removing indexes that are not used will improve
database performance by removing the
maintenance overhead. However, finding and removing duplicate
indexes can have just as great an effect if
not more. The plan of attack is to locate all indexes that the
database is currently not using. These unused
indexes are then removed from the system. The second step is to
locate possible duplicate indexes.
Duplicate indexes in this context are multiple indexes that can
satisfy the queries; the database will not let
you create an actual duplicate index.
Duplicate indexes can be
removed or modified to meet the
requirements of the SQL statements. In reviewing possible
duplicate indexes, tables that could benefit
from reorganization into Index Organized Tables (IOT) are also
identified. The last step is identifying
tables/indexes that are candidates for Function Based Indexes
(FBI).
Unused Indexes.
Finding unused indexes in Oracle 8i is difficult to say the
least. Unlike Oracle 9i and 10g, there is no
flag to tell you that an index was used. In Oracle 8i you must
do it the hard way, fishing in the library
cache. Basically, you exaplain each query in the library cache
into the plan table, then query the indexes
out of the plan table. Oracle9i and 10g has made finding unused
indexes easy. The database will monitor
for index use and update a view called v$object_usage.
When you
turn monitoring on with:
alter index <index_name> monitoring usage;
The database begins monitoring the index and updates the USED
column when the index is first used.
When you want to stop monitoring simply execute:
alter index <index_name> nomonitoring usage;
This only tells you that the index was used during the
monitoring period (YES or NO). Each time you start
monitoring an index the USED column is reset to NO until the
index is used. This is great for identifying
those indexes used only for periodic reports.
Although index
monitoring does not in itself impact database
performance, turning it on does. Each time you turn on index
monitoring, all execution plans in the library
cache that use the index are invalidated.
You need to monitor index usage over a long enough period of
time to identify all indexes being used.
By restarting monitoring before running periodic reports, etc,
you will locate indexes that are only used in
specific task or times.
The database may benefit by removing
these indexes, rebuilding them when needed
and dropping them again. In this way you are not always paying
the overhead of maintaining indexes that
are only used at a specific time.
If the penalty caused by index montoring's SQL invalidation is
too high, you can also look to
STATSPACK and the AWR for index information. In Oracle9i,
STATSPACK gathers execution plans and
stores them in the STATS$SQL_PLAN table.
From this table you can
determine which indexes where
being used by the execution plan during that snap. Oracle 10g
introduced the AWR, which also captures
index use and can be used for time series analysis. We cover AWR
in more detail later in the paper.
Step Two - Remove the unused indexes.
Once you have a list of indexes that have not been used, I
recommend that you use a tool (or script)
and punch out the DDL required to rebuild them. Also watch for
Foreign Key indexes. Hopefully you
have a naming protocol so that you can easily identify PK and FK
indexes as you will not see a PK or FK
index in the explain plan if it is only used for referential
integrity. Primary keys can't be dropped (and you
don't want to drop them) but FK indexes can. I recommend that
you not drop FK indexes until you can
closely monitor the system.
Missing FK indexes can cause large
numbers of full table scans during inserts
and updates. Now you can start dropping indexes and monitoring
the system for increased response time or
increases in full table scans. Again I use one of Don Burleson's
scripts plan9i.sql (plan10g.sql) to monitor
full table scans. The script basically generates a report that
details full table scans and index full and range
scans.
Small tables that having high full table scans may
benefit more by being placed in the KEEP pool,
rather than having an index added. When I perform index tuning
for a client I always find that I need to
create a few indexes after the unused indexes are dropped to
reduce full table scans. However, the added
index is always different that ones that were dropped.
Too Many Indexes?
Most of the systems I work on are under indexed. That said,
databases supporting custom applications are
often over indexed. Many developers believe that all columns in
all WHERE clauses should be indexed.
While this philosophy may work for the developer, it often
drives the DBA nuts! As a DBA you focus on
tuning the system, thin fine tune the problem queries
individually. Sometimes I find a system that is
choking on the overhead of indexes. Even after removing unused
indexes, the system still is impacted by
the overhead of the remaining indexes. In this case I start
reviewing the indexes to see if fewer (less
perfect) indexes can relieve the overhead wile maintaining
acceptable query performance.
I start by looking for indexes on the same table with the same
leading column, then for indexes with the
same two leading columns. This provides a good starting point
for trying to reduce redundancy in indexes.
Indexes with the same three leading columns (yes, I find many of
those) will be in the results from the two
leading column report.
Oracle looks for an index that will satisfy the ?WHERE?
statement in the query. The Rule Based
Optimizer will select the index it thinks will be most efficient
based on the columns in the WHERE clause
and uses it. The Cost Based Optimizer selects an index based on
the columns it is looking for, the index
statistics and whether it believes using the index is more
efficient than a full table scan.
If an index exist
that will completely satisfy the query then Oracle will select
that index, otherwise Oracle will try and select
the index that will return the least number of rows (i.e. most
restrictive) and use that index. Oracle will
only use one
index unless the query contains
an ?and_equal? hint telling it to use more than one index
before accessing the table. The ?and_equal? hint is depreciated
in Oracle 10g and in real life was not as
useful as it would appear to be.
Basically here are some scenarios where you may be able to
reduce the number of indexes and still
maintain performance. Performance is always the deciding factor.
Primary Keys:
A
ny table with a PK has an index on that PK. If you do not
specify a PK index, the system will create
one itself. It is very common to find redundant indexes based on
the assumption that there is no index on
the table's PK when in fact there always is.
Table with 3 or more columns:
Index1 column1_PK
Index2 column1 column2
Index3 column1 column2 column3
In this case index2 is redundant because it is completely
contained in index3. If index1 were not a primary
key index, it could possibly be redundant.
Using ?and_equal? Hints To Concatenate Indexes:
Index1 column1_PK
Index2 column2
Index3 column1 column2 - Index4 column2 column1 -
In this case index3 and 4 were created for multiple column
queries with some lead columns on 1 and
some lead columns on 2. In this case Index 3 and 4 could be
removed and multi-column queries that use
column1 and 2 can use the ?and_equal? hint to cause the
optimizer to use index1 and index2 before
accessing the table with rowids. Using the ?and_equal? hint does
have additional overhead since two
indexes must be read.
You have to balance the overhead of two
reading two indexes against the overhead
of the additional indexes.
I also find cases where Index3 and Index4 were added to support
queries that contained both column1
and column2 in separate WHERE clauses. These indexes were added
because in some queries column1 is
more restrictive (so it is the lead column) and in other queries
column2 is more restrictive (and thus the lead
column). In fact either Index3 or Index4 (or using the
?and_equal? hint) will satisfy the query and will
have equivalent performance. Y
ou are also relying on the
optimizer to pick the correct index to use for
each type of query, which is unlikely. Having the lead column
the most restrictive column will not affect
the index performance. In this case you can drop either Index3
or Index4 and not affect query
performance.
Remember that leading columns are only important in that they
must be used in the WHERE clause for
the index to be used. Oracle 9i introduced the index skip scan
that allows you to use an index where the
lead column is not in the where clause, but this method normally
requires more overhead than resulting
performance benefit.
Large Multi-column Indexes:
Multi-column indexes with more than 3 columns may not provide
more efficient access than a two-column
index. The objective of the index is to reduce the amount of
rows returned from a table access.
Therefore each added column must substantially reduce the number
of returned rows to be effective. For
example, assuming a large table, on a query with 5 or more WHERE
(AND) clauses using a 5-column
index may return only 1 row. However using a 3-column index may
return only 50 rows.
A two-column
index returns 200 rows. The time it takes to extract the one row
from the 200 rows using nested-loops is
negligible. Thus the two-column index may be almost as efficient
(fast) as the 5-column index. The key is
to index the most restrictive columns. Another tradeoff is a
table with multiple column indexes where the
leading column(s) are the same.
For instance, a table with four
3-column indexes where the leading two
columns are the same may work very efficiently on select
statements but cause a heavy penalty on inserts
and updates. Just one 2-column index on the leading two columns
may provide acceptable query
performance while greatly improving DML.
Small tables with two or three columns may benefit by being
rebuilt as an Index Organized Table
(IOT).
A 2-column table with a PK and a two-column index has 1.5
times the data in indexes that are in
the table. Making the table an Index Organized Table reduced the
need for indexes because the table is the
index. Also IOTs can have indexes on non-leading columns if
required. Again this has to be balanced
with the overhead of maintaining the IOT.
Lastly, do not be afraid to use temporary indexes.
If you run a
nightly report that requires 6 hours to
run, but will run in 30 mins with a specific index, you might
want to create the index before running the
report and drop it upon completion. I work with clients that
drop certain indexes to expedite the bill run,
then recreate then for the normal application. They create
indexes each night and drop them in the
morning. There is nothing wrong with dynamically changing you
database to respond to varying tasks if it
results in efficiency.
The question of rebuilding indexes raises people's ire more that
talking politics. Everyone has an
opinion, everyone has rules and if you don't agree with this or
that rule you?re just plain dumb. The rules
go from never to always with sometimes in between. I am not
going to provide any rule, rather I am going
to discuss what rebuilding an index does and when you might want
to rebuild an index.
Creating an Index
This is a basic description of how Oracle builds indexes. A
B-Tree index starts as two blocks. The
first block contains the pointers to the rows in the second
block. As the index grows, the data/rowids go in
the second block. So to find a key requires two reads. The first
block is read to determine which block
contains the data/rowids, and the second block is read to
retrieve the necessary information. As the index
grows the second block will fill up. When it hits PCTFREE, it is
logically full. Oracle evaluates how it has
been growing the current block.
If it has been growing
sequentially, then Oracle adds another block and
continues to add data (middle diagram below). If the data has
been growing randomly, then Oracle splits
the first block into two approximately half filled blocks and
continues to add data (bottom diagram below).
(Figure 1)
Figure 1, Splitting and Growing Blocks
Notice that when you grow a block, the data remains tightly
packed but when you split a block you end up
with two half empty blocks. This empty space is referred to as
index fluff (actually there are many names,
I will use fluff). Lastly, since the header block contains only
references to the other blocks (references are
very small) the index will need to be very large before that
block is filled. When it fills, a new header
block is added and the current header block moves down and is
split or grown. Now the index is a three
level index.
Why it is good to rebuild indexes
When you rebuild an index, Oracle uses the current index to
create the new index. Thus, the new
index is created sequentially and is tightly packed on the
blocks (no fluff other than PCTFREE). This is
good because it may take a three level index and rebuilding it
into a two level index, reducing a read. The
tightly packed blocks are also more effect in the buffer cache.
Why rebuilt indexes are problems
A tightly packed index is more efficient for the database as
long as no changes are made in the
underlying table. Once you start making changes, the index
blocks start to split and reintroduce the fluff.
Not only is fluff reintroduced, but there is redo created as
blocks are added to the index. The higher the
rate of change, the faster your nicely packed index will return
to a steady state of fluff.
When to rebuild indexes
The only time you need to rebuild indexes is before placing them
and their underlying tables into readonly
tablespaces or if the underlying tables have been rebuilt.
Changes to the underlying tables can cause you to need to
rebuild related indexes. When you
delete from the underlying table, the database will mark rows as
deleted rather than clearing them out. It
does this to save time. If you perform a large amount of
deletes, then you can end up with a large number
of del_lf_rows in the index.
When you are using incremental keys (such as a sequence as a
primary key) you can end up with
an index that is carrying around a lot of mostly empty blocks.
For example, if I create an index on the stats$snapshot.snap_id column, the index will always be adding
numbers to the right side of the index as
snapshots are taken (thus the term ?right-sided?).
If each week
I delete those snapshots older than 3 days, I
in effect delete rows from the left while adding rows on the
right. Although by definition the index is still
balanced, it has to keep adding blocks (on the right) while
there are empty blocks on the left. Rebuilding
the index will fix the problem. Actually, this is the only
reason I know of that would warrant scheduled
index rebuilding.
Again, cost vs benefit.
The other time to rebuild is when the cost to rebuild is less
that the performance gained. First, it is a
continuing process since the index will move toward fluff with
use. Second, there is the cost of rebuilding
the index AND the cost of the additional redo as the index
changes. There is only one method to determine
if rebuilding an indexes benefits your database, TESTING!.
Counting index usage inside SQL
Prior to Oracle9i it was very difficult to see if an index was
being used by the SQL in your database. It
required explaining all of the SQL in the library cache into a
holding area and then parsing through the
execution plans for the index name. Things were simplified
slightly in Oracle9i when we got the primitive
alter index xxx monitoring usage
command and the ability to see if the index
was invoked.
The problem has always been that it is very difficult to know
what indexes are the most popular. In
Oracle10g we can easily see what indexes are used, when they are
used and the context where they are
used. Here is a simple AWR query to plot index usage:
--
************************************************
-- Copyright - 2004 by Rampant TechPress
Inc.
-- Free for non-commercial use!
-- To license, e-mail info@rampant.cc
--
************************************************
col c1 heading ?Begin|Interval|time? format
a20
col c2 heading 'search Columns? format 999
col c3 heading ?Invocation|Count? format
99,999,999
break on c1 skip 2
accept idxname char prompt ?Enter Index
Name: ?
ttitle ?Invocation Counts for index|&idxname?
select
to_char(sn.begin_interval_time,'yy-mm-dd
hh24') c1,
p.search_columns c2,
count(*) c3
from
dba_hist_snapshot sn,
dba_hist_sql_plan p,
dba_hist_sqlstat st
where
st.sql_id = p.sql_id
and
sn.snap_id = st.snap_id
and
p.object_name = ?&idxname'
group by
begin_interval_time,search_columns;
As we can see, knowing the signature for large-table full-table
scans can help us in both SQL tuning and
instance tuning. For SQL tuning, this report will tell us when
to drill-down to verify that all of the large table
full-table scans are legitimate, and once verified, this same
data can be used to dynamically
reconfigure the Oracle instance to accommodate the large scans.
Conclusion
Index efficiency is hard. Many times finding the best index is a
matter of trial and error. Removing
unused indexes is the easy part. Finding the most efficient
index for your system is more of an art and the
DBA must always remember that changing an index can have a
cascading effect on many SQL statement's
execution plans. Developing an index use signature will also
help you determine what indexes are being
used and when. All of this information goes into determining
which are the best indexes for you particular
system.
About the Author:
John Garmany is a graduate of West Point and a retired LTC with
more than 20 years of IT experience. He
is an OCP Certified Oracle DBA with a master degree in
Information Systems, a graduate certificate in
Software Engineering, and a BS degree (electrical engineering)
from West Point. John is a senior
consultant and trainer with Burleson Consulting and author of
Oracle Replication Handbook, Easy Oracle
SQL, Easy Oracle PL/SQL by Rampant TechPress and Oracle
Application Server10g Administration
Handbook by Oracle Press.
|
|
Get the Complete
Oracle SQL Tuning Information
The landmark book
"Advanced Oracle
SQL Tuning The Definitive Reference" is
filled with valuable information on Oracle SQL Tuning.
This book includes scripts and tools to hypercharge Oracle 11g
performance and you can
buy it
for 30% off directly from the publisher.
|