So who
is right? I suspect that they both are correct. There is a huge body of
evidence that index rebuilding makes the end-users report faster response time,
and I have to wonder if this is only a Placebo Effect, with no scientific
basis. Some experts suspect a Placebo Effect may be at work here, and the
end-users, knowing that they have new index trees, report a performance gain
when none exists.
Because
of their
extreme flexibility, Oracle b-tree indexes are quite complex, and to really
gather scientific evidence we must examine all of the index metrics. Getting a
meaningful measure of the benefit of an index rebuild in a production
environment is very difficult because the system is under heavy load and usage
patterns change constantly. Plus, many IT managers require periodic index
re-building because it is a low-risk job and it curries favor from the end-user
community.
Where are the index
details?
Most
Oracle professionals are aware of the dba_indexes view, which is populated
with index statistics when indexes are analyzed. The dba_indexes view
contains a great deal of important information for the SQL optimizer, but there
is still more to see. Oracle provides an analyze index xxx validate
structure command that provides additional statistics into a temporary
tables called index_stats, which, sadly, is overlaid after each command.
To get
the full picture, we must devise a table structure that will collect data from
both sources. Here is a method that will do the job:
-
Create a
temporary table to hold data from dba_indexes and index_stats
-
Verify
quality of dbms_stats analysis
-
Populate
temporary table from dba_indexes
-
Validate
indexes and send output into temp table
IMPORTANT: Collecting
the index_stats information is very time consuming and expensive and will
introduce serious locking issues on production databases. It is strongly
recommended that you perform these checks during scheduled downtime, or on a
representative test database.
10g note: Tested against Oracle10g
10.2.0.1.0 on Windows XP SP2, and found
IOT_REDUNDANT_PKEY_ELIM VARCHAR2(3)
DROPPED
VARCHAR2(3)
NUM_KEYS
NUMBER
Need to be defined in the table index_details, and this crashes the subsequent
table population and procedure.
Let’s
start by creating a table to hold our index data. I call this table
index_details:
drop table index_details;
Create table index_details
(
-- ********* The following is from dba_indexes ******************
OWNER_NAME VARCHAR2(30),
INDEX_NAME VARCHAR2(30),
INDEX_TYPE VARCHAR2(27),
TABLE_OWNER VARCHAR2(30),
TABLE_NAME VARCHAR2(30),
TABLE_TYPE VARCHAR2(11),
UNIQUENESS VARCHAR2(9),
COMPRESSION VARCHAR2(8),
PREFIX_LENGTH NUMBER,
TABLESPACE_NAME VARCHAR2(30),
INI_TRANS NUMBER,
MAX_TRANS NUMBER,
INITIAL_EXTENT NUMBER,
NEXT_EXTENT NUMBER,
MIN_EXTENTS NUMBER,
MAX_EXTENTS NUMBER,
PCT_INCREASE NUMBER,
PCT_THRESHOLD NUMBER,
INCLUDE_COLUMN NUMBER,
FREELISTS NUMBER,
FREELIST_GROUPS NUMBER,
PCT_FREE NUMBER,
LOGGING VARCHAR2(3),
BLEVEL NUMBER,
LEAF_BLOCKS NUMBER,
DISTINCT_KEYS NUMBER,
AVG_LEAF_BLOCKS_PER_KEY NUMBER,
AVG_DATA_BLOCKS_PER_KEY NUMBER,
CLUSTERING_FACTOR NUMBER,
STATUS VARCHAR2(8),
NUM_ROWS NUMBER,
SAMPLE_SIZE NUMBER,
LAST_ANALYZED DATE,
DEGREE VARCHAR2(40),
INSTANCES VARCHAR2(40),
PARTITIONED VARCHAR2(3),
TEMPORARY VARCHAR2(1),
GENERATED VARCHAR2(1),
SECONDARY VARCHAR2(1),
BUFFER_POOL VARCHAR2(7),
USER_STATS VARCHAR2(3),
DURATION VARCHAR2(15),
PCT_DIRECT_ACCESS NUMBER,
ITYP_OWNER VARCHAR2(30),
ITYP_NAME VARCHAR2(30),
PARAMETERS VARCHAR2(1000),
GLOBAL_STATS VARCHAR2(3),
DOMIDX_STATUS VARCHAR2(12),
DOMIDX_OPSTATUS VARCHAR2(6),
FUNCIDX_STATUS VARCHAR2(8),
JOIN_INDEX VARCHAR2(3),
-- ********* The following is from index_stats ******************
HEIGHT NUMBER,
BLOCKS NUMBER,
NAMEx VARCHAR2(30),
PARTITION_NAME VARCHAR2(30),
LF_ROWS NUMBER,
LF_BLKS NUMBER,
LF_ROWS_LEN NUMBER,
LF_BLK_LEN NUMBER,
BR_ROWS NUMBER,
BR_BLKS NUMBER,
BR_ROWS_LEN NUMBER,
BR_BLK_LEN NUMBER,
DEL_LF_ROWS NUMBER,
DEL_LF_ROWS_LEN NUMBER,
DISTINCT_KEYSx NUMBER,
MOST_REPEATED_KEY NUMBER,
BTREE_SPACE NUMBER,
USED_SPACE NUMBER,
PCT_USED NUMBER,
ROWS_PER_KEY NUMBER,
BLKS_GETS_PER_ACCESS NUMBER,
PRE_ROWS NUMBER,
PRE_ROWS_LEN NUMBER,
OPT_CMPR_COUNT NUMBER,
OPT_CMPR_PCTSAVE NUMBER
)
tablespace tools
storage (initial 5k next 5k maxextents unlimited);
(Note: the index_stats table has a column named PCT_USED even though Oracle
indexes do not allow changes to this value.)
Now that
we have a table that will hold all of the index details, the next step is to
populate the table with data from freshly-analyzed indexes. Remember, you
should always run dbms_stats to get current index statistics. Here is
the script.
insert into index_details
(
OWNER_NAME,
INDEX_NAME,
INDEX_TYPE,
TABLE_OWNER,
TABLE_NAME,
TABLE_TYPE,
UNIQUENESS,
COMPRESSION,
PREFIX_LENGTH,
TABLESPACE_NAME,
INI_TRANS,
MAX_TRANS,
INITIAL_EXTENT,
NEXT_EXTENT,
MIN_EXTENTS,
MAX_EXTENTS,
PCT_INCREASE,
PCT_THRESHOLD,
INCLUDE_COLUMN,
FREELISTS,
FREELIST_GROUPS,
PCT_FREE,
LOGGING,
BLEVEL,
LEAF_BLOCKS,
DISTINCT_KEYS,
AVG_LEAF_BLOCKS_PER_KEY,
AVG_DATA_BLOCKS_PER_KEY,
CLUSTERING_FACTOR,
STATUS,
NUM_ROWS,
SAMPLE_SIZE,
LAST_ANALYZED,
DEGREE,
INSTANCES,
PARTITIONED,
TEMPORARY,
GENERATED,
SECONDARY,
BUFFER_POOL,
USER_STATS,
DURATION,
PCT_DIRECT_ACCESS,
ITYP_OWNER,
ITYP_NAME,
PARAMETERS,
GLOBAL_STATS,
DOMIDX_STATUS,
DOMIDX_OPSTATUS,
FUNCIDX_STATUS,
JOIN_INDEX
)
select * from dba_indexes
where owner not like 'SYS%'
;
Now that
we have gathered the index details from dba_indexes, we must loop through
iterations of the analyze index xxx validate structure command to populate our
table with other statistics. Here is the script that I use to get all index
details.
/* INDEX.STATS contains 1 row from last execution */
/* of ANALYZE INDEX ... VALIDATE STRUCTURE */
/* We need to loop through validates for each */
/* index and populate the table. */
DECLARE
v_dynam varchar2(100);
cursor idx_cursor is
select owner_name, index_name from index_details;
BEGIN
for c_row in idx_cursor loop
v_dynam := 'analyze index '||c_row.owner_name||'.'||c_row.index_name||
' validate structure';
execute immediate v_dynam;
update index_details set
(HEIGHT, BLOCKS, NAMEx, PARTITION_NAME, LF_ROWS, LF_BLKS,
LF_ROWS_LEN, LF_BLK_LEN, BR_ROWS, BR_BLKS, BR_ROWS_LEN,
BR_BLK_LEN, DEL_LF_ROWS, DEL_LF_ROWS_LEN, DISTINCT_KEYSx,
MOST_REPEATED_KEY, BTREE_SPACE, USED_SPACE, PCT_USED,
ROWS_PER_KEY, BLKS_GETS_PER_ACCESS, PRE_ROWS, PRE_ROWS_LEN,
OPT_CMPR_COUNT, OPT_CMPR_PCTSAVE)
= (select * from index_stats)
where index_details.owner_name = c_row.owner_name
and index_details.index_name = c_row.index_name;
if mod(idx_cursor%rowcount,50)=0 then
commit;
end if;
end loop;
commit;
END;
/
update
index_details a
set
num_keys =
(select
count(*)
from
dba_ind_columns b
where
a.owner_name = b.table_owner
and
a.index_name = b.index_name
)
;
After
running the script from listing 3, we should now have complete index details for
any index that we desire. However with 70 different metrics for each index, it
can be quite confusing about which columns are the most important. To make
queries easy, I create a view that only displays the columns that I find the
most interesting. Here is my view.
drop view indx_stats;
Create view idx_stats
as
select
OWNER_NAME ,
INDEX_NAME ,
INDEX_TYPE ,
UNIQUENESS ,
PREFIX_LENGTH ,
BLEVEL ,
LEAF_BLOCKS ,
DISTINCT_KEYS ,
AVG_LEAF_BLOCKS_PER_KEY ,
AVG_DATA_BLOCKS_PER_KEY ,
CLUSTERING_FACTOR ,
NUM_ROWS ,
PCT_DIRECT_ACCESS ,
HEIGHT ,
BLOCKS ,
NAMEx ,
PARTITION_NAME ,
LF_ROWS ,
LF_BLKS ,
LF_ROWS_LEN ,
LF_BLK_LEN ,
BR_ROWS ,
BR_BLKS ,
BR_ROWS_LEN ,
BR_BLK_LEN ,
DEL_LF_ROWS ,
DEL_LF_ROWS_LEN ,
DISTINCT_KEYSx ,
MOST_REPEATED_KEY ,
BTREE_SPACE ,
USED_SPACE ,
PCT_USED ,
ROWS_PER_KEY ,
BLKS_GETS_PER_ACCESS ,
PRE_ROWS ,
PRE_ROWS_LEN ,
num_keys,
sum_key_len
from
Index_details;
While
most of these column descriptions are self-evident, there are some that are
especially important:
-
CLUSTERING_FACTOR
– This is one of the most important index statistics because it indicates how
well sequenced the index columns are to the table rows. If
clustering_factor is low (about the same as the number of
dba_segments.blocks in the table segment) then the index key is in the
same order as the table rows and index range scans will be very efficient, with
minimal disk I/O. As clustering_factor increases (up to
dba_tables.num_rows), the index key is increasingly out of sequence with
the table rows. Oracle’s cost-based SQL optimizer relies heavily upon
clustering_factor to decide whether to use the index to access the
table.
-
HEIGHT
- As an
index accepts new rows, the index blocks split. Once the index nodes have
split to a predetermined maximum level the index will “spawn” into a new
level.
-
BLOCKS
– This is the
number of blocks consumed by the index. This is dependent on the
db_block_size. In Oracle9i and beyond, many DBAs create b-tree indexes in
very large blocksizes (db_32k_cache_size) because the index will spawn less.
Robin Schumacher has noted in his book
Oracle Performance Troubleshooting notes “As
you can see, the amount of logical reads has been reduced in half simply by
using the new 16K tablespace and accompanying 16K data cache. Clearly, the
benefits of properly using the new data caches and multi-block tablespace
feature of Oracle9i and above are worth your investigation and trials in your
own database.“
-
PCT_USED – This
metric is very misleading because it looks identical to the dba_tables pct_used column, but has a different meaning. Normally, the pct_used
threshold is the freelist unlink threshold, while in index_stats
pct_used is the percentage of space allocated in the b-tree that is being
used.
Is there a criterion
for index rebuilding?
If we
believe the anecdotal reports that index rebuilding improved end-user reported
performance, how can we analyze this data and see what the criteria (if any)
might be for an index rebuild?
For example, here are the criteria used by
a fellow Oracle DBA who swears that rebuilding indexes with these criteria has a
positive effect on his system performance:
-- ***
Only consider when space used is more than 1 block ***
btree_space > 8192
and
-- *** The number of index levels is > 3 ***
(height > 3
-- *** The % being used is < 75%
***
or pct_used < 75
-- *** Deleted > 20% of total
***
or (del_lf_rows/(decode(lf_rows,0,1,lf_rows)) *100) > 20)
;
In
reality I suspect that the rules are far more complicated than this simple
formula. To see the commonality between indexes of similar nature you can use
the data from your new index_details table to write summary queries
(Listing 5). Here we see the average number of index blocks, leaf rows and leaf
blocks for indexes of different heights.
This
gives us a high-level idea of Oracle threshold for spawning an index onto new
levels. We can take this same approach and attempt to answer the following
questions:
1 - At what point does an index spawn to another level
(height)? It should be a function of blocksize, key length and the number of
keys.
2 - The number of deleted leaf nodes may not be enough to
trigger an index rebuild. This is because if clustering_factor is low (dba_indexes.clustering_factor
~= dba_segments.blocks), then the rows are added in order, and the index
is likely to reuse the deleted leaf nodes. On the other hand, if
clustering_factor is high (dba_indexes.clustering_factor ~=
dba_tables.num_rows), and the majority of queries use the index with
fast-full scans or index range scans, then a rebuild of the underlying table (to
resequence the
rows) may be beneficial.
To illustrate, assume I have an index on the last_name column of a
1,000,000 row table and the clustering_factor is close to the number of
blocks, indicating that the rows are in the same sequence as the index.
In this case, a bulk delete of all people whose
last_name begins with the letter “K” would leave a dense cluster of deleted
leaf nodes on adjacent data blocks within the index tablespace. This large
section of space is more likely to be reused than many tiny chunks.
We can
also use the data from our detail table to compute our own metrics. In the
example query below, we create a meta-rule for indexes:
-
Dense
Full Block Space
- This is the index key space (number of table rows * index key length) as a
function of the blocksize and free index space.
-
Percent
of Free Blocks -
This is the estimated number of free blocks within the index.
Using
these metrics, we can analyze the system and produce some very interesting
reports of index internals:
col c1 heading 'Average|Height' format 99
col c2 heading 'Average|Blocks' format 999,999
col c3 heading 'Average|Leaf|Rows' format 9,999,999,999
col c4 heading 'Average|Leaf Row|Length' format 999,999,999
col c5 heading 'Average|Leaf Blocks' format 9,999,999
col c6 heading 'Average|Leaf Block|Length' format 9,999,999
select
height c1,
avg(blocks) c2,
avg(lf_rows) c3,
avg(lf_rows_len) c4,
avg(lf_blks) c5,
avg(lf_blk_len) c6
from
index_details
group by
height
;
Average Average Average
Average Average Leaf Leaf Row Average Leaf Block
Height Blocks Rows Length Leaf Blocks Length
------- -------- -------------- ------------ ----------- ----------
1 236 12 234 1 7,996
2 317 33,804 691,504 106 7,915
3 8,207 1,706,685 41,498,749 7,901 7,583
4 114,613 12,506,040 538,468,239 113,628 7,988
As we
see, we can compute and spin this data in an almost infinite variety of ways.
Reader Comments on index rebuilding:
It has been my experience that the percentage of deletes
required to affect performance actually goes down as the table gets larger,
perhaps because the I/O effect is magnified with more data.
At my previous job, our observation was that on 20 million
row tables and upward as little as 5% deletes, regardless of other factors,
would cause sufficient performance degradation to justify an index rebuild.
Admittedly this was a site with fairly tight budget constraints, so we couldn't
afford absolute, top of the line hardware. We also didn't have the luxury of
making sure every tablespace was ideally placed for performance, but we did a
pretty good job with what we had. I actually wrote a script that would
calculate the % deletes and generate the index rebuild command.
Also, there is the rebuild online option for indices, which does work but it
will cause noticeable performance degradation if you try and do it during busy
times. I believe it was available in 8i, but it would generate ora-600 errors
when used on busy tables in 8i.
Andrew W. Kerber
See my related notes on index rebuilding:
Reader Feedback:
10g usage note:
Tested against Oracle10g 10.2.0.1.0 on
Windows XP SP2, and found
IOT_REDUNDANT_PKEY_ELIM VARCHAR2(3)
DROPPED VARCHAR2(3)
NUM_KEYS NUMBER
not defined in the table index_details, and this crashes the subsequent table
population and procedure.