|
|
Oracle Data Warehouse How to Create a
Clustered Index
Oracle Data Warehouse Tips by Burleson Consulting |
How to create a clustered index
For each table, there can only be one
clustered index, since only one index will be stored in the same
physical order as the table. Unfortunately, Oracle does not
give the DBA a simple method for insuring the physical ordering of
rows within a table. There are, however, some tricks that can
be used to re-sequence a table to give it a different physical
order. Beware, however, that the index may quickly be out of
physical sequence as new rows are added to the end of the table and
key values are updated to change the logical sequence in the index.
For Oracle data warehouses updates are
usually done on a time-based formula, so the issues becomes one of
re-sequencing the index and insuring that additions to the index are
managed. The most obvious key for an Oracle data warehouse
would be the date column. Since Oracle data warehouses are
updated in batch mode periodically, the physical sequence of the
data with the index could be maintained if we insure that the new
records are pre-sorted in date order.
Note: A Date sequence for a clustered
index is only beneficial if your system performs a lot of range
scans by date.
Here are the options for appending new rows
onto the end of a physically sequenced Oracle table:
1. Pre-sort the extract files in index
key order and load with SQL*Loader.
2. Extract the data directly from an
Oracle OLTP system using date predicates:
INSERT INTO warehouse_table
SELECT * FROM
oltp_table@remote_instance
WHERE
trans_date > ?01-JAN-1998?
ORDER BY
trans_date;
Note: Oracle export/import utilities have no
mechanism for changing the physical sequence of tables, and cannot
be used to cluster an index.
Now that we understand the basic constructs
of Oracle indexes, let?s look at a dictionary query (Listing 8.2)
that will tell us the structure of our indexes. Note that this query
assumes that your Oracle database is using the cost-based optimizer,
and that your tables have been analyzed with the ANALYZE TABLE
command. Here we see that the indexes are grouped according to the
tables that they are built upon. We also see that the
clustering factor for each index is computed as a percentage of the
number of rows in the index.
Listing 8.2 A SQL*Plus routine
to locate clustered indexes
rem
idx_bad1.sql, ? 1997 by Donald K. Burleson
set pagesize 60;
set linesize 100;
column c0
heading 'Table' format a8;
column c1 heading 'Index' format a18;
column c2 heading 'Level' format 999;
column c3 heading 'Clust Ftr' format 9,999,999;
column c4 heading '# rows' format
99,999,999;
column c5 heading 'Clust Pct' format 999.9999;
column c6 heading 'dist. Keys' format 99,999,999;
spool
idx_bad1.lst;
break on
c0 skip 1;
select
dba_indexes.table_name
c0,
index_name
c1,
blevel
c2,
clustering_factor
c3,
num_rows
c4,
decode(clustering_factor,0,1,clustering_factor)/
decode(num_rows,0,1,num_rows)
c5,
distinct_keys
c6
from dba_indexes, dba_tables
where
dba_indexes.owner not in ('SYS','SYSTEM')
and
dba_tables.table_name = dba_indexes.table_name
and c5 < .25
order by c0, c5 desc;
spool off;
Here is a sample listing from this report:
SQL>
@idx_bad1
Table
Index
Level Clust Ftr # rows
reorg dist. Keys
-------- ------------ ----- ----------
---------- ------- ----------
INV_LINE INV_LINE_ITEM_PK 2
62,107 1,910,034 .0325
1,912,644
ILI_FK_INV
2 164,757 1,910,034
.0339 1,659,625
ILI_FK_ACT
2 283,343 1,910,034
.0436 47
ILI_EK_CCHS_ACCT
3 1,276,987 1,910,034
.1450 25,041
Now, let's inspect this listing to see the
clustering status of our indexes. The indexes are listed
within each table heading in descending order of their clustering
factor, with the most clustered indexes at the bottom of the list.
In the example above we see the INV_LINE_ITEM_PK index with a
clustering factor of 62,107, indicating that the INV_LINE table has
been loaded in nearly the same physical order as this index.
If the value for clustering factor
approaches the number of blocks in the base table, then the index is
said to be clustered. If the clustering factor is greater than
the number of blocks in the base table and approaches the number of
rows in the base table, then the index is un-clustered.
Determining when to rebuild indexes
One of the problems with Oracle data
warehouses is their huge size. For very large database
warehouses, performing a database re-organization is impractical
because of the amount of time required to export the data warehouse
to tape, drop the Oracle database, and re-create the warehouse using
Oracle's import utility. Because of this time issue, the data
warehouse manager must find alternative methods for insuring that
our Oracle data warehouse remains well-tuned from a physical data
perspective.
In an Oracle data warehouse, a large index
may take a long amount of time to rebuild, and the prudent data
warehouse administrator must carefully choose the right conditions
that warrant an index rebuild. If the very large database
tables have been horizontally partitioned, where the large table is
split into sub-tables according to date, we see that there will be
several smaller indexes to replace a single, very large index. For
details on using index partitioning, see Chapter 14, Oracle8 for the
Warehouse.
In general, indexes will seldom required
rebuilding in an Oracle data warehouse unless there has been a high
amount of update or delete activity against the index columns.
SQL Insert operations that are common for loads of new warehouse
data do not cause structural problems with the Oracle index
structure.
So, how do we tell when an index would
benefit from being rebuilt? There are two Oracle views that
provide index statistics, DBA_INDEXES and INDEX_STATS. The
DBA_INDEX view contains statistical information that is placed into
the view when the ANALYZE INDEX xxx command is issued.
Unfortunately, the DBA_INDEXES view was designed to provide
information to the cost-based SQL optimizer and it does not keep
statistics about the internal status of the Oracle indexes. To
see the internal structure for an Oracle index, you must use the
ANALYZE INDEX xxx VALIDATE STRUCTURE SQL command to validate
the structure for the index. This command creates a single row
in a view called INDEX_STATS.
SQL>
analyze index DON.DON_FK_PLT validate structure;
Index
analyzed.
SQL>
select * from index_stats;
HEIGHT BLOCKS NAME
LF_ROWS LF_BLKS
---------- ---------- ------------------------------ ----------
----------
LF_ROWS_LEN LF_BLK_LEN BR_ROWS
BR_BLKS BR_ROWS_LEN BR_BLK_LEN DEL_LF_ROWS
----------- ---------- ---------- ---------- ----------- ----------
-----------
DEL_LF_ROWS_LEN DISTINCT_KEYS MOST_REPEATED_KEY BTREE_SPACE
USED_SPACE
--------------- ------------- ----------------- -----------
----------
PCT_USED ROWS_PER_KEY BLKS_GETS_PER_ACCESS
---------- ------------ --------------------
3
5635 DON_FK_PART
196103 2382
3137648 3900
2381 18
41031 3956
7
112
125
56220 9361008 3178679
34 1568.824
787.912
The Oracle index_stats view will never
contain more than one row. Therefore, you must perform the
"analyze index xxx validate structure" command and "select * from
index_stats" before issuing the next analyze index command.
The script id1.sql provides a method for getting a complete report
for all indexes.
Oracle indexes perform two basic operations
as the index expands to hold more keys. Oracle's version of
b-tree indexing uses an algorithm where each index node may contain
many index keys. As new key values are added to the index,
Oracle must manage the configuration of each index node.
Oracle index nodes are managed with two operations; splitting and
spawning.
Splitting-- This is the term used to
describe what happens when an index node is filled with keys and a
new index node is created at the same level as the full node.
Splitting widens the b-tree horizontally.
Spawning-- This is the term used to describe
the process of adding a new level to an index.
The index_stats view contains information
about the internal structure of the b-tree index that can be useful
when determining whether or not to rebuild the index. The
following columns of index_stats are especially useful:
height - This columns refers to the maximum
number of levels encountered within the index. An index may
have 90% of the nodes at 3 levels, but excessive splitting and
spawning in one area of the index may have caused some nodes to have
move then 3 levels. Whenever the value of height is more than
three, you may benefit from dropping and re-creating the index.
Oracle indexing will not spawn a fourth level on a clean rebuild
until more than ten million kets have been added to the index.
del_lf_rows-- This columns refers to the
number of leaf rows that have been deleted from the index.
This occurs when heavy index update activity occurs within the index
tree, and indicated that the index will benefit from being dropped
and re-created.
distinct_keys-- This indicates the number of
distinct key values in the index. This is called the
cardinality of the index, and values less than 20 are candidates for
being re-created as bitmapped indexes.
most_repeated_key-- This column counts the
number of times that the most frequent key value in a non-unique
index appears in the b-tree.
Since the index_stats view will only hold
one row at a time, it is not easy to create a SQL*Plus routine that
will produce a index_stats report for all of the indexes on a
system. The SQL in listing 8.3 will perform an analyze index
xxx validate structure for each index in the schema and report of
the resulting values in index_stats.
Note: Running id1.sql will invoke
id2.sql through id5.sql automatically producing the unbalanced index
report. Just be sure that id1.sql through id5.sql are present
in a common directory when starting id1.sql.
Despite the complexity of dealing with a
one-row index_stats table, it is easy to use the following script to
get index_stats for all warehouse indexes. (listings 8-3 through
8-8) In operational use, the unbalanced index report would be run
whenever the DBA suspects that update activity may have unbalanced
the indexes.
listing 8.3 The SQL*Plus script to generate
the report for index_stats.
rem id1.sql The main driver routine for
reporting index_stats
Rem ? 1997 by Donald K. Burleson
REM id1.sq
set pages 9999;
set heading off;
set feedback off;
set echo off;
spool
id4.sql;
select
'@id2.sql' from dual;
select
'analyze index '||owner||'.'||index_name||' validate structure;',
'@id3.sql;'
from dba_indexes
where
owner not in ('SYS','SYSTEM');
spool off;
set
heading on;
set feedback on;
set echo on;
@id4.sql
@id5.sql
Listing 8.4 The SQL to create a
temporary table for the index_stats report.
Rem ? 1997 by Donald K. Burleson
rem
id2.sql
elect
name
,
most_repeated_key ,
distinct_keys ,
del_lf_rows ,
height
,
blks_gets_per_access
from index_stats;
Listing 8.5 The SQL to insert
the data from index_stats into the temporary table
Rem ? 1997 by Donald K. Burleson
rem
id3.sql
nsert into temp_stats
(select
name
,
most_repeated_key ,
distinct_keys ,
del_lf_rows ,
height
,
blks_gets_per_access
from index_stats
);
Listing 8.6 This SQL is generated from
running id1.sql
Rem ? 1997 by Donald K. Burleson
rem
id4.sql
analyze
index DON.SHL_EK_TRUCK_LINK_NUM validate structure;
@id3.sql;
analyze
index DON.SHL_UK_FACT1_ID_SRC_CD_LOB validate structure;
@id3.sql;
analyze
index DON.PURCH_UNIT_PK validate structure;
@id3.sql;
Listing 8.7 This SQL*Plus script
generates the clustering report
Rem ? 1997 by Donald K. Burleson
rem
id5.sql - This creates the unbalanced index report and the rebuild
syntax
set pagesize 60;
set linesize 100;
set echo off;
set feedback off;
set heading off;
column c1
format a18;
column c2 format 9,999,999;
column c3 format 9,999,999;
column c4 format 999,999;
column c5 format 99,999;
column c6 format 9,999;
spool
idx_report.lst;
prompt
prompt
prompt '
# rep dist. # deleted
blk s
prompt Index
keys keys
leaf rows Height per s
prompt -------------------- ------
----- -------- ------
-----
select
distinct
name
c1,
most_repeated_key c2,
distinct_keys c3,
del_lf_Rows
c4,
height
c5,
blks_gets_per_access c6
from temp_stats
where
height > 3
or
del_lf_rows > 10
order by name;
spool off;
spool
id6.sql;
select
'alter index '||owner||'.'||name||' rebuild tablespace '||tablespace_nam'
from temp_stats, dba_indexes
where
temp_stats.name = dba_indexes.index_name
and
(height > 3
or
del_lf_rows > 10);
select
'analyze index '||owner||'.'||name||' compute statistics;'
from temp_stats, dba_indexes
where
temp_stats.name = dba_indexes.index_name
and
(height > 3
or
del_lf_rows > 10);
spool off;
Listing 8.8 This is the completed
unbalanced index report
'
# rep dist.
# deleted blk gets
Index
keys keys
leaf rows Height per access
-------------------- ------
----- -------- ------
----------DON_EK
159,450 25,420
934 4
41
DON_FK_ACT
1,009,808 542
101 3
1,705
INV_EK_INV_NUM
4 1,586,880
122 3
4
INV_FK_CAR
546,366 1,109
315 3
725
INV_FK_SRC
1,041,696 309
31 3
2,591
LOB_FACT1_PK
1 3,778,981 66,918
4
5
PAT_FK_JEN
37 2,736,262 436,880
3
4
PAT_FK_JEN
37 2,736,262 436,880
3
4
PAT_FK_JEN
37 2,736,262 436,880
3
4
PAT_FK_JEN
37 2,736,262 436,880
3
4
JEN_FK_SHP
88 1,464,282 97,473
4
6
JEN_FK_SHP
88 1,464,282 97,473
4
6
DON_FK_LEG
342,290 1,350
301 4
933
DON_FK_LEG
342,290 1,350
301 4 933
DON_FK_LEG
342,290 1,350
301 4
933
DON_FK_LEG
342,290 1,350
301 4
933
Dropping and re-creating an index has inherent problems and is not
the best way to rebuild an Oracle index. The most common
problem occurs when an extended index is dropped and then fails to
re-create, failing with a lack of space in the index tablespace.
For example, if we have an index with an initial size of 100
megabytes with 20 extents of 5 megabytes, we will drop the index,
re-define the index with an initial extent of 200 megabytes, and
re-create the index. Of course, we will need to insure that
there is 200 megabytes of free space within the index before we try
this operation. Also, we may need to coalesce free index
extents in the tablespace to make more contiguous space.
|
If you like Oracle tuning, see the book "Oracle
Tuning: The Definitive Reference", with 950 pages of tuning
tips and scripts.
You can buy it direct from the publisher for 30%-off and get
instant access to the code depot of Oracle tuning scripts. |