Note:
You should never use the default
dbms_stats statistics collection mechanism with Oracle.
Just like the initialization parameters, the statistics
collection should be customized to your specific needs.
For details, see my latest book
"Oracle Tuning: The Definitive Reference"
and these notes on
11g extended
optimizer statistics.
When a SQL
statement is executed, the database must convert the query
into an execution plan and choose the best way to retrieve the
data. For Oracle, each SQL query has many choices for
execution plans, including which index to use to retrieve
table row, what order in which to join multiple tables
together, and which internal join methods to use (Oracle has
nested loop joins, hash joins, star joins, and sort merge join
methods). These execution plans are computed by the Oracle
cost-based SQL optimizer commonly known as the CBO.
The choice of executions plans made by the Oracle SQL
optimizer is only as good as the Oracle statistics. To always
choose the best execution plan for a SQL query, Oracle relies
on information about the tables and indexes in the query.
Starting with the introduction of the dbms_stats
package, Oracle provides a simple way for the Oracle
professional to collect statistics for the CBO. The
old-fashioned analyze table and dbms_utility methods
for generating CBO statistics are obsolete and somewhat
dangerous to SQL performance because they don't always capture
high-quality information about tables and indexes. The CBO
uses object statistics to choose the best execution plan for
all SQL statements.
The dbms_stats utility does a far better job in
estimating statistics, especially for large partitioned
tables, and the better stats result in faster SQL execution
plans. Andrew Holdsworth of Oracle Corporation notes
that dbms_stats is essential to good SQL performance,
and it should always be used before adjusting any of the
Oracle optimizer initialization parameters:
?The payback from good
statistics management and execution plans will exceed any benefit of
init.ora tuning by orders of magnitude?
Below is a sample execution of dbms_stats
with the options clause.
exec dbms_stats.gather_schema_stats( -
ownname => 'SCOTT', -
options => 'GATHER AUTO', -
estimate_percent => dbms_stats.auto_sample_size, -
method_opt => 'for all columns size repeat', -
cascade
=> true, -
degree => 15 -
)
Here is a representative example of invoking dbms_stats in 10g:
DBMS_STATS.gather_schema_stats(
ownname=>?<schema>?,
estimate_percent=>dbms_stats.auto_sample_size
cascade=>TRUE,
method_opt=>?FOR ALL COLUMNS SIZE AUTO?)
To fully appreciate dbms_stats, you need to examine
each of the major directives. Let?s take a close look at each
directive and see how it is used to gather top-quality
statistics for the cost-based SQL optimizer.
The options parameter
Using one of the four provided methods, this option governs
the way Oracle statistics are refreshed:
- gather?Reanalyzes the whole schema
- gather empty?Only analyzes tables that
have no existing statistics
- gather stale?Only reanalyzes tables with
more than 10% modifications (inserts, updates, deletes).
- gather auto?Reanalyzes objects which
currently have no statistics and objects with stale
statistics (Using gather auto is like combining
gather stale and gather empty.)
Note that both gather stale and gather auto
require monitoring. If you issue the alter table xxx
monitoring command, Oracle tracks changed tables with the
dba_tab_modifications view, which allows you to see the
exact number of inserts, updates, and deletes tracked since
the last analysis of statistics.
The estimate percent option
The following estimate_percent argument is a new way to
allow Oracle?s dbms_stats to automatically estimate the
best percentage of a segment to sample when gathering
statistics:
estimate_percent =>
dbms_stats.auto_sample_size
You can verify the accuracy of the automatic statistics
sampling by looking at the dba_tables sample_size
column. It is interesting to note that Oracle chooses between
5 and 20 percent for a sample size when using automatic
sampling. Remember, the better the quality of your statistics,
the better the decision of the CBO.
The method_opt option
The method_opt parameter for dbms_stats
is very useful for refreshing statistics when the
table and index data change. The method_opt
parameter is also very useful for determining
which columns require histograms.
In some cases, the distribution of values within
an index will effect the CBOs decision to use an
index versus perform a full-table scan. This
happens when a where clause has a disproportional
amount of values, making a full-table scan cheaper
than index access.
Oracle histograms statistics can be created when
you have a highly skewed index, where some values
have a disproportional number of rows. In the real
world, this is quite rare, and one of the most
common mistakes with the CBO is the unnecessary
introduction of histograms in the CBO statistics.
As a general rule, histograms are used when a
column's values warrant a change to the execution
plan.
To aid in intelligent histogram generation, Oracle
uses the method_opt parameter of
dbms_stats. There are also important new
options within the method_opt clause,
namely skewonly, repeat and auto:
method_opt=>'for
all indexed columns size skewonly'
method_opt=>'for all columns size repeat'
method_opt=>'for columns size auto'
The skewonly option is very time-intensive
because it examines the distribution of values for
every column within every index.
If dbms_stats discovers an index whose
columns are unevenly distributed, it will create
histograms for that index to aid the cost-based
SQL optimizer in making a decision about index
versus full-table scan access. For example, if
an index has one column that is in 50 percent of
the rows, a full-table
scan is faster than an index scan to retrieve
these rows.
--*************************************************************
-- SKEWONLY option?Detailed analysis
--
-- Use this method for a first-time analysis for
skewed indexes
-- This runs a long time because all indexes are
examined
--*************************************************************
begin
dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent =>
dbms_stats.auto_sample_size,
method_opt => 'for all columns size
skewonly',
cascade
=> true,
degree => 7
);
end;
If you need to reanalyze your statistics, the
reanalyze task will be less resource intensive
with the repeat option. Using the repeat
option will only reanalyze
indexes with existing histograms, and will not
search for other histograms opportunities. This is
the way that you will reanalyze you statistics on
a regular basis.
--**************************************************************
-- REPEAT OPTION - Only reanalyze histograms for
indexes
-- that have histograms
--
-- Following the initial analysis, the weekly
analysis
-- job will use the ?repeat? option. The repeat
option
-- tells dbms_stats that no indexes have changed,
and
-- it will only reanalyze histograms for
-- indexes that have histograms.
--**************************************************************
begin
dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent =>
dbms_stats.auto_sample_size,
method_opt => 'for all columns size
repeat',
cascade
=> true,
degree => 7
);
end;
The auto option within dbms_stats is
used when Oracle table monitoring is implemented
using the alter table xxx monitoring;
command. The auto option, shown in
Listing D, creates histograms based upon data
distribution and the manner in which the column is
accessed by the application (e.g., the workload on
the column as determined by monitoring). Using
method_opt=>?auto? is similar to using the
gather auto in the option parameter of
dbms_stats.
begin
dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent =>
dbms_stats.auto_sample_size,
method_opt => 'for all columns size
auto',
cascade
=> true,
degree => 7
);
end;
Parallel collection
Oracle allows for parallelism when collecting CBO
statistics, which can greatly speed up the time
required to collect statistics. A parallel
statistics collection requires an SMP server with
multiple CPUs.
Better execution speed
The dbms_stats utility is a great way to
improve SQL execution speed. By using
dbms_stats to collect top-quality statistics,
the CBO will usually make an intelligent decision
about the fastest way to execute any SQL query.
The dbms_stats utility continues to improve
and the exciting new features of automatic sample
size and automatic histogram generation greatly
simplify the job of the Oracle professional.