Quickly Improve SQL Performance with dbms_stats
Oracle Tips by Burleson Consulting
Updated for 11g on September 17, 2015
The old-fashioned "analyze table" and dbms_utility methods for generating CBO statistics are obsolete and somewhat dangerous to SQL performance. This is because the cost-based SQL optimizer (CBO) relies on the quality of the statistics to choose the best execution plan for all SQL statements. The dbms_stats utility does a far better job of estimating statistics, especially for large partitioned tables, and better statistics result in faster SQL execution plans.
Let's see how dbms_stats works. It's easy! Here is a sample execution of dbms_stats.gather_schema_stats:
exec dbms_stats.gather_schema_stats( -
   ownname          => 'SCOTT', -
   estimate_percent => dbms_stats.auto_sample_size, -
   method_opt       => 'for all columns size repeat', -
   degree           => 34 -
)
You may also specify GATHER options via the options parameter. When GATHER AUTO is specified, the only additional valid parameters are ownname, stattab, statid, objlist and statown; all other parameter settings are ignored.
exec dbms_stats.gather_schema_stats( -
   ownname => 'SCOTT', -
   options => 'GATHER AUTO' -
)
There are several values for the options parameter that we need to know about:

- gather – re-analyzes the whole schema.
- gather empty – only analyzes tables that have no existing statistics.
- gather stale – only re-analyzes tables with more than 10% modifications (inserts, updates, deletes); see the sketch after this list.
- gather auto – re-analyzes objects that currently have no statistics and objects with stale statistics. Using gather auto is like combining gather stale and gather empty.
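A minimal sketch of a gather stale run, following the same pattern as the gather auto example above (the schema name is illustrative):

exec dbms_stats.gather_schema_stats( -
   ownname => 'SCOTT', -
   options => 'GATHER STALE' -
)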
Note that both gather stale and gather auto require monitoring. If you issue the "alter table xxx monitoring" command, Oracle tracks changed tables with the dba_tab_modifications view. Below we see that the exact numbers of inserts, updates and deletes are tracked since the last analysis of statistics.
SQL> desc dba_tab_modifications;

 Name                  Type
 --------------------  -------------
 TABLE_OWNER           VARCHAR2(30)
 TABLE_NAME            VARCHAR2(30)
 PARTITION_NAME        VARCHAR2(30)
 SUBPARTITION_NAME     VARCHAR2(30)
 INSERTS               NUMBER
 UPDATES               NUMBER
 DELETES               NUMBER
 TIMESTAMP             DATE
 TRUNCATED             VARCHAR2(3)
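For example, a quick query against the view (the schema filter is illustrative) shows the changes tracked for monitored tables:

-- note: very recent changes may not appear until Oracle flushes its in-memory monitoring data
select table_name,
       inserts,
       updates,
       deletes,
       timestamp
from   dba_tab_modifications
where  table_owner = 'SCOTT'
order  by table_name;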
The most interesting of these options is the gather stale option. Because statistics become stale quickly in a busy OLTP database, we must remember that the rule for gather stale is a change of more than 10% of the rows (based on num_rows at statistics collection time). Hence, in such a database almost every table except read-only tables will be re-analyzed, and the gather stale option is best for systems that are largely read-only. For example, if only 5% of the database tables get significant updates, then only 5% of the tables will be re-analyzed with the "gather stale" option.
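To see which tables have crossed that threshold, you could run something like the following sketch; the join to dba_tables and the percentage arithmetic are my own illustration of the 10% rule, not an official staleness check:

-- estimate the percentage of rows changed since the last analysis
select m.table_owner,
       m.table_name,
       m.inserts + m.updates + m.deletes as total_changes,
       t.num_rows,
       round(100 * (m.inserts + m.updates + m.deletes) /
             greatest(t.num_rows, 1), 1)  as pct_changed
from   dba_tab_modifications m,
       dba_tables t
where  t.owner      = m.table_owner
and    t.table_name = m.table_name
and    (m.inserts + m.updates + m.deletes) > 0.1 * t.num_rows
order  by pct_changed desc;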
The CASCADE Option
When analyzing specific tables, the cascade option can be used to analyze all related indexes. The Oracle documentation notes that using the cascade option gathers statistics on the table plus all indexes on the target table. Using this option is equivalent to running gather_table_stats plus running gather_index_stats for each index on the table.

If you always want indexes analyzed when running gather_table_stats, you can use set_database_prefs, set_global_prefs, or set_table_prefs to always include indexes when gather_table_stats is executed.
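For example, a minimal sketch of setting the CASCADE preference at the table level (the schema and table names are illustrative):

exec dbms_stats.set_table_prefs( -
   ownname => 'SCOTT', -
   tabname => 'EMP', -
   pname   => 'CASCADE', -
   pvalue  => 'TRUE' -
)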
exec dbms_stats.gather_table_stats( -
   ownname          => 'PERFSTAT', -
   tabname          => 'STATS$SNAPSHOT', -
   estimate_percent => dbms_stats.auto_sample_size, -
   method_opt       => 'for all columns size skewonly', -
   cascade          => true, -
   degree           => 7 -
)
The DEGREE Option
Note that you can also parallelize the collection of statistics, because dbms_stats performs full-table and full-index scans to gather them. When you set degree=x, Oracle will invoke parallel query slave processes to speed up table access. Degree is usually set to about the number of CPUs, minus 1 (for the OPQ query coordinator).
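A minimal sketch of a parallel collection, here using the dbms_stats.auto_degree constant to let Oracle choose the degree of parallelism (the schema name is illustrative):

exec dbms_stats.gather_schema_stats( -
   ownname => 'SCOTT', -
   degree  => dbms_stats.auto_degree -
)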
Automating Sample Size with dbms_stats
Now that we have seen how the dbms_stats options work, let's see how to specify the sample size for dbms_stats. The following estimate_percent argument is a new way to allow Oracle's dbms_stats to automatically estimate the "best" percentage of a segment to sample when gathering statistics:
estimate_percent => dbms_stats.auto_sample_size
You can verify the accuracy of the automatic statistics sampling by looking at the dba_tables sample_size column. It is interesting to note that Oracle chooses between 5% and 20% for the sample_size when using automatic sampling.
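A quick sketch of such a check (the schema filter is illustrative):

select table_name,
       num_rows,
       sample_size,
       round(100 * sample_size / num_rows, 1) as pct_sampled
from   dba_tables
where  owner = 'SCOTT'
and    num_rows > 0;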
In our next installment we will look at automating the collection of histogram data with dbms_stats.
11g Update:
Oracle guru
Guy Harrison also
offers this advice for 11g statistics collection on function-based index
columns.
In 11g, I think there are two other ways to get
statistics collected for indexed expressions:
1) Collect extended statistics directly on the expression. So for instance, if we had a function SALE_CATEGORY, we might do this:

DBMS_STATS.gather_table_stats (
   ownname    => USER,
   tabname    => 'SALES',
   method_opt => 'FOR ALL COLUMNS FOR COLUMNS (sale_category(amount_sold))'
);
2) Create a virtual column on the expression, then index that column. So for the same example as above we might create the following virtual column, then index the column and collect stats as usual:

ALTER TABLE SALES ADD sales_category
   GENERATED ALWAYS AS (sale_category(amount_sold));
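To round out the second approach, a minimal sketch of indexing the virtual column and gathering statistics as usual (the index name is illustrative, not part of Guy Harrison's original example):

CREATE INDEX sales_category_idx ON sales (sales_category);

exec dbms_stats.gather_table_stats( -
   ownname    => USER, -
   tabname    => 'SALES', -
   method_opt => 'for all columns size auto' -
)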
I think I like the first method better, because the statistics will still exist even if the index is dropped and, unlike the second approach, it doesn't change the logical structure of the table.
Arup Nanda has a great article on extended statistics with dbms_stats, covering histogram analysis on function-based column data:

Next, re-gather statistics on the table and collect the extended statistics on the expression upper(cust_name).
begin
   dbms_stats.gather_table_stats (
      ownname    => 'ARUP',
      tabname    => 'CUSTOMERS',
      method_opt => 'for all columns size skewonly for columns (upper(cust_name))'
   );
end;
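As a quick check (my own addition, not part of Arup Nanda's example), the extension created by this call can be seen in the user_stat_extensions view:

select extension_name, extension
from   user_stat_extensions
where  table_name = 'CUSTOMERS';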
Alternatively you can define
the column group as part of the gather statistics command.
You do that by placing
these columns in the method_opt parameter of the gather_table_stats procedure in
dbms_stats as shown below:
begin
   dbms_stats.gather_table_stats (
      ownname          => 'ARUP',
      tabname          => 'BOOKINGS',
      estimate_percent => 100,
      method_opt       => 'FOR ALL COLUMNS SIZE SKEWONLY FOR COLUMNS(HOTEL_ID,RATE_CATEGORY)',
      cascade          => true
   );
end;