Note:
See "Introduction to Histograms",
"skewonly", "all about histograms" and
"Oracle Cardinality and histograms" for more
details on using Oracle histograms to
improve execution plans.
About Oracle Histograms
Histograms may help the Oracle optimizer
decide whether to use an index or a
full-table scan (where index values are
skewed), or help the optimizer determine the
fastest table join order. To find the best
table join order, inspect the WHERE clause
of the query along with the execution plan
for the original query. If the estimated
cardinality of a table is too high, a
histogram on the most selective column in
the WHERE clause will tip off the optimizer
and change the table join order.
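As a minimal sketch of that inspection, the statements below display the optimizer's plan and its estimated row counts for a two-table join. The SCOTT.EMP and SCOTT.DEPT demo tables and the predicate are assumptions chosen for illustration, not part of the original text.

```sql
-- Assumption: the SCOTT demo schema (EMP, DEPT) is installed.
explain plan for
select e.ename, d.dname
from   scott.emp e, scott.dept d
where  e.deptno = d.deptno
and    e.job = 'CLERK';

-- Show the plan; the Rows column holds the estimated cardinality.
select * from table(dbms_xplan.display);
```

Comparing the Rows column with the real row counts for the same predicates shows where the estimate is off.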
See this important note about
unnecessary Oracle histograms.
Important note: If your database
exclusively uses bind variables, Oracle
recommends deleting any existing Oracle
histograms and disabling Oracle histogram
generation (method_opt) for any future
dbms_stats analysis. The optimizer will then
use the number of distinct values to determine
the selectivity of a column.
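One way to do this, sketched here under the assumption that the schema is named SCOTT, is to regather statistics with a single bucket per column. A method_opt of 'for all columns size 1' replaces existing histograms with basic column statistics.

```sql
-- Assumption: SCOTT is a placeholder schema name.
begin
   dbms_stats.gather_schema_stats(
      ownname          => 'SCOTT',
      estimate_percent => dbms_stats.auto_sample_size,
      -- size 1 = one bucket per column, i.e. no histogram
      method_opt       => 'for all columns size 1'
   );
end;
/
```

On Oracle 11g and later, the same method_opt string can be made the default with dbms_stats.set_global_prefs('METHOD_OPT', ...), so that later jobs do not recreate the histograms.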
Most Oracle experts recommend scheduled
re-analysis only for highly dynamic databases,
and most shops save one very deep sample
(with histograms), storing the statistics
with the dbms_stats.export_schema_stats
procedure. The only exceptions are
highly volatile systems (e.g. lab research
systems) where a table is huge one day and
small the next.
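Saving that deep sample might look like the following sketch. The schema SCOTT, the statistics table name SAVED_STATS, and the statid label DEEP_SAMPLE are all placeholders assumed for illustration.

```sql
-- Assumptions: schema SCOTT, statistics table SAVED_STATS and
-- statid DEEP_SAMPLE are placeholder names.
begin
   -- One-time setup: create the table that holds exported statistics.
   dbms_stats.create_stat_table(
      ownname => 'SCOTT',
      stattab => 'SAVED_STATS'
   );

   -- Copy the current schema statistics (histograms included)
   -- from the data dictionary into the statistics table.
   dbms_stats.export_schema_stats(
      ownname => 'SCOTT',
      stattab => 'SAVED_STATS',
      statid  => 'DEEP_SAMPLE'
   );
end;
/
```

The saved set can later be restored with dbms_stats.import_schema_stats, using the same ownname, stattab and statid values.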
For periodic re-analysis, many shops use the
table "monitoring" option and also
method_opt "auto" after they are confident
that all histograms are in place.
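A periodic job along these lines could be sketched as follows. The SCOTT.EMP table is a placeholder, and pairing monitoring with the 'GATHER STALE' option is an assumption about how such a job is typically written, not something the text states.

```sql
-- Assumption: SCOTT.EMP is a placeholder table.
-- Needed before Oracle 10g only; later releases track changes
-- automatically when statistics_level is TYPICAL or ALL.
alter table scott.emp monitoring;

begin
   dbms_stats.gather_schema_stats(
      ownname          => 'SCOTT',
      estimate_percent => dbms_stats.auto_sample_size,
      method_opt       => 'for all columns size auto',
      -- re-analyze only objects that monitoring has marked stale
      options          => 'GATHER STALE'
   );
end;
/
```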
Oracle histogram statistics can be created
when you have a highly skewed index, where
some values have a disproportionate number of
rows. In the real world, this is quite rare,
and one of the most common mistakes with the
CBO is the unnecessary introduction of
histograms in the CBO statistics. As a
general rule, histograms are used when a
column's values warrant a change to the
execution plan.
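Before adding a histogram, it may help to check whether a column is actually skewed and which columns already carry histograms. The queries below are a sketch; SCOTT.EMP and its DEPTNO column are placeholders assumed for illustration.

```sql
-- Assumption: SCOTT.EMP and its DEPTNO column are placeholders.
-- Count rows per value; a few values holding most of the rows
-- indicates skew.
select deptno, count(*) as row_count
from   scott.emp
group  by deptno
order  by row_count desc;

-- List columns that already have histograms (Oracle 10g and later).
select table_name, column_name, num_buckets, histogram
from   dba_tab_col_statistics
where  owner = 'SCOTT'
and    histogram <> 'NONE';
```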
If you need to reanalyze your statistics,
the reanalyze task will be less resource
intensive with the repeat option. The
repeat option reanalyzes only the columns
that already have histograms, and does not
search for other histogram opportunities.
This is the way that you will reanalyze your
statistics on a regular basis.
--**************************************************************
-- REPEAT OPTION - Only re-analyze histograms for indexes
-- that have histograms
--
-- Following the initial analysis, the weekly analysis
-- job will use the "repeat" option. The repeat option
-- tells dbms_stats that no indexes have changed, and
-- it will only re-analyze histograms for
-- indexes that have histograms.
--**************************************************************
begin
   dbms_stats.gather_schema_stats(
      ownname          => 'SCOTT',
      estimate_percent => dbms_stats.auto_sample_size,
      method_opt       => 'for all columns size repeat',
      degree           => 7
   );
end;
/
Find histograms for foreign key columns -
Many DBAs forget that the CBO must have
foreign-key histograms in order to determine
the optimal table join order (the order you
would otherwise force with the ORDERED hint).
Fix the cause, not the symptom - For
example, whenever I see a sub-optimal order
for table joins, I resist the temptation to
add the ORDERED hint, and instead create
histograms on the foreign keys of the join
to force the CBO to make the best decision.
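Creating such a histogram could look like the sketch below. It assumes the SCOTT demo schema, where EMP.DEPTNO is the foreign key to DEPT; substitute the join column from your own schema.

```sql
-- Assumption: SCOTT.EMP.DEPTNO (foreign key to SCOTT.DEPT) is a
-- placeholder for the join column in your own schema.
begin
   dbms_stats.gather_table_stats(
      ownname    => 'SCOTT',
      tabname    => 'EMP',
      -- build a histogram on the foreign-key column only;
      -- 254 is the maximum bucket count before Oracle 12c
      method_opt => 'for columns deptno size 254'
   );
end;
/
```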
For new features, explore the Oracle10g
automatic histogram collection mechanism
that interrogates v$sql_plan to see where
the foreign keys are used. It claims to
generate histograms when appropriate, all
automatically.
This is one reason that the ORDERED hint is
so popular, but it has been shown that
having liberal column histograms on the
table columns can often aid the optimizer in
making better execution plans.
In sum, histograms are not just for
non-unique column values that are unevenly
distributed (skewed), and several noted
DBAs have suggested that more liberal use
of histograms will aid the CBO in making
better decisions. The dbms_stats "auto"
feature detects and builds column
histograms, but it has the shortcoming of
being too conservative in some cases.
Savvy DBAs are now experimenting with
broad-brush histograms for all indexed
columns. I first heard of this technique
from Jeff Maresh (noted data warehouse
consultant), who told me that he has taken
to creating 10-bucket histograms for all
data warehouse table columns. I heard this
advice again at the IOUG conference from
Arup Nanda (noted author and DBA of the
year) and from Mike Ault.
They are abandoning the use of the "auto"
option and manually creating 20-bucket
histograms across the board, and they claim
that it can make a huge difference for
databases with lots of multi-table joins in
the SQL.
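The exact commands these experts use are not given here. As an illustrative sketch only, a fixed 20-bucket request on every indexed column of an assumed SCOTT schema could be written as:

```sql
-- Assumption: SCOTT is a placeholder schema; this is not the
-- experts' own script, only one way to express the idea.
begin
   dbms_stats.gather_schema_stats(
      ownname          => 'SCOTT',
      estimate_percent => dbms_stats.auto_sample_size,
      -- request up to 20 buckets on every indexed column
      method_opt       => 'for all indexed columns size 20'
   );
end;
/
```

Replacing "for all indexed columns" with "for all columns" would extend the same bucket count to every column, at the cost of a longer gathering run.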
I've not tried this technique yet, but when
three experts make the assertion, I believe
that there may be something to the new
technique. The only downside, of course, is
the time required to gather the column
histograms and a small amount of additional
storage in the data dictionary.
One exciting feature of dbms_stats is the
ability to automatically look for columns
that should have histograms, and create the
histograms. Multi-bucket histograms add a
huge parsing overhead to SQL statements, and
histograms should ONLY be used when the SQL
will choose a different execution plan based
upon the column value.
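A sketch of that automatic detection, assuming the SCOTT schema as a placeholder, uses the "skewonly" value of method_opt so that dbms_stats builds histograms only where it finds skew:

```sql
-- Assumption: SCOTT is a placeholder schema name.
begin
   dbms_stats.gather_schema_stats(
      ownname          => 'SCOTT',
      estimate_percent => dbms_stats.auto_sample_size,
      -- skewonly: build a histogram only where the data
      -- distribution of a column is uneven
      method_opt       => 'for all columns size skewonly'
   );
end;
/
```

The "auto" value works similarly but also takes into account how the columns have been used in SQL predicates.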