Using gather_stats_job
Oracle Database Tips by Donald Burleson, March 28, 2015
By Mladen Gogala
Collecting statistics is a tough nut to crack for every
DBA. Newer versions of the Oracle RDBMS come with an automated job which, let's be
honest, does a really poor job when left to its own devices. This automated job
runs far too often, daily in version 10g or weekly in version 11g, and it doesn't
collect histograms.
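Before deciding what to do with the automated job, it helps to see how it is configured on your own instance. A quick sketch against the scheduler catalog (for 10g, where the job is named GATHER_STATS_JOB; run as a DBA user) shows whether it is enabled, what schedule it uses and when it last started:

```sql
-- Inspect the automated statistics job (10g naming assumed).
SELECT job_name, schedule_name, enabled, last_start_date
  FROM dba_scheduler_jobs
 WHERE job_name = 'GATHER_STATS_JOB';
```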
Histograms describe the distribution of the column data
and are crucial for the optimizer when deciding whether to use an index or not. In
other words, histograms are essential for the selection of well-performing
execution plans. The usual advice dispensed by many Oracle experts is to turn
this job off and write your own that does a better job. Unfortunately, that will
also force you to either rewrite all of the intelligence built into
DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC or give it up. The job that gathers
database statistics is based on an undocumented procedure,
GATHER_DATABASE_STATS_JOB_PROC, in the DBMS_STATS package.
This procedure reads and observes all parameters set by
the SET_PARAM procedure, which means that we can obtain fairly usable statistics
without much effort. Many authors discourage gathering statistics with options
like 'FOR ALL INDEXED COLUMNS SIZE 251', but the statistics produced by DBMS_STATS
using the 'indexed columns' option are quite adequate for the vast majority of Oracle
databases I've worked with during the last 5 years of my career. That is more
than 100 databases in various companies, ranging from 50 employees to more than
3000 employees.
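Before changing anything, you can check the values the statistics job will currently observe. DBMS_STATS.GET_PARAM returns each collection parameter; a minimal sketch:

```sql
-- Show the current DBMS_STATS collection parameters (10g).
SET SERVEROUTPUT ON
begin
  dbms_output.put_line('CASCADE:          ' || dbms_stats.get_param('CASCADE'));
  dbms_output.put_line('DEGREE:           ' || dbms_stats.get_param('DEGREE'));
  dbms_output.put_line('METHOD_OPT:       ' || dbms_stats.get_param('METHOD_OPT'));
  dbms_output.put_line('ESTIMATE_PERCENT: ' || dbms_stats.get_param('ESTIMATE_PERCENT'));
end;
/
```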
Recipe
The recipe is really simple. First, we set parameters for
the collection routine:
SQL> begin
  2    DBMS_STATS.SET_PARAM('CASCADE','TRUE');
  3    DBMS_STATS.SET_PARAM('DEGREE','4');
  4    DBMS_STATS.SET_PARAM('METHOD_OPT','FOR ALL INDEXED COLUMNS SIZE 251');
  5    DBMS_STATS.SET_PARAM('ESTIMATE_PERCENT','8');
  6* end;
SQL> /

PL/SQL procedure successfully completed.
Then, we create a schedule and set the job to use the schedule:
SQL> begin
  2    dbms_scheduler.create_schedule(
  3      schedule_name   => 'STATS_COLLECTION',
  4      repeat_interval => 'freq=monthly;bymonthday=1;byhour=22;byminute=30',
  5      comments        => 'Schedule to collect statistics');
  6    dbms_scheduler.set_attribute(
  7      name      => 'GATHER_STATS_JOB',
  8      attribute => 'SCHEDULE_NAME',
  9      value     => 'STATS_COLLECTION');
 10  end;
 11  /

PL/SQL procedure successfully completed.
That will make GATHER_STATS_JOB run on the first day
of every month, at 22:30. It will sample 8 percent of every table it collects
statistics for and will create histograms with up to 251 endpoints on all indexed
columns. The statistics thus produced are actually quite good and provide
the optimizer with sufficient information to accurately assess the optimal
execution plans for the vast majority of SQL statements. A sample of 8% is rather
large, but the underlying database is quite a small one. The larger the database,
the smaller the sample should be.
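To confirm that the job really picked up the new schedule, the scheduler catalog can be queried; a sketch:

```sql
-- Verify that GATHER_STATS_JOB is attached to our monthly schedule.
SELECT job_name, schedule_name
  FROM dba_scheduler_jobs
 WHERE job_name = 'GATHER_STATS_JOB';

-- And check the schedule's calendar string.
SELECT schedule_name, repeat_interval
  FROM dba_scheduler_schedules
 WHERE schedule_name = 'STATS_COLLECTION';
```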
Statistics operations can be monitored by looking into the
DBA_OPTSTAT_OPERATIONS view, which shows the operation name and the start and end
times of all DBMS_STATS operations at the database and schema level.
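A minimal sketch of such a monitoring query, listing the most recent operations first:

```sql
-- Recent DBMS_STATS operations, newest first.
SELECT operation, target, start_time, end_time
  FROM dba_optstat_operations
 ORDER BY start_time DESC;
```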
Proof
As usual, Oracle Corp. doesn't make our job easier by
documenting things, but we can experiment. What we can do is set parameters
using DBMS_STATS.SET_PARAM and see whether the undocumented
database procedure for statistics collection observes them. The experiment
itself is extremely simple: it involves dropping statistics on the EMP table,
setting parameters and then re-running statistics collection, like this:
SQL> analyze table scott.emp delete statistics;

Table analyzed.

SQL> begin
  2    dbms_stats.gather_database_stats_job_proc;
  3  end;
  4  /

PL/SQL procedure successfully completed.

SQL>
Varying the arguments passed to DBMS_STATS.SET_PARAM
will produce different histograms. The first option is
METHOD_OPT set to 'FOR TABLE'. It doesn't produce any histograms
whatsoever:
NUM  COLUMN_NAME  TYPE      LEN  PREC.  SCALE  HISTOGRAM  BUCKETS
  1  EMPNO        NUMBER     22      4      0  NONE       n/a
  2  ENAME        VARCHAR2   10    n/a    n/a  NONE       n/a
  3  JOB          VARCHAR2    9    n/a    n/a  NONE       n/a
  4  MGR          NUMBER     22      4      0  NONE       n/a
  5  HIREDATE     DATE        7    n/a    n/a  NONE       n/a
  6  SAL          NUMBER     22      7      2  NONE       n/a
  7  COMM         NUMBER     22      7      2  NONE       n/a
  8  DEPTNO       NUMBER     22      2      0  NONE       n/a
For the next step, I created indexes on columns DEPTNO
and JOB, in addition to the existing primary key. METHOD_OPT is now set
to 'FOR ALL INDEXED COLUMNS SIZE 4'. The histograms produced look like this:
NUM  COLUMN_NAME  TYPE      LEN  PREC.  SCALE  HISTOGRAM        BUCKETS
  1  EMPNO        NUMBER     22      4      0  HEIGHT BALANCED  5
  2  ENAME        VARCHAR2   10    n/a    n/a  NONE             n/a
  3  JOB          VARCHAR2    9    n/a    n/a  HEIGHT BALANCED  4
  4  MGR          NUMBER     22      4      0  NONE             n/a
  5  HIREDATE     DATE        7    n/a    n/a  NONE             n/a
  6  SAL          NUMBER     22      7      2  NONE             n/a
  7  COMM         NUMBER     22      7      2  NONE             n/a
  8  DEPTNO       NUMBER     22      2      0  FREQUENCY        3
Finally, when we set METHOD_OPT to 'FOR ALL INDEXED COLUMNS
SIZE 251', we will see this:
NUM  COLUMN_NAME  TYPE      LEN  PREC.  SCALE  HISTOGRAM        BUCKETS
  1  EMPNO        NUMBER     22      4      0  HEIGHT BALANCED  14
  2  ENAME        VARCHAR2   10    n/a    n/a  NONE             n/a
  3  JOB          VARCHAR2    9    n/a    n/a  FREQUENCY        5
  4  MGR          NUMBER     22      4      0  NONE             n/a
  5  HIREDATE     DATE        7    n/a    n/a  NONE             n/a
  6  SAL          NUMBER     22      7      2  NONE             n/a
  7  COMM         NUMBER     22      7      2  NONE             n/a
  8  DEPTNO       NUMBER     22      2      0  FREQUENCY        3
That is pretty much what is expected. Columns JOB and
DEPTNO have frequency histograms, as the number of distinct values is much
smaller than the requested histogram size. EMPNO is the primary key
column, and histograms are pointless for primary key columns because every
value occurs exactly once, so the distribution is always the same. The query
used to produce these tables is a simple join of DBA_TAB_COLUMNS and
DBA_HISTOGRAMS. It looks like this:
SELECT rownum num, column_name, type, len, precision, scale, histogram, buckets, nullok
  FROM (SELECT c.column_name,
               c.data_type      AS type,
               c.data_length    AS len,
               c.data_precision AS precision,
               c.data_scale     AS scale,
               c.histogram,
               h.buckets,
               c.nullable       AS nullok
          FROM dba_tab_columns c,
               (SELECT owner, table_name, column_name, count(*) AS buckets
                  FROM dba_histograms
                 WHERE owner = '&&OWN'
                   AND table_name = '&&TAB'
                 GROUP BY owner, table_name, column_name) h
         WHERE c.owner = '&&OWN'
           AND c.table_name = '&&TAB'
           AND c.table_name = h.table_name(+)
           AND c.column_name = h.column_name(+)
           AND c.owner = h.owner(+)
         ORDER BY c.column_id);
Oracle11g
Until now, this article was entirely devoted to
Oracle10g. Oracle11g, however, is a slightly different animal. First, the job named
'GATHER_STATS_JOB' doesn't exist in Oracle11g. Here are the jobs delivered with an
Oracle11g instance:
OWNER       JOB_NAME                      JOB_ACTION
SYS         BSLN_MAINTAIN_STATS_JOB       n/a
SYS         DRA_REEVALUATE_OPEN_FAILURES  dbms_ir.reevaluateopenfailures
SYS         FGR$AUTOPURGE_JOB             sys.dbms_file_group.purge_file_group(NULL);
SYS         HM_CREATE_OFFLINE_DICTIONARY  dbms_hm.create_offline_dictionary
ORACLE_OCM  MGMT_CONFIG_JOB               ORACLE_OCM.MGMT_CONFIG.collect_config
ORACLE_OCM  MGMT_STATS_CONFIG_JOB         ORACLE_OCM.MGMT_CONFIG.collect_stats
SYS         ORA$AUTOTASK_CLEAN            n/a
SYS         PURGE_LOG                     n/a
EXFSYS      RLM$EVTCLEANUP                begin dbms_rlmgr_dr.cleanup_events; end;
EXFSYS      RLM$SCHDNEGACTION             begin dbms_rlmgr_dr.execschdactions('RLM$SCHDNEGACTION'); end;
SYS         XMLDB_NFS_CLEANUP_JOB         dbms_xdbutil_int.cleanup_expired_nfsclients
Judging by the name alone, there are two jobs that could be
candidates to replace GATHER_STATS_JOB:

1. ORACLE_OCM.MGMT_STATS_CONFIG_JOB
2. SYS.BSLN_MAINTAIN_STATS_JOB
The ORACLE_OCM user belongs to the Oracle Configuration
Manager, Oracle's version of a similar Microsoft utility. As with Microsoft,
people are reluctant to use OCM because of the 'big brother is watching you'
feeling. For now (March 2015), it is fair to assume that the vast majority of
Oracle11g databases are installed for testing purposes and without OCM.
Nevertheless, we will have to check it out. How can we check what a job,
for which we don't have the source, does? We can see whether there are any objects
dependent on it, and we can trace the procedure execution with a 10046 trace.
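One way to do the latter is to run the underlying procedure interactively with event 10046 enabled and then read the resulting trace file; a sketch:

```sql
-- Trace a manual run of the statistics procedure with event 10046.
alter session set tracefile_identifier = 'stats_trace';
alter session set events '10046 trace name context forever, level 12';
begin
  dbms_stats.gather_database_stats_job_proc;
end;
/
alter session set events '10046 trace name context off';
```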
After doing all that, the conclusion is that neither of
these two jobs is a drop-in replacement for GATHER_STATS_JOB. One job requires the
Diagnostics Pack license, the other belongs to the Configuration Manager.
There is no GATHER_STATS_JOB in version 11! What can we do now to get our
intelligent statistics collection back? We can create our beloved
GATHER_STATS_JOB ourselves:
connect / as sysdba
begin
  dbms_scheduler.create_schedule(
    schedule_name   => 'STATS_COLLECTION',
    repeat_interval => 'freq=monthly;bymonthday=1;byhour=22;byminute=30',
    comments        => 'Schedule to collect statistics');
  dbms_scheduler.create_job(
    job_name      => 'GATHER_STATS_JOB',
    job_type      => 'STORED_PROCEDURE',
    job_action    => 'DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC',
    schedule_name => 'STATS_COLLECTION',
    enabled       => TRUE);
end;
/
We still have to do that as SYS, but we've got our
intelligent statistics collection back. Of course, now that the job is created,
we can still modify its behavior by setting parameters, just as we did in
Oracle10g. Thus, we've come full circle: from disabling the job in the
initial stages of an Oracle10g implementation to creating it in version 11g. This
is the relational database version of the Lion King.
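One detail worth knowing: in 11g, DBMS_STATS.SET_PARAM is deprecated in favor of DBMS_STATS.SET_GLOBAL_PREFS, which takes the same parameter-name/value pair. A sketch mirroring the 10g settings used earlier:

```sql
-- 11g equivalent of the 10g SET_PARAM calls; SET_GLOBAL_PREFS is
-- the documented replacement for the deprecated SET_PARAM.
begin
  dbms_stats.set_global_prefs('CASCADE','TRUE');
  dbms_stats.set_global_prefs('DEGREE','4');
  dbms_stats.set_global_prefs('METHOD_OPT','FOR ALL INDEXED COLUMNS SIZE 251');
  dbms_stats.set_global_prefs('ESTIMATE_PERCENT','8');
end;
/
```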
Of course, if 'special handling' is needed for certain
tables, we can always write a custom procedure belonging to the user SYSTEM,
with AUTHID CURRENT_USER, and then invoke it through a job launched by SYS. From
conversations with colleagues, the general consensus is that it is
OK to create privileged jobs running as user SYS, but it is not OK to create
any other objects, like procedures. So-called 'special handling' may include
setting statistics on an index to make it more appealing, gathering histograms
for a non-indexed column where they might make the difference between a hash and a
merge join, or some other surgical intervention which needs to be performed after
every collection of database statistics.
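A sketch of one such surgical intervention, using DBMS_STATS.SET_INDEX_STATS to overwrite an index's clustering factor. The index name and the value are purely illustrative:

```sql
-- Hypothetical 'special handling' run after each statistics collection:
-- overwrite the clustering factor of a (made-up) index EMP_DEPTNO_IX
-- to make an index access path look cheaper to the optimizer.
begin
  dbms_stats.set_index_stats(
    ownname => 'SCOTT',
    indname => 'EMP_DEPTNO_IX',
    clstfct => 100);   -- artificially low clustering factor
end;
/
```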
Conclusion
There are many methods and philosophies centered around
collecting database statistics, each recommending a different collection
frequency and different options. Some people are still reusing
old scripts from version 8, containing 'ANALYZE TABLE <TAB> ESTIMATE STATISTICS'
commands, some have written new scripts using DBMS_STATS, and some have even
written their own arcane and complex packages. There is no need to do that,
unless there are some objects that require really special handling.
Furthermore, it is easy to modify the job's behavior in several ways. The
documentation could use some improvement, but a little bit of simple
experimenting can help us better understand and manage statistics collection
on our databases. The behavior of the built-in tools is easy enough to modify to
support our particular needs and philosophies.