Tracking Oracle hash joins
Oracle Tips by Burleson Consulting
Interrogating table join methods
The choice between a hash join and a nested loop join depends on
The relative number of rows in each table
The presence of indexes on the key values
The settings for static parameters such as
The current setting and available memory in
Hash joins frequently do not use indexes and perform full-table scans, often
using parallel query. Hence, the use of hash joins with parallel
full-table scans tend to drive-up CPU consumption.
Also, PGA memory consumption becomes higher when hash joins are
used, but if AMM is enabled, it is not usually a a problem.
The following query produces a report alerting an Oracle DBA when
hash join operations count exceeds some threshold (See
book errata for details):
Note: These scripts
will only track SQL that you have directed Oracle to capture via your
threshold settings in AWR or STATSPACK, and STATSPACK and AWR will not
collect "transient SQL" that did not appear in v$sql at
snapshot time. Hence, not all SQL will
appear in these reports. See my notes here on
adjusting the SQL capture thresholds and
what SQL is
included in AWR/STATSPACK tables?
col c1 heading ‘Date’ format a20
col c2 heading ‘Hash|Join|Count’ format 99,999,999
col c3 heading ‘Rows|Processed’ format 99,999,999
col c4 heading ‘Disk|Reads’ format 99,999,999
col c5 heading ‘CPU|Time’ format 99,999,999
accept hash_thr char prompt ‘Enter Hash Join Threshold: ‘
ttitle ‘Hash Join Threshold|&hash_thr’
SEE CODE DEPOT FOR FULL SCRIPTS
st.dbid = sn.dbid
st.instance_number = sn.instance_number
sp.sql_id = st.sql_id
sp.dbid = st.dbid
sp.plan_hash_value = st.plan_hash_value
sp.operation = 'HASH JOIN'
count(*) > &hash_thr;
The sample output might look the following, showing the number of
hash joins during the snapshot period along with the relative I/O
and CPU associated with the processing. The values for
generally higher for hash joins which do full-table scans as opposed
to nested loop joins with generally involved a very small set of
Hash Join Thresholds
Join Rows Disk CPU
Date Count Processed Reads Time
-------------------- ----------- ----------- -----------
04-10-12 17 22 4,646 887 39,990,515
04-10-13 16 25 2,128 827 54,746,653
04-10-14 11 21 17,368 3,049 77,297,578
04-10-21 15 60 2,805 3,299 5,041,064
04-10-22 10 25 6,864 941 4,077,524
04-10-22 13 31 11,261 2,950 46,207,733
04-10-25 16 35 46,269 1,504 6,364,414
The sorting default is that no single task may consume more
than 5% of the
pga_aggregate_target region before the
sort pages out to the TEMP tablespace for a disk sort.
The Oracle cost-based optimizer will determine
whether a hash join would be beneficial over a nested-loop join, so
making more PGA available for hash joins will not have any
detrimental effect since the optimizer will only invoke a
super-sized hash join if it is better than a nested-loop join. In a
system like the example above, the following settings would increase
the default sizes for large sorts and hash joins while limiting
those for parallel sorts.
pga_aggregate_target = 4g
_pga_max_size = 400m
_smm_px_max_size = 333m
With these hidden parameters set we see
significant size increase for serial sorts and a throttling effect
for parallel queries and sorts. However, bear in mind that it only
valid for a specific release of Oracle10g, on a specific hardware
and OS environment, and not using any optional features such as the
Here is a STATSPACK version of the script to track
col c1 heading 'Date'
col c2 heading 'Hash|join|Count'
col c3 heading 'Rows|Processed'
col c4 heading 'Disk|Reads'
col c5 heading 'CPU|Time'
accept hash_thr char prompt 'Enter Hash Join Threshold: '
ttitle 'Hash Join Threshold|&hash_thr'
st.sql_id = p.sql_id
p.plan_hash_value = sp.plan_hash_value
sn.dbid = st.dbid
sp.operation like '%HASH%'
count(*) > &hash_thr
Ion tool is
also excellent for identifying SQL to tune and it can show SQL
execution over time with stunning SQL graphics.
SEE CODE DEPOT FOR FULL SCRIPTS