Oracle hash joins tips
Oracle Tips by Burleson Consulting

The following is from the bestselling books "Oracle 10g Grid & Real Application Clusters" and "Oracle Tuning: The Definitive Reference".
Enabling Oracle to perform hash joins
In cases where a very small table is being
joined to a large table, the Oracle hash
join will often dramatically speed up
the query. Hash joins can be far faster
than nested loop joins, especially in cases
where your SQL is joining a large table to a small table.
However, in a production database with very large tables, it is not
always easy to get your database
to invoke hash joins without increasing
the RAM regions that control hash joins. For large tables, hash joins
require lots of RAM.
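As a sketch (the dept and emp tables here are hypothetical stand-ins for a small lookup table and a large table), a use_hash hint invites the optimizer to build a hash table from the small table and probe it with rows from the large one:

```sql
-- Hypothetical tables: dept (small) and emp (large).
-- The use_hash hint asks the CBO to join the aliases with a hash join.
select /*+ use_hash(e d) */
   d.dname,
   e.ename
from
   dept d,
   emp  e
where
   e.deptno = d.deptno;
```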
The 200 megabyte limit for hash joins
Oracle places limits on the amount of RAM
available for hash joins so that no single session will "hog" all of the RAM.
In research using Oracle 10g, the hash join maximum was only 200 megabytes (5% of
pga_aggregate_target).
The Oracle DBA controls the optimizer's propensity to invoke
hash joins because the DBA must allocate the RAM resources to Oracle (using the
hash_area_size and pga_aggregate_target parameters) for the
optimizer to choose a hash join. The CBO will only choose a hash join if
you have allocated Oracle enough RAM area in which to perform the hash join.
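For illustration (the sizes below are placeholders, not recommendations), note that hash_area_size is honored only under manual workarea sizing, so setting it for a session takes two steps:

```sql
-- Placeholder value for illustration only.
-- hash_area_size is ignored under automatic PGA management,
-- so switch this session to manual workarea sizing first.
alter session set workarea_size_policy = manual;
alter session set hash_area_size = 104857600;  -- 100 MB for this session
```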
Sizing your PGA for hash joins
The rules are quite
different depending on your release, and
you need to focus on either the
hash_area_size or the pga_aggregate_target parameter.
Unfortunately, the Oracle hash join is
more memory intensive than a nested loop
join. To be faster than a nested loop
join, we must set the hash_area_size
large enough to hold the entire hash
table in memory (about 1.6 times the
total bytes of the driving table).
If the Oracle hash join overflows the
hash_area_size memory, the hash join
will page into the TEMP tablespace,
severely degrading the performance of
the hash join. You can use the following
script to dynamically
set the proper hash_area_size for
your SQL query according to the size of
your hash join driving table.
Here is the output from this script
with the suggested hash area size
calculation for the driving table. As
you see, we pass the driving table name,
and the script generates the appropriate
alter session command to ensure that we
have enough space in hash_area_size RAM
to hold the driving table.
select
   'alter session set hash_area_size='||trunc(sum(bytes)*1.6)||';'
from
   dba_segments
where
   segment_name = upper('&1');

See code depot for the full hash sizing script.

alter session set hash_area_size=3774873;
In addition to setting the hash_area_size,
we must also be able to adjust the
degree of parallelism in cases where we
use a full-table scan to access the
tables in a hash join.
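For example (a sketch using hypothetical small_dim and big_fact tables, with a placeholder degree of 8), a parallel hint can be combined with use_hash when the probe table is accessed with a full-table scan:

```sql
-- Hypothetical tables; the parallel degree of 8 is a placeholder.
select /*+ ordered use_hash(f) full(f) parallel(f, 8) */
   d.dim_name,
   sum(f.amount)
from
   small_dim d,
   big_fact  f
where
   f.dim_id = d.dim_id
group by
   d.dim_name;
```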
Laurent Schneider notes in Oracle MOSC that
overriding the PGA defaults made a large batch process run more than 8x
faster with a use_hash hint:

"I set appropriate values for pga_aggregate_target, and then set:

alter system set "_pga_max_size"=2000000000;

...and I gave the query some hints "NOREWRITE FULL USE_HASH ORDERED".
As a result, it boosted my query performance from 12 hours to 1.5 hours."
Note: Always consult Oracle Technical Support before using
any undocumented parameters.
Ah, if only it were that
easy, just change a setting and batch jobs run eight times faster.
Laurent Schneider notes some perils and reliability issues relating to
this parameter and says "this parameter often leads to an
ORA-4030, even when plenty of memory is available, for some obscure reasons".
There are other tricks for overcoming the built-in
governor for PGA usage. Oracle
has a 5% limit for any individual process, and by using parallel DML any
single batch job can consume 30% of the PGA without touching any of the
undocumented parameters. Laurent Schneider noted:

"I opted for a more maintainable solution.
No more hints, no more undocumented parameters, but parallel
processing up to 16 threads on a 4 CPU server."
As discussed in
MOSC thread 460157.996, a supported way to increase the
maximum PGA memory per single SQL query is to increase the degree of
parallelism. While Laurent abandoned
the undocumented approach, the promise of eight times faster execution
speed is very tempting. Once you get permission from Oracle
Technical Support to set an undocumented parameter, they can work with
you to resolve errors. While they may not address bugs, they may be able
to provide alternatives and workarounds.
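A sketch of this supported approach (the degree of 8 is a placeholder): forcing parallel query for a session spreads the work across parallel slaves, each of which receives its own workarea:

```sql
-- Placeholder degree; a supported alternative to undocumented parameters.
alter session enable parallel query;
alter session force parallel query parallel 8;
```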
This short tip does not have complete
information about the hash join, but you
can get complete details from the book "Oracle
Tuning: The Definitive Reference".
When hash joins fail
The hash join is very finicky,
and there are many conditions that must
be satisfied. It is not uncommon to find
that a use_hash hint is ignored, and
here are some common causes of this
behavior:
- Check hash join parameters - Make sure
that you have the proper settings for
optimizer_index_cost_adj and optimizer_max_permutations
(in releases prior to 10g) to limit the number of table
join evaluations. Also check your values
for hash_area_size and hash_multiblock_io_count. If using
pga_aggregate_target, remember that it
will not use more than 5% of the space
for any hash join, and you may need to
increase it for large hash joins.
- Verify the hash
join driving table -
Make sure that the smaller table is the
driving table (the first table in the
from clause when using the ordered
hint). This is because a hash join
builds the in-memory hash table from the
driving table.
- Analyze CBO statistics -
Check that the tables and/or columns of the
join tables are appropriately analyzed.
- Check for skewed columns
Histograms are recommended only for
non-uniform column distributions. If
necessary, you can override the join
order chosen by the cost-based optimizer
using histograms or the ordered hint.
- Check RAM region - Ensure
that hash_area_size is large enough to
hold the smaller table in memory.
Otherwise, Oracle must write to the TEMP
tablespace, slowing down the hash join.
Oracle recommends that the
hash_area_size for the driving table be
set at 1.6 times the sum of bytes for
the driving table, and you can use the
hash_area.sql script to set the
hash_area_size for your query session.
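The statistics and histogram checks above can be scripted; as a sketch (the CUSTOMER table name is a placeholder), dbms_stats can gather table, column, and skew-aware histogram statistics in a single call:

```sql
-- Placeholder table name; 'skewonly' builds histograms only on
-- columns with non-uniform data distributions.
begin
   dbms_stats.gather_table_stats(
      ownname    => user,
      tabname    => 'CUSTOMER',
      method_opt => 'for all columns size skewonly',
      cascade    => true);
end;
/
```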
If you have followed these steps, you
should be able to add a use_hash hint
to your SQL to invoke a hash join.
Remember, it is always a good idea to
run an explain plan on your SQL to ensure
that the hash join is being invoked.
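For example (again using hypothetical dept and emp tables), you can confirm the hint took effect by looking for a HASH JOIN operation in the displayed plan:

```sql
-- Hypothetical tables; verify that the plan output shows a HASH JOIN step.
explain plan for
select /*+ use_hash(e d) */ d.dname, e.ename
from dept d, emp e
where e.deptno = d.deptno;

select * from table(dbms_xplan.display);
```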
Tracking hash joins
Because hash joins are so tightly controlled by available
memory, the savvy DBA might
track hash joins over
time. You can use SQL scripts to track system-wide hash joins.
See code depot for the full hash join tracking script. The key logic
joins the AWR tables dba_hist_snapshot, dba_hist_sqlstat, and
dba_hist_sql_plan:

select to_char(sn.begin_interval_time,'yy-mm-dd hh24') snap_time,
       count(*), sum(st.rows_processed_delta),
       sum(st.disk_reads_delta), sum(st.cpu_time_delta)
from   dba_hist_snapshot sn, dba_hist_sqlstat st, dba_hist_sql_plan sp
where  st.snap_id = sn.snap_id
and    st.dbid = sn.dbid
and    st.instance_number = sn.instance_number
and    sp.sql_id = st.sql_id
and    sp.dbid = st.dbid
and    sp.plan_hash_value = st.plan_hash_value
and    sp.operation = 'HASH JOIN'
group by to_char(sn.begin_interval_time,'yy-mm-dd hh24')
having count(*) > &hash_thr;
The sample output might look like the following, showing the number of
hash joins during each snapshot period along with the relative I/O
and CPU associated with the processing. The values for disk reads and
CPU time are generally higher for hash joins, which use full-table scans,
than for nested loop joins, which generally involve a very small set of
rows.
Hash Join Thresholds

               Join        Rows        Disk          CPU
Date      Hr  Count   Processed       Reads         Time
--------  --  -----  ----------  ----------  -----------
04-10-12  17     22       4,646         887   39,990,515
04-10-13  16     25       2,128         827   54,746,653
04-10-14  11     21      17,368       3,049   77,297,578
04-10-21  15     60       2,805       3,299    5,041,064
04-10-22  10     25       6,864         941    4,077,524
04-10-22  13     31      11,261       2,950   46,207,733
04-10-25  16     35      46,269       1,504    6,364,414
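Besides the AWR history, you can check whether current hash joins are completing in memory or spilling to disk; a sketch using the standard v$sql_workarea view:

```sql
-- optimal = workarea fit in memory; onepass/multipass = spilled to TEMP.
select
   operation_type,
   sum(optimal_executions)     optimal,
   sum(onepass_executions)     onepass,
   sum(multipasses_executions) multipass
from
   v$sql_workarea
where
   operation_type like 'HASH%'
group by
   operation_type;
```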
Oracle hash joins are dependent upon your
system and session parameter settings.
If you like Oracle tuning, see the book "Oracle
Tuning: The Definitive Reference", with 950 pages of tuning tips and
scripts. You can buy it directly from the publisher for 30%-off and get
instant access to the code depot of Oracle tuning scripts.