WIP - Not Yet completedOracle has
recently made an important change to an internal algorithm for performing hash
joins, replacing their old method with a "bloom filter" hash join.
For those who are new to computer science, a "hash" is the transformation of
a symbolic key value into a "address", a fixed location in RAM or disk.
Just as your home address uniquely identified where you live, a memory address
has all of the information required to locate a unique place on RAM or Disk.
Hashing algorithms are a very fast way to store Oracle information, and a
hash can convent a symbolic key into a disk address in as little as 50
microseconds. This makes Oracle 10g sorted hash clusters a very fast way
to store information, but hashing is also used for other important Oracle
operations such as joining rows within the SQL optimizer. Hashing is used
for:
- Row identification (sorted hash clusters)
- Joining tables (the hash join)
One of the problems with Oracle's older hashing algorithm was that it did not
deal well with large sets. For example, an Oracle hash join was great for
joining a small table to a large table, but large table joins were more
efficient with nested loop joins. The new bloom filter hashing algorithm
may change that.
In a bloom filter hash, the problem is large sets is overcome.
Rampant TechPress author Steve Callan has a great description of the Oracle
hashing with bloom filters. In the paper “General
Purpose Hash Function Algorithms,” we see a good definition of the bloom
filer hash:
A Bloom filter allows for the state of
existence of a very large set of possible type values to be represented with
a much smaller piece of memory.
This is achieved through the use of
multiple distinct hash functions and also by allowing the result of a query
for the existence of a particular type to have a certain amount of error.
This error can be controlled by varying
the size of the table used for the Bloom filter and also by varying the
number of hash functions.
While the internal machinations of Oracle remain a well-guarded proprietary
secret, this new bloom filter hash may have a profound impact on overall Oracle
performance as larger sets can now bypass nested loop processing in favor of a
hash join.