Answer: You
are correct that the maximum cardinality for bitmap indexes is a function of
the number of distinct values and the size of the key, but far and away it
is the number of distinct values that is the largest contributing factor to
the maximum number of distinct values you can have in a bitmap index.
As the number if distinct values increases, the size of
the bitmap increases exponentially, such that an index with 100 values may
perform thousands of times faster than a bitmap index on 1,000 distinct
column values. The answer, as with all Oracle tuning questions is "It
depends".
Also, remember that bitmap indexes are only suitable
for static tables and materialized views which are updated at nigh and
rebuilt after batch row loading. If your tables are not read-only
during query time, carefully test your DML speed before implementing bitmap indexes! Tables with dozens of updates per second can see slowdowns because of the overhead of updating the bitmap.
So, which is the right maximum
values for your bitmap index? Who knows? You will need to run
performance benchmarks on your unique database to see!
Benchmarks have shown these trends in SQL performance and the number of
distinct bitmap index values (as a percentage of total table rows), and the
clustering factor for the index column. All else being equal, cardinality is
one of the most important factors in deciding to use a bitmap index, but
there are rare cases where a bitmap on a column with 10,000 key values might
be appropriate.
So, what is the maximum threshold for unique values in a bitmap index?
The answer (as with all Oracle tuning questions) is "It depends". To
understand how a bitmap works, remember that a bitmap is a two dimensional
matrix with two axis. The number of rows in the table is one axis, and
the number of distinct key values is the other axis.

Conceptualize a bitmap index as a two-dimensional array
Hence, a bitmap index on a million row table with ten distinct values in
the index column will have ten million cells in the bitmap.
Conversely, a bitmap on a million rows table with an index column with 1,000
distinct values will be much larger, with a billion cells.
At
this point, let's note that Oracle performs bitmap index compression, and
that is why the clustering_factor is important. If a bitmap index
column has a "good" clustering factor (e.g. close to the number of blocks in
the table) then the index values are adjacent, and Oracle will be able to
compress the index far better than a bitmap on an un-clustered column.
Bitmap compression, row clustering, and column
cardinality
It should be clear that it's not exclusively the number of distinct
values in the index that governs our choice of using a bitmap, it's also a
function of the percentage the distinct values as a percentage of rows in
the table and the clustering of the column values (as per the
clustering_factor column in the dba_indexes view). Hence, a decision
for a bitmap is a function of these criteria:
- The ratio of rows to distinct values: A more
accurate estimate of the suitability of a bitmap index is the ratio of
distinct rows to the number of total rows in a table.
- Clustering factor: The clustering of the
index keys to the physical data blocks has an influence on bitmap
indexes.
In a table where the bitmap columns appear in the same physical order as
the rows, related column values will be adjacent on each data block.
The clustering_factor will be close to the value of the blocks column in
dba_data_files, and Oracle will be able to compress the bitmap index.
When a bitmap index sees repeating column values, the matching rows can
be omitted from the bitmap. As a bitmap is being created or updated,
Oracle looks to the previous index entry. If it is the same, Oracle
replaces the bitmap entry with a "same as prior" flag, and continues until a
new key value is detected.
At an extreme case, consider a
million rows table where the column has 100,000 distinct values and
clustering_factor ~= blocks.
Because all adjacent columns values are
grouped together, the bitmap axis goes from 1,000,000 rows down to 100,000,
making for a highly compressed bitmap.
In the compressed example
below, all "East:" rows appear adjacent in rows 1-5, all "North" values
resides on rows 6-13, all "South values" are in rows 14-15, and all "West"
rows reside on rows 16-18:

Again, in most cases, there will not be a lot of adjacent index values,
so it quite rare to see extensive compression.
Exceptions to the rule! High cardinality bitmap indexes
Oracle is not a science and for every rule we see exceptions! At
first blush, the Oracle 11g documentation can appear contradictory. In
some places we see excellent rules-of-thumb for bitmap index cardinality.
"In a data warehouse, B-tree indexes should only be used
for unique columns or other columns with very high cardinalities (that is,
columns that are almost unique)."
"You typically want to use bitmap
indexes on low degree of cardinality columns and B-tree indexes on high
degree of cardinality columns.
As a general rule, a cardinality of under 1% makes a good
candidate for a bitmap index."
However, there are rare cases where high cardinality columns can be used
in bitmap indexes, and Oracle suggests that these conditions are acceptable
for a bitmap index:
- If the number of distinct values of a column is less than 1% of the
number of rows in the table.
- If the values in a column are repeated more than 100 times, then the
column is a candidate for a bitmap index.
Now that we understand bitmap index compression, it should be clear that
there are rare cases where high cardinality columns might be candidates for
bitmap indexes.
So, which are the right maximum values for your
bitmap index? Who knows? You will need to run performance
benchmarks on your unique database to see!
The only way to
know for sure is to perform real-world benchmark testing your real SQL
workloads. You can use the dbms_workload_capture procedure to
grab a representative SQL workload from your production database and replay
the workload in a test instance using your new bitmap index.
Bitmap index usage
Remember how bitmap indexes are generally used.
Because they are low cardinality, they have very little value by themselves,
and the power comes when Oracle combines multiple bitmaps in a bitmap merge
operation:
We also have to deal
with the volatility of low cardinality columns.
For example, some hippie states have adopted a five gender system, male, female, male becoming female, female becoming male, and eunuch,
changes that cost billions of dollars to implement.
 |
If you like Oracle tuning, you may enjoy the book
Oracle Tuning: The Definitive Reference , with over 900 pages of BC's favorite tuning
tips & scripts.
You can buy it directly from the publisher and save 30%, and get
instant access to the code depot of Oracle tuning scripts. |