Oracle bitmap index maximum distinct values

Oracle Database Tips by Donald BurlesonUpdated March 8, 2015

Question: I know that bitmap indexes are for columns with a small number of distinct values, but what is the rule for cardinality. Is 500 distinct values too many for a bitmap index? Does the length of the index key play a role in the maximum number of distinct values? How do I tell when m y bitmap index has too many values?

Answer: You are correct that the maximum cardinality for bitmap indexes is a function of the number of distinct values and the size of the key, but far and away it is the number of distinct values that is the largest contributing factor to the maximum number of distinct values you can have in a bitmap index.

As the number if distinct values increases, the size of the bitmap increases exponentially, such that an index with 100 values may perform thousands of times faster than a bitmap index on 1,000 distinct column values. The answer, as with all Oracle tuning questions is "It depends".

Also, remember that bitmap indexes are only suitable for static tables and materialized views which are updated at nigh and rebuilt after batch row loading. If your tables are not read-only during query time, carefully test your DML speed before implementing bitmap indexes! Tables with dozens of updates per second can see slowdowns because of the overhead of updating the bitmap.

So, which is the right maximum values for your bitmap index? Who knows? You will need to run performance benchmarks on your unique database to see!

Benchmarks have shown these trends in SQL performance and the number of distinct bitmap index values (as a percentage of total table rows), and the clustering factor for the index column. All else being equal, cardinality is one of the most important factors in deciding to use a bitmap index, but there are rare cases where a bitmap on a column with 10,000 key values might be appropriate.

So, what is the maximum threshold for unique values in a bitmap index? The answer (as with all Oracle tuning questions) is "It depends". To understand how a bitmap works, remember that a bitmap is a two dimensional matrix with two axis. The number of rows in the table is one axis, and the number of distinct key values is the other axis.

Conceptualize a bitmap index as a two-dimensional array

Hence, a bitmap index on a million row table with ten distinct values in the index column will have ten million cells in the bitmap. Conversely, a bitmap on a million rows table with an index column with 1,000 distinct values will be much larger, with a billion cells.

At this point, let's note that Oracle performs bitmap index compression, and that is why the clustering_factor is important. If a bitmap index column has a "good" clustering factor (e.g. close to the number of blocks in the table) then the index values are adjacent, and Oracle will be able to compress the index far better than a bitmap on an un-clustered column.

Bitmap compression, row clustering, and column cardinality

It should be clear that it's not exclusively the number of distinct values in the index that governs our choice of using a bitmap, it's also a function of the percentage the distinct values as a percentage of rows in the table and the clustering of the column values (as per the clustering_factor column in the dba_indexes view). Hence, a decision for a bitmap is a function of these criteria:

The ratio of rows to distinct values: A more accurate estimate of the suitability of a bitmap index is the ratio of distinct rows to the number of total rows in a table.
Clustering factor: The clustering of the index keys to the physical data blocks has an influence on bitmap indexes.

In a table where the bitmap columns appear in the same physical order as the rows, related column values will be adjacent on each data block. The clustering_factor will be close to the value of the blocks column in dba_data_files, and Oracle will be able to compress the bitmap index.

When a bitmap index sees repeating column values, the matching rows can be omitted from the bitmap. As a bitmap is being created or updated, Oracle looks to the previous index entry. If it is the same, Oracle replaces the bitmap entry with a "same as prior" flag, and continues until a new key value is detected.

At an extreme case, consider a million rows table where the column has 100,000 distinct values and clustering_factor ~= blocks.

Because all adjacent columns values are grouped together, the bitmap axis goes from 1,000,000 rows down to 100,000, making for a highly compressed bitmap.

In the compressed example below, all "East:" rows appear adjacent in rows 1-5, all "North" values resides on rows 6-13, all "South values" are in rows 14-15, and all "West" rows reside on rows 16-18:

Again, in most cases, there will not be a lot of adjacent index values, so it quite rare to see extensive compression.

Exceptions to the rule! High cardinality bitmap indexes

Oracle is not a science and for every rule we see exceptions! At first blush, the Oracle 11g documentation can appear contradictory. In some places we see excellent rules-of-thumb for bitmap index cardinality.

"In a data warehouse, B-tree indexes should only be used for unique columns or other columns with very high cardinalities (that is, columns that are almost unique)."

"You typically want to use bitmap indexes on low degree of cardinality columns and B-tree indexes on high degree of cardinality columns.

As a general rule, a cardinality of under 1% makes a good candidate for a bitmap index."

However, there are rare cases where high cardinality columns can be used in bitmap indexes, and Oracle suggests that these conditions are acceptable for a bitmap index:

If the number of distinct values of a column is less than 1% of the number of rows in the table.
If the values in a column are repeated more than 100 times, then the column is a candidate for a bitmap index.

Now that we understand bitmap index compression, it should be clear that there are rare cases where high cardinality columns might be candidates for bitmap indexes.

So, which are the right maximum values for your bitmap index? Who knows? You will need to run performance benchmarks on your unique database to see!

The only way to know for sure is to perform real-world benchmark testing your real SQL workloads. You can use the dbms_workload_capture procedure to grab a representative SQL workload from your production database and replay the workload in a test instance using your new bitmap index.

Bitmap index usage

Remember how bitmap indexes are generally used. Because they are low cardinality, they have very little value by themselves, and the power comes when Oracle combines multiple bitmaps in a bitmap merge operation:

We also have to deal with the volatility of low cardinality columns. For example, some hippie states have adopted a five gender system, male, female, male becoming female, female becoming male, and eunuch, changes that cost billions of dollars to implement.

If you like Oracle tuning, you may enjoy the book Oracle Tuning: The Definitive Reference , with over 900 pages of BC's favorite tuning tips & scripts.

You can buy it directly from the publisher and save 30%, and get instant access to the code depot of Oracle tuning scripts.

��