Oracle Regular expressions indexes as a tuning tool
Oracle tips by Burleson
Expression syntax has profound implications for Oracle tuning,
especially in the area of indexing where indexes can be created
on regular expressions, eliminating expensive full-table scans
in-favor of fast index access. Regular expressions are
extremely powerful for extracting facts from large text columns,
Oracle expert Jonathan Gennick
regular expression can be used in an Oracle index, a
powerful tool for improving the speed of complex SQL queries,
and notes that regular expression indexes (a type of
function-based index) can dramatically reduce database overhead
for pattern-matching queries. I've noted in my book
Tuning: The Definitive Reference", that function-based
indexes are one of the most powerful and underutilized tools for
Here are some little know facts
about regular expressions:
Indexing on regular
Parsing with regular
expressions regexp_like Jonathan Gennick shows a great
example where we use Oracle regular expressions to extract
�acreage� references from inside a text string, ignoring
important factors such as case sensitivity and words stems
(acre, acres, acreage):
COLUMN park_name format a30
COLUMN acres format a13
WHERE REGEXP_LIKE(description, '[^
Here is the
output, where we see that the regular expression has parsed-out
the acreage figures, just as-if they were a discrete data column
with the table:
Mackinac Island State Park
Muskallonge Lake State Park 217-acre
Porcupine Mountains State Park 60,000 acres
Tahquamenon Falls State Park 40,000+ acres
The only problem with
this query is that it will always perform a large-table
full-table scan on the michigan_park table, causing unnecessary
overhead for Oracle.
However, using the powerful
function-based indexes we could eliminate the unnecessary
overhead by using the regular expression directly in the index.
(REGEXP_LIKE(description, '[^ ]+[- ]acres?','i'));
Laurent Schneider notes that it is illegal to have an index on a
Boolean function, but you could have an index on a case
expression returning 1.
description like '_% acre%'
description like '_%-acre%'
then 1 end);
description like '_% acre%'
description like '_%-acre%' then 1 end)
is not null;
This simple index definition would
create a yes/no index on all park records that contain a
reference to "acre", "acres", "acreage". The database
overhead would be once, when each rows is added to the table,
and not over-and-over again when queries are executed.
The rules for choosing a
function-based index on a complex expression (regular
expression, decode) is a trade-off between several factors:
The number of
blocks in the table - A full-table scan of a super-large
table can cause I/O contention.
of rows returned - If the regular expression returns
only a small percentage of the total table rows, a regular
expression index will greatly reduce I/O.
of the query - If the query is executed frequently,
Oracle may do millions of unnecessary full-table scans.
for slower row inserts - Parsing the text column at
insert time (to add the row to the regular expression index)
will slow-down inserts.
It's the age-old
quandary. If we build the regular expression once (at insert
time) it can be used over-and-over again with little overhead.
Conversely, using regular expressions in SQL without a
supporting index will cause repeated full-table scans.
information on learning regular expression and using them in the
database see Jonathan Gennick and Peter Linsley's book
Oracle Regular Expressions Pocket Reference.
Get the Complete
Oracle SQL Tuning Information
The landmark book
SQL Tuning The Definitive Reference" is
filled with valuable information on Oracle SQL Tuning.
This book includes scripts and tools to hypercharge Oracle 11g
performance and you can
for 30% off directly from the publisher.