 |
|
Oracle Regular expressions indexes as a tuning tool
Oracle Database Tips by Donald Burleson |
Oracle Regular
Expression syntax has profound implications for Oracle tuning,
especially in the area of indexing where indexes can be created
on regular expressions, eliminating expensive full-table scans
in-favor of fast index access. Regular expressions are
extremely powerful for extracting facts from large text columns,
especially the
regexp_like syntax.
Oracle expert Jonathan Gennick
notes that
regular expression can be used in an Oracle index, a
powerful tool for improving the speed of complex SQL queries,
and notes that regular expression indexes (a type of
function-based index) can dramatically reduce database overhead
for pattern-matching queries. I've noted in my book
"Oracle
Tuning: The Definitive Reference", that function-based
indexes are one of the most powerful and underutilized tools for
Oracle professionals.
Here are some little know facts
about regular expressions:
Indexing on regular
expressions
In
Parsing with regular
expressions regexp_like Jonathan Gennick shows a great
example where we use Oracle regular expressions to extract
�acreage� references from inside a text string, ignoring
important factors such as case sensitivity and words stems
(acre, acres, acreage):
COLUMN park_name format a30
COLUMN acres format a13
SELECT
park_name,
REGEXP_SUBSTR(description,'[^ ]+[-
]acres?',1,1,'i') acres
FROM michigan_park
WHERE REGEXP_LIKE(description, '[^
]+[- ]acres?','i');
Here is the
output, where we see that the regular expression has parsed-out
the acreage figures, just as-if they were a discrete data column
with the table:
PARK_NAME ACRES
____________________________
___________
Mackinac Island State Park
1800 acres
Muskallonge Lake State Park 217-acre
Porcupine Mountains State Park 60,000 acres
Tahquamenon Falls State Park 40,000+ acres
The only problem with
this query is that it will always perform a large-table
full-table scan on the michigan_park table, causing unnecessary
overhead for Oracle.
However, using the powerful
function-based indexes we could eliminate the unnecessary
overhead by using the regular expression directly in the index.
CREATE INDEX
parks_acreage
ON
michigan_parks
(REGEXP_LIKE(description, '[^ ]+[- ]acres?','i'));
However,
Laurent Schneider notes that it is illegal to have an index on a
Boolean function, but you could have an index on a case
expression returning 1.
create index
i
on
michigan_parks
(case when
description like '_% acre%'
or
description like '_%-acre%'
then 1 end);
select
*
from
michigan_parks
where
(case when
description like '_% acre%'
or
description like '_%-acre%' then 1 end)
is not null;
This simple index definition would
create a yes/no index on all park records that contain a
reference to "acre", "acres", "acreage". The database
overhead would be once, when each rows is added to the table,
and not over-and-over again when queries are executed.
The rules for choosing a
function-based index on a complex expression (regular
expression, decode) is a trade-off between several factors:
-
The number of
blocks in the table - A full-table scan of a super-large
table can cause I/O contention.
-
The percentage
of rows returned - If the regular expression returns
only a small percentage of the total table rows, a regular
expression index will greatly reduce I/O.
-
The frequency
of the query - If the query is executed frequently,
Oracle may do millions of unnecessary full-table scans.
-
The tolerance
for slower row inserts - Parsing the text column at
insert time (to add the row to the regular expression index)
will slow-down inserts.
It's the age-old
quandary. If we build the regular expression once (at insert
time) it can be used over-and-over again with little overhead.
Conversely, using regular expressions in SQL without a
supporting index will cause repeated full-table scans.
For detailed
information on learning regular expression and using them in the
database see Jonathan Gennick and Peter Linsley's book
Oracle Regular Expressions Pocket Reference.
|
|
Get the Complete
Oracle SQL Tuning Information
The landmark book
"Advanced Oracle
SQL Tuning The Definitive Reference" is
filled with valuable information on Oracle SQL Tuning.
This book includes scripts and tools to hypercharge Oracle 11g
performance and you can
buy it
for 30% off directly from the publisher.
|