|
 |
|
Oracle Regular Expressions with regexp_like parsing
Oracle Database Tips by Donald Burleson |
See also:
Oracle
Regular expressions indexes as a tuning tool
If you are familiar with UNIX or Linux shell
scripting, then you are probably already familiar with using the
power of regular expressions. It's a powerful pattern-matching tool
that allows you to implement formatting logic rules for Oracle data
column values.
But regular expressions are more than just
answering simple questions like "Does this field look like a
Social Security Number?" Regular expression can now be
incorporated into Oracle SQL and Java, creating a platform for
super-powerful tools that border on Artificial Intelligence.
Regular expressions are commonly used to ensure data quality in
Oracle database where data is Hoovered (a Hoover is a vacuum) from
many remote schemas into a central repository, usually a Decision
Support System (DSS). Examples would be the EDI feeds (of
point-of-sale data) from retail stores (e.g. K-Mart) to
manufacturers.
Did you know that every time the cashier swipes
an individual item across the bar code reader, that a record of that
sale is collected and sent to the manufacturer of the product? If
you use a shopper's club card it gets even cooler, because the
manufacturer now has data about the "type" of person who has just
purchased their widget.
As a tool for pattern matching and parsing,
regular expressions are unparalleled in their sheer power, and
regular expressions are an integral part of Oracle database
management. Regular expressions are indispensable for data
cleansing and data scrubbing, ensuring data consistency within
Oracle databases, large and small.
ALTER TABLE
patient
ADD (
CONSTRAINT
phone_number_format
CHECK
(REGEXP_LIKE(patient_phone,
'^\([[:digit:]]{3}\) [[:digit:]]{3}-[[:digit:]]{4}$')));
This format mask defines a telephone number at
a much greater level of detail than most constraints:
^ = This denotes the first character in the string
$ = This is the end of the string
We could also use these wildcards:
'?' = allows a preceding character to match zero or one time
'+' = allows a preceding character to match one or more times
. = Any character (except new-line \n)
* = 0, 1, or more of the preceding element.
{n} = Exactly n repetitions of the preceding element.
{n,} = Matches n or more repetitions of the preceding element.
{m,n} = Matches between m and n reps of the preceding element.
[abc] = Character list, matching a, b, or c.
[a-c] = an alphabetic range match allowing a, b or c
| = The "or" operator, allowing multiple matches
This matches the standard for USA telephone
numbers: (e.g. '(212) 555-1212'). But what of you want a pattern
that "looks like" a phone number? If we go with the "xxx-xxxx"
mask, we can "loosen" the definition, like this:
'^\[0-9]{3}[-.][0-9]{4,4}'
This would cause a hit on any matching phone
number patterns, but it would miss these phone numbers:
1 (800) EAT-HERE
44-PARKS
Let's start our discussion of Oracle regular
expressions for data cleansing by looking at the easiest component,
the unification of common data formats such as addresses.
For expert Oracle data cleansing support and
data scrubbing consulting, use an expert from BC. We
understand the powerful Oracle data unification tools, and
we can aid in improving the data quality of any Oracle
database, large or small. |
 |
|