|
 |
|
Oracle Regular Expressions for date address and telephone number formats
Oracle Database Tips by Donald Burleson |
Oracle Data Unification
See also:
Oracle
Regular expressions indexes as a tuning tool
In a nutshell, data unification is the process
of standardizing internal data formats, changing the values to a
standard format "mask":
phone_number char(14) (999) 999-9999
social_Security_Number char(11) 999-99-9999
More complex unification involved standardizing
formats of more complex columns like full_name and
street_address. These have lots of "rules", such as the
standard way to reference a P.O. Box, standard abbreviations for
streets, avenues and boulevards, and so on. This type of internal
data unification is the perfect application for regular expressions:
- Date Format - A valid date might
have a regular expression pattern that looks like this:
(19|20)\d\d[- /.](0[1-9]|1[012])[-
/.](0[1-9]|[12][0-9]|3[01])
- P.O. Box Format - Here is another
example of using Oracle regular expression syntax to match a
P/O. Box:
REGEXP_SUBSTR(street_address
,'BOX[([:space:]?\.?,?)]*+[[:digit:]]+' -- pattern
, 1 -- position
, 1 -- occurrence
, 'i' ) -- modifier
- Credit card Format - Here is a
validation for a credit card number:
REGEXP_SUBSTR(credit_card_nbr
, '(([[:digit:]]{4})
([[:space:]]|[[:punct:]]|)*){4}' ) acceptable
Now, let's look at the real challenge,
multi-column data cleansing. This is where you have the challenge
of replicating the decision processes of a human, namely the data
quality professional who is changed with identifying and repairing
incorrect data. As we will learn, this is an iterative process, and
we start with small, modest successes and move-on to complex
decision-logic after we have built the foundation. Remember "We
eat the Elephant one bite at a time".
For expert Oracle data cleansing support and
data scrubbing consulting, use an expert from BC. We
understand the powerful Oracle data unification tools, and
we can aid in improving the data quality of any Oracle
database, large or small. |
 |
|