|
 |
|
Automating Oracle Data cleansing approaches
Oracle Tips by Burleson Consulting
|
Oracle provides a wealth of data cleansing
methods, and you can enforce data format:
- Constraints - Defining Oracle
constraints ensures that only columns with matching data formats
can be added to a table.
- Global Updates – A single update
statement can be issued to unify all matching data formats.
For new databases, you can enforce data format
at the database-level by defining “constraints” that will
only allow matching entries to be added. Consider this one where we
force a patient name to only contain character values:
alter table
patient
add constraint
alpha_name_only
check (
regexp_like(patient_name,'^[[:alpha:]]+$')
);
The downside to using regular expressions at
the database level are the DML errors.
ORA-02290:
CHECK CONSTRAINT VIOLATED
If you have Oracle 10g release 2, you can also
use the new
DML error logging feature to track DML errors.
The data unification phase is the easiest part
of data cleansing because it’s all about one-to-one format
conversion. This is easy low-hanging fruit, and it is normally
automated after the end-user is comfortable with the results. Here
is the Oracle update statement to unify telephone numbers. It’s
just this simple:
UPDATE
patient
SET
patient_phone =
REGEXP_REPLACE(
patient_phone,
'([[:digit:]]{3}) -– backward reference #1
[-. ]
([[:digit:]]{3}) -– backward reference #2
[-. ]
([[:digit:]]{4})', -– backward reference #3
'(\1) \2-\3'); -- here, re-format string
This simple but powerful update
statement will perform the following transformations, perhaps on
millions of rows in your database:
800-555-1212
(800)
555-1212
800.555.1212
(800)
555-1212
800
555-1212
(800)
555-1212
1-800-eat-here No change
011-44-23-58564 No change
But what about the “bad” non-matching telephone
numbers? As we will learn soon, “Bad” data (that which does not “Look
like a phone number” will remain as-is, because the regular
expression format mask did not trigger any change.
So, how does the Oracle DBA implement a
complete data quality mechanism? Let’s start by examining Oracle
“regular expressions”, a common tool for Oracle data quality
control.
| For expert Oracle data cleansing support and
data scrubbing consulting, use an expert from BC. We
understand the powerful Oracle data unification tools, and
we can aid in improving the data quality of any Oracle
database, large or small. |
 |
|