|
 |
|
Data format validation with Oracle regular expressions
Oracle Tips by Burleson Consulting
|
See also:
Oracle
Regular expressions indexes as a tuning tool
Simple column validation with Oracle regular expressions
Data quality control is one of the most
important jobs of the Oracle DBA, and Oracle provides sophisticated
tools to ensure data quality and many of them can be placed “inside”
the database schema, like this simple constraint that ensures that
names only contain alphabetic characters:
alter table
patient
add constraint
alpha_name_only
check (
regexp_like(patient_name,'^[[:alpha:]]+$')
);
But data quality control is far more than
controlling simple data column values.
Complex column format validation
The DBA must develop database-centric methods
to ensure data quality formats, like this constraint to ensure that
all credit card values are properly formatted:
alter table
customer
add constraint
valid_credit_card_nbr
check (
REGEXP_SUBSTR(credit_card_nbr
, '(([[:digit:]]{4})
([[:space:]]|[[:punct:]]|)*){4}' )
But in the real-world multiple column values
have to be compared to ensure data quality.
Multi-column validation
Multi-column validation happens when one column
value is compared to another column value to determine validity.
For example, consider a table where we want to ensure that a
customer with a salutation of “Ms.” Has a matching gender
column of “F” for female:
CREATE TYPE cust_name (
salutation char(3),
first_name varchar(20),
last_name varchar(50),
suffix char(3)
);
CREATE TABLE customer (
customer_name cust_name,
gender char(1),
. . .
);
In this case an “insert trigger” is required to
compare and validate the data columns for internal consistency. The
Oracle trigger syntax allow you to apply Boolean logic to compare
the internal consistency of each incoming row:
CREATE [OR REPLACE] TRIGGER <trigger_name>
{BEFORE|AFTER} {INSERT|DELETE|UPDATE} ON <table_name>
[REFERENCING [NEW AS <new_row_name>] [OLD AS <old_row_name>]]
[FOR EACH ROW [WHEN (<trigger_condition>)]]
<trigger_body>
But the real “meat” of Oracle data quality
control (called “data cleansing”, a.k.a. “data scrubbing”) is the
ability to compare incoming row to the “norm” from validated data,
and to “flag” and data row values that are statistically abnormal.
Complex multi-table data validation
The data quality question “Does this row
look right?” requires a more sophisticated approach, replete
with complex logic (a PL/SQL program), audit log tables, and a
interactive front-end for the Data Quality Officer (The DQO, the
site-appointed expert who understands the data).
This is the most challenging type of Oracle
data quality validation where complex rules (and rule “exceptions”)
are coded to compare incoming table rows to the existing database
“norms” and identify suspicious data for human examination.
| For expert Oracle data cleansing support and
data scrubbing consulting, use an expert from BC. We
understand the powerful Oracle data unification tools, and
we can aid in improving the data quality of any Oracle
database, large or small. |
 |
|