Oracle Data cleansing Decision Support System Team
Oracle Database Tips by Donald Burleson
Data cleansing is a critical task for all Oracle
data warehouses. It's not "special" or "unique" to any shop, and
the data cleansing problem is very well-understood. Oracle 10g
Warehouse Builder contains built-in data cleaning tools (data
profiling), and tools for data scrubbing are freely available over
In our experience, a savvy team of Oracle
professionals can create a Decision Support System (DSS) to identify
and resolve the low-hanging fruit (the 80/20 rule) and cleanse a
trillion byte database in a matter of weeks.
A DSS is an interactive system that
incorporates known rules about data anomalies and provides the Data
Quality Officer (DQO) with a tool to quickly identify and scrub
errors. As the DSS evolves and the rules are refined to 100%
reliability, some system components are automated, providing an
"expert system" component to reduce the manual workload of the DQO.
If we start by examining known data errors to
find common patterns, the Oracle team can design programs to detect
these types of errors and quickly clean-up a large amount of
transposition errors, and successively refine the model to identify
less obvious data anomalies. We can also search for statistical
"outliers", data that violates the norms of the database as-a-whole.
A successful data cleansing project requires an 8-week commitment of
- Data Quality Officer (DQO) - This
is the resident "data expert", someone who understands the
nature of the data and can provide "decision rules" for
incorporation into the DSS.
- Data Analyst - This is the DSS
expert who works directly with the Data Quality Officer.
- Programmers - An Oracle
professional who is fluent in PL/SQL and Java write the main
PL/SQL routines to specify new decision rules and exceptions to
the rules. The program may also incorporate "translation
tables" for unifying items like IVCD-9 diagnosis codes. An
online system programmer is also required to build the HTML-DB
front-end to the interactive component of the DSS.
By using well-understood best practices for
Oracle data cleansing a robust and flexible system can be created to
dramatically reduce data anomalies. Using an iterative cycles of
refining the decision rules from the DQO, the DSS evolves to become
increasingly accurate and powerful.
This is a DSS for Oracle data cleansing in a
nutshell. Note that we start by examining the "nature" of known
data errors and seek "fishy" data (statistically valid outliners)
for creating the suggestion lists for the human expert (the DQO).
The DQO then manually resolved the errors and
works with the DBA to refine the decision rules until they are 100%
complete and accurate using the "feedback loop" of successive rule
refinement. At that point, that component of the Oracle data
cleansing is automated, becoming an "expert system" component of the
|For expert Oracle data cleansing support and
data scrubbing consulting, use an expert from BC. We
understand the powerful Oracle data unification tools, and
we can aid in improving the data quality of any Oracle
database, large or small.