Call now: 252-767-6166  
Oracle Training Oracle Support Development Oracle Apps

 
 Home
 E-mail Us
 Oracle Articles
New Oracle Articles


 Oracle Training
 Oracle Tips

 Oracle Forum
 Class Catalog


 Remote DBA
 Oracle Tuning
 Emergency 911
 RAC Support
 Apps Support
 Analysis
 Design
 Implementation
 Oracle Support


 SQL Tuning
 Security

 Oracle UNIX
 Oracle Linux
 Monitoring
 Remote s
upport
 Remote plans
 Remote
services
 Application Server

 Applications
 Oracle Forms
 Oracle Portal
 App Upgrades
 SQL Server
 Oracle Concepts
 Software Support

 Remote S
upport  
 Development  

 Implementation


 Consulting Staff
 Consulting Prices
 Help Wanted!

 


 Oracle Posters
 Oracle Books

 Oracle Scripts
 Ion
 Excel-DB  

Don Burleson Blog 


 

 

 


 

 

 

 

 

Data Unification & Cleansing with Oracle


Oracle Database Tips by Donald Burleson

Introduction to Oracle Data Cleansing Systems

The Oracle DBA is the chief custodian of the mission-critical corporate data, and the advent of time-saving automation tools in Oracle 10g has freed-up the Oracle DBA to focus on issues of quality within their databases. 

  • Simple character validation
  • Complex column format validation
  • Multi-column validation
  • Complex multi-table data validation

Some data quality fixes are so easy that you may wonder why you have not found them before.  For example we know that transposition errors are very common with people's name columns, you can quickly run a PL/SQL program to locate possible transposition victims.  It's easier than you think, especially when the key table has unique "candidate keys" that can be used to ensure our reliability.

Also see how Oracle Data cleansing is performed in a Decision Support System framework and understanding Oracle knowledge engineering .

Let's take a look at each type of data validation activity.  Most Oracle databases perform two types of data cleansing, namely data unification and data scrubbing:

  • Data Unification - This is the relatively-mundane process of unifying column items, such as guaranteeing uniform values for salutations ("Mr.", "Mrs.") name suffixes ("Jr,", "III") and street type abbreviations ("Blvd", "Ave.").  Oracle 10g Release 2 Warehouse Builder ("Paris") introduces an exciting new data cleansing feature dubbed "Data profiling" where you can do simple unification such as name and address cleansing.  Best of all OWB Data profiling allow the system to be "self learning" using existing valid data to derive the data validation rules.  Most Oracle Data Warehouse DBA's argue that data unification belongs in the ETL process, but they differ about whether the data cleansing should be done record-at-a-time during ETL, or daily with a batch job.
     
  • Data Scrubbing - This is the more complex process of creating rules to identify statistical "outliers", suspicious rows that might contain invalid data.  A "suggestion" list is created along with recommended changes, and a human data quality expert (the Data Quality Officer, or DQO) evaluates each suspicious row and makes a decision:

        Reject suggestion - There are two reason for rejecting a data cleansing suggestion, either the data is just plain bad (and it is marked with "i" for invalid, or because the decision rules are incomplete.  In this case, the DBA gathers the reason for the rejection and adds this new rule or exception into the identification program.

        Accept suggestion - The data is changed and a complete audit log is created.  The data is marked as having been scrubbed.

As the DSS evolves and the rules are refined to 100% reliability, some system components are automated, providing an "expert system" component to reduce the manual workload of the DQO.

If we start by examining known data errors to find common patterns, the Oracle team can design programs to detect these types of errors and quickly clean-up a large amount of transposition errors, and successively refine the model to identify less obvious data anomalies.  We can also search for statistical "outliers", data that violates the norms of the database as-a-whole.  A successful data cleansing project requires an 8-week commitment of these resources:

  • Data Quality Officer (DQO) - This is the resident "data expert", someone who understands the nature of the data and can provide "decision rules" for incorporation into the DSS.
     
  • Data Analyst - This is the DSS expert who works directly with the Data Quality Officer.
     
  • Programmers - An Oracle professional who is fluent in PL/SQL and Java write the main PL/SQL routines to specify new decision rules and exceptions to the rules.  The program may also incorporate "translation tables" for unifying items like IVCD-9 diagnosis codes.  An online system programmer is also required to build the HTML-DB front-end to the interactive component of the DSS.

By using well-understood best practices for Oracle data cleansing a robust and flexible system can be created to dramatically reduce data anomalies.  Using an iterative cycles of refining the decision rules from the DQO, the DSS evolves to become increasingly accurate and powerful.

 

This is a DSS for Oracle data cleansing in a nutshell.  Note that we start by examining the "nature" of known data errors and seek "fishy" data (statistically valid outliners) for creating the suggestion lists for the human expert (the Data quality officer or DQO).

The DQO then manually resolved the errors and works with the DBA to refine the decision rules until they are 100% complete and accurate using the "feedback loop" of successive rule refinement.  At that point, that component of the Oracle data cleansing is automated, becoming an "expert system" component of the DSS.


For expert Oracle data cleansing support and data scrubbing consulting, use an expert from BC.  We understand the powerful Oracle data unification tools, and we can aid in improving the data quality of any Oracle database, large or small.

 

 

��  
 
 
Oracle Training at Sea
 
 
 
 
oracle dba poster
 

 
Follow us on Twitter 
 
Oracle performance tuning software 
 
Oracle Linux poster
 
 
 

 

Burleson is the American Team

Note: This Oracle documentation was created as a support and Oracle training reference for use by our DBA performance tuning consulting professionals.  Feel free to ask questions on our Oracle forum.

Verify experience! Anyone considering using the services of an Oracle support expert should independently investigate their credentials and experience, and not rely on advertisements and self-proclaimed expertise. All legitimate Oracle experts publish their Oracle qualifications.

Errata?  Oracle technology is changing and we strive to update our BC Oracle support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:  

and include the URL for the page.


                    









Burleson Consulting

The Oracle of Database Support

Oracle Performance Tuning

Remote DBA Services


 

Copyright © 1996 -  2017

All rights reserved by Burleson

Oracle ® is the registered trademark of Oracle Corporation.

Remote Emergency Support provided by Conversational