Call now: 252-767-6166  
Oracle Training Oracle Support Development Oracle Apps

 E-mail Us
 Oracle Articles
New Oracle Articles

 Oracle Training
 Oracle Tips

 Oracle Forum
 Class Catalog

 Remote DBA
 Oracle Tuning
 Emergency 911
 RAC Support
 Apps Support
 Oracle Support

 SQL Tuning

 Oracle UNIX
 Oracle Linux
 Remote s
 Remote plans
 Application Server

 Oracle Forms
 Oracle Portal
 App Upgrades
 SQL Server
 Oracle Concepts
 Software Support

 Remote S


 Consulting Staff
 Consulting Prices
 Help Wanted!


 Oracle Posters
 Oracle Books

 Oracle Scripts

Don Burleson Blog 









Oracle Data cleansing anomalies

Oracle Database Tips by Donald Burleson

Oracle data cleansing is the science of quickly locating and repairing data anomalies by comparing the universe of values within the database.  In other worlds, if a data item "stands-out" (i.e. it is scientifically termed an "outliner", which any statistically significant variance from a mean population), the best-practice procedure is to "flag" the data as "questionable" and forward it to the human "expert" who approves every suggestion. 

The nature of data anomalies

Any Oracle application can have data-entry errors, and it is not always the fault of poor-quality data entry.  For example, consider a database that tracks information about elderly people.  Many older folks forget their birthday's and some even forget their names and data anomalies are guaranteed, no matter how comprehensive the front-end.

Let's use a hospital clinic management database to understand the common causes of data anomalies.

There are many sources of patient data quality concerns, almost all relating to data entry inconsistencies.  Names can be mis-spelled or transposed, and important supplemental information (e.g. SSN, age) might be entered incorrectly.

  • Inconsistent Patient information - One patient might appear to be many patients (as in the case where patient ID, or the SSN contains typo's or transposition errors), and conversely, many patients might appear to be one patient.  This anomaly is more rare, but possible, such as a case where a typo relocates a clinic encounter into the "history" of another patient.

  • Inconsistent diagnosis codes - Many clinics use a "check the box" approach for the physician to enter their diagnoses.  To minimize data entry errors, clinics use well-defined diagnosis codes (ICD-9 and ICD-10), but these codes have some limitations.  The diagnosis codes are displayed in a "choose all that apply" format, and the code range from very general to very specific.  It's not uncommon for a doctor to under-diagnose a patient by choosing a broad disease code, while failing to check-off the more descriptive codes.

This is an interesting area of Oracle data cleansing because medical records are legal documents and there could be serious ramifications to changing a physician's diagnosis.  Inconsistent or incorrect ICD-9 diagnosis codes can make it impossible to track diseases or the efficacy of a treatment.  There are many variations on the ICD-9 diagnosis codes, over 50 disparate clinics and hospitals in the data feed. 

The first step is to identify the "outlier" codes, figure-out how to standardize them, one at a time, using the expert judgment of the Data Quality Officer.  Once basic cleansing rules are in-place, the uniqueness of bad ICD-9 code becomes easy to spot (because they will not occur more than a few times in the whole database).  In the case of ICD-9 diagnosis codes, "outliers" can be marked as invalid (using a Data Quality "status" flag, deleted or changed), all at the discretion of the DQO.

Before we get into details, let's start with an overview of the design approach for a Decision Support System (DSS) for Oracle data quality assurance.

For expert Oracle data cleansing support and data scrubbing consulting, use an expert from BC.  We understand the powerful Oracle data unification tools, and we can aid in improving the data quality of any Oracle database, large or small.



Oracle Training at Sea
oracle dba poster

Follow us on Twitter 
Oracle performance tuning software 
Oracle Linux poster


Burleson is the American Team

Note: This Oracle documentation was created as a support and Oracle training reference for use by our DBA performance tuning consulting professionals.  Feel free to ask questions on our Oracle forum.

Verify experience! Anyone considering using the services of an Oracle support expert should independently investigate their credentials and experience, and not rely on advertisements and self-proclaimed expertise. All legitimate Oracle experts publish their Oracle qualifications.

Errata?  Oracle technology is changing and we strive to update our BC Oracle support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:  

and include the URL for the page.


Burleson Consulting

The Oracle of Database Support

Oracle Performance Tuning

Remote DBA Services


Copyright © 1996 -  2020

All rights reserved by Burleson

Oracle ® is the registered trademark of Oracle Corporation.