Clusters and Cohorts

Data warehouse tips by Burleson Consulting

This is an excerpt from Dr. Ham's premier book "Oracle Data Mining: Mining Gold from your Warehouse".

Clustering data is a very common technique in data mining as well as many other fields, including statistics, bioinformatics, pattern recognition, and machine learning.  Clustering is the unsupervised classification of data, where the subsets of data share common traits.  In previous chapters we have discussed supervised classification, meaning that a target was identified and the accuracy of the prediction followed from how many cases were correctly classified according to the target values.  With clustering algorithms no target is specified; you simply see what patterns the technique discovers.

For example, in a large group of hospital patients you may find clusters composed of those with the same diseases, such as coronary patients, pediatric patients, and so on.  Furthermore, certain cancer patients may exhibit a type of tumor characterized by a certain gene that is sensitive to a specific type of drug treatment.  Clustering can reveal the characteristics of the drugs, genes, and diseases that may respond best to a specific therapy.

Oracle Data Miner (ODMr) has two algorithms for performing cluster analysis: the k-means technique and Orthogonal Partitioning Clustering (O-Cluster).  The enhanced k-means algorithm randomly defines initial centroids, which approximate a "center of gravity", and uses distance measures to calculate the distance between centroids and data objects.  ODMr uses the Euclidean, Cosine, or Fast Cosine distance metric.  The following is from the Oracle Data Mining Forum, in response to "How does ODM cluster algorithm work?", posted May 2, 2006:

"ODM k-means builds a hierarchical tree. When a new cluster is added, the parent node is replaced with two new nodes. Both children have the same centroid as the parent except for a small perturbation in the dimension with most variability. Then a few k-means iterations are run on the two children and the points belonging to the parent are distributed among the two new nodes.

There are a couple of different strategies how to choose which node to split (e.g., size, dispersion).  Once the desired number of leaf nodes is reached, we run k-means across all leaf nodes.

The advantage is that all clusters have reasonable initial centroids and we are unlikely to get dead/empty clusters.  We explode categorical attributes into multiple binary dimensions and compute distances using these new dimensions."
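The procedure described in the forum post can be sketched in a few lines of Python. This is only an illustration of the idea, not Oracle's implementation: the function names, the perturbation size (eps), the iteration counts, and the choice to split the largest leaf are all assumptions made for the example. It shows categorical attributes exploded into binary dimensions, a parent node split by nudging its centroid along the dimension with the most variability, and a final k-means pass across all leaf nodes.

```python
import math

def one_hot(rows, categorical_cols):
    """Explode categorical columns into binary dimensions; numeric columns pass through."""
    levels = {c: sorted({r[c] for r in rows}) for c in categorical_cols}
    encoded = []
    for r in rows:
        vec = []
        for c, value in enumerate(r):
            if c in levels:
                vec.extend(1.0 if value == lvl else 0.0 for lvl in levels[c])
            else:
                vec.append(float(value))
        encoded.append(vec)
    return encoded

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0  # convention assumed here for zero vectors
    return 1.0 - dot / (na * nb)

def mean(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def assign(points, centroids, dist):
    """Put each point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, centroids, dist=euclidean, iters=10):
    """Plain k-means iterations starting from the given centroids."""
    for _ in range(iters):
        clusters = assign(points, centroids, dist)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(c) if c else old for c, old in zip(clusters, centroids)]
    return centroids, assign(points, centroids, dist)

def split(points, dist=euclidean, eps=1e-3):
    """Replace a parent node with two children perturbed along the most variable dimension."""
    parent = mean(points)
    spread = [sum((p[d] - parent[d]) ** 2 for p in points) for d in range(len(parent))]
    d = spread.index(max(spread))
    left, right = parent[:], parent[:]
    left[d] -= eps   # assumption: children are nudged in opposite directions
    right[d] += eps
    return kmeans(points, [left, right], dist, iters=5)

def hierarchical_kmeans(points, k, dist=euclidean):
    """Grow leaves by repeated splitting, then run k-means across all leaves."""
    leaves = [points]
    while len(leaves) < k:
        leaves.sort(key=len)      # assumed strategy: split the largest leaf
        node = leaves.pop()
        _, children = split(node, dist)
        if not all(children):     # node cannot be split any further
            leaves.append(node)
            break
        leaves.extend(children)
    centroids = [mean(leaf) for leaf in leaves]
    all_points = [p for leaf in leaves for p in leaf]
    return kmeans(all_points, centroids, dist)

if __name__ == "__main__":
    # Hypothetical data: one numeric attribute and one categorical attribute.
    rows = [(1.0, "a"), (1.2, "a"), (0.9, "a"), (10.0, "b"), (10.3, "b"), (9.8, "b")]
    centroids, clusters = hierarchical_kmeans(one_hot(rows, [1]), k=2)
    for centroid, members in zip(centroids, clusters):
        print(centroid, len(members))
```

Because every child starts from its parent's centroid, each new cluster begins inside a region that already contains data, which is the reason the post gives for why dead or empty clusters are unlikely. Passing dist=cosine_distance swaps the metric without changing the rest of the sketch.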

 

For more tips and tricks for Oracle data warehouse analysis, see Dr. Ham's premier book "Oracle Data Mining: Mining Gold from your Warehouse".

You can buy it direct from the publisher for 30% off:

http://www.rampant-books.com/book_2006_1_oracle_data_mining.htm



 

Copyright © 1996 -  2017

All rights reserved by Burleson

Oracle ® is the registered trademark of Oracle Corporation.
