Using O-Cluster

Data warehouse tips by Burleson Consulting

This is an excerpt from Dr. Ham's premier book "Oracle Data Mining: Mining Gold from your Warehouse".

O-Cluster is an Oracle proprietary, density-based clustering algorithm that does not use distance formulas.  Technical details about the algorithm can be found in Milenova and Campos's paper "Clustering Large Databases with Numeric and Nominal Values Using Orthogonal Projections" at http://www.oracle.com/technology/products/bi/odm/pdf/ocluster_wnominal_data.pdf.

According to the Oracle Data Miner Tutorial, O-Cluster finds "natural" clusters by identifying areas of density within the data, up to the maximum number entered as a parameter. That is, the algorithm is not forced to produce a user-specified number of clusters, so cluster membership is more clearly defined.

O-Cluster Sensitivity Settings

The Sensitivity setting determines how sensitive the algorithm is to differences in the characteristics of the population. O-Cluster determines areas of density by looking for a "valley" separating two "hills" of density in the distribution curve of an attribute. A lower sensitivity requires a deeper valley; a higher sensitivity allows a shallow valley to define differences in density.

Thus, a higher sensitivity value usually leads to a higher number of clusters.  If the build operation is very slow, you can increase the Maximum Buffer Size in an attempt to improve performance.
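The valley-and-hill idea can be illustrated with a small Python sketch. This is not Oracle's implementation; the actual test is described in the paper cited above. The function name find_valleys, the 0-to-1 sensitivity scale, and the depth rule (required depth = 1 - sensitivity) are all assumptions made only for illustration.

```python
def find_valleys(counts, sensitivity):
    """Return indices of histogram bins that separate two hills of density.

    Simplified illustration only (assumed rule, not Oracle's O-Cluster test):
    a local minimum counts as a valley when its relative depth below the
    lower of the two surrounding peaks is at least (1 - sensitivity).
    """
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity is assumed to lie between 0 and 1")
    required_depth = 1.0 - sensitivity  # lower sensitivity -> deeper valley needed
    valleys = []
    for i in range(1, len(counts) - 1):
        # Consider only local minima (the first bin of a flat bottom).
        if not (counts[i] < counts[i - 1] and counts[i] <= counts[i + 1]):
            continue
        lower_hill = min(max(counts[:i]), max(counts[i + 1:]))
        if lower_hill == 0:
            continue
        depth = 1.0 - counts[i] / lower_hill
        if depth >= required_depth:
            valleys.append(i)
    return valleys


# Made-up histogram of one attribute: one shallow valley and one deep valley.
histogram = [2, 9, 12, 8, 7, 9, 11, 3, 1, 6, 2]
low = find_valleys(histogram, sensitivity=0.2)
high = find_valleys(histogram, sensitivity=0.7)
print("valleys at low sensitivity:", low)
print("valleys at high sensitivity:", high)
```

On this made-up histogram, the low setting accepts only the deep valley, while the high setting also accepts the shallow one, so it would split the attribute into more regions.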

For our example, we'll use the ODMr K-means algorithm to find clusters in the CoIL dataset, found at http://kdd.ics.uci.edu/databases/tic/tic.html.  The build dataset used in the CoIL 2000 Challenge has 86 attributes and 5822 descriptions of customers of a Dutch insurance company.  The target attribute is #86, "CARAVAN", which is the number of mobile home policies.

Using K-Means for Clustering

As we start, be sure to name the file with the ".dat" extension and save it as comma delimited if you use the ODMr Import Wizard.

After importing the dataset, we will examine the histogram of the target attribute by right-clicking the case table and choosing Show Summary Single Record.

Examining the K-Means Data

You'll see that there are 348 cases where CARAVAN = 1, approximately 6% of the total.  In order to more clearly distinguish clusters around the target value of interest, we'll stratify the case table so that we have a more even distribution of 1's and 0's for the CARAVAN attribute.

Use the "Stratified Sample" transform wizard to create a new table in which 1/3 of the cases have the target attribute = 1 (customers who have mobile home insurance) and 2/3 of the cases are customers who don't have mobile home insurance.

The new case table will have a total sample count of 1044 cases: 348 with insurance and a random sample of 696 uninsured customers.
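The same stratification can be sketched outside ODMr in a few lines of Python. This is a minimal sketch, not what the wizard does internally: the function stratified_sample, its parameters, and the synthetic rows (which only reproduce the class counts, not the real CoIL data) are assumptions for illustration.

```python
import random


def stratified_sample(rows, target, negatives_per_positive=2, seed=0):
    """Keep every row with target = 1 plus a random sample of target = 0 rows."""
    rng = random.Random(seed)
    positives = [r for r in rows if r[target] == 1]
    negatives = [r for r in rows if r[target] == 0]
    # Two negatives per positive gives the 1/3 vs 2/3 split described above.
    n_negatives = min(len(negatives), negatives_per_positive * len(positives))
    sample = positives + rng.sample(negatives, n_negatives)
    rng.shuffle(sample)
    return sample


# Synthetic stand-in with the same class counts: 5822 rows, 348 with CARAVAN = 1.
rows = [{"id": i, "CARAVAN": 1 if i < 348 else 0} for i in range(5822)]
sample = stratified_sample(rows, "CARAVAN")
insured = sum(r["CARAVAN"] for r in sample)
print(len(sample), insured, len(sample) - insured)
```

Keeping all of the rare class and down-sampling only the common class preserves every insured customer while making the two groups comparable in size.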

Now let's build a new cluster model using the stratified sample, choosing K-means for the algorithm.  There is no unique key for the case data, so we choose "Compound" or "None" for the Unique Identifier.  Note that, in contrast to the classification models, you do not choose a target variable in the Activity Wizard.
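To see why no target variable is needed, here is a minimal sketch of textbook K-means (Lloyd's algorithm) in Python. It is not the ODMr implementation, which may differ in its details; the function kmeans, its parameters, and the toy points are assumptions for illustration. Only the attribute values are passed in, and no target column appears anywhere.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Textbook K-means: returns (centroids, labels) for a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points

    def nearest(p):
        # Index of the centroid with the smallest squared Euclidean distance.
        return min(range(k),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))

    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        # Move each centroid to the mean of its members; keep it if it has none.
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members
            else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p) for p in points]
    return centroids, labels


# Toy data: two visibly separate groups, described by attributes only.
points = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 10), (10, 9)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary; what matters is which cases end up grouped together.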

 

For more tips and tricks for Oracle data warehouse analysis, see Dr. Ham's premier book "Oracle Data Mining: Mining Gold from your Warehouse"

You can buy it directly from the publisher for 30% off:

http://www.rampant-books.com/book_2006_1_oracle_data_mining.htm


 

 


Copyright © 1996 -  2017

All rights reserved by Burleson

Oracle ® is the registered trademark of Oracle Corporation.
