When to use O-Cluster Analysis

Data warehouse tips by Burleson Consulting

This is an excerpt from Dr. Ham's premier book "Oracle Data Mining: Mining Gold from your Warehouse".

O-Cluster, a proprietary Oracle algorithm, has the advantage of handling large numbers of attributes (high dimensionality) and is more appropriate for data sets with many cases (more than 500). The number of leaf clusters is determined automatically. The points where splits occur can provide insight into how the data is structured and can help in selecting features that discriminate among cohorts of cases.
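To make the idea of a split point concrete: O-Cluster is generally described as partitioning the data with axis-parallel cuts placed in low-density valleys of per-attribute histograms. The sketch below is a simplified illustration of that valley idea on a single histogram, not Oracle's proprietary implementation. The function name `find_split` and the histogram values are hypothetical, and the statistical checks the real algorithm applies before accepting a split are omitted.

```python
def find_split(counts):
    """Return the index of the deepest valley bin in a 1-D histogram.

    A valley is a bin lower than the tallest bin on its left AND the tallest
    bin on its right; its depth is measured against the lower of those two
    peaks. Returns None when there is no valley (e.g. a monotone histogram).
    """
    best_idx, best_depth = None, 0
    for i in range(1, len(counts) - 1):
        left_peak = max(counts[:i])
        right_peak = max(counts[i + 1:])
        depth = min(left_peak, right_peak) - counts[i]
        if depth > best_depth:  # keep the deepest valley seen so far
            best_idx, best_depth = i, depth
    return best_idx


# Hypothetical histogram of one attribute: two groups separated by a sparse region.
hist = [2, 9, 15, 8, 1, 0, 3, 12, 18, 6]
print(find_split(hist))  # 5
```

A cut at bin 5 separates the two dense groups, which is the kind of split point that hints at how the data is structured.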

Be aware that O-Cluster does not necessarily use all the input data when it builds a model. The algorithm reads 50,000 rows in a batch and reads another batch only if there is reason to suspect that more clusters exist. Therefore, O-Cluster may stop building the model before all the data is read in, so it is highly recommended that the data be randomized (so that each batch is a representative sample) and then discretized using equi-width binning after clipping to handle outliers. Of course, the Build Activity Wizard in ODMr takes care of all data preparation for you.
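ODMr performs these preparation steps automatically, but a minimal Python sketch of the same sequence for one numeric column may clarify what they do. Assumptions: clipping replaces values beyond the 5th and 95th percentiles with the percentile value itself (winsorizing), and 10 equi-width bins are used. Both are illustrative choices, and `prepare` is a hypothetical helper, not an ODMr API.

```python
import random


def prepare(values, n_bins=10, low_pct=5, high_pct=95, seed=42):
    """Randomize row order, clip outliers, then apply equi-width binning.

    Returns (original_value, bin_number) pairs in the randomized order.
    Bin numbers run from 1 to n_bins. Clipping replaces values outside the
    [low_pct, high_pct] percentile bounds with the bound itself.
    """
    rows = list(values)
    random.Random(seed).shuffle(rows)  # randomize so every batch is representative

    ordered = sorted(rows)
    last = len(ordered) - 1
    lo = ordered[low_pct * last // 100]   # lower clipping bound
    hi = ordered[high_pct * last // 100]  # upper clipping bound
    width = (hi - lo) / n_bins or 1.0     # avoid division by zero for constant data

    result = []
    for v in rows:
        clipped = min(max(v, lo), hi)
        bin_no = min(int((clipped - lo) / width), n_bins - 1) + 1
        result.append((v, bin_no))
    return result


# Hypothetical column: values 0..99 plus one extreme outlier.
data = list(range(100)) + [10_000]
bins = dict(prepare(data))
print(bins[0], bins[50], bins[10_000])  # 1 6 10
```

Without clipping, the single value 10,000 would stretch the bin width to about 1,000 and squeeze all the ordinary values into the first bin; clipping first keeps the equi-width bins meaningful.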

Applying the Cohort Cluster

Now that we have defined cohorts of our population, we're interested in applying the cluster definition to a new dataset. We have downloaded the COIL test dataset. Using the Copy Table Wizard in ODMr, create an exact copy of the COIL build dataset named COIL_TEST. Then delete all rows in the table using the following SQL statements in the SQL Worksheet:

delete from coil_test;
commit;


Next, we create a control file for SQLLDR with the following (note that the target attribute CARAVAN is not included):

load data
infile '\\folder\shareddocs\coil_test.dat.csv'
into table dmuser_book.coil_test
fields terminated by ','
  MFGEKIND, MFWEKIND, MOPLHOOG, MOPLMIDD, MOPLLAAG, MBERHOOG,
  MBERZELF, MBERBOER, MBERMIDD, MBERARBG, MBERARBO, MSKA,
  MSKB1, MSKB2, MSKC, MSKD, MHHUUR, MHKOOP, MAUT1, MAUT2,
  MAUT0, MZFONDS, MZPART, MINKM30, MINK3045, MINK4575, MINK7512,
  MINK123M, MINKGEM, MKOOPKLA, PWAPART, PWABEDR, PWALAND,
  PPERSAUT, PBESAUT, PMOTSCO, PVRAAUT, PAANHANG, PTRACTOR,
  PWERKT, PBROM, PLEVEN, PPERSONG, PGEZONG, PWAOREG, PBRAND,
  PZEILPL, PPLEZIER, PFIETS, PINBOED, PBYSTAND, AWAPART, AWABEDR,
  AWALAND, APERSAUT, ABESAUT, AMOTSCO, AVRAAUT, AAANHANG,
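In SQL*Loader syntax the field names follow FIELDS TERMINATED BY as a comma-separated list enclosed in parentheses, and they must appear in the same order as the fields in the data file. Because typing the full column list by hand is error-prone, one option is to generate the control file. The hypothetical helper `control_file` below is a sketch that builds the control-file text from a column list and drops the CARAVAN target; the three-column list in the usage line is for illustration only. Note also that SQL*Loader's default load method, INSERT, requires the target table to be empty, which is why the rows were deleted first.

```python
def control_file(columns, infile, table, exclude=("CARAVAN",)):
    """Build SQL*Loader control-file text for a comma-separated data file.

    `columns` must list the table's columns in the same order as the fields
    in the data file; any column named in `exclude` (here the target) is dropped.
    """
    cols = [c for c in columns if c.upper() not in exclude]
    col_list = ",\n".join("  " + c for c in cols)
    return (
        "load data\n"
        f"infile '{infile}'\n"
        f"into table {table}\n"
        "fields terminated by ','\n"
        f"(\n{col_list}\n)\n"
    )


# Hypothetical three-column subset, for illustration only.
print(control_file(["MFGEKIND", "MFWEKIND", "CARAVAN"],
                   "coil_test.dat.csv", "dmuser_book.coil_test"))
```

Writing the returned text to coil.ctl gives a control file whose column list is properly parenthesized and free of a trailing comma.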



Next, run the following command at the command prompt in the directory where the SQLLDR control file (coil.ctl) is located, substituting your own user name, password, and database connect identifier as appropriate.

C:\scripts>sqlldr dmuser/pswd@database control=coil.ctl log=coil.log
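Before loading, or when reconciling the counts afterward, it may be worth confirming that the data file has the expected number of records and fields per record. The sketch below assumes the test file contains 85 comma-separated fields per line (the 86 attributes minus CARAVAN) and no quoted line breaks; `check_data_file` is a hypothetical helper. On the real file, pass it a file object opened with `newline=""` and compare the record count with the rows-loaded summary in coil.log.

```python
import csv
import io


def check_data_file(lines, expected_fields=85):
    """Count records in comma-separated input and flag lines with the wrong field count.

    Returns (record_count, bad_line_numbers). Blank lines are skipped.
    """
    count, bad = 0, []
    for line_no, row in enumerate(csv.reader(lines), start=1):
        if not row:
            continue  # blank line: not a record
        count += 1
        if len(row) != expected_fields:
            bad.append(line_no)
    return count, bad


# Tiny in-memory sample standing in for coil_test.dat.csv.
sample = io.StringIO("1,2,3\n4,5\n\n6,7,8\n")
print(check_data_file(sample, expected_fields=3))  # (3, [2])
```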

You will now have 4000 records for 86 attributes in the apply dataset. Choose Activity, Apply from the menu, and select the "Cluster Build Model" that we built previously.

Select the table just created as the "Apply Data Source", and pick any attributes that you want included in the result set. The COIL test data table does not have a unique identifier, so ODMr will create one for you. In step 4 of 5 of the Apply Activity Wizard, you can choose Most Probable Cluster, Specific Cluster ID, or Number of Best Cluster IDs as the output option.

The Most Probable Cluster ID option, which is the default, assigns each record to the cluster with the highest probability. These results are shown below. Each case in the result table is assigned to a cluster with an associated probability.
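Conceptually, this option is an argmax over each case's cluster probabilities. The sketch below assumes the apply output has been read into a dictionary mapping each case ID to {cluster_id: probability}; that layout, the function name `most_probable`, and the numbers are hypothetical, since ODMr itself writes the results to a table. When two clusters tie, `max` keeps the first one encountered.

```python
def most_probable(probs_by_case):
    """Return {case_id: (cluster_id, probability)} for each case's top cluster."""
    return {
        case_id: max(probs.items(), key=lambda item: item[1])
        for case_id, probs in probs_by_case.items()
    }


# Hypothetical per-case cluster probabilities (cluster_id -> probability).
scores = {
    1: {2: 0.10, 5: 0.70, 7: 0.20},
    2: {2: 0.60, 5: 0.30, 7: 0.10},
}
print(most_probable(scores))  # {1: (5, 0.7), 2: (2, 0.6)}
```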

When specific clusters are chosen, the output is shown below. For each cluster chosen, ODMr displays a column with the probability of that case belonging to the cluster. Selecting the records with a probability greater than 0.85, for instance, yields a cohort of customers who fit a particular profile.
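The thresholding step described above can be sketched as a simple filter. The data layout (case ID mapped to per-cluster probabilities), the helper name `cohort`, and the scores are hypothetical; against the real apply result table, the same rule would be a WHERE clause on the probability column for the chosen cluster.

```python
def cohort(probs_by_case, cluster_id, threshold=0.85):
    """Return sorted case IDs whose probability for `cluster_id` exceeds `threshold`."""
    return sorted(
        case_id
        for case_id, probs in probs_by_case.items()
        if probs.get(cluster_id, 0.0) > threshold
    )


scores = {
    101: {5: 0.91, 7: 0.05},
    102: {5: 0.85, 7: 0.10},  # exactly at the threshold: excluded by ">"
    103: {5: 0.40, 7: 0.95},
}
print(cohort(scores, 5))  # [101]
print(cohort(scores, 7))  # [103]
```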

If you choose Number of Best Cluster IDs, you get, for each case, the requested number of best-matching clusters with their corresponding probabilities, as shown below.
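Returning the N best clusters amounts to a top-N selection over each case's probabilities. A minimal sketch using Python's `heapq.nlargest`, assuming a hypothetical layout in which one case's scores are held as {cluster_id: probability}; the helper name `best_n` and the values are illustrative only.

```python
import heapq


def best_n(probs, n=2):
    """Return the n (cluster_id, probability) pairs with the highest probability."""
    return heapq.nlargest(n, probs.items(), key=lambda item: item[1])


print(best_n({2: 0.10, 5: 0.70, 7: 0.20}, n=2))  # [(5, 0.7), (7, 0.2)]
```

Pairs come back in descending order of probability, so the first entry always matches the Most Probable Cluster result for that case.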


For more tips and tricks for Oracle data warehouse analysis, see Dr. Ham's premier book "Oracle Data Mining: Mining Gold from your Warehouse"


Copyright © 1996 -  2017

All rights reserved by Burleson

Oracle ® is the registered trademark of Oracle Corporation.
