The Decision Tree Classification Model

Data warehouse tips by Burleson Consulting

This is an excerpt from Dr. Ham's premier book "Oracle Data Mining: Mining Gold from your Warehouse".

The Decision Tree algorithm splits the case data by internally choosing the optimal attribute to use at each branching point.  As the data is split, a homogeneity metric is applied to ensure that the attribute values in each branch are predominantly one value or the other.  Branching stops when the algorithm has created 7 (the default) levels of branches in the tree. 
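The depth-limited splitting described above can be sketched with scikit-learn's decision tree (not ODMr itself); the parameter names are scikit-learn's analogues of the ODMr build settings, and the dataset is synthetic:

```python
# Illustrative sketch only: a depth-limited decision tree in scikit-learn,
# assuming the gini homogeneity metric and the 7-level default depth
# described above. The data is synthetic, not the forest cover dataset.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=6, random_state=0)

clf = DecisionTreeClassifier(
    criterion="gini",       # homogeneity metric (gini or entropy)
    max_depth=7,            # branching stops after 7 levels (the default)
    min_samples_split=20,   # analogue of "minimum records for a split"
    min_samples_leaf=10,    # analogue of "minimum records in a node"
)
clf.fit(X, y)
print(clf.get_depth() <= 7)  # True
```

Tightening `min_samples_leaf` or loosening `max_depth` trades tree simplicity against fit, which is the same trade-off the ODMr Advanced Settings expose.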

The Data Mining Activity Build is identical to those we created for the Naïve Bayes and Adaptive Bayes Network models.  Under Advanced Settings on the Final Step page, the build settings include options for the homogeneity metric, maximum depth, minimum records in a node, minimum percent of records in a node, minimum records for a split, and minimum percent of records for a split. 

Using the forest cover data, we'll construct a new Build Activity for Classification using the Decision Tree algorithm, keeping the default settings as shown.  We'll also use priors in the classification build to target ponderosa pines.  Set the sample size to 200,000 cases, using the stratified sampling type. 

When the build finishes, click on the result in the Test Metrics section.  Here we see that the predictive confidence is good at 38%, the average accuracy is 47%, and the overall accuracy is 71%.  However, the accuracy in predicting ponderosa pines (Target = 3) is greatly improved, at 86.5%, as a result of using priors in the model build. 
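To clarify how the two accuracy figures differ, here is a small worked example on a hypothetical 2-class confusion matrix (the actual forest-cover numbers come from ODMr's test report, not from this calculation):

```python
# Hypothetical 2-class confusion matrix, illustrating how overall accuracy
# and average (per-class) accuracy are computed. The numbers are made up.
cm = [[90, 10],   # row 0: actual class 0 -> predicted 0, predicted 1
      [30, 20]]   # row 1: actual class 1 -> predicted 0, predicted 1

total = sum(sum(row) for row in cm)
correct = cm[0][0] + cm[1][1]

overall_accuracy = correct / total                      # correct / all cases
per_class = [cm[i][i] / sum(cm[i]) for i in range(2)]   # recall per class
average_accuracy = sum(per_class) / len(per_class)      # mean class recall

print(round(overall_accuracy, 3))   # 0.733
print(round(average_accuracy, 3))   # 0.65
```

Average accuracy weights every class equally, which is why priors that improve a rare class (like ponderosa pines) can raise it even when overall accuracy barely moves.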

If we look at the Results under the Build Activity section, we see the classification tree and the set of rules for classifying forest cover.  For example, highlighting the 59th row shows one of the rules for predicting ponderosa pines (Target = 3):

IF
    Hillshade_am <= 213.5 AND
    Elevation <= 2408.5 AND
    ST2 is in 0 AND
    Hz_dist_hyd <= 15.0 AND
    Elevation <= 2513.5 AND
    Elevation <= 3044.5
THEN Class = 3.

For this rule, there are 207 cases with 0.17% support and 38% confidence.  The predicted value (3) is the target value of the majority of records in that node.  Confidence is the percentage of records in the node having the predicted target value.  Support is the percentage of cases in the dataset satisfying the rule for that node. 
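A quick back-of-envelope check of these node statistics; the node size and percentages come from the text, while the in-node hit count and the total dataset size are assumed round numbers for illustration:

```python
# Back-of-envelope check of the quoted node statistics.
node_records = 207        # cases reaching this leaf (from the text)
target_hits = 79          # records in the node with target = 3 (assumed)
dataset_size = 120_000    # total build cases (assumed round number)

confidence = target_hits / node_records   # fraction of node with target = 3
support = node_records / dataset_size     # fraction of dataset in the node

print(round(confidence * 100, 1))   # 38.2  (~ the quoted 38% confidence)
print(round(support * 100, 2))      # 0.17  (~ the quoted 0.17% support)
```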

Decision Tree Classification Rules

Decision tree classification is popular because of these easily understandable classification rules.  Scroll down to examine the 116 rules available for this model. 

The classifier chose Elevation for the first split, with a splitting value of 3044.5.  The data is now divided into two sets, one with Elevation <= 3044.5 and the other with Elevation > 3044.5. 

Each partition of the data after the split is more homogeneous than the data before the split, although this is difficult to see in this example due to the complexity of the dataset. 

The Decision Tree chose this attribute to split on after examining all the possible split values for each variable.
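The search over candidate split values can be sketched in a few lines: try each threshold and keep the one that most reduces the gini impurity (one of the homogeneity metrics mentioned above). The tiny dataset here is made up, and a real splitter like ODMr's would typically place the threshold midway between adjacent values (e.g. 3044.5):

```python
# Minimal sketch of exhaustive split search on one attribute.
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, score) minimizing weighted child impurity."""
    best = (None, float("inf"))
    for t in sorted(set(values))[:-1]:          # every candidate threshold
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

elevation = [2400, 2500, 2900, 3100, 3300, 3500]
cover =     [3,    3,    3,    1,    1,    1]    # 3 = ponderosa pine
threshold, impurity = best_split(elevation, cover)
print(threshold, impurity)  # 2900 0.0  (a perfect split on this toy data)
```

On the toy data the children are perfectly homogeneous after the split, which is the idealized version of the impurity reduction the classifier saw when it picked Elevation.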

Check the box "Show Leaves Only" to display only the terminal nodes, or leaves.  These are the nodes used to make the prediction when the model is applied to new data.  Because the Decision Tree is sensitive to missing values when applied to new data, ODMr will assign a surrogate attribute if the attribute is missing in the apply data. 

By highlighting the leaves and clicking the radio button for Surrogate, you can see that ODMr will substitute HILLSHADE_PM or ASPECT in place of HILLSHADE_AM, since these attributes are highly correlated with each other.
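The surrogate idea can be sketched as follows: when the primary split attribute is missing at apply time, route the record using a correlated attribute instead. The attribute names follow the forest-cover example, but the surrogate threshold and fallback branch here are assumptions, not ODMr's actual values:

```python
# Hedged sketch of surrogate routing at one node whose primary split is
# Hillshade_am <= 213.5. The surrogate threshold (150.0) is made up.
def route(record):
    """Return 'left' or 'right' for this node's split."""
    if record.get("HILLSHADE_AM") is not None:
        return "left" if record["HILLSHADE_AM"] <= 213.5 else "right"
    # Surrogate: HILLSHADE_PM correlates with HILLSHADE_AM, so it can
    # stand in when the primary attribute is missing (threshold assumed).
    if record.get("HILLSHADE_PM") is not None:
        return "left" if record["HILLSHADE_PM"] >= 150.0 else "right"
    # No surrogate available either: fall back to the majority branch.
    return "left"

print(route({"HILLSHADE_AM": 200}))                        # left
print(route({"HILLSHADE_AM": None, "HILLSHADE_PM": 100}))  # right
```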

 

For more tips and tricks for Oracle data warehouse analysis, see Dr. Ham's premier book "Oracle Data Mining: Mining Gold from your Warehouse"

You can buy it direct from the publisher for 30%-off:

http://www.rampant-books.com/book_2006_1_oracle_data_mining.htm


Copyright © 1996 -  2017

All rights reserved by Burleson

Oracle ® is the registered trademark of Oracle Corporation.
