Viewing Model Histograms

Data warehouse tips by Burleson Consulting

This is an excerpt from Dr. Ham's premier book "Oracle Data Mining: Mining Gold from your Warehouse".

When you view the histogram for each attribute, keep in mind that the sample count is smaller than the 581,012 cases in the entire dataset.  If you want to report the average, max, min and variance for the whole dataset, you can change the sampling size to 582,000 by going to Tools, Preferences, Sampling.  As you can see when viewing the target forest cover, 85% of the forest trees are spruce and pine.
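To illustrate why sample statistics can differ from whole-dataset statistics, here is a minimal Python sketch. It is not ODMr code: the column values are randomly generated stand-ins, and the names `summarize`, `full_column` and `sample` are made up for the example.

```python
import random
import statistics

def summarize(values):
    """Return average, max, min and population variance for a list of numbers."""
    return {
        "avg": statistics.fmean(values),
        "max": max(values),
        "min": min(values),
        "variance": statistics.pvariance(values),
    }

# Hypothetical stand-in for one numeric attribute; these are not real data.
rng = random.Random(42)
full_column = [rng.gauss(2950, 280) for _ in range(20_000)]

# A histogram built on a sample only sees part of the column.
sample = rng.sample(full_column, 2_000)

full_stats = summarize(full_column)
sample_stats = summarize(sample)

# The sample can never exceed the full column's range, and its average
# and variance are only estimates of the whole-dataset values.
for key in ("avg", "max", "min", "variance"):
    print(f"{key:>8}: sample={sample_stats[key]:.2f}  full={full_stats[key]:.2f}")
```

The sample's max and min can only fall inside the range of the full column, which is why extreme values may be missing from a sampled histogram.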

Do we actually need all 55 columns to build a model?  If you look at some of the Soil Type (ST) variables, you'll see that for STs 10, 11, 12, 17, 18 and 19 (to name a few) there are not many samples.  It is unlikely that all attributes will contribute to a predictive model.  Some of them may in fact simply add noise and detract from the model's value.
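One way to spot thinly populated indicator columns outside the tool is to count the positive cases per column. The sketch below assumes each soil type is stored as a 0/1 column; the column names `ST10` and `ST29`, the toy rows and the threshold are hypothetical.

```python
def sparse_columns(rows, min_count):
    """Return names of 0/1 indicator columns with fewer than min_count positive cases."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            counts[name] = counts.get(name, 0) + (1 if value else 0)
    return sorted(name for name, n in counts.items() if n < min_count)

# Hypothetical toy rows; assumed layout is one 0/1 column per soil type.
rows = [
    {"ST10": 0, "ST29": 1},
    {"ST10": 0, "ST29": 1},
    {"ST10": 1, "ST29": 0},
    {"ST10": 0, "ST29": 1},
]

# Columns with fewer than two positive cases are candidates for review.
print(sparse_columns(rows, min_count=2))
```

A low count alone does not prove a column is useless; it only flags it as a candidate to check with Attribute Importance.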

Attribute Importance in the Model

ODMr has an Attribute Importance feature that ranks the attributes by significance in determining the target value.  Attribute Importance can be used to reduce the size of a classification problem by eliminating some attributes, which can increase speed and accuracy when building models.
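The excerpt does not describe how ODMr computes the ranking, so the following Python sketch swaps in a different, simpler technique, mutual information between each attribute and the target, purely to show what "ranking attributes by significance" can look like. The toy values are hypothetical and the scores are not comparable to ODMr's importance values.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length categorical sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c * n) / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

def rank_attributes(table, target):
    """Rank every non-target column by mutual information with the target, highest first."""
    scores = {
        name: mutual_information(col, table[target])
        for name, col in table.items()
        if name != target
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data; the values are invented for illustration only.
table = {
    "HOUSEHOLD_SIZE": ["3", "3", "1", "1", "3", "1"],
    "COUNTRY_NAME": ["US", "US", "US", "UK", "US", "US"],
    "AFFINITY_CARD": [1, 1, 0, 0, 1, 0],
}

ranking = rank_attributes(table, "AFFINITY_CARD")
for name, score in ranking:
    print(f"{name}: {score:.3f} bits")
```

Attributes near the bottom of such a ranking are the ones you would consider de-selecting before rebuilding the model.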

Let's revisit the Naïve Bayes analysis we completed in the last chapter.  We'll use ODMr's Attribute Importance analysis to find the highest ranking attributes and use these to build another model.

Pick Attribute Importance under Activity Build and choose MINING_DATA_BUILD_US as the case table.  Use Customer ID as the unique identifier, and keep the default columns that the activity chooses to build the model.  Finish and run the Activity.

Upon completion, we can view the ranking results. 

You can see that Attribute Importance ranked HOUSEHOLD_SIZE as the most important attribute, followed by marital status and so on.  Now we'll enter this information in a new Naïve Bayes model.

Using the new Naïve Bayes

Under Activity pick Build, then Classification as the function type and Naïve Bayes as the algorithm.

1.      Choose MINING_DATA_BUILD_V_US for the case table, and customer ID (CUST_ID) as the unique identifier.  De-select BULK_PACK_DISKETTES, COUNTRY_NAME, CUST_INCOME_LEVEL, FLAT_PANEL_MONITOR, OS_DOC_SET_KANJI, and PRINTER_SUPPLIES from the Select Columns box.  

2.      Click Next and check AFFINITY_CARD for the target column.  Keep the Preferred Target Value of 1, and name the activity MINING_DATA_BUILD_US_NB2.

3.      When you click Finish, the Activity Wizard will show the progress of sampling, discretizing, splitting, building and testing the new model.  Click on "Result", "Accuracy" and "More Detail" to view the confusion matrix.

Table 1 compares the predictive accuracy, average accuracy, overall accuracy, and total cost of the two models.  The differences appear to be negligible, showing that you can drop one third of the data columns without losing accuracy in the model, possibly saving time and money.
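As a rough illustration of how accuracy figures can be derived from a confusion matrix, the Python sketch below computes overall accuracy (correct cases over all cases) and average accuracy (mean of the per-class accuracies). These definitions are assumptions for the example and may not match ODMr's exact calculations; the counts in `model_a` and `model_b` are hypothetical, not the values from Table 1.

```python
def accuracy_metrics(confusion):
    """Compute accuracy figures from a square confusion matrix.

    confusion[i][j] is the number of cases whose actual class is i
    and whose predicted class is j.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    # Per-class accuracy: share of each actual class that was predicted correctly.
    per_class = [row[i] / sum(row) for i, row in enumerate(confusion)]
    return {
        "overall_accuracy": correct / total,
        "average_accuracy": sum(per_class) / len(per_class),
    }

# Hypothetical confusion matrices for two models (rows: actual, columns: predicted).
model_a = [[80, 20], [10, 40]]
model_b = [[78, 22], [9, 41]]

for label, matrix in (("model A", model_a), ("model B", model_b)):
    metrics = accuracy_metrics(matrix)
    print(label, {k: round(v, 3) for k, v in metrics.items()})
```

Comparing the two dictionaries side by side is the same kind of check the table makes: if the figures barely move after dropping columns, the dropped columns were contributing little.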

Now let's return to the forest cover dataset.  Pick Attribute Importance under Activity Build to find the predictor attributes that may have the most effect in our model.

1.      Choose COVER_TYPE_IMP as the case table, and Compound or None for the Unique Identifier. 

2.      Select target (forest cover) as the target column and make sure that it is set properly as a categorical mining type. 

3.      Type in a name for the Mining Activity and view the advanced settings before running the activity.  We will not change any of the default values for this analysis. 

4.      Click "Finish" when you are ready to create the activity.  This model may take a while to run, since the dataset is large and there is no unique identifier.  You can view the progress of the steps "Sample", "Discretize" and "Build" as they run.
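The "Discretize" step turns numeric attributes into a small number of bins before the model is built. The excerpt does not say which binning strategy ODMr applies, so the following Python sketch uses equal-width binning purely as an assumed example; the function name and the sample values are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Assign each numeric value to one of n_bins equal-width bins (0-based index)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls in the first bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # Clamp the maximum value into the last bin instead of opening a new one.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical numeric attribute values.
elevations = [1860, 2400, 2950, 3100, 3850]
print(equal_width_bins(elevations, n_bins=4))
```

After binning, each numeric attribute behaves like a categorical one, which is the form that count-based methods such as Naïve Bayes work with.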

 

For more tips and tricks for Oracle data warehouse analysis, see Dr. Ham's premier book "Oracle Data Mining: Mining Gold from your Warehouse"

You can buy it direct from the publisher for 30%-off:

http://www.rampant-books.com/book_2006_1_oracle_data_mining.htm


 

 

Copyright © 1996 -  2017

All rights reserved by Burleson

Oracle ® is the registered trademark of Oracle Corporation.
