Adaptive Bayes Network Single Feature Model

Data warehouse tips by Burleson Consulting

This is an excerpt from Dr. Ham's premier book "Oracle Data Mining: Mining Gold from your Warehouse".

But first, we'll build a new Adaptive Bayes Network model. Instead of using the Single Feature model type as we did previously, choose Multi Feature, maximize the Overall Accuracy, set the target attribute to forest cover = 3, and sample 20,000 rows.

Multi Feature will not give us the rules as before, but it builds multiple features, improving the model with each one and resulting in a more effective model.

The new model improved the accuracy for the target class from 21% to 45%, correctly classifying 42% of the ponderosa pines while keeping the overall accuracy around 60%.  Lower cost is an indicator of improved prediction, and the cost of the Multi Feature model is 11,804 compared to 13,454 for the Single Feature model.

By changing the type of model used to classify our data, we were able to influence the predictive accuracy of our model.  As mentioned earlier, there are two ways to nudge the model into producing different results that we might be more interested in seeing. 

One method is to introduce cost bias into our build model; the ways of doing this were described in Chapter One.  To review, go to the ROC tab in the result viewer of your Mining Activity.

The Receiver Operating Characteristic (ROC) metric shows how the predictions change as the Cost Matrix is modified.  We want to predict more of the ponderosa pines and avoid false negative predictions.  Under the ROC curve, there are two boxes labeled False Positive Cost and False Negative Cost.  Type 3 in the False Negative Cost box, telling the model that a false negative is three times more costly than a false positive error, and click Compute Cost.  Note that the red line jumps to the right, and in the detail section the line with probability threshold 0.216 is highlighted.  The confusion matrix changes to show that there are 22 false negatives, 200 false positives, 215 true positives, and 3,556 true negatives.
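The effect of a cost matrix can be sketched outside the ODMr GUI. A minimal illustration (not Oracle's implementation), using the confusion-matrix counts reported above for threshold 0.216, shows how weighting false negatives three times as heavily changes the total misclassification cost:

```python
# Illustrative sketch of a cost matrix re-weighting errors when a
# probability threshold is evaluated. The counts are the ones reported
# in the text for threshold 0.216.
def weighted_cost(fp, fn, fp_cost=1.0, fn_cost=1.0):
    """Total misclassification cost given error counts and unit costs."""
    return fp * fp_cost + fn * fn_cost

# Confusion-matrix counts at threshold 0.216 (from the result viewer):
tp, tn, fp, fn = 215, 3556, 200, 22

equal_cost = weighted_cost(fp, fn)                # FP and FN weighted equally
biased_cost = weighted_cost(fp, fn, fn_cost=3.0)  # FN three times as costly

print(equal_cost)   # 222.0
print(biased_cost)  # 266.0
```

Raising the false negative cost makes thresholds that admit more false positives (but catch more true positives) look cheaper by comparison, which is why the red line jumps to the right.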

To modify the model test results, return to the Mining Activity and click on Select ROC threshold in the Test Metrics section.  The costs for False Positive and False Negative are assumed to be equal and are set to 1 by default.

Now change the setting from a probability of 0.5 to 0.216 and notice that the false negative cost is now 3.63.  Click OK to save the settings; the ROC Threshold is now changed.  The new cost bias will be used when the model is applied to a dataset.

We have now built two types of Adaptive Bayes Network models, the Single Feature and the Multi Feature.  There is one more type in ODMr, the Pruned Naive Bayes, which is very similar to the Naive Bayes that we used in Chapter One, although the results will not be exactly what you'd get using the Naive Bayes Classification model directly.  In the next section we'll look at Decision Trees, which, like the Adaptive Bayes Network Single Feature model, give rules for determining how cases are classified.
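For readers unfamiliar with the Naive Bayes idea these models build on, here is a minimal sketch (illustrative only, not the ODMr Pruned Naive Bayes implementation): a class score is the class prior times the product of per-attribute conditional probabilities, assuming the attributes are independent. The toy attributes and class names below are hypothetical.

```python
# Minimal Naive Bayes sketch: score each class by
# prior * product of P(attribute value | class), pick the highest.
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Learn class priors and per-attribute value counts."""
    priors = Counter(labels)
    conds = defaultdict(Counter)  # (attr_index, class) -> value counts
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            conds[(i, y)][v] += 1
    return priors, conds

def predict_nb(priors, conds, row):
    """Pick the class with the highest prior * product of conditionals."""
    total = sum(priors.values())
    best, best_p = None, -1.0
    for y, n in priors.items():
        p = n / total
        for i, v in enumerate(row):
            p *= conds[(i, y)][v] / n
        if p > best_p:
            best, best_p = y, p
    return best

# Hypothetical toy data: two attributes, classes "pine" and "fir".
rows = [("high", "dry"), ("high", "wet"), ("low", "dry"), ("low", "wet")]
labels = ["pine", "pine", "fir", "fir"]
priors, conds = train_nb(rows, labels)
print(predict_nb(priors, conds, ("high", "dry")))  # pine
```

The Adaptive Bayes Network relaxes the independence assumption by building features over groups of attributes, which is what the Single and Multi Feature model types control.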

In many types of data, the target values may comprise only a very small percentage of cases.  In hospital data, for example, preventable hospitalizations are somewhat rare events, accounting for a very small percentage of admissions.  Likewise, of all the users hitting a website, only a small number of visits are initiated by hackers with malicious intent.  A classification model built on datasets containing only a few known cases will not be able to discriminate very effectively between the two classes.

The model may in fact predict that no cases are preventable, or that none are hacker attempts, and it will be 98% to 100% correct!  However, we really have not learned anything about these cases, and the model is not very effective.
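This accuracy paradox is easy to demonstrate with a hypothetical dataset (the 2% positive rate below is an assumption for illustration): a "model" that always predicts the majority class scores high accuracy while detecting none of the rare cases.

```python
# Illustration of the accuracy paradox: on data where positives are
# rare, always predicting the majority class scores very high accuracy
# while learning nothing about the rare class.
def always_negative(cases):
    return [0] * len(cases)  # predicts "not preventable" for every case

# Hypothetical dataset: 2 positive cases out of 100 (2% positive rate).
labels = [1] * 2 + [0] * 98
preds = always_negative(labels)

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
recall = sum(p == y == 1 for p, y in zip(preds, labels)) / labels.count(1)

print(accuracy)  # 0.98 -- looks excellent
print(recall)    # 0.0  -- but no rare case was ever detected
```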

What we want to do is use case data for the model build that has approximately equal numbers of positive and negative cases.  However, the algorithm will take this distribution as if it were realistic, so we need to supply the actual distribution of target values, called the Prior Distribution (Priors), so the build process will result in more meaningful models.  The ODMr classification models will use Priors when you specify stratified sampling in the Mining Activity advanced settings for sampling.
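The arithmetic behind Priors can be sketched as follows. This is an illustration of the general Bayes-rule correction, not Oracle's internal implementation, and the function name and numbers are assumptions: a probability estimated on a 50/50 balanced sample is rescaled back to the true, rare-event distribution.

```python
# Sketch of why Priors matter: a model trained on a 50/50 balanced
# sample over-estimates the positive probability; supplying the true
# prior distribution lets us rescale its scores to realistic values.
def adjust_for_priors(p_balanced, true_pos_prior, train_pos_prior=0.5):
    """Rescale a probability from the balanced training distribution
    to the true distribution via Bayes' rule."""
    pos = p_balanced * true_pos_prior / train_pos_prior
    neg = (1 - p_balanced) * (1 - true_pos_prior) / (1 - train_pos_prior)
    return pos / (pos + neg)

# A score of 0.8 on the balanced sample, when positives are really only
# 2% of cases, corresponds to a much lower real-world probability:
print(round(adjust_for_priors(0.8, true_pos_prior=0.02), 4))  # 0.0755
```

Without this correction, a model built on a balanced sample would wildly over-predict the rare class when applied to real data.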

Enhanced Adaptive Bayes Network (ABN)

Oracle Data Mining (ODM) incorporates supervised and unsupervised learning models. Supervised learning models, sometimes called directed models, are used to predict a value. One supervised model, the Adaptive Bayes Network (ABN), is a data-mining algorithm that provides decision-tree-like functionality in the database.  Oracle Database 10g has made the following ABN enhancements:

- Enable access through a Java API to a prediction's supporting rules.

- Enable the user to control accuracy, performance, and output parameters on a model-by-model basis rather than through a configuration table setting.

- Automatically select a best Naive Bayes Model baseline.


 

 

Next, we'll try this using the Decision Tree classification model. 
 

For more tips and tricks for Oracle data warehouse analysis, see Dr. Ham's premier book "Oracle Data Mining: Mining Gold from your Warehouse"

You can buy it directly from the publisher at 30% off:

http://www.rampant-books.com/book_2006_1_oracle_data_mining.htm


 

 
