SVM and Overfitting

Data warehouse tips by Burleson Consulting

This is an excerpt from Dr. Ham's premier book "Oracle Data Mining: Mining Gold from your Warehouse".

The complexity factor prevents overfitting.  If a model exactly fits the dataset used to build it, it will predict the target attribute in that build dataset with 100% accuracy.  This sounds great until you apply the model to your test dataset and find that the predictive accuracy is poor.  In fact, the model is only useful for the build dataset, because errant or extreme values that appear nowhere else keep the model from being generally applicable to new data.
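The gap between build and test accuracy can be reproduced with a small sketch.  This is not the SVM algorithm or anything from ODMr: it uses a deliberately over-complex "memorizing" classifier (1-nearest-neighbor) on synthetic data, and every name and number in it is an assumption chosen for illustration.

```python
import random

def make_rows(n, rng, noise=0.2):
    """Synthetic rows: the true class is 1 when x > 0, with a fraction of labels flipped."""
    rows = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        label = 1 if x > 0 else 0
        if rng.random() < noise:  # errant values that will not recur in new data
            label = 1 - label
        rows.append((x, label))
    return rows

def nearest_label(build_rows, x):
    """Predict by copying the label of the closest build row (pure memorization)."""
    return min(build_rows, key=lambda row: abs(row[0] - x))[1]

def accuracy(build_rows, rows):
    """Fraction of rows whose label matches the memorizer's prediction."""
    hits = sum(1 for x, label in rows if nearest_label(build_rows, x) == label)
    return hits / len(rows)

rng = random.Random(42)
build_rows = make_rows(200, rng)
test_rows = make_rows(200, rng)

build_accuracy = accuracy(build_rows, build_rows)  # each row finds itself
test_accuracy = accuracy(build_rows, test_rows)    # the noise was memorized, so this drops
print(f"build accuracy: {build_accuracy:.3f}")
print(f"test accuracy:  {test_accuracy:.3f}")
```

The memorizer scores perfectly on the rows it was built from because each row is its own nearest neighbor, while the flipped labels it has memorized hurt it on fresh rows.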

The SVM algorithm will estimate a suitable complexity factor to prevent overfitting by finding the best tradeoff between simplicity and complexity.  If you like, you may rebuild the model and specify a higher complexity factor than the one chosen by SVM, especially if you find that the model is skewing its predictions in favor of one class.
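For reference, the tradeoff is usually written as the textbook soft-margin SVM objective shown here.  This is the standard formulation, not necessarily the exact one Oracle implements; the symbols w, b, ξ and C are the conventional ones, with C playing the role of the complexity factor.

```latex
\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\lVert w \rVert^{2} \;+\; C \sum_{i=1}^{n} \xi_i
\qquad \text{subject to} \qquad
y_i \,(w \cdot x_i + b) \;\ge\; 1 - \xi_i, \qquad \xi_i \ge 0 .
```

The first term favors a simple model with a wide margin, and the second penalizes build cases that fall on the wrong side of the margin.  A larger C fits the build data more closely, while a smaller C tolerates more build errors in exchange for a simpler model.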

Active Learning maintains accuracy while speeding up the model build, and should not be disabled.

Sample SVM Activity

For this exercise, we will keep the default settings.  When the Build and Test Activity Steps are completed, click on Result in the Build section.  There we see that ODMr used the Gaussian kernel function to build the model.  Click on Weights and note that all seasons had equal weight.  The test results show that the predictive confidence is in the good range at 46.3%.

A look at the accuracy of the model indicates that the spring season had the fewest correct predictions, and that many spring days were actually classified as winter.  A more accurate model might be constructed by changing the months designated for the different seasons, guided by better knowledge of Irish weather, but the point here is that the model can differentiate calendar months simply by examining wind speed data.

Re-building the model and forcing the SVM model to use the linear kernel resulted in a very poor model for this dataset, and reduced predictive accuracy to 11.4%.  The cost of the linear kernel was 598 as compared to 502 for the Gaussian, illustrating that the linear model was worse when measuring the relative accuracy of the two models. 
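The two kernels compared above can be written out directly.  This is a minimal sketch using the usual textbook definitions; the function names and the `sigma` parameter are assumptions for illustration and are not ODMr settings.

```python
import math

def linear_kernel(x, y):
    """Linear kernel: the plain dot product of two attribute vectors."""
    return sum(a * b for a, b in zip(x, y))

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel: similarity decays with squared distance, scaled by sigma."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

# Two hypothetical cases with two numeric attributes each.
case_a = [1.0, 2.0]
case_b = [3.0, 4.0]
print(linear_kernel(case_a, case_b))
print(gaussian_kernel(case_a, case_b))
```

A linear kernel can only separate classes with a flat boundary in the original attribute space, whereas the Gaussian kernel measures local similarity and can follow curved boundaries, which is one plausible reason it suited the wind speed data better.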

Examining the coefficients of the attributes is not very revealing for this dataset, except to point out that even though "month" was explicitly built into the definition of "season", it was not as important as wind speeds in predicting which season the data was recorded from.

Usually when you derive a new attribute from one of the existing attributes, you'll want to exclude the source attribute from the model, since it is very highly correlated with the derived target, as month is with season in this example.  Note that there are different coefficients and rankings of attributes for each season (fall, winter, spring and summer), and the values of the coefficients are very small.
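As a rough sketch of that practice, the snippet below derives a season target from a month attribute and then leaves the month out of the predictors.  The month-to-season mapping, the attribute names and the sample values are all placeholders assumed for illustration; they are not the definitions used in the book's exercise.

```python
# Hypothetical mapping; the exercise's actual season definition is not shown here.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}

def add_season(row):
    """Return a copy of the row with a 'season' target derived from 'month'."""
    labeled = dict(row)
    labeled["season"] = SEASON_BY_MONTH[row["month"]]
    return labeled

def predictors(row, target="season", exclude=("month",)):
    """Keep only attributes that are neither the target nor its source attribute."""
    return {name: value for name, value in row.items()
            if name != target and name not in exclude}

# Placeholder case: one day's record with a month and a wind speed reading.
case = add_season({"month": 4, "wind_speed": 12.5})
print(case["season"])
print(predictors(case))
```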

To demonstrate a better model using the linear kernel of SVM, we'll import the Boston house price dataset from  

This data is from the publication Harrison, D. and Rubinfeld, D.L., 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol. 5, pp. 81-102, 1978.

There are 20 attributes in this example:

OBS        unique identifier for each case
TOWN       town where area is located
TOWN#      numeric identifier of the town
TRACT      tract number
LON        longitude
LAT        latitude
CRIM       per capita crime rate by town
ZN         proportion of residential land zoned for lots over 25,000 sq. ft.
INDUS      proportion of non-retail business acres per town
CHAS       Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX        nitric oxides concentration (parts per 10 million)
RM         average number of rooms per dwelling
AGE        proportion of owner-occupied units built prior to 1940
DIS        weighted distances to five Boston employment centres
RAD        index of accessibility to radial highways
TAX        full-value property-tax rate per $10,000
PTRATIO    pupil-teacher ratio by town
B          1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
LSTAT      % lower status of the population
MEDV       median value of owner-occupied homes in $1000's
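Before importing the file, it can help to confirm that its header matches the attribute list above.  The sketch below assumes a comma-separated file with a header row, treats MEDV as the target, and treats OBS, TOWN, TOWN# and TRACT as identifiers rather than predictors; all three are assumptions made for illustration, not statements from the book.

```python
import csv
import io

ATTRIBUTES = [
    "OBS", "TOWN", "TOWN#", "TRACT", "LON", "LAT", "CRIM", "ZN", "INDUS", "CHAS",
    "NOX", "RM", "AGE", "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV",
]
TARGET = "MEDV"                                  # assumed target attribute
IDENTIFIERS = {"OBS", "TOWN", "TOWN#", "TRACT"}  # assumed non-predictive identifiers

def check_header(header_line):
    """Return (missing, unexpected) attribute names for a CSV header line."""
    header = [name.strip() for name in next(csv.reader(io.StringIO(header_line)))]
    missing = [name for name in ATTRIBUTES if name not in header]
    unexpected = [name for name in header if name not in ATTRIBUTES]
    return missing, unexpected

def predictor_names():
    """Attributes left after removing the target and the identifier columns."""
    return [name for name in ATTRIBUTES
            if name != TARGET and name not in IDENTIFIERS]

print(len(ATTRIBUTES))
print(check_header(",".join(ATTRIBUTES)))
print(predictor_names())
```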






Copyright © 1996 -  2017

All rights reserved by Burleson

Oracle ® is the registered trademark of Oracle Corporation.
