This is an excerpt from Dr. Ham's premier book "Oracle
Data Mining: Mining Gold from your Warehouse".
Support Vector Machines (SVM) is a suite of algorithms used for classification applications. Like the Adaptive Bayes Network and Decision Tree algorithms discussed in Chapter Two, SVM can provide insight into the relationships and patterns in the dataset.
Inside Support Vector Machines
Another advantage of SVM is that it can predict outcomes from text data. If you have descriptive data such as clinical notes for hospital patients, customer satisfaction survey results, or other textual information, that text can be used as part of the classification model.
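To make the idea of text input concrete, here is a minimal bag-of-words sketch in Python that turns short free-text notes into term-count vectors a classifier could consume. It is an illustration only, assuming simple lowercase tokenization on letters, digits, and apostrophes. Oracle Data Mining prepares text through its own text processing, which this code does not reproduce, and the helper names `build_vocabulary` and `term_counts` are hypothetical.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[a-z0-9']+")

def build_vocabulary(documents):
    """Collect every distinct term across the documents, sorted alphabetically."""
    terms = set()
    for doc in documents:
        terms.update(TOKEN.findall(doc.lower()))
    return sorted(terms)

def term_counts(text, vocabulary):
    """Map free text to a fixed-length vector of term counts (hypothetical helper)."""
    counts = Counter(TOKEN.findall(text.lower()))
    # One column per vocabulary term, in a stable order, so every document
    # becomes a vector of the same length.
    return [counts[term] for term in vocabulary]

notes = ["Patient reports chest pain", "No pain reported, patient stable"]
vocab = build_vocabulary(notes)
vectors = [term_counts(note, vocab) for note in notes]
```

Note that "reports" and "reported" become separate columns here; real text pipelines usually add stemming and stop-word removal, which this sketch deliberately leaves out.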
In this chapter we will show an example using SVM to predict how researchers rated web pages based on the content of the pages themselves. We'll also apply SVM to continuous data to illustrate its use for regression. We will show how to use sqlldr (SQL*Loader) and ODMr to extract and load CLOB data.
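For the regression case, SVM replaces the classification hinge loss with an epsilon-insensitive loss: prediction errors smaller than a tolerance epsilon cost nothing, and larger errors grow linearly. The short Python function below computes that loss for a single prediction; the function name and the default epsilon of 0.1 are illustrative assumptions, not Oracle Data Mining settings.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """SVM regression loss: zero inside the epsilon "tube", linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)
```

For example, a prediction of 10.05 for a true value of 10.0 costs nothing with the default tolerance, while a prediction of 11.0 with epsilon set to 0.5 costs 0.5.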
Let's first examine the use of SVM in classifying discrete attributes, where the problem is to predict one of two or more values, such as Yes or No, or, as in the forest cover problem described in Chapter Two, one of 7 different types of trees depending on altitude and environmental factors. ODMr provides two types of algorithms, or kernels: the Linear and Gaussian kernels.
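A kernel can be read as a similarity function between two cases x and z. In their textbook forms, the Linear kernel is the dot product x·z, and the Gaussian kernel is exp(-‖x - z‖² / (2σ²)). The Python sketch below implements both; the width parameter name `sigma` is an assumption here, and ODMr exposes its own build settings for the Gaussian kernel that this code does not mirror.

```python
import math

def linear_kernel(x, z):
    """Linear kernel: the ordinary dot product of two attribute vectors."""
    return sum(a * b for a, b in zip(x, z))

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel: 1.0 for identical cases, falling toward 0 as they move apart."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2 * sigma ** 2))
```

Because the Gaussian kernel depends only on the distance between cases, it lets the SVM draw curved decision boundaries, while the Linear kernel keeps the model a weighted sum of the attributes.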
These linear and non-linear models are similar to statistical and machine learning techniques such as linear regression and neural networks, but they generally offer better prediction accuracy and faster model builds. Using the Linear kernel will give us a ranked listing of the attributes used to build the model, showing which attributes were most important in predicting the target class. Let's take a closer look.
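One way to see why a Linear kernel yields an attribute ranking: a linear SVM learns one weight per attribute, and attributes with larger absolute weights (on comparably scaled inputs) move the decision more. The Python sketch below trains a tiny linear SVM with the Pegasos subgradient method, a swapped-in textbook solver rather than the one Oracle Data Mining uses, with no bias term for simplicity, and then ranks attributes by absolute weight. The attribute names and the four toy cases are made up for illustration.

```python
def train_linear_svm(cases, labels, lam=0.1, epochs=200):
    """Pegasos-style linear SVM without a bias term; labels must be +1 or -1."""
    w = [0.0] * len(cases[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(cases, labels):  # deterministic cyclic pass over the cases
            t += 1
            eta = 1.0 / (lam * t)        # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink the weights on every step (the regularization term)...
            w = [(1 - eta * lam) * wi for wi in w]
            # ...and move toward the case if it violates the margin.
            if margin < 1:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def rank_attributes(names, weights):
    """Order attribute names from largest to smallest absolute weight."""
    ranked = sorted(zip(names, weights), key=lambda pair: abs(pair[1]), reverse=True)
    return [name for name, _ in ranked]

# Made-up toy data: "signal" separates the classes, "noise" does not.
cases = [[2.0, 1.0], [2.0, -1.0], [-2.0, 1.0], [-2.0, -1.0]]
labels = [1, 1, -1, -1]
weights = train_linear_svm(cases, labels)
print(rank_attributes(["signal", "noise"], weights))  # ['signal', 'noise']
```

A Gaussian-kernel model has no single weight per attribute, which is why this kind of ranking is tied to the Linear kernel.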
Inside the SVM Analysis
We'll use as our case data the daily average wind speeds for 1961 through 1978 at 12 synoptic meteorological stations in the Republic of Ireland. Each row corresponds to one day of data, with the following attributes: year, month, day, and average wind speed (in knots) at each of the stations in the following order: RPT, VAL, ROS, KIL, SHA, BIR, DUB, CLA, MUL, CLO, BEL, and MAL.
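Each row therefore carries 15 values: three date fields followed by twelve station readings. A minimal Python sketch of parsing one comma-delimited row into named fields follows; it assumes the columns appear in exactly the order listed above, and the helper name `parse_wind_row` is hypothetical.

```python
STATIONS = ["RPT", "VAL", "ROS", "KIL", "SHA", "BIR",
            "DUB", "CLA", "MUL", "CLO", "BEL", "MAL"]

def parse_wind_row(line):
    """Split one comma-delimited row into date fields and per-station wind speeds (knots)."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != 3 + len(STATIONS):
        raise ValueError(f"expected {3 + len(STATIONS)} fields, got {len(fields)}")
    year, month, day = (int(f) for f in fields[:3])
    # Pair each reading with its station name, in the documented column order.
    speeds = {name: float(value) for name, value in zip(STATIONS, fields[3:])}
    return {"year": year, "month": month, "day": day, "speeds": speeds}
```

A row with a missing or extra field raises an error instead of silently shifting readings onto the wrong station.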
We will create a new target class, season, coded from the month: months 12, 1, and 2 are designated winter, 3 through 5 spring, 6 through 8 summer, and 9 through 11 fall.
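The season coding is a simple lookup from month number. A Python sketch of the same rule follows; the function name is illustrative.

```python
def month_to_season(month):
    """Code a month number as a season: Dec-Feb winter, Mar-May spring, Jun-Aug summer, Sep-Nov fall."""
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    if month in (9, 10, 11):
        return "fall"
    raise ValueError(f"month must be 1-12, got {month}")
```

Unlike a catch-all else branch, this version raises an error for an out-of-range month instead of silently labeling it fall.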
Importing the SVM Model Data
Using the import feature of ODMr, import the CSV (comma-delimited) file, being sure to create a .dat file by renaming it with the ".dat" extension. Enter new column names in Step 3 of the Import Wizard, name the new table wind_ireland, and finish the wizard to complete the data import.
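If you prefer to prepare the file outside ODMr, the renaming step can be scripted. The Python sketch below copies a CSV to a .dat file with the same base name after checking that every non-blank row has 15 comma-separated fields; the file names and the helper name `prepare_dat_file` are assumptions for illustration.

```python
import shutil
from pathlib import Path

def prepare_dat_file(csv_path, expected_fields=15):
    """Copy a .csv file to a .dat file with the same stem, after checking each row's field count."""
    source = Path(csv_path)
    with source.open() as f:
        for line_number, line in enumerate(f, start=1):
            if line.strip() and len(line.split(",")) != expected_fields:
                raise ValueError(f"line {line_number}: expected {expected_fields} fields")
    target = source.with_suffix(".dat")
    shutil.copyfile(source, target)  # copy rather than rename, so the original CSV is kept
    return target
```

For example, `prepare_dat_file("wind_ireland.csv")` would return `Path("wind_ireland.dat")`, ready to select in the Import Wizard.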
To create the four seasons, choose "Data", then "Transform", and pick "Compute Field" on the Main Menu. Choose a new name for the view, such as "wind_ireland_V", and type "season" for the new column name. Enter the following statements in the "Expression" box, and click "Validate" to ensure that the expression is valid.
case
  when "wind_ireland"."month" = 12 or
       "wind_ireland"."month" = 1 or
       "wind_ireland"."month" = 2 then 'winter'
  when "wind_ireland"."month" = 3 or
       "wind_ireland"."month" = 4 or
       "wind_ireland"."month" = 5 then 'spring'
  when "wind_ireland"."month" = 6 or
       "wind_ireland"."month" = 7 or
       "wind_ireland"."month" = 8 then 'summer'
  else 'fall'
end
You can view the SQL code and
save the script to a file when you preview the transformation.
1. After you complete the Compute Field wizard, right-click the new view wind_ireland_v and choose Show Summary Single Record to view the new case data details.
2. Click the new attribute "season" and check the histogram showing the relative distribution of values.
You can see that each season makes up about a quarter of the case data, so there is no need to set priors in the Build step as we did in Chapter Two.
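You can also check the roughly even split from the calendar alone. Assuming the table holds exactly one row per day from 1 January 1961 through 31 December 1978 with no gaps, the Python sketch below counts days per season using the same month coding; the real table may differ slightly if any days are missing.

```python
from collections import Counter
from datetime import date, timedelta

SEASON_BY_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "fall", 10: "fall", 11: "fall"}

def season_shares(start, end):
    """Return (shares, counts): fraction and number of calendar days in [start, end] per season."""
    counts = Counter()
    day = start
    while day <= end:
        counts[SEASON_BY_MONTH[day.month]] += 1
        day += timedelta(days=1)
    total = sum(counts.values())
    shares = {season: n / total for season, n in counts.items()}
    return shares, counts

shares, counts = season_shares(date(1961, 1, 1), date(1978, 12, 31))
```

Over these 18 years (6,574 days, including four leap years), winter gets 1,624 days, spring and summer 1,656 each, and fall 1,638, so every season falls between about 24.7% and 25.2% of the days.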