This is an excerpt from Dr. Ham's premier book "Oracle
Data Mining: Mining Gold from your Warehouse".
Oracle Data Miner gives us the choice of four different
classification models: Naïve Bayes, which was described in Chapter 1,
Adaptive Bayes Network, Decision Tree, and Support Vector Machine.
Each approach has distinct advantages over the others, so which one will be the
best? The exploratory nature of data mining lends itself to investigating many
different techniques. As you saw in the last chapter, the Naïve Bayes model can
be “tweaked” to perform better given the nature of the data or results you are
interested in.
As a general rule, try several different methods and examine
the differences between the results. Because we are looking for patterns that
are most likely unknown to us, we may not even find any useful results at all!
The patterns we see may not be meaningful or practical to apply. The usefulness
of a method can depend on the size of the dataset, the types of patterns that
may exist in the data, whether the underlying assumptions of the algorithm are
met, the type of data, the goal of the analysis, and many other factors.
Using the Models
In this chapter we will describe the Adaptive Bayes Network and Decision Tree models. We
will use the import tool to import data into the Oracle database, and describe
how to configure the models to produce the best results for our dataset, using
attribute importance, costs and priors.
We start with an example of predicting actual forest cover
type using geographical data from the US Forest Service Resource Information
System. The dataset, available on the UCI KDD Archive site, has 581,012
observations with 54 attributes regarding geological survey characteristics of
the land, wilderness area designation, and soil type. The target
classifications are 7 types of forest cover: 1 = spruce/fir, 2 =
lodgepole pine, 3 = ponderosa pine, 4 = cottonwood/willow, 5 = aspen, 6 =
Douglas-fir, and 7 = Krummholz.
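Because the target is stored as an integer code, it can help to keep the code-to-name mapping in one place while exploring the data outside the tool. The following is a minimal Python sketch built only from the seven classes listed above; the names COVER_TYPES and cover_name are illustrative and are not part of Oracle Data Miner.

```python
# Forest cover codes exactly as listed in the text (1 through 7).
COVER_TYPES = {
    1: "spruce/fir",
    2: "lodgepole pine",
    3: "ponderosa pine",
    4: "cottonwood/willow",
    5: "aspen",
    6: "Douglas-fir",
    7: "Krummholz",
}


def cover_name(code):
    """Return the cover type name for a target code (int or numeric string)."""
    try:
        return COVER_TYPES[int(code)]
    except KeyError:
        # A code outside 1-7 is not a valid class in this dataset.
        raise ValueError(f"unknown cover type code: {code!r}") from None


if __name__ == "__main__":
    print(cover_name(2))
    print(cover_name("7"))
```

Accepting numeric strings is a convenience here, since values read from a delimited file arrive as text.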
Importing Model Data
We start by importing the data using the ODMr Import Wizard found under the “Data” tab. The dataset is comma
delimited, and since there are no column headings, you have the option of
changing the column names (be sure to enclose the variable names in double
quotation marks, e.g. “TARGET”) and designating the data type and
size of each column. You can preview the data. Specify a new table name, COVER_TYPE_IMP,
then click “Next” and “Finish”.
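To get a feel for what naming the columns of a headerless file involves, the idea can be sketched in Python outside the wizard. This is an illustration only, not how ODMr performs the import: the placeholder names (ATTR1, ATTR2, ...), the functions name_columns and preview, and the assumption that the target is the last column are all hypothetical, and the sample rows are made up.

```python
import csv
import io
from itertools import islice


def name_columns(n_columns, target_name="TARGET", prefix="ATTR"):
    """Build placeholder names: ATTR1..ATTR(n-1), then the target column.

    Assumes the target is the last column of the file.
    """
    names = [f"{prefix}{i}" for i in range(1, n_columns)]
    names.append(target_name)
    return names


def preview(lines, n_rows=5):
    """Return the first n_rows of a headerless comma-delimited source as dicts."""
    rows = []
    for row in islice(csv.reader(lines), n_rows):
        names = name_columns(len(row))
        rows.append(dict(zip(names, row)))
    return rows


if __name__ == "__main__":
    # Made-up sample rows standing in for the real file.
    sample = io.StringIO("2596,51,3,5\n2590,56,2,5\n")
    for record in preview(sample):
        print(record)
```

For the real file, an open file object can be passed to preview in place of the in-memory sample.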
This dataset will be imported using SQL*Loader,
and for the Import Wizard to work you must set the SQL*Loader directory
under Tools, Preferences, in the “Environment” tab.
Right-click the table name you just created,
COVER_TYPE_IMP, choose “Show Summary Single Record”,
and check the data. ODMr will guess whether
each attribute is numerical or categorical; you may need to change the
TARGET attribute (type of forest cover) from numerical to categorical.
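The reason the target must be categorical is that its codes are labels, not quantities: an average cover type of 2.6 means nothing, whereas the share of records in each class does. A small Python sketch of that distinction follows; the function class_distribution and the sample codes are hypothetical and only illustrate the idea.

```python
from collections import Counter


def class_distribution(targets):
    """Treat target codes as labels and return each label's share of the records."""
    labels = [str(t) for t in targets]  # labels, not numbers
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in sorted(counts.items())}


if __name__ == "__main__":
    # Made-up target codes standing in for the TARGET column.
    codes = [1, 2, 2, 5, 2, 1, 7, 2]
    print(class_distribution(codes))
```

The resulting shares are also a natural starting point when thinking about priors for a model, since they show how unevenly the classes are represented.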