This is an excerpt from Dr. Ham's premier book "Oracle
Data Mining: Mining Gold from your Warehouse".
Upon completion of the Build Activity, we can view the results.
We can see that elevation has the greatest influence on the type of forest
cover, with Soil Type 3 a distant second in importance.
Three man-made features came in next:
roads, distance to fire points, and designated wilderness areas. You can report
these results, and use them in a Naïve Bayes
analysis as shown previously.
We will go ahead and perform the Adaptive Bayes Network analysis, which uses a
built-in Attribute Importance methodology when building the model.
Both the Adaptive Bayes Network and the Decision Tree algorithms rank attributes
as part of model building, so Attribute Importance is most useful as a
preprocessor for Naïve Bayes or Support Vector Machines.
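To make the idea of ranking attributes before modeling more concrete, here is a minimal sketch in Python. It is not Oracle's Attribute Importance algorithm; it ranks categorical attributes by their mutual information with the target, a stand-in measure chosen only for illustration. The function names and the toy rows are hypothetical and are not taken from the forest cover table.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length categorical sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def rank_attributes(rows, target):
    """Return (attribute, score) pairs sorted from most to least informative."""
    ys = [r[target] for r in rows]
    attrs = [k for k in rows[0] if k != target]
    scores = {a: mutual_information([r[a] for r in rows], ys) for a in attrs}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data (assumption): elevation band determines the target,
# soil is unrelated to it.
rows = [
    {"elevation_band": "high", "soil": "a", "target": 1},
    {"elevation_band": "high", "soil": "b", "target": 1},
    {"elevation_band": "low", "soil": "a", "target": 2},
    {"elevation_band": "low", "soil": "b", "target": 2},
]
ranking = rank_attributes(rows, "target")
```

In a workflow like the one described here, only the top-ranked attributes would be passed on to an algorithm such as Naïve Bayes that does not rank attributes itself.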
The Naïve Bayes model is something like a black box: we cannot see what is used
to create the final results. One advantage of the Adaptive Bayes Network is
that it can generate human-readable rules that give us insight into what the
model is using to classify cases.
Using the Adaptive Bayes Network Model
Let’s start a new Classification Mining
Activity and use the Adaptive Bayes Network
for the activity type.
1. Pick COVER_TYPE_IMP as the case table and Compound or None for the Unique
Identifier.
2. Select all the columns to be used in the analysis, skip joining other tables,
select TARGET (forest cover) as the target, and review the settings. Make sure
that the target attribute has a categorical mining type; otherwise ODMr will
stop you from running the Activity.
When you select the preferred target value, you have the choice of 1 through 7.
Pick the type of forest cover that you are most interested in to test the model.
You can change this later, so to get started choose Target - 4. After you have
named the activity, select Advanced Settings on the Final Step page and examine
the Advanced Settings dialog.
Up to this point, all steps in the Build Activity are identical to those for
Naïve Bayes. If you click on the Build tab, and then Algorithm Settings under
options, you'll see a drop-down box with three selections for Model Type:
Single Feature, Multi Feature, and Naïve Bayes. Setting the model type to Single
Feature (the default) will give you the human-readable rules.
How long the model takes to build depends on the number of predictors chosen
for the model. You can also limit the build time by entering the number of
minutes you want the algorithm to execute. We will keep all the defaults at
this point and go ahead and finish the model building activity.
This is a large dataset; you can build the model on the entire dataset if you
have enough computer resources (i.e. memory), or you may choose to build the
model on a sample of the data. To speed development of classification models,
it is often the case that models are built on smaller subsets of data, or that
limits are set on the amount of time (minutes) used to build the model.
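As an illustration of building on a subset, the following Python sketch draws a sample that keeps every target class represented. Stratifying by class is an assumption made for this example; the text above only says that a sample may be used, and the function name, row layout, and sampling fraction are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(rows, target, fraction, seed=0):
    """Sample roughly `fraction` of the rows from each target class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target]].append(row)
    sample = []
    for class_rows in by_class.values():
        # Keep at least one row per class so rare classes are not lost
        k = max(1, round(len(class_rows) * fraction))
        sample.extend(rng.sample(class_rows, k))
    return sample


# Hypothetical data (assumption): 700 rows spread evenly over targets 1 to 7
rows = [{"id": i, "target": (i % 7) + 1} for i in range(700)]
subset = stratified_sample(rows, "target", 0.1)
```

Sampling within each class keeps the seven forest cover types in proportion, which matters when the preferred target value is one of the rarer classes.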