This is an excerpt from Dr. Ham's premier book "Oracle
Data Mining: Mining Gold from your Warehouse".
To start the classification problem, we’ll begin by using
the Naïve Bayes data mining activity. The Naïve Bayes algorithm has the
advantage of being quick to run.
The Mining Activity Build wizard is launched from the Activity
pull-down menu. Select Build to activate the wizard and click Next on the
Welcome page. Choose the Classification function type,
Naïve Bayes algorithm, and click Next.
In step 2 of the New Activity Wizard, select
MINING_DATA_BUILD_V_US as the case table
or view, choose Single Key CUST_ID as the unique identifier, and select all
columns to include in the analysis. Note that clicking Sampling Settings opens
a new window that allows you to change how the data is sampled. For this
exercise you’ll keep the default Random sampling.
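As a rough illustration of what random sampling of cases means, the sketch below draws a fixed number of rows uniformly at random without replacement. The function name, the seed, and the made-up identifiers are assumptions for illustration only; the excerpt does not describe how ODMr implements its sampling step.

```python
import random

def random_sample(rows, sample_size, seed=None):
    """Draw sample_size rows uniformly at random, without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(rows, min(sample_size, len(rows)))

# Hypothetical case identifiers standing in for CUST_ID values.
cust_ids = list(range(100001, 100021))
sample = random_sample(cust_ids, 5, seed=42)
```

Each case has the same chance of being chosen, so the sample should resemble the full case table on average.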
In classification problems, a target is
identified; in this case we are using the attribute AFFINITY_CARD to
distinguish high-value customers where 1 = High-value and 0 = Low-value. On
step 3 of the New Activity Wizard, choose AFFINITY_CARD as the Target column.
Note that COUNTRY_NAME and PRINTER_SUPPLIES are not selected as attribute
variables. Neither of these fields will contribute to the classification model
because there is only one country in this view and all consumers order printer
supplies. You can see this by clicking the Data Summary link and seeing that
the average, max, and min values for PRINTER_SUPPLIES are all 1, with 0 variance.
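The same check can be sketched outside the tool. The example below flags columns that hold a single distinct value and computes the summary statistics mentioned above. The rows, the AGE column, and the country string are made-up stand-ins, not actual contents of the view.

```python
from statistics import pvariance

# Hypothetical rows standing in for the case table; all values are invented.
rows = [
    {"CUST_ID": 1, "AGE": 41, "COUNTRY_NAME": "United States of America",
     "PRINTER_SUPPLIES": 1, "AFFINITY_CARD": 0},
    {"CUST_ID": 2, "AGE": 27, "COUNTRY_NAME": "United States of America",
     "PRINTER_SUPPLIES": 1, "AFFINITY_CARD": 1},
    {"CUST_ID": 3, "AGE": 35, "COUNTRY_NAME": "United States of America",
     "PRINTER_SUPPLIES": 1, "AFFINITY_CARD": 0},
    {"CUST_ID": 4, "AGE": 52, "COUNTRY_NAME": "United States of America",
     "PRINTER_SUPPLIES": 1, "AFFINITY_CARD": 1},
]

def constant_columns(rows):
    """Return the names of columns that hold a single distinct value."""
    if not rows:
        return []
    return [col for col in rows[0] if len({row[col] for row in rows}) == 1]

def numeric_summary(rows, col):
    """Return min, max, average, and population variance for a numeric column."""
    values = [row[col] for row in rows]
    return {
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "variance": pvariance(values),
    }

constants = constant_columns(rows)
summary = numeric_summary(rows, "PRINTER_SUPPLIES")
```

A column with one distinct value looks identical for high-value and low-value customers, so it carries no information that could separate the two classes.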
In step 4 of the New Activity Wizard, select 1 as the
preferred target value, which identifies the cases we are trying to target:
our best customers have AFFINITY_CARD = 1.
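To make the roles of the target column and the preferred target value concrete, here is a minimal sketch of a categorical Naïve Bayes classifier with add-one smoothing. It is a textbook version under stated assumptions, not Oracle's implementation: the attribute names EDUCATION and MARITAL_STATUS, the training rows, and both function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, target, attributes):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    distinct = defaultdict(set)          # attribute -> values seen in training
    for row in rows:
        for attr in attributes:
            value_counts[(attr, row[target])][row[attr]] += 1
            distinct[attr].add(row[attr])
    return class_counts, value_counts, distinct

def predict_proba(model, case, attributes):
    """Return the posterior probability of each target value for one case."""
    class_counts, value_counts, distinct = model
    total = sum(class_counts.values())
    log_scores = {}
    for cls, n_cls in class_counts.items():
        score = math.log(n_cls / total)  # log prior
        for attr in attributes:
            count = value_counts[(attr, cls)][case[attr]]
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((count + 1) / (n_cls + len(distinct[attr])))
        log_scores[cls] = score
    peak = max(log_scores.values())
    weights = {cls: math.exp(s - peak) for cls, s in log_scores.items()}
    norm = sum(weights.values())
    return {cls: w / norm for cls, w in weights.items()}

# Hypothetical training cases; only AFFINITY_CARD comes from the text.
attributes = ["EDUCATION", "MARITAL_STATUS"]
training = [
    {"EDUCATION": "Masters", "MARITAL_STATUS": "Married", "AFFINITY_CARD": 1},
    {"EDUCATION": "Masters", "MARITAL_STATUS": "Married", "AFFINITY_CARD": 1},
    {"EDUCATION": "Bach.", "MARITAL_STATUS": "Married", "AFFINITY_CARD": 1},
    {"EDUCATION": "HS", "MARITAL_STATUS": "Single", "AFFINITY_CARD": 0},
    {"EDUCATION": "HS", "MARITAL_STATUS": "Single", "AFFINITY_CARD": 0},
    {"EDUCATION": "HS", "MARITAL_STATUS": "Married", "AFFINITY_CARD": 0},
    {"EDUCATION": "Bach.", "MARITAL_STATUS": "Single", "AFFINITY_CARD": 0},
]

model = train_naive_bayes(training, "AFFINITY_CARD", attributes)
probs = predict_proba(
    model, {"EDUCATION": "Masters", "MARITAL_STATUS": "Married"}, attributes
)
preferred_target = 1
p_high_value = probs[preferred_target]  # probability of the preferred value
```

In this sketch the preferred target value simply picks out which class probability is reported for a case, here the probability that a customer is high-value.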
Naming Data Mining Activities
For step 5, ODMr provides a name for the data mining
activity, but you’ll probably want to change this to a name that explains the
activity, such as ALL_US_NB1 for all US customers, Naïve Bayes activity 1.
On the final page, the New Activity Wizard is complete, and the
Data Mining Activity is set to run when you click the Finish button.
Click Advanced Settings to display and possibly modify the default settings.
The Advanced Settings dialog
window shows five tabs: Sample, Discretize, Split, Build, and Test Metrics. The
wizard has determined that sampling is not
needed for this small dataset, and you’ll want to leave the Enable Step box
unchecked.
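Two of these steps, Discretize and Split, can be illustrated with a short sketch. The example assumes equal-width binning and a random build/test split; the excerpt does not state which methods or proportions ODMr uses by default, so the function names, the bin count, and the 75% build fraction are illustrative assumptions.

```python
import random

def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index from 0 to n_bins - 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant column falls into one bin
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def split_build_test(rows, build_fraction, seed=0):
    """Shuffle the rows and split them into build and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * build_fraction)
    return shuffled[:cut], shuffled[cut:]

ages = [20, 30, 40, 50, 60]          # hypothetical numeric attribute values
age_bins = equal_width_bins(ages, 4)

case_ids = list(range(1, 9))         # hypothetical case identifiers
build_set, test_set = split_build_test(case_ids, 0.75)
```

Discretizing turns numeric attributes into categories that a counting-based model can tabulate, and holding out a test set lets the model be measured on cases it did not see while being built.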
The Naïve Bayes classification model will be discussed
in more detail in a later chapter. For
now, accept the defaults and finish the model.