Oracle Data Mining Histogram Display
Data warehouse tips by Burleson Consulting
This is an excerpt from Dr. Ham's premier book "Oracle
Data Mining: Mining Gold from your Warehouse".
After selecting AGE as shown below, you will see a
Preference button and a Histogram button on the right side of the screen. If you click
Preference here, you can change the number of bins in your sampling,
but you cannot change the sample count, which in this case is set to 2000.
We’ll examine the AGE attribute by clicking Histogram.
With the number of bins set to 10, the histogram will show 10 bars (groups), the age
values for each group, the number of cases in each bin (bin count), and the % of
total.
Numerical attributes like AGE and YRS_RESIDENCE are divided into “bins” of equal
width between the minimum and maximum. The equal-width binning strategy gives
every bin the same range of values, so the number of cases per bin can vary widely.
Here you can see that the proportion of young clients in this database is
large. Let’s change the number of bins to three and see what happens.
Now there are three age groups, with group 0 containing 904 cases younger than 42
years, group 1 having 538 cases between 42 and 67, and group 2 with 58 cases
over 67.
But suppose that you really wanted to divide your age groups into set age
ranges, say 1 to 25, 25 to 30, and 30 to 65?
We’ll see how to create ranges of values in Chapter 5.
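The equal-width binning described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Oracle Data Miner's implementation: the function name `equal_width_bins` and the sample ages are made up for the example.

```python
def equal_width_bins(values, n_bins):
    """Split the range [min, max] into n_bins intervals of equal width
    and count how many values fall into each interval."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if width == 0:
            idx = 0  # all values identical: everything lands in the first bin
        else:
            # The maximum sits on the upper edge, so clamp it into the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


# Hypothetical sample of ages, skewed toward younger clients.
ages = [17, 19, 22, 23, 25, 28, 31, 34, 38, 41, 45, 52, 60, 77]
edges, counts = equal_width_bins(ages, 3)

for i, count in enumerate(counts):
    pct = 100 * count / len(ages)
    print(f"bin {i}: {edges[i]:.0f} to {edges[i + 1]:.0f}  "
          f"count={count}  % of total={pct:.1f}")
```

Every bin spans the same 20 years, yet the counts differ sharply because the sample is skewed toward younger ages, which is the same pattern the histogram shows.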
Notice that the attribute field in the histogram window, currently showing AGE,
is a drop-down list. You can choose each categorical and numerical field
and view the histogram for each one. Categorical attributes are binned using
the “Top N” method, where N is
the number of bins. There are 19 different countries for COUNTRY_NAME, and if
you leave the bins set to 10, the group labeled OTHER will contain South Africa,
New Zealand, India, and so on. Setting the number of bins to 19 shows all countries
individually. The mode is United States, where the majority of clients reside.
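As a rough sketch of “Top N” binning, the Python below keeps the most frequent values and lumps the rest into OTHER. It assumes that OTHER counts as one of the N bins; the text does not say exactly how Oracle Data Miner counts it. The function name `top_n_bins` and the sample data are hypothetical.

```python
from collections import Counter


def top_n_bins(values, n_bins, other_label="OTHER"):
    """Keep the most frequent categories and group the remainder as OTHER.

    Assumption: OTHER occupies one of the n_bins slots, so only the
    n_bins - 1 most frequent categories keep their own bin.
    """
    counts = Counter(values)
    if len(counts) <= n_bins:
        # Enough bins for every category: show each one individually.
        return dict(counts)
    top = counts.most_common(n_bins - 1)
    binned = dict(top)
    binned[other_label] = sum(counts.values()) - sum(c for _, c in top)
    return binned


# Hypothetical COUNTRY_NAME values for 13 customers.
countries = (
    ["United States of America"] * 6
    + ["Germany"] * 3
    + ["India"] * 2
    + ["New Zealand", "South Africa"]
)

three_bins = top_n_bins(countries, 3)
five_bins = top_n_bins(countries, 5)
print(three_bins)
print(five_bins)
```

With three bins, the less frequent countries collapse into OTHER. With five bins, one per distinct country, every country appears individually, mirroring the effect of raising the bin count to 19 for COUNTRY_NAME.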
Concentrating on a customer
Suppose we want to concentrate on our customers in the United States. We can
filter the data so that only selected groups of people are included. Right-click
MINING_DATA_BUILD_V, choose Transform, then Filter Single Record. You
could also click Data on the upper toolbar to reach the Filter Single-Record
Transformation Wizard.
After the introductory screen, click Next to go on to Step 1 of 3 and select
the view MINING_DATA_BUILD_V. Click Next, and in Step 2 name the new view
MINING_DATA_BUILD_V_US.
For the final step, click on the small box next to Filter to open the Expression
Editor.
1. Use the editor to select COUNTRY_NAME, click
the “=”, and then type in 'United States of America'.
2. You should see the message Validation
successful when you click “Validate”, and the expression builder shows "MINING_DATA_BUILD_V"."COUNTRY_NAME"
= 'United States of America'.
3. Click OK. You may preview the results and
then choose to generate a stored procedure by clicking “Preview Transform” on
the Finish page.
4. Click “Finish”
to complete the transformation view containing only US customers.
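The result of the wizard steps above is a view that applies the filter expression to the source data. The sketch below mimics that with Python's built-in `sqlite3` module as a stand-in for Oracle. It does not reproduce the SQL the wizard generates, which is not shown here. The source is modeled as a table rather than a view, and the `CUST_ID` and `AGE` columns and sample rows are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle schema.
conn = sqlite3.connect(":memory:")

# Hypothetical, simplified stand-in for MINING_DATA_BUILD_V.
conn.execute(
    "CREATE TABLE MINING_DATA_BUILD_V "
    "(CUST_ID INTEGER, AGE INTEGER, COUNTRY_NAME TEXT)"
)
conn.executemany(
    "INSERT INTO MINING_DATA_BUILD_V VALUES (?, ?, ?)",
    [
        (1, 34, "United States of America"),
        (2, 51, "New Zealand"),
        (3, 27, "United States of America"),
        (4, 45, "India"),
    ],
)

# The filter expression from the Expression Editor becomes the WHERE clause.
conn.execute(
    """
    CREATE VIEW MINING_DATA_BUILD_V_US AS
    SELECT *
    FROM MINING_DATA_BUILD_V
    WHERE "MINING_DATA_BUILD_V"."COUNTRY_NAME" = 'United States of America'
    """
)

rows = conn.execute(
    "SELECT CUST_ID, COUNTRY_NAME FROM MINING_DATA_BUILD_V_US ORDER BY CUST_ID"
).fetchall()
print(rows)
conn.close()
```

Because the filter is defined as a view, only matching rows are visible through MINING_DATA_BUILD_V_US while the source data stays unchanged.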
We have now completed the “case” table used to build the
data mining model. Oracle Data Mining provides four algorithms for solving
classification problems: Adaptive Bayes Network,
Decision Tree, Naïve Bayes,
and Support Vector Machine. Each
classification data mining activity has distinct advantages depending on the
data and the business solution, and each will be described in more detail in the next
chapter.