This is an excerpt from Dr. Ham's premier book "Oracle Data Mining:
Mining Gold from your Warehouse".
We see that ODMr recognized that the two numerical attributes, AGE
and YRS_RESIDENCE, should be binned, and discretized the data so that
each field was categorized into three bins: 1, 2, and 3.
When we click the Options button in the Discretize section, we find
that our options are Quantile Binning, Equal Width Binning, and None.
We can illustrate the difference between quantile and equal width
binning by using the Discretize wizard. Below is the histogram for the
attribute AGE in the MINING_DATA_BUILD_V_US case dataset using the
equal width binning strategy.
Each group in the histogram view covers an age range 7.3 years wide.
From group 3 onward, the share of customers in each bin falls as age
increases, from a maximum of 19.49% in group 3 to 0.22% in group 9.
A distribution that “tails off” like this is not a good choice for
data mining analysis. You want a more uniform distribution of ages
across all groups, as in the quantile binning shown below.
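
To make the contrast between the two strategies concrete outside the
wizard, here is a minimal Python sketch. It is not ODMr's own
algorithm, and it does not use the MINING_DATA_BUILD_V_US data: the
`ages` list is a synthetic, skewed sample, and the range of 17 to 90
with 10 bins is an assumption chosen so that each equal width bin
comes out 7.3 years wide. The helper names (`equal_width_bins`,
`quantile_bins`, `percent_per_bin`) are hypothetical.

```python
import bisect
from collections import Counter


def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of identical width.

    Assumes the values are not all identical (width would be zero).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width) + 1, n_bins) for v in values]


def quantile_bins(values, n_bins):
    """Assign each value to one of n_bins groups of roughly equal row count."""
    ordered = sorted(values)
    n = len(ordered)
    # Upper boundary of bin k: the value sitting at the k/n_bins position.
    cuts = [ordered[(k * n) // n_bins - 1] for k in range(1, n_bins)]
    # Identical values always share a bin, so counts are only roughly equal.
    return [bisect.bisect_left(cuts, v) + 1 for v in values]


def percent_per_bin(bins, n_bins):
    """Return the percentage of rows that landed in each bin, 1..n_bins."""
    counts = Counter(bins)
    return [100.0 * counts[b] / len(bins) for b in range(1, n_bins + 1)]


# Synthetic sample (an assumption): many young customers, few old ones.
ages = [age for age in range(17, 91) for _ in range(max(1, (95 - age) // 5))]
N_BINS = 10

for name, binner in (("equal width", equal_width_bins),
                     ("quantile", quantile_bins)):
    shares = percent_per_bin(binner(ages, N_BINS), N_BINS)
    print(f"{name:>11}: " + " ".join(f"{s:5.2f}%" for s in shares))
```

On this sample the equal width percentages shrink steadily toward the
highest bin, while the quantile percentages stay close to one another.
That is the same tailing-off versus uniform contrast the two
histograms illustrate.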
Using the Discretize Transform Wizard