This is an excerpt from Dr. Ham's premier book
"Oracle Data Mining: Mining Gold from your Warehouse".
Although the customers in these clusters make up only a small
percentage of the whole sample, insights into how these subsets of
cases behave relative to the others can help identify areas where
targeted action may substantially affect overall business practice.
K-Means gives you the rules for
deriving each cluster, so that you may apply the rules to another
dataset. For example, the rule for Cluster #16 is shown below.
The algorithm provides probabilistic scoring and reports the
confidence (as a percentage) and the support for each rule. Note that
the rule is written in the form IF A THEN B, where A is the set of
attribute conditions and B is the assignment Cluster = 16. The
confidence of the rule is the conditional probability of B given A,
P(B | A). The support for a rule is an estimate of the number of cases
in the training dataset for which the rule is true. For Cluster #16,
the confidence is 82% and the support estimate is 69 cases.
IF
    AAANHANG in (0.0) and
    ABESAUT in (0.0) and
    ABRAND in (1.0) and
    ABROM in (0.0) and
    ABYSTAND in (0.0) and
    AFIETS in (0.0) and
    AGEZONG in (0.0) and
    AINBOED in (0.0) and
    ALEVEN in (0.0) and
    AMOTSCO in (0.0) and
    APERSAUT in (1.0) and
    APERSONG in (0.0) and
    APLEZIER in (0.0) and
    ATRACTOR in (0.0) and
    AVRAAUT in (0.0) and
    AWABEDR in (0.0) and
    AWALAND in (0.0) and
    AWAOREG in (0.0) and
    AWAPART in (1.0) and
    AWERKT in (0.0) and
    AZEILPL in (0.0) and
    CARAVAN in (1.0) and
    MAANTHUI in (1.0) and
    MAUT0 <= 3.6 and MAUT0 >= 0.0 and
    MAUT1 <= 9.0 and MAUT1 >= 4.8 and
    MAUT2 <= 2.5 and MAUT2 >= 0.0 and
    MBERARBG <= 4.2 and MBERARBG >= 0.0 and
    MBERARBO <= 4.2 and MBERARBO >= 0.0 and
    MBERBOER <= 0.30000000000000004 and MBERBOER >= 0.0 and
    MBERHOOG <= 6.3 and MBERHOOG >= 0.0 and
    MBERMIDD <= 7.2 and MBERMIDD >= 0.0 and
    MBERZELF <= 1.2 and MBERZELF >= 0.0 and
    MFALLEEN <= 4.2 and MFALLEEN >= 0.0 and
    MFGEKIND <= 6.4 and MFGEKIND >= 0.0 and
    MFWEKIND <= 9.0 and MFWEKIND >= 0.9000000000000001 and
    MGEMLEEF <= 4.2 and MGEMLEEF >= 1.8 and
    MGEMOMV in (2.0,4.0) and
    MGODGE <= 5.6000000000000005 and MGODGE >= 0.0 and
    MGODOV <= 2.4 and MGODOV >= 0.0 and
    MGODRK <= 2.1 and MGODRK >= 0.0 and
    MGODRP <= 6.3 and MGODRP >= 1.8 and
    MHHUUR <= 9.0 and MHHUUR >= 0.0 and
    MHKOOP <= 9.0 and MHKOOP >= 0.0 and
    MINK123M <= 0.2 and MINK123M >= 0.0 and
    MINK3045 <= 5.3999999999999995 and MINK3045 >= 0.0 and
    MINK4575 <= 5.6 and MINK4575 >= 0.0 and
    MINK7512 <= 2.4 and MINK7512 >= 0.0 and
    MINKGEM <= 6.3 and MINKGEM >= 2.8000000000000003 and
    MINKM30 <= 5.6 and MINKM30 >= 0.0 and
    MKOOPKLA <= 8.0 and MKOOPKLA >= 1.7000000000000002 and
    MOPLHOOG <= 4.2 and MOPLHOOG >= 0.0 and
    MOPLLAAG <= 6.3 and MOPLLAAG >= 0.0 and
    MOPLMIDD <= 5.6 and MOPLMIDD >= 0.0 and
    MOSHOOFD <= 9.1 and MOSHOOFD >= 1.0 and
    MOSTYPE <= 41.0 and MOSTYPE >= 1.0 and
    MRELGE <= 9.0 and MRELGE >= 5.0 and
    MRELOV <= 4.2 and MRELOV >= 0.0 and
    MRELSA <= 2.1 and MRELSA >= 0.0 and
    MSKA <= 6.3 and MSKA >= 0.0 and
    MSKB1 <= 4.5 and MSKB1 >= 0.0 and
    MSKB2 <= 4.2 and MSKB2 >= 0.0 and
    MSKC <= 7.2 and MSKC >= 0.0 and
    MSKD <= 2.4 and MSKD >= 0.0 and
    MZFONDS <= 9.0 and MZFONDS >= 1.8 and
    MZPART <= 7.2 and MZPART >= 0.0 and
    PAANHANG in (0.0) and
    PBESAUT in (0.0) and
    PBRAND <= 4.2 and PBRAND >= 2.8000000000000003 and
    PBROM <= 0.2 and PBROM >= 0.0 and
    PBYSTAND in (0.0) and
    PFIETS in (0.0) and
    PGEZONG in (0.0) and
    PINBOED in (0.0) and
    PLEVEN <= 0.2 and PLEVEN >= 0.0 and
    PMOTSCO <= 0.30000000000000004 and PMOTSCO >= 0.0 and
    PPERSAUT in (6.0) and
    PPERSONG in (0.0) and
    PPLEZIER <= 0.1 and PPLEZIER >= 0.0 and
    PTRACTOR in (0.0) and
    PVRAAUT in (0.0) and
    PWABEDR in (0.0) and
    PWALAND in (0.0) and
    PWAOREG in (0.0) and
    PWAPART in (2.0) and
    PWERKT in (0.0) and
    PZEILPL in (0.0)
THEN
    Cluster equal 16
Confidence (%) = 82.1428571428571
Support = 69
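To make "applying the rule to another dataset" and the two statistics concrete, here is a small Python sketch. It is a hypothetical illustration, not Oracle Data Miner's API: the names `matches`, `rule_stats`, and `CLUSTER_16_RULE` are introduced here, only five of the Cluster #16 conditions are reproduced, the records are made up, and support is counted as cases where both A and B hold, which is one reading of "cases for which the rule is true".

```python
from typing import Dict, List, Tuple

# A condition is (attribute, kind, values):
#   kind "in"    -> the value must be one of `values`
#   kind "range" -> `values` is (low, high) and low <= value <= high
Condition = Tuple[str, str, Tuple[float, ...]]

# Hypothetical excerpt of the Cluster #16 antecedent (A); the real rule has many more conditions.
CLUSTER_16_RULE: List[Condition] = [
    ("ABRAND", "in", (1.0,)),
    ("APERSAUT", "in", (1.0,)),
    ("MGEMOMV", "in", (2.0, 4.0)),
    ("MAUT1", "range", (4.8, 9.0)),
    ("MRELGE", "range", (5.0, 9.0)),
]


def matches(record: Dict[str, float], rule: List[Condition]) -> bool:
    """Return True if the record satisfies every condition in the antecedent A."""
    for attr, kind, values in rule:
        x = record[attr]
        if kind == "in":
            if x not in values:
                return False
        else:
            low, high = values
            if not low <= x <= high:
                return False
    return True


def rule_stats(records: List[Dict[str, float]], labels: List[int],
               rule: List[Condition], cluster: int) -> Tuple[int, float]:
    """Return (support, confidence %) for IF rule THEN Cluster = cluster.

    Support counts cases where A and B both hold; confidence is P(B | A) * 100.
    """
    n_a = 0   # cases satisfying A
    n_ab = 0  # cases satisfying A that also belong to `cluster` (B)
    for record, label in zip(records, labels):
        if matches(record, rule):
            n_a += 1
            if label == cluster:
                n_ab += 1
    confidence = 100.0 * n_ab / n_a if n_a else 0.0
    return n_ab, confidence


# Made-up records from "another dataset", with known cluster labels for scoring.
records = [
    {"ABRAND": 1.0, "APERSAUT": 1.0, "MGEMOMV": 2.0, "MAUT1": 6.0, "MRELGE": 7.0},
    {"ABRAND": 1.0, "APERSAUT": 1.0, "MGEMOMV": 4.0, "MAUT1": 9.0, "MRELGE": 5.0},
    {"ABRAND": 1.0, "APERSAUT": 1.0, "MGEMOMV": 3.0, "MAUT1": 6.0, "MRELGE": 7.0},
    {"ABRAND": 0.0, "APERSAUT": 1.0, "MGEMOMV": 2.0, "MAUT1": 6.0, "MRELGE": 7.0},
]
labels = [16, 3, 16, 16]

support, confidence = rule_stats(records, labels, CLUSTER_16_RULE, cluster=16)
print(support, confidence)  # 1 50.0
```

Two of the four records satisfy A (boundary values count, because the rule uses <= and >=), and only one of those is labeled 16, so the sketch reports a support of 1 and a confidence of 50%. The excerpt does not say whether Oracle computes support exactly this way.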
When to use K-Means Analysis
K-Means is recommended for datasets with a relatively small number of
attributes (fewer than 500). The user specifies the number of clusters
(the default is 10), and normalizing the dataset is recommended as
preparation for the analysis. The advantage of using enhanced k-Means
clustering (the algorithm included in Oracle Data Miner) is that it
produces results superior to those obtained with the traditional
k-Means techniques used in other data mining programs.
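The passage names two preparation choices: normalizing the data and fixing the number of clusters in advance. The Python sketch below illustrates both using only the standard library. It applies min-max scaling (one common normalization; the excerpt does not say which method Oracle Data Miner uses) and then runs the traditional Lloyd's k-means that the text contrasts with Oracle's enhanced version. It is not Oracle's enhanced algorithm. The function names and the `k=10` default, chosen to mirror the default mentioned above, are illustrative assumptions.

```python
import random
from typing import List, Tuple

Point = List[float]


def min_max_normalize(rows: List[Point]) -> List[Point]:
    """Rescale every column to [0, 1]; a constant column becomes all 0.0."""
    columns = list(zip(*rows))
    mins = [min(c) for c in columns]
    maxs = [max(c) for c in columns]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows: List[Point], k: int = 10, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Traditional (Lloyd's) k-means; k defaults to 10 to mirror the default in the text."""
    if not 1 <= k <= len(rows):
        raise ValueError("k must be between 1 and the number of rows")
    rng = random.Random(seed)
    # Initialize centroids with k rows chosen at random without replacement.
    centroids = [list(p) for p in rng.sample(rows, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each row joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in rows
        ]
        if new_labels == labels:
            break  # no row changed cluster, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Two attributes on very different scales; unscaled, the second would dominate distances.
data = [[1.0, 200.0], [1.2, 210.0], [0.8, 190.0], [9.0, 20.0], [9.5, 25.0], [8.8, 18.0]]
normalized = min_max_normalize(data)
cluster_ids, centers = kmeans(normalized, k=2, seed=1)
print(cluster_ids)
```

Because the second attribute spans roughly 190 units and the first less than 9, unscaled Euclidean distances would be driven almost entirely by the second attribute; after scaling, both contribute on the same [0, 1] range. Lloyd's k-means can also settle on a local optimum that depends on the initial centroids, so on less well-separated data the result may change with `seed`.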