This is an excerpt from Dr. Hamm's premier book "Oracle Data Mining: Mining Gold from Your Warehouse".
Predictive analytics concerns the prediction of
future probabilities. With predictive analytics, the data mining
analyst takes the case dataset, identifies two key components (the
case ID and the target attribute), and, voilà, a model is applied to
the data. Predictive analytics builds models automatically by
combining predictors drawn from your case dataset. The results
explain which attributes are important in predicting the outcome or
target, and the probability that each case will have the predicted
target value.
In contrast to the methods previously described
in this text, predictive analytics requires no decisions from the
data analyst about picking an algorithm, adjusting sensitivity
values, or any other settings. In essence, predictive analytics
simplifies the process and fully automates data mining.
Oracle Predictive Analytics is composed of the
PREDICT and EXPLAIN wizards and is based on the Oracle 10g Release 2
PL/SQL packages DBMS_PREDICTIVE_ANALYTICS and DBMS_ODM. Oracle
exposes this functionality through both ODMr and the Predictive
Analytics spreadsheet add-in for MS Excel. You can find the Predict
and Explain wizards under the Data toolbar. The Explain Wizard
identifies the attributes that are most important for explaining the
target attribute.
The wizard analyzes the input table, prepares
the data, builds a model, analyzes the model to identify the
important attributes, and creates a table that ranks the attributes
by importance. The output table lists the attributes in decreasing
order of importance for explaining the target values. Importance is
a number between 0 and 1, with 1 being most important.
After you identify the case dataset, Step 2 of
the Explain Wizard asks you to select the attribute you wish to
explain. This is the target attribute; for the
Mining_Data_Build_V dataset the target is AFFINITY_CARD.
All that is left is to pick a name for the output table and click
Finish.
In the “Explain Output” shown below, the top 10
ranking attributes for predicting whether a customer has an affinity
card are HOUSEHOLD_SIZE, CUST_MARITAL_STATUS, YRS_RESIDENCE,
Y_BOX_GAMES, EDUCATION, HOME_THEATER_PACKAGE, OCCUPATION,
CUST_GENDER, AGE and BOOKKEEPING_APPLICATION. The model built by
predictive analytics sets the importance of the remaining columns at
zero.
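The Explain steps can be illustrated with a small, self-contained Python sketch. It is not Oracle's implementation: as an assumption, it scores each attribute with the uncertainty coefficient, the fraction of the target's entropy that is removed by knowing the attribute, which conveniently lies between 0 and 1. The function names `explain`, `entropy`, and `conditional_entropy`, their parameters, and the toy data are all hypothetical. The sketch assumes attributes are already discrete (numeric columns would need binning during data preparation), and it skips the case ID column, since a unique identifier would trivially score 1.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(target, attr):
    """H(target | attr): target entropy averaged over the attribute's values."""
    groups = defaultdict(list)
    for a, t in zip(attr, target):
        groups[a].append(t)
    n = len(target)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def explain(rows, target, case_id=None):
    """Rank attributes by importance in [0, 1] (hypothetical sketch).

    rows: non-empty list of dicts holding discrete (already binned) values.
    Importance is the uncertainty coefficient (H(T) - H(T|A)) / H(T),
    an assumed stand-in for whatever measure Oracle actually uses.
    """
    t = [r[target] for r in rows]
    h_t = entropy(t)
    ranking = []
    for name in rows[0]:
        if name in (target, case_id):
            continue  # skip the target itself and the case identifier
        if h_t == 0:
            score = 0.0  # constant target: nothing to explain
        else:
            a = [r[name] for r in rows]
            score = (h_t - conditional_entropy(t, a)) / h_t
        # Clamp tiny floating-point overshoot and round for display
        ranking.append((name, round(min(1.0, max(0.0, score)), 4)))
    # Output table: attributes in decreasing order of importance
    return sorted(ranking, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: "signal" determines T, "noise" does not
    rows = [
        {"ID": 1, "signal": "a", "noise": "x", "T": 1},
        {"ID": 2, "signal": "a", "noise": "y", "T": 1},
        {"ID": 3, "signal": "b", "noise": "x", "T": 0},
        {"ID": 4, "signal": "b", "noise": "y", "T": 0},
    ]
    for name, importance in explain(rows, "T", case_id="ID"):
        print(name, importance)  # prints "signal 1.0", then "noise 0.0"
```

In this sketch an attribute that is statistically independent of the target in the data scores exactly 0, which loosely parallels the zero importance given to the remaining columns; Oracle's own measure may assign zeros by different criteria.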
The Predict Wizard assigns a prediction of the target value, and a
probability for that prediction, to every case in the dataset. The
Predict Wizard analyzes the input table, prepares the data, builds
the model, analyzes the model, and creates a table with three
columns: the case ID, the prediction of the target value, and the
probability of the prediction.
Shown below is a portion of the Predict table,
giving the CUST_ID, PREDICTION, and PROBABILITY columns when
OCCUPATION is the target attribute for the Mining_Data_Build_V
dataset.
The Predict Wizard is independent of the Explain
Wizard: you are not applying the Explain model to new data. In other
words, Explain and Predict are both forms of data-centric automated
data mining, and because all supporting objects are linked to a data
source, the problem of matching a model to its data is eliminated.
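The Predict behavior can be sketched the same way, again in Python and again hypothetically. This sketch swaps in a categorical Naive Bayes classifier with add-one (Laplace) smoothing, which is an assumption and not necessarily what Oracle uses. The hypothetical `predict` function builds its model internally and immediately scores every case, returning (case ID, prediction, probability) triples. Treating rows whose target is `None` as unlabeled (scored but not used for training) is a choice made for this sketch only.

```python
import math
from collections import Counter


def predict(rows, case_id, target):
    """Build a model on `rows` and score every case (hypothetical sketch).

    Uses categorical Naive Bayes with add-one (Laplace) smoothing.
    Rows whose target is None are scored but not used for training.
    Returns a list of (case_id, prediction, probability) tuples.
    """
    labeled = [r for r in rows if r[target] is not None]
    n = len(labeled)
    attrs = [k for k in rows[0] if k not in (case_id, target)]
    priors = Counter(r[target] for r in labeled)
    # counts[a][(value, cls)]: training rows where attribute a == value and class == cls
    counts = {a: Counter((r[a], r[target]) for r in labeled) for a in attrs}
    n_values = {a: len({r[a] for r in rows}) for a in attrs}

    results = []
    for r in rows:
        # Unnormalized log posterior for each class
        log_post = {}
        for cls, n_cls in priors.items():
            score = math.log(n_cls / n)
            for a in attrs:
                score += math.log((counts[a][(r[a], cls)] + 1) / (n_cls + n_values[a]))
            log_post[cls] = score
        # Normalize with the log-sum-exp trick to get probabilities
        top = max(log_post.values())
        total = sum(math.exp(s - top) for s in log_post.values())
        best = max(log_post, key=log_post.get)
        results.append((r[case_id], best, math.exp(log_post[best] - top) / total))
    return results
```

Because the model never leaves the function, there is no separate apply step in which a model could be matched to the wrong data; the caller supplies only the table, the case ID column, and the target column, which mirrors the data-centric interface described above.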