This is an excerpt from Dr. Ham's premier book "Oracle
Data Mining: Mining Gold from your Warehouse".
A similar curve to the lift chart is the ROC (short for Receiver Operating Characteristic)
curve. The ROC curve uses the same metric on the y-axis as the lift curve,
but plots it against the false positive rate (the proportion of actual negatives
incorrectly classified as positive) for different cutoff levels.
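To make the effect of the cutoff concrete, here is a minimal Python sketch of how ROC points can be computed by hand. In the standard formulation the curve plots the true positive rate against the false positive rate, which is one minus the proportion of true negatives correctly classified. The function name `roc_points`, the toy labels and the toy scores are all hypothetical and are not taken from the book's dataset or from Oracle Data Miner.

```python
from typing import List, Tuple


def roc_points(
    labels: List[int], scores: List[float], cutoffs: List[float]
) -> List[Tuple[float, float, float]]:
    """Return (cutoff, false_positive_rate, true_positive_rate) per cutoff."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for cutoff in cutoffs:
        # Assumption: a case is predicted positive when its score reaches the cutoff.
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
        points.append((cutoff, fp / negatives, tp / positives))
    return points


# Hypothetical toy data: 1 = has an affinity card, 0 = does not.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]

for cutoff, fpr, tpr in roc_points(labels, scores, [0.9, 0.5, 0.1]):
    print(f"cutoff={cutoff:.1f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Each cutoff yields one point on the curve; lowering the cutoff moves the point up and to the right, trading more true positives for more false positives.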
The default cutoff level is 0.5, but we may be more
interested in customers who are more likely to have an affinity card than those
who do not.
The ROC metric
gives us the opportunity to explore “what-if” analysis. Let’s reduce the false
negative value as much as possible with the requirement that we keep the total
number of positive predictions under 150.
We may have a budget constraint so that we can only print 150
brochures. The false negatives in our model amount to 77 cases. The red
vertical line is set at the 0.5 probability threshold.
By moving the red vertical line to the right, we change the
values in the confusion matrix.
Changing the probability threshold to 0.886 reduces the
false negatives from 77 to 58 and keeps the total number of positive predictions
(52 + 95 = 147) below 150. Now that we have modified the ROC chart, we can use
these metrics when we apply the model to our dataset.
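The “what-if” search described above can be sketched in Python as a scan over candidate thresholds that keeps the one with the fewest false negatives while the number of positive predictions stays under the budget. This is only an illustration under stated assumptions: the function `pick_threshold` and the toy data are hypothetical, a case is treated as positive when its score is at or above the threshold, and the direction in which the slider moves in Oracle Data Miner may not match this convention.

```python
from typing import List, Optional, Tuple


def pick_threshold(
    labels: List[int], scores: List[float], budget: int
) -> Optional[Tuple[float, int, int]]:
    """Return (threshold, predicted_positives, false_negatives) with the fewest
    false negatives among thresholds that keep positive predictions under budget."""
    best = None
    for threshold in sorted(set(scores)):
        # Assumption: score >= threshold means "predicted positive".
        predicted = sum(1 for s in scores if s >= threshold)
        if predicted >= budget:
            continue  # too many brochures to print
        fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
        if best is None or fn < best[2]:
            best = (threshold, predicted, fn)
    return best


# Hypothetical toy data: 1 = has an affinity card, 0 = does not.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]

print(pick_threshold(labels, scores, budget=4))

# The figures quoted in the text pass the same budget check: 52 + 95 = 147 < 150.
assert 52 + 95 == 147 < 150
```

The function returns `None` when no threshold fits the budget, which signals that the constraint cannot be met with the available scores.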
Applying changes to a ROC Model
To change the actual ROC model, we can follow these steps:
1. Return to the Mining Activity display for ALL_US_NB1 and click on the ROC
Threshold: 0.95151359 link in the Test Metrics block.
2. Move the vertical red line to 0.88639 or click this value under
Probability Threshold, then click OK.
3. You’ll see that the ROC Threshold has the new value. You do not need to
re-run the test step for this new threshold to be used when you apply the model.
Applying the ROC Model
Now we will apply the model to new data so that we can
prepare our mailing list. This is also known as “scoring the data”. When a
model is applied to new data, the data must be prepared and transformed in
exactly the same way that the original source data was prepared for the model
building. Remember that we built the Naïve Bayes classification model on a subset of
the MINING_DATA_BUILD_V view,
namely all the ‘United States of America’ customers. Now let’s apply the
model to all other customers.
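The requirement that new data be prepared in exactly the same way as the build data can be illustrated with a small Python sketch: the transformation parameters are learned once from the build data and then reused unchanged on the new data. Min-max normalization is used here purely as an example; it is an assumption, not necessarily the transformation Oracle Data Miner applied, and the function names and values are hypothetical.

```python
from typing import Dict, List


def fit_min_max(values: List[float]) -> Dict[str, float]:
    """Learn normalization parameters from the build data only."""
    return {"min": min(values), "max": max(values)}


def apply_min_max(values: List[float], params: Dict[str, float]) -> List[float]:
    """Scale values with previously learned parameters (never refit here)."""
    span = params["max"] - params["min"]
    if span == 0:
        # Constant column in the build data: map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - params["min"]) / span for v in values]


# Hypothetical ages from the build data and from the new data to be scored.
build_ages = [20.0, 30.0, 60.0]
new_ages = [25.0, 70.0]

params = fit_min_max(build_ages)               # learned at build time
scaled_new = apply_min_max(new_ages, params)   # reused at apply time
print(scaled_new)
```

Values in the new data can fall outside the range seen at build time, as the second age does here; the point of the sketch is only that the parameters come from the build step and are not recomputed.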
Generalizing the Model
We want to create a new view, so in the main toolbar, click
on Data, then Create View to start the Create View Builder. Expand the data
source list under your connection and double click on the MINING_DATA_BUILD_V
view. The column names will appear in the window; click on the top-most box to
select all the attributes.
Under Create Where Clause, choose COUNTRY_NAME and
“doesn’t contain” from the drop-down lists, and type “America” in the third
box.
Click the View Results tab to see what the dataset looks
like, and if satisfactory, choose Create View under File. Type in a name for
the new view, such as MINING_DATA_BUILD_V_NOUS
and click OK.
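The view built in the wizard corresponds roughly to a filter that keeps rows whose COUNTRY_NAME does not contain “America”. The Python sketch below reproduces that idea against an in-memory SQLite database so that it can be run anywhere. It is an assumption-laden stand-in, not the SQL that Oracle Data Miner generates: the table here is a two-column toy replacement for the real MINING_DATA_BUILD_V view, the rows are invented, and SQLite's LIKE is case-insensitive for ASCII text, unlike Oracle's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy stand-in for MINING_DATA_BUILD_V with hypothetical rows.
conn.execute(
    "CREATE TABLE MINING_DATA_BUILD_V (CUST_ID INTEGER, COUNTRY_NAME TEXT)"
)
conn.executemany(
    "INSERT INTO MINING_DATA_BUILD_V VALUES (?, ?)",
    [(1, "United States of America"), (2, "Argentina"), (3, "Italy")],
)

# "COUNTRY_NAME doesn't contain America" expressed as a NOT LIKE filter.
conn.execute(
    """
    CREATE VIEW MINING_DATA_BUILD_V_NOUS AS
    SELECT * FROM MINING_DATA_BUILD_V
    WHERE COUNTRY_NAME NOT LIKE '%America%'
    """
)

rows = conn.execute(
    "SELECT CUST_ID, COUNTRY_NAME FROM MINING_DATA_BUILD_V_NOUS ORDER BY CUST_ID"
).fetchall()
print(rows)
```

Only the non-US rows remain visible through the view, which is the dataset the model is applied to in the next steps.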
To apply the model to the MINING_DATA_BUILD_V_NOUS view:
1. Launch the Activity Guide Apply wizard from the Activity menu.
2. Choose the ALL_US_NB1 model under Classification.
All the information about data preparation and model
metadata will be passed to the apply activity from the build model.