This is an excerpt from Dr. Ham's premier book
"Oracle
Data Mining: Mining Gold from your Warehouse".
Clicking the Data Mining Activityin the Navigator pane shows a listing of activities that
were created under each of the algorithms, and by selecting an
activity the details of all the steps and the output, build, and
test data created are shown for each step. In previous
releases, these tasks were not automated, leaving the data mining
analyst to prep the data and create the interim data tables or views
before building and testing each model. The analyst kept notes
of all the tables and views generated by this process in order to
document the process and replicate the resultant datasets and
analyses.
These transformation wizards,
although now automated, are still available in 10gR2 and allow
analysts to customize the data mining tasks to their liking.
In this chapter we will also introduce predictive analytics, a fully
automated “one click” data mining algorithm included in ODMr.
ODMrActivity Builder Tasks
Let’s look at the Support Vector
Mining Activitycreated for the
Boston housing market. The model produced output data for the
outlier treatment, missing values, normalization,
split, build and test steps.
The outlier treatment created an
output dataset named (in the example case for this text, it is)
“DMUSER_BOOK”.”DM4J$VBOSTON_PRICE212110607”.
This dataset can be viewed from
the Data Viewer by clicking on the output data name. You can
view the structure of this dataset, scroll through the data and view
the lineage to see where the data was generated from. This
dataset is then used as input for the next step in the Activity
list, taking care of missing values, which results in a new output
dataset named “DMUSER
_BOOK”.”DM4J$VBOSTON_PRICE569386744”, used as input to the splitting
algorithm.
Each step in the Activity
generates an output table that is used in the following step.
This feature is a huge help to the analyst for keeping track of the
many steps of building the final result set.
For many business applications,
the default settings used by the Activity Builder will be sufficient
to guide data mining activities. For normalizing and treating
outlier data, we probably do not want to change the settings given
by the data mining experts. However, you may want to change
the binning to something else that might have more meaning for the
business rules you are interested in.
In that case, you will want to
use the Discretizewizard to
prepare the data more to your liking. There may also be
situations where you choose to use a different normalizationscheme. We’ll examine these transformation wizards
in this chapter. First, let’s review the Naïve BayesMining Activitywe
created in Chapter One. Recall that we created an Activity
named ALL_US_NB1. Selecting this Activity brings up the Data
Mining Activity steps. In the Mining Activity pane we
can see that the case table is DMUSER_BOOK.MINING_DATA_BUILD_V_US,
with the Unique Identifier CUST_ID, target attribute of
AFFINITY_CARD,
and comment “Naïve Bayes classification for all US customers”.
Note that “Case Table” is a link
that opens a Data Viewer window where you can see the Structure,
Data, and View Lineage.
ODMr named the Mining Data “DMUSER
_BOOK”.”DM4J$VMINING_DATA_28226663”. You can
also link directly to this case table by clicking on MINING DATA
underneath the comment.
In the “View Lineage”
tab, we can see the SQL statement defining the case table for the
Mining Activity:
SELECT
"MINING_DATA_BUILD_V_US"."CUST_ID"
as "DMR$CASE_ID", TO_CHAR( "MINING_DATA_BUILD_V_US"."AFFINITY_CARD")
AS "AFFINITY_CARD", "MINING_DATA_BUILD_V_US"."AGE" AS "AGE",
TO_CHAR( "MINING_DATA_BUILD_V_US"."BOOKKEEPING_APPLICATION") AS "BOOKKEEPING_APPLICATION",
TO_CHAR( "MINING_DATA_BUILD_V_US"."BULK_PACK_DISKETTES") AS "BULK_PACK_DISKETTES",
"MINING_DATA_BUILD_V_US"."CUST_GENDER" AS "CUST_GENDER", "MINING_DATA_BUILD_V_US"."CUST_INCOME_LEVEL"
AS "CUST_INCOME_LEVEL", "MINING_DATA_BUILD_V_US"."CUST_MARITAL_STATUS"
AS "CUST_MARITAL_STATUS", "MINING_DATA_BUILD_V_US"."EDUCATION"
AS "EDUCATION", TO_CHAR( "MINING_DATA_BUILD_V_US"."FLAT_PANEL_MONITOR")
AS "FLAT_PANEL_MONITOR", TO_CHAR( "MINING_DATA_BUILD_V_US"."HOME_THEATER_PACKAGE")
AS "HOME_THEATER_PACKAGE", "MINING_DATA_BUILD_V_US"."HOUSEHOLD_SIZE"
AS "HOUSEHOLD_SIZE", "MINING_DATA_BUILD_V_US"."OCCUPATION"
AS "OCCUPATION", TO_CHAR( "MINING_DATA_BUILD_V_US"."OS_DOC_SET_KANJI")
AS "OS_DOC_SET_KANJI", "MINING_DATA_BUILD_V_US"."YRS_RESIDENCE"
AS "YRS_RESIDENCE", TO_CHAR( "MINING_DATA_BUILD_V_US"."Y_BOX_GAMES")
AS "Y_BOX_GAMES"
FROM "DMUSER_BOOK"."MINING_DATA_BUILD_V_US"
The Sample Step was skipped in
this example, so we go on to Discretize
and click on the Output Data link to view the data.
Note that the AGE attribute has been binned so that ages have been
coded as 1, 2 and 3.
We can see what was done to bin
the data by reviewing the SQL statement in the “View Lineage”
tab:
SELECT
"AFFINITY_CARD",(
CASE WHEN "AGE" < 32 THEN 1
WHEN "AGE" <= 44 THEN 2
WHEN "AGE" > 44 THEN 3
end) "AGE", "BOOKKEEPING_APPLICATION", "BULK_PACK_DISKETTES",
"CUST_GENDER", "CUST_INCOME_LEVEL", "CUST_MARITAL_STATUS", "DMR$CASE_ID",
"EDUCATION", "FLAT_PANEL_MONITOR", "HOME_THEATER_PACKAGE", "HOUSEHOLD_SIZE",
"OCCUPATION", "OS_DOC_SET_KANJI",( CASE WHEN "YRS_RESIDENCE" < 3
THEN 1
WHEN "YRS_RESIDENCE" <= 5 THEN 2
WHEN "YRS_RESIDENCE" > 5 THEN 3
end) "YRS_RESIDENCE", "Y_BOX_GAMES"
FROM "DMUSER_BOOK"."DM4J$VMINING_DATA_28226663"