This is an excerpt from Dr. Ham's premier book
"Oracle
Data Mining: Mining Gold from your Warehouse".
For times when you want to group
your data, a useful transform in ODMr
is the Aggregate Transformation Wizard. In the
Mining_Data_Build_V_US dataset you may want to count how many of
each items were sold to customers with an affinity card, for
example.
You may want to view trends in
sales by aggregating daily sales to the week, month or year level.
You can calculate, store, and export aggregated values such as sum,
average, max, min, standard deviation, sum and variance. This
information may be saved in a view or table, or exported in csv or
text format.
In this example, the Aggregate
Transform Wizard is used to visualize customer buying habits grouped
by occupation in the Mining_Data_Build_V_US dataset. For every
level of OCCUPATION, data was aggregated using the average, count
and max functions. The wizard provides an easy interface for adding
and editing functions for any attribute in the case dataset, and
gives you a preview of the result. After viewing the preview you
can go back and delete, edit or add more functions before finishing
the transform. The SQL statement shown is provided by the transform
wizard for creating the resulting view.
CREATE VIEW "DMUSER
_BOOK"."MINING_DATA_BUILD_980685885"
AS
SELECT AVG ( ALL "AGE" ) AS "Ave_Age", AVG ( ALL "YRS_RESIDENCE"
) AS "Ave_Years_Res", MAX ( ALL "EDUCATION" ) AS "MAX_Education",
COUNT ( DISTINCT "FLAT_PANEL_MONITOR" ) AS "NUM_Flat_Panel", COUNT
( DISTINCT "HOME_THEATER_PACKAGE" ) AS "Num_Home_Theater", COUNT (
DISTINCT "OS_DOC_SET_KANJI" ) AS "Num_Os_Doc", COUNT ( DISTINCT "PRINTER_SUPPLIES"
) AS "NUM_Printer_Supplies", COUNT ( DISTINCT "Y_BOX_GAMES" ) AS "NUM_Y_Box",
"OCCUPATION" AS "OCCUPATION"
FROM "DMUSER_BOOK"."MINING_DATA_BUILD_V_US" GROUP BY "OCCUPATION"
The resulting aggregated data is
shown in Table 5.1:
Recode
the Transformation Wizard
A similar
transformation to the Discretize
transformation is
the Recode transformation.
The Recode Wizard
supports the relational operators =, <=, and >=. This makes it
difficult to partition values at a particular numerical value
(example: Latitude <= 4.2 vs. Latitude > 4.2). Recode is really
meant for use with strings and integers; use Discretization to
accomplish the same purpose for real numbers.
You can recode
single values, NULL values, ranges of values, and Other values. For
single values and ranges of values, you can pick the values from the
dropdown that has been populated with all possible attribute values,
or you can enter new values that have a compatible data type.
The new or recode
value can have a different data type from the old value (for
example, you could recode 0 as the one character y). Recode
definitions are sorted in the following order in the listbox: NULL
values (first), single values, ranges of values, and Other values
last.
If the wizard
detects problems when you are defining a recode scheme, it generates
a message explaining the problem.
Using the
Split Transformation WizardIf you are using a version of ODMrprior to 10gR2, you will also need to create the build
and test datasets. This is done by using the split transformation
to generate build and test tables or views of randomized data from a
single case table.
The use of the split
transformationsis
straightforward. Select the case table or view that you are
analyzing, and select Split in the transformation menu. Re-name
each table from the default (e.g. "boston_priceT103959454") if
desired to a more meaningful name (BOSTON_PRICE_TEST1). In the
final step of the wizard, select the percent of cases desired in
each table. The wizard defaults to 60% for the build table and 40%
for the test table. The SQL code used to create the build and test
tables is shown below:
CREATE TABLE "DMUSER_BOOK"."BOSTON_PRICE_BUILD1"
AS
SELECT "OBS", "TOWN", "TOWN#", "TRACT", "LON",
"LAT", "MEDV", "CMEDV", "CRIM", "ZN", "INDUS", "CHAS", "NOX
", "RM", "AGE", "DIS", "RAD", "TAX", "PTRATIO",
"B", "LSTAT" FROM (SELECT /*+ no_merge */ t.*, ROWNUM RNUM
FROM "DMUSER
_BOOK"."boston_price" t)
WHERE ORA_HASH(RNUM,99,0) < 60
CREATE TABLE "DMUSER_BOOK"."
BOSTON_PRICE_TEST1" AS
SELECT "OBS", "TOWN", "TOWN#", "TRACT", "LON",
"LAT", "MEDV", "CMEDV", "CRIM", "ZN", "INDUS", "CHAS", "NOX
", "RM", "AGE", "DIS", "RAD", "TAX", "PTRATIO",
"B", "LSTAT" FROM (SELECT /*+ no_merge */ t.*, ROWNUM RNUM
FROM "DMUSER_BOOK"."boston_price"
t)
WHERE ORA_HASH(RNUM,99,0) >= 60