This is an excerpt from Dr. Ham's premier book "Oracle Data Mining: Mining Gold from your Warehouse".
The Numeric Transformation wizard allows you to
create a view by applying one of a list of predefined functions to
one or more numeric attributes. These functions modify the data
distribution characteristics and/or normalize the data values.
You can select one of the following predefined schemes:
EXP(x), where x <= 70 (Oracle database limit)
1 / EXP(AVG(x) - x), where AVG(x) - x <= 70 (Oracle database limit)
LN(x + a), where a is a user-supplied numeric constant and x + a > 0; LN is the natural logarithm. Use this scheme when you are dealing with large numbers; this transformation makes the distribution more like a normal distribution.
LN((x - a) / (b - x)), where a and b are user-supplied numeric constants and (x - a) / (b - x) > 0; LN is the natural logarithm.
LOG(10, x + a), where a is a user-supplied numeric constant and x + a > 0; LOG(10, z) is the logarithm to the base 10 of z. Use this scheme when you are dealing with large numbers; this transformation makes the distribution more like a normal distribution.
SQRT(x), where x >= 0. Use this function to linearize a distribution.
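As a rough illustration of what such a view might look like, the sketch below applies several of these schemes by hand to the Boston price table used later in this excerpt. The view name, the choice of columns, the output column names, and the constant a = 1 are assumptions made for the example. This is not wizard-generated code, and the wizard may express AVG(x) differently (for example, as a precomputed constant).

```sql
-- Sketch only: hand-written, not wizard output.
-- The view name "boston_price_numxform" and the *_LN, *_LOG10, *_SQRT, *_EXP
-- column names are hypothetical.
CREATE VIEW "DMUSER_BOOK"."boston_price_numxform" AS
SELECT
  "OBS",
  -- LN(x + a) with a = 1 (assumed constant); valid only while CRIM + 1 > 0
  LN("CRIM" + 1)                      AS "CRIM_LN",
  -- LOG(10, x + a) with a = 1 (assumed constant); valid only while TAX + 1 > 0
  LOG(10, "TAX" + 1)                  AS "TAX_LOG10",
  -- SQRT(x); valid only while LSTAT >= 0
  SQRT("LSTAT")                       AS "LSTAT_SQRT",
  -- 1 / EXP(AVG(x) - x); AVG(...) OVER () takes the average over all rows
  1 / EXP(AVG("RM") OVER () - "RM")   AS "RM_EXP",
  "CMEDV"
FROM "DMUSER_BOOK"."boston_price"
```

Each expression raises an error if a row violates the stated condition (for example, LN of a non-positive number), so in practice you might guard the expression with a CASE or filter such rows first.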
Using the Outlier Treatment Transformation Wizard
The Outlier Treatment Transformation is used to
generate recommended outlier treatments based on the mining
algorithm that you plan to use. If you invoke this wizard from a
Mining Activity, the appropriate default treatments are specified
automatically.
The default algorithm settings are shown below.
An outlier is a data point that is located far from the rest of the
data. An outlier is typically several standard deviations from the
mean. Some data mining algorithms are sensitive to outliers in data.
The Outlier Treatment wizard identifies outliers and lets you
specify how to treat them.
You specify a treatment by defining what
constitutes an outlier (for example, all values in the top and
bottom 5% of values) and how to replace outliers (either with NULL
or edge values). The wizard can generate default treatments based on
the algorithm that you are planning to use.
If you prefer, you can use the wizard to specify how outliers are identified.
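To make the percentile example concrete, the query below is a minimal sketch of a "top and bottom 5%" treatment that replaces outliers with edge values rather than NULL. It is written by hand against the Boston price table; the aliases p, b, lo, and hi are hypothetical, and computing the cutoffs inline with PERCENTILE_CONT is an assumption of this sketch, not necessarily how the wizard derives them.

```sql
-- Sketch only: percentile cutoffs with edge-value replacement.
-- Aliases p, b, lo, hi are illustrative names, not wizard output.
SELECT
  p."OBS",
  CASE
    WHEN p."CMEDV" < b.lo THEN b.lo   -- low outlier: replace with the lower edge
    WHEN p."CMEDV" > b.hi THEN b.hi   -- high outlier: replace with the upper edge
    ELSE p."CMEDV"                    -- in range (or NULL): keep as is
  END AS "CMEDV"
FROM "DMUSER_BOOK"."boston_price" p
CROSS JOIN (
  -- One-row subquery holding the 5th and 95th percentile cutoffs
  SELECT
    PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY "CMEDV") AS lo,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY "CMEDV") AS hi
  FROM "DMUSER_BOOK"."boston_price"
) b
```

Replacing b.lo and b.hi in the THEN branches with NULL gives the other treatment, in which outliers are discarded instead of clamped.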
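The following SQL statement shows the code that the wizard generates when the
outlier treatment for the attribute CMEDV in the Boston price dataset is set to
Standard Deviation with Multiples of Sigma = 3. Values outside the range from
-5.02 to 50.08 are replaced with NULL.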
CREATE VIEW "DMUSER_BOOK"."boston_price58837524" AS
SELECT
  "AGE", "B", "CHAS",
  (CASE
     WHEN "CMEDV" < -5.02 THEN NULL
     WHEN "CMEDV" >= -5.02 AND "CMEDV" <= 50.08 THEN "CMEDV"
     WHEN "CMEDV" > 50.08 THEN NULL
   END) "CMEDV",
  "CRIM", "DIS", "INDUS", "LAT", "LON", "LSTAT", "MEDV", "NOX",
  "OBS", "PTRATIO", "RAD", "RM", "TAX", "TOWN", "TOWN#", "TRACT", "ZN"
FROM "DMUSER_BOOK"."boston_price"
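The literal cutoffs in the generated view are presumably the mean of CMEDV minus and plus three standard deviations. The query below is a hedged sketch of how you might check that yourself. The aliases lower_cutoff and upper_cutoff are hypothetical, and the use of STDDEV (the sample standard deviation) is an assumption, since the excerpt does not say which standard deviation the wizard uses.

```sql
-- Sketch only: recompute mean +/- 3 sigma for CMEDV.
-- Assumes the wizard uses the sample standard deviation (STDDEV);
-- substitute STDDEV_POP if the population form is intended.
SELECT
  ROUND(AVG("CMEDV") - 3 * STDDEV("CMEDV"), 2) AS lower_cutoff,
  ROUND(AVG("CMEDV") + 3 * STDDEV("CMEDV"), 2) AS upper_cutoff
FROM "DMUSER_BOOK"."boston_price"
```

If the assumption holds, the two values returned should be close to the -5.02 and 50.08 that appear in the wizard-generated CASE expression.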