Linear Regression modeling for Oracle databases
Don Burleson
Oracle started with predictive modeling in the Oracle
Data Mining (ODM) tools, and Oracle Corporation is developing
Automatic Maintenance Tasks (AMT), a new Oracle10g feature that
will automatically detect and rebuild sub-optimal indexes.
There has been great discussion about using the
scientific method with Oracle databases, and how mathematical models
are developed for Oracle. Predicting the future
without historical justifications is the realm of psychics, not
scientists. Virtually
every predictive model in Oracle software is built from
historical data stored in the database:
Data mining can sift
through massive amounts of data and find hidden information —
valuable information that can help you better understand your
customers and anticipate their behavior.
So, do Oracle modeling rules have to
make sense? No, of course not. The Oracle scientists who
created the Oracle data mining tools do not make the mistake of
demanding an explanation for every rule.
They scan historical data, identify statistically significant
correlations (conventionally, results that differ from what chance
alone would produce by about two standard errors or more), and
base their results on empirical truths, not theory.
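As a rough illustration of what "scan historical data and identify statistically significant correlations" can mean in practice, here is a minimal Python sketch. It is not how the ODM tools are implemented. It computes a Pearson correlation and the usual t-statistic for it. The function names, the sample data, and the cutoff of 2.0 are all assumptions made for this example, and the cutoff is only a rule of thumb that is reasonable for larger samples.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two series of equal length with at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t-statistic for testing whether a correlation r over n points is zero."""
    if abs(r) >= 1.0:
        # A perfect correlation gives an unbounded statistic.
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def is_significant(r, n, threshold=2.0):
    """Rule-of-thumb check; threshold=2.0 is an assumption, not an ODM setting."""
    return abs(t_statistic(r, n)) > threshold

# Hypothetical history: concurrent sessions vs. average wait time (made-up numbers).
sessions = [10, 20, 30, 40, 50, 60, 70, 80]
wait_ms = [12, 15, 11, 19, 22, 20, 27, 30]

r = pearson_r(sessions, wait_ms)
print("r =", round(r, 3), "significant:", is_significant(r, len(sessions)))
```

Note that the check says nothing about why the two series move together. It only reports whether the historical data supports the relationship, which is the point of the argument above.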
For
example, the popular
MMPI test
is a set of 500 true/false questions that assesses personality with
remarkable validity, and its results are accepted in all U.S.
courts. Its test base consists of hundreds of thousands of
subjects with a pre-diagnosed mental disorder (see
DSM IV).
By comparing their responses to seemingly innocuous questions (e.g.
"I read the editorials in the newspaper every day"), a proven
predictive model was created. Federal
courts have affirmed the MMPI as a scientifically valid and
accepted procedure for personality assessment.
For example, a subject's preference for
showers over baths is an extremely reliable measure of self-esteem.
Do we know why? No. Do we care? Not really. All
that is proven is that this correlation is a statistically reliable
predictor of feelings of self-worth. We see the exact same
scientific principle applied to Oracle data mining (ODM) tools.
For example, we might find out that people with red hair buy a
disproportionate amount of skin care products. Knowing "why"
is not important. What's important is knowing that the data
supports the assertion. Also useful is the book "Unobtrusive
Measures", which shows creative techniques for finding "hidden"
significant metrics.
In sum, rules don't have to be proven true to
be statistically reliable, and exceptions do not make the rule
invalid. For example, if two out of every 1,000 red-haired
people don't buy skin care products, we still have a model with a
very high predictive quality.
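The arithmetic behind that claim can be sketched in a few lines of Python. The function names are hypothetical, and the 40% base purchase rate for the general population is an assumed figure used only to show what "disproportionate" means. Only the 2-in-1,000 exception count comes from the example above.

```python
def rule_confidence(total, exceptions):
    """Fraction of cases that follow the rule despite the exceptions."""
    if total <= 0 or not 0 <= exceptions <= total:
        raise ValueError("need total > 0 and 0 <= exceptions <= total")
    return (total - exceptions) / total

def lift(confidence, base_rate):
    """How many times more likely the outcome is under the rule than overall."""
    if not 0 < base_rate <= 1:
        raise ValueError("base_rate must be in (0, 1]")
    return confidence / base_rate

# From the example: 2 of every 1,000 red-haired people do not buy.
confidence = rule_confidence(total=1000, exceptions=2)

# Assumed for illustration only: 40% of all customers buy skin care products.
assumed_base_rate = 0.40

print("rule holds for", confidence, "of red-haired customers")
print("lift over the assumed base rate:", round(lift(confidence, assumed_base_rate), 3))
```

A rule that holds for 998 of 1,000 cases keeps its predictive value even though it is not universally true. The lift figure depends entirely on the assumed base rate.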