Oracle Data Mining (ODM) Tips
Oracle Tips by Burleson Consulting
Oracle data mining is one of the most challenging areas of Oracle. As disk
prices have fallen by orders of magnitude, many shops now hold multi-terabyte
online archives of historical information. These archives are a virtual gold
mine. Data mining can sift through massive amounts of data and find hidden
information, valuable information that can help you better understand your
customers and anticipate their behavior.
Historical Oracle information is so valuable that a typical data warehouse can
pay for itself in just a few months by providing nuggets of information that
save the company millions of dollars. Let's take a look at the evolution
of Oracle data mining, see why it is a critical Oracle skill, and briefly
review Dr. Hamm's new book "Oracle Data Mining".
The evolution towards data mining
Once a new data warehouse and its ETL have been created, Oracle data experts
implement data queries, starting with simple queries and culminating in data
mining:
- Ad-hoc query – The
Discoverer 10g end-user layer will be configured to allow for the ad-hoc
display of summary and detailed level data.
- Aggregation and
multidimensional display – Develop Oracle Warehouse Builder structures
to summarize, aggregate and roll up salient information for display using the
Oracle 10g Discoverer interface.
- Basic correlations
– The front-end should allow the end-user to specify dimensions and request
a correlation matrix between the variables within each dimension. The system
will start with one-to-one correlations and evolve to support multivariate
chi-square methods.
- Hypothesis testing
– The data warehouse is used to validate theories about the behavior of the
customer universe, and curve formulation techniques allow data mining
experts to derive valid formulae to describe their data. Hypothesis testing
in data mining often involves simulation modeling, using the Oracle data as
input.
- Oracle Data Mining
– This is the capstone of Oracle data queries, a method for defining cohorts
of related data items and tracking them over time. The basic goal of data
mining is to identify hidden correlations, and the data mining expert must
identify populations (e.g. Eskimos with alcoholism) and then track this
population across various external factors (e.g. treatments and drugs).
These Oracle Decision Support System (DSS) interfaces require the ability
for the end-user to refine their decision rules and change the salient
parameters of their domain (e.g. the confidence interval for the
predictions).
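The "basic correlations" step in the list above can be illustrated with a small sketch. This is not Oracle Data Mining code; it is a plain Python example that assumes the variables of one dimension have already been extracted from the warehouse into lists of numbers. The names pearson and correlation_matrix, and the sample column names in the comment, are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Build a one-to-one correlation matrix from a dict of named columns.

    `columns` maps a variable name to its list of values, for example
    {"age": [...], "spend": [...]} (hypothetical names).
    """
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```

Each cell of the result is the correlation between one pair of variables, which matches the one-to-one correlations the front-end would start with before moving on to multivariate methods.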
Performing Oracle data mining requires special skills, including advanced
statistics skills such as multivariate (chi-square) techniques for identifying
hidden correlations.
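As a rough illustration of the chi-square idea, the sketch below computes the chi-square statistic for a test of independence on a contingency table of counts. It is a minimal Python example, not part of the Oracle tools, and it assumes the counts have already been queried from the warehouse. The function name chi_square_independence is hypothetical.

```python
def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts.

    `table` is a list of rows, each row a list of counts, e.g. rows for two
    customer groups and columns for two outcomes.
    """
    if len(table) < 2 or len(table[0]) < 2:
        raise ValueError("need at least a 2x2 table")
    if any(len(row) != len(table[0]) for row in table):
        raise ValueError("all rows must have the same number of columns")

    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, degrees_of_freedom
```

The statistic is then compared with the critical value for the given degrees of freedom. For a 2x2 table (one degree of freedom) at the 0.05 level that value is about 3.841, so a larger statistic suggests the two variables are not independent.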
Performing advanced analytics in an Oracle data warehouse requires skills that
are far beyond those needed for an ordinary Oracle system. Many shops employ
professionals with advanced degrees in statistics-centered areas,
drawing from people with doctorates in Economics, Experimental Psychology and
Sociology. To perform complex and valid studies, the warehouse team must have a
statistician with these skills:
- Multivariate statistics
– Even a simple longitudinal study requires knowledge of applied
multivariate statistics.
- Artificial Intelligence
– The Oracle Data Mining (ODM) product is heavily centered on the
application of AI in its mining algorithms, and the statistician should have
a firm grounding in fuzzy logic, pattern matching and the use of advanced
Boolean logic.
That is why Dr. Hamm wrote the book "Oracle Data Mining". The Oracle data
mining tools are complex by nature, and ODM professionals must understand how
to apply Oracle’s powerful tools to the data mining process.
Oracle Data Mining and Predictive Analytics
Oracle started with predictive modeling in the Oracle Data Mining (ODM) tools,
and Oracle Corporation is developing the Automatic Maintenance Tasks (AMT), a
new Oracle10g feature that will automatically detect and rebuild sub-optimal
indexes.
There has been much discussion about using the scientific method with Oracle
databases, and about how mathematical models are developed for Oracle.
Predicting the future without historical justification is the realm of
psychics, not scientists. Virtually every predictive model in Oracle software
uses data from the database to create the model.
With the first book on Oracle Data Mining, Dr. Hamm is breaking new ground and
sharing her valuable insights into the complex workings of this sophisticated
tool. Dr. Hamm has done a great job of explaining the complex concepts and
providing citations for further reading.
This is not a book for beginners, nor is it aimed at dilettantes.
Oracle data mining is the realm of expert statisticians and actuaries, and no
book exists that can "dumb down" such a complex and important subject.