Oracle Data Mining (ODM) Tips

Oracle Database Tips by Donald Burleson

Oracle data mining is one of the most challenging areas of Oracle. As disk prices fall by orders of magnitude, many shops find themselves with multi-terabyte online archives of historical information. This is a virtual gold mine. Data mining can sift through massive amounts of data and find hidden information: valuable information that can help you better understand your customers and anticipate their behavior.

Historical Oracle information is so valuable that a typical data warehouse can pay for itself in just a few months by providing nuggets of information that save the company millions of dollars. Let's take a look at the evolution of Oracle data mining, see why it is a critical Oracle skill, and briefly review Dr. Hamm's new book "Oracle Data Mining".

The evolution towards data mining

Once a new data warehouse and its ETL have been created, Oracle data experts implement data queries, starting with simple queries and culminating in data mining:

  • Ad-hoc query - The Discoverer 10g end-user layer will be configured to allow for the ad-hoc display of summary and detailed level data.
  • Aggregation and multidimensional display - Develop Oracle Warehouse Builder structures to summarize, aggregate and roll up salient information for display using the Oracle 10g Discoverer interface.
  • Basic correlations - The front-end should allow the end-user to specify dimensions and request a correlation matrix between the variables within each dimension. The system will start with one-to-one correlations and evolve to support multivariate chi-square methods.
  • Hypothesis testing - The data warehouse is used to validate theories about the behavior of the customer universe, and curve formulation techniques allow data mining experts to derive valid formulae to describe their data. Hypothesis testing in data mining often involves simulation modeling, using the Oracle data as input.
  • Oracle Data Mining - This is the capstone of Oracle data queries, a method for defining cohorts of related data items and tracking them over time. The basic goal of data mining is to identify hidden correlations, and the data mining expert must identify populations (e.g. Eskimos with alcoholism) and then track each population across various external factors (e.g. treatments and drugs). These Oracle Decision Support System (DSS) interfaces require the ability for the end-user to refine their decision rules and change the salient parameters of their domain (e.g. the confidence interval for the predictions).
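
To make the "basic correlations" step above concrete, here is a minimal sketch of a one-to-one correlation matrix in Python. It is an illustration only, not Oracle's implementation: the column names and sample values are hypothetical, and in a real warehouse the columns would come from a query against the database.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Return a nested dict holding the correlation of every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical sample data (assumed for illustration, not from a real warehouse).
sales = {
    "ad_spend": [10, 20, 30, 40, 50],
    "units_sold": [12, 25, 31, 48, 55],
    "returns": [5, 4, 4, 2, 1],
}

matrix = correlation_matrix(sales)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Values near +1 or -1 flag variable pairs that may deserve a closer look; values near 0 suggest no linear relationship.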

Oracle data mining requires special skills, including advanced statistics such as multivariate (chi-square) techniques for identifying hidden correlations.
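
As a small illustration of the chi-square idea, the sketch below runs a chi-square test of independence on a 2x2 contingency table in Python. This is the simplest case, two categorical variables with one degree of freedom and no continuity correction, and not a full multivariate method. The counts and the treatment/outcome labels are hypothetical.

```python
import math

def chi_square_statistic(table):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

def p_value_one_dof(stat):
    """Upper-tail probability for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

# Hypothetical counts: rows are treatments A and B,
# columns are "improved" and "not improved".
observed = [[30, 10],
            [20, 40]]

stat = chi_square_statistic(observed)
p = p_value_one_dof(stat)
print(f"chi-square = {stat:.3f}, p = {p:.6f}")
```

A small p-value suggests that treatment and outcome are not independent in this sample. Larger tables have more degrees of freedom and need the general chi-square distribution rather than the one-degree-of-freedom shortcut used here.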

Performing advanced analytics in an Oracle data warehouse requires skills far beyond those needed for an ordinary Oracle system. Many shops employ professionals with advanced degrees in statistics-centered fields, drawing on people with doctorates in Economics, Experimental Psychology and Sociology. To perform complex and valid studies, the warehouse team must have a statistician with these skills:

  • Multivariate statistics - Even a simple longitudinal study requires knowledge of applied multivariate statistics.
  • Artificial Intelligence - The Oracle Data Mining (ODM) product is heavily centered on the application of AI in its mining algorithms, and the statistician should have a firm grounding in fuzzy logic, pattern matching and the use of advanced Boolean logic.

That is why Dr. Hamm wrote the book "Oracle Data Mining". The Oracle data mining tools are complex by nature, and ODM professionals must understand how to apply Oracle's powerful tools to the data mining process.

Oracle Data Mining and Predictive Analytics

Oracle started with predictive modeling in its Oracle data mining (ODM) tools, and Oracle Corporation is developing Automatic Maintenance Tasks (AMT), a new Oracle10g feature that will automatically detect and rebuild sub-optimal indexes.

There has been great discussion about using the scientific method with Oracle databases, and about how mathematical models are developed for Oracle. Predicting the future without historical justification is the realm of psychics, not scientists. Virtually every predictive model in Oracle software uses historical data from the database to build its predictions.
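
The idea of building a prediction from history can be sketched with the simplest possible model, a least-squares straight line fitted to past observations. This Python example is an assumption-laden illustration, not how any Oracle tool works internally: the monthly figures are invented, and real predictive models are far richer than a linear trend.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Extrapolate the fitted line to a new x value."""
    return slope * x + intercept

# Hypothetical history: (month number, metric value), e.g. rows read from a table.
history = [(1, 100.0), (2, 110.0), (3, 120.0), (4, 130.0)]
months = [month for month, _ in history]
values = [value for _, value in history]

slope, intercept = fit_line(months, values)
forecast = predict(slope, intercept, 5)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, month 5 forecast={forecast:.2f}")
```

Without the historical rows there is nothing to fit, which is the point of the paragraph above: the model is only as good as the history behind it.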

With the first book on Oracle Data Mining, Dr. Hamm is breaking new ground and sharing her valuable insights into the complex machinations of this sophisticated tool. She has done a great job of explaining the complex concepts and providing citations for readers who want more information.

This is not a book for beginners, nor should it be targeted at dilettantes. Oracle data mining is the realm of expert statisticians and actuaries, and no book exists that can "dumb down" such a complex and important subject.




