This is an excerpt from Dr. Ham's premier book "Oracle
Data Mining: Mining Gold from your Warehouse".
What is Data Mining?
You quickly realize that pivot table analyses, while
interesting, will take weeks or months of examination. Since time is of the
essence, you decide to try your hand at data mining. Why? Data mining is great
at finding patterns in huge amounts of data. The Gartner Group, the information
technology research firm, defines data mining as follows (from their web site,
Jan. 2004):
“Data mining is the process of discovering meaningful new correlations,
patterns and trends by sifting through large amounts of data stored in
repositories, using pattern recognition technologies as well as statistical
and mathematical techniques.”
This is a wonderful but somewhat abstract definition of data
mining! How do we start? Let’s explore a little bit about the Oracle data
warehousing tools.
Oracle provides a powerful data mining infrastructure
embedded directly into the database. The data mining infrastructure, accessed
through a Java API, automates all the phases of data mining. Even
though data mining is based on statistics and machine learning (i.e., artificial
intelligence), you don’t have to be a statistical genius to run your
data mining analysis with Oracle.
The approach to Oracle data mining follows these
straightforward steps:
1. Sample from a larger database or data warehouse.
2. Explore, clean, preprocess and reduce the data, including treatment of
outliers and missing data.
3. Develop an understanding of the variables and select the variables for
building a model.
4. Partition the data into training, validation and test data sets.
5. Run several modeling techniques, choosing one on the basis of its
performance on the validation data. Results with the test data are an indicator
of how well the chosen model will do with new data.
These steps will be explained in greater detail as we go
along.
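To make steps four and five a little more concrete before we get there, here is
a minimal sketch written in plain Python rather than with Oracle's own tools.
Everything in it is an assumption made for illustration: the made-up spending
data, the 60/20/20 split, the two toy models, and the names partition,
fit_majority and fit_threshold. It only shows the shape of the workflow, which
is to fit on the training set, choose a model on the validation set, and
report the result on the test set.

```python
import random

def partition(rows, train=0.6, validation=0.2, seed=42):
    """Shuffle rows, then split them into training, validation and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])

def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(x) == label for x, label in rows) / len(rows)

def fit_majority(train_rows):
    """Toy model 1: always predict the most common training label."""
    labels = [label for _, label in train_rows]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

def fit_threshold(train_rows):
    """Toy model 2: predict 1 above the midpoint of the two class means.

    Assumes both labels occur in the training rows and that larger
    values of x go with label 1.
    """
    ones = [x for x, label in train_rows if label == 1]
    zeros = [x for x, label in train_rows if label == 0]
    cut = (sum(ones) / len(ones) + sum(zeros) / len(zeros)) / 2
    return lambda x: 1 if x > cut else 0

# Made-up data: (monthly spend, 1 if a "good" customer else 0).
rows = [(spend, 1 if spend > 50 else 0) for spend in range(100)]
train_rows, valid_rows, test_rows = partition(rows)

# Step 5: fit every candidate on the training set only.
candidates = {"majority": fit_majority, "threshold": fit_threshold}
models = {name: fit(train_rows) for name, fit in candidates.items()}

# Choose on validation data, then report once on the held-out test data.
best_name = max(models, key=lambda name: accuracy(models[name], valid_rows))
test_accuracy = accuracy(models[best_name], test_rows)
print(best_name, test_accuracy)
```

The test set is touched only once, after the choice has been made, which is
what lets its result stand in for performance on new data.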
First of all, how will you obtain the customer data? Do you
have to decide what fields are important at the outset? Fortunately, one of the
strengths of data mining is that there are algorithms available to help you
capture the important fields that are needed to build successful models of
“good” customers. So don’t worry about deciding which fields you need. Include
as many as you can reasonably load into the table and let Oracle Data Miner help
mine the gold from the data.
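As a rough illustration of what "letting an algorithm find the important
fields" can mean, the sketch below ranks fields by the absolute value of their
Pearson correlation with the target. This is a deliberately simple stand-in
chosen for this example, not the algorithm Oracle uses, and the customer
records, field names, and the rank_fields function are all hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if undefined)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y) if spread_x and spread_y else 0.0

def rank_fields(records, target):
    """Return (field, score) pairs, strongest absolute correlation first."""
    targets = [record[target] for record in records]
    scores = {
        field: abs(pearson([record[field] for record in records], targets))
        for field in records[0] if field != target
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up customer records; "good" is the target we want to predict.
customers = [
    {"income": 20, "shoe_size": 9, "good": 0},
    {"income": 35, "shoe_size": 7, "good": 0},
    {"income": 50, "shoe_size": 10, "good": 0},
    {"income": 65, "shoe_size": 8, "good": 1},
    {"income": 80, "shoe_size": 9, "good": 1},
    {"income": 95, "shoe_size": 8, "good": 1},
]

ranking = rank_fields(customers, "good")
for field, score in ranking:
    print(f"{field:<10} {score:.2f}")
```

In this toy data, income ranks well above shoe_size, so a field loaded "just in
case" costs little: the ranking pushes it to the bottom.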
Components of Oracle Data Miner
To help you with steps one and two above, Oracle Data Miner
has an impressive array of tools for sampling, exploring, and cleaning the
data. These tools include importing files, recoding existing fields,
filtering using WHERE clauses, and deriving new fields.
In addition, there are utilities for creating views,
creating tables from views, copying tables, joining tables together, and
importing text files. Displaying summary
statistics and histograms assists in step three, developing an understanding of
the data.
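For readers who have not looked at data this way before, here is a small
sketch of what summary statistics and a histogram reveal. It uses only
Python's standard library, not Oracle Data Miner, and the list of ages and the
summarize and histogram helpers are invented for the example.

```python
from collections import Counter
from statistics import mean, median, stdev

def summarize(values):
    """Basic summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation
    }

def histogram(values, bin_width):
    """Count values per equal-width bin, keyed by each bin's lower edge."""
    counts = Counter((value // bin_width) * bin_width for value in values)
    return dict(sorted(counts.items()))

# Made-up customer ages; 95 is a deliberate outlier.
ages = [23, 27, 31, 35, 38, 41, 44, 52, 58, 95]

stats = summarize(ages)
bins = histogram(ages, 10)

print(stats)
for lower, count in bins.items():
    print(f"{lower:>3}-{lower + 9:<3} {'#' * count}")
```

The gap between the 50s bin and the lone value in the 90s bin is the kind of
thing a histogram makes obvious, and it is exactly the sort of outlier that
step two asks you to decide how to treat.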
In order to build our model of the best customer to contact
for our marketing blitz, we’ll need to partition the data into build and test
data sets. ODMr does this automatically for
you, and we’ll explore these features in a future chapter. Finally, we’ll
develop a model, test it and then apply it to new data to obtain our customer
mailing list.