Question: What software
options are available to perform custom data mining with Oracle 10g? I am aware
that a data mining module is available as an option with Oracle 10g Enterprise
Edition.
Answer:
Data Mining is the capstone of Oracle
data queries, a method for defining cohorts of related data items and tracking
them over time. The basic goal of data mining is to identify hidden
correlations, and the data mining expert must identify populations (e.g.
Eskimos with alcoholism) and then track each population across various external
factors (e.g. treatments and drugs). Oracle Decision Support System (DSS)
interfaces (data mining software) must therefore let the analyst create and
refine decision rules and change the salient parameters of the problem domain
(e.g. the confidence interval for the predictions).
Data Mining is a complex area, and all
of the software tools assume some knowledge of advanced statistics, especially
multivariate analysis (chi-square), model building and algorithms. The
Gartner Group defines data mining as:
"Data mining is the process of
discovering meaningful new correlations, patterns and trends by sifting
through large amounts of data stored in repositories, using pattern
recognition technologies as well as statistical and mathematical
techniques."
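To make the chi-square idea mentioned above concrete, here is a minimal sketch in plain Python (not Oracle code) that tests whether two categorical factors in a cohort are related. The counts, the treatment and outcome labels, and the function name are all hypothetical and chosen only for illustration.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table
    (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two factors were independent
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical cohort counts (made up for this example):
# rows = treatment A / treatment B, columns = improved / not improved
counts = [[30, 10], [15, 25]]
stat = chi_square_2x2(counts)

# Standard chi-square critical value for 1 degree of freedom at alpha = 0.05
CRITICAL_95_DF1 = 3.841
print(f"chi-square = {stat:.3f}")
print("related at the 95% level:", stat > CRITICAL_95_DF1)
```

A statistic above the critical value suggests the outcome is not independent of the treatment. The data mining tools discussed here automate this kind of test across many factors at once.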
For details on the features of Oracle
Data Mining software, Dr. Carolyn Hamm is just finishing a book called "Oracle
Data Mining", and it has received stellar reviews from Oracle Corporation.
Using any data mining tool requires
advanced statistics skills, generally the equivalent of at least four
university-level courses, covering topics such as:
- Multivariate Statistics
(chi-square)
- Model Building
- Adaptive Bayes Network and
decision tree models
- Naïve Bayes
- Support Vector Machine
The top three Oracle data mining
software products are:
- SPSS Clementine - The
SPSS (Statistical Package for the Social Sciences)
Clementine interface is a very popular tool for Oracle
data mining, and Oracle named SPSS its "technology partner of the
year" for 2006. They note that the Clementine interface is used to
build, browse and score models in the Oracle Database 10g using
techniques available with Oracle Data Mining. Clementine software for
Oracle includes all of the Oracle Data Mining algorithms, including
Naïve Bayes, Adaptive Bayes Network and Support Vector Machines.
- SAS - The SAS
(Statistical Analysis System) "SAS/Access" software and SAS Enterprise
Miner for Oracle are very popular because SAS has been used for data
mining for more than 30 years.
- Oracle Data Mining (ODMr)
- Oracle Data Mining provides four algorithms for solving classification
problems: Adaptive Bayes Network, Decision Tree, Naïve Bayes, and
Support Vector Machine.
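To give a feel for what a classification algorithm such as Naïve Bayes does, here is a minimal sketch in plain Python. It is a teaching illustration only, not Oracle's implementation and not the Oracle Data Mining API. The feature values, labels, and function names are hypothetical.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(rows, labels):
    """Count how often each feature value occurs with each class label."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, label)][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values


def predict(model, row):
    """Return the label with the highest log-probability under Naive Bayes."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior for this class
        for i, value in enumerate(row):
            # Laplace (add-one) smoothing so unseen values never give log(0)
            numer = feature_counts[(i, label)][value] + 1
            denom = count + len(values[i])
            score += math.log(numer / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: (age group, income band) -> purchase decision
rows = [("young", "high"), ("young", "low"), ("old", "high"),
        ("old", "low"), ("old", "low"), ("young", "high")]
labels = ["buy", "no", "buy", "no", "no", "buy"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("young", "high")))
print(predict(model, ("old", "low")))
```

The statistical background listed earlier matters because the analyst must judge whether a model's assumptions hold for the data. Naïve Bayes, for example, assumes the features are independent of each other within each class.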
We also see these less popular
Oracle data mining software tools:
- CleverPath Predictive
Analysis Server (by Computer Associates)
- Cognos for Oracle is a
very popular Oracle OLAP tool for multidimensional business
intelligence, and its vendor claims some data mining extensions.
- Genalytics Predictive
Suite
- Hyperion uses a patented
cube technology, but it can also be used for data mining
- Insightful Miner
- KnowledgeStudio (by
Angoss Software)
- Quadstone System