One of the new features in Oracle 9i was
Oracle Data Mining, a data
mining engine which allowed data analysts and application developers
to perform a range of data mining algorithms on data held in the
Oracle database. Oracle 9i came with a
number of mining algorithms such as Adaptive Bayes Networks,
Clustering and Association Rules, together with a Java API, to allow
ODM functions to be included in Java applications.
Whilst this was useful for Java programmers, it
was not all that relevant for PL/SQL programmers and to remedy this,
Oracle 10g came with a new package called
dbms_data_mining that provides PL/SQL access to the
data mining engine.
Like the Java API,
dbms_data_mining
allows for building a data
mining model, testing it and then applying the model to provide
scores or predictive information for an application. One of the key
differentiators for Oracle data mining is that mining models can be
applied directly to data in the database. There is no need to
extract the data and then separately load it into the mining engine,
meaning that data mining can now be carried out in real time. The
Oracle data mining engine can be pointed at any schema in the
database. If the data needs processing beforehand to place
continuous and discrete values into range bins, there is also a new
accompanying package,
dbms_data_mining_transform, to carry this out
automatically.
Oracle provides a graphical tool named Oracle
Data Miner. This can be downloaded at
http://www.oracle.com/technology/products/bi/ odm/index.html.
Further information on Oracle data mining can be
found in ?Oracle Data Mining Concepts? at
http://tahiti.oracle.com.
The data mining packages
dbms_data_mining and
dbms_data_mining_tranform
provide many procedures and functions for managing data mine
information. Some of them will be presented in the next example.
There should be some familiarity with Oracle data
mining as some objects are a prerequisite for using the
dbms_data_mining and
dbms_data_mining_transform packages. A
complete Oracle by Example of ODM can be found through this link:
http://www.oracle.com/technetwork/database/options/odm/odm-samples-194497.html.
The first example will show how to create, drop, rename, export,
import and get information about mining models.
<
Code
8.6 - dbms_data_mining.sql
conn sys@ora11g as sysdba
Connected to:
Oracle 11g Enterprise Edition Release 11.2.0.1.0
- Production
With the Partitioning, Oracle Label Security,
OLAP, Data Mining,
Oracle Database Vault and Real Application
Testing options
begin
dbms_data_mining.create_model(
model_name => 'my_first_model',
mining_function => dbms_data_mining.feature_extraction,
data_table_name => 'tab_my_tab_exemple',
case_id_column_name => 'cust_id',
target_column_name => NULL,
settings_table_name => 'my_new_settings_1',
settings_schema_name => 'pkg',
xform_list => NULL);
end;
/
--Dropping a model
begin
dbms_data_mining.drop_model(model_name => 'my_first_model',force =>
TRUE);
end;
/
--Rename a model
begin
dbms_data_mining.rename_model(
model_name => 'my_first_model',
new_model_name =>
'my_first_model_new_name');
end;
/
Here are some important points on exporting a
model. During the process of the import or export procedures,
temporary tables with the names
dm$p_model_expimp_temp,
dm$p_model_import_temp,
and
dm$p_model_tabkey_temp
are created. Located in the owner's schema, they
contain internal information about export or import processes.
Be sure that the object directory is already created in the
database.
In this example, only two models will be
exported. If it is necessary to export all models, leave
model_filter blank.
The command below will export all models for the
user that is currently connected.
begin
dbms_data_mining.export_model(
filename => 'my_first_exp_data_mining',
directory => 'exp_mining_dir',
model_filter => 'name in (''nmf_model_1'', ''svm_model_2'')',
filesize => NULL,
operation => NULL,
remote_link => NULL,
jobname => NULL);
end;
/
A dump file named
my_first_exp_data_mining is generated in the directory
specified by exp_mining_dir. This
example shows how to import a model connected as sys, mapping source
schema to a different target schema.
begin
dbms_data_mining.import_model(
filename => 'my_first_exp_data_mining',
directory => 'exp_mining_dir',
model_filter => NULL,
operation => NULL,
remote_link => NULL,
jobname => NULL,
schema_remap => pkg_target);
end;
/
The next examples show how to add, remove and get
information about the cost mode matrix. The process of adding a cost
mode matrix associates the classification model with the cost matrix
table.
begin
dbms_data_mining.add_cost_matrix(
model_name => 'my_first_exp_data_mining',
cost_matrix_table_name => 'costs_nb',
cost_matrix_schema_name => NULL);
end;
/
--Remove cost mode matrix
begin
dbms_data_mining.remove_cost_matrix(
model_name => 'my_first_exp_data_mining');
end;
/
There are other functions that can be used to
gather information about models. They are described below:
--Return XML object with information about
Decision Tree Model
select
*
from
table(dbms_data_mining.get_model_details_xml(model_name =>
'my_first_model'));
--Return coefficient statistics for a
Generalized Linear Model
select
*
from
table(dbms_data_mining.get_model_details_glm(model_name =>
'my_first_model'));
--Return information about k-Means
Clustering Models
select
*
from
table(dbms_data_mining.get_model_details_km(
model_name => 'my_first_model',
cluster_id
=> NULL,
attribute => NULL,
centroid => NULL,
histogram => NULL,
rules => NULL,
attribute_subname => NULL));
--Return information
about Naive Bayes Model
select
*
from
table(dbms_data_mining.get_model_details_nb(model_name =>
'my_first_model'));
--Return information about Non-Negative
Matrix Factorization Model
select
*
from
table(dbms_data_mining.get_model_details_nmf(model_name =>
'my_first_model'));
--Return information about
O-Cluster Clustering Model
select
*
from
table(dbms_data_mining.get_model_details_oc(
model_name =>
'my_first_model',
cluster_id => NULL,
attribute => NULL,
centroid => NULL,
histogram => NULL,
rules => NULL);
--Return information about a linear
Support Vector Machine Model
select
*
from
table(dbms_data_mining.get_model_details_svm(model_name =>
'my_first_model'));
--Return statistics information about a model
select
*
from
table(dbms_data_mining.get_model_details_global(model_name =>
'my_first_model'));
--This function was replaced by data dictionary view
dba|all|user|_mining_model_settings.
select
*
from
dba_mining_model_settings;
--This function was replaced by data dictionary view
DBA|all|user|_mining_model_attributes
select
*
from
dba_mining_model_attributes;
--Return information about transformation definitions
associated with a model
select
*
from
table(dbms_data_mining.get_model_transformations(model_name =>
'my_first_model'));
While dbms_data_mining
is used to create, drop, change and get information about data
mining models, dbms_data_mining_transform
is used to prepare data for
mining.
In the next examples, operations like
create, insert,
stack and xform
will be found.
They are used to transform columns of data for mining. The
operations are briefly described here, followed by examples:
n
create:
This operation creates a transformation table used for
transformation of data such as binning, column removal,
normalization, outlier treatment and missing value treatment.
n
insert:
This operation populates a transformation table in a
specified data source.
n
stack:
This operation adds to a list of transformation instructions.
This stack can be used in the create_model procedure.
n
xform:
This operation creates a view based on table data which
contains transformed columns.
Procedures starting with
create_% are responsible for creating definition tables and
procedures starting with insert_% are
responsible for inserting transformation instructions into
definition tables. There are also the procedures
stack_% used to add expressions to the
transformation definition stack and xform_%
to create views that can add, remove or transform values and
expressions.
<
Code
8.7 - dbms_data_mining_transform.sql
conn sys@ora11g as sysdba
Connected to:
Oracle 11g Enterprise Edition Release 11.2.0.1.0
- Production
With the Partitioning, Oracle Label Security,
OLAP, Data Mining,
Oracle Database Vault and Real Application
Testing options
--Procedure to create a table that will be filled
with attribute information to be excluded from mining process.
begin
dbms_data_mining_transform.create_col_rem(
rem_table_name => 'tab_transf_def',
rem_schema_name => 'pkg');
end;
/
--Procedure to
create a view that removes columns specified from data table.
begin
dbms_data_mining_transform.xform_col_rem(
rem_table_name
=> 'tab_transf_def',
data_table_name => 'tab_data',
xform_view_name
=> 'view_1',
rem_schema_name => 'pkg',
data_schema_name => 'pkg',
xform_schema_name => ' pkg ');
end;
/
--Procedure
to produce bin boundaries at equal intervals in ordered data
begin
dbms_data_mining_transform.insert_bin_num_eqwidth(
bin_table_name => 'tab_bin',
data_table_name => 'tab_data',
bin_num => 10,
exclude_list => dbms_data_mining_transform.column_list(
'cust_id',
'affinity_card',
'bookkeeping_application',
'bulk_pack_diskettes',
'flat_panel_monitor',
'printer_supplies',
'y_box_games',
'os_doc_set_kanji'),
round_num => 4
bin_schema_name => 'pkg',
data_schema_name => 'pkg');
end;
/
--Update
stack definition using expression in xform_list
begin
dbms_data_mining_transform.stack_norm_lin(
norm_table_name => 'tab_norm',
xform_list => '({expr} - {shift})/{scale}',
norm_schema_name => '');
end;
/