This is an excerpt from Dr. Ham's premier book
"Oracle
Data Mining: Mining Gold from your Warehouse".
Next, we create 4 tables for
importingthe CLOBdata, using the code example below. Using sqlplus,
we log on as the ODMruser and
run these scripts:
create table web_desc_sheep (seqnum int primary
key, type varchar2(10), location varchar2(20), filename
varchar2(20),text CLOB);
create table web_desc_goats (seqnum int primary
key, type varchar2(10), location varchar2(20), filename
varchar2(20),text CLOB);
create table web_desc_bands (seqnum int primary
key, type varchar2(10), location varchar2(20), filename
varchar2(20),text CLOB);
create table web_desc_biomed (seqnum int primary
key, type varchar2(10), location varchar2(20), filename
varchar2(40),text CLOB);
Now that the tables are created,
use sqlldrto import CLOBdata from each directory using the code below.
First, create control files using the programs as shown.
We are creating two new fields,
seqnum for matching the CLOB
files with the index, and type, designating a constant for each
subject area. Go to the directory where the CLOB files are
located, substitute the directory listing for each folder as
appropriate, and change the filename position to read the filename
for your system.
Loading CLOB Data
into the SVM Model
For example, if your files are
located at c:/sw/bands/1 ensure that the numbers following
filename position are the columns in the control file where this is
listed. Once you have created and saved your control files,
type the following SQL*Loader (sqlldr)
syntax from the command line, substituting your ODMrusername and password, and the location of the control
files that you created:
Sqlldr dmuser/pswd control = c:/loader_bands.ctl
log=c:/loader_bands.log
LOAD DATA
INFILE *
replace
into table web_desc_bands
(seqnum recnum,
type constant bands,
filename position(63:79),
text LOBFILE (filename)
terminated by EOF)
BEGINDATA
10/13/1995 06:21 PM
6,897 DSCWEBDEV\Decision Suppc:\sw\bands\1
10/17/1995 12:01 PM
4,382 DSCWEBDEV\Decision Suppc:\sw\bands\2
10/17/1995 12:05 PM
4,654 DSCWEBDEV\Decision Suppc:\sw\bands\3
10/17/1995 12:11 PM
5,051 DSCWEBDEV\Decision Suppc:\sw\bands\4
10/17/1995 12:12 PM
9,086 DSCWEBDEV\Decision Suppc:\sw\bands\5
10/17/1995 12:15 PM
2,585 DSCWEBDEV\Decision Suppc:\sw\bands\6
For Step 4, create views using
the following scripts:
create or replace view bands_v as
select a."file_name", a."rating", b.type, b.text
from "web_rating_bands" a join web_desc_bands b
on a."file_name" = b.location;
create or replace view goats_v as
select a."file_name", a."rating", b.type, b.text
from "web_rating_goats" a join web_desc_goats b
on a."file_name" = b.location;
create or replace view sheep_v as
select a."file_name", a."rating", b.type, b.text
from "web_rating_sheep" a join web_desc_sheep b
on a."file_name" = b.location;
create or replace view biomed_v as
select a."file_name", a."rating", b.type, b.text
from "web_rating_biomed" a join web_desc_biomed b
on a."file_name" = b.location;
The
Oracle Application Express(APEX,
formerly HTML-DB)
program shown below shows a listing of the CLOB information
for the biomed record #2.
Next, we create a new table by
executing the union SQL statements below:
Create table web_union as
Select ?file_name?, ?rating?, type, text from
bands_v
UNION ALL
Select ?file_name?, ?rating?, type, text from
sheep_v
UNION ALL
Select ?file_name?, ?rating?, type, text from
goats_v
UNION ALL
Select ?file_name?, ?rating?, type, text from
biomed_v;
And we also create a new sequence
web_seq:
create sequence
web_seq
start with 1
increment by 1;
Finally, we create the case table
for web ratings.
create table
WEB_RATE
as
select
web_seq.nextval file_id,
a.*
from
WEB_UNION a;
Now that we have a table with a
sequence (file_id), file_name, rating, type, textual data (CLOB),
and 327 rows of data, we can proceed with the new data mining
activity.
1.
As before, choose a classification
function type and use the SVM algorithm. The unique identifier
is file_id, and in the ?select columns? box de-select file_name from
the column list.
2.
On the Review Data Usage Settings page
(Step 3 of 5) ensure that the CLOB
(text) field, type, and rating are checked as input attributes for
the model.
3.
Using type as the target, run the SVM
data mining activity, keeping the build parameters at the default
settings.