This is an excerpt from Dr. Ham's premier book "Oracle Data Mining: Mining Gold from your Warehouse".
The SVM classification mining activity shows two new steps labeled Text and
Text (Test). The algorithm detected the CLOB field and
correctly identified it as a text attribute.
In this step the algorithm applies context indexing or feature
extraction to the text attribute, and uses the same settings for the
test text fields.
Clicking the "Result" of the
"Build" step, you can see that the algorithm used a linear kernel, and
the words that were used to classify the type target are listed
along with their coefficients.
Not surprisingly, "GOATS" and "GOAT" were the top attributes for the
goat target class, "SHEEP" was the highest for the sheep target class,
"INFORMATION" and "MEDICAL" were the best for biomed, and "IUMA" was
the top for bands.
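That listing of top words per class follows directly from the linear kernel: each (class, term) pair gets a coefficient, and the largest coefficients mark the terms that most strongly pull a page toward that class. Below is a hedged sketch of the same inspection, with scikit-learn's LinearSVC standing in for Oracle's SVM and a tiny invented corpus and set of labels.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC

    pages = ["goats and goat breeding", "sheep and wool",
             "medical information", "iuma band pages"]
    types = ["goats", "sheep", "biomed", "bands"]

    vec = TfidfVectorizer()
    X = vec.fit_transform(pages)
    clf = LinearSVC().fit(X, types)            # linear kernel: one coefficient per (class, term)

    terms = np.array(vec.get_feature_names_out())
    for class_name, coefs in zip(clf.classes_, clf.coef_):
        top = np.argsort(coefs)[::-1][:2]      # two highest-weighted terms for this class
        print(class_name, list(zip(terms[top], np.round(coefs[top], 3))))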
Checking the Result of the Test Metrics step shows that the model
was in the "best" range at 82% predictive confidence.
The
confusion matrix shows that the
model predicted bands with 100% accuracy, while the other web page
classes were correctly predicted between 81% and 83% of the time.
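Those per-class percentages come straight from the confusion matrix: each row is an actual class, and the diagonal entry divided by the row total is the share of that class predicted correctly. A small sketch of that reading, with invented predictions and scikit-learn standing in for the Test Metrics step:

    from sklearn.metrics import confusion_matrix

    actual    = ["bands", "bands", "goats", "goats", "goats", "sheep", "sheep", "biomed"]
    predicted = ["bands", "bands", "goats", "goats", "sheep", "sheep", "sheep", "biomed"]

    labels = ["bands", "biomed", "goats", "sheep"]
    cm = confusion_matrix(actual, predicted, labels=labels)
    per_class = cm.diagonal() / cm.sum(axis=1)   # fraction of each actual class predicted correctly
    for name, frac in zip(labels, per_class):
        print(f"{name}: {frac:.0%} correctly predicted")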
Now
we'll repeat the SVM classification
and choose "rating" as the target. Since there are only 11 web
pages with rating = "medium", we have recoded these to "cold" using the Recode
Transformation. Repeat the steps as above and keep the
default settings, picking "text" and "rating" as the input
attributes. The preferred target value will be "hot".
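The recode itself is just a mapping of target values. Here is a sketch of the equivalent step in pandas, with a hypothetical web_pages table standing in for the data behind the Recode Transformation wizard:

    import pandas as pd

    # Hypothetical ratings column; in the book's data only 11 rows are "medium"
    web_pages = pd.DataFrame({"rating": ["hot", "medium", "cold", "hot", "medium", "cold"]})

    # Collapse the sparsely populated "medium" value into "cold" before building the model
    web_pages["rating"] = web_pages["rating"].replace({"medium": "cold"})
    print(web_pages["rating"].value_counts())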
The
results of the Test Metrics show that predicting the users' rating of
Web pages is in the "good" range at 35% predictive confidence, and 41% of the "hot"
ratings were correctly predicted.
The Build results show
the words that were used in the model to classify the test cases.
If we
repeat the SVM Activity Build and include "type" along with "rating"
and "text" in the model, we find that the predictive confidence of
the model actually decreases to 28%.
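The lesson is that extra inputs are not automatically helpful, so it is worth scoring the model both ways. The sketch below makes that comparison with scikit-learn cross-validation standing in for Oracle's test metrics; the toy web_pages rows and the text/type/rating column names are assumptions drawn from the discussion above, not the book's data.

    import pandas as pd
    from sklearn.compose import make_column_transformer
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.svm import LinearSVC

    def rating_score(df, include_type):
        """Mean cross-validated accuracy for predicting 'rating'."""
        transformers = [(TfidfVectorizer(), "text")]   # term features from the text column
        if include_type:
            transformers.append((OneHotEncoder(handle_unknown="ignore"), ["type"]))
        model = make_pipeline(make_column_transformer(*transformers), LinearSVC())
        return cross_val_score(model, df, df["rating"], cv=3).mean()

    # Invented toy rows; the real comparison would use the full web-page table
    web_pages = pd.DataFrame({
        "text":   ["goat care", "sheep wool", "medical info", "band tour", "goat milk", "sheep farm"],
        "type":   ["goats", "sheep", "biomed", "bands", "goats", "sheep"],
        "rating": ["hot", "cold", "hot", "cold", "hot", "cold"],
    })
    print(rating_score(web_pages, include_type=False))
    print(rating_score(web_pages, include_type=True))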