Call now: 252-767-6166  
Oracle Training Oracle Support Development Oracle Apps

 
 Home
 E-mail Us
 Oracle Articles
New Oracle Articles


 Oracle Training
 Oracle Tips

 Oracle Forum
 Class Catalog


 Remote DBA
 Oracle Tuning
 Emergency 911
 RAC Support
 Apps Support
 Analysis
 Design
 Implementation
 Oracle Support


 SQL Tuning
 Security

 Oracle UNIX
 Oracle Linux
 Monitoring
 Remote s
upport
 Remote plans
 Remote
services
 Application Server

 Applications
 Oracle Forms
 Oracle Portal
 App Upgrades
 SQL Server
 Oracle Concepts
 Software Support

 Remote S
upport  
 Development  

 Implementation


 Consulting Staff
 Consulting Prices
 Help Wanted!

 


 Oracle Posters
 Oracle Books

 Oracle Scripts
 Ion
 Excel-DB  

Don Burleson Blog 


 

 

 


 

 

 

 

 

Using the Stratified Sample Transformation Wizard

Data warehouse tips by Burleson Consulting

This is an excerpt from Dr. Ham's premier book "Oracle Data Mining: Mining Gold from your Warehouse".

The stratified sample transformation is useful when your target attribute constitutes a small percentage of cases.  This transformation was utilized in Chapter 4 to create a dataset for clustering.  In that example we used the COIL dataset and increased the percentage of the target CARAVAN from 6% to 33% of the sampled data. 

We kept all attribute?s where CARAVAN = 1 (348 cases) and randomly selected cases where CARAVAN = 0 which made up 2/3 of the total sample.  An alternate method is using percentage of cases in Step 3 of the stratified sample wizard.  Click Percent of Cases, leaving 100% as the default.  In step 4 of 4, click the Equal Distribution button to create a case dataset with a 50% distribution of the target attribute.  The total sample count is 696 cases, which is 12% of the original COIL dataset. 

The SQL used to create this transformation is shown below:

CREATE TABLE "DMUSER_BOOK"."COIL128895151"

AS

SELECT "MOSTYPE", "MAANTHUI", "MGEMOMV", "MGEMLEEF", "MOSHOOFD", "MGODRK", "MGODRP", "MGODOV", "MGODGE", "MRELGE", "MRELSA", "MRELOV", "MFALLEEN", "MFGEKIND", "MFWEKIND", "MOPLHOOG", "MOPLMIDD", "MOPLLAAG", "MBERHOOG", "MBERZELF", "MBERBOER", "MBERMIDD", "MBERARBG", "MBERARBO", "MSKA", "MSKB1", "MSKB2", "MSKC", "MSKD", "MHHUUR", "MHKOOP", "MAUT1", "MAUT2", "MAUT0", "MZFONDS", "MZPART", "MINKM30", "MINK3045", "MINK4575", "MINK7512", "MINK123M", "MINKGEM", "MKOOPKLA", "PWAPART", "PWABEDR", "PWALAND", "PPERSAIT", "PBESAUT", "PMOTSCO", "PVRAAUT", "PAANHANG", "PTRACTOR", "PWERKT", "PBROM", "PLEVEN", "PPERSONG", "PGEZONG", "PWAOREG", "PBRAND", "PZEILPL", "PPLEZIER", "PFIETS", "PINBOED", "PBYSTAND", "AWAPART", "AWABEDR", "AWALAND", "APERSAUT", "ABESAUT", "AMOTSCO", "AVRAAUT", "AAANHANG", "ATRACTOR", "AWERKT", "ABROM", "ALEVEN", "APERSONG", "AGEZONG", "AWAOREG", "ABRAND", "AZEILPL", "APLEZIER", "AFIETS", "AINBOED", "ABYSTAND", "CARAVAN"

FROM (SELECT /*+ no_merge */ t.*, ROWNUM RNUM FROM "DMUSER_BOOK"."COIL" t)

WHERE

( "CARAVAN"='1' and ORA_HASH(RNUM,(348-1),12345) < 348 )

 OR ( "CARAVAN"='0' and ORA_HASH(RNUM,(5474-1),12345) < 348 )

Using the Filter Single-Record Transformation Wizard

In some analysis, you may want to examine a subset of the case dataset.  You may see some interesting results in a cohort of diabetic patients, or want to explore more about customers in a specific region of the country.  The Filter Single Record transformation creates a view for the filtered dataset.  Filter conditions are specified using a Where clause, which is easy to construct using the expression builder. 

The resulting SQL to create the view is shown below:

CREATE VIEW "DMUSER_BOOK"."boston_price261754772" AS

SELECT * FROM "DMUSER_BOOK"."boston_price"

WHERE "boston_price"."CRIM"  < 8.0  And "boston_price"."NOX "  > 5.5

Inside the Sample Transformation Wizard

The Sample Transformation wizard creates a random sample of a table or view; the generated random sample can be either a table or a view. You can specify the sample size either as a number of records or as a percentage of records. You can also specify a random seed. A random sample is created by choosing a sample of cases picked at random.

The random sample is smaller than the set that it is based on. Using the smaller sample can result in improved performance in building a model or performing unsupervised data mining activities. If the sample is representative of the whole set, results generated using the sample will be compatible with the results generated using the whole set.
 

For more tips and tricks for Oracle data warehouse analysis, see Dr. Ham's premier book "Oracle Data Mining: Mining Gold from your Warehouse"

You can buy it direct from the publisher for 30%-off:

http://www.rampant-books.com/book_2006_1_oracle_data_mining.htm


 

 
��  
 
 
Oracle Training at Sea
 
 
 
 
oracle dba poster
 

 
Follow us on Twitter 
 
Oracle performance tuning software 
 
Oracle Linux poster
 
 
 

 

Burleson is the American Team

Note: This Oracle documentation was created as a support and Oracle training reference for use by our DBA performance tuning consulting professionals.  Feel free to ask questions on our Oracle forum.

Verify experience! Anyone considering using the services of an Oracle support expert should independently investigate their credentials and experience, and not rely on advertisements and self-proclaimed expertise. All legitimate Oracle experts publish their Oracle qualifications.

Errata?  Oracle technology is changing and we strive to update our BC Oracle support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:  

and include the URL for the page.


                    









Burleson Consulting

The Oracle of Database Support

Oracle Performance Tuning

Remote DBA Services


 

Copyright © 1996 -  2017

All rights reserved by Burleson

Oracle ® is the registered trademark of Oracle Corporation.

Remote Emergency Support provided by Conversational