Call now: 252-767-6166  
Oracle Training Oracle Support Development Oracle Apps

 
 Home
 E-mail Us
 Oracle Articles
New Oracle Articles


 Oracle Training
 Oracle Tips

 Oracle Forum
 Class Catalog


 Remote DBA
 Oracle Tuning
 Emergency 911
 RAC Support
 Apps Support
 Analysis
 Design
 Implementation
 Oracle Support


 SQL Tuning
 Security

 Oracle UNIX
 Oracle Linux
 Monitoring
 Remote s
upport
 Remote plans
 Remote
services
 Application Server

 Applications
 Oracle Forms
 Oracle Portal
 App Upgrades
 SQL Server
 Oracle Concepts
 Software Support

 Remote S
upport  
 Development  

 Implementation


 Consulting Staff
 Consulting Prices
 Help Wanted!

 


 Oracle Posters
 Oracle Books

 Oracle Scripts
 Ion
 Excel-DB  

Don Burleson Blog 


 

 

 


 

 

 
 

Clusters and cohort groups

Oracle Tips by Burleson Consulting


One important feature of Oracle data warehousing is the creation and maintenance of complex classification hierarchies.  Clients need the ability to create custom-defined populations and compare these populations to other arbitrary data groupings. In data warehouse vernacular, these custom classifications are known as ?cohorts? or ?clusters. 

This ability to custom-define populations is the foundation of advanced analytics, traditionally within the domain of data warehousing, where correlations between populations are discovered using multivariate statistics.

 

However, there is a push towards allowing the definition of cohorts online.  Sophisticated data warehouse users are requesting the ability to define and compare ?user defined populations? in real-time using Oracle 10g Discoverer.

 

For a simple example, consider a point of sale data warehouse for a grocery store chain.  When customers use their ?shoppers club? card to receive discounts, the POS warehouse tracks their purchases.  An executive might want to create an abstract data type called ?person type? and then instantiate user-defined diseases based on complex Boolean rules.  These classification groups are often given arbitrary names:

 

  • ?A YUPPIE is someone over the age of 18 with a college degree who has a home valued at more than $300,000, a gross income of at least $100,000 per year?.

 

  • ?A DINK (double-income, no kids) is a married couple with no children whose combined income is at least 150,000 per year, a pristine credit rating and net assets of more than $200,000?.

 

Once defined, these arbitrary categorizations must be able to take-on full status as a warehouse dimension such that they can be used in OLAP queries:

 

  • Display the average monthly coffee expenses for DINKS, grouped by MONTH

     

  • Display the average grocery expenses for YUPPIES, grouped by NET INCOME and OCCUPATION

 

This ability to define and compare clusters is a important feature of any data warehouse.  In an article section titled ?the analytics of behavior?, Dr. Ralph Kimball notes that clustering is critical for:

 

Classification is possible once you have clusters. If a new customer prospect can be associated with one of your existing clusters, you can reasonably infer that this customer will behave like the other members of that cohort group. Note the word behave. You have classified the customers by their behavior. We need some more advanced analytics here in order to understand how close the prospect is to the centroid of the existing cohort group, another step up the analytic ladder.

 

Prediction is the highest art form. You can associate a numeric metric with each known member of a cohort group and then use that metric together with the "distance" to the new prospect to derive a numeric prediction of lifetime value, or likelihood to default.

 

From an operational perspective we must have the ability to define these clusters and compare their behavior to other clusters.

 

The concept of ?behavior tags?

 

Dr. Kimball coined the term ?behavior tags? to describe the process of identifying participation in clusters.  Kimball notes that participation on a cluster is time-based, and someone can be a YUPPIE one year and then change groups as their income or occupation changes.  The trick is to be able to define the cluster and then pre-define those members of the cluster over time.

 

For our purposes we need to use Oracle Discoverer to allow our sophisticated users to create user-defined populations and use these arbitrary groupings within all areas of Discoverer, including OLAP and decision support.

 

Modeling Clusters with Oracle Discoverer

 

These are suggestions from Mark Rittman.  Take your two classification groups:

 

  • ?A YUPPIE is someone over the age of 18 with a college degree who has a home valued at more than $300,000, a gross income of at least $100,000 per year?.

 

  • ?A DINK (double-income, no kids) is a married couple with no children whose combined income is at least 150,000 per year, a pristine credit rating and net assets of more than $200,000?.

 

Looking at Yuppies, the distinguishing characteristics are age, education, home value and gross income. All of these distinguishing factors could be considered attributes of a customer dimension. The same goes for DINKS, with the attributes being marital status, number of children, combined income, credit rating and assets.

 

Your mail says that these groupings would become full dimensions, but in Discoverer and OLAP terms you could consider these groupings as a "saved selection" of a number of dimension members. One saved selection would be customers who met the Yuppie criteria, another would be customers who met the DINK criteria. I don't think these groupings of customers would be dimensions in themselves, what they are is a selection of members of the customer dimension.

 

Where Discoverer 10g comes into this is that, when you use Discoverer for OLAP, and the OLAP Option, you can use the Query Wizard to create saved selections like these using an OLAP-aware GUI tool. The user can come up with a subset of (for example) customers, who have a shoe size of 5, who's income is above average, who have spent more than the average on electric items, and who have not been mailed before, and make a saved selection (i.e, a "cluster") of these for use later. These saved selections can then be recalled to compare one cluster with another.

 

In terms of Discoverer 10g, I think this is what you are looking at. In terms of implementation, you would use Discoverer for OLAP against either a relational OLAP schema (regular star schema with additional metadata) or you'd move the data into an analytic workspace (better query performance).

 

Take a look at this viewlet:

 

http://www.oracle.com/technology/products/bi/files/viewlets/Chapter4_viewlet_swf.html 

 

Of course the real question is what attributes should I pick to create my clusters - someone would have come up with the "Yuppie" cluster as one that could be profitable, and the real skill is putting these clusters together in the first place, which is of course a task usually done using data mining software. All you're doing with Discoverer is taking the attributes that describe the cluster and selecting them using the tool.

 

Other than that, if you're after a more custom solution - perhaps one that combines data mining and selection building, perhaps one that more closely matches the customers requirements, then you're looking at a custom BI application using Oracle BI Beans.

 

One thing to bear in mind with the Discoverer for OLAP solution is that there's no EUL - you use the Discoverer Catalog instead, and create the dimensions, measures, permissions and so on using Analytic Workspace Manager.



 

 

��  
 
 
Oracle Training at Sea
 
 
 
 
oracle dba poster
 

 
Follow us on Twitter 
 
Oracle performance tuning software 
 
Oracle Linux poster
 
 
 

 

Burleson is the American Team

Note: This Oracle documentation was created as a support and Oracle training reference for use by our DBA performance tuning consulting professionals.  Feel free to ask questions on our Oracle forum.

Verify experience! Anyone considering using the services of an Oracle support expert should independently investigate their credentials and experience, and not rely on advertisements and self-proclaimed expertise. All legitimate Oracle experts publish their Oracle qualifications.

Errata?  Oracle technology is changing and we strive to update our BC Oracle support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:  

and include the URL for the page.


                    









Burleson Consulting

The Oracle of Database Support

Oracle Performance Tuning

Remote DBA Services


 

Copyright © 1996 -  2017

All rights reserved by Burleson

Oracle ® is the registered trademark of Oracle Corporation.

Remote Emergency Support provided by Conversational