Call now: 252-767-6166  
Oracle Training Oracle Support Development Oracle Apps

 
 Home
 E-mail Us
 Oracle Articles
New Oracle Articles


 Oracle Training
 Oracle Tips

 Oracle Forum
 Class Catalog


 Remote DBA
 Oracle Tuning
 Emergency 911
 RAC Support
 Apps Support
 Analysis
 Design
 Implementation
 Oracle Support


 SQL Tuning
 Security

 Oracle UNIX
 Oracle Linux
 Monitoring
 Remote s
upport
 Remote plans
 Remote
services
 Application Server

 Applications
 Oracle Forms
 Oracle Portal
 App Upgrades
 SQL Server
 Oracle Concepts
 Software Support

 Remote S
upport  
 Development  

 Implementation


 Consulting Staff
 Consulting Prices
 Help Wanted!

 


 Oracle Posters
 Oracle Books

 Oracle Scripts
 Ion
 Excel-DB  

Don Burleson Blog 


 

 

 


 

 

 

 

 

Histograms in SQL

SQL Tips by Donald Burleson

The concept of histograms originated in statistics. Admittedly, statistics never enjoyed the reputation of being the most exciting math subject. I never was able to overcome a presumably unfair impression that it is just a collection of ad-hoc methods. Yet a typical database table has a huge volume of data, and histograms provide an opportunity to present it in compact human digestible report.

A histogram can be built on any (multi-)set of values. In the database world this set comes from a table column, so values can be numerical, character, or other datatypes. For purposes in this book, it is convenient to order the values and then index them as in Figure 3.4

This might not feel natural at first, but the two dimensions on the graph provide an insight as to why there are exactly two kinds of histogram.

Equal-Width Histogram

Equal-width histogram partitions the graph horizontally, as shown on Figure 3.5.

Evidently, there is something wrong here, either with the term equal- width histogram, or my unorthodox interpretation. First, the partitioning is vertical; second, why do buckets have to be of uniform size? Well, if the coordinates are swapped in an attempt to make partitioning conform to its name, then it would not be a function anymore. Also, partitioning does not have to be uniform. Histograms with logarithmic partitioning will be studied later.

There is one special type of equal-width histogram - frequency histogram. It partitions the set of values in the finest possible way, Figure 3.6. 

The two kinds of objects -- (indexed) values and buckets -- can now be associated in a query. Either of the following questions should be asked:

1        what bucket a particular value falls in, or

2       what is the aggregate value in each bucket (there may be interest in more than one aggregate function)

In the first case, the query returns one extra attribute per each record in the input data. Given the input set of data,  

As suggested in Figure 3.4, the number in the Bucket column is determined by the Value.

Enough abstract talk, let's write some SQL queries.

Assign a bucket number to each record in the IndexedValues table. All the records with values in the range from 0 to 1 are placed into 0th bucket, values in the range 2 to 3 go into bucket number 1, etc.

The key to the solution is identifying what kind of function mapping of indexes into buckets satisfies the (informal) specification. The floor(index#/2) does, so the solution is as simple as:

select index#, value, floor(value/2) bucket
from IndexedValues

Often, the mapping from indexes to buckets is not determined by a mathematical function, like the combination of floor and division in the previous example. Consider:

Distribute IndexedValues records into 4 buckets. If total number of records is n, then all the records with values in the range from 0 to n/4 are placed into 0th bucket, values in the range n/4 to n/2 go into bucket number 1, etc.

This informal query translates 1-to-1 into SQL:

select index#, value, floor(4*value/n) bucket
from IndexedValues, (select count(*) n from IndexedValues)


Let's move on to the second type of queries - aggregates grouped by bucket.

Assign a bucket number to each record in the IndexedValues table. All the records with values in the range from 0 to 1 are placed into 0th bucket, values in the range 2 to 3 go into bucket number 1, etc. For each bucket count the number of values that fall into this bucket.

This version is just a small increment to the query written earlier:

select floor(value/2) bucket, count(*)
from IndexedValues
group by floor(value/2)


A simplified version of the previous query in verbose form defines a frequency histogram.

Assign a bucket number to each record in the IndexedValues table. The record with value 0 is placed into 0th bucket, value 1 goes into bucket number 1, etc. For each bucket count the number of values that fall into this bucket.

It is formally expressed as celebrated group by query:

select value bucket, count(*)
from IndexedValues
group by value

Equal-Height Histogram

Equal-height histogram partitions the graph vertically, as shown on Figure 3.7.

The development in the rest of the section would mimic an equal-width case with the value and index# roles reversed. There is some asymmetry, though. Unlike the equal-width case where partitioning values with the finest granularity led to the introduction of the frequency histogram, there is nothing interesting about partitioning indexes to the extreme. Each bucket corresponds one-to-one to the index, so the concept of a bucket in no longer needed.

Let's proceed straight to queries. They require little thought other than the formal substitution of value by index#.

Assign a bucket number to each record in the IndexedValues table. All the records with indexes in the range from 0 to 1 are placed into 0th bucket, indexes in the range 2 to 3 go into bucket number 1, etc.

select index#, value, floor(index#/2) bucket
from IndexedValues

There is one subtlety for an aggregate query. The record count within each bucket is not interesting anymore, as in our example we know it's trivially 2. We might ask other aggregate functions, though:

Assign a bucket number to each record in the IndexedValues table. All the records with indexes in the range from 0 to 1 are placed into 0th bucket, indexes in the range from 2 to 3 go into bucket number 1, etc. For each bucket find the maximum and minimum value.

select floor(index#/2) bucket, min(value), max(value)
from IndexedValues
group by floor(index#/2)

       

This is an excerpt from the new book SQL Design Patterns: The Expert Guide to SQL Programming by Vadim Tropashko

You can buy it direct from the publisher for 30%-off.


 

 
��  
 
 
Oracle Training at Sea
 
 
 
 
oracle dba poster
 

 
Follow us on Twitter 
 
Oracle performance tuning software 
 
Oracle Linux poster
 
 
 

 

Burleson is the American Team

Note: This Oracle documentation was created as a support and Oracle training reference for use by our DBA performance tuning consulting professionals.  Feel free to ask questions on our Oracle forum.

Verify experience! Anyone considering using the services of an Oracle support expert should independently investigate their credentials and experience, and not rely on advertisements and self-proclaimed expertise. All legitimate Oracle experts publish their Oracle qualifications.

Errata?  Oracle technology is changing and we strive to update our BC Oracle support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:  

and include the URL for the page.


                    









Burleson Consulting

The Oracle of Database Support

Oracle Performance Tuning

Remote DBA Services


 

Copyright © 1996 -  2020

All rights reserved by Burleson

Oracle ® is the registered trademark of Oracle Corporation.