Call now: 252-767-6166  
Oracle Training Oracle Support Development Oracle Apps

 
 Home
 E-mail Us
 Oracle Articles
New Oracle Articles


 Oracle Training
 Oracle Tips

 Oracle Forum
 Class Catalog


 Remote DBA
 Oracle Tuning
 Emergency 911
 RAC Support
 Apps Support
 Analysis
 Design
 Implementation
 Oracle Support


 SQL Tuning
 Security

 Oracle UNIX
 Oracle Linux
 Monitoring
 Remote s
upport
 Remote plans
 Remote
services
 Application Server

 Applications
 Oracle Forms
 Oracle Portal
 App Upgrades
 SQL Server
 Oracle Concepts
 Software Support

 Remote S
upport  
 Development  

 Implementation


 Consulting Staff
 Consulting Prices
 Help Wanted!

 


 Oracle Posters
 Oracle Books

 Oracle Scripts
 Ion
 Excel-DB  

Don Burleson Blog 


 

 

 


 

 

 
 
 

SQL Extensions for Performance

Oracle Database Tips by Donald BurlesonJuly 24, 2015


  By Kim Floss

Oracle has greatly enhanced their SQL capabilities such that there are many new functions that return complex data quickly.  Formerly, coding a simple query to use a ranking of items was quite complex.    Now, if you delve into these capabilities, things are much easier.  Jonathon Lewis, world-renowned Oracle expert, provides a great introduction into these functions in the following excerpt:

Analytic Functions

In Oracle version 8.1.6, Oracle Corp. introduced a whole new layer of functionality to its implementation of SQL: the analytic functions. This article provides a brief introduction to these functions, describing their use, and then demonstrating the benefit in clarity and brevity of the code that they offer. This article is based on material previously presented to the U.K. Oracle User Group at their annual conference in December 2000.

The Way Things Were

Assume you have a data warehouse describing the sales for a chain of shops selling computer games. One of your main data tables, describing sales by location and title sold per week, may look something like the following:

 

create table game_sale (

title varchar2(30),

store varchar2(30),

sales number(10,2),

week_ending date

);

 

How would you deal with the requirement for an end-user report showing the two best-selling games in each store this week? There are two traditional approaches: procedural and nonprocedural.

 

In the procedural case, we simply write a piece of SQL that sorts the data for a given week by store and period, then use a 3GL to walk through it one row at a time, making sure we print only the first two rows for each store. Consider, for example, this very simple PL/SQL solution that produces output which is correct but perhaps not entirely aesthetically satisfactory:

 

declare

cursor c1 is

select store, title, sales

from game_sale

where week_ending = '21-Jan-2001'

order by store, sales desc

;

m_last_row c1%rowtype;

m_out_ct number := 0;

 

begin

for r1 in c1 loop

if (m_last_row.store != r1.store) then

m_out_ct := 0;

dbms_output.new_line;

end if;

 

if m_out_ct != 2 then

dbms_output.put_line (

r1.store || ' - ' ||

r1.title || ' - ' || r1.sales

);

m_out_ct := m_out_ct + 1;

end if;

m_last_row := r1;

end loop;

end;

/

Glasgow - Crash Simulator - 1934

Glasgow - Manic the Gerbil - 913

 

London - Dome - 2167

London - Portal Combat - 1824

 

PL/SQL procedure successfully completed.

 

The following nonprocedural solution is rather more compact. On the plus side, if we are using SQL*Plus, it does allow us to produce a much tidier output without resorting to more fiddly little bits of code. On the minus side, the style of thinking required to produce this code is not really "natural;" thus, it is not immediately obvious that the SQL is appropriate to the requirement. More significantly, perhaps, this second solution requires Oracle to perform a correlated subquery for each row we examine to check how many items sold more than the current item in the current store this week — this could be a very resource-intensive query.

 

select     store, title, sales

from       game_sale gs1

where      week_ending = '21-Jan-2001'

and        2 > (

select        count(*)

from          game_sale gs2

where         gs2.week_ending =

gs1.week_ending

and           gs2.store = gs1.store

and           gs2.sales > gs1.sales

)

order by

store, sales desc;

 

STORE      TITLE            SALES

---------- ---------------- ---------

Glasgow    Crash Simulator       1934

           Manic the Gerbil       913

 

London     Dome                  2167

           Portal Combat         1824

 

In fact, both solutions also suffer from logical defects, partly because the requirement was not specified with sufficient precision, but also because the code in both cases is too simplistic.

 

Look at the PL/SQL code and ask yourself what would happen if Glasgow happened to sell 913 copies of Dome — it wouldn't be reported although it appears to qualify for second place. (In fact, two users running the same query with a different set of init.ora parameters might get different answers, depending solely on a minor variation in their init.ora parameters such as sort_area_size). Anyway, if there is a draw for second place, should both items be reported or should neither item be reported? The requirement was for the two best-selling games — we have no indication of how we should handle draws which result in more than two candidate titles.

 

The SQL option suffers from similar problems. Take the same scenario where Dome ties with Manic in Glasgow. As it stands, the SQL solution will automatically return three lines for Glasgow. At least there is no risk of the results changing randomly and you can choose to control the result by changing the inequalities in the SQL, substituting '>' for '>=', but you do need to think very carefully about how you are supposed to handle tied places.

 

One final point to consider with these solutions is the next, simple refinement of the requirement. The top two sales in London total 3991, whereas the top two sales in Glasgow total 2847; surely the marketing director will want to see the stores sorted in descending order by volume of sales. How do you do that in PL/SQL or SQL?

An Analytic Solution

The analytic functions offer a cleaner and simpler solution at the SQL level. We can solve (one version of) our requirement using the rank()  function. This function behaves like all the other analytic functions: after any initial selection, manipulation and grouping of the data, but before applying an order by clause, it will:

  • Break the data into partitions (nothing to do with partitioned tables or partition views)

  • Optionally sort the data within each partition for further processing

  • Optionally define "windows" within each partition (not shown in this example)

  • Add value to the output based on the partitions and windows.

All this extra processing is cued by the over()clause as follows:

 

select     store, title, sales, in_store_rank

ranked

from (

select

store, title, sales,

rank()over (

partition by store

order by sales desc

) as in_store_rank

from game_sale

where week_ending = '21-Jan-2001'

  )

where in_store_rank <= 2

order by

store, in_store_rank;

 

STORE   TITLE            SALES   IN_STORE_RANK

------- ---------------- ------- --------

Glasgow Crash Simulator     1934        1

        Manic the Gerbil     913        2

 

London  Dome                2167        1

        Portal Combat       1824        2

 

You will note that I have used an in-line view as part of the query. There are two reasons for this. First, you cannot use an analytic function in a where clause but we want to test the value of the generated rank, so we have wrapped our SQL in a pair of parentheses and made it an in-line view. The second reason is for clarity: if you use a complex SQL statement to generate the basic solution, it is convenient to improve clarity by pushing it into an in-line view before applying the analytic functions to something that now looks like a simple table. I frequently use in-line views when applying analytic views. (Be warned, however, that I have found a few exotic cases in which this causes dramatic changes to execution paths; in particular, when mixing partitioned tables with partition views. The current status of this problem is "possibly a bug.")

 

In examining the highlighted (i.e., bold faced) code that demonstrates the use of the rank() function, we see the partition clause that breaks the data set into separate sections, each section is then sorted in descending order by the sales column. When this phase of sorting is complete, Oracle applies a ranking to the intermediate data and the job is nearly finished. Because we want to report only the top two rows for each store (partition), we wrap the query into an in-line view, and restrict our selection to those rows in which the generated rank is less than or equal to 2.

 

Note that we then sort the final result set; it is likely, but not guaranteed, that the data for this example would coincidently appear in the correct order, but it is important to remember that any application of analytic functions takes place before, and independently of, the final order by clause. Do not rely on side-effects; always include an explicit closing order by if you want your output sorted.

 

In case you are wondering about tied places, the rank()function leaves gaps. If sales of Dome in the Glasgow store had matched those of Manic, then both games would have been ranked 2, and the next placed game would have been ranked 4, with no game in third place. There is an alternative ranking function, dense_rank()that would give the results 1,2,2,3,4…, but there is no function for generating the ordering 1,3,3,4,5….

 

We can be a lot more fanciful than this simple example, however. We note that London clearly sells more games than Glasgow. Perhaps we would like to list the store not alphabetically, but by total sales. We can do this by introducing another style of analytic function, partitioning the data again by store, and sorting on the result of that function.

 

select store, title, sales

from (

select

store, title, sales,

rank()over (

partition by store

order by sales desc

) as in_store_rank,

sum(sales) over (

partition by store

) as store_totals

from game_sale

where week_ending = '21-Jan-2001'

)

where

in_store_rank <= 2

order by

store_totals desc, in_store_rank;

 

In the highlighted (bold faced) code this time, we have taken the familiar sum()function, and used it in its new analytic form, demonstrating that analytic functions can use existing columns as parameters and are not restricted to generating new values as the parameterless rank()function does. In this case, we only have a partition clause, which splits the data by store again and thereby sums the sales by store without the need for the "traditional" group by clause. In this way, we can get "raw data" and summaries on the same line of a report without having to run two versions of the query or joining a table to itself. By the way, the example above sorts the store by total sold in each store; it is left as useful and interesting exercise to sort the stores by the totals of just the games listed in the report.

 

The savings in processing can be significant, but do remember that each time we do any partitioning we are introducing more work, typically sorting. However, the v$sysstatand v$sesstatviews do not appear to report in all the work that Oracle is doing, so be a little cautious as you investigate the possibilities on offer from analytic functions.

 

There are numerous functions, old and new, that can be used in this analytic way. The existing functions, like sum()above are count(), avg() , min() , max() , variance(), and stddev(). Apart from the rank()function introduced above, we have the new dense_rank() , percent_rank() , and row_number().  There are also a number of new statistical functions, such as corr() , var_pop() , var_samp(), covar_pop(), covar_samp() , stddev_pop(), stddev_samp(), and an entire host of regression functions. There are also the cume_dist()and ntile()functions (the latter allowing data to be split into percentiles, quartiles, etc.). Finally, under the heading of "miscellaneous but useful" we have the functions first_value() , last_value() , ratio_to_report() , lead(), and lag().

Common Requirements

Take a look at a couple of the new functions and see how they answer some of the most commonly asked questions about SQL. How do I report the difference between one row and the next? How do I report a figure as the percentage of the total for a report? How do I make one column report a running total of the figures in another column?

 

First, the lead()function can used to return the data from the following row in the output. lead() and lag()take as their parameters the name of the column you want to report, and the distance, (expressed as a number of rows) between the current row and the row you want to report alongside it. In my case, the distance is just one row; hence, the value 1 appearing in the function in this example.

 

select

title, this_sale, next_sale,

this_sale - next_sale delta

from (

select title, sales this_sale,

lead(sales,1) over (

partition by store

order by sales desc

) as next_sale

from game_sale

where week_ending = '21-Jan-2001'

);

 

TITLE            THIS_SALE  NEXT_SALE DELTA

---------------- ---------- --------- --------

Crash Simulator        1934       913     1021

Tonic the Gerbil        913       482      431

Dome                    482       315      167

Portal Combat           315        72      243

Age of Umpires           72

 

Next, to report one column as a percentage of the total data, we use the ratio_to_report()function. This function is named somewhat modestly, as it can be used not only to report the data as a percentage of the report total, but can also be used to report the data as a ratio of a partition. I demonstrate this by showing how a query can be simultaneously partitioned in many ways (bear in mind that this does mean more sorting, however).

 

Note how this report does not need to use an in-line view, because I am not using the result of an analytic function as the basis for a subsequent where clause. Note also how an empty over()clause shows that I am treating the entire data set as the partition.

 

select

store, title, sales,

ratio_to_report(sales) over (

partition by store

) store_ratio,

ratio_to_report(sales) over (

) country_ratio

from game_sale

where week_ending = '21-Jan-2001'

order by

store, sales desc;

 

Store   Title            Sale  St % Co %

------- ---------------- ----- ---- ----

Glasgow Crash Simulator   1934  .52  .24

        Tonic the Gerbil   913  .25  .11

        Dome               482  .13  .06

        Portal Combat      315  .08  .04

        Age of Umpires      72  .02  .01

 

London  Dome              2167  .49  .27

        Portal Combat     1824  .41  .22

        Crash Simulator    247  .06  .03

        Age of Umpires     110  .03  .01

        Tonic the Gerbil    52  .01  .01

 

Finally we come to the example of a running total, which also introduces the final feature of analytic functions, and the over() clause, which is the range or rows clause. Thus far, our analytic functions have been associated with either a single row from a partition (e.g., rank()) or with the entire set of data within the partition (e.g., ratio_to_report() ). It is possible, however, to specify that a function should be applied to a range within the partition.

 

There are several types of range specification, such as "within 3 rows of the current row," "within 10 days of the date in the current row," "the week (yes, analytic functions do know what a week is) preceding the date in the current row," etc. In this example, we use one of the simplest options, which is "this row and all the preceding rows in the partition," to make a function produce the running total of the partition. Note that if you use the range or rows clause, you must have an order by clause in the over() clause to allow the ranging calculation to have a consistent, reproducible meaning.

 

select

store, week_ending, sales,

sum(sales) over(

partition by store

order by week_ending

range unbounded preceding

) running

from game_sales

where store = 'Glasgow'

order by store, week_ending;

 

STORE    WEEK ENDING SALES RUNNING

-------- ----------- ----- -------

Glasgow  07-May-2000 4,000   4,000

         14-May-2000 5,000   9,500

         21-May-2000 6,000  15,000

         28-May-2000 7,000  22,000

         04-Jun-2000 8,000  30,000

         11-Jun-2000 7,802  37,802

         18-Jun-2000 7,636  45,438

         25-Jun-2000 8,134  53,572

          2-Jul-2000 7,815  61,387

          9-Jul-2000 8,023  69,410

 

This is a wonderful demonstration of how a very clean and simple piece of SQL can now be used to produce a result that previously required some heavy-duty calculation from the database engine or a carefully tested piece of procedural code. In this particular case, we also have an example in which the analytic function appears to be applied with virtually no overhead to produce a commonly requested form of output.

Conclusion

Analytic functions are very powerful and allow you to produce useful tabular results with very little programming, but it helps to use in-line views to increase the clarity of code.

 

Be careful about performance; some of the functions available clearly require Oracle to do a lot of sorting. But because the sort statistics are not reported; you should only apply analytic functions to relatively small result sets.

 

One final warning, PL/SQL does not yet recognize the syntax of analytic functions in static code, so you will need to resort to ref cursors and dynamic SQL to implement analytic functions inside PL/SQL packages.

 

But, what if you want to do more than that?  What if you want to use a function, but still want to use the index?  Often, in the past, use of functions meant a full-table scan, as the optimizer would often opt to ignore the index, if you had a function present.  Enter Function Based Indexes - build your index using the function, and low and behold - now you are back to using your index.  Also, there are a wealth of other functions present, in addition to ranking, things like 'top n' which will return your top 5 or top 10 (or top x) items from a table, or group by functionality that can now do cubes and roll ups, essentially totaling all the data for you, for each major category.

 


The above text is an excerpt from:

Oracle SQL Tuning & CBO Internals
ISBN 0-9745993-3-6

by Kimberly Floss


 

 

��  
 
 
Oracle Training at Sea
 
 
 
 
oracle dba poster
 

 
Follow us on Twitter 
 
Oracle performance tuning software 
 
Oracle Linux poster
 
 
 

 

Burleson is the American Team

Note: This Oracle documentation was created as a support and Oracle training reference for use by our DBA performance tuning consulting professionals.  Feel free to ask questions on our Oracle forum.

Verify experience! Anyone considering using the services of an Oracle support expert should independently investigate their credentials and experience, and not rely on advertisements and self-proclaimed expertise. All legitimate Oracle experts publish their Oracle qualifications.

Errata?  Oracle technology is changing and we strive to update our BC Oracle support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:  

and include the URL for the page.


                    









Burleson Consulting

The Oracle of Database Support

Oracle Performance Tuning

Remote DBA Services


 

Copyright © 1996 -  2020

All rights reserved by Burleson

Oracle ® is the registered trademark of Oracle Corporation.