Using a HASH Signature in Data Warehouses

Oracle Tips by Burleson Consulting

On my current assignment I am dealing with slowly changing dimensions, and I found an interesting technique I would like to share.

In data warehouses, one item that can be difficult to deal with is the slowly changing dimension. Slowly changing dimensions are dimensions whose values may alter over time, such as addresses, deployments, and so on. Often the data warehouse manager wants to maintain a running history of these changes over time, which makes the dimension a type 2 dimension.

The reason these are difficult to deal with is that determining what has changed becomes hard when there are more than a few fields in the dimension. The data warehouse manager has the choice of a tedious field-by-field comparison of each field in the record, or some other form of verification to determine whether a record in the dimension has changed and therefore requires a new record insertion. When the new record is inserted into a slowly changing dimension, the old record is marked as invalid, either by a flag or via a date field (or set of date fields, allowing for a bi-temporal dimension).

One method for finding out whether a record has a changed field is to generate a CRC or hash over the pertinent sections of the record and store this signature with the record. Then, for each record that is to be inserted, a signature is calculated over the same sections and compared to the stored signature. If the signatures match, the incoming record is tossed. If the signature has changed, the new record is inserted, the old record is marked as invalid, and the new one is marked as the current record. If the new record matches none of the existing signatures, the record is inserted but no existing records are altered.
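To make that compare-and-insert logic concrete, here is a minimal sketch of a type 2 load. The staging table STAGE_DIM, dimension table DIM, sequence DIM_SEQ and the NATURAL_KEY, ETL_SIGNATURE, CURRENT_FLAG and date columns are all illustrative names, not from any real schema; the signatures carried on the staging rows would be produced by a function such as the GET_SIGN function developed later in this article.

-- Step 1: expire the current row whenever its signature no longer
-- matches the incoming record with the same natural key.
UPDATE dim d
   SET d.current_flag = 'N',
       d.end_date     = SYSDATE
 WHERE d.current_flag = 'Y'
   AND EXISTS (SELECT 1
                 FROM stage_dim s
                WHERE s.natural_key   = d.natural_key
                  AND s.etl_signature <> d.etl_signature);

-- Step 2: insert a new current row for every incoming record whose
-- signature is not already stored as the current version.
INSERT INTO dim
       (dim_key, natural_key, attr1, attr2, etl_signature,
        current_flag, start_date, end_date)
SELECT dim_seq.NEXTVAL, s.natural_key, s.attr1, s.attr2, s.etl_signature,
       'Y', SYSDATE, NULL
  FROM stage_dim s
 WHERE NOT EXISTS (SELECT 1
                     FROM dim d
                    WHERE d.natural_key   = s.natural_key
                      AND d.current_flag  = 'Y'
                      AND d.etl_signature = s.etl_signature);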

However, calculating a CRC or hash can be difficult, unless you know something about what Oracle provides in its built-in packages. For example, the OUTLN_EDIT_PKG package contains the GENERATE_SIGNATURE procedure. GENERATE_SIGNATURE creates a 32-byte hash signature for a string value of up to 32K-1 bytes passed to it, either internally or via a procedure call. If the call comes from SQL, the length is limited to 4,000 bytes. Provided your critical columns are less than 4,000 bytes in concatenated length, you can generate a signature that can be used to compare records; in this scenario, each date or number field should be converted to a character equivalent using the appropriate function.
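As a quick illustration of that conversion step, the anonymous block below hashes a handful of hypothetical dimension attributes (the column names and values are made up); the point is the explicit TO_CHAR conversions that turn the NUMBER and DATE fields into characters before the string is passed to the procedure, which returns the signature through its OUT parameter:

SET SERVEROUTPUT ON

DECLARE
  sig RAW(32);
BEGIN
  sys.outln_edit_pkg.generate_signature(
    'ACME CORP'                            ||  -- customer_name (VARCHAR2)
    '123 MAIN ST'                          ||  -- address       (VARCHAR2)
    TO_CHAR(1500.75,'FM9999990.00')        ||  -- credit_limit  (NUMBER)
    TO_CHAR(DATE '2004-01-15','YYYYMMDD'),     -- last_review   (DATE)
    sig);
  DBMS_OUTPUT.PUT_LINE(RAWTOHEX(sig));
END;
/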

To use OUTLN_EDIT_PKG.GENERATE_SIGNATURE, the first thing to do is to make sure that the user who owns the data warehouse (DWH) schema has the EXECUTE privilege on OUTLN_EDIT_PKG.

From the owner of the OUTLN_EDIT_PKG, the user SYS, grant the proper privilege:

GRANT EXECUTE ON outln_edit_pkg TO dwh;

To use GENERATE_SIGNATURE most effectively, I suggest wrapping it in a deterministic function, for example:

From the DWH user you could create a function similar to:

CREATE OR REPLACE FUNCTION get_sign (text VARCHAR2)
RETURN RAW
DETERMINISTIC
AS
  signature RAW(32);
  text2     VARCHAR2(32767);
BEGIN
  text2 := text;
  -- GENERATE_SIGNATURE cannot accept a NULL input, so substitute a
  -- documented default value.
  IF text2 IS NULL THEN
    text2 := '1';
  END IF;
  sys.outln_edit_pkg.generate_signature(text2, signature);
  RETURN signature;
END;
/


Note that this is just a bare-bones function; exception handling should be added.

Notice that I use a default value of '1' if the input value is null. Because the function generates a hash signature, you cannot pass it a null value, so in the case of a null we change it to '1'. You could of course use any string value you desire; just be sure to document it!
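One consequence of that substitution, worth remembering when you pick the default, is that a genuinely null input and the literal string '1' hash to the same value. You can verify this with a simple query against DUAL; both columns return the same RAW value:

SQL> SELECT get_sign(NULL) sig_null, get_sign('1') sig_one FROM dual;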

Once you have a deterministic function you can either use it to populate a table column, or use it to create a function-based index. By using a function-based index and query rewrite you can eliminate the need for an extra column in the table. This allows you to regenerate the matching signatures without having to update the complete dimension each time columns are added to the dimension. There are also performance benefits to using a function-based index or an internal column over doing a column-by-column match. Let's look at some examples.

First, we have our test table TEST; in this case I created it from the DBA_OBJECTS view using a simple CTAS.

SQL> DESC test
 Name                                      Null?    Type
 ----------------------------------------- -------- -------------------
 OWNER                                              VARCHAR2(30)
 OBJECT_NAME                                        VARCHAR2(128)
 SUBOBJECT_NAME                                     VARCHAR2(30)
 OBJECT_ID                                          NUMBER
 DATA_OBJECT_ID                                     NUMBER
 OBJECT_TYPE                                        VARCHAR2(18)
 CREATED                                            DATE
 LAST_DDL_TIME                                      DATE
 TIMESTAMP                                          VARCHAR2(19)
 STATUS                                             VARCHAR2(7)
 TEMPORARY                                          VARCHAR2(1)
 GENERATED                                          VARCHAR2(1)
 SECONDARY                                          VARCHAR2(1)

SQL> SELECT COUNT(*) FROM test;

COUNT(*)
----------
32359

Once we have our test table, we create our function to call GENERATE_SIGNATURE and then use it to create a function-based index:

SQL> CREATE INDEX test_fbi ON test(get_sign(owner||object_name||subobject_name));

Index created.


Before we can use query rewrite we must analyze the table and index, as well as set the proper initialization parameters (query_rewrite_enabled = true and query_rewrite_integrity = trusted).
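Depending on your setup, these parameters can be set in the init.ora/spfile so they apply instance-wide, or just for the session; a session-level example would be:

ALTER SESSION SET query_rewrite_enabled = TRUE;
ALTER SESSION SET query_rewrite_integrity = TRUSTED;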

SQL> ANALYZE TABLE test COMPUTE STATISTICS;

Table analyzed.


Let's look at a query that will use this index and see what the performance is like.

SQL> SELECT a.object_id FROM test a
2 WHERE
3 get_sign('SYSTEM'||'TEST'||'') = get_sign(a.owner||a.object_name||a.subobject_name);

OBJECT_ID
----------
35567


Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=324
Bytes=11016)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST' (Cost=2 Card=324
Bytes=11016)
2 1 INDEX (RANGE SCAN) OF 'TEST_FBI' (NON-UNIQUE) (Cost=2
Card=129)


Statistics
----------------------------------------------------------
74 recursive calls
0 db block gets
14 consistent gets
1 physical reads
0 redo size
382 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> /

OBJECT_ID
----------
35567

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=34)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST' (Cost=2 Card=1
Bytes=34)
2 1 INDEX (RANGE SCAN) OF 'TEST_FBI' (NON-UNIQUE) (Cost=2
Card=1)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
382 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed


SQL> /

OBJECT_ID
----------
35567

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=34)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST' (Cost=2 Card=1
Bytes=34)
2 1 INDEX (RANGE SCAN) OF 'TEST_FBI' (NON-UNIQUE) (Cost=2
Card=1)


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
382 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed


Now let's drop the index and re-run our query to check performance without it:

SQL> DROP INDEX test_fbi;

Index dropped.


SQL> SELECT a.object_id FROM test a
2 WHERE
3 get_sign('SYSTEM'||'TEST'||'') = get_sign(a.owner||a.object_name||a.subobject_name);

OBJECT_ID
----------
35567

Elapsed: 00:00:01.04

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=138
Card=324
Bytes=11016)
1 0 TABLE ACCESS (FULL) OF 'TEST' (Cost=138 Card=324
Bytes=11016)

Statistics
----------------------------------------------------------
243 recursive calls
0 db block gets
526 consistent gets
0 physical reads
0 redo size
382 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> /

OBJECT_ID
----------
35567

Elapsed: 00:00:01.03

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=138 Card=324
Bytes=11016)
1 0 TABLE ACCESS (FULL) OF 'TEST' (Cost=138 Card=324
Bytes=11016)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
486 consistent gets
0 physical reads
0 redo size
382 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> SELECT a.object_id FROM test a
2 WHERE
3 get_sign('SYSTEM'||'TEST'||'') = get_sign(a.owner||a.object_name||a.subobject_name);

OBJECT_ID
----------
35567

Elapsed: 00:00:01.03


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=132 Card=324
Bytes=11016)
1 0 TABLE ACCESS (FULL) OF 'TEST' (Cost=132 Card=324
Bytes=11016)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
446 consistent gets
0 physical reads
0 redo size
382 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed


As you can see, the simple SELECT statement using the function-based index completes in too small a time interval for Oracle's timing command to measure (the test was run several times, with identical results), while the SELECT that cannot use the index takes a consistent time of just over one second (the tests were run several times and the times averaged).

Even when the table is read from memory (physical reads = 0), the consistent gets (446) are much higher than those for the index-based selects (4 consistent gets), and the index-based query performs better even though it shows 74 recursive calls. The recursive calls were only present on the first run; after that the recursive calls were 0.

In situations where the call is repeated many times, such as when loading dimension tables, the performance improvement should be dramatic.

In other tests, using an added SIGNATURE column declared RAW(32) and an index on that column, we can get performance identical to the function-based index and table combination.

First, using a modified SELECT to go against the SIGNATURE column and no index:

SQL> alter table test add signature raw(32);

Table altered.

SQL> update test a set a.signature=get_sign(a.owner||a.object_name||a.subobject_name);

32359 rows updated.

SQL> commit;

Commit complete.

SQL> set autotrace on
SQL> set timing on
SQL> SELECT a.object_id FROM test a
2 WHERE
3 get_sign('SYSTEM'||'TEST'||'') = a.signature;

OBJECT_ID
----------
35567

Elapsed: 00:00:00.08

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=116 Card=324
Bytes=7128)
1 0 TABLE ACCESS (FULL) OF 'TEST' (Cost=116 Card=324
Bytes=7128)

Statistics
----------------------------------------------------------
277 recursive calls
0 db block gets
534 consistent gets
3 physical reads
0 redo size
382 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> /

OBJECT_ID
----------
35567

Elapsed: 00:00:00.07

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=116 Card=324
Bytes=7128)
1 0 TABLE ACCESS (FULL) OF 'TEST' (Cost=116 Card=324
Bytes=7128)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
486 consistent gets
0 physical reads
0 redo size
382 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed


So without an index, the special SIGNATURE column performs worse than the table and function-based index combination. However, when we add an index, the performance is the same in both the timing and the statistics.

SQL> create index test_ui on test(signature);

Index created.

Elapsed: 00:00:00.07
SQL> analyze table test compute statistics;

Table analyzed.

Elapsed: 00:00:02.05
SQL> SELECT a.object_id FROM test a
2 WHERE
3 get_sign('SYSTEM'||'TEST'||'') = a.signature;

OBJECT_ID
----------
35567

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=20)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST' (Cost=2 Card=1
Bytes=20)
2 1 INDEX (RANGE SCAN) OF 'TEST_UI' (NON-UNIQUE) (Cost=2
Card=1)

Statistics
----------------------------------------------------------
26 recursive calls
0 db block gets
8 consistent gets
0 physical reads
0 redo size
382 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> /

OBJECT_ID
----------
35567

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=20)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST' (Cost=2 Card=1
Bytes=20)
2 1 INDEX (RANGE SCAN) OF 'TEST_UI' (NON-UNIQUE) (Cost=2
Card=1)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
382 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

SQL> /

OBJECT_ID
----------
35567

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=20)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST' (Cost=2 Card=1
Bytes=20)
2 1 INDEX (RANGE SCAN) OF 'TEST_UI' (NON-UNIQUE) (Cost=2
Card=1)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
382 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed


However, if there is a chance that columns will be added or removed, using a column inside the dimension will require the entire dimension to be updated a row at a time. In low row-count dimensions this may be acceptable; in large dimensions with thousands of rows it may be better to use the table and function-based index solution.
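To illustrate the difference using the TEST table from above: if another column (here OBJECT_TYPE, which already exists in TEST) had to become part of the signature, the function-based index only needs to be dropped and recreated against the new column list, whereas the stored SIGNATURE column forces a row-by-row update of the whole table. This is just a sketch; the column list is whatever your dimension actually requires.

-- Function-based index approach: rebuild the index with the new column list
DROP INDEX test_fbi;
CREATE INDEX test_fbi ON
  test(get_sign(owner||object_name||subobject_name||object_type));
ANALYZE TABLE test COMPUTE STATISTICS;

-- Stored-column approach: every row must be updated
UPDATE test a
   SET a.signature = get_sign(a.owner||a.object_name||
                              a.subobject_name||a.object_type);
COMMIT;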

 