Call now: 252-767-6166  
Oracle Training Oracle Support Development Oracle Apps

Free Oracle Tips

HTML Text

 Home
 E-mail Us
 Oracle Articles



 Oracle Training
 Oracle News

 Oracle Forum
 Class Catalog


 Our Staff
 Our Prices
 Help Wanted!

 Remote DBA
 Oracle Tuning
 Emergency 911
 RAC Support
 Apps Support
 Analysis
 Design
 Implementation
 Oracle Support


 SQL Tuning
 Security

 UNIX
 Oracle UNIX
 Linux
 Oracle Linux
 Monitoring
 Remote help

 Remote plans
 Remote
services
 Oracle C++
 Oracle Java
 Apache
 JDeveloper
 App Server

 Applications
 Oracle Forms
 Oracle Portal
 11i Upgrades
 SQL Server
 Oracle Concepts
 HTML-DB Tips
 Software Help

 Remote Help  
 Development  

 Implementation


 Financials Training
 Oracle 11i
 Oracle Apps 11i
 Oracle Workflow
 Oracle AR 11i Class
 Oracle AP 11i class
 Oracle GL 11i class
 Oracle HR 11i class
 Oracle FA 11i class
 11i Project Mgt
 11i procurement
 11i collections


 Oracle Posters
 Oracle Books

 Oracle Tuning Book
 Oracle RAC Book
 Oracle Security
 Easy Oracle Books
 Oracle Scripts
 SQL Server DBA
 SQL Design Patterns
 WISE
 Excel-DB   


 BC Oracle News


 Rednecks!
 Dress code
 Arabian Stallion

 Burleson Arabians
 Guide Horses
 Don Burleson Blog
 Golf & Travel


 Privacy Policy
 

 

 
 

Using utl_match to compare string similarity

Oracle Tips by Burleson Consulting
March 30, 2008

 

Hugo Cardoso asks: Given a column name (word or small text) I want to choose from a set of column names the most seemed (if it is not equal). I'm thinking to use 'soundex' function, but I do not know if I can use it (and how use it) as a measured of proximity (choose the nearest) in the case of the function return it is not exactly the same.  I made some tests and it seems to me that the 'soundex' function it is not good for me (for example it is insensitive to the numbers).  I think that the solution for my problem is in utl_match built-in package. Is there any examples of using utl_match to compare strings?

Answer:  Soundex is commonly used in genealogy research where you must locate “similar sounding” names in databases, where the original data was entered hundreds of years ago, often by someone who used phonetics to “sound-out” the way that a spoken name was heard.

The Oracle soundex function accepts a string and returns a number corresponding to the “sound” of the spoken string, an invaluable tool for locating typographical errors in a name.

Oracle Corporation borrowed the soundex algorithm from the book The Art of Computer Programming, Volume 3, by Donald Knuth, to generate the phonetic representation of character strings.

Here are some of the values SOUNDEX generated and how they vary according to the input string:

SOUNDEX ('feuerstein') ==> 'F623'
SOUNDEX ('feuerst') ==> 'F623'

From the Oracle documentation, we see these limits of the soundex function.

The soundex value always begins with the first letter in the input string. soundex only uses the first five consonants in the string to generate the return value. Only consonants are used to compute the numeric portion of the soundex value. Except for a possible leading vowel, all vowels are ignored. soundex is not case-sensitive. Upper- and lowercase letters return the same soundex value.

Using utl_match to compare strings

According to Oracle the utl_match utility is new in Oracle 10g release 2, and there is no documentation.  The undocumented utl_match package is used to compare the similarity of a target string and a examined string, embedded inside SQL. The utl_match procedure has four string compare functions:

edit_distance – This utl_edit function counts the number of character changes that are required to morph the target string into the examined string.

edit_distance_similarity – This utl_edit function returns the percentage of similarity between the target string and the examined string.

jaro_winkler – This utl_edit function is used to detect data entry errors by measuring the degree that the strings match.

jaro_winkler_similarity – This utl_edit function is similar to edit_distance_similiarity, computing the percentage of a string match.
 

This OTN thread shows some great examples of utl_match:

SCOTT@orcl_11g> -- ordering by utl_match.edit_distance:
SCOTT@orcl_11g> select z.*,  UTL_MATCH.EDIT_DISTANCE ('hilton hotel', LOWER (operator_name)) edit_dist
  2  from   operator z
  3  where contains (concat,'hilton hotel within {the_operator_name}',1 ) > 0
  4  ORDER BY edit_dist
  5  /

OPERATOR_ID OPERATOR_NAME                   EDIT_DIST
----------- ------------------------------ ----------
    1000084 Hilton Hotel                            0
    1005660 Hilton Hotel                            0
    1001397 HILTON HOTEL AIRPORT                    8
    1004356 Hilton Hotel/Avendra Eugene            15
    1001472 Hilton Hotel & Conference Ctr.         18

SCOTT@orcl_11g> -- ordering by utl_match.edit_distance_similarity:
SCOTT@orcl_11g> select z.*,  UTL_MATCH.EDIT_DISTANCE_SIMILARITY ('hilton hotel', LOWER (operator_name)) edit_dist_sim
  2  from   operator z
  3  where contains (concat,'hilton hotel within {the_operator_name}',1 ) > 0
  4  ORDER BY edit_dist_sim DESC
  5  /

OPERATOR_ID OPERATOR_NAME                  EDIT_DIST_SIM
----------- ------------------------------ -------------
    1000084 Hilton Hotel                             100
    1005660 Hilton Hotel                             100
    1001397 HILTON HOTEL AIRPORT                      60
    1004356 Hilton Hotel/Avendra Eugene               45
    1001472 Hilton Hotel & Conference Ctr.            40

SCOTT@orcl_11g> -- ordering without selecting the column:
SCOTT@orcl_11g> select * from operator z
  2  where contains (concat,'hilton hotel within {the_operator_name}',1 ) > 0
  3  ORDER BY UTL_MATCH.EDIT_DISTANCE ('hilton hotel', LOWER (operator_name))
  4  /

OPERATOR_ID OPERATOR_NAME
----------- ------------------------------
    1000084 Hilton Hotel
    1005660 Hilton Hotel
    1001397 HILTON HOTEL AIRPORT
    1004356 Hilton Hotel/Avendra Eugene
    1001472 Hilton Hotel & Conference Ctr.

SCOTT@orcl_11g> select * from operator z
  2  where contains (concat,'hilton hotel within {the_operator_name}',1 ) > 0
  3  ORDER BY UTL_MATCH.EDIT_DISTANCE_SIMILARITY ('hilton hotel', LOWER (operator_name)) DESC
  4  /

OPERATOR_ID OPERATOR_NAME
----------- ------------------------------
    1000084 Hilton Hotel
    1005660 Hilton Hotel
    1001397 HILTON HOTEL AIRPORT
    1004356 Hilton Hotel/Avendra Eugene
    1001472 Hilton Hotel & Conference Ctr.

 

If you like Oracle tuning, you might enjoy my book "Oracle Tuning: The Definitive Reference", with 950 pages of tuning tips and scripts. 

You can buy it direct from the publisher for 30%-off and get instant access to the code depot of Oracle tuning scripts.


    Need an Oracle Health Check?
  • Do you have bad performance after an upgrade?
     
  • Need to certify that your database follows best practices?

BC Oracle performance gurus can quickly certify every aspect of your Oracle database and provide a complete verification that your database is fully optimized.

 

 

 

 
 
 

Oracle performance tuning book

 

 

Oracle performance tuning software

 
Oracle performance tuning software
 
SearchOracle web site
 
Oracle performance Tuning 10g reference poster
 
Oracle performance tuning webcast
 
Oracle training in Linux commands
 
Oracle training Excel
 
Oracle training & performance tuning books
 

 

Note: This Oracle documentation was created as a support and Oracle training reference for use by our DBA performance tuning consulting professionals.  Feel free to ask questions on our Oracle forum.

Verify experience! Anyone considering using the services of an Oracle support expert should independently investigate their credentials and experience, and not rely on advertisements and self-proclaimed expertise. All legitimate Oracle experts publish their Oracle qualifications.

Errata?  Oracle technology is changing and we strive to update our BC Oracle support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:  and include the URL for the page.
 
 


Burleson Consulting

The Oracle of database support


 

Copyright © 1996 -  2007 by Burleson Enterprises, Inc. All rights reserved.

Oracle® is the registered trademark of Oracle Corporation.


Hit Counter