Call now: 252-767-6166  
Oracle Training Oracle Support Development Oracle Apps

 
 Home
 E-mail Us
 Oracle Articles
New Oracle Articles


 Oracle Training
 Oracle Tips

 Oracle Forum
 Class Catalog


 Remote DBA
 Oracle Tuning
 Emergency 911
 RAC Support
 Apps Support
 Analysis
 Design
 Implementation
 Oracle Support


 SQL Tuning
 Security

 Oracle UNIX
 Oracle Linux
 Monitoring
 Remote s
upport
 Remote plans
 Remote
services
 Application Server

 Applications
 Oracle Forms
 Oracle Portal
 App Upgrades
 SQL Server
 Oracle Concepts
 Software Support

 Remote S
upport  
 Development  

 Implementation


 Consulting Staff
 Consulting Prices
 Help Wanted!

 


 Oracle Posters
 Oracle Books

 Oracle Scripts
 Ion
 Excel-DB  

Don Burleson Blog 


 

 

 


 

 

 

 

 

Parsing with regular expressions regexp_like

Oracle Database Tips by Donald Burleson

Also see: Oracle Regular expressions indexes as a tuning tool

The regexp_instr function is a very powerful to extract patterns from inside a string.  Let's take a closer look at the prototype for the regexp_instr function:

regexp_instr (
   string,
   pattern,
   position,
   occurrence,
   return-option,
   parameters)

The last argument is the most important.  The "parameters" argument can be a combination of these formats:

i - Used to introduce case-insensitively

c - Keeps case (default value)

n - to make the dot (.) match new lines as well

m - to make ^ and $ match beginning and end of a line in a multi-line string

The "i" parameter is used to force case-insensitive parsing.  Just like the common UNIX "grep –i" command, we can use the lowercase "i" to extract both "Acre" and "acres".  The lowercase 'i" is also the last argument to the regexp_like function.

Borrowing an example from Jonathan Gennick's superb article "First Expressions", we see a parsing routine to extract "acreage" references from inside a test string, ignoring important factors such as case sensitivity and words stems (acre, acres, acreage):

COLUMN park_name format a30
COLUMN acres     format a13
 
SELECT
  park_name,
  REGEXP_SUBSTR(description,'[^ ]+[- ]acres?',1,1,'i') acres
  FROM michigan_park
  WHERE REGEXP_LIKE(description, '[^ ]+[- ]acres?','i');

Here is the output, where we see that the regular expression has parsed-out the acreage figures, just as-if they were a discrete data column with the table:

PARK_NAME                       ACRES
____________________________    ___________
Mackinac Island State Park      1800 acres
Muskallonge Lake State Park     217-acre
Porcupine Mountains State Park  60,000 acres
Tahquamenon Falls State Park    40,000+ acres

As we see, the regexp_like function is extremely powerful for extracting known pattenrs from inside larger strings.  Also, you can nest the regexp_instr function inside a substr function, using the regexp_instr fifth argument (the return-option) to compute the length of the result column "acres":

COLUMN park_name format a30
COLUMN acres     format a13
 
SELECT
   park_name,
   SUBSTR(
      description,
      REGEXP_INSTR(description, '[^ ]+ acres|[^ ]+-acre',1,1,0,'i'), -- start
      REGEXP_INSTR(description, '[^ ]+ acres|[^ ]+-acre',1,1,1,'i')
      -                                                              -- minus
      REGEXP_INSTR(description, '[^ ]+ acres|[^ ]+-acre',1,1,0,'i')  -- nbr
         ) acres
FROM michigan_park
WHERE
   REGEXP_LIKE(description, '[^ ]+ acres|[^ ]+-acre','i');

Here we see identical output, but we gain insight into how the regexp_instr function is used to get the start-position and string-length numbers for the regexp_substr function:

PARK_NAME                       ACRES
____________________________    ___________
Mackinac Island State Park      1800 acres
Muskallonge Lake State Park     217-acre
Porcupine Mountains State Park  60,000 acres
Tahquamenon Falls State Park    40,000+ acres

Jonathan Gennick notes:

The arguments to SUBSTR are string, start position, and length. The first call to REGEXP_INSTR searches for a substring matching the pattern 'xxx acres'. I did that by searching for any number of non-space characters followed by the word "acres", as in:

[^ ]+ acres

The "[^ ]+" part of the pattern allows for the non-space characters.

I used alternation to also include the case where a hyphen appeared between the number and the acreage:

[^ ]+-acre

To compute the number of characters to take using SUBSTR, I subtracted the start position of the acreage string from the ending. To get the ending position, I flipped the third argument to REGEXP_INSTR from a 0 to a 1. (that may actually give one past the end. look it up to be sure).

I see now that I did not anticipate the possibility of a "one acre" ("acre" as singular) national park. Ah, well. It'd be easy enough to accommodate that possibility.

Now that we understand the basic concept of data parsing, let's examine how we might use regular expressions to find transposition errors within an Oracle database.

See also: Oracle Regular expressions indexes as a tuning tool


For expert Oracle data cleansing support and data scrubbing consulting, use an expert from BC.  We understand the powerful Oracle data unification tools, and we can aid in improving the data quality of any Oracle database, large or small.

 

 

��  
 
 
Oracle Training at Sea
 
 
 
 
oracle dba poster
 

 
Follow us on Twitter 
 
Oracle performance tuning software 
 
Oracle Linux poster
 
 
 

 

Burleson is the American Team

Note: This Oracle documentation was created as a support and Oracle training reference for use by our DBA performance tuning consulting professionals.  Feel free to ask questions on our Oracle forum.

Verify experience! Anyone considering using the services of an Oracle support expert should independently investigate their credentials and experience, and not rely on advertisements and self-proclaimed expertise. All legitimate Oracle experts publish their Oracle qualifications.

Errata?  Oracle technology is changing and we strive to update our BC Oracle support information.  If you find an error or have a suggestion for improving our content, we would appreciate your feedback.  Just  e-mail:  

and include the URL for the page.


                    









Burleson Consulting

The Oracle of Database Support

Oracle Performance Tuning

Remote DBA Services


 

Copyright © 1996 -  2020

All rights reserved by Burleson

Oracle ® is the registered trademark of Oracle Corporation.