-
Grouped Table Outer Join
-
Increased Number of Aggregates per Query
-
Remote Stored Functions in SELECT Statements
-
Case-Insensitive and Accent-Insensitive Query and
Sort
-
Enhanced CONNECT BY Support
-
Oracle Expression Filter
-
SQL Regular Expressions
-
Row Timestamp
We will start by examining the grouped table outer join
enhancement.
Grouped Table Outer Join
This new SELECT feature, known as the "Grouped Table
Outer Join", is an extension to the standard ANSI join syntax. The new
syntax is used to improve performance and simplify SQL queries for Oracle10g
time-based calculations. The WITHIN GROUP clause must reference a column or
columns used in a GROUP BY statement in the source SELECT statement.
The grouped table outer join clause syntax is:
<Table_reference> WITHIN GROUP (<expr>[,])
You use a grouped table outer join to fill in sparse
data; sparse data is data that doesn't have rows for all possible values of
a dimension, such as time or department. This is known as the process of
data densification.
The example below shows the use of a grouped table outer
join in a subquery_factoring_clause.
A subquery_factoring_clause
allows you to assign a name to a subquery block. The subquery block can then
be referenced from multiple locations in your query simply by specifying the
name. This type of query is optimized by Oracle in the same way as an
in-line view. The subquery_factoring_clause can be used in top-level SELECT
statements and subqueries.
The subquery factoring clause comes before the normal
SELECT in the command syntax.
WITH X AS (
SELECT time_id, loader_id, sum(loaded_weight)
total_weight
FROM shipments
WHERE loader_id = 165432
GROUP BY time_id, loader_id)
SELECT X.loader_id, t.time_id,
SUM(total_weight) OVER
(partition by loader_id, t.calendar_year
ORDER BY T.time_id) as YTD_tonnes
FROM X WITHIN GROUP (loader_id)
RIGHT OUTER JOIN TIMES T
ON (x.time_id = t.time_id);
Grouped tables can be used singly, or as a target of a
JOIN. The grouped table clause is usually a sub-section of the joined_table
clause as a sub-section of the table_reference clause.
The next new feature is the increased number of
aggregates per query.
Increased Number of Aggregates per Query
In previous releases of Oracle, the number of aggregate
functions that could be used in a SELECT statement was limited to the
constraints imposed by the limit on GROUP BY clauses, in other words, the
GROUP BY expression and all of the non-distinct aggregate functions (for
example, SUM, AVG) had to fit within a single database block.
There is no longer a limitation on the number or size of aggregations
in a single SQL statement. This was also tied to the 64K size limitation on
SQL statements, which has also been removed.
Next, let's talk about the use of remote stored functions
in select statements.
Remote Stored Functions in SELECT Statements
You can now refer to remote functions stored in a
non-Oracle database within the SELECT statement. The hs_call_nameparameter refers to this feature. Detailed information about
this parameter can be found on page A-3 in the manual: Oracle10i
Heterogeneous Connectivity Administrator's Guide.
The hs_call_name
parameter is specified in the heterogeneous services initialization
parameter file. This file contains configuration settings stored as a text
file. The heterogeneous services initialization parameters serve two
functions:
n
They give the DBA a
means of fine-tuning to optimize performance and memory utilization for the
gateway and the Heterogeneous Services component.
n
They enable the DBA to
tell the gateway services (and thereby, heterogeneous services) how the
non-Oracle system has been configured.
These parameters give heterogeneous services information
about the configurable properties of the non-Oracle database system.
You use the view
v$hs_parameterto monitor the
heterogeneous services initialization parameters. Users can set
initialization parameters in the gateway initialization files. There is a
heterogeneous services initialization file for each gateway instance. The
name of the file is initsid.ora,
where sid is the Oracle system identifier used for the gateway.
For generic connectivity, the heterogeneous services
initialization file is located in the directory $ORACLE_HOME/hs/admin.
In case of specific non-oracle database system
transparent gateways, they are located in directories named with the naming
convention:
$ORACLE_HOME/product_name/admin
where the product_name variable is the name of the
database system. So, for example, the Sybase gateway initialization file is
located in the directory $ORACLE_HOME/tg4sybs/admin.
The heterogeneous services initialization files contain a
list of initialization parameter settings, each of which should be on an
individual line. The syntax to set the initialization parameters is:
[set] [private] parameter = parameter_value
The setand
privatekeywords are optional. When
the set keyword is present, the variable will also be set in the
environment. When the private keyword is present, the parameter will
not be uploaded to the server. In general, it is recommended by Oracle that
the private keyword not be used. The only exception is if the
initialization parameter value contains sensitive information (such as a
password) that shouldn't be sent over the network.
In the initialization parameter syntax, the following
rules apply:
n
All keywords (SET,
PRIVATE, and IFILE) are case insensitive.
n
Initialization parameter
names and values are case sensitive. Most initialization parameter names are
uppercase.
n
String values for
heterogeneous services parameters must be lowercase.
Note: Exceptions to these rules are explicitly noted.
For this section, the only parameter we are interested in
is the hs_call_nameparameter.
hs_call_name
The hs_call_name
parameter specifies the list of remote functions that a developer can
reference in SQL statements to a specific database, through a specific
gateway. The value for hs_call_name contains a list of remote
functions and their owners, separated by semicolons, in the following
format:
owner_name.function_name
For example:
owner1.F1;owner2.F2;owner3.F3
If no owner name is specified for a remote function, then
the default owner name is used to connect to the remote database (the
default owner name is specified when the HS database link is created). The
entries for the owner names and the function names are case-sensitive.
Case-Insensitive and Accent-Insensitive Query and
Sort
You use the nls_sort
session parameter to specify a case-insensitive or accent-insensitive sort,
this is accomplished by:
n
Appending a " _CI" to an
Oracle sort name for a case-insensitive sort.
n
Appending an "_AI" to an
Oracle sort name for an accent-insensitive and case-insensitive sort.
For example, you can set nls_sort to the following
types of values:
n
SPANISH becomes SPANISH_AI
n
DUTCH becomes DUTCH_CI
Binary sorts can also be case-insensitive or
accent-insensitive. When you specify
binary_ci as a value for nls_sort,
it designates a sort that is accent-sensitive and case-insensitive.
binary_ai designates an
accent-insensitive and case-insensitive binary sort. You may want to use a
binary sort if the binary sort order of the character set is appropriate for
the character set you are using.
For example, with the
nls_lang
environment variable set to
american_america.we8iso8859p1, to use an example right out of the Oracle
Database 10g manual, you can
create a table called test1 and populate it as follows:
SQL> CREATE TABLE test1 (letter VARCHAR2(10));
SQL> INSERT INTO test1 VALUES('ä');
SQL> INSERT INTO test1 VALUES('a');
SQL> INSERT INTO test1 VALUES('A');
SQL> INSERT INTO test1 VALUES('Z');
SQL> SELECT * FROM test1;
LETTER
------
ä
a
A
Z
Since the default value of nls_sort is BINARY you
don't need to specify anything extra to use a binary sort. Use the following
statement to do a binary sort of the characters in table test1:
SELECT * FROM test1 ORDER BY letter;
To change the value of nls_sort, you would enter a
statement like (only using your sort specifier):
ALTER SESSION SET NLS_SORT=BINARY_AI;
When nls_sort=binary,
uppercase letters come before lowercase letters. Letters with diacritics
appear last. If the sort considers diacritics but ignores case (binary_ci),
the letters with diacritics will appear last. When both case and diacritics
are ignored (binary_ai), the "ä"
is sorted with the other characters whose base letter is "a". All characters
whose base letter is "a" occur before any occurrence of the letter "z". You
should use binary sorts for better performance when the character set is
US7ASCII, or any other character set that has the same sort order as the
binary sorts.
The following table shows the
results from setting nls_sort to
binary, binary_ci, and binary_ai.
BINARY
|
BINARY_CI
|
BINARY_AI
|
A
|
a
|
ä
|
Z
|
A
|
a
|
a
|
Z
|
A
|
ä
|
ä
|
Z
|
The next table shows the sort orders that result from
German sorts for the table. A German sort places lowercase letters before
uppercase letters, and a "ä" occurs before a "Z". When the sort ignores both
case and diacritics (using nls_sort=german_ai),
the "ä" appears with the other characters whose base letter is an "a".
GERMAN
|
GERMAN_CI
|
GERMAN_AI
|
a
|
a
|
ä
|
A
|
A
|
a
|
Ä
|
ä
|
A
|
Z
|
Z
|
Z
|
The next feature we will discuss is the enhanced CONNECT
BY support.
Enhanced CONNECT BY Support
Additions to the CONNECT BY clause
enhance queries of hierarchical data by returning all ancestor-dependent
pairs (not just parent-child pairs); a new set of pseudo-columns is provided
in Oracle Database 10g:
§
connect_by_iscycle
§
connect_by_isleaf
§
connect_by_root
These specify whether or not a given node is a leaf of a
hierarchy, a cycle in a hierarchy triggers an error message with information
about the rows involved in the cycle, and the CONNECT BY clause is now
supported for simple subqueries.
Hierarchical Query Pseudo-columns
The new hierarchical query pseudo-columns are valid only
in hierarchical queries. The specific hierarchical query pseudo-columns are:
§
connect_by_iscycle
§
connect_by_isleaf
connect_by_iscycle
The purpose of the
connect_by_iscyclepseudo-column is
to return 1 if the current row has a child which is also its ancestor. If
there is no relation, it returns a 0 value. This information can then be
used to further expand the hierarchy.
connect_by_iscyclecan only be used if you have specified the
nocycle parameter of the CONNECT
BY clause. The NOCYCLE clause enables Oracle to return the results of a
query that would fail because of a CONNECT BY loop.
connect_by_iscycle Example
In the standard Oracle example table, scott.emp, provided
for this purpose in all Oracle databases, the employee King is the ultimate
boss and has no manager (no entry for the mgr column). One of his employees
is Clark, who is the manager of department 10. If we update the employees
table to set Clark as King's manager, we
can create a loop in the data to show the use of the CONNECT_BY_ISCYCLE
pseudo-column:
Without the update the SELECT:
SQL> SELECT ename "Employee",
2
LEVEL, SYS_CONNECT_BY_PATH(ename, '/') "Path"
3
FROM scott.emp
4
WHERE level <= 3 AND deptno = 10
5
START WITH ename = 'KING'
6
CONNECT BY PRIOR empno = mgr AND LEVEL <= 4;
Generates:
Employee
LEVEL Path
-------------- --------- ------------------
KING
1 /KING
CLARK
2 /KING/CLARK
MILLER
3 /KING/CLARK/MILLER
Now, let's give Clark a
really big promotion (don't we all dream about becoming our boss's boss?).
SQL> UPDATE scott.emp SET mgr = 7782
2
WHERE empno = 7939;
1 row updated.
Now let's check out the hierarchy again:
SQL> SELECT ename "Employee",
2
LEVEL, SYS_CONNECT_BY_PATH(ename, '/') "Path"
3
FROM scott.emp
4
WHERE level <= 3 AND deptno = 10
5
START WITH ename = 'KING'
6
CONNECT BY PRIOR empno = mgr AND LEVEL <= 4;
ERROR:
ORA-01436: CONNECT BY loop in user data
no rows selected
By adding the
nocycle parameter in the CONNECT BY condition, we can cause Oracle to
return the rows despite the loop. The
connect_by_iscyclepseudo-column will
show you which rows contain the cycle:
SQL> SELECT ename "Employee", CONNECT_BY_ISCYCLE "Cycle",
2
LEVEL, SYS_CONNECT_BY_PATH(ename, '/') "Path"
3
FROM scott.emp
4
WHERE level <= 3 AND deptno = 10
5
START WITH ename = 'KING'
6
CONNECT BY NOCYCLE PRIOR empno = mgr AND LEVEL <= 4;
Employee
CYCLE
LEVEL Path
-------------- -----
--------- ------------------
KING
0
1 /KING
CLARK
1
2 /KING/CLARK
MILLER
0
3 /KING/CLARK/MILLER
connect_by_isleaf
The pseudo-column,
connect_by_isleaf, returns 1
if the current row is a leaf in the tree defined by the CONNECT BY
condition. If the current row is not a leaf in the current tree, it returns
0. This information indicates whether a given row can be further expanded to
show more of the hierarchy.
connect_by_isleaf
Example
The following example shows the first three levels of the
scott.emp table, indicating for each row whether it is a leaf row ( which is
indicated by 1 in the Leaf column) or whether it has child rows (which is
indicated by 0 in the Leaf column):
SQL> SELECT ename "Employee", CONNECT_BY_ISLEAF "Leaf",
LEVEL,
2
SYS_CONNECT_BY_PATH(ename,'/') "Path" FROM scott.emp
3
WHERE level <= 3 AND deptno = 10 START WITH ename ='KING'
4*
CONNECT BY nocycle PRIOR
empno = mgr AND LEVEL <= 4
Employee
Leaf
LEVEL
Path
KING
0
1
/KING
CLARK
1
2
/KING/CLARK
MILLER
1
3
/KING/CLARK/MILLER
There is one more new clause,
connect_by_root. Let's discuss
this additional clause before we go on to the next new feature.
connect_by_root
The unary operator
connect_by_rootis only valid in
hierarchical queries. When a column is qualified with this operator, Oracle
returns the column value using data from the root row. This operator is
intended to extend the functionality of the CONNECT BY [PRIOR] condition of
hierarchical queries.
There is one restriction on the
connect_by_rootclause - this
operator cannot be specified in the START WITH condition or the CONNECT BY
condition.
SQL> SELECT ename "Employee", CONNECT_BY_ROOT empno
"Root", LEVEL,
2
SYS_CONNECT_BY_PATH(ename,'/')
"Path" FROM scott.emp
3
WHERE level <= 3 AND deptno = 10 START WITH ename='KING'
4* CONNECT
BY NOCYCLE PRIOR empno = mgr AND LEVEL <= 4
Employee
Root
LEVEL Path
-------------- -----
--------- ------------------
KING
7839
1 /KING
CLARK
7839
2 /KING/CLARK
MILLER
7839
3 /KING/CLARK/MILLER