 |
|
Parallel Enabled Table
Functions
Oracle Tips by Burleson Consulting
|
The following Tip is from the
outstanding book "Oracle
PL/SQL Tuning: Expert Secrets for High Performance Programming" by
Dr. Tim Hall, Oracle ACE of the year, 2006:
Parallel enabled table functions can improve
performance by sharing their workload between slave processes.
To parallel enable a function, it must internally define a method for
partitioning the workload, and the following conditions must be met:
-
The function must include a PARALLEL_ENABLE
clause.
-
A single REF CURSOR must be specified with a
PARTITION BY clause. Only strongly typed REF CURSORs can be
specified in a PARTITION BY clause that specifies a partition
column, but weakly typed REF CURSORs can be used with the PARTITION
BY ANY clause.
The basic syntax for a parallel enabled table
function is displayed below.
CREATE
FUNCTION function-name(parameter-name ref-cursor-type)
RETURN rec_tab_type PIPELINED
PARALLEL_ENABLE(PARTITION parameter-name BY [{HASH | RANGE} (column-list)
| ANY ]) IS
BEGIN
...
END;
/
This syntax shows that the workload can be
range or hash partitioned when an explicit column is specified. The
ANY keyword indicates that the workload is randomly partitioned
between the slave processes.
The test_parallel_setup.sql script defines
several schema objects to test parallel execution.
test_parallel_setup.sql
-- Create
and poplate a test table.
CREATE TABLE
parallel_test (
id
NUMBER(10),
description VARCHAR2(50)
);
TRUNCATE
TABLE parallel_test;
BEGIN
FOR i IN 1 .. 100000 LOOP
INSERT INTO parallel_test (id, description)
VALUES (i, 'Description or ' || i);
END LOOP;
COMMIT;
END;
/
-- Define a
strongly typed REF CURSOR type.
CREATE OR
REPLACE PACKAGE parallel_ptf_api AS
TYPE
t_parallel_test_row IS RECORD (
id
NUMBER(10),
description VARCHAR2(50),
sid
NUMBER
);
TYPE
t_parallel_test_tab IS TABLE OF t_parallel_test_row;
TYPE
t_parallel_test_ref_cursor IS REF CURSOR RETURN parallel_test%ROWTYPE;
FUNCTION test_ptf (p_cursor IN t_parallel_test_ref_cursor)
RETURN t_parallel_test_tab PIPELINED
PARALLEL_ENABLE(PARTITION p_cursor BY HASH (id));
END
parallel_ptf_api;
/
SHOW ERRORS
CREATE OR
REPLACE PACKAGE BODY parallel_ptf_api AS
FUNCTION test_ptf (p_cursor IN t_parallel_test_ref_cursor)
RETURN t_parallel_test_tab PIPELINED
PARALLEL_ENABLE(PARTITION p_cursor BY HASH (id))
IS
l_row t_parallel_test_row;
BEGIN
LOOP
FETCH p_cursor
INTO l_row.id,
l_row.description;
EXIT WHEN p_cursor%NOTFOUND;
SELECT sid
INTO l_row.sid
FROM v$mystat
WHERE rownum = 1;
PIPE ROW (l_row);
END LOOP;
RETURN;
END test_ptf;
END
parallel_ptf_api;
/
SHOW ERRORS
First the test_parallel_setup.sql script
creates and populates a test table, then it creates a package
containing the necessary types and a parallel enabled pipelined table
function. The SID from the V$MYSTAT view is used to identify the
session rather than the SESSIONID value from the SYS_CONTEXT function.
This is because the latter always returns the value of the initiating
session, rather than the slave session.
With the schema objects in place, the
test_parallel_table_function.sql can be used to compare the
performance of the table function in parallel and serial modes.
test_parallel_table_function.sql
SET TIMING
ON
PROMPT
PROMPT Serial Execution
PROMPT ================
SELECT sid, count(*)
FROM TABLE(parallel_ptf_api.test_ptf(CURSOR(SELECT *
FROM parallel_test t1
)
)
) t2
GROUP BY sid;
PROMPT
PROMPT Parallel Execution
PROMPT ==================
SELECT sid, count(*)
FROM TABLE(parallel_ptf_api.test_ptf(CURSOR(SELECT /*+
parallel(t1, 5) */ *
FROM parallel_test t1
)
)
) t2
GROUP BY sid;
The output from the
test_parallel_table_function.sql script is displayed below and clearly
demonstrates that the workload is shared among several slave processes
when the driving query is parallelized.
SQL> @test_parallel_table_function.sql
Serial
Execution
================
SID COUNT(*)
---------- ----------
155 100000
1 row
selected.
Elapsed:
00:00:46.25
Parallel
Execution
==================
SID COUNT(*)
---------- ----------
147 33213
154 33241
162 33546
3 rows
selected.
Elapsed:
00:00:43.85
The performance improvement shown in this test
is rather small. The assessment of the benefits of parallel
execution has to be made on a case-by-case basis. The
performance improvements come at the expense of extra resource
requirements and coordination of the slaves, so it is not a valid
option in all circumstances.
The ORDER BY and CLUSTER BY clauses can be
used to alter the order of the data entering each instance of the
parallel enabled table function, such that:
-
When neither the ORDER BY or CLUSTER BY
clause is specified, the data entry is random.
-
The ORDER BY clause orders the data by the
specified column.
-
The CLUSTER BY clause groups data with the
same values together, but does not order the data between values.
The parallel_order.sql script contains
examples of all three methods.
parallel_order.sql
CREATE OR
REPLACE PACKAGE parallel_order_api AS
TYPE
t_objects_row IS RECORD (
owner VARCHAR2(30),
object_name VARCHAR2(30)
);
TYPE
t_objects_tab IS TABLE OF t_objects_row;
TYPE
t_objects_ref_cursor IS REF CURSOR RETURN all_objects%ROWTYPE;
FUNCTION pipe_value(p_cursor IN t_objects_ref_cursor)
RETURN t_objects_tab PIPELINED
PARALLEL_ENABLE(PARTITION p_cursor BY RANGE (owner));
FUNCTION pipe_value_order(p_cursor IN t_objects_ref_cursor)
RETURN t_objects_tab PIPELINED
ORDER p_cursor BY (owner)
PARALLEL_ENABLE(PARTITION p_cursor BY RANGE (owner));
FUNCTION pipe_value_cluster(p_cursor IN
t_objects_ref_cursor)
RETURN t_objects_tab PIPELINED
CLUSTER p_cursor BY (owner)
PARALLEL_ENABLE(PARTITION p_cursor BY RANGE (owner));
END
parallel_order_api;
/
SHOW ERRORS
CREATE OR
REPLACE PACKAGE BODY parallel_order_api AS
FUNCTION pipe_value(p_cursor IN t_objects_ref_cursor)
RETURN t_objects_tab PIPELINED
PARALLEL_ENABLE(PARTITION p_cursor BY RANGE (owner))
IS
l_row all_objects%ROWTYPE;
l_out_row t_objects_row;
BEGIN
LOOP
FETCH p_cursor
INTO l_row;
EXIT WHEN p_cursor%NOTFOUND;
l_out_row.owner := l_row.owner;
l_out_row.object_name := l_row.object_name;
PIPE ROW (l_out_row);
END LOOP;
RETURN;
END pipe_value;
FUNCTION pipe_value_order(p_cursor IN t_objects_ref_cursor)
RETURN t_objects_tab PIPELINED
ORDER p_cursor BY (owner)
PARALLEL_ENABLE(PARTITION p_cursor BY RANGE (owner))
IS
l_row all_objects%ROWTYPE;
l_out_row t_objects_row;
BEGIN
LOOP
FETCH p_cursor
INTO l_row;
EXIT WHEN p_cursor%NOTFOUND;
l_out_row.owner := l_row.owner;
l_out_row.object_name := l_row.object_name;
PIPE ROW (l_out_row);
END LOOP;
RETURN;
END pipe_value_order;
FUNCTION
pipe_value_cluster(p_cursor IN t_objects_ref_cursor)
RETURN t_objects_tab PIPELINED
CLUSTER p_cursor BY (owner)
PARALLEL_ENABLE(PARTITION p_cursor BY RANGE (owner))
IS
l_row all_objects%ROWTYPE;
l_out_row t_objects_row;
BEGIN
LOOP
FETCH p_cursor
INTO l_row;
EXIT WHEN p_cursor%NOTFOUND;
l_out_row.owner := l_row.owner;
l_out_row.object_name := l_row.object_name;
PIPE ROW (l_out_row);
END LOOP;
RETURN;
END pipe_value_cluster;
END
parallel_order_api;
/
SHOW ERRORS
/*
SELECT *
FROM TABLE(parallel_order_api.pipe_value(
CURSOR(SELECT /*+
parallel(all_objects, 5) */ * FROM all_objects))
);
SELECT *
FROM TABLE(parallel_order_api.pipe_value_order(
CURSOR(SELECT /*+
parallel(all_objects, 5) */ * FROM all_objects))
);
SELECT *
FROM TABLE(parallel_order_api.pipe_value_cluster(
CURSOR(SELECT /*+
parallel(all_objects, 5) */ * FROM all_objects))
);
*/
-- Cleanup
objects.
DROP PACKAGE
parallel_order_api;
The output from this script is too large to
display here. However, when comparing the results of the queries, the
difference in order should be apparent.
The next section uses the information covered
so far to build a transformation pipeline.
|