Improving PL/SQL performance with collections
February 5, 2004
Yesterday I
looked at some code written in Oracle 7-style PL/SQL, and compared
it with what is considered best practice with Oracle 8i and 9i. The
more up-to-date code generally runs around 3-5 times faster than the
old style code, and, as part of understanding where the changes to
PL/SQL have occurred, I've started off by looking at how Oracle now
handles a concept known as 'Collections'.
The online Oracle 9i PL/SQL User Guide
introduces collections as:
"A collection is an ordered group of elements, all of the same
type. It is a general concept that encompasses lists, arrays, and
other familiar datatypes. Each element has a unique subscript that
determines its position in the collection."
So what are these used for? Well, when you first start using
PL/SQL, you generally end up declaring variables, then retrieving a
value from a table to go into the variable. If you're going to
retrieve a range of values from a table, you declare a cursor,
retrieve the values one at a time from the database, and process them
sequentially. Sometimes though, you want to retrieve a load of values
all in one go, and load them into an array, so that you can carry out
some sort of operation on the group of values as a whole.
With Oracle 7, you could create what was called an index-by table,
or 'PL/SQL Table', that consisted of a series of value pairs: an index
value, and a scalar datatype (such as VARCHAR2 or NUMBER). You
referred to an individual PL/SQL Table entry by using the index, e.g.
CUSTOMER_NAME(10). What made
it interesting was that, as well as using scalar datatypes, you could
also create PL/SQL tables using PL/SQL records, which could consist of
a number of individual columns. By creating a PL/SQL record type based
on an existing table (for example, by using SCOTT.EMP%ROWTYPE), you could
load a table row, or an entire table, into a variable and process it
within your PL/SQL package.
For example:

DECLARE
   TYPE EmpTabTyp IS TABLE OF emp%ROWTYPE
      INDEX BY BINARY_INTEGER;
   emp_tab EmpTabTyp;
BEGIN
   /* Retrieve employee record. */
   SELECT * INTO emp_tab(7468) FROM emp WHERE empno = 7468;
   ...
END;
would set up
a PL/SQL table type that consists of PL/SQL records with the same
definition as the columns in SCOTT.EMP, and then create a variable
based on this type. It would then retrieve a single row from the EMP
table, then place it into the EMP_TAB variable, using the index value
of 7468. Note that you don't need to initialise the variable before
it's used, and you can arbitrarily put values into the variable using
any valid BINARY_INTEGER value.
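To illustrate that last point, here's a minimal sketch (the type and variable names are just made up for this example) showing values being dropped into an index-by table at arbitrary, non-contiguous index positions:

DECLARE
   TYPE NameTabTyp IS TABLE OF VARCHAR2(30)
      INDEX BY BINARY_INTEGER;
   customer_name NameTabTyp;
BEGIN
   -- no constructor needed; just assign to whatever index values you like
   customer_name(10)  := 'SMITH';
   customer_name(250) := 'JONES';
   customer_name(-3)  := 'ALLEN';  -- negative BINARY_INTEGER values are fine too
   DBMS_OUTPUT.PUT_LINE('Entries stored: ' || customer_name.COUNT);
END;

The table ends up holding just three entries, scattered sparsely across the index range.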
As of Oracle 8, PL/SQL Tables were renamed 'Collections' and supplemented by two
new composite types: Nested Tables, and VARRAYs.
Nested tables extend the functionality of index-by tables by adding extra collection
methods (known as table attributes for index-by tables), and, in a new
development, nested tables can also be stored in database tables and
can be directly manipulated using SQL. Collectively, both types are
known as PL/SQL Tables.
To declare a nested table, you use the following syntax (note the lack of
'INDEX BY BINARY_INTEGER'):

TYPE type_name IS TABLE OF element_type [NOT NULL];
There are some key differences between traditional index-by tables and nested
tables, and some good reasons why you'd want to start using nested
tables, which are detailed in the
online documentation. In practical terms, one major difference
between nested tables and index-by tables is that you have to
initialise nested tables using a constructor (like you get in Java),
which sets up the elements the collection initially contains (although
you can later EXTEND the nested table); however, as mentioned above,
you can store nested tables within the database (embedded in database
columns), which is a distinct advantage over index-by tables. So, if
you want to put together a database that is object-oriented, and you
need the data to be persistent, nested tables are the way to go.
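As a rough sketch of what that looks like in practice (again, the type and variable names are invented for the example), a nested table is initialised with a constructor and then grown with EXTEND:

DECLARE
   TYPE NumListTyp IS TABLE OF NUMBER;              -- nested table: no INDEX BY clause
   num_list NumListTyp := NumListTyp(10, 20, 30);   -- constructor supplies the initial elements
BEGIN
   num_list.EXTEND;                                  -- make room for a fourth element
   num_list(4) := 40;
   DBMS_OUTPUT.PUT_LINE('Elements: ' || num_list.COUNT);
END;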
The other new type of collection, as of Oracle 8, is the VARRAY. A varray (variable-length
array) is a datatype that would be familiar to Java or C
programmers. A varray has a maximum size, which you must specify in
its type definition. Its index has a fixed lower bound of 1 and an
extensible upper bound. Thus, a varray can contain a varying number of
elements, from zero (when empty) to the maximum specified in its type
definition.
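A quick sketch of a varray in use (element type and sizes picked arbitrarily for the example) shows the maximum size being fixed in the type definition, with the collection growing towards it:

DECLARE
   TYPE GradeListTyp IS VARRAY(5) OF NUMBER;   -- can never hold more than 5 elements
   grades GradeListTyp := GradeListTyp();      -- starts off empty
BEGIN
   grades.EXTEND(3);                           -- grow towards the declared maximum
   grades(1) := 70;
   grades(2) := 85;
   grades(3) := 92;
   DBMS_OUTPUT.PUT_LINE('Used ' || grades.COUNT || ' of ' || grades.LIMIT || ' slots');
END;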
So, how do you choose which of these three types (index-by tables, nested tables
and varrays) to use?
According to the docs (which also give detailed reasons for choosing
between nested tables and associative arrays, and between nested tables and
varrays):
"If you
already have code or business logic that uses some other language, you
can usually translate that language's array and set types directly to
PL/SQL collection types.
- Arrays in other languages become VARRAYs in PL/SQL.
- Sets and bags in other languages become nested tables in PL/SQL.
- Hash tables and other kinds of unordered lookup tables in other languages become associative arrays in PL/SQL."
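For the last of those, a small sketch of an associative array used as a lookup table might look like the following (string keys need Oracle 9i Release 2 or later, and the names here are just for illustration):

DECLARE
   TYPE CountryTabTyp IS TABLE OF VARCHAR2(40) INDEX BY VARCHAR2(2);
   country_name CountryTabTyp;
BEGIN
   country_name('UK') := 'United Kingdom';
   country_name('FR') := 'France';
   IF country_name.EXISTS('UK') THEN
      DBMS_OUTPUT.PUT_LINE(country_name('UK'));
   END IF;
END;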
So,
going back to our original examples, why do the answers suggest
using collections - in this case, nested tables?
The original
method for loading the table as put forward in the question was:
CREATE TABLE t1 AS
SELECT *
FROM all_objects
WHERE 1=0;

CREATE OR REPLACE PROCEDURE test_proc IS
BEGIN
   FOR x IN (SELECT * FROM all_objects)
   LOOP
      INSERT INTO t1
         (owner, object_name, subobject_name, object_id,
          data_object_id, object_type, created, last_ddl_time,
          timestamp, status, temporary, generated, secondary)
      VALUES
         (x.owner, x.object_name, x.subobject_name, x.object_id,
          x.data_object_id, x.object_type, x.created,
          x.last_ddl_time, x.timestamp, x.status, x.temporary,
          x.generated, x.secondary);
   END LOOP;
   COMMIT;
END test_proc;
Ignoring the initial table creation script, the test_proc procedure does three
things:
- Declares a cursor that points to the resultset from SELECT * FROM ALL_OBJECTS
- Starts at record one, and inserts into the t1 table the columns from the first row in the cursor
- Then, it loops back and gets the next row of data, until all rows from the cursor have been retrieved.
The data is
then committed, and the procedure ends.
The first solution put forward uses a nested table to hold the data from the
ALL_OBJECTS table, and does something called BULK COLLECT to load all
of the source table's data into the nested table.
CREATE OR REPLACE PROCEDURE fast_proc (p_array_size IN PLS_INTEGER DEFAULT 100)
IS
   TYPE ARRAY IS TABLE OF all_objects%ROWTYPE;
   l_data ARRAY;

   CURSOR c IS
      SELECT *
      FROM all_objects;
BEGIN
   OPEN c;
   LOOP
      FETCH c BULK COLLECT INTO l_data LIMIT p_array_size;

      FORALL i IN 1..l_data.COUNT
         INSERT INTO t2 VALUES l_data(i);

      EXIT WHEN c%NOTFOUND;
   END LOOP;
   CLOSE c;
END fast_proc;
/
Ignore the fact that the type is called ARRAY - it defines a nested table, not a varray.
The second example answer is a variation on this that does much the same thing
with slightly more compact code:
create or replace procedure fast_proc is
   type TObjectTable is table of ALL_OBJECTS%ROWTYPE;
   ObjectTable$ TObjectTable;
begin
   select * BULK COLLECT INTO ObjectTable$
   from ALL_OBJECTS;

   forall x in ObjectTable$.First..ObjectTable$.Last
      insert into t1 values ObjectTable$(x);
end;
/
Again, a nested table is declared, but this time the explicit cursor is dispensed with,
and the data is just bulk loaded directly into the nested table.
The FORALL statement is then used to run through the
nested table, and I'll go into how BULK COLLECT and FORALL work
tomorrow.
So what is the message here? Well, first of all, if you need to process lots of
rows in one go, loading the data into memory first, use a collection
and BULK COLLECT the data into it. As Daniel Morgan pointed out when
putting his answer together, "The point I would hope more junior
developers take away from this is that while cursors definitely have
their uses ... they should not be the first thing one thinks of any
more." Together with BULK COLLECT, use FORALL to loop through the
data, and I'll be going into more detail on these two features tomorrow. In
the meantime, if you want to do a bit more reading up on collections,
and see how they are being further improved with Oracle 10g, take a
look at
this OTN article by Steven Feuerstein.