Improving PL/SQL Performance
By Using Collections
Yesterday I looked at some code written in Oracle 7-style PL/SQL, and compared it with what is considered best practice with Oracle 8i and beyond. The more up-to-date code generally runs around 3-5 times faster than the old-style code, and, as part of understanding where the changes to PL/SQL have occurred, I've started off by looking at how Oracle now handles a concept known as 'Collections'.
The online Oracle PL/SQL User Guide introduces collections as:

"A collection is an ordered group of elements, all of the same type. It is a general concept that encompasses lists, arrays, and other familiar datatypes. Each element has a unique subscript that determines its position in the collection."
So what are these used for? Well, when you first start using PL/SQL, you generally end up declaring variables, then retrieving a value from a table to go into the variable. If you're going to retrieve a range of values from a table, you declare a cursor, retrieve the values one at a time from the database, and process them sequentially. Sometimes, though, you want to retrieve a load of values all in one go and load them into an array, so that you can carry out some sort of operation on the group of values as a whole.
With Oracle 7, you could create what was called an index-by table, or 'PL/SQL Table', that consisted of a series of value pairs: an index value, and a scalar datatype (such as VARCHAR2, or NUMBER). You referred to an individual PL/SQL Table entry by using the index, e.g. CUSTOMER_NAME(10). What made it interesting was that, as well as using scalar datatypes, you could also create PL/SQL tables of PL/SQL records, which could consist of a number of individual columns. By creating a PL/SQL record type based on an existing table (for example, by using SCOTT.EMP%ROWTYPE), you could load a table row, or an entire table, into a variable and process it within your PL/SQL package. For example:
DECLARE
   TYPE EmpTabTyp IS TABLE OF emp%ROWTYPE
      INDEX BY BINARY_INTEGER;
   emp_tab EmpTabTyp;
BEGIN
   /* Retrieve employee record. */
   SELECT * INTO emp_tab(7468)
     FROM emp
    WHERE empno = 7468;
   ...
END;
would set up a PL/SQL table type that consists of PL/SQL records with the same definition as the columns in SCOTT.EMP, and then create a variable based on this type. It would then retrieve a single row from the EMP table and place it into the EMP_TAB variable, using the index value of 7468. Note that you don't need to initialise the variable before it's used, and you can arbitrarily put values into the variable using any valid BINARY_INTEGER value.
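Because the index doesn't have to be dense, you can scatter entries across any index values you like and then walk just the populated ones. As a quick sketch (the NameTabTyp type and the sample names are made up for illustration):

```sql
DECLARE
   TYPE NameTabTyp IS TABLE OF VARCHAR2(30)
      INDEX BY BINARY_INTEGER;
   customer_name NameTabTyp;
   i BINARY_INTEGER;
BEGIN
   -- entries can go in at any index value, with no initialisation needed
   customer_name(10)   := 'SMITH';
   customer_name(5000) := 'JONES';

   -- FIRST and NEXT walk the populated entries in index order,
   -- skipping the gaps in a sparse table
   i := customer_name.FIRST;
   WHILE i IS NOT NULL LOOP
      DBMS_OUTPUT.PUT_LINE(i || ': ' || customer_name(i));
      i := customer_name.NEXT(i);
   END LOOP;
END;
```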
As of Oracle 8, PL/SQL Tables were renamed 'Collections' and supplemented by two new composite types: nested tables and VARRAYs.
Nested tables extend the functionality of index-by tables by adding extra collection methods (known as table attributes for index-by tables), and, in a new development, nested tables can also be stored in database tables and can be directly manipulated using SQL. Collectively, both types are known as PL/SQL Tables.
To declare a nested table, you use the following syntax (note the lack of 'INDEX BY BINARY_INTEGER'):

TYPE type_name IS TABLE OF element_type [NOT NULL];
There are some key differences between traditional index-by tables and nested tables, and some good reasons why you'd want to start using nested tables, which are detailed in the online documentation. In practical terms, one major difference between nested tables and index-by tables is that you have to initialize nested tables using a constructor (like you get in Java), defining how many elements are initially stored in it (although you can later EXTEND the nested table); however, as mentioned above, you can store nested tables within the database (embedded in database columns), which is a distinct advantage over index-by tables. So, if you want to put together a database that is object-oriented, and you need the data to be persistent, nested tables are the way to go.
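A minimal sketch of the constructor and EXTEND points above (the NumTabTyp type name and the values are invented for the example):

```sql
DECLARE
   TYPE NumTabTyp IS TABLE OF NUMBER;         -- nested table: no INDEX BY clause
   n_tab NumTabTyp := NumTabTyp(10, 20, 30);  -- constructor defines the initial elements
BEGIN
   n_tab.EXTEND(2);   -- grow the table by two (initially null) elements
   n_tab(4) := 40;
   n_tab(5) := 50;
   DBMS_OUTPUT.PUT_LINE('Elements: ' || n_tab.COUNT);
END;
```

Referencing n_tab before the constructor runs would raise COLLECTION_IS_NULL, which is exactly the initialisation step an index-by table never needs.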
The other new type of collection, as of Oracle 8, is the VARRAY. A varray (variable-length array) is a datatype that would be familiar to Java or C programmers. A varray has a maximum size, which you must specify in its type definition. Its index has a fixed lower bound of 1 and an extensible upper bound. Thus, a varray can contain a varying number of elements, from zero (when empty) to the maximum specified in its type definition.
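For comparison with the nested table syntax, a short sketch of a varray declaration (the GradeListTyp name is made up for the example):

```sql
DECLARE
   TYPE GradeListTyp IS VARRAY(10) OF NUMBER;   -- maximum size fixed at 10
   grades GradeListTyp := GradeListTyp(90, 85); -- constructor again; lower bound is always 1
BEGIN
   grades.EXTEND;        -- upper bound grows, but never past the limit of 10
   grades(3) := 70;
   DBMS_OUTPUT.PUT_LINE('Limit: ' || grades.LIMIT);  -- LIMIT reports the declared maximum
END;
```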
So, how do you choose which of these three types (index-by tables, nested tables and varrays) to use? According to the docs (which also give detailed reasons to choose nested tables over associative arrays, and between nested tables and varrays):

"If you already have code or business logic that uses some other language, you can usually translate that language's array and set types directly to PL/SQL collection types.

- Arrays in other languages become VARRAYs in PL/SQL.
- Sets and bags in other languages become nested tables in PL/SQL.
- Hash tables and other kinds of unordered lookup tables in other languages become associative arrays in PL/SQL."
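To make that mapping concrete, the three declarations differ only slightly in syntax (element types here chosen arbitrarily):

```sql
DECLARE
   -- array in another language -> varray: fixed maximum, dense, lower bound 1
   TYPE ArrTyp  IS VARRAY(100) OF NUMBER;
   -- set or bag -> nested table: unbounded, can be stored in a database column
   TYPE SetTyp  IS TABLE OF NUMBER;
   -- hash table -> associative array (index-by table): sparse in-memory lookup
   TYPE HashTyp IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
BEGIN
   NULL;
END;
```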
So, going back to our original examples, why do the answers suggest using collections - in this case, nested tables?

The original method for loading the table, as put forward in the question, was:
CREATE TABLE t1 AS
SELECT *
FROM all_objects
WHERE 1=0;

CREATE OR REPLACE PROCEDURE test_proc IS
BEGIN
   FOR x IN (SELECT * FROM all_objects)
   LOOP
      INSERT INTO t1
         (owner, object_name, subobject_name, object_id,
          data_object_id, object_type, created, last_ddl_time,
          timestamp, status, temporary, generated, secondary)
      VALUES
         (x.owner, x.object_name, x.subobject_name, x.object_id,
          x.data_object_id, x.object_type, x.created, x.last_ddl_time,
          x.timestamp, x.status, x.temporary, x.generated, x.secondary);
   END LOOP;
   COMMIT;
END test_proc;
Ignoring the initial table creation script, the test_proc procedure does three things:

- Declares a cursor that points to the resultset from SELECT * FROM ALL_OBJECTS
- Starts at record one, and inserts into the t1 table the columns from the first row in the cursor
- Then, it loops back and gets the next row of data, until all rows from the cursor have been retrieved.

The data is then committed, and the procedure ends.
The first solution put forward uses a nested table to hold the data from the ALL_OBJECTS table, and does something called BULK COLLECT to load all of the source table's data into the nested table.
CREATE OR REPLACE PROCEDURE fast_proc (p_array_size IN PLS_INTEGER DEFAULT 100)
IS
   TYPE ARRAY IS TABLE OF all_objects%ROWTYPE;
   l_data ARRAY;

   CURSOR c IS
      SELECT * FROM all_objects;
BEGIN
   OPEN c;
   LOOP
      FETCH c BULK COLLECT INTO l_data LIMIT p_array_size;

      FORALL i IN 1..l_data.COUNT
         INSERT INTO t1 VALUES l_data(i);

      EXIT WHEN c%NOTFOUND;
   END LOOP;
   CLOSE c;
END fast_proc;
/
Ignore the fact that the type's called ARRAY - it's a nested table, not a varray.
The second example answer is a variation on this, which does much the same thing with slightly more compact code:
SQL> create or replace procedure fast_proc is
       type TObjectTable is table of ALL_OBJECTS%ROWTYPE;
       ObjectTable$ TObjectTable;
     begin
       select * BULK COLLECT INTO ObjectTable$
         from ALL_OBJECTS;

       forall x in ObjectTable$.First..ObjectTable$.Last
         insert into t1 values ObjectTable$(x);
     end;
     /
Again, a nested table is declared, but this time the cursor is dispensed with, and the data is just bulk loaded directly into the nested table. Again, the FORALL statement is used afterwards to run through the nested table, and I'll go into how BULK COLLECT and FORALL work tomorrow.
So what is the message here? Well, first of all, if you need to process lots of rows in one go, loading the data into memory first, use a collection and BULK COLLECT the data into it. As Daniel Morgan pointed out when putting his answer together, "The point I would hope more junior developers take away from this is that while cursors definitely have their uses ... they should not be the first thing one thinks of any more."
Together with BULK COLLECT, use FORALL to loop through the data, and I'll be going into more detail on these two features tomorrow. In the meantime, if you want to do a bit more reading up on collections, and see how they are being further improved with Oracle 10g, take a look at this OTN article by Steven Feuerstein.