Understanding Oracle NULL values:
Well actually, a NULL is
UNKOWN not the absence of a value. But how would ?Much Adieu About
Unknown? sound as a title?
I had to explain to a fellow DBA the impact of null on processing
where the columns that where null were also used in joining of
tables. It took a while, but finally he caught on. It made me wonder
how many folks are confused by this. The major issue is that nulls
are not tracked in normal indexes. It is true that while bitmap
indexes may be used to track null values, they have their own issues
for OLTP environments and usually have very restricted use in OLTP
environments.
I decided to develop a little test case just to show what is going
on with NULLs. Let?s take a look at this test case:
SQL> create table test as
select * from dba_objects;
Table created.
SQL> desc test
Name Null? Type
----------------------------------------- -------- ------------------------
OWNER VARCHAR2(30)
OBJECT_NAME VARCHAR2(128)
SUBOBJECT_NAME VARCHAR2(30)
OBJECT_ID NUMBER
DATA_OBJECT_ID NUMBER
OBJECT_TYPE VARCHAR2(18)
CREATED DATE
LAST_DDL_TIME DATE
TIMESTAMP VARCHAR2(19)
STATUS VARCHAR2(7)
TEMPORARY VARCHAR2(1)
GENERATED VARCHAR2(1)
SECONDARY VARCHAR2(1)
SQL> create index test_ind on test(object_id);
Index created.
SQL> analyze table test compute statistics;
Table analyzed.
SQL> set autotrace on
SQL> select count(*) from test where object_id is not null;
COUNT(*)
----------
33732
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=19 Card=1 Bytes=4)
1 0 SORT (AGGREGATE)
2 1 INDEX (FAST FULL SCAN) OF 'TEST_IND' (NON-UNIQUE) (Cost=
19 Card=33732 Bytes=134928)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
79 consistent gets
0 physical reads
0 redo size
381 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> select count(*) from test where object_id is null;
COUNT(*)
----------
11
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=104 Card=1 Bytes=4)
1 0 SORT (AGGREGATE)
2 1 TABLE ACCESS (FULL) OF 'TEST' (Cost=104 Card=11 Bytes=44
)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
464 consistent gets
0 physical reads
0 redo size
379 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> create index fbi_test on test(nvl(object_id,-1));
Index created.
SQL> analyze table test compute statistics;
Table analyzed.
SQL> select count(*) from test where nvl(object_id,-1)=-1;
COUNT(*)
----------
11
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=4)
1 0 SORT (AGGREGATE)
2 1 INDEX (RANGE SCAN) OF 'FBI_TEST' (NON-UNIQUE) (Cost=2 Ca
rd=12 Bytes=48)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2 consistent gets
0 physical reads
0 redo size
379 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> update test set object_id=-1 where object_id is null;
11 rows updated.
Execution Plan
----------------------------------------------------------
0 UPDATE STATEMENT Optimizer=CHOOSE (Cost=104 Card=11 Bytes=44
)
1 0 UPDATE OF 'TEST'
2 1 TABLE ACCESS (FULL) OF 'TEST' (Cost=104 Card=11 Bytes=44
)
Statistics
----------------------------------------------------------
2 recursive calls
78 db block gets
465 consistent gets
0 physical reads
9208 redo size
625 bytes sent via SQL*Net to client
548 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
11 rows processed
SQL> commit;
Commit complete.
SQL> select count(*) from test where object_id=-1;
COUNT(*)
----------
11
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=4)
1 0 SORT (AGGREGATE)
2 1 INDEX (RANGE SCAN) OF 'TEST_IND' (NON-UNIQUE) (Cost=2 Ca
rd=1 Bytes=4)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2 consistent gets
0 physical reads
0 redo size
379 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
So what have we done in
the above SQL?
1. We create a test table with nulls in the column OBJECT_ID
2. We then created a normal index on the OBJECT_ID column
3. Using IS NULL and IS NOT NULL based WHERE clauses we determined:
a. The IS NOT NULL type WHERE clause will use an index
b. The IS NULL type WHERE doesn?t use an index
4. Next we create a function base index using the NVL function to
convert the NULL values into a default value.When we use the NVL function in our SELECT in place of the IS NULL
we get index usage and much better performance, however, this may
lead to improper relationships between tables.
5. As a final step, we replace the NULL values in TEST with a
default value of -1, now using that in our select instead of IS NULL
we get response virtually identical with the use of NVL function
based index.
Of course the issues we are trying to prevent with all this is
processing of IS NULL type queries forcing such items as full table
scans and causing developers to have to use OUTER JOIN type syntax
to resolve NULL conditions between tables. By eliminating these
OUTER JOIN and IS NULL processing steps which result in large HASH
and full table scans in many cases, we can dramatically reduce the
execution times of queries. Notice in the example how we reduced
cost figures from 104 for a full table scan to 2 for an index
(either our normal or the function based index) scan.
So, what should we take from this? Several conclusions can be
reached:
- Whenever possible avoid the use of NULLable columns for columns
that will be joined.
- Use default values in place of NULL, note that to avoid outer
joins, both parent and child tables must have a value, for example
if we had a parent table the related to the OBJECT_ID in our
example, we would have an entry of (-1, ?NO ID?) in that parent to
resolve the -1 entries in the child.
- If IS NULL is required, consider replacing it with a NVL call and
a function based index, while this may not help with outer joins, it
will help with IS NULL type selects.