|
 |
|
Using Spreadsheets as Oracle Tables
Oracle Tips by Burleson Consulting |
If you are
seeking a easy way to get Oracle data into spreadsheets, see
Excel-DB,
the best way to quickly add Oracle data into Excel spreadsheets.
Also see
Oracle create spreadsheet tips.The advent
of external tables in Oracle9i is exciting because it
allows SQL queries to access any type of flat file, as
if the data were stored inside an Oracle table. We?ll
examine some caveats to this new approach,
specifically:
-
The external file must be comma-delimited and
stored on the server as a file with a .csv
extension.
-
External spreadsheets are not good for large
files because the entire file must be reread into
Oracle whenever a change is saved to the
spreadsheet.
-
End users must never reformat the data columns
inside the spreadsheet environment.
As a review, in the last
article we took the following comma-delimited flat
file and accessed it in Oracle9i as an external table.
7369,SMITH,CLERK,7902,17-DEC-80,800,20
7499,ALLEN,SALESMAN,7698,20-FEB-81,1600,300,30
7521,WARD,SALESMAN,7698,22-FEB-81,1250,500,30
7566,JONES,MANAGER,7839,02-APR-81,2975,,20
7654,MARTIN,SALESMAN,7698,28-SEP-81,1250,1400,30
7698,BLAKE,MANAGER,7839,01-MAY-81,2850,,30
7782,CLARK,MANAGER,7839,09-JUN-81,2450,,10
7788,SCOTT,ANALYST,7566,19-APR-87,3000,,20
7839,KING,PRESIDENT,,17-NOV-81,5000,,10
7844,TURNER,SALESMAN,7698,08-SEP-81,1500,0,30
7876,ADAMS,CLERK,7788,23-MAY-87,1100,,20
The next code listing shows the syntax used to make
the file appear as an Oracle external table.
create
directory testdir as 'u01/oracle/oradata/testdb';
create
table emp_ext (
EMPNO NUMBER(4),
ENAME VARCHAR2(10),
JOB VARCHAR2(9),
MGR NUMBER(4),
HIREDATE DATE,
SAL NUMBER(7,2),
COMM NUMBER(7,2),
DEPTNO NUMBER(2))
Organization external
(type oracle_loader
default directory testdir
access parameters (records
delimited by newline
fields terminated by ',')
location ('emp_ext.csv'))
reject limit 1000;
However, when defining the flat file as an external
table, the file remains on the operating system as a
flat file, where it can be read and updated with a
variety of tools, including spreadsheets. Using
Microsoft's Excel spreadsheets, the external table
data can be read just as if it were standard
spreadsheet data (Figure A)
Figure A |
 |
An external table inside an
Excel spreadsheet |
End users can now manage critical tables inside
easy-to-use spreadsheets. Oracle immediately notices
whenever a change is made to the spreadsheet. However,
there are important limitations to using spreadsheets
as Oracle tables, the foremost being excessive disk
I/O whenever the spreadsheet has changed. Let's take a
closer look.
Internals of
external tables
It's important to recognize that Oracle data inside
spreadsheets will not be accessible as quickly as
internal row data. Oracle can't maintain row-level
locking because the operating system, not Oracle, is
in command. When a spreadsheet is defined as an
external table, Oracle has no way of knowing when
individual row data changes. The operating system will
only tell Oracle that the entire spreadsheet has
changed.
In addition,
data blocks that are read from an external
table are not placed inside the Oracle
data buffers. The dictionary query in
Listing A demonstrates that Oracle
doesn't read the external table rows into
the RAM data cache.
select
bp.name pool_name,
ob.name object,
ob.subname sub_name,
sum(buf_count) buffer_blocks
from
(select set_ds, obj, count(*) buf_count
from x$bh group by set_ds, obj) bh,
obj$ ob,
x$kcbwds ws,
v$buffer_pool bp
where
ob.dataobj# = bh.obj
and
ob.owner# > 0
and
bh.set_ds = ws.addr
and
ws.set_id between bp.lo_setid and bp.hi_setid
group by
bp.name,
ob.name,
ob.subname
order by
bp.name,
ob.name,
ob.subname
;
As we see in
Listing below, selections from our
table don't reside in the data buffers
following a SQL query.
SQL> select ename from pubs.emp_ext;
SQL> @buf_data
POOL_NAME OBJECT
SUB_NAME
BLOCKS
--------- ------------------------
------------------------
--------
DEFAULT PUBLISHER
2
REPCAT$_REPPROP
1
SNS$BINDINGS$
2
SNS$INODE$
2
SNS$NODE_INDEX
1
SNS$REFADDR$
3
SNS$REFADDR_INDEX
3
SYS_C001042
1
SYS_C001414
1
Oracle doesn't make it clear whether a
separate buffering mechanism is used for
external tables. With this lack of
buffering, Oracle9i must reread the entire
spreadsheet for each SQL invocation that
accesses the external table.
To maintain data integrity, Oracle must
detect when the spreadsheet data has
changed, but there is no way to discover
when specific spreadsheet values have
changed. When Oracle detects that the flat
file has been updated, all data in the RAM
data buffers becomes invalid, and the
entire spreadsheet must be reread. This is
the primary reason external tables are not
efficient for large volumes of data.
Because Oracle reads operating system
files in data blocks, we can compute the
amount of disk I/O by determining the
number of spreadsheet blocks with a simple
shell script. In this script, we know the
Oracle database has 8-KB block sizes:
bytes=`ls -al|grep
emp_ext.csv|awk '{ print $5 }'`
num_bytes=`expr $bytes`
blocks=`expr $num_bytes
/ 8192`
echo $blocks
This script will tell us exactly how many
disk reads are required to access the
Oracle external table whenever a change is
made.
Security for external table files
Any saved change to the spreadsheet causes
the entire spreadsheet to be read again
into Oracle from the disk. Spreadsheets
can be password-protected at the operating
system level, and they can be marked
read-only with the following DOS command:
c:\docs\pubsdb\queries>
attrib +r emp_ext.csv
In UNIX, we can use this command to make
the spreadsheet read-only for everyone
except the owner of the spreadsheet:
chmod 744 emp_ext.csv
This ensures the file will not be updated,
except by authorized users. It makes sure
that Oracle caches the data in an
efficient manner. Once defined to Oracle,
the spreadsheet will be accessible both
through Oracle and the Excel spreadsheet.
Limitations of
comma-delimited spreadsheet files
In order for Oracle to successfully read
comma-delimited (csv) files, it's
important to avoid making
spreadsheet-specific changes because Excel
will change the internal storage of the
column to accommodate the formatting. For
example, let's assume that a manager
reformats the SALARY column for comma
display, as shown below.

Reformatting a comma-delimited (csv) spreadsheet
Once the file has been saved, Oracle can
no longer read the SALARY column because
the column has been stored in quotes. To
Oracle, this defines the column as a
character:
7369,SMITH,CLERK,7902,17-Dec-80,800,20,
7499,ALLEN,SALESMAN,7698,20-Feb-81,"1,600",300,30
7521,WARD,SALESMAN,7698,22-Feb-81,"1,250",500,30
7566,JONES,MANAGER,7839,2-Apr-81,"2,975",,20
7654,MARTIN,SALESMAN,7698,28-Sep-81,"1,250",1400,30
7698,BLAKE,MANAGER,7839,1-May-81,"2,850",,30
7782,CLARK,MANAGER,7839,9-Jun-81,"2,450",,10
7788,SCOTT,ANALYST,7566,19-Apr-87,"3,000",,20
7839,KING,PRESIDENT,,17-Nov-81,"5,000",,10
7844,TURNER,SALESMAN,7698,8-Sep-81,"1,500",0,30
7876,ADAMS,CLERK,7788,23-May-87,"1,100",,20
The accidental reformatting of the file
makes it unreadable by Oracle. You must
take special care to instruct end users to
never change the formatting.
Click here to learn how to make Oracle external
tables accept updates.
 |
If you like Oracle tuning, see the book "Oracle
Tuning: The Definitive Reference", with 950 pages of tuning tips and
scripts.
You can buy it direct from the publisher for 30%-off and get
instant access to the code depot of Oracle tuning scripts. |
|