In order to appreciate the database object
model, it is important to understand all of the basic data structures that have
been used in the past, and which are being used again inside database objects.
In this chapter we will review the evolution of database management from
pre-database systems through to today's object technology databases with a focus
on how these data structures apply to database object technology. We will also
examine the hybrid database architectures including the object/relational hybrid
and explain why this approach has become popular for certain applications. The
aim of this chapter is to introduce the important concepts for each of the
database architectures and show how the concepts developed in these
architectures have been implemented in database objects.
Pre-database information storage
In the pre-database era, most records were put
in a folder, and a secretary would then file them in a filing cabinet,
alphabetically of course, so that anyone could find the records when they
were needed. Unfortunately, each secretary had a personal method of
filing, and even the concept of alphabetical order was usually unique to the
secretary. So, if you needed some data, you generally waited until the secretary
returned. In short, data storage was unique to the person who was storing the
data, and the data was not always accessible when it was needed.
Flat file processing
After computers were invented, people began to
realize the potential of computerized data storage. Data could be stored on a
computer, and computer programs were written to store, retrieve, and update this
data. The early computers were great at "number crunching", that is, calculating
things like totals, averages, and standard deviations. The computer was also
great at doing the same thing over and over without making a mistake.
Consequently, the first computer applications were targeted at tasks which were
both repetitive and well structured. Programs and data were generally stored on
punched cards and read into the computer via a card reader. The data was then
kept in files on disk, drum, or magnetic tapes.
Flat files
Before databases were introduced, many
"database" systems were really nothing more than a loose collection of flat
files. These were called "flat files" because they were not linked to other
database records, and each record in the file was independent of the other
records. Flat files can be stored in several formats, including the physical
sequential and direct access formats.
Physical sequential files stored fixed-length
records in a linear fashion, such that it was necessary to read the file from
front-to-back to retrieve a record in the middle of the file.
Basically, a flat file was updated by merging
the existing master file with a new data file, and the outcome was a new master
file (Figure 2-1). Merging the old master file with a daily
transaction file also created an audit trail of sorts, because the old master tapes
reflected the content of the file on any given day.
Figure 2-1 Physical Sequential file updating
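The master file update described above can be sketched in a few lines of Python. This is only an illustration, not the code of any historical system: the function name merge_master, the (key, value) record layout, the use of None to mark a deletion, and the rule of at most one transaction per key are all assumptions made for the example. Both inputs are assumed to be sorted by key, as tape files had to be.

```python
def merge_master(old_master, transactions):
    """Merge a sorted old master with sorted transactions into a new master.

    Assumptions of this sketch: records are (key, value) pairs sorted by key,
    there is at most one transaction per key, and a transaction whose value
    is None means "delete this key".
    """
    new_master = []
    i = j = 0
    while i < len(old_master) or j < len(transactions):
        if j >= len(transactions):
            # No transactions left: copy the rest of the old master.
            new_master.append(old_master[i])
            i += 1
        elif i >= len(old_master) or transactions[j][0] < old_master[i][0]:
            # Transaction key is not on the old master: insert it.
            if transactions[j][1] is not None:
                new_master.append(transactions[j])
            j += 1
        elif transactions[j][0] == old_master[i][0]:
            # Same key on both files: the transaction replaces (or deletes).
            if transactions[j][1] is not None:
                new_master.append(transactions[j])
            i += 1
            j += 1
        else:
            # Old master record has no transaction: copy it unchanged.
            new_master.append(old_master[i])
            i += 1
    return new_master


old_master = [(1, "a"), (3, "c"), (5, "e")]
daily = [(2, "b"), (3, "C"), (5, None)]
new_master = merge_master(old_master, daily)
print(new_master)
```

Note that the old master is never modified. It survives intact next to the new master, which is what gives the audit trail mentioned above.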
To find a record in a sequential file, the
system starts at the beginning of the file, and reads the file one record at a
time until it finds the desired record. It is impossible to update any record
in a physical sequential file without re-writing the entire file. Because the
file has to be re-written anyway, magnetic tape is ideal, since it costs thousands
of times less than disk storage. Magnetic tapes support only sequential
organization, and are mainly used for very large amounts of data.
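A minimal sketch of a front-to-back search over fixed-length records follows. The record layout (a 4-byte key followed by a 12-byte name), the helper names build_file and find_record, and the use of an in-memory buffer in place of a tape or disk file are all assumptions made for the illustration.

```python
import io
import struct

# Hypothetical fixed-length layout: 4-byte unsigned key, 12-byte name.
RECORD = struct.Struct(">I12s")


def build_file(rows):
    """Write (key, name) rows as fixed-length records into an in-memory file."""
    buf = io.BytesIO()
    for key, name in rows:
        buf.write(RECORD.pack(key, name.encode("ascii")))
    buf.seek(0)  # "rewind the tape" to the first record
    return buf


def find_record(f, wanted_key):
    """Read one record at a time from the current position until the key matches.

    Returns (name, records_read); name is None if the key is not in the file.
    """
    reads = 0
    while True:
        chunk = f.read(RECORD.size)
        if len(chunk) < RECORD.size:
            return None, reads  # end of file reached without a match
        reads += 1
        key, name = RECORD.unpack(chunk)
        if key == wanted_key:
            return name.rstrip(b"\x00").decode("ascii"), reads


tape = build_file([(10, "ADAMS"), (20, "BAKER"), (30, "CLARK")])
print(find_record(tape, 30))
```

Finding the last record requires reading every record ahead of it, and a search for a missing key reads the whole file. That cost is why sequential organization suits large volumes of data that are processed in bulk rather than looked up one record at a time.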
It is interesting to note that in 1997, more
data is stored in physical-sequential format than in all of the other file
formats. Companies are still using a flat-file architecture because of systems
that contain large amounts of unchanging, infrequently used data. Magnetic
tapes, which are 10,000 times cheaper than disk, are still the most economical
way to store large volumes of data.
So why talk about this antiquated method of file
storage? It turns out that the Common Object Request Broker Architecture
(CORBA) specification allows physical sequential files to be passed as
database objects in a distributed environment.
Physical
sequential files are said to be non-keyed files because they must always be
retrieved in the same order. The terms "QSAM" (pronounced que-sam) and "BSAM"
(pronounced bee-sam) were often used to describe physical sequential files in an
IBM mainframe environment. BSAM stands for the Basic Sequential Access Method,
and QSAM stands for the Queued Sequential Access Method. Both of these
methods worked well for sequential files and remain in use today in parts of the
third world and areas of New Jersey.