By Steve Karam in SearchOracle
In this article, OCM and ACE Steve Karam discusses Oracle data integration
options
You are reading this article because you are interested in integrating data
from other databases such as SQL Server into Oracle, and you wish to make an
informed decision on the feasibility of this undertaking. In order to make this
informed decision, you might also search Google to find a few relevant blogs,
read other articles discussing the same concept, and view a multimedia demo from
Oracle's Web site showing the technologies in action.
This is how a human being operates: we assimilate data from many different
sources, process it and integrate it into our central storage known as the
brain. Using this data, we are able to make informed decisions and progress in
our day-to-day activities. In database terms, we extract data from hundreds of
sources, transform this data into one format that our brain can process, and
load the data into various pools in our brain, thus making the data useful for
our daily analysis.
Now consider your server room, and the many sources of data that may be
contained there. If all this data could be integrated into one central
repository, a whole new range of possibilities would open. In this article, I'll
provide an overview of how to do this using Oracle.
A real-life example
Consider the following database environment:
- An Oracle OLTP environment that processes sales.
- A SalesLogix database in SQL Server that stores call
information.
- A legacy AS/400 repository of HR information.
- An Oracle data warehouse that aids in sales analysis,
reporting, and BI.
Unfortunately, the only system that is native to the data warehouse is the
OLTP environment, as they are both running Oracle. Sales are analyzed, but a
greater degree of information could be accessible if only Oracle could integrate
external data easily.
The jargon of integration
Before investigating the tools Oracle provides to bring your data together
into an integrated database, it is very useful to understand a few key acronyms
that are heavily used in the data integration world.
ETL: Extract, transform and load
The ETL concept has been around for quite a long time, and has been a popular
method of pulling data from multiple sources and integrating this data into one
central warehouse. The concept of ETL is based on three defined processes:
- Extract - This process involves the extraction
of data from any source. In the example above, the data sources would be a
SQL Server database, an AS/400 repository and an Oracle OLTP system. Data
would be extracted from each of these sources by any means possible, whether
through a direct database connection, a sophisticated data standard such as
XML, or the always handy comma-separated values (CSV) file.
- Transform - During the transformation process,
all data extracted in the previous phase would be cleaned and standardized
into one set format. For instance, phone numbers and state names may require
standardization between several original formats. This process would usually
take place on a staging server that is made to perform the complex
transformation procedures.
- Load - The transformed data is applied to the
data warehouse by means of a data loading mechanism.
ELT: Extract, load and transform
This methodology is just like ETL, except that you decide where the
transformation phase takes place. ELT involves performing the transformation
either on the source or target database and eliminating the staging server. It
really all depends on the needs of the source and target systems, and could just
as easily be reconfigured as TETL, TELT, ETLT or TETLT. ELT tools use code and
various methods of data movement in order to eliminate the need for a staging
database system.
EII: Enterprise information integration
EII is a relatively new concept that has become popular in the world of
service-oriented architecture (SOA) and business intelligence (BI). Think of EII
as "information on demand." The concept is that there are many sources of data
within a single corporate network, and nearly limitless sources of data all over
the globe. By bringing this data together only when necessary, fewer resources
can be spent pulling and loading data, and more resources can be spent on
bringing data from multiple sources. EII-enabled technologies use several
mechanisms for providing this on-tap form of data, including Web services,
enterprise service bus (ESB) and B2B servers, all with the final goal of
creating an abstraction layer consisting of virtual tables that can be queried
as necessary.
Oracle integration tools
Oracle has several tools that can be used to assist with data integration.
These tools allow heterogeneous access from a wide range of data sources such as
SQL Server, MySQL, DB2, XML, CSV, WSDL and many other databases and formats.
ODI: Oracle Data Integrator
The ODI allows data to be pulled from a multitude of sources, cleaned and
then loaded to a multitude of targets. This is accomplished through a code base
with hot-pluggable components called knowledge modules that allow heterogeneous
access to or from nearly any remote source.
Using process flows and data mapping, ODI is able to perform data integration
in a method akin to ETL processing. Data is extracted from multiple sources,
sent through several transformation processes and loaded into a final
destination. Transformations can occur on the source systems or on the target
environment through platform-specific code defined by the included knowledge
modules.
In addition, ODI is able to provide data services to your SOA suite using
native hooks built into the ODI software. Also see:
ESB: Enterprise service bus
Oracle's ESB is part of the SOA suite. The ESB is a component designed to
extract data from Web services (regardless of the standard used), then transform
the data in real time into immediately usable information by way of
transformation mappings. Using this methodology, Oracle is able to achieve EII-like
capabilities. Though EII is not purely based on Web services, Oracle's SOA suite
provides integrators with nearly all the tools they need to leverage on-demand
data via the EII architecture.
Oracle BI Publisher
Formerly known as XML Publisher, this tool has a pseudo-EII feel. Multiple
data sources can be easily defined, such as XML flat files, Web services or
direct database connections. Using predefined display templates, BI Publisher is
able to produce complex reports with this on-demand data. The reason this is not
true EII is that the multiple data sources do not come together to form an
abstraction layer of virtual tables or views. Instead, BI Publisher can almost
be thought of as a single-purpose real-time EII solution.
Transparent Gateways, SQL Loader, PL/SQL
It is still completely possible to write your own integration architecture
using command-line scripts. While many shops are fully embracing the SOA and BI
suites, some developers and administrators remain adamant that command line is
best.
Most of the infrastructure behind tools such as ODI, ESB and BI Publisher is
available right from the command line. Built-in packages facilitate the parsing,
searching and creation of XML.
The SQL*Loader utility allows both simple and complex flat files to be loaded
into staging tables for processing. PL/SQL code can be written to clean, manage
and integrate your data. And most importantly, Oracle Transparent Gateways will
allow you to access data from a multitude of sources, such as SQL Server,
heterogeneously.
Using Transparent Gateways, it is possible to create database links pointing
to external database resources such as a SQL Server database. Oracle scripts can
then extract the data as if it were a standard Oracle table for processing. If
an EII interface is the preferred data access methodology, queries can also be
written to join SQL Server (or other database) tables to Oracle tables in an
integrated fashion for real-time queries without physical consolidation.
Conclusions on Oracle Data Integration
Whether you are using command-line tools for a scripted approach to data
integration or a robust tool such as ODI, SOA Suite or BI Suite, Oracle makes
data in a multitude of databases highly accessible.
See our related notes on Oracle gateway technology: