Oracle 11g Database Management for Business Intelligence
Oracle 11g Tips by Burleson Consulting
Database management for business intelligence systems
Consumer data has been used for market
analysis since ancient times, when the Mesopotamians sold shipments of olive oil and other
commodities to the ancient Greeks.
While the foundations of data storage have changed dramatically from Mesopotamian clay tablets to today's modern database
management systems, the goals of business intelligence and
data mining remain
unchanged.
The basic tenet of business intelligence is that one can
predict the future by analyzing the past. By grouping together related
consumers, analysts can develop highly sophisticated and
accurate predictive models that can save billions of dollars a
year in advertising expenses. At the same time, consumers receive
targeted marketing that is better suited to their needs.
Business intelligence is not limited
exclusively to the area of marketing and sales. Hospitals group
patients together in terms of their age and symptoms (a "cohort"), and analyze
treatment regimens in order to determine the best course of treatment
for their specific patient populations.
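The cohort idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the 20-year age bands, and the sample records are all invented here, and a real study would need far more data plus a test of statistical significance before favoring one treatment over another.

```python
from collections import defaultdict

def age_band(age, width=20):
    """Map an age to a band label such as '40-59' (band width is an assumption)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def best_treatment_by_cohort(records):
    """Return {cohort: (treatment, recovery_rate)} for the best treatment per cohort.

    Each record is a hypothetical tuple: (age, symptom, treatment, recovered).
    A cohort is the pair (age band, symptom).
    """
    # cohort -> treatment -> [recovered_count, total_count]
    tallies = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for age, symptom, treatment, recovered in records:
        counts = tallies[(age_band(age), symptom)][treatment]
        counts[0] += int(recovered)
        counts[1] += 1

    best = {}
    for cohort, treatments in tallies.items():
        # Pick the treatment with the highest observed recovery rate.
        name, (ok, total) = max(
            treatments.items(), key=lambda kv: kv[1][0] / kv[1][1]
        )
        best[cohort] = (name, ok / total)
    return best

# Made-up records, for illustration only.
sample = [
    (45, "chest pain", "A", True),
    (52, "chest pain", "A", False),
    (48, "chest pain", "B", True),
    (55, "chest pain", "B", True),
    (71, "chest pain", "A", True),
]
result = best_treatment_by_cohort(sample)
```

Here the patients aged 40 to 59 form one cohort and the 71-year-old falls into another, so each group gets its own recommendation.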
Even though the use of
business intelligence can save lives,
BI technology also has broader social implications.
First and foremost is the issue of data privacy. As consumer
monitoring becomes more and more ubiquitous (note how your purchasing
behavior is tracked at supermarkets via your buyers club card), many
privacy advocates object to having even our most innocuous behaviors
recorded.
Fortunately for the analyst, most consumers don't care who knows whether they
prefer peas to string beans, and they readily allow point of sale systems to
track their purchases. Using buyers club cards, the BI expert ties individual
purchases to background demographic information.
When consumers apply for buyers club cards, they provide
basic demographic information which
is in turn combined with publicly available information on major life
events and income (such as the purchase of a house, a divorce, the
presence or absence of children). Hence, the database has detailed
information not only about what products are being purchased, but also about the
basic demographics of the person who is purchasing the goods or services.
Data storage has always been
important to business intelligence because of the dynamics of
changing technology. Disk prices fall sharply
every year. Back in the 1980s, 1.2 gigabytes of disk storage could cost
a whopping $200,000, whereas today you can purchase the same amount of
disk for less than $100.
Given our ability to store large amounts of
empirical information
cheaply, the goal of the business intelligence manager is to somehow be
able to cleanse and manipulate this data in such a way that accurate
predictive models can be built.
Let's take a closer look at the
evolution of business intelligence from the perspective of the database
manager, and explore how the database influences the manipulation of
these vast quantities of observable data in the real world.
Data as a predictive tool
As previously noted, people have been analyzing data for centuries
in the attempt to predict consumers' future behavior, as well as the
outcomes of other important activities such as medical treatment programs.
The statistical methods for analyzing predictive data have been with us
for centuries, and
data mining analysis allows us to predict, with relative certainty, the
internal mechanisms and behaviors of groups of people in the general
public. For an interesting exploration of this concept, see the
book Super Crunchers by Professor Ian Ayres of Yale University.
In it, Dr. Ayres shows how data is often replacing
human intuition in many areas of business intelligence. Today, we
know the top CIOs and CEOs of large corporations can earn hundreds of
millions of dollars a year, largely for their human intuition.
It's been largely recognized that computers can only take care of the
well structured part of any decision making task. We generally
find that decision-oriented information systems fall into two
categories:
-
Expert systems - Expert systems
quantify the well structured component of a decision task and make
recommendations without the input of a human expert. These systems are
typified by MYCIN, a diagnostic tool that codifies the questions asked
when diagnosing specific blood infections. The same approach can be
applied to just about every area of business management, including the
database management system itself. In the early twenty-first century, Oracle database
administrators can use tools such as Oracle Data Mining to sift through
their database metadata and performance data (using Oracle's Automatic
Workload Repository), and predict in advance resource consumption trends
within the database management system.
-
Decision support systems (DSS) - Decision support systems recognize that human intuition is an
essential component of the decision making process, and DSS technology makes
no claim to actually solve the problem. Rather, a decision support
system provides the decision maker with information from their problem
domain and leaves the actual decision process to the human expert.
This is an important concept within information systems.
It is
interesting to note that many systems which were first thought to be decision
support systems turn out to be expert systems. In one notable case, a
major soup manufacturer was about to lose a long-term employee of forty years,
who knew every intricacy of the tricky soup vats within the company.
Initially setting out to create a
DSS, the decision analyst
quizzed the employee over a period of months and discovered that what was once
thought to be intuition was actually the application of a large set of well
structured decision rules. When this soup vat expert would say something
like "I have a feeling that the problem is X", it appeared to be human intuition
to less knowledgeable observers.
However, in reality it was the
application of a long forgotten decision rule or an experiential case of which
the individual had since lost conscious
knowledge. The application of decision support system technology
eventually led to an expert system. This allowed the forty-year veteran to
retire comfortably, with the knowledge that all of his years of decision
rules had in fact been quantified, helping the soup company carry on without him and make
even faster and better decisions as a whole.
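To make the idea of "well structured decision rules" concrete, here is a minimal Python sketch of a rule-based recommender. It is not the soup manufacturer's actual system: the fact names, thresholds, and advice strings are all hypothetical, chosen only to show how captured rules can be evaluated mechanically.

```python
# Each rule pairs a condition over the observed facts with a recommendation.
# Fact names and thresholds are invented for illustration.
RULES = [
    (lambda f: f["temperature_c"] > 95 and f["pressure_kpa"] > 150,
     "Reduce heat: vat is over temperature and over pressure"),
    (lambda f: f["temperature_c"] < 70,
     "Increase heat: vat is below cooking temperature"),
    (lambda f: f["viscosity"] > 8,
     "Add stock: soup is too thick"),
]

def recommend(facts):
    """Return the advice of every rule whose condition holds for `facts`."""
    return [advice for condition, advice in RULES if condition(facts)]

# Hypothetical reading from a vat that is both too cool and too thick.
advice = recommend({"temperature_c": 65, "pressure_kpa": 100, "viscosity": 9})
```

Once the expert's rules are written down this way, adding a newly remembered rule is a one-line change, which is the practical payoff of turning intuition into explicit rules.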
The application of business intelligence for
predictive models
Data mining allows us
to do far more than predict the future behavior of a consumer. Companies
such as Amazon pioneered the idea of a "recommendation engine", which analyzes patterns of behavior among
known consumers, extrapolates them to other shoppers online, and makes on-point recommendations for
future purchases. This type of technology has also been applied to other services such as
Netflix and TiVo, where consumers are directed to entertainment that people with similar interests
have enjoyed.
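One simple way to sketch a recommendation engine is to count which items are bought together. The Python below is an assumption-laden toy, not how Amazon, Netflix, or TiVo actually work: production systems use far richer models, and the baskets here are made up.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence(baskets):
    """Count how often each ordered pair of items appears in the same basket."""
    pairs = Counter()
    for basket in baskets:
        # Deduplicate within a basket so repeated items are counted once.
        for a, b in combinations(sorted(set(basket)), 2):
            pairs[(a, b)] += 1
            pairs[(b, a)] += 1
    return pairs

def recommend(item, pairs, top_n=2):
    """Return up to top_n items most often bought together with `item`."""
    scores = Counter(
        {other: n for (first, other), n in pairs.items() if first == item}
    )
    return [other for other, _ in scores.most_common(top_n)]

# Made-up purchase histories, for illustration only.
baskets = [
    ["novel", "bookmark", "reading lamp"],
    ["novel", "bookmark"],
    ["novel", "reading lamp", "tea"],
    ["novel", "bookmark", "tea"],
]
pairs = build_cooccurrence(baskets)
```

With these baskets, a shopper looking at "novel" would first be pointed to "bookmark", the item most often bought alongside it.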
Another good example of data
mining is the role of a bank loan officer. Traditionally, bank loan
officers all have access to the same set of data, but it is undeniable that some
people serve as better loan officers
than others. This could be attributed to human intuition, whereby the loan
officer recognizes someone as having either a good or a bad propensity to repay
the loan based on non-quantifiable characteristics.
It is now largely understood
that the skill of an experienced bank loan officer lies in
recognizing the subtle nuances of the applicant, and that many of these nuances can be quantified. Hence,
today's bank loan officers are largely constrained to follow the computer,
whereby an individual borrower is compared against a cohort (a "cohort"
is a grouping of people with similar characteristics).
In sum, the rapidly falling prices of disk storage technology
have now made it feasible for organizations with even a modest budget to store
trillions of bytes of real time information about their business processes.
The immediate challenge is how to store, organize, and extrapolate from this
information in order to make valuable business decisions.
Let's take a
closer look at the underlying database technology, and explore how today's
database management systems help business intelligence experts to organize,
collect, and make valid predictive models.
The foundation of database management for business
intelligence
The storage of online data began
in the 1960s as organizations began to develop the digital means to store information about stock
prices, consumer trend behaviors, and so on. Unfortunately, this information
had to be stored on large volumes of magnetic tape, and simple decision support
queries for correlations could take days, making it difficult for a manager to follow any
'flow' of a decision process. It was only as disk storage began to become
cheaper that this information was able to be brought online, so that the
information could be indexed, pre-computed, and organized in such a fashion that
the user of the business intelligence system could quickly get feedback on given questions.
This would stimulate new questions, and provide a platform for making more informed
business decisions.
An early leader in the area of decision support systems and
expert systems was SAS, the Cary, North Carolina-based company which has been a
mainstay of data analysis for more than forty years. SAS had its own full
programming language and rudimentary data storage platform, upon which
statistical algorithms could be run to analyze just about any kind of
information. But as today's corporations start collecting "raw" data from their
observable world, several problems have to be addressed:
-
Data cleansing - Data is only as good as the input to that
system, and common keyboarding errors from individuals can skew the
quality of the information. Today we recognize that all data must be cleaned,
scrubbed, and standardized in order to get meaningful information from it.
-
Data summarization - In data summarization, we face the
problem of answering large scale aggregations over mammoth volumes of
data in real time. A simple question like "how many consumers of
widgets are there in New York?" might require millions of data block I/Os and
a significant amount of computing power. Even with today's super fast
computer systems and super cheap disk storage, the decision support system
or expert system must have this information available at the
fingertips of the decision maker, which often requires pre-summarization and
pre-aggregation of the salient data factors. Hence, today's database
managers devote a significant amount of time to observing the decision
patterns of their end user base, and use tools such as
Oracle materialized views and Oracle's
star query joins to make the information accessible to the end
user base in real time. We also see today's business intelligence
applications supporting a drill down
mechanism whereby users can look at the behavior of a cohort as a
whole, then double click through to see the information at successive
levels of detail. Today, we see tools such as the Urchin software (now
called Google Analytics) which allow website referrer stats to be organized
in such a way that a search engine optimization (SEO) expert can quickly
drill through and see how customers are visiting their individual websites.
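The pre-aggregation idea can be sketched with Python's built-in sqlite3 module. Note the assumptions: SQLite has no materialized views, so an ordinary summary table stands in for an Oracle materialized view here, and the table names, columns, and rows are invented. This is not Oracle syntax, and unlike a materialized view the summary table must be rebuilt by hand whenever the detail data changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_id INTEGER, state TEXT, product TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "NY", "widget"),
        (1, "NY", "widget"),  # repeat purchase by the same customer
        (2, "NY", "widget"),
        (3, "NY", "gadget"),
        (4, "CA", "widget"),
    ],
)

# Pre-compute the aggregate once, so later questions read a few summary rows
# instead of scanning every detail row.
conn.execute(
    """
    CREATE TABLE sales_summary AS
    SELECT state, product, COUNT(DISTINCT customer_id) AS consumers
    FROM sales
    GROUP BY state, product
    """
)

def consumers_of(product, state):
    """Answer 'how many consumers of <product> are there in <state>?' from the summary."""
    row = conn.execute(
        "SELECT consumers FROM sales_summary WHERE product = ? AND state = ?",
        (product, state),
    ).fetchone()
    return row[0] if row else 0
```

Calling `consumers_of("widget", "NY")` reads a single pre-computed row; against millions of detail rows, that difference is what makes interactive drill down practical.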
We also see a paradigm change on
the Internet whereby referrer statistics can now measure not only the number of page views for an individual
webpage but also how long an individual actually spends on that page, a far better
indicator of the actual popularity of a web page. These types of
technology are fostering a whole new way that we use information in order to
make predictions.
For example, Ayres notes in his book "Super Crunchers" that he
helped choose the title by running an empirical experiment using
Google AdWords on the keywords "data mining".
By presenting his end user community
with a choice of either "Super Crunchers" or "The End of Intuition", Ayres was able to
determine that Super Crunchers was a far better title using the very type of
data mining technology which he espouses within his fantastic book. But
the idea of using database management systems as the foundation for business
intelligence also has applications far beyond basic predictive modeling.
Let's
take a look at some of the more sophisticated uses of these trillions of bytes
of corporate information, and understand how they can be used for hypothesis
testing.
Hypothesis testing in business intelligence
The aircraft industry learned in the 1960s that large-scale
computers could be used to simulate the flying of a new aircraft without putting
pilots' lives at risk, and we are starting to see the same application of
hypothesis testing being used within the business community today. Prior to
launching a 100 million dollar ad campaign, the behavior of that campaign can be
simulated using sophisticated algorithms and techniques which model the
actual advertising campaign, so that the marketing executive can see what
kind of ROI (return on investment) the marketing campaign might deliver.
Hypothesis testing is generally a "what if" type of question, whereby the
business intelligence expert can isolate individual variables within their
database and manipulate them over time based on well defined preconditions. This
"ceteris paribus" approach (ceteris paribus literally means "all else being
equal"), allows the decision maker to keep everything except their problem
domain fixed. By fixing all but a single variable, and testing it against a well
known universe, the business intelligence person can develop models which are
far more sophisticated than traditional predictive analysis. For more
information on this technique see Dr. Carolyn Hamm's book "Oracle
Data Mining".
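A "what if" test of this kind can be sketched in Python as follows. The ROI model is a deliberately simple linear toy, and every input name and value is hypothetical. The point is only the ceteris paribus mechanics: one variable changes while the rest of the baseline stays fixed.

```python
def campaign_roi(spend, reach_per_dollar, conversion_rate, margin_per_sale):
    """Return ROI as (margin earned - spend) / spend for a toy linear model."""
    sales = spend * reach_per_dollar * conversion_rate
    return (sales * margin_per_sale - spend) / spend

def what_if(baseline, variable, values):
    """Vary one input while holding every other baseline input fixed."""
    return {v: campaign_roi(**{**baseline, variable: v}) for v in values}

# Hypothetical baseline assumptions for a campaign.
baseline = {
    "spend": 1_000_000,
    "reach_per_dollar": 10,
    "conversion_rate": 0.01,
    "margin_per_sale": 20,
}

# Ceteris paribus: only the conversion rate changes between scenarios.
scenarios = what_if(baseline, "conversion_rate", [0.005, 0.01, 0.02])
```

Under these made-up numbers the campaign only breaks even at a 0.5% conversion rate, which is the kind of threshold a marketing executive would want to see before committing the budget.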
The costs of business intelligence
It's often said in the information technology world that you
'can't afford not to have a data mining technique within your organization'. It's
not uncommon to hear stories of payback periods compressed into mere weeks even
on data mining projects that cost tens of millions of dollars, because of the
high value of the information that comes from them and the resulting savings
for consumers.
The best example of this, of course, is in predicting consumer
behavior, where organizations save hundreds of millions of dollars in broadcast
advertising, replacing it with well-targeted advertising that has a high
probability of selling a specific product. The consumers appreciate the
targeted marketing, and the reduced costs allow products to be offered more
cheaply, benefiting everyone.
Let's take a close look at the shift in
costs. Back in the 1970s the major cost of any data warehousing or
data mining operation was the hardware itself, which would often comprise more
than 80% of the total cost. In the early twenty-first century we see a
complete reversal of this, whereby hardware costs such as disk storage, while significant, are
dwarfed by the cost of the work required of both the database administrator and
the business intelligence analyst. A highly skilled database administrator must
be put in place in order to capture the real time data and organize it
(using tools such as
Oracle partitioning) so that the information can be more easily accessed by
the statistical analysts.
Once the data has been collected and organized, and
aggregates have been pre-computed and summarized, the largest expense is that of the
business analysts themselves. These people must have very extensive
backgrounds in multivariate statistics and understand in detail how all of the
algorithms work, so that they can tear through all of this data in order to find
statistically meaningful correlations. In sum, the
lion's share of today's business analysis costs are in the human
resources arena, propelling experienced data mining analysts into the ranks
of the most highly paid people within the information systems industry.
Conclusions about business intelligence in America
The Gartner Group has predicted a large scale uptake of
business intelligence within the IT arena for the years 2007-2015, largely based
upon the industry's understanding as a whole that this information is quite
valuable and has an extremely short payback period. While disk costs
continue to fall, data storage engines such as SAS, DB2, and Oracle provide the
vehicle and platform for sophisticated business analysis.
We can expect to
see a lot more demand for people who are highly skilled in the quantitative aspects
of data analysis. These data analysts may not be fluent in Oracle database
administration or the nuances of internal data storage, but they have the
statistical acumen of an actuary: a 'super cruncher', if you will,
who can take terabytes of point of sale information and glean the nuggets of
golden information from it. This is certainly an emerging area of
information technology, one that requires many years of study in statistics and
an understanding of how to present information in a meaningful way to the decision
maker.