Savvy Web sites
are watching every move of their customers, tracking their page viewing
preferences and buying habits. This tracking is fed into sophisticated
correlation engines that can accurately predict those products and services
that the customer is likely to buy. Using this information, the Web page
content is customized. We are seeing the dawn of applied artificial
intelligence in eCommerce, and the major database vendors are creating
products to help.
The ability to
track the behavior of customers on an eCommerce site is critical, and Oracle
offers several tools and techniques to assist in this process. If Web sites
can create dynamic Web page content featuring those items that their customer
likes to buy, they can greatly improve the shopping experience for the
customer, and greatly increase sales.
When used
properly, intelligent customer correlation analysis techniques are a win-win
technology. Customers are happy because the eCommerce site highlights those
products they want to buy. The eCommerce company is also happy because of
millions in extra revenue from "impulse" sales. Marketing experts estimate
that impulse sales account for billions of dollars in yearly consumption, and
the eCommerce vendor can dramatically improve revenue by using custom HTML
technology.
When used
improperly, customer correlation analysis can lead to broken lives and
lawsuits. There is an urban legend that an Oracle data warehouse for a hotel
chain analyzed patterns of hotel usage and sent targeted mailings to customers
who rented rooms on weekdays. Sadly, the coupons were sent to the customers'
home addresses, and not their work addresses. Legend says that more than six wives
learned that their partners were renting rooms in local hotels during the
week, and several divorce proceedings were started because of this poor choice
of mailing address.
This article
explores this exciting new technology and reveals the secret mechanisms used
by Oracle9iAS Personalization Engine to build intelligent Web page content.
While the technology behind the scenes is very sophisticated, the mechanisms
are well-known and have been employed in marketing systems for years.
Big Brother is Watching You
While most Web
users profess a desire to remain anonymous, the eCommerce software must track
visitors' behavior and use their prior behavior to customize their Web page
content. Using a combination of cookies and referrer statistics, many
eCommerce sites can watch your every move and store your page viewing habits
inside an Oracle database.
Before we move
into the Oracle specifics, let's take a closer look at how page view tracking
works. Every time you make an HTTP request from your browser, you are sending
three bits of information:
1. Your IP
address — This is needed for the Webserver to relay the page back to your
browser. For most Web users, their IP address changes every time they connect
to their Internet Service Provider (ISP).
2. The URL you
want — This is the page that you seek to display.
3. The URL of
the last page you visited — This is a "referrer" that says the URL of the page
immediately preceding this request. If this URL contains "content strings,"
you can record a great deal of information about the viewing habits of the
user.
Internally, the
URL that you are requesting gets resolved to an IP address, and these three
items are sent to the target IP address. The most important of these three
items is the referrer URL because it provides a history of Web page usage. For
users seeking Internet privacy, a host of products, such as the Anonymizer,
prevent browsers from sending referrer URLs.
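As a rough illustration of the three items described above, the following Python sketch pulls them out of a CGI/WSGI-style environment dictionary. The function name extract_request_facts and the sample values are hypothetical. The keys REMOTE_ADDR, PATH_INFO, and HTTP_REFERER are the conventional CGI/WSGI names for the client address, the requested path, and the "Referer" request header.

```python
def extract_request_facts(environ):
    """Return the three items a Web server sees with each page request.

    `environ` is assumed to be a CGI/WSGI-style dictionary; missing keys
    (for example, a suppressed referrer) come back as empty strings.
    """
    return {
        "ip_address": environ.get("REMOTE_ADDR", ""),
        "requested_url": environ.get("PATH_INFO", ""),
        "referrer_url": environ.get("HTTP_REFERER", ""),
    }


# Hypothetical request, for illustration only.
sample_environ = {
    "REMOTE_ADDR": "172.16.10.42",
    "PATH_INFO": "/products/books.html",
    "HTTP_REFERER": "http://www.google.com/search?q=oracle+dba+support",
}
facts = extract_request_facts(sample_environ)
print(facts)
```

When a privacy product strips the referrer, the last field is simply empty, and the page-to-page history is lost for that request.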
Unobtrusive Measures
The methods
used by eCommerce sites for tracking your visits are borrowed from the
principles in the landmark research-methods book Unobtrusive Measures by
Eugene Webb. The idea behind unobtrusive measures is simple: "Customers
will change their behavior when they know they are being watched."
It is well
documented that "anticipated evaluation" will change consumer behavior, and a
valid data collection mechanism requires an unobtrusive tool. In the marketing
industry, researchers have long been aware of the anticipated evaluation bias
and spend millions of dollars on Web systems that will unobtrusively monitor
consumer behavior.
The tools used
by eCommerce sites to monitor your behavior are very unobtrusive and the
casual user is not aware that their movements are being tracked.
What Does an eCommerce Site Track?
eCommerce sites
gather detailed visitor information from the referrer statistics that are
coded inside each HTTP request, and store them inside an Oracle database for
analysis. Because the Web server knows the last URL that each visitor requested
(the referrer URL), you can gather this "referrer" information and see how people
got to your site. If your visitor used a search engine such as Google to find
your Web site, the referrer URL may look like this:
http://www.google.com/search?sourceid=navclient&q=oracle+dba+support
Just from
looking at this URL, we can see that this visitor issued a Google search for
pages using the keywords "oracle dba support." Because keyword information
is included inside the URL, referrer statistics can be a gold mine for
tracking Web browsing behavior.
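To make the keyword extraction concrete, here is a minimal Python sketch that uses the standard urllib.parse module to read the search phrase out of the referrer URL shown above. The function name search_keywords is hypothetical, and the sketch assumes the search engine carries the phrase in a parameter named q, as the sample URL does.

```python
from urllib.parse import urlparse, parse_qs


def search_keywords(referrer_url):
    """Return the search keywords carried in a referrer URL, or [] if none."""
    query = urlparse(referrer_url).query
    # parse_qs decodes "+" as a space and maps each parameter to a list.
    phrases = parse_qs(query).get("q", [])
    return phrases[0].split() if phrases else []


referrer = "http://www.google.com/search?sourceid=navclient&q=oracle+dba+support"
print(search_keywords(referrer))  # ['oracle', 'dba', 'support']
```

Other search engines may use a different parameter name, so a production tracker would need a lookup of parameter names per referring host.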
It is important
for eCommerce sites to add data inside URLs, and the more sophisticated the
tracking mechanism, the longer the URLs. For example, here is a URL to display
a book on Amazon.com:
http://www.amazon.com/exec/obidos/ASIN/0072223049/qid%3D1006012027/sr%3D1-2/ref%3Dsr%5F1%5F6%5F2/102-6066513-8940125
From looking at
the URL above, it is clear that eCommerce sites are using the URL to help
capture detailed data about your behavior.
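The data packed into such a URL becomes easier to see once the percent-encoding is undone. The sketch below, which assumes nothing about what Amazon's fields actually mean, splits the sample URL into path segments and decodes each one with urllib.parse.unquote. The function name decode_path_segments is hypothetical.

```python
from urllib.parse import urlparse, unquote


def decode_path_segments(url):
    """Split a URL path on "/" and percent-decode each segment."""
    path = urlparse(url).path
    return [unquote(segment) for segment in path.split("/") if segment]


amazon_url = (
    "http://www.amazon.com/exec/obidos/ASIN/0072223049/"
    "qid%3D1006012027/sr%3D1-2/ref%3Dsr%5F1%5F6%5F2/102-6066513-8940125"
)
segments = decode_path_segments(amazon_url)

# Segments that decode to key=value pairs look like embedded tracking data.
tracking = dict(s.split("=", 1) for s in segments if "=" in s)
print(tracking)  # {'qid': '1006012027', 'sr': '1-2', 'ref': 'sr_1_6_2'}
```

The decoded names (qid, sr, ref) are shown exactly as they appear in the URL; their meaning is internal to the site and is not documented here.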
If the
Webserver can capture this referrer URL chain in a database, then a whole
history of your activity on the Web site can be stored and analyzed. Standard
Webserver software (Apache, ASP, CGI) collects the referrer statistics for all
visitor information (refer to Listing 1).

Listing 1: A referrer report of Google keywords.
By themselves,
referrer statistics are critical to companies who rely on search engines to
bring them traffic. On the Web site in Listing 1, we see exactly the keywords
that are used by new visitors to find the Web site, and the frequency of the
keywords.
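A keyword frequency report of this kind can be approximated in a few lines of Python. The following is a minimal sketch, not the report generator used by any particular Webserver package: the function name keyword_report and the sample log entries are invented for illustration, and the search phrase is again assumed to travel in a q parameter.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs


def keyword_report(referrer_urls):
    """Count how often each search phrase appears in a list of referrer URLs."""
    counts = Counter()
    for url in referrer_urls:
        for phrase in parse_qs(urlparse(url).query).get("q", []):
            counts[phrase.lower()] += 1
    # Most frequent phrases first.
    return counts.most_common()


# Hypothetical referrer log entries.
sample_log = [
    "http://www.google.com/search?q=oracle+dba+support",
    "http://www.google.com/search?q=oracle+tuning",
    "http://www.google.com/search?sourceid=navclient&q=Oracle+DBA+support",
    "http://www.example.com/links.html",
]
report = keyword_report(sample_log)
for phrase, hits in report:
    print(f"{hits:5d}  {phrase}")
```

Referrers without a search phrase, such as the last entry, are simply skipped.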
However,
tracking and storing details about your visit is only half the story, and to
be efficient, the Web site must have details about your demographics.
Learning More About You
Demographic
collection is a critical part of a successful formula for personalization
technology. Without demographics, we cannot develop multivariate correlations
that compare your viewing and purchasing behavior with known groups of
customers.
When you sign
on to an eCommerce site for the first time, you will most likely provide
enough personal information to uniquely identify yourself. The eCommerce
server takes your identity information and collects demographics about you
(refer to Figure 1). These demographics include your sex, age, income level,
credit history, and other publicly available information about your income,
social status, and personal interests.

Figure 1: Demographics collection mechanism for eCommerce.
Once collected,
this information is stored in a large Oracle database. Using this data,
complex multivariate analysis (for example, chi-square tests) is performed, and detailed
predictive models are created. These models are used to customize the Web
content.
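The article does not spell out the exact statistics that are run, but the simplest building block is easy to show. The sketch below computes a Pearson chi-square statistic for a table of observed counts, here asking whether membership in one of two consumer groups is related to buying a product. The counts are invented for illustration, and the function name chi_square is hypothetical.

```python
def chi_square(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if group and purchase were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Rows: consumer group A, consumer group B. Columns: bought, did not buy.
observed = [[30, 70],
            [10, 90]]
print(chi_square(observed))  # 12.5
```

With one degree of freedom, a statistic above 3.84 is significant at the 5 percent level, so in this made-up sample the two groups differ in their propensity to buy.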
Gimme a Cookie
At this point,
the Web site knows details about your browsing behavior, but we must next have
a means to positively identify each visitor so we can spy on their browsing
behavior. Web developers for eCommerce systems have met this challenge by
writing cookies onto the customer's PC. A cookie is a small flat file
that serves to positively identify each visitor.
As an
interesting side note, one popular story holds that these files were named
after the famous Cookie Monster rogue software on the MIT student system. For
more than a year, a hidden program named cookie monster would randomly
interrupt a user session and demand "Gimme Cookie." When the student typed
"cookie," the software thanked them and restored their computing session.
As we have
noted, we cannot use the IP address of the browser to verify the identity of a
customer. Most ISPs use dynamic IP addressing, whereby the fourth octet of the
IP address is assigned at login time. For example, an ISP may have the address
range 172.16.10.xxx, where xxx is a number assigned at ISP connect time. Without a
static IP to identify the customer, eCommerce vendors must rely on cookies to
get a positive ID on an incoming customer request.
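A minimal sketch of this identification scheme, using Python's standard http.cookies module, appears below. The cookie name visitor_id, the one-year lifetime, and both function names are assumptions made for illustration; real sites choose their own names and formats.

```python
import uuid
from http.cookies import SimpleCookie


def issue_visitor_cookie():
    """Build a cookie that tags a new visitor with a random identifier."""
    cookie = SimpleCookie()
    cookie["visitor_id"] = uuid.uuid4().hex
    cookie["visitor_id"]["path"] = "/"
    cookie["visitor_id"]["max-age"] = 60 * 60 * 24 * 365  # keep for one year
    return cookie


def read_visitor_id(cookie_header):
    """Return the visitor identifier from a Cookie request header, or None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("visitor_id")
    return morsel.value if morsel is not None else None


issued = issue_visitor_cookie()
print(issued.output())  # the Set-Cookie line sent with the first response

# On later visits the browser returns the value in its Cookie header.
returned_header = "visitor_id=" + issued["visitor_id"].value
print(read_visitor_id(returned_header))
```

Because the identifier travels with the browser rather than the network connection, it survives a change of IP address between sessions.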
You can easily
see all of the cookies on your MS-Windows system (refer to Figure 2).

Figure 2: The location of your cookies directory on MS-Windows.
For fun, you
can look inside a cookie file and get an idea about how cookies help the
eCommerce site spy on your behavior. Here is an example of an Amazon cookie
(refer to Listing 2).

Listing 2: A sample cookie file.
Once you
understand cookies, you can have fun with eCommerce sites by either removing
the cookie or editing its contents. Next time you use such a site, take a
minute to edit and change the cookie file before you go to the site.
Let's see how
cookies influence customized Web content. Below, we access www.Amazon.com
without a cookie file (refer to Figure 3).
Figure 3: Amazon home page without a cookie.
Once we replace
our cookie, we access Amazon and get a customized recommendation list (refer
to Figure 4). This list of recommendations is produced by a personalization
engine that has carefully analyzed your viewing behavior, compared your
behavior to other customers, correlated your demographics, and developed a
customized predictive model of those items that you have the highest
propensity to purchase.
Figure 4: Accessing Amazon with a Cookie.
At this point,
we should now have an understanding of how eCommerce sites collect information
about you and your behavior. Next, let's look at how they use the data to
recommend purchases for visitors.
Oracle's Solution to Personalization
Analyzing
referrer data on a busy eCommerce site is a formidable computing challenge.
Companies such as Oracle have developed tools such as the
Oracle9iAS Personalization Engine and
Oracle Data Mining Suite (formerly the Darwin product) to assist with this
complex process. The nature of this analysis is very resource intensive, and
almost all large eCommerce sites must devote large servers exclusively for
developing predictive recommendations.
With millions
of dollars each year in impulse purchases at stake, IT marketing professionals
know that it is critical to get the right products onto your custom page. To
be successful, the system must be able to accurately predict your propensity
to buy a product, based on two factors:
1. Prior buying
and browsing patterns
2. Buying patterns of like-minded customers (customer profiling)
The challenge
in developing these models is accurately placing visitors into consumer
groups. A consumer group is a group of customers with similar
demographics, and hence, similar buying patterns. Some shops tag these groups
with "handles" such as yuppies, dinks (double income, no kids), and so on.
Figure 5 shows
the process of analyzing demographic information to place visitors into
consumer groups. A visitor can be placed into a consumer group in two ways:
1. Their
demographic category (collected from personal information)
2. Their pattern of page views (collected from referrer URLs)

Figure 5: Categorizing users into consumer groups.
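The two routes into a consumer group can be sketched in Python as follows. Everything specific here is hypothetical: the demographic thresholds, the page categories, and the mapping from categories to groups are invented for illustration, and a real personalization engine would derive them from data rather than hard-code them.

```python
from collections import Counter

# Hypothetical mapping from page categories to consumer groups.
PAGE_GROUPS = {
    "luxury-cars": "yuppies",
    "gadgets": "yuppies",
    "fine-dining": "dinks",
    "travel": "dinks",
}


def group_from_demographics(profile):
    """Route 1: classify by demographic category (invented thresholds)."""
    if profile.get("household_incomes", 0) >= 2 and profile.get("children", 0) == 0:
        return "dinks"
    if profile.get("age", 0) < 40 and profile.get("income", 0) >= 100_000:
        return "yuppies"
    return None


def group_from_page_views(page_categories):
    """Route 2: classify by the group whose pages were viewed most often."""
    votes = Counter(PAGE_GROUPS[c] for c in page_categories if c in PAGE_GROUPS)
    return votes.most_common(1)[0][0] if votes else None


def assign_group(profile, page_categories):
    """Prefer demographics when available, then fall back to page views."""
    return (
        group_from_demographics(profile)
        or group_from_page_views(page_categories)
        or "unclassified"
    )


print(assign_group({"household_incomes": 2, "children": 0}, []))   # dinks
print(assign_group({}, ["gadgets", "luxury-cars", "travel"]))      # yuppies
```

Preferring demographics over page views is one possible design choice; a site could just as well weight the two sources together.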
Once we have
defined consumer groups, we next start a data mining procedure to correlate
the patterns of each consumer group to specific products (refer to Figure 6).
The customized HTML personalization is based on data from three sources:
1. Known
consumer group data — These are predetermined summaries of consumer group
characteristics.
2. Weighted
rankings of pages viewed — This is a measure of the popularity of product
pages according to each consumer group.
3. Sales
history data — This is historical sales data, correlated by consumer group.

Figure 6: Correlating consumer groups to product preferences.
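One simple way to combine these sources is a weighted score per product within a consumer group. The sketch below is an assumption about how such a blend might look, not a description of Oracle's algorithm: the function name recommend, the 40/60 weighting, and the sample scores (scaled from 0 to 1) are all invented.

```python
def recommend(group, page_view_weights, sales_history,
              top_n=3, view_weight=0.4, sales_weight=0.6):
    """Rank products for a consumer group by blending views and sales."""
    views = page_view_weights.get(group, {})
    sales = sales_history.get(group, {})
    scores = {
        product: view_weight * views.get(product, 0.0)
        + sales_weight * sales.get(product, 0.0)
        for product in set(views) | set(sales)
    }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]


# Hypothetical per-group popularity and sales scores.
page_view_weights = {
    "yuppies": {"espresso machine": 0.9, "road bike": 0.5, "lawn mower": 0.1},
}
sales_history = {
    "yuppies": {"espresso machine": 0.7, "road bike": 0.8, "lawn mower": 0.2},
}
picks = recommend("yuppies", page_view_weights, sales_history, top_n=2)
print(picks)  # ['espresso machine', 'road bike']
```

A group with no recorded data yields an empty list, which a site would typically replace with a generic bestseller list.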
After we have
the consumer groups ready, the Oracle database can quickly generate customized
HTML (refer to Figure 7). Here we see three steps:
1. Initial
Request — The customer browser makes an HTTP request to the Web listener.
2. Get the
Cookie — The Web server reads the cookie file contents that the browser
sends along with its request.
3. Deliver
the custom page — The cookie file verifies the identity of the visitor,
and customized HTML is created by the app server based on the identity, and
the custom HTML is delivered back to the browser.

Figure 7: Delivering custom HTML pages.
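The flow can be sketched end to end in a few lines of Python. In this sketch the cookie is read from the request's Cookie header, which is how browsers normally deliver it. The lookup tables, the cookie name visitor_id, and the function name build_page are hypothetical stand-ins for the database queries a real application server would run.

```python
from html import escape
from http.cookies import SimpleCookie

# Hypothetical stand-ins for tables held in the database.
VISITOR_GROUPS = {"abc123": "yuppies"}
GROUP_PRODUCTS = {"yuppies": ["espresso machine", "road bike"]}
DEFAULT_PRODUCTS = ["bestsellers"]


def build_page(cookie_header):
    """Identify the visitor from the cookie and return customized HTML."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get("visitor_id")
    visitor_id = morsel.value if morsel is not None else None

    # Unknown or missing visitors fall through to the default product list.
    group = VISITOR_GROUPS.get(visitor_id)
    products = GROUP_PRODUCTS.get(group, DEFAULT_PRODUCTS)

    items = "".join(f"<li>{escape(name)}</li>" for name in products)
    return f"<html><body><h1>Recommended for you</h1><ul>{items}</ul></body></html>"


print(build_page("visitor_id=abc123"))  # customized page
print(build_page(""))                   # generic page for an unknown visitor
```

Removing or editing the cookie, as described earlier, changes which branch this code takes and therefore which page comes back.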
Despite the
inherent complexity of personalization technology, vendors are creating
interfaces to the software. The Oracle product even offers a GUI to
aid in the complex analysis of referrer statistics (refer to Figure 8).

Figure 8: The Oracle iAS Personalization GUI.
These tools use
sophisticated data mining technology to constantly peruse the database looking
for statistically significant correlations. Often, these predictions are
obvious (e.g., long underwear does not sell well in Hawaii), but other
predictions are amazing. Some predictive models have been able to accurately
target new products to exactly the type of people who will want them. This is
the classic "need creation" model used by successful companies like Sony. For
example, Sony launched the first Walkman because they had empirical evidence
that customers would look at the product and say "I need that," even though
the customer had no prior knowledge that they needed the product.
Conclusion
While the
technology surrounding personalization of Web content is already extremely
sophisticated, IT professionals are constantly making it even more complex.
Artificial Intelligence routines are now being employed to develop detailed
models about how consumers spend their disposable income.
Oracle
Corporation is making headway into this market with their Data Mining Suite
and Personalization Engine software. These products are sophisticated programs
that interface with the Oracle database.
Along with
these advances will come even more concerns about the loss of privacy on the
Internet, but for now, customers appear only too happy to provide the raw data
needed to analyze their spending patterns.