In this amazing and clever article titled “Data Mining 101”, author
Tom Owad claims that Amazon's (Oracle-powered) wishlists may
be used to find subversives, and he provides a technique for
downloading wishlists from the Amazon Oracle database.
This is important because it offers up a way to hoover data from
Oracle databases. Smart people like Owad can replicate the
web page transactions and vacuum off all of the Oracle data that
is exposed via the web site. With this technique, SQL becomes a
moot issue and the site's web API serves as the data delivery
mechanism.
Most of all, this outstanding article highlights the fact that
any and all Oracle data that is accessible over the web can be
extracted and cloned into another database.
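The idea of cloning web-exposed data into another database can be illustrated with a minimal sketch. This is not Owad's code: the sample page, the `items` table, and its `owner` and `title` columns are hypothetical, and the HTML is an inline string rather than a page fetched from any real site. It only shows that rows rendered in a web page can be parsed and loaded into a local database without any SQL access to the original.

```python
import sqlite3
from html.parser import HTMLParser


class RowExtractor(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # completed rows, one tuple per <tr>
        self._row = None       # cells of the row currently being parsed
        self._in_cell = False  # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows that contained no <td> cells
                self.rows.append(tuple(cell.strip() for cell in self._row))
            self._row = None


def clone_page(html, conn):
    """Parse one page and copy its rows into a local (hypothetical) table."""
    parser = RowExtractor()
    parser.feed(html)
    parser.close()
    conn.execute("CREATE TABLE IF NOT EXISTS items (owner TEXT, title TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", parser.rows)
    conn.commit()
    return len(parser.rows)


# Invented sample page standing in for data exposed by a web site.
SAMPLE_PAGE = """
<table>
  <tr><td>alice</td><td>Book A</td></tr>
  <tr><td>bob</td><td>Book B</td></tr>
</table>
"""

conn = sqlite3.connect(":memory:")
count = clone_page(SAMPLE_PAGE, conn)
```

After this runs, the local `items` table holds a copy of whatever the page displayed, which is the sense in which the back-end database engine becomes irrelevant to the person doing the extraction.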
http://www.applefritter.com/bannedbooks
“There are many websites and databases that could be used for
this project, but few things tell you as much about a person as
the books he chooses to read. Isn't that why the Patriot Act
specifically requires libraries to release information on who's
reading what? For this reason, I chose to focus on the
information contained in the popular Amazon wishlists.”
Owad gives an actual code listing showing how to extract readers of
“subversive” publications from Amazon’s Oracle database and,
scariest of all, directions for locating the readers:
“Using a pair of 5-year-old computers, two home DSL
connections, 42 hours of computer time, and 5 man hours, I now
had documents describing the reading preferences of 260,000 U.S.
citizens.
I downloaded all the files to an external 120 GB Firewire
drive in UFS format. The raw data occupied little more than 5
GB. I initially wanted to move all the files into a single
directory to facilitate searching, but as the directory contents
exceeded 100,000 items, the speed became glacially slow, so I
kept the data divided into chunks of 25,000 wishlists.”
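Owad's workaround for slow, oversized directories can be sketched as follows. This is an assumed reconstruction, not his script: the `shard_files` function, the `chunk_NNNN` directory naming, and the file names in the demo are all hypothetical. Only the chunk size of 25,000 comes from the quote.

```python
import shutil
import tempfile
from pathlib import Path


def shard_files(src_dir, dest_dir, chunk_size=25_000):
    """Move files from src_dir into numbered subdirectories of dest_dir,
    placing at most chunk_size files in each subdirectory."""
    src, dest = Path(src_dir), Path(dest_dir)
    files = sorted(p for p in src.iterdir() if p.is_file())
    for index, path in enumerate(files):
        # Integer division assigns files 0..chunk_size-1 to chunk 0, and so on.
        chunk = dest / f"chunk_{index // chunk_size:04d}"
        chunk.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(chunk / path.name))
    return len(files)


# Small demonstration with 7 dummy files and a chunk size of 3.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "raw")
    src.mkdir()
    for i in range(7):
        (src / f"wishlist_{i}.html").write_text("placeholder")
    moved = shard_files(src, Path(tmp, "chunks"), chunk_size=3)
    sizes = [len(list(d.iterdir())) for d in sorted(Path(tmp, "chunks").iterdir())]
```

Keeping each directory small avoids the slowdown Owad describes, at the cost of having to search across several directories instead of one.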
Owad notes that his Oracle data mining technique may soon be
used by the FBI to locate potential bad guys:
“On a final note, the FBI is
now hiring computer scientists to implement a project that
sounds very similar to what I just did:
"Currently, the FBI is
strengthening systems engineering in order to tie new systems
together architecturally and ensure that standards for custom
and packaged applications are enforced, and it needs engineers
to accomplish this goal, the agency said.
"The FBI is also focusing on
data warehousing as well as federated search technology, which
allows a single search query to be deployed across a number of
databases, regardless of whether those databases belong to the
same protocol or platform.
"'Warehousing has been very
successful, yet enterprise extraction, translation and loading
processes must be fine-tuned,' the FBI said. 'Data engineers are
needed to model legacy databases for federated search and
participate in legacy transition planning.'"(Computerworld)”
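The federated search idea mentioned in the Computerworld quote, one query deployed across several databases, can be sketched in a few lines. This is a simplified illustration under strong assumptions: both sources are in-memory SQLite databases sharing a hypothetical `people` table, whereas a real federated system must cope with different platforms and schemas. The function and source names are invented for the example.

```python
import sqlite3


def federated_search(connections, query, params=()):
    """Run the same query against every connection and merge the rows,
    tagging each row with the name of the source it came from."""
    results = []
    for name, conn in connections.items():
        for row in conn.execute(query, params):
            results.append((name, *row))
    return results


def make_source(rows):
    """Build a throwaway in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT, city TEXT)")
    conn.executemany("INSERT INTO people VALUES (?, ?)", rows)
    return conn


sources = {
    "db_a": make_source([("alice", "Austin"), ("bob", "Boston")]),
    "db_b": make_source([("carol", "Austin")]),
}
hits = federated_search(
    sources, "SELECT name FROM people WHERE city = ?", ("Austin",)
)
```

Here `hits` contains one tagged row per match from each source, so the caller sees a single merged result set without needing to know which database held which record.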