Oracle disk and the "personal petabyte"
Oracle Tips by Steve Karam
Nuno Souto (Noons) has a fascinating blog entry titled "No
Moore" in which he discusses the "personal petabyte": the idea that the sum of
one individual's experience will fit in a petabyte of information.
"Anyone interested in this should check out the research carried out by
Jim Gray
of Microsoft. In one of his papers he proposes that the entire life history
of a human being can be contained in a Petabyte (PB) of data - he calls it
the "Personal Petabyte".
Read it, it's very interesting and I believe he is 100% right. We are
heading fast into a world where it will be possible to store that PB about a
person in a finite storage element!
Any future marketing exercises wanting to address a given population better
be prepared to be able to digest this sort of volume of information, in a
usable time! Because it will happen, very soon, scaled out to the size of
their audience. Yes, Virginia: we're talking Hexabytes - HB - here!"
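To put rough numbers on that scale-up, here is a minimal arithmetic sketch. It assumes decimal (SI) prefixes, exactly one petabyte per person, and that the "Hexabytes" in the quote means exabytes; the function name is made up for illustration.

```python
# Decimal (SI) storage units, in bytes.
PB = 10**15  # petabyte
EB = 10**18  # exabyte
ZB = 10**21  # zettabyte

def audience_storage_bytes(people: int, bytes_per_person: int = PB) -> int:
    """Total bytes needed if every person in the audience carries one Personal Petabyte."""
    return people * bytes_per_person

for audience in (1_000, 1_000_000):
    total = audience_storage_bytes(audience)
    print(f"{audience:>9,} people -> {total // EB:,} EB")
```

Under those assumptions a mere thousand people already fill an exabyte, and a million-person audience lands in zettabyte territory.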
Noons is wondering how we're going to be able to use Personal Petabytes for
marketing if we don't have proper computing resources. Here is my take on
this issue:
Inside a personal petabyte
A Personal Petabyte would be mostly composed of non-correlated data. The movies
I watch have no true relationship to the sandwich I ate for lunch, except that
they are both related to me. If you were to try to form an ERD of a complete
Personal Petabyte (that is, every experience I have ever been through) you would
have a central "Me" table surrounded by thousands of child tables that have no
real bearing on each other except to form new instances of "Me."
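A toy sketch of that shape, in Python rather than a real ERD tool, might look like the following. The class, table names, and column names are all hypothetical; the only point is that the child tables share nothing except their parent.

```python
from dataclasses import dataclass, field

@dataclass
class Me:
    """Central 'Me' record; every child table hangs off this one key."""
    person_id: int
    # child table name -> list of rows, each row a plain dict of attributes
    children: dict = field(default_factory=dict)

    def record(self, table: str, **row) -> None:
        """Append one row to the named child table, creating it if needed."""
        self.children.setdefault(table, []).append(row)

    def shared_columns(self, table_a: str, table_b: str) -> set:
        """Column names two child tables have in common (besides the parent key)."""
        cols_a = {c for row in self.children.get(table_a, []) for c in row}
        cols_b = {c for row in self.children.get(table_b, []) for c in row}
        return cols_a & cols_b

me = Me(person_id=1)
me.record("movies_watched", title="a ninja movie", rating=4)
me.record("lunches_eaten", dish="sandwich", calories=450)

# The two child tables have no columns in common: only "Me" links them.
print(me.shared_columns("movies_watched", "lunches_eaten"))  # set()
```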
The problem you mention regarding the CPU and its memory being the true
bottleneck holds if we're talking about a traditional Von Neumann
architecture: a sequential flow of data between a CPU and its memory. However,
using multiple CPUs with cache and branch prediction, we can achieve a high
level of parallelism that can break through the conventional boundaries that you
mention. Add in Solid State Disk and you have an extremely fast system that can
tackle huge volumes of data.
But even that still falls under the Von Neumann architecture, which has mostly
been deemed inadequate to handle large amounts of non-correlated data such as a
Personal Petabyte. If the aim is to capture EVERY bit of data regarding a
person's life (and not just the data that pertains to our business/marketing
scheme), a different architecture entirely will be required.
I would say this is where neural networks come into play. Neural networks are
made to store huge amounts of raw sensory data, then process it with multiple
asynchronous systems to find patterns and correlations (e.g. "People who like
beef, wear flip flops, and watch movies about ninjas are more apt to buy Tide
Detergent than Gain"). The concept here is to mimic the human brain (and by
cross-referencing your Personal Petabyte with other people's Personal Petabytes,
to simulate a Super Conscious) to figure out what the next Instance of You and
ultimately the next Instance of Group Mind would like to buy.
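To make the kind of correlation I mean concrete, here is a minimal sketch. It is not a neural network: it swaps in a plain "lift" calculation from market-basket analysis, which is about the simplest way to ask whether a profile makes a purchase more likely than average. The data set is invented and the trait names are placeholders.

```python
# Hypothetical toy data: one set of observed traits per person.
people = [
    {"beef", "flip_flops", "ninja_movies", "tide"},
    {"beef", "flip_flops", "ninja_movies", "tide"},
    {"beef", "flip_flops", "ninja_movies", "gain"},
    {"tofu", "sneakers", "tide"},
    {"tofu", "sneakers", "gain"},
    {"beef", "sneakers", "gain"},
]

def lift(population, profile, product):
    """How much more likely people matching `profile` are to buy `product`
    than the population as a whole (1.0 means no effect)."""
    matching = [p for p in population if profile <= p]
    buyers = [p for p in population if product in p]
    if not matching or not buyers:
        return 0.0
    confidence = sum(product in p for p in matching) / len(matching)
    base_rate = len(buyers) / len(population)
    return confidence / base_rate

profile = {"beef", "flip_flops", "ninja_movies"}
print(f"Tide: {lift(people, profile, 'tide'):.2f}")  # Tide: 1.33
print(f"Gain: {lift(people, profile, 'gain'):.2f}")  # Gain: 0.67
```

In this made-up sample the beef-and-ninjas crowd buys Tide more often than average and Gain less often. A real system would have to run this sort of test across an enormous number of trait combinations, which is exactly where the processing problem comes from.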
I suppose the closest thing we would have in the current abilities of Oracle
would be a massive snowflake schema based around a fact table called "HUMAN".
All the data, forming together into every instance of a person, would be
crunched and Materialized Views generated to store statistics regarding all the
discovered correlations. A system such as this would absolutely require
lightning-fast disk resources such as SSD, coupled with a large number of
processors distributed amongst multiple systems in order to crunch the data.
And because of the randomness of human experience, a large amount of the data
would have to be self-identifying...SYS.ANYDATA galore!
SQL> CREATE NEURAL CLUSTER "PERSON"
2 (id number not null,
....
4389475394843220598 fingernail_clip_date date);
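Joking aside, the self-identifying idea can be sketched in a few lines. This is only an illustration under loud assumptions: it uses Python's built-in sqlite3 module instead of Oracle, an attr_type column stands in for what SYS.ANYDATA does natively, and a summary table rebuilt on demand stands in for a Materialized View (SQLite has none). All table names, column names, and sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE human (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE experience (
        human_id   INTEGER NOT NULL REFERENCES human(id),
        attr_name  TEXT NOT NULL,
        attr_type  TEXT NOT NULL,   -- each value carries its own type tag
        attr_value TEXT NOT NULL
    );
""")

def record(human_id, attr_name, value):
    """Store any value together with a tag describing what it is."""
    conn.execute(
        "INSERT INTO experience VALUES (?, ?, ?, ?)",
        (human_id, attr_name, type(value).__name__, str(value)),
    )

def refresh_stats():
    """Rebuild the summary table, playing the role of a Materialized View refresh."""
    conn.executescript("""
        DROP TABLE IF EXISTS experience_stats;
        CREATE TABLE experience_stats AS
            SELECT attr_name, attr_type, COUNT(*) AS n
            FROM experience
            GROUP BY attr_name, attr_type;
    """)

conn.execute("INSERT INTO human VALUES (1, 'example person')")
record(1, "lunch", "sandwich")
record(1, "movies_watched", 3)
record(1, "fingernail_clip_date", "2007-01-15")  # made-up sample value

refresh_stats()
stats = conn.execute(
    "SELECT attr_name, attr_type, n FROM experience_stats ORDER BY attr_name"
).fetchall()
print(stats)
```

The design choice worth noticing is that new kinds of experience need no schema change: they arrive as new attr_name values, and the summary picks them up on the next refresh.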