Real-Time Data Warehousing and the Semantic Web
June 23, 2005
Mark Rittman

Neil Raden, whose article on Model-Driven Approaches For BI Projects I linked to last year, dropped me a line to tell me about some new real-time ETL articles he'd written:


"I've written some articles and white papers about real-time data warehousing:


...but I think the really interesting part of it is not data warehousing, per se, but abstraction and real-time analytics. Abstraction can provide the logical-to-physical layer between a data warehouse and a BI app, but it can also provide the kind of rich meaning we need for our machines to do some reasoning, something all current data warehousing and BI concepts lack.

For that, I'm investigating the competing approaches of the Semantic Web and Emergent Semantics, though I'm leaning toward the latter.

With good semantics and Moore's Law, much of data warehousing becomes irrelevant. As for BI, most of it is parlor tricks. I'm hoping to see a new batch of tools that can reason and learn, at least in the limited domains of commerce, supply chain, CRM, etc. That's where real time will show some returns."

Apart from the list of Neil's ETL and data warehousing articles, there are a couple of good (free) e-books linked from Neil's site: one on ETL and Data Integration (including articles on Kimball vs. Inmon, the model-driven BI project and two on real-time data warehousing), and another specifically about real-time data warehousing. You have to go through an annoying registration process to get the books, and they're Windows-only executables, but it looks like there's some interesting content there. See also "Implementing Real-Time Data Warehousing Using Oracle 10g" on DBAZine.

Real-time data warehousing is an interesting area, and one that's addressed by some of the new features in OWB "Paris": the ability to accept data from Advanced Queues and web services, and the ability to publish out to the same, so that you can publish an OWB mapping that "listens" for ETL data and then transforms it and hands it off in real time. The reality though, at least as far as I've experienced in the UK, is that the market isn't really clamouring for this at the moment, at least not in any volume. What is of interest though is reducing ETL load time to as close to zero as possible, with as little impact as possible on users who are accessing the system, and it's this requirement that's driving my interest in this area. Whilst it will probably be a while before a significant number of OWB users use technologies such as web services and AQs to process their ETL jobs, RDBMS technologies such as external tables, table functions/pipelining and change data capture are already getting take-up and are starting to become a normal feature of OWB projects.
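
To make the "listening" idea a bit more concrete, here's a rough sketch of the pattern in Python. It's not OWB or Oracle code: plain in-memory queues stand in for Advanced Queues, and the record layout, the transform() function and the stop signal are all made up for illustration. The point is just the shape of it, where each record is transformed and handed on as it arrives rather than waiting for a batch window.

```python
import queue
import threading

# Hypothetical stand-ins: a real system would read from an Oracle Advanced
# Queue or a web service; here plain in-memory queues play both roles.
SENTINEL = None  # placed on the inbound queue to tell the listener to stop


def transform(record):
    """Toy transformation: tidy the customer name and convert pence to pounds."""
    return {
        "customer": record["customer"].strip().upper(),
        "amount_gbp": record["amount_pence"] / 100,
    }


def listen(inbound, outbound):
    """Block on the inbound queue and hand each transformed record straight on."""
    while True:
        record = inbound.get()  # blocks until a message arrives
        if record is SENTINEL:
            break
        outbound.put(transform(record))


inbound, outbound = queue.Queue(), queue.Queue()
worker = threading.Thread(target=listen, args=(inbound, outbound))
worker.start()

# Source systems publish changes as they happen, not in a nightly batch.
inbound.put({"customer": " acme ltd ", "amount_pence": 1250})
inbound.put({"customer": "widgets plc", "amount_pence": 99})
inbound.put(SENTINEL)
worker.join()

# Drain the outbound queue; in a warehouse this would be the load step.
loaded = []
while not outbound.empty():
    loaded.append(outbound.get())
```

In a real deployment the outbound side would be an insert into the warehouse or another queue, and you'd want error handling and some way of replaying failed messages, none of which is shown here.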

Must also take a proper look at this "semantics" stuff - it's been cropping up a lot recently and I need to get a better understanding of what it's all about...