- Buy a single, large Unix server (probably a HP box running HP-UX) with 8 CPUs
- Buy two or more commodity "Lintel" boxes, with 2 or 4 processors, running Red Hat Advanced Server, and use RAC to create a single database instance.
But just how suitable is Real Application Clusters for data warehousing? The reason I started doubting RAC's suitability was an article I came across entitled "Oracle Real Application Clusters and Data Warehouse Applications" which stated the following:
"Oracle has marketed the Real Application Clusters (RAC) option as appropriate for all types of applications, but there is one very important exception, Oracle data warehouses. Oracle RAC is used primarily for continuous availability of mission-critical systems (i.e. banking applications) and for transparent scalability of massive online transactions systems (i.e. Amazon). However, it is not always the best architecture for data warehouse that require large materialized view rollups because they often require large, cohesive RAM regions (large data sorts) and tightly-coupled CPU’s (fast parallel query) ... In sum, Oracle Real Application Clusters is a wonderful tool for mission-critical database that must have continuous availability (using RAC with Transparent Application Failover (TAF)) and for scalability of super-large OLTP systems, Oracle RAC is not the best choice for data warehouse applications that require high-speed table scan performance."
Now I know Don and Mike have spent quite a bit of time working with RAC, so I thought it wise to look into this in a bit more detail. What I did in the end was post a link to the article on the Dizwell Forum, and ask the forum participants for their experiences with RAC and data warehousing. I won't repeat the whole thread (which went on for three pages, so it's obviously an area of interest) but if you're considering RAC for a data warehousing project, take a look at the thread and Don's article, as they're both useful bits of background reading. My conclusion? A large, single server and a collection of clustered servers aren't interchangeable concepts; there could be issues with RAC with "edge cases", for instance (as mentioned in the forum comments) where a parallel query needs to run over multiple nodes, and the "large cohesive memory area" issue I still think could be an problem if you need more memory than a single small server can provide. The lesson I guess though is that there's no "black and white" rule for this sort of thing (not that the original article was suggesting this) and it's worth getting a bit of peer advice if you're looking to get an answer to a question such as this, with presumably big implications if you get it wrong.