The Data Warehouse Development Life Cycle

The Scope Of Work (SOW) Agreement

* A determination of the amount of aggregation and summarization for each fact. This is sometimes called the level of data granularity. For example, the SOW might state that monthly sales will be tracked against sales district, inventory class, item category, and so on, until every aggregation has been identified. The level of summarization varies from system to system, but the SOW needs to make clear which levels of summarization will be available to end-users and whether they will be able to drill down into increasing levels of detail. For example, most Oracle warehouses do not allow end-users to drill down to the lowest level of transaction detail because this data is often stored in a non-Oracle database system.

* The choices for hardware and software platforms. While designers may not know the exact types and model numbers of disks, CPUs, and software programs, they should be able to specify broad categories of hardware and software elements. For example, at this point, designers should know what processors are required, and they should be able to estimate the necessary amount of disk space and the cost of the Oracle engine and any other client software and statistical packages that will be used. (Many experienced managers have remarkable success in estimating disk space by using a SWAG, a scientific wild-assed guess. A SWAG should apply proven statistical estimation techniques to educated guesses. When a SWAG made by an experienced designer is later compared to actual figures, it often proves to be quite accurate.)

* A listing of the functional deliverables. This list is a functional description of the analytical capabilities that end-users expect from the data warehouse. For example, the listing might state that end-users require a multidimensional presentation of Oracle data and that they desire simulation, modeling, decision support, forecasting, data mining, and so on. The listing should be a broad description of functional requirements, without delving into technical or product-specific details. Often, the details will not become apparent until the warehouse design phase. At this point, designers merely specify the type of end-user delivery metaphor that will be used. For example, multidimensional data analysis commonly uses a spreadsheet metaphor--this can be stated without actually picking a spreadsheet product.

* A detailed cost-benefit analysis. As discussed earlier in the chapter, a detailed cost-benefit analysis includes a complete description of the costs and benefits of the data warehouse, both tangible and intangible, expressed in net present value dollars. This analysis may also include an estimate of the payback period for the project.

* A current project plan. This document records the high-level plan and the work breakdown structure. It should include rough estimates of the number of project participants and the skills they will need, as well as a Gantt chart showing the progression of the major phases of the project.
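
To make the granularity and disk space items above more concrete, here is a minimal sketch in Python of how a SWAG for a single summary table might be worked out. It is only an illustration under stated assumptions: the function names, the dimension sizes, the density, the row width, and the overhead factor are all hypothetical placeholders, and crossing all three dimensions in one table is just one of many possible aggregations.

```python
# Rough disk-space estimate for one summary (aggregate) table.
# Every number below is a made-up placeholder, not a figure from the text.

def estimate_rows(dimension_cardinalities, periods, density):
    """Product of the dimension sizes times the number of periods,
    scaled by the fraction of combinations assumed to actually occur."""
    combos = 1
    for size in dimension_cardinalities:
        combos *= size
    return int(combos * periods * density)


def estimate_bytes(rows, bytes_per_row, overhead_factor):
    """Raw data size inflated by a factor for indexes, free space, etc."""
    return int(rows * bytes_per_row * overhead_factor)


# Monthly sales by sales district, inventory class, and item category.
rows = estimate_rows(
    dimension_cardinalities=[50, 20, 200],  # districts, classes, categories
    periods=36,                             # three years of monthly data
    density=0.25,                           # assume 25% of combinations occur
)
size = estimate_bytes(rows, bytes_per_row=100, overhead_factor=2.0)

print(f"{rows:,} rows, about {size / 1024**2:.0f} MiB")
```

Repeating the calculation for each aggregation listed in the SOW, and for the detail level if it is kept in the warehouse, gives a defensible first total. The sketch also shows why granularity matters so much to cost: adding one more dimension multiplies the row count by that dimension's size.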

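The net present value and payback period called for in the cost-benefit analysis can be computed with a few lines of arithmetic. The sketch below is a hypothetical illustration in Python: the function names, the cash flows, and the 10 percent discount rate are assumptions chosen for the example, and the payback calculation uses undiscounted cash flows with linear interpolation inside the recovery year, which is one common convention rather than the only one.

```python
def net_present_value(cash_flows, discount_rate):
    """cash_flows[0] is the up-front amount (year 0); later entries are
    year-end net flows. Costs are negative, benefits are positive."""
    return sum(cf / (1 + discount_rate) ** year
               for year, cf in enumerate(cash_flows))


def payback_period(cash_flows):
    """Years until the cumulative (undiscounted) cash flow first reaches
    zero, interpolating within the recovery year. None if never repaid."""
    cumulative = 0.0
    for year, cf in enumerate(cash_flows):
        previous = cumulative
        cumulative += cf
        if cumulative >= 0:
            if year == 0:
                return 0.0
            # Fraction of this year's flow needed to close the gap.
            return (year - 1) + (-previous / cf)
    return None


# Hypothetical project, in thousands of dollars: 1,000 spent up front,
# then a net benefit of 400 in each of the following four years.
flows = [-1000, 400, 400, 400, 400]

npv = net_present_value(flows, discount_rate=0.10)
payback = payback_period(flows)

print(f"NPV: {npv:.1f}  payback: {payback} years")
```

A positive net present value means the discounted benefits outweigh the costs at the chosen rate. Intangible benefits have to be given an estimated dollar value before they can enter the cash flows, so those estimates should be recorded in the SOW alongside the result.
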
In addition to specifying as much detail as possible, the SOW should spell out the functions that the data warehouse will not be able to perform. The data warehouse development staff must take every precaution against misconceptions in the end-user community. It is far better for end-users to have low expectations that the Oracle warehouse exceeds than to have grandiose, unrealistic expectations that end in disappointment during the implementation phase of the project.

The SOW should be regarded as a binding agreement between end-user management and data warehouse management. It should be as formal as possible, spelling out a realistic scope and delivery schedule for each phase of the project. Numerous data warehouse projects have collapsed after data warehouse management spent several million dollars on hardware and software, only to become mired in issues of scale. Most savvy data warehouse managers create rough prototypes to demonstrate the data warehouse to end-user managers, but prototypes can create difficulties of their own: end-user managers who have already seen a working prototype may believe that much of the work is done, and may not understand why the project is taking so long. Such misunderstandings are common, but they can be minimized if data warehouse management helps end-user managers understand the technical issues involved in creating the warehouse. No matter how well designed a data warehouse might be, it is imperative to keep the lines of communication open between end-user management and data warehouse management in every phase of warehouse development. In short, the SOW makes or breaks a data warehouse development effort, and it should be treated very formally.

This is an excerpt from "High Performance Data Warehousing", copyright 1997.
