Oracle 11g Data Compression Tips for the Database
Administrator
Oracle 11g Tips by Burleson Consulting
This is an excerpt from the book "Oracle 11g New Features". Tests show that 11g compression results in slower transaction throughput but fewer writes, because of the higher row density on the data block.
See this benchmark of transparent data encryption.
Also see this quick note on how to compress Oracle tablespaces.
While it is true that data storage prices (disks) have fallen dramatically over the last decade (and continue to fall rapidly), Oracle data compression has far more appealing benefits than simply saving on disk storage cost. Because the data itself can be highly compressed, information can be fetched from the disk devices with less physical I/O, which radically improves query performance under certain conditions.
Please note that there is a common misconception that 11g table compression decompresses data while reading and holds it in uncompressed form in the cache. That is not quite correct: one of the salient features of the database compression is that we do not have to uncompress the data before reading it, and the data stays in compressed form even in the cache.
As a result, Oracle table compression not only helps customers save disk space, it also helps to increase cache efficiency, since more blocks can now fit in memory.
According to the Oracle whitepaper on 11g data compression, the CPU overhead for the compress/decompress operations will be minimal. More important, Oracle 11g data compression will be a godsend for shops that are constrained by Federal regulation (HIPAA, SOX) to archive their audit trails.
Best of all, because Oracle 11g compression makes storage up to 3x cheaper, it makes solid-state flash drives far less expensive, allowing Oracle shops to forever eliminate the high costs of platter-disk I/O and enjoy data access speeds up to 300x faster.
Let's take a closer look at how one would
implement Oracle
11g Data Compression in order to achieve the optimal results.
It is expected that Oracle data compression will eventually become a default for Oracle systems, much like Oracle's move from dictionary-managed tablespaces to locally-managed tablespaces. Eventually, data compression may become ubiquitous, a general part of the Oracle database management engine, but it's important for the Oracle database administrator to understand the ramifications of data compression and have the ability to turn off compression at will.
For example, super-small servers (read: PCs) may not possess enough horsepower to absorb the small (but measurable) overhead of the compress/decompress routines. Remember, there is always a tradeoff between these CPU costs and the savings on disk storage, along with the ability to retrieve information with the minimum amount of physical disk I/O.
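As a hedged illustration of turning compression off (and back on) at will, the DDL below sketches the relevant ALTER TABLE forms; the table name mytab is illustrative only:

-- Stop compressing future inserts; existing blocks are left as they are
alter table mytab nocompress;

-- Rebuild the segment so existing rows are stored uncompressed
-- (takes an exclusive lock while the move runs)
alter table mytab move nocompress;

-- Re-enable 11g row-level compression for subsequent DML
alter table mytab compress for all operations;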
A history of database compression
Data compression algorithms (such as the Huffman algorithm) have been around for decades, but only today are they being put to use within mainstream information systems processing. All of the industrial-strength databases offer some form of data compression (Oracle, DB2, CA-IDMS), while it is unknown within simpler data engines such as Microsoft Access and SQL Server. There are several places where data can be compressed, either external to the database or internally, within the DBMS software:
Physical database compression
- Hardware-assisted compression - IMS, the first commercially available database, offers Hardware Assisted Data Compression (HDC), which interfaces with the 3380 DASD to compress IMS blocks at the hardware level, completely transparent to the database engine.
- Block/page-level compression - Historical database compression uses external mechanisms that are invisible to the database. As blocks are written from the database, user exits invoke compression routines to store the compressed block on disk. Examples include CLEMCOMP, Presspack and InfoPak. In 1993, the popular DB2 introduced built-in tablespace-level compression using a COMPRESS DDL keyword.
Logical database compression
Internal database compression operates within the DBMS software and writes pre-compressed blocks directly to DASD:
- Table/segment-level compression - A database administrator has always had the ability to remove excess empty space within table blocks by using Oracle Data Pump or Oracle's online reorganization utility (dbms_redefinition). By adjusting storage parameters, the DBA can tightly pack rows onto data blocks. In 2003, Oracle9i release 2 introduced a table-level compression utility using a table DDL COMPRESS keyword; here is Oracle's TPC-H compression benchmark from 2006, and this blogger noted problems with the Oracle 9i compression:
"Secondly, we implemented data segment compression, and now we have to keep running mods against the sys.obj$ table to prevent "block too fragmented to build bitmap index" errors."
- Row-level compression - In 2006, DB2 extended their page-level compression with row-level compression. Oracle 11g offers a row-level compression routine.
- Transparent data encryption (TDE) - See here for Oracle transparent data encryption Tips.
Let's take a closer look at the historical
evolution of database compression.
External database compression
Legacy mainframe databases such as IDMS and DB2 allowed the DBA to choose any data compression algorithm they desired. One popular database compression program was offered by Clemson University (still a leader in data compression technologies), and their CLEMCOMP and CLEMDCOM programs were very popular in the early 1980's. In IDMS, a user exit in the DMCL allowed the data compression routine to be invoked "before put" (writes) and "after get" (reads). These external routines avoid changing the database's internal software by relying on user exits to compress the data outbound and decompress the incoming data before it enters the database buffers. In general, external data compression follows this sequence:
Read compressed data (decompress):
1 - The database determines that a physical read is needed and issues an I/O request.
2 - Upon receipt of the data block from disk, the routine uncompresses the data. This happens in RAM, very quickly.
3 - The uncompressed data block is moved into the Oracle buffer.
Write a compressed Oracle block (compress):
1 - The database determines that a physical write is needed and issues an I/O request.
2 - The database reads the data block from the buffer and calls a compression routine to quickly compress the data. This happens in RAM, very quickly.
3 - The compressed data block is written to disk.
Of course, database compression has evolved
dramatically since it was first introduced in the early 1980's, and Oracle
11g claims to have one of the best database compression utilities ever made.
Let's take a closer look.
Oracle Compression Overview
Over the past decade Oracle has introduced several types of compression, so we must be careful to distinguish between the disparate tools (a syntax sketch follows the list below). The 11g data compression is threshold-based and allows Oracle to honor the freelist unlink threshold (PCTFREE) for the compressed rows, thereby allowing more rows per data block.
- Simple index compression in Oracle 8i
- Table-level compression in Oracle9i release 2 (create table mytab COMPRESS)
- LOB compression (utl_compress) in Oracle 10g
- Row-level compression in Oracle 11g, even for materialized views (create table mytab COMPRESS FOR ALL OPERATIONS;)
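Here is a minimal syntax sketch of the four options above. The object names (mytab, mytab_idx, mytab_oltp) are illustrative only, and the utl_compress call simply shows the packaged API rather than a tuned example:

-- Oracle9i release 2: basic table-level compression (best suited to direct-path loads)
create table mytab compress
   as select * from all_objects;

-- Oracle 8i-style index key compression (compress the leading key column)
create index mytab_idx on mytab (owner, object_name) compress 1;

-- Oracle 10g: programmatic compression of RAW/LOB data with utl_compress
declare
   raw_in  raw(2000) := utl_raw.cast_to_raw(rpad('x', 2000, 'x'));
   raw_out raw(2000);
begin
   raw_out := utl_compress.lz_compress(raw_in);
   dbms_output.put_line('compressed length = ' || utl_raw.length(raw_out));
end;
/

-- Oracle 11g: row-level (OLTP) compression for conventional DML
create table mytab_oltp compress for all operations
   as select * from all_objects;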
The historical external compression (blocks are compressed outbound and uncompressed before being presented to the database) is far simpler because every block is treated the same way, whereas with 11g table compression a single data block may contain both compressed and uncompressed row data.
For the official details on Oracle 11g table data compression,
see
the
Oracle 11g concepts documentation. Oracle says
that their software will perform a native disk read on the data block, only
decompressing the row data after the physical I/O has been completed.
Within the data buffers, the block remains in compressed form, just as it does on the disk data blocks themselves, and individual rows are expanded only as they are accessed. Upon applying Oracle data compression, people will find that far more rows fit on a data block of a given size, yet there is still no impact on the database management system from the point of view of the SGA (system global area).
Because row decompression happens only as rows are fetched from the block, the Oracle data buffers remain largely unchanged while the data blocks themselves carry far more rows. This Oracle 11g data compression whitepaper describes the data compression algorithm:
Compressed blocks contain a structure called a symbol table that
maintains compression metadata.
When a block is compressed, duplicate values
are eliminated by first adding a single copy of the duplicate value to the
symbol table. Each duplicate value is then replaced by a short reference to
the appropriate entry in the symbol table.
Through this innovative design, compressed data is self-contained
within the database block as the metadata used to translate compressed data
into its original state is contained within the block.

Oracle 11g compression (Source: Oracle Corporation)
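As a hedged, self-contained illustration of the block-level duplicate elimination described above, the sketch below builds two copies of the same duplicate-heavy data, one compressed and one not, then compares segment sizes. The table names sales_uncomp and sales_comp are invented for this example, and the actual savings depend entirely on how repetitive your data is:

-- Duplicate-heavy test data: few distinct values per column
create table sales_uncomp as
   select mod(rownum, 10)            as region_id,   -- only 10 distinct values
          'COMPLETE'                 as status,      -- highly repetitive column
          sysdate - mod(rownum, 365) as sale_date
     from dual
  connect by level <= 100000;

-- Same rows, stored with 11g row-level compression
create table sales_comp compress for all operations as
   select * from sales_uncomp;

-- Compare the resulting segment sizes
select segment_name, round(bytes/1024/1024, 1) as mb
  from user_segments
 where segment_name in ('SALES_UNCOMP', 'SALES_COMP');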
In today's Oracle database management systems, physical disk I/O remains one of the foremost bottlenecks. Even at relatively fast speeds of 10 milliseconds, many data-intensive Oracle applications can still choke on I/O, with disk enqueues of Oracle block-read tasks waiting to pull information from the spinning platters. Data compression is certainly useful for reducing the amount of physical disk I/O, but there are some caveats that the Oracle database administrator needs to observe.
Costs and Benefits of 11g compression
One of the exciting new features of Oracle 11g is the new inline data compression utility that promises these benefits:
- Up to a 3x disk savings - Depending on the nature of your data, Oracle compression will result in huge savings on disk space.
- Cheaper solid-state disk - Because compressed tables reside on fewer disk blocks, shops that might not otherwise be able to afford solid-state flash disk can now enjoy I/O speeds up to 300x faster than platter disk.
- Faster full-scan/range-scan operations - Because tables reside on fewer data blocks, full table scans and index range scans can retrieve the rows with less disk I/O.
- Reduced network traffic - Because the data blocks are compressed/decompressed only within Oracle, the external network packets will be significantly smaller.
Tests show that 11g compression results in slower transaction throughput but fewer writes, because of the higher row density on the data block. See this benchmark of transparent data encryption.
Note: When rows are first inserted into a data block, Oracle does not compress them. Compression begins only later, when a subsequent insert consumes the block's free space threshold; at that time, all existing rows within the block are compressed.
Overall, the benchmark shows that I/O writes are reduced while CPU consumption increases, resulting in slower SQL throughput:
- Slower transaction throughput - As we would expect, Oracle transactions run faster without the encryption/decryption processing overhead. This encryption benchmark shows significantly slower throughput when deploying TDE (81 transactions/second with TDE vs. 121 transactions/second without).
- Fewer disk writes - Because the data is compressed, the benchmark run with TDE required fewer disk writes.
- More CPU required - As we would expect, TDE requires CPU cycles for the encrypt/decrypt operations, and in this benchmark test we see user CPU rise from 46 to 80 when using TDE data encryption.
The overhead of 11g compression?
Remember, physical disk I/O against disk platters has become the major system bottleneck as processor speeds increase. Until the widespread adoption of RAM disk (solid-state disk), we can see this type of data compression being widely used to reduce the amount of physical disk I/O against Oracle systems.
The internal machinations of Oracle have always been a closely-guarded secret; Oracle's internal software is their bread-and-butter "edge" that gives them such a huge competitive advantage over the competition. Because Oracle withholds many internal details, we must discover the internals of 11g compression through real-world observations and conjecture.
First, Oracle hires some of the brightest software engineers in the world (graduates of prestigious colleges like MIT), and it's likely that overhead will be minimized by doing the compress/uncompress work only once, at disk I/O time.
It's clear that 11g data compression offers these huge benefits, but the exact overhead costs remain unknown. Oracle explains their new 11g data compression algorithm:
"The algorithm works by
eliminating duplicate values within a database block, even across multiple
columns. Compressed blocks contain a structure called a symbol table that
maintains compression metadata.
When a block is compressed,
duplicate values are eliminated by first adding a single copy of the
duplicate value to the symbol table. Each duplicate value is then replaced
by a short reference to the appropriate entry in the symbol table."
Here is a 2005 benchmark test on Oracle table compression; indications are that the 11g data compression is even faster than this 2005 version. Among the findings is the important suggestion that using Oracle table compression may actually improve the performance of your Oracle database:
“The reduction of disk
space using Oracle table compression can be significantly higher than standard
compression algorithms, because it is optimized for relational data.
It has virtually no
negative impact on the performance of queries against compressed data; in fact,
it may have a significant positive impact on queries accessing large amounts of
data, as well as on data management operations like backup and recovery.”
This compression paper also suggests that using a large blocksize may benefit Oracle databases where rows contain common redundant values:
“Table compression can
significantly reduce disk and buffer cache requirements for database tables.
Since the compression algorithm utilizes data redundancy to compress data at a
block level, the higher the data redundancy is within one block, the larger the
benefits of compression are.”
The article also cites evidence that Oracle table compression can reduce the time required to perform large-table full-table scans by half:
“The fts of the
non-compressed table takes about 12s while the fts of the compressed table takes
only about 6s.”
Some unknown issues (as of September 2007) with implementing Oracle 11g data compression include the amount of overhead. The compress/decompress operations consume CPU, but the per-block cost is tiny (probably measured in microseconds). This CPU overhead might be measurable, but we can assume that the overhead will be the same as (or smaller than) data compression in legacy databases (with the possible exception of PC-based Oracle databases). In a perfect implementation, incoming data would only be decompressed once (at read time), thereby minimizing changes to the Oracle kernel code. The overhead on DML must involve these operations (see the dictionary query after this list):
- Overhead at DML time - Whenever a SQL update, insert, or delete changes a data block in RAM, Oracle must determine whether the data block should be unlinked from the freelist (this threshold is defined by the PCTFREE parameter).
- Compression on write - An outbound data block must be compressed to fit into its target disk block size (as defined by db_block_size and the tablespace blocksize keyword). For example, an uncompressed block image in RAM might occupy up to 96k and be compressed into its 32k disk blocksize upon a physical disk write.
- Decompress on read - At physical read time, incoming disk blocks must be expanded once and stored in the RAM data buffer. The exact mechanism for this expansion is not published in the Oracle 11g documentation, but it's most likely a block-versioning scheme similar to the one used for maintaining read consistency.
- Increased likelihood of disk contention - Because the data is tightly compressed on the data blocks, more rows can be stored, thus increasing the possibility of "hot" blocks on disk. Of course, using large data buffers and/or solid-state disk (RAM-SAN) will alleviate this issue.
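As a quick sanity check on how a given table is configured, here is a hedged sketch of a dictionary query; the table names are the illustrative ones used earlier, and the COMPRESS_FOR values shown are those documented for 11g:

select table_name,
       pct_free,        -- free-space threshold that drives block compression
       compression,     -- ENABLED or DISABLED
       compress_for     -- e.g. FOR ALL OPERATIONS or DIRECT LOAD ONLY
  from user_tables
 where table_name in ('SALES_COMP', 'SALES_UNCOMP');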
Without this revolutionary "partial" row compression, making a table-wide or tablespace-wide compression change would require a massive update to the blocks within the target tablespace. The 11g compression docs note that when changing to or from global compression, the risk-averse DBA may choose to rebuild the table or tablespace from scratch (a sketch of both approaches follows the quote below):
"Existing data in the database can also
be compressed by moving it into compressed form through
ALTER
TABLE
and
MOVE
statements. This operation takes an exclusive lock on the
table, and therefore prevents any updates and loads until it
completes. If this is not acceptable, the Oracle
Database online redefinition utility (the
DBMS_REDEFINITION
PL/SQL package) can be used."
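Here is a minimal sketch of the two approaches, under the assumption that sales_uncomp exists and that a compressed interim table sales_interim has already been created with the same structure; a production redefinition would also call can_redef_table and copy_table_dependents:

-- Offline: rebuild the segment compressed (exclusive lock for the duration)
alter table sales_uncomp move compress for all operations;

-- Online alternative: dbms_redefinition copies the rows into the compressed
-- interim table while the original table remains available for DML
begin
   dbms_redefinition.start_redef_table(
      uname      => user,
      orig_table => 'SALES_UNCOMP',
      int_table  => 'SALES_INTERIM');
   dbms_redefinition.finish_redef_table(
      uname      => user,
      orig_table => 'SALES_UNCOMP',
      int_table  => 'SALES_INTERIM');
end;
/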
Operational tests of the 11g data compression utility
Roby Sherman published some negative test results about the 11g data compression utility, suggesting that there may be performance issues and noting that his performance was "quite horrible". However, this does not match the experience of others, and it serves to underscore the point that data compression is marginally CPU-intensive and that 11g table compression may not perform well on smaller personal computers:
For anyone interested, I ran some very quick benchmarks on 11g's new Advanced Compression table option COMPRESS FOR ALL OPERATIONS that Oracle is claiming was "specifically written to create only the most 'minimal' of performance impacts in OLTP environments."
I guess their definition of minimal and my definition of minimal must be different.
Other comments on Sherman's 11g compression test included:
Rather than commenting
that the advanced compression is "quite horrible" I'd comment that your
choice of tests are quite horrible.
I don't consider Roby's test cases to be horrible. Anyone who criticizes the hype seems to get criticized. Look at Roby's argument: "But, then again, if your operations are that minimal, you probably aren't creating enough data to need compression in the first place!"
Robert
Freeman noted that his results did not
show the same degradation and he offers insightful comments about the
dangers of using a "negative proof":
"I've read this particular post several times. I just have to believe
that I'm not getting something here, because ..... I want to be charitable
but the point that is being made is just asinine in my opinion. I hope, I
really hope, that I've missed something obvious and that I'm the fool (would
not be the first time - I freely confess my imperfections).
>> The
common nonsense peddled is like: It all depends.
EXCUSE ME?????
Common nonsense? The whole scientific method is all about Ceteris paribus.
Research is influenced heavily on IT DEPENDS. Drop a rock and a feather on
the Earth and then on the moon and tell me the results are not a matter of
IT DEPENDS. I must have missed your point, because nobody could be so short
sighted as to presuppose that there are no dependencies.
Can you
explain negative cases? Sure. I can explain the negative case of the rock
falling faster than the feather to the difference in location and criteria
of the experiment. I can explain Roby's negative results in numerous ways,
including accepting the *possibility* that his results reflect truth, and
that compression is a performance killer. Did he provide sufficient evidence
to review his results, of course not. How do we know the issue isn't one of
the optimizer changing the plan, as opposed to the physical implementation
of compression, for example? We don't because no execution plans were
provided.
That being said, his results do not mirror mine. Explain
that negative case. Oh, is it because my results are on Oracle Beta 5?
Or is it that my results are on a faster CPU? Can we always explain
negative cases? No. . .
Additionally, I argue that one can never,
ever, systematically prove everything 100%. Perhaps to a high degree of
confidence, but never for sure. Why? Because the conditions of that result
and that analysis can change because the dependencies change. You can not
control the entire environment, thus you can not 100% guarantee an outcome,
ever. If you have never had the frustrating experience of having two
different result sets and not being
able to figure out why they differ,
then you are luckier than I (or younger, or you have more time or less
experience).
. . .
While I have not tested compression in the
production code (yet, I'm running the book through production now), when I
did my chapter on compression in Beta 5, I found the results to be very
different from Roby's. Still, I'm glad to see him testing this stuff and
reminding us that not every new feature is a panacea."
But these unverifiable reports from DBAs are not conclusive, and we need a reproducible benchmark to ascertain the true costs and benefits of using Oracle 11g compression.
A benchmark of 11g compression with SSD
To see the
effects of Oracle 11g data compression we need a reproducible benchmark test
that illustrates the benefits and costs of utilizing 11g compression,
especially with SSD flash drives.
A reproducible TPC-H benchmark can conclusively show these performance benefits of using Oracle 11g data compression with SSD:
- Solid-state Oracle becomes cheaper - This test is expected to show that the use of Oracle 11g compression can halve the cost of solid-state RAM-SAN, making diskless Oracle far more cost-effective.
- Scan operations up to 200 times faster - The benchmark is expected to confirm a dramatic speed improvement for index range scans and full scan operations, including full-table scans and index fast full scans. The performance benefit of combining 11g compression and SSD is expected to be several orders of magnitude.