Types of Data Compression

In my previous article we have seen the need for compression, goal of compression etc. In this article we are going to discuss the types of compression supported in SQL Server 2008. There are two types of compression supported and they are

Row Level Compression and
Page Level Compression

Lets discuss on each compression type

Row Level Compression

Row level compression is one of the type where there is no compression algorithm is used. The main aim of row level compression is to make fixed data types to a variable data type there by reducing the size. This type of compression will remove any extra bytes in the fixed data type. Even though compression plays on fixed data type the actual schema will be the same, that is when you retrieve the data it will be returned as fixed data type. One main consideration in this type of compression is that, data schema and data distribution directly plays an important role with respect to compression ratio.

Let me try to show this with and example. Consider you are having a table with column CHAR(25) and this will be fixed data type. When you insert a data (say SAGAR) to this column its using only 5 characters where as the rest of the characters are empty but it will still take the space because of fixed data type. Row level compression will remove these extra bytes from the column there by saving free space. Consider if the same is repeated for N number of rows and actually you will be ending up with more free space. To visualize the same, Im just created the image below.

Before Compression

After Compression

Before Compression After Compression

From the image above you can see the free bytes is removed and it saved 17 bytes, if the same is applied for many rows then more space can be saved, but this actually depends on the data stored and data type. Lets say you have declared the column as CHAR(1) and if you try to implement row level compression then obviously no space can be saved and you will be ending up with worst compression ratio, so before implementing this type of compression just keep these thing in mind.

Page Level Compression

Page level compression type can give more compression ratio compared to Row level. Page compression uses dictionary based compression algorithm to attain more compression ratio. The main aim of page level compression is to reduce column redundancy in one or more rows in a data page. There are two phases for dictionary based algorithm to achieve more compression and they are below

Column Prefix and
Page Dictionary

Column Prefix

In column prefix phase, SQL Server checks for common byte pattern in the beginning of all column in all the pages. If it finds at least minimum of two pattern, SQL Server will store the pattern in a new row called anchor record. Common byte pattern is specific to each column. Common byte pattern is checked from left to right and not from right to left to make a balance between cost spent and compression ratio. This phase is similar to Huffman Coding discussed in Part 2 article. In simple we can point the steps for column prefix as below

Detect the common byte pattern
Store the pattern in anchor record
Refer to anchor record in column

To visualize this better, lets take the table shown below and implement column prefix

compression_type_3

compression_type_4

After column prefix phase gets completed, it will process page dictionary phase.

Page Dictionary

In this phase SQL Server will create a dictionary for common bytes across all columns and rows in a page. Dictionary values are common to entire page hence it can be used across columns and rows. Dictionary values can be used by both Data and Index pages. Non-leaf pages in Indexes will uses ROW compression even though you specify page compression, this is to make a balance between cost spent and compression ratio. In simple we can point the steps for page dictionary as below

Detect the common pattern
Create a dictionary based on the pattern
Replace values with pattern in dictionary

To visualize this better we will take the example used in column prefix phase.

compression_type_5

From the image above you can see that in that last stage, the table has more NULL values and bit values only there by attaining more compression ratio.

Comparison between Row Level & Page Level Compression

Im getting some questions from the readers that what is the difference between row level and page level compression. I have explained it in this article but to summarize that Im providing this comparison for quick overview

Row Level	Page Level
No Compression Algorithm is used	Compression Algorithm is used
Less compression ratio compared to page level	More compression ratio compared to row level
This method will try to convert fixed data type to variable data type to attain compression	This method uses Dictionary based algorithm to attain compression
Uses less resources compared to page level	Uses more resources compared to row level
Compression ratio depends on Data distribution and Schema	Compression ratio depends on Data distribution