Types of Data Compression

In my previous article we have seen the need for compression, goal of compression etc. In this article we are going to discuss the types of compression supported in SQL Server 2008. There are two types of compression supported and they are

  • Row Level Compression and
  • Page Level Compression

Lets discuss on each compression type

Row Level Compression

Row level compression is one of the type where there is no compression algorithm is used. The main aim of row level compression is to make fixed data types to a variable data type there by reducing the size. This type of compression will remove any extra bytes in the fixed data type. Even though compression plays on fixed data type the actual schema will be the same, that is when you retrieve the data it will be returned as fixed data type. One main consideration in this type of compression is that, data schema and data distribution directly plays an important role with respect to compression ratio.

Let me try to show this with and example. Consider you are having a table with column CHAR(25) and this will be fixed data type. When you insert a data (say SAGAR) to this column its using only 5 characters where as the rest of the characters are empty but it will still take the space because of fixed data type. Row level compression will remove these extra bytes from the column there by saving free space. Consider if the same is repeated for N number of rows and actually you will be ending up with more free space. To visualize the same, Im just created the image below.

Before Compression

After Compression

Before Compression After Compression

From the image above you can see the free bytes is removed and it saved 17 bytes, if the same is applied for many rows then more space can be saved, but this actually depends on the data stored and data type. Lets say you have declared the column as CHAR(1) and if you try to implement row level compression then obviously no space can be saved and you will be ending up with worst compression ratio, so before implementing this type of compression just keep these thing in mind.

Page Level Compression

Page level compression type can give more compression ratio compared to Row level. Page compression uses dictionary based compression algorithm to attain more compression ratio. The main aim of page level compression is to reduce column redundancy in one or more rows in a data page. There are two phases for dictionary based algorithm to achieve more compression and they are below

  • Column Prefix and
  • Page Dictionary

Column Prefix

In column prefix phase, SQL Server checks for common byte pattern in the beginning of all column in all the pages. If it finds at least minimum of two pattern, SQL Server will store the pattern in a new row called anchor record. Common byte pattern is specific to each column. Common byte pattern is checked from left to right and not from right to left to make a balance between cost spent and compression ratio. This phase is similar to Huffman Coding discussed in Part 2 article. In simple we can point the steps for column prefix as below

  1. Detect the common byte pattern
  2. Store the pattern in anchor record
  3. Refer to anchor record in column
    To visualize this better, lets take the table shown below and implement column prefix

compression_type_3

compression_type_4

After column prefix phase gets completed, it will process page dictionary phase.

Page Dictionary

In this phase SQL Server will create a dictionary for common bytes across all columns and rows in a page. Dictionary values are common to entire page hence it can be used across columns and rows. Dictionary values can be used by both Data and Index pages. Non-leaf pages in Indexes will uses ROW compression even though you specify page compression, this is to make a balance between cost spent and compression ratio. In simple we can point the steps for page dictionary as below

  1. Detect the common pattern
  2. Create a dictionary based on the pattern
  3. Replace values with pattern in dictionary

To visualize this better we will take the example used in column prefix phase.

compression_type_5

From the image above you can see that in that last stage, the table has more NULL values and bit values only there by attaining more compression ratio.

Comparison between Row Level & Page Level Compression

Im getting some questions from the readers that what is the difference between row level and page level compression. I have explained it in this article but to summarize that Im providing this comparison for quick overview

Row Level

Page Level

No Compression Algorithm is used Compression Algorithm is used
Less compression ratio compared to page level More compression ratio compared to row level
This method will try to convert fixed data type to variable data type to attain compression This method uses Dictionary based algorithm to attain compression
Uses less resources compared to page level Uses more resources compared to row level
Compression ratio depends on Data distribution and Schema Compression ratio depends on Data distribution

Posted

in

by

Tags:

Comments

5 responses to “Types of Data Compression”

  1. srinath avatar
    srinath

    Nice article…Thanks much!!

  2. Ganapathy avatar
    Ganapathy

    good article vidhya

  3. distas avatar

    Good post, I referenced it in one of mine. Any links to part 1 / part 2?

  4. distas avatar

    Hi VidhyaSagar, thanks for posting the links! Will update my post accordingly.

Leave a Reply to Ganapathy Cancel reply

Your email address will not be published. Required fields are marked *