In my previous article we have seen the need for compression, goal of compression etc. In this article we are going to discuss the types of compression supported in SQL Server 2008. There are two types of compression supported and they are
- Row Level Compression and
- Page Level Compression
Lets discuss on each compression type
Row Level Compression
Row level compression is one of the type where there is no compression algorithm is used. The main aim of row level compression is to make fixed data types to a variable data type there by reducing the size. This type of compression will remove any extra bytes in the fixed data type. Even though compression plays on fixed data type the actual schema will be the same, that is when you retrieve the data it will be returned as fixed data type. One main consideration in this type of compression is that, data schema and data distribution directly plays an important role with respect to compression ratio.
Let me try to show this with and example. Consider you are having a table with column CHAR(25) and this will be fixed data type. When you insert a data (say SAGAR) to this column its using only 5 characters where as the rest of the characters are empty but it will still take the space because of fixed data type. Row level compression will remove these extra bytes from the column there by saving free space. Consider if the same is repeated for N number of rows and actually you will be ending up with more free space. To visualize the same, Im just created the image below.
Before Compression After Compression
From the image above you can see the free bytes is removed and it saved 17 bytes, if the same is applied for many rows then more space can be saved, but this actually depends on the data stored and data type. Lets say you have declared the column as CHAR(1) and if you try to implement row level compression then obviously no space can be saved and you will be ending up with worst compression ratio, so before implementing this type of compression just keep these thing in mind.
Page Level Compression
Page level compression type can give more compression ratio compared to Row level. Page compression uses dictionary based compression algorithm to attain more compression ratio. The main aim of page level compression is to reduce column redundancy in one or more rows in a data page. There are two phases for dictionary based algorithm to achieve more compression and they are below
- Column Prefix and
- Page Dictionary
In column prefix phase, SQL Server checks for common byte pattern in the beginning of all column in all the pages. If it finds at least minimum of two pattern, SQL Server will store the pattern in a new row called anchor record. Common byte pattern is specific to each column. Common byte pattern is checked from left to right and not from right to left to make a balance between cost spent and compression ratio. This phase is similar to Huffman Coding discussed in Part 2 article. In simple we can point the steps for column prefix as below
- Detect the common byte pattern
- Store the pattern in anchor record
- Refer to anchor record in column
- To visualize this better, lets take the table shown below and implement column prefix
After column prefix phase gets completed, it will process page dictionary phase.
In this phase SQL Server will create a dictionary for common bytes across all columns and rows in a page. Dictionary values are common to entire page hence it can be used across columns and rows. Dictionary values can be used by both Data and Index pages. Non-leaf pages in Indexes will uses ROW compression even though you specify page compression, this is to make a balance between cost spent and compression ratio. In simple we can point the steps for page dictionary as below
- Detect the common pattern
- Create a dictionary based on the pattern
- Replace values with pattern in dictionary
To visualize this better we will take the example used in column prefix phase.
From the image above you can see that in that last stage, the table has more NULL values and bit values only there by attaining more compression ratio.
Comparison between Row Level & Page Level Compression
Im getting some questions from the readers that what is the difference between row level and page level compression. I have explained it in this article but to summarize that Im providing this comparison for quick overview
|No Compression Algorithm is used
|Compression Algorithm is used
|Less compression ratio compared to page level
|More compression ratio compared to row level
|This method will try to convert fixed data type to variable data type to attain compression
|This method uses Dictionary based algorithm to attain compression
|Uses less resources compared to page level
|Uses more resources compared to row level
|Compression ratio depends on Data distribution and Schema
|Compression ratio depends on Data distribution