Change Data Capture

What is Change Data Capture?

CDC is acronym for Change Data Capture. This is one of the new feature added from SQL Server 2008. CDC helps us to provide information about DML changes on a table and a database. It provides these information’s in relational format. In previous versions of SQL Server we use triggers, timestamp columns etc., to capture the DML changes to the table which is resource expensive techniques, so to overcome this MS added this new feature.

Editions Supported

  • Enterprise
  • Developer

CDC Tables

When you enable CDC for a database below are the five system tables created in that database. These tables are created with CDC schema, i.e all the CDC tables are under CDC schema.

  1. cdc.captured_columns
  2. cdc.change_tables
  3. cdc.ddl_history
  4. cdc.index_columns
  5. cdc.lsn_time_mapping

In addition to this when you enable CDC for a table, then a new change table will be created to store the modified data. This new change table is exact replica of the source table with 5 additional columns for tracking purpose. Naming format of the change table will be as below

cdc.tableowner_tablename_CT

Below image shows the additional columns in the change table

CDC_1

How it works ?

CDC_2

Im going to explain this in simple. Lets consider the above image, in which we have tables in a database. Once you configure CDC it will capture the modified data as below

  • In the first stage you have your source tables. When you configure CDC, Change tables (exact replica of source tables with additional columns) will be created to store the modified data.
  • As you know all the DML operations are logged in a transaction log, with the help of this CDC will extract the modified data from transaction log and it will store it in change tables
  • This extraction method is similar to transactional replication process. In this case two SQL jobs will be created, one to initiate the capture process at scheduled intervals and the second one for retention of captured data
    • The capture job invokes sp_replcmds command to capture the modified data
    • CDC has relationship with transactional replication. If you enable both transactional replication and CDC the transaction log reader will do the capture process or if you enable CDC alone then the job does the capture process, by this way you reduce the contention in scanning the log file.
  • Once the captured data is stored in change tables you are free to use CDC query functions to retrieve the modified data to process it for ETL in other words you can use it for data warehouse purposes
  • Do you know? CDC also captures DDL changes to the tracked tables.

SQL Server Agent Jobs

As I said earlier two new jobs will be created when you enable CDC for a table. These two jobs are common to the entire database, i.e if you enable CDC for N number of tables these jobs itself will take care.

Capture Job

This job is used to capture the changes on the tracking tables. Naming convention for this job is as below

cdc.dbname_capture

Cleanup Job

This job is used to clean up the change table , in other words retention for change table. Naming convention for this job is as below

cdc.dbname_cleanup


Posted

in

by

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *