What is Change Data Capture?
CDC is acronym for Change Data Capture. This is one of the new feature added from SQL Server 2008. CDC helps us to provide information about DML changes on a table and a database. It provides these information’s in relational format. In previous versions of SQL Server we use triggers, timestamp columns etc., to capture the DML changes to the table which is resource expensive techniques, so to overcome this MS added this new feature.
When you enable CDC for a database below are the five system tables created in that database. These tables are created with CDC schema, i.e all the CDC tables are under CDC schema.
In addition to this when you enable CDC for a table, then a new change table will be created to store the modified data. This new change table is exact replica of the source table with 5 additional columns for tracking purpose. Naming format of the change table will be as below
Below image shows the additional columns in the change table
How it works ?
Im going to explain this in simple. Lets consider the above image, in which we have tables in a database. Once you configure CDC it will capture the modified data as below
In the first stage you have your source tables. When you configure CDC, Change tables (exact replica of source tables with additional columns) will be created to store the modified data.
As you know all the DML operations are logged in a transaction log, with the help of this CDC will extract the modified data from transaction log and it will store it in change tables
This extraction method is similar to transactional replication process. In this case two SQL jobs will be created, one to initiate the capture process at scheduled intervals and the second one for retention of captured data
The capture job invokes sp_replcmds command to capture the modified data
CDC has relationship with transactional replication. If you enable both transactional replication and CDC the transaction log reader will do the capture process or if you enable CDC alone then the job does the capture process, by this way you reduce the contention in scanning the log file.
Once the captured data is stored in change tables you are free to use CDC query functions to retrieve the modified data to process it for ETL in other words you can use it for data warehouse purposes
Do you know? CDC also captures DDL changes to the tracked tables.
SQL Server Agent Jobs
As I said earlier two new jobs will be created when you enable CDC for a table. These two jobs are common to the entire database, i.e if you enable CDC for N number of tables these jobs itself will take care.
This job is used to capture the changes on the tracking tables. Naming convention for this job is as below
This job is used to clean up the change table , in other words retention for change table. Naming convention for this job is as below