www.coglinmill.com

 

     

Change Data Capture

Change Data Capture is just one component of the ETL process. However, it is worthy of discussion by itself, because if implemented correctly it can significantly reduce load times and simplify the follow on ETL process.


When loading certain types of data into a data warehouse (e.g. "master files", such as Customer and Product), CDC can significantly reduce load times. Instead of the ETL process needing to scan every row looking for changes, the captured data consists of only new, changed or deleted rows, and includes a flag indicating the change type. Typically CDC is associated with database journaling: the journal images are Captured from the journal receivers (and in most cases) Applied to a target table.

RODIN can perform CDC in real-time (or near real-time), capturing journal images as they are written, or in batch at the end of the day. You can specify to capture detailed changes (every before and after image), or to consolidate multiple changes down to a single resulting image.

The figure below provides an overview of the real-time RODIN CDC Server (real-time) process.

 

 RODIN-CDC-Server

 

A RODIN CDC Server processes a single journal, with data from potentially several hundred tables being captured. Both local and remote journals are supported. For each table associated with the CDC Server you may specify up to 4 different capture types:

  1. Detail Capture and Apply. Each journal image (after images, and optionally before images) is captured almost immediately it arrives in the journal receiver and is written out into am automatically generated Capture Table.
  2. Consolidated Capture and Apply. Multiple journal images for the same row in the source table are merged into one resulting Capture row.
  3. Replica Capture. This is very similar to Consolidated Capture, however there is no seperate Apply process. The target table is a copy of the source table.
  4. Data Queue Capture: The captured data is sent to data queue, which is typeically used as the input to a RODIN ETL process.

Each table associated with a CDC server may have a different combination of these output types. You must select at least one type, or all four if you wish.

Capture and Apply

Since the capture process is a continuously active server job, it is often required to perform an Apply: where the Captured data is moved from the Capture Table into a separate Apply Table, becoming a 'batch' of captured data. Capture then resumes in the (no empty) Capture Table. An Apply may be requested once a day, or every 5 minutes, as the need dictates. Additional control columns are automatically included in the capture and apply tables. For example, a flag that specifies whether this is an Insert, Change or Delete image as well as other control information.

The Replica and Data Queue options have no separate Apply step.

Filters and Rules

There may be scenarios where you do not want to capture all changes to a table. For example, you may have a nightly job which updates every row in a table -but you are not interested in capturing these updates, since they are not relevant to your specific requirements. RODIN CDC allows you to define rules to select/omit capture by job name, program name, user profile or even based on the value of one or more columns in the journal image. This is especially useful in audit situations where you need to capture changes to a specific column , or changes made by a particular user.

You can also define Critical Column rules: these apply only to updates to existing rows, allowing you to trap updates only where one or more critical column values have changed. (Inserts and deletes are not filtered by this type of rule).

Any number of CDC Servers may be defined.

RODIN CDC is highly efficient with the capability to process millions of journal images per hour, even on smaller systems.

RODIN also supports a batch type CDC, which processes one table and is requested at a point in time (as opposed to continuous capture). This CDC type also supports capture by Comparison. This is useful where the tables are not journaled. RODIN maintains a copy of the table and performs a row by row comparison, looking for new, changed and deleted rows.

 

 

 

  866-RODIN-DW

  (866-763-4639) 

 Request Information
 Request a Quote
 Evaluation Copy
  View FAQs 
 Support

 

Subscribe