What are Slowly Changing Dimensions in a Data Warehouse ?
Your Dimension tables will contain attributes that change over time. ex. Employee Address, Phone Numbers, etc. Slowly changing dimensions (SCD in short) is a modelling technique to capture and handle the change records.
Consider this example: Charlie is a customer with ABC Inc. He lives in Chicago, Illinois. So, the original entry in the customer lookup table has the following record:
At a later date, He moved to Los Angeles, California on January, 2020. How should ABC Inc. now modify its customer table to reflect this change? This is the "Slowly Changing Dimension" problem
There are in general three ways to solve this type of problem, and they are categorized as follows:
- Type-1: The new record replaces the original record. No trace of the old record exists
- Type-2: A new record is added into the customer dimension table. And the customers previous record is marked inactive by end-dating the old record
- Type-3: When there is a change a new physical column is added and history is maintained
In this tutorial, you will learn:
Types of Slowly Changing Dimensions
Type 1 Slowly Changing Dimension
In Type 1 Slowly Changing Dimension, the new information simply overwrites the original information. In other words, no history is kept
In our example, After Charlie moved from Illinois to California, the new information replaces the new record, and we have the following table:
Advantages of using Type-1 Slowly Changing Dimension
- SCD-1 is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to keep track of the old information
Disadvantages of using Type-1 Slowly Changing Dimension
- All history is lost. By applying this methodology, it is not possible to trace back in history. For example, in this case, the company would not be able to know that Charlie lived in Illinois before
When should you use Type-1 SCD ?
- Type 1 slowly changing dimension should be used when it is not necessary for the data warehouse to keep track of historical changes
Type 2 Slowly Changing Dimension
In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own primary key
In our example, After Charlie moved from Illinois to California, we add the new information as a new row into the table. SCD2 tables usually have few house keeping fields like EFF_DT, END_DT to associate the time on which the change happened:
Advantages of using Type-2 Slowly Changing Dimension
- Type 2 SCD is the popular dimension in Data warehousing. It preserves entire history of changes and is the most effective SCD
Disadvantages of using Type-2 Slowly Changing Dimension
- Complex ETL required to do change data capture and perform the SCD Type 2 Process
- As a new record is inserted everytime there is a change. This will cause the size of the table to grow fast. In cases where the number of rows for the table is very high to start with, storage and performance can become a concern
When should you use Type-2 SCD ?
- Type 2 slowly changing dimension should be used when it is necessary for the data warehouse to track historical changes
Type 3 Slowly Changing Dimension
In Type 3 Slowly Changing Dimension, there will be two columns to indicate the particular attribute of interest, one indicating the original value, and one indicating the current value. There will also be a column that indicates when the current value becomes active
In our example, After Charlie moved from Illinois to California, We will add three columns to the table (Original State, Current State, Effective Date) to maitain History:
Advantages of using Type-3 Slowly Changing Dimension
- Does not increase the size of the table, since new information is updated in the same row
- Allows us to store some part of history
Disadvantages of using Type-3 Slowly Changing Dimension
- Type 3 will not be able to keep all history where an attribute is changed more than once. For example, if Charlie later moves to Texas on April 15, 2020, the California information will be lost
When should you use Type-2 SCD ?
- Type 3 is rarely used in actual practice
- Type III slowly changing dimension should only be used when it is necessary for the data warehouse to track historical changes, and when such changes will only occur for a finite number of time
Next Section: Data Modelling