what is recommended indexing rules for ETL in general?

Hello Experts,

What kind of indexes are recommended for staging area in ETL flow. For example unique index on natural key columns. I have surrogate key as well in data flowing in from source.

Also any suggestion/recommendation for indexing in other layers like dimensional data model layer and type 1 persistent data layer for full snapshot data.

Thanks, Rajneesh

rajneesh4u

posted on 29 Apr 20

Enjoy great content like this and a lot more !

Signup for a free account to write a post / comment / upvote posts. Its simple and takes less than 5 seconds




nVector01-May-20

The only objective of an Index is to reduce the amount of data scans during retrieval. With that said, 

Indexing for staging tables:

Stage tables are primarily meant to stage the daily data and then eventually do CDC (Change data capture) with target tables. So, Use the columns that you will use to join the stage and target table to do the CDC Comparison. 

Should I use Surrogate key / Natural keys as part of Index ?

All the modern cloud data warehouse platforms support both numeric as well as character columns to be part of the index. So, Look at the join columns and then use your best judgement to choose the Index columns

Indexing for Target Tables (Fact / Dimension):

  • Fact tables are joined with Dimensions, so use the join columns as part of the primary index on the Fact tables
  • Dimension tables are joined across Dimensions as well by users by specific filters. So, again use the columns that are frequently used as part of the filter columns


rajneesh4u03-May-20

Thank you !!!