Top Data Warehouse Interview Questions & Answers

Here is the list of Top 50 Data warehousing Interview Questions that are frequently asked:

1. What is Datawarehousing?

A data warehouse is the cental repository of an organization's historical data for data analytics and decision making. In single sentence, it is repository of integrated information which can be available for queries and analysis.

2. What is Fact Table?

Fact table contains the measurement of business processes, and it contains foreign keys for the dimension tables

Example - If the business process is selling Cars. Sales Fact would have montly sales volume by Region

3. What is Dimension Table?

Dimension table is a table which contain attributes of measurements stored in fact tables. Eg. Employee is a Dimension, Location, Product are other examples

3. What are the uses of Datawarehousing?

Data Warehouse is used for:

  • Data Analysis and Prediction
  • Support Organizational Decision making
  • Operations
  • Fulfill Regulatory needs
  • Detect Fraud and Anomolies

5. What is Business Intelligence?

Business Intelligence is also known as DSS - Decision support system which refers to the technologies, application and practices for the collection, integration and analysis of the business related information or data

6. What is Data Mining?

Data Mining is set to be a process of analyzing the data in different dimensions or perspectives and summarizing into a useful information. Can be queried and retrieved the data from database in their own format.

7. What is OLTP?

OLTP is abbreviated as On-Line Transaction Processing, Its build for high concurrency and low latency on low volume of data. Eg. Your regular database (eg. MySQL) is a OLTP System

8. What is OLAP?

OLAP is abbreviated as Online Analytical Processing, and it is set to be a system which collects, manages, processes multi-dimensional data for analysis and management purposes. Your Data Warehouse is a OLAP System

9. What is the difference between OLTP and OLAP?

Following are the differences between OLTP and OLAP:

OLTPOLAP
Optimized for Transactions (Inserts, Updates, Deletes)Optimized for Analytics
It is characterized by large numbers of short online transactionsIt is characterized by a large volume of data
Tables in OLTP database are Highly normalized (3NF)Tables in OLAP database are de-normalized
Faster queries on less volume of dataAnalytical queries on huge volume of historical data

10. What is ODS?

ODS is abbreviated as Operational Data Store and it is a repository of real time operational data rather than long term trend data.

11. What is factless fact tables?

A factless fact tables are the fact table which doesn’t contain numeric fact column in the fact table. It only has the key-fields from the dimensions, Mostly used to handle to relationships

12. How can we load the time dimension?

Time dimensions are usually loaded through all possible dates in a year and it can be done through a program. Here, 100 years can be represented with one row per day.

13. What are Non-additive facts?

Non-Addictive facts are facts that cannot be summed up for any of the dimensions present in the fact table. If there are changes in the dimensions, same facts can be useful.

14. What is conformed fact?

Conformed fact is a table which can be used across multiple data marts in combined with the multiple fact tables.

15. What is Datamart?

A Datamart is a specialized version of Datawarehouse and it contains a snapshot of operational data that helps the business people to decide with the analysis of past trends and experiences. A data mart helps to emphasizes on easy access to relevant information.

16. What is the difference between View and Materialized View?

View is built on top of Tables, its an abstract (mirror) layer. No data is actually stored in the view, Whenever the view is queried, it in turn fetches the data from the physical table

Materialized View provides indirect access to the table data by storing the results of a query in a separate schema

17. What is ETL?

ETL is abbreviated as Extract, Transform and Load. During ETL, Software tools are used, which reads the data from mutliple data sources and extracts a desired subset of data. Next, it transform the data using rules and lookup tables and loads it to the destination table

18. What is MPP?

MPP is abbreviated as Massively Parallel Processing. They are specialized OLAP systems that do parallel processing to efficiently process huge volumes of data. Eg. Teradata, Snowflake, BigQuery are all MPP Data Warehouses

19. What is real-time datawarehousing?

Real-time datawarehousing captures the business data whenever it occurs. The Business transactions are streamed in real-time and are available for use instantly

20. What are Aggregate tables?

Aggregate tables summarize the data in the dimensions. It is easy to retrieve data from the aggregated tables than the original table which has more number of records.

This table reduces the load in the database server and increases the performance of the query.

21. What are the types of SCD?

There are three types of SCD and they are as follows:

SCD 1 - The new record replaces the original record

SCD 2 - A new record is added to the existing customer dimension table

SCD 3 - A original data is modified to include new data in a separate column

22. What is Star Schema?

Star schema comprises of a central fact table linked to multiple dimension tables around it (resembling a star)

23. What is Snowflake Schema?

Snowflake Schema is an extension of the star schema. In addition to Fact and Dimensions in Star Schema, the dimension table can have sub-hierarchy tables

24. Give an example for Star Schema ?

In a Bank, Member, Account, Location, Collateral are all Dimension tables. Fact_Account would be a fact table which will store the monthly summary of the account activity

25. What is a core dimension?

Core dimension is a Dimension table which is used as dedicated for single fact table or datamart

26. What is Active Datawarehousing?

An active datawarehouse is a datawarehouse that enables decision makers within a company or organization to manage customer relationships effectively and efficiently.

27. What is the difference between Datawarehouse and OLAP?

Datawarehouse is a place where the whole data is stored for analyzing, but OLAP is used for analyzing the data, managing aggregations, information partitioning into minor level information.

28. What is ER Diagram?

ER diagram is abbreviated as Entity-Relationship diagram which illustrates the interrelationships between the entities in the database. The ER diagram shows the structure of each tables and the links between the tables.

29. What are the key columns in Fact and dimension tables?

Foreign keys of fact tables are the primary keys of the dimension tables.

30. What is SCD?

SCD is defined as slowly changing dimensions, and it applies to the cases where record changes slowly over time.

31. What is called Dimensional Modelling?

Dimensional Modeling is used by data warehouse designers to break the business problems into two types of tables - Facts and Dimension tables

Fact table has facts and measurements of the business and dimension table contains the context of measurements

32. What are the types of Dimensional Modeling?

There are three types of Dimensional Modeling and they are as follows:

  • Conceptual Modeling
  • Logical Modeling
  • Physical Modeling

33. What is a Surrogate key?

Surrogate key is a substitute for the natural primary key. It is set to be a unique identifier for each row that can be used for the primary key to a table

34. What is the difference between Relational Modeling vs Dimensional Modeling?

ER Modeling is used for normalizing the OLTP database design whereas Dimensional Modeling is used for de-normalizing the ROLAP and MOLAP design.

45. What are the steps in building the Data Warehouse?

Following are the steps to be followed to build the datawaerhouse:

  • Gathering business requirements
  • Identifying the necessary sources
  • Identifying the facts
  • Defining the dimensions
  • Defining the attributes
  • Redefine the dimensions and attributes if required
  • Organize the Attribute hierarchy
  • Define Relationships
  • Assign unique Identifiers

36. What is called data cleansing?

Scrubbing Bad Data, Cleansing of Orphan records, Inconsistent data and missing information from the database

37. What is Metadata?

Metadata is defined as data about the data. The metadata contains information like number of columns used, fix width and limited width, ordering of fields and data types of the fields

38. What are loops (recursive) in Datawarehousing?

In datawarehousing, loops are existing between the tables. If there is a loop between the tables, then the query generation will take more time and it creates ambiguity. It is advised to avoid loop between the tables.

39. Whether Dimension table can have numeric value?

Yes, dimension table can have numeric value as they are the descriptive elements of our business.

40. What is the definition of Cube in Datawarehousing?

Cubes are logical representation of multidimensional data. The edge of the cube has the dimension members,and the body of the cube contains the data values

41. What are the different types of data warehouses ?

Following are the different types of Datawarehouses:

  • Enterprise Datawarehousing
  • Operational Data Store
  • Data Mart

42. What are the popular Data warehouse platforms?

Here's the list of Data Warehouse Systems:

  1. Teradata
  2. Netezza
  3. Snowflake (Cloud)
  4. Google BigQuery (Cloud)
  5. Amazon Redshift (Cloud)
  6. Microsoft Synapse (Cloud)
  7. Panoply (Cloud)

43. Difference between On-Premise and Cloud Data Warehouse Platforms ?

On Premise Data WarehouseCloud Data Warehouse
Involves purchasing Hardware, Sign Contracts and Procure LicencesEasy as signup online and spin a new Warehouse. Less Friction to Start
Maintain Hardware and Software. May need a team of Infrastructure specialists and DBAsVendor is responsible for Hardware, Software and Infra maintenance
High Initial Cost, Time, Energy to start a Data Warehouse You pay for what you use. Launch a Data Warehouse in Few clicks with zero initial investment
Limited Compute and Storage. If you have to increase it, it has to go through lengthy licensing and procurement processUnlimited Scalable Compute and Unlimited Storage
Highly Secure by defaultHighly Secure (Needs experts to audit the security periodically)

44. How often do you take backups of Data Warehouse ?

The DBA backups run everyday and the copy is retained for 7 days. Whereas Application backups happen once a week or once a month (based on the application) and stored for longer period of time (usually 3 months). Disaster recovery backups is a different Backup where the Data Warehouse is synced everyday to a remote location

45. What is defined as Incremental Backup?

A Incremental backup will only copy the changed records from the time of last backup

46. What is the goal of Optimizer?

The goal to Optimizer is to find the most efficient way to execute the SQL statements.

47. What is Execution Plan?

Execution Plan is a plan which is used to the optimizer to select the combination of the steps.

48. What are the approaches used by Optimizer during execution plan?

There are two approaches:

  1. Rule Based
  2. Cost Based

49. What are the tools available for ETL?

Following are the ETL tools available:

Informatica
Data Stage
Ab Initio
Matillion
Dataflow

50.What is the difference between metadata and data dictionary?
Metadata is defined as data about the data. But, Data dictionary contain the information about the project information, graphs, abinito commands and server information



Next Section: Home



SQL.info