What is a Data Warehouse ?

What is a Data Warehousing ?

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process - Bill Inmon

  • Subject-Oriented: A data warehouse should be focused to analyze a particular subject area. ex. SalesWH, MarketingWH, FraudWH

  • Integrated: A data warehouse integrates data from multiple data sources. For example, source A and source B may have different ways of identifying a product, but in a data warehouse, there will be only a single way of identifying a product

  • Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, 12 months, or even older data from a data warehouse. This contrasts with a transactions system, where often only the most recent data is kept. For example, a transaction system may hold the most recent address of a customer, where a data warehouse can hold all addresses ever associated with a customer

  • Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a data warehouse should never be altered

In this tutorial, you will learn more about:

Types of Data Warehouse

Three main types of Data Warehouses are:

1. Enterprise Data Warehouse:

Enterprise Data Warehouse is a centralized warehouse. It provides decision support service across the enterprise. It offers a unified approach for organizing and representing data. It also provide the ability to classify data according to the subject and give access according to those divisions

2. Operational Data Store:

Operational Data Store, which is also called ODS, are nothing but data store required when neither Data warehouse nor OLTP systems support organizations reporting needs. In ODS, Data warehouse is refreshed in real time. Hence, it is widely preferred for routine activities like storing records of the Employees

3. Data Mart:

A data mart is a subset of the data warehouse. It specially designed for a particular line of business, such as sales, finance, sales or finance. In an independent data mart, data can collect directly from sources

Who needs Data warehouse?

Data warehouse is needed for all types of users like:

  • Decision makers who rely on mass amount of data
  • Users who use customized, complex processes to obtain information from multiple data sources
  • If the user wants fast performance on a huge amount of data which is a necessity for reports, grids or charts, then Data warehouse proves useful
  • It also essential for those people who want a systematic approach for making decisions
  • Data warehouse is a first step If you want to discover 'hidden patterns' of data-flows and groupings

Steps to Implement Data Warehouse

Here, are key steps in Datawarehouse implementation along with its deliverables

1Defining Project ScopeScope Definition
2Requirements GatheringRequirements Document
3Need to determine business needsLogical Data Model
4Acquire or develop Extraction toolsExtract tools and Software
5Document missing dataTo Do Project List
6Mapping attributes from source to the new modelD/W Data Integration Mapping
7Develop Data Warehouse Database designD/W Database Design
8Extract Data from Operational Data StoreIntegrated D/W Data Extracts
9Load Data WarehouseInitial Data Load
10Maintain Data WarehouseOn-going Data Access and Subsequent Loads

What is the difference between a Data Warehouse and a Database?

Data warehouses and databases are both relational data systems, but were built to serve different purposes. A data warehouse is built to store large quantities of historical data and enable fast, complex queries across all the data, typically using Online Analytical Processing (OLAP). A database was built to store current transactions and enable fast access to specific transactions for ongoing business processes, known as Online Transaction Processing (OLTP)

Difference between OLTP vs OLAP

Optimized for Transactions (Inserts, Updates, Deletes)Optimized for Analytics
It is characterized by large numbers of short online transactionsIt is characterized by a large volume of data
Tables in OLTP database are Highly normalized (3NF)Tables in OLAP database are de-normalized
Faster queries on less volume of dataAnalytical queries on huge volume of historical data

Best practices to implement a Data Warehouse

  • The data warehouse must be well integrated, well defined and time stamped
  • Decide a plan to test the consistency, accuracy, and integrity of the data
  • While designing Datawarehouse make sure you use right tool, stick to life cycle, take care about data conflicts and ready to learn you're your mistakes
  • Don't spend too much time on extracting, cleaning and loading data
  • Never replace operational systems and reports
  • Ensure to involve all stakeholders including business personnel in Datawarehouse implementation process. Establish that Data warehousing is a joint/ team project. You don't want to create Data warehouse that is not useful to the end users
  • Prepare a training plan for the end users