A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process - Bill Inmon
Subject-Oriented: A data warehouse should be focused to analyze a particular subject area. ex. SalesWH, MarketingWH, FraudWH
Integrated: A data warehouse integrates data from multiple data sources. For example, source A and source B may have different ways of identifying a product, but in a data warehouse, there will be only a single way of identifying a product
Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, 12 months, or even older data from a data warehouse. This contrasts with a transactions system, where often only the most recent data is kept. For example, a transaction system may hold the most recent address of a customer, where a data warehouse can hold all addresses ever associated with a customer
Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a data warehouse should never be altered
In this tutorial, you will learn more about:
Three main types of Data Warehouses are:
Enterprise Data Warehouse is a centralized warehouse. It provides decision support service across the enterprise. It offers a unified approach for organizing and representing data. It also provide the ability to classify data according to the subject and give access according to those divisions
Operational Data Store, which is also called ODS, are nothing but data store required when neither Data warehouse nor OLTP systems support organizations reporting needs. In ODS, Data warehouse is refreshed in real time. Hence, it is widely preferred for routine activities like storing records of the Employees
A data mart is a subset of the data warehouse. It specially designed for a particular line of business, such as sales, finance, sales or finance. In an independent data mart, data can collect directly from sources
Data warehouse is needed for all types of users like:
Here, are key steps in Datawarehouse implementation along with its deliverables
Step | Task | Deliverable |
---|---|---|
1 | Defining Project Scope | Scope Definition |
2 | Requirements Gathering | Requirements Document |
3 | Need to determine business needs | Logical Data Model |
4 | Acquire or develop Extraction tools | Extract tools and Software |
5 | Document missing data | To Do Project List |
6 | Mapping attributes from source to the new model | D/W Data Integration Mapping |
7 | Develop Data Warehouse Database design | D/W Database Design |
8 | Extract Data from Operational Data Store | Integrated D/W Data Extracts |
9 | Load Data Warehouse | Initial Data Load |
10 | Maintain Data Warehouse | On-going Data Access and Subsequent Loads |
Data warehouses and databases are both relational data systems, but were built to serve different purposes. A data warehouse is built to store large quantities of historical data and enable fast, complex queries across all the data, typically using Online Analytical Processing (OLAP). A database was built to store current transactions and enable fast access to specific transactions for ongoing business processes, known as Online Transaction Processing (OLTP)
OLTP | OLAP |
---|---|
Optimized for Transactions (Inserts, Updates, Deletes) | Optimized for Analytics |
It is characterized by large numbers of short online transactions | It is characterized by a large volume of data |
Tables in OLTP database are Highly normalized (3NF) | Tables in OLAP database are de-normalized |
Faster queries on less volume of data | Analytical queries on huge volume of historical data |