Dimensional Data Modeling For Data Warehouses

Dimensional Data Modeling For Data Warehouses

Dimensional data model is the most common design concept used by data warehouse designers to build data warehousing systems. The data model design is the underlying data model used by many of the commercial OLAP products available today in the market. Some of the terms commonly used in this type of modeling are: Dimension- a category of information (e.g. The time dimension); Attribute- a unique level within a dimension (e.g. Month is an attribute in the time dimension); and Hierarchy- the specification of levels that represents relationship between different attributes within a dimension (e.g. Year → Quarter → Month → Day).

Dimensional data model contains two types of tables. They are:

Fact Table: Fact table in a dimensional data model contains the measures of all interest, such measurements or metrics or facts of business processes. Take the example of the sales amount of a business. The amount can be a monthly sales number or sales number for a day. This measure is stored in the fact table with the appropriate granularity. For sales measures, a fact table generally contains three columns: a date column, a store column and a sales amount column. Besides the measurements the table will also contain foreign keys for the dimension tables.

Dimension Table: The dimension table in a dimensional model represents the context of the measurements. The context of measurements can also be understood as the characteristics such as who, what, where, when, how of a measurement (subject). For example, in a business process Sales, the characteristics of the ‘monthly sales number’ measurement would be a Location (Where), Time (When) and Product Sold (What). A dimension table contains a number of dimension attributes or columns. In the Location dimension the various attributes can be Location Code, State, Country, Zip code. Further, dimension attributes contain one or more hierarchical relationships.

If you are looking forward to building a data warehouse for your organization, you should first decide what your data warehouse will contain. Depending upon your organizational goals, you can choose the type of dimensions that can best meet your requirements. For example, if you want to build a data warehouse that would contain monthly sales numbers across multiple store locations, across time and across products then your dimensions would be Location, Time and Product.

In designing data models for data warehouses or data marts the most commonly used schema types are Star Schema and Snowflake Schema.

Star Schema: In this type of schema design, a single object or the fact table is placed in the middle and is radially connected to other surrounding objects or dimension tables like a star. Here, each dimension is represented as a single table and the primary key in each dimension table is related to a foreign key in the fact table. A simple start schema consists of one fact table and a complex star schema may contain more than one fact table.

Snowflake Schema: This type of schema design can be called as an extension of the star schema. In this design each point of the star or each dimension table contains more points. In other words, in a star schema each dimension is represented by a single dimensional table, while in a snowflake schema that dimensional table is normalized into multiple lookup tables, each representing a level in the dimensional hierarchy.

Choosing a particular type of schema design depends on personal preference as well as business needs. So, it is up to you which one you choose among the two for your data warehouse project.