Thursday, 22 October 2015

Dimension Table Column Types

When you start to learn about BI and data warehousing it can be very daunting when you hear all the technical terms.  It is very easy to get over whelmed when hearing about conformed dimensions, junk dimensions, fact-less tables, slowly changing dimensions, semantic layer, MOLAP, ROLAP and HOLAP (I think you get the idea). 

It was my experience, coming from a traditional programmer/analyst back ground that used a relational model as a database back end, that most of these terms were unfamiliar.  The good news is that with a little effort and study it doesn’t take that long to get familiar with it all. 

A good way to get used to it all is to just break it down bit by bit and digest it slowly.  I have previous posts that describe the use of dimension and fact tables as the foundation of star and snow flake schemas (http://mssqlcanuck.blogspot.ca/2015/10/the-fact-less-facttable-every-noob-to.html and http://mssqlcanuck.blogspot.ca/2015/08/dimensional-modeling-concepts.html). 

I’ll use this post to briefly explain the types of columns that make up a typical dimension table.  I find that once you understand the columns types (and I don’t mean data types like nvarchar, numeric or int) then it is much easier to understand the bigger picture.

Dimension Table Column Types

Dimension tables really describe that facts they are related to.  For example, 16 could be the measure but it doesn't make sense if it isn’t related to a product dimension (RICE) or a unit of measure dimension (KGS).  

These are the types of columns you will find in a dimension table.

Key columns – You will need to be able to uniquely identify each row in a dimension table so you will need key columns. You can use natural keys from the source system or surrogate keys (http://www.kimballgroup.com/1998/05/surrogate-keys/).

Name columns – After you identify a row with a unique key, you will need to be able to refer to the entity with a name.  A report that only contains only keys won't read very well.  For example, you could have a row in a customer dimension that has a key value but you will also want to have a human name to refer to the customer. 

Attribute columns – This column is very important for performing BI analysis.  Quite often in data analytics you will want to create pivot reports on discrete values such as gender or year.  Columns in dimension tables that are discrete and make good candidates to perform pivot tables or graphs are called attribute columns.

Member properties – It wouldn’t make sense to pivot on something like a customer’s address because there would be way too many of them.  However, things like addresses are important to provide additional information on a report.  Columns like address are called member properties and are useful as labels on a report.  It makes sense from a performance stand point to include these types of columns in a dimension table to avoid having to use distributed queries to the source system to get the information.

Lineage columns – These types of columns are used for auditing purposes.  Typically in a data 
warehouse environment the ETL loads are run nightly.  You may want to keep track of which loads are successful or fail and also the load date so you know how current your data is in the warehouse or even which source system provided the data.  This is known as lineage data and is typically not exposed to the report consumer but is very helpful for the DBA and ETL developers.




















Example: Customer dimension table 

I’ll do a similar post next week to describe the types of columns you will find on a fact table.
Thanks for reading,


SQL Canuck

No comments:

Post a Comment