When you start to learn about BI and data warehousing it can
be very daunting when you hear all the technical terms. It is very easy to get over whelmed when
hearing about conformed dimensions, junk dimensions, fact-less tables, slowly
changing dimensions, semantic layer, MOLAP, ROLAP and HOLAP (I think you get
the idea).
It was my experience, coming from a traditional
programmer/analyst back ground that used a relational model as a database
back end, that most of these terms were unfamiliar. The good news is that with a little effort
and study it doesn’t take that long to get familiar with it all.
A good way to get used to it all is to just break it down
bit by bit and digest it slowly. I have
previous posts that describe the use of dimension and fact tables as the
foundation of star and snow flake schemas (http://mssqlcanuck.blogspot.ca/2015/10/the-fact-less-facttable-every-noob-to.html
and http://mssqlcanuck.blogspot.ca/2015/08/dimensional-modeling-concepts.html).
I’ll use this post to briefly explain the types of columns
that make up a typical dimension table. I
find that once you understand the columns types (and I don’t mean data types like nvarchar, numeric or int)
then it is much easier to understand the bigger picture.
Dimension Table Column Types
Dimension tables really describe that facts they are related
to. For example, 16 could be the measure
but it doesn't make sense if it isn’t related to a product dimension (RICE) or a unit
of measure dimension (KGS).
These are
the types of columns you will find in a dimension table.
Key columns – You will need to be able to uniquely identify each
row in a dimension table so you will need key columns. You can use natural keys from the source system or surrogate keys (http://www.kimballgroup.com/1998/05/surrogate-keys/).
Name columns – After you identify a row with a unique key,
you will need to be able to refer to the entity with a name. A report that only contains only keys won't
read very well. For example, you could
have a row in a customer dimension that has a key value but you will also want
to have a human name to refer to the customer.
Attribute columns – This column is very important for performing BI
analysis. Quite often in data analytics
you will want to create pivot reports on discrete values such as gender or
year. Columns in dimension tables that
are discrete and make good candidates to perform pivot tables or graphs are
called attribute columns.
Member properties – It wouldn’t make sense to pivot on
something like a customer’s address because there would be way too many of
them. However, things like addresses are
important to provide additional information on a report. Columns like address are called member properties and are useful as labels on a report. It makes sense from a performance stand point
to include these types of columns in a dimension table to avoid having to use
distributed queries to the source system to get the information.
Lineage columns – These types of columns are used for auditing
purposes. Typically in a data
warehouse
environment the ETL loads are run nightly.
You may want to keep track of which loads are successful or fail and
also the load date so you know how current your data is in the warehouse or
even which source system provided the data.
This is known as lineage data and is typically not exposed to the report
consumer but is very helpful for the DBA and ETL developers.
Example: Customer dimension table
I’ll do a similar post next week to describe the types of
columns you will find on a fact table.
Thanks for reading,
SQL Canuck
No comments:
Post a Comment