I promised last week to continue the discussion of the
various column types that are found in dimension and fact tables.
I explained the types of columns found in dimension tables
and that blog can be found here: (http://mssqlcanuck.blogspot.ca/2015/10/dimension-table-column-types.html).
I also have a previous blog about the basics of dimensional
modelling and it helps to explain what dimension and fact tables are. (http://mssqlcanuck.blogspot.ca/2015/08/dimensional-modeling-concepts.html)
Fact tables are really what make the dimensional model so
efficient and quick for reporting and data analytics because it contains
granular measurements of important business processes. The fact table combined with the dimension
tables create an ideal environment for fast and accurate data analysis.
I mentioned in the blog about dimension column types that it
can be over whelming to hear all the names of the data warehousing concepts
such as lineage, slowly changing dimensions, surrogate keys, natural keys, composite
keys, fact-less fact tables etc. I’ve
tried to break down these concepts into smaller blogs so they are easier to
digest.
Fact Table Column Types
These are the types of columns you will find in a fact
table.
Measure – This is the essence of a fact table and they store
measurements associate with a specific business process. They are usually numeric and can be
aggregated. Examples are sales amount,
order quantity, discount amount. These
values can often be additive or semi-additive and when associated with relevant dimensions can be
rolled up to represent aggregations at various granularity.
Foreign keys – These keys create a relationship between the
dimension tables and the fact tables. Without
the relationship to dimension tables the measures would be meaningless. I often use the example of 28kg of rice. 28
by itself is ambiguous but when you add kg (unit of measure dimension) and rice (product
dimension) it becomes much more meaningful. The fact table is on the many side of the
relationship. All the foreign keys in a fact table usually
create a unique row in the fact table and can be considered a composite primary
key.
Surrogate key – Sometimes you will include a surrogate key
to act as the primary key. This key is usually
the business key from the source table.
For performance reasons is best if the key is short and consists of only
1 or 2 columns and preferably numeric.
Often you could just use a collection of foreign keys that point to dimension
tables to act as a composite primary key but this can be confusing and
difficult to work with so sometimes it makes more sense to assign a surrogate
key either from an identity or sequence database object or adopt one from the
source system table.
Lineage columns – These columns are used for auditing
purposes. Typically in a data warehouse
environment the ETL loads are run nightly.
You may want to keep track of which loads are successful or failed and
also the load date so you know how current your data is in the warehouse or
even which source system provided the data.
This is known as lineage data and is typically not exposed to the report
consumer but is very helpful for the DBA or ETL developers.
Example: FactSalesQuota fact table
Thanks for reading,
SQL Canuck