Friday, 16 October 2015

UNION ALL and MERGE in SSIS

Recently, I  needed to combine the results of two different queries into one existing dim table.  Our third party vendor changed the way they performed a business requirement.  Our existing SSIS ETL processes had already loaded data into the data warehouse based on the ‘old’ way of doing things but because now there was a new way of doing the same thing we needed to enhance our ETL to take into account the ‘new’ way and load the data into the same existing dim table.

I was able to write a SQL query that used a UNION ALL statement that would combine the old and new way of doing things and I could have put that SQL query into a simple Execute SQL Task control flow but it made more sense to represent the combination of the two queries in a graphic fashion so that the flow was more self-documenting and made more sense to the next person who has to deal with it.

I would put the old query in one OLE DB Source and the new query in another OLE DB Source and then combine them into one destination.  It was easy enough to create the top two sources but when it came time to make the center combination of the two sources I realized there were two options.  I could use the MERGE transformation or the UNION ALL transformation.

















Example:  Union All Transformation


















Example:  Merge Transformation



I have worked as a Data Warehouse Architect for several years and I have had the opportunity to use the MERGE transformation before but I have not worked with the UNION ALL in a while and I needed to remind myself what was the difference of the two since they appeared so similar.

After some research, it is really quite simple. 

There are only 3 main differences between them.
  1.        Union All will output unsorted data so does not require sorted inputs. 
  2.        Merge requires sorted inputs and will output sorted data.
  3.        Merge can only use 2 inputs.  Union All can accept multiple inputs.


It is quite obvious from the list above that most significant item is how Merge can only use 2 inputs.  
If you have more than 2 inputs that you need to combine then you must use Union All.  

Union All is nicer to work with because you don’t have to mess around with ensuring the inputs are sorted which can require a SORT transformation or just having a SORT clause in your sql query.  

Merge can be very fussy about how you set advanced properties such as IsSorted and SortKey position.

An important similarity between the two transformation is that both require the inputs and outputs to have identical metadata such as compatible data types.

In the end the results are identical so take case to know the differences and choose carefully depending on the nature of you goal.


Thanks for reading,   SQL Canuck

No comments:

Post a Comment