Recently, I needed to
combine the results of two different queries into one existing dim table. Our third party vendor changed the way they
performed a business requirement. Our
existing SSIS ETL processes had already loaded data into the data warehouse
based on the ‘old’ way of doing things but because now there was a new way of
doing the same thing we needed to enhance our ETL to take into account the ‘new’ way and load the data into the same existing dim table.
I was able to write a SQL query that used a UNION ALL
statement that would combine the old and new way of doing things and I could
have put that SQL query into a simple Execute SQL Task control flow but it made
more sense to represent the combination of the two queries in a graphic fashion
so that the flow was more self-documenting and made more sense to the next
person who has to deal with it.
I would put the old query in one OLE DB Source and the new
query in another OLE DB Source and then combine them into one destination. It was easy enough to create the top two
sources but when it came time to make the center combination of the two sources
I realized there were two options. I could
use the MERGE transformation or the UNION ALL transformation.
Example: Union All Transformation
I have worked as a Data Warehouse Architect for several years
and I have had the opportunity to use the MERGE transformation before but I have
not worked with the UNION ALL in a while and I needed to remind myself what was
the difference of the two since they appeared so similar.
After some research, it is really quite simple.
There are only 3 main differences between
them.
- Union All will output unsorted data so does not require sorted inputs.
- Merge requires sorted inputs and will output sorted data.
- Merge can only use 2 inputs. Union All can accept multiple inputs.
It is quite obvious from the list above that most
significant item is how Merge can only use 2 inputs.
If you have more than 2 inputs that you need
to combine then you must use Union All.
Union All is nicer to work with because you don’t have to mess around
with ensuring the inputs are sorted which can require a SORT transformation or
just having a SORT clause in your sql query.
Merge can be very fussy about how you set advanced properties such as
IsSorted and SortKey position.
An important similarity between the two transformation is
that both require the inputs and outputs to have identical metadata such as compatible
data types.
In the end the results are identical so take case to know
the differences and choose carefully depending on the nature of you goal.
Thanks for reading,
SQL Canuck
No comments:
Post a Comment