The use of SQL Server Integration Services (SSIS) can help
to save a lot of time when creating ETL solutions. The graphical nature of the tool makes it
very intuitive and easy to understand. Often,
a person can look at a SSIS ETL flow and understand at a glance what the
intended purpose is.
I have been working with SSIS for several years now and I have
realized that the easy to use graphical nature of the tool is very advantageous
but it recently occurred to me that there is a disadvantage that I had not
considered. I have also been the recipient of formal training courses on the tool as well as completed formal exams on the subject.
When you select a transformation to use from the tool box,
you select the transformation and drag it over to your data flow. This could give the illusion that all
transformations are created equally because they all come from the same tool
box. What is not immediately apparent is
that there are significant differences between the performance behaviors of the
transformations based on a few different properties of the objects.
The properties I will discuss are the synchronous/asynchronous
property and the transformation type property ie: non-blocking, semi-blocking, fully-blocking. The properties that an object possesses are a
significant factor in the performance you can expect from the object.
Synchronous/Asynchronous
A synchronous transformation is the more highly performing type because the transformation uses the same buffer for the output as is used for
the input. The incoming rows are
processed and passed on one row at a time and this is possible because the transformation
does not need any additional information from other buffers. MSDN uses the Data Conversion transformation
as a perfect example of this. Each row
that is used as input from the buffer is converted and then outputted one row
at a time. Each conversion is completely
separate from the rest of the row set so the process does not have to wait on
anything else and can complete much faster.
Example: The Data Conversion transformation is synchronous.
An asynchronous transformation is the just the opposite of
that. Instead of relying on the same
input buffer as the output buffer, the process may have to acquire multiple
buffers before it can complete the necessary processing. There may be some additional waiting
involved. The output buffer is not
independent of the input buffer. A
perfect example of this is the Merge transformation. Before the output buffer can be complete it
has to first look at many rows from each of the input buffers and then completes
the merge operation and return the row set in a sorted order.
Example: The Merge Conversion transformation is asynchronous.
It is interesting to note that all destination adapters are
synchronous and all source adapters are asynchronous because they need 2 output
buffers. One for the success and one for
any failed outputs.
Three Transformation
Types
In this next section, Ill briefly discuss the three
transformation type of non-blocking, semi-blocking, fully-blocking.
Non-Blocking
This transformation type is the best performing type. They process row by row with no dependency on
other rows. The number of rows that are
in the input buffer are equal to the rows that are in the output buffer. The process is synchronous and therefore they
are made available to the downstream output with no waiting on the status of
other buffers. This makes them very
lightweight and efficient. Row transformation
are a good example of a non-blocking transformation.
Semi-Blocking
Semi-blocking transformation have dependence on other
rows. Rows are added to the input buffer
and held for a while until the process is able to release the rows to the
output buffer. A good example of this is
the Merge transformation. Rows are added
the input buffer and held until a match is found and then the rows are released
to the output buffer in a sorted output.
The time taken before a match is found is what causes these types of
transformation to perform worse than non-blocking transformations.
Fully-Blocking
A full-blocking transformation is the least performing transformation. They require all rows to be received from the
input buffer and held until they can be subjected to transformation. The nature of the output is dependent on
collecting all the rows first. A perfect
example of a fully-blocking transformation is an Aggregate transformation. If you are going to SUM a row set it only
makes sense that you collect the entire row set first before you can SUM it.
It is important to consider these properties when deciding
which transformations to utilize. A
complete understanding of the difference in performance helps to make the best
choice.
Example: Transformation Type Categories (Satya Katari)
Thanks for reading,
SQL Canuck