Friday, 20 November 2015

Performance Considerations When Choosing Transformations in SSIS

The use of SQL Server Integration Services (SSIS) can help to save a lot of time when creating ETL solutions.  The graphical nature of the tool makes it very intuitive and easy to understand.  Often, a person can look at a SSIS ETL flow and understand at a glance what the intended purpose is.

I have been working with SSIS for several years now and I have realized that the easy to use graphical nature of the tool is very advantageous but it recently occurred to me that there is a disadvantage that I had not considered.  I have also been the recipient of formal training courses on the tool as well as completed formal exams on the subject.

When you select a transformation to use from the tool box, you select the transformation and drag it over to your data flow.  This could give the illusion that all transformations are created equally because they all come from the same tool box.  What is not immediately apparent is that there are significant differences between the performance behaviors of the transformations based on a few different properties of the objects.  

The properties I will discuss are the synchronous/asynchronous property and the transformation type property ie:  non-blocking, semi-blocking, fully-blocking.  The properties that an object possesses are a significant factor in the performance you can expect from the object.

Synchronous/Asynchronous

A synchronous transformation is the more highly performing type because the transformation uses the same buffer for the output as is used for the input.  The incoming rows are processed and passed on one row at a time and this is possible because the transformation does not need any additional information from other buffers.  MSDN uses the Data Conversion transformation as a perfect example of this.  Each row that is used as input from the buffer is converted and then outputted one row at a time.  Each conversion is completely separate from the rest of the row set so the process does not have to wait on anything else and can complete much faster.



Example: The Data Conversion transformation is synchronous.

An asynchronous transformation is the just the opposite of that.  Instead of relying on the same input buffer as the output buffer, the process may have to acquire multiple buffers before it can complete the necessary processing.  There may be some additional waiting involved.  The output buffer is not independent of the input buffer.   A perfect example of this is the Merge transformation.  Before the output buffer can be complete it has to first look at many rows from each of the input buffers and then completes the merge operation and return the row set in a sorted order.












Example: The Merge Conversion transformation is asynchronous.

It is interesting to note that all destination adapters are synchronous and all source adapters are asynchronous because they need 2 output buffers.  One for the success and one for any failed outputs.

Three Transformation Types
In this next section, Ill briefly discuss the three transformation type of non-blocking, semi-blocking, fully-blocking.

Non-Blocking
This transformation type is the best performing type.  They process row by row with no dependency on other rows.  The number of rows that are in the input buffer are equal to the rows that are in the output buffer.  The process is synchronous and therefore they are made available to the downstream output with no waiting on the status of other buffers.  This makes them very lightweight and efficient.  Row transformation are a good example of a non-blocking transformation.

Semi-Blocking
Semi-blocking transformation have dependence on other rows.  Rows are added to the input buffer and held for a while until the process is able to release the rows to the output buffer.  A good example of this is the Merge transformation.  Rows are added the input buffer and held until a match is found and then the rows are released to the output buffer in a sorted output.  The time taken before a match is found is what causes these types of transformation to perform worse than non-blocking transformations.

Fully-Blocking
A full-blocking transformation is the least performing transformation.  They require all rows to be received from the input buffer and held until they can be subjected to transformation.  The nature of the output is dependent on collecting all the rows first.  A perfect example of a fully-blocking transformation is an Aggregate transformation.  If you are going to SUM a row set it only makes sense that you collect the entire row set first before you can SUM it.

It is important to consider these properties when deciding which transformations to utilize.  A complete understanding of the difference in performance helps to make the best choice.



Example: Transformation Type Categories (Satya Katari)

Thanks for reading,

SQL Canuck

No comments:

Post a Comment