Tuesday, 22 December 2015

SQL Server using Windows/Active Directory Groups

My experience working as an operational DBA is that a significant portion of time is spent on time-consuming, repetitive requests from end users, or on necessary maintenance work such as monitoring space usage and remedying issues when space runs low.

If DBAs are lucky enough to be sent on training from time to time, they may have the opportunity to learn about interesting topics like performance tuning or troubleshooting performance issues.  My experience is that you can go off and learn all sorts of tips and tricks about performance optimization and return to work intending to apply your new or refreshed skills to proactively tune your environment.  However, once you are back, the steady stream of day-to-day requests such as security provisioning resumes, and your best intentions once again take a back seat.

Using Windows/AD groups for SQL Server security is not only a best practice recommended by Microsoft, but also a very concrete step you can take to free up time for proactive work that improves the SQL Server environment you are responsible for.

In this blog, I'll summarize a couple of benefits of using Windows/AD groups to make securing your SQL Server much more efficient and less labor intensive.

Anyone who has attained any level of SQL Server certification has had to learn the basics of SQL Server.  Two of the first things you learn about are logins and users, and time is spent learning the difference between them.

In order to even connect to a SQL Server instance, you need a login so your credentials can be authenticated.  That login is mapped to a database user object, and from there privileges and permissions are provisioned as necessary.  Sounds simple, right?  It is, but it can get a lot more complicated because of the amount of work it takes to administer all of these objects if you create separate logins and users for each individual person and each application context.

The premise of this blog is that Windows authentication should be used to leverage the security of a trusted domain in a Windows network environment.  Benefits also include features such as password complexity and expiration requirements, which are administered by the domain administrators.

When defining a login, it is a good idea to use a Windows/AD group, such as the following:

USE [master]
GO
CREATE LOGIN [DOMAIN\ADGroup] FROM WINDOWS 
WITH 
   DEFAULT_DATABASE=[Adventureworks2012];

Then you can create a user object for this login:

USE [Adventureworks2012] 
GO 
CREATE USER [DOMAIN\ADGroup] FOR LOGIN [DOMAIN\ADGroup]
WITH DEFAULT_SCHEMA=[dbo]; 
EXEC sp_addrolemember N'db_datareader', N'DOMAIN\ADGroup';

A very real benefit from a security point of view is that connection strings from applications to the SQL Server instance no longer need to include vulnerable passwords.  If the domain is trusted, then a connection string such as the following is much more secure.

"Driver={SQL Server};Server=SERVERNAME;Database=Adventureworks2012;Trusted_Connection=yes;"

The biggest time-saving benefit is that when a new employee is hired or an employee leaves, the domain administrator only has to add them to or remove them from the Windows/AD group.  Because the login is created from the Windows/AD group, the user is automatically included in or excluded from the SQL Server security model.  There is no more adding or deleting of individual logins and users, and the risk of creating orphaned users is reduced as well.
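
If you ever want to see exactly which Windows accounts are getting access through the group, you can ask SQL Server to enumerate the group membership.  Here is a small sketch using the xp_logininfo system procedure, with the same placeholder group name used throughout this post:

USE [master];
GO
-- List the individual Windows accounts that gain access via the group login.
-- The 'members' option asks for the group membership as resolved by the domain.
EXEC xp_logininfo @acctname = N'DOMAIN\ADGroup', @option = 'members';
GO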

A lesser known benefit of using Windows/AD groups for logins is that the ability to audit at a granular level is not lost.  For example, a user is authenticated by signing on to the trusted domain, and the Windows/AD group is mapped to a login.  When a user who connects through that group login performs DDL or DML, their actions are logged under their own Windows account.  If you were to execute sp_who2 you would see their individual Windows account, e.g. DOMAIN\UserName.
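
Here is a quick way to see that for yourself; sp_who2 is the procedure mentioned above, and the sys.dm_exec_sessions query is simply an alternative way to pull the same login information:

-- Returns one row per session; the Login column shows the individual
-- Windows account, not the group, even though access came via the group.
EXEC sp_who2;

-- A more targeted alternative using a dynamic management view:
SELECT session_id, login_name, original_login_name
FROM sys.dm_exec_sessions
WHERE is_user_process = 1;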

This illustration shows the various levels of security, from the Windows level down to the database level.

Example: Permissions Hierarchy (MSDN)

By combining a Windows/AD group with a server-level role and a database-level user, the security model becomes both robust and easy to maintain.
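
As a simple sketch of that last point, the group login can be added to a server-level role like this; dbcreator is just an example role, so substitute whichever role fits your environment:

USE [master];
GO
-- Add the Windows/AD group login to a built-in server-level role.
ALTER SERVER ROLE [dbcreator] ADD MEMBER [DOMAIN\ADGroup];
GO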

Thanks for reading,
SQLCANUCK

Friday, 4 December 2015

Transactions in SSIS

It could be argued that the number one reason why database products like SQL Server and Oracle are a viable solution for managing data is that they possess certain properties that ensure data integrity.  This is accomplished by adhering to the ACID properties and by using transactions.

ACID stands for Atomicity, Consistency, Isolation, and Durability.  I won't go into detail on these in this blog, but I'll direct you to this MSDN link for more information: https://msdn.microsoft.com/en-us/library/aa480356.aspx?f=255&MSPPError=-2147217396 .

The use of transactions when dealing with data is vital to ensuring data integrity, and as a DBA and SQL developer I have implemented transactions in T-SQL code for exactly that reason.  SQL Server Integration Services needs to use transactions for the same reasons.
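
Before getting into SSIS, here is a minimal T-SQL sketch of the idea; the dbo.Account table and its columns are invented purely for illustration:

BEGIN TRY
    BEGIN TRANSACTION;

    -- Both statements must succeed or neither should persist.
    UPDATE dbo.Account SET Balance = Balance - 100 WHERE AccountID = 1;
    UPDATE dbo.Account SET Balance = Balance + 100 WHERE AccountID = 2;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Any error undoes all of the work, preserving atomicity.
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    THROW;
END CATCH;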

The use of transactions to ensure the ACID properties are followed is fully supported in SSIS and this blog will deal with the SSIS options that make it possible.

When you study for and complete any Microsoft certification on SQL Server, you will be required to know the differences between the various isolation levels that SQL Server supports.  These isolation levels control the behavior of transactions and the degree of concurrency, which has a direct effect on the behavior you can expect when querying data in a multi-user environment.  I won't go into detail on the various isolation levels because the topic is quite involved, but I will direct you to this link for further explanation, which also includes a cute poster that Kendra Little created: http://www.brentozar.com/isolation-levels-sql-server/ .
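
For reference, setting the isolation level for a T-SQL session looks like this; READ COMMITTED shown below is simply the engine's default, and Production.Product is just a table from the AdventureWorks sample database:

-- Applies to the current session until changed again.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

BEGIN TRANSACTION;
SELECT COUNT(*) FROM Production.Product;  -- sees only committed data at this level
COMMIT TRANSACTION;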


The SSIS properties I want to discuss in this blog are the Transaction Option properties. 

 

Example:  Transaction Option in Properties window in SSDT


The three possible options for this property are Supported, NotSupported, and Required.  This property can be set in the Properties window in SSDT.

Transaction Options

      Supported:  I'll start with Supported because this is the default used when you create a new package in an SSIS project.  Supported means that the task or container does not start a transaction of its own, but it will join one if it already exists.  This means that if a container set to Supported is part of a package that is set to Required, the container will join the transaction of the parent package.
         
      Required:  This one is a bit easier to understand.  It simply means that the container will start a transaction.  However, if the container has a parent that already has a transaction, then it will join the parent's transaction.  If the parent were set to Supported or NotSupported, then this container would start a transaction of its own.
      
      NotSupported:  This option indicates that the container will not join an existing transaction nor will it start a transaction of its own.


Note:  The use of transactions in SSIS requires that the MSDTC (Microsoft Distributed Transaction Coordinator) service be running.

If you try to implement transactions in SSIS and receive an error like the one below, you should check whether the MSDTC is running.

 
Example: Error you receive if the MSDTC is not running.

To start the MSDTC, open the Control Panel, go to Administrative Tools, and click Services.  Locate Distributed Transaction Coordinator in the list of services and click Start in the Action menu.



Example: Distributed Transaction Coordinator in the Services snap-in.

Thanks for reading,

SQL CANUCK





Friday, 20 November 2015

Performance Considerations When Choosing Transformations in SSIS

The use of SQL Server Integration Services (SSIS) can save a lot of time when creating ETL solutions.  The graphical nature of the tool makes it very intuitive and easy to understand.  Often, a person can look at an SSIS ETL flow and understand at a glance what its intended purpose is.

I have been working with SSIS for several years, have received formal training on the tool, and have completed formal exams on the subject.  I have found the easy-to-use graphical nature of the tool to be very advantageous, but it recently occurred to me that there is a disadvantage I had not considered.

When you want to use a transformation, you simply select it in the toolbox and drag it over to your data flow.  This can give the illusion that all transformations are created equal, because they all come from the same toolbox.  What is not immediately apparent is that there are significant differences in the performance behavior of the transformations, based on a few properties of the objects.

The properties I will discuss are the synchronous/asynchronous property and the transformation type, i.e. non-blocking, semi-blocking, or fully-blocking.  The properties that an object possesses are a significant factor in the performance you can expect from it.

Synchronous/Asynchronous

A synchronous transformation is the better-performing type because it uses the same buffer for the output as for the input.  Incoming rows are processed and passed on one row at a time, which is possible because the transformation does not need any additional information from other buffers.  MSDN uses the Data Conversion transformation as a perfect example of this.  Each row taken as input from the buffer is converted and then output one row at a time.  Each conversion is completely independent of the rest of the row set, so the process does not have to wait on anything else and can complete much faster.



Example: The Data Conversion transformation is synchronous.

An asynchronous transformation is just the opposite.  Instead of reusing the input buffer as the output buffer, the process may have to acquire multiple buffers before it can complete the necessary processing, so there may be additional waiting involved.  The output buffer is independent of the input buffer(s).  A perfect example of this is the Merge transformation.  Before the output buffer can be completed, the transformation has to look at many rows from each of the input buffers, complete the merge operation, and return the row set in sorted order.

Example: The Merge transformation is asynchronous.

It is interesting to note that all destination adapters are synchronous, and all source adapters are asynchronous because they need two output buffers: one for the successful rows and one for any failed rows.

Three Transformation Types
In this next section, I'll briefly discuss the three transformation types: non-blocking, semi-blocking, and fully-blocking.

Non-Blocking
This transformation type is the best-performing type.  These transformations process row by row with no dependency on other rows.  The number of rows in the input buffer is equal to the number of rows in the output buffer.  The process is synchronous, so rows are made available to the downstream output without waiting on the status of other buffers.  This makes them very lightweight and efficient.  Row transformations are a good example of non-blocking transformations.

Semi-Blocking
Semi-blocking transformations have dependencies on other rows.  Rows are added to the input buffer and held for a while until the process is able to release them to the output buffer.  A good example of this is the Merge transformation.  Rows are added to the input buffer and held until a match is found, and then the rows are released to the output buffer in sorted order.  The time spent waiting for a match is what causes these transformations to perform worse than non-blocking transformations.

Fully-Blocking
A fully-blocking transformation is the worst-performing type.  It requires all rows to be received from the input buffer and held before the transformation can be applied, because the output depends on collecting the entire row set first.  A perfect example of a fully-blocking transformation is the Aggregate transformation.  If you are going to SUM a row set, it only makes sense that you collect the entire row set before you can SUM it.

It is important to consider these properties when deciding which transformations to utilize.  A complete understanding of the difference in performance helps to make the best choice.



Example: Transformation Type Categories (Satya Katari)

Thanks for reading,

SQL Canuck

Friday, 13 November 2015

Using LOCALHOST to connect to a named SQL Server 2012 Instance

Apart from working on database servers that are part of a wide area network for most of my day-to-day duties as a Senior Data Warehouse Architect, I like to keep local SQL Server instance(s) on my work PC.  This allows me to spend some time working through training demos that I come across, or that are part of my self-paced learning objectives for certification exams.

I am about a week away from writing the first of two certification exams for the MCSA: SQL Server 2012 certification.  In preparation, I am working through some material from Microsoft Press.  This material comes with excellent demos, and these demos need access to a named instance of SQL Server 2012.  I installed the Developer edition of SQL Server 2012 on my PC and created an appropriate named instance.

For many years, I have kept various instances of SQL Server local to my PC so I could experiment in a safe environment.  I got in the habit of connecting to these instances using the LOCALHOST hostname, which lets me access network services running on the host (my own PC) via the loopback network interface.  I could type my computer name each time, but it is just easier to type LOCALHOST or even the loopback IP 127.0.0.1 .

After I created my newest named instance, I noticed that I was having connection issues when I tried to connect using LOCALHOST as the server name.  The same was true when I tried to use the loopback IP address.  When I tried to connect in SSMS using LOCALHOST or 127.0.0.1 as the server name, it would not connect and gave me the following error.

Example: Connect to Server error message


After some investigation, I realized that because I was trying to connect to a named instance rather than a default instance, there were a few steps I needed to take to continue using LOCALHOST as my server name.

If I wanted the hostname LOCALHOST to be associated with my newly installed named instance, I would need to create an alias.  This can be done in the SQL Server Configuration Manager that is installed with SQL Server 2012, found at Start > All Programs > Microsoft SQL Server 2012 > Configuration Tools.

Once Configuration Manager is open, go to the node called SQL Native Client 11.0 Configuration (32bit), right-click Aliases, and select New Alias.  Enter 'LOCALHOST' under Alias Name and your SERVERNAME\InstanceName under Server.  You should do the same under SQL Native Client 11.0 Configuration if your PC runs a 64-bit architecture.

Example: SQL Server Configuration Manager

Example: Add a new alias


Now you will find that you can connect to the database engine using LOCALHOST as the server name, and because the alias points to a named instance, you don't even need to specify the \InstanceName as part of the server name when you connect in SSMS, SSIS, etc.
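
As a quick sanity check after connecting through the alias, a small query like the following (nothing instance-specific is assumed here) will confirm which server and instance you actually landed on:

-- Should return your machine\instance name and the instance name,
-- proving the LOCALHOST alias resolved to the named instance.
SELECT @@SERVERNAME AS ServerName,
       SERVERPROPERTY('InstanceName') AS InstanceName;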

It is also important to note that you can connect to a named instance using a loopback address as long as you also specify your instance name, e.g. 127.0.0.1\InstanceName, in which case an alias isn't necessary.

The method described above is not much effort, and if you are going to be doing a lot of work on a local named instance, being able to connect using LOCALHOST instead of SERVERNAME\InstanceName is a convenient time saver.


Thanks for reading,
SQL Canuck




Friday, 6 November 2015

Out of Date Statistics

This week a BI specialist asked me to look at a query that was not performing very well.  He knew that I have attended a week-long Microsoft course called 'SQL Server 2012: Performance Tuning - Design, Internals and Architecture', so I have had the opportunity to help him out on a number of performance-based questions.

He sent the query over, I ran it, and I inspected the actual query plan by pressing the 'Include Actual Execution Plan' button in SQL Server 2012 Management Studio.

(Example: Include Actual Execution Plan button)

The query was not very long or complicated so I was surprised when it took several minutes to return data.  When I inspected the query execution plan I didn’t notice anything that looked like it should be causing a long execution time.  Quite the opposite was true actually.

When I broke the query into chunks and ran each chunk individually, performance was fine until I introduced a specific table; we will call it TableA.  I investigated the metadata of TableA and discovered that it was a large table, but it had appropriate indexes, including a clustered B-tree index on a surrogate integer column.

When I examined the query plan for TableA, it indicated a Clustered Index Seek with a Key Lookup.  I was surprised to see this because this really is the ideal situation and the best choice for highly selective queries.  It means that the optimizer has located an appropriate index.  The index seek means that the engine will not have to scan all the rows in the data pages to find what it is looking for; it can traverse the B-tree index and, in just a couple of reads, get right to the index key it is looking for.  It can then do a quick Key Lookup and go right to the needed data.

(Example: Clustered Index Seek operator)

(Example: Key Lookup operator)

This struck me as strange: if everything was so ideal, then why was it taking so long?  If you guessed 'statistics' then you are correct.  The first thing that came to mind was that the statistics for that index might be out of date.

I suggested that the developer identify the statistics associated with that index and check when they were last updated.
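
You can check this in SSMS as shown in the screenshots below, or with a short query against sys.stats; this sketch assumes the AdventureWorks sample database used later in this post, so substitute your own table name:

USE AdventureWorks2012;
GO
-- Shows each statistics object on the table and when it was last updated.
SELECT  s.name AS StatisticName,
        STATS_DATE(s.object_id, s.stats_id) AS LastUpdated
FROM    sys.stats AS s
WHERE   s.object_id = OBJECT_ID('Production.Product');
GO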

(Example: Statistics node associated with the Products table)

(Example: Property page of a statistic object shows the last time they were updated.)

If the statistics were out of date then they should be updated by running a similar command to the following:
 
USE AdventureWorks2012;
GO
UPDATE STATISTICS Production.Product(Products)
    WITH FULLSCAN, NORECOMPUTE;
GO

'The following example updates the Products statistics in the Product table, forces a full scan of all rows in the Product table, and turns off automatic statistics for the Products statistics.'
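
If you want to confirm the refresh afterwards, something like the following works; it references the documentation's Products statistic on the Production.Product table, so substitute your own object names:

-- Displays the statistics header (including the Updated date), density vector
-- and histogram for the Products statistics object on Production.Product.
DBCC SHOW_STATISTICS ('Production.Product', Products);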

The developer came back to me later that day and said he had updated the statistics and now the query was completing in just a few seconds.

Thanks for reading,

SQL Canuck

Friday, 30 October 2015

Fact Table Column Types

I promised last week to continue the discussion of the various column types that are found in dimension and fact tables.

I explained the types of columns found in dimension tables and that blog can be found here: (http://mssqlcanuck.blogspot.ca/2015/10/dimension-table-column-types.html).

I also have a previous blog about the basics of dimensional modelling and it helps to explain what dimension and fact tables are. (http://mssqlcanuck.blogspot.ca/2015/08/dimensional-modeling-concepts.html)

Fact tables are really what make the dimensional model so efficient and quick for reporting and data analytics, because they contain granular measurements of important business processes.  Fact tables combined with dimension tables create an ideal environment for fast and accurate data analysis.

I mentioned in the blog about dimension column types that it can be overwhelming to hear all the names of data warehousing concepts such as lineage, slowly changing dimensions, surrogate keys, natural keys, composite keys, factless fact tables, etc.  I've tried to break these concepts down into smaller blogs so they are easier to digest.

Fact Table Column Types

These are the types of columns you will find in a fact table.

Measure – This is the essence of a fact table: measures store the measurements associated with a specific business process.  They are usually numeric and can be aggregated.  Examples are sales amount, order quantity, and discount amount.  These values are often additive or semi-additive, and when associated with the relevant dimensions they can be rolled up to represent aggregations at various granularities.

Foreign keys – These keys create the relationships between the dimension tables and the fact table.  Without the relationships to dimension tables, the measures would be meaningless.  I often use the example of 28 kg of rice: 28 by itself is ambiguous, but when you add kg (unit of measure dimension) and rice (product dimension) it becomes much more meaningful.  The fact table is on the many side of the relationship.  The set of foreign keys in a fact table usually identifies a unique row and can serve as a composite primary key.

Surrogate key – Sometimes you will include a surrogate key to act as the primary key.  For performance reasons it is best if the key is short, preferably numeric, and consists of only one or two columns.  Often you could just use the collection of foreign keys that point to the dimension tables as a composite primary key, but this can be confusing and difficult to work with, so sometimes it makes more sense to assign a surrogate key, either from an identity or sequence database object, or adopted from the source system table.


Lineage columns – These columns are used for auditing purposes.  Typically, in a data warehouse environment, the ETL loads run nightly.  You may want to keep track of which loads succeeded or failed, as well as the load date, so you know how current the data in the warehouse is and even which source system provided it.  This is known as lineage data; it is typically not exposed to report consumers but is very helpful for DBAs and ETL developers.
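
To tie the column types together, here is a small, hypothetical T-SQL sketch of a fact table; every table and column name is made up for illustration and is not taken from any particular source system:

CREATE TABLE dbo.FactSales
(
    SalesKey        BIGINT IDENTITY(1,1) NOT NULL,  -- surrogate key
    DateKey         INT   NOT NULL,                 -- foreign key to DimDate
    ProductKey      INT   NOT NULL,                 -- foreign key to DimProduct
    CustomerKey     INT   NOT NULL,                 -- foreign key to DimCustomer
    SalesAmount     MONEY NOT NULL,                 -- measure (additive)
    OrderQuantity   INT   NOT NULL,                 -- measure (additive)
    LoadDate        DATETIME2 NOT NULL,             -- lineage: when the row was loaded
    SourceSystem    VARCHAR(20) NOT NULL,           -- lineage: where the row came from
    CONSTRAINT PK_FactSales PRIMARY KEY CLUSTERED (SalesKey)
);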

Example:  FactSalesQuota fact table

Thanks for reading, 

SQL Canuck

Friday, 23 October 2015

Identify and Resolve Orphaned SQL Server Database Users

In my experience as a SQL Server DBA, I found myself spending a lot of time making sure that our backup/restore strategy was in tip-top shape.  This involved spot-checking backups and testing them on test servers to ensure that they were valid.

I'm reminded of a couple of old DBA sayings such as "Your data is only as good as your last backup" and "Your backup is only as good as your last restore."  Testing that your backups will be useful in a disaster recovery situation is very important, and essential for job preservation.

Also, as operational support DBAs for various LOB systems such as PeopleSoft and BizTalk, we would spend some time creating copy-only backups and restoring them in test and development environments so the developers had a 'refresh' of test or development data.

A possible by-product of moving databases around to various environments is that sometimes it can result in an orphaned user. 

An orphaned user is a database user that has no corresponding SQL Server login.  When you create a login so that it can be authenticated against the SQL Server instance, you will also create a user and associate the two.

USE [AdventureWorks2012]
GO
CREATE USER "domain\user1" FOR LOGIN "domain\login1";
GO

If a SQL Server log in is dropped but the user is not then the user is orphaned. 

USE [AdventureWorks2012]
GO
DROP LOGIN "domain\login1";
GO

It is prudent to clean up these orphaned users from time to time, both to avoid confusion and for security reasons.  It can be very time consuming and brain numbing to manually traverse the database objects in SSMS to identify and delete users.  It would also be risky, because you could incorrectly identify a user as orphaned, delete it, and possibly cause serious problems.

There is a way to detect orphaned users in a database by running a system stored procedure.  The system stored procedure is called sp_change_users_login, and it has an option where you provide @Action = 'Report' as a parameter.

USE [AdventureWorks2012]
GO
EXEC sp_change_users_login @Action = 'Report';
GO

The command will return the following result set:

You can also use the sp_change_users_login stored procedure to associate an orphaned database user with a SQL Server login that you choose.
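
As a rough sketch, the repair typically looks something like the following; the user and login names (OrphanUser, SQLLogin1) are placeholders, and note that this procedure only maps users to SQL Server logins, not Windows logins:

USE [AdventureWorks2012]
GO
-- Map the orphaned database user to an existing SQL Server login.
EXEC sp_change_users_login
     @Action = 'Update_One',
     @UserNamePattern = 'OrphanUser',
     @LoginName = 'SQLLogin1';
GO

-- The more current alternative (sp_change_users_login is deprecated):
ALTER USER OrphanUser WITH LOGIN = SQLLogin1;
GO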

You can read more about sp_change_users_login in the Microsoft documentation.



Thanks for reading,  

SQL Canuck