Hope you guys are getting ready for Diwali Festival :-). Before you guys start with the festival event I thought of writing an article on performance. Couple of days ago one of the developer came to me and said he was using an INNER join in his query and he is getting delay in results, soon after he changes it to LEFT OUTER join he could see some improvement in getting the results. I was blinking for a second and I started arguing with him that this will change the business logic & asked him to explain me how’s OUTER join faster so we went into an argument which resulted in this article.
Before I start this article I assume you are familiar with all the JOINS in SQL Server, If not read my earlier article on JOINS. As you know INNER join brings you record which is common to both the tables, however OUTER join will bring you non matching records from either of the table or from both the records including NULL values. So OUTER join will get you more records compared to INNER join which means it requires additional IO & CPU cycles and thus OUTER JOIN wont be faster than INNER join.
INNER join and OUTER join will give you results differently, so you can’t replace an INNER join with an OUTER join as it will change your business logic itself. So, if you are the one! who feel changing the join will improve performance then you have to be cautious about this. Instead of changing the joins you have to look into other factors. Check whether indexes are present on the join columns, make sure stats and indexes are properly updated, primary key consideration and make the join small instead of joining more tables.
Actually when you do the join, SQL Server will use any of the following JOIN operations based on the row count
Nested Loop Join– When one of the table in Join has less number of rows it will use Nested Loop Join
Merge Join– When both the tables have more or less equal number of records and it’s sorted for the joining column then this operation is used
Hash Join– When both the tables rows differ significantly then this operation is used. Hash joins can efficiently process large, unsorted, nonindexed inputs.
Let’s execute some query to find out how these join operations are used.
I’m going to execute the query below, in both the tables number of rows are very less (less than 300) and I’ve used INNER JOIN
USE AdventureWorks2012 GO SELECT A.LoginID , A.HireDate, B.SalesQuota FROM [HumanResources].[Employee] A INNER JOIN [Sales].[SalesPerson] B ON B.BusinessEntityID = A.BusinessEntityID
The above execution is with index on both the table so it used clustered index scan, it used nested loop join even without clustered index
Let’s change the query now to use LEFT OUTER join to fetch the rows with index, since the rows are already sorted with the help of index merge join has just used the indexes. If you can note down in earlier it used Nested loop join now in this query it changed to Merge Join
Now I’ve dropped index on Salesperson table and executed the same query, you can find a difference. Yes n ow there is one more operator “Sort”. Since Merge join requires the column to be sored hence before it goes to Merge join it sorts the data and then does the join hence there is a over head to sort the data which will delay in result set.
Now lets do the same test on large data set (approx 1.2 lac records). I’ve tried the below query with Inner join & it resulted the same plan
USE AdventureWorks2012 GO SELECT A.SalesOrderID,A.CarrierTrackingNumber ,B.OrderDate from [Sales].[SalesOrderDetail] A LEFT OUTER JOIN [Sales].[SalesOrderHeader] B ON B.SalesOrderID = A.SalesOrderID
Do you know you can change the join operation using Join hints. Since the data set is large we will change the merge join to hash join operator using join hint as below
USE AdventureWorks2012 GO SELECT A.SalesOrderID,A.CarrierTrackingNumber ,B.OrderDate from [Sales].[SalesOrderDetail] A LEFT OUTER HASH JOIN [Sales].[SalesOrderHeader] B ON B.SalesOrderID = A.SalesOrderID
Note down the difference in cost between this and the earlier plan (using merge) you can find a drastic change in operator cost from 0.33 to 1.58, similarly subtree cost is also increased from 3.7 to 5.2. That’s the reason we always prefer not to use hints in your query as per best practice, it’s better to give the decision to query optimizer which will chose the best plan based on your data pattern and distribution.
So to conclude this from the above examples we can say OUTER JOIN performs the similar way as INNER JOIN however OUTER join need to perform additional task of getting non matched rows from tables as well as it has to bring the NULL values. Join operations plans an important role in performance however it relies on the data pattern and data distribution.