What to choose for performance - SubQuery or Joins - Part 62

Text version of the video
http://csharp-video-tutorials.blogspot.com/2013/01/what-to-choose-for-performance.html

Slides
http://csharp-video-tutorials.blogspot.com/2013/09/part-62-what-to-choose-for-performance.html

All SQL Server Text Articles
http://csharp-video-tutorials.blogspot.com/p/free-sql-server-video-tutorials-for.html

All SQL Server Slides
http://csharp-video-tutorials.blogspot.com/p/sql-server.html

All Dot Net and SQL Server Tutorials in English
https://www.youtube.com/user/kudvenkat/playlists?view=1&sort=dd

All Dot Net and SQL Server Tutorials in Arabic
https://www.youtube.com/c/KudvenkatArabic/playlists

What to choose for performance – SubQuery or Joins

According to MSDN, in SQL Server there is usually no performance difference between a query that uses sub-queries and an equivalent query that uses joins. For example, on my machine I have
400,000 records in the tblProducts table
600,000 records in the tblProductSales table

The following query returns the list of products that we have sold at least once. This query is formed using a sub-query. When I execute this query, I get 306,199 rows in 6 seconds.
Select Id, Name, Description
from tblProducts
where ID IN
(
Select ProductId from tblProductSales
)

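At this stage, please clear the data (buffer) cache and the execution plan cache using the following T-SQL commands, so that the next query cannot benefit from pages or plans cached by the previous one.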
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
DBCC FREEPROCCACHE;
GO

Now, run the query that is formed using joins. Notice that I get the exact same 306,199 rows in 6 seconds.
Select distinct tblProducts.Id, Name, Description
from tblProducts
inner join tblProductSales
on tblProducts.Id = tblProductSales.ProductId

Please note: I used an automated SQL script to insert this large amount of random data. Please watch Part 61 of the SQL Server tutorial, in which we discussed this automated script.

According to MSDN, in some cases where existence must be checked, a join produces better performance. Otherwise, the nested query must be processed for each result of the outer query. In such cases, a join approach would yield better results.

The following query returns the products that we have not sold even once. This query is formed using a sub-query. When I execute this query, I get 93,801 rows in 3 seconds.
Select Id, Name, [Description]
from tblProducts
where Not Exists(Select * from tblProductSales where ProductId = tblProducts.Id)

When I execute the equivalent query below, which uses a join, I get the exact same 93,801 rows in 3 seconds.
Select tblProducts.Id, Name, [Description]
from tblProducts
left join tblProductSales
on tblProducts.Id = tblProductSales.ProductId
where tblProductSales.ProductId IS NULL

In general, joins work faster than sub-queries, but in reality it all depends on the execution plan that SQL Server generates. It does not matter how we have written the query; SQL Server will always transform it into an execution plan. If SQL Server generates the same plan for both queries, we will get the same performance.
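
One way to check whether SQL Server arrives at the same plan for both forms is to ask for the estimated plan as text, without actually running the queries. The following is only a sketch, assuming the tblProducts and tblProductSales tables used above. SET SHOWPLAN_TEXT has to be the only statement in its batch, which is why each step is separated by GO.

```sql
-- Return the estimated plan as text instead of executing the queries
SET SHOWPLAN_TEXT ON;
GO

-- Sub-query version
Select Id, Name, [Description]
from tblProducts
where Not Exists(Select * from tblProductSales where ProductId = tblProducts.Id);
GO

-- Join version
Select tblProducts.Id, Name, [Description]
from tblProducts
left join tblProductSales
on tblProducts.Id = tblProductSales.ProductId
where tblProductSales.ProductId IS NULL;
GO

-- Switch back to normal execution
SET SHOWPLAN_TEXT OFF;
GO
```

If the two outputs list the same operators in the same order, similar timings are to be expected. If they differ, that difference is the place to start looking.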

I would say, rather than going by theory, turn on client statistics and the execution plan to see the performance of each option, and then make a decision. In a later video session we will discuss client statistics and execution plans in detail.
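
Client statistics and the actual execution plan are switched on from the Query menu in SQL Server Management Studio. As a rough server-side complement, the sketch below uses SET STATISTICS TIME and SET STATISTICS IO, which print CPU time, elapsed time and page reads for each statement in the Messages tab. It assumes the same two tables as above and a test server, since the DBCC commands clear caches for the whole instance.

```sql
-- Assumption: test server only; these commands clear server-wide caches
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
DBCC FREEPROCCACHE;
GO

-- Report timings and page reads for every statement that follows
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
GO

-- Option 1: sub-query
Select Id, Name, Description
from tblProducts
where Id IN (Select ProductId from tblProductSales);
GO

-- Clear the caches again so option 2 also starts cold
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
DBCC FREEPROCCACHE;
GO

-- Option 2: join
Select distinct tblProducts.Id, Name, Description
from tblProducts
inner join tblProductSales
on tblProducts.Id = tblProductSales.ProductId;
GO

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
GO
```

Comparing the logical reads and elapsed time reported for the two options gives a more repeatable picture than a single stopwatch reading.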
