Is a Case statement considered SARGable
Recently a question was posed by Denny Cherry (@MrDenny) about using a case statement on the right side of an inner join. That sparked a rather vibrant conversation between myself and Mladen Prajdic (@MladenPrajdic). Here is the original question:
"Any idea if having a case on the right side of an inner join will hurt performance?"
Here is the section in question (Can't seem to get it to fit, click on image to see it properly):
The conversation revolved around whether or not the optimizer in SQL Server is capable of evaluating the case statement prior to executing the statement or if it must wait to after. This is all very important to understand since the optimizer must know values ahead of time for them to be what is called a SARG (Search ARGument).
Alright Sherman, quick trip in the way-back machine for those who are scratching your head. SQL Server needs to know either exact values, a.purchaseDate = '1/22/10' or be able to look up values against other table columns, a.purchaseDate=b.purchaseDate for it to consider on index on a.purchaseDate. If it cannot calculate these values ahead of time (pre-compile) then it cannot consider any indexes on a.purchaseDate. This is why it is highly frowned upon to surround your where clause columns with functions like DateAdd. This is no longer a SARG: a.purchaseDate=dateAdd('dd',1,b.purchasedate). Why? The function DateAdd, is evaluated after the optimizer has calculated its query plan. Since the query plan determines what indexes to use, it cannot evaluate the proper index. You want to make sure that any column that has an index on it has values it calculate ahead of time; SARGS.
We went back and forth try to dissect how the optimizer would handle a CASE statement. One train of thought was the the impression that a CASE statement was nothing more than a T-SQL structure that was then converted into, potentially complex, AND's and OR's, much the same way an IN is converted into a bunch of OR's. This would allow the optimizer to calculate the values ahead of time (by exploding the CASE statement) into those complex AND's and OR's. The other idea was that a CASE works just like other functions and is calculated post-execution, which meas the optimizer would not be able to consider it for any index choice. Shortly thereafter, Denny, provided us with an Estimated Execution plan. This is what the optimizer will choose, based upon SARG considerations and Index density information. This is below.
If you take a look, we are doing an Index Seek, using the CASE statement as a SARG. The question seems to have been answered. It is acceptable to use a CASE statement on the ride side of an inner join. There is a caveat that we are working through. There is a thought that says the only reason this worked was due to the CASE statement being part of join and not variables or fixed values. This is something that we will look to explore and post at a later date.
Josef Richberg
Rob Farley has written an excellent article explaining SARGability in greater detail and showing some of the misconceptions that were presented in post. His article can be viewed here: A CASE study in SARGability




Reader Comments (3)
I think you guys got lucky with this specific execution plan. Inequality operations are not considered SARGable operations. I would hazard a guess, based on the query snippet you referenced, that the tables/virtual tables in question were small enough that SQL Server was able to perform a seek operation regardless of the inequality operator. If you were to try this on a much larger table, I suspect that you would see an index or table scan going on.
Rob Farley had also mentioned that this might not be a valid test. I was trying to replicate it as is he. If you can help us come up with a similar 'case' (no pun intended) that shows this, I'd love to post a new article about it.
As I try to duplicate this using Northgale (download and run Northgale.sql) I'm running into an interesting situation where I can't duplicate it no matter how hard I try. Instead, I'm always getting clustered index scans. What's really interesting, though, is that the execution plans always resort to using a Nested Loops join.
Erland Sommarskog outlines some of the potential issues in Dynamic Search Conditions in T-SQL. What's interesting is that he has two versions of the article - one for everything before SQL Server 2008 SP1 CU5 and one for everything including SQL Server 2008 SP1 CU5 and later.
I'm going to try messing around with something more complex than Northwind/Northgale to see if I can duplicate these results or throw them in the garbage as totally anomalous.