Recently a question was posed by Denny Cherry (@MrDenny) about using a case statement on the right side of an inner join. That sparked a rather vibrant conversation between myself and Mladen Prajdic (@MladenPrajdic). Here is the original question:
"Any idea if having a case on the right side of an inner join will hurt performance?"
Here is the section in question (Can't seem to get it to fit, click on image to see it properly):
The conversation revolved around whether or not the optimizer in SQL Server is capable of evaluating the case statement prior to executing the statement or if it must wait to after. This is all very important to understand since the optimizer must know values ahead of time for them to be what is called a SARG (Search ARGument).
Alright Sherman, quick trip in the way-back machine for those who are scratching your head. SQL Server needs to know either exact values, a.purchaseDate = '1/22/10' or be able to look up values against other table columns, a.purchaseDate=b.purchaseDate for it to consider on index on a.purchaseDate. If it cannot calculate these values ahead of time (pre-compile) then it cannot consider any indexes on a.purchaseDate. This is why it is highly frowned upon to surround your where clause columns with functions like DateAdd. This is no longer a SARG: a.purchaseDate=dateAdd('dd',1,b.purchasedate). Why? The function DateAdd, is evaluated after the optimizer has calculated its query plan. Since the query plan determines what indexes to use, it cannot evaluate the proper index. You want to make sure that any column that has an index on it has values it calculate ahead of time; SARGS.
We went back and forth try to dissect how the optimizer would handle a CASE statement. One train of thought was the the impression that a CASE statement was nothing more than a T-SQL structure that was then converted into, potentially complex, AND's and OR's, much the same way an IN is converted into a bunch of OR's. This would allow the optimizer to calculate the values ahead of time (by exploding the CASE statement) into those complex AND's and OR's. The other idea was that a CASE works just like other functions and is calculated post-execution, which meas the optimizer would not be able to consider it for any index choice. Shortly thereafter, Denny, provided us with an Estimated Execution plan. This is what the optimizer will choose, based upon SARG considerations and Index density information. This is below.
If you take a look, we are doing an Index Seek, using the CASE statement as a SARG. The question seems to have been answered. It is acceptable to use a CASE statement on the ride side of an inner join. There is a caveat that we are working through. There is a thought that says the only reason this worked was due to the CASE statement being part of join and not variables or fixed values. This is something that we will look to explore and post at a later date.
Rob Farley has written an excellent article explaining SARGability in greater detail and showing some of the misconceptions that were presented in post. His article can be viewed here: A CASE study in SARGability