Navigation
Follow Along

Powered by Squarespace

Entries in index (2)

Wednesday
16Sep2009

Indexes and Convert_Implicit

When you are designing your where clause to access data from a given set of tables it would benefit you to pay careful attention to the data types of the columns you will be using.  SQL Server tries to help out by implicitly converting between data types which can cause 'hidden' optimizer issues.

A common example is the implicit conversion between character and integer datatypes.  Listed below is a snippet of the query I was asked to tune.

 You will notice the number 45 in purple.  The graphical query plan produced this:

 What first drew my attention was that 90% of the time was being taken up by a very small part of the overall where clause.  Digging deeper I saw this:

 

 Highlighted in purple was the culprit.  A quick look at the column dbo.stcsmf11.hlvno showed it was a char(2), yet in the where clause I was joining it as a number stcsmf11.hlvno=45.  I changed the where clause to be:

 

The subtle change in the where clause (making the number (45) into a character ('45')), yields a dramatic improvement.  The time spent in that specific section of code goes from 90% to 2%:

The detailed picture shows two important pieces of information.  First, we are doing an Index Seek vs. an Index Scan and secondly we have an additional join now being used.  The join you see was always there, but due to the implicit conversion, SQL Server was not able to use the other qualifier.  The additional seek predicate is 00779081.hlval.

 It is very important to remember to be as specific as possible in your where clauses and keep a close eye on data types.

Saturday
07Feb2009

Cryptographic Hash Functions

Cryptographic hash functions take an arbitrary number of input bytes and reduce it to a fixed size. This resultant size is dependent upon the function you use, be it MD4,MD5,SHA-1, etc. I prefer to use SHA-1 since it is the most secure of the 3 listed above. There are many other cryptographic functions, but they are not included in all libraries.

SHA-1 has an output size of 160 bits (20bytes) for an input as large as 2^64-1 bytes. The output size of 160 bits means that the chance of a two completely random input streams to produce the same hash is 1 in 2^80 ( or 2^(output size/2)). I have not yet worked on a data set that large, so it's good enough for me to use in this fashion; digital fingerprint. Let's say you have a customer table withfirstname,lastname,address1-address5, city,state,zip. You want to update the record every time something changes in any of the 10 columns. If you happen to have a trigger on the table, then you just do your updates based upon the records that changed, which is happily provided to you by the trigger. If on the other hand you are working with very large data sets or don't have the option of a trigger you need an effective way to determine which records have changed. You might not want to run a 10k record update to catch the 100 or so that actually changed. Here is where the hash comes into play. Depending upon the function you are using (either SQLServer's built-in HashBytes) or through SSIS and a .Net function, you want to take all of the columns you that can change and condense them into a single value. In the example above, which was allchar(x)fields, you can simply do firstname+lastname+address1+address2....Feed the entire concatenation into the function and you have a digital fingerprint. If anything in any of the fields change, then the 'fingerprint' will be different and a simple comparison (orig.digitalprint!=delta.digitalprint) enables you to identify the record. With SQLServer 2005 sp2,HashBytes, is deterministic so you can make a computed column which will calculate the HashByte of each record automatically. You can then run a sql statement which will do an update if the hashes match or an insert if they don't. This is a very good unique key, providing very random placement (the purpose of a cryptographic function is to be as truly random as possible) versus adding a incrementing column simply to provide a unique index.