Web of Ideas

Developers blog

 

Using CHECKSUM and BINARY_CHECKSUM to get hash values from any SQL Server columns

13 Jan 2011, 09:40   Subscribe to Richies blog (RSS)

I recently had a task to ensure that two databases were identical.
A database was backed up and written to a portable disk in one country, then transported by secure courier to me for further processing.
As a security measure, I needed to come up with a mechanism to determine the data had not been tampered with during transportation.

To do this, I had a series of SQL totals queries run against the data before it was backed up. Then I ran the same queries after I had restored it and compared the results. Identical results meant the data was OK.

Simple totals aren't enough to absolutely guarantee the data has not been tampered with, so I devised some totals that tested the relationships between columns on rows.
For example:

SUM ( (column1 + column2) / column3 )


However, most of the columns on the database were character data, so I needed a way to achieve this for non-numeric columns.

Enter the SQL Server functions CHECKSUM and BINARY_CHECKSUM.

These functions produce a hash for any column, any data type, even if the value is NULL.
So, for non-numeric data, the above query becomes:

SUM ( (BINARY_CHECKSUM(column1) + BINARY_CHECKSUM(column2)) / BINARY_CHECKSUM(column3) )


What is the difference between CHECKSUM and BINARY_CHECKSUM? The honest answer is I'm not sure. The Microsoft definition is woolly at best, and both produce a valid (but different) hash value. For my purposes, either would have worked. I chose BINARY_CHECKSUM mainly because it 'sounds' more exact than plain old CHECKSUM :)
These are the MSDN definitions :

CHECKSUM: Returns the checksum value computed over a row of a table, or over a list of expressions. CHECKSUM is intended for use in building hash indexes.
BINARY_CHECKSUM: Returns the binary checksum value computed over a row of a table or over a list of expressions. BINARY_CHECKSUM can be used to detect changes to a row of a table.

If anyone can explain this in plain English, please post a comment. Otherwise, I think my selection technique is valid - choose the one that sounds best!


Comments7
Ratings429
Average rating72%  72%
Posted13 Jan 2011, 09:40

View blog

Richies blog

Previous blog entry: Ctrl Alt Del in nested remote desktop
Next blog entry: Introduction to LINQ for C#

Rate this blog entry

1 (Very poor)  2 (Poor)  3 (Satisfactory)  4 (Good)  5 (Very good)   Enter your name   


Comments


No comments have been made.

Post a comment



Please enter your name to be displayed against the comment