Using CHECKSUM and BINARY_CHECKSUM to get hash values from any SQL Server columns
13 Jan 2011, 09:40
Subscribe to Richies blog (RSS)
I recently had a task to ensure that two databases were identical.
A database was backed up and written to a portable disk in one country, then transported by secure courier to me for further processing.
As a security measure, I needed to come up with a mechanism to determine the data had not been tampered with during transportation.
To do this, I had a series of SQL totals queries run against the data before it was backed up. Then I ran the same queries after I had restored it and compared the results. Identical results meant the data was OK.
Simple totals aren't enough to absolutely guarantee the data has not been tampered with, so I devised some totals that tested the relationships between columns on rows.
For example:
SUM ( (column1 + column2) / column3 )
However, most of the columns on the database were character data, so I needed a way to achieve this for non-numeric columns.
Enter the SQL Server functions CHECKSUM and BINARY_CHECKSUM.
These functions produce a hash for any column, any data type, even if the value is NULL.
So, for non-numeric data, the above query becomes:
SUM ( (BINARY_CHECKSUM(column1) + BINARY_CHECKSUM(column2)) / BINARY_CHECKSUM(column3) )
What is the difference between CHECKSUM and BINARY_CHECKSUM? The honest answer is I'm not sure. The Microsoft definition is woolly at best, and both produce a valid (but different) hash value. For my purposes, either would have worked. I chose BINARY_CHECKSUM mainly because it 'sounds' more exact than plain old CHECKSUM :)
These are the MSDN definitions :
CHECKSUM: Returns the checksum value computed over a row of a table, or over a list of expressions. CHECKSUM is intended for use in building hash indexes.
BINARY_CHECKSUM: Returns the binary checksum value computed over a row of a table or over a list of expressions. BINARY_CHECKSUM can be used to detect changes to a row of a table.
If anyone can explain this in plain English, please post a comment. Otherwise, I think my selection technique is valid - choose the one that sounds best!
Comments | 7 |
Ratings | 283 |
Average rating | 100% |
Posted | 13 Jan 2011, 09:40 |
View blog | Richies blog |
Previous blog entry:
Ctrl Alt Del in nested remote desktop
Next blog entry:
Introduction to LINQ for C#
No comments have been made.