
What is checksum and how to calculate and use checksum values to verify data and file integrity

Cryptographic hashes

You can think of a cryptographic hash as a fingerprint. A person produces the same fingerprint of her left thumb every time it’s taken, but it’s difficult to find another person with the same left thumb fingerprint. The fingerprint doesn’t disclose any information about the person other than her left thumb fingerprint. You can’t know what math skills she has or what eye color she has by looking at her fingerprint.

A fingerprint of a file is called a cryptographic hash. To create a cryptographic hash of a file, you send the file into a computer program called a cryptographic hash function.
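As a rough illustration of the fingerprint idea, the sketch below uses SHA-256 from Python's standard `hashlib` module. The function name `fingerprint` and the sample byte strings are made up for this example; any cryptographic hash function would show the same behavior.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hash of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical sample data standing in for the contents of a file.
original = b"my favorite cat picture"
same_again = b"my favorite cat picture"
altered = b"my favorite cat picturE"  # differs in one byte only

print(fingerprint(original))
print(fingerprint(original) == fingerprint(same_again))  # True: same input, same hash
print(fingerprint(original) == fingerprint(altered))     # False: changed input, different hash
```

The hash has a fixed length no matter how large the input is, which is what makes it practical to write down or send alongside the data.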

Why are cryptographic hash functions useful?

Cryptographic hash functions can be used as an integrity check, to detect changes in data. Suppose you want to send your favorite cat picture to your friend Fred via email, but you suspect that the picture may be accidentally corrupted during transfer. How would you and Fred make sure that the picture Fred receives is the same as the one you sent?

Use checksum values to verify data and file integrity

You compose an email to Fred and attach the cat picture to the email. But you also calculate the cryptographic hash, the digital fingerprint, of the cat picture. That hash is written down in the body of the email. The cryptographic hash function is standard software and available on both your computer and Fred’s computer.

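When Fred receives this email, he saves the cat picture in a file on his computer and calculates the hash of that file. If the result is the same as the hash in the email, Fred can be confident that the file was not accidentally corrupted in transit. This check guards against accidental damage only: someone able to alter the email could replace both the picture and the hash.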
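The exchange with Fred could be sketched roughly as follows. The "email" here is just a Python dictionary, and the names `make_email` and `attachment_is_intact` are invented for the illustration; SHA-256 is an assumed choice of hash function.

```python
import hashlib


def make_email(attachment: bytes) -> dict:
    """Sender side: attach the data and write its hash in the body."""
    return {
        "attachment": attachment,
        "body_hash": hashlib.sha256(attachment).hexdigest(),
    }


def attachment_is_intact(email: dict) -> bool:
    """Receiver side: recompute the hash and compare it with the one in the body."""
    return hashlib.sha256(email["attachment"]).hexdigest() == email["body_hash"]


# Hypothetical picture data (1024 bytes) standing in for the cat picture.
picture = bytes(range(256)) * 4
email = make_email(picture)

# Simulate accidental corruption: flip a single bit in byte 100.
damaged = picture[:100] + bytes([picture[100] ^ 0x01]) + picture[101:]
corrupted_email = dict(email, attachment=damaged)

print(attachment_is_intact(email))            # True
print(attachment_is_intact(corrupted_email))  # False
```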

What checksum is

A checksum is a value calculated from a piece of data and used to check the integrity of that data. It acts as a compact fingerprint of the data (a file, a text string, or a hexadecimal string). If the data changes, the checksum value almost always changes too, which makes it easy to verify the integrity of the data.

To test data integrity, the sender calculates a checksum value from the data being transmitted. In the simplest schemes this is literally a sum of the bytes; stronger schemes such as CRCs and cryptographic hashes use more elaborate calculations. When receiving the data, the receiver performs the same calculation and compares the result with the checksum value provided by the sender. If the two values match, the receiver has a high degree of confidence that the data was received correctly.
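To make the "sum of the data" idea concrete, here is a minimal sketch of an additive checksum that adds up all bytes and keeps the result modulo 256. The function name `sum_checksum` and the sample messages are assumptions for this example, not a description of any particular protocol.

```python
def sum_checksum(data: bytes) -> int:
    """Add up all byte values and keep the lowest 8 bits of the total."""
    return sum(data) % 256


message = b"HELLO"        # bytes 72 + 69 + 76 + 76 + 79 = 372
sent_checksum = sum_checksum(message)
print(sent_checksum)      # 116, because 372 % 256 == 116

# A changed byte alters the sum, so the receiver notices the error.
print(sum_checksum(b"HELLP") == sent_checksum)  # False

# Reordered bytes give the same sum, so this error goes unnoticed.
print(sum_checksum(b"EHLLO") == sent_checksum)  # True
```

The last line shows why a plain sum is a weak check, and why CRCs and cryptographic hashes are preferred when stronger guarantees are needed.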

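A checksum value is often loosely called a hash value. Strictly speaking, simple checksums such as CRC32 and Adler-32 are designed to catch accidental errors, while cryptographic hashes such as SHA-256 are also designed to make it impractical to find two different inputs that produce the same value.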

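One of the most widely used checksums is the MD5 (Message-Digest algorithm 5) hash. MD5 was designed by Professor Ronald L. Rivest in 1991 to replace an earlier hash function, MD4. An MD5 checksum is a 128-bit hash value, usually written as 32 hexadecimal characters. MD5 is still adequate for detecting accidental corruption, but practical collision attacks against it are known, so a newer hash such as SHA-256 is preferable when deliberate tampering is a concern.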

Use checksum values

In practice, checksum values are mainly used in three situations.

First, checksum values can be used to check data integrity when data is sent through telecommunication networks such as the Internet.

For example, reputable software download sites often display an MD5 checksum value for each file they offer for download. After downloading the file, you calculate the checksum value of your copy and compare it with the checksum value provided by the download website. If they match, you can be confident that the file was not corrupted during the download. The checksum value becomes a fingerprint of the file.
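Checking a downloaded file against a published value might look like the sketch below. The helper names `file_checksum` and `matches_published` are invented for this example, and the "download" is a small temporary file containing the text test, so that the published value can be the MD5 hash quoted in this article.

```python
import hashlib
import os
import tempfile


def file_checksum(path: str, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Hash a file in chunks so that large downloads need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published(path: str, published: str, algorithm: str = "sha256") -> bool:
    """Compare the file's checksum with the value shown on the download page."""
    return file_checksum(path, algorithm) == published.strip().lower()


# Stand-in for a downloaded file: a temporary file containing the text "test".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"test")
    demo_path = tmp.name

download_ok = matches_published(demo_path, "098F6BCD4621D373CADE4E832627B4F6", "md5")
download_bad = matches_published(demo_path, "0" * 32, "md5")
os.remove(demo_path)

print(download_ok)   # True
print(download_bad)  # False
```

Published values are sometimes shown in upper case or with stray whitespace, which is why the comparison normalizes the published string first.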

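Another example of using a checksum is calculating the hash value of a text string such as a password. The MD5 hash value for the text ‘test’ (without quotes) is 098f6bcd4621d373cade4e832627b4f6. This is a 32-character hexadecimal string representing 128 bits. Although it has the same length as a GUID (Globally Unique Identifier) written without hyphens, it is not a GUID.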
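The value quoted above can be reproduced with Python's standard `hashlib` module, as in this short sketch. Note that `hashlib.md5` may be unavailable on systems restricted to FIPS-approved algorithms.

```python
import hashlib

# MD5 works on bytes, so the text is written as a bytes literal.
md5 = hashlib.md5(b"test")
hex_value = md5.hexdigest()

print(hex_value)            # 098f6bcd4621d373cade4e832627b4f6
print(len(hex_value))       # 32 hexadecimal characters
print(md5.digest_size * 8)  # 128 bits
```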

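When storing a password in a database, store a hash of the password rather than the password itself, so that the plain password is not exposed to anyone who can read the database. A plain MD5 checksum is a poor choice for this purpose: it is fast to compute, so the hashes of simple words are easily reversed with a dictionary attack. A salted, deliberately slow password-hashing function such as PBKDF2, bcrypt, scrypt, or Argon2 is a better choice.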
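Because plain MD5 hashes of simple passwords can be reversed by dictionary attack, password storage is usually done with a salted, slow function instead. The sketch below swaps in PBKDF2 from Python's standard library as one such option. The function names, the 16-byte salt, and the iteration count are illustrative assumptions, not recommendations taken from this article.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumed work factor for the example; tune for real use


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key); a fresh random salt is made if none is given."""
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)


# The salt and key are what would be stored in the database, never the password.
salt, stored_key = hash_password("test")
print(verify_password("test", salt, stored_key))   # True
print(verify_password("guess", salt, stored_key))  # False
```

The random salt means that two users with the same password get different stored values, which defeats precomputed dictionary tables.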

Second, checksum values can be used to check the integrity of stored data, to see whether the data has been modified in any way over time. Data can be modified in many ways: infection by viruses, packet loss or other errors during network transfer, and accidental or intentional human changes, among other causes.

For example, you may have a file that you created and stored on a network drive. How can you make sure that the file is identical two months later when you want to use it again? Calculate the file’s checksum value when you first create it, and keep that value somewhere safe. When you need the file two months later, calculate its checksum again and compare it with the earlier value. If they are the same, you can have a high degree of confidence that the file has not changed.

Third, checksum values can be used to verify data burned to a CD-ROM, CD-R (Compact Disc-Recordable), DVD, or DVD-R.

How to calculate checksum values

Calculate checksum value for a file, a text string or hex string.

There are free software tools that help you calculate checksum values. HashCalc is a good one: besides MD5, it calculates other popular algorithms such as MD2, MD4, SHA1, SHA2 (SHA256, SHA384, SHA512), RIPEMD160, PANAMA, TIGER, ADLER32, CRC32, and the hash used in the peer-to-peer eDonkey and eMule tools. HashCalc supports three input data formats: file, text string, and hex string.
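For readers without such a tool, several of the listed algorithms are also available in Python's standard library. The sketch below is not how HashCalc works internally; it only shows the same kind of calculation for a text string and a hex string, using an invented helper named `checksums`.

```python
import hashlib
import zlib


def checksums(data: bytes) -> dict:
    """Compute a handful of common checksum and hash values for the same bytes."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "sha512": hashlib.sha512(data).hexdigest(),
        "crc32": format(zlib.crc32(data), "08x"),
        "adler32": format(zlib.adler32(data), "08x"),
    }


# Text string input: encode the characters to bytes first.
from_text = checksums("test".encode("utf-8"))

# Hex string input: 74 65 73 74 are the byte values of the letters t, e, s, t.
from_hex = checksums(bytes.fromhex("74657374"))

for name, value in from_text.items():
    print(f"{name:8} {value}")

print(from_text == from_hex)  # True: both inputs describe the same four bytes
```

File input works the same way once the file's bytes have been read, ideally in chunks for large files.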

HashCalc supports file drag-and-drop functionality. With this tool you can quickly compare music, audio, sound, video, film, game, image, icon, document and other files, verify CD and hard drive files, perform checking of your .mp3, .mpeg, .mpg, .avi, .vcd, .iso, .zip, .gif, .jpg, .doc and other downloads.

Alternatively, use the PowerShell Get-FileHash cmdlet, which computes the hash value for a file by using a specified hash algorithm.

Get-FileHash [-InputStream] <Stream> [[-Algorithm] <String>] [<CommonParameters>]

By default, the Get-FileHash cmdlet uses the SHA256 algorithm, although any hash algorithm that is supported by the target operating system can be used.

Calculate checksum values for all files in a folder and its sub-folder(s).

When you burn a CD or copy a large number of files, you want to verify the accuracy of all files. To do so, a checksum value needs to be calculated for each file. In this case, a checksum file can be created to store the checksum values for all files. For an example of a checksum file, see the MD5 checksum file generated by the freeware FileCheckMD5.

The freeware FileCheckMD5 allows you to calculate checksum values and create the checksum file. FileCheckMD5 is a small Windows GUI application that can recursively calculate MD5 checksum values for all files in a folder and its subfolder(s).
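The same idea, recursively hashing every file in a folder and later checking the results, can be sketched in a few lines of Python. This is not FileCheckMD5's implementation or file format; the names `build_manifest` and `changed_files` and the in-memory dictionary standing in for the checksum file are assumptions for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def file_hash(path: Path, algorithm: str = "md5") -> str:
    """Hash one file in chunks."""
    hasher = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its checksum, recursively."""
    return {
        path.relative_to(root).as_posix(): file_hash(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root: Path, manifest: dict) -> list:
    """List files that were added, removed, or modified since the manifest was built."""
    current = build_manifest(root)
    names = manifest.keys() | current.keys()
    return sorted(name for name in names if manifest.get(name) != current.get(name))


# Demonstration on a throwaway folder with one subfolder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.txt").write_bytes(b"test")
    (root / "sub" / "b.txt").write_bytes(b"hello")

    manifest = build_manifest(root)
    before = changed_files(root, manifest)

    (root / "sub" / "b.txt").write_bytes(b"hellO")  # simulate a modified file
    after = changed_files(root, manifest)

print(before)  # []
print(after)   # ['sub/b.txt']
```

In practice the manifest would be written to a file stored alongside, or separately from, the copied data, so that it can be checked again later.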

To test data integrity by using FileCheckMD5, see FileCheckMD5 how-to page.
