Cryptography - Hash Functions & Digital Signatures

In this article we discuss hash functions in depth and how they can be combined with public-key encryption to create a digital signature.

Hash Functions - Definition

Hash functions take a potentially long message as the input and generate a fixed-length output value derived from the content. The output of a hash function is commonly referred to as the message digest.

Hashing is a one-way function: with a properly designed algorithm, there is no practical way to reverse the hashing process and reveal the original input.

Compare this to encryption (two-way function) which allows encryption and decryption with the correct key or key pair.

Another specific use case of hash functions is in data structures like hash tables or bloom filters. The goal here is not security but rapid data lookup.

Hash functions in the context of digital signatures are supposed to produce the same output for the same input (deterministic). This enables the recipient of a message to recompute the message digest with the same hash function and compare it to the transmitted digest to verify that the message wasn’t modified in transit.

If the message has even a minor difference in spacing, punctuation, or content, the message digest will be completely different.

It is not possible to derive the degree of difference between two messages by comparing their digests. The slightest difference in the input will produce a drastically different digest value.
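Both properties, determinism and the drastic change on small edits, can be observed with a few lines of Python. This is a minimal sketch using the standard library's hashlib with SHA-256; the example messages and the helper names sha256_hex and differing_bits are made up for illustration.

```python
import hashlib


def sha256_hex(message: str) -> str:
    """Return the SHA-256 digest of a text message as a hex string."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two equally long hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


original = "Transfer 100 EUR to Bob"
modified = "Transfer 100 EUR to Bob."  # only a trailing period was added

digest_original = sha256_hex(original)
digest_again = sha256_hex(original)    # same input, hashed a second time
digest_modified = sha256_hex(modified)

# Deterministic: hashing the same input twice gives the same digest.
print(digest_original == digest_again)

# A tiny change in the input flips a large share of the 256 output bits.
print(differing_bits(digest_original, digest_modified), "of 256 bits differ")
```

The number of differing bits says nothing about how similar the two inputs were, which is the point made above.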


There are five requirements for a cryptographic hash function:

The input can be of any length.
The output has a fixed length.
The hash function is relatively easy to compute for any input.
The hash function is one-way (this means it is extremely hard if not impossible to determine the input from the output).
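The hash function is collision-resistant (because the output has a fixed length, collisions must exist in principle, but it must be computationally infeasible to find two different messages that produce the same hash value).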
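The first two requirements are easy to check in practice. The following sketch, assuming SHA-256 from Python's hashlib as the example function, hashes inputs of very different sizes and reports the digest length in bits.

```python
import hashlib

# Inputs of very different sizes: empty, one byte, one million bytes.
inputs = [b"", b"a", b"x" * 1_000_000]

# digest() returns raw bytes; multiply by 8 to get the length in bits.
lengths = [len(hashlib.sha256(data).digest()) * 8 for data in inputs]

print(lengths)  # [256, 256, 256]
```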

SHA - Secure Hash Algorithm

The Secure Hash Algorithm (SHA) and its successors, SHA-1, SHA-2, and SHA-3, are government standard hash functions published by the National Institute of Standards and Technology (NIST).

SHA-1 takes an input of virtually any length and produces a 160-bit message digest. It processes a message in 512-bit blocks. If the message length isn’t a multiple of 512 bits, the algorithm pads the message with additional data (including an encoding of the original message length) until the length reaches the next multiple of 512 bits.

SHA-1 is no longer considered secure against well-funded adversaries. All major web browser manufacturers stopped accepting SHA-1 SSL certificates in 2017. That same year, researchers at Google and CWI Amsterdam demonstrated a practical collision in SHA-1.

SHA-2 was published in 2001 as the successor to SHA-1. It includes significant changes from its predecessor and has four major variants:

SHA-256 produces a 256-bit message digest using a 512-bit block size.
SHA-224 uses a truncated version of the SHA-256 hash (computed with different initial values) and produces a 224-bit digest using a 512-bit block size.
SHA-512 produces a 512-bit message digest using a 1,024-bit block size.
SHA-384 uses a truncated version of the SHA-512 hash (computed with different initial values) and produces a 384-bit digest using a 1,024-bit block size.
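The digest and block sizes listed above can be read directly from Python's hashlib, which exposes them in bytes as digest_size and block_size. A small sketch (the sizes dictionary is only an illustration helper):

```python
import hashlib

# Map each SHA-2 variant to (digest bits, block bits) as reported by hashlib.
sizes = {}
for name in ("sha256", "sha224", "sha512", "sha384"):
    h = hashlib.new(name)
    sizes[name] = (h.digest_size * 8, h.block_size * 8)

for name, (digest_bits, block_bits) in sizes.items():
    print(f"{name}: {digest_bits}-bit digest, {block_bits}-bit block")
```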

The cryptographic community generally considers the SHA-2 algorithms secure, but because they share the same basic construction as SHA-1, they could in theory suffer from similar weaknesses.

SHA-3 was published in 2015. While part of the same series of standards, SHA-3 is internally different from the MD5-like structure of SHA-1 and SHA-2. SHA-3 is a subset of the broader cryptographic primitive family Keccak. It was developed as a drop-in replacement for SHA-2, offering the same variants (SHA3-256/SHA3-224/SHA3-512/SHA3-384) and hash lengths but using a different internal design.
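To illustrate the "same lengths, different algorithm" point, the following sketch hashes one arbitrary example message with each SHA-2 variant and its SHA-3 counterpart from Python's hashlib and compares the results.

```python
import hashlib

message = b"hello"  # arbitrary example input

pairs = [
    ("sha224", "sha3_224"),
    ("sha256", "sha3_256"),
    ("sha384", "sha3_384"),
    ("sha512", "sha3_512"),
]

results = []
for sha2_name, sha3_name in pairs:
    d2 = hashlib.new(sha2_name, message).digest()
    d3 = hashlib.new(sha3_name, message).digest()
    # Record whether the lengths match and whether the digests are identical.
    results.append((sha2_name, sha3_name, len(d2) == len(d3), d2 == d3))

for sha2_name, sha3_name, same_length, same_digest in results:
    print(sha2_name, sha3_name, "same length:", same_length, "same digest:", same_digest)
```

Each pair has the same digest length, but the digests themselves differ because the underlying algorithms are unrelated.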

MD2 - Message Digest

The MD2 Message-Digest Algorithm was developed by Ronald Rivest (yes, the one from Rivest, Shamir, and Adleman aka RSA Security) in 1989 to provide a secure hash function for 8-bit processors.

MD2 pads the message to a length that is a multiple of 16 bytes and computes a 16-byte checksum, which is appended to the end of the input message. Then a 128-bit message digest is generated by using the original message along with the appended checksum.

Cryptanalytic attacks against the MD2 algorithm exist and it was even proven that MD2 is not a one-way function. Therefore, it should no longer be used.

MD4, released in 1990, is an enhancement of MD2 designed for 32-bit processors. It increases the security level with an enhanced algorithm.

MD4 pads the message to a length that is 64 bits smaller than a multiple of 512 bits. The MD4 algorithm then processes 512-bit blocks of the message in three rounds of computation to produce a 128-bit message digest.

An 8-bit message would be padded with 440 additional bits of data to make it 448 bits long, which is 64 bits smaller than a 512-bit block.
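The padding arithmetic can be written down as a short function. This is a sketch of the length calculation only, not of MD4 itself; the function name md4_padding_bits is made up for illustration. It assumes the rule from the MD4 specification that padding is always applied, so at least one bit is added even if the message already has the right length, and that the remaining 64 bits of the final block hold the original message length.

```python
def md4_padding_bits(message_bits: int) -> int:
    """Number of padding bits needed so the length is 64 bits short of a multiple of 512."""
    # Padding is always performed (between 1 and 512 bits), even if the
    # message length is already 448 modulo 512.
    return (447 - message_bits) % 512 + 1


message_bits = 8
padding = md4_padding_bits(message_bits)
total = message_bits + padding + 64  # the final 64 bits encode the message length

print(padding)  # 440
print(total)    # 512
```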

Several flaws have been found in the MD4 algorithm and therefore it is no longer considered secure. Usage should be avoided if possible.

MD5 was released in 1991 as the next version of the message-digest algorithm. It also processes 512-bit blocks of the message but uses four rounds of computation to produce the same 128-bit message-digest length as in MD2 and MD4.

MD5 has the same padding requirements as MD4: the message length must be 64 bits less than a multiple of 512 bits. MD5 introduced additional security features that reduced the speed of message-digest production.

Cryptanalytic attacks have demonstrated that MD5 is subject to collisions. In 2005, researchers showed that two digital certificates with different public keys can have the same MD5 hash.

All algorithms in the MD family are no longer accepted as suitable hashing functions. However, they may still be found in use today.


Digital Signatures

With secure hash functions, we can implement a digital signature system. A digital signature infrastructure has two goals:

Digitally signed messages assure the recipient that the message came from the claimed sender. This provides nonrepudiation.
Digitally signed messages provide the recipient with the assurance that the message was not altered while in transit. This protects against malicious (man in the middle) or unintentional (communication interference) modification.

Digital signatures rely on the combination of two concepts: public-key cryptography and hash functions.

Alice is sending a digitally signed but not encrypted message to Bob:

1: Alice generates a message digest of the original plaintext message using a secure hash function like SHA3-512.

2: Alice then encrypts the message digest using her private key. The output is the digital signature.

3: Alice appends the digital signature to the plaintext message.

4: Alice then sends the appended message to Bob.

5: Bob removes the digital signature from the appended message and decrypts it with the public key of Alice.

6: Bob calculates the hash of the plaintext message with SHA3-512.

7: Bob then compares the decrypted message digest he received from Alice with the message digest Bob computed. If the two digests match, he can be assured that the message he received was sent by Alice.
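The seven steps can be sketched with textbook RSA in plain Python. This is a toy illustration only, under loudly stated assumptions: the key is built from two small, publicly known Mersenne primes, the SHA3-512 digest is reduced modulo N because the toy modulus is shorter than the digest, and no signature padding scheme is applied. Real systems use much larger random keys and a standardized padding scheme from a vetted library. The names P, Q, N, E, D, digest_as_int, sign, and verify are made up for this example.

```python
import hashlib

# Toy key material (assumption for illustration): two known Mersenne primes.
# Far too small and too structured for real use.
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)


def digest_as_int(message: bytes) -> int:
    # Steps 1 and 6: hash the plaintext with SHA3-512.
    # Reduced modulo N only because this toy modulus is shorter than the digest.
    return int.from_bytes(hashlib.sha3_512(message).digest(), "big") % N


def sign(message: bytes) -> int:
    # Step 2: Alice transforms the digest with her private key.
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    # Steps 5 to 7: Bob recovers the digest with Alice's public key
    # and compares it with the digest he computes himself.
    return pow(signature, E, N) == digest_as_int(message)


message = b"Meet me at noon. Alice"
signature = sign(message)           # steps 3 and 4: sent along with the message

print(verify(message, signature))                   # unmodified message
print(verify(b"Meet me at 1pm. Alice", signature))  # message altered in transit
```

The first check should succeed and the second should fail, because the altered message hashes to a different digest than the one Alice signed.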

The digital signature process does not provide any privacy by itself. It only ensures that the cryptographic goals of authentication, integrity, and nonrepudiation are met. If Alice wants to ensure the privacy of her message to Bob, she could encrypt the appended message generated in step 3 with the public key of Bob. Bob then would need to first decrypt the encrypted message with his private key before continuing with step 5.

Digital signatures are used not only for messages: software vendors often use digital signature technology to authenticate code distributed over insecure networks like the internet. Plain checksums, by contrast, do not require any encryption key. They are simple digests or fingerprints that represent some kind of data, so they can detect accidental changes but do not prove who produced the data.

HMAC - Hashed Message Authentication

The hashed message authentication code (HMAC) algorithm implements a partial digital signature and guarantees the integrity of a message but it does not provide nonrepudiation.

HMAC relies on the combination of two concepts: private-key cryptography (a secret key shared by sender and recipient) and hash functions.

HMAC can be combined with any secure hash function such as SHA3. The resulting message authentication code (MAC) is called HMAC-SHA3. HMAC combines a secret key with a hash function and represents a halfway point between the unencrypted use of a message-digest algorithm and computationally expensive digital signature algorithms based on public-key cryptography.

HMAC does not encrypt the message. Instead, the message (encrypted or not) must be sent alongside the HMAC hash. Parties with the secret key hash the message again themselves, and the received and computed hashes will match if the message is authentic.
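Python's standard library provides this construction in the hmac module. The following is a minimal sketch of the send-and-verify flow using HMAC with SHA3-256; the key, the messages, and the helper names make_tag and is_authentic are placeholders invented for this example.

```python
import hashlib
import hmac

# Hypothetical shared secret known only to sender and recipient.
secret_key = b"shared-secret-known-to-alice-and-bob"


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute the HMAC-SHA3-256 authentication code for a message."""
    return hmac.new(key, message, hashlib.sha3_256).digest()


def is_authentic(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time with the received one."""
    return hmac.compare_digest(make_tag(key, message), received_tag)


message = b"Ship 20 units to warehouse 7"
tag = make_tag(secret_key, message)  # sent alongside the message

print(is_authentic(secret_key, message, tag))                          # unmodified
print(is_authentic(secret_key, b"Ship 90 units to warehouse 7", tag))  # tampered
print(is_authentic(b"some-other-key", message, tag))                   # wrong key
```

Because both parties hold the same key, either of them could have produced the tag, which is why HMAC cannot provide nonrepudiation.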

DSS - Digital Signature Standard

The Digital Signature Standard (DSS), published by NIST as FIPS 186, specifies that all federally approved digital signature algorithms must use an approved secure hash function. DSS also specifies the signature algorithms that can be used to support digital signature infrastructure. Approved in version 186-4 are:

DSA - Digital Signature Algorithm
RSA - Rivest-Shamir-Adleman Algorithm
ECDSA - Elliptic Curve DSA

Key Confusion

Public-key cryptography and digital signatures can be confusing. Encryption, decryption, signing, and signature verification rely on the same kind of key pair but use different keys as input. Here are a few simple rules:

You encrypt a message with the recipient’s public key.
You decrypt a message with your own private key.
You digitally sign a message with your own private key.
You verify the signature of a message with the sender’s public key.