Entropy is a way to express the unpredictability of a character or string. It is based on the size of the character set and the length of the string.

One can think of entropy as the randomness of a string. A password with high entropy is theoretically harder to brute-force.

### Disclaimer

Security features should be seen as layered measures, where one failing safeguard is backed by another. Weakening one link in the security chain reduces the overall security. I am therefore a strong supporter of password policies with a high entropy potential, rate limits, multi-factor authentication, and strong hashing functions for password storage.

In computer science, it is common to specify the strength of a random password in bits of entropy, an approach derived from information theory. Instead of the number of guesses needed to find the password with certainty, the base-2 logarithm of that number is used; this is the number of “entropy bits” in a password.
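To make the relationship between entropy bits and guesses concrete, here is a small illustration (the 40-bit figure is just an example value):

```python
import math

# A password with H bits of entropy is one of 2**H equally likely choices;
# exhausting the search space takes 2**H guesses (half that on average).
H = 40
search_space = 2 ** H
print(search_space)             # 1099511627776
print(math.log2(search_space))  # 40.0 — the base-2 log recovers the entropy bits
```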

The formula to calculate the entropy is

```
H = log2(N^L)
```

where H is the entropy in bits, N is the size of the character set (the number of possible symbols), and L is the length of the string (the number of characters).

A password with an entropy of 128 bits calculated in this way would be as strong as a string of 128 bits chosen randomly, for example by a fair coin toss.
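As a quick sanity check, the formula can be applied to a hypothetical 10-character password drawn from the 62 case-sensitive alphanumeric characters:

```python
import math

N = 62  # case-sensitive alphanumeric character set (a-z, A-Z, 0-9)
L = 10  # password length

# H = log2(N^L) = L * log2(N)
H = L * math.log2(N)
print(round(H, 2))  # 59.54 bits — well short of the 128-bit target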

To find the length L needed to achieve the desired strength H with a password created randomly from a set of N symbols, one computes (rounded up to the next whole number)

```
L = H / log2(N)
```
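Applying this to the 128-bit target and the 62-symbol alphanumeric set as an example:

```python
import math

H = 128  # desired entropy in bits
N = 62   # case-sensitive alphanumeric character set

# L = H / log2(N), rounded up to the next whole number
L = math.ceil(H / math.log2(N))
print(L)  # 22
```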

Here is a table with some character sets and how many symbols are required for 128 bits of entropy:

| Character set | Symbol count N | Entropy per symbol | Symbols required for 128 bits of entropy |
|---|---|---|---|
| Fair coin (0/1) | 2 | 1.000 bits | 128 |
| Arabic numerals (0–9) | 10 | 3.322 bits | 39 |
| Hexadecimal numerals (0–9, A–F), e.g. WEP keys | 16 | 4.000 bits | 32 |
| Case-insensitive Latin alphabet (a–z or A–Z) | 26 | 4.700 bits | 28 |
| Case-sensitive Latin alphabet (a–z, A–Z) | 52 | 5.700 bits | 23 |
| Case-sensitive alphanumeric (a–z, A–Z, 0–9) | 62 | 5.954 bits | 22 |
| All ASCII printable characters | 95 | 6.570 bits | 20 |
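The numbers in this table can be reproduced with a few lines of Python:

```python
import math

charsets = {
    "fair coin": 2,
    "arabic numerals": 10,
    "hexadecimal numerals": 16,
    "case-insensitive Latin alphabet": 26,
    "case-sensitive Latin alphabet": 52,
    "case-sensitive alphanumeric": 62,
    "all ASCII printable characters": 95,
}

for name, n in charsets.items():
    per_symbol = math.log2(n)              # entropy contributed by each symbol
    required = math.ceil(128 / per_symbol) # symbols needed for 128 bits
    print(f"{name}: {per_symbol:.3f} bits/symbol, {required} symbols")
```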

I wrote a short Python script to calculate the theoretical entropy of a random password:

```python
#!/usr/bin/python3
# usage: ./entropy.py <num_symbols> <length>

import argparse
import math

parser = argparse.ArgumentParser()
parser.add_argument("num_symbols", help="size of the character set N", type=int)
parser.add_argument("length", help="length of the password L", type=int)
args = parser.parse_args()

# H = log2(N^L)
print(math.log2(args.num_symbols ** args.length))
```

People are notoriously poor at achieving sufficient entropy when choosing passwords. According to one study involving half a million users, the average password entropy was estimated at 40.54 bits.

It gets worse: the most common number used is “1”, and the most common letters are a, e, o, and r. Users don’t make full use of larger character sets when forming passwords. For example, a MySpace phishing scheme in 2006 revealed 34,000 passwords, of which only 8.3% used mixed case, numbers, and symbols.

The full strength associated with a given character set is only achieved if each possible password is equally likely; human behavior reduces the theoretical password strength. A better requirement would be to reject passwords containing dictionary words or names. When patterned choices are required, humans apply them in predictable ways, such as capitalizing a single letter, adding only one or two numbers, and appending a special character. This predictability means that the gain in password strength is minor compared to truly random passwords.