In the last post, I discussed the properties of cryptographic keys, and the different types of keys used in cryptography.

One of the examples I gave was key exchange: in modern systems we generally use a public/private key pair and hand out the public key to whoever wants it, typically by way of a digital certificate.

That’s all well and good, but how do we know that the certificate (and subsequent key) we receive is one which can be trusted?

Before I tackle that question, let’s take a look at cryptographic hashes, because they form part of the answer.

Cryptographic hashes

A hash (or hash digest), in the world of cryptography, is the term given to the output value of a cryptographic hashing algorithm. Unlike an encryption algorithm, a hashing algorithm should only work in one direction: once the plain-text data has been processed by the hashing algorithm, there should be no practical way to reverse the process and recover the original input data. As such, hashing algorithms are often described as one-way functions.

A hashing algorithm takes input data of variable length and produces an output of a known, fixed length (the digest). Passing the same input data through the same hashing algorithm should always produce the same output digest. So, for example, passing the value Cat through the MD5 hashing algorithm will always produce the digest fa3ebd6742c360b2d9652b7f78d9bd7d.
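
As a minimal sketch of those two properties (fixed-length output and repeatability), here is how it might look using Python's standard hashlib module. The helper name md5_hex is my own, and MD5 is used only because it is the algorithm in the example above; it is no longer considered collision-resistant, so treat this as an illustration rather than a recommendation.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of a UTF-8 string as 32 hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


short_digest = md5_hex("Cat")
long_digest = md5_hex("Cat" * 10_000)  # a much longer input

print(short_digest)
# Both digests are 32 hex characters (128 bits), whatever the input size.
print(len(short_digest), len(long_digest))
# Hashing the same input again gives the same digest.
print(md5_hex("Cat") == short_digest)
```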

A desired feature of the hashing process is the avalanche effect: a small change to the input data results in a huge change in the output hash digest. So if you pass the value cat through the MD5 hashing algorithm you get the digest d077f244def8a70e5ea758bd8352fcd8, whereas the word CAT gives c01ae1a5f122f25ce5675f86028b536a.

This property makes it much harder to work out from the digest what has changed in the input data.
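
One rough way to see the avalanche effect is to count how many of the 128 output bits differ between two MD5 digests. The sketch below does that with hashlib; the function names are hypothetical helpers of mine, not part of any library. For a well-behaved hash you would expect roughly half of the bits to flip for even a one-character change.

```python
import hashlib


def md5_as_int(text: str) -> int:
    """Interpret the 128-bit MD5 digest of a string as an integer."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest(), "big")


def differing_bits(a: str, b: str) -> int:
    """Count the bit positions where the two digests disagree."""
    return bin(md5_as_int(a) ^ md5_as_int(b)).count("1")


# One letter changes case, yet a large share of the 128 bits flips.
print(differing_bits("cat", "Cat"))
# Identical inputs give identical digests, so no bits differ.
print(differing_bits("cat", "cat"))
```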

The problem here, though, is that you could produce every variant of the word cat along with its associated hash digest; then all you need to do is compare an obtained hash digest against the list to see which one matches.

cat = d077f244def8a70e5ea758bd8352fcd8
Cat = fa3ebd6742c360b2d9652b7f78d9bd7d
cAt = 8f39a4600699aae6e519c1e7443ae913
caT = 4ec88e3a781ac9c4ee5d0e42b9ac0e02
CAt = becd8bd0bf6200827d3923ee8d9ccbc6
CaT = 2f6730b4d110fe7f0d7f698ab60b4a58
cAT = 4e24157a62430e14dd1b9114cbafaf76
CAT = c01ae1a5f122f25ce5675f86028b536a
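
To show how little effort that list takes to build, here is a small sketch that generates every upper/lower-case spelling of a word and maps each MD5 digest back to its input. The function names are my own, and the "captured" digest is simply computed in the script to stand in for one an attacker has obtained.

```python
import hashlib
from itertools import product


def case_variants(word: str):
    """Yield every upper/lower-case spelling of word."""
    choices = [(ch.lower(), ch.upper()) for ch in word]
    for combo in product(*choices):
        yield "".join(combo)


def build_lookup(word: str) -> dict:
    """Map each variant's MD5 digest back to the variant itself."""
    return {
        hashlib.md5(variant.encode("utf-8")).hexdigest(): variant
        for variant in case_variants(word)
    }


lookup = build_lookup("cat")

# Stand-in for a digest an attacker has got hold of.
captured = hashlib.md5(b"cAt").hexdigest()
print(lookup[captured])  # prints "cAt"
```

The same idea scales up to dictionaries of common passwords, which is why an unsalted digest offers little protection for a guessable input.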

To tackle this problem, many systems, especially when storing sensitive data such as a password, add a random value (called a salt) to the input data, which alters the output digest.

cat+salt = cfa590c5b4c51852821cc9a7669cfcd1
Cat+salt = 8e81792636da5c77b891e1bd6f6e71e9
cAt+salt = 332d89499a13b0d6556f8335d14777b1

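Unless the salt value is known, it is not possible to recreate the hash digest, so a list of digests computed in advance without the salt is useless. In practice the salt is usually stored alongside the digest, which means an attacker has to recompute every candidate for each individual salt rather than reuse one list everywhere.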
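
Here is a minimal sketch of salting, assuming the salt is simply prepended to the input and stored next to the resulting digest (a common arrangement, but an assumption on my part about any given system). I have used SHA-256 rather than MD5, and the function names are hypothetical. Real password storage would normally use a deliberately slow function such as hashlib.pbkdf2_hmac instead of a single hash pass.

```python
import hashlib
import hmac
import os


def salted_digest(password: str, salt: bytes) -> str:
    """Hash salt + password with SHA-256 and return the hex digest."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


def check_password(candidate: str, salt: bytes, stored_digest: str) -> bool:
    """Recompute the digest with the stored salt and compare."""
    return hmac.compare_digest(salted_digest(candidate, salt), stored_digest)


# Two users with the same password get different random salts...
salt_a = os.urandom(16)
salt_b = os.urandom(16)
digest_a = salted_digest("cat", salt_a)
digest_b = salted_digest("cat", salt_b)

# ...and therefore different digests, so one precomputed list cannot cover both.
print(digest_a == digest_b)

# Verification still works because the salt is kept with the digest.
print(check_password("cat", salt_a, digest_a))
print(check_password("Cat", salt_a, digest_a))
```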

Data integrity

Hashes are regularly used as a data integrity verification tool. For example, if you are downloading a file from the Internet and want a level of assurance that the file has not been altered during the transfer process, the owner of the file can run the data through a hashing algorithm to produce a hash digest, which they publish alongside the file download.

Once you download the file, you can run it through the same hashing algorithm to generate your own copy of the hash digest. If your hash matches the owner’s hash, you can be confident the file was not altered in transit; if it had been altered, the hash digests would not match.
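
The check itself can be sketched in a few lines. This example assumes the publisher used SHA-256 (many do, but check what the download page actually says), reads the file in chunks so large downloads do not need to fit in memory, and uses a throwaway temporary file to stand in for the download. The function names are my own.

```python
import hashlib
import os
import tempfile


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, published_hex: str) -> bool:
    """Compare the file's digest with the digest the owner published."""
    return file_sha256(path) == published_hex.strip().lower()


# Demo: a temporary file stands in for the downloaded file.
original = b"pretend this is a downloaded installer"
published = hashlib.sha256(original).hexdigest()

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(original)
    demo_path = tmp.name

intact_ok = matches_published(demo_path, published)

# Simulate corruption or tampering by appending a single byte.
with open(demo_path, "ab") as fh:
    fh.write(b"!")
tampered_ok = matches_published(demo_path, published)

os.remove(demo_path)
print(intact_ok, tampered_ok)  # prints "True False"
```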

Hashes as signatures

When wanting to prove legitimacy of a file, a hash alone is not enough.

If an owner of a file sends the file over a network along with a hash, there is the possibility that an attacker could intercept the file, replace it with a new file and produce a matching new hash. The recipient will have no way of knowing the file has been replaced, because the associated hash will be correct for the new file.

As such, we must bind the hash to something only the sender has – their private key.

If we hash a file and then encrypt the hash with the private key of the sender, the recipient can decrypt the hash with the sender’s public key and compare it with their own hash of the file. A match is proof that the hash did originate from the sender and that the file has not been intercepted and altered en route. An attacker cannot produce a valid encrypted hash for a new file because they don’t have the sender’s private key.
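
To make the hash-then-encrypt idea concrete, here is a toy sketch using textbook RSA in pure Python (it needs Python 3.8+ for the modular inverse). Everything about the key is an assumption for illustration: it is built from two publicly known Mersenne primes and uses no padding, so it is completely insecure. Real systems use a vetted library and a proper signature scheme.

```python
import hashlib

# Toy key pair for illustration only. These primes are public knowledge,
# so this "private key" protects nothing.
P = 2**89 - 1
Q = 2**107 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent


def digest_int(data: bytes) -> int:
    """SHA-256 digest of data as an integer, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Sender: hash the data, then 'encrypt' the hash with the private key."""
    return pow(digest_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Recipient: 'decrypt' with the public key and compare with own hash."""
    return pow(signature, E, N) == digest_int(data)


message = b"the original file contents"
signature = sign(message)

print(verify(message, signature))                  # True
print(verify(b"a swapped-in file", signature))     # False
```

The attacker from the previous paragraph can still hash their replacement file, but without D they cannot turn that hash into a signature that verifies against the sender's public key.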

Certificate signing

So, now we understand the features of hashes, how are they used in digital certificates?

When a service owner (such as the owner of a website) wants to produce a public/private key pair and distribute the public key via a digital certificate, they must have their certificate validated by a trusted digital certificate authority (CA).

The CA receives the website owner’s certificate, hashes it, and encrypts that hash with the CA’s own private key to produce a certificate signature.

When a visitor to the website receives the certificate, their browser checks the digital signature to see if it has indeed originated from a trusted CA, and as such knows whether to trust the sender’s public key or not.

Digital certificate for this website, showing the “Encryption Everywhere” certificate signature

So, when you visit a website using TLS encryption, your browser downloads a digital certificate, which it checks for authenticity by decrypting the certificate signature using the issuing CA’s public key (which is already known by the browser) and comparing the result with its own hash of the certificate.
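
The browser's side of that check can be sketched roughly as follows. This is a drastic simplification and every name in it is hypothetical: the "certificate" is a plain dictionary rather than a real X.509 structure, the CA key is the same kind of insecure toy RSA key built from well-known primes, and the trust store is a dictionary of CA names to public keys. It needs Python 3.8+.

```python
import hashlib
import json

# Toy CA key pair (insecure, for illustration only).
CA_P, CA_Q = 2**89 - 1, 2**107 - 1
CA_N, CA_E = CA_P * CA_Q, 65537
CA_D = pow(CA_E, -1, (CA_P - 1) * (CA_Q - 1))

# The browser ships with the public keys of the CAs it trusts.
TRUSTED_CAS = {"Example Toy CA": (CA_N, CA_E)}


def cert_digest(cert: dict, modulus: int) -> int:
    """Hash a canonical encoding of the certificate fields."""
    body = json.dumps(cert, sort_keys=True).encode("utf-8")
    return int.from_bytes(hashlib.sha256(body).digest(), "big") % modulus


def ca_sign(cert: dict) -> int:
    """CA: hash the certificate, then encrypt the hash with its private key."""
    return pow(cert_digest(cert, CA_N), CA_D, CA_N)


def browser_trusts(cert: dict, signature: int) -> bool:
    """Browser: look up the issuer, decrypt the signature, compare hashes."""
    ca_key = TRUSTED_CAS.get(cert["issuer"])
    if ca_key is None:
        return False  # issued by a CA the browser does not know
    n, e = ca_key
    return pow(signature, e, n) == cert_digest(cert, n)


cert = {
    "subject": "www.example.com",
    "issuer": "Example Toy CA",
    "public_key": "placeholder-for-the-site-public-key",
}
signature = ca_sign(cert)

forged = dict(cert, public_key="attacker-public-key")
unknown = dict(cert, issuer="Some Unknown CA")

print(browser_trusts(cert, signature))     # True
print(browser_trusts(forged, signature))   # False: contents changed
print(browser_trusts(unknown, signature))  # False: issuer not trusted
```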

Assuming the hashes match, the browser knows to trust the associated public key for the website being visited, and as such uses that key to encrypt the seed value for the symmetric session key it will use to communicate securely with the web server.

Hopefully now you have a bit more knowledge about the encryption we use to protect our data.