Hashing data

One alternative to storing the data itself is to store a cryptographic hash of it. In fact, this is how passwords are usually stored.

Storing password hashes rather than the passwords themselves is a good example of the correct application of data hygiene.

What is a hash function?

A hash function is a one-way function that scrambles the data passed to it in a way that is infeasible to reverse. The key properties a hash function must have are:

  1. Every time you hash the same data you get the same answer.

  2. Given a hash of some (unknown) data, it should be (nearly) impossible to guess what the input data was.

Property (2) cannot be guaranteed absolutely, as an attacker could simply be very lucky. What we require is that the hash function gives no clue as to how close a guess was to the original data. An attacker is always in the dark, and gains no information from each failed guess (beyond the fact that the guess was wrong).
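The two properties can be observed with a few lines of Java. This is a minimal sketch, assuming SHA-256 as the hash function (the text above does not name one); the class name HashProperties and the helper sha256Hex are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class; SHA-256 is an assumed choice of hash function.
final class HashProperties {

    /** Returns the SHA-256 hash of the text as a lowercase hex string. */
    static String sha256Hex(String text) {
        byte[] hash;
        try {
            hash = MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
        StringBuilder hex = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Property 1: the same input always gives the same hash.
        System.out.println(sha256Hex("password123"));
        System.out.println(sha256Hex("password123"));
        // Property 2: a one-character change gives an unrelated-looking hash.
        System.out.println(sha256Hex("password124"));
    }
}
```

The first two printed lines are identical, while the third shares no visible pattern with them, so a near-miss guess tells an attacker nothing about how close it was.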

Notes for Nerds: a hash function produces a fixed-size output, while its input can be of any size, so there are more possible inputs than outputs. Hash functions therefore have what are known as collisions: two different inputs that produce the same output. In a collision attack, an attacker tries to find any two inputs that hash to the same value. Some hash functions, for example MD5, have been found to be vulnerable to collision attacks. Guessing the input that produced a given output, however, is called a (first) preimage attack. These usually require brute force, i.e. an attacker simply keeps guessing until they get a match, so the speed with which each individual application of the hash function can be performed becomes critical. Many people consider MD5 to be simply “too fast”, and recommend slower hash functions that make brute-forcing harder. Another way to slow down brute-forcing is to apply the hash function many times in succession. This is sometimes called hash-stretching.
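To make the brute-force idea concrete, here is a minimal sketch of the guessing loop, assuming SHA-256 and a tiny hard-coded guess list. The class PreimageGuess and its methods are hypothetical names for illustration only. The cost of the loop is one hash computation per guess, which is why a slower hash function directly slows the attacker down.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; SHA-256 and the guess list are assumptions.
final class PreimageGuess {

    static byte[] sha256(String text) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    /** Hashes each guess in turn; returns the matching guess, or null if none match. */
    static String bruteForce(byte[] targetHash, List<String> guesses) {
        for (String guess : guesses) {
            // One hash computation per guess: this is the cost a slow hash inflates.
            if (Arrays.equals(sha256(guess), targetHash)) {
                return guess;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] target = sha256("letmein");
        List<String> guesses = Arrays.asList("123456", "password", "letmein");
        System.out.println(bruteForce(target, guesses)); // prints "letmein"
    }
}
```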

How hashes are used to protect passwords

You are probably wondering how the system checks your password if it has been hashed in a way that cannot be unscrambled. The key is that hashing the same data always gives the same output, so the system proceeds as follows:

  1. Take the hash of the original password and store it.

  2. When the user re-enters their password, take the hash of the newly entered password.

  3. If the new hash matches the old hash then let them in, otherwise refuse entry.

The same idea can be applied to any data where you only need to compare a new value with the old value. It does not work if you need to read the old value back; for that you need to encrypt the data instead.
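The store-then-compare steps can be sketched as follows. This is an illustration of the comparison idea only, assuming SHA-256 with no salt or stretching, so it is not suitable for real password storage as written. The class HashCheck and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class; unsalted SHA-256 is used only to show the comparison.
final class HashCheck {

    /** Step 1 and step 2: hash a value (the original, or a newly entered one). */
    static byte[] sha256(String value) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    /** Step 3: compare the stored hash with the hash of the newly entered value. */
    static boolean matches(byte[] storedHash, String enteredValue) {
        return MessageDigest.isEqual(storedHash, sha256(enteredValue));
    }

    public static void main(String[] args) {
        byte[] stored = sha256("correct horse");                  // only this is kept
        System.out.println(matches(stored, "correct horse"));    // prints "true"
        System.out.println(matches(stored, "wrong guess"));      // prints "false"
    }
}
```

Only the hash is ever stored, so the original value cannot be read back from it; the code can answer "is this the same value?" and nothing more.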

Notes for Nerds: password databases usually also apply a salt to the passwords before hashing them. This guards against an attack in which the attacker precomputes a table of the hashes of a large number of common passwords (like “password123”) and then simply looks up a stolen hash in the table to see which password generated it. To prevent this, a large random number (or string) is combined with the user’s password before hashing. This value is called a salt, and it must be different for each password. The salt has to be stored along with the hash, otherwise it would be impossible to verify a user’s password. Even though an attacker who steals the database will also have the salts, a single precomputed lookup table no longer works, because each password was hashed with a different salt. The attacker’s only option is to brute-force each hash separately by guessing passwords and hashing them along with that password’s salt. Combined with hash-stretching, this can be a very effective technique for protecting passwords.
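One existing construction that combines a per-password salt with many repeated hash applications is PBKDF2, which is available through the standard SecretKeyFactory API. The following is a minimal sketch, not a recommendation from the course: the algorithm name "PBKDF2WithHmacSHA256", the iteration count, and the salt and output sizes are all assumptions, and availability of that algorithm name depends on the platform version, so check it for your target devices. The class SaltedPasswordHash and its methods are hypothetical names.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.security.spec.InvalidKeySpecException;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Hypothetical demo class; all parameters below are illustrative assumptions.
final class SaltedPasswordHash {
    static final int ITERATIONS = 100_000; // assumed stretching factor
    static final int SALT_BYTES = 16;      // assumed salt size
    static final int HASH_BITS = 256;      // assumed output size

    /** Generates a fresh random salt; call once per stored password. */
    static byte[] newSalt() {
        byte[] salt = new byte[SALT_BYTES];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    /** Derives a salted, stretched hash of the password using PBKDF2. */
    static byte[] hash(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, HASH_BITS);
        try {
            SecretKeyFactory factory =
                    SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            return factory.generateSecret(spec).getEncoded();
        } catch (NoSuchAlgorithmException | InvalidKeySpecException e) {
            throw new IllegalStateException("PBKDF2WithHmacSHA256 unavailable", e);
        } finally {
            spec.clearPassword(); // wipe the spec's internal copy of the password
        }
    }

    /** Re-derives the hash with the stored salt and compares it to the stored hash. */
    static boolean verify(char[] password, byte[] salt, byte[] storedHash) {
        return MessageDigest.isEqual(storedHash, hash(password, salt));
    }

    public static void main(String[] args) {
        byte[] salt = newSalt();                              // stored next to the hash
        byte[] stored = hash("password123".toCharArray(), salt);
        System.out.println(verify("password123".toCharArray(), salt, stored)); // true
        System.out.println(verify("password124".toCharArray(), salt, stored)); // false
    }
}
```

Because each call to newSalt() returns a different value, two users with the same password end up with different stored hashes, which is what defeats a precomputed lookup table.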

Do not attempt to invent your own algorithm for storing passwords. Use an established library that does this for you and, if in doubt, speak to a security expert.

Creating hashes in Android

On Android, hashes can be created using the MessageDigest class (java.security.MessageDigest).
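As a minimal sketch of MessageDigest in use, the following hashes data read from an InputStream in chunks, so large data need not be held in memory at once. It assumes SHA-256 as the algorithm and is written as plain Java so it can be run off-device; in an app the stream might come from a file, which is an assumption rather than something the course specifies. The class StreamDigest and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class; SHA-256 is an assumed choice of algorithm.
final class StreamDigest {

    /** Reads the stream to the end and returns the SHA-256 hash of its bytes. */
    static byte[] sha256(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read); // feed the data in chunks
            }
            return digest.digest();             // finish and return the hash
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        } catch (IOException e) {
            throw new IllegalStateException("Could not read input", e);
        }
    }

    /** Formats bytes as a lowercase hex string. */
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        InputStream in =
                new ByteArrayInputStream("abc".getBytes(StandardCharsets.UTF_8));
        System.out.println(toHex(sha256(in)));
    }
}
```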

Even if your app is running on a device that supports full-disk or file-based encryption, and even if your app is also using a Content Provider to secure access to the data, it can still make sense to only store hashes of the data.

This is because any mistake in the configuration of the Content Provider (despite following the advice in this course!) could allow a malicious app to request a decrypted copy of the data. This is especially the case if the Content Provider is secured by a dangerous permission and the user grants that permission to a malicious app!

This article is from the free online course Secure Android App Development, University of Southampton.