Who is not at risk?
You do not need to worry if you are already using a secure hashing algorithm with salts that are hard to break. SHA1 or SHA256 are more secure than MD5, but still not really safe. There is a multitude of hashing algorithms that are more computationally intensive and – due to their intrinsic salt – hard to beat using rainbow tables, the most prominent of which is bcrypt. More generic key derivation functions can also be used for one-way encrypting passwords, a good candidate is e.g. PBKDF2. There are intensive ongoing discussions which algorithm is theoretically superior, but these are, in our opinion, not directly relevant as both seem to be secure enough for a long time being.
Some backgroundLet’s first discuss how an attacker actually gets hold of a database. Passwords (or their hashed versions) are often stored in relational databases. The by far most common attack to get a copy of the database (or individual rows) is a technique called SQL injection. URL parameters or POST parameters of a web page are manipulated to contain malicious characters which are interpreted by the SQL database. Of course not all web sites are susceptible to SQL injection but it is very hard to prove that a web site is invulnerable. Security vulnerabilities are also a permanent danger. Security leaks might exist in some server software, operating system kernels or even in the database system itself. Such leaks are traded on the Internet in order to abuse them. You should never underestimate internal attacks. Often databases are accessible from the inside and passwords can readily be found in the source code or server configuration. This makes it easy for employees to get hold of all data and analyze it after it has been saved to a local disk or even transferred to an external medium. Backups are often performed in data centers and it is not always safe to assume that these backups are treated as confidentially as the servers themselves. Often backup tapes are reused or even disposed; this sometimes leads to interesting finds including old databases. Remember that users tend to change their passwords very infrequently if not forced; so even very old passwords have a high chance to still be working (maybe also on other sites).
The same is basically true for old hard disks. However, hard disks are often changed routinely and the old disks are then sold on used hardware wholesale sites. If you cannot absolutely trust your hosting provider (and who can?) you should take into account that your password database might sometime also go this way.
How Hashing worksOne way to protect the password itself is to not save it at all, but to just save a “fingerprint” of the password instead. If a password is entered, it is enough to calculate the fingerprint of the entered password and compare it to the saved fingerprint. This fingerprinting technique is called “hashing“. There are of course more formal explanations of hashing, but for our discussion it is not necessary to understand it on the deepest level. The most important aspect for us is the use of a cryptographic hash function that ensures that small changes in the original text lead to completely different hashes. To make fingerprints different for users even if their passwords are the same, an additional (random) value is combined with the password before it is hashed. This value is called “salt” and is also added to the hash to calculate the fingerprint from the entered password correctly using this hash. It is obviously much safer to store the salt in a different place but for our discussion this just complicates matters and therefore we ignore this. Sometimes the algorithm is modified further by adding a “secret”, which is only known to the application, to the password as well before it is hashed. This slightly enhances security if the secret is well-chosen.
There is a variety of hashing algorithms; the most famous ones are MD5 and SHA1. Both were however created in former times and optimized for the calculation of hashes with high performance. This makes them not very well suited for storing passwords as we will see below.
How hashed Passwords can be brokenUnfortunately, not all is safe even when passwords are hashed. If an attacker gets hold of the (hashed) password database, there are several ways of how this database can be broken. We concentrate on the most famous ones:
- Dictionary attacks: Dictionary attacks make use of the fact that most passwords are words which are also found in a dictionary. The attacker then takes a dictionary (usually one which is specifically designed for cracking passwords) and tries to hash each word in the dictionary and compares the result to the hashed version of the passwords. The tools used for dictionary attacks, like “John the Ripper” have become quite sophisticated and perform all kind of permutations and obvious replacements (like “I” ⇔ “1″ and “E” ⇔ “3″ etc.)Of course this only works if the hashing algorithm is known and the position of the salt can be extracted. This can usually be accomplished quite easily if the attacker creates a login for him/herself before stealing the password database.
- Brute force attacks: These attacks are a simpler version of the dictionary attacks. Instead of using a dictionary, all conceivable combinations of characters are created and hashed. Of course the short-term success rate is lower as most users will have passwords which are in a dictionary. However, this method will find all passwords if given enough time and not only passwords which are found in a dictionary.
- Rainbow tables: Rainbow tables are precomputed tables that contain hash values for many, many passwords. As specific weaknesses of the hashing algorithm can be taken into account, the size of the table can shrink considerably. Obviously rainbow tables work best when no salt is included in the passwords. Then this method is very efficient and can break a few million passwords a day on decent hardware.
2257151269b83ef0e139c3eec8bbcbcbfrom a password, so head over to www.md5decrypt.org and try to find the original password. Even search engines carry some of the most popular hashes today. To create a secure storage of passwords, a more complicated algorithm has to be used. Bcrypt is an algorithm which is specifically designed for this purpose and will be complicated enough to last for several years. bcrypt was specifically designed for adjustable complexity by increasing the number of cycles. Both bcrypt and PBKDF2 take several orders of magnitude longer to calculate than MD5 / SHA1, which is a bit cumbersome if you expect mass logins, but on the other side makes it impossible to crack the password database by using brute force techniques. By adjusting this “work factor” the algorithms will be safe for a long time to come. For a more detailed discussion regarding complexity see article “How To Safely Store A Password“.
Our Conversion ProcessLet’s assume that we already have a password database with a significant amount of users. The passwords are hashed with a hashing function hash(cleartext), where hash can be e.g. MD5, SHA-1 or in the most simple case a function just returning the cleartext (in which case the passwords are stored unencrypted).
The challengeAs described above and in the bcrypt Wikipedia page, one of the advantages of bcrypt is its complexity. So it takes time to compute the hash value from the original. This makes the algorithm hard to attack but also poses some problems when converting many values at once as we have to make sure that no concurrent access is modifying the password at the same time (i.e.. a user trying to change his/her password at the same time). We have to ensure that no password change is lost and no user will be denied login during the whole conversion process. There are some additional complexities related to the different ways a password was saved before the change; these will be explained individually below.
PrerequisitesAs a first step the software has to be made aware of bcrypted passwords. This of course makes it necessary to save bcrypted passwords and check against these saved versions. As the transition cannot be performed for all users at once, it makes sense to introduce a new column in the database first to host these bcrypted passwords. Alternatively, we could also have a Boolean column which tells us whether the password is still saved in the old representation or is already bcrypted. The choice is up to you. In the next step, the software itself has to be adapted. It has to be aware of the two password formats, must be able to check entered passwords against both formats and be able to write changed passwords in the new format. This software will still be fully functional even if not a single password has been converted to the new format.
Algorithm for Changing Passwords
In the first step the procedure for changing passwords should check whether the row is already locked. This is a good idea as it is possible (though very unlikely) that a user opened multiple windows and is trying to change the password in all of them.
In the next step, the idea is to exclusively lock the row of the current user first, to then read the original (maybe already hashed) password, bcrypt it, write it back, delete the original password, finish the transaction and release the lock. It is essential that all this is performed within a single transaction. This ensures that no password change gets lost and users will be able to log in to the system using their old (and new) passwords without interruption (or only a very small delay if a user whose password is about to be converted tries to log in simultaneously).
Algorithm for Verifying PasswordsTo avoid concurrency problems due to passwords just being converted, the conversion algorithm will perform the necessary change in one single transaction, so the verification does not have to concern itself with it. If the password is in bcrypt format, only this value should be used to check the password. The bcrypt value already contains some meta information like salt and number of iterations, so verification is as simple as
Bcrypt.check(hash(enteredPassword), storedBcryptHash)If the password has not yet been converted the previously used algorithm should take over. In the other cases below we concentrate only on the verification of the bcrypted password, while the old password is gracefully handled by the old password checking algorithm.
Handling clear-text PasswordsConverting clear text passwords is straightforward. In each step, the row is locked, the unencrypted password from each row taken, bcrypted and saved to the new column. Verifying the password is similar; depending on whether it has already been bcrypted or not, the clear text or the bcrypted password is used to check against the entered password.
Handling unsalted MD5 passwordsAs MD5 cannot be easily inverted (unless you attack the database yourself) it is much simpler to use the MD5 hashed password as a starting point and during conversion just bcrypt that value into a new column. Verification is a bit more complicated, because the entered password must be MD5 hashed and that hash then be checked against the bcrypted column in the database.
Handling salted MD5 passwordsThe conversion procedure is more complicated if MD5 passwords with salt are used. To re-create the salted MD5 hash, the salt has to be separated in a first step and should be kept in an extra column in the database. After doing that, the conversion procedure is identical to the one described above, i.e. the salted MD5 hash will be bcrypted. Checking an entered password is also a bit more complicated. First the correct salt must be selected from the database, the entered password must be MD5 hashed with this salt and finally this value will be checked with the bcrypt algorithm. Checking in this case works with:
Bcrypt.check(MD5(salt + enteredPassword), storedBcryptPassword)
Converting to a “pure” bcrypt databaseStoring the bcrypt’ed version of an already (via e.g. MD5) hashed password is not as secure as storing the password directly with bcrypt, as some of the entropy is eaten up by the original hashing algorithm. However it is still tremendously more secure than storing the password trivially hashed or in clear text. If you started with an MD5 password database and are now unhappy about the “mixed” hashes in your database, there is also a solution for this. You have to use a flag column which indicates that the passwords are in the double-hashed intermediate format. As soon as a user successfully logs in you know the correct password and can save a pure bcrypted version in the database. Of course you still have to change the flag so that your future password verifications only use bcrypt.
This procedure is also non-intrusive but will never catch all users. If you want to get rid of those “mixed” users completely, you either have to force them to log in or reset their passwords. If they have not logged in for a long time, chances are high that they have forgotten their passwords anyway.