Consequences when using Mutable Fields in hashCode()

This entry is part of the series:
Hacking Java Puzzlers for Fun and Profit

We start our new series with an informative HashSet puzzler. It’s about a bug that gave us quite a headache since its root cause was hard to identify. This subtle bug has without doubt crept into many code bases, so a detailed discussion should interest every Java coder. I will also discuss code inspection tools that detect this violation (sadly, only few). And by the way, what we learned about HashSet also makes a good topic in our job interviews.

The starting point was that we had implemented a class with some fields and needed quick access to its instances. This use case was pretty performance-critical, so we took a HashSet to maintain the class instances. As the instance count was high, we needed a broad distribution of possible hash-values. We used the IDE to generate code for the hashCode() and equals() method. Everything should have been fine with this generated code — at least we thought so. However, our application showed some unexpected results and it took us quite a long time to nail down the root cause.

Here’s a simplified example of code using a HashSet of PhoneNumber instances:

// Class PhoneNumber implements hashCode() and equals()
PhoneNumber obj = new PhoneNumber("mgm", "089/358680");
System.out.println("Hashcode: " +
obj.hashCode()); //prints "1476725853"
// Add PhoneNumber object to HashSet
Set<PhoneNumber> set = new HashSet();
set.add(obj);
// Modify object after it has been inserted
obj.setNumber("089/358680-0");
// Modification causes a different hash value
System.out.println("New hashcode: " +
obj.hashCode()); //prints "7130851"
// ... Later or in another class, code such as the following
// is operating on the Set:
// Unexpected Result!
// Output: obj is set member: FALSE
System.out.println("obj is set member: " +
set.contains(obj));
// Even stranger unexpected Result!
// Output: obj is set member: FALSE
for (PhoneNumber p : set) {
if (p.equals(obj)) {
System.out.println("obj is set member: " +
set.contains(p));
}
}

Executing the code above surprisingly produces the following output:
obj is set member: FALSE
obj is set member: FALSE

Obviously, what we would expect is a “TRUE”, since obj has been inserted into the HashSet.

What just happened?

The unexpected result from the code above is caused by a trap in the JDK Collections framework into which many developers have fallen: If an implementation of hashCode() uses mutable fields to calculate the value, HashSet.contains() produces unexpected results, i.e. your object seems to be not a member of the set.

For an illustration, let’s look at the class PhoneNumber and its mutable field “number”:

public class PhoneNumber {
private final String name;
private String number;
public PhoneNumber(String number, String name) {
this.number = number;
this.name = name;
}
// Setter makes "number" mutable!
public void setNumber(String number) {
this.number = number;
}
@Override
public int hashCode() {
int result = name != null ? name.hashCode() : 0;
result = 31 * result +
(number != null ? number.hashCode() : 0);
return result;
}
// equals() left out here ...
}

What’s wrong with this class? Well, it’s a bad idea to use mutable fields in hashCode() when its instances are put into a HashSet (or as keys into a HashMap). In general, any hash-based collection is problematic. See also “HashSet contains problem with custom objects” (Stackoverflow) and “hashCode() pitfalls with HashSet and HashMap”.

HashSet.contains() surprisingly uses hashCode()

Part of our problem to spot our bug was that the HashSet.contains() method relies on hash values to stay immutable. Unfortunately, this is not stated explicitly in the HashSet JavaDoc, which only mentions “…returns true if and only if this set contains an element e such that (o==null ? e==null : o.equals(e)). Actually, this is the same description as the JavaDoc of Set.contains().

A conscientious reader may also find the following note in the JavaDoc of the Set interface, which only mentions equals():

“Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set.”

By the way, the following Sun JDK bug was reported quite some time ago: “(coll) HashSet.contains method violates Set.contains() contract”. The bug is approved (but not fixed) and the last comment was made in late 2007.

Properly coding the hashCode() method

The contract of hashCode() is explained in the JavaDoc of Object. You will also find hints on proper implementations in the very interesting book “Effective Java” from Joshua Bloch. The book covers many interesting topics and especially the items 7 and 8 shed light on good equals() and hashCode() implementation practices. Another book by the same author, “Java Puzzlers”, also contains two puzzlers on this problematic area: specifically, puzzlers 57 and 58 show how these two methods depend on each other.

There are many more discussions about the problems that can occur with hashCode(). For example, in his Google TechTalks presentation “How To Design A Good API and Why it Matters”, Joshua Bloch says that hashCode() is an implementation detail that should not have leaked into the Java API at all (at about 27:30 min).

And don’t forget the lesson learned here: using mutable fields in hashCode() is a recipe for disaster. And disaster strikes when instances of this class are put in a hash-based collection like HashSet or HashMap (as map keys).

Please note that since code usually uses only the respective collection interfaces, e.g. Set and Map, you might not even know about it (as in our case). Or you use a module or library that stores your objects in a collection internally as an implementation detail that’s hidden from you.

Don’t rely on automatic hashCode() Generation

Even with coding rules in mind, a hashCode() implementation that uses mutable fields creeps into our code base faster than you can spell “bug”. This is because developers are reluctant to write the long-winded calculations in the hashCode() methods manually and often generate them with the help of the IDE, as shown in the screenshot below. But it’s just too easy for a developer to press “Generate” without first checking the specific fields that can be included and leaving the mutable fields out. Of all IDEs I tested only NetBeans at least has all fields unchecked, which forces the developer to select them on purpose.

Modern IDEs provide automatic generation of hashCode(). Eclipse and Intellij IDEA by default include all mutable fields.

Code Inspection Tools to the Rescue?

You might wonder if classes of your code base contain an hashCode() implementation that uses mutable fields. One option (besides a manual code review) is using a code inspection tool. Unfortunately, the prominent open source tools like FindBugs, PMD, CheckStyle do not offer such a built-in inspection.

Only IntelliJ IDEA has a built-in code inspection that detects the use of mutable (actually non-final) fields in hashCode().

The only tool support I found was Intellij IDEA. This IDE provides a code inspection named “Non-final field referenced in ‘hashCode()'”. Any violation is highlighted as shown in the screenshot above.

Series Navigation<< Regular Expressions: Splitting PipesPossibly the most malicious Regular Expression >>
Share