Oscar Westra van Holthe - Kind

Equals and hashCode

The methods equals(Object) and hashCode() denote equivalence and have existed since before the Java Collections framework. So they should be well understood, right?

Unfortunately, there is still plenty of room for confusion, especially in relation to database entities (whether modelled as the old EJB 2 entity beans or as JPA @Entity classes). The problems occur for two reasons:

  1. It's easy to not adhere to the equals() contract.
  2. While legal, a changing return value from hashCode() or equals() causes undefined behaviour in Map and Set implementations. This causes bugs.

For the initial part of this essay, the hashCode() method will be all but ignored. The reason is that equals() is much more difficult to get right. Once we have that correct, hashCode() follows easily.

The contract

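According to its Javadoc, the equals method implements an equivalence relation on non-null object references:

  1. It is reflexive: for any non-null reference x, x.equals(x) returns true.
  2. It is symmetric: for any non-null references x and y, x.equals(y) returns true if and only if y.equals(x) returns true.
  3. It is transitive: for any non-null references x, y and z, if x.equals(y) and y.equals(z) return true, then x.equals(z) returns true.
  4. It is consistent: repeated invocations of x.equals(y) return the same result, provided no information used in the comparison is modified.
  5. For any non-null reference x, x.equals(null) returns false.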

The tricky part about this contract is the combination of symmetry, transitivity and inheritance. The reason becomes clear when you have a class structure with non-trivial extensions. Many (relatively) simple implementations fail to provide either symmetry or transitivity. This is detailed by Angelika Langer & Klaus Kreft in Secrets of equals(). And although they show in a second installment that it's possible to maintain both symmetry and transitivity by using a Visitor pattern to navigate the class hierarchy, this is much too convoluted for most (if not all) cases.
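To make the symmetry problem concrete, here is a minimal sketch. The classes Point and ColouredPoint are hypothetical and exist only for this illustration; they show one common way a non-trivial extension can break the contract, not the only one.

```java
import java.util.Objects;

class SymmetryDemo
{
	// Hypothetical base class: compares x and y for any Point instance.
	static class Point
	{
		final int x;
		final int y;

		Point(int x, int y)
		{
			this.x = x;
			this.y = y;
		}

		@Override
		public boolean equals(Object obj)
		{
			if (!(obj instanceof Point))
			{
				return false;
			}
			Point other = (Point) obj;
			return x == other.x && y == other.y;
		}

		@Override
		public int hashCode()
		{
			return Objects.hash(x, y);
		}
	}

	// Hypothetical non-trivial extension: adds a field to the comparison.
	static class ColouredPoint extends Point
	{
		final String colour;

		ColouredPoint(int x, int y, String colour)
		{
			super(x, y);
			this.colour = colour;
		}

		@Override
		public boolean equals(Object obj)
		{
			if (!(obj instanceof ColouredPoint))
			{
				return false;
			}
			return super.equals(obj) && colour.equals(((ColouredPoint) obj).colour);
		}
	}

	public static void main(String[] args)
	{
		Point p = new Point(1, 2);
		ColouredPoint cp = new ColouredPoint(1, 2, "red");
		System.out.println(p.equals(cp)); // true: a ColouredPoint is a Point
		System.out.println(cp.equals(p)); // false: a Point is not a ColouredPoint
	}
}
```

The two calls disagree, so symmetry is violated. Attempts to repair this by ignoring the colour when comparing against a plain Point tend to break transitivity instead.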

Considerations for mere mortals

We've seen that if you're really smart, it's possible to create a complex class hierarchy where equals() is fully correctly implemented. Usually though, this is not needed.

To see why, let us first look at our classes. We'll notice there are two basic types: behavioural classes and value classes. Note that the difference is quite subtle: value classes do have some behaviour, as guided by the Tell, don't ask principle. Behavioural classes, however, never need to hold data. What matters here is that, because of the Single Responsibility Principle, a behavioural class cannot become a value class, nor vice versa.

The easiest to handle are behavioural classes. Since they are all about what they do, you'll usually not have many instances, and when you do there'll be no point in comparing them. So you'll use the equals() and hashCode() implementations of java.lang.Object.

For value classes, the class hierarchy becomes important. The choice you need to make is whether you'll allow subclass instances to be equal to superclass instances and, if you do, whether you'll allow non-trivial extensions. Let's summarize the options briefly:

No subclass/superclass instance equality
The easiest to get right, but a bit limited: it's unusable with EJBs, as they often use proxies. Also, this implementation violates the Liskov Substitution Principle: when you substitute an instance with a subclass instance, the result of equals is different. If you don't mind that, a typical implementation looks like this:
@Override
public boolean equals(Object obj)
{
	if (this == obj)
	{
		return true;
	}
	if (obj != null && getClass() == obj.getClass())
	{
		// Cast and compare all relevant fields.
		// Return the result.
	}
	return false; // obj is null or of the wrong class
}
@Override
public int hashCode()
{
	final int modifier = 31; // Any odd prime will do.
	// (look at multiple powers of a number in binary to find out why)

	int hashCode = 17; // Any non-zero number will do.
	// Repeat the following line for each relevant field,
	// replacing the 0 with the hash code of that field.
	hashCode = hashCode * modifier + 0;
	/*
	 * Hash codes for primitives:
	 * boolean -> value ? 1 : 0
	 * byte, char, short, int -> (int)value
	 * long -> (int)(value^(value>>>32))
	 * float -> Float.floatToIntBits(value)
	 * double -> (int)(Double.doubleToLongBits(value)^(Double.doubleToLongBits(value)>>>32))
	 */
	return hashCode;
}
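As an illustration, the template above could be filled in as follows. This is only a sketch: Money and AuditedMoney are hypothetical classes, and the choice of fields is an assumption made for the example.

```java
class GetClassDemo
{
	// Hypothetical value class with two relevant fields.
	static class Money
	{
		private final String currency;
		private final long cents;

		Money(String currency, long cents)
		{
			this.currency = currency;
			this.cents = cents;
		}

		@Override
		public boolean equals(Object obj)
		{
			if (this == obj)
			{
				return true;
			}
			if (obj != null && getClass() == obj.getClass())
			{
				Money other = (Money) obj;
				return cents == other.cents && currency.equals(other.currency);
			}
			return false; // obj is null or of the wrong class
		}

		@Override
		public int hashCode()
		{
			final int modifier = 31;
			int hashCode = 17;
			hashCode = hashCode * modifier + currency.hashCode();
			hashCode = hashCode * modifier + (int) (cents ^ (cents >>> 32));
			return hashCode;
		}
	}

	// Hypothetical subclass that adds no fields at all.
	static class AuditedMoney extends Money
	{
		AuditedMoney(String currency, long cents)
		{
			super(currency, cents);
		}
	}

	public static void main(String[] args)
	{
		Money a = new Money("EUR", 100);
		System.out.println(a.equals(new Money("EUR", 100)));        // true
		System.out.println(a.equals(new AuditedMoney("EUR", 100))); // false
	}
}
```

The second comparison shows the limitation mentioned above: even a subclass that adds nothing is never equal to an instance of its superclass.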
Allow only trivial extensions to be equal
A little bit trickier, because you need a final modifier. The downside is severely limited subclassing, but it's usable with EJBs, and when you have a data model in a database, subclasses are usually (!) limited anyway. A typical implementation looks like this:
public final boolean equals(Object obj)
{
	if (this == obj)
	{
		return true;
	}
	if (obj instanceof MyClass) // Replace with your actual class name
	{
		// Cast and compare all relevant fields.
		// Return the result.
	}
	return false; // obj is null or of the wrong class
}
public final int hashCode()
{
	// Implement as above.
}
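A sketch of this variant with a trivial extension might look like this. Customer and CustomerProxy are hypothetical; the proxy is a hand-written stand-in for the kind of subclass a container or persistence provider might generate.

```java
class TrivialExtensionDemo
{
	// Hypothetical value class; equals and hashCode are final.
	static class Customer
	{
		private final String email;

		Customer(String email)
		{
			this.email = email;
		}

		@Override
		public final boolean equals(Object obj)
		{
			if (this == obj)
			{
				return true;
			}
			if (obj instanceof Customer)
			{
				return email.equals(((Customer) obj).email);
			}
			return false; // obj is null or of the wrong class
		}

		@Override
		public final int hashCode()
		{
			return email.hashCode();
		}
	}

	// Trivial extension: adds no state that takes part in equality,
	// and cannot override equals() or hashCode() because they are final.
	static class CustomerProxy extends Customer
	{
		CustomerProxy(String email)
		{
			super(email);
		}
	}

	public static void main(String[] args)
	{
		Customer plain = new Customer("a@example.com");
		Customer proxy = new CustomerProxy("a@example.com");
		System.out.println(plain.equals(proxy)); // true
		System.out.println(proxy.equals(plain)); // true
	}
}
```

Because no subclass can change the comparison, symmetry and transitivity hold across the whole hierarchy.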
Allow complex hierarchies
The most difficult by far. Angelika Langer & Klaus Kreft have detailed a solution for this though.

Entity Java Beans, equality and collections

You might be tempted to use your database's primary key for equals comparisons and hashCode calculations. Don't. You'll get subtle bugs when you create an entity, add it to a collection, and then persist it. The reason is that the primary key is initially null, and only after persisting does the database or persistence provider give it a value. The hash code thus changes while the entity is already in the collection, and your Map or Set will give undefined results.
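The pitfall can be reproduced without any persistence framework. In the sketch below, Order is a hypothetical entity and the call to setId stands in for whatever the persistence provider does when it assigns a generated key.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class PrimaryKeyPitfallDemo
{
	// Hypothetical entity whose equality is based on the primary key.
	static class Order
	{
		private Long id; // null until the entity is persisted

		void setId(Long id)
		{
			this.id = id;
		}

		@Override
		public final boolean equals(Object obj)
		{
			if (this == obj)
			{
				return true;
			}
			if (obj instanceof Order)
			{
				return Objects.equals(id, ((Order) obj).id);
			}
			return false;
		}

		@Override
		public final int hashCode()
		{
			return Objects.hashCode(id); // 0 while id is null
		}
	}

	public static void main(String[] args)
	{
		Set<Order> orders = new HashSet<>();
		Order order = new Order();
		orders.add(order);  // stored under the hash code of a null id
		order.setId(42L);   // simulates the key being assigned on persist

		// The set still holds the object, but looks for it in the wrong bucket.
		System.out.println(orders.contains(order)); // false on OpenJDK's HashSet
		System.out.println(orders.size());          // 1
	}
}
```

The exact symptoms depend on the collection implementation, which is precisely why such bugs are hard to track down.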

The correct way of implementing equals and hashCode is by using an alternate, natural candidate key. Or, in programming terms: final fields that uniquely identify your objects. Such fields do not change during the lifetime of the object, and as such are perfect for use in equals() and hashCode().
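A minimal sketch of this approach, assuming a hypothetical Book entity whose ISBN serves as the natural key (the class, its fields, and the sample ISBN are illustrative only):

```java
import java.util.HashSet;
import java.util.Set;

class NaturalKeyDemo
{
	// Hypothetical entity: the surrogate id is ignored for equality.
	static class Book
	{
		private Long id;           // generated primary key, assigned on persist
		private final String isbn; // natural candidate key, never changes

		Book(String isbn)
		{
			this.isbn = isbn;
		}

		void setId(Long id)
		{
			this.id = id;
		}

		@Override
		public final boolean equals(Object obj)
		{
			if (this == obj)
			{
				return true;
			}
			if (obj instanceof Book)
			{
				return isbn.equals(((Book) obj).isbn);
			}
			return false;
		}

		@Override
		public final int hashCode()
		{
			return isbn.hashCode();
		}
	}

	public static void main(String[] args)
	{
		Set<Book> books = new HashSet<>();
		Book book = new Book("978-0-00-000000-2");
		books.add(book);
		book.setId(42L); // simulates the key being assigned on persist

		System.out.println(books.contains(book)); // true: the hash code did not change
	}
}
```

Since the hash code depends only on a final field, the entity can be persisted while it sits in a Map or Set without upsetting the collection.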