Misconceptions about Object.hashCode()

The given implementation for hashCode is somewhat a mystery. Some misconceptions about it confuse students. I try to explain it.

Note: Students struggle to use the Java Collections Framework because they do not understand the data structures and algorithms. The concept of Hash Codes is one thing you won’t learn from an online blog (online blogs suck). This text only throws light on some common misconceptions. It does not actually teach you anything. Buy a good book and work through it.

Documentation and Specification on Object.hashCode()

Javadoc API description

You can read the full text here: docs.oracle.com → Object → hashCode

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)

So this suggests that the JVM would use some internal address. But we do not know what kind of address that could be.

Java Language Specification

As stated above there is no requirement. It simply has to return some integer. It only explains that hashCode is useful for HashMap:

The method hashCode is very useful, together with the method equals, in hashtables such as java.util.HashMap.

What is Object.hashCode()?

There is no definition on how it is supposed to be implemented, but there is a clear definition on what it has to be. That’s very normal. Any specification of an algorithm is all about what, not how.

Here’s a shortened version of that specification:

  • The hashCode method must consistently return the same integer, provided no information on the object is modified.
  • If two objects are equal, then calling the hashCode method on each of the two objects must produce the same result.
  • The programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

A hash code is a number that is nearly unique in the set of all hash codes of all objects of the same type. Two objects that are equal must have the same hash code. If not then they can’t be used in hash based collections (HashMap, HashSet).

Two different (non-equal) objects can have the same hash code, that’s called a collision. This is ok as long as it doesn’t happen too often.

And that’s it. So if there is a unique id then that is used. If not (or if it is larger than a 32-bit integer) then the next best thing is used. It could be some algorithm, or some id that is generated but will repeat if all or most integer values are already in use.

What are hash codes used for?

Hash codes are used in data structures that are based on hash codes. HashSet and HashMap are often used in Java and they are always based on the hashCode- and equals-methods.

In general, hash functions can be used for more than that. They can be used to check integrity. If the same hash code is computed for some data then it probably wasn’t altered. A hash sometimes is used like a fingerprint that is nearly-unique and if two objects have the same hash code then they are probably equal.

Some systems store a hash code of a password to later recalculate the hash code from the user input and check if it’s the same. If so then it’s very probable that the correct password was given. This approach is only secure if the hash code is large enough (much larger than 32 bit!), a salt is used and timing attacks are prevented. Object.hashCode() could never be used for security.

What are hash codes NOT used for?

They are not addresses. The JVM doesn’t even use them. So they are not directly used to locate objects. However, they can be used to locate a bucket of objects that might contain the object, and then the object can be searched in that bucket.

Hash codes are not unique identifiers, simply because they do not have to be unique. However, a unique identifier sometimes can used for the hash code. In Java you can use any 32bit unique id as a hash code.

Misconceptions

Hash Code is not an address!

It’s not a unique id, it’s not a locator, and it’s not a memory address. The mailing address of a person could change any time, so obviously it’s not something to identify that person over a long time. A memory address is a bit like a mailing address. Each memory address is unique in that no other address could point to the same location. But when somebody moves then that address belongs to a different person. But a hash code doesn’t change when the JVM moves the object.

A Java object does not have a fixed location

Not only is it possible that the location will change, the heap of a common JVM is designed to move around objects all the time. The majority of newly created objects are, at first,  located in the Eden space, which is a rather small part of the heap. So the hash codes would constantly have the same value for objects created at the same location in Eden.

However, it could have some table with all existing references, mapped to the current memory addresses of the objects. The JVM needs such a table that translates references to current memory locations. If the “position” or the “key” in that table never changes then that is a perfect candidate for a hash code. Even if the JVM uses 64 bit object references, or if those positions change during the lifetime of the object, it still can be used. It could use that for nearly-unique hash codes: The two 32 bit parts of a long value can be XORed to one single 32 bit value and the first used position in the table could be stored with the object. And that’s what the “internal address” is: Some position in some kind of internal data structure used to manage the references of all objects. The native implementation of Object.hashCode() can use this.

How to make Hash Codes

Work smart, not hard

You could write the algorithm yourself, but a good programmer is way too lazy for that. So simply use Objects.hash(...) instead. Pass all fields of the object to that method and it returns a hash code. Or let your IDE create the implementation. In Eclipse you click “Source”, “Generate hashCode() and equals()…”, “OK”. Three simple clicks and you are done.

Don’t forget equals()

When you do, always override both hashCode() and equals(). The API explains how to do it and your IDE probably can help you with it. A text books explain it even better (for example Effective Java by Joshua Bloch). But even he didn’t fully solve the problem. It won’t work properly if you extend some type that already implements those two methods and you want to add more properties to your extending type. There is a solution to that, explained here. But in most cases it’s much better to simply not have such a complex design and instead use final types for hash-based collections. And to make things easier you should use immutable types, so you do not have to worry about elements that change state while they are in a hash-based collection.

Default Implementation of hashCode()

You can’t define an interface and add a default implementation of hashCode(). That’s because an interface can’t provide any default implementation to any method defined in Object. This might change in the future, but for Java 8 it simply isn’t possible. However, you can have some abstract type that implements hashCode().

One thought on “Misconceptions about Object.hashCode()”

Leave a Reply

Your email address will not be published. Required fields are marked *