Using the hashCode() Method

The first thing to note about the hashCode() method of the Object class is that it returns a 32-bit integer. So the output is allowed to be anything in the range from $-2^{31}$ to $2^{31}-1$. Note that means it could be negative, or ridiculously large -- both of which would represent problems were we to use it "right out of the box" as an array index.

However, we can easily convert the output into an appropriate index for an array of size $m$. Supposing it's output is $h$, we simply find the (non-negative) remainder of $h$ upon division by $m$.

I know what you are thinking -- shouldn't we just calculate (h % m)? Wouldn't that give us the value we seek? Well, using the mod operator to get the magnitude of our hash code down to a more appropriate size is definitely something we will want to do. However, unlike the mathematical mod operator, which only ever results in positive values -- in most programming languages, the result of applying a mod operator can occasionally be negative.

Simple fix, you say? Just use (Math.abs(h) % m), you suggest?

Granted, in theory this makes perfect sense -- except for one annoying complication. When one takes the absolute value of the minimum 32-bit integer (i.e., $-2^{31}=-2147483648$), one gets $2^{31}$. But this value is one more than the maximum 32-bit integer, $2^{31}-1$.

As a result, an overflow error occurs. Since Java's mechanism for storing negative integers (i.e., Two's Complement) effectively curves the number line into a circle -- connecting the maximum value to the minimum -- and noting that $2^{31}$ is one larger than $2^{31}-1$, the overflow takes us back to the minimum again. The upshot of all this is that one will see the following in Java:

System.out.println(Math.abs(-2147483648)); // prints -2147483648   <-- Yikes!

Admittedly, the chance of having a hash code equal $-2147483648$ is exceedingly low (i.e., less than one in a billion, when hash codes are uniformly distributed) -- but it can happen. As an amusing example, note that "polygenelubricants".hashCode() equals this value.

Fortunately, the fix is easy.

The key is another consequence of Java's use of two's complement to store negative integers -- namely, that a 32-bit integer will be positive whenever its the leading bit is zero.

We can ensure this happens with a clever trick. Consider the following bit-wise and operation on the sequence of bits that represents -123456789:

  11111000101001000011001011101011   // -123456789
& 01111111111111111111111111111111   // 0x7fffffff (in hexadecimal)
----------------------------------
  01111000101001000011001011101011

Notice how in all but the first column, the bit-wise and operator (&) preserves the top bit by "and-ing it together" with a 1. In the first column however, regardless of the top bit's value, the result of "anding it together" with 0 is always 0 (which makes the result positive).

As shown above, the hexadecimal notation for the second bitstring above is 0x7fffffff. (To understand why, check out the notes on Numbers in Other Bases -- especially the part covering the shortcut for changing between base 2 and base 16).

Consequently, we can use the following to generate an int that will always represent a valid index in an array of length $m$ from a given key's hash code.

private int hash(Key key) {
  return (key.hashCode() & 0x7fffffff) % m;
}

With the above undestanding of how the result of the hashCode() method should be used, we now turn our attention to the implementation details of how the original hash code value is generated.

Importantly, the hashCode() method of the Object class is meant to be overridden. It's default implementation, which simply returns the memory address of the object in question, very often does not meet the requirement that hash codes of equal keys should always be equal themselves.

To see how this can happen, consider the following CharWrapper class and the later use of its default hashCode() method. Note how c1 and c2 are clearly equal to one another (as determined by the given equals() method.) However, as they are constructed separately, they have different memory addresses -- and consequently different hash codes, since we failed to override the hashCode() method.

public class CharWrapper {
    
    char c;
    
    CharWrapper(char c) {
        this.c = c;
    }
    
    public boolean equals(Object o) {
        return (o instanceof CharWrapper) && (this.c == ((CharWrapper) o).c);
    }
}
...
    CharWrapper c1 = new CharWrapper('a');   
    CharWrapper c2 = new CharWrapper('a');   
    
    System.out.println(c1.equals(c2));                    // prints "true"
    System.out.println(c1.hashCode() == c2.hashCode());   // prints "false" <-- Problem! 
...

Again, how we override the hashCode() method in a given class depends on the nature of the keys that class represents. Before we look at a general strategy for constructing hash codes tied to a given class, it will be helpful to see how the hashCode() method is implemented by some existing, well-known Java classes. But we'll save that for the next section...