Thursday 11 June 2015

Generating collisions in String.hashCode

In a previous post I mentioned collisions in String.hashCode(). Turns out that generating such collisions is quite easy.
String.hashCode() is defined as:
 
    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }
For a string A of length N (a0, a1,..., aN-2, aN-1),
this can be written in a closed form as: 31^(N-1)a0 + 31^(N-2)a1 + ... + 31aN-2 + aN-1.


Pad by zeros

The first observation is that 0 is a valid character and prepending arbitrary number of 0 characters doesn't change the value of the hash. The following code takes advantage of this:
    public static String padByZeroes(String original, int numberOfExtraZeroes) {
        int length = original.length();
        char[] chars = new char[length + numberOfExtraZeroes];
        for (int i = 0; i < length; i++) {
            chars[i + numberOfExtraZeroes] = original.charAt(i);
        }
        return new String(chars);
    }
padByZeroes produces strings of length greater then the original with identical hashcode.

Pair Perturbation

As a starting point to this approach we take a string B which is identical to A: (b0, b1,..., bN-2, bN-1). We then modify two consecutive character of B without affecting it's hashcode.
If the modified characters are are at index K and K+1 we have: hashCode(A)-hashCode(B) = ... = 31(aK - bK)+(aK+1 - bK+1).
We want the last expression to be zero, that is:  31(aK - bK) =  bK+1 - aK+1. As we are free to choose bK and bK+1, to make things simple (and reduce the chance of overflowing/underflowing the two byte value of char) we choose bK = aK - 1 which gives us bK+1 = aK+1 + 31.
    public static String pairPerturbation(String original, int pairIndex) {
        int length = original.length();
        if (pairIndex > length - 2) {
            throw new IllegalArgumentException("pairIndex can't be greater then " + (length - 2));
        }

        char[] chars = original.toCharArray();

        chars[pairIndex] = (char) (original.charAt(pairIndex) - 1);
        chars[pairIndex + 1] = (char) (original.charAt(pairIndex + 1) + 31);

        return new String(chars);
    }
Here is a link to the Gist of the code and tests: https://gist.github.com/sorokod/a6848e6cd4c9fe32a523#file-stringhashcollisions

1 comment:

  1. This seems to be the trigger for changes made to Maps in Java 8 as described in http://openjdk.java.net/jeps/180

    ReplyDelete