Wednesday, February 1, 2012

Single-word Wang/Jenkins Hash in ConcurrentHashMap


ConcurrentHashMap is hash table supporting full concurrency of retrievals and adjustable expected concurrency for updates. I recently came across this code during testing, and one part really got my attention. To generate the hash, ConcurrentHashMap uses an algorithm based on bitshifting and bitwise operations.

========================================
Variant of single-word Wang/Jenkins hash
========================================
private static int hash(int h) {

// Spread bits to regularize both segment and index locations,
// using variant of single-word Wang/Jenkins hash.
h += (h <<  15) ^ 0xffffcd7d;
h ^= (h >>> 10);
h += (h <<   3);
h ^= (h >>>  6);
h += (h <<   2) + (h << 14);

return h ^ (h >>> 16);
}

According to the comment in the code, this method applies a supplemental hash function to a given hashCode, which defends against poor quality hash functions. Good hash functions are important as a hash table effectively turns from a map to a linked list, in the worst case, all keys in the same bucket. There are also other considerations that come into play such as the performance of hash calculation and the number of buckets. Dr. Heinz M. Kabutz explains the power of “power-of-two number of buckets” which gives us some good starting point to understand what is really going on here.

Let’s look at the code above and see how things change, line-by-line. To make things simple, I use int 1 to perform all the operations.

In Java, the int data type is a 32-bit signed two’s complement integer. To represent int 1 in binary code, we have the following:

1
h=1 > 0000-0000-0000-0000-0000-0000-0001

Now, let’s dissect the following line:

1
h += (h << 15) ^ 0xffffcd7d

First, let’s re-write this into an easier-to-read format.. at least for me  .

1
2
3
4
h1  = h << 15      =  0000-0000-0000-0000-1000-0000-0000-0000
hex = 0xffffcd7d   =  1111-1111-1111-1111-1100-1101-0111-1101
h2  = h1 ^ hex     =  1111-1111-1111-1111-0100-1101-0111-1101
h2 + h             =  1111-1111-1111-1111-0100-1101-0111-1110

Using the same thought processing and applying it to each line, we end-up with:

1
2
3
4
5
6
h += (h << 15) ^ 0xffffcd7d = 1111-1111-1111-1111-0100-1101-0111-1110
h ^= (h >>> 10)             = 1111-1111-1100-0000-1011-0010-1010-1101
h += (h << 3)               = 1111-1101-1100-0110-0100-1000-0001-0101
h ^= (h >>> 6)              = 1111-1110-0011-0001-0101-0001-0011-0101
h += (h << 2) + (h << 14)   = 0100-1011-0100-0011-1101-0110-0000-1001
h ^= (h >>> 16)             = 0100-1011-0100-0011-1001-1101-0100-1010

Result:
Bin = 0100-1011-0100-0011-1001-1101-0100-1010
Decimal = 1,262,722,378

No comments:

Post a Comment