* Fastest among the int-to-int map implementations I have tested in my previous article, in the tests written for that article.
I would like to thank Sebastiano Vigna and Roman Leventov for sharing their hash map wisdom with me. Some implementation ideas were inspired by “Code Optimization: Effective Memory Usage” by Kris Kaspersky.
This article will give you a step-by-step overview of various implementation tricks used in modern hash map implementations. At the end of this article you will have probably the fastest Java int-to-int hash map implementation available at the time of writing.
Open indexing
Most modern hash maps are based on the idea of open indexing. What does it mean? Your map is based on an array of keys (values will always be placed at the matching array index, so forget about them for now). You have to find your key in the array of keys for each map operation. How is it implemented?
First of all, you need the initial lookup position in the array. It may be calculated by any function which maps a key into an integer in the range [0, array.length - 1]. A key is usually mapped into an integer by means of the hashCode method. The simplest function here could be Math.abs( key.hashCode() % array.length ) (keep in mind that the % result could be negative).
As you understand, mapping a large set of keys into a small set of integer values means that you may end up with some collisions (called hash collisions) – identical results of the initial function for different keys. Collisions are resolved by applying another function to the original array index. The simplest of such functions is (prevIdx + 1) % array.length. There is one requirement for such functions – if applied in a loop, they should cover the whole set of array cells, so that you can use the whole array capacity. Another example of such a function is incrementing the index by one prime number when the array length is another prime number.
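The two index functions described above can be sketched directly. The following is a minimal illustration (the class and method names are mine, not from any real map implementation):

```java
public class ProbeDemo {
    // Simplest initial index function: maps any int key into [0, length - 1].
    // Math.abs is needed because % may return a negative result in Java.
    static int initialIndex( final int key, final int length ) {
        return Math.abs( key % length );
    }

    // Simplest collision-resolution function: linear probing.
    // Applied in a loop, it visits every cell of the array.
    static int nextIndex( final int prevIdx, final int length ) {
        return ( prevIdx + 1 ) % length;
    }

    public static void main( final String[] args ) {
        final int length = 9;
        System.out.println( initialIndex( 10, length ) );  // 10 % 9 = 1
        System.out.println( initialIndex( -10, length ) ); // abs(-10 % 9) = abs(-1) = 1
        System.out.println( nextIndex( 8, length ) );      // wraps around to 0
    }
}
```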
Free and removed cells
In theory, that’s enough to implement your own hash map. In practice, you need to distinguish free and removed cells from occupied cells (you can avoid using removed cells if you do extra work in the remove method – see how it is implemented in the latest FastUtil). Removed cells are also known as “tombstones”.
Your keys array is initially filled with free “cells”. You set a cell into “removed” state if you need to remove an existing key.
Let’s take a look at an example:
This int-keyed map uses the initial and next-index functions defined above:
```java
initial = Math.abs( key % array.length );
nextIdx = ( prevIdx + 1 ) % array.length;
```
This map originally contained keys 1, 2, 3 and 4, but key=3 was subsequently removed from a map, so it was replaced with a removed (“R”) placeholder.
Let’s see what we should do to find the following keys:

| Key | Description |
| --- | --- |
| 2 | The start function points at the cell with index=2 at once. We have key=2 at the cell with index=2, so no further lookup is required. |
| 3 | The start function points at the cell with index=3. This cell is “removed”, so we have to apply the “nextIdx” function in a loop until we either find the key or a free cell. We check the cell with index=4 next – bad luck, the key is not equal. Then we check the cell with index=5: it is a free cell, so we stop the lookup – the key is not found. |
Next, let’s see what will happen if we want to add key=10: initial = key % array.length = 10 % 9 = 1. The cell at index=1 is already occupied by another key, so we cannot use it; neither can we use the cell at index=2. The cell at index=3 is “removed”, so we can reuse it and put key=10 into it.
Removed cells cleanup
In many cases your hash map may degrade to O(n^2) complexity if you keep the removed cells in the map. The fastest maps implement removed-cell cleanup one way or another. As a result, all other map methods only need to distinguish 2 cell states: free or used. Besides that, the remove method is usually called infrequently compared to get and less frequently than put, which means that some extra complexity during key removal will pay off through faster execution of the other methods. This article will use the FastUtil cleanup logic.
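To make the cleanup idea concrete, here is a small self-contained sketch of tombstone-free removal in the spirit of FastUtil’s approach (all names, the identity “hash”, the tiny fixed capacity and the FREE = 0 marker are assumptions of this demo, not the article’s actual code): on removal, later members of the same collision chain are shifted back into the emptied slot whenever their ideal position allows it, so lookups never see a gap mid-chain.

```java
public class ShiftRemoveDemo {
    static final int FREE = 0;          // assumed free-cell marker (key 0 is unusable in this demo)
    static final int CAP = 8;           // power-of-2 capacity
    static final int MASK = CAP - 1;
    static int[] keys = new int[ CAP ]; // 0 == free cell
    static int[] vals = new int[ CAP ];

    static int slot( final int key ) { return key & MASK; } // identity "hash" for the demo

    static void put( final int key, final int value ) {
        int pos = slot( key );
        while ( keys[ pos ] != FREE && keys[ pos ] != key )
            pos = ( pos + 1 ) & MASK;   // linear probing
        keys[ pos ] = key;
        vals[ pos ] = value;
    }

    static int get( final int key ) {
        int pos = slot( key );
        while ( keys[ pos ] != FREE ) {
            if ( keys[ pos ] == key )
                return vals[ pos ];
            pos = ( pos + 1 ) & MASK;
        }
        return -1; // not found
    }

    // FastUtil-style removal: shift later chain members back instead of
    // leaving a tombstone, so other methods only see "free" and "used".
    static void remove( final int key ) {
        int pos = slot( key );
        while ( keys[ pos ] != key ) {
            if ( keys[ pos ] == FREE ) return; // key is not present
            pos = ( pos + 1 ) & MASK;
        }
        while ( true ) {
            final int last = pos;
            pos = ( pos + 1 ) & MASK;
            int curr;
            while ( true ) {
                curr = keys[ pos ];
                if ( curr == FREE ) {      // end of chain - just clear the slot
                    keys[ last ] = FREE;
                    return;
                }
                final int s = slot( curr );
                // 'curr' may move into 'last' only if its ideal slot does not
                // lie in the (cyclic) gap between last and pos.
                if ( last <= pos ? ( last >= s || s > pos ) : ( last >= s && s > pos ) )
                    break;
                pos = ( pos + 1 ) & MASK;
            }
            keys[ last ] = curr;           // relocate the chain member backwards
            vals[ last ] = vals[ pos ];
        }
    }

    public static void main( final String[] args ) {
        put( 1, 10 ); put( 9, 90 ); put( 17, 170 ); // all collide on slot 1
        remove( 9 );                                // remove from the middle of the chain
        System.out.println( get( 1 ) + " " + get( 17 ) + " " + get( 9 ) ); // 10 170 -1
    }
}
```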
Key scrambling
The initial index function I have mentioned above ( initial = Math.abs( key % array.length ); ) will put consecutive keys into consecutive array cells. This is highly undesirable if your next-cell function simply picks the next array cell, because it will create long lookup chains in a pretty common case.
In order to avoid it, we need to “scramble” the key, shuffling its bits. I will rely on FastUtil scrambling code:
```java
private static final int INT_PHI = 0x9E3779B9;

public static int phiMix( final int x ) {
    final int h = x * INT_PHI;
    return h ^ (h >> 16);
}
```
As a result, consecutive keys will not be placed in the consecutive array cells, thus keeping the average hash chain length under control. As for “random” keys case, you are likely to end up with a pretty good distribution of keys over the keys array as well.
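A quick way to see the effect is to feed consecutive keys through phiMix and mask the result. This is a small demo assuming a 16-cell array (the class name is mine):

```java
public class ScrambleDemo {
    private static final int INT_PHI = 0x9E3779B9; // FastUtil's scrambling constant

    public static int phiMix( final int x ) {
        final int h = x * INT_PHI;
        return h ^ (h >> 16);
    }

    public static void main( final String[] args ) {
        // Without scrambling, keys 1..4 would land in cells 1..4 of a
        // 16-cell array. With phiMix they are spread over the index range.
        final int mask = 16 - 1;
        for ( int key = 1; key <= 4; ++key )
            System.out.println( key + " -> " + ( phiMix( key ) & mask ) );
    }
}
```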
Now you are definitely ready to implement your own hash map. We will be implementing an int-int map in the next several sections of this article.
Version 1: Base int-int map
Let’s start with the simplest possible implementation (which will leave us enough room for optimization). This implementation will look similar to the Trove 3.0 TIntIntHashMap (though I was not copying its source code while writing this example).
It will use 3 arrays: one int[] for keys, one int[] for values and one boolean[] for cell usage flags (true == used). We will allocate arrays of the requested size (which will be size / fillFactor + 1), despite the fact that all production-quality implementations round this number up. The initial and nextIdx functions are simplicity itself:
```java
initial = Math.abs( Tools.phiMix( key ) % array.length );
nextIdx = ( prevIdx + 1 ) % array.length;
```
You can find the source code of this and all other maps at the end of this article.
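For illustration, here is a stripped-down sketch of what such a three-array Version 1 might look like. This is my own guess at the shape – no resizing or removal, phiMix inlined for self-containment, and all names are hypothetical, not the actual IntIntMap1 source:

```java
public class IntIntMap1Sketch {
    private static final int INT_PHI = 0x9E3779B9;
    private static final int NO_VALUE = 0; // assumed "missing key" result

    private final int[] m_keys;
    private final int[] m_values;
    private final boolean[] m_used;        // true == cell is occupied

    public IntIntMap1Sketch( final int size, final float fillFactor ) {
        final int capacity = (int) ( size / fillFactor ) + 1; // no power-of-2 rounding yet
        m_keys = new int[ capacity ];
        m_values = new int[ capacity ];
        m_used = new boolean[ capacity ];
    }

    private static int phiMix( final int x ) {
        final int h = x * INT_PHI;
        return h ^ (h >> 16);
    }

    public void put( final int key, final int value ) {
        // Note: no resize logic - the map must never become completely full.
        int idx = Math.abs( phiMix( key ) % m_keys.length );
        while ( m_used[ idx ] && m_keys[ idx ] != key )
            idx = ( idx + 1 ) % m_keys.length; // linear probing
        m_keys[ idx ] = key;
        m_values[ idx ] = value;
        m_used[ idx ] = true;
    }

    public int get( final int key ) {
        int idx = Math.abs( phiMix( key ) % m_keys.length );
        while ( m_used[ idx ] ) {           // a free cell terminates the chain
            if ( m_keys[ idx ] == key )
                return m_values[ idx ];
            idx = ( idx + 1 ) % m_keys.length;
        }
        return NO_VALUE;
    }
}
```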
I will compare each implementation’s results with the results from my previous article. Let’s compare it to the fastest implementation – Koloboke (note that I will give only get test results for the intermediate versions of the map to save space – you can find the full test results at the end of this article). All tests for this article are run on a random key set. All maps have the fill factor set to 0.75.
| Map size: | 10.000 | 100.000 | 1.000.000 | 10.000.000 | 100.000.000 |
| --- | --- | --- | --- | --- | --- |
| tests.maptests.primitive.KolobokeMutableMapTest | 1867 | 2471 | 3129 | 7546 | 11191 |
| tests.maptests.article_examples.IntIntMap1Test | 2768 | 3671 | 6105 | 12313 | 16073 |
That’s already great! The very first unoptimized attempt to implement a hashmap is less than 2 times slower than the Koloboke implementation.
Version 2: avoiding the expensive % operation – array capacity is now a power of 2
Surprisingly many people think that the old wisdom about extremely slow integer division/modulo operations is no longer valid – new CPUs are so smart and fast! Unfortunately, this is still incorrect – both integer division operations are pretty slow and should be avoidedded in performance-critical code.
The previous version of the hash map used the % operation for both the start and next index calculations, so it was guaranteed to execute at least once per hash map method call. We can avoid it if our array size is equal to a power of 2 (the first power of 2 higher than the expected capacity). In this case our simple index functions can be transformed into:
```java
initial = Tools.phiMix( key ) & ( array.length - 1 );
nextIdx = ( prevIdx + 1 ) & ( array.length - 1 );
```
array.length - 1 should, of course, be cached in a separate mask class field. Why can array.length - 1 be used as a mask? It is a known fact that if K = 2^N, then X % K == X & (K - 1) for any non-negative X. Using the & operation gives us the extra benefit of always non-negative results (the highest bit is always cleared by such a mask), which further speeds up the calculation.
Keep in mind that all high performance hash maps are relying on this optimization.
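Here is a tiny demo of the equivalence, and of why & is the safer choice for negative inputs (class name is mine):

```java
public class MaskDemo {
    public static void main( final String[] args ) {
        final int length = 16;        // power of 2
        final int mask = length - 1;  // 0b1111
        // For non-negative X and K = 2^N: X % K == X & (K - 1).
        for ( int x = 0; x < 100; ++x )
            if ( ( x % length ) != ( x & mask ) )
                throw new AssertionError( "mismatch at " + x );
        // For negative X the two operations differ: % keeps the sign,
        // while & always clears it.
        System.out.println( -3 % 16 ); // -3
        System.out.println( -3 & 15 ); // 13
    }
}
```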
Let’s compare this implementation’s results with the previous maps:
| Map size: | 10.000 | 100.000 | 1.000.000 | 10.000.000 | 100.000.000 |
| --- | --- | --- | --- | --- | --- |
| tests.maptests.primitive.KolobokeMutableMapTest | 1867 | 2471 | 3129 | 7546 | 11191 |
| tests.maptests.article_examples.IntIntMap1Test | 2768 | 3671 | 6105 | 12313 | 16073 |
| tests.maptests.article_examples.IntIntMap2Test | 2254 | 2767 | 4869 | 10543 | 16724 |
This optimization alone took us half the way from the slowest to the fastest implementation! Nevertheless, there is a long way ahead of us.
Version 3: getting rid of the m_used array
The previous version of the map used 3 separate arrays to store the map data. It means that for a random key the map has to access 3 different areas of memory, each access likely causing a CPU cache miss. High performance code should minimize the number of CPU cache misses it causes on the normal operation path. The simplest optimization we can make is to get rid of the m_used array and encode the cell usage flag in a smarter way.
The problem is that we are implementing an int-to-int map, so we must expect any int to be used as a key (how pathetic would a general-purpose hash map be if it stated that some key N is reserved and could not be used…). It means that we need some extra storage for usage flags, doesn’t it? Yes, it does. But the point is that we can use O(1) instead of O(n) bytes for this storage!
The idea is to choose one special key value to mark free cells. After that, there are 2 known strategies (I will use the first one):

- Store the value that corresponds to the free key in a separate field. You also need some indication of whether the free key is actually present in the map (a boolean and an int, or just a Boolean, is fine for this purpose). Each map method should check whether its key argument is equal to the free key prior to the normal logic and act accordingly.
- Pick a random free key. If you are requested to insert it into the map, select a new random free key which is not present in the map so far (and replace all old free keys with the new value). You still need to store the free key itself instead of its corresponding value, but you get no other benefits. Koloboke is the only implementation which uses this strategy.
As a side note, I want to mention that dealing with free keys in maps with Object keys is much easier:
```java
private static final Object FREE_KEY = new Object();
```
This key is not accessible to other classes, so it can never be passed into your map implementation as a key. Those smart guys who remember about reflection are reminded that hash map keys must properly implement the equals and hashCode methods, which is not the case for this private object 🙂
As I have mentioned before, our implementation will use a hardcoded free key (0, which is the cheapest constant to compare with) and store the corresponding value in separate fields:
```java
private static final int FREE_KEY = 0;

/** Keys */
private int[] m_keys;
/** Values */
private int[] m_values;

/** Do we have a 'free' key in the map? */
private boolean m_hasFreeKey;
/** Value of the 'free' key */
private int m_freeValue;
```
Let’s compare this implementation’s results with the previous maps:
| Map size: | 10.000 | 100.000 | 1.000.000 | 10.000.000 | 100.000.000 |
| --- | --- | --- | --- | --- | --- |
| tests.maptests.primitive.KolobokeMutableMapTest | 1867 | 2471 | 3129 | 7546 | 11191 |
| tests.maptests.article_examples.IntIntMap1Test | 2768 | 3671 | 6105 | 12313 | 16073 |
| tests.maptests.article_examples.IntIntMap2Test | 2254 | 2767 | 4869 | 10543 | 16724 |
| tests.maptests.article_examples.IntIntMap3Test | 2050 | 2269 | 3548 | 9074 | 13750 |
As you can see, the impact of this change grows with the map size (the bigger your map, the less help you get from the CPU cache). Nevertheless, we are still pretty far from the Koloboke implementation. This map, though, would already take third place in my previous article, after Koloboke and FastUtil.
Versions 4 and 4a – replacing arrays of keys and values with a single array
This step follows the direction of the previous one – now we want to use a single array to store the keys and values. This will allow us to access/modify values at a very low cost, because they will be located right next to the keys.
There are 2 possible implementations in the case of an int-to-int map:

- Use long[] – a key and a value will share a single long cell. The usefulness of this method is limited to certain types of keys and values.
- Use a single int[] – keys and values will be interleaved (the only meaningful interleaving strategy is storing a value right after its key). This strategy has a small disadvantage of limiting the maximal map capacity to 1 billion cells (the maximal array size in Java is equal to Integer.MAX_VALUE). I believe this is not a problem for most use cases.
The difference between those 2 scenarios is the need to use bit arithmetic / type conversions to extract a key and a value from a long cell. My tests have shown that these operations have a noticeable negative impact on map performance. Nevertheless, I have included both the long[] (IntIntMap4) and int[] (IntIntMap4a) versions in this article.
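For the long[] variant, the packing/unpacking boilerplate looks roughly like this (a sketch with my own helper names – the actual IntIntMap4 may lay out the bits differently):

```java
public class LongPackDemo {
    // Pack an int key and an int value into one long cell. Here the key
    // occupies the low 32 bits and the value the high 32 bits - the
    // choice is arbitrary, but masking the key with 0xFFFFFFFFL is
    // essential to avoid sign extension corrupting the high half.
    static long pack( final int key, final int value ) {
        return ( (long) value << 32 ) | ( key & 0xFFFFFFFFL );
    }

    static int key( final long cell )   { return (int) cell; }          // low 32 bits
    static int value( final long cell ) { return (int) ( cell >>> 32 ); } // high 32 bits

    public static void main( final String[] args ) {
        final long cell = pack( -42, 7 );
        System.out.println( key( cell ) + " " + value( cell ) ); // -42 7
    }
}
```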
Micro optimizations
Both versions will be pretty fast, but you need more to become the fastest. The important thing to understand about a hash map is that its basic operations have O(1) complexity, provided that you were not too greedy with the fill factor. It actually means that you must count the instructions on the hash hit path (when the very first cell you check is either free or contains the requested key). Optimizing the hash collision loop is also important, but (I repeat this) you must be very careful on the hash hit path, because most of your operations will end up as a hash hit.
With this in mind, you may want to inline some of the helper methods, especially the ones which could save you an instruction or two when inlined. Take a look, for example, at the get method from the previous map version:
```java
public int get( final int key ) {
    if ( key == FREE_KEY )
        return m_hasFreeKey ? m_freeValue : NO_VALUE;
    final int idx = getReadIndex( key );
    return idx != -1 ? m_values[ idx ] : NO_VALUE;
}

private int getReadIndex( final int key ) {
    int idx = getStartIndex( key );
    if ( m_keys[ idx ] == key ) // we check FREE prior to this call
        return idx;
    if ( m_keys[ idx ] == FREE_KEY ) // end of chain already
        return -1;
    final int startIdx = idx;
    while (( idx = getNextIndex( idx ) ) != startIdx ) {
        if ( m_keys[ idx ] == FREE_KEY )
            return -1;
        if ( m_keys[ idx ] == key )
            return idx;
    }
    return -1;
}
```
The first check (key == FREE_KEY) is unlikely to pass, so it will be predicted correctly by the CPU in most cases. Essentially, it can be excluded from consideration here.
Next you get the cell index to use from the helper getReadIndex method. It is perfectly object-oriented, but it causes an extra check ( return idx != -1 ? m_values[ idx ] : NO_VALUE ) to be made. This check, unlike the previous ones, is unpredictable unless your map is essentially read-only and most of your get calls are made for existing keys. Unpredictable checks should be avoided on the critical path (if possible).
It is easy to achieve in this case – just inline the body of getReadIndex into get:
```java
public int get( final int key ) {
    if ( key == FREE_KEY )
        return m_hasFreeKey ? m_freeValue : NO_VALUE;
    int idx = getStartIndex( key );
    if ( m_keys[ idx ] == key ) // we check FREE prior to this call
        return m_values[ idx ];
    if ( m_keys[ idx ] == FREE_KEY ) // end of chain already
        return NO_VALUE;
    final int startIdx = idx;
    while (( idx = getNextIndex( idx ) ) != startIdx ) {
        if ( m_keys[ idx ] == FREE_KEY )
            return NO_VALUE;
        if ( m_keys[ idx ] == key )
            return m_values[ idx ];
    }
    return NO_VALUE;
}
```
As you can see, the only transformation I did was the replacement of getReadIndex return values with get return values. You can do a few similar transformations here and there. It is also worth manually inlining the getStartIndex and getNextIndex calls – you can’t tell for sure whether they will be inlined by the JIT. As a result, you may end up with the following get method (taken from IntIntMap4a – the int[] version):
```java
public int get( final int key ) {
    int ptr = ( Tools.phiMix( key ) & m_mask ) << 1;
    if ( key == FREE_KEY )
        return m_hasFreeKey ? m_freeValue : NO_VALUE;
    int k = m_data[ ptr ];
    if ( k == FREE_KEY )
        return NO_VALUE; // end of chain already
    if ( k == key ) // we check FREE prior to this call
        return m_data[ ptr + 1 ];
    while ( true ) {
        ptr = ( ptr + 2 ) & m_mask2; // that's the next index
        k = m_data[ ptr ];
        if ( k == FREE_KEY )
            return NO_VALUE;
        if ( k == key )
            return m_data[ ptr + 1 ];
    }
}
```
As you can see, we have just the following operations on the hash hit path:
- A check for the free key – highly predictable => extremely cheap
- 4 bit operations and a multiplication to calculate the start index
- Extraction of a key – the only truly expensive operation, which can't be avoided
- Checks for the free key / our key – your scenario may cause these checks to be predicted correctly, but they are not predictable in the general case. Anyway, the arguments to compare will reside in the L1 cache at worst, so both checks are cheap.
- If the key is found – extraction of the value. Our data layout ensures that the value will be fetched into the L1 cache together with the key, so reading the value is very cheap.
The hash hit path costs us:

- 2 highly predictable comparisons with constants
- 2 unpredictable comparisons with a constant / a key which is most likely already loaded into a register
- 4 bit operations (all arguments are either constants or values located no further away than the L1 cache)
- 1 multiplication (unfortunately unavoidable if you need to shuffle the bits)
- 1 memory read which is likely to cause a CPU cache miss (loading a key to check)
- 1 memory read which will be served out of the L1 cache (coming right after the previous memory read)

Only 11 operations to get a value by key (and only one of them is likely to be expensive) – no wonder the test is able to fetch 10 million keys per second even in the worst case (when the map is too large for the CPU cache to help us and we are also likely to check a few keys due to a large fill factor).
Test results
Now we will present the test results for all 5 map implementations and the Koloboke map, the fastest int-to-int map from my previous article.
"get" test results
| Map size: | 10.000 | 100.000 | 1.000.000 | 10.000.000 | 100.000.000 |
| --- | --- | --- | --- | --- | --- |
| tests.maptests.primitive.KolobokeMutableMapTest | 1867 | 2471 | 3129 | 7546 | 11191 |
| tests.maptests.article_examples.IntIntMap1Test | 2768 | 3671 | 6105 | 12313 | 16073 |
| tests.maptests.article_examples.IntIntMap2Test | 2254 | 2767 | 4869 | 10543 | 16724 |
| tests.maptests.article_examples.IntIntMap3Test | 2050 | 2269 | 3548 | 9074 | 13750 |
| tests.maptests.article_examples.IntIntMap4Test | 1902 | 2296 | 3229 | 7749 | 10661 |
| tests.maptests.article_examples.IntIntMap4aTest | 1738 | 2209 | 2927 | 6582 | 9969 |
"put" test results
| Map size: | 10.000 | 100.000 | 1.000.000 | 10.000.000 | 100.000.000 |
| --- | --- | --- | --- | --- | --- |
| tests.maptests.primitive.KolobokeMutableMapTest | 3262 | 4549 | 5600 | 12182 | 16140 |
| tests.maptests.article_examples.IntIntMap1Test | 9048 | 10555 | 11322 | 23004 | 28974 |
| tests.maptests.article_examples.IntIntMap2Test | 4305 | 4816 | 9435 | 19805 | 26770 |
| tests.maptests.article_examples.IntIntMap3Test | 3865 | 4063 | 7455 | 15274 | 21014 |
| tests.maptests.article_examples.IntIntMap4Test | 3562 | 4676 | 5866 | 12999 | 16304 |
| tests.maptests.article_examples.IntIntMap4aTest | 3411 | 4401 | 5374 | 11289 | 15095 |
"remove" test results
| Map size: | 10.000 | 100.000 | 1.000.000 | 10.000.000 | 100.000.000 |
| --- | --- | --- | --- | --- | --- |
| tests.maptests.article_examples.IntIntMap1Test | 8301 | 9142 | 9313 | 16507 | 24134 |
| tests.maptests.article_examples.IntIntMap2Test | 3915 | 3890 | 6227 | 13450 | 20456 |
| tests.maptests.article_examples.IntIntMap3Test | 3339 | 3270 | 4901 | 10425 | 16120 |
| tests.maptests.article_examples.IntIntMap4Test | 3098 | 3052 | 3988 | 8670 | 12019 |
| tests.maptests.article_examples.IntIntMap4aTest | 2870 | 2876 | 3823 | 8127 | 11503 |
| tests.maptests.primitive.KolobokeMutableMapTest | 2836 | 3042 | 3923 | 8228 | 12007 |
As you can see, the results are pretty clear – IntIntMap4a is faster than the Koloboke int-int map, which was the fastest map in my previous article.
Summary
If you want to optimize your hash map for speed, you have to do as many of the following as you can:

- Use underlying array(s) with capacity equal to a power of 2 – it will allow you to use cheap & instead of expensive % for array index calculations.
- Do not store cell state in a separate array – use dedicated fields for the free/removed keys and values.
- Interleave keys and values in one array – it will allow you to load the value into the CPU cache for free, together with its key.
- Implement a strategy to get rid of 'removed' cells – you can sacrifice some remove performance in favor of the more frequent get/put calls.
- Scramble the keys while calculating the initial cell index – this is required to deal with the case of consecutive keys.
Source code
Classes written for this article are added to my previous hash map testing project on GitHub: https://github.com/mikvor/hashmapTest.
Please note that you should run this project via the tests.MapTestRunner class:
mvn clean install
java -cp target/benchmarks.jar tests.MapTestRunner