Large HashMap overview: JDK, FastUtil, Goldman Sachs, HPPC, Koloboke, Trove – January 2015 version

by Mikhail Vorontsov

This is a major update of the previous version of this article. The reasons for this update are:

  • The major performance updates in fastutil 6.6.0
  • Updates in the “get” test from the original article, addition of “put/update” and “put/remove” tests
  • Adding identity maps to all tests
  • Now using different objects for any operations after map population (in case of Object keys – except identity maps). Old approach of reusing the same keys gave the unfair advantage to Koloboke.

I would like to thank Sebastiano Vigna for providing the initial versions of “get” and “put” tests.

Introduction

This article will give you an overview of hash map implementations in 5 well known libraries and JDK HashMap as a baseline. We will test separately:

  • Primitive to primitive maps
  • Primitive to object maps
  • Object to primitive maps
  • Object to Object maps
  • Object (identity) to Object maps

This article will provide you the results of 3 tests:

  • “Get” test: Populate a map with a pregenerated set of keys (in the JMH setup), make ~50% successful and ~50% unsuccessful “get” calls. For non-identity maps with object keys we use a distinct set of keys (the different object with the same value is used for successful “get” calls).
  • “Put/update” test: Add a pregenerated set of keys to the map. In the second loop add the equal set of keys (different objects with the same values) to this map again (make the updates). Identical keys are used for identity maps and for maps with primitive keys.
  • “Put/remove” test: In a loop: add 2 entries to a map, remove 1 of existing entries (“add” pointer is increased by 2 on each iteration, “remove” pointer is increased by 1).

This article will just give you the test results. There will be a followup article on the most interesting implementation details of the various hash maps.

Test participants

JDK 8

JDK HashMap is the oldest hash map implementation in this test. It got a couple of major updates recently – a shared underlying storage for the empty maps in Java 7u40 and a possibility to convert underlying hash bucket linked lists into tree maps (for better worse case performance) in Java 8.

FastUtil 6.6.0

FastUtil provides a developer a set of all 4 options listed above (all combinations of primitives and objects). Besides that, there are several other types of maps available for each parameter type combination: array map, AVL tree map and RB tree map. Nevertheless, we are only interested in hash maps in this article.

Goldman Sachs Collections 5.1.0

Goldman Sachs has open sourced its collections library about 3 years ago. In my opinion, this library provides the widest range of collections out of box (if you need them). You should definitely pay attention to it if you need more than a hash map, tree map and a list for your work 🙂 For the purposes of this article, GS collections provide a normal, synchronized and unmodifiable versions of each hash map. The last 2 are just facades for the normal map, so they don’t provide any performance advantages.

HPPC 0.6.1

HPPC provides array lists, array dequeues, hash sets and hash maps for all primitive types. HPPC provides normal hash maps for primitive keys and both normal and identity hash maps for object keys.

Koloboke 0.6.5

Koloboke is the youngest of all libraries in this article. It is developed as a part of an OpenHFT project by Roman Leventov. This library currently provides hash maps and hash sets for all primitive/object combinations. This library was recently renamed from HFTC, so some artifacts in my tests will still use the old library name.

Trove 3.0.3

Trove is available for a long time and quite stable. Unfortunately, not much development is happening in this project at the moment. Trove provides you the list, stack, queue, hash set and map implementations for all primitive/object combinations. I have already written about Trove.

Data storage implementations and tests

This article will look at 5 different sorts of maps:

  1. intint
  2. intInteger
  3. Integerint
  4. IntegerInteger
  5. Integer (identity map)Integer

We will use JMH 1.0 for testing. Here is the test description: for each map size in (10K, 100K, 1M, 10M, 100M) (outer loop) generate a set of random keys (they will be used for each test at a given map size) and then run a test for each map implementations (inner loop). Each test will be run 100M / map_size times. “get”, “put” and “remove” tests are run separately, so you can update the test source code and run only a few of them.

Note that each test suite takes around 7-8 hours on my box. Spreadsheet-friendly results will be printed to stdout once all test suites will finish.

int-int

Each section will start with a table showing how data is stored inside each map. Only arrays will be shown here (some maps have special fields for a few corner cases).

tests.maptests.primitive.FastUtilMapTest int[] key, int[] value
tests.maptests.primitive.GsMutableMapTest int[] keys, int[] values
tests.maptests.primitive.HftcMutableMapTest long[] (key-low bits, value-high bits)
tests.maptests.primitive.HppcMapTest int[] keys, int[] values, boolean[] allocated
tests.maptests.primitive.TroveMapTest int[] _set, int[] _values, byte[] _states

As you can see, Koloboke is using a single array, FastUtil and GS use 2 arrays, and HPPC and Trove use 3 arrays to store the same data. Let’s see what would be the actual performance.

“Get” test results

All “get” tests make around 50% of unsuccessful get calls in order to test both success and failure paths in each map.

Each test results section will contain the results graph. X axis will show a map size, Y axis – time to run a test in milliseconds. Note, that each test in a graph has a fixed number of map method calls: 100M get call for “get” test; 200M put calls for “put” test; 100M put and 50M remove calls for “remove” tests.

There would be the links to OpenOffice spreadsheets with all test results at the end of this article.

int-int 'get' test results

int-int ‘get’ test results

GS and FastUtil test results lines are nearly parallel, but FastUtil is faster due to a lower constant factor. Koloboke becomes fastest only on large enough maps. Trove is slower than other implementations at each map size.

“Put” test results

“Put” tests insert all keys into a map and then use another equal set of keys to insert entries into a map again (these methods calls would update the existing entries). We make 100M put calls with “insert” functionality and 100M put calls with “update” functionality in each test.

int-int 'put' test results

int-int ‘put’ test results

This test shows the implementation difference more clear: Koloboke is fastest from the start (though FastUtil is as fast on small maps); GS and FastUtil are parallel again (but GS is always slower). HPPC and Trove are the slowest.

“Remove” test results

In “remove” test we interleave 2 put operations with 1 remove operation, so that a map size grows by 1 after each group of put/remove calls. In total we make 100M put and 50M remove calls.

int-int 'remove' test results

int-int ‘remove’ test results

Results are similar to “put” test (of course, both tests make a majority of put calls!): Koloboke is quickly becoming the fastest implementation; FastUtil is a bit faster than GS on all map sizes; HPPC and Trove are the slowest, but HPPC performs reasonably good on map sizes up to 1M entries.

int-int summary

An underlying storage implementation is the most important factor defining the hash map performance: the fewer memory accesses an implementation makes (especially for large maps which do not into CPU cache) to access an entry – the faster it would be. As you can see, the single array Koloboke is faster than other implementations in most of tests on large map sizes. For smaller map sizes, CPU cache starts hiding the costs of accessing several arrays – in this case other implementations may be faster due to less CPU commands required for a method call: FastUtil is the second best implementation for primitive collection tests due to its highly optimized code.

int-Object

tests.maptests.prim_object.FastUtilIntObjectMapTest int[] key, Object[] value
tests.maptests.prim_object.GsIntObjectMapTest int[] keys, Object[] values
tests.maptests.prim_object.HftcIntObjectMapTest int[] keys, Object[] values
tests.maptests.prim_object.HppcIntObjectMapTest int[] keys, Object[] values, boolean[] allocated
tests.maptests.prim_object.TroveIntObjectMapTest int[] _set, Object[] _values, byte[] _states

There are 2 groups here: FastUtil, GS and Koloboke are using 2 arrays; HPPC and Trove are using 3 arrays.

“Get” test results

int-Object 'get' test results

int-Object ‘get’ test results

As you can, see FastUtil and Koloboke are very close to each other, though FastUtil is consistently faster. GS and HPPC form the next group, where HPPC is slightly faster than GS (which is a surprise despite the extra underlying array). Trove is noticeably slower.

“Put” test results

int-Object 'put' test results

int-Object ‘put’ test results

FastUtil, Koloboke and GS are leaders in the “put” test (and FastUtil is a clear winner here). HPPC is getting slower than Trove on the large map sizes.

“Remove” test results

int-Object 'remove' test results

int-Object ‘remove’ test results

This picture is very similar to the previous one: nearly identical results of FastUtil and Koloboke; slightly slower GS; HPPC is pretty good on the smaller map sizes, but is getting slower on the large maps; Trove is slower than HPPC, but parallel to it.

int-Object summary

Extra byte/boolean array used by HPPC/Trove makes them predictably slower than 3 other implementations in this test. Once the underlying storage becomes identical, second order optimizations starts to make the difference. As a result, FastUtil is getting faster than other 2-array maps, and HPPC is getting close to 2-array maps on the smaller maps sizes, where the CPU cache can fit the whole/most of the map, so the extra array does not make a serious difference.

Object-int

tests.maptests.object_prim.FastUtilObjectIntMapTest Object[] key, int[] value
tests.maptests.object_prim.GsObjectIntMapTest Object[] keys, int[] values
tests.maptests.object_prim.HftcObjectIntMapTest Object[] keys, int[] values
tests.maptests.object_prim.HppcObjectIntMapTest Object[] keys, int[] values, boolean[] allocated
tests.maptests.object_prim.TroveObjectIntMapTest Object[] _set, int[] _values

Only HPPC is still using 3 arrays for Object-int mapping. Hopefully, this would be fixed in the next HPPC release.

“Get” test results

Object-int 'get' results

Object-int ‘get’ results

FastUtil is a leader in this test. Koloboke and GS are very close, though GS is running a little slower if a map no longer fits into CPU cache. HPPC is surprisingly faster than Trove…

“Put” test results

Object-int 'put' results

Object-int ‘put’ results

FastUtil, Koloboke and GS are very close to each other on the map sizes up to 1M, but you can see the difference after this point: FastUtil is the fastest, Koloboke is the second and GS is the third. Trove is a little slower than those 3 implementations, but much faster than HPPC, which behaves really bad on the very large map sizes.

“Remove” test results

Object-int 'remove' results

Object-int ‘remove’ results

This test is very similar to the previous one with the only difference: the gap between Trove and 3 fastest implementations is much bigger.

Object-int summary

This test is again highlighting the importance of having the minimal possible number of underlying arrays in a map implementation: once you have an extra one, your implementation becomes non-competitive. The next important lesson you can learn is to use the underlying arrays with a size equal to a power of 2. This allows you to use bit operations while calculating a lookup position in a map instead of an expensive mod (used by Trove).

Object-Object

tests.maptests.object.FastUtilObjMapTest Object[] key, Object[] value
tests.maptests.object.GsObjMapTest Object[] table – interleaved keys and values
tests.maptests.object.HftcMutableObjTest Object[] table – interleaved keys and values
tests.maptests.object.HppcObjMapTest Object[] keys, Object[] values, boolean[] allocated
tests.maptests.object.JdkMapTest Node<K,V>[] table – each Node could be a part of a linked list or a TreeMap (Java 8)
tests.maptests.object.TroveObjMapTest Object[] _set, Object[] _values

This group of tests will be bigger than the previous ones. We will see if Koloboke can make use of the fact (you can specify it in the factory) that there would be no null keys. We will also try to work around the JDK HashMap “feature” that there may be one rehashing before you will add the requested number of entries in the map (JDK implementation may not allocate arrays large enough to store the requested map size).

“Get” test results

Object-Object 'get' results

Object-Object ‘get’ results

This test results are pretty surprising:

  • Both JDK versions are the fastest (rehashing does not make a difference in this test because it happens at setup).
  • FastUtil and GS are close to JDK, but slightly slower. Nevertheless, they require less memory overhead, so they may still be considered as an option.
  • Koloboke is close to the above mentioned implementations on some map sizes, but slower on others. This is most likely due to the variable fill factor (Koloboke does not perform well if you try to set a fixed fill factor).
  • Surprisingly, Trove is slower than HPPC in this test despite an extra “allocated” array in the HPPC implementation.

“Put” test results

Object-Object 'put' results

Object-Object ‘put’ results

  • Koloboke implementation is the fastest in this test due to best possible storage structure.
  • FastUtil keeps up with Koloboke on the smaller map sizes (while the map fits in CPU cache), but becomes a little slower on large maps due to a less efficient memory access pattern.
  • GS, on the other hand, is slower than Koloboke and FastUtil on the smaller maps (due to less efficient code), but is getting closer to Koloboke once a map no longer fits in CPU cache.
  • As for JDK maps, you can see that a map with correct preallocated capacity is always faster than a default map (by default I mean a map where you specify just the right capacity in the constructor, say 10.000, instead of an inflated capacity of 10.000/fill_factor(0.5)=20.000). The difference is bigger on some sizes and smaller on the others, due to either one or 2 factors in effect: 1) you will always have a smaller chance of hash collisions in the bigger capacity map; 2) you may avoid rehashing by specifying a bigger capacity in the JDK map.
  • Trove is faster than HPPC in this test due to a 2-array underlying implementation.

“Remove” test results

Object-Object 'remove' results

Object-Object ‘remove’ results

  • Koloboke is the fastest in this test and GS is very close to GS. FastUtil and a proper capacity JDK map are very close, but slightly slower.
  • Default capacity JDK map is noticeably slower than the first 4 implementations.
  • It is followed by Trove (not far behind) and HPPC (too far behind).

Object-Object summary

The first lesson you should remember from this group of maps is that a JDK HashMap may not allocate enough storage for the requested number of elements. As a result, you may be penalised by rehashing. Nevertheless, JDK HashMap is extremely good if you mostly use it as a read-only storage.

The second lesson is that an underlying storage is still the most important factor affecting the hash map performance – an implementation should try to minimize a number of memory accesses for each operation and do not expect that a CPU cache would help it.

Identity maps

The most important property of identity maps is that they expect that the same object will be used for all accesses to the map. It means that an identity map will use == instead of yourObject.equals() and System.identityHashCode() instead of yourObject.hashCode().

Keep in mind that some non-identity maps also make == check prior to requested_key.equals(stored_map_key) check (for example, JDK and Koloboke implement such check) in hope that some previously inserted keys may be later used for queries. Pay attention if this is your application case.

tests.maptests.identity_object.FastUtilRef2ObjectMapTest Object[] key, Object[] value
tests.maptests.identity_object.GsIdentityMapTest Object[] table – interleaved keys and values
tests.maptests.identity_object.HftcIdentityMapTest Object[] table – interleaved keys and values
tests.maptests.identity_object.HppcIdentityMapTest Object[] keys, Object[] values, boolean[] allocated
tests.maptests.identity_object.JDKIdentityMapTest Object[] table – interleaved keys and values
tests.maptests.identity_object.TroveIdentityMapTest Object[] _set, Object[] _values

There are 3 groups here: JDK, Koloboke and GS use a single interleaved array, FastUtil and Trove use 2 arrays, finally HPPC uses 3 arrays.

“Get” test results

Object (identity)-Object 'get' results

Object (identity)-Object ‘get’ results

“Put” test results

Object (identity)-Object 'put' results

Object (identity)-Object ‘put’ results

“Remove” test results

Object (identity)-Object 'remove' results

Object (identity)-Object ‘remove’ results

All these tests are mode difficult to comment. We can see that Trove is simply slow and HPPC is penalised for the third underlying array. FastUtil, GS and JDK are consistently good. Koloboke is also good, but is surprisingly slower than most of implementations on the small maps sizes in “get” tests.

Summary

  • FastUtil 6.6.0 turned out to be consistently fast. It may become even faster if it would introduce any other storage structures except 2 arrays for keys and values.
  • Koloboke is getting second in many tests, but it still outperforms FastUtil in int-int tests.
  • GS implementation is good enough, but is slower than FastUtil and Koloboke.
  • JDK maps are pretty good for Object-Object maps provided that you can tolerate the extra memory consumption and you will call HashMap constructor with required capacity = actual_capacity / fill_factor + 1 to avoid rehashing.
  • Trove suffers from using mod operation for array index calculations and HPPC is too slow due to an extra underlying array (for cell states).

Source code

The article source code is now hosted at GitHub: https://github.com/mikvor/hashmapTest.

Please note you should run this project via tests.MapTestRunner class:

mvn clean install
java -cp target/benchmarks.jar tests.MapTestRunner

The full test set may take around 24 hours to complete. You need a computer with proper CPU cooling to run this test set, so it can sustain an hours long CPU load without throttling (small laptops are seldom designed for such load). You need 20G+ heap to run the 100M tests, so it makes sense to shrink MapTestRunner.TOTAL_SIZE to 10M if you want to use a commodity computer for testing.

Test results

Here are the article test results in form of OpenOffice spreadsheet files.


2 thoughts on “Large HashMap overview: JDK, FastUtil, Goldman Sachs, HPPC, Koloboke, Trove – January 2015 version

  1. Pingback: Large HashMap overview: JDK, FastUtil, Goldman Sachs, HPPC, Koloboke, Trove  - Java Performance Tuning Guide

  2. Pingback: Implementing a world fastest Java int-to-int hash map*  - Java Performance Tuning Guide

Leave a Reply

Your email address will not be published. Required fields are marked *