Tag Archives: hppc

Large HashMap overview: JDK, FastUtil, Goldman Sachs, HPPC, Koloboke, Trove – January 2015 version

by Mikhail Vorontsov

This is a major update of the previous version of this article. The reasons for this update are:

  • The major performance updates in fastutil 6.6.0
  • Updates in the “get” test from the original article, addition of “put/update” and “put/remove” tests
  • Adding identity maps to all tests
  • Now using different objects for any operations after map population (in case of Object keys – except identity maps). Old approach of reusing the same keys gave the unfair advantage to Koloboke.

I would like to thank Sebastiano Vigna for providing the initial versions of “get” and “put” tests.

Introduction

This article will give you an overview of hash map implementations in 5 well known libraries and JDK HashMap as a baseline. We will test separately:

  • Primitive to primitive maps
  • Primitive to object maps
  • Object to primitive maps
  • Object to Object maps
  • Object (identity) to Object maps

This article will provide you the results of 3 tests:

  • “Get” test: Populate a map with a pregenerated set of keys (in the JMH setup), make ~50% successful and ~50% unsuccessful “get” calls. For non-identity maps with object keys we use a distinct set of keys (the different object with the same value is used for successful “get” calls).
  • “Put/update” test: Add a pregenerated set of keys to the map. In the second loop add the equal set of keys (different objects with the same values) to this map again (make the updates). Identical keys are used for identity maps and for maps with primitive keys.
  • “Put/remove” test: In a loop: add 2 entries to a map, remove 1 of existing entries (“add” pointer is increased by 2 on each iteration, “remove” pointer is increased by 1).

This article will just give you the test results. There will be a followup article on the most interesting implementation details of the various hash maps.

Test participants

JDK 8

JDK HashMap is the oldest hash map implementation in this test. It got a couple of major updates recently – a shared underlying storage for the empty maps in Java 7u40 and a possibility to convert underlying hash bucket linked lists into tree maps (for better worse case performance) in Java 8.

FastUtil 6.6.0

FastUtil provides a developer a set of all 4 options listed above (all combinations of primitives and objects). Besides that, there are several other types of maps available for each parameter type combination: array map, AVL tree map and RB tree map. Nevertheless, we are only interested in hash maps in this article.

Goldman Sachs Collections 5.1.0

Goldman Sachs has open sourced its collections library about 3 years ago. In my opinion, this library provides the widest range of collections out of box (if you need them). You should definitely pay attention to it if you need more than a hash map, tree map and a list for your work 🙂 For the purposes of this article, GS collections provide a normal, synchronized and unmodifiable versions of each hash map. The last 2 are just facades for the normal map, so they don’t provide any performance advantages.

HPPC 0.6.1

HPPC provides array lists, array dequeues, hash sets and hash maps for all primitive types. HPPC provides normal hash maps for primitive keys and both normal and identity hash maps for object keys.

Koloboke 0.6.5

Koloboke is the youngest of all libraries in this article. It is developed as a part of an OpenHFT project by Roman Leventov. This library currently provides hash maps and hash sets for all primitive/object combinations. This library was recently renamed from HFTC, so some artifacts in my tests will still use the old library name.

Trove 3.0.3

Trove is available for a long time and quite stable. Unfortunately, not much development is happening in this project at the moment. Trove provides you the list, stack, queue, hash set and map implementations for all primitive/object combinations. I have already written about Trove.

Data storage implementations and tests

This article will look at 5 different sorts of maps:

  1. intint
  2. intInteger
  3. Integerint
  4. IntegerInteger
  5. Integer (identity map)Integer

We will use JMH 1.0 for testing. Here is the test description: for each map size in (10K, 100K, 1M, 10M, 100M) (outer loop) generate a set of random keys (they will be used for each test at a given map size) and then run a test for each map implementations (inner loop). Each test will be run 100M / map_size times. “get”, “put” and “remove” tests are run separately, so you can update the test source code and run only a few of them.

Note that each test suite takes around 7-8 hours on my box. Spreadsheet-friendly results will be printed to stdout once all test suites will finish.

int-int

Each section will start with a table showing how data is stored inside each map. Only arrays will be shown here (some maps have special fields for a few corner cases).

tests.maptests.primitive.FastUtilMapTest int[] key, int[] value
tests.maptests.primitive.GsMutableMapTest int[] keys, int[] values
tests.maptests.primitive.HftcMutableMapTest long[] (key-low bits, value-high bits)
tests.maptests.primitive.HppcMapTest int[] keys, int[] values, boolean[] allocated
tests.maptests.primitive.TroveMapTest int[] _set, int[] _values, byte[] _states

As you can see, Koloboke is using a single array, FastUtil and GS use 2 arrays, and HPPC and Trove use 3 arrays to store the same data. Let’s see what would be the actual performance.

“Get” test results

All “get” tests make around 50% of unsuccessful get calls in order to test both success and failure paths in each map.

Each test results section will contain the results graph. X axis will show a map size, Y axis – time to run a test in milliseconds. Note, that each test in a graph has a fixed number of map method calls: 100M get call for “get” test; 200M put calls for “put” test; 100M put and 50M remove calls for “remove” tests.

There would be the links to OpenOffice spreadsheets with all test results at the end of this article.

int-int 'get' test results

int-int ‘get’ test results

GS and FastUtil test results lines are nearly parallel, but FastUtil is faster due to a lower constant factor. Koloboke becomes fastest only on large enough maps. Trove is slower than other implementations at each map size.

“Put” test results

“Put” tests insert all keys into a map and then use another equal set of keys to insert entries into a map again (these methods calls would update the existing entries). We make 100M put calls with “insert” functionality and 100M put calls with “update” functionality in each test.

int-int 'put' test results

int-int ‘put’ test results

This test shows the implementation difference more clear: Koloboke is fastest from the start (though FastUtil is as fast on small maps); GS and FastUtil are parallel again (but GS is always slower). HPPC and Trove are the slowest.

“Remove” test results

In “remove” test we interleave 2 put operations with 1 remove operation, so that a map size grows by 1 after each group of put/remove calls. In total we make 100M put and 50M remove calls.

int-int 'remove' test results

int-int ‘remove’ test results

Results are similar to “put” test (of course, both tests make a majority of put calls!): Koloboke is quickly becoming the fastest implementation; FastUtil is a bit faster than GS on all map sizes; HPPC and Trove are the slowest, but HPPC performs reasonably good on map sizes up to 1M entries.

int-int summary

An underlying storage implementation is the most important factor defining the hash map performance: the fewer memory accesses an implementation makes (especially for large maps which do not into CPU cache) to access an entry – the faster it would be. As you can see, the single array Koloboke is faster than other implementations in most of tests on large map sizes. For smaller map sizes, CPU cache starts hiding the costs of accessing several arrays – in this case other implementations may be faster due to less CPU commands required for a method call: FastUtil is the second best implementation for primitive collection tests due to its highly optimized code.

Continue reading