Large HashMap overview: JDK, FastUtil, Goldman Sachs, HPPC, Koloboke, Trove

by Mikhail Vorontsov


This article is outdated! A newer version covering the latest versions of collections libraries is available here.












04 Jan 2015 update: a couple of clarifications; fixed a bug in the FastUtil Object-int test – it is now much faster (thanks to Sebastiano Vigna for his suggestions).

Introduction

This article will give you an overview of hash map implementations in 5 well known libraries and JDK HashMap as a baseline. We will test separately:

  • Primitive to primitive maps
  • Primitive to object maps
  • Object to primitive maps
  • Object to Object maps (JDK participates only in this section)

This article covers a single test – map read access for a random set of keys (the set of keys is shared by all collections of a given capacity).

We will also pay attention to the way the data is stored inside these collections and to some pretty interesting implementation details.

Participants

JDK 8

JDK HashMap is the oldest hash map implementation in this test. It got a couple of major updates recently: a shared underlying storage for empty maps in Java 7u40, and the ability to convert the linked lists in hash buckets into tree maps (for better worst-case performance) in Java 8.

FastUtil 6.5.15

FastUtil provides all 4 options listed above (all combinations of primitives and objects). Besides that, several other types of maps are available for each parameter type combination: array map, AVL tree map and RB tree map. Nevertheless, we are only interested in hash maps in this article.

Goldman Sachs Collections 5.1.0

Goldman Sachs open sourced its collections library about 3 years ago. In my opinion, this library provides the widest range of collections out of the box (if you need them). You should definitely pay attention to it if you need more than a hash map, a tree map and a list for your work :) For the purposes of this article, GS collections provide normal, synchronized and unmodifiable versions of each hash map. The last 2 are just facades for the normal map, so they don’t provide any performance advantages.

HPPC 0.6.1

HPPC provides array lists, array deques, hash sets and hash maps for all primitive types. It provides normal hash maps for primitive keys and both normal and identity hash maps for object keys.

Koloboke 0.6

Koloboke is the youngest of all libraries in this article. It is developed as a part of an OpenHFT project by Roman Leventov. This library currently provides hash maps and hash sets for all primitive/object combinations. This library was recently renamed from HFTC, so some artifacts in my tests will still use the old library name.

Trove 3.0.3

Trove has been available for a long time and is quite stable. Unfortunately, not much development is happening in this project at the moment. Trove provides list, stack, queue, hash set and map implementations for all primitive/object combinations. I have already written about Trove.

Data storage implementations and tests

This article will look at 4 different sorts of maps:

  1. int-int
  2. int-Integer
  3. Integer-int
  4. Integer-Integer

Let’s see how the data is stored in each kind of those maps. We will refer to the test names instead of the actual implementation names, because a lot of those implementations are called very similarly and it’s not easy to distinguish them by name. After looking at the implementation details, we will check how they affect the actual test results.

We will use JMH 1.0 for testing. Here is the test description: for each map size in (10K, 100K, 1M, 10M, 100M) (outer loop), generate a set of random keys (they will be used for every test at a given map size) and then run a test for each map implementation (inner loop). Each test will be run 100M / map_size times (so that map.get is called 100M times for each test case).

  1. In setup: Take a set of int keys and required fill factor
  2. Initialize a map with a given fill factor and capacity = number of keys
  3. Populate a map with keys and values = keys
  4. Store a reference to the keys array or convert it into Integer[] for tests with object keys (nevertheless, use the same keys)
  5. All tests are nearly identical – get stored values for an array of keys and use these values, so that JVM will not optimize out your code:

    public int runRandomTest() {
        int res = 0;
        for ( int i = 0; i < m_keys.length; ++i )
            res = res ^ m_map.get( m_keys[ i ] );
        return res;
    }
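
The setup steps and the read loop described above can be put together in a small stand-alone sketch. It is only an illustration built on JDK HashMap: the class name RandomReadSketch and the helper randomKeys are hypothetical, it is not the code from the article's test suite, and it does no timing (the real tests run under JMH).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical stand-alone version of the setup steps and the read loop.
final class RandomReadSketch {
    private final Integer[] keys;              // same keys, boxed once for object-key maps
    private final Map<Integer, Integer> map;

    RandomReadSketch(int[] intKeys, float fillFactor) {
        // capacity = number of keys, with the requested fill factor
        map = new HashMap<>(intKeys.length, fillFactor);
        keys = new Integer[intKeys.length];
        for (int i = 0; i < intKeys.length; ++i) {
            keys[i] = intKeys[i];
            map.put(keys[i], keys[i]);         // values = keys
        }
    }

    // XOR all values so that the JVM cannot optimize the lookups away.
    int runRandomTest() {
        int res = 0;
        for (Integer key : keys)
            res ^= map.get(key);
        return res;
    }

    // Generates the shared key set for one map size.
    static int[] randomKeys(int size, long seed) {
        Random random = new Random(seed);
        int[] result = new int[size];
        for (int i = 0; i < size; ++i)
            result[i] = random.nextInt();
        return result;
    }
}
```

For a primitive map the only difference would be keeping the int[] keys as they are instead of boxing them.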

int-int

Test                                         Storage
tests.maptests.primitive.FastUtilMapTest     int[] keys, int[] values, boolean[] used
tests.maptests.primitive.GsMutableMapTest    int[] keys, int[] values
tests.maptests.primitive.HftcMutableMapTest  long[] (key-low bits, value-high bits)
tests.maptests.primitive.HppcMapTest         int[] keys, int[] values, boolean[] allocated
tests.maptests.primitive.TroveMapTest        int[] _set, int[] _values, byte[] _states

As you can see, FastUtil, HPPC and Trove use essentially the same storage layout (two data arrays plus an array of cell states), so you may expect similar performance from them.

Handling of empty and removed cells in GS collections and Koloboke

GS collections use just keys and values arrays. If you have ever looked at the hash map implementations, you should know that a map should at least distinguish empty cells from the occupied ones (some maps also use "removed cell" marker). How could you achieve such functionality without extra storage? GS IntIntHashMap uses a companion sentinel object containing values for key=0 (empty cell) and key=1 (removed key). All operations on keys=0 or 1 are done on the sentinel object. Such an object allows GS IntIntHashMap to use O(1) storage for flags instead of O(capacity). This also allows you to access only 2 cells of memory instead of 3, which makes this implementation faster.
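
To make the idea concrete, here is a minimal sketch of a two-array int-int map that reserves 0 and 1 as cell markers and keeps the values of the real keys 0 and 1 outside the arrays. It is not the GS collections code: the class name SentinelIntIntMap is hypothetical, plain fields stand in for the sentinel object, and the table has a fixed power-of-two capacity with no resizing, so it must never be filled completely.

```java
// Hypothetical sketch: 0 marks an empty cell, 1 marks a removed cell,
// and the real keys 0 and 1 live in separate fields (O(1) extra storage).
final class SentinelIntIntMap {
    private static final int EMPTY = 0;
    private static final int REMOVED = 1;

    private final int[] keys;
    private final int[] values;
    private final int mask;

    private boolean hasZero, hasOne;   // stand-in for the sentinel object
    private int zeroValue, oneValue;

    SentinelIntIntMap(int capacityPowerOfTwo) {
        keys = new int[capacityPowerOfTwo];   // all cells start as EMPTY (0)
        values = new int[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    void put(int key, int value) {
        if (key == EMPTY) { hasZero = true; zeroValue = value; return; }
        if (key == REMOVED) { hasOne = true; oneValue = value; return; }
        int i = key & mask;
        int reusable = -1;                    // first REMOVED cell seen while probing
        while (keys[i] != EMPTY) {
            if (keys[i] == key) { values[i] = value; return; }
            if (keys[i] == REMOVED && reusable < 0) reusable = i;
            i = (i + 1) & mask;               // linear probing
        }
        if (reusable >= 0) i = reusable;
        keys[i] = key;
        values[i] = value;
    }

    int get(int key, int missing) {
        if (key == EMPTY) return hasZero ? zeroValue : missing;
        if (key == REMOVED) return hasOne ? oneValue : missing;
        for (int i = key & mask; keys[i] != EMPTY; i = (i + 1) & mask)
            if (keys[i] == key) return values[i];
        return missing;
    }

    void remove(int key) {
        if (key == EMPTY) { hasZero = false; return; }
        if (key == REMOVED) { hasOne = false; return; }
        for (int i = key & mask; keys[i] != EMPTY; i = (i + 1) & mask)
            if (keys[i] == key) { keys[i] = REMOVED; return; }
    }
}
```

A lookup for an ordinary key touches only the keys and values arrays, which is the "2 cells of memory instead of 3" effect described above.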

The Koloboke int-int map (the actual class name is hidden behind the factories and may change) goes even further. First of all, in some cases it uses an array of a longer datatype as storage, so that one element can hold both a key and a value. The int-int map is an example of this approach: a key is stored in the low 32 bits of a long cell and a value is stored in the high 32 bits. Such a layout means only one cache line miss on a cold data access instead of 2 (GS collections) or 3 (all others).
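
The bit layout described above can be sketched in a few lines. The class and method names here are hypothetical and are not part of the Koloboke API; the sketch only shows how a key and a value share one long cell.

```java
// Hypothetical helper: key in the low 32 bits, value in the high 32 bits.
final class PackedEntry {
    static long pack(int key, int value) {
        // mask the key so that a negative key does not spill into the value bits
        return ((long) value << 32) | (key & 0xFFFFFFFFL);
    }

    static int key(long cell) {
        return (int) cell;            // low 32 bits
    }

    static int value(long cell) {
        return (int) (cell >>> 32);   // high 32 bits
    }
}
```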

Koloboke uses a different technique for marking unused entries. When a map is initialized, it picks a random int and uses it as a free cell marker. If you try to insert a key equal to the free cell marker, it picks another random value that is not present in the map, and so on. It means that Koloboke uses just 4 bytes of overhead for handling empty cells and does it in an extremely efficient way.
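
Here is a rough sketch of the free cell marker idea. It is not Koloboke's implementation: the class name FreeMarkerIntIntMap is hypothetical, it uses two plain int arrays for readability, and it assumes a fixed power-of-two capacity that is never filled completely (there is no resizing and no removal).

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: a random int marks free cells and is replaced
// whenever the caller wants to store that very value as a key.
final class FreeMarkerIntIntMap {
    private final int[] keys;
    private final int[] values;
    private final int mask;
    private final Random random = new Random();
    private int free;                          // current free cell marker

    FreeMarkerIntIntMap(int capacityPowerOfTwo) {
        keys = new int[capacityPowerOfTwo];
        values = new int[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
        free = random.nextInt();
        Arrays.fill(keys, free);
    }

    int freeMarker() { return free; }

    void put(int key, int value) {
        if (key == free) changeFreeMarker();   // the key collides with the marker
        int i = key & mask;
        while (keys[i] != free && keys[i] != key)
            i = (i + 1) & mask;                // linear probing
        keys[i] = key;
        values[i] = value;
    }

    int get(int key, int missing) {
        if (key == free) return missing;       // the marker is never a stored key
        for (int i = key & mask; keys[i] != free; i = (i + 1) & mask)
            if (keys[i] == key) return values[i];
        return missing;
    }

    // Pick a new marker that is not a stored key and rewrite all free cells.
    private void changeFreeMarker() {
        int candidate;
        do {
            candidate = random.nextInt();
        } while (candidate == free || get(candidate, 0) != get(candidate, 1));
        for (int i = 0; i < keys.length; ++i)
            if (keys[i] == free) keys[i] = candidate;
        free = candidate;
    }
}
```

In changeFreeMarker a candidate counts as present when get returns the same result for two different "missing" defaults. Rewriting the free cells is O(capacity), but with int keys a collision with a random marker is extremely unlikely.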

In general, such an approach does not impose any performance penalty unless your map size gets close to the number of values in a given datatype. But what happens with smaller key data types? You will get a HashOverflowException (defined in the koloboke-api library) if you attempt to add all values of a datatype into a map. You can use the following test to reproduce it:

    HashByteIntMap m = HashByteIntMaps.newMutableMap( 256 );
    for ( int i = Byte.MIN_VALUE; i < Byte.MAX_VALUE; ++i )
    {
        final byte key = (byte) i;
        m.put( key, i );
    }
    m.put( Byte.MAX_VALUE, 127 );   //exception will be thrown here
    System.out.println( m.size() );

Nevertheless, this should not be an issue in real life. If you want to map every (or nearly every) byte/char/short to some value, you had better use an array of the value type indexed by the keys.

int-int Test results

Each of the test sections starts with a result table followed by a chart. The first line of a table lists the map sizes. All test results are in milliseconds.

Test                                              10000  100000  1000000  10000000  100000000
tests.maptests.primitive.HftcMutableMapTest         955    1324     1871      4198       3805
tests.maptests.primitive.HftcImmutableMapTest       941    1335     1807      4194       3793
tests.maptests.primitive.HftcUpdateableMapTest      949    1314     1836      4183       3799
tests.maptests.primitive.GsMutableMapTest           977    1883     3322      6256       7754
tests.maptests.primitive.GsImmutableMapTest         997    1895     3279      6201       7786
tests.maptests.primitive.FastUtilMapTest           1045    1590     3776      7655      10095
tests.maptests.primitive.HppcMapTest               1021    1580     3693      7612      10086
tests.maptests.primitive.TroveMapTest              1775    2642     5137     10799      13834

[Chart: int-int test results]

As you can see, libraries got split into 4 distinctly different groups (fastest to slowest):

  1. Koloboke shows the best results: a single long[] for storage combined with the clever trick of a random free cell value pays off. All 3 versions of the Koloboke collections show practically the same result in this test (this does not mean they will be equally fast in other tests as well).
  2. The GS collections implementation is the second fastest: using 2 arrays instead of 3, as well as good code quality, pays off here.
  3. FastUtil and HPPC show practically the same performance (within about 2% of each other).
  4. Trove is the slowest implementation in this test, being about 2 times slower than Koloboke on most map sizes and even slower on huge maps (10M+).

Note that Koloboke works faster on the 100M map than on the 10M map. According to an email from Roman Leventov, this happens because a bigger fill factor was chosen for the map(size=10M) than for the map(size=100M). You will see a similar difference in the Object-Object test results.

int-Object

Test                                                 Storage
tests.maptests.prim_object.FastUtilIntObjectMapTest  int[] key, Object[] value, boolean[] used
tests.maptests.prim_object.GsIntObjectMapTest        int[] keys, Object[] values
tests.maptests.prim_object.HftcIntObjectMapTest      int[] keys, Object[] values
tests.maptests.prim_object.HppcIntObjectMapTest      int[] keys, Object[] values, boolean[] allocated
tests.maptests.prim_object.TroveIntObjectMapTest     int[] _set, Object[] _values, byte[] _states

No surprises here: FastUtil, HPPC and Trove use 3 arrays (including an array of cell states). GS collections and Koloboke use 2 arrays and handle the special cases with tricks similar to the ones described above.

int-Object test results

Test                                                   10000  100000  1000000  10000000  100000000
tests.maptests.prim_object.HftcIntObjectMapTest         1223    1358     3034      6187       7064
tests.maptests.prim_object.FastUtilIntObjectMapTest     1213    1746     4112      7902      10595
tests.maptests.prim_object.GsIntObjectMapTest           1764    2658     4310      7775       9715
tests.maptests.prim_object.HppcIntObjectMapTest         1666    1725     4083      8447      12202
tests.maptests.prim_object.TroveIntObjectMapTest        1987    2835     5812     11269      14265

[Chart: int-object test results]

There are 3 groups in this test (fastest to slowest):

  1. Koloboke is the fastest one due to using only 2 arrays and simpler code for the empty cells case.
  2. It is followed by GS collections (which did not manage to benefit from having 2 storage arrays instead of 3), FastUtil and HPPC. Their results vary slightly between map sizes, but they are relatively close to each other.
  3. Trove is the slowest again, being 1.5 to 2 times slower than Koloboke.

Object-int

Test                                                 Storage
tests.maptests.object_prim.FastUtilObjectIntMapTest  Object[] key, int[] value, boolean[] used
tests.maptests.object_prim.GsObjectIntMapTest        Object[] keys, int[] values
tests.maptests.object_prim.HftcObjectIntMapTest      Object[] keys, int[] values
tests.maptests.object_prim.HppcObjectIntMapTest      Object[] keys, int[] values, boolean[] allocated
tests.maptests.object_prim.TroveObjectIntMapTest     Object[] _set, int[] _values

FastUtil and HPPC use a third array even for Object keys. This seems to be a bad idea, because with Object keys you can always use a private sentinel object as a flag. We will see the actual performance a bit below.

GS collections, Koloboke and Trove are using 2 arrays, so we should expect them to be a little faster.

Object-int test results

Test                                                   10000  100000  1000000  10000000  100000000
tests.maptests.object_prim.HftcObjectIntMapTest         1775    1781     4320      8567       8962
tests.maptests.object_prim.GsObjectIntMapTest           1598    2876     6214      8467      11700
tests.maptests.object_prim.FastUtilObjectIntMapTest     1599    2614     6151      9273      15146
tests.maptests.object_prim.HppcObjectIntMapTest         2297    2687     6077     10788      17425
tests.maptests.object_prim.TroveObjectIntMapTest        2550    3286     5837     11804      14324

[Chart: object-int test results]

There are 2 groups in this test, though they are not as distinct as before (fastest to slowest):

  1. Koloboke is faster than the other implementations, with the exception of the 10K map, where it is slower than both GS collections and FastUtil, and the 10M map, where it is slower than GS collections (yes, this is the same problem with a too big fill factor that was mentioned above).
  2. The other collections behave similarly to each other up to map size = 1M. After that, GS collections become faster than the others, followed by FastUtil.

Object-Object

Test                                      Storage
tests.maptests.object.FastUtilObjMapTest  Object[] keys, Object[] values, boolean[] used
tests.maptests.object.GsObjMapTest        Object[] table - interleaved keys and values
tests.maptests.object.HftcMutableObjTest  Object[] tab - interleaved keys and values
tests.maptests.object.HppcObjMapTest      Object[] keys, Object[] values, boolean[] allocated
tests.maptests.object.JdkMapTest          Node<K,V>[] table - each Node could be a part of a linked list or a TreeMap (Java 8)
tests.maptests.object.TroveObjMapTest     Object[] _set, Object[] _values

In case of Object-to-Object mappings we have a more complex picture:

  • FastUtil and HPPC are using 3 arrays per map. Nothing fancy.
  • JDK HashMap is the only map which stores entries in Node objects, each combining a key and a value. It means you have at least 24 bytes of overhead per entry. The actual overhead is 32 bytes (assuming compressed object pointers), because each Node also keeps the cached hash code and a pointer to the next Node in its bucket.
  • Trove uses 2 arrays (and a special sentinel object for empty cells).
  • Finally, GS collections and Koloboke use a single array with interleaved keys and values, which makes them the most CPU cache friendly collections of these 6.
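
The interleaved layout from the last bullet can be sketched as follows. This is not the GS collections or Koloboke code: the class name InterleavedObjObjMap is hypothetical, null keys are not supported (null marks an empty slot), there is no removal or resizing, and the fixed power-of-two table must never be filled completely.

```java
// Hypothetical sketch of a single Object[] holding [k0, v0, k1, v1, ...].
final class InterleavedObjObjMap<K, V> {
    private final Object[] table;
    private final int mask;                    // mask over entry slots, not array indices

    InterleavedObjObjMap(int slotsPowerOfTwo) {
        table = new Object[slotsPowerOfTwo * 2];
        mask = slotsPowerOfTwo - 1;
    }

    void put(K key, V value) {
        int slot = key.hashCode() & mask;
        while (table[2 * slot] != null && !table[2 * slot].equals(key))
            slot = (slot + 1) & mask;          // linear probing
        table[2 * slot] = key;                 // the key and its value are neighbours,
        table[2 * slot + 1] = value;           // so they usually share a cache line
    }

    @SuppressWarnings("unchecked")
    V get(Object key) {
        for (int slot = key.hashCode() & mask; table[2 * slot] != null; slot = (slot + 1) & mask)
            if (table[2 * slot].equals(key))
                return (V) table[2 * slot + 1];
        return null;
    }
}
```

Once the key reference is found, the value reference sits in the next array element, so no second array has to be touched. The key and value objects themselves still live elsewhere on the heap.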

Now, armed with the implementation knowledge, let's test the maps performance.

Object-Object test results

Test                                        10000  100000  1000000  10000000  100000000
tests.maptests.object.HftcMutableObjTest     1146    1378     2928      6215       5945
tests.maptests.object.JdkMapTest             1151    1776     3759      5341      11523
tests.maptests.object.GsObjMapTest           1566    2242     4582      6012       8110
tests.maptests.object.FastUtilObjMapTest     1720    3002     6015      9360      13292
tests.maptests.object.HppcObjMapTest         1726    3085     5692      9125      13139
tests.maptests.object.TroveObjMapTest        2065    2979     5713     10266      12631

[Chart: object-object test results]

These test results are even less clear.

  1. Koloboke is generally faster than JDK HashMap, but the difference is not that big except for the huge (100M) map, where Koloboke clearly wins.
  2. GS collections are close to Koloboke and JDK on the large and huge maps, but noticeably behind them on the smaller maps.
  3. Finally, there are FastUtil, HPPC and Trove, with approximately the same performance for all map sizes.

One billion entries test

I decided to see what happens to these collections if I try to create a map with a requested size of one billion entries and fill factor = 0.5, which means that all these maps have to allocate an array very close to the maximal allowed array length of about 2^31.
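
The arithmetic behind "very close to the maximal allowed array length" is easy to check. The class below is a hypothetical helper, not part of any of the tested libraries, and it treats Integer.MAX_VALUE as the array length limit, although the limit of a particular JVM may be a few elements lower.

```java
// Hypothetical helper: how many cells does a map need for a given fill factor?
final class BillionEntrySizing {
    /** Smallest number of cells that keeps the map at or below the fill factor. */
    static long requiredCells(long entries, double fillFactor) {
        return (long) Math.ceil(entries / fillFactor);
    }

    public static void main(String[] args) {
        long cells = requiredCells(1_000_000_000L, 0.5);
        System.out.println("cells needed:     " + cells);
        System.out.println("max array length: " + Integer.MAX_VALUE);
        System.out.println("headroom:         " + (Integer.MAX_VALUE - cells));
    }
}
```

One billion entries at fill factor 0.5 need 2,000,000,000 cells, while a Java array cannot be longer than 2,147,483,647 elements. An implementation that rounds its capacity up to the next power of two (an assumption about the libraries, not something measured here) would need 2^31 cells and would not fit at all.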

FastUtil, HPPC and GS collections have failed with various exceptions (not OOM - I have allocated 110G RAM for this test).

Koloboke, Trove and JDK managed to pass these tests. Unfortunately, I did not manage to run some of these tests successfully in JMH, so they were run by separate code.

Here are the test results (if you want to compare them to the previous results, multiply the previous results by 10, because all previous tests called map.get 100M times in total):

tests.maptests.primitive.HftcMutableMapTest : time = 95.05 sec
tests.maptests.primitive.TroveMapTest : time = 235.062 sec

tests.maptests.prim_object.HftcIntObjectMapTest : time = 216.361 sec
tests.maptests.prim_object.TroveIntObjectMapTest : time = 304.019 sec

tests.maptests.object_prim.HftcObjectIntMapTest : time = 335.139 sec
tests.maptests.object_prim.TroveObjectIntMapTest : time = 217.412 sec

tests.maptests.object.HftcMutableObjTest : time = 272.792 sec
tests.maptests.object.JdkMapTest : time = 163.335 sec
tests.maptests.object.TroveObjMapTest : time = 239.133 sec

As you can see, Koloboke wins by a large margin in the primitive-to-primitive test. It is also significantly faster in the primitive-to-object test.

In the object-to-primitive test, Koloboke took significantly longer than Trove to complete.

Finally, for the object-to-object test, I had to change the Koloboke map initialization code, because by default it started to degrade extremely quickly once I had added half a billion elements to it:

    HashObjObjMaps.getDefaultFactory().withHashConfig(HashConfig.fromLoads(0.5, 0.6, 0.8)).newMutableMap(keys.length)

Koloboke 2.0?

Roman Leventov has just announced that he is considering implementing a newer and even faster version of the Koloboke library, but he needs your feedback. Would you mind dropping him a line?

Summary

  • Koloboke has turned out to be the fastest and the most memory efficient library implementing hash maps. This library is still young and not widely used yet, but why not give it a try?
  • If you are looking for a more stable and mature library (and are willing to sacrifice some performance), you should probably look at the GS collections library. Unlike Koloboke, it gives you a wide range of collections out of the box.

Source code

The article source code is now hosted at GitHub: https://github.com/mikvor/hashmapTest. You may expect that the test set would be slightly ahead of this article :)

Please note you should run this project via tests.MapTestRunner class:

mvn clean install
java -cp target/benchmarks.jar tests.MapTestRunner
