Tag Archives: trove

Large HashMap overview: JDK, FastUtil, Goldman Sachs, HPPC, Koloboke, Trove – January 2015 version

by Mikhail Vorontsov

This is a major update of the previous version of this article. The reasons for this update are:

  • The major performance updates in fastutil 6.6.0
  • Updates in the “get” test from the original article, addition of “put/update” and “put/remove” tests
  • Adding identity maps to all tests
  • Now using different objects for any operations after map population (in case of Object keys – except identity maps). Old approach of reusing the same keys gave the unfair advantage to Koloboke.

I would like to thank Sebastiano Vigna for providing the initial versions of “get” and “put” tests.

Introduction

This article will give you an overview of hash map implementations in 5 well known libraries and JDK HashMap as a baseline. We will test separately:

  • Primitive to primitive maps
  • Primitive to object maps
  • Object to primitive maps
  • Object to Object maps
  • Object (identity) to Object maps

This article will provide you the results of 3 tests:

  • “Get” test: Populate a map with a pregenerated set of keys (in the JMH setup), make ~50% successful and ~50% unsuccessful “get” calls. For non-identity maps with object keys we use a distinct set of keys (the different object with the same value is used for successful “get” calls).
  • “Put/update” test: Add a pregenerated set of keys to the map. In the second loop add the equal set of keys (different objects with the same values) to this map again (make the updates). Identical keys are used for identity maps and for maps with primitive keys.
  • “Put/remove” test: In a loop: add 2 entries to a map, remove 1 of existing entries (“add” pointer is increased by 2 on each iteration, “remove” pointer is increased by 1).

This article will just give you the test results. There will be a followup article on the most interesting implementation details of the various hash maps.

Test participants

JDK 8

JDK HashMap is the oldest hash map implementation in this test. It got a couple of major updates recently – a shared underlying storage for the empty maps in Java 7u40 and a possibility to convert underlying hash bucket linked lists into tree maps (for better worse case performance) in Java 8.

FastUtil 6.6.0

FastUtil provides a developer a set of all 4 options listed above (all combinations of primitives and objects). Besides that, there are several other types of maps available for each parameter type combination: array map, AVL tree map and RB tree map. Nevertheless, we are only interested in hash maps in this article.

Goldman Sachs Collections 5.1.0

Goldman Sachs has open sourced its collections library about 3 years ago. In my opinion, this library provides the widest range of collections out of box (if you need them). You should definitely pay attention to it if you need more than a hash map, tree map and a list for your work 🙂 For the purposes of this article, GS collections provide a normal, synchronized and unmodifiable versions of each hash map. The last 2 are just facades for the normal map, so they don’t provide any performance advantages.

HPPC 0.6.1

HPPC provides array lists, array dequeues, hash sets and hash maps for all primitive types. HPPC provides normal hash maps for primitive keys and both normal and identity hash maps for object keys.

Koloboke 0.6.5

Koloboke is the youngest of all libraries in this article. It is developed as a part of an OpenHFT project by Roman Leventov. This library currently provides hash maps and hash sets for all primitive/object combinations. This library was recently renamed from HFTC, so some artifacts in my tests will still use the old library name.

Trove 3.0.3

Trove is available for a long time and quite stable. Unfortunately, not much development is happening in this project at the moment. Trove provides you the list, stack, queue, hash set and map implementations for all primitive/object combinations. I have already written about Trove.

Data storage implementations and tests

This article will look at 5 different sorts of maps:

  1. intint
  2. intInteger
  3. Integerint
  4. IntegerInteger
  5. Integer (identity map)Integer

We will use JMH 1.0 for testing. Here is the test description: for each map size in (10K, 100K, 1M, 10M, 100M) (outer loop) generate a set of random keys (they will be used for each test at a given map size) and then run a test for each map implementations (inner loop). Each test will be run 100M / map_size times. “get”, “put” and “remove” tests are run separately, so you can update the test source code and run only a few of them.

Note that each test suite takes around 7-8 hours on my box. Spreadsheet-friendly results will be printed to stdout once all test suites will finish.

int-int

Each section will start with a table showing how data is stored inside each map. Only arrays will be shown here (some maps have special fields for a few corner cases).

tests.maptests.primitive.FastUtilMapTest int[] key, int[] value
tests.maptests.primitive.GsMutableMapTest int[] keys, int[] values
tests.maptests.primitive.HftcMutableMapTest long[] (key-low bits, value-high bits)
tests.maptests.primitive.HppcMapTest int[] keys, int[] values, boolean[] allocated
tests.maptests.primitive.TroveMapTest int[] _set, int[] _values, byte[] _states

As you can see, Koloboke is using a single array, FastUtil and GS use 2 arrays, and HPPC and Trove use 3 arrays to store the same data. Let’s see what would be the actual performance.

“Get” test results

All “get” tests make around 50% of unsuccessful get calls in order to test both success and failure paths in each map.

Each test results section will contain the results graph. X axis will show a map size, Y axis – time to run a test in milliseconds. Note, that each test in a graph has a fixed number of map method calls: 100M get call for “get” test; 200M put calls for “put” test; 100M put and 50M remove calls for “remove” tests.

There would be the links to OpenOffice spreadsheets with all test results at the end of this article.

int-int 'get' test results

int-int ‘get’ test results

GS and FastUtil test results lines are nearly parallel, but FastUtil is faster due to a lower constant factor. Koloboke becomes fastest only on large enough maps. Trove is slower than other implementations at each map size.

“Put” test results

“Put” tests insert all keys into a map and then use another equal set of keys to insert entries into a map again (these methods calls would update the existing entries). We make 100M put calls with “insert” functionality and 100M put calls with “update” functionality in each test.

int-int 'put' test results

int-int ‘put’ test results

This test shows the implementation difference more clear: Koloboke is fastest from the start (though FastUtil is as fast on small maps); GS and FastUtil are parallel again (but GS is always slower). HPPC and Trove are the slowest.

“Remove” test results

In “remove” test we interleave 2 put operations with 1 remove operation, so that a map size grows by 1 after each group of put/remove calls. In total we make 100M put and 50M remove calls.

int-int 'remove' test results

int-int ‘remove’ test results

Results are similar to “put” test (of course, both tests make a majority of put calls!): Koloboke is quickly becoming the fastest implementation; FastUtil is a bit faster than GS on all map sizes; HPPC and Trove are the slowest, but HPPC performs reasonably good on map sizes up to 1M entries.

int-int summary

An underlying storage implementation is the most important factor defining the hash map performance: the fewer memory accesses an implementation makes (especially for large maps which do not into CPU cache) to access an entry – the faster it would be. As you can see, the single array Koloboke is faster than other implementations in most of tests on large map sizes. For smaller map sizes, CPU cache starts hiding the costs of accessing several arrays – in this case other implementations may be faster due to less CPU commands required for a method call: FastUtil is the second best implementation for primitive collection tests due to its highly optimized code.

Continue reading

Trove library: using primitive collections for performance

by Mikhail Vorontsov


19 July 2014: article text cleanup, added a chapter on JDK to Trove migration.
16 July 2012: original version.

This article would describe Trove library, which contains a set of primitive collection implementations. The latest version of Trove (3.1a1 at the time of writing) would be described here.

Why should you use Trove? Why not to keep using well known JDK collections? The answer is performance and memory consumption. Trove doesn’t use any java.lang.Number subclasses internally, so you don’t have to pay for boxing/unboxing each time you want to pass/query a primitive value to/from the collection. Besides, you don’t have to waste memory on the boxed numbers (24 bytes for Long/Double, 16 bytes for smaller types) and reference to them. For example, if you want to store an Integer in JDK map, you need 4 bytes for a reference (or 8 bytes on huge heaps) and 16 bytes for an Integer instance. Trove, on the other hand, uses just 4 bytes to store an int. Trove also doesn’t create Map.Entry for each key-value pair unlike java.util.HashMap, further reducing the map memory footprint. For sets, it doesn’t use a Map internally, just keeping set values.

There are 3 main collection types in Trove: array lists, sets and maps. There are also queues, stacks and linked lists, but they are not so important (and usually instances of these collections tend to be rather small).

Array lists

All array lists are built on top of an array of corresponding data type (for example, int[] for TIntArrayList). There is a small problem which you should deal with: a value for the absent elements (default value). It is zero by default, but you can override it using

1
2
public TIntArrayList(int capacity, int no_entry_value);
public static TIntArrayList wrap(int[] values, int no_entry_value);
public TIntArrayList(int capacity, int no_entry_value);
public static TIntArrayList wrap(int[] values, int no_entry_value);

There are 2 useful methods called getQuick/setQuick – they just access the underlying array without any additional checks. As a side-effect they would allow to access elements between list size and capacity (don’t use it too much – it is still better to add values legally and when getQuick values as long as you are inside array list boundaries.

Each Trove array list has several helper methods which implement the java.util.Collections functionality:

1
2
3
4
5
6
7
8
9
10
11
12
public void reverse()
public void reverse(int from, int to)
public void shuffle(java.util.Random rand)
public void sort()
public void sort(int fromIndex, int toIndex)
public void fill(int val)
public void fill(int fromIndex, int toIndex, int val)
public int binarySearch(int value)
public int binarySearch(int value, int fromIndex, int toIndex)
public int max()
public int min()
public int sum()
public void reverse()
public void reverse(int from, int to)
public void shuffle(java.util.Random rand)
public void sort()
public void sort(int fromIndex, int toIndex)
public void fill(int val)
public void fill(int fromIndex, int toIndex, int val)
public int binarySearch(int value)
public int binarySearch(int value, int fromIndex, int toIndex)
public int max()
public int min()
public int sum()

Continue reading

Memory consumption of popular Java data types – part 2

by Mikhail Vorontsov

This article will give you an overview of some popular Java data types memory consumption. This article follows An overview of memory saving techniques in Java and Memory consumption of popular Java data types – part 1 articles earlier published in this blog. I strongly recommend you to read An overview of memory saving techniques in Java before reading this one. This article assumes that size of references is equal to 4 bytes.

HashMap, THashMap

HashMap is the most popular map in JDK. Provided that it contains objects with a quality hashCode method and it has not too high load factor, it will generally give you O(1) performance of get and put methods (as well as similar methods like contains).

You can find a proper hash map description in one of popular textbooks (“Algorithms (4th Edition)” or “Introduction to Algorithms”) or in Wikipedia. Here we will only discuss JDK HashMap implementation.

HashMap is built on top of the array of Map.Entry objects. The implementation ensures that this array length is always equal to at least max( size, capacity ) / load_factor. Default load factor for HashMap is 0.75 and default capacity is 16. Load factor specifies which part of an array could be used for storage and is a value between 0 and 1. The higher is the load factor, the less space is being wasted, but HashMap starts to work slower due to increased rate of collisions. The smaller if the load factor, the more memory is wasted, but the performance of a HashMap is increasing due to smaller possibility of collisions.

So, as you have seen, the default (new HashMap<>()) size of array of entries is 16 / 0.75 = 21.33 ~ 22.

What is a HashMap.Entry? It contains a key, a value, int hash of a key and a pointer to the next entry (remember that entries array could be sparse). It means that an entry occupies 32 bytes (12 bytes header + 16 bytes data + 4 bytes padding). So, a HashMap with size = S has to spend 32 * S bytes for entries storage. Besides, it will use 4 * C bytes for entries array, where C is the map capacity.

As you can see, if you will make the map capacity small enough (less than 12.5%), its entry array size will start dominating over the entries.

Continue reading

Memory consumption of popular Java data types – part 1

by Mikhail Vorontsov

This article will give you an overview of some popular Java data types memory consumption. This article follows An overview of memory saving techniques in Java article earlier published in this blog. I strongly recommend you to read the previous article before reading this one.

Enum, EnumMap, EnumSet

Enums were added to Java 1.5. Technically speaking, each enum is an Object and has all memory consumption properties of objects described in the previous article. Luckily, they have several useful properties:

All enum objects are singletons. This is guaranteed by Java language – you can create an instance of enum only by its definition. This means that you pay for enum object once and then you only spend 4 bytes per its reference (in this article I will assume that object references occupy 4 bytes). But what is the benefit of enums if they consume as much memory as int variables and 4 times more than byte variables, besides type safety and better support in IDE?

The answer is the ordinal() method implemented by every enum. It is an increasing number starting from 0 and ending at the number of values in the given enum minus 1. Such enum property allows us to use arrays for enum to any other object mapping: a value related to the first enum value will be stored in array[0], second enum will be mapped to array[1] and so on according to Enum.ordinal() method result. By the way, it was a short description of JDK EnumMap class implementation.

If you need to implement a map from Object into enum, you won’t have a tailored JDK implementation at hand. Of course, you can use any JDK Object to Object map, but it wouldn’t be that efficient. The easy way here is to use Trove TObjectByteMap (or TObjectIntMap in rare cases when you have more than 128 values in your enum) and map from Object key into Enum.ordinal() value. You will need a decoding method for getters in order to convert byte into an enum. This method will require 1 byte per entry, which is the best we can do without paying CPU algorithmic penalty (of course, we can use less than 1 byte per enum, if there are less than 128, 64, 32, etc elements in the enum, but it may make your code more complicated for a very little memory gain).

With all this knowledge at hand, you may now realize that EnumSet is implemented similarly to BitSet. There are 2 EnumSet implementations in JDK: RegularEnumSet and JumboEnumSet. The former is used for enums having less than 65 values (it covers 99,9999% of real-world enums), the latter is used for larger enumerations.

RegularEnumSet utilizes the knowledge of the fact that there are less or equal to 64 values in the enum. It allows RegularEnumSet to use a single long to store all “enum present in the set” flags, rather than using long[] (utilized by JumboEnumSet or BitSet). Using a single long instead of long[1] allows to save 4 bytes on long[] reference and 16 bytes on long value (long occupies 8 bytes, long[1] needs 12 bytes for header, 8 bytes for a single long and 4 bytes for alignment).

Continue reading

An overview of memory saving techniques in Java

by Mikhail Vorontsov

This article will give you the general advices on memory consumption optimization in Java.

Memory usage optimization is the most important type of optimization in Java. Current systems are limited by memory access times rather than by CPU frequency (otherwise, why CPU producers are implementing all these L1, L2 and L3 caches?). It means that by reducing your application memory footprint you will most likely improve your program data processing speed by making your CPU to wait for smaller amount of data. Now let’s get back to Java.

General Java memory layout information

First of all, we have to revise the memory layout of Java objects: any Java Object occupies at least 16 bytes, 12 out of which are occupied by a Java object header. Besides that, all Java objects are aligned by 8 bytes boundary. It means that, for example, an object containing 2 fields: int and byte will occupy not 17 (12 + 4 + 1), but 24 bytes (17 aligned by 8 bytes).

Each Object reference occupies 4 bytes if the Java heap is under 32G and XX:+UseCompressedOops is turned on (it is turned on by default in the recent Oracle JVM versions). Otherwise, Object references occupy 8 bytes.

All primitive data types occupy their exact byte size:

byte, boolean 1 byte
short, char 2 bytes
int, float 4 bytes
long, double 8 bytes

In essence, this information is sufficient for Java memory optimization. But it will be more convenient if you will be aware of arrays / String / numeric wrappers memory consumption.

Continue reading

Use case: Optimizing memory footprint of a read only csv file (Trove, Unsafe, ByteBuffer, data compression)

by Mikhail Vorontsov

Suppose your application has to obtain some data from an auxiliary data source, which could be a csv file. Your csv file will contain several fields, one of those is used as a field ID. You need to keep all that file in memory and provide fast access to records by an ID field. Your additional target is to consume as little memory as possible, while keeping access cost as low as possible. In this article we will process fake person records. Here is an example:

{id=idnum10, surname=Smith10, address=10 One Way Road, Springfield, NJ, DOB=01/11/1965, names=John Paul 10 Ringo}
    

All these records are generated by the following class:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
private static class DataGenerator
{
    private int cnt = 1;
 
    public Map<String, String> getNextEntry()
    {
        final Map<String, String> res = new HashMap<String, String>( 8 );
        res.put( ID, "idnum" + cnt );
        res.put( "names", "John Paul " + cnt + " Ringo" );
        res.put( "surname", "Smith" + cnt );
        res.put( "DOB", "01/" + two((cnt % 31) + 1) + "/1965" ); //date of birth, MM/DD/YYYY
        res.put( "address", cnt + " One Way Road, Springfield, NJ" );
        ++cnt;
        return res;
    }
 
    private String two( final int val )
    {
        return val < 10 ? "0" + val : Integer.toString( val );
    }
}
private static class DataGenerator
{
    private int cnt = 1;

    public Map<String, String> getNextEntry()
    {
        final Map<String, String> res = new HashMap<String, String>( 8 );
        res.put( ID, "idnum" + cnt );
        res.put( "names", "John Paul " + cnt + " Ringo" );
        res.put( "surname", "Smith" + cnt );
        res.put( "DOB", "01/" + two((cnt % 31) + 1) + "/1965" ); //date of birth, MM/DD/YYYY
        res.put( "address", cnt + " One Way Road, Springfield, NJ" );
        ++cnt;
        return res;
    }

    private String two( final int val )
    {
        return val < 10 ? "0" + val : Integer.toString( val );
    }
}

Simple approach – map of maps

We always have to start from something simple and easy to support. In this case it may be a map of maps: outer map is indexed by ID field and th inner ones are indexed by field names.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
    /**
     * Initial version. Outer map indexed by a key field value, inner map is field name to field value.
     * All field names are interned, but we still have to pay for storing references to field names.
     */
    private static class SimpleMapStorage extends SimpleStorage
    {
        private final Map<String, Map<String, String>> m_data = new HashMap<String, Map<String, String>>( 1000 );
 
        public void addEntry( final Map<String, String> entry )
        {
            m_data.put( entry.get( ID ), entry );
        }
 
        public Map<String, String> getById( final String id )
        {
            return m_data.get( id );
        }
    }
    
    /**
     * Initial version. Outer map indexed by a key field value, inner map is field name to field value.
     * All field names are interned, but we still have to pay for storing references to field names.
     */
    private static class SimpleMapStorage extends SimpleStorage
    {
        private final Map<String, Map<String, String>> m_data = new HashMap<String, Map<String, String>>( 1000 );

        public void addEntry( final Map<String, String> entry )
        {
            m_data.put( entry.get( ID ), entry );
        }

        public Map<String, String> getById( final String id )
        {
            return m_data.get( id );
        }
    }
    

For testing purposes, all storage implementations will either implement Storage interface or extend SimpleStorage class. You will see the purpose of pack method in the more advanced examples.

1
2
3
4
5
6
7
8
9
10
11
12
13
private interface Storage
{
    public void addEntry( final Map<String, String> entry );
    public Map<String, String> getById( final String id );
    public void pack();
}
 
public static abstract class SimpleStorage implements Storage
{
    public void pack()
    {
    }
}
private interface Storage
{
    public void addEntry( final Map<String, String> entry );
    public Map<String, String> getById( final String id );
    public void pack();
}

public static abstract class SimpleStorage implements Storage
{
    public void pack()
    {
    }
}

All storage implementations will be tested by the following method:

1
2
3
4
5
6
7
8
9
10
11
private static void testStorage(final int recordCount, final Storage storage)
{
    final DataGenerator generator = new DataGenerator();
    for ( int i = 0; i < recordCount; ++i )
        storage.addEntry( generator.getNextEntry() );
    storage.pack();
    System.gc();
    final long mem = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
    System.out.println( storage.getClass().getName() + ": " + mem / 1024.0 / 1024.0 + " MB");
    System.out.println( storage.getById( "idnum10" ) ); //in order to retain storage in memory
}
private static void testStorage(final int recordCount, final Storage storage)
{
    final DataGenerator generator = new DataGenerator();
    for ( int i = 0; i < recordCount; ++i )
        storage.addEntry( generator.getNextEntry() );
    storage.pack();
    System.gc();
    final long mem = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
    System.out.println( storage.getClass().getName() + ": " + mem / 1024.0 / 1024.0 + " MB");
    System.out.println( storage.getById( "idnum10" ) ); //in order to retain storage in memory
}

For every implementation in this article we will try to create 1 million entries. SimpleMapStorage consumes 706 Mb to store 1M records. The actual data size is about 82 Mb, which means that this simple implementation wastes about 85% of consumed memory. Yes, straightforward solutions for big data storage do not work well in Java…

Continue reading