Category Archives: Java tricks

Implementing a world fastest Java int-to-int hash map*

by Mikhail Vorontsov

* Fastest among int-to-int map implementations I have tested in my previous article in the tests I have implemented for that article.

I would like to thank Sebastiano Vigna and Roman Leventov for sharing their hash map wisdom with me. Some implementation ideas were inspired by “Code Optimization: Effective Memory Usage” by Kris Kaspersky.

This article will give you a step by step overview of various implementation tricks used in the modern hash map implementations. At the end of this article you will have a probably fastest Java int-to-int hash map implementation available at the moment of writing of this article.

Open indexing

Most of modern hash maps are based on the idea of open indexing. What does it mean? Your map is based on the array of keys (values will always be placed at the matching array index, so forget about them for now). You have to find your key in the array of keys for each map operation. How does it implemented?

First of all, you need the initial lookup position in the array. It may be calculated by any function which maps a key into an integer in the range [0, array.length - 1]. A key is usually mapped into an integer by means of hashCode method. A simplest function here could be Math.abs(key.hashCode() % array.length) (keep in mind that % result could be negative).

As you understand, a mapping of large set of keys into a small set of integer values means that you may end up with some collisions (they are called hash collisions) – same results of the initial function for the different keys. Collisions are resolved by trying to apply another function to the original array index. The simplest of such functions is (prevIdx + 1) % array.length. There is one requirement for such functions – if applied in a loop, they should cover the whole set or array cells, so that you can use the whole array capacity. Another example of such function is incrementing the index by one prime number if the array length is another prime number.

Free and removed cells

In theory, that’s enough to implement your own hash map. In practice, you need to distinguish free and removed cells from occupied cells (you can avoid using removed cells if you’ll do extra work in remove method – see how it is implemented in the latest FastUtil). Removed cells are also known as “tombstones”.

Your keys array is initially filled with free “cells”. You set a cell into “removed” state if you need to remove an existing key.

Let’s take a look at an example:

Open indexing example

Open indexing example

This int key map uses the initial and next functions defined above:

1
2
initial = Math.abs( key % array.length );
nextIdx = ( prevIdx + 1 ) % array.length;
initial = Math.abs( key % array.length );
nextIdx = ( prevIdx + 1 ) % array.length;

This map originally contained keys 1, 2, 3 and 4, but key=3 was subsequently removed from a map, so it was replaced with a removed (“R”) placeholder.

Let’s see what should we do to find the following keys:

Key Description
2 Start function points at a cell with index=2 at once. We have key=2 at a cell with index=2, so no further lookup is required.
3 Start function points at a cell with index=3. This cell is “removed”, so we have to apply “nextIdx” function in a loop until we either find a key or a free cell. We check cell index=4 next – bad luck, key is not equal. Then we check cell index=5: it is a free cell, so we stop the lookup – key is not found.

Next, let’s see what will happen if we want to add key=10: initial = key % array.length = 10 % 9 = 1. Cell at index=1 is already occupied with another key, so we can not use it. So is cell at index=2. The cell at index=3 is “removed”, so we can reuse it and put key=10 into it.

Removed cells cleanup

In many cases your hash map may degrade to O(n2) complexity if you would keep the removed cells in the map. Fastest maps are implementing the removed cells cleanup one way or another. As a result, all other map methods will need to distinguish just 2 cell states: free or used. Besides that, remove method is usually called infrequently compared to get and less frequently than put, which means that some extra complexity during key removal will be paid off by fasted execution of other methods. This article will use FastUtil cleanup logic.

Key scrambling

The initial index function I have mentioned above ( initial = Math.abs( key % array.length ); ) will put consecutive keys in the consecutive array cells. This is highly undesirable if your next cell function is simply picking up the next array cell, because it will cause the long lookup chains to be created in a pretty common case.

In order to avoid it, we need to “scramble” the key, shuffling its bits. I will rely on FastUtil scrambling code:

1
2
3
4
5
6
private static final int INT_PHI = 0x9E3779B9;
 
public static int phiMix( final int x ) {
    final int h = x * INT_PHI;
    return h ^ (h >> 16);
}
private static final int INT_PHI = 0x9E3779B9;

public static int phiMix( final int x ) {
    final int h = x * INT_PHI;
    return h ^ (h >> 16);
}

As a result, consecutive keys will not be placed in the consecutive array cells, thus keeping the average hash chain length under control. As for “random” keys case, you are likely to end up with a pretty good distribution of keys over the keys array as well.

Now you are definitely ready to implement your own hash map. We will be implementing an int-int map in the next several sections of this article.

Continue reading

Implementing a high performance Money class

by Mikhail Vorontsov

This article will discuss how to efficiently implement a Money class in Java. This article follows Using double/long vs BigDecimal for monetary calculations article published earlier in this blog.

Introduction

As I have written before, we should not use any floating point types for storing monetary values. We should use long instead, keeping the number of smallest currency units in it. Unfortunately, there is a couple of problems:

  • We may still have to communicate with a software storing monetary values in double variables.
  • We may want to multiply it by a non-integer number, like 0.015, or divide it by any type of number, thus getting the non-integer result.

The easy way to deal with arithmetic operations is to use the BigDecimal class for all of them. Unfortunately, BigDecimal will not help you to deal with already incorrect double calculations results. For example, the following code will print 0.7999999999999999:

1
System.out.println( new BigDecimal( 0.7 + 0.1, MathContext.DECIMAL64 ) );
System.out.println( new BigDecimal( 0.7 + 0.1, MathContext.DECIMAL64 ) );

So, we may say that BigDecimal operations will produce the correct result if it had the correct arguments. We can not say the same about double operations.

double operations may end up with the exact result or a result which is one bit off the exact result. This is most likely caused by executing double operations using higher precision on CPU and then rounding the result. All we need to do to get the exact result is to look at these 3 values (result; result plus one bit value, which is also known as ulp; and result minus ulp). One of them would be the correct one. Easy to say, difficult to efficiently implement 🙂

Continue reading

An overview of memory saving techniques in Java

by Mikhail Vorontsov

This article will give you the general advices on memory consumption optimization in Java.

Memory usage optimization is the most important type of optimization in Java. Current systems are limited by memory access times rather than by CPU frequency (otherwise, why CPU producers are implementing all these L1, L2 and L3 caches?). It means that by reducing your application memory footprint you will most likely improve your program data processing speed by making your CPU to wait for smaller amount of data. Now let’s get back to Java.

General Java memory layout information

First of all, we have to revise the memory layout of Java objects: any Java Object occupies at least 16 bytes, 12 out of which are occupied by a Java object header. Besides that, all Java objects are aligned by 8 bytes boundary. It means that, for example, an object containing 2 fields: int and byte will occupy not 17 (12 + 4 + 1), but 24 bytes (17 aligned by 8 bytes).

Each Object reference occupies 4 bytes if the Java heap is under 32G and XX:+UseCompressedOops is turned on (it is turned on by default in the recent Oracle JVM versions). Otherwise, Object references occupy 8 bytes.

All primitive data types occupy their exact byte size:

byte, boolean 1 byte
short, char 2 bytes
int, float 4 bytes
long, double 8 bytes

In essence, this information is sufficient for Java memory optimization. But it will be more convenient if you will be aware of arrays / String / numeric wrappers memory consumption.

Continue reading

Forbidden Java actions: not declaring a checked exception; avoiding a constructor while creating an object

by Mikhail Vorontsov

This is the last article in the series of “Forbidden Java actions” articles. The two previous articles are Forbidden Java actions: object assignments, type conversions etc on the low level in Java and Forbidden Java actions: updating final and static final fields.

In this article we will see how to throw a checked exception in Java without declaring it in the method throws clause and how to create an object without calling any of its constructors.

Throwing a checked exception without letting know about it in the method signature

In Java you have to declare all checked exceptions in the throws clause of your method signature. This is a Java language requirement, JVM does not need this information. Let us prove it by finding a few ways to throw a checked exception while avoiding to declare it.

Thread.stop( Throwable )

The first way to throw a checked exception is to use a deprecated Thread.stop( Throwable ) method, which throws a given exception in the given thread. Don’t use this method in the production code! The following method will print “test” followed by “99”, which proves that thread won’t actually die 🙂

1
2
3
4
5
6
7
8
9
10
11
12
private static void threadStop()
{
    try
    {
        Thread.currentThread().stop( new IOException( "test" ) );
    }
    catch ( Exception ex )
    {
        System.out.println( ex.getMessage() );
    }
    System.out.println( 99 );
}
private static void threadStop()
{
    try
    {
        Thread.currentThread().stop( new IOException( "test" ) );
    }
    catch ( Exception ex )
    {
        System.out.println( ex.getMessage() );
    }
    System.out.println( 99 );
}

Continue reading

Forbidden Java actions: updating final and static final fields

by Mikhail Vorontsov

This article will discuss how you can update final or static final fields in Java. This is not allowed on the Java language level, but surprisingly, standard JDK classes will make it possible for you. Of course, we won’t forget a best Java hacker friend – sun.misc.Unsafe class from Oracle JDK. This is a second article in the “Forbidden Java actions” series started with Forbidden Java actions: object assignments, type conversions etc on the low level in Java.

Updating final or private fields

This is a simple trick. All you have to know about is a java.lang.reflect.AccessibleObject.setAccessible(boolean) method. If a field or method can not be accessed by Java language means in the current context (for example, you want to access a private field or method) – use this method with argument = true to obtain reflection access to that field or method. Unfortunately, this method is a subject of security checks, so it may be forbidden by a security manager in many enterprise server environments.

The following program will update a private final field of another class. It will print 20, despite the fact that a final field was initialized with 10 in the constructor.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
public class FinalPrivateField {
    public static void main(String[] args) throws NoSuchFieldException, IllegalAccessException {
        final A instance = new A( 10 );
        B.updateA( instance, 20 );
        System.out.println( instance.getField() );
    }
 
    public static class A
    {
        private final int m_field;
 
        public A( final int field ) {
            m_field = field;
        }
 
        public int getField() {
            return m_field;
        }
    }
 
    public static class B
    {
        public static void updateA( final A other, final int newValue ) throws NoSuchFieldException, IllegalAccessException
        {
            final Field fieldA = A.class.getDeclaredField( "m_field" );
            fieldA.setAccessible( true );
            fieldA.setInt( other, newValue );
        }
    }
}
public class FinalPrivateField {
    public static void main(String[] args) throws NoSuchFieldException, IllegalAccessException {
        final A instance = new A( 10 );
        B.updateA( instance, 20 );
        System.out.println( instance.getField() );
    }

    public static class A
    {
        private final int m_field;

        public A( final int field ) {
            m_field = field;
        }

        public int getField() {
            return m_field;
        }
    }

    public static class B
    {
        public static void updateA( final A other, final int newValue ) throws NoSuchFieldException, IllegalAccessException
        {
            final Field fieldA = A.class.getDeclaredField( "m_field" );
            fieldA.setAccessible( true );
            fieldA.setInt( other, newValue );
        }
    }
}

Continue reading

Forbidden Java actions: object assignments, type conversions etc on the low level in Java

by Mikhail Vorontsov

This article will reveal you a few details about the low level Java memory layout: we will see how to implement Object assignments using just primitive types. Then we will see what’s hidden in the array header and will convert an array of one type into an array of another type. This article continues the memory allocation discussion from Various types of memory allocation in Java article. It is also the first article in the “Forbidden Java actions” series.

WARNING: It is not recommended to apply these ideas to a production code! This article has only exploratory tasks.

Object assignments on the low level

As we already know from the Memory introspection using sun.misc.Unsafe and reflection article, Object reference (not its contents!) occupies just 4 bytes on the under 32Gb heaps. We will assume 4 byte Object references in this article.

All examples from this article require the following static definition:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
private static final Unsafe unsafe;
static
{
    try
    {
        Field field = Unsafe.class.getDeclaredField("theUnsafe");
        field.setAccessible(true);
        unsafe = (Unsafe)field.get(null);
    }
    catch (Exception e)
    {
        throw new RuntimeException(e);
    }
}
 
private static final long longArrayOffset = unsafe.arrayBaseOffset(long[].class);
private static final long intArrayOffset = unsafe.arrayBaseOffset(int[].class);
private static final long integerArrayOffset = unsafe.arrayBaseOffset(Integer[].class);
private static final Unsafe unsafe;
static
{
    try
    {
        Field field = Unsafe.class.getDeclaredField("theUnsafe");
        field.setAccessible(true);
        unsafe = (Unsafe)field.get(null);
    }
    catch (Exception e)
    {
        throw new RuntimeException(e);
    }
}

private static final long longArrayOffset = unsafe.arrayBaseOffset(long[].class);
private static final long intArrayOffset = unsafe.arrayBaseOffset(int[].class);
private static final long integerArrayOffset = unsafe.arrayBaseOffset(Integer[].class);

Let’s allocate a small Integer[ 4 ] array and populate it with 1, 2, 3 and 4. After that let’s see what’s stored in the data cells of the array. We will read them as int values:

1
2
3
4
5
6
7
8
9
10
11
12
13
final Integer[] ar = new Integer[ 4 ];
ar[ 0 ] = 1;
ar[ 1 ] = 2;
ar[ 2 ] = 3;
ar[ 3 ] = 4;
 
//objects occupy 4 bytes for under 32G heaps
int[] ptrs = new int[ 4 ];
for ( int i = 0; i < 4; i++)
{
    ptrs[ i ] = unsafe.getInt(ar, integerArrayOffset + i * 4);
    System.out.println("Integer[" + i + "] = " + Integer.toHexString( ptrs[ i ] ) );
}
final Integer[] ar = new Integer[ 4 ];
ar[ 0 ] = 1;
ar[ 1 ] = 2;
ar[ 2 ] = 3;
ar[ 3 ] = 4;

//objects occupy 4 bytes for under 32G heaps
int[] ptrs = new int[ 4 ];
for ( int i = 0; i < 4; i++)
{
    ptrs[ i ] = unsafe.getInt(ar, integerArrayOffset + i * 4);
    System.out.println("Integer[" + i + "] = " + Integer.toHexString( ptrs[ i ] ) );
}

Here is an output from my computer. It will be different each time, but most likely the difference between cells will be equal to 16, which is an actual Integer memory footprint.

Integer[0] = f004e880
Integer[1] = f004e870
Integer[2] = f004e860
Integer[3] = f004e850

Unfortunately, these values are not pure pointers (in terms of C) – trying to use them with unsafe.getInt( long address ) will cause an access violation.

On the other hand, we can still work in terms of these values. For example, we can implement an assignment ar[ 0 ] = ar[ 1 ]:

Continue reading