Category Archives: Advanced

Implementing a world fastest Java int-to-int hash map*

by Mikhail Vorontsov

* Fastest among int-to-int map implementations I have tested in my previous article in the tests I have implemented for that article.

I would like to thank Sebastiano Vigna and Roman Leventov for sharing their hash map wisdom with me. Some implementation ideas were inspired by “Code Optimization: Effective Memory Usage” by Kris Kaspersky.

This article will give you a step by step overview of various implementation tricks used in the modern hash map implementations. At the end of this article you will have a probably fastest Java int-to-int hash map implementation available at the moment of writing of this article.

Open indexing

Most of modern hash maps are based on the idea of open indexing. What does it mean? Your map is based on the array of keys (values will always be placed at the matching array index, so forget about them for now). You have to find your key in the array of keys for each map operation. How does it implemented?

First of all, you need the initial lookup position in the array. It may be calculated by any function which maps a key into an integer in the range [0, array.length - 1]. A key is usually mapped into an integer by means of hashCode method. A simplest function here could be Math.abs(key.hashCode() % array.length) (keep in mind that % result could be negative).

As you understand, a mapping of large set of keys into a small set of integer values means that you may end up with some collisions (they are called hash collisions) – same results of the initial function for the different keys. Collisions are resolved by trying to apply another function to the original array index. The simplest of such functions is (prevIdx + 1) % array.length. There is one requirement for such functions – if applied in a loop, they should cover the whole set or array cells, so that you can use the whole array capacity. Another example of such function is incrementing the index by one prime number if the array length is another prime number.

Free and removed cells

In theory, that’s enough to implement your own hash map. In practice, you need to distinguish free and removed cells from occupied cells (you can avoid using removed cells if you’ll do extra work in remove method – see how it is implemented in the latest FastUtil). Removed cells are also known as “tombstones”.

Your keys array is initially filled with free “cells”. You set a cell into “removed” state if you need to remove an existing key.

Let’s take a look at an example:

Open indexing example

Open indexing example

This int key map uses the initial and next functions defined above:

1
2
initial = Math.abs( key % array.length );
nextIdx = ( prevIdx + 1 ) % array.length;
initial = Math.abs( key % array.length );
nextIdx = ( prevIdx + 1 ) % array.length;

This map originally contained keys 1, 2, 3 and 4, but key=3 was subsequently removed from a map, so it was replaced with a removed (“R”) placeholder.

Let’s see what should we do to find the following keys:

Key Description
2 Start function points at a cell with index=2 at once. We have key=2 at a cell with index=2, so no further lookup is required.
3 Start function points at a cell with index=3. This cell is “removed”, so we have to apply “nextIdx” function in a loop until we either find a key or a free cell. We check cell index=4 next – bad luck, key is not equal. Then we check cell index=5: it is a free cell, so we stop the lookup – key is not found.

Next, let’s see what will happen if we want to add key=10: initial = key % array.length = 10 % 9 = 1. Cell at index=1 is already occupied with another key, so we can not use it. So is cell at index=2. The cell at index=3 is “removed”, so we can reuse it and put key=10 into it.

Removed cells cleanup

In many cases your hash map may degrade to O(n2) complexity if you would keep the removed cells in the map. Fastest maps are implementing the removed cells cleanup one way or another. As a result, all other map methods will need to distinguish just 2 cell states: free or used. Besides that, remove method is usually called infrequently compared to get and less frequently than put, which means that some extra complexity during key removal will be paid off by fasted execution of other methods. This article will use FastUtil cleanup logic.

Key scrambling

The initial index function I have mentioned above ( initial = Math.abs( key % array.length ); ) will put consecutive keys in the consecutive array cells. This is highly undesirable if your next cell function is simply picking up the next array cell, because it will cause the long lookup chains to be created in a pretty common case.

In order to avoid it, we need to “scramble” the key, shuffling its bits. I will rely on FastUtil scrambling code:

1
2
3
4
5
6
private static final int INT_PHI = 0x9E3779B9;
 
public static int phiMix( final int x ) {
    final int h = x * INT_PHI;
    return h ^ (h >> 16);
}
private static final int INT_PHI = 0x9E3779B9;

public static int phiMix( final int x ) {
    final int h = x * INT_PHI;
    return h ^ (h >> 16);
}

As a result, consecutive keys will not be placed in the consecutive array cells, thus keeping the average hash chain length under control. As for “random” keys case, you are likely to end up with a pretty good distribution of keys over the keys array as well.

Now you are definitely ready to implement your own hash map. We will be implementing an int-int map in the next several sections of this article.

Continue reading

Creating an exception in Java is very slow

by Mikhail Vorontsov


29 June 2014 update – now using JMH for testing. Added 2 ways to avoid the cost of exceptions. Made some editorial changes to the original text to reflect JMH usage.

Filling in the stack trace is slow…

Creating an exception in Java is a very slow operation. Expect that throwing an exception will cost you around 1-5 microseconds. Nearly all this time is spent on filling in the exception thread stack. The deeper the stack trace is, the more time it will take to populate it.

Usually we throw an exception in case of unexpected problems. This means that we don’t expect exceptions to be thrown at a rate of thousands per second per process. But sometimes you will encounter a method which uses exceptions for more likely events. We have already seen a good (actually bad) example of such behavior in Base 64 encoding and decoding performance article: sun.misc.BASE64Decoder is extremely slow due to using an exception for “give me more data” requests:

at java.lang.Throwable.fillInStackTrace(Throwable.java:-1)
at java.lang.Throwable.fillInStackTrace(Throwable.java:782)
- locked <0x6c> (a sun.misc.CEStreamExhausted)
at java.lang.Throwable.<init>(Throwable.java:250)
at java.lang.Exception.<init>(Exception.java:54)
at java.io.IOException.<init>(IOException.java:47)
at sun.misc.CEStreamExhausted.<init>(CEStreamExhausted.java:30)
at sun.misc.BASE64Decoder.decodeAtom(BASE64Decoder.java:117)
at sun.misc.CharacterDecoder.decodeBuffer(CharacterDecoder.java:163)
at sun.misc.CharacterDecoder.decodeBuffer(CharacterDecoder.java:194)

You may encounter the same problem if you try to run pack method from String packing part 2: converting Strings to any other objects with a string starting with a digit, but followed by letters. Let’s see how long does it take to pack ‘12345’ and ‘12345a’ with that method:

Benchmark                        (m_param)   Mode   Samples         Mean   Mean error    Units
t.StringPacking2Tests.testPack      12345a  thrpt        10        0.044        0.000   ops/us
t.StringPacking2Tests.testPack       12345  thrpt        10        7.934        0.154   ops/us

As you can see, we were able to convert “12345” from String about 200 times faster than “12345a”. Most of processing time is again being spent filling in stack traces:

at java.lang.Throwable.fillInStackTrace(Throwable.java:-1)
at java.lang.Throwable.fillInStackTrace(Throwable.java:782)
- locked <0x87> (a java.lang.NumberFormatException)
at java.lang.Throwable.<init>(Throwable.java:265)
at java.lang.Exception.<init>(Exception.java:66)
at java.lang.RuntimeException.<init>(RuntimeException.java:62)
at java.lang.IllegalArgumentException.<init>(IllegalArgumentException.java:53)
at java.lang.NumberFormatException.<init>(NumberFormatException.java:55)
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.valueOf(Long.java:540)
at tests.StringPacking2Tests.pack(StringPacking2Tests.java:69)
...

Continue reading

Introduction to JMH Profilers

by Mikhail Vorontsov

This article follows Introduction to JMH article, which should be read prior to reading this article.

This article will give you an overview of profilers available inside the JMH framework.

List of available profilers

Name Description
CL Classloader profiling via standard MBeans
COMP JIT compiler profiling via standard MBeans
GC GC profiling via standard MBeans
HS_CL HotSpot ™ classloader profiling via implementation-specific MBeans
HS_COMP HotSpot ™ JIT compiler profiling via implementation-specific MBeans
HS_GC HotSpot ™ memory manager (GC) profiling via implementation-specific MBeans
HS_RT HotSpot ™ runtime profiling via implementation-specific MBeans
HS_THR HotSpot ™ threading subsystem via implementation-specific MBeans
STACK Simple and naive Java stack profiler

You can specify which profiler to use via JMH API:

1
2
3
4
5
Options opt = new OptionsBuilder()
        .include(".*" + YourClass.class.getSimpleName() + ".*")
        .forks(1)
        .addProfiler( StackProfiler.class )
        .build();
Options opt = new OptionsBuilder()
        .include(".*" + YourClass.class.getSimpleName() + ".*")
        .forks(1)
        .addProfiler( StackProfiler.class )
        .build();

Continue reading

java.util.Random and java.util.concurrent.ThreadLocalRandom in multithreaded environments

by Mikhail Vorontsov

Generating pseudorandom data

There are 2 types of random generators in Java: pseudorandom and secure. Pseudorandom generators are transforming a seed into a new portion of pseudorandom data based on some formula. Secure random generators are using some machine specific sources of actually random events (file/socket access, for example) to generate its data.

Secure random generators:

  • should be used only when cryptographically strong random data is required
  • are slow
  • could be waiting for external events (“stuck”) if you have requested a lot of random data (Linux /dev/random is an example of such generator)

Pseudorandom generators, on the other hand, depend only on the initial “seed” value, so you can generate the same sequence of pseudorandom events if you will provide the same seed to the algorithm. In general case, such generators are fast because they are CPU bound only (do not rely on any IO). We will review the evolution of pseudorandom generators in Java and the reasons behind these changes.

java.util.Random

java.util.Random is available from Java 1.0. It is a thread safe class, so you may share (in theory) instances of this class between several threads and do not expect to get the same random data in 2 threads at the same time. Such thread safety is achieved via using an AtomicLong for the generator seed.

Random uses AtomicLong CAS (compare-and-set) operations for updating its seed. Despite being a light non-blocking primitive used in a lot of non-blocking algorithms, CAS behaves really poorly under the high contention. Wait for test results to see how poorly it behaves.

java.util.concurrent.ThreadLocalRandom

java.util.concurrent.ThreadLocalRandom was added in Java 7 as an attempt to overcome all performance issues associated with java.util.Random. This new class extends java.util.Random, so you may pass it to all logic expecting the old java.util.Random.

Here are main ThreadLocalRandom implementation details:

  • Internally it does not use an AtomicLong seed from Random. Instead it uses an ordinary long.
  • You can not create an instance of ThreadLocalRandom yourself, because its constructor is not public. Instead you should use its static factory ThreadLocalRandom.current(). This factory method queries internal ThreadLocal<ThreadLocalRandom>
  • It is CPU-cache aware, so it uses 8 long dummy fields for padding, thus pushing anything else out of its 64 byte L1 cache line.

All of these changes are important, which will be revealed by the following tests.

Continue reading

I/O bound algorithms: SSD vs HDD

by Mikhail Vorontsov

This article will investigate an impact of modern SSDs on the I/O bound algorithms of HDD era.

Improved write speed of SSD

Modern SSD provide read/write speeds up to 500Mb/sec. Compare it to approximately 100Mb/sec cap on the speed of modern HDD. It means that your application has to produce the output 5 times faster than before in order to still be I/O bound.

Let’s make 3 tests:

  1. Fill an 8 Gb file with a repeating sequence of 1024 bytes using a BufferedOutputStream with 32K buffer size. Data will be written in a loop, no extra processing is done inside the loop ( testWriteNoProcessing method from the following code snippet ).
  2. Fill an 8 Gb file with a sequence of 1024 bytes which is recomputed before writing it on the every iteration. Data will be written using a BufferedOutputStream with 32K buffer size ( testWriteSimple ).
  3. Same as previous test, but data will not be written to disk. This test will estimate how long does it take to prepare the data to write.

Continue reading

Forbidden Java actions: not declaring a checked exception; avoiding a constructor while creating an object

by Mikhail Vorontsov

This is the last article in the series of “Forbidden Java actions” articles. The two previous articles are Forbidden Java actions: object assignments, type conversions etc on the low level in Java and Forbidden Java actions: updating final and static final fields.

In this article we will see how to throw a checked exception in Java without declaring it in the method throws clause and how to create an object without calling any of its constructors.

Throwing a checked exception without letting know about it in the method signature

In Java you have to declare all checked exceptions in the throws clause of your method signature. This is a Java language requirement, JVM does not need this information. Let us prove it by finding a few ways to throw a checked exception while avoiding to declare it.

Thread.stop( Throwable )

The first way to throw a checked exception is to use a deprecated Thread.stop( Throwable ) method, which throws a given exception in the given thread. Don’t use this method in the production code! The following method will print “test” followed by “99”, which proves that thread won’t actually die 🙂

1
2
3
4
5
6
7
8
9
10
11
12
private static void threadStop()
{
    try
    {
        Thread.currentThread().stop( new IOException( "test" ) );
    }
    catch ( Exception ex )
    {
        System.out.println( ex.getMessage() );
    }
    System.out.println( 99 );
}
private static void threadStop()
{
    try
    {
        Thread.currentThread().stop( new IOException( "test" ) );
    }
    catch ( Exception ex )
    {
        System.out.println( ex.getMessage() );
    }
    System.out.println( 99 );
}

Continue reading

Forbidden Java actions: updating final and static final fields

by Mikhail Vorontsov

This article will discuss how you can update final or static final fields in Java. This is not allowed on the Java language level, but surprisingly, standard JDK classes will make it possible for you. Of course, we won’t forget a best Java hacker friend – sun.misc.Unsafe class from Oracle JDK. This is a second article in the “Forbidden Java actions” series started with Forbidden Java actions: object assignments, type conversions etc on the low level in Java.

Updating final or private fields

This is a simple trick. All you have to know about is a java.lang.reflect.AccessibleObject.setAccessible(boolean) method. If a field or method can not be accessed by Java language means in the current context (for example, you want to access a private field or method) – use this method with argument = true to obtain reflection access to that field or method. Unfortunately, this method is a subject of security checks, so it may be forbidden by a security manager in many enterprise server environments.

The following program will update a private final field of another class. It will print 20, despite the fact that a final field was initialized with 10 in the constructor.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
public class FinalPrivateField {
    public static void main(String[] args) throws NoSuchFieldException, IllegalAccessException {
        final A instance = new A( 10 );
        B.updateA( instance, 20 );
        System.out.println( instance.getField() );
    }
 
    public static class A
    {
        private final int m_field;
 
        public A( final int field ) {
            m_field = field;
        }
 
        public int getField() {
            return m_field;
        }
    }
 
    public static class B
    {
        public static void updateA( final A other, final int newValue ) throws NoSuchFieldException, IllegalAccessException
        {
            final Field fieldA = A.class.getDeclaredField( "m_field" );
            fieldA.setAccessible( true );
            fieldA.setInt( other, newValue );
        }
    }
}
public class FinalPrivateField {
    public static void main(String[] args) throws NoSuchFieldException, IllegalAccessException {
        final A instance = new A( 10 );
        B.updateA( instance, 20 );
        System.out.println( instance.getField() );
    }

    public static class A
    {
        private final int m_field;

        public A( final int field ) {
            m_field = field;
        }

        public int getField() {
            return m_field;
        }
    }

    public static class B
    {
        public static void updateA( final A other, final int newValue ) throws NoSuchFieldException, IllegalAccessException
        {
            final Field fieldA = A.class.getDeclaredField( "m_field" );
            fieldA.setAccessible( true );
            fieldA.setInt( other, newValue );
        }
    }
}

Continue reading

Forbidden Java actions: object assignments, type conversions etc on the low level in Java

by Mikhail Vorontsov

This article will reveal you a few details about the low level Java memory layout: we will see how to implement Object assignments using just primitive types. Then we will see what’s hidden in the array header and will convert an array of one type into an array of another type. This article continues the memory allocation discussion from Various types of memory allocation in Java article. It is also the first article in the “Forbidden Java actions” series.

WARNING: It is not recommended to apply these ideas to a production code! This article has only exploratory tasks.

Object assignments on the low level

As we already know from the Memory introspection using sun.misc.Unsafe and reflection article, Object reference (not its contents!) occupies just 4 bytes on the under 32Gb heaps. We will assume 4 byte Object references in this article.

All examples from this article require the following static definition:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
private static final Unsafe unsafe;
static
{
    try
    {
        Field field = Unsafe.class.getDeclaredField("theUnsafe");
        field.setAccessible(true);
        unsafe = (Unsafe)field.get(null);
    }
    catch (Exception e)
    {
        throw new RuntimeException(e);
    }
}
 
private static final long longArrayOffset = unsafe.arrayBaseOffset(long[].class);
private static final long intArrayOffset = unsafe.arrayBaseOffset(int[].class);
private static final long integerArrayOffset = unsafe.arrayBaseOffset(Integer[].class);
private static final Unsafe unsafe;
static
{
    try
    {
        Field field = Unsafe.class.getDeclaredField("theUnsafe");
        field.setAccessible(true);
        unsafe = (Unsafe)field.get(null);
    }
    catch (Exception e)
    {
        throw new RuntimeException(e);
    }
}

private static final long longArrayOffset = unsafe.arrayBaseOffset(long[].class);
private static final long intArrayOffset = unsafe.arrayBaseOffset(int[].class);
private static final long integerArrayOffset = unsafe.arrayBaseOffset(Integer[].class);

Let’s allocate a small Integer[ 4 ] array and populate it with 1, 2, 3 and 4. After that let’s see what’s stored in the data cells of the array. We will read them as int values:

1
2
3
4
5
6
7
8
9
10
11
12
13
final Integer[] ar = new Integer[ 4 ];
ar[ 0 ] = 1;
ar[ 1 ] = 2;
ar[ 2 ] = 3;
ar[ 3 ] = 4;
 
//objects occupy 4 bytes for under 32G heaps
int[] ptrs = new int[ 4 ];
for ( int i = 0; i < 4; i++)
{
    ptrs[ i ] = unsafe.getInt(ar, integerArrayOffset + i * 4);
    System.out.println("Integer[" + i + "] = " + Integer.toHexString( ptrs[ i ] ) );
}
final Integer[] ar = new Integer[ 4 ];
ar[ 0 ] = 1;
ar[ 1 ] = 2;
ar[ 2 ] = 3;
ar[ 3 ] = 4;

//objects occupy 4 bytes for under 32G heaps
int[] ptrs = new int[ 4 ];
for ( int i = 0; i < 4; i++)
{
    ptrs[ i ] = unsafe.getInt(ar, integerArrayOffset + i * 4);
    System.out.println("Integer[" + i + "] = " + Integer.toHexString( ptrs[ i ] ) );
}

Here is an output from my computer. It will be different each time, but most likely the difference between cells will be equal to 16, which is an actual Integer memory footprint.

Integer[0] = f004e880
Integer[1] = f004e870
Integer[2] = f004e860
Integer[3] = f004e850

Unfortunately, these values are not pure pointers (in terms of C) – trying to use them with unsafe.getInt( long address ) will cause an access violation.

On the other hand, we can still work in terms of these values. For example, we can implement an assignment ar[ 0 ] = ar[ 1 ]:

Continue reading

Using double/long vs BigDecimal for monetary calculations

by Mikhail Vorontsov

This article discusses how to perform monetary operations in Java: what is the correct way of using double and what alternatives do we have.

Monetary operations using long/double

The easiest way to represent monetary values in financial environment is to work with the smallest currency units – for example, cents in USA, instead of normal currency unit – dollar in USA. long datatype is rather suitable for this case. Unfortunately, sometimes we have to divide such values or multiply them by decimal point values (for example, calculate how much you have earned on your savings account). This means that while we can still use long for storing cents, we need to multiply/divide using decimal point arithmetic.

Do not use float for any monetary operations unless you absolutely sure. It has too low precision (23 bits).

double calculations are not precise. Even simple one, such as addition and subtraction:

1
System.out.println( "362.2 - 362.6 = " + ( 362.2 - 362.6 ) );
System.out.println( "362.2 - 362.6 = " + ( 362.2 - 362.6 ) );

will print "362.2 - 362.6 = -0.4000000000000341". This means we should:

  1. Avoid working with non-integral values while using double (calculate in the smallest currency units).
  2. Round any multiplication/division results using Math.round/rint/ceil/floor (per your system requirements).

Continue reading

Memory introspection using sun.misc.Unsafe and reflection

by Mikhail Vorontsov

It is useful for a serious Java developer to realize how much memory is occupied by one or another Java object. You may have heard that we live in a world there memory is not an issue anymore. This may be true for your text editor ( though, try to open a large document with tons of embedded images and charts and see how much memory will be consumed by your favourite editor ). This may be true for a dedicated server software (at least, until your business would grow to a bigger market or you will run another piece of software on the same server). This is may even be true for a cloud-based software, if you are rich enough in order to pay a premium for a top-class server hardware.

Still, in the real world your software will once reach a point where it makes sense to spend money in its optimization rather than trying to obtain an even better hardware (currently the most you can get on a commodity class server is 64G RAM). At this point you will have to analyze your application and find which data structures consume most of application memory. The best tool for such task is a good profiler, but you can start with a cheaper approach of analyzing your objects inside your code. This article describes an Oracle JDK based ClassIntrospector class, which will allow you to analyze your application memory consumption.

I have already mentioned the Java object memory layout in the String packing part 1: converting characters to bytes article. For example, I have written that a 28 character long String should occupy 104 bytes before Java 1.7.0_06. To be honest, at the time of writing of that article I used my profiler to get a proof for my calculations. Now it is about time to implement a Java object introspector using pure Java and Oracle JDK specific sun.misc.Unsafe class.

We will use the following sun.misc.Unsafe methods:

Continue reading