
Changes to String internal representation made in Java 1.7.0_06

by Mikhail Vorontsov

This post was updated on 19 Nov 2013 in order to reflect Java 8 changes.


This post was updated on 28 Nov 2013 in order to reflect Java 7u40 changes (thanks to Sunny Chan and his colleagues for pointing my attention at this JDK update).

Sharing an underlying char[]

The original String implementation has four non-static fields: char[] value with the string characters; int offset and int count with the index of the first character to use from value and the number of characters to use; and int hash with the cached value of the String hash code. As you can see, in the vast majority of cases a String has offset = 0 and count = value.length. The only exceptions to this rule were strings created via String.substring calls and all API calls using this method internally (like Pattern.split).

String.substring created a String which shared the internal char[] value with the original String, which allowed you:

  1. To save some memory by sharing character data
  2. To run String.substring in constant time ( O(1) )

At the same time this feature was a source of a possible memory leak: if you extracted a tiny substring from a huge original string and discarded the original, you would still hold a live reference to the huge underlying char[] value taken from the original String. The only way to avoid this was to call the new String( String ) constructor on such a string – it made a copy of just the required section of the underlying char[], thus unlinking your shorter string from its longer “parent”.
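A minimal sketch of this pattern (the extractToken helper and the parsing scenario are hypothetical; on pre-7u06 JDKs the explicit copy is what unlinks the short string from the huge backing array, while on later JDKs it is merely a harmless small copy):

```java
import java.util.Arrays;

public class SubstringLeakDemo {
    // Hypothetical helper: pull a short token out of a huge input string.
    static String extractToken(String hugeInput) {
        // On pre-7u06 JDKs, hugeInput.substring(0, 8) alone would keep the
        // entire backing char[] of hugeInput reachable through the result.
        // Wrapping it in new String(...) copies just the 8 characters needed.
        return new String(hugeInput.substring(0, 8));
    }

    public static void main(String[] args) {
        char[] big = new char[1_000_000];
        Arrays.fill(big, 'x');
        String huge = new String(big);

        String token = extractToken(huge);
        huge = null; // the 1M-char array is now collectible on any JDK version

        System.out.println(token); // xxxxxxxx
    }
}
```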

From Java 1.7.0_06 (as well as in current versions of Java 8 – Nov 13) the offset and count fields were removed from String. This means that you can’t share a part of an underlying char[] value anymore. On the upside, you can now forget about the memory leak described above and never need the new String(String) constructor again. As a drawback, you now have to remember that String.substring has linear complexity instead of a constant one.

Continue reading

Static constructor code is not JIT-optimized in a lot of cases

by Mikhail Vorontsov

I found this problem by accident – I had to initialize a large in-memory static table with the results of quite expensive method calls in the static constructor of one of my classes. Surprisingly, it took much longer than I expected…

Let’s start with a simple test code:

private final static int ITERS = 100000000;  //100M
 
private static void test( final String name )
{
    final long start = System.currentTimeMillis();
    long res = start;
    for ( int i = 0; i < ITERS; ++i )
    {
        for ( int j = 0; j < 20; ++j )
            res = add2( res,  sub( add( i, j ), 1 ) );
    }
    final long time = System.currentTimeMillis() - start;
    System.out.println( name + ": Time = " + time / 1000.0 + " sec, res = " + res );
}
 
public static int add( final int a, final int b )
{
    return a + b;
}
 
public static int sub( final int a, final int b )
{
    return a - b;
}
 
public static long add2( final long a, final int b )
{
    return a + b;
}

We have 4 billion additions and 2 billion subtractions to execute. How long should this take on a modern CPU? We could expect something close to the number of operations divided by the CPU frequency (all these operations can be executed on registers only, without any memory access) – somewhere between 1.5 and 4 seconds.
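The comparison implied by the title – running the same arithmetic from a static initializer versus from an ordinary method call – can be sketched as follows (a minimal standalone harness with a much smaller workload, not the article's code; the interesting difference on the affected JDKs is the timing, not the result):

```java
public class ClinitJitDemo {
    // Same shape as the benchmark above: a tight add/subtract loop.
    static int work(int n) {
        int res = 0;
        for (int i = 0; i < n; ++i)
            res += (i + 1) - 1;
        return res;
    }

    // Invoked while the class is still being initialized (from <clinit>):
    // this is the call path the article found to be slow even after the
    // JIT reported the methods as compiled.
    static final int FROM_CLINIT = work(100_000);

    public static void main(String[] args) {
        // Invoked after class initialization completes: normal JIT treatment.
        int fromMain = work(100_000);
        System.out.println(FROM_CLINIT == fromMain); // true
    }
}
```

Wrapping both call sites with System.currentTimeMillis(), as in the article's test method, exposes the gap between the two paths.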

Initially we will define these methods in the same class as the static constructor and call the test method from the static constructor in a loop in order to trigger JIT compilation. Surprisingly, it took 46 seconds to run the test method a single time on my Core i5-3317 (1.7 GHz) – at least 10 times slower than we might expect. And subsequent calls took the same time (I made 30 of them). Something is very wrong, I thought… Was my code actually JIT-compiled? I added the -XX:+PrintCompilation Java option – it printed that yes, my methods were compiled:

Continue reading

Performance of various methods of binary serialization in Java

by Mikhail Vorontsov

We are going to find out what the performance of binary serialization in Java is. The following classes will be compared:

  • DataInputStream(ByteArrayInputStream) and its counterpart DataOutputStream(ByteArrayOutputStream)
  • BAInputStream – a copy of ByteArrayInputStream without synchronization, used to check how synchronization affects ByteArrayInput/OutputStream performance
  • ByteBuffer in its 4 flavours – heap/direct, big/little endian
  • sun.misc.Unsafe-based memory operations on heap byte arrays

My experience has shown that the performance of all these serialization methods depends on the data item size as well as on the buffer/stream type. So, two sets of tests were written. The first test works on an object having a single field – byte[500], while the second test uses another object with a single field – long[500]. In the case of ByteBuffer and Unsafe we will test both bulk operations and serialization of every array element as a separate method call.
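The two main API styles under test can be illustrated with a minimal round-trip sketch (not the benchmark itself; class and method names here are mine). The stream style issues one call per element, while the ByteBuffer style can use a bulk put through a typed view:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class SerializationStylesDemo {
    // Stream style: DataOutputStream over ByteArrayOutputStream.
    static byte[] writeWithStreams(long[] data) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(baos);
        dos.writeInt(data.length);
        for (long v : data)
            dos.writeLong(v);         // one method call per element
        return baos.toByteArray();
    }

    // Buffer style: heap ByteBuffer, little-endian flavour, bulk write.
    static byte[] writeWithBuffer(long[] data) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 8 * data.length)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(data.length);
        buf.asLongBuffer().put(data); // single bulk operation
        return buf.array();
    }

    static long[] readWithBuffer(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        long[] res = new long[buf.getInt()];
        buf.asLongBuffer().get(res);
        return res;
    }

    public static void main(String[] args) throws IOException {
        long[] data = new long[500];
        Arrays.setAll(data, i -> i * 31L);
        System.out.println(Arrays.equals(data,
                readWithBuffer(writeWithBuffer(data)))); // true
    }
}
```

Note that DataOutputStream always writes big-endian, so its output is not interchangeable with the little-endian buffer here – one of the reasons the benchmark treats the ByteBuffer byte-order flavours separately.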

Continue reading