String.intern in Java 6, 7 and 8 – multithreaded access

by Mikhail Vorontsov

This article follows the initial String.intern in Java 6, 7 and 8 – string pooling article describing the implementation and benefits of using String.intern() method in Java 7 and 8. The original article was already getting too long, so I had to write this article in order to describe the performance characteristics of the multithreaded access to the String.intern.

The tests we will perform today are calling String.intern() from the multiple threads. They will emulate the behavior of a lot of modern server applications (for example, specialized web crawlers). For tests I will use my new workstation with Intel Xeon E5-2650 CPU (8 physical, 16 virtual cores @ 2 Ghz) and 128 Gb RAM. It will allow us to test the rather high contention scenarios. In this article we will create 8 threads in order to utilize all physical cores.

There will be 4 tests:

  1. The benchmark one – a single thread calling String.intern() using testLongLoop method from the previous article. It will show us how fast this configuration is without any contention.
  2. All 8 threads are calling String.intern() with unique values – an own prefix will be added by each thread to the interned string. This test will show us the synchronization overhead of String.intern(). It should be the theoretical worst case: it is highly unlikely that the only thing the actual application would do is calling String.intern in a loop from many threads.
  3. Initially we start the first thread interning the set of strings. After 2 seconds delay we will start a second thread interning the same set of strings. We expect that the following assumptions will be true: str.intern()==str for the first thread; str.intern()!=str for the second thread. It will allow us to prove that there are no thread local JVM string pools.
  4. All 8 threads will intern the same set of values. This scenario will be closer to the real situation – it will provide us the rather likely mix of adding strings to the JVM pool and querying the strings from it. Nevertheless, such a high read contention on the JVM string pool is still an unlikely event.

Single threaded String.intern() in a loop – a benchmark test

Test one: adding strings to the JVM string pool in a loop in a single thread using testLongLoop from the previous article. Unlike the previous article, this time we will make time snapshots every one million strings. For example, when you see the 3000000; time = 0.402 sec line in the log, it means that it took 0.402 second to add one million strings to the pool when there were initially 2 million strings in it (and now there are 3,000,000 strings in the pool). This test (as well as others in this article) was run with -Xmx64G -XX:StringTableSize=1000003 settings (one million buckets in the JVM string pool).

1000000; time = 0.362 sec
2000000; time = 0.377 sec
3000000; time = 0.402 sec
4000000; time = 0.434 sec
5000000; time = 0.36 sec
6000000; time = 0.364 sec
7000000; time = 0.351 sec
8000000; time = 0.379 sec
9000000; time = 0.445 sec
10000000; time = 0.487 sec
11000000; time = 0.518 sec
12000000; time = 0.59 sec
13000000; time = 0.695 sec
14000000; time = 0.768 sec
15000000; time = 0.815 sec
16000000; time = 0.873 sec
17000000; time = 0.954 sec
18000000; time = 1.031 sec
19000000; time = 1.081 sec
20000000; time = 1.12 sec
21000000; time = 1.17 sec
22000000; time = 1.194 sec
23000000; time = 1.264 sec
24000000; time = 1.291 sec
25000000; time = 1.352 sec
26000000; time = 1.421 sec
27000000; time = 1.476 sec
28000000; time = 1.514 sec
29000000; time = 1.612 sec
30000000; time = 1.643 sec
31000000; time = 1.695 sec
32000000; time = 1.703 sec
33000000; time = 1.81 sec
34000000; time = 1.854 sec
35000000; time = 1.943 sec
36000000; time = 1.937 sec
37000000; time = 2.0 sec
38000000; time = 2.102 sec
39000000; time = 2.124 sec
40000000; time = 2.212 sec
41000000; time = 2.225 sec
42000000; time = 2.305 sec
43000000; time = 2.344 sec
44000000; time = 2.379 sec
45000000; time = 2.46 sec
46000000; time = 2.557 sec
47000000; time = 2.656 sec
48000000; time = 2.627 sec
49000000; time = 2.629 sec
50000000; time = 2.735 sec
51000000; time = 2.738 sec
52000000; time = 2.823 sec
53000000; time = 2.861 sec
54000000; time = 2.974 sec
55000000; time = 3.027 sec
56000000; time = 3.05 sec
57000000; time = 3.088 sec
58000000; time = 3.161 sec
59000000; time = 3.244 sec
60000000; time = 3.31 sec
61000000; time = 3.327 sec
62000000; time = 3.382 sec
63000000; time = 3.445 sec
64000000; time = 3.524 sec
65000000; time = 3.669 sec
66000000; time = 3.596 sec
67000000; time = 3.673 sec
68000000; time = 3.705 sec
69000000; time = 3.752 sec
70000000; time = 3.81 sec
71000000; time = 3.898 sec
72000000; time = 3.93 sec
73000000; time = 4.0 sec
74000000; time = 4.133 sec
75000000; time = 4.109 sec
76000000; time = 4.193 sec
77000000; time = 4.182 sec
78000000; time = 4.283 sec
79000000; time = 4.349 sec
80000000; time = 4.395 sec

As you can see, results are growing linearly, which confirms the “list in the hash map bucket” implementation. We will use these results later to compare them with the multithreaded ones.

8 threads interning the different sets of strings

The next test will check what are the synchronization limitations of the JVM thread pool: each of 8 threads will create a unique set of strings and intern them in a loop. This is the very high contention scenario – the real world application is unlikely to have more than 2 or 3 simultaneous calls to String.intern() (this assumption is based on the current number of cores in the best modern CPUs).

Here is the test method, which we will use for the multithreaded tests. The only difference between tests #2 and #4 is in the string to intern (read the method comments):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
private static void multiThreadedInternTest( final int threads, final int cnt )
{
    final CountDownLatch latch = new CountDownLatch( threads );
    for ( int i = 0; i < threads; ++i )
    {
        final int threadNo = i;
        final Runnable task = new Runnable() {
            @Override
            public void run() {
                latch.countDown();
                try {
                    latch.await(); //start all threads simultaneously
                } catch ( InterruptedException ignored ) {
                }
 
                final List<String> lst = new ArrayList<String>( 100 );
                long start = System.currentTimeMillis();
                for ( int i = 0; i < cnt; ++i )
                {
                    //this line is used in 8 writers scenario
                    final String str = "Thread #" + threadNo + " : " + i;
                    //use the following line for 1 writer, 7 readers scenario
                    //final String str = "Thread #0 : " + i;
                    lst.add( str.intern() );
                    if ( i % 10000 == 0 )
                    {
                        System.out.println( "Thread # " + threadNo + " : " + i +
                            "; time = " + ( System.currentTimeMillis() - start ) / 1000.0 + " sec" );
                        start = System.currentTimeMillis();
                    }
                }
                System.out.println( "Total length = " + lst.size() );
            }
        };
        new Thread( task ).start();
    }
}
private static void multiThreadedInternTest( final int threads, final int cnt )
{
    final CountDownLatch latch = new CountDownLatch( threads );
    for ( int i = 0; i < threads; ++i )
    {
        final int threadNo = i;
        final Runnable task = new Runnable() {
            @Override
            public void run() {
                latch.countDown();
                try {
                    latch.await(); //start all threads simultaneously
                } catch ( InterruptedException ignored ) {
                }

                final List<String> lst = new ArrayList<String>( 100 );
                long start = System.currentTimeMillis();
                for ( int i = 0; i < cnt; ++i )
                {
                    //this line is used in 8 writers scenario
                    final String str = "Thread #" + threadNo + " : " + i;
                    //use the following line for 1 writer, 7 readers scenario
                    //final String str = "Thread #0 : " + i;
                    lst.add( str.intern() );
                    if ( i % 10000 == 0 )
                    {
                        System.out.println( "Thread # " + threadNo + " : " + i +
                            "; time = " + ( System.currentTimeMillis() - start ) / 1000.0 + " sec" );
                        start = System.currentTimeMillis();
                    }
                }
                System.out.println( "Total length = " + lst.size() );
            }
        };
        new Thread( task ).start();
    }
}

Here are the test results for "8 writers" test case (with same JVM settings: -XX:StringTableSize=1000003 -Xmx64G). This time I had to exclude the lines affected by the garbage collection (-verbose:gc). The meaning of log lines is the same as in the previous test, with the only difference of added thread number.

Thread # 6 : 1000000; time = 5.922 sec
Thread # 3 : 1000000; time = 6.037 sec
Thread # 4 : 1000000; time = 6.065 sec
Thread # 0 : 1000000; time = 6.066 sec
Thread # 1 : 1000000; time = 6.075 sec
Thread # 5 : 1000000; time = 6.091 sec
Thread # 2 : 1000000; time = 6.169 sec
Thread # 7 : 1000000; time = 6.288 sec
Thread # 1 : 4000000; time = 4.991 sec
Thread # 0 : 4000000; time = 4.983 sec
Thread # 3 : 4000000; time = 5.067 sec
Thread # 2 : 4000000; time = 5.024 sec
Thread # 4 : 4000000; time = 5.028 sec
Thread # 6 : 4000000; time = 5.052 sec
Thread # 5 : 4000000; time = 5.102 sec
Thread # 7 : 4000000; time = 5.083 sec
Thread # 1 : 6000000; time = 7.012 sec
Thread # 0 : 6000000; time = 7.06 sec
Thread # 2 : 6000000; time = 6.99 sec
Thread # 3 : 6000000; time = 7.09 sec
Thread # 4 : 6000000; time = 7.045 sec
Thread # 6 : 6000000; time = 7.173 sec
Thread # 5 : 6000000; time = 7.16 sec
Thread # 7 : 6000000; time = 7.175 sec
Thread # 1 : 8000000; time = 9.098 sec
Thread # 0 : 8000000; time = 9.157 sec
Thread # 2 : 8000000; time = 9.157 sec
Thread # 3 : 8000000; time = 9.2 sec
Thread # 4 : 8000000; time = 9.222 sec
Thread # 6 : 8000000; time = 9.308 sec
Thread # 5 : 8000000; time = 9.314 sec
Thread # 7 : 8000000; time = 9.332 sec
Thread # 1 : 11000000; time = 12.987 sec
Thread # 2 : 11000000; time = 13.028 sec
Thread # 0 : 11000000; time = 13.063 sec
Thread # 4 : 11000000; time = 13.007 sec
Thread # 3 : 11000000; time = 13.04 sec
Thread # 6 : 11000000; time = 13.238 sec
Thread # 5 : 11000000; time = 13.268 sec
Thread # 7 : 11000000; time = 13.209 sec
Thread # 1 : 15000000; time = 21.826 sec
Thread # 2 : 15000000; time = 22.124 sec
Thread # 4 : 15000000; time = 22.142 sec
Thread # 3 : 15000000; time = 22.144 sec
Thread # 0 : 15000000; time = 22.384 sec
Thread # 7 : 15000000; time = 23.129 sec
Thread # 6 : 15000000; time = 23.228 sec
Thread # 5 : 15000000; time = 23.244 sec
Thread # 1 : 17000000; time = 32.329 sec
Thread # 2 : 17000000; time = 32.488 sec
Thread # 4 : 17000000; time = 32.489 sec
Thread # 3 : 17000000; time = 32.448 sec
Thread # 0 : 17000000; time = 32.603 sec
Thread # 7 : 17000000; time = 32.567 sec
Thread # 5 : 17000000; time = 32.574 sec
Thread # 6 : 17000000; time = 32.791 sec
Thread # 1 : 19000000; time = 37.914 sec
Thread # 2 : 19000000; time = 37.895 sec
Thread # 3 : 19000000; time = 37.827 sec
Thread # 4 : 19000000; time = 38.014 sec
Thread # 0 : 19000000; time = 37.981 sec
[GC 15464352K->15461583K(23223936K), 31.1850310 secs]
Thread # 7 : 19000000; time = 69.329 sec
Thread # 5 : 19000000; time = 69.291 sec
Thread # 6 : 19000000; time = 69.446 sec
Thread # 1 : 21000000; time = 43.171 sec
Thread # 2 : 21000000; time = 43.225 sec
Thread # 3 : 21000000; time = 43.265 sec
Thread # 4 : 21000000; time = 43.175 sec
Thread # 0 : 21000000; time = 43.206 sec
Thread # 7 : 21000000; time = 42.972 sec
Thread # 5 : 21000000; time = 42.983 sec
Thread # 6 : 21000000; time = 43.025 sec

The interpretation of this test should be done in the groups of all threads. For example, the following log snippet tells us that it took approximately 43 seconds to intern eight million strings (each thread has interned a million strings), when there were 20M*8=160 million strings in the JVM pool:

Thread # 1 : 21000000; time = 43.171 sec
Thread # 2 : 21000000; time = 43.225 sec
Thread # 3 : 21000000; time = 43.265 sec
Thread # 4 : 21000000; time = 43.175 sec
Thread # 0 : 21000000; time = 43.206 sec
Thread # 7 : 21000000; time = 42.972 sec
Thread # 5 : 21000000; time = 42.983 sec
Thread # 6 : 21000000; time = 43.025 sec

Comparing the singlethreaded and multithreaded String.intern tests

The first group of messages not affected by a garbage collection relates to 11 million - it translates into 88 million strings for the singlethreaded cases. Let us make a small approximation. Assume that the singlethreaded results are growing linearly, in this case we need to calculate a single threaded time (84M) as an average time for our [80M; 88M] multithreaded test case. It will approximately be equal to time(80M) + (time(80M) - time(76M)) = 4.4 sec + (4.4 sec - 4.2 sec) = 4.6 sec. The time to add one million strings in the multithreaded mode in the same situation is approximately equal to 43 sec / 8 = 5.375 sec, which translates in the just 17% overhead for 8 concurrent threads adding strings to the pool. This is an extremely low price to pay for the multithreaded write access to the JVM string pool.

JVM string pool thread locality test

After getting so low synchronization overhead measurement I asked myself if the JVM string pool is actually thread local? In that situation we should get 2 different objects if we will intern the same string from 2 different threads. The following test will start the first thread which will intern the strings, when sleep for 2 seconds and then start another similar thread. The only difference if that we expect that str.intern() == str (identity equality here) in the first thread, because we intern a new string, so it should be saved in the JVM pool, but we expect str.intern() != str in the second thread, because this string has already been interned by the first thread.

There is another side effect in this test - at some point the second thread will make up the gap, because read access to the JVM pool is definitely faster than the write access.

Here is the test code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
private static void multiThreadedLocalPoolTest( final int cnt ) throws InterruptedException {
    final Runnable task1 = new Runnable() {
        @Override
        public void run() {
            final List<String> lst = new ArrayList<String>( 100 );
            long start = System.currentTimeMillis();
            for ( int i = 0; i < cnt; ++i )
            {
                final String str = Integer.toString( i );
                final String interned = str.intern();
                if ( str != interned )
                    System.out.println( "Thread 0: different interned " + str );
                lst.add( interned );
                if ( i % 1000000 == 0 )
                {
                    System.out.println( "Thread 0 : " + i + "; time = " + ( System.currentTimeMillis() - start ) / 1000.0 + " sec" );
                    start = System.currentTimeMillis();
                }
            }
        System.out.println( "Total length = " + lst.size() );
        }
    };
 
    final Runnable task2 = new Runnable() {
    @Override
    public void run() {
        final List<String> lst = new ArrayList<String>( 100 );
        long start = System.currentTimeMillis();
        for ( int i = 0; i < cnt; ++i )
        {
            final String str = Integer.toString( i );
            final String interned = str.intern();
            if ( str == interned )
                System.out.println( "Thread 1: same interned " + str );
            lst.add( interned );
            if ( i % 1000000 == 0 )
            {
                System.out.println( "Thread 1 : " + i + "; time = " + ( System.currentTimeMillis() - start ) / 1000.0 + " sec" );
                start = System.currentTimeMillis();
            }
        }
        System.out.println( "Total length = " + lst.size() );
    }
};
 
final Thread thread1 = new Thread( task1 );
thread1.start();
 
Thread.sleep( 2000 );
 
final Thread thread2 = new Thread( task2 );
thread2.start();}
private static void multiThreadedLocalPoolTest( final int cnt ) throws InterruptedException {
    final Runnable task1 = new Runnable() {
        @Override
        public void run() {
            final List<String> lst = new ArrayList<String>( 100 );
            long start = System.currentTimeMillis();
            for ( int i = 0; i < cnt; ++i )
            {
                final String str = Integer.toString( i );
                final String interned = str.intern();
                if ( str != interned )
                    System.out.println( "Thread 0: different interned " + str );
                lst.add( interned );
                if ( i % 1000000 == 0 )
                {
                    System.out.println( "Thread 0 : " + i + "; time = " + ( System.currentTimeMillis() - start ) / 1000.0 + " sec" );
                    start = System.currentTimeMillis();
                }
            }
        System.out.println( "Total length = " + lst.size() );
        }
    };

    final Runnable task2 = new Runnable() {
    @Override
    public void run() {
        final List<String> lst = new ArrayList<String>( 100 );
        long start = System.currentTimeMillis();
        for ( int i = 0; i < cnt; ++i )
        {
            final String str = Integer.toString( i );
            final String interned = str.intern();
            if ( str == interned )
                System.out.println( "Thread 1: same interned " + str );
            lst.add( interned );
            if ( i % 1000000 == 0 )
            {
                System.out.println( "Thread 1 : " + i + "; time = " + ( System.currentTimeMillis() - start ) / 1000.0 + " sec" );
                start = System.currentTimeMillis();
            }
        }
        System.out.println( "Total length = " + lst.size() );
    }
};

final Thread thread1 = new Thread( task1 );
thread1.start();

Thread.sleep( 2000 );

final Thread thread2 = new Thread( task2 );
thread2.start();}

The output from this test may show you a few interned strings below 1000 (at least it does for me). After that the output will be empty until the thread #1 will catch up with the thread #0. The output after that point is quite meaningless. This test proves us that there are no thread local pools (otherwise Thread #1 will print all interned values from the beginning).

1 writer, 7 readers test

The last test will check the performance of "1 writer, 7 readers" scenario. We will start 8 identical threads interning the same set of strings. It means that only one thread will actually add a string to the pool, while all other threads will only query it from the pool. We will use multiThreadedInternTest method from the test 2. The only difference will be that the string to intern will not contain a thread specific prefix.

Here are the test results. I will keep only one line per each million of strings.

Thread # 4 : 1000000; time = 0.807 sec
Thread # 6 : 7000000; time = 1.201 sec
Thread # 2 : 8000000; time = 1.244 sec
Thread # 2 : 10000000; time = 1.639 sec
Thread # 6 : 11000000; time = 1.65 sec
Thread # 6 : 12000000; time = 1.726 sec
Thread # 5 : 14000000; time = 1.588 sec
Thread # 7 : 15000000; time = 1.612 sec
Thread # 6 : 17000000; time = 1.715 sec
Thread # 6 : 18000000; time = 1.711 sec
Thread # 2 : 19000000; time = 1.762 sec
Thread # 3 : 21000000; time = 1.857 sec
Thread # 1 : 21000000; time = 1.858 sec
Thread # 3 : 22000000; time = 1.877 sec
Thread # 6 : 23000000; time = 1.991 sec
Thread # 7 : 25000000; time = 2.052 sec
Thread # 2 : 26000000; time = 2.15 sec
Thread # 3 : 27000000; time = 2.17 sec
Thread # 7 : 28000000; time = 2.145 sec
Thread # 0 : 31000000; time = 2.341 sec
Thread # 2 : 32000000; time = 2.353 sec
Thread # 1 : 33000000; time = 2.392 sec
Thread # 2 : 35000000; time = 2.548 sec
Thread # 7 : 36000000; time = 2.499 sec
Thread # 0 : 37000000; time = 2.532 sec
Thread # 0 : 38000000; time = 2.622 sec
Thread # 7 : 40000000; time = 2.748 sec
Thread # 6 : 41000000; time = 2.768 sec
Thread # 3 : 42000000; time = 2.835 sec
Thread # 0 : 44000000; time = 2.813 sec
Thread # 5 : 45000000; time = 2.979 sec
Thread # 4 : 46000000; time = 2.996 sec
Thread # 4 : 47000000; time = 3.067 sec
Thread # 7 : 49000000; time = 2.976 sec
Thread # 0 : 50000000; time = 3.191 sec
Thread # 3 : 51000000; time = 3.102 sec
Thread # 7 : 52000000; time = 3.214 sec
Thread # 3 : 54000000; time = 3.401 sec
Thread # 4 : 55000000; time = 3.409 sec
Thread # 2 : 56000000; time = 3.471 sec
Thread # 2 : 57000000; time = 3.448 sec
Thread # 6 : 59000000; time = 3.67 sec
Thread # 5 : 60000000; time = 3.797 sec
Thread # 6 : 61000000; time = 3.744 sec
Thread # 0 : 62000000; time = 3.748 sec
Thread # 1 : 64000000; time = 3.921 sec
Thread # 1 : 66000000; time = 4.042 sec
Thread # 4 : 68000000; time = 4.115 sec
Thread # 2 : 69000000; time = 4.167 sec
Thread # 2 : 70000000; time = 4.276 sec
Thread # 7 : 71000000; time = 4.23 sec
Thread # 7 : 73000000; time = 4.38 sec
Thread # 6 : 74000000; time = 4.439 sec
Thread # 3 : 75000000; time = 4.403 sec
Thread # 4 : 76000000; time = 4.414 sec
Thread # 6 : 77000000; time = 4.499 sec
Thread # 6 : 78000000; time = 4.582 sec
Thread # 0 : 80000000; time = 4.706 sec
Thread # 6 : 80000000; time = 4.706 sec
Thread # 1 : 80000000; time = 4.706 sec
Thread # 4 : 80000000; time = 4.706 sec
Thread # 2 : 80000000; time = 4.706 sec
Thread # 3 : 80000000; time = 4.706 sec
Thread # 7 : 80000000; time = 4.706 sec
Thread # 5 : 80000000; time = 4.706 sec

If you will compare the results of this test with the first test, you will see that this scenario incurs the overhead of approximately 9% compared to the singlethreaded case.

Summary

  • Feel free to use String.intern() in the multithreaded code. "8 writers" scenario has only 17% overhead compared to "1 writer" (singlethreaded) scenario. "1 writer, 7 readers" scenario has 9% overhead in my test compared to the singlethreaded results.
  • JVM string pool is NOT thread local. Each string added to the pool will be available to all other threads in the JVM thus further improving the program memory consumption.