Primitive types to String conversion and String concatenation

by Mikhail Vorontsov

Primitive types to String conversion

From time to time you need to build a string from several values, some of which are of primitive types. If two or more primitive values come first in the concatenation, you have to convert the first of them to a string explicitly (otherwise System.out.println( 1 + 'a' ) will print '98' rather than '1a'). Of course, there is a family of String.valueOf methods (and the corresponding wrapper type methods), but who needs them if there is another way that requires less typing?

Concatenating an empty string literal with the first of your primitive values (in our example, "" + 1) is the easiest idea. The result of this expression is a String, and after that you can safely concatenate any primitive values to it: the compiler will take care of all implicit conversions to String.
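The difference between arithmetic and concatenation is easy to check. Here is a minimal sketch; the class and method names are made up for illustration and are not part of the original article:

```java
final class PrimitiveConcatDemo {
    // 1 + 'a' is integer arithmetic: 'a' is promoted to int 97, so the sum is 98.
    static String arithmeticFirst() {
        return String.valueOf(1 + 'a');
    }

    // Starting with a String operand turns every following + into a concatenation.
    static String emptyStringFirst(int number, char letter) {
        return "" + number + letter;
    }

    // Converting the first operand explicitly produces the same text.
    static String valueOfFirst(int number, char letter) {
        return String.valueOf(number) + letter;
    }

    public static void main(String[] args) {
        System.out.println(arithmeticFirst());        // 98
        System.out.println(emptyStringFirst(1, 'a')); // 1a
        System.out.println(valueOfFirst(1, 'a'));     // 1a
    }
}
```

The second variant is the "" + value idiom discussed here: it needs the least typing of the three.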

Unfortunately, the "" + value idiom is the worst way one can imagine. To understand why, we need to review how the string concatenation operator is translated in Java. If we have a String value (it doesn’t matter whether it is a literal, a variable or a method call) followed by the + operator followed by an expression of any type:

string_exp + any_exp

the Java compiler will translate it to:

new StringBuilder().append( string_exp ).append( any_exp ).toString()

If the expression contains more than one + operator, you will end up with several StringBuilder.append calls before the final toString call.
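To make the translation concrete, the sketch below puts a three-operand concatenation next to a hand-written version of the StringBuilder chain described above. It only illustrates that both produce the same text; the class, method and parameter names are hypothetical, and the exact bytecode depends on the compiler version.

```java
final class ConcatTranslation {
    // What the programmer writes.
    static String withPlus(String label, int count, double ratio) {
        return label + count + ratio;
    }

    // Hand-written equivalent of the described translation:
    // one StringBuilder, one append per operand, one final toString.
    static String withBuilder(String label, int count, double ratio) {
        return new StringBuilder()
                .append(label)
                .append(count)
                .append(ratio)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus("items=", 3, 0.5));    // items=30.5
        System.out.println(withBuilder("items=", 3, 0.5)); // items=30.5
    }
}
```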

The no-argument StringBuilder() constructor allocates a buffer of 16 characters. So, appending up to 16 characters to that StringBuilder will not require buffer reallocation, but appending more than 16 characters will expand the StringBuilder buffer. At the end, the StringBuilder.toString() call creates a new String object with a copy of the StringBuilder buffer.
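The buffer sizes can be observed through StringBuilder.capacity(). The following sketch (hypothetical class and method names) checks the documented initial capacities and shows that the 17th character forces a larger buffer; it deliberately does not rely on the exact growth factor, which is an implementation detail.

```java
final class BuilderCapacity {
    // Capacity of a builder created with the no-argument constructor.
    static int defaultCapacity() {
        return new StringBuilder().capacity();
    }

    // Capacity of a builder created from an initial String:
    // documented as 16 plus the length of the argument.
    static int capacityFor(String initial) {
        return new StringBuilder(initial).capacity();
    }

    // Capacity after appending the given number of characters to an empty builder.
    static int capacityAfterAppending(int chars) {
        final StringBuilder sb = new StringBuilder();
        for (int i = 0; i < chars; i++) {
            sb.append('x');
        }
        return sb.capacity();
    }

    public static void main(String[] args) {
        System.out.println(defaultCapacity());          // 16
        System.out.println(capacityFor("abc"));         // 19
        System.out.println(capacityAfterAppending(16)); // 16, still fits
        System.out.println(capacityAfterAppending(17) > 16); // true, buffer was expanded
    }
}
```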

All of this means that, in the worst case, converting a single primitive value to a String requires allocating one StringBuilder, one char[ 16 ], one String and one char[] of the appropriate size to fit your input value. By using one of the String.valueOf methods you will at least avoid creating a StringBuilder.
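For reference, here is a small sketch of the direct conversions, using only String.valueOf and the wrapper types' static toString methods. The wrapping class and its method names are hypothetical.

```java
final class ToStringConversions {
    static String fromInt(int v)         { return Integer.toString(v); }
    static String fromLong(long v)       { return Long.toString(v); }
    static String fromDouble(double v)   { return String.valueOf(v); }
    static String fromChar(char v)       { return String.valueOf(v); }
    static String fromBoolean(boolean v) { return String.valueOf(v); }

    public static void main(String[] args) {
        System.out.println(fromInt(42));       // 42
        System.out.println(fromLong(-7L));     // -7
        System.out.println(fromDouble(1.5));   // 1.5
        System.out.println(fromChar(','));     // ,
        System.out.println(fromBoolean(true)); // true
    }
}
```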

Sometimes you don’t have to convert a primitive value to a String at all. For example, suppose you are parsing a comma-separated input string. The initial version had a call like this:

final int nextComma = str.indexOf(",");

or even

final int nextComma = str.indexOf(',');

Later the program requirements were extended to support any separator. Of course, a straightforward interpretation of “any” means you need to keep the separator in a String object and use the String.indexOf(String) method. Let’s assume that the preconfigured separator is stored in the m_separator field. In this case your parsing may look like:

private static List<String> split( final String str )
{
    final List<String> res = new ArrayList<String>( 10 );
    int pos, prev = 0;
    while ( ( pos = str.indexOf( m_separator, prev ) ) != -1 )
    {
        res.add( str.substring( prev, pos ) );
        prev = pos + m_separator.length(); // start from next char after separator
    }
    res.add( str.substring( prev ) );
    return res;
}

But later it turned out that the separator will never be longer than a single character. In the initialization, you replace String m_separator with char m_separatorChar and change its setter accordingly. But you may be tempted not to change the parsing method much (why should I change working code anyway?):

private static List<String> split2( final String str )
{
    final List<String> res = new ArrayList<String>( 10 );
    int pos, prev = 0;
    while ( ( pos = str.indexOf( "" + m_separatorChar, prev ) ) != -1 )
    {
        res.add( str.substring( prev, pos ) );
        prev = pos + 1; // start from next char after separator
    }
    res.add( str.substring( prev ) );
    return res;
}

As you can see, the indexOf call was updated, but it still creates a string and uses it. Of course, this is the wrong approach, because there is an overload of the same method that accepts a char instead of a String. Let’s use it:

private static List<String> split3( final String str )
{
    final List<String> res = new ArrayList<String>( 10 );
    int pos, prev = 0;
    while ( ( pos = str.indexOf( m_separatorChar, prev ) ) != -1 )
    {
        res.add( str.substring( prev, pos ) );
        prev = pos + 1; // start from next char after separator
    }
    res.add( str.substring( prev ) );
    return res;
}
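For readers who want to run the comparison, here is a self-contained sketch of the last two variants. It is an adaptation, not the article's exact code: the separator is passed as a parameter instead of being read from the m_separatorChar field, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitDemo {
    // Builds a temporary String from the char on every indexOf call.
    static List<String> splitViaString(final String str, final char separator) {
        final List<String> res = new ArrayList<String>(10);
        int pos, prev = 0;
        while ((pos = str.indexOf("" + separator, prev)) != -1) {
            res.add(str.substring(prev, pos));
            prev = pos + 1; // start from next char after separator
        }
        res.add(str.substring(prev));
        return res;
    }

    // Uses the indexOf overload that takes a char directly, so no temporary String is needed.
    static List<String> splitViaChar(final String str, final char separator) {
        final List<String> res = new ArrayList<String>(10);
        int pos, prev = 0;
        while ((pos = str.indexOf(separator, prev)) != -1) {
            res.add(str.substring(prev, pos));
            prev = pos + 1; // start from next char after separator
        }
        res.add(str.substring(prev));
        return res;
    }

    public static void main(String[] args) {
        final String input = "abc,def,ghi,jkl,mno,pqr,stu,vwx,yz";
        System.out.println(splitViaString(input, ','));
        System.out.println(splitViaChar(input, ','));
    }
}
```

Both methods return the same nine tokens; they differ only in the temporary objects created per loop iteration.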

For the test, the string "abc,def,ghi,jkl,mno,pqr,stu,vwx,yz" was parsed 10 million times using all 3 methods. Here are the running times on Java 6_41 and 7_15. The split and split3 running times increased on Java 7 because String.substring now copies the characters and therefore has linear complexity.

As you can see, this simple refactoring considerably decreased the time spent in splitting ( split/split2 -> split3 ).

          split       split2      split3
Java 6    4.65 sec    10.34 sec   3.8 sec
Java 7    6.72 sec    8.29 sec    4.37 sec

String concatenation

This article would not be complete without mentioning the two other string concatenation methods. The first one, rather rarely used, is the String.concat method. Inside, it allocates a char[] whose length equals the sum of the lengths of the concatenated strings, copies the string data into it and creates a new String using a private String constructor that doesn’t copy the input char[], so only two objects are created as a result: the String and its internal char[]. Unfortunately, this method is only efficient when you need to concatenate exactly 2 strings.
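A short sketch of String.concat in use (hypothetical class and method names). The three-string variant shows why the method stops being attractive beyond two operands: each chained call produces an intermediate String that is thrown away.

```java
final class ConcatDemo {
    // The case concat is suited for: exactly two strings, one resulting String.
    static String joinTwo(String left, String right) {
        return left.concat(right);
    }

    // Chaining works, but a.concat(b) creates an intermediate String
    // that is discarded as soon as the second concat returns.
    static String joinThree(String a, String b, String c) {
        return a.concat(b).concat(c);
    }

    public static void main(String[] args) {
        System.out.println(joinTwo("ab", "cd"));       // abcd
        System.out.println(joinThree("a", "b", "c")); // abc
    }
}
```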

The third way of string concatenation is the StringBuilder class and its various append methods. This is definitely the fastest way when you need to concatenate many input values. It was introduced in Java 5 as a replacement for the StringBuffer class. Their main difference is that StringBuffer is thread-safe, while StringBuilder is not. Do you often build a string concurrently?

As a test, all numbers between 0 and 100,000 were concatenated using String.concat, the + operator and StringBuilder, using code like this:

String res = "";
for ( int i = 0; i < ITERS; ++i )
{
    final String s = Integer.toString( i );
    res = res.concat( s ); //second option: res += s;
}
//third option:
StringBuilder sb = new StringBuilder();
for ( int i = 0; i < ITERS; ++i )
{
    final String s = Integer.toString( i );
    sb.append( s );
}

String.concat    + operator    StringBuilder.append
10.145 sec       42.677 sec    0.012 sec
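A runnable version of this comparison could look like the sketch below. It is an assumption-laden harness, not the author's benchmark: the class and method names are hypothetical, the iteration count is reduced to keep the run short, and there is no JIT warm-up, so the printed timings are only a rough indication.

```java
final class ConcatTiming {
    static String viaConcat(int iters) {
        String res = "";
        for (int i = 0; i < iters; ++i) {
            res = res.concat(Integer.toString(i));
        }
        return res;
    }

    static String viaPlus(int iters) {
        String res = "";
        for (int i = 0; i < iters; ++i) {
            res += Integer.toString(i);
        }
        return res;
    }

    static String viaBuilder(int iters) {
        final StringBuilder sb = new StringBuilder();
        for (int i = 0; i < iters; ++i) {
            sb.append(Integer.toString(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        final int iters = 20000; // assumption: smaller than the article's 100,000
        final long t0 = System.nanoTime();
        final String a = viaConcat(iters);
        final long t1 = System.nanoTime();
        final String b = viaPlus(iters);
        final long t2 = System.nanoTime();
        final String c = viaBuilder(iters);
        final long t3 = System.nanoTime();

        System.out.println("String.concat:        " + (t1 - t0) / 1000000 + " ms");
        System.out.println("+ operator:           " + (t2 - t1) / 1000000 + " ms");
        System.out.println("StringBuilder.append: " + (t3 - t2) / 1000000 + " ms");
        System.out.println("same result: " + (a.equals(b) && b.equals(c)));
    }
}
```

The timings it prints depend on the machine and the JVM; the figures measured for the article are the ones in the table above.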

The measured results are obvious: an O(n) algorithm is of course much faster than O(n^2) algorithms. But in real life we have a lot of + operators in our programs, because they are more convenient. To deal with this, the -XX:+OptimizeStringConcat option was introduced in Java 6 update 20. It was turned on by default somewhere between Java 7_02 and Java 7_15 (and it is still off by default in Java 6_41), so you may have to turn it on explicitly. Like many other -XX options, it is extremely poorly documented:

Optimize String concatenation operations where possible. (Introduced in Java 6 Update 20)

Let’s just assume that Oracle engineers did their best with this option. Anecdotal evidence suggests that it replaces some of the generated StringBuilder logic with logic similar to the String.concat implementation: it creates a char[] of the appropriate length for all concatenated values and copies them into that output array. After that it creates the result String. Nested concatenations are probably supported as well ( str1 + ( str2 + str3 ) + str4 ). Running our test with this option shows that the time for the + operator becomes very similar to that of String.concat:

String.concat    + operator    StringBuilder.append
10.19 sec        10.722 sec    0.013 sec
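When repeating such a measurement it helps to confirm that the flag really reached the JVM, for example after starting it as java -XX:+OptimizeStringConcat followed by your main class. The sketch below (hypothetical class and method names) inspects the JVM input arguments through the standard RuntimeMXBean. Note its limitation: it only detects the flag when it was passed explicitly and says nothing about the default value on a given release.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class StringConcatFlagCheck {
    static final String FLAG = "-XX:+OptimizeStringConcat";

    // Returns true when the given JVM argument list contains the flag verbatim.
    static boolean isRequested(List<String> jvmArgs) {
        return jvmArgs.contains(FLAG);
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the main method.
        final List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(FLAG + " passed explicitly: " + isRequested(jvmArgs));
    }
}
```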

Let’s run one more test for this option. As noted before, the default StringBuilder constructor allocates a 16-character buffer. The buffer is expanded when we need to add the 17th character to it. Let’s append each number between 100 and 100,000 to the “12345678901234” string. As a result we will have strings 17 to 20 characters long, so the default + operator implementation will require StringBuilder resizing. For comparison, let’s make another test in which we explicitly create a StringBuilder(21) to ensure that its buffer will not be resized:

final String s = BASE + i;
final String s = new StringBuilder( 21 ).append( BASE ).append( i ).toString();

Without this option, the time for the + implementation is about 45% higher than the time for the explicit StringBuilder implementation. Turning this option on makes both results equal. But what’s more interesting, even the explicit StringBuilder implementation gets faster with it!

                         option off    option on
+ operator               0.958 sec     0.494 sec
new StringBuilder(21)    0.663 sec     0.494 sec
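The sizes involved in this test can be verified with a small sketch. BASE matches the 14-character string from the text; the class and method names are hypothetical. It checks that both variants build the same string and that a StringBuilder(21) keeps its capacity even for the longest value.

```java
final class PresizedBuilderDemo {
    static final String BASE = "12345678901234"; // 14 characters, as in the article

    // The + operator variant.
    static String viaPlus(int i) {
        return BASE + i;
    }

    // The explicitly presized variant.
    static String viaPresized(int i) {
        return new StringBuilder(21).append(BASE).append(i).toString();
    }

    // Capacity of a StringBuilder(21) after both appends;
    // it is still 21 when no resize was needed.
    static int capacityAfterAppends(int i) {
        return new StringBuilder(21).append(BASE).append(i).capacity();
    }

    public static void main(String[] args) {
        System.out.println(viaPlus(100).length());        // 17
        System.out.println(viaPlus(100000).length());     // 20
        System.out.println(viaPresized(100000));          // 12345678901234100000
        System.out.println(capacityAfterAppends(100000)); // 21
    }
}
```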

Summary

  • Never use concatenation with an empty string "" as a “to string conversion”. Use the appropriate String.valueOf or wrapper type toString(value) methods instead.
  • Whenever possible, use StringBuilder for string concatenation. Check old code and get rid of StringBuffer if possible.
  • Use the -XX:+OptimizeStringConcat option introduced in Java 6 update 20 in order to improve string concatenation performance. It is turned on by default in recent Java 7 releases, but it is still turned off in Java 6_41.
