Category Archives: Trivial

String switch implementation

by Mikhail Vorontsov

This article covers the implementation details of String switch introduced in Java 7. It is a syntactic sugar on top of the normal switch operator.

Suppose you have the following method:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
public int switchTest( final String s )
{
    switch ( s )
    {
        case "a" :
            System.out.println("aa");
            return 11;
        case "b" :
            System.out.println("bb");
            return 22;
        default :
            System.out.println("cc");
            return 33;
    }
}
public int switchTest( final String s )
{
    switch ( s )
    {
        case "a" :
            System.out.println("aa");
            return 11;
        case "b" :
            System.out.println("bb");
            return 22;
        default :
            System.out.println("cc");
            return 33;
    }
}

It is converted by javac into the following code (decompiled back into Java):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
public int switchTest(String var1) {
    byte var3 = -1;
    switch(var1.hashCode()) {
    case 97:
        if(var1.equals("a")) {
            var3 = 0;
        }
        break;
    case 98:
        if(var1.equals("b")) {
            var3 = 1;
        }
    }
 
    switch(var3) {
    case 0:
        System.out.println("aa");
        return 11;
    case 1:
        System.out.println("bb");
        return 22;
    default:
        System.out.println("cc");
        return 33;
    }
}
public int switchTest(String var1) {
    byte var3 = -1;
    switch(var1.hashCode()) {
    case 97:
        if(var1.equals("a")) {
            var3 = 0;
        }
        break;
    case 98:
        if(var1.equals("b")) {
            var3 = 1;
        }
    }

    switch(var3) {
    case 0:
        System.out.println("aa");
        return 11;
    case 1:
        System.out.println("bb");
        return 22;
    default:
        System.out.println("cc");
        return 33;
    }
}

The generated code consists of 2 parts:

  • Translation from String into a distinct int for each case, which is implemented in the first switch statement.
  • The actual switch based on int-s.

The first switch contains a case for each distinct String.hashCode in the original String switch labels. After matching by hash code, a string is compared for equality to every string with the same hash code. It is pretty unlikely that 2 strings used in switch labels will have the same hash code, so in most cases you will end up with exactly one String.equals call.

After seeing the generated code, it becomes clear why you can not use null as a switch label: the first switch starts from calculating the hashCode of the switch argument.

What can we say about the performance of the underlying int switch? As you can find in one of my earlier articles, a switch is implemented as a fixed map with a table size of approximately 20 (which is fine for most of common cases).

Finally, we should note that String.hashCode implementation has implicitly became the part of the Java Language Specification after it was used in the String switch implementation. It can no longer be changed without breaking the .class files containing String switch, which were compiled with the older versions of Java.

String switch performance

by Mikhail Vorontsov

Suppose we have a large number of commands. For simplicity of writing this article, they all would be implemented as methods of one class. We should be able to call any of these commands by a string name. We will allow case-insensitive calls. Our “class with commands” would look like:

1
2
3
4
5
6
7
8
9
10
public class ObjectWithCommands {
    public Object Command1( final Object arg ) { return arg; }
    public Object Command2( final Object arg ) { return arg; }
    ...
    public Object Command9( final Object arg ) { return arg; }
    public Object Command10( final Object arg ) { return arg; }
    ...
    public Object Command99( final Object arg ) { return arg; }
    public Object Command100( final Object arg ) { return arg; }
}
public class ObjectWithCommands {
    public Object Command1( final Object arg ) { return arg; }
    public Object Command2( final Object arg ) { return arg; }
    ...
    public Object Command9( final Object arg ) { return arg; }
    public Object Command10( final Object arg ) { return arg; }
    ...
    public Object Command99( final Object arg ) { return arg; }
    public Object Command100( final Object arg ) { return arg; }
}

This article will check the performance of various ways of calling these commands.

But first of all, a quiz 🙂 Suppose you are going to call your commands using the following class:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
public class EqualsIgnoreCaseCaller {
    public static Object call( final ObjectWithCommands obj, final String commandName, final Object arg )
    {
        if ( commandName.equalsIgnoreCase( "Command1" ) )
            return obj.Command1( arg );
        if ( commandName.equalsIgnoreCase( "Command2" ) )
            return obj.Command2( arg );
        ...
        if ( commandName.equalsIgnoreCase( "Command99" ) )
            return obj.Command99( arg );
        if ( commandName.equalsIgnoreCase( "Command100" ) )
            return obj.Command100( arg );
    }
}
public class EqualsIgnoreCaseCaller {
    public static Object call( final ObjectWithCommands obj, final String commandName, final Object arg )
    {
        if ( commandName.equalsIgnoreCase( "Command1" ) )
            return obj.Command1( arg );
        if ( commandName.equalsIgnoreCase( "Command2" ) )
            return obj.Command2( arg );
        ...
        if ( commandName.equalsIgnoreCase( "Command99" ) )
            return obj.Command99( arg );
        if ( commandName.equalsIgnoreCase( "Command100" ) )
            return obj.Command100( arg );
    }
}

Which of the following method calls will be the fastest (after warmup)?

  1. EqualsIgnoreCaseCaller.call( obj, "Command9", arg );
  2. EqualsIgnoreCaseCaller.call( obj, "Command99", arg );
  3. EqualsIgnoreCaseCaller.call( obj, "Command100", arg );

Continue reading

Charset encoding and decoding in Java 7/8

by Mikhail Vorontsov

We will take a look at Charset encoder and decoder performance in Java 7/8. We will check the performance of the 2 following String methods with various charsets:

1
2
3
4
5
/* String to byte[] */
public byte[] getBytes(Charset charset);
/* byte[] to String */
public String(byte bytes[], Charset charset);
    
/* String to byte[] */
public byte[] getBytes(Charset charset);
/* byte[] to String */
public String(byte bytes[], Charset charset);
    

I have translated phrase “Develop with pleasure” into German, Russian, Japanese and traditional Chinese using Google Translate. We will build chunks of given size from these phrases by concatenating them using “\n” as a separator until we reach the required length (in most cases the result will be slightly longer). After that we will convert 100 million characters of such data to byte[] and back to String (100M is the total data length in Java chars). We will convert data 10 times in order to make results more reliable (as a result, times in the following table are the times to convert 1 billion chars).

We will use 2 chunk sizes: 100 chars to test the performance of short string conversion and 100M chars to test the raw conversion performance. You can find the source code at the end of this article. We will compare encoding into a “national” charset (US-ASCII for English; ISO-8859-1 for German; windows-1251 for Russian; Shift_JIS for Japanese; GB18030 for Traditional Chinese) with UTF-8 encoding, which could be used as a universal encoding (at a possible price of a longer binary representation). We will also compare Java 7u51 with Java 8 (original release). All tests were run on my Xeon-2650 (2.8Ghz) workstation with -Xmx32G setting (to avoid GC).

Here are the test results. Each cell has two times: Java7_time (Java8_time). “UTF-8” line, which follows every “national” charset line contains conversion times for the data from the previous line (for example, the last line contains times to encode/decode a string in the traditional Chinese into UTF-8).

Continue reading

String.intern in Java 7 and 8 – part 3

by Mikhail Vorontsov

I want to return to String.intern method I have discussed earlier ( part 1, part 2 ). During the last couple of months I have used interning heavily in my pet project in order to see the pros and cons of using String.intern for every non-temporary string in my application (non-temporary – the one which is expected to live longer than for a few seconds and most likely get into the GC old generation).

As I have written before, the advantages of String.intern in Java 7 and 8 are:

  • Rather fast execution, nearly no performance loss in the multithreaded mode (still using the global string pool).
  • Saving memory, thus allowing your dataset to shrink, which will let your application to run faster (in general).

The main disadvantages of this method (as I have noticed before) are:

  • The requirement to set -XX:StringTableSize=N JVM parameter in advance and dealing with its fixed value (needs JVM restart in order to expand the JVM string pool).
  • Adding calls to String.intern in a lot of places globally (probably via your own wrapper) – which has linear code maintenance cost.

After a couple of months of using String.intern in my project, I think that its usage should be limited to the fields having a limited domain of distinct values (like person first names or state/province names). We should not try to intern anything with a low probability of a repeating value – it would be a waste of CPU time.

Continue reading

A possible memory leak in the manual MultiMap implementation

by Mikhail Vorontsov

Pure Java 6 and 7

In this short article I will describe a junior level memory leak I have recently seen in a pure JDK application (no 3rd party libs were allowed).

Suppose you have a map from String identifiers to some Collections, for example a set of String properties of such identifiers: Map<String, Set<String>>. The actual type of the inner collection does not matter for this article – it should just be a Collection. Such collections are generally called multimaps.

The following method was initially written to obtain the inner set by its identifier:

1
2
3
4
5
6
7
8
9
10
11
12
private final Map<String, Set<String>> m_ids = new HashMap<String, Set<String>>( 4 );
 
private Set<String> getSetByNameFaulty( final String id )
{
    Set<String> res = m_ids.get( id );
    if ( res == null )
    {
        res = new HashSet<String>( 1 );
        m_ids.put( id, res );
    }
    return res;
}
private final Map<String, Set<String>> m_ids = new HashMap<String, Set<String>>( 4 );

private Set<String> getSetByNameFaulty( final String id )
{
    Set<String> res = m_ids.get( id );
    if ( res == null )
    {
        res = new HashSet<String>( 1 );
        m_ids.put( id, res );
    }
    return res;
}

This method checks if an identifier is already present in the map and either returns the corresponding Set or allocates a new one and adds it into the map. This method is useful for populating our map:

1
2
3
4
5
6
7
private void populateJava67()
{
    getSetByNameFaulty( "id1" ).add( "property1" );
    getSetByNameFaulty( "id1" ).add( "property2" );
    getSetByNameFaulty( "id1" ).add( "property3" );
    getSetByNameFaulty( "id2" ).add( "property1" );
}
private void populateJava67()
{
    getSetByNameFaulty( "id1" ).add( "property1" );
    getSetByNameFaulty( "id1" ).add( "property2" );
    getSetByNameFaulty( "id1" ).add( "property3" );
    getSetByNameFaulty( "id2" ).add( "property1" );
}

The next step while writing a program would be to add some accessors to our map, like the following one:

1
2
3
4
private boolean hasPropertyFaulty( final String id, final String property )
{
    return getSetByNameFaulty( id ).contains( property );
}
private boolean hasPropertyFaulty( final String id, final String property )
{
    return getSetByNameFaulty( id ).contains( property );
}

This method looks good and is unlikely to be caught by any code quality tools. Unfortunately, it has a major flaw: if you will query properties of unknown identifier, a new empty set will be allocated in our map inside getSetByNameFaulty method. This is definitely a not wanted side effect. Instead we should let our new getSetByName method know if we want to write something to the returned set:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
private Set<String> getSetByName( final String id, final boolean isWrite )
{
    Set<String> res = m_ids.get( id );
    if ( res == null )
    {
        if ( isWrite )
        {
            res = new HashSet<String>( 1 );
            m_ids.put( id, res );
        }
        else
            res = Collections.emptySet();
    }
    return res;
}
 
private boolean hasProperty( final String id, final String property )
{
    return getSetByName( id, false ).contains( property );
}
 
private void populateJava67Better()
{
    getSetByName( "id1", true ).add( "property1" );
    getSetByName( "id1", true ).add( "property2" );
    getSetByName( "id1", true ).add( "property3" );
    getSetByName( "id2", true ).add( "property1" );
}
private Set<String> getSetByName( final String id, final boolean isWrite )
{
    Set<String> res = m_ids.get( id );
    if ( res == null )
    {
        if ( isWrite )
        {
            res = new HashSet<String>( 1 );
            m_ids.put( id, res );
        }
        else
            res = Collections.emptySet();
    }
    return res;
}

private boolean hasProperty( final String id, final String property )
{
    return getSetByName( id, false ).contains( property );
}

private void populateJava67Better()
{
    getSetByName( "id1", true ).add( "property1" );
    getSetByName( "id1", true ).add( "property2" );
    getSetByName( "id1", true ).add( "property3" );
    getSetByName( "id2", true ).add( "property1" );
}

Continue reading

How to iterate zip file records (java.util.zip.ZipFile, java.util.zip.ZipInputStream)

by Mikhail Vorontsov

The right way to iterate a zip file

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
final ZipFile file = new ZipFile( FILE_NAME );
try
{
    final Enumeration<? extends ZipEntry> entries = file.entries();
    while ( entries.hasMoreElements() )
    {
        final ZipEntry entry = entries.nextElement();
        System.out.println( entry.getName() );
        //use entry input stream:
        readInputStream( file.getInputStream( entry ) )
    }
}
finally
{
    file.close();
}
    
private static int readInputStream( final InputStream is ) throws IOException {
    final byte[] buf = new byte[ 8192 ];
    int read = 0;
    int cntRead;
    while ( ( cntRead = is.read( buf, 0, buf.length ) ) >=0  )
    {
        read += cntRead;
    }
    return read;
}
final ZipFile file = new ZipFile( FILE_NAME );
try
{
    final Enumeration<? extends ZipEntry> entries = file.entries();
    while ( entries.hasMoreElements() )
    {
        final ZipEntry entry = entries.nextElement();
        System.out.println( entry.getName() );
        //use entry input stream:
        readInputStream( file.getInputStream( entry ) )
    }
}
finally
{
    file.close();
}
    
private static int readInputStream( final InputStream is ) throws IOException {
    final byte[] buf = new byte[ 8192 ];
    int read = 0;
    int cntRead;
    while ( ( cntRead = is.read( buf, 0, buf.length ) ) >=0  )
    {
        read += cntRead;
    }
    return read;
}

Continue reading

Primitive types to String conversion and String concatenation

by Mikhail Vorontsov

Primitive types to String conversion

From time to time you may need to create a string in your program from several values, some of them may be of primitive types. If you have two or more primitive type values in the beginning of your string concatenation, you need to explicitly convert first of them to a string (otherwise System.out.println( 1 + 'a' ) will print ’98’, but not ‘1a’). Of course, there is a family of String.valueOf methods (or corresponding wrapper type methods), but who needs them if there is another way which requires less typing?

Concatenating an empty string literal and the first of your primitive type variables (in our example, "" + 1) is the easiest idea. Result of this expression is a String and after that you can safely concatenate any primitive type values to it – compiler will take care of all implicit conversions to String.

Unfortunately, this is the worst way one can imagine. In order to understand why it is so, we need to review how string concatenation operator is translated in Java. If we have a String value (doesn’t matter which sort of it – literal or variable or method call) followed by + operator followed by any type expression:

1
string_exp + any_exp
string_exp + any_exp

Java compiler will translate it to:

1
new StringBuilder().append( string_exp ).append( any_exp ).toString()
new StringBuilder().append( string_exp ).append( any_exp ).toString()

If you have more than one + operator in the expression, you will end up with several StringBuilder.append calls before final toString call.

Continue reading

Java varargs performance issues

by Mikhail Vorontsov

One of Java 5 new features is variable length arguments lists. They could be useful if you have to provide several arguments of the same type to your method. For example, if prior to Java 5 you had to write a method which prints all its arguments to a console, it would have been:

1
2
3
4
5
public static void printAll( final Object[] args )
{
    for ( int i = 0; i < args.length; ++i )
        System.out.println( args[ i ] );
}
public static void printAll( final Object[] args )
{
    for ( int i = 0; i < args.length; ++i )
        System.out.println( args[ i ] );
}

And its call would look like:

1
printAll( new Object[] { new Integer( 75 ), new Date(), "String" + 50 } );
printAll( new Object[] { new Integer( 75 ), new Date(), "String" + 50 } );

In Java 5 varargs support was added to the language. The same method now looks much simpler:

1
2
3
4
5
public static void printAllNew( final Object... args )
{
  for ( final Object arg : args )
      System.out.println( arg );
}
public static void printAllNew( final Object... args )
{
  for ( final Object arg : args )
      System.out.println( arg );
}

Its call also became shorter (due to the added variable length argument list support):

1
printAllNew( 75, new Date(), "String" + 50 );
printAllNew( 75, new Date(), "String" + 50 );

Continue reading

Java collections overview

by Mikhail Vorontsov

This article will give you an overview of all standard Java collections. We will categorize their distinguishable properties and main use cases. Besides that, we will list all correct ways of transforming your data between various collection types.

Arrays

Array is the only collection type built in Java. It is useful when you know an upper bound on the number of processed elements in advance. java.util.Arrays contains a lot of useful methods for array processing:

  • Arrays.asList – conversion from array to List, which could be passed to other standard collection constructors.
  • Arrays.binarySearch – fast lookup in a sorted array or its subsection.
  • Arrays.copyOf – use this method if you need to expand your array while keeping its contents.
  • Arrays.copyOfRange – if you need to make a copy of the whole array or its subsection.
  • Arrays.deepEquals, Arrays.deepHashCode – versions of Arrays.equals/hashCode supporting nested sub-arrays.
  • Arrays.equals – if you need to compare two arrays for equality, use this method instead of array equals method ( array.equals is not overridden in any array, so it only compares references to arrays, rather than their contents ). This method may be combined with Java 5 boxing and varargs in order to write a simple implementation of your class equals method – just pass all your class fields to Arrays.equals after comparing object types.
  • Arrays.fill – populate the whole array or its subsection with a given value.
  • Arrays.hashCode – useful method for calculating a hashcode of array contents ( array own hashcode method can not be used for this purpose ). This method may be combined with Java 5 boxing and varargs in order to write a simple implementation of your class hashcode method – just pass all your class fields to Arrays.hashcode.
  • Arrays.sort – sort the full array or its subsection using natural ordering. There is also a pair of Arrays.sort methods for sorting an Object[] using a provided Comparator.
  • Arrays.toString – fine-print the array contents.

If you need to copy one part of array (or the whole array) into another already existing array, you need to use System.arraycopy, which copies a given number of elements from a given position in the source array to a given position in the destination array. Generally, this is the fastest way to copy array contents in Java (but in some cases you may need to check if ByteBuffer bulk copy works faster ).

Finally, we need to mention that any Collection could be copied into an array using T[] Collection.toArray( T[] a ) method. The usual pattern of this method call:

1
return coll.toArray( new T[ coll.size() ] );
return coll.toArray( new T[ coll.size() ] );

Such method call allocates a sufficient array for storing the whole collection, so that toArray doesn’t have to allocate sufficiently large array to return.

Single-threaded collections

This part of the article describes non-thread-safe collections. All these collections are stored in the java.util package. Some of these collections were present in Java since Java 1.0 (and are now deprecated), most of them were already present in Java 1.4. Enum collections were added in Java 1.5 with the support of generics in all collection classes. PriorityQueue was also added in Java 1.5. The latest addition to the non-thread-safe collections framework is ArrayDeque, which was added in Java 1.6.

Lists

  • ArrayList – the most useful List implementation. Backed by an array and an int – position of the first not used element in the array. Like all Lists, expands itself when necessary. Has constant element access time. Cheap updates at the tail (constant complexity), expensive at the head (linear complexity) due to ArrayList invariant – all elements start from index = 0 in the underlying array, which means that everything to the right from the update position must be moved to the right for insertions and to the left for removals. CPU-cache friendly collection due to being backed by an array (unfortunately, not too friendly, because contains Objects, which are just pointers to the actual objects).
  • LinkedListDeque implementation – each Node consists of a value, prev and next pointers. It means that element access/updates have linear complexity (due to an optimization, these methods do not traverse more than a half of the list, so the most expensive elements are located in the middle of the list). You need to use ListIterators if you want to try to write fast LinkedList code. If you want a Queue/Deque implementation (you need to access only first and last elements) – consider using ArrayDeque instead.
  • Vector – a prehistoric version of ArrayList with all synchronized methods. Use ArrayList instead.

Continue reading

hashCode method performance tuning

by Mikhail Vorontsov

In this chapter we will discuss various implications of hashCode method implementation on application performance.

The main purpose of hashCode method is to allow an object to be a key in the hash map or a member of a hash set. In this case an object should also implement equals(Object) method, which is consistent with hashCode implementation:

  • If a.equals(b) then a.hashCode() == b.hashCode()
  • If hashCode() was called twice on the same object, it should return the same result provided that the object was not changed

hashCode from performance point of view

From the performance point of view, the main objective for your hashCode method implementation is to minimize the number of objects sharing the same hash code. All JDK hash based collections store their values in an array. Hash code is used to calculate an initial lookup position in this array. After that equals is used to compare given value with values stored in the internal array. So, if all values have distinct hash codes, this will minimize the possibility of hash collisions. On the other hand, if all values will have the same hash code, hash map (or set) will degrade into a list with operations on it having O(n2) complexity.

For more details, read about collision resolution in hash maps. JDK is using a method called open addressing, but there is another method called “chaining” – all key-value pairs with the same hash code are stored in a linked list.

Continue reading