Tag Archives: cpu optimization

String.intern in Java 6, 7 and 8 – multithreaded access

by Mikhail Vorontsov

This article follows the initial String.intern in Java 6, 7 and 8 – string pooling article, which describes the implementation and benefits of using the String.intern() method in Java 7 and 8. The original article was already getting too long, so I wrote this one to describe the performance characteristics of multithreaded access to String.intern().

The tests we will perform today call String.intern() from multiple threads. They emulate the behavior of many modern server applications (for example, specialized web crawlers). For the tests I will use my new workstation with an Intel Xeon E5-2650 CPU (8 physical, 16 virtual cores @ 2 GHz) and 128 GB RAM. It will allow us to test rather high contention scenarios. In this article we will create 8 threads in order to utilize all physical cores.

There will be 4 tests:

  1. The baseline – a single thread calling String.intern() using the testLongLoop method from the previous article. It will show us how fast this configuration is without any contention.
  2. All 8 threads calling String.intern() with unique values – each thread adds its own prefix to the interned string (a minimal sketch of this test is shown after the list). This test will show us the synchronization overhead of String.intern(). It should be the theoretical worst case: it is highly unlikely that the only thing an actual application does is call String.intern() in a loop from many threads.
  3. Initially we start the first thread interning a set of strings. After a 2 second delay we start a second thread interning the same set of strings. We expect the following assumptions to hold: str.intern() == str for the first thread; str.intern() != str for the second thread. This will allow us to prove that there are no thread-local JVM string pools.
  4. All 8 threads interning the same set of values. This scenario is closer to a real situation – it provides a rather likely mix of adding strings to the JVM pool and querying strings from it. Nevertheless, such high read contention on the JVM string pool is still an unlikely event.
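
The following is a minimal sketch of the second test (the class name, prefix format and iteration count are my assumptions, not the exact benchmark code from the article):

import java.util.ArrayList;
import java.util.List;

public class InternContentionTest {
    private static final int THREADS = 8;
    private static final int ITERATIONS = 1_000_000; // assumed iteration count

    public static void main( String[] args ) throws InterruptedException {
        final List<Thread> threads = new ArrayList<>();
        final long start = System.currentTimeMillis();
        for ( int t = 0; t < THREADS; ++t ) {
            final String prefix = "thread" + t + '_'; // unique per-thread prefix
            final Thread thread = new Thread( () -> {
                for ( int i = 0; i < ITERATIONS; ++i )
                    ( prefix + i ).intern(); // every value is unique - pure write contention
            } );
            thread.start();
            threads.add( thread );
        }
        for ( final Thread thread : threads )
            thread.join();
        System.out.println( "Time = " + ( System.currentTimeMillis() - start ) / 1000.0 + " sec" );
    }
}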

Continue reading

String.intern in Java 6, 7 and 8 – string pooling

by Mikhail Vorontsov

This article will describe how the String.intern method was implemented in Java 6 and what changed in its implementation in Java 7 and Java 8.

First of all I want to thank Yannis Bres for inspiring me to write this article.

07 June 2014 update: added 60013 as a default string pool size since Java 7u40 (instead of Java 8), added -XX:+PrintFlagsFinal and -XX:+PrintStringTableStatistics JVM parameter references, cleaned up a few typos.

This is an updated version of the article, now including a description of the -XX:StringTableSize=N JVM parameter. It is followed by the String.intern in Java 6, 7 and 8 – multithreaded access article, which describes the performance characteristics of multithreaded access to String.intern().

String pooling

String pooling (aka string canonicalisation) is the process of replacing several String objects with equal values but different identities with a single shared String object. You can achieve this goal by keeping your own Map<String, String> (with possibly soft or weak references, depending on your requirements) and using the map values as canonicalised values. Or you can use the String.intern() method, which is provided to you by the JDK.
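
A manual pool may look roughly like the following sketch (it uses ConcurrentHashMap with strong references; a production version may prefer soft or weak references, and the class name is illustrative):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class StringPool {
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    /** Returns the canonical instance equal to the given string. */
    public String canonicalize( final String str ) {
        final String existing = pool.putIfAbsent( str, str );
        return existing != null ? existing : str;
    }
}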

In the days of Java 6 using String.intern() was forbidden by many coding standards due to the high risk of getting an OutOfMemoryError if pooling went out of control. The Oracle Java 7 implementation of string pooling was changed considerably. You can look for details in http://bugs.sun.com/view_bug.do?bug_id=6962931 and http://bugs.sun.com/view_bug.do?bug_id=6962930.

String.intern() in Java 6

In those good old days all interned strings were stored in the PermGen – the fixed-size part of the heap mainly used for storing loaded classes and the string pool. Besides explicitly interned strings, the PermGen string pool also contained all string literals previously used in your program (the important word here is used – if a class or method was never loaded/called, any constants defined in it will not be loaded).

The biggest issue with such a string pool in Java 6 was its location – the PermGen. PermGen has a fixed size and cannot be expanded at runtime. You can set it using the -XX:MaxPermSize=N option. As far as I know, the default PermGen size varies between 32M and 96M depending on the platform. You can increase its size, but it will still be fixed. This limitation required very careful usage of String.intern – you'd better not intern any uncontrolled user input using this method. That's why string pooling in the Java 6 days was mostly implemented with manually managed maps.

String.intern() in Java 7

Oracle engineers made an extremely important change to the string pooling logic in Java 7 – the string pool was relocated to the heap. It means that you are no longer limited by a separate fixed-size memory area. All strings are now located in the heap, like most other ordinary objects, which allows you to manage only the heap size while tuning your application. Technically, this alone could be a sufficient reason to reconsider using String.intern() in your Java 7 programs. But there are other reasons.
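
The basic contract did not change in Java 7: string literals are pooled automatically and intern() returns the canonical instance. A tiny illustration (the class name is mine):

public class InternDemo {
    public static void main( String[] args ) {
        final String literal = "hello";              // literals are placed in the pool automatically
        final String copy = new String( literal );   // a distinct object with an equal value
        System.out.println( literal == copy );            // false - different identities
        System.out.println( literal == copy.intern() );   // true - intern() returns the pooled instance
    }
}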

Continue reading

A few more memory saving techniques in Java

by Mikhail Vorontsov

This article lists a few more ideas on cheap memory saving techniques in Java. It follows the An overview of memory saving techniques in Java, Memory consumption of popular Java data types – part 1 and Memory consumption of popular Java data types – part 2 articles published earlier in this blog. I strongly recommend that you read An overview of memory saving techniques in Java before reading this article. This article assumes that the size of an Object reference is 4 bytes.

Static inner classes

Java lacks proper library tuple support like the one Scala has. This forces Java developers to create a lot of classes, usually inner ones, whose only purpose is to compose a few elements into a single entity for a collection. One really popular example of such an inner class is Pair, which is often implemented as a generic class.

Non-static inner classes have an important property – they store a reference to an instance of their enclosing class. This reference allows them to seamlessly call methods and access fields of the enclosing instance. Such a reference is often not needed for simple tuple-like classes. You can get rid of it by marking your class as static. This will save you 4 bytes and increase its chances of occupying 8 bytes less (see An overview of memory saving techniques in Java for more details).

You should generally mark all your inner classes as static unless there is a real need to remove that qualifier.
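
A sketch of a tuple-like inner class with the static qualifier applied (the enclosing class and field names are illustrative):

public class Cache {
    /** static: no hidden reference to the enclosing Cache instance is stored. */
    private static class Pair<A, B> {
        private final A first;
        private final B second;

        private Pair( final A first, final B second ) {
            this.first = first;
            this.second = second;
        }
    }
}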

Continue reading

java.io.BufferedInputStream and java.util.zip.GZIPInputStream

by Mikhail Vorontsov

BufferedInputStream and GZIPInputStream are two classes often used when reading data from a file (the latter is widely used, at least on Linux). Buffering input data is generally a good idea, which has been described in many Java performance articles. Still, there are a couple of issues worth knowing about these streams.

When not to buffer

Buffering is done in order to reduce the number of separate read operations issued to an input device. Many developers forget about this and always wrap an InputStream in a BufferedInputStream, like:

final InputStream is = new BufferedInputStream( new FileInputStream( file ) );

The short rule on whether to use buffering or not is the following: you don't need it if your data blocks are long enough (100K+) and you can process blocks of any length (you don't need a guarantee that at least N bytes are available in the buffer before starting processing). In all other cases you need to buffer the input data.
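
Conversely, when data is read in small pieces – for example, primitive by primitive via DataInputStream – a buffer is essential. A sketch (the file name parameter and record count are illustrative):

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class ReadInts {
    public static long sumInts( final String fileName, final int count ) throws IOException {
        // without BufferedInputStream every readInt() would issue a tiny read request to the OS
        try ( DataInputStream is = new DataInputStream(
                new BufferedInputStream( new FileInputStream( fileName ), 65536 ) ) ) {
            long sum = 0;
            for ( int i = 0; i < count; ++i )
                sum += is.readInt();
            return sum;
        }
    }
}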

The simplest example of when you don't need buffering is a manual file copy:

public static void copyFile( final File from, final File to ) throws IOException {
    final InputStream is = new FileInputStream( from );
    try
    {
        final OutputStream os = new FileOutputStream( to );
        try
        {
            final byte[] buf = new byte[ 8192 ];
            int read = 0;
            while ( ( read = is.read( buf ) ) != -1 )
            {
                os.write( buf, 0, read );
            }
        }
        finally {
            os.close();
        }
    }
    finally {
        is.close();
    }
}

Continue reading

Java logging performance pitfalls

by Mikhail Vorontsov

In this article we will discuss possible performance issues with Java loggers. We will not discuss any java.util.logging.Handler implementations, because they are actually based on the I/O classes discussed in other articles. Here we will discuss how to use java.util.logging.Logger in order not to write slow Java logging code.

Logging arguments handling

The most important thing about Logger performance can be found in its JavaDoc:

On each logging call the Logger initially performs a cheap check of the request level (e.g. SEVERE or FINE) against the effective log level of the logger. If the request level is lower than the log level, the logging call returns immediately.
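
The practical consequence: do not build the log message eagerly when the record is likely to be discarded. A sketch of the pitfall and of two cheap ways around it (the class and method names are illustrative):

import java.util.logging.Level;
import java.util.logging.Logger;

public class LoggingDemo {
    private static final Logger LOGGER = Logger.getLogger( LoggingDemo.class.getName() );

    public static void process( final Object request ) {
        // Bad: the message is concatenated even if FINE is disabled
        LOGGER.fine( "Processing request: " + request );

        // Better: the Logger formats the message only if the record is actually published
        LOGGER.log( Level.FINE, "Processing request: {0}", request );

        // Or guard an expensive message explicitly
        if ( LOGGER.isLoggable( Level.FINE ) )
            LOGGER.fine( "Processing request: " + request );
    }
}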

Continue reading

java.io.ByteArrayOutputStream

by Mikhail Vorontsov

Do not use ByteArrayOutputStream in performance critical code

Very important: you will rarely need ByteArrayOutputStream in performance critical code. If you still think you may need it – read the rest of the article.

ByteArrayOutputStream is mostly used when you write a method which writes some sort of message of unknown length to some output stream (are there really many cases when you can't calculate the size of your message?).

Important: if you know your message size in advance (or at least know an upper limit for it) – allocate a ByteBuffer instead (or reuse a previously allocated one) and write the message into it. It works faster than ByteArrayOutputStream (read the Various methods of binary serialization in Java article).
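
A sketch of the ByteBuffer approach, assuming an upper bound of the message size is known (the class name, fields and size limit are illustrative):

import java.nio.ByteBuffer;

public class MessageWriter {
    private static final int MAX_MESSAGE_SIZE = 1024; // assumed upper limit

    public static byte[] encode( final int id, final long timestamp ) {
        final ByteBuffer buf = ByteBuffer.allocate( MAX_MESSAGE_SIZE );
        buf.putInt( id );
        buf.putLong( timestamp );
        buf.flip();
        final byte[] result = new byte[ buf.remaining() ];
        buf.get( result );
        return result;
    }
}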

ByteArrayOutputStream allows you to write anything to an internal expandable byte array and use that array as a single piece of output afterwards. The default buffer size is 32 bytes, so if you expect to write something longer, provide an explicit buffer size to the ByteArrayOutputStream(int) constructor.

In most cases ByteArrayOutputStream is used either when you are writing a callback method and the caller provides you with some OutputStream whose nature is unknown, or when you are writing some “message to byte array” serialization method. The second case is covered in the Various methods of binary serialization in Java article.

From the above-mentioned article you will know that ByteArrayOutputStream is synchronized, which seriously impacts its performance. So, if you don’t need synchronization, go to the JDK sources, copy the class contents into your project and remove all synchronization from it (and forget that it was my advice! 🙂 ). This will make it a bit faster…

Continue reading

java.util.IdentityHashMap

by Mikhail Vorontsov

IdentityHashMap is the only map in the JDK useful for tracking objects by identity (an object is identical only to itself) instead of by equality (if object1.equals( object2 ), they are equal). IdentityHashMap is mostly used in cases when you can’t modify the definition of the tracked objects by either adding additional fields or writing equals and hashCode methods.
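
A tiny illustration of the difference between identity-based and equality-based tracking (the class name is mine):

import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

public class IdentityDemo {
    public static void main( String[] args ) {
        final String a = new String( "key" );
        final String b = new String( "key" ); // equal to a, but a different object

        final Map<String, Integer> byEquality = new HashMap<>();
        byEquality.put( a, 1 );
        byEquality.put( b, 2 );               // replaces the previous mapping

        final Map<String, Integer> byIdentity = new IdentityHashMap<>();
        byIdentity.put( a, 1 );
        byIdentity.put( b, 2 );               // kept as a separate entry

        System.out.println( byEquality.size() );  // 1
        System.out.println( byIdentity.size() );  // 2
    }
}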

Let’s see how some of these limitations could be removed:

  • If you can’t add equals and hashCode to the class (or it has different implementations of these methods), it could be solved by Trove maps custom hashing strategies.
  • If there is a primary key field in the object – try to use an ordinary map with this field as the key instead of an IdentityHashMap with the object itself as the key.

IdentityHashMap calls System.identityHashCode to get an object’s identity hash code – a semi-unique int value identifying each object. System.identityHashCode belongs to the Java intrinsic methods – a small set of core Java methods which are mapped directly (or nearly directly) onto one (or just a few) CPU instructions.

You can find the list of Java 8 intrinsic methods in this listing. Look for do_intrinsic macros in the second half of the file.

Continue reading

Changes to String internal representation made in Java 1.7.0_06

by Mikhail Vorontsov

This post was updated on 19 Nov 2013 in order to reflect Java 8 changes.


This post was updated on 28 Nov 2013 in order to reflect Java 7u40 changes (thanks to Sunny Chan and his colleagues for pointing my attention at this JDK update).

Sharing an underlying char[]

The original String implementation has 4 non-static fields: char[] value with the string characters, int offset and int count with the index of the first character to use from value and the number of characters to use, and int hash with a cached value of the String hash code. As you can see, in the vast majority of cases a String will have offset = 0 and count = value.length. The only exceptions to this rule were strings created via String.substring calls and all API calls using this method internally (like Pattern.split).

String.substring created a String which shared the internal char[] value with the original String, which allowed you:

  1. To save some memory by sharing character data
  2. To run String.substring in a constant time ( O(1) )

At the same time this feature was a source of a possible memory leak: if you extract a tiny substring from a huge original string and discard that original string, you will still have a live reference to the underlying huge char[] value taken from the original String. The only way to avoid it was to call the new String( String ) constructor on such a string – it made a copy of the required section of the underlying char[], thus unlinking your shorter string from its longer “parent”.
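
A sketch of the old leak pattern and its workaround (the method is illustrative; on JDKs from 7u06 onwards both lines simply copy the required characters):

public class SubstringLeak {
    public static String extractKey( final String hugeInput ) {
        final int end = Math.min( 16, hugeInput.length() );
        // Before Java 7u06 this substring kept a reference to hugeInput's whole char[]
        final String leaky = hugeInput.substring( 0, end );
        // The workaround: copy the characters, unlinking the result from the huge parent array
        return new String( leaky );
    }
}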

Since Java 1.7.0_06 (as well as in the current versions of Java 8 – Nov 13) the offset and count fields have been removed from String. This means that you can’t share a part of an underlying char[] value anymore. Now you can forget about the memory leak described above and never use the new String(String) constructor again. As a drawback, you now have to remember that String.substring has linear complexity instead of a constant one.

Continue reading

hashCode method performance tuning

by Mikhail Vorontsov

In this chapter we will discuss the various implications of your hashCode method implementation for application performance.

The main purpose of the hashCode method is to allow an object to be used as a key in a hash map or a member of a hash set. In this case the object should also implement an equals(Object) method consistent with its hashCode implementation:

  • If a.equals(b) then a.hashCode() == b.hashCode()
  • If hashCode() was called twice on the same object, it should return the same result provided that the object was not changed
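
A sketch of a class honouring this contract (the class and its field set are illustrative):

public final class Point {
    private final int x;
    private final int y;

    public Point( final int x, final int y ) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals( final Object o ) {
        if ( this == o ) return true;
        if ( !( o instanceof Point ) ) return false;
        final Point other = (Point) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;  // equal points always produce equal hash codes
    }
}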

hashCode from performance point of view

From the performance point of view, the main objective of your hashCode method implementation is to minimize the number of objects sharing the same hash code. All JDK hash based collections store their values in an array. The hash code is used to calculate an initial lookup position in this array. After that, equals is used to compare the given value with the values stored in the internal array. So, if all values have distinct hash codes, the possibility of hash collisions is minimized. On the other hand, if all values have the same hash code, the hash map (or set) degrades into a list: a single lookup costs O(n), and populating it with n values costs O(n²).

For more details, read about collision resolution in hash maps. java.util.HashMap resolves collisions by chaining – all key-value pairs falling into the same bucket are stored in a linked list – while IdentityHashMap uses the other common method, open addressing.

Continue reading

java.util.zip.CRC32 and java.util.zip.Adler32 performance

by Mikhail Vorontsov

A checksum is a function from an arbitrarily sized input byte array to a small number of bytes (an integer in the case of CRC32 and Adler32). Its most important property is that a small change in the input data (even a single byte increased by one) causes a large change in the checksum. Checksums are usually calculated in one of the two following scenarios:

  • A file was received along with a separately transmitted checksum. We need to calculate a checksum of the file contents and compare it with the received one to ensure that the file was not changed during transmission.
  • A file is being read from an input device and a checksum is being calculated on its contents while those contents are processed. Occasionally the connection with the input device may be broken and retransmission is required. We may use the last calculated checksum and its file position to verify that the same data is being received during retransmission up to the last known file position, so that we do not reprocess the already processed part of the file and can continue processing after that position.

The best-known checksum implementation in the standard JDK is java.util.zip.CRC32. A lot of developers think that it is the only standard checksum implementation (many of them would say ‘the standard CRC implementation’). Actually, there is one more checksum implementation called java.util.zip.Adler32. Both CRC32 and Adler32 implement the same interface – java.util.zip.Checksum.
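
Both are used identically through the Checksum interface; a minimal sketch (the class name and sample data are mine):

import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class ChecksumDemo {
    public static long checksum( final Checksum checksum, final byte[] data ) {
        checksum.reset();
        checksum.update( data, 0, data.length );
        return checksum.getValue();  // the result fits into an unsigned 32-bit value
    }

    public static void main( String[] args ) {
        final byte[] data = "some file contents".getBytes();
        System.out.println( "CRC32   = " + checksum( new CRC32(), data ) );
        System.out.println( "Adler32 = " + checksum( new Adler32(), data ) );
    }
}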

Continue reading