Category Archives: CPU optimization

Implementing a high performance Money class

by Mikhail Vorontsov

This article will discuss how to efficiently implement a Money class in Java. It follows the Using double/long vs BigDecimal for monetary calculations article published earlier on this blog.

Introduction

As I have written before, we should not use any floating point types for storing monetary values. We should use long instead, keeping the number of smallest currency units in it. Unfortunately, there are a couple of problems:

  • We may still have to communicate with a software storing monetary values in double variables.
  • We may want to multiply it by a non-integer number, like 0.015, or divide it by a number of any type, thus getting a non-integer result.

The easy way to deal with arithmetic operations is to use the BigDecimal class for all of them. Unfortunately, BigDecimal will not help you to deal with already incorrect double calculation results. For example, the following code will print 0.7999999999999999:

System.out.println( new BigDecimal( 0.7 + 0.1, MathContext.DECIMAL64 ) );

So, we may say that BigDecimal operations produce correct results as long as they are given correct arguments. We can not say the same about double operations.
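
A minimal illustration of this point: constructing BigDecimal from exact string arguments instead of an already rounded double keeps the result exact (the comments show what each line prints).

System.out.println( new BigDecimal( 0.7 + 0.1 ) ); //prints a long 0.7999... tail - the damage was already done in double
System.out.println( new BigDecimal( "0.7" ).add( new BigDecimal( "0.1" ) ) ); //prints 0.8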

double operations may end up with the exact result or a result which is one bit off the exact result. This is most likely caused by executing double operations at a higher precision on the CPU and then rounding the result. All we need to do to get the exact result is to look at these 3 values (the result; the result plus one bit, which is also known as an ulp; and the result minus an ulp). One of them will be the correct one. Easy to say, difficult to implement efficiently 🙂
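
The easy-but-slow version of that idea can be sketched as follows. This is only an illustration of the "check the three candidates" trick using a BigDecimal reference; the actual Money implementation has to avoid BigDecimal on the fast path, and the method name is hypothetical.

import java.math.BigDecimal;
import java.math.MathContext;

public class UlpCorrection
{
    /** Picks the candidate (result, next up, next down) closest to the exact BigDecimal quotient. */
    public static double correctedDivide( final double dividend, final double divisor )
    {
        final double res = dividend / divisor;
        final BigDecimal exact = new BigDecimal( dividend ).divide( new BigDecimal( divisor ), MathContext.DECIMAL128 );
        double best = res;
        BigDecimal bestError = new BigDecimal( res ).subtract( exact ).abs();
        //try the two ulp-neighbours of the double result
        for ( final double candidate : new double[] { Math.nextUp( res ), Math.nextAfter( res, Double.NEGATIVE_INFINITY ) } )
        {
            final BigDecimal error = new BigDecimal( candidate ).subtract( exact ).abs();
            if ( error.compareTo( bestError ) < 0 )
            {
                best = candidate;
                bestError = error;
            }
        }
        return best;
    }
}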

Continue reading

Joda Time library performance

by Mikhail Vorontsov

The Joda Time library is an attempt to create a cleaner and more developer-friendly date/time library than the existing JDK Date/Calendar/DateFormat ecosystem. This article provides a top-level overview of the Joda Time library from a performance point of view. It may also be useful as a short introduction to Joda Time. I recommend reading the java.util.Date, java.util.Calendar and java.text.SimpleDateFormat article before this one in order to better understand the performance of the built-in JDK classes.

This article discusses Joda Time library versions 2.1 – 2.3.


26 Jan 2014: This is a major article rewrite – a major performance issue was found in the Joda Time implementation.

Date/time storage classes

There are five general purpose date/time classes in this library:

  • DateTime – full replacement of java.util.Calendar, supporting time zones and date/time arithmetic. This class is immutable.
  • MutableDateTime – mutable version of DateTime.
  • LocalDate – immutable version of DateTime containing only date fields. This class does not use a timezone after construction.
  • LocalTime – immutable version of DateTime containing only time fields. This class does not use a timezone after construction.
  • LocalDateTime – immutable version of DateTime not using a timezone.

All these classes contain 2 mandatory fields – a long with millis since epoch and a Chronology object containing all timezone and calendar system (Gregorian, Buddhist, etc.) related logic. Some of these classes contain 1-2 more fields. An instance of these classes occupies from 24 to 40 bytes.

Since these classes are based on "machine" time – millis since epoch – we may expect better performance on from/to long conversions and worse performance on to/from date component (human time) conversions. In reality, Joda developers have made a series of clever optimizations which make it really fast even for human time calculations.
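
For illustration, a minimal sketch of both conversion directions (assuming Joda Time 2.x on the classpath):

import org.joda.time.DateTime;
import org.joda.time.LocalDate;

public class JodaConversions
{
    public static void main( final String[] args )
    {
        //"machine" time -> date/time object: cheap, the millis and a Chronology reference are simply stored
        final DateTime fromMillis = new DateTime( 1356998400000L );
        //date/time object -> "machine" time: a plain field read
        final long millis = fromMillis.getMillis();
        //"human" time (date components) -> object: requires per-field calculations via the Chronology
        final DateTime fromComponents = new DateTime( 2013, 1, 1, 0, 0, 0 );
        final LocalDate date = new LocalDate( 2013, 1, 1 );
        System.out.println( millis + " " + fromComponents + " " + date );
    }
}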

Joda Time timezone offset calculation performance bug (ver 2.3)

Let’s start the updated version of this article from this point 🙂 When I wrote the original version of this article in late 2012, I noticed the poor performance of timezone based logic in Joda Time, but I thought that it was a common property of the library, so I wrote that “this logic is definitely worth optimizing in Joda Time”.

One year later I wrote a more comprehensive set of tests which includes the new Java 8 date/time implementation (JSR-310). This test set highlighted some weird inconsistencies between various date/time implementations. In particular, I noticed that creating a Joda MutableDateTime from date components was 3-4 times faster than the same operation on date+time components, even though both were using identical client logic. But there was one difference – the date tests were using years from 1981 to 2000 (in a loop), while the DateTime tests were using 2013. This turned out to be the key to the problem.

MutableDateTime, as well as some other Joda Time classes, calculates the timezone offset for a given “millis since epoch” value from time to time. It may be calculated more than once per client API call. Deep under the hood there is an org.joda.time.tz.DateTimeZoneBuilder$PrecalculatedZone class with a getOffset method. This method looks up the transition table for the given timezone using binary search. If your timestamp is less than or equal to the biggest timestamp in the table – you get the offset from the table. Otherwise the org.joda.time.tz.DateTimeZoneBuilder$DSTZone.getOffset method is called for every offset you calculate. It uses the daylight savings transition rules to calculate the latest transition and uses it for the offset calculation. Calculated values are not cached on this branch.

I first noticed this difference between years 2008 and 2009 in the “Australia/Sydney” timezone. After that I ran the same test on all available timezones and found a list of zones in/around Australia and New Zealand with the same performance issue – offsets in 2009 were calculated much slower than in 2008. At the same time I noticed that European timezones were slow in both 2008 and 2009. This led me to the root cause.

Joda Time ships with timezone rule source files in the src/main/java/org/joda/time/tz/src directory. If you take a look at the “australasia” file and look for the “New South Wales” rule, you will see that its last 2 lines are:

Rule	AN	2008	max	-	Apr	Sun>=1	2:00s	0	-
Rule	AN	2008	max	-	Oct	Sun>=1	2:00s	1:00	-

This is the last fast year – 2008 (and I used 1st January for testing). After that I got more suspicious, opened the “europe” file and got really worried. For example, the last rule for the Netherlands (Europe/Amsterdam) dates back to 1945:

Rule	Neth	1945	only	-	Apr	 2	2:00s	1:00	S
Rule	Neth	1945	only	-	Sep	16	2:00s	0	-
    

The longer a country lives with a stable daylight saving rule, the more it gets penalized by Joda Time: it takes ~3.6 seconds to create 10M MutableDateTime objects for year 2008 in the “Europe/Amsterdam” timezone, ~3 seconds to create 10M MutableDateTime objects for year 2010 in the “Australia/Sydney” timezone (which also has to calculate its daylight savings transitions), but only ~1.2 seconds for year 2008 in Sydney (precalculated table).
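
A rough sketch of the kind of loop used to obtain such numbers (the actual test harness differs, and the timings depend on your hardware):

import org.joda.time.DateTimeZone;
import org.joda.time.MutableDateTime;

public class JodaTimezoneTest
{
    private static void run( final int year, final String zoneId, final int iterations )
    {
        final DateTimeZone zone = DateTimeZone.forID( zoneId );
        final long start = System.currentTimeMillis();
        long blackhole = 0;
        for ( int i = 0; i < iterations; ++i )
        {
            //the timezone offset is looked up while converting the date components to millis
            final MutableDateTime dt = new MutableDateTime( year, 1, 1, 0, 0, 0, 0, zone );
            blackhole += dt.getMillis();
        }
        System.out.println( zoneId + " / " + year + ": " + ( System.currentTimeMillis() - start ) + " ms (" + blackhole + ')' );
    }

    public static void main( final String[] args )
    {
        run( 2008, "Australia/Sydney", 10 * 1000 * 1000 );   //covered by the precalculated transition table - fast
        run( 2010, "Australia/Sydney", 10 * 1000 * 1000 );   //falls through to DST rule calculation - slow
        run( 2008, "Europe/Amsterdam", 10 * 1000 * 1000 );   //the last precalculated rule is from 1945 - slow
    }
}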


So, I would like to ask the Joda Time maintainers to consider prebuilding such transition tables during timezone construction (at runtime) up to at least the current year plus a few more years.

Continue reading

java.util.Random and java.util.concurrent.ThreadLocalRandom in multithreaded environments

by Mikhail Vorontsov

Generating pseudorandom data

There are 2 types of random generators in Java: pseudorandom and secure. Pseudorandom generators transform a seed into a new portion of pseudorandom data based on some formula. Secure random generators use some machine-specific sources of truly random events (file/socket access, for example) to generate their data.

Secure random generators:

  • should be used only when cryptographically strong random data is required
  • are slow
  • could be waiting for external events (“stuck”) if you have requested a lot of random data (Linux /dev/random is an example of such a generator)

Pseudorandom generators, on the other hand, depend only on the initial “seed” value, so you can generate the same sequence of pseudorandom events if you provide the same seed to the algorithm. In the general case, such generators are fast because they are CPU bound only (they do not rely on any IO). We will review the evolution of pseudorandom generators in Java and the reasons behind these changes.

java.util.Random

java.util.Random has been available since Java 1.0. It is a thread safe class, so you may (in theory) share instances of this class between several threads and not expect to get the same random data in 2 threads at the same time. Such thread safety is achieved by using an AtomicLong for the generator seed.

Random uses AtomicLong CAS (compare-and-set) operations for updating its seed. Despite being a light non-blocking primitive used in a lot of non-blocking algorithms, CAS behaves really poorly under high contention. Wait for the test results to see how poorly it behaves.
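
The seed update inside java.util.Random looks roughly like the following (a simplified sketch of the JDK logic, not the exact source): every call spins on compareAndSet until it wins, and under contention threads keep invalidating each other's attempts.

import java.util.concurrent.atomic.AtomicLong;

public class SimplifiedRandom
{
    private static final long MULTIPLIER = 0x5DEECE66DL;
    private static final long ADDEND = 0xBL;
    private static final long MASK = ( 1L << 48 ) - 1;

    private final AtomicLong seed = new AtomicLong( System.nanoTime() );

    protected int next( final int bits )
    {
        long oldSeed, nextSeed;
        do
        {
            oldSeed = seed.get();
            nextSeed = ( oldSeed * MULTIPLIER + ADDEND ) & MASK;
        }
        while ( !seed.compareAndSet( oldSeed, nextSeed ) ); //this retry loop is what suffers under high contention
        return (int) ( nextSeed >>> ( 48 - bits ) );
    }
}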

java.util.concurrent.ThreadLocalRandom

java.util.concurrent.ThreadLocalRandom was added in Java 7 as an attempt to overcome all the performance issues associated with java.util.Random. This new class extends java.util.Random, so you may pass it to any logic expecting the old java.util.Random.

Here are the main ThreadLocalRandom implementation details:

  • Internally it does not use an AtomicLong seed from Random. Instead it uses an ordinary long.
  • You can not create an instance of ThreadLocalRandom yourself, because its constructor is not public. Instead you should use its static factory ThreadLocalRandom.current(). This factory method queries an internal ThreadLocal<ThreadLocalRandom> field to obtain the generator bound to the current thread.
  • It is CPU-cache aware, so it uses 8 long dummy fields for padding, thus pushing anything else out of its 64 byte L1 cache line.

All of these changes are important, as the following tests will reveal.
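
A minimal sketch of this sort of test (the class name, thread count and iteration count are illustrative, not the article's actual benchmark harness):

import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

public class RandomContentionTest
{
    private static final int THREADS = 8;
    private static final int CALLS_PER_THREAD = 10 * 1000 * 1000;

    public static void main( final String[] args ) throws InterruptedException
    {
        final Random shared = new Random( 100 );

        //8 threads hammering one shared Random - every nextInt() contends on the seed CAS
        long time = runThreads( new Runnable() {
            public void run()
            {
                for ( int i = 0; i < CALLS_PER_THREAD; ++i ) shared.nextInt();
            }
        } );
        System.out.println( "shared java.util.Random: " + time + " ms" );

        //8 threads each using their own ThreadLocalRandom - no shared state at all
        time = runThreads( new Runnable() {
            public void run()
            {
                for ( int i = 0; i < CALLS_PER_THREAD; ++i ) ThreadLocalRandom.current().nextInt();
            }
        } );
        System.out.println( "ThreadLocalRandom: " + time + " ms" );
    }

    private static long runThreads( final Runnable task ) throws InterruptedException
    {
        final Thread[] threads = new Thread[ THREADS ];
        final long start = System.currentTimeMillis();
        for ( int i = 0; i < THREADS; ++i )
        {
            threads[ i ] = new Thread( task );
            threads[ i ].start();
        }
        for ( final Thread t : threads )
            t.join();
        return System.currentTimeMillis() - start;
    }
}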

Continue reading

Static code compilation in Groovy 2.0

by Mikhail Vorontsov

A tiny introduction to Groovy

Groovy is a JVM-based language useful for scripting and other non-CPU-critical applications (like the Grails web framework). It is a dynamically typed language which allows you to add methods and properties to existing objects at runtime.

The ability to add methods and properties at runtime is implemented via methodMissing and propertyMissing methods/handlers as well as dynamic method registration. This feature allows you to support your own DSLs by parsing non-existent method names at runtime and registering the actual method bodies corresponding to such methods/properties. It allows you, for example, to generate a database access method like List<Person> getPersonsByN( N n ), where N is any field defined in the Persons database table.

Such functionality made Groovy popular in web development due to the ability to generate repetitive data access methods at runtime in the frameworks. Unfortunately (or luckily 🙂 ), Groovy method calls use the dynamic dispatch model – the Groovy runtime chooses the best matching method signature based on the runtime argument types instead of the compile time argument types, as Java does. Dynamic dispatch requires each Groovy method call to go through the Groovy runtime method lookup code, which is based on reflection. So, are method calls in Groovy extremely slow? The answer is no – Groovy does a very good job of caching call sites, avoiding another reflection lookup whenever possible.

Groovy static compilation

One of the main features of Groovy 2.0 was the static compilation mode. It is turned on by annotating methods or the whole class with the @CompileStatic annotation. This annotation actually turns on 2 features:

  1. Static type checking
  2. Static compilation

Continue reading

String.intern in Java 7 and 8 – part 3

by Mikhail Vorontsov

I want to return to the String.intern method I have discussed earlier ( part 1, part 2 ). During the last couple of months I have used interning heavily in my pet project in order to see the pros and cons of using String.intern for every non-temporary string in my application (non-temporary – one which is expected to live longer than a few seconds and will most likely get into the GC old generation).

As I have written before, the advantages of String.intern in Java 7 and 8 are:

  • Rather fast execution, nearly no performance loss in the multithreaded mode (still using the global string pool).
  • Saving memory, thus allowing your dataset to shrink, which will let your application run faster (in general).

The main disadvantages of this method (as I have noticed before) are:

  • The requirement to set the -XX:StringTableSize=N JVM parameter in advance and dealing with its fixed value (a JVM restart is needed in order to expand the JVM string pool).
  • Adding calls to String.intern in a lot of places globally (probably via your own wrapper) – which has a linear code maintenance cost.

After a couple of months of using String.intern in my project, I think that its usage should be limited to fields with a limited domain of distinct values (like person first names or state/province names). We should not try to intern anything with a low probability of a repeated value – it would be a waste of CPU time.
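
In code this boils down to interning only the well-known, low-cardinality fields and leaving free-form text alone – a sketch (the class and field names are illustrative):

public class Person
{
    private final String firstName;   //limited domain of values - worth interning
    private final String state;       //limited domain of values - worth interning
    private final String comment;     //free-form text - interning it would just waste CPU

    public Person( final String firstName, final String state, final String comment )
    {
        this.firstName = firstName.intern();
        this.state = state.intern();
        this.comment = comment;
    }
}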

Continue reading

How to iterate zip file records (java.util.zip.ZipFile, java.util.zip.ZipInputStream)

by Mikhail Vorontsov

The right way to iterate a zip file

final ZipFile file = new ZipFile( FILE_NAME );
try
{
    final Enumeration<? extends ZipEntry> entries = file.entries();
    while ( entries.hasMoreElements() )
    {
        final ZipEntry entry = entries.nextElement();
        System.out.println( entry.getName() );
        //use entry input stream:
        readInputStream( file.getInputStream( entry ) );
    }
}
finally
{
    file.close();
}
    
private static int readInputStream( final InputStream is ) throws IOException {
    final byte[] buf = new byte[ 8192 ];
    int read = 0;
    int cntRead;
    while ( ( cntRead = is.read( buf, 0, buf.length ) ) >=0  )
    {
        read += cntRead;
    }
    return read;
}
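
The title also mentions java.util.zip.ZipInputStream – when the archive arrives as a stream rather than a file, iteration follows the same pattern (a sketch reusing the FILE_NAME constant and the readInputStream method from above):

final ZipInputStream zis = new ZipInputStream( new BufferedInputStream( new FileInputStream( FILE_NAME ) ) );
try
{
    ZipEntry entry;
    while ( ( entry = zis.getNextEntry() ) != null )
    {
        System.out.println( entry.getName() );
        //the stream itself is positioned at the data of the current entry:
        readInputStream( zis );
        zis.closeEntry();
    }
}
finally
{
    zis.close();
}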

Continue reading

Primitive types to String conversion and String concatenation

by Mikhail Vorontsov

Primitive types to String conversion

From time to time you may need to create a string in your program from several values, some of which may be of primitive types. If you have two or more primitive type values at the beginning of your string concatenation, you need to explicitly convert the first of them to a string (otherwise System.out.println( 1 + 'a' ) will print ’98’ instead of ‘1a’). Of course, there is a family of String.valueOf methods (or the corresponding wrapper type methods), but who needs them if there is another way which requires less typing?

Concatenating an empty string literal and the first of your primitive type variables (in our example, "" + 1) is the easiest idea. The result of this expression is a String, and after that you can safely concatenate any primitive type values to it – the compiler will take care of all implicit conversions to String.

Unfortunately, this is the worst way one can imagine. In order to understand why, we need to review how the string concatenation operator is translated in Java. If we have a String value (no matter which sort – a literal, a variable or a method call) followed by the + operator followed by an expression of any type:

string_exp + any_exp

Java compiler will translate it to:

new StringBuilder().append( string_exp ).append( any_exp ).toString()

If you have more than one + operator in the expression, you will end up with several StringBuilder.append calls before the final toString call.
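
The sketch below contrasts the alternatives (the comments reflect the translation described above; the StringBuilder capacity of 16 is just an illustrative size hint):

//the "" + trick: the empty literal becomes a useless extra append
final String s1 = "" + 1 + 'a';                         //new StringBuilder().append( "" ).append( 1 ).append( 'a' ).toString()

//explicit conversion of the first primitive operand:
final String s2 = String.valueOf( 1 ) + 'a';            //no empty-string append

//for longer concatenations - an explicit StringBuilder with a size hint avoids internal buffer resizing:
final String s3 = new StringBuilder( 16 ).append( 1 ).append( 'a' ).toString();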

Continue reading

Java varargs performance issues

by Mikhail Vorontsov

One of Java 5's new features is variable-length argument lists. They can be useful if you have to provide several arguments of the same type to your method. For example, prior to Java 5, if you had to write a method which prints all its arguments to the console, it would have been:

public static void printAll( final Object[] args )
{
    for ( int i = 0; i < args.length; ++i )
        System.out.println( args[ i ] );
}

And its call would look like:

printAll( new Object[] { new Integer( 75 ), new Date(), "String" + 50 } );

In Java 5 varargs support was added to the language. The same method now looks much simpler:

public static void printAllNew( final Object... args )
{
  for ( final Object arg : args )
      System.out.println( arg );
}

Its call also became shorter (due to the added variable length argument list support):

printAllNew( 75, new Date(), "String" + 50 );
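
Note that this convenience is not free: every varargs call silently allocates an Object[] and boxes the primitive arguments, i.e. the call above compiles into roughly the pre-Java 5 form. A common mitigation (a hedged sketch, not part of this article's original code) is to add fixed-arity overloads for the most frequent argument counts and keep the varargs version as a fallback:

public static void printAllNew( final Object arg )
{
    System.out.println( arg );
}

public static void printAllNew( final Object arg1, final Object arg2 )
{
    System.out.println( arg1 );
    System.out.println( arg2 );
}

public static void printAllNew( final Object... args ) //fallback for 3 and more arguments
{
    for ( final Object arg : args )
        System.out.println( arg );
}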

Continue reading

String.intern in Java 6, 7 and 8 – multithreaded access

by Mikhail Vorontsov

This article follows the initial String.intern in Java 6, 7 and 8 – string pooling article, which describes the implementation and benefits of using the String.intern() method in Java 7 and 8. The original article was already getting too long, so I had to write this article in order to describe the performance characteristics of multithreaded access to String.intern.

The tests we will perform today call String.intern() from multiple threads. They emulate the behavior of a lot of modern server applications (for example, specialized web crawlers). For the tests I will use my new workstation with an Intel Xeon E5-2650 CPU (8 physical, 16 virtual cores @ 2 GHz) and 128 GB RAM. It will allow us to test rather high contention scenarios. In this article we will create 8 threads in order to utilize all physical cores.

There will be 4 tests:

  1. The benchmark one – a single thread calling String.intern() using the testLongLoop method from the previous article. It will show us how fast this configuration is without any contention.
  2. All 8 threads are calling String.intern() with unique values – each thread adds its own prefix to the interned string (see the sketch after this list). This test will show us the synchronization overhead of String.intern(). It should be the theoretical worst case: it is highly unlikely that the only thing an actual application would do is call String.intern in a loop from many threads.
  3. Initially we start the first thread interning a set of strings. After a 2 second delay we start a second thread interning the same set of strings. We expect that the following assumptions will hold: str.intern()==str for the first thread; str.intern()!=str for the second thread. This will allow us to prove that there are no thread-local JVM string pools.
  4. All 8 threads will intern the same set of values. This scenario is closer to the real situation – it will provide a rather likely mix of adding strings to the JVM pool and querying strings from it. Nevertheless, such high read contention on the JVM string pool is still an unlikely event.
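
A rough sketch of test 2 (prefix names and iteration counts are illustrative; the actual harness in the article differs):

final int THREADS = 8;
final int ITERATIONS = 1000 * 1000;
final Thread[] threads = new Thread[ THREADS ];
for ( int t = 0; t < THREADS; ++t )
{
    final String prefix = "thread" + t + '_';
    threads[ t ] = new Thread( new Runnable() {
        public void run()
        {
            for ( int i = 0; i < ITERATIONS; ++i )
                ( prefix + i ).intern();  //unique values - pure write contention on the shared JVM string pool
        }
    } );
    threads[ t ].start();
}
for ( final Thread thread : threads )
    thread.join();  //throws InterruptedException - declare or handle it in the enclosing method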

Continue reading

String.intern in Java 6, 7 and 8 – string pooling

by Mikhail Vorontsov

This article describes how the String.intern method was implemented in Java 6 and what changes were made to it in Java 7 and Java 8.

First of all I want to thank Yannis Bres for inspiring me to write this article.

07 June 2014 update: added 60013 as a default string pool size since Java 7u40 (instead of Java 8), added -XX:+PrintFlagsFinal and -XX:+PrintStringTableStatistics JVM parameter references, cleaned up a few typos.

This is an updated version of the article, including the -XX:StringTableSize=N JVM parameter description. This article is followed by the String.intern in Java 6, 7 and 8 – multithreaded access article, which describes the performance characteristics of multithreaded access to String.intern().

String pooling

String pooling (aka string canonicalisation) is the process of replacing several String objects with equal values but different identity with a single shared String object. You can achieve this goal by keeping your own Map<String, String> (with possibly soft or weak references, depending on your requirements) and using the map values as canonicalised values. Or you can use the String.intern() method, which is provided to you by the JDK.
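
A manual pool of that kind can be as small as the following sketch (a simple non-thread-safe variant with strong references; a ConcurrentHashMap or weak/soft references would be used depending on your requirements):

import java.util.HashMap;
import java.util.Map;

public class StringPool
{
    private final Map<String, String> pool = new HashMap<String, String>();

    /** Returns the canonical instance for the given string value. */
    public String canonicalize( final String value )
    {
        final String existing = pool.get( value );
        if ( existing != null )
            return existing;
        pool.put( value, value );
        return value;
    }
}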

In the days of Java 6, using String.intern() was forbidden by many coding standards due to a high possibility of getting an OutOfMemoryError if pooling went out of control. The Oracle Java 7 implementation of string pooling was changed considerably. You can look for details in http://bugs.sun.com/view_bug.do?bug_id=6962931 and http://bugs.sun.com/view_bug.do?bug_id=6962930.

String.intern() in Java 6

In those good old days all interned strings were stored in the PermGen – the fixed-size part of the heap mainly used for storing loaded classes and the string pool. Besides explicitly interned strings, the PermGen string pool also contained all string literals used earlier in your program (the important word here is used – if a class or method was never loaded/called, any constants defined in it will not be loaded).

The biggest issue with such a string pool in Java 6 was its location – the PermGen. The PermGen has a fixed size and can not be expanded at runtime. You can set it using the -XX:MaxPermSize=N option. As far as I know, the default PermGen size varies between 32M and 96M depending on the platform. You can increase its size, but it will still be fixed. Such a limitation required very careful usage of String.intern – you'd better not intern any uncontrolled user input with this method. That's why string pooling in the times of Java 6 was mostly implemented with manually managed maps.

String.intern() in Java 7

Oracle engineers made an extremely important change to the string pooling logic in Java 7 – the string pool was relocated to the heap. It means that you are no longer limited by a separate fixed-size memory area. All strings are now located in the heap, like most other ordinary objects, which allows you to manage only the heap size while tuning your application. Technically, this alone could be a sufficient reason to reconsider using String.intern() in your Java 7 programs. But there are other reasons.

Continue reading