Category Archives: Intermediate

JSR 310 – Java 8 Date/Time library performance (as well as Joda Time 2.3 and j.u.Calendar)

by Mikhail Vorontsov

Introduction

This is the third date/time article in this blog. I advise you to look at the other two as well: java.util.Date, java.util.Calendar and java.text.SimpleDateFormat and Joda Time library performance.

This article is a short overview of the new Java 8 date/time implementation, also known as JSR-310. I will compare the JSR-310 implementation and its performance with its counterparts from the Joda Time library as well as with the good old java.util.GregorianCalendar. This review was written and tested on Java 8 ea b121.

All new Java 8 classes are implemented around the human time concept – separate fields for years, months, days, hours, minutes, seconds, and, in line with the current fashion, nanoseconds. Their counterpart is machine time – the number of milliseconds since epoch, which you may obtain, for example, via a System.currentTimeMillis() call. In order to convert time between these two systems you will need to know which timezone to use. A timezone defines the offset from UTC used in conversions. Offset calculation may require the use of a transition table or transition rules defining when the daylight savings changes happen. Sometimes it may become a performance bottleneck.
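The two conversions can be sketched with the Java 8 classes; a minimal example (the zone id is an arbitrary choice for illustration):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class TimeConversion {
    public static void main(String[] args) {
        // machine time: millis since epoch
        long millis = System.currentTimeMillis();

        // machine -> human time: a timezone is required to compute the UTC offset
        ZonedDateTime human = Instant.ofEpochMilli(millis)
                                     .atZone(ZoneId.of("Australia/Sydney"));

        // human -> machine time
        long back = human.toInstant().toEpochMilli();
        System.out.println(millis == back); // prints "true"
    }
}
```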

The JSR-310 implementation was inspired by the Joda Time library – both libraries have a similar interface, but their implementations differ greatly: Java 8 classes are built around human time, while Joda Time uses machine time inside. As a result, if you are looking for the fastest implementation, I would recommend using the Java 8 classes except in situations when:

  • You can’t use Java 8 (yeah, not many people can use it before the first official release…)
  • You work strictly with machine time within a few-day range (in this case a manual long/int-based implementation will be faster).
  • You have to parse timestamps including offset/zone id.
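For readers who have not touched JSR-310 yet, a short taste of the API discussed in this article (the class names are real JSR-310 classes; the concrete date values are arbitrary):

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class Jsr310Taste {
    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2014, 1, 26);        // human time: date fields only
        LocalDateTime dt = date.atTime(10, 30);            // date + time fields, still no zone
        ZonedDateTime zoned = dt.atZone(ZoneId.of("UTC")); // a zone enables machine time conversion
        System.out.println(zoned.toInstant().toEpochMilli() > 0); // prints "true"
    }
}
```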

Continue reading

Joda Time library performance

by Mikhail Vorontsov

The Joda Time library is an attempt to create a cleaner and more developer-friendly date/time library than the existing JDK Date/Calendar/DateFormat ecosystem. This article provides a top-level overview of the Joda Time library from a performance point of view. It may also be useful as a short introduction to Joda Time. I recommend reading the java.util.Date, java.util.Calendar and java.text.SimpleDateFormat article before reading this one in order to better understand the performance of the built-in JDK classes.

This article discusses Joda Time library versions 2.1 – 2.3.


26 Jan 2014: This is a major article rewrite – a major performance issue was found in Joda Time implementation.

Date/time storage classes

There are five general purpose date/time classes in this library:

  • DateTime – full replacement of java.util.Calendar, supporting time zones and date/time arithmetic. This class is immutable.
  • MutableDateTime – mutable version of DateTime.
  • LocalDate – immutable version of DateTime containing only date fields. This class does not use a timezone after construction.
  • LocalTime – immutable version of DateTime containing only time fields. This class does not use a timezone after construction.
  • LocalDateTime – immutable version of DateTime not using a timezone.

All these classes contain 2 mandatory fields – a long with millis since epoch and a Chronology object containing all timezone and calendar system (Gregorian, Buddhist, etc.) related logic. Some of these classes contain 1-2 more fields. An instance of these classes occupies from 24 to 40 bytes.

Since these classes are based on “machine” time – millis since epoch – we may expect better performance on conversions from/to long and worse performance on conversions to/from date components (human time). In reality, the Joda developers have made a series of clever optimizations which make the library really fast even on human time calculations.
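A small sketch of how this storage model surfaces in the API (assuming Joda Time 2.3 on the classpath; the zone and date values are arbitrary):

```java
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;
import org.joda.time.LocalDate;

public class JodaStorage {
    public static void main(String[] args) {
        // DateTime stores millis since epoch plus a Chronology (zone + calendar system)
        DateTime dt = new DateTime(2013, 1, 1, 10, 30, DateTimeZone.forID("Australia/Sydney"));

        long millis = dt.getMillis(); // cheap: the long value is stored directly
        int year = dt.getYear();      // human time: computed from millis via the chronology

        LocalDate date = dt.toLocalDate(); // date fields only; no timezone used after construction
        System.out.println(year + "-" + date.getMonthOfYear() + "-" + date.getDayOfMonth());
    }
}
```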

Joda Time timezone offset calculation performance bug (ver 2.3)

Let’s start the updated version of this article from this point 🙂 When I wrote the original version of this article in late 2012, I noticed the poor performance of timezone-based logic in Joda Time, but I thought that it was a common property of the library, so I wrote that “this logic is definitely worth optimizing in Joda Time”.

One year later I wrote a more comprehensive set of tests which includes the new Java 8 date/time implementation (JSR-310). This test set highlighted some weird inconsistencies between various date/time implementations. In particular, I noticed that creating a Joda MutableDateTime from date components was 3-4 times faster than the same operation on date+time components, even though both tests were using identical client logic. But there was a difference – the date tests were using years from 1981 to 2000 (in a loop), while the DateTime tests were using 2013. This turned out to be the key to the problem.

MutableDateTime, as well as some other Joda Time classes, calculates the timezone offset for given “millis since epoch” values from time to time. It may be calculated more than once per client API call. Deep under the hood there is an org.joda.time.tz.DateTimeZoneBuilder$PrecalculatedZone class with a getOffset method. This method looks up the transition table for the given timezone using binary search. If your timestamp is less than or equal to the biggest timestamp in the table, you get your offset quickly. Otherwise the org.joda.time.tz.DateTimeZoneBuilder$DSTZone.getOffset method is called for every offset you calculate. It uses the daylight savings transition rules to calculate the latest transition and uses it for offset calculation. Calculated values are not cached on this branch.

I noticed this difference between years 2008 and 2009 in the “Australia/Sydney” timezone. After that I ran the same test on all available timezones and found a list of zones in/around Australia and New Zealand with the same performance issue – offsets in 2009 were calculated much slower than in 2008. At the same time I noticed that European timezones were slow in both 2008 and 2009. This led me to the following conclusion.

Joda Time ships with timezone rule source files in the src/main/java/org/joda/time/tz/src directory. If you take a look at the “australasia” file and look for the “New South Wales” rule, you will see that its last 2 lines are:

Rule	AN	2008	max	-	Apr	Sun>=1	2:00s	0	-
Rule	AN	2008	max	-	Oct	Sun>=1	2:00s	1:00	-

This is the last fast year – 2008 (and I used 1st January for testing). After that I got more suspicious, opened the “europe” file, and got really worried. For example, the last rule for the Netherlands (Europe/Amsterdam) dates back to 1945:

Rule	Neth	1945	only	-	Apr	 2	2:00s	1:00	S
Rule	Neth	1945	only	-	Sep	16	2:00s	0	-

The longer a country lives with a stable daylight saving rule, the more it gets penalized by Joda Time: it takes ~3.6 seconds to create 10M MutableDateTime objects for year 2008 in the “Europe/Amsterdam” timezone and ~3 seconds to create 10M MutableDateTime objects for year 2010 in the “Australia/Sydney” timezone (which also has to calculate its daylight savings transitions), but only ~1.2 seconds for year 2008 in Sydney (precalculated table).
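The effect can be reproduced with a crude timing sketch (assuming Joda Time 2.3 on the classpath; the iteration count and zone are arbitrary, and absolute numbers will vary by machine):

```java
import org.joda.time.DateTimeZone;
import org.joda.time.MutableDateTime;

public class OffsetBench {
    // time the construction of `count` MutableDateTime objects for 1st January of `year`
    static long createMillis(int year, DateTimeZone zone, int count) {
        long start = System.nanoTime();
        for (int i = 0; i < count; ++i) {
            // each construction needs a timezone offset lookup
            new MutableDateTime(year, 1, 1, 0, 0, 0, 0, zone);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        DateTimeZone sydney = DateTimeZone.forID("Australia/Sydney");
        int n = 1_000_000;
        System.out.println("2008 (precalculated table): " + createMillis(2008, sydney, n) + " ms");
        System.out.println("2010 (DST rule branch):     " + createMillis(2010, sydney, n) + " ms");
    }
}
```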


So, I would like to ask Joda Time maintainers to consider prebuilding such transition tables during the timezone construction (at runtime) up to at least the current year plus a few years more.

Continue reading

Core Java 7 Change Log

by Mikhail Vorontsov


01 Jan 2014 update – covered all Java 7 versions up to Java 7u45.

This article will list performance related changes made in the core part of various Java 7 updates. I will track changes in the following packages:

  • java.lang.*
  • java.io
  • java.math
  • java.net
  • java.nio.*
  • java.text.*
  • java.util.*

These changes have one peculiar property – Oracle usually does not include them in the release notes. Probably they consider them too minor… Anyway, this page will try to help you a little: hopefully, you will not miss updates like the “famous” String.substring change in Java 7u6.

This page will be updated after new Java 7 updates. I will also add a list of changes done before Java 7u45 (give me some time!).

Java 7u45 (compared to Java 7u25)

Most of these changes were made in Java 7u40, but I have unfortunately missed that release.

Shared empty table in ArrayList and HashMap update

File changed: \util\ArrayList.java
File changed: \util\HashMap.java

The two most popular Java collections – ArrayList and HashMap – now consume less memory if their instances are empty (size=0). The only reason for this update, from my point of view, is a critical mass of cases in Oracle performance benchmarks where a map/list is created but never populated due to conditional logic (for example, an empty parameter list was provided in the HTTP request).

This update reduces the garbage collector workload by creating less unused garbage.

In the case of ArrayList, only objects created via the empty constructor are updated. If you are using a constructor specifying the initial size, there are no changes at all for you.

/**
 * Shared empty array instance used for empty instances.
 */
private static final Object[] EMPTY_ELEMENTDATA = {};

public ArrayList() {
    super();
    this.elementData = EMPTY_ELEMENTDATA;
}

HashMap got a more thorough update – internal table initialization has been moved out of the constructor. A map is now always initialized with an empty table. As a consequence, all getters and setters in the HashMap now check whether the map is empty. This makes getters slightly faster in the case of an empty map – you no longer have to look at the actual table – but at the same time it makes all setters/getters slower in every other case (this is a very tiny slowdown – a zero check of a single int field, but it is still an extra instruction to execute).

I hope that this HashMap change has a solid justification – there are enough always empty maps in the Oracle benchmark applications. In my personal opinion this change is quite questionable – I do not like “taxes” you always have to pay just for a corner case optimization.

/**
 * An empty table instance to share when the table is not inflated.
 */
static final Entry<?,?>[] EMPTY_TABLE = {};

/**
 * The table, resized as necessary. Length MUST Always be a power of two.
 */
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

The author of these changes has responded to me here.

Continue reading

Static code compilation in Groovy 2.0

by Mikhail Vorontsov

A tiny introduction to Groovy

Groovy is a JVM-based language useful for scripting and other non-CPU critical applications (like Grails web framework). It is a dynamically typed language allowing you to add methods and properties to existing objects at runtime.

The ability to add methods and properties at runtime is implemented via methodMissing and propertyMissing methods/handlers as well as dynamic method registration. This feature allows you to support your own DSLs by parsing nonexistent method names at runtime and registering the actual method bodies corresponding to such methods/properties. It allows you, for example, to generate a database access method like List<Person> getPersonsByN( N n ), where N is any field defined in the Persons database table.

Such functionality made Groovy popular in web development due to the ability to generate repeating data access methods at runtime in the frameworks. Unfortunately (or luckily 🙂 ), Groovy method calls use the dynamic dispatch model – the Groovy runtime chooses the best matching method signature based on the runtime argument types instead of the compile-time argument types, as Java does. Dynamic dispatch requires each Groovy method call to use the Groovy runtime method lookup code based on reflection. So, are method calls in Groovy extremely slow? The answer is no – Groovy does a very good job of caching call sites, not making another reflection lookup if possible.

Groovy static compilation

One of the main features of Groovy 2.0 was the static compilation mode. It is turned on by annotating methods or the whole class with the @CompileStatic annotation. This annotation actually turns on 2 features:

  1. Static type checking
  2. Static compilation

Continue reading

Single file vs multi file storage

by Mikhail Vorontsov

Introduction

This article will discuss how badly your application may be impacted by data storage containing a huge number of tiny files. While it is obvious that it is better to keep the same amount of data in a single file rather than in a million tiny files, we will see what modern operating systems offer to alleviate this problem.

This article will describe a data layout problem I recently faced. Suppose we have a database containing a few million records with an average size of 10 KB, but no hard upper limit. Each of these records has a unique long id. Besides that, there is some other data, but its size is unlikely to exceed ~5% of the space occupied by the records.

For some reason there was a decision to use ordinary files as a data storage. The original implementation used one file per entry. Files were separated into directories according to auxiliary attributes. All this data was read into memory on startup. There are other analytical batch tools which also read the whole database in memory before data processing.

The system required a very long time to start up due to the necessity of reading all these tiny files from disk. My task was to solve this bottleneck.

Continue reading

String.intern in Java 6, 7 and 8 – multithreaded access

by Mikhail Vorontsov

This article follows the initial String.intern in Java 6, 7 and 8 – string pooling article describing the implementation and benefits of using String.intern() method in Java 7 and 8. The original article was already getting too long, so I had to write this article in order to describe the performance characteristics of the multithreaded access to the String.intern.

The tests we will perform today call String.intern() from multiple threads. They will emulate the behavior of a lot of modern server applications (for example, specialized web crawlers). For the tests I will use my new workstation with an Intel Xeon E5-2650 CPU (8 physical, 16 virtual cores @ 2 GHz) and 128 GB RAM. It will allow us to test rather high-contention scenarios. In this article we will create 8 threads in order to utilize all physical cores.

There will be 4 tests:

  1. The benchmark one – a single thread calling String.intern() using testLongLoop method from the previous article. It will show us how fast this configuration is without any contention.
  2. All 8 threads are calling String.intern() with unique values – each thread will add its own prefix to the interned string. This test will show us the synchronization overhead of String.intern(). It should be the theoretical worst case: it is highly unlikely that the only thing an actual application would do is call String.intern in a loop from many threads.
  3. Initially we start the first thread interning the set of strings. After 2 seconds delay we will start a second thread interning the same set of strings. We expect that the following assumptions will be true: str.intern()==str for the first thread; str.intern()!=str for the second thread. It will allow us to prove that there are no thread local JVM string pools.
  4. All 8 threads will intern the same set of values. This scenario will be closer to the real situation – it will provide us the rather likely mix of adding strings to the JVM pool and querying the strings from it. Nevertheless, such a high read contention on the JVM string pool is still an unlikely event.
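Test #2 above (unique values per thread) can be sketched as follows – not the exact harness used for the measurements, but the same shape, with arbitrary iteration counts:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class InternContentionTest {
    public static void main(String[] args) throws InterruptedException {
        final int THREADS = 8, PER_THREAD = 1_000_000;
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        CountDownLatch done = new CountDownLatch(THREADS);

        long start = System.currentTimeMillis();
        for (int t = 0; t < THREADS; ++t) {
            final int id = t;
            pool.submit(() -> {
                // a per-thread prefix guarantees unique values -> pure write contention
                for (int i = 0; i < PER_THREAD; ++i)
                    ("thread" + id + "_" + i).intern();
                done.countDown();
            });
        }
        done.await();
        System.out.println((System.currentTimeMillis() - start) + " ms");
        pool.shutdown();
    }
}
```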

Continue reading

String.intern in Java 6, 7 and 8 – string pooling

by Mikhail Vorontsov

This article will describe how String.intern method was implemented in Java 6 and what changes were made in it in Java 7 and Java 8.

First of all I want to thank Yannis Bres for inspiring me to write this article.

07 June 2014 update: added 60013 as a default string pool size since Java 7u40 (instead of Java 8), added -XX:+PrintFlagsFinal and -XX:+PrintStringTableStatistics JVM parameter references, cleaned up a few typos.

This is an updated version of this article including -XX:StringTableSize=N JVM parameter description. This article is followed by String.intern in Java 6, 7 and 8 – multithreaded access article describing the performance characteristics of the multithreaded access to String.intern().

String pooling

String pooling (aka string canonicalisation) is a process of replacing several String objects with equal value but different identity with a single shared String object. You can achieve this goal by keeping your own Map<String, String> (with possibly soft or weak references depending on your requirements) and using map values as canonicalised values. Or you can use String.intern() method which is provided to you by JDK.
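The manual map approach can be sketched in a few lines (strong references for brevity; the class and method names here are mine, not a JDK API – use soft or weak references if entries must be evictable):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class StringPool {
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    /** Returns the single shared instance equal to s. */
    public String canonicalize(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        StringPool pool = new StringPool();
        String a = pool.canonicalize(new String("abc"));
        String b = pool.canonicalize(new String("abc"));
        System.out.println(a == b); // prints "true": equal values, single identity
    }
}
```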

In the days of Java 6, using String.intern() was forbidden by many standards due to a high possibility of getting an OutOfMemoryError if pooling went out of control. The Oracle Java 7 implementation of string pooling was changed considerably. You can look for details in http://bugs.sun.com/view_bug.do?bug_id=6962931 and http://bugs.sun.com/view_bug.do?bug_id=6962930.

String.intern() in Java 6

In those good old days all interned strings were stored in the PermGen – the fixed size part of heap mainly used for storing loaded classes and string pool. Besides explicitly interned strings, PermGen string pool also contained all literal strings earlier used in your program (the important word here is used – if a class or method was never loaded/called, any constants defined in it will not be loaded).

The biggest issue with such a string pool in Java 6 was its location – the PermGen. PermGen has a fixed size and cannot be expanded at runtime. You can set it using the -XX:MaxPermSize=N option. As far as I know, the default PermGen size varies between 32M and 96M depending on the platform. You can increase its size, but its size will still be fixed. Such a limitation required very careful usage of String.intern – you’d better not intern any uncontrolled user input using this method. That’s why string pooling in the days of Java 6 was mostly implemented in manually managed maps.

String.intern() in Java 7

Oracle engineers made an extremely important change to the string pooling logic in Java 7 – the string pool was relocated to the heap. It means that you are no longer limited by a separate fixed size memory area. All strings are now located in the heap, as most of other ordinary objects, which allows you to manage only the heap size while tuning your application. Technically, this alone could be a sufficient reason to reconsider using String.intern() in your Java 7 programs. But there are other reasons.

Continue reading

A few more memory saving techniques in Java

by Mikhail Vorontsov

This article will list a few more ideas on cheap memory saving techniques in Java. It follows the An overview of memory saving techniques in Java, Memory consumption of popular Java data types – part 1 and Memory consumption of popular Java data types – part 2 articles earlier published in this blog. I strongly recommend reading An overview of memory saving techniques in Java before reading this article. This article assumes that the size of Object references is equal to 4 bytes.

Static inner classes

Java lacks normal library tuple support like the one Scala has. This forces Java developers to create a lot of (usually inner) classes whose only purpose is to compose a few elements into a single entity for a collection. One really popular example of such an inner class is Pair, which is often implemented as a generic class.

Non-static inner classes have an important property – they store a reference to their parent class. This reference allows them to seamlessly call parent methods and access parent fields. Such a reference is often not needed for simple tuple-like classes. You can get rid of it by marking your class static. This will save you 4 bytes and increase the chance that an instance occupies 8 bytes less (see An overview of memory saving techniques in Java for more details).

You should generally mark all your inner classes static unless there is an actual need to remove that qualifier.
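The hidden parent reference is easy to observe via reflection – it shows up as an extra synthetic field. A small sketch (the class names are illustrative):

```java
public class Outer {
    // non-static inner class: carries a hidden this$0 reference to the enclosing Outer
    class InnerPair {
        final int first, second;
        InnerPair(int first, int second) { this.first = first; this.second = second; }
    }

    // static nested class: no hidden reference, 4 bytes smaller per instance (with 4-byte refs)
    static class Pair {
        final int first, second;
        Pair(int first, int second) { this.first = first; this.second = second; }
    }

    public static void main(String[] args) {
        System.out.println(InnerPair.class.getDeclaredFields().length); // 3: first, second, this$0
        System.out.println(Pair.class.getDeclaredFields().length);      // 2: first, second
    }
}
```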

Continue reading

Memory consumption of popular Java data types – part 2

by Mikhail Vorontsov

This article will give you an overview of the memory consumption of some popular Java data types. It follows the An overview of memory saving techniques in Java and Memory consumption of popular Java data types – part 1 articles earlier published in this blog. I strongly recommend reading An overview of memory saving techniques in Java before reading this one. This article assumes that the size of references is equal to 4 bytes.

HashMap, THashMap

HashMap is the most popular map in the JDK. Provided that it contains objects with a quality hashCode method and its load factor is not too high, it will generally give you O(1) performance of the get and put methods (as well as similar methods like contains).

You can find a proper hash map description in one of popular textbooks (“Algorithms (4th Edition)” or “Introduction to Algorithms”) or in Wikipedia. Here we will only discuss JDK HashMap implementation.

HashMap is built on top of an array of Map.Entry objects. The implementation ensures that this array length is always equal to at least max( size, capacity ) / load_factor. The default load factor for HashMap is 0.75 and the default capacity is 16. The load factor specifies which part of the array could be used for storage and is a value between 0 and 1. The higher the load factor, the less space is wasted, but the HashMap starts to work slower due to an increased rate of collisions. The smaller the load factor, the more memory is wasted, but the performance of a HashMap increases due to a smaller probability of collisions.

So, as you have seen, the default (new HashMap<>()) size of the array of entries is 16 / 0.75 = 21.33 ~ 22 (in practice the implementation rounds the array length up to a power of two).

What is a HashMap.Entry? It contains a key, a value, the int hash of the key and a pointer to the next entry (entries with colliding hashes form a linked list). It means that an entry occupies 32 bytes (12 bytes header + 16 bytes data + 4 bytes padding). So, a HashMap with size = S has to spend 32 * S bytes for entry storage. Besides that, it will use 4 * C bytes for the entry array, where C is the map capacity.

As you can see, if you fill the map to less than 12.5% of its capacity (4 * C > 32 * S when S / C < 1/8), the entry array size will start dominating over the entries themselves.
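The break-even point follows directly from the two terms above; a tiny sketch of the estimate (4-byte references assumed, as in the rest of the article):

```java
public class HashMapMemoryEstimate {
    // 32 bytes per entry + 4 bytes per slot of the entry array
    static long estimateBytes(long size, long capacity) {
        return 32 * size + 4 * capacity;
    }

    public static void main(String[] args) {
        // well-filled map: the entries dominate
        System.out.println(estimateBytes(1000, 2048)); // prints "40192"
        // sparsely filled map (< 12.5%): the array dominates
        System.out.println(estimateBytes(100, 65536)); // prints "265344"
    }
}
```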

Continue reading

Memory consumption of popular Java data types – part 1

by Mikhail Vorontsov

This article will give you an overview of the memory consumption of some popular Java data types. This article follows the An overview of memory saving techniques in Java article earlier published in this blog. I strongly recommend reading the previous article before reading this one.

Enum, EnumMap, EnumSet

Enums were added to Java 1.5. Technically speaking, each enum is an Object and has all memory consumption properties of objects described in the previous article. Luckily, they have several useful properties:

All enum objects are singletons. This is guaranteed by the Java language – you can create an instance of an enum only by its definition. This means that you pay for an enum object once and then you only spend 4 bytes per reference to it (in this article I will assume that object references occupy 4 bytes). But what is the benefit of enums, besides type safety and better IDE support, if they consume as much memory as int variables and 4 times more than byte variables?

The answer is the ordinal() method implemented by every enum. It returns an increasing number starting from 0 and ending at the number of values in the given enum minus 1. Such an enum property allows us to use arrays for enum-to-any-other-object mapping: a value related to the first enum value will be stored in array[0], the second enum value will be mapped to array[1] and so on, according to the Enum.ordinal() method result. By the way, this was a short description of the JDK EnumMap class implementation.
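The ordinal-as-index idea can be sketched in a few lines (the enum and the values are made up for illustration):

```java
enum Color { RED, GREEN, BLUE }

public class OrdinalMapping {
    public static void main(String[] args) {
        // EnumMap-style mapping: ordinal() is the array index
        String[] names = new String[Color.values().length];
        names[Color.RED.ordinal()] = "red";
        names[Color.BLUE.ordinal()] = "blue";
        System.out.println(names[Color.BLUE.ordinal()]); // prints "blue"
    }
}
```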

If you need to implement a map from Object into enum, you won’t have a tailored JDK implementation at hand. Of course, you can use any JDK Object-to-Object map, but it wouldn’t be that efficient. The easy way here is to use Trove TObjectByteMap (or TObjectIntMap in rare cases when you have more than 128 values in your enum) and map from the Object key into the Enum.ordinal() value. You will need a decoding method for getters in order to convert the byte back into an enum. This method will require 1 byte per entry, which is the best we can do without paying a CPU algorithmic penalty (of course, we could use less than 1 byte per enum if there are fewer than 128, 64, 32, etc. elements in the enum, but it may make your code more complicated for a very small memory gain).

With all this knowledge at hand, you may now realize that EnumSet is implemented similarly to BitSet. There are 2 EnumSet implementations in the JDK: RegularEnumSet and JumboEnumSet. The former is used for enums having at most 64 values (it covers 99.9999% of real-world enums), the latter is used for larger enumerations.

RegularEnumSet utilizes the knowledge of the fact that there are at most 64 values in the enum. This allows RegularEnumSet to use a single long to store all “enum present in the set” flags, rather than using a long[] (utilized by JumboEnumSet or BitSet). Using a single long instead of long[1] saves 4 bytes on the long[] reference and 16 bytes on the long value (a long occupies 8 bytes, while long[1] needs 12 bytes for the header, 8 bytes for the single long and 4 bytes for alignment).
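The single-long trick can be sketched as follows (my own simplified version for illustration; RegularEnumSet itself is more general and the enum is made up):

```java
enum Flag { READ, WRITE, EXECUTE }

public class BitFlagSet {
    private long bits; // a single long covers enums with at most 64 values, as in RegularEnumSet

    void add(Flag f)         { bits |= 1L << f.ordinal(); }
    void remove(Flag f)      { bits &= ~(1L << f.ordinal()); }
    boolean contains(Flag f) { return (bits & (1L << f.ordinal())) != 0; }

    public static void main(String[] args) {
        BitFlagSet set = new BitFlagSet();
        set.add(Flag.READ);
        set.add(Flag.EXECUTE);
        System.out.println(set.contains(Flag.READ));  // prints "true"
        System.out.println(set.contains(Flag.WRITE)); // prints "false"
    }
}
```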

Continue reading