Tag Archives: date

JSR 310 – Java 8 Date/Time library performance (as well as Joda Time 2.3 and j.u.Calendar)

by Mikhail Vorontsov

Introduction

This is the third date/time article in this blog. I advise you to look at the other two as well: “java.util.Date, java.util.Calendar and java.text.SimpleDateFormat performance” and “Joda Time library performance”.

This article is a short overview of the new Java 8 date/time implementation, also known as JSR-310. I will compare the JSR-310 implementation and its performance with their counterparts from the Joda Time library, as well as with the good old java.util.GregorianCalendar. This review was written and tested on Java 8 ea b121.

All new Java 8 classes are implemented around the human time concept: separate fields for years, months, days, hours, minutes, seconds and, in line with the current fashion, nanoseconds. Their counterpart is machine time: the number of milliseconds since the epoch, which you can obtain, for example, via a System.currentTimeMillis() call. To convert time between these two systems you need to know which timezone to use. A timezone defines the offset from UTC used in conversions. Calculating the offset may require a transition table or transition rules defining when the daylight saving changes happen, and this can sometimes become a performance bottleneck.
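To make the human/machine time distinction concrete, here is a minimal sketch using the java.time API. The same millisecond value maps to different local fields depending on the zone, and converting back needs the zone again. The class name HumanMachineTime is illustrative only, and “Australia/Sydney” is just an example zone.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Illustrative helper, not part of the article's benchmark code.
class HumanMachineTime {
    // Machine time -> human time: the zone decides which local fields an instant has.
    static LocalDateTime toHuman(long epochMillis, ZoneId zone) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), zone);
    }

    // Human time -> machine time: the zone supplies the UTC offset for that local date/time.
    static long toMachine(LocalDateTime local, ZoneId zone) {
        return local.atZone(zone).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        long millis = 1356998400000L; // 2013-01-01T00:00:00Z
        System.out.println(toHuman(millis, ZoneOffset.UTC));                 // 2013-01-01T00:00
        System.out.println(toHuman(millis, ZoneId.of("Australia/Sydney"))); // 2013-01-01T11:00 (summer time)
    }
}
```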

The JSR-310 implementation was inspired by the Joda Time library. Both libraries have similar interfaces, but their implementations differ greatly: Java 8 classes are built around human time, while Joda Time uses machine time internally. As a result, if you are looking for the fastest implementation, I would recommend using Java 8 classes except in the following situations:

  • You can’t use Java 8 (yeah, not many people can use it before the first official release…)
  • You work strictly with machine time within a range of a few days (in this case a manual long/int based implementation will be faster).
  • You have to parse timestamps including offset/zone id.
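A minimal sketch of the “manual long/int based implementation” from the second bullet, assuming the UTC offset stays constant over the whole range you work with (no daylight saving transition inside it). FixedOffsetClock is a hypothetical name, not a class from the article’s test code.

```java
// Hypothetical helper: only valid while the UTC offset does not change,
// e.g. within a few days that contain no daylight saving transition.
final class FixedOffsetClock {
    private static final long MILLIS_PER_DAY = 86_400_000L;
    private final long offsetMillis; // assumed constant UTC offset of the local zone

    FixedOffsetClock(long offsetMillis) {
        this.offsetMillis = offsetMillis;
    }

    // Milliseconds since local midnight, using plain long arithmetic (floorMod handles pre-1970 values).
    long millisOfDay(long epochMillis) {
        return Math.floorMod(epochMillis + offsetMillis, MILLIS_PER_DAY);
    }

    int hourOfDay(long epochMillis) {
        return (int) (millisOfDay(epochMillis) / 3_600_000L);
    }

    int minuteOfHour(long epochMillis) {
        return (int) (millisOfDay(epochMillis) / 60_000L % 60);
    }

    public static void main(String[] args) {
        FixedOffsetClock sydneySummer = new FixedOffsetClock(11 * 3_600_000L); // UTC+11 assumed
        long now = System.currentTimeMillis();
        System.out.println(sydneySummer.hourOfDay(now) + ":" + sydneySummer.minuteOfHour(now));
    }
}
```

The trade-off is that correctness now depends on the caller knowing the offset is fixed for the whole range; no timezone lookup is performed at all.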


Joda Time library performance

by Mikhail Vorontsov

Joda Time library is an attempt to create a cleaner and more developer-friendly date/time library than the existing JDK Date/Calendar/DateFormat ecosystem. This article provides a top-level overview of the Joda Time library from a performance point of view. It may also be useful as a short introduction to Joda Time. I recommend reading the “java.util.Date, java.util.Calendar and java.text.SimpleDateFormat” article before this one in order to better understand the performance of the built-in JDK classes.

This article discusses Joda Time library versions 2.1 – 2.3.


26 Jan 2014: This article has been largely rewritten after a major performance issue was found in the Joda Time implementation.

Date/time storage classes

There are five general purpose date/time classes in this library:

  • DateTime – full replacement of java.util.Calendar, supporting time zones and date/time arithmetic. This class is immutable.
  • MutableDateTime – mutable version of DateTime.
  • LocalDate – immutable version of DateTime containing only date fields. This class does not use a timezone after construction.
  • LocalTime – immutable version of DateTime containing only time fields. This class does not use a timezone after construction.
  • LocalDateTime – immutable version of DateTime not using a timezone.

All these classes contain two mandatory fields: a long with millis since epoch and a Chronology object containing all timezone and calendar system (Gregorian, Buddhist, etc.) related logic. Some of these classes contain 1-2 more fields. An instance of these classes occupies from 24 to 40 bytes.

Since these classes are based on “machine” time (millis since epoch), we may expect better performance on from/to long conversions and worse performance on to/from date component (human time) conversions. In reality, the Joda developers have made a series of clever optimizations which make the library really fast even on human time calculations.

Joda Time timezone offset calculation performance bug (ver 2.3)

Let’s start the updated version of this article from this point 🙂 When I wrote the original version of this article in late 2012, I noticed the poor performance of timezone-based logic in Joda Time, but I thought it was a general property of that library, so I wrote that “this logic is definitely worth optimizing in Joda Time”.

One year later I wrote a more comprehensive set of tests which included the new Java 8 date/time implementation (JSR-310). This test set highlighted some weird inconsistencies between the various date/time implementations. In particular, I noticed that creating a Joda MutableDateTime from date components was 3-4 times faster than the same operation on date+time components, even though both tests used identical client logic. But there was one difference: the date tests used years from 1981 to 2000 (in a loop), while the date+time tests used 2013. This turned out to be the key to the problem.

MutableDateTime, as well as some other Joda Time classes, calculates the timezone offset for a given “millis since epoch” value from time to time; it may be calculated more than once per client API call. Deep under the hood there is the org.joda.time.tz.DateTimeZoneBuilder$PrecalculatedZone class with a getOffset method. This method looks up the transition table for the given timezone using binary search. If your timestamp is less than or equal to the biggest timestamp in the table, you get the offset from the table. Otherwise the org.joda.time.tz.DateTimeZoneBuilder$DSTZone.getOffset method is called for every offset you calculate. It uses the daylight saving transition rules to calculate the latest transition and uses it for the offset calculation. Calculated values are not cached on this branch.
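The lookup strategy described above can be sketched as follows. This is not Joda Time’s code: TableThenRulesZone is a hypothetical class which borrows its transition table and rules from the JDK’s java.time.zone.ZoneRules, binary searches the table and, past its last entry, recomputes the offset from the rules on every call without caching, mirroring the slow branch described in the paragraph.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneOffsetTransitionRule;
import java.time.zone.ZoneRules;
import java.util.Arrays;
import java.util.List;

// Hypothetical class illustrating the "table, then rules" split; data comes from the JDK, not Joda.
final class TableThenRulesZone {
    private final long[] transitionMillis;              // sorted transition instants (the table)
    private final ZoneOffset[] offsetAfter;             // offset in force from transitionMillis[i]
    private final ZoneOffset initialOffset;             // offset before the first transition
    private final List<ZoneOffsetTransitionRule> rules; // consulted beyond the end of the table

    TableThenRulesZone(ZoneRules zoneRules) {
        List<ZoneOffsetTransition> table = zoneRules.getTransitions();
        transitionMillis = new long[table.size()];
        offsetAfter = new ZoneOffset[table.size()];
        for (int i = 0; i < table.size(); ++i) {
            transitionMillis[i] = table.get(i).toEpochSecond() * 1000L;
            offsetAfter[i] = table.get(i).getOffsetAfter();
        }
        initialOffset = table.isEmpty() ? zoneRules.getOffset(Instant.EPOCH) : table.get(0).getOffsetBefore();
        rules = zoneRules.getTransitionRules();
    }

    ZoneOffset getOffset(long millis) {
        int idx = Arrays.binarySearch(transitionMillis, millis);
        if (idx >= 0) return offsetAfter[idx];              // exactly on a transition
        int before = -idx - 1;                              // number of transitions earlier than millis
        if (before == transitionMillis.length && !rules.isEmpty())
            return offsetFromRules(millis);                 // past the table: slow path
        return before == 0 ? initialOffset : offsetAfter[before - 1];
    }

    // Slow path: recomputes last year's and this year's transitions on every call, no caching.
    private ZoneOffset offsetFromRules(long millis) {
        int year = Instant.ofEpochMilli(millis).atOffset(ZoneOffset.UTC).getYear();
        ZoneOffsetTransition latest = null;
        for (int y = year - 1; y <= year; ++y) {
            for (ZoneOffsetTransitionRule rule : rules) {
                ZoneOffsetTransition t = rule.createTransition(y);
                if (t.toEpochSecond() * 1000L <= millis
                        && (latest == null || t.toEpochSecond() > latest.toEpochSecond()))
                    latest = t;
            }
        }
        if (latest != null) return latest.getOffsetAfter();
        return transitionMillis.length > 0 ? offsetAfter[transitionMillis.length - 1] : initialOffset;
    }

    public static void main(String[] args) {
        TableThenRulesZone zone = new TableThenRulesZone(ZoneId.of("Europe/Amsterdam").getRules());
        System.out.println(zone.getOffset(1199145600000L)); // 2008-01-01T00:00Z -> +01:00
    }
}
```

In this sketch every slow-path call allocates fresh ZoneOffsetTransition objects, which is the kind of repeated work the paragraph describes.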

I noticed this difference between years 2008 and 2009 in the “Australia/Sydney” timezone. After that I ran the same test on all available timezones and found a list of zones in and around Australia and New Zealand with the same performance issue: offsets in 2009 were calculated much slower than in 2008. At the same time I noticed that European timezones were slow in both 2008 and 2009. This led me to the following conclusion.

Joda Time ships with timezone rule source files in the src/main/java/org/joda/time/tz/src directory. If you look at the “australasia” file and search for the “New South Wales” rule, you will see that its last two lines are:

Rule	AN	2008	max	-	Apr	Sun>=1	2:00s	0	-
Rule	AN	2008	max	-	Oct	Sun>=1	2:00s	1:00	-

2008 is the last fast year (and I used 1 January for testing). After that I got more suspicious, opened the “europe” file and got really worried. For example, the last rule for the Netherlands (Europe/Amsterdam) dates back to 1945:

Rule	Neth	1945	only	-	Apr	 2	2:00s	1:00	S
Rule	Neth	1945	only	-	Sep	16	2:00s	0	-
    

The longer a country lives on a stable daylight saving rule, the more it gets penalized by Joda Time: it takes ~3.6 seconds to create 10M MutableDateTime objects for year 2008 in the “Europe/Amsterdam” timezone and ~3 seconds to create 10M MutableDateTime objects for year 2010 in the “Australia/Sydney” timezone (which also has to calculate its daylight saving transitions), but only ~1.2 seconds for year 2008 in Sydney (precalculated table).


So I would like to ask the Joda Time maintainers to consider prebuilding such transition tables during timezone construction (at runtime), up to at least the current year plus a few more years.
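A minimal sketch of that suggestion. PrebuiltZone is a hypothetical class, again fed from the JDK’s ZoneRules rather than Joda’s own data. It expands the transition rules into the table once, up to a caller-chosen lastYear, so that lookups up to that horizon are a plain binary search. Past the last prebuilt transition it simply delegates to ZoneRules.getOffset.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneOffsetTransitionRule;
import java.time.zone.ZoneRules;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical class: rule-based transitions are materialized at construction time.
final class PrebuiltZone {
    private final ZoneRules zoneRules;     // fallback beyond the prebuilt horizon
    private final long[] transitionMillis; // historic + generated transitions, sorted
    private final ZoneOffset[] offsetAfter;
    private final ZoneOffset initialOffset;

    PrebuiltZone(ZoneRules zoneRules, int lastYear) {
        this.zoneRules = zoneRules;
        List<ZoneOffsetTransition> all = new ArrayList<>(zoneRules.getTransitions());
        long tableEnd = all.isEmpty() ? Long.MIN_VALUE : all.get(all.size() - 1).toEpochSecond();
        int fromYear = all.isEmpty() ? 1970
                : Instant.ofEpochSecond(tableEnd).atOffset(ZoneOffset.UTC).getYear();
        for (int year = fromYear; year <= lastYear; ++year) {
            for (ZoneOffsetTransitionRule rule : zoneRules.getTransitionRules()) {
                ZoneOffsetTransition t = rule.createTransition(year);
                if (t.toEpochSecond() > tableEnd) all.add(t); // keep only transitions past the table
            }
        }
        all.sort(Comparator.comparingLong(ZoneOffsetTransition::toEpochSecond));
        transitionMillis = new long[all.size()];
        offsetAfter = new ZoneOffset[all.size()];
        for (int i = 0; i < all.size(); ++i) {
            transitionMillis[i] = all.get(i).toEpochSecond() * 1000L;
            offsetAfter[i] = all.get(i).getOffsetAfter();
        }
        initialOffset = all.isEmpty() ? zoneRules.getOffset(Instant.EPOCH) : all.get(0).getOffsetBefore();
    }

    ZoneOffset getOffset(long millis) {
        int idx = Arrays.binarySearch(transitionMillis, millis);
        if (idx >= 0) return offsetAfter[idx];              // exactly on a transition
        int before = -idx - 1;
        if (before == transitionMillis.length)              // after the last prebuilt transition
            return zoneRules.getOffset(Instant.ofEpochMilli(millis));
        return before == 0 ? initialOffset : offsetAfter[before - 1];
    }

    public static void main(String[] args) {
        PrebuiltZone zone = new PrebuiltZone(ZoneId.of("Australia/Sydney").getRules(), 2025);
        System.out.println(zone.getOffset(System.currentTimeMillis()));
    }
}
```

The cost is paid once per zone at construction and grows with the number of prebuilt years; lookups within the horizon never touch the rules.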


Use case: FIX message processing. Part 1: Writing a simple FIX parser

by Mikhail Vorontsov

In this article we will look at a “real life” example: we will describe how to parse a tag-based FIX message and how to improve the original parsing code. The second part of this article will be dedicated to implementing a simple gateway for FIX messages and finding out why parse-compose logic is very bad from a performance point of view.

FIX messages consist of a number of fields. Each field has a name (a decimal number in FIX, also called a tag) and a value (its datatype depends on the field name). Fields are separated by 0x01 (the SOH character) and the name is separated from the value by =. This is a textual message format, so field 45 with the value ‘test’ will look like ‘45=test’. FIX also defines some binary fields, consisting of a field name, field length and raw data, which may contain 0x01, but for the sake of simplicity we will not discuss them.
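The field layout described above can be split without regular expressions. The following is a minimal sketch, assuming plain tag=value fields only (the binary fields mentioned above are not handled); FixSplitter and TagValue are hypothetical names, not the classes from the article’s own code.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: splits a tag=value FIX string on SOH (0x01) using indexOf.
final class FixSplitter {
    static final char SOH = 0x01; // FIX field separator

    static final class TagValue {
        final int tag;
        final String value;

        TagValue(int tag, String value) {
            this.tag = tag;
            this.value = value;
        }
    }

    static List<TagValue> split(String msg) {
        List<TagValue> res = new ArrayList<>();
        int pos = 0;
        while (pos < msg.length()) {
            int end = msg.indexOf(SOH, pos);
            if (end < 0) end = msg.length(); // the last field may lack a trailing SOH
            int eq = msg.indexOf('=', pos);
            if (eq < 0 || eq > end)
                throw new IllegalArgumentException("Field without '=' at offset " + pos);
            // Only the first '=' separates name from value; the value itself may contain '='.
            res.add(new TagValue(Integer.parseInt(msg.substring(pos, eq)), msg.substring(eq + 1, end)));
            pos = end + 1;
        }
        return res;
    }

    public static void main(String[] args) {
        // Write ';' in the literal for readability, then turn it into the real separator.
        String msg = "45=test;9=4444;".replace(';', SOH);
        for (TagValue f : split(msg)) System.out.println(f.tag + " -> " + f.value);
    }
}
```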

Message parsing: naive approach

Let’s start writing a message parser. For ease of reading, the field separator 0x01 was replaced by a semicolon in the source code. This doesn’t change any logic, it only makes the message literal more readable. I’ve also replaced real FIX fields with very fake ones and left only date/int/double/string field formats. Adding more of them is straightforward, but not beneficial for this article.

The following code parses a message 20K times at the beginning, in order to get the test code compiled by the JIT, and 10M times after that for the actual test. It parses a “FIX” message string into a list of Field objects, each holding a field id and a field value.

Note: the actual code for this article (see link at the end of the article) is more object oriented 🙂

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FixTests {
    private static final int ITERS = 10000000;

    private static final String MESSAGE = "1=123;5=test data;7=20120815;8=another data field, this one is rather long;" +
            "8=and one more field, looks like a repeating one;14=20120101;9=4444;21=20111231;48=one more string field to parse;" +
            "5=another field 5, why does it repeat itself?;1=123;5=test data;7=20120815;8=another data field, this one is rather long;100=144.82;102=2.25";

    public static void main(String[] args) {
        test( 20000 ); //warm-up run, lets the JIT compile the parsing code
        test( ITERS ); //measured run
    }

    private static void test( final int iters )
    {
        long cnt = 0;
        final long start = System.currentTimeMillis();
        for ( int i = 0; i < iters; ++i )
        {
            final List<Field> fields = parse( MESSAGE );
            cnt += fields.size(); //consume the result so the parsing is not dead code
        }
        final long time = System.currentTimeMillis() - start;
        if ( iters >= 100000 ) //report only the measured run
            System.out.println( "Time to parse " + iters + " messages = " + time / 1000.0 + " sec, cnt = " + cnt );
    }

    private static Set<Integer> set( final int... values )
    {
        final Set<Integer> res = new HashSet<Integer>( values.length );
        for ( final int i : values )
            res.add( i );
        return res;
    }

    //numbers of non-string fields
    private static final Set<Integer> DATE_FIELDS = set( 7, 14, 21 );
    //fields 1 and 9 hold integer values in MESSAGE (7, 14 and 21 are date fields and are matched first)
    private static final Set<Integer> INT_FIELDS = set( 1, 9 );
    private static final Set<Integer> DOUBLE_FIELDS = set( 100, 102 );

    private static final String FIELD_SEPARATOR = ";";
    private static final String VALUE_SEPARATOR = "=";

    private static final class Field
    {
        public final int id;
        public final Object value;

        private Field(int id, Object value) {
            this.id = id;
            this.value = value;
        }
    }

    //SimpleDateFormat objects are not threadsafe, so such wrapper will save us from multithreading issues
    private static final ThreadLocal<SimpleDateFormat> DATE_FORMAT = new ThreadLocal<SimpleDateFormat>()
    {
        @Override
        protected SimpleDateFormat initialValue() {
            final SimpleDateFormat sdf = new SimpleDateFormat( "yyyyMMdd" );
            sdf.setLenient( true );
            return sdf;
        }
    };

    private static List<Field> parse( final String str )
    {
        final String[] parts = str.split( FIELD_SEPARATOR );
        final List<Field> res = new ArrayList<Field>( parts.length );
        for ( final String part : parts )
        {
            final String[] subparts = part.split( VALUE_SEPARATOR );
            final int fieldId = Integer.parseInt( subparts[ 0 ] );
            if ( DATE_FIELDS.contains( fieldId ) )
            {
                try {
                    res.add( new Field( fieldId, DATE_FORMAT.get().parse( subparts[ 1 ] ) ) );
                } catch (ParseException e) {
                    //not production code: a malformed date field is silently skipped
                }
            }
            else if ( INT_FIELDS.contains( fieldId ) )
                res.add( new Field( fieldId, Integer.parseInt( subparts[ 1 ] ) ) );
            else if ( DOUBLE_FIELDS.contains( fieldId ) )
                res.add( new Field( fieldId, Double.parseDouble( subparts[ 1 ] ) ) );
            else //string
                res.add( new Field( fieldId, subparts[ 1 ] ) );
        }
        return res;
    }
}
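One possible way to speed up the date fields, sketched here as an assumption rather than the article’s own improvement, is to skip SimpleDateFormat and read the eight digits of a yyyyMMdd value directly. CompactDateParser is a hypothetical helper that requires Java 8 or later. Unlike the lenient SimpleDateFormat above, it rejects out-of-range months or days (LocalDate.of throws DateTimeException), and it returns java.time.LocalDate rather than java.util.Date.

```java
import java.time.LocalDate;

// Hypothetical helper: parses "yyyyMMdd" without SimpleDateFormat.
final class CompactDateParser {
    // Reads exactly eight ASCII digits starting at 'from', e.g. the value part of "7=20120815".
    static LocalDate parseYyyyMMdd(CharSequence s, int from) {
        int year = digits(s, from, 4);
        int month = digits(s, from + 4, 2);
        int day = digits(s, from + 6, 2);
        return LocalDate.of(year, month, day); // throws DateTimeException for invalid dates
    }

    private static int digits(CharSequence s, int from, int count) {
        int res = 0;
        for (int i = from; i < from + count; ++i) {
            char c = s.charAt(i);
            if (c < '0' || c > '9')
                throw new NumberFormatException("Not a digit at " + i + ": " + c);
            res = res * 10 + (c - '0');
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println(parseYyyyMMdd("7=20120815", 2)); // 2012-08-15
    }
}
```

Reading digits in place also avoids creating the intermediate substring, since the offset can point straight into the field.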


java.util.Date, java.util.Calendar and java.text.SimpleDateFormat performance

by Mikhail Vorontsov

In this post we will discuss memory footprint and speed of various classes used to represent and parse dates and times in core Java:

  • java.util.Date – Java 1.0 style datetime storage class, now absolutely useless
  • java.util.Calendar – Java 1.1+ style datetime storage class, supports time zones and locales, useful for date calculations
  • java.text.SimpleDateFormat – most commonly used datetime parser

Datetime storage classes

There are two core Java classes designed to store dates and times: java.util.Date and java.util.Calendar. Although the former is used in a lot of APIs, it is actually just a wrapper over a long timestamp field. It is also important to note that java.util.Date is not immutable, which further limits its use.
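Because java.util.Date is mutable, a class that stores one usually has to copy it on the way in and on the way out. A minimal sketch; Deal is a hypothetical class name.

```java
import java.util.Date;

// Hypothetical value class: java.util.Date is mutable, so it is copied in and out.
final class Deal {
    private final Date tradeDate;

    Deal(Date tradeDate) {
        this.tradeDate = new Date(tradeDate.getTime()); // copy in: the caller keeps no handle to our field
    }

    Date getTradeDate() {
        return new Date(tradeDate.getTime()); // copy out: callers can't change our state
    }

    public static void main(String[] args) {
        Date d = new Date(0L);
        Deal deal = new Deal(d);
        d.setTime(86_400_000L);                             // mutating the caller's Date...
        System.out.println(deal.getTradeDate().getTime()); // ...does not affect the Deal: prints 0
    }
}
```

Each copy is another object allocation, which adds to the cost of using Date in hot code.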

The latter, java.util.Calendar, was added in Java 1.1 in order to support internationalization. It also has good support for datetime arithmetic operations, like adding 30 days to a given date, and correctly handles daylight saving transitions along the way. So this class should be used for most datetime operations in modern software.
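The difference between calendar arithmetic and plain millisecond arithmetic shows up around a daylight saving change. In this sketch (zone and dates chosen for illustration, class name hypothetical), 2013-03-15 12:00 in Europe/Amsterdam is moved forward by 30 days. Calendar.add keeps the 12:00 wall-clock time, while adding 30 × 24 hours of milliseconds lands at 13:00, because the clocks moved forward on 31 March 2013.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

final class CalendarArithmetic {
    static final TimeZone AMSTERDAM = TimeZone.getTimeZone("Europe/Amsterdam");

    // Calendar arithmetic: adds 30 calendar days and keeps the local wall-clock time.
    static Calendar plus30Days(Calendar start) {
        Calendar c = (Calendar) start.clone();
        c.add(Calendar.DAY_OF_MONTH, 30);
        return c;
    }

    // Naive arithmetic: adds 30 * 24 hours of milliseconds, ignoring the DST change in between.
    static Calendar plus30x24Hours(Calendar start) {
        Calendar c = (Calendar) start.clone();
        c.setTimeInMillis(start.getTimeInMillis() + 30L * 24 * 60 * 60 * 1000);
        return c;
    }

    public static void main(String[] args) {
        Calendar start = new GregorianCalendar(AMSTERDAM);
        start.clear();
        start.set(2013, Calendar.MARCH, 15, 12, 0);
        System.out.println(plus30Days(start).get(Calendar.HOUR_OF_DAY));     // 12
        System.out.println(plus30x24Hours(start).get(Calendar.HOUR_OF_DAY)); // 13
    }
}
```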

Still, when it comes to storing millions of datetimes, both classes are a rather bad choice. Just remember the footprint of their instances under the most common conditions:

Class                          Instance size
java.util.Date                 24 bytes
java.util.GregorianCalendar    448 bytes

Yes, 448 bytes to keep a timestamp, locale and time zone. Take a look at your IDE debugger to see how much information is stored inside a java.util.GregorianCalendar – the most commonly used subclass of java.util.Calendar.
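If you only need to keep millions of timestamps, one option is to store them as primitive longs and create a Calendar only when a value is actually read. This is an assumption for illustration, not something measured in this post. TimestampColumn is a hypothetical class name.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical container: timestamps live in one long[] (8 bytes per element),
// and a Calendar is created only on demand.
final class TimestampColumn {
    private final long[] millis;

    TimestampColumn(int size) {
        millis = new long[size];
    }

    void set(int index, long epochMillis) {
        millis[index] = epochMillis;
    }

    long get(int index) {
        return millis[index];
    }

    // Builds a short-lived Calendar for field access or arithmetic on a single value.
    Calendar calendarAt(int index, TimeZone zone) {
        Calendar c = new GregorianCalendar(zone);
        c.setTimeInMillis(millis[index]);
        return c;
    }

    public static void main(String[] args) {
        TimestampColumn col = new TimestampColumn(1_000_000);
        col.set(0, System.currentTimeMillis());
        System.out.println(col.calendarAt(0, TimeZone.getDefault()).getTime());
    }
}
```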
