Tag Archives: parsing

JSR 310 – Java 8 Date/Time library performance (as well as Joda Time 2.3 and j.u.Calendar)

by Mikhail Vorontsov

Introduction

This is the third date/time article in this blog. I advise you to look at the other two as well: “java.util.Date, java.util.Calendar and java.text.SimpleDateFormat” and “Joda Time library performance”.

This article is a short overview of the new Java 8 date/time implementation, also known as JSR-310. I will compare the JSR-310 implementation and its performance with their counterparts from the Joda Time library, as well as with the good old java.util.GregorianCalendar. This review was written and tested on Java 8 ea b121.

All new Java 8 classes are implemented around the human time concept: separate fields for years, months, days, hours, minutes, seconds and, in line with the current fashion, nanoseconds. Its counterpart is machine time: the number of milliseconds since the epoch, which you may obtain, for example, via a System.currentTimeMillis() call. To convert time between these two systems you need to know which timezone to use. A timezone defines the offset from UTC used in conversions. Offset calculation may require a transition table or transition rules defining when daylight saving changes happen. Sometimes this becomes a performance bottleneck.
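To make the human/machine time split concrete, here is a minimal sketch using the JSR-310 API (Java 8). The class name MachineHumanTime and the America/New_York zone are illustrative choices, not taken from the article. It converts epoch milliseconds to a LocalDateTime and back, and shows that the same zone uses a different UTC offset in winter and in summer, which is the information the zone's transition rules supply.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical helper: converts between machine time (epoch millis)
// and human time (LocalDateTime fields) through a ZoneId.
final class MachineHumanTime {

    // Machine time -> human time: the zone decides which offset applies.
    static LocalDateTime toHuman(long epochMillis, ZoneId zone) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), zone);
    }

    // Human time -> machine time: the zone rules resolve the offset
    // (including daylight saving) for this local date/time.
    static long toMachine(LocalDateTime local, ZoneId zone) {
        return local.atZone(zone).toInstant().toEpochMilli();
    }

    // The UTC offset the zone uses at a given local date/time.
    static ZoneOffset offsetAt(LocalDateTime local, ZoneId zone) {
        return zone.getRules().getOffset(local);
    }

    public static void main(String[] args) {
        ZoneId ny = ZoneId.of("America/New_York");
        long now = System.currentTimeMillis();
        LocalDateTime local = toHuman(now, ny);
        System.out.println(local + " -> " + toMachine(local, ny));
        System.out.println("Winter: " + offsetAt(LocalDateTime.of(2014, 1, 15, 12, 0), ny));
        System.out.println("Summer: " + offsetAt(LocalDateTime.of(2014, 7, 15, 12, 0), ny));
    }
}
```

For local times that fall into a daylight saving gap or overlap, atZone silently picks an offset, so a round trip is only guaranteed for unambiguous local times.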

The JSR-310 implementation was inspired by the Joda Time library. Both libraries have similar interfaces, but their implementations differ greatly: Java 8 classes are built around human time, while Joda Time uses machine time internally. As a result, if you are looking for the fastest implementation, I would recommend using the Java 8 classes except in the following situations:

  • You can’t use Java 8 (yeah, not many people can use it before the first official release…)
  • You work strictly with machine time within a range of a few days (in this case a manual long/int-based implementation will be faster).
  • You have to parse timestamps including offset/zone id.
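For the second exception above, a hand-written implementation can be as simple as integer arithmetic on epoch milliseconds. The sketch below assumes all timestamps are in UTC, so no zone rules or daylight saving transitions are involved; the class UtcTimeOfDay is hypothetical, and no performance numbers are claimed for it.

```java
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneOffset;

// Hypothetical helper: extracts the UTC time of day from epoch millis
// using only long/int arithmetic and no object allocation.
// Assumption: timestamps are UTC (no zone rules, no DST).
final class UtcTimeOfDay {
    private static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    // Millis since UTC midnight; floorMod keeps pre-1970 values non-negative.
    static int millisOfDay(long epochMillis) {
        return (int) Math.floorMod(epochMillis, MILLIS_PER_DAY);
    }

    static int hour(long epochMillis)   { return millisOfDay(epochMillis) / 3_600_000; }
    static int minute(long epochMillis) { return millisOfDay(epochMillis) / 60_000 % 60; }
    static int second(long epochMillis) { return millisOfDay(epochMillis) / 1_000 % 60; }
    static int millis(long epochMillis) { return millisOfDay(epochMillis) % 1_000; }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Cross-check against the JSR-310 equivalent.
        LocalTime viaJsr310 = Instant.ofEpochMilli(now).atOffset(ZoneOffset.UTC).toLocalTime();
        System.out.printf("%02d:%02d:%02d.%03d vs %s%n",
                hour(now), minute(now), second(now), millis(now), viaJsr310);
    }
}
```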


Use case: FIX message processing. Part 1: Writing a simple FIX parser

by Mikhail Vorontsov

In this article we will look at a “real life” example: we will describe how to parse a tag-based FIX message and how to improve the original parsing code. The second part of this article will be dedicated to implementing a simple gateway for FIX messages and finding out why parse-compose logic is very bad from a performance point of view.

FIX messages consist of a number of fields. Each field has a name (in FIX it is a decimal number, also called a tag) and a value (its datatype depends on the field). Fields are separated by 0x01 (the SOH character), and the name is separated from the value by =. This is a textual message format, so field 45 with value ‘test’ will look like ‘45=test’. FIX also defines some binary fields consisting of a field name, a field length and raw data, which may contain 0x01, but for the sake of simplicity we will not discuss them.
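As an illustration of the wire format described above, the following sketch splits a message on the real SOH (0x01) separator using indexOf instead of regex-based String.split. The names FixFieldSplitter and TagValue are hypothetical, and binary length-prefixed fields are deliberately not supported. Because only the first = in each field is treated as the separator, a value that itself contains = survives intact.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tag=value splitter for SOH-separated messages.
// Binary (length-prefixed) fields are NOT handled.
final class FixFieldSplitter {
    static final char SOH = '\u0001';

    static final class TagValue {
        final int tag;
        final String value;
        TagValue(int tag, String value) { this.tag = tag; this.value = value; }
    }

    static List<TagValue> split(String msg) {
        List<TagValue> out = new ArrayList<>();
        int pos = 0;
        while (pos < msg.length()) {
            // The first '=' after pos ends the tag.
            int eq = msg.indexOf('=', pos);
            if (eq < 0) throw new IllegalArgumentException("No '=' after position " + pos);
            // The next SOH ends the value; tolerate a missing trailing SOH.
            int end = msg.indexOf(SOH, eq + 1);
            if (end < 0) end = msg.length();
            int tag = Integer.parseInt(msg.substring(pos, eq));
            out.add(new TagValue(tag, msg.substring(eq + 1, end)));
            pos = end + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        // Turn the article's readable ';'-separated form into real SOH separators.
        String msg = "1=123;5=test data;9=4444;".replace(';', SOH);
        for (TagValue f : split(msg)) {
            System.out.println(f.tag + " -> " + f.value);
        }
    }
}
```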

Message parsing: naive approach

Let’s start writing a message parser. Just for ease of reading, the field separator 0x01 was replaced by a semicolon in the source code. It doesn’t change any logic; it only makes the message literal more readable. I’ve also replaced real FIX fields with very fake ones and kept only the date/int/double/string field formats. Adding more of them is straightforward, but not beneficial for this article.

The following code parses a message 20K times first, so that the JIT compiles the test code, and then 10M times for the actual test. It parses a “FIX” message string into a list of Field objects, each holding a field id and a field value.

Note: the actual code for this article (see link at the end of the article) is more object oriented 🙂

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FixTests {
    private static final int ITERS = 10000000;

    private static final String MESSAGE = "1=123;5=test data;7=20120815;8=another data field, this one is rather long;" +
            "8=and one more field, looks like a repeating one;14=20120101;9=4444;21=20111231;48=one more string field to parse;" +
            "5=another field 5, why does it repeat itself?;1=123;5=test data;7=20120815;8=another data field, this one is rather long;100=144.82;102=2.25";

    public static void main(String[] args) {
        test( 20000 ); //warm-up run, so that the JIT compiles parse() before measuring
        test( ITERS );
    }

    private static void test( final int iters )
    {
        long cnt = 0;
        final long start = System.currentTimeMillis();
        for ( int i = 0; i < iters; ++i )
        {
            final List<Field> fields = parse( MESSAGE );
            cnt += fields.size();
        }
        final long time = System.currentTimeMillis() - start;
        //the warm-up run is too short to be worth reporting
        if ( iters >= 100000 )
            System.out.println( "Time to parse " + iters + " messages = " + time / 1000.0 + " sec, cnt = " + cnt );
    }

    private static Set<Integer> set( final int... values )
    {
        final Set<Integer> res = new HashSet<Integer>( values.length );
        for ( final int i : values )
            res.add( i );
        return res;
    }

    //numbers of non-string fields
    private static final Set<Integer> DATE_FIELDS = set( 7, 14, 21 );
    //fields 1 and 9 carry integer values in MESSAGE; this set must not overlap DATE_FIELDS,
    //because DATE_FIELDS is checked first in parse()
    private static final Set<Integer> INT_FIELDS = set( 1, 9 );
    private static final Set<Integer> DOUBLE_FIELDS = set( 100, 102 );

    private static final String FIELD_SEPARATOR = ";";
    private static final String VALUE_SEPARATOR = "=";

    private static final class Field
    {
        public final int id;
        public final Object value;

        private Field(int id, Object value) {
            this.id = id;
            this.value = value;
        }
    }

    //SimpleDateFormat objects are not threadsafe, so such wrapper will save us from multithreading issues
    private static final ThreadLocal<SimpleDateFormat> DATE_FORMAT = new ThreadLocal<SimpleDateFormat>()
    {
        @Override
        protected SimpleDateFormat initialValue() {
            final SimpleDateFormat sdf = new SimpleDateFormat( "yyyyMMdd" );
            sdf.setLenient( true );
            return sdf;
        }
    };

    private static List<Field> parse( final String str )
    {
        final String[] parts = str.split( FIELD_SEPARATOR );
        final List<Field> res = new ArrayList<Field>( parts.length );
        for ( final String part : parts )
        {
            final String[] subparts = part.split( VALUE_SEPARATOR );
            final int fieldId = Integer.parseInt( subparts[ 0 ] );
            if ( DATE_FIELDS.contains( fieldId ) )
            {
                try {
                    res.add( new Field( fieldId, DATE_FORMAT.get().parse( subparts[ 1 ] ) ) );
                } catch (ParseException e) {
                    //not production code, so ignore failure, like with numbers
                }
            }
            else if ( INT_FIELDS.contains( fieldId ) )
                res.add( new Field( fieldId, Integer.parseInt( subparts[ 1 ] ) ) );
            else if ( DOUBLE_FIELDS.contains( fieldId ) )
                res.add( new Field( fieldId, Double.parseDouble( subparts[ 1 ] ) ) );
            else //string
                res.add( new Field( fieldId, subparts[ 1 ] ) );
        }
        return res;
    }
}
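Tying this back to JSR-310: if you can use Java 8, the ThreadLocal<SimpleDateFormat> wrapper could be replaced by one shared DateTimeFormatter, which is immutable and thread-safe. The sketch below uses a hypothetical helper, FixDates. Note two behavioral differences from the parser above: it returns a LocalDate instead of a java.util.Date, and DateTimeFormatter.BASIC_ISO_DATE resolves strictly, so an out-of-range value such as month 13 is rejected rather than rolled over as the lenient SimpleDateFormat would do. Its speed in this benchmark has not been measured here.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper: parses yyyyMMdd FIX date fields with JSR-310.
// DateTimeFormatter is immutable and thread-safe, so a plain static
// constant replaces the ThreadLocal wrapper.
final class FixDates {
    // BASIC_ISO_DATE is the predefined yyyyMMdd formatter
    // (it also accepts an optional trailing offset).
    private static final DateTimeFormatter YYYYMMDD = DateTimeFormatter.BASIC_ISO_DATE;

    static LocalDate parseDate(String value) {
        return LocalDate.parse(value, YYYYMMDD);
    }

    public static void main(String[] args) {
        System.out.println(parseDate("20120815"));
        try {
            parseDate("20121301"); // month 13: rejected by strict resolution
        } catch (DateTimeParseException e) {
            System.out.println("rejected: " + e.getParsedString());
        }
    }
}
```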