Tag Archives: bitset

A few more memory saving techniques in Java

by Mikhail Vorontsov

This article will list a few more ideas on cheap memory saving techniques in Java. This article follows An overview of memory saving techniques in Java, Memory consumption of popular Java data types – part 1 and Memory consumption of popular Java data types – part 2 articles earlier published in this blog. I strongly recommend you to read An overview of memory saving techniques in Java before reading this article. This article assumes that size of Object references is equal to 4 bytes.

Static inner classes

Java lacks a normal library tuple support like the one Scala has. This forces Java developers to create a lot of usually inner classes, whose only purpose is to compose a few elements as a single entity for a collection. One really popular example of such inner class is Pair, which is often implemented as a generic class.

Inner non-static classes have an important property – they store a reference to their parent class. Such reference allows them to seamlessly call parent methods and access parent fields. Such a reference is often not needed for the simple tuple-like classes. You can get rid of it by marking your class as static. This will save you 4 bytes and increase its chances to occupy 8 bytes less (see An overview of memory saving techniques in Java for more details).

You should generally mark all your inner classes as static until there is a need to remove such qualifier.

Continue reading

Memory consumption of popular Java data types – part 1

by Mikhail Vorontsov

This article will give you an overview of some popular Java data types memory consumption. This article follows An overview of memory saving techniques in Java article earlier published in this blog. I strongly recommend you to read the previous article before reading this one.

Enum, EnumMap, EnumSet

Enums were added to Java 1.5. Technically speaking, each enum is an Object and has all memory consumption properties of objects described in the previous article. Luckily, they have several useful properties:

All enum objects are singletons. This is guaranteed by Java language – you can create an instance of enum only by its definition. This means that you pay for enum object once and then you only spend 4 bytes per its reference (in this article I will assume that object references occupy 4 bytes). But what is the benefit of enums if they consume as much memory as int variables and 4 times more than byte variables, besides type safety and better support in IDE?

The answer is the ordinal() method implemented by every enum. It is an increasing number starting from 0 and ending at the number of values in the given enum minus 1. Such enum property allows us to use arrays for enum to any other object mapping: a value related to the first enum value will be stored in array[0], second enum will be mapped to array[1] and so on according to Enum.ordinal() method result. By the way, it was a short description of JDK EnumMap class implementation.

If you need to implement a map from Object into enum, you won’t have a tailored JDK implementation at hand. Of course, you can use any JDK Object to Object map, but it wouldn’t be that efficient. The easy way here is to use Trove TObjectByteMap (or TObjectIntMap in rare cases when you have more than 128 values in your enum) and map from Object key into Enum.ordinal() value. You will need a decoding method for getters in order to convert byte into an enum. This method will require 1 byte per entry, which is the best we can do without paying CPU algorithmic penalty (of course, we can use less than 1 byte per enum, if there are less than 128, 64, 32, etc elements in the enum, but it may make your code more complicated for a very little memory gain).

With all this knowledge at hand, you may now realize that EnumSet is implemented similarly to BitSet. There are 2 EnumSet implementations in JDK: RegularEnumSet and JumboEnumSet. The former is used for enums having less than 65 values (it covers 99,9999% of real-world enums), the latter is used for larger enumerations.

RegularEnumSet utilizes the knowledge of the fact that there are less or equal to 64 values in the enum. It allows RegularEnumSet to use a single long to store all “enum present in the set” flags, rather than using long[] (utilized by JumboEnumSet or BitSet). Using a single long instead of long[1] allows to save 4 bytes on long[] reference and 16 bytes on long value (long occupies 8 bytes, long[1] needs 12 bytes for header, 8 bytes for a single long and 4 bytes for alignment).

Continue reading

hashCode method performance tuning

by Mikhail Vorontsov

In this chapter we will discuss various implications of hashCode method implementation on application performance.

The main purpose of hashCode method is to allow an object to be a key in the hash map or a member of a hash set. In this case an object should also implement equals(Object) method, which is consistent with hashCode implementation:

  • If a.equals(b) then a.hashCode() == b.hashCode()
  • If hashCode() was called twice on the same object, it should return the same result provided that the object was not changed

hashCode from performance point of view

From the performance point of view, the main objective for your hashCode method implementation is to minimize the number of objects sharing the same hash code. All JDK hash based collections store their values in an array. Hash code is used to calculate an initial lookup position in this array. After that equals is used to compare given value with values stored in the internal array. So, if all values have distinct hash codes, this will minimize the possibility of hash collisions. On the other hand, if all values will have the same hash code, hash map (or set) will degrade into a list with operations on it having O(n2) complexity.

For more details, read about collision resolution in hash maps. JDK is using a method called open addressing, but there is another method called “chaining” – all key-value pairs with the same hash code are stored in a linked list.

Continue reading

Use case: FIX message processing. Part 1: Writing a simple FIX parser

by Mikhail Vorontsov

In this article we will see a “real life” example: we will describe how to parse a tag-based FIX message, how to improve original parsing code. The second part of this article will be dedicated to implementing a simple gateway for FIX messages and finding out why parse-compose logic is very bad from performance point of view.

FIX messages consist of a number of fields. Each field has a name (it is decimal numerical in FIX) and a value (its datatype depends on message name). Fields are separated with 0x01 and name is separated from value with =. This is textual message format, so field 45 with value ‘test’ will look like ’45=test’. FIX also defines some binary fields, consisting of field name, field length and raw data, which may contain 0x01, but for the sake of simplicity we will not discuss them.

Message parsing: naive approach

Let’s start writing a message parser. Just for ease of reading, field separator 0x01 was replaced by semicolon in the source code. It doesn’t change any logic, only makes a message literal more readable. I’ve also replaced real FIX fields with very fake ones and left only date/int/double/string field formats. Adding more of them is straightforward, but not beneficial for this article.

The following code reads a message 20K times in the beginning – in order to compile test code and 10M times after that – for the actual test. It parses a “FIX” message string into a list of Field objects, which are field id plus field value.

Note: the actual code for this article (see link at the end of the article) is more object oriented 🙂

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
public class FixTests {
    private static final int ITERS = 10000000;
 
    private static final String MESSAGE = "1=123;5=test data;7=20120815;8=another data field, this one is rather long;" +
            "8=and one more field, looks like a repeating one;14=20120101;9=4444;21=20111231;48=one more string field to parse;" +
            "5=another field 5, why does it repeat itself?;1=123;5=test data;7=20120815;8=another data field, this one is rather long;100=144.82;102=2.25";
 
    public static void main(String[] args) {
        test( 20000 ); //to compile a method
        test( ITERS );
    }
 
    private static void test( final int iters )
    {
        long cnt = 0;
        final long start = System.currentTimeMillis();
        for ( int i = 0; i < iters; ++i )
        {
            final List<Field> fields = parse( MESSAGE );
            cnt += fields.size();
        }
        final long time = System.currentTimeMillis() - start;
        if ( iters >= 100000 )
            System.out.println( "Time to parse " + iters + " messages = " + time / 1000.0 + " sec, cnt = " + cnt );
    }
 
    private static Set<Integer> set( final int... values )
    {
        final Set<Integer> res = new HashSet<Integer>( values.length );
        for ( final int i : values )
        res.add( i );
        return res;
    }
 
    //numbers of non-string fields
    private static final Set<Integer> DATE_FIELDS = set( 7, 14, 21 );
    private static final Set<Integer> INT_FIELDS = set( 7, 14, 21 );
    private static final Set<Integer> DOUBLE_FIELDS = set( 100, 102 );
 
    private static final String FIELD_SEPARATOR = ";";
    private static final String VALUE_SEPARATOR = "=";
 
    private static final class Field
    {
        public final int id;
        public final Object value;
 
        private Field(int id, Object value) {
            this.id = id;
            this.value = value;
        }
    }
 
    //SimpleDateFormat objects are not threadsafe, so such wrapper will save us from multithreading issues
    private static final ThreadLocal<SimpleDateFormat> DATE_FORMAT = new ThreadLocal<SimpleDateFormat>()
    {
        @Override
        protected SimpleDateFormat initialValue() {
            final SimpleDateFormat sdf = new SimpleDateFormat( "yyyyMMdd" );
            sdf.setLenient( true );
            return sdf;
        }
    };
 
    private static List<Field> parse( final String str )
    {
        final String[] parts = str.split( FIELD_SEPARATOR );
        final List<Field> res = new ArrayList<Field>( parts.length );
        for ( final String part : parts )
        {
            final String[] subparts = part.split( VALUE_SEPARATOR );
            final int fieldId = Integer.parseInt( subparts[ 0 ] );
            if ( DATE_FIELDS.contains( fieldId ) )
            {
                try {
                    res.add( new Field( fieldId, DATE_FORMAT.get().parse( subparts[ 1 ] ) ) );
                } catch (ParseException e) {
                    //not production code, so ignore failure, like with numbers
                }
            }
            else if ( INT_FIELDS.contains( fieldId ) )
                res.add( new Field( fieldId, Integer.parseInt( subparts[1]) ) );
            else if ( DOUBLE_FIELDS.contains( fieldId ) )
                res.add( new Field( fieldId, Double.parseDouble( subparts[ 1 ] ) ) );
            else //string
                res.add( new Field( fieldId, subparts[ 1 ] ) );
        }
        return res;
    }
}
public class FixTests {
    private static final int ITERS = 10000000;

    private static final String MESSAGE = "1=123;5=test data;7=20120815;8=another data field, this one is rather long;" +
            "8=and one more field, looks like a repeating one;14=20120101;9=4444;21=20111231;48=one more string field to parse;" +
            "5=another field 5, why does it repeat itself?;1=123;5=test data;7=20120815;8=another data field, this one is rather long;100=144.82;102=2.25";

    public static void main(String[] args) {
        test( 20000 ); //to compile a method
        test( ITERS );
    }

    private static void test( final int iters )
    {
        long cnt = 0;
        final long start = System.currentTimeMillis();
        for ( int i = 0; i < iters; ++i )
        {
            final List<Field> fields = parse( MESSAGE );
            cnt += fields.size();
        }
        final long time = System.currentTimeMillis() - start;
        if ( iters >= 100000 )
            System.out.println( "Time to parse " + iters + " messages = " + time / 1000.0 + " sec, cnt = " + cnt );
    }

    private static Set<Integer> set( final int... values )
    {
        final Set<Integer> res = new HashSet<Integer>( values.length );
        for ( final int i : values )
        res.add( i );
        return res;
    }

    //numbers of non-string fields
    private static final Set<Integer> DATE_FIELDS = set( 7, 14, 21 );
    private static final Set<Integer> INT_FIELDS = set( 7, 14, 21 );
    private static final Set<Integer> DOUBLE_FIELDS = set( 100, 102 );

    private static final String FIELD_SEPARATOR = ";";
    private static final String VALUE_SEPARATOR = "=";

    private static final class Field
    {
        public final int id;
        public final Object value;

        private Field(int id, Object value) {
            this.id = id;
            this.value = value;
        }
    }

    //SimpleDateFormat objects are not threadsafe, so such wrapper will save us from multithreading issues
    private static final ThreadLocal<SimpleDateFormat> DATE_FORMAT = new ThreadLocal<SimpleDateFormat>()
    {
        @Override
        protected SimpleDateFormat initialValue() {
            final SimpleDateFormat sdf = new SimpleDateFormat( "yyyyMMdd" );
            sdf.setLenient( true );
            return sdf;
        }
    };

    private static List<Field> parse( final String str )
    {
        final String[] parts = str.split( FIELD_SEPARATOR );
        final List<Field> res = new ArrayList<Field>( parts.length );
        for ( final String part : parts )
        {
            final String[] subparts = part.split( VALUE_SEPARATOR );
            final int fieldId = Integer.parseInt( subparts[ 0 ] );
            if ( DATE_FIELDS.contains( fieldId ) )
            {
                try {
                    res.add( new Field( fieldId, DATE_FORMAT.get().parse( subparts[ 1 ] ) ) );
                } catch (ParseException e) {
                    //not production code, so ignore failure, like with numbers
                }
            }
            else if ( INT_FIELDS.contains( fieldId ) )
                res.add( new Field( fieldId, Integer.parseInt( subparts[1]) ) );
            else if ( DOUBLE_FIELDS.contains( fieldId ) )
                res.add( new Field( fieldId, Double.parseDouble( subparts[ 1 ] ) ) );
            else //string
                res.add( new Field( fieldId, subparts[ 1 ] ) );
        }
        return res;
    }
}

Continue reading