Tag Archives: byte buffer

java.io.ByteArrayOutputStream

by Mikhail Vorontsov

Do not use ByteArrayOutputStream in performance critical code

Very important: you will rarely need ByteArrayOutputStream in performance critical code. If you still think you may need it – read the rest of the article.

ByteArrayOutputStream is mostly used when you write a method which writes a message of unknown length to some output stream (are there many cases when you can’t calculate the size of your message?).

Important: if you know your message size in advance (or at least know an upper limit for it), allocate a ByteBuffer instead (or reuse a previously allocated one) and write the message into it. It works faster than ByteArrayOutputStream (see the Various methods of binary serialization in Java article).
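As a minimal sketch of that advice, the class below reuses one ByteBuffer sized to an upper bound. The class name, the message layout and the 256-byte payload limit are assumptions made up for this illustration, not something taken from the serialization article.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class FixedSizeMessageWriter {
    // Hypothetical limits: 8 bytes (id) + 4 bytes (length) + at most 256 payload bytes.
    static final int MAX_PAYLOAD = 256;
    static final int MAX_MESSAGE_SIZE = 8 + 4 + MAX_PAYLOAD;

    // Allocated once and reused for every message; not thread-safe.
    private final ByteBuffer buffer = ByteBuffer.allocate(MAX_MESSAGE_SIZE);

    /** Encodes a message as [long id][int length][UTF-8 bytes] and returns the encoded bytes. */
    byte[] encode(long id, String text) {
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        if (payload.length > MAX_PAYLOAD) {
            throw new IllegalArgumentException("payload exceeds the assumed upper limit");
        }
        buffer.clear();                                   // reset position and limit for reuse
        buffer.putLong(id).putInt(payload.length).put(payload);
        buffer.flip();                                    // switch from writing to reading
        byte[] out = new byte[buffer.remaining()];
        buffer.get(out);
        return out;
    }
}
```

The final copy into a fresh byte[] is there only to keep the example easy to check. Performance critical code would more likely hand the flipped buffer straight to a channel.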

ByteArrayOutputStream allows you to write anything to an internal expandable byte array and use that array as a single piece of output afterwards. The default buffer size is 32 bytes, so if you expect to write something longer, provide an explicit buffer size in the ByteArrayOutputStream(int) constructor.
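A short sketch of presizing the stream follows. The class name and the [count][values] layout are invented for the example. The size passed to the constructor is only a hint, so the stream still grows if the estimate turns out too small.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class PresizedStreamExample {
    /** Serializes the values as [int count][long value]... into a byte array. */
    static byte[] serialize(long[] values) {
        // 4 bytes for the count plus 8 bytes per value: with this hint the
        // internal array never has to be reallocated while we write.
        int expectedSize = 4 + 8 * values.length;
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(expectedSize);
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(values.length);
            for (long v : values) {
                out.writeLong(v);
            }
        } catch (IOException e) {
            // An in-memory stream should not fail; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```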

In most cases ByteArrayOutputStream is used either when you are writing a callback method and the caller provides you with some OutputStream whose nature is undefined, or when you are writing some “message to byte array” serialization method. The second case is covered in the Various methods of binary serialization in Java article.

From the above-mentioned article you will learn that ByteArrayOutputStream is synchronized, which seriously impacts its performance. So, if you don’t need synchronization, go to the JDK sources, copy the class contents to your project and remove all synchronization from it (and forget that it was my advice! 🙂 ). This will make it a bit faster…
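To give an idea of what such a class boils down to, here is a simplified sketch written from scratch rather than copied from the JDK. The class name is hypothetical, and the growth policy is an assumption that ignores overflow for buffers approaching 1 GB.

```java
import java.io.OutputStream;
import java.util.Arrays;

/** Growable in-memory output stream without any synchronization; single-threaded use only. */
final class UnsyncByteArrayOutputStream extends OutputStream {
    private byte[] buf;
    private int count; // number of valid bytes in buf

    UnsyncByteArrayOutputStream(int initialCapacity) {
        buf = new byte[Math.max(initialCapacity, 1)];
    }

    private void ensureCapacity(int minCapacity) {
        if (minCapacity > buf.length) {
            // Double the array, or jump straight to the requested size if that is larger.
            buf = Arrays.copyOf(buf, Math.max(buf.length * 2, minCapacity));
        }
    }

    @Override
    public void write(int b) {
        ensureCapacity(count + 1);
        buf[count++] = (byte) b;
    }

    @Override
    public void write(byte[] src, int off, int len) {
        if (off < 0 || len < 0 || len > src.length - off) {
            throw new IndexOutOfBoundsException();
        }
        ensureCapacity(count + len);
        System.arraycopy(src, off, buf, count, len);
        count += len;
    }

    int size() {
        return count;
    }

    /** Returns a copy of the bytes written so far. */
    byte[] toByteArray() {
        return Arrays.copyOf(buf, count);
    }
}
```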


Various types of memory allocation in Java

by Mikhail Vorontsov

This article discusses various types of memory buffer allocation in Java. We will see how to treat any sort of Java buffer uniformly using sun.misc.Unsafe memory access methods. This article may be especially interesting for ex-C programmers willing to work with memory at the lowest possible level in Java.

If you are more interested in general Java memory optimization, take a look at An overview of memory saving techniques in Java article in this blog as well as its following parts: one, two.

Array allocation limitations

Array size in Java is limited because an int is used as the array index. This means that you cannot allocate an array with more than Integer.MAX_VALUE ( = 2^31 - 1 ) elements. This doesn’t mean that the longest chunk of memory you can allocate in Java is 2 GB. You can allocate an array of a bigger element type instead. For example,

final long[] ar = new long[ Integer.MAX_VALUE ];

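will allocate 16 GB minus 8 bytes, provided your -Xmx Java setting is high enough. Usually you should have about 50% more memory in the heap, so in order to allocate a 16 GB buffer you will have to specify -Xmx24G (this is a general rule, the actual required heap size may vary). Note that some VMs reserve a few header words in an array and may reject lengths within a few elements of Integer.MAX_VALUE.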
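The arithmetic behind these limits can be checked without allocating anything. The class below is a hypothetical helper written for this purpose; it counts only the array payload and ignores the object header.

```java
final class ArraySizeMath {
    /** Payload size in bytes of an array with the given length and element size. */
    static long payloadBytes(int length, int elementSize) {
        return (long) length * elementSize; // widen first to avoid int overflow
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        // Largest byte[]: 2^31 - 1 bytes, one byte short of 2 GB.
        System.out.println(payloadBytes(Integer.MAX_VALUE, Byte.BYTES)); // 2147483647
        // Largest long[]: (2^31 - 1) * 8 bytes.
        System.out.println(payloadBytes(Integer.MAX_VALUE, Long.BYTES)); // 17179869176
        // Difference from a full 16 GB.
        System.out.println(16 * gib - payloadBytes(Integer.MAX_VALUE, Long.BYTES)); // 8
    }
}
```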

Unfortunately, you will be limited by your array element type in pure Java. The only useful class for working with arrays is ByteBuffer, which offers methods for reading and writing various Java data types in the buffer (see Various methods of binary serialization in Java for more details). The disadvantage of a ByteBuffer is that you are limited to byte[] as a source array type, which means a limit of 2 GB for your buffer.
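For readers who have not used these typed accessors, here is a brief sketch of relative and absolute puts on a ByteBuffer. The class name, the values and the little-endian byte order are arbitrary choices for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteBufferTypedAccess {
    /** Writes a few values of different types and returns the buffer. */
    static ByteBuffer writeSample() {
        // The capacity is an int, which is where the 2 GB ceiling comes from.
        ByteBuffer buf = ByteBuffer.allocate(32).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(7);                // relative put: bytes 0..3, position moves to 4
        buf.putDouble(2.5);           // relative put: bytes 4..11, position moves to 12
        buf.putLong(16, 123456789L);  // absolute put: bytes 16..23, position unchanged
        return buf;
    }
}
```

Absolute reads such as getInt(0) or getLong(16) then return the stored values without disturbing the buffer position.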

Treating any arrays as a byte buffer

For now, let’s assume that 2 GB buffers are not sufficient for our needs, but a 16 GB buffer will make us happy. We have allocated a long[], but want to treat this buffer as a byte array. We need to use a C programmer’s best friend in Java: sun.misc.Unsafe. This class has two sets of methods: getN( Object, offset ), which reads a value of type N from the given offset in the object, and putN( Object, offset, value ), which writes a value at the given offset.

Unfortunately, these methods set or get only an individual value. If you want to copy data to or from an array, you will need one more Unsafe method: copyMemory(srcObject, srcOffset, destObject, destOffset, count). It works similarly to System.arraycopy, but copies bytes instead of array elements.
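The following sketch shows these calls applied to a long[] treated as raw bytes. The class and method names are hypothetical, obtaining the instance through the theUnsafe field is a common workaround rather than a supported API, and there are no bounds checks, so a wrong index can corrupt memory or crash the JVM. Recent JDK releases deprecate these memory access methods and may print warnings when they are used.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

final class UnsafeLongArrayAsBytes {
    private static final Unsafe UNSAFE = loadUnsafe();
    // Distance in bytes from the start of a long[] object to its first element.
    private static final long LONG_BASE = UNSAFE.arrayBaseOffset(long[].class);
    private static final long BYTE_BASE = UNSAFE.arrayBaseOffset(byte[].class);

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    /** Writes one byte at the given byte index inside the long[] storage (unchecked). */
    static void putByte(long[] storage, long byteIndex, byte value) {
        UNSAFE.putByte(storage, LONG_BASE + byteIndex, value);
    }

    /** Reads one byte at the given byte index inside the long[] storage (unchecked). */
    static byte getByte(long[] storage, long byteIndex) {
        return UNSAFE.getByte(storage, LONG_BASE + byteIndex);
    }

    /** Copies all of src into storage starting at byteIndex; the count is in bytes. */
    static void copyIn(byte[] src, long[] storage, long byteIndex) {
        UNSAFE.copyMemory(src, BYTE_BASE, storage, LONG_BASE + byteIndex, src.length);
    }
}
```

Because the byte index is a long, the same three methods can address every byte of a long[] far larger than 2 GB.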

In order to access array data using sun.misc.Unsafe, you will need 2 components:


Use case: Optimizing memory footprint of a read only csv file (Trove, Unsafe, ByteBuffer, data compression)

by Mikhail Vorontsov

Suppose your application has to obtain some data from an auxiliary data source, which could be a csv file. Your csv file will contain several fields, one of which is used as an ID. You need to keep the whole file in memory and provide fast access to records by the ID field. Your additional goal is to consume as little memory as possible, while keeping the access cost as low as possible. In this article we will process fake person records. Here is an example:

{id=idnum10, surname=Smith10, address=10 One Way Road, Springfield, NJ, DOB=01/11/1965, names=John Paul 10 Ringo}
    

All these records are generated by the following class:

// Requires java.util.HashMap and java.util.Map.
// ID is a String constant holding the key field name; its definition is not shown in this excerpt.
private static class DataGenerator
{
    private int cnt = 1; // sequence number embedded in every generated field

    public Map<String, String> getNextEntry()
    {
        final Map<String, String> res = new HashMap<String, String>( 8 );
        res.put( ID, "idnum" + cnt );
        res.put( "names", "John Paul " + cnt + " Ringo" );
        res.put( "surname", "Smith" + cnt );
        res.put( "DOB", "01/" + two((cnt % 31) + 1) + "/1965" ); //date of birth, MM/DD/YYYY
        res.put( "address", cnt + " One Way Road, Springfield, NJ" );
        ++cnt;
        return res;
    }

    // Left-pads a day number to two digits
    private String two( final int val )
    {
        return val < 10 ? "0" + val : Integer.toString( val );
    }
}

Simple approach – map of maps

We always have to start from something simple and easy to support. In this case it may be a map of maps: the outer map is indexed by the ID field and the inner ones are indexed by field names.

    /**
     * Initial version. Outer map indexed by a key field value, inner map is field name to field value.
     * All field names are interned, but we still have to pay for storing references to field names.
     */
    private static class SimpleMapStorage extends SimpleStorage
    {
        private final Map<String, Map<String, String>> m_data = new HashMap<String, Map<String, String>>( 1000 );

        public void addEntry( final Map<String, String> entry )
        {
            m_data.put( entry.get( ID ), entry );
        }

        public Map<String, String> getById( final String id )
        {
            return m_data.get( id );
        }
    }

For testing purposes, all storage implementations will either implement the Storage interface or extend the SimpleStorage class. You will see the purpose of the pack method in the more advanced examples.

private interface Storage
{
    public void addEntry( final Map<String, String> entry );
    public Map<String, String> getById( final String id );
    public void pack(); // called once after all entries have been added
}

// Base class for implementations that have nothing to do in pack()
public static abstract class SimpleStorage implements Storage
{
    public void pack()
    {
    }
}

All storage implementations will be tested by the following method:

private static void testStorage(final int recordCount, final Storage storage)
{
    final DataGenerator generator = new DataGenerator();
    for ( int i = 0; i < recordCount; ++i )
        storage.addEntry( generator.getNextEntry() );
    storage.pack();
    System.gc(); //only a hint to the JVM, so the figure below is approximate
    final long mem = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
    System.out.println( storage.getClass().getName() + ": " + mem / 1024.0 / 1024.0 + " MB");
    System.out.println( storage.getById( "idnum10" ) ); //in order to retain storage in memory
}

For every implementation in this article we will try to create 1 million entries. SimpleMapStorage consumes 706 MB to store 1M records. The actual data size is about 82 MB, which means that this simple implementation wastes about 88% of the consumed memory. Yes, straightforward solutions for big data storage do not work well in Java…
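Working through the arithmetic on the two figures quoted above (706 and 82), the overhead comes out slightly under 90% of the consumed memory, or roughly 740 bytes consumed per record against about 86 bytes of actual data. The helper class below is a hypothetical illustration of that calculation and uses the same 1 MB = 1024 * 1024 bytes convention as testStorage.

```java
final class FootprintMath {
    /** Fraction of the consumed memory that is not actual data. */
    static double overheadFraction(double consumedMb, double payloadMb) {
        return (consumedMb - payloadMb) / consumedMb;
    }

    /** Average number of bytes per record for the given size in megabytes. */
    static double bytesPerRecord(double mb, int records) {
        return mb * 1024 * 1024 / records;
    }

    public static void main(String[] args) {
        final int records = 1_000_000;
        System.out.println("overhead fraction: " + overheadFraction(706, 82));
        System.out.println("consumed per record, bytes: " + bytesPerRecord(706, records));
        System.out.println("data per record, bytes: " + bytesPerRecord(82, records));
    }
}
```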
