java.io.ByteArrayOutputStream

by Mikhail Vorontsov

Do not use ByteArrayOutputStream in performance critical code

Very important: you will rarely need ByteArrayOutputStream in performance critical code. If you still think you may need it – read the rest of the article.

ByteArrayOutputStream is mostly used in methods that write a message of unknown length to an output stream (though how often can you really not calculate the size of your message?).

Important: if you know your message size in advance (or at least an upper limit for it), allocate a ByteBuffer instead (or reuse a previously allocated one) and write the message into it. This is faster than ByteArrayOutputStream (see the Various methods of binary serialization in Java article).
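A minimal sketch of the ByteBuffer approach, assuming a hypothetical fixed-size message made of one int and one long (12 bytes). The class name FixedSizeMessageDemo and the encode method are illustrative only and do not come from the referenced article:

```java
import java.nio.ByteBuffer;

// Hypothetical example: a message whose size (int + long = 12 bytes) is known in advance.
final class FixedSizeMessageDemo {
    private static final int MESSAGE_SIZE = Integer.BYTES + Long.BYTES;

    // Allocated once and reused for every message. Not thread-safe:
    // each thread would need its own instance.
    private final ByteBuffer buffer = ByteBuffer.allocate(MESSAGE_SIZE);

    // Writes the fields into the reused buffer and returns it ready for reading.
    ByteBuffer encode(final int ipv4, final long time) {
        buffer.clear();       // reset position and limit, keep the backing array
        buffer.putInt(ipv4);
        buffer.putLong(time);
        buffer.flip();        // switch the buffer from writing to reading
        return buffer;
    }

    public static void main(final String[] args) {
        final FixedSizeMessageDemo demo = new FixedSizeMessageDemo();
        final ByteBuffer encoded = demo.encode(0x7F000001, 1_000L);
        System.out.println(encoded.remaining() + " bytes ready to be written");
    }
}
```

The returned buffer is only valid until the next encode call, which is the price of reusing it.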

ByteArrayOutputStream lets you write anything to an internal expandable byte array and then use that array as a single piece of output. The default buffer size is 32 bytes, so if you expect to write something longer, provide an explicit buffer size in the ByteArrayOutputStream(int) constructor.

In most cases ByteArrayOutputStream is used either when you are writing a callback method and the caller provides you with an OutputStream whose nature is unknown, or when you are writing a “message to byte array” serialization method. The second case is covered in the Various methods of binary serialization in Java article.

As that article shows, ByteArrayOutputStream is synchronized, which seriously impacts its performance. So, if you don’t need synchronization, go to the JDK sources, copy the class contents to your project and remove all synchronization from it (and forget that it was my advice! 🙂 ). This will make it a bit faster…
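To give an idea of what such a stripped-down class might look like, here is a simplified sketch written from scratch rather than copied from the JDK. The name UnsyncByteArrayOutputStream and its growth policy are assumptions made for illustration, and it deliberately omits overflow handling for very large buffers:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

// Hypothetical, simplified unsynchronized byte array stream. Not thread-safe by design.
final class UnsyncByteArrayOutputStream extends OutputStream {
    private byte[] buf;
    private int count;

    UnsyncByteArrayOutputStream(final int initialCapacity) {
        buf = new byte[Math.max(initialCapacity, 1)];
    }

    // Grows the array (at least doubling it) when the requested capacity does not fit.
    private void ensureCapacity(final int minCapacity) {
        if (minCapacity > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(buf.length * 2, minCapacity));
        }
    }

    @Override
    public void write(final int b) {
        ensureCapacity(count + 1);
        buf[count++] = (byte) b;
    }

    @Override
    public void write(final byte[] b, final int off, final int len) {
        ensureCapacity(count + len);
        System.arraycopy(b, off, buf, count, len); // also validates off and len
        count += len;
    }

    // Passes the internal array to the target stream without copying it.
    void writeTo(final OutputStream out) throws IOException {
        out.write(buf, 0, count);
    }

    int size() {
        return count;
    }

    // Lets the same instance (and its array) be reused for the next message.
    void reset() {
        count = 0;
    }

    public static void main(final String[] args) throws IOException {
        final UnsyncByteArrayOutputStream out = new UnsyncByteArrayOutputStream(2);
        out.write(new byte[] {1, 2, 3, 4}, 0, 4);
        out.writeTo(System.out);
    }
}
```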

How to use ByteArrayOutputStream

Do you still want to use unpatched ByteArrayOutputStream? OK, the rest of the article will finally tell you how to use ByteArrayOutputStream.

By the way, you may also use BufferedOutputStream instead of ByteArrayOutputStream. The only real difference between them is the number of write calls made on the underlying stream: it is always one for ByteArrayOutputStream (via writeTo) and one or more for BufferedOutputStream. When its internal buffer gets full, BufferedOutputStream writes it to the underlying stream, whereas ByteArrayOutputStream expands the buffer.
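The difference in the number of underlying write calls can be observed with a small experiment. This is only a sketch: CountingSink is a hypothetical stream that counts write calls, and the chunk and buffer sizes are arbitrary values chosen so that the BufferedOutputStream buffer overflows:

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class UnderlyingWriteCountDemo {
    // Hypothetical sink that discards data and only counts write calls.
    static final class CountingSink extends OutputStream {
        int calls;

        @Override
        public void write(final int b) {
            calls++;
        }

        @Override
        public void write(final byte[] b, final int off, final int len) {
            calls++;
        }
    }

    // Collects all chunks in memory, then hands them to the sink in one call.
    static int viaByteArrayOutputStream(final byte[] chunk, final int chunks) throws IOException {
        final CountingSink sink = new CountingSink();
        final ByteArrayOutputStream baos = new ByteArrayOutputStream();
        for (int i = 0; i < chunks; i++) {
            baos.write(chunk, 0, chunk.length);
        }
        baos.writeTo(sink);
        return sink.calls;
    }

    // Writes to the sink every time the fixed-size buffer fills up, plus once on flush.
    static int viaBufferedOutputStream(final byte[] chunk, final int chunks, final int bufferSize)
            throws IOException {
        final CountingSink sink = new CountingSink();
        final BufferedOutputStream bos = new BufferedOutputStream(sink, bufferSize);
        for (int i = 0; i < chunks; i++) {
            bos.write(chunk, 0, chunk.length);
        }
        bos.flush();
        return sink.calls;
    }

    public static void main(final String[] args) throws IOException {
        final byte[] chunk = new byte[10];
        System.out.println("ByteArrayOutputStream: " + viaByteArrayOutputStream(chunk, 10));
        System.out.println("BufferedOutputStream:  " + viaBufferedOutputStream(chunk, 10, 32));
    }
}
```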

So, when you are writing a method which stores a message to some underlying stream, create a ByteArrayOutputStream and keep a reference to it. Optionally wrap it in a DataOutputStream so that you can conveniently write most data types. Write the message fields to the DataOutputStream, close it (in order to flush everything to the ByteArrayOutputStream), and then use the stored reference to the ByteArrayOutputStream to write the result out.

For example, let’s use the LogEvent class from the java.util.LinkedList article. We will write a saveTo(OutputStream) method for it:

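import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Nested inside an enclosing class, hence "private static"
private static final class LogEvent
{
    public final int ipv4;
    public final long time;
    public final String eventDesc;

    public LogEvent( final int ipv4, final long time, final String eventDesc ) {
        this.ipv4 = ipv4;
        this.time = time;
        this.eventDesc = eventDesc;
    }

    public void saveTo( final OutputStream os ) throws IOException {
        // 4 bytes (int) + 8 bytes (long) + 2 bytes (writeUTF length prefix) + an estimate for the string;
        // the buffer still expands if the estimate turns out to be too low
        final ByteArrayOutputStream baos = new ByteArrayOutputStream( 12 + 2 + eventDesc.length() * 2 );
        final DataOutputStream dos = new DataOutputStream( baos );
        try
        {
            dos.writeInt( ipv4 );
            dos.writeLong( time );
            dos.writeUTF( eventDesc );
        }
        finally {
            dos.close();
        }
        // pass the internal buffer to the target stream without copying it
        baos.writeTo( os );
    }
}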

How do you get the data out? Here is the second trick. Many developers know about the toByteArray method. Yes, it is a very convenient method. And, as with convenience stores, you pay a premium to use it: this method creates a copy of the internal byte array and returns it to the caller. In most cases you would immediately write that copy to another OutputStream. Use ByteArrayOutputStream.writeTo(OutputStream) instead. This method uses the internal byte array directly instead of making a copy.
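The two ways of forwarding the collected bytes can be put side by side as follows. This is an illustrative sketch: the class WriteToVersusToByteArray and its method names are hypothetical, while toByteArray and writeTo are the real ByteArrayOutputStream methods discussed above:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class WriteToVersusToByteArray {
    // Convenient, but allocates a fresh copy of the internal array on every call.
    static void forwardWithCopy(final ByteArrayOutputStream baos, final OutputStream target)
            throws IOException {
        target.write(baos.toByteArray());
    }

    // Hands the internal array straight to the target stream, with no intermediate copy.
    static void forwardWithoutCopy(final ByteArrayOutputStream baos, final OutputStream target)
            throws IOException {
        baos.writeTo(target);
    }

    public static void main(final String[] args) throws IOException {
        final ByteArrayOutputStream baos = new ByteArrayOutputStream(16);
        baos.write(new byte[] {'o', 'k', '\n'}, 0, 3);

        forwardWithCopy(baos, System.out);
        forwardWithoutCopy(baos, System.out);

        // Each toByteArray call returns a distinct array instance.
        System.out.println(baos.toByteArray() != baos.toByteArray());
    }
}
```

Both methods deliver the same bytes to the target; the difference is only the temporary array that the first one leaves for the garbage collector.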

Less well-known methods are ByteArrayOutputStream.toString() and ByteArrayOutputStream.toString(String charsetName). They let you make a String from the internal byte array. The first method uses the default encoding (not a good idea); the second uses the provided encoding name. Before Java 10 there was no third method accepting a Charset object, which was inconsistent with many other similar method groups; Java 10 added toString(Charset).
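A short sketch of decoding with an explicit encoding name, assuming the stream holds UTF-8 text. The class BaosToStringDemo and the helper decodeUtf8 are hypothetical names used for illustration:

```java
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

final class BaosToStringDemo {
    // Decodes the internal byte array as UTF-8 instead of relying on the default encoding.
    static String decodeUtf8(final ByteArrayOutputStream baos) {
        try {
            return baos.toString("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Should not happen: every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(final String[] args) {
        final ByteArrayOutputStream baos = new ByteArrayOutputStream();
        final byte[] bytes = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        baos.write(bytes, 0, bytes.length);
        System.out.println(decodeUtf8(baos));
    }
}
```

The checked UnsupportedEncodingException is the main inconvenience of the String-based overload, which is why the helper wraps it.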

Summary

  • For performance critical code try to use ByteBuffer instead of ByteArrayOutputStream. If you still want to use ByteArrayOutputStream, get rid of its synchronization.
  • If you are working on a method which writes some sort of message to an unknown OutputStream, always write your message to a ByteArrayOutputStream first and use its writeTo(OutputStream) method after that. In the rare cases when you are building a String from its byte representation, do not forget about the ByteArrayOutputStream.toString methods.
  • In most cases avoid the ByteArrayOutputStream.toByteArray method, because it creates a copy of the internal byte array. Garbage collecting these copies may take noticeable time if your application is using a few gigabytes of memory (see the Inefficient byte[] to String constructor article for another example).
