Tag Archives: io

java.io.BufferedInputStream and java.util.zip.GZIPInputStream

by Mikhail Vorontsov

BufferedInputStream and GZIPInputStream are two classes often used when reading data from a file (the latter is widely used, at least on Linux). Buffering input data is generally a good idea and has been described in many Java performance articles. Still, there are a couple of issues worth knowing about these streams.

When not to buffer

Buffering is done in order to reduce the number of separate read operations from an input device. Many developers forget this and always wrap an InputStream inside a BufferedInputStream, like this:

final InputStream is = new BufferedInputStream( new FileInputStream( file ) );

The short rule on whether to use buffering is the following: you don’t need it if your data blocks are long enough (100K+) and you can process blocks of any length (that is, you don’t need a guarantee that at least N bytes are available in the buffer before processing starts). In all other cases you need to buffer input data.
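
To make the rule concrete, here is a minimal sketch that counts how many read calls reach the underlying stream with and without a BufferedInputStream. The CountingInputStream helper and the in-memory data source are assumptions made for illustration; on a real device each of those calls would be far more expensive.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ReadCallCounter {
    /** Hypothetical helper: counts the read calls that reach the wrapped stream. */
    static final class CountingInputStream extends FilterInputStream {
        long calls;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Reads the data one byte at a time and returns the number of underlying read calls. */
    static long countCalls(byte[] data, boolean buffered) {
        CountingInputStream counting = new CountingInputStream(new ByteArrayInputStream(data));
        try (InputStream is = buffered ? new BufferedInputStream(counting) : counting) {
            while (is.read() != -1) {
                // consume one byte per call: the worst case for an unbuffered stream
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counting.calls;
    }

    public static void main(String[] args) {
        byte[] data = new byte[100_000];
        System.out.println("unbuffered: " + countCalls(data, false));
        System.out.println("buffered:   " + countCalls(data, true));
    }
}
```

Without buffering, every single-byte read goes to the underlying stream (one call per byte plus the final end-of-stream call); with buffering, the underlying stream is only asked to refill the internal buffer.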

The simplest example of a case where you don’t need buffering is a manual file copy.

public static void copyFile( final File from, final File to ) throws IOException {
    final InputStream is = new FileInputStream( from );
    try
    {
        final OutputStream os = new FileOutputStream( to );
        try
        {
            // our own array already acts as the buffer, so no BufferedInputStream is needed
            final byte[] buf = new byte[ 8192 ];
            int read = 0;
            // read() returns the number of bytes actually read, or -1 at the end of the file
            while ( ( read = is.read( buf ) ) != -1 )
            {
                os.write( buf, 0, read );
            }
        }
        finally {
            os.close();
        }
    }
    finally {
        is.close();
    }
}
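
For readers on Java 7 or later, the same unbuffered copy can be sketched with try-with-resources. The 128K block size is an assumption chosen to match the "100K+" rule above, and copyViaTempFiles is a hypothetical helper added only to exercise the method.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;

public class UnbufferedCopy {
    /** Copies a file in large blocks without any stream buffering; returns the bytes copied. */
    static long copyFile(File from, File to) throws IOException {
        // both streams are closed automatically, in reverse order of declaration
        try (InputStream is = new FileInputStream(from);
             OutputStream os = new FileOutputStream(to)) {
            byte[] buf = new byte[128 * 1024]; // assumed block size
            long total = 0;
            int read;
            while ((read = is.read(buf)) != -1) {
                os.write(buf, 0, read);
                total += read;
            }
            return total;
        }
    }

    /** Hypothetical demo helper: round-trips the data through two temporary files. */
    static byte[] copyViaTempFiles(byte[] data) {
        try {
            File from = File.createTempFile("copy-src", ".bin");
            File to = File.createTempFile("copy-dst", ".bin");
            try {
                Files.write(from.toPath(), data);
                copyFile(from, to);
                return Files.readAllBytes(to.toPath());
            } finally {
                from.delete();
                to.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```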

I/O bound algorithms: SSD vs HDD

by Mikhail Vorontsov

This article investigates the impact of modern SSDs on the I/O-bound algorithms of the HDD era.

Improved write speed of SSD

Modern SSDs provide read/write speeds of up to 500 MB/sec. Compare that to the cap of approximately 100 MB/sec on the speed of a modern HDD. It means that your application has to produce output 5 times faster than before in order to remain I/O bound.

Let’s run 3 tests:

  1. Fill an 8 GB file with a repeating sequence of 1024 bytes using a BufferedOutputStream with a 32K buffer. Data is written in a loop and no extra processing is done inside the loop ( testWriteNoProcessing method from the following code snippet ).
  2. Fill an 8 GB file with a sequence of 1024 bytes which is recomputed on every iteration before it is written. Data is written using a BufferedOutputStream with a 32K buffer ( testWriteSimple ).
  3. Same as the previous test, but the data is not written to disk. This test estimates how long it takes to prepare the data to write.
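
The three tests can be sketched roughly as follows. This is not the article's own code: the class and method names, the way the block is "recomputed" and the much smaller default file size are all assumptions made to keep the example short and quick to run.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class WriteTestSketch {
    static final int BLOCK_SIZE = 1024;        // length of the repeated sequence
    static final int BUFFER_SIZE = 32 * 1024;  // BufferedOutputStream buffer size

    /**
     * Produces totalBytes of data in 1024-byte blocks. If recompute is true, the block is
     * regenerated before every write (assumed processing). A null stream skips the write.
     * The checksum is returned so the preparation work has an observable result.
     */
    static long fill(OutputStream os, long totalBytes, boolean recompute) throws IOException {
        byte[] block = new byte[BLOCK_SIZE];
        long checksum = 0;
        for (long written = 0; written < totalBytes; written += BLOCK_SIZE) {
            if (recompute || written == 0) {
                long seed = written / BLOCK_SIZE;
                for (int i = 0; i < block.length; i++) {
                    block[i] = (byte) (seed + i);
                }
            }
            checksum += block[0];
            if (os != null) {
                os.write(block);
            }
        }
        return checksum;
    }

    /** Runs one test against a temporary file; returns { elapsed millis, file length, checksum }. */
    static long[] runOnce(long totalBytes, boolean recompute, boolean write) {
        try {
            File file = File.createTempFile("write-test", ".bin");
            try {
                long checksum;
                long start = System.nanoTime();
                if (write) {
                    try (OutputStream os =
                             new BufferedOutputStream(new FileOutputStream(file), BUFFER_SIZE)) {
                        checksum = fill(os, totalBytes, recompute);
                    }
                } else {
                    checksum = fill(null, totalBytes, recompute);
                }
                long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
                return new long[] { elapsedMillis, file.length(), checksum };
            } finally {
                file.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long size = 64L * 1024 * 1024; // far smaller than the 8 GB used in the article
        System.out.println("1. no processing: " + runOnce(size, false, true)[0] + " ms");
        System.out.println("2. recomputed:    " + runOnce(size, true, true)[0] + " ms");
        System.out.println("3. no disk write: " + runOnce(size, true, false)[0] + " ms");
    }
}
```

With a file this small the operating system's write cache may dominate the timings, so treat the numbers as an illustration of the setup rather than a measurement.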

java.io.ByteArrayOutputStream

by Mikhail Vorontsov

Do not use ByteArrayOutputStream in performance critical code

Very important: you will rarely need ByteArrayOutputStream in performance critical code. If you still think you may need it – read the rest of the article.

ByteArrayOutputStream is mostly used when you write a method which writes some sort of message of unknown length to some output stream (are there many cases when you can’t calculate the size of your message?).

Important: if you know your message size in advance (or at least know an upper limit for it), allocate a ByteBuffer instead (or reuse a previously allocated one) and write the message into it. It works faster than ByteArrayOutputStream (see the article Various methods of binary serialization in Java).
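
As an illustration, here is a minimal sketch of writing a message into a reused ByteBuffer. The 256-byte upper limit and the message layout (a long id, an int length, then UTF-8 text) are assumptions invented for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ByteBufferMessage {
    // Assumed upper limit for the size of one message
    static final int MAX_MESSAGE_SIZE = 256;

    // Allocated once and reused for every message; this makes the class not thread-safe
    private final ByteBuffer buffer = ByteBuffer.allocate(MAX_MESSAGE_SIZE);

    /** Hypothetical layout: 8-byte id, 4-byte text length, UTF-8 text. */
    byte[] encode(long id, String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        buffer.clear();                       // reset position and limit, keep the storage
        buffer.putLong(id).putInt(utf8.length).put(utf8);
        buffer.flip();                        // switch from writing to reading
        byte[] message = new byte[buffer.remaining()];
        buffer.get(message);
        return message;
    }

    public static void main(String[] args) {
        byte[] message = new ByteBufferMessage().encode(42L, "hi");
        System.out.println(message.length + " bytes");
    }
}
```

A message larger than the assumed limit makes the put calls throw BufferOverflowException, so the limit has to be a real upper bound.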

ByteArrayOutputStream allows you to write anything to an internal expandable byte array and use that array as a single piece of output afterwards. The default buffer size is 32 bytes, so if you expect to write something longer, provide an explicit buffer size in the ByteArrayOutputStream(int) constructor.
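
A minimal sketch of passing a size hint to the constructor, assuming the caller has a reasonable estimate of the data length (for example, a file size). The readAll helper is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class PresizedStream {
    /** Hypothetical helper: drains a stream into a byte array, using expectedSize as a hint. */
    static byte[] readAll(InputStream in, int expectedSize) {
        // Starting near the final size avoids repeated grow-and-copy steps from 32 bytes
        ByteArrayOutputStream out = new ByteArrayOutputStream(Math.max(expectedSize, 32));
        byte[] chunk = new byte[8192];
        try {
            int read;
            while ((read = in.read(chunk)) != -1) {
                out.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = readAll(new ByteArrayInputStream(new byte[100_000]), 100_000);
        System.out.println(data.length + " bytes");
    }
}
```

The hint does not have to be exact: the stream still grows if the estimate is too low.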

In most cases ByteArrayOutputStream is used either when you are writing a callback method and the caller provides you with some OutputStream whose nature is undefined, or when you are writing some “message to byte array” serialization method. The second case is covered in the article Various methods of binary serialization in Java.

From the above-mentioned article you will learn that ByteArrayOutputStream is synchronized, which seriously impacts its performance. So, if you don’t need synchronization, go to the JDK sources, copy the class contents to your project and remove all synchronization from it (and forget that it was my advice! 🙂 ). This will make it a bit faster…
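
Rather than copying the JDK source, the idea can be sketched from scratch. The class below is a simplified, hypothetical unsynchronized variant, not the JDK implementation: it covers only the core methods and ignores array-size overflow.

```java
import java.io.OutputStream;
import java.util.Arrays;

/** Hypothetical unsynchronized counterpart of ByteArrayOutputStream; not thread-safe. */
public class UnsyncByteArrayOutputStream extends OutputStream {
    private byte[] buf;
    private int count; // number of valid bytes in buf

    public UnsyncByteArrayOutputStream(int initialCapacity) {
        buf = new byte[Math.max(initialCapacity, 1)];
    }

    /** Grows the array (at least doubling it) when minCapacity does not fit. */
    private void ensureCapacity(int minCapacity) {
        if (minCapacity > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(buf.length * 2, minCapacity));
        }
    }

    @Override
    public void write(int b) {
        ensureCapacity(count + 1);
        buf[count++] = (byte) b;
    }

    @Override
    public void write(byte[] b, int off, int len) {
        ensureCapacity(count + len);
        System.arraycopy(b, off, buf, count, len); // also validates off and len
        count += len;
    }

    /** Returns a copy of the bytes written so far. */
    public byte[] toByteArray() {
        return Arrays.copyOf(buf, count);
    }

    public int size() {
        return count;
    }

    /** Discards the written bytes but keeps the allocated array for reuse. */
    public void reset() {
        count = 0;
    }
}
```

Because no method is synchronized, an instance must stay confined to one thread.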

Performance of various methods of binary serialization in Java

by Mikhail Vorontsov

We are going to measure the performance of binary serialization in Java. The following classes will be compared:

  • DataInputStream(ByteArrayInputStream) and its counterpart DataOutputStream(ByteArrayOutputStream)
  • ByteArrayInput/OutputStream compared with BAInputStream, a copy of ByteArrayInputStream w/o synchronization, to see how synchronization affects performance
  • ByteBuffer in its 4 flavours: heap/direct, big/little endian
  • sun.misc.Unsafe-based memory operations on heap byte arrays
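
The four ByteBuffer flavours from the list can be created as in this minimal sketch. The create helper is a hypothetical name; the buffer capacity is arbitrary.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteBufferFlavours {
    /** Hypothetical helper: builds a heap or direct buffer with the requested byte order. */
    static ByteBuffer create(boolean direct, ByteOrder order, int capacity) {
        ByteBuffer buffer = direct
                ? ByteBuffer.allocateDirect(capacity) // memory outside the Java heap
                : ByteBuffer.allocate(capacity);      // backed by a byte[] on the heap
        return buffer.order(order);                   // new buffers default to big endian
    }

    public static void main(String[] args) {
        for (boolean direct : new boolean[] { false, true }) {
            for (ByteOrder order : new ByteOrder[] { ByteOrder.BIG_ENDIAN, ByteOrder.LITTLE_ENDIAN }) {
                ByteBuffer buffer = create(direct, order, 8);
                buffer.putLong(0, 1L);
                System.out.println((direct ? "direct" : "heap") + ", " + order
                        + ": first byte = " + buffer.get(0));
            }
        }
    }
}
```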

My experience has shown me that the performance of all these serialization methods depends on the data item size as well as on the buffer/stream type. So, two sets of tests were written. The first test works on an object having a single field, byte[500], while the second test uses another object with a single field, long[500]. In the case of ByteBuffer and Unsafe we will test both bulk operations and serialization of every array element as a separate method call.
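
The difference between per-element and bulk serialization of a long[500] can be sketched as follows. This is an illustrative assumption, not the article's test code: it serializes a bare array rather than an object, uses a heap big-endian ByteBuffer only, and leaves Unsafe out.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

public class LongArraySerialization {
    /** One writeLong call per element through DataOutputStream(ByteArrayOutputStream). */
    static byte[] viaDataOutputStream(long[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(8 * values.length);
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (long v : values) {
                out.writeLong(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** One putLong call per element on a heap ByteBuffer. */
    static byte[] viaByteBufferPerElement(long[] values) {
        ByteBuffer buffer = ByteBuffer.allocate(8 * values.length);
        for (long v : values) {
            buffer.putLong(v);
        }
        return buffer.array();
    }

    /** A single bulk put of the whole array through a LongBuffer view. */
    static byte[] viaByteBufferBulk(long[] values) {
        ByteBuffer buffer = ByteBuffer.allocate(8 * values.length);
        buffer.asLongBuffer().put(values);
        return buffer.array();
    }

    /** Reads the array back with a bulk get. */
    static long[] read(byte[] data) {
        long[] values = new long[data.length / 8];
        ByteBuffer.wrap(data).asLongBuffer().get(values);
        return values;
    }

    public static void main(String[] args) {
        long[] values = new long[500];
        for (int i = 0; i < values.length; i++) {
            values[i] = i * 31L;
        }
        System.out.println(viaDataOutputStream(values).length + " bytes");
        System.out.println(read(viaByteBufferBulk(values)).length + " longs");
    }
}
```

All three writers use big-endian order here, so they produce the same bytes; only the number of method calls per array differs.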
