Base64 encoding and decoding performance

by Mikhail Vorontsov

02 Apr 2014 update: added Guava implementation and byte[] <-> byte[] section.

21 Mar 2014 update: major rewrite + added javax.xml.bind.DatatypeConverter class description.

21 Feb 2014 update: added MiGBase64 class description.

25 Dec 2013 update: added Java 8 java.util.Base64 class description.

We will discuss what the Base64 algorithm is and compare the performance of several well-known libraries implementing Base64 encoding/decoding.

Base64 is an algorithm that maps all 256 byte values to 64 printable byte values (printable meaning that those bytes can be displayed in the US-ASCII encoding). This is done by packing 3 input bytes into 4 output bytes. Base64 is generally used in text-based data exchange protocols which still need to transfer some binary data. The best known example is encoding of e-mail attachments.
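A tiny illustration of that 3-to-4 packing, using the Java 8 codec described below (the input string and its well-known encoding are just an example):

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Intro {
    public static void main( final String[] args ) {
        //3 input bytes ('M', 'a', 'n') become 4 printable output bytes: "TWFu"
        final byte[] src = "Man".getBytes( StandardCharsets.US_ASCII );
        final String encoded = Base64.getEncoder().encodeToString( src );
        System.out.println( encoded ); //TWFu
        //decoding restores the original 3 bytes
        System.out.println( new String( Base64.getDecoder().decode( encoded ), StandardCharsets.US_ASCII ) ); //Man
    }
}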

JDK Base64 implementations

Surprisingly, there was no Base64 implementation in the core JDK classes before Java 6. Some web forums advise using two non-public sun.* classes which are present in all Sun/Oracle JDKs: sun.misc.BASE64Encoder and sun.misc.BASE64Decoder. The advantage of using them is that you don’t need to ship any other libraries with your application. The disadvantage is that those classes are not supposed to be used outside the JDK (and, of course, they can be removed from the JDK implementation… in theory, at least).

Sun added another Base64 implementation in Java 6 (thanks to Thomas Darimont for his reminder!): it was hidden in the javax.xml.bind package and was unknown to many developers. The javax.xml.bind.DatatypeConverter class has 2 static methods – parseBase64Binary and printBase64Binary – which are used for Base64 decoding and encoding respectively.

Java 8 has finally added a Base64 implementation to the java.util package – java.util.Base64. This static factory class provides basic, MIME, and URL-and-filename-safe encoder and decoder implementations.
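For illustration, here is how the three flavours are obtained (the MIME encoder inserts a CR LF pair every 76 output characters, the URL-safe encoder uses '-' and '_' instead of '+' and '/'):

import java.util.Base64;

public class Java8Base64Flavours {
    public static void main( final String[] args ) {
        final byte[] data = new byte[ 100 ]; //just a zero-filled example buffer

        final String basic = Base64.getEncoder().encodeToString( data );     //no line separators
        final String mime  = Base64.getMimeEncoder().encodeToString( data ); //CR LF every 76 chars
        final String url   = Base64.getUrlEncoder().encodeToString( data );  //URL and filename safe alphabet

        System.out.println( Base64.getDecoder().decode( basic ).length );    //100
        System.out.println( Base64.getMimeDecoder().decode( mime ).length ); //100
        System.out.println( Base64.getUrlDecoder().decode( url ).length );   //100
    }
}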

Surprisingly (or maybe not), all these implementations do not share any logic, even in Java 8.

Third party Base64 implementations

I will also mention 4 quite well known third-party Base64 implementations.

  • The first one is present in the Apache Commons Codec library and is called org.apache.commons.codec.binary.Base64.
  • The next one is present in the Google Guava library and is accessible via the com.google.common.io.BaseEncoding.base64() static method.
  • Another one was written by Robert Harder and is available from his website: http://iharder.net/base64. This is a single class which you will have to add to your project.
  • The last one was written by Mikael Grev nearly 10 years ago. It is available from http://migbase64.sourceforge.net/. This is also a single class you have to add to your project. This implementation claims to be the fastest Base64 implementation, but unfortunately this is no longer true. Besides, it has the strictest limit on the maximal length of a byte[] to decode (see below).

Tests

I have written a test which generates 200M of random data, splits it into 100 or 1000 byte chunks, encodes them (adding each encoded String to a properly sized ArrayList) and then decodes each element of the new list (adding the results to another list). The results are checked against the input data in order to verify that nothing was lost during the conversions. I have also added a test where a single 200M chunk of data is encoded/decoded in order to estimate the maximal possible performance of an implementation. I measured the time to encode all chunks and the time to decode all chunks:

private static TestResult testCodec( final Base64Codec codec, final List<byte[]> buffers ) throws IOException {
    //encode all chunks, measuring the total time
    final List<String> encoded = new ArrayList<String>( buffers.size() );
    final long start = System.currentTimeMillis();
    for ( final byte[] buf : buffers )
        encoded.add( codec.encode( buf ) );
    final long encodeTime = System.currentTimeMillis() - start;

    //decode all chunks back, measuring the total time
    final List<byte[]> result = new ArrayList<byte[]>( buffers.size() );
    final long start2 = System.currentTimeMillis();
    for ( final String s : encoded )
        result.add( codec.decode( s ) );
    final long decodeTime = System.currentTimeMillis() - start2;

    //verify that nothing was lost in the round trip
    for ( int i = 0; i < buffers.size(); ++i )
    {
        if ( !Arrays.equals( buffers.get( i ), result.get( i ) ) )
            System.out.println( "Diff at pos = " + i );
    }
    return new TestResult( encodeTime / 1000.0, decodeTime / 1000.0 );
}

Base64Codec is an interface which declares encode and decode methods. I've written seven implementations – one for each library. Pay attention to the Base64.DONT_GUNZIP flag used in iharder.net decoding: without this flag that library will try to gunzip data after finding the 0x8b1f value at the beginning of your message. The probability of accidentally having these bytes at the beginning of some of your messages is actually rather high, so it is safer to add this flag in advance rather than get an unexpected exception later ('Can not gunzip data').
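The interface itself is not shown in the listing below; a minimal sketch of what it looks like (the exact declaration in the original source may differ) is:

//nested inside the test class, next to the codec implementations below
private interface Base64Codec {
    String encode( byte[] data ) throws IOException;
    byte[] decode( String base64 ) throws IOException;
}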

private static class ApacheImpl implements Base64Codec
{
    public String encode(byte[] data) {
        return org.apache.commons.codec.binary.Base64.encodeBase64String( data );
    }
 
    public byte[] decode(String base64) {
        return org.apache.commons.codec.binary.Base64.decodeBase64( base64 );
    }
}
 
private static class IHarderImpl implements Base64Codec
{
    public String encode(byte[] data) {
        return Base64.encodeBytes( data );
    }
 
    public byte[] decode(String base64) throws IOException {
        return Base64.decode( base64, Base64.DONT_GUNZIP );
    }
}
 
private static class SunImpl implements Base64Codec
{
    public String encode(byte[] data) {
        sun.misc.BASE64Encoder encoder = new BASE64Encoder();
        return encoder.encode( data );
    }
 
    public byte[] decode(String base64) throws IOException {
        sun.misc.BASE64Decoder decoder = new BASE64Decoder();
        return decoder.decodeBuffer( base64 );
    }
}
 
//available in Java 8 only!
private static class Java8Impl implements Base64Codec
{
    private final java.util.Base64.Decoder m_decoder = java.util.Base64.getDecoder();
    private final java.util.Base64.Encoder m_encoder = java.util.Base64.getEncoder();
 
    public String encode(byte[] data) {
        return m_encoder.encodeToString(data);
    }
 
    public byte[] decode(String base64) throws IOException {
        return m_decoder.decode(base64);
    }
}
 
private static class JavaXmlImpl implements Base64Codec
{
    public String encode(byte[] data) {
        return DatatypeConverter.printBase64Binary( data );
    }
 
    public byte[] decode(String base64) throws IOException {
        return DatatypeConverter.parseBase64Binary( base64 );
    }
}
 
private static class MiGBase64Impl implements Base64Codec
{
    public String encode(byte[] data) {
        return mig.Base64.encodeToString( data, false );
    }
 
    public byte[] decode(String base64) throws IOException {
        return mig.Base64.decode( base64 );
    }
}
 
private static class GuavaImpl implements Base64Codec
{
    private final BaseEncoding m_base64 = BaseEncoding.base64();
 
    public String encode(byte[] data) {
        return m_base64.encode(data);
    }
 
    public byte[] decode(String base64) throws IOException {
        return m_base64.decode( base64 );
    }
}

After comparing the Base64-encoded strings you may notice that the old Sun class and the iharder.net class add line terminator characters to the output string (in order to make it printable on a terminal – 2 characters are added for every 76 output characters). The Apache Commons, Guava, javax.xml and Java 8 implementations do not add them. MiGBase64 may add line terminators if requested (via the second parameter of encodeToString).
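If you do need line separators with the Java 8 codec, the MIME flavour mentioned above provides them; a small sketch:

import java.util.Base64;

public class LineSeparatorDemo {
    public static void main( final String[] args ) {
        final byte[] data = new byte[ 120 ]; //120 input bytes -> 160 output characters

        //basic encoder: one long line, no separators
        System.out.println( Base64.getEncoder().encodeToString( data ).length() );     //160

        //MIME encoder: CR LF inserted after every 76 output characters
        System.out.println( Base64.getMimeEncoder().encodeToString( data ).length() ); //164
    }
}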

Here are the test results I got on my Xeon E5-2650 @ 2.8 GHz using the initial Java 8 release (column headers show the chunk length):

Name           Encode, 100 bytes   Decode, 100 bytes   Encode, 1000 bytes   Decode, 1000 bytes   Encode, 200000000 bytes   Decode, 200000000 bytes
JavaXmlImpl    0.721 sec           1.067 sec           0.664 sec            0.947 sec            0.689 sec                 0.885 sec
Java8Impl      0.808 sec           1.101 sec           0.712 sec            0.913 sec            0.699 sec                 0.887 sec
MiGBase64Impl  0.887 sec           2.113 sec           0.788 sec            1.978 sec            0.812 sec                 1.928 sec
IHarderImpl    1.179 sec           2.645 sec           1.012 sec            2.439 sec            0.976 sec                 2.431 sec
ApacheImpl     4.113 sec           4.733 sec           2.308 sec            2.667 sec            2.205 sec                 2.552 sec
GuavaImpl      3.305 sec           4.178 sec           3.12 sec             3.755 sec            3.102 sec                 3.744 sec
SunImpl        11.153 sec          8.281 sec           3.992 sec            4.533 sec            3.289 sec                 4.096 sec

Everything is evident from the table: appalling performance of the sun.misc classes, acceptable performance of the Commons Codec classes and very good performance of both the iharder.net and MiG Base64 implementations. The new Java 8 encoder/decoder runs faster than any other implementation (but you need Java 8…). The javax.xml class turned out to be a hidden gem!

Using exceptions for normal operation conditions in sun.misc classes

You may wonder why the sun.misc classes are so slow. Try running them from a debugger and take a few thread dumps in your IDE. Most probably, you will see a stack like this for encoding:

at java.security.AccessController.doPrivileged(AccessController.java:-1)
at java.io.BufferedWriter.<init>(BufferedWriter.java:109)
at java.io.BufferedWriter.<init>(BufferedWriter.java:88)
at java.io.PrintStream.<init>(PrintStream.java:105)
at java.io.PrintStream.<init>(PrintStream.java:151)
at java.io.PrintStream.<init>(PrintStream.java:135)
at sun.misc.CharacterEncoder.encodeBufferPrefix(CharacterEncoder.java:92)
at sun.misc.CharacterEncoder.encode(CharacterEncoder.java:147)
at sun.misc.CharacterEncoder.encode(CharacterEncoder.java:191)

And like this for decoding:

at java.lang.Throwable.fillInStackTrace(Throwable.java:-1)
at java.lang.Throwable.fillInStackTrace(Throwable.java:782)
- locked <0x6c> (a sun.misc.CEStreamExhausted)
at java.lang.Throwable.<init>(Throwable.java:250)
at java.lang.Exception.<init>(Exception.java:54)
at java.io.IOException.<init>(IOException.java:47)
at sun.misc.CEStreamExhausted.<init>(CEStreamExhausted.java:30)
at sun.misc.BASE64Decoder.decodeAtom(BASE64Decoder.java:117)
at sun.misc.CharacterDecoder.decodeBuffer(CharacterDecoder.java:163)
at sun.misc.CharacterDecoder.decodeBuffer(CharacterDecoder.java:194)

Using a PrintStream to encode the data and throwing an exception just to detect the end of the input while decoding was really a 'brilliant' idea!

Maximal supported chunk size to encode

In theory, your output can not exceed 2Gb in length, because that is the array size limit imposed by the JVM. It means that your input should not be longer than 2Gb / 4 * 3 = 1.5Gb. In practice things are more complicated.

It turns out that there is one more problem to avoid: how are you going to calculate the size of the output array (originalLength * 4 / 3 + padding)? Won't that expression overflow an int? Some implementations are not careful about this calculation.
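A minimal sketch of an overflow-safe version of this calculation, done in long arithmetic (an illustration, not code taken from any of the tested libraries):

//computes the padded Base64 output length for srcLen input bytes,
//failing cleanly instead of silently overflowing an int
static int encodedLength( final int srcLen ) {
    final long encoded = ( ( srcLen + 2L ) / 3 ) * 4; //4 output bytes per 3 input bytes, rounded up
    if ( encoded > Integer.MAX_VALUE )
        throw new IllegalArgumentException( "Input too long to Base64-encode into a single array: " + srcLen );
    return (int) encoded;
}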

I have tested the maximal length of data that it is possible to encode/decode with a given codec. Usually encoding is more memory-hungry, but in some cases decoding fails first. The first test allocates a 32G heap (-Xmx32G), which should be more than enough to encode up to 2G of data.
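The probing code itself is not listed in the article; a rough sketch of how such a probe could look (names and step size are illustrative, not the original test code):

//keep growing a single chunk until the codec fails, then report the failure size
static void probeMaxSize( final Base64Codec codec ) {
    for ( long size = 100_000_000; size < Integer.MAX_VALUE; size += 10_000_000 ) {
        final byte[] buf = new byte[ (int) size ];
        try {
            codec.decode( codec.encode( buf ) );
        } catch ( final Throwable t ) {
            System.out.println( "Failed at size = " + size / 1E9 + " Gb due to " + t );
            return;
        }
    }
}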

-Xmx32G
Codec Base64Tests$MiGBase64Impl Failed encoding at size = 1.62 Gb due to java.lang.NegativeArraySizeException
Codec Base64Tests$MiGBase64Impl Failed decoding at size = 0.36 Gb due to java.lang.NegativeArraySizeException

Codec Base64Tests$JavaXmlImpl Failed encoding at size = 1.62 Gb due to java.lang.NegativeArraySizeException

Codec Base64Tests$Java8Impl Failed encoding at size = 1.62 Gb due to java.lang.NegativeArraySizeException

Codec Base64Tests$GuavaImpl Failed encoding at size = 1.62 Gb due to java.lang.NegativeArraySizeException

Codec Base64Tests$SunImpl Failed encoding at size = 0.79 Gb due to java.lang.OutOfMemoryError: Requested array size exceeds VM limit

Codec Base64Tests$IHarderImpl Failed encoding at size = 1.62 Gb due to java.lang.NegativeArraySizeException
Codec Base64Tests$IHarderImpl Failed decoding at size = 0.72 Gb due to java.lang.NegativeArraySizeException

Codec Base64Tests$ApacheImpl Failed encoding at size = 0.81 Gb due to java.lang.NegativeArraySizeException
Codec Base64Tests$ApacheImpl Failed decoding at size = 0.72 Gb due to java.lang.OutOfMemoryError: Requested array size exceeds VM limit

There are a few magic numbers in these results:

  • 1.62G * 4 / 3 = 2,160,000,000 bytes, which no longer fits into an int, while 1.61G * 4 / 3 = 2,146,666,666 bytes still does. This int boundary defines the maximal amount of data you can Base64-encode in Java into a single array (a short demonstration of the overflow follows this list).
  • 0.81G * 4 / 3 > Integer.MAX_VALUE / 2. The Apache Codec implementation does not calculate the encoded data size – it uses an expanding buffer instead. The initial buffer size is 8K and it doubles on each expansion, which means that at some step the buffer size will be exactly 1G (2^30 bytes). Unfortunately, it can not expand further to 2G… The largest input that still fits in case of Apache Codec is therefore 1G / 4 * 3 = 805,306,368 bytes.
  • 0.79G is approximately 2 times less than 1.62G. The old Sun implementation uses a ByteArrayOutputStream, which doubles in size on every expansion. The important word here is doubles – to grow past ~1G it would have to allocate a buffer exceeding the VM array size limit, so it fails at roughly half the size the other implementations can handle.
  • 0.72G * 3 = 2,160,000,000 > Integer.MAX_VALUE. The Apache implementation tries to convert the String into a UTF-8 byte[]. Unfortunately, deep inside the JDK, the String.getBytes method initially allocates a byte[] sized for the maximal possible number of bytes per char, which is 3 in case of UTF-8 (see java.lang.StringCoding$StringEncoder.encode). The IHarder implementation is more straightforward: int len34 = len * 3 / 4; – the multiplication overflows first.
  • 0.36G * 6 = 2,160,000,000, then the same overflow as above. You may be surprised, but MiGBase64 calculates len / 4 * 3 as int len = ((sLen - sepCnt) * 6 >> 3) - pad; – the multiplication by 6 overflows much earlier.
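A tiny demonstration of the int overflow behind the 1.62G encoding boundary (the values are illustrative; this is not code from any of the libraries):

public class OverflowDemo {
    public static void main( final String[] args ) {
        final int srcLen = 1_620_000_000;             //~1.62G of input
        System.out.println( srcLen * 4 / 3 );         //negative garbage: srcLen * 4 overflows an int
        System.out.println( srcLen * 4L / 3 );        //2160000000 - correct in long arithmetic, but too big for an int
        System.out.println( 1_610_000_000 * 4L / 3 ); //2146666666 - this one still fits into an int
    }
}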

The lack of a decoding failure message for the 3 JDK implementations means that each decoder can handle at least as much data as the corresponding encoder. In case of the new Java 8 implementation and the javax.xml implementation it means they can handle the full 2G buffer.

Memory consumption of Base64 implementations

Now we know the limit for a single chunk of data processed by each of these codecs. Let's find out how much memory each of them needs. This time we will run the JVM with the -Xmx8G option to see the actual memory consumption.

As you understand, Base64 encoding requires 4/3 of the input array length for the output. The output is a String, so it needs 4/3 * sizeof(char) = 8/3 of the input length in bytes. Besides that, you still hold the original byte[]. So, the theoretical minimal memory consumption of a Base64-to-String encoder in Java is 11/3 of the input byte array length (nearly 4 times). Unfortunately, you can not create a String without copying the original byte[]/char[]. That's why the actual minimal required amount of RAM is 15/3 = 5 times the original byte[] (assuming a temporary byte[] of 4/3 of the original size is used). That's quite a lot. Most implementations actually use a temporary char[], so you will need 19/3 = 6.33 times the original byte[]. So, if you are processing large input arrays, consider using implementations that write into an output byte[]/char[] – you would avoid allocating a temporary buffer this way.
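A rough illustration (simplified, not any library's actual code) of where that 19/3 estimate comes from:

//the three buffers alive at the same time during a byte[] -> String encode
static String encodeIllustration( final byte[] src ) {              //src: 3/3 of the input size
    final char[] tmp = new char[ ( ( src.length + 2 ) / 3 ) * 4 ];  //temp char[]: ~8/3 of the input size in bytes
    //...fill tmp with Base64 characters (omitted)...
    return new String( tmp );                                       //the String copies tmp: another ~8/3
}                                                                   //total: 3/3 + 8/3 + 8/3 = 19/3 of the input size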

This time we will test encoding only, because it requires a temporary buffer.

-Xmx8G
Codec com.mvorontsov.javaperf.Base64Tests$GuavaImpl Failed encoding at size = 1.08 Gb due to java.lang.OutOfMemoryError: Java heap space
Codec com.mvorontsov.javaperf.Base64Tests$MiGBase64Impl Failed encoding at size = 1.08 Gb due to java.lang.OutOfMemoryError: Java heap space
Codec com.mvorontsov.javaperf.Base64Tests$JavaXmlImpl Failed encoding at size = 1.08 Gb due to java.lang.OutOfMemoryError: Java heap space
Codec com.mvorontsov.javaperf.Base64Tests$Java8Impl Failed encoding at size = 1.17 Gb due to java.lang.OutOfMemoryError: Java heap space
Codec com.mvorontsov.javaperf.Base64Tests$SunImpl Failed encoding at size = 0.79 Gb due to java.lang.OutOfMemoryError: Requested array size exceeds VM limit
Codec com.mvorontsov.javaperf.Base64Tests$IHarderImpl Failed encoding at size = 1.24 Gb due to java.lang.OutOfMemoryError: Java heap space
Codec com.mvorontsov.javaperf.Base64Tests$ApacheImpl Failed encoding at size = 0.81 Gb due to java.lang.NegativeArraySizeException

As you can see, all the codecs consume more or less the same amount of memory. Nevertheless, we have 2 leaders here: Java 8 codec and IHarder codec.

Base64 into byte[] encoding

As you have seen, you need quite a lot of memory for byte[] -> String Base64 encoding. But you do not actually need to encode into a String – all the output characters belong to the ASCII charset, so you can safely write the Base64 output into a byte[]. No temporary buffers are needed during such a conversion, so you need just 7/3 of the input size (compare with 19/3 in case of the -> String conversion).

Besides that, your code will be faster, because you don't need to allocate the final String and copy data into it from the temporary buffer.

Surprisingly, byte[] -> byte[] Base64 is supported by only 4 of these codecs: Java 8, MiGBase64, IHarder and Apache Commons. I have added Base64ByteCodec interface implementations to these 4 codecs (you can take a look at the source code at the end of this article).
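The Base64ByteCodec implementations are only available in the full source code; a minimal sketch of what the interface and the Java 8 implementation could look like (the actual declarations in the test source may differ):

private interface Base64ByteCodec {
    byte[] encode( byte[] data ) throws IOException;
    byte[] decode( byte[] base64 ) throws IOException;
}

private static class Java8ByteImpl implements Base64ByteCodec {
    private final java.util.Base64.Decoder m_decoder = java.util.Base64.getDecoder();
    private final java.util.Base64.Encoder m_encoder = java.util.Base64.getEncoder();

    public byte[] encode( byte[] data ) {
        return m_encoder.encode( data );   //byte[] -> byte[], no String or char[] temporaries
    }

    public byte[] decode( byte[] base64 ) {
        return m_decoder.decode( base64 ); //byte[] -> byte[]
    }
}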

Here are test results on the same workstation.

Name           Encode, 100 bytes   Decode, 100 bytes   Encode, 1000 bytes   Decode, 1000 bytes   Encode, 200000000 bytes   Decode, 200000000 bytes
Java8Impl      0.528 sec           0.91 sec            0.436 sec            0.816 sec            0.415 sec                 0.792 sec
MiGBase64Impl  0.695 sec           1.6 sec             0.614 sec            1.457 sec            0.596 sec                 1.455 sec
IHarderImpl    0.852 sec           1.999 sec           0.775 sec            1.895 sec            0.744 sec                 1.893 sec
ApacheImpl     4.411 sec           4.237 sec           2.057 sec            2.004 sec            1.949 sec                 2.226 sec

I have also copied byte[] -> String times for the same codecs here:

Name           Encode, 100 bytes   Decode, 100 bytes   Encode, 1000 bytes   Decode, 1000 bytes   Encode, 200000000 bytes   Decode, 200000000 bytes
Java8Impl      0.808 sec           1.101 sec           0.712 sec            0.913 sec            0.699 sec                 0.887 sec
MiGBase64Impl  0.887 sec           2.113 sec           0.788 sec            1.978 sec            0.812 sec                 1.928 sec
IHarderImpl    1.179 sec           2.645 sec           1.012 sec            2.439 sec            0.976 sec                 2.431 sec
ApacheImpl     4.113 sec           4.733 sec           2.308 sec            2.667 sec            2.205 sec                 2.552 sec

As you can see, the byte[] -> byte[] conversion is faster (though not dramatically so). In any case, use byte[] -> byte[] if you can process byte[] instead of String – it will save you some processing time (and even more if you would otherwise need to serialize that String later).

Summary

Let's summarize the codec properties in one table. This table is sorted by the relative performance of all these codecs (faster on top).

Name          Max encoding len   Max decoding len   How much we can encode with -Xmx8G   Supports byte[] -> byte[]
Java 8        1.62 G             2 G                1.16 G                               Yes
javax.xml     1.62 G             2 G                1.07 G                               No
MiGBase64     1.62 G             0.36 G             1.07 G                               Yes
IHarder       1.62 G             0.72 G             1.23 G                               Yes
Apache Codec  0.81 G             0.72 G             0.8 G                                Yes
Guava         1.62 G             2 G                1.07 G                               No
Sun           0.79 G             1.05 G             0.78 G                               No

  • If you are looking for a fast and reliable Base64 codec – do not look outside the JDK. There is a new codec in Java 8: java.util.Base64, and there is also one hidden from many eyes (since Java 6): javax.xml.bind.DatatypeConverter. Both are fast, reliable and do not suffer from integer overflows.
  • 2 out of the 4 third-party codecs described here are very fast: MiGBase64 and IHarder. Unfortunately, if you need to process hundreds of megabytes at a time, only Google Guava will allow you to decode 2G of data at a time (versus 360M in case of MiGBase64 and 720M in case of IHarder and Apache Commons). Unfortunately, Guava does not support byte[] -> byte[] encoding.
  • Do not try to call String.getBytes(Charset) on huge strings if your charset is a multibyte one – you may get the whole gamut of integer overflow related exceptions.

Test source code

Base64Tests


14 thoughts on “Base64 encoding and decoding performance”

  1. DucQuoc.wordpress.com

    May you add the MiGBase64 implementation as well ?

    According to the author, MiGBase64 is faster than Apache commons codec and IHarder !

  2. Manfred Rosenboom

    Many thanks for your interesting article! We need to encode/decode BIG byte arrays and at the moment we are using what you call IHarderImpl. I have done my own testing and up to a byte array of 100 MB MiGBase64Impl is the fastest implementation. In our environment we can safely use the decodeFast method, which is even much faster than the default decode method.

    But: I have also tried a 500 MB byte array which is one of the bigger byte arrays that can occur in certain situations in our application and I get

    java.lang.NegativeArraySizeException
    at test.base64.impl.mig.Base64.decode(Base64.java:490)

    or

    java.lang.NegativeArraySizeException
    at test.base64.impl.mig.Base64.decodeFast(Base64.java:545)

    IHarderImpl works flawlessly even in these cases.

    1. admin Post author

      Hi Manfred,

      thank you for your idea to check the maximal data size that can be processed with all these codecs!
      Sorry it took me so long to update the article.

      Regards,
      Mikhail

  3. Thomas Darimont

    Hello Mikhail,

    great article 🙂

    I just wanted to add that there is another method for Base64 encoding/decoding in the JDK (JAXB), present since Java 6:
    javax.xml.bind.DatatypeConverter#parseBase64Binary
    javax.xml.bind.DatatypeConverter#printBase64Binary

    Cheers,
    Thomas

    1. admin Post author

      Hi Thomas,

      thanks for your hint! This class turned out to be my new choice for pre-Java 8 projects 🙂

      Regards,
      Mikhail

  4. Mikael Grev

    Fun to find this!

    I just looked at my MigBase64 code from 10 years ago and saw that there’s a few things that could be improved. The encode/decode max size is an obvious and simple one and today a re-alloc would be faster than a double parsing.

    Luckily there’s now a very good base64 impl. that we all can and should use (which is how I found this while searching for performance numbers), so I can watch some TV instead, thinking MigBase64 was at least the fastest for 10 years. 😉

    Good performance test!

    Cheers,
    Mikael Grev

    1. admin Post author

      Hi Christian,

      thanks for pointing my attention to Guava and Netty. I have realised that I need to add another large section to this article. It will take me about a week to do it.

      Regards,
      Mikhail

  5. Pingback: Exceptions are slow in Java - Java Performance Tuning Guide

  6. Leo Bayer

    Thanks! This is a nice comparison.

    For large blobs of base64 I prefer using a streaming encoder/decoder. Using streams removes any upper limit on input or output size, as well as reduces the memory overhead, and presumably should have a negligible impact on speed.

    We’ve been using the apache-commons Base64OutputStream for decoding and after seeing your writeup I wanted to try out the new Java 8 implementation. Unfortunately when replacing the commons output stream with the Decoder#wrap output stream we see a two orders of magnitude increase in decoding time. A 40 megabyte file with apache-commons decodes in around 0.3 seconds but with the Java 8 API it takes over a minute. This is with and without using buffered streams. Surprisingly, it seems that using buffered streams actually makes the performance worse.

    Have you tried using the new Java 8 base64 streams? I’m hoping I’m overlooking something simple.

    1. admin Post author

      Hi Leo,

      looks like you are overlooking the position of the buffered stream. It should sit between Decoder.wrap and the FileInputStream. And yes, a stream returned by Decoder.wrap is pretty slow…
      The code below takes about 2 seconds on my box to write a file and another 2 seconds to read it (note that it includes CRC calculation). If I remove the buffered stream or move it into an incorrect position, it takes 83-86 seconds for the same task (on a Xeon and a Samsung 840 Pro SSD :((((((((((((( ).

      Best regards,
      Mikhail

      public class Base64Test {
          private final static File FILE = new File( "some file name" );

          public static void main(String[] args) throws IOException {
              final long start = System.currentTimeMillis();
              final long crc = writeFile(FILE, 40 * 1024 * 1024);
              final long time = System.currentTimeMillis() - start;
              System.out.println( "crcWrite = " + crc );
              System.out.println( "Time write = " + time / 1000.0 + " sec");

              final long start2 = System.currentTimeMillis();
              final long crc2 = readFile( FILE );
              final long time2 = System.currentTimeMillis() - start2;

              System.out.println( "crcRead = " + crc2 );
              System.out.println( "Time read = " + time2 / 1000.0 + " sec");
          }

          private static long writeFile( final File f, final int size ) throws IOException {
              final Random r = ThreadLocalRandom.current();
              final CRC32 crc = new CRC32();
              byte[] block = new byte[ 8192 ];
              //this one is fast (hmm, not too fast...)
              try ( OutputStream os = Base64.getEncoder().wrap( new BufferedOutputStream( new FileOutputStream( f ), 65536 ) ) ) {
              //this one is slow
              //try ( OutputStream os = new BufferedOutputStream( Base64.getEncoder().wrap( new FileOutputStream( f ) ), 65536 ) ) {
                  for (int i = 0; i < size / block.length; ++i) {
                      r.nextBytes( block );
                      crc.update( block );
                      os.write( block );
                  }
              }
              return crc.getValue();
          }

          private static long readFile( final File f ) throws IOException {
              final CRC32 crc = new CRC32();
              byte[] block = new byte[ 8192 ];
              try ( InputStream is = Base64.getDecoder().wrap( new BufferedInputStream( new FileInputStream( f ), 65536 ) ) ) {
                  int read;
                  while ( (read = is.read( block )) > 0 ) {
                      crc.update(block, 0, read);
                  }
              }
              return crc.getValue();
          }
      }

      1. Leo Bayer

        Thanks for looking at this.

        Now I’m seeing about the same results. I must have missed it because the commons base64 stream basically buffers on its own, so it had no need for a buffered stream underneath. So with the buffered streams, it’s only about 4 times slower than the apache-commons streams.

