An overview of memory saving techniques in Java

by Mikhail Vorontsov

This article will give you the general advices on memory consumption optimization in Java.

Memory usage optimization is the most important type of optimization in Java. Current systems are limited by memory access times rather than by CPU frequency (otherwise, why CPU producers are implementing all these L1s, L2s and L3s?). It means that by reducing your application memory footprint you will most likely improve your program data processing speed by making your CPU to wait for smaller amount of data. Now let’s get back to Java.

General Java memory layout information

First of all, we have to revise the memory layout of Java objects: any Java Object occupies at least 16 bytes, 12 out of which are occupied by a Java object header. Besides that, all Java objects are aligned by 8 bytes boundary. It means that, for example, an object containing 2 fields: int and byte will occupy not 17 (12 + 4 + 1), but 24 bytes (17 aligned by 8 bytes).

Each Object reference occupies 4 bytes if the Java heap is under 32G and XX:+UseCompressedOops is turned on (it is turned on by default in the recent Oracle JVM versions). Otherwise, Object references occupy 8 bytes.

All primitive data types occupy their exact byte size:

byte, boolean 1 byte
short, char 2 bytes
int, float 4 bytes
long, double 8 bytes

In essence, this information is sufficient for Java memory optimization. But it will be more convenient if you will be aware of arrays / String / numeric wrappers memory consumption.


Most common Java types memory consumption

Arrays consume 12 bytes plus their length multiplied by their element size (and, of course, rounded by 8 bytes alignment).

From Java 7 build 06 String contains 3 fields – a char[] with the string data plus 2 int fields with 2 hash codes calculated by the different algorithms. It means that a String itself needs 12 (header) + 4 (char[] reference) + 4 * 2 (int) = 24 bytes (as you can see, it exactly fits in 8 byte alignment). Besides that, a char[] with the String data occupies 12 + length * 2 bytes (plus alignment). It means that a String occupies 36 + length*2 bytes aligned by 8 bytes (which is, by the way, 8 byte less than a String memory consumption prior to Java 7 build 06).

Numeric wrappers occupy 12 bytes plus size of the underlying type. Byte/Short/Character/Integer/Long are cached by JDK, so the actual memory consumption may be smaller for values in the range [-128; 127]. Anyway, these type may be the source of serious memory overhead in the collection-based applications:

Byte, Boolean 16 bytes
Short, Character 16 bytes
Integer, Float 16 bytes
Long, Double 24 bytes

General Java memory optimization tips

With all this knowledge at hand it is not difficult to give the general Java memory optimization tips:

  • Prefer primitive types to their Object wrappers. The main cause of wrapper types usage are JDK collections, so consider using one of primitive type collection frameworks like Trove.
  • Try to minimize number of Objects you have. For example, prefer array-based structures like ArrayList/ArrayDeque to pointer based structures like LinkedList.

Java memory optimization example

Here is an example. Suppose you have to create a map from int to 20 character long strings. The size of this map is equal to one million and all mappings are static and predefined (saved in some dictionary, for example).

The first approach is to use a Map<Integer, String> from the standard JDK. Let’s roughly estimate the memory consumption of this structure. Each Integer occupies 16 bytes plus 4 bytes for an Integer reference from a map. Each 20 character long String occupies 36 + 20*2 = 76 bytes (see String description above), which are aligned to 80 bytes. Plus 4 bytes for a reference. The total memory consumption will be roughly (16 + 4 + 80 + 4) * 1M = 104M.

The better approach will be replacing String with a byte[] in UTF-8 encoding as described in String packing part 1: converting characters to bytes article. Our map will be Map<Integer, byte[]>. Assume that all string characters belong to ASCII set (0-127), which is true in most of English-speaking countries. A byte[20] occupies 12 (header) + 20*1 = 32 bytes, which conveniently fit into 8 bytes alignment. The whole map will now occupy (16 + 4 + 32 + 4) * 1M = 56M, which is 2 times less than the previous example.

Now let’s use Trove TIntObjectMap<byte[]>. It stores keys as normal int[] compared to wrapper types in JDK collections. Each key will now occupy exactly 4 bytes. The total memory consumption will go down to (4 + 32 + 4) * 1M = 40M.

The final structure will be more complicated. All String values will be stored in the single byte[] one after another with (byte)0 as a separator (we still assume we have a text-based ASCII strings). The whole byte[] will occupy (20 + 1) * 1M = 21M. Our map will store an offset of the string in the large byte[] instead of the string itself. We will use Trove TIntIntMap for this purpose. It will consume (4 + 4) * 1M = 8M. The total memory consumption in this example will be 8M + 21M = 29M. By the way, this is the first example relying on the immutability of this dataset.

Can we achieve the better result? Yes, we can, but at a cost of CPU consumption. The obvious ‘optimization’ is to sort the whole dataset by keys before storing values into a large byte[]. Now we may store the keys in the int[] and use a binary search to look for a key. If a key is found, its index multiplied by 21 (remember, all strings have the same length) will give us the offset of a value in the byte[]. This structure occupies ‘only’ 21M + 4M (for int[]) = 25M at a price of O(log N) lookups compared to O(1) lookups in case of a hash map.

Is this the best we can do? No! We have forgotten that all values are 20 characters long, so we don’t actually need the separators in the byte[]. It means that we can store our ‘map’ using 24M memory if we agree on O( log N ) lookups. No overhead at all compared to theoretical data size and nearly 4.5 times less than required by the original solution ( Map<Integer, String> )! Who told you that Java programs are memory hungry?

In second part of this article we will look at a few commonly used Java types memory consumption.

Summary

  • Prefer primitive types to their Object wrappers. The main cause of wrapper types usage are JDK collections, so consider using one of primitive type collection frameworks like Trove.
  • Try to minimize number of Objects you have. For example, prefer array-based structures like ArrayList/ArrayDeque to pointer based structures like LinkedList.

Recommended reading

If you want to find out more about the clever data compression algorithms, it worth to read “Programming Pearls” (second edition) by Jon Bentley. This is a wonderful collection of rather unexpected algorithms. For example, in column 13.8 author has described how Doug McIlroy managed to fit a 75,000 word spellchecker just in 64 kilobytes of RAM. That spellchecker kept all required information in that tiny amount of memory and did not use disks! It may also be noticed that “Programming Pearls” is one of recommended preparation books for Google SRE interviews.