Memory consumption of popular Java data types – part 1

by Mikhail Vorontsov

This article will give you an overview of some popular Java data types memory consumption. This article follows An overview of memory saving techniques in Java article earlier published in this blog. I strongly recommend you to read the previous article before reading this one.

Enum, EnumMap, EnumSet

Enums were added to Java 1.5. Technically speaking, each enum is an Object and has all memory consumption properties of objects described in the previous article. Luckily, they have several useful properties:

All enum objects are singletons. This is guaranteed by Java language – you can create an instance of enum only by its definition. This means that you pay for enum object once and then you only spend 4 bytes per its reference (in this article I will assume that object references occupy 4 bytes). But what is the benefit of enums if they consume as much memory as int variables and 4 times more than byte variables, besides type safety and better support in IDE?

The answer is the ordinal() method implemented by every enum. It is an increasing number starting from 0 and ending at the number of values in the given enum minus 1. Such enum property allows us to use arrays for enum to any other object mapping: a value related to the first enum value will be stored in array[0], second enum will be mapped to array[1] and so on according to Enum.ordinal() method result. By the way, it was a short description of JDK EnumMap class implementation.

If you need to implement a map from Object into enum, you won’t have a tailored JDK implementation at hand. Of course, you can use any JDK Object to Object map, but it wouldn’t be that efficient. The easy way here is to use Trove TObjectByteMap (or TObjectIntMap in rare cases when you have more than 128 values in your enum) and map from Object key into Enum.ordinal() value. You will need a decoding method for getters in order to convert byte into an enum. This method will require 1 byte per entry, which is the best we can do without paying CPU algorithmic penalty (of course, we can use less than 1 byte per enum, if there are less than 128, 64, 32, etc elements in the enum, but it may make your code more complicated for a very little memory gain).

With all this knowledge at hand, you may now realize that EnumSet is implemented similarly to BitSet. There are 2 EnumSet implementations in JDK: RegularEnumSet and JumboEnumSet. The former is used for enums having less than 65 values (it covers 99,9999% of real-world enums), the latter is used for larger enumerations.

RegularEnumSet utilizes the knowledge of the fact that there are less or equal to 64 values in the enum. It allows RegularEnumSet to use a single long to store all “enum present in the set” flags, rather than using long[] (utilized by JumboEnumSet or BitSet). Using a single long instead of long[1] allows to save 4 bytes on long[] reference and 16 bytes on long value (long occupies 8 bytes, long[1] needs 12 bytes for header, 8 bytes for a single long and 4 bytes for alignment).


BitSet

As you have noticed from a previous section, a BitSet is built on top of the long[]. The minor disadvantage of a BitSet is that its minimal allowed value is fixed to zero. Thus, if you need a BitSet starting at some offset, you will have to manually subtract your offset from all your values in BitSet methods. Besides that, a BitSet is limited to int values. It also means a need of some sort of envelope for storing real long values in it. The LongBitSet class described in my Bit sets article may give you some useful hints.

Dense set of integers as a BitSet

Details of EnumSet implementation may give you a hint on efficient storage of a dense set of integer values: find out the minimal allowed value in your set (call it minVal) and then use a BitSet for integer values storage: call BitSet.set( value - minVal ) to store your value and BitSet.get( value - minVal ) to check if a value is present in the set. The dense set is a set with more than 1 value per 64 entries.

ArrayList

ArrayList is one of the most efficient storage structures in JDK: it has an Object[] for storage plus an int field for tracking the list size (which is different from the array/list capacity = array.length). As long as you keep the actual objects in the ArrayList, it offers you near-optimal storage. All you need to remember about ArrayList is that it is better not to allocate a large ArrayList in advance if you are not sure that its capacity will be fully utilized. This is especially important for collections of such arrays (or collections of objects having array fields).

Another important property of ArrayList to remember is that there is only one ArrayList method which shrinks its capacity – ArrayList.trimToSize. clear/remove* methods decrease ArrayList size instead of capacity. So, if you use an ArrayList as a buffer of some sort and you expect some high peaks in your data, but rather low average buffer size, then it may make sense for you to trim your ArrayList “manually” after that.

If you have a list of numeric wrappers, you’d better use Trove T[Type]ArrayList, which allows you to store (and most important – keep) primitive values in those collections.

LinkedList

LinkedList is the only JDK collection which implements the two following interfaces: List and Deque, which makes it useful for iteration operations on queue/dequeue/stack structures.

Due to being a deque, each LinkedList node contains references to the previous and next elements as well as a reference to the data value. Such node occupies 24 bytes (12 bytes header + 3*4 bytes for references), which is 6 times more than ArrayList in terms of per-node overhead. This structure is also CPU cache-unfriendly due to its spatial non-locality.

ArrayDeque

If all you need is a FIFO/LIFO buffer and you don’t need to access anything except its head or tail, you’d better prefer ArrayDeque to LinkedList.

ArrayDeque is based on an Object[], which length is a power of 2. Besides the array, there are int head and tail pointers – they allow the array to wrap around its edges if required. Because of ArrayDeque internal memory management, its storage array length is always equal to the size of ArrayDeque rounded to the next power of 2. It means that you spend less than 8 bytes per value in the worst case and 4 bytes per value in the best case.

Summary

The following table summarizes the storage occupied per stored value assuming that a reference occupies 4 bytes. Note that you must spend 4 byte per Object reference in any case, so subtract 4 bytes from the values in the following table to find out the storage overhead.

EnumSet, BitSet 1 bit per value
EnumMap 4 bytes (for value, nothing for key)
ArrayList 4 bytes (but may be more if ArrayList capacity is seriously more than its size)
LinkedList 24 bytes (fixed)
ArrayDeque 4 to 8 bytes, 6 bytes on average

See also

Recommended reading

If you want to better understand computer memory system and its implications on your software, I’d recommend you to read Memory Systems: Cache, DRAM, Disk by Bruce Jacob et al.

Paperback Kindle edition

One thought on “Memory consumption of popular Java data types – part 1

  1. Pingback: Various types of memory allocation in Java  - Java Performance Tuning Guide

Comments are closed.