This article will give you an overview of some popular Java data types memory consumption. This article follows An overview of memory saving techniques in Java article earlier published in this blog. I strongly recommend you to read the previous article before reading this one.
Enum, EnumMap, EnumSet
Enums were added to Java 1.5. Technically speaking, each
enum is an
Object and has all memory consumption properties of objects described in the previous article. Luckily, they have several useful properties:
enum objects are singletons. This is guaranteed by Java language – you can create an instance of
enum only by its definition. This means that you pay for
enum object once and then you only spend 4 bytes per its reference (in this article I will assume that object references occupy 4 bytes). But what is the benefit of enums if they consume as much memory as
int variables and 4 times more than
byte variables, besides type safety and better support in IDE?
The answer is the
ordinal() method implemented by every
enum. It is an increasing number starting from 0 and ending at the number of values in the given
enum minus 1. Such
enum property allows us to use arrays for
enum to any other object mapping: a value related to the first enum value will be stored in
array, second enum will be mapped to
array and so on according to
Enum.ordinal() method result. By the way, it was a short description of JDK
EnumMap class implementation.
If you need to implement a map from
enum, you won’t have a tailored JDK implementation at hand. Of course, you can use any JDK
Object map, but it wouldn’t be that efficient. The easy way here is to use Trove
TObjectIntMap in rare cases when you have more than 128 values in your enum) and map from
Object key into
Enum.ordinal() value. You will need a decoding method for getters in order to convert
byte into an
enum. This method will require 1 byte per entry, which is the best we can do without paying CPU algorithmic penalty (of course, we can use less than 1 byte per
enum, if there are less than 128, 64, 32, etc elements in the
enum, but it may make your code more complicated for a very little memory gain).
With all this knowledge at hand, you may now realize that
EnumSet is implemented similarly to
BitSet. There are 2
EnumSet implementations in JDK:
JumboEnumSet. The former is used for enums having less than 65 values (it covers 99,9999% of real-world enums), the latter is used for larger enumerations.
RegularEnumSet utilizes the knowledge of the fact that there are less or equal to 64 values in the
enum. It allows
RegularEnumSet to use a single
long to store all “enum present in the set” flags, rather than using
long (utilized by
BitSet). Using a single
long instead of
long allows to save 4 bytes on
long reference and 16 bytes on
long value (
long occupies 8 bytes,
long needs 12 bytes for header, 8 bytes for a single
long and 4 bytes for alignment).
As you have noticed from a previous section, a
BitSet is built on top of the
long. The minor disadvantage of a
BitSet is that its minimal allowed value is fixed to zero. Thus, if you need a
BitSet starting at some offset, you will have to manually subtract your offset from all your values in
BitSet methods. Besides that, a
BitSet is limited to
int values. It also means a need of some sort of envelope for storing real
long values in it. The
LongBitSet class described in my Bit sets article may give you some useful hints.
Dense set of integers as a BitSet
EnumSet implementation may give you a hint on efficient storage of a dense set of integer values: find out the minimal allowed value in your set (call it
minVal) and then use a
BitSet for integer values storage: call
BitSet.set( value - minVal ) to store your value and
BitSet.get( value - minVal ) to check if a value is present in the set. The dense set is a set with more than 1 value per 64 entries.
ArrayList is one of the most efficient storage structures in JDK: it has an
Object for storage plus an
int field for tracking the list size (which is different from the array/list capacity =
array.length). As long as you keep the actual objects in the
ArrayList, it offers you near-optimal storage. All you need to remember about
ArrayList is that it is better not to allocate a large
ArrayList in advance if you are not sure that its capacity will be fully utilized. This is especially important for collections of such arrays (or collections of objects having array fields).
Another important property of
ArrayList to remember is that there is only one
ArrayList method which shrinks its capacity –
clear/remove* methods decrease
ArrayList size instead of capacity. So, if you use an
ArrayList as a buffer of some sort and you expect some high peaks in your data, but rather low average buffer size, then it may make sense for you to trim your
ArrayList “manually” after that.
LinkedList is the only JDK collection which implements the two following interfaces:
Deque, which makes it useful for iteration operations on queue/dequeue/stack structures.
Due to being a deque, each
LinkedList node contains references to the previous and next elements as well as a reference to the data value. Such node occupies 24 bytes (12 bytes header + 3*4 bytes for references), which is 6 times more than
ArrayList in terms of per-node overhead. This structure is also CPU cache-unfriendly due to its spatial non-locality.
If all you need is a FIFO/LIFO buffer and you don’t need to access anything except its head or tail, you’d better prefer
ArrayDeque is based on an
Object, which length is a power of 2. Besides the array, there are
int head and tail pointers – they allow the array to wrap around its edges if required. Because of
ArrayDeque internal memory management, its storage array length is always equal to the size of
ArrayDeque rounded to the next power of 2. It means that you spend less than 8 bytes per value in the worst case and 4 bytes per value in the best case.
The following table summarizes the storage occupied per stored value assuming that a reference occupies 4 bytes. Note that you must spend 4 byte per
Object reference in any case, so subtract 4 bytes from the values in the following table to find out the storage overhead.
||1 bit per value|
||4 bytes (for value, nothing for key)|
||4 bytes (but may be more if
||24 bytes (fixed)|
||4 to 8 bytes, 6 bytes on average|
- An overview of memory saving techniques in Java
- Memory consumption of popular Java data types – part 2
- A few more memory saving techniques in Java
- String.intern in Java 6, 7 and 8 – string pooling
- Java collections overview
- Bit sets
- java.util.ArrayList performance
- java.util.LinkedList performance
If you want to better understand computer memory system and its implications on your software, I’d recommend you to read Memory Systems: Cache, DRAM, Disk by Bruce Jacob et al.