Tag Archives: memory optimization

String deduplication feature (from Java 8 update 20)

by Mikhail Vorontsov

This article will give you a short overview of the string deduplication feature added in Java 8 update 20.

String objects consume a large amount of memory in an average application. Some of these strings are duplicated – several distinct instances contain the same value (a != b, but a.equals(b)). In practice, a lot of Strings end up duplicated for various reasons.

Originally, the JDK offered the String.intern() method to deal with string duplication. The disadvantage of this method is that you have to find out which strings should be interned. This generally requires a heap analysis tool with a duplicate string lookup ability, like the YourKit profiler. Nevertheless, if used properly, string interning is a powerful memory saving tool – it lets you reuse whole String objects (each of which adds 24 bytes of overhead on top of the underlying char[]).

Starting from Java 7 update 6, each String object has its own private underlying char[]. This allows the JVM to make an automatic optimization – if an underlying char[] is never exposed to the client, the JVM can find two strings with the same contents and replace the underlying char[] of one string with the underlying char[] of the other.
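As a small illustration of the situation this optimization targets, here is a minimal sketch (the class and values are made up; the -XX:+UseG1GC -XX:+UseStringDeduplication flags are, to my knowledge, the ones that enable the feature discussed next, since it is implemented inside the G1 collector):

// Run with: java -XX:+UseG1GC -XX:+UseStringDeduplication StringDuplicationDemo
public class StringDuplicationDemo {
    public static void main(String[] args) {
        // Two distinct String instances with equal contents:
        String a = new String("some repeated value");
        String b = new String("some repeated value");
        System.out.println(a == b);      // false - different objects
        System.out.println(a.equals(b)); // true  - same contents
        // With deduplication enabled, G1 may later replace the backing
        // char[] of one of these strings with the other's, halving the payload.
    }
}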

That is exactly what the string deduplication feature added in Java 8 update 20 does. How it works:

Continue reading

Trove library: using primitive collections for performance

by Mikhail Vorontsov


19 July 2014: article text cleanup, added a chapter on JDK to Trove migration.
16 July 2012: original version.

This article describes the Trove library, which contains a set of primitive collection implementations. The latest version of Trove (3.1a1 at the time of writing) is described here.

Why should you use Trove? Why not keep using the well known JDK collections? The answer is performance and memory consumption. Trove doesn’t use any java.lang.Number subclasses internally, so you don’t have to pay for boxing/unboxing each time you want to pass/query a primitive value to/from a collection. Besides, you don’t have to waste memory on the boxed numbers (24 bytes for Long/Double, 16 bytes for smaller types) and on the references to them. For example, if you want to store an Integer in a JDK map, you need 4 bytes for a reference (or 8 bytes on huge heaps) and 16 bytes for an Integer instance. Trove, on the other hand, uses just 4 bytes to store an int. Trove also doesn’t create a Map.Entry for each key-value pair, unlike java.util.HashMap, which further reduces the map memory footprint. For sets, it doesn’t use a Map internally, keeping just the set values.
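For illustration, here is a minimal sketch of the same counting loop written against the JDK map and against Trove’s int-to-int map; the class and package names are those of Trove 3.x, and the loop itself is made up for the example:

import java.util.HashMap;
import java.util.Map;
import gnu.trove.map.hash.TIntIntHashMap;

public class TroveVsJdk {
    public static void main(String[] args) {
        // JDK version: every put boxes both the key and the value
        Map<Integer, Integer> jdk = new HashMap<>();
        for (int i = 0; i < 1_000_000; i++) {
            jdk.put(i, i * 2);
        }

        // Trove version: keys and values are stored in int[] arrays,
        // no boxing and no Map.Entry objects are created
        TIntIntHashMap trove = new TIntIntHashMap();
        for (int i = 0; i < 1_000_000; i++) {
            trove.put(i, i * 2);
        }
        System.out.println(jdk.get(42) + " " + trove.get(42));
    }
}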

There are 3 main collection types in Trove: array lists, sets and maps. There are also queues, stacks and linked lists, but they are not so important (and usually instances of these collections tend to be rather small).

Array lists

All array lists are built on top of an array of the corresponding data type (for example, int[] for TIntArrayList). There is a small issue you should deal with: the value used for absent elements (the default value). It is zero by default, but you can override it using:

public TIntArrayList(int capacity, int no_entry_value);
public static TIntArrayList wrap(int[] values, int no_entry_value);

There are two useful methods called getQuick/setQuick – they simply access the underlying array without any additional checks. As a side effect they allow you to access elements between the list size and its capacity (don’t rely on this too much – it is still better to add values legally and call getQuick only while you stay inside the array list boundaries).
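A minimal usage sketch, assuming Trove 3.x class names (the values are made up for the example):

import gnu.trove.list.array.TIntArrayList;

public class TIntArrayListDemo {
    public static void main(String[] args) {
        // Initial capacity of 10; -1 is the value used for absent elements
        TIntArrayList list = new TIntArrayList(10, -1);
        list.add(100);
        list.add(200);

        // getQuick skips the bounds checks performed by get
        System.out.println(list.getQuick(1)); // 200
        System.out.println(list.get(0));      // 100
    }
}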

Each Trove array list has several helper methods which implement the java.util.Collections functionality:

public void reverse()
public void reverse(int from, int to)
public void shuffle(java.util.Random rand)
public void sort()
public void sort(int fromIndex, int toIndex)
public void fill(int val)
public void fill(int fromIndex, int toIndex, int val)
public int binarySearch(int value)
public int binarySearch(int value, int fromIndex, int toIndex)
public int max()
public int min()
public int sum()
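A quick sketch of these helpers in action (assuming Trove 3.x; the data is made up for the example):

import gnu.trove.list.array.TIntArrayList;

public class TIntArrayListHelpers {
    public static void main(String[] args) {
        TIntArrayList list = new TIntArrayList(new int[] { 5, 3, 8, 1 });
        list.sort();                              // list is now [1, 3, 5, 8]
        System.out.println(list.binarySearch(5)); // index of 5 after sorting
        System.out.println(list.min() + " " + list.max() + " " + list.sum());
        list.reverse();                           // list is now [8, 5, 3, 1]
    }
}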

Continue reading

Creating an exception in Java is very slow

by Mikhail Vorontsov


29 June 2014 update – now using JMH for testing. Added 2 ways to avoid the cost of exceptions. Made some editorial changes to the original text to reflect JMH usage.

Filling in the stack trace is slow…

Creating an exception in Java is a very slow operation. Expect that throwing an exception will cost you around 1-5 microseconds. Nearly all of this time is spent filling in the exception stack trace. The deeper the call stack is, the more time it takes to populate it.
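One common way to avoid most of this cost (not necessarily one of the two ways described later in this article) is to suppress stack trace filling altogether; a minimal sketch using the Java 7+ Throwable constructor that disables the writable stack trace:

// A "lightweight" exception: the Java 7+ four-argument Throwable constructor
// lets us disable suppression and the writable stack trace, so
// fillInStackTrace() is never invoked for instances of this class.
public class StacklessException extends RuntimeException {
    public StacklessException(String message) {
        super(message, null, false, false);
    }
}

Such an exception is cheap to create, at the obvious price of carrying no stack trace when it is logged.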

Usually we throw an exception when an unexpected problem occurs. This means we don’t expect exceptions to be thrown at a rate of thousands per second per process. But sometimes you will encounter a method which uses exceptions for much more likely events. We have already seen a good (actually, bad) example of such behavior in the Base64 encoding and decoding performance article: sun.misc.BASE64Decoder is extremely slow because it uses an exception for its “give me more data” requests:

at java.lang.Throwable.fillInStackTrace(Throwable.java:-1)
at java.lang.Throwable.fillInStackTrace(Throwable.java:782)
- locked <0x6c> (a sun.misc.CEStreamExhausted)
at java.lang.Throwable.<init>(Throwable.java:250)
at java.lang.Exception.<init>(Exception.java:54)
at java.io.IOException.<init>(IOException.java:47)
at sun.misc.CEStreamExhausted.<init>(CEStreamExhausted.java:30)
at sun.misc.BASE64Decoder.decodeAtom(BASE64Decoder.java:117)
at sun.misc.CharacterDecoder.decodeBuffer(CharacterDecoder.java:163)
at sun.misc.CharacterDecoder.decodeBuffer(CharacterDecoder.java:194)

You may encounter the same problem if you run the pack method from String packing part 2: converting Strings to any other objects on a string that starts with a digit but is followed by letters. Let’s see how long it takes to pack ‘12345’ and ‘12345a’ with that method:

Benchmark                        (m_param)   Mode   Samples         Mean   Mean error    Units
t.StringPacking2Tests.testPack      12345a  thrpt        10        0.044        0.000   ops/us
t.StringPacking2Tests.testPack       12345  thrpt        10        7.934        0.154   ops/us

As you can see, we were able to pack “12345” about 200 times faster than “12345a”. Most of the processing time is again spent filling in stack traces:

at java.lang.Throwable.fillInStackTrace(Throwable.java:-1)
at java.lang.Throwable.fillInStackTrace(Throwable.java:782)
- locked <0x87> (a java.lang.NumberFormatException)
at java.lang.Throwable.<init>(Throwable.java:265)
at java.lang.Exception.<init>(Exception.java:66)
at java.lang.RuntimeException.<init>(RuntimeException.java:62)
at java.lang.IllegalArgumentException.<init>(IllegalArgumentException.java:53)
at java.lang.NumberFormatException.<init>(NumberFormatException.java:55)
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.valueOf(Long.java:540)
at tests.StringPacking2Tests.pack(StringPacking2Tests.java:69)
...
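A hedged sketch of how such a hot path can avoid the exception entirely – pre-checking that the string looks numeric before calling Long.valueOf (this is only an illustration, not necessarily the fix applied in the original pack method):

// Returns the parsed Long, or null if the string is clearly not a number.
// The cheap up-front scan avoids throwing (and filling in the stack trace of)
// a NumberFormatException for inputs like "12345a".
static Long tryParseLong(final String s) {
    if (s == null || s.isEmpty() || s.length() > 18) {
        return null; // longer values may overflow a long - handle them elsewhere
    }
    for (int i = 0; i < s.length(); i++) {
        final char c = s.charAt(i);
        if (c < '0' || c > '9') {
            if (i == 0 && (c == '-' || c == '+') && s.length() > 1) {
                continue; // allow a leading sign
            }
            return null;
        }
    }
    return Long.valueOf(s);
}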

Continue reading

Core Java 7 Change Log

by Mikhail Vorontsov


01 Jan 2014 update – covered all Java 7 versions up to Java 7u45.

This article will list performance related changes made in the core part of various Java 7 updates. I will track changes in the following packages:

  • java.lang.*
  • java.io
  • java.math
  • java.net
  • java.nio.*
  • java.text.*
  • java.util.*

These changes have one peculiar property – Oracle usually does not include them in the release notes. Probably they consider them too minor… Anyway, this page will try to help you a little: hopefully, you will not miss updates like the “famous” String.substring change in Java 7u6.

This page will be updated after new Java 7 updates. I will also add a list of changes done before Java 7u45 (give me some time!).

Java 7u45 (compared to Java 7u25)

Most of these changes were made in Java 7u40, but I have unfortunately missed that release.

Shared empty table in ArrayList and HashMap update

File changed: \util\ArrayList.java
File changed: \util\HashMap.java

The two most popular Java collections – ArrayList and HashMap – now consume less memory if their instances are empty (size=0). The only reason for this update, from my point of view, is a critical mass of cases in Oracle performance benchmarks where a map/list is created, but never populated due to conditional logic (for example, an empty parameter list was provided in the HTTP request).

This update reduces the garbage collector workload by creating less unused garbage.

In the case of ArrayList, only objects created via the no-argument constructor are affected. If you use a constructor specifying the initial size, nothing changes for you.

/**
 * Shared empty array instance used for empty instances.
 */
private static final Object[] EMPTY_ELEMENTDATA = {};

public ArrayList() {
    super();
    this.elementData = EMPTY_ELEMENTDATA;
}

HashMap got a more thorough update – internal table initialization has been moved out of the constructor. The table is now always initialized to an empty table. As a consequence, all getters and setters in HashMap now check whether the map is empty. This makes getters slightly faster in the case of an empty map – you no longer have to look at the actual table – but at the same time it makes all setters/getters slower in every other case (it is a very tiny slowdown – a zero check of a single int field – but it is still an extra instruction to execute).

I hope this HashMap change has a solid justification – that there are enough always-empty maps in the Oracle benchmark applications. In my personal opinion the change is quite questionable – I do not like “taxes” you always have to pay just for a corner-case optimization.

/**
 * An empty table instance to share when the table is not inflated.
 */
static final Entry<?,?>[] EMPTY_TABLE = {};

/**
 * The table, resized as necessary. Length MUST Always be a power of two.
 */
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

The author of these changes has responded to me here.

Continue reading

String.intern in Java 7 and 8 – part 3

by Mikhail Vorontsov

I want to return to the String.intern method I have discussed earlier (part 1, part 2). During the last couple of months I have used interning heavily in my pet project in order to see the pros and cons of using String.intern for every non-temporary string in my application (non-temporary – one that is expected to live longer than a few seconds and will most likely get into the old generation of the GC).

As I have written before, the advantages of String.intern in Java 7 and 8 are:

  • Rather fast execution, nearly no performance loss in the multithreaded mode (still using the global string pool).
  • Saving memory, thus allowing your dataset to shrink, which in general will let your application run faster.

The main disadvantages of this method (as I have noticed before) are:

  • The requirement to set the -XX:StringTableSize=N JVM parameter in advance and deal with its fixed value (a JVM restart is needed in order to expand the JVM string pool).
  • Adding calls to String.intern in a lot of places globally (probably via your own wrapper) – which has linear code maintenance cost.

After a couple of months of using String.intern in my project, I think its usage should be limited to fields with a limited domain of distinct values (like person first names or state/province names). We should not try to intern anything with a low probability of repeated values – it would be a waste of CPU time.
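A minimal sketch of what this looks like in practice – interning only a limited-domain field in a setter (the class and field names are made up for the example):

public class Person {
    private String firstName;   // limited domain - worth interning
    private String biography;   // almost always unique - not worth interning

    public void setFirstName(final String firstName) {
        // Duplicates are very likely here, so pay the intern() cost once
        // and share a single String instance across all Person objects.
        this.firstName = firstName != null ? firstName.intern() : null;
    }

    public void setBiography(final String biography) {
        this.biography = biography; // interning this would just waste CPU
    }
}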

Continue reading

String.intern in Java 6, 7 and 8 – multithreaded access

by Mikhail Vorontsov

This article follows the initial String.intern in Java 6, 7 and 8 – string pooling article describing the implementation and benefits of using the String.intern() method in Java 7 and 8. The original article was already getting too long, so I had to write this one in order to describe the performance characteristics of multithreaded access to String.intern().

The tests we will perform today call String.intern() from multiple threads. They emulate the behavior of a lot of modern server applications (for example, specialized web crawlers). For the tests I will use my new workstation with an Intel Xeon E5-2650 CPU (8 physical, 16 virtual cores @ 2 GHz) and 128 GB RAM. It allows us to test rather high contention scenarios. In this article we will create 8 threads in order to utilize all physical cores.

There will be 4 tests:

  1. The baseline one – a single thread calling String.intern() using the testLongLoop method from the previous article. It will show us how fast this configuration is without any contention.
  2. All 8 threads call String.intern() with unique values – each thread adds its own prefix to the interned string (see the sketch after this list). This test will show us the synchronization overhead of String.intern(). It should be the theoretical worst case: it is highly unlikely that the only thing an actual application does is call String.intern in a loop from many threads.
  3. Initially we start the first thread interning a set of strings. After a 2 second delay we start a second thread interning the same set of strings. We expect the following assumptions to hold: str.intern()==str for the first thread; str.intern()!=str for the second thread. This will allow us to prove that there are no thread-local JVM string pools.
  4. All 8 threads will intern the same set of values. This scenario is closer to a real situation – it gives us a rather likely mix of adding strings to the JVM pool and querying strings from it. Nevertheless, such high read contention on the JVM string pool is still an unlikely event.
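A minimal sketch of the unique-values scenario (test 2) as described above – the thread count, iteration count and prefix scheme here are illustrative, not the exact harness used for the measurements:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class InternContentionSketch {
    public static void main(String[] args) throws InterruptedException {
        final int threads = 8;
        final ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final String prefix = "thread" + t + '_'; // unique per thread
            pool.submit(() -> {
                for (int i = 0; i < 1_000_000; i++) {
                    // every interned value is unique, so each call adds
                    // a new entry to the shared JVM string pool
                    (prefix + i).intern();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
    }
}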

Continue reading

String.intern in Java 6, 7 and 8 – string pooling

by Mikhail Vorontsov

This article will describe how the String.intern method was implemented in Java 6 and what changes were made to it in Java 7 and Java 8.

First of all I want to thank Yannis Bres for inspiring me to write this article.

07 June 2014 update: added 60013 as a default string pool size since Java 7u40 (instead of Java 8), added -XX:+PrintFlagsFinal and -XX:+PrintStringTableStatistics JVM parameter references, cleaned up a few typos.

This is an updated version of the article, including a description of the -XX:StringTableSize=N JVM parameter. It is followed by the String.intern in Java 6, 7 and 8 – multithreaded access article describing the performance characteristics of multithreaded access to String.intern().

String pooling

String pooling (aka string canonicalisation) is the process of replacing several String objects with equal values but different identities with a single shared String object. You can achieve this goal by keeping your own Map<String, String> (possibly with soft or weak references, depending on your requirements) and using the map values as canonicalised values. Or you can use the String.intern() method provided by the JDK.
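A minimal sketch of the manual approach, using a ConcurrentHashMap as the pool (strong references only – a production version might prefer weak or soft references, as noted above):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public final class StringPool {
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    /** Returns the canonical instance for the given value. */
    public String canonicalize(final String value) {
        if (value == null) {
            return null;
        }
        final String existing = pool.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }
}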

In the days of Java 6, using String.intern() was forbidden by many coding standards due to the high possibility of getting an OutOfMemoryError if pooling went out of control. The Oracle Java 7 implementation of string pooling was changed considerably. You can find the details in http://bugs.sun.com/view_bug.do?bug_id=6962931 and http://bugs.sun.com/view_bug.do?bug_id=6962930.

String.intern() in Java 6

In those good old days all interned strings were stored in the PermGen – the fixed size part of the heap mainly used for storing loaded classes and the string pool. Besides explicitly interned strings, the PermGen string pool also contained all string literals used earlier in your program (the important word here is used – if a class or method was never loaded/called, any constants defined in it will not be loaded).

The biggest issue with such a string pool in Java 6 was its location – the PermGen. The PermGen has a fixed size and cannot be expanded at runtime. You can set it using the -XX:MaxPermSize=N option. As far as I know, the default PermGen size varies between 32M and 96M depending on the platform. You can increase its size, but it will still be fixed. Such a limitation required very careful usage of String.intern – you’d better not intern any uncontrolled user input with this method. That’s why string pooling in the days of Java 6 was mostly implemented with manually managed maps.

String.intern() in Java 7

Oracle engineers made an extremely important change to the string pooling logic in Java 7 – the string pool was relocated to the heap. This means you are no longer limited by a separate fixed-size memory area. All strings are now located in the heap, like most other ordinary objects, so you only need to manage the heap size while tuning your application. Technically, this alone could be a sufficient reason to reconsider using String.intern() in your Java 7 programs. But there are other reasons.

Continue reading

A few more memory saving techniques in Java

by Mikhail Vorontsov

This article will list a few more cheap memory saving techniques in Java. It follows the An overview of memory saving techniques in Java, Memory consumption of popular Java data types – part 1 and Memory consumption of popular Java data types – part 2 articles published earlier in this blog. I strongly recommend reading An overview of memory saving techniques in Java before reading this one. This article assumes that the size of an Object reference is 4 bytes.

Static inner classes

Java lacks proper library tuple support like the one Scala has. This forces Java developers to create a lot of classes, usually inner ones, whose only purpose is to compose a few elements into a single entity for a collection. One really popular example of such an inner class is Pair, which is often implemented as a generic class.

Non-static inner classes have an important property – they store a reference to their enclosing class instance. Such a reference allows them to seamlessly call parent methods and access parent fields. This reference is often not needed for simple tuple-like classes. You can get rid of it by marking your class static. This will save you 4 bytes and increase its chances of occupying 8 bytes less (see An overview of memory saving techniques in Java for more details).

You should generally mark all your inner classes static unless you actually need to remove the qualifier.
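A minimal sketch of such a tuple-like class, declared static so that it does not carry the hidden reference to its enclosing instance (the classes are made up for the example):

public class Repository {
    /** Static nested class: no hidden reference to the enclosing Repository. */
    public static final class Pair<A, B> {
        public final A first;
        public final B second;

        public Pair(final A first, final B second) {
            this.first = first;
            this.second = second;
        }
    }

    // A non-static version of Pair would silently keep each Repository
    // instance reachable through every Pair stored in a collection.
}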

Continue reading

Memory consumption of popular Java data types – part 2

by Mikhail Vorontsov

This article will give you an overview of the memory consumption of some popular Java data types. It follows the An overview of memory saving techniques in Java and Memory consumption of popular Java data types – part 1 articles published earlier in this blog. I strongly recommend reading An overview of memory saving techniques in Java before reading this one. This article assumes that the size of a reference is 4 bytes.

HashMap, THashMap

HashMap is the most popular map in the JDK. Provided that it contains objects with a quality hashCode method and its load factor is not too high, it will generally give you O(1) performance for the get and put methods (as well as similar methods like containsKey).

You can find a proper hash map description in one of the popular textbooks (“Algorithms (4th Edition)” or “Introduction to Algorithms”) or on Wikipedia. Here we will only discuss the JDK HashMap implementation.

HashMap is built on top of an array of Map.Entry objects. The implementation ensures that this array's length is always at least max( size, capacity ) / load_factor. The default load factor of HashMap is 0.75 and the default capacity is 16. The load factor specifies which part of the array can be used for storage and is a value between 0 and 1. The higher the load factor, the less space is wasted, but the HashMap starts to work more slowly due to an increased rate of collisions. The smaller the load factor, the more memory is wasted, but the performance of a HashMap improves due to a smaller probability of collisions.

So, as you can see, the default (new HashMap<>()) size of the entries array is 16 / 0.75 = 21.33 ~ 22.

What is a HashMap.Entry? It contains a key, a value, the int hash of the key and a pointer to the next entry (remember that the entries array can be sparse). This means an entry occupies 32 bytes (12 bytes header + 16 bytes data + 4 bytes padding). So, a HashMap with size = S spends 32 * S bytes on entry storage. Besides that, it uses 4 * C bytes for the entries array, where C is the map capacity.
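As a quick worked example (the numbers are illustrative): a HashMap with S = 1,000 entries ends up with a table of capacity C = 2,048 (the smallest power of two above 1,000 / 0.75), so it needs roughly 32 * 1,000 + 4 * 2,048 = 32,000 + 8,192 ≈ 40 KB just for the entry objects and the table, on top of the memory occupied by the keys and values themselves.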

As you can see, if the map is sparsely populated enough (its size is less than about 12.5% of its capacity, since 4 * C exceeds 32 * S once S < C / 8), the entries array starts to dominate over the entries themselves.

Continue reading

An overview of memory saving techniques in Java

by Mikhail Vorontsov

This article will give you general advice on memory consumption optimization in Java.

Memory usage optimization is the most important type of optimization in Java. Current systems are limited by memory access times rather than by CPU frequency (otherwise, why would CPU producers implement all these L1, L2 and L3 caches?). It means that by reducing your application's memory footprint you will most likely improve your data processing speed by making your CPU wait for a smaller amount of data. Now let’s get back to Java.

General Java memory layout information

First of all, we have to revise the memory layout of Java objects: any Java object occupies at least 16 bytes, 12 of which are occupied by the object header. Besides that, all Java objects are aligned on an 8 byte boundary. It means that, for example, an object containing 2 fields – an int and a byte – will occupy not 17 (12 + 4 + 1), but 24 bytes (17 aligned up to 8 bytes).

Each Object reference occupies 4 bytes if the Java heap is under 32G and -XX:+UseCompressedOops is turned on (it is turned on by default in recent Oracle JVM versions). Otherwise, Object references occupy 8 bytes.
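If you want to verify these numbers on your own JVM, the JOL (Java Object Layout) tool can print the exact field layout; a minimal sketch, assuming the org.openjdk.jol:jol-core dependency is on the classpath (the sample class is made up):

import org.openjdk.jol.info.ClassLayout;

public class LayoutDemo {
    static class IntAndByte {
        int i;   // 4 bytes
        byte b;  // 1 byte
    }

    public static void main(String[] args) {
        // Prints the header size, field offsets, padding and total
        // instance size (expected to be 24 bytes with compressed oops).
        System.out.println(ClassLayout.parseClass(IntAndByte.class).toPrintable());
    }
}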

All primitive data types occupy their exact byte size:

  • byte, boolean – 1 byte
  • short, char – 2 bytes
  • int, float – 4 bytes
  • long, double – 8 bytes

In essence, this information is sufficient for Java memory optimization. But it will be more convenient if you are also aware of the memory consumption of arrays, Strings and numeric wrappers.

Continue reading