Category Archives: Intermediate

Java server application troubleshooting using JDK tools

by Mikhail Vorontsov

1. Introduction
2. Troubleshooting scenarios
    2.1. Getting a list of running JVMs
    2.2. Making a heap dump
    2.3. Analyzing a class histogram
    2.4. Making a thread dump
    2.5. Running Java Flight Recorder

1. Introduction

In the Java world most of us are used to GUI tools for all stages of Java application development: writing your code, debugging and profiling it. We often prefer to set up the server environment on our dev boxes and try to reproduce the problems locally using familiar tools. Unfortunately, it is often impossible to reproduce some issues locally for various reasons. For example, you may not be authorised to access the real client data which is processed by your server application.

In situations like this one you need to troubleshoot the application remotely on the server box. You should keep in mind that you can not properly troubleshoot an application with bare JRE in your hands: it contains all the troubleshooting functionality, but there is literally no way to access it. As a result, you need either a JDK or some 3rd party tools on the same box. This article will describe JDK tools, because you are likely to be allowed to use it on production boxes compared to any 3rd party tools which require security audit in many organizations.

In general case, it is sufficient just to unpack the JDK distribution to your box – you don’t need to install it properly for troubleshooting purposes (actually, proper installation may be undesirable in a lot of cases). For JMX based functionality you can install literally any Java 7/8 JDK, but some tools can not recognize the future JDK, so I advice you to install either the latest Java 7/8 JDK or the build exactly matching to server JRE – it allows you to dump the app heap for applications with no safepoints being accessed at the moment (some applications in the idle mode are the easy example of “no safepoints” applications).

2. Troubleshooting scenarios

2.1. Getting a list of running JVMs

In order to start working you nearly always need to get a list of running JVMs, their process IDs and command line arguments. Sometimes it may enough: you may find a second instance of the same application doing the same job concurrently (and damaging the output files / reopening sockets / doing some other stupid things).

Just run jcmd without any arguments. It will print you a list of running JVMs:

3824 org.jetbrains.idea.maven.server.RemoteMavenServer

Now you can see what diagnostic commands are available for a given JVM by running jcmd <PID> help command. Here is a sample output for VisualVM:

>jcmd 3036 help

The following commands are available:

Type jcmd <PID> <COMMAND_NAME> to either run a diagnostic command or get an error message asking for command arguments:

>jcmd 3036 GC.heap_dump

java.lang.IllegalArgumentException: The argument 'filename' is mandatory.

You can get more information about a diganostic command arguments by using the following command: jcmd <PID> help <COMMAND_NAME>. For example, here is the output for GC.heap_dump command:

>jcmd 3036 help GC.heap_dump
Generate a HPROF format dump of the Java heap.

Impact: High: Depends on Java heap size and content. Request a full GC unless the '-all' option is specified.


Syntax : GC.heap_dump [options] <filename>

filename :  Name of the dump file (STRING, no default value)

Options: (options must be specified using the <key> or <key>=<value> syntax)
-all : [optional] Dump all objects, including unreachable objects (BOOLEAN, false)

Continue reading

Large HashMap overview: JDK, FastUtil, Goldman Sachs, HPPC, Koloboke, Trove – January 2015 version

by Mikhail Vorontsov

This is a major update of the previous version of this article. The reasons for this update are:

  • The major performance updates in fastutil 6.6.0
  • Updates in the “get” test from the original article, addition of “put/update” and “put/remove” tests
  • Adding identity maps to all tests
  • Now using different objects for any operations after map population (in case of Object keys – except identity maps). Old approach of reusing the same keys gave the unfair advantage to Koloboke.

I would like to thank Sebastiano Vigna for providing the initial versions of “get” and “put” tests.


This article will give you an overview of hash map implementations in 5 well known libraries and JDK HashMap as a baseline. We will test separately:

  • Primitive to primitive maps
  • Primitive to object maps
  • Object to primitive maps
  • Object to Object maps
  • Object (identity) to Object maps

This article will provide you the results of 3 tests:

  • “Get” test: Populate a map with a pregenerated set of keys (in the JMH setup), make ~50% successful and ~50% unsuccessful “get” calls. For non-identity maps with object keys we use a distinct set of keys (the different object with the same value is used for successful “get” calls).
  • “Put/update” test: Add a pregenerated set of keys to the map. In the second loop add the equal set of keys (different objects with the same values) to this map again (make the updates). Identical keys are used for identity maps and for maps with primitive keys.
  • “Put/remove” test: In a loop: add 2 entries to a map, remove 1 of existing entries (“add” pointer is increased by 2 on each iteration, “remove” pointer is increased by 1).

This article will just give you the test results. There will be a followup article on the most interesting implementation details of the various hash maps.

Test participants


JDK HashMap is the oldest hash map implementation in this test. It got a couple of major updates recently – a shared underlying storage for the empty maps in Java 7u40 and a possibility to convert underlying hash bucket linked lists into tree maps (for better worse case performance) in Java 8.

FastUtil 6.6.0

FastUtil provides a developer a set of all 4 options listed above (all combinations of primitives and objects). Besides that, there are several other types of maps available for each parameter type combination: array map, AVL tree map and RB tree map. Nevertheless, we are only interested in hash maps in this article.

Goldman Sachs Collections 5.1.0

Goldman Sachs has open sourced its collections library about 3 years ago. In my opinion, this library provides the widest range of collections out of box (if you need them). You should definitely pay attention to it if you need more than a hash map, tree map and a list for your work 🙂 For the purposes of this article, GS collections provide a normal, synchronized and unmodifiable versions of each hash map. The last 2 are just facades for the normal map, so they don’t provide any performance advantages.

HPPC 0.6.1

HPPC provides array lists, array dequeues, hash sets and hash maps for all primitive types. HPPC provides normal hash maps for primitive keys and both normal and identity hash maps for object keys.

Koloboke 0.6.5

Koloboke is the youngest of all libraries in this article. It is developed as a part of an OpenHFT project by Roman Leventov. This library currently provides hash maps and hash sets for all primitive/object combinations. This library was recently renamed from HFTC, so some artifacts in my tests will still use the old library name.

Trove 3.0.3

Trove is available for a long time and quite stable. Unfortunately, not much development is happening in this project at the moment. Trove provides you the list, stack, queue, hash set and map implementations for all primitive/object combinations. I have already written about Trove.

Data storage implementations and tests

This article will look at 5 different sorts of maps:

  1. intint
  2. intInteger
  3. Integerint
  4. IntegerInteger
  5. Integer (identity map)Integer

We will use JMH 1.0 for testing. Here is the test description: for each map size in (10K, 100K, 1M, 10M, 100M) (outer loop) generate a set of random keys (they will be used for each test at a given map size) and then run a test for each map implementations (inner loop). Each test will be run 100M / map_size times. “get”, “put” and “remove” tests are run separately, so you can update the test source code and run only a few of them.

Note that each test suite takes around 7-8 hours on my box. Spreadsheet-friendly results will be printed to stdout once all test suites will finish.


Each section will start with a table showing how data is stored inside each map. Only arrays will be shown here (some maps have special fields for a few corner cases).

tests.maptests.primitive.FastUtilMapTest int[] key, int[] value
tests.maptests.primitive.GsMutableMapTest int[] keys, int[] values
tests.maptests.primitive.HftcMutableMapTest long[] (key-low bits, value-high bits)
tests.maptests.primitive.HppcMapTest int[] keys, int[] values, boolean[] allocated
tests.maptests.primitive.TroveMapTest int[] _set, int[] _values, byte[] _states

As you can see, Koloboke is using a single array, FastUtil and GS use 2 arrays, and HPPC and Trove use 3 arrays to store the same data. Let’s see what would be the actual performance.

“Get” test results

All “get” tests make around 50% of unsuccessful get calls in order to test both success and failure paths in each map.

Each test results section will contain the results graph. X axis will show a map size, Y axis – time to run a test in milliseconds. Note, that each test in a graph has a fixed number of map method calls: 100M get call for “get” test; 200M put calls for “put” test; 100M put and 50M remove calls for “remove” tests.

There would be the links to OpenOffice spreadsheets with all test results at the end of this article.

int-int 'get' test results

int-int ‘get’ test results

GS and FastUtil test results lines are nearly parallel, but FastUtil is faster due to a lower constant factor. Koloboke becomes fastest only on large enough maps. Trove is slower than other implementations at each map size.

“Put” test results

“Put” tests insert all keys into a map and then use another equal set of keys to insert entries into a map again (these methods calls would update the existing entries). We make 100M put calls with “insert” functionality and 100M put calls with “update” functionality in each test.

int-int 'put' test results

int-int ‘put’ test results

This test shows the implementation difference more clear: Koloboke is fastest from the start (though FastUtil is as fast on small maps); GS and FastUtil are parallel again (but GS is always slower). HPPC and Trove are the slowest.

“Remove” test results

In “remove” test we interleave 2 put operations with 1 remove operation, so that a map size grows by 1 after each group of put/remove calls. In total we make 100M put and 50M remove calls.

int-int 'remove' test results

int-int ‘remove’ test results

Results are similar to “put” test (of course, both tests make a majority of put calls!): Koloboke is quickly becoming the fastest implementation; FastUtil is a bit faster than GS on all map sizes; HPPC and Trove are the slowest, but HPPC performs reasonably good on map sizes up to 1M entries.

int-int summary

An underlying storage implementation is the most important factor defining the hash map performance: the fewer memory accesses an implementation makes (especially for large maps which do not into CPU cache) to access an entry – the faster it would be. As you can see, the single array Koloboke is faster than other implementations in most of tests on large map sizes. For smaller map sizes, CPU cache starts hiding the costs of accessing several arrays – in this case other implementations may be faster due to less CPU commands required for a method call: FastUtil is the second best implementation for primitive collection tests due to its highly optimized code.

Continue reading

Going over Xmx32G heap boundary means you will have less memory available

by Mikhail Vorontsov

This small article will remind you what happens to Oracle JVM once your heap setting goes over 32G. By default, all references in JVM occupy 4 bytes on heaps under 32G. This decision is made by JVM at start-up time. You can use 8 byte references on small heaps if you will clear the -XX:-UseCompressedOops JVM option (it does not make any sense for production systems!).

Once your heap exceeds 32G, you are in 64 bit land, so your object references will now use 8 bytes instead of 4. As Scott Oaks mentioned in his “Java Performance: The Definitive Guide” book (pages 234-236, read my review of this book here), an average Java program uses about 20% of heap for object references. It means that by setting anything between Xmx32G and Xmx37G – Xmx38G you will actually reduce the amount of heap available for your application (actual numbers, of course, depend on your application). This may become a big surprise for anyone thinking that adding extra memory will let his/her application to process more data 🙂

Continue reading

Performance of various general compression algorithms – some of them are unbelievably fast!

by Mikhail Vorontsov

07 Jan 2015 update: extending LZ4 description (thanks to Mikael Grev for a hint!)

This article will give you an overview of several general compression algorithm implementations performance. As it turned out, some of them could be used even when your CPU requirements are pretty strict.

In this article we will compare:

  • JDK GZIP – a slow algorithm with a good compression, which could be used for long term data compression. Implemented in JDK / GZIPOutputStream.
  • JDK deflate – another algorithm available in JDK (it is used for zip files). Unlike GZIP, you can set compression level for this algorithm, which allows you to trade compression time for the output file size. Available levels are 0 (store, no compression), 1 (fastest compression) to 9 (slowest compression). Implemented as / InflaterInputStream.
  • Java implementation of LZ4 compression algorithm – this is the fastest algorithm in this article with a compression level a bit worse than the fastest deflate. I advice you to read the wikipedia article about this algorithm to understand its usage. It is distributed under a friendly Apache license 2.0.
  • Snappy – a popular compressor developed in Google, which aims to be fast and provide relatively good compression. I have tested this implementation. It is also distributed under Apache license 2.0.

Continue reading

Introduction to JMH

by Mikhail Vorontsov

11 Sep 2014: Article was updated for JMH 1.0.

10 May 2014: Original version.


This article will give you an overview of basic rules and abilities of JMH. The second article will give you an overview of JMH profilers.

JMH is a new microbenchmarking framework (first released late-2013). Its distinctive advantage over other frameworks is that it is developed by the same guys in Oracle who implement the JIT. In particular I want to mention Aleksey Shipilev and his brilliant blog. JMH is likely to be in sync with the latest Oracle JRE changes, which makes its results very reliable.

You can find JMH examples here.

JMH has only 2 requirements (everything else are recommendations):

  • You need to create a maven project using a command from the JMH official web page
  • You need to annotate test methods with @Benchmark annotation

In some cases, it is not convenient to create a new project just for the performance testing purposes. In this situation you can rather easily add JMH into an existing project. You need to make the following steps:

  1. Ensure your project directory structure is recognizable by Maven (your benchmarks are at src/main/java at least)
  2. Copy 2 JMH maven dependencies and maven-shade-plugin from the JMH archetype. No other plugins mentioned in the archetype are required at the moment of writing (JMH 1.0).

How to run

Run the following maven command to create a template JMH project from an archetype (it may change over the time, check for the latest version near the start of the the official JMH page):

$ mvn archetype:generate \
          -DinteractiveMode=false \
          -DarchetypeGroupId=org.openjdk.jmh \
          -DarchetypeArtifactId=jmh-java-benchmark-archetype \
          -DgroupId=org.sample \
          -DartifactId=test \

Alternatively, copy 2 JMH dependencies and maven-shade-plugin from the JMH archetype (as described above).

Create one (or a few) java files. Annotate some methods in them with @Benchmark annotation – these would be your performance benchmarks.

You have at least 2 simple options to run your tests::

Follow the procedure from the official JMH page):

$ cd your_project_directory/
$ mvn clean install
$ java -jar target/benchmarks.jar

The last command should be entered verbatim – regardless of your project settings you will end up with target/benchmarks.jar sufficient to run all your tests. This option has a slight disadvantage – it will use the default JMH settings for all settings not provided via annotations ( @Fork, @Warmup and @Measurement annotations are getting nearly mandatory in this mode). Use java -jar target/benchmarks.jar -h command to see all available command line options (there are plenty).

Or use the old way: add main method to some of your classes and write a JMH start script inside it. Here is an example:

Options opt = new OptionsBuilder()
                .include(".*" + YourClass.class.getSimpleName() + ".*")
new Runner(opt).run();
Options opt = new OptionsBuilder()
                .include(".*" + YourClass.class.getSimpleName() + ".*")
new Runner(opt).run();

After that you can run it with target/benchmarks.jar as your classpath:

$ cd your_project_directory/
$ mvn clean install
$ java -cp target/benchmarks.jar your.test.ClassName

Now after extensive “how to run it” manual, let’s look at the framework itself.

Continue reading

String deduplication feature (from Java 8 update 20)

by Mikhail Vorontsov

This article will provide you a short overview of a string deduplication feature added into Java 8 update 20.

String objects consume a large amount of memory in an average application. Some of these strings may be duplicated – there exist several distinct instances of the same String (a != b, but a.equals(b)). In practice, a lot of Strings could be duplicated due to various reasons.

Originally, JDK offered String.intern() method to deal with the string duplication. The disadvantage of this method is that you have to find which strings should be interned. This generally requires a heap analysis tool with a duplicate string lookup ability, like YourKit profiler. Nevertheless, if used properly, string interning is a powerful memory saving tool – it allows you to reuse the whole String objects (each of whose is adding 24 bytes overhead to the underlying char[]).

Starting from Java 7 update 6, each String object has its own private underlying char[]. This allows JVM to make an automatic optimization – if an underlying char[] is never exposed to the client, then JVM can find 2 strings with the same contents, and replace the underlying char[] of one string with an underlying char[] of another string.

That’s done by the string deduplication feature added into Java 8 update 20. How it works:

Continue reading

Trove library: using primitive collections for performance

by Mikhail Vorontsov

19 July 2014: article text cleanup, added a chapter on JDK to Trove migration.
16 July 2012: original version.

This article would describe Trove library, which contains a set of primitive collection implementations. The latest version of Trove (3.1a1 at the time of writing) would be described here.

Why should you use Trove? Why not to keep using well known JDK collections? The answer is performance and memory consumption. Trove doesn’t use any java.lang.Number subclasses internally, so you don’t have to pay for boxing/unboxing each time you want to pass/query a primitive value to/from the collection. Besides, you don’t have to waste memory on the boxed numbers (24 bytes for Long/Double, 16 bytes for smaller types) and reference to them. For example, if you want to store an Integer in JDK map, you need 4 bytes for a reference (or 8 bytes on huge heaps) and 16 bytes for an Integer instance. Trove, on the other hand, uses just 4 bytes to store an int. Trove also doesn’t create Map.Entry for each key-value pair unlike java.util.HashMap, further reducing the map memory footprint. For sets, it doesn’t use a Map internally, just keeping set values.

There are 3 main collection types in Trove: array lists, sets and maps. There are also queues, stacks and linked lists, but they are not so important (and usually instances of these collections tend to be rather small).

Array lists

All array lists are built on top of an array of corresponding data type (for example, int[] for TIntArrayList). There is a small problem which you should deal with: a value for the absent elements (default value). It is zero by default, but you can override it using

public TIntArrayList(int capacity, int no_entry_value);
public static TIntArrayList wrap(int[] values, int no_entry_value);
public TIntArrayList(int capacity, int no_entry_value);
public static TIntArrayList wrap(int[] values, int no_entry_value);

There are 2 useful methods called getQuick/setQuick – they just access the underlying array without any additional checks. As a side-effect they would allow to access elements between list size and capacity (don’t use it too much – it is still better to add values legally and when getQuick values as long as you are inside array list boundaries.

Each Trove array list has several helper methods which implement the java.util.Collections functionality:

public void reverse()
public void reverse(int from, int to)
public void shuffle(java.util.Random rand)
public void sort()
public void sort(int fromIndex, int toIndex)
public void fill(int val)
public void fill(int fromIndex, int toIndex, int val)
public int binarySearch(int value)
public int binarySearch(int value, int fromIndex, int toIndex)
public int max()
public int min()
public int sum()
public void reverse()
public void reverse(int from, int to)
public void shuffle(java.util.Random rand)
public void sort()
public void sort(int fromIndex, int toIndex)
public void fill(int val)
public void fill(int fromIndex, int toIndex, int val)
public int binarySearch(int value)
public int binarySearch(int value, int fromIndex, int toIndex)
public int max()
public int min()
public int sum()

Continue reading

Base64 encoding and decoding performance

by Mikhail Vorontsov

02 Apr 2014 update: added Guava implementation and byte[] < -> byte[] section.

21 Mar 2014 update: major rewrite + added javax.xml.bind.DatatypeConverter class description.

21 Feb 2014 update: added MiGBase64 class description.

25 Dec 2013 update: added Java 8 java.util.Base64 class description.

We will discuss what is Base64 algorithm and what is the performance of several different well-known libraries implementing Base64 encoding/decoding.

Base64 is an algorithm mapping all 256 byte values to 64 printable byte values (printable means that those bytes are printed in US-ASCII encoding). This is done by packing 3 input bytes to 4 output bytes. Base64 is generally used in text-based data exchange protocols when there is still a need to transfer some binary data. The best known example is encoding of e-mail attachments.

JDK Base64 implementations

Surprisingly, there was no Base64 implementation in the core JDK classes before Java 6. Some web forums advise to use two non-public sun.* classes which are present in all Sun/Oracle JDK: sun.misc.BASE64Encoder and sun.misc.BASE64Decoder. The advantage of using them is that you don’t need to ship any other libraries with your application. The disadvantage is that those classes are not supposed to be used from outside JDK classes (and, of course, they can be removed from JDK implementation… in theory, at least).

Sun has added another Base64 implementation in Java 6 (thanks to Thomas Darimont for his remainder!): it was hidden in javax.xml.bind package and was unknown to many developers. javax.xml.bind.DatatypeConverter class has 2 static methods – parseBase64Binary and printBase64Binary, which are used for Base64 encoding and decoding.

Java 8 has finally added a Base64 implementation in the java. namespace – java.util.Base64. This static factory class provides you with the basic/MIME/URL and filename safe encoder and decoder implementations.

Surprisingly (or may be not), all these implementations do not share any logic even in Java 8.

Third party Base64 implementations

I will also mention 4 quite well known Base64 third party implementations.

  • The first one is present in the Apache Commons Codec library and called org.apache.commons.codec.binary.Base64.
  • The next one is present in the Google Guava library and accessible via static method.
  • Another one was written by Robert Harder and available from his website: This is a single class which you will have to add to your project.
  • The last one was written by Mikael Grev nearly 10 years ago. It is available from This is also a single class you have to add into your project. This implementation claims to be the fastest Base64 implementation, but unfortunately this is not true any longer. Besides, it has a strictest limit on the maximal length of byte[] to decode (see below).

Continue reading

Implementing a high performance Money class

by Mikhail Vorontsov

This article will discuss how to efficiently implement a Money class in Java. This article follows Using double/long vs BigDecimal for monetary calculations article published earlier in this blog.


As I have written before, we should not use any floating point types for storing monetary values. We should use long instead, keeping the number of smallest currency units in it. Unfortunately, there is a couple of problems:

  • We may still have to communicate with a software storing monetary values in double variables.
  • We may want to multiply it by a non-integer number, like 0.015, or divide it by any type of number, thus getting the non-integer result.

The easy way to deal with arithmetic operations is to use the BigDecimal class for all of them. Unfortunately, BigDecimal will not help you to deal with already incorrect double calculations results. For example, the following code will print 0.7999999999999999:

System.out.println( new BigDecimal( 0.7 + 0.1, MathContext.DECIMAL64 ) );
System.out.println( new BigDecimal( 0.7 + 0.1, MathContext.DECIMAL64 ) );

So, we may say that BigDecimal operations will produce the correct result if it had the correct arguments. We can not say the same about double operations.

double operations may end up with the exact result or a result which is one bit off the exact result. This is most likely caused by executing double operations using higher precision on CPU and then rounding the result. All we need to do to get the exact result is to look at these 3 values (result; result plus one bit value, which is also known as ulp; and result minus ulp). One of them would be the correct one. Easy to say, difficult to efficiently implement 🙂

Continue reading