Java server application troubleshooting using JDK tools

by Mikhail Vorontsov

1. Introduction
2. Troubleshooting scenarios
    2.1. Getting a list of running JVMs
    2.2. Making a heap dump
    2.3. Analyzing a class histogram
    2.4. Making a thread dump
    2.5. Running Java Flight Recorder

1. Introduction

In the Java world most of us are used to GUI tools for all stages of Java application development: writing your code, debugging and profiling it. We often prefer to set up the server environment on our dev boxes and try to reproduce the problems locally using familiar tools. Unfortunately, it is often impossible to reproduce some issues locally for various reasons. For example, you may not be authorised to access the real client data which is processed by your server application.

In situations like this one you need to troubleshoot the application remotely on the server box. You should keep in mind that you can not properly troubleshoot an application with bare JRE in your hands: it contains all the troubleshooting functionality, but there is literally no way to access it. As a result, you need either a JDK or some 3rd party tools on the same box. This article will describe JDK tools, because you are likely to be allowed to use it on production boxes compared to any 3rd party tools which require security audit in many organizations.

In general case, it is sufficient just to unpack the JDK distribution to your box – you don’t need to install it properly for troubleshooting purposes (actually, proper installation may be undesirable in a lot of cases). For JMX based functionality you can install literally any Java 7/8 JDK, but some tools can not recognize the future JDK, so I advice you to install either the latest Java 7/8 JDK or the build exactly matching to server JRE – it allows you to dump the app heap for applications with no safepoints being accessed at the moment (some applications in the idle mode are the easy example of “no safepoints” applications).

2. Troubleshooting scenarios

2.1. Getting a list of running JVMs

In order to start working you nearly always need to get a list of running JVMs, their process IDs and command line arguments. Sometimes it may enough: you may find a second instance of the same application doing the same job concurrently (and damaging the output files / reopening sockets / doing some other stupid things).

Just run jcmd without any arguments. It will print you a list of running JVMs:

3824 org.jetbrains.idea.maven.server.RemoteMavenServer
2196
780 sun.tools.jcmd.JCmd

Now you can see what diagnostic commands are available for a given JVM by running jcmd <PID> help command. Here is a sample output for VisualVM:

>jcmd 3036 help

3036:
The following commands are available:
JFR.stop
JFR.start
JFR.dump
JFR.check
VM.native_memory
VM.check_commercial_features
VM.unlock_commercial_features
ManagementAgent.stop
ManagementAgent.start_local
ManagementAgent.start
GC.rotate_log
Thread.print
GC.class_stats
GC.class_histogram
GC.heap_dump
GC.run_finalization
GC.run
VM.uptime
VM.flags
VM.system_properties
VM.command_line
VM.version
help

Type jcmd <PID> <COMMAND_NAME> to either run a diagnostic command or get an error message asking for command arguments:

>jcmd 3036 GC.heap_dump

3036:
java.lang.IllegalArgumentException: The argument 'filename' is mandatory.

You can get more information about a diganostic command arguments by using the following command: jcmd <PID> help <COMMAND_NAME>. For example, here is the output for GC.heap_dump command:

>jcmd 3036 help GC.heap_dump
        
3036:
GC.heap_dump
Generate a HPROF format dump of the Java heap.

Impact: High: Depends on Java heap size and content. Request a full GC unless the '-all' option is specified.

Permission: java.lang.management.ManagementPermission(monitor)

Syntax : GC.heap_dump [options] <filename>

Arguments:
filename :  Name of the dump file (STRING, no default value)

Options: (options must be specified using the <key> or <key>=<value> syntax)
-all : [optional] Dump all objects, including unreachable objects (BOOLEAN, false)

2.2. Making a heap dump

jcmd provides you with a handy interface for making the heap dump in HPROF format. Simply run jcmd <PID> GC.heap_dump <FILENAME>. Pay attention that the file name is relative to the running JVM current directory instead of your current directory, so you may want to specify the full path. It is a good idea to use .hprof extension for the dump file name.

After thread dump completion you can copy a file to your own box and open it either in VisualVM (it is a part of JDK) and use its heap walker and query language functionality or load it into JOverflow plugin of Java Mission Control and analyze it for various memory issues.

Note 1: of course there is lots of other tools capable to handle hprof files: NetBeans, Eclipse Memory Analyzer, YourKit, etc. Use your favorite one once you have downloaded an .hprof file to your box.

Note 2: you can make a heap dump using jmap tool as well: jmap -dump:live,file=<FILE_NAME> <PID>. The problem with it is that it is officially documented as unsupported. Many of us were thinking that unsupported stuff in JDK will remain forever, but it turns out that this is no longer the case: JEP 240, JEP 241

2.3. Analyzing a class histogram

If you are looking for a memory leak, you are usually interested just in a number of live objects of certain type in a heap. For example, you may know that you can have only one object of a certain type at a time (some sort of main working class in your application). There also may be one or more instances of the same class in the old generation which are not garbage collected so far, but they should not be accessible from the application roots.

To print the class histogram run either of these commands (both commands print the number of live objects):

jcmd <PID> GC.class_histogram
jmap -histo:live <PID>

Here are the top few lines of the example output:

 num     #instances         #bytes  class name
----------------------------------------------
   1:          5923        5976952  [I
   2:         50034        4127704  [C
   3:         49465        1187160  java.lang.String
   4:           188        1069496  [J
   5:          3985        1067240  [Ljava.util.HashMap$Node;
   6:          8756         982872  java.lang.Class
   7:          2855         835792  [B
   8:         23570         754240  java.util.HashMap$Node
   9:         13964         671440  [Ljava.lang.Object;
  10:          9642         308544  java.util.Hashtable$Entry
  11:          4453         213744  java.util.HashMap

Note that occupied size in bytes is a shallow size – it does not include any child objects. It is easy to notice this fact from char[] (class name = [C) and String stats – while the number of instances is similar (though there are always more char[]-s than Strings, the size of char[]-s is noticeably bigger, which can not be the case if String size included the underlying char[] size.

Now you can just grep/search for the class name you are interested in and check the number of live instances. If you see more instances than expected, make a heap dump and analyze it in any heap walker (see above).

2.4. Making a thread dump

Sometimes your application may be reported as “not doing anything/got stuck”. There are many kinds of getting “stuck” – a deadlock, high resource contention or simply an O(N10) algorithm processing the requests of a few million users 🙂 In all these situations you should know what your application threads are executing and what locks do they hold.

There are 2 kinds of locks: original ones based on synchronized keyword and Object.wait/notifyAll methods and the java.util.concurrent locks introduced in Java 5. The major difference between them is that former are bound to the stack frame where you enter the synchronized section and were always available in the thread dumps. The latter (java.util.concurrent), on the other hand, are not stack frame bound – you can enter the lock in one method and leave it in the different one. As a result, for some time they were not printed in the thread dump at all and even now they are still an option. Nevertheless, you need both sorts of locks in your thread dump to properly investigate the threading issues.

There are 3 ways to print the application thread dump. You can run kill -3 <PID> on Linux. Or you can run one of the following commands on any platform:

jstack <PID>
jcmd <PID> Thread.print

2.5. Running Java Flight Recorder

All tools mentioned up to this point in this article should be used only for quick investigation. For deeper analysis I recommend to rely on the built-in Java Flight Recorder. Take a look at my recent article about Java Mission Control for more details.

Running JFR is a 3 step process:

  1. You need to create a JFR template file containing desired settings. In order to do this, run jmc and go to Window -> Flight Recording Template Manager menu. Once the profile is ready, export it to the file and then send it to the box you are working on.
  2. JFR requires JDK commercial license. I ask you to take a look again at my recent Java Mission Control article for details. I suppose you are happy with the license terms. Now you need to unlock commercial features on the required JVM:

    jcmd <PID> VM.unlock_commercial_features
  3. After that you can start JFR. Here is an example of the command line:

    jcmd <PID> JFR.start name=test duration=60s settings=template.jfc filename=output.jfr

    This command runs JFR immediately (delay property is not set) and collects JVM information for 60 seconds using settings from template.jfc template file and writing results into output.jfr (it makes sense to use absolute paths for both files).

Once recording is done, you can copy the .jfr file to your laptop and analyze it inside jmc GUI. It contains nearly all information you need to troubleshoot JVM with the exception of the full heap dump, which you can make and copy to your box separately.


Leave a Reply

Your email address will not be published. Required fields are marked *