A possible memory leak in the manual MultiMap implementation

by Mikhail Vorontsov

Pure Java 6 and 7

In this short article I will describe a junior level memory leak I have recently seen in a pure JDK application (no 3rd party libs were allowed).

Suppose you have a map from String identifiers to some Collections, for example a set of String properties of such identifiers: Map<String, Set<String>>. The actual type of the inner collection does not matter for this article – it should just be a Collection. Such collections are generally called multimaps.

The following method was initially written to obtain the inner set by its identifier:

1
2
3
4
5
6
7
8
9
10
11
12
private final Map<String, Set<String>> m_ids = new HashMap<String, Set<String>>( 4 );
 
private Set<String> getSetByNameFaulty( final String id )
{
    Set<String> res = m_ids.get( id );
    if ( res == null )
    {
        res = new HashSet<String>( 1 );
        m_ids.put( id, res );
    }
    return res;
}
private final Map<String, Set<String>> m_ids = new HashMap<String, Set<String>>( 4 );

private Set<String> getSetByNameFaulty( final String id )
{
    Set<String> res = m_ids.get( id );
    if ( res == null )
    {
        res = new HashSet<String>( 1 );
        m_ids.put( id, res );
    }
    return res;
}

This method checks if an identifier is already present in the map and either returns the corresponding Set or allocates a new one and adds it into the map. This method is useful for populating our map:

1
2
3
4
5
6
7
private void populateJava67()
{
    getSetByNameFaulty( "id1" ).add( "property1" );
    getSetByNameFaulty( "id1" ).add( "property2" );
    getSetByNameFaulty( "id1" ).add( "property3" );
    getSetByNameFaulty( "id2" ).add( "property1" );
}
private void populateJava67()
{
    getSetByNameFaulty( "id1" ).add( "property1" );
    getSetByNameFaulty( "id1" ).add( "property2" );
    getSetByNameFaulty( "id1" ).add( "property3" );
    getSetByNameFaulty( "id2" ).add( "property1" );
}

The next step while writing a program would be to add some accessors to our map, like the following one:

1
2
3
4
private boolean hasPropertyFaulty( final String id, final String property )
{
    return getSetByNameFaulty( id ).contains( property );
}
private boolean hasPropertyFaulty( final String id, final String property )
{
    return getSetByNameFaulty( id ).contains( property );
}

This method looks good and is unlikely to be caught by any code quality tools. Unfortunately, it has a major flaw: if you will query properties of unknown identifier, a new empty set will be allocated in our map inside getSetByNameFaulty method. This is definitely a not wanted side effect. Instead we should let our new getSetByName method know if we want to write something to the returned set:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
private Set<String> getSetByName( final String id, final boolean isWrite )
{
    Set<String> res = m_ids.get( id );
    if ( res == null )
    {
        if ( isWrite )
        {
            res = new HashSet<String>( 1 );
            m_ids.put( id, res );
        }
        else
            res = Collections.emptySet();
    }
    return res;
}
 
private boolean hasProperty( final String id, final String property )
{
    return getSetByName( id, false ).contains( property );
}
 
private void populateJava67Better()
{
    getSetByName( "id1", true ).add( "property1" );
    getSetByName( "id1", true ).add( "property2" );
    getSetByName( "id1", true ).add( "property3" );
    getSetByName( "id2", true ).add( "property1" );
}
private Set<String> getSetByName( final String id, final boolean isWrite )
{
    Set<String> res = m_ids.get( id );
    if ( res == null )
    {
        if ( isWrite )
        {
            res = new HashSet<String>( 1 );
            m_ids.put( id, res );
        }
        else
            res = Collections.emptySet();
    }
    return res;
}

private boolean hasProperty( final String id, final String property )
{
    return getSetByName( id, false ).contains( property );
}

private void populateJava67Better()
{
    getSetByName( "id1", true ).add( "property1" );
    getSetByName( "id1", true ).add( "property2" );
    getSetByName( "id1", true ).add( "property3" );
    getSetByName( "id2", true ).add( "property1" );
}

Java 8

Java 8 has added the following method to the Map interface (actually, many more methods were added, but we are interested in this one only):

1
V getOrDefault(Object key, V defaultValue)
V getOrDefault(Object key, V defaultValue)

This method returns either a value from the map or a default value if a key is not present in the map. The map itself is not updated on this call. It will allow us to simplify the read accesses to the map to

1
2
3
4
private boolean hasPropertyJava8( final String id, final String property )
{
    return m_ids.getOrDefault( id, Collections.<String>emptySet() ).contains( property );
}
private boolean hasPropertyJava8( final String id, final String property )
{
    return m_ids.getOrDefault( id, Collections.<String>emptySet() ).contains( property );
}

Google Guava

Google Guava library defines a MultiMap interface and provides its several implementations (number of those is actually similar to a number of Map implementations in JDK collections framework).

MultiMap.get defined as

1
Collection<V> get(K key)
Collection<V> get(K key)

It returns all values associated with a given key (or an empty collection if nothing is associated). Here is a possible client code implementation using Google Guava. As you can see, we don’t have to explicitly care about inner collection:

1
2
3
4
5
6
7
8
9
10
11
private final Multimap<String, String> m_idsMM = HashMultimap.create();
 
private void addPropertyGuava( final String id, final String property )
{
    m_idsMM.put( id, property );
}
 
private boolean hasPropertyGuava( final String id, final String property )
{
    return m_idsMM.get( id ).contains( property );
}
private final Multimap<String, String> m_idsMM = HashMultimap.create();

private void addPropertyGuava( final String id, final String property )
{
    m_idsMM.put( id, property );
}

private boolean hasPropertyGuava( final String id, final String property )
{
    return m_idsMM.get( id ).contains( property );
}

Scala 2.10

Scala is one of most popular JVM languages. Its current release, 2.10, provides user a rich library of collections.

The following example mixes in Scala MultiMap trait with an ordinary Scala HashMap. It uses MapLike.getOrElseUpdate method for initialization of absent key-value pairs.

object MultimapTest {
  val m_ids = new mutable.HashMap[String, mutable.Set[String]] with mutable.MultiMap[String, String]

  def addProperty( id:String, property:String) {
    m_ids.getOrElseUpdate(id, new mutable.HashSet[String]())+=property
  }

  def hasProperty( id:String, property:String) = m_ids(id).contains(property)

  def main( args:Array[String])
  {
    addProperty("id1", "prop1")
    addProperty("id1", "prop2")
    addProperty("id1", "prop3")
    addProperty("id2", "prop1")

    println(hasProperty("id1", "prop2"))
    println(hasProperty("id1", "prop4"))
    println(hasProperty("id2", "prop2"))
    println(hasProperty("id2", "prop4"))
  }
}

This code will print true followed by 3 false.

Summary

  • As you have seen, it is quite easy to miss a memory leak while implementing a multilevel map. You need to be careful and split read and write accesses to the outer map.
  • Newer frameworks and languages, like Google Guava, Java 8 and Scala already provide you more convenient syntax and wider choice of collections thus allowing you to avoid possible memory leaks in the multilevel maps.