Category Archives: Overviews

Static code compilation in Groovy 2.0

by Mikhail Vorontsov

A tiny introduction to Groovy

Groovy is a JVM-based language useful for scripting and other applications that are not CPU-critical (such as the Grails web framework). It is a dynamically typed language that allows you to add methods and properties to existing objects at runtime.

The ability to add methods and properties at runtime is implemented via the methodMissing and propertyMissing handlers, as well as dynamic method registration. This feature lets you support your own DSLs by parsing the names of nonexistent methods at runtime and registering the actual method bodies for them. For example, it lets you generate database access methods like List<Person> getPersonsByN( N n ), where N is any field defined in the Persons database table.

This functionality made Groovy popular in web development, because frameworks can generate repetitive data access methods at runtime. Unfortunately (or luckily 🙂 ), Groovy method calls use the dynamic dispatch model: the Groovy runtime chooses the best matching method signature based on the runtime argument types rather than the compile-time argument types, as Java does. Dynamic dispatch requires each Groovy method call to go through the reflection-based method lookup code in the Groovy runtime. So, are method calls in Groovy extremely slow? The answer is no: Groovy does a very good job of caching call sites and avoids another reflection lookup whenever possible.
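The difference between the two dispatch models can be sketched in plain Java, which is the reference point in the paragraph above. The class and method names below are hypothetical, and the reflective lookup is only a rough emulation of choosing a method by runtime type: it is not how the Groovy runtime is implemented, and unlike Groovy it requires an exact parameter type match.

```java
import java.lang.reflect.Method;

// Hypothetical demo class; none of these names come from Groovy itself.
class DispatchDemo {
    static String describe(Object o) { return "describe(Object)"; }
    static String describe(String s) { return "describe(String)"; }

    // Java: the overload is chosen from the compile-time (static) type of the argument.
    static String compileTimeChoice() {
        Object arg = "text";   // static type is Object, runtime type is String
        return describe(arg);  // bound to describe(Object) by the compiler
    }

    // Emulated runtime choice: look the method up by the runtime type of the argument.
    static String runtimeChoice() {
        Object arg = "text";
        try {
            Method m = DispatchDemo.class.getDeclaredMethod("describe", arg.getClass());
            return (String) m.invoke(null, arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(compileTimeChoice()); // describe(Object)
        System.out.println(runtimeChoice());     // describe(String)
    }
}
```

A reflective lookup like the one in runtimeChoice is what would make every call expensive if it were repeated each time, which is why call site caching matters.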

Groovy static compilation

One of the main features of Groovy 2.0 was the static compilation mode. It is turned on by annotating methods or a whole class with the @CompileStatic annotation. This annotation actually turns on two features:

  1. Static type checking
  2. Static compilation


A possible memory leak in the manual MultiMap implementation

by Mikhail Vorontsov

Pure Java 6 and 7

In this short article I will describe a junior-level memory leak I recently saw in a pure JDK application (no third-party libraries were allowed).

Suppose you have a map from String identifiers to some Collections, for example a set of String properties of such identifiers: Map<String, Set<String>>. The actual type of the inner collection does not matter for this article – it should just be a Collection. Such collections are generally called multimaps.

The following method was initially written to obtain the inner set by its identifier:

private final Map<String, Set<String>> m_ids = new HashMap<String, Set<String>>( 4 );

private Set<String> getSetByNameFaulty( final String id )
{
    Set<String> res = m_ids.get( id );
    if ( res == null )
    {
        res = new HashSet<String>( 1 );
        m_ids.put( id, res );
    }
    return res;
}

This method checks if an identifier is already present in the map and either returns the corresponding Set or allocates a new one and adds it into the map. This method is useful for populating our map:

private void populateJava67()
{
    getSetByNameFaulty( "id1" ).add( "property1" );
    getSetByNameFaulty( "id1" ).add( "property2" );
    getSetByNameFaulty( "id1" ).add( "property3" );
    getSetByNameFaulty( "id2" ).add( "property1" );
}

The next step while writing a program would be to add some accessors to our map, like the following one:

private boolean hasPropertyFaulty( final String id, final String property )
{
    return getSetByNameFaulty( id ).contains( property );
}

This method looks good and is unlikely to be flagged by any code quality tool. Unfortunately, it has a major flaw: if you query the properties of an unknown identifier, a new empty set is allocated and stored in our map inside the getSetByNameFaulty method. This is definitely an unwanted side effect. Instead, we should let our new getSetByName method know whether we want to write something to the returned set:

private Set<String> getSetByName( final String id, final boolean isWrite )
{
    Set<String> res = m_ids.get( id );
    if ( res == null )
    {
        if ( isWrite )
        {
            res = new HashSet<String>( 1 );
            m_ids.put( id, res );
        }
        else
            res = Collections.emptySet();
    }
    return res;
}

private boolean hasProperty( final String id, final String property )
{
    return getSetByName( id, false ).contains( property );
}

private void populateJava67Better()
{
    getSetByName( "id1", true ).add( "property1" );
    getSetByName( "id1", true ).add( "property2" );
    getSetByName( "id1", true ).add( "property3" );
    getSetByName( "id2", true ).add( "property1" );
}
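
To see the difference between the two versions, the methods above can be combined into one small runnable class. This is only a sketch: the MultiMapLeakDemo class, its mapSize helper and the "unknown" identifiers are hypothetical names introduced for the demonstration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical wrapper class, used only to run both versions side by side.
class MultiMapLeakDemo {
    private final Map<String, Set<String>> m_ids = new HashMap<String, Set<String>>( 4 );

    // Faulty version: stores a new empty set even for read-only queries.
    private Set<String> getSetByNameFaulty( final String id ) {
        Set<String> res = m_ids.get( id );
        if ( res == null ) {
            res = new HashSet<String>( 1 );
            m_ids.put( id, res );
        }
        return res;
    }

    // Fixed version: only a write request may add an entry to the map.
    private Set<String> getSetByName( final String id, final boolean isWrite ) {
        Set<String> res = m_ids.get( id );
        if ( res == null ) {
            if ( isWrite ) {
                res = new HashSet<String>( 1 );
                m_ids.put( id, res );
            } else {
                res = Collections.emptySet();
            }
        }
        return res;
    }

    boolean hasPropertyFaulty( final String id, final String property ) {
        return getSetByNameFaulty( id ).contains( property );
    }

    boolean hasProperty( final String id, final String property ) {
        return getSetByName( id, false ).contains( property );
    }

    // Helper added for the demo: number of identifiers held by the map.
    int mapSize() {
        return m_ids.size();
    }

    public static void main( String[] args ) {
        final MultiMapLeakDemo faulty = new MultiMapLeakDemo();
        final MultiMapLeakDemo fixed = new MultiMapLeakDemo();
        for ( int i = 0; i < 1000; i++ ) {
            faulty.hasPropertyFaulty( "unknown" + i, "property1" );
            fixed.hasProperty( "unknown" + i, "property1" );
        }
        System.out.println( faulty.mapSize() ); // 1000: one empty set per query
        System.out.println( fixed.mapSize() );  // 0: queries left the map untouched
    }
}
```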


Java collections overview

by Mikhail Vorontsov

This article will give you an overview of all standard Java collections. We will categorize their distinguishing properties and main use cases. Besides that, we will list all correct ways of transforming your data between various collection types.

Arrays

An array is the only collection type built into the Java language. It is useful when you know an upper bound on the number of processed elements in advance. java.util.Arrays contains a lot of useful methods for array processing:

  • Arrays.asList – conversion from array to List, which could be passed to other standard collection constructors.
  • Arrays.binarySearch – fast lookup in a sorted array or its subsection.
  • Arrays.copyOf – use this method if you need to expand your array while keeping its contents.
  • Arrays.copyOfRange – if you need to make a copy of the whole array or its subsection.
  • Arrays.deepEquals, Arrays.deepHashCode – versions of Arrays.equals/hashCode supporting nested sub-arrays.
  • Arrays.equals – use this method instead of the array's own equals method when you need to compare two arrays for equality ( equals is not overridden for arrays, so array.equals only compares references to arrays, rather than their contents ). Combined with Java 5 autoboxing, it gives you a simple implementation of your class equals method: after comparing object types, pack the fields of each object into an Object[] and pass both arrays to Arrays.equals.
  • Arrays.fill – populate the whole array or its subsection with a given value.
  • Arrays.hashCode – calculates a hash code from the array contents ( the array's own hashCode method cannot be used for this purpose ). Combined with Java 5 autoboxing, it gives you a simple implementation of your class hashCode method: pack all your class fields into an Object[] and pass it to Arrays.hashCode.
  • Arrays.sort – sort the full array or its subsection using natural ordering. There is also a pair of Arrays.sort methods for sorting an Object[] using a provided Comparator.
  • Arrays.toString – builds a readable string representation of the array contents.
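
A few of these methods can be sketched in a short example. The Point and ArraysDemo classes are hypothetical and exist only for illustration. Note that Arrays.equals and Arrays.hashCode take an Object[] rather than varargs, so the fields are packed into an array explicitly.

```java
import java.util.Arrays;

// Hypothetical value class whose equals/hashCode delegate to java.util.Arrays.
final class Point {
    private final int x;
    private final int y;
    private final String label;

    Point(int x, int y, String label) {
        this.x = x;
        this.y = y;
        this.label = label;
    }

    // The int fields are autoboxed to Integer when stored in the Object[].
    private Object[] fields() {
        return new Object[] { x, y, label };
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (other == null || getClass() != other.getClass()) return false;
        return Arrays.equals(fields(), ((Point) other).fields());
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(fields());
    }
}

class ArraysDemo {
    // The array's own equals compares references only.
    static boolean sameReference(int[] a, int[] b) {
        return a.equals(b);
    }

    // Arrays.equals compares lengths and contents.
    static boolean sameContents(int[] a, int[] b) {
        return Arrays.equals(a, b);
    }

    // Arrays.copyOf keeps the contents and pads the new tail with zeroes.
    static int[] grow(int[] src, int newLength) {
        return Arrays.copyOf(src, newLength);
    }

    public static void main(String[] args) {
        int[] a = { 1, 2, 3 };
        int[] b = { 1, 2, 3 };
        System.out.println(sameReference(a, b));            // false
        System.out.println(sameContents(a, b));             // true
        System.out.println(Arrays.toString(grow(a, 5)));    // [1, 2, 3, 0, 0]
        System.out.println(new Point(1, 2, "p").equals(new Point(1, 2, "p"))); // true
    }
}
```

Allocating an Object[] on every equals or hashCode call is convenient but not free, so this pattern fits best where these methods are not on a hot path.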

If you need to copy a part of an array (or the whole array) into another, already existing array, use System.arraycopy, which copies a given number of elements from a given position in the source array to a given position in the destination array. Generally, this is the fastest way to copy array contents in Java (but in some cases you may need to check whether a ByteBuffer bulk copy works faster).
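
As a minimal sketch of the System.arraycopy argument order (source, source position, destination, destination position, element count), consider the hypothetical helper class below. The insertAt method also relies on the documented guarantee that copying within the same array works correctly when the source and destination regions overlap.

```java
import java.util.Arrays;

// Hypothetical helper class illustrating System.arraycopy.
class ArrayCopyDemo {
    // Copies count elements from src[srcPos...] into dest[destPos...] and returns dest.
    static int[] copyInto(int[] src, int srcPos, int[] dest, int destPos, int count) {
        System.arraycopy(src, srcPos, dest, destPos, count);
        return dest;
    }

    // Inserts value at index, shifting the tail one slot to the right.
    // Assumes arr has at least one free slot after its first size elements.
    static int[] insertAt(int[] arr, int size, int index, int value) {
        System.arraycopy(arr, index, arr, index + 1, size - index);
        arr[index] = value;
        return arr;
    }

    public static void main(String[] args) {
        int[] dest = new int[5];
        copyInto(new int[] { 1, 2, 3, 4 }, 1, dest, 2, 3);
        System.out.println(Arrays.toString(dest)); // [0, 0, 2, 3, 4]

        int[] withGap = { 1, 2, 4, 0 }; // three used elements, one free slot
        System.out.println(Arrays.toString(insertAt(withGap, 3, 2, 3))); // [1, 2, 3, 4]
    }
}
```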

Finally, we need to mention that any Collection could be copied into an array using T[] Collection.toArray( T[] a ) method. The usual pattern of this method call:

return coll.toArray( new T[ coll.size() ] );

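This call allocates an array large enough to hold the whole collection, so toArray can fill and return it instead of allocating another array itself. Note that T here stands for a concrete element type, such as String: Java does not allow creating an array of a generic type parameter.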
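
The same pattern with a concrete element type might look like the sketch below. ToArrayDemo and its method name are hypothetical, and the identity checks in main are shown for an ArrayList.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical demo class for the Collection.toArray( T[] ) pattern.
class ToArrayDemo {
    // The passed array is large enough, so toArray fills and returns it.
    static String[] toStringArray(Collection<String> coll) {
        return coll.toArray(new String[coll.size()]);
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<String>(Arrays.asList("a", "b", "c"));

        String[] exact = new String[names.size()];
        System.out.println(names.toArray(exact) == exact);       // true: the same array is returned

        String[] tooSmall = new String[0];
        System.out.println(names.toArray(tooSmall) == tooSmall); // false: a new array was allocated

        System.out.println(Arrays.toString(toStringArray(names))); // [a, b, c]
    }
}
```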

Single-threaded collections

This part of the article describes non-thread-safe collections. All these collections live in the java.util package. Some of them have been present since Java 1.0 (and are now considered obsolete), and most of them were already present in Java 1.4. Enum collections were added in Java 1.5, together with support for generics in all collection classes. PriorityQueue was also added in Java 1.5. The latest addition to the non-thread-safe collections framework is ArrayDeque, which was added in Java 1.6.

Lists

  • ArrayList – the most useful List implementation. It is backed by an array and an int holding the position of the first unused element in the array. Like all Lists, it expands itself when necessary. Element access takes constant time. Updates are cheap at the tail (amortized constant complexity) and expensive at the head (linear complexity) because of the ArrayList invariant: all elements start from index = 0 in the underlying array, so everything to the right of the update position must be moved to the right for insertions and to the left for removals. It is a CPU-cache-friendly collection because it is backed by an array (unfortunately, not too friendly, because the array contains only references to the actual objects).
  • LinkedList – a Deque implementation. Each Node consists of a value and prev and next pointers, which means that element access/updates by index have linear complexity (thanks to an optimization, these methods never traverse more than half of the list, so the most expensive elements are located in the middle of the list). You need to use ListIterators if you want to write fast LinkedList code. If you want a Queue/Deque implementation (you only need to access the first and last elements), consider using ArrayDeque instead.
  • Vector – a prehistoric version of ArrayList with all methods synchronized. Use ArrayList instead.
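
The advice on ListIterator and ArrayDeque can be illustrated with a short sketch. The ListDemo class and its method names are hypothetical. Removing through the iterator avoids a new traversal for every removed element, which is what an index-based remove on a LinkedList would cause.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

// Hypothetical demo class for iterator-based list updates and ArrayDeque.
class ListDemo {
    // Removes all even numbers in a single pass over the list.
    static List<Integer> removeEvens(List<Integer> list) {
        ListIterator<Integer> it = list.listIterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0) {
                it.remove(); // unlinks the current node without another traversal
            }
        }
        return list;
    }

    // Uses ArrayDeque as a FIFO queue: add at the tail, poll from the head.
    static String drainInOrder(String... items) {
        Deque<String> queue = new ArrayDeque<String>();
        for (String item : items) {
            queue.addLast(item);
        }
        StringBuilder sb = new StringBuilder();
        while (!queue.isEmpty()) {
            sb.append(queue.pollFirst());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Integer> numbers = new LinkedList<Integer>(Arrays.asList(1, 2, 3, 4, 5));
        System.out.println(removeEvens(numbers));        // [1, 3, 5]
        System.out.println(drainInOrder("a", "b", "c")); // abc
    }
}
```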
