java.util.ArrayList performance guide

by Mikhail Vorontsov

In this article we will discuss most of the possible ArrayList performance problems. ArrayList methods are divided into several groups below, and the performance of each group is discussed.

ArrayList is a general-purpose list implementation suitable for most use cases. It is backed by an Object array whose size is adjusted dynamically as the user adds or removes elements from the list. If you need a resizable list of primitive type values, you should read the Trove library article instead.

Adding elements to the list

There are 2 pairs of methods used to add elements:

  • add(E)
  • add(int, E)
  • addAll(List<E>)
  • addAll(int, List<E>)

The single-argument methods append new elements to the tail of the list and are the fastest ones. The methods that take an insertion position have to copy all array elements to the right of the insertion point via a System.arraycopy call, which is why they have O(n) complexity (the penalty is small if the new element is added near the tail, but grows as the insertion point moves towards the head). This means that you shouldn't add too many elements to the head of a big ArrayList. Adding 1M elements to the head of a List<String> took 125 seconds on my computer; adding 0.5M elements took 30 seconds, which confirms that n calls of an O(n) method have O(n^2) total complexity (adding 2 times more elements took 4 times longer).
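The difference between the two pairs can be made concrete with a small sketch (the class and method names are mine, not from the article's benchmark):

```java
import java.util.ArrayList;
import java.util.List;

public class HeadInsertDemo
{
    //tail insertion: amortized O(1) per element, O(n) in total
    public static List<Integer> fillTail( final int n )
    {
        final List<Integer> list = new ArrayList<Integer>();
        for ( int i = 0; i < n; ++i )
            list.add( i ); //append at the end
        return list;
    }

    //head insertion: O(n) per element (System.arraycopy shifts the whole
    //tail to the right on every call), O(n^2) in total
    public static List<Integer> fillHead( final int n )
    {
        final List<Integer> list = new ArrayList<Integer>();
        for ( int i = 0; i < n; ++i )
            list.add( 0, i ); //insert at position 0
        return list;
    }
}
```

Both methods build a list of n elements, but fillHead does quadratic work overall, which is exactly the effect measured above.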

If there is not enough space in the array to fit new elements, it has to be extended. The extension logic changed in Java 7: in Java 6 the new capacity was oldCapacity * 3 / 2 + 1, while in Java 7 it is oldCapacity + (oldCapacity >> 1), i.e. roughly oldCapacity * 1.5.
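As a sketch, the two growth formulas can be written as follows (the method names are mine; the formulas mirror the JDK sources):

```java
public class ArrayListGrowth
{
    //Java 6 growth: newCapacity = oldCapacity * 3 / 2 + 1
    public static int growJava6( final int oldCapacity )
    {
        return oldCapacity * 3 / 2 + 1;
    }

    //Java 7 growth: newCapacity = oldCapacity + (oldCapacity >> 1), i.e. ~1.5x
    public static int growJava7( final int oldCapacity )
    {
        return oldCapacity + ( oldCapacity >> 1 );
    }
}
```

For example, an array of capacity 10 grows to 16 in Java 6 and to 15 in Java 7 — both roughly a 1.5x growth factor.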

Removing elements from the list

Each of the remove methods has its own problems, so they are discussed separately.

remove(int)

This method removes a single element at the given position. After that, all elements to the right of it are shifted one position to the left via a System.arraycopy call, which is why this method has O(n) complexity. There are two possible problems here:

If you have an ArrayList<Integer>, then you must pay attention to every remove call: which overload are you actually calling – remove(int) or remove(Object)? You may need to cast the argument explicitly: to int for remove(int), or to Integer for remove(Object). For example, if you want to remove the first element of the list, you should call remove(0), but if you want to remove the first zero value from the list, you'd better call remove( (Integer) 0 ).
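A minimal sketch of the two overloads in action (the class and method names are mine):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RemoveOverloads
{
    public static List<Integer> demo()
    {
        final List<Integer> list = new ArrayList<Integer>( Arrays.asList( 10, 0, 20 ) );
        list.remove( 0 );           //remove(int): removes the element at index 0 -> [0, 20]
        list.remove( (Integer) 0 ); //remove(Object): removes the first value equal to 0 -> [20]
        return list;
    }
}
```

Without the explicit Integer cast, the second call would also resolve to remove(int) and remove by index instead of by value.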

As said before, remove(int) has O(n) complexity. The biggest problem the author has ever seen caused by this method was in a buffer implementation. First the caller code added a lot of elements to the buffer, and after that the following code snippet was used to process the buffer contents:

while ( !buffer.isEmpty() )
{
    Elem el = buffer.remove( 0 );
    process( el );
}

The initial dataset was no larger than 100K elements, so the slowdown caused by this method went unnoticed. But then someone decided to use the same program to process a much larger dataset – a few million messages. As a result, the code practically stopped processing. On my computer it took 126 seconds to clean a list of 1M Strings using the above code (and, just to confirm the O(n^2) complexity of this approach, 30 seconds to clean a list of 0.5M Strings). These are essentially the same numbers as in the add test case, which shows that most of the time in both cases is spent in System.arraycopy calls. The only difference is that arraycopy moves elements to the right when adding new elements to the head, but moves them to the left when removing elements from the head.

So, how do we fix it? If all elements are added to the buffer first and processed (removed) only after that, we can reverse the list with Collections.reverse and then remove messages from the list tail:

Collections.reverse( buffer );
while ( !buffer.isEmpty() )
{
  Elem el = buffer.remove( buffer.size() - 1 );
  process( el );
}

Removing the last element does not invoke System.arraycopy, so such a call has O(1) complexity. Removing even a million messages with this code is done in the blink of an eye.

If the usage pattern is different – add a few elements, process a few, add some more, and so on – we need either a LinkedList or the ArrayList.subList method described below.

remove(Object)

This method removes the first occurrence of the given element from the list. It supports a null argument and has a separate branch to process it. It iterates over all list elements, so it has O(n) complexity. This method accesses all array elements in any case: it either reads them while looking for the requested element, or moves them one position to the left via System.arraycopy after the requested element has been found. Moving is slightly faster, but it still touches every element.

This method has the same problems as those discussed for remove(int). We can devise an even worse 'buffer processing' loop using it:

while ( !buffer.isEmpty() )
{
    Elem el = buffer.get( 0 );
    process( el );
    buffer.remove( el );
}

Never call remove(Object) when you know at which position your element is! Besides the extra lookup, you may also delete an element at an earlier position instead of the one you intended to delete (if the list contains duplicates).

And now the most important property of the remove methods: they never shrink the internal array. Neither does the clear method. Only trimToSize does. So if your data has one peak (or a few of them), then between peaks you will waste memory, because the internal array remains large enough to fit the peak data. That's why it is worth considering a call to trimToSize when your buffer size has dropped below some predefined level (for example, from above 100K elements to below 100K elements).
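A sketch of such a trimming policy (the class, the method name and the 100K threshold are illustrative examples, not a JDK recommendation):

```java
import java.util.ArrayList;

public class TrimDemo
{
    //illustrative threshold only, matching the 100K example above
    private static final int TRIM_THRESHOLD = 100000;

    //after a peak, the backing array keeps its peak capacity;
    //trimToSize() reallocates it down to exactly size() elements
    public static void shrinkIfNeeded( final ArrayList<?> buffer )
    {
        if ( buffer.size() < TRIM_THRESHOLD )
            buffer.trimToSize();
    }
}
```

Note that the parameter is ArrayList, not List: trimToSize is not part of the List interface.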

removeAll(Collection), retainAll(Collection)

The first of these methods removes all elements present in its argument; the second keeps only the elements present in its argument. Both methods have O(n^2) complexity when the argument is a List, because they iterate over all elements of the ArrayList and call the contains method on the argument collection for each of them. Neither method shrinks the internal array.
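Since the quadratic cost comes from contains on the argument collection, wrapping the argument in a HashSet makes each contains call O(1) on average and the quadratic behaviour disappears. A sketch (the class and method names are mine):

```java
import java.util.HashSet;
import java.util.List;

public class RemoveAllDemo
{
    //removeAll calls contains() on its argument for every list element;
    //copying the argument into a HashSet first makes each contains() O(1)
    //on average, so the whole operation becomes O(n) instead of O(n^2)
    public static void removeAllFast( final List<String> list, final List<String> toRemove )
    {
        list.removeAll( new HashSet<String>( toRemove ) );
    }
}
```

The same trick applies to retainAll. The price is one extra pass to build the set, which is usually negligible.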

These methods may be used when a list was scanned for some values, those values were added to a separate collection, and now we need to remove them from the initial list. Why would someone do this? Maybe because they are used to the for-each loop, which doesn't allow removing elements from the iterated list. Or, for example, because they have read about the performance of the single-element remove methods above and want to avoid it by using these bulk methods.

You can clean removed elements from a list in one pass, but you need to either replace the removed elements with nulls or maintain another collection with the indices of all removed elements (depending on how many elements are in the list and how many you are going to remove, you may need an array, a Set or a BitSet). Here is how you can remove all nulls from an ArrayList:

public static <T> void cleanNulls( final List<T> lst )
{
    int pFrom = 0;
    int pTo = 0;
    final int len = lst.size();
    //copy all not-null elements towards list head
    while ( pFrom < len )
    {
        if ( lst.get( pFrom ) != null )
            lst.set( pTo++, lst.get( pFrom ) );
        ++pFrom;
    }
    //there are some elements left in the tail - clean them
    lst.subList( pTo, len ).clear();
}

This code needs just one pass. It uses two pointers: pFrom, the current element to check, incremented on each iteration; and pTo, the destination position, incremented only when a non-null element is copied. After all non-null elements have been copied, some elements may be left at the end of the ArrayList which are no longer required (they were already copied). We clean them using the subList method, described next.

subList(int, int)

This method creates a view of the specified part of the current list. It works differently in Java 6 and Java 7.

In Java 6 it is defined in the AbstractList class. Each sublist keeps a reference to its parent list and uses the parent list's methods, adding an offset (the first subList argument) to the indices.

In Java 7, a sublist created by an ArrayList.subList call keeps a reference to the original list and accesses its elementData array directly.

This difference is not important unless you wish to write a recursive algorithm using sublists (for example, quicksort). In Java 6, the deeper you go into recursion, the more subList objects each method call has to pass through (in most cases with its own range checks on each recursion level). In Java 7 it doesn't matter how deep the recursion goes: each time most methods make just one range check and access the required elements of the base ArrayList's elementData array.

There are 3 obvious use cases for this method:

  • Fast clearing of parts of the list
  • For-each iteration over a part of the list
  • Recursive algorithms

Suppose you need to clear some part of the list (for example, you have just processed it). If you already know that repeatedly calling remove is a bad idea, but you don't know about the subList method, you will either copy the remaining elements to a new list and replace the old list with it, or call remove anyway, thinking it is still better than any other way.

But actually all you need is this call: list.subList(from, to).clear(). Unfortunately, it is not at all obvious from the JavaDoc that this cleans a part of the source list. Nevertheless, it is the fastest way to clean a sublist – the call ends up making just one System.arraycopy call, which shifts all remaining elements to the left.
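A sketch of this idiom wrapped in a helper (the class and method names are mine):

```java
import java.util.List;

public class SubListClearDemo
{
    //clears the half-open range [from, to) of the list; for an ArrayList
    //this ends up as a single System.arraycopy shifting the tail left
    public static void clearRange( final List<?> list, final int from, final int to )
    {
        list.subList( from, to ).clear();
    }
}
```

For example, calling clearRange on [1, 2, 3, 4, 5] with from=1, to=4 leaves [1, 5].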

The for-each loop was added to the Java language in order to get rid of index variables when iterating over list or array elements. But what if you need to iterate over only a part of the list? Still use an index variable? Not necessarily – you can iterate over a sublist instead. The following method calculates the total length of all strings in a given part of a list of strings.

public static int getTotalLength( final List<String> lst, final int from, final int to )
{
    int sum = 0;
    for ( final String s : lst.subList( from, to ) )
    {
        sum += s.length();
    }
    return sum;
}

The last use case for sublists is recursive algorithms. As said before, you'd better not use sublists in recursive algorithms on Java 6; Java 7 improves them significantly. Here is a slightly artificial method – the same 'total length' method, but written recursively:

public static int getTotalLengthRec( final List<String> lst )
{
    if ( lst.size() == 1 )
        return lst.get( 0 ).length();
    final int middle = lst.size() >> 1;
    return getTotalLengthRec( lst.subList( 0, middle ) ) + getTotalLengthRec( lst.subList( middle, lst.size() ) );
}

For benchmarking we used a list containing 1M short strings and called this method 1000 times. It took 78 seconds in Java 6 and only 22 seconds in Java 7. But what should you do in order not to depend on the Java version? Handle the indices manually. Here is the same recursive method, now using explicit borders.

public static int getTotalLengthRec2( final List<String> lst )
{
    return getTotalLengthRecHelper( lst, 0, lst.size() );
}
 
public static int getTotalLengthRecHelper( final List<String> lst, final int from, final int to )
{
    if ( to - from <= 1 )
        return lst.get( from ).length();
    final int middle = ( from + to ) >> 1;
    return getTotalLengthRecHelper( lst, from, middle ) + getTotalLengthRecHelper( lst, middle, to );
}

One thousand invocations of getTotalLengthRec2 take 7.8 seconds in Java 6 and 9.3 seconds in Java 7 to complete. So it is 10 times faster than the sublist version in Java 6 and over 2 times faster in Java 7.

get(int)

A word of caution about this method. It is about a third slower in Java 7 than in Java 6, because in Java 7 it uses an additional method to access the internal array (Java 6 accessed the array directly). It is expected that the JIT would inline such a simple method and eliminate the difference between Java 6 and 7, but apparently it doesn't. Maybe this will be fixed in a later Java 7 release. Anyway, the method is still extremely fast, and you will only notice the difference over hundreds of millions of accesses.

contains(Object), indexOf(Object)

The first method checks whether a given object is present in the list (it is defined as indexOf(elem) >= 0). The second method tries to find the position of the given element in the list (nulls are supported). Both methods have O(n) complexity, because they scan the internal array from the beginning looking for the given element. They use the equals method to compare the given element with the array elements, so you can construct a new, equal element and still find its position in the list. If you make too many contains/indexOf calls on an ArrayList, consider using a Set implementation instead.
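A sketch of replacing repeated contains calls with a one-off HashSet (the class and method names are mine):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LookupDemo
{
    //repeated contains() on an ArrayList is O(n) per call; copying the
    //elements into a HashSet once makes each lookup O(1) on average
    public static int countHits( final List<String> haystack, final List<String> queries )
    {
        final Set<String> set = new HashSet<String>( haystack ); //one O(n) pass
        int hits = 0;
        for ( final String q : queries )
            if ( set.contains( q ) ) //O(1) instead of O(n)
                ++hits;
        return hits;
    }
}
```

For m queries against n elements this is O(n + m) instead of O(n * m). Note that a HashSet relies on equals and hashCode being consistent, while ArrayList.contains uses equals only.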

If you are using the indexOf method on an ArrayList, you are usually looking for similar elements in the list. Check whether you can sort your ArrayList so that all similar elements (by some criteria) end up next to each other – in most cases that is possible (even if you have to sort the data again afterwards by a different criterion). For example, suppose you have a list of stock exchange trades, which have a unique id, a buyer, a seller and some other fields. The buyer and the seller may each be missing some parts of a particular trade's information, but if you merge the buyer's and the seller's information based on the same trade id, you get the complete trade information. There are two ways to implement such a merge. The first is straightforward: keep a Map<TradeId, Trade> of all trades seen so far, match each new trade against the map, and find the previous trade's position via an indexOf call:

private static class Trade
{
    public String tradeId;
    //other fields
 
    public void merge( final Trade other )
    {
        //merging code here
    }
}
 
public void mergeSides( final List<Trade> trades )
{
    final Map<String, Trade> byTradeId = new HashMap<String, Trade>( 100 );
    for ( int i = 0; i < trades.size(); ++i )
    {
        final Trade trade = trades.get( i );
        final Trade prevTrade = byTradeId.get( trade.tradeId ); //do we have a match?
        if ( prevTrade != null )
        {
            final int prevPos = trades.indexOf( prevTrade );
            trade.merge( prevTrade );//merge trades attributes
            byTradeId.remove( trade.tradeId );//trades matched - no need to keep it in map any longer
            //clean nulls before exit
            trades.set( prevPos, null );
        }
        else
        {
            byTradeId.put( trade.tradeId, trade );//no trades with this tradeId before - save it in the map
        }
    }
    cleanNulls( trades ); //clean previously set nulls
}

The second approach is to sort the list of trades by trade id, then find adjacent trades with the same trade id and merge them. Comparing two trades by trade id is implemented as a separate Comparator, because it is not the natural ordering for trades (it is hard to say what a natural ordering for them would be).

private static final class TradeIdComparator implements Comparator<Trade>
{
    public int compare( final Trade first, final Trade second ) {
        return first.tradeId.compareTo( second.tradeId );
    }
}
 
public void mergeTradesSort( final List<Trade> trades )
{
    Collections.sort( trades, new TradeIdComparator() );
    for ( int i = 1; i < trades.size(); ++i )
    {
        if ( trades.get( i ).tradeId.equals( trades.get( i - 1 ).tradeId ) )
        {
            trades.get( i ).merge( trades.get( i - 1 ) );
            //clean nulls before exit
            trades.set( i - 1, null );
            ++i;//no need to compare second trade with the next trade
        }
    }
    cleanNulls( trades ); //clean previously set nulls
}

See also

Java collections overview – an overview of all standard JDK collections.

Summary

Try to follow these rules while optimizing ArrayList performance of your code:

  • Add elements to the end of the list
  • Remove elements from the end too
  • Avoid the contains, indexOf and remove(Object) methods
  • Avoid the removeAll and retainAll methods even more
  • Use subList(int, int).clear() idiom to quickly clean a part of the list
