java.util.LinkedList performance

by Mikhail Vorontsov

We will discuss LinkedList performance in this article. Generally, it is a bad idea to use it in performance-critical code. But, sometimes you may need it…

LinkedList is a list implementation with each node having pointers to previous and next nodes. Such implementation allows to add/remove/update elements fast, but only if:

  • Either they are first/last elements or
  • You have scrolled to required element using ListIterator

In all other cases list modification is O(n) complexity operation. Access to list elements (get) is also O(n) operation (actually, list will scroll either from head or from tail, so it would take no more than size() / 2 operations to access any LinkedList node.

Author is aware of only two cases when you could use a LinkedList:

  1. You are implementing a FIFO buffer and you don’t need to add/remove elements to the middle of the buffer (or rarely need to do it). Adding/removing from LinkedList head/tail is very fast. Nevertheless, consider using java.util.ArrayDeque in this case. It is also optimized for fast head/tail operations (more on this later).
  2. You need to add/remove elements from the middle of the list too often.

LinkedList and ArrayDeque as FIFO buffer

Let’s see how fast can be a FIFO queue if it uses LinkedList or ArrayDeque. We will prefill an instance of both classes with a number of values and when add 5 elements to the head of the lists and when remove 5 elements from the tail. Adding/removing will be done 100M times in a loop:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
final int PREFILL_COUNT = 100_000;
final int LOOP_COUNT = 100_000_000;
final LinkedList<Integer> lst = new LinkedList<Integer>();
final Integer val = 1;
for ( int i = 0; i < PREFILL_COUNT; ++i )
    lst.add( 35 );
//start measuring time here<br/>
for ( int i = 0; i < LOOP_COUNT; ++i )
{
    for ( int j = 0; j < 5; ++j )
        lst.addFirst( val );
    for ( int j = 0; j < 5; ++j )
        lst.removeLast();
}
            
final int PREFILL_COUNT = 100_000;
final int LOOP_COUNT = 100_000_000;
final LinkedList<Integer> lst = new LinkedList<Integer>();
final Integer val = 1;
for ( int i = 0; i < PREFILL_COUNT; ++i )
    lst.add( 35 );
//start measuring time here<br/>
for ( int i = 0; i < LOOP_COUNT; ++i )
{
    for ( int j = 0; j < 5; ++j )
        lst.addFirst( val );
    for ( int j = 0; j < 5; ++j )
        lst.removeLast();
}
            

Results are quite interesting. LinkedList performance in theory doesn’t depend on the number of prefilled elements. In practice, each add operation creates a node (4 Objects – node itself, previous and next pointers and the value) and each remove operation cleans these objects, thus creating quite a noticeable amount of garbage to collect. The bigger is memory footprint of your application, the slower would be these garbage collections. ArrayDeque objects don’t create their own garbage as long as collection size would stabilize, so its performance doesn’t really depend on the current size.

  LinkedList, 10 elems prefilled LinkedList, 100K elems prefilled LinkedList, 1M elems prefilled ArrayDeque, 10 elems prefilled ArrayDeque, 100K elems prefilled ArrayDeque, 1M elems prefilled
Java 6 7.533 sec 7.879 sec 9.461 sec 2.323 sec 2.422 sec 2.446 sec
Java 7 6.004 sec 6.493 sec 7.945 sec 2.035 sec 2.160 sec 2.343 sec

How to use LinkedList properly

The main property of LinkedList to remember – it offers only sequential access to its elements instead of random access of ArrayList. So, don’t try to adapt logic you’ve previously written with ArrayList in mind for the LinkedList – yes, they are both lists, but they are implemented too differently.

LinkedList is a sequential data structure. That’s why all linked list algorithms rely on iterators. In some cases, like remove(Object), iterator would be used implicitly, inside a library method, in other cases it must be used explicitly. For example, you have a LinkedList<String> and you need to remove all Strings after a specified string which are 5 characters long. If specified string is not present, you must process the full buffer. If this example looks a little artificial, replace strings with messages with a timestamp and other properties, and it will become more real.

This could be a first approach to this problem – find position of required string using indexOf method and use this position as an argument for listIterator(int) method (and, of course, add 1 to this position to cater for ‘after a specified string’/’specified string not found’ cases.

1
2
3
4
5
6
7
8
9
10
11
public static void cleanStringListSlow( final LinkedList<String> lst, final String first )
{
    final int startPos = lst.indexOf( first ) + 1;
    final ListIterator<String> iter = lst.listIterator( startPos );
    while ( iter.hasNext() )
    {
        if ( iter.next().length() == 5 )
            iter.remove();
    }
}
            
public static void cleanStringListSlow( final LinkedList<String> lst, final String first )
{
    final int startPos = lst.indexOf( first ) + 1;
    final ListIterator<String> iter = lst.listIterator( startPos );
    while ( iter.hasNext() )
    {
        if ( iter.next().length() == 5 )
            iter.remove();
    }
}
            

Unfortunately, we need to iterate 1.25 list length on the average in this example. First of all, indexOf method is iterating the list in order to find required string. In the best case required string is the first in the list, in the worst case it is not present in the list at all, so we need to check all elements of the list. So, on average indexOf call requires iterating 0.5 list length. After that we are passing start position to the listIterator(int) method. It was optimized (as well as other indexed access methods, like get(int), remove(int)), so if index belongs to the first half of the list, iterator will scroll from the beginning of the list internally, otherwise it will go back from the end. In our example, the best case would be if required string is the last element of the list – nothing would be done, because listIterator argument would be equal to length of the list. The worst case is an element from the first half of the list – first of all listIterator(int) will scroll to it from the beginning and then the method loop will scroll to the tail of the list, thus accessing all list elements.

The idea to rewrite such algorithm is to write your own version of indexOf method, which will return listIterator. Let’s agree that returned iterator should point to requested element if it is present in the list (next method will return this element). In case when requested element is not present in the list, iterator should point at the end of the list (hasNext() == false). Let’s also assume that required element is not equal to null.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
public static <T> ListIterator<T> findElem( final List<T> lst, final T elem )
{
    final ListIterator<T> iter = lst.listIterator();
    while ( iter.hasNext() )
    {
        if ( elem.equals( iter.next() ) )
        {
            iter.previous();
            break;
        }
    }
    return iter;
}
            
public static <T> ListIterator<T> findElem( final List<T> lst, final T elem )
{
    final ListIterator<T> iter = lst.listIterator();
    while ( iter.hasNext() )
    {
        if ( elem.equals( iter.next() ) )
        {
            iter.previous();
            break;
        }
    }
    return iter;
}
            

Now previous method could be simplified. We just need to get a new ListIterator in case when required element is not present in the list.

1
2
3
4
5
6
7
8
9
10
11
12
public static void cleanStringListFast( final LinkedList<String> lst, final String first )
{
    ListIterator<String> iter = findElem( lst, first );
    if ( !iter.hasNext() ) //if element is not present - process the full list<
        iter = lst.listIterator();
    while ( iter.hasNext() )
    {
        if ( iter.next().length() == 5 )
            iter.remove();
    }
}
            
public static void cleanStringListFast( final LinkedList<String> lst, final String first )
{
    ListIterator<String> iter = findElem( lst, first );
    if ( !iter.hasNext() ) //if element is not present - process the full list<
        iter = lst.listIterator();
    while ( iter.hasNext() )
    {
        if ( iter.next().length() == 5 )
            iter.remove();
    }
}
            

So, as a rule, do not use any LinkedList methods which accept or return the position of an element in the list. Especially, do not try old style iteration:

1
2
3
4
5
6
7
final List<Integer> lst = new LinkedList<Integer>();
for ( int i = 0; i < 100000; ++i )
    lst.add( i );
long sum = 0;
for ( int i = 0; i < 100000; ++i )
    sum += lst.get( i );
            
final List<Integer> lst = new LinkedList<Integer>();
for ( int i = 0; i < 100000; ++i )
    lst.add( i );
long sum = 0;
for ( int i = 0; i < 100000; ++i )
    sum += lst.get( i );
            

This code takes unexpected 6 seconds to complete! Do not even try to iterate a LinkedList containing a million elements this way. You'll get tired waiting. The only exception to this rule is accessing/removing first or last element of the list (or one of the few first/last elements).

removeFirst/pollFirst

While working with LinkedList, keep in mind that it is not a simple List, but a Deque. Rather often in code using LinkedList I see the following construct:

1
2
3
4
5
6
7
public T next()
{
    if ( lst.isEmpty() )
        return null;
    return lst.removeFirst();
}              
          
public T next()
{
    if ( lst.isEmpty() )
        return null;
    return lst.removeFirst();
}              
          

LinkedList.removeFirst() (as well as LinkedList.remove()) returns first element if the list is not empty or throws NoSuchElementException if the list is empty. This exception is the common reason why removeFirst is guarded is isEmpty call.

Such code is excessive, because LinkedList provides pollFirst method which does exactly the same as above mentioned next method - returns null if the list is empty otherwise the first element. So, right method can save one check and make the code more clear in this case. The same is applicable to removeLast/pollLast pair.

Batch processing

Sometimes you may have a LinkedList which contains some data obtained from the several sources and you need to process data from each source separately. For example, you have a real time network event log ordered by event timestamps. Each element of this log has, for example, IP address property, which specifies network device (computer, router, etc.) where this event has happened. You need to process events related to each IP address separately. Besides, you can't collect information for the long time and process IP addresses separately - it is a real time log, so we can't afford to delay processing for too long (either we have to response to some events not later than N seconds after these events or we have a large network, so it would take too much memory to keep/process all events at once).

For such kind of data, LinkedList is a good solution - we can remove elements from any position of the list quite cheap. First of all we need a method to extract events for particular IP address from the network log. Do not try to select elements and remove them from the list afterwards, even if you already have one of these methods in your library, because it would require 2 iterations instead of one for extraction. We have to use ListIterator in the extraction method because this is the only way to iterate elements and remove some of them from the source list at the same time.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
private static final class LogEvent
{
    public final int ipv4;
    public final long time;
    public final String eventDesc;
 
    private LogEvent( final int ipv4, final long time, final String eventDesc ) {
        this.ipv4 = ipv4;
        this.time = time;
        this.eventDesc = eventDesc;
    }
}
 
private static List<LogEvent> extractForIp( final LinkedList<LogEvent> fullLst, final int ip )
{
    final List<LogEvent> res = new ArrayList<LogEvent>( 10 );
    final ListIterator<LogEvent> iter = fullLst.listIterator();
    while ( iter.hasNext() )
    {
        final LogEvent event = iter.next();
        if ( event.ipv4 == ip )
        {
            res.add( event );
            iter.remove();
        }
    }
    return res;
}
            
private static final class LogEvent
{
    public final int ipv4;
    public final long time;
    public final String eventDesc;

    private LogEvent( final int ipv4, final long time, final String eventDesc ) {
        this.ipv4 = ipv4;
        this.time = time;
        this.eventDesc = eventDesc;
    }
}

private static List<LogEvent> extractForIp( final LinkedList<LogEvent> fullLst, final int ip )
{
    final List<LogEvent> res = new ArrayList<LogEvent>( 10 );
    final ListIterator<LogEvent> iter = fullLst.listIterator();
    while ( iter.hasNext() )
    {
        final LogEvent event = iter.next();
        if ( event.ipv4 == ip )
        {
            res.add( event );
            iter.remove();
        }
    }
    return res;
}
            

Now we are able to process all IP addresses in the current timestamp.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
private static void processIp( final List<LogEvent> lst )
{
    //...processing logic here
}
 
private static void processFirstTimestamp( final LinkedList<LogEvent> fullList )
{
    if ( fullList.isEmpty() )
        return;
    final long firstTime = fullList.getFirst().time;
    while ( !fullList.isEmpty() && fullList.getFirst().time == firstTime )
    {
        final int ip = fullList.getFirst().ipv4;
        final List<LogEvent> eventsForIp = extractForIp( fullList, ip );
        processIp( eventsForIp );
    }
}
            
private static void processIp( final List<LogEvent> lst )
{
    //...processing logic here
}

private static void processFirstTimestamp( final LinkedList<LogEvent> fullList )
{
    if ( fullList.isEmpty() )
        return;
    final long firstTime = fullList.getFirst().time;
    while ( !fullList.isEmpty() && fullList.getFirst().time == firstTime )
    {
        final int ip = fullList.getFirst().ipv4;
        final List<LogEvent> eventsForIp = extractForIp( fullList, ip );
        processIp( eventsForIp );
    }
}
            

Unfortunately, this algorithm will perform poorly when you have a lot of messages/IP addresses in your log. Each time you have to scan all events but you will end up with only a few messages after extraction. The better way is to maintain a map from IP addresses to lists of their events. Both extraction and appending would work much faster in this case. We will use LinkedHashMap in order to keep original order of found IP. If we have removed all entries for some IP address and added some new entries for the same IP address later, this IP address would be added to the tail of iteration order. updateMap method adds all entries from the input list to the map and cleans input list. We need only this method for both initial and subsequent calls.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
private static Map<Integer, List<LogEvent>> extractMap( final List<LogEvent> fullLst )
{
    final Map<Integer, List<LogEvent>> res = new LinkedHashMap<Integer, List<LogEvent>>( 10 );
    updateMap( res, fullLst );
    return res;
}
 
private static void updateMap( final Map<Integer, List<LogEvent>> eventMap, final List<LogEvent> fullLst )
{
    for ( final LogEvent event : fullLst )
    {
        List<LogEvent> lst = eventMap.get( event.ipv4 );
        if ( lst == null )
        {
            lst = new ArrayList<LogEvent>( 10 );
            eventMap.put( event.ipv4, lst );
        }
        lst.add( event );
    }
    fullLst.clear();
}
            
private static Map<Integer, List<LogEvent>> extractMap( final List<LogEvent> fullLst )
{
    final Map<Integer, List<LogEvent>> res = new LinkedHashMap<Integer, List<LogEvent>>( 10 );
    updateMap( res, fullLst );
    return res;
}

private static void updateMap( final Map<Integer, List<LogEvent>> eventMap, final List<LogEvent> fullLst )
{
    for ( final LogEvent event : fullLst )
    {
        List<LogEvent> lst = eventMap.get( event.ipv4 );
        if ( lst == null )
        {
            lst = new ArrayList<LogEvent>( 10 );
            eventMap.put( event.ipv4, lst );
        }
        lst.add( event );
    }
    fullLst.clear();
}
            

Now it is sufficient to call updateMap once in a while, when new log entries were received. They would be added to the correct map entry and map entries order wouldn't be modified. All we need after that is a new processing logic.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
private static void processFirstTimestamp( final Map<Integer, List<LogEvent>> eventMap )
{
    if ( eventMap.isEmpty() )
        return;
    final Iterator<Map.Entry<Integer, List<LogEvent>>> iter = eventMap.entrySet().iterator();
    Long firstTime = null;
    while ( iter.hasNext() )
    {
        final Map.Entry<Integer, List<LogEvent>> entry = iter.next();
        final List<LogEvent> lst = entry.getValue();
        if ( firstTime == null )
            firstTime = lst.get(0).time;
        else if ( lst.get(0).time != firstTime )
            break;
        //extract entries for processing
        iter.remove();
        processIp(lst);
    }
}
            
private static void processFirstTimestamp( final Map<Integer, List<LogEvent>> eventMap )
{
    if ( eventMap.isEmpty() )
        return;
    final Iterator<Map.Entry<Integer, List<LogEvent>>> iter = eventMap.entrySet().iterator();
    Long firstTime = null;
    while ( iter.hasNext() )
    {
        final Map.Entry<Integer, List<LogEvent>> entry = iter.next();
        final List<LogEvent> lst = entry.getValue();
        if ( firstTime == null )
            firstTime = lst.get(0).time;
        else if ( lst.get(0).time != firstTime )
            break;
        //extract entries for processing
        iter.remove();
        processIp(lst);
    }
}
            

A small test method was implemented. It generates 100 entries for each of 1000 IP addresses for each of 50 timestamps. Each time we generate entries for slightly different set of IP addresses. First 5 timestamps are not processed (consider this as some buffering), after that we process one timestamp for each new portion of data (so we are always 5 timestamps late). Finally, all remaining timestamps are processed. Here is a test method for batch mode. Test method for initial approach just doesn't use a map and calls processFirstTimestamp( LinkedList<LogEvent> ) instead.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
private static void testEventsMap()
{
    final long start = System.currentTimeMillis();
    final LinkedList<LogEvent> lst = new LinkedList<LogEvent>();
    final Map<Integer, List<LogEvent>> map = new HashMap<Integer, List<LogEvent>>( 1000 );
    int mlt = 0;
    for ( long t = 0; t < 50; ++t )
    {
        for ( int ip = mlt * 100; ip < 1000 + mlt * 100; ++ip )
        {
            for ( int num = 0; num < 100; ++num )
            {
                final LogEvent event = new LogEvent( ip, t, "Event " + num );
                lst.add( event );
            }
        }
        mlt++;
        if ( mlt > 4 )
            mlt = 0;
        updateMap( map, lst );
        if ( t > 5 )
            processFirstTimestamp( map );
    }
    while ( !map.isEmpty() )
        processFirstTimestamp( map );
    System.out.println( "Total time batch = " + ( System.currentTimeMillis() - start ) );
}
            
private static void testEventsMap()
{
    final long start = System.currentTimeMillis();
    final LinkedList<LogEvent> lst = new LinkedList<LogEvent>();
    final Map<Integer, List<LogEvent>> map = new HashMap<Integer, List<LogEvent>>( 1000 );
    int mlt = 0;
    for ( long t = 0; t < 50; ++t )
    {
        for ( int ip = mlt * 100; ip < 1000 + mlt * 100; ++ip )
        {
            for ( int num = 0; num < 100; ++num )
            {
                final LogEvent event = new LogEvent( ip, t, "Event " + num );
                lst.add( event );
            }
        }
        mlt++;
        if ( mlt > 4 )
            mlt = 0;
        updateMap( map, lst );
        if ( t > 5 )
            processFirstTimestamp( map );
    }
    while ( !map.isEmpty() )
        processFirstTimestamp( map );
    System.out.println( "Total time batch = " + ( System.currentTimeMillis() - start ) );
}
            

Map based implementation completed 35 times faster than simple list based one - 10 seconds against 351 seconds. This is the price for multiple iterations over the full log event list.

See also

Java collections overview - an overview of all standard JDK collections.

Summary

If you need to optimize LinkedList performance in your code, try to stick to these rules:

  • Consider using ArrayDeque for queue-based algorithms
  • Use ListIterator with LinkedList
  • Avoid any LinkedList methods which accept or return index of an element in the list - they have nothing in common with performance
  • Check if you have a reason to use LinkedList.remove/removeFirst/removeLast methods, use pollFirst/pollLast instead
  • Try batch processing LinkedList

Leave a Reply

Your email address will not be published. Required fields are marked *