Tag Archives: array

Forbidden Java actions: object assignments, type conversions etc on the low level in Java

by Mikhail Vorontsov

This article will reveal you a few details about the low level Java memory layout: we will see how to implement Object assignments using just primitive types. Then we will see what’s hidden in the array header and will convert an array of one type into an array of another type. This article continues the memory allocation discussion from Various types of memory allocation in Java article. It is also the first article in the “Forbidden Java actions” series.

WARNING: It is not recommended to apply these ideas to a production code! This article has only exploratory tasks.

Object assignments on the low level

As we already know from the Memory introspection using sun.misc.Unsafe and reflection article, Object reference (not its contents!) occupies just 4 bytes on the under 32Gb heaps. We will assume 4 byte Object references in this article.

All examples from this article require the following static definition:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
private static final Unsafe unsafe;
static
{
    try
    {
        Field field = Unsafe.class.getDeclaredField("theUnsafe");
        field.setAccessible(true);
        unsafe = (Unsafe)field.get(null);
    }
    catch (Exception e)
    {
        throw new RuntimeException(e);
    }
}
 
private static final long longArrayOffset = unsafe.arrayBaseOffset(long[].class);
private static final long intArrayOffset = unsafe.arrayBaseOffset(int[].class);
private static final long integerArrayOffset = unsafe.arrayBaseOffset(Integer[].class);
private static final Unsafe unsafe;
static
{
    try
    {
        Field field = Unsafe.class.getDeclaredField("theUnsafe");
        field.setAccessible(true);
        unsafe = (Unsafe)field.get(null);
    }
    catch (Exception e)
    {
        throw new RuntimeException(e);
    }
}

private static final long longArrayOffset = unsafe.arrayBaseOffset(long[].class);
private static final long intArrayOffset = unsafe.arrayBaseOffset(int[].class);
private static final long integerArrayOffset = unsafe.arrayBaseOffset(Integer[].class);

Let’s allocate a small Integer[ 4 ] array and populate it with 1, 2, 3 and 4. After that let’s see what’s stored in the data cells of the array. We will read them as int values:

1
2
3
4
5
6
7
8
9
10
11
12
13
final Integer[] ar = new Integer[ 4 ];
ar[ 0 ] = 1;
ar[ 1 ] = 2;
ar[ 2 ] = 3;
ar[ 3 ] = 4;
 
//objects occupy 4 bytes for under 32G heaps
int[] ptrs = new int[ 4 ];
for ( int i = 0; i < 4; i++)
{
    ptrs[ i ] = unsafe.getInt(ar, integerArrayOffset + i * 4);
    System.out.println("Integer[" + i + "] = " + Integer.toHexString( ptrs[ i ] ) );
}
final Integer[] ar = new Integer[ 4 ];
ar[ 0 ] = 1;
ar[ 1 ] = 2;
ar[ 2 ] = 3;
ar[ 3 ] = 4;

//objects occupy 4 bytes for under 32G heaps
int[] ptrs = new int[ 4 ];
for ( int i = 0; i < 4; i++)
{
    ptrs[ i ] = unsafe.getInt(ar, integerArrayOffset + i * 4);
    System.out.println("Integer[" + i + "] = " + Integer.toHexString( ptrs[ i ] ) );
}

Here is an output from my computer. It will be different each time, but most likely the difference between cells will be equal to 16, which is an actual Integer memory footprint.

Integer[0] = f004e880
Integer[1] = f004e870
Integer[2] = f004e860
Integer[3] = f004e850

Unfortunately, these values are not pure pointers (in terms of C) – trying to use them with unsafe.getInt( long address ) will cause an access violation.

On the other hand, we can still work in terms of these values. For example, we can implement an assignment ar[ 0 ] = ar[ 1 ]:

Continue reading

Various types of memory allocation in Java

by Mikhail Vorontsov

This article discusses various types of a memory buffer allocation in Java. We will see how to treat any sort of Java buffer uniformly using sun.misc.Unsafe memory access methods. This article may be especially interesting for ex-C programmers willing to work with the memory on the lowest possible level in Java.

If you are more interested in general Java memory optimization, take a look at An overview of memory saving techniques in Java article in this blog as well as its following parts: one, two.

Array allocation limitations

Array size in Java is limited by the fact of using int as an array index. This means that you can not allocate an array with more than Integer.MAX_VALUE ( = 2^31 - 1 ) elements. This doesn’t mean that the longest chunk of memory you can allocate in Java is 2 Gb. You can allocate an array of bigger type instead. For example,

1
final long[] ar = new long[ Integer.MAX_VALUE ];
final long[] ar = new long[ Integer.MAX_VALUE ];

will allocate 16Gb - 8 bytes, if you have sufficiently high -Xmx Java setting (usually you should have about 50% more memory in heap – so in order to allocate 16Gb buffer, you will have to specify -Xmx24G (this is a general rule, actual required heap size may vary).

Unfortunately, you will be limited by your array element type in pure Java. The only useful class for working with arrays is a ByteBuffer, which offers methods for getting/writing various Java data types in the buffer (see Various methods of binary serialization in Java for more details). The disadvantage of a ByteBuffer – you are limited with byte[] as a source array type, which means a limitation of 2Gb for your buffer.

Treating any arrays as a byte buffer

For a while let’s assume that 2Gb buffers were not sufficient for our needs, but a 16Gb buffer will make us happy. We have allocated a long[], but want to treat this buffer as a byte array. We need to use a best C programmer friend in Java – sun.misc.Unsafe. This class has 2 sets of methods: getN( Object, offset ), where N is a result type for reading a value of given type from the given offset in the object and putN( Object, offset, value ) for writing a value at a given offset.

Unfortunately, these methods set or get only an individual value. If you want to copy data to/from an array, you will need one more Unsafe method: copyMemory(srcObject, srcOffset, destObject, destOffset, count). It works similar to System.arraycopy, but copies bytes instead of array elements.

In order to access array data using sun.misc.Unsafe, you will need 2 components:

Continue reading

Java collections overview

by Mikhail Vorontsov

This article will give you an overview of all standard Java collections. We will categorize their distinguishable properties and main use cases. Besides that, we will list all correct ways of transforming your data between various collection types.

Arrays

Array is the only collection type built in Java. It is useful when you know an upper bound on the number of processed elements in advance. java.util.Arrays contains a lot of useful methods for array processing:

  • Arrays.asList – conversion from array to List, which could be passed to other standard collection constructors.
  • Arrays.binarySearch – fast lookup in a sorted array or its subsection.
  • Arrays.copyOf – use this method if you need to expand your array while keeping its contents.
  • Arrays.copyOfRange – if you need to make a copy of the whole array or its subsection.
  • Arrays.deepEquals, Arrays.deepHashCode – versions of Arrays.equals/hashCode supporting nested sub-arrays.
  • Arrays.equals – if you need to compare two arrays for equality, use this method instead of array equals method ( array.equals is not overridden in any array, so it only compares references to arrays, rather than their contents ). This method may be combined with Java 5 boxing and varargs in order to write a simple implementation of your class equals method – just pass all your class fields to Arrays.equals after comparing object types.
  • Arrays.fill – populate the whole array or its subsection with a given value.
  • Arrays.hashCode – useful method for calculating a hashcode of array contents ( array own hashcode method can not be used for this purpose ). This method may be combined with Java 5 boxing and varargs in order to write a simple implementation of your class hashcode method – just pass all your class fields to Arrays.hashcode.
  • Arrays.sort – sort the full array or its subsection using natural ordering. There is also a pair of Arrays.sort methods for sorting an Object[] using a provided Comparator.
  • Arrays.toString – fine-print the array contents.

If you need to copy one part of array (or the whole array) into another already existing array, you need to use System.arraycopy, which copies a given number of elements from a given position in the source array to a given position in the destination array. Generally, this is the fastest way to copy array contents in Java (but in some cases you may need to check if ByteBuffer bulk copy works faster ).

Finally, we need to mention that any Collection could be copied into an array using T[] Collection.toArray( T[] a ) method. The usual pattern of this method call:

1
return coll.toArray( new T[ coll.size() ] );
return coll.toArray( new T[ coll.size() ] );

Such method call allocates a sufficient array for storing the whole collection, so that toArray doesn’t have to allocate sufficiently large array to return.

Single-threaded collections

This part of the article describes non-thread-safe collections. All these collections are stored in the java.util package. Some of these collections were present in Java since Java 1.0 (and are now deprecated), most of them were already present in Java 1.4. Enum collections were added in Java 1.5 with the support of generics in all collection classes. PriorityQueue was also added in Java 1.5. The latest addition to the non-thread-safe collections framework is ArrayDeque, which was added in Java 1.6.

Lists

  • ArrayList – the most useful List implementation. Backed by an array and an int – position of the first not used element in the array. Like all Lists, expands itself when necessary. Has constant element access time. Cheap updates at the tail (constant complexity), expensive at the head (linear complexity) due to ArrayList invariant – all elements start from index = 0 in the underlying array, which means that everything to the right from the update position must be moved to the right for insertions and to the left for removals. CPU-cache friendly collection due to being backed by an array (unfortunately, not too friendly, because contains Objects, which are just pointers to the actual objects).
  • LinkedListDeque implementation – each Node consists of a value, prev and next pointers. It means that element access/updates have linear complexity (due to an optimization, these methods do not traverse more than a half of the list, so the most expensive elements are located in the middle of the list). You need to use ListIterators if you want to try to write fast LinkedList code. If you want a Queue/Deque implementation (you need to access only first and last elements) – consider using ArrayDeque instead.
  • Vector – a prehistoric version of ArrayList with all synchronized methods. Use ArrayList instead.

Continue reading