Forbidden Java actions: object assignments, type conversions etc on the low level in Java

by Mikhail Vorontsov

This article reveals a few details of the low-level Java memory layout: we will see how to implement Object assignments using only primitive types. Then we will look at what is hidden in the array header and convert an array of one type into an array of another type. This article continues the memory allocation discussion from the Various types of memory allocation in Java article. It is also the first article in the “Forbidden Java actions” series.

WARNING: Do not apply these ideas to production code! This article is purely exploratory.

Object assignments on the low level

As we already know from the Memory introspection using sun.misc.Unsafe and reflection article, an Object reference (not its contents!) occupies just 4 bytes on heaps under 32 GB (on a 64-bit HotSpot JVM, where compressed references are the default for such heaps). We will assume 4-byte Object references in this article.
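Since the rest of the article relies on the 4-byte assumption, it may be worth checking it on your own JVM first. The sketch below asks Unsafe for the size of one slot in an Object[]. The class name ReferenceSize and its methods are hypothetical helpers introduced for illustration, not part of the original code.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical helper class, not part of the original article.
final class ReferenceSize {
    private static final Unsafe UNSAFE = loadUnsafe();

    // Same reflection trick as in the article's static definition.
    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    /** Bytes used by one slot of an Object[], i.e. the size of one reference. */
    static int referenceSize() {
        return UNSAFE.arrayIndexScale(Object[].class);
    }

    public static void main(String[] args) {
        System.out.println("Reference size = " + referenceSize() + " bytes");
    }
}
```

If this reports 8 rather than 4, the getInt / putInt calls used in this article would have to become getLong / putLong.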

All examples from this article require the following static definition:

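//imports required at the top of the file:
import java.lang.reflect.Field;
import sun.misc.Unsafe;

private static final Unsafe unsafe;
static
{
    try
    {
        //Unsafe.getUnsafe() rejects application code, so read the singleton field via reflection
        Field field = Unsafe.class.getDeclaredField("theUnsafe");
        field.setAccessible(true);
        unsafe = (Unsafe)field.get(null);
    }
    catch (Exception e)
    {
        throw new RuntimeException(e);
    }
}

//offset of the first element = size of the array header
private static final long longArrayOffset = unsafe.arrayBaseOffset(long[].class);
private static final long intArrayOffset = unsafe.arrayBaseOffset(int[].class);
private static final long integerArrayOffset = unsafe.arrayBaseOffset(Integer[].class);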

Let’s allocate a small Integer[ 4 ] array and populate it with 1, 2, 3 and 4. After that let’s see what’s stored in the data cells of the array. We will read them as int values:

final Integer[] ar = new Integer[ 4 ];
ar[ 0 ] = 1;
ar[ 1 ] = 2;
ar[ 2 ] = 3;
ar[ 3 ] = 4;

//object references occupy 4 bytes on heaps under 32G
int[] ptrs = new int[ 4 ];
for ( int i = 0; i < 4; i++)
{
    ptrs[ i ] = unsafe.getInt(ar, integerArrayOffset + i * 4);
    System.out.println("Integer[" + i + "] = " + Integer.toHexString( ptrs[ i ] ) );
}

Here is the output from my computer. The values will differ from run to run, but most likely the difference between adjacent cells will be 16, which is the actual memory footprint of an Integer.

Integer[0] = f004e880
Integer[1] = f004e870
Integer[2] = f004e860
Integer[3] = f004e850
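To double-check the distance between the cells, we can subtract the printed values from each other. This is a small sketch using plain arithmetic on the output above; the class CellDistance and its gaps method are hypothetical names used only for this illustration.

```java
import java.util.Arrays;

// Hypothetical helper class, not part of the original article.
final class CellDistance {
    /** Parses unsigned hex strings and returns the gap between each value and the next one. */
    static long[] gaps(String... hexValues) {
        long[] result = new long[Math.max(0, hexValues.length - 1)];
        for (int i = 1; i < hexValues.length; i++) {
            long previous = Long.parseLong(hexValues[i - 1], 16);
            long current = Long.parseLong(hexValues[i], 16);
            // The printed values decrease, so subtract the later one from the earlier one.
            result[i - 1] = previous - current;
        }
        return result;
    }

    public static void main(String[] args) {
        long[] gaps = gaps("f004e880", "f004e870", "f004e860", "f004e850");
        System.out.println(Arrays.toString(gaps)); // prints [16, 16, 16]
    }
}
```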

Unfortunately, these values are not pure pointers (in the C sense) - trying to use them with unsafe.getInt( long address ) will cause an access violation.

On the other hand, we can still work in terms of these values. For example, we can implement an assignment ar[ 0 ] = ar[ 1 ]:

System.out.println( "Before change: ar = " + Arrays.toString( ar ) );

//ar[ 0 ] = ar[ 1 ];
unsafe.putInt( ar, integerArrayOffset, ptrs[ 1 ] );

System.out.println( "After change: ar = " + Arrays.toString( ar ) );

This code snippet (it should be added after the previous one) will print:

Before change: ar = [1, 2, 3, 4]
After change: ar = [2, 2, 3, 4]
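The same idea can be checked from the other direction: after an ordinary Java assignment, the two slots should hold identical raw values. The sketch below is an illustration only, under the assumption that the garbage collector does not move the object between the two reads. It also does not rely on 4-byte references, because it picks getInt or getLong from the slot size. RawReferences and rawReference are hypothetical names.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical helper class, not part of the original article.
final class RawReferences {
    private static final Unsafe UNSAFE = loadUnsafe();
    // All reference arrays share one layout, so Object[] stands in for Integer[] here.
    private static final long BASE = UNSAFE.arrayBaseOffset(Object[].class);
    private static final long SCALE = UNSAFE.arrayIndexScale(Object[].class);

    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    /** Returns the raw bits stored in the given slot, for 4-byte or 8-byte references. */
    static long rawReference(Object[] array, int index) {
        if (index < 0 || index >= array.length) {
            throw new ArrayIndexOutOfBoundsException(index);
        }
        long offset = BASE + index * SCALE;
        return SCALE == 4
                ? UNSAFE.getInt(array, offset) & 0xFFFFFFFFL // read as unsigned 32-bit
                : UNSAFE.getLong(array, offset);
    }

    public static void main(String[] args) {
        Integer[] ar = {1, 2, 3, 4};
        ar[0] = ar[1]; // ordinary Java assignment
        System.out.println("Same raw value: " + (rawReference(ar, 0) == rawReference(ar, 1)));
    }
}
```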

Java array header inspection and update

In this part of the article we will inspect a Java array header structure. After that we will try to convert long[] into an int[] using raw memory updates in the header.

Let's allocate two arrays of the same length but of different types - one long[ 1000 ] and one int[ 1000 ]. We will use them to find out where the type and the length are stored in the array header. The sizes of the array headers are equal to the longArrayOffset / intArrayOffset constants from the static definition above.

final long[] ar = new long[ 1000 ];
System.out.println( "long[] header: size = " + longArrayOffset );
System.out.println( "First 8 bytes = " + Long.toHexString( unsafe.getLong( ar, 0L ) ) );
//check what's in the array header
for ( int i = 8; i < longArrayOffset; i+=2 )
{
    System.out.print( ( unsafe.getShort(ar, (long) i) & 0xFFFF ) + " ");
}
System.out.println();

final int[] ar1 = new int[ 1000 ];
System.out.println( "int[] header: size = " + intArrayOffset );
System.out.println( "First 8 bytes = " + Long.toHexString( unsafe.getLong( ar1, 0L ) ) );
//check what's in the array header
for ( int i = 8; i < intArrayOffset; i+=2 )
{
    System.out.print( ( unsafe.getShort(ar1, (long) i) & 0xFFFF ) + " ");
}
System.out.println();

Here is the output of this code snippet:

long[] header: size = 16
First 8 bytes = 1
5072 60128 1000 0

int[] header: size = 16
First 8 bytes = 1
4496 60128 1000 0

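We can conclude that the last 4 bytes of the header contain the array length (which is logical - an extra attribute compared to a plain Java Object should be at the end of the header). The object type is stored in either 2 or 4 bytes starting at offset = 8: the two headers differ only in the first of the two 16-bit values printed there (5072 vs 4496).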
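Instead of reading the offset of the length field off the printed output, we can search for it. The sketch below allocates two int[] arrays with distinctive lengths and scans the header in 4-byte steps for the offset at which each array stores its own length. It assumes the length is a 4-byte-aligned int inside the header; ArrayLengthOffset and its methods are hypothetical names.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical helper class, not part of the original article.
final class ArrayLengthOffset {
    private static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    /** Size of the int[] header, i.e. the offset of element 0. */
    static long headerSize() {
        return UNSAFE.arrayBaseOffset(int[].class);
    }

    /** Returns the header offset at which both probe arrays store their length, or -1. */
    static long findLengthOffset() {
        int[] small = new int[1234];
        int[] large = new int[56789];
        long headerSize = headerSize();
        for (long offset = 0; offset + 4 <= headerSize; offset += 4) {
            // Two different lengths make an accidental match very unlikely.
            if (UNSAFE.getInt(small, offset) == small.length
                    && UNSAFE.getInt(large, offset) == large.length) {
                return offset;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("int[] header size = " + headerSize());
        System.out.println("length offset = " + findLengthOffset());
    }
}
```

With the 16-byte header shown above, this search should report offset 12, but the result depends on the JVM and its settings.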

Let's convert a long[ 1000 ] into an int[ 2000 ] and see whether this can be done. We need to take the int representing the int[] type from the helper int[] header at offset=8 and write it at the same offset in the long[] header. Then, to keep all of the long[] memory accessible, we need to increase the array length to 2000 (an int is half the size of a long).

//let's try to change long[] into int[]
//change type: copy the 4-byte type value from the helper int[] header
unsafe.putInt( ar, 8L, unsafe.getInt( ar1, 8L ) );
//change length (*2): the length is a 4-byte int at offset 12
unsafe.putInt( ar, 12L, unsafe.getInt( ar, 12L ) * 2 );

System.out.println( "After change from long[] to int[]");
System.out.println( ar.getClass().getName() );
System.out.println( ar.length );

final Object temp = ar;
final int[] newAr = (int[]) temp;
System.out.println( "int[] length using int[] reference = " + newAr.length );
for ( int i = 0; i < newAr.length; ++i )
{
    newAr[ i ] = -1;
    if ( newAr[ i ] != -1 )
        System.out.println( "Failed at pos = " + i );
}

After this change we can see that the type of the original array ar has changed to int[] (according to Object.getClass()) and its length is now 2000. Do not try to access elements at index 1000 and above through the original long[] reference ar - they lie beyond the allocated memory, and you will get an access violation.
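The new length follows from keeping the number of data bytes constant: 1000 elements * 8 bytes = 8000 bytes = 2000 elements * 4 bytes. Here is a minimal sketch of that arithmetic for arbitrary element sizes; ConvertedLength and convertedLength are hypothetical names. It rounds down, so the converted array never extends past the originally allocated memory.

```java
// Hypothetical helper class, not part of the original article.
final class ConvertedLength {
    /**
     * Number of elements of size newElementSize that fit into the data area of an
     * array holding oldLength elements of size oldElementSize (sizes in bytes).
     */
    static int convertedLength(int oldLength, int oldElementSize, int newElementSize) {
        if (oldLength < 0 || oldElementSize <= 0 || newElementSize <= 0) {
            throw new IllegalArgumentException("length must be >= 0 and sizes must be > 0");
        }
        long totalBytes = (long) oldLength * oldElementSize;
        long newLength = totalBytes / newElementSize; // rounds down, never grows the data area
        if (newLength > Integer.MAX_VALUE) {
            throw new ArithmeticException("converted length does not fit into an int");
        }
        return (int) newLength;
    }

    public static void main(String[] args) {
        // long[1000] -> int[]: 8000 bytes hold 2000 ints
        System.out.println(convertedLength(1000, Long.BYTES, Integer.BYTES)); // prints 2000
    }
}
```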

The compiler does not allow a direct cast from long[] to int[], but a reference of any type can be cast to any other reference type via an intermediate Object variable. We use this for the long[] -> int[] conversion. The last loop checks that all 2000 elements of the "new" array are readable and writable.
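It helps to see what the same cast does to an array whose header has not been touched. The sketch below uses no Unsafe at all: the cast through Object compiles, but the runtime type check rejects it with a ClassCastException. This is why the type value in the header has to be changed before the cast. CastViaObject and isRejectedAsIntArray are hypothetical names.

```java
// Hypothetical helper class, not part of the original article.
final class CastViaObject {
    /** Returns true if casting the given long[] to int[] via Object fails at run time. */
    static boolean isRejectedAsIntArray(long[] source) {
        Object temp = source; // widening to Object always compiles
        try {
            Object ignored = (int[]) temp; // compiles, but is checked at run time
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRejectedAsIntArray(new long[1000])); // prints true
    }
}
```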

Summary

  • All Java object references occupy 4 bytes on heaps under 32 GB. You can use sun.misc.Unsafe to treat such references as int fields.
  • Java arrays store the element type as an int at offset=8 in the array header. The length (int) is stored at offset=12. Changing these values is possible, but care must be taken not to extend the updated array beyond the initially allocated memory.