Use case: compacting price field disk representation

by Mikhail Vorontsov

In this article I’ll describe a small trick you can use while working with prices (at least, in countries with rather expensive currency).

There are 2 well-known ways to store price in memory and on disk:

  • In smallest currency units of your currency, using integral type of your choice
  • Keep it as double and don’t worry about number of smallest currency units in any currency. This approach has advantage if you are writing a system which supports prices in fractions of smallest currency unit (for example, sub-penny prices on a large number of stock exchanges).

Impact on compression

If you will ever need to compress your data as a stream, when integral types have one crucial advantage: they are shorter and most prices use only a small number of bits from a corresponding datatype. Floating point types, on the other hand, have sign, exponent and mantissa at various places of a datatype binary representation (see IEEE-754 standard for details), which generally makes data compression less effective.

For a test, we will generate 10M short values between 0 and 9999, convert them to 10M double values between 0 and 99.99 (by dividing by 100). After that each set of numbers will be written to GZIPOutputStream(ByteArrayOutputStream) and we will compare resulting stream sizes.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
private static void testShortVsDouble() throws IOException {
    final short[] shorts = generatePrices();
    final double[] doubles = convert( shorts );
    //test shorts
    {
        final ByteArrayOutputStream bos = new ByteArrayOutputStream( 1024 * 1024 );
        final DataOutputStream dos = new DataOutputStream( new GZIPOutputStream( bos, 65536 ) );
        try
        {
            for ( final short val : shorts )
                dos.writeShort( val );
        }
        finally {
            dos.close();
        }
        System.out.println( "Short stream compressed: " + bos.size() + " bytes");
    }
    //test doubles
    {
        final ByteArrayOutputStream bos = new ByteArrayOutputStream( 1024 * 1024 );
        final DataOutputStream dos = new DataOutputStream( new GZIPOutputStream( bos, 65536 ) );
        try
        {
            for ( final double val : doubles )
                dos.writeDouble(val);
        }
        finally {
            dos.close();
        }
        System.out.println( "Double stream compressed: " + bos.size() + " bytes");
    }
}
 
//convert short price in cents to double price in dollars
private static double[] convert( final short[] data )
{
    final double[] res = new double[ data.length ];
    for ( int i = 0; i < data.length; ++i )
        res[ i ] = data[ i ] / 100.0;
    return res;
}
 
private static short[] generatePrices()
{
    final Random r = new Random( 1 ); //constant is used in order to make result reproducible
    final short[] res = new short[ ITEMS_COUNT ];
    for ( int i = 0; i < ITEMS_COUNT; ++i )
        res[ i ] = (short) r.nextInt( 9999 );
    return res;
}
private static void testShortVsDouble() throws IOException {
    final short[] shorts = generatePrices();
    final double[] doubles = convert( shorts );
    //test shorts
    {
        final ByteArrayOutputStream bos = new ByteArrayOutputStream( 1024 * 1024 );
        final DataOutputStream dos = new DataOutputStream( new GZIPOutputStream( bos, 65536 ) );
        try
        {
            for ( final short val : shorts )
                dos.writeShort( val );
        }
        finally {
            dos.close();
        }
        System.out.println( "Short stream compressed: " + bos.size() + " bytes");
    }
    //test doubles
    {
        final ByteArrayOutputStream bos = new ByteArrayOutputStream( 1024 * 1024 );
        final DataOutputStream dos = new DataOutputStream( new GZIPOutputStream( bos, 65536 ) );
        try
        {
            for ( final double val : doubles )
                dos.writeDouble(val);
        }
        finally {
            dos.close();
        }
        System.out.println( "Double stream compressed: " + bos.size() + " bytes");
    }
}

//convert short price in cents to double price in dollars
private static double[] convert( final short[] data )
{
    final double[] res = new double[ data.length ];
    for ( int i = 0; i < data.length; ++i )
        res[ i ] = data[ i ] / 100.0;
    return res;
}

private static short[] generatePrices()
{
    final Random r = new Random( 1 ); //constant is used in order to make result reproducible
    final short[] res = new short[ ITEMS_COUNT ];
    for ( int i = 0; i < ITEMS_COUNT; ++i )
        res[ i ] = (short) r.nextInt( 9999 );
    return res;
}

Test output is:

Short stream compressed: 18385584 bytes
Double stream compressed: 26267355 bytes
    

Floating point values take ~50% more space to store in compressed form despite the fact they store the same amount of information.

Why expensive currency units matter

Why did I specifically mentioned that we may benefit from expensive currency units? Because a very large share of prices in such currencies is between 0 and 99.99 (take a look in your local grocery or even department store - you'll see). Even in case of stock exchanges - most of share prices are still in this range. This means that short datatype could be used to keep prices between 0 and 99.99 if they have no more than 2 decimals digits. For all other prices we can still use a double. Of course, this is only applicable for stream data, systems using databases are not so flexible.

Experienced stock exchange developer will certainly ask about derivatives, which can have a negative price (which is calculated as a difference between 2 underlying instruments). It is not a problem for this approach too - short still have enough space to store all prices between -99.99 and 99.99, provided that we don't have more than 2 decimal digits. Logic in this case will require extra space in order to distinguish between short and double datatypes.

Let's start with a simple case - all prices are non-negative. For simplicity, we will use big endian byte order, which has a benefit of all sign bytes being the first in the datatype memory representation.

We will utilize the fact that prices are non-negative and we are using big-endian byte order. If price is between 0 and 99.99 and has no more than 2 decimal digits, we will convert it into a number of cents (multiply by 100) and write as short, otherwise we will write -price as double. When reading data, it will be enough to read a short and check if it is non-negative. For non-negative values we will return them divided by 100 (remember, we multiplied them while writing). If our short turned out to be negative, it means that we should have actually read a double. We will go back by 2 bytes and read -double (we have negated it will writing):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
private static final BigDecimal HUNDRED = new BigDecimal( "100" );
 
private static void writePriceUnsigned( final ByteBuffer buffer, final double price )
{
    //decimal32 because we are only interested in 1-4 digits price. They easily fit in 32 bits
    final BigDecimal bdVal = new BigDecimal( price, MathContext.DECIMAL32 ).stripTrailingZeros();
    if ( bdVal.precision() > 4 || bdVal.scale() > 2
            || bdVal.compareTo( HUNDRED ) >=0 || bdVal.compareTo( BigDecimal.ZERO ) < 0 )
    {
        //for long values - write -price
        buffer.putDouble( -price );
    }
    else
    {
        //we have up to 4 digits long number with up to 2 decimal digits. Multiply it by 100 to get cents.
        final BigDecimal scaled = bdVal.scaleByPowerOfTen( 2 );
        try
        {
            buffer.putShort( scaled.shortValueExact() );
        }
        catch ( ArithmeticException ex )
        {
            buffer.putDouble( -price );
        }
    }
}
 
private static double readPriceUnsigned( final ByteBuffer buf )
{
    //read short. If it is positive - we are done. If it is negative, go back by 2 bytes and read -double.
    final short shortRes = buf.getShort();
    if ( shortRes >= 0 )
    {
        return shortRes / 100.0;
    }
    buf.position( buf.position() - 2 );
    return -buf.getDouble();
}
 
private static void testUnsignedPrices()
{
    final double[] testValues = { 100, 10, 0.075, 0.999, 0.07, 98.82, 0.7, 0.1, 1, 7, 0.17, 0.77, 0.007 };
    final ByteBuffer buf = ByteBuffer.allocate( testValues.length * 8 ).order( ByteOrder.BIG_ENDIAN );
    //write test values to buffer
    for ( final double d : testValues )
        writePriceUnsigned( buf, d );
    buf.flip();
    //read them back, assure they are the same
    for ( double testValue : testValues )
    {
        final double v = readPriceUnsigned(buf);
        if (v != testValue) {
            System.err.println("Failed for " + testValue);
        }
    }
}
private static final BigDecimal HUNDRED = new BigDecimal( "100" );

private static void writePriceUnsigned( final ByteBuffer buffer, final double price )
{
    //decimal32 because we are only interested in 1-4 digits price. They easily fit in 32 bits
    final BigDecimal bdVal = new BigDecimal( price, MathContext.DECIMAL32 ).stripTrailingZeros();
    if ( bdVal.precision() > 4 || bdVal.scale() > 2
            || bdVal.compareTo( HUNDRED ) >=0 || bdVal.compareTo( BigDecimal.ZERO ) < 0 )
    {
        //for long values - write -price
        buffer.putDouble( -price );
    }
    else
    {
        //we have up to 4 digits long number with up to 2 decimal digits. Multiply it by 100 to get cents.
        final BigDecimal scaled = bdVal.scaleByPowerOfTen( 2 );
        try
        {
            buffer.putShort( scaled.shortValueExact() );
        }
        catch ( ArithmeticException ex )
        {
            buffer.putDouble( -price );
        }
    }
}

private static double readPriceUnsigned( final ByteBuffer buf )
{
    //read short. If it is positive - we are done. If it is negative, go back by 2 bytes and read -double.
    final short shortRes = buf.getShort();
    if ( shortRes >= 0 )
    {
        return shortRes / 100.0;
    }
    buf.position( buf.position() - 2 );
    return -buf.getDouble();
}

private static void testUnsignedPrices()
{
    final double[] testValues = { 100, 10, 0.075, 0.999, 0.07, 98.82, 0.7, 0.1, 1, 7, 0.17, 0.77, 0.007 };
    final ByteBuffer buf = ByteBuffer.allocate( testValues.length * 8 ).order( ByteOrder.BIG_ENDIAN );
    //write test values to buffer
    for ( final double d : testValues )
        writePriceUnsigned( buf, d );
    buf.flip();
    //read them back, assure they are the same
    for ( double testValue : testValues )
    {
        final double v = readPriceUnsigned(buf);
        if (v != testValue) {
            System.err.println("Failed for " + testValue);
        }
    }
}

Now we can add an extra test to testShortVsDouble, which will use writePriceUnsigned and readPriceUnsigned. It will write the same number of bytes as in case of short[], because all the data we initially generated fitting in 0-99.99 range.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
//test doubles converted to unsigned shorts
{
    final ByteBuffer buf = ByteBuffer.allocate( doubles.length * 8 ).order( ByteOrder.BIG_ENDIAN );
    for ( final double val : doubles )
        writePriceUnsigned( buf, val );
 
    final ByteArrayOutputStream bos = new ByteArrayOutputStream( 1024 * 1024 );
    final OutputStream os = new GZIPOutputStream( bos, 65536 );
    try
    {
        os.write( buf.array(), 0, buf.position() );
    }
    finally {
        os.close();
    }
    System.out.println( "Double stream compressed as unsigned shorts: " + bos.size() + " bytes");
}
//test doubles converted to unsigned shorts
{
    final ByteBuffer buf = ByteBuffer.allocate( doubles.length * 8 ).order( ByteOrder.BIG_ENDIAN );
    for ( final double val : doubles )
        writePriceUnsigned( buf, val );

    final ByteArrayOutputStream bos = new ByteArrayOutputStream( 1024 * 1024 );
    final OutputStream os = new GZIPOutputStream( bos, 65536 );
    try
    {
        os.write( buf.array(), 0, buf.position() );
    }
    finally {
        os.close();
    }
    System.out.println( "Double stream compressed as unsigned shorts: " + bos.size() + " bytes");
}

Now let's examine a more complicated case of possibly negative prices. For stock exchange data, we may have such prices on any derivative instrument with 2 legs. Price of such instruments is usually calculated as price_instrument_1 - price_instrument_2, so it may be negative if instrument2 is more expensive than instrument1.

Because our prices now can have a sign, we can't use bit 7 in the first byte that easily anymore. Instead we will use the fact that numbers in range [-9999, 9999] require only 14 bits for their storage. This will allow us to use bit 15 of short value as a flag - if it is set, read 8 following bytes as double, otherwise read remaining 15 bits as packed short: bit 14 will be a sign, bits 0-13 - abs(value).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
private static final BigDecimal HUNDRED = new BigDecimal( "100" );
private static final BigDecimal MINUS_HUNDRED = new BigDecimal( "-100" );
private static final int BIT14 = 0x4000;
 
//bit 15=0, bit 14=sign, bits 13-0: abs(value) (enough to fit -16384 : 16383)
private static short pack9999( final short val )
{
    if ( val >= 0 )
        return val;
    final short unsigned = (short) -val;
    return (short) (BIT14 | unsigned);  //0x4000 - bit 14 set
}
 
private static short unpack9999( final short val )
{
    final int sign = val & BIT14;
    final int value = val & 0x3FFF;
    return (short) (sign == 0 ? value : -value);
}
 
private static void writePriceSigned( final ByteBuffer buffer, final double price )
{
    //decimal32 because we are only interested in 1-4 digits price. They easily fit in 32 bits
    final BigDecimal bdVal = new BigDecimal( price, MathContext.DECIMAL32 ).stripTrailingZeros();
    //precision = 4 - no more than 4 digits; scale=2 - no more than 2 decimal digits (disallow 0.375)
    if ( bdVal.precision() > 4 || bdVal.scale() > 2 || bdVal.compareTo( MINUS_HUNDRED ) <= 0 ||
            bdVal.compareTo( HUNDRED ) >= 0 )
    {
        //for long values - write price with negative sign
        buffer.put((byte) 0xFF);
        buffer.putDouble( price );
    }
    else
    {
        //we have up to 4 digits long number with up to 2 decimal digits. Multiply it by 100 to get cents.
        final BigDecimal scaled = bdVal.scaleByPowerOfTen( 2 );
        try
        {
            buffer.putShort( pack9999( scaled.shortValueExact() ) );
        }
        catch ( ArithmeticException ex )
        {
            buffer.put((byte) 0xFF);
            buffer.putDouble( price );
        }
    }
}
 
private static double readPriceSigned( final ByteBuffer buffer )
{
    final short shortRes = buffer.getShort();
    if ( shortRes >= 0 )
        return unpack9999( shortRes ) / 100.0;
    //go 1 byte back and read a double value
    buffer.position( buffer.position() - 1 );
    return buffer.getDouble();
}
 
private static void testSignedPrices()
{
    final double[] testValues = { 0.075, 0.999, -0.999, 0.07, -0.07, 98.82, -98.82, 0.7, 0.1,
            1, 7, 0.17, 0.77, 0.007, 100, -100, 101, -101, 0.001, 10, -10 };
    final ByteBuffer buf = ByteBuffer.allocate( testValues.length * 8 ).order( ByteOrder.BIG_ENDIAN );
    //write test values to buffer
    for ( final double d : testValues )
        writePriceSigned(buf, d);
    buf.flip();
    //read them back, assure they are the same
    for ( double testValue : testValues )
    {
        final double v = readPriceSigned(buf);
        if (v != testValue) {
            System.err.println("Failed for " + testValue);
        }
    }
}
private static final BigDecimal HUNDRED = new BigDecimal( "100" );
private static final BigDecimal MINUS_HUNDRED = new BigDecimal( "-100" );
private static final int BIT14 = 0x4000;

//bit 15=0, bit 14=sign, bits 13-0: abs(value) (enough to fit -16384 : 16383)
private static short pack9999( final short val )
{
    if ( val >= 0 )
        return val;
    final short unsigned = (short) -val;
    return (short) (BIT14 | unsigned);  //0x4000 - bit 14 set
}

private static short unpack9999( final short val )
{
    final int sign = val & BIT14;
    final int value = val & 0x3FFF;
    return (short) (sign == 0 ? value : -value);
}

private static void writePriceSigned( final ByteBuffer buffer, final double price )
{
    //decimal32 because we are only interested in 1-4 digits price. They easily fit in 32 bits
    final BigDecimal bdVal = new BigDecimal( price, MathContext.DECIMAL32 ).stripTrailingZeros();
    //precision = 4 - no more than 4 digits; scale=2 - no more than 2 decimal digits (disallow 0.375)
    if ( bdVal.precision() > 4 || bdVal.scale() > 2 || bdVal.compareTo( MINUS_HUNDRED ) <= 0 ||
            bdVal.compareTo( HUNDRED ) >= 0 )
    {
        //for long values - write price with negative sign
        buffer.put((byte) 0xFF);
        buffer.putDouble( price );
    }
    else
    {
        //we have up to 4 digits long number with up to 2 decimal digits. Multiply it by 100 to get cents.
        final BigDecimal scaled = bdVal.scaleByPowerOfTen( 2 );
        try
        {
            buffer.putShort( pack9999( scaled.shortValueExact() ) );
        }
        catch ( ArithmeticException ex )
        {
            buffer.put((byte) 0xFF);
            buffer.putDouble( price );
        }
    }
}

private static double readPriceSigned( final ByteBuffer buffer )
{
    final short shortRes = buffer.getShort();
    if ( shortRes >= 0 )
        return unpack9999( shortRes ) / 100.0;
    //go 1 byte back and read a double value
    buffer.position( buffer.position() - 1 );
    return buffer.getDouble();
}

private static void testSignedPrices()
{
    final double[] testValues = { 0.075, 0.999, -0.999, 0.07, -0.07, 98.82, -98.82, 0.7, 0.1,
            1, 7, 0.17, 0.77, 0.007, 100, -100, 101, -101, 0.001, 10, -10 };
    final ByteBuffer buf = ByteBuffer.allocate( testValues.length * 8 ).order( ByteOrder.BIG_ENDIAN );
    //write test values to buffer
    for ( final double d : testValues )
        writePriceSigned(buf, d);
    buf.flip();
    //read them back, assure they are the same
    for ( double testValue : testValues )
    {
        final double v = readPriceSigned(buf);
        if (v != testValue) {
            System.err.println("Failed for " + testValue);
        }
    }
}

Summary

Try to avoid storing double values in your disk data structures. Often same information may be represented in a smaller number of bytes (compare, for example, 0.01 converted with writePriceUnsigned and 3f 84 7a e1 47 ae 14 7b - binary representation of 0.01).

Analyze properties of the data you have to store in the largest data structures in your programs. Try to identify cases when most of your data can fit into a more compact data type than an original one.

See Use case: how to compact a long-to-long mapping for other ideas of compacting your data.


Leave a Reply

Your email address will not be published. Required fields are marked *