How to iterate zip file records (java.util.zip.ZipFile, java.util.zip.ZipInputStream)

by Mikhail Vorontsov

The right way to iterate a zip file

// FILE_NAME is the path to an existing zip archive
final ZipFile file = new ZipFile( FILE_NAME );
try
{
    final Enumeration<? extends ZipEntry> entries = file.entries();
    while ( entries.hasMoreElements() )
    {
        final ZipEntry entry = entries.nextElement();
        System.out.println( entry.getName() );
        //use entry input stream:
        readInputStream( file.getInputStream( entry ) );
    }
}
finally
{
    // closing the ZipFile also closes the streams returned by getInputStream
    file.close();
}

// Reads the stream to the end and returns the number of bytes read
private static int readInputStream( final InputStream is ) throws IOException {
    final byte[] buf = new byte[ 8192 ];
    int read = 0;
    int cntRead;
    while ( ( cntRead = is.read( buf, 0, buf.length ) ) >= 0 )
    {
        read += cntRead;
    }
    return read;
}

The wrong way to iterate a zip file

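// FILE_NAME is the path to an existing zip archive
final ZipInputStream is = new ZipInputStream( new BufferedInputStream( new FileInputStream( FILE_NAME ) ) );
try
{
    ZipEntry entry;
    // getNextEntry has to read through the rest of the current entry before it can return the next one
    while ( ( entry = is.getNextEntry() ) != null )
    {
        System.out.println( entry.getName() );
        //use entry input stream:
        readInputStream( is );
    }
}
finally
{
    is.close();
}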

Description

A zip file consists of several entries, each of which has a field containing the number of bytes in that entry. This makes it easy to iterate over all zip file entries without decompressing any data. java.util.zip.ZipFile accepts a file or a file name and uses random access to jump between file positions. java.util.zip.ZipInputStream, on the other hand, works with streams, so it cannot jump freely. It has to read and decompress all of the data of each entry in order to reach the end of that entry and read the next entry header.
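To make the point about the size fields concrete, here is a minimal sketch that lists entry names and uncompressed sizes through ZipFile without opening a single entry stream. The class name ZipEntrySizes, the helper writeSampleZip and the sample entry names are hypothetical and exist only to keep the example self-contained.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class ZipEntrySizes {
    // Hypothetical helper: writes a small two-entry archive for the demo.
    static void writeSampleZip(Path zipPath) throws IOException {
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zipPath))) {
            out.putNextEntry(new ZipEntry("a.txt"));
            out.write("hello".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("b.txt"));
            out.write("hello, zip".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
    }

    // Returns "name=uncompressedSize" for every entry; no entry stream is opened.
    static List<String> listSizes(Path zipPath) throws IOException {
        List<String> result = new ArrayList<>();
        try (ZipFile file = new ZipFile(zipPath.toFile())) {
            Enumeration<? extends ZipEntry> entries = file.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                result.add(entry.getName() + "=" + entry.getSize());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path zip = Files.createTempFile("sizes-demo", ".zip");
        try {
            writeSampleZip(zip);
            for (String line : listSizes(zip)) {
                System.out.println(line);
            }
        } finally {
            Files.deleteIfExists(zip);
        }
    }
}
```

ZipEntry.getCompressedSize() can be read in the same loop if the on-disk size is of interest.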

What does this mean? If you already have a zip file in your file system, use ZipFile to process it, regardless of your task. As a bonus, you can access zip entries either sequentially or randomly (with a rather small performance penalty). On the other hand, if you are processing a stream, you will need to process all entries sequentially using ZipInputStream.
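Random access could look roughly like the sketch below, which fetches one entry by name with ZipFile.getEntry and reads only that entry. The class name ZipRandomAccess, the helpers writeSampleZip and readEntryAsText, and the entry names are assumptions made for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class ZipRandomAccess {
    // Hypothetical helper: writes a small two-entry archive for the demo.
    static void writeSampleZip(Path zipPath) throws IOException {
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zipPath))) {
            out.putNextEntry(new ZipEntry("first.txt"));
            out.write("first".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("second.txt"));
            out.write("second".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
    }

    // Reads a single entry by name; returns null when the archive has no such entry.
    static String readEntryAsText(Path zipPath, String entryName) throws IOException {
        try (ZipFile file = new ZipFile(zipPath.toFile())) {
            ZipEntry entry = file.getEntry(entryName);
            if (entry == null) {
                return null;
            }
            try (InputStream is = file.getInputStream(entry)) {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = is.read(buf)) >= 0) {
                    bytes.write(buf, 0, n);
                }
                return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path zip = Files.createTempFile("random-access-demo", ".zip");
        try {
            writeSampleZip(zip);
            // Jump straight to the second entry without reading the first one.
            System.out.println(readEntryAsText(zip, "second.txt"));
        } finally {
            Files.deleteIfExists(zip);
        }
    }
}
```

With ZipInputStream the same lookup would require calling getNextEntry repeatedly until the wanted name turns up.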

Here is an example. A zip archive (total file size = 1.6 GB) containing three 0.6 GB entries was iterated in 0.05 sec using ZipFile and in 18 sec using ZipInputStream.
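A rough way to repeat this kind of measurement on your own machine is sketched below. It is not the original benchmark: the class name ZipIterationTiming, its helpers, and the archive size (three 4 MB entries of pseudo-random bytes) are assumptions chosen to keep the run short, so the absolute numbers will differ from the ones quoted above and will vary between runs.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.Random;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipIterationTiming {
    // Hypothetical helper: writes entryCount entries of entrySize pseudo-random bytes each.
    static void writeSampleZip(Path zipPath, int entryCount, int entrySize) throws IOException {
        byte[] data = new byte[entrySize];
        new Random(42).nextBytes(data); // random bytes compress poorly, so the archive stays large
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zipPath))) {
            for (int i = 0; i < entryCount; i++) {
                out.putNextEntry(new ZipEntry("entry" + i + ".bin"));
                out.write(data);
                out.closeEntry();
            }
        }
    }

    // Counts entries through ZipFile; entry data is never read.
    static int countWithZipFile(Path zipPath) throws IOException {
        int count = 0;
        try (ZipFile file = new ZipFile(zipPath.toFile())) {
            Enumeration<? extends ZipEntry> entries = file.entries();
            while (entries.hasMoreElements()) {
                entries.nextElement();
                count++;
            }
        }
        return count;
    }

    // Counts entries through ZipInputStream; each getNextEntry call skips over the previous entry's data.
    static int countWithZipInputStream(Path zipPath) throws IOException {
        int count = 0;
        try (ZipInputStream is = new ZipInputStream(new BufferedInputStream(Files.newInputStream(zipPath)))) {
            while (is.getNextEntry() != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path zip = Files.createTempFile("timing-demo", ".zip");
        try {
            writeSampleZip(zip, 3, 4 * 1024 * 1024);

            long start = System.nanoTime();
            int viaFile = countWithZipFile(zip);
            long fileNanos = System.nanoTime() - start;

            start = System.nanoTime();
            int viaStream = countWithZipInputStream(zip);
            long streamNanos = System.nanoTime() - start;

            System.out.println("ZipFile: " + viaFile + " entries in " + fileNanos / 1000 + " us");
            System.out.println("ZipInputStream: " + viaStream + " entries in " + streamNanos / 1000 + " us");
        } finally {
            Files.deleteIfExists(zip);
        }
    }
}
```

A single cold run like this is only indicative, since JIT warm-up and the operating system's file cache both influence the result.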

64 bit ZIP file support in Java 7

We should also mention that the Java 6 ZIP implementation does not support zip files in which any size exceeds 2 GB (the total .zip file size or the uncompressed size of any file in the archive). Java 7 supports zip64 mode via the same interface (the java.util.zip package).

Summary

  • Use java.util.zip.ZipFile to iterate over the entries of a zip file stored in your file system. Avoid java.util.zip.ZipInputStream for that purpose.
  • 64-bit ZIP files are not supported in the Java 6 JDK. Support for them was added in Java 7.