3.  Programming with libbzip2

This chapter describes the programming interface to libbzip2.

For general background information, particularly about memory use and performance aspects, you'd be well advised to read How to use bzip2 as well.

3.1. Top-level structure

libbzip2 is a flexible library for compressing and decompressing data in the bzip2 data format. Although packaged as a single entity, it helps to regard the library as three separate parts: the low-level interface, the high-level interface, and some utility functions.

The structure of libbzip2's interfaces is similar to that of Jean-loup Gailly's and Mark Adler's excellent zlib library.

All externally visible symbols have names beginning BZ2_. This is new in version 1.0. The intention is to minimise pollution of the namespaces of library clients.

To use any part of the library, you need to #include <bzlib.h> in your sources.

3.1.1. Low-level summary

This interface provides services for compressing and decompressing data in memory. There's no provision for dealing with files, streams or any other I/O mechanisms, just straight memory-to-memory work. In fact, this part of the library can be compiled without inclusion of stdio.h, which may be helpful for embedded applications.

The low-level part of the library has no global variables and is therefore thread-safe.

Six routines make up the low-level interface: BZ2_bzCompressInit, BZ2_bzCompress and BZ2_bzCompressEnd for compression, and a corresponding trio BZ2_bzDecompressInit, BZ2_bzDecompress and BZ2_bzDecompressEnd for decompression. The *Init functions allocate memory for compression/decompression and do other initialisations, whilst the *End functions close down operations and release memory.

The real work is done by BZ2_bzCompress and BZ2_bzDecompress. These compress and decompress data from a user-supplied input buffer to a user-supplied output buffer. These buffers can be any size; arbitrary quantities of data are handled by making repeated calls to these functions. This is a flexible mechanism allowing a consumer-pull style of activity, or producer-push, or a mixture of both.

3.1.2. High-level summary

This interface provides some handy wrappers around the low-level interface to facilitate reading and writing bzip2 format files (.bz2 files). The routines provide hooks to facilitate reading files in which the bzip2 data stream is embedded within some larger-scale file structure, or where there are multiple bzip2 data streams concatenated end-to-end.

For reading files, BZ2_bzReadOpen, BZ2_bzRead, BZ2_bzReadClose and BZ2_bzReadGetUnused are supplied. For writing files, BZ2_bzWriteOpen, BZ2_bzWrite and BZ2_bzWriteClose are available.

As with the low-level library, no global variables are used so the library is per se thread-safe. However, if I/O errors occur whilst reading or writing the underlying compressed files, you may have to consult errno to determine the cause of the error. In that case, you'd need a C library which correctly supports errno in a multithreaded environment.

To make the library a little simpler and more portable, BZ2_bzReadOpen and BZ2_bzWriteOpen require you to pass them file handles (FILE*s) which have previously been opened for reading or writing respectively. That avoids portability problems associated with file operations and file attributes, whilst not being much of an imposition on the programmer.

3.1.3. Utility functions summary

For very simple needs, BZ2_bzBuffToBuffCompress and BZ2_bzBuffToBuffDecompress are provided. These compress data in memory from one buffer to another buffer in a single function call. You should assess whether these functions fulfill your memory-to-memory compression/decompression requirements before investing effort in understanding the more general but more complex low-level interface.

Yoshioka Tsuneo has contributed some functions to give better zlib compatibility. These functions are BZ2_bzopen, BZ2_bzread, BZ2_bzwrite, BZ2_bzflush, BZ2_bzclose, BZ2_bzerror and BZ2_bzlibVersion. You may find these functions more convenient for simple file reading and writing than those in the high-level interface. These functions are not (yet) officially part of the library, and are minimally documented here. If they break, you get to keep all the pieces. I hope to document them properly when time permits.

Yoshioka also contributed modifications to allow the library to be built as a Windows DLL.

