3.4. High-level interface

This interface provides functions for reading and writing bzip2 format files. First, some general points.

  • All of the functions take an int* first argument, bzerror. After each call, bzerror should be consulted first to determine the outcome of the call. If bzerror is BZ_OK, the call completed successfully, and only then should the return value of the function (if any) be consulted. If bzerror is BZ_IO_ERROR, there was an error reading/writing the underlying compressed file, and you should then consult errno / perror to determine the cause of the difficulty. bzerror may also be set to various other values; precise details are given on a per-function basis below.

  • If bzerror indicates an error (ie, anything except BZ_OK and BZ_STREAM_END), you should immediately call BZ2_bzReadClose (or BZ2_bzWriteClose, depending on whether you are attempting to read or to write) to free up all resources associated with the stream. Once an error has been indicated, behaviour of all calls except BZ2_bzReadClose (BZ2_bzWriteClose) is undefined. The implication is that (1) bzerror should be checked after each call, and (2) if bzerror indicates an error, BZ2_bzReadClose (BZ2_bzWriteClose) should then be called to clean up.

  • The FILE* arguments passed to BZ2_bzReadOpen / BZ2_bzWriteOpen should be set to binary mode. Most Unix systems will do this by default, but other platforms, including Windows and Mac, will not. If you omit this, you may encounter problems when moving code to new platforms.

  • Memory allocation requests are handled by malloc / free. At present there is no facility for user-defined memory allocators in the file I/O functions (could easily be added, though).

3.4.1. BZ2_bzReadOpen

typedef void BZFILE;

BZFILE *BZ2_bzReadOpen( int *bzerror, FILE *f, 
                        int verbosity, int small,
                        void *unused, int nUnused );

Prepare to read compressed data from file handle f. f should refer to a file which has been opened for reading, and for which the error indicator (ferror(f)) is not set. If small is 1, the library will try to decompress using less memory, at the expense of speed.

For reasons explained below, BZ2_bzRead will decompress the nUnused bytes starting at unused, before starting to read from the file f. At most BZ_MAX_UNUSED bytes may be supplied like this. If this facility is not required, you should pass NULL and 0 for unused and nUnused respectively.

For the meaning of parameters small and verbosity, see BZ2_bzDecompressInit.

The amount of memory needed to decompress a file cannot be determined until the file's header has been read. So it is possible that BZ2_bzReadOpen returns BZ_OK but a subsequent call of BZ2_bzRead will return BZ_MEM_ERROR.

Possible assignments to bzerror:

BZ_CONFIG_ERROR
  if the library has been mis-compiled
BZ_PARAM_ERROR
  if f is NULL
  or small is neither 0 nor 1
  or ( unused == NULL && nUnused != 0 )
  or ( unused != NULL && !(0 <= nUnused <= BZ_MAX_UNUSED) )
BZ_IO_ERROR
  if ferror(f) is nonzero
BZ_MEM_ERROR
  if insufficient memory is available
BZ_OK
  otherwise.

Possible return values:

Pointer to an abstract BZFILE
  if bzerror is BZ_OK
NULL
  otherwise

Allowable next actions:

BZ2_bzRead
  if bzerror is BZ_OK
BZ2_bzReadClose
  otherwise

3.4.2. BZ2_bzRead

int BZ2_bzRead ( int *bzerror, BZFILE *b, void *buf, int len );

Reads up to len (uncompressed) bytes from the compressed file b into the buffer buf. If the read was successful, bzerror is set to BZ_OK and the number of bytes read is returned. If the logical end-of-stream was detected, bzerror will be set to BZ_STREAM_END, and the number of bytes read is returned. All other bzerror values denote an error.

BZ2_bzRead will supply len bytes, unless the logical stream end is detected or an error occurs. Because of this, it is possible to detect the stream end by observing when the number of bytes returned is less than the number requested. Nevertheless, this is regarded as inadvisable; you should instead check bzerror after every call and watch out for BZ_STREAM_END.

Internally, BZ2_bzRead copies data from the compressed file in chunks of size BZ_MAX_UNUSED bytes before decompressing it. If the file contains more bytes than strictly needed to reach the logical end-of-stream, BZ2_bzRead will almost certainly read some of the trailing data before signalling BZ_STREAM_END. To collect the read but unused data once BZ_STREAM_END has appeared, call BZ2_bzReadGetUnused immediately before BZ2_bzReadClose.

Possible assignments to bzerror:

BZ_PARAM_ERROR
  if b is NULL or buf is NULL or len < 0
BZ_SEQUENCE_ERROR
  if b was opened with BZ2_bzWriteOpen
BZ_IO_ERROR
  if there is an error reading from the compressed file
BZ_UNEXPECTED_EOF
  if the compressed file ended before 
  the logical end-of-stream was detected
BZ_DATA_ERROR
  if a data integrity error was detected in the compressed stream
BZ_DATA_ERROR_MAGIC
  if the stream does not begin with the requisite header bytes 
  (ie, is not a bzip2 data file).  This is really 
  a special case of BZ_DATA_ERROR.
BZ_MEM_ERROR
  if insufficient memory was available
BZ_STREAM_END
  if the logical end of stream was detected.
BZ_OK
  otherwise.

Possible return values:

number of bytes read
  if bzerror is BZ_OK or BZ_STREAM_END
undefined
  otherwise

Allowable next actions:

collect data from buf, then BZ2_bzRead or BZ2_bzReadClose
  if bzerror is BZ_OK
collect data from buf, then BZ2_bzReadClose or BZ2_bzReadGetUnused
  if bzerror is BZ_STREAM_END
BZ2_bzReadClose
  otherwise

3.4.3. BZ2_bzReadGetUnused

void BZ2_bzReadGetUnused( int* bzerror, BZFILE *b, 
                          void** unused, int* nUnused );

Returns data which was read from the compressed file but was not needed to get to the logical end-of-stream. *unused is set to the address of the data, and *nUnused to the number of bytes. *nUnused will be set to a value between 0 and BZ_MAX_UNUSED inclusive.

This function may only be called once BZ2_bzRead has signalled BZ_STREAM_END but before BZ2_bzReadClose.

Possible assignments to bzerror:

BZ_PARAM_ERROR
  if b is NULL
  or unused is NULL or nUnused is NULL
BZ_SEQUENCE_ERROR
  if BZ_STREAM_END has not been signalled
  or if b was opened with BZ2_bzWriteOpen
BZ_OK
  otherwise

Allowable next actions:

BZ2_bzReadClose

3.4.4. BZ2_bzReadClose

void BZ2_bzReadClose ( int *bzerror, BZFILE *b );

Releases all memory pertaining to the compressed file b. BZ2_bzReadClose does not call fclose on the underlying file handle, so you should do that yourself if appropriate. BZ2_bzReadClose should be called to clean up after all error situations.

Possible assignments to bzerror:

BZ_SEQUENCE_ERROR
  if b was opened with BZ2_bzWriteOpen
BZ_OK
  otherwise

Allowable next actions:

none

3.4.5. BZ2_bzWriteOpen

BZFILE *BZ2_bzWriteOpen( int *bzerror, FILE *f, 
                         int blockSize100k, int verbosity,
                         int workFactor );

Prepare to write compressed data to file handle f. f should refer to a file which has been opened for writing, and for which the error indicator (ferror(f)) is not set.

For the meaning of parameters blockSize100k, verbosity and workFactor, see BZ2_bzCompressInit.

All required memory is allocated at this stage, so if the call completes successfully, BZ_MEM_ERROR cannot be signalled by a subsequent call to BZ2_bzWrite.

Possible assignments to bzerror:

BZ_CONFIG_ERROR
  if the library has been mis-compiled
BZ_PARAM_ERROR
  if f is NULL
  or blockSize100k < 1 or blockSize100k > 9
BZ_IO_ERROR
  if ferror(f) is nonzero
BZ_MEM_ERROR
  if insufficient memory is available
BZ_OK
  otherwise

Possible return values:

Pointer to an abstract BZFILE
  if bzerror is BZ_OK
NULL
  otherwise

Allowable next actions:

BZ2_bzWrite
  if bzerror is BZ_OK
  (you could go directly to BZ2_bzWriteClose, but this would be pretty pointless)
BZ2_bzWriteClose
  otherwise

3.4.6. BZ2_bzWrite

void BZ2_bzWrite ( int *bzerror, BZFILE *b, void *buf, int len );

Absorbs len bytes from the buffer buf, eventually to be compressed and written to the file.

Possible assignments to bzerror:

BZ_PARAM_ERROR
  if b is NULL or buf is NULL or len < 0
BZ_SEQUENCE_ERROR
  if b was opened with BZ2_bzReadOpen
BZ_IO_ERROR
  if there is an error writing the compressed file.
BZ_OK
  otherwise

3.4.7. BZ2_bzWriteClose

void BZ2_bzWriteClose( int *bzerror, BZFILE* b,
                       int abandon,
                       unsigned int* nbytes_in,
                       unsigned int* nbytes_out );

void BZ2_bzWriteClose64( int *bzerror, BZFILE* b,
                         int abandon,
                         unsigned int* nbytes_in_lo32,
                         unsigned int* nbytes_in_hi32,
                         unsigned int* nbytes_out_lo32,
                         unsigned int* nbytes_out_hi32 );

Compresses and flushes to the compressed file all data so far supplied by BZ2_bzWrite. The logical end-of-stream markers are also written, so subsequent calls to BZ2_bzWrite are illegal. All memory associated with the compressed file b is released. fflush is called on the compressed file, but it is not fclose'd.

If BZ2_bzWriteClose is called to clean up after an error, the only action is to release the memory. The library records the error codes issued by previous calls, so this situation will be detected automatically. There is no attempt to complete the compression operation, nor to fflush the compressed file. You can force this behaviour to happen even in the case of no error, by passing a nonzero value to abandon.

If nbytes_in is non-null, *nbytes_in will be set to the total volume of uncompressed data handled. Similarly, if nbytes_out is non-null, *nbytes_out will be set to the total volume of compressed data written. For compatibility with older versions of the library, BZ2_bzWriteClose only yields the lower 32 bits of these counts. Use BZ2_bzWriteClose64 if you want the full 64-bit counts. These two functions are otherwise identical.

Possible assignments to bzerror:

BZ_SEQUENCE_ERROR
  if b was opened with BZ2_bzReadOpen
BZ_IO_ERROR
  if there is an error writing the compressed file
BZ_OK
  otherwise

3.4.8. Handling embedded compressed data streams

The high-level library facilitates use of bzip2 data streams which form some part of a surrounding, larger data stream.

  • For writing, the library takes an open file handle, writes compressed data to it, fflushes it but does not fclose it. The calling application can write its own data before and after the compressed data stream, using that same file handle.

  • Reading is more complex, and the facilities are not as general as they could be since generality is hard to reconcile with efficiency. BZ2_bzRead reads from the compressed file in blocks of size BZ_MAX_UNUSED bytes, and in doing so probably will overshoot the logical end of compressed stream. To recover this data once decompression has ended, call BZ2_bzReadGetUnused after the last call of BZ2_bzRead (the one returning BZ_STREAM_END) but before calling BZ2_bzReadClose.

This mechanism makes it easy to decompress multiple bzip2 streams placed end-to-end. At the end of one stream, when BZ2_bzRead returns BZ_STREAM_END, call BZ2_bzReadGetUnused to collect the unused data (copy it into your own buffer somewhere). That data forms the start of the next compressed stream. To start uncompressing that next stream, call BZ2_bzReadOpen again, feeding in the unused data via the unused / nUnused parameters. Keep doing this until a BZ_STREAM_END return coincides with the physical end of file (feof(f)). In this situation BZ2_bzReadGetUnused will of course return no data.

This should give some feel for how the high-level interface can be used. If you require extra flexibility, you'll have to bite the bullet and get to grips with the low-level interface.

3.4.9. Standard file-reading/writing code

Here's how you'd write data to a compressed file:

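FILE*   f;
BZFILE* b;
int     nBuf;
char    buf[ /* whatever size you like */ ];
int     bzerror;

f = fopen ( "myfile.bz2", "wb" );   /* binary mode */
if ( !f ) {
 /* handle error */
}
/* blockSize100k 9, verbosity 0, workFactor 0 (default); see BZ2_bzCompressInit */
b = BZ2_bzWriteOpen( &bzerror, f, 9, 0, 0 );
if (bzerror != BZ_OK) {
 BZ2_bzWriteClose ( &bzerror, b, 0, NULL, NULL );
 /* handle error */
}

while ( /* condition */ ) {
 /* get data to write into buf, and set nBuf appropriately */
 BZ2_bzWrite ( &bzerror, b, buf, nBuf );   /* returns void; check bzerror */
 if (bzerror != BZ_OK) { 
   BZ2_bzWriteClose ( &bzerror, b, 0, NULL, NULL );
   /* handle error */
 }
}

/* abandon = 0: finish the stream; byte counts not wanted, so pass NULL */
BZ2_bzWriteClose( &bzerror, b, 0, NULL, NULL );
if (bzerror != BZ_OK) {
 /* handle error */
}
fclose ( f );   /* the library does not close f for you */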

And to read from a compressed file:

FILE*   f;
BZFILE* b;
int     nBuf;
char    buf[ /* whatever size you like */ ];
int     bzerror;

f = fopen ( "myfile.bz2", "rb" );   /* binary mode */
if ( !f ) {
  /* handle error */
}
/* verbosity 0, small 0, no previously read data to supply */
b = BZ2_bzReadOpen ( &bzerror, f, 0, 0, NULL, 0 );
if ( bzerror != BZ_OK ) {
  BZ2_bzReadClose ( &bzerror, b );
  /* handle error */
}

bzerror = BZ_OK;
while ( bzerror == BZ_OK && /* arbitrary other conditions */) {
  nBuf = BZ2_bzRead ( &bzerror, b, buf, /* size of buf */ );
  /* the final call returns data together with BZ_STREAM_END */
  if ( bzerror == BZ_OK || bzerror == BZ_STREAM_END ) {
    /* do something with buf[0 .. nBuf-1] */
  }
}
if ( bzerror != BZ_STREAM_END ) {
   BZ2_bzReadClose ( &bzerror, b );
   /* handle error */
} else {
   BZ2_bzReadClose ( &bzerror, b );
}
fclose ( f );   /* the library does not close f for you */

Copyright © 1996 - 2014  julian@bzip.org
