bzip2 [ Home | Documentation | Downloads ]
4.5. Further Reading

4.5. Further Reading

bzip2 is not research work, in the sense that it doesn't present any new ideas. Rather, it's an engineering exercise based on existing ideas.

Four documents describe essentially all the ideas behind bzip2:

Michael Burrows and D. J. Wheeler:
  "A block-sorting lossless data compression algorithm"
   10th May 1994. 
   Digital SRC Research Report 124.
   If you have trouble finding it, try searching at the
   New Zealand Digital Library,

Daniel S. Hirschberg and Debra A. LeLewer
  "Efficient Decoding of Prefix Codes"
   Communications of the ACM, April 1990, Vol 33, Number 4.
   You might be able to get an electronic copy of this
   from the ACM Digital Library.

David J. Wheeler
   Program bred3.c and accompanying document
   This contains the idea behind the multi-table Huffman coding scheme.

Jon L. Bentley and Robert Sedgewick
  "Fast Algorithms for Sorting and Searching Strings"
   Available from Sedgewick's web page,

The following paper gives valuable additional insights into the algorithm, but is not immediately the basis of any code used in bzip2.

Peter Fenwick:
   Block Sorting Text Compression
   Proceedings of the 19th Australasian Computer Science Conference,
     Melbourne, Australia.  Jan 31 - Feb 2, 1996.

Kunihiko Sadakane's sorting algorithm, mentioned above, is available from:

The Manber-Myers suffix array construction algorithm is described in a paper available from:

Finally, the following papers document some investigations I made into the performance of sorting and decompression algorithms:

Julian Seward
   On the Performance of BWT Sorting Algorithms
   Proceedings of the IEEE Data Compression Conference 2000
     Snowbird, Utah.  28-30 March 2000.

Julian Seward
   Space-time Tradeoffs in the Inverse B-W Transform
   Proceedings of the IEEE Data Compression Conference 2001
     Snowbird, Utah.  27-29 March 2001.

Copyright © 1996 - 2018

Hosting kindly donated by Mythic Beasts