bzip2 is not research
work, in the sense that it doesn't present any new ideas.
Rather, it's an engineering exercise based on existing
ideas.
Four documents describe essentially all the ideas behind
bzip2:
Michael Burrows and D. J. Wheeler:
"A block-sorting lossless data compression algorithm"
10th May 1994.
Digital SRC Research Report 124.
ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.gz
If you have trouble finding it, try searching at the
New Zealand Digital Library, http://www.nzdl.org.
Daniel S. Hirschberg and Debra A. LeLewer
"Efficient Decoding of Prefix Codes"
Communications of the ACM, April 1990, Vol 33, Number 4.
You might be able to get an electronic copy of this
from the ACM Digital Library.
David J. Wheeler
Program bred3.c and accompanying document bred3.ps.
This contains the idea behind the multi-table Huffman coding scheme.
ftp://ftp.cl.cam.ac.uk/users/djw3/
Jon L. Bentley and Robert Sedgewick
"Fast Algorithms for Sorting and Searching Strings"
Available from Sedgewick's web page,
www.cs.princeton.edu/~rs
The following paper gives valuable additional insights into
the algorithm, but is not immediately the basis of any code used
in bzip2.
Peter Fenwick:
Block Sorting Text Compression
Proceedings of the 19th Australasian Computer Science Conference,
Melbourne, Australia. Jan 31 - Feb 2, 1996.
ftp://ftp.cs.auckland.ac.nz/pub/peter-f/ACSC96paper.ps
Kunihiko Sadakane's sorting algorithm, mentioned above, is
available from:
http://naomi.is.s.u-tokyo.ac.jp/~sada/papers/Sada98b.ps.gz
The Manber-Myers suffix array construction algorithm is
described in a paper available from:
http://www.cs.arizona.edu/people/gene/PAPERS/suffix.ps
Finally, the following papers document some
investigations I made into the performance of sorting
and decompression algorithms:
Julian Seward
On the Performance of BWT Sorting Algorithms
Proceedings of the IEEE Data Compression Conference 2000
Snowbird, Utah. 28-30 March 2000.
Julian Seward
Space-time Tradeoffs in the Inverse B-W Transform
Proceedings of the IEEE Data Compression Conference 2001
Snowbird, Utah. 27-29 March 2001.