|
[ Home | Documentation | Downloads ] |
|
bzip2 compresses large files in blocks. The block size affects both the compression ratio achieved, and the amount of memory needed for compression and decompression. The flags -1 through -9 specify the block size to be 100,000 bytes through 900,000 bytes (the default) respectively. At decompression time, the block size used for compression is read from the header of the compressed file, and bunzip2 then allocates itself just enough memory to decompress the file. Since block sizes are stored in compressed files, it follows that the flags -1 to -9 are irrelevant to and so ignored during decompression. Compression and decompression requirements, in bytes, can be estimated as: Compression: 400k + ( 8 x block size )
Decompression: 100k + ( 4 x block size ), or
100k + ( 2.5 x block size )
Larger block sizes give rapidly diminishing marginal returns. Most of the compression comes from the first two or three hundred k of block size, a fact worth bearing in mind when using bzip2 on small machines. It is also important to appreciate that the decompression memory requirement is set at compression time by the choice of block size. For files compressed with the default 900k block size, bunzip2 will require about 3700 kbytes to decompress. To support decompression of any file on a 4 megabyte machine, bunzip2 has an option to decompress using approximately half this amount of memory, about 2300 kbytes. Decompression speed is also halved, so you should use this option only where necessary. The relevant flag is -s. In general, try and use the largest block size memory constraints allow, since that maximises the compression achieved. Compression and decompression speed are virtually unaffected by block size. Another significant point applies to files which fit in a single block -- that means most files you'd encounter using a large block size. The amount of real memory touched is proportional to the size of the file, since the file is smaller than a block. For example, compressing a file 20,000 bytes long with the flag -9 will cause the compressor to allocate around 7600k of memory, but only touch 400k + 20000 * 8 = 560 kbytes of it. Similarly, the decompressor will allocate 3700k but only touch 100k + 20000 * 4 = 180 kbytes. Here is a table which summarises the maximum memory usage for different block sizes. Also recorded is the total compressed size for 14 files of the Calgary Text Compression Corpus totalling 3,141,622 bytes. This column gives some feel for how compression varies with block size. These figures tend to understate the advantage of larger block sizes for larger files, since the Corpus is dominated by smaller files. Compress Decompress Decompress Corpus Flag usage usage -s usage Size -1 1200k 500k 350k 914704 -2 2000k 900k 600k 877703 -3 2800k 1300k 850k 860338 -4 3600k 1700k 1100k 846899 -5 4400k 2100k 1350k 845160 -6 5200k 2500k 1600k 838626 -7 6100k 2900k 1850k 834096 -8 6800k 3300k 2100k 828642 -9 7600k 3700k 2350k 828642 |
|
|
Copyright © 1996 - 2010 julian@bzip.org |
|