bzip2

from Wikipedia, the free encyclopedia
Basic data

Developer: Julian Seward
First release: July 18, 1996
Current version: 1.0.8 (July 13, 2019)
Operating system: Linux/Unix, Windows
Programming language: C
Category: Compression program
License: BSD-like
Available in German: No
Website: sourceware.org/bzip2/
bzip2 file format
File extension: .bz2
MIME type: application/x-bzip
Magic number: 42 5A 68 (hex), "BZh" (string)
Developed by: Julian Seward
Initial release: 1996
Type: Data compression

bzip2 is a free program for the lossless compression of files, developed by Julian Seward. It uses no patented algorithms and is distributed under a BSD-like license.

bzip2 compresses data in three main stages. First, the input is divided into blocks, and each block is rearranged with the reversible Burrows-Wheeler transform (BWT). The result then undergoes a move-to-front transform. Finally, Huffman coding performs the actual data compression. (The reference implementation additionally applies run-length encoding before the BWT and to runs of zeros after the move-to-front step.)
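The three stages can be illustrated with a small Python sketch. It is a teaching model under simplifying assumptions, not bzip2's implementation: the BWT sorts all rotations naively, the run-length steps are omitted, and the Huffman step only computes code lengths for one unrestricted table (bzip2 uses several tables with limited code lengths). The function names are hypothetical; `orig_ptr` plays the role of the pointer that the file format stores as `.origPtr`.

```python
import heapq
from collections import Counter


def bwt(block: bytes) -> tuple[bytes, int]:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column."""
    n = len(block)
    rows = sorted(range(n), key=lambda i: block[i:] + block[:i])
    last_column = bytes(block[(i - 1) % n] for i in rows)
    return last_column, rows.index(0)  # row that holds the unrotated block


def inverse_bwt(last_column: bytes, orig_ptr: int) -> bytes:
    """Undo the BWT by following the last-to-first column mapping."""
    # A stable sort of the last column yields the first column; next_row[j]
    # is the row whose last character is the first character of row j.
    next_row = sorted(range(len(last_column)), key=lambda i: last_column[i])
    out = bytearray()
    row = orig_ptr
    for _ in range(len(last_column)):
        row = next_row[row]
        out.append(last_column[row])
    return bytes(out)


def move_to_front(data: bytes) -> list[int]:
    """Replace each byte by its position in a list of recently used bytes."""
    recent = list(range(256))
    out = []
    for byte in data:
        index = recent.index(byte)
        out.append(index)
        recent.insert(0, recent.pop(index))  # move this byte to the front
    return out


def huffman_code_lengths(symbols: list[int]) -> dict[int, int]:
    """Code length per symbol for a single Huffman table (no length limit)."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {sym: 1 for sym in freq}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth in subtree}).
    heap = [(count, i, {sym: 0}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


last, ptr = bwt(b"banana")
print(last, ptr)                                  # b'nnbaaa' 3
print(move_to_front(last))                        # [110, 0, 99, 99, 0, 0]
print(huffman_code_lengths(move_to_front(last)))  # {0: 1, 110: 2, 99: 2}
print(inverse_bwt(last, ptr))                     # b'banana'
```

The BWT groups equal bytes into runs, the move-to-front step turns those runs into many small numbers (mostly zeros), and Huffman coding then gives the frequent small numbers the shortest codes.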

Compression with bzip2 is often more effective than with gzip or RAR, but usually considerably slower. Since 2003, however, there has been a variant, pbzip2, that uses multithreading and is significantly faster on current multi-core processors. To achieve this, pbzip2 splits the input into several independent streams that are compressed separately; the result is a file containing the concatenated bzip2 streams.
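The idea behind pbzip2 can be sketched in Python: split the input into chunks, compress each chunk as an independent bzip2 stream on a thread pool, and concatenate the streams. This is not pbzip2's code; the function name, chunk size and worker count are illustrative assumptions, and any speedup relies on the compressor releasing the interpreter lock while it works (CPython's `bz2` module does this).

```python
import bz2
from concurrent.futures import ThreadPoolExecutor


def parallel_compress(data: bytes, chunk_size: int = 900_000, workers: int = 4) -> bytes:
    """Compress chunks independently and concatenate the bzip2 streams (pbzip2-style)."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the streams stay in sequence.
        return b"".join(pool.map(bz2.compress, chunks))


payload = b"".join(b"line %d\n" % i for i in range(50_000))
packed = parallel_compress(payload, chunk_size=100_000)

# bz2.decompress() reads all concatenated streams, not just the first one.
assert bz2.decompress(packed) == payload
```

A single `bz2.BZ2Decompressor`, by contrast, stops at the end of the first stream and leaves the remaining bytes in its `unused_data` attribute.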

Files compressed with bzip2 are identified by the file extension .bz2; tar archives compressed with bzip2 usually have the extension .tar.bz2 or .tbz2. One advantage of bzip2-compressed files such as these tar archives is that, after read errors or other damage, all blocks that are still readable can be extracted with bzip2recover and then decompressed, whereas with many other compression formats, decompression cannot continue past a read error.
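Recovery is possible because every block starts with the 48-bit marker 0x314159265359 and carries its own CRC. The Python sketch below imitates the idea of bzip2recover (it is not its source code): it searches a stream bit by bit for block markers and wraps each block in a new single-block stream consisting of a fresh header, the block, the end-of-stream marker and the block's CRC as the stream CRC. The function names are hypothetical; the sketch assumes a single stream and, like bzip2recover, can be misled if the 48-bit pattern happens to occur inside compressed data.

```python
import bz2

BLOCK_MAGIC = format(0x314159265359, "048b")  # start-of-block marker
EOS_MAGIC = format(0x177245385090, "048b")    # end-of-stream marker


def to_bits(data: bytes) -> str:
    return "".join(format(byte, "08b") for byte in data)


def from_bits(bits: str) -> bytes:
    bits += "0" * (-len(bits) % 8)  # pad to a whole byte, as the format does
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


def split_into_block_streams(stream: bytes) -> list[bytes]:
    """Rewrap every block of one bzip2 stream as its own decompressible stream."""
    bits = to_bits(stream)
    starts, pos = [], bits.find(BLOCK_MAGIC)
    while pos != -1:
        starts.append(pos)
        pos = bits.find(BLOCK_MAGIC, pos + 1)
    ends = starts[1:] + [bits.rfind(EOS_MAGIC)]  # blocks are packed without padding
    pieces = []
    for start, end in zip(starts, ends):
        block = bits[start:end]
        block_crc = block[48:80]  # the 32-bit CRC right after the block marker
        # For a one-block stream, the combined stream CRC equals the block CRC.
        rebuilt = to_bits(b"BZh9") + block + EOS_MAGIC + block_crc
        pieces.append(from_bits(rebuilt))
    return pieces


original = bytes(i % 251 for i in range(300_000))
pieces = split_into_block_streams(bz2.compress(original, compresslevel=1))
print(len(pieces))  # several pieces: level 1 uses blocks of about 100 kB
assert b"".join(bz2.decompress(p) for p in pieces) == original
```

With a damaged file, the pieces whose blocks were hit by the damage fail their CRC check, while the others still decompress.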

bzip2 is the successor to bzip, which used arithmetic coding after the block sorting; because of patent concerns, however, bzip was not developed further.

libbzip2

The command-line program bzip2 uses a library called libbzip2 for the actual compression and decompression; the library is also used by other programs that read and write the bz2 file format.

The library offers functions for compressing and decompressing arbitrary data in memory, as well as an interface similar to stdio for reading and writing bz2-compressed files.
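Python's standard `bz2` module is a binding to libbzip2 and exposes both styles of interface, so it serves here as a compact illustration; it is not the C API itself. In C, the corresponding entry points include `BZ2_bzBuffToBuffCompress` and `BZ2_bzBuffToBuffDecompress` for memory buffers, and `BZ2_bzWriteOpen`/`BZ2_bzWrite` and `BZ2_bzReadOpen`/`BZ2_bzRead` for files. The file name below is hypothetical.

```python
import bz2
import os
import tempfile

text = "bzip2 works on blocks of up to 900 kB.\n" * 1000

# In-memory interface: a whole buffer in, a whole buffer out.
packed = bz2.compress(text.encode("utf-8"), compresslevel=9)
assert bz2.decompress(packed).decode("utf-8") == text

# stdio-like interface: read and write through a file object.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt.bz2")  # hypothetical file name
    with bz2.open(path, "wt", encoding="utf-8") as f:
        f.write(text)
    with open(path, "rb") as raw:
        header = raw.read(4)  # signature plus block-size digit
    with bz2.open(path, "rt", encoding="utf-8") as f:
        first_line = f.readline()

print(len(packed) < len(text), header, first_line.strip())
# True b'BZh9' bzip2 works on blocks of up to 900 kB.
```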

File format

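A .bz2 data stream begins with a 4-byte signature, followed by zero or more compressed blocks, followed immediately by an end-of-stream marker and a 32-bit CRC for the original content of the entire stream (computed by combining the CRCs of the individual blocks). The compressed blocks are bit-aligned: each block begins directly after the previous one, without padding to a byte boundary.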
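The checksums can be reproduced in a few lines of Python. The block CRC is an MSB-first CRC-32 with polynomial 0x04C11DB7 (the variant often catalogued as CRC-32/BZIP2), and the stream CRC is formed, as in bzip2's reference implementation, by rotating the running value left by one bit and XOR-ing in each block CRC. The function names are hypothetical, the bitwise CRC favors clarity over speed, and the marker search assumes a single undamaged stream in which the 48-bit markers do not occur by chance inside the compressed data.

```python
import bz2

BLOCK_MAGIC = format(0x314159265359, "048b")  # start of each compressed block
EOS_MAGIC = format(0x177245385090, "048b")    # end-of-stream marker


def crc32_bzip2(data: bytes) -> int:
    """MSB-first CRC-32: polynomial 0x04C11DB7, initial value and final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc ^ 0xFFFFFFFF


def combine_crcs(block_crcs: list[int]) -> int:
    """Fold per-block CRCs into the stream CRC: rotate left by 1 bit, then XOR."""
    combined = 0
    for crc in block_crcs:
        combined = ((combined << 1) | (combined >> 31)) & 0xFFFFFFFF
        combined ^= crc
    return combined


def stored_crcs(stream: bytes) -> tuple[list[int], int]:
    """Read the block CRCs and the stream CRC stored in one bzip2 stream."""
    if len(stream) < 4 or stream[:3] != b"BZh" or not 0x31 <= stream[3] <= 0x39:
        raise ValueError("not a bzip2 stream")
    bits = "".join(format(byte, "08b") for byte in stream)
    eos = bits.rfind(EOS_MAGIC)
    block_crcs = []
    pos = bits.find(BLOCK_MAGIC, 32)  # the first block follows the 4-byte signature
    while pos != -1 and pos < eos:
        block_crcs.append(int(bits[pos + 48:pos + 80], 2))
        pos = bits.find(BLOCK_MAGIC, pos + 48)
    return block_crcs, int(bits[eos + 48:eos + 80], 2)


data = b"hello, bzip2! " * 1000
blocks, stream_crc = stored_crcs(bz2.compress(data))
print(blocks == [crc32_bzip2(data)], stream_crc == combine_crcs(blocks))  # True True
```

For a stream with a single block, the rotation of the initial zero has no effect, so the stream CRC is simply that block's CRC.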

Fields .compressed_magic through .contents make up one block and occur once per block; fields marked * are repeated within a block.

Field                    Bits      Description
.magic                   16        'BZ' signature / magic number
.version                 8         'h' for bzip2 (h for Huffman coding), '0' for bzip1 (deprecated)
.hundred_k_blocksize     8         '1'..'9': block size 100 kB..900 kB (uncompressed)

.compressed_magic        48        '1AY&SY' = 0x314159265359 (BCD digits of π)
.crc                     32        checksum for this block
.randomized              1         0 => normal, 1 => randomized (deprecated)
.origPtr                 24        starting pointer into the BWT output, used by the inverse transform
.huffman_used_map        16        bitmap, one bit per range of 16 byte values, marking which of the following .huffman_used_bitmaps are present
.huffman_used_bitmaps    0..256    bitmaps of the symbols used, present / not present (16 bits per range present)
.huffman_groups          3         2..6: number of different Huffman tables in use
.selectors_used          15        number of times a Huffman table is selected (once every 50 symbols)
*.selector_list          1..6      zero-terminated bit runs (0..62) of the MTF-coded Huffman table index (repeated selectors_used times)
.start_huffman_length    5         0..20: starting bit length for the Huffman deltas
*.delta_bit_length       1..40     0 => next symbol; 1 => alter length {1 => decrement length; 0 => increment length} (repeated (symbols + 2) × groups times)
.contents                ≥ 2       Huffman-encoded data stream until the end of the block (at most 900 × 1024 × 8 = 7,372,800 bits)

.eos_magic               48        '\x17rE8P\x90' = 0x177245385090 (BCD digits of √π)
.crc                     32        checksum for the whole stream
.padding                 0..7      alignment to a whole byte
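The delta coding of the Huffman code lengths (.start_huffman_length and .delta_bit_length) can be sketched as follows. For readability, bits are represented as a Python string of '0' and '1' characters; the encoder follows the scheme in the table: a 5-bit start value, then '10' to increment, '11' to decrement, and '0' to move on to the next symbol. The function names are hypothetical, and building the Huffman tables and limiting code lengths are not shown.

```python
def encode_lengths(lengths: list[int]) -> str:
    """Delta-code a list of Huffman code lengths, bzip2-style."""
    current = lengths[0]
    out = [format(current, "05b")]  # 5-bit starting length
    for target in lengths:
        while current < target:
            out.append("10")  # 1 = alter length, then 0 = increment
            current += 1
        while current > target:
            out.append("11")  # 1 = alter length, then 1 = decrement
            current -= 1
        out.append("0")       # 0 = length is final, go to the next symbol
    return "".join(out)


def decode_lengths(bits: str, count: int) -> list[int]:
    """Inverse of encode_lengths for `count` symbols."""
    current, pos = int(bits[:5], 2), 5
    lengths = []
    for _ in range(count):
        while bits[pos] == "1":
            current += 1 if bits[pos + 1] == "0" else -1
            pos += 2
        pos += 1  # consume the terminating 0
        lengths.append(current)
    return lengths


code = encode_lengths([3, 3, 4, 2, 5])
print(code)                     # 0001100100111101010100
print(decode_lengths(code, 5))  # [3, 3, 4, 2, 5]
```

Because neighboring symbols tend to have similar code lengths, most symbols cost only one or a few bits in this representation.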
