gzip
Screenshot: help display in the command line

Basic data | |
---|---|
Developer | Jean-loup Gailly and Mark Adler |
Initial release | 1992 |
Current version | 1.10 (December 30, 2018) |
Operating system | Cross-platform |
Programming language | C |
Category | Data compression |
License | GPL (free software) |
Website | www.gnu.org/software/gzip |
gzip is a free compression program that, like the gzip file format of the same name, is available for practically all operating systems; under the terms of the GPL, its source code is available as well.
The name gzip is short for "GNU zip", with "zip" borrowed from the English word for a zipper. OpenBSD provides a BSD-licensed reimplementation under the names gzip(1), gunzip(1) and gzcat(1), which is fully compatible with the GNU tools.
gzip achieves a satisfactory degree of compression for text and uses no patented algorithms (it uses Deflate). It was originally developed by Jean-loup Gailly as a replacement for the compress program used on Unix. Mark Adler wrote the decompression program gunzip.
Technology
gzip is based on the Deflate algorithm, a combination of LZ77 and Huffman coding. Deflate was developed in response to the patents covering LZW and other compression algorithms. The ZIP file format also mainly uses Deflate for compression, but should otherwise not be confused with gzip.
The window size in gzip is 32 KiB: a byte sequence that does not already occur within the preceding 32 KiB cannot be encoded as a back-reference and is stored as literal bytes in the .gz file. This window is small compared with modern compression programs (e.g. bzip2 with a block size of 100 to 900 kB, or rzip, as an extreme case, with a 900 MB window), but gzip is still one of the fastest compression programs and is very versatile, for example in a pipeline, where the output ("standard output") of one program becomes the input ("standard input") of gzip, and vice versa.
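The effect of the 32 KiB window can be observed with Python's standard gzip module (Python is used here only for illustration; `random.Random.randbytes` needs Python 3.9 or later). The sketch compresses a block of random, otherwise incompressible bytes followed by an exact copy of itself. When the copy starts 16 KiB after the original, the second half collapses into back-references; at 48 KiB the original already lies outside the window and the output stays about as large as the input. The block sizes and the helper name `compressed_ratio` are arbitrary choices for this demonstration.

```python
import gzip
import random

def compressed_ratio(distance: int, seed: int = 0) -> float:
    """Compress `distance` random bytes followed by an exact copy of them.

    The copy starts `distance` bytes after the original, so it can only be
    encoded as back-references if `distance` fits inside the 32 KiB window.
    Returns compressed size / uncompressed size.
    """
    block = random.Random(seed).randbytes(distance)  # incompressible on its own
    data = block + block
    return len(gzip.compress(data)) / len(data)

near = compressed_ratio(16 * 1024)  # copy lies within the window
far = compressed_ratio(48 * 1024)   # copy lies beyond the window
print(f"repeat after 16 KiB: ratio {near:.2f}")  # close to 0.5
print(f"repeat after 48 KiB: ratio {far:.2f}")   # close to 1.0
```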
To simplify the development of software that uses data compression, the zlib library was written. It supports the gzip file format and deflate compression. The library is widely used because it is small, efficient, and versatile.
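Python's standard zlib module is a binding to this library. In the sketch below, the `wbits` argument selects which container zlib wraps around the same Deflate stream: -15 gives raw Deflate data (RFC 1951), 15 the zlib format (RFC 1950), and 31 (16 + 15) the gzip format (RFC 1952). The sample text and the helper name `deflate_with` are arbitrary.

```python
import gzip
import zlib

data = b"gzip, zlib and raw Deflate share the same compressed core. " * 20

def deflate_with(wbits: int) -> bytes:
    """Compress `data` with zlib; `wbits` selects the container format."""
    c = zlib.compressobj(level=9, method=zlib.DEFLATED, wbits=wbits)
    return c.compress(data) + c.flush()

raw = deflate_with(-15)       # raw Deflate stream without header (RFC 1951)
zwrapped = deflate_with(15)   # 2-byte zlib header + Adler-32 trailer (RFC 1950)
gzwrapped = deflate_with(31)  # 10-byte gzip header + CRC-32/size trailer (RFC 1952)

print(zwrapped[:2].hex(), gzwrapped[:3].hex())

# Each stream round-trips when decompressed with the matching wbits value,
# and the gzip variant is also readable by Python's gzip module.
assert zlib.decompress(raw, wbits=-15) == data
assert zlib.decompress(zwrapped, wbits=15) == data
assert gzip.decompress(gzwrapped) == data
```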
Structure
Source: RFC 1952 GZIP File Format Specification version 4.3.
Identification (ID1, ID2) [0-1]
The first two bytes form the identification code of the format and always have the same values: 0x1f and 0x8b (hexadecimal), i.e. 31 and 139 (decimal). A decompressor uses them to recognize the gzip format and to detect foreign or obviously damaged files early; if they are wrong or missing, the data is not treated as gzip and decompression fails with an error.
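A minimal format check based only on these two bytes might look like this in Python; the function name `looks_like_gzip` is hypothetical, and a passing check says nothing about whether the rest of the file is intact.

```python
import gzip

GZIP_ID = b"\x1f\x8b"  # ID1 = 0x1f (31), ID2 = 0x8b (139)

def looks_like_gzip(data: bytes) -> bool:
    """Return True if `data` starts with the two gzip identification bytes."""
    return data[:2] == GZIP_ID

print(looks_like_gzip(gzip.compress(b"hello")))  # True
print(looks_like_gzip(b"PK\x03\x04"))            # False (ZIP signature)
```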
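Compression method (CM) [2]
The byte at index 2 (counting from 0) identifies the compression method used to store the data. RFC 1952 reserves the values 0 to 7 and defines only 8 (Deflate), which is what gzip files use in practice; the meanings listed for 0 to 3 are method codes that GNU gzip uses internally for older formats it can decompress.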
Byte value | Meaning |
---|---|
0 | Stored (no compression) |
1 | Compressed (LZW, as produced by compress) |
2 | Packed (as produced by pack) |
3 | LZH |
4 | Reserved |
5 | Reserved |
6 | Reserved |
7 | Reserved |
8 | Deflate |
Flags (FLG) [3]
The byte at index 3 contains flags that describe the file and announce optional header fields.
The byte is a bit field: each bit (bit 0 being the least significant) is an independent flag, so several flags can be set at once. For example, the value 0x0F (decimal 15, binary 00001111) means: the file is probably ASCII text, a header CRC-16 is present, an extra field is present, and the original file name is stored.
Value | Name | Meaning |
---|---|---|
1 (bit 0) | FTEXT | The file is probably ASCII text |
2 (bit 1) | FHCRC | A CRC-16 of the header is present (check value for detecting a damaged or incorrectly transferred header) |
4 (bit 2) | FEXTRA | An extra field with additional information is present |
8 (bit 3) | FNAME | The original file name is stored |
16 (bit 4) | FCOMMENT | A comment is stored |
32 (bit 5) | | Reserved (must be 0) |
64 (bit 6) | | Reserved (must be 0) |
128 (bit 7) | | Reserved (must be 0) |
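Decoding the FLG byte amounts to testing each bit with a mask. The Python sketch below uses the field names from RFC 1952 and rejects set reserved bits, as the RFC requires of a compliant decompressor; the function name `decode_flags` is illustrative.

```python
# FLG bits as named in RFC 1952 (bit 0 = least significant)
FLAG_NAMES = {
    0x01: "FTEXT",     # file is probably ASCII text
    0x02: "FHCRC",     # CRC-16 of the header present
    0x04: "FEXTRA",    # extra field present
    0x08: "FNAME",     # original file name present
    0x10: "FCOMMENT",  # comment present
}
RESERVED_BITS = 0xE0   # bits 5-7 must be zero

def decode_flags(flg: int) -> list[str]:
    """Return the names of all flags set in the FLG byte (header byte 3)."""
    if flg & RESERVED_BITS:
        raise ValueError(f"reserved FLG bits set: {flg:#04x}")
    return [name for bit, name in FLAG_NAMES.items() if flg & bit]

print(decode_flags(0x0F))  # ['FTEXT', 'FHCRC', 'FEXTRA', 'FNAME']
```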
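Modification time (MTIME) [4-7]
These 4 bytes hold the modification time of the original file as Unix time, stored as an unsigned 32-bit integer in little-endian byte order (least significant byte first). The value 0 means that no timestamp is available.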
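Because MTIME is an unsigned 32-bit little-endian integer, Python's struct format `<I` decodes it directly. The sketch assumes `header` holds at least the first 8 bytes of a gzip file; the sample header and the helper name `read_mtime` are made up for illustration.

```python
import struct
from datetime import datetime, timezone
from typing import Optional

def read_mtime(header: bytes) -> Optional[datetime]:
    """Decode MTIME from bytes 4-7 of a gzip header.

    MTIME is an unsigned 32-bit little-endian Unix timestamp; 0 means
    that no timestamp is stored.
    """
    (mtime,) = struct.unpack_from("<I", header, 4)
    if mtime == 0:
        return None
    return datetime.fromtimestamp(mtime, tz=timezone.utc)

# Hypothetical header: ID1, ID2, CM=8, FLG=0, MTIME, XFL=0, OS=3 (Unix)
hdr = b"\x1f\x8b\x08\x00" + struct.pack("<I", 1546128000) + b"\x00\x03"
print(read_mtime(hdr))  # 2018-12-30 00:00:00+00:00
```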
Extra flags (XFL) [8]
Unlike FLG, this byte is not a general bit field; its meaning depends on the compression method. For Deflate (CM = 8), RFC 1952 defines the following values:
Byte value | Meaning |
---|---|
2 | Compressor used maximum compression (slowest algorithm) |
4 | Compressor used the fastest algorithm |
Operating system (OS) [9]
This byte indicates on which operating system (more precisely, on which type of file system) the file was compressed.
Byte value | Meaning |
---|---|
0 | FAT (file system) |
1 | AmigaOS |
2 | VMS or OpenVMS |
3 | Unix |
4 | VM/CMS |
5 | Atari TOS |
6 | HPFS (file system) |
7 | Macintosh |
8 | Z-System |
9 | CP/M |
10 | TOPS-20 |
11 | NTFS (file system) |
12 | QDOS |
13 | Acorn RISC OS |
255 | Unknown |
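The fixed 10-byte part of the header described in this section can be decoded in one step with Python's struct module (format `<BBBBIBB`: four single bytes, the 4-byte little-endian MTIME, two single bytes). This is a sketch, not a complete parser: the optional fields that FLG may announce after byte 9 (extra field, file name, comment, header CRC) are deliberately not handled, and the `GzipHeader` type and OS labels are illustrative choices.

```python
import struct
from typing import NamedTuple

# OS byte values from RFC 1952 (header byte 9)
OS_NAMES = {
    0: "FAT", 1: "AmigaOS", 2: "VMS/OpenVMS", 3: "Unix", 4: "VM/CMS",
    5: "Atari TOS", 6: "HPFS", 7: "Macintosh", 8: "Z-System", 9: "CP/M",
    10: "TOPS-20", 11: "NTFS", 12: "QDOS", 13: "Acorn RISC OS", 255: "unknown",
}

class GzipHeader(NamedTuple):
    cm: int     # compression method, 8 = Deflate
    flg: int    # flag bits (FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT)
    mtime: int  # Unix time, 0 = no timestamp
    xfl: int    # extra flags; for Deflate 2 = max compression, 4 = fastest
    os: str     # originating file system

def parse_fixed_header(data: bytes) -> GzipHeader:
    """Parse bytes 0-9 of a gzip member header (RFC 1952).

    Optional fields that FLG may announce after byte 9 are not handled.
    """
    if len(data) < 10:
        raise ValueError("truncated header: need at least 10 bytes")
    # '<' = little-endian without padding
    id1, id2, cm, flg, mtime, xfl, os_byte = struct.unpack_from("<BBBBIBB", data)
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not in gzip format")
    if cm != 8:
        raise ValueError(f"unsupported compression method {cm}")
    return GzipHeader(cm, flg, mtime, xfl, OS_NAMES.get(os_byte, f"undefined ({os_byte})"))

hdr = bytes([0x1F, 0x8B, 8, 0, 0, 0, 0, 0, 2, 3])
print(parse_fixed_header(hdr))
# GzipHeader(cm=8, flg=0, mtime=0, xfl=2, os='Unix')
```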
Example commands
Compress a file (gzip replaces it with <filename>.gz):
gzip <filename>
Decompress a compressed file:
gzip -d <filename>
or
gunzip <filename>
Recursively compress all files in a directory and display the compression ratio of each file:
gzip -rv <directory>
Print the contents of a compressed text file:
zcat <filename>
Decompress a damaged compressed file up to the point of failure:
zcat <gzip-file> > <target-file>
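The last command works because a decompressor can emit data before it has seen the whole stream. A rough Python equivalent, assuming the input is a single gzip member, feeds the data to `zlib.decompressobj` in chunks and keeps everything decoded before the first error zlib reports; a truncated file simply stops producing output. The helper name `salvage` and the chunk size are hypothetical.

```python
import gzip
import zlib

def salvage(blob: bytes, chunk_size: int = 4096) -> bytes:
    """Decompress a possibly damaged gzip stream as far as possible.

    Feeds the input in chunks and keeps all output produced before the
    first error zlib reports; a truncated stream simply ends early.
    """
    d = zlib.decompressobj(wbits=31)  # 31 = 16 + 15: expect a gzip header
    parts = []
    for start in range(0, len(blob), chunk_size):
        try:
            parts.append(d.decompress(blob[start:start + chunk_size]))
        except zlib.error:
            break  # stop at the point of failure, keep what we have
    return b"".join(parts)

original = b"".join(b"line %d\n" % n for n in range(20000))
compressed = gzip.compress(original)
truncated = compressed[: len(compressed) // 2]  # simulate an aborted download
recovered = salvage(truncated)
print(len(recovered), recovered == original[: len(recovered)])
```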
gzip-compressed files
gzip | |
---|---|
File extension | .gz |
MIME type | application/gzip |
Magic number | \x1F\x8B\x08 (C notation) |
Developed by | Jean-loup Gailly and Mark Adler |
Type | Data compression |
Container for | any file |
Extended from | compress |
Standard | RFC 1952 |
Website | gzip.org |
Today the usual file extension for gzip-compressed files is .gz; in the past, .z was used.
Since gzip can only compress individual files, several files or directory trees are usually first combined with tar into an archive file called a tarball, which is then compressed with gzip. Such compressed archives usually have the double extension .tar.gz or simply .tgz. Overall, this achieves better compression, since redundancy between the individual files can be exploited (solid compression), but it makes access to individual files more difficult: to extract one file, the archive has to be decompressed up to that file.
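With Python's standard tarfile module, mode `"w:gz"` produces such a tarball: the files are first written into a tar archive, and the whole archive is compressed as one gzip stream. The in-memory file contents and the helper names `make_tgz` and `list_tgz` are illustrative; reading the archive back (mode `"r:gz"`) decompresses the stream sequentially.

```python
import io
import tarfile

def make_tgz(files: dict[str, bytes]) -> bytes:
    """Bundle several in-memory files into one tar archive and gzip it."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()

def list_tgz(blob: bytes) -> list[str]:
    """Return the member names; this decompresses the stream from the start."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        return tar.getnames()

archive = make_tgz({"a.txt": b"hello\n" * 100, "b.txt": b"hello\n" * 100})
print(archive[:2] == b"\x1f\x8b", list_tgz(archive))  # True ['a.txt', 'b.txt']
```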
Distribution
On Unix, compression with gzip is standard today because it offers a good compromise between speed and data reduction for many tasks. However, where minimal file size matters more than speed (e.g. when data is distributed widely over relatively slow networks), bzip2 and LZMA are increasingly used, usually, as with gzip, in combination with tar.
The zlib data format, the Deflate algorithm and the gzip file format were specified in 1996 in RFC 1950, RFC 1951 and RFC 1952.
See also
- List of data compression programs
- zopfli is an encoder developed by Google employees that produces compatible, smaller gzip files at the expense of very long compression times
- pigz is a version of gzip written by Mark Adler that uses all available processor cores and threads, noticeably speeding up compression
Web links
- gzip.org - original project page
- P. Deutsch: RFC 1952. GZIP File Format Specification version 4.3. May 1996.
- gzip(1): gzip, gunzip, zcat - compress or expand files. Debian GNU/Linux man page, goethe.ira.uka.de (Memento from September 8, 2012 in the Internet Archive) - easy-to-understand description of the various compression options
References
- gzip-1.10 released [stable]. December 30, 2018 (accessed December 30, 2018).
- gzip(1): compress and expand data (deflate mode). OpenBSD General Commands Manual.
- compress: compress data. Open Group Base Specification.
- Jean-loup Gailly, Mark Adler: Compression algorithm (deflate) (Memento from February 16, 2014 in the Internet Archive) on gzip.org. September 1, 1997 (Last-Modified).
- P. Deutsch: RFC 1952. GZIP File Format Specification version 4.3. May 1996.
- tools.ietf.org