ZIP file format

from Wikipedia, the free encyclopedia
ZIP
Breezeicons-mimetypes-64-application-zip.svg
File extension : .zip
MIME type : application / zip
Magic number : 504B.0304 hex
PK \ x03 \ x04

( ASCII-C notation )

Developed by: Phil Katz
Type: Data compression
Container for: any files
Standard (s) : PKWARE / IANA

The ZIP file format (of English zipper , zipper ') is a format for lossless compressed files, on the one hand reduces the space required for archiving and secondly as a container file can be functions, summarized in several related files or entire directory trees. The file extension for zip-archived files is .zip . The MIME type is application / zip .

history

The ZIP format was originally developed in 1989 with the programs PKZIP (compress) and PKUNZIP (decompress) by the American Phil Katz and has since been expanded a few times . Katz originally used a different file format ( ARC ). This format was developed by Software Enhancements Associates (SEA) and was distributed as shareware . Katz wrote his own, much faster version of this software and distributed it as PKARC . When SEA sued him, he withdrew PKARC and instead developed PKZIP, which used a more efficient algorithm . The rapid spread of PKZIP made SEA and ARC meaningless.

features

Container

The ZIP format is initially a data container in which several files can be stored compressed or uncompressed and also individually decompressed (extracted). The format also enables the associated storage location path to be saved. It is also possible to encrypt the otherwise only compressed files with a password .

No progressive compression

The ZIP format does not support progressive compression (also English solid called), the files are compressed individually. On the one hand, this enables flexible handling (deleting / adding files from the archive without having to recompress everything; extraction of individual files without having to decompress previous files), but has the disadvantage that redundancies between the files are not taken into account during compression can. This disadvantage can be circumvented by first archiving the files uncompressed and then saving the zip file created in this way in a compressed format (usually only useful if there are a large number of similar files).

Non-sequential format

The files are available as file entries ( english file entries stored) in any order. The file entries all begin with a local header ( English local header ) that describes the file entry, and initiates the data portion of the effective content. In order to handle these randomly arranged entries to ensure the zip file is located at the end in each case a central directory ( English central directory ), which references all file entries from the local headers. The order of the file entries and the corresponding references in the central directory can differ from one another. It is a non-sequential structure that best with the concept of random access ( English random access ) can be described.

On the other hand, this non-sequential format also has the effect that, in contrast to the tar format that has been used since 1977 and standardized since 1988 , incomplete archives or archives that are defective in the rear cannot be unpacked at all.

Structure of a zip file

Multivolume

It is still possible to divide the archive over several files (for example, to split large files into pieces that each fit on a CD or DVD).

Packing algorithms

In addition to the deflate method, which is the best method up to PKZip version 2.x , ZIP supports a number of other compression algorithms :

method Short text comment
0 Store The file is saved without compression.
1 Unshrinking Dynamic Lempel-Ziv-Welch algorithm
2 Expanding - compression level 1
3 Expanding - compression level 2
4th Expanding - compression level 3
5 Expanding - compression level 4
6th Imploding
7th Tokenization
8th Deflating LZSS and Huffman - entropy coding
9 Enhanced Deflating ( DEFLATE64 )
10 PKWARE Data Compression Library Imploding (formerly IBM TERSE)
11 reserved
12 Bzip2
13 reserved
14th LZMA Lempel-Ziv-Markow algorithm
15th reserved
16 reserved
17th reserved
18th IBM TERSE (new)
19th IBM LZ77 z Architecture (PFS)
95 Xz (LZMA2) 1.0.4 Extension by WinZIP 18.0 (November 2013)
96 JPEG compression Extension by WinZIP 12.0 (September 2008)
97 WavPack Extension by WinZIP 11.0 Beta (October 2006)
98 PPMd version 1, rev 1 Extension by WinZIP 10.0 Beta (August 2005)
99 AES encrypted Extension through WinZIP

Extensions

There are now extensions that have been introduced subsequently, such as the Zip128 extension.

Spread, meaning

The file format and the compression method Deflate are public domain and thus achieved worldwide distribution and importance.

The deflate method can be found as a quasi-standard in numerous other formats, such as the image file formats Portable Network Graphics (PNG) and Tagged Image File Format (TIFF), the OpenDocument and the Office Open XML format of the ISO .

Programs

In addition to PKZIP, there are numerous other programs that can process this file format. These include free programs such as Info-ZIP , PeaZip , Xarchiver or 7-Zip , whose optimized deflate algorithm can also generate slightly smaller PKZIP-2.xx-compatible files. There are also commercial programs such as WinRAR or WinZip .

Logo of the 7-Zip packing program

Program and class libraries for accessing zip files are available for numerous programming languages. For example, the Java Platform, Standard Edition (Java SE) since 1997 (Version 1.1) has included the “java.util.zip” package with the corresponding classes for compression and decompression. Next there is the class library Zip64File which zip files as so-called random access files ( English random access files can handle). Zip64File is available to the public in full, free of charge and including the source code.

The BOMArchiveHelper program integrated in the macOS system also generates and decompresses in Zip format. The Windows file explorer is also able to pack and unzip zip files, so that no additional software usually has to be installed here.

Name, confusion of names

According to the company PKWare, the name zip (English for zip fastener ) refers to the packing of many individual files in a larger container and not to the compression function of the program.

Not every compression program with a name containing the string "ZIP" works with the ZIP file format. The most important examples are gzip from the GNU project and bzip2 , which each compress only a single file in a separate format. In this case, to archive several files, another program must be used before compression (usually tar in connection with gzip and bzip2 ). Also with 7-Zip , although the ZIP file format is fully supported, but their own archive format 7z is not compatible with ZIP.

With version 12.1, WinZip introduced the zipx extension of the ZIP format, which marks the use of newer compression methods than DEFLATE, in particular BZip, LZMA, PPMd, Jpeg and Wavpack.

The word “zip” is occasionally used as a synonym for “compressed archiving”, but this does not necessarily mean the packing as a zip file.

ZIP compression in other file formats

The following file formats are zip files, but they must contain certain files:

See also

Web links

Individual evidence

  1. iana.org IANA
  2. ZIP File Format Specification ( en ) PKWARE Inc. October 1, 2014. Retrieved August 18, 2017.
  3. pkware.cachefly.net
  4. winzip.com
  5. kb.winzip.com
  6. winzip.com
  7. imagewz.winzip.com (PDF)
  8. winzip.com
  9. winzip.com
  10. winzip.com
  11. winzip.com
  12. Compressing (zipping) and unzipping (unzipping) files. In: Windows Support. Microsoft Corporation, accessed May 8, 2020 .
  13. APK: What is it actually? - Giga , on April 28, 2014
  14. Android package - entry in the Android wiki ; u. a. with "The programming language used is mostly Java, [...]", in the section "Create APK file" (last changed on November 5, 2017)