Universally Unique Identifier

from Wikipedia, the free encyclopedia

A Universally Unique Identifier ( UUID ) is a 128- bit number that is used to identify information in computer systems. The term Globally Unique Identifier ( GUID ) is also used, typically in connection with Microsoft (e.g. software, registry ).

When generated by the standard methods, unlike most other numbering schemes, UUIDs are unique for practical purposes without their uniqueness depending on a central registry or coordination between the parties who generate them. Although the probability of a UUID being duplicated is not zero, it is close enough to zero to be negligible.

Hence, anyone can create a UUID and use it to identify something with the greatest possible certainty that the identifier is not duplicating another identifier that has already been or is being created to identify something else. Information that has been marked with UUIDs by independent parties can therefore later be combined in a single database or transmitted on the same channel with a negligible probability of duplication.

The use of UUIDs and GUIDs is widespread. Many computing platforms offer assistance in generating and parsing their text representation.

It was standardized by the Open Software Foundation (OSF) as part of the Distributed Computing Environment (DCE) and is now regulated in RFC 4122.

A UUID consists of a 16- byte number that is noted in hexadecimal and divided into five groups. For example, in its normal form a UUID looks like this:

550e8400-e29b-11d4-a716-446655440000

History and standardization

UUIDs are documented as part of the ISO / IEC 11578: 1996 “Information technology - Open Systems Interconnection - Remote Procedure Call (RPC)” standard and as a separate ISO / IEC 9834-8: 2005 standard. The IETF has published the UUID-based RFC 4122 .

The original (version 1) generation scheme for UUID was to concatenate the UUID version with the MAC address of the computer generating the UUID and the number of 100 nanosecond intervals since the Gregorian calendar began . In practice, the actual algorithm is more complicated. This scheme has been criticized because it reveals both the identity of the generating computer and the time of generation.

Several other generation algorithms were developed and incorporated into the standard, such as a scheme that is only based on random numbers (version 4 UUID) and schemes in which the UUID is made up of any string (e.g. DNS entry, URL , ISO OID , " X.500 Distinguished Names", but also any other semantics, provided a basic UUID is defined for it) is derived via MD5 - (Version 3 UUID) or SHA-1 - (Version 5 UUID) hash values .

Release 5.0 of Java provides a class that generates a 128-bit wide UUID. The API documentation for the class java.util.UUIDreferences RFC 4122 . Many other languages ​​also provide ready-made routines for generating UUIDs.

Variant: Microsoft GUID

Microsoft also uses UUIDs in its Component Object Model, also called GUID there. However, some of these IDs correspond to their own specification. The UUIDs described here can be recognized by the top two bits of the field clock_seq_high_and_reserved. They always have the value 10 ; in the hexadecimal notation, the first hex digit of the fourth number is therefore always between 8 hex and B hex , e.g. E.g .: 5945c961-e74d-478f- 8 afe-da53cf4189e3. The historical UUIDs used by Microsoft have the value 110 in the top three bits of this field , so in the hexadecimal representation the first hex digit of the fourth number is either C hex or D hex . Example: The GUID of the IUnknowninterface in the COM has the UUID 00000000-0000-0000- C 000-000000000046.

construction

RFC 4122 describes the structure of a UUID. The names of the individual fields are based on the original UUID version 1 and are only of historical interest with the randomly generated UUIDs that are predominantly used today.

Structure of a UUID according to RFC 4122
Offset Field name Data type description
0 time_low uint32_t Timestamp, least significant 32 bits
4th time_mid uint16_t Timestamp, middle 16 bits
6th time_hi_and_version uint16_t The uppermost bits of the time stamp in the lower 12 bits of the field, the upper 4 bits serve as the version identifier
8th clock_seq_high_and_reserved uint8_t Top 6 bits of the clock sequence (the top 2 bits of the field are always in the UUID variant described here 1 0)
9 clock_seq_low uint8_t Lower 8 bits of the clock sequence
10 node uint48_t Unique node identification number

The top 4 bits in the field time_hi_and_versionindicate the so-called version of the UUID. Strictly speaking, this is not a version, but some kind of UUID subtype. The following 5 versions have been defined so far:

UUID versions according to RFC 4122
version description
1 original, timestamp-based UUID
2 DCE Security version
3 name based, MD5 hashed
4th random or pseudo-random UUID
5 name-based, SHA1- hashed

Timestamp-based UUIDs

The time stamp is a 60-bit value that counts the 100 ns intervals that have passed since October 15, 1582 (the introduction of today's Gregorian calendar ). In order to keep the time stamp unambiguous, if the system time has to be reset, there is a Clock sequence field , which in this case is either increased by 1 or is to be set to a new (pseudo) random value. The node ID should be the MAC address of one of the network cards installed in the system or a pseudo-random value if the system does not have a MAC address.

(Pseudo) randomly generated UUIDs (version 4)

All bits that are not set to fixed values ​​by the UUID format are occupied by (pseudo-) random values.

Although the uniqueness of a UUID generated in this way is not guaranteed, the total number of randomly generated UUIDs is so large that the probability of generating two identical UUIDs is very small, provided that the random number algorithms used provide uniformly distributed random numbers. Therefore, UUIDs can be generated and used for identification without a central control body, without running the relevant risk that the same UUID is used for something else. Information marked with UUID can thus later be merged in a single database without having to resolve identifier conflicts.

Examples of implementation of the UUID standard are:

Name-based UUIDs (version 3 and 5)

A UUID is generated based on a name that is not specified in detail. Names are unique identifiers for an object, a resource or the like within an assigned namespace. Based on a UUID for the namespace, a UUID is generated from the name by forming a byte sequence from the namespace UUID and the name itself and then hashing this byte sequence with MD5 or SHA1. The hash is then distributed to the available UUID bits in a defined way.

RFC 4122 contains sample namespace UUIDs for the namespaces DNS , URL , ISO OID and “ X.500 Distinguished Names”.

For example, if www.example.orga UUID version 5 is to be generated from the DNS name , proceed as follows:

  1. Namespace UUID for "DNS" names is 6ba7b810-9dad-11d1-80b4-00c04fd430c8.
  2. At this byte sequence is the sequence of bytes for the name www.example.orgappended (bytes from the namespace UUID in bold): .
    0x6b 0xa7 0xb8 0x10 0x9d 0xad 0x11 0xd1 0x80 0xb4 0x00 0xc0 0x4f 0xd4 0x30 0xc8 0x77 0x77 0x77 0x2e 0x65 0x78 0x61 0x6d 0x70 0x6c 0x65 0x2e 0x6f 0x72 0x67
  3. This byte sequence must be hashed with SHA1. The result is the following SHA1 hash: 74738ff55367e9589aee98fffdcd187694028007
  4. The bits of the SHA1 hash are to be divided among the available bits of UUID version 5.
  5. Generated UUID: 74738ff5-5367- 5 958- 9 aee-98fffdcd1876. (The places marked in bold contain fixed bits from the UUID variant and version.)

Collisions

Collisions occur when the same UUID is generated more than once and assigned to different reference objects.

With versions 1 and 2, a collision can only occur if the standard implementation is deviated from, either intentionally or unintentionally.

With the hash-based versions 3 and 5 as well as the pseudo-random version 4, collisions can occur without deviations in the implementation. However, the likelihood of this is so small that it can usually be ignored. The probability of this can be calculated exactly like the birthday problem.

For example, the number of version-4 UUIDs that must be computed to have a 50% chance of at least one collision is 2.71 trillion. This is calculated as follows:

If one were to generate 1 billion UUIDs per second, it would take 85 years to generate this number of UUIDs.

The smallest number of Version-4 UUIDs that have to be generated for the probability of finding a collision is approximated by this formula:

Therefore, the probability of a duplicate is one in a billion with 103 trillion version-4 UUIDs.

use

The main uses include the administration tools for the Ext2 / Ext3 / Ext4 file systems ( e2fsprogs uses the libuuid provided by util-linux ), LUKS encrypted partitions, GNOME , KDE and macOS , most of which are from the original implementation of Theodore Ts'o are derived.

Web links

Individual evidence

  1. Class Java API Specification at Oracle . (For more information including algorithms used to create UUIDs, see RFC 4122 : A Universally Unique IDentifier (UUID) URN Namespace, section 4.2 “Algorithms for Creating a Time-Based UUID”.) UUID
  2. Detailed description of timestamp-based UUIDs (in English)
  3. Linux Programmer's Manual. RANDOM (4). In: The Linux Programming Interface. Michael Kerrisk, August 29, 2010, accessed May 25, 2011 .
  4. Paulo Jesus, Carlos Baquero, Paula Almaeida: ID Generation in Mobile Environments .