What is a GUID?
GUID is an acronym for Globally Unique Identifier. GUIDs are also referred to as UUIDs, or Universally Unique Identifiers, and there is no real difference between the two. Technically, they are 128-bit reference numbers used in computing that are highly unlikely to repeat when generated, despite there being no central GUID authority to ensure uniqueness.
What does a GUID look like?
GUIDs follow a specific structure defined in RFC 4122 and come in a few different versions and variants. All of them share the same textual layout, xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx, where each x is a hexadecimal digit, M represents the version, and the most significant bits of N represent the variant.
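As a rough sketch of how the M and N positions can be read, the Python snippet below splits a GUID string into its five groups and inspects the two digits. The function name describe and the sample value are made up for illustration, and the variant labels follow the bit patterns listed in RFC 4122.

```python
import uuid

def describe(text: str) -> dict:
    """Read the version (M) and variant (top bits of N) from a GUID string."""
    groups = text.split("-")
    # The canonical form has five groups of 8-4-4-4-12 hex digits.
    if [len(g) for g in groups] != [8, 4, 4, 4, 12]:
        raise ValueError("not in 8-4-4-4-12 form")
    version = int(groups[2][0], 16)  # M: first digit of the third group
    n = int(groups[3][0], 16)        # N: first digit of the fourth group
    # The variant is encoded in the most significant bits of N.
    if n >> 3 == 0b0:
        variant = "NCS (reserved)"
    elif n >> 2 == 0b10:
        variant = "RFC 4122"
    elif n >> 1 == 0b110:
        variant = "Microsoft (reserved)"
    else:
        variant = "future (reserved)"
    return {"version": version, "variant": variant}

sample = "123e4567-e89b-42d3-a456-426614174000"  # illustrative value only
info = describe(sample)
print(info)

# The standard library exposes the same two fields directly.
parsed = uuid.UUID(sample)
print(parsed.version, parsed.variant)
```

In practice uuid.UUID does this parsing for you; the manual version is only there to show where the two fields sit in the string.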
How unique is unique?
A GUID is a number that can be used as an identifier for anything in the universe but, unlike an ISBN, it has no central authority behind it: the uniqueness of a GUID relies on the algorithm that was used to generate it. We’ll look at the types of GUIDs later, but for randomly generated GUIDs, the chance of a collision among 10-30 trillion of them is about the same as your chance of being hit by a meteorite in a given year.
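To get a feel for where a figure like that comes from, here is a small Python sketch using the standard birthday-problem approximation. It assumes every GUID is a random (version 4) GUID with 122 truly random bits; the function name collision_probability is illustrative.

```python
import math

RANDOM_BITS = 122                 # random bits in a version 4 GUID
SPACE = 2 ** RANDOM_BITS          # number of possible random GUIDs

def collision_probability(count: int) -> float:
    """Approximate chance of at least one duplicate among `count` GUIDs.

    Uses the birthday approximation p = 1 - exp(-n(n-1) / (2N)).
    expm1 keeps precision when the probability is tiny.
    """
    return -math.expm1(-count * (count - 1) / (2 * SPACE))

for trillions in (10, 30):
    n = trillions * 10 ** 12
    print(f"{trillions} trillion GUIDs: p = {collision_probability(n):.2e}")
```

The result depends entirely on the quality of the random source; a weak generator can collide far more often than this estimate suggests.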
Types of GUIDs
There are 5 versions of GUIDs defined in RFC 4122, each with different properties.
To identify the version of a GUID, just look at the version digit, e.g. version 4 GUIDs have the format xxxxxxxx-xxxx-4xxx-Nxxx-xxxxxxxxxxxx where N is one of 8, 9, A, or B.
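The version 4 format can be turned into a quick check. The following is a minimal Python sketch, assuming the input is a plain hyphenated string; the names V4_PATTERN and looks_like_v4 are hypothetical.

```python
import re
import uuid

# 8-4-4-4-12 hex digits, with a literal 4 as the version digit and
# 8, 9, a or b as the variant digit (matched case-insensitively).
V4_PATTERN = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}",
    re.IGNORECASE,
)

def looks_like_v4(text: str) -> bool:
    """Return True if text has the shape of a version 4 GUID."""
    return V4_PATTERN.fullmatch(text) is not None

print(looks_like_v4(str(uuid.uuid4())))
print(looks_like_v4(str(uuid.uuid1())))
```

A match only says the string is shaped like a version 4 GUID; it says nothing about how the remaining digits were produced.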
Version 1: date-time & MAC address
This version is generated using both the current time and the MAC address of the machine generating it. This means that if you have a version 1 GUID you can figure out when it was created by inspecting the timestamp value.
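As an illustration of recovering the creation time, the Python sketch below relies on the RFC 4122 definition of the version 1 timestamp: a 60-bit count of 100-nanosecond intervals since 15 October 1582. The helper name created_at is made up for this example.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 timestamps count from the start of the Gregorian calendar.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def created_at(guid: uuid.UUID) -> datetime:
    """Return the UTC creation time embedded in a version 1 GUID."""
    if guid.version != 1:
        raise ValueError("only version 1 GUIDs carry this timestamp")
    # guid.time is in 100 ns ticks; ten ticks make one microsecond.
    return GREGORIAN_EPOCH + timedelta(microseconds=guid.time // 10)

fresh = uuid.uuid1()
print(fresh, "was created at", created_at(fresh))

# A hand-built GUID whose timestamp field encodes the Unix epoch.
unix_epoch = created_at(uuid.UUID("13814000-1dd2-11b2-8000-000000000000"))
print(unix_epoch)
```

The last 12 hex digits of a version 1 GUID hold the node field, which is typically the MAC address, so these GUIDs can reveal where as well as when they were made.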
Version 2: DCE Security
This version isn’t fully specified in RFC 4122, so compliant generators don’t have to support it. It is similar to a version 1 GUID, except that the low 32 bits of the timestamp (the first 4 bytes of the GUID) are replaced by the user’s POSIX UID or GID, and the low byte of the clock sequence is replaced by a domain value indicating whether that identifier is a UID or a GID.
Version 3: MD5 hash & namespace
This GUID is generated by taking a namespace identifier (e.g. the one reserved for fully qualified domain names) and a given name, converting both to bytes, concatenating them, and hashing the result with MD5. After the special bits such as the version and variant are set, the resulting bytes are converted into hexadecimal form. The special property of this version is that GUIDs generated from the same name in the same namespace will be identical, even if generated at different times.
This version is identical to version 5 except for the hashing algorithm used. If you don’t have to maintain backwards compatibility with existing MD5 GUIDs, SHA-1 (version 5) is preferred.
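The steps described above can be sketched in Python as follows. This is a hand-rolled illustration rather than a reference implementation; make_v3 is a hypothetical name, and the result is compared against the standard library's uuid.uuid3 as a sanity check.

```python
import hashlib
import uuid

def make_v3(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Build a version 3 GUID by hand: hash, then stamp version and variant."""
    # Concatenate the namespace bytes with the name bytes and hash with MD5.
    raw = bytearray(hashlib.md5(namespace.bytes + name.encode("utf-8")).digest())
    raw[6] = (raw[6] & 0x0F) | 0x30  # top four bits of byte 6: version 3
    raw[8] = (raw[8] & 0x3F) | 0x80  # top two bits of byte 8: variant 10
    return uuid.UUID(bytes=bytes(raw))

first = make_v3(uuid.NAMESPACE_DNS, "example.com")
second = make_v3(uuid.NAMESPACE_DNS, "example.com")
print(first)
print(first == second)                                      # same inputs, same GUID
print(first == uuid.uuid3(uuid.NAMESPACE_DNS, "example.com"))
```

Because the output depends only on the namespace and the name, anyone who knows both can reproduce the GUID, which is the point of this version but also a reason not to treat it as a secret.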
Version 4: random
This type of GUID is created using random numbers – of the 128 bits in a GUID, 6 are reserved for special use (version + variant bits) giving us 122 bits that can be filled at random.
The specification doesn’t say how the random numbers should be generated; they could be anywhere from pseudo-random to cryptographically secure. Hence these GUIDs, like all other GUIDs, should only be used for identification and not for security.
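A minimal Python sketch of the idea: draw 16 random bytes, then overwrite the 6 reserved bits. The choice of the secrets module as the random source is an assumption made for this example (the specification leaves it open), and make_v4 is a hypothetical name.

```python
import secrets
import uuid

def make_v4() -> uuid.UUID:
    """Build a version 4 GUID from 128 random bits, 6 of which are overwritten."""
    raw = bytearray(secrets.token_bytes(16))
    raw[6] = (raw[6] & 0x0F) | 0x40  # 4 version bits set to 0100
    raw[8] = (raw[8] & 0x3F) | 0x80  # 2 variant bits set to 10
    return uuid.UUID(bytes=bytes(raw))

guid = make_v4()
print(guid, guid.version, guid.variant)
```

In everyday code uuid.uuid4() does the same job; the manual version just makes the 122 free bits and 6 fixed bits visible.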
Version 5: SHA-1 hash & namespace
This version is identical to version 3 except that SHA-1 is used in the hashing step in place of MD5.
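One practical detail is that a SHA-1 digest is 20 bytes long while a GUID holds only 16, so the digest is truncated before the version and variant bits are set. The Python sketch below illustrates this; make_v5 is a hypothetical name, and the result is checked against the standard library's uuid.uuid5.

```python
import hashlib
import uuid

def make_v5(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Build a version 5 GUID by hand from a truncated SHA-1 digest."""
    full = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()
    raw = bytearray(full[:16])       # keep the first 16 of the 20 digest bytes
    raw[6] = (raw[6] & 0x0F) | 0x50  # version 5
    raw[8] = (raw[8] & 0x3F) | 0x80  # variant 10
    return uuid.UUID(bytes=bytes(raw))

v5 = make_v5(uuid.NAMESPACE_DNS, "example.com")
v3 = uuid.uuid3(uuid.NAMESPACE_DNS, "example.com")
print(v5)
print(v3)  # same namespace and name, but a different GUID
```

The two versions never agree for the same inputs, which is why existing MD5-based GUIDs cannot simply be regenerated as version 5.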
What’s so great about GUIDs?
- No need for a central authority to ensure uniqueness: Easy to dish out, but hard to track
- You probably won’t run out: there are about 75,000,000,000,000,000,000 grains of sand on earth, but even that pales in comparison to the number of GUIDs available, roughly 340,282,366,920,938,463,463,374,607,431,770,000,000
- Easy to merge: You can merge different datasets as GUID entity identifiers are very unlikely to collide
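The size of the GUID space follows from the 128 bits. A short Python sketch of the arithmetic, taking the sand-grain figure quoted in the list above as given:

```python
TOTAL_GUIDS = 2 ** 128                    # every possible 128-bit value
SAND_GRAINS = 75_000_000_000_000_000_000  # estimate quoted in the list above

print(f"{TOTAL_GUIDS:,} possible GUIDs")
print(f"{TOTAL_GUIDS // SAND_GRAINS:,} GUIDs per grain of sand")
```

This counts every 128-bit pattern; the number of distinct random (version 4) GUIDs is smaller, 2 to the power of 122, because 6 bits are fixed.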