Choosing a primary key
Choosing a primary key#
Summary of the article on supabase: Choosing a Primary Key for Postgres
- Natural key - has meaning eg.
email
in users table - Surrogate key - no meaning other than to identify the row
In postgres serial/bigserial
is the right tool (datatype) in our toolbox to maintain a shared, auto-incrementing sequence of numbers. Not integer/biginteger
.
Problems with serial
- When writing automation that simply iterates through id values, note that serial columns can have gaps, even if you never
DELETE
(e.x. if anINSERT
was rolled back — sequences live outside transactions) - They are in order and attackers know that
UUID#
Enter UUIDS
They’re very random (almost always generated with secure random sources), and while they’re even worse for remembering, they’re near impossible to practically guess – the search space is just too large!
UUID v1#
Consists of:
- a 60 bit date-time (at nanosecond precision)
-
a 48 bit MAC address
a9957082-0b47-11ed-8a91-3cf011fe32f1
UUIDv4#
- Use all available bits for randomness
-
122 bits
ce0b897d-03a0-4f54-8c97-41d29a325a23
Make for great Global Unique IDentifiers - as they are unlikely to collide
problems with UUID:
- UUIDs are twice the size of bigint/bigserial
- UUIDv1s contain a time element but they’re not lexicographically sortable (this means they SORT terribly, relative to integer or a timestamp column)
- UUIDv4s index terribly, as they’re essentially random values (obviously, they SORT terribly as well)
### Greater than UUIDv4
- UUIDv6 - 62 bits of gregorian time + 48 bits of randomness
- UUIDv7 - 36 bits of big endian unix timestamp (seconds since epoch + leapseconds w/ optional sub-second precision) + variable randomness up to 62 bits
- UUIDv8 - variable size timestamp (32/48/60/64 bits) + variable size clock (8/12 bits) + variable randomness (54/62 bits)
SQL Compliant Integer IDs#
Use:
CREATE TABLE (
id integer PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
email citext NOT NULL CHECK (LENGTH(email) < 255),
name text NOT NULL
)