How Reddit and HackerNews structure infinitely deep conversation threads.
A flat list of comments is easy: just store post_id. But in a nested discussion, replies have replies, which have replies. Querying a relational database recursively to fetch 10 levels of replies is incredibly slow (an N+1 query nightmare). We need a data structure that allows us to fetch an entire comment tree in a single fast database query, while preserving the exact nested order.
The most common pattern is the Materialized Path (also called Path Enumeration). Every comment stores the full path of its ancestors as a string. By sorting by this string path, the database naturally returns all comments in perfect threaded, nested order in a single query.
# The Materialized Path approach in SQL
CREATE TABLE comments (
id SERIAL PRIMARY KEY,
post_id INT,
body TEXT,
path VARCHAR -- The magic column! Example: "001.002.001"
);
# To fetch all comments for a post, already sorted correctly for UI rendering:
# We just query by post_id and sort alphabetically by the path!
SELECT * FROM comments
WHERE post_id = 42
ORDER BY path ASC;
# Resulting Order:
# path="001" -> "I love this"
# path="001.001" -> "Me too!"
# path="001.001.001" -> "Same here"
# path="002" -> "I disagree"
Fetching is incredibly fast O(1) query. Inserting is fast O(1). The trade-off is moving a comment to a different thread (re-parenting), which requires rewriting the `path` of the comment and all of its descendants. Fortunately, users rarely move comments between threads.
1.2.10, standard alphabetical sorting will put 1.2.10 before 1.2.2! You must pad the IDs (e.g. 001.002.010) or use a database type designed for this (like Postgres's ltree extension) to ensure proper sorting.