Take a big blob of text, hand back a tiny shareable link, and optimize for reads, which far outnumber writes.
A pastebin accepts a chunk of text (code, logs, a config) and returns a short URL anyone can open. The hard part is not storing the text — it's minting a short, unique key for each paste and serving it back fast, because reads vastly outnumber writes.
The clean design splits the two paths. On write, we save the blob and generate a base62 key. On read, we look the key up and stream the blob straight from object storage, ideally through a cache so the database barely gets touched.
Generate the key from a monotonically increasing ID encoded in base62 (so it stays short and collision-free), or hash the content and keep the first few characters — retrying on the rare collision.
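
    import string

    # 62 symbols: 0-9, then a-z, then A-Z
    ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

    def base62(n):                          # 125 -> "21", 999999 -> "4c91"
        s = ""
        while n:
            n, r = divmod(n, 62)            # peel off the lowest base62 digit
            s = ALPHABET[r] + s
        return s or "0"

    # db, blob_store, and cache stand in for the service's storage clients.
    def create_paste(text):
        paste_id = db.next_id()             # atomic counter
        key = base62(paste_id)              # short, unique
        blob_store.put(key, text)           # the big bytes
        db.put(key, {"size": len(text)})    # tiny metadata row
        return f"https://pb/{key}"

    def read_paste(key):
        # Compare against None so an empty paste still counts as a hit.
        if (hit := cache.get(key)) is not None:   # most reads end here
            return hit
        text = blob_store.get(key)
        cache.set(key, text, ttl=3600)      # keep it hot for an hour
        return text
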
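The snippet above covers the counter-based option. The hash-based alternative mentioned earlier could look roughly like the sketch below. It is only an illustration under stated assumptions: `hash_key`, the `taken` dictionary (a stand-in for the metadata store), the 7-character default, and the attempt-number salt used for retries are all hypothetical choices, not part of the original design.

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 symbols


def base62(n: int) -> str:
    """Encode a non-negative integer in base62."""
    s = ""
    while n:
        n, r = divmod(n, 62)
        s = ALPHABET[r] + s
    return s or "0"


def hash_key(text: str, taken: dict, length: int = 7, max_tries: int = 8) -> str:
    """Derive a short key from the paste content, retrying on collision.

    `taken` is a hypothetical in-memory stand-in for the metadata store,
    mapping each key that is already in use to the text stored under it.
    """
    for attempt in range(max_tries):
        # Salt with the attempt number so a retry produces a different digest.
        digest = hashlib.sha256(f"{attempt}:{text}".encode("utf-8")).digest()
        # Keep the first few characters of the base62-encoded digest.
        key = base62(int.from_bytes(digest, "big"))[:length]
        existing = taken.get(key)
        if existing is None:
            taken[key] = text       # free slot: claim it
            return key
        if existing == text:
            return key              # same content already stored: reuse its key
        # Otherwise a different paste owns this key: loop and try again.
    raise RuntimeError("no free key found; consider a longer key length")
```

One side effect of hashing the content is that identical pastes map to the same key, which deduplicates storage but also means two users who paste the same text share a link.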
| Operation | Work | Why |
|---|---|---|
| Write | O(1) | One ID bump, one blob put, one metadata row |
| Read (cache hit) | O(1) | Served from memory, never reaches the database |
| Read (cache miss) | O(1) + I/O | One blob fetch, then populate the cache |
| Key length | ~7 chars | 62^7 ≈ 3.5 trillion pastes |
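The key-length row rests on simple arithmetic: a key of k base62 characters can name 62^k distinct pastes. The sketch below just checks that arithmetic; the helper names `keyspace` and `min_key_length` are made up for illustration.

```python
def keyspace(length: int, base: int = 62) -> int:
    """Number of distinct keys of exactly `length` characters."""
    return base ** length


def min_key_length(n_pastes: int, base: int = 62) -> int:
    """Smallest key length whose keyspace holds at least `n_pastes` keys."""
    length = 1
    while base ** length < n_pastes:
        length += 1
    return length


# Print how the keyspace grows with each extra character.
for length in range(5, 9):
    print(length, f"{keyspace(length):,}")

# keyspace(7) is 3,521,614,606,208, i.e. roughly 3.5 trillion.
```

Each extra character multiplies capacity by 62, so key length grows very slowly with the number of pastes.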
Sequential keys are guessable: if /4c91 is followed by /4c92, people can scrape private pastes. Add randomness or a per-paste secret.
You paste a 4 KB stack trace. The service bumps its counter to 3,500,000, encodes it as base62 → "eGvC", writes the bytes to blob storage under that key and a tiny row to the database, then returns https://pb/eGvC. A teammate opens the link: the first open misses the cache and reads from blob storage; every open for the next hour is a pure cache hit, so the database stays idle even under a thousand views.
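The advice about guessable keys can be acted on in at least two ways, sketched below with Python's standard `secrets` and `hmac` modules. Treat this as one possible approach rather than the service's actual scheme: `random_key`, `new_token`, `token_matches`, the 10-character key length, and the idea of storing a token next to the paste are all assumptions made for illustration.

```python
import hmac
import secrets
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 symbols


def random_key(length: int = 10) -> str:
    """Option 1: a key drawn from a CSPRNG, so neighbours cannot be guessed.

    Random keys can collide, so the caller still has to check the store
    and draw again if the key is already taken.
    """
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def new_token() -> str:
    """Option 2: keep the sequential key but pair it with a per-paste secret.

    The token would be stored with the paste and required on every read
    (hypothetical scheme, e.g. passed alongside the key in the URL).
    """
    return secrets.token_urlsafe(16)


def token_matches(stored: str, presented: str) -> bool:
    """Constant-time comparison, so timing does not leak the stored token."""
    return hmac.compare_digest(stored.encode("utf-8"), presented.encode("utf-8"))
```

Random keys give up the counter's built-in uniqueness, while a per-paste secret keeps short sequential keys at the cost of a longer link for private pastes.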
Reads outnumber writes 100:1. Where should the read path spend most of its time?
Coach note: if this didn't click yet, replay the visual and watch which boxes light up on read versus write — the asymmetry is the whole design.