Non-Atomic Counter

Why count++ is secretly three separate operations.

The idea

When you write count++ in your code, it looks like a single, unbreakable command. But to the CPU, it is typically three separate steps: 1) READ the current value from memory into a CPU register. 2) ADD one to the register. 3) WRITE the new value back to memory. If two threads do this at the same time, their steps can interleave, and increments can be lost.

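For example, two threads are about to execute count++ on a shared variable currently at 0. Both read 0, both add one to get 1, and both write 1 back. Two increments ran, but the variable ends at 1 instead of 2.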
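The lost update is easy to reproduce. Here is a minimal sketch, assuming a hypothetical class LostUpdateDemo with a helper run() (both names are made up for illustration). Several threads hammer a plain int with count++, and the final value is compared with the number of increments that actually ran.

```java
import java.util.ArrayList;
import java.util.List;

class LostUpdateDemo {
    // Shared, unsynchronized counter: count++ on it is a read, an add, and a write.
    static int count = 0;

    // Hypothetical helper: starts `threads` threads that each run count++
    // `perThread` times, waits for them, and returns the final counter value.
    static int run(int threads, int perThread) {
        count = 0;
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    count++; // not atomic: two threads can read the same old value
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait until this worker has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int expected = 4 * 1_000_000;
        int actual = run(4, 1_000_000);
        System.out.println("expected " + expected + ", got " + actual);
    }
}
```

On a multi-core machine the second number usually comes out lower than the first, and it changes from run to run. Any single run may still happen to be correct, which is exactly what makes this bug hard to catch in testing.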

How it works (Atomic Types)

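You could fix this by wrapping count++ in a Mutex (Lock). But a lock adds overhead and can block every thread that has to wait for it. For a single counter, a usually faster solution is to use hardware-level Atomics (like Java's AtomicInteger or C++'s std::atomic). These rely on atomic read-modify-write CPU instructions (Compare-And-Swap, or a dedicated atomic add on some processors) that make the Read-Modify-Write happen as one indivisible operation: no other thread can observe or change the value partway through.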
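To make the Compare-And-Swap idea concrete, here is a minimal sketch of an increment built only from AtomicInteger's get() and compareAndSet(). The class CasIncrementSketch and the helper incrementWithCas are hypothetical names used for illustration. This is not necessarily how the JDK implements incrementAndGet(), which may map to a single atomic add instruction on some hardware.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasIncrementSketch {
    // Hypothetical helper: increments `counter` with a compare-and-swap retry loop
    // and returns the value this call wrote.
    static int incrementWithCas(AtomicInteger counter) {
        while (true) {
            int current = counter.get();   // READ the current value
            int next = current + 1;        // ADD one
            // WRITE only if nobody changed the value since we read it.
            if (counter.compareAndSet(current, next)) {
                return next;
            }
            // Another thread got in first: loop and retry with the fresh value.
        }
    }

    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger(0);
        incrementWithCas(counter);
        incrementWithCas(counter);
        System.out.println(counter.get()); // prints 2
    }
}
```

The key point is that a failed compareAndSet never overwrites another thread's update. The loser simply retries, so no increment is lost.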

// The Bug (count++)
int count = 0;
// If 100 threads each run count++ once, the final count might be 87 instead of 100!

// The Fix (Atomics)
import java.util.concurrent.atomic.AtomicInteger;

AtomicInteger count = new AtomicInteger(0);

// This delegates to an atomic hardware instruction.
// It is thread-safe and typically faster than taking a Mutex lock.
count.incrementAndGet();
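The snippet above is a fragment, so here is a runnable sketch of the same fix. The class name AtomicCounterDemo and the helper run() are made up for illustration. 100 threads each call incrementAndGet() 10,000 times, and the total comes out exact every time.

```java
import java.util.concurrent.atomic.AtomicInteger;

class AtomicCounterDemo {
    // Hypothetical helper: starts `threads` threads that each increment a shared
    // AtomicInteger `perThread` times, waits for them, and returns the total.
    static int run(int threads, int perThread) {
        AtomicInteger count = new AtomicInteger(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    count.incrementAndGet(); // atomic read-modify-write
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait until this worker has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return count.get();
    }

    public static void main(String[] args) {
        System.out.println(run(100, 10_000)); // prints 1000000
    }
}
```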

Cost

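Atomic operations are lock-free and very fast: an uncontended increment takes on the order of nanoseconds. However, under heavy contention (many threads hammering the same atomic variable constantly), the cache line holding the variable has to bounce between CPU cores, and Compare-And-Swap attempts fail and retry more often, so performance degrades. For ultra-high contention counting, structures like Java's LongAdder are used instead: it spreads updates across several internal cells and adds them up when the value is read.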
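For the high-contention case, a minimal sketch with java.util.concurrent.atomic.LongAdder looks almost the same as the AtomicInteger version. The class name LongAdderDemo and the helper run() are hypothetical. Only increment() and sum() are used from the real API.

```java
import java.util.concurrent.atomic.LongAdder;

class LongAdderDemo {
    // Hypothetical helper: starts `threads` threads that each call increment()
    // `perThread` times on a shared LongAdder, waits for them, and returns the sum.
    static long run(int threads, int perThread) {
        LongAdder hits = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    hits.increment(); // contended updates can land in different cells
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait until this worker has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return hits.sum(); // exact here, because all writers have finished
    }

    public static void main(String[] args) {
        System.out.println(run(8, 1_000_000)); // prints 8000000
    }
}
```

The trade-off is on the read side: sum() is not an atomic snapshot while other threads are still updating, so LongAdder suits statistics-style counters better than values that other logic must read exactly at every moment.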

Watch out for