False Sharing (Cache Contention)

When adding more CPU cores makes your multithreaded code slower.

The idea

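CPUs don't read memory one byte at a time. They fetch it in fixed-size chunks called Cache Lines (usually 64 bytes). If Core 1 is modifying variable A, and Core 2 is modifying variable B, they shouldn't interfere with each other, right? But if A and B happen to sit right next to each other in memory (within the same aligned 64-byte block), they end up in the same Cache Line. Cache coherence is tracked per line, not per variable, so when Core 1 writes to A, Core 2's copy of the entire line is invalidated. Before Core 2 can touch B again, it has to re-fetch the line. That usually means pulling it from Core 1's cache or a shared cache level rather than from main RAM, but it is still far slower than a hit in Core 2's own L1 cache. Core 2's write to B then invalidates Core 1's copy, and the line keeps bouncing between the two cores. This invisible fight over the cache line is called False Sharing, and it crushes performance.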

Step 1: Two independent counters, A and B, in an array. Core 1 updates A, Core 2 updates B.
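
To make the layout concrete, here is a minimal sketch that checks whether the two counters land in the same cache line. The constant kCacheLine and the helper line_of are hypothetical names introduced for illustration, and the 64-byte line size is an assumption about the target CPU, not a guarantee.

```cpp
#include <cstddef>

// Assumption: 64-byte cache lines (typical on x86-64, not universal).
constexpr std::size_t kCacheLine = 64;

struct Counters {
    int A; // Core 1 writes here
    int B; // Core 2 writes here
};

// Which cache line a byte offset falls into, counting from the start of
// the struct and assuming the struct begins on a cache line boundary.
constexpr std::size_t line_of(std::size_t offset) {
    return offset / kCacheLine;
}

// A sits at offset 0 and B right after it (offset 4 with a 4-byte int),
// so both offsets map to line 0.
static_assert(line_of(offsetof(Counters, A)) == line_of(offsetof(Counters, B)),
              "A and B share a cache line");
```

The check is offset-based, so it only describes a struct that starts on a line boundary. At an arbitrary address, A and B can occasionally straddle two lines (when B's address happens to be a multiple of 64), which is why false sharing can appear or vanish depending on where the allocator places the object.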

How it works (Padding)

To fix False Sharing, you force A and B into different cache lines. You do this by injecting empty space (Padding) between them in memory. If the cache line is 64 bytes, you pad until B starts 64 bytes after A. With a 4-byte int A, that means 60 bytes of junk data between them. Two addresses that are 64 bytes apart can never fall in the same 64-byte line, so Core 1 and Core 2 can now hammer their respective cache lines without invalidating each other.

// THE BAD WAY: A and B are adjacent in memory
struct Counters {
    int A; // Core 1 writes here
    int B; // Core 2 writes here
}; 
// Result: Massive Cache Invalidation Ping-Pong

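// THE GOOD WAY: Cache Line Padding
// (renamed so both versions can live in one file; assumes 4-byte int and 64-byte lines)
struct PaddedCounters {
    int A;            // Core 1 writes here (offset 0)
    char padding[60]; // Force B into the next 64-byte chunk
    int B;            // Core 2 writes here (offset 64)
};
// Result: often several times faster across multiple cores.
// The exact speedup depends on the hardware and the workload, so measure it.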
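
Manual padding hard-codes both the size of int and the line size. As an alternative sketch (not the approach shown above), C++11's alignas can ask the compiler to insert the padding itself. The names AlignedCounters and kCacheLine are hypothetical, and 64 bytes is again an assumed line size.

```cpp
#include <cstddef>

// Assumption: 64-byte cache lines (typical on x86-64, not universal).
constexpr std::size_t kCacheLine = 64;

struct AlignedCounters {
    alignas(kCacheLine) int A; // Core 1 writes here, starts its own line
    alignas(kCacheLine) int B; // Core 2 writes here, starts the next line
};

// The compiler adds the padding: B begins exactly one line after A,
// and the whole struct is rounded up to two full lines.
static_assert(alignof(AlignedCounters) == kCacheLine, "struct is line-aligned");
static_assert(offsetof(AlignedCounters, B) == kCacheLine, "B starts line 1");
static_assert(sizeof(AlignedCounters) == 2 * kCacheLine, "two full lines");
```

Because the struct's size is rounded up to a multiple of its alignment, neighbouring elements in an array of AlignedCounters do not share a line either, at the price of 128 bytes per element. C++17 also defines std::hardware_destructive_interference_size in <new> as a portable stand-in for the hard-coded 64, though standard library support for it varies.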

Cost

Adding padding wastes RAM. In the example above, 60 of each struct's 68 bytes are doing absolutely nothing. If you have an array of 1,000,000 padded structs, you are wasting 60 Megabytes of RAM (1,000,000 × 60 bytes) just to prevent cache contention. It's a direct trade-off: sacrificing Memory Efficiency for Multi-Core CPU Speed.
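
The arithmetic behind that figure can be checked at compile time. This is a sketch that assumes a 4-byte int; the constant names are hypothetical.

```cpp
#include <cstddef>

struct Counters {
    int A;
    int B;
};

struct PaddedCounters {
    int A;
    char padding[60]; // pushes B to offset 64
    int B;
};

constexpr std::size_t kCount = 1000000; // elements in the array

// Total bytes for one million elements of each layout (4-byte int assumed).
constexpr std::size_t kPlainBytes  = kCount * sizeof(Counters);       // 8 bytes each
constexpr std::size_t kPaddedBytes = kCount * sizeof(PaddedCounters); // 68 bytes each

// Extra memory spent purely on padding.
constexpr std::size_t kWastedBytes = kPaddedBytes - kPlainBytes;

static_assert(sizeof(Counters) == 8, "two 4-byte ints");
static_assert(sizeof(PaddedCounters) == 68, "4 + 60 + 4 bytes");
static_assert(kWastedBytes == 60000000, "60 MB of padding");
```

The padded array takes 68 MB instead of 8 MB, so it is 8.5 times larger for the same useful data.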

Watch out for