The traditional BSD scheduler (O(N) priority recalculation every second) is fatal on a 16-CPU system. The 4.4BSD-Lite scheduler, while improved, still requires a global lock on the run queue.
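To make the O(N) cost concrete, here is a minimal Python model of a once-per-second recalculation pass. It is a sketch, not kernel code: the decay and priority formulas follow the form commonly documented for 4.3BSD-era schedulers, while `Proc`, `schedcpu`, `PUSER`, and `MAXPRI` are illustrative names and values assumed for this example.

```python
from dataclasses import dataclass

PUSER = 50    # assumed base user-mode priority (illustrative value)
MAXPRI = 127  # assumed numeric ceiling for a priority


@dataclass
class Proc:
    estcpu: float       # recent CPU usage estimate
    nice: int = 0       # user-supplied niceness
    usrpri: int = PUSER


def schedcpu(procs, load_avg):
    """Decay and re-prioritise every process; return how many were visited."""
    # Decay factor depends on the load average: higher load, slower decay.
    decay = (2.0 * load_avg) / (2.0 * load_avg + 1.0)
    visited = 0
    for p in procs:  # one full walk of the process table per call
        p.estcpu = decay * p.estcpu + p.nice
        p.usrpri = min(MAXPRI, int(PUSER + p.estcpu / 4 + 2 * p.nice))
        visited += 1
    return visited
```

The point of the model is the loop: every call touches every process, so the work grows linearly with the process count, and on a multiprocessor that walk has to be serialized against whatever else reads or writes those priorities.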
The Mach 3.0 microkernel, despite its performance problems (user-space IPC overhead was 200µs on a 50MHz 68040), influenced the model. By late 1994, almost every commercial UNIX vendor is internally re-engineering its kernel as a set of server threads within a single address space: a "hybrid" microkernel that avoids the IPC penalty.
Senior Systems Analyst, UNIX Research Group
Date: April 17, 1994
Schimmel introduced the concept of false sharing, a subtle bug where two unrelated variables sit on the same cache line. If Processor A writes to variable 1 and Processor B writes to variable 2, the hardware cache coherency logic forces the cache line to bounce back and forth between the processors, killing performance even though the two threads never touch the same data. This insight remains relevant today for anyone writing high-performance multithreaded code in C++, Rust, or Go.
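The layout problem can be shown without any timing at all. The sketch below uses Python's `ctypes` purely to inspect field offsets. It assumes a 64-byte cache line and a structure that starts on a line boundary, and the names `Packed`, `Padded`, `cache_line_of`, and `shares_line` are hypothetical, chosen for this example.

```python
import ctypes

CACHE_LINE = 64  # assumed line size in bytes; real hardware varies


class Packed(ctypes.Structure):
    # Two unrelated counters laid out back to back.
    _fields_ = [
        ("a", ctypes.c_uint64),
        ("b", ctypes.c_uint64),
    ]


class Padded(ctypes.Structure):
    # Same counters, with filler bytes so that b starts on the next line.
    _fields_ = [
        ("a", ctypes.c_uint64),
        ("pad", ctypes.c_uint8 * (CACHE_LINE - ctypes.sizeof(ctypes.c_uint64))),
        ("b", ctypes.c_uint64),
    ]


def cache_line_of(struct_type, field_name):
    """Index of the line holding a field's first byte (struct assumed line-aligned)."""
    return getattr(struct_type, field_name).offset // CACHE_LINE


def shares_line(struct_type, first, second):
    """True if both fields start on the same cache line."""
    return cache_line_of(struct_type, first) == cache_line_of(struct_type, second)
```

With these definitions, `shares_line(Packed, "a", "b")` is true and `shares_line(Padded, "a", "b")` is false. In C, C++, Rust, or Go the same separation is usually obtained with alignment attributes or explicit padding fields. The cost is memory: the padded structure spends 56 bytes on nothing but keeping the two writers apart.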
Schimmel’s book was among the first to bridge the gap between hardware engineering and software engineering. He explained the MESI protocol (Modified, Exclusive, Shared, Invalid) that hardware uses to maintain cache coherency and, critically, how operating system code must be written to avoid "cache thrashing."
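A toy model helps show why alternating writers thrash. The following Python sketch tracks the MESI state of a single cache line across several private caches. It is deliberately simplified (no bus timing, no write-back traffic, one line only), and `State`, `Line`, and the `invalidations` counter are names assumed for this example rather than anything taken from the book.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class Line:
    """One cache line as seen by n_cpus private caches (simplified MESI)."""

    def __init__(self, n_cpus):
        self.states = [State.INVALID] * n_cpus
        self.invalidations = 0  # copies killed by remote writes

    def read(self, cpu):
        if self.states[cpu] is not State.INVALID:
            return  # hit: M, E, and S all satisfy a read locally
        holders = [i for i, s in enumerate(self.states)
                   if i != cpu and s is not State.INVALID]
        for i in holders:
            # A remote M or E copy is downgraded to S when another cache reads.
            self.states[i] = State.SHARED
        self.states[cpu] = State.SHARED if holders else State.EXCLUSIVE

    def write(self, cpu):
        for i, s in enumerate(self.states):
            if i != cpu and s is not State.INVALID:
                self.states[i] = State.INVALID  # remote copy is invalidated
                self.invalidations += 1
        self.states[cpu] = State.MODIFIED
```

If two CPUs take turns calling `write` on the same `Line`, every write after the first invalidates the other CPU's copy, so the invalidation count grows in step with the number of writes. That steady stream of invalidations and refetches is the thrashing pattern that careful data layout is meant to avoid.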
UNIX Systems for Modern Architectures is best known for its deep dive into the evolution of kernel locking strategies, which form the heart of the book.