One of the most important things I’ve learned over the last ten years is that ...

One of the most important things I’ve learned over the last ten years is that building a core is the easy bit of building an SoC. Aside from a few things like branch predictors, which are literal black magic (little-known fact: there is an extra layer of doping in CMOS manufacturing for branch predictors, where a molecule-thick layer of sacrificial goat blood is applied to the silicon), most of what happens in a core is pretty well understood and documented in the literature. There are loads of papers and even books about how cores work. Most of the hard work is looking at the large possible design space and choosing the tradeoffs that make sense for your target price/power/performance/area point.

But the rest of the SoC is usually described in very short summaries and contains a large number of places where you can easily lose as much as 20-30% performance. The core’s throughput gives you an upper bound on performance but it’s trivial to accidentally build an SoC that can’t achieve (or even approach) that performance in anything other than a synthetic microbenchmark.

This is what makes the Apple systems nice. It’s not that anything they do is particularly good (though some bits are), it’s that nothing is bad. They can do this (in part) because they design the SoC along with the target memory configuration, so the memory size and bandwidth are known when designing caches, cache eviction policies, and prefetchers, and when sizing store queues and speculation windows to hide memory latency.
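
To make the sizing point concrete, here is a rough sketch using Little's law: the data that must be in flight to keep a memory system busy is roughly bandwidth times latency. The function name and the numbers below are illustrative assumptions, not figures from any real SoC.

```python
import math


def min_outstanding_misses(bandwidth_gb_s: float,
                           latency_ns: float,
                           line_bytes: int = 64) -> int:
    """Estimate how many cache-line requests must be in flight to
    sustain a given memory bandwidth at a given latency.

    Little's law: bytes in flight = bandwidth * latency.
    GB/s * ns conveniently cancels to bytes (1e9 * 1e-9).
    All parameters are hypothetical inputs for illustration.
    """
    bytes_in_flight = bandwidth_gb_s * latency_ns
    return math.ceil(bytes_in_flight / line_bytes)


if __name__ == "__main__":
    # Assumed example: 100 GB/s of DRAM bandwidth at 100 ns latency.
    print(min_outstanding_misses(100.0, 100.0))
```

If the queues and speculation windows across the SoC cannot track that many requests at once, the memory bandwidth cannot be used in full, however good the core is. Knowing the bandwidth and latency up front is what lets those structures be sized to match.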

David Chisnall (*Now with 50% more sarcasm!*) on Nostr: One of the most important things I’ve learned over the last ten years is that ...
