Suppose I have a chunk of memory holding a bunch of counters that another thread is constantly updating. I want to make a private copy of it to read at my leisure, without the other core's writes invalidating the cache lines I'm reading from. The copy doesn't have to be consistent as a whole; only each individual counter needs to be internally consistent (no torn reads).
This isn't a real case where I found this to be a bottleneck; I'm just wondering what the possibilities are.
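
To make the scenario concrete, here is a minimal sketch of the baseline I have in mind. It assumes the counters are 64-bit `std::atomic` values in a fixed-size array; the names `kNumCounters`, `SharedCounters`, `Snapshot`, and `take_snapshot` are made up for illustration.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical layout: a fixed block of counters that another thread updates.
constexpr std::size_t kNumCounters = 64;
using SharedCounters = std::array<std::atomic<std::uint64_t>, kNumCounters>;
using Snapshot = std::array<std::uint64_t, kNumCounters>;

// Copy every counter into a private buffer owned by the reader.
// Each load is atomic, so no single counter can be torn, but the
// snapshot as a whole is not a consistent point-in-time view.
inline Snapshot take_snapshot(const SharedCounters& shared) {
    Snapshot local{};
    for (std::size_t i = 0; i < kNumCounters; ++i) {
        // Relaxed ordering: only per-counter atomicity is needed here.
        local[i] = shared[i].load(std::memory_order_relaxed);
    }
    return local;
}
```

Once the copy is taken, the writer never touches `local`, so its stores can only invalidate the shared lines, not the snapshot. What I'm asking is whether there is anything better than this plain element-by-element copy.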