Hi Paul and all, I've read through Appendix C "Why Memory Barriers". It helps greatly with understanding the overall problem. However, one part in particular confused me and seemed to contradict the previous subsections:
"However, the CPU need not actually invalidate the cache line before sending the acknowledgement." [1]

Well yes, I think it absolutely needs to. The previous examples relied precisely on this. What a CPU sending an Invalidate message is actually saying is: "I will modify this cache line that you currently have read-only in your local cache. Once you have sent me the Invalidate-ACK, I know that you have invalidated it and I can safely modify it." A CPU sending an Invalidate-ACK without actually having invalidated its cache line is, bluntly, lying and endangering the entire cache-coherence scheme. Now don't get me wrong, I accept that this is obviously what really happens. But the chapter got me to the point of interpreting a truthful Invalidate-ACK as an essential part of cache coherence.

The previous section detailing the store buffer, in contrast, makes more sense: "Although I don't own this cache line yet, I can store my new value in the store buffer already, because whatever the current value is, I will overwrite it anyway." With the invalidate queue, on the other hand, the reader just ignores that the variable might have changed. I guess this is legal because the only hard guarantee a CPU gives is that it sees its *own* accesses in order? But even then, as above, this makes sense for store buffers, because the storing CPU doesn't care about the other CPUs' values. The *reading* CPU sending the premature Invalidate-ACK, on the contrary, should very much care about reading the truthful value from the cache line.

And if it all works like that, then what even is the point of Invalidate messages, if you cannot rely on them having been acted upon before you yourself start modifying the cache line? Or is the point that a CPU temporarily ignoring an Invalidate message can still validly (without memory barriers) use data in that cache line which does *not* get modified by the other CPU? So memory barriers in this scenario would allow for more efficiency by "segmenting" cache lines?
Quite confusing. Parallel programming is hard, and discussing it is one thing we can do about it :]

Thanks,
Philipp

[1] https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git/tree/appendix/whymb/whymemorybarriers.tex#n1127
