Hi Paul and all,

I've read through Appendix C "Why Memory Barriers". It helps greatly
with understanding the overall problem. However, one part in particular
confused me and seemed to contradict the previous subsections:

"However, the CPU need not actually invalidate the cache line before
sending the acknowledgement." [1]

Well yes, I think it absolutely needs to. The previous examples relied
precisely on this. What a CPU sending an Invalidate message is actually
saying is: "I will modify this cache line that you currently hold read-
only in your local cache. Once you have sent me the Invalidate-ACK, I
know that you have invalidated it and I can safely modify it."

A CPU sending an Invalidate-ACK without actually having invalidated its
cache line is, bluntly, lying and endangering cache coherence as a
whole.

Now don't get me wrong, I accept that this is obviously what really
happens. But the chapter got me to the point of interpreting a
truthful Invalidate-ACK as an essential part of cache coherence.

The previous section detailing the store buffer, by contrast, makes
more sense: "Although I do not own this cache line yet, I can already
store my new value in the store buffer, because whatever the current
value is, I will overwrite it anyway." With the invalidate queue, on
the other hand, the reader just ignores that the variable might have
changed.

I guess this is legal because the only real guarantee a CPU gives is
that it sees its own accesses in order? But even then, as above, this
makes sense for store buffers, because the storing CPU does not care
about the old value. The *reading* CPU sending the premature
Invalidate-ACK, on the contrary, should very much care about reading
the current value from the cache line.

And if it all works like that, then what even is the point of
Invalidate messages, if you cannot rely on them having been acted upon
before you start modifying the cache line yourself?

Or is the point that a CPU temporarily ignoring an Invalidate message
can still validly (without memory barriers) use data in that cache line
which does *not* get modified by the other CPU? So memory barriers in
this scenario would allow for more efficiency by "segmenting" cache
lines?


Quite confusing. Parallel programming is hard and discussing it is one
thing we can do about it :]

Thanks,
Philipp


[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git/tree/appendix/whymb/whymemorybarriers.tex#n1127
