On Sun, 2025-11-02 at 10:37 -0800, Paul E. McKenney wrote:
> On Sun, Nov 02, 2025 at 04:18:48PM +0000, Philipp Stanner wrote:
> >
[…]
> >
> > "However, the CPU need not actually invalidate the cache line before
> > sending the acknowledgement." [1]
> >
> > Well yes, I think it absolutely needs to. The previous examples relied
> > precisely on this. What a CPU sending an Invalidate Message actually is
> > saying is: "I will modify this cache line that you currently have read-
> > only in your local cache. Once you have sent me the Invalidate-ACK I know
> > that you have invalidated it and I can safely modify it."
> >
> > A CPU sending an Invalidate-ACK without actually having invalidated its
> > cache line is, bluntly, lying and endangering the entire cache
> > coherence.
> >
> > Now don't get me wrong, I accept that this is obviously what is really
> > happening. But the chapter got me to the point of interpreting a
> > truthful Invalidate-ACK as an essential part of cache coherence.
>
> The following sentence was intended to help: "It could instead queue
> the invalidate message with the understanding that the message will
> be processed before the CPU sends any further messages regarding that
> cache line."
So it can defer invalidating the cache line as long as it merely keeps
using it in its current state, the "worst damage" being that the CPU
reads potentially outdated data?
>
[…]
> Fourth, you are right that strict unoptimized MESI
> would absolutely require that the cache line be invalidated prior to
> acknowledging the invalidation.
My understanding of that chapter is that classic MESI needs few
memory barriers, but store buffers and invalidate queues create the
need for wmb() and rmb(), respectively.
>
> > The previous section detailing the store-buffer, on the contrary, makes
> > more sense: "Although not owning this cache line yet, I can store my new
> > value in the store buffer already because whatever the current value
> > is, I will overwrite it anyways." whereas with the invalidate queue the
> > reader just ignores that the variable might have changed.
>
> Well, if the cacheline is in Modified or Exclusive state, then the
> CPU must transition it to at least Shared (with extra state saying
> "doomed" or some such). Or not, given yet more protocol complexity.
> If the CPU receiving the invalidation request knows that the CPU sending
> that request doesn't care what the current value of the cacheline is,
> then the receiving CPU can pretend that any stores happened before it
> received the invalidation request. Again, assuming that there are no
> ordering instructions that prohibit this.
I'm losing track of why MESI even exists in the first place, to be honest.
I had thought it's about guaranteeing that a given cache line can only
have one value at a given time; but it seems protocols like that are
more about getting a cache line at all, sooner or later, and every
ordering must be ensured by the instructions.
This might seem strange or trivial to you... I guess the crucial point
is my (false?) understanding of the Invalidate message serving to
guarantee that the receiver will see the sender's update of the cache
line. And it will see that update, just too late…
So I had interpreted the Invalidate-ACK message as a hard, reliable
synchronization point.
>
> > I guess this is legal because the only real guarantee of CPUs is that
> > one particular CPU sees all its accesses in order? But even then, as
> > above, for store buffers it makes sense, because the storing CPU
> > doesn't care about other values. The *reading* CPU sending the fake
> > Invalidate-ACK, on the contrary, should very well care about reading
> > the truthful value from the cache line.
>
> Also, different types of CPUs have different underlying ordering
> guarantees. And speculative execution can often ignore those guarantees
> as long as it can avoid the user-visible state seeing any violations. And
> given multiple CPUs reading and modifying a given variable concurrently,
> what exactly is the truthful value at any given point in time?
> (Referring to Figure 15.10, "A Variable With More Simultaneous Values".)
Argh.
There is this famous quote from a famous book, where the Roman governor
asks:
"What is truth?"
>
> > And if it all works like that, then what even is the point of
> > Invalidate messages at all, if you can not rely on them being followed
> > before you yourself start modifying the cache line?
>
> Because they are needed for things like memory-ordering instructions
> to work correctly. But on a weakly ordered system, if there are no
> memory-ordering instructions in the code, then there are precious few
> memory-ordering guarantees anyway. ;-)
>
> > Or is the point that a CPU temporarily ignoring an Invalidate message
> > can still validly (without memory barriers) use data in that cache line
> > which does *not* get modified by the other CPU? So memory barriers in
> > this scenario would allow for more efficiency by "segmenting" cache
> > lines?
>
> The point is mostly that on weakly ordered systems in the absence of
> memory-ordering instructions, there are very few guarantees. See again
> Figure 15.10.
Is the store buffer the exact equivalent of the invalidate queue, or is
the latter "more evil", as per my explanations above?
I'd say:
* The store buffer allows the CPU to ignore that it should (in an
  ideal, performant world) store this _right_now_, only caring about
  _itself_ seeing the stores in order, ignoring that the other CPU
  should see it _now_ (or before that other, subsequent store in the
  example).
* The invalidate queue allows a CPU to keep its read-only cache line,
  ignoring that it might get updated _right now_, ignoring the other
  CPU's store, only caring about its own previous stores / reads (or,
  more simply: relying on the fact that the cache line had been
  read-only so far).
Still, the latter seems less wise to me.
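
To check my own understanding of the second bullet, here is a toy,
single-threaded model in C. Everything in it (struct toy_cpu,
toy_receive_invalidate(), toy_rmb() and so on) is invented for
illustration and does not correspond to real hardware or to any kernel
API; it only mimics "ACK now, invalidate later":

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one shared variable, one reading CPU, one storing CPU. */
static int memory_value;    /* stands in for the writer's up-to-date copy */

struct toy_cpu {
    bool line_valid;     /* does this CPU hold a read-only copy? */
    int  cached_value;   /* that copy, possibly stale */
    bool inv_pending;    /* invalidate queued but not yet applied */
};

/* Receive an Invalidate: queue it and acknowledge right away,
 * without touching the cached copy. */
static bool toy_receive_invalidate(struct toy_cpu *cpu)
{
    cpu->inv_pending = true;
    return true;    /* the (early) Invalidate-ACK */
}

/* In this model, a read barrier applies all queued invalidates. */
static void toy_rmb(struct toy_cpu *cpu)
{
    if (cpu->inv_pending) {
        cpu->line_valid = false;
        cpu->inv_pending = false;
    }
}

/* Load: a hit is served from the cache even if an invalidate is
 * still queued. A miss means sending a message about the line, so
 * any queued invalidate is considered processed first. */
static int toy_load(struct toy_cpu *cpu)
{
    if (!cpu->line_valid) {
        cpu->inv_pending = false;
        cpu->cached_value = memory_value;
        cpu->line_valid = true;
    }
    return cpu->cached_value;
}

/* Store by the other CPU: invalidate, wait for the ACK, then write. */
static void toy_store(struct toy_cpu *reader, int value)
{
    bool acked = toy_receive_invalidate(reader);

    assert(acked);
    memory_value = value;
}

/* Reports what the reader loads before and after its read barrier. */
static void toy_demo(int *before_rmb, int *after_rmb)
{
    struct toy_cpu reader = { .line_valid = false };

    memory_value = 0;
    (void)toy_load(&reader);            /* reader caches the old value */
    toy_store(&reader, 1);              /* writer got its ACK and stored */
    *before_rmb = toy_load(&reader);    /* still served from the cache */
    toy_rmb(&reader);                   /* queued invalidate is applied */
    *after_rmb = toy_load(&reader);     /* miss, fetches the new value */
}
```

In this model the reader sees the old value 0 after having sent the ACK,
and only sees 1 once toy_rmb() has drained the queue, which is exactly
the behaviour that bothers me.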
>
> > Quite confusing. Parallel programming is hard and discussing it is one
> > thing we can do about it :]
>
> Agreed!!!
>
> Do the explanations above help? If so, I will rework that paragraph
> with attribution.
Well, yes, the reminder of there being no "true value" anyway helps
and reminds me why barriers exist.
But in fact, the question about the Invalidate message just being
temporarily *ignored and falsely answered* seemed so obvious to me that
I was searching for the Quick Quiz that would answer it, but there was
none :(
Something like this would sound very Paul-ish to me:
" Quick Quiz:
But wait! Wasn't the entire point of the invalidate-ack message to
assure the CPU receiving it that it's now safe to modify the cache line
without there being readers of the old data left? How can the CPU
possibly get away with falsely claiming it has invalidated the cache line?
"
I think that would help. Or rather: the answer would help.
Regards
P.
>
> Thanx, Paul
>
> > Thanks,
> > Philipp
> >
> >
> > [1]
> > https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git/tree/appendix/whymb/whymemorybarriers.tex#n1127
> >