On Mon, 6 Jan 2025, Jeff Law wrote:

> > I note that the atomic sequences emitted are suboptimal performance-wise
> > as the looping branch for the unsuccessful completion of the sequence
> > points backwards, which means it will be predicted as taken despite that
> > in most cases it will fall through.  I do not see it as a deficiency of
> > this change proposed as it takes care of recording that the branch is
> > unlikely to be taken, by calling `alpha_emit_unlikely_jump'.  Therefore
> > generic code elsewhere shou
> Looks like this got truncated.

 Umm, I'll complete the paragraph along the lines of:

"Therefore generic code elsewhere should instead be investigated and 
adjusted accordingly for the arrangement to actually take effect."

>  Anyway, easy to forget how limited branch
> prediction was in this era.  I haven't pondered static branch prediction in
> forever.  I wouldn't worry too much about this case -- we can always come back
> to it if generic doesn't do the right thing with code layout.

 Indeed.  It seems like something to be taken care of elsewhere in the 
Alpha backend though.

 FWIW Linux kernel inline assembly code uses intermediate unconditional 
branches emitted in a subsection each for this purpose.  The long +/-4MiB 
span of Alpha branches makes it feasible, though I suppose it will still 
be a problem with some software.

> >   NB I note that there is a warning in gcc/config/alpha/sync.md that it is
> > unpredictable if the lock_flag is cleared or not by a normal load or store
> > executed on the same CPU and therefore we need to make sure no register
> > spill is inserted in the sequence.  I seem not to have seen it actually
> > happen and testing results with actual hardware do look good.
> If it's a real issue in practice and we have to revisit this code, then we
> could look at hard barriers before/after the sequence to prevent scheduling
> into the sequence.  Or we could emit the entire sequence as an atomic unit
> much like some ports do for inlined subword atomics.

 For the time being I have chosen to update the change such as to emit 
barriers.  It is suboptimal, because it's not an issue if instructions get 
scheduled across the atomic sequence and we only need to prevent any from 
being moved *into* it.  I think it is good enough though for now.  Later 
on I'd like to rewrite the sequences along the lines of insn splitters 
from gcc/config/alpha/sync.md which do the optimal thing.

> >   NB2 I reckon the manual ought to be updated to say only one scratch is
> > permitted for secondary reloads.  While it's documented otherwise since
> > forever, it's never actually matched reality.
> ISTM that could be a follow-up.  Consider such an update pre-approved.

 Agreed, and thanks for the pre-approval.  I'll come up with a patch.

  Maciej

Reply via email to