On Mon, Jul 8, 2024 at 6:41 PM Andi Kleen wrote:
>
> > I have added a target hook for this in v4 of this patch. The hook
> > receives all the information about the stores, the load, the estimated
> > sequence cost and whether we expect to eliminate the load. With this
> > information the target sh
On 7/8/24 6:58 AM, Manolis Tsamis wrote:
This is still hard to tell. In some cases I have observed either
improvements or regressions in benchmarks, which are highly susceptible
to costing and the specific store-forwarding penalties of the CPU.
I have seen cases where the store-forwarding ins
> I have added a target hook for this in v4 of this patch. The hook
> receives all the information about the stores, the load, the estimated
> sequence cost and whether we expect to eliminate the load. With this
> information the target should be able to make an informed decision.
>
> What you men
On Thu, Jun 13, 2024 at 7:18 PM Andi Kleen wrote:
>
> Manolis Tsamis writes:
> >
> > Assembly like this can appear with bitfields or type punning / unions.
> > > On stress-ng when running the cpu-union microbenchmark the following
> > > speedups have been observed.
> >
> > Neoverse-N1: +2
On 6/13/24 5:32 AM, Manolis Tsamis wrote:
This pass detects cases of expensive store forwarding and tries to avoid them
by reordering the stores and using suitable bit insertion sequences.
For example it can transform this:
strb    w2, [x1, 1]
ldr x0, [x1] # Expensive sto
On 6/13/24 10:10 AM, Andi Kleen wrote:
Manolis Tsamis writes:
Assembly like this can appear with bitfields or type punning / unions.
On stress-ng when running the cpu-union microbenchmark the following speedups
have been observed.
Neoverse-N1: +29.4%
Intel Coffeelake: +13.1%
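
For readers who want to see the kind of source that leads to such assembly,
here is a minimal C sketch of the union / type-punning case. It is only an
illustration under stated assumptions, not code from the patch or from
stress-ng: the union `punned` and the functions `store_then_load`,
`load_then_insert`, `byte1_shift` and `forms_agree` are hypothetical names.
The first function does a narrow store followed by a wider overlapping load
(the pattern that may hit a store-forwarding penalty); the second computes the
same result by loading first and inserting the byte into the loaded value,
which is roughly the shape of a bit insertion sequence.

```c
#include <stdint.h>

/* Hypothetical union used only for illustration. */
union punned {
    uint64_t whole;
    uint8_t  bytes[8];
};

/* Bit offset of bytes[1] inside `whole`; depends on host endianness. */
static unsigned byte1_shift(void)
{
    union punned probe = { .whole = 1 };
    return probe.bytes[0] == 1 ? 8u : 48u;
}

/* Narrow store followed by a wide load that overlaps it. */
uint64_t store_then_load(union punned *p, uint8_t v)
{
    p->bytes[1] = v;
    return p->whole;
}

/* Same result: load the wide value first, then insert the byte
   into the loaded copy instead of reloading from memory. */
uint64_t load_then_insert(union punned *p, uint8_t v)
{
    uint64_t wide = p->whole;
    unsigned shift = byte1_shift();

    p->bytes[1] = v;                      /* the store is still performed */
    wide &= ~((uint64_t)0xff << shift);   /* clear the old byte */
    wide |= (uint64_t)v << shift;         /* insert the new byte */
    return wide;
}

/* Returns 1 when both forms give the same value and memory contents. */
int forms_agree(uint64_t initial, uint8_t v)
{
    union punned a = { .whole = initial };
    union punned b = { .whole = initial };

    return store_then_load(&a, v) == load_then_insert(&b, v)
           && a.whole == b.whole;
}
```

Whether the second form is actually faster depends on the CPU and on costing,
as discussed elsewhere in this thread; the sketch only shows that the two forms
are equivalent.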
Manolis Tsamis writes:
>
> Assembly like this can appear with bitfields or type punning / unions.
> On stress-ng when running the cpu-union microbenchmark the following speedups
> have been observed.
>
> Neoverse-N1: +29.4%
> Intel Coffeelake: +13.1%
> AMD 5950X: +17.5%
It seems