Re: [PATCH v3] Target-independent store forwarding avoidance.

2024-07-09 Thread Manolis Tsamis
On Mon, Jul 8, 2024 at 6:41 PM Andi Kleen wrote:
>
> > I have added a target hook for this in v4 of this patch. The hook
> > receives all the information about the stores, the load, the estimated
> > sequence cost and whether we expect to eliminate the load. With this
> > information the target sh

Re: [PATCH v3] Target-independent store forwarding avoidance.

2024-07-08 Thread Jeff Law
On 7/8/24 6:58 AM, Manolis Tsamis wrote:
This is still hard to tell. In some cases I have observed either improvement or regressions in benchmarks, which are highly susceptible to costing and the specific store-forwarding penalties of the CPU. I have seen cases where the store-forwarding ins

Re: [PATCH v3] Target-independent store forwarding avoidance.

2024-07-08 Thread Andi Kleen
> I have added a target hook for this in v4 of this patch. The hook
> receives all the information about the stores, the load, the estimated
> sequence cost and whether we expect to eliminate the load. With this
> information the target should be able to make an informed decision.
>
> What you men
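Purely to illustrate the kind of decision such a hook makes possible, here is a hypothetical cost policy in C. The names (`sfa_tuning`, `should_avoid_store_forwarding`), the parameters, and the idea of a fixed per-CPU stall penalty are assumptions made for this sketch; it is not the signature or logic of the hook added in v4.

```c
#include <stdbool.h>

/* Hypothetical per-CPU tuning values; not taken from any real target. */
struct sfa_tuning {
    int stall_penalty; /* assumed cost of a failed store-to-load forward */
    int load_cost;     /* assumed cost of the wide load itself */
};

/* Hypothetical policy. The inputs loosely mirror what the hook is described
   as receiving: the stores (reduced here to a count), the estimated cost of
   the replacement sequence, and whether the load is expected to go away. */
static bool should_avoid_store_forwarding(const struct sfa_tuning *t,
                                          int n_stores,
                                          int seq_cost,
                                          bool load_eliminated)
{
    if (n_stores <= 0)
        return false; /* nothing to reorder */

    /* Avoiding the stall is the main saving. */
    int benefit = t->stall_penalty;

    /* If the load disappears entirely, its cost is saved as well. */
    if (load_eliminated)
        benefit += t->load_cost;

    /* Transform only when the bit-insertion sequence is cheaper than
       what it saves. */
    return seq_cost < benefit;
}
```

Because the break-even point depends on the assumed stall penalty, the same sequence cost can be profitable on one core and unprofitable on another, which is consistent with the observation in this thread that results are sensitive to costing and to each CPU's store-forwarding penalties.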

Re: [PATCH v3] Target-independent store forwarding avoidance.

2024-07-08 Thread Manolis Tsamis
On Thu, Jun 13, 2024 at 7:18 PM Andi Kleen wrote:
>
> Manolis Tsamis writes:
> >
> > Assembly like this can appear with bitfields or type punning / unions.
> > On stress-ng when running the cpu-union microbenchmark the following
> > speedups
> > have been observed.
> >
> > Neoverse-N1: +2

Re: [PATCH v3] Target-independent store forwarding avoidance.

2024-06-14 Thread Jeff Law
On 6/13/24 5:32 AM, Manolis Tsamis wrote:
This pass detects cases of expensive store forwarding and tries to avoid them by reordering the stores and using suitable bit insertion sequences. For example it can transform this:

    strb w2, [x1, 1]
    ldr x0, [x1] # Expensive sto

Re: [PATCH v3] Target-independent store forwarding avoidance.

2024-06-13 Thread Jeff Law
On 6/13/24 10:10 AM, Andi Kleen wrote:
Manolis Tsamis writes:
Assembly like this can appear with bitfields or type punning / unions. On stress-ng when running the cpu-union microbenchmark the following speedups have been observed.
Neoverse-N1: +29.4%
Intel Coffeelake: +13.1%
AMD 5950X: +17.5%

Re: [PATCH v3] Target-independent store forwarding avoidance.

2024-06-13 Thread Andi Kleen
Manolis Tsamis writes:
>
> Assembly like this can appear with bitfields or type punning / unions.
> On stress-ng when running the cpu-union microbenchmark the following speedups
> have been observed.
>
> Neoverse-N1: +29.4%
> Intel Coffeelake: +13.1%
> AMD 5950X: +17.5%

It seems
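As a hypothetical illustration of the "type punning / unions" case mentioned above (this is not the stress-ng cpu-union source; the union and function names are made up), writing one byte of a union and then reading the whole 64-bit member produces exactly a narrow store followed by an overlapping wider load:

```c
#include <stdint.h>

/* Hypothetical union used for type punning: the same 8 bytes viewed either
   as one 64-bit word or as individual bytes. */
union word_bytes {
    uint64_t word;
    uint8_t bytes[8];
};

/* Writes a single byte, then reads the whole word back. Compiled as written,
   with u living in memory, this is a 1-byte store followed by an
   overlapping 8-byte load, i.e. a store forwarded to a larger load. */
static uint64_t set_byte1_and_read(union word_bytes *u, uint8_t v)
{
    u->bytes[1] = v;   /* narrow store (cf. strb) */
    return u->word;    /* wider overlapping load (cf. ldr) */
}
```

Bitfield updates followed by a read of the containing word can lower to the same store/load shape.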

[PATCH v3] Target-independent store forwarding avoidance.

2024-06-13 Thread Manolis Tsamis
This pass detects cases of expensive store forwarding and tries to avoid them by reordering the stores and using suitable bit insertion sequences. For example it can transform this:

    strb w2, [x1, 1]
    ldr x0, [x1] # Expensive store forwarding to larger load.

To:

    ldr
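The equivalence the pass relies on can be checked in plain C. The sketch below is an assumption-laden model, not the pass itself (which works on RTL). It compares the original order (byte store at offset 1, then an overlapping 64-bit load) with a reordered version that loads first, keeps the store, and merges the stored byte into the loaded value with a mask-and-or bit insertion. The helper names are hypothetical, and `byte_shift` exists only so the sketch gives the right bit position on both little- and big-endian hosts.

```c
#include <stdint.h>
#include <string.h>

/* Bit position of byte OFFSET inside a 64-bit value loaded from memory.
   Little-endian: byte 1 -> bits 8..15. Big-endian: byte 1 -> bits 48..55. */
static unsigned byte_shift(unsigned offset)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first ? 8u * offset : 8u * (7u - offset);
}

/* Original order: narrow store, then an overlapping wide load
   (cf. strb w2, [x1, 1] followed by ldr x0, [x1]). */
static uint64_t store_then_load(unsigned char buf[8], uint8_t val)
{
    uint64_t x;
    buf[1] = val;
    memcpy(&x, buf, sizeof x);
    return x;
}

/* Reordered: wide load first, the store still happens, and the stored byte
   is inserted into the loaded value (a mask-and-or bit insertion). */
static uint64_t load_then_insert(unsigned char buf[8], uint8_t val)
{
    uint64_t x;
    memcpy(&x, buf, sizeof x);
    buf[1] = val;
    const unsigned shift = byte_shift(1);
    const uint64_t mask = (uint64_t)0xff << shift;
    return (x & ~mask) | ((uint64_t)val << shift);
}

/* Returns 1 if both versions yield the same value and leave memory equal. */
static int same_result(const unsigned char init[8], uint8_t val)
{
    unsigned char a[8], b[8];
    memcpy(a, init, 8);
    memcpy(b, init, 8);
    uint64_t ra = store_then_load(a, val);
    uint64_t rb = load_then_insert(b, val);
    return ra == rb && memcmp(a, b, 8) == 0;
}
```

A C compiler may fold either function differently; the point here is only that the load no longer depends on the preceding store, so no store-to-load forwarding is needed, while the result and the final memory contents stay the same.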