Re: [PATCH], PR target/80718, Improve PowerPC splat double word

Richard Sandiford Tue, 23 May 2017 07:17:52 -0700

Michael Meissner <meiss...@linux.vnet.ibm.com> writes:
> When I was comparing SPEC 2006 numbers between GCC 6.3 and 7.1, there was one
> benchmark that was noticeably slower (milc).  In looking at the code generated,
> the #1 hot function (mult_adj_su3_mat_vec) had some cases where automatic
> vectorization generated splat of double from memory.
>
> The register allocator did not use the load with splat instruction (LXVDSX)
> because all of the loads were register+offset.  For the scalar values that it
> could load into the FPR registers, it used the normal register+offset load
> (d-form).  For the other scalar values that would wind up in the traditional
> Altivec registers, the register allocator decided to load up the value into a
> GPR register and do a direct move.
>
> Now, it turns out that while the above code is inefficient, it is not the
> cause of the slowdown in the milc benchmark.  However, there might be other
> places where using a load, direct move, and double word permute is causing a
> performance problem, so I made this patch.
>
> The patch splits the splat into a register splat and a memory splat.  This
> forces the register allocator to convert the load to the indexed form which
> the LXVDSX instruction uses.  I did a SPEC 2006 run with these changes, and
> there were no significant performance differences with this patch.
>
> In the mult_adj_su3_mat_vec function, there were previously 5 GPR load,
> direct move, and permute sequences along with one LXVDSX.  With this patch,
> those GPR loads have been replaced with LXVDSXs.


It sounds like that might create the opposite problem, in that if the RA
ends up having to spill the input operand of a register splat, it'll be
forced to keep the instruction as a register splat and load the spill
slot into a temporary register.

I thought the advice was normally to do what the port did before the
patch and present the register and memory versions as alternatives of a
single pattern.  If there's no current way of getting efficient code in
that case then maybe we need to tweak some of the costing infrastructure...
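
As a rough illustration of the two operand forms under discussion (not
taken from the patch), here is a minimal C sketch using the GCC vector
extension.  The names v2df, splat_reg, splat_mem and check_splats are
made up for this example.  Which instructions the compiler actually emits
for either function depends on the target, compiler version and options;
nothing here is a claim about the generated code.

```c
#include <assert.h>

/* Two doubles in one 16-byte vector (GCC/Clang vector extension).
   The typedef name is only illustrative.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* "Register splat": the scalar arrives as a value, so the input
   operand is a register.  */
static v2df
splat_reg (double x)
{
  return (v2df) { x, x };
}

/* "Memory splat": the scalar is read from base + index, the case
   where an indexed load-and-splat instruction could apply.  */
static v2df
splat_mem (const double *p, long i)
{
  return (v2df) { p[i], p[i] };
}

/* Small self-check: both forms must put the same scalar in each lane.  */
static void
check_splats (void)
{
  const double a[3] = { 1.5, 2.5, 3.5 };
  v2df r = splat_reg (a[1]);
  v2df m = splat_mem (a, 2);

  assert (r[0] == 2.5 && r[1] == 2.5);
  assert (m[0] == 3.5 && m[1] == 3.5);
}
```

Both functions describe the same operation; the only difference is where
the input lives, which is the choice that alternatives within a single
pattern leave to the register allocator.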

Thanks,
Richard
