Michael Meissner <meiss...@linux.vnet.ibm.com> writes:
> When I was comparing spec 2006 numbers between GCC 6.3 and 7.1, there was one
> benchmark that was noticeably slower (milc).  In looking at the code generated,
> the #1 hot function (mult_adj_su3_mat_vec) had some cases where automatic
> vectorization generated splat of double from memory.
>
> The register allocator did not use the load with splat instruction (LXVDSX)
> because all of the loads were register+offset.  For the scalar values that it
> could load into the FPR registers, it used the normal register+offset load
> (d-form).  For the other scalar values that would wind up in the traditional
> Altivec registers, the register allocator decided to load up the value into a
> GPR register and do a direct move.
>
> Now, it turns out that while the above code is inefficient, it is not a cause
> for the slowdown of the milc benchmark.  However, there might be other places
> where using a load, direct move, and double word permute are causing a
> performance problem, so I made this patch.
>
> The patch splits the splat into a register splat and a memory splat.  This
> forces the register allocator to convert the load to the indexed form which the
> LXVDSX instruction uses.  I did a spec 2006 run with these changes, and there
> were no significant performance differences with this patch.
>
> In the mult_adj_su3_mat_vec function, there were previously 5 GPR loads, direct
> move, and permute sequences along with one LXVDSX.  With this patch, those GPR
> loads have been replaced with LXVDSXs.

It sounds like that might create the opposite problem, in that if the RA
ends up having to spill the input operand of a register splat, it'll be
forced to keep the instruction as a register splat and load the spill
slot into a temporary register.

I thought the advice was normally to do what the port did before the
patch and present the register and memory versions as alternatives of
a single pattern.  If there's no current way of getting efficient code
in that case then maybe we need to tweak some of the costing
infrastructure...

Thanks,
Richard
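
For reference, a minimal C sketch of the kind of source that can lead to a
"splat of double from memory" when the loop is auto-vectorized.  It is not
taken from milc; the function name scale_by_coeff and the index 3 are
hypothetical.  Whether a given GCC version actually emits LXVDSX, or a
load plus direct move plus permute, depends on the target flags and on
the register allocation issues discussed above.

```c
#include <stddef.h>

/* Hypothetical example, not from milc: every element is multiplied by the
   same scalar, which lives at a fixed offset from a base pointer
   (a register+offset address in the terms used above). */
void scale_by_coeff(double *restrict out, const double *restrict in,
                    const double *coeff, size_t n)
{
    /* coeff[3] is loop-invariant.  A vectorizer working on 2 x double
       vectors needs the value copied into both lanes, i.e. a splat
       whose source operand is in memory. */
    const double c = coeff[3];

    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * c;
}
```

Comparing the assembly for something like this before and after the patch
(for example with -O3 on a VSX-capable target) is one way to see which
form of the splat was chosen.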