https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102264

--- Comment #2 from Nicholai Tukanov <ntukanov at cmu dot edu> ---
(In reply to Andrew Pinski from comment #1)
> There seems to be some extra moves the register allocator cannot remove and
> that is causing some extra spilling.
>
> Your loop has 32 live variables and that is just at the limit.

Can the register allocator be modified to recognize the other registers
(zmm16-zmm31)? The problem seems limited to the compute instruction
(vpdpwssd in this case).

I specifically chose 32 live variables to max out the zmm registers. Since the
compute instruction gets limited to half of them (zmm0-zmm15), the extra moves
are killing the performance.
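
For context, here is a minimal scalar sketch of what vpdpwssd computes,
following the instruction's documented semantics (non-saturating form). The
function names dpwssd_lane and dpwssd_zmm are hypothetical and chosen only for
illustration; this is not the kernel from the bug report, just a model of the
operation whose accumulators end up competing for registers.

```c
#include <assert.h>
#include <stdint.h>

/* One 32-bit lane of vpdpwssd: multiply two pairs of signed 16-bit values
   and add both 32-bit products to the 32-bit accumulator.  The additions
   wrap around (no saturation), so they are done in unsigned arithmetic to
   avoid signed-overflow undefined behaviour in C. */
static int32_t dpwssd_lane(int32_t acc, const int16_t a[2], const int16_t b[2])
{
    int32_t p0 = (int32_t)a[0] * (int32_t)b[0];
    int32_t p1 = (int32_t)a[1] * (int32_t)b[1];
    return (int32_t)((uint32_t)acc + (uint32_t)p0 + (uint32_t)p1);
}

/* A 512-bit zmm register holds 16 such lanes, i.e. 32 signed 16-bit inputs
   per source operand and 16 signed 32-bit accumulators. */
static void dpwssd_zmm(int32_t acc[16], const int16_t a[32], const int16_t b[32])
{
    assert(acc && a && b);
    for (int i = 0; i < 16; i++)
        acc[i] = dpwssd_lane(acc[i], &a[2 * i], &b[2 * i]);
}
```

Each accumulator modelled by acc above is both a source and the destination of
the instruction, which is why a loop that keeps many of them live puts
pressure on the register file.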
