https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78116

--- Comment #17 from amker at gcc dot gnu.org ---
(In reply to Andrew Senkevich from comment #16)
> (In reply to amker from comment #13)
> > We should create another PR for additional copy instructions after my patch
> > and close this one.  IMHO they are two different issues.
> 
> I agree, currently there are no fills from stack on both testcases for which
> this PR was created.
> But I have no bugzilla permissions to close it, could somebody from CC close
> it please?
> 
> (In reply to Pat Haugen from comment #14)
> . . . 
> > Additional info, it's really just one copy introduced, but becomes 4 after
> > unrolling. This is the loop from the first testcase without -funroll-loops.
> > Looks like we could get rid of the vmovaps by making zmm2 the dest on the
> > vpermps (assuming I'm understanding the asm correctly).
> > 
> > .L26:
> >         vpermps (%rcx), %zmm10, %zmm1
> >         leal    1(%rsi), %esi
> >         vmovaps %zmm1, %zmm2
> >         vmaxps  (%r15,%rdx), %zmm3, %zmm1
> >         vfnmadd132ps    (%r12,%rdx), %zmm7, %zmm2
> >         cmpl    %esi, %r8d
> >         leaq    -64(%rcx), %rcx
> >         vmaxps  %zmm1, %zmm2, %zmm1
> >         vmovups %zmm1, (%rdi,%rdx)
> >         leaq    64(%rdx), %rdx
> >         ja      .L26
> 
> Looks like so. For which optimization/analysis we should file ticket for it?

Filing it under the rtl-optimization component would be fine.  I can close
this ticket after the new one is created.  Thanks.
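
For the new report, the change Pat suggests could be sketched as below.  This
is hand-edited from the loop quoted above, not compiler output.  It relies on
zmm1 not being read between the vpermps and the first vmaxps, which holds in
the quoted loop because that vmaxps overwrites zmm1.

```asm
.L26:
        # Permute straight into zmm2, the register the FMA reads and writes,
        # so the vmovaps %zmm1, %zmm2 copy is no longer needed.
        vpermps (%rcx), %zmm10, %zmm2
        leal    1(%rsi), %esi
        vmaxps  (%r15,%rdx), %zmm3, %zmm1
        vfnmadd132ps    (%r12,%rdx), %zmm7, %zmm2
        cmpl    %esi, %r8d
        leaq    -64(%rcx), %rcx
        vmaxps  %zmm1, %zmm2, %zmm1
        vmovups %zmm1, (%rdi,%rdx)
        leaq    64(%rdx), %rdx
        ja      .L26
```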
