On Wed, Dec 6, 2023 at 6:23 AM Jakub Jelinek <ja...@redhat.com> wrote:
>
> Hi!
>
> Regardless of the outcome of the REG_UNUSED discussions, I think
> it is a good idea to move the vzeroupper pass one pass later.
> As can be seen in the multiple PRs and as postreload.cc documents,
> reload/LRA is known to create dead statements quite often, which
> is the reason why we have postreload_cse pass at all.
> Doing vzeroupper pass before such cleanup means the pass including
> df_analyze for it needs to process more instructions than needed
> and because mode switching adds note problem, also higher chance of
> having stale REG_UNUSED notes.
> And, I really don't see why vzeroupper can't wait until those cleanups
> are done.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
LGTM.
>
> 2023-12-05  Jakub Jelinek  <ja...@redhat.com>
>
>         PR rtl-optimization/112760
>         * config/i386/i386-passes.def (pass_insert_vzeroupper): Insert
>         after pass_postreload_cse rather than pass_reload.
>         * config/i386/i386-features.cc (rest_of_handle_insert_vzeroupper):
>         Adjust comment for it.
>
>         * gcc.dg/pr112760.c: New test.
>
> --- gcc/config/i386/i386-passes.def.jj  2023-01-16 11:52:15.960735877 +0100
> +++ gcc/config/i386/i386-passes.def     2023-12-05 19:15:01.748279329 +0100
> @@ -24,7 +24,7 @@ along with GCC; see the file COPYING3.
>           REPLACE_PASS (PASS, INSTANCE, TGT_PASS)
>   */
>
> -  INSERT_PASS_AFTER (pass_reload, 1, pass_insert_vzeroupper);
> +  INSERT_PASS_AFTER (pass_postreload_cse, 1, pass_insert_vzeroupper);
>    INSERT_PASS_AFTER (pass_combine, 1, pass_stv, false /* timode_p */);
>    /* Run the 64-bit STV pass before the CSE pass so that CONST0_RTX and
>       CONSTM1_RTX generated by the STV pass can be CSEed.  */
> --- gcc/config/i386/i386-features.cc.jj 2023-11-02 07:49:15.029894060 +0100
> +++ gcc/config/i386/i386-features.cc    2023-12-05 19:15:48.658620698 +0100
> @@ -2627,10 +2627,11 @@ convert_scalars_to_vector (bool timode_p
>  static unsigned int
>  rest_of_handle_insert_vzeroupper (void)
>  {
> -  /* vzeroupper instructions are inserted immediately after reload to
> -     account for possible spills from 256bit or 512bit registers.  The pass
> -     reuses mode switching infrastructure by re-running mode insertion
> -     pass, so disable entities that have already been processed.  */
> +  /* vzeroupper instructions are inserted immediately after reload and
> +     postreload_cse to clean up after it a little bit to account for possible
> +     spills from 256bit or 512bit registers.  The pass reuses mode switching
> +     infrastructure by re-running mode insertion pass, so disable entities
> +     that have already been processed.  */
>    for (int i = 0; i < MAX_386_ENTITIES; i++)
>      ix86_optimize_mode_switching[i] = 0;
>
> --- gcc/testsuite/gcc.dg/pr112760.c.jj  2023-12-01 13:46:57.444746529 +0100
> +++ gcc/testsuite/gcc.dg/pr112760.c     2023-12-01 13:46:36.729036971 +0100
> @@ -0,0 +1,22 @@
> +/* PR rtl-optimization/112760 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fno-dce -fno-guess-branch-probability --param=max-cse-insns=0" } */
> +/* { dg-additional-options "-m8bit-idiv -mavx" { target i?86-*-* x86_64-*-* } } */
> +
> +unsigned g;
> +
> +__attribute__((__noipa__)) unsigned short
> +foo (unsigned short a, unsigned short b)
> +{
> +  unsigned short x = __builtin_add_overflow_p (a, g, (unsigned short) 0);
> +  g -= g / b;
> +  return x;
> +}
> +
> +int
> +main ()
> +{
> +  unsigned short x = foo (40, 6);
> +  if (x != 0)
> +    __builtin_abort ();
> +}
>
>         Jakub
>
--
BR,
Hongtao