https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69454

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Ilya Enkovich from comment #4)
> (In reply to Jakub Jelinek from comment #3)
> > I need additional -march=x86-64 to trigger this.
> > I'd say either we have to pessimistically assume what the STV pass might be
> > doing already during expansion, or the STV pass would need to perform parts
> > of what expand_stack_alignment is doing (basically check if what the STV
> > pass created causes any differences in decision during
> > expand_stack_alignment, and if yes, tweak things so that the end result
> > looks as if those decisions were done already during the expansion (STV is
> > pre-RA pass, so maybe it still could work), or maybe easiest fix is for now
> > disable TARGET_STV if preferred_stack_boundary is smaller than 4.
>
> Looking into expand_stack_alignment I see we may need to allocate DRAP
> register and make fixup_tail_calls call.  Isn't it too late for that?
> Couldn't we make some optimizations basing on notes fixup_tail_calls
> invalidates?
>
> I propose this change in STV gate:
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 34b57a4..fb11680 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -3661,7 +3661,11 @@ public:
>    /* opt_pass methods: */
>    virtual bool gate (function *)
>      {
> -      return !TARGET_64BIT && TARGET_STV && TARGET_SSE2 && optimize > 1;
> +      return !TARGET_64BIT && TARGET_STV && TARGET_SSE2 && optimize > 1
> +        /* Check we don't need to allocate DRAP register for STV.  */
> +        && (crtl->drap_reg
> +            || !crtl->need_drap
> +            || INCOMING_STACK_BOUNDARY >= 128);
>      }
>
>    virtual unsigned int execute (function *)

Already during the expansion TARGET_STV makes quite a big difference, won't
just disabling the stv pass cause performance regression to -mno-stv?
Also, I'm surprised you are checking INCOMING_STACK_BOUNDARY, I'd have expected
|| ix86_preferred_stack_boundary >= 128 instead.

I had in mind either:
--- gcc/config/i386/i386.c.jj   2016-01-25 12:10:57.000000000 +0100
+++ gcc/config/i386/i386.c      2016-01-25 16:54:28.662713284 +0100
@@ -5453,6 +5453,11 @@ ix86_option_override_internal (bool main
         opts->x_target_flags |= MASK_VZEROUPPER;
       if (!(opts_set->x_target_flags & MASK_STV))
         opts->x_target_flags |= MASK_STV;
+      /* Disable STV if -mpreferred-stack-boundary={2,3} - the needed
+         stack realignment will be extra cost the pass doesn't take into
+         account and the pass does not ensure DRAP is created either.  */
+      if (ix86_preferred_stack_boundary < 128)
+        opts->x_target_flags &= ~MASK_STV;
       if (!ix86_tune_features[X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL]
           && !(opts_set->x_target_flags & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
         opts->x_target_flags |= MASK_AVX256_SPLIT_UNALIGNED_LOAD;
(i.e. force -mno-stv for -mpreferred-stack-boundary={2,3}), but that will
likely disable the pass altogether for the -miamcu (but your patch in most
cases will too), not sure if that is a big deal or not).

Another alternative is if the STV pass changes anything and creates possible
need for aligned vector spills, create drap rtx during that pass.
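
For reference, the "smaller than 4" in comment #3 and the "< 128" in the
patch above describe the same threshold: -mpreferred-stack-boundary=N
requests 2^N bytes of alignment, while the comparison in the patch is in
bits. A minimal standalone sketch of that arithmetic follows. The helper
names boundary_arg_to_bits and stv_allowed are hypothetical and are not
GCC code; they only model the proposed check.

```c
#include <stdbool.h>

/* Hypothetical helper (not part of GCC): convert the N from
   -mpreferred-stack-boundary=N, which is a log2 of bytes, into an
   alignment in bits.  N = 4 gives 16 bytes, i.e. 128 bits.  */
static unsigned int
boundary_arg_to_bits (unsigned int n)
{
  return (1u << n) * 8u;
}

/* Hypothetical model of the proposed option-override check: keep STV
   only when the preferred stack boundary is at least 128 bits, the
   alignment an SSE vector spill slot would need.  */
static bool
stv_allowed (unsigned int n)
{
  return boundary_arg_to_bits (n) >= 128u;
}
```

Under this model N = 2 and N = 3 (32 and 64 bits) switch STV off, and N = 4
(128 bits) leaves it on, matching the {2,3} mentioned in the patch comment.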