Hi!

On Sat, Nov 05, 2011 at 10:50:44AM +0100, Jakub Jelinek wrote:
> On the following testcase with -m64 -O3 -mavx2 (but it is just an example,
> you can replace the loop there with any code that doesn't touch the
> stack or frame pointer at all), only f3 is shrink wrapped and in that case
> it on the other side doesn't add vzeroupper before leaving the AVX using
> code that it IMNSHO should.  But I wonder why we can't shrink-wrap also
Here is a quick hack that deals with the missing vzeroupper issue.
It would probably be nicer to create a helper in i386.c for this, though,
because call_no_avx256 is an enum private to i386.c (which is why the
patch below passes const2_rtx directly).

--- gcc/config/i386/i386.md.jj	2011-11-04 07:49:41.000000000 +0100
+++ gcc/config/i386/i386.md	2011-11-05 14:00:32.000000000 +0100
@@ -11725,6 +11725,12 @@ (define_expand "return"
   [(simple_return)]
   "ix86_can_use_return_insn_p ()"
 {
+  /* Emit vzeroupper if needed.  */
+  if (TARGET_VZEROUPPER
+      && !TREE_THIS_VOLATILE (cfun->decl)
+      && !cfun->machine->caller_return_avx256_p)
+    emit_insn (gen_avx_vzeroupper (const2_rtx));
+
   if (crtl->args.pops_args)
     {
       rtx popc = GEN_INT (crtl->args.pops_args);
@@ -11741,6 +11747,12 @@ (define_expand "simple_return"
   [(simple_return)]
   "!TARGET_SEH"
 {
+  /* Emit vzeroupper if needed.  */
+  if (TARGET_VZEROUPPER
+      && !TREE_THIS_VOLATILE (cfun->decl)
+      && !cfun->machine->caller_return_avx256_p)
+    emit_insn (gen_avx_vzeroupper (const2_rtx));
+
   if (crtl->args.pops_args)
     {
       rtx popc = GEN_INT (crtl->args.pops_args);

	Jakub
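To make the three-part guard added to both the "return" and "simple_return" expanders easier to read, here is a small standalone C model of it. This is a sketch, not GCC code: struct ret_ctx, its fields, emits_vzeroupper and check_examples are hypothetical names. Reading caller_return_avx256_p as "this function returns a value in a 256-bit AVX register" is an assumption based on the field's name. TREE_THIS_VOLATILE on a FUNCTION_DECL marks the function as noreturn, so such a function never reaches a return and gains nothing from a vzeroupper there.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-function facts the patch inspects.
   These names are illustrative only; GCC has no such struct.  */
struct ret_ctx
{
  bool target_vzeroupper; /* stands in for TARGET_VZEROUPPER */
  bool noreturn_decl;     /* stands in for TREE_THIS_VOLATILE (cfun->decl) */
  bool returns_avx256;    /* stands in for
                             cfun->machine->caller_return_avx256_p
                             (assumed: return value lives in a ymm reg) */
};

/* Mirrors the guard in the patched expanders: emit vzeroupper only
   when vzeroupper insertion is enabled, the function can actually
   return, and the return value does not occupy a 256-bit register
   whose upper half vzeroupper would clobber.  */
static bool
emits_vzeroupper (const struct ret_ctx *c)
{
  return c->target_vzeroupper
         && !c->noreturn_decl
         && !c->returns_avx256;
}

/* Walks the four interesting cases.  */
static void
check_examples (void)
{
  const struct ret_ctx plain   = { true,  false, false };
  const struct ret_ctx noret   = { true,  true,  false };
  const struct ret_ctx ret256  = { true,  false, true  };
  const struct ret_ctx no_tune = { false, false, false };

  assert (emits_vzeroupper (&plain));    /* ordinary AVX-using function */
  assert (!emits_vzeroupper (&noret));   /* never reaches a return */
  assert (!emits_vzeroupper (&ret256));  /* 256-bit AVX return value */
  assert (!emits_vzeroupper (&no_tune)); /* vzeroupper insertion off */
}
```

The model also makes the point of the helper suggestion concrete: the same predicate now appears in two expanders (and, per the note above, would ideally live in one i386.c function that can name call_no_avx256 instead of hardcoding const2_rtx).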