Hello!

The attached patch takes a different approach to the problem of split return value copies in create_pre_exit. It turns out that for the vzeroupper insertion pass we actually don't need to insert a mode switch before the return value copy; it is enough to split the fallthrough edge to the exit block, so we can emit vzeroupper on the function exit edge.
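For context, here is roughly what the tail of create_pre_exit looks like with the patch applied. This is a heavily abbreviated sketch, not the verbatim source: the backwards walk that locates the return value copy is elided, and the split_block/split_edge calls are quoted from memory of the surrounding mode-switching.c code rather than from the patch itself.

  FOR_EACH_EDGE (eg, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
    if (eg->flags & EDGE_FALLTHRU)
      {
        basic_block src_bb = eg->src;
        rtx_insn *last_insn;
        rtx ret_reg;

        if (!reload_completed
            && EDGE_COUNT (EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) == 1
            && NONJUMP_INSN_P ((last_insn = BB_END (src_bb)))
            && GET_CODE (PATTERN (last_insn)) == USE
            && GET_CODE ((ret_reg = XEXP (PATTERN (last_insn), 0))) == REG)
          {
            /* Before reload: walk backwards from the final USE, locate
               the return value copy and split the block just before it,
               so the mode switch ends up in front of the copy.  */
            ...
            pre_exit = split_block (src_bb, before_return_copy)->dest;
          }
        else
          /* After reload (the vzeroupper case) the search is skipped;
             splitting the fallthrough edge to the exit block is enough,
             and the mode switch is emitted on the new pre-exit block.  */
          pre_exit = split_edge (eg);
      }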
Since x86 is the only target that uses optimize mode switching after reload, I took the liberty of using !reload_completed for the condition under which we don't need to search for the return copy. This comes with a big comment, of course, as is evident from the patch.

2018-11-20  Uros Bizjak  <ubiz...@gmail.com>

	PR target/88070
	* mode-switching.c (create_pre_exit): After reload, always split
	the fallthrough edge to the exit block.

testsuite/ChangeLog:

2018-11-20  Uros Bizjak  <ubiz...@gmail.com>

	PR target/88070
	* gcc.target/i386/pr88070.c: New test.

The patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32} and committed to mainline SVN.

Uros.
Index: mode-switching.c
===================================================================
--- mode-switching.c	(revision 266278)
+++ mode-switching.c	(working copy)
@@ -248,8 +248,22 @@ create_pre_exit (int n_entities, int *entity_map,
 	gcc_assert (!pre_exit);
 	/* If this function returns a value at the end, we have to insert
 	   the final mode switch before the return value copy
-	   to its hard register.  */
-	if (EDGE_COUNT (EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) == 1
+	   to its hard register.
+
+	   x86 targets use mode-switching infrastructure to
+	   conditionally insert vzeroupper instruction at the exit
+	   from the function where there is no need to switch the
+	   mode before the return value copy.  The vzeroupper insertion
+	   pass runs after reload, so use !reload_completed as a stand-in
+	   for x86 to skip the search for the return value copy insn.
+
+	   N.b.: the code below assumes that the return copy insn
+	   immediately precedes its corresponding use insn.  This
+	   assumption does not hold after reload, since sched1 pass
+	   can schedule the return copy insn away from its
+	   corresponding use insn.  */
+	if (!reload_completed
+	    && EDGE_COUNT (EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) == 1
 	    && NONJUMP_INSN_P ((last_insn = BB_END (src_bb)))
 	    && GET_CODE (PATTERN (last_insn)) == USE
 	    && GET_CODE ((ret_reg = XEXP (PATTERN (last_insn), 0))) == REG)
Index: testsuite/gcc.target/i386/pr88070.c
===================================================================
--- testsuite/gcc.target/i386/pr88070.c	(nonexistent)
+++ testsuite/gcc.target/i386/pr88070.c	(working copy)
@@ -0,0 +1,12 @@
+/* PR target/88070 */
+/* { dg-do compile } */
+/* { dg-options "-O -fexpensive-optimizations -fnon-call-exceptions -fschedule-insns -fno-dce -fno-dse -mavx" } */
+
+typedef float vfloat2 __attribute__ ((__vector_size__ (2 * sizeof (float))));
+
+vfloat2
+test1float2 (float c)
+{
+  vfloat2 v = { c, c };
+  return v;
+}
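FWIW, the new test can be exercised on its own from the gcc subdirectory of the build tree with something like make check-gcc RUNTESTFLAGS="i386.exp=pr88070.c". With -fschedule-insns in the options, sched1 moves the return value copy away from its USE, which is exactly the situation the create_pre_exit change is meant to tolerate.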