On Mon, Nov 5, 2012 at 7:34 PM, Uros Bizjak <ubiz...@gmail.com> wrote: > On Mon, Nov 5, 2012 at 7:05 PM, Eric Botcazou <ebotca...@adacore.com> wrote: >>> This sequence breaks assumption in mode-switching.c, that final >>> function return register load operates in MODE_EXIT mode and triggere >>> following code: >>> >>> for (j = n_entities - 1; j >= 0; j--) >>> { >>> int e = entity_map[j]; >>> int mode = MODE_NEEDED (e, return_copy); >>> >>> if (mode != num_modes[e] && mode != MODE_EXIT (e)) >>> break; >>> } >>> >>> As discussed above, modes of loads, generated from __builtin_apply >>> have no connection to function return mode. mode-switching.c does >>> detect __builtin_apply situation and raises maybe_builtin_apply flag, >>> but doesn't use it to short-circuit wrong check. In proposed patch, we >>> detect this situation and raise force_late_switch in the same way, as >>> SH4 does for its "late" fpscr emission. >> >> If I understand correctly, we need to insert the vzeroupper because the >> function returns double in SSE registers but we generate an OImode load >> instead of a DFmode load because of the __builtin_return. So we're in the >> forced_late_switch case but we fail to recognize the tweaked return value >> load >> since the number of registers doesn't match. >> >> If so, I'd rather add another special case, like for the SH4, instead of a >> generic bypass for maybe_builtin_apply, something along the lines of: >> >> /* For the x86 with AVX, we might be using a larger load for a value >> returned in SSE registers and we want to put the final mode switch >> after this return value copy. */ >> if (copy_start == ret_start >> && nregs == hard_regno_nregs[ret_start][GET_MODE (ret_reg)] >> && copy_num >= nregs >> && OBJECT_P (SET_SRC (return_copy_pat))) >> forced_late_switch = 1; > > Yes, this approach also works. > > I assume it is OK to commit attached patch? > > 2012-11-05 Eric Botcazou <ebotca...@adacore.com> > Uros Bizjak <ubiz...@gmail.com> > > * mode-switching.c (create_pre_exit): Force late switch for > __builtin_return case, when value, returned in SSE register, > was loaded using OImode load. > > Tested on x86_64-pc-linux-gnu, also with to-be-committed avx-vzeroupper-27.c
Unfortunately the proposed patch fails the testcase from PR41993: --cut here-- short retframe_short (void *rframe) { __builtin_return (rframe); } --cut here-- Uros. > > Thanks, > Uros.