On Mon, Nov 5, 2012 at 7:34 PM, Uros Bizjak <[email protected]> wrote:
> On Mon, Nov 5, 2012 at 7:05 PM, Eric Botcazou <[email protected]> wrote:
>>> This sequence breaks assumption in mode-switching.c, that final
>>> function return register load operates in MODE_EXIT mode and triggere
>>> following code:
>>>
>>> for (j = n_entities - 1; j >= 0; j--)
>>> {
>>> int e = entity_map[j];
>>> int mode = MODE_NEEDED (e, return_copy);
>>>
>>> if (mode != num_modes[e] && mode != MODE_EXIT (e))
>>> break;
>>> }
>>>
>>> As discussed above, modes of loads, generated from __builtin_apply
>>> have no connection to function return mode. mode-switching.c does
>>> detect __builtin_apply situation and raises maybe_builtin_apply flag,
>>> but doesn't use it to short-circuit wrong check. In proposed patch, we
>>> detect this situation and raise force_late_switch in the same way, as
>>> SH4 does for its "late" fpscr emission.
>>
>> If I understand correctly, we need to insert the vzeroupper because the
>> function returns double in SSE registers but we generate an OImode load
>> instead of a DFmode load because of the __builtin_return. So we're in the
>> forced_late_switch case but we fail to recognize the tweaked return value
>> load
>> since the number of registers doesn't match.
>>
>> If so, I'd rather add another special case, like for the SH4, instead of a
>> generic bypass for maybe_builtin_apply, something along the lines of:
>>
>> /* For the x86 with AVX, we might be using a larger load for a value
>> returned in SSE registers and we want to put the final mode switch
>> after this return value copy. */
>> if (copy_start == ret_start
>> && nregs == hard_regno_nregs[ret_start][GET_MODE (ret_reg)]
>> && copy_num >= nregs
>> && OBJECT_P (SET_SRC (return_copy_pat)))
>> forced_late_switch = 1;
>
> Yes, this approach also works.
>
> I assume it is OK to commit attached patch?
>
> 2012-11-05 Eric Botcazou <[email protected]>
> Uros Bizjak <[email protected]>
>
> * mode-switching.c (create_pre_exit): Force late switch for
> __builtin_return case, when value, returned in SSE register,
> was loaded using OImode load.
>
> Tested on x86_64-pc-linux-gnu, also with to-be-committed avx-vzeroupper-27.c
Unfortunately the proposed patch fails the testcase from PR41993:
--cut here--
short retframe_short (void *rframe)
{
__builtin_return (rframe);
}
--cut here--
Uros.
>
> Thanks,
> Uros.