On Mon, Nov 5, 2012 at 7:34 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
> On Mon, Nov 5, 2012 at 7:05 PM, Eric Botcazou <ebotca...@adacore.com> wrote:
>>> This sequence breaks assumption in mode-switching.c, that final
>>> function return register load operates in MODE_EXIT mode and triggere
>>> following code:
>>>
>>>                   for (j = n_entities - 1; j >= 0; j--)
>>>                     {
>>>                       int e = entity_map[j];
>>>                       int mode = MODE_NEEDED (e, return_copy);
>>>
>>>                       if (mode != num_modes[e] && mode != MODE_EXIT (e))
>>>                         break;
>>>                     }
>>>
>>> As discussed above, modes of loads, generated from __builtin_apply
>>> have no connection to function return mode. mode-switching.c does
>>> detect __builtin_apply situation and raises maybe_builtin_apply flag,
>>> but doesn't use it to short-circuit wrong check. In proposed patch, we
>>> detect this situation and raise force_late_switch in the same way, as
>>> SH4 does for its "late" fpscr emission.
>>
>> If I understand correctly, we need to insert the vzeroupper because the
>> function returns double in SSE registers but we generate an OImode load
>> instead of a DFmode load because of the __builtin_return.  So we're in the
>> forced_late_switch case but we fail to recognize the tweaked return value 
>> load
>> since the number of registers doesn't match.
>>
>> If so, I'd rather add another special case, like for the SH4, instead of a
>> generic bypass for maybe_builtin_apply, something along the lines of:
>>
>>         /* For the x86 with AVX, we might be using a larger load for a value
>>            returned in SSE registers and we want to put the final mode switch
>>            after this return value copy.  */
>>         if (copy_start == ret_start
>>             && nregs == hard_regno_nregs[ret_start][GET_MODE (ret_reg)]
>>             && copy_num >= nregs
>>             && OBJECT_P (SET_SRC (return_copy_pat)))
>>           forced_late_switch = 1;
>
> Yes, this approach also works.
>
> I assume it is OK to commit attached patch?
>
> 2012-11-05  Eric Botcazou  <ebotca...@adacore.com>
>             Uros Bizjak  <ubiz...@gmail.com>
>
>         * mode-switching.c (create_pre_exit): Force late switch for
>         __builtin_return case, when value, returned in SSE register,
>         was loaded using OImode load.
>
> Tested on x86_64-pc-linux-gnu, also with to-be-committed avx-vzeroupper-27.c

Unfortunately the proposed patch fails the testcase from PR41993:

--cut here--
short retframe_short (void *rframe)
{
    __builtin_return (rframe);
}
--cut here--

Uros.
>
> Thanks,
> Uros.

Reply via email to