On Mon, Nov 5, 2012 at 7:05 PM, Eric Botcazou <[email protected]> wrote:
>> This sequence breaks assumption in mode-switching.c, that final
>> function return register load operates in MODE_EXIT mode and triggere
>> following code:
>>
>> for (j = n_entities - 1; j >= 0; j--)
>> {
>> int e = entity_map[j];
>> int mode = MODE_NEEDED (e, return_copy);
>>
>> if (mode != num_modes[e] && mode != MODE_EXIT (e))
>> break;
>> }
>>
>> As discussed above, modes of loads, generated from __builtin_apply
>> have no connection to function return mode. mode-switching.c does
>> detect __builtin_apply situation and raises maybe_builtin_apply flag,
>> but doesn't use it to short-circuit wrong check. In proposed patch, we
>> detect this situation and raise force_late_switch in the same way, as
>> SH4 does for its "late" fpscr emission.
>
> If I understand correctly, we need to insert the vzeroupper because the
> function returns double in SSE registers but we generate an OImode load
> instead of a DFmode load because of the __builtin_return. So we're in the
> forced_late_switch case but we fail to recognize the tweaked return value load
> since the number of registers doesn't match.
>
> If so, I'd rather add another special case, like for the SH4, instead of a
> generic bypass for maybe_builtin_apply, something along the lines of:
>
> /* For the x86 with AVX, we might be using a larger load for a value
> returned in SSE registers and we want to put the final mode switch
> after this return value copy. */
> if (copy_start == ret_start
> && nregs == hard_regno_nregs[ret_start][GET_MODE (ret_reg)]
> && copy_num >= nregs
> && OBJECT_P (SET_SRC (return_copy_pat)))
> forced_late_switch = 1;
Yes, this approach also works.
I assume it is OK to commit attached patch?
2012-11-05 Eric Botcazou <[email protected]>
Uros Bizjak <[email protected]>
* mode-switching.c (create_pre_exit): Force late switch for
__builtin_return case, when value, returned in SSE register,
was loaded using OImode load.
Tested on x86_64-pc-linux-gnu, also with to-be-committed avx-vzeroupper-27.c
Thanks,
Uros.
Index: mode-switching.c
===================================================================
--- mode-switching.c (revision 193174)
+++ mode-switching.c (working copy)
@@ -1,6 +1,6 @@
/* CPU mode switching
Copyright (C) 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2007, 2008,
- 2009, 2010 Free Software Foundation, Inc.
+ 2009, 2010, 2012 Free Software Foundation, Inc.
This file is part of GCC.
@@ -342,6 +342,17 @@ create_pre_exit (int n_entities, int *entity_map,
}
if (j >= 0)
{
+ /* For the x86 with AVX, we might be using a larger
+ load for a value returned in SSE registers and we
+ want to put the final mode switch after this
+ return value copy. */
+ if (copy_start == ret_start
+ && nregs
+ == hard_regno_nregs[ret_start][GET_MODE
(ret_reg)]
+ && copy_num >= nregs
+ && OBJECT_P (SET_SRC (return_copy_pat)))
+ forced_late_switch = 1;
+
/* For the SH4, floating point loads depend on fpscr,
thus we might need to put the final mode switch
after the return value copy. That is still OK,