On Sun, 2012-11-04 at 14:52 +0100, Uros Bizjak wrote:
> Hello!
> 
> Vzeroupper placement patch uses MODE_EXIT to determine if vzeroupper
> has to be placed before function exit. However, when compiling
> following test
> 
> --cut here--
> typedef struct objc_class *Class;
> typedef struct objc_object
> {
>   Class class_pointer;
> } *id;
> 
> typedef const struct objc_selector *SEL;
> typedef void * retval_t;
> typedef void * arglist_t;
> 
> extern retval_t __objc_forward (id object, SEL sel, arglist_t args);
> 
> double
> __objc_double_forward (id rcv, SEL op, ...)
> {
>   void *args, *res;
> 
>   args = __builtin_apply_args ();
>   res = __objc_forward (rcv, op, args);
>   __builtin_return (res);
> }
> --cut here--
> 
> __builtin_return emits a sequence of loads from "argument area" to all
> possible function return registers in their widest mode (together with
> corresponding USE RTXes), creating:
> 
> (insn 31 30 33 2 (set (reg:DI 0 ax)
>         (mem:DI (reg/v/f:DI 60 [ res ]) [0 S8 A8])) avx-vzeroupper-27.c:23 -1
>      (nil))
> (insn 33 31 35 2 (set (reg:XF 8 st)
>         (mem:XF (plus:DI (reg/v/f:DI 60 [ res ])
>                 (const_int 16 [0x10])) [0 S16 A8])) avx-vzeroupper-27.c:23 -1
>      (nil))
> (insn 35 33 32 2 (set (reg:OI 21 xmm0)
>         (mem:OI (plus:DI (reg/v/f:DI 60 [ res ])
>                 (const_int 32 [0x20])) [0 S32 A8])) avx-vzeroupper-27.c:23 -1
>      (nil))
> (insn 32 35 34 2 (use (reg:DI 0 ax)) avx-vzeroupper-27.c:23 -1
>      (nil))
> (insn 34 32 36 2 (use (reg:XF 8 st)) avx-vzeroupper-27.c:23 -1
>      (nil))
> (insn 36 34 44 2 (use (reg:OI 21 xmm0)) avx-vzeroupper-27.c:23 -1
>      (nil))
> 
> This sequence breaks assumption in mode-switching.c, that final
> function return register load operates in MODE_EXIT mode and triggere
> following code:
> 
>                   for (j = n_entities - 1; j >= 0; j--)
>                     {
>                       int e = entity_map[j];
>                       int mode = MODE_NEEDED (e, return_copy);
> 
>                       if (mode != num_modes[e] && mode != MODE_EXIT (e))
>                         break;
>                     }
> 
> As discussed above, modes of loads, generated from __builtin_apply
> have no connection to function return mode. mode-switching.c does
> detect __builtin_apply situation and raises maybe_builtin_apply flag,
> but doesn't use it to short-circuit wrong check. In proposed patch, we
> detect this situation and raise force_late_switch in the same way, as
> SH4 does for its "late" fpscr emission. For the testcase above,
> following RTX sequence is generated:
> 
> (insn 31 30 33 2 (set (reg:DI 0 ax)
>         (mem:DI (reg/v/f:DI 60 [ res ]) [0 S8 A8]))
> avx-vzeroupper-27.c:23 63 {*movdi_internal_rex64}
>      (nil))
> (insn 33 31 35 2 (set (reg:XF 8 st)
>         (mem:XF (plus:DI (reg/v/f:DI 60 [ res ])
>                 (const_int 16 [0x10])) [0 S16 A8]))
> avx-vzeroupper-27.c:23 106 {*movxf_internal}
>      (nil))
> (insn 35 33 52 2 (set (reg:OI 21 xmm0)
>         (mem:OI (plus:DI (reg/v/f:DI 60 [ res ])
>                 (const_int 32 [0x20])) [0 S32 A8]))
> avx-vzeroupper-27.c:23 60 {*movoi_internal_avx}
>      (expr_list:REG_DEAD (reg/v/f:DI 60 [ res ])
>         (nil)))
> (insn 52 35 49 2 (unspec_volatile [
>             (const_int 9 [0x9])
>         ] UNSPECV_VZEROUPPER) -1
>      (nil))
> (note 49 52 32 2 NOTE_INSN_DELETED)
> (insn 32 49 34 2 (use (reg:DI 0 ax)) avx-vzeroupper-27.c:23 -1
>      (expr_list:REG_DEAD (reg:DI 0 ax)
>         (nil)))
> (insn 34 32 36 2 (use (reg:XF 8 st)) avx-vzeroupper-27.c:23 -1
>      (expr_list:REG_DEAD (reg:XF 8 st)
>         (nil)))
> (insn 36 34 44 2 (use (reg:OI 21 xmm0)) avx-vzeroupper-27.c:23 -1
>      (nil))
> (insn 44 36 0 2 (use (reg/i:DF 21 xmm0)) avx-vzeroupper-27.c:24 -1
> 
> where mode switching correctly placed vzeroupper insn, based on
> MODE_EXIT mode, determined from final use in (insn 44).
> 
> 2012-11-04  Vladimir Yakovlev  <vladimir.b.yakov...@intel.com>
>           Uros Bizjak  <ubiz...@gmail.com>
> 
>         * mode-switching.c (create_pre_exit): Added code for
> maybe_builtin_apply case.
> 
> Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu,
> with vzeroupper patch [1] applied.
> 
> I have added SH4 maintainer for possible comments.

BTW, there are at least two mode-switching SH PRs: 41933 and 49220.
I've tried those test cases with the vzeroupper patch [1] applied.
Unfortunately, it doesn't change the situation of the two PRs.

Cheers,
Oleg

Reply via email to