On Sun, 2012-11-04 at 14:52 +0100, Uros Bizjak wrote: > Hello! > > Vzeroupper placement patch uses MODE_EXIT to determine if vzeroupper > has to be placed before function exit. However, when compiling > following test > > --cut here-- > typedef struct objc_class *Class; > typedef struct objc_object > { > Class class_pointer; > } *id; > > typedef const struct objc_selector *SEL; > typedef void * retval_t; > typedef void * arglist_t; > > extern retval_t __objc_forward (id object, SEL sel, arglist_t args); > > double > __objc_double_forward (id rcv, SEL op, ...) > { > void *args, *res; > > args = __builtin_apply_args (); > res = __objc_forward (rcv, op, args); > __builtin_return (res); > } > --cut here-- > > __builtin_return emits a sequence of loads from "argument area" to all > possible function return registers in their widest mode (together with > corresponding USE RTXes), creating: > > (insn 31 30 33 2 (set (reg:DI 0 ax) > (mem:DI (reg/v/f:DI 60 [ res ]) [0 S8 A8])) avx-vzeroupper-27.c:23 -1 > (nil)) > (insn 33 31 35 2 (set (reg:XF 8 st) > (mem:XF (plus:DI (reg/v/f:DI 60 [ res ]) > (const_int 16 [0x10])) [0 S16 A8])) avx-vzeroupper-27.c:23 -1 > (nil)) > (insn 35 33 32 2 (set (reg:OI 21 xmm0) > (mem:OI (plus:DI (reg/v/f:DI 60 [ res ]) > (const_int 32 [0x20])) [0 S32 A8])) avx-vzeroupper-27.c:23 -1 > (nil)) > (insn 32 35 34 2 (use (reg:DI 0 ax)) avx-vzeroupper-27.c:23 -1 > (nil)) > (insn 34 32 36 2 (use (reg:XF 8 st)) avx-vzeroupper-27.c:23 -1 > (nil)) > (insn 36 34 44 2 (use (reg:OI 21 xmm0)) avx-vzeroupper-27.c:23 -1 > (nil)) > > This sequence breaks assumption in mode-switching.c, that final > function return register load operates in MODE_EXIT mode and triggere > following code: > > for (j = n_entities - 1; j >= 0; j--) > { > int e = entity_map[j]; > int mode = MODE_NEEDED (e, return_copy); > > if (mode != num_modes[e] && mode != MODE_EXIT (e)) > break; > } > > As discussed above, modes of loads, generated from __builtin_apply > have no connection to function return mode. mode-switching.c does > detect __builtin_apply situation and raises maybe_builtin_apply flag, > but doesn't use it to short-circuit wrong check. In proposed patch, we > detect this situation and raise force_late_switch in the same way, as > SH4 does for its "late" fpscr emission. For the testcase above, > following RTX sequence is generated: > > (insn 31 30 33 2 (set (reg:DI 0 ax) > (mem:DI (reg/v/f:DI 60 [ res ]) [0 S8 A8])) > avx-vzeroupper-27.c:23 63 {*movdi_internal_rex64} > (nil)) > (insn 33 31 35 2 (set (reg:XF 8 st) > (mem:XF (plus:DI (reg/v/f:DI 60 [ res ]) > (const_int 16 [0x10])) [0 S16 A8])) > avx-vzeroupper-27.c:23 106 {*movxf_internal} > (nil)) > (insn 35 33 52 2 (set (reg:OI 21 xmm0) > (mem:OI (plus:DI (reg/v/f:DI 60 [ res ]) > (const_int 32 [0x20])) [0 S32 A8])) > avx-vzeroupper-27.c:23 60 {*movoi_internal_avx} > (expr_list:REG_DEAD (reg/v/f:DI 60 [ res ]) > (nil))) > (insn 52 35 49 2 (unspec_volatile [ > (const_int 9 [0x9]) > ] UNSPECV_VZEROUPPER) -1 > (nil)) > (note 49 52 32 2 NOTE_INSN_DELETED) > (insn 32 49 34 2 (use (reg:DI 0 ax)) avx-vzeroupper-27.c:23 -1 > (expr_list:REG_DEAD (reg:DI 0 ax) > (nil))) > (insn 34 32 36 2 (use (reg:XF 8 st)) avx-vzeroupper-27.c:23 -1 > (expr_list:REG_DEAD (reg:XF 8 st) > (nil))) > (insn 36 34 44 2 (use (reg:OI 21 xmm0)) avx-vzeroupper-27.c:23 -1 > (nil)) > (insn 44 36 0 2 (use (reg/i:DF 21 xmm0)) avx-vzeroupper-27.c:24 -1 > > where mode switching correctly placed vzeroupper insn, based on > MODE_EXIT mode, determined from final use in (insn 44). > > 2012-11-04 Vladimir Yakovlev <vladimir.b.yakov...@intel.com> > Uros Bizjak <ubiz...@gmail.com> > > * mode-switching.c (create_pre_exit): Added code for > maybe_builtin_apply case. > > Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu, > with vzeroupper patch [1] applied. > > I have added SH4 maintainer for possible comments.
BTW, there are at least two mode-switching SH PRs: 41933 and 49220. I've tried those test cases with the vzeroupper patch [1] applied. Unfortunately, it doesn't change the situation of the two PRs. Cheers, Oleg