https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88717
--- Comment #1 from 刘袋鼠 <crazylht at gmail dot com> ---

`pass_insert_vzeroupper` uses mode switching to insert `vzeroupper`. At function entry and in the function body, 256-bit/512-bit registers are used, so the mode is set to `AVX_U128_DIRTY`. At function exit, however, no 256-bit/512-bit register is returned, so `AVX_U128_CLEAN` is set. `case AVX_U128_CLEAN` is then triggered for mode switching; maybe we should handle `ix86_avx_u128_mode_exit`.

A simple case shows that `vzeroupper` disappears when a 512-bit register is returned.

```
test.i:
----------------------------
typedef float __v16sf __attribute__ ((__vector_size__ (64)));
typedef float __m512 __attribute__ ((__vector_size__ (64), __may_alias__));
__m512 foo (float *p, __m512 x)
{
  *p = ((__v16sf)x)[0];
  return x;
}

test.s
----------------------------------
foo:
.LFB0:
        .cfi_startproc
        vmovss  %xmm0, (%rdi)
        ret
        .cfi_endproc
.LFE0:
```
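
To make the reasoning above concrete, here is a rough, simplified model of the exit decision in plain C. It is not GCC's implementation: the helper names `exit_state` and `needs_vzeroupper` are hypothetical, the enum only mirrors the two mode names mentioned in this comment, and the rule "emit `vzeroupper` only on a DIRTY to CLEAN transition" is an assumption drawn from the description.

```c
#include <stdbool.h>

/* Simplified stand-ins for the two upper-128 states named in the comment. */
enum avx_u128_state { AVX_U128_CLEAN, AVX_U128_DIRTY };

/* Hypothetical helper: the state required at function exit.
   If a 256-bit/512-bit register is returned, the upper bits are live,
   so the exit state stays DIRTY; otherwise CLEAN is required.  */
enum avx_u128_state
exit_state (bool returns_wide_vector)
{
  return returns_wide_vector ? AVX_U128_DIRTY : AVX_U128_CLEAN;
}

/* Hypothetical helper: under this model a vzeroupper is needed only
   when switching from DIRTY to CLEAN.  */
bool
needs_vzeroupper (enum avx_u128_state current, enum avx_u128_state required)
{
  return current == AVX_U128_DIRTY && required == AVX_U128_CLEAN;
}
```

Under this model, `needs_vzeroupper (AVX_U128_DIRTY, exit_state (true))` is false, which matches `foo` in `test.i`: it returns a `__m512`, and no `vzeroupper` appears in `test.s`.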