On Tue, Jan 8, 2019 at 9:29 AM Uros Bizjak <ubiz...@gmail.com> wrote: > > On Tue, Jan 8, 2019 at 5:17 PM H.J. Lu <hjl.to...@gmail.com> wrote: > > > > On Tue, Jan 8, 2019 at 6:54 AM Uros Bizjak <ubiz...@gmail.com> wrote: > > > > > > On Tue, Jan 8, 2019 at 3:39 PM H.J. Lu <hjl.to...@gmail.com> wrote: > > > > > > > > On Mon, Jan 7, 2019 at 11:12 PM Uros Bizjak <ubiz...@gmail.com> wrote: > > > > > > > > > > On Mon, Jan 7, 2019 at 6:40 PM H.J. Lu <hongjiu...@intel.com> wrote: > > > > > > > > > > > > There is no need to generate vzeroupper if caller uses upper bits of > > > > > > AVX/AVX512 registers, We track caller's avx_u128_state and avoid > > > > > > vzeroupper when caller's avx_u128_state is AVX_U128_DIRTY. > > > > > > > > > > > > Tested on i686 and x86-64 with and without --with-arch=native. > > > > > > > > > > > > OK for trunk? > > > > > > > > > > In principle OK, but I think we don't have to cache the result of > > > > > ix86_avx_u128_mode_entry. Simply call the function from > > > > > ix86_avx_u128_mode_exit; it is a simple function, so I guess we can > > > > > afford to re-call it one more time per function. > > > > > > > > Do we really need ix86_avx_u128_mode_entry? We can just > > > > set entry state to AVX_U128_CLEAN and set exit state to > > > > AVX_U128_DIRTY if caller returns AVX/AVX512 register or passes > > > > AVX/AVX512 registers to callee. > > > > > > > > Does this patch look OK? > > > > > > No, the compiler is then free to move optimal insertion point at the > > > beginning of the function. > > > > > > > Here is the updated patch. OK for trunk? > > OK with the comment fix. > > Thanks, > Uros. > > - return AVX_U128_CLEAN; > + /* Entry mode is set to AVX_U128_DIRTY if there are 256bit or 512bit > > s/Entry/Exit/ > > + modes used in function arguments. */ > > ... , otherwise return AVX_U128_CLEAN. > > + return ix86_avx_u128_mode_entry (); > }
This is what I am checking in. Thanks. -- H.J.
From 315e6eadf7021748de375c59da9cf451351c9597 Mon Sep 17 00:00:00 2001 From: "H.J. Lu" <hjl.to...@gmail.com> Date: Mon, 7 Jan 2019 06:56:44 -0800 Subject: [PATCH] x86: Don't generate vzeroupper if caller passes AVX/AVX512 registers There is no need to generate vzeroupper if caller passes arguments in AVX/AVX512 registers. Tested on i686 and x86-64 with and without --with-arch=native. gcc/ PR target/88717 * config/i386/i386.c (ix86_avx_u128_mode_exit): Call ix86_avx_u128_mode_entry. gcc/testsuite/ PR target/88717 * gcc.target/i386/pr88717.c: New test. --- gcc/config/i386/i386.c | 5 ++++- gcc/testsuite/gcc.target/i386/pr88717.c | 24 ++++++++++++++++++++++++ 2 files changed, 28 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr88717.c diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index d01278d866f..bd48e080f46 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -19137,7 +19137,10 @@ ix86_avx_u128_mode_exit (void) if (reg && ix86_check_avx_upper_register (reg)) return AVX_U128_DIRTY; - return AVX_U128_CLEAN; + /* Exit mode is set to AVX_U128_DIRTY if there are 256bit or 512bit + modes used in function arguments, otherwise return AVX_U128_CLEAN. + */ + return ix86_avx_u128_mode_entry (); } /* Return a mode that ENTITY is assumed to be diff --git a/gcc/testsuite/gcc.target/i386/pr88717.c b/gcc/testsuite/gcc.target/i386/pr88717.c new file mode 100644 index 00000000000..01680998f1b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr88717.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512f -mvzeroupper" } */ + +#include <immintrin.h> + +__m128 +foo1 (__m256 x) +{ + return _mm256_castps256_ps128 (x); +} + +void +foo2 (float *p, __m256 x) +{ + *p = ((__v8sf)x)[0]; +} + +void +foo3 (float *p, __m512 x) +{ + *p = ((__v16sf)x)[0]; +} + +/* { dg-final { scan-assembler-not "vzeroupper" } } */ -- 2.20.1