Re: [PATCH] libstdc++: more #pragma diagnostic
On Tue, 24 Sept 2024, 21:43 Jason Merrill, wrote: > On 9/24/24 7:51 AM, Jason Merrill wrote: > > Tested x86_64-pc-linux-gnu. > > > > Is this the right fix, or do we want to stop using these deprecated > classes, > > here and in stl_function.h? > We can't stop using them in stl_function.h for ABI compatibility reasons, and the parallel mode should be deprecated in favour of C++17 parallel algos so isn't worth "fixing", so I think the pragmas are the right answer here. OK, thanks. > Oops, adding libstdc++ CC. > > > -- 8< -- > > > > The CI saw failures on 17_intro/headers/c++2011/parallel_mode.cc due to > > -Wdeprecated-declarations warnings in some parallel/ headers. > > > > libstdc++-v3/ChangeLog: > > > > * include/parallel/base.h: Suppress -Wdeprecated-declarations. > > * include/parallel/multiseq_selection.h: Likewise. > > --- > > libstdc++-v3/include/parallel/base.h | 4 > > libstdc++-v3/include/parallel/multiseq_selection.h | 6 ++ > > 2 files changed, 10 insertions(+) > > > > diff --git a/libstdc++-v3/include/parallel/base.h > b/libstdc++-v3/include/parallel/base.h > > index 5bc5350e723..fcbcc1e0b99 100644 > > --- a/libstdc++-v3/include/parallel/base.h > > +++ b/libstdc++-v3/include/parallel/base.h > > @@ -166,6 +166,8 @@ namespace __gnu_parallel > > { return !_M_comp(__a, __b) && !_M_comp(__b, __a); } > > }; > > > > +#pragma GCC diagnostic push > > +#pragma GCC diagnostic ignored "-Wdeprecated-declarations" // > *nary_function > > > > /** @brief Similar to std::unary_negate, > > * but giving the argument types explicitly. */ > > @@ -297,6 +299,8 @@ namespace __gnu_parallel > > struct _Multiplies<_Tp, _Tp, _Tp> > > : public std::multiplies<_Tp> { }; > > > > +#pragma GCC diagnostic pop // -Wdeprecated-declarations > > + > > /** @brief _Iterator associated with __gnu_parallel::_PseudoSequence. > > * If features the usual random-access iterator functionality. > > * @param _Tp Sequence _M_value type. 
> > diff --git a/libstdc++-v3/include/parallel/multiseq_selection.h > b/libstdc++-v3/include/parallel/multiseq_selection.h > > index f25895adbdd..22bd97e6432 100644 > > --- a/libstdc++-v3/include/parallel/multiseq_selection.h > > +++ b/libstdc++-v3/include/parallel/multiseq_selection.h > > @@ -48,6 +48,10 @@ > > > > namespace __gnu_parallel > > { > > + > > +#pragma GCC diagnostic push > > +#pragma GCC diagnostic ignored "-Wdeprecated-declarations" // > *nary_function > > + > > /** @brief Compare __a pair of types lexicographically, ascending. */ > > template > > class _Lexicographic > > @@ -100,6 +104,8 @@ namespace __gnu_parallel > > } > > }; > > > > +#pragma GCC diagnostic pop // -Wdeprecated-declarations > > + > > /** > > * @brief Splits several sorted sequences at a certain global > __rank, > > * resulting in a splitting point for each sequence. > > > > base-commit: b752eed3e3f2f27570ea89b7c2339468698472a8 > >
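For readers unfamiliar with the idiom, here is a minimal self-contained sketch of the push/ignored/pop pattern the patch applies. The deprecated base `legacy_unary_function` and the functor `int_not_pred` are hypothetical stand-ins for `std::unary_function` and the parallel-mode functors. They are not the libstdc++ code.

```cpp
#include <cassert>

// Hypothetical stand-in for a deprecated base such as std::unary_function.
template <typename Arg, typename Result>
struct [[deprecated ("stand-in for a deprecated *nary_function base")]]
legacy_unary_function
{
  typedef Arg argument_type;
  typedef Result result_type;
};

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations" // *nary_function

// Naming the deprecated base here would normally trigger
// -Wdeprecated-declarations; the push/ignored pair silences it for this
// region only, and the pop below restores the previous diagnostic state.
struct int_not_pred : public legacy_unary_function<int, bool>
{
  bool operator() (int x) const { return x == 0; }
};

#pragma GCC diagnostic pop // -Wdeprecated-declarations
```

Because the pop restores the earlier state, code after the region that names the deprecated base directly still gets the warning.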
Re: [PATCH] gfortran testsuite: Remove unit-files in files having open-statements, PR116701
On 9/23/24 11:21 PM, Hans-Peter Nilsson wrote: Here's a general approach to handle PR116701. I considered adding manual deletions as quoted below and mentioned in the PR, but seeing the handling of "integer 8" in fortran-torture-execute I decided to follow that example: better scan the source for open-statements than relying on manual annotations and people remembering to add them for new test-cases. I hope the inclusion of gfortran-dg.exp in fortran-torture.exp is not controversial, but there's no fortran-specific testsuite file common to dg and classic-torture and also this placement is still in the "Utility routines" section of gfortran-dg.exp. (BTW, the C torture-tests changed to the dg framework some time ago - no more .x-files there and dg-directives actually work - there are some in gfortran.fortran-torture that are apparently ignored!) Explain this change of including gfortran-dg.exp in fortran-torture.exp. What does it mean in the case I do 'make -k -j4 check-fortran'? Does gfortran-dg-exp get performed twice? Forgive my ignorance of the testsuite incantations. Regards, Jerry
Re: [PATCH] gfortran testsuite: Remove unit-files in files having open-statements, PR116701
Thanks for the review! > Date: Tue, 24 Sep 2024 17:10:27 -0700 > Cc: Jerry D > From: Jerry D > On 9/23/24 11:21 PM, Hans-Peter Nilsson wrote: > > I hope the inclusion of gfortran-dg.exp in > > fortran-torture.exp is not controversial, but there's no > > fortran-specific testsuite file common to dg and > > classic-torture and also this placement is still in the > > "Utility routines" section of gfortran-dg.exp. (BTW, the C > > torture-tests changed to the dg framework some time ago - no > > more .x-files there and dg-directives actually work - there > > are some in gfortran.fortran-torture that are apparently > > ignored!) > > Explain this change of including gfortran-dg.exp in fortran-torture.exp. I need to put the new proc in a file, to be used by both dg and classic-torture. Among the utility-carrying files I picked gfortran-dg.exp, as it looked more fitting than e.g. fortran-modules.exp. Since it wasn't previously included there, I included that file in fortran-torture.exp. By including that file, not just the new proc gfortran-dg-rmunits but also the other procs in that file are available. Since they don't collide with the fortran-torture machinery, that should have no effect. > What does it mean in the case I do 'make -k -j4 check-fortran'? Does > gfortran-dg-exp get performed twice? (I assume you mean "are the gfortran.dg tests run twice" as other interpretations make less sense to me.) No. > Forgive my ignorance of the > testsuite incantations. There's nothing but load_lib and proc definitions in gfortran-dg.exp, specifically no "top-level code" running tests like execute.exp or dg.exp, so including it should have no such effect...but I see that the files it includes *do* have top-level code (setting global variables for use by the testsuite machinery, *not* running tests). Perhaps I should ignore that misnomer and put gfortran-dg-rmunits in fortran-modules.exp in order to put pollution worries to rest.
After all, that file already has the utility proc igrep, used in gfortran-dg-rmunits. So, new version coming up. brgds, H-P
[PATCH] libgcc, libstdc++: Make more entities no longer TU-local [PR115126]
I found that my previous minimal change to libstdc++ was only sufficient to pass regtest on x86_64-pc-linux-gnu; Linaro complained about ARM and aarch64. This patch removes the rest of the internal-linkage entities I could find exposed via libstdc++. The libgcc changes include some blocks specific to FreeBSD, Solaris <10, and HP-UX; I haven't been able to test these changes at this time. Happy to adjust or remove those hunks as needed. Apologies if I haven't CC'd the correct people. Bootstrapped and regtested on x86_64-pc-linux-gnu and aarch64-unknown-linux-gnu, OK for trunk? -- >8 -- In C++20, modules streaming checks for exposures of TU-local entities. In general, exposing internal-linkage functions in a header is liable to cause ODR violations in C++, and this is now detected in a module context. This patch goes through and removes 'static' from many functions exposed through libstdc++ to prevent code like the following from failing: export module M; extern "C++" { #include } Since gthreads is used from C as well, we need to choose whether to use 'inline' or 'static inline' depending on whether we're compiling for C or C++ (since the semantics of 'inline' are different between the languages). Additionally we need to remove static global variables, so we migrate these to function-local statics to avoid the ODR issues. There doesn't seem to be a good workaround for weakrefs, so I've left them as-is and will work around it in the modules streaming code to consider them as not TU-local. The same issue occurs in the Objective-C-specific parts of gthreads, but I'm not familiar with the surrounding context and we don't currently test modules with Objective C++ anyway, so I've left it as-is. PR libstdc++/115126 libgcc/ChangeLog: * gthr-posix.h (__GTHREAD_INLINE): New macro. (__gthread_active): Convert from variable to function. (__gthread_trigger): Mark as __GTHREAD_INLINE instead of static. (__gthread_active_p): Likewise. (__gthread_create): Likewise.
(__gthread_join): Likewise. (__gthread_detach): Likewise. (__gthread_equal): Likewise. (__gthread_self): Likewise. (__gthread_yield): Likewise. (__gthread_once): Likewise. (__gthread_key_create): Likewise. (__gthread_key_delete): Likewise. (__gthread_getspecific): Likewise. (__gthread_setspecific): Likewise. (__gthread_mutex_init_function): Likewise. (__gthread_mutex_destroy): Likewise. (__gthread_mutex_lock): Likewise. (__gthread_mutex_trylock): Likewise. (__gthread_mutex_timedlock): Likewise. (__gthread_mutex_unlock): Likewise. (__gthread_recursive_mutex_init_function): Likewise. (__gthread_recursive_mutex_lock): Likewise. (__gthread_recursive_mutex_trylock): Likewise. (__gthread_recursive_mutex_timedlock): Likewise. (__gthread_recursive_mutex_unlock): Likewise. (__gthread_recursive_mutex_destroy): Likewise. (__gthread_cond_init_function): Likewise. (__gthread_cond_broadcast): Likewise. (__gthread_cond_signal): Likewise. (__gthread_cond_wait): Likewise. (__gthread_cond_timedwait): Likewise. (__gthread_cond_wait_recursive): Likewise. (__gthread_cond_destroy): Likewise. (__gthread_rwlock_rdlock): Likewise. (__gthread_rwlock_tryrdlock): Likewise. (__gthread_rwlock_wrlock): Likewise. (__gthread_rwlock_trywrlock): Likewise. (__gthread_rwlock_unlock): Likewise. * gthr-single.h (__GTHREAD_INLINE): New macro. (__gthread_active_p): Mark as __GTHREAD_INLINE instead of static. (__gthread_once): Likewise. (__gthread_key_create): Likewise. (__gthread_key_delete): Likewise. (__gthread_getspecific): Likewise. (__gthread_setspecific): Likewise. (__gthread_mutex_destroy): Likewise. (__gthread_mutex_lock): Likewise. (__gthread_mutex_trylock): Likewise. (__gthread_mutex_unlock): Likewise. (__gthread_recursive_mutex_lock): Likewise. (__gthread_recursive_mutex_trylock): Likewise. (__gthread_recursive_mutex_unlock): Likewise. (__gthread_recursive_mutex_destroy): Likewise. libstdc++-v3/ChangeLog: * include/bits/shared_ptr.h (std::__is_shared_ptr): Remove unnecessary 'static'. 
* include/bits/unique_ptr.h (std::__is_unique_ptr): Likewise. * include/std/future (std::__create_task_state): Likewise. * include/std/shared_mutex (_GLIBCXX_GTHRW): Likewise. (__glibcxx_rwlock_init): Likewise. (__glibcxx_rwlock_timedrdlock): Likewise. (__glibcxx_rwlock_timedwrlock): Likewise. (__glibcxx_rwlock_rdlock): Likewise. (__glibcxx_rwlock_tryrdlock): Likewise. (__glibcxx_rwlock_wrlock): Likewise. (__glibcxx_rwlock_trywrlock): Likewise. (__gli
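Here is a rough sketch of the two techniques described above, assuming C++ compilation; the C branch of the macro is shown only for contrast. The names `SKETCH_GTHREAD_INLINE`, `sketch_gthread_active` and `sketch_gthread_active_p` are hypothetical, and the real gthr-posix.h code differs in detail.

```cpp
#include <cassert>

#ifdef __cplusplus
// C++: 'inline' keeps external linkage and the ODR merges the copies, so the
// function is not TU-local and may be exposed from a module.
# define SKETCH_GTHREAD_INLINE inline
#else
// C: plain 'inline' has different (C99) semantics, so keep 'static inline'.
# define SKETCH_GTHREAD_INLINE static inline
#endif

// Hypothetical replacement for a namespace-scope 'static volatile int' flag:
// a function-local static, so no internal-linkage variable is exposed.
SKETCH_GTHREAD_INLINE volatile int *
sketch_gthread_active (void)
{
  static volatile int active = -1; // -1 means "not yet determined"
  return &active;
}

SKETCH_GTHREAD_INLINE int
sketch_gthread_active_p (void)
{
  volatile int *p = sketch_gthread_active ();
  if (*p < 0)
    *p = 1; // hypothetical: assume thread support is available
  return *p != 0;
}
```

In C++ every translation unit that includes such a header shares a single `active` object, which is what makes the function-local static safe under the ODR.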
[PATCH 11/10] c++/modules: Treat weakrefs as not TU-local [PR115126]
This follows up on some more test failures reported by Linaro on aarch64. The testcase also depends on the libgcc/libstdc++ patch here: https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663749.html To avoid an intermediary state where aarch64 regtests fail I could include the module.cc changes in patch 6 of this series. Let me know if you'd like me to send through a full updated v2 patch series instead of having all these 'extra' patches fixing issues on other platforms... Bootstrapped and regtested on x86_64-pc-linux and aarch64-unknown-linux-gnu, OK for trunk? -- >8 -- On some targets the gthreads support code uses weakref aliases on entities marked 'static'. By the C++ standard these have internal linkage, but we really shouldn't consider these as TU-local. This provides enough of the puzzle to pass the testcase in the PR on at least x86_64-linux and aarch64-linux; we'll see what happens on other targets. PR c++/115126 gcc/cp/ChangeLog: * module.cc (depset::hash::is_tu_local_entity): Don't treat weak entities as TU-local. gcc/testsuite/ChangeLog: * g++.dg/modules/xtreme-header-8.C: New test. Signed-off-by: Nathaniel Shead --- gcc/cp/module.cc | 5 + gcc/testsuite/g++.dg/modules/xtreme-header-8.C | 8 2 files changed, 13 insertions(+) create mode 100644 gcc/testsuite/g++.dg/modules/xtreme-header-8.C diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index d54f1c88366..3e9b63c1e56 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -13135,6 +13135,11 @@ depset::hash::is_tu_local_entity (tree decl, bool explain/*=false*/) linkage_kind kind = decl_linkage (decl); if (kind == lk_internal) { + /* But don't consider weak entities as TU-local. 
*/ + tree inner = STRIP_TEMPLATE (decl); + if (VAR_OR_FUNCTION_DECL_P (inner) && DECL_WEAK (inner)) + return false; + if (explain) inform (loc, "%qD declared with internal linkage", decl); return true; diff --git a/gcc/testsuite/g++.dg/modules/xtreme-header-8.C b/gcc/testsuite/g++.dg/modules/xtreme-header-8.C new file mode 100644 index 000..9da4e01cc68 --- /dev/null +++ b/gcc/testsuite/g++.dg/modules/xtreme-header-8.C @@ -0,0 +1,8 @@ +// PR c++/115126 +// { dg-additional-options "-fmodules-ts -Wignored-exposures" } +// { dg-module-cmi xstd } + +export module xstd; +extern "C++" { + #include "xtreme-header.h" +} -- 2.46.0
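To make the affected declarations concrete, here is a sketch of the kind of declaration the module.cc change is about: a `static` function that is only a weakref alias to another symbol. The names are hypothetical, and the target is defined in the same translation unit only so the example links. As far as I can tell, GCC records such weakref declarations with DECL_WEAK set, which is the property the new check tests.

```cpp
#include <cassert>

// Hypothetical target; extern "C" makes its assembler name match the
// string passed to weakref below.
extern "C" int sketch_weak_target (int x) { return x + 1; }

// Shaped like the gthreads __gthrw_* declarations: 'static' (internal
// linkage by the C++ rules) plus a weakref alias to another symbol.
static int sketch_weak_alias (int)
  __attribute__ ((weakref ("sketch_weak_target")));

inline int
call_through_weakref (int x)
{
  // Calls resolve to sketch_weak_target through the weak reference.
  return sketch_weak_alias (x);
}
```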
Re: [PATCH] [x86] Define VECTOR_STORE_FLAG_VALUE
On Tue, Sep 24, 2024 at 5:46 PM Uros Bizjak wrote: > > On Tue, Sep 24, 2024 at 11:23 AM liuhongt wrote: > > > > Return constm1_rtx when GET_MODE_CLASS (MODE) == MODE_VECTOR_INT. > > Otherwise NULL_RTX. > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > Ready push to trunk. > > > > gcc/ChangeLog: > > > > * config/i386/i386.h (VECTOR_STORE_FLAG_VALUE): New macro. > > > > gcc/testsuite/ChangeLog: > > * gcc.dg/rtl/x86_64/vector_eq.c: New test. > > --- > > gcc/config/i386/i386.h | 5 +++- > > gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c | 26 + > > 2 files changed, 30 insertions(+), 1 deletion(-) > > create mode 100644 gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c > > > > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h > > index c1ec92ffb15..b12be41424f 100644 > > --- a/gcc/config/i386/i386.h > > +++ b/gcc/config/i386/i386.h > > @@ -899,7 +899,10 @@ extern const char *host_detect_local_cpu (int argc, > > const char **argv); > > and give entire struct the alignment of an int. */ > > /* Required on the 386 since it doesn't have bit-field insns. */ > > #define PCC_BITFIELD_TYPE_MATTERS 1 > > - > > + > > +#define VECTOR_STORE_FLAG_VALUE(MODE) \ > > + (GET_MODE_CLASS (MODE) == MODE_VECTOR_INT ? constm1_rtx : NULL_RTX) > > + > > /* Standard register usage. */ > > > > /* This processor has special stack-like registers. See reg-stack.cc > > diff --git a/gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c > > b/gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c > > new file mode 100644 > > index 000..b82603d0b64 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c > > @@ -0,0 +1,26 @@ > > +/* { dg-do compile { target x86_64-*-* } } */ > > target { { i?86-*-* x86_64-*-* } && lp64 } Thanks, changed. > > Uros. 
> > > +/* { dg-additional-options "-O2 -march=x86-64-v3" } */ > > + > > +typedef int v4si __attribute__((vector_size(16))); > > + > > +v4si __RTL (startwith ("vregs")) foo (void) > > +{ > > +(function "foo" > > + (insn-chain > > +(block 2 > > + (edge-from entry (flags "FALLTHRU")) > > + (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK) > > + (cnote 2 NOTE_INSN_FUNCTION_BEG) > > + (cinsn 3 (set (reg:V4SI <0>) (const_vector:V4SI [(const_int 0) > > (const_int 0) (const_int 0) (const_int 0)]))) > > + (cinsn 5 (set (reg:V4SI <2>) > > + (eq:V4SI (reg:V4SI <0>) (reg:V4SI <1> > > + (cinsn 6 (set (reg:V4SI <3>) (reg:V4SI <2>))) > > + (cinsn 7 (set (reg:V4SI xmm0) (reg:V4SI <3>))) > > + (edge-to exit (flags "FALLTHRU")) > > +) > > + ) > > + (crtl (return_rtx (reg/i:V4SI xmm0))) > > +) > > +} > > + > > +/* { dg-final { scan-assembler-not "vpxor" } } */ > > -- > > 2.31.1 > > -- BR, Hongtao
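As background for the macro: VECTOR_STORE_FLAG_VALUE describes the value a true lane holds after an integer vector comparison. GCC's generic vector extensions follow the same convention (-1 for true, 0 for false). The sketch below shows this with the same v4si type the test uses. `lane_eq` is a hypothetical helper, and the sketch shows the source-level semantics rather than the RTL the patch affects.

```cpp
#include <cassert>

// Same generic vector type as in vector_eq.c.
typedef int v4si __attribute__ ((vector_size (16)));

// Lane-wise equality: each true lane is all-ones (-1), each false lane is 0.
inline v4si
lane_eq (v4si a, v4si b)
{
  return a == b;
}
```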
Re: [PATCH] i386: Add GENERIC and GIMPLE folders of __builtin_ia32_{min,max}* [PR116738]
On Wed, Sep 25, 2024 at 1:07 AM Jakub Jelinek wrote: > > Hi! > > The following patch adds GENERIC and GIMPLE folders for various > x86 min/max builtins. > As discussed, these builtins have effectively x < y ? x : y > (or x > y ? x : y) behavior. > The GENERIC folding is done if all the (relevant) arguments are > constants (such as VECTOR_CST for vectors) and is done because > the GIMPLE folding can't easily handle masking, rounding and the > ss/sd cases (in a way that it would be pattern recognized back to the > corresponding instructions). The GIMPLE folding is also done just > for TARGET_SSE4 or later when optimizing, otherwise it is apparently > not matched back. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2024-09-24 Jakub Jelinek > > PR target/116738 > * config/i386/i386.cc (ix86_fold_builtin): Handle > IX86_BUILTIN_M{IN,AX}{S,P}{S,H,D}*. > (ix86_gimple_fold_builtin): Handle IX86_BUILTIN_M{IN,AX}P{S,H,D}*. > > * gcc.target/i386/avx512f-pr116738-1.c: New test. > * gcc.target/i386/avx512f-pr116738-2.c: New test. 
> > --- gcc/config/i386/i386.cc.jj 2024-09-12 10:56:57.344683959 +0200 > +++ gcc/config/i386/i386.cc 2024-09-23 15:15:40.154783766 +0200 > @@ -18507,6 +18507,8 @@ ix86_fold_builtin (tree fndecl, int n_ar > = (enum ix86_builtins) DECL_MD_FUNCTION_CODE (fndecl); >enum rtx_code rcode; >bool is_vshift; > + enum tree_code tcode; > + bool is_scalar; >unsigned HOST_WIDE_INT mask; > >switch (fn_code) > @@ -18956,6 +18958,133 @@ ix86_fold_builtin (tree fndecl, int n_ar > } > break; > > + case IX86_BUILTIN_MINSS: > + case IX86_BUILTIN_MINSH_MASK: > + tcode = LT_EXPR; > + is_scalar = true; > + goto do_minmax; > + > + case IX86_BUILTIN_MAXSS: > + case IX86_BUILTIN_MAXSH_MASK: > + tcode = GT_EXPR; > + is_scalar = true; > + goto do_minmax; > + > + case IX86_BUILTIN_MINPS: > + case IX86_BUILTIN_MINPD: > + case IX86_BUILTIN_MINPS256: > + case IX86_BUILTIN_MINPD256: > + case IX86_BUILTIN_MINPS512: > + case IX86_BUILTIN_MINPD512: > + case IX86_BUILTIN_MINPS128_MASK: > + case IX86_BUILTIN_MINPD128_MASK: > + case IX86_BUILTIN_MINPS256_MASK: > + case IX86_BUILTIN_MINPD256_MASK: > + case IX86_BUILTIN_MINPH128_MASK: > + case IX86_BUILTIN_MINPH256_MASK: > + case IX86_BUILTIN_MINPH512_MASK: > + tcode = LT_EXPR; > + is_scalar = false; > + goto do_minmax; > + > + case IX86_BUILTIN_MAXPS: > + case IX86_BUILTIN_MAXPD: > + case IX86_BUILTIN_MAXPS256: > + case IX86_BUILTIN_MAXPD256: > + case IX86_BUILTIN_MAXPS512: > + case IX86_BUILTIN_MAXPD512: > + case IX86_BUILTIN_MAXPS128_MASK: > + case IX86_BUILTIN_MAXPD128_MASK: > + case IX86_BUILTIN_MAXPS256_MASK: > + case IX86_BUILTIN_MAXPD256_MASK: > + case IX86_BUILTIN_MAXPH128_MASK: > + case IX86_BUILTIN_MAXPH256_MASK: > + case IX86_BUILTIN_MAXPH512_MASK: > + tcode = GT_EXPR; > + is_scalar = false; > + do_minmax: > + gcc_assert (n_args >= 2); > + if (TREE_CODE (args[0]) != VECTOR_CST > + || TREE_CODE (args[1]) != VECTOR_CST) > + break; > + mask = HOST_WIDE_INT_M1U; > + if (n_args > 2) > + { > + gcc_assert (n_args >= 4); > + /* This is masked minmax. 
*/ > + if (TREE_CODE (args[3]) != INTEGER_CST > + || TREE_SIDE_EFFECTS (args[2])) > + break; > + mask = TREE_INT_CST_LOW (args[3]); > + unsigned elems = TYPE_VECTOR_SUBPARTS (TREE_TYPE (args[0])); > + mask |= HOST_WIDE_INT_M1U << elems; > + if (mask != HOST_WIDE_INT_M1U > + && TREE_CODE (args[2]) != VECTOR_CST) > + break; > + if (n_args >= 5) > + { > + if (!tree_fits_uhwi_p (args[4])) > + break; > + if (tree_to_uhwi (args[4]) != 4 > + && tree_to_uhwi (args[4]) != 8) > + break; > + } > + if (mask == (HOST_WIDE_INT_M1U << elems)) > + return args[2]; > + } > + /* Punt on NaNs, unless exceptions are disabled. */ > + if (HONOR_NANS (args[0]) > + && (n_args < 5 || tree_to_uhwi (args[4]) != 8)) > + for (int i = 0; i < 2; ++i) > + { > + unsigned count = vector_cst_encoded_nelts (args[i]), j; > + for (j = 0; j < count; ++j) > + if (!tree_expr_nan_p (VECTOR_CST_ENCODED_ELT (args[i], j))) Is this a typo? I assume you want to check if the component is NAN, so tree_expr_nan_p, not !tree_expr_nan_p? > + break; > + if (j < count) > +
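A scalar model of the semantics discussed above may help when reviewing the NaN handling. This is a simplified sketch, not the i386.cc folder. `sketch_min_lane`, `sketch_max_lane` and `sketch_fold_min` are hypothetical. The folder simply refuses to fold whenever any input lane is NaN, and it ignores the SAE/rounding operand and the scalar ss/sd/sh forms.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>

// One lane of MINPS/MAXPS as described in the thread: x < y ? x : y and
// x > y ? x : y. Any comparison with a NaN is false, so y is returned.
inline float sketch_min_lane (float x, float y) { return x < y ? x : y; }
inline float sketch_max_lane (float x, float y) { return x > y ? x : y; }

// Hypothetical constant folder for a merge-masked packed min: lanes whose
// mask bit is clear keep src[i]. Folding is refused if any lane of a or b
// is NaN.
template <std::size_t N>
std::optional<std::array<float, N>>
sketch_fold_min (const std::array<float, N> &a, const std::array<float, N> &b,
                 const std::array<float, N> &src, unsigned mask)
{
  for (std::size_t i = 0; i < N; ++i)
    if (std::isnan (a[i]) || std::isnan (b[i]))
      return std::nullopt; // punt: leave the builtin call alone
  std::array<float, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = ((mask >> i) & 1u) ? sketch_min_lane (a[i], b[i]) : src[i];
  return r;
}
```

Because the operation is not commutative for NaNs or signed zeros (min (0.0, -0.0) yields -0.0), a check that inverts the NaN test would let exactly these cases through.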
[PATCH v2] gfortran testsuite: Remove unit-files in files having open-statements, PR116701
Changes since v1: - Rename gfortran-dg-rmunits to fortran-delete-unit-files. - Move it to lib/fortran-modules.exp. - Tweak commit message accordingly and mention cause of placement of the proc. - Tweak proc comment to mention why removals are kept unique even though redundant removals are harmless. Here's a general approach to handle PR116701. I considered adding manual deletions as quoted below and mentioned in the PR, but seeing the handling of "integer 8" in fortran-torture-execute I decided to follow that example: better to scan the source for open-statements than to rely on manual annotations and people remembering to add them for new test-cases. I hope the inclusion of gfortran-dg.exp in fortran-torture.exp is not controversial, but there's no fortran-specific testsuite file common to dg and classic-torture and also this placement is still in the "Utility routines" section of gfortran-dg.exp. (BTW, the C torture-tests changed to the dg framework some time ago - no more .x-files there and dg-directives actually work - there are some in gfortran.fortran-torture that are apparently ignored!) There's one further cleanup possible: removing the manual removal in open_errors_2.f90 (which should have used "target", not "build"). Works for cris-elf (no regressions). Version v1 was also similarly regtested on native x86_64-linux-gnu. Manual checks have verified the unit-removal. Ok to commit? -- >8 -- PR testsuite/116701 shows that left-behind files from unnamed gfortran open statements (named fort.N, where N = unit number) can interfere with the result of a subsequent run. While that's unlikely to happen for a "real" fortran target or a test with a deleting close-statement, test-cases should not rely on previous test-cases passing and should not execute along different execution paths depending on earlier runs, even if the difference is benevolent. Most but not all fortran test-cases go through gfortran-dg-runtest (gfortran.dg) or fortran-torture-execute (gfortran.fortran-torture).
However, the exceptions, with more complex framework and call-chains, either don't run or don't have open-statements, so a more complex solution doesn't seem worthwhile. If test-cases with open-statements are added later to those parts of the test-suite, calls to fortran-delete-unit-files at the right spot may be added or worst case, "manual" cleanup-calls added, like: ! { dg-final { remote_file target delete "fort.10" } } Put the new proc in fortran-modules.exp since that's where other common fortran-testsuite dejagnu-library functions are located. PR testsuite/116701 * lib/fortran-modules.exp (fortran-delete-unit-files): New proc. * lib/gfortran-dg.exp (gfortran-dg-runtest): Call fortran-delete-unit-files after executing test. * lib/fortran-torture.exp (fortran-torture-execute): Ditto. --- gcc/testsuite/lib/fortran-modules.exp | 21 + gcc/testsuite/lib/fortran-torture.exp | 2 ++ gcc/testsuite/lib/gfortran-dg.exp | 1 + 3 files changed, 24 insertions(+) diff --git a/gcc/testsuite/lib/fortran-modules.exp b/gcc/testsuite/lib/fortran-modules.exp index 158b16bada91..a7196f13ed22 100644 --- a/gcc/testsuite/lib/fortran-modules.exp +++ b/gcc/testsuite/lib/fortran-modules.exp @@ -172,3 +172,24 @@ proc igrep { args } { } return $grep_out } + +# If the code has any "open" statements for numbered units, make sure +# no corresponding output file remains. Redundant remove operations +# are ok, but duplicate removals look sloppy, so track for uniqueness. 
+proc fortran-delete-unit-files { src } { +set openpat {open *\( *(?:unit *= *)?([0-9]+)} +set openmatches [igrep $src $openpat] +if {![string match "" $openmatches]} { + # verbose -log "Found \"$openmatches\"" + set deleted_units {} + foreach openmatch $openmatches { + regexp -nocase -- "$openpat" $openmatch match unit + if {[lsearch $deleted_units $unit] < 0} { + set rmfile "fort.$unit" + verbose -log "Deleting $rmfile" + remote_file target delete "fort.$unit" + lappend deleted_units $unit + } + } +} +} diff --git a/gcc/testsuite/lib/fortran-torture.exp b/gcc/testsuite/lib/fortran-torture.exp index 66f5bc822232..0727fb4fb0a6 100644 --- a/gcc/testsuite/lib/fortran-torture.exp +++ b/gcc/testsuite/lib/fortran-torture.exp @@ -332,6 +332,8 @@ proc fortran-torture-execute { src } { catch { remote_file build delete $executable } } $status "$testcase execution, $option" + + fortran-delete-unit-files $src } cleanup-modules "" } diff --git a/gcc/testsuite/lib/gfortran-dg.exp b/gcc/testsuite/lib/gfortran-dg.exp index fcba95dc3961..2edc09e5c995 100644 --- a/gcc/testsuite/lib/gfortran-dg.exp +++ b/gcc/testsuite/lib/gfortran-dg.exp @@ -160,6 +160,7 @@ proc gfortran-dg-runtest { testcases flags default-extra-flags } { foreach flags_t $option_list { verbo
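For readers who don't read Tcl, here is a C++ sketch of what the proc computes. It scans the source case-insensitively for open statements with a numeric unit and returns each resulting fort.N name once. It reuses the proc's regular expression under std::regex's ECMAScript grammar. `sketch_unit_files` is a hypothetical name, and deleting the files on the target is left out.

```cpp
#include <cassert>
#include <regex>
#include <set>
#include <string>
#include <vector>

// Returns the unique fort.N names, in first-seen order, for every
// "open (N" or "open (unit = N" found in src, ignoring case.
std::vector<std::string>
sketch_unit_files (const std::string &src)
{
  static const std::regex openpat (R"(open *\( *(?:unit *= *)?([0-9]+))",
                                   std::regex::ECMAScript | std::regex::icase);
  std::vector<std::string> files;
  std::set<std::string> seen; // keeps removals unique, like deleted_units
  for (std::sregex_iterator it (src.begin (), src.end (), openpat), end;
       it != end; ++it)
    {
      std::string unit = (*it)[1].str ();
      if (seen.insert (unit).second)
        files.push_back ("fort." + unit);
    }
  return files;
}
```

Like the proc, this also picks up units opened with an explicit file=, so some removals may be redundant. As the proc comment says, that is harmless.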
Re: libgomp: with USM, init 'link' variables with host address
Now committed as r15-3836-g4cb20dc043cf70 Contrary to the originally posted patch, it also acts on the newer/newly added 'omp requires self_maps'. In the area of (unified-)shared memory/self maps, the next step seems to be to do still mapping for static variables – before moving to refinements like how to handle implicit 'declare target' for static variables, … For this piece of code, we also want to run it for APUs even when no USM has been requested, avoid adding those to the mapping table (for self maps) and do a more efficient mapping (e.g. memcpy or avoid multiple locks). Tobias Tobias Burnus wrote: short version: I think the patch as posted is fine and no action beyond is needed for this one issue. See below for the long version. Possible modifications (now or as follow up): - using memcpy + or let the plugin do it - not adding link variables to the splay tree with 'USM'. Thomas Schwinge wrote: Tested on x86-64-gnu-linux and nvptx offloading (that supports USM). (I yet have to set up such a USM configuration...) You already used a USM config, e.g., when running gfx90a (likewise: gfx90c), except that USM on mainline currently only works if you explicitly set 'export HSA_XNACK=1'. For Nvptx, you need a post-Volta GPU with the open-kernels driver, which is the default for newer driver versions. * * * Do I understand correctly that even if 'GOMP_REQUIRES_UNIFIED_SHARED_MEMORY', we cannot just skip all the 'mem_map' setup in 'gomp_load_image_to_device' etc., because we're not (yet?) setting 'GOMP_OFFLOAD_CAP_SHARED_MEM'? We actually do set GOMP_OFFLOAD_CAP_SHARED_MEM with 'requires unified_shared_memory'. But, indeed, we cannot skip the memory mapping parts – due to the way we handle static variables.
* * * + + if (is_link_var + && (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY)) + gomp_copy_host2dev (devicep, NULL, (void *) target_var->start, + &k->host_start, sizeof (void *), false, NULL); } Calling 'gomp_copy_host2dev' looks a bit funny given we've just determined USM (..., but I'm not asking for plain 'memcpy'). I guess a plain memcpy would do as well. [Assuming that the device's static variable is host accessible, which it probably is and should be.] I'll add it to my to-do list for USM-related tasks to change this; possibly moving it to the plugin side has some advantages? Possibly not adding it to the splay tree if not needed. (Cf. below for env var discussion.) Regarding the unload: For 'declare target link(A)', we have, e.g., 'static int *A' on the device side. Thus, we could do 'A = NULL' – and rather should do 'A = {clobber}', but that's rather pointless in general and especially when unloading the image. What's the advantage/rationale of doing this here vs. in 'gomp_map_vars_internal' for 'REFCOUNT_LINK'? (May be worth a source code comment?) (A, B, C refer to the following example.) We don't see 'A' (or 'B') in the GOMP_target_ext call and thus not in gomp_map_vars_internal. Besides: We only want to do the initialization once and not every time gomp_map_vars_internal is called. I think the following program may help to understand the issue and the patch better. Note: While A, B, C are 'int …[3]' on the host, on the device we only have 'int B[3]' while for A it's 'int *A' and C only exists on the host.
* * *

    #pragma omp requires unified_shared_memory
    static int A[3], B[3], C[3];
    #pragma omp declare target link(A) enter(B)

    #pragma omp begin declare target
    void f(int *p)
    {
      A[2] += B[2] + p[2];  // p points to the host's C variable
    }
    #pragma omp end declare target

    void foo(int dev)
    {
      int *ptr = C;
      #pragma omp target firstprivate(ptr) device(dev)
        f (ptr);
    }

* * * Here, 'ptr' (and thus 'p') point to the host 'C' variable, both before the target region and inside the target region. 'B' refers to the device-local version of the variable. And 'A' on a non-host device is likely to be NULL ('static int *A' + .BSS) before this patch, or to point to the host's 'A' with this patch. * * * With A pointing to the host version (and likewise 'p' pointing to the host C), host fallback and device version yield identical results for 'A' and for 'C' (via ptr/p). However, 'B' on host and non-host device have nothing in common. While that might be fine, in general it is not. Hence, to get the same result on host and device for a .BSS-valued 'B', we need, e.g.

    #pragma omp target data map(always, tofrom: B) device(dev)
      foo (dev);

to call 'foo' and ensure that the two 'B' are in sync. * * * Code-wise, this means that with GOMP_OFFLOAD_CAP_SHARED_MEM, we still have to apply the map for 'declare target enter(…)' variables, except if host and device share the same code – but that should only be the case for host fallback (= initial device) and, possibly, GOMP_OFFLOAD_CAP_NATIVE_EXEC. * * * NOTE: OpenMP still permits to honor explicit 'map' with 'requires unified_sha
[pushed] libgcc, Darwin: Drop the legacy library build for macOS >= 15 [PR116809].
Tested on i686-darwin9, 17; x86_64-darwin17, 19, 21, 23 and my FX on x86_64 darwin24, pushed to trunk, thanks Iain --- 8< --- We have been building a legacy libgcc_s.1 DSO to support code that was built with older compilers. From macOS 15, the unwinder no longer exports some of the symbols used in that library, which (a) causes bootstrap to fail and (b) means that the legacy library is no longer useful. No open branch of GCC emits references to this library - and any already-built code that depends on the symbols would need rework anyway. PR target/116809 libgcc/ChangeLog: * config.host: Build legacy libgcc_s.1 on hosts before macOS 15. * config/i386/t-darwin: Remove reference to legacy libgcc_s.1. * config/rs6000/t-darwin: Likewise. * config/t-darwin-libgccs1: New file. Signed-off-by: Iain Sandoe --- libgcc/config.host | 11 +++ libgcc/config/i386/t-darwin | 3 --- libgcc/config/rs6000/t-darwin | 3 --- libgcc/config/t-darwin-libgccs1 | 3 +++ 4 files changed, 10 insertions(+), 10 deletions(-) create mode 100644 libgcc/config/t-darwin-libgccs1 diff --git a/libgcc/config.host b/libgcc/config.host index 5c6b656531f..00bd6384c0f 100644 --- a/libgcc/config.host +++ b/libgcc/config.host @@ -239,22 +239,25 @@ case ${host} in esac tmake_file="$tmake_file t-slibgcc-darwin" case ${host} in +x86_64-*-darwin2[0-3]*) + tmake_file="t-darwin-min-11 t-darwin-libgccs1 $tmake_file" + ;; *-*-darwin2*) tmake_file="t-darwin-min-11 $tmake_file" ;; *-*-darwin1[89]*) - tmake_file="t-darwin-min-8 $tmake_file" + tmake_file="t-darwin-min-8 t-darwin-libgccs1 $tmake_file" ;; *-*-darwin9* | *-*-darwin1[0-7]*) - tmake_file="t-darwin-min-5 $tmake_file" + tmake_file="t-darwin-min-5 t-darwin-libgccs1 $tmake_file" ;; *-*-darwin[4-8]*) - tmake_file="t-darwin-min-1 $tmake_file" + tmake_file="t-darwin-min-1 t-darwin-libgccs1 $tmake_file" ;; *) # Fall back to configuring for the oldest system known to work with # all archs and the current sources.
- tmake_file="t-darwin-min-5 $tmake_file" + tmake_file="t-darwin-min-5 t-darwin-libgccs1 $tmake_file" echo "Warning: libgcc configured to support macOS 10.5" 1>&2 ;; esac diff --git a/libgcc/config/i386/t-darwin b/libgcc/config/i386/t-darwin index 4c18da1efbf..c6b3acaaca2 100644 --- a/libgcc/config/i386/t-darwin +++ b/libgcc/config/i386/t-darwin @@ -4,6 +4,3 @@ LIB2FUNCS_EXCLUDE = _fixtfdi _fixunstfdi _floatditf _floatunditf # Extra symbols for this port. SHLIB_MAPFILES += $(srcdir)/config/i386/libgcc-darwin.ver - -# Build a legacy libgcc_s.1 -BUILD_LIBGCCS1 = YES diff --git a/libgcc/config/rs6000/t-darwin b/libgcc/config/rs6000/t-darwin index 183d0df92ce..8b513bdb1d7 100644 --- a/libgcc/config/rs6000/t-darwin +++ b/libgcc/config/rs6000/t-darwin @@ -56,6 +56,3 @@ unwind-dw2_s.o: HOST_LIBGCC2_CFLAGS += -maltivec unwind-dw2.o: HOST_LIBGCC2_CFLAGS += -maltivec LIB2ADDEH += $(srcdir)/config/rs6000/darwin-fallback.c - -# Build a legacy libgcc_s.1 -BUILD_LIBGCCS1 = YES diff --git a/libgcc/config/t-darwin-libgccs1 b/libgcc/config/t-darwin-libgccs1 new file mode 100644 index 000..b88b1a5bba8 --- /dev/null +++ b/libgcc/config/t-darwin-libgccs1 @@ -0,0 +1,3 @@ + +# Build a legacy libgcc_s.1 +BUILD_LIBGCCS1 = YES -- 2.39.2 (Apple Git-143)
Re: [PATCH] c++, v2: Implement C++23 P2718R0 - Wording for P2644R1 Fix for Range-based for Loop [PR107637]
On Tue, Sep 24, 2024 at 01:34:44PM -0400, Jason Merrill wrote: > Let's also give an error for trying to disable it in C++23+. > Missing function comment, maybe just use the one below? > Please add a comment to this and range-for4 explaining that this is to get > the fix enabled in GNU modes. > > OK with those changes. Done, committed now, thanks for the review. I've also committed the following tweak for the status page: diff --git a/htdocs/projects/cxx-status.html b/htdocs/projects/cxx-status.html index d986fc79..76f6ef6d 100644 --- a/htdocs/projects/cxx-status.html +++ b/htdocs/projects/cxx-status.html @@ -576,7 +576,7 @@ Wording for P2644R1 Fix for Range-based for Loop https://wg21.link/p2718";>P2718R0 - https://gcc.gnu.org/PR107637";>No + 15 __cpp_range_based_for >= 202211L
[PATCH] [x86] Define VECTOR_STORE_FLAG_VALUE
Return constm1_rtx when GET_MODE_CLASS (MODE) == MODE_VECTOR_INT, otherwise NULL_RTX. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ready to push to trunk. gcc/ChangeLog: * config/i386/i386.h (VECTOR_STORE_FLAG_VALUE): New macro. gcc/testsuite/ChangeLog: * gcc.dg/rtl/x86_64/vector_eq.c: New test. --- gcc/config/i386/i386.h | 5 +++- gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c | 26 + 2 files changed, 30 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index c1ec92ffb15..b12be41424f 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -899,7 +899,10 @@ extern const char *host_detect_local_cpu (int argc, const char **argv); and give entire struct the alignment of an int. */ /* Required on the 386 since it doesn't have bit-field insns. */ #define PCC_BITFIELD_TYPE_MATTERS 1 - + +#define VECTOR_STORE_FLAG_VALUE(MODE) \ + (GET_MODE_CLASS (MODE) == MODE_VECTOR_INT ? constm1_rtx : NULL_RTX) + /* Standard register usage. */ /* This processor has special stack-like registers.
See reg-stack.cc diff --git a/gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c b/gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c new file mode 100644 index 000..b82603d0b64 --- /dev/null +++ b/gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c @@ -0,0 +1,26 @@ +/* { dg-do compile { target x86_64-*-* } } */ +/* { dg-additional-options "-O2 -march=x86-64-v3" } */ + +typedef int v4si __attribute__((vector_size(16))); + +v4si __RTL (startwith ("vregs")) foo (void) +{ +(function "foo" + (insn-chain +(block 2 + (edge-from entry (flags "FALLTHRU")) + (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK) + (cnote 2 NOTE_INSN_FUNCTION_BEG) + (cinsn 3 (set (reg:V4SI <0>) (const_vector:V4SI [(const_int 0) (const_int 0) (const_int 0) (const_int 0)]))) + (cinsn 5 (set (reg:V4SI <2>) + (eq:V4SI (reg:V4SI <0>) (reg:V4SI <1> + (cinsn 6 (set (reg:V4SI <3>) (reg:V4SI <2>))) + (cinsn 7 (set (reg:V4SI xmm0) (reg:V4SI <3>))) + (edge-to exit (flags "FALLTHRU")) +) + ) + (crtl (return_rtx (reg/i:V4SI xmm0))) +) +} + +/* { dg-final { scan-assembler-not "vpxor" } } */ -- 2.31.1
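The "-1 for true" convention that VECTOR_STORE_FLAG_VALUE advertises for integer vector modes is the same one GCC's generic vector extension uses at the source level. The sketch below only illustrates that convention; lanes_equal is a hypothetical helper and does not exercise the RTL macro itself.

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Lane-wise equality with GCC's vector extension: each result lane is
   -1 (all bits set) where the comparison holds and 0 where it does not,
   i.e. the all-ones store-flag value rather than 1.  */
static v4si
lanes_equal (v4si a, v4si b)
{
  return a == b;
}
```

For example, comparing {1, 2, 3, 4} with {1, 0, 3, 0} yields {-1, 0, -1, 0}, so the result can be used directly as a bit mask.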
[PATCH] c++, v2: Implement C++23 P2718R0 - Wording for P2644R1 Fix for Range-based for Loop [PR107637]
On Mon, Sep 23, 2024 at 03:46:36PM -0400, Jason Merrill wrote: > > -frange-based-for-ext-temps > > or do you have better suggestion? > > I'd probably drop "based", "range-for" seems enough. > > > Shall we allow also disabling it in C++23 or later modes, or override > > user choice unconditionally for C++23+ and only allow users to > > enable/disable it in C++11-C++20? > > Hmm, I think the latter. > > > What about the __cpp_range_based_for predefined macro? > > Shall it be defined to the C++23 202211L value if the switch is on? > > While that could be done in theory for C++17 and later code, for C++11/14 > > __cpp_range_based_for is 200907L and doesn't include the C++17 > > 201603L step. Or keep the macro only for C++23 and later? > > I think update the macro for 17 and later. Ok. Here is a new patch. > > > > @@ -44600,11 +44609,14 @@ cp_convert_omp_range_for (tree &this_pre > > > > else > > > > { > > > > range_temp = build_range_temp (init); > > > > + tree name = DECL_NAME (range_temp); > > > > DECL_NAME (range_temp) = NULL_TREE; > > > > pushdecl (range_temp); > > > > + DECL_NAME (range_temp) = name; > > > > cp_finish_decl (range_temp, init, > > > > /*is_constant_init*/false, NULL_TREE, > > > > LOOKUP_ONLYCONVERTING); > > > > + DECL_NAME (range_temp) = NULL_TREE; > > > > > > This messing with the name needs a rationale. What wants it to be null? > > > > I'll add comments. The first = NULL_TREE; is needed so that pushdecl > > doesn't register the temporary for name lookup, the = name now is so that > > cp_finish_decl recognizes the temporary as range based for temporary > > for the lifetime extension, and the last one is just to preserve previous > > behavior, not have it visible in debug info etc. > > But cp_convert_range_for doesn't ever set the name to NULL_TREE, why should > the OMP variant be different? > > Having it visible to name lookup in the debugger seems beneficial. Having it > visible to the code seems less useful, but not important to prevent. 
So, in the end it works fine even for the OpenMP case when not inside of a template; all I had to add is the renaming of the symbol at the end after pop_scope from "__for_range " to "__for_range" etc. Unfortunately it doesn't work during instantiation: we only create a single scope in that case for the whole loop nest rather than one for each loop in it, and changing that isn't easy. With the "__for_range " name in, if there are 2+ range based for loops in the OpenMP loop nest (collapsed or ordered), one then gets errors about defining it multiple times. I'll try to fix that up incrementally later; for now I just went with a new flag to the function, so that it does the DECL_NAME dances only when called from the instantiation (and I confirmed that all 3 spots are actually needed: clearing before pushdecl, resetting back before cp_finish_decl and clearing after cp_finish_decl, the last one so that pop_scope doesn't ICE on seeing the name change). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2024-09-24 Jakub Jelinek PR c++/107637 gcc/ * omp-general.cc (find_combined_omp_for, find_nested_loop_xform): Handle CLEANUP_POINT_EXPR like TRY_FINALLY_EXPR. * doc/invoke.texi (frange-for-ext-temps): Document. Add -fconcepts to the C++ option list. gcc/c-family/ * c.opt (frange-for-ext-temps): New option. * c-opts.cc (c_common_post_options): Set flag_range_for_ext_temps for C++23 or later or for C++11 or later in !flag_iso mode if the option wasn't set by user. * c-cppbuiltin.cc (c_cpp_builtins): Change __cpp_range_based_for value for flag_range_for_ext_temps from 201603L to 202212L in C++17 or later. * c-omp.cc (c_find_nested_loop_xform_r): Handle CLEANUP_POINT_EXPR like TRY_FINALLY_EXPR. gcc/cp/ * cp-tree.h: Implement C++23 P2718R0 - Wording for P2644R1 Fix for Range-based for Loop. (cp_convert_omp_range_for): Add bool tmpl_p argument. (find_range_for_decls): Declare.
* parser.cc (cp_convert_range_for): For flag_range_for_ext_temps call push_stmt_list () before cp_finish_decl for range_temp and save it temporarily to FOR_INIT_STMT. (cp_convert_omp_range_for): Add tmpl_p argument. If set, remember DECL_NAME of range_temp and for cp_finish_decl call restore it before clearing it again, if unset, don't adjust DECL_NAME of range_temp at all. (cp_parser_omp_loop_nest): For flag_range_for_ext_temps range for add CLEANUP_POINT_EXPR around sl. Call find_range_for_decls and adjust DECL_NAMEs for range fors if not processing_template_decl. Adjust cp_convert_omp_range_for caller. Remove superfluous backslash at the end of line. * decl.cc (initialize_local_v
[committed] i386: Fix comment typo
Hi! Found a comment typo, fixed as obvious. Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk. 2024-09-24 Jakub Jelinek * config/i386/i386-expand.cc (ix86_expand_round_builtin): Fix comment typo, insead -> instead. --- gcc/config/i386/i386-expand.cc.jj 2024-09-20 08:57:02.496083163 +0200 +++ gcc/config/i386/i386-expand.cc 2024-09-23 11:01:14.128079764 +0200 @@ -12748,7 +12748,7 @@ ix86_expand_round_builtin (const struct /* Skip erasing embedded rounding for below expanders who generates multiple insns. In ix86_erase_embedded_rounding the pattern will be transformed to a single set, and emit_insn -appends the set insead of insert it to chain. So the insns +appends the set instead of insert it to chain. So the insns emitted inside define_expander would be ignored. */ switch (icode) { Jakub
[PATCH] i386: Add GENERIC and GIMPLE folders of __builtin_ia32_{min,max}* [PR116738]
Hi! The following patch adds GENERIC and GIMPLE folders for various x86 min/max builtins. As discussed, these builtins have effectively x < y ? x : y (or x > y ? x : y) behavior. The GENERIC folding is done if all the (relevant) arguments are constants (such as VECTOR_CST for vectors) and is done because the GIMPLE folding can't easily handle masking, rounding and the ss/sd cases (in a way that it would be pattern recognized back to the corresponding instructions). The GIMPLE folding is also done just for TARGET_SSE4 or later when optimizing, otherwise it is apparently not matched back. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2024-09-24 Jakub Jelinek PR target/116738 * config/i386/i386.cc (ix86_fold_builtin): Handle IX86_BUILTIN_M{IN,AX}{S,P}{S,H,D}*. (ix86_gimple_fold_builtin): Handle IX86_BUILTIN_M{IN,AX}P{S,H,D}*. * gcc.target/i386/avx512f-pr116738-1.c: New test. * gcc.target/i386/avx512f-pr116738-2.c: New test. --- gcc/config/i386/i386.cc.jj 2024-09-12 10:56:57.344683959 +0200 +++ gcc/config/i386/i386.cc 2024-09-23 15:15:40.154783766 +0200 @@ -18507,6 +18507,8 @@ ix86_fold_builtin (tree fndecl, int n_ar = (enum ix86_builtins) DECL_MD_FUNCTION_CODE (fndecl); enum rtx_code rcode; bool is_vshift; + enum tree_code tcode; + bool is_scalar; unsigned HOST_WIDE_INT mask; switch (fn_code) @@ -18956,6 +18958,133 @@ ix86_fold_builtin (tree fndecl, int n_ar } break; + case IX86_BUILTIN_MINSS: + case IX86_BUILTIN_MINSH_MASK: + tcode = LT_EXPR; + is_scalar = true; + goto do_minmax; + + case IX86_BUILTIN_MAXSS: + case IX86_BUILTIN_MAXSH_MASK: + tcode = GT_EXPR; + is_scalar = true; + goto do_minmax; + + case IX86_BUILTIN_MINPS: + case IX86_BUILTIN_MINPD: + case IX86_BUILTIN_MINPS256: + case IX86_BUILTIN_MINPD256: + case IX86_BUILTIN_MINPS512: + case IX86_BUILTIN_MINPD512: + case IX86_BUILTIN_MINPS128_MASK: + case IX86_BUILTIN_MINPD128_MASK: + case IX86_BUILTIN_MINPS256_MASK: + case IX86_BUILTIN_MINPD256_MASK: + case IX86_BUILTIN_MINPH128_MASK: + 
case IX86_BUILTIN_MINPH256_MASK: + case IX86_BUILTIN_MINPH512_MASK: + tcode = LT_EXPR; + is_scalar = false; + goto do_minmax; + + case IX86_BUILTIN_MAXPS: + case IX86_BUILTIN_MAXPD: + case IX86_BUILTIN_MAXPS256: + case IX86_BUILTIN_MAXPD256: + case IX86_BUILTIN_MAXPS512: + case IX86_BUILTIN_MAXPD512: + case IX86_BUILTIN_MAXPS128_MASK: + case IX86_BUILTIN_MAXPD128_MASK: + case IX86_BUILTIN_MAXPS256_MASK: + case IX86_BUILTIN_MAXPD256_MASK: + case IX86_BUILTIN_MAXPH128_MASK: + case IX86_BUILTIN_MAXPH256_MASK: + case IX86_BUILTIN_MAXPH512_MASK: + tcode = GT_EXPR; + is_scalar = false; + do_minmax: + gcc_assert (n_args >= 2); + if (TREE_CODE (args[0]) != VECTOR_CST + || TREE_CODE (args[1]) != VECTOR_CST) + break; + mask = HOST_WIDE_INT_M1U; + if (n_args > 2) + { + gcc_assert (n_args >= 4); + /* This is masked minmax. */ + if (TREE_CODE (args[3]) != INTEGER_CST + || TREE_SIDE_EFFECTS (args[2])) + break; + mask = TREE_INT_CST_LOW (args[3]); + unsigned elems = TYPE_VECTOR_SUBPARTS (TREE_TYPE (args[0])); + mask |= HOST_WIDE_INT_M1U << elems; + if (mask != HOST_WIDE_INT_M1U + && TREE_CODE (args[2]) != VECTOR_CST) + break; + if (n_args >= 5) + { + if (!tree_fits_uhwi_p (args[4])) + break; + if (tree_to_uhwi (args[4]) != 4 + && tree_to_uhwi (args[4]) != 8) + break; + } + if (mask == (HOST_WIDE_INT_M1U << elems)) + return args[2]; + } + /* Punt on NaNs, unless exceptions are disabled. */ + if (HONOR_NANS (args[0]) + && (n_args < 5 || tree_to_uhwi (args[4]) != 8)) + for (int i = 0; i < 2; ++i) + { + unsigned count = vector_cst_encoded_nelts (args[i]), j; + for (j = 0; j < count; ++j) + if (!tree_expr_nan_p (VECTOR_CST_ENCODED_ELT (args[i], j))) + break; + if (j < count) + break; + } + { + tree res = const_binop (tcode, + truth_type_for (TREE_TYPE (args[0])), + args[0], args[1]); + if (res == NULL_TREE || TREE_CODE (res) != VECTOR_CST) + break; + res = fold_ternary (VEC_COND_EXPR, TREE_TYPE (args[0]), res, +
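As a reference for what the folders compute, here is a scalar C model of one lane of these builtins, following the "x < y ? x : y" (or "x > y ? x : y") behavior described above, plus a model of the write-masked forms and of the mask normalization (`mask |= HOST_WIDE_INT_M1U << elems`) done in ix86_fold_builtin. This is a sketch only: ia32_min, ia32_max, masked_min4 and normalize_mask are hypothetical names, and the masked model assumes the usual merge semantics where masked-off lanes take the pass-through operand (args[2] in the patch).

```c
#include <math.h>
#include <stdint.h>

/* One lane of the x86 min/max instructions.  Unlike fminf/fmaxf this is
   not symmetric: if either input is a NaN, or both are zeros of any sign,
   the second operand is returned, which is why the fold must be careful
   with NaNs.  */
static float ia32_min (float x, float y) { return x < y ? x : y; }
static float ia32_max (float x, float y) { return x > y ? x : y; }

/* Hypothetical 4-lane write-masked min: lane i gets ia32_min (a[i], b[i])
   when bit i of K is set, and SRC[i] (the pass-through value) otherwise.  */
static void
masked_min4 (float dst[4], const float src[4],
             const float a[4], const float b[4], uint64_t k)
{
  for (int i = 0; i < 4; i++)
    dst[i] = ((k >> i) & 1) ? ia32_min (a[i], b[i]) : src[i];
}

/* Bits at or above ELEMS select no lane, so they are forced to 1
   (requires ELEMS < 64).  An all-ones result means no lane is masked off;
   a result equal to UINT64_MAX << ELEMS means every lane is masked off,
   so the whole operation reduces to the pass-through operand.  */
static uint64_t
normalize_mask (uint64_t k, unsigned elems)
{
  return k | (UINT64_MAX << elems);
}
```

The asymmetry is the key point: swapping the operands of the conditional would change the result for NaN and signed-zero inputs.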
[PATCH] libcpp: Add -Wleading-whitespace= warning
Hi! The following patch, on top of the https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663388.html patch, adds the -Wleading-whitespace= warning option. This warning doesn't care how much one actually indents each line in the source (that can't easily be done in the preprocessor without syntactic analysis); it just does simple checks on what kind of whitespace is used in the indentation. I think it is still useful to get warnings about such issues early: while git diagnoses some of them in patches (e.g. the tab after space case), getting the warnings earlier might help avoid such issues sooner. There are projects which ban the use of tabs and require just spaces, others which require indentation with horizontal tabs only, and finally projects which want indentation with tabs for multiples of the tabstop size followed by spaces (fewer than the tabstop size), like GCC. For all 3 kinds the warning diagnoses indentation with '\v' or '\f' characters (unless the line contains just whitespace), and for the last one also cases where a space in the indentation is followed by a horizontal tab or where there are N or more consecutive spaces in the indentation (for -ftabstop=N). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? BTW, for additional testing I've enabled the warnings (without -Werror for them) in stage3. There are many warnings (both trailing and leading whitespace); some of them can be easily fixed in the headers or source files, but others are whitespace issues in generated sources, so if we enable the warnings, either we'd need to adjust the generators or disable the warnings in (some of the) generated files. 2024-09-24 Jakub Jelinek libcpp/ * include/cpplib.h (struct cpp_options): Add cpp_warn_leading_whitespace and cpp_tabstop members. (enum cpp_warning_reason): Add CPP_W_LEADING_WHITESPACE. * internal.h (struct _cpp_line_note): Document new line note kinds. * init.cc (cpp_create_reader): Set cpp_tabstop to 8.
* lex.cc (find_leading_whitespace_issues): New function. (_cpp_clean_line): Use it. (_cpp_process_line_notes): Handle 'L', 'S' and 'T' line notes. (lex_raw_string): Clear type on 'L', 'S' and 'T' line notes inside of raw string literals. gcc/ * doc/invoke.texi (Wleading-whitespace=): Document. gcc/c-family/ * c.opt (Wleading-whitespace=): New option. * c-opts.cc (c_common_post_options): Set cpp_opts->cpp_tabstop to global_dc->m_tabstop. gcc/testsuite/ * c-c++-common/cpp/Wleading-whitespace-1.c: New test. * c-c++-common/cpp/Wleading-whitespace-2.c: New test. * c-c++-common/cpp/Wleading-whitespace-3.c: New test. * c-c++-common/cpp/Wleading-whitespace-4.c: New test. --- libcpp/include/cpplib.h.jj 2024-09-23 16:08:40.846050280 +0200 +++ libcpp/include/cpplib.h 2024-09-23 17:09:32.250056701 +0200 @@ -594,9 +594,15 @@ struct cpp_options /* True if -finput-charset= option has been used explicitly. */ bool cpp_input_charset_explicit; + /* -Wleading-whitespace= value. */ + unsigned char cpp_warn_leading_whitespace; + /* -Wtrailing-whitespace= value. */ unsigned char cpp_warn_trailing_whitespace; + /* -ftabstop= value. */ + unsigned int cpp_tabstop; + /* Dependency generation. */ struct { @@ -713,6 +719,7 @@ enum cpp_warning_reason { CPP_W_BIDIRECTIONAL, CPP_W_INVALID_UTF8, CPP_W_UNICODE, + CPP_W_LEADING_WHITESPACE, CPP_W_TRAILING_WHITESPACE }; --- libcpp/internal.h.jj2024-09-23 16:08:40.846050280 +0200 +++ libcpp/internal.h 2024-09-23 18:19:46.642467051 +0200 @@ -318,7 +318,8 @@ struct _cpp_line_note /* Type of note. The 9 'from' trigraph characters represent those trigraphs, '\\' an escaped newline, ' ' an escaped newline with - intervening space, 'W' trailing whitespace, 0 represents a note that + intervening space, 'W' trailing whitespace, 'L', 'S' and 'T' for + leading whitespace issues, 0 represents a note that has already been handled, and anything else is invalid. 
*/ unsigned int type; }; --- libcpp/init.cc.jj 2024-09-20 08:57:03.041075703 +0200 +++ libcpp/init.cc 2024-09-23 17:24:53.564421636 +0200 @@ -246,6 +246,7 @@ cpp_create_reader (enum c_lang lang, cpp CPP_OPTION (pfile, cpp_warn_invalid_utf8) = 0; CPP_OPTION (pfile, cpp_warn_unicode) = 1; CPP_OPTION (pfile, cpp_input_charset_explicit) = 0; + CPP_OPTION (pfile, cpp_tabstop) = 8; /* Default CPP arithmetic to something sensible for the host for the benefit of dumb users like fix-header. */ --- libcpp/lex.cc.jj2024-09-23 16:08:40.847050267 +0200 +++ libcpp/lex.cc 2024-09-24 09:32:57.293210930 +0200 @@ -818,6 +818,59 @@ _cpp_init_lexer (void) #endif } +/* Look for leading whitespace style issues on lines which don't contain + just whitespace. + For -Wleading-whitespace=spaces report if such lines
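The three indentation styles described in the mail can be modelled with a small standalone C function. This is a minimal sketch based only on that description, not libcpp's find_leading_whitespace_issues; the enum names and leading_ws_issue are hypothetical, and only "spaces" is confirmed as an option value in this excerpt.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical names for the three styles: spaces only, tabs only, and
   GCC style (tabs for multiples of the tabstop, then fewer spaces).  */
enum ws_style { WS_SPACES, WS_TABS, WS_BLANKS };

/* Return true if the indentation of LINE (NUL-terminated, no newline)
   would be diagnosed under STYLE with the given TABSTOP (> 0).  */
static bool
leading_ws_issue (const char *line, enum ws_style style, unsigned tabstop)
{
  size_t n = strspn (line, " \t\v\f");
  if (line[n] == '\0')
    return false;               /* Whitespace-only lines are ignored.  */

  unsigned run = 0;             /* Consecutive spaces seen so far.  */
  for (size_t i = 0; i < n; i++)
    {
      char c = line[i];
      if (c == '\v' || c == '\f')
        return true;            /* Diagnosed in every style.  */
      if (style == WS_SPACES && c == '\t')
        return true;
      if (style == WS_TABS && c == ' ')
        return true;
      if (style == WS_BLANKS)
        {
          if (c == ' ' && ++run >= tabstop)
            return true;        /* N spaces should have been a tab.  */
          if (c == '\t' && run > 0)
            return true;        /* Space followed by a tab.  */
        }
    }
  return false;
}
```

With the default tabstop of 8, "\t    int x;" is accepted in GCC style, while "        int x;" (eight spaces) and " \tint x;" are diagnosed.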
Re: [PATCH] c++, v2: Implement C++23 P2718R0 - Wording for P2644R1 Fix for Range-based for Loop [PR107637]
On 9/24/24 12:53 PM, Jakub Jelinek wrote: On Mon, Sep 23, 2024 at 03:46:36PM -0400, Jason Merrill wrote: -frange-based-for-ext-temps or do you have better suggestion? I'd probably drop "based", "range-for" seems enough. Shall we allow also disabling it in C++23 or later modes, or override user choice unconditionally for C++23+ and only allow users to enable/disable it in C++11-C++20? Hmm, I think the latter. What about the __cpp_range_based_for predefined macro? Shall it be defined to the C++23 202211L value if the switch is on? While that could be done in theory for C++17 and later code, for C++11/14 __cpp_range_based_for is 200907L and doesn't include the C++17 201603L step. Or keep the macro only for C++23 and later? I think update the macro for 17 and later. Ok. Here is a new patch. @@ -44600,11 +44609,14 @@ cp_convert_omp_range_for (tree &this_pre else { range_temp = build_range_temp (init); + tree name = DECL_NAME (range_temp); DECL_NAME (range_temp) = NULL_TREE; pushdecl (range_temp); + DECL_NAME (range_temp) = name; cp_finish_decl (range_temp, init, /*is_constant_init*/false, NULL_TREE, LOOKUP_ONLYCONVERTING); + DECL_NAME (range_temp) = NULL_TREE; This messing with the name needs a rationale. What wants it to be null? I'll add comments. The first = NULL_TREE; is needed so that pushdecl doesn't register the temporary for name lookup, the = name now is so that cp_finish_decl recognizes the temporary as range based for temporary for the lifetime extension, and the last one is just to preserve previous behavior, not have it visible in debug info etc. But cp_convert_range_for doesn't ever set the name to NULL_TREE, why should the OMP variant be different? Having it visible to name lookup in the debugger seems beneficial. Having it visible to the code seems less useful, but not important to prevent. 
So, in the end it works fine even for the OpenMP case when not inside of a template; all I had to add is the renaming of the symbol at the end after pop_scope from "__for_range " to "__for_range" etc. Unfortunately it doesn't work during instantiation: we only create a single scope in that case for the whole loop nest rather than one for each loop in it, and changing that isn't easy. With the "__for_range " name in, if there are 2+ range based for loops in the OpenMP loop nest (collapsed or ordered), one then gets errors about defining it multiple times. I'll try to fix that up incrementally later; for now I just went with a new flag to the function, so that it does the DECL_NAME dances only when called from the instantiation (and I confirmed that all 3 spots are actually needed: clearing before pushdecl, resetting back before cp_finish_decl and clearing after cp_finish_decl, the last one so that pop_scope doesn't ICE on seeing the name change). Don't worry too much about fixing it up if it's complicated. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2024-09-24 Jakub Jelinek PR c++/107637 gcc/ * omp-general.cc (find_combined_omp_for, find_nested_loop_xform): Handle CLEANUP_POINT_EXPR like TRY_FINALLY_EXPR. * doc/invoke.texi (frange-for-ext-temps): Document. Add -fconcepts to the C++ option list. gcc/c-family/ * c.opt (frange-for-ext-temps): New option. * c-opts.cc (c_common_post_options): Set flag_range_for_ext_temps for C++23 or later or for C++11 or later in !flag_iso mode if the option wasn't set by user. * c-cppbuiltin.cc (c_cpp_builtins): Change __cpp_range_based_for value for flag_range_for_ext_temps from 201603L to 202212L in C++17 or later. * c-omp.cc (c_find_nested_loop_xform_r): Handle CLEANUP_POINT_EXPR like TRY_FINALLY_EXPR. gcc/cp/ * cp-tree.h: Implement C++23 P2718R0 - Wording for P2644R1 Fix for Range-based for Loop. (cp_convert_omp_range_for): Add bool tmpl_p argument. (find_range_for_decls): Declare.
* parser.cc (cp_convert_range_for): For flag_range_for_ext_temps call push_stmt_list () before cp_finish_decl for range_temp and save it temporarily to FOR_INIT_STMT. (cp_convert_omp_range_for): Add tmpl_p argument. If set, remember DECL_NAME of range_temp and for cp_finish_decl call restore it before clearing it again, if unset, don't adjust DECL_NAME of range_temp at all. (cp_parser_omp_loop_nest): For flag_range_for_ext_temps range for add CLEANUP_POINT_EXPR around sl. Call find_range_for_decls and adjust DECL_NAMEs for range fors if not processing_template_decl. Adjust cp_convert_omp_range_for caller. Remove superfluous backslash at the end of line. * decl.cc (initialize_local_var): For flag_range_for_ext_temps temporarily clear stmts_are_full_expr
Re: [PATCH 02/10] c++: Update decl_linkage for C++11
On 9/23/24 7:43 PM, Nathaniel Shead wrote: This patch intends no change in functionality apart from the mangling difference noted; more tests are in patch 4 of this series, which adds a more direct way to check the linkage that decl_linkage computes. Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? -- >8 -- Currently modules code uses a variety of ad-hoc methods to attempt to determine whether an entity has internal linkage, which leads to inconsistencies and some correctness issues as different edge cases are neglected. While investigating this I discovered 'decl_linkage', but it doesn't seem to have been updated to account for the C++11 clarification that all entities declared in an anonymous namespace are internal. I'm not convinced that even in C++98 it was intended that e.g. types in anonymous namespaces should be external, but some tests in the testsuite rely on this, so for compatibility I restricted those modifications to C++11 and later. This should have relatively minimal impact as not much seems to actually rely on decl_linkage, but it does change the mangling of symbols in anonymous namespaces slightly. Previously, we had namespace { int x; // mangled as '_ZN12_GLOBAL__N_11xE' static int y; // mangled as '_ZN12_GLOBAL__N_1L1yE' } but with this patch the x is now mangled like y (with the extra 'L'). For contrast, Clang currently mangles neither x nor y with the 'L'. Since this only affects internal-linkage entities I don't believe this should break ABI in any observable fashion. gcc/cp/ChangeLog: * name-lookup.cc (do_namespace_alias): Propagate TREE_PUBLIC for namespace aliases. * tree.cc (decl_linkage): Update rules for C++11. gcc/testsuite/ChangeLog: * g++.dg/modules/mod-sym-4.C: Update test to account for the new mangling of non-static internal-linkage variables.
Signed-off-by: Nathaniel Shead --- gcc/cp/name-lookup.cc| 1 + gcc/cp/tree.cc | 92 +++- gcc/testsuite/g++.dg/modules/mod-sym-4.C | 4 +- 3 files changed, 60 insertions(+), 37 deletions(-) diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc index c7a693e02d5..50e169eca43 100644 --- a/gcc/cp/name-lookup.cc +++ b/gcc/cp/name-lookup.cc @@ -6610,6 +6610,7 @@ do_namespace_alias (tree alias, tree name_space) DECL_NAMESPACE_ALIAS (alias) = name_space; DECL_EXTERNAL (alias) = 1; DECL_CONTEXT (alias) = FROB_CONTEXT (current_scope ()); + TREE_PUBLIC (alias) = TREE_PUBLIC (DECL_CONTEXT (alias)); set_originating_module (alias); pushdecl (alias); diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc index f43febed124..28e14295de4 100644 --- a/gcc/cp/tree.cc +++ b/gcc/cp/tree.cc @@ -5840,7 +5840,7 @@ char_type_p (tree type) || same_type_p (type, wchar_type_node)); } -/* Returns the kind of linkage associated with the indicated DECL. Th +/* Returns the kind of linkage associated with the indicated DECL. The value returned is as specified by the language standard; it is independent of implementation details regarding template instantiation, etc. For example, it is possible that a declaration @@ -5857,53 +5857,75 @@ decl_linkage (tree decl) linkage first, and then transform that into a concrete implementation. */ - /* Things that don't have names have no linkage. */ - if (!DECL_NAME (decl)) -return lk_none; + /* An explicit type alias has no linkage. */ + if (TREE_CODE (decl) == TYPE_DECL + && !DECL_IMPLICIT_TYPEDEF_P (decl) + && !DECL_SELF_REFERENCE_P (decl)) +{ + /* But this could be a typedef name for linkage purposes, in which +case we're interested in the linkage of the main decl. */ Perhaps we should move is_naming_typedef_decl out of dwarf2out.cc... Anyway, the patch is OK. Jason
Re: [PATCH] libstdc++: more #pragma diagnostic
On 9/24/24 7:51 AM, Jason Merrill wrote: Tested x86_64-pc-linux-gnu. Is this the right fix, or do we want to stop using these deprecated classes, here and in stl_function.h? Oops, adding libstdc++ CC. -- 8< -- The CI saw failures on 17_intro/headers/c++2011/parallel_mode.cc due to -Wdeprecated-declarations warnings in some parallel/ headers. libstdc++-v3/ChangeLog: * include/parallel/base.h: Suppress -Wdeprecated-declarations. * include/parallel/multiseq_selection.h: Likewise. --- libstdc++-v3/include/parallel/base.h | 4 libstdc++-v3/include/parallel/multiseq_selection.h | 6 ++ 2 files changed, 10 insertions(+) diff --git a/libstdc++-v3/include/parallel/base.h b/libstdc++-v3/include/parallel/base.h index 5bc5350e723..fcbcc1e0b99 100644 --- a/libstdc++-v3/include/parallel/base.h +++ b/libstdc++-v3/include/parallel/base.h @@ -166,6 +166,8 @@ namespace __gnu_parallel { return !_M_comp(__a, __b) && !_M_comp(__b, __a); } }; +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored "-Wdeprecated-declarations" // *nary_function /** @brief Similar to std::unary_negate, * but giving the argument types explicitly. */ @@ -297,6 +299,8 @@ namespace __gnu_parallel struct _Multiplies<_Tp, _Tp, _Tp> : public std::multiplies<_Tp> { }; +#pragma GCC diagnostic pop // -Wdeprecated-declarations + /** @brief _Iterator associated with __gnu_parallel::_PseudoSequence. * If features the usual random-access iterator functionality. * @param _Tp Sequence _M_value type. diff --git a/libstdc++-v3/include/parallel/multiseq_selection.h b/libstdc++-v3/include/parallel/multiseq_selection.h index f25895adbdd..22bd97e6432 100644 --- a/libstdc++-v3/include/parallel/multiseq_selection.h +++ b/libstdc++-v3/include/parallel/multiseq_selection.h @@ -48,6 +48,10 @@ namespace __gnu_parallel { + +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored "-Wdeprecated-declarations" // *nary_function + /** @brief Compare __a pair of types lexicographically, ascending. 
*/ template class _Lexicographic @@ -100,6 +104,8 @@ namespace __gnu_parallel } }; +#pragma GCC diagnostic pop // -Wdeprecated-declarations + /** * @brief Splits several sorted sequences at a certain global __rank, * resulting in a splitting point for each sequence. base-commit: b752eed3e3f2f27570ea89b7c2339468698472a8
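The push/ignored/pop bracketing used in this patch can be shown in isolation. The sketch below is a C analogue: old_api is a hypothetical stand-in for a deprecated entity such as std::binary_function, and the pragmas are the standard GCC diagnostic pragmas used in the patch.

```c
/* Hypothetical stand-in for a deprecated interface.  */
__attribute__ ((deprecated ("use a newer interface")))
static int old_api (int x) { return 2 * x; }

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
/* Uses of old_api between push and pop do not trigger
   -Wdeprecated-declarations.  */
static int wrapped (int x) { return old_api (x) + 1; }
#pragma GCC diagnostic pop // -Wdeprecated-declarations

/* After the pop, the diagnostic state from before the push is restored,
   so any further use of old_api here would warn again.  */
```

Using push/pop rather than a bare "ignored" restores whatever state was in effect before the push, so the suppression cannot leak into user code that includes the header.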
[PATCH v1 1/3] Match: Support form 1 for scalar signed integer SAT_SUB
From: Pan Li

This patch adds support for form 1 of the scalar signed integer SAT_SUB, as in the example below:

Form 1:
#define DEF_SAT_S_SUB_FMT_1(T, UT, MIN, MAX) \
T __attribute__((noinline))                  \
sat_s_sub_##T##_fmt_1 (T x, T y)             \
{                                            \
  T minus = (UT)x - (UT)y;                   \
  return (x ^ y) >= 0                        \
    ? minus                                  \
    : (minus ^ x) >= 0                       \
      ? minus                                \
      : x < 0 ? MIN : MAX;                   \
}

DEF_SAT_S_SUB_FMT_1(int8_t, uint8_t, INT8_MIN, INT8_MAX)

Before this patch:

__attribute__((noinline))
int8_t sat_s_sub_int8_t_fmt_1 (int8_t x, int8_t y)
{
  int8_t minus;
  unsigned char x.0_1;
  unsigned char y.1_2;
  unsigned char _3;
  signed char _4;
  signed char _5;
  int8_t _6;
  _Bool _11;
  signed char _12;
  signed char _13;
  signed char _14;
  signed char _15;

  ;; basic block 2, loop depth 0
  ;;pred: ENTRY
  x.0_1 = (unsigned char) x_7(D);
  y.1_2 = (unsigned char) y_8(D);
  _3 = x.0_1 - y.1_2;
  minus_9 = (int8_t) _3;
  _4 = x_7(D) ^ y_8(D);
  _5 = x_7(D) ^ minus_9;
  _15 = _4 & _5;
  if (_15 < 0)
    goto ; [41.00%]
  else
    goto ; [59.00%]
  ;;succ: 3
  ;;4

  ;; basic block 3, loop depth 0
  ;;pred: 2
  _11 = x_7(D) < 0;
  _12 = (signed char) _11;
  _13 = -_12;
  _14 = _13 ^ 127;
  ;;succ: 4

  ;; basic block 4, loop depth 0
  ;;pred: 2
  ;;3
  # _6 = PHI
  return _6;
  ;;succ: EXIT

}

After this patch:

__attribute__((noinline))
int8_t sat_s_sub_int8_t_fmt_1 (int8_t x, int8_t y)
{
  int8_t _6;

  ;; basic block 2, loop depth 0
  ;;pred: ENTRY
  _6 = .SAT_SUB (x_7(D), y_8(D)); [tail call]
  return _6;
  ;;succ: EXIT

}

The following test suites pass with this patch:
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.

gcc/ChangeLog:

* match.pd: Add case 1 matching pattern for signed SAT_SUB.
* tree-ssa-math-opts.cc (gimple_signed_integer_sat_sub): Add new decl for generated SAT_SUB matching func. (match_unsigned_saturation_sub): Rename from... (match_saturation_sub): ...Rename to and add signed SAT_SUB matching. (math_opts_dom_walker::after_dom_children): Leverage the named match func for both the unsigned and signed SAT_SUB. Signed-off-by: Pan Li --- gcc/match.pd | 14 ++ gcc/tree-ssa-math-opts.cc | 8 +--- 2 files changed, 19 insertions(+), 3 deletions(-) diff --git a/gcc/match.pd b/gcc/match.pd index 940292d0d49..63f7f3142c4 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3358,6 +3358,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) } (if (wi::eq_p (sum, wi::uhwi (0, precision))) +/* Signed saturation sub, case 1: + T minus = (T)((UT)X - (UT)Y); + SAT_S_SUB = (X ^ Y) & (X ^ minus) < 0 ? (-(T)(X < 0) ^ MAX) : minus; + + The T and UT are type pair like T=int8_t, UT=uint8_t. */ +(match (signed_integer_sat_sub @0 @1) + (cond^ (lt (bit_and:c (bit_xor:c @0 @1) + (bit_xor @0 (nop_convert@2 (minus (nop_convert @0) +(nop_convert @1) + integer_zerop) + (bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value) + @2) + (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type + /* Unsigned saturation truncate, case 1, sizeof (WT) > sizeof (NT). SAT_U_TRUNC = (NT)x | (NT)(-(X > (WT)(NT)(-1))). 
*/ (match (unsigned_integer_sat_trunc @0) diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc index d61668aacfc..f04b17101db 100644 --- a/gcc/tree-ssa-math-opts.cc +++ b/gcc/tree-ssa-math-opts.cc @@ -4024,6 +4024,7 @@ extern bool gimple_unsigned_integer_sat_sub (tree, tree*, tree (*)(tree)); extern bool gimple_unsigned_integer_sat_trunc (tree, tree*, tree (*)(tree)); extern bool gimple_signed_integer_sat_add (tree, tree*, tree (*)(tree)); +extern bool gimple_signed_integer_sat_sub (tree, tree*, tree (*)(tree)); static void build_saturation_binary_arith_call (gimple_stmt_iterator *gsi, internal_fn fn, @@ -4162,7 +4163,7 @@ match_unsigned_saturation_sub (gimple_stmt_iterator *gsi, gassign *stmt) * [local count: 1073741824]: * _1 = .SAT_SUB (x_2(D), y_3(D)); */ static void -match_unsigned_sat
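To see that form 1 and the canonical shape in the match.pd comment compute the same saturating subtraction, both can be compared exhaustively for int8_t against a reference that subtracts in a wider type and clamps. This is a minimal sketch; the function names are hypothetical, and the uint8_t-to-int8_t conversions rely on GCC defining that implementation-defined conversion as modulo wrapping.

```c
#include <stdint.h>

/* Form 1 from the patch, instantiated for T = int8_t, UT = uint8_t.  */
static int8_t
sat_s_sub_int8_fmt_1 (int8_t x, int8_t y)
{
  int8_t minus = (int8_t) ((uint8_t) x - (uint8_t) y); /* Wrapping sub.  */
  if ((x ^ y) >= 0)          /* Same sign: x - y cannot overflow.  */
    return minus;
  if ((minus ^ x) >= 0)      /* Result kept x's sign: no overflow.  */
    return minus;
  return x < 0 ? INT8_MIN : INT8_MAX; /* Saturate toward x's sign.  */
}

/* Shape from the match.pd comment:
   (X ^ Y) & (X ^ minus) < 0 ? (-(T)(X < 0) ^ MAX) : minus.  */
static int8_t
sat_s_sub_int8_canon (int8_t x, int8_t y)
{
  int8_t minus = (int8_t) ((uint8_t) x - (uint8_t) y);
  return ((x ^ y) & (x ^ minus)) < 0
         ? (int8_t) (-(int8_t) (x < 0) ^ INT8_MAX)
         : minus;
}

/* Reference: subtract in int, then clamp to the int8_t range.  */
static int8_t
sat_s_sub_int8_ref (int8_t x, int8_t y)
{
  int d = (int) x - (int) y;
  if (d > INT8_MAX)
    return INT8_MAX;
  if (d < INT8_MIN)
    return INT8_MIN;
  return (int8_t) d;
}
```

In the canonical form, -(T)(X < 0) is either 0 or -1, so XORing it with MAX selects MAX for non-negative X and MIN for negative X without a branch.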
[PATCH v1 2/3] RISC-V: Implement scalar SAT_SUB for signed integer
From: Pan Li
This patch would like to implement form 1 of sssub, i.e.:

Form 1:
#define DEF_SAT_S_SUB_FMT_1(T, UT, MIN, MAX) \
T __attribute__((noinline))                  \
sat_s_sub_##T##_fmt_1 (T x, T y)             \
{                                            \
  T minus = (UT)x - (UT)y;                   \
  return (x ^ y) >= 0                        \
    ? minus                                  \
    : (minus ^ x) >= 0                       \
      ? minus                                \
      : x < 0 ? MIN : MAX;                   \
}

DEF_SAT_S_SUB_FMT_1(int8_t, uint8_t, INT8_MIN, INT8_MAX)

Before this patch:
sat_s_sub_int8_t_fmt_1:
  subw  a5,a0,a1
  slliw a5,a5,24
  sraiw a5,a5,24
  xor   a1,a0,a1
  xor   a4,a0,a5
  and   a1,a1,a4
  blt   a1,zero,.L4
  mv    a0,a5
  ret
.L4:
  srai  a0,a0,63
  xori  a5,a0,127
  mv    a0,a5
  ret

After this patch:
sat_s_sub_int8_t_fmt_1:
  sub   a4,a0,a1
  xor   a5,a0,a4
  xor   a1,a0,a1
  and   a5,a5,a1
  srli  a5,a5,7
  andi  a5,a5,1
  srai  a0,a0,63
  xori  a3,a0,127
  neg   a0,a5
  addi  a5,a5,-1
  and   a3,a3,a0
  and   a0,a4,a5
  or    a0,a0,a3
  slliw a0,a0,24
  sraiw a0,a0,24
  ret

The following test suites pass with this patch:
* The rv64gcv full regression test.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_expand_sssub): Add new func decl for expanding signed SAT_SUB.
* config/riscv/riscv.cc (riscv_expand_sssub): Add new func impl for expanding signed SAT_SUB.
* config/riscv/riscv.md (sssub3): Add new pattern sssub for scalar signed integer.
Signed-off-by: Pan Li --- gcc/config/riscv/riscv-protos.h | 1 + gcc/config/riscv/riscv.cc | 69 + gcc/config/riscv/riscv.md | 11 ++ 3 files changed, 81 insertions(+) diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index 07a4d42e3a5..3d8775e582d 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -136,6 +136,7 @@ extern void riscv_legitimize_poly_move (machine_mode, rtx, rtx, rtx); extern void riscv_expand_usadd (rtx, rtx, rtx); extern void riscv_expand_ssadd (rtx, rtx, rtx); extern void riscv_expand_ussub (rtx, rtx, rtx); +extern void riscv_expand_sssub (rtx, rtx, rtx); extern void riscv_expand_ustrunc (rtx, rtx); #ifdef RTX_CODE diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 7be3939a7f9..8708a7b42c6 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -12329,6 +12329,75 @@ riscv_expand_ussub (rtx dest, rtx x, rtx y) emit_move_insn (dest, gen_lowpart (mode, xmode_dest)); } +/* Implements the signed saturation sub standard name sssub for int mode. + + z = SAT_SUB(x, y). + => + 1. minus = x - y + 2. xor_0 = x ^ y + 3. xor_1 = x ^ minus + 4. lt_0 = xor_1 < 0 + 5. lt_1 = xor_0 < 0 + 6. and = lt_0 & lt_1 + 7. lt = x < 0 + 8. neg = -lt + 9. max = INT_MAX + 10. max = max ^ neg + 11. neg = -and + 12. max = max & neg + 13. and = and - 1 + 14. z = minus & and + 15.
z = z | max */ + +void +riscv_expand_sssub (rtx dest, rtx x, rtx y) +{ + machine_mode mode = GET_MODE (dest); + unsigned bitsize = GET_MODE_BITSIZE (mode).to_constant (); + rtx shift_bits = GEN_INT (bitsize - 1); + rtx xmode_x = gen_lowpart (Xmode, x); + rtx xmode_y = gen_lowpart (Xmode, y); + rtx xmode_minus = gen_reg_rtx (Xmode); + rtx xmode_xor_0 = gen_reg_rtx (Xmode); + rtx xmode_xor_1 = gen_reg_rtx (Xmode); + rtx xmode_lt_0 = gen_reg_rtx (Xmode); + rtx xmode_lt_1 = gen_reg_rtx (Xmode); + rtx xmode_and = gen_reg_rtx (Xmode); + rtx xmode_lt = gen_reg_rtx (Xmode); + rtx xmode_neg = gen_reg_rtx (Xmode); + rtx xmode_max = gen_reg_rtx (Xmode); + rtx xmode_dest = gen_reg_rtx (Xmode); + + /* Step-1: minus = x - y, xor_0 = x ^ y, xor_1 = x ^ minus. */ + riscv_emit_binary (MINUS, xmode_minus, xmode_x, xmode_y); + riscv_emit_binary (XOR, xmode_xor_0, xmode_x, xmode_y); + riscv_emit_binary (XOR, xmode_xor_1, xmode_x, xmode_minus); + + /* Step-2: and = xor_0 < 0 & xor_1 < 0. */ + riscv_emit_binary (LSHIFTRT, xmode_lt_0, xmode_xor_0, shift_bits); + riscv_emit_binary (LSHIFTRT, xmode_lt_1, xmode_xor_1, shift_bits); + riscv_emit_binary (AND, xmode_and, xmode_lt_0, xmode_lt_1); + riscv_emit_binary (AND, xmode_and, xmode_and, CONST1_RTX (Xmode)); + + /* Step-3: lt = x < 0, neg = -lt. */ + riscv_emit_binary (LT, xmode_lt, xmode_x, CONST0_RTX (Xmode)); + riscv_emit_unary (NEG, xmode_neg, xmode_lt); + + /* Step-4: max = 0x7f..., max = max ^ neg, neg = -and, max = max & neg. */ + riscv_emit_move (xmode_max
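To see why the 15-step sequence in the riscv_expand_sssub comment yields a saturating subtraction, one can emulate it with a 64-bit unsigned integer standing in for an Xmode register that holds sign-extended QImode operands. This is a hedged sketch, not the backend code. The function names are hypothetical, and steps 4 to 6 follow the implementation (logical shift right by bitsize - 1, then AND with 1) rather than a literal "< 0" test on the full register.

```cpp
#include <cassert>
#include <cstdint>

// Reference: form 1 of signed saturating subtraction for int8_t.
static int8_t sat_s_sub_ref (int8_t x, int8_t y)
{
  int8_t minus = (int8_t) (uint8_t) ((uint8_t) x - (uint8_t) y);
  return (x ^ y) >= 0 ? minus
         : (minus ^ x) >= 0 ? minus
         : x < 0 ? INT8_MIN : INT8_MAX;
}

// Emulation of the expansion for QImode on a 64-bit "register".
static int8_t sat_s_sub_xmode (int8_t x8, int8_t y8)
{
  const unsigned shift_bits = 8 - 1;            // bitsize - 1
  uint64_t x = (uint64_t) (int64_t) x8;         // sign-extended operands
  uint64_t y = (uint64_t) (int64_t) y8;

  uint64_t minus = x - y;                       // 1
  uint64_t xor_0 = x ^ y;                       // 2
  uint64_t xor_1 = x ^ minus;                   // 3
  uint64_t lt_0 = xor_0 >> shift_bits;          // 4-5: narrow sign bits
  uint64_t lt_1 = xor_1 >> shift_bits;          //      moved to bit 0
  uint64_t and_ = lt_0 & lt_1 & 1;              // 6: 1 iff overflow
  uint64_t lt = x8 < 0;                         // 7
  uint64_t neg = -lt;                           // 8: all ones iff x < 0
  uint64_t max = INT8_MAX;                      // 9
  max ^= neg;                                   // 10: MAX, or ~MAX (MIN in low byte)
  neg = -and_;                                  // 11: all ones iff overflow
  max &= neg;                                   // 12: keep only on overflow
  and_ -= 1;                                    // 13: all ones iff no overflow
  uint64_t z = minus & and_;                    // 14
  z |= max;                                     // 15
  return (int8_t) (uint8_t) z;                  // low part, like gen_lowpart
}

static int count_mismatches ()
{
  int bad = 0;
  for (int x = INT8_MIN; x <= INT8_MAX; ++x)
    for (int y = INT8_MIN; y <= INT8_MAX; ++y)
      if (sat_s_sub_xmode ((int8_t) x, (int8_t) y)
          != sat_s_sub_ref ((int8_t) x, (int8_t) y))
        ++bad;
  return bad;
}

static void check_examples ()
{
  assert (sat_s_sub_xmode (127, -1) == INT8_MAX);
  assert (sat_s_sub_xmode (-128, 1) == INT8_MIN);
  assert (sat_s_sub_xmode (2, 4) == -2);
}
```

The design point is that no branch is needed: "and" becomes an all-ones or all-zeros mask that selects either the wrapped difference or the saturation constant.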
[PATCH v1 3/3] RISC-V: Add testcases for form 1 of scalar signed SAT_SUB
From: Pan Li

Form 1:
#define DEF_SAT_S_SUB_FMT_1(T, UT, MIN, MAX) \
T __attribute__((noinline))                  \
sat_s_sub_##T##_fmt_1 (T x, T y)             \
{                                            \
  T minus = (UT)x - (UT)y;                   \
  return (x ^ y) >= 0                        \
    ? minus                                  \
    : (minus ^ x) >= 0                       \
      ? minus                                \
      : x < 0 ? MIN : MAX;                   \
}

DEF_SAT_S_SUB_FMT_1(int8_t, uint8_t, INT8_MIN, INT8_MAX)

The following tests pass with this patch:
* The rv64gcv full regression test.

It is a test-only patch and obvious up to a point; I will commit it directly if there are no comments in the next 48 hours.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_arith_data.h: Add test data for SAT_SUB.
* gcc.target/riscv/sat_s_sub-1-i16.c: New test.
* gcc.target/riscv/sat_s_sub-1-i32.c: New test.
* gcc.target/riscv/sat_s_sub-1-i64.c: New test.
* gcc.target/riscv/sat_s_sub-1-i8.c: New test.
* gcc.target/riscv/sat_s_sub-run-1-i16.c: New test.
* gcc.target/riscv/sat_s_sub-run-1-i32.c: New test.
* gcc.target/riscv/sat_s_sub-run-1-i64.c: New test.
* gcc.target/riscv/sat_s_sub-run-1-i8.c: New test.
Signed-off-by: Pan Li --- gcc/testsuite/gcc.target/riscv/sat_arith.h| 17 + .../gcc.target/riscv/sat_arith_data.h | 73 +++ .../gcc.target/riscv/sat_s_sub-1-i16.c| 30 .../gcc.target/riscv/sat_s_sub-1-i32.c| 28 +++ .../gcc.target/riscv/sat_s_sub-1-i64.c| 27 +++ .../gcc.target/riscv/sat_s_sub-1-i8.c | 28 +++ .../gcc.target/riscv/sat_s_sub-run-1-i16.c| 16 .../gcc.target/riscv/sat_s_sub-run-1-i32.c| 16 .../gcc.target/riscv/sat_s_sub-run-1-i64.c| 16 .../gcc.target/riscv/sat_s_sub-run-1-i8.c | 16 10 files changed, 267 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-1-i16.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-1-i32.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-1-i64.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-1-i8.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-run-1-i16.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-run-1-i32.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-run-1-i64.c create mode 100644 gcc/testsuite/gcc.target/riscv/sat_s_sub-run-1-i8.c diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h index a2617b6db70..587f3f8348c 100644 --- a/gcc/testsuite/gcc.target/riscv/sat_arith.h +++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h @@ -353,6 +353,23 @@ sat_u_sub_imm_type_check##_##INDEX##_##T##_fmt_4 (T x)\ return x > IMM ? x - IMM : 0; \ } +#define DEF_SAT_S_SUB_FMT_1(T, UT, MIN, MAX) \ +T __attribute__((noinline)) \ +sat_s_sub_##T##_fmt_1 (T x, T y) \ +{\ + T minus = (UT)x - (UT)y; \ + return (x ^ y) >= 0\ +? minus \ +: (minus ^ x) >= 0 \ + ? minus\ + : x < 0 ? 
MIN : MAX; \ +} +#define DEF_SAT_S_SUB_FMT_1_WRAP(T, UT, MIN, MAX) \ + DEF_SAT_S_SUB_FMT_1(T, UT, MIN, MAX) + +#define RUN_SAT_S_SUB_FMT_1(T, x, y) sat_s_sub_##T##_fmt_1(x, y) +#define RUN_SAT_S_SUB_FMT_1_WRAP(T, x, y) RUN_SAT_S_SUB_FMT_1(T, x, y) + /**/ /* Saturation Truncate (unsigned and signed) */ /**/ diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith_data.h b/gcc/testsuite/gcc.target/riscv/sat_arith_data.h index 75037c5d806..39a1e17cd3d 100644 --- a/gcc/testsuite/gcc.target/riscv/sat_arith_data.h +++ b/gcc/testsuite/gcc.target/riscv/sat_arith_data.h @@ -37,6 +37,11 @@ TEST_BINARY_STRUCT (int16_t, ssadd) TEST_BINARY_STRUCT (int32_t, ssadd) TEST_BINARY_STRUCT (int64_t, ssadd) +TEST_BINARY_STRUCT (int8_t, sssub) +TEST_BINARY_STRUCT (int16_t, sssub) +TEST_BINARY_STRUCT (int32_t, sssub) +TEST_BINARY_STRUCT (int64_t, sssub) + TEST_UNARY_STRUCT_DECL(uint8_t, uint16_t) \ TEST_UNARY_DATA(uint8_t, uint16_t)[] = { @@ -189,4 +194,72 @@ TEST_BINARY_STRUCT_DECL(int64_t, ssadd) TEST_BINARY_DATA(int64_t, ssadd)[] = { -9223372036854775803ll, 9223372036854775805ll, 2}, }; +TEST_BINARY_STRUCT_DECL(int8_t, sssub) TEST_BINARY_DATA(int8_t, sssub)[] = +{ + { 0,0,0}, + { 2,4, -2}, + { 126, -1, 127}, + { 127, -1, 127}, + { 127, -127, 127}, + { -7, -4, -3}, +
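The run tests pair DEF_SAT_S_SUB_FMT_1 with rows from sat_arith_data.h. The sketch below shows that idea in self-contained form, reusing the macro from the sat_arith.h hunk and the int8_t rows visible above. The row struct and driver are hypothetical and do not reproduce the real TEST_BINARY_STRUCT machinery. Like the testsuite, it relies on GCC wrapping when the int difference is narrowed to T.

```cpp
#include <cassert>
#include <cstdint>

// DEF_SAT_S_SUB_FMT_1 as added to sat_arith.h by the patch.
#define DEF_SAT_S_SUB_FMT_1(T, UT, MIN, MAX) \
T __attribute__((noinline))                  \
sat_s_sub_##T##_fmt_1 (T x, T y)             \
{                                            \
  T minus = (UT)x - (UT)y;                   \
  return (x ^ y) >= 0                        \
    ? minus                                  \
    : (minus ^ x) >= 0                       \
      ? minus                                \
      : x < 0 ? MIN : MAX;                   \
}

DEF_SAT_S_SUB_FMT_1 (int8_t, uint8_t, INT8_MIN, INT8_MAX)

// Hypothetical row type: x, y and the expected saturated result.
struct sssub_row { int8_t x, y, expect; };

// The int8_t sssub rows visible in the sat_arith_data.h hunk.
static const sssub_row sssub_i8_rows[] = {
  {   0,    0,   0 },
  {   2,    4,  -2 },
  { 126,   -1, 127 },
  { 127,   -1, 127 },
  { 127, -127, 127 },
  {  -7,   -4,  -3 },
};

// Returns the number of rows whose result differs from the expectation.
static int run_sssub_i8 ()
{
  int failures = 0;
  for (const sssub_row &r : sssub_i8_rows)
    if (sat_s_sub_int8_t_fmt_1 (r.x, r.y) != r.expect)
      ++failures;
  return failures;
}

static void check_rows ()
{
  assert (run_sssub_i8 () == 0);
}
```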
Re: [r15-3834 Regression] FAIL: c-c++-common/gomp/declare-variant-duplicates.c (test for excess errors) on Linux/x86_64
On 9/24/24 14:08, haochen.jiang wrote: On Linux/x86_64, 96246bff0bcd9e5cdec9e6cf811ee3db4997f6d4 is the first bad commit commit 96246bff0bcd9e5cdec9e6cf811ee3db4997f6d4 Author: Sandra Loosemore Date: Fri Sep 6 20:58:13 2024 + OpenMP: Check additional restrictions on context selector properties caused FAIL: c-c++-common/gomp/declare-variant-duplicates.c (test for errors, line 11) FAIL: c-c++-common/gomp/declare-variant-duplicates.c (test for excess errors) with GCC configured with ../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-3834/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap To reproduce: $ cd {build_dir}/gcc && make check RUNTESTFLAGS="gomp.exp=c-c++-common/gomp/declare-variant-duplicates.c --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="gomp.exp=c-c++-common/gomp/declare-variant-duplicates.c --target_board='unix{-m32\ -march=cascadelake}'" It turns out the problem here is that with -m32, "i386" is defined as a built-in preprocessor macro :-O so it cannot be used as an identifier. I've pushed the attached patch to adjust the testcase not to do that. -Sandra From 6935bddd8f90dde6009a1b8dea9745788ceeefb1 Mon Sep 17 00:00:00 2001 From: Sandra Loosemore Date: Wed, 25 Sep 2024 02:59:53 + Subject: [PATCH] OpenMP: Fix testsuite failure on x86 with -m32 The testcase declare-variant-duplicates.c added in commit 96246bff0bcd9e5cdec9e6cf811ee3db4997f6d4 failed on 32-bit x86 because on that target "i386" is defined as a preprocessor macro and cannot be used as an identifier. Fixed by rewriting that test not to do that. gcc/testsuite/ChangeLog * c-c++-common/gomp/declare-variant-duplicates.c: Avoid using "i386" as an identifier.
--- gcc/testsuite/c-c++-common/gomp/declare-variant-duplicates.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/c-c++-common/gomp/declare-variant-duplicates.c b/gcc/testsuite/c-c++-common/gomp/declare-variant-duplicates.c index 47d34fc52e2..9f319c72449 100644 --- a/gcc/testsuite/c-c++-common/gomp/declare-variant-duplicates.c +++ b/gcc/testsuite/c-c++-common/gomp/declare-variant-duplicates.c @@ -8,6 +8,6 @@ extern int f4 (int); #pragma omp declare variant (f1) match (device={kind(cpu,gpu,"cpu")}) /* { dg-error "trait-property .cpu. specified more than once" } */ #pragma omp declare variant (f2) match (device={isa(sse4,"avx",avx)}) /* { dg-error "trait-property .avx. specified more than once" } */ -#pragma omp declare variant (f3) match (device={arch(x86_64,i386,aarch64,"i386")}) /* { dg-error "trait-property .i386. specified more than once" } */ +#pragma omp declare variant (f3) match (device={arch(x86_64,"i386",aarch64,"x86_64")}) /* { dg-error "trait-property .x86_64. specified more than once" } */ #pragma omp declare variant (f4) match (implementation={vendor(llvm,gnu,"arm",gnu)}) /* { dg-error "trait-property .gnu. specified more than once" } */ int f (int); -- 2.25.1
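Why the unquoted i386 broke: tokens after "#pragma omp" are subject to macro replacement, and GCC predefines the non-reserved macro i386 (as 1) for 32-bit x86 in GNU language modes, so the trait property no longer read as the identifier i386. The sketch below, an illustration rather than anything from the testsuite, shows how the spelling of that token changes after preprocessing. The helper names are hypothetical; the check only relies on whether the token was replaced, not on its replacement value.

```cpp
#include <cassert>
#include <cstring>

// Two-level stringification: the argument is macro-expanded before # applies.
#define STR(x) #x
#define XSTR(x) STR(x)

#ifdef i386
static const bool kI386IsMacro = true;   // e.g. GCC with -m32 in GNU modes
#else
static const bool kI386IsMacro = false;
#endif

// The spelling the token i386 has after preprocessing.
static const char *spelling_of_i386 ()
{
  return XSTR (i386);
}

static void check_i386_spelling ()
{
  // If the token was replaced by a macro, it no longer reads "i386".
  assert ((std::strcmp (spelling_of_i386 (), "i386") != 0) == kI386IsMacro);
}
```

Quoting the property ("i386") sidesteps the problem, since string literals are never macro-expanded, which is why the fixed testcase keeps the quoted form.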
Re: [PATCH] c++: compile time evaluation of prvalues [PR116416]
On Fri, Sep 20, 2024 at 06:39:52PM -0400, Jason Merrill wrote: > On 9/20/24 12:18 AM, Marek Polacek wrote: > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? > > > > -- >8 -- > > This PR reports a missed optimization. When we have: > > > >Str str{"Test"}; > >callback(str); > > > > as in the test, we're able to evaluate the Str::Str() call at compile > > time. But when we have: > > > >callback(Str{"Test"}); > > > > we are not. With this patch (in fact, it's Patrick's patch with a little > > tweak), we turn > > > >callback (TARGET_EXPR > 5 > > __ct_comp > > D.2890 > > (struct Str *) <<< Unknown tree: void_cst >>> > > (const char *) "Test" ) > > > > into > > > >callback (TARGET_EXPR ) > > > > I explored the idea of calling maybe_constant_value for the whole > > TARGET_EXPR in cp_fold. That has three problems: > > - we can't always elide a TARGET_EXPR, so we'd have to make sure the > >result is also a TARGET_EXPR; > > I'd think that the result should always be a TARGET_EXPR for a class, and > that's the case we want to fold; a TARGET_EXPR for a scalar is always the > initialize-temp-and-use-it pattern you mention below. Checking CLASS_TYPE_P would solve some of the problems, yes. But... > > - the resulting TARGET_EXPR must have the same flags, otherwise Bad > >Things happen; > > I guess maybe_constant_value should be fixed to preserve flags regardless of > this change. Yeah, cxx_eval_outermost_constant_expr already preserves TARGET_EXPR flags, but here we go into the break_out_target_exprs block in maybe_constant_value and that doesn't necessarily preserve them. > > - getting a new slot is also problematic. I've seen a test where we > >had "TARGET_EXPR, D.2680", and folding the whole TARGET_EXPR > >would get us "TARGET_EXPR", but since we don't see the outer > >D.2680, we can't replace it with D.2681, and things break. > > Hmm, yeah. Maybe only if TARGET_EXPR_IMPLICIT_P? ...unfortunately that doesn't always help. 
I've reduced an example into: struct optional { constexpr optional(int) {} }; optional foo() { return 2; } where check_return_expr creates a COMPOUND_EXPR: retval = build2 (COMPOUND_EXPR, TREE_TYPE (retval), retval, TREE_OPERAND (retval, 0)); where the TARGET_EXPR comes from build_cplus_new so it is _IMPLICIT_P. > > With this patch, two tree-ssa tests regressed: pr78687.C and pr90883.C. > > > > FAIL: g++.dg/tree-ssa/pr90883.C scan-tree-dump dse1 "Deleted redundant > > store: .*.a = {}" > > is easy. Previously, we would call C::C, so .gimple has: > > > >D.2590 = {}; > >C::C (&D.2590); > >D.2597 = D.2590; > >return D.2597; > > > > Then .einline inlines the C::C call: > > > >D.2590 = {}; > >D.2590.a = {}; // #1 > >D.2590.b = 0; // #2 > >D.2597 = D.2590; > >D.2590 ={v} {CLOBBER(eos)}; > >return D.2597; > > > > then #2 is removed in .fre1, and #1 is removed in .dse1. So the test > > passes. But with the patch, .gimple won't have that C::C call, so the > > IL is of course going to look different. > > Maybe -fno-inline instead of the --param? Then that C::C call isn't inlined and the test fails :/. > > Unfortunately, pr78687.C is much more complicated and I can't explain > > precisely what happens there. But it seems like a good idea to have > > a way to avoid this optimization. So I've added the "noinline" check. > > Hmm, I'm surprised make_object_1 would be affected, since the ref_proxy > constructors are not constexpr. And I wouldn't expect the optimization to > affect the value-initialization option_2(). In pr78687.C we do this new optimization only once for "constexpr eggs::variants::variant::variant(U&&) noexcept (std::is_nothrow_constructible::value)". > > PR c++/116416 > > > > gcc/cp/ChangeLog: > > > > * cp-gimplify.cc (cp_fold_r) : Try to fold > > TARGET_EXPR_INITIAL and replace it with the folded result if > > it's TREE_CONSTANT. > > > > gcc/testsuite/ChangeLog: > > > > * g++.dg/analyzer/pr97116.C: Adjust dg-message. 
> > * g++.dg/cpp2a/consteval-prop2.C: Adjust dg-bogus. > > * g++.dg/tree-ssa/pr78687.C: Add __attribute__((noinline)). > > * g++.dg/tree-ssa/pr90883.C: Likewise. > > * g++.dg/cpp1y/constexpr-prvalue1.C: New test. > > > > Co-authored-by: Patrick Palka > > --- > > gcc/cp/cp-gimplify.cc | 14 + > > gcc/testsuite/g++.dg/analyzer/pr97116.C | 2 +- > > .../g++.dg/cpp1y/constexpr-prvalue1.C | 29 +++ > > gcc/testsuite/g++.dg/cpp2a/consteval-prop2.C | 2 +- > > gcc/testsuite/g++.dg/tree-ssa/pr78687.C | 5 +++- > > gcc/testsuite/g++.dg/tree-ssa/pr90883.C | 1 + > > 6 files changed, 50 insertions(+), 3 deletions(-) > > create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-prvalue1.C > > > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify
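A minimal sketch of the two call shapes discussed in the PR, assuming C++14 or later. Str and callback here are hypothetical stand-ins; the real test, g++.dg/cpp1y/constexpr-prvalue1.C, is not reproduced. Both forms have the same observable result. Per the PR description, the patch only changes whether the prvalue's constructor call is folded to a constant at compile time, which a runtime check cannot observe, so the static_assert merely confirms that the constructor is usable in a constant expression.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical literal type with a constexpr constructor.
struct Str
{
  const char *s;
  std::size_t len;
  constexpr Str (const char *p) : s (p), len (0)
  {
    while (p[len] != '\0')
      ++len;
  }
};

// Taking a const reference means a prvalue argument is materialized as a
// temporary (a TARGET_EXPR inside the C++ front end).
static std::size_t callback (const Str &str)
{
  return str.len;
}

static std::size_t with_named_object ()
{
  Str str{"Test"};                 // per the PR, already evaluated at compile time
  return callback (str);
}

static std::size_t with_prvalue ()
{
  return callback (Str{"Test"});   // the case PR 116416 is about
}

// The constructor is usable in a constant expression, which is what makes
// folding the temporary's initializer possible in the first place.
static_assert (Str{"Test"}.len == 4, "Str(const char *) is constexpr");

static void check_both_forms ()
{
  assert (with_named_object () == with_prvalue ());
}
```

Comparing -fdump-tree-gimple output for the two functions is one way to see whether the temporary's constructor call remains.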
Re: [PATCH] c++: compile time evaluation of prvalues [PR116416]
On Sat, Sep 21, 2024 at 05:00:51PM +0200, Jakub Jelinek wrote: > On Fri, Sep 20, 2024 at 07:03:45PM -0400, Jason Merrill wrote: > > > The CALL_EXPR case in cp_fold uses !flag_no_inline instead, that makes > > > more > > > sense to me. > > > Because checking "noinline" attribute (which means don't inline this > > > function) on current_function_decl rather than on functions being > > > "inlined" > > > (the constexpr functions being non-manifestly constant evaluated) is just > > > weird. > > > If we really wanted, we could honor "noinline" during constant evaluation > > > on the CALL_EXPR/AGGR_INIT_EXPR fndecls, but dunno if whenever doing the > > > non-manifestly constant evaluated cases or just in special cases like > > > these > > > two (CALL_EXPR in cp_fold, this in cp_fold_r). > > > > Checking noinline in non-manifestly constant-evaluated cases might make > > sense. > > Though, if somebody marks some function explicitly constexpr they should be > prepared to get some constexpr evaluation of it, doesn't have to be strictly > standard required one. Yeah, I would agree with that. > And for -fimplicit-constexpr we already have "noinline" attribute check, so > maybe it is ok as is. Yeah. I dropped the "noinline" attribute check though because I no longer see any need for it. Marek
[Patch] OpenMP: Update OMP_REQUIRES_TARGET_USED for declare_target + interop
OpenMP mandates that when certain clauses are used with 'omp requires', the same requires clause must appear in all compilation units. Those clauses influence the offloading behavior (+ potentially codegen); hence, the requires directives must match for those clauses when device code is involved. That's the case for device functions (in particular 'declare target') and all OpenMP directives that take a 'device' clause. Previously, OpenMP was rather vague, but e.g. TR13 is fortunately more explicit. Thus, this patch adds it for 'declare target', and it adds it ("device" clause!) for 'interop' (but only for Fortran, as C/C++ still does not support parsing the 'interop' directive). Any comments before I commit it? Tobias PS: In TR13, page 321, lines 14–16: https://www.openmp.org/wp-content/uploads/openmp-TR13.pdf OpenMP: Update OMP_REQUIRES_TARGET_USED for declare_target + interop Older versions of the OpenMP specification were not clear about what counted as device usage. Newer ones (like TR13) are rather clear. Hence, this commit adds "target used" also when 'declare target' or 'interop' are encountered. (The latter only for Fortran, as C/C++ parsing support is still missing.) TR13 also lists 'dispatch' as a construct and 'device_safesync' as affected by device use, but neither is supported in GCC yet. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_declare_target): Set target-used bit in omp_requires_mask. gcc/cp/ChangeLog: * parser.cc (cp_parser_omp_declare_target): Set target-used bit in omp_requires_mask. gcc/fortran/ChangeLog: * parse.cc (decode_omp_directive): Set target-used bit of omp_requires_mask when encountering the declare_target or interop directive.
gcc/c/c-parser.cc| 3 +++ gcc/cp/parser.cc | 3 +++ gcc/fortran/parse.cc | 8 ++-- 3 files changed, 12 insertions(+), 2 deletions(-) diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc index 6a46577f511..a681438cbbe 100644 --- a/gcc/c/c-parser.cc +++ b/gcc/c/c-parser.cc @@ -25492,6 +25492,9 @@ c_parser_omp_declare_target (c_parser *parser) int device_type = 0; bool indirect = false; bool only_device_type_or_indirect = true; + if (flag_openmp) +omp_requires_mask + = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED); if (c_parser_next_token_is (parser, CPP_NAME) || (c_parser_next_token_is (parser, CPP_COMMA) && c_parser_peek_2nd_token (parser)->type == CPP_NAME)) diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc index 35c266659e4..3b3ab0f1923 100644 --- a/gcc/cp/parser.cc +++ b/gcc/cp/parser.cc @@ -49524,6 +49524,9 @@ cp_parser_omp_declare_target (cp_parser *parser, cp_token *pragma_tok) int device_type = 0; bool indirect = false; bool only_device_type_or_indirect = true; + if (flag_openmp) +omp_requires_mask + = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED); if (cp_lexer_next_token_is (parser->lexer, CPP_NAME) || (cp_lexer_next_token_is (parser->lexer, CPP_COMMA) && cp_lexer_nth_token_is (parser->lexer, 2, CPP_NAME))) diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc index e749bbdc6b5..9e06dbf0911 100644 --- a/gcc/fortran/parse.cc +++ b/gcc/fortran/parse.cc @@ -1345,8 +1345,12 @@ decode_omp_directive (void) switch (ret) { -/* Set omp_target_seen; exclude ST_OMP_DECLARE_TARGET. - FIXME: Get clarification, cf. OpenMP Spec Issue #3240. */ +/* For the constraints on clauses with the global requirement property, + we set omp_target_seen. This includes all directives that take the + DEVICE clause, (BEGIN) DECLARE_TARGET and procedures run on the device + (which effectively is implied by the former). */ +case ST_OMP_DECLARE_TARGET: +case ST_OMP_INTEROP: case ST_OMP_TARGET: case ST_OMP_TARGET_DATA: case ST_OMP_TARGET_ENTER_DATA:
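The omp_requires_mask update in the C and C++ parsers needs the explicit cast because GCC's front ends are compiled as C++, where a bitwise OR involving an unscoped enumeration yields int, and int does not convert implicitly back to the enumeration type. A hedged sketch with a hypothetical enumeration (the real enum omp_requires enumerators and values are not reproduced, and the flag_openmp guard is omitted):

```cpp
#include <cassert>

// Hypothetical stand-in for GCC's enum omp_requires; names and values are
// invented for illustration.
enum omp_requires_sketch
{
  REQ_NONE = 0,
  REQ_UNIFIED_SHARED_MEMORY = 1 << 0,
  REQ_TARGET_USED = 1 << 1
};

static omp_requires_sketch requires_mask = REQ_NONE;

// Same shape as the patch: OR in the flag, then cast back, because
// requires_mask | REQ_TARGET_USED has type int in C++.
static void note_target_used ()
{
  requires_mask
    = (omp_requires_sketch) (requires_mask | REQ_TARGET_USED);
}

static void check_target_used ()
{
  note_target_used ();
  note_target_used ();                 // setting a bit twice is harmless
  assert (requires_mask == REQ_TARGET_USED);
}
```

Because the update only ORs in a bit, it is safe to run every time a declare target or interop directive is parsed.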
Re: [PATCH] ltmain.sh: allow more flags at link-time
On Thu, Sep 19, 2024 at 11:52:48PM +0100, Sam James wrote: > Sam James writes: > > > Sam James writes: > > > >> libtool defaults to filtering flags passed at link-time. > >> > >> This brings the filtering in GCC's 'fork' of libtool into sync with > >> upstream libtool commit 22a7e547e9857fc94fe5bc7c921d9a4b49c09f8e. Looks OK to me, thanks. -- Alan Modra
[r15-3841 Regression] FAIL: gfortran.dg/unsigned_25.f90 -Os (test for excess errors) on Linux/x86_64
On Linux/x86_64, 5d98fe096b5d17021875806ffc32ba41ea0e87b0 is the first bad commit commit 5d98fe096b5d17021875806ffc32ba41ea0e87b0 Author: Thomas Koenig Date: Tue Sep 24 21:51:42 2024 +0200 Implement MATMUL and DOT_PRODUCT for unsigned. caused FAIL: gfortran.dg/unsigned_25.f90 -O0 (test for excess errors) FAIL: gfortran.dg/unsigned_25.f90 -O1 (test for excess errors) FAIL: gfortran.dg/unsigned_25.f90 -O2 (test for excess errors) FAIL: gfortran.dg/unsigned_25.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) FAIL: gfortran.dg/unsigned_25.f90 -O3 -g (test for excess errors) FAIL: gfortran.dg/unsigned_25.f90 -Os (test for excess errors) with GCC configured with ../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-3841/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap To reproduce: $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/unsigned_25.f90 --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/unsigned_25.f90 --target_board='unix{-m32\ -march=cascadelake}'" (Please do not reply to this email; for questions about this report, contact me at haochen dot jiang at intel.com.) (If you encounter cascadelake-related problems, disabling AVX512F on the command line might help.) (However, please make sure that there are no potential problems with AVX512.)
[PATCH] tree-optimization/116819 - SLP with !STMT_VINFO_RELEVANT representative
Under some circumstances we can end up picking a not relevant stmt as representative of a SLP node. Instead of skipping stmt analysis and declaring success we have to either ignore relevancy throughout the code base or fail SLP operation verification. The following does the latter. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. PR tree-optimization/116819 * tree-vect-stmts.cc (vect_analyze_stmt): When the SLP representative isn't relevant signal failure instead of success. --- gcc/tree-vect-stmts.cc | 6 ++ 1 file changed, 6 insertions(+) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index f7867c0803b..7e0a8095fe8 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -13289,6 +13289,12 @@ vect_analyze_stmt (vec_info *vinfo, if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "irrelevant.\n"); + if (node) + return opt_result::failure_at (stmt_info->stmt, + "not vectorized:" + " irrelevant stmt as SLP node %p " + "representative.\n", + (void *)node); return opt_result::success (); } } -- 2.43.0
RE: [PATCH v1 2/2] RISC-V: Add testcases for form 3 of signed vector SAT_ADD
Thanks Robin, this depends on [PATCH 1/2] of match.pd change, will commit it after that. Pan -Original Message- From: Robin Dapp Sent: Tuesday, September 24, 2024 8:40 PM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: richard.guent...@gmail.com; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Robin Dapp Subject: Re: [PATCH v1 2/2] RISC-V: Add testcases for form 3 of signed vector SAT_ADD LGTM (in case you haven't committed it yet). -- Regards Robin
Re: [PATCH] Update email in MAINTAINERS file.
On Mon 2024-09-23 09:43:28, Aldy Hernandez wrote: > From: Aldy Hernandez > > ChangeLog: > > * MAINTAINERS: Update email and add myself to DCO. > --- > MAINTAINERS | 9 + > 1 file changed, 5 insertions(+), 4 deletions(-) > > diff --git a/MAINTAINERS b/MAINTAINERS > index cfd96c9f33e..e9fafaf45a7 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -116,7 +116,7 @@ riscv port Jim Wilson > > rs6000/powerpc port David Edelsohn > rs6000/powerpc port Segher Boessenkool > rs6000/powerpc port Kewen Lin > -rs6000 vector extns Aldy Hernandez > +rs6000 vector extns Aldy Hernandez > rx port Nick Clifton > s390 port Ulrich Weigand > s390 port Andreas Krebbel > @@ -213,7 +213,7 @@ c++ runtime libsJonathan Wakely > > c++ runtime libs special modes François Dumont > fixincludes Bruce Korb > *gimpl* Jakub Jelinek > -*gimpl* Aldy Hernandez > +*gimpl* Aldy Hernandez > *gimpl* Jason Merrill > gcse.cc Jeff Law > global opt frameworkJeff Law > @@ -240,7 +240,7 @@ option handling Joseph Myers > > middle-end Jeff Law > middle-end Ian Lance Taylor > middle-end Richard Biener > -*vrp, rangerAldy Hernandez > +*vrp, rangerAldy Hernandez > *vrp, rangerAndrew MacLeod > tree-ssaAndrew MacLeod > tree browser/unparser Sebastian Pop > @@ -518,7 +518,7 @@ Daniel Hellstromdanielh > > Fergus Henderson- > Richard Henderson rth > Stuart Hendersonshenders > -Aldy Hernandez aldyh > +Aldy Hernandez aldyh > Philip Herron redbrain > > Marius Hillenbrand - > Matthew Hiller - > @@ -948,3 +948,4 @@ Jonathan Wakely > > Alexander Westbrooks > Chung-Ju Wu > Pengxuan Zheng > +Aldy Hernandez > -- > 2.43.0 > Hi Aldy, Could you move your entry in the DCO list so that it respects surname alphabetical order, please? Your name should be between Robin Dapp and Michal Jires. Thanks, Filip Kastl
[PATCH] RISC-V: Fix FIXED_REGISTERS comment missing return address register
From: Yixuan Chen gcc/config/ChangeLog: 2024-09-24 Yixuan Chen * riscv/riscv.h: Fix FIXED_REGISTERS comment missing return address register. --- gcc/config/riscv/riscv.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h index ead97867eb8..3aecb43f831 100644 --- a/gcc/config/riscv/riscv.h +++ b/gcc/config/riscv/riscv.h @@ -316,7 +316,7 @@ ASM_MISA_SPEC #define FIRST_PSEUDO_REGISTER 128 -/* x0, sp, gp, and tp are fixed. */ +/* x0, ra, sp, gp, and tp are fixed. */ #define FIXED_REGISTERS \ { /* General registers. */\ -- 2.45.2
Re: [PATCH] MATCH: add abs support for half float
On Mon, Sep 23, 2024 at 10:52 AM Kugan Vivekanandarajah wrote: > > Hi Richard, > > > On 20 Sep 2024, at 8:11 pm, Richard Biener > > wrote: > > > > External email: Use caution opening links or attachments > > > > > > On Fri, Sep 20, 2024 at 10:23 AM Kugan Vivekanandarajah > > wrote: > >> > >> Hi Richard, > >> > >>> On 17 Sep 2024, at 7:36 pm, Richard Biener > >>> wrote: > >>> > >>> On Tue, Sep 17, 2024 at 10:31 AM Kugan Vivekanandarajah > >>> wrote: > > Hi Richard, > > > On 10 Sep 2024, at 9:33 pm, Richard Biener > > wrote: > > > > On Thu, Sep 5, 2024 at 3:19 AM Kugan Vivekanandarajah > > wrote: > >> > >> Thanks for the explanation. > >> > >> > >>> On 2 Sep 2024, at 9:47 am, Andrew Pinski wrote: > >>> > >>> On Sun, Sep 1, 2024 at 4:27 PM Kugan Vivekanandarajah > >>> wrote: > > Hi Andrew. > > > On 28 Aug 2024, at 2:23 pm, Andrew Pinski wrote: > > > > On Tue, Aug 27, 2024 at 8:54 PM Kugan Vivekanandarajah > > wrote: > >> > >> Hi Richard, > >> > >> Thanks for the reply. > >> > >>> On 27 Aug 2024, at 7:05 pm, Richard Biener > >>> wrote: > >>> > >>> On Tue, Aug 27, 2024 at 8:23 AM Kugan Vivekanandarajah > >>> wrote: > > Hi Richard, > > > On 22 Aug 2024, at 10:34 pm, Richard Biener > > wrote: > > > > On Wed, Aug 21, 2024 at 12:08 PM Kugan Vivekanandarajah > > wrote: > >> > >> Hi Richard, > >> > >>> On 20 Aug 2024, at 6:09 pm, Richard Biener > >>> wrote: > >>> > >>> On Fri, Aug 9, 2024 at 2:39 AM Kugan Vivekanandarajah > >>> wrote: > > Thanks for the comments.
> > > On 2 Aug 2024, at 8:36 pm, Richard Biener > > wrote: > > > > On Fri, Aug 2, 2024 at 11:20 AM Kugan Vivekanandarajah > > wrote: > >> > >> > >> > >>> On 1 Aug 2024, at 10:46 pm, Richard Biener > >>> wrote: > >>> > >>> On Thu, Aug 1, 2024 at 5:31 AM Kugan Vivekanandarajah > >>> wrote: > > > On Mon, Jul 29, 2024 at 10:11 AM Andrew Pinski > wrote: > > > > On Mon, Jul 29, 2024 at 12:57 AM Kugan Vivekanandarajah > > wrote: > >> > >> On Thu, Jul 25, 2024 at 10:19 PM Richard Biener > >> wrote: > >>> > >>> On Thu, Jul 25, 2024 at 4:42 AM Kugan Vivekanandarajah > >>> wrote: > > On Tue, Jul 23, 2024 at 11:56 PM Richard Biener > wrote: > > > > On Tue, Jul 23, 2024 at 10:27 AM Kugan > > Vivekanandarajah > > wrote: > >> > >> On Tue, Jul 23, 2024 at 10:35 AM Andrew Pinski > >> wrote: > >>> > >>> On Mon, Jul 22, 2024 at 5:26 PM Kugan > >>> Vivekanandarajah > >>> wrote: > > Revised based on the comment and moved it into > existing patterns as. > > gcc/ChangeLog: > > * match.pd: Extend A CMP 0 ? A : -A into (type)A > CMP 0 ? A : -A. > Extend A CMP 0 ? A : -A in
Re: [RFC][PATCH] AArch64: Remove AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> On 28 Aug 2024, at 14:56, Kyrylo Tkachov wrote: > > > >> On 28 Aug 2024, at 10:27, Tamar Christina wrote: >> >> >>> -Original Message- >>> From: Kyrylo Tkachov >>> Sent: Wednesday, August 28, 2024 8:55 AM >>> To: Tamar Christina >>> Cc: Richard Sandiford ; Jennifer Schmitz >>> ; gcc-patches@gcc.gnu.org; Kyrylo Tkachov >>> >>> Subject: Re: [RFC][PATCH] AArch64: Remove >>> AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS >>> >>> Hi all, >>> >>> Thanks to Jennifer for proposing a patch and Tamar and Richard for digging >>> into it. >>> On 27 Aug 2024, at 13:16, Tamar Christina wrote: > -Original Message- > From: Richard Sandiford > Sent: Tuesday, August 27, 2024 11:46 AM > To: Tamar Christina > Cc: Jennifer Schmitz ; gcc-patches@gcc.gnu.org; > Kyrylo > Tkachov > Subject: Re: [RFC][PATCH] AArch64: Remove > AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS > > Tamar Christina writes: >> Hi Jennifer, >> >>> -Original Message- >>> From: Jennifer Schmitz >>> Sent: Friday, August 23, 2024 1:07 PM >>> To: gcc-patches@gcc.gnu.org >>> Cc: Richard Sandiford ; Kyrylo Tkachov >>> >>> Subject: [RFC][PATCH] AArch64: Remove >>> AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS >>> >>> This patch removes the AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS >>> tunable and >>> use_new_vector_costs entry in aarch64-tuning-flags.def and makes the >>> AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS paths in the backend >>> the >>> default. > > Thanks for doing this. This has been on my TODO list ever since the > tunable was added. > > The history is that these "new" costs were originally added in stage 4 > of GCC 11 for Neoverse V1. Since the costs were added so late, it wasn't > appropriate to change the behaviour for any other core. All the new code > was therefore gated on this option.
> > The new costs do two main things: > > (1) use throughput-based calculations where available, including to choose > between Advanced SIMD and SVE > > (2) try to make the latency-based costs more precise, by looking more > closely > at the provided stmt_info > > Old cost models won't be affected by (1) either way, since they don't > provide any throughput information. But they should in principle benefit > from (2). So... > >>> To that end, the function aarch64_use_new_vector_costs_p and its uses >>> were >>> removed. Additionally, guards were added prevent nullpointer >>> dereferences >>> of >>> fields in cpu_vector_cost. >>> >> >> I'm not against this change, but it does mean that we now switch old Adv. >>> SIMD >> cost models as well to the new throughput based cost models. That means >>> that >> -mcpu=generic now behaves differently, and -mcpu=neoverse-n1 and I think >> some distros explicitly use this (I believe yocto for instance does). > > ...it shouldn't mean that we start using throughput-based models for > cortexa53 etc., since there's no associated issue info. Yes, I was using throughput based model as a name. But as you indicated in (2) it does change the latency calculation. My question was because of things in e.g. aarch64_adjust_stmt_cost and >>> friends, e.g. aarch64_multiply_add_p changes the cost between FMA SIMD vs scalar. So my question.. > >> Have we validated that the old generic cost model still behaves sensibly >> with >>> this > change? is still valid I think, we *are* changing the cost for all models, and while they should indeed be more accurate, there could be knock on effects. >>> >>> We can run SPEC on a Grace system with -mcpu=generic to see what the effect >>> is, >>> but wider benchmarking would be more appropriate. Can you help with that >>> Tamar once we agree on the other implementation details in this patch? >>> >> >> Sure that's not a problem. 
Just ping me when you have a patch you want me >> to test :) >> >>> Thanks, Tamar >> >>> The patch was bootstrapped and regtested on aarch64-linux-gnu: >>> No problems bootstrapping, but several test files (in aarch64-sve.exp: >>> gather_load_extend_X.c >>> where X is 1 to 4, strided_load_2.c, strided_store_2.c) fail because of >>> small >>> differences >>> in codegen that make some of the scan-assembler-times tests fail. >>> >>> Kyrill suggested to add a -fvect-cost-model=unlimited flag to these >>> tests and > add >>> some >> >> I don't personally like unlimited here as unlimited means just vectorize >> at any >> cost. This means that costing between
Re: [committed] arc: Remove mlra option [PR113954]
I'll include your comment in my second patch where I clean some patterns used by reload. Thank you, claudiu On Mon, Sep 23, 2024 at 5:05 PM Andreas Schwab wrote: > > On Sep 23 2024, Claudiu Zissulescu wrote: > > > diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc > > index c800226b179..a225adeff57 100644 > > --- a/gcc/config/arc/arc.cc > > +++ b/gcc/config/arc/arc.cc > > @@ -721,7 +721,7 @@ static rtx arc_legitimize_address_0 (rtx, rtx, > > machine_mode mode); > >arc_no_speculation_in_delay_slots_p > > > > #undef TARGET_LRA_P > > -#define TARGET_LRA_P arc_lra_p > > +#define TARGET_LRA_P hook_bool_void_true > > This is the default for lra_p, so you can remove the override. > > -- > Andreas Schwab, SUSE Labs, sch...@suse.de > GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 > "And now for something completely different."
SVE intrinsics: Fold constant operands for svlsl.
This patch implements constant folding for svlsl. Test cases have been added to check the following cases:
- Zero, merge, and don't-care predication.
- Shift by 0.
- Shift by the register width.
- Overflowing shifts on signed and unsigned integers.
- Shift of a negative integer.
- Maximum possible shift, e.g. shift by 7 on an 8-bit integer.

The patch was bootstrapped and regtested on aarch64-linux-gnu with no regressions. OK for mainline?

Signed-off-by: Soumya AR

gcc/ChangeLog:

	* config/aarch64/aarch64-sve-builtins-base.cc (svlsl_impl::fold):
	Try constant folding.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/const_fold_lsl_1.c: New test.

0001-SVE-intrinsics-Fold-constant-operands-for-svlsl.patch
Description: 0001-SVE-intrinsics-Fold-constant-operands-for-svlsl.patch
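A scalar model of a single lane can make the test cases above easier to follow. This is a hypothetical sketch, not the fold code from the patch: it assumes the architectural behavior of SVE LSL by vector, where the shift amount is unsigned and any amount greater than or equal to the element width yields 0, and it ignores predication entirely. The helper names fold_lsl_u8 and fold_lsl_s8 are invented for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: the value one 8-bit lane of an LSL would fold to.
   Assumption: shift amounts >= the element width (8) produce 0.  */
static uint8_t fold_lsl_u8 (uint8_t x, uint64_t shift)
{
  if (shift >= 8)
    return 0;
  /* x is promoted to int, so the shift cannot overflow; truncating back
     to uint8_t discards the bits shifted out of the lane.  */
  return (uint8_t) (x << shift);
}

/* Signed lanes are shifted as their two's-complement bit pattern, so a
   negative input is well-defined.  Converting a value above 127 back to
   int8_t is implementation-defined in ISO C; GCC wraps it modulo 2^8.  */
static int8_t fold_lsl_s8 (int8_t x, uint64_t shift)
{
  return (int8_t) fold_lsl_u8 ((uint8_t) x, shift);
}
```

Under this model, a shift by 7 is the largest one that keeps any bit, a shift by 8 (the register width) folds to 0, and shifting -1 left by 1 gives -2.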
[PATCH] Testsuite, darwin: account for macOS 15
I’ve pushed the attached patch as obvious, taking into account the newly released macOS 15 (darwin24). It makes the test pass. FX 0001-Testsuite-darwin-account-for-macOS-15.patch Description: Binary data
[PATCH] tree-optimization/114855 - more update_ssa speedup
The following tackles another source of slow bitmap operations, namely populating blocks_to_update. We already have it in tree view around PHI insertion, but the initial population is slow as well. Unfortunately, there's a conditional in between that requires list view, and the bitmap API doesn't allow opportunistic switching: it rejects tree -> tree and list -> list transitions. So the following patch wraps the early population in a tree view section, at the cost of possibly one redundant tree -> list -> tree view transition.

This cuts tree SSA incremental from 228.25s (21%) to 65.05s (7%).

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

	PR tree-optimization/114855
	* tree-into-ssa.cc (update_ssa): Use tree view for the initial
	population of blocks_to_update.
---
 gcc/tree-into-ssa.cc | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/tree-into-ssa.cc b/gcc/tree-into-ssa.cc
index 1cce9d62809..fc61d47ca77 100644
--- a/gcc/tree-into-ssa.cc
+++ b/gcc/tree-into-ssa.cc
@@ -3445,6 +3445,7 @@ update_ssa (unsigned update_flags)
   blocks_with_phis_to_rewrite = BITMAP_ALLOC (NULL);
   bitmap_tree_view (blocks_with_phis_to_rewrite);
   blocks_to_update = BITMAP_ALLOC (NULL);
+  bitmap_tree_view (blocks_to_update);
 
   insert_phi_p = (update_flags != TODO_update_ssa_no_phi);
 
@@ -3492,6 +3493,8 @@ update_ssa (unsigned update_flags)
 	 placement heuristics. */
       prepare_block_for_update (start_bb, insert_phi_p);
 
+      bitmap_list_view (blocks_to_update);
+
       tree name;
 
       if (flag_checking)
@@ -3517,6 +3520,8 @@ update_ssa (unsigned update_flags)
     }
   else
     {
+      bitmap_list_view (blocks_to_update);
+
       /* Otherwise, the entry block to the region is the nearest
 	 common dominator for the blocks in BLOCKS. */
       start_bb = nearest_common_dominator_for_set (CDI_DOMINATORS,
-- 
2.43.0
Re: [PATCH v3] Widening-Mul: Fix one ICE for SAT_SUB matching operand checking
On Tue, Sep 24, 2024 at 12:29 PM wrote: > > From: Pan Li > > This patch would like to fix the following ICE for -O2 -m32 of x86_64. > > during RTL pass: expand > JackMidiAsyncWaitQueue.cpp.cpp: In function 'void DequeueEvent(unsigned > int)': > JackMidiAsyncWaitQueue.cpp.cpp:3:6: internal compiler error: in > expand_fn_using_insn, at internal-fn.cc:263 > 3 | void DequeueEvent(unsigned frame) { > | ^~~~ > 0x27b580d diagnostic_context::diagnostic_impl(rich_location*, > diagnostic_metadata const*, diagnostic_option_id, char const*, > __va_list_tag (*) [1], diagnostic_t) > ???:0 > 0x27c4a3f internal_error(char const*, ...) > ???:0 > 0x27b3994 fancy_abort(char const*, int, char const*) > ???:0 > 0xf25ae5 expand_fn_using_insn(gcall*, insn_code, unsigned int, unsigned int) > ???:0 > 0xf2a124 expand_direct_optab_fn(internal_fn, gcall*, optab_tag, unsigned int) > ???:0 > 0xf2c87c expand_SAT_SUB(internal_fn, gcall*) > ???:0 > > We allowed the operand convert when matching SAT_SUB in match.pd, to support > the zip benchmark SAT_SUB pattern. Aka, > > (convert? (minus (convert1? @0) (convert1? @1))) for below sample code. > > void test (uint16_t *x, unsigned b, unsigned n) > { > unsigned a = 0; > register uint16_t *p = x; > > do { > a = *--p; > *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB > } while (--n); > } > > The pattern match for SAT_SUB itself may also act on below scalar sample > code too. > > unsigned long long GetTimeFromFrames(int); > unsigned long long GetMicroSeconds(); > > void DequeueEvent(unsigned frame) { > long long frame_time = GetTimeFromFrames(frame); > unsigned long long current_time = GetMicroSeconds(); > DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time); > } > > Aka: > > uint32_t a = (uint32_t)SAT_SUB(uint64_t, uint64_t); > > Then there will be a problem when ia32 or -m32 is given when compiling. > Because we only check the lhs (aka uint32_t) type is supported by ifn > instead of the operand (aka uint64_t). 
Mostly DImode is disabled for > 32 bits target like ia32 or rv32gcv, and then trigger ICE when expanding. OK. Thanks, Richard. > The below test suites are passed for this patch. > * The rv64gcv fully regression test. > * The x86 bootstrap test. > * The x86 fully regression test. > > PR middle-end/116814 > > gcc/ChangeLog: > > * tree-ssa-math-opts.cc (build_saturation_binary_arith_call): Make > ifn is_supported type check based on operand instead of lhs. > > gcc/testsuite/ChangeLog: > > * g++.dg/torture/pr116814-1.C: New test. > > Signed-off-by: Pan Li > --- > gcc/testsuite/g++.dg/torture/pr116814-1.C | 12 > gcc/tree-ssa-math-opts.cc | 2 +- > 2 files changed, 13 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/g++.dg/torture/pr116814-1.C > > diff --git a/gcc/testsuite/g++.dg/torture/pr116814-1.C > b/gcc/testsuite/g++.dg/torture/pr116814-1.C > new file mode 100644 > index 000..dd6f29daa7c > --- /dev/null > +++ b/gcc/testsuite/g++.dg/torture/pr116814-1.C > @@ -0,0 +1,12 @@ > +/* { dg-do compile { target { ia32 } } } */ > +/* { dg-options "-O2" } */ > + > +unsigned long long GetTimeFromFrames(int); > +unsigned long long GetMicroSeconds(); > + > +void DequeueEvent(unsigned frame) { > + long long frame_time = GetTimeFromFrames(frame); > + unsigned long long current_time = GetMicroSeconds(); > + > + DequeueEvent(frame_time < current_time ? 
0 : frame_time - current_time); > +} > diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc > index d61668aacfc..8c622514dbd 100644 > --- a/gcc/tree-ssa-math-opts.cc > +++ b/gcc/tree-ssa-math-opts.cc > @@ -4042,7 +4042,7 @@ build_saturation_binary_arith_call > (gimple_stmt_iterator *gsi, gphi *phi, > internal_fn fn, tree lhs, tree op_0, > tree op_1) > { > - if (direct_internal_fn_supported_p (fn, TREE_TYPE (lhs), > OPTIMIZE_FOR_BOTH)) > + if (direct_internal_fn_supported_p (fn, TREE_TYPE (op_0), > OPTIMIZE_FOR_BOTH)) > { >gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1); >gimple_call_set_lhs (call, lhs); > -- > 2.43.0 >
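The type mismatch behind the ICE can be shown in plain C. The sketch below is illustrative only (the function names are invented): the subtraction that becomes .SAT_SUB operates on uint64_t, and only its result is truncated to uint32_t. Checking target support on the 32-bit lhs type therefore says nothing about whether a 64-bit saturating subtract can be expanded, which on ia32 it usually cannot.

```c
#include <stdint.h>

/* Unsigned saturating subtraction in the shape that match.pd recognizes
   as .SAT_SUB: a >= b ? a - b : 0.  The operation itself is 64-bit.  */
static uint64_t sat_sub_u64 (uint64_t a, uint64_t b)
{
  return a >= b ? a - b : 0;
}

/* The result is truncated afterwards, so the lhs type (uint32_t) differs
   from the operand type (uint64_t).  The internal function has to be
   supported for the operand type, which is what the fix checks.  */
static uint32_t sat_sub_truncated (uint64_t a, uint64_t b)
{
  return (uint32_t) sat_sub_u64 (a, b);
}
```

For example, sat_sub_truncated (0x100000005, 3) computes the 64-bit difference 0x100000002 and then truncates it to 2.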
[PATCH] libstdc++: more #pragma diagnostic
Tested x86_64-pc-linux-gnu.

Is this the right fix, or do we want to stop using these deprecated classes,
here and in stl_function.h?

-- 8< --

The CI saw failures on 17_intro/headers/c++2011/parallel_mode.cc due to
-Wdeprecated-declarations warnings in some parallel/ headers.

libstdc++-v3/ChangeLog:

	* include/parallel/base.h: Suppress -Wdeprecated-declarations.
	* include/parallel/multiseq_selection.h: Likewise.
---
 libstdc++-v3/include/parallel/base.h               | 4
 libstdc++-v3/include/parallel/multiseq_selection.h | 6 ++
 2 files changed, 10 insertions(+)

diff --git a/libstdc++-v3/include/parallel/base.h b/libstdc++-v3/include/parallel/base.h
index 5bc5350e723..fcbcc1e0b99 100644
--- a/libstdc++-v3/include/parallel/base.h
+++ b/libstdc++-v3/include/parallel/base.h
@@ -166,6 +166,8 @@ namespace __gnu_parallel
       { return !_M_comp(__a, __b) && !_M_comp(__b, __a); }
     };
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wdeprecated-declarations" // *nary_function
 
   /** @brief Similar to std::unary_negate,
    *  but giving the argument types explicitly. */
@@ -297,6 +299,8 @@ namespace __gnu_parallel
     struct _Multiplies<_Tp, _Tp, _Tp>
     : public std::multiplies<_Tp> { };
 
+#pragma GCC diagnostic pop // -Wdeprecated-declarations
+
   /** @brief _Iterator associated with __gnu_parallel::_PseudoSequence.
    *  If features the usual random-access iterator functionality.
    *  @param _Tp Sequence _M_value type.
diff --git a/libstdc++-v3/include/parallel/multiseq_selection.h b/libstdc++-v3/include/parallel/multiseq_selection.h
index f25895adbdd..22bd97e6432 100644
--- a/libstdc++-v3/include/parallel/multiseq_selection.h
+++ b/libstdc++-v3/include/parallel/multiseq_selection.h
@@ -48,6 +48,10 @@
 
 namespace __gnu_parallel
 {
+
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wdeprecated-declarations" // *nary_function
+
   /** @brief Compare __a pair of types lexicographically, ascending. */
   template
     class _Lexicographic
@@ -100,6 +104,8 @@ namespace __gnu_parallel
       }
     };
 
+#pragma GCC diagnostic pop // -Wdeprecated-declarations
+
   /**
    *  @brief Splits several sorted sequences at a certain global __rank,
    *  resulting in a splitting point for each sequence.

base-commit: b752eed3e3f2f27570ea89b7c2339468698472a8
-- 
2.46.0
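The push/ignored/pop pattern used in the patch works the same way in C. This is a minimal sketch with an invented deprecated type, old_base, standing in for std::unary_function and friends: the warning is silenced only between push and pop, and the previous diagnostic state is restored afterwards, so unrelated code in the same header still gets the warning.

```c
/* Hypothetical stand-in for a deprecated base class such as
   std::unary_function; naming this type triggers
   -Wdeprecated-declarations.  */
struct __attribute__ ((deprecated)) old_base
{
  int tag;
};

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations" // old_base
/* The deprecated type is only named inside the push/pop region, so no
   warning is emitted here.  */
struct wrapper
{
  struct old_base base;
  int value;
};
#pragma GCC diagnostic pop // -Wdeprecated-declarations

/* The warning is active again from here on; code that only names
   struct wrapper does not trigger it.  */
static int wrapper_value (const struct wrapper *w)
{
  return w->value;
}
```

Putting the name of the silenced warning in a comment on the pop line, as the patch does, makes the matching pair easy to find when the region spans many lines.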
Re: [Fortran, Patch, PR101100, v1] Fix ICE when compiling with caf-lib and using proc_pointer component.
Hi Harald,

thanks for the review. Committed as gcc-15-3827-g0c0d79c783f

Thanks again,
Andre

On Mon, 23 Sep 2024 21:25:55 +0200
Harald Anlauf wrote:

> Hi Andre,
>
> On 19.09.24 at 14:19, Andre Vehreschild wrote:
> > Hi all,
> >
> > the attached patch fixes an ICE when compiling with -fcoarray=lib and using
> > a (proc_-)pointer component in a coarray. The code was looking at the wrong
> > location for the caf-token.
> >
> > Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?
>
> this looks good to me.
>
> Thanks for the patch!
>
> Harald
>
> > Regards,
> > Andre
> > --
> > Andre Vehreschild * Email: vehre ad gmx dot de
>
-- 
Andre Vehreschild * Email: vehre ad gmx dot de
[PATCH 2/2] Disable add_store_equivs when -fno-expensive-optimizations
IRA's add_store_equivs is quadratic in the size of the function in the worst case, so disable it with -fno-expensive-optimizations, i.e. at -O1 and -Og.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

	* ira.cc (ira): Gate add_store_equivs on flag_expensive_optimizations.
---
 gcc/ira.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ira.cc b/gcc/ira.cc
index 3936456c4ed..5231f63398e 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -5739,7 +5739,7 @@ ira (FILE *f)
   combine_and_move_insns ();
 
   /* Gather additional equivalences with memory. */
-  if (optimize)
+  if (optimize && flag_expensive_optimizations)
     add_store_equivs ();
 
   loop_optimizer_finalize ();
-- 
2.43.0
[PATCH 1/2] rtl-optimization/114855 - slow add_store_equivs in IRA
For the testcase in PR114855 at -O1, add_store_equivs shows up as the main sink for bitmap_set_bit because it uses a bitmap to mark all seen insns by UID, to make sure the forward walk in memref_used_between_p will find the insn in question. Given that we do have a CFG here, the function's approach is questionable: memref_used_between_p together with the walk of all insns is obviously quadratic in the worst case, so the whole thing should be redone ... but, for the testcase, using an sbitmap of size get_max_uid () + 1 gets bitmap_set_bit off the profile and improves IRA time from 15.58s (8%) to 3.46s (2%).

Now, given the above quadratic behavior, I wonder whether we should instead gate add_store_equivs on optimize > 1 or flag_expensive_optimizations.

Jeff, you added the bitmap in r6-7529-g14d7d4be52585b. I have no idea how get_insns () works at this point and which CFG mode we are in, but a simplification might be to simply verify that both insns are in the same BB and hope that get_insns lets us walk the insns in order there. We could then elide the bitmap completely (losing some cases, but the function comment suggests it is only supposed to catch single-BB cases anyway?!).

Bootstrap and regtest running on x86_64-unknown-linux-gnu. OK if that succeeds?

Thanks,
Richard.

	PR rtl-optimization/114855
	* ira.cc (add_store_equivs): Use sbitmap for tracking
	visited insns.
---
 gcc/ira.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/ira.cc b/gcc/ira.cc
index 156541df4e6..3936456c4ed 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -3838,7 +3838,8 @@ update_equiv_regs (void)
 static void
 add_store_equivs (void)
 {
-  auto_bitmap seen_insns;
+  auto_sbitmap seen_insns (get_max_uid () + 1);
+  bitmap_clear (seen_insns);
 
   for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
     {
-- 
2.43.0
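Why the sbitmap helps: GCC's bitmap is sparse (a linked list of elements, or a splay tree in tree view), so setting a bit for an arbitrary UID may need to search for the right element, whereas an sbitmap is a dense, fixed-size bit vector indexed directly. The sketch below is a hypothetical dense bit vector written from scratch to show that constant-time set/test behavior; its names are invented and it does not mirror GCC's sbitmap API. As in the patch, it is sized max_uid + 1 so that the largest UID is a valid index.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical dense bit vector; not GCC's sbitmap API.  */
typedef struct
{
  size_t n_bits;
  unsigned long *words;
} dense_bitmap;

#define DB_BITS_PER_WORD (sizeof (unsigned long) * CHAR_BIT)

/* Allocate a zeroed vector of n_bits bits (the equivalent of allocating
   and then clearing).  Callers must check words != NULL.  */
static dense_bitmap dense_bitmap_alloc (size_t n_bits)
{
  dense_bitmap b;
  size_t n_words = (n_bits + DB_BITS_PER_WORD - 1) / DB_BITS_PER_WORD;
  b.n_bits = n_bits;
  b.words = calloc (n_words ? n_words : 1, sizeof (unsigned long));
  return b;
}

/* O(1): the word and bit position follow directly from the index.  */
static void dense_bitmap_set (dense_bitmap *b, size_t i)
{
  b->words[i / DB_BITS_PER_WORD] |= 1UL << (i % DB_BITS_PER_WORD);
}

static bool dense_bitmap_test (const dense_bitmap *b, size_t i)
{
  return (b->words[i / DB_BITS_PER_WORD] >> (i % DB_BITS_PER_WORD)) & 1UL;
}

static void dense_bitmap_free (dense_bitmap *b)
{
  free (b->words);
  b->words = NULL;
}
```

The trade-off is memory: a dense vector costs max_uid / 8 bytes regardless of how many bits are set, which is acceptable here because every insn is visited anyway.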
Re: [PATCH v3 1/4] tree-optimization/116024 - simplify C1-X cmp C2 for UB-on-overflow types
On 9/24/2024 12:16 AM, Jeff Law wrote:
>
>
> On 9/23/24 2:32 AM, Artemiy Volkov wrote:
>> Implement a match.pd pattern for C1 - X cmp C2, where C1 and C2 are
>> integer constants and X is of a UB-on-overflow type. The pattern is
>> simplified to X rcmp C1 - C2 by moving X and C2 to the other side of the
>> comparison (with opposite signs). If C1 - C2 happens to overflow,
>> replace the whole expression with either a constant 0 or a constant 1
>> node, depending on the comparison operator and the sign of the overflow.
>>
>> This transformation makes it possible to occasionally save load-immediate /
>> subtraction instructions, e.g. the following statement:
>>
>>   10 - (int) x <= 9;
>>
>> now compiles to
>>
>>   sgt a0,a0,zero
>>
>> instead of
>>
>>   li a5,10
>>   sub a0,a5,a0
>>   slti a0,a0,10
>>
>> on 32-bit RISC-V.
>>
>> Additional examples can be found in the newly added test file. This
>> patch has been bootstrapped and regtested on aarch64, x86_64, and
>> i386, and additionally regtested on riscv32. Existing tests were
>> adjusted where necessary.
>>
>> gcc/ChangeLog:
>>
>> PR tree-optimization/116024
>> * match.pd: New transformation around integer comparison.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.dg/tree-ssa/pr116024.c: New test.
>> * gcc.dg/pr67089-6.c: Adjust.
> I think Richi is already engaged on the review side, so I'll let him own it,
> especially since he knows more about match.pd patterns than I do.
>
>
>> +int32_t i1(void)
>> +{
>> + int32_t l = 2;
>> + l = 10 - (int32_t)f();
>> + return l <= 9; // f() > 0
>> +}
> Why the initialization of l = 2? It's trivially dead and I expect it to
> be cleaned up early in the optimization pipeline. It looks like most of
> the tests in the series have this trivially dead initialization code.

Hi Jeff,

These initializers come from the original reduced testcase in 116024 and are completely superfluous; I will remove them before resending the series.

Thanks,
Artemiy

>
> Jeff
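The example from the commit message can be checked by hand. With C1 = 10 and C2 = 9, moving X and C2 across the comparison turns 10 - x <= 9 into 10 - 9 <= x, i.e. x >= 1, which is x > 0 and matches the single sgt a0,a0,zero. The sketch below writes both forms as C functions (names invented for illustration) so they can be compared on sample inputs; it relies on the same assumption the pattern does, namely that 10 - x does not overflow, so inputs near INT32_MIN are excluded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Before the fold: C1 - X cmp C2 with C1 = 10, C2 = 9.  Signed overflow
   is undefined, so 10 - x is assumed not to wrap; callers must not pass
   values close to INT32_MIN.  */
static bool before_fold (int32_t x)
{
  return 10 - x <= 9;
}

/* After the fold: X rcmp C1 - C2.  Swapping the operands of "<=" gives
   ">=", so the test becomes x >= 10 - 9, i.e. x > 0.  */
static bool after_fold (int32_t x)
{
  return x >= 10 - 9;
}
```

If C1 - C2 itself overflowed, no constant bound of this form exists, which is why the pattern instead replaces the whole comparison with 0 or 1 in that case.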
Re: [PATCH v1 2/2] RISC-V: Add testcases for form 3 of signed vector SAT_ADD
LGTM (in case you haven't committed it yet). -- Regards Robin
Re: [PATCH v1 2/2] RISC-V: Add testcases for form 2 of signed vector SAT_ADD
LGTM. -- Regards Robin
Re: [PATCH] Update email in MAINTAINERS file.
Pushed attached patch. Thanks. Aldy On Tue, Sep 24, 2024 at 10:09 AM Filip Kastl wrote: > On Mon 2024-09-23 09:43:28, Aldy Hernandez wrote: > > From: Aldy Hernandez > > > > ChangeLog: > > > > * MAINTAINERS: Update email and add myself to DCO. > > --- > > MAINTAINERS | 9 + > > 1 file changed, 5 insertions(+), 4 deletions(-) > > > > diff --git a/MAINTAINERS b/MAINTAINERS > > index cfd96c9f33e..e9fafaf45a7 100644 > > --- a/MAINTAINERS > > +++ b/MAINTAINERS > > @@ -116,7 +116,7 @@ riscv port Jim Wilson < > jim.wilson@gmail.com> > > rs6000/powerpc port David Edelsohn > > rs6000/powerpc port Segher Boessenkool < > seg...@kernel.crashing.org> > > rs6000/powerpc port Kewen Lin > > -rs6000 vector extns Aldy Hernandez > > +rs6000 vector extns Aldy Hernandez > > rx port Nick Clifton > > s390 port Ulrich Weigand > > s390 port Andreas Krebbel > > @@ -213,7 +213,7 @@ c++ runtime libsJonathan Wakely < > jwak...@redhat.com> > > c++ runtime libs special modes François Dumont > > fixincludes Bruce Korb > > *gimpl* Jakub Jelinek > > -*gimpl* Aldy Hernandez > > +*gimpl* Aldy Hernandez > > *gimpl* Jason Merrill > > gcse.cc Jeff Law > > global opt frameworkJeff Law > > @@ -240,7 +240,7 @@ option handling Joseph Myers< > josmy...@redhat.com> > > middle-end Jeff Law > > middle-end Ian Lance Taylor > > middle-end Richard Biener > > -*vrp, rangerAldy Hernandez > > +*vrp, rangerAldy Hernandez > > *vrp, rangerAndrew MacLeod > > tree-ssaAndrew MacLeod > > tree browser/unparser Sebastian Pop > > @@ -518,7 +518,7 @@ Daniel Hellstromdanielh < > dan...@gaisler.com> > > Fergus Henderson- > > Richard Henderson rth > > Stuart Hendersonshenders > > -Aldy Hernandez aldyh > > +Aldy Hernandez aldyh > > Philip Herron redbrain< > herron.phi...@googlemail.com> > > Marius Hillenbrand - > > Matthew Hiller - > > @@ -948,3 +948,4 @@ Jonathan Wakely < > jwak...@redhat.com> > > Alexander Westbrooks > > > Chung-Ju Wu > > Pengxuan Zheng < > quic_pzh...@quicinc.com> > > +Aldy Hernandez > > -- > > 2.43.0 > > > > Hi 
Aldy, > > Could you move your entry in the DCO list so that it respects surname > alphabetical order, please? Your name should be between Robin Dapp and > Michal > Jires. > > Thanks, > Filip Kastl > > From 34366176046351250e1beb578664d926fbdd50c9 Mon Sep 17 00:00:00 2001 From: Aldy Hernandez Date: Tue, 24 Sep 2024 11:40:52 +0200 Subject: [PATCH] Alphabetize my entry in MAINTAINER's DCO list. ChangeLog: * MAINTAINERS: Move my entry in DCO list into alphabetical order. --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 3b4cf9d20d8..47b5915e9f8 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -917,6 +917,7 @@ information. Juergen Christ Robin Dapp Robin Dapp +Aldy Hernandez Michal Jires Matthias Kretz Prathamesh Kulkarni @@ -949,4 +950,3 @@ Jonathan Wakely Alexander Westbrooks Chung-Ju Wu Pengxuan Zheng -Aldy Hernandez -- 2.43.0
Re: [PATCH] [x86] Define VECTOR_STORE_FLAG_VALUE
On Tue, Sep 24, 2024 at 11:23 AM liuhongt wrote:
>
> Return constm1_rtx when GET_MODE_CLASS (MODE) == MODE_VECTOR_INT.
> Otherwise NULL_RTX.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ready to push to trunk.
>
> gcc/ChangeLog:
>
>         * config/i386/i386.h (VECTOR_STORE_FLAG_VALUE): New macro.
>
> gcc/testsuite/ChangeLog:
>         * gcc.dg/rtl/x86_64/vector_eq.c: New test.
> ---
>  gcc/config/i386/i386.h                      |  5 +++-
>  gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c | 26 +
>  2 files changed, 30 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c
>
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index c1ec92ffb15..b12be41424f 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -899,7 +899,10 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>     and give entire struct the alignment of an int. */
>  /* Required on the 386 since it doesn't have bit-field insns. */
>  #define PCC_BITFIELD_TYPE_MATTERS 1
> -
> +
> +#define VECTOR_STORE_FLAG_VALUE(MODE) \
> +  (GET_MODE_CLASS (MODE) == MODE_VECTOR_INT ? constm1_rtx : NULL_RTX)
> +
>  /* Standard register usage. */
>
>  /* This processor has special stack-like registers. See reg-stack.cc
> diff --git a/gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c b/gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c
> new file mode 100644
> index 000..b82603d0b64
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/rtl/x86_64/vector_eq.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile { target x86_64-*-* } } */

target { { i?86-*-* x86_64-*-* } && lp64 }

Uros.
> +/* { dg-additional-options "-O2 -march=x86-64-v3" } */ > + > +typedef int v4si __attribute__((vector_size(16))); > + > +v4si __RTL (startwith ("vregs")) foo (void) > +{ > +(function "foo" > + (insn-chain > +(block 2 > + (edge-from entry (flags "FALLTHRU")) > + (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK) > + (cnote 2 NOTE_INSN_FUNCTION_BEG) > + (cinsn 3 (set (reg:V4SI <0>) (const_vector:V4SI [(const_int 0) > (const_int 0) (const_int 0) (const_int 0)]))) > + (cinsn 5 (set (reg:V4SI <2>) > + (eq:V4SI (reg:V4SI <0>) (reg:V4SI <1> > + (cinsn 6 (set (reg:V4SI <3>) (reg:V4SI <2>))) > + (cinsn 7 (set (reg:V4SI xmm0) (reg:V4SI <3>))) > + (edge-to exit (flags "FALLTHRU")) > +) > + ) > + (crtl (return_rtx (reg/i:V4SI xmm0))) > +) > +} > + > +/* { dg-final { scan-assembler-not "vpxor" } } */ > -- > 2.31.1 >
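At the source level, GCC's generic vector extension already follows the convention the new macro describes: each true lane of an integer vector comparison is all ones (-1), and each false lane is 0. The sketch below only illustrates that convention with an ordinary C function; it does not exercise VECTOR_STORE_FLAG_VALUE or the RTL test above, and the name lanes_equal is invented for the example.

```c
/* GCC generic vector type: four 32-bit ints in 16 bytes, the C-level
   counterpart of V4SImode.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Lane-wise equality.  Each result lane is -1 (all bits set) where the
   lanes are equal and 0 elsewhere, the same all-ones value that the
   macro reports (constm1_rtx) for integer vector modes.  */
static v4si lanes_equal (v4si a, v4si b)
{
  return a == b;
}
```

Because a true lane is all ones rather than 1, the comparison result can be used directly as a mask, e.g. (lanes_equal (a, b) & x) keeps only the lanes of x where a and b match.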
Re: [PATCH] Update email in MAINTAINERS file.
On Tue 2024-09-24 11:43:47, Aldy Hernandez wrote: > Pushed attached patch. > > Thanks. > Aldy > Nice. Thanks! Filip > On Tue, Sep 24, 2024 at 10:09 AM Filip Kastl wrote: > > > On Mon 2024-09-23 09:43:28, Aldy Hernandez wrote: > > > From: Aldy Hernandez > > > > > > ChangeLog: > > > > > > * MAINTAINERS: Update email and add myself to DCO. > > > --- > > > MAINTAINERS | 9 + > > > 1 file changed, 5 insertions(+), 4 deletions(-) > > > > > > diff --git a/MAINTAINERS b/MAINTAINERS > > > index cfd96c9f33e..e9fafaf45a7 100644 > > > --- a/MAINTAINERS > > > +++ b/MAINTAINERS > > > @@ -116,7 +116,7 @@ riscv port Jim Wilson < > > jim.wilson@gmail.com> > > > rs6000/powerpc port David Edelsohn > > > rs6000/powerpc port Segher Boessenkool < > > seg...@kernel.crashing.org> > > > rs6000/powerpc port Kewen Lin > > > -rs6000 vector extns Aldy Hernandez > > > +rs6000 vector extns Aldy Hernandez > > > rx port Nick Clifton > > > s390 port Ulrich Weigand > > > s390 port Andreas Krebbel > > > @@ -213,7 +213,7 @@ c++ runtime libsJonathan Wakely < > > jwak...@redhat.com> > > > c++ runtime libs special modes François Dumont > > > fixincludes Bruce Korb > > > *gimpl* Jakub Jelinek > > > -*gimpl* Aldy Hernandez > > > +*gimpl* Aldy Hernandez > > > *gimpl* Jason Merrill > > > gcse.cc Jeff Law > > > global opt frameworkJeff Law > > > @@ -240,7 +240,7 @@ option handling Joseph Myers< > > josmy...@redhat.com> > > > middle-end Jeff Law > > > middle-end Ian Lance Taylor > > > middle-end Richard Biener > > > -*vrp, rangerAldy Hernandez > > > +*vrp, rangerAldy Hernandez > > > *vrp, rangerAndrew MacLeod > > > tree-ssaAndrew MacLeod > > > tree browser/unparser Sebastian Pop > > > @@ -518,7 +518,7 @@ Daniel Hellstromdanielh < > > dan...@gaisler.com> > > > Fergus Henderson- > > > Richard Henderson rth > > > Stuart Hendersonshenders > > > -Aldy Hernandez aldyh > > > +Aldy Hernandez aldyh > > > Philip Herron redbrain< > > herron.phi...@googlemail.com> > > > Marius Hillenbrand - > > > Matthew Hiller - > > > @@ 
-948,3 +948,4 @@ Jonathan Wakely < > > jwak...@redhat.com> > > > Alexander Westbrooks > > > > > Chung-Ju Wu > > > Pengxuan Zheng < > > quic_pzh...@quicinc.com> > > > +Aldy Hernandez > > > -- > > > 2.43.0 > > > > > > > Hi Aldy, > > > > Could you move your entry in the DCO list so that it respects surname > > alphabetical order, please? Your name should be between Robin Dapp and > > Michal > > Jires. > > > > Thanks, > > Filip Kastl > > > > > From 34366176046351250e1beb578664d926fbdd50c9 Mon Sep 17 00:00:00 2001 > From: Aldy Hernandez > Date: Tue, 24 Sep 2024 11:40:52 +0200 > Subject: [PATCH] Alphabetize my entry in MAINTAINER's DCO list. > > ChangeLog: > > * MAINTAINERS: Move my entry in DCO list into alphabetical order. > --- > MAINTAINERS | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/MAINTAINERS b/MAINTAINERS > index 3b4cf9d20d8..47b5915e9f8 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -917,6 +917,7 @@ information. > Juergen Christ > Robin Dapp > Robin Dapp > +Aldy Hernandez > Michal Jires > Matthias Kretz > Prathamesh Kulkarni > @@ -949,4 +950,3 @@ Jonathan Wakely > > Alexander Westbrooks > Chung-Ju Wu > Pengxuan Zheng > -Aldy Hernandez > -- > 2.43.0 >
[PATCH] tree-optimization/114855 - slow VRP due to equiv oracle queries
For the testcase in PR114855, VRP takes 320.41s (23%) (after mitigating the backwards threader slowness). This is mostly due to the bitmap check in equiv_oracle::find_equiv_dom. The following switches this bitmap to tree view, trading the linear search for an O(log N) one, which improves VRP time to 54.54s (5%).

Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK if that succeeds?

Thanks,
Richard.

	PR tree-optimization/114855
	* value-relation.cc (equiv_oracle::equiv_oracle): Switch
	m_equiv_set to tree view.
---
 gcc/value-relation.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc
index 45722fcd13a..d6ad2dd984f 100644
--- a/gcc/value-relation.cc
+++ b/gcc/value-relation.cc
@@ -321,6 +321,7 @@ equiv_oracle::equiv_oracle ()
   m_equiv.create (0);
   m_equiv.safe_grow_cleared (last_basic_block_for_fn (cfun) + 1);
   m_equiv_set = BITMAP_ALLOC (&m_bitmaps);
+  bitmap_tree_view (m_equiv_set);
   obstack_init (&m_chain_obstack);
   m_self_equiv.create (0);
   m_self_equiv.safe_grow_cleared (num_ssa_names + 1);
-- 
2.43.0
Re: [patch, fortran] Matmul and dot_product for unsigned
Hi Thomas, thanks for your answers. I am ok with the patch. - Andre On Mon, 23 Sep 2024 15:07:31 +0200 Thomas Koenig wrote: > Hello Andre and everybody else? > > Any more comments on the matmul patch? The other ones depend on > it, so I would like to commit (unless there are further > questions, of course). > > Best regards > > Thomas -- Andre Vehreschild * Email: vehre ad gmx dot de
RE: [PATCH v2] Widening-Mul: Fix one ICE for SAT_SUB matching operand checking
Thanks Richard for the comments.

> Since you're creating the call with op_0/op_1 shouldn't you _only_ check support
> for op_type operation and not lhs_type?

Yes, you are right. Checking the operand type makes much more sense to me. Let me update in v3.

Pan

-Original Message-
From: Richard Biener
Sent: Tuesday, September 24, 2024 3:42 PM
To: Li, Pan2
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2] Widening-Mul: Fix one ICE for SAT_SUB matching operand checking

On Tue, Sep 24, 2024 at 9:13 AM wrote: > > From: Pan Li > > This patch would like to fix the following ICE for -O2 -m32 of x86_64. > > during RTL pass: expand > JackMidiAsyncWaitQueue.cpp.cpp: In function 'void DequeueEvent(unsigned > int)': > JackMidiAsyncWaitQueue.cpp.cpp:3:6: internal compiler error: in > expand_fn_using_insn, at internal-fn.cc:263 > 3 | void DequeueEvent(unsigned frame) { > | ^~~~ > 0x27b580d diagnostic_context::diagnostic_impl(rich_location*, > diagnostic_metadata const*, diagnostic_option_id, char const*, > __va_list_tag (*) [1], diagnostic_t) > ???:0 > 0x27c4a3f internal_error(char const*, ...) > ???:0 > 0x27b3994 fancy_abort(char const*, int, char const*) > ???:0 > 0xf25ae5 expand_fn_using_insn(gcall*, insn_code, unsigned int, unsigned int) > ???:0 > 0xf2a124 expand_direct_optab_fn(internal_fn, gcall*, optab_tag, unsigned int) > ???:0 > 0xf2c87c expand_SAT_SUB(internal_fn, gcall*) > ???:0 > > We allowed the operand convert when matching SAT_SUB in match.pd, to support > the zip benchmark SAT_SUB pattern. Aka, > > (convert? (minus (convert1? @0) (convert1? @1))) for below sample code. > > void test (uint16_t *x, unsigned b, unsigned n) > { > unsigned a = 0; > register uint16_t *p = x; > > do { > a = *--p; > *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB > } while (--n); > } > > The pattern match for SAT_SUB itself may also act on below scalar sample > code too.
> > unsigned long long GetTimeFromFrames(int); > unsigned long long GetMicroSeconds(); > > void DequeueEvent(unsigned frame) { > long long frame_time = GetTimeFromFrames(frame); > unsigned long long current_time = GetMicroSeconds(); > DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time); > } > > Aka: > > uint32_t a = (uint32_t)SAT_SUB(uint64_t, uint64_t); > > Then there will be a problem when ia32 or -m32 is given when compiling. > Because we only check the lhs (aka uint32_t) type is supported by ifn > and missed the operand (aka uint64_t). Mostly DImode is disabled for > 32 bits target like ia32 or rv32gcv, and then trigger ICE when expanding. > > The below test suites are passed for this patch. > * The rv64gcv fully regression test. > * The x86 bootstrap test. > * The x86 fully regression test. > > PR middle-end/116814 > > gcc/ChangeLog: > > * tree-ssa-math-opts.cc (build_saturation_binary_arith_call): Add > ifn is_supported check for operand TREE type. > > gcc/testsuite/ChangeLog: > > * g++.dg/torture/pr116814-1.C: New test. > > Signed-off-by: Pan Li > --- > gcc/testsuite/g++.dg/torture/pr116814-1.C | 12 > gcc/tree-ssa-math-opts.cc | 23 +++ > 2 files changed, 27 insertions(+), 8 deletions(-) > create mode 100644 gcc/testsuite/g++.dg/torture/pr116814-1.C > > diff --git a/gcc/testsuite/g++.dg/torture/pr116814-1.C > b/gcc/testsuite/g++.dg/torture/pr116814-1.C > new file mode 100644 > index 000..dd6f29daa7c > --- /dev/null > +++ b/gcc/testsuite/g++.dg/torture/pr116814-1.C > @@ -0,0 +1,12 @@ > +/* { dg-do compile { target { ia32 } } } */ > +/* { dg-options "-O2" } */ > + > +unsigned long long GetTimeFromFrames(int); > +unsigned long long GetMicroSeconds(); > + > +void DequeueEvent(unsigned frame) { > + long long frame_time = GetTimeFromFrames(frame); > + unsigned long long current_time = GetMicroSeconds(); > + > + DequeueEvent(frame_time < current_time ? 
0 : frame_time - current_time); > +} > diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc > index d61668aacfc..361761cedef 100644 > --- a/gcc/tree-ssa-math-opts.cc > +++ b/gcc/tree-ssa-math-opts.cc > @@ -4042,15 +4042,22 @@ build_saturation_binary_arith_call > (gimple_stmt_iterator *gsi, gphi *phi, > internal_fn fn, tree lhs, tree op_0, > tree op_1) > { > - if (direct_internal_fn_supported_p (fn, TREE_TYPE (lhs), > OPTIMIZE_FOR_BOTH)) > -{ > - gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1); > - gimple_call_set_lhs (call, lhs); > - gsi_insert_before (gsi, call, GSI_SAME_STMT); > + tree lhs_type = TREE_TYPE (lhs); > + tree op_type = TREE_TYPE (op_0); > > - gimple_stmt_iterator psi = gsi_for_stmt (phi); > - remove_phi_node (&psi, /* releas
Re: [Fortran, Patch, PR84870, v1] Fix ICE and allocated memory not assigned correctly.
Hi Harald, thanks for the review. Committed as gcc-15-3825-gf5035d7d015 Thanks again, Andre On Mon, 23 Sep 2024 21:19:40 +0200 Harald Anlauf wrote: > Hi Andre, > > Am 19.09.24 um 16:01 schrieb Andre Vehreschild: > > Hi all, > > > > in PR84870 an ICE was reported, that has been fixed in the meantime by some > > other patch. Nevertheless did a testcase reveal that the memory handling > > still was not correct. I.e. the test case in the patch was answering 2 for > > both x.b.a and y.b.a which is not correct. > > > > For a coarray all memory is allocated using an array descriptor. For scalars > > just a temporary descriptor is created and handed to the caf-register > > routine. The error here was, that the memory now handed back in the > > temporary descriptor was not used for the memory in the component, thus the > > pointer in the component was not updated. The patch fixes this. > > > > Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline? > > this looks good to me. > > Thanks for the patch! > > Harald > > > Regards, > > Andre > > -- > > Andre Vehreschild * Email: vehre ad gmx dot de > -- Andre Vehreschild * Email: vehre ad gmx dot de
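The bug Andre describes can be modelled in a few lines of C. This is a minimal sketch that assumes a descriptor reduced to its data pointer. `struct scalar_desc`, `struct derived` and `caf_register_stub` are hypothetical stand-ins, not libgfortran's real coarray API. The registration routine hands memory back through a temporary descriptor. If that address is never copied into the component, the component keeps its old (null) pointer.

```c
#include <stdlib.h>

/* Hypothetical, heavily simplified descriptor: only the data pointer. */
struct scalar_desc {
    void *base_addr;
};

/* Hypothetical derived type with an allocatable scalar component. */
struct derived {
    int *b_a;
};

/* Stand-in for the caf-register routine: it allocates the memory and
   returns it through the (temporary) descriptor it was given. */
static void caf_register_stub(size_t size, struct scalar_desc *desc)
{
    desc->base_addr = calloc(1, size);
}

/* Buggy shape: the temporary descriptor receives the memory, but the
   component pointer is never updated from it (the memory also leaks). */
static void allocate_component_buggy(struct derived *x)
{
    struct scalar_desc tmp = { x->b_a };
    caf_register_stub(sizeof *x->b_a, &tmp);
}

/* Fixed shape: copy the address back out of the temporary descriptor. */
static void allocate_component_fixed(struct derived *x)
{
    struct scalar_desc tmp = { x->b_a };
    caf_register_stub(sizeof *x->b_a, &tmp);
    x->b_a = tmp.base_addr;
}
```

In the buggy shape, writes through the component never reach the registered memory. That matches the wrong values observed for x.b.a and y.b.a.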
[PATCH v2] Widening-Mul: Fix one ICE for SAT_SUB matching operand checking
From: Pan Li This patch would like to fix the following ICE for -O2 -m32 of x86_64. during RTL pass: expand JackMidiAsyncWaitQueue.cpp.cpp: In function 'void DequeueEvent(unsigned int)': JackMidiAsyncWaitQueue.cpp.cpp:3:6: internal compiler error: in expand_fn_using_insn, at internal-fn.cc:263 3 | void DequeueEvent(unsigned frame) { | ^~~~ 0x27b580d diagnostic_context::diagnostic_impl(rich_location*, diagnostic_metadata const*, diagnostic_option_id, char const*, __va_list_tag (*) [1], diagnostic_t) ???:0 0x27c4a3f internal_error(char const*, ...) ???:0 0x27b3994 fancy_abort(char const*, int, char const*) ???:0 0xf25ae5 expand_fn_using_insn(gcall*, insn_code, unsigned int, unsigned int) ???:0 0xf2a124 expand_direct_optab_fn(internal_fn, gcall*, optab_tag, unsigned int) ???:0 0xf2c87c expand_SAT_SUB(internal_fn, gcall*) ???:0 We allowed the operand convert when matching SAT_SUB in match.pd, to support the zip benchmark SAT_SUB pattern. Aka, (convert? (minus (convert1? @0) (convert1? @1))) for below sample code. void test (uint16_t *x, unsigned b, unsigned n) { unsigned a = 0; register uint16_t *p = x; do { a = *--p; *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB } while (--n); } The pattern match for SAT_SUB itself may also act on below scalar sample code too. unsigned long long GetTimeFromFrames(int); unsigned long long GetMicroSeconds(); void DequeueEvent(unsigned frame) { long long frame_time = GetTimeFromFrames(frame); unsigned long long current_time = GetMicroSeconds(); DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time); } Aka: uint32_t a = (uint32_t)SAT_SUB(uint64_t, uint64_t); Then there will be a problem when ia32 or -m32 is given when compiling. Because we only check the lhs (aka uint32_t) type is supported by ifn and missed the operand (aka uint64_t). Mostly DImode is disabled for 32 bits target like ia32 or rv32gcv, and then trigger ICE when expanding. The below test suites are passed for this patch. 
* The rv64gcv fully regression test. * The x86 bootstrap test. * The x86 fully regression test. PR middle-end/116814 gcc/ChangeLog: * tree-ssa-math-opts.cc (build_saturation_binary_arith_call): Add ifn is_supported check for operand TREE type. gcc/testsuite/ChangeLog: * g++.dg/torture/pr116814-1.C: New test. Signed-off-by: Pan Li --- gcc/testsuite/g++.dg/torture/pr116814-1.C | 12 gcc/tree-ssa-math-opts.cc | 23 +++ 2 files changed, 27 insertions(+), 8 deletions(-) create mode 100644 gcc/testsuite/g++.dg/torture/pr116814-1.C diff --git a/gcc/testsuite/g++.dg/torture/pr116814-1.C b/gcc/testsuite/g++.dg/torture/pr116814-1.C new file mode 100644 index 000..dd6f29daa7c --- /dev/null +++ b/gcc/testsuite/g++.dg/torture/pr116814-1.C @@ -0,0 +1,12 @@ +/* { dg-do compile { target { ia32 } } } */ +/* { dg-options "-O2" } */ + +unsigned long long GetTimeFromFrames(int); +unsigned long long GetMicroSeconds(); + +void DequeueEvent(unsigned frame) { + long long frame_time = GetTimeFromFrames(frame); + unsigned long long current_time = GetMicroSeconds(); + + DequeueEvent(frame_time < current_time ? 
0 : frame_time - current_time); +} diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc index d61668aacfc..361761cedef 100644 --- a/gcc/tree-ssa-math-opts.cc +++ b/gcc/tree-ssa-math-opts.cc @@ -4042,15 +4042,22 @@ build_saturation_binary_arith_call (gimple_stmt_iterator *gsi, gphi *phi, internal_fn fn, tree lhs, tree op_0, tree op_1) { - if (direct_internal_fn_supported_p (fn, TREE_TYPE (lhs), OPTIMIZE_FOR_BOTH)) -{ - gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1); - gimple_call_set_lhs (call, lhs); - gsi_insert_before (gsi, call, GSI_SAME_STMT); + tree lhs_type = TREE_TYPE (lhs); + tree op_type = TREE_TYPE (op_0); - gimple_stmt_iterator psi = gsi_for_stmt (phi); - remove_phi_node (&psi, /* release_lhs_p */ false); -} + if (!direct_internal_fn_supported_p (fn, lhs_type, OPTIMIZE_FOR_BOTH)) +return; + + if (lhs_type != op_type + && !direct_internal_fn_supported_p (fn, op_type, OPTIMIZE_FOR_BOTH)) +return; + + gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1); + gimple_call_set_lhs (call, lhs); + gsi_insert_before (gsi, call, GSI_SAME_STMT); + + gimple_stmt_iterator psi = gsi_for_stmt (phi); + remove_phi_node (&psi, /* release_lhs_p */ false); } /* -- 2.43.0
Re: [PATCH]middle-end: check explicitly for external or constants when checking for loop invariant [PR116817]
On Tue, 24 Sep 2024, Tamar Christina wrote: > > Can you explain how you get to see constant/external defs with > > a stmt_vec_info? That's somehow a violation of some inherent invariant in > > the vectorizer. > > I'm not sure I actually get any. It could be the condition is never hit > with a stmt_vec_info. I had assumed however since the condition is part > of a gimple_cond and if one of the arguments of the gimple_cond is loop > bound, that the condition would be analyzed too. > > So if you're saying you never get a stmt_vec_info for invariants at this > point (I assume you could see you see them in the corresponding slp > tree) then maybe checking for the stmt_vec_info is enough. > > However, when I was looking around for how to check for externals I > noticed other patterns also check for externals and constants. So I > assumed that you could indeed get them. You usually check that after doing vect_is_simple_use on the SSA name or constant which internally makes all stmts with a stmt_vec_info one of the internal def kinds. So I guess you could do vect_is_simple_use on 'var' as well and check the 'dt' it will populate. Richard. > Kind regards, > Tamar > > > > From: Richard Biener > Sent: Tuesday, September 24, 2024 7:45 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org ; nd ; > j...@ventanamicro.com > Subject: RE: [PATCH]middle-end: check explicitly for external or constants > when checking for loop invariant [PR116817] > > On Mon, 23 Sep 2024, Tamar Christina wrote: > > > I had made the condition to strict before, here's an updated patch: > > > > Hi All, > > > > The previous check if a value was external was checking > > !vect_get_internal_def (vinfo, var) but this of course isn't completely > > right > > as they could reductions etc. > > > > This changes the check to just explicitly look at externals and constants. > > Note that reductions remain unhandled here, but we don't support codegen of > > boolean reductions today anyway.
> > Can you explain how you get to see constant/external defs with a > stmt_vec_info? That's somehow a violation of some inherent > invariant in the vectorizer. > > Richard. > > > So at the time we do then this would have the be handled as well in > > lowering. > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf, > > x86_64-pc-linux-gnu -m32, -m64 and no issues. > > > > Ok for master? > > > > Thanks, > > Tamar > > > > gcc/ChangeLog: > > > >PR tree-optimization/116817 > >* tree-vect-patterns.cc (vect_recog_bool_pattern): Check for const or > >externals. > > > > gcc/testsuite/ChangeLog: > > > >PR tree-optimization/116817 > >* g++.dg/vect/pr116817.cc: New test. > > > > -- inline copy of patch -- > > > > diff --git a/gcc/testsuite/g++.dg/vect/pr116817.cc > > b/gcc/testsuite/g++.dg/vect/pr116817.cc > > new file mode 100644 > > index > > ..7e28982fb138c24f956aedb03fa454d9d858 > > --- /dev/null > > +++ b/gcc/testsuite/g++.dg/vect/pr116817.cc > > @@ -0,0 +1,16 @@ > > +/* { dg-do compile } */ > > +/* { dg-additional-options "-O3" } */ > > + > > +int main_ulData0; > > +unsigned *main_pSrcBuffer; > > +int main(void) { > > + int iSrc = 0; > > + bool bData0; > > + for (; iSrc < 4; iSrc++) { > > +if (bData0) > > + main_pSrcBuffer[iSrc] = main_ulData0; > > +else > > + main_pSrcBuffer[iSrc] = 0; > > +bData0 = !bData0; > > + } > > +} > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc > > index > > e7e877dd2adb55262822f1660f8d92b42d44e6d0..f0298b2ab97a1e7dd0d943340e1389c3c0fa796e > > 100644 > > --- a/gcc/tree-vect-patterns.cc > > +++ b/gcc/tree-vect-patterns.cc > > @@ -6062,12 +6062,15 @@ vect_recog_bool_pattern (vec_info *vinfo, > >if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE) > >return NULL; > > > > + stmt_vec_info var_def_info = vinfo->lookup_def (var); > >if (check_bool_pattern (var, vinfo, bool_stmts)) > >var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo); > >else if (integer_type_for_mask (var, vinfo)) 
> >return NULL; > >else if (TREE_CODE (TREE_TYPE (var)) == BOOLEAN_TYPE > > -&& !vect_get_internal_def (vinfo, var)) > > +&& (!var_def_info > > +|| STMT_VINFO_DEF_TYPE (var_def_info) == vect_external_def > > +|| STMT_VINFO_DEF_TYPE (var_def_info) == > > vect_constant_def)) > >{ > > /* If the condition is already a boolean then manually convert it > > to a > > mask of the given integer type but don't set a vectype. */ > > > > -- > Richard Biener > SUSE Software Solutions Germany GmbH, > Frankenstrasse 146, 90461 Nuernberg, Germany; > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg) > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146,
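The difference between the old and new checks in vect_recog_bool_pattern reduces to a predicate on the definition kind. Below is a simplified C model of it. The enum is a reduced, hypothetical copy of the vectorizer's def kinds, and DEF_NONE stands for "no stmt_vec_info was found". The old `!vect_get_internal_def (...)` test accepts every non-internal kind, reductions included. The patched test accepts only a missing stmt_vec_info, externals and constants. (Richard's suggestion would derive the kind from the `dt` that vect_is_simple_use fills in rather than from lookup_def; the model below does not show that.)

```c
#include <stdbool.h>

/* Reduced, hypothetical copy of the vectorizer's def kinds. */
enum def_kind {
    DEF_NONE,      /* no stmt_vec_info for the SSA name */
    DEF_CONSTANT,
    DEF_EXTERNAL,
    DEF_INTERNAL,
    DEF_REDUCTION
};

/* Old test: "not an internal def", which also accepts reductions. */
static bool treat_as_invariant_old(enum def_kind k)
{
    return k != DEF_INTERNAL;
}

/* Patched test: only a missing stmt_vec_info, externals and constants. */
static bool treat_as_invariant_new(enum def_kind k)
{
    return k == DEF_NONE || k == DEF_EXTERNAL || k == DEF_CONSTANT;
}
```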
[PATCH v3] Widening-Mul: Fix one ICE for SAT_SUB matching operand checking
From: Pan Li This patch would like to fix the following ICE for -O2 -m32 of x86_64. during RTL pass: expand JackMidiAsyncWaitQueue.cpp.cpp: In function 'void DequeueEvent(unsigned int)': JackMidiAsyncWaitQueue.cpp.cpp:3:6: internal compiler error: in expand_fn_using_insn, at internal-fn.cc:263 3 | void DequeueEvent(unsigned frame) { | ^~~~ 0x27b580d diagnostic_context::diagnostic_impl(rich_location*, diagnostic_metadata const*, diagnostic_option_id, char const*, __va_list_tag (*) [1], diagnostic_t) ???:0 0x27c4a3f internal_error(char const*, ...) ???:0 0x27b3994 fancy_abort(char const*, int, char const*) ???:0 0xf25ae5 expand_fn_using_insn(gcall*, insn_code, unsigned int, unsigned int) ???:0 0xf2a124 expand_direct_optab_fn(internal_fn, gcall*, optab_tag, unsigned int) ???:0 0xf2c87c expand_SAT_SUB(internal_fn, gcall*) ???:0 We allowed the operand convert when matching SAT_SUB in match.pd, to support the zip benchmark SAT_SUB pattern. Aka, (convert? (minus (convert1? @0) (convert1? @1))) for below sample code. void test (uint16_t *x, unsigned b, unsigned n) { unsigned a = 0; register uint16_t *p = x; do { a = *--p; *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB } while (--n); } The pattern match for SAT_SUB itself may also act on below scalar sample code too. unsigned long long GetTimeFromFrames(int); unsigned long long GetMicroSeconds(); void DequeueEvent(unsigned frame) { long long frame_time = GetTimeFromFrames(frame); unsigned long long current_time = GetMicroSeconds(); DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time); } Aka: uint32_t a = (uint32_t)SAT_SUB(uint64_t, uint64_t); Then there will be a problem when ia32 or -m32 is given when compiling. Because we only check the lhs (aka uint32_t) type is supported by ifn instead of the operand (aka uint64_t). Mostly DImode is disabled for 32 bits target like ia32 or rv32gcv, and then trigger ICE when expanding. The below test suites are passed for this patch. 
* The rv64gcv fully regression test. * The x86 bootstrap test. * The x86 fully regression test. PR middle-end/116814 gcc/ChangeLog: * tree-ssa-math-opts.cc (build_saturation_binary_arith_call): Make ifn is_supported type check based on operand instead of lhs. gcc/testsuite/ChangeLog: * g++.dg/torture/pr116814-1.C: New test. Signed-off-by: Pan Li --- gcc/testsuite/g++.dg/torture/pr116814-1.C | 12 gcc/tree-ssa-math-opts.cc | 2 +- 2 files changed, 13 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/torture/pr116814-1.C diff --git a/gcc/testsuite/g++.dg/torture/pr116814-1.C b/gcc/testsuite/g++.dg/torture/pr116814-1.C new file mode 100644 index 000..dd6f29daa7c --- /dev/null +++ b/gcc/testsuite/g++.dg/torture/pr116814-1.C @@ -0,0 +1,12 @@ +/* { dg-do compile { target { ia32 } } } */ +/* { dg-options "-O2" } */ + +unsigned long long GetTimeFromFrames(int); +unsigned long long GetMicroSeconds(); + +void DequeueEvent(unsigned frame) { + long long frame_time = GetTimeFromFrames(frame); + unsigned long long current_time = GetMicroSeconds(); + + DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time); +} diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc index d61668aacfc..8c622514dbd 100644 --- a/gcc/tree-ssa-math-opts.cc +++ b/gcc/tree-ssa-math-opts.cc @@ -4042,7 +4042,7 @@ build_saturation_binary_arith_call (gimple_stmt_iterator *gsi, gphi *phi, internal_fn fn, tree lhs, tree op_0, tree op_1) { - if (direct_internal_fn_supported_p (fn, TREE_TYPE (lhs), OPTIMIZE_FOR_BOTH)) + if (direct_internal_fn_supported_p (fn, TREE_TYPE (op_0), OPTIMIZE_FOR_BOTH)) { gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1); gimple_call_set_lhs (call, lhs); -- 2.43.0
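A scalar reference model shows why the operand type is the one that must be supported. In `uint32_t a = (uint32_t)SAT_SUB(uint64_t, uint64_t)`, the saturating subtraction happens in 64 bits and only the result is truncated. Doing the subtraction in 32 bits gives a different answer. The helper names below are illustrative only; they are not GCC internals.

```c
#include <stdint.h>

/* Unsigned saturating subtraction: a - b, clamped at 0. */
static uint64_t sat_sub_u64(uint64_t a, uint64_t b)
{
    return a >= b ? a - b : 0;
}

static uint32_t sat_sub_u32(uint32_t a, uint32_t b)
{
    return a >= b ? a - b : 0;
}

/* Shape matched in PR116814: saturate in the operand type (64 bits),
   then truncate the result to the 32-bit lhs. */
static uint32_t truncated_wide_sat_sub(uint64_t a, uint64_t b)
{
    return (uint32_t)sat_sub_u64(a, b);
}
```

Take a = 2^32 and b = 1. The 64-bit form yields 0xFFFFFFFF after truncation, while saturating on the truncated 32-bit operands yields 0. The IFN therefore needs DImode support, which ia32 lacks.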
[PATCH] Fix bogus SLP nvector compute in check_load_store_for_partial_vectors
We have a new overload for vect_get_num_copies that handles both SLP and non-SLP. Use it. Bootstrap and regtest running on x86_64-unknown-linux-gnu. * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Use the new vect_get_num_copies overload. --- gcc/tree-vect-stmts.cc | 8 +--- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index cafcedb7b9e..f7867c0803b 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -1507,13 +1507,7 @@ check_load_store_for_partial_vectors (loop_vec_info loop_vinfo, tree vectype, if (memory_access_type == VMAT_INVARIANT) return; - unsigned int nvectors; - if (slp_node) -/* ??? Incorrect for multi-lane lanes. */ -nvectors = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) / group_size; - else -nvectors = vect_get_num_copies (loop_vinfo, vectype); - + unsigned int nvectors = vect_get_num_copies (loop_vinfo, slp_node, vectype); vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo); vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo); machine_mode vecmode = TYPE_MODE (vectype); -- 2.43.0
[PATCH] MATCH: Simplify `(trunc)copysign ((extend)x, CST)` to `copysign (x, -1.0/1.0)` [PR112472]
This patch simplifies `(trunc)copysign ((extend)x, CST)` to `copysign (x, -1.0/1.0)` depending on the sign of CST. Previously, it was simplified to `copysign (x, CST)`. This is valid because only the sign of CST matters, not its value. The patch also simplifies `(trunc)abs (extend x)` to `abs (x)`. PR tree-optimization/112472 gcc/ChangeLog: * match.pd ((trunc)copysign ((extend)x, -CST) --> copysign (x, -1.0)): New pattern. ((trunc)abs (extend x) --> abs (x)): New pattern. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/pr112472.c: New test. --- gcc/match.pd | 24 +++- gcc/testsuite/gcc.dg/tree-ssa/pr112472.c | 22 ++ 2 files changed, 45 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr112472.c diff --git a/gcc/match.pd b/gcc/match.pd index 940292d0d49..52dc8b539fc 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -8535,7 +8535,29 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@2)) && direct_internal_fn_supported_p (IFN_COPYSIGN, type, OPTIMIZE_FOR_BOTH)) -(IFN_COPYSIGN @0 @1 +(IFN_COPYSIGN @0 @1))) + /* (trunc)copysign (extend)x, CST) to copysign (x, -1.0/1.0) */ + (simplify + (convert (copysigns (convert@2 @0) REAL_CST@1)) + (if (optimize + && !HONOR_SNANS (@2) + && types_match (type, TREE_TYPE (@0)) + && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@2)) + && direct_internal_fn_supported_p (IFN_COPYSIGN, + type, OPTIMIZE_FOR_BOTH)) +(if (REAL_VALUE_NEGATIVE (TREE_REAL_CST (@1))) + (IFN_COPYSIGN @0 { build_minus_one_cst (type); }) + (IFN_COPYSIGN @0 { build_one_cst (type); }) + +/* (trunc)abs (extend x) --> abs (x) + x is a float value */ +(simplify + (convert (abs (convert@1 @0))) + (if (optimize + && !HONOR_SNANS (@1) + && types_match (type, TREE_TYPE (@0)) + && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@1))) + (abs @0))) (for froms (BUILT_IN_FMAF BUILT_IN_FMA BUILT_IN_FMAL) tos (IFN_FMA IFN_FMA IFN_FMA) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr112472.c
b/gcc/testsuite/gcc.dg/tree-ssa/pr112472.c new file mode 100644 index 000..8f97278ffe8 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr112472.c @@ -0,0 +1,22 @@ +/* PR tree-optimization/109878 */ +/* { dg-do compile } */ +/* { dg-options "-O1 -fdump-tree-optimized" } */ + +/* Optimized to .COPYSIGN(a, -1.0e+0) */ +float f(float a) +{ + return (float)__builtin_copysign(a, -3.0); +} + +/* This gets converted to (float) abs((double) a) + With the patch it is optimized to abs(a) */ +float f2(float a) +{ + return (float)__builtin_copysign(a, 5.0); +} + +/* { dg-final { scan-tree-dump-not "= __builtin_copysign" "optimized" } } */ +/* { dg-final { scan-tree-dump-not " double " "optimized" { target ifn_copysign } } } */ +/* { dg-final { scan-tree-dump-times ".COPYSIGN" 1 "optimized" { target ifn_copysign } } } */ +/* { dg-final { scan-tree-dump-times "-1.0e\\+0" 1 "optimized" { target ifn_copysign } } } */ +/* { dg-final { scan-tree-dump-times " ABS_EXPR " 1 "optimized" { target ifn_copysign } } } */ -- 2.17.1
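The following C sketch checks the two identities behind these patterns for finite inputs. It assumes GCC or Clang, because it uses the `__builtin_copysign`/`__builtin_fabs` family. Widening a float to double and narrowing it back is exact, and copysign only transfers the sign bit. So the magnitude of CST never matters, and a narrow copysign with -1.0f or 1.0f gives the same result. NaNs (and the sNaN restriction the patterns guard with HONOR_SNANS) are not exercised here. The helper names are hypothetical and do not come from the patch.

```c
#include <math.h>     /* signbit */
#include <stdbool.h>

/* Equality that also distinguishes +0.0f from -0.0f. */
static bool same_float(float a, float b)
{
    return a == b && (signbit(a) != 0) == (signbit(b) != 0);
}

/* Before the fold: widen, copysign in double, narrow back. */
static float copysign_before(float x, double cst)
{
    return (float)__builtin_copysign((double)x, cst);
}

/* After the fold: only the sign of CST survives, as -1.0f or 1.0f. */
static float copysign_after(float x, double cst)
{
    return __builtin_copysignf(x, signbit(cst) ? -1.0f : 1.0f);
}

/* (trunc)abs (extend x) versus abs (x). */
static float abs_before(float x) { return (float)__builtin_fabs((double)x); }
static float abs_after(float x) { return __builtin_fabsf(x); }
```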
Re: [Patch] OpenMP: Add support for 'self_maps' to the 'require' directive
Hi all, now committed as r15-3822-gb752eed3e3f2f2, see attachment. I fixed one C/C++ test issue (missing 's') and added the Fortran module check. Tobias PS: I noticed that 'declare target' does not add the target-used flag. At least TR13 is very clear that it counts, but currently GCC does not regard this (with a FIXME check spec note.) This needs to be fixed eventually. PPS: Old discussion: Andre Vehreschild: Hi Tobias, to my eye this looks fine. I would appreciate, if you could add some tests for errors on the fortran side, esp. where modules are involved. But no must. Ok for mainline. Thanks for the patch. - Andre On Sat, 21 Sep 2024 23:37:33 +0200 Tobias Burnus wrote: Add support of the 'self_maps' clause in 'omp requires', an OpenMP 6 feature but added here mostly as part of the on-going improvement of the unified-shared memory (USM) handling. Comments, remarks, concerns before I commit it? * * * Regarding USM, there is on one hand the hardware: - some hardware cannot access the host memory at all - other hardware can access it, but either only through an interconnect or via page migration on page fault - on the third type of hardware, a host and device share the same memory controller For the latter, a 'map' never makes sense, but for the second case, it depends on the details whether it is better to do mapping or directly accessing the memory (i.e. via interconnect or page migration). On the compile-time side, the user can demand: - no requirement - 'requires unified_shared_memory' (= memory has to be accessible but the implementation can still do mapping for explicit maps) - 'requires shared_memory' - mapping is strictly not permitted. - other hints using compiler flags And at runtime, what is done depends on the actual hardware, the compile-time wishes, and environment variables. * * * Currently, the runtime never maps with USM, i.e. both act the same.
At least using an environment variable, I would consider enabling mapping - one could also consider having it always do mappings, except for self_maps. On the compile side, we need to handle implicit 'declare target' better - as it currently leads to separate memory. Using 'link', we could point to the host memory (at least for 'self_maps'). And before we can enable USM by default for integrated/APU devices, we need to solve some issues with 'link' (→ posted link) and for those, 'map' has to be honored. Those are 5.x follow up tasks, but having 'self_maps' available completes the what-does-the-user-want part. Tobias PS: There is also the 'self' modifier to the map clause, working on a per-variable granularity. However, this, like several other 6.0 items, is completely out of scope of the current USM work. PPS: See also https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663209.html and the associated patch set, posted at https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655946.html commit b752eed3e3f2f27570ea89b7c2339468698472a8 Author: Tobias Burnus Date: Tue Sep 24 10:53:59 2024 +0200 OpenMP: Add support for 'self_maps' to the 'require' directive 'self_maps' implies 'unified_shared_memory', except that the latter also permits that explicit maps copy data to device memory while self_maps does not. In GCC, currently, both are handled identical. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_requires): Handle self_maps clause. gcc/cp/ChangeLog: * parser.cc (cp_parser_omp_requires): Handle self_maps clause. gcc/fortran/ChangeLog: * gfortran.h (enum gfc_omp_requires_kind): Add OMP_REQ_SELF_MAPS. (gfc_namespace): Enlarge omp_requires bitfield. * module.cc (enum ab_attribute, attr_bits): Add AB_OMP_REQ_SELF_MAPS. (mio_symbol_attribute): Handle it. * openmp.cc (gfc_check_omp_requires, gfc_match_omp_requires): Handle self_maps clause. * parse.cc (gfc_parse_file): Handle self_maps clause.
gcc/ChangeLog: * lto-cgraph.cc (output_offload_tables, omp_requires_to_name): Handle self_maps clause. * omp-general.cc (struct omp_ts_info, omp_context_selector_matches): Likewise for the associated trait. * omp-general.h (enum omp_requires): Add OMP_REQUIRES_SELF_MAPS. * omp-selectors.h (enum omp_ts_code): Add OMP_TRAIT_IMPLEMENTATION_SELF_MAPS. include/ChangeLog: * gomp-constants.h (GOMP_REQUIRES_SELF_MAPS): #define. libgomp/ChangeLog: * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Accept self_maps clause. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Likewise. * libgomp.texi (TR13 Impl. Status): Set to 'Y'. * target.c (gomp_requires_to_name, GOMP_offload_register_ver, gomp_target_init): Handle sel
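Here is a minimal sketch of the user-facing side. It assumes a compiler that accepts the OpenMP 6.0 `self_maps` clause (GCC 15 or later per the commit above). The directive must precede any target construct in the translation unit. Under self_maps, the explicit map clauses do not create device copies, and the device accesses `v` and `s` in host memory. Without `-fopenmp` the pragmas are ignored and the loop simply runs serially on the host. The function name `sum_on_device` is illustrative only.

```c
/* Must precede any target construct in this translation unit. */
#pragma omp requires self_maps

/* With self_maps, the maps below do not copy data to device memory;
   the device is required to work on the host copies directly. */
static int sum_on_device(const int *v, int n)
{
    int s = 0;
#pragma omp target map(to : v[0:n]) map(tofrom : s)
#pragma omp teams distribute parallel for reduction(+ : s)
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}
```

If no available device satisfies the requirement, libgomp is expected to fall back to host execution, so the result is the same either way.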
Re: [PATCH v2] Widening-Mul: Fix one ICE for SAT_SUB matching operand checking
On Tue, Sep 24, 2024 at 9:13 AM wrote: > > From: Pan Li > > This patch would like to fix the following ICE for -O2 -m32 of x86_64. > > during RTL pass: expand > JackMidiAsyncWaitQueue.cpp.cpp: In function 'void DequeueEvent(unsigned > int)': > JackMidiAsyncWaitQueue.cpp.cpp:3:6: internal compiler error: in > expand_fn_using_insn, at internal-fn.cc:263 > 3 | void DequeueEvent(unsigned frame) { > | ^~~~ > 0x27b580d diagnostic_context::diagnostic_impl(rich_location*, > diagnostic_metadata const*, diagnostic_option_id, char const*, > __va_list_tag (*) [1], diagnostic_t) > ???:0 > 0x27c4a3f internal_error(char const*, ...) > ???:0 > 0x27b3994 fancy_abort(char const*, int, char const*) > ???:0 > 0xf25ae5 expand_fn_using_insn(gcall*, insn_code, unsigned int, unsigned int) > ???:0 > 0xf2a124 expand_direct_optab_fn(internal_fn, gcall*, optab_tag, unsigned int) > ???:0 > 0xf2c87c expand_SAT_SUB(internal_fn, gcall*) > ???:0 > > We allowed the operand convert when matching SAT_SUB in match.pd, to support > the zip benchmark SAT_SUB pattern. Aka, > > (convert? (minus (convert1? @0) (convert1? @1))) for below sample code. > > void test (uint16_t *x, unsigned b, unsigned n) > { > unsigned a = 0; > register uint16_t *p = x; > > do { > a = *--p; > *p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB > } while (--n); > } > > The pattern match for SAT_SUB itself may also act on below scalar sample > code too. > > unsigned long long GetTimeFromFrames(int); > unsigned long long GetMicroSeconds(); > > void DequeueEvent(unsigned frame) { > long long frame_time = GetTimeFromFrames(frame); > unsigned long long current_time = GetMicroSeconds(); > DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time); > } > > Aka: > > uint32_t a = (uint32_t)SAT_SUB(uint64_t, uint64_t); > > Then there will be a problem when ia32 or -m32 is given when compiling. > Because we only check the lhs (aka uint32_t) type is supported by ifn > and missed the operand (aka uint64_t). 
Mostly DImode is disabled for > 32 bits target like ia32 or rv32gcv, and then trigger ICE when expanding. > > The below test suites are passed for this patch. > * The rv64gcv fully regression test. > * The x86 bootstrap test. > * The x86 fully regression test. > > PR middle-end/116814 > > gcc/ChangeLog: > > * tree-ssa-math-opts.cc (build_saturation_binary_arith_call): Add > ifn is_supported check for operand TREE type. > > gcc/testsuite/ChangeLog: > > * g++.dg/torture/pr116814-1.C: New test. > > Signed-off-by: Pan Li > --- > gcc/testsuite/g++.dg/torture/pr116814-1.C | 12 > gcc/tree-ssa-math-opts.cc | 23 +++ > 2 files changed, 27 insertions(+), 8 deletions(-) > create mode 100644 gcc/testsuite/g++.dg/torture/pr116814-1.C > > diff --git a/gcc/testsuite/g++.dg/torture/pr116814-1.C > b/gcc/testsuite/g++.dg/torture/pr116814-1.C > new file mode 100644 > index 000..dd6f29daa7c > --- /dev/null > +++ b/gcc/testsuite/g++.dg/torture/pr116814-1.C > @@ -0,0 +1,12 @@ > +/* { dg-do compile { target { ia32 } } } */ > +/* { dg-options "-O2" } */ > + > +unsigned long long GetTimeFromFrames(int); > +unsigned long long GetMicroSeconds(); > + > +void DequeueEvent(unsigned frame) { > + long long frame_time = GetTimeFromFrames(frame); > + unsigned long long current_time = GetMicroSeconds(); > + > + DequeueEvent(frame_time < current_time ? 
0 : frame_time - current_time); > +} > diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc > index d61668aacfc..361761cedef 100644 > --- a/gcc/tree-ssa-math-opts.cc > +++ b/gcc/tree-ssa-math-opts.cc > @@ -4042,15 +4042,22 @@ build_saturation_binary_arith_call > (gimple_stmt_iterator *gsi, gphi *phi, > internal_fn fn, tree lhs, tree op_0, > tree op_1) > { > - if (direct_internal_fn_supported_p (fn, TREE_TYPE (lhs), > OPTIMIZE_FOR_BOTH)) > -{ > - gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1); > - gimple_call_set_lhs (call, lhs); > - gsi_insert_before (gsi, call, GSI_SAME_STMT); > + tree lhs_type = TREE_TYPE (lhs); > + tree op_type = TREE_TYPE (op_0); > > - gimple_stmt_iterator psi = gsi_for_stmt (phi); > - remove_phi_node (&psi, /* release_lhs_p */ false); > -} > + if (!direct_internal_fn_supported_p (fn, lhs_type, OPTIMIZE_FOR_BOTH)) > +return; > + > + if (lhs_type != op_type > + && !direct_internal_fn_supported_p (fn, op_type, OPTIMIZE_FOR_BOTH)) > +return; > + > + gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1); Since you're creating the call with op_0/op_1 shouldn't you _only_ check support for op_type operation and not lhs_type? Thanks, Richard. > + gimple_call_set_lhs (call, lhs); > + gsi_insert_before (gsi, call, GSI_SAME_STMT); > + > + gimple_stmt_iterator psi = gsi_for_st
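Richard's point can be made concrete with a toy model. It assumes a hypothetical target that provides SAT_SUB only for modes no wider than its word size, roughly the ia32 situation where DImode is unavailable. `target_supports_sat_sub`, `can_emit_lhs_check` and `can_emit_op_check` are illustrative names, not GCC functions.

```c
#include <stdbool.h>

/* Hypothetical target: SAT_SUB exists only for modes up to the word size. */
static bool target_supports_sat_sub(unsigned bits, unsigned word_bits)
{
    return bits <= word_bits;
}

/* Original check: only the lhs type (uint32_t in the PR) is queried. */
static bool can_emit_lhs_check(unsigned lhs_bits, unsigned op_bits,
                               unsigned word_bits)
{
    (void)op_bits; /* the operand width is ignored: this is the bug */
    return target_supports_sat_sub(lhs_bits, word_bits);
}

/* Suggested check: the call computes in the operand type (uint64_t),
   so that is the type whose support matters. */
static bool can_emit_op_check(unsigned lhs_bits, unsigned op_bits,
                              unsigned word_bits)
{
    (void)lhs_bits;
    return target_supports_sat_sub(op_bits, word_bits);
}
```

On the 32-bit configuration, the lhs-only check accepts a 64-bit SAT_SUB that the target cannot expand, which corresponds to the ICE in expand_fn_using_insn. The operand-only check rejects it.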
[PATCH] Simplify range-op shift mask generation
The following reduces the number of wide_ints built which show up in the profile for PR114855 as the largest remaining bit at -O1. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. * range-op.cc (operator_rshift::op1_range): Use wi::mask instead of shift and not. --- gcc/range-op.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/range-op.cc b/gcc/range-op.cc index c576f688221..3f5cf083440 100644 --- a/gcc/range-op.cc +++ b/gcc/range-op.cc @@ -2863,7 +2863,7 @@ operator_rshift::op1_range (irange &r, // OP1 is anything from 0011 1000 to 0011 . That is, a // range from LHS<<3 plus a mask of the 3 bits we shifted on the // right hand side (0x07). - wide_int mask = wi::bit_not (wi::lshift (wi::minus_one (prec), shift)); + wide_int mask = wi::mask (shift.to_uhwi (), false, prec); int_range_max mask_range (type, wi::zero (TYPE_PRECISION (type)), mask); -- 2.43.0
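The two mask constructions compute the same value. The sketch below uses plain 64-bit integers, assuming a fixed precision of 64 and 0 <= shift < 64, instead of wide_int. `~(-1 << shift)` and "the low SHIFT bits set" (what the patch gets from wi::mask) agree. The mask then gives the op1 range of an unsigned right shift. For example, with SHIFT = 3 and LHS = 7, OP1 lies in [0x38, 0x3f]. All function names are hypothetical.

```c
#include <stdint.h>

/* Old construction: bit_not (lshift (minus_one, shift)).
   Assumes 0 <= shift < 64. */
static uint64_t mask_via_shift_not(unsigned shift)
{
    return ~(UINT64_MAX << shift);
}

/* New construction: a value with only the low SHIFT bits set, analogous
   to wi::mask (shift, false, prec) at a 64-bit precision. */
static uint64_t mask_low_bits(unsigned shift)
{
    return shift == 0 ? 0 : UINT64_MAX >> (64 - shift);
}

/* LHS = OP1 >> SHIFT (unsigned) implies
   OP1 in [LHS << SHIFT, (LHS << SHIFT) | mask].
   Assumes LHS << SHIFT does not overflow. */
static void rshift_op1_range(uint64_t lhs, unsigned shift,
                             uint64_t *lo, uint64_t *hi)
{
    *lo = lhs << shift;
    *hi = *lo | mask_low_bits(shift);
}
```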
[PATCH 3/3] phiprop: VOP phi confuses phiprop [PR116824]
Another small phiprop improvement: in some cases the statement defining the VOP can be a PHI in the same bb as the load. This is OK since the PHI is not a store, so we can just accept it. Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/116824 gcc/ChangeLog: * tree-ssa-phiprop.cc (propagate_with_phi): Don't reject if the bb of the def_stmt is the same as load and if the def_stmt was a phi. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/phiprop-3.c: New test. Signed-off-by: Andrew Pinski --- gcc/testsuite/gcc.dg/tree-ssa/phiprop-3.c | 30 +++ gcc/tree-ssa-phiprop.cc | 3 ++- 2 files changed, 32 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phiprop-3.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phiprop-3.c b/gcc/testsuite/gcc.dg/tree-ssa/phiprop-3.c new file mode 100644 index 000..a0d5891dc60 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/phiprop-3.c @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-options "-O1 -fdump-tree-phiprop1-details -fdump-tree-cselim-details -fdump-tree-phiopt2" } */ + +/* PR tree-optimization/116824 */ +/* phiprop should be able to handle the case where the vops defining + statement was a phi in the same bb as the deference.
*/ + +int g(int i, int *tt) +{ + const int t = 10; + const int *a; + { +if (t < i) +{ + *tt = 1; + a = &t; +} +else +{ + *tt = 1; + a = &i; +} + } + return *a; +} + +/* { dg-final { scan-tree-dump-times "Inserting PHI for result of load" 1 "phiprop1"} } */ +/* { dg-final { scan-tree-dump-times "factoring out stores" 1 "cselim"} } */ +/* { dg-final { scan-tree-dump-times "MIN_EXPR" 1 "phiopt2"} } */ + diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc index f04990e8cb4..4d1df7d351e 100644 --- a/gcc/tree-ssa-phiprop.cc +++ b/gcc/tree-ssa-phiprop.cc @@ -401,7 +401,8 @@ propagate_with_phi (basic_block bb, gphi *phi, struct phiprop_d *phivn, def_stmt = SSA_NAME_DEF_STMT (vuse); } if (!SSA_NAME_IS_DEFAULT_DEF (vuse) - && (gimple_bb (def_stmt) == bb + && ((gimple_bb (def_stmt) == bb + && !is_a(def_stmt)) || (gimple_bb (def_stmt) && !dominated_by_p (CDI_DOMINATORS, bb, gimple_bb (def_stmt) -- 2.43.0
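The dg-final lines of phiprop-3.c describe the expected pipeline. First phiprop inserts a PHI for the result of the load, then cselim factors out the two stores, and finally phiopt2 produces a MIN_EXPR. The C sketch below writes the start and end shapes as two functions with the same behaviour. The second function is a hand-written illustration of the intended result, not actual compiler output.

```c
/* Shape of the phiprop-3.c testcase: the returned value is loaded
   through a pointer that is a PHI of two addresses. */
static int load_through_phi(int i, int *tt)
{
    const int t = 10;
    const int *a;
    if (t < i) {
        *tt = 1;
        a = &t;
    } else {
        *tt = 1;
        a = &i;
    }
    return *a;
}

/* Intended end result: the load is replaced by a PHI of the loaded
   values (t or i), the identical stores are factored out, and the
   remaining select becomes a minimum. */
static int after_phiprop_cselim_phiopt(int i, int *tt)
{
    *tt = 1;
    return i < 10 ? i : 10;
}
```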
[PATCH 1/3] Add an alternative testcase for PR 70740
While looking into improving phiprop, I noticed that the current pr70740.c testcase was being optimized almost all the way before phiprop because the addresses were considered the same; the arrays were all zero in size. This adds an alternative testcase that changes the array sizes to 1, so phiprop can and will act on it, and the fix the original testcase was meant to exercise is now actually tested. Tested on x86_64-linux-gnu. PR 70740 gcc/testsuite/ChangeLog: * gcc.dg/torture/pr70740-1.c: New test. Signed-off-by: Andrew Pinski --- gcc/testsuite/gcc.dg/torture/pr70740-1.c | 41 1 file changed, 41 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/torture/pr70740-1.c diff --git a/gcc/testsuite/gcc.dg/torture/pr70740-1.c b/gcc/testsuite/gcc.dg/torture/pr70740-1.c new file mode 100644 index 000..77e6a2d7187 --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/pr70740-1.c @@ -0,0 +1,41 @@ +/* { dg-do compile } */ + +/* This is an alternative to the original pr70740.c testcase, + arrays are now 1 in size where they were 0 in the other testcase. */ + +extern int foo (void); +extern void *memcpy (void *, const void *, __SIZE_TYPE__); + +struct +{ + char a[6]; +} d; +struct +{ + int a1[1]; + int a2[1]; + int a3[1]; + int a4[1]; +} a, c; +int b; + +int * +bar () +{ + if (b) +return a.a4; + return a.a2; +} + +void +baz () +{ + int *e, *f; + if (foo ()) +e = c.a3; + else +e = c.a1; + memcpy (d.a, e, 6); + f = bar (); + memcpy (d.a, f, 1); +} -- 2.43.0
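The reason the original testcase folded early can be seen from member offsets. With GNU C zero-length arrays, every member sits at the same offset, so `a.a4` and `a.a2` have equal addresses. The compiler can then treat both arms of `bar` as the same pointer. With one element each, the members are distinct. This is a small sketch assuming GCC or Clang (zero-length arrays are a GNU extension); the struct names are hypothetical.

```c
#include <stddef.h>

/* GNU C zero-length arrays: all members share one offset. */
struct zero_sized {
    int a1[0];
    int a2[0];
    int a3[0];
    int a4[0];
};

/* Layout used by pr70740-1.c: one element each, so offsets differ. */
struct one_sized {
    int a1[1];
    int a2[1];
    int a3[1];
    int a4[1];
};
```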
[PATCH 2/3] phiprop: Skip over clobbers [PR116823]
In C++ code, clobbers get in the way of phiprop. For example:
```
  if (lr_bitpos.2401_412 < rr_bitpos.2402_413)
    goto ; [INV]
  else
    goto ; [INV]

   :

   :
  MEM[(struct poly_int *)&D.192544] ={v} {CLOBBER(bob)};
  _1060 = MEM[(const long int &)iftmp.2400_515];
```
The above comes from fold-const.cc. In this case the clobber is the one emitted at the start of the constructor, but other clobbers can also get in the way; see gcc.dg/tree-ssa/phiprop-2.c for an example.

This shows up in a lot of C++ code where std::min/max (or even ?:, as in the fold-const.cc case) is used in connection with constructors, so handling it early in phiprop can improve both code generation and compile time.

g++.dg/tree-ssa/phiprop-2.C contains the reduced testcase from fold-const.cc.

Bootstrapped and tested on x86_64-linux-gnu.

	PR tree-optimization/116823

gcc/ChangeLog:

	* tree-ssa-phiprop.cc (phiprop_insert_phi): Get the use_vuse before
	looping over the phi arguments; also skip over clobbers to get the
	use_vuse.
	(propagate_with_phi): Skip over clobbers for the vuse.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/phiprop-2.c: New test.
	* g++.dg/tree-ssa/phiprop-1.C: New test.
	* g++.dg/tree-ssa/phiprop-2.C: New test.
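At the source level, the effect phiprop has on the `g` functions in the new testcases can be pictured roughly as follows. This is a hand-written sketch of the idea, not compiler output, and the function names are hypothetical. A load through a conditionally selected address becomes a selection between the loaded values, which phiopt can then turn into a MIN_EXPR.

```c
#include <assert.h>

/* Before phiprop: pick an address, then load through it.  This is
   the shape that std::min on const references produces.  */
int min_via_address (int i)
{
  const int t = 10;
  const int *a;
  if (t < i)
    a = &t;
  else
    a = &i;
  return *a;
}

/* After phiprop, conceptually: load on each path and merge the
   values instead of the addresses.  phiopt can then recognize the
   merge as MIN_EXPR <i, 10>.  */
int min_via_value (int i)
{
  const int t = 10;
  int v;
  if (t < i)
    v = t;
  else
    v = i;
  return v;
}

/* Both forms must compute the same result for any input.  */
void check_same (int i)
{
  assert (min_via_address (i) == min_via_value (i));
}
```

A clobber between the branch and the load (for example, the one at the start of a constructor) is exactly what previously stopped the first form from being rewritten into the second.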
Signed-off-by: Andrew Pinski --- gcc/testsuite/g++.dg/tree-ssa/phiprop-1.C | 23 +++ gcc/testsuite/g++.dg/tree-ssa/phiprop-2.C | 25 + gcc/testsuite/gcc.dg/tree-ssa/phiprop-2.c | 27 +++ gcc/tree-ssa-phiprop.cc | 25 - 4 files changed, 99 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/tree-ssa/phiprop-1.C create mode 100644 gcc/testsuite/g++.dg/tree-ssa/phiprop-2.C create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phiprop-2.c diff --git a/gcc/testsuite/g++.dg/tree-ssa/phiprop-1.C b/gcc/testsuite/g++.dg/tree-ssa/phiprop-1.C new file mode 100644 index 000..e3388d1d157 --- /dev/null +++ b/gcc/testsuite/g++.dg/tree-ssa/phiprop-1.C @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O1 -fdump-tree-phiprop1-details -fdump-tree-release_ssa" } */ + +/* PR tree-optimization/116823 */ +/* The clobber on a should not get in the way of phiprop here even if + this is undefined code. */ +/* We should have MIN_EXPR early on then too. */ + +static inline +const int &c(const int &d, const int &e) { + if (d < e) +return d; + return e; +} + +int g(int i, struct f *ff) +{ + const int &a = c(i, 10); + return a; +} +/* { dg-final { scan-tree-dump-times "Inserting PHI for result of load" 1 "phiprop1"} } */ +/* { dg-final { scan-tree-dump-times "MIN_EXPR" 1 "release_ssa"} } */ + diff --git a/gcc/testsuite/g++.dg/tree-ssa/phiprop-2.C b/gcc/testsuite/g++.dg/tree-ssa/phiprop-2.C new file mode 100644 index 000..1a0d6ed92ee --- /dev/null +++ b/gcc/testsuite/g++.dg/tree-ssa/phiprop-2.C @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options "-O1 -fdump-tree-phiprop1-details -fdump-tree-release_ssa" } */ + +/* PR tree-optimization/116823 */ +/* The clobber on the temp s2 should not get in the way of phiprop here. */ +/* We should have MAX_EXPR early on then too. */ +/* This is derived from fold-const.cc; s2 is similar to poly_int. */ + +struct s2 +{ + int i; + s2(const int &a) : i (a) {} +}; + + +int h(s2 b); + +int g(int l, int r) +{ + return h(l > r ? 
l : r); +} + +/* { dg-final { scan-tree-dump-times "Inserting PHI for result of load" 1 "phiprop1"} } */ +/* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "release_ssa"} } */ + diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phiprop-2.c b/gcc/testsuite/gcc.dg/tree-ssa/phiprop-2.c new file mode 100644 index 000..546031e63d7 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/phiprop-2.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-O1 -fdump-tree-phiprop1-details -fdump-tree-release_ssa" } */ + +/* PR tree-optimization/116823 */ +/* The clobber on b should not get in the way of phiprop here. */ +/* We should have MIN_EXPR early on. */ + +void f(int *); + +int g(int i) +{ + const int t = 10; + const int *a; + { +int b; +f(&b); +if (t < i) + a = &t; +else + a = &i; + } + return *a; +} + +/* { dg-final { scan-tree-dump-times "Inserting PHI for result of load" 1 "phiprop1"} } */ +/* { dg-final { scan-tree-dump-times "MIN_EXPR" 1 "release_ssa"} } */ + diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc index 2a1cdae46d2..f04990e8cb4 100644 --- a/gcc/tree-ssa-phiprop.cc +++ b/gcc/tree-ssa-phiprop.cc @@ -159,6 +159,20 @@ phiprop_insert_phi (basic_block bb, gphi *phi, gimple *use_stmt, } gphi *vphi = get_virtual_phi (bb); + tree use_vuse = gimple_vuse (use_stmt); + gimple *def_stmt = SSA_NAME_DEF_STMT (use_vuse); + /* Skip over clobbers in the same bb as the use + as they don't interfer with loads. */ + while (!SSA_NAME_IS_DEFAULT_DEF (use_vuse) +&& gimple_clobber_p (def_stmt
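The clobber-skipping walk described in the patch comment can be modelled with a toy statement chain. This is a simplified, hypothetical data structure, not GCC's gimple/SSA API. Each statement points at the definition of its incoming virtual operand, and the walk steps back over clobbers that sit in the same block as the use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a virtual-operand chain: each statement
   records its basic block, whether it is a clobber, and the
   statement defining its incoming virtual operand (NULL plays the
   role of the default definition at function entry).  */
struct stmt
{
  int bb;
  bool is_clobber;
  const struct stmt *vuse_def;
};

/* Starting from the definition reaching USE, step back over clobbers
   that sit in block BB.  Clobbers do not interfere with loads, so the
   first non-clobber definition (or NULL for the entry) is returned.  */
const struct stmt *
skip_clobbers (const struct stmt *use, int bb)
{
  assert (use != NULL);
  const struct stmt *def = use->vuse_def;
  while (def != NULL && def->is_clobber && def->bb == bb)
    def = def->vuse_def;
  return def;
}
```

A clobber in a different block stops the walk, matching the "in the same bb as the use" restriction stated in the patch comment.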
[r15-3834 Regression] FAIL: c-c++-common/gomp/declare-variant-duplicates.c (test for excess errors) on Linux/x86_64
On Linux/x86_64,

96246bff0bcd9e5cdec9e6cf811ee3db4997f6d4 is the first bad commit
commit 96246bff0bcd9e5cdec9e6cf811ee3db4997f6d4
Author: Sandra Loosemore
Date: Fri Sep 6 20:58:13 2024 +

    OpenMP: Check additional restrictions on context selector properties

caused

FAIL: c-c++-common/gomp/declare-variant-duplicates.c (test for errors, line 11)
FAIL: c-c++-common/gomp/declare-variant-duplicates.c (test for excess errors)

with GCC configured with

../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-3834/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="gomp.exp=c-c++-common/gomp/declare-variant-duplicates.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="gomp.exp=c-c++-common/gomp/declare-variant-duplicates.c --target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me at haochen dot jiang at intel.com.)
(If you encounter problems related to cascadelake, disabling AVX512F on the command line may help.)
(However, please make sure that there are no potential problems with AVX512.)
Re: [PATCH 03/10] c++/modules: Use decl_linkage in maybe_record_mergeable_decl
On 9/23/24 7:44 PM, Nathaniel Shead wrote: I don't currently have any testcases where this changes something, but I felt it to be a valuable cleanup. Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? OK. -- >8 -- This avoids any possible inconsistencies (current or future) about whether a declaration is internal or not. gcc/cp/ChangeLog: * name-lookup.cc (maybe_record_mergeable_decl): Use decl_linkage instead of ad-hoc checks. Signed-off-by: Nathaniel Shead --- gcc/cp/name-lookup.cc | 9 + 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc index 50e169eca43..c0f89f98d87 100644 --- a/gcc/cp/name-lookup.cc +++ b/gcc/cp/name-lookup.cc @@ -3725,17 +3725,10 @@ maybe_record_mergeable_decl (tree *slot, tree name, tree decl) if (TREE_CODE (*slot) != BINDING_VECTOR) return; - if (!TREE_PUBLIC (CP_DECL_CONTEXT (decl))) -/* Member of internal namespace. */ + if (decl_linkage (decl) == lk_internal) return; tree not_tmpl = STRIP_TEMPLATE (decl); - if ((TREE_CODE (not_tmpl) == FUNCTION_DECL - || VAR_P (not_tmpl)) - && DECL_THIS_STATIC (not_tmpl)) -/* Internal linkage. */ -return; - bool is_attached = (DECL_LANG_SPECIFIC (not_tmpl) && DECL_MODULE_ATTACH_P (not_tmpl)); tree *gslot = get_fixed_binding_slot
Re: [PATCH] Simplify range-op shift mask generation
Richard Biener writes: > The following reduces the number of wide_ints built which show up > in the profile for PR114855 as the largest remaining bit at -O1. > > Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. Thanks.
Re: [PATCH] RISC-V: Fix FIXED_REGISTERS comment missing return address register
On 9/24/24 2:11 AM, chenyix...@iscas.ac.cn wrote: From: Yixuan Chen gcc/config/ChangeLog: 2024-09-24 Yixuan Chen * riscv/riscv.h: Fix FIXED_REGISTERS comment missing return address register. Thanks. I made minor fixes to the ChangeLog entry and pushed this to the trunk. jeff
Re: [PATCH] c++/contracts: ICE in build_contract_condition_function [PR116490]
On 8/30/24 8:49 AM, Nina Dinka Ranns wrote: We currently do not expect the comdat group of the guarded function to be set at the time the pre and post check functions are generated. However, in the case of an explicit instantiation, the guarded function has already been added to a comdat group before the contract check functions are generated, which causes the observed ICE. This patch removes the assert and adds a check for the comdat group of the guarded function: if the guarded function is already placed in a comdat group, the pre and post check functions are added to the same group. Tested on x86_64-pc-linux-gnu. Patch attached to the email. Thanks for the ping, I missed this the first time. Please CC me directly on C++ patches, especially on pings. FWIW it was attached as application/x-patch, which Thunderbird doesn't know how to display inline; text/plain or text/x-patch attachments work better. I don't know how to tell gmail that it's text other than perhaps changing the extension to .txt. Please include the ChangeLog entries in plaintext, along with the description/rationale. Pushed, thanks! Jason
Re: [PATCH 1/2] rtl-optimization/114855 - slow add_store_equivs in IRA
On 9/24/24 6:34 AM, Richard Biener wrote: For the testcase in PR114855 at -O1, add_store_equivs shows up as the main sink for bitmap_set_bit, because it uses a bitmap to mark all seen insns by UID to make sure the forward walk in memref_used_between_p will find the insn in question. Given that we do have a CFG here, the function's approach is questionable, and since memref_used_between_p together with the walk of all insns is obviously quadratic in the worst case, that whole thing should be re-done ... but for the testcase, using an sbitmap of size get_max_uid () + 1 gets bitmap_set_bit off the profile and improves IRA time from 15.58s (8%) to 3.46s (2%). Now, given the above quadraticness, I wonder whether we should instead gate add_store_equivs on optimize > 1 or flag_expensive_optimizations. Jeff, you added the bitmap in r6-7529-g14d7d4be52585b. I have no idea how get_insns () works at this point or which CFG mode we are in, but a simplification might be to simply verify that both insns are in the same BB and hope that get_insns lets us walk the insns in order there; then we could elide the bitmap completely (with some loss of cases, but the function comment suggests it is supposed to catch single-BB cases only anyway?!). I don't recall the work, but looking at the PR and history, I'm pretty confident the equivalence code here is assuming linear IL, so BB or perhaps EBB. In retrospect it probably would have been better to restrict the check to a BB/EBB. Bootstrap and regtest running on x86_64-unknown-linux-gnu. OK if that succeeds? Thanks, Richard. PR rtl-optimization/114855 * ira.cc (add_store_equivs): Use sbitmap for tracking visited insns. OK jeff
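As a rough sketch of the kind of dense, UID-indexed visited set the patch switches to, consider the following. The helper names are hypothetical and this is not GCC's sbitmap API. It only shows why a bitmap allocated up front for UIDs 0 .. get_max_uid () gives constant-time set and test operations, whereas a sparse list-based bitmap may have to walk its element list.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical dense bitmap: one bit per possible insn UID.  */
struct dense_bitmap
{
  size_t n_bits;
  unsigned long *words;
};

#define BITS_PER_UL (sizeof (unsigned long) * CHAR_BIT)

/* Allocate a zeroed bitmap covering UIDs 0 .. MAX_UID inclusive,
   i.e. MAX_UID + 1 bits, like an sbitmap of size get_max_uid () + 1.  */
bool dense_bitmap_init (struct dense_bitmap *map, size_t max_uid)
{
  map->n_bits = max_uid + 1;
  size_t n_words = (map->n_bits + BITS_PER_UL - 1) / BITS_PER_UL;
  map->words = calloc (n_words, sizeof (unsigned long));
  return map->words != NULL;
}

/* O(1): compute the word and bit position directly from the UID.  */
void dense_bitmap_set (struct dense_bitmap *map, size_t uid)
{
  assert (uid < map->n_bits);
  map->words[uid / BITS_PER_UL] |= 1UL << (uid % BITS_PER_UL);
}

bool dense_bitmap_test (const struct dense_bitmap *map, size_t uid)
{
  assert (uid < map->n_bits);
  return (map->words[uid / BITS_PER_UL] >> (uid % BITS_PER_UL)) & 1UL;
}

void dense_bitmap_free (struct dense_bitmap *map)
{
  free (map->words);
  map->words = NULL;
  map->n_bits = 0;
}
```

The trade-off is memory proportional to the largest UID rather than to the number of marked insns, which is acceptable for a short-lived visited set.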
Re: [PATCH RFA] libstdc++: #ifdef out #pragma GCC system_header
On 23/09/24 11:06 -0400, Jason Merrill wrote: Tested x86_64-pc-linux-gnu, OK for trunk? Yes please, I've wanted this for years, but it wasn't practical until you changed some of the warnings recently. Thanks!
Re: [PATCH] tree-optimization/114855 - slow VRP due to equiv oracle queries
Absolutely ok. Thanks! Andrew On 9/24/24 05:52, Richard Biener wrote: For the testcase in PR114855, VRP takes 320.41s (23%) (after mitigating the backwards threader slowness). This is mostly due to the bitmap check in equiv_oracle::find_equiv_dom. The following switches this bitmap to tree view, trading the linear search for an O(log N) one, which improves VRP time to 54.54s (5%). Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK if that succeeds? Thanks, Richard. PR tree-optimization/114855 * value-relation.cc (equiv_oracle::equiv_oracle): Switch m_equiv_set to tree view. --- gcc/value-relation.cc | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc index 45722fcd13a..d6ad2dd984f 100644 --- a/gcc/value-relation.cc +++ b/gcc/value-relation.cc @@ -321,6 +321,7 @@ equiv_oracle::equiv_oracle () m_equiv.create (0); m_equiv.safe_grow_cleared (last_basic_block_for_fn (cfun) + 1); m_equiv_set = BITMAP_ALLOC (&m_bitmaps); + bitmap_tree_view (m_equiv_set); obstack_init (&m_chain_obstack); m_self_equiv.create (0); m_self_equiv.safe_grow_cleared (num_ssa_names + 1);
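To illustrate the trade-off (a linear search versus an O(log N) lookup of the bitmap element that holds a given bit), here is a toy sparse bitmap in C. It is only a stand-in: GCC's tree view organizes bitmap elements as a splay tree, whereas this sketch uses binary search over a sorted array, and all names and the 128-bit element size are assumptions for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy sparse bitmap: a sorted array of elements, each covering a
   fixed block of ELT_BITS consecutive bits.  */
#define ELT_BITS 128
#define WORDS_PER_ELT (ELT_BITS / 64)

struct elt
{
  size_t indx;                  /* bit number / ELT_BITS */
  uint64_t bits[WORDS_PER_ELT];
};

/* List-view style: scan the elements in order, O(N).  */
const struct elt *find_linear (const struct elt *elts, size_t n, size_t indx)
{
  for (size_t i = 0; i < n && elts[i].indx <= indx; i++)
    if (elts[i].indx == indx)
      return &elts[i];
  return NULL;
}

/* Tree-view style stand-in: binary search, O(log N).  */
const struct elt *find_log (const struct elt *elts, size_t n, size_t indx)
{
  size_t lo = 0, hi = n;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (elts[mid].indx < indx)
        lo = mid + 1;
      else if (elts[mid].indx > indx)
        hi = mid;
      else
        return &elts[mid];
    }
  return NULL;
}

/* Test whether BIT is set, locating its element logarithmically.  */
bool bit_p (const struct elt *elts, size_t n, size_t bit)
{
  assert (elts != NULL || n == 0);
  const struct elt *e = find_log (elts, n, bit / ELT_BITS);
  if (e == NULL)
    return false;
  size_t off = bit % ELT_BITS;
  return (e->bits[off / 64] >> (off % 64)) & 1;
}
```

With many elements and repeated membership queries, as in find_equiv_dom, the per-query cost drops from linear in the element count to logarithmic.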
Re: [PATCH 2/2] Disable add_store_equivs when -fno-expensive-optimizations
On 9/24/24 6:35 AM, Richard Biener wrote: IRA's add_store_equivs is quadratic in the size of the function in the worst case; disable it with -fno-expensive-optimizations, that is, at -O1 and -Og. Bootstrap and regtest running on x86_64-unknown-linux-gnu. OK? Thanks, Richard. * ira.cc (ira): Gate add_store_equivs on flag_expensive_optimizations. Given it's quadratic, definitely OK :-) jeff
[PATCH v1 1/3] RISC-V: Refine the testcase of vector SAT_ADD
From: Pan Li Use scan-assembler-times to check for the vsadd insn instead of checking the function body, as we only care about whether the fixed-point insn vsadd is generated. The tests below passed for this patch. * The rv64gcv full regression test. It is a test-only patch and obvious up to a point; I will commit it directly if there are no comments in the next 48 hours. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-1.c: Remove func body check and take scan asm times instead. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-2.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-3.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-4.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-10.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-11.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-12.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-13.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-14.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-15.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-16.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-17.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-18.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-19.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-20.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-26.c: Ditto. 
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-27.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-28.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-29.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-30.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-31.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-32.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-6.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-7.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-8.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-9.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-1.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-10.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-11.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-12.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-13.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-14.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-15.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-16.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-2.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-3.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-4.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-5.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-6.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-7.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-8.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-9.c: Ditto. 
Signed-off-by: Pan Li --- .../riscv/rvv/autovec/binop/vec_sat_s_add-1.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_sat_s_add-2.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_sat_s_add-3.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_sat_s_add-4.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_sat_u_add-1.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_sat_u_add-10.c | 5 ++--- .../riscv/rvv/autovec/binop/vec_sat_u_add-11.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_sat_u_add-12.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_sat_u_add-13.c | 12 +--- .../riscv/rvv/autovec/binop/vec_sat_u_add-14.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_sat_u_add-15.c
Re: [PATCH v1 3/3] RISC-V: Refine the testcase of vector SAT_TRUNC
LGTM juzhe.zh...@rivai.ai From: pan2.li Date: 2024-09-25 14:45 To: gcc-patches CC: juzhe.zhong; kito.cheng; jeffreyalaw; rdapp.gcc; Pan Li Subject: [PATCH v1 3/3] RISC-V: Refine the testcase of vector SAT_TRUNC From: Pan Li Take scan-assembler-times for vsadd insn check instead of function body, as we only care about if we can generate the fixed point insn vnclip. The below test are passed for this patch. * The rv64gcv fully regression test. It is test only patch and obvious up to a point, will commit it directly if no comments in next 48H. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c: Remove func body check and take scan asm times instead. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-10.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-11.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-12.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-13.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-14.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-15.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-16.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-17.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-18.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-19.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-2.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-20.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-21.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-22.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-23.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-24.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-3.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-4.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-5.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-6.c: Ditto. 
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-7.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-8.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-9.c: Ditto. Signed-off-by: Pan Li --- .../rvv/autovec/unop/vec_sat_u_trunc-1.c | 13 ++-- .../rvv/autovec/unop/vec_sat_u_trunc-10.c | 13 ++-- .../rvv/autovec/unop/vec_sat_u_trunc-11.c | 16 +- .../rvv/autovec/unop/vec_sat_u_trunc-12.c | 12 +-- .../rvv/autovec/unop/vec_sat_u_trunc-13.c | 13 ++-- .../rvv/autovec/unop/vec_sat_u_trunc-14.c | 17 ++- .../rvv/autovec/unop/vec_sat_u_trunc-15.c | 21 ++- .../rvv/autovec/unop/vec_sat_u_trunc-16.c | 13 ++-- .../rvv/autovec/unop/vec_sat_u_trunc-17.c | 17 ++- .../rvv/autovec/unop/vec_sat_u_trunc-18.c | 13 ++-- .../rvv/autovec/unop/vec_sat_u_trunc-19.c | 13 ++-- .../rvv/autovec/unop/vec_sat_u_trunc-2.c | 17 ++- .../rvv/autovec/unop/vec_sat_u_trunc-20.c | 17 ++- .../rvv/autovec/unop/vec_sat_u_trunc-21.c | 21 ++- .../rvv/autovec/unop/vec_sat_u_trunc-22.c | 13 ++-- .../rvv/autovec/unop/vec_sat_u_trunc-23.c | 17 ++- .../rvv/autovec/unop/vec_sat_u_trunc-24.c | 13 ++-- .../rvv/autovec/unop/vec_sat_u_trunc-3.c | 21 ++- .../rvv/autovec/unop/vec_sat_u_trunc-4.c | 13 ++-- .../rvv/autovec/unop/vec_sat_u_trunc-5.c | 17 ++- .../rvv/autovec/unop/vec_sat_u_trunc-6.c | 13 ++-- .../rvv/autovec/unop/vec_sat_u_trunc-7.c | 13 ++-- .../rvv/autovec/unop/vec_sat_u_trunc-8.c | 17 ++- .../rvv/autovec/unop/vec_sat_u_trunc-9.c | 21 ++- 24 files changed, 46 insertions(+), 328 deletions(-) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c index 186005733ec..3d29d26abff 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c @@ -1,18 +1,9 @@ /* { dg-do compile } */ -/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns 
-fno-schedule-insns2" } */ -/* { dg-skip-if "" { *-*-* } { "-flto" } } */ -/* { dg-final { check-function-bodies "**" "" } } */ +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details" } */ #include "../vec_sat_arith.h" -/* -** vec_sat_u_trunc_uint8_t_uint16_t_fmt_1: -** ... -** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e8,\s*mf2,\s*ta,\s*ma -** ... -** vnclipu\.wi\s+v[0-9]+,\s*v[0-9]+,\s*0 -** ... -*/ DEF_VEC_SAT_U_TRUNC_FMT_1 (uint8_t, uint16_t) /* { dg-final { scan-rtl-dump-times ".SAT_TRUNC " 4 "expand" } } */ +/* { dg-final { scan-assembler-times {vnclipu\.wi} 1 } } */ diff --git a/gcc/testsuite/gcc.
[PATCH v1 2/3] RISC-V: Refine the testcase of vector SAT_SUB
From: Pan Li Use scan-assembler-times to check for the vssub insn instead of checking the function body, as we only care about whether the fixed-point insn vssub is generated. The tests below passed for this patch. * The rv64gcv full regression test. It is a test-only patch and obvious up to a point; I will commit it directly if there are no comments in the next 48 hours. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-1.c: Remove func body check and take scan asm times instead. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-10.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-11.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-12.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-13.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-14.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-15.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-16.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-17.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-18.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-19.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-2.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-20.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-21.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-22.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-23.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-24.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-25.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-26.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-27.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-28.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-29.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-3.c: Ditto. 
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-30.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-31.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-32.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-33.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-34.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-35.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-36.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-37.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-38.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-39.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-4.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-40.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-5.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-6.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-7.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-8.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-9.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip.c: Ditto. 
Signed-off-by: Pan Li --- .../riscv/rvv/autovec/binop/vec_sat_u_sub-1.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_sat_u_sub-10.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_sat_u_sub-11.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_sat_u_sub-12.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_sat_u_sub-13.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_sat_u_sub-14.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_sat_u_sub-15.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_sat_u_sub-16.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_sat_u_sub-17.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_sat_u_sub-18.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_sat_u_sub-19.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_sat_u_sub-2.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_sat_u_sub-20.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_sat_u_sub-21.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_sat_u_sub-22.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_sat_u_sub-23.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_sat_u_sub-24.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_sat_u_sub-25.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_sat_u_sub-26.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_sat_u_sub-27.c | 13 ++--- .../riscv/rvv/autovec/binop/vec_
[PATCH v1 3/3] RISC-V: Refine the testcase of vector SAT_TRUNC
From: Pan Li Use scan-assembler-times to check for the vnclip insn instead of checking the function body, as we only care about whether the fixed-point insn vnclip is generated. The tests below passed for this patch. * The rv64gcv full regression test. It is a test-only patch and obvious up to a point; I will commit it directly if there are no comments in the next 48 hours. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c: Remove func body check and take scan asm times instead. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-10.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-11.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-12.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-13.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-14.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-15.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-16.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-17.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-18.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-19.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-2.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-20.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-21.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-22.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-23.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-24.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-3.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-4.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-5.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-6.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-7.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-8.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-9.c: Ditto.
Signed-off-by: Pan Li --- .../rvv/autovec/unop/vec_sat_u_trunc-1.c | 13 ++-- .../rvv/autovec/unop/vec_sat_u_trunc-10.c | 13 ++-- .../rvv/autovec/unop/vec_sat_u_trunc-11.c | 16 +- .../rvv/autovec/unop/vec_sat_u_trunc-12.c | 12 +-- .../rvv/autovec/unop/vec_sat_u_trunc-13.c | 13 ++-- .../rvv/autovec/unop/vec_sat_u_trunc-14.c | 17 ++- .../rvv/autovec/unop/vec_sat_u_trunc-15.c | 21 ++- .../rvv/autovec/unop/vec_sat_u_trunc-16.c | 13 ++-- .../rvv/autovec/unop/vec_sat_u_trunc-17.c | 17 ++- .../rvv/autovec/unop/vec_sat_u_trunc-18.c | 13 ++-- .../rvv/autovec/unop/vec_sat_u_trunc-19.c | 13 ++-- .../rvv/autovec/unop/vec_sat_u_trunc-2.c | 17 ++- .../rvv/autovec/unop/vec_sat_u_trunc-20.c | 17 ++- .../rvv/autovec/unop/vec_sat_u_trunc-21.c | 21 ++- .../rvv/autovec/unop/vec_sat_u_trunc-22.c | 13 ++-- .../rvv/autovec/unop/vec_sat_u_trunc-23.c | 17 ++- .../rvv/autovec/unop/vec_sat_u_trunc-24.c | 13 ++-- .../rvv/autovec/unop/vec_sat_u_trunc-3.c | 21 ++- .../rvv/autovec/unop/vec_sat_u_trunc-4.c | 13 ++-- .../rvv/autovec/unop/vec_sat_u_trunc-5.c | 17 ++- .../rvv/autovec/unop/vec_sat_u_trunc-6.c | 13 ++-- .../rvv/autovec/unop/vec_sat_u_trunc-7.c | 13 ++-- .../rvv/autovec/unop/vec_sat_u_trunc-8.c | 17 ++- .../rvv/autovec/unop/vec_sat_u_trunc-9.c | 21 ++- 24 files changed, 46 insertions(+), 328 deletions(-) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c index 186005733ec..3d29d26abff 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c @@ -1,18 +1,9 @@ /* { dg-do compile } */ -/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */ -/* { dg-skip-if "" { *-*-* } { "-flto" } } */ -/* { dg-final { check-function-bodies "**" "" } } */ +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize 
-fdump-rtl-expand-details" } */ #include "../vec_sat_arith.h" -/* -** vec_sat_u_trunc_uint8_t_uint16_t_fmt_1: -** ... -** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e8,\s*mf2,\s*ta,\s*ma -** ... -** vnclipu\.wi\s+v[0-9]+,\s*v[0-9]+,\s*0 -** ... -*/ DEF_VEC_SAT_U_TRUNC_FMT_1 (uint8_t, uint16_t) /* { dg-final { scan-rtl-dump-times ".SAT_TRUNC " 4 "expand" } } */ +/* { dg-final { scan-assembler-times {vnclipu\.wi} 1 } } */ diff --git a/gcc/te
RE: [PATCH v1] Widening-Mul: Fix one ICE for SAT_SUB matching operand promotion
Got it, thanks a lot. Pan -Original Message- From: Uros Bizjak Sent: Tuesday, September 24, 2024 3:29 PM To: Li, Pan2 Cc: gcc-patches@gcc.gnu.org; richard.guent...@gmail.com; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1] Widening-Mul: Fix one ICE for SAT_SUB matching operand promotion On Tue, Sep 24, 2024 at 8:53 AM Li, Pan2 wrote: > > Got it and thanks, let me rerun to make sure it works well as expected. For reference, this is documented in: https://gcc.gnu.org/wiki/Testing_GCC https://gcc-newbies-guide.readthedocs.io/en/latest/working-with-the-testsuite.html https://gcc.gnu.org/install/test.html Uros.
Re: [Patch, fortran] PR116733: Generic processing of assumed rank objects (f202y)
Hi Paul,

in addition to Thomas' remarks (which I second), I have the following:

> diff --git a/gcc/fortran/intrinsic.cc b/gcc/fortran/intrinsic.cc
> index 0a6be215825..d95f35145b5 100644
> --- a/gcc/fortran/intrinsic.cc
> +++ b/gcc/fortran/intrinsic.cc
> @@ -293,11 +293,15 @@ do_ts29113_check (gfc_intrinsic_sym *specific, gfc_actual_arglist *arg)
>                    &a->expr->where, gfc_current_intrinsic);
>         ok = false;
>       }
> -  else if (a->expr->rank == -1 && !specific->inquiry)
> +  else if (a->expr->rank == -1
> +           && !(specific->inquiry
> +                || (specific->id == GFC_ISYM_RESHAPE
> +                    && (gfc_option.allow_std & GFC_STD_F202Y
>       {
>         gfc_error ("Assumed-rank argument at %L is only permitted as actual "
> -                  "argument to intrinsic inquiry functions",
> -                  &a->expr->where);
> +                  "argument to intrinsic inquiry functions or to reshape. "

Isn't it a convention to write Fortran intrinsic function names in all uppercase when the intrinsic itself is meant, i.e. RESHAPE? That makes it clear, as in the messages above for C_LOC and PRESENT (lines 268-270).

> +                  "The latter is an experimental F202y feature. Use "
> +                  "-std=f202y to enable", &a->expr->where);
>         ok = false;
>       }
>     else if (a->expr->rank == -1 && arg != a)
> @@ -307,6 +311,13 @@ do_ts29113_check (gfc_intrinsic_sym *specific, gfc_actual_arglist *arg)
>                    &a->expr->where, gfc_current_intrinsic);
>         ok = false;
>       }
> +  else if (a->expr->rank == -1 && specific->id == GFC_ISYM_RESHAPE
> +           && !gfc_is_simply_contiguous (a->expr, true, false))
> +    {
> +      gfc_error ("Assumed rank argument to the reshape intrinsic at %L "

Here, too?

> +                 "must be contiguous", &a->expr->where);
> +      ok = false;
> +    }
>     }
>
>   return ok;
> diff --git a/gcc/fortran/match.cc b/gcc/fortran/match.cc
> index 0cd78a57a2f..81610b93345 100644
> --- a/gcc/fortran/match.cc
> +++ b/gcc/fortran/match.cc
> @@ -1920,7 +1920,31 @@ gfc_match_associate (void)
>       gfc_association_list* a;
>
>       /* Match the next association.  */
> -     if (gfc_match (" %n =>", newAssoc->name) != MATCH_YES)
> +     if (gfc_match (" %n ", newAssoc->name) != MATCH_YES)
> +       {
> +         /* "Expected associate name at %C" would be better.
> +            Change associate_3.f03 to match.  */

That's an odd comment; it reads like a note to yourself.

> +         gfc_error ("Expected associate name at %C");
> +         goto assocListError;
> +       }
> +
> +     /* Required for an assumed rank target.  */
> +     if (gfc_peek_char () == '(')
> +       {
> +         newAssoc->ar = gfc_get_array_ref ();

This is not freed in case of an error and may result in a memory leak, right?

> +         if (gfc_match_array_ref (newAssoc->ar, NULL, 0, 0) != MATCH_YES)
> +           {
> +             gfc_error ("Bad bounds remapping list at %C");
> +             goto assocListError;
> +           }
> +       }
> +
> +     if (newAssoc->ar && !(gfc_option.allow_std & GFC_STD_F202Y))
> +       gfc_error_now ("The bounds remapping list at %C is an experimental "
> +                      "F202y feature. Use std=f202y to enable");
> +
> +     /* Match the next association.  */
> +     if (gfc_match (" =>", newAssoc->name) != MATCH_YES)
>         {
>           gfc_error ("Expected association at %C");
>           goto assocListError;
> diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
> index 07e28a9f7a8..aa0ee1b0164 100644
> --- a/gcc/fortran/trans-expr.cc
> +++ b/gcc/fortran/trans-expr.cc
> @@ -10784,6 +10815,13 @@ gfc_trans_pointer_assignment (gfc_expr * expr1, gfc_expr * expr2)
>             gcc_assert (remap->u.ar.start[dim] && remap->u.ar.end[dim]);
> +           if (remap->u.ar.start[dim]->expr_type != EXPR_CONSTANT
> +               || remap->u.ar.start[dim]->expr_type != EXPR_VARIABLE)
> +             gfc_resolve_expr (remap->u.ar.start[dim]);
> +           if (remap->u.ar.end[dim]->expr_type != EXPR_CONSTANT
> +               || remap->u.ar.end[dim]->expr_type != EXPR_VARIABLE)
> +             gfc_resolve_expr (remap->u.ar.end[dim]);
> +

Can't these resolves be done during the resolve stage? I have had some serious trouble with late resolves, hence the question.

>             /* Convert declared bounds.  */
>             gfc_init_se (&lower_se, NULL);
>             gfc_init_se (&upper_se, NULL);
> diff --git a/gcc/fortran/trans-stmt.cc b/gcc/fortran/trans-stmt.cc
> index 86c54970475..450c11c06d7 100644
> --- a/gcc/fortran/trans-stmt.cc
> +++ b/gcc/fortran/trans-stmt.cc
> @@ -1910,6 +1910,20 @@ trans_associate_var (gfc_symbol *sym, gfc_wrapped_block *block)
>       gfc_add_init_cleanup (block, gfc_finish_block (&se.pre), tmp);
>     }
>   /* Now all the other kinds of associate variable.  */
> + else if (e->rank == -1 &&
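The leak question above is the classic case of an object allocated before a later parse step that can still fail. A minimal, self-contained sketch of one common remedy, freeing the partially built object on the error path before bailing out, follows. `struct array_ref`, `parse_bounds` and `match_assoc_target` are hypothetical stand-ins, not gfortran's `gfc_get_array_ref`/`gfc_match_array_ref` API, and how gfortran itself should release an array ref on the `assocListError` path is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins, illustrative only (not gfortran internals). */
struct array_ref { int rank; };
struct assoc { struct array_ref *ar; };

/* Pretend parser for a bounds-remapping list; succeeds iff OK is true. */
static bool
parse_bounds (struct array_ref *ar, bool ok)
{
  ar->rank = ok ? 1 : 0;
  return ok;
}

/* Allocate the array ref only when a '(' follows the associate name, and
   release it on the error path so a failed match does not leak it. */
bool
match_assoc_target (struct assoc *a, bool has_paren, bool bounds_ok)
{
  a->ar = NULL;
  if (has_paren)
    {
      a->ar = calloc (1, sizeof *a->ar);
      if (a->ar == NULL)
        return false;
      if (!parse_bounds (a->ar, bounds_ok))
        goto error;
    }
  return true;

error:
  free (a->ar);
  a->ar = NULL;  /* Leave no dangling pointer for the caller.  */
  return false;
}
```

The `goto error` mirrors the `goto assocListError` style of the quoted code; the key point is only that the cleanup label owns the allocation made on the success path.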
[PATCH] x86/{,V}AES: adjust when to force EVEX encoding
Commit a79d13a01f8c ("i386: Fix aes/vaes patterns [PR114576]") correctly said "..., but we need to emit {evex} prefix in the assembly if AES ISA is not enabled". Yet it did so only for the TARGET_AES insns. Basing the decision on the alternative chosen in the TARGET_VAES insns is wrong for two reasons:
- if, with AES disabled, the latter alternative was chosen despite neither a "high" XMM register nor any eGPR being in use, gas would still pick the AES (VEX) encoding when no {evex} pseudo-prefix is present (which contradicts the fact that, as stated in the description of said commit, AES is presently not considered a prerequisite of VAES in gcc);
- if AES is (also) enabled, EVEX encoding would needlessly be forced.

gcc/

	* config/i386/sse.md (vaesdec_<mode>, vaesdeclast_<mode>,
	vaesenc_<mode>, vaesenclast_<mode>): Replace which_alternative
	check by TARGET_AES one.
---
As an aside, {evex} (and other) pseudo-prefixes are better avoided anyway whenever possible, as they get in the way of code that puts macro overrides in place for certain insns: gas 2.43 rejects such bogus placement of pseudo-prefixes.
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -30802,7 +30802,7 @@
 	  UNSPEC_VAESDEC))]
   "TARGET_VAES"
 {
-  if (which_alternative == 0 && <MODE>mode == V16QImode)
+  if (!TARGET_AES && <MODE>mode == V16QImode)
     return "%{evex%} vaesdec\t{%2, %1, %0|%0, %1, %2}";
   else
     return "vaesdec\t{%2, %1, %0|%0, %1, %2}";
@@ -30816,7 +30816,7 @@
 	  UNSPEC_VAESDECLAST))]
   "TARGET_VAES"
 {
-  if (which_alternative == 0 && <MODE>mode == V16QImode)
+  if (!TARGET_AES && <MODE>mode == V16QImode)
     return "%{evex%} vaesdeclast\t{%2, %1, %0|%0, %1, %2}";
   else
     return "vaesdeclast\t{%2, %1, %0|%0, %1, %2}";
@@ -30830,7 +30830,7 @@
 	  UNSPEC_VAESENC))]
   "TARGET_VAES"
 {
-  if (which_alternative == 0 && <MODE>mode == V16QImode)
+  if (!TARGET_AES && <MODE>mode == V16QImode)
     return "%{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}";
   else
     return "vaesenc\t{%2, %1, %0|%0, %1, %2}";
@@ -30844,7 +30844,7 @@
 	  UNSPEC_VAESENCLAST))]
   "TARGET_VAES"
 {
-  if (which_alternative == 0 && <MODE>mode == V16QImode)
+  if (!TARGET_AES && <MODE>mode == V16QImode)
     return "%{evex%} vaesenclast\t{%2, %1, %0|%0, %1, %2}";
   else
     return "vaesenclast\t{%2, %1, %0|%0, %1, %2}";
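The effect of the change on the emitted template can be modelled as a small decision function. This is a hedged C sketch, not GCC code: `vaes_output_template` and its parameters are hypothetical, with `target_aes` standing in for TARGET_AES and `is_v16qi` for the V16QImode check. After the patch only the 128-bit form with AES disabled gets the `%{evex%}` pseudo-prefix, and the register alternative no longer plays a role.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the output-template choice in the patched vaes*
   patterns.  Writes the dual-syntax (AT&T|Intel) template for MNEMONIC
   into BUF.  The "%{evex%} " prefix is added only for 128-bit operands
   when AES is disabled: without it, gas would pick the VEX encoding,
   which belongs to AES rather than VAES. */
const char *
vaes_output_template (char *buf, size_t len, const char *mnemonic,
                      bool target_aes, bool is_v16qi)
{
  const char *prefix = (!target_aes && is_v16qi) ? "%{evex%} " : "";
  /* "%%" yields a literal '%', reproducing the operand placeholders. */
  snprintf (buf, len, "%s%s\t{%%2, %%1, %%0|%%0, %%1, %%2}", prefix, mnemonic);
  return buf;
}
```

Note that the old `which_alternative == 0` test would have forced the prefix even with AES enabled, which is the second problem the commit message describes.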