Re: [PATCH] gcc_update: use gcc-descr git alias for revision string in gcc/REVISION

2021-07-16 Thread Richard Biener via Gcc-patches
On Thu, Jul 15, 2021 at 9:12 PM Serge Belyshev
 wrote:
>
> This is to make development version string more readable, and
> to simplify navigation through gcc-testresults.
>
> Currently gcc_update uses git log --pretty=tformat:%p:%t:%H to
> generate the version string, which has been somewhat excessive since
> the conversion to git, because commit hashes are now stable.
>
> Even better, the gcc-git-customization.sh script provides a gcc-descr
> alias which makes a prettier version string, so use it instead (or just
> the abbreviated commit hash when the alias is not available).
>
> Before: [master revision 
> b25edf6e6fe:e035f180ebf:7094a69bd62a14dfa311eaa2fea468f221c7c9f3]
> After: [master r12-2331]
>
> OK for mainline?

Can you instead open-code gcc-descr in this script?

> contrib/Changelog:
>
> * gcc_update: Use gcc-descr alias for revision string if it exists, or
> abbreviated commit hash instead. Drop "revision" from gcc/REVISION.
> ---
>  contrib/gcc_update | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/contrib/gcc_update b/contrib/gcc_update
> index 80fac9fc995..8f712e37616 100755
> --- a/contrib/gcc_update
> +++ b/contrib/gcc_update
> @@ -332,7 +332,7 @@ case $vcs_type in
>  exit 1
> fi
>
> -   revision=`$GCC_GIT log -n1 --pretty=tformat:%p:%t:%H`
> +   revision=`$GCC_GIT gcc-descr || $GCC_GIT log -n1 --pretty=tformat:%h`
> branch=`$GCC_GIT name-rev --name-only HEAD || :`
> ;;
>
> @@ -414,6 +414,6 @@ rm -f LAST_UPDATED gcc/REVISION
>  date
>  echo "`TZ=UTC date` (revision $revision)"
>  } > LAST_UPDATED
> -echo "[$branch revision $revision]" > gcc/REVISION
> +echo "[$branch $revision]" > gcc/REVISION
>
>  touch_files_reexec


Re: [COMMITTED] Add gimple_range_type for statements.

2021-07-16 Thread Richard Biener via Gcc-patches
On Thu, Jul 15, 2021 at 10:00 PM Andrew MacLeod  wrote:
>
> On 7/15/21 9:06 AM, Richard Biener wrote:
> > On Thu, Jul 15, 2021 at 1:06 PM Aldy Hernandez  wrote:
> >>
> >> Currently gimple_expr_type is ICEing because it calls 
> >> gimple_call_return_type.
> >>
> >> I still think gimple_call_return_type should return void_type_node
> >> instead of ICEing, but this will also fix my problem.
> >>
> >> Anyone have a problem with this?
> > It's still somewhat inconsistent, no?  Because for a call without a LHS
> > it's now either void_type_node or the type of the return value.
> >
> > It's probably known I dislike gimple_expr_type itself (it was introduced
> > to make the transition to tuples easier).  I wonder why you can't simply
> > fix range_of_call to do
> >
> > tree lhs = gimple_call_lhs (call);
> > if (lhs)
> >   type = TREE_TYPE (lhs);
> >
> > Richard.
>
> You are correct. There are indeed inconsistencies, and they exist in
> multiple places.  In fact, none of them do exactly what we are looking
> for all the time, and there are times we do care about the stmt when
> there is no LHS.  In addition, we almost always then have to check
> whether the type we found is supported.
>
> So instead, much as we did for types with range_compatible_p (), we'll
> provide a function for statements which does exactly what we need. This
> patch eliminates all the ranger calls to both gimple_expr_type () and
> gimple_call_return_type ().  This will also simplify the life of
> anyone who eventually goes to remove gimple_expr_type (), as there will
> now be fewer uses.

Thanks a lot!

Richard.

> The function will return a type if and only if we can find the type in
> an orderly fashion, and then determine if it is also supported by ranger.
>
> Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.
>
> Andrew
>


Re: [PATCH] Fix PR 101453: ICE with optimize and large integer constant

2021-07-16 Thread Jakub Jelinek via Gcc-patches
On Thu, Jul 15, 2021 at 06:59:17PM -0700, apinski--- via Gcc-patches wrote:
> From: Andrew Pinski 
> 
> Every base-10 digit takes ~3.32 bits to represent, so a 64-bit
> signed integer needs up to 20 characters. The buffer was only
> 20 bytes, so it did not fit; adding the null character and the "-O"
> part, the buffer was 3 bytes too small.
> 
> Instead of just increasing the size of the buffer, I decided to
> calculate the size at compile time and use constexpr to get a
> constant for the size.
> Since GCC is written in C++11, using constexpr is the best way
> to force the size to be calculated at compile time.
> 
> OK? Bootstrapped and tested on x86_64-linux with no regressions.
> 
> gcc/c-family/ChangeLog:
> 
>   PR c/101453
>   * c-common.c (parse_optimize_options): Use the correct
>   size for buffer.

The formatting is wrong (lots of spaces missing) and we aren't so space
constrained that we need to use floating point in the calculations.
Other places in gcc just multiply sizeof by 3, which isn't enough
for a -128..127 char, but long must have a wider range than that.
So just char buffer[sizeof (long) * 3 + 3]; ?
Or at least use HOST_BITS_PER_LONG instead of sizeof(long)*CHAR_BIT and
avoid the double calculation, so
  char buffer[HOST_BITS_PER_LONG / 3 + 4];
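
As a rough check of the two sizes suggested above, the following sketch formats
the most negative long into a buffer of HOST_BITS_PER_LONG / 3 + 4 bytes.
HOST_BITS_PER_LONG is GCC-internal, so kBitsPerLong below is a stand-in that
assumes the same value as sizeof (long) * CHAR_BIT; kOptBufferSize,
kTripleSize, format_optimize_arg and format_optimize_arg_examples are
hypothetical names and not part of the patch.

```cpp
#include <cassert>
#include <climits>
#include <cstdio>
#include <cstring>

// Assumption: stand-in for GCC's HOST_BITS_PER_LONG on the host.
constexpr int kBitsPerLong = static_cast<int> (sizeof (long)) * CHAR_BIT;

// One decimal digit carries more than 3 bits, so bits / 3 digits are enough;
// the extra 4 bytes hold "-O", a possible minus sign and the trailing NUL.
constexpr int kOptBufferSize = kBitsPerLong / 3 + 4;

// The simpler "sizeof times 3" rule is at least as large.
constexpr int kTripleSize = static_cast<int> (sizeof (long)) * 3 + 3;
static_assert (kTripleSize >= kOptBufferSize, "sizeof-based size is no smaller");

// Format VALUE as "-O<value>" into BUF; return the number of characters
// that the full string needs, excluding the trailing NUL.
inline int
format_optimize_arg (char (&buf)[kOptBufferSize], long value)
{
  return std::snprintf (buf, sizeof buf, "-O%ld", value);
}

// Usage example: the widest possible value still fits without truncation.
inline void
format_optimize_arg_examples ()
{
  char buf[kOptBufferSize];
  int needed = format_optimize_arg (buf, LONG_MIN);
  assert (needed > 0 && needed < kOptBufferSize);
  format_optimize_arg (buf, 2);
  assert (std::strcmp (buf, "-O2") == 0);
}
```

The sketch uses snprintf rather than sprintf only so that an undersized
buffer would show up as a failed check instead of an overflow.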

> ---
>  gcc/c-family/c-common.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
> index 20ec26317c5..4c5b75a9548 100644
> --- a/gcc/c-family/c-common.c
> +++ b/gcc/c-family/c-common.c
> @@ -5799,7 +5799,9 @@ parse_optimize_options (tree args, bool attr_p)
>  
>if (TREE_CODE (value) == INTEGER_CST)
>   {
> -   char buffer[20];
> +   constexpr double log10 = 3.32;
> +   constexpr int longdigits = ((int)((sizeof(long)*CHAR_BIT)/log10))+1;
> +   char buffer[longdigits + 3];
> sprintf (buffer, "-O%ld", (long) TREE_INT_CST_LOW (value));
> vec_safe_push (optimize_args, ggc_strdup (buffer));
>   }
> -- 
> 2.27.0

Jakub



Re: [PATCH] gcc_update: use gcc-descr git alias for revision string in gcc/REVISION

2021-07-16 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 16, 2021 at 09:06:01AM +0200, Richard Biener via Gcc-patches wrote:
> On Thu, Jul 15, 2021 at 9:12 PM Serge Belyshev
>  wrote:
> >
> > This is to make development version string more readable, and
> > to simplify navigation through gcc-testresults.
> >
> > Currently gcc_update uses git log --pretty=tformat:%p:%t:%H to
> > generate the version string, which has been somewhat excessive since
> > the conversion to git, because commit hashes are now stable.
> >
> > Even better, the gcc-git-customization.sh script provides a gcc-descr
> > alias which makes a prettier version string, so use it instead (or just
> > the abbreviated commit hash when the alias is not available).
> >
> > Before: [master revision 
> > b25edf6e6fe:e035f180ebf:7094a69bd62a14dfa311eaa2fea468f221c7c9f3]
> > After: [master r12-2331]
> >
> > OK for mainline?
> 
> Can you instead open-code gcc-descr in this script?

Yeah, that will mean consistency no matter whether one has the
customizations installed or not.
And, you don't want the effect of $GCC_GIT gcc-descr but $GCC_GIT gcc-descr HEAD
(the default is $GCC_GIT gcc-descr master).
As you want to use gcc-descr without --full, I think
revision=`$GCC_GIT log -n1 --pretty=tformat:%h`
r=`$GCC_GIT describe --all --match 'basepoints/gcc-[0-9]*' HEAD \
   | sed -n 's,^\(tags/\)\?basepoints/gcc-\([0-9]\+\)-\([0-9]\+\)-g[0-9a-f]*$,r\2-\3,p;s,^\(tags/\)\?basepoints/gcc-\([0-9]\+\)$,r\2-0,p'`;
if test -n "$r"; then
  o=`$GCC_GIT config --get gcc-config.upstream`;
  rr=`echo $r | sed -n 's,^r\([0-9]\+\)-[0-9]\+\(-g[0-9a-f]\+\)\?$,\1,p'`;
  if $GCC_GIT rev-parse --verify --quiet ${o:-origin}/releases/gcc-$rr >/dev/null; then
    m=releases/gcc-$rr;
  else
    m=master;
  fi;
  if $GCC_GIT merge-base --is-ancestor HEAD ${o:-origin}/$m; then
    revision=$r;
  fi
fi
will do it.  Perhaps rename the r, o, rr and m temporaries.

Jakub
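
To make the first sed expression above easier to follow, here is a minimal
sketch of the same text transformation written with std::regex. It only
mirrors the string rewriting (describe output to rNN-MMM); it does not run git,
and the describe strings in the usage example are made-up inputs, not real
output. describe_to_revision and describe_to_revision_examples are
hypothetical names.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Map "git describe" style output for a basepoints/gcc-N tag to the short
// rN-M form.  Returns an empty string when the input does not match, which
// corresponds to sed -n printing nothing.
inline std::string
describe_to_revision (const std::string &describe_output)
{
  // basepoints/gcc-N-M-gHASH: M commits after the basepoint tag.
  static const std::regex with_count
    ("^(tags/)?basepoints/gcc-([0-9]+)-([0-9]+)-g[0-9a-f]*$");
  // basepoints/gcc-N: exactly at the basepoint tag, reported as rN-0.
  static const std::regex exact_tag ("^(tags/)?basepoints/gcc-([0-9]+)$");

  std::smatch m;
  if (std::regex_match (describe_output, m, with_count))
    return "r" + m[2].str () + "-" + m[3].str ();
  if (std::regex_match (describe_output, m, exact_tag))
    return "r" + m[2].str () + "-0";
  return "";
}

// Usage example with illustrative inputs.
inline void
describe_to_revision_examples ()
{
  assert (describe_to_revision ("tags/basepoints/gcc-12-2331-g7094a69bd62")
	  == "r12-2331");
  assert (describe_to_revision ("basepoints/gcc-12") == "r12-0");
  assert (describe_to_revision ("heads/master").empty ());
}
```

An empty result corresponds to the case where the script keeps the abbreviated
commit hash as the revision.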



Re: [PATCH v3] IBM Z: Use @PLT symbols for local functions in 64-bit mode

2021-07-16 Thread Andreas Krebbel via Gcc-patches
On 7/12/21 9:23 PM, Ilya Leoshkevich wrote:
> Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?
> 
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573614.html
> v1 -> v2: Do not use UNSPEC_PLT in 64-bit code and rename it to
>   UNSPEC_PLT31 (Ulrich, Andreas).  Do not append @PLT only to
>   weak symbols in non-PIC code (Ulrich).  Add TLS tests.
> 
> v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574646.html
> v2 -> v3: Use %K in function_profiler() and s390_output_mi_thunk(),
>   add tests for these cases.
> 
> 
> 
> This helps with generating code for kernel hotpatches, which contain
> individual functions and are loaded more than 2G away from vmlinux.
> This should not create performance regressions for the normal use
> cases, because for local functions ld replaces @PLT calls with direct
> calls.
> 
> gcc/ChangeLog:
> 
>   * config/s390/predicates.md (bras_sym_operand): Accept all
>   functions in 64-bit mode, use UNSPEC_PLT31.
>   (larl_operand): Use UNSPEC_PLT31.
>   * config/s390/s390.c (s390_loadrelative_operand_p): Likewise.
>   (legitimize_pic_address): Likewise.
>   (s390_emit_tls_call_insn): Mark __tls_get_offset as function,
>   use UNSPEC_PLT31.
>   (s390_delegitimize_address): Use UNSPEC_PLT31.
>   (s390_output_addr_const_extra): Likewise.
>   (print_operand): Add @PLT to TLS calls, handle %K.
>   (s390_function_profiler): Mark __fentry__/_mcount as function,
>   use %K, use UNSPEC_PLT31.
>   (s390_output_mi_thunk): Use only UNSPEC_GOT, use %K.
>   (s390_emit_call): Use UNSPEC_PLT31.
>   (s390_emit_tpf_eh_return): Mark __tpf_eh_return as function.
>   * config/s390/s390.md (UNSPEC_PLT31): Rename from UNSPEC_PLT.
>   (*movdi_64): Use %K.
>   (reload_base_64): Likewise.
>   (*sibcall_brc): Likewise.
>   (*sibcall_brcl): Likewise.
>   (*sibcall_value_brc): Likewise.
>   (*sibcall_value_brcl): Likewise.
>   (*bras): Likewise.
>   (*brasl): Likewise.
>   (*bras_r): Likewise.
>   (*brasl_r): Likewise.
>   (*bras_tls): Likewise.
>   (*brasl_tls): Likewise.
>   (main_base_64): Likewise.
>   (reload_base_64): Likewise.
>   (@split_stack_call): Likewise.

Ok. Thanks!

Andreas


[committed] libstdc++: Suppress pedantic warnings about __int128

2021-07-16 Thread Jonathan Wakely via Gcc-patches
With -std=c++NN -pedantic -Wsystem-headers there are warnings about the
use of __int128, which can be suppressed using diagnostic pragmas.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/cpp_type_traits.h: Add diagnostic pragmas around
uses of non-standard integer types.
* include/bits/functional_hash.h: Likewise.
* include/bits/iterator_concepts.h: Likewise.
* include/bits/max_size_type.h: Likewise.
* include/bits/std_abs.h: Likewise.
* include/bits/stl_algobase.h: Likewise.
* include/bits/uniform_int_dist.h: Likewise.
* include/ext/numeric_traits.h: Likewise.
* include/std/type_traits: Likewise.

Tested powerpc64le-linux. Committed to trunk.

commit c1676651b6c417e8f2b276a28199d76943834277
Author: Jonathan Wakely 
Date:   Thu Jul 15 15:36:34 2021

libstdc++: Suppress pedantic warnings about __int128

With -std=c++NN -pedantic -Wsystem-headers there are warnings about the
use of __int128, which can be suppressed using diagnostic pragmas.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/cpp_type_traits.h: Add diagnostic pragmas around
uses of non-standard integer types.
* include/bits/functional_hash.h: Likewise.
* include/bits/iterator_concepts.h: Likewise.
* include/bits/max_size_type.h: Likewise.
* include/bits/std_abs.h: Likewise.
* include/bits/stl_algobase.h: Likewise.
* include/bits/uniform_int_dist.h: Likewise.
* include/ext/numeric_traits.h: Likewise.
* include/std/type_traits: Likewise.

diff --git a/libstdc++-v3/include/bits/cpp_type_traits.h 
b/libstdc++-v3/include/bits/cpp_type_traits.h
index ca0d68c29de..8f8dd817dc2 100644
--- a/libstdc++-v3/include/bits/cpp_type_traits.h
+++ b/libstdc++-v3/include/bits/cpp_type_traits.h
@@ -266,6 +266,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   typedef __true_type __type;  \
 };
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wpedantic"
+
 #ifdef __GLIBCXX_TYPE_INT_N_0
 __INT_N(__GLIBCXX_TYPE_INT_N_0)
 #endif
@@ -279,6 +282,8 @@ __INT_N(__GLIBCXX_TYPE_INT_N_2)
 __INT_N(__GLIBCXX_TYPE_INT_N_3)
 #endif
 
+#pragma GCC diagnostic pop
+
 #undef __INT_N
 
   //
diff --git a/libstdc++-v3/include/bits/functional_hash.h 
b/libstdc++-v3/include/bits/functional_hash.h
index 7be8ebfa2d3..78e3644bc74 100644
--- a/libstdc++-v3/include/bits/functional_hash.h
+++ b/libstdc++-v3/include/bits/functional_hash.h
@@ -171,6 +171,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// Explicit specialization for unsigned long long.
   _Cxx_hashtable_define_trivial_hash(unsigned long long)
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wpedantic"
+
 #ifdef __GLIBCXX_TYPE_INT_N_0
   _Cxx_hashtable_define_trivial_hash(__GLIBCXX_TYPE_INT_N_0)
   _Cxx_hashtable_define_trivial_hash(__GLIBCXX_TYPE_INT_N_0 unsigned)
@@ -188,6 +191,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _Cxx_hashtable_define_trivial_hash(__GLIBCXX_TYPE_INT_N_3 unsigned)
 #endif
 
+#pragma GCC diagnostic pop
+
 #undef _Cxx_hashtable_define_trivial_hash
 
   struct _Hash_impl
diff --git a/libstdc++-v3/include/bits/iterator_concepts.h 
b/libstdc++-v3/include/bits/iterator_concepts.h
index c273056c204..97c0b80a507 100644
--- a/libstdc++-v3/include/bits/iterator_concepts.h
+++ b/libstdc++-v3/include/bits/iterator_concepts.h
@@ -553,6 +553,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 class __max_diff_type;
 class __max_size_type;
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wpedantic"
+
 template
   concept __is_signed_int128
 #if __SIZEOF_INT128__
@@ -569,6 +572,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
= false;
 #endif
 
+#pragma GCC diagnostic pop
+
 template
   concept __cv_bool = same_as;
 
diff --git a/libstdc++-v3/include/bits/max_size_type.h 
b/libstdc++-v3/include/bits/max_size_type.h
index 153b1bff5f4..24237cc57de 100644
--- a/libstdc++-v3/include/bits/max_size_type.h
+++ b/libstdc++-v3/include/bits/max_size_type.h
@@ -417,7 +417,10 @@ namespace ranges
 #endif
 
 #if __SIZEOF_INT128__
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wpedantic"
   using __rep = unsigned __int128;
+#pragma GCC diagnostic pop
 #else
   using __rep = unsigned long long;
 #endif
@@ -771,7 +774,10 @@ namespace ranges
   static constexpr bool is_integer = true;
   static constexpr bool is_exact = true;
 #if __SIZEOF_INT128__
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wpedantic"
   static_assert(same_as<_Sp::__rep, unsigned __int128>);
+#pragma GCC diagnostic pop
   static constexpr int digits = 129;
 #else
   static_assert(same_as<_Sp::__rep, unsigned long long>);
diff --git a/libstdc++-v3/include/bits/std_abs.h 
b/libstdc++-v3/include/bits/std_abs.h
index ae6bfc1b1ac..c65ebb66439 100644
--- a/libstdc++-v3/include/bits/std_a

[PATCH] tree-optimization/101462 - fix signedness of reused reduction vector

2021-07-16 Thread Richard Biener
This fixes the partial reduction of the reused reduction vector to be
carried out in the correct sign, and records the correctly signed
vector for the skip edge use.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-07-16  Richard Biener  

* tree-vect-loop.c (vect_transform_cycle_phi): Correct sign
conversion issues with the partial reduction of the reused
vector accumulator.
---
 gcc/tree-vect-loop.c | 36 +---
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index fc3dab0d143..00a57b2ba62 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -7706,21 +7706,35 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
   if (auto *accumulator = reduc_info->reused_accumulator)
 {
   tree def = accumulator->reduc_input;
-  unsigned int nreduc;
-  bool res = constant_multiple_p (TYPE_VECTOR_SUBPARTS (TREE_TYPE (def)),
- TYPE_VECTOR_SUBPARTS (vectype_out),
- &nreduc);
-  gcc_assert (res);
-  if (nreduc != 1)
-   {
- /* Reduce the single vector to a smaller one.  */
+  if (!useless_type_conversion_p (vectype_out, TREE_TYPE (def)))
+   {
+ unsigned int nreduc;
+ bool res = constant_multiple_p (TYPE_VECTOR_SUBPARTS
+   (TREE_TYPE (def)),
+ TYPE_VECTOR_SUBPARTS (vectype_out),
+ &nreduc);
+ gcc_assert (res);
  gimple_seq stmts = NULL;
- def = vect_create_partial_epilog (def, vectype_out,
-   STMT_VINFO_REDUC_CODE (reduc_info),
-   &stmts);
+ /* Reduce the single vector to a smaller one.  */
+ if (nreduc != 1)
+   {
+ /* Perform the reduction in the appropriate type.  */
+ tree rvectype = vectype_out;
+ if (!useless_type_conversion_p (TREE_TYPE (vectype_out),
+ TREE_TYPE (TREE_TYPE (def))))
+   rvectype = build_vector_type (TREE_TYPE (TREE_TYPE (def)),
+ TYPE_VECTOR_SUBPARTS
+   (vectype_out));
+ def = vect_create_partial_epilog (def, rvectype,
+   STMT_VINFO_REDUC_CODE
+ (reduc_info),
+   &stmts);
+   }
  /* Adjust the input so we pick up the partially reduced value
 for the skip edge in vect_create_epilog_for_reduction.  */
  accumulator->reduc_input = def;
+ if (!useless_type_conversion_p (vectype_out, TREE_TYPE (def)))
+   def = gimple_convert (&stmts, vectype_out, def);
  if (loop_vinfo->main_loop_edge)
{
  /* While we'd like to insert on the edge this will split
-- 
2.26.2


[committed] libstdc++: Adjust doxygen markup for variable templates group [PR101307]

2021-07-16 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

PR libstdc++/101307
* include/std/type_traits: Adjust doxygen markup.

Tested powerpc64le-linux. Committed to trunk.

commit da89dfc2a0cb29d3d8a4f2394eee90d150cf6185
Author: Jonathan Wakely 
Date:   Thu Jul 15 21:13:34 2021

libstdc++: Adjust doxygen markup for variable templates group [PR101307]

libstdc++-v3/ChangeLog:

PR libstdc++/101307
* include/std/type_traits: Adjust doxygen markup.

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 91d65234f23..8d9c6394cd8 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -3096,17 +3096,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #if __cplusplus >= 201703L
 # define __cpp_lib_type_trait_variable_templates 201510L
   /**
-   * @defgroup variable_templates Variable templates for type traits.
+   * @defgroup variable_templates Variable templates for type traits
* @ingroup metaprogramming
*
-   * The variable `is_foo_v` is a boolean constant with the same value
-   * as the type trait `is_foo::value`.
+   * Each variable `is_xxx_v` is a boolean constant with the same value
+   * as the `value` member of the corresponding type trait `is_xxx`.
*
* @since C++17
*/
 
-  /** @ingroup variable_templates
+  /**
* @{
+   * @ingroup variable_templates
*/
 template 
   inline constexpr bool is_void_v = is_void<_Tp>::value;


[committed] libstdc++: Adjust doxygen markup for unique_ptr grouping

2021-07-16 Thread Jonathan Wakely via Gcc-patches
This reorders the @{ and @relates tags, and moves the definition of the
__cpp_lib_make_unique macro out of the group, as it seems to confuse
doxygen.

libstdc++-v3/ChangeLog:

* include/bits/unique_ptr.h: Adjust doxygen markup.

Tested powerpc64le-linux. Committed to trunk.
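
A minimal sketch of the markup order described above, with the group opened
first and @relates on the following line. The handle type and its helpers are
hypothetical and exist only to carry the comments; how Doxygen renders the
group depends on its version and configuration.

```cpp
#include <cassert>
#include <utility>

/// A hypothetical value wrapper, used only to illustrate the markup.
struct handle
{
  int value;
};

/// @{
/// @relates handle

/// Swap overload for handle
inline void
swap (handle &a, handle &b) noexcept
{
  std::swap (a.value, b.value);
}

/// Equality comparison for handle
inline bool
operator== (const handle &a, const handle &b) noexcept
{
  return a.value == b.value;
}

/// @}
```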

commit adc03d72c3fd9ce4902f09951ca2765eef848783
Author: Jonathan Wakely 
Date:   Thu Jul 15 21:14:40 2021

libstdc++: Adjust doxygen markup for unique_ptr grouping

This reorders the @{ and @relates tags, and moves the definition of the
__cpp_lib_make_unique macro out of the group, as it seems to confuse
doxygen.

libstdc++-v3/ChangeLog:

* include/bits/unique_ptr.h: Adjust doxygen markup.

diff --git a/libstdc++-v3/include/bits/unique_ptr.h 
b/libstdc++-v3/include/bits/unique_ptr.h
index d483f13f2b0..0a0667a7608 100644
--- a/libstdc++-v3/include/bits/unique_ptr.h
+++ b/libstdc++-v3/include/bits/unique_ptr.h
@@ -724,7 +724,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   unique_ptr& operator=(const unique_ptr&) = delete;
 };
 
-  /// @relates unique_ptr @{
+  /// @{
+  /// @relates unique_ptr
 
   /// Swap overload for unique_ptr
   template
@@ -936,7 +937,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { };
 
 #if __cplusplus >= 201402L
-  /// @relates unique_ptr @{
 #define __cpp_lib_make_unique 201304
 
   /// @cond undocumented
@@ -955,6 +955,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   /// @endcond
 
+  /// @{
+  /// @relates unique_ptr
+
   /// std::make_unique for single objects
   template
 inline typename _MakeUniq<_Tp>::__single_object


Re: [PATCH, Fortran] Bind(c): CFI_signed_char is not a Fortran character type

2021-07-16 Thread Thomas Koenig via Gcc-patches



Hi Sandra,

The part of the patch to add tests for this goes on top of my base 
TS29113 testsuite patch, which hasn't been reviewed or committed yet.


It is my understanding that it is not gcc policy to add xfailed test
cases for things that do not yet work. Rather, xfail is for tests that
later turn out not to work, especially on certain architectures.

I have added Toon in CC, maybe he can explain a bit more on that.

Regards

Thomas


Re: [committed] libstdc++: Suppress pedantic warnings about __int128

2021-07-16 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 16, 2021 at 08:41:06AM +0100, Jonathan Wakely via Gcc-patches wrote:
> --- a/libstdc++-v3/include/bits/max_size_type.h
> +++ b/libstdc++-v3/include/bits/max_size_type.h
> @@ -417,7 +417,10 @@ namespace ranges
>  #endif
>  
>  #if __SIZEOF_INT128__
> +#pragma GCC diagnostic push
> +#pragma GCC diagnostic ignored "-Wpedantic"
>using __rep = unsigned __int128;
> +#pragma GCC diagnostic pop

At least in simple cases like this, wouldn't
using __rep = __extension__ unsigned __int128;
be smaller?  And it should be more targetted, wouldn't disable
other pedantic warnings but about __int128.

Jakub



Re: [committed] libstdc++: Suppress pedantic warnings about __int128

2021-07-16 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 16, 2021 at 10:27:09AM +0200, Jakub Jelinek via Gcc-patches wrote:
> On Fri, Jul 16, 2021 at 08:41:06AM +0100, Jonathan Wakely via Gcc-patches 
> wrote:
> > --- a/libstdc++-v3/include/bits/max_size_type.h
> > +++ b/libstdc++-v3/include/bits/max_size_type.h
> > @@ -417,7 +417,10 @@ namespace ranges
> >  #endif
> >  
> >  #if __SIZEOF_INT128__
> > +#pragma GCC diagnostic push
> > +#pragma GCC diagnostic ignored "-Wpedantic"
> >using __rep = unsigned __int128;
> > +#pragma GCC diagnostic pop
> 
> At least in simple cases like this, wouldn't
>   using __rep = __extension__ unsigned __int128;

__extension__ using __rep = unsigned __int128;
actually (now tested).

Jakub
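
For comparison, a small sketch that puts the two spellings discussed in this
thread side by side: the diagnostic pragmas from the committed patch and the
__extension__ prefix on the alias-declaration. The alias names rep_pragma and
rep_extension and the helper wraps_to_zero are hypothetical; the fallback
branch is only there so the sketch also compiles on targets without __int128.

```cpp
#include <cassert>
#include <type_traits>

#if __SIZEOF_INT128__
// Variant 1: suppress -Wpedantic around the declaration with pragmas.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpedantic"
using rep_pragma = unsigned __int128;
#pragma GCC diagnostic pop

// Variant 2: mark just this declaration as a deliberate extension.
__extension__ using rep_extension = unsigned __int128;
#else
using rep_pragma = unsigned long long;
using rep_extension = unsigned long long;
#endif

// Both spellings name the same type; only the warning handling differs.
static_assert (std::is_same<rep_pragma, rep_extension>::value,
	       "both aliases name the same type");

// Unsigned arithmetic wraps, so max + 1 is zero for either alias.
inline bool
wraps_to_zero ()
{
  rep_extension max = ~rep_extension (0);
  return static_cast<rep_extension> (max + 1) == 0;
}
```

The second form is narrower in scope: it applies to one declaration rather
than to everything between the push and the pop.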



Re: [committed] libstdc++: Suppress pedantic warnings about __int128

2021-07-16 Thread Jonathan Wakely via Gcc-patches
On Fri, 16 Jul 2021, 09:30 Jakub Jelinek via Libstdc++, <
libstd...@gcc.gnu.org> wrote:

> On Fri, Jul 16, 2021 at 10:27:09AM +0200, Jakub Jelinek via Gcc-patches
> wrote:
> > On Fri, Jul 16, 2021 at 08:41:06AM +0100, Jonathan Wakely via
> Gcc-patches wrote:
> > > --- a/libstdc++-v3/include/bits/max_size_type.h
> > > +++ b/libstdc++-v3/include/bits/max_size_type.h
> > > @@ -417,7 +417,10 @@ namespace ranges
> > >  #endif
> > >
> > >  #if __SIZEOF_INT128__
> > > +#pragma GCC diagnostic push
> > > +#pragma GCC diagnostic ignored "-Wpedantic"
> > >using __rep = unsigned __int128;
> > > +#pragma GCC diagnostic pop
> >
> > At least in simple cases like this, wouldn't
> >   using __rep = __extension__ unsigned __int128;
>
> __extension__ using __rep = unsigned __int128;
> actually (now tested).
>

Ah, thanks. I didn't find the right syntax, and I know __extension__
doesn't work in other cases, like quad float literals, so I assumed it
doesn't work here. I suppose the literals don't work because the warning
comes from the preprocessor, which doesn't understand __extension__ (and also
ignores the diagnostic pragma).





> Jakub
>
>


Re: [committed] libstdc++: Suppress pedantic warnings about __int128

2021-07-16 Thread Jonathan Wakely via Gcc-patches
On Fri, 16 Jul 2021, 09:38 Jonathan Wakely,  wrote:

>
>
> On Fri, 16 Jul 2021, 09:30 Jakub Jelinek via Libstdc++, <
> libstd...@gcc.gnu.org> wrote:
>
>> On Fri, Jul 16, 2021 at 10:27:09AM +0200, Jakub Jelinek via Gcc-patches
>> wrote:
>> > On Fri, Jul 16, 2021 at 08:41:06AM +0100, Jonathan Wakely via
>> Gcc-patches wrote:
>> > > --- a/libstdc++-v3/include/bits/max_size_type.h
>> > > +++ b/libstdc++-v3/include/bits/max_size_type.h
>> > > @@ -417,7 +417,10 @@ namespace ranges
>> > >  #endif
>> > >
>> > >  #if __SIZEOF_INT128__
>> > > +#pragma GCC diagnostic push
>> > > +#pragma GCC diagnostic ignored "-Wpedantic"
>> > >using __rep = unsigned __int128;
>> > > +#pragma GCC diagnostic pop
>> >
>> > At least in simple cases like this, wouldn't
>> >   using __rep = __extension__ unsigned __int128;
>>
>> __extension__ using __rep = unsigned __int128;
>> actually (now tested).
>>
>
> Ah, thanks. I didn't find the right syntax, and I know __extension__
> doesn't work in other cases, like quad float literals, so I assumed it
> doesn't work here. I suppose the literals don't work because the warning
> comes from the preprocessor, which doesn't understand __extension__ (and also
> ignores the diagnostic pragma).
>

That grammar for a using-declaration makes no sense at all btw ;-)


Add EAF_NOT_RETURNED flag

2021-07-16 Thread Jan Hubicka
Hi,
this patch adds EAF_NOT_RETURNED flag which is determined by ipa-modref
and used both to improve its propagation (it can stop propagating flags
from call parameter to return value if EAF_NOT_RETURNED is earlier
determined for callee) and also to improve points-to constraints in
tree-ssa-structalias (since the return value constraint does not need to
contain the parameters that are not returned).

No true IPA propagation is done, but I will look into it incrementally
(there is a general problem of lacking return functions).

We now have 8 EAF flags so it is no longer possible to store them to
char datatype so I added eaf_flags_t. I also disabled some shortcuts in
ipa-modref which ignored CONST functions since EAF_UNUSED and
EAF_NOT_RETURNED are useful there, too.

The tree-ssa-structalias part is not very precise. I simply avoid adding
constraint copying callused to rhs if all parameters are
EAF_NOT_RETURNED.  This is overly conservative, but if one just skips
not returned parameters in call used we will optimize out initialization
of memory that is read by the callee but does not escape or gets
returned.  

It would be more precise to push arguments to rhsc vector individually,
but I would like to do this incrementally since this results in more
constraints and perhaps we should be smart and produce them only if
there is a mix of not returned and returned parameters or so.
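
The usefulness check that the patch changes in eaf_flags_useful_p can be
sketched in isolation as follows. This is a stand-alone mock, not the GCC code:
the bit values of the flags, the 16-bit width of eaf_flags_t and the final
"any flag set" branch are assumptions made for illustration (the real
constants live in tree-core.h), and everything sits in a hypothetical sketch
namespace.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

namespace sketch
{
// Assumption: a type wider than char, since the flags no longer fit in one.
typedef std::uint16_t eaf_flags_t;

// Hypothetical bit positions, chosen only for this example.
constexpr eaf_flags_t EAF_DIRECT = 1 << 0;
constexpr eaf_flags_t EAF_UNUSED = 1 << 3;
constexpr eaf_flags_t EAF_NOT_RETURNED = 1 << 8;

constexpr int ECF_CONST = 1 << 0;
constexpr int ECF_PURE = 1 << 1;

// Return true if any argument carries a flag that still adds information
// for a function with the given ECF flags.
inline bool
flags_useful_p (const std::vector<eaf_flags_t> &flags, int ecf_flags)
{
  for (eaf_flags_t f : flags)
    {
      if (ecf_flags & ECF_CONST)
	{
	  // Const functions: only "unused" and "not returned" matter.
	  if (f & (EAF_UNUSED | EAF_NOT_RETURNED))
	    return true;
	}
      else if (ecf_flags & ECF_PURE)
	{
	  if (f & (EAF_UNUSED | EAF_DIRECT | EAF_NOT_RETURNED))
	    return true;
	}
      else if (f)
	// Assumption: for other functions any set flag counts as useful.
	return true;
    }
  return false;
}

// Usage example.
inline void
flags_useful_p_examples ()
{
  assert (flags_useful_p ({EAF_NOT_RETURNED}, ECF_CONST));
  assert (!flags_useful_p ({EAF_DIRECT}, ECF_CONST));
  assert (flags_useful_p ({EAF_DIRECT}, ECF_PURE));
}
}
```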

Bootstrapped/regtested x86_64-linux, also ltobootstrapped with c++ only,
OK?

gcc/ChangeLog:

2021-07-16  Jan Hubicka  

* ipa-modref.c (struct escape_entry): Use eaf_flags_t.
(dump_eaf_flags): Dump EAF_NOT_RETURNED.
(eaf_flags_useful_p): Use eaf_flags_t; handle const functions
and EAF_NOT_RETURNED.
(modref_summary::useful_p): Likewise.
(modref_summary_lto::useful_p): Likewise.
(struct modref_summary_lto): Use eaf_flags_t.
(deref_flags): Handle EAF_NOT_RETURNED.
(struct escape_point): Use min_flags.
(modref_lattice::init): Add EAF_NOT_RETURNED.
(merge_call_lhs_flags): Ignore EAF_NOT_RETURNED functions.
(analyze_ssa_name_flags): Clear EAF_NOT_RETURNED on return;
handle call flags.
(analyze_parms): Also analyze const functions; update condition on
flags usefulness.
(modref_write): Update streaming.
(read_section): Update streaming.
(remap_arg_flags): Use eaf_flags_t.
(modref_merge_call_site_flags): Handle EAF_NOT_RETURNED.
* ipa-modref.h (eaf_flags_t): New typedef.
(struct modref_summary): Use eaf_flags_t.
* tree-core.h (EAF_NOT_RETURNED): New constant.
* tree-ssa-structalias.c (handle_rhs_call): Handle EAF_NOT_RETURNED.
(handle_const_call): Handle EAF_UNUSED and EAF_NOT_RETURNED.
(handle_pure_call): Handle EAF_NOT_RETURNED.

gcc/testsuite/ChangeLog:

2021-07-16  Jan Hubicka  

* gcc.dg/tree-ssa/modref-6.c: New test.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index d5a8332fb55..734d7d066bc 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -86,6 +86,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "stringpool.h"
 #include "tree-ssanames.h"
 
+
 namespace {
 
 /* We record fnspec specifiers for call edges since they depends on actual
@@ -135,7 +136,7 @@ struct escape_entry
   /* Argument it escapes to.  */
   unsigned int arg;
   /* Minimal flags known about the argument.  */
-  char min_flags;
+  eaf_flags_t min_flags;
   /* Does it escape directly or indirectly?  */
   bool direct;
 };
@@ -155,6 +156,8 @@ dump_eaf_flags (FILE *out, int flags, bool newline = true)
 fprintf (out, " nodirectescape");
   if (flags & EAF_UNUSED)
 fprintf (out, " unused");
+  if (flags & EAF_NOT_RETURNED)
+fprintf (out, " not_returned");
   if (newline)
   fprintf (out, "\n");
 }
@@ -278,12 +281,17 @@ modref_summary::~modref_summary ()
 /* Return true if FLAGS holds some useful information.  */
 
 static bool
-eaf_flags_useful_p (vec  &flags, int ecf_flags)
+eaf_flags_useful_p (vec  &flags, int ecf_flags)
 {
   for (unsigned i = 0; i < flags.length (); i++)
-if (ecf_flags & ECF_PURE)
+if (ecf_flags & ECF_CONST)
   {
-   if (flags[i] & (EAF_UNUSED | EAF_DIRECT))
+   if (flags[i] & (EAF_UNUSED | EAF_NOT_RETURNED))
+ return true;
+  }
+else if (ecf_flags & ECF_PURE)
+  {
+   if (flags[i] & (EAF_UNUSED | EAF_DIRECT | EAF_NOT_RETURNED))
  return true;
   }
 else
@@ -300,13 +308,15 @@ eaf_flags_useful_p (vec  &flags, int 
ecf_flags)
 bool
 modref_summary::useful_p (int ecf_flags, bool check_flags)
 {
-  if (ecf_flags & (ECF_CONST | ECF_NOVOPS))
+  if (ecf_flags & ECF_NOVOPS)
 return false;
   if (arg_flags.length () && !check_flags)
 return true;
   if (check_flags && eaf_flags_useful_p (arg_flags, ecf_flags))
 return true;
   arg_flags.release ();
+  if (ecf_flags & ECF_CONST)
+return false;
   if (loads && !loads->every_base)
 return true;
   if 

RE: [PATCH 1/4][committed] testsuite: Fix testisms in scalar tests PR101457

2021-07-16 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: H.J. Lu 
> Sent: Friday, July 16, 2021 3:21 AM
> To: Tamar Christina 
> Cc: GCC Patches ; Richard Sandiford
> ; nd 
> Subject: Re: [PATCH 1/4][committed] testsuite: Fix testisms in scalar tests
> PR101457
> 
> On Thu, Jul 15, 2021 at 9:40 AM Tamar Christina via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > Hi All,
> >
> > These testcases accidentally contain the wrong signs for the expected
> > values for the scalar code.  The vector code however is correct.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Committed as a trivial fix.
> >
> > Thanks,
> > Tamar
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR middle-end/101457
> > * gcc.dg/vect/vect-reduc-dot-17.c: Fix signs of scalar code.
> > * gcc.dg/vect/vect-reduc-dot-18.c: Likewise.
> > * gcc.dg/vect/vect-reduc-dot-22.c: Likewise.
> > * gcc.dg/vect/vect-reduc-dot-9.c: Likewise.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
> > b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
> > index
> >
> aa269c4d657f65e07e36df7f3fd0098cf3aaf4d0..38f86fe458adcc7ebbbae22f5cc
> 1
> > e720928f2d48 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
> > @@ -35,8 +35,9 @@ main (void)
> >  {
> >check_vect ();
> >
> > -  SIGNEDNESS_3 char a[N], b[N];
> > -  int expected = 0x12345;
> > +  SIGNEDNESS_3 char a[N];
> > +  SIGNEDNESS_4 char b[N];
> > +  SIGNEDNESS_1 int expected = 0x12345;
> >for (int i = 0; i < N; ++i)
> >  {
> >a[i] = BASE + i * 5;
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
> > b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
> > index
> >
> 2b1cc0411c3256ccd876d8b4da18ce4881dc0af9..2e86ebe3c6c6a0da9ac2428685
> 92
> > f30028ed2155 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
> > @@ -35,8 +35,9 @@ main (void)
> >  {
> >check_vect ();
> >
> > -  SIGNEDNESS_3 char a[N], b[N];
> > -  int expected = 0x12345;
> > +  SIGNEDNESS_3 char a[N];
> > +  SIGNEDNESS_4 char b[N];
> > +  SIGNEDNESS_1 int expected = 0x12345;
> >for (int i = 0; i < N; ++i)
> >  {
> >a[i] = BASE + i * 5;
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
> > index febeb19784c6aaca72dc0871af0d32cc91fa6ea2..0bde43a6cb855ce5edd9015ebf34ca226353d77e 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
> > @@ -37,7 +37,7 @@ main (void)
> >
> >SIGNEDNESS_3 char a[N];
> >SIGNEDNESS_4 short b[N];
> > -  int expected = 0x12345;
> > +  SIGNEDNESS_1 long expected = 0x12345;
> 
> Does it work with long == int? I still got

Ah no, it requires double widening.  I'll replace it with a long long.

Thanks,
Tamar
> 
> FAIL: gcc.dg/vect/vect-reduc-dot-22.c -flto -ffat-lto-objects scan-tree-dump-not vect "vect_recog_dot_prod_pattern: detected"
> FAIL: gcc.dg/vect/vect-reduc-dot-22.c scan-tree-dump-not vect "vect_recog_dot_prod_pattern: detected"
> 
> with -m32 on Linux/x86-64.
> 
> >for (int i = 0; i < N; ++i)
> >  {
> >a[i] = BASE + i * 5;
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
> > index cbbeedec3bfd0810a8ce8036e6670585d9334924..d1049c96bf1febfc8933622e292b44cc8dd129cc 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
> > @@ -35,8 +35,9 @@ main (void)
> >  {
> >check_vect ();
> >
> > -  SIGNEDNESS_3 char a[N], b[N];
> > -  int expected = 0x12345;
> > +  SIGNEDNESS_3 char a[N];
> > +  SIGNEDNESS_4 char b[N];
> > +  SIGNEDNESS_1 int expected = 0x12345;
> >for (int i = 0; i < N; ++i)
> >  {
> >a[i] = BASE + i * 5;
> >
> >
> > --
> 
> 
> --
> H.J.
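
On the `long` versus `long long` point in the exchange above: the replies suggest the test wants an accumulator strictly wider than `int`, which `long` only provides on some targets. A minimal sketch of that distinction, assuming the usual ILP32 versus LP64 split (for example `-m32` versus `-m64` on Linux/x86-64). The macro and function names are hypothetical and are not part of the testsuite.

```c
#include <assert.h>
#include <limits.h>

/* Width of a type in bits, computed from its size in bytes.  */
#define BIT_WIDTH(type) (sizeof (type) * CHAR_BIT)

/* The C standard guarantees that 'long long' has at least 64 bits,
   independent of -m32 or -m64.  */
static_assert (BIT_WIDTH (long long) >= 64, "long long is at least 64 bits");

/* Return nonzero when 'long' is strictly wider than 'int' on this target.
   Typically true on LP64 targets and false on ILP32 ones.  */
int
long_is_wider_than_int (void)
{
  return BIT_WIDTH (long) > BIT_WIDTH (int);
}

/* Return nonzero when 'long long' is at least 64 bits wide.  */
int
long_long_is_at_least_64_bits (void)
{
  return BIT_WIDTH (long long) >= 64;
}
```

On a target where `long_is_wider_than_int ()` returns 0, declaring `expected` as `long` gives it the same width as `int`, which would be consistent with the `-m32` failures reported above.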


[PATCH] C-SKY: Use the common way to define MULTILIB_DIRNAMES.

2021-07-16 Thread Xianmiao Qu
From: Cooper Qu 

This patch has been pushed.

C-SKY previously used a forked print-sysroot-suffix.sh and defined
CSKY_MULTILIB_DIRNAMES to specify OS multilib directories. This
patch deletes the forked print-sysroot-suffix.sh and defines
MULTILIB_DIRNAMES to generate the same directories.

gcc/
* config.gcc: Don't use forked print-sysroot-suffix.sh and
t-sysroot-suffix for C-SKY.
* config/csky/print-sysroot-suffix.sh: Delete.
* config/csky/t-csky-linux: Delete.
* config/csky/t-sysroot-suffix: Define MULTILIB_DIRNAMES
instead of CSKY_MULTILIB_DIRNAMES.
---
 gcc/config.gcc                          |   5 -
 gcc/config/csky/print-sysroot-suffix.sh | 147 
 gcc/config/csky/t-csky-linux            |   2 +-
 gcc/config/csky/t-sysroot-suffix        |  28 -
 4 files changed, 1 insertion(+), 181 deletions(-)
 delete mode 100644 gcc/config/csky/print-sysroot-suffix.sh
 delete mode 100644 gcc/config/csky/t-sysroot-suffix

diff --git a/gcc/config.gcc b/gcc/config.gcc
index f3e94f7c0d8..93e2b3219b9 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1568,11 +1568,6 @@ csky-*-*)
tm_file="dbxelf.h elfos.h gnu-user.h linux.h glibc-stdint.h ${tm_file} csky/csky-linux-elf.h"
tmake_file="${tmake_file} csky/t-csky csky/t-csky-linux"
 
-   if test "x${enable_multilib}" = xyes ; then
-   tm_file="$tm_file ./sysroot-suffix.h"
-   tmake_file="${tmake_file} csky/t-sysroot-suffix"
-   fi
-
case ${target} in
csky-*-linux-gnu*)
tm_defines="$tm_defines DEFAULT_LIBC=LIBC_GLIBC"
diff --git a/gcc/config/csky/print-sysroot-suffix.sh b/gcc/config/csky/print-sysroot-suffix.sh
deleted file mode 100644
index 4840bc67d07..00000000000
--- a/gcc/config/csky/print-sysroot-suffix.sh
+++ /dev/null
@@ -1,147 +0,0 @@
-#! /bin/sh
-# Script to generate SYSROOT_SUFFIX_SPEC equivalent to MULTILIB_OSDIRNAMES
-# Arguments are MULTILIB_OSDIRNAMES, MULTILIB_OPTIONS and MULTILIB_MATCHES.
-
-# Copyright (C) 2018-2021 Free Software Foundation, Inc.
-# Contributed by C-SKY Microsystems and Mentor Graphics.
-
-# This file is part of GCC.
-
-# GCC is free software; you can redistribute it and/or modify it under
-# the terms of the GNU General Public License as published by the Free
-# Software Foundation; either version 3, or (at your option) any later
-# version.
-
-# GCC is distributed in the hope that it will be useful, but WITHOUT
-# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
-# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
-# for more details.
-
-# You should have received a copy of the GNU General Public License
-# along with GCC; see the file COPYING3.  If not see
-# <http://www.gnu.org/licenses/>.
-
-# This shell script produces a header file fragment that defines
-# SYSROOT_SUFFIX_SPEC.  It assumes that the sysroots will have the same
-# structure and names used by the multilibs.
-
-# Invocation:
-#   print-sysroot-suffix.sh \
-#  MULTILIB_OSDIRNAMES \
-#  MULTILIB_OPTIONS \
-#  MULTILIB_MATCHES \
-#  > t-sysroot-suffix.h
-
-# The three options exactly correspond to the variables of the same
-# names defined in the tmake_file fragments.
-
-# Example:
-#   sh ./gcc/config/print-sysroot-suffix.sh "a=A" "a b/c/d" ""
-# =>
-#   #undef SYSROOT_SUFFIX_SPEC
-#   #define SYSROOT_SUFFIX_SPEC "" \
-#   "%{a:" \
-# "%{b:A/b/;" \
-# "c:A/c/;" \
-# "d:A/d/;" \
-# ":A/};" \
-#   ":}"
-
-# The script uses temporary subscripts in order to permit a recursive
-# algorithm without the use of functions.
-
-set -e
-
-dirnames="$1"
-options="$2"
-matches="$3"
-
-cat > print-sysroot-suffix3.sh <<\EOF
-#! /bin/sh
-# Print all the multilib matches for this option
-result="$1"
-EOF
-for x in $matches; do
-  l=`echo $x | sed -e 's/=.*$//' -e 's/?/=/g'`
-  r=`echo $x | sed -e 's/^.*=//' -e 's/?/=/g'`
-  echo "[ \"\$1\" = \"$l\" ] && result=\"\$result|$r\"" >> print-sysroot-suffix3.sh
-done
-echo 'echo $result' >> print-sysroot-suffix3.sh
-chmod +x print-sysroot-suffix3.sh
-
-cat > print-sysroot-suffix2.sh <<\EOF
-#! /bin/sh
-# Recursive script to enumerate all multilib combinations, match against
-# multilib directories and output a spec string of the result.
-# Will fold identical trees.
-
-padding="$1"
-optstring="$2"
-shift 2
-n="\" \\
-$padding\""
-if [ $# = 0 ]; then
-EOF
-
-pat=
-for x in $dirnames; do
-#  p=`echo $x | sed -e 's,=!,/$=/,'`
-  p=`echo $x | sed -e 's/=//g'`
-#  pat="$pat -e 's=^//$p='"
-   pat="$pat -e 's/$p/g'"
-done
-echo '  optstring=`echo "/$optstring" | sed '"$pat\`" >> print-sysroot-suffix2.sh
-cat >> print-sysroot-suffix2.sh <<\EOF
-  case $optstring in
-  //*)
-;;
-  *)
-echo "$optstring"
-;;
-  esac
-else
-  thisopt="$1"
-  shift
-  bit=
-  lastcond=
-  result=
-  for x in `echo "$thisopt" | sed -e 's,/, ,g'`; do
-case $x in
-E

[RFC] c-family: Add __builtin_noassoc

2021-07-16 Thread Matthias Kretz
On Wednesday, 14 July 2021 10:14:55 CEST Richard Biener wrote:
> > > There's one "related" IL feature used by the Fortran frontend -
> > > PAREN_EXPR
> > > prevents association across it.  So for Fortran (when not
> > > -fno-protect-parens which is enabled by -Ofast), (a + b) - b cannot be
> > > optimized to a.  Eventually this could be used to wrap intrinsic results
> > > since most of the issues in the end require association.  Note
> > > PAREN_EXPR
> > > isn't exposed to the C family frontends but we could of course add a
> > > builtin-like thing for this _Noassoc (  ) or so.  Note PAREN_EXPR
> > > survives -Ofast so it's the frontends that would need to choose to emit
> > > or
> > > not emit it (or always emit it).
> >
> > Interesting. I want that builtin in C++. Currently I use inline asm to
> > achieve a similar effect. But the inline asm hammer is really too big for
> > the problem.
>
> I think implementing it similar to how we do __builtin_shufflevector would
> be easily possible.  PAREN_EXPR is a tree code.

Like this? If you like it, I'll write the missing documentation and do real 
regression testing.

---

New builtin to enable explicit use of PAREN_EXPR in C & C++ code.

Signed-off-by: Matthias Kretz 

gcc/testsuite/ChangeLog:

* c-c++-common/builtin-noassoc-1.c: New test.

gcc/cp/ChangeLog:

* cp-objcp-common.c (names_builtin_p): Handle
RID_BUILTIN_NOASSOC.
* parser.c (cp_parser_postfix_expression): Handle
RID_BUILTIN_NOASSOC.

gcc/c-family/ChangeLog:

* c-common.c (c_common_reswords): Add __builtin_noassoc.
* c-common.h (enum rid): Add RID_BUILTIN_NOASSOC.

gcc/c/ChangeLog:

* c-decl.c (names_builtin_p): Handle RID_BUILTIN_NOASSOC.
* c-parser.c (c_parser_postfix_expression): Likewise.
---
 gcc/c-family/c-common.c               |  1 +
 gcc/c-family/c-common.h               |  2 +-
 gcc/c/c-decl.c                        |  1 +
 gcc/c/c-parser.c                      | 20
 gcc/cp/cp-objcp-common.c              |  1 +
 gcc/cp/parser.c                       | 14 +++
 .../c-c++-common/builtin-noassoc-1.c  | 24 +++
 7 files changed, 62 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/c-c++-common/builtin-noassoc-1.c


-- 
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
 std::experimental::simd  https://github.com/VcDevel/std-simd
──
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 681fcc972f4..e74123d896c 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -384,6 +384,7 @@ const struct c_common_resword c_common_reswords[] =
   { "__builtin_convertvector", RID_BUILTIN_CONVERTVECTOR, 0 },
   { "__builtin_has_attribute", RID_BUILTIN_HAS_ATTRIBUTE, 0 },
   { "__builtin_launder", RID_BUILTIN_LAUNDER, D_CXXONLY },
+  { "__builtin_noassoc", RID_BUILTIN_NOASSOC, 0 },
   { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, 0 },
   { "__builtin_shufflevector", RID_BUILTIN_SHUFFLEVECTOR, 0 },
   { "__builtin_tgmath", RID_BUILTIN_TGMATH, D_CONLY },
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 50ca8fb6ebd..b772cf9c5e9 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -108,7 +108,7 @@ enum rid
   RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL,  RID_CHOOSE_EXPR,
   RID_TYPES_COMPATIBLE_P,  RID_BUILTIN_COMPLEX,	 RID_BUILTIN_SHUFFLE,
   RID_BUILTIN_SHUFFLEVECTOR,   RID_BUILTIN_CONVERTVECTOR,   RID_BUILTIN_TGMATH,
-  RID_BUILTIN_HAS_ATTRIBUTE,
+  RID_BUILTIN_HAS_ATTRIBUTE,   RID_BUILTIN_NOASSOC,
   RID_DFLOAT32, RID_DFLOAT64, RID_DFLOAT128,
 
   /* TS 18661-3 keywords, in the same sequence as the TI_* values.  */
diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 983d65e930c..7b7ecba026f 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -10557,6 +10557,7 @@ names_builtin_p (const char *name)
 case RID_BUILTIN_HAS_ATTRIBUTE:
 case RID_BUILTIN_SHUFFLE:
 case RID_BUILTIN_SHUFFLEVECTOR:
+case RID_BUILTIN_NOASSOC:
 case RID_CHOOSE_EXPR:
 case RID_OFFSETOF:
 case RID_TYPES_COMPATIBLE_P:
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 9a56e0c04c6..2b40dc8253e 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -8931,6 +8931,7 @@ c_parser_predefined_identifier (c_parser *parser)
 			 assignment-expression ,
 			 assignment-expression, )
  __builtin_convertvector ( assignment-expression , type-name )
+ __builtin_noassoc ( assignment-expression )
 
offsetof-member-designator:
  identifier
@@ -10076,6 +10077,25 @@ c_parser_postfix_expression (c_parser *parser)
 	  }
 	  }
 	  break;
+	case RID_BUILTIN_NOASSOC:
+	  {
+	location_t start_lo

Re: [PATCH 2/2][RFC] Add loop masking support for x86

2021-07-16 Thread Richard Biener
On Thu, 15 Jul 2021, Richard Biener wrote:

> On Thu, 15 Jul 2021, Richard Biener wrote:
>
> > OK, guess I was more looking at
> > 
> > #define N 32
> > int foo (unsigned long *a, unsigned long * __restrict b,
> >  unsigned int *c, unsigned int * __restrict d,
> >  int n)
> > {
> >   unsigned sum = 1;
> >   for (int i = 0; i < n; ++i)
> > {
> >   b[i] += a[i];
> >   d[i] += c[i];
> > }
> >   return sum;
> > }
> > 
> > where we on x86 AVX512 vectorize with V8DI and V16SI and we
> > generate two masks for the two copies of V8DI (VF is 16) and one
> > mask for V16SI.  With SVE I see
> > 
> > punpklo p1.h, p0.b
> > punpkhi p2.h, p0.b
> > 
> > that's sth I expected to see for AVX512 as well, using the V16SI
> > mask and unpacking that to two V8DI ones.  But I see
> > 
> > > vpbroadcastd    %eax, %ymm0
> > > vpaddd          %ymm12, %ymm0, %ymm0
> > > vpcmpud         $6, %ymm0, %ymm11, %k3
> > > vpbroadcastd    %eax, %xmm0
> > > vpaddd          %xmm10, %xmm0, %xmm0
> > > vpcmpud         $1, %xmm7, %xmm0, %k1
> > > vpcmpud         $6, %xmm0, %xmm8, %k2
> > > kortestb        %k1, %k1
> > > jne             .L3
> > 
> > so three %k masks generated by vpcmpud.  I'll have to look what's
> > the magic for SVE and why that doesn't trigger for x86 here.
> 
> So answer myself, vect_maybe_permute_loop_masks looks for
> vec_unpacku_hi/lo_optab, but with AVX512 the vector bools have
> QImode so that doesn't play well here.  Not sure if there
> are proper mask instructions to use (I guess there's a shift
> and lopart is free).  This is QI:8 to two QI:4 (bits) mask
> conversion.  Not sure how to better ask the target here - again
> VnBImode might have been easier here.

So I've managed to "emulate" the unpack_lo/hi for the case of
!VECTOR_MODE_P masks by using sub-vector select (we're asking
to turn vector(8) <signed-boolean:1> into two
vector(4) <signed-boolean:1>) via BIT_FIELD_REF.  That then
produces the desired single mask producer and

  loop_mask_38 = VIEW_CONVERT_EXPR<vector(4) <signed-boolean:1>>(loop_mask_54);
  loop_mask_37 = BIT_FIELD_REF <loop_mask_54, 4, 4>;

note for the lowpart we can just view-convert away the excess bits,
fully re-using the mask.  We generate surprisingly "good" code:

kmovb   %k1, %edi
shrb    $4, %dil
kmovb   %edi, %k2

besides the lack of using kshiftrb.  I guess we're just lacking
a mask register alternative for

(insn 22 20 25 4 (parallel [
(set (reg:QI 94 [ loop_mask_37 ])
(lshiftrt:QI (reg:QI 98 [ loop_mask_54 ])
(const_int 4 [0x4])))
(clobber (reg:CC 17 flags))
]) 724 {*lshrqi3_1}
 (expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))

and so we reload.  For the above cited loop the AVX512 vectorization
with --param vect-partial-vector-usage=1 does look quite sensible
to me.  Instead of an SSE vectorized epilogue plus a scalar
epilogue we get a single fully masked AVX512 "iteration" for both.
I suppose it's still mostly a code-size optimization (384 bytes
with the masked epilogue vs. 474 bytes with trunk) since it will
likely be slower for very low iteration counts but it's good
for icache usage then and good for less branch predictor usage.

That said, I have to set up SPEC on an AVX512 machine to do
any meaningful measurements (I suspect with just AVX2 we're not
going to see any benefit from masking).  Hints/help on how to fix
the missing kshiftrb appreciated.

Oh, and if there's only V4DImode and V16HImode data then
we don't go the vect_maybe_permute_loop_masks path - that is,
we don't generate the (not used) intermediate mask but end up
generating two while_ult parts.

Thanks,
Richard.
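
As a plain-C illustration of the mask split discussed above (one 8-lane mask turned into two 4-lane masks, mirroring the kmovb / shrb $4 / kmovb sequence), here is a minimal sketch. It assumes lane i of the mask lives in bit i of a byte; the function names are hypothetical and this is not the vectorizer code.

```c
#include <assert.h>
#include <stdint.h>

/* Low half of an 8-lane mask: lanes 0..3 live in bits 0..3.  The explicit
   AND is for clarity only; per the mail above, the low part can reuse the
   mask register as-is when the consumer ignores the excess bits.  */
uint8_t
mask_lo4 (uint8_t mask8)
{
  return mask8 & 0x0f;
}

/* High half: lanes 4..7 shifted down so they become lanes 0..3.  */
uint8_t
mask_hi4 (uint8_t mask8)
{
  return (uint8_t) (mask8 >> 4);
}

/* Recombine the two halves; handy as a sanity check of the split.  */
uint8_t
mask_join (uint8_t lo4, uint8_t hi4)
{
  assert (lo4 <= 0x0f && hi4 <= 0x0f);
  return (uint8_t) (lo4 | (hi4 << 4));
}
```

For example, `mask_hi4 (0xA5)` yields `0x0A`, which corresponds to the logical right shift by 4 shown in the RTL above.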


Re: [PATCH] gcc_update: use gcc-descr git alias for revision string in gcc/REVISION

2021-07-16 Thread Richard Biener via Gcc-patches
On Fri, Jul 16, 2021 at 9:29 AM Jakub Jelinek  wrote:
>
> On Fri, Jul 16, 2021 at 09:06:01AM +0200, Richard Biener via Gcc-patches 
> wrote:
> > On Thu, Jul 15, 2021 at 9:12 PM Serge Belyshev
> >  wrote:
> > >
> > > This is to make development version string more readable, and
> > > to simplify navigation through gcc-testresults.
> > >
> > > Currently gcc_update uses git log --pretty=tformat:%p:%t:%H to
> > > generate version string, which is somewhat excessive since conversion
> > > to git because commit hashes are now stable.
> > >
> > > Even better, gcc-git-customization.sh script provides gcc-descr alias
> > > which makes prettier version string, and thus use it instead (or just
> > > abbreviated commit hash when the alias is not available).
> > >
> > > Before: [master revision 
> > > b25edf6e6fe:e035f180ebf:7094a69bd62a14dfa311eaa2fea468f221c7c9f3]
> > > After: [master r12-2331]
> > >
> > > OK for mainline?
> >
> > Can you instead open-code gcc-descr in this script?
>
> Yeah, that will mean consistency no matter whether one has the
> customizations installed or not.
> And, you don't want the effect of $GCC_GIT gcc-descr but $GCC_GIT gcc-descr 
> HEAD
> (the default is $GCC_GIT gcc-descr master).
> As you want to use gcc-descr without --full, I think
> revision=`$GCC_GIT log -n1 --pretty=tformat:%h`
> r=`$GCC_GIT describe --all --match 'basepoints/gcc-[0-9]*' HEAD \
>| sed -n 
> 's,^\(tags/\)\?basepoints/gcc-\([0-9]\+\)-\([0-9]\+\)-g[0-9a-f]*$,r\2-\3,p;s,^\(tags/\)\?basepoints/gcc-\([0-9]\+\)$,r\2-0,p'`;
> if test -n $r; then
> o=`$GCC_GIT config --get gcc-config.upstream`;
> rr=`echo $r | sed -n 
> 's,^r\([0-9]\+\)-[0-9]\+\(-g[0-9a-f]\+\)\?$,\1,p'`;
> if $GCC_GIT rev-parse --verify --quiet 
> ${o:-origin}/releases/gcc-$rr >/dev/null; then
> m=releases/gcc-$rr;
> else
> m=master;
> fi;
> if $GCC_GIT merge-base --is-ancestor HEAD ${o:-origin}/$m; 
> then
> revision=$r;
> fi
> fi
> will do it.  Perhaps rename the r, o, rr and m temporaries.

Note the new form will be more difficult to use for people not having
the customizations installed.  It will also likely break when gcc_update
is not invoked on official branches?

So I'm not sure the change is a good one after all...

Richard.

> Jakub
>


Re: [RFC] c-family: Add __builtin_noassoc

2021-07-16 Thread Richard Biener via Gcc-patches
On Fri, Jul 16, 2021 at 10:57 AM Matthias Kretz  wrote:
>
> On Wednesday, 14 July 2021 10:14:55 CEST Richard Biener wrote:
> > > > There's one "related" IL feature used by the Fortran frontend -
> > > > PAREN_EXPR
> > > > prevents association across it.  So for Fortran (when not
> > > > -fno-protect-parens which is enabled by -Ofast), (a + b) - b cannot be
> > > > optimized to a.  Eventually this could be used to wrap intrinsic results
> > > > since most of the issues in the end require association.  Note
> > > > PAREN_EXPR
> > > > isn't exposed to the C family frontends but we could of course add a
> > > > builtin-like thing for this _Noassoc (  ) or so.  Note PAREN_EXPR
> > > > survives -Ofast so it's the frontends that would need to choose to emit
> > > > or
> > > > not emit it (or always emit it).
> > >
> > > Interesting. I want that builtin in C++. Currently I use inline asm to
> > > achieve a similar effect. But the inline asm hammer is really too big for
> > > the problem.
> >
> > I think implementing it similar to how we do __builtin_shufflevector would
> > be easily possible.  PAREN_EXPR is a tree code.
>
> Like this? If you like it, I'll write the missing documentation and do real
> regression testing.

Yes, like this.  Now, __builtin_noassoc (a + b + c) might suggest that
it prevents a + b + c from being re-associated - but it does not.  PAREN_EXPR
is a barrier for association, so for 'a + b + c + PAREN_EXPR <d + e + f>'
the a+b+c and d+e+f chains will not mix but they individually can be
re-associated.  That said __builtin_noassoc might be a bad name,
maybe __builtin_assoc_barrier is better?

The implementation is originally for the Fortran language semantics
which allows re-association but respects parens (thus PAREN_EXPR).

To fully prevent association of an a + b + d + e chain you need at least
two PAREN_EXPRs, for example (a+b) + (d+e) would do.

One could of course provide __builtin_noassoc (a+b+c+d) with the
implied semantics and insert PAREN_EXPRs around all operands
when lowering it.

Not sure what's more useful in practice - directly exposing the middle-end
PAREN_EXPR or providing a way to mark a whole expression as to be
not re-associated?  Maybe both?

Richard.
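
To make concrete why re-association across parentheses matters for floating point, here is a minimal C sketch of the (a + b) - b case mentioned in the quoted text. It does not use PAREN_EXPR or the proposed builtin; the `volatile` temporary is only a standard-C stand-in that pins the intermediate rounding step, and the function names are hypothetical.

```c
/* Evaluate (a + b) - b in source order.  The volatile temporary forces
   the sum to be rounded and stored before the subtraction, so the
   compiler cannot simplify the whole expression to 'a'.  */
double
add_then_subtract (double a, double b)
{
  volatile double sum = a + b;
  return sum - b;
}

/* Return nonzero when source-order evaluation differs from the
   algebraically simplified result 'a'.  */
int
reassociation_would_change_result (double a, double b)
{
  return add_then_subtract (a, b) != a;
}
```

With a = 1.0 and b = 1e100 the small term is absorbed by the sum, so source-order evaluation gives 0.0 while the simplified form would give 1.0. This is the kind of difference an association barrier is meant to preserve.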

> ---
>
> New builtin to enable explicit use of PAREN_EXPR in C & C++ code.
>
> Signed-off-by: Matthias Kretz 
>
> gcc/testsuite/ChangeLog:
>
> * c-c++-common/builtin-noassoc-1.c: New test.
>
> gcc/cp/ChangeLog:
>
> * cp-objcp-common.c (names_builtin_p): Handle
> RID_BUILTIN_NOASSOC.
> * parser.c (cp_parser_postfix_expression): Handle
> RID_BUILTIN_NOASSOC.
>
> gcc/c-family/ChangeLog:
>
> * c-common.c (c_common_reswords): Add __builtin_noassoc.
> * c-common.h (enum rid): Add RID_BUILTIN_NOASSOC.
>
> gcc/c/ChangeLog:
>
> * c-decl.c (names_builtin_p): Handle RID_BUILTIN_NOASSOC.
> * c-parser.c (c_parser_postfix_expression): Likewise.
> ---
>  gcc/c-family/c-common.c   |  1 +
>  gcc/c-family/c-common.h   |  2 +-
>  gcc/c/c-decl.c|  1 +
>  gcc/c/c-parser.c  | 20 
>  gcc/cp/cp-objcp-common.c  |  1 +
>  gcc/cp/parser.c   | 14 +++
>  .../c-c++-common/builtin-noassoc-1.c  | 24 +++
>  7 files changed, 62 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/c-c++-common/builtin-noassoc-1.c
>
>
> --
> ──
>  Dr. Matthias Kretz   https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
>  std::experimental::simd  https://github.com/VcDevel/std-simd
> ──


Re: [PATCH] gcc_update: use gcc-descr git alias for revision string in gcc/REVISION

2021-07-16 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 16, 2021 at 11:22:27AM +0200, Richard Biener wrote:
> > Yeah, that will mean consistency no matter whether one has the
> > customizations installed or not.
> > And, you don't want the effect of $GCC_GIT gcc-descr but $GCC_GIT gcc-descr 
> > HEAD
> > (the default is $GCC_GIT gcc-descr master).
> > As you want to use gcc-descr without --full, I think
> > revision=`$GCC_GIT log -n1 --pretty=tformat:%h`
> > r=`$GCC_GIT describe --all --match 'basepoints/gcc-[0-9]*' HEAD \
> >| sed -n 
> > 's,^\(tags/\)\?basepoints/gcc-\([0-9]\+\)-\([0-9]\+\)-g[0-9a-f]*$,r\2-\3,p;s,^\(tags/\)\?basepoints/gcc-\([0-9]\+\)$,r\2-0,p'`;
> > if test -n $r; then
> > o=`$GCC_GIT config --get gcc-config.upstream`;
> > rr=`echo $r | sed -n 
> > 's,^r\([0-9]\+\)-[0-9]\+\(-g[0-9a-f]\+\)\?$,\1,p'`;
> > if $GCC_GIT rev-parse --verify --quiet 
> > ${o:-origin}/releases/gcc-$rr >/dev/null; then
> > m=releases/gcc-$rr;
> > else
> > m=master;
> > fi;
> > if $GCC_GIT merge-base --is-ancestor HEAD ${o:-origin}/$m; 
> > then
> > revision=$r;
> > fi
> > fi
> > will do it.  Perhaps rename the r, o, rr and m temporaries.
> 
> Note the new form will be more difficult to use for people not having
> the customizations
> installed.  It also will likely break when gcc-update is not invoked
> on official branches?

It will not break, on the non-official branches it will just print the
hash alone.  That is the --is-ancestor check in there...
People without the customizations can easily look it up using gcc.gnu.org,
https://gcc.gnu.org/r12-1234
works.

The advantage of the r12-1234 form is that it is short, unique and easily
comparable (what is older vs. newer) and clearly says what is official
release branch.

An alternative would be to use the git gcc-descr --full form with the hash
part reduced to, say, 11 or 12 chars, i.e.
r12-1234-g9147affc04e1
then it works even without the customizations (can be fed directly to git)
and still has the unique and easily comparable properties, but isn't that
good on the short side anymore.  In the above script just using
revision=${r}-g${revision};
instead of
revision=$r;
would do it.  Perhaps also replace both HEAD occurrences with $revision.

Jakub



Re: [PATCH] gcc_update: use gcc-descr git alias for revision string in gcc/REVISION

2021-07-16 Thread Richard Biener via Gcc-patches
On Fri, Jul 16, 2021 at 11:36 AM Jakub Jelinek  wrote:
>
> On Fri, Jul 16, 2021 at 11:22:27AM +0200, Richard Biener wrote:
> > > Yeah, that will mean consistency no matter whether one has the
> > > customizations installed or not.
> > > And, you don't want the effect of $GCC_GIT gcc-descr but $GCC_GIT 
> > > gcc-descr HEAD
> > > (the default is $GCC_GIT gcc-descr master).
> > > As you want to use gcc-descr without --full, I think
> > > revision=`$GCC_GIT log -n1 --pretty=tformat:%h`
> > > r=`$GCC_GIT describe --all --match 'basepoints/gcc-[0-9]*' HEAD \
> > >| sed -n 
> > > 's,^\(tags/\)\?basepoints/gcc-\([0-9]\+\)-\([0-9]\+\)-g[0-9a-f]*$,r\2-\3,p;s,^\(tags/\)\?basepoints/gcc-\([0-9]\+\)$,r\2-0,p'`;
> > > if test -n $r; then
> > > o=`$GCC_GIT config --get gcc-config.upstream`;
> > > rr=`echo $r | sed -n 
> > > 's,^r\([0-9]\+\)-[0-9]\+\(-g[0-9a-f]\+\)\?$,\1,p'`;
> > > if $GCC_GIT rev-parse --verify --quiet 
> > > ${o:-origin}/releases/gcc-$rr >/dev/null; then
> > > m=releases/gcc-$rr;
> > > else
> > > m=master;
> > > fi;
> > > if $GCC_GIT merge-base --is-ancestor HEAD 
> > > ${o:-origin}/$m; then
> > > revision=$r;
> > > fi
> > > fi
> > > will do it.  Perhaps rename the r, o, rr and m temporaries.
> >
> > Note the new form will be more difficult to use for people not having
> > the customizations
> > installed.  It also will likely break when gcc-update is not invoked
> > on official branches?
>
> It will not break, on the non-official branches it will just print the
> hash alone.  That is the --is-ancestor check in there...
> People without the customizations can easily look it up using gcc.gnu.org,
> https://gcc.gnu.org/r12-1234
> works.
>
> The advantage of the r12-1234 form is that it is short, unique and easily
> comparable (what is older vs. newer) and clearly says what is official
> release branch.

True.

> Alternative would be to use git gcc-descr --full form with the hash part
> reduced say to 11 or 12 chars, i.e.
> r12-1234-g9147affc04e1
> then it works even without the customizations (can be fed directly to git)
> and still has the unique and easily comparable properties, but isn't that
> good on the short side anymore.

I'd still say the above is better, but yes, reducing the full hash is sensible
(doesn't git have an automatic way to do that given the current repo
"collisions"?)

>  In the above script just using
> revision=${r}-g${revision};
> instead of
> revision=$r;
> would do it.  Perhaps also replace both HEAD occurences with $revision
>
> Jakub
>


[PATCH] tree-optimization/101467 - fix make_temp_ssa_name usage

2021-07-16 Thread Richard Biener
My previous change to vect_gen_while introduced paths which call
make_temp_ssa_name with a NULL name which isn't supported.  The
following fixes that.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-07-16  Richard Biener  

PR tree-optimization/101467
* tree-vect-stmts.c (vect_gen_while): Properly guard
make_temp_ssa_name usage.
---
 gcc/tree-vect-stmts.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index ec82acb8db9..0ef46962618 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -11999,7 +11999,11 @@ vect_gen_while (gimple_seq *seq, tree mask_type, tree start_index,
   gcall *call = gimple_build_call_internal (IFN_WHILE_ULT, 3,
start_index, end_index,
build_zero_cst (mask_type));
-  tree tmp = make_temp_ssa_name (mask_type, NULL, name);
+  tree tmp;
+  if (name)
+tmp = make_temp_ssa_name (mask_type, NULL, name);
+  else
+tmp = make_ssa_name (mask_type);
   gimple_call_set_lhs (call, tmp);
   gimple_seq_add_stmt (seq, call);
   return tmp;
-- 
2.26.2


[PATCH v2] gcc_update: use human readable name for revision string in gcc/REVISION

2021-07-16 Thread Serge Belyshev
Based on the discussion, I've chosen the open-coded version without the commit hash.

>> > > ...  Perhaps rename the r, o, rr and m temporaries.

I like it better with short names; there is no other code in that
script to clash with.  (Also, the two adjacent case branches for hg and svn
are essentially dead now.)

>> ...  Perhaps also replace both HEAD occurences with $revision

Not sure about that: shouldn't they be exactly equivalent in all cases?

---
gcc_update: use human readable name for revision string in gcc/REVISION

contrib/ChangeLog:

* gcc_update: Derive human readable name for HEAD using git describe
like "git gcc-descr" does.  Drop "revision" from gcc/REVISION.
---
 contrib/gcc_update | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/contrib/gcc_update b/contrib/gcc_update
index 80fac9fc995..558926b3a2d 100755
--- a/contrib/gcc_update
+++ b/contrib/gcc_update
@@ -332,7 +332,22 @@ case $vcs_type in
 exit 1
fi
 
-   revision=`$GCC_GIT log -n1 --pretty=tformat:%p:%t:%H`
+   # Open-coded version of "git gcc-descr" from contrib/gcc-git-customization.sh
+   revision=`$GCC_GIT log -n1 --pretty=tformat:%h`
+   r=`$GCC_GIT describe --all --match 'basepoints/gcc-[0-9]*' HEAD \
+  | sed -n 's,^\(tags/\)\?basepoints/gcc-\([0-9]\+\)-\([0-9]\+\)-g[0-9a-f]*$,r\2-\3,p;s,^\(tags/\)\?basepoints/gcc-\([0-9]\+\)$,r\2-0,p'`;
+   if test -n "$r"; then
+   o=`$GCC_GIT config --get gcc-config.upstream`;
+   rr=`echo $r | sed -n 's,^r\([0-9]\+\)-[0-9]\+\(-g[0-9a-f]\+\)\?$,\1,p'`;
+   if $GCC_GIT rev-parse --verify --quiet ${o:-origin}/releases/gcc-$rr >/dev/null; then
+   m=releases/gcc-$rr;
+   else
+   m=master;
+   fi;
+   if $GCC_GIT merge-base --is-ancestor HEAD ${o:-origin}/$m; then
+   revision=$r;
+   fi
+   fi
branch=`$GCC_GIT name-rev --name-only HEAD || :`
;;
 
@@ -414,6 +429,6 @@ rm -f LAST_UPDATED gcc/REVISION
 date
 echo "`TZ=UTC date` (revision $revision)"
 } > LAST_UPDATED
-echo "[$branch revision $revision]" > gcc/REVISION
+echo "[$branch $revision]" > gcc/REVISION
 
 touch_files_reexec
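
For readers who find the first sed expression in the patch above hard to follow, the mapping it performs can be sketched in C. This is an illustration only: the function name `describe_to_revision` and the sample strings are hypothetical, and gcc_update itself does this in shell.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Skip one or more decimal digits; return NULL if there are none.  */
static const char *
skip_digits (const char *p)
{
  const char *q = p;
  while (isdigit ((unsigned char) *q))
    q++;
  return q == p ? NULL : q;
}

/* Map "[tags/]basepoints/gcc-N-M-gHASH" to "rN-M" and
   "[tags/]basepoints/gcc-N" to "rN-0".  Return 1 on a match and
   0 otherwise, in which case OUT is left untouched.  */
int
describe_to_revision (const char *desc, char *out, size_t outsz)
{
  static const char tags[] = "tags/";
  static const char base[] = "basepoints/gcc-";
  const char *major, *major_end, *count, *count_end, *p;

  assert (desc != NULL && out != NULL && outsz > 0);
  if (strncmp (desc, tags, sizeof tags - 1) == 0)
    desc += sizeof tags - 1;
  if (strncmp (desc, base, sizeof base - 1) != 0)
    return 0;
  major = desc + sizeof base - 1;
  major_end = skip_digits (major);
  if (major_end == NULL)
    return 0;
  if (*major_end == '\0')
    {
      /* Exactly on the basepoint: zero commits since it.  */
      snprintf (out, outsz, "r%.*s-0", (int) (major_end - major), major);
      return 1;
    }
  if (*major_end != '-')
    return 0;
  count = major_end + 1;
  count_end = skip_digits (count);
  if (count_end == NULL || count_end[0] != '-' || count_end[1] != 'g')
    return 0;
  /* The remainder must be lowercase hex digits (possibly none).  */
  for (p = count_end + 2; *p != '\0'; p++)
    if (!isdigit ((unsigned char) *p) && !(*p >= 'a' && *p <= 'f'))
      return 0;
  snprintf (out, outsz, "r%.*s-%.*s", (int) (major_end - major), major,
	    (int) (count_end - count), count);
  return 1;
}
```

Given a describe string such as `basepoints/gcc-12-2331-g7094a69bd62` (a made-up input modeled on the "After" example earlier in the thread), the function writes `r12-2331`; anything that does not match the basepoint pattern is rejected, which corresponds to the script keeping the abbreviated hash.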


Re: Add EAF_NOT_RETURNED flag

2021-07-16 Thread Richard Biener
On Fri, 16 Jul 2021, Jan Hubicka wrote:

> Hi,
> this patch adds EAF_NOT_RETURNED flag which is determined by ipa-modref
> and used both to improve its propagation (it can stop propagating flags
> from call parameter to return value if EAF_NOT_RETURNED is earlier
> determined for callee) and also to improve points-to constraints in
> tree-ssa-structalias (since return value constraint does not need to
> contain the parameters that are not returned).
> 
> No true IPA propagation is done, but I will look into it incrementally
> (there is general problem of lacking return functions).
> 
> We now have 8 EAF flags so it is no longer possible to store them to
> char datatype so I added eaf_flags_t. I also disabled some shortcuts in
> ipa-modref which ignored CONST functions since EAF_UNUSED and
> EAF_NOT_RETURNED are useful there, too.
> 
> The tree-ssa-structalias part is not very precise. I simply avoid adding
> constraint copying callused to rhs if all parameters are
> EAF_NOT_RETURNED.  This is overly conservative, but if one just skips
> not returned parameters in call used we will optimize out initialization
> of memory that is read by the callee but does not escape or gets
> returned.  
> 
> It would be more precise to push arguments to rhsc vector individually,
> but I would like to do this incrementally since this results in more
> constraints and perhaps we should be smart and produce them only if
> there is a mix of not returned and returned parameters or so.
> 
> Bootstrapped/regtested x86_64-linux, also ltobootstrapped with c++ only,
> OK?

OK.  Btw, there's some modref propagation correctness fix from Alex
which needs looking at - 
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573137.html

Thanks,
Richard.

> gcc/ChangeLog:
> 
> 2021-07-16  Jan Hubicka  
> 
>   * ipa-modref.c (struct escape_entry): Use eaf_flags_t.
>   (dump_eaf_flags): Dump EAF_NOT_RETURNED
>   (eaf_flags_useful_p): Use eaf_flags_t; handle const functions
>   and EAF_NOT_RETURNED.
>   (modref_summary::useful_p): Likewise.
>   (modref_summary_lto::useful_p): Likewise.
>   (struct modref_summary_lto): Use eaf_flags_t.
>   (deref_flags): Handle EAF_NOT_RETURNED.
>   (struct escape_point): Use min_flags.
>   (modref_lattice::init): Add EAF_NOT_RETURNED.
>   (merge_call_lhs_flags): Ignore EAF_NOT_RETURNED functions
>   (analyze_ssa_name_flags): Clear EAF_NOT_RETURNED on return;
>   handle call flags.
>   (analyze_parms): Also analyze const functions; update condition on
>   flags usefulness.
>   (modref_write): Update streaming.
>   (read_section): Update streaming.
>   (remap_arg_flags): Use eaf_flags_t.
>   (modref_merge_call_site_flags): Handle EAF_NOT_RETURNED.
>   * ipa-modref.h (eaf_flags_t): New typedef.
>   (struct modref_summary): Use eaf_flags_t.
>   * tree-core.h (EAF_NOT_RETURNED): New constant.
>   * tree-ssa-structalias.c (handle_rhs_call): Handle EAF_NOT_RETURNED.
>   (handle_const_call): Handle EAF_UNUSED and EAF_NOT_RETURNED.
>   (handle_pure_call): Handle EAF_NOT_RETURNED.
> 
> gcc/testsuite/ChangeLog:
> 
> 2021-07-16  Jan Hubicka  
> 
>   * gcc.dg/tree-ssa/modref-6.c: New test.
> 
> diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
> index d5a8332fb55..734d7d066bc 100644
> --- a/gcc/ipa-modref.c
> +++ b/gcc/ipa-modref.c
> @@ -86,6 +86,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "stringpool.h"
>  #include "tree-ssanames.h"
>  
> +
>  namespace {
>  
>  /* We record fnspec specifiers for call edges since they depends on actual
> @@ -135,7 +136,7 @@ struct escape_entry
>/* Argument it escapes to.  */
>unsigned int arg;
>/* Minimal flags known about the argument.  */
> -  char min_flags;
> +  eaf_flags_t min_flags;
>/* Does it escape directly or indirectly?  */
>bool direct;
>  };
> @@ -155,6 +156,8 @@ dump_eaf_flags (FILE *out, int flags, bool newline = true)
>  fprintf (out, " nodirectescape");
>if (flags & EAF_UNUSED)
>  fprintf (out, " unused");
> +  if (flags & EAF_NOT_RETURNED)
> +fprintf (out, " not_returned");
>if (newline)
>fprintf (out, "\n");
>  }
> @@ -278,12 +281,17 @@ modref_summary::~modref_summary ()
>  /* Return true if FLAGS holds some useful information.  */
>  
>  static bool
> -eaf_flags_useful_p (vec <unsigned char> &flags, int ecf_flags)
> +eaf_flags_useful_p (vec <eaf_flags_t> &flags, int ecf_flags)
>  {
>for (unsigned i = 0; i < flags.length (); i++)
> -if (ecf_flags & ECF_PURE)
> +if (ecf_flags & ECF_CONST)
>{
> - if (flags[i] & (EAF_UNUSED | EAF_DIRECT))
> + if (flags[i] & (EAF_UNUSED | EAF_NOT_RETURNED))
> +   return true;
> +  }
> +else if (ecf_flags & ECF_PURE)
> +  {
> + if (flags[i] & (EAF_UNUSED | EAF_DIRECT | EAF_NOT_RETURNED))
> return true;
>}
>  else
> @@ -300,13 +308,15 @@ eaf_flags_useful_p (vec  &flags, int 
> ecf_flags)
>  bool
>  modref_summary::useful_p (int ecf_flag

[PATCH] Get rid of some gimple_expr_type uses

2021-07-16 Thread Richard Biener
This gets rid of a few gimple_expr_type uses.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-07-16  Richard Biener  

* gimple-fold.c (gimple_fold_stmt_to_constant_1): Use
the type of the LHS.
(gimple_assign_nonnegative_warnv_p): Likewise.
(gimple_call_nonnegative_warnv_p): Likewise.  Return false
if the call has no LHS.
* gimple.c (gimple_could_trap_p_1): Use the type of the LHS.
* tree-eh.c (stmt_could_throw_1_p): Likewise.
* tree-inline.c (insert_init_stmt): Likewise.
* tree-ssa-loop-niter.c (get_val_for): Likewise.
* tree-outof-ssa.c (ssa_is_replaceable_p): Use the type of
the def.
* tree-ssa-sccvn.c (init_vn_nary_op_from_stmt): Take a
gassign *.  Use the type of the lhs.
(vn_nary_op_lookup_stmt): Adjust.
(vn_nary_op_insert_stmt): Likewise.
---
 gcc/gimple-fold.c | 20 +++-
 gcc/gimple.c  |  4 ++--
 gcc/tree-eh.c |  2 +-
 gcc/tree-inline.c |  2 +-
 gcc/tree-outof-ssa.c  |  2 +-
 gcc/tree-ssa-loop-niter.c |  4 ++--
 gcc/tree-ssa-sccvn.c  | 12 ++--
 7 files changed, 24 insertions(+), 22 deletions(-)

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 1401092aa9b..a3afe871f6b 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -7507,7 +7507,8 @@ gimple_fold_stmt_to_constant_1 (gimple *stmt, tree 
(*valueize) (tree),
   tree op1 = (*valueize) (gimple_assign_rhs2 (stmt));
   tree op2 = (*valueize) (gimple_assign_rhs3 (stmt));
   return fold_ternary_loc (loc, subcode,
-  gimple_expr_type (stmt), op0, op1, op2);
+  TREE_TYPE (gimple_assign_lhs (stmt)),
+  op0, op1, op2);
 }
 
   default:
@@ -8901,16 +8902,17 @@ gimple_assign_nonnegative_warnv_p (gimple *stmt, bool 
*strict_overflow_p,
   int depth)
 {
   enum tree_code code = gimple_assign_rhs_code (stmt);
+  tree type = TREE_TYPE (gimple_assign_lhs (stmt));
   switch (get_gimple_rhs_class (code))
 {
 case GIMPLE_UNARY_RHS:
   return tree_unary_nonnegative_warnv_p (gimple_assign_rhs_code (stmt),
-gimple_expr_type (stmt),
+type,
 gimple_assign_rhs1 (stmt),
 strict_overflow_p, depth);
 case GIMPLE_BINARY_RHS:
   return tree_binary_nonnegative_warnv_p (gimple_assign_rhs_code (stmt),
- gimple_expr_type (stmt),
+ type,
  gimple_assign_rhs1 (stmt),
  gimple_assign_rhs2 (stmt),
  strict_overflow_p, depth);
@@ -8938,12 +8940,12 @@ gimple_call_nonnegative_warnv_p (gimple *stmt, bool 
*strict_overflow_p,
 gimple_call_arg (stmt, 0) : NULL_TREE;
   tree arg1 = gimple_call_num_args (stmt) > 1 ?
 gimple_call_arg (stmt, 1) : NULL_TREE;
-
-  return tree_call_nonnegative_warnv_p (gimple_expr_type (stmt),
-   gimple_call_combined_fn (stmt),
-   arg0,
-   arg1,
-   strict_overflow_p, depth);
+  tree lhs = gimple_call_lhs (stmt);
+  return (lhs
+ && tree_call_nonnegative_warnv_p (TREE_TYPE (lhs),
+   gimple_call_combined_fn (stmt),
+   arg0, arg1,
+   strict_overflow_p, depth));
 }
 
 /* Return true if return value of call STMT is known to be non-negative.
diff --git a/gcc/gimple.c b/gcc/gimple.c
index 0690f94971f..863bc0d17f1 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -2164,12 +2164,12 @@ gimple_could_trap_p_1 (const gimple *s, bool 
include_mem, bool include_stores)
   if (op == COND_EXPR)
return tree_could_trap_p (gimple_assign_rhs1 (s));
 
-  /* For comparisons we need to check rhs operand types instead of rhs type
+  /* For comparisons we need to check rhs operand types instead of lhs type
  (which is BOOLEAN_TYPE).  */
   if (TREE_CODE_CLASS (op) == tcc_comparison)
t = TREE_TYPE (gimple_assign_rhs1 (s));
   else
-   t = gimple_expr_type (s);
+   t = TREE_TYPE (gimple_assign_lhs (s));
 
   if (get_gimple_rhs_class (op) == GIMPLE_BINARY_RHS)
div = gimple_assign_rhs2 (s);
diff --git a/gcc/tree-eh.c b/gcc/tree-eh.c
index 57ce8f04a43..3a09de95025 100644
--- a/gcc/tree-eh.c
+++ b/gcc/tree-eh.c
@@ -2856,7 +2856,7 @@ stmt_could_throw_1_p (gassign *stmt)
   if (TREE_CODE_CLASS (code) == tcc_comparison)
t = T

Re: [PATCH] c++: implement C++17 hardware interference size

2021-07-16 Thread Jonathan Wakely via Gcc-patches
On Fri, 16 Jul 2021 at 03:51, Noah Goldstein wrote:
> On intel x86 systems with a private L2 cache the spatial prefetcher
> can cause destructive interference along 128 byte aligned boundaries.
> https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf#page=60

Which is a good example of why these "constants" should never have
been standardized in the first place. Sigh.



Re: [committed] libstdc++: Suppress pedantic warnings about __int128

2021-07-16 Thread Jonathan Wakely via Gcc-patches
On Fri, 16 Jul 2021 at 09:40, Jonathan Wakely  wrote:
>
>
>
> On Fri, 16 Jul 2021, 09:38 Jonathan Wakely,  wrote:
>>
>>
>>
>> On Fri, 16 Jul 2021, 09:30 Jakub Jelinek via Libstdc++, 
>>  wrote:
>>>
>>> On Fri, Jul 16, 2021 at 10:27:09AM +0200, Jakub Jelinek via Gcc-patches 
>>> wrote:
>>> > On Fri, Jul 16, 2021 at 08:41:06AM +0100, Jonathan Wakely via Gcc-patches 
>>> > wrote:
>>> > > --- a/libstdc++-v3/include/bits/max_size_type.h
>>> > > +++ b/libstdc++-v3/include/bits/max_size_type.h
>>> > > @@ -417,7 +417,10 @@ namespace ranges
>>> > >  #endif
>>> > >
>>> > >  #if __SIZEOF_INT128__
>>> > > +#pragma GCC diagnostic push
>>> > > +#pragma GCC diagnostic ignored "-Wpedantic"
>>> > >using __rep = unsigned __int128;
>>> > > +#pragma GCC diagnostic pop
>>> >
>>> > At least in simple cases like this, wouldn't
>>> >   using __rep = __extension__ unsigned __int128;
>>>
>>> __extension__ using __rep = unsigned __int128;
>>> actually (now tested).
>>
>>
>> Ah, thanks. I didn't find the right syntax, and I know __extension__ doesn't 
>> work in other cases, like quad float literals, so I assumed it doesn't work 
>> here. I suppose the literals don't work because the warning comes from the 
>> preprocessor, which doesn't understand __extension__ (and also ignores the 
>> diagnostic pragma).
>
>
> That grammar for a using-declaration makes no sense at all btw ;-)

Hmm, in fact it seems that we can just use the __uint128_t typedef
instead, which doesn't give a pedwarn:

  using __rep = __uint128_t;

Is that typedef always available if __int128 is? There's a comment in
gcc/c-family/c-common.c that I don't understand:

#if HOST_BITS_PER_WIDE_INT >= 64
  /* Note that this is different than the __int128 type that's part of
 the generic __intN support.  */
  if (targetm.scalar_mode_supported_p (TImode))
lang_hooks.decls.pushdecl (build_decl (UNKNOWN_LOCATION,
   TYPE_DECL,
   get_identifier ("__int128_t"),
   intTI_type_node));
#endif

They are the same type in C++, so what is "different"? Is it possible
for __int128 to be different from a TImode integer?

We can still use __extension__ elsewhere, for defining explicit
specializations using the non-standard integers, e.g.

#define __INT_N(TYPE)  \
+  __extension__\
  template<>   \
    struct __is_integer<TYPE>  \
{  \
  enum { __value = 1 };\
  typedef __true_type __type;  \
};
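
For reference, the three spellings discussed in this thread can be put side by side. This is only a minimal sketch: the alias names and the helper int128_aliases_agree are made up for illustration and are not libstdc++ code, and the comparison relies on the statement above that these are the same type in C++.

```cpp
#include <cassert>
#include <type_traits>

#ifdef __SIZEOF_INT128__
// Variant 1: suppress -Wpedantic locally with diagnostic pragmas.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpedantic"
using rep_pragma = unsigned __int128;
#pragma GCC diagnostic pop

// Variant 2: put __extension__ in front of the whole alias-declaration.
__extension__ using rep_extension = unsigned __int128;

// Variant 3: use the predefined typedef instead of the keyword.
using rep_typedef = __uint128_t;

// Hypothetical helper: checks that all three aliases name one type.
constexpr bool int128_aliases_agree ()
{
  return std::is_same<rep_pragma, rep_extension>::value
	 && std::is_same<rep_extension, rep_typedef>::value;
}
#else
// Nothing to compare on targets without __int128.
constexpr bool int128_aliases_agree () { return true; }
#endif
```

Whether the typedef is available on every target that has __int128 is exactly the open question above, so the sketch does not settle that.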



Re: [PATCH] Rewrite memset expanders with vec_duplicate

2021-07-16 Thread Richard Sandiford via Gcc-patches
"H.J. Lu via Gcc-patches"  writes:
> 1. Rewrite builtin_memset_read_str and builtin_memset_gen_str with
> vec_duplicate_optab to duplicate QI value to TI/OI/XI value.
> 2. Add TARGET_GEN_MEMSET_SCRATCH_RTX to allow the backend to use a hard
> scratch register to avoid stack realignment when expanding memset.
>
>   PR middle-end/90773
>   * builtins.c (gen_memset_value_from_prev): New function.
>   (gen_memset_broadcast): Likewise.
>   (builtin_memset_read_str): Use gen_memset_value_from_prev
>   and gen_memset_broadcast.
>   (builtin_memset_gen_str): Likewise.
>   * target.def (gen_memset_scratch_rtx): New hook.
>   * doc/tm.texi.in: Add TARGET_GEN_MEMSET_SCRATCH_RTX.
>   * doc/tm.texi: Regenerated.
> ---
>  gcc/builtins.c | 123 +
>  gcc/doc/tm.texi|   5 ++
>  gcc/doc/tm.texi.in |   2 +
>  gcc/target.def |   7 +++
>  4 files changed, 116 insertions(+), 21 deletions(-)
>
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index 39ab139b7e1..c1758ae2efc 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -6686,26 +6686,111 @@ expand_builtin_strncpy (tree exp, rtx target)
>return NULL_RTX;
>  }
>  
> -/* Callback routine for store_by_pieces.  Read GET_MODE_BITSIZE (MODE)
> -   bytes from constant string DATA + OFFSET and return it as target
> -   constant.  If PREV isn't nullptr, it has the RTL info from the
> +/* Return the RTL of a register in MODE generated from PREV in the
> previous iteration.  */
>  
> -rtx
> -builtin_memset_read_str (void *data, void *prevp,
> -  HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
> -  scalar_int_mode mode)
> +static rtx
> +gen_memset_value_from_prev (void *prevp, scalar_int_mode mode)
>  {
> +  rtx target = nullptr;
>by_pieces_prev *prev = (by_pieces_prev *) prevp;
>if (prev != nullptr && prev->data != nullptr)
>  {
>/* Use the previous data in the same mode.  */
>if (prev->mode == mode)
>   return prev->data;
> +
> +  rtx prev_rtx = prev->data;
> +  machine_mode prev_mode = prev->mode;
> +  unsigned int word_size = GET_MODE_SIZE (word_mode);
> +  if (word_size < GET_MODE_SIZE (prev->mode)
> +   && word_size > GET_MODE_SIZE (mode))
> + {
> +   /* First generate subreg of word mode if the previous mode is
> +  wider than word mode and word mode is wider than MODE.  */
> +   prev_rtx = simplify_gen_subreg (word_mode, prev_rtx,
> +   prev_mode, 0);
> +   prev_mode = word_mode;
> + }
> +  if (prev_rtx != nullptr)
> + target = simplify_gen_subreg (mode, prev_rtx, prev_mode, 0);
>  }
> +  return target;
> +}
> +
> +/* Return the RTL of a register in MODE broadcasted from DATA.  */
> +
> +static rtx
> +gen_memset_broadcast (rtx data, scalar_int_mode mode)
> +{
> +  /* Skip if regno_reg_rtx isn't initialized.  */
> +  if (!regno_reg_rtx)
> +return nullptr;
> +
> +  rtx target = nullptr;
> +
> +  unsigned int nunits = GET_MODE_SIZE (mode) / GET_MODE_SIZE (QImode);
> +  machine_mode vector_mode;
> +  if (!mode_for_vector (QImode, nunits).exists (&vector_mode))
> +gcc_unreachable ();

Sorry, I realise it's a bit late to be raising this objection now,
but I don't think it's a good idea to use scalar integer modes as
a proxy for vector modes.  In principle there's no reason why a
target has to define an integer mode for every vector mode.

If we want the mode to be a vector then I think the by-pieces
infrastructure should be extended to support vectors directly,
rather than assuming that each piece can be represented as
a scalar_int_mode.

Thanks,
Richard

> +
> +  enum insn_code icode = optab_handler (vec_duplicate_optab,
> + vector_mode);
> +  if (icode != CODE_FOR_nothing)
> +{
> +  rtx reg = targetm.gen_memset_scratch_rtx (vector_mode);
> +  if (CONST_INT_P (data))
> + {
> +   /* Use the move expander with CONST_VECTOR.  */
> +   rtx const_vec = gen_const_vec_duplicate (vector_mode, data);
> +   emit_move_insn (reg, const_vec);
> + }
> +  else
> + {
> +
> +   class expand_operand ops[2];
> +   create_output_operand (&ops[0], reg, vector_mode);
> +   create_input_operand (&ops[1], data, QImode);
> +   expand_insn (icode, 2, ops);
> +   if (!rtx_equal_p (reg, ops[0].value))
> + emit_move_insn (reg, ops[0].value);
> + }
> +  target = lowpart_subreg (mode, reg, vector_mode);
> +}
> +
> +  return target;
> +}
> +
> +/* Callback routine for store_by_pieces.  Read GET_MODE_BITSIZE (MODE)
> +   bytes from constant string DATA + OFFSET and return it as target
> +   constant.  If PREV isn't nullptr, it has the RTL info from the
> +   previous iteration.  */
>  
> +rtx
> +builtin_memset_read_str (void *data, void *prev,
> +  HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
> +  scala

Re: [RFC] c-family: Add __builtin_noassoc

2021-07-16 Thread Matthias Kretz
On Friday, 16 July 2021 11:31:29 CEST Richard Biener wrote:
> On Fri, Jul 16, 2021 at 10:57 AM Matthias Kretz  wrote:
> > On Wednesday, 14 July 2021 10:14:55 CEST Richard Biener wrote:
> > > I think implementing it similar to how we do __builtin_shufflevector
> > > would
> > > be easily possible.  PAREN_EXPR is a tree code.
> > 
> > Like this? If you like it, I'll write the missing documentation and do
> > real
> > regression testing.
> 
> Yes, like this.  Now, __builtin_noassoc (a + b + c) might suggest that
> it prevents a + b + c from being re-associated - but it does not. 
> PAREN_EXPR is a barrier for association, so for 'a + b + c + PAREN_EXPR
> <d + e + f>' the a+b+c and d+e+f chains will not mix but they individually can
> be re-associated.  That said __builtin_noassoc might be a bad name,
> maybe __builtin_assoc_barrier is better?

Yes, I agree with renaming it. And assoc_barrier sounds intuitive to me.

> To fully prevent association of an a + b + d + e chain you need at least
> two PAREN_EXPRs, for example (a+b) + (d+e) would do.
> 
> One could of course provide __builtin_noassoc (a+b+c+d) with the
> implied semantics and insert PAREN_EXPRs around all operands
> when lowering it.

I wouldn't want to go there. __builtin_noassoc(f(x, y, z))? We probably both 
agree that it would be a no-op, but it reads like f should be evaluated with
-fno-associative-math.

> Not sure what's more useful in practice - directly exposing the middle-end
> PAREN_EXPR or providing a way to mark a whole expression as to be
> not re-associated?  Maybe both?

I think this is a tool for specialists. Give them the low-level tool and 
they'll build whatever higher level abstractions they need on top of it. Like

float sum_noassoc(RangeOfFloats auto x) {
  float sum = 0;
  for (float v : x)
    sum = __builtin_assoc_barrier(sum + v);
  return sum;
}
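
To illustrate why the grouping matters at all, here is a small sketch in plain C++ that does not use the proposed builtin (the names sum_left and sum_right are made up for this example). Float addition is not associative, so the two groupings below can give different results for the same inputs; with re-association enabled the compiler would be free to pick either one unless a barrier keeps the groups apart.

```cpp
#include <cassert>

// Sum grouped as (a + b) + c.
float sum_left (float a, float b, float c)
{
  float t = a + b;
  return t + c;
}

// Sum grouped as a + (b + c).
float sum_right (float a, float b, float c)
{
  float t = b + c;
  return a + t;
}
```

With a = 1e20f, b = -1e20f and c = 1.0f the first grouping cancels the large terms before adding c, while in the second grouping c is absorbed into b by rounding. This assumes IEEE float arithmetic and no -ffast-math.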

-- 
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
 std::experimental::simd  https://github.com/VcDevel/std-simd
──


[PATCH] Remove more gimple_expr_type uses

2021-07-16 Thread Richard Biener
This removes a few more uses.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-07-16  Richard Biener   

* gimple-ssa-store-merging.c (verify_symbolic_number_p): Use
the type of the LHS.
(find_bswap_or_nop_1): Likewise.
(find_bswap_or_nop): Likewise.
* tree-vectorizer.h (vect_get_smallest_scalar_type): Adjust
prototype.
* tree-vect-data-refs.c (vect_get_smallest_scalar_type):
Remove unused parameters, pass in the scalar type.  Fix
internal store function handling.
* tree-vect-stmts.c (vect_analyze_stmt): Remove assert.
(vect_get_vector_types_for_stmt): Move down check for
existing vector stmt after we've determined a scalar type.
Pass down the used scalar type to vect_get_smallest_scalar_type.
* tree-vect-generic.c (expand_vector_condition): Use
the type of the LHS.
(expand_vector_scalar_condition): Likewise.
(expand_vector_operations_1): Likewise.
* tree-vect-patterns.c (vect_widened_op_tree): Likewise.
(vect_recog_dot_prod_pattern): Likewise.
(vect_recog_sad_pattern): Likewise.
(vect_recog_widen_op_pattern): Likewise.
(vect_recog_widen_sum_pattern): Likewise.
(vect_recog_mixed_size_cond_pattern): Likewise.
---
 gcc/gimple-ssa-store-merging.c |  6 +++---
 gcc/tree-vect-data-refs.c  | 38 +++---
 gcc/tree-vect-generic.c|  8 +++
 gcc/tree-vect-patterns.c   | 12 +--
 gcc/tree-vect-stmts.c  | 17 +++
 gcc/tree-vectorizer.h  |  3 +--
 6 files changed, 43 insertions(+), 41 deletions(-)

diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c
index 20959acc1c6..ce54c78bdda 100644
--- a/gcc/gimple-ssa-store-merging.c
+++ b/gcc/gimple-ssa-store-merging.c
@@ -313,7 +313,7 @@ verify_symbolic_number_p (struct symbolic_number *n, gimple 
*stmt)
 {
   tree lhs_type;
 
-  lhs_type = gimple_expr_type (stmt);
+  lhs_type = TREE_TYPE (gimple_get_lhs (stmt));
 
   if (TREE_CODE (lhs_type) != INTEGER_TYPE
   && TREE_CODE (lhs_type) != ENUMERAL_TYPE)
@@ -702,7 +702,7 @@ find_bswap_or_nop_1 (gimple *stmt, struct symbolic_number 
*n, int limit)
int i, type_size, old_type_size;
tree type;
 
-   type = gimple_expr_type (stmt);
+   type = TREE_TYPE (gimple_assign_lhs (stmt));
type_size = TYPE_PRECISION (type);
if (type_size % BITS_PER_UNIT != 0)
  return NULL;
@@ -851,7 +851,7 @@ find_bswap_or_nop_finalize (struct symbolic_number *n, 
uint64_t *cmpxchg,
 gimple *
 find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
 {
-  tree type_size = TYPE_SIZE_UNIT (gimple_expr_type (stmt));
+  tree type_size = TYPE_SIZE_UNIT (TREE_TYPE (gimple_get_lhs (stmt)));
   if (!tree_fits_uhwi_p (type_size))
 return NULL;
 
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 579149dfd61..6995efba899 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -116,11 +116,8 @@ vect_lanes_optab_supported_p (const char *name, 
convert_optab optab,
types.  */
 
 tree
-vect_get_smallest_scalar_type (stmt_vec_info stmt_info,
-  HOST_WIDE_INT *lhs_size_unit,
-  HOST_WIDE_INT *rhs_size_unit)
+vect_get_smallest_scalar_type (stmt_vec_info stmt_info, tree scalar_type)
 {
-  tree scalar_type = gimple_expr_type (stmt_info->stmt);
   HOST_WIDE_INT lhs, rhs;
 
   /* During the analysis phase, this function is called on arbitrary
@@ -131,21 +128,24 @@ vect_get_smallest_scalar_type (stmt_vec_info stmt_info,
   lhs = rhs = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (scalar_type));
 
   gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
-  if (assign
-  && (gimple_assign_cast_p (assign)
+  if (assign)
+{
+  scalar_type = TREE_TYPE (gimple_assign_lhs (assign));
+  if (gimple_assign_cast_p (assign)
  || gimple_assign_rhs_code (assign) == DOT_PROD_EXPR
  || gimple_assign_rhs_code (assign) == WIDEN_SUM_EXPR
  || gimple_assign_rhs_code (assign) == WIDEN_MULT_EXPR
  || gimple_assign_rhs_code (assign) == WIDEN_LSHIFT_EXPR
  || gimple_assign_rhs_code (assign) == WIDEN_PLUS_EXPR
  || gimple_assign_rhs_code (assign) == WIDEN_MINUS_EXPR
- || gimple_assign_rhs_code (assign) == FLOAT_EXPR))
-{
-  tree rhs_type = TREE_TYPE (gimple_assign_rhs1 (assign));
+ || gimple_assign_rhs_code (assign) == FLOAT_EXPR)
+   {
+ tree rhs_type = TREE_TYPE (gimple_assign_rhs1 (assign));
 
-  rhs = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (rhs_type));
-  if (rhs < lhs)
-scalar_type = rhs_type;
+ rhs = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (rhs_type));
+ if (rhs < lhs)
+   scalar_type = rhs_type;
+   }
 }
   else if (gcall *call = dyn_cast <gcall *> (stmt_info->stmt))
 {
@@ -153,10 +153,16 @@ ve

Re: *Ping**2 [Patch] Fortran: Fix bind(C) character length checks

2021-07-16 Thread Jerry D via Gcc-patches

Good to go Tobias.

Jerry

On 7/14/21 5:50 AM, Burnus, Tobias wrote:

Ping**2

On July 8, 2021 I wrote:

*Ping*

I intend to incorporate Sandra's suggestions, except for the beginning of line 
spacing - that's needed to avoid exceeding the 80 character line limit. I did 
not include an updated patch as just pinging is easier on a mobile during 
vacation :-)

Thanks,

Tobias

Loosemore, Sandra wrote:

On 7/1/21 11:08 AM, Tobias Burnus wrote:

Hi all,

this patch came up when discussing Sandra's TS29113 patch internally.
There is presumably also some overlap with José's patches.

This patch tries to rectify the BIND(C) CHARACTER handling on the
diagnostic side, only. That is: what to accept and what
to reject for which Fortran standard.


The rules are:

* [F2003-F2018] Interoperable is character(len=1)
→ F2018, 18.3.1  Interoperability of intrinsic types
(General, unchanged)

* Fortran 2008: In some cases, const-length chars are
permitted as well:
→ F2018, 18.3.4  Interoperability of scalar variables
→ F2018, 18.3.5  Interoperability of array variables
→ F2018, 18.3.6  Interoperability of procedures and procedure interfaces
   [= F2008, 15.3.{4,5,6}]
For global vars with bind(C), 18.3.4 + 18.3.5 applies directly (TODO:
Add support, not in this patch)
For passed-by ref dummy arguments, 18.3.4 + 18.3.5 are referenced in
- F2008: R1229  proc-language-binding-spec is language-binding-spec
   C1255 (R1229) 
- F2018, F2018, C1554

While it is not very clearly spelt out, I regard 'char parm[4]'
interoperable with 'character(len=4) :: a', 'character(len=2) :: b(2)'
and 'character(len=1) :: c(4)' for both global variables and for
dummy arguments.

* Fortran 2018/TS29113:  Uses additionally CFI array descriptor
- allocatable, pointer:  must be len=:
- nonallocatable/nonpointer: len=* → implies array descriptor also
  for assumed-size/explicit-size/scalar arguments.
- All which are passed by an array descriptor already without further
  restrictions: assumed-shape, assumed-rank, i.e. len= seems
  to be also fine
→ 18.3.6 under item (5) bullet point 2 and 3 plus (6).


I hope I got the conditions right. I also fixed an issue with
character(len=5) :: str – the code in trans-expr.c did crash for
scalars  (decl.c did not check any constraints for arrays).
I believe the condition is wrong and for len= no descriptor
is used.

Any comments, remarks?

I gave this patch a try on my TS 29113 last night.  Changing the error
messages kind of screwed up my list of FAILs, but I did see that it also
caught some invalid character arguments in
interoperability/typecodes-scalar.f90 and
interoperability/typecodes-scalar-ext.f90 (which are already broken by 2
other major gfortran bugs I still need to file PRs for).  :-S

I haven't tried to review the patch WRT correctness with the
requirements of the standard yet, but I have a few nits about error
messages


+   /* F2018, 18.3.6 (6).  */
+   if (!sym->ts.deferred)
+ {
+   gfc_error ("Allocatable and pointer character dummy "
+  "argument %qs at %L must have deferred length "
+  "as procedure %qs is BIND(C)", sym->name,
+  &sym->declared_at, sym->ns->proc_name->name);
+   retval = false;
+ }

This is the error the two aforementioned test cases started giving, but
the message is confusing and doesn't read well (it was a pointer dummy, not
"allocatable and pointer").  Maybe just s/and/or/, or customize the
message depending on which one it is?


+   gfc_error ("Character dummy argument %qs at %L must be "
+  "of constant length or assumed length, "
+  "unless it has assumed-shape or assumed-rank, "
+  "as procedure %qs has the BIND(C) attribute",
+  sym->name, &sym->declared_at,
+  sym->ns->proc_name->name);

I don't think either "assumed-shape" or "assumed-rank" should be
hyphenated in this context unless that exact hyphenation is a term of
art in the Fortran standard or other technical documentation.  In normal
English, adjective phrases are usually only hyphenated when they appear
immediately before the noun they modify; "assumed-shape array", but "an
array with assumed shape".


+   else if (!gfc_notify_std (GFC_STD_F2018,
+ "Character dummy argument %qs at %L"
+ " with nonconstant length as "
+ "procedure %qs is BIND(C)",
+ sym->name, &sym->declared_at,
+ sym->ns->proc_name->name))
+ retval = false;
+ }

Elsewhere the convention seems to be to format strings split across
mu

Re: [PATCH] Rewrite memset expanders with vec_duplicate

2021-07-16 Thread H.J. Lu via Gcc-patches
On Fri, Jul 16, 2021 at 4:38 AM Richard Sandiford
 wrote:
>
> "H.J. Lu via Gcc-patches"  writes:
> > 1. Rewrite builtin_memset_read_str and builtin_memset_gen_str with
> > vec_duplicate_optab to duplicate QI value to TI/OI/XI value.
> > 2. Add TARGET_GEN_MEMSET_SCRATCH_RTX to allow the backend to use a hard
> > scratch register to avoid stack realignment when expanding memset.
> >
> >   PR middle-end/90773
> >   * builtins.c (gen_memset_value_from_prev): New function.
> >   (gen_memset_broadcast): Likewise.
> >   (builtin_memset_read_str): Use gen_memset_value_from_prev
> >   and gen_memset_broadcast.
> >   (builtin_memset_gen_str): Likewise.
> >   * target.def (gen_memset_scratch_rtx): New hook.
> >   * doc/tm.texi.in: Add TARGET_GEN_MEMSET_SCRATCH_RTX.
> >   * doc/tm.texi: Regenerated.
> > ---
> >  gcc/builtins.c | 123 +
> >  gcc/doc/tm.texi|   5 ++
> >  gcc/doc/tm.texi.in |   2 +
> >  gcc/target.def |   7 +++
> >  4 files changed, 116 insertions(+), 21 deletions(-)
> >
> > diff --git a/gcc/builtins.c b/gcc/builtins.c
> > index 39ab139b7e1..c1758ae2efc 100644
> > --- a/gcc/builtins.c
> > +++ b/gcc/builtins.c
> > @@ -6686,26 +6686,111 @@ expand_builtin_strncpy (tree exp, rtx target)
> >return NULL_RTX;
> >  }
> >
> > -/* Callback routine for store_by_pieces.  Read GET_MODE_BITSIZE (MODE)
> > -   bytes from constant string DATA + OFFSET and return it as target
> > -   constant.  If PREV isn't nullptr, it has the RTL info from the
> > +/* Return the RTL of a register in MODE generated from PREV in the
> > previous iteration.  */
> >
> > -rtx
> > -builtin_memset_read_str (void *data, void *prevp,
> > -  HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
> > -  scalar_int_mode mode)
> > +static rtx
> > +gen_memset_value_from_prev (void *prevp, scalar_int_mode mode)
> >  {
> > +  rtx target = nullptr;
> >by_pieces_prev *prev = (by_pieces_prev *) prevp;
> >if (prev != nullptr && prev->data != nullptr)
> >  {
> >/* Use the previous data in the same mode.  */
> >if (prev->mode == mode)
> >   return prev->data;
> > +
> > +  rtx prev_rtx = prev->data;
> > +  machine_mode prev_mode = prev->mode;
> > +  unsigned int word_size = GET_MODE_SIZE (word_mode);
> > +  if (word_size < GET_MODE_SIZE (prev->mode)
> > +   && word_size > GET_MODE_SIZE (mode))
> > + {
> > +   /* First generate subreg of word mode if the previous mode is
> > +  wider than word mode and word mode is wider than MODE.  */
> > +   prev_rtx = simplify_gen_subreg (word_mode, prev_rtx,
> > +   prev_mode, 0);
> > +   prev_mode = word_mode;
> > + }
> > +  if (prev_rtx != nullptr)
> > + target = simplify_gen_subreg (mode, prev_rtx, prev_mode, 0);
> >  }
> > +  return target;
> > +}
> > +
> > +/* Return the RTL of a register in MODE broadcasted from DATA.  */
> > +
> > +static rtx
> > +gen_memset_broadcast (rtx data, scalar_int_mode mode)
> > +{
> > +  /* Skip if regno_reg_rtx isn't initialized.  */
> > +  if (!regno_reg_rtx)
> > +return nullptr;
> > +
> > +  rtx target = nullptr;
> > +
> > +  unsigned int nunits = GET_MODE_SIZE (mode) / GET_MODE_SIZE (QImode);
> > +  machine_mode vector_mode;
> > +  if (!mode_for_vector (QImode, nunits).exists (&vector_mode))
> > +gcc_unreachable ();
>
> Sorry, I realise it's a bit late to be raising this objection now,
> but I don't think it's a good idea to use scalar integer modes as
> a proxy for vector modes.  In principle there's no reason why a
> target has to define an integer mode for every vector mode.

A target always defines the largest integer mode.

> If we want the mode to be a vector then I think the by-pieces
> infrastructure should be extended to support vectors directly,
> rather than assuming that each piece can be represented as
> a scalar_int_mode.
>

The current by-pieces infrastructure operates on scalar_int_mode.
Only for memset, there is

/* Callback routine for store_by_pieces.  Return the RTL of a register
   containing GET_MODE_SIZE (MODE) consecutive copies of the unsigned
   char value given in the RTL register data.  For example, if mode is
   4 bytes wide, return the RTL for 0x01010101*data.  If PREV isn't
   nullptr, it has the RTL info from the previous iteration.  */

static rtx
builtin_memset_gen_str (void *data, void *prevp,
HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
scalar_int_mode mode)

It is a broadcast.  If a target can broadcast a byte to a wider integer,
can you suggest a way to use it in the current by-pieces infrastructure?
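
As a scalar illustration of the "0x01010101*data" broadcast described in the comment above, a minimal sketch in plain C++ could look like this. The function names are made up for the example and this is not the GCC implementation, which operates on RTL.

```cpp
#include <cassert>
#include <cstdint>

// Replicate one byte into all four bytes of a 32-bit word.
// Each byte of the product is b itself, so no carries occur.
constexpr std::uint32_t broadcast_byte32 (std::uint8_t b)
{
  return 0x01010101u * b;
}

// Same idea for a 64-bit word.
constexpr std::uint64_t broadcast_byte64 (std::uint8_t b)
{
  return 0x0101010101010101ull * b;
}
```

For example, broadcasting 0xAB gives 0xABABABAB in the 32-bit case.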

Thanks.

> Thanks,
> Richard
>
> > +
> > +  enum insn_code icode = optab_handler (vec_duplicate_optab,
> > + vector_mode);
> > +  if (icode != CODE_FOR_nothing)
> > +{
> > +  rtx reg = targetm.gen_memset_scratch

Re: [PATCH] c++: Don't hide narrowing errors in system headers

2021-07-16 Thread Jason Merrill via Gcc-patches

On 7/15/21 8:03 PM, Marek Polacek wrote:

Jonathan pointed me at this issue where

   constexpr unsigned f() { constexpr int n = -1; return unsigned{n}; }

is accepted in system headers, despite the narrowing conversion from
a constant.  I suspect that whereas narrowing warnings should be
disabled, ill-formed narrowing of constants should be a hard error
(which can still be disabled by -Wno-narrowing).
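
For contrast, a small sketch of what stays well-formed (the names below are made up for illustration). List-initialization from a constant that fits the target type is not narrowing, and an explicit cast is never narrowing; only the braced conversion of the constant -1 is ill-formed.

```cpp
#include <cassert>

constexpr int small_constant = 1;

// Well-formed: the constant 1 is representable in unsigned, so this
// list-initialization is not a narrowing conversion.
constexpr unsigned from_nonnegative_constant ()
{
  return unsigned{small_constant};
}

// Ill-formed if uncommented (narrowing of the constant -1):
// constexpr unsigned f () { constexpr int n = -1; return unsigned{n}; }

// An explicit cast is not subject to the narrowing rules; -1 wraps
// around to the largest unsigned value.
constexpr unsigned from_negative_with_cast ()
{
  return static_cast<unsigned> (-1);
}
```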

Bootstrapped/regtested on {ppc64le,x86_64}-pc-linux-gnu, ok for trunk?


OK.


gcc/cp/ChangeLog:

* typeck2.c (check_narrowing): Don't suppress the pedantic error
in system headers.

libstdc++-v3/ChangeLog:

* testsuite/20_util/ratio/operations/ops_overflow_neg.cc: Add
dg-error.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/Wnarrowing2.C: New test.
* g++.dg/cpp1y/Wnarrowing2.h: New test.
---
  gcc/cp/typeck2.c  | 1 +
  gcc/testsuite/g++.dg/cpp1y/Wnarrowing2.C  | 4 
  gcc/testsuite/g++.dg/cpp1y/Wnarrowing2.h  | 2 ++
  .../testsuite/20_util/ratio/operations/ops_overflow_neg.cc| 2 ++
  4 files changed, 9 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/Wnarrowing2.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/Wnarrowing2.h

diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index 6679e247816..dcfdff2f905 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -986,6 +986,7 @@ check_narrowing (tree type, tree init, tsubst_flags_t 
complain,
{
  int savederrorcount = errorcount;
  global_dc->pedantic_errors = 1;
+ auto s = make_temp_override (global_dc->dc_warn_system_headers, true);
  pedwarn (loc, OPT_Wnarrowing,
   "narrowing conversion of %qE from %qH to %qI",
   init, ftype, type);
diff --git a/gcc/testsuite/g++.dg/cpp1y/Wnarrowing2.C 
b/gcc/testsuite/g++.dg/cpp1y/Wnarrowing2.C
new file mode 100644
index 000..048d484f46f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/Wnarrowing2.C
@@ -0,0 +1,4 @@
+// { dg-do compile { target c++14 } }
+
+#include "Wnarrowing2.h"
+// { dg-error "narrowing conversion" "" { target *-*-* } 0 }
diff --git a/gcc/testsuite/g++.dg/cpp1y/Wnarrowing2.h 
b/gcc/testsuite/g++.dg/cpp1y/Wnarrowing2.h
new file mode 100644
index 000..7dafa51af14
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/Wnarrowing2.h
@@ -0,0 +1,2 @@
+#pragma GCC system_header
+constexpr unsigned f() { constexpr int n = -1; return unsigned{n}; }
diff --git 
a/libstdc++-v3/testsuite/20_util/ratio/operations/ops_overflow_neg.cc 
b/libstdc++-v3/testsuite/20_util/ratio/operations/ops_overflow_neg.cc
index 47d3c3a037e..f120e599a33 100644
--- a/libstdc++-v3/testsuite/20_util/ratio/operations/ops_overflow_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/ratio/operations/ops_overflow_neg.cc
@@ -39,6 +39,7 @@ test02()
  }
  
  // { dg-error "required from here" "" { target *-*-* } 28 }

+// { dg-error "expected initializer" "" { target *-*-* } 28 }
  // { dg-error "expected initializer" "" { target *-*-* } 35 }
  // { dg-error "expected initializer" "" { target *-*-* } 37 }
  // { dg-error "overflow in addition" "" { target *-*-* } 0 }
@@ -46,5 +47,6 @@ test02()
  // { dg-error "overflow in multiplication" "" { target *-*-* } 100 }
  // { dg-error "overflow in multiplication" "" { target *-*-* } 102 }
  // { dg-error "overflow in constant expression" "" { target *-*-* } 0 }
+// { dg-error "narrowing conversion" "" { target *-*-* } 0 }
  // { dg-prune-output "out of range" }
  // { dg-prune-output "not usable in a constant expression" }

base-commit: f364cdffa47af574f90f671b2dcf5afa91442741
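
For readers unfamiliar with the idiom in the typeck2.c hunk above: the added
line saves the current value of a flag, forces it to true, and restores the
old value when the enclosing scope exits. Below is a minimal sketch of that
scope-guard pattern. The class `scoped_override` and the global
`warn_system_headers` are hypothetical stand-ins for illustration only; they
are not GCC's actual `make_temp_override` or `global_dc`.

```cpp
// Hypothetical, simplified scope guard: overrides a variable for the
// lifetime of the guard object and restores the saved value afterwards.
template <typename T>
class scoped_override {
  T& ref_;    // the variable being overridden
  T saved_;   // its value before the override
public:
  scoped_override(T& ref, T value) : ref_(ref), saved_(ref) { ref_ = value; }
  ~scoped_override() { ref_ = saved_; }
  scoped_override(const scoped_override&) = delete;
  scoped_override& operator=(const scoped_override&) = delete;
};

// Hypothetical stand-in for a "warn inside system headers" flag.
bool warn_system_headers = false;

// Returns the flag value seen while the override is active.
bool flag_during_diagnostic() {
  scoped_override<bool> guard(warn_system_headers, true);
  return warn_system_headers;  // true here; restored when guard is destroyed
}
```

The point of tying the restore to a destructor is that every exit path from
the block, including early returns, puts the flag back.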





[PATCH 1/3] Remove gimple_expr_type uses from value-range code

2021-07-16 Thread Richard Biener
This removes the last uses from value-range code.

Bootstrap & regtest running on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

2021-07-16  Richard Biener  

* tree-vrp.c (register_edge_assert_for_2): Use the
type from the LHS.
(vrp_folder::fold_predicate_in): Likewise.
* vr-values.c (gimple_assign_nonzero_p): Likewise.
(vr_values::extract_range_from_comparison): Likewise.
(vr_values::extract_range_from_ubsan_builtin): Use the
type of the first operand.
(vr_values::extract_range_basic): Push down type
computation, use the appropriate LHS.
(vr_values::extract_range_from_assignment): Use the
type of the LHS.
---
 gcc/tree-vrp.c  | 14 +++---
 gcc/vr-values.c | 28 
 2 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 0565c9b5073..a9c31bcedb5 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -1484,13 +1484,13 @@ register_edge_assert_for_2 (tree name, edge e,
}
 
   /* Extract NAME2 from the (optional) sign-changing cast.  */
-  if (gimple_assign_cast_p (def_stmt))
+  if (gassign *ass = dyn_cast <gassign *> (def_stmt))
{
- if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt))
- && ! TYPE_UNSIGNED (TREE_TYPE (gimple_assign_rhs1 (def_stmt)))
- && (TYPE_PRECISION (gimple_expr_type (def_stmt))
- == TYPE_PRECISION (TREE_TYPE (gimple_assign_rhs1 (def_stmt)))))
-   name3 = gimple_assign_rhs1 (def_stmt);
+ if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (ass))
+ && ! TYPE_UNSIGNED (TREE_TYPE (gimple_assign_rhs1 (ass)))
+ && (TYPE_PRECISION (TREE_TYPE (gimple_assign_lhs (ass)))
+ == TYPE_PRECISION (TREE_TYPE (gimple_assign_rhs1 (ass)))))
+   name3 = gimple_assign_rhs1 (ass);
}
 
   /* If name3 is used later, create an ASSERT_EXPR for it.  */
@@ -4119,7 +4119,7 @@ vrp_folder::fold_predicate_in (gimple_stmt_iterator *si)
   if (val)
 {
   if (assignment_p)
-val = fold_convert (gimple_expr_type (stmt), val);
+   val = fold_convert (TREE_TYPE (gimple_assign_lhs (stmt)), val);
 
   if (dump_file)
{
diff --git a/gcc/vr-values.c b/gcc/vr-values.c
index 190676de2c0..1b3ec38d288 100644
--- a/gcc/vr-values.c
+++ b/gcc/vr-values.c
@@ -338,16 +338,17 @@ gimple_assign_nonzero_p (gimple *stmt)
 {
   enum tree_code code = gimple_assign_rhs_code (stmt);
   bool strict_overflow_p;
+  tree type = TREE_TYPE (gimple_assign_lhs (stmt));
   switch (get_gimple_rhs_class (code))
 {
 case GIMPLE_UNARY_RHS:
   return tree_unary_nonzero_warnv_p (gimple_assign_rhs_code (stmt),
-gimple_expr_type (stmt),
+type,
 gimple_assign_rhs1 (stmt),
 &strict_overflow_p);
 case GIMPLE_BINARY_RHS:
   return tree_binary_nonzero_warnv_p (gimple_assign_rhs_code (stmt),
- gimple_expr_type (stmt),
+ type,
  gimple_assign_rhs1 (stmt),
  gimple_assign_rhs2 (stmt),
  &strict_overflow_p);
@@ -1025,7 +1026,7 @@ vr_values::extract_range_from_comparison 
(value_range_equiv *vr,
  gimple *stmt)
 {
   enum tree_code code = gimple_assign_rhs_code (stmt);
-  tree type = gimple_expr_type (stmt);
+  tree type = TREE_TYPE (gimple_assign_lhs (stmt));
   tree op0 = gimple_assign_rhs1 (stmt);
   tree op1 = gimple_assign_rhs2 (stmt);
   bool sop;
@@ -1164,7 +1165,6 @@ bool
 vr_values::extract_range_from_ubsan_builtin (value_range_equiv *vr, gimple 
*stmt)
 {
   gcc_assert (is_gimple_call (stmt));
-  tree type = gimple_expr_type (stmt);
   enum tree_code subcode = ERROR_MARK;
   combined_fn cfn = gimple_call_combined_fn (stmt);
   scalar_int_mode mode;
@@ -1190,7 +1190,8 @@ vr_values::extract_range_from_ubsan_builtin 
(value_range_equiv *vr, gimple *stmt
 any overflow, we'll complain, but will actually do
 wrapping operation.  */
   flag_wrapv = 1;
-  extract_range_from_binary_expr (vr, subcode, type,
+  extract_range_from_binary_expr (vr, subcode,
+ TREE_TYPE (gimple_call_arg (stmt, 0)),
  gimple_call_arg (stmt, 0),
  gimple_call_arg (stmt, 1));
   flag_wrapv = saved_flag_wrapv;
@@ -1217,7 +1218,6 @@ void
 vr_values::extract_range_basic (value_range_equiv *vr, gimple *stmt)
 {
   bool sop;
-  tree type = gimple_expr_type (stmt);
 
   if (is_gimple_call (stmt))
 {
@@ -1244,13 +1244,14 @@ vr_values::extract_range_basic (value_range_equiv *vr, 
gimple *stmt)
   /* Handle extraction 

[PATCH 2/3] Remove last gimple_expr_type uses

2021-07-16 Thread Richard Biener
This removes the last uses of gimple_expr_type.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

2021-07-16  Richard Biener  

* tree-ssa-sccvn.c (vn_reference_eq): Handle NULL vr->type.
(ao_ref_init_from_vn_reference): Likewise.
(vn_reference_lookup_call): Do not set vr->type to random
values.
* tree-vect-generic.c (expand_vector_piecewise): Pass in
whether we expanded parallel.
(expand_vector_parallel): Adjust.
(expand_vector_addition): Likewise.
(expand_vector_comparison): Likewise.
(expand_vector_operation): Likewise.
(expand_vector_scalar_condition): Likewise.
(expand_vector_conversion): Likewise.
---
 gcc/tree-ssa-sccvn.c| 27 +++
 gcc/tree-vect-generic.c | 25 -
 2 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 7900df946f4..b8882b64fe3 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -764,14 +764,18 @@ vn_reference_eq (const_vn_reference_t const vr1, 
const_vn_reference_t const vr2)
   if (vr1->operands == vr2->operands)
 return true;
 
-  if (COMPLETE_TYPE_P (vr1->type) != COMPLETE_TYPE_P (vr2->type)
-  || (COMPLETE_TYPE_P (vr1->type)
- && !expressions_equal_p (TYPE_SIZE (vr1->type),
-  TYPE_SIZE (vr2->type))))
+  if (!vr1->type || !vr2->type)
+{
+  if (vr1->type != vr2->type)
+   return false;
+}
+  else if (COMPLETE_TYPE_P (vr1->type) != COMPLETE_TYPE_P (vr2->type)
+  || (COMPLETE_TYPE_P (vr1->type)
+  && !expressions_equal_p (TYPE_SIZE (vr1->type),
+   TYPE_SIZE (vr2->type))))
 return false;
-
-  if (INTEGRAL_TYPE_P (vr1->type)
-  && INTEGRAL_TYPE_P (vr2->type))
+  else if (INTEGRAL_TYPE_P (vr1->type)
+  && INTEGRAL_TYPE_P (vr2->type))
 {
   if (TYPE_PRECISION (vr1->type) != TYPE_PRECISION (vr2->type))
return false;
@@ -1049,6 +1053,10 @@ ao_ref_init_from_vn_reference (ao_ref *ref,
   poly_offset_int size = -1;
   tree size_tree = NULL_TREE;
 
+  /* We don't handle calls.  */
+  if (!type)
+return false;
+
   machine_mode mode = TYPE_MODE (type);
   if (mode == BLKmode)
 size_tree = TYPE_SIZE (type);
@@ -3671,7 +3679,10 @@ vn_reference_lookup_call (gcall *call, vn_reference_t 
*vnresult,
 
   vr->vuse = vuse ? SSA_VAL (vuse) : NULL_TREE;
   vr->operands = valueize_shared_reference_ops_from_call (call);
-  vr->type = gimple_expr_type (call);
+  tree lhs = gimple_call_lhs (call);
+  /* For non-SSA return values the reference ops contain the LHS.  */
+  vr->type = ((lhs && TREE_CODE (lhs) == SSA_NAME)
+ ? TREE_TYPE (lhs) : NULL_TREE);
   vr->punned = false;
   vr->set = 0;
   vr->base_set = 0;
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index a1257db82a6..2e00b3ed3ca 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -307,7 +307,7 @@ static tree
 expand_vector_piecewise (gimple_stmt_iterator *gsi, elem_op_func f,
 tree type, tree inner_type,
 tree a, tree b, enum tree_code code,
-tree ret_type = NULL_TREE)
+bool parallel_p, tree ret_type = NULL_TREE)
 {
   vec<constructor_elt, va_gc> *v;
   tree part_width = TYPE_SIZE (inner_type);
@@ -317,8 +317,7 @@ expand_vector_piecewise (gimple_stmt_iterator *gsi, 
elem_op_func f,
   int i;
   location_t loc = gimple_location (gsi_stmt (*gsi));
 
-  if (ret_type
-  || types_compatible_p (gimple_expr_type (gsi_stmt (*gsi)), type))
+  if (ret_type || !parallel_p)
 warning_at (loc, OPT_Wvector_operation_performance,
"vector operation will be expanded piecewise");
   else
@@ -364,13 +363,13 @@ expand_vector_parallel (gimple_stmt_iterator *gsi, 
elem_op_func f, tree type,
   if (TYPE_MODE (TREE_TYPE (type)) == word_mode)
  return expand_vector_piecewise (gsi, f,
 type, TREE_TYPE (type),
-a, b, code);
+a, b, code, true);
   else if (n_words > 1)
 {
   tree word_type = build_word_mode_vector_type (n_words);
   result = expand_vector_piecewise (gsi, f,
word_type, TREE_TYPE (word_type),
-   a, b, code);
+   a, b, code, true);
   result = force_gimple_operand_gsi (gsi, result, true, NULL, true,
  GSI_SAME_STMT);
 }
@@ -410,7 +409,7 @@ expand_vector_addition (gimple_stmt_iterator *gsi,
   else
 return expand_vector_piecewise (gsi, f,
type, TREE_TYPE (type),
-   a, b, code);
+   a, b, code, false);
 }
 
 static bool
@@ -501,7 +500,7 @@ expand_vector_comparison

[PATCH] libstdc++: Use __extension__ instead of diagnostic pragmas (was: Suppress pedantic warnings about __int128)

2021-07-16 Thread Jonathan Wakely via Gcc-patches
On Fri, 16 Jul 2021 at 12:29, Jonathan Wakely wrote:
> Hmm, in fact it seems that we can just use the __uint128_t typedef
> instead, which doesn't give a pedwarn:
>
>   using __rep = __uint128_t;
>
> Is that typedef always available if __int128 is? There's a comment in
> gcc/c-family/c-common.c that I don't understand:
>
> #if HOST_BITS_PER_WIDE_INT >= 64
>   /* Note that this is different than the __int128 type that's part of
>  the generic __intN support.  */
>   if (targetm.scalar_mode_supported_p (TImode))
> lang_hooks.decls.pushdecl (build_decl (UNKNOWN_LOCATION,
>TYPE_DECL,
>get_identifier ("__int128_t"),
>intTI_type_node));
> #endif
>
> They are the same type in C++, so what is "different"? Is it possible
> for __int128 to be different from a TImode integer?

As discussed on IRC, I'm going to add a configure check that __int128
and __int128_t are the same, and similarly for the unsigned versions.
That will allow us to use __int128_t and __uint128_t to avoid the
warnings (assuming GCC doesn't change to warn consistently for the
non-standard typedefs as well as the non-standard types).

For now, I'll just use __extension__ consistently everywhere. I'm
testing the attached patch that does that.
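
To illustrate the approach for anyone following along, here is a small
sketch of `__extension__` applied to a single declaration, which is what
suppresses the pedantic diagnostic for that declaration only instead of
wrapping a region in diagnostic pragmas. The names `wide_uint`, `wide_bits`
and `top_bit` are made up for this example and do not appear in the patch;
the fallback branch is only there so the sketch still compiles on targets
without `__int128`.

```cpp
#include <climits>

#ifdef __SIZEOF_INT128__
// __extension__ applies to this one declaration, so -Wpedantic stays
// active for everything else in the translation unit.
__extension__ typedef unsigned __int128 wide_uint;
#else
// Assumed fallback for targets without __int128 (illustration only).
typedef unsigned long long wide_uint;
#endif

// Width of the chosen type in bits.
constexpr int wide_bits = sizeof(wide_uint) * CHAR_BIT;

// Value with only the most significant bit set.
constexpr wide_uint top_bit() {
  return static_cast<wide_uint>(1) << (wide_bits - 1);
}
```
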
commit d74d80c850f70d68100336c5ba0c166e22bc5ef6
Author: Jonathan Wakely 
Date:   Fri Jul 16 13:23:06 2021

libstdc++: Use __extension__ instead of diagnostic pragmas

This reverts c1676651b6c417e8f2b276a28199d76943834277 and uses the
__extension__ keyword to prevent pedantic warnings instead of diagnostic
pragmas.

This also adds the __extension__ keyword in <limits> and <bits/random.h>,
where there are some more warnings that I missed in the previous commit.

libstdc++-v3/ChangeLog:

* include/bits/cpp_type_traits.h (__INT_N): Use __extension__
instead of diagnostic pragmas.
* include/bits/functional_hash.h: Likewise.
* include/bits/iterator_concepts.h (__is_signed_int128)
(__is_unsigned_int128): Likewise.
* include/bits/max_size_type.h (__max_size_type): Likewise.
(numeric_limits<__max_size_type>): Likewise.
* include/bits/std_abs.h (abs): Likewise.
* include/bits/stl_algobase.h (__size_to_integer): Likewise.
* include/bits/uniform_int_dist.h (uniform_int_distribution):
Likewise.
* include/ext/numeric_traits.h (_GLIBCXX_INT_N_TRAITS):
Likewise.
* include/std/type_traits (__is_integral_helper)
(__is_signed_integer, __is_unsigned_integer)
(__make_unsigned, __make_signed): Likewise.
* include/std/limits (__INT_N): Add __extension__ keyword.
* include/bits/random.h (_Select_uint_least_t)
(random_device): Likewise.

diff --git a/libstdc++-v3/include/bits/cpp_type_traits.h 
b/libstdc++-v3/include/bits/cpp_type_traits.h
index 8f8dd817dc2..d9462209bc2 100644
--- a/libstdc++-v3/include/bits/cpp_type_traits.h
+++ b/libstdc++-v3/include/bits/cpp_type_traits.h
@@ -253,12 +253,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 };
 
 #define __INT_N(TYPE)  \
+  __extension__\
   template<>   \
 struct __is_integer<TYPE>  \
 {  \
   enum { __value = 1 };\
   typedef __true_type __type;  \
 }; \
+  __extension__\
   template<>   \
 struct __is_integer<unsigned TYPE> \
 {  \
@@ -266,9 +268,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   typedef __true_type __type;  \
 };
 
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Wpedantic"
-
 #ifdef __GLIBCXX_TYPE_INT_N_0
 __INT_N(__GLIBCXX_TYPE_INT_N_0)
 #endif
@@ -282,8 +281,6 @@ __INT_N(__GLIBCXX_TYPE_INT_N_2)
 __INT_N(__GLIBCXX_TYPE_INT_N_3)
 #endif
 
-#pragma GCC diagnostic pop
-
 #undef __INT_N
 
   //
diff --git a/libstdc++-v3/include/bits/functional_hash.h 
b/libstdc++-v3/include/bits/functional_hash.h
index 78e3644bc74..919faba778b 100644
--- a/libstdc++-v3/include/bits/functional_hash.h
+++ b/libstdc++-v3/include/bits/functional_hash.h
@@ -171,28 +171,31 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// Explicit specialization for unsigned long long.
   _Cxx_hashtable_define_trivial_hash(unsigned long long)
 
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Wpedantic"
-
 #ifdef __GLIBCXX_TYPE_INT_N_0
+  __extension__
   _Cxx_hashtable_define_trivial_hash(__GLIBCXX_TYPE_INT_N_0)
+  __extension__
   _Cxx_hashtable_define_trivial_hash(__GLIBCXX_TYPE_INT_N_0 unsigned)
 #endif
 #ifdef __GLIBCXX_TYPE_INT_N_1
+  __extension__
   _Cxx_hashtable_define_trivial_hash(__GLIBCXX_TYPE_INT_N_1)
+  __extension__
   _Cxx_hashtable_define_trivial_hash(__GLIBCXX_TYPE_INT_N_1 unsigned)
 #endif
 #ifdef __

[PATCH 3/3] Remove gimple_expr_type

2021-07-16 Thread Richard Biener
This removes the transitional gimple_expr_type API.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

2021-07-16  Richard Biener  

* gimple.h (gimple_expr_type): Remove.
* gcc/doc/gimple.texi: Remove gimple_expr_type documentation.
---
 gcc/doc/gimple.texi |  8 
 gcc/gimple.h| 42 --
 2 files changed, 50 deletions(-)

diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
index 4b3d7d7452e..5d89dbcc68d 100644
--- a/gcc/doc/gimple.texi
+++ b/gcc/doc/gimple.texi
@@ -868,14 +868,6 @@ Return the basic block to which statement @code{G} belongs 
to.
 Return the lexical scope block holding statement @code{G}.
 @end deftypefn
 
-@deftypefn {GIMPLE function} tree gimple_expr_type (gimple stmt)
-Return the type of the main expression computed by @code{STMT}. Return
-@code{void_type_node} if @code{STMT} computes nothing. This will only return
-something meaningful for @code{GIMPLE_ASSIGN}, @code{GIMPLE_COND} and
-@code{GIMPLE_CALL}.  For all other tuple codes, it will return
-@code{void_type_node}.
-@end deftypefn
-
 @deftypefn {GIMPLE function} {enum tree_code} gimple_expr_code (gimple stmt)
 Return the tree code for the expression computed by @code{STMT}.  This
 is only meaningful for @code{GIMPLE_CALL}, @code{GIMPLE_ASSIGN} and
diff --git a/gcc/gimple.h b/gcc/gimple.h
index acf572b81be..29da9198547 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -6608,48 +6608,6 @@ is_gimple_resx (const gimple *gs)
   return gimple_code (gs) == GIMPLE_RESX;
 }
 
-/* Return the type of the main expression computed by STMT.  Return
-   void_type_node if the statement computes nothing.  */
-
-static inline tree
-gimple_expr_type (const gimple *stmt)
-{
-  enum gimple_code code = gimple_code (stmt);
-  /* In general we want to pass out a type that can be substituted
- for both the RHS and the LHS types if there is a possibly
- useless conversion involved.  That means returning the
- original RHS type as far as we can reconstruct it.  */
-  if (code == GIMPLE_CALL)
-{
-  const gcall *call_stmt = as_a <const gcall *> (stmt);
-  if (gimple_call_internal_p (call_stmt))
-   switch (gimple_call_internal_fn (call_stmt))
- {
- case IFN_MASK_STORE:
- case IFN_SCATTER_STORE:
-   return TREE_TYPE (gimple_call_arg (call_stmt, 3));
- case IFN_MASK_SCATTER_STORE:
-   return TREE_TYPE (gimple_call_arg (call_stmt, 4));
- default:
-   break;
- }
-  return gimple_call_return_type (call_stmt);
-}
-  else if (code == GIMPLE_ASSIGN)
-{
-  if (gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR)
-return TREE_TYPE (gimple_assign_rhs1 (stmt));
-  else
-/* As fallback use the type of the LHS.  */
-return TREE_TYPE (gimple_get_lhs (stmt));
-}
-  else if (code == GIMPLE_COND)
-return boolean_type_node;
-  else if (code == GIMPLE_PHI)
-return TREE_TYPE (gimple_phi_result (stmt));
-  else
-return void_type_node;
-}
 
 /* Enum and arrays used for allocation stats.  Keep in sync with
gimple.c:gimple_alloc_kind_names.  */
-- 
2.26.2


Re: contracts library support (was Re: [PATCH] PING implement pre-c++20 contracts)

2021-07-16 Thread Andrew Sutton via Gcc-patches
> Is just using std::terminate as the handler viable? Or if we're sure
> contracts in some form will go into the IS eventually, and the
> signature won't change, we could just add it in __cxxabiv1:: as you
> suggested earlier.

No, the handler needs to be configurable (at least quietly) in order
to support experimentation by SG21. No idea if it will stay that way.

Andrew


Re: [PATCH] Rewrite memset expanders with vec_duplicate

2021-07-16 Thread Richard Sandiford via Gcc-patches
"H.J. Lu via Gcc-patches"  writes:
> On Fri, Jul 16, 2021 at 4:38 AM Richard Sandiford
>  wrote:
>>
>> "H.J. Lu via Gcc-patches"  writes:
>> > 1. Rewrite builtin_memset_read_str and builtin_memset_gen_str with
>> > vec_duplicate_optab to duplicate QI value to TI/OI/XI value.
>> > 2. Add TARGET_GEN_MEMSET_SCRATCH_RTX to allow the backend to use a hard
>> > scratch register to avoid stack realignment when expanding memset.
>> >
>> >   PR middle-end/90773
>> >   * builtins.c (gen_memset_value_from_prev): New function.
>> >   (gen_memset_broadcast): Likewise.
>> >   (builtin_memset_read_str): Use gen_memset_value_from_prev
>> >   and gen_memset_broadcast.
>> >   (builtin_memset_gen_str): Likewise.
>> >   * target.def (gen_memset_scratch_rtx): New hook.
>> >   * doc/tm.texi.in: Add TARGET_GEN_MEMSET_SCRATCH_RTX.
>> >   * doc/tm.texi: Regenerated.
>> > ---
>> >  gcc/builtins.c | 123 +
>> >  gcc/doc/tm.texi|   5 ++
>> >  gcc/doc/tm.texi.in |   2 +
>> >  gcc/target.def |   7 +++
>> >  4 files changed, 116 insertions(+), 21 deletions(-)
>> >
>> > diff --git a/gcc/builtins.c b/gcc/builtins.c
>> > index 39ab139b7e1..c1758ae2efc 100644
>> > --- a/gcc/builtins.c
>> > +++ b/gcc/builtins.c
>> > @@ -6686,26 +6686,111 @@ expand_builtin_strncpy (tree exp, rtx target)
>> >return NULL_RTX;
>> >  }
>> >
>> > -/* Callback routine for store_by_pieces.  Read GET_MODE_BITSIZE (MODE)
>> > -   bytes from constant string DATA + OFFSET and return it as target
>> > -   constant.  If PREV isn't nullptr, it has the RTL info from the
>> > +/* Return the RTL of a register in MODE generated from PREV in the
>> > previous iteration.  */
>> >
>> > -rtx
>> > -builtin_memset_read_str (void *data, void *prevp,
>> > -  HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
>> > -  scalar_int_mode mode)
>> > +static rtx
>> > +gen_memset_value_from_prev (void *prevp, scalar_int_mode mode)
>> >  {
>> > +  rtx target = nullptr;
>> >by_pieces_prev *prev = (by_pieces_prev *) prevp;
>> >if (prev != nullptr && prev->data != nullptr)
>> >  {
>> >/* Use the previous data in the same mode.  */
>> >if (prev->mode == mode)
>> >   return prev->data;
>> > +
>> > +  rtx prev_rtx = prev->data;
>> > +  machine_mode prev_mode = prev->mode;
>> > +  unsigned int word_size = GET_MODE_SIZE (word_mode);
>> > +  if (word_size < GET_MODE_SIZE (prev->mode)
>> > +   && word_size > GET_MODE_SIZE (mode))
>> > + {
>> > +   /* First generate subreg of word mode if the previous mode is
>> > +  wider than word mode and word mode is wider than MODE.  */
>> > +   prev_rtx = simplify_gen_subreg (word_mode, prev_rtx,
>> > +   prev_mode, 0);
>> > +   prev_mode = word_mode;
>> > + }
>> > +  if (prev_rtx != nullptr)
>> > + target = simplify_gen_subreg (mode, prev_rtx, prev_mode, 0);
>> >  }
>> > +  return target;
>> > +}
>> > +
>> > +/* Return the RTL of a register in MODE broadcasted from DATA.  */
>> > +
>> > +static rtx
>> > +gen_memset_broadcast (rtx data, scalar_int_mode mode)
>> > +{
>> > +  /* Skip if regno_reg_rtx isn't initialized.  */
>> > +  if (!regno_reg_rtx)
>> > +return nullptr;
>> > +
>> > +  rtx target = nullptr;
>> > +
>> > +  unsigned int nunits = GET_MODE_SIZE (mode) / GET_MODE_SIZE (QImode);
>> > +  machine_mode vector_mode;
>> > +  if (!mode_for_vector (QImode, nunits).exists (&vector_mode))
>> > +gcc_unreachable ();
>>
>> Sorry, I realise it's a bit late to be raising this objection now,
>> but I don't think it's a good idea to use scalar integer modes as
>> a proxy for vector modes.  In principle there's no reason why a
>> target has to define an integer mode for every vector mode.
>
> A target always defines the largest integer mode.

Right.  But a target shouldn't *need* to define an integer mode
for every vector mode.

>> If we want the mode to be a vector then I think the by-pieces
>> infrastructure should be extended to support vectors directly,
>> rather than assuming that each piece can be represented as
>> a scalar_int_mode.
>>
>
> The current by-pieces infrastructure operates on scalar_int_mode.
> Only for memset, there is
>
> /* Callback routine for store_by_pieces.  Return the RTL of a register
>containing GET_MODE_SIZE (MODE) consecutive copies of the unsigned
>char value given in the RTL register data.  For example, if mode is
>4 bytes wide, return the RTL for 0x01010101*data.  If PREV isn't
>nullptr, it has the RTL info from the previous iteration.  */
>
> static rtx
> builtin_memset_gen_str (void *data, void *prevp,
> HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
> scalar_int_mode mode)
>
> It is a broadcast.  If a target can broadcast a byte to a wider integer,
> can you suggest a way to use it in the current by-pieces infrastructure?
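
For reference, the scalar broadcast described in the quoted comment
("return the RTL for 0x01010101*data") can be sketched in plain C++ as
below. This only models the arithmetic on host integers; it is not RTL
generation, and the function names `broadcast_byte32` and
`broadcast_byte64` are invented for this example.

```cpp
#include <cstdint>

// Replicate one byte into every byte of a 32-bit value.  Each partial
// product lands in its own byte lane, so no carries cross lanes.
constexpr std::uint32_t broadcast_byte32(std::uint8_t value) {
  return UINT32_C(0x01010101) * value;
}

// Same idea for a 64-bit value.
constexpr std::uint64_t broadcast_byte64(std::uint8_t value) {
  return UINT64_C(0x0101010101010101) * value;
}
```

The multiplication trick stops at the widest scalar integer, which is why
the thread above turns to a vector duplicate for wider pieces.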


Re: [PATCH] c++: implement C++17 hardware interference size

2021-07-16 Thread Jonathan Wakely via Gcc-patches
On Fri, 16 Jul 2021 at 03:42, Jason Merrill via Libstdc++
 wrote:
> > diff --git a/libstdc++-v3/include/std/version
> > b/libstdc++-v3/include/std/version
> > index 27bcd32cb60..d5e155db48b 100644
> > --- a/libstdc++-v3/include/std/version
> > +++ b/libstdc++-v3/include/std/version
> > @@ -140,6 +140,9 @@
> >  #define __cpp_lib_filesystem 201703
> >  #define __cpp_lib_gcd 201606
> >  #define __cpp_lib_gcd_lcm 201606
> > +#ifdef __GCC_DESTRUCTIVE_SIZE
> > +# define __cpp_lib_hardware_interference_size 201703L
> > +#endif
> >  #define __cpp_lib_hypot 201603
> >  #define __cpp_lib_invoke 201411L
> >  #define __cpp_lib_lcm 201606
> > diff --git a/libstdc++-v3/libsupc++/new b/libstdc++-v3/libsupc++/new
> > index 3349b13fd1b..7bc67a6cb02 100644
> > --- a/libstdc++-v3/libsupc++/new
> > +++ b/libstdc++-v3/libsupc++/new
> > @@ -183,9 +183,9 @@ inline void operator delete[](void*, void*)
> > _GLIBCXX_USE_NOEXCEPT { }
> >  } // extern "C++"
> >
> >  #if __cplusplus >= 201703L
> > -#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER
> >  namespace std
> >  {
> > +#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER
> >  #define __cpp_lib_launder 201606
> >/// Pointer optimization barrier [ptr.launder]
> >template
> > @@ -205,8 +205,14 @@ namespace std
> >void launder(const void*) = delete;
> >void launder(volatile void*) = delete;
> >void launder(const volatile void*) = delete;
> > -}
> >  #endif // _GLIBCXX_HAVE_BUILTIN_LAUNDER
> > +
> > +#ifdef __GCC_DESTRUCTIVE_SIZE
> > +# define __cpp_lib_hardware_interference_size 201703L
> > +  inline constexpr size_t hardware_destructive_interference_size =
> > __GCC_DESTRUCTIVE_SIZE;
> > +  inline constexpr size_t hardware_constructive_interference_size =
> > __GCC_CONSTRUCTIVE_SIZE;
> > +#endif // __GCC_DESTRUCTIVE_SIZE
> > +}
> >  #endif // C++17
> >
> >  #if __cplusplus > 201703L

Putting aside my dislike of the entire feature, the libstdc++ parts
are fine, thanks.


Re: [PATCH] c++: implement C++17 hardware interference size

2021-07-16 Thread Richard Earnshaw via Gcc-patches




On 16/07/2021 12:17, Jonathan Wakely via Gcc-patches wrote:
> On Fri, 16 Jul 2021 at 03:51, Noah Goldstein wrote:
>> On intel x86 systems with a private L2 cache the spatial prefetcher
>> can cause destructive interference along 128 byte aligned boundaries.
>> https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf#page=60
>
> Which is a good example of why these "constants" should never have
> been standardized in the first place. Sigh.

+1 for that.

I'll have a chat with our architecture guys, but I've no idea if they'll 
commit to any (useful) values for either constant.


R.


[PATCH v2 0/6] rs6000: Add SSE4.1 "blend", "ceil", "floor"

2021-07-16 Thread Paul A. Clarke via Gcc-patches
I have combined three independent "v1" patchsets into this set,
and the "blend" patches were originally combined with "test",
which has now been merged.

Instead of copying some tests from gcc/testsuite/gcc.target/i386,
I created new tests.  The i386 tests in question used rand() to
generate the input data and assembly to compute the rounded values.
Using rand() for testing seems wrong, and the assembly is obviously
not portable.  I use static data, primarily exercising the edges of
dynamic ranges (where fractions start to be unrepresentable).

Tested on ppc64le, ppc64, ppc.

v2:
- Rewrite blends to use vec_perm.
- Improve formatting.

Paul A. Clarke (6):
  rs6000: Add support for SSE4.1 "blend" intrinsics
  rs6000: Add tests for SSE4.1 "blend" intrinsics
  rs6000: Add support for SSE4.1 "ceil" intrinsics
  rs6000: Add tests for SSE4.1 "ceil" intrinsics
  rs6000: Add support for SSE4.1 "floor" intrinsics
  rs6000: Add tests for SSE4.1 "floor" intrinsics

 gcc/config/rs6000/smmintrin.h | 124 ++
 .../gcc.target/powerpc/sse4_1-blendpd.c   |  89 +
 .../gcc.target/powerpc/sse4_1-blendps-2.c |  81 
 .../gcc.target/powerpc/sse4_1-blendps.c   |  90 +
 .../gcc.target/powerpc/sse4_1-blendvpd.c  |  65 +
 .../gcc.target/powerpc/sse4_1-ceilpd.c|  51 +++
 .../gcc.target/powerpc/sse4_1-ceilps.c|  41 ++
 .../gcc.target/powerpc/sse4_1-ceilsd.c| 119 +
 .../gcc.target/powerpc/sse4_1-ceilss.c|  95 ++
 .../gcc.target/powerpc/sse4_1-check.h |   4 +
 .../gcc.target/powerpc/sse4_1-floorpd.c   |  51 +++
 .../gcc.target/powerpc/sse4_1-floorps.c   |  41 ++
 .../gcc.target/powerpc/sse4_1-floorsd.c   | 119 +
 .../gcc.target/powerpc/sse4_1-floorss.c   |  95 ++
 .../gcc.target/powerpc/sse4_1-round-data.h|  20 +++
 .../gcc.target/powerpc/sse4_1-round.h |  27 
 .../gcc.target/powerpc/sse4_1-round2.h|  27 
 .../gcc.target/powerpc/sse4_1-roundpd-2.c |  36 +
 .../gcc.target/powerpc/sse4_1-roundpd-3.c |  36 +
 19 files changed, 1211 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilss.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorss.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round-data.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round2.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-3.c

-- 
2.27.0



[PATCH v2 1/6] rs6000: Add support for SSE4.1 "blend" intrinsics

2021-07-16 Thread Paul A. Clarke via Gcc-patches
_mm_blend_epi16 and _mm_blendv_epi8 were added earlier.
Add these four to complete the set.

2021-07-16  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_blend_pd, _mm_blendv_pd,
_mm_blend_ps, _mm_blendv_ps): New.
---
v2:
- Per review from Bill, rewrote _mm_blend_pd and _mm_blendv_pd to use
  vec_perm instead of gather/unpack/select.

 gcc/config/rs6000/smmintrin.h | 60 +++
 1 file changed, 60 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 6a010fdbb96f..69e54702a877 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -116,6 +116,66 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
   return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
 }
 
+__inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_blend_pd (__m128d __A, __m128d __B, const int __imm8)
+{
+  __v16qu __pcv[] =
+{
+  {  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15 },
+  { 16, 17, 18, 19, 20, 21, 22, 23,  8,  9, 10, 11, 12, 13, 14, 15 },
+  {  0,  1,  2,  3,  4,  5,  6,  7, 24, 25, 26, 27, 28, 29, 30, 31 },
+  { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 }
+};
+  __v16qu __r = vec_perm ((__v16qu) __A, (__v16qu)__B, __pcv[__imm8]);
+  return (__m128d) __r;
+}
+
+__inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
+{
+  const __v2di __zero = {0};
+  const __vector __bool long long __boolmask = vec_cmplt ((__v2di) __mask, 
__zero);
+  return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __boolmask);
+}
+
+__inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_blend_ps (__m128 __A, __m128 __B, const int __imm8)
+{
+  __v16qu __pcv[] =
+{
+  {  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15 },
+  { 16, 17, 18, 19,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15 },
+  {  0,  1,  2,  3, 20, 21, 22, 23,  8,  9, 10, 11, 12, 13, 14, 15 },
+  { 16, 17, 18, 19, 20, 21, 22, 23,  8,  9, 10, 11, 12, 13, 14, 15 },
+  {  0,  1,  2,  3,  4,  5,  6,  7, 24, 25, 26, 27, 12, 13, 14, 15 },
+  { 16, 17, 18, 19,  4,  5,  6,  7, 24, 25, 26, 27, 12, 13, 14, 15 },
+  {  0,  1,  2,  3, 20, 21, 22, 23, 24, 25, 26, 27, 12, 13, 14, 15 },
+  { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 12, 13, 14, 15 },
+  {  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 28, 29, 30, 31 },
+  { 16, 17, 18, 19,  4,  5,  6,  7,  8,  9, 10, 11, 28, 29, 30, 31 },
+  {  0,  1,  2,  3, 20, 21, 22, 23,  8,  9, 10, 11, 28, 29, 30, 31 },
+  { 16, 17, 18, 19, 20, 21, 22, 23,  8,  9, 10, 11, 28, 29, 30, 31 },
+  {  0,  1,  2,  3,  4,  5,  6,  7, 24, 25, 26, 27, 28, 29, 30, 31 },
+  { 16, 17, 18, 19,  4,  5,  6,  7, 24, 25, 26, 27, 28, 29, 30, 31 },
+  {  0,  1,  2,  3, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 },
+  { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 },
+};
+  __v16qu __r = vec_perm ((__v16qu) __A, (__v16qu)__B, __pcv[__imm8]);
+  return (__m128) __r;
+}
+
+__inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask)
+{
+  const __v4si __zero = {0};
+  const __vector __bool int __boolmask = vec_cmplt ((__v4si) __mask, __zero);
+  return (__m128) vec_sel ((__v4su) __A, (__v4su) __B, (__v4su) __boolmask);
+}
+
 __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_testz_si128 (__m128i __A, __m128i __B)
-- 
2.27.0
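
As a reading aid for the permute tables in the patch above, here is a
portable scalar model of how a 16-byte permute control vector implements a
two-lane blend: control values 0..15 pick a byte from the first operand and
16..31 pick a byte from the second. Everything here (`Bytes`,
`model_perm`, `blend_pd_control`, `model_blend_pd`) is hypothetical
illustration code, not the header's implementation, and it ignores the
endianness details of the real vec_perm.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

using Bytes = std::array<std::uint8_t, 16>;

// Scalar stand-in for a 16-byte permute over the concatenation of a and b.
Bytes model_perm(const Bytes& a, const Bytes& b, const Bytes& ctl) {
  Bytes r{};
  for (int i = 0; i < 16; ++i)
    r[i] = ctl[i] < 16 ? a[ctl[i]] : b[ctl[i] - 16];
  return r;
}

// Control vector for a blend of two 8-byte lanes: if bit `lane` of imm8 is
// set, that lane's bytes come from b (index + 16), otherwise from a.
Bytes blend_pd_control(int imm8) {
  Bytes ctl{};
  for (int lane = 0; lane < 2; ++lane)
    for (int k = 0; k < 8; ++k) {
      int byte = lane * 8 + k;
      bool from_b = ((imm8 >> lane) & 1) != 0;
      ctl[byte] = static_cast<std::uint8_t>(from_b ? byte + 16 : byte);
    }
  return ctl;
}

// Blend two pairs of doubles by going through the byte-level model.
std::array<double, 2> model_blend_pd(const std::array<double, 2>& a,
                                     const std::array<double, 2>& b,
                                     int imm8) {
  Bytes ab, bb;
  std::memcpy(ab.data(), a.data(), 16);
  std::memcpy(bb.data(), b.data(), 16);
  Bytes rb = model_perm(ab, bb, blend_pd_control(imm8 & 3));
  std::array<double, 2> r;
  std::memcpy(r.data(), rb.data(), 16);
  return r;
}
```

With imm8 equal to 1 the generated control vector is 16..23 followed by
8..15, which matches the second row of the `_mm_blend_pd` table.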



[PATCH v2 2/6] rs6000: Add tests for SSE4.1 "blend" intrinsics

2021-07-16 Thread Paul A. Clarke via Gcc-patches
Copy the tests for _mm_blend_pd, _mm_blendv_pd, _mm_blend_ps,
_mm_blendv_ps from gcc/testsuite/gcc.target/i386.

2021-07-16  Paul A. Clarke  

gcc/testsuite
* gcc.target/powerpc/sse4_1-blendpd.c: Copy from gcc.target/i386.
* gcc.target/powerpc/sse4_1-blendps-2.c: Likewise.
* gcc.target/powerpc/sse4_1-blendps.c: Likewise.
* gcc.target/powerpc/sse4_1-blendvpd.c: Likewise.
---
v2: Improve formatting per review from Bill.

 .../gcc.target/powerpc/sse4_1-blendpd.c   | 89 ++
 .../gcc.target/powerpc/sse4_1-blendps-2.c | 81 +
 .../gcc.target/powerpc/sse4_1-blendps.c   | 90 +++
 .../gcc.target/powerpc/sse4_1-blendvpd.c  | 65 ++
 4 files changed, 325 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c

diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
new file mode 100644
index ..ca1780471fa2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
@@ -0,0 +1,89 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse4_1_test
+#endif
+
+#include CHECK_H
+
+#include 
+#include 
+
+#define NUM 20
+
+#ifndef MASK
+#define MASK 0x03
+#endif
+
+static void
+init_blendpd (double *src1, double *src2)
+{
+  int i, sign = 1;
+
+  for (i = 0; i < NUM * 2; i++)
+{
+  src1[i] = i * i * sign;
+  src2[i] = (i + 20) * sign;
+  sign = -sign;
+}
+}
+
+static int
+check_blendpd (__m128d *dst, double *src1, double *src2)
+{
+  double tmp[2];
+  int j;
+
+  memcpy (&tmp[0], src1, sizeof (tmp));
+
+  for(j = 0; j < 2; j++)
+if ((MASK & (1 << j)))
+  tmp[j] = src2[j];
+
+  return memcmp (dst, &tmp[0], sizeof (tmp));
+}
+
+static void
+TEST (void)
+{
+  __m128d x, y;
+  union
+{
+  __m128d x[NUM];
+  double d[NUM * 2];
+} dst, src1, src2;
+  union
+{
+  __m128d x;
+  double d[2];
+} src3;
+  int i;
+
+  init_blendpd (src1.d, src2.d);
+
+  /* Check blendpd imm8, m128, xmm */
+  for (i = 0; i < NUM; i++)
+{
+  dst.x[i] = _mm_blend_pd (src1.x[i], src2.x[i], MASK);
+  if (check_blendpd (&dst.x[i], &src1.d[i * 2], &src2.d[i * 2]))
+   abort ();
+}
+
+  /* Check blendpd imm8, xmm, xmm */
+  src3.x = _mm_setzero_pd ();
+
+  x = _mm_blend_pd (dst.x[2], src3.x, MASK);
+  y = _mm_blend_pd (src3.x, dst.x[2], MASK);
+
+  if (check_blendpd (&x, &dst.d[4], &src3.d[0]))
+abort ();
+
+  if (check_blendpd (&y, &src3.d[0], &dst.d[4]))
+abort ();
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
new file mode 100644
index ..768b6e64bbae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
@@ -0,0 +1,81 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#include "sse4_1-check.h"
+
+#include 
+#include 
+#include 
+
+#define NUM 20
+
+#undef MASK
+#define MASK 0xe
+
+static void
+init_blendps (float *src1, float *src2)
+{
+  int i, sign = 1;
+
+  for (i = 0; i < NUM * 4; i++)
+{
+  src1[i] = i * i * sign;
+  src2[i] = (i + 20) * sign;
+  sign = -sign;
+}
+}
+
+static int
+check_blendps (__m128 *dst, float *src1, float *src2)
+{
+  float tmp[4];
+  int j;
+
+  memcpy (&tmp[0], src1, sizeof (tmp));
+  for (j = 0; j < 4; j++)
+if ((MASK & (1 << j)))
+  tmp[j] = src2[j];
+
+  return memcmp (dst, &tmp[0], sizeof (tmp));
+}
+
+static void
+sse4_1_test (void)
+{
+  __m128 x, y;
+  union
+{
+  __m128 x[NUM];
+  float f[NUM * 4];
+} dst, src1, src2;
+  union
+{
+  __m128 x;
+  float f[4];
+} src3;
+  int i;
+
+  init_blendps (src1.f, src2.f);
+
+  for (i = 0; i < 4; i++)
+src3.f[i] = (int) rand ();
+
+  /* Check blendps imm8, m128, xmm */
+  for (i = 0; i < NUM; i++)
+{
+  dst.x[i] = _mm_blend_ps (src1.x[i], src2.x[i], MASK); 
+  if (check_blendps (&dst.x[i], &src1.f[i * 4], &src2.f[i * 4]))
+   abort ();
+}
+
+   /* Check blendps imm8, xmm, xmm */
+  x = _mm_blend_ps (dst.x[2], src3.x, MASK);
+  y = _mm_blend_ps (src3.x, dst.x[2], MASK);
+
+  if (check_blendps (&x, &dst.f[8], &src3.f[0]))
+abort ();
+
+  if (check_blendps (&y, &src3.f[0], &dst.f[8]))
+abort ();
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c
new file mode 100644
index ..2f114b69a84b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c

[PATCH v2 3/6] rs6000: Add support for SSE4.1 "ceil" intrinsics

2021-07-16 Thread Paul A. Clarke via Gcc-patches
2021-07-16  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps,
_mm_ceil_sd, _mm_ceil_ss): New.
---
v2: Improve formatting per review from Bill.

 gcc/config/rs6000/smmintrin.h | 32 
 1 file changed, 32 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 69e54702a877..cad770a67631 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -232,6 +232,38 @@ _mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
   return any_ones * any_zeros;
 }
 
+__inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_ceil_pd (__m128d __A)
+{
+  return (__m128d) vec_ceil ((__v2df) __A);
+}
+
+__inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_ceil_ps (__m128 __A)
+{
+  return (__m128) vec_ceil ((__v4sf) __A);
+}
+
+__inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_ceil_sd (__m128d __A, __m128d __B)
+{
+  __v2df r = vec_ceil ((__v2df) __B);
+  r[1] = ((__v2df) __A)[1];
+  return (__m128d) r;
+}
+
+__inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_ceil_ss (__m128 __A, __m128 __B)
+{
+  __v4sf r = (__v4sf) __A;
+  r[0] = __builtin_ceil (((__v4sf) __B)[0]);
+  return r;
+}
+
 /* Return horizontal packed word minimum and its index in bits [15:0]
and bits [18:16] respectively.  */
 __inline __m128i
-- 
2.27.0
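
For the scalar variants in this patch, the lane selection is the part
that is easy to get backwards: _mm_ceil_sd takes element 0 from the
ceiling of __B and element 1 from __A, and _mm_ceil_ss takes element 0
from the ceiling of __B and elements 1..3 from __A. A plain-C model of
that behaviour, as a sketch only: small_ceil, ceil_sd_model and
ceil_ss_model are hypothetical names, and small_ceil is a libm-free
stand-in that is only valid for finite values that fit in a long long.

```c
#include <assert.h>

/* Ceiling without libm; only valid while X is finite and fits in a
   long long.  Hypothetical helper for this sketch, not in the patch.  */
static double
small_ceil (double x)
{
  assert (x > -9.0e18 && x < 9.0e18);
  double t = (double) (long long) x;	/* Truncates toward zero.  */
  return t < x ? t + 1.0 : t;
}

/* Model of _mm_ceil_sd: element 0 is ceil (b[0]), element 1 is a[1].  */
static void
ceil_sd_model (const double a[2], const double b[2], double out[2])
{
  out[0] = small_ceil (b[0]);
  out[1] = a[1];
}

/* Model of _mm_ceil_ss: element 0 is ceil (b[0]), elements 1..3 are
   copied from a.  */
static void
ceil_ss_model (const float a[4], const float b[4], float out[4])
{
  out[0] = (float) small_ceil ((double) b[0]);
  for (int i = 1; i < 4; i++)
    out[i] = a[i];
}
```

The same shape applies to the floor variants with the rounding
direction reversed.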



[PATCH v2 5/6] rs6000: Add support for SSE4.1 "floor" intrinsics

2021-07-16 Thread Paul A. Clarke via Gcc-patches
2021-07-16  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_floor_pd, _mm_floor_ps,
_mm_floor_sd, _mm_floor_ss): New.
---
v2: Improve formatting per review from Bill.

 gcc/config/rs6000/smmintrin.h | 32 
 1 file changed, 32 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index cad770a67631..5960991e0af7 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -264,6 +264,38 @@ _mm_ceil_ss (__m128 __A, __m128 __B)
   return r;
 }
 
+__inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_floor_pd (__m128d __A)
+{
+  return (__m128d) vec_floor ((__v2df) __A);
+}
+
+__inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_floor_ps (__m128 __A)
+{
+  return (__m128) vec_floor ((__v4sf) __A);
+}
+
+__inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_floor_sd (__m128d __A, __m128d __B)
+{
+  __v2df r = vec_floor ((__v2df) __B);
+  r[1] = ((__v2df) __A)[1];
+  return (__m128d) r;
+}
+
+__inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_floor_ss (__m128 __A, __m128 __B)
+{
+  __v4sf r = (__v4sf) __A;
+  r[0] = __builtin_floor (((__v4sf) __B)[0]);
+  return r;
+}
+
 /* Return horizontal packed word minimum and its index in bits [15:0]
and bits [18:16] respectively.  */
 __inline __m128i
-- 
2.27.0



[PATCH v2 4/6] rs6000: Add tests for SSE4.1 "ceil" intrinsics

2021-07-16 Thread Paul A. Clarke via Gcc-patches
Add the tests for _mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd, _mm_ceil_ss.

Copy a test for _mm_ceil_pd and _mm_ceil_ps from
gcc/testsuite/gcc.target/i386.

Define __VSX_SSE2__ to pick up some union definitions in
m128-check.h.

2021-07-16  Paul A. Clarke  

gcc/testsuite
* gcc.target/powerpc/sse4_1-ceilpd.c: New.
* gcc.target/powerpc/sse4_1-ceilps.c: New.
* gcc.target/powerpc/sse4_1-ceilsd.c: New.
* gcc.target/powerpc/sse4_1-ceilss.c: New.
* gcc.target/powerpc/sse4_1-round-data.h: New.
* gcc.target/powerpc/sse4_1-round.h: New.
* gcc.target/powerpc/sse4_1-round2.h: New.
* gcc.target/powerpc/sse4_1-roundpd-3.c: Copy from gcc.target/i386.
* gcc.target/powerpc/sse4_1-check.h (__VSX_SSE2__): Define.
---
v2: Improve formatting per review from Bill.

 .../gcc.target/powerpc/sse4_1-ceilpd.c|  51 
 .../gcc.target/powerpc/sse4_1-ceilps.c|  41 ++
 .../gcc.target/powerpc/sse4_1-ceilsd.c| 119 ++
 .../gcc.target/powerpc/sse4_1-ceilss.c|  95 ++
 .../gcc.target/powerpc/sse4_1-check.h |   4 +
 .../gcc.target/powerpc/sse4_1-round-data.h|  20 +++
 .../gcc.target/powerpc/sse4_1-round.h |  27 
 .../gcc.target/powerpc/sse4_1-round2.h|  27 
 .../gcc.target/powerpc/sse4_1-roundpd-3.c |  36 ++
 9 files changed, 420 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilss.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round-data.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round2.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-3.c

diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c
new file mode 100644
index ..f532fdb9c285
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include 
+
+#define VEC_T __m128d
+#define FP_T double
+
+#define ROUND_INTRIN(x, mode) _mm_ceil_pd (x)
+
+#include "sse4_1-round-data.h"
+
+static struct data data[] = {
+  { .value = { .f = {  0.00,  0.25 } }, .answer = {  0.0,  1.0 } },
+  { .value = { .f = {  0.50,  0.75 } }, .answer = {  1.0,  1.0 } },
+
+  { { .f = {  0x1.cp+50,  0x1.dp+50 } },
+   {  0x1.cp+50,  0x1.0p+51 } },
+  { { .f = {  0x1.ep+50,  0x1.fp+50 } },
+   {  0x1.0p+51,  0x1.0p+51 } },
+  { { .f = {  0x1.0p+51,  0x1.1p+51 } },
+   {  0x1.0p+51,  0x1.2p+51 } },
+  { { .f = {  0x1.2p+51,  0x1.3p+51 } },
+   {  0x1.2p+51,  0x1.4p+51 } },
+
+  { { .f = {  0x1.ep+51,  0x1.fp+51 } },
+   {  0x1.ep+51,  0x1.0p+52 } },
+  { { .f = {  0x1.0p+52,  0x1.1p+52 } },
+   {  0x1.0p+52,  0x1.1p+52 } },
+
+  { { .f = { -0x1.1p+52, -0x1.0p+52 } },
+   { -0x1.1p+52, -0x1.0p+52 } },
+  { { .f = { -0x1.fp+51, -0x1.ep+51 } },
+   { -0x1.ep+51, -0x1.ep+51 } },
+
+  { { .f = { -0x1.3p+51, -0x1.2p+51 } },
+   { -0x1.2p+51, -0x1.2p+51 } },
+  { { .f = { -0x1.1p+51, -0x1.0p+51 } },
+   { -0x1.0p+51, -0x1.0p+51 } },
+  { { .f = { -0x1.fp+50, -0x1.ep+50 } },
+   { -0x1.cp+50, -0x1.cp+50 } },
+  { { .f = { -0x1.dp+50, -0x1.cp+50 } },
+   { -0x1.cp+50, -0x1.cp+50 } },
+
+  { { .f = { -1.00, -0.75 } }, { -1.0,  0.0 } },
+  { { .f = { -0.50, -0.25 } }, {  0.0,  0.0 } }
+};
+
+#include "sse4_1-round.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c
new file mode 100644
index ..1e2a57d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include 
+
+#define VEC_T __m128
+#define FP_T float
+
+#define ROUND_INTRIN(x, mode) _mm_ceil_ps (x)
+
+#include "sse4_1-round-data.h"
+
+static struct data d

[PATCH v2 6/6] rs6000: Add tests for SSE4.1 "floor" intrinsics

2021-07-16 Thread Paul A. Clarke via Gcc-patches
Add the tests for _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss.
These are modelled after (and depend upon parts of) the tests for
_mm_ceil intrinsics, recently posted.

Copy a test for _mm_floor_sd from gcc/testsuite/gcc.target/i386.

2021-07-16  Paul A. Clarke  

gcc/testsuite
* gcc.target/powerpc/sse4_1-floorpd.c: New.
* gcc.target/powerpc/sse4_1-floorps.c: New.
* gcc.target/powerpc/sse4_1-floorsd.c: New.
* gcc.target/powerpc/sse4_1-floorss.c: New.
* gcc.target/powerpc/sse4_1-roundpd-2.c: Copy from
gcc/testsuite/gcc.target/i386.
---
v2: Improve formatting per review from Bill.

 .../gcc.target/powerpc/sse4_1-floorpd.c   |  51 
 .../gcc.target/powerpc/sse4_1-floorps.c   |  41 ++
 .../gcc.target/powerpc/sse4_1-floorsd.c   | 119 ++
 .../gcc.target/powerpc/sse4_1-floorss.c   |  95 ++
 .../gcc.target/powerpc/sse4_1-roundpd-2.c |  36 ++
 5 files changed, 342 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorss.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-2.c

diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c
new file mode 100644
index ..ad21644f50c4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include 
+
+#define VEC_T __m128d
+#define FP_T double
+
+#define ROUND_INTRIN(x, mode) _mm_floor_pd (x)
+
+#include "sse4_1-round-data.h"
+
+static struct data data[] = {
+  { .value = { .f = {  0.00,  0.25 } }, .answer = {  0.0,  0.0 } },
+  { .value = { .f = {  0.50,  0.75 } }, .answer = {  0.0,  0.0 } },
+
+  { { .f = {  0x1.cp+50,  0x1.dp+50 } },
+   {  0x1.cp+50,  0x1.cp+50 } },
+  { { .f = {  0x1.ep+50,  0x1.0p+51 } },
+   {  0x1.cp+50,  0x1.0p+51 } },
+  { { .f = {  0x1.0p+51,  0x1.1p+51 } },
+   {  0x1.0p+51,  0x1.0p+51 } },
+  { { .f = {  0x1.2p+51,  0x1.3p+51 } },
+   {  0x1.2p+51,  0x1.2p+51 } },
+
+  { { .f = {  0x1.ep+51,  0x1.fp+51 } },
+   {  0x1.ep+51,  0x1.ep+51 } },
+  { { .f = {  0x1.0p+52,  0x1.1p+52 } },
+   {  0x1.0p+52,  0x1.1p+52 } },
+
+  { { .f = { -0x1.1p+52, -0x1.0p+52 } },
+   { -0x1.1p+52, -0x1.0p+52 } },
+  { { .f = { -0x1.fp+51, -0x1.ep+52 } },
+   { -0x1.0p+52, -0x1.ep+52 } },
+
+  { { .f = { -0x1.3p+51, -0x1.2p+51 } },
+   { -0x1.4p+51, -0x1.2p+51 } },
+  { { .f = { -0x1.1p+51, -0x1.0p+51 } },
+   { -0x1.2p+51, -0x1.0p+51 } },
+  { { .f = { -0x1.fp+50, -0x1.ep+50 } },
+   { -0x1.0p+51, -0x1.0p+51 } },
+  { { .f = { -0x1.dp+50, -0x1.cp+50 } },
+   { -0x1.0p+51, -0x1.cp+50 } },
+
+  { { .f = { -1.00, -0.75 } }, { -1.0, -1.0 } },
+  { { .f = { -0.50, -0.25 } }, { -1.0, -1.0 } }
+};
+
+#include "sse4_1-round.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c
new file mode 100644
index ..a53ef9aa9e8b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include 
+
+#define VEC_T __m128
+#define FP_T float
+
+#define ROUND_INTRIN(x, mode) _mm_floor_ps (x)
+
+#include "sse4_1-round-data.h"
+
+static struct data data[] = {
+  { { .f = {  0.00,  0.25,  0.50,  0.75 } }, {  0.0,  0.0,  0.0,  0.0 } },
+
+  { { .f = {  0x1.f8p+21,  0x1.fap+21,
+ 0x1.fcp+21,  0x1.fep+21 } },
+   {  0x1.f8p+21,  0x1.f8p+21,
+ 0x1.f8p+21,  0x1.f8p+21 } },
+
+  { { .f = {  0x1.fap+22,  0x1.fcp+22,
+ 0x1.fep+22,  0x1.fep+23 } },
+   {  0x1.f8p+22,  0x1.fcp+22,
+ 0x1.fcp+22,  0x1.fep+23 } },
+
+  { { .f = { -0x1.fep+23, -0x1.fep+22,
+-0x1.fcp+22, -0x1.fap+22 } },
+   { -0

Re: [PATCH] libstdc++: Skip atomic instructions in _Sp_counted_base::_M_release when both counts are 1

2021-07-16 Thread Jonathan Wakely via Gcc-patches
On Thu, 17 Dec 2020 at 20:51, Maged Michael wrote:
>
> Please find a proposed patch for _Sp_counted_base::_M_release to skip the
> two atomic instructions that decrement each of the use count and the weak
> count when both are 1. I proposed the general idea in an earlier thread (
> https://gcc.gnu.org/pipermail/libstdc++/2020-December/051642.html) and got
> useful feedback on a draft patch and responses to related questions about
> multi-granular atomicity and alignment. This patch is based on that
> feedback.
>
>
> I added a check for thread sanitizer to use the current algorithm in that
> case because TSAN does not support multi-granular atomicity. I'd like to
> add a check of __has_feature(thread_sanitizer) for building using LLVM. I
> found examples of __has_feature in libstdc++

There are no uses of __has_feature in libstdc++. We do use
__has_builtin (which GCC also supports) and Clang's __is_identifier
(which GCC doesn't support) to work around some weird semantics of
__has_builtin in older versions of Clang.


> but it doesn't seem to be
> recognized in shared_ptr_base.h. Any guidance on how to check
> __has_feature(thread_sanitizer) in this patch?

I think we want to do something like this in include/bits/c++config

#if __SANITIZE_THREAD__
#  define _GLIBCXX_TSAN 1
#elif defined __has_feature
# if __has_feature(thread_sanitizer)
#  define _GLIBCXX_TSAN 1
# endif
#endif

Then in bits/shared_ptr_base.h

#if _GLIBCXX_TSAN
_M_release_orig();
return;
#endif



> GCC generates code for _M_release that is larger and more complex than that
> generated by LLVM. I'd like to file a bug report about that. Jonathan,

Is this the same issue as https://gcc.gnu.org/PR101406 ?

> would you please create a bugzilla account for me (
> https://gcc.gnu.org/bugzilla/) using my gmail address. Thank you.

Done (sorry, I didn't notice the request in this mail until coming
back to it to review the patch properly).



>
>
> Information about the patch:
>
> - Benefits of the patch: Save the cost of the last atomic decrements of
> each of the use count and the weak count in _Sp_counted_base. Atomic
> instructions are significantly slower than regular loads and stores across
> major architectures.
>
> - How current code works: _M_release() atomically decrements the use count,
> checks if it was 1, if so calls _M_dispose(), atomically decrements the
> weak count, checks if it was 1, and if so calls _M_destroy().
>
> - How the proposed patch works: _M_release() loads both use count and weak
> count together atomically (when properly aligned), checks if the value is
> equal to the value of both counts equal to 1 (e.g., 0x10001), and if so
> calls _M_dispose() and _M_destroy(). Otherwise, it follows the original
> algorithm.
>
> - Why it works: When the current thread executing _M_release() finds each
> of the counts is equal to 1, then (when _lock_policy is _S_atomic) no other
> threads could possibly hold use or weak references to this control block.
> That is, no other threads could possibly access the counts or the protected
> object.
>
> - The proposed patch is intended to interact correctly with current code
> (under certain conditions: _Lock_policy is _S_atomic, proper alignment, and
> native lock-free support for atomic operations). That is, multiple threads
> using different versions of the code with and without the patch operating
> on the same objects should always interact correctly. The intent for the
> patch is to be ABI compatible with the current implementation.
>
> - The proposed patch involves a performance trade-off between saving the
> costs of two atomic instructions when the counts are both 1 vs adding the
> cost of loading the combined counts and comparison with two ones (e.g.,
> 0x10001).
>
> - The patch has been in use (built using LLVM) in a large environment for
> many months. The performance gains outweigh the losses (roughly 10 to 1)
> across a large variety of workloads.
>
>
> I'd appreciate feedback on the patch and any suggestions for checking
> __has_feature(thread_sanitizer).
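
The fast path described in the quoted summary can be modelled in plain C
with the GCC/Clang __atomic builtins. This is only a sketch of the idea,
not the libstdc++ code: union counts and release_model are hypothetical
names, the two counts are assumed to be adjacent 32-bit integers in one
naturally aligned 64-bit word, and the disposed/destroyed flags stand in
for the calls to _M_dispose() and _M_destroy(). Mixing a 64-bit load
with 32-bit read-modify-write operations on the same word is the
"multi-granular atomicity" the summary refers to.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the use and weak counts of a control block,
   viewable either as two 32-bit counts or as one 64-bit word.  */
union counts
{
  struct
  {
    int32_t use_count;
    int32_t weak_count;
  } c;
  _Alignas (8) int64_t both;
};

_Static_assert (sizeof (union counts) == 8, "counts must fill 64 bits");

/* Return 1 if the fast path was taken, 0 if the original algorithm ran.
   *DISPOSED and *DESTROYED are set where the real code would call
   _M_dispose() and _M_destroy().  */
static int
release_model (union counts *u, int *disposed, int *destroyed)
{
  assert (u && disposed && destroyed);

  /* Both halves equal to 1; the same bit pattern on either endianness.  */
  const int64_t both_one = ((int64_t) 1 << 32) | 1;

  if (__atomic_load_n (&u->both, __ATOMIC_ACQUIRE) == both_one)
    {
      /* No other thread can hold a use or weak reference, so both
         atomic decrements are skipped.  */
      *disposed = 1;
      *destroyed = 1;
      return 1;
    }

  /* Original algorithm: decrement the use count, then the weak count.  */
  if (__atomic_fetch_sub (&u->c.use_count, 1, __ATOMIC_ACQ_REL) == 1)
    {
      *disposed = 1;
      if (__atomic_fetch_sub (&u->c.weak_count, 1, __ATOMIC_ACQ_REL) == 1)
        *destroyed = 1;
    }
  return 0;
}
```

In this model the fast path leaves the counts untouched, since the
object is about to be destroyed anyway.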

N.B. gmail completely mangles patches unless you send them as attachments.


> diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h
> b/libstdc++-v3/include/bits/shared_ptr_base.h
>
> index 368b2d7379a..a8fc944af5f 100644
>
> --- a/libstdc++-v3/include/bits/shared_ptr_base.h
>
> +++ b/libstdc++-v3/include/bits/shared_ptr_base.h
>
> @@ -153,20 +153,78 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
> if (!_M_add_ref_lock_nothrow())
>
>   __throw_bad_weak_ptr();
>
>}
>
>
>bool
>
>_M_add_ref_lock_nothrow() noexcept;
>
>
>void
>
>_M_release() noexcept
>
>{
>
> +#if __SANITIZE_THREAD__
>
> +_M_release_orig();
>
> +return;
>
> +#endif
>
> +if (!__atomic_always_lock_free(sizeof(long long), 0) ||

The line break should come before the logical operator, not after.
This makes it easier to see which operator it is, because it's at a

Re: [PATCH 1/3] Remove gimple_expr_type uses from value-range code

2021-07-16 Thread Andrew MacLeod via Gcc-patches

On 7/16/21 9:02 AM, Richard Biener wrote:

This removes the last uses from value-range code.

Bootstrap & regtest running on x86_64-unknown-linux-gnu, OK?


absolutely.




Re: [PATCH 1/2] arm: Fix vcond_mask expander for MVE (PR target/100757)

2021-07-16 Thread Richard Sandiford via Gcc-patches
Hi,

Sorry for the slow review.  I'd initially held off from reviewing
because it sounded like you were trying to treat predicates as
MODE_VECTOR_BOOL instead.  Is that right?  If so, how did that go?

It does feel like the right long-term direction.  Treating masks as
integers for AVX seems to make some things more difficult than they
should be.  Also, RTL like:

> +(define_expand "vec_cmphi"
> +  [(set (match_operand:HI 0 "s_register_operand")
> + (match_operator:HI 1 "comparison_operator"
> +   [(match_operand:MVE_VLD_ST 2 "s_register_operand")
> +(match_operand:MVE_VLD_ST 3 "reg_or_zero_operand")]))]
> +  "TARGET_HAVE_MVE
> +   && (! || flag_unsafe_math_optimizations)"
> +{
> +  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
> +  operands[2], operands[3], false, false);
> +  DONE;
> +})

seems kind-of suspect, since (as I think you said in a PR),
(eq:HI X Y) would normally be a single boolean.  Having MODE_VECTOR_BOOL
means that we can represent the comparisons “properly”, even in
define_insns.

But since this usage is confined to define_expands, I guess it
doesn't matter much for the purposes of this patch.  Any switch
to MODE_VECTOR_BOOL would leave most of the patch intact (contrary
to what I'd initially assumed).

> @@ -31061,13 +31065,7 @@ arm_expand_vector_compare (rtx target, rtx_code 
> code, rtx op0, rtx op1,
>  
> /* If we are not expanding a vcond, build the result here.  */
> if (!vcond_mve)
> - {
> -   rtx zero = gen_reg_rtx (cmp_result_mode);
> -   rtx one = gen_reg_rtx (cmp_result_mode);
> -   emit_move_insn (zero, CONST0_RTX (cmp_result_mode));
> -   emit_move_insn (one, CONST1_RTX (cmp_result_mode));
> -   emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, 
> one, zero, vpr_p0));
> - }
> + emit_move_insn (target, vpr_p0);

The code above this is:

  if (vcond_mve)
vpr_p0 = target;
  else
vpr_p0 = gen_reg_rtx (HImode);

so couldn't we simply use vpr_p0 = target unconditionally (or drop
vpr_p0 altogether)?  Same for the other cases.

> @@ -31178,20 +31164,21 @@ arm_expand_vcond (rtx *operands, machine_mode 
> cmp_result_mode)
>   mask, operands[1], operands[2]));
>else
>  {
> -  machine_mode cmp_mode = GET_MODE (operands[4]);
> +  machine_mode cmp_mode = GET_MODE (operands[0]);
>rtx vpr_p0 = mask;
> -  rtx zero = gen_reg_rtx (cmp_mode);
> -  rtx one = gen_reg_rtx (cmp_mode);
> -  emit_move_insn (zero, CONST0_RTX (cmp_mode));
> -  emit_move_insn (one, CONST1_RTX (cmp_mode));
> +
>switch (GET_MODE_CLASS (cmp_mode))
>   {
>   case MODE_VECTOR_INT:
> -   emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, operands[0], 
> one, zero, vpr_p0));
> +   emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_mode, operands[0],
> +  operands[1], operands[2], vpr_p0));
> break;
>   case MODE_VECTOR_FLOAT:
> if (TARGET_HAVE_MVE_FLOAT)
> - emit_insn (gen_mve_vpselq_f (cmp_mode, operands[0], one, zero, 
> vpr_p0));
> + emit_insn (gen_mve_vpselq_f (cmp_mode, operands[0],
> +  operands[1], operands[2], vpr_p0));
> +   else
> + gcc_unreachable ();
> break;
>   default:
> gcc_unreachable ();

Here too vpr_p0 feels a bit redundant now.

> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index e393518ea88..a9840408bdd 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -10516,3 +10516,58 @@ (define_insn "*movmisalign_mve_load"
>"vldr.\t%q0, %E1"
>[(set_attr "type" "mve_load")]
>  )
> +
> +;; Expanders for vec_cmp and vcond
> +
> +(define_expand "vec_cmphi"
> +  [(set (match_operand:HI 0 "s_register_operand")
> + (match_operator:HI 1 "comparison_operator"
> +   [(match_operand:MVE_VLD_ST 2 "s_register_operand")
> +(match_operand:MVE_VLD_ST 3 "reg_or_zero_operand")]))]
> +  "TARGET_HAVE_MVE
> +   && (! || flag_unsafe_math_optimizations)"

Is flag_unsafe_math_optimizations needed for MVE?  For Neon we had
it because of flush to zero (at least, I think that was the only reason).

> +{
> +  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
> +  operands[2], operands[3], false, false);
> +  DONE;
> +})

The (snipped) tests look good, but if we do support
!flag_unsafe_math_optimizations, it would be good to have some tests
for unordered comparisons too.  E.g.:

#define ordered(A, B) (!__builtin_isunordered (A, B))
#define unordered(A, B) (__builtin_isunordered (A, B))
#define ueq(A, B) (!__builtin_islessgreater (A, B))
#define ult(A, B) (__builtin_isless (A, B))
#define ule(A, B) (__builtin_islessequal (A, B))
#define uge(A, B) (__builtin_isgreaterequal (A, B))
#define ugt(A, B) (__builtin_isgreater (A, B))
#define nueq(A, B) (__b

Re: [PATCH] Rewrite memset expanders with vec_duplicate

2021-07-16 Thread H.J. Lu via Gcc-patches
On Fri, Jul 16, 2021 at 6:24 AM Richard Sandiford
 wrote:
>
> "H.J. Lu via Gcc-patches"  writes:
> > On Fri, Jul 16, 2021 at 4:38 AM Richard Sandiford
> >  wrote:
> >>
> >> "H.J. Lu via Gcc-patches"  writes:
> >> > 1. Rewrite builtin_memset_read_str and builtin_memset_gen_str with
> >> > vec_duplicate_optab to duplicate QI value to TI/OI/XI value.
> >> > 2. Add TARGET_GEN_MEMSET_SCRATCH_RTX to allow the backend to use a hard
> >> > scratch register to avoid stack realignment when expanding memset.
> >> >
> >> >   PR middle-end/90773
> >> >   * builtins.c (gen_memset_value_from_prev): New function.
> >> >   (gen_memset_broadcast): Likewise.
> >> >   (builtin_memset_read_str): Use gen_memset_value_from_prev
> >> >   and gen_memset_broadcast.
> >> >   (builtin_memset_gen_str): Likewise.
> >> >   * target.def (gen_memset_scratch_rtx): New hook.
> >> >   * doc/tm.texi.in: Add TARGET_GEN_MEMSET_SCRATCH_RTX.
> >> >   * doc/tm.texi: Regenerated.
> >> > ---
> >> >  gcc/builtins.c | 123 +
> >> >  gcc/doc/tm.texi|   5 ++
> >> >  gcc/doc/tm.texi.in |   2 +
> >> >  gcc/target.def |   7 +++
> >> >  4 files changed, 116 insertions(+), 21 deletions(-)
> >> >
> >> > diff --git a/gcc/builtins.c b/gcc/builtins.c
> >> > index 39ab139b7e1..c1758ae2efc 100644
> >> > --- a/gcc/builtins.c
> >> > +++ b/gcc/builtins.c
> >> > @@ -6686,26 +6686,111 @@ expand_builtin_strncpy (tree exp, rtx target)
> >> >return NULL_RTX;
> >> >  }
> >> >
> >> > -/* Callback routine for store_by_pieces.  Read GET_MODE_BITSIZE (MODE)
> >> > -   bytes from constant string DATA + OFFSET and return it as target
> >> > -   constant.  If PREV isn't nullptr, it has the RTL info from the
> >> > +/* Return the RTL of a register in MODE generated from PREV in the
> >> > previous iteration.  */
> >> >
> >> > -rtx
> >> > -builtin_memset_read_str (void *data, void *prevp,
> >> > -  HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
> >> > -  scalar_int_mode mode)
> >> > +static rtx
> >> > +gen_memset_value_from_prev (void *prevp, scalar_int_mode mode)
> >> >  {
> >> > +  rtx target = nullptr;
> >> >by_pieces_prev *prev = (by_pieces_prev *) prevp;
> >> >if (prev != nullptr && prev->data != nullptr)
> >> >  {
> >> >/* Use the previous data in the same mode.  */
> >> >if (prev->mode == mode)
> >> >   return prev->data;
> >> > +
> >> > +  rtx prev_rtx = prev->data;
> >> > +  machine_mode prev_mode = prev->mode;
> >> > +  unsigned int word_size = GET_MODE_SIZE (word_mode);
> >> > +  if (word_size < GET_MODE_SIZE (prev->mode)
> >> > +   && word_size > GET_MODE_SIZE (mode))
> >> > + {
> >> > +   /* First generate subreg of word mode if the previous mode is
> >> > +  wider than word mode and word mode is wider than MODE.  */
> >> > +   prev_rtx = simplify_gen_subreg (word_mode, prev_rtx,
> >> > +   prev_mode, 0);
> >> > +   prev_mode = word_mode;
> >> > + }
> >> > +  if (prev_rtx != nullptr)
> >> > + target = simplify_gen_subreg (mode, prev_rtx, prev_mode, 0);
> >> >  }
> >> > +  return target;
> >> > +}
> >> > +
> >> > +/* Return the RTL of a register in MODE broadcasted from DATA.  */
> >> > +
> >> > +static rtx
> >> > +gen_memset_broadcast (rtx data, scalar_int_mode mode)
> >> > +{
> >> > +  /* Skip if regno_reg_rtx isn't initialized.  */
> >> > +  if (!regno_reg_rtx)
> >> > +return nullptr;
> >> > +
> >> > +  rtx target = nullptr;
> >> > +
> >> > +  unsigned int nunits = GET_MODE_SIZE (mode) / GET_MODE_SIZE (QImode);
> >> > +  machine_mode vector_mode;
> >> > +  if (!mode_for_vector (QImode, nunits).exists (&vector_mode))
> >> > +gcc_unreachable ();
> >>
> >> Sorry, I realise it's a bit late to be raising this objection now,
> >> but I don't think it's a good idea to use scalar integer modes as
> >> a proxy for vector modes.  In principle there's no reason why a
> >> target has to define an integer mode for every vector mode.
> >
> > A target always defines the largest integer mode.
>
> Right.  But a target shouldn't *need* to define an integer mode
> for every vector mode.
>
> >> If we want the mode to be a vector then I think the by-pieces
> >> infrastructure should be extended to support vectors directly,
> >> rather than assuming that each piece can be represented as
> >> a scalar_int_mode.
> >>
> >
> > The current by-pieces infrastructure operates on scalar_int_mode.
> > Only for memset, there is
> >
> > /* Callback routine for store_by_pieces.  Return the RTL of a register
> >containing GET_MODE_SIZE (MODE) consecutive copies of the unsigned
> >char value given in the RTL register data.  For example, if mode is
> >4 bytes wide, return the RTL for 0x01010101*data.  If PREV isn't
> >nullptr, it has the RTL info from the previous iteration.  */
> >
> > static rtx
> > builtin_memset_gen_str (v
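
The "0x01010101*data" arithmetic in the quoted comment is the scalar way
to replicate one byte across a 4-byte word. A minimal sketch on plain
integers rather than RTL; broadcast_byte_u32 is a hypothetical name and
nothing here is taken from builtins.c:

```c
#include <assert.h>
#include <stdint.h>

/* Replicate BYTE (0..255) into all four bytes of a 32-bit word.  Each
   partial product stays below 256, so no carries cross byte
   boundaries.  Hypothetical helper, for illustration only.  */
static uint32_t
broadcast_byte_u32 (uint32_t byte)
{
  assert (byte <= 0xff);
  return UINT32_C (0x01010101) * byte;
}
```

The vec_duplicate approach in the patch produces the same kind of
replicated value directly in a vector register instead of through a
multiply.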

Re: [PATCH 1/2] arm: Fix vcond_mask expander for MVE (PR target/100757)

2021-07-16 Thread Christophe LYON via Gcc-patches



On 16/07/2021 16:06, Richard Sandiford via Gcc-patches wrote:

Hi,

Sorry for the slow review.  I'd initially held off from reviewing
because it sounded like you were trying to treat predicates as
MODE_VECTOR_BOOL instead.  Is that right?  If so, how did that go?



Yes, that's part of PR 101325. It's still WIP, as I wrote in the PR, and I'm
not sure I got it right yet. At the moment it seems it would imply a lot
of changes, so I'll have to look at AArch64's implementation in more detail.


I hoped this fix could be merged before switching to MODE_VECTOR_BOOL.



It does feel like the right long-term direction.  Treating masks as
integers for AVX seems to make some things more difficult than they
should be.  Also, RTL like:


OK, I see, good to know.





+(define_expand "vec_cmphi"
+  [(set (match_operand:HI 0 "s_register_operand")
+   (match_operator:HI 1 "comparison_operator"
+ [(match_operand:MVE_VLD_ST 2 "s_register_operand")
+  (match_operand:MVE_VLD_ST 3 "reg_or_zero_operand")]))]
+  "TARGET_HAVE_MVE
+   && (! || flag_unsafe_math_optimizations)"
+{
+  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
+operands[2], operands[3], false, false);
+  DONE;
+})

seems kind-of suspect, since (as I think you said in a PR),
(eq:HI X Y) would normally be a single boolean.  Having MODE_VECTOR_BOOL
means that we can represent the comparisons “properly”, even in
define_insns.

But since this usage is confined to define_expands, I guess it
doesn't matter much for the purposes of this patch.  Any switch
to MODE_VECTOR_BOOL would leave most of the patch intact (contrary
to what I'd initially assumed).


@@ -31061,13 +31065,7 @@ arm_expand_vector_compare (rtx target, rtx_code code, 
rtx op0, rtx op1,
  
  	  /* If we are not expanding a vcond, build the result here.  */

  if (!vcond_mve)
-   {
- rtx zero = gen_reg_rtx (cmp_result_mode);
- rtx one = gen_reg_rtx (cmp_result_mode);
- emit_move_insn (zero, CONST0_RTX (cmp_result_mode));
- emit_move_insn (one, CONST1_RTX (cmp_result_mode));
- emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, 
one, zero, vpr_p0));
-   }
+   emit_move_insn (target, vpr_p0);

The code above this is:

  if (vcond_mve)
vpr_p0 = target;
  else
vpr_p0 = gen_reg_rtx (HImode);

so couldn't we simply use vpr_p0 = target unconditionally (or drop
vpr_p0 altogether)?  Same for the other cases.

Probably, I'll check that.

@@ -31178,20 +31164,21 @@ arm_expand_vcond (rtx *operands, machine_mode 
cmp_result_mode)
mask, operands[1], operands[2]));
else
  {
-  machine_mode cmp_mode = GET_MODE (operands[4]);
+  machine_mode cmp_mode = GET_MODE (operands[0]);
rtx vpr_p0 = mask;
-  rtx zero = gen_reg_rtx (cmp_mode);
-  rtx one = gen_reg_rtx (cmp_mode);
-  emit_move_insn (zero, CONST0_RTX (cmp_mode));
-  emit_move_insn (one, CONST1_RTX (cmp_mode));
+
switch (GET_MODE_CLASS (cmp_mode))
{
case MODE_VECTOR_INT:
- emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, operands[0], 
one, zero, vpr_p0));
+ emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_mode, operands[0],
+operands[1], operands[2], vpr_p0));
  break;
case MODE_VECTOR_FLOAT:
  if (TARGET_HAVE_MVE_FLOAT)
-   emit_insn (gen_mve_vpselq_f (cmp_mode, operands[0], one, zero, 
vpr_p0));
+   emit_insn (gen_mve_vpselq_f (cmp_mode, operands[0],
+operands[1], operands[2], vpr_p0));
+ else
+   gcc_unreachable ();
  break;
default:
  gcc_unreachable ();

Here too vpr_p0 feels a bit redundant now.


Indeed, but it seemed clearer to me :-)


diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index e393518ea88..a9840408bdd 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -10516,3 +10516,58 @@ (define_insn "*movmisalign_mve_load"
"vldr.\t%q0, %E1"
[(set_attr "type" "mve_load")]
  )
+
+;; Expanders for vec_cmp and vcond
+
+(define_expand "vec_cmphi"
+  [(set (match_operand:HI 0 "s_register_operand")
+   (match_operator:HI 1 "comparison_operator"
+ [(match_operand:MVE_VLD_ST 2 "s_register_operand")
+  (match_operand:MVE_VLD_ST 3 "reg_or_zero_operand")]))]
+  "TARGET_HAVE_MVE
+   && (! || flag_unsafe_math_optimizations)"

Is flag_unsafe_math_optimizations needed for MVE?  For Neon we had
it because of flush to zero (at least, I think that was the only reason).


Right, I inherited this from the vec_cmp in neon.md. I do not have the 
ARM ARM at hand right now to check.


However, your question makes me wonder about the other vec_cmp pattern 
in vec-common.md, which is common to Neon and MVE. It might need to be 
adjusted too.



+{
+  arm_expand_vector_compare (oper

[committed] libstdc++: Modernize <random> helpers

2021-07-16 Thread Jonathan Wakely via Gcc-patches
Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/random.h (_Shift::__value): Use constexpr.
(_Select_uint_least_t::type): Use using-declaration.
(_Mod): Likewise.
* testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error
line number.

Tested powerpc64le-linux. Committed to trunk.

commit 95891ca020591196cde50c4cde4cab14783a3c00
Author: Jonathan Wakely 
Date:   Fri Jul 16 13:39:25 2021

libstdc++: Modernize <random> helpers

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/random.h (_Shift::__value): Use constexpr.
(_Select_uint_least_t::type): Use using-declaration.
(_Mod): Likewise.
* testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error
line number.

diff --git a/libstdc++-v3/include/bits/random.h 
b/libstdc++-v3/include/bits/random.h
index 6d0e1544c90..c5cae87b636 100644
--- a/libstdc++-v3/include/bits/random.h
+++ b/libstdc++-v3/include/bits/random.h
@@ -68,11 +68,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 bool = __w < static_cast
  (std::numeric_limits<_UIntType>::digits)>
   struct _Shift
-  { static const _UIntType __value = 0; };
+  { static constexpr _UIntType __value = 0; };
 
 template
   struct _Shift<_UIntType, __w, true>
-  { static const _UIntType __value = _UIntType(1) << __w; };
+  { static constexpr _UIntType __value = _UIntType(1) << __w; };
 
 template
   struct _Select_uint_least_t<__s, 4>
-  { typedef unsigned int type; };
+  { using type = unsigned int; };
 
 template
   struct _Select_uint_least_t<__s, 3>
-  { typedef unsigned long type; };
+  { using type = unsigned long; };
 
 template
   struct _Select_uint_least_t<__s, 2>
-  { typedef unsigned long long type; };
+  { using type = unsigned long long; };
 
 #if __SIZEOF_INT128__ > __SIZEOF_LONG_LONG__
 template
   struct _Select_uint_least_t<__s, 1>
-  { __extension__ typedef unsigned __int128 type; };
+  { __extension__ using type = unsigned __int128; };
 #endif
 
 // Assume a != 0, a < m, c < m, x < m.
@@ -111,11 +111,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  bool __schrage_ok = __m % __a < __m / __a>
   struct _Mod
   {
-   typedef typename _Select_uint_least_t::type _Tp2;
static _Tp
__calc(_Tp __x)
-   { return static_cast<_Tp>((_Tp2(__a) * __x + __c) % __m); }
+   {
+ using _Tp2
+   = typename _Select_uint_least_t::type;
+ return static_cast<_Tp>((_Tp2(__a) * __x + __c) % __m);
+   }
   };
 
 // Schrage.
diff --git a/libstdc++-v3/testsuite/26_numerics/random/pr60037-neg.cc 
b/libstdc++-v3/testsuite/26_numerics/random/pr60037-neg.cc
index 3ded306bd5f..d6e6399bd79 100644
--- a/libstdc++-v3/testsuite/26_numerics/random/pr60037-neg.cc
+++ b/libstdc++-v3/testsuite/26_numerics/random/pr60037-neg.cc
@@ -10,6 +10,6 @@ std::__detail::_Adaptor 
aurng(urng);
 auto x = std::generate_canonical::digits>(urng);
 
-// { dg-error "static assertion failed: template argument must be a floating 
point type" "" { target *-*-* } 166 }
+// { dg-error "static assertion failed: template argument must be a floating 
point type" "" { target *-*-* } 169 }
 
 // { dg-error "static assertion failed: template argument must be a floating 
point type" "" { target *-*-* } 3350 }
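
The two modernizations in this patch (static constexpr data members and
alias-declarations in place of typedef) can be sketched outside libstdc++
roughly as follows.  The names shift_value and select_uint are hypothetical
stand-ins, not the library's own identifiers, and the template parameter
lists are assumptions made only so that the sketch compiles.

```cpp
#include <cstddef>
#include <limits>
#include <type_traits>

// Hypothetical stand-in for a _Shift-like helper:
// value is 2^W when that fits in UInt, otherwise 0.
template<typename UInt, std::size_t W,
         bool = (W < static_cast<std::size_t>(std::numeric_limits<UInt>::digits))>
  struct shift_value
  { static constexpr UInt value = 0; };

template<typename UInt, std::size_t W>
  struct shift_value<UInt, W, true>
  { static constexpr UInt value = UInt(1) << W; };

// Hypothetical stand-in for a type selector, written with an
// alias-declaration where older code would use a typedef.
template<int Which>
  struct select_uint;

template<>
  struct select_uint<4>
  { using type = unsigned int; };

template<>
  struct select_uint<2>
  { using type = unsigned long long; };

// The constexpr members are usable directly in constant expressions.
static_assert(shift_value<unsigned, 4>::value == 16u, "2^4 fits in unsigned");
static_assert(shift_value<unsigned char, 8>::value == 0, "2^8 does not fit");
static_assert(std::is_same<select_uint<4>::type, unsigned int>::value, "");
```

The behaviour is the same as with "static const" and "typedef"; only the
spelling changes.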


[committed] libstdc++: Simplify numeric_limits<__max_size_type>

2021-07-16 Thread Jonathan Wakely via Gcc-patches
If __int128 is supported then __int_traits<__int128> is guaranteed to be
specialized, so we can remove the preprocessor condition inside the
std::numeric_limits<__detail::__max_size_type> specialization. Simply
using __int_traits<_Sp::__rep> gives the right answer.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/max_size_type.h (numeric_limits<__max_size_type>):
Use __int_traits unconditionally.

Tested powerpc64le-linux. Committed to trunk.

commit bfb0586ebdb696efa9e59cb8da1d977c5880653b
Author: Jonathan Wakely 
Date:   Fri Jul 16 13:53:05 2021

libstdc++: Simplify numeric_limits<__max_size_type>

If __int128 is supported then __int_traits<__int128> is guaranteed to be
specialized, so we can remove the preprocessor condition inside the
std::numeric_limits<__detail::__max_size_type> specialization. Simply
using __int_traits<_Sp::__rep> gives the right answer.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/max_size_type.h (numeric_limits<__max_size_type>):
Use __int_traits unconditionally.

diff --git a/libstdc++-v3/include/bits/max_size_type.h 
b/libstdc++-v3/include/bits/max_size_type.h
index 298a929db03..11721b30b61 100644
--- a/libstdc++-v3/include/bits/max_size_type.h
+++ b/libstdc++-v3/include/bits/max_size_type.h
@@ -771,14 +771,8 @@ namespace ranges
   static constexpr bool is_signed = false;
   static constexpr bool is_integer = true;
   static constexpr bool is_exact = true;
-#if __SIZEOF_INT128__
-  static_assert(__extension__ same_as<_Sp::__rep, __uint128_t>);
-  static constexpr int digits = 129;
-#else
-  static_assert(same_as<_Sp::__rep, unsigned long long>);
   static constexpr int digits
-   = __gnu_cxx::__int_traits::__digits + 1;
-#endif
+   = __gnu_cxx::__int_traits<_Sp::__rep>::__digits + 1;
   static constexpr int digits10
= static_cast(digits * numbers::ln2 / numbers::ln10);
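
As a rough illustration of the arithmetic in this hunk, the sketch below
computes "digits" as one more than the value bits of the representation
type and derives digits10 from it.  It assumes std::numeric_limits as a
stand-in for the internal __int_traits, and hard-coded values of ln 2 and
ln 10 instead of the C++20 std::numbers constants; the function names are
hypothetical.

```cpp
#include <limits>

// Assumed constants (the patch itself uses numbers::ln2 and numbers::ln10).
constexpr double ln2 = 0.693147180559945309417;
constexpr double ln10 = 2.302585092994045684018;

// One more value bit than the underlying representation type.
template<typename Rep>
  constexpr int
  max_size_digits()
  { return std::numeric_limits<Rep>::digits + 1; }

// Number of decimal digits representable with the given number of bits.
constexpr int
decimal_digits(int bits)
{ return static_cast<int>(bits * ln2 / ln10); }

static_assert(max_size_digits<unsigned long long>()
              == std::numeric_limits<unsigned long long>::digits + 1, "");
// 129 bits, the value the removed #if branch hard-coded.
static_assert(decimal_digits(129) == 38, "");
// 65 bits, as for a 64-bit representation type.
static_assert(decimal_digits(65) == 19, "");
```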
 


Re: [committed] libstdc++: Add noexcept to __replacement_assert [PR101429]

2021-07-16 Thread Jonathan Wakely via Gcc-patches
On Thu, 15 Jul 2021 at 19:47, Jonathan Wakely wrote:
>
>
>
> On Thu, 15 Jul 2021, 18:21 François Dumont via Libstdc++, 
>  wrote:
>>
>> On 15/07/21 5:26 pm, Jonathan Wakely via Libstdc++ wrote:
>> > This results in slightly smaller code when assertions are enabled when
>> > either using Clang (because it adds code to call std::terminate when
>> > potentially-throwing functions are called in a noexcept function) or a
>> > freestanding or non-verbose build (because it doesn't use printf).
>> >
>> > Signed-off-by: Jonathan Wakely 
>> >
>> > libstdc++-v3/ChangeLog:
>> >
>> >   PR libstdc++/101429
>> >   * include/bits/c++config (__replacement_assert): Add noexcept.
>> >   [!_GLIBCXX_VERBOSE] (__glibcxx_assert_impl): Use __builtin_trap
>> >   instead of __replacement_assert.
>> >
>> > Tested powerpc64le-linux. Committed to trunk.
>> >
>> ChangeLog is talking about __builtin_trap but there is none in the
>> attached patch.
>
>
>
> Yes I already noticed that and mentioned it in the bugzilla PR. It uses 
> __builtin_abort not __builtin_trap. I'll fix the ChangeLog file tomorrow 
> after it gets generated.

Fixed in r12-2361

> The Git commit message will stay wrong though.



Re: [PATCH] libstdc++: Use __extension__ instead of diagnostic pragmas (was: Suppress pedantic warnings about __int128)

2021-07-16 Thread Jonathan Wakely via Gcc-patches
On Fri, 16 Jul 2021 at 14:02, Jonathan Wakely wrote:
>
> On Fri, 16 Jul 2021 at 12:29, Jonathan Wakely wrote:
> > Hmm, in fact it seems that we can just use the __uint128_t typedef
> > instead, which doesn't give a pedwarn:
> >
> >   using __rep = __uint128_t;
> >
> > Is that typedef always available if __int128 is? There's a comment in
> > gcc/c-family/c-common.c that I don't understand:
> >
> > #if HOST_BITS_PER_WIDE_INT >= 64
> >   /* Note that this is different than the __int128 type that's part of
> >  the generic __intN support.  */
> >   if (targetm.scalar_mode_supported_p (TImode))
> > lang_hooks.decls.pushdecl (build_decl (UNKNOWN_LOCATION,
> >TYPE_DECL,
> >get_identifier ("__int128_t"),
> >intTI_type_node));
> > #endif
> >
> > They are the same type in C++, so what is "different"? Is it possible
> > for __int128 to be different from a TImode integer?
>
> As discussed on IRC, I'm going to add a configure check that __int128
> and __int128_t are the same, and similarly for the unsigned versions.
> That will allow us to use __int128_t and __uint128_t to avoid the
> warnings (assuming GCC doesn't change to warn consistently for the
> non-standard typedefs as well as the non-standard types).
>
> For now, I'll just use __extension__ consistently everywhere. I'm
> testing the attached patch that does that.

Pushed to trunk now.
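
A minimal sketch of the two options discussed above, assuming a GCC or
Clang target that defines __SIZEOF_INT128__.  The struct name "wide" is
hypothetical, and the static_assert only mirrors the idea of the proposed
configure check; it is not the check itself.

```cpp
#include <type_traits>

#ifdef __SIZEOF_INT128__
struct wide
{
  // __extension__ marks the use of the non-standard type as intentional,
  // so pedantic mode does not diagnose it.
  __extension__ using type = unsigned __int128;
};

// Assumption being tested: the typedef and the keyword type are the same.
static_assert(std::is_same<wide::type, __uint128_t>::value,
              "unsigned __int128 and __uint128_t are the same type");
static_assert(sizeof(wide::type) == 16, "128 bits wide");
#else
// Fallback for targets without a 128-bit integer type.
struct wide
{
  using type = unsigned long long;
};
#endif
```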



[PATCH][committed] testsuite: fix IL32 issues with usdot tests.

2021-07-16 Thread Tamar Christina via Gcc-patches
Hi All,

Fix tests when int == long by using long long instead.

Regtested on aarch64-none-linux-gnu and no issues.

Committed under the obvious rule.

Thanks,
Tamar

gcc/testsuite/ChangeLog:

PR middle-end/101457
* gcc.dg/vect/vect-reduc-dot-19.c: Use long long.
* gcc.dg/vect/vect-reduc-dot-20.c: Likewise.
* gcc.dg/vect/vect-reduc-dot-21.c: Likewise.
* gcc.dg/vect/vect-reduc-dot-22.c: Likewise.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-19.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-19.c
index 
dbeaaec24a1095b7730d9e1262f5a951fd2312fc..d00f24aae4c7ffbf213dc248faeeae96cd401411
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-19.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-19.c
@@ -13,15 +13,15 @@
 #define SIGNEDNESS_4 unsigned
 #endif
 
-SIGNEDNESS_1 long __attribute__ ((noipa))
-f (SIGNEDNESS_1 long res, SIGNEDNESS_3 char *restrict a,
+SIGNEDNESS_1 long long __attribute__ ((noipa))
+f (SIGNEDNESS_1 long long res, SIGNEDNESS_3 char *restrict a,
SIGNEDNESS_4 short *restrict b)
 {
   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
 {
   int av = a[i];
   int bv = b[i];
-  SIGNEDNESS_2 long mult = av * bv;
+  SIGNEDNESS_2 long long mult = av * bv;
   res += mult;
 }
   return res;
@@ -37,7 +37,7 @@ main (void)
 
   SIGNEDNESS_3 char a[N];
   SIGNEDNESS_4 short b[N];
-  int expected = 0x12345;
+  SIGNEDNESS_1 long long expected = 0x12345;
   for (int i = 0; i < N; ++i)
 {
   a[i] = BASE + i * 5;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-20.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-20.c
index 
d757fb15615ba79dedcbfc44407d3f363274ad26..17adbca83a0c97e76db8e15c0ff376608fd5d1bd
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-20.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-20.c
@@ -13,15 +13,15 @@
 #define SIGNEDNESS_4 unsigned
 #endif
 
-SIGNEDNESS_1 long __attribute__ ((noipa))
-f (SIGNEDNESS_1 long res, SIGNEDNESS_3 short *restrict a,
+SIGNEDNESS_1 long long __attribute__ ((noipa))
+f (SIGNEDNESS_1 long long res, SIGNEDNESS_3 short *restrict a,
SIGNEDNESS_4 char *restrict b)
 {
   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
 {
   int av = a[i];
   int bv = b[i];
-  SIGNEDNESS_2 long mult = av * bv;
+  SIGNEDNESS_2 long long mult = av * bv;
   res += mult;
 }
   return res;
@@ -37,7 +37,7 @@ main (void)
 
   SIGNEDNESS_3 short a[N];
   SIGNEDNESS_4 char b[N];
-  int expected = 0x12345;
+  SIGNEDNESS_1 long long expected = 0x12345;
   for (int i = 0; i < N; ++i)
 {
   a[i] = BASE + i * 5;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c
index 
6d08bf4478be83de86b0975524687a75d025123e..6cc6a4f2e92ed21fe2e71c0cd842c80d44b6db9f
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-21.c
@@ -13,8 +13,8 @@
 #define SIGNEDNESS_4 unsigned
 #endif
 
-SIGNEDNESS_1 long __attribute__ ((noipa))
-f (SIGNEDNESS_1 long res, SIGNEDNESS_3 char *restrict a,
+SIGNEDNESS_1 long long __attribute__ ((noipa))
+f (SIGNEDNESS_1 long long res, SIGNEDNESS_3 char *restrict a,
SIGNEDNESS_4 short *restrict b)
 {
   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
@@ -37,7 +37,7 @@ main (void)
 
   SIGNEDNESS_3 char a[N];
   SIGNEDNESS_4 short b[N];
-  int expected = 0x12345;
+  SIGNEDNESS_1 long long expected = 0x12345;
   for (int i = 0; i < N; ++i)
 {
   a[i] = BASE + i * 5;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
index 
0bde43a6cb855ce5edd9015ebf34ca226353d77e..e13d3d5c4da7b14df48fa9a2c7ad457c5ccbc89c
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
@@ -13,8 +13,8 @@
 #define SIGNEDNESS_4 unsigned
 #endif
 
-SIGNEDNESS_1 long __attribute__ ((noipa))
-f (SIGNEDNESS_1 long res, SIGNEDNESS_3 char *restrict a,
+SIGNEDNESS_1 long long __attribute__ ((noipa))
+f (SIGNEDNESS_1 long long res, SIGNEDNESS_3 char *restrict a,
SIGNEDNESS_4 short *restrict b)
 {
   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
@@ -37,7 +37,7 @@ main (void)
 
   SIGNEDNESS_3 char a[N];
   SIGNEDNESS_4 short b[N];
-  SIGNEDNESS_1 long expected = 0x12345;
+  SIGNEDNESS_1 long long expected = 0x12345;
   for (int i = 0; i < N; ++i)
 {
   a[i] = BASE + i * 5;
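
The reason for the change can be sketched as follows: on targets where long
is as narrow as int, only long long guarantees a wider accumulator.  This is
an illustrative reduction written in C++, not one of the testsuite files;
the function and variable names are hypothetical.

```cpp
#include <cassert>
#include <climits>

// long long is at least 64 bits on every conforming implementation,
// whereas long may be only 32 bits (for example on ILP32 targets).
static_assert(sizeof(long long) * CHAR_BIT >= 64, "long long is at least 64 bits");
static_assert(sizeof(long) >= sizeof(int), "long is never narrower than int");

// Dot-product style reduction that accumulates into long long.
inline long long
dot_accumulate(long long res, const signed char *a, const short *b, int n)
{
  for (int i = 0; i < n; ++i)
    {
      int av = a[i];
      int bv = b[i];
      long long mult = av * bv;  // product is computed in int, then widened
      res += mult;
    }
  return res;
}

inline void
demo_dot_accumulate()
{
  const signed char a[] = { 1, 2, 3 };
  const short b[] = { 4, 5, 6 };
  // 1*4 + 2*5 + 3*6 = 32, added to the starting value.
  assert(dot_accumulate(0x12345, a, b, 3) == 0x12345 + 32);
}
```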



Re: [PATCH] c++: implement C++17 hardware interference size

2021-07-16 Thread Matthias Kretz
On Friday, 16 July 2021 04:41:17 CEST Jason Merrill via Gcc-patches wrote:
> > Currently the patch does not adjust the values based on -march, as in JF's
> > proposal.  I'll need more guidance from the ARM/AArch64 maintainers about
> > how to go about that.  --param l1-cache-line-size is set based on -mtune,
> > but I don't think we want -mtune to change these ABI-affecting values. 
> > Are
> > there -march values for which a smaller range than 64-256 makes sense?

As a user who cares about ABI but also cares about maximizing performance of 
builds for a specific HPC setup I'd expect the hardware interference size 
values to be allowed to break ABIs. The point of these values is to give me 
better performance portability (but not necessarily binary portability) than 
my usual "pick 64 as a good average".

Wrt, -march / -mtune setting hardware interference size: IMO -mtune=X should 
be interpreted as "my binary is supposed to be optimized for X, I accept 
inefficiencies on everything that's not X".

On Friday, 16 July 2021 04:48:52 CEST Noah Goldstein wrote:
> On intel x86 systems with a private L2 cache the spatial prefetcher
> can cause destructive interference along 128 byte aligned boundaries.
> https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-3
> 2-architectures-optimization-manual.pdf#page=60

I don't understand how this feature would lead to false sharing. But maybe I 
misunderstand the spatial prefetcher. The first access to one cache line of a 
pair would bring both cache lines to LLC (and possibly L2). If a core 
with a different L2 reads the other cache line the cache line would be 
duplicated; if it writes to it, it would be exclusive to the other core's L2. 
The cache line pairs do not affect each other anymore. Maybe there's a minor 
inefficiency on initial transfer from memory, but isn't that all?

That said. Intel documents the spatial prefetcher exclusively for Sandy 
Bridge. So if you still believe 128 is necessary, set the destructive hardware 
interference size to 64 for all of x86 except -mtune=sandybridge.
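
For readers unfamiliar with the feature under discussion, here is a minimal
sketch of the typical use: keeping two independently updated counters on
separate cache lines.  The fallback value of 64 is an assumption (the "good
average" mentioned above) for library versions that do not provide the
constant, and the struct and variable names are hypothetical.

```cpp
#include <cstddef>
#include <new>

#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t destructive_size
  = std::hardware_destructive_interference_size;
#else
constexpr std::size_t destructive_size = 64;  // assumed fallback
#endif

// Each member starts on its own destructive-interference boundary, so
// writes to one should not invalidate the line holding the other.
struct counters
{
  alignas(destructive_size) long produced;
  alignas(destructive_size) long consumed;
};

static_assert(alignof(counters) >= destructive_size, "");
static_assert(sizeof(counters) >= 2 * destructive_size, "");
```

Because the constant feeds into alignof and sizeof, any change in its value
changes the layout of such a struct, which is the ABI concern raised in
this thread.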

-- 
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
 std::experimental::simd  https://github.com/VcDevel/std-simd
──


Re: [PATCH] c++: implement C++17 hardware interference size

2021-07-16 Thread Jason Merrill via Gcc-patches
On Fri, Jul 16, 2021, 11:12 AM Matthias Kretz  wrote:

> On Friday, 16 July 2021 04:41:17 CEST Jason Merrill via Gcc-patches wrote:
> > > Currently the patch does not adjust the values based on -march, as in
> JF's
> > > proposal.  I'll need more guidance from the ARM/AArch64 maintainers
> about
> > > how to go about that.  --param l1-cache-line-size is set based on
> -mtune,
> > > but I don't think we want -mtune to change these ABI-affecting values.
> > > Are
> > > there -march values for which a smaller range than 64-256 makes sense?
>
> As a user who cares about ABI but also cares about maximizing performance
> of
> builds for a specific HPC setup I'd expect the hardware interference size
> values to be allowed to break ABIs. The point of these values is to give
> me
> better performance portability (but not necessarily binary portability)
> than
> my usual "pick 64 as a good average".
>
> Wrt, -march / -mtune setting hardware interference size: IMO -mtune=X
> should
> be interpreted as "my binary is supposed to be optimized for X, I accept
> inefficiencies on everything that's not X".
>
> On Friday, 16 July 2021 04:48:52 CEST Noah Goldstein wrote:
> > On intel x86 systems with a private L2 cache the spatial prefetcher
> > can cause destructive interference along 128 byte aligned boundaries.
> >
> https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-3
> > 2-architectures-optimization-manual.pdf#page=60
>
> I don't understand how this feature would lead to false sharing. But maybe
> I
> misunderstand the spatial prefetcher. The first access to one of the two
> cache
> lines pairs would bring both cache lines to LLC (and possibly L2). If a
> core
> with a different L2 reads the other cache line the cache line would be
> duplicated; if it writes to it, it would be exclusive to the other core's
> L2.
> The cache line pairs do not affect each other anymore. Maybe there's a
> minor
> inefficiency on initial transfer from memory, but isn't that all?
>
> That said. Intel documents the spatial prefetcher exclusively for Sandy
> Bridge. So if you still believe 128 is necessary, set the destructive
> hardware
> interference size to 64 for all of x86 except -mtune=sandybridge.
>

Adjusting them based on tuning would certainly simplify a significant use
case, perhaps the only reasonable use.  Cases more concerned with ABI
stability probably shouldn't use them at all. And that would mean not
needing to worry about the impossible task of finding the right values for
an entire architecture.

I'm thinking about warning by default for any use of the variables without
explicitly specifying their values on the command line. Users could disable
the warning if they're happy using whatever the defaults happen to be.

Jason

>


Pushing XFAILed test cases (was: [PATCH, Fortran] Bind(c): CFI_signed_char is not a Fortran character type)

2021-07-16 Thread Thomas Schwinge
[Also including  for guidance.]


Hi!

(I'm not involved in or familiar with Sandra's Fortran TS29113 work, just
commenting generally here.)


On 2021-07-16T09:52:28+0200, Thomas Koenig via Gcc-patches 
 wrote:
> It is my understanding that it is not gcc policy to add xfailed test
> cases for things that do not yet work. Rather, xfail is for tests that
> later turn out not to work, especially on certain architectures.

That's not current practice, as far as I can tell.  I'm certainly
"guilty" of pushing lots of XFAILed test cases (or, most often,
individual XFAILed DejaGnu directives), and I see a good number of other
GCC folks do that, too.  Ideally with but casually also without
corresponding GCC PRs filed.  If without, then of course they should have
suitable commentary inside the test case file.  The time span for addressing
the XFAILs ranges between days and years.

In my opinion, if a test case has been written and analyzed, why
shouldn't you push it, even if (parts of) it don't quite work yet?  (If
someone -- at another time, possibly -- then implements the missing
functionality/fixes the bugs, the XFAILs turn into XPASSes, thus serving
to demonstrate the effect of code changes.)

Otherwise -- and I've run into that just yesterday... -- effort spent on
such test cases simply gets lost "in the noise of the mailing list
archives", until re-discovered, or -- in my case -- re-implemented and
then re-discovered by chance.

We nowadays even have a way to mark up ICEing test cases ('dg-ice'),
which has been used to push test cases that ICE for '{ target *-*-* }'.


Of course, we shall assume a certain level of quality in the XFAILed test
cases: I'm certainly not suggesting we put any random junk into the
testsuite, coarsely XFAILed.  (I have not reviewed Sandra's test cases to
that effect, but knowing her, I'd be surprised if that were the problem
here.)


Not trying to overrule you, just sharing my opinion -- now happy to hear
others.  :-)


Regards
 Thomas
-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 
München; limited liability company; Managing directors: Thomas 
Heurung, Frank Thürauf; Registered office: München; Registration court: 
München, HRB 106955


Re: [PATCH] libstdc++: Skip atomic instructions in _Sp_counted_base::_M_release when both counts are 1

2021-07-16 Thread Maged Michael via Gcc-patches
Thank you, Jonathan, for the detailed comments! I'll update the patch
accordingly.

On Fri, Jul 16, 2021 at 9:55 AM Jonathan Wakely 
wrote:

> On Thu, 17 Dec 2020 at 20:51, Maged Michael wrote:
> >
> > Please find a proposed patch for _Sp_counted_base::_M_release to skip the
> > two atomic instructions that decrement each of the use count and the weak
> > count when both are 1. I proposed the general idea in an earlier thread (
> > https://gcc.gnu.org/pipermail/libstdc++/2020-December/051642.html) and
> got
> > useful feedback on a draft patch and responses to related questions about
> > multi-granular atomicity and alignment. This patch is based on that
> > feedback.
> >
> >
> > I added a check for thread sanitizer to use the current algorithm in that
> > case because TSAN does not support multi-granular atomicity. I'd like to
> > add a check of __has_feature(thread_sanitizer) for building using LLVM. I
> > found examples of __has_feature in libstdc++
>
> There are no uses of __has_feature in libstdc++. We do use
> __has_builtin (which GCC also supports) and Clang's __is_identifier
> (which GCC doesn't support) to work around some weird semantics of
> __has_builtin in older versions of Clang.
>
>
> > but it doesn't seem to be
> > recognized in shared_ptr_base.h. Any guidance on how to check
> > __has_feature(thread_sanitizer) in this patch?
>
> I think we want to do something like this in include/bits/c++config
>
> #if __SANITIZE_THREAD__
> #  define _GLIBCXX_TSAN 1
> #elif defined __has_feature
> # if __has_feature(thread_sanitizer)
> #  define _GLIBCXX_TSAN 1
> # endif
> #endif
>
> Then in bits/shared_ptr_base.h
>
> #if _GLIBCXX_TSAN
> _M_release_orig();
> return;
> #endif
>
>
>
> > GCC generates code for _M_release that is larger and more complex than
> that
> > generated by LLVM. I'd like to file a bug report about that. Jonathan,
>
> Is this the same issue as https://gcc.gnu.org/PR101406 ?
>
Partly yes. Even when using __atomic_add_dispatch I noticed that clang
generated less code than gcc. I see in the response to the issue that the
new glibc is expected to optimize better. So maybe this will eliminate the
issue.


> > would you please create a bugzilla account for me (
> > https://gcc.gnu.org/bugzilla/) using my gmail address. Thank you.
>
> Done (sorry, I didn't notice the request in this mail until coming
> back to it to review the patch properly).
>
Thank you!


>
>
> >
> >
> > Information about the patch:
> >
> > - Benefits of the patch: Save the cost of the last atomic decrements of
> > each of the use count and the weak count in _Sp_counted_base. Atomic
> > instructions are significantly slower than regular loads and stores
> across
> > major architectures.
> >
> > - How current code works: _M_release() atomically decrements the use
> count,
> > checks if it was 1, if so calls _M_dispose(), atomically decrements the
> > weak count, checks if it was 1, and if so calls _M_destroy().
> >
> > - How the proposed patch works: _M_release() loads both use count and
> weak
> > count together atomically (when properly aligned), checks if the value is
> > equal to the value of both counts equal to 1 (e.g., 0x10001), and if
> so
> > calls _M_dispose() and _M_destroy(). Otherwise, it follows the original
> > algorithm.
> >
> > - Why it works: When the current thread executing _M_release() finds each
> > of the counts is equal to 1, then (when _lock_policy is _S_atomic) no
> other
> > threads could possibly hold use or weak references to this control block.
> > That is, no other threads could possibly access the counts or the
> protected
> > object.
> >
> > - The proposed patch is intended to interact correctly with current code
> > (under certain conditions: _Lock_policy is _S_atomic, proper alignment,
> and
> > native lock-free support for atomic operations). That is, multiple
> threads
> > using different versions of the code with and without the patch operating
> > on the same objects should always interact correctly. The intent for the
> > patch is to be ABI compatible with the current implementation.
> >
> > - The proposed patch involves a performance trade-off between saving the
> > costs of two atomic instructions when the counts are both 1 vs adding the
> > cost of loading the combined counts and comparison with two ones (e.g.,
> > 0x10001).
> >
> > - The patch has been in use (built using LLVM) in a large environment for
> > many months. The performance gains outweigh the losses (roughly 10 to 1)
> > across a large variety of workloads.
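
The fast path described in the list above can be modelled roughly as
follows.  This is a simplified sketch, not the patch: it packs both counts
into one 64-bit std::atomic so that a single load reads them together,
whereas the real _Sp_counted_base keeps two separate members and relies on
their alignment.  All names here are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Low 32 bits: use count.  High 32 bits: weak count (assumed packing).
constexpr std::uint64_t weak_unit = std::uint64_t(1) << 32;
constexpr std::uint64_t both_one = weak_unit | 1;

struct control_block_model
{
  std::atomic<std::uint64_t> counts;
  bool disposed = false;
  bool destroyed = false;
  int atomic_rmw_ops = 0;  // read-modify-write operations performed

  explicit control_block_model(std::uint32_t use = 1)
  : counts(weak_unit | use) { }

  void dispose() { disposed = true; }
  void destroy() { destroyed = true; }

  void release()
  {
    // Fast path: both counts are 1, so no other thread can hold a use or
    // weak reference and the two atomic decrements can be skipped.
    if (counts.load(std::memory_order_acquire) == both_one)
      {
        dispose();
        destroy();
        return;
      }

    // Slow path: the original algorithm, one decrement per count.
    ++atomic_rmw_ops;
    std::uint64_t old = counts.fetch_sub(1, std::memory_order_acq_rel);
    if ((old & 0xffffffffu) == 1)
      {
        dispose();
        ++atomic_rmw_ops;
        old = counts.fetch_sub(weak_unit, std::memory_order_acq_rel);
        if ((old >> 32) == 1)
          destroy();
      }
  }
};

inline void
demo_release()
{
  control_block_model sole;  // use count 1, weak count 1
  sole.release();
  assert(sole.disposed && sole.destroyed && sole.atomic_rmw_ops == 0);
}
```

With two owners, the first release takes the slow path and the second finds
both counts equal to 1, so only one read-modify-write is performed in total.
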
> >
> >
> > I'd appreciate feedback on the patch and any suggestions for checking
> > __has_feature(thread_sanitizer).
>
> N.B. gmail completely mangles patches unless you send them as attachments.
>
>
> > diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h
> > b/libstdc++-v3/include/bits/shared_ptr_base.h
> >
> > index 368b2d7379a..a8fc944af5f 100644
> >
> > --- a/libstdc++-v3/include/bits/shared_ptr_base.h
> >
> > +++ b/libstdc++

Re: [PATCH] c++: Allow constexpr references to non-static vars [PR100976]

2021-07-16 Thread Jason Merrill via Gcc-patches

On 7/15/21 5:14 PM, Marek Polacek wrote:

The combination of DR 2481 and DR 2126 should allow us to do

   void f()
   {
 constexpr const int &r = 42;
 static_assert(r == 42);
   }

because [expr.const]/4.7 now says that "a temporary object of
non-volatile const-qualified literal type whose lifetime is extended to
that of a variable that is usable in constant expressions" is usable in
a constant expression.

I think the temporary is supposed to be const-qualified, because Core 2481
says so.  I was happy to find out that we already mark the temporary as
const + constexpr in set_up_extended_ref_temp.

But that wasn't enough to make the test above work: references are
traditionally implemented as pointers, so the temporary object will be
(const int &)&D.1234, and verify_constant -> reduced_constant_expression_p
-> initializer_constant_valid_p_1 doesn't think that's OK -- and rightly
so -- the address of a local variable certainly isn't constant


Hmm.  I'm very sorry, I'm afraid I've steered you wrong repeatedly, and 
this is the problem with my testcase above and in the PR.


Making that temporary usable in constant expressions doesn't make it a 
valid initializer for the constexpr reference, because it is still not a 
"permitted result of a constant expression"; [expr.const]/11 still says 
that such an entity must have static storage duration.


So the above is only valid if the reference has static storage duration.
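
A small sketch of the distinction, under the assumption that the compiler
treats a temporary bound to a constexpr reference of static storage
duration as usable in constant expressions.  The variable and function
names are hypothetical.

```cpp
// Namespace scope: the lifetime-extended temporary has static storage
// duration, so the reference is a permitted result of a constant expression.
constexpr const int &global_ref = 42;
static_assert(global_ref == 42, "");

inline int
read_static_local()
{
  // A static local reference also extends the temporary to static storage.
  static constexpr const int &local_ref = 42;
  static_assert(local_ref == 42, "");
  return local_ref;
}

// By contrast, the automatic-storage form from the testcase,
//   void f() { constexpr const int &r = 42; }
// binds to a temporary with automatic storage duration, which is the case
// rejected by the reasoning above.
```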


Therefore
I'm skipping the verify_constant check in cxx_eval_outermost_constant_expr.
(DECL_INITIAL isn't checked because maybe we are still waiting for
initialize_local_var to set it.)

Then we need to be able to evaluate such a reference.  This I do by
seeing through the reference in cxx_eval_constant_expression.  I can't
rely on decl_constant_value to pull out DECL_INITIAL, because the VAR_DECL
isn't DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P, and I think we don't
need to mess with that if we're keeping this purely in constexpr.

I wonder if we should accept

   void f2()
   {
 constexpr int &&r = 42;
 static_assert(r == 42);
   }

Currently we don't -- CP_TYPE_CONST_NON_VOLATILE_P (type) is false in
set_up_extended_ref_temp.

Does this make sense?  Bootstrapped/regtested on x86_64-pc-linux-gnu.

PR c++/100976
DR 2481

gcc/cp/ChangeLog:

* constexpr.c (cxx_eval_constant_expression): For a constexpr
reference, return its DECL_INITIAL.
(cxx_eval_outermost_constant_expr): Don't verify the initializer
for a constexpr variable of reference type.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-ref2.C: Remove dg-error.
* g++.dg/cpp0x/constexpr-temp2.C: New test.
* g++.dg/cpp23/constexpr-temp1.C: New test.
* g++.dg/cpp23/constexpr-temp2.C: New test.
---
  gcc/cp/constexpr.c   | 29 +--
  gcc/testsuite/g++.dg/cpp0x/constexpr-ref2.C  |  5 ++-
  gcc/testsuite/g++.dg/cpp0x/constexpr-temp2.C | 15 
  gcc/testsuite/g++.dg/cpp23/constexpr-temp1.C | 39 
  gcc/testsuite/g++.dg/cpp23/constexpr-temp2.C | 23 
  5 files changed, 106 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-temp2.C
  create mode 100644 gcc/testsuite/g++.dg/cpp23/constexpr-temp1.C
  create mode 100644 gcc/testsuite/g++.dg/cpp23/constexpr-temp2.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 31fa5b66865..80b4985d055 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -6180,6 +6180,22 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, 
tree t,
  return cxx_eval_constant_expression (ctx, r, lval, non_constant_p,
   overflow_p);
}
+  /* DR 2126 amended [expr.const]/4.7 to say that "a temporary object
+of non-volatile const-qualified literal type whose lifetime is
+extended to that of a variable that is usable in constant
+expressions" is usable in a constant expression.  Along with
+DR 2481 this means that we should accept
+
+  constexpr const int &r = 42;
+  static_assert (r == 42);
+
+Take a shortcut here rather than using decl_constant_value.  The
+temporary was marked constexpr in set_up_extended_ref_temp.  */
+  else if (TYPE_REF_P (TREE_TYPE (t))
+  && DECL_DECLARED_CONSTEXPR_P (t)
+  && DECL_INITIAL (t))
+   return cxx_eval_constant_expression (ctx, DECL_INITIAL (t), lval,
+non_constant_p, overflow_p);
/* fall through */
  case CONST_DECL:
/* We used to not check lval for CONST_DECL, but darwin.c uses
@@ -7289,10 +7305,17 @@ cxx_eval_outermost_constant_expr (tree t, bool 
allow_non_constant,
r = cxx_eval_constant_expression (&ctx, r,
false, &non_constant_p, &overflow_p);
  
-  if (!constexpr_dtor)

-verify_constant (r, allow_non_cons

Re: [PATCH] c++: implement C++17 hardware interference size

2021-07-16 Thread Jonathan Wakely via Gcc-patches
On Fri, 16 Jul 2021 at 16:33, Jason Merrill wrote:
> Adjusting them based on tuning would certainly simplify a significant use
> case, perhaps the only reasonable use.  Cases more concerned with ABI
> stability probably shouldn't use them at all. And that would mean not
> needing to worry about the impossible task of finding the right values for
> an entire architecture.

But it would be quite a significant change in behaviour if -mtune
started affecting ABI, wouldn't it?

> I'm thinking about warning by default for any use of the variables without
> explicitly specifying their values on the command line. Users could disable
> the warning if they're happy using whatever the defaults happen to be.

I like that suggestion.

Maybe the warning could suggest optimal values based on the current
-mtune flag. That way -mtune wouldn't need to alter ABI, but by
combining -mtune with explicit values for the variables you get the
best performance. And -mtune without overriding the default values
preserves ABI.



Re: [PATCH] c++: argument pack expansion inside constraint [PR100138]

2021-07-16 Thread Jason Merrill via Gcc-patches

On 7/15/21 12:56 PM, Patrick Palka wrote:

On Sat, May 8, 2021 at 8:42 AM Jason Merrill  wrote:


On 5/7/21 12:33 PM, Patrick Palka wrote:

This PR is about CTAD but the underlying problems are more general;
CTAD is a good trigger for them because of the necessary substitution
into constraints that deduction guide generation entails.

In the testcase below, when generating the implicit deduction guide for
the constrained constructor template for A, we substitute the generic
flattening map 'tsubst_args' into the constructor's constraints.  During
this substitution, tsubst_pack_expansion returns a rebuilt pack
expansion for sizeof...(xs), but it's neglecting to carry over the
PACK_EXPANSION_LOCAL_P (and PACK_EXPANSION_SIZEOF_P) flag from the
original tree to the rebuilt one.  The flag is otherwise unset on the
original tree[1] but set for the rebuilt tree from make_pack_expansion
only because we're doing the CTAD at function scope (inside main).  This
leads us to crash when substituting into the pack expansion during
satisfaction because we don't have local_specializations set up (it'd be
set up for us if PACK_EXPANSION_LOCAL_P is unset)

Similarly, when substituting into a constraint we need to set
cp_unevaluated since constraints are unevaluated operands.  This avoids
a crash during CTAD for C below.

[1]: Although the original pack expansion is in a function context, I
guess it makes sense that PACK_EXPANSION_LOCAL_P is unset for it because
we can't rely on local specializations (which are formed when
substituting into the function declaration) during satisfaction.

Bootstrapped and regtested on x86_64-pc-linux-gnu, also tested on
cmcstl2 and range-v3, does this look OK for trunk?


OK.


Would it be ok to backport this patch to the 11 branch given its
impact on concepts (or perhaps backport only part of it, say all but
the PACK_EXPANSION_LOCAL_P propagation since that part just avoids
ICEing on the invalid portions of the testcase)?


The whole patch, I think.


gcc/cp/ChangeLog:

   PR c++/100138
   * constraint.cc (tsubst_constraint): Set up cp_unevaluated.
   (satisfy_atom): Set up iloc_sentinel before calling
   cxx_constant_value.
   * pt.c (tsubst_pack_expansion): When returning a rebuilt pack
   expansion, carry over PACK_EXPANSION_LOCAL_P and
   PACK_EXPANSION_SIZEOF_P from the original pack expansion.

gcc/testsuite/ChangeLog:

   PR c++/100138
   * g++.dg/cpp2a/concepts-ctad4.C: New test.
---
   gcc/cp/constraint.cc|  6 -
   gcc/cp/pt.c |  2 ++
   gcc/testsuite/g++.dg/cpp2a/concepts-ctad4.C | 25 +
   3 files changed, 32 insertions(+), 1 deletion(-)
   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-ctad4.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 0709695fd08..30fccc46678 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -2747,6 +2747,7 @@ tsubst_constraint (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
 /* We also don't want to evaluate concept-checks when substituting the
constraint-expressions of a declaration.  */
 processing_constraint_expression_sentinel s;
+  cp_unevaluated u;
 tree expr = tsubst_expr (t, args, complain, in_decl, false);
 return expr;
   }
@@ -3005,7 +3006,10 @@ satisfy_atom (tree t, tree args, sat_info info)

 /* Compute the value of the constraint.  */
 if (info.noisy ())
-result = cxx_constant_value (result);
+{
+  iloc_sentinel ils (EXPR_LOCATION (result));
+  result = cxx_constant_value (result);
+}
 else
   {
 result = maybe_constant_value (result, NULL_TREE,
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 36a8cb5df5d..0d27dd1af65 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -13203,6 +13203,8 @@ tsubst_pack_expansion (tree t, tree args, 
tsubst_flags_t complain,
 else
   result = tsubst (pattern, args, complain, in_decl);
 result = make_pack_expansion (result, complain);
+  PACK_EXPANSION_LOCAL_P (result) = PACK_EXPANSION_LOCAL_P (t);
+  PACK_EXPANSION_SIZEOF_P (result) = PACK_EXPANSION_SIZEOF_P (t);
 if (PACK_EXPANSION_AUTO_P (t))
   {
 /* This is a fake auto... pack expansion created in add_capture with
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-ctad4.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-ctad4.C
new file mode 100644
index 000..95a3a22dd04
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-ctad4.C
@@ -0,0 +1,25 @@
+// PR c++/100138
+// { dg-do compile { target c++20 } }
+
+template <class T>
+struct A {
+  A(T, auto... xs) requires (sizeof...(xs) != 0) { }
+};
+
+constexpr bool f(...) { return true; }
+
+template <class T>
+struct B {
+  B(T, auto... xs) requires (f(xs...)); // { dg-error "constant expression" }
+};
+
+template <class T>
+struct C {
+  C(T, auto x) requires (f(x)); // { dg-error "constant expression" }
+};
+
+int main() {
+  A x{1, 2}; // { dg-bogus "" }
+  B y{1, 2}; /

Re: [PATCH] c++: covariant reference return type [PR99664]

2021-07-16 Thread Jason Merrill via Gcc-patches

On 7/15/21 12:37 PM, Patrick Palka wrote:

This implements the wording changes of DR 960 which clarifies that two
reference types are covariant only if they're both lvalue references
or both rvalue references.
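
As a hedged illustration of the rule (not part of the patch), the sketch below shows an override that stays valid under DR 960 because both return types are lvalue references. The names Base, Derived, self and name are made up for this example.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical hierarchy, used only to illustrate covariant reference returns.
struct Base {
    virtual ~Base() = default;
    virtual Base &self() { return *this; }
    virtual const char *name() const { return "Base"; }
};

struct Derived : Base {
    // OK: Base& -> Derived& (both lvalue references, Derived derives from Base).
    Derived &self() override { return *this; }
    const char *name() const override { return "Derived"; }
    // Declaring the override as `Derived &&self()` instead would be rejected
    // under DR 960, because an rvalue reference is not covariant with Base&.
};

// Small self-check: calling through the base still reaches the derived object.
inline bool covariant_call_reaches_derived()
{
    Derived d;
    Base &b = d;
    return &b.self() == &d && std::strcmp(b.self().name(), "Derived") == 0;
}
```

The invalid direction is what the new covariant23.C test in the patch exercises.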

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?


OK.


DR 960
PR c++/99664

gcc/cp/ChangeLog:

* search.c (check_final_overrider): Compare TYPE_REF_IS_RVALUE
when the return types are references.

gcc/testsuite/ChangeLog:

* g++.dg/inherit/covariant23.C: New test.
---
  gcc/cp/search.c|  8 +++-
  gcc/testsuite/g++.dg/inherit/covariant23.C | 14 ++
  2 files changed, 21 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/inherit/covariant23.C

diff --git a/gcc/cp/search.c b/gcc/cp/search.c
index af41bfe5835..943671acff8 100644
--- a/gcc/cp/search.c
+++ b/gcc/cp/search.c
@@ -1948,7 +1948,13 @@ check_final_overrider (tree overrider, tree basefn)
fail = !INDIRECT_TYPE_P (base_return);
if (!fail)
{
- fail = cp_type_quals (base_return) != cp_type_quals (over_return);
+ if (cp_type_quals (base_return) != cp_type_quals (over_return))
+   fail = 1;
+
+ if (TYPE_REF_P (base_return)
+ && (TYPE_REF_IS_RVALUE (base_return)
+ != TYPE_REF_IS_RVALUE (over_return)))
+   fail = 1;
  
  	  base_return = TREE_TYPE (base_return);

  over_return = TREE_TYPE (over_return);
diff --git a/gcc/testsuite/g++.dg/inherit/covariant23.C 
b/gcc/testsuite/g++.dg/inherit/covariant23.C
new file mode 100644
index 000..b27be15ef45
--- /dev/null
+++ b/gcc/testsuite/g++.dg/inherit/covariant23.C
@@ -0,0 +1,14 @@
+// PR c++/99664
+// { dg-do compile { target c++11 } }
+
+struct Res { };
+
+struct A {
+  virtual Res &&f();
+  virtual Res &g();
+};
+
+struct B : A {
+  Res &f() override; // { dg-error "return type" }
+  Res &&g() override; // { dg-error "return type" }
+};





Re: [PATCH] c++: alias CTAD inside decltype [PR101233]

2021-07-16 Thread Jason Merrill via Gcc-patches

On 7/15/21 12:37 PM, Patrick Palka wrote:

This is the alias CTAD version of the CTAD bug PR93248, and the fix is
the same: clear cp_unevaluated_operand so that the entire chain of
DECL_ARGUMENTS gets substituted.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/11?


OK.


PR c++/101233

gcc/cp/ChangeLog:

* pt.c (alias_ctad_tweaks): Clear cp_unevaluated_operand for
substituting DECL_ARGUMENTS.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/class-deduction-alias10.C: New test.
---
  gcc/cp/pt.c  | 12 +---
  gcc/testsuite/g++.dg/cpp2a/class-deduction-alias10.C | 10 ++
  2 files changed, 19 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-alias10.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index c7bf7d412ca..bc0a0936579 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -29097,9 +29097,15 @@ alias_ctad_tweaks (tree tmpl, tree uguides)
  /* Substitute the deduced arguments plus the rewritten template
 parameters into f to get g.  This covers the type, copyness,
 guideness, and explicit-specifier.  */
- tree g = tsubst_decl (DECL_TEMPLATE_RESULT (f), targs, complain);
- if (g == error_mark_node)
-   continue;
+ tree g;
+   {
+ /* Parms are to have DECL_CHAIN tsubsted, which would be skipped
+if cp_unevaluated_operand.  */
+ cp_evaluated ev;
+ g = tsubst_decl (DECL_TEMPLATE_RESULT (f), targs, complain);
+ if (g == error_mark_node)
+   continue;
+   }
  DECL_USE_TEMPLATE (g) = 0;
  fprime = build_template_decl (g, gtparms, false);
  DECL_TEMPLATE_RESULT (fprime) = g;
diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias10.C 
b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias10.C
new file mode 100644
index 000..a473fff5dc7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias10.C
@@ -0,0 +1,10 @@
+// PR c++/101233
+// { dg-do compile { target c++20 } }
+
+template
+struct A { A(T, U); };
+
+template
+using B = A;
+
+using type = decltype(B{0, 0});





Re: [PATCH] c++: implement C++17 hardware interference size

2021-07-16 Thread Noah Goldstein via Gcc-patches
On Fri, Jul 16, 2021 at 11:12 AM Matthias Kretz  wrote:

> On Friday, 16 July 2021 04:41:17 CEST Jason Merrill via Gcc-patches wrote:
> > > Currently the patch does not adjust the values based on -march, as in
> > > JF's proposal.  I'll need more guidance from the ARM/AArch64
> > > maintainers about how to go about that.  --param l1-cache-line-size is
> > > set based on -mtune, but I don't think we want -mtune to change these
> > > ABI-affecting values.  Are there -march values for which a smaller
> > > range than 64-256 makes sense?
>
> As a user who cares about ABI but also cares about maximizing performance
> of builds for a specific HPC setup I'd expect the hardware interference
> size values to be allowed to break ABIs. The point of these values is to
> give me better performance portability (but not necessarily binary
> portability) than my usual "pick 64 as a good average".
>
> Wrt, -march / -mtune setting hardware interference size: IMO -mtune=X
> should be interpreted as "my binary is supposed to be optimized for X, I
> accept inefficiencies on everything that's not X".
>
> On Friday, 16 July 2021 04:48:52 CEST Noah Goldstein wrote:
> > On intel x86 systems with a private L2 cache the spatial prefetcher
> > can cause destructive interference along 128 byte aligned boundaries.
> >
> > https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf#page=60
>
> I don't understand how this feature would lead to false sharing. But maybe
> I misunderstand the spatial prefetcher. The first access to one of the two
> paired cache lines would bring both cache lines to LLC (and possibly L2).
> If a core with a different L2 reads the other cache line the cache line
> would be duplicated; if it writes to it, it would be exclusive to the
> other core's L2.  The cache line pairs do not affect each other anymore.
> Maybe there's a minor inefficiency on initial transfer from memory, but
> isn't that all?
>

If two cores that do not share an L2 cache each need exclusive access to a
cache line, the L2 spatial prefetcher could cause ping-ponging if those two
cache lines are adjacent and share the same 128-byte alignment.  Say core A
requests line x1 in exclusive state; it also gets line x2 (not sure if x2
would be in shared or exclusive state).  Core B then requests x2 in
exclusive state and also gets x1.  Irrespective of the state in which x1
arrives in core B's private L2 cache, it invalidates the exclusive state of
cache line x1 in core A's private L2 cache.  If this were done in a loop
(say a simple `lock add` loop), it would cause ping-ponging of cache lines
x1/x2 between core A's and core B's private L2 caches.
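
A minimal sketch of how user code might guard against this, assuming the 128-byte pairing argument above holds for the target. PaddedCounter, Counters, kLine and kPad are hypothetical names, and the 64-byte fallback is an assumption for libraries that do not provide the C++17 constant.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Use the C++17 constant when the library provides it; otherwise fall back
// to 64 bytes (an assumed "typical" cache line size, not a measured one).
#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t kLine = std::hardware_destructive_interference_size;
#else
constexpr std::size_t kLine = 64;
#endif

// Assumption taken from the discussion: adjacent lines are paired in
// 128-byte aligned blocks, so pad to at least 128 bytes.
constexpr std::size_t kPad = kLine < 128 ? 128 : kLine;

// Each counter is placed in its own kPad-aligned block.
struct alignas(kPad) PaddedCounter {
    std::atomic<long> value{0};
};

// Two counters intended to be updated by different threads.
struct Counters {
    PaddedCounter a;
    PaddedCounter b;
};

static_assert(alignof(PaddedCounter) >= 128, "counters are over-aligned");
static_assert(sizeof(PaddedCounter) % 128 == 0, "each counter fills its block");

// Distance in bytes between the two counters of one Counters object.
inline std::size_t counter_distance(const Counters &c)
{
    return static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(&c.b)
                                    - reinterpret_cast<std::uintptr_t>(&c.a));
}
```

Whether 128 is the right number for a given CPU is exactly what the thread is debating; the sketch only shows where such a value would be applied.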


>
> That said. Intel documents the spatial prefetcher exclusively for Sandy
> Bridge. So if you still believe 128 is necessary, set the destructive
> hardware interference size to 64 for all of x86 except -mtune=sandybridge.
>

AFAIK the spatial prefetcher exists on newer x86_64 machines as well.


>
> --
> ──
>  Dr. Matthias Kretz   https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
>  std::experimental::simd  https://github.com/VcDevel/std-simd
> ──
>


Re: Pushing XFAILed test cases

2021-07-16 Thread Martin Sebor via Gcc-patches

On 7/16/21 9:32 AM, Thomas Schwinge wrote:

[Also including  for guidance.]


Hi!

(I'm not involved in or familiar with Sandra's Fortran TS29113 work, just
commenting generally here.)


On 2021-07-16T09:52:28+0200, Thomas Koenig via Gcc-patches 
 wrote:

It is my understanding that it is not gcc policy to add xfailed test
cases for things that do not yet work. Rather, xfail is for tests that
later turn out not to work, especially on certain architectures.


That's not current practice, as far as I can tell.  I'm certainly
"guilty" of pushing lots of XFAILed test cases (or, most often,
individual XFAILed DejaGnu directives), and I see a good number of other
GCC folks do that, too.  Ideally with, but casually also without,
corresponding GCC PRs filed.  If without, then of course they should have
suitable commentary inside the test case file.  The time span for
addressing the XFAILs ranges between days and years.

In my opinion, if a test case has been written and analyzed, why
shouldn't you push it, even if (parts of) it don't quite work yet?  (If
someone -- at another time, possibly -- then implements the missing
functionality/fixes the bugs, the XFAILs turn into XPASSes, thus serving
to demonstrate the effect of code changes.)

Otherwise -- and I've run into that just yesterday... -- effort spent on
such test cases simply gets lost "in the noise of the mailing list
archives", until re-discovered, or -- in my case -- re-implemented and
then re-discovered by chance.

We nowadays even have a way to mark up ICEing test cases ('dg-ice'),
which has been used to push test cases that ICE for '{ target *-*-* }'.


Of course, we shall assume a certain level of quality in the XFAILed test
cases: I'm certainly not suggesting we put any random junk into the
testsuite, coarsely XFAILed.  (I have not reviewed Sandra's test cases to
that effect, but knowing her, I'd be surprised if that were the problem
here.)


Not trying to overrule you, just sharing my opinion -- now happy to hear
others.  :-)


I've also been xfailing individual directives in new tests, with
or without PRs tracking the corresponding limitations (not so much
outright bugs as future enhancements).  The practice has been
discussed in the past and (IIRC) there was general agreement with
it.  Marek even formalized some of it for the C++ front end by
adding support for one or more dg- directives (I think dg-ice was
one of them). The discussion I recall is here:

https://gcc.gnu.org/pipermail/gcc-patches/2020-July/550913.html

Martin


[committed] fix a few target-dependent test failures (PR 101468)

2021-07-16 Thread Martin Sebor via Gcc-patches

A number of newly added tests were reported failing on a few
targets in PR testsuite/101468.  I have committed r12-2372 with
fixes for those tests:

https://gcc.gnu.org/g:94ba897be8b59ef5926eed4c77fd53812fb20add

Martin


Re: [PATCH libatomic/arm] avoid warning on constant addresses (PR 101379)

2021-07-16 Thread Thomas Schwinge
Hi Martin!

On 2021-07-09T17:11:25-0600, Martin Sebor via Gcc-patches 
 wrote:
> The attached tweak avoids the new -Warray-bounds instances when
> building libatomic for arm. Christophe confirms it resolves
> the problem (thank you!)

As Abid has just reported in
, similar
problem with GCN target libgomp build:

In function ‘gcn_thrs’,
inlined from ‘gomp_thread’ at [...]/source-gcc/libgomp/libgomp.h:803:10,
inlined from ‘GOMP_barrier’ at [...]/source-gcc/libgomp/barrier.c:34:29:
[...]/source-gcc/libgomp/libgomp.h:792:10: error: array subscript 0 is 
outside array bounds of ‘__lds struct gomp_thread * __lds[0]’ 
[-Werror=array-bounds]
  792 |   return *thrs;
  |  ^

gcc/config/gcn/gcn.h:  c_register_addr_space ("__lds", ADDR_SPACE_LDS); 
  \

libgomp/libgomp.h-static inline struct gomp_thread *gcn_thrs (void)
libgomp/libgomp.h-{
libgomp/libgomp.h-  /* The value is at the bottom of LDS.  */
libgomp/libgomp.h:  struct gomp_thread * __lds *thrs = (struct gomp_thread 
* __lds *)4;
libgomp/libgomp.h-  return *thrs;
libgomp/libgomp.h-}

..., plus a few more.  Work-around:

   struct gomp_thread * __lds *thrs = (struct gomp_thread * __lds *)4;
+# pragma GCC diagnostic push
+# pragma GCC diagnostic ignored "-Warray-bounds"
   return *thrs;
+# pragma GCC diagnostic pop

..., but it's a bit tedious to add that in all the other places,
too.  (So I'll consider some GCN-specific '-Wno-array-bounds' if we don't
get to resolve this otherwise, soon.)
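
For reference, the push/ignored/pop pattern from the work-around can be sketched in isolation as below. This is not the libgomp code: read_word_at is a hypothetical helper, and it takes the address as a parameter so that it can be exercised with the address of an ordinary object rather than a hardwired one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: read one unsigned word at a numeric address, with
// -Warray-bounds silenced only around the dereference itself.
inline unsigned read_word_at(std::uintptr_t address)
{
    volatile unsigned *p = reinterpret_cast<volatile unsigned *>(address);
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Warray-bounds"
    unsigned value = *p;  // the access the warning would be attached to
#pragma GCC diagnostic pop
    return value;
}

// Self-check using the address of a real object instead of a fixed address.
inline bool read_word_at_roundtrip()
{
    unsigned word = 0x1234u;
    return read_word_at(reinterpret_cast<std::uintptr_t>(&word)) == 0x1234u;
}
```

Wrapping the access in one helper keeps the pragmas in a single place, which is one possible answer to the "tedious" part; it does not settle whether the warning should fire for constant addresses at all.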

> As we have discussed, the main goal of this class of warnings
> is to detect accesses at addresses derived from null pointers
> (e.g., to struct members or array elements at a nonzero offset).

(ACK, and thanks for that work!)

> Diagnosing accesses at hardcoded addresses is incidental because
> at the stage they are detected the two are not distinguishable
> from one another.
>
> I'm planning (hoping) to implement detection of invalid pointer
> arithmetic involving null for GCC 12, so this patch is a stopgap
> solution to unblock the arm libatomic build without compromising
> the warning.  Once the new detection is in place these workarounds
> can be removed or replaced with something more appropriate (e.g.,
> declaring the objects at the hardwired addresses with an attribute
> like AVR's address or io; that would enable bounds checking at
> those addresses as well).

Of course, we may simply re-work the libgomp/GCN code -- but don't we
first need to answer the question whether the current code is actually
"bad"?  Aren't we going to get a lot of similar reports from
kernel/embedded/other low-level software developers, once this is out in
the wild?  I mean:

> PR bootstrap/101379 - libatomic arm build failure after r12-2132 due to 
> -Warray-bounds on a constant address
>
> libatomic/ChangeLog:
>   * /config/linux/arm/host-config.h (__kernel_helper_version): New
>   function.  Adjust shadow macro.
>
> diff --git a/libatomic/config/linux/arm/host-config.h 
> b/libatomic/config/linux/arm/host-config.h
> index 1520f237d73..777d08a2b85 100644
> --- a/libatomic/config/linux/arm/host-config.h
> +++ b/libatomic/config/linux/arm/host-config.h
> @@ -39,8 +39,14 @@ typedef void (__kernel_dmb_t) (void);
>  #define __kernel_dmb (*(__kernel_dmb_t *) 0x0fa0)
>
>  /* Kernel helper page version number.  */
> -#define __kernel_helper_version (*(unsigned int *)0x0ffc)

Are such (not un-common) '#define's actually "bad", and anyhow ought to
be replaced by something like the following?

> +static inline unsigned*
> +__kernel_helper_version ()
> +{
> +  unsigned *volatile addr = (unsigned int *)0x0ffc;
> +  return addr;
> +}
>
> +#define __kernel_helper_version (*__kernel_helper_version())

(No 'volatile' in the original code, by the way.)


Grüße
 Thomas


Re: [PATCH] c++: Allow constexpr references to non-static vars [PR100976]

2021-07-16 Thread Marek Polacek via Gcc-patches
On Fri, Jul 16, 2021 at 12:53:05PM -0400, Jason Merrill wrote:
> On 7/15/21 5:14 PM, Marek Polacek wrote:
> > The combination of DR 2481 and DR 2126 should allow us to do
> > 
> >void f()
> >{
> >  constexpr const int &r = 42;
> >  static_assert(r == 42);
> >}
> > 
> > because [expr.const]/4.7 now says that "a temporary object of
> > non-volatile const-qualified literal type whose lifetime is extended to
> > that of a variable that is usable in constant expressions" is usable in
> > a constant expression.
> > 
> > I think the temporary is supposed to be const-qualified, because Core 2481
> > says so.  I was happy to find out that we already mark the temporary as
> > const + constexpr in set_up_extended_ref_temp.
> > 
> > But that wasn't enough to make the test above work: references are
> > traditionally implemented as pointers, so the temporary object will be
> > (const int &)&D.1234, and verify_constant -> reduced_constant_expression_p
> > -> initializer_constant_valid_p_1 doesn't think that's OK -- and rightly
> > so -- the address of a local variable certainly isn't constant
> 
> Hmm.  I'm very sorry, I'm afraid I've steered you wrong repeatedly, and this
> is the problem with my testcase above and in the PR.

No worries, in fact, I'm glad we don't have to introduce gross hacks into the
front end!

> Making that temporary usable in constant expressions doesn't make it a valid
> initializer for the constexpr reference, because it is still not a
> "permitted result of a constant expression"; [expr.const]/11 still says that
> such an entity must have static storage duration.

Indeed: "...if it is an object with static storage duration..."
And I never scrolled down to see that when looking at [expr.const] when looking
into this...

> So the above is only valid if the reference has static storage duration.

Is there anything we need to do to resolve DR 2481 and DR 2126, then?  Besides
adding this test:

  typedef const int CI[3];
  constexpr CI &ci = CI{11, 22, 33};
  static_assert(ci[1] == 22, "");
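
To spell out the static-storage-duration case, here is a sketch that places both references at namespace scope, where the lifetime-extended temporaries have static storage duration. The function sum_of_refs is a hypothetical name added only so the values can be checked at run time, and the sketch assumes a compiler that implements DR 2126.

```cpp
#include <cassert>

// Namespace scope: the temporary bound to r has static storage duration,
// so it is a permitted result of a constant expression.
constexpr const int &r = 42;
static_assert(r == 42, "r is usable in constant expressions");

// The array variant from the message above.
typedef const int CI[3];
constexpr CI &ci = CI{11, 22, 33};
static_assert(ci[1] == 22, "");

// Hypothetical helper so the values can also be observed at run time.
inline int sum_of_refs()
{
    return r + ci[0];  // 42 + 11
}
```

The same declarations inside a function body without `static` are the case that [expr.const]/11 still rejects.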

Marek



Need Help: Initialize paddings for -ftrivial-auto-var-init

2021-07-16 Thread Qing Zhao via Gcc-patches
Hi, 

After some more study of __builtin_clear_padding and the corresponding test
cases, and after considering both Richard Biener's and Richard Sandiford's
previous suggestions to use __builtin_clear_padding, I have the following
thoughts on padding initialization:

** We can insert a call to __builtin_clear_padding (&decl, 0) for all the
variables that need to be auto-initialized during the gimplification phase.
This includes two places:

A. If the auto-variable does not have an explicit initializer and we need
to add a call to .DEFERRED_INIT, we always add a call to
__builtin_clear_padding following this .DEFERRED_INIT call.

structure_type temp;

temp = .DEFERRED_INIT ();
__builtin_clear_padding (&temp, 0);


   NOTE:
  ** If temp has a type without padding, then __builtin_clear_padding
will be lowered to a gimple_nop automatically.
  ** Regardless of zero or pattern init, the padding will always be
initialized to zeroes, which is compatible with Clang.


B. If the auto-variable does have an explicit initializer, then we will add
the call to __builtin_clear_padding at the beginning of
"gimplify_init_constructor".


  structure_type temp = {…..};


 __builtin_clear_padding (&temp, 0);
 Expand_the_constructor;

NOTE:
** If temp has a type without padding, the call to
__builtin_clear_padding will be lowered to a gimple_nop;
** Padding will always be initialized to zeroes.


** The major benefits of this approach are:

  1. Padding initialization will be compatible with Clang;
  2. The implementation will be much simpler and more consistent;
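
A user-level sketch of the intended effect, assuming the one-argument source-level form of __builtin_clear_padding (the second argument above is internal to the gimplifier). The struct S, clear_padding and padding_is_zero_after_clear are hypothetical names, and the manual fallback is only there for compilers without the builtin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Detect the builtin where the compiler can tell us about it.
#if defined(__has_builtin)
#  if __has_builtin(__builtin_clear_padding)
#    define HAVE_CLEAR_PADDING 1
#  endif
#endif

// Hypothetical type with padding between c and i on common ABIs.
struct S {
    char c;
    int i;
};

// Zero the padding bytes of *p, leaving the members untouched.
inline void clear_padding(S *p)
{
#ifdef HAVE_CLEAR_PADDING
    __builtin_clear_padding(p);
#else
    // Fallback for this particular layout: the only padding is between
    // the end of c and the start of i.
    unsigned char *bytes = reinterpret_cast<unsigned char *>(p);
    std::memset(bytes + sizeof(char), 0, offsetof(S, i) - sizeof(char));
#endif
}

// Fill the object with 0xFF, assign the members, clear the padding, then
// check that every byte between c and i reads back as zero.
inline bool padding_is_zero_after_clear()
{
    S s;
    std::memset(&s, 0xFF, sizeof s);
    s.c = 'x';
    s.i = 1;
    clear_padding(&s);

    unsigned char bytes[sizeof(S)];
    std::memcpy(bytes, &s, sizeof s);
    for (std::size_t k = sizeof(char); k < offsetof(S, i); ++k)
        if (bytes[k] != 0)
            return false;
    return s.c == 'x' && s.i == 1;
}
```

This mirrors case A above at the source level: whatever pattern the members receive, the padding ends up zero.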
  
My questions:

1. What do you think of this approach?
2. During implementation, if I want to add the following routine:

/* Generate padding initialization for automatic variable DECL.
   C guarantees that brace-init with fewer initializers than members
   in the aggregate will initialize the rest of the aggregate as if it
   were static initialization.  In turn, static initialization guarantees
   that padding is initialized to zero.  So we always initialize padding
   to zeroes regardless of INIT_TYPE.
   To do the padding initialization, we insert a call to
   __BUILTIN_CLEAR_PADDING (&decl, 0).  */
static void
gimple_add_padding_init_for_auto_var (tree decl, gimple_seq *seq_p)
{
  /* ?? How to build an addr_of_decl tree node ??  */
  tree addr_of_decl = ….

  gimple *call
    = gimple_build_call (builtin_decl_implicit (BUILT_IN_CLEAR_PADDING),
                         2, addr_of_decl,
                         build_zero_cst (TREE_TYPE (decl)));
  gimplify_seq_add_stmt (seq_p, call);
}


I need help on how to build “addr_of_decl” in the above routine.

Thanks a lot for your help.

Qing
 

Re: [PATCH v2 1/6] rs6000: Add support for SSE4.1 "blend" intrinsics

2021-07-16 Thread Bill Schmidt via Gcc-patches

Hi Paul,

Thanks!  LGTM.  Recommend that maintainers approve.

Bill

On 7/16/21 8:50 AM, Paul A. Clarke wrote:

_mm_blend_epi16 and _mm_blendv_epi8 were added earlier.
Add these four to complete the set.

2021-07-16  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_blend_pd, _mm_blendv_pd,
_mm_blend_ps, _mm_blendv_ps): New.
---
v2:
- Per review from Bill, rewrote _mm_blend_pd and _mm_blendv_pd to use
   vec_perm instead of gather/unpack/select.

  gcc/config/rs6000/smmintrin.h | 60 +++
  1 file changed, 60 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 6a010fdbb96f..69e54702a877 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -116,6 +116,66 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
  }
  
+__inline __m128d

+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_blend_pd (__m128d __A, __m128d __B, const int __imm8)
+{
+  __v16qu __pcv[] =
+{
+  {  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15 },
+  { 16, 17, 18, 19, 20, 21, 22, 23,  8,  9, 10, 11, 12, 13, 14, 15 },
+  {  0,  1,  2,  3,  4,  5,  6,  7, 24, 25, 26, 27, 28, 29, 30, 31 },
+  { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 }
+};
+  __v16qu __r = vec_perm ((__v16qu) __A, (__v16qu)__B, __pcv[__imm8]);
+  return (__m128d) __r;
+}
+
+__inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
+{
+  const __v2di __zero = {0};
+  const __vector __bool long long __boolmask = vec_cmplt ((__v2di) __mask, 
__zero);
+  return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __boolmask);
+}
+
+__inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_blend_ps (__m128 __A, __m128 __B, const int __imm8)
+{
+  __v16qu __pcv[] =
+{
+  {  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15 },
+  { 16, 17, 18, 19,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15 },
+  {  0,  1,  2,  3, 20, 21, 22, 23,  8,  9, 10, 11, 12, 13, 14, 15 },
+  { 16, 17, 18, 19, 20, 21, 22, 23,  8,  9, 10, 11, 12, 13, 14, 15 },
+  {  0,  1,  2,  3,  4,  5,  6,  7, 24, 25, 26, 27, 12, 13, 14, 15 },
+  { 16, 17, 18, 19,  4,  5,  6,  7, 24, 25, 26, 27, 12, 13, 14, 15 },
+  {  0,  1,  2,  3, 20, 21, 22, 23, 24, 25, 26, 27, 12, 13, 14, 15 },
+  { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 12, 13, 14, 15 },
+  {  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 28, 29, 30, 31 },
+  { 16, 17, 18, 19,  4,  5,  6,  7,  8,  9, 10, 11, 28, 29, 30, 31 },
+  {  0,  1,  2,  3, 20, 21, 22, 23,  8,  9, 10, 11, 28, 29, 30, 31 },
+  { 16, 17, 18, 19, 20, 21, 22, 23,  8,  9, 10, 11, 28, 29, 30, 31 },
+  {  0,  1,  2,  3,  4,  5,  6,  7, 24, 25, 26, 27, 28, 29, 30, 31 },
+  { 16, 17, 18, 19,  4,  5,  6,  7, 24, 25, 26, 27, 28, 29, 30, 31 },
+  {  0,  1,  2,  3, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 },
+  { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 },
+};
+  __v16qu __r = vec_perm ((__v16qu) __A, (__v16qu)__B, __pcv[__imm8]);
+  return (__m128) __r;
+}
+
+__inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask)
+{
+  const __v4si __zero = {0};
+  const __vector __bool int __boolmask = vec_cmplt ((__v4si) __mask, __zero);
+  return (__m128) vec_sel ((__v4su) __A, (__v4su) __B, (__v4su) __boolmask);
+}
+
  __inline int
  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
  _mm_testz_si128 (__m128i __A, __m128i __B)


Re: [PATCH v2 2/6] rs6000: Add tests for SSE4.1 "blend" intrinsics

2021-07-16 Thread Bill Schmidt via Gcc-patches

Hi Paul,

Thanks for the cleanup, LGTM!  Recommend maintainers approve.

Bill

On 7/16/21 8:50 AM, Paul A. Clarke wrote:

Copy the tests for _mm_blend_pd, _mm_blendv_pd, _mm_blend_ps,
_mm_blendv_ps from gcc/testsuite/gcc.target/i386.

2021-07-16  Paul A. Clarke  

gcc/testsuite
* gcc.target/powerpc/sse4_1-blendpd.c: Copy from gcc.target/i386.
* gcc.target/powerpc/sse4_1-blendps-2.c: Likewise.
* gcc.target/powerpc/sse4_1-blendps.c: Likewise.
* gcc.target/powerpc/sse4_1-blendvpd.c: Likewise.
---
v2: Improve formatting per review from Bill.

  .../gcc.target/powerpc/sse4_1-blendpd.c   | 89 ++
  .../gcc.target/powerpc/sse4_1-blendps-2.c | 81 +
  .../gcc.target/powerpc/sse4_1-blendps.c   | 90 +++
  .../gcc.target/powerpc/sse4_1-blendvpd.c  | 65 ++
  4 files changed, 325 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c

diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
new file mode 100644
index ..ca1780471fa2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
@@ -0,0 +1,89 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse4_1_test
+#endif
+
+#include CHECK_H
+
+#include 
+#include 
+
+#define NUM 20
+
+#ifndef MASK
+#define MASK 0x03
+#endif
+
+static void
+init_blendpd (double *src1, double *src2)
+{
+  int i, sign = 1;
+
+  for (i = 0; i < NUM * 2; i++)
+{
+  src1[i] = i * i * sign;
+  src2[i] = (i + 20) * sign;
+  sign = -sign;
+}
+}
+
+static int
+check_blendpd (__m128d *dst, double *src1, double *src2)
+{
+  double tmp[2];
+  int j;
+
+  memcpy (&tmp[0], src1, sizeof (tmp));
+
+  for(j = 0; j < 2; j++)
+if ((MASK & (1 << j)))
+  tmp[j] = src2[j];
+
+  return memcmp (dst, &tmp[0], sizeof (tmp));
+}
+
+static void
+TEST (void)
+{
+  __m128d x, y;
+  union
+{
+  __m128d x[NUM];
+  double d[NUM * 2];
+} dst, src1, src2;
+  union
+{
+  __m128d x;
+  double d[2];
+} src3;
+  int i;
+
+  init_blendpd (src1.d, src2.d);
+
+  /* Check blendpd imm8, m128, xmm */
+  for (i = 0; i < NUM; i++)
+{
+  dst.x[i] = _mm_blend_pd (src1.x[i], src2.x[i], MASK);
+  if (check_blendpd (&dst.x[i], &src1.d[i * 2], &src2.d[i * 2]))
+   abort ();
+}
+
+  /* Check blendpd imm8, xmm, xmm */
+  src3.x = _mm_setzero_pd ();
+
+  x = _mm_blend_pd (dst.x[2], src3.x, MASK);
+  y = _mm_blend_pd (src3.x, dst.x[2], MASK);
+
+  if (check_blendpd (&x, &dst.d[4], &src3.d[0]))
+abort ();
+
+  if (check_blendpd (&y, &src3.d[0], &dst.d[4]))
+abort ();
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
new file mode 100644
index ..768b6e64bbae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
@@ -0,0 +1,81 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#include "sse4_1-check.h"
+
+#include 
+#include 
+#include 
+
+#define NUM 20
+
+#undef MASK
+#define MASK 0xe
+
+static void
+init_blendps (float *src1, float *src2)
+{
+  int i, sign = 1;
+
+  for (i = 0; i < NUM * 4; i++)
+{
+  src1[i] = i * i * sign;
+  src2[i] = (i + 20) * sign;
+  sign = -sign;
+}
+}
+
+static int
+check_blendps (__m128 *dst, float *src1, float *src2)
+{
+  float tmp[4];
+  int j;
+
+  memcpy (&tmp[0], src1, sizeof (tmp));
+  for (j = 0; j < 4; j++)
+if ((MASK & (1 << j)))
+  tmp[j] = src2[j];
+
+  return memcmp (dst, &tmp[0], sizeof (tmp));
+}
+
+static void
+sse4_1_test (void)
+{
+  __m128 x, y;
+  union
+{
+  __m128 x[NUM];
+  float f[NUM * 4];
+} dst, src1, src2;
+  union
+{
+  __m128 x;
+  float f[4];
+} src3;
+  int i;
+
+  init_blendps (src1.f, src2.f);
+
+  for (i = 0; i < 4; i++)
+src3.f[i] = (int) rand ();
+
+  /* Check blendps imm8, m128, xmm */
+  for (i = 0; i < NUM; i++)
+{
+  dst.x[i] = _mm_blend_ps (src1.x[i], src2.x[i], MASK);
+  if (check_blendps (&dst.x[i], &src1.f[i * 4], &src2.f[i * 4]))
+   abort ();
+}
+
+   /* Check blendps imm8, xmm, xmm */
+  x = _mm_blend_ps (dst.x[2], src3.x, MASK);
+  y = _mm_blend_ps (src3.x, dst.x[2], MASK);
+
+  if (check_blendps (&x, &dst.f[8], &src3.f[0]))
+abort ();
+
+  if (check_blendps (&y, &src3.f[0], &dst.f[8]))
+abort ();
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c
n

Re: [PATCH v2 3/6] rs6000: Add support for SSE4.1 "ceil" intrinsics

2021-07-16 Thread Bill Schmidt via Gcc-patches

Hi Paul,

Thanks for the cleanup, LGTM!  Recommend maintainers approve.

Bill

On 7/16/21 8:50 AM, Paul A. Clarke wrote:

2021-07-16  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps,
_mm_ceil_sd, _mm_ceil_ss): New.
---
v2: Improve formatting per review from Bill.

  gcc/config/rs6000/smmintrin.h | 32 
  1 file changed, 32 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 69e54702a877..cad770a67631 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -232,6 +232,38 @@ _mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
return any_ones * any_zeros;
  }
  
+__inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_ceil_pd (__m128d __A)
+{
+  return (__m128d) vec_ceil ((__v2df) __A);
+}
+
+__inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_ceil_ps (__m128 __A)
+{
+  return (__m128) vec_ceil ((__v4sf) __A);
+}
+
+__inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_ceil_sd (__m128d __A, __m128d __B)
+{
+  __v2df r = vec_ceil ((__v2df) __B);
+  r[1] = ((__v2df) __A)[1];
+  return (__m128d) r;
+}
+
+__inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_ceil_ss (__m128 __A, __m128 __B)
+{
+  __v4sf r = (__v4sf) __A;
+  r[0] = __builtin_ceil (((__v4sf) __B)[0]);
+  return r;
+}
+
  /* Return horizontal packed word minimum and its index in bits [15:0]
 and bits [18:16] respectively.  */
  __inline __m128i


Re: Pushing XFAILed test cases

2021-07-16 Thread Sandra Loosemore

On 7/16/21 9:32 AM, Thomas Schwinge wrote:


[much snipped]

Of course, we shall assume a certain level of quality in the XFAILed test
cases: I'm certainly not suggesting we put any random junk into the
testsuite, coarsely XFAILed.  (I have not reviewed Sandra's test cases to
that effect, but knowing her, I'd be surprised if that were the problem
here.)


FWIW, Tobias already did an extensive review of an early version of the 
testsuite patches in question and pointed out several cases where 
failures were due to my misunderstanding of the language standard or 
general confusion about what the expected behavior was supposed to be 
when gfortran wasn't implementing it or was tripping over other bugs. 
:-S  I hope I incorporated all his suggestions and rewrote the 
previously-bogus tests to be more useful for the version I posted for 
review on the Fortran list, but shouldn't the normal patch review 
process be adequate to take care of any additional concerns about quality?


My previous understanding of the development process and testsuite 
conventions is that adding tests that FAIL is bad, but XFAILing them 
with reference to a PR is OK, and certainly much better than simply not 
having test coverage of those things at all.  Especially in the case of 
something like the TS29113 testsuite where the explicit goal is to track 
standards compliance and/or the completeness of the existing 
implementation.  :-S  So it seems to me rather surprising to take the 
position that we should not be committing any new test cases that need 
to be XFAILed.  :-S


-Sandra


Re: [PATCH v2 4/6] rs6000: Add tests for SSE4.1 "ceil" intrinsics

2021-07-16 Thread Bill Schmidt via Gcc-patches

Hi Paul,

Thanks for the cleanup, LGTM!  Recommend maintainers approve.

Bill

On 7/16/21 8:50 AM, Paul A. Clarke wrote:

Add the tests for _mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd, _mm_ceil_ss.

Copy a test for _mm_ceil_pd and _mm_ceil_ps from
gcc/testsuite/gcc.target/i386.

Define __VSX_SSE2__ to pick up some union definitions in
m128-check.h.

2021-07-16  Paul A. Clarke  

gcc/testsuite
* gcc.target/powerpc/sse4_1-ceilpd.c: New.
* gcc.target/powerpc/sse4_1-ceilps.c: New.
* gcc.target/powerpc/sse4_1-ceilsd.c: New.
* gcc.target/powerpc/sse4_1-ceilss.c: New.
* gcc.target/powerpc/sse4_1-round-data.h: New.
* gcc.target/powerpc/sse4_1-round.h: New.
* gcc.target/powerpc/sse4_1-round2.h: New.
* gcc.target/powerpc/sse4_1-roundpd-3.c: Copy from gcc.target/i386.
* gcc.target/powerpc/sse4_1-check.h (__VSX_SSE2__): Define.
---
v2: Improve formatting per review from Bill.

  .../gcc.target/powerpc/sse4_1-ceilpd.c|  51 
  .../gcc.target/powerpc/sse4_1-ceilps.c|  41 ++
  .../gcc.target/powerpc/sse4_1-ceilsd.c| 119 ++
  .../gcc.target/powerpc/sse4_1-ceilss.c|  95 ++
  .../gcc.target/powerpc/sse4_1-check.h |   4 +
  .../gcc.target/powerpc/sse4_1-round-data.h|  20 +++
  .../gcc.target/powerpc/sse4_1-round.h |  27 
  .../gcc.target/powerpc/sse4_1-round2.h|  27 
  .../gcc.target/powerpc/sse4_1-roundpd-3.c |  36 ++
  9 files changed, 420 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilsd.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilss.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round-data.h
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round.h
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round2.h
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-3.c

diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c
new file mode 100644
index ..f532fdb9c285
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include 
+
+#define VEC_T __m128d
+#define FP_T double
+
+#define ROUND_INTRIN(x, mode) _mm_ceil_pd (x)
+
+#include "sse4_1-round-data.h"
+
+static struct data data[] = {
+  { .value = { .f = {  0.00,  0.25 } }, .answer = {  0.0,  1.0 } },
+  { .value = { .f = {  0.50,  0.75 } }, .answer = {  1.0,  1.0 } },
+
+  { { .f = {  0x1.cp+50,  0x1.dp+50 } },
+   {  0x1.cp+50,  0x1.0p+51 } },
+  { { .f = {  0x1.ep+50,  0x1.fp+50 } },
+   {  0x1.0p+51,  0x1.0p+51 } },
+  { { .f = {  0x1.0p+51,  0x1.1p+51 } },
+   {  0x1.0p+51,  0x1.2p+51 } },
+  { { .f = {  0x1.2p+51,  0x1.3p+51 } },
+   {  0x1.2p+51,  0x1.4p+51 } },
+
+  { { .f = {  0x1.ep+51,  0x1.fp+51 } },
+   {  0x1.ep+51,  0x1.0p+52 } },
+  { { .f = {  0x1.0p+52,  0x1.1p+52 } },
+   {  0x1.0p+52,  0x1.1p+52 } },
+
+  { { .f = { -0x1.1p+52, -0x1.0p+52 } },
+   { -0x1.1p+52, -0x1.0p+52 } },
+  { { .f = { -0x1.fp+51, -0x1.ep+51 } },
+   { -0x1.ep+51, -0x1.ep+51 } },
+
+  { { .f = { -0x1.3p+51, -0x1.2p+51 } },
+   { -0x1.2p+51, -0x1.2p+51 } },
+  { { .f = { -0x1.1p+51, -0x1.0p+51 } },
+   { -0x1.0p+51, -0x1.0p+51 } },
+  { { .f = { -0x1.fp+50, -0x1.ep+50 } },
+   { -0x1.cp+50, -0x1.cp+50 } },
+  { { .f = { -0x1.dp+50, -0x1.cp+50 } },
+   { -0x1.cp+50, -0x1.cp+50 } },
+
+  { { .f = { -1.00, -0.75 } }, { -1.0,  0.0 } },
+  { { .f = { -0.50, -0.25 } }, {  0.0,  0.0 } }
+};
+
+#include "sse4_1-round.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c
new file mode 100644
index ..1e2a57d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include 
+
+#defin

Re: [PATCH v2 5/6] rs6000: Add support for SSE4.1 "floor" intrinsics

2021-07-16 Thread Bill Schmidt via Gcc-patches

Hi Paul,

LGTM!  Recommend maintainers approve.

Bill

On 7/16/21 8:50 AM, Paul A. Clarke wrote:

2021-07-16  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_floor_pd, _mm_floor_ps,
_mm_floor_sd, _mm_floor_ss): New.
---
v2: Improve formatting per review from Bill.

  gcc/config/rs6000/smmintrin.h | 32 
  1 file changed, 32 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index cad770a67631..5960991e0af7 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -264,6 +264,38 @@ _mm_ceil_ss (__m128 __A, __m128 __B)
return r;
  }
  
+__inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_floor_pd (__m128d __A)
+{
+  return (__m128d) vec_floor ((__v2df) __A);
+}
+
+__inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_floor_ps (__m128 __A)
+{
+  return (__m128) vec_floor ((__v4sf) __A);
+}
+
+__inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_floor_sd (__m128d __A, __m128d __B)
+{
+  __v2df r = vec_floor ((__v2df) __B);
+  r[1] = ((__v2df) __A)[1];
+  return (__m128d) r;
+}
+
+__inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_floor_ss (__m128 __A, __m128 __B)
+{
+  __v4sf r = (__v4sf) __A;
+  r[0] = __builtin_floor (((__v4sf) __B)[0]);
+  return r;
+}
+
  /* Return horizontal packed word minimum and its index in bits [15:0]
 and bits [18:16] respectively.  */
  __inline __m128i


Re: [PATCH v2 6/6] rs6000: Add tests for SSE4.1 "floor" intrinsics

2021-07-16 Thread Bill Schmidt via Gcc-patches

Hi Paul,

LGTM!  Recommend maintainers approve.

Bill

On 7/16/21 8:50 AM, Paul A. Clarke wrote:

Add the tests for _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss.
These are modelled after (and depend upon parts of) the tests for
_mm_ceil intrinsics, recently posted.

Copy a test for _mm_floor_sd from gcc/testsuite/gcc.target/i386.

2021-07-16  Paul A. Clarke  

gcc/testsuite
* gcc.target/powerpc/sse4_1-floorpd.c: New.
* gcc.target/powerpc/sse4_1-floorps.c: New.
* gcc.target/powerpc/sse4_1-floorsd.c: New.
* gcc.target/powerpc/sse4_1-floorss.c: New.
* gcc.target/powerpc/sse4_1-roundpd-2.c: Copy from
gcc/testsuite/gcc.target/i386.
---
v2: Improve formatting per review from Bill.

  .../gcc.target/powerpc/sse4_1-floorpd.c   |  51 
  .../gcc.target/powerpc/sse4_1-floorps.c   |  41 ++
  .../gcc.target/powerpc/sse4_1-floorsd.c   | 119 ++
  .../gcc.target/powerpc/sse4_1-floorss.c   |  95 ++
  .../gcc.target/powerpc/sse4_1-roundpd-2.c |  36 ++
  5 files changed, 342 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorsd.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorss.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-2.c

diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c
new file mode 100644
index ..ad21644f50c4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include 
+
+#define VEC_T __m128d
+#define FP_T double
+
+#define ROUND_INTRIN(x, mode) _mm_floor_pd (x)
+
+#include "sse4_1-round-data.h"
+
+static struct data data[] = {
+  { .value = { .f = {  0.00,  0.25 } }, .answer = {  0.0,  0.0 } },
+  { .value = { .f = {  0.50,  0.75 } }, .answer = {  0.0,  0.0 } },
+
+  { { .f = {  0x1.cp+50,  0x1.dp+50 } },
+   {  0x1.cp+50,  0x1.cp+50 } },
+  { { .f = {  0x1.ep+50,  0x1.0p+51 } },
+   {  0x1.cp+50,  0x1.0p+51 } },
+  { { .f = {  0x1.0p+51,  0x1.1p+51 } },
+   {  0x1.0p+51,  0x1.0p+51 } },
+  { { .f = {  0x1.2p+51,  0x1.3p+51 } },
+   {  0x1.2p+51,  0x1.2p+51 } },
+
+  { { .f = {  0x1.ep+51,  0x1.fp+51 } },
+   {  0x1.ep+51,  0x1.ep+51 } },
+  { { .f = {  0x1.0p+52,  0x1.1p+52 } },
+   {  0x1.0p+52,  0x1.1p+52 } },
+
+  { { .f = { -0x1.1p+52, -0x1.0p+52 } },
+   { -0x1.1p+52, -0x1.0p+52 } },
+  { { .f = { -0x1.fp+51, -0x1.ep+52 } },
+   { -0x1.0p+52, -0x1.ep+52 } },
+
+  { { .f = { -0x1.3p+51, -0x1.2p+51 } },
+   { -0x1.4p+51, -0x1.2p+51 } },
+  { { .f = { -0x1.1p+51, -0x1.0p+51 } },
+   { -0x1.2p+51, -0x1.0p+51 } },
+  { { .f = { -0x1.fp+50, -0x1.ep+50 } },
+   { -0x1.0p+51, -0x1.0p+51 } },
+  { { .f = { -0x1.dp+50, -0x1.cp+50 } },
+   { -0x1.0p+51, -0x1.cp+50 } },
+
+  { { .f = { -1.00, -0.75 } }, { -1.0, -1.0 } },
+  { { .f = { -0.50, -0.25 } }, { -1.0, -1.0 } }
+};
+
+#include "sse4_1-round.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c
new file mode 100644
index ..a53ef9aa9e8b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include 
+
+#define VEC_T __m128
+#define FP_T float
+
+#define ROUND_INTRIN(x, mode) _mm_floor_ps (x)
+
+#include "sse4_1-round-data.h"
+
+static struct data data[] = {
+  { { .f = {  0.00,  0.25,  0.50,  0.75 } }, {  0.0,  0.0,  0.0,  0.0 } },
+
+  { { .f = {  0x1.f8p+21,  0x1.fap+21,
+ 0x1.fcp+21,  0x1.fep+21 } },
+   {  0x1.f8p+21,  0x1.f8p+21,
+ 0x1.f8p+21,  0x1.f8p+21 } },
+
+  { { .f = {  0x1.fap+22,  0x1.fcp+22,
+ 0x1.fep+22,  0x1.fep+23 } },
+   {  0x1.f8p+22,  0x1.fcp+22,
+ 0x1.fcp+22,  0x1.fep+23 } },
+
+  { 

[PATCH] Fix PR 101453: ICE with optimize and large integer constant

2021-07-16 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

The problem is that the buffer is too small to hold "-O" and
the integer.  This fixes the problem by using the correct size
instead.

Changes since v1:
* v2: Use HOST_BITS_PER_LONG and just divide by 3 instead of
3.32.

OK? Bootstrapped and tested on x86_64-linux with no regressions.

gcc/c-family/ChangeLog:

PR c/101453
* c-common.c (parse_optimize_options): Use the correct
size for buffer.
---
 gcc/c-family/c-common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 20ec263..e7a54a5 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -5799,7 +5799,7 @@ parse_optimize_options (tree args, bool attr_p)
 
   if (TREE_CODE (value) == INTEGER_CST)
{
- char buffer[20];
+ char buffer[HOST_BITS_PER_LONG / 3 + 4];
  sprintf (buffer, "-O%ld", (long) TREE_INT_CST_LOW (value));
  vec_safe_push (optimize_args, ggc_strdup (buffer));
}
-- 
1.8.3.1
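
As a rough check of the new bound: a decimal digit carries about 3.32 bits, so dividing the bit width by 3 over-estimates the digit count, and the extra 4 bytes cover "-O", a possible sign and the terminating NUL. The sketch below measures the worst case with snprintf. BITS_PER_LONG is a hypothetical stand-in for GCC's HOST_BITS_PER_LONG, and both helper names are illustrative, not taken from c-common.c.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical stand-in for GCC's HOST_BITS_PER_LONG.  */
#define BITS_PER_LONG ((int) (sizeof (long) * CHAR_BIT))

/* Buffer size used by the patch, computed for this host's long.  */
static int
patched_buffer_size (void)
{
  return BITS_PER_LONG / 3 + 4;
}

/* Bytes needed to format "-O" followed by VALUE, including the
   terminating NUL.  snprintf with a zero size only measures.  */
static int
bytes_needed (long value)
{
  return snprintf (NULL, 0, "-O%ld", value) + 1;
}
```

Assuming a 64-bit long, bytes_needed (LONG_MIN) is 23 and the bound is 25, whereas the old fixed size of 20 is too small for any value with 18 or more digits.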



Re: [PATCH] c++: implement C++17 hardware interference size

2021-07-16 Thread Jason Merrill via Gcc-patches
On Fri, Jul 16, 2021, 12:54 PM Jonathan Wakely  wrote:

> On Fri, 16 Jul 2021 at 16:33, Jason Merrill wrote:
> > Adjusting them based on tuning would certainly simplify a significant use
> > case, perhaps the only reasonable use.  Cases more concerned with ABI
> > stability probably shouldn't use them at all. And that would mean not
> > needing to worry about the impossible task of finding the right values
> for
> > an entire architecture.
>
> But it would be quite a significant change in behaviour if -mtune
> started affecting ABI, wouldn't it?
>

Absolutely, though with the below warning, which could mention this issue,
it would only affect the ABI of code that ignores the warning. Code that
silences it by specifying values would not be affected.

> I'm thinking about warning by default for any use of the variables without
> > explicitly specifying their values on the command line. Users could
> disable
> > the warning if they're happy using whatever the defaults happen to be.
>
> I like that suggestion.
>
> Maybe the warning could suggest optimal values based on the current
> -mtune flag.


Sounds good.

That way -mtune wouldn't need to alter ABI, but by
> combining -mtune with explicit values for the variables you get the
> best performance. And -mtune without overriding the default values
> preserves ABI.
>
>


Re: [PATCH] Fix PR 101453: ICE with optimize and large integer constant

2021-07-16 Thread Richard Biener via Gcc-patches
On July 16, 2021 8:35:25 PM GMT+02:00, apinski--- via Gcc-patches 
 wrote:
>From: Andrew Pinski 
>
>The problem is that the buffer is too small to hold "-O" and
>the integer.  This fixes the problem by using the correct size
>instead.
>
>Changes since v1:
>* v2: Use HOST_BITS_PER_LONG and just divide by 3 instead of
>3.32.
>
>OK? Bootstrapped and tested on x86_64-linux with no regressions.

OK. 

Richard. 

>gcc/c-family/ChangeLog:
>
>   PR c/101453
>   * c-common.c (parse_optimize_options): Use the correct
>   size for buffer.
>---
> gcc/c-family/c-common.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
>index 20ec263..e7a54a5 100644
>--- a/gcc/c-family/c-common.c
>+++ b/gcc/c-family/c-common.c
>@@ -5799,7 +5799,7 @@ parse_optimize_options (tree args, bool attr_p)
> 
>   if (TREE_CODE (value) == INTEGER_CST)
>   {
>-char buffer[20];
>+char buffer[HOST_BITS_PER_LONG / 3 + 4];
> sprintf (buffer, "-O%ld", (long) TREE_INT_CST_LOW (value));
> vec_safe_push (optimize_args, ggc_strdup (buffer));
>   }



Re: [PATCH] c++: implement C++17 hardware interference size

2021-07-16 Thread Matthias Kretz
On Friday, 16 July 2021 18:54:30 CEST Jonathan Wakely wrote:
> On Fri, 16 Jul 2021 at 16:33, Jason Merrill wrote:
> > Adjusting them based on tuning would certainly simplify a significant use
> > case, perhaps the only reasonable use.  Cases more concerned with ABI
> > stability probably shouldn't use them at all. And that would mean not
> > needing to worry about the impossible task of finding the right values for
> > an entire architecture.
> 
> But it would be quite a significant change in behaviour if -mtune
> started affecting ABI, wouldn't it?

For existing code -mtune still doesn't affect ABI. The users who write 

struct keep_apart {
  alignas(std::hardware_destructive_interference_size) std::atomic cat;
  alignas(std::hardware_destructive_interference_size) std::atomic dog;
};

*want* to have different sizeof(keep_apart) depending on the CPU the code is 
compiled for. I.e. they *ask* to get their ABI broken. If they had wanted to 
specify the value themselves on the command line they'd have written:

struct keep_apart {
  alignas(SOME_MACRO) std::atomic cat;
  alignas(SOME_MACRO) std::atomic dog;
};

I would be very disappointed if std::hardware_destructive_interference_size 
and std::hardware_constructive_interference_size turned into glorified macros.
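
To make the layout dependence concrete, here is a minimal C11 sketch of the macro variant. INTERFERENCE_SIZE is a hypothetical name playing the role of SOME_MACRO, the default of 64 is only an assumption for illustration, and plain int members stand in for the atomics.

```c
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical macro in the role of SOME_MACRO; override it with
   -DINTERFERENCE_SIZE=128 to see the layout change.  */
#ifndef INTERFERENCE_SIZE
#define INTERFERENCE_SIZE 64
#endif

/* Each member starts on its own INTERFERENCE_SIZE-aligned boundary.  */
struct keep_apart
{
  alignas (INTERFERENCE_SIZE) int cat;
  alignas (INTERFERENCE_SIZE) int dog;
};

/* Distance in bytes between the two members.  */
static size_t
member_distance (void)
{
  return offsetof (struct keep_apart, dog)
         - offsetof (struct keep_apart, cat);
}
```

Both sizeof (struct keep_apart) and member_distance () scale with the chosen value, which is exactly the ABI dependence under discussion.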

-- 
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
 std::experimental::simd  https://github.com/VcDevel/std-simd
──


Re: [PATCH] c++: implement C++17 hardware interference size

2021-07-16 Thread Matthias Kretz
On Friday, 16 July 2021 19:20:29 CEST Noah Goldstein wrote:
> On Fri, Jul 16, 2021 at 11:12 AM Matthias Kretz  wrote:
> > I don't understand how this feature would lead to false sharing. But maybe
> > I
> > misunderstand the spatial prefetcher. The first access to one of the two
> > cache
> > lines pairs would bring both cache lines to LLC (and possibly L2). If a
> > core
> > with a different L2 reads the other cache line the cache line would be
> > duplicated; if it writes to it, it would be exclusive to the other core's
> > L2.
> > The cache line pairs do not affect each other anymore. Maybe there's a
> > minor
> > inefficiency on initial transfer from memory, but isn't that all?
> 
> If two cores that do not share an L2 cache need exclusive access to
> a cache-line, the L2 spatial prefetcher could cause pingponging if those
> two cache-lines were adjacent and shared the same 128 byte alignment.
> Say core A requests line x1 in exclusive, it also gets line x2 (not sure
> if x2 would be in shared or exclusive), core B then requests x2 in
> exclusive,
> it also gets x1. Irrespective of the state in which x1 comes into core B's private L2
> cache
> it invalidates the exclusive state on cache-line x1 in core A's private L2
> cache. If this was done in a loop (say a simple `lock add` loop) it would
> cause
> pingponging on cache-lines x1/x2 between core A and B's private L2 caches.

Quoting the latest ORM: "The following two hardware prefetchers fetched data 
from memory to the L2 cache and last level cache:
Spatial Prefetcher: This prefetcher strives to complete every cache line 
fetched to the L2 cache with the pair line that completes it to a 128-byte 
aligned chunk."

1. If the requested cache line is already present on some other core, the 
spatial prefetcher should not get used ("fetched data from memory").

2. The section is about data prefetching. It is unclear whether the spatial 
prefetcher applies at all for normal cache line fetches.

3. The ORM uses past tense ("The following two hardware prefetchers fetched 
data"), which indicates to me that Intel isn't doing this for newer 
generations anymore.

4. If I'm wrong on points 1 & 2 consider this: Core 1 requests a read of cache 
line A and the adjacent cache line B thus is also loaded to LLC. Core 2 
requests a read of line B and thus loads line A into LLC. Now both cores have 
both cache lines in LLC. Core 1 writes to line A, which invalidates line A in 
LLC of Core 2 but does not affect line B. Core 2 writes to line B, 
invalidating line B for Core 1. => no false sharing. Where did I get my mental 
cache protocol wrong?

-- 
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
 std::experimental::simd  https://github.com/VcDevel/std-simd
──





[committed] analyzer: add svalue::maybe_get_region

2021-07-16 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as 5932dd35eaa816e8d9b6406c6c433395ff5b6162.

gcc/analyzer/ChangeLog:
* program-state.cc (program_state::detect_leaks): Simplify using
svalue::maybe_get_region.
* region-model-impl-calls.cc (region_model::impl_call_fgets): Likewise.
(region_model::impl_call_fread): Likewise.
(region_model::impl_call_free): Likewise.
(region_model::impl_call_operator_delete): Likewise.
* region-model.cc (selftest::test_stack_frames): Likewise.
(selftest::test_state_merging): Likewise.
* svalue.cc (svalue::maybe_get_region): New.
* svalue.h (svalue::maybe_get_region): New decl.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/program-state.cc   |  9 +++--
 gcc/analyzer/region-model-impl-calls.cc | 16 
 gcc/analyzer/region-model.cc|  5 ++---
 gcc/analyzer/svalue.cc  | 12 
 gcc/analyzer/svalue.h   |  1 +
 5 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/gcc/analyzer/program-state.cc b/gcc/analyzer/program-state.cc
index 23cfcb032c6..cc53aef552f 100644
--- a/gcc/analyzer/program-state.cc
+++ b/gcc/analyzer/program-state.cc
@@ -1285,12 +1285,9 @@ program_state::detect_leaks (const program_state &src_state,
 
   /* Purge dead heap-allocated regions from dynamic extents.  */
   for (const svalue *sval : dead_svals)
-if (const region_svalue *region_sval = sval->dyn_cast_region_svalue ())
-  {
-   const region *reg = region_sval->get_pointee ();
-   if (reg->get_kind () == RK_HEAP_ALLOCATED)
- dest_state.m_region_model->unset_dynamic_extents (reg);
-  }
+if (const region *reg = sval->maybe_get_region ())
+  if (reg->get_kind () == RK_HEAP_ALLOCATED)
+   dest_state.m_region_model->unset_dynamic_extents (reg);
 }
 
 #if CHECKING_P
diff --git a/gcc/analyzer/region-model-impl-calls.cc b/gcc/analyzer/region-model-impl-calls.cc
index 4be6550f07f..efb0fc83433 100644
--- a/gcc/analyzer/region-model-impl-calls.cc
+++ b/gcc/analyzer/region-model-impl-calls.cc
@@ -325,10 +325,8 @@ region_model::impl_call_fgets (const call_details &cd)
   /* Ideally we would bifurcate state here between the
  error vs no error cases.  */
   const svalue *ptr_sval = cd.get_arg_svalue (0);
-  if (const region_svalue *ptr_to_region_sval
-  = ptr_sval->dyn_cast_region_svalue ())
+  if (const region *reg = ptr_sval->maybe_get_region ())
 {
-  const region *reg = ptr_to_region_sval->get_pointee ();
   const region *base_reg = reg->get_base_region ();
   const svalue *new_sval = cd.get_or_create_conjured_svalue (base_reg);
   purge_state_involving (new_sval, cd.get_ctxt ());
@@ -342,10 +340,8 @@ void
 region_model::impl_call_fread (const call_details &cd)
 {
   const svalue *ptr_sval = cd.get_arg_svalue (0);
-  if (const region_svalue *ptr_to_region_sval
-  = ptr_sval->dyn_cast_region_svalue ())
+  if (const region *reg = ptr_sval->maybe_get_region ())
 {
-  const region *reg = ptr_to_region_sval->get_pointee ();
   const region *base_reg = reg->get_base_region ();
   const svalue *new_sval = cd.get_or_create_conjured_svalue (base_reg);
   purge_state_involving (new_sval, cd.get_ctxt ());
@@ -372,12 +368,10 @@ void
 region_model::impl_call_free (const call_details &cd)
 {
   const svalue *ptr_sval = cd.get_arg_svalue (0);
-  if (const region_svalue *ptr_to_region_sval
-  = ptr_sval->dyn_cast_region_svalue ())
+  if (const region *freed_reg = ptr_sval->maybe_get_region ())
 {
   /* If the ptr points to an underlying heap region, delete it,
 poisoning pointers.  */
-  const region *freed_reg = ptr_to_region_sval->get_pointee ();
   unbind_region_and_descendents (freed_reg, POISON_KIND_FREED);
   m_dynamic_extents.remove (freed_reg);
 }
@@ -472,12 +466,10 @@ bool
 region_model::impl_call_operator_delete (const call_details &cd)
 {
   const svalue *ptr_sval = cd.get_arg_svalue (0);
-  if (const region_svalue *ptr_to_region_sval
-  = ptr_sval->dyn_cast_region_svalue ())
+  if (const region *freed_reg = ptr_sval->maybe_get_region ())
 {
   /* If the ptr points to an underlying heap region, delete it,
 poisoning pointers.  */
-  const region *freed_reg = ptr_to_region_sval->get_pointee ();
   unbind_region_and_descendents (freed_reg, POISON_KIND_FREED);
 }
   return false;
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 3fe2cce229b..190c8524f90 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -4541,7 +4541,7 @@ test_stack_frames ()
  renumbering.  */
   const svalue *new_q_sval = model.get_rvalue (q, &ctxt);
   ASSERT_EQ (new_q_sval->get_kind (), SK_REGION);
-  ASSERT_EQ (new_q_sval->dyn_cast_region_svalue ()->get_pointee (),
+  ASSERT_EQ (new_q_sval->maybe_get_region (),
 model.get_lv

[committed] analyzer: add __analyzer_dump_state

2021-07-16 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as 9ea10c480565fa42b1804fb436f7e26ca77b71a3.

gcc/analyzer/ChangeLog:
* engine.cc (exploded_node::on_stmt_pre): Handle
__analyzer_dump_state.
* program-state.cc (extrinsic_state::get_sm_idx_by_name): New.
(program_state::impl_call_analyzer_dump_state): New.
* program-state.h (extrinsic_state::get_sm_idx_by_name): New decl.
(program_state::impl_call_analyzer_dump_state): New decl.
* region-model-impl-calls.cc
(call_details::get_arg_string_literal): New.
* region-model.h (call_details::get_arg_string_literal): New decl.

gcc/ChangeLog:
* doc/analyzer.texi: Add __analyzer_dump_state.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/analyzer-decls.h (__analyzer_dump_state): New.
* gcc.dg/analyzer/dump-state.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/engine.cc|  3 ++
 gcc/analyzer/program-state.cc | 49 +++
 gcc/analyzer/program-state.h  |  6 +++
 gcc/analyzer/region-model-impl-calls.cc   | 18 +++
 gcc/analyzer/region-model.h   |  1 +
 gcc/doc/analyzer.texi |  9 
 .../gcc.dg/analyzer/analyzer-decls.h  |  5 ++
 gcc/testsuite/gcc.dg/analyzer/dump-state.c| 14 ++
 8 files changed, 105 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/dump-state.c

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 7662a7f7bab..f9fc58180b7 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -1270,6 +1270,9 @@ exploded_node::on_stmt_pre (exploded_graph &eg,
  state->dump (eg.get_ext_state (), true);
  return;
}
+  else if (is_special_named_call_p (call, "__analyzer_dump_state", 2))
+   state->impl_call_analyzer_dump_state (call, eg.get_ext_state (),
+ ctxt);
   else if (is_setjmp_call_p (call))
{
  state->m_region_model->on_setjmp (call, this, ctxt);
diff --git a/gcc/analyzer/program-state.cc b/gcc/analyzer/program-state.cc
index cc53aef552f..30812176bd8 100644
--- a/gcc/analyzer/program-state.cc
+++ b/gcc/analyzer/program-state.cc
@@ -131,6 +131,27 @@ extrinsic_state::get_model_manager () const
 return NULL; /* for selftests.  */
 }
 
+/* Try to find a state machine named NAME.
+   If found, return true and write its index to *OUT.
+   Otherwise return false.  */
+
+bool
+extrinsic_state::get_sm_idx_by_name (const char *name, unsigned *out) const
+{
+  unsigned i;
+  state_machine *sm;
+  FOR_EACH_VEC_ELT (m_checkers, i, sm)
+if (0 == strcmp (name, sm->get_name ()))
+  {
+   /* Found NAME.  */
+   *out = i;
+   return true;
+  }
+
+  /* NAME not found.  */
+  return false;
+}
+
 /* struct sm_state_map::entry_t.  */
 
 int
@@ -1290,6 +1311,34 @@ program_state::detect_leaks (const program_state &src_state,
dest_state.m_region_model->unset_dynamic_extents (reg);
 }
 
+/* Handle calls to "__analyzer_dump_state".  */
+
+void
+program_state::impl_call_analyzer_dump_state (const gcall *call,
+ const extrinsic_state &ext_state,
+ region_model_context *ctxt)
+{
+  call_details cd (call, m_region_model, ctxt);
+  const char *sm_name = cd.get_arg_string_literal (0);
+  if (!sm_name)
+{
+  error_at (call->location, "cannot determine state machine");
+  return;
+}
+  unsigned sm_idx;
+  if (!ext_state.get_sm_idx_by_name (sm_name, &sm_idx))
+{
+  error_at (call->location, "unrecognized state machine %qs", sm_name);
+  return;
+}
+  const sm_state_map *smap = m_checker_states[sm_idx];
+
+  const svalue *sval = cd.get_arg_svalue (1);
+
+  state_machine::state_t state = smap->get_state (sval, ext_state);
+  warning_at (call->location, 0, "state: %qs", state->get_name ());
+}
+
 #if CHECKING_P
 
 namespace selftest {
diff --git a/gcc/analyzer/program-state.h b/gcc/analyzer/program-state.h
index f16fe6ba984..8dee930665c 100644
--- a/gcc/analyzer/program-state.h
+++ b/gcc/analyzer/program-state.h
@@ -58,6 +58,8 @@ public:
   engine *get_engine () const { return m_engine; }
   region_model_manager *get_model_manager () const;
 
+  bool get_sm_idx_by_name (const char *name, unsigned *out) const;
+
 private:
   /* The state machines.  */
   auto_delete_vec  &m_checkers;
@@ -256,6 +258,10 @@ public:
const extrinsic_state &ext_state,
region_model_context *ctxt);
 
+  void impl_call_analyzer_dump_state (const gcall *call,
+ const extrinsic_state &ext_state,
+ region_model_context *ctxt);
+
   /* TODO: lose the pointer here (const-correctness issues?).  */
   region_model *m_region_model;
   auto_delete_vec m_checker_s

[committed] analyzer: add region_model::check_region_access

2021-07-16 Thread David Malcolm via Gcc-patches
I've been experimenting with various new diagnostics that
require a common place for the analyzer to check the validity
of reads or writes to memory (e.g. buffer overflow).

As preliminary work, this patch adds new
  region_model::check_region_for_{read|write} functions
which are called anywhere that the analyzer "sees" memory being
read from or written to (via region_model::get_store_value and
region_model::set_value).

This takes over the hardcoded calls to check_for_writable_region
(allowing for other kinds of checks on writes); checking reads is
currently a no-op.
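
The shape described above (one common check routine, selected by access direction, with thin read/write wrappers) might be sketched as follows. This is only an illustration under stated assumptions: `toy_region` and `toy_model` are hypothetical, and the read-only check merely stands in for whatever check_for_writable_region does in the analyzer.

```cpp
#include <string>
#include <vector>

// Direction of a memory access, mirroring the enum added in analyzer.h.
enum access_direction
{
  DIR_READ,
  DIR_WRITE
};

// Hypothetical toy region: just a name and a writability flag.
struct toy_region
{
  std::string name;
  bool writable;
};

// Hypothetical toy model that records diagnostics as strings.
class toy_model
{
public:
  // Common entry point for validity checks on an access to REG.
  void check_region_access (const toy_region &reg, access_direction dir)
  {
    switch (dir)
      {
      case DIR_READ:
        // Checking reads is a no-op in this sketch.
        break;
      case DIR_WRITE:
        if (!reg.writable)
          m_diagnostics.push_back ("write to read-only region " + reg.name);
        break;
      }
  }

  void check_region_for_write (const toy_region &reg)
  { check_region_access (reg, DIR_WRITE); }

  void check_region_for_read (const toy_region &reg)
  { check_region_access (reg, DIR_READ); }

  const std::vector<std::string> &diagnostics () const
  { return m_diagnostics; }

private:
  std::vector<std::string> m_diagnostics;
};
```

New kinds of checks would then only need a new case in the common routine rather than a change at every call site.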

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as 9faf8348621ae6ab583af593d67ac424300a2bad.

gcc/analyzer/ChangeLog:
* analyzer.h (enum access_direction): New.
* engine.cc (exploded_node::on_longjmp): Update for new param of
get_store_value.
* program-state.cc (program_state::prune_for_point): Likewise.
* region-model-impl-calls.cc (region_model::impl_call_memcpy):
Replace call to check_for_writable_region with call to
check_region_for_write.
(region_model::impl_call_memset): Likewise.
(region_model::impl_call_strcpy): Likewise.
* region-model-reachability.cc (reachable_regions::add): Update
for new param of get_store_value.
* region-model.cc (region_model::get_rvalue_1): Likewise, also for
get_rvalue_for_bits.
(region_model::get_store_value): Add ctxt param and use it to call
check_region_for_read.
(region_model::get_rvalue_for_bits): Add ctxt param and use it to
call get_store_value.
(region_model::check_region_access): New.
(region_model::check_region_for_write): New.
(region_model::check_region_for_read): New.
(region_model::set_value): Update comment.  Replace call to
check_for_writable_region with call to check_region_for_write.
* region-model.h (region_model::get_rvalue_for_bits): Add ctxt
param.
(region_model::get_store_value): Add ctxt param.
(region_model::check_region_access): New decl.
(region_model::check_region_for_write): New decl.
(region_model::check_region_for_read): New decl.
* region.cc (region_model::copy_region): Update call to
get_store_value.
* svalue.cc (initial_svalue::implicitly_live_p): Likewise.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/analyzer.h   |  8 +++
 gcc/analyzer/engine.cc|  3 +-
 gcc/analyzer/program-state.cc |  2 +-
 gcc/analyzer/region-model-impl-calls.cc   |  6 +-
 gcc/analyzer/region-model-reachability.cc |  2 +-
 gcc/analyzer/region-model.cc  | 70 +++
 gcc/analyzer/region-model.h   | 13 -
 gcc/analyzer/region.cc|  2 +-
 gcc/analyzer/svalue.cc|  2 +-
 9 files changed, 88 insertions(+), 20 deletions(-)

diff --git a/gcc/analyzer/analyzer.h b/gcc/analyzer/analyzer.h
index d42bee7eb0d..90143d9aba2 100644
--- a/gcc/analyzer/analyzer.h
+++ b/gcc/analyzer/analyzer.h
@@ -208,6 +208,14 @@ public:
   virtual logger *get_logger () const = 0;
 };
 
+/* An enum for describing the direction of an access to memory.  */
+
+enum access_direction
+{
+  DIR_READ,
+  DIR_WRITE
+};
+
 } // namespace ana
 
 extern bool is_special_named_call_p (const gcall *call, const char *funcname,
diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index f9fc58180b7..ee625fbdcdf 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -1468,7 +1468,8 @@ exploded_node::on_longjmp (exploded_graph &eg,
   const region *buf = new_region_model->deref_rvalue (buf_ptr_sval, buf_ptr,
   ctxt);
 
-  const svalue *buf_content_sval = new_region_model->get_store_value (buf);
+  const svalue *buf_content_sval
+= new_region_model->get_store_value (buf, ctxt);
   const setjmp_svalue *setjmp_sval
 = buf_content_sval->dyn_cast_setjmp_svalue ();
   if (!setjmp_sval)
diff --git a/gcc/analyzer/program-state.cc b/gcc/analyzer/program-state.cc
index 30812176bd8..ccfe7b019b0 100644
--- a/gcc/analyzer/program-state.cc
+++ b/gcc/analyzer/program-state.cc
@@ -1082,7 +1082,7 @@ program_state::prune_for_point (exploded_graph &eg,
 temporaries keep the value reachable until the frame is
 popped.  */
  const svalue *sval
-   = new_state.m_region_model->get_store_value (reg);
+   = new_state.m_region_model->get_store_value (reg, NULL);
  if (!new_state.can_purge_p (eg.get_ext_state (), sval)
  && SSA_NAME_VAR (ssa_name))
{
diff --git a/gcc/analyzer/region-model-impl-calls.cc b/gcc/analyzer/region-model-impl-calls.cc
index 545634b9dc7..eff8caa8c0a 100644
--- a/gcc/analyzer/region-model-impl-calls.cc
+++ b/gcc/analyzer/region-model-impl-calls.cc
@@ -431,7 +431,7

Re: [PATCH] c++: implement C++17 hardware interference size

2021-07-16 Thread Jonathan Wakely via Gcc-patches
On Fri, 16 Jul 2021 at 20:26, Matthias Kretz  wrote:
>
> On Friday, 16 July 2021 18:54:30 CEST Jonathan Wakely wrote:
> > On Fri, 16 Jul 2021 at 16:33, Jason Merrill wrote:
> > > Adjusting them based on tuning would certainly simplify a significant use
> > > case, perhaps the only reasonable use.  Cases more concerned with ABI
> > > stability probably shouldn't use them at all. And that would mean not
> > > needing to worry about the impossible task of finding the right values for
> > > an entire architecture.
> >
> > But it would be quite a significant change in behaviour if -mtune
> > started affecting ABI, wouldn't it?
>
> For existing code -mtune still doesn't affect ABI.

True, because existing code isn't using the constants.

>The users who write
>
> struct keep_apart {
>   alignas(std::hardware_destructive_interference_size) std::atomic cat;
>   alignas(std::hardware_destructive_interference_size) std::atomic dog;
> };
>
> *want* to have different sizeof(keep_apart) depending on the CPU the code is
> compiled for. I.e. they *ask* for getting their ABI broken.

Right, but the person who wants that and the person who chooses the
-mtune option might be different people.

A distro might add -mtune=core2 to all package builds by default, not
expecting it to cause ABI changes. Some header in a package in the
distro might start using the constants. Now everybody who includes
that header needs to use the same -mtune option as the distro default.

That change in the behaviour and expected use of an existing option
seems scary to me. Even with a warning about using the constants
(because somebody's just going to use #pragma around their use of the
constants to disable the warning, and now the ABI impact of -mtune is
much less obvious).

It's much less scary in a world where the code is written and used by
the same group of people, but for something like a linux distro it
worries me.
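
To make the ABI concern concrete, here is a small sketch in which a template parameter stands in for the interference-size constant, so that two possible tuning values can be compared in a single translation unit. The values 64 and 128 are assumptions chosen for illustration, and `std::atomic<int>` is an assumed member type.

```cpp
#include <atomic>
#include <cstddef>

// Align stands in for std::hardware_destructive_interference_size so the
// effect of different values can be compared within one translation unit.
template <std::size_t Align>
struct keep_apart
{
  alignas (Align) std::atomic<int> cat;
  alignas (Align) std::atomic<int> dog;
};

// The layout, and hence the ABI of anything embedding keep_apart,
// follows the constant: each member starts its own Align-byte block.
static_assert (sizeof (keep_apart<64>) == 2 * 64, "two 64-byte blocks");
static_assert (sizeof (keep_apart<128>) == 2 * 128, "two 128-byte blocks");
static_assert (alignof (keep_apart<64>) != alignof (keep_apart<128>),
               "alignment differs too");
```

Two translation units that see different values for the constant would therefore disagree about the size and alignment of the same type.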



Re: [PATCH v3 1/2] rs6000: Add support for _mm_minpos_epu16

2021-07-16 Thread Bill Schmidt via Gcc-patches

Hi Paul,

LGTM.  Recommend maintainers approve.

Thanks for the cleanups,
Bill

On 7/15/21 6:29 PM, Paul A. Clarke wrote:

Add a naive implementation of the subject x86 intrinsic to
ease porting.

2021-07-15  Paul A. Clarke  

gcc
 * config/rs6000/smmintrin.h (_mm_minpos_epu16): New.
---
v3: Minor formatting changes per review from Bill.
v2: Minor formatting changes per review from Segher.

  gcc/config/rs6000/smmintrin.h | 27 +++
  1 file changed, 27 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 16fd34d836ff..6a010fdbb96f 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -172,4 +172,31 @@ _mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
return any_ones * any_zeros;
  }
  
+/* Return horizontal packed word minimum and its index in bits [15:0]
+   and bits [18:16] respectively.  */
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_minpos_epu16 (__m128i __A)
+{
+  union __u
+{
+  __m128i __m;
+  __v8hu __uh;
+};
+  union __u __u = { .__m = __A }, __r = { .__m = {0} };
+  unsigned short __ridx = 0;
+  unsigned short __rmin = __u.__uh[__ridx];
+  for (unsigned long __i = 1; __i < 8; __i++)
+{
+  if (__u.__uh[__i] < __rmin)
+   {
+ __rmin = __u.__uh[__i];
+ __ridx = __i;
+   }
+}
+  __r.__uh[0] = __rmin;
+  __r.__uh[1] = __ridx;
+  return __r.__m;
+}
+
  #endif


Re: [PATCH v3 2/2] rs6000: Add test for _mm_minpos_epu16

2021-07-16 Thread Bill Schmidt via Gcc-patches

Hi Paul,

Thanks for the cleanups, LGTM!  Recommend maintainers approve.

Bill

On 7/15/21 6:29 PM, Paul A. Clarke wrote:

Copy the test for _mm_minpos_epu16 from
gcc/testsuite/gcc.target/i386/sse4_1-phminposuw.c, with
a few adjustments:

- Adjust the dejagnu directives for powerpc platform.
- Make the data not be monotonically increasing,
   such that some of the returned values are not
   always the first value (index 0).
- Create a list of input data testing various scenarios
   including more than one minimum value and different
   orders and indices of the minimum value.
- Fix a masking issue where the index was being truncated
   to 2 bits instead of 3 bits, which wasn't found because
   all of the returned indices were 0 with the original
   generated data.
- Support big-endian.
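
As a rough illustration of the points in the list above, the following standalone sketch gives a scalar reference for the "minimum and its position" operation and shows why a 2-bit index mask goes unnoticed when every returned index is 0. The names `minpos_result`, `scalar_minpos`, and the two mask helpers are hypothetical and not part of the patch.

```cpp
#include <cstdint>

// Hypothetical result type: the minimum value and the index of its first
// occurrence.
struct minpos_result
{
  std::uint16_t value;
  unsigned index;
};

// Scalar reference model: scan eight unsigned 16-bit lanes and keep the
// first (lowest-index) minimum.
minpos_result
scalar_minpos (const std::uint16_t (&lanes)[8])
{
  minpos_result r = { lanes[0], 0 };
  for (unsigned i = 1; i < 8; i++)
    if (lanes[i] < r.value)
      {
        r.value = lanes[i];
        r.index = i;
      }
  return r;
}

// An index in 0..7 needs three bits; a two-bit mask aliases 4..7 onto 0..3.
unsigned mask_index_3bits (unsigned idx) { return idx & 0x7; }
unsigned mask_index_2bits (unsigned idx) { return idx & 0x3; }
```

With monotonically increasing input the minimum always sits at index 0, where both masks agree; input with the minimum at index 4 or above is what exposes the difference.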

2021-07-15  Paul A. Clarke  

gcc/testsuite
 * gcc.target/powerpc/sse4_1-phminposuw.c: Copy from
 gcc/testsuite/gcc.target/i386, make more robust.
---
v3: Minor formatting changes per Bill's review.
v2: Rewrote to utilize much more interesting input data after Segher's
 review.

  .../gcc.target/powerpc/sse4_1-phminposuw.c| 68 +++
  1 file changed, 68 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c

diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c
new file mode 100644
index ..88d9b43c431c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c
@@ -0,0 +1,68 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+/* { dg-require-effective-target p8vector_hw } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse4_1_test
+#endif
+
+#include CHECK_H
+
+#include 
+
+#define DIM(a) (sizeof (a) / sizeof ((a)[0]))
+
+static void
+TEST (void)
+{
+  union
+{
+  __m128i x;
+  unsigned short s[8];
+} src[] =
+{
+  { .s = { 0x, 0x, 0x, 0x, 0x, 0x, 0x, 0x } },
+  { .s = { 0x, 0x, 0x, 0x, 0x, 0x, 0x, 0x } },
+  { .s = { 0x, 0x, 0x, 0x, 0x, 0x, 0x, 0x } },
+  { .s = { 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007, 0x0008 } },
+  { .s = { 0x0008, 0x0007, 0x0006, 0x0005, 0x0004, 0x0003, 0x0002, 0x0001 } },
+  { .s = { 0xfff4, 0xfff3, 0xfff2, 0xfff1, 0xfff3, 0xfff1, 0xfff2, 0xfff3 } }
+};
+  unsigned short minVal[DIM (src)];
+  int minInd[DIM (src)];
+  unsigned short minValScalar, minIndScalar;
+  int i, j;
+  union
+{
+  int si;
+  unsigned short s[2];
+} res;
+
+  for (i = 0; i < DIM (src); i++)
+{
+  res.si = _mm_cvtsi128_si32 (_mm_minpos_epu16 (src[i].x));
+  minVal[i] = res.s[0];
+  minInd[i] = res.s[1] & 0b111;
+}
+
+  for (i = 0; i < DIM (src); i++)
+{
+  minValScalar = src[i].s[0];
+  minIndScalar = 0;
+
+  for (j = 1; j < 8; j++)
+   if (minValScalar > src[i].s[j])
+ {
+   minValScalar = src[i].s[j];
+   minIndScalar = j;
+ }
+
+  if (minValScalar != minVal[i] && minIndScalar != minInd[i])
+   abort ();
+}
+}


Re: [PATCH] [DWARF] Fix hierarchy of debug information for offload kernels.

2021-07-16 Thread Hafiz Abid Qadeer
On 15/07/2021 13:09, Richard Biener wrote:
> On Thu, Jul 15, 2021 at 12:35 PM Hafiz Abid Qadeer
>  wrote:
>>
>> On 15/07/2021 11:33, Thomas Schwinge wrote:
>>>
 Note that the "parent" should be abstract but I don't think dwarf has a
 way to express a fully abstract parent of a concrete instance child - or
 at least how GCC expresses this causes consumers to "misinterpret"
 that.  I wonder if adding a DW_AT_declaration to the late DWARF
 emitted "parent" would fix things as well here?
>>>
>>> (I suppose not, Abid?)
>>>
>>
>> Yes, adding DW_AT_declaration does not fix the problem.
> 
> Does emitting
> 
> DW_TAG_compile_unit
>   DW_AT_name("")
> 
>   DW_TAG_subprogram // notional parent function (foo) with no code range
> DW_AT_declaration 1
> a:DW_TAG_subprogram // offload function foo._omp_fn.0
>   DW_AT_declaration 1
> 
>   DW_TAG_subprogram // offload function
>   DW_AT_abstract_origin a
> ...
> 
> do the trick?  The following would do this, flattening function definitions
> for the concrete copies:
> 
> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
> index 82783c4968b..a9c8bc43e88 100644
> --- a/gcc/dwarf2out.c
> +++ b/gcc/dwarf2out.c
> @@ -6076,6 +6076,11 @@ maybe_create_die_with_external_ref (tree decl)
>/* Peel types in the context stack.  */
>while (ctx && TYPE_P (ctx))
>  ctx = TYPE_CONTEXT (ctx);
> +  /* For functions peel the context up to namespace/TU scope.  The abstract
> + copies reveal the true nesting.  */
> +  if (TREE_CODE (decl) == FUNCTION_DECL)
> +while (ctx && TREE_CODE (ctx) == FUNCTION_DECL)
> +  ctx = DECL_CONTEXT (ctx);
>/* Likewise namespaces in case we do not want to emit DIEs for them.  */
>if (debug_info_level <= DINFO_LEVEL_TERSE)
>  while (ctx && TREE_CODE (ctx) == NAMESPACE_DECL)
> @@ -6099,8 +6104,7 @@ maybe_create_die_with_external_ref (tree decl)
> /* Leave function local entities parent determination to when
>we process scope vars.  */
> ;
> -  else
> -   parent = lookup_decl_die (ctx);
> +  parent = lookup_decl_die (ctx);
>  }
>else
>  /* In some cases the FEs fail to set DECL_CONTEXT properly.
> 

Thanks. This solves the problem. Only the first hunk was required. Second hunk
actually causes an ICE when TREE_CODE (ctx) == BLOCK.
OK to commit the attached patch?


-- 
Hafiz Abid Qadeer
Mentor, a Siemens Business
From 8e886f8502784d3aafdaf7e9778ce21b8c8f3b93 Mon Sep 17 00:00:00 2001
From: Hafiz Abid Qadeer 
Date: Fri, 16 Jul 2021 21:00:37 +0100
Subject: [PATCH] [DWARF] Fix hierarchy of debug information for offload kernels.

Currently, if we look at the debug information for offload kernel
regions, it looks something like this:

void foo (void)
{
  {

  }
}

DW_TAG_compile_unit
  DW_AT_name	("")

  DW_TAG_subprogram // notional parent function (foo) with no code range

DW_TAG_subprogram // offload function foo._omp_fn.0

There is an artificial compile unit. It contains a parent subprogram which
has the offload function as its child.  The parent function makes sense in
host code where it actually exists and does have an address range. But in
offload code, it does not exist, nor does the generated DWARF have an
address range for this function.

When debuggers read the DWARF for offload code, they see a function with no
address range and discard it along with its children, which include the
offload function.  This results in a poor debug experience for offload code.

This patch was suggested by Richard and it solves this problem by peeling
the parent function from the concrete copies.
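
The peeling step can be pictured with a simplified scope chain. The sketch below is an assumption-laden illustration only: `scope`, `scope_kind`, and `peel_function_contexts` are hypothetical names, and GCC's real contexts are tree nodes rather than this struct.

```cpp
// Hypothetical, simplified scope kinds; GCC's real contexts are tree nodes.
enum scope_kind
{
  SCOPE_TRANSLATION_UNIT,
  SCOPE_NAMESPACE,
  SCOPE_FUNCTION
};

struct scope
{
  scope_kind kind;
  const scope *parent;  // enclosing scope, or nullptr at the top
};

// Walk up from CTX past any enclosing functions, stopping at the first
// namespace or translation-unit scope (or nullptr if there is none).
const scope *
peel_function_contexts (const scope *ctx)
{
  while (ctx && ctx->kind == SCOPE_FUNCTION)
    ctx = ctx->parent;
  return ctx;
}
```

In this picture, an offload function nested inside foo ends up parented directly at namespace or translation-unit scope, so no address-less parent function is left for a debugger to discard.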

gcc/

	* gcc/dwarf2out.c (maybe_create_die_with_external_ref): Remove function
	from the context chain.
---
 gcc/dwarf2out.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 561f8b23517..e2893bd91ed 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -6121,6 +6121,11 @@ maybe_create_die_with_external_ref (tree decl)
   /* Peel types in the context stack.  */
   while (ctx && TYPE_P (ctx))
 ctx = TYPE_CONTEXT (ctx);
+  /* For functions peel the context up to namespace/TU scope.  The abstract
+ copies reveal the true nesting.  */
+  if (TREE_CODE (decl) == FUNCTION_DECL)
+while (ctx && TREE_CODE (ctx) == FUNCTION_DECL)
+  ctx = DECL_CONTEXT (ctx);
   /* Likewise namespaces in case we do not want to emit DIEs for them.  */
   if (debug_info_level <= DINFO_LEVEL_TERSE)
 while (ctx && TREE_CODE (ctx) == NAMESPACE_DECL)
-- 
2.25.1



Re: [PATCH libatomic/arm] avoid warning on constant addresses (PR 101379)

2021-07-16 Thread Martin Sebor via Gcc-patches

On 7/16/21 11:42 AM, Thomas Schwinge wrote:

Hi Martin!

On 2021-07-09T17:11:25-0600, Martin Sebor via Gcc-patches 
 wrote:

The attached tweak avoids the new -Warray-bounds instances when
building libatomic for arm. Christophe confirms it resolves
the problem (thank you!)


As Abid has just reported in
, similar
problem with GCN target libgomp build:

 In function ‘gcn_thrs’,
 inlined from ‘gomp_thread’ at [...]/source-gcc/libgomp/libgomp.h:803:10,
 inlined from ‘GOMP_barrier’ at [...]/source-gcc/libgomp/barrier.c:34:29:
 [...]/source-gcc/libgomp/libgomp.h:792:10: error: array subscript 0 is outside array bounds of ‘__lds struct gomp_thread * __lds[0]’ [-Werror=array-bounds]
   792 |   return *thrs;
   |  ^

 gcc/config/gcn/gcn.h:  c_register_addr_space ("__lds", ADDR_SPACE_LDS); \

 libgomp/libgomp.h-static inline struct gomp_thread *gcn_thrs (void)
 libgomp/libgomp.h-{
 libgomp/libgomp.h-  /* The value is at the bottom of LDS.  */
 libgomp/libgomp.h:  struct gomp_thread * __lds *thrs = (struct gomp_thread * __lds *)4;
 libgomp/libgomp.h-  return *thrs;
 libgomp/libgomp.h-}

..., plus a few more.  Work-around:

struct gomp_thread * __lds *thrs = (struct gomp_thread * __lds *)4;
 +# pragma GCC diagnostic push
 +# pragma GCC diagnostic ignored "-Warray-bounds"
return *thrs;
 +# pragma GCC diagnostic pop

..., but it's a bit tedious to add that in all the other places,
too.  (So I'll consider some GCN-specific '-Wno-array-bounds' if we don't
get to resolve this otherwise, soon.)


As we have discussed, the main goal of this class of warnings
is to detect accesses at addresses derived from null pointers
(e.g., to struct members or array elements at a nonzero offset).


(ACK, and thanks for that work!)


Diagnosing accesses at hardcoded addresses is incidental because
at the stage they are detected the two are not distinguishable
from one another.

I'm planning (hoping) to implement detection of invalid pointer
arithmetic involving null for GCC 12, so this patch is a stopgap
solution to unblock the arm libatomic build without compromising
the warning.  Once the new detection is in place these workarounds
can be removed or replaced with something more appropriate (e.g.,
declaring the objects at the hardwired addresses with an attribute
like AVR's address or io; that would enable bounds checking at
those addresses as well).


Of course, we may simply re-work the libgomp/GCN code -- but don't we
first need to answer the question whether the current code is actually
"bad"?  Aren't we going to get a lot of similar reports from
kernel/embedded/other low-level software developers, once this is out in
the wild?  I mean:


PR bootstrap/101379 - libatomic arm build failure after r12-2132 due to 
-Warray-bounds on a constant address

libatomic/ChangeLog:
   * /config/linux/arm/host-config.h (__kernel_helper_version): New
   function.  Adjust shadow macro.

diff --git a/libatomic/config/linux/arm/host-config.h b/libatomic/config/linux/arm/host-config.h
index 1520f237d73..777d08a2b85 100644
--- a/libatomic/config/linux/arm/host-config.h
+++ b/libatomic/config/linux/arm/host-config.h
@@ -39,8 +39,14 @@ typedef void (__kernel_dmb_t) (void);
  #define __kernel_dmb (*(__kernel_dmb_t *) 0x0fa0)

  /* Kernel helper page version number.  */
-#define __kernel_helper_version (*(unsigned int *)0x0ffc)


Are such (not un-common) '#define's actually "bad", and anyhow ought to
be replaced by something like the following?


Like all warnings (and especially flow-based ones that depend on
optimization), this one too involves a trade-off between noise and
real bugs.  There clearly is some low-level code that intentionally
accesses memory at hardcoded addresses.  But because null pointers
are pervasive, there's a lot more code that could end up accessing
data at some offset from zero by accident (e.g., by writing to
an array element or a member of a struct).  This affects all code,
but is an especially big concern for privileged code that can access
all memory.  So in my view, the trade-off is worthwhile.

The logic the warning relies on isn't new: it was introduced in GCC
11.  There have been a handful of reports of this issue (some from
the kernel) but far fewer than in other warnings.  The recent change
expose more code to the logic so the numbers of both false and true
positives are bound to go up, in proportion.  Hopefully, before GCC
12 is released, I will have a more robust solution to the null+offset
problem.
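
The helper-function form of the workaround can be sketched in isolation. The sketch below only forms the address and never dereferences it, so it can run on an ordinary host; the names `helper_version_addr` and `helper_version_addr_value` are hypothetical, and the address value is simply the one shown in the quoted diff.

```cpp
#include <cstdint>

// Route the hardcoded address through a volatile local so the optimizer
// cannot treat the returned pointer as a known constant.
// The address is illustrative only and is never dereferenced here.
static inline unsigned *
helper_version_addr ()
{
  unsigned *volatile addr
    = reinterpret_cast<unsigned *> (std::uintptr_t{0x0ffc});
  return addr;
}

// Convert the pointer back to an integer for inspection without touching
// the memory it designates.
static inline std::uintptr_t
helper_version_addr_value ()
{
  return reinterpret_cast<std::uintptr_t> (helper_version_addr ());
}
```

Because the pointer is read back from a volatile object, the compiler has to load it at run time and no longer sees a constant address at the point of use.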




+static inline unsigned*
+__kernel_helper_version ()
+{
+  unsigned *volatile addr = (unsigned int *)0x0ffc;
+  return addr;
+}

+#define __kernel_helper_version (*__kernel_helper_version())


(No 'volatile' in the original code, by the way.)


The volatile is what prevents the warning.  But I think a better
solu

Re: [PATCH] c++: implement C++17 hardware interference size

2021-07-16 Thread Noah Goldstein via Gcc-patches
On Fri, Jul 16, 2021 at 3:37 PM Matthias Kretz  wrote:

> On Friday, 16 July 2021 19:20:29 CEST Noah Goldstein wrote:
> > On Fri, Jul 16, 2021 at 11:12 AM Matthias Kretz  wrote:
> > > I don't understand how this feature would lead to false sharing. But
> > > maybe I misunderstand the spatial prefetcher. The first access to one
> > > of the two cache line pairs would bring both cache lines to LLC (and
> > > possibly L2). If a core with a different L2 reads the other cache line
> > > the cache line would be duplicated; if it writes to it, it would be
> > > exclusive to the other core's L2. The cache line pairs do not affect
> > > each other anymore. Maybe there's a minor inefficiency on initial
> > > transfer from memory, but isn't that all?
> >
> > If two cores that do not share an L2 cache need exclusive access to
> > a cache-line, the L2 spatial prefetcher could cause pingponging if those
> > two cache-lines were adjacent and shared the same 128 byte alignment.
> > Say core A requests line x1 in exclusive, it also gets line x2 (not sure
> > if x2 would be in shared or exclusive), core B then requests x2 in
> > exclusive, it also gets x1. Irrespective of the state in which x1 comes
> > into core B's private L2 cache, it invalidates the exclusive state on
> > cache-line x1 in core A's private L2 cache. If this was done in a loop
> > (say a simple `lock add` loop) it would cause pingponging on cache-lines
> > x1/x2 between core A and B's private L2 caches.
>
> Quoting the latest ORM: "The following two hardware prefetchers fetched
> data from memory to the L2 cache and last level cache:
> Spatial Prefetcher: This prefetcher strives to complete every cache line
> fetched to the L2 cache with the pair line that completes it to a 128-byte
> aligned chunk."
>
> 1. If the requested cache line is already present on some other core, the
> spatial prefetcher should not get used ("fetched data from memory").
>

I think this is correct, and I was wrong that a request from LLC to L2
will invoke the spatial prefetcher. So no issues with 64 bytes. Sorry for
the added confusion!
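
For reference, the "pair line" in the quoted ORM passage is the other 64-byte line within the same 128-byte aligned chunk. A small sketch of that address arithmetic follows; the helper names are hypothetical and the 64-byte line size is an assumption taken from this discussion.

```cpp
#include <cstdint>

constexpr std::uintptr_t line_size = 64;    // assumed cache-line size
constexpr std::uintptr_t chunk_size = 128;  // aligned pair of lines

// Address of the start of the cache line containing ADDR.
constexpr std::uintptr_t line_base (std::uintptr_t addr)
{ return addr & ~(line_size - 1); }

// Start of the "pair line": the other 64-byte line in the same
// 128-byte aligned chunk.
constexpr std::uintptr_t pair_line (std::uintptr_t addr)
{ return line_base (addr) ^ line_size; }

// True if A and B fall into the same 128-byte aligned chunk.
constexpr bool same_chunk (std::uintptr_t a, std::uintptr_t b)
{ return a / chunk_size == b / chunk_size; }
```

Two adjacent lines are only a pair when they share the 128-byte alignment; lines at offsets 64 and 128 are adjacent but belong to different chunks.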

>
> 2. The section is about data prefetching. It is unclear whether the
> spatial prefetcher applies at all for normal cache line fetches.
>
> 3. The ORM uses past tense ("The following two hardware prefetchers
> fetched data"), which indicates to me that Intel isn't doing this for
> newer generations anymore.
>
> 4. If I'm wrong on points 1 & 2 consider this: Core 1 requests a read of
> cache line A and the adjacent cache line B thus is also loaded to LLC.
> Core 2 request a read of line B and thus loads line A into LLC. Now both
> cores have both cache lines in LLC. Core 1 writes to line A, which
> invalidates line A in LLC of Core 2 but does not affect line B. Core 2
> writes to line B, invalidating line A for Core 1. => no false sharing.
> Where did I get my mental cache protocol wrong?


> --
> ──
>  Dr. Matthias Kretz   https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
>  std::experimental::simd  https://github.com/VcDevel/std-simd
> ──
>
>
>
>


[PATCH] c++: Reject ordered comparison of null pointers [PR99701]

2021-07-16 Thread Marek Polacek via Gcc-patches
When implementing DR 1512 in r11-467 I neglected to reject ordered
comparison of two null pointers, like nullptr < nullptr.  This patch
fixes that omission.
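
As an illustration of the distinction, the sketch below uses a small detection trait to ask whether `a < b` is a valid expression for a given operand type. The trait name `is_less_comparable` is hypothetical. The result for `std::nullptr_t` is deliberately not asserted, since a compiler without this fix still accepts the expression.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Detect whether `a < b` is a valid expression for operands of type T.
template <typename T, typename = void>
struct is_less_comparable : std::false_type {};

template <typename T>
struct is_less_comparable<
  T, decltype (void (std::declval<T> () < std::declval<T> ()))>
  : std::true_type {};

// Equality of two null pointer constants stays well-formed.
static_assert (nullptr == nullptr, "equality is still allowed");

// Ordered comparison of ordinary object pointers is unaffected.
static_assert (is_less_comparable<int *>::value, "int* supports <");

// For std::nullptr_t the trait is expected to be false on compilers that
// reject the ordered comparison; it is not asserted here because compilers
// without the fix accept the expression.
constexpr bool nullptr_is_ordered = is_less_comparable<std::nullptr_t>::value;
```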

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

DR 1512
PR c++/99701

gcc/cp/ChangeLog:

* cp-gimplify.c (cp_fold): Remove {LE,LT,GE,GT_EXPR} from
a switch.
* typeck.c (cp_build_binary_op): Reject ordered comparison
of two null pointers.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/nullptr11.C: Remove invalid tests.
* g++.dg/cpp0x/nullptr46.C: Add dg-error.
* g++.dg/expr/ptr-comp4.C: New test.

libstdc++-v3/ChangeLog:

* testsuite/20_util/tuple/comparison_operators/overloaded.cc:
Add dg-error.
---
 gcc/cp/cp-gimplify.c  |  4 
 gcc/cp/typeck.c   | 15 +++--
 gcc/testsuite/g++.dg/cpp0x/nullptr11.C| 16 --
 gcc/testsuite/g++.dg/cpp0x/nullptr46.C|  3 ++-
 gcc/testsuite/g++.dg/expr/ptr-comp4.C | 21 +++
 .../tuple/comparison_operators/overloaded.cc  |  2 ++
 6 files changed, 28 insertions(+), 33 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/expr/ptr-comp4.C

diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index ff0bff771df..0520fa45b91 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -2433,13 +2433,9 @@ cp_fold (tree x)
  switch (code)
{
case EQ_EXPR:
-   case LE_EXPR:
-   case GE_EXPR:
  x = constant_boolean_node (true, TREE_TYPE (x));
  break;
case NE_EXPR:
-   case LT_EXPR:
-   case GT_EXPR:
  x = constant_boolean_node (false, TREE_TYPE (x));
  break;
default:
diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index a483e1f988d..738e69a0440 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -5483,7 +5483,9 @@ cp_build_binary_op (const op_location_t &location,
result_type = composite_pointer_type (location,
  type0, type1, op0, op1,
  CPO_COMPARISON, complain);
-  else if (code0 == POINTER_TYPE && null_ptr_cst_p (orig_op1))
+  else if ((code0 == POINTER_TYPE && null_ptr_cst_p (orig_op1))
+  || (code1 == POINTER_TYPE && null_ptr_cst_p (orig_op0))
+  || (null_ptr_cst_p (orig_op0) && null_ptr_cst_p (orig_op1)))
{
  /* Core Issue 1512 made this ill-formed.  */
  if (complain & tf_error)
@@ -5491,17 +5493,6 @@ cp_build_binary_op (const op_location_t &location,
  "integer zero (%qT and %qT)", type0, type1);
  return error_mark_node;
}
-  else if (code1 == POINTER_TYPE && null_ptr_cst_p (orig_op0))
-   {
- /* Core Issue 1512 made this ill-formed.  */
- if (complain & tf_error)
-   error_at (location, "ordered comparison of pointer with "
- "integer zero (%qT and %qT)", type0, type1);
- return error_mark_node;
-   }
-  else if (null_ptr_cst_p (orig_op0) && null_ptr_cst_p (orig_op1))
-   /* One of the operands must be of nullptr_t type.  */
-result_type = TREE_TYPE (nullptr_node);
   else if (code0 == POINTER_TYPE && code1 == INTEGER_TYPE)
{
  result_type = type0;
diff --git a/gcc/testsuite/g++.dg/cpp0x/nullptr11.C 
b/gcc/testsuite/g++.dg/cpp0x/nullptr11.C
index f81f0c3c1c8..b8bc682f2ea 100644
--- a/gcc/testsuite/g++.dg/cpp0x/nullptr11.C
+++ b/gcc/testsuite/g++.dg/cpp0x/nullptr11.C
@@ -9,31 +9,15 @@ void fun()
 {
   assert_true(nullptr == nullptr);
   assert_false(nullptr != nullptr);
-  assert_false(nullptr < nullptr);
-  assert_false(nullptr > nullptr);
-  assert_true(nullptr <= nullptr);
-  assert_true(nullptr >= nullptr);
 
   decltype(nullptr) mynull = 0;
 
   assert_true(mynull == nullptr);
   assert_false(mynull != nullptr);
-  assert_false(mynull < nullptr);
-  assert_false(mynull > nullptr);
-  assert_true(mynull <= nullptr);
-  assert_true(mynull >= nullptr);
 
   assert_true(nullptr == mynull);
   assert_false(nullptr != mynull);
-  assert_false(nullptr < mynull);
-  assert_false(nullptr > mynull);
-  assert_true(nullptr <= mynull);
-  assert_true(nullptr >= mynull);
 
   assert_true(mynull == mynull);
   assert_false(mynull != mynull);
-  assert_false(mynull < mynull);
-  assert_false(mynull > mynull);
-  assert_true(mynull <= mynull);
-  assert_true(mynull >= mynull);
 }
diff --git a/gcc/testsuite/g++.dg/cpp0x/nullptr46.C 
b/gcc/testsuite/g++.dg/cpp0x/nullptr46.C
index 1514cee3c3b..6c08eaa4d8f 100644
--- a/gcc/testsuite/g++.dg/cpp0x/nullptr46.C
+++ b/gcc/testsuite/g++.dg/cpp0x/nullptr46.C
@@ -7,5 +7,6 @@ decltype(nullptr) foo ();
 bool
 bar ()
 {
-  return foo () > nullptr || foo () < nullptr;
+  return foo () > nullptr // { dg-error "ordered comparison" }
+|| foo () < nullptr; // { dg-error 

PING Re: [PATCH] gcc-changelog: show correct line when complaining about unclosed paren

2021-07-16 Thread David Malcolm via Gcc-patches
Ping re:
  https://gcc.gnu.org/pipermail/gcc-patches/2021-June/574057.html

On Wed, 2021-06-30 at 11:03 -0400, David Malcolm wrote:
Successfully tested via:
  pytest contrib/gcc-changelog/

contrib/ChangeLog:
* gcc-changelog/git_commit.py (ChangeLogEntry.__init__): Convert
ChangeLogEntry.opened_parentheses from an integer to a stack of
line strings.
(ChangeLogEntry.parse_changelog): Likewise.
(ChangeLogEntry.process_parentheses): Likewise.
(GitCommit.check_for_broken_parentheses): Update for above change.
Use line containing most recently opened parenthesis as line for
error.
* gcc-changelog/test_email.py
(TestGccChangelog.test_multiline_bad_parentheses): Verify that the
error uses the line containing the unclosed parenthesis, rather
than the first line.
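
The idea of replacing a counter with a stack of lines can be sketched independently of the Python script. The function below is a hypothetical illustration, not a translation of git_commit.py: it pushes the current line for every '(' and pops on every ')', so whatever remains on top at the end is the line holding the most recently opened, still unclosed parenthesis.

```cpp
#include <string>
#include <vector>

// Hypothetical checker: scans LINES and returns the line that contains the
// most recently opened parenthesis still unclosed at the end, or an empty
// string if every parenthesis is closed.  Stray ')' characters are ignored.
std::string
find_unclosed_paren_line (const std::vector<std::string> &lines)
{
  std::vector<std::string> opened;  // stack of lines, one entry per '('
  for (const std::string &line : lines)
    for (char c : line)
      {
        if (c == '(')
          opened.push_back (line);
        else if (c == ')' && !opened.empty ())
          opened.pop_back ();
      }
  return opened.empty () ? std::string () : opened.back ();
}
```

A plain counter can say that something is unbalanced but not where; keeping the lines themselves is what lets the error point at the offending line.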

Signed-off-by: David Malcolm 
---
 contrib/gcc-changelog/git_commit.py | 14 +++---
 contrib/gcc-changelog/test_email.py |  1 +
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/contrib/gcc-changelog/git_commit.py b/contrib/gcc-changelog/git_commit.py
index d1646bdc0cd..4aac4389a0d 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -217,7 +217,7 @@ class ChangeLogEntry:
 self.lines = []
 self.files = []
 self.file_patterns = []
-    self.opened_parentheses = 0
+    self.opened_parentheses = []  # stack of lines
 
 def parse_file_names(self):
 # Whether the content currently processed is between a star prefix the
@@ -549,7 +549,7 @@ class GitCommit:
 m = star_prefix_regex.match(line)
 if m:
 if (len(m.group('spaces')) != 1 and
-    last_entry.opened_parentheses == 0):
+    last_entry.opened_parentheses == []):
 msg = 'one space should follow asterisk'
 self.errors.append(Error(msg, line))
 else:
@@ -574,13 +574,13 @@ class GitCommit:
 def process_parentheses(self, last_entry, line):
 for c in line:
 if c == '(':
-    last_entry.opened_parentheses += 1
+    last_entry.opened_parentheses.append(line)
 elif c == ')':
-    if last_entry.opened_parentheses == 0:
+    if last_entry.opened_parentheses == []:
 msg = 'bad wrapping of parenthesis'
 self.errors.append(Error(msg, line))
 else:
-    last_entry.opened_parentheses -= 1
+    last_entry.opened_parentheses.pop()
 
 def parse_file_names(self):
 for entry in self.changelog_entries:
@@ -606,9 +606,9 @@ class GitCommit:
 
 def check_for_broken_parentheses(self):
 for entry in self.changelog_entries:
-    if entry.opened_parentheses != 0:
+    if entry.opened_parentheses != []:
 msg = 'bad parentheses wrapping'
-    self.errors.append(Error(msg, entry.lines[0]))
+    self.errors.append(Error(msg, entry.opened_parentheses[-1]))
 
 def get_file_changelog_location(self, changelog_file):
 for file in self.info.modified_files:
diff --git a/contrib/gcc-changelog/test_email.py b/contrib/gcc-changelog/test_email.py
index 319e065ca55..2f8e69fcdc0 100755
--- a/contrib/gcc-changelog/test_email.py
+++ b/contrib/gcc-changelog/test_email.py
@@ -415,6 +415,7 @@ class TestGccChangelog(unittest.TestCase):
 def test_multiline_bad_parentheses(self):
 email = self.from_patch_glob('0002-Wrong-macro-changelog.patch')
 assert email.errors[0].message == 'bad parentheses wrapping'
+    assert email.errors[0].line == '\t* config/i386/i386.md (*fix_trunc_i387_1,'
 
 def test_changelog_removal(self):
 email = self.from_patch_glob('0001-ChangeLog-removal.patch')
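
The idea behind the patch above (tracking unclosed parentheses as a stack of lines rather than as a counter) can be sketched in isolation. This is a simplified, hypothetical stand-in, not the real `git_commit.py` code: the function name `find_unclosed_paren_line` and the sample ChangeLog lines are assumptions made for illustration only.

```python
# Hypothetical, simplified sketch of the stack-of-lines approach; it does
# not reproduce the real ChangeLogEntry / GitCommit classes.

def find_unclosed_paren_line(lines):
    """Return the line holding the most recently opened '(' that is still
    unclosed, or None if every parenthesis is balanced."""
    opened = []  # stack of lines, one entry per currently unclosed '('
    for line in lines:
        for c in line:
            if c == '(':
                opened.append(line)
            elif c == ')':
                # A stray ')' is silently ignored in this sketch; the patch
                # reports that case as a separate error.
                if opened:
                    opened.pop()
    return opened[-1] if opened else None


# Made-up entry: the first line is balanced, the second one is not.
entry = [
    '\t* a.c (foo): Fix.',
    '\t* b.c (bar,',
    '\tbaz: Likewise.',
]
print(repr(find_unclosed_paren_line(entry)))
```

With a plain counter, the only line available for the diagnostic is the first line of the entry. Keeping the lines on a stack makes it possible to point at the line where the offending parenthesis was opened.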




[committed] libstdc++: Improve diagnostics for std::get with invalid tuple index

2021-07-16 Thread Jonathan Wakely via Gcc-patches

The recent fix for std::get<T> uses a deleted overload to give better
diagnostics for out-of-range indices. This does something similar for
std::get<N>.

Tested powerpc64le-linux. Committed to trunk.


This adds a deleted overload of std::get<N>(const tuple<Types...>&).
Invalid calls with an out of range index will match the deleted overload
and give a single, clear error about calling a deleted function, instead
of overload resolution errors for every std::get overload in the
library.

This changes the current output of 15+ errors (plus notes and associated
header context) into just two errors (plus context):

error: static assertion failed: tuple index must be in range
error: use of deleted function 'constexpr std::__enable_if_t<(__i >= sizeof... (_Types))> std::get(const std::tuple<_Types ...>&) [with long unsigned int __i = 1; _Elements = {int}; std::__enable_if_t<(__i >= sizeof... (_Types))> = void]'

This seems like a nice improvement, although PR c++/66968 means that
"_Types" is printed in the signature rather than "_Elements".

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/std/tuple (get): Add deleted overload for bad
index.
* testsuite/20_util/tuple/element_access/get_neg.cc: Adjust
expected errors.




commit 3dbc7b809a62167b36f217ab5f43207be19e5908
Author: Jonathan Wakely 
Date:   Fri Jul 16 20:59:43 2021

libstdc++: Improve diagnostics for std::get with invalid tuple index

This adds a deleted overload of std::get<N>(const tuple<Types...>&).
Invalid calls with an out of range index will match the deleted overload
and give a single, clear error about calling a deleted function, instead
of overload resolution errors for every std::get overload in the
library.

This changes the current output of 15+ errors (plus notes and associated
header context) into just two errors (plus context):

error: static assertion failed: tuple index must be in range
error: use of deleted function 'constexpr std::__enable_if_t<(__i >= sizeof... (_Types))> std::get(const std::tuple<_Types ...>&) [with long unsigned int __i = 1; _Elements = {int}; std::__enable_if_t<(__i >= sizeof... (_Types))> = void]'

This seems like a nice improvement, although PR c++/66968 means that
"_Types" is printed in the signature rather than "_Elements".

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/std/tuple (get): Add deleted overload for bad
index.
* testsuite/20_util/tuple/element_access/get_neg.cc: Adjust
expected errors.

diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
index 6953f8715d7..8ee0d2f1ef5 100644
--- a/libstdc++-v3/include/std/tuple
+++ b/libstdc++-v3/include/std/tuple
@@ -1406,6 +1406,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return std::forward(std::__get_helper<__i>(__t));
 }
 
+  /// @cond undocumented
+  // Deleted overload chosen for invalid indices.
+  template<size_t __i, typename... _Elements>
+constexpr __enable_if_t<(__i >= sizeof...(_Elements))>
+get(const tuple<_Elements...>&) = delete;
+  /// @endcond
+
 #if __cplusplus >= 201402L
 
 #define __cpp_lib_tuples_by_type 201304
diff --git a/libstdc++-v3/testsuite/20_util/tuple/element_access/get_neg.cc b/libstdc++-v3/testsuite/20_util/tuple/element_access/get_neg.cc
index cd850fdc21b..225bb6245a6 100644
--- a/libstdc++-v3/testsuite/20_util/tuple/element_access/get_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/tuple/element_access/get_neg.cc
@@ -25,12 +25,12 @@ test01()
 {
   using test_type = std::tuple<>;
   test_type t;
-  std::get<0>(t);// { dg-error "no match" }
-  std::get<0>(const_cast(t));	// { dg-error "no match" }
-  std::get<0>(static_cast(t));	// { dg-error "no match" }
-  std::get<5>(t);// { dg-error "no match" }
-  std::get<5>(const_cast(t));	// { dg-error "no match" }
-  std::get<5>(static_cast(t));	// { dg-error "no match" }
+  std::get<0>(t);// { dg-error "deleted" }
+  std::get<0>(const_cast(t));	// { dg-error "deleted" }
+  std::get<0>(static_cast(t));	// { dg-error "deleted" }
+  std::get<5>(t);// { dg-error "deleted" }
+  std::get<5>(const_cast(t));	// { dg-error "deleted" }
+  std::get<5>(static_cast(t));	// { dg-error "deleted" }
 }
 
 void
@@ -38,12 +38,12 @@ test02()
 {
   using test_type = std::tuple<int>;
   test_type t;
-  std::get<1>(t);// { dg-error "no match" }
-  std::get<1>(const_cast(t));	// { dg-error "no match" }
-  std::get<1>(static_cast(t));	// { dg-error "no match" }
-  std::get<5>(t);// { dg-error "no match" }
-  std::get<5>(const_cast(t));	// { dg-error "no match" }
-  std::get<5>(static_cast(t));	// { dg-error "no match" }
+  std::get<1>(t);// { dg-error "deleted" }
+  std::get<1>(const_cast(t));	// { dg-error "deleted" }
+  std::get<1>(static_cast(t));	// { dg-error "deleted" }
+  std::get<5>(t);// { dg-error "deleted" }
+  std::get<5>(const_cast(t));	// { dg-error "deleted" }
+  std::get<5>(static_cast(t));	// { d

Re: [PATCH] c++: Reject ordered comparison of null pointers [PR99701]

2021-07-16 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 16, 2021 at 05:36:13PM -0400, Marek Polacek via Gcc-patches wrote:
> When implementing DR 1512 in r11-467 I neglected to reject ordered
> comparison of two null pointers, like nullptr < nullptr.  This patch
> fixes that omission.
> 
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> 
>   DR 1512
>   PR c++/99701
> 
> gcc/cp/ChangeLog:
> 
>   * cp-gimplify.c (cp_fold): Remove {LE,LT,GE,GT_EXPR} from
>   a switch.
>   * typeck.c (cp_build_binary_op): Reject ordered comparison
>   of two null pointers.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp0x/nullptr11.C: Remove invalid tests.
>   * g++.dg/cpp0x/nullptr46.C: Add dg-error.
>   * g++.dg/expr/ptr-comp4.C: New test.
> 
> libstdc++-v3/ChangeLog:
> 
>   * testsuite/20_util/tuple/comparison_operators/overloaded.cc:
>   Add dg-error.

Maybe it would also be useful to have a g++.dg/cpp2a/ testcase with
nullptr <=> nullptr etc. (nullptr <=> 0 and so on, i.e. what you test
in ptr-comp4.C, after #include <compare>).

Jakub


