Re: [PATCH] c++: End lifetime of objects in constexpr after destructor call [PR71093]

2023-12-11 Thread Richard Biener
On Sun, 10 Dec 2023, Jason Merrill wrote:

> On 12/10/23 05:22, Richard Biener wrote:
> >> Am 09.12.2023 um 21:13 schrieb Jason Merrill :
> >>
> >> On 11/2/23 21:18, Nathaniel Shead wrote:
> >>> Bootstrapped and regtested on x86_64-pc-linux-gnu.
> >>> I'm not entirely sure if the change I made to have destructors clobber with
> >>> CLOBBER_EOL instead of CLOBBER_UNDEF is appropriate, but nothing seemed to
> >>> have broken by doing this, and I wasn't able to find anything else that
> >>> really depended on this distinction other than a warning pass.  Otherwise I
> >>> could experiment with a new clobber kind for destructor calls.
> >>
> >> It seems wrong to me: CLOBBER_EOL is documented to mean that the storage is
> >> expiring at that point as well, which a (pseudo-)destructor does not imply;
> >> it's perfectly valid to destroy an object and then create another in the
> >> same storage.
> >>
> >> We probably do want another clobber kind for end of object lifetime. And/or
> >> one for beginning of object lifetime.
> > 
> > There's not much semantically different between UNDEF and end of object but
> > not storage lifetime?  At least for what middle-end optimizations do.
> 
> That's fine for the middle-end, but Nathaniel's patch wants to distinguish
> between the clobbers at beginning and end of object lifetime in order to
> diagnose stores to an out-of-lifetime object in constexpr evaluation.

Ah, I see.  I did want to add CLOBBER_SOL (start-of-life) when working
on PR90348, but I keep failing to finish that stack-slot-sharing work.
It would be for storage lifetime though, not object lifetime, and would
also be added by gimplification.

> One option might be to remove the clobber at the beginning of the constructor;
> are there any useful optimizations enabled by that, or is it just pedantically
> breaking people's code?

It allows DSE of stores to the object that was live before the new one.  Not
all objects require explicit destruction (which would get you a clobber)
before their storage can be re-used.

> > EOL is used by stack slot sharing and that operates on the underlying
> > storage, not individual objects live in it.
> 
> I wonder about changing the name to EOS (end of storage [duration]) to avoid
> similar confusion with object lifetime?

EOS{L,D}?  But sure, better names (and documentation) are appreciated.

Richard.

Re: [PATCHv2 2/2] MATCH: (convert)(zero_one !=/== 0/1) for outer type and zero_one type are the same

2023-12-11 Thread Richard Biener
On Sun, Dec 10, 2023 at 8:57 PM Andrew Pinski  wrote:
>
> When I moved two_value to match.pd, I removed the check for the {0,+-1}
> range, as I had placed it after the {0,+-1} case for cond in match.pd.
> In the {0,+-1}, non-boolean case, before we would optimize to just
> `(convert)a`, but afterwards we would get `(convert)(a != 0)`, which was
> not simplified further to `(convert)a`.
> So this adds a pattern to match `(convert)(zeroone != 0)` and simplify
> it to `(convert)zeroone`.
>
> This also optimizes (convert)(zeroone == 0) into (zeroone ^ 1) if the
> types match, removing the opposite transformation from fold.
> The opposite transformation was added with
> https://gcc.gnu.org/pipermail/gcc-patches/2006-February/190514.html
> It is no longer the canonical form either; even VRP will
> transform it back into `(~a) & 1`, so removing it is a good idea.
>
> Note the testcase pr69270.c needed a slight update because a scan pattern
> no longer matched exactly; the update makes it more robust so that it
> matches both before and after this change, and with other changes in this
> area too.
>
> Note the testcase gcc.target/i386/pr110790-2.c needs a slight update
> for better code generation in LP64 mode.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK

> gcc/ChangeLog:
>
> PR tree-optimization/111972
> PR tree-optimization/110637
> * match.pd (`(convert)(zeroone !=/== CST)`): Match
> and simplify to ((convert)zeroone){,^1}.
> * fold-const.cc (fold_binary_loc): Remove
> transformation of `(~a) & 1` and `(a ^ 1) & 1)`
> into `(convert)(a == 0)`.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/pr110637-1.c: New test.
> * gcc.dg/tree-ssa/pr110637-2.c: New test.
> * gcc.dg/tree-ssa/pr110637-3.c: New test.
> * gcc.dg/tree-ssa/pr111972-1.c: New test.
> * gcc.dg/tree-ssa/pr69270.c: Update testcase.
> * gcc.target/i386/pr110790-2.c: Update testcase.
> * gcc.dg/fold-even-1.c: Removed.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/fold-const.cc  | 27 -
>  gcc/match.pd   | 16 ++
>  gcc/testsuite/gcc.dg/fold-even-1.c | 32 
>  gcc/testsuite/gcc.dg/tree-ssa/pr110637-1.c | 10 +++
>  gcc/testsuite/gcc.dg/tree-ssa/pr110637-2.c | 13 +
>  gcc/testsuite/gcc.dg/tree-ssa/pr110637-3.c | 14 +
>  gcc/testsuite/gcc.dg/tree-ssa/pr111972-1.c | 34 ++
>  gcc/testsuite/gcc.dg/tree-ssa/pr69270.c|  4 +--
>  gcc/testsuite/gcc.target/i386/pr110790-2.c | 16 --
>  9 files changed, 103 insertions(+), 63 deletions(-)
>  delete mode 100644 gcc/testsuite/gcc.dg/fold-even-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr110637-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr110637-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr110637-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr111972-1.c
>
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 2692b98ceac..f5d68ac323a 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -12077,33 +12077,6 @@ fold_binary_loc (location_t loc, enum tree_code code, tree type,
>goto bit_rotate;
>
>  case BIT_AND_EXPR:
> -  /* Fold (X ^ 1) & 1 as (X & 1) == 0.  */
> -  if (TREE_CODE (arg0) == BIT_XOR_EXPR
> - && INTEGRAL_TYPE_P (type)
> - && integer_onep (TREE_OPERAND (arg0, 1))
> - && integer_onep (arg1))
> -   {
> - tree tem2;
> - tem = TREE_OPERAND (arg0, 0);
> - tem2 = fold_convert_loc (loc, TREE_TYPE (tem), arg1);
> - tem2 = fold_build2_loc (loc, BIT_AND_EXPR, TREE_TYPE (tem),
> - tem, tem2);
> - return fold_build2_loc (loc, EQ_EXPR, type, tem2,
> - build_zero_cst (TREE_TYPE (tem)));
> -   }
> -  /* Fold ~X & 1 as (X & 1) == 0.  */
> -  if (TREE_CODE (arg0) == BIT_NOT_EXPR
> - && INTEGRAL_TYPE_P (type)
> - && integer_onep (arg1))
> -   {
> - tree tem2;
> - tem = TREE_OPERAND (arg0, 0);
> - tem2 = fold_convert_loc (loc, TREE_TYPE (tem), arg1);
> - tem2 = fold_build2_loc (loc, BIT_AND_EXPR, TREE_TYPE (tem),
> - tem, tem2);
> - return fold_build2_loc (loc, EQ_EXPR, type, tem2,
> - build_zero_cst (TREE_TYPE (tem)));
> -   }
>/* Fold !X & 1 as X == 0.  */
>if (TREE_CODE (arg0) == TRUTH_NOT_EXPR
>   && integer_onep (arg1))
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 4d554ba4721..e242eac92f5 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3332,6 +3332,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) || POINTER_TYPE_P (TREE_TYPE (@0)))
>  (rcmp @0 @1
>
> +/* (type)([0,1]@a != 0) -> (type)a
> +   (type)([0

Re: [PATCH 1/2] analyzer: Remove check of unsigned_char in maybe_undo_optimize_bit_field_compare.

2023-12-11 Thread Richard Biener
On Sun, Dec 10, 2023 at 8:57 PM Andrew Pinski  wrote:
>
> From: Andrew Pinski 
>
> The check for the type seems unnecessary and gets in the way sometimes.
> Also, with a patch I am working on for match.pd, it causes a failure.
> Before my patch the IR was:
>   _1 = BIT_FIELD_REF ;
>   _2 = _1 & 1;
>   _3 = _2 != 0;
>   _4 = (int) _3;
>   __analyzer_eval (_4);
>
> Where _2 was an unsigned char type.
> And After my patch we have:
>   _1 = BIT_FIELD_REF ;
>   _2 = (int) _1;
>   _3 = _2 & 1;
>   __analyzer_eval (_3);
>
> But in this case, the BIT_AND_EXPR is in an int type.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK (hope it's OK that I approve this).

Richard.

> gcc/analyzer/ChangeLog:
>
> * region-model-manager.cc (maybe_undo_optimize_bit_field_compare):
> Remove the check for type being unsigned_char_type_node.
> ---
>  gcc/analyzer/region-model-manager.cc | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/gcc/analyzer/region-model-manager.cc b/gcc/analyzer/region-model-manager.cc
> index b631bcb04d0..26c34e38875 100644
> --- a/gcc/analyzer/region-model-manager.cc
> +++ b/gcc/analyzer/region-model-manager.cc
> @@ -596,9 +596,6 @@ maybe_undo_optimize_bit_field_compare (tree type,
>tree cst,
>const svalue *arg1)
>  {
> -  if (type != unsigned_char_type_node)
> -return NULL;
> -
>const binding_map &map = compound_sval->get_map ();
>unsigned HOST_WIDE_INT mask = TREE_INT_CST_LOW (cst);
>/* If "mask" is a contiguous range of set bits, see if the
> --
> 2.39.3
>


Re: PING^2: [PATCH v2] Only allow (int)trunc(x) to (int)x simplification with -ffp-int-builtin-inexact [PR107723]

2023-12-11 Thread Richard Biener
On Mon, Dec 11, 2023 at 7:39 AM Xi Ruoyao  wrote:
>
> Ping again.

OK, sorry for the delay.

Richard.

> On Fri, 2023-12-01 at 13:44 +0800, Xi Ruoyao wrote:
> > Ping.
> >
> > On Fri, 2023-11-24 at 17:09 +0800, Xi Ruoyao wrote:
> > > With -fno-fp-int-builtin-inexact, trunc is not allowed to raise
> > > FE_INEXACT and it should produce an integral result (if the input is not
> > > NaN or Inf).  Thus FE_INEXACT should not be raised.
> > >
> > > But (int)x may raise FE_INEXACT when x is a non-integer, non-NaN, and
> > > non-Inf value.  C23 recommends to do so in a footnote.
> > >
> > > Thus we should not simplify (int)trunc(x) to (int)x if
> > > -fno-fp-int-builtin-inexact is in-effect.
> > >
> > > gcc/ChangeLog:
> > >
> > > PR middle-end/107723
> > > * convert.cc (convert_to_integer_1) [case BUILT_IN_TRUNC]: Break
> > > early if !flag_fp_int_builtin_inexact and flag_trapping_math.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > PR middle-end/107723
> > > * gcc.dg/torture/builtin-fp-int-inexact-trunc.c: New test.
> > > ---
> > >
> > > Change from v1: add flag_trapping_math into the condition.
> > >
> > > Bootstrapped and regtested on x86_64-linux-gnu.  Ok for trunk?
> > >
> > >  gcc/convert.cc   |  3 ++-
> > >  .../gcc.dg/torture/builtin-fp-int-inexact-trunc.c| 12 
> > >  2 files changed, 14 insertions(+), 1 deletion(-)
> > >  create mode 100644 
> > > gcc/testsuite/gcc.dg/torture/builtin-fp-int-inexact-trunc.c
> > >
> > > diff --git a/gcc/convert.cc b/gcc/convert.cc
> > > index 46c8bcb31f8..f214b750188 100644
> > > --- a/gcc/convert.cc
> > > +++ b/gcc/convert.cc
> > > @@ -591,7 +591,8 @@ convert_to_integer_1 (tree type, tree expr, bool dofold)
> > > CASE_FLT_FN (BUILT_IN_TRUNC):
> > > CASE_FLT_FN_FLOATN_NX (BUILT_IN_TRUNC):
> > >   if (call_expr_nargs (s_expr) != 1
> > > - || !SCALAR_FLOAT_TYPE_P (TREE_TYPE (CALL_EXPR_ARG (s_expr, 0
> > > + || !SCALAR_FLOAT_TYPE_P (TREE_TYPE (CALL_EXPR_ARG (s_expr, 0)))
> > > + || (!flag_fp_int_builtin_inexact && flag_trapping_math))
> > > break;
> > >   return convert_to_integer_1 (type, CALL_EXPR_ARG (s_expr, 0),
> > >dofold);
> > > diff --git a/gcc/testsuite/gcc.dg/torture/builtin-fp-int-inexact-trunc.c b/gcc/testsuite/gcc.dg/torture/builtin-fp-int-inexact-trunc.c
> > > new file mode 100644
> > > index 000..09731183621
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/torture/builtin-fp-int-inexact-trunc.c
> > > @@ -0,0 +1,12 @@
> > > +/* Test -fno-fp-int-builtin-inexact.  */
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-fno-fp-int-builtin-inexact -fdump-tree-original" } */
> > > +
> > > +long
> > > +x (double y)
> > > +{
> > > +  return __builtin_trunc (y);
> > > +}
> > > +
> > > +/* Optimization should not discard the __builtin_trunc call.  */
> > > +/* { dg-final { scan-tree-dump "__builtin_trunc" "original" } } */
> >
>
> --
> Xi Ruoyao 
> School of Aerospace Science and Technology, Xidian University


Re: [v3 PATCH] Simplify vector ((VCE (a cmp b ? -1 : 0)) < 0) ? c : d to just (VCE ((a cmp b) ? (VCE c) : (VCE d))).

2023-12-11 Thread Richard Biener
On Mon, Dec 11, 2023 at 7:51 AM liuhongt  wrote:
>
> > since you are looking at TYPE_PRECISION below you want
> > VECTOR_INTEGER_TYPE_P here as well?  The alternative
> > would be to compare TYPE_SIZE.
> >
> > Some of the checks feel redundant but are probably good for
> > documentation purposes.
> >
> > OK with using VECTOR_INTEGER_TYPE_P
> Actually, the data type doesn't need to be integer, i.e. x86 supports
> vblendvps, so I'm using TYPE_SIZE here; the code is adjusted to
>
> && tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (type)))
> && (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (type)))
>    <= TYPE_PRECISION (TREE_TYPE (TREE_TYPE (@6
>
> Here's the updated patch.
> Ok for trunk?
>
> When I was working on PR112443, I noticed some misoptimizations:
> after we fold _mm{,256}_blendv_epi8/pd/ps into gimple, the backend
> fails to combine it back to v{,p}blendv{v,ps,pd} since the pattern is
> too complicated, so I think maybe we should handle it at the gimple
> level.
>
> The dump is like
>
>   _1 = c_3(D) >= { 0, 0, 0, 0 };
>   _2 = VEC_COND_EXPR <_1, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
>   _7 = VIEW_CONVERT_EXPR(_2);
>   _8 = VIEW_CONVERT_EXPR(b_6(D));
>   _9 = VIEW_CONVERT_EXPR(a_5(D));
>   _10 = _7 < { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
>   _11 = VEC_COND_EXPR <_10, _8, _9>;
>
> It can be optimized to
>
>   _1 = c_2(D) >= { 0, 0, 0, 0 };
>   _6 = VEC_COND_EXPR <_1, b_5(D), a_4(D)>;
>
> since _7 is either -1 or 0, the selection of _7 < 0 ? _8 : _9 should
> be equal to _1 ? b : a as long as TYPE_PRECISION of the component type
> of the second VEC_COND_EXPR is less than or equal to the first one.
> The patch adds a gimple pattern to handle that.
>
> gcc/ChangeLog:
>
> * match.pd (VCE (a cmp b ? -1 : 0) < 0) ? c : d ---> (VCE ((a
> cmp b) ? (VCE:c) : (VCE:d))): New gimple simplication.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx512vl-blendv-3.c: New test.
> * gcc.target/i386/blendv-3.c: New test.
> ---
>  gcc/match.pd  | 23 ++
>  .../gcc.target/i386/avx512vl-blendv-3.c   |  6 +++
>  gcc/testsuite/gcc.target/i386/blendv-3.c  | 46 +++
>  3 files changed, 75 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/blendv-3.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 4d554ba4721..359c7b07dc3 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5190,6 +5190,29 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (optimize_vectors_before_lowering_p () && types_match (@0, @3))
>(vec_cond (bit_and @0 (bit_not @3)) @2 @1)))
>
> +/*  ((VCE (a cmp b ? -1 : 0)) < 0) ? c : d is just
> +(VCE ((a cmp b) ? (VCE c) : (VCE d))) when TYPE_PRECISION of the
> +component type of the outer vec_cond is greater than or equal to the inner one.  */
> +(for cmp (simple_comparison)
> + (simplify
> +  (vec_cond
> +(lt (view_convert@5 (vec_cond@6 (cmp@4 @0 @1)
> +   integer_all_onesp
> +   integer_zerop))
> + integer_zerop) @2 @3)
> +  (if (VECTOR_INTEGER_TYPE_P (TREE_TYPE (@0))
> +   && VECTOR_INTEGER_TYPE_P (TREE_TYPE (@5))
> +   && !TYPE_UNSIGNED (TREE_TYPE (@5))
> +   && VECTOR_TYPE_P (TREE_TYPE (@6))
> +   && VECTOR_TYPE_P (type)
> +   && tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (type)))
> +   && (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (type)))
> + <= TYPE_PRECISION (TREE_TYPE (TREE_TYPE (@6

sorry for nitpicking, but can you please use

&& tree_int_cst_le (TYPE_SIZE (TREE_TYPE (type)),
 TREE_TYPE (TREE_TYPE (@6)))

thus not use precision on one and size on the other type?

OK with that change.

Richard.

> +   && TYPE_SIZE (type) == TYPE_SIZE (TREE_TYPE (@6)))
> +   (with { tree vtype = TREE_TYPE (@6);}
> + (view_convert:type
> +   (vec_cond @4 (view_convert:vtype @2) (view_convert:vtype @3)))
> +
>  /* c1 ? c2 ? a : b : b  -->  (c1 & c2) ? a : b  */
>  (simplify
>   (vec_cond @0 (vec_cond:s @1 @2 @3) @3)
> diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c b/gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c
> new file mode 100644
> index 000..2777e72ab5f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c
> @@ -0,0 +1,6 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512vl -mavx512bw -O2" } */
> +/* { dg-final { scan-assembler-times {vp?blendv(?:b|p[sd])[ \t]*} 6 } } */
> +/* { dg-final { scan-assembler-not {vpcmp} } } */
> +
> +#include "blendv-3.c"
> diff --git a/gcc/testsuite/gcc.target/i386/blendv-3.c b/gcc/testsuite/gcc.target/i386/blendv-3.c
> new file mode 100644
> index 000..fa0fb067a73
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/blendv-3.c
> @@ -0,0 +1,46 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx2 -O2" } */
> +/* { dg-fina

Re: [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization

2023-12-11 Thread Nate Eldredge
Ref: https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636805.html, 
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636804.html


I found a couple of bugs in this patch set.

#1: In atomic_wait.h, we have __wait_flags defined to include:

   __do_spin = 4,
   __spin_only = 8 | __do_spin, // implies __do_spin

So __spin_only is defined as two bits, which breaks when we test `__args & 
__wait_flags::__spin_only` in __wait_impl().  The test evaluates true even 
when we have only set __do_spin (bit 2) without bit 3, which is the 
default at least on Linux and which is supposed to request a limited 
number of spins before calling futex() and sleeping.  You can observe this 
by seeing that a thread blocked on std::counting_semaphore::acquire() 
consumes 100% CPU indefinitely.


There is another instance in atomic_timed_wait.h.

Probably __spin_only should just be 8, and any configuration that wants 
it should be responsible for manually setting __do_spin as well.



#2: __atomic_wait_address() does not populate __args._M_old before passing 
to __wait_impl(), so it remains at its default value 0 (set by the 
constructor).  As a result, `__wait_impl` is effectively always waiting on 
the value 0 (i.e. until `*__addr != 0`) rather than whatever value is 
actually wanted.  This is of course wrong, and can deadlock under the 
right race condition.


To fix, we need something like `__args._M_old = __val;` inside the loop in 
__atomic_wait_address(), so that we always wait on the exact value that 
the predicate __pred() rejected.  Again, there are similar instances in 
atomic_timed_wait.h.


(In the current libstdc++ implementation, the predicate function loads the 
value itself and doesn't return it, so we wait instead on a possibly 
different value that was loaded somewhere else in _S_do_spin().  That is 
also wrong, because the latter value might have been acceptable to the 
predicate, and if so then waiting on it may deadlock.  A classic TOCTOU. 
This is Bug #104928, reported in March 2022 and still unfixed: 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104928.  The patch proposed 
here would fix it, though.)


--
Nate Eldredge
n...@thatsmathematics.com



Re: [r14-5930 Regression] FAIL: gcc.c-torture/compile/libcall-2.c -Os (test for excess errors) on Linux/x86_64

2023-12-11 Thread FX Coudert
>> Pushed as 
>> https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=b74981b5cf32ebf4bfffd25e7174b5c80243447a

Somehow I pushed the wrong commit; we should skip the test, not xfail it.
This showed up in 
https://gcc.gnu.org/pipermail/gcc-testresults/2023-December/802839.html

So, a new commit was pushed as an obvious fixup: 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=53e954a673a0d6ac80ab1f0591ea4f751e67374c

FX



Re: [v3 PATCH] Simplify vector ((VCE (a cmp b ? -1 : 0)) < 0) ? c : d to just (VCE ((a cmp b) ? (VCE c) : (VCE d))).

2023-12-11 Thread Hongtao Liu
On Mon, Dec 11, 2023 at 4:14 PM Richard Biener
 wrote:
>
> On Mon, Dec 11, 2023 at 7:51 AM liuhongt  wrote:
> >
> > > since you are looking at TYPE_PRECISION below you want
> > > VECTOR_INTEGER_TYPE_P here as well?  The alternative
> > > would be to compare TYPE_SIZE.
> > >
> > > Some of the checks feel redundant but are probably good for
> > > documentation purposes.
> > >
> > > OK with using VECTOR_INTEGER_TYPE_P
> > Actually, the data type doesn't need to be integer, i.e. x86 supports
> > vblendvps, so I'm using TYPE_SIZE here; the code is adjusted to
> >
> > && tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (type)))
> > && (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (type)))
> >    <= TYPE_PRECISION (TREE_TYPE (TREE_TYPE (@6
> >
> > Here's the updated patch.
> > Ok for trunk?
> >
> > When I was working on PR112443, I noticed some misoptimizations:
> > after we fold _mm{,256}_blendv_epi8/pd/ps into gimple, the backend
> > fails to combine it back to v{,p}blendv{v,ps,pd} since the pattern is
> > too complicated, so I think maybe we should handle it at the gimple
> > level.
> >
> > The dump is like
> >
> >   _1 = c_3(D) >= { 0, 0, 0, 0 };
> >   _2 = VEC_COND_EXPR <_1, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
> >   _7 = VIEW_CONVERT_EXPR(_2);
> >   _8 = VIEW_CONVERT_EXPR(b_6(D));
> >   _9 = VIEW_CONVERT_EXPR(a_5(D));
> >   _10 = _7 < { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
> > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
> >   _11 = VEC_COND_EXPR <_10, _8, _9>;
> >
> > It can be optimized to
> >
> >   _1 = c_2(D) >= { 0, 0, 0, 0 };
> >   _6 = VEC_COND_EXPR <_1, b_5(D), a_4(D)>;
> >
> > since _7 is either -1 or 0, the selection of _7 < 0 ? _8 : _9 should
> > be equal to _1 ? b : a as long as TYPE_PRECISION of the component type
> > of the second VEC_COND_EXPR is less than or equal to the first one.
> > The patch adds a gimple pattern to handle that.
> >
> > gcc/ChangeLog:
> >
> > * match.pd (VCE (a cmp b ? -1 : 0) < 0) ? c : d ---> (VCE ((a
> > cmp b) ? (VCE:c) : (VCE:d))): New gimple simplication.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/avx512vl-blendv-3.c: New test.
> > * gcc.target/i386/blendv-3.c: New test.
> > ---
> >  gcc/match.pd  | 23 ++
> >  .../gcc.target/i386/avx512vl-blendv-3.c   |  6 +++
> >  gcc/testsuite/gcc.target/i386/blendv-3.c  | 46 +++
> >  3 files changed, 75 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/blendv-3.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 4d554ba4721..359c7b07dc3 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -5190,6 +5190,29 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >   (if (optimize_vectors_before_lowering_p () && types_match (@0, @3))
> >(vec_cond (bit_and @0 (bit_not @3)) @2 @1)))
> >
> > +/*  ((VCE (a cmp b ? -1 : 0)) < 0) ? c : d is just
> > +(VCE ((a cmp b) ? (VCE c) : (VCE d))) when TYPE_PRECISION of the
> > +component type of the outer vec_cond is greater than or equal to the
> > +inner one.  */
> > +(for cmp (simple_comparison)
> > + (simplify
> > +  (vec_cond
> > +(lt (view_convert@5 (vec_cond@6 (cmp@4 @0 @1)
> > +   integer_all_onesp
> > +   integer_zerop))
> > + integer_zerop) @2 @3)
> > +  (if (VECTOR_INTEGER_TYPE_P (TREE_TYPE (@0))
> > +   && VECTOR_INTEGER_TYPE_P (TREE_TYPE (@5))
> > +   && !TYPE_UNSIGNED (TREE_TYPE (@5))
> > +   && VECTOR_TYPE_P (TREE_TYPE (@6))
> > +   && VECTOR_TYPE_P (type)
> > +   && tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (type)))
> > +   && (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (type)))
> > + <= TYPE_PRECISION (TREE_TYPE (TREE_TYPE (@6
>
> sorry for nitpicking, but can you please use
>
> && tree_int_cst_le (TYPE_SIZE (TREE_TYPE (type)),
>  TREE_TYPE (TREE_TYPE (@6)))
>
> thus not use precision on one and size on the other type?
>
> OK with that change.
Thanks, committed.
>
> Richard.
>
> > +   && TYPE_SIZE (type) == TYPE_SIZE (TREE_TYPE (@6)))
> > +   (with { tree vtype = TREE_TYPE (@6);}
> > + (view_convert:type
> > +   (vec_cond @4 (view_convert:vtype @2) (view_convert:vtype @3)))
> > +
> >  /* c1 ? c2 ? a : b : b  -->  (c1 & c2) ? a : b  */
> >  (simplify
> >   (vec_cond @0 (vec_cond:s @1 @2 @3) @3)
> > diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c b/gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c
> > new file mode 100644
> > index 000..2777e72ab5f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c
> > @@ -0,0 +1,6 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-mavx512vl -mavx512bw -O2" } */
> > +/* { dg-final { scan-assembler-times {vp?blendv(?:b|p[sd])[ \t]*} 6 } } */
> > +/* { dg-final { scan-assembler-not {vpcmp} } } */
> > +
> > +#include "blendv-3.c"

[PATCH] Testsuite: restrict test to nonpic targets

2023-12-11 Thread FX Coudert
The test is currently failing on x86_64-apple-darwin. This patch requires 
nonpic, as suggested in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112297 by 
Andrew Pinski.

OK to commit?

FX




0001-Testsuite-restrict-test-to-nonpic-targets.patch
Description: Binary data


Re: [PATCH] RISC-V: Add vectorized strcmp.

2023-12-11 Thread Robin Dapp
> FYI. I have the same failures as juzhe mentioned, with the emulator
> qemu version qemu-riscv64 version 8.1.93 (v8.2.0-rc3). The entire log
> may look like below:
> 
> Executing on host: 
> /home/box/panli/riscv-gnu-toolchain/build-gcc-newlib-stage2/gcc/xgcc 
> -B/home/box/panli/riscv-gnu-toolchain/build-gcc-newlib-stage2/gcc/  
> /home/box/panli/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
>   -march=rv64gcv -mabi=lp64d -mcmodel=medlow --param=riscv-autovec-lmul=m1 
> --param=riscv-autovec-preference=fixed-vlmax   -fdiagnostics-plain-output   
> -ftree-vectorize -O3 --param riscv-autovec-lmul=m1 -O3 -minline-strcmp   
> -lm  -o ./strcmp-run.exe    (timeout = 600)
> 
> spawn -ignore SIGHUP 
> /home/box/panli/riscv-gnu-toolchain/build-gcc-newlib-stage2/gcc/xgcc 
> -B/home/box/panli/riscv-gnu-toolchain/build-gcc-newlib-stage2/gcc/ 
> /home/box/panli/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
>  -march=rv64gcv -mabi=lp64d -mcmodel=medlow --param=riscv-autovec-lmul=m1 
> --param=riscv-autovec-preference=fixed-vlmax -fdiagnostics-plain-output 
> -ftree-vectorize -O3 --param riscv-autovec-lmul=m1 -O3 -minline-strcmp -lm -o 
> ./strcmp-run.exe^M

Thanks, it must be a bug if you both see it.  But I cannot reproduce
it yet for some reason.  I tried your exact parameters (just didn't
use newlib).  Also, for Juzhe it seemed to fail without -minline-strcmp;
for you it fails with it.  Maybe my testcase invokes undefined
behavior?  Could you try reducing SZ to 1 as a test?

Regards
 Robin


[PATCH] Testsuite, i386: mark test as requiring dfp

2023-12-11 Thread FX Coudert
Test currently fails on darwin with:
error: decimal floating-point not supported for this target

Pushed as obvious fix.
FX



0001-Testsuite-i386-mark-test-as-requiring-dfp.patch
Description: Binary data


[PATCH] RTL-SSA: Fix ICE on record_use of RTL_SSA for RISC-V VSETVL PASS

2023-12-11 Thread Juzhe-Zhong
Hi, Richard. This patch fixes an ICE in record_use during RTL_SSA
initialization in the RISC-V backend VSETVL pass.

This is the ICE:

0x11a8603 partial_subreg_p(machine_mode, machine_mode)
../../../../gcc/gcc/rtl.h:3187
0x3b695eb 
rtl_ssa::function_info::record_use(rtl_ssa::function_info::build_info&, 
rtl_ssa::insn_info*, rtx_obj_reference)
../../../../gcc/gcc/rtl-ssa/insns.cc:524

In record_use:

  if (HARD_REGISTER_NUM_P (regno)
  && partial_subreg_p (use->mode (), mode))

Assertion failed on partial_subreg_p which is:

inline bool
partial_subreg_p (machine_mode outermode, machine_mode innermode)
{
  /* Modes involved in a subreg must be ordered.  In particular, we must
 always know at compile time whether the subreg is paradoxical.  */
  poly_int64 outer_prec = GET_MODE_PRECISION (outermode);
  poly_int64 inner_prec = GET_MODE_PRECISION (innermode);
  gcc_checking_assert (ordered_p (outer_prec, inner_prec)); 
-> cause ICE.
  return maybe_lt (outer_prec, inner_prec);
}

RISC-V VSETVL PASS is an advanced lazy vsetvl insertion PASS after RA (register 
allocation).

The root cause is that we have a pattern (a reduction instruction) that includes
both VLA (length-agnostic) and VLS (fixed-length) modes.

(insn 168 173 170 31 (set (reg:RVVM1SI 101 v5 [311])
(unspec:RVVM1SI [
(unspec:V32BI [
(const_vector:V32BI [
(const_int 1 [0x1]) repeated x32
])
(reg:DI 30 t5 [312])
(const_int 2 [0x2]) repeated x2
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(unspec:RVVM1SI [
(reg:V32SI 96 v0 [orig:185 vect__96.40 ] [185])   
-> VLS mode NUNITS = 32 elements.
(reg:RVVM1SI 113 v17 [439])   
-> VLA mode NUNITS = [8, 8] elements.
] UNSPEC_REDUC_XOR)
(unspec:RVVM1SI [
(reg:SI 0 zero)
] UNSPEC_VUNDEF)
] UNSPEC_REDUC)) 15948 {pred_redxorv32si}

In this case, record_use is trying to check partial_subreg_p (use->mode (), 
mode) for RTX = (reg:V32SI 96 v0 [orig:185 vect__96.40 ] [185]).

use->mode () == V32SImode, whereas mode == RVVM1SImode.  It then ICEs since
their precisions are not ordered_p.

In this situation, the use should keep its original mode (V32SI); there is no
need to compare which of the VLS mode and the VLA mode is bigger.

I am still trying to find a way to work around this issue in the RISC-V backend.

Is this fix OK for trunk?  If not, could you suggest a way to work around this
ICE in the RISC-V backend?

Thanks.

gcc/ChangeLog:

* rtl-ssa/insns.cc (function_info::record_use): Add ordered_p.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/avl_use_bug-2.c: New test.

---
 gcc/rtl-ssa/insns.cc  |  2 ++
 .../riscv/rvv/vsetvl/avl_use_bug-2.c  | 21 +++
 2 files changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_use_bug-2.c

diff --git a/gcc/rtl-ssa/insns.cc b/gcc/rtl-ssa/insns.cc
index 2fa48e0dacd..5b9c5baf8c7 100644
--- a/gcc/rtl-ssa/insns.cc
+++ b/gcc/rtl-ssa/insns.cc
@@ -521,6 +521,8 @@ function_info::record_use (build_info &bi, insn_info *insn,
   // different but equal-sized modes.
   gcc_checking_assert (use->insn () == insn);
   if (HARD_REGISTER_NUM_P (regno)
+ && ordered_p (GET_MODE_PRECISION (use->mode ()),
+   GET_MODE_PRECISION (mode))
  && partial_subreg_p (use->mode (), mode))
use->set_mode (mode);
   use->record_reference (ref, false);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_use_bug-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_use_bug-2.c
new file mode 100644
index 000..bbc02eab818
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_use_bug-2.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d 
--param=riscv-autovec-lmul=m4 -O3 -fomit-frame-pointer -funroll-loops" } */
+
+int safe_lshift_func_int32_t_s_s_left, safe_lshift_func_int32_t_s_s_right,
+safe_sub_func_uint64_t_u_u_ui2, safe_mul_func_uint64_t_u_u_ui2, g_79_2,
+g_97_l_439;
+void g_97(int * __restrict l_437)
+{
+  for (; g_97_l_439; g_97_l_439 += 1)
+for (char l_502 = 0; l_502 < 4; l_502++)
+  {
+int __trans_tmp_14 = ((safe_lshift_func_int32_t_s_s_right >= 2
+   || safe_lshift_func_int32_t_s_s_left)
+  ? 1 : safe_lshift_func_int32_t_s_s_right);
+long __trans_tmp_15 = __trans_tmp_14 * safe_mul_func_uint64_t_u_u_ui2;
+unsigned short __trans_tmp_16 = -__trans_tmp_15;
+int __trans_tmp_7
+  = (__trans_tmp_16 ^ 65535UL) - safe_su

[PATCH] testsuite: Disable -fstack-protector* for some strub tests

2023-12-11 Thread Jakub Jelinek
Hi!

In our distro builds, we test with
RUNTESTFLAGS='--target_board=unix\{,-fstack-protector-strong\}'
because SSP is something we use widely in the distribution.
4 new strub tests FAIL with that option though, as can be seen with a simple
make check-gcc check-g++ 
RUNTESTFLAGS='--target_board=unix\{,-fstack-protector-strong\} dg.exp=strub-O*'
In particular, the expand dump regexps
\[(\]call\[^\n\]*strub_leave.*\n\[(\]code_label
see code_labels in there introduced for the stack protector.

The following patch fixes it by using -fno-stack-protector for these
explicitly.

Tested on x86_64-linux, ok for trunk?

2023-12-11  Jakub Jelinek  

* c-c++-common/strub-O2fni.c: Add -fno-stack-protector to dg-options.
* c-c++-common/strub-O3fni.c: Likewise.
* c-c++-common/strub-Os.c: Likewise.
* c-c++-common/strub-Og.c: Likewise. 

--- gcc/testsuite/c-c++-common/strub-O2fni.c.jj	2023-12-08 08:28:23.689170380 +0100
+++ gcc/testsuite/c-c++-common/strub-O2fni.c	2023-12-11 09:25:49.100792709 +0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fstrub=strict -fdump-rtl-expand -fno-inline" } */
+/* { dg-options "-O2 -fstrub=strict -fdump-rtl-expand -fno-inline 
-fno-stack-protector" } */
 /* { dg-require-effective-target strub } */
 
 /* With -fno-inline, none of the strub builtins are inlined.  */
--- gcc/testsuite/c-c++-common/strub-O3fni.c.jj 2023-12-08 08:28:23.707170125 
+0100
+++ gcc/testsuite/c-c++-common/strub-O3fni.c2023-12-11 09:25:56.388695362 
+0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fstrub=strict -fdump-rtl-expand -fno-inline" } */
+/* { dg-options "-O3 -fstrub=strict -fdump-rtl-expand -fno-inline 
-fno-stack-protector" } */
 /* { dg-require-effective-target strub } */
 
 /* With -fno-inline, none of the strub builtins are inlined.  */
--- gcc/testsuite/c-c++-common/strub-Os.c.jj2023-12-08 08:28:23.707170125 
+0100
+++ gcc/testsuite/c-c++-common/strub-Os.c   2023-12-11 09:26:24.994313267 
+0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-Os -fstrub=strict -fdump-rtl-expand" } */
+/* { dg-options "-Os -fstrub=strict -fdump-rtl-expand -fno-stack-protector" } 
*/
 /* { dg-require-effective-target strub } */
 
 /* At -Os, without -fno-inline, we fully expand enter, and also update.  The
--- gcc/testsuite/c-c++-common/strub-Og.c.jj2023-12-08 08:28:23.707170125 
+0100
+++ gcc/testsuite/c-c++-common/strub-Og.c   2023-12-11 09:26:07.077552587 
+0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-Og -fstrub=strict -fdump-rtl-expand" } */
+/* { dg-options "-Og -fstrub=strict -fdump-rtl-expand -fno-stack-protector" } 
*/
 /* { dg-require-effective-target strub } */
 
 /* At -Og, without -fno-inline, we fully expand enter, but neither update nor

Jakub



[PATCH] Testsuite, asan, darwin: Adjust output pattern

2023-12-11 Thread FX Coudert
Since the last import from upstream libsanitizer, the output has changed
and now looks more like this:

READ of size 6 at 0x7ff7beb2a144 thread T0
#0 0x101cf7796 in MemcmpInterceptorCommon(void*, int (*)(void const*, void 
const*, unsigned long), void const*, void const*, unsigned long) 
sanitizer_common_interceptors.inc:813
#1 0x101cf7b99 in memcmp sanitizer_common_interceptors.inc:840
#2 0x108a0c39f in __stack_chk_guard+0xf (dyld:x86_64+0x8039f)

so let's adjust the pattern accordingly.

Tested on x86_64-apple-darwin21. OK to push?

FX




0001-Testsuite-asan-darwin-Adjust-output-pattern.patch
Description: Binary data


RE: [PATCH 15/21]middle-end: [RFC] conditionally support forcing final edge for debugging

2023-12-11 Thread Tamar Christina
> -----Original Message-----
> From: Richard Biener 
> Sent: Monday, December 11, 2023 7:38 AM
> To: Richard Sandiford 
> Cc: Tamar Christina ; gcc-patches@gcc.gnu.org; nd
> ; j...@ventanamicro.com
> Subject: Re: [PATCH 15/21]middle-end: [RFC] conditionally support forcing 
> final
> edge for debugging
> 
> On Sat, 9 Dec 2023, Richard Sandiford wrote:
> 
> > Tamar Christina  writes:
> > > Hi All,
> > >
> > > What do people think about having the ability to force only the latch 
> > > connected
> > > exit as the exit as a param? I.e. what's in the patch but as a param.
> > >
> > > I found this useful when debugging large example failures as it tells me 
> > > where
> > > I should be looking.  No hard requirement but just figured I'd ask if we 
> > > should.
> >
> > If it's useful for that, then perhaps it would be worth making it a
> > DEBUG_COUNTER instead of a --param, for easy bisection.
> 
> Or even better, make a debug counter that would skip the IV edge and
> choose the "next".
> 

Ah, I'd never heard of debug counters. They look very useful!

Did you mean every time the counter is reached it picks the n-th successor?

So if the counter is hit twice it picks the 3rd exit?

Thanks,
Tamar


Re: [PATCH v8] c++: implement P2564, consteval needs to propagate up [PR107687]

2023-12-11 Thread FX Coudert
Hi Marek,

The patch is causing three failures on x86_64-apple-darwin21:

> FAIL: g++.dg/cpp2a/concepts-explicit-inst1.C -std=c++20 scan-assembler 
> _Z1gI1XEvT_
> FAIL: g++.dg/cpp2a/concepts-explicit-inst1.C -std=c++20 scan-assembler 
> _Z1gI1YEvT_
> FAIL: g++.dg/cpp2a/consteval-prop6.C -std=c++20 at line 58 (test for 
> warnings, line 57)

How could I help debug this?

FX

Re: [PATCH] testsuite: Disable -fstack-protector* for some strub tests

2023-12-11 Thread Richard Biener
On Mon, 11 Dec 2023, Jakub Jelinek wrote:

> Hi!
> 
> In our distro builds, we test with
> RUNTESTFLAGS='--target_board=unix\{,-fstack-protector-strong\}'
> because SSP is something we use widely in the distribution.
> 4 new strub tests FAIL with that option though, as can be
> seen with a simple
> make check-gcc check-g++ 
> RUNTESTFLAGS='--target_board=unix\{,-fstack-protector-strong\} 
> dg.exp=strub-O*'
> - in particular, the expand dump
> \[(\]call\[^\n\]*strub_leave.*\n\[(\]code_label
> regexps see code_labels introduced in there for the stack protector.
> 
> The following patch fixes it by using -fno-stack-protector for these
> explicitly.
> 
> Tested on x86_64-linux, ok for trunk?

OK.

> 2023-12-11  Jakub Jelinek  
> 
>   * c-c++-common/strub-O2fni.c: Add -fno-stack-protector to dg-options.
>   * c-c++-common/strub-O3fni.c: Likewise.
>   * c-c++-common/strub-Os.c: Likewise.
>   * c-c++-common/strub-Og.c: Likewise. 
> 
> --- gcc/testsuite/c-c++-common/strub-O2fni.c.jj   2023-12-08 
> 08:28:23.689170380 +0100
> +++ gcc/testsuite/c-c++-common/strub-O2fni.c  2023-12-11 09:25:49.100792709 
> +0100
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fstrub=strict -fdump-rtl-expand -fno-inline" } */
> +/* { dg-options "-O2 -fstrub=strict -fdump-rtl-expand -fno-inline 
> -fno-stack-protector" } */
>  /* { dg-require-effective-target strub } */
>  
>  /* With -fno-inline, none of the strub builtins are inlined.  */
> --- gcc/testsuite/c-c++-common/strub-O3fni.c.jj   2023-12-08 
> 08:28:23.707170125 +0100
> +++ gcc/testsuite/c-c++-common/strub-O3fni.c  2023-12-11 09:25:56.388695362 
> +0100
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3 -fstrub=strict -fdump-rtl-expand -fno-inline" } */
> +/* { dg-options "-O3 -fstrub=strict -fdump-rtl-expand -fno-inline 
> -fno-stack-protector" } */
>  /* { dg-require-effective-target strub } */
>  
>  /* With -fno-inline, none of the strub builtins are inlined.  */
> --- gcc/testsuite/c-c++-common/strub-Os.c.jj  2023-12-08 08:28:23.707170125 
> +0100
> +++ gcc/testsuite/c-c++-common/strub-Os.c 2023-12-11 09:26:24.994313267 
> +0100
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-Os -fstrub=strict -fdump-rtl-expand" } */
> +/* { dg-options "-Os -fstrub=strict -fdump-rtl-expand -fno-stack-protector" 
> } */
>  /* { dg-require-effective-target strub } */
>  
>  /* At -Os, without -fno-inline, we fully expand enter, and also update.  The
> --- gcc/testsuite/c-c++-common/strub-Og.c.jj  2023-12-08 08:28:23.707170125 
> +0100
> +++ gcc/testsuite/c-c++-common/strub-Og.c 2023-12-11 09:26:07.077552587 
> +0100
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-Og -fstrub=strict -fdump-rtl-expand" } */
> +/* { dg-options "-Og -fstrub=strict -fdump-rtl-expand -fno-stack-protector" 
> } */
>  /* { dg-require-effective-target strub } */
>  
>  /* At -Og, without -fno-inline, we fully expand enter, but neither update nor
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


RE: [PATCH 15/21]middle-end: [RFC] conditionally support forcing final edge for debugging

2023-12-11 Thread Richard Biener
On Mon, 11 Dec 2023, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener 
> > Sent: Monday, December 11, 2023 7:38 AM
> > To: Richard Sandiford 
> > Cc: Tamar Christina ; gcc-patches@gcc.gnu.org; nd
> > ; j...@ventanamicro.com
> > Subject: Re: [PATCH 15/21]middle-end: [RFC] conditionally support forcing 
> > final
> > edge for debugging
> > 
> > On Sat, 9 Dec 2023, Richard Sandiford wrote:
> > 
> > > Tamar Christina  writes:
> > > > Hi All,
> > > >
> > > > What do people think about having the ability to force only the latch 
> > > > connected
> > > > exit as the exit as a param? I.e. what's in the patch but as a param.
> > > >
> > > > I found this useful when debugging large example failures as it tells 
> > > > me where
> > > > I should be looking.  No hard requirement but just figured I'd ask if 
> > > > we should.
> > >
> > > If it's useful for that, then perhaps it would be worth making it a
> > > DEBUG_COUNTER instead of a --param, for easy bisection.
> > 
> > Or even better, make a debug counter that would skip the IV edge and
> > choose the "next".
> > 
> 
> Ah, I'd never heard of debug counters. They look very useful!
> 
> Did you mean every time the counter is reached it picks the n-th successor?
> 
> So if the counter is hit twice it picks the 3rd exit?

  if (!dbg_cnt (...))
do not take this exit, try next

which means it might even fail to find an exit.


> Thanks,
> Tamar
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] RTL-SSA: Fix ICE on record_use of RTL_SSA for RISC-V VSETVL PASS

2023-12-11 Thread Robin Dapp
> In record_use:
> 
>   if (HARD_REGISTER_NUM_P (regno)
> && partial_subreg_p (use->mode (), mode))
> 
> Assertion failed on partial_subreg_p which is:
> 
> inline bool
> partial_subreg_p (machine_mode outermode, machine_mode innermode)
> {
>   /* Modes involved in a subreg must be ordered.  In particular, we must
>  always know at compile time whether the subreg is paradoxical.  */
>   poly_int64 outer_prec = GET_MODE_PRECISION (outermode);
>   poly_int64 inner_prec = GET_MODE_PRECISION (innermode);
>   gcc_checking_assert (ordered_p (outer_prec, inner_prec));   
>   -> cause ICE.
>   return maybe_lt (outer_prec, inner_prec);
> }
> 
> RISC-V VSETVL PASS is an advanced lazy vsetvl insertion PASS after RA 
> (register allocation).
> 
> The root cause is that we have a pattern (a reduction instruction) that includes 
> both VLA (length-agnostic) and VLS (fixed-length) modes.

Maybe as additional context: The second input (which has a VLA mode here)
is not used entirely but just its first element.  This serves as initial
value for the reduction.

I'm not sure we'd want to model it as subreg here (endianness etc?).
Could we have a VLS-mode equivalent for the VLA mode that only holds
one element?

Regards
 Robin



Re: Re: [PATCH] RTL-SSA: Fix ICE on record_use of RTL_SSA for RISC-V VSETVL PASS

2023-12-11 Thread juzhe.zh...@rivai.ai
>> I'm not sure we'd want to model it as subreg here (endianness etc?).
>> Could we have a VLS-mode equivalent for the VLA mode that only holds
>> one element?

Yes. This is the last chance to work around it here, but we will end up with 
more patterns,
since the reduction dest operand always has an LMUL = 1 mode.

So, when -march=rv64gcv, the dest mode should be V4SI; if 
-march=rv64gcv_zvl256b, the dest mode should be V8SI;
etc.  Different TARGET_MIN_VLEN, different M1 mode. It's going to be a big 
change in the RISC-V backend.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-12-11 17:01
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; richard.sandiford
Subject: Re: [PATCH] RTL-SSA: Fix ICE on record_use of RTL_SSA for RISC-V 
VSETVL PASS
> In record_use:
> 
>   if (HARD_REGISTER_NUM_P (regno)
>   && partial_subreg_p (use->mode (), mode))
> 
> Assertion failed on partial_subreg_p which is:
> 
> inline bool
> partial_subreg_p (machine_mode outermode, machine_mode innermode)
> {
>   /* Modes involved in a subreg must be ordered.  In particular, we must
>  always know at compile time whether the subreg is paradoxical.  */
>   poly_int64 outer_prec = GET_MODE_PRECISION (outermode);
>   poly_int64 inner_prec = GET_MODE_PRECISION (innermode);
>   gcc_checking_assert (ordered_p (outer_prec, inner_prec));   
>   -> cause ICE.
>   return maybe_lt (outer_prec, inner_prec);
> }
> 
> RISC-V VSETVL PASS is an advanced lazy vsetvl insertion PASS after RA 
> (register allocation).
> 
> The root cause is that we have a pattern (a reduction instruction) that includes 
> both VLA (length-agnostic) and VLS (fixed-length) modes.
 
Maybe as additional context: The second input (which has a VLA mode here)
is not used entirely but just its first element.  This serves as initial
value for the reduction.
 
I'm not sure we'd want to model it as subreg here (endianness etc?).
Could we have a VLS-mode equivalent for the VLA mode that only holds
one element?
 
Regards
Robin
 
 


Re: Re: [PATCH] RTL-SSA: Fix ICE on record_use of RTL_SSA for RISC-V VSETVL PASS

2023-12-11 Thread juzhe.zh...@rivai.ai
I think it's reasonable to refactor the reduction instruction pattern to work around 
this issue.

Going to send a patch to apply this solution.

So drop this patch.  Sorry for bothering Richard S.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-12-11 17:01
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; richard.sandiford
Subject: Re: [PATCH] RTL-SSA: Fix ICE on record_use of RTL_SSA for RISC-V 
VSETVL PASS
> In record_use:
> 
>   if (HARD_REGISTER_NUM_P (regno)
>   && partial_subreg_p (use->mode (), mode))
> 
> Assertion failed on partial_subreg_p which is:
> 
> inline bool
> partial_subreg_p (machine_mode outermode, machine_mode innermode)
> {
>   /* Modes involved in a subreg must be ordered.  In particular, we must
>  always know at compile time whether the subreg is paradoxical.  */
>   poly_int64 outer_prec = GET_MODE_PRECISION (outermode);
>   poly_int64 inner_prec = GET_MODE_PRECISION (innermode);
>   gcc_checking_assert (ordered_p (outer_prec, inner_prec));   
>   -> cause ICE.
>   return maybe_lt (outer_prec, inner_prec);
> }
> 
> RISC-V VSETVL PASS is an advanced lazy vsetvl insertion PASS after RA 
> (register allocation).
> 
> The root cause is that we have a pattern (a reduction instruction) that includes 
> both VLA (length-agnostic) and VLS (fixed-length) modes.
 
Maybe as additional context: The second input (which has a VLA mode here)
is not used entirely but just its first element.  This serves as initial
value for the reduction.
 
I'm not sure we'd want to model it as subreg here (endianness etc?).
Could we have a VLS-mode equivalent for the VLA mode that only holds
one element?
 
Regards
Robin
 
 


Re: [PATCH v3 11/11] c: Add new -Wdeclaration-missing-parameter-type permerror

2023-12-11 Thread Florian Weimer
* Marek Polacek:

> On Mon, Nov 20, 2023 at 10:56:42AM +0100, Florian Weimer wrote:
>> This used to be a warning, enabled by default, without its own option.
>
> I think this patch is OK now.
>  
>> A subsequent change could improve diagnostics and provide spelling
>> hints for declarations like “void function (int32t);”.
>
> Feel free to open a PR.

Good idea, it's now:

  Spelling hint for typos in parameter types in function prototypes
  

Thanks,
Florian



Re: [PATCH] RTL-SSA: Fix ICE on record_use of RTL_SSA for RISC-V VSETVL PASS

2023-12-11 Thread Robin Dapp
> Yes. This is the last chance to work around it here, but we will end up with 
> more patterns,
> since the reduction dest operand always has an LMUL = 1 mode.
> 
> So, when -march=rv64gcv, the dest mode should be V4SI, if 
> -march=rv64gcv_zvl256b, the dest mode should be V8SI.
> ...etc.  Different TARGET_MIN_VLEN, different M1 mode. It's going to be a big 
> change in RISC-V backend.

Hmm I haven't really thought this through yet (nor checked the spec
in detail) but isn't the result always a 1-element thing?  I.e. a
V1SI regardless of the input vlen?  That would also mean various
changes of course.

Regards
 Robin


[PATCH] ipa/92606 - properly handle no_icf attribute for variables

2023-12-11 Thread Richard Biener
The following adds no_icf handling for variables where the attribute
was rejected.  It also fixes the check for no_icf by checking both
the source and the targets decl.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

This would solve the AVR issue with merging of "progmem" attributed
and non-"progmem" attributed variables if they'd also add no_icf there.

OK?

Thanks,
Richard.

PR ipa/92606
gcc/c-family/
* c-attribs.cc (handle_noicf_attribute): Also allow the
attribute on global variables.

gcc/
* ipa-icf.cc (sem_item_optimizer::merge_classes): Check
both source and alias for the no_icf attribute.
* doc/extend.texi (no_icf): Document variable attribute.
---
 gcc/c-family/c-attribs.cc | 3 ++-
 gcc/doc/extend.texi   | 5 +
 gcc/ipa-icf.cc| 3 ++-
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 854e987dc79..a3671fe3a57 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -1649,7 +1649,8 @@ handle_noicf_attribute (tree *node, tree name,
tree ARG_UNUSED (args),
int ARG_UNUSED (flags), bool *no_add_attrs)
 {
-  if (TREE_CODE (*node) != FUNCTION_DECL)
+  if (TREE_CODE (*node) != FUNCTION_DECL
+  && (TREE_CODE (*node) != VAR_DECL || !is_global_var (*node)))
 {
   warning (OPT_Wattributes, "%qE attribute ignored", name);
   *no_add_attrs = true;
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e8b5e771f7a..f0c789f6cb4 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8152,6 +8152,11 @@ script to place the sections with the @code{.persistent} 
prefix in the
 right location.  Specifically, some type of non-volatile, writeable
 memory is required.
 
+@cindex @code{no_icf} variable attribute
+@item no_icf
+This variable attribute prevents a variable from being merged with another
+equivalent variable.
+
 @cindex @code{objc_nullability} variable attribute
 @item objc_nullability (@var{nullability kind}) @r{(Objective-C and 
Objective-C++ only)}
 This attribute applies to pointer variables only.  It allows marking the
diff --git a/gcc/ipa-icf.cc b/gcc/ipa-icf.cc
index 81232d5706e..e27536d73a9 100644
--- a/gcc/ipa-icf.cc
+++ b/gcc/ipa-icf.cc
@@ -3422,7 +3422,8 @@ sem_item_optimizer::merge_classes (unsigned int 
prev_class_count,
 alias->node->dump_asm_name ());
  }
 
-   if (lookup_attribute ("no_icf", DECL_ATTRIBUTES (alias->decl)))
+   if (lookup_attribute ("no_icf", DECL_ATTRIBUTES (alias->decl))
+   || lookup_attribute ("no_icf", DECL_ATTRIBUTES (source->decl)))
  {
if (dump_enabled_p ())
  dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
-- 
2.35.3


Re: Re: [PATCH] RTL-SSA: Fix ICE on record_use of RTL_SSA for RISC-V VSETVL PASS

2023-12-11 Thread juzhe.zh...@rivai.ai
Oh. I just confirmed. V1SI makes perfect sense since we never apply partial 
vectorization for VLS modes.

Drop this patch and going to refactor reduction pattern to fix this issue.

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-12-11 17:11
To: juzhe.zh...@rivai.ai; gcc-patches
CC: rdapp.gcc; richard.sandiford
Subject: Re: [PATCH] RTL-SSA: Fix ICE on record_use of RTL_SSA for RISC-V 
VSETVL PASS
> Yes. This is the last chance to work around it here, but we will end up with 
> more patterns,
> since the reduction dest operand always has an LMUL = 1 mode.
> 
> So, when -march=rv64gcv, the dest mode should be V4SI, if 
> -march=rv64gcv_zvl256b, the dest mode should be V8SI.
> ...etc.  Different TARGET_MIN_VLEN, different M1 mode. It's going to be a big 
> change in RISC-V backend.
 
Hmm I haven't really thought this through yet (nor checked the spec
in detail) but isn't the result always a 1-element thing?  I.e. a
V1SI regardless of the input vlen?  That would also mean various
changes of course.
 
Regards
Robin
 


Re: [PATCH] treat argp-based mem as frame related in dse

2023-12-11 Thread Jiufu Guo


Hi,

Thanks for your quick reply!

Jeff Law  writes:

> On 12/10/23 20:07, Jiufu Guo wrote:
>
> I'm having a bit of a hard time convincing myself this is correct
> though.  I can't see how rewriting the load to read the source of the
> prior store is unsafe.  If that fixes a problem, then it would seem
> like we've gone wrong before here -- perhaps failing to use the fusage
> loads to "kill" any available stores to the same or aliased memory
> locations.
 As you said for the latter, the call's fusage would kill the previous
 store. It is a case like:

 134: [argp:SI+0x8]=r134:SI
 135: [argp:SI+0x4]=0x1
 136: [argp:SI]=r132:SI
 137: ax:SI=call [`memset'] argc:0xc
 REG_CALL_DECL `memset'
 REG_EH_REGION 0

 This call insn is:
 (call_insn/j 137 136 147 27 (set (reg:SI 0 ax)
   (call (mem:QI (symbol_ref:SI ("memset") [flags 0x41]  
 ) [0 __builtin_memset S1 A8])
   (const_int 12 [0xc]))) "pr102798.c":23:22 1086 
 {*sibcall_value}
(expr_list:REG_UNUSED (reg:SI 0 ax)
   (expr_list:REG_CALL_DECL (symbol_ref:SI ("memset") [flags 0x41]  
 )
   (expr_list:REG_EH_REGION (const_int 0 [0])
   (nil
   (expr_list:SI (use (mem/f:SI (reg/f:SI 16 argp) [0  S4 A32]))
   (expr_list:SI (use (mem:SI (plus:SI (reg/f:SI 16 argp) 
 (const_int 4 [0x4])) [0  S4 A32]))
   (expr_list:SI (use (mem:SI (plus:SI (reg/f:SI 16 argp) 
 (const_int 8 [0x8])) [0  S4 A32]))
   (nil)

 The stores in "insns 134-136" are used by the call. "check_mem_read_rtx"
 would prevent them from being eliminated.
>>> Right.  But unless I read something wrong, the patch wasn't changing
>>> store removal, it was changing whether or not we forwarded the source
>>> of the store into the destination of a subsequent load from the same
>>> address.
>> "check_mem_read_rtx" has another behavior which checks the mem
>> and adds read_info to insn_info->read_rec. "read_rec" could prevent
>> the "store" from being eliminated during the dse's global alg. This
>> patch leverages this behavior.
>> And to avoid the "mem on fusage" being replaced by the leading store's rhs,
>> "replace_read" was disabled if the mem is on the call's fusage.
> Ah, so not only do we want to avoid the call to replace_read, but also avoid 
> the early return.
>
> By avoiding the early return, we proceed into later code which "kills"
> the tracked store, thus avoiding the problem.  Right?
It is similar, I would say.  There is "leading code" as below:
  /* Look at all of the uses in the insn.  */
  note_uses (&PATTERN (insn), check_mem_read_use, bb_info);

This checks possible loads in the "insn" and "kills" the tracked
stores if needed.
But "note_uses" does not check the fusage of the call insn.
So, this patch runs "check_mem_read" for the "use mem"s
on the fusage.

BR,
Jeff (Jiufu Guo)

>
> jeff


[patch,avr] PR112944: Support .rodata in RAM for AVR64* and AVR128* devices

2023-12-11 Thread Georg-Johann Lay

This is a patch that locates .rodata in flash for some AVR
devices that can support it.  All new functionality depends
on Binutils PR31124 and is switched on by configure checks
for the new emulations.

https://sourceware.org/PR31124

For explanation of the gcc part see commit message below.

Most of the patch is adjusting device-specs generation.

Ok for master?


Johann

--

avr: Support .rodata in Flash for AVR64* and AVR128* Devices.

These devices see a 32 KiB block of their program memory (flash) in
the RAM address space.  This can be used to support .rodata in flash
provided Binutils supports PR31124 (Add new emulations which locate
.rodata in flash).  This patch does the following:

* configure checks availability of Binutils PR31124.

* Add new command line options -mrodata-in-ram and -flmap.
While -flmap is for internal usage (communicating hardware properties
to the compiler proper), -mrodata-in-ram is a user-space option that
allows returning to the current rodata-in-ram layout.

* Adjust gen-avr-mmcu-specs.cc so that specs are generated that sanity
check options, and that translate -m[no-]rodata-in-ram to its emulation.

* Objects in .rodata don't drag __do_copy_data.

* Document new options and built-in macros.

PR target/112944

gcc/
* configure.ac [target=avr]: Check availability of emulations
avrxmega2_flmap and avrxmega4_flmap, resulting in new config vars
HAVE_LD_AVR_AVRXMEGA2_FLMAP and HAVE_LD_AVR_AVRXMEGA4_FLMAP.
* configure: Regenerate.
* config.in: Regenerate.
* doc/invoke.texi (AVR Options): Document -mflmap, -mrodata-in-ram,
__AVR_HAVE_FLMAP__, __AVR_RODATA_IN_RAM__.
* doc/avr-mmcu.texi: Regenerate.

* gcc/config/avr/avr.opt (-mflmap, -mrodata-in-ram): New options.
* config/avr/avr-arch.h (enum avr_device_specific_features):
Add AVR_ISA_FLMAP.
* config/avr/avr-mcus.def (AVR_MCU) [avr64*, avr128*]: Set isa flag
AVR_ISA_FLMAP.
* gcc/config/avr/avr.cc (avr_arch_index, avr_has_rodata_p): New vars.
(avr_set_core_architecture): Set avr_arch_index.
(have_avrxmega2_flmap, have_avrxmega4_flmap)
(have_avrxmega3_rodata_in_flash): Set new static const bool according
to configure results.
(avr_rodata_in_flash_p): New function.
(avr_asm_init_sections): Let readonly_data_section->unnamed.callback
track avr_need_copy_data_p only if not avr_rodata_in_flash_p().
(avr_asm_named_section): Track avr_has_rodata_p.
(avr_file_end): Emit __do_copy_data also when avr_has_rodata_p
and not avr_rodata_in_flash_p ().
* config/avr/specs.h (CC1_SPEC): Add %(cc1_rodata_in_ram).
(LINK_SPEC): Add %(link_rodata_in_ram).
(LINK_ARCH_SPEC): Remove.
* gcc/config/avr/gen-avr-mmcu-specs.cc (have_avrxmega3_rodata_in_flash)
(have_avrxmega2_flmap, have_avrxmega4_flmap): Set new static
const bool according to configure results.
(diagnose_mrodata_in_ram): New function.
(print_mcu): Generate specs with the following changes:
<*cc1_misc, *asm_misc, *link_misc>: New specs so that we don't
need to extend avr/specs.h each time we add a new bell or whistle.
<*cc1_rodata_in_ram, *link_rodata_in_ram>: New specs to diagnose
-m[no-]rodata-in-ram.
<*cpp_rodata_in_ram>: New. Does -D__AVR_RODATA_IN_RAM__=0/1.
<*cpp_mcu>: Add -D__AVR_AVR_FLMAP__ if it applies.
<*cpp>: Add %(cpp_rodata_in_ram).
<*link_arch>: Use emulation avrxmega2_flmap, avrxmega4_flmap as needed.
<*self_spec>: Add -mflmap or %

diff --git a/gcc/config.in b/gcc/config.in
index fa40825d6d0..18e0538af30 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -1673,6 +1673,12 @@
 #endif
 
 
+/* Define if your linker supports emulation avrxmega2_flmap. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_LD_AVR_AVRXMEGA2_FLMAP
+#endif
+
+
 /* Define if your default avr linker script for avrxmega3 leaves .rodata in
flash. */
 #ifndef USED_FOR_TARGET
@@ -1680,6 +1686,12 @@
 #endif
 
 
+/* Define if your linker supports emulation avrxmega4_flmap. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_LD_AVR_AVRXMEGA4_FLMAP
+#endif
+
+
 /* Define if your linker supports -z bndplt */
 #ifndef USED_FOR_TARGET
 #undef HAVE_LD_BNDPLT_SUPPORT
diff --git a/gcc/config/avr/avr-arch.h b/gcc/config/avr/avr-arch.h
index 79445fe7987..9ef187a1fd1 100644
--- a/gcc/config/avr/avr-arch.h
+++ b/gcc/config/avr/avr-arch.h
@@ -166,7 +166,35 @@ AVR_ISA_RCALL
   assume these instructions are not available and we set the built-in
   macro __AVR_HAVE_JMP_CALL__ accordingly.  This macro is used to
   determine a rough estimate of flash size in libgcc, and AVR-LibC uses
-  this macro to determine vector sizes.  */
+  this macro to determine vector sizes.
+
+AVR_ISA_FLMAP
+  The device has the NVMCTRL_CTRLB.FLMAP bitfield.  The value of FLMAP
+  determines which 32 KiB segment of the program memory (flash) is v

RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code

2023-12-11 Thread Richard Biener
On Mon, 11 Dec 2023, Tamar Christina wrote:

> > > >
> > > > Hmm, but we're visiting them then?  I wonder how you get along
> > > > without doing adjustments on the uses if you consider
> > > >
> > > > _1 = a < b;
> > > > _2 = c != d;
> > > > _3 = _1 | _2;
> > > > if (_3 != 0)
> > > >   exit loop;
> > > >
> > > > thus a combined condition like
> > > >
> > > > if (a < b || c != d)
> > > >
> > > > that we if-converted.  We need to recognize that _1, _2 and _3 have
> > > > mask uses and thus possibly adjust them.
> > > >
> > > > What bad happens if you drop 'analyze_only'?  We're not really
> > > > rewriting anything there.
> > >
> > > You mean drop it only in the above? We then fail to update the type for
> > > the gcond.  So in certain circumstances like with
> > >
> > > int a, c, d;
> > > short b;
> > >
> > > int
> > > main ()
> > > {
> > >   int e[1];
> > >   for (; b < 2; b++)
> > > {
> > >   a = 0;
> > >   if (b == 28378)
> > > a = e[b];
> > >   if (!(d || b))
> > > for (; c;)
> > >   ;
> > > }
> > >   return 0;
> > > }
> > >
> > > Unless we walk the statements regardless of whether they come from inside 
> > > the
> > loop or not.
> > 
> > What do you mean by "fail to update the type for the gcond"?  If
> > I understood correctly the 'analyze_only' short-cuts some
> > checks, it doesn't add some?
> > 
> > But it's hard to follow what's actually done for a gcond ...
> > 
> 
> Yes, so I had realized I had misunderstood what this pattern was doing, and once
> I had made the first wrong change it snowballed.
> 
> This is an updated patch where the only modification made is to 
> check_bool_pattern
> to also return the type of the overall expression even if we are going to 
> handle the
> conditional through an optab expansion.  I'm piggybacking on the fact that 
> this function
> has seen enough of the operands to be able to tell the precision needed when 
> vectorizing.
> 
> This is needed because in the cases where the condition to the gcond was 
> already a bool,
> the precision would be 1 bit; to find the actual mask we have to dig 
> through the
> operands, which this function already does.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no 
> issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
>   (check_bool_pattern, vect_recog_bool_pattern): Support gconds type
>   analysis.
>   * tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
>   lhs.
>   (vectorizable_early_exit): New.
>   (vect_analyze_stmt, vect_transform_stmt): Use it.
>   (vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 
> 7debe7f0731673cd1bf25cd39d55e23990a73d0e..6bf1c0aba8ce94f70ce4e952efd1c5695b189690
>  100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple 
> *pattern_stmt,
>if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
>  {
>gcc_assert (!vectype
> +   || is_a  (pattern_stmt)
> || (VECTOR_BOOLEAN_TYPE_P (vectype)
> == vect_use_mask_type_p (orig_stmt_info)));
>STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> @@ -5210,10 +5211,12 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
> true if bool VAR can and should be optimized that way.  Assume it 
> shouldn't
> in case it's a result of a comparison which can be directly vectorized 
> into
> a vector comparison.  Fills in STMTS with all stmts visited during the
> -   walk.  */
> +   walk.  if VECTYPE then this value will contain the common type of the
> +   operations making up the comparisons.  */
>  
>  static bool
> -check_bool_pattern (tree var, vec_info *vinfo, hash_set &stmts)
> +check_bool_pattern (tree var, vec_info *vinfo, hash_set &stmts,
> + tree *vectype)
>  {
>tree rhs1;
>enum tree_code rhs_code;
> @@ -5234,27 +5237,28 @@ check_bool_pattern (tree var, vec_info *vinfo, 
> hash_set &stmts)
>switch (rhs_code)
>  {
>  case SSA_NAME:
> -  if (! check_bool_pattern (rhs1, vinfo, stmts))
> +  if (! check_bool_pattern (rhs1, vinfo, stmts, vectype))
>   return false;
>break;
>  
>  CASE_CONVERT:
>if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
>   return false;
> -  if (! check_bool_pattern (rhs1, vinfo, stmts))
> +  if (! check_bool_pattern (rhs1, vinfo, stmts, vectype))
>   return false;
>break;
>  
>  case BIT_NOT_EXPR:
> -  if (! check_bool_pattern (rhs1, vinfo, stmts))
> +  if (! check_bool_pattern (rhs1, vinfo, stmts, vectype))
>   return false;
>break;
>  
>  case BIT_AND_EXPR:
>  case BIT_IOR_EXPR:
> 

[PATCH 0/3] RISC-V: vectorised memory operations

2023-12-11 Thread Sergei Lewis
This patchset permits generation of inlined vectorised code for movmem, 
setmem and cmpmem, if and only if the operation size is 
at least one and at most eight vector registers' worth of data.

Further vectorisation rapidly becomes debatable due to code size concerns;
however, for these simple cases we do have an unambiguous performance win 
without sacrificing too much code size compared to a libc call.

Signed-off-by: Sergei Lewis 

---

Sergei Lewis (3):
  RISC-V: movmem for RISCV with V extension
  RISC-V: setmem for RISCV with V extension
  RISC-V: cmpmem for RISCV with V extension

 gcc/config/riscv/riscv-protos.h   |   2 +
 gcc/config/riscv/riscv-string.cc  | 193 ++
 gcc/config/riscv/riscv.md |  51 +
 .../gcc.target/riscv/rvv/base/cmpmem-1.c  |  85 
 .../gcc.target/riscv/rvv/base/cmpmem-2.c  |  69 +++
 .../gcc.target/riscv/rvv/base/movmem-1.c  |  59 ++
 .../gcc.target/riscv/rvv/base/setmem-1.c  |  99 +
 7 files changed, 558 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c

-- 
2.34.1



[PATCH 2/3] RISC-V: setmem for RISCV with V extension

2023-12-11 Thread Sergei Lewis
gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_vector::expand_vec_setmem): New
function declaration.

* config/riscv/riscv-string.cc (riscv_vector::expand_vec_setmem): New
function: this generates an inline vectorised memory set, if and only if we
know the entire operation can be performed in a single vector store.

* config/riscv/riscv.md (setmem): Try riscv_vector::expand_vec_setmem
for constant lengths.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/setmem-1.c: New tests.
---
 gcc/config/riscv/riscv-protos.h   |  1 +
 gcc/config/riscv/riscv-string.cc  | 82 +++
 gcc/config/riscv/riscv.md | 14 +++
 .../gcc.target/riscv/rvv/base/setmem-1.c  | 99 +++
 4 files changed, 196 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 20bbb5b859c..950cb65c910 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -560,6 +560,7 @@ void expand_popcount (rtx *);
 void expand_rawmemchr (machine_mode, rtx, rtx, rtx, bool = false);
 bool expand_strcmp (rtx, rtx, rtx, rtx, unsigned HOST_WIDE_INT, bool);
 void emit_vec_extract (rtx, rtx, poly_int64);
+bool expand_vec_setmem (rtx, rtx, rtx, rtx);
 
 /* Rounding mode bitfield for fixed point VXRM.  */
 enum fixed_point_rounding_mode
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 11c1f74d0b3..0abbd5f8b28 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -1247,4 +1247,86 @@ expand_strcmp (rtx result, rtx src1, rtx src2, rtx 
nbytes,
   return true;
 }
 
+
+/* Select appropriate LMUL for a single vector operation based on
+   byte size of data to be processed.
+   On success, return true and populate lmul_out.
+   If length_in is too wide for a single vector operation, return false
+   and leave lmul_out unchanged.  */
+
+static bool
+select_appropriate_lmul (HOST_WIDE_INT length_in,
+HOST_WIDE_INT &lmul_out)
+{
+  /* if it's tiny, default operation is likely better; maybe worth
+ considering fractional lmul in the future as well.  */
+  if (length_in < (TARGET_MIN_VLEN/8))
+return false;
+
+  /* find smallest lmul large enough for entire op.  */
+  HOST_WIDE_INT lmul = 1;
+  while ((lmul <= 8) && (length_in > ((lmul*TARGET_MIN_VLEN)/8)))
+{
+  lmul <<= 1;
+}
+
+  if (lmul > 8)
+return false;
+
+  lmul_out = lmul;
+  return true;
+}
+
+/* Used by setmemdi in riscv.md.  */
+bool
+expand_vec_setmem (rtx dst_in, rtx length_in, rtx fill_value_in,
+ rtx alignment_in)
+{
+  /* we're generating vector code.  */
+  if (!TARGET_VECTOR)
+return false;
+  /* if we can't reason about the length, let libc handle the operation.  */
+  if (!CONST_INT_P (length_in))
+return false;
+
+  HOST_WIDE_INT length = INTVAL (length_in);
+  HOST_WIDE_INT lmul;
+
+  /* select an lmul such that the data just fits into one vector operation;
+ bail if we can't.  */
+  if (!select_appropriate_lmul (length, lmul))
+return false;
+
+  machine_mode vmode = riscv_vector::get_vector_mode (QImode,
+ BYTES_PER_RISCV_VECTOR * lmul).require ();
+  rtx dst_addr = copy_addr_to_reg (XEXP (dst_in, 0));
+  rtx dst = change_address (dst_in, vmode, dst_addr);
+
+  rtx fill_value = gen_reg_rtx (vmode);
+  rtx broadcast_ops[] = {fill_value, fill_value_in};
+
+  /* If the length is exactly vlmax for the selected mode, do that.
+ Otherwise, use a predicated store.  */
+  if (known_eq (GET_MODE_SIZE (vmode), INTVAL (length_in)))
+{
+  emit_vlmax_insn (code_for_pred_broadcast (vmode),
+ UNARY_OP, broadcast_ops);
+  emit_move_insn (dst, fill_value);
+}
+  else
+{
+  if (!satisfies_constraint_K (length_in))
+ length_in = force_reg (Pmode, length_in);
+  emit_nonvlmax_insn (code_for_pred_broadcast (vmode), UNARY_OP,
+ broadcast_ops, length_in);
+  machine_mode mask_mode = riscv_vector::get_vector_mode
+ (BImode, GET_MODE_NUNITS (vmode)).require ();
+  rtx mask = CONSTM1_RTX (mask_mode);
+  emit_insn (gen_pred_store (vmode, dst, mask, fill_value, length_in,
+ get_avl_type_rtx (riscv_vector::NONVLMAX)));
+}
+
+  return true;
+}
+
 }
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 88fde290a8a..29d3b1aa342 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2381,6 +2381,20 @@
 FAIL;
 })
 
+(define_expand "setmemsi"
+  [(set (match_operand:BLK 0 "memory_operand") ;; Dest
+ (match_operand:QI  2 "nonmemory_operand")) ;; Value
+   (use (match_operand:SI  1 "const_int_operand")) ;; Length
+   (match_operand:SI   3 "const_int_operand")] ;; Align
+  "TARGET_VECTOR"
+{
+  if (riscv_vector::expand_vec_setmem (operands[0], operands[1], operands[2],
+  operands[3]))
+DO

[PATCH 1/3] RISC-V: movmem for RISCV with V extension

2023-12-11 Thread Sergei Lewis
gcc/ChangeLog:

* config/riscv/riscv.md (movmem): Use riscv_vector::expand_block_move,
if and only if we know the entire operation can be performed using one
vector load followed by one vector store.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/movmem-1.c: New test.
---
 gcc/config/riscv/riscv.md | 22 +++
 .../gcc.target/riscv/rvv/base/movmem-1.c  | 59 +++
 2 files changed, 81 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index eed997116b0..88fde290a8a 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2359,6 +2359,28 @@
 FAIL;
 })
 
+;; inlining general memmove is a pessimisation: we can't avoid having to decide
+;; which direction to go at runtime, which is costly in instruction count
+;; however for situations where the entire move fits in one vector operation
+;; we can do all reads before doing any writes so we don't have to worry
+;; so generate the inline vector code in such situations
+;; nb. prefer scalar path for tiny memmoves
+(define_expand "movmem"
+  [(parallel [(set (match_operand:BLK 0 "general_operand")
+  (match_operand:BLK 1 "general_operand"))
+ (use (match_operand:P 2 ""))
+ (use (match_operand:SI 3 "const_int_operand"))])]
+  "TARGET_VECTOR"
+{
+  if ((INTVAL (operands[2]) >= TARGET_MIN_VLEN/8)
+   && (INTVAL (operands[2]) <= TARGET_MIN_VLEN)
+   && riscv_vector::expand_block_move (operands[0], operands[1],
+operands[2]))
+DONE;
+  else
+FAIL;
+})
+
 ;; Expand in-line code to clear the instruction cache between operand[0] and
 ;; operand[1].
 (define_expand "clear_cache"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c
new file mode 100644
index 000..b930241ae5d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c
@@ -0,0 +1,59 @@
+/* { dg-do compile } */
+/* { dg-add-options riscv_v } */
+/* { dg-additional-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <string.h>
+
+#define MIN_VECTOR_BYTES (__riscv_v_min_vlen/8)
+
+/* tiny memmoves should not be vectorised
+** f1:
+**  li\s+a2,15
+**  tail\s+memmove
+*/
+char * f1 (char *a, char const *b)
+{
+  return memmove (a, b, 15);
+}
+
+/* vectorise+inline minimum vector register width with LMUL=1
+** f2:
+**  (
+**  vsetivli\s+zero,16,e8,m1,ta,ma
+**  |
+**  li\s+[ta][0-7],\d+
+**  vsetvli\s+zero,[ta][0-7],e8,m1,ta,ma
+**  )
+**  vle8\.v\s+v\d+,0\(a1\)
+**  vse8\.v\s+v\d+,0\(a0\)
+**  ret
+*/
+char * f2 (char *a, char const *b)
+{
+  return memmove (a, b, MIN_VECTOR_BYTES);
+}
+
+/* vectorise+inline up to LMUL=8
+** f3:
+**  li\s+[ta][0-7],\d+
+**  vsetvli\s+zero,[ta][0-7],e8,m8,ta,ma
+**  vle8\.v\s+v\d+,0\(a1\)
+**  vse8\.v\s+v\d+,0\(a0\)
+**  ret
+*/
+char * f3 (char *a, char const *b)
+{
+  return memmove (a, b, MIN_VECTOR_BYTES*8);
+}
+
+/* don't vectorise if the move is too large for one operation
+** f4:
+**  li\s+a2,\d+
+**  tail\s+memmove
+*/
+char * f4 (char *a, char const *b)
+{
+  return memmove (a, b, MIN_VECTOR_BYTES*8+1);
+}
+
-- 
2.34.1



[PATCH 3/3] RISC-V: cmpmem for RISCV with V extension

2023-12-11 Thread Sergei Lewis
gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_vector::expand_vec_cmpmem): New
function declaration.

* config/riscv/riscv-string.cc (riscv_vector::expand_vec_cmpmem): New
function; this generates an inline vectorised memory compare, if and only if
we know the entire operation can be performed in a single vector load per
input.

* config/riscv/riscv.md (cmpmemsi): Try riscv_vector::expand_vec_cmpmem for
constant lengths.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/cmpmem-1.c: New codegen tests.
* gcc.target/riscv/rvv/base/cmpmem-2.c: New execution tests.
---
 gcc/config/riscv/riscv-protos.h   |   1 +
 gcc/config/riscv/riscv-string.cc  | 111 ++
 gcc/config/riscv/riscv.md |  15 +++
 .../gcc.target/riscv/rvv/base/cmpmem-1.c  |  85 ++
 .../gcc.target/riscv/rvv/base/cmpmem-2.c  |  69 +++
 5 files changed, 281 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-2.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 950cb65c910..72378438552 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -561,6 +561,7 @@ void expand_rawmemchr (machine_mode, rtx, rtx, rtx, bool = 
false);
 bool expand_strcmp (rtx, rtx, rtx, rtx, unsigned HOST_WIDE_INT, bool);
 void emit_vec_extract (rtx, rtx, poly_int64);
 bool expand_vec_setmem (rtx, rtx, rtx, rtx);
+bool expand_vec_cmpmem (rtx, rtx, rtx, rtx);
 
 /* Rounding mode bitfield for fixed point VXRM.  */
 enum fixed_point_rounding_mode
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 0abbd5f8b28..6128565310b 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -1329,4 +1329,115 @@ expand_vec_setmem (rtx dst_in, rtx length_in, rtx 
fill_value_in,
   return true;
 }
 
+
+/* Used by cmpmemsi in riscv.md.  */
+
+bool
+expand_vec_cmpmem (rtx result_out, rtx blk_a_in, rtx blk_b_in, rtx length_in)
+{
+  /* we're generating vector code.  */
+  if (!TARGET_VECTOR)
+return false;
+  /* if we can't reason about the length, let libc handle the operation.  */
+  if (!CONST_INT_P (length_in))
+return false;
+
+  HOST_WIDE_INT length = INTVAL (length_in);
+  HOST_WIDE_INT lmul;
+
+  /* select an lmul such that the data just fits into one vector operation;
+ bail if we can't.  */
+  if (!select_appropriate_lmul (length, lmul))
+return false;
+
+  /* strategy:
+ load entire blocks at a and b into vector regs
+ generate mask of bytes that differ
+ find first set bit in mask
+ find offset of first set bit in mask, use 0 if none set
+ result is ((char*)a[offset] - (char*)b[offset])
+   */
+
+  machine_mode vmode = riscv_vector::get_vector_mode (QImode,
+   BYTES_PER_RISCV_VECTOR * lmul).require ();
+  rtx blk_a_addr = copy_addr_to_reg (XEXP (blk_a_in, 0));
+  rtx blk_a = change_address (blk_a_in, vmode, blk_a_addr);
+  rtx blk_b_addr = copy_addr_to_reg (XEXP (blk_b_in, 0));
+  rtx blk_b = change_address (blk_b_in, vmode, blk_b_addr);
+
+  rtx vec_a = gen_reg_rtx (vmode);
+  rtx vec_b = gen_reg_rtx (vmode);
+
+  machine_mode mask_mode = get_mask_mode (vmode);
+  rtx mask = gen_reg_rtx (mask_mode);
+  rtx mismatch_ofs = gen_reg_rtx (Pmode);
+
+  rtx ne = gen_rtx_NE (mask_mode, vec_a, vec_b);
+  rtx vmsops[] = {mask, ne, vec_a, vec_b};
+  rtx vfops[] = {mismatch_ofs, mask};
+
+  /* If the length is exactly vlmax for the selected mode, do that.
+ Otherwise, use predicated loads.  */
+
+  if (known_eq (GET_MODE_SIZE (vmode), INTVAL (length_in)))
+{
+  emit_move_insn (vec_a, blk_a);
+  emit_move_insn (vec_b, blk_b);
+  emit_vlmax_insn (code_for_pred_cmp (vmode),
+ riscv_vector::COMPARE_OP, vmsops);
+
+  emit_vlmax_insn (code_for_pred_ffs (mask_mode, Pmode),
+ riscv_vector::CPOP_OP, vfops);
+}
+  else
+{
+  if (!satisfies_constraint_K (length_in))
+ length_in = force_reg (Pmode, length_in);
+
+  rtx memmask = CONSTM1_RTX (mask_mode);
+
+  rtx m_ops_a[] = {vec_a, memmask, blk_a};
+  rtx m_ops_b[] = {vec_b, memmask, blk_b};
+
+  emit_nonvlmax_insn (code_for_pred_mov (vmode),
+ riscv_vector::UNARY_OP_TAMA, m_ops_a, length_in);
+  emit_nonvlmax_insn (code_for_pred_mov (vmode),
+ riscv_vector::UNARY_OP_TAMA, m_ops_b, length_in);
+
+  emit_nonvlmax_insn (code_for_pred_cmp (vmode),
+ riscv_vector::COMPARE_OP, vmsops, length_in);
+
+  emit_nonvlmax_insn (code_for_pred_ffs (mask_mode, Pmode),
+ riscv_vector::CPOP_OP, vfops, length_in);
+}
+
+  /* mismatch_ofs is -1 if blocks match, or the offset of
+ the first mismatch otherwise.  */
+rtx ltz = gen_reg_rtx (Xmode);
+emit_insn (gen_slt_3 (LT, Xmode, Xmode, ltz, mismatch_ofs, const0_rtx

[PATCH 2/3] RISC-V: setmem for RISCV with V extension

2023-12-11 Thread juzhe.zh...@rivai.ai
Hi, Thanks for contributing this.

+/* Select appropriate LMUL for a single vector operation based on
+   byte size of data to be processed.
+   On success, return true and populate lmul_out.
+   If length_in is too wide for a single vector operation, return false
+   and leave lmul_out unchanged.  */
+
+static bool
+select_appropriate_lmul (HOST_WIDE_INT length_in,
+HOST_WIDE_INT &lmul_out)
+{
I don't think we need this, you only need to use TARGET_MAX_LMUL




juzhe.zh...@rivai.ai


[committed] d: Merge upstream dmd, druntime 2bbf64907c, phobos b64bfbf91

2023-12-11 Thread Iain Buclaw
Hi,

This patch merges the D front-end and runtime library with upstream dmd
2bbf64907c, and the standard library with phobos b64bfbf91.

Synchronizing with the upstream release of v2.106.0.

D front-end changes:

- Import dmd v2.106.0.

D runtime changes:

- Import druntime v2.106.0.

Phobos changes:

- Import phobos v2.106.0.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32, committed
to mainline.

Regards,
Iain.

---
gcc/d/ChangeLog:

* Make-lang.in (D_FRONTEND_OBJS): Rename d/common-string.o to
d/common-smallbuffer.o.
* dmd/MERGE: Merge upstream dmd 2bbf64907c.
* dmd/VERSION: Bump version to v2.106.0.
* modules.cc (layout_moduleinfo_fields): Update for new front-end
interface.
(layout_moduleinfo): Likewise.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime 2bbf64907c.
* src/MERGE: Merge upstream phobos b64bfbf91.
---
 gcc/d/Make-lang.in|   2 +-
 gcc/d/dmd/MERGE   |   2 +-
 gcc/d/dmd/VERSION |   2 +-
 gcc/d/dmd/aggregate.d |  10 -
 gcc/d/dmd/aggregate.h |   1 -
 gcc/d/dmd/attrib.d|  67 --
 gcc/d/dmd/attrib.h|   9 -
 gcc/d/dmd/canthrow.d  |   2 +-
 gcc/d/dmd/common/README.md|   2 +-
 gcc/d/dmd/common/file.d   |  15 +-
 gcc/d/dmd/common/{string.d => smallbuffer.d}  |  49 ++--
 gcc/d/dmd/cparse.d|   8 +
 gcc/d/dmd/dcast.d |  12 +-
 gcc/d/dmd/denum.d |   7 -
 gcc/d/dmd/dimport.d   |  16 --
 gcc/d/dmd/dmodule.d   |  36 ++-
 gcc/d/dmd/dsymbol.d   | 172 --
 gcc/d/dmd/dsymbol.h   |   5 +-
 gcc/d/dmd/dsymbolsem.d| 214 +
 gcc/d/dmd/dtemplate.d |   7 +-
 gcc/d/dmd/enum.h  |   1 -
 gcc/d/dmd/escape.d|   2 +-
 gcc/d/dmd/expressionsem.d |   2 +-
 gcc/d/dmd/hdrgen.d|  27 +++
 gcc/d/dmd/import.h|   1 -
 gcc/d/dmd/initsem.d   |  20 +-
 gcc/d/dmd/module.h|   1 +
 gcc/d/dmd/nspace.d|  14 --
 gcc/d/dmd/nspace.h|   1 -
 gcc/d/dmd/parse.d |  12 +-
 gcc/d/dmd/root/file.d |   2 +-
 gcc/d/dmd/root/filename.d |   4 +-
 gcc/d/dmd/root/speller.d  |   2 +-
 gcc/d/dmd/root/string.d   |   2 +-
 gcc/d/dmd/typesem.d   |  58 +
 gcc/d/modules.cc  |   4 +-
 .../fail_compilation/misc_parser_err_cov1.d   |   2 +-
 gcc/testsuite/gdc.test/runnable/dbitfields.d  |  34 +++
 libphobos/libdruntime/MERGE   |   2 +-
 libphobos/libdruntime/core/cpuid.d|   7 +-
 libphobos/src/MERGE   |   2 +-
 libphobos/src/std/algorithm/searching.d   | 218 +++---
 libphobos/src/std/conv.d  |   5 +-
 libphobos/src/std/range/package.d |  24 +-
 libphobos/src/std/uni/package.d   |  12 +
 45 files changed, 579 insertions(+), 518 deletions(-)
 rename gcc/d/dmd/common/{string.d => smallbuffer.d} (82%)

diff --git a/gcc/d/Make-lang.in b/gcc/d/Make-lang.in
index b3007a96bd0..a0d4d7cbeb4 100644
--- a/gcc/d/Make-lang.in
+++ b/gcc/d/Make-lang.in
@@ -95,7 +95,7 @@ D_FRONTEND_OBJS = \
d/common-bitfields.o \
d/common-file.o \
d/common-outbuffer.o \
-   d/common-string.o \
+   d/common-smallbuffer.o \
d/compiler.o \
d/cond.o \
d/constfold.o \
diff --git a/gcc/d/dmd/MERGE b/gcc/d/dmd/MERGE
index aa0062c10eb..5edcee1c84d 100644
--- a/gcc/d/dmd/MERGE
+++ b/gcc/d/dmd/MERGE
@@ -1,4 +1,4 @@
-ff57fec51558013b25cadb7e83da9f4675915d56
+2bbf64907cbbb483d003e0a8fcf8b502e4883799
 
 The first line of this file holds the git revision number of the last
 merge done from the dlang/dmd repository.
diff --git a/gcc/d/dmd/VERSION b/gcc/d/dmd/VERSION
index 41fdc654b14..8c95cd04f80 100644
--- a/gcc/d/dmd/VERSION
+++ b/gcc/d/dmd/VERSION
@@ -1 +1 @@
-v2.106.0-rc.1
+v2.106.0
diff --git a/gcc/d/dmd/aggregate.d b/gcc/d/dmd/aggregate.d
index 307bb0171c4..352ca88f470 100644
--- a/gcc/d/dmd/aggregate.d
+++ b/gcc/d/dmd/aggregate.d
@@ -178,16 +178,6 @@ extern (C++) abstract class AggregateDeclaration : 
ScopeDsymbol
 return sc2;
 }
 
-override final void setScope(Scope* sc)
-{
-// Might need a scope to resolve forward references. The check for
-// semanticRun prevents unnecessary setting of _scope

Re: [PATCH 1/3] RISC-V: movmem for RISCV with V extension

2023-12-11 Thread Robin Dapp
Hi Sergei,

thanks for contributing this!

Small general remarks/nits upfront:

The code looks like it hasn't been run through clang-format or
similar.  Please make sure that it adheres to the GNU coding
conventions.  The same applies to comments.  Some of them start
in lowercase.

As you rely on the vector length, please make sure to test various
combinations (also "exotic" ones) like zve32 and zve64.
Also, please specify which configurations it has been tested on. 

> * config/riscv/riscv.md (movmem): Use 
> riscv_vector::expand_block_move,
> if and only if we know the entire operation can be performed using one 
> vector
> load followed by one vector store
> 
> gcc/testsuite/ChangeLog
> 
> * gcc.target/riscv/rvv/base/movmem-1.c: New test

Please add a PR target/112109 here.  I believe after these
patches have landed we can close that bug.

> ---
>  gcc/config/riscv/riscv.md | 22 +++
>  .../gcc.target/riscv/rvv/base/movmem-1.c  | 59 +++
>  2 files changed, 81 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c
> 
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index eed997116b0..88fde290a8a 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -2359,6 +2359,28 @@
>  FAIL;
>  })
>  
> +;; inlining general memmove is a pessimisation: we can't avoid having to 
> decide
> +;; which direction to go at runtime, which is costly in instruction count
> +;; however for situations where the entire move fits in one vector operation
> +;; we can do all reads before doing any writes so we don't have to worry
> +;; so generate the inline vector code in such situations
> +;; nb. prefer scalar path for tiny memmoves
> +(define_expand "movmem"
> +  [(parallel [(set (match_operand:BLK 0 "general_operand")
> +  (match_operand:BLK 1 "general_operand"))
> +   (use (match_operand:P 2 ""))
> +   (use (match_operand:SI 3 "const_int_operand"))])]
> +  "TARGET_VECTOR"
> +{
> +  if ((INTVAL (operands[2]) >= TARGET_MIN_VLEN/8)

If operands[2] is used as an int we need to make sure the constraint
says so.  Shouldn't operand[1] be a memory_operand?

> + && (INTVAL (operands[2]) <= TARGET_MIN_VLEN)
> + && riscv_vector::expand_block_move (operands[0], operands[1],
> +  operands[2]))
> +DONE;
> +  else
> +FAIL;
> +})
> +

> +#define MIN_VECTOR_BYTES (__riscv_v_min_vlen/8)
> +
> +/* tiny memmoves should not be vectorised
> +** f1:
> +**  li\s+a2,15
> +**  tail\s+memmove
> +*/
> +char * f1 (char *a, char const *b)
> +{
> +  return memmove (a, b, 15);
> +}

The < 16 assumption might not hold on embedded targets.
Same with the other tests.

Regards
 Robin



Re: [committed] d: Merge upstream dmd, druntime 2bbf64907c, phobos b64bfbf91

2023-12-11 Thread Iain Buclaw
Excerpts from Iain Buclaw's message of December 11, 2023 11:07 am:
> Hi,
> 
> This patch merges the D front-end and runtime library with upstream dmd
> 2bbf64907c, and the standard library with phobos b64bfbf91.
> 
> Synchronizing with the upstream release of v2.106.0.
> 

...

> diff --git a/gcc/d/dmd/canthrow.d b/gcc/d/dmd/canthrow.d
> index 67305922df6..5a608a9986d 100644
> --- a/gcc/d/dmd/canthrow.d
> +++ b/gcc/d/dmd/canthrow.d
> @@ -22,7 +22,6 @@ import dmd.declaration;
>  import dmd.dsymbol;
>  import dmd.errorsink;
>  import dmd.expression;
> -import dmd.expressionsem;
>  import dmd.func;
>  import dmd.globals;
>  import dmd.init;
> @@ -81,6 +80,7 @@ CT canThrow(Expression e, FuncDeclaration func, ErrorSink 
> eSink)
>  if (!f.isDtorDeclaration())
>  errorSupplementalInferredAttr(f, 10, false, 
> STC.nothrow_);
>  
> +import dmd.expressionsem : checkOverriddenDtor;
>  f.checkOverriddenDtor(null, e.loc, dd => 
> dd.type.toTypeFunction().isnothrow, "not nothrow");
>  }
>  else if (func)
> 

Hi Rainer,

This specific change that moves an import from toplevel to local should
fix that linker problem when using gdc-9 for bootstrapping.

Iain.


Re: [PATCH 1/3] RISC-V: movmem for RISCV with V extension

2023-12-11 Thread Robin Dapp
Ah, please also make sure to include (and follow) the stringop_strategy
checks (LIBCALL, VECTOR).
The naming is a bit unfortunate still but that need not be fixed
in this patch.

Regards
 Robin


[PATCH]middle-end: Mark all control flow as used_in_scope.

2023-12-11 Thread Tamar Christina
Hi All,

While compiling SPECCPU 2017 I ran across the reason (I had forgotten) why my
initial patch marked all control statements as used in scope and not just
gconds:  There are other statements that can introduce multiple exits, like
switch statements.   If we ignore them as not relevant we never get a chance to
reject them later as not vectorizable.  Because they are marked as not relevant
we crash or produce invalid code.

The fix is to mark all control statements as used in scope, and then we
later reject them as not vectorizable.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-stmts.cc (vect_stmt_relevant_p): Mark all control flow as
used in scope.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-early-break_89.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_89.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_89.c
new file mode 100644
index 
..d33f3d94c096ffc53e4e82a28c3db058633fb21d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_89.c
@@ -0,0 +1,18 @@
+/* { dg-require-effective-target vect_int } */
+
+char *a;
+extern void d();
+void b() {
+  int c = 0;
+  while (c < 16) {
+switch (a[c]) {
+case '"':
+case '\'':
+  c++;
+  continue;
+}
+break;
+  }
+  if (c)
+d();
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 
72f271658e357bd562eb29087735825eb5ab0dc0..98704b7cea8a93f5beae7a55c85085c049e54152
 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -361,7 +361,7 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, 
loop_vec_info loop_vinfo,
 
   /* cond stmt other than loop exit cond.  */
   gimple *stmt = STMT_VINFO_STMT (stmt_info);
-  if (is_a <gcond *> (stmt)
+  if (is_ctrl_stmt (stmt)
   && LOOP_VINFO_LOOP_IV_COND (loop_vinfo) != stmt
   && (!loop->inner || gimple_bb (stmt)->loop_father == loop))
 *relevant = vect_used_in_scope;




-- 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_89.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_89.c
new file mode 100644
index 
..d33f3d94c096ffc53e4e82a28c3db058633fb21d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_89.c
@@ -0,0 +1,18 @@
+/* { dg-require-effective-target vect_int } */
+
+char *a;
+extern void d();
+void b() {
+  int c = 0;
+  while (c < 16) {
+switch (a[c]) {
+case '"':
+case '\'':
+  c++;
+  continue;
+}
+break;
+  }
+  if (c)
+d();
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 
72f271658e357bd562eb29087735825eb5ab0dc0..98704b7cea8a93f5beae7a55c85085c049e54152
 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -361,7 +361,7 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, 
loop_vec_info loop_vinfo,
 
   /* cond stmt other than loop exit cond.  */
   gimple *stmt = STMT_VINFO_STMT (stmt_info);
-  if (is_a <gcond *> (stmt)
+  if (is_ctrl_stmt (stmt)
   && LOOP_VINFO_LOOP_IV_COND (loop_vinfo) != stmt
   && (!loop->inner || gimple_bb (stmt)->loop_father == loop))
 *relevant = vect_used_in_scope;





[PATCH] ada: Fix Ada bootstrap on FreeBSD

2023-12-11 Thread Rainer Orth
Ada bootstrap on FreeBSD/amd64 was also broken by the recent warning
changes:

terminals.c: In function 'allocate_pty_desc':
terminals.c:1200:12: error: implicit declaration of function 'openpty'; did you
mean 'openat'? [-Wimplicit-function-declaration]
 1200 |   status = openpty (&master_fd, &slave_fd, NULL, NULL, NULL);
  |^~~
  |openat

terminals.c: At top level:
terminals.c:1268:9: warning: "TABDLY" redefined
 1268 | #define TABDLY 0
  | ^~
In file included from /usr/include/termios.h:38,
 from terminals.c:1109:
/usr/include/sys/_termios.h:111:9: note: this is the location of the previous 
definition
  111 | #define TABDLY  0x0004  /* tab delay mask */
  | ^~
make[7]: *** [../gcc-interface/Makefile:302: terminals.o] Error 1

Fixed by including the necessary header and guarding the fallback
definition of TABDLY.

This allowed a 64-bit-only bootstrap on x86_64-unknown-freebsd14.0 to
complete successfully.  Multilibbed bootstrap is still broken for
unrelated reasons, cf. PR ada/112958.

Ok for trunk?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2023-12-11  Rainer Orth  

gcc/ada:
* terminals.c [__FreeBSD__]: Include <libutil.h>.
(TABDLY): Only define if missing.

diff --git a/gcc/ada/terminals.c b/gcc/ada/terminals.c
--- a/gcc/ada/terminals.c
+++ b/gcc/ada/terminals.c
@@ -1125,6 +1125,9 @@ void
 #if defined (__APPLE__)
 #   include <util.h>
 #endif
+#if defined (__FreeBSD__)
+#   include <libutil.h>
+#endif
 
 #define CDISABLE _POSIX_VDISABLE
 
@@ -1265,11 +1268,13 @@ allocate_pty_desc (pty_desc **desc) {
 #ifndef NLDLY
 #define NLDLY 0
 #define CRDLY 0
-#define TABDLY 0
 #define BSDLY 0
 #define VTDLY 0
 #define FFDLY 0
 #endif
+#ifndef TABDLY
+#define TABDLY 0
+#endif
 
 /* child_setup_tty - set terminal properties
  *


Re: [PATCH] ada: Fix Ada bootstrap on FreeBSD

2023-12-11 Thread Marc Poulhiès


Rainer Orth  writes:

> Ada bootstrap on FreeBSD/amd64 was also broken by the recent warning
> changes:
>
> terminals.c: In function 'allocate_pty_desc':
> terminals.c:1200:12: error: implicit declaration of function 'openpty'; did 
> you
> mean 'openat'? [-Wimplicit-function-declaration]
>  1200 |   status = openpty (&master_fd, &slave_fd, NULL, NULL, NULL);
>   |^~~
>   |openat
>
> terminals.c: At top level:
> terminals.c:1268:9: warning: "TABDLY" redefined
>  1268 | #define TABDLY 0
>   | ^~
> In file included from /usr/include/termios.h:38,
>  from terminals.c:1109:
> /usr/include/sys/_termios.h:111:9: note: this is the location of the previous 
> definition
>   111 | #define TABDLY  0x0004  /* tab delay mask */
>   | ^~
> make[7]: *** [../gcc-interface/Makefile:302: terminals.o] Error 1
>
> Fixed by including the necessary header and guarding the fallback
> definition of TABDLY.
>
> This allowed a 64-bit-only bootstrap on x86_64-unknown-freebsd14.0 to
> complete successfully.  Multilibbed bootstrap is still broken for
> unrelated reasons, cf. PR ada/112958.

Hello Rainer,

> Ok for trunk?

OK !

Thanks,
Marc


Re: [PATCH] Testsuite, asan, darwin: Adjust output pattern

2023-12-11 Thread Iain Sandoe
Hi FX,

> On 11 Dec 2023, at 08:43, FX Coudert  wrote:
> 
> Since the last import from upstream libsanitizer, the output has changed
> and now looks more like this:
> 
> READ of size 6 at 0x7ff7beb2a144 thread T0
>#0 0x101cf7796 in MemcmpInterceptorCommon(void*, int (*)(void const*, void 
> const*, unsigned long), void const*, void const*, unsigned long) 
> sanitizer_common_interceptors.inc:813
>#1 0x101cf7b99 in memcmp sanitizer_common_interceptors.inc:840
>#2 0x108a0c39f in __stack_chk_guard+0xf (dyld:x86_64+0x8039f)
> 

I see the same on a spot-check of Darwin19.
I guess we just have to keep tweaking as the upstream alters output.

> so let's adjust the pattern accordingly.
> 
> Tested on x86_64-apple-darwin21. OK to push?

OK, 

Iain

> 
> FX
> 
> 
> <0001-Testsuite-asan-darwin-Adjust-output-pattern.patch>



Re: [PATCH] expmed: Perform mask extraction via QImode [PR112773].

2023-12-11 Thread Robin Dapp
What also works is something like:

  scalar_mode extract_mode = innermode;
  if (GET_MODE_CLASS (outermode) == MODE_VECTOR_BOOL)
extract_mode = smallest_int_mode_for_size
  (GET_MODE_PRECISION (innermode));

however

> So yes, I guess we need to answer BImode vs. QImode.  I hope Richard
> has a better idea here?

aarch64's predicate vec_extract is:

(define_expand "vec_extract"
  [(match_operand: 0 "register_operand")
   (match_operand: 1 "register_operand")
   (match_operand:SI 2 "nonmemory_operand")
   ;; Dummy operand to which we can attach the iterator.
   (reg:SVE_FULL_I V0_REGNUM)]

So if I'm reading this correctly they are using the element
mode of the associated full vector mode for extraction rather
than QImode.

I could also do something similar for the riscv backend but
that still wouldn't yield a BImode vec_extract result of course
and expmed would need to be adjusted.  Do we even know the
original associated non-predicate mode here?  I suppose not?

Do we need a mov from/to BImode instead?

Maybe Richard has a good idea.

Even though I haven't seen it being hit, vec_set in expmed
would have the same problem?

Regards
 Robin


Re: [PATCH]middle-end: Mark all control flow as used_in_scope.

2023-12-11 Thread Richard Biener
On Mon, 11 Dec 2023, Tamar Christina wrote:

> Hi All,
> 
> While compiling SPECCPU 2017 I ran across the reason (I had forgotten) why my
> initial patch marked all control statements as used in scope and not just
> gconds:  There are other statements that can introduce multiple exits, like
> switch statements.   If we ignore them as not relevant we never get a chance 
> to
> reject them later as not vectorizable.  Because they are marked as not 
> relevant
> we crash or produce invalid code.
> 
> The fix is to mark all control statements as used in scope, and then we
> later reject them as not vectorizable.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

OK

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-stmts.cc (vect_stmt_relevant_p): Mark all control flow as
>   used in scope.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-early-break_89.c: New test.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_89.c 
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_89.c
> new file mode 100644
> index 
> ..d33f3d94c096ffc53e4e82a28c3db058633fb21d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_89.c
> @@ -0,0 +1,18 @@
> +/* { dg-require-effective-target vect_int } */
> +
> +char *a;
> +extern void d();
> +void b() {
> +  int c = 0;
> +  while (c < 16) {
> +switch (a[c]) {
> +case '"':
> +case '\'':
> +  c++;
> +  continue;
> +}
> +break;
> +  }
> +  if (c)
> +d();
> +}
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 
> 72f271658e357bd562eb29087735825eb5ab0dc0..98704b7cea8a93f5beae7a55c85085c049e54152
>  100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -361,7 +361,7 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, 
> loop_vec_info loop_vinfo,
>  
>/* cond stmt other than loop exit cond.  */
>gimple *stmt = STMT_VINFO_STMT (stmt_info);
> -  if (is_a <gcond *> (stmt)
> +  if (is_ctrl_stmt (stmt)
>&& LOOP_VINFO_LOOP_IV_COND (loop_vinfo) != stmt
>&& (!loop->inner || gimple_bb (stmt)->loop_father == loop))
>  *relevant = vect_used_in_scope;
> 
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width

2023-12-11 Thread Richard Biener
On Wed, Nov 29, 2023 at 3:36 PM Di Zhao OS
 wrote:
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: Tuesday, November 21, 2023 9:01 PM
> > To: Di Zhao OS 
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH v4] [tree-optimization/110279] Consider FMA in
> > get_reassociation_width
> >
> > On Thu, Nov 9, 2023 at 6:53 PM Di Zhao OS 
> > wrote:
> > >
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Tuesday, October 31, 2023 9:48 PM
> > > > To: Di Zhao OS 
> > > > Cc: gcc-patches@gcc.gnu.org
> > > > Subject: Re: [PATCH v4] [tree-optimization/110279] Consider FMA in
> > > > get_reassociation_width
> > > >
> > > > On Sun, Oct 8, 2023 at 6:40 PM Di Zhao OS 
> > > > 
> > > > wrote:
> > > > >
> > > > > Attached is a new version of the patch.
> > > > >
> > > > > > -Original Message-
> > > > > > From: Richard Biener 
> > > > > > Sent: Friday, October 6, 2023 5:33 PM
> > > > > > To: Di Zhao OS 
> > > > > > Cc: gcc-patches@gcc.gnu.org
> > > > > > Subject: Re: [PATCH v4] [tree-optimization/110279] Consider FMA in
> > > > > > get_reassociation_width
> > > > > >
> > > > > > On Thu, Sep 14, 2023 at 2:43 PM Di Zhao OS
> > > > > >  wrote:
> > > > > > >
> > > > > > > This is a new version of the patch on "nested FMA".
> > > > > > > Sorry for updating this after so long, I've been studying and
> > > > > > > writing micro cases to sort out the cause of the regression.
> > > > > >
> > > > > > Sorry for taking so long to reply.
> > > > > >
> > > > > > > First, following previous discussion:
> > > > > > > (https://gcc.gnu.org/pipermail/gcc-patches/2023-
> > September/629080.html)
> > > > > > >
> > > > > > > 1. From testing more altered cases, I don't think the
> > > > > > > problem is that reassociation works locally. In that:
> > > > > > >
> > > > > > >   1) On the example with multiplications:
> > > > > > >
> > > > > > > tmp1 = a + c * c + d * d + x * y;
> > > > > > > tmp2 = x * tmp1;
> > > > > > > result += (a + c + d + tmp2);
> > > > > > >
> > > > > > >   Given "result" rewritten by width=2, the performance is
> > > > > > >   worse if we rewrite "tmp1" with width=2. In contrast, if we
> > > > > > >   remove the multiplications from the example (and make "tmp1"
> > > > > > >   not single-used), and still rewrite "result" by width=2, then
> > > > > > >   rewriting "tmp1" with width=2 is better. (Make sense because
> > > > > > >   the tree's depth at "result" is still smaller if we rewrite
> > > > > > >   "tmp1".)
> > > > > > >
> > > > > > >   2) I tried to modify the assembly code of the example without
> > > > > > >   FMA, so the width of "result" is 4. On Ampere1 there's no
> > > > > > >   obvious improvement. So although this is an interesting
> > > > > > >   problem, it doesn't seem like the cause of the regression.
> > > > > >
> > > > > > OK, I see.
> > > > > >
> > > > > > > 2. From assembly code of the case with FMA, one problem is
> > > > > > > that, rewriting "tmp1" to parallel didn't decrease the
> > > > > > > minimum CPU cycles (taking MULT_EXPRs into account), but
> > > > > > > increased code size, so the overhead is increased.
> > > > > > >
> > > > > > >a) When "tmp1" is not re-written to parallel:
> > > > > > > fmadd d31, d2, d2, d30
> > > > > > > fmadd d31, d3, d3, d31
> > > > > > > fmadd d31, d4, d5, d31  //"tmp1"
> > > > > > > fmadd d31, d31, d4, d3
> > > > > > >
> > > > > > >b) When "tmp1" is re-written to parallel:
> > > > > > > fmul  d31, d4, d5
> > > > > > > fmadd d27, d2, d2, d30
> > > > > > > fmadd d31, d3, d3, d31
> > > > > > > fadd  d31, d31, d27 //"tmp1"
> > > > > > > fmadd d31, d31, d4, d3
> > > > > > >
> > > > > > > For version a), there are 3 dependent FMAs to calculate "tmp1".
> > > > > > > For version b), there are also 3 dependent instructions in the
> > > > > > > longer path: the 1st, 3rd and 4th.
> > > > > >
> > > > > > Yes, it doesn't really change anything.  The patch has
> > > > > >
> > > > > > +  /* If there's code like "acc = a * b + c * d + acc" in a tight 
> > > > > > loop,
> > > > some
> > > > > > + uarchs can execute results like:
> > > > > > +
> > > > > > +   _1 = a * b;
> > > > > > +   _2 = .FMA (c, d, _1);
> > > > > > +   acc_1 = acc_0 + _2;
> > > > > > +
> > > > > > + in parallel, while turning it into
> > > > > > +
> > > > > > +   _1 = .FMA(a, b, acc_0);
> > > > > > +   acc_1 = .FMA(c, d, _1);
> > > > > > +
> > > > > > + hinders that, because then the first FMA depends on the result
> > > > > > of preceding
> > > > > > + iteration.  */
> > > > > >
> > > > > > I can't see what can be run in parallel for the first case.  The 
> > > > > > .FMA
> > > > > > depends on the multiplication a * b.  Iff the uarch somehow 
> > > > > > decomposes
> > > > > > .FMA into multiply + add then the c * d multiply could run in 
> > > > > > parallel
> > > > > > with the a * b multiply which _might_ be 

Re: [PATCH] libstdc++: add ARM SVE support to std::experimental::simd

2023-12-11 Thread Richard Sandiford
Richard Sandiford  writes:
>   template 
> struct _SveMaskWrapper
> {
>   ...
>
>   _GLIBCXX_SIMD_INTRINSIC constexpr value_type
>   operator[](size_t __i) const
>   {
> return _BuiltinSveMaskType::__sve_mask_active_count(
> _BuiltinSveVectorType::__sve_active_mask(),
> svand_z(_BuiltinSveVectorType::__sve_active_mask(),
> svcmpeq(_BuiltinSveVectorType::__sve_active_mask(),
> _BuiltinSveMaskType::__index0123,
> typename 
> _BuiltinSveMaskType::__sve_mask_uint_type(__i)),
> _M_data));
>   }
>
> A simpler way to do this might be to use svdup_u_z (_M_data, 1, 0)
> and then svorrv on the result.

Sorry for the nonsense comment.  I was thinking of a different operation.
But GCC knows how to index a fixed-length data vector, so it might be
worth using:

  svdup_u_z (_M_data, 1)[__i]

Thanks,
Richard


RE: [PATCH] RISC-V: Add vectorized strcmp.

2023-12-11 Thread Li, Pan2
Hi Robin,

I reduced the SZ size from 10 to 1, and the below case with SZ = 2 will fail. 
The failed location is "foo is 50, foo2 is 12800, i,j is 1, 0".

#define SZ 2

const char *s[SZ]  = {"1",  "12345678901234567889012345678901234567890"};

Executing on host: 
/home/pli/gcc/111/riscv-gnu-toolchain/build-gcc-newlib-stage2/gcc/xgcc 
-B/home/pli/gcc/111/riscv-gnu-toolchain/build-gcc-newlib-stage2/gcc/  
exceptions_enabled4031601.cc  -march=rv64gcv_zvl512b -mabi=lp64d 
-mcmodel=medlow --param=riscv-autovec-lmul=m4   -fdiagnostics-plain-output  
-Wno-complain-wrong-lang -S   -o exceptions_enabled4031601.s(timeout = 600)
spawn -ignore SIGHUP 
/home/pli/gcc/111/riscv-gnu-toolchain/build-gcc-newlib-stage2/gcc/xgcc 
-B/home/pli/gcc/111/riscv-gnu-toolchain/build-gcc-newlib-stage2/gcc/ 
exceptions_enabled4031601.cc -march=rv64gcv_zvl512b -mabi=lp64d -mcmodel=medlow 
--param=riscv-autovec-lmul=m4 -fdiagnostics-plain-output 
-Wno-complain-wrong-lang -S -o exceptions_enabled4031601.s
PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c (test for excess errors)
spawn riscv64-unknown-elf-run ./strcmp-run.exe
+ QEMU_CPU=rv64,vlen=512,v=true,vext_spec=v1.0,Zve32f=true,Zve64f=true
+ qemu-riscv64 -r 5.10 -L 
/home/pli/gcc/111/riscv-gnu-toolchain/__RISC-V_INSTALL___/sysroot 
./strcmp-run.exe
qemu-riscv64: warning: CPU property 'Zve32f' is deprecated. Please use 'zve32f' 
instead
qemu-riscv64: warning: CPU property 'Zve64f' is deprecated. Please use 'zve64f' 
instead
foo is 50, foo2 is 12800, i,j is 1, 0
FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test

Pan

-Original Message-
From: Robin Dapp  
Sent: Monday, December 11, 2023 4:34 PM
To: Li, Pan2 ; 钟居哲 ; gcc-patches 
; palmer ; kito.cheng 
; Jeff Law 
Cc: rdapp@gmail.com
Subject: Re: [PATCH] RISC-V: Add vectorized strcmp.

> FYI. I have the same failures as juzhe mentioned, with the emulator
> qemu version qemu-riscv64 version 8.1.93 (v8.2.0-rc3). The entire log
> may look like below:
> 
> Executing on host: 
> /home/box/panli/riscv-gnu-toolchain/build-gcc-newlib-stage2/gcc/xgcc 
> -B/home/box/panli/riscv-gnu-toolchain/build-gcc-newlib-stage2/gcc/  
> /home/box/panli/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
>   -march=rv64gcv -mabi=lp64d -mcmodel=medlow --param=riscv-autovec-lmul=m1 
> --param=riscv-autovec-preference=fixed-vlmax   -fdiagnostics-plain-output   
> -ftree-vectorize -O3 --param riscv-autovec-lmul=m1 -O3 -minline-strcmp   
> -lm  -o ./strcmp-run.exe    (timeout = 600)
> 
> spawn -ignore SIGHUP 
> /home/box/panli/riscv-gnu-toolchain/build-gcc-newlib-stage2/gcc/xgcc 
> -B/home/box/panli/riscv-gnu-toolchain/build-gcc-newlib-stage2/gcc/ 
> /home/box/panli/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
>  -march=rv64gcv -mabi=lp64d -mcmodel=medlow --param=riscv-autovec-lmul=m1 
> --param=riscv-autovec-preference=fixed-vlmax -fdiagnostics-plain-output 
> -ftree-vectorize -O3 --param riscv-autovec-lmul=m1 -O3 -minline-strcmp -lm -o 
> ./strcmp-run.exe^M

Thanks, it must be a bug if you both see it.  But I cannot reproduce
it yet for some reason.  I tried your exact parameters (just didn't
use newlib).  Also, for Juzhe it seemed to fail without
-minline-strcmp for you it fails with it.  Maybe my testcase uses
undefined behavior?  Could you try reducing SZ to 1 for a test?

Regards
 Robin


Backport of "fixincludes: Update darwin_flt_eval_method for macOS 14"

2023-12-11 Thread FX Coudert
Hi,

I’d like to backport the fixincludes for macOS 14 SDK at 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=93f803d53b5ccaabded9d7b4512b54da81c1c616
 to the active branches, i.e. 13, 12 and 11 (unless I am mistaken).

The fix has been there for months, it’s stable and very specific. Without it, 
we can’t compile GCC for macOS 14.
OK to backport?

FX

Re: [PATCH] tree-cfg: Fix misleading error message in verify_gimple_assign_single

2023-12-11 Thread xndcn
Got it, thanks! It is really confusing >,<
What about the first one, the MEM_REF case?

On Monday, December 11, 2023, Richard Biener wrote:

> On Sun, Dec 10, 2023 at 4:00 PM xndcn  wrote:
> >
> > Hi, I am a newbie in GCC, and I do not have access to git repo.
> >
> > I found some misleading error messages in the verify_gimple_assign_single
> > function of tree-cfg.cc.  It prompts the error "invalid RHS for gimple memory
> > store: ", but it in fact checks the LHS.
>
> it might be a bit confusing, but it's correct.  There is a store
> because !is_gimple_reg (lhs), and the only case where !is_gimple_reg (rhs1)
> is valid is when this is an aggregate
> copy (!is_gimple_reg_type (TREE_TYPE (lhs))).  Otherwise the _RHS_ needs
> to be a register.
>
> Richard.
>


Re: [PATCH] tree-cfg: Fix misleading error message in verify_gimple_assign_single

2023-12-11 Thread Richard Biener
On Mon, Dec 11, 2023 at 12:39 PM xndcn  wrote:
>
> Got it, thanks! It is really confusing >,<
> What about the first one? For case MEM_REF.

The same - the LHS determines this is a store, if it is the
RHS is invalid as diagnosed (it needs to go through a
temporary).

Richard.



> On Monday, December 11, 2023, Richard Biener wrote:
>>
>> On Sun, Dec 10, 2023 at 4:00 PM xndcn  wrote:
>> >
>> > Hi, I am a newbie in GCC, and I do not have access to git repo.
>> >
>> > I found some misleading error messages in verify_gimple_assign_single 
>> > function of tree-cfg.cc. It prompt error "invalid RHS for gimple memory 
>> > store: ", but it checks lhs in fact.
>>
>> it might be a bit confusing but it's correct.  There is a store
>> because !is_gimple_reg (lhs)
>> and the only case !is_gimple_reg (rhs1) is correct is when this is an 
>> aggregate
>> copy (!is_gimple_reg_type (TREE_TYPE (lhs))).  Otherwise the _RHS_ needs to 
>> be
>> a register.
>>
>> Richard.


Re: [PATCH v7 3/5] OpenMP: Pointers and member mappings

2023-12-11 Thread Tobias Burnus

Hi Julian,

On 07.12.23 18:24, Julian Brown wrote:

On Wed, 6 Dec 2023 12:36:34 +0100
Tobias Burnus  wrote:


LGTM, except for:

* The 'target exit data' handling - comments below - looks a bit
fishy/inconsistent.

...

Thus, I wonder whether that shouldn't be instead
OMP_CLAUSE_MAP_KIND (node) == GOMP_MAP_DELETE
? GOMP_MAP_DELETE : GOMP_MAP_RELEASE;

I've fixed that as you suggest.  Actually I've made OpenACC use the new
node layout as well, since (a) it works and (b) it was weirdly
inconsistent before.  That is, exit data directives will no longer use
e.g.:

   GOMP_MAP_FROM
   GOMP_MAP_TO_PSET
   GOMP_MAP_ATTACH_DETACH

but instead,

   GOMP_MAP_FROM
   GOMP_MAP_RELEASE (with OMP_CLAUSE_RELEASE_DESCRIPTOR set)
   GOMP_MAP_ATTACH_DETACH

actually the current state is that GOMP_MAP_TO_PSET will be used for
the descriptor on an "exit data" directive if you refer to the whole
array, but GOMP_MAP_RELEASE (etc.) will be used if you refer to an array
section (without the flag newly added in this patch, of course). I
don't think there's any reason to maintain that inconsistency.

...

I've re-tested this version. Does it look better now?


Yes, LGTM as well.

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


[Patch] OpenMP: Minor '!$omp allocators' cleanup - and still: Re: [patch] OpenMP/Fortran: Implement omp allocators/allocate for ptr/allocatables

2023-12-11 Thread Tobias Burnus

Hi Thomas & Jakub,

I included a minor cleanup patch - but the rest is really a bunch of RFC.

I intent to commit that patch as obvious, unless there are further comments.

On 09.12.23 16:14, Jakub Jelinek wrote:


There were 3 reasons to add GOMP_alloc (and 1 for GOMP_free):
1) when it was added, omp_aligned_alloc was still not exported from the
library because we thought we shouldn't expose 5.1 features until we
have 5.0 implemented (then changed mind)
2) unlike omp_aligned_alloc, GOMP_alloc issues a fatal error on allocation
failure
3) the omp_* functions have omp_allocator_handle_t arguments, which is hard
to provide for builtins (I think this is the only reason for GOMP_free
addition, maybe together with wanting those to be paired)


Is this a real issue? GOMP_{alloc,free} take uintptr_t as allocator
while omp_* take omp_allocator_handle_t.

But that contains a __omp_memspace_handle_t_max__ = __UINTPTR_MAX__ and
>= C++ 11 ': __UINTPTR_TYPE__' for C/C++ and omp_allocator_handle_kind
= c_intptr_t in Fortran (→ integer(c_intptr_t) = signed variant of
uintptr_t).

In Fortran, there is an explicit check that the allocator has that kind,
which is a check whether it is kind=4 or kind=8, respectively, depending
what kind matches intptr_t. Thus, for Fortran, there is already a mismatch.

Thus, it seems to be at most a C/C++ issue.


Now, 1) is a non-issue anymore.  I don't know what Fortran wants for
allocation failures: whether it is better to have diagnostics on the libgomp
side or to emit them inline.


I think that's not quite clear: while for Fortran itself and for
OpenMP's omp_alloc routine the behavior is well defined, it is not for
'!$omp allocators'.

However, I think using omp_alloc is more Fortranish.

I think it would be a tad cleaner to change it – but the question is how
to do it best:

Even when ignoring the uintptr_t vs. omp_allocator_handle_t issue, the
question is still how to handle it best.

Create another builtin - to make it possible to add it to
gimple-ssa-warn-access.cc / tree-ssa-ccp.cc / tree-ssa-dce.cc?

Or the alternative: Be more un-Fortranish and keep the current
GOMP_alloc handling?

Thoughts?


And yes, 3) would be an argument to add
GOMP_realloc.


I am happy to add GOMP_realloc – passing 0 for old/new allocator in case
of Fortran - if it really makes sense. Otherwise, I'd keep it as is.

I think the next user would be the (pseudo-)USM patches, i.e. replacing
all (re,c,m)alloc/free by calls to omp_(re,c,)alloc/omp_free + special
allocator when -foffload-memory={none,pinned,unified} (well, not when
=none). Those are a bit fragile but permit using pinned/USM on less
capable offload devices. (Feature exists also with other compilers, i.e.
the users seem to be used to the limitations of this feature.)

Thus, one option would be to wait until that feature is ready.

What would be the NULL behavior? As omp_alloc or as GOMP_alloc? I assume
as omp_alloc would be fine?

* * *

On 09.12.23 12:19, Thomas Schwinge wrote:

Why not define 'GOMP_add_alloc', 'GOMP_is_alloc' via 'gcc/omp-builtins.def'?


We could - but does it make sense to have it for a single caller?


Should this either be 'BUILT_IN_OMP_REALLOC' ('OMP' instead of 'GOMP'),
or otherwise a 'GOMP_realloc' be added to 'libgomp/allocator.c

Cf. also above. but I am fine with a name change to OMP_ as well/in addition.


'GOMP_add_alloc', 'GOMP_is_alloc' should get prototyped in
'libgomp/libgomp_g.h'.


I concur + added.

Tobias
OpenMP: Minor '!$omp allocators' cleanup

gcc/fortran/ChangeLog:

	* trans-openmp.cc (gfc_omp_call_add_alloc,
	gfc_omp_call_is_alloc): Set 'fn spec'.

libgomp/ChangeLog:

	* libgomp_g.h (GOMP_add_alloc, GOMP_is_alloc): Add.

 gcc/fortran/trans-openmp.cc | 8 ++--
 libgomp/libgomp_g.h | 3 +++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 9e166c94f8e..95184920cf7 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -8361,8 +8361,10 @@ gfc_omp_call_add_alloc (tree ptr)
   if (fn == NULL_TREE)
 {
   fn = build_function_type_list (void_type_node, ptr_type_node, NULL_TREE);
+  tree att = build_tree_list (NULL_TREE, build_string (4, ". R "));
+  att = tree_cons (get_identifier ("fn spec"), att, TYPE_ATTRIBUTES (fn));
+  fn = build_type_attribute_variant (fn, att);
   fn = build_fn_decl ("GOMP_add_alloc", fn);
-/* FIXME: attributes.  */
 }
   return build_call_expr_loc (input_location, fn, 1, ptr);
 }
@@ -8380,7 +8382,9 @@ gfc_omp_call_is_alloc (tree ptr)
   fn = build_function_type_list (boolean_type_node, ptr_type_node,
  NULL_TREE);
   fn = build_fn_decl ("GOMP_is_alloc", fn)

Re: [PATCH] RTL-SSA: Fix ICE on record_use of RTL_SSA for RISC-V VSETVL PASS

2023-12-11 Thread Richard Sandiford
"juzhe.zh...@rivai.ai"  writes:
> I think it's reasonable to refactor the reduction instruction pattern to work
> around this issue.
>
> Going to send a patch to apply this solution.
>
> So drop this patch.  Sorry for bothering Richard S.

It wasn't a bother.

On the patch: as things stand, we try to make the use_info's mode be
at least as big as all the uses in the instruction.  So if we did
want to handle unordered modes for hard registers, I think we would need
to fall back on the register's natural mode instead.  I did something
similar in LRA recently for PR112278 (6e2e0ce6795).

So if in future you would like to change RTL-SSA, a fix along those
lines would be fine with me.

Thanks,
Richard

>
>
>
> juzhe.zh...@rivai.ai
>  
> From: Robin Dapp
> Date: 2023-12-11 17:01
> To: Juzhe-Zhong; gcc-patches
> CC: rdapp.gcc; richard.sandiford
> Subject: Re: [PATCH] RTL-SSA: Fix ICE on record_use of RTL_SSA for RISC-V 
> VSETVL PASS
>> In record_use:
>> 
>>   if (HARD_REGISTER_NUM_P (regno)
>>   && partial_subreg_p (use->mode (), mode))
>> 
>> Assertion failed on partial_subreg_p which is:
>> 
>> inline bool
>> partial_subreg_p (machine_mode outermode, machine_mode innermode)
>> {
>>   /* Modes involved in a subreg must be ordered.  In particular, we must
>>  always know at compile time whether the subreg is paradoxical.  */
>>   poly_int64 outer_prec = GET_MODE_PRECISION (outermode);
>>   poly_int64 inner_prec = GET_MODE_PRECISION (innermode);
>>   gcc_checking_assert (ordered_p (outer_prec, inner_prec));  
>>-> cause ICE.
>>   return maybe_lt (outer_prec, inner_prec);
>> }
>> 
>> RISC-V VSETVL PASS is an advanced lazy vsetvl insertion PASS after RA 
>> (register allocation).
>> 
>> The rootcause is that we have a pattern (reduction instruction) that 
>> includes both VLA (length-agnostic) and VLS (fixed-length) modes.
>  
> Maybe as additional context: The second input (which has a VLA mode here)
> is not used entirely but just its first element.  This serves as initial
> value for the reduction.
>  
> I'm not sure we'd want to model it as subreg here (endianness etc?).
> Could we have a VLS-mode equivalent for the VLA mode that only holds
> one element?
>  
> Regards
> Robin
>  
>  


Re: Re: [PATCH] RTL-SSA: Fix ICE on record_use of RTL_SSA for RISC-V VSETVL PASS

2023-12-11 Thread juzhe.zh...@rivai.ai
Thanks Richard.

It would be great if you are OK with me fixing it in RTL-SSA.
I leveraged your LRA patch for RTL-SSA:

  else
{
  // Record the mode of the largest use.  The choice is arbitrary if
  // the instruction (unusually) references the same register in two
  // different but equal-sized modes.
  gcc_checking_assert (use->insn () == insn);
  if (HARD_REGISTER_NUM_P (regno))
{
  if (!ordered_p (GET_MODE_PRECISION (use->mode ()),
  GET_MODE_PRECISION (mode)))
use->set_mode (reg_raw_mode[regno]);
  else if (partial_subreg_p (use->mode (), mode))
use->set_mode (mode);
}
  use->record_reference (ref, false);
}

Does this look reasonable to you?

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-12-11 19:45
To: juzhe.zhong\@rivai.ai
CC: Robin Dapp; gcc-patches
Subject: Re: [PATCH] RTL-SSA: Fix ICE on record_use of RTL_SSA for RISC-V 
VSETVL PASS
"juzhe.zh...@rivai.ai"  writes:
> I think it's reasonable refactor reduction instruction pattern work around 
> this issue.
>
> Going to send a patch to apply this solution.
>
> So drop this patch.  Sorry for bothering Richard S.
 
It wasn't a bother.
 
On the patch: as things stand, we try to make the use_info's mode be
at least as big as all the uses in the instruction.  So if we did
want to handle unordered modes for hard registers, I think we would need
to fall back on the register's natural mode instead.  I did something
similar in LRA recently for PR112278 (6e2e0ce6795).
 
So if in future you would like to change RTL-SSA, a fix along those
lines would be fine with me.
 
Thanks,
Richard
 
>
>
>
> juzhe.zh...@rivai.ai
>  
> From: Robin Dapp
> Date: 2023-12-11 17:01
> To: Juzhe-Zhong; gcc-patches
> CC: rdapp.gcc; richard.sandiford
> Subject: Re: [PATCH] RTL-SSA: Fix ICE on record_use of RTL_SSA for RISC-V 
> VSETVL PASS
>> In record_use:
>> 
>>   if (HARD_REGISTER_NUM_P (regno)
>>   && partial_subreg_p (use->mode (), mode))
>> 
>> Assertion failed on partial_subreg_p which is:
>> 
>> inline bool
>> partial_subreg_p (machine_mode outermode, machine_mode innermode)
>> {
>>   /* Modes involved in a subreg must be ordered.  In particular, we must
>>  always know at compile time whether the subreg is paradoxical.  */
>>   poly_int64 outer_prec = GET_MODE_PRECISION (outermode);
>>   poly_int64 inner_prec = GET_MODE_PRECISION (innermode);
>>   gcc_checking_assert (ordered_p (outer_prec, inner_prec));  
>>-> cause ICE.
>>   return maybe_lt (outer_prec, inner_prec);
>> }
>> 
>> RISC-V VSETVL PASS is an advanced lazy vsetvl insertion PASS after RA 
>> (register allocation).
>> 
>> The rootcause is that we have a pattern (reduction instruction) that 
>> includes both VLA (length-agnostic) and VLS (fixed-length) modes.
>  
> Maybe as additional context: The second input (which has a VLA mode here)
> is not used entirely but just its first element.  This serves as initial
> value for the reduction.
>  
> I'm not sure we'd want to model it as subreg here (endianness etc?).
> Could we have a VLS-mode equivalent for the VLA mode that only holds
> one element?
>  
> Regards
> Robin
>  
>  
 


[PATCH] RISC-V: Robustify shuffle index used by vrgather and fix regression

2023-12-11 Thread Juzhe-Zhong
Notice there are some regression FAILs:
FAIL: gcc.target/riscv/rvv/autovec/pr110950.c -O3 -ftree-vectorize  
scan-assembler-times vslide1up\\.vx 1
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c -std=c99 -O3 
-ftree-vectorize --param riscv-autovec-preference=fixed-vlmax  
scan-assembler-times vrgather\\.vv\\tv[0-9]+,\\s*v[0-9]+,\\s*v[0-9]+ 19
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c -std=c99 -O3 
-ftree-vectorize --param riscv-autovec-preference=fixed-vlmax  
scan-assembler-times vrgatherei16\\.vv\\tv[0-9]+,\\s*v[0-9]+,\\s*v[0-9]+ 12
FAIL: gcc.target/riscv/rvv/autovec/vls/perm-4.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable  scan-assembler-times 
vrgather\\.vv\\tv[0-9]+,\\s*v[0-9]+,\\s*v[0-9]+ 19
FAIL: gcc.target/riscv/rvv/autovec/vls/perm-4.c -O3 -ftree-vectorize --param 
riscv-autovec-preference=scalable  scan-assembler-times 
vrgatherei16\\.vv\\tv[0-9]+,\\s*v[0-9]+,\\s*v[0-9]+ 12

pr110950 is not a regression; adapting its testcase is enough.

The remaining FAILs, which are caused by this patch:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d9dd06ad51b7479f09acb88adf404664a1e18b2a

need to be recovered.

Robustify the gather index to fix those FAILs.


gcc/ChangeLog:

* config/riscv/riscv-v.cc (get_gather_index_mode): New function.
(shuffle_series_patterns): Robustify shuffle index.
(shuffle_generic_patterns): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr110950.c: Adapt test.

---
 gcc/config/riscv/riscv-v.cc   | 80 +++
 .../gcc.target/riscv/rvv/autovec/pr110950.c   |  2 +-
 2 files changed, 49 insertions(+), 33 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 484c690c3db..944b37b5df7 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -2923,6 +2923,39 @@ struct expand_vec_perm_d
   bool testing_p;
 };
 
+/* Return the appropriate index mode for gather instructions.  */
+opt_machine_mode
+get_gather_index_mode (struct expand_vec_perm_d *d)
+{
+  machine_mode sel_mode = related_int_vector_mode (d->vmode).require ();
+  poly_uint64 nunits = GET_MODE_NUNITS (d->vmode);
+
+  if (GET_MODE_INNER (d->vmode) == QImode)
+{
+  if (nunits.is_constant ())
+   {
+ /* If indice is LMUL8 CONST_VECTOR and any element value
+exceed the range of 0 ~ 255, Forbid such permutation
+since we need vector HI mode to hold such indice and
+we don't have it.  */
+ if (!d->perm.all_in_range_p (0, 255)
+ && !get_vector_mode (HImode, nunits).exists (&sel_mode))
+   return opt_machine_mode ();
+   }
+  else
+   {
+ /* Permuting two SEW8 variable-length vectors need vrgatherei16.vv.
+Otherwise, it could overflow the index range.  */
+ if (!get_vector_mode (HImode, nunits).exists (&sel_mode))
+   return opt_machine_mode ();
+   }
+}
+  else if (riscv_get_v_regno_alignment (sel_mode) > 1
+  && GET_MODE_INNER (sel_mode) != HImode)
+sel_mode = get_vector_mode (HImode, nunits).require ();
+  return sel_mode;
+}
+
 /* Recognize the patterns that we can use merge operation to shuffle the
vectors. The value of Each element (index i) in selector can only be
either i or nunits + i.  We will check the pattern is actually monotonic.
@@ -3428,12 +3461,10 @@ shuffle_series_patterns (struct expand_vec_perm_d *d)
   if (!have_series)
 return false;
 
-  /* Get a vector int-mode to be used for the permute selector.  */
-  machine_mode sel_mode = related_int_vector_mode (d->vmode).require ();
-  insn_code icode = optab_handler (vec_shl_insert_optab, sel_mode);
-
-  /* We need to be able to insert an element and shift the vector.  */
-  if (need_insert && icode == CODE_FOR_nothing)
+  /* Disable shuffle if we can't find an appropriate integer index mode for
+ gather.  */
+  machine_mode sel_mode;
+  if (!get_gather_index_mode (d).exists (&sel_mode))
 return false;
 
   /* Success! */
@@ -3448,7 +3479,12 @@ shuffle_series_patterns (struct expand_vec_perm_d *d)
 
   /* Insert the remaining element if necessary.  */
   if (need_insert)
-emit_insn (GEN_FCN (icode) (series, series, gen_int_mode (el1, eltmode)));
+{
+  insn_code icode = code_for_pred_slide (UNSPEC_VSLIDE1UP, sel_mode);
+  rtx ops[]
+   = {series, series, gen_int_mode (el1, GET_MODE_INNER (sel_mode))};
+  emit_vlmax_insn (icode, BINARY_OP, ops);
+}
 
   emit_vlmax_gather_insn (d->target, d->op0, series);
 
@@ -3460,36 +3496,16 @@ shuffle_series_patterns (struct expand_vec_perm_d *d)
 static bool
 shuffle_generic_patterns (struct expand_vec_perm_d *d)
 {
-  machine_mode sel_mode = related_int_vector_mode (d->vmode).require ();
-  poly_uint64 nunits = GET_MODE_NUNITS (d->vmode);
+  machine_mode sel_mode;
 
   /* We don't enable SLP for non-power of 2 NPATTERNS.  */
   if (!pow2p_hwi (d->perm.encoding().npatterns ()))

Re: [PATCH] tree-cfg: Fix misleading error message in verify_gimple_assign_single

2023-12-11 Thread xndcn
Thanks, now I totally understand! I think it deserves a clearer
error message, but I do not have a better idea currently. So forget it, thanks!

On Monday, December 11, 2023, Richard Biener wrote:

> On Mon, Dec 11, 2023 at 12:39 PM xndcn  wrote:
> >
> > Got it, thanks! It is really confusing >,<
> > What about the first one? For case MEM_REF.
>
> The same - the LHS determines this is a store, if it is the
> RHS is invalid as diagnosed (it needs to go through a
> temporary).
>
> Richard.
>
>
>
> On Monday, December 11, 2023, Richard Biener wrote:
> >>
> >> On Sun, Dec 10, 2023 at 4:00 PM xndcn  wrote:
> >> >
> >> > Hi, I am a newbie in GCC, and I do not have access to git repo.
> >> >
> >> > I found some misleading error messages in verify_gimple_assign_single
> function of tree-cfg.cc. It prompt error "invalid RHS for gimple memory
> store: ", but it checks lhs in fact.
> >>
> >> it might be a bit confusing but it's correct.  There is a store
> >> because !is_gimple_reg (lhs)
> >> and the only case !is_gimple_reg (rhs1) is correct is when this is an
> aggregate
> >> copy (!is_gimple_reg_type (TREE_TYPE (lhs))).  Otherwise the _RHS_
> needs to be
> >> a register.
> >>
> >> Richard.
>


Re: [PATCH v3] aarch64: Implement the ACLE instruction/data prefetch functions.

2023-12-11 Thread Richard Sandiford
Victor Do Nascimento  writes:
> Key changes in v3:
>   * Implement the `require_const_argument' function to ensure the nth
>   argument in EXP represents a const-type argument in the valid range
>   given by [minval, maxval), forgoing expansion altogether when an
>   invalid argument is detected early on.
>   * Whereas in the previous iteration, out-of-bound function
>   parameters led to warnings and sensible defaults set, akin to the
>   `__builtin_prefetch' implementation, parameters outside valid ranges
>   now result in an error, more faithfully reflecting ACLE
>   specifications.
>
>  ---
>
> Implement the ACLE data and instruction prefetch functions[1] with the
> following signatures:
>
>   1. Data prefetch intrinsics:
>   
>   void __pldx (/*constant*/ unsigned int /*access_kind*/,
>/*constant*/ unsigned int /*cache_level*/,
>/*constant*/ unsigned int /*retention_policy*/,
>void const volatile *addr);
>
>   void __pld (void const volatile *addr);
>
>   2. Instruction prefetch intrinsics:
>   ---
>   void __plix (/*constant*/ unsigned int /*cache_level*/,
>/*constant*/ unsigned int /*retention_policy*/,
>void const volatile *addr);
>
>   void __pli (void const volatile *addr);
>
> `__pldx' affords the programmer more fine-grained control over the
> data prefetch behavior than the analogous GCC builtin
> `__builtin_prefetch', and allows access to the "SLC" cache level.
>
> While `__builtin_prefetch' chooses both cache-level and retention
> policy automatically via the optional `locality' parameter, `__pldx'
> expects 2 (mandatory) arguments to explicitly define the desired
> cache-level and retention policies.
>
> `__plix' on the other hand, generates a code prefetch instruction and
> so extends functionality on aarch64 targets beyond that which is
> exposed by `__builtin_prefetch'.
>
> `__pld' and `__pli' prefetch data and instructions,
> respectively, using default values for both cache-level and retention
> policies.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> [1] 
> https://arm-software.github.io/acle/main/acle.html#memory-prefetch-intrinsics
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-builtins.cc:
>   (AARCH64_PLD): New enum aarch64_builtins entry.
>   (AARCH64_PLDX): Likewise.
>   (AARCH64_PLI): Likewise.
>   (AARCH64_PLIX): Likewise.
>   (aarch64_init_prefetch_builtin): New.
>   (aarch64_general_init_builtins): Call prefetch init function.
>   (aarch64_expand_prefetch_builtin): New.
>   (aarch64_general_expand_builtin):  Add prefetch expansion.
>   (require_const_argument): New.
>   * config/aarch64/aarch64.md (UNSPEC_PLDX): New.
>   (aarch64_pldx): New.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/builtin_pld_pli.c: New.
>   * gcc.target/aarch64/builtin_pld_pli_illegal.c: New.
> ---
>  gcc/config/aarch64/aarch64-builtins.cc| 136 ++
>  gcc/config/aarch64/aarch64.md |  12 ++
>  gcc/config/aarch64/arm_acle.h |  30 
>  .../gcc.target/aarch64/builtin_pld_pli.c  |  90 
>  .../aarch64/builtin_pld_pli_illegal.c |  33 +
>  5 files changed, 301 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/builtin_pld_pli.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/builtin_pld_pli_illegal.c
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> b/gcc/config/aarch64/aarch64-builtins.cc
> index 04f59fd9a54..d092654b6fb 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -808,6 +808,10 @@ enum aarch64_builtins
>AARCH64_RBIT,
>AARCH64_RBITL,
>AARCH64_RBITLL,
> +  AARCH64_PLD,
> +  AARCH64_PLDX,
> +  AARCH64_PLI,
> +  AARCH64_PLIX,
>AARCH64_BUILTIN_MAX
>  };
>  
> @@ -1798,6 +1802,34 @@ aarch64_init_rng_builtins (void)
>  AARCH64_BUILTIN_RNG_RNDRRS);
>  }
>  
> +/* Add builtins for data and instruction prefetch.  */
> +static void
> +aarch64_init_prefetch_builtin (void)
> +{
> +#define AARCH64_INIT_PREFETCH_BUILTIN(INDEX, N)  
> \
> +  aarch64_builtin_decls[INDEX] = \
> +aarch64_general_add_builtin ("__builtin_aarch64_" N, ftype, INDEX)
> +
> +  tree ftype;
> +  tree cv_argtype;
> +  cv_argtype = build_qualified_type (void_type_node, TYPE_QUAL_CONST
> +  | TYPE_QUAL_VOLATILE);
> +  cv_argtype = build_pointer_type (cv_argtype);
> +
> +  ftype = build_function_type_list (void_type_node, cv_argtype, NULL);
> +  AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLD, "pld");
> +  AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLI, "pli");
> +
> +  ftype = build_function_type_list (void_type_node, unsigned_type_node,
> + unsigned_type_node, unsigned_t

Re: [PATCH] RISC-V: Robustify shuffle index used by vrgather and fix regression

2023-12-11 Thread Robin Dapp
> Robustify the gather index to fix those FAILs.

OK.

They must have somehow slipped through because I pruned vlmax tests
for comparison of different vlens. 

Regards
 Robin


Re: [PATCH] RTL-SSA: Fix ICE on record_use of RTL_SSA for RISC-V VSETVL PASS

2023-12-11 Thread Richard Sandiford
"juzhe.zh...@rivai.ai"  writes:
> Thanks Richard.
>
> It would be great if you are ok I can fix it in RTL_SSA.
> I leverage your LRA patch in RTL_SSA:
>
>   else
> {
>   // Record the mode of the largest use.  The choice is arbitrary if
>   // the instruction (unusually) references the same register in two
>   // different but equal-sized modes.
>   gcc_checking_assert (use->insn () == insn);
>   if (HARD_REGISTER_NUM_P (regno))
> {
>   if (!ordered_p (GET_MODE_PRECISION (use->mode ()),
>   GET_MODE_PRECISION (mode)))
> use->set_mode (reg_raw_mode[regno]);
>   else if (partial_subreg_p (use->mode (), mode))
> use->set_mode (mode);
> }
>   use->record_reference (ref, false);
> }
>
> Is it reasonable to you ?

Yeah, the above is OK for trunk, thanks.

Richard

>
> Thanks.
>
>
> juzhe.zh...@rivai.ai
>  
> From: Richard Sandiford
> Date: 2023-12-11 19:45
> To: juzhe.zhong\@rivai.ai
> CC: Robin Dapp; gcc-patches
> Subject: Re: [PATCH] RTL-SSA: Fix ICE on record_use of RTL_SSA for RISC-V 
> VSETVL PASS
> "juzhe.zh...@rivai.ai"  writes:
>> I think it's reasonable to refactor the reduction instruction pattern to
>> work around this issue.
>>
>> Going to send a patch to apply this solution.
>>
>> So drop this patch.  Sorry for bothering Richard S.
>  
> It wasn't a bother.
>  
> On the patch: as things stand, we try to make the use_info's mode be
> at least as big as all the uses in the instruction.  So if we did
> want to handle unordered modes for hard registers, I think we would need
> to fall back on the register's natural mode instead.  I did something
> similar in LRA recently for PR112278 (6e2e0ce6795).
>  
> So if in future you would like to change RTL-SSA, a fix along those
> lines would be fine with me.
>  
> Thanks,
> Richard
>  
>>
>>
>>
>> juzhe.zh...@rivai.ai
>>  
>> From: Robin Dapp
>> Date: 2023-12-11 17:01
>> To: Juzhe-Zhong; gcc-patches
>> CC: rdapp.gcc; richard.sandiford
>> Subject: Re: [PATCH] RTL-SSA: Fix ICE on record_use of RTL_SSA for RISC-V 
>> VSETVL PASS
>>> In record_use:
>>> 
>>>   if (HARD_REGISTER_NUM_P (regno)
>>>   && partial_subreg_p (use->mode (), mode))
>>> 
>>> Assertion failed on partial_subreg_p which is:
>>> 
>>> inline bool
>>> partial_subreg_p (machine_mode outermode, machine_mode innermode)
>>> {
>>>   /* Modes involved in a subreg must be ordered.  In particular, we must
>>>  always know at compile time whether the subreg is paradoxical.  */
>>>   poly_int64 outer_prec = GET_MODE_PRECISION (outermode);
>>>   poly_int64 inner_prec = GET_MODE_PRECISION (innermode);
>>>   gcc_checking_assert (ordered_p (outer_prec, inner_prec)); 
>>> -> cause ICE.
>>>   return maybe_lt (outer_prec, inner_prec);
>>> }
>>> 
>>> RISC-V VSETVL PASS is an advanced lazy vsetvl insertion PASS after RA 
>>> (register allocation).
>>> 
>>> The root cause is that we have a pattern (reduction instruction) that 
>>> includes both VLA (length-agnostic) and VLS (fixed-length) modes.
>>  
>> Maybe as additional context: The second input (which has a VLA mode here)
>> is not used entirely but just its first element.  This serves as initial
>> value for the reduction.
>>  
>> I'm not sure we'd want to model it as subreg here (endianness etc?).
>> Could we have a VLS-mode equivalent for the VLA mode that only holds
>> one element?
>>  
>> Regards
>> Robin
>>  
>>  
>  


[COMMITTED V2] RTL-SSA: Fix ICE on record_use of RTL_SSA for RISC-V VSETVL PASS

2023-12-11 Thread Juzhe-Zhong
This patch fixes an ICE in record_use during RTL_SSA initialization in the
RISC-V backend VSETVL PASS.

This is the ICE:

0x11a8603 partial_subreg_p(machine_mode, machine_mode)
../../../../gcc/gcc/rtl.h:3187
0x3b695eb 
rtl_ssa::function_info::record_use(rtl_ssa::function_info::build_info&, 
rtl_ssa::insn_info*, rtx_obj_reference)
../../../../gcc/gcc/rtl-ssa/insns.cc:524

In record_use:

  if (HARD_REGISTER_NUM_P (regno)
  && partial_subreg_p (use->mode (), mode))

Assertion failed on partial_subreg_p which is:

inline bool
partial_subreg_p (machine_mode outermode, machine_mode innermode)
{
  /* Modes involved in a subreg must be ordered.  In particular, we must
 always know at compile time whether the subreg is paradoxical.  */
  poly_int64 outer_prec = GET_MODE_PRECISION (outermode);
  poly_int64 inner_prec = GET_MODE_PRECISION (innermode);
  gcc_checking_assert (ordered_p (outer_prec, inner_prec)); 
-> cause ICE.
  return maybe_lt (outer_prec, inner_prec);
}

RISC-V VSETVL PASS is an advanced lazy vsetvl insertion PASS after RA (register 
allocation).

The root cause is that we have a pattern (reduction instruction) that includes 
both VLA (length-agnostic) and VLS (fixed-length) modes.

(insn 168 173 170 31 (set (reg:RVVM1SI 101 v5 [311])
(unspec:RVVM1SI [
(unspec:V32BI [
(const_vector:V32BI [
(const_int 1 [0x1]) repeated x32
])
(reg:DI 30 t5 [312])
(const_int 2 [0x2]) repeated x2
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(unspec:RVVM1SI [
(reg:V32SI 96 v0 [orig:185 vect__96.40 ] [185])   
-> VLS mode NUNITS = 32 elements.
(reg:RVVM1SI 113 v17 [439])   
-> VLA mode NUNITS = [8, 8] elements.
] UNSPEC_REDUC_XOR)
(unspec:RVVM1SI [
(reg:SI 0 zero)
] UNSPEC_VUNDEF)
] UNSPEC_REDUC)) 15948 {pred_redxorv32si}

In this case, record_use is trying to check partial_subreg_p (use->mode (), 
mode) for RTX = (reg:V32SI 96 v0 [orig:185 vect__96.40 ] [185]).

use->mode () is V32SImode, whereas mode is RVVM1SImode.  It then ICEs since
the two precisions are !ordered_p.

Fall back to setting the use mode to the register's natural mode
(reg_raw_mode), the largest mode that hard register can hold.

gcc/ChangeLog:

* rtl-ssa/insns.cc (function_info::record_use): Add !ordered_p case.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vsetvl_bug-2.c: New test.

---
 gcc/rtl-ssa/insns.cc  | 11 +++---
 .../riscv/rvv/vsetvl/vsetvl_bug-2.c   | 21 +++
 2 files changed, 29 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-2.c

diff --git a/gcc/rtl-ssa/insns.cc b/gcc/rtl-ssa/insns.cc
index 2fa48e0dacd..a54168d5c5f 100644
--- a/gcc/rtl-ssa/insns.cc
+++ b/gcc/rtl-ssa/insns.cc
@@ -520,9 +520,14 @@ function_info::record_use (build_info &bi, insn_info *insn,
   // the instruction (unusually) references the same register in two
   // different but equal-sized modes.
   gcc_checking_assert (use->insn () == insn);
-  if (HARD_REGISTER_NUM_P (regno)
- && partial_subreg_p (use->mode (), mode))
-   use->set_mode (mode);
+  if (HARD_REGISTER_NUM_P (regno))
+   {
+ if (!ordered_p (GET_MODE_PRECISION (use->mode ()),
+ GET_MODE_PRECISION (mode)))
+   use->set_mode (reg_raw_mode[regno]);
+ else if (partial_subreg_p (use->mode (), mode))
+   use->set_mode (mode);
+   }
   use->record_reference (ref, false);
 }
 }
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-2.c
new file mode 100644
index 000..bbc02eab818
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl_bug-2.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d 
--param=riscv-autovec-lmul=m4 -O3 -fomit-frame-pointer -funroll-loops" } */
+
+int safe_lshift_func_int32_t_s_s_left, safe_lshift_func_int32_t_s_s_right,
+safe_sub_func_uint64_t_u_u_ui2, safe_mul_func_uint64_t_u_u_ui2, g_79_2,
+g_97_l_439;
+void g_97(int * __restrict l_437)
+{
+  for (; g_97_l_439; g_97_l_439 += 1)
+for (char l_502 = 0; l_502 < 4; l_502++)
+  {
+int __trans_tmp_14 = ((safe_lshift_func_int32_t_s_s_right >= 2
+   || safe_lshift_func_int32_t_s_s_left)
+  ? 1 : safe_lshift_func_int32_t_s_s_right);
+long __trans_tmp_15 = __trans_tmp_14 * safe_mul_func_uint64_t_u_u_ui2;
+unsigned short __trans_tmp_16 = -__trans_tmp_15;
+int __trans_tmp_7
+  = (__trans_tmp_16 

Re: Re: [PATCH] RTL-SSA: Fix ICE on record_use of RTL_SSA for RISC-V VSETVL PASS

2023-12-11 Thread juzhe.zh...@rivai.ai
Thanks Richard.

Committed with V2:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640172.html 



juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-12-11 20:12
To: juzhe.zhong\@rivai.ai
CC: Robin Dapp; gcc-patches
Subject: Re: [PATCH] RTL-SSA: Fix ICE on record_use of RTL_SSA for RISC-V 
VSETVL PASS
"juzhe.zh...@rivai.ai"  writes:
> Thanks Richard.
>
> It would be great if you are ok I can fix it in RTL_SSA.
> I leverage your LRA patch in RTL_SSA:
>
>   else
> {
>   // Record the mode of the largest use.  The choice is arbitrary if
>   // the instruction (unusually) references the same register in two
>   // different but equal-sized modes.
>   gcc_checking_assert (use->insn () == insn);
>   if (HARD_REGISTER_NUM_P (regno))
> {
>   if (!ordered_p (GET_MODE_PRECISION (use->mode ()),
>   GET_MODE_PRECISION (mode)))
> use->set_mode (reg_raw_mode[regno]);
>   else if (partial_subreg_p (use->mode (), mode))
> use->set_mode (mode);
> }
>   use->record_reference (ref, false);
> }
>
> Is it reasonable to you ?
 
Yeah, the above is OK for trunk, thanks.
 
Richard
 
>
> Thanks.
>
>
> juzhe.zh...@rivai.ai
>  
> From: Richard Sandiford
> Date: 2023-12-11 19:45
> To: juzhe.zhong\@rivai.ai
> CC: Robin Dapp; gcc-patches
> Subject: Re: [PATCH] RTL-SSA: Fix ICE on record_use of RTL_SSA for RISC-V 
> VSETVL PASS
> "juzhe.zh...@rivai.ai"  writes:
>> I think it's reasonable to refactor the reduction instruction pattern to
>> work around this issue.
>>
>> Going to send a patch to apply this solution.
>>
>> So drop this patch.  Sorry for bothering Richard S.
>  
> It wasn't a bother.
>  
> On the patch: as things stand, we try to make the use_info's mode be
> at least as big as all the uses in the instruction.  So if we did
> want to handle unordered modes for hard registers, I think we would need
> to fall back on the register's natural mode instead.  I did something
> similar in LRA recently for PR112278 (6e2e0ce6795).
>  
> So if in future you would like to change RTL-SSA, a fix along those
> lines would be fine with me.
>  
> Thanks,
> Richard
>  
>>
>>
>>
>> juzhe.zh...@rivai.ai
>>  
>> From: Robin Dapp
>> Date: 2023-12-11 17:01
>> To: Juzhe-Zhong; gcc-patches
>> CC: rdapp.gcc; richard.sandiford
>> Subject: Re: [PATCH] RTL-SSA: Fix ICE on record_use of RTL_SSA for RISC-V 
>> VSETVL PASS
>>> In record_use:
>>> 
>>>   if (HARD_REGISTER_NUM_P (regno)
>>>   && partial_subreg_p (use->mode (), mode))
>>> 
>>> Assertion failed on partial_subreg_p which is:
>>> 
>>> inline bool
>>> partial_subreg_p (machine_mode outermode, machine_mode innermode)
>>> {
>>>   /* Modes involved in a subreg must be ordered.  In particular, we must
>>>  always know at compile time whether the subreg is paradoxical.  */
>>>   poly_int64 outer_prec = GET_MODE_PRECISION (outermode);
>>>   poly_int64 inner_prec = GET_MODE_PRECISION (innermode);
>>>   gcc_checking_assert (ordered_p (outer_prec, inner_prec)); 
>>> -> cause ICE.
>>>   return maybe_lt (outer_prec, inner_prec);
>>> }
>>> 
>>> RISC-V VSETVL PASS is an advanced lazy vsetvl insertion PASS after RA 
>>> (register allocation).
>>> 
>>> The root cause is that we have a pattern (reduction instruction) that 
>>> includes both VLA (length-agnostic) and VLS (fixed-length) modes.
>>  
>> Maybe as additional context: The second input (which has a VLA mode here)
>> is not used entirely but just its first element.  This serves as initial
>> value for the reduction.
>>  
>> I'm not sure we'd want to model it as subreg here (endianness etc?).
>> Could we have a VLS-mode equivalent for the VLA mode that only holds
>> one element?
>>  
>> Regards
>> Robin
>>  
>>  
>  
 


[PATCH] i386: Fix missed APX_NDD check for shift/rotate expanders [PR 112943]

2023-12-11 Thread Hongyu Wang
Hi,

The ashl/lshr/ashr expanders call ix86_expand_binary_operator, and they can
also be reached from some post-reload splitters; TARGET_APX_NDD must be
passed to these calls to avoid forcing operands to memory at the post-reload
stage.

Bootstrapped/regtested on x86-64-pc-linux-gnu{-m32,}

Ok for master?

gcc/ChangeLog:

PR target/112943
* config/i386/i386.md (ashl3): Add TARGET_APX_NDD to
ix86_expand_binary_operator call.
(3): Likewise for rshift.
(di3): Likewise for DImode rotate.
(3): Likewise for SWI124 rotate.

gcc/testsuite/ChangeLog:

PR target/112943
* gcc.target/i386/pr112943.c: New test.
---
 gcc/config/i386/i386.md  | 12 +++--
 gcc/testsuite/gcc.target/i386/pr112943.c | 63 
 2 files changed, 71 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr112943.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index b4db50f61cd..f83064ec335 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14308,7 +14308,8 @@ (define_expand "ashl3"
(ashift:SDWIM (match_operand:SDWIM 1 "")
  (match_operand:QI 2 "nonmemory_operand")))]
   ""
-  "ix86_expand_binary_operator (ASHIFT, mode, operands); DONE;")
+  "ix86_expand_binary_operator (ASHIFT, mode, operands,
+   TARGET_APX_NDD); DONE;")
 
 (define_insn_and_split "*ashl3_doubleword_mask"
   [(set (match_operand: 0 "register_operand")
@@ -15564,7 +15565,8 @@ (define_expand "3"
(any_shiftrt:SDWIM (match_operand:SDWIM 1 "")
   (match_operand:QI 2 "nonmemory_operand")))]
   ""
-  "ix86_expand_binary_operator (, mode, operands); DONE;")
+  "ix86_expand_binary_operator (, mode, operands,
+   TARGET_APX_NDD); DONE;")
 
 ;; Avoid useless masking of count operand.
 (define_insn_and_split "*3_mask"
@@ -16791,7 +16793,8 @@ (define_expand "di3"
  ""
 {
   if (TARGET_64BIT)
-ix86_expand_binary_operator (, DImode, operands);
+ix86_expand_binary_operator (, DImode, operands,
+TARGET_APX_NDD);
   else if (const_1_to_31_operand (operands[2], VOIDmode))
 emit_insn (gen_ix86_di3_doubleword
(operands[0], operands[1], operands[2]));
@@ -16811,7 +16814,8 @@ (define_expand "3"
(any_rotate:SWIM124 (match_operand:SWIM124 1 "nonimmediate_operand")
(match_operand:QI 2 "nonmemory_operand")))]
   ""
-  "ix86_expand_binary_operator (, mode, operands); DONE;")
+  "ix86_expand_binary_operator (, mode, operands,
+   TARGET_APX_NDD); DONE;")
 
 ;; Avoid useless masking of count operand.
 (define_insn_and_split "*3_mask"
diff --git a/gcc/testsuite/gcc.target/i386/pr112943.c 
b/gcc/testsuite/gcc.target/i386/pr112943.c
new file mode 100644
index 000..45da6cce5b7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr112943.c
@@ -0,0 +1,63 @@
+/* PR target/112943 */
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -march=westmere -mapxf" } */
+
+typedef unsigned char __attribute__((__vector_size__(1))) v8u8;
+typedef char __attribute__((__vector_size__(2))) v16u8;
+typedef char __attribute__((__vector_size__(4))) v32u8;
+typedef char __attribute__((__vector_size__(8))) v64u8;
+typedef char __attribute__((__vector_size__(16))) v128u8;
+typedef _Float16 __attribute__((__vector_size__(2))) v16f16;
+typedef _Float16 __attribute__((__vector_size__(16))) v128f16;
+typedef _Float64x __attribute__((__vector_size__(16))) v128f128;
+typedef _Decimal64 d64;
+char foo0_u8_0;
+v8u8 foo0_v8u8_0;
+__attribute__((__vector_size__(sizeof(char)))) char foo0_v8s8_0;
+__attribute__((__vector_size__(sizeof(long)))) unsigned long v64u64_0;
+_Float16 foo0_f16_0;
+v128f16 foo0_v128f16_0;
+double foo0_f64_0;
+int foo0_f128_0, foo0_v32d32_0, foo0__0;
+d64 foo0_d64_0;
+v8u8 *foo0_ret;
+unsigned __int128 foo0_u128_3;
+v8u8 d;
+void foo0() {
+v64u64_0 -= foo0_u8_0;
+v8u8 v8u8_1 = foo0_v8u8_0 % d;
+v128f128 v128f128_1 = __builtin_convertvector(v64u64_0, v128f128);
+__int128 u128_2 = (9223372036854775808 << 4) * foo0_u8_0; /* { dg-warning 
"integer constant is so large that it is unsigned" "so large" } */
+__int128 u128_r = u128_2 + foo0_u128_3 + foo0_f128_0 + 
(__int128)foo0_d64_0;
+v16f16 v16f16_1 = __builtin_convertvector(foo0_v8s8_0, v16f16);
+v128f16 v128f16_1 = 0 > foo0_v128f16_0;
+v128u8 v128u8_r = (v128u8)v128f16_1 + (v128u8)v128f128_1;
+v64u8 v64u8_r = ((union {
+ v128u8 a;
+ v64u8 b;
+ })v128u8_r)
+.b +
+  (v64u8)v64u64_0;
+v32u8 v32u8_r = ((union {
+ v64u8 a;
+ v32u8 b;
+ })v64u8_r)
+.b +
+  (v32u8)foo0_v32d32_0;
+v16u8 v16u8_r = ((union {
+ v32u8 a;
+ v16u8 b;
+ })v32u8_r)
+

Re: [PATCH] i386: Fix missed APX_NDD check for shift/rotate expanders [PR 112943]

2023-12-11 Thread Jakub Jelinek
On Mon, Dec 11, 2023 at 08:20:20PM +0800, Hongyu Wang wrote:

LGTM but will defer review of the patch to Hongtao or Uros.

> +__int128 u128_2 = (9223372036854775808 << 4) * foo0_u8_0; /* { 
> dg-warning "integer constant is so large that it is unsigned" "so large" } */

You can just use (9223372036854775807LL + (__int128) 1) instead of
9223372036854775808 to avoid the warning.
The testcase will ICE without the patch even with that.

Jakub



Re: [PATCH] i386: Fix missed APX_NDD check for shift/rotate expanders [PR 112943]

2023-12-11 Thread Hongyu Wang
> > +__int128 u128_2 = (9223372036854775808 << 4) * foo0_u8_0; /* { 
> > dg-warning "integer constant is so large that it is unsigned" "so large" } 
> > */
>
> You can just use (9223372036854775807LL + (__int128) 1) instead of
> 9223372036854775808 to avoid the warning.
> The testcase will ICE without the patch even with that.

Thanks for the hint! Will adjust when pushing the patch.


[PATCH] wrong code on m68k with -mlong-jump-table-offsets and -malign-int (PR target/112413)

2023-12-11 Thread Mikael Pettersson
On m68k the compiler assumes that the PC-relative jump-via-jump-table
instruction and the jump table are adjacent with no padding in between.

When -mlong-jump-table-offsets is combined with -malign-int, a 2-byte
nop may be inserted before the jump table, causing the jump to add the
fetched offset to the wrong PC base and thus jump to the wrong address.

Fixed by referencing the jump table via its label. On the test case
in the PR the object code change is (the moveal at 16 is the nop):

a:  6536        bcss 42
c:  e588        lsll #2,%d0
e:  203b 0808   movel %pc@(18 ,%d0:l),%d0
-  12:  4efb 0802   jmp %pc@(16 ,%d0:l)
+  12:  4efb 0804   jmp %pc@(18 ,%d0:l)
   16:  284c        moveal %a4,%a4
   18:   0020   orib #32,%d0
   1c:   002c   orib #44,%d0

Bootstrapped and tested on m68k-linux-gnu, no regressions.

Note: I don't have commit rights, so I would need assistance applying this.

2023-12-11  Mikael Pettersson  

PR target/112413
* config/m68k/linux.h (ASM_RETURN_CASE_JUMP): For
TARGET_LONG_JUMP_TABLE_OFFSETS, reference the jump table
via its label.
* config/m68k/m68kelf.h (ASM_RETURN_CASE_JUMP): Likewise.
* config/m68k/netbsd-elf.h (ASM_RETURN_CASE_JUMP): Likewise.
---
 gcc/config/m68k/linux.h  | 4 ++--
 gcc/config/m68k/m68kelf.h| 4 ++--
 gcc/config/m68k/netbsd-elf.h | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/m68k/linux.h b/gcc/config/m68k/linux.h
index 2e1cb5498b8..37069c4d082 100644
--- a/gcc/config/m68k/linux.h
+++ b/gcc/config/m68k/linux.h
@@ -102,12 +102,12 @@ along with GCC; see the file COPYING3.  If not see
if (ADDRESS_REG_P (operands[0]))\
  return "jmp %%pc@(2,%0:l)";   \
else if (TARGET_LONG_JUMP_TABLE_OFFSETS)\
- return "jmp %%pc@(2,%0:l)";   \
+ return "jmp %%pc@(%l1,%0:l)"; \
else\
  return "ext%.l %0\n\tjmp %%pc@(2,%0:l)";  \
   }\
 else if (TARGET_LONG_JUMP_TABLE_OFFSETS)   \
-  return "jmp %%pc@(2,%0:l)";  \
+  return "jmp %%pc@(%l1,%0:l)";\
 else   \
   return "jmp %%pc@(2,%0:w)";  \
   } while (0)
diff --git a/gcc/config/m68k/m68kelf.h b/gcc/config/m68k/m68kelf.h
index 01ee724ef2b..f89c9b70455 100644
--- a/gcc/config/m68k/m68kelf.h
+++ b/gcc/config/m68k/m68kelf.h
@@ -59,12 +59,12 @@ along with GCC; see the file COPYING3.  If not see
if (ADDRESS_REG_P (operands[0]))\
  return "jmp %%pc@(2,%0:l)";   \
else if (TARGET_LONG_JUMP_TABLE_OFFSETS)\
- return "jmp %%pc@(2,%0:l)";   \
+ return "jmp %%pc@(%l1,%0:l)"; \
else\
  return "ext%.l %0\n\tjmp %%pc@(2,%0:l)";  \
   }\
 else if (TARGET_LONG_JUMP_TABLE_OFFSETS)   \
-  return "jmp %%pc@(2,%0:l)";  \
+  return "jmp %%pc@(%l1,%0:l)";\
 else   \
   return "jmp %%pc@(2,%0:w)";  \
   } while (0)
diff --git a/gcc/config/m68k/netbsd-elf.h b/gcc/config/m68k/netbsd-elf.h
index 4d4a6d71cc4..6ba581b7b18 100644
--- a/gcc/config/m68k/netbsd-elf.h
+++ b/gcc/config/m68k/netbsd-elf.h
@@ -137,12 +137,12 @@ while (0)
if (ADDRESS_REG_P (operands[0]))\
  return "jmp %%pc@(2,%0:l)";   \
else if (TARGET_LONG_JUMP_TABLE_OFFSETS)\
- return "jmp %%pc@(2,%0:l)";   \
+ return "jmp %%pc@(%l1,%0:l)"; \
else\
  return "ext%.l %0\n\tjmp %%pc@(2,%0:l)";  \
   }\
 else if (TARGET_LONG_JUMP_TABLE_OFFSETS)   \
-  return "jmp %%pc@(2,%0:l)";  \
+  return "jmp %%pc@(%l1,%0:l)";\
 else   \
   return "jmp %%pc@(2,%0:w)";  \
   } while (0)
-- 
2.43.0



Re: [PATCH] RISC-V: Add vectorized strcmp.

2023-12-11 Thread Robin Dapp
Hi Pan,

> I reduced the SZ size from 10 to 1, and the below case with SZ = 2
> will fail. The failed location is "foo is 50, foo2 is 12800, i,j is
> 1, 0".
> 
> #define SZ 2
> 
> const char *s[SZ]  = {"1",
> "12345678901234567889012345678901234567890"};

Thanks.  I still cannot reproduce but I think the reason is that
foo2 (so the reference) does something different with newlib as
opposed to libc.

Could you try if the attached helps for you?  

Regards
 Robin

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
index 6dec7da91c1..adbe022e0ee 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
@@ -1,8 +1,6 @@
 /* { dg-do run } */
 /* { dg-additional-options "-O3 -minline-strcmp" } */
 
-#include 
-
 int
 __attribute__ ((noipa))
 foo (const char *s, const char *t)
@@ -10,23 +8,26 @@ foo (const char *s, const char *t)
   return __builtin_strcmp (s, t);
 }
 
-int
-__attribute__ ((noipa, optimize ("0")))
-foo2 (const char *s, const char *t)
-{
-  return strcmp (s, t);
-}
-
 #define SZ 10
 
-int main ()
+int
+main ()
 {
   const char *s[SZ]
 = {"",  "asdf", "0", "\0", "!@#$%***m1123fdnmoi43",
"a", "z","1", "9",  "12345678901234567889012345678901234567890"};
 
+  const int ref[SZ * SZ]
+= {0,   -97, -48, 0,   -33, -97, -122, -49, -57, -49, 97,  0,   49,  97, 64,
+   115, -25, 48,  40,  48, 48,  -49,  0,   48,  15,  -49, -74, -1,  -9, -1,
+   0,   -97, -48, 0,   -33, -97, -122, -49, -57, -49, 33,  -64, -15, 33, 0,
+   -64, -89, -16, -24, -16, 97,  -115, 49, 97,  64,  0,   -25, 48,  40, 48,
+   122, 25, 74,  122, 89,  25,  0,   73,  65,  73,  49,  -48, 1,   49, 16,
+   -48, -73, 0,   -8,  -50, 57,  -40,  9,  57,  24,  -40, -65, 8,   0,  8,
+   49,  -48, 1,   49,  16, -48, -73,  50,  -8,  0};
+
   for (int i = 0; i < SZ; i++)
 for (int j = 0; j < SZ; j++)
-  if (foo (s[i], s[j]) != foo2 (s[i], s[j]))
+  if (foo (s[i], s[j]) != ref [i * SZ + j])
 __builtin_abort ();
 }


Re: Backport of "fixincludes: Update darwin_flt_eval_method for macOS 14"

2023-12-11 Thread Iain Sandoe
Hi FX,

> On 11 Dec 2023, at 11:37, FX Coudert  wrote:

> I’d like to backport the fixincludes for macOS 14 SDK at 
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=93f803d53b5ccaabded9d7b4512b54da81c1c616
>  to the active branches, i.e. 13, 12 and 11 (unless I am mistaken).
> 
> The fix has been there for months, it’s stable and very specific. Without it, 
> we can’t compile GCC for macOS 14.
> OK to backport?

Yes, OK (build fixes are on my list, but you got to it first).
thanks
Iain


> 
> FX



RE: [PATCH] RISC-V: Add vectorized strcmp.

2023-12-11 Thread Li, Pan2
Yes, I tested the patch with all the configurations below and there is no
failure now. That would be great!

riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow

riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m1

riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m1/--param=riscv-autovec-preference=fixed-vlmax

riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2

riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2/--param=riscv-autovec-preference=fixed-vlmax

riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4

riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4/--param=riscv-autovec-preference=fixed-vlmax

riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8

riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax

riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m1

riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m1/--param=riscv-autovec-preference=fixed-vlmax

riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2

riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2/--param=riscv-autovec-preference=fixed-vlmax

riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4

riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4/--param=riscv-autovec-preference=fixed-vlmax

riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8

riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax

riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m1

riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m1/--param=riscv-autovec-preference=fixed-vlmax

riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2

riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2/--param=riscv-autovec-preference=fixed-vlmax

riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4

riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4/--param=riscv-autovec-preference=fixed-vlmax

riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8

riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax

Pan

-Original Message-
From: Robin Dapp  
Sent: Monday, December 11, 2023 9:15 PM
To: Li, Pan2 ; 钟居哲 ; gcc-patches 
; palmer ; kito.cheng 
; Jeff Law 
Cc: rdapp@gmail.com
Subject: Re: [PATCH] RISC-V: Add vectorized strcmp.

Hi Pan,

> I reduced the SZ size from 10 to 1, and the below case with SZ = 2
> will fail. The failed location is "foo is 50, foo2 is 12800, i,j is
> 1, 0".
> 
> #define SZ 2
> 
> const char *s[SZ]  = {"1",
> "12345678901234567889012345678901234567890"};

Thanks.  I still cannot reproduce but I think the reason is that
foo2 (so the reference) does something different with newlib as
opposed to libc.

Could you try if the attached helps for you?  

Regards
 Robin

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
index 6dec7da91c1..adbe022e0ee 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
@@ -1,8 +1,6 @@
 /* { dg-do run } */
 /* { dg-additional-options "-O3 -minline-strcmp" } */
 
-#include 
-
 int
 __attribute__ ((noipa))
 foo (const char *s, const char *t)
@@ -10,23 +8,26 @@ foo (const char *s, const char *t)
   return __builtin_strcmp (s, t);
 }
 
-int
-__attribute__ ((noipa, optimize ("0")))
-foo2 (const char *s, const char *t)
-{
-  return strcmp (s, t);
-}
-
 #define SZ 10
 
-int main ()
+int
+main ()
 {
   const char *s[SZ]
 = {"",  "asdf", "0", "\0", "!@#$%***m1123fdnmoi43",
"a", "z","1", "9",  "12345678901234567889012345678901234567890"};
 
+  const int ref[SZ * SZ]
+    = {0,   -97, -48, 0,   -33, -97, -122, -49, -57, -49, 97,  0,   49,  97, 64,
+   115, -25, 48,  40,  48,  48,  -49,  0,   48,  15,  -49, -74, -1,  -9, -1,
+   0,   -97, -48, 0,   -33, -97, -122, -49, -57, -49, 33,  -64, -15, 33, 0,
+   -64, -89, -16, -24, -16, 97,  -115, 49,  97,  64,  0,   -25, 48,  40, 48,
+   122, 25,  74,  122, 89,  25,  0,    73,  65,  73,  49,  -48, 1,   49

Re: [PATCH 2/3] RISC-V: setmem for RISCV with V extension

2023-12-11 Thread Kito Cheng
On Mon, Dec 11, 2023 at 5:48 PM Sergei Lewis  wrote:
>
> gcc/ChangeLog
>
> * config/riscv/riscv-protos.h (riscv_vector::expand_vec_setmem): New 
> function
> declaration.
>
> * config/riscv/riscv-string.cc (riscv_vector::expand_vec_setmem): New
> function: this generates an inline vectorised memory set, if and only if 
> we
> know the entire operation can be performed in a single vector store
>
> * config/riscv/riscv.md (setmem): Try 
> riscv_vector::expand_vec_setmem
> for constant lengths
>
> gcc/testsuite/ChangeLog
> * gcc.target/riscv/rvv/base/setmem-1.c: New tests
> ---
>  gcc/config/riscv/riscv-protos.h   |  1 +
>  gcc/config/riscv/riscv-string.cc  | 82 +++
>  gcc/config/riscv/riscv.md | 14 +++
>  .../gcc.target/riscv/rvv/base/setmem-1.c  | 99 +++
>  4 files changed, 196 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c
>
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 20bbb5b859c..950cb65c910 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -560,6 +560,7 @@ void expand_popcount (rtx *);
>  void expand_rawmemchr (machine_mode, rtx, rtx, rtx, bool = false);
>  bool expand_strcmp (rtx, rtx, rtx, rtx, unsigned HOST_WIDE_INT, bool);
>  void emit_vec_extract (rtx, rtx, poly_int64);
> +bool expand_vec_setmem (rtx, rtx, rtx, rtx);
>
>  /* Rounding mode bitfield for fixed point VXRM.  */
>  enum fixed_point_rounding_mode
> diff --git a/gcc/config/riscv/riscv-string.cc 
> b/gcc/config/riscv/riscv-string.cc
> index 11c1f74d0b3..0abbd5f8b28 100644
> --- a/gcc/config/riscv/riscv-string.cc
> +++ b/gcc/config/riscv/riscv-string.cc
> @@ -1247,4 +1247,86 @@ expand_strcmp (rtx result, rtx src1, rtx src2, rtx 
> nbytes,
>return true;
>  }
>
> +
> +/* Select appropriate LMUL for a single vector operation based on
> +   byte size of data to be processed.
> +   On success, return true and populate lmul_out.
> +   If length_in is too wide for a single vector operation, return false
> +   and leave lmul_out unchanged.  */
> +
> +static bool
> +select_appropriate_lmul (HOST_WIDE_INT length_in,
> +HOST_WIDE_INT &lmul_out)
> +{
> +  /* if it's tiny, default operation is likely better; maybe worth
> + considering fractional lmul in the future as well.  */
> +  if (length_in < (TARGET_MIN_VLEN/8))

(TARGET_MIN_VLEN / 8)

> +return false;
> +
> +  /* find smallest lmul large enough for entire op.  */
> +  HOST_WIDE_INT lmul = 1;
> +  while ((lmul <= 8) && (length_in > ((lmul*TARGET_MIN_VLEN)/8)))

 ((lmul * TARGET_MIN_VLEN) / 8)))

> +{
> +  lmul <<= 1;
> +}
> +
> +  if (lmul > 8)
> +return false;
> +
> +  lmul_out = lmul;
> +  return true;
> +}
> +
> +/* Used by setmemdi in riscv.md.  */
> +bool
> +expand_vec_setmem (rtx dst_in, rtx length_in, rtx fill_value_in,
> + rtx alignment_in)
> +{
> +  /* we're generating vector code.  */
> +  if (!TARGET_VECTOR)
> +return false;
> +  /* if we can't reason about the length, let libc handle the operation.  */
> +  if (!CONST_INT_P (length_in))
> +return false;
> +
> +  HOST_WIDE_INT length = INTVAL (length_in);
> +  HOST_WIDE_INT lmul;
> +
> +  /* select an lmul such that the data just fits into one vector operation;
> + bail if we can't.  */
> +  if (!select_appropriate_lmul (length, lmul))
> +return false;
> +
> +  machine_mode vmode = riscv_vector::get_vector_mode (QImode,
> + BYTES_PER_RISCV_VECTOR * lmul).require ();
> +  rtx dst_addr = copy_addr_to_reg (XEXP (dst_in, 0));
> +  rtx dst = change_address (dst_in, vmode, dst_addr);
> +
> +  rtx fill_value = gen_reg_rtx (vmode);
> +  rtx broadcast_ops[] = {fill_value, fill_value_in};
> +
> +  /* If the length is exactly vlmax for the selected mode, do that.
> + Otherwise, use a predicated store.  */
> +  if (known_eq (GET_MODE_SIZE (vmode), INTVAL (length_in)))
> +{
> +  emit_vlmax_insn (code_for_pred_broadcast (vmode),
> + UNARY_OP, broadcast_ops);
> +  emit_move_insn (dst, fill_value);
> +}
> +  else
> +{
> +  if (!satisfies_constraint_K (length_in))
> + length_in = force_reg (Pmode, length_in);
> +  emit_nonvlmax_insn (code_for_pred_broadcast (vmode), UNARY_OP,
> + broadcast_ops, length_in);
> +  machine_mode mask_mode = riscv_vector::get_vector_mode
> + (BImode, GET_MODE_NUNITS (vmode)).require ();
> +  rtx mask =  CONSTM1_RTX (mask_mode);
> +  emit_insn (gen_pred_store (vmode, dst, mask, fill_value, length_in,
> + get_avl_type_rtx (riscv_vector::NONVLMAX)));
> +}
> +
> +  return true;
> +}
> +
>  }
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index 88fde290a8a..29d3b1aa342 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -2381,6 +2381,20 @@
>  FAIL;
>  })
>
> +(define_expand "set

[PATCH] RISC-V: testsuite: Fix strcmp-run.c test.

2023-12-11 Thread Robin Dapp
Hi,

this fixes expectations in the strcmp-run test which would sometimes
fail with newlib.  The test expects libc strcmp return values and
asserts the vectorized result is similar to those.  Therefore hard-code
the expected results instead of relying on a strcmp call.

Pan has already tested in a lot of configurations and doesn't see
failures anymore.

I'd argue it's obvious enough to push it if nobody complains :)

Regards
 Robin

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c: Adjust test
expectation.
---
 .../riscv/rvv/autovec/builtin/strcmp-run.c| 23 ++-
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
index 6dec7da91c1..adbe022e0ee 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
@@ -1,8 +1,6 @@
 /* { dg-do run } */
 /* { dg-additional-options "-O3 -minline-strcmp" } */
 
-#include <string.h>
-
 int
 __attribute__ ((noipa))
 foo (const char *s, const char *t)
@@ -10,23 +8,26 @@ foo (const char *s, const char *t)
   return __builtin_strcmp (s, t);
 }
 
-int
-__attribute__ ((noipa, optimize ("0")))
-foo2 (const char *s, const char *t)
-{
-  return strcmp (s, t);
-}
-
 #define SZ 10
 
-int main ()
+int
+main ()
 {
   const char *s[SZ]
 = {"",  "asdf", "0", "\0", "!@#$%***m1123fdnmoi43",
"a", "z","1", "9",  "12345678901234567889012345678901234567890"};
 
+  const int ref[SZ * SZ]
+    = {0,   -97, -48, 0,   -33, -97, -122, -49, -57, -49, 97,  0,   49,  97, 64,
+   115, -25, 48,  40,  48,  48,  -49,  0,   48,  15,  -49, -74, -1,  -9, -1,
+   0,   -97, -48, 0,   -33, -97, -122, -49, -57, -49, 33,  -64, -15, 33, 0,
+   -64, -89, -16, -24, -16, 97,  -115, 49,  97,  64,  0,   -25, 48,  40, 48,
+   122, 25,  74,  122, 89,  25,  0,    73,  65,  73,  49,  -48, 1,   49, 16,
+   -48, -73, 0,   -8,  -50, 57,  -40,  9,   57,  24,  -40, -65, 8,   0,  8,
+   49,  -48, 1,   49,  16,  -48, -73,  50,  -8,  0};
+
   for (int i = 0; i < SZ; i++)
 for (int j = 0; j < SZ; j++)
-  if (foo (s[i], s[j]) != foo2 (s[i], s[j]))
+  if (foo (s[i], s[j]) != ref [i * SZ + j])
 __builtin_abort ();
 }
-- 
2.43.0



Re: [PATCH] RISC-V: Add vectorized strcmp.

2023-12-11 Thread Robin Dapp
> Yes, I test the patch with all below configurations and there is no failure 
> now. That would be great!

Thank you!  I posted it as a patch now:

https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640182.html

Regards
 Robin


Re: [PATCH] RISC-V: testsuite: Fix strcmp-run.c test.

2023-12-11 Thread juzhe.zhong
lgtm.



Re: [PATCH] testsuite: adjust call to abort in excess-precision-12

2023-12-11 Thread Marc Poulhiès
Hello,

> Why wouldn't they have abort and what else does __builtin_abort () expand
> to?

It expands to abort but works around the "abort is undeclared" error.

> There are 2000+ other tests in gcc.target/i386/ which call abort (),
> not __builtin_abort (), after including <stdlib.h> directly or indirectly
> or declaring it themselves.  This test in particular includes <cstdlib>.
>
> Does whatever target you are running this on provide just std::abort ()
> and not abort (); from <cstdlib>?  If so, perhaps it should call
> std::abort (); instead of abort ().

You are correct, std::abort() is a better solution. cstdlib does not
include stdlib.h because I'm on a non-hosted target. I'll send a
refreshed patch.

Thanks,
Marc


[PATCH v2] testsuite: adjust call to abort in excess-precision-12

2023-12-11 Thread Marc Poulhiès
On non-hosted targets, cstdlib may not be sufficient to have abort
defined, but it should be for std::abort.

gcc/testsuite/ChangeLog:

* g++.target/i386/excess-precision-12.C: call std::abort instead of 
abort.
---
Changed from calling __builtin_abort to std::abort, as advised.

Ok for master?

Thanks,
Marc

 gcc/testsuite/g++.target/i386/excess-precision-12.C | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/g++.target/i386/excess-precision-12.C 
b/gcc/testsuite/g++.target/i386/excess-precision-12.C
index dff48c07c8b..7cfd15d6136 100644
--- a/gcc/testsuite/g++.target/i386/excess-precision-12.C
+++ b/gcc/testsuite/g++.target/i386/excess-precision-12.C
@@ -13,8 +13,8 @@ main (void)
   unsigned long long int u = (1ULL << 63) + 1;
 
   if ((f <=> u) >= 0)
-abort ();
+std::abort ();
 
   if ((u <=> f) <= 0)
-abort ();
+std::abort ();
 }
-- 
2.43.0



Re: [PATCH v2] testsuite: adjust call to abort in excess-precision-12

2023-12-11 Thread Jakub Jelinek
On Mon, Dec 11, 2023 at 02:35:52PM +0100, Marc Poulhiès wrote:
> On non-hosted targets, cstdlib may not be sufficient to have abort
> defined, but it should be for std::abort.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.target/i386/excess-precision-12.C: call std::abort instead of 
> abort.
> ---
> Changed from calling __builtin_abort to std::abort, as advised.
> 
> Ok for master?

Ok.

Jakub



Re: [PATCH V3 4/4] OpenMP: Permit additional selector properties

2023-12-11 Thread Tobias Burnus

This patch LGTM.

Likewise 'LGTM' are patches 1/4 and 2/4, in line with my previous
comments. (Those are unchanged to previous round.)

Thanks for the patches!

I still have to look at 3/4, which is large and did see some changes
between v2 and v3. (Overall they seem to be really nice!)

Tobias

On 07.12.23 16:52, Sandra Loosemore wrote:

This patch adds "hpe" to the known properties for the "vendor" selector,
and support for "acquire" and "release" for "atomic_default_mem_order".

gcc/ChangeLog
  * omp-general.cc (vendor_properties): Add "hpe".
  (atomic_default_mem_order_properties): Add "acquire" and "release".
  (omp_context_selector_matches): Handle "acquire" and "release".

gcc/testsuite/ChangeLog
  * c-c++-common/gomp/declare-variant-2.c: Don't expect error on
  "acquire" and "release".
  * gfortran.dg/gomp/declare-variant-2a.f90: Likewise.
---
  gcc/omp-general.cc| 10 --
  gcc/testsuite/c-c++-common/gomp/declare-variant-2.c   |  4 ++--
  gcc/testsuite/gfortran.dg/gomp/declare-variant-2a.f90 |  4 ++--
  3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index 5f0cb041ffa..4f7c83fbd2c 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -1126,12 +1126,12 @@ const char *omp_tss_map[] =
  static const char *const kind_properties[] =
{ "host", "nohost", "cpu", "gpu", "fpga", "any", NULL };
  static const char *const vendor_properties[] =
-  { "amd", "arm", "bsc", "cray", "fujitsu", "gnu", "ibm", "intel",
+  { "amd", "arm", "bsc", "cray", "fujitsu", "gnu", "hpe", "ibm", "intel",
  "llvm", "nvidia", "pgi", "ti", "unknown", NULL };
  static const char *const extension_properties[] =
{ NULL };
  static const char *const atomic_default_mem_order_properties[] =
-  { "seq_cst", "relaxed", "acq_rel", NULL };
+  { "seq_cst", "relaxed", "acq_rel", "acquire", "release", NULL };

  struct omp_ts_info omp_ts_map[] =
{
@@ -1551,6 +1551,12 @@ omp_context_selector_matches (tree ctx)
else if (!strcmp (prop, "acq_rel")
 && omo != OMP_MEMORY_ORDER_ACQ_REL)
  return 0;
+   else if (!strcmp (prop, "acquire")
+&& omo != OMP_MEMORY_ORDER_ACQUIRE)
+ return 0;
+   else if (!strcmp (prop, "release")
+&& omo != OMP_MEMORY_ORDER_RELEASE)
+ return 0;
  }
break;
  case OMP_TRAIT_DEVICE_ARCH:
diff --git a/gcc/testsuite/c-c++-common/gomp/declare-variant-2.c 
b/gcc/testsuite/c-c++-common/gomp/declare-variant-2.c
index 97285fa3b74..bc3f443379f 100644
--- a/gcc/testsuite/c-c++-common/gomp/declare-variant-2.c
+++ b/gcc/testsuite/c-c++-common/gomp/declare-variant-2.c
@@ -105,9 +105,9 @@ void f50 (void);  /* { 
dg-error "expected '\\\}' before '\\(' token" "" {
  void f51 (void);/* { dg-error 
"expected '\\\}' before '\\(' token" "" { target c } .-1 } */
  #pragma omp declare variant (f1) match(implementation={atomic_default_mem_order})   /* 
{ dg-error "expected '\\(' before '\\\}' token" } */
  void f52 (void);
-#pragma omp declare variant (f1) 
match(implementation={atomic_default_mem_order(acquire)})   /* { dg-error "incorrect 
property 'acquire' of 'atomic_default_mem_order' selector" } */
+#pragma omp declare variant (f1) 
match(implementation={atomic_default_mem_order(acquire)})
  void f53 (void);
-#pragma omp declare variant (f1) 
match(implementation={atomic_default_mem_order(release)})   /* { dg-error "incorrect 
property 'release' of 'atomic_default_mem_order' selector" } */
+#pragma omp declare variant (f1) 
match(implementation={atomic_default_mem_order(release)})
  void f54 (void);
  #pragma omp declare variant (f1) 
match(implementation={atomic_default_mem_order(foobar)})   /* { dg-error "incorrect 
property 'foobar' of 'atomic_default_mem_order' selector" } */
  void f55 (void);
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-variant-2a.f90 
b/gcc/testsuite/gfortran.dg/gomp/declare-variant-2a.f90
index 56de1177789..edc9b27f884 100644
--- a/gcc/testsuite/gfortran.dg/gomp/declare-variant-2a.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-variant-2a.f90
@@ -29,10 +29,10 @@ contains
  !$omp declare variant (f1) match(implementation={vendor("foobar")}) ! { dg-warning 
"unknown property '.foobar.' of 'vendor' selector" }
end subroutine
subroutine f53 ()
-!$omp declare variant (f1) match(implementation={atomic_default_mem_order(acquire)}) 
 ! { dg-error "incorrect property 'acquire' of 'atomic_default_mem_order' 
selector" }
+!$omp declare variant (f1) 
match(implementation={atomic_default_mem_order(acquire)})
end subroutine
subroutine f54 ()
-!$omp declare variant (f1) match(implementation={atomic_default_mem_order(release)}) 
 ! { dg-error "incorrec

Re: [PATCH 1/2] analyzer: Remove check of unsigned_char in maybe_undo_optimize_bit_field_compare.

2023-12-11 Thread David Malcolm
On Mon, 2023-12-11 at 09:04 +0100, Richard Biener wrote:
> On Sun, Dec 10, 2023 at 8:57 PM Andrew Pinski
>  wrote:
> > 
> > From: Andrew Pinski 
> > 
> > The check for the type seems unnecessary and gets in the way
> > sometimes.
> > Also with a patch I am working on for match.pd, it causes a failure
> > to happen.
> > Before my patch the IR was:
> >   _1 = BIT_FIELD_REF ;
> >   _2 = _1 & 1;
> >   _3 = _2 != 0;
> >   _4 = (int) _3;
> >   __analyzer_eval (_4);
> > 
> > Where _2 was an unsigned char type.
> > And After my patch we have:
> >   _1 = BIT_FIELD_REF ;
> >   _2 = (int) _1;
> >   _3 = _2 & 1;
> >   __analyzer_eval (_3);
> > 
> > But in this case, the BIT_AND_EXPR is in an int type.
> > 
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no
> > regressions.

Yes...

> 
> OK (hope it's OK that I approve this).

...and yes.

Dave



[PATCH] Add myself to write after approval

2023-12-11 Thread Paul Iannetta
Hi,

I would like to add myself to write after approval, is it ok for
master?

Thanks,
Paul Iannetta

---8<---

ChangeLog:

* MAINTAINERS: Add myself to write after approval

Signed-off-by: Paul Iannetta 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 0dbcbadcfd7..971a33873bb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -471,6 +471,7 @@ Dominique d'Humieres

 Andy Hutchinson
 Joel Hutton
 Lewis Hyatt
+Paul Iannetta  
 Roland Illig   
 Meador Inge
 Bernardo Innocenti 
-- 
2.35.1.500.gb896f729e2







Re: [PATCH v2 5/6] libgomp, nvptx: Cuda pinned memory

2023-12-11 Thread Thomas Schwinge
Hi!

On 2023-12-08T17:44:14+0100, Tobias Burnus  wrote:
> On 08.12.23 15:09, Thomas Schwinge wrote:
>>> On 22/11/2023 17:07, Tobias Burnus wrote:
 Let's start with the patch itself:
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> ...
> +static struct gomp_device_descr *
> +get_device_for_page_locked (void)
> +{
> + gomp_debug (0, "%s\n",
> + __FUNCTION__);
> +
> + struct gomp_device_descr *device;
> +#ifdef HAVE_SYNC_BUILTINS
> + device
> +   = __atomic_load_n (&device_for_page_locked, MEMMODEL_RELAXED);
> + if (device == (void *) -1)
> +   {
> + gomp_debug (0, " init\n");
> +
> + gomp_init_targets_once ();
> +
> + device = NULL;
> + for (int i = 0; i < num_devices; ++i)
 Given that this function just sets a single variable based on whether the
 page_locked_host_alloc_func function pointer exists, wouldn't it be much
 simpler to just do all this handling in   gomp_target_init  ?
>>> @Thomas, care to comment on this?
>>  From what I remember, we cannot assume that 'gomp_target_init' has
>> already been done when we get here; therefore 'gomp_init_targets_once' is
>> being called here.  We may get to 'get_device_for_page_locked' via
>> host-side OpenMP, in code that doesn't contain any OpenMP 'target'
>> offloading things.  Therefore, this was (a) necessary to make that work,
>> and (b) did seem to be a useful abstraction to me.
>
> I am not questioning the "gomp_init_targets_once ();" but I am wondering
> whether only 'gomp_init_targets_once()' should remain without the
> locking + loading dance - and then just set that single variable inside
> gomp_target_init.

Ah, I see, thanks.

> If you reach here w/o target set up, the "gomp_init_targets_once ();"
> would ensure it gets initialized with all the other code inside
> gomp_target_init.
>
> And if gomp_target_init() was called before, gomp_init_targets_once()
> will just return without doing anything and you are also fine.

Yes, I suppose we could do it that way.  'get_device_for_page_locked'
could then, after 'gomp_init_targets_once', unconditionally return
'device_for_page_locked' (even without '__atomic_load', right?).
A disadvantage is that the setup of 'device_for_page_locked' (in
'gomp_target_init') and use of it (in 'get_device_for_page_locked') is
then split apart.  I guess I don't have a strong opinion on that one.
;-)


Grüße
 Thomas
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH 2/3] RISC-V: setmem for RISCV with V extension

2023-12-11 Thread Sergei Lewis
The thinking here is that using the largest possible LMUL when we know the
operation will fit in fewer registers potentially leaves performance on the
table - indirectly, due to the unnecessarily increased register pressure,
and also directly, depending on the implementation.

On Mon, Dec 11, 2023 at 10:05 AM juzhe.zh...@rivai.ai 
wrote:

> Hi, Thanks for contributing this.
>
> +/* Select appropriate LMUL for a single vector operation based on
> +   byte size of data to be processed.
> +   On success, return true and populate lmul_out.
> +   If length_in is too wide for a single vector operation, return false
> +   and leave lmul_out unchanged.  */
> +
> +static bool
> +select_appropriate_lmul (HOST_WIDE_INT length_in,
> +HOST_WIDE_INT &lmul_out)
> +{
>
> I don't think we need this, you only need to use TARGET_MAX_LMUL
>
>
> --
> juzhe.zh...@rivai.ai
>


Re: Re: [PATCH 2/3] RISC-V: setmem for RISCV with V extension

2023-12-11 Thread 钟居哲
I think we should leave it to user choice.

--param=riscv-autovec-lmul=m1/m2/m4/m8/dynamic.

So use TARGET_MAX_LMUL should be more reasonable.



juzhe.zh...@rivai.ai
 
From: Sergei Lewis
Date: 2023-12-11 22:58
To: juzhe.zh...@rivai.ai
CC: gcc-patches; Robin Dapp; Kito.cheng; jeffreyalaw
Subject: Re: [PATCH 2/3] RISC-V: setmem for RISCV with V extension
The thinking here is that using the largest possible LMUL when we know the 
operation will fit in fewer registers potentially leaves performance on the 
table - indirectly, due to the unnecessarily increased register pressure, and 
also directly, depending on the implementation.

On Mon, Dec 11, 2023 at 10:05 AM juzhe.zh...@rivai.ai  
wrote:
Hi, Thanks for contributing this.

+/* Select appropriate LMUL for a single vector operation based on
+   byte size of data to be processed.
+   On success, return true and populate lmul_out.
+   If length_in is too wide for a single vector operation, return false
+   and leave lmul_out unchanged.  */
+
+static bool
+select_appropriate_lmul (HOST_WIDE_INT length_in,
+HOST_WIDE_INT &lmul_out)
+{
I don't think we need this, you only need to use TARGET_MAX_LMUL




juzhe.zh...@rivai.ai


Re: Re: [PATCH 2/3] RISC-V: setmem for RISCV with V extension

2023-12-11 Thread Sergei Lewis
...oh, and keep the current approach if riscv-autovec-lmul=dynamic.
Makes perfect sense - thanks!

On Mon, Dec 11, 2023 at 3:01 PM 钟居哲  wrote:

> I think we should leave it to user choice.
>
> --param=riscv-autovec-lmul=m1/m2/m4/m8/dynamic.
>
> So use TARGET_MAX_LMUL should be more reasonable.
>
> --
> juzhe.zh...@rivai.ai
>
>
> *From:* Sergei Lewis 
> *Date:* 2023-12-11 22:58
> *To:* juzhe.zh...@rivai.ai
> *CC:* gcc-patches ; Robin Dapp
> ; Kito.cheng ; jeffreyalaw
> 
> *Subject:* Re: [PATCH 2/3] RISC-V: setmem for RISCV with V extension
> The thinking here is that using the largest possible LMUL when we know the
> operation will fit in fewer registers potentially leaves performance on the
> table - indirectly, due to the unnecessarily increased register pressure,
> and also directly, depending on the implementation.
>
> On Mon, Dec 11, 2023 at 10:05 AM juzhe.zh...@rivai.ai <
> juzhe.zh...@rivai.ai> wrote:
>
>> Hi, Thanks for contributing this.
>>
>> +/* Select appropriate LMUL for a single vector operation based on
>> +   byte size of data to be processed.
>> +   On success, return true and populate lmul_out.
>> +   If length_in is too wide for a single vector operation, return false
>> +   and leave lmul_out unchanged.  */
>> +
>> +static bool
>> +select_appropriate_lmul (HOST_WIDE_INT length_in,
>> +HOST_WIDE_INT &lmul_out)
>> +{
>>
>> I don't think we need this, you only need to use TARGET_MAX_LMUL
>>
>>
>> --
>> juzhe.zh...@rivai.ai
>>
>


[PATCH v4] aarch64: SVE/NEON Bridging intrinsics

2023-12-11 Thread Richard Ball
ACLE has added intrinsics to bridge between SVE and Neon.

The NEON_SVE Bridge adds intrinsics that allow conversions between NEON and
SVE vectors.

This patch adds support to GCC for the following 3 intrinsics:
svset_neonq, svget_neonq and svdup_neonq

gcc/ChangeLog:

* config.gcc: Adds new header to config.
* config/aarch64/aarch64-builtins.cc (enum aarch64_type_qualifiers):
Moved to header file.
(ENTRY): Likewise.
(enum aarch64_simd_type): Likewise.
(struct aarch64_simd_type_info): Remove static.
(GTY): Likewise.
* config/aarch64/aarch64-c.cc (aarch64_pragma_aarch64):
Defines pragma for arm_neon_sve_bridge.h.
* config/aarch64/aarch64-sve-builtins-base.h: New intrinsics.
* config/aarch64/aarch64-sve-builtins-base.cc
(class svget_neonq_impl): New intrinsic implementation.
(class svset_neonq_impl): Likewise.
(class svdup_neonq_impl): Likewise.
(NEON_SVE_BRIDGE_FUNCTION): New intrinsics.
* config/aarch64/aarch64-sve-builtins-functions.h
(NEON_SVE_BRIDGE_FUNCTION): Defines macro for NEON_SVE_BRIDGE
functions.
* config/aarch64/aarch64-sve-builtins-shapes.h: New shapes.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(parse_element_type): Add NEON element types.
(parse_type): Likewise.
(struct get_neonq_def): Defines function shape for get_neonq.
(struct set_neonq_def): Defines function shape for set_neonq.
(struct dup_neonq_def): Defines function shape for dup_neonq.
* config/aarch64/aarch64-sve-builtins.cc 
(DEF_SVE_TYPE_SUFFIX): Changed to be called through
SVE_NEON macro.
(DEF_SVE_NEON_TYPE_SUFFIX): Defines 
macro for NEON_SVE_BRIDGE type suffixes.
(DEF_NEON_SVE_FUNCTION): Defines 
macro for NEON_SVE_BRIDGE functions.
(function_resolver::infer_neon128_vector_type): Infers type suffix
for overloaded functions.
(init_neon_sve_builtins): Initialise neon_sve_bridge_builtins for LTO.
(handle_arm_neon_sve_bridge_h): Handles #pragma arm_neon_sve_bridge.h.
* config/aarch64/aarch64-sve-builtins.def
(DEF_SVE_NEON_TYPE_SUFFIX): Macro for handling neon_sve type suffixes.
(bf16): Replace entry with neon-sve entry.
(f16): Likewise.
(f32): Likewise.
(f64): Likewise.
(s8): Likewise.
(s16): Likewise.
(s32): Likewise.
(s64): Likewise.
(u8): Likewise.
(u16): Likewise.
(u32): Likewise.
(u64): Likewise.
* config/aarch64/aarch64-sve-builtins.h
(GCC_AARCH64_SVE_BUILTINS_H): Include aarch64-builtins.h.
(ENTRY): Add aarch64_simd_type definiton.
(enum aarch64_simd_type): Add neon information to type_suffix_info.
(struct type_suffix_info): New function.
* config/aarch64/aarch64-sve.md
(@aarch64_sve_get_neonq_): New intrinsic insn for big endian.
(@aarch64_sve_set_neonq_): Likewise.
* config/aarch64/aarch64.cc 
(aarch64_init_builtins): Add call to init_neon_sve_builtins.
* config/aarch64/iterators.md: Add UNSPEC_SET_NEONQ.
* config/aarch64/aarch64-builtins.h: New file.
* config/aarch64/aarch64-neon-sve-bridge-builtins.def: New file.
* config/aarch64/arm_neon_sve_bridge.h: New file.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Add include 
arm_neon_sve_bridge header file
* gcc.dg/torture/neon-sve-bridge.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_bf16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_f16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_f32.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_f64.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s64.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s8.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u32.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u64.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u8.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_bf16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_f16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_f32.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_f64.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s64.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s8.c: New test.

Re: [pushed] configure, libquadmath: Remove unintended AC_CHECK_LIBM [PR111928]

2023-12-11 Thread Jakub Jelinek
On Mon, Oct 23, 2023 at 02:18:39PM +0100, Iain Sandoe wrote:
> This is a partial reversion of r14-4825-g6a6d3817afa02b to remove an
> unintended change.
> 
> Tested with x86_64-linux X arm-none-eabi (and x86_64-darwin X arm-none-eabi
> and native x86_64-darwin bootstrap.  Also reported by the OP to fix the
> issue, pushed to trunk, apologies for the breakage,
> Iain
> 
> --- 8< ---
> 
> This was a rebase error, that managed to pass testing on Darwin and
> Linux (but fails on bare metal).
> 
>   PR libquadmath/111928
> 
> libquadmath/ChangeLog:
> 
>   * Makefile.in: Regenerate.
>   * configure: Regenerate.
>   * configure.ac: Remove AC_CHECK_LIBM.

I'm afraid this change is very harmful on Linux.
libquadmath.so.0 had on Linux since forever
 0x0001 (NEEDED) Shared library: [libm.so.6]
entry (and it should have it, because it has undefined relocations against
libm.so.6 entry points: at least signgam and sqrt, and on powerpc64le also
__sqrtieee128).
But with this change it no longer has.
This e.g. breaks libtool build on powerpc64le, where the dynamic linker
crashes during sqrt related IFUNC resolution.

Jakub



Ping: [PATCH] Treat "p" in asms as addressing VOIDmode

2023-12-11 Thread Richard Sandiford
Ping

---

check_asm_operands was inconsistent about how it handled "p" after
RA compared to before RA.  Before RA it tested the address with a
void (unknown) memory mode:

case CT_ADDRESS:
  /* Every address operand can be reloaded to fit.  */
  result = result || address_operand (op, VOIDmode);
  break;

After RA it deferred to constrain_operands, which used the mode
of the operand:

if ((GET_MODE (op) == VOIDmode
 || SCALAR_INT_MODE_P (GET_MODE (op)))
&& (strict <= 0
|| (strict_memory_address_p
 (recog_data.operand_mode[opno], op
  win = true;

Using the mode of the operand matches reload's behaviour:

  else if (insn_extra_address_constraint
   (lookup_constraint (constraints[i])))
{
  address_operand_reloaded[i]
= find_reloads_address (recog_data.operand_mode[i], (rtx*) 0,
recog_data.operand[i],
recog_data.operand_loc[i],
i, operand_type[i], ind_levels, insn);

It allowed the special predicate address_operand to be used, with the
mode of the operand being the mode of the addressed memory, rather than
the mode of the address itself.  For example, vax has:

(define_insn "*movaddr"
  [(set (match_operand:SI 0 "nonimmediate_operand" "=g")
(match_operand:VAXfp 1 "address_operand" "p"))
   (clobber (reg:CC VAX_PSL_REGNUM))]
  "reload_completed"
  "mova %a1,%0")

where operand 1 is an SImode expression that can address memory of
mode VAXfp.  GET_MODE (recog_data.operand[1]) is SImode (or VOIDmode),
but recog_data.operand_mode[1] is mode.

But AFAICT, ira and lra (like pre-reload check_asm_operands) do not
do this, and instead pass VOIDmode.  So I think this traditional use
of address_operand is effectively an old-reload-only feature.

And it seems like no modern port cares.  I think ports have generally
moved to using different address constraints instead, rather than
relying on "p" with different operand modes.  Target-specific address
constraints post-date the code above.

The big advantage of using different constraints is that it works
for asms too.  And that (to finally get to the point) is the problem
fixed in this patch.  For the aarch64 test:

  void f(char *p) { asm("prfm pldl1keep, %a0\n" :: "p" (p + 6)); }

everything up to and including RA required the operand to be a
valid VOIDmode address.  But post-RA check_asm_operands and
constrain_operands instead required it to be valid for
recog_data.operand_mode[0].  Since asms have no syntax for
specifying an operand mode that's separate from the operand itself,
operand_mode[0] is simply Pmode (i.e. DImode).

This meant that we required one mode before RA and a different mode
after RA.  On AArch64, VOIDmode is treated as a wildcard and so has a
more conservative/restricted range than DImode.  So if a post-RA pass
tried to form a new address, it would use a laxer condition than the
pre-RA passes.

This happened with the late-combine pass that I posted in October:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634166.html
which in turn triggered an error from aarch64_print_operand_address.

This patch takes the (hopefully) conservative fix of using VOIDmode for
asms but continuing to use the operand mode for .md insns, so as not
to break ports that still use reload.

Fixing this made me realise that recog_level2 was doing duplicate
work for asms after RA.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* recog.cc (constrain_operands): Pass VOIDmode to
strict_memory_address_p for 'p' constraints in asms.
* rtl-ssa/changes.cc (recog_level2): Skip redundant constrain_operands
for asms.

gcc/testsuite/
* gcc.target/aarch64/prfm_imm_offset_2.c: New test.
---
 gcc/recog.cc   | 18 +++---
 gcc/rtl-ssa/changes.cc |  4 +++-
 .../gcc.target/aarch64/prfm_imm_offset_2.c |  2 ++
 3 files changed, 16 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/prfm_imm_offset_2.c

diff --git a/gcc/recog.cc b/gcc/recog.cc
index eaab79c25d7..bff7be1aec1 100644
--- a/gcc/recog.cc
+++ b/gcc/recog.cc
@@ -3191,13 +3191,17 @@ constrain_operands (int strict, alternative_mask 
alternatives)
   strictly valid, i.e., that all pseudos requiring hard regs
   have gotten them.  We also want to make sure we have a
   valid mode.  */
-   if ((GET_MODE (op) == VOIDmode
-|| SCALAR_INT_MODE_P (GET_MODE (op)))
-   && (strict <= 0
-   || (strict_memory_address_p
-(recog_data.operand_mode[opno], op
- win = true;
-  

Ping: [PATCH] Add a late-combine pass [PR106594]

2023-12-11 Thread Richard Sandiford
Ping

---

This patch adds a combine pass that runs late in the pipeline.
There are two instances: one between combine and split1, and one
after postreload.

The pass currently has a single objective: remove definitions by
substituting into all uses.  The pre-RA version tries to restrict
itself to cases that are likely to have a neutral or beneficial
effect on register pressure.

The patch fixes PR106594.  It also fixes a few FAILs and XFAILs
in the aarch64 test results, mostly due to making proper use of
MOVPRFX in cases where we didn't previously.  I hope it would
also help with Robin's vec_duplicate testcase, although the
pressure heuristic might need tweaking for that case.

This is just a first step.  I'm hoping that the pass could be
used for other combine-related optimisations in future.  In particular,
the post-RA version doesn't need to restrict itself to cases where all
uses are substitutable, since it doesn't have to worry about register
pressure.  If we did that, and if we extended it to handle multi-register
REGs, the pass might be a viable replacement for regcprop, which in
turn might reduce the cost of having a post-RA instance of the new pass.

I've run an assembly comparison with one target per CPU directory,
and it seems to be a win for all targets except nvptx (which is hard
to measure, being a higher-level asm).  The biggest winner seemed
to be AVR.

I'd originally hoped to enable the pass by default at -O2 and above
on all targets.  But in the end, I don't think that's possible,
because it interacts badly with x86's STV and partial register
dependency passes.

For example, gcc.target/i386/minmax-6.c tests whether the code
compiles without any spilling.  The RTL created by STV contains:

(insn 33 31 3 2 (set (subreg:V4SI (reg:SI 120) 0)
(vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 116))
(const_vector:V4SI [
(const_int 0 [0]) repeated x4
])
(const_int 1 [0x1]))) -1
 (nil))
(insn 3 33 34 2 (set (subreg:V4SI (reg:SI 118) 0)
(subreg:V4SI (reg:SI 120) 0)) {movv4si_internal}
 (expr_list:REG_DEAD (reg:SI 120)
(nil)))
(insn 34 3 32 2 (set (reg/v:SI 108 [ y ])
(reg:SI 118)) -1
 (nil))

and it's crucial for the test that reg 108 is kept, rather than
propagated into uses.  As things stand, 118 can be allocated
a vector register and 108 a scalar register.  If 108 is propagated,
there will be scalar and vector uses of 118, and so it will be
spilled to memory.

That one could be solved by running STV2 later.  But RPAD is
a bigger problem.  In gcc.target/i386/pr87007-5.c, RPAD converts:

(insn 27 26 28 6 (set (reg:DF 100 [ _15 ])
(sqrt:DF (mem/c:DF (symbol_ref:DI ("d2") {*sqrtdf2_sse}
 (nil))

into:

(insn 45 26 44 6 (set (reg:V4SF 108)
(const_vector:V4SF [
(const_double:SF 0.0 [0x0.0p+0]) repeated x4
])) -1
 (nil))
(insn 44 45 27 6 (set (reg:V2DF 109)
(vec_merge:V2DF (vec_duplicate:V2DF (sqrt:DF (mem/c:DF (symbol_ref:DI 
("d2")
(subreg:V2DF (reg:V4SF 108) 0)
(const_int 1 [0x1]))) -1
 (nil))
(insn 27 44 28 6 (set (reg:DF 100 [ _15 ])
(subreg:DF (reg:V2DF 109) 0)) {*movdf_internal}
 (nil))

But both the pre-RA and post-RA passes are able to combine these
instructions back to the original form.

The patch therefore enables the pass by default only on AArch64.
However, I did test the patch with it enabled on x86_64-linux-gnu
as well, which was useful for debugging.

Bootstrapped & regression-tested on aarch64-linux-gnu and
x86_64-linux-gnu (as posted, with no regressions, and with the
pass enabled by default, with some gcc.target/i386 regressions).
OK to install?

Richard


gcc/
PR rtl-optimization/106594
* Makefile.in (OBJS): Add late-combine.o.
* common.opt (flate-combine-instructions): New option.
* doc/invoke.texi: Document it.
* common/config/aarch64/aarch64-common.cc: Enable it by default
at -O2 and above.
* tree-pass.h (make_pass_late_combine): Declare.
* late-combine.cc: New file.
* passes.def: Add two instances of late_combine.

gcc/testsuite/
PR rtl-optimization/106594
* gcc.dg/ira-shrinkwrap-prep-1.c: Restrict XFAIL to non-aarch64
targets.
* gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
* gcc.dg/stack-check-4.c: Add -fno-shrink-wrap.
* gcc.target/aarch64/sve/cond_asrd_3.c: Remove XFAILs.
* gcc.target/aarch64/sve/cond_convert_3.c: Likewise.
* gcc.target/aarch64/sve/cond_fabd_5.c: Likewise.
* gcc.target/aarch64/sve/cond_convert_6.c: Expect the MOVPRFX /Zs
described in the comment.
* gcc.target/aarch64/sve/cond_unary_4.c: Likewise.
* gcc.target/aarch64/pr106594_1.c: New test.
---
 gcc/Makefile.in   |   1 +
 gcc/common.opt|   5 +
 gcc/common/config/aarch64/aarch64-com

Re: [PATCH] Treat "p" in asms as addressing VOIDmode

2023-12-11 Thread Jeff Law




On 11/27/23 05:12, Richard Sandiford wrote:

check_asm_operands was inconsistent about how it handled "p" after
RA compared to before RA.  Before RA it tested the address with a
void (unknown) memory mode:

case CT_ADDRESS:
  /* Every address operand can be reloaded to fit.  */
  result = result || address_operand (op, VOIDmode);
  break;

After RA it deferred to constrain_operands, which used the mode
of the operand:

if ((GET_MODE (op) == VOIDmode
 || SCALAR_INT_MODE_P (GET_MODE (op)))
&& (strict <= 0
|| (strict_memory_address_p
 (recog_data.operand_mode[opno], op
  win = true;

Using the mode of the operand matches reload's behaviour:

   else if (insn_extra_address_constraint
   (lookup_constraint (constraints[i])))
{
  address_operand_reloaded[i]
= find_reloads_address (recog_data.operand_mode[i], (rtx*) 0,
recog_data.operand[i],
recog_data.operand_loc[i],
i, operand_type[i], ind_levels, insn);

It allowed the special predicate address_operand to be used, with the
mode of the operand being the mode of the addressed memory, rather than
the mode of the address itself.  For example, vax has:

(define_insn "*movaddr"
   [(set (match_operand:SI 0 "nonimmediate_operand" "=g")
(match_operand:VAXfp 1 "address_operand" "p"))
(clobber (reg:CC VAX_PSL_REGNUM))]
   "reload_completed"
   "mova %a1,%0")

where operand 1 is an SImode expression that can address memory of
mode VAXfp.  GET_MODE (recog_data.operand[1]) is SImode (or VOIDmode),
but recog_data.operand_mode[1] is mode.

But AFAICT, ira and lra (like pre-reload check_asm_operands) do not
do this, and instead pass VOIDmode.  So I think this traditional use
of address_operand is effectively an old-reload-only feature.

And it seems like no modern port cares.  I think ports have generally
moved to using different address constraints instead, rather than
relying on "p" with different operand modes.  Target-specific address
constraints post-date the code above.

The big advantage of using different constraints is that it works
for asms too.  And that (to finally get to the point) is the problem
fixed in this patch.  For the aarch64 test:

   void f(char *p) { asm("prfm pldl1keep, %a0\n" :: "p" (p + 6)); }

everything up to and including RA required the operand to be a
valid VOIDmode address.  But post-RA check_asm_operands and
constrain_operands instead required it to be valid for
recog_data.operand_mode[0].  Since asms have no syntax for
specifying an operand mode that's separate from the operand itself,
operand_mode[0] is simply Pmode (i.e. DImode).

This meant that we required one mode before RA and a different mode
after RA.  On AArch64, VOIDmode is treated as a wildcard and so has a
more conservative/restricted range than DImode.  So if a post-RA pass
tried to form a new address, it would use a laxer condition than the
pre-RA passes.
This was initially a bit counter-intuitive; my first reaction was that a 
wildcard mode is more general.  And that's true, but it necessarily 
means the accepted addresses are more restrictive, because they have to 
be valid for any mode.




This happened with the late-combine pass that I posted in October:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634166.html
which in turn triggered an error from aarch64_print_operand_address.

This patch takes the (hopefully) conservative fix of using VOIDmode for
asms but continuing to use the operand mode for .md insns, so as not
to break ports that still use reload.
Sadly I didn't get as far as I would have liked in removing reload, 
though we did get a handful of ports converted this cycle.




Fixing this made me realise that recog_level2 was doing duplicate
work for asms after RA.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* recog.cc (constrain_operands): Pass VOIDmode to
strict_memory_address_p for 'p' constraints in asms.
* rtl-ssa/changes.cc (recog_level2): Skip redundant constrain_operands
for asms.

gcc/testsuite/
* gcc.target/aarch64/prfm_imm_offset_2.c: New test.
It all seems a bit hackish.  I don't think ports have had much success 
using 'p' through the decades.  I think I generally ended up having to 
go with distinct constraints rather than relying on 'p'.


OK for the trunk, but ewww.

jeff


Re: [PATCH] wrong code on m68k with -mlong-jump-table-offsets and -malign-int (PR target/112413)

2023-12-11 Thread Jeff Law




On 12/11/23 05:51, Mikael Pettersson wrote:

On m68k the compiler assumes that the PC-relative jump-via-jump-table
instruction and the jump table are adjacent with no padding in between.

When -mlong-jump-table-offsets is combined with -malign-int, a 2-byte
nop may be inserted before the jump table, causing the jump to add the
fetched offset to the wrong PC base and thus jump to the wrong address.

Fixed by referencing the jump table via its label. On the test case
in the PR the object code change is (the moveal at 16 is the nop):

 a:  6536        bcss 42
 c:  e588        lsll #2,%d0
 e:  203b 0808   movel %pc@(18 ,%d0:l),%d0
-  12:  4efb 0802   jmp %pc@(16 ,%d0:l)
+  12:  4efb 0804   jmp %pc@(18 ,%d0:l)
 16:  284c        moveal %a4,%a4
 18:  0020        orib #32,%d0
 1c:  002c        orib #44,%d0

Bootstrapped and tested on m68k-linux-gnu, no regressions.

Note: I don't have commit rights, so I would need assistance applying this.

2023-12-11  Mikael Pettersson  

PR target/112413
* config/m68k/linux.h (ASM_RETURN_CASE_JUMP): For
TARGET_LONG_JUMP_TABLE_OFFSETS, reference the jump table
via its label.
* config/m68k/m68kelf.h (ASM_RETURN_CASE_JUMP): Likewise.
* config/m68k/netbsd-elf.h (ASM_RETURN_CASE_JUMP): Likewise.

Thanks.  Installed.

jeff


Re: [PATCH] treat argp-based mem as frame related in dse

2023-12-11 Thread Jeff Law




On 12/11/23 02:26, Jiufu Guo wrote:


Hi,

Thanks for your quick reply!

Jeff Law  writes:


On 12/10/23 20:07, Jiufu Guo wrote:


I'm having a bit of a hard time convincing myself this is correct
though.  I can't see how rewriting the load to read the source of the
prior store is unsafe.  If that fixes a problem, then it would seem
like we've gone wrong before here -- perhaps failing to use the fusage
loads to "kill" any available stores to the same or aliased memory
locations.

As you said the later one, call's fusage would killing the previous
store. It is a kind of case like:

 134: [argp:SI+0x8]=r134:SI
 135: [argp:SI+0x4]=0x1
 136: [argp:SI]=r132:SI
 137: ax:SI=call [`memset'] argc:0xc
 REG_CALL_DECL `memset'
 REG_EH_REGION 0

This call insn is:
(call_insn/j 137 136 147 27 (set (reg:SI 0 ax)
   (call (mem:QI (symbol_ref:SI ("memset") [flags 0x41]  ) [0 __builtin_memset S1 A8])
   (const_int 12 [0xc]))) "pr102798.c":23:22 1086 {*sibcall_value}
(expr_list:REG_UNUSED (reg:SI 0 ax)
   (expr_list:REG_CALL_DECL (symbol_ref:SI ("memset") [flags 0x41]  
)
   (expr_list:REG_EH_REGION (const_int 0 [0])
   (nil
   (expr_list:SI (use (mem/f:SI (reg/f:SI 16 argp) [0  S4 A32]))
   (expr_list:SI (use (mem:SI (plus:SI (reg/f:SI 16 argp) (const_int 4 
[0x4])) [0  S4 A32]))
   (expr_list:SI (use (mem:SI (plus:SI (reg/f:SI 16 argp) 
(const_int 8 [0x8])) [0  S4 A32]))
   (nil)

The stores in "insns 134-136" are used by the call. "check_mem_read_rtx"
would prevent them from being eliminated.

Right.  But unless I read something wrong, the patch wasn't changing
store removal, it was changing whether or not we forwarded the source
of the store into the destination of a subsequent load from the same
address.

"check_mem_read_rtx" has another behavior: it checks the mem
and adds a read_info to insn_info->read_rec. "read_rec" can prevent
the "store" from being eliminated during DSE's global algorithm. This
patch leverages this behavior.
And to avoid the "mem on fusage" being replaced by the leading store's rhs,
"replace_read" was disabled if the mem is on the call's fusage.

Ah, so not only do we want to avoid the call to replace_read, but also avoid 
the early return.

By avoiding the early return, we proceed into later code which "kills"
the tracked store, thus avoiding the problem.  Right?

It is similar, I would say.  There is "leading code" as below:
   /* Look at all of the uses in the insn.  */
   note_uses (&PATTERN (insn), check_mem_read_use, bb_info);

This checks possible loads in the "insn" and "kills" the tracked
stores if needed.
But "note_uses" does not check the fusage of the call insn.
So, this patch also runs "check_mem_read" for the "use mem"s
on fusage.
OK for the trunk.  Please double check that older BZ and, if that issue 
is fixed as well, add it to the commit log.  Thanks for walking me 
through the details.


jeff


Re: [PATCH] aarch64: Fix wrong code for bfloat when f16 is enabled [PR 111867]

2023-12-11 Thread Richard Sandiford
Andrew Pinski  writes:
> The problem here is when f16 is enabled, movbf_aarch64 accepts `Ufc`
> as a constraint:
>  [ w, Ufc ; fconsts , fp16  ] fmov\t%h0, %1
> But that is for fmov values and in this case fmov represents f16 rather than 
> bfloat16 values.
> This means we would get the wrong value in the register.
>
> Built and tested for aarch64-linux-gnu with no regressions.  Also tested with 
> `-march=armv9-a+sve2,
> gcc.dg/torture/bfloat16-basic.c and gcc.dg/torture/bfloat16-builtin.c no 
> longer fail.
>
> gcc/ChangeLog:
>
>   PR target/111867
>   * config/aarch64/aarch64.cc (aarch64_float_const_representable_p): For 
> BFmode,
>   only accept +0.0.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/aarch64/aarch64.cc | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 5cffdabc62e..d48f5a1ba4b 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -23904,6 +23904,7 @@ aarch64_float_const_representable_p (rtx x)
>  
>r = *CONST_DOUBLE_REAL_VALUE (x);
>  
> +
>/* We cannot represent infinities, NaNs or +/-zero.  We won't
>   know if we have +zero until we analyse the mantissa, but we
>   can reject the other invalid values.  */

Seems like a stray change.

OK without that, thanks.

Richard

> @@ -23911,6 +23912,10 @@ aarch64_float_const_representable_p (rtx x)
>|| REAL_VALUE_MINUS_ZERO (r))
>  return false;
>  
> +  /* For BFmode, only handle 0.0. */
> +  if (GET_MODE (x) == BFmode)
> +return real_iszero (&r, false);
> +
>/* Extract exponent.  */
>r = real_value_abs (&r);
>exponent = REAL_EXP (&r);


Re: [PATCH] untyped calls: enable target switching [PR112334]

2023-12-11 Thread Jeff Law




On 12/1/23 08:10, Alexandre Oliva wrote:

On Dec  1, 2023, Alexandre Oliva  wrote:


Also tested on arm-eabi, but it's *not* enough (or needed) to fix the
PR, there's another bug lurking there, with a separate patch coming
up.


Here it is.



The computation of apply_args_size and apply_result_size is saved in a
static variable, so that the corresponding _mode arrays are
initialized only once.  That is not compatible with switchable
targets, and ARM's arm_set_current_function, by saving and restoring
target globals, exercises this problem with a testcase such as that in
the PR, in which more than one function in the translation unit calls
__builtin_apply or __builtin_return, respectively.

This patch moves the _size statics into the target_builtins array,
with a bit of ugliness over _plus_one so that zero initialization of
the struct does the right thing.

Regstrapped on x86_64-linux-gnu, tested on arm-eabi with and without the
upthread patch.  It fixes the hardcfr fails either way.  As for the
ugliness, there's a follow up patch below that attempts to alleviate it
a little (also regstrapped and tested), but I'm not sure we want to go
down that path.  WDYT?

It's a wart, but doesn't seem too bad to me.




for  gcc/ChangeLog

PR target/112334
* builtins.h (target_builtins): Add fields for apply_args_size
and apply_result_size.
* builtins.cc (apply_args_size, apply_result_size): Cache
results in fields rather than in static variables.
(get_apply_args_size, set_apply_args_size): New.
(get_apply_result_size, set_apply_result_size): New.

OK.




untyped calls: use wrapper class type for implicit plus_one

Instead of get and set macros to apply a delta, use a single macro
that resorts to a temporary wrapper class to apply it.

To be combined (or not) with the previous patch.

I'd be OK with this as well.

jeff



Re: [PATCH v2 5/6] libgomp, nvptx: Cuda pinned memory

2023-12-11 Thread Tobias Burnus

On 11.12.23 15:31, Thomas Schwinge wrote:

On 2023-12-08T17:44:14+0100, Tobias Burnus  wrote:

On 08.12.23 15:09, Thomas Schwinge wrote:

On 22/11/2023 17:07, Tobias Burnus wrote:

Let's start with the patch itself:

--- a/libgomp/target.c
+++ b/libgomp/target.c
...
+static struct gomp_device_descr *
+get_device_for_page_locked (void)
+{
+ gomp_debug (0, "%s\n",
+ __FUNCTION__);
+
+ struct gomp_device_descr *device;
+#ifdef HAVE_SYNC_BUILTINS
+ device
+   = __atomic_load_n (&device_for_page_locked, MEMMODEL_RELAXED);
+ if (device == (void *) -1)
+   {
+ gomp_debug (0, " init\n");
+
+ gomp_init_targets_once ();
+
+ device = NULL;
+ for (int i = 0; i < num_devices; ++i)

Given that this function just sets a single variable based on whether the
page_locked_host_alloc_func function pointer exists, wouldn't it be much
simpler to just do all this handling in 'gomp_target_init'?

@Thomas, care to comment on this?

  From what I remember, we cannot assume that 'gomp_target_init' has
already been done when we get here; therefore 'gomp_init_targets_once' is
being called here.  We may get to 'get_device_for_page_locked' via
host-side OpenMP, in code that doesn't contain any OpenMP 'target'
offloading things.  Therefore, this was (a) necessary to make that work,
and (b) did seem to be a useful abstraction to me.

I am not questioning the "gomp_init_targets_once ();" but I am wondering
whether only 'gomp_init_targets_once()' should remain, without the
locking + loading dance - and then just set that single variable inside
gomp_target_init.

Ah, I see, thanks.


If you reach here w/o target set up, the "gomp_init_targets_once ();"
would ensure it gets initialized with all the other code inside
gomp_target_init.

And if gomp_target_init() was called before, gomp_init_targets_once()
will just return without doing anything and your are also fine.

Yes, I suppose we could do it that way.  'get_device_for_page_locked'
could then, after 'gomp_init_targets_once', unconditionally return
'device_for_page_locked' (even without '__atomic_load', right?).

Yes, that was my idea.

A disadvantage is that the setup of 'device_for_page_locked' (in
'gomp_target_init') and use of it (in 'get_device_for_page_locked') is
then split apart.  I guess I don't have a strong opinion on that one.
;-)


But pro is that it avoids the #ifdef HAVE_SYNC_BUILTINS, avoiding a "-1"
initialization of using_device_for_page_locked, atomic loads all over
the place etc.

Thus, I prefer this option – but I also don't have a strong opinion,
either.

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH] libgccjit Fix a RTL bug for libgccjit

2023-12-11 Thread Jeff Law




On 11/20/23 16:54, David Malcolm wrote:

On Mon, 2023-11-20 at 16:38 -0700, Jeff Law wrote:



On 11/20/23 15:46, David Malcolm wrote:

On Fri, 2023-11-17 at 14:09 -0700, Jeff Law wrote:



On 11/17/23 14:08, Antoni Boucher wrote:

In contrast with the other frontends, libgccjit can be executed
multiple times in a row in the same process.

Yup.  I'm aware of that.  Even so, calling init_emit_once more than
one time still seems wrong.


There are two approaches we follow when dealing with state stored
in
global variables:
(a) clean it all up via the various functions called from
toplev::finalize
(b) make it effectively constant once initialized, with idempotent
initialization

The multiple in-process executions of libgccjit could pass in
different
code-generation options.  Does the RTL-initialization logic depend
anywhere on flags passed in, because if so, we're probably going to
need to re-run the initialization.

The INIT_EXPANDERS code would be the most concerning as it's
implementation is totally hidden and provided by the target. I
wouldn't
be at all surprised if one or more do something evil in there.  That
probably needs to be evaluated on a target by target basis.

The rest really do look like single init, even in a JIT environment
kinds of things -- ie all the shared constants in RTL.


I think Antoni's patch can we described as implementing "single init",
in that it ensures that at least part of init_emit_once is single init.

Is the posted patch OK by you, or do we need to rework things, and if
the latter, what would be the goal?
What I'm struggling with is perhaps a problem of naming.  Conceptually 
"init_emit_once" in my mind should be called once and only once.  If I 
read Antoni's change correctly, we call it more than once.  That just 
feels conceptually wrong -- add to that the opaqueness of INIT_EXPANDERS 
and it feels even more wrong -- we don't know what's going on behind the 
scenes in there.


jeff


Re: [PATCH] Treat "p" in asms as addressing VOIDmode

2023-12-11 Thread Maciej W. Rozycki
On Mon, 11 Dec 2023, Jeff Law wrote:

> > This happened with the late-combine pass that I posted in October:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634166.html
> > which in turn triggered an error from aarch64_print_operand_address.
> > 
> > This patch takes the (hopefully) conservative fix of using VOIDmode for
> > asms but continuing to use the operand mode for .md insns, so as not
> > to break ports that still use reload.
> Sadly I didn't get as far as I would have liked in removing reload, though we
> did get a handful of ports converted this cycle

 The VAX port isn't ready for LRA yet, as not only does LRA produce notably 
worse RISC-like code, ignoring all the architecture's address mode features 
(unenthusiastically acceptable), but it also causes testsuite regressions of 
the ICE kind (unacceptable).

 I did hope to at least start work on it in this release cycle, but there 
has been this outstanding issue of broken exception unwinding, which makes 
C++ unusable except for odd cases such as with GCC itself where 
exceptions are not used.  This unwinder issue obviously has to take 
precedence as it cripples the usability of code produced by the compiler 
even for developer's use, e.g. native VAX/GDB is mostly broken and even 
VAX/gdbserver quits with a crash.

 I can try and see if I can find some time over the festive period to 
move the VAX port forward in either respect.

  Maciej


Re: [RFC PATCH 1/1] nix: add a simple flake nix shell

2023-12-11 Thread Vincenzo Palazzo
Hi all,

>Are you backing down from that opinion and deciding that this proposal
is, indeed, after all specific to NixOS and only NixOS and is neither
needed nor used on any other distro?

I may be misreading the conversation, so let's restart it.

Why should my RFC be inside the distro's repository? What makes this a distro
build procedure, and not a developer configuration for setting up a
development environment?

Cheers,

   Vincent.

On Tue, Dec 5, 2023 at 1:43 PM Eli Schwartz  wrote:
>
> On 12/5/23 5:35 AM, Vincenzo Palazzo wrote:
> >>> I see, but to me, this do not look like a distro build procedure,
> >>> because you can use
> >>> with any kind of system (OSX/UNIX) by using nix.
> >>
> >> But you can do the same with various other distro build procedures too?
> >> e.g. Gentoo Prefix allows you to install a full-blown Gentoo anywhere
> >> you like, "by using portage".
> >
> > With a single difference: on Gentoo you are allowed to put stuff in the
> > global path and use
> > it from a terminal like `$ pacman -Sy foo`. On NixOS you do not,
> > because the development environment
> > is used for that.
> >
> > So all the nice dependencies that gcc requires to build cannot be
> > installed in the NixOS global path (e.g. libc),
> > so in NixOS you have to define the development environment; otherwise
> > you cannot do the build. In all
> > the other systems that you mention, you can.
>
>
> And yet, it seems your original point was that this doesn't qualify as a
> "distro build procedure" because nix isn't specific to NixOS.
>
> Are you backing down from that opinion and deciding that this proposal
> is, indeed, after all specific to NixOS and only NixOS and is neither
> needed nor used on any other distro?
>
>
> > Please note that the flake.nix does not define how to build gcc, but
> > just what are the dependencies
> > that gcc is required in order to contribute to the compiler. In other
> > words, is you run the flake.nix
> > on NixOS or any other system you do not have gcc installed on your
> > system, this is the job of the
> > distro.
>
>
> Its lack of completeness is surely an issue, but not the issue at hand
> *here*. Why do you think the lack of completeness is a supporting
> argument, rather than an opposing argument?
>
>
> --
> Eli Schwartz
>


Re: [RFC] RISC-V: Support RISC-V Profiles in -march option.

2023-12-11 Thread Jeff Law




On 11/20/23 12:14, Jiawei wrote:

Support RISC-V profiles[1] in the -march option.

By default, a profile in the input string is expected to come before other
formal extensions.

[1]https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc

gcc/ChangeLog:

 * common/config/riscv/riscv-common.cc (struct riscv_profiles):
   New struct.
 (riscv_subset_list::parse_profiles): New function.
 (riscv_subset_list::parse): New table.
 * config/riscv/riscv-subset.h: New prototype.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/arch-29.c: New test.
 * gcc.target/riscv/arch-30.c: New test.
 * gcc.target/riscv/arch-31.c: New test.

---
  gcc/common/config/riscv/riscv-common.cc  | 58 +++-
  gcc/config/riscv/riscv-subset.h  |  2 +
  gcc/testsuite/gcc.target/riscv/arch-29.c |  5 ++
  gcc/testsuite/gcc.target/riscv/arch-30.c |  5 ++
  gcc/testsuite/gcc.target/riscv/arch-31.c |  5 ++
  6 files changed, 81 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/arch-29.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/arch-30.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/arch-31.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 5111626157b..30617e619b1 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -165,6 +165,12 @@ struct riscv_ext_version
int minor_version;
  };
  
+struct riscv_profiles

+{
+  const char * profile_name;
+  const char * profile_string;
+};

Just a formatting nit, no space between the '*' and the field name.


@@ -348,6 +354,28 @@ static const struct riscv_ext_version riscv_combine_info[] 
=
{NULL, ISA_SPEC_CLASS_NONE, 0, 0}
  };
  
+static const riscv_profiles riscv_profiles_table[] =

+{
+  {"RVI20U64", "rv64i"},
+  {"RVI20U32", "rv32i"},
+  /*Currently we don't have zicntr,ziccif,ziccrse,ziccamoa,
+zicclsm,za128rs yet.  */
Is it actually useful to note the extensions not included?  I don't 
think the profiles are supposed to change once ratified.



+  {"RVA22U64", "rv64imafdc_zicsr_zihintpause_zba_zbb_zbs_" 
\
Note the trailing "_", was that intentional?  None of the other entries 
have a trailing "_".




@@ -927,6 +955,31 @@ riscv_subset_list::parsing_subset_version (const char *ext,
return p;
  }
  
+const char *
+riscv_subset_list::parse_profiles (const char * p){
+  for (int i = 0; riscv_profiles_table[i].profile_name != NULL; ++i) {
+const char* match = strstr(p, riscv_profiles_table[i].profile_name);
+const char* plus_ext = strchr(p, '+');
+/* Find profile at the begin.  */
+if (match != NULL && match == p) {
+  /* If there's no '+' sign, return the profile_string directly.  */
+  if(!plus_ext)
+   return riscv_profiles_table[i].profile_string;
+  /* If there's a '+' sign, concatenate profiles with other ext.  */
+  else {
+   size_t arch_len = strlen(riscv_profiles_table[i].profile_string) +
+   strlen(plus_ext);
+   static char* result = new char[arch_len + 2];
+   strcpy(result, riscv_profiles_table[i].profile_string);
+   strcat(result, "_");
+   strcat(result, plus_ext + 1); /* skip the '+'.  */
+   return result;
+  }
+}
+  }
+  return p;
+}

This needs a function comment.

The open curly should always be on a line by itself which is going to 
require reindenting all this code.  Comments go on separate lines rather 
than appending them to an existing line.



I think the consensus in the Tuesday patchwork meeting was that while 
there are concerns about profiles, those concerns shouldn't prevent this 
patch from going forward.  So if you could fix the formatting problems as 
well as the trailing "_" issue noted above and repost, it would be 
appreciated.


Thanks,

Jeff


Re: Ping: [PATCH] Add a late-combine pass [PR106594]

2023-12-11 Thread Robin Dapp
Hi Richard,

I have tested the new pass on riscv64 and while it did exhibit some
regressions, none of them are critical.  Mostly, test expectations
will need to be adjusted - no new execution failures.

As mentioned in the initial discussion it does help us get the
behavior we want but, as of now, seems to propagate/combine a bit
more than I expected.  I suppose a bit of register-pressure tuning
will still be required in order to get the behavior we want.
It will also force us to properly set latencies/costs for the
register-file-crossing vector instructions.

All in all I would be very glad to see this get in :)

Regards
 Robin



[pushed] c++: add fixed testcase [PR63378]

2023-12-11 Thread Patrick Palka
We accept this testcase since r12-4453-g79802c5dcc043a.

PR c++/63378

gcc/testsuite/ChangeLog:

* g++.dg/template/fnspec3.C: New test.
---
 gcc/testsuite/g++.dg/template/fnspec3.C | 20 
 1 file changed, 20 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/template/fnspec3.C

diff --git a/gcc/testsuite/g++.dg/template/fnspec3.C b/gcc/testsuite/g++.dg/template/fnspec3.C
new file mode 100644
index 000..c36cb17751d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/fnspec3.C
@@ -0,0 +1,20 @@
+// PR c++/63378
+// { dg-do compile { target c++11 } }
+
+template
+struct B { };
+
+template
+struct A {
+private:
+  template
+  static B g();
+
+public:
+  template
+  auto f() -> decltype(g());
+};
+
+template<>
+template<>
+auto A::f() -> B;
-- 
2.43.0.76.g1a87c842ec



[PATCH v3 1/6] libgomp: basic pinned memory on Linux

2023-12-11 Thread Andrew Stubbs

Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall.  Pinned allocations are performed using mmap, not malloc, to ensure
that they can be unpinned safely when freed.

This implementation will work OK for page-scale allocations, and finer-grained
allocations will be implemented in a future patch.

libgomp/ChangeLog:

* allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
(MEMSPACE_VALIDATE): Add PIN.
(omp_init_allocator): Use MEMSPACE_VALIDATE to check pinning.
(omp_aligned_alloc): Add pinning to all MEMSPACE_* calls.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
(omp_free): Likewise.
* config/linux/allocator.c: New file.
* config/nvptx/allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
(MEMSPACE_VALIDATE): Add PIN.
* config/gcn/allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
(MEMSPACE_VALIDATE): Add PIN.
* libgomp.texi: Switch pinned trait to supported.
* testsuite/libgomp.c/alloc-pinned-1.c: New test.
* testsuite/libgomp.c/alloc-pinned-2.c: New test.
* testsuite/libgomp.c/alloc-pinned-3.c: New test.
* testsuite/libgomp.c/alloc-pinned-4.c: New test.

Co-Authored-By: Thomas Schwinge 
---
 libgomp/allocator.c  |  65 +---
 libgomp/config/gcn/allocator.c   |  21 +--
 libgomp/config/linux/allocator.c | 111 +
 libgomp/config/nvptx/allocator.c |  21 +--
 libgomp/libgomp.texi |   3 +-
 libgomp/testsuite/libgomp.c/alloc-pinned-1.c | 115 ++
 libgomp/testsuite/libgomp.c/alloc-pinned-2.c | 120 ++
 libgomp/testsuite/libgomp.c/alloc-pinned-3.c | 156 +++
 libgomp/testsuite/libgomp.c/alloc-pinned-4.c | 150 ++
 9 files changed, 716 insertions(+), 46 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-4.c

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index a8a80f8028d..666adf9a3a9 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -38,27 +38,30 @@
 #define omp_max_predefined_alloc omp_thread_mem_alloc
 
 /* These macros may be overridden in config//allocator.c.
+   The defaults (no override) are to return NULL for pinned memory requests
+   and pass through to the regular OS calls otherwise.
The following definitions (ab)use comma operators to avoid unused
variable errors.  */
 #ifndef MEMSPACE_ALLOC
-#define MEMSPACE_ALLOC(MEMSPACE, SIZE) \
-  malloc (((void)(MEMSPACE), (SIZE)))
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \
+  (PIN ? NULL : malloc (((void)(MEMSPACE), (SIZE))))
 #endif
 #ifndef MEMSPACE_CALLOC
-#define MEMSPACE_CALLOC(MEMSPACE, SIZE) \
-  calloc (1, (((void)(MEMSPACE), (SIZE))))
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE, PIN) \
+  (PIN ? NULL : calloc (1, (((void)(MEMSPACE), (SIZE)))))
 #endif
 #ifndef MEMSPACE_REALLOC
-#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) \
-  realloc (ADDR, (((void)(MEMSPACE), (void)(OLDSIZE), (SIZE))))
+#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE, OLDPIN, PIN) \
+   ((PIN) || (OLDPIN) ? NULL \
+   : realloc (ADDR, (((void)(MEMSPACE), (void)(OLDSIZE), (SIZE)))))
 #endif
 #ifndef MEMSPACE_FREE
-#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) \
-  free (((void)(MEMSPACE), (void)(SIZE), (ADDR)))
+#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \
+  if (PIN) free (((void)(MEMSPACE), (void)(SIZE), (ADDR)))
 #endif
 #ifndef MEMSPACE_VALIDATE
-#define MEMSPACE_VALIDATE(MEMSPACE, ACCESS) \
-  (((void)(MEMSPACE), (void)(ACCESS), 1))
+#define MEMSPACE_VALIDATE(MEMSPACE, ACCESS, PIN) \
+  (PIN ? 0 : ((void)(MEMSPACE), (void)(ACCESS), 1))
 #endif
 
 /* Map the predefined allocators to the correct memory space.
@@ -439,12 +442,8 @@ omp_init_allocator (omp_memspace_handle_t memspace, int ntraits,
 }
 #endif
 
-  /* No support for this so far.  */
-  if (data.pinned)
-return omp_null_allocator;
-
   /* Reject unsupported memory spaces.  */
-  if (!MEMSPACE_VALIDATE (data.memspace, data.access))
+  if (!MEMSPACE_VALIDATE (data.memspace, data.access, data.pinned))
 return omp_null_allocator;
 
   ret = gomp_malloc (sizeof (struct omp_allocator_data));
@@ -586,7 +585,8 @@ retry:
 	}
   else
 #endif
-	ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size);
+	ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size,
+			  allocator_data->pinned);
   if (ptr == NULL)
 	{
 #ifdef HAVE_SYNC_BUILTINS
@@ -623,7 +6

[PATCH v3 0/6] libgomp: OpenMP pinned memory omp_alloc

2023-12-11 Thread Andrew Stubbs
This patch series is a rework of the v2 series I posted in August:

https://patchwork.sourceware.org/project/gcc/list/?series=23763&state=%2A&archive=both

This version addresses most of the review comments from Tobias, but
after discussion with Tobias and Thomas we've decided to skip the
nice-to-have proposed initialization improvement in the interest of
getting the job done, for now.

Otherwise, some bugs have been fixed and few other clean-ups have been
made, but the series retains the same purpose and structure.

This series no longer has any out-of-tree dependencies, now that the
low-latency allocator patch has been committed.

An older, less compact, version of these patches is already applied to
the devel/omp/gcc-13 (OG13) branch.

OK for mainline?

Andrew

Andrew Stubbs (5):
  libgomp: basic pinned memory on Linux
  libgomp, openmp: Add ompx_pinned_mem_alloc
  openmp: Add -foffload-memory
  openmp: -foffload-memory=pinned
  libgomp: fine-grained pinned memory allocator

Thomas Schwinge (1):
  libgomp, nvptx: Cuda pinned memory

 gcc/common.opt|  16 +
 gcc/coretypes.h   |   7 +
 gcc/doc/invoke.texi   |  16 +-
 gcc/omp-builtins.def  |   3 +
 gcc/omp-low.cc|  66 
 libgomp/Makefile.am   |   2 +-
 libgomp/Makefile.in   |   7 +-
 libgomp/allocator.c   |  95 --
 libgomp/config/gcn/allocator.c|  21 +-
 libgomp/config/linux/allocator.c  | 243 +
 libgomp/config/nvptx/allocator.c  |  21 +-
 libgomp/libgomp-plugin.h  |   2 +
 libgomp/libgomp.h |  14 +
 libgomp/libgomp.map   |   1 +
 libgomp/libgomp.texi  |  17 +-
 libgomp/libgomp_g.h   |   1 +
 libgomp/omp.h.in  |   1 +
 libgomp/omp_lib.f90.in|   2 +
 libgomp/plugin/plugin-nvptx.c |  42 +++
 libgomp/target.c  | 136 
 .../libgomp.c-c++-common/alloc-pinned-1.c |  28 ++
 libgomp/testsuite/libgomp.c/alloc-pinned-1.c  | 141 
 libgomp/testsuite/libgomp.c/alloc-pinned-2.c  | 146 
 libgomp/testsuite/libgomp.c/alloc-pinned-3.c  | 189 +++
 libgomp/testsuite/libgomp.c/alloc-pinned-4.c  | 184 ++
 libgomp/testsuite/libgomp.c/alloc-pinned-5.c  | 129 +++
 libgomp/testsuite/libgomp.c/alloc-pinned-6.c  | 128 +++
 libgomp/testsuite/libgomp.c/alloc-pinned-7.c  |  63 
 libgomp/testsuite/libgomp.c/alloc-pinned-8.c  | 127 +++
 .../libgomp.fortran/alloc-pinned-1.f90|  16 +
 libgomp/usmpin-allocator.c| 319 ++
 31 files changed, 2127 insertions(+), 56 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/alloc-pinned-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-4.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-6.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-7.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-8.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90
 create mode 100644 libgomp/usmpin-allocator.c

-- 
2.41.0



[PATCH v3 3/6] openmp: Add -foffload-memory

2023-12-11 Thread Andrew Stubbs

Add a new option.  It's inactive until I add some follow-up patches.

gcc/ChangeLog:

* common.opt: Add -foffload-memory and its enum values.
* coretypes.h (enum offload_memory): New.
* doc/invoke.texi: Document -foffload-memory.
---
 gcc/common.opt  | 16 
 gcc/coretypes.h |  7 +++
 gcc/doc/invoke.texi | 16 +++-
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 5eb5ecff04b..a008827cfa2 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2332,6 +2332,22 @@ Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_ILP32)
 EnumValue
 Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64)
 
+foffload-memory=
+Common Joined RejectNegative Enum(offload_memory) Var(flag_offload_memory) Init(OFFLOAD_MEMORY_NONE)
+-foffload-memory=[none|unified|pinned]	Use an offload memory optimization.
+
+Enum
+Name(offload_memory) Type(enum offload_memory) UnknownError(Unknown offload memory option %qs)
+
+EnumValue
+Enum(offload_memory) String(none) Value(OFFLOAD_MEMORY_NONE)
+
+EnumValue
+Enum(offload_memory) String(unified) Value(OFFLOAD_MEMORY_UNIFIED)
+
+EnumValue
+Enum(offload_memory) String(pinned) Value(OFFLOAD_MEMORY_PINNED)
+
 fomit-frame-pointer
 Common Var(flag_omit_frame_pointer) Optimization
 When possible do not generate stack frames.
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index fe5b868fb4f..fb4bf37ba24 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -218,6 +218,13 @@ enum offload_abi {
   OFFLOAD_ABI_ILP32
 };
 
+/* Types of memory optimization for an offload device.  */
+enum offload_memory {
+  OFFLOAD_MEMORY_NONE,
+  OFFLOAD_MEMORY_UNIFIED,
+  OFFLOAD_MEMORY_PINNED
+};
+
 /* Types of profile update methods.  */
 enum profile_update {
   PROFILE_UPDATE_SINGLE,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 43341fe6e5e..f6a7459bda7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -202,7 +202,7 @@ in the following sections.
 -fno-builtin  -fno-builtin-@var{function}  -fcond-mismatch
 -ffreestanding  -fgimple  -fgnu-tm  -fgnu89-inline  -fhosted
 -flax-vector-conversions  -fms-extensions
--foffload=@var{arg}  -foffload-options=@var{arg}
+-foffload=@var{arg}  -foffload-options=@var{arg} -foffload-memory=@var{arg} 
 -fopenacc  -fopenacc-dim=@var{geom}
 -fopenmp  -fopenmp-simd  -fopenmp-target-simd-clone@r{[}=@var{device-type}@r{]}
 -fpermitted-flt-eval-methods=@var{standard}
@@ -2766,6 +2766,20 @@ Typical command lines are
 -foffload-options=amdgcn-amdhsa=-march=gfx906
 @end smallexample
 
+@opindex foffload-memory
+@cindex OpenMP offloading memory modes
+@item -foffload-memory=none
+@itemx -foffload-memory=unified
+@itemx -foffload-memory=pinned
+Enable a memory optimization mode to use with OpenMP.  The default behavior,
+@option{-foffload-memory=none}, is to do nothing special (unless enabled via
+a requires directive in the code).  @option{-foffload-memory=unified} is
+equivalent to @code{#pragma omp requires unified_shared_memory}.
+@option{-foffload-memory=pinned} forces all host memory to be pinned (this
+mode may require the user to increase the ulimit setting for locked memory).
+All translation units must select the same setting to avoid undefined
+behavior.
+
 @opindex fopenacc
 @cindex OpenACC accelerator programming
 @item -fopenacc

