date:20240930

[PATCH] tree-optimization/113197 - bogus assert in PTA

2024-09-30 Thread Richard Biener

PTA asserts that EAF_NO_DIRECT_READ is not set when flags are
set consistently which doesn't make sense.  The following removes
the assert.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

PR tree-optimization/113197
* tree-ssa-structalias.cc (handle_call_arg): Remove bogus
assert.

* gcc.dg/lto/pr113197_0.c: New testcase.
* gcc.dg/lto/pr113197_1.c: Likewise.
---
 gcc/testsuite/gcc.dg/lto/pr113197_0.c | 15 +++
 gcc/testsuite/gcc.dg/lto/pr113197_1.c |  3 +++
 gcc/tree-ssa-structalias.cc   |  1 -
 3 files changed, 18 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr113197_0.c
 create mode 100644 gcc/testsuite/gcc.dg/lto/pr113197_1.c

diff --git a/gcc/testsuite/gcc.dg/lto/pr113197_0.c b/gcc/testsuite/gcc.dg/lto/pr113197_0.c
new file mode 100644
index 000..293c8207dee
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/pr113197_0.c
@@ -0,0 +1,15 @@
+/* { dg-lto-do link } */
+/* { dg-lto-options { { -O -flto -fpie } } } */
+/* { dg-extra-ld-options { -r -nostdlib -flinker-output=nolto-rel } } */
+
+enum a { b } register_dccp();
+void c();
+void __attribute__((noreturn)) exit_error(enum a d) {
+  __builtin_va_list va;
+  __builtin_va_end(va);
+  if (d)
+    c();
+  c();
+  __builtin_exit(1);
+}
+int main() { register_dccp(); }
diff --git a/gcc/testsuite/gcc.dg/lto/pr113197_1.c b/gcc/testsuite/gcc.dg/lto/pr113197_1.c
new file mode 100644
index 000..30bf6f7e7c5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/pr113197_1.c
@@ -0,0 +1,3 @@
+int a;
+void exit_error();
+void register_dccp() { exit_error(a); }
diff --git a/gcc/tree-ssa-structalias.cc b/gcc/tree-ssa-structalias.cc
index 54c4818998d..73ba5aa6195 100644
--- a/gcc/tree-ssa-structalias.cc
+++ b/gcc/tree-ssa-structalias.cc
@@ -4194,7 +4194,6 @@ handle_call_arg (gcall *stmt, tree arg, vec 
*results, int flags,
 {
   make_transitive_closure_constraints (tem);
   callarg_transitive = true;
-  gcc_checking_assert (!(flags & EAF_NO_DIRECT_READ));
 }
 
   /* If necessary, produce varinfo for indirect accesses to ARG.  */
-- 
2.43.0

Re: [PATCH] libstdc++-v3: Fix signed-overflow warning for newlib/ctype_base.h, PR116895

2024-09-30 Thread Jonathan Wakely

On Mon, 30 Sept 2024, 01:58 Hans-Peter Nilsson wrote:

> FWIW, I see "typedef char mask;" also for bionic and
> openbsd.  Tested for cris-elf.
>
> Ok to commit?
>

OK thanks



> -- >8 --
> There are 100+ regressions when running the g++ testsuite for newlib
> targets (probably excepting ARM-based ones) e.g. cris-elf after commit
> r15-3859-g63a598deb0c9fc "libstdc++: #ifdef out #pragma GCC
> system_header", which effectively no longer silences warnings for
> gcc-installed system headers.  Some of these regressions are fixed by
> r15-3928.  For the remaining ones, there's in g++.log:
>
> FAIL: g++.old-deja/g++.robertl/eb79.C  -std=c++26 (test for excess errors)
> Excess errors:
> /gccobj/cris-elf/libstdc++-v3/include/cris-elf/bits/ctype_base.h:50:53: \
>  warning: overflow in conversion from 'int' to 'std::ctype_base::mask' \
>  {aka 'char'} changes value from '151' to '-105' [-Woverflow]
>
> This is because the _B macro in newlib's ctype.h (from where the
> "_" macros come) is bit 7, the sign-bit of 8-bit types:
>
> #define _B  0200
>
> Using it in an int-expression that is then truncated to 8 bits will
> "change" the value to negative for a default-signed char.  If this
> code was created from scratch, it should have been an unsigned type,
> however it's not advisable to change the type of mask as this affects
> the API.  The least ugly option seems to be to silence the warning by
> explicit casts in the initializer, and for consistency, doing it for
> all members.
>
> PR libstdc++/116895
> * config/os/newlib/ctype_base.h: Avoid signed-overflow warnings by
> explicitly casting initializer expressions to mask.
> ---
>  libstdc++-v3/config/os/newlib/ctype_base.h | 24 +++---
>  1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/libstdc++-v3/config/os/newlib/ctype_base.h b/libstdc++-v3/config/os/newlib/ctype_base.h
> index 309fdeea7731..5ec43a0c6803 100644
> --- a/libstdc++-v3/config/os/newlib/ctype_base.h
> +++ b/libstdc++-v3/config/os/newlib/ctype_base.h
> @@ -41,19 +41,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  // NB: Offsets into ctype::_M_table force a particular size
>  // on the mask type. Because of this, we don't use an enum.
>  typedef char   mask;
> -static const mask upper    = _U;
> -static const mask lower    = _L;
> -static const mask alpha    = _U | _L;
> -static const mask digit    = _N;
> -static const mask xdigit   = _X | _N;
> -static const mask space    = _S;
> -static const mask print    = _P | _U | _L | _N | _B;
> -static const mask graph    = _P | _U | _L | _N;
> -static const mask cntrl    = _C;
> -static const mask punct    = _P;
> -static const mask alnum    = _U | _L | _N;
> +static const mask upper    = mask (_U);
> +static const mask lower    = mask (_L);
> +static const mask alpha    = mask (_U | _L);
> +static const mask digit    = mask (_N);
> +static const mask xdigit   = mask (_X | _N);
> +static const mask space    = mask (_S);
> +static const mask print    = mask (_P | _U | _L | _N | _B);
> +static const mask graph    = mask (_P | _U | _L | _N);
> +static const mask cntrl    = mask (_C);
> +static const mask punct    = mask (_P);
> +static const mask alnum    = mask (_U | _L | _N);
>  #if __cplusplus >= 201103L
> -static const mask blank    = space;
> +static const mask blank    = mask (space);
>  #endif
>};
>
> --
> 2.30.2
>
>
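
For readers following along, the sign-bit effect described in the quoted
commit message can be reproduced in a few lines of plain C.  This is only
a sketch: the BIT_* names are hypothetical stand-ins for newlib's ctype.h
macros, and apart from _B == 0200 (quoted above) their values are an
assumption, chosen so that the OR-ed result matches the 151 in the warning.
The mask type is modelled as 'signed char' so the result does not depend on
the target's default char signedness.

```c
#include <assert.h>

/* Hypothetical stand-ins for the ctype.h bit macros.  Only the value of
   BIT_B (0200, bit 7) is taken from the message; the rest are assumed.  */
enum { BIT_U = 01, BIT_L = 02, BIT_N = 04, BIT_P = 020, BIT_B = 0200 };

/* Model of a default-signed 8-bit 'char' mask type.  */
typedef signed char mask;

/* The 'print' initializer as an int expression, before any truncation.  */
static int print_bits(void)
{
  return BIT_P | BIT_U | BIT_L | BIT_N | BIT_B;
}

/* The same value after an explicit cast to the 8-bit mask type.  The
   conversion of an out-of-range value is implementation-defined before
   C23; GCC reduces it modulo 256.  */
static mask print_mask(void)
{
  return (mask) print_bits();
}
```

With GCC on a two's-complement target, print_bits() yields 151 and
print_mask() yields -105, i.e. the same bit pattern reinterpreted as a
signed 8-bit value.  The explicit cast does not change the stored value,
it only tells the compiler that the truncation is intended.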

Re: [PATCH] c++: Avoid "infinite parsing" because of cp_parser_decltype [PR114858]

2024-09-30 Thread Simon Martin

Friendly ping. Thanks!

On 17 Sep 2024, at 14:14, Simon Martin wrote:

> The invalid test case in this PR highlights a bad interaction between
> the tentative_firewall and error recovery in cp_parser_decltype: the
> firewall makes cp_parser_skip_to_closing_parenthesis a no-op, and the
> parser does not make any progress, running "forever".
>
> This patch calls cp_parser_commit_to_tentative_parse before initiating
> error recovery.
>
> Successfully tested on x86_64-pc-linux-gnu.
>
>   PR c++/114858
>
> gcc/cp/ChangeLog:
>
>   * parser.cc (cp_parser_decltype): Commit tentative parse before
>   initiating error recovery.
>
> gcc/testsuite/ChangeLog:
>
>   * g++.dg/cpp0x/decltype10.C: Adjust test expectation.
>   * g++.dg/cpp2a/pr114858.C: New test.
> ---
>  gcc/cp/parser.cc|  3 +++
>  gcc/testsuite/g++.dg/cpp0x/decltype10.C |  2 ++
>  gcc/testsuite/g++.dg/cpp2a/pr114858.C   | 25 
> +
>  3 files changed, 30 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/pr114858.C
>
> diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> index 4dd9474cf60..3a7c5ffe4c8 100644
> --- a/gcc/cp/parser.cc
> +++ b/gcc/cp/parser.cc
> @@ -17508,6 +17508,9 @@ cp_parser_decltype (cp_parser *parser)
>/* Parse to the closing `)'.  */
>if (expr == error_mark_node || !parens.require_close (parser))
>  {
> +  /* Commit to the tentative_firewall so we actually skip to the closing
> +  parenthesis.  */
> +  cp_parser_commit_to_tentative_parse (parser);
>cp_parser_skip_to_closing_parenthesis (parser, true, false,
>/*consume_paren=*/true);
>expr = error_mark_node;
> diff --git a/gcc/testsuite/g++.dg/cpp0x/decltype10.C b/gcc/testsuite/g++.dg/cpp0x/decltype10.C
> index fe7247269f5..bd606e325d4 100644
> --- a/gcc/testsuite/g++.dg/cpp0x/decltype10.C
> +++ b/gcc/testsuite/g++.dg/cpp0x/decltype10.C
> @@ -7,3 +7,5 @@ template struct A
>  };
>
>  template int A::i(decltype (A::i;  // { dg-error "expected" }
> +
> +// { dg-excess-errors "" }
> diff --git a/gcc/testsuite/g++.dg/cpp2a/pr114858.C b/gcc/testsuite/g++.dg/cpp2a/pr114858.C
> new file mode 100644
> index 000..6ffde4c3a2c
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp2a/pr114858.C
> @@ -0,0 +1,25 @@
> +// PR c++/114858
> +// { dg-do compile { target c++20 } }
> +// { dg-timeout 2 }
> +
> +template  void g(F);
> +template 
> +auto h(A &&... a) -> decltype(
> +  decltype(g< // { dg-error "expected primary-expression" }
> +  decltype(g<
> +  decltype(g<
> +  decltype(g<
> +  decltype(g<
> +  decltype(g<
> +  decltype(g<
> +  decltype()>)(a)...)
> +{
> +  h([] {});
> +}
> +
> +int main() {
> +  h();
> +  return 0;
> +}
> +
> +// { dg-excess-errors "" }
> -- 
> 2.44.0

Re: [PATCH] arm: Fix missed CE optimization for armv8.1-m.main [PR 116444]

2024-09-30 Thread Ramana Radhakrishnan

On Fri, Sep 27, 2024 at 2:11 PM Andre Vieira (lists)
 wrote:
>
>
>
> On 26/09/2024 18:56, Ramana Radhakrishnan wrote:
> >
>
> >>   +/* Helper function to determine whether SEQ represents a sequence of
> >> +   instructions representing the Armv8.1-M Mainline conditional arithmetic
> >> +   instructions: csinc, csneg and csinv. The cinc instruction is generated
> >> +   using a different mechanism.  */
> >> +
> >> +static bool
> >> +arm_is_v81m_cond_insn (rtx_insn *seq)
> >> +{
> >> +  rtx_insn *curr_insn = seq;
> >> +  rtx set;
> >> +  /* The pattern may start with a simple set with register operands.  Skip
> >> + through any of those.  */
> >> +  while (curr_insn)
> >> +{
> >> +  set = single_set (curr_insn);
> >> +  if (!set
> >> +   || !REG_P (SET_DEST (set)))
> >> + return false;
> >> +
> >> +  if (!REG_P (SET_SRC (set)))
> >> + break;
> >> +  curr_insn = NEXT_INSN (curr_insn);
> >
> > Too late at night for me - but don’t you want to skip DEBUG_INSNS in some 
> > way here ?
>
>
> It's a good point, but this sequence is created by noce as a potential
> replacement for the incoming one and no debug insns are inserted here.

True and fair.

>
> Compiling gcc/gcc/testsuite/gcc.target/arm/csinv-1.c with -g3 for an
> Armv8.1-M Mainline target still generates the csinv.
>
> Either way, I could add code to skip if we don't have a NONDEBUG_INSN_P,
> but that means we should also do so after every NEXT_INSN after the
> while loop and at the end.  It does feel 'more robust', but I also fear
> it might be a bit overkill here?

Feels overkill as you say.

Please watch out for any regressions

Ok to commit and later backport if no regressions as this is
sufficiently isolated to only Armv8.1-M configurations.

regards
Ramana

>
> Kind regards,
> Andre
>
>

Re: [PING] [PATCH] i386: Implement Thread Local Storage on Windows

2024-09-30 Thread Julian Waters

Pinging https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662860.html
again and also paging for Jan Hubicka, the x86 expert

best regards,
Julian

Re: [PATCH] RISC-V: Implement TARGET_CAN_INLINE_P

2024-09-30 Thread Yangyu Chen




> On Sep 30, 2024, at 13:58, Kito Cheng wrote:
> 
> Hi Yang-Yu:
> 
>> 
>> Specifically, we can reproduce the result on BananaPi-F3 hardware:
>> 
>> Use this GCC branch with my patch:
>> https://github.com/cyyself/gcc/tree/rv_can_inline
>> 
>> And compile the coremark on this branch:
>> https://github.com/cyyself/coremark/tree/rva22_v_hotspot
>> 
>> With command `make CC=riscv64-unknown-linux-gnu-gcc compile`
>> 
>> With my patch, we will get the coremark scored `Iterations/Sec   :
>> 5992.917461`. But without this patch after `git reset HEAD^` and
>> recompile the GCC and then coremark, we will get `Iterations/Sec :
>> 5235.602094`, which is 12.6% slower.
> 
> Could you add a test case to demonstrate that ?
> 

I will try.

>> /* Callee's ISA should be a subset of the caller's ISA.  */
> 
> This check is necessary, but this approach may not be scalable in the
> longer term: people may forget to update this part when adding new
> extension variables.  So I would suggest adding a new function to
> construct a riscv_subset_list from options, e.g.:
> 
> diff --git a/gcc/config/riscv/riscv-subset.h b/gcc/config/riscv/riscv-subset.h
> index dace4de6575..e8b7c0f194b 100644
> --- a/gcc/config/riscv/riscv-subset.h
> +++ b/gcc/config/riscv/riscv-subset.h
> @@ -103,6 +103,7 @@ public:
>  riscv_subset_list *clone () const;
> 
>  static riscv_subset_list *parse (const char *, location_t);
> +  static riscv_subset_list *parse (struct gcc_options *opts);

Thanks for this hint. However, using the class riscv_subset_list
is very costly and requires a copy of the ISA string. We can implement
a new function like bool riscv_ext_is_subset(struct gcc_options
*opts, struct gcc_options *subset) in riscv-common.cc and iterate
through the riscv_ext_flag_table. This method is also scalable for
the long term. I will submit the next revision shortly.

>  const char *parse_single_ext (const char *, bool exact_single_p = true);
> 
>  const riscv_subset_t *begin () const {return m_head;};
> 
> And then use riscv_subset_list to do the checking
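
To make the table-driven alternative discussed above concrete, here is a
rough sketch in plain C.  Everything in it is hypothetical: toy_options,
toy_ext_flag, toy_ext_flag_table and toy_ext_is_subset are illustrative
stand-ins, not GCC's gcc_options, riscv_ext_flag_table or any existing
function.  It only shows the shape of the idea: walk a table of
(field offset, mask) pairs and require that every extension bit set in the
callee's options is also set in the caller's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical option storage; not GCC's struct gcc_options.  */
struct toy_options { unsigned base_flags; unsigned vector_flags; };

/* One table row: extension name, field holding its bit, and the bit.  */
struct toy_ext_flag { const char *name; size_t offset; unsigned mask; };

static const struct toy_ext_flag toy_ext_flag_table[] = {
  { "m", offsetof (struct toy_options, base_flags),   1u << 0 },
  { "a", offsetof (struct toy_options, base_flags),   1u << 1 },
  { "v", offsetof (struct toy_options, vector_flags), 1u << 0 },
};

/* Read the unsigned field located OFFSET bytes into *O.  */
static unsigned toy_read_flags (const struct toy_options *o, size_t offset)
{
  return *(const unsigned *) ((const char *) o + offset);
}

/* Return 1 if every extension enabled in SUBSET is also enabled in OPTS.  */
static int toy_ext_is_subset (const struct toy_options *opts,
                              const struct toy_options *subset)
{
  size_t n = sizeof toy_ext_flag_table / sizeof toy_ext_flag_table[0];
  for (size_t i = 0; i < n; i++)
    {
      const struct toy_ext_flag *e = &toy_ext_flag_table[i];
      if ((toy_read_flags (subset, e->offset) & e->mask)
          && !(toy_read_flags (opts, e->offset) & e->mask))
        return 0;
    }
  return 1;
}
```

Because the check iterates over the table, a newly added extension only
needs a new table row, which is the scalability property both sides of the
thread are after.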

[PATCH] Fix crash with constant initializer

2024-09-30 Thread Eric Botcazou

Hi,

the attached Ada testcase compiled with -O2 -gnatn makes the compiler crash in 
vect_can_force_dr_alignment_p during SLP vectorization:

  if (decl_in_symtab_p (decl)
  && !symtab_node::get (decl)->can_increase_alignment_p ())
return false;

because symtab_node::get (decl) returns a null node.  The phenomenon occurs 
for a pair of twin symbols listed like so in .cgraph:

Opt7_Pkg.T12b/17 (Opt7_Pkg.T12b)
  Type: variable definition analyzed
  Visibility: semantic_interposition external public artificial
  Aux: @0x44d45e0
  References: 
  Referring: opt7_pkg__enum_name_table/13 (addr) opt7_pkg__enum_name_table/13 
(addr) 
  Availability: not-ready
  Varpool flags: initialized read-only const-value-known

Opt7_Pkg.T8b/16 (Opt7_Pkg.T8b)
  Type: variable definition analyzed
  Visibility: semantic_interposition external public artificial
  Aux: @0x7f9fda3fff00
  References: 
  Referring: opt7_pkg__enum_name_table/13 (addr) opt7_pkg__enum_name_table/13 
(addr) 
  Availability: not-ready
  Varpool flags: initialized read-only const-value-known

with:

opt7_pkg__enum_name_table/13 (Opt7_Pkg.Enum_Name_Table)
  Type: variable definition analyzed
  Visibility: semantic_interposition external public
  Aux: @0x44d45e0
  References: Opt7_Pkg.T8b/16 (addr) Opt7_Pkg.T8b/16 (addr) Opt7_Pkg.T12b/17 
(addr) Opt7_Pkg.T12b/17 (addr) 
  Referring: opt7_pkg__image/2 (read) opt7_pkg__image/2 (read) 
opt7_pkg__image/2 (read) opt7_pkg__image/2 (read) opt7_pkg__image/2 (read) 
opt7_pkg__image/2 (read) opt7_pkg__image/2 (read) opt7_pkg__image/2 (read) 
  Availability: not-ready
  Varpool flags: initialized read-only const-value-known

being the crux of the matter.

What happens is that symtab_remove_unreachable_nodes leaves the last symbol in 
kind of a limbo state: in .remove_symbols, we have:

opt7_pkg__enum_name_table/13 (Opt7_Pkg.Enum_Name_Table)
  Type: variable
  Body removed by symtab_remove_unreachable_nodes
  Visibility: externally_visible semantic_interposition external public
  References: 
  Referring: opt7_pkg__image/2 (read) opt7_pkg__image/2 (read) 
  Availability: not_available
  Varpool flags: initialized read-only const-value-known

This means that the "body" (DECL_INITIAL) of the symbol has been disregarded 
during reachability analysis, causing the first two symbols to be discarded:

Reclaiming variables: Opt7_Pkg.T12b/17 Opt7_Pkg.T8b/16

but the DECL_INITIAL is explicitly preserved for later constant folding, which 
makes it possible to retrofit the DECLs corresponding to the first two symbols 
in the GIMPLE IR and ultimately leads vect_can_force_dr_alignment_p to crash.


The decision to disregard the "body" (DECL_INITIAL) of the symbol is made in 
the first process_references present in ipa.cc:

  if (node->definition && !node->in_other_partition
  && ((!DECL_EXTERNAL (node->decl) || node->alias)
  || (possible_inline_candidate_p (node)
  /* We use variable constructors during late compilation for
 constant folding.  Keep references alive so partitioning
 knows about potential references.  */
  || (VAR_P (node->decl)
  && (flag_wpa
  || flag_incremental_link
 == INCREMENTAL_LINK_LTO)
  && dyn_cast  (node)
   ->ctor_useable_for_folding_p ()

because neither flag_wpa nor flag_incremental_link == INCREMENTAL_LINK_LTO is 
true, while the decision to ultimately preserve the DECL_INITIAL is made later 
in remove_unreachable_nodes:

  /* Keep body if it may be useful for constant folding.  */
  if ((flag_wpa || flag_incremental_link == INCREMENTAL_LINK_LTO)
  || ((init = ctor_for_folding (vnode->decl)) == error_mark_node))
vnode->remove_initializer ();
  else
DECL_INITIAL (vnode->decl) = init;


I think that the testcase shows that the "body" of ctor_useable_for_folding_p 
symbols must always be considered for reachability analysis (which could make 
the above test on ctor_for_folding useless).  But implementing that introduces 
a regression for g++.dg/ipa/devirt-39.C, because the vtable is preserved and 
in turn forces the method to be preserved, hence the special case for vtables.

The patch also renames the first process_references function in ipa.cc to clear 
the confusion with the second function in the same file.

Bootstrapped/regtested on x86-64/Linux, OK for the mainline?


2024-09-30  Eric Botcazou  

* ipa.cc (process_references): Rename into...
(mark_references): ...this.  Always mark referenced external
variables as reachable if they are usable for folding, except
for vtables.
(symbol_table::remove_unreachable_nodes): Adjust to renaming.


2024-09-30  Eric Botcazou  

* gnat.dg/specs/opt7.ads: New test.
* gnat.dg/specs/opt7_pkg.ads: New helper.
* gnat.dg/specs/opt7_pkg.adb: Likewise.


-- 
Eric Botcazou

diff --git a/gcc/ipa.cc b/g

[PATCH v4 0/4] tree-optimization/116024 - match.pd: add 4 int-compare simplifications

2024-09-30 Thread Artemiy Volkov

Hi,

sending a v4 of
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663592.html
with the following changes since v3:

- Cleaned up the testcases in patches #1, #2, #4 by removing trivially
  dead initializers.
- Added a !TYPE_UNSIGNED () check in patch #3 for clarity.
- Removed the use of build_uniform_cst () in patch #4.

The series has been reviewed and pre-approved by Richard contingent on
the changes above, so assuming it looks good, could anyone please push
it to trunk/14 on my behalf?

Many thanks,
Artemiy

Artemiy Volkov (4):
  tree-optimization/116024 - simplify C1-X cmp C2 for UB-on-overflow types
  tree-optimization/116024 - simplify C1-X cmp C2 for unsigned types
  tree-optimization/116024 - simplify C1-X cmp C2 for wrapping signed types
  tree-optimization/116024 - simplify some cases of X +- C1 cmp C2

 gcc/match.pd  | 109 +-
 gcc/testsuite/gcc.dg/pr67089-6.c  |   4 +-
 .../gcc.dg/tree-ssa/pr116024-1-fwrapv.c   |  65 +++
 gcc/testsuite/gcc.dg/tree-ssa/pr116024-1.c|  65 +++
 .../gcc.dg/tree-ssa/pr116024-2-fwrapv.c   |  38 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr116024-2.c|  37 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr116024.c  |  66 +++
 .../gcc.target/aarch64/gtu_to_ltu_cmp_1.c |   2 +-
 8 files changed, 382 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-1-fwrapv.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-2-fwrapv.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024.c

-- 
2.44.2

[PATCH v4 2/4] tree-optimization/116024 - simplify C1-X cmp C2 for unsigned types

2024-09-30 Thread Artemiy Volkov

Implement a match.pd transformation inverting the sign of X in
C1 - X cmp C2, where C1 and C2 are integer constants and X is
of an unsigned type, by observing that:

(a) If cmp is == or !=, simply move X and C2 to opposite sides of the
comparison to arrive at X cmp C1 - C2.

(b) If cmp is <:
- C1 - X < C2 means that C1 - X spans the range of 0, 1, ..., C2 - 1;
- This means that X spans the range of C1 - (C2 - 1),
  C1 - (C2 - 2), ..., C1;
- Subtracting C1 - (C2 - 1), X - (C1 - (C2 - 1)) is one of 0, 1,
  ..., C1 - (C1 - (C2 - 1));
- Simplifying the above, X - (C1 - C2 + 1) is one of 0, 1, ...,
 C2 - 1;
- Summarizing, the expression C1 - X < C2 can be transformed
  into X - (C1 - C2 + 1) < C2.

(c) Similarly, if cmp is <=:
- C1 - X <= C2 means that C1 - X is one of 0, 1, ..., C2;
- It follows that X is one of C1 - C2, C1 - (C2 - 1), ..., C1;
- Subtracting C1 - C2, X - (C1 - C2) has range 0, 1, ..., C2;
- Thus, the expression C1 - X <= C2 can be transformed into
  X - (C1 - C2) <= C2.

(d) The >= and > cases are negations of (b) and (c), respectively.

This transformation occasionally saves load-immediate /
subtraction instructions, e.g. the following statement:

300 - (unsigned int)f() < 100;

now compiles to

addi    a0,a0,-201
sltiu   a0,a0,100

instead of

li  a5,300
sub a0,a5,a0
sltiu   a0,a0,100

on 32-bit RISC-V.

Additional examples can be found in the newly added test file.  This
patch has been bootstrapped and regtested on aarch64, x86_64, and i386,
and additionally regtested on riscv32.
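
As a quick sanity check of cases (b) and (c) above, the two forms can be
compared directly in C.  The sketch below is not part of the patch; the
helper names (orig_lt, rewritten_lt, count_mismatches and so on) are made
up for illustration, and the grid of sample values is an arbitrary choice
of edge cases rather than an exhaustive test.

```c
#include <assert.h>
#include <stdint.h>

/* Original forms: C1 - X cmp C2 in wrapping 32-bit unsigned arithmetic.  */
static int orig_lt(uint32_t c1, uint32_t x, uint32_t c2)
{
  return (uint32_t)(c1 - x) < c2;
}

static int orig_le(uint32_t c1, uint32_t x, uint32_t c2)
{
  return (uint32_t)(c1 - x) <= c2;
}

/* Rewritten form for <:  X - (C1 - C2 + 1) < C2.  */
static int rewritten_lt(uint32_t c1, uint32_t x, uint32_t c2)
{
  uint32_t k = (uint32_t)(c1 - c2 + 1u);
  return (uint32_t)(x - k) < c2;
}

/* Rewritten form for <=:  X - (C1 - C2) <= C2.  */
static int rewritten_le(uint32_t c1, uint32_t x, uint32_t c2)
{
  uint32_t k = (uint32_t)(c1 - c2);
  return (uint32_t)(x - k) <= c2;
}

/* Count disagreements between the forms over a grid of edge-case values.  */
static unsigned count_mismatches(void)
{
  static const uint32_t v[] = {
    0u, 1u, 99u, 100u, 199u, 200u, 201u, 202u, 299u, 300u, 301u,
    0x7fffffffu, 0x80000000u, 0xfffffffeu, 0xffffffffu
  };
  const unsigned n = sizeof v / sizeof v[0];
  unsigned bad = 0;
  for (unsigned i = 0; i < n; i++)
    for (unsigned j = 0; j < n; j++)
      for (unsigned k = 0; k < n; k++)
        {
          bad += orig_lt(v[i], v[k], v[j]) != rewritten_lt(v[i], v[k], v[j]);
          bad += orig_le(v[i], v[k], v[j]) != rewritten_le(v[i], v[k], v[j]);
        }
  return bad;
}
```

For the statement from the commit message, orig_lt(300, x, 100) and
rewritten_lt(300, x, 100) compute 300 - x < 100 and x - 201 < 100
respectively, and count_mismatches() should return 0 since the identity
holds for all wrapping unsigned values.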

gcc/ChangeLog:

PR tree-optimization/116024
* match.pd: New transformation around integer comparison.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr116024-1.c: New test.

Signed-off-by: Artemiy Volkov 
---
 gcc/match.pd   | 23 +++-
 gcc/testsuite/gcc.dg/tree-ssa/pr116024-1.c | 65 ++
 2 files changed, 87 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index b074f49eebd..46195a603d0 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -9020,7 +9020,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 TYPE_SIGN (TREE_TYPE (@0)));
   constant_boolean_node (less == ovf_high, type);
 })
-  (rcmp @1 { res; }))
+  (rcmp @1 { res; })))
+/* For unsigned types, transform like so (using < as example):
+C1 - X < C2
+  ==>  C1 - X = { 0, 1, ..., C2 - 1 }
+  ==>  X = { C1 - (C2 - 1), ..., C1 - 1, C1 }
+  ==>  X - (C1 - (C2 - 1)) = { 0, 1, ..., C1 - (C1 - (C2 - 1)) }
+  ==>  X - (C1 - C2 + 1) = { 0, 1, ..., C2 - 1 }
+  ==>  X - (C1 - C2 + 1) < C2.
+
+  Similarly,
+C1 - X <= C2 ==> X - (C1 - C2) <= C2;
+C1 - X >= C2 ==> X - (C1 - C2 + 1) >= C2;
+C1 - X > C2 ==> X - (C1 - C2) > C2.  */
+   (if (TYPE_UNSIGNED (TREE_TYPE (@1)))
+ (switch
+   (if (cmp == EQ_EXPR || cmp == NE_EXPR)
+(cmp @1 (minus @0 @2)))
+   (if (cmp == LE_EXPR || cmp == GT_EXPR)
+(cmp (plus @1 (minus @2 @0)) @2))
+   (if (cmp == LT_EXPR || cmp == GE_EXPR)
+(cmp (plus @1 (minus @2
+  (plus @0 { build_one_cst (TREE_TYPE (@1)); }))) @2)))
 
 /* Canonicalizations of BIT_FIELD_REFs.  */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr116024-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr116024-1.c
new file mode 100644
index 000..91cb6a7c4f1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr116024-1.c
@@ -0,0 +1,65 @@
+/* PR tree-optimization/116024 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-forwprop1-details" } */
+
+#include <stdint.h>
+
+uint32_t f(void);
+
+int32_t i2(void)
+{
+  uint32_t l = 10 - (uint32_t)f();
+  return l <= 20; // f() + 10 <= 20 
+}
+
+int32_t i2a(void)
+{
+  uint32_t l = 10 - (uint32_t)f();
+  return l < 30; // f() + 19 < 30 
+}
+
+int32_t i2b(void)
+{
+  uint32_t l = 200 - (uint32_t)f();
+  return l <= 100; // f() - 100 <= 100 
+}
+
+int32_t i2c(void)
+{
+  uint32_t l = 300 - (uint32_t)f();
+  return l < 100; // f() - 201 < 100
+}
+
+int32_t i2d(void)
+{
+  uint32_t l = 1000 - (uint32_t)f();
+  return l >= 2000; // f() + 999 >= 2000
+}
+
+int32_t i2e(void)
+{
+  uint32_t l = 1000 - (uint32_t)f();
+  return l > 3000; // f() + 2000 > 3000
+}
+
+int32_t i2f(void)
+{
+  uint32_t l = 20000 - (uint32_t)f();
+  return l >= 10000; // f() - 10001 >= 10000
+}
+
+int32_t i2g(void)
+{
+  uint32_t l = 30000 - (uint32_t)f();
+  return l > 10000; // f() - 20000 > 10000
+}
+
+/* { dg-final { scan-tree-dump-times "Removing dead stmt:.*?- _" 8 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+ 10.*\n.*<= 20" 1 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+ 19.*\n.*<= 29" 1 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "g

[PATCH v4 1/4] tree-optimization/116024 - simplify C1-X cmp C2 for UB-on-overflow types

2024-09-30 Thread Artemiy Volkov

Implement a match.pd pattern for C1 - X cmp C2, where C1 and C2 are
integer constants and X is of a UB-on-overflow type.  The pattern is
simplified to X rcmp C1 - C2 by moving X and C2 to the other side of the
comparison (with opposite signs).  If C1 - C2 happens to overflow,
replace the whole expression with either a constant 0 or a constant 1
node, depending on the comparison operator and the sign of the overflow.

This transformation occasionally saves load-immediate /
subtraction instructions, e.g. the following statement:

10 - (int) x <= 9;

now compiles to

sgt a0,a0,zero

instead of

li  a5,10
sub a0,a5,a0
slti    a0,a0,10

on 32-bit RISC-V.

Additional examples can be found in the newly added test file. This
patch has been bootstrapped and regtested on aarch64, x86_64, and
i386, and additionally regtested on riscv32.  Existing tests were
adjusted where necessary.
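
The rewrite for the <= case can be modelled in C by doing the arithmetic
in 64 bits, so that the "mathematical" values are visible and nothing
overflows.  This is an illustrative sketch, not code from the patch; the
helper names and the sample grid are assumptions.  When C1 - C2 does not
fit in 32 bits, the rewritten comparison has the same value for every
32-bit X, which corresponds to the constant 0 / constant 1 case described
above.

```c
#include <assert.h>
#include <stdint.h>

/* C1 - X <= C2, evaluated in 64-bit arithmetic so the subtraction
   cannot overflow for 32-bit inputs.  */
static int orig_le(int32_t c1, int32_t x, int32_t c2)
{
  return (int64_t) c1 - x <= c2;
}

/* X >= C1 - C2 (le becomes ge after moving X and C2 across), with
   C1 - C2 also computed in 64 bits.  */
static int rewritten_le(int32_t c1, int32_t x, int32_t c2)
{
  return x >= (int64_t) c1 - c2;
}

/* Count disagreements between the two forms over edge-case values.  */
static unsigned count_mismatches(void)
{
  static const int32_t v[] = {
    INT32_MIN, INT32_MIN + 1, -50, -1, 0, 1, 9, 10, 20, 30,
    INT32_MAX - 1, INT32_MAX
  };
  const unsigned n = sizeof v / sizeof v[0];
  unsigned bad = 0;
  for (unsigned i = 0; i < n; i++)
    for (unsigned j = 0; j < n; j++)
      for (unsigned k = 0; k < n; k++)
        bad += orig_le(v[i], v[k], v[j]) != rewritten_le(v[i], v[k], v[j]);
  return bad;
}
```

For the example in the commit message, orig_le(10, x, 9) is 10 - x <= 9
and rewritten_le(10, x, 9) is x >= 1, i.e. x > 0.  With C1 = 20 and
C2 = INT32_MIN the bound 20 - INT32_MIN exceeds INT32_MAX, so the
rewritten form is 0 for every x.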

gcc/ChangeLog:

PR tree-optimization/116024
* match.pd: New transformation around integer comparison.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr116024.c: New test.
* gcc.dg/pr67089-6.c: Adjust.

Signed-off-by: Artemiy Volkov 
---
 gcc/match.pd | 26 ++
 gcc/testsuite/gcc.dg/pr67089-6.c |  4 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr116024.c | 66 
 3 files changed, 94 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024.c

diff --git a/gcc/match.pd b/gcc/match.pd
index e06a812e976..b074f49eebd 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8996,6 +8996,32 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
}
(cmp @0 { res; })
 
+/* Invert sign of X in comparisons of the form C1 - X CMP C2.  */
+
+(for cmp (lt le gt ge eq ne)
+ rcmp (gt ge lt le eq ne)
+  (simplify
+   (cmp (minus INTEGER_CST@0 @1) INTEGER_CST@2)
+/* For UB-on-overflow types, simply switch sides for X and C2
+   to arrive at X RCMP C1 - C2, handling the case when the latter
+   expression overflows.  */
+   (if (!TREE_OVERFLOW (@0) && !TREE_OVERFLOW (@2)
+   && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@1)))
+ (with { tree res = int_const_binop (MINUS_EXPR, @0, @2); }
+  (if (TREE_OVERFLOW (res))
+   (switch
+(if (cmp == NE_EXPR)
+ { constant_boolean_node (true, type); })
+(if (cmp == EQ_EXPR)
+ { constant_boolean_node (false, type); })
+{
+  bool less = cmp == LE_EXPR || cmp == LT_EXPR;
+  bool ovf_high = wi::lt_p (wi::to_wide (@0), 0,
+TYPE_SIGN (TREE_TYPE (@0)));
+  constant_boolean_node (less == ovf_high, type);
+})
+  (rcmp @1 { res; }))
+
 /* Canonicalizations of BIT_FIELD_REFs.  */
 
 (simplify
diff --git a/gcc/testsuite/gcc.dg/pr67089-6.c b/gcc/testsuite/gcc.dg/pr67089-6.c
index b59d75b2318..80a33c3f3e2 100644
--- a/gcc/testsuite/gcc.dg/pr67089-6.c
+++ b/gcc/testsuite/gcc.dg/pr67089-6.c
@@ -57,5 +57,5 @@ T (25, unsigned short, 2U - x, if (r > 2U) foo (0))
 T (26, unsigned char, 2U - x, if (r <= 2U) foo (0))
 
 /* { dg-final { scan-tree-dump-times "ADD_OVERFLOW" 16 "widening_mul" { target { i?86-*-* x86_64-*-* } } } } */
-/* { dg-final { scan-tree-dump-times "SUB_OVERFLOW" 11 "widening_mul" { target { { i?86-*-* x86_64-*-* } && { ! ia32 } } } } } */
-/* { dg-final { scan-tree-dump-times "SUB_OVERFLOW" 9 "widening_mul" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+/* { dg-final { scan-tree-dump-times "SUB_OVERFLOW" 9 "widening_mul" { target { { i?86-*-* x86_64-*-* } && { ! ia32 } } } } } */
+/* { dg-final { scan-tree-dump-times "SUB_OVERFLOW" 7 "widening_mul" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr116024.c b/gcc/testsuite/gcc.dg/tree-ssa/pr116024.c
new file mode 100644
index 000..6efa0c2f916
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr116024.c
@@ -0,0 +1,66 @@
+/* PR tree-optimization/116024 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-forwprop1-details" } */
+
+#include 
+#include 
+
+uint32_t f(void);
+
+int32_t i1(void)
+{
+  int32_t l = 10 - (int32_t)f();
+  return l <= 9; // f() > 0
+}
+
+int32_t i1a(void)
+{
+  int32_t l = 20 - (int32_t)f();
+  return l <= INT32_MIN; // return 0
+}
+
+int32_t i1b(void)
+{
+  int32_t l = 30 - (int32_t)f();
+  return l <= INT32_MIN + 31; // f() == INT32_MAX
+}
+
+int32_t i1c(void)
+{
+  int32_t l = INT32_MAX - 40 - (int32_t)f();
+  return l <= -38; // f() > INT32_MAX - 3
+}
+
+int32_t i1d(void)
+{
+  int32_t l = INT32_MAX - 50 - (int32_t)f();
+  return l <= INT32_MAX - 1; // f() != -50
+}
+
+int32_t i1e(void)
+{
+  int32_t l = INT32_MAX - 60 - (int32_t)f();
+  return l != INT32_MAX - 90; // f() != 30
+}
+
+int32_t i1f(void)
+{
+  int32_t l = INT32_MIN + 70 - (int32_t)f();
+  return l <= INT32_MAX - 2; // return 0
+}
+
+int32_t i1g(void)
+{
+  int32_t l = INT32_MAX/2 + 30 - (int32_t)f();
+  return l <= INT32_MIN/2 - 30; // return 1

[PATCH v4 3/4] tree-optimization/116024 - simplify C1-X cmp C2 for wrapping signed types

2024-09-30 Thread Artemiy Volkov

Implement a match.pd transformation inverting the sign of X in
C1 - X cmp C2, where C1 and C2 are integer constants and X is
of a wrapping signed type, by observing that:

(a) If cmp is == or !=, simply move X and C2 to opposite sides of
the comparison to arrive at X cmp C1 - C2.

(b) If cmp is <:
- C1 - X < C2 means that C1 - X spans the values of -INF,
  -INF + 1, ..., C2 - 1;
- Therefore, X is one of C1 - -INF, C1 - (-INF + 1), ...,
  C1 - C2 + 1;
- Subtracting (C1 + 1), X - (C1 + 1) is one of - (-INF) - 1,
  - (-INF) - 2, ..., -C2;
- Using the fact that - (-INF) - 1 is +INF, derive that
  X - (C1 + 1) spans the values +INF, +INF - 1, ..., -C2;
- Thus, the original expression can be simplified to
  X - (C1 + 1) > -C2 - 1.

(c) Similarly, C1 - X <= C2 is equivalent to X - (C1 + 1) >= -C2 - 1.

(d) The >= and > cases are negations of (b) and (c), respectively.

(e) In all cases, the expression -C2 - 1 can be shortened to
bit_not (C2).

This transformation occasionally saves load-immediate /
subtraction instructions, e.g. the following statement:

10 - (int)f() >= 20;

now compiles to

addi    a0,a0,-11
slti    a0,a0,-20

instead of

li  a5,10
sub a0,a5,a0
slti    t0,a0,20
xori    a0,t0,1

on 32-bit RISC-V when compiled with -fwrapv.

Additional examples can be found in the newly added test file.  This
patch has been bootstrapped and regtested on aarch64, x86_64, and i386,
and additionally regtested on riscv32.
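
The wrapping-signed identity is small enough to check exhaustively in an
8-bit model.  The sketch below is illustrative only: wrap8 and the other
helper names are made up, and the 8-bit width is an assumption chosen to
keep the search space small (256^3 combinations).  Wrapping is emulated
with ordinary int arithmetic, so the code does not itself rely on -fwrapv.

```c
#include <assert.h>

/* Wrap an int into the 8-bit two's-complement range [-128, 127].  */
static int wrap8(int v)
{
  return ((v + 128) % 256 + 256) % 256 - 128;
}

/* C1 - X < C2 with an 8-bit wrapping subtraction.  */
static int orig_lt(int c1, int x, int c2)
{
  return wrap8(c1 - x) < c2;
}

/* X - (C1 + 1) > bit_not (C2); -c2 - 1 is bit_not (C2) for values in
   range.  */
static int rewritten_lt(int c1, int x, int c2)
{
  return wrap8(x - wrap8(c1 + 1)) > -c2 - 1;
}

/* C1 - X >= C2 and its rewritten form X - (C1 + 1) <= bit_not (C2).  */
static int orig_ge(int c1, int x, int c2)
{
  return wrap8(c1 - x) >= c2;
}

static int rewritten_ge(int c1, int x, int c2)
{
  return wrap8(x - wrap8(c1 + 1)) <= -c2 - 1;
}

/* Count disagreements over every 8-bit combination of C1, C2 and X.  */
static unsigned long count_mismatches(void)
{
  unsigned long bad = 0;
  for (int c1 = -128; c1 <= 127; c1++)
    for (int c2 = -128; c2 <= 127; c2++)
      for (int x = -128; x <= 127; x++)
        {
          bad += orig_lt(c1, x, c2) != rewritten_lt(c1, x, c2);
          bad += orig_ge(c1, x, c2) != rewritten_ge(c1, x, c2);
        }
  return bad;
}
```

The identity follows from X - (C1 + 1) being the bitwise complement of
C1 - X in wrapping arithmetic, together with bit_not being strictly
decreasing on signed values, so count_mismatches() should return 0.  For
the example above, orig_ge(10, x, 20) corresponds to 10 - x >= 20 and
rewritten_ge(10, x, 20) to x - 11 <= -21.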

gcc/ChangeLog:

PR tree-optimization/116024
* match.pd: New transformation around integer comparison.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr116024-1-fwrapv.c: New test.

Signed-off-by: Artemiy Volkov 
---
 gcc/match.pd  | 21 +-
 .../gcc.dg/tree-ssa/pr116024-1-fwrapv.c   | 65 +++
 2 files changed, 85 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-1-fwrapv.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 46195a603d0..3b973887470 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -9041,7 +9041,26 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (cmp (plus @1 (minus @2 @0)) @2))
(if (cmp == LT_EXPR || cmp == GE_EXPR)
 (cmp (plus @1 (minus @2
-  (plus @0 { build_one_cst (TREE_TYPE (@1)); }))) @2)))
+  (plus @0 { build_one_cst (TREE_TYPE (@1)); }))) @2)))
+/* For wrapping signed types (-fwrapv), transform like so (using < as example):
+C1 - X < C2
+  ==>  C1 - X = { -INF, -INF + 1, ..., C2 - 1 }
+  ==>  X = { C1 - (-INF), C1 - (-INF + 1), ..., C1 - C2 + 1 }
+  ==>  X - (C1 + 1) = { - (-INF) - 1, - (-INF) - 2, ..., -C2 }
+  ==>  X - (C1 + 1) = { +INF, +INF - 1, ..., -C2 }
+  ==>  X - (C1 + 1) > -C2 - 1
+  ==>  X - (C1 + 1) > bit_not (C2)
+
+  Similarly,
+C1 - X <= C2 ==> X - (C1 + 1) >= bit_not (C2);
+C1 - X >= C2 ==> X - (C1 + 1) <= bit_not (C2);
+C1 - X > C2 ==> X - (C1 + 1) < bit_not (C2).  */
+   (if (!TYPE_UNSIGNED (TREE_TYPE (@1))
+   && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@1)))
+ (if (cmp == EQ_EXPR || cmp == NE_EXPR)
+   (cmp @1 (minus @0 @2))
+ (rcmp (minus @1 (plus @0 { build_one_cst (TREE_TYPE (@1)); }))
+(bit_not @2
 
 /* Canonicalizations of BIT_FIELD_REFs.  */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr116024-1-fwrapv.c b/gcc/testsuite/gcc.dg/tree-ssa/pr116024-1-fwrapv.c
new file mode 100644
index 000..24e1abef774
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr116024-1-fwrapv.c
@@ -0,0 +1,65 @@
+/* PR tree-optimization/116024 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-forwprop1-details -fwrapv" } */
+
+#include <stdint.h>
+
+uint32_t f(void);
+
+int32_t i2(void)
+{
+  int32_t l = 10 - (int32_t)f();
+  return l <= 20; // f() - 11 >= -21
+}
+
+int32_t i2a(void)
+{
+  int32_t l = 10 - (int32_t)f();
+  return l < 30; // f() - 11 > -31
+}
+
+int32_t i2b(void)
+{
+  int32_t l = 200 - (int32_t)f();
+  return l <= 100; // f() - 201 >= -101
+}
+
+int32_t i2c(void)
+{
+  int32_t l = 300 - (int32_t)f();
+  return l < 100; // f() - 301 > -101
+}
+
+int32_t i2d(void)
+{
+  int32_t l = 1000 - (int32_t)f();
+  return l >= 2000; // f() - 1001 <= -2001
+}
+
+int32_t i2e(void)
+{
+  int32_t l = 1000 - (int32_t)f();
+  return l > 3000; // f() - 1001 < -3001
+}
+
+int32_t i2f(void)
+{
+  int32_t l = 20000 - (int32_t)f();
+  return l >= 10000; // f() - 20001 <= -10001
+}
+
+int32_t i2g(void)
+{
+  int32_t l = 30000 - (int32_t)f();
+  return l > 10000; // f() - 30001 < -10001
+}
+
+/* { dg-final { scan-tree-dump-times "Removing dead stmt:.*?- _" 8 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+ -11.*\n.*>= -21" 1 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+ -11.*\n.*>= -30" 1 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "gimple_simpli

[PATCH v4 4/4] tree-optimization/116024 - simplify some cases of X +- C1 cmp C2

2024-09-30 Thread Artemiy Volkov

Whenever C1 and C2 are integer constants, X is of a wrapping type, and
cmp is a relational operator, the expression X +- C1 cmp C2 can be
simplified in the following cases:

(a) If cmp is <= and C2 -+ C1 == +INF(1), we can transform the initial
comparison in the following way:
   X +- C1 <= C2
   -INF <= X +- C1 <= C2 (add left hand side which holds for any X, C1)
   -INF -+ C1 <= X <= C2 -+ C1 (add -+C1 to all 3 expressions)
   -INF -+ C1 <= X <= +INF (due to (1))
   -INF -+ C1 <= X (eliminate the right hand side since it holds for any X)

(b) By analogy, if cmp is >= and C2 -+ C1 == -INF(1), use the following
sequence of transformations:

   X +- C1 >= C2
   +INF >= X +- C1 >= C2 (add left hand side which holds for any X, C1)
   +INF -+ C1 >= X >= C2 -+ C1 (add -+C1 to all 3 expressions)
   +INF -+ C1 >= X >= -INF (due to (1))
   +INF -+ C1 >= X (eliminate the right hand side since it holds for any X)

(c) The > and < cases are negations of (a) and (b), respectively.
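
Cases (a) and (b) can be checked by brute force.  The sketch below does so
for a wrapping 8-bit signed type, taking op == + for case (a) and op == - for
case (b).  The 8-bit width and the helper names (wrap8, check_case_a_plus,
check_case_b_minus) are assumptions made for this illustration only; they do
not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Reduce an int to the value a wrapping 8-bit signed type would hold.  */
static int8_t
wrap8 (int v)
{
  unsigned u = (unsigned) v & 0xffu;
  return (int8_t) (u >= 128u ? (int) u - 256 : (int) u);
}

/* Case (a) with op == +:  X + C1 <= C2, where C2 - C1 == +INF,
   should be equivalent to X >= -INF - C1.  */
int
check_case_a_plus (void)
{
  for (int c1 = -128; c1 <= 127; c1++)
    {
      int8_t c2 = wrap8 (INT8_MAX + c1);
      assert (wrap8 (c2 - c1) == INT8_MAX);   /* precondition (1) */
      int8_t bound = wrap8 (INT8_MIN - c1);
      for (int x = -128; x <= 127; x++)
        if ((wrap8 (x + c1) <= c2) != (x >= bound))
          return 0;
    }
  return 1;
}

/* Case (b) with op == -:  X - C1 >= C2, where C2 + C1 == -INF,
   should be equivalent to X <= +INF + C1.  */
int
check_case_b_minus (void)
{
  for (int c1 = -128; c1 <= 127; c1++)
    {
      int8_t c2 = wrap8 (INT8_MIN - c1);
      assert (wrap8 (c2 + c1) == INT8_MIN);   /* precondition (1) */
      int8_t bound = wrap8 (INT8_MAX + c1);
      for (int x = -128; x <= 127; x++)
        if ((wrap8 (x - c1) >= c2) != (x <= bound))
          return 0;
    }
  return 1;
}
```

The remaining op/cmp combinations follow the same pattern with the signs
swapped, and (c) is the logical negation of each result.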

This transformation occasionally saves an add / sub instruction;
for instance, the expression

3 + (uint32_t)f() < 2

compiles to

cmn w0, #4
cset w0, ls

instead of

add w0, w0, 3
cmp w0, 2
cset w0, ls

on aarch64.

Testcases that go together with this patch have been split into two
separate files, one containing testcases for unsigned variables and the
other for wrapping signed ones (and thus compiled with -fwrapv).
Additionally, one aarch64 test has been adjusted since the patch has
caused the generated code to change from

cmn w0, #2
csinc   w0, w1, wzr, cc   (x < -2)

to

cmn w0, #3
csinc   w0, w1, wzr, cs   (x <= -3)

This patch has been bootstrapped and regtested on aarch64, x86_64, and
i386, and additionally regtested on riscv32.

gcc/ChangeLog:

PR tree-optimization/116024
* match.pd: New transformation around integer comparison.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr116024-2.c: New test.
* gcc.dg/tree-ssa/pr116024-2-fwrapv.c: Ditto.
* gcc.target/aarch64/gtu_to_ltu_cmp_1.c: Adjust.

Signed-off-by: Artemiy Volkov 
---
 gcc/match.pd  | 43 ++-
 .../gcc.dg/tree-ssa/pr116024-2-fwrapv.c   | 38 
 gcc/testsuite/gcc.dg/tree-ssa/pr116024-2.c| 37 
 .../gcc.target/aarch64/gtu_to_ltu_cmp_1.c |  2 +-
 4 files changed, 118 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-2-fwrapv.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-2.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 3b973887470..30e66d3dbfa 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8967,6 +8967,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(cmp @0 { TREE_OVERFLOW (res)
 ? drop_tree_overflow (res) : res; }
 (for cmp (lt le gt ge)
+ rcmp (gt ge lt le)
  (for op (plus minus)
   rop (minus plus)
   (simplify
@@ -8994,7 +8995,47 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  "X cmp C2 -+ C1"),
 WARN_STRICT_OVERFLOW_COMPARISON);
}
-   (cmp @0 { res; })
+   (cmp @0 { res; })
+/* For wrapping types, simplify the following cases of X +- C1 CMP C2:
+
+   (a) If CMP is <= and C2 -+ C1 == +INF (1), simplify to X >= -INF -+ C1
+   by observing the following:
+
+   X +- C1 <= C2
+  ==>  -INF <= X +- C1 <= C2 (add left hand side which holds for any X, C1)
+  ==>  -INF -+ C1 <= X <= C2 -+ C1 (add -+C1 to all 3 expressions)
+  ==>  -INF -+ C1 <= X <= +INF (due to (1))
+  ==>  -INF -+ C1 <= X (eliminate the right hand side since it holds for any X)
+
+(b) Similarly, if CMP is >= and C2 -+ C1 == -INF (1):
+
+   X +- C1 >= C2
+  ==>  +INF >= X +- C1 >= C2 (add left hand side which holds for any X, C1)
+  ==>  +INF -+ C1 >= X >= C2 -+ C1 (add -+C1 to all 3 expressions)
+  ==>  +INF -+ C1 >= X >= -INF (due to (1))
+  ==>  +INF -+ C1 >= X (eliminate the right hand side since it holds for any X)
+
+(c) The > and < cases are negations of (a) and (b), respectively.  */
+   (if (TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
+ (with
+   {
+   wide_int max = wi::max_value (TREE_TYPE (@0));
+   wide_int min = wi::min_value (TREE_TYPE (@0));
+
+   wide_int c2 = rop == PLUS_EXPR
+ ? wi::add (wi::to_wide (@2), wi::to_wide (@1))
+ : wi::sub (wi::to_wide (@2), wi::to_wide (@1));
+   }
+   (if (((cmp == LE_EXPR || cmp == GT_EXPR) && wi::eq_p (c2, max))
+   || ((cmp == LT_EXPR || cmp == GE_EXPR) && wi::eq_p (c2, min)))
+ (with
+  {
+wide_int c1 = rop == PLUS_EXPR
+  ? wi::add (wi::bit_not (c2), wi::to_wide (@1))
+  : wi::sub (wi::bit_not (c2), wi::to_wide (@1));
+tree c1_cst = wide_int_to_tree (TREE_TYPE (@0), c1);
+  }
+  (rcmp @0 { c1_cst; })
 
 /* Invert si

Re: [Fortran, Patch, PR81265, v1] Fix passing coarrays always w/ descriptor

2024-09-30 Thread Andre Vehreschild

Hi Steve,

thanks for the review. Committed as: gcc-15-3958-gbac95615b50

Thanks again,
Andre

On Fri, 27 Sep 2024 10:48:46 -0700
Steve Kargl  wrote:

> On Fri, Sep 27, 2024 at 03:20:43PM +0200, Andre Vehreschild wrote:
> >
> > attached patch fixes a runtime issue when a coarray was passed as
> > parameter to a procedure that was itself a parameter. The issue here
> > was that the coarray was passed as array pointer (i.e. w/o descriptor)
> > to the function, but the function expected it to be an array
> > w/ descriptor.
> >
> > Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?
> >
>
> Yes.
>
> One general question as you're plowing through the coarray
> bug reports:  does the testing include -fcoarray=none, single,
> and lib; or a subset of the three.
>


--
Andre Vehreschild * Email: vehre ad gmx dot de

[COMMITTED] Re: Re: [PATCH] RISC-V: Add an implicit dependency for Zawrs

2024-09-30 Thread Xiao Zeng

2024-09-30 14:32  Kito Cheng  wrote:
>
>LGTM, and let me know if you need my help to commit that :) 
Thank you, Kito. Recently, I received permission from Jeff.

>
>On Mon, Sep 30, 2024 at 9:37 AM Xiao Zeng  wrote:
>>
>> There is a description in 
>> :
>>
>> "The instructions in the Zawrs extension are only useful in conjunction
>> with the LR instruction, which is provided by the Zalrsc component
>> of the A extension."
>>
>> It can be concluded that: zawrs -> zalrsc.
>>
>> gcc/ChangeLog:
>>
>> * common/config/riscv/riscv-common.cc: zawrs -> zalrsc.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/predef-38.c: New test.
>> * gcc.target/riscv/predef-39.c: New test.
>>
>> Signed-off-by: Xiao Zeng 
>> ---
>>  gcc/common/config/riscv/riscv-common.cc    |  1 +
>>  gcc/testsuite/gcc.target/riscv/predef-38.c | 31 ++
>>  gcc/testsuite/gcc.target/riscv/predef-39.c | 31 ++
>>  3 files changed, 63 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/predef-38.c
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/predef-39.c
>>
>> diff --git a/gcc/common/config/riscv/riscv-common.cc 
>> b/gcc/common/config/riscv/riscv-common.cc
>> index bd42fd01532..a6abd903b98 100644
>> --- a/gcc/common/config/riscv/riscv-common.cc
>> +++ b/gcc/common/config/riscv/riscv-common.cc
>> @@ -96,6 +96,7 @@ static const riscv_implied_info_t riscv_implied_info[] =
>>
>>    {"zabha", "zaamo"},
>>    {"zacas", "zaamo"},
>> +  {"zawrs", "zalrsc"},
>>
>>    {"zcmop", "zca"},
>>
>> diff --git a/gcc/testsuite/gcc.target/riscv/predef-38.c 
>> b/gcc/testsuite/gcc.target/riscv/predef-38.c
>> new file mode 100644
>> index 000..986c02b451a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/predef-38.c
>> @@ -0,0 +1,31 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -march=rv32i_zawrs -mabi=ilp32 -mcmodel=medlow 
>> -misa-spec=20191213" } */
>> +
>> +int main () {
>> +
>> +#ifndef __riscv_arch_test
>> +#error "__riscv_arch_test"
>> +#endif
>> +
>> +#if __riscv_xlen != 32
>> +#error "__riscv_xlen"
>> +#endif
>> +
>> +#if !defined(__riscv_i)
>> +#error "__riscv_i"
>> +#endif
>> +
>> +#if !defined(__riscv_zawrs)
>> +#error "__riscv_zawrs"
>> +#endif
>> +
>> +#if !defined(__riscv_zalrsc)
>> +#error "__riscv_zalrsc"
>> +#endif
>> +
>> +#if defined(__riscv_a)
>> +#error "__riscv_a"
>> +#endif
>> +
>> +  return 0;
>> +}
>> diff --git a/gcc/testsuite/gcc.target/riscv/predef-39.c 
>> b/gcc/testsuite/gcc.target/riscv/predef-39.c
>> new file mode 100644
>> index 000..558164de8c4
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/predef-39.c
>> @@ -0,0 +1,31 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -march=rv64i_zawrs -mabi=lp64 -mcmodel=medlow 
>> -misa-spec=20191213" } */
>> +
>> +int main () {
>> +
>> +#ifndef __riscv_arch_test
>> +#error "__riscv_arch_test"
>> +#endif
>> +
>> +#if __riscv_xlen != 64
>> +#error "__riscv_xlen"
>> +#endif
>> +
>> +#if !defined(__riscv_i)
>> +#error "__riscv_i"
>> +#endif
>> +
>> +#if !defined(__riscv_zawrs)
>> +#error "__riscv_zawrs"
>> +#endif
>> +
>> +#if !defined(__riscv_zalrsc)
>> +#error "__riscv_zalrsc"
>> +#endif
>> +
>> +#if defined(__riscv_a)
>> +#error "__riscv_a"
>> +#endif
>> +
>> +  return 0;
>> +}
>> --
>> 2.17.1
>>
Thanks
Xiao Zeng

Re: [RFC PATCH] More detailed diagnostics for section type conflicts

2024-09-30 Thread Florian Weimer

* Richard Biener:

>> +  append (flags & SECTION_RELRO, "RELRO");
>> +  append (flags & SECTION_EXCLUDE, "EXCLUDE");
>> +  append (flags & SECTION_RETAIN, "RETAIN");
>> +  append (flags & SECTION_LINK_ORDER, "LINK_ORDER");
>
> I'm not sure printing these internal flags is of help to the user.

There are cases where at least one of the conflicting sections is
internally created.  I came through this via PR116887.  In those cases,
the diagnostic is more of an ICE than a programmer error.

> So these are all cases where neither
>
>   /* It is fine if one of the section flags is
>  SECTION_WRITE | SECTION_RELRO and the other has none of these
>  flags (i.e. read-only) in named sections and either the
>  section hasn't been declared yet or has been declared as 
> writable.
>  In that case just make sure the resulting flags are
>  SECTION_WRITE | SECTION_RELRO, ie. writable only because of
>  relocations.  */
>   if (((sect->common.flags ^ flags) & (SECTION_WRITE | SECTION_RELRO))
>   == (SECTION_WRITE | SECTION_RELRO)
>   && (sect->common.flags
>   & ~(SECTION_DECLARED | SECTION_WRITE | SECTION_RELRO))
>  == (flags & ~(SECTION_WRITE | SECTION_RELRO))
>   && ((sect->common.flags & SECTION_DECLARED) == 0
>   || (sect->common.flags & SECTION_WRITE)))
> {
>   sect->common.flags |= (SECTION_WRITE | SECTION_RELRO);
>   return sect;
> }
>   /* If the SECTION_RETAIN bit doesn't match, return and switch
>  to a new section later.  */
>   if ((sect->common.flags & SECTION_RETAIN)
>   != (flags & SECTION_RETAIN))
> return sect;
>
> matched.  It should be possible to elaborate on the actual mismatch instead of
> dumping all of the random section flags?

Hmm.  This part

  && (sect->common.flags
  & ~(SECTION_DECLARED | SECTION_WRITE | SECTION_RELRO))
 == (flags & ~(SECTION_WRITE | SECTION_RELRO))

seems to restrict the early return to the case where the internal flags
match.  So I think we should print them in the general case.  I could
special-case the LoongArch error (“attempt to mark an existing section
as RELRO”), but we can't say *what* causes this section type change
attempt in that particular case (it seems to come from constant pool
generation), so it's not going to be useful to programmers either way.
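
To make the quoted condition easier to follow, here is a minimal standalone
sketch of the early-return test.  The SEC_* constants are hypothetical
stand-ins for GCC's SECTION_* bits (the real values differ), and
relro_upgrade_allowed is a name invented for this illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's SECTION_* bits; the real values
   are defined in GCC and differ from these.  */
enum
{
  SEC_DECLARED = 1 << 0,
  SEC_WRITE = 1 << 1,
  SEC_RELRO = 1 << 2,
  SEC_RETAIN = 1 << 3
};

/* Mirror of the quoted check: one side has WRITE|RELRO and the other
   has neither, all remaining flags (ignoring DECLARED on the existing
   section) match, and the existing section is either not yet declared
   or already writable.  */
bool
relro_upgrade_allowed (unsigned old_flags, unsigned new_flags)
{
  const unsigned wr = SEC_WRITE | SEC_RELRO;

  return ((old_flags ^ new_flags) & wr) == wr
         && (old_flags & ~(SEC_DECLARED | wr)) == (new_flags & ~wr)
         && ((old_flags & SEC_DECLARED) == 0
             || (old_flags & SEC_WRITE) != 0);
}
```

With these stand-in values, an undeclared read-only section can be upgraded,
whereas a mismatch in any other flag (SEC_RETAIN here) or an already declared
read-only section fails the test and falls through to the later checks.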

Thanks,
Florian

[PATCH v2] x86/{,V}AES: adjust when to force EVEX encoding

2024-09-30 Thread Jan Beulich

Commit a79d13a01f8c ("i386: Fix aes/vaes patterns [PR114576]") correctly
said "..., but we need to emit {evex} prefix in the assembly if AES ISA
is not enabled". Yet it did so only for the TARGET_AES insns. Going from
the alternative chosen in the TARGET_VAES insns isn't quite right: If
AES is (also) enabled, EVEX encoding would needlessly be forced.
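
The difference between the old and new conditions can be modelled roughly as
follows.  The function names and boolean parameters are illustrative
stand-ins for the which_alternative, TARGET_AES and mode tests in the insn
templates; this is a sketch, not GCC code.

```c
#include <stdbool.h>

/* Old rule: keyed on which operand alternative was chosen.  */
bool
force_evex_old (int which_alternative, bool mode_is_v16qi)
{
  return which_alternative == 0 && mode_is_v16qi;
}

/* New rule: keyed on whether the AES ISA is enabled at all.  */
bool
force_evex_new (bool target_aes, bool mode_is_v16qi)
{
  return !target_aes && mode_is_v16qi;
}
```

With AES enabled and alternative 0 chosen for a V16QI operand, the old rule
still asks for the prefix while the new rule does not.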

gcc/

* config/i386/sse.md (vaesdec_, vaesdeclast_,
vaesenc_, vaesenclast_): Replace which_alternative
check by TARGET_AES one.
---
As an aside - {evex} (and other) pseudo-prefixes would better be avoided
anyway whenever possible, as those are getting in the way of code
putting in place macro overrides for certain insns: gas 2.43 rejects
such bogus placement of pseudo-prefixes.

Is it, btw, correct that none of these insns have a "prefix" attribute?
---
v2: Adjust (shrink) description.

--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -30802,7 +30802,7 @@
  UNSPEC_VAESDEC))]
   "TARGET_VAES"
 {
-  if (which_alternative == 0 && mode == V16QImode)
+  if (!TARGET_AES && mode == V16QImode)
 return "%{evex%} vaesdec\t{%2, %1, %0|%0, %1, %2}";
   else
 return "vaesdec\t{%2, %1, %0|%0, %1, %2}";
@@ -30816,7 +30816,7 @@
  UNSPEC_VAESDECLAST))]
   "TARGET_VAES"
 {
-  if (which_alternative == 0 && mode == V16QImode)
+  if (!TARGET_AES && mode == V16QImode)
 return "%{evex%} vaesdeclast\t{%2, %1, %0|%0, %1, %2}";
   else
 return "vaesdeclast\t{%2, %1, %0|%0, %1, %2}";
@@ -30830,7 +30830,7 @@
  UNSPEC_VAESENC))]
   "TARGET_VAES"
 {
-  if (which_alternative == 0 && mode == V16QImode)
+  if (!TARGET_AES && mode == V16QImode)
 return "%{evex%} vaesenc\t{%2, %1, %0|%0, %1, %2}";
   else
 return "vaesenc\t{%2, %1, %0|%0, %1, %2}";
@@ -30844,7 +30844,7 @@
  UNSPEC_VAESENCLAST))]
   "TARGET_VAES"
 {
-  if (which_alternative == 0 && mode == V16QImode)
+  if (!TARGET_AES && mode == V16QImode)
 return "%{evex%} vaesenclast\t{%2, %1, %0|%0, %1, %2}";
   else
 return "vaesenclast\t{%2, %1, %0|%0, %1, %2}";

Re: [RFC PATCH] More detailed diagnostics for section type conflicts

2024-09-30 Thread Florian Weimer

* David Malcolm:

> I'm not quite sure what you mean by "non-error" and "non-anchored". 

Sorry, I'm not familiar with the appropriate terminology.

> By "non-error", do you mean that this should this be a warning?  If so,
> use warning_at.  You can use 0 for the option_id whilst prototyping. 
> Or use "inform" to get a note.

I meant that I want to attach something like a note to another error.
There is just one error here (an ICE really, in case of PR116887).  If
it's okay I can call the error function without a location multiple
times to provide the additional information.  But the thing the
alternative below seems more appropriate.

> By "non-anchored", do you mean "not associated with any particular
> source location"?  If so, use error_at/warning_at and use
> UNKNOWN_LOCATION for the location_t ("error" and "warning" implicitly
> use the "input_location" global variable; it's usually best to instead
> specify a location, or use UNKNOWN_LOCATION for "global" problems).

Ahh, so use inform (UNKNOWN_LOCATION, …)?  I see a couple of examples
like that.

I'll wait for further comments from Richi and repost.  There are also
some test cases that need adjusting.

Thanks,
Florian

Re: [PATCH] lra: emit caller-save register spills before call insn [PR116028]

2024-09-30 Thread Christophe Lyon

Hi!

Sorry for replying late...


On Sat, 10 Aug 2024 at 05:15, Andrew Pinski  wrote:
>
> On Fri, Aug 9, 2024 at 8:11 PM Xi Ruoyao  wrote:
> >
> > On Fri, 2024-08-09 at 17:55 -0400, Vladimir Makarov wrote:
> >
> > > Still, for GCC developer novice, I think it is important to test all
> > > major targets and aarch64 (one target on which bootstrap was broken) is
> > > the 2nd most important target.
> >
> > Linaro CI will complain (via off-list email) if a patch posted on the
> > list breaks aarch64.  It complained some of my patches and I fixed them
> > before commit.  Why this case was not caught?
>
> I had been wondering the same until I looked into it earlier today.
> Linaro CI's does `--disable-bootstrap` and there was no extra
> testsuite failures with the patch.
> So Linaro CI's is not catching all the bugs that a developer would
> catch in the end. Because bootstrap is one of the normal requirements;
> though usually only on one target.
>

That's not quite right :-)

* Linaro precommit CI does indeed disable bootstrap, for HW bandwidth
reasons. Bootstrap takes longer, and would mean we can test fewer
patches in precommit mode.

* Linaro postcommit CI does include (several) bootstraps, and we did
report the breakage on gcc-regression:
https://gcc.gnu.org/pipermail/gcc-regression/2024-August/080509.html

Looking at our associated Jira card (see link to
https://linaro.atlassian.net/browse/GNU-1310 in the report above), we
noticed regressions in:
- aarch64 "normal" bootstrap (build failure)
- aarch64 bootstrap-debug (build failure)
- aarch64 bootstrap-profiled-lto (build failure)
- aarch64 non-bootstrap
Running libgo:gcc.git~master/libgo/libgo.exp ...
FAIL: net
- arm non-bootstrap:
FAIL: libphobos.phobos/std/experimental/allocator/building_blocks/kernighan_ritchie.d execution test

We sent the regression report on Aug 8th, the same day the patch was
committed. I suppose the reason we didn't accumulate regressions in
more cases in our Jira is that the patch was reverted on Aug 9th.


It's not the first time I see a "request" to enable bootstrap in
precommit CI though :-)

Thanks,

Christophe


> Thanks,
> Andrew Pinski
>
> >
> > --
> > Xi Ruoyao 
> > School of Aerospace Science and Technology, Xidian University

Re: [PATCH v2] c++: concept in default argument [PR109859]

2024-09-30 Thread Jason Merrill


On 9/27/24 5:30 PM, Marek Polacek wrote:

On Fri, Sep 27, 2024 at 04:57:58PM -0400, Jason Merrill wrote:

On 9/18/24 5:06 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
1) We're hitting the assert in cp_parser_placeholder_type_specifier.
It says that if it turns out to be false, we should do error() instead.
Do so, then.

2) lambda-targ8.C should compile fine, though.  The problem was that
local_variables_forbidden_p wasn't cleared when we're about to parse
the optional template-parameter-list for a lambda in a default argument.

PR c++/109859

gcc/cp/ChangeLog:

* parser.cc (cp_parser_lambda_declarator_opt): Temporarily clear
local_variables_forbidden_p.
(cp_parser_placeholder_type_specifier): Turn an assert into an error.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-defarg3.C: New test.
* g++.dg/cpp2a/lambda-targ8.C: New test.
---
   gcc/cp/parser.cc  |  9 +++--
   gcc/testsuite/g++.dg/cpp2a/concepts-defarg3.C |  8 
   gcc/testsuite/g++.dg/cpp2a/lambda-targ8.C | 10 ++
   3 files changed, 25 insertions(+), 2 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-defarg3.C
   create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ8.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 4dd9474cf60..bdc4fef243a 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -11891,6 +11891,11 @@ cp_parser_lambda_declarator_opt (cp_parser* parser, 
tree lambda_expr)
 "lambda templates are only available with "
 "%<-std=c++20%> or %<-std=gnu++20%>");
+  /* Even though the whole lambda may be a default argument, its
+template-parameter-list is a context where it's OK to create
+new parameters.  */
+  auto lvf = make_temp_override (parser->local_variables_forbidden_p, 0u);
+
 cp_lexer_consume_token (parser->lexer);
 template_param_list = cp_parser_template_parameter_list (parser);
@@ -20978,8 +20983,8 @@ cp_parser_placeholder_type_specifier (cp_parser 
*parser, location_t loc,
 /* In a default argument we may not be creating new parameters.  */
 if (parser->local_variables_forbidden_p & LOCAL_VARS_FORBIDDEN)
{
- /* If this assert turns out to be false, do error() instead.  */
- gcc_assert (tentative);
+ if (!tentative)
+   error_at (loc, "local variables may not appear in this context");


There's no local variable in the new testcase, the error should talk about a
concept-name.


Ah sure.  So like this?

Tested dg.exp.

-- >8 --
1) We're hitting the assert in cp_parser_placeholder_type_specifier.
It says that if it turns out to be false, we should do error() instead.
Do so, then.

2) lambda-targ8.C should compile fine, though.  The problem was that
local_variables_forbidden_p wasn't cleared when we're about to parse
the optional template-parameter-list for a lambda in a default argument.

PR c++/109859

gcc/cp/ChangeLog:

* parser.cc (cp_parser_lambda_declarator_opt): Temporarily clear
local_variables_forbidden_p.
(cp_parser_placeholder_type_specifier): Turn an assert into an error.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-defarg3.C: New test.
* g++.dg/cpp2a/lambda-targ8.C: New test.
---
  gcc/cp/parser.cc  |  9 +++--
  gcc/testsuite/g++.dg/cpp2a/concepts-defarg3.C |  8 
  gcc/testsuite/g++.dg/cpp2a/lambda-targ8.C | 10 ++
  3 files changed, 25 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-defarg3.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ8.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index f50534f5f39..a92e6a29ba6 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -11891,6 +11891,11 @@ cp_parser_lambda_declarator_opt (cp_parser* parser, 
tree lambda_expr)
 "lambda templates are only available with "
 "%<-std=c++20%> or %<-std=gnu++20%>");
  
+  /* Even though the whole lambda may be a default argument, its

+template-parameter-list is a context where it's OK to create
+new parameters.  */
+  auto lvf = make_temp_override (parser->local_variables_forbidden_p, 0u);
+
cp_lexer_consume_token (parser->lexer);
  
template_param_list = cp_parser_template_parameter_list (parser);

@@ -20989,8 +20994,8 @@ cp_parser_placeholder_type_specifier (cp_parser 
*parser, location_t loc,
/* In a default argument we may not be creating new parameters.  */
if (parser->local_variables_forbidden_p & LOCAL_VARS_FORBIDDEN)
{
- /* If this assert turns out to be false, do error() instead.  */
- gcc_assert (tentative);
+ if (!tentative)
+   error_at (loc, "concept-name may not appear in this context");


Hmm, actually I expect it can appear

Re: [PATCH 1/3] bpf: make sure CO-RE relocs are never typed with a BTF_KIND_CONST

2024-09-30 Thread David Faust




On 9/27/24 09:49, Cupertino Miranda wrote:
> Based on observation within bpf-next selftests and comparison of GCC
> and clang compiled code, the BPF loader expects all CO-RE relocations to
> point to BTF non const type nodes.
> ---
>  gcc/btfout.cc |  2 +-
>  gcc/config/bpf/btfext-out.cc  |  6 
>  gcc/ctfc.h|  2 ++
>  .../gcc.target/bpf/core-attr-const.c  | 32 +++
>  4 files changed, 41 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/bpf/core-attr-const.c
> 
> diff --git a/gcc/btfout.cc b/gcc/btfout.cc
> index 8b91bde8798..24f62ec1a52 100644
> --- a/gcc/btfout.cc
> +++ b/gcc/btfout.cc
> @@ -167,7 +167,7 @@ get_btf_kind (uint32_t ctf_kind)
>  
>  /* Convenience wrapper around get_btf_kind for the common case.  */
>  
> -static uint32_t
> +uint32_t
>  btf_dtd_kind (ctf_dtdef_ref dtd)
>  {
>if (!dtd)
> diff --git a/gcc/config/bpf/btfext-out.cc b/gcc/config/bpf/btfext-out.cc
> index 095c35b894b..655da23066d 100644
> --- a/gcc/config/bpf/btfext-out.cc
> +++ b/gcc/config/bpf/btfext-out.cc
> @@ -320,6 +320,12 @@ bpf_core_reloc_add (const tree type, const char * 
> section_name,
>ctf_container_ref ctfc = ctf_get_tu_ctfc ();
>ctf_dtdef_ref dtd = ctf_lookup_tree_type (ctfc, type);
>  
> +  /* Make sure CO-RE type is never the const version.  */
> +  if (btf_dtd_kind (dtd) == BTF_KIND_CONST

Hm, what about volatile and restrict? I would guess they are treated in
the same way as const by the kernel BPF loader, so probably we will have
to handle them here also. Please check.
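
A minimal sketch of what handling all three qualifiers could look like,
assuming a simplified type node.  struct toy_type, enum toy_kind and
strip_qualifiers are hypothetical stand-ins, not the real ctf_dtdef or BTF
definitions, and whether the loader really treats volatile and restrict like
const still needs to be confirmed.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the BTF kinds involved.  */
enum toy_kind
{
  TOY_KIND_STRUCT,
  TOY_KIND_CONST,
  TOY_KIND_VOLATILE,
  TOY_KIND_RESTRICT
};

/* Hypothetical, heavily simplified type node.  */
struct toy_type
{
  enum toy_kind kind;
  const struct toy_type *ref_type;   /* type being qualified, or NULL */
};

/* Walk past any chain of const/volatile/restrict nodes and return the
   first unqualified type (or the last node if the chain is dangling).  */
const struct toy_type *
strip_qualifiers (const struct toy_type *t)
{
  while (t != NULL
         && (t->kind == TOY_KIND_CONST
             || t->kind == TOY_KIND_VOLATILE
             || t->kind == TOY_KIND_RESTRICT)
         && t->ref_type != NULL)
    t = t->ref_type;
  return t;
}
```

A loop rather than a single dereference also covers stacked qualifiers such
as "const volatile".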

> +  && kind >= BPF_RELO_FIELD_BYTE_OFFSET
> +  && kind <= BPF_RELO_FIELD_RSHIFT_U64)
> +dtd = dtd->ref_type;
> +
>/* Buffer the access string in the auxiliary strtab.  */
>bpfcr->bpfcr_astr_off = 0;
>gcc_assert (accessor != NULL);
> diff --git a/gcc/ctfc.h b/gcc/ctfc.h
> index 41e1169f271..e5967f590f9 100644
> --- a/gcc/ctfc.h
> +++ b/gcc/ctfc.h
> @@ -465,4 +465,6 @@ extern void btf_mark_type_used (tree);
>  extern int ctfc_get_dtd_srcloc (ctf_dtdef_ref, ctf_srcloc_ref);
>  extern int ctfc_get_dvd_srcloc (ctf_dvdef_ref, ctf_srcloc_ref);
>  
> +extern uint32_t btf_dtd_kind (ctf_dtdef_ref dtd);
> +
>  #endif /* GCC_CTFC_H */
> diff --git a/gcc/testsuite/gcc.target/bpf/core-attr-const.c 
> b/gcc/testsuite/gcc.target/bpf/core-attr-const.c
> new file mode 100644
> index 000..34a4a9cc5e8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/bpf/core-attr-const.c
> @@ -0,0 +1,32 @@
> +/* Test to make sure CO-RE access relocs point to non const versions of the
> +   type.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O0 -dA -gbtf -mco-re -masm=normal" } */
> +
> +struct S {
> +  int a;
> +  int b;
> +  int c;
> +} __attribute__((preserve_access_index));
> +
> +void
> +func (struct S * s)
> +{
> +  int *x;
> +  int *y;
> +  const struct S *cs = s;
> +
> +  /* 0:2 */
> +  x = &(s->c);
> +
> +  /* 0:2 */
> +  y = (int *) &(cs->c);
> +
> +  *x = 4;
> +  *y = 4;
> +}
> +
> +/* Both const and non const struct type should have the same bpfcr_type. */
> +/* { dg-final { scan-assembler-times "0x1\t# bpfcr_type \\(struct S\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "0x1\t# bpfcr_type \\(const struct S\\)" 1 } } */

[PATCH] libstdc++: Workaround glibc header on ia64-linux

2024-09-30 Thread Frank Scheiner


We see:

```
FAIL: 17_intro/names.cc  -std=gnu++17 (test for excess errors)
FAIL: 17_intro/names_pstl.cc  -std=gnu++17 (test for excess errors)
FAIL: experimental/names.cc  -std=gnu++17 (test for excess errors)
```

...on ia64-linux.

This is due to:

* /usr/include/bits/sigcontext.h:32-38:
```
32 struct __ia64_fpreg
33   {
34 union
35   {
36 unsigned long bits[2];
37   } u;
38   } __attribute__ ((__aligned__ (16)));
```

* /usr/include/sys/ucontext.h:39-45:
```
 39 struct __ia64_fpreg_mcontext
 40   {
 41 union
 42   {
 43 unsigned long __ctx(bits)[2];
 44   } __ctx(u);
 45   } __attribute__ ((__aligned__ (16)));
```

...from glibc 2.39 (w/ia64 support re-added). See the discussion
starting on [1].

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654487.html

The following patch adds a workaround for this on the libstdc++
testsuite side.
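
For readers unfamiliar with the test, the mechanism can be sketched as
follows.  This assumes the test poisons short identifiers with macros along
the lines of "#define u (", which is my reading of names.cc rather than a
quote from it; struct toy_fpreg and first_bits are made-up names standing in
for the glibc structs shown above.

```c
/* Assumed form of the poisoning macro used by the test.  */
#define u (

/* The workaround: drop the macro before a header that legitimately
   uses `u` as a member name is seen.  */
#undef u

/* Stand-in for a system-header struct with a member named `u`.  */
struct toy_fpreg
{
  union
  {
    unsigned long bits[2];
  } u;
};

unsigned long
first_bits (const struct toy_fpreg *r)
{
  return r->u.bits[0];
}
```

Without the #undef, the member declaration would expand to a stray "(" and
fail to compile, which is what the excess errors in the test output reflect.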

Signed-off-by: Frank Scheiner 
---
 libstdc++-v3/testsuite/17_intro/names.cc | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/libstdc++-v3/testsuite/17_intro/names.cc
b/libstdc++-v3/testsuite/17_intro/names.cc
index 9b0ffcb50b2..b45aefe1ccf 100644
--- a/libstdc++-v3/testsuite/17_intro/names.cc
+++ b/libstdc++-v3/testsuite/17_intro/names.cc
@@ -265,6 +265,12 @@
 #undef j
 #endif

+#if defined (__linux__) && defined (__ia64__)
+// <bits/sigcontext.h> defines __ia64_fpreg::u
+// <sys/ucontext.h> defines __ia64_fpreg_mcontext::u
+#undef u
+#endif
+
 #if defined (__linux__) && defined (__powerpc__)
 //  defines __vector128::u
 #undef u
--
2.45.2

Re: [PATCH 2/2]AArch64: support encoding integer immediates using floating point moves

2024-09-30 Thread Richard Sandiford

Tamar Christina  writes:
> Hi All,
>
> This patch extends our immediate SIMD generation cases to support generating
> integer immediates using floating point operation if the integer immediate 
> maps
> to an exact FP value.
>
> As an example:
>
> uint32x4_t f1() {
> return vdupq_n_u32(0x3f800000);
> }
>
> currently generates:
>
> f1:
> adrp x0, .LC0
> ldr q0, [x0, #:lo12:.LC0]
> ret
>
> i.e. a load, but with this change:
>
> f1:
> fmov v0.4s, 1.0e+0
> ret
>
> Such immediates are common in e.g. our Math routines in glibc because they are
> created to extract or mark part of an FP immediate as masks.

I agree this is a good thing to do.  The current code is too beholden
to the original vector mode.  This patch relaxes it so that it isn't
beholden to the original mode's class (integer vs. float), but it would
still be beholden to the original mode's element size.

It looks like an alternative would be to remove:

  scalar_float_mode elt_float_mode;
  if (n_elts == 1
  && is_a <scalar_float_mode> (elt_mode, &elt_float_mode))
{
  rtx elt = CONST_VECTOR_ENCODED_ELT (op, 0);
  if (aarch64_float_const_zero_rtx_p (elt)
  || aarch64_float_const_representable_p (elt))
{
  if (info)
*info = simd_immediate_info (elt_float_mode, elt);
  return true;
}
}

and instead insert code:

  /* Get the repeating 8-byte value as an integer.  No endian correction
 is needed here because bytes is already in lsb-first order.  */
  unsigned HOST_WIDE_INT val64 = 0;
  for (unsigned int i = 0; i < 8; i++)
val64 |= ((unsigned HOST_WIDE_INT) bytes[i % nbytes]
  << (i * BITS_PER_UNIT));

---> here

  if (vec_flags & VEC_SVE_DATA)
return aarch64_sve_valid_immediate (val64, info);
  else
return aarch64_advsimd_valid_immediate (val64, info, which);

that tries to reduce val64 to the smallest repeating pattern,
then tries to interpret that pattern as a float.  The reduction step
could reuse the first part of aarch64_sve_valid_immediate, which
calculates the narrowest repeating integer mode:

  scalar_int_mode mode = DImode;
  unsigned int val32 = val64 & 0xffffffff;
  if (val32 == (val64 >> 32))
{
  mode = SImode;
  unsigned int val16 = val32 & 0xffff;
  if (val16 == (val32 >> 16))
{
  mode = HImode;
  unsigned int val8 = val16 & 0xff;
  if (val8 == (val16 >> 8))
mode = QImode;
}
}

This would give us the candidate integer mode, to which we could
apply float_mode_for_size (...).exists, as in the patch.
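
As a rough standalone illustration of the two steps (reduce to the narrowest
repeating pattern, then test whether that pattern is a float FMOV can
encode), consider the sketch below.  narrowest_repeat_bits and
fmov_f32_pattern_p are hypothetical names, and the bit test reflects my
understanding of the 8-bit FMOV immediate encoding for single precision; it
is not taken from the patch or from aarch64.cc.

```c
#include <stdint.h>

/* Return the width in bits (8, 16, 32 or 64) of the narrowest pattern
   that, when repeated, reproduces VAL64.  */
unsigned
narrowest_repeat_bits (uint64_t val64)
{
  uint32_t val32 = (uint32_t) (val64 & 0xffffffffu);
  if (val32 != (uint32_t) (val64 >> 32))
    return 64;

  uint16_t val16 = (uint16_t) (val32 & 0xffffu);
  if (val16 != (val32 >> 16))
    return 32;

  uint8_t val8 = (uint8_t) (val16 & 0xffu);
  if (val8 != (val16 >> 8))
    return 16;

  return 8;
}

/* Return nonzero if BITS, read as an IEEE single, has the shape an
   8-bit FMOV immediate expands to (assumed encoding): the low 19
   fraction bits are zero and exponent bits 30..25 are either 100000
   or 011111.  */
int
fmov_f32_pattern_p (uint32_t bits)
{
  uint32_t exp_hi = (bits >> 25) & 0x3fu;
  return (bits & 0x7ffffu) == 0 && (exp_hi == 0x20u || exp_hi == 0x1fu);
}
```

For the 0x3f800000 example from the patch description, the 64-bit repeat
reduces to a 32-bit pattern, which then passes the FMOV shape test as 1.0.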

In this case we would have the value as an integer, rather than
as an rtx, so I think it would make sense to split out the part of
aarch64_float_const_representable_p that processes the REAL_VALUE_TYPE.
aarch64_simd_valid_immediate could then use the patch's:

> +  long int as_long_ints[2];
> +  as_long_ints[0] = buf & 0xffffffff;
> +  as_long_ints[1] = (buf >> 32) & 0xffffffff;
> [...]
> +  real_from_target (&r, as_long_ints, fmode);

with "buf" being "val64" in the code above, and "fmode" being the result
of float_mode_for_size (...).exists.  aarch64_simd_valid_immediate
would then pass "r" and "fmode" to the new, split-out variant of
aarch64_float_const_representable_p.  (I haven't checked the endianness
requirements for real_from_target.)

The split-out variant would still perform the HFmode test in:

  if (GET_MODE (x) == VOIDmode
  || (GET_MODE (x) == HFmode && !TARGET_FP_F16INST))
return false;

The VOIDmode test is redundant and can be dropped.  AArch64 has always
been a CONST_WIDE_INT target.

If we do that, we should probably also pass the integer mode calculated
by the code quoted above down to aarch64_sve_valid_immediate (where it
came from) and aarch64_advsimd_valid_immediate, since both of them would
find it useful.  E.g.:

  /* Try using a replicated byte.  */
  if (which == AARCH64_CHECK_MOV
  && val16 == (val32 >> 16)
  && val8 == (val16 >> 8))
{
  if (info)
*info = simd_immediate_info (QImode, val8);
  return true;
}

would become:

  /* Try using a replicated byte.  */
  if (which == AARCH64_CHECK_MOV && mode == QImode)
{
  if (info)
*info = simd_immediate_info (QImode, val8);
  return true;
}

I realise that's quite a bit different from the patch as posted, sorry,
and I've made it sound more complicated than it actually is.  But I think
it should be both more general (because it ignores the element size as
well as the mode class) and a little simpler.

The proposed split of aarch64_float_const_representable_p would be
a replacement for patch 1 in the series.  The current rtx version
of aarch64_float_const_representable_p would not need to take a mode,
but the REAL_VALUE_TYPE interface would.

Thanks,
Richard

>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,

[PATCH] aarch64: Optimise calls to ldexp with SVE FSCALE instruction

2024-09-30 Thread Soumya AR

This patch uses the FSCALE instruction provided by SVE to implement the
standard ldexp family of functions.

Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the
following code:

float
test_ldexpf (float x, int i)
{
return __builtin_ldexpf (x, i);
}

double
test_ldexp (double x, int i)
{
return __builtin_ldexp(x, i);
}

GCC Output:

test_ldexpf:
b ldexpf

test_ldexp:
b ldexp

Since SVE has support for an FSCALE instruction, we can use this to process
scalar floats by moving them to a vector register and performing an fscale call,
similar to how LLVM tackles an ldexp builtin as well.

New Output:

test_ldexpf:
fmov s31, w0
ptrue p7.b, all
fscale z0.s, p7/m, z0.s, z31.s
ret

test_ldexp:
sxtw x0, w0
ptrue p7.b, all
fmov d31, x0
fscale z0.d, p7/m, z0.d, z31.d
ret
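
For reference, ldexp (x, i) returns x multiplied by 2 to the power i, which is the operation the FSCALE-based sequences above are meant to compute. A minimal host-side sketch follows; scale_by_pow2 is a hypothetical name, and the loop ignores overflow, underflow and subnormal rounding, which the real library function handles.

```cpp
#include <cmath>

// Hypothetical reference: multiply X by 2^I one doubling or halving
// at a time.  Each step is exact as long as the result stays within
// the normal range of double.
double
scale_by_pow2 (double x, int i)
{
  for (; i > 0; --i)
    x *= 2.0;
  for (; i < 0; ++i)
    x *= 0.5;
  return x;
}
```

For in-range inputs this agrees with std::ldexp, for example scale_by_pow2 (1.5, 4) and std::ldexp (1.5, 4) both give 24.0.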

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Soumya AR 

gcc/ChangeLog:

* config/aarch64/aarch64-sve.md
(ldexp3): Added a new pattern to match ldexp calls with scalar
floating modes and expand to the existing pattern for FSCALE.
(@aarch64_pred_): Extended the pattern to accept SVE
operands as well as scalar floating modes.

* config/aarch64/iterators.md:
SVE_FULL_F_SCALAR: Added an iterator to match all FP SVE modes as well
as SF and DF.
VPRED: Extended the attribute to handle GPF modes.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/fscale.c: New test.



0001-aarch64-Optimise-calls-to-ldexp-with-SVE-FSCALE-inst.patch

Patch ping Re: [PATCH] opts: Fix up regenerate-opt-urls dependencies

2024-09-30 Thread Jakub Jelinek

Hi!

On Sat, Sep 21, 2024 at 07:43:25PM +0200, Jakub Jelinek wrote:
> It seems that we currently require
> 1) enabling at least c,c++,fortran,d in --enable-languages
> 2) first doing make html
> before one can successfully regenerate-opt-urls, otherwise without 2)
> one gets
> make regenerate-opt-urls
> make: *** No rule to make target '/home/jakub/src/gcc/obj12x/gcc/HTML/gcc-15.0.0/gcc/Option-Index.html', needed by 'regenerate-opt-urls'.  Stop.
> or say if not configuring d after make html one still gets
> make regenerate-opt-urls
> make: *** No rule to make target '/home/jakub/src/gcc/obj12x/gcc/HTML/gcc-15.0.0/gdc/Option-Index.html', needed by 'regenerate-opt-urls'.  Stop.
> 
> Now, I believe neither 1) nor 2) is really necessary.
> The regenerate-opt-urls goal has dependency on 3 Option-Index.html files,
> but those files don't have dependencies how to generate them.
> make html has dependency on $(HTMLS_BUILD) which adds
> $(build_htmldir)/gcc/index.html and lang.html among other things, where
> the former actually builds not just index.html but also Option-Index.html
> and tons of other files, and lang.html is filled in by configure depending
> on configured languages, so sometimes will include gfortran.html and
> sometimes d.html.
> 
> The following patch adds dependencies of the Option-Index.html on their
> corresponding index.html files and that is all that seems to be needed,
> make regenerate-opt-urls then works even without prior make html and
> even if just a subset of c/c++, fortran and d is enabled.

I'd like to ping this patch.  Bootstrapped/regtested on x86_64-linux and
i686-linux several times (and tested that make regenerate-opt-urls just
works in that case even without make html or configuring in d,fortran).

> 2024-09-21  Jakub Jelinek  
> 
>   * Makefile.in ($(OPT_URLS_HTML_DEPS)): Add dependencies of the
>   Option-Index.html files on the corresponding index.html files.
>   Don't mention the requirement that all languages that have their own
>   HTML manuals to be enabled.
> 
> --- gcc/Makefile.in.jj2024-09-18 15:03:25.979207519 +0200
> +++ gcc/Makefile.in   2024-09-21 19:26:31.160949856 +0200
> @@ -3640,12 +3640,12 @@ $(build_htmldir)/gccinstall/index.html:
>   $(SHELL) $(srcdir)/doc/install.texi2html
>  
>  # Regenerate the .opt.urls files from the generated html, and from the .opt
> -# files.  Doing so requires all languages that have their own HTML manuals
> -# to be enabled.
> +# files.
>  .PHONY: regenerate-opt-urls
>  OPT_URLS_HTML_DEPS = $(build_htmldir)/gcc/Option-Index.html \
>   $(build_htmldir)/gdc/Option-Index.html \
>   $(build_htmldir)/gfortran/Option-Index.html
> +$(OPT_URLS_HTML_DEPS): %/Option-Index.html: %/index.html
>  
>  regenerate-opt-urls: $(srcdir)/regenerate-opt-urls.py $(OPT_URLS_HTML_DEPS)
>   $(srcdir)/regenerate-opt-urls.py $(build_htmldir) $(shell dirname 
> $(srcdir))

Jakub

Re: [RFC PATCH] More detailed diagnostics for section type conflicts

2024-09-30 Thread David Malcolm

On Mon, 2024-09-30 at 09:33 +0200, Florian Weimer wrote:
> * David Malcolm:
> 
> > I'm not quite sure what you mean by "non-error" and "non-
> > anchored". 
> 
> Sorry, I'm not familiar with the appropriate terminology.
> 
> > By "non-error", do you mean that this should be a warning?  If
> > so,
> > use warning_at.  You can use 0 for the option_id whilst
> > prototyping. 
> > Or use "inform" to get a note.
> 
> I meant that I want to attach something like a note to another error.
> There is just one error here (an ICE really, in case of PR116887). 
> If
> it's okay I can call the error function without a location multiple
> times to provide the additional information.  But the
> alternative below seems more appropriate.

To attach notes to another diagnostic, use auto_diagnostic_group, an
RAII class that, during its lifetime, puts all diagnostics into a group:

  {
    auto_diagnostic_group d;
    error_at (somewhere, "can't find %qs", "foo");
    inform (somewhere_else, "here's where I last remember seeing it");
  }

The effect isn't visible on the standard text output format, but is in
SARIF.
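
The grouping idea can be illustrated with a toy RAII guard. This is a sketch under assumptions and not GCC's implementation: toy_group_context and toy_group_guard are hypothetical names, and the "group id" here is just an integer attached to each recorded message.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a diagnostic context that records which
// group (if any) each message was emitted in.
struct toy_group_context
{
  int group_depth = 0;
  int next_group_id = 0;
  int current_group = -1;  // -1 means "not inside any group".
  std::vector<std::pair<int, std::string>> emitted;

  void emit (const std::string &msg)
  {
    emitted.push_back (std::make_pair (current_group, msg));
  }
};

// RAII guard: everything emitted during its lifetime shares a group.
// Nested guards join the outermost group.
class toy_group_guard
{
public:
  explicit toy_group_guard (toy_group_context &ctxt) : m_ctxt (ctxt)
  {
    if (m_ctxt.group_depth++ == 0)
      m_ctxt.current_group = m_ctxt.next_group_id++;
  }
  ~toy_group_guard ()
  {
    if (--m_ctxt.group_depth == 0)
      m_ctxt.current_group = -1;
  }
  toy_group_guard (const toy_group_guard &) = delete;
  toy_group_guard &operator= (const toy_group_guard &) = delete;

private:
  toy_group_context &m_ctxt;
};
```

An error and its note emitted while a guard is alive end up with the same group id, whereas a later, unrelated message is recorded outside any group. A structured output format can use that association even though plain text output shows no difference.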

> 
> > By "non-anchored", do you mean "not associated with any particular
> > source location"?  If so, use error_at/warning_at and use
> > UNKNOWN_LOCATION for the location_t ("error" and "warning"
> > implicitly
> > use the "input_location" global variable; it's usually best to
> > instead
> > specify a location, or use UNKNOWN_LOCATION for "global" problems).
> 
> Ahh, so use inform (UNKNOWN_LOCATION, …)?  I see a couple of examples
> like that.

Yes.

Using UNKNOWN_LOCATION will lead to output like:

  cc1: note: message goes here

which might be appropriate, but can be unhelpful to the user for
tracking down the problem.  Is there no source location that's
relevant?

I got the impression from Richi's comments that this might be more of a
GCC developer thing rather than an end-user thing, so perhaps using the
dumpfile might be more appropriate?  (I'm not sure)


Dave

> 
> I'll wait for further comments from Richi and repost.  There are also
> some test cases that need adjusting.
> 
> Thanks,
> Florian
>

Re: [PATCH] Fixup unaligned load/store cost for znver5

2024-09-30 Thread Jan Hubicka

> Currently unaligned YMM and ZMM load and store costs are cheaper than
> aligned which causes the vectorizer to purposely mis-align accesses
> by adding an alignment prologue.  It looks like the unaligned costs
> were simply copied from the bogus znver4 costs.  The following makes
> the unaligned costs equal to the aligned costs like in the fixed znver4
> version.
> 
> Pushed as obvious (matching the znver4 change).
> 
>   * config/i386/x86-tune-costs.h (znver5_cost): Update unaligned
>   load and store cost from the aligned costs.
Hi,
I backported this patch to the active branches (where Richi did not beat me to it).

Honza

[PATCH] tree-optimization/116566 - single lane SLP for VLA inductions

2024-09-30 Thread Richard Biener

The following adds SLP support for vectorizing single-lane inductions
with variable length vectors.

This is a WIP patch, local testing for SVE and riscv is fine but the
CI might discover issues.
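
As a rough picture of what a single-lane induction looks like once vectorized, the sketch below simulates it on the host. It is an illustration under assumptions, not the vectorizer's code: the function name is hypothetical, the lane count is a runtime value standing in for a variable-length vector, and one vector is assumed per iteration so that the per-iteration update equals the lane count times the step.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical simulation: return every lane value seen over ITERS
// vector iterations of the induction "x = init; x += step".
std::vector<long>
simulate_single_lane_iv (long init, long step,
                         std::size_t nunits, std::size_t iters)
{
  // Initial vector: { init, init + step, ..., init + (nunits-1)*step }.
  std::vector<long> vec (nunits);
  for (std::size_t lane = 0; lane < nunits; ++lane)
    vec[lane] = init + static_cast<long> (lane) * step;

  // Each vector iteration covers NUNITS scalar iterations, so every
  // lane advances by NUNITS * STEP.
  const long update = static_cast<long> (nunits) * step;

  std::vector<long> seen;
  for (std::size_t it = 0; it < iters; ++it)
    {
      seen.insert (seen.end (), vec.begin (), vec.end ());
      for (std::size_t lane = 0; lane < nunits; ++lane)
        vec[lane] += update;
    }
  return seen;
}
```

With init 0, step 3 and four lanes, two iterations produce 0, 3, 6, 9 and then 12, 15, 18, 21, which is the scalar sequence in order.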

PR tree-optimization/116566
* tree-vect-loop.cc (vectorizable_induction): Handle single-lane
SLP for VLA vectors.
---
 gcc/tree-vect-loop.cc | 192 ++
 1 file changed, 156 insertions(+), 36 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 0ce1bf8ebba..206c44226bd 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10282,7 +10282,6 @@ vectorizable_induction (loop_vec_info loop_vinfo,
   gimple *new_stmt;
   gphi *induction_phi;
   tree induc_def, vec_dest;
-  tree init_expr, step_expr;
   poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   unsigned i;
   tree expr;
@@ -10368,7 +10367,7 @@ vectorizable_induction (loop_vec_info loop_vinfo,
 iv_loop = loop;
   gcc_assert (iv_loop == (gimple_bb (phi))->loop_father);
 
-  if (slp_node && !nunits.is_constant ())
+  if (slp_node && (!nunits.is_constant () && SLP_TREE_LANES (slp_node) != 1))
 {
   /* The current SLP code creates the step value element-by-element.  */
   if (dump_enabled_p ())
@@ -10386,7 +10385,7 @@ vectorizable_induction (loop_vec_info loop_vinfo,
   return false;
 }
 
-  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (stmt_info);
+  tree step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (stmt_info);
   gcc_assert (step_expr != NULL_TREE);
   if (INTEGRAL_TYPE_P (TREE_TYPE (step_expr))
   && !type_has_mode_precision_p (TREE_TYPE (step_expr)))
@@ -10474,9 +10473,6 @@ vectorizable_induction (loop_vec_info loop_vinfo,
[i2 + 2*S2, i0 + 3*S0, i1 + 3*S1, i2 + 3*S2].  */
   if (slp_node)
 {
-  /* Enforced above.  */
-  unsigned int const_nunits = nunits.to_constant ();
-
   /* The initial values are vectorized, but any lanes > group_size
 need adjustment.  */
   slp_tree init_node
@@ -10498,11 +10494,12 @@ vectorizable_induction (loop_vec_info loop_vinfo,
 
   /* Now generate the IVs.  */
   unsigned nvects = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
-  gcc_assert ((const_nunits * nvects) % group_size == 0);
+  gcc_assert (multiple_p (nunits * nvects, group_size));
   unsigned nivs;
+  unsigned HOST_WIDE_INT const_nunits;
   if (nested_in_vect_loop)
nivs = nvects;
-  else
+  else if (nunits.is_constant (&const_nunits))
{
  /* Compute the number of distinct IVs we need.  First reduce
 group_size if it is a multiple of const_nunits so we get
@@ -10513,21 +10510,42 @@ vectorizable_induction (loop_vec_info loop_vinfo,
  nivs = least_common_multiple (group_sizep,
const_nunits) / const_nunits;
}
+  else
+   {
+ gcc_assert (SLP_TREE_LANES (slp_node) == 1);
+ nivs = 1;
+   }
+  gimple_seq init_stmts = NULL;
   tree stept = TREE_TYPE (step_vectype);
   tree lupdate_mul = NULL_TREE;
   if (!nested_in_vect_loop)
{
- /* The number of iterations covered in one vector iteration.  */
- unsigned lup_mul = (nvects * const_nunits) / group_size;
- lupdate_mul
-   = build_vector_from_val (step_vectype,
-SCALAR_FLOAT_TYPE_P (stept)
-? build_real_from_wide (stept, lup_mul,
-UNSIGNED)
-: build_int_cstu (stept, lup_mul));
+ if (nunits.is_constant ())
+   {
+ /* The number of iterations covered in one vector iteration.  */
+ unsigned lup_mul = (nvects * const_nunits) / group_size;
+ lupdate_mul
+   = build_vector_from_val (step_vectype,
+SCALAR_FLOAT_TYPE_P (stept)
+? build_real_from_wide (stept, lup_mul,
+UNSIGNED)
+: build_int_cstu (stept, lup_mul));
+   }
+ else
+   {
+ if (SCALAR_FLOAT_TYPE_P (stept))
+   {
+ tree tem = build_int_cst (integer_type_node, vf);
+ lupdate_mul = gimple_build (&init_stmts, FLOAT_EXPR, stept, tem);
+   }
+ else
+   lupdate_mul = build_int_cst (stept, vf);
+ lupdate_mul = gimple_build_vector_from_val (&init_stmts,
+ step_vectype,
+ lupdate_mul);
+   }
}
   tree peel_mul = NULL_TREE;
-  gimple_seq init_stmts = NULL;
   if (LOOP_VINFO_MASK_SKIP_NITERS (loop_vinfo))
{
  if (SCALAR_FLOAT_TYPE_P

[pushed: r15-3971] diagnostics: fix memory leak in SARIF selftests

2024-09-30 Thread David Malcolm

"make selftest-valgrind" was complaining about leaks of artifact objects
in SARIF's selftest::test_make_location_object:

-fself-test: 7638695 pass(es) in 89.999249 seconds
==3306525==
==3306525== HEAP SUMMARY:
==3306525== in use at exit: 1,215,639 bytes in 2,808 blocks
==3306525==   total heap usage: 2,860,898 allocs, 2,858,090 frees, 1,336,446,579 bytes allocated
==3306525==
==3306525== 11,728 (1,536 direct, 10,192 indirect) bytes in 16 blocks are definitely lost in loss record 353 of 375
==3306525==    at 0x514FE7D: operator new(unsigned long) (vg_replace_malloc.c:342)
==3306525==    by 0x36E5FD2: sarif_builder::get_or_create_artifact(char const*, diagnostic_artifact_role, bool) (diagnostic-format-sarif.cc:2884)
==3306525==    by 0x36E3D57: sarif_builder::maybe_make_physical_location_object(unsigned int, diagnostic_artifact_role, int, content_renderer const*) (diagnostic-format-sarif.cc:2097)
==3306525==    by 0x36E34CE: sarif_builder::make_location_object(sarif_location_manager&, rich_location const&, logical_location const*, diagnostic_artifact_role) (diagnostic-format-sarif.cc:1922)
==3306525==    by 0x36E72C6: selftest::test_make_location_object(selftest::line_table_case const&) (diagnostic-format-sarif.cc:3500)
==3306525==    by 0x375609B: selftest::for_each_line_table_case(void (*)(selftest::line_table_case const&)) (input.cc:3898)
==3306525==    by 0x36E9668: selftest::diagnostic_format_sarif_cc_tests() (diagnostic-format-sarif.cc:3910)
==3306525==    by 0x3592A11: selftest::run_tests() (selftest-run-tests.cc:100)
==3306525==    by 0x17DBEF3: toplev::run_self_tests() (toplev.cc:2268)
==3306525==    by 0x17DC2BF: toplev::main(int, char**) (toplev.cc:2376)
==3306525==    by 0x36A1919: main (main.cc:39)
==3306525==
==3306525== 12,400 (1,536 direct, 10,864 indirect) bytes in 16 blocks are definitely lost in loss record 355 of 375
==3306525==    at 0x514FE7D: operator new(unsigned long) (vg_replace_malloc.c:342)
==3306525==    by 0x36E5FD2: sarif_builder::get_or_create_artifact(char const*, diagnostic_artifact_role, bool) (diagnostic-format-sarif.cc:2884)
==3306525==    by 0x36E2323: sarif_builder::sarif_builder(diagnostic_context&, line_maps const*, char const*, bool) (diagnostic-format-sarif.cc:1500)
==3306525==    by 0x36E70AA: selftest::test_make_location_object(selftest::line_table_case const&) (diagnostic-format-sarif.cc:3469)
==3306525==    by 0x375609B: selftest::for_each_line_table_case(void (*)(selftest::line_table_case const&)) (input.cc:3898)
==3306525==    by 0x36E9668: selftest::diagnostic_format_sarif_cc_tests() (diagnostic-format-sarif.cc:3910)
==3306525==    by 0x3592A11: selftest::run_tests() (selftest-run-tests.cc:100)
==3306525==    by 0x17DBEF3: toplev::run_self_tests() (toplev.cc:2268)
==3306525==    by 0x17DC2BF: toplev::main(int, char**) (toplev.cc:2376)
==3306525==    by 0x36A1919: main (main.cc:39)
==3306525==
==3306525== LEAK SUMMARY:
==3306525==definitely lost: 3,072 bytes in 32 blocks
==3306525==indirectly lost: 21,056 bytes in 368 blocks
==3306525==  possibly lost: 0 bytes in 0 blocks
==3306525==still reachable: 1,191,511 bytes in 2,408 blocks
==3306525== suppressed: 0 bytes in 0 blocks
==3306525== Reachable blocks (those to which a pointer was found) are not shown.
==3306525== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==3306525==
==3306525== For lists of detected and suppressed errors, rerun with: -s
==3306525== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

Fixed thusly.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-3971-gab6c7a329d4958.
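
The shape of such a fix can be sketched with a toy owner class: a destructor that deletes whatever is still in the map, plus a hand-over step that empties the map so nothing is freed twice. This is a sketch under assumptions, not the code in diagnostic-format-sarif.cc; toy_artifact, toy_builder and take_all are hypothetical names, and a std::map stands in for GCC's ordered_hash_map.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical artifact type that counts live instances, so a leak
// shows up as a non-zero count.
struct toy_artifact
{
  static int live;
  toy_artifact () { ++live; }
  ~toy_artifact () { --live; }
};
int toy_artifact::live = 0;

class toy_builder
{
public:
  toy_builder () = default;
  toy_builder (const toy_builder &) = delete;
  toy_builder &operator= (const toy_builder &) = delete;

  // Clean up any artifacts that were never handed over.
  ~toy_builder ()
  {
    for (auto &kv : m_map)
      delete kv.second;
  }

  toy_artifact *get_or_create (const std::string &name)
  {
    toy_artifact *&slot = m_map[name];
    if (!slot)
      slot = new toy_artifact ();
    return slot;
  }

  // Transfer ownership of every artifact to the caller and empty the
  // map, so the destructor has nothing left to delete.
  std::vector<toy_artifact *> take_all ()
  {
    std::vector<toy_artifact *> out;
    for (auto &kv : m_map)
      out.push_back (kv.second);
    m_map.clear ();
    return out;
  }

private:
  std::map<std::string, toy_artifact *> m_map;
};
```

A code path that never calls the hand-over step (as some selftests do not) is then covered by the destructor, while the normal path leaves the map empty.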

gcc/ChangeLog:
* diagnostic-format-sarif.cc (sarif_builder::~sarif_builder): New,
deleting any remaining artifact objects.
(sarif_builder::make_run_object): Empty the artifact map.
* ordered-hash-map.h (ordered_hash_map::empty): New.

Signed-off-by: David Malcolm 
---
 gcc/diagnostic-format-sarif.cc | 14 ++
 gcc/ordered-hash-map.h |  2 ++
 2 files changed, 16 insertions(+)

diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc
index 6cd18cef6c89..7b11dfd89a31 100644
--- a/gcc/diagnostic-format-sarif.cc
+++ b/gcc/diagnostic-format-sarif.cc
@@ -649,6 +649,7 @@ public:
 const line_maps *line_maps,
 const char *main_input_filename_,
 bool formatted);
+  ~sarif_builder ();
 
   void on_report_diagnostic (const diagnostic_info &diagnostic,
 diagnostic_t orig_diag_kind);
@@ -1500,6 +1501,18 @@ sarif_builder::sarif_builder (diagnostic_context &context,
  false);
 }
 
+sarif_builder::~sarif_builder ()
+{
+  /* Normally m_filename_to_artifact_map will have been emptied as part
+ of make_run_object, but this isn't run by all the selftests.
+ Ensure the artifact objects are cleaned up for such cases.  */
+  for (auto iter : m_filename_to_

[pushed: r15-3978] diagnostics: return text buffer from test_show_locus [PR116613]

2024-09-30 Thread David Malcolm

As work towards supporting multiple diagnostic outputs (where each
output has its own pretty_printer), avoid referencing dc.m_printer
throughout the selftests of diagnostic-show-locus.cc.  Instead
have test_diagnostic_context::test_show_locus return the result
buffer, hiding the specifics of which printer is in use in such
test cases.

No functional change intended.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-3978-g9c14f9a9c19957.

gcc/ChangeLog:
PR other/116613
* diagnostic-show-locus.cc
(selftest::test_diagnostic_show_locus_unknown_location): Move call
to dc.test_show_locus into ASSERT_STREQ, and compare against its
result, rather than explicitly using dc.m_printer.
(selftest::test_one_liner_simple_caret): Likewise.
(selftest::test_one_liner_no_column): Likewise.
(selftest::test_one_liner_caret_and_range): Likewise.
(selftest::test_one_liner_multiple_carets_and_ranges): Likewise.
(selftest::test_one_liner_fixit_insert_before): Likewise.
(selftest::test_one_liner_fixit_insert_after): Likewise.
(selftest::test_one_liner_fixit_remove): Likewise.
(selftest::test_one_liner_fixit_replace): Likewise.
(selftest::test_one_liner_fixit_replace_non_equal_range):
Likewise.
(selftest::test_one_liner_fixit_replace_equal_secondary_range):
Likewise.
(selftest::test_one_liner_fixit_validation_adhoc_locations):
Likewise.
(selftest::test_one_liner_many_fixits_1): Likewise.
(selftest::test_one_liner_many_fixits_2): Likewise.
(selftest::test_one_liner_labels): Likewise.
(selftest::test_one_liner_simple_caret_utf8): Likewise.
(selftest::test_one_liner_multiple_carets_and_ranges_utf8):
Likewise.
(selftest::test_one_liner_fixit_insert_before_utf8): Likewise.
(selftest::test_one_liner_fixit_insert_after_utf8): Likewise.
(selftest::test_one_liner_fixit_remove_utf8): Likewise.
(selftest::test_one_liner_fixit_replace_utf8): Likewise.
(selftest::test_one_liner_fixit_replace_non_equal_range_utf8):
Likewise.
(selftest::test_one_liner_fixit_replace_equal_secondary_range_utf8):
Likewise.
(selftest::test_one_liner_fixit_validation_adhoc_locations_utf8):
Likewise.
(selftest::test_one_liner_many_fixits_1_utf8): Likewise.
(selftest::test_one_liner_many_fixits_2_utf8): Likewise.
(selftest::test_one_liner_labels_utf8): Likewise.
(selftest::test_one_liner_colorized_utf8): Likewise.
(selftest::test_add_location_if_nearby): Likewise.
(selftest::test_diagnostic_show_locus_fixit_lines): Likewise.
(selftest::test_overlapped_fixit_printing): Likewise.
(selftest::test_overlapped_fixit_printing_utf8): Likewise.
(selftest::test_overlapped_fixit_printing_2): Likewise.
(selftest::test_fixit_insert_containing_newline): Likewise.
(selftest::test_fixit_insert_containing_newline_2): Likewise.
(selftest::test_fixit_replace_containing_newline): Likewise.
(selftest::test_fixit_deletion_affecting_newline): Likewise.
(selftest::test_tab_expansion): Likewise.
(selftest::test_escaping_bytes_1): Likewise.
(selftest::test_escaping_bytes_2): Likewise.
(selftest::test_line_numbers_multiline_range): Likewise.
* selftest-diagnostic.cc
(selftest::test_diagnostic_context::test_show_locus): Return the
formatted text of m_printer.
* selftest-diagnostic.h
(selftest::test_diagnostic_context::test_show_locus): Convert
return type from void to const char *.

Signed-off-by: David Malcolm 
---
 gcc/diagnostic-show-locus.cc | 248 ---
 gcc/selftest-diagnostic.cc   |   6 +-
 gcc/selftest-diagnostic.h|   2 +-
 3 files changed, 88 insertions(+), 168 deletions(-)

diff --git a/gcc/diagnostic-show-locus.cc b/gcc/diagnostic-show-locus.cc
index b575dc51a78c..415de42cbc7b 100644
--- a/gcc/diagnostic-show-locus.cc
+++ b/gcc/diagnostic-show-locus.cc
@@ -3742,8 +3742,7 @@ test_diagnostic_show_locus_unknown_location ()
 {
   test_diagnostic_context dc;
   rich_location richloc (line_table, UNKNOWN_LOCATION);
-  dc.test_show_locus (richloc);
-  ASSERT_STREQ ("", pp_formatted_text (dc.m_printer));
+  ASSERT_STREQ ("", dc.test_show_locus (richloc));
 }
 
 /* Verify that diagnostic_show_locus works sanely for various
@@ -3764,10 +3763,9 @@ test_one_liner_simple_caret ()
   test_diagnostic_context dc;
   location_t caret = linemap_position_for_column (line_table, 10);
   rich_location richloc (line_table, caret);
-  dc.test_show_locus (richloc);
   ASSERT_STREQ (" foo = bar.field;\n"
"  ^\n",
-   pp_formatted_text (dc.m_printer));
+   dc.test_show_locus (

[pushed: r15-3975] diagnostics: avoid using diagnostic_context's m_printer [PR116613]

2024-09-30 Thread David Malcolm

As work towards supporting multiple diagnostic outputs (where each
output has its own pretty_printer), avoid using diagnostic_context's
m_printer field.  Instead, use the output format's printer.  Currently
this *is* the dc's printer, but eventually it might not be.
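
The pattern of configuring the output format's printer before handing the format to the context can be sketched as follows. The types here are hypothetical stand-ins rather than GCC's classes, and the sketch assumes the eventual arrangement in which each format owns its printer.

```cpp
#include <memory>
#include <utility>

// Hypothetical printer with a single setting.
struct toy_printer
{
  bool show_color = true;
};

// Hypothetical output format that owns its printer.
class toy_output_format
{
public:
  toy_printer &get_printer () { return m_printer; }

private:
  toy_printer m_printer;
};

// Hypothetical context that takes ownership of one output format.
class toy_diag_context
{
public:
  void set_output_format (std::unique_ptr<toy_output_format> fmt)
  {
    m_format = std::move (fmt);
  }
  toy_output_format *get_output_format () { return m_format.get (); }

private:
  std::unique_ptr<toy_output_format> m_format;
};

// Configure the format's own printer, then install the format.
void
init_plain_output (toy_diag_context &context,
                   std::unique_ptr<toy_output_format> fmt)
{
  fmt->get_printer ().show_color = false;
  context.set_output_format (std::move (fmt));
}
```

Because the setup code only touches the printer reached through the format, it keeps working if the context later holds several formats, each with a different printer.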

No functional change intended.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-3975-gcce52867d1892c.

gcc/ChangeLog:
PR other/116613
* diagnostic-format-json.cc (diagnostic_output_format_init_json):
Pass in the format.  Use the format's printer when disabling
colorization.  Move the call to set_output_format into here.
(diagnostic_output_format_init_json_stderr): Update for above
change.
(diagnostic_output_format_init_json_file): Likewise.
* diagnostic-format-sarif.cc
(diagnostic_output_format_init_sarif): Use the format's printer
when disabling colorization.
* diagnostic-path.cc (selftest::test_empty_path): Use the
text_output's printer.
(selftest::test_intraprocedural_path): Likewise.
(selftest::test_interprocedural_path_1): Likewise.
(selftest::test_interprocedural_path_2): Likewise.
(selftest::test_recursion): Likewise.
(selftest::test_control_flow_1): Likewise.
(selftest::test_control_flow_2): Likewise.
(selftest::test_control_flow_3): Likewise.
(selftest::assert_cfg_edge_path_streq): Likewise.
(selftest::test_control_flow_5): Likewise.
(selftest::test_control_flow_6): Likewise.

gcc/testsuite/ChangeLog:
PR other/116613
* gcc.dg/plugin/diagnostic_group_plugin.c
(test_output_format::on_begin_group): Use get_printer () rather
than accessing m_context.m_printer.
(test_output_format::on_end_group): Likewise.
* gcc.dg/plugin/diagnostic_plugin_xhtml_format.c
(xhtml_builder::m_printer): New field.
(xhtml_builder::xhtml_builder): Add "pp" param and use it to
initialize m_printer.
(xhtml_builder::on_report_diagnostic): Drop "context" param.
(xhtml_builder::make_element_for_diagnostic): Likewise.  Use
this->m_printer rather than the context's m_printer.  Pass
m_printer to call to diagnostic_show_locus.
(xhtml_builder::emit_diagram): Drop "context" param.
(xhtml_output_format::on_report_diagnostic): Drop context param
from call to m_builder.
(xhtml_output_format::on_diagram): Likewise.
(xhtml_output_format::xhtml_output_format): Pass result of
get_printer as printer for builder.
(diagnostic_output_format_init_xhtml): Use the fmt's printer
rather than the context's.

Signed-off-by: David Malcolm 
---
 gcc/diagnostic-format-json.cc | 23 +
 gcc/diagnostic-format-sarif.cc|  2 +-
 gcc/diagnostic-path.cc| 38 +++
 .../gcc.dg/plugin/diagnostic_group_plugin.c   | 12 +++--
 .../plugin/diagnostic_plugin_xhtml_format.c   | 47 +--
 5 files changed, 64 insertions(+), 58 deletions(-)

diff --git a/gcc/diagnostic-format-json.cc b/gcc/diagnostic-format-json.cc
index 448b6cb54eee..cf900a9ecba2 100644
--- a/gcc/diagnostic-format-json.cc
+++ b/gcc/diagnostic-format-json.cc
@@ -393,14 +393,17 @@ private:
to a file).  */
 
 static void
-diagnostic_output_format_init_json (diagnostic_context &context)
+diagnostic_output_format_init_json (diagnostic_context &context,
+   std::unique_ptr fmt)
 {
   /* Suppress normal textual path output.  */
   context.set_path_format (DPF_NONE);
 
   /* Don't colorize the text.  */
-  pp_show_color (context.m_printer) = false;
+  pp_show_color (fmt->get_printer ()) = false;
   context.set_show_highlight_colors (false);
+
+  context.set_output_format (fmt.release ());
 }
 
 /* Populate CONTEXT in preparation for JSON output to stderr.  */
@@ -409,9 +412,10 @@ void
 diagnostic_output_format_init_json_stderr (diagnostic_context &context,
   bool formatted)
 {
-  diagnostic_output_format_init_json (context);
-  context.set_output_format (new json_stderr_output_format (context,
-   formatted));
+  diagnostic_output_format_init_json
+(context,
+ ::make_unique (context,
+  formatted));
 }
 
 /* Populate CONTEXT in preparation for JSON output to a file named
@@ -422,10 +426,11 @@ diagnostic_output_format_init_json_file (diagnostic_context &context,
 bool formatted,
 const char *base_file_name)
 {
-  diagnostic_output_format_init_json (context);
-  context.set_output_format (new json_file_output_format (context,
- formatted,
-

[pushed: r15-3972] diagnostics: fix typo in XHTML output [PR116792]

2024-09-30 Thread David Malcolm

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-3972-g3286b6724ec1d0.

gcc/testsuite/ChangeLog:
PR other/116792
* gcc.dg/plugin/diagnostic_plugin_xhtml_format.c: Fix stray
reference to JSON.

Signed-off-by: David Malcolm 
---
 gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_xhtml_format.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_xhtml_format.c b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_xhtml_format.c
index 192288aff1bc..0f13e8d6d01a 100644
--- a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_xhtml_format.c
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_xhtml_format.c
@@ -751,7 +751,7 @@ diagnostic_output_format_init_xhtml_file (diagnostic_context &context,
 namespace selftest {
 
 /* A subclass of xhtml_output_format for writing selftests.
-   The JSON output is cached internally, rather than written
+   The XML output is cached internally, rather than written
out to a file.  */
 
 class test_xhtml_diagnostic_context : public test_diagnostic_context
-- 
2.26.3

[pushed: r15-3974] diagnostics: use "%e" to avoid intermediate strings [PR116613]

2024-09-30 Thread David Malcolm

Various diagnostics build an intermediate string, potentially with
colorization, and then use this in a diagnostic message.

This won't work if we have multiple diagnostic sinks, where some might
be colorized and some not.

This patch reworks such places using "%e" and pp_element subclasses, so
that any colorization happens within report_diagnostic's call to
pp_format.
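
The idea of deferring colorization until each sink formats the message can be sketched with a small element interface. This is an illustration only and does not use GCC's pp_element API: toy_sink, toy_element, quoted_list_element and report are hypothetical names, and the escape sequences are a stand-in for whatever markup a real printer would apply.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical sink: one output stream that may or may not colorize.
struct toy_sink
{
  bool colorize;
  std::string text;
};

// Hypothetical element: knows how to render itself into a given sink.
class toy_element
{
public:
  virtual ~toy_element () {}
  virtual void add_to (toy_sink &sink) const = 0;
};

// A comma-separated list of quoted names.  The colorization decision
// is taken per sink, at rendering time.
class quoted_list_element : public toy_element
{
public:
  explicit quoted_list_element (std::vector<std::string> items)
  : m_items (std::move (items))
  {
  }

  void add_to (toy_sink &sink) const override
  {
    bool first = true;
    for (const std::string &item : m_items)
      {
        if (!first)
          sink.text += ", ";
        first = false;
        if (sink.colorize)
          sink.text += "\033[1m";
        sink.text += "'" + item + "'";
        if (sink.colorize)
          sink.text += "\033[0m";
      }
  }

private:
  std::vector<std::string> m_items;
};

// Emit the same message to every sink, rendering the element afresh
// for each one instead of pre-building a single string.
void
report (const toy_element &elem, std::vector<toy_sink> &sinks)
{
  for (toy_sink &sink : sinks)
    {
      sink.text += "escaped: ";
      elem.add_to (sink);
    }
}
```

A pre-built string would have fixed one colorization choice for all sinks; passing the element lets a plain sink and a colorized sink each receive appropriate output from a single report call.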

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-3974-g3d3d20ccd83659.

gcc/analyzer/ChangeLog:
PR other/116613
* kf-analyzer.cc: Include "pretty-print-markup.h".
(kf_analyzer_dump_escaped::impl_call_pre): Defer colorization
choices by eliminating the construction of an intermediate string,
replacing it with a new pp_element subclass via "%e".

gcc/ChangeLog:
PR other/116613
* attribs.cc: Include "pretty-print-markup.h".
(decls_mismatched_attributes): Defer colorization choices by
replacing printing to a pretty_printer * param with appending
to a vec of strings.
(maybe_diag_alias_attributes): As above, replacing pretty_printer
with usage of pp_markup::comma_separated_quoted_strings and "%e"
in two places.
* attribs.h (decls_mismatched_attributes): Update decl.
* gimple-ssa-warn-access.cc: Include "pretty-print-markup.h".
(pass_waccess::maybe_warn_memmodel): Defer colorization choices by
replacing printing to a pretty_printer * param with use of
pp_markup::comma_separated_quoted_strings and "%e".
(pass_waccess::maybe_warn_memmodel): Likewise, replacing printing
to a temporary buffer.
* pretty-print-markup.h
(class pp_markup::comma_separated_quoted_strings): New.
* pretty-print.cc
(pp_markup::comma_separated_quoted_strings::add_to_phase_2): New.
(selftest::test_pp_printf_within_pp_element): New.
(selftest::test_comma_separated_quoted_strings): New.
(selftest::pretty_print_cc_tests): Call the new tests.

gcc/cp/ChangeLog:
PR other/116613
* pt.cc: Include "pretty-print-markup.h".
(warn_spec_missing_attributes): Defer colorization choices by
replacing printing to a pretty_printer * param with appending
to a vec of strings.  Replace pretty_printer with usage of
pp_markup::comma_separated_quoted_strings and "%e".

gcc/testsuite/ChangeLog:
PR other/116613
* c-c++-common/analyzer/escaping-1.c: Update expected results to
remove type information from C++ results.  Previously we were
using %qD with default_tree_printer, which used
lang_hooks.decl_printable_name, whereas now we're using %qD with
a clone of the cxx_pretty_printer.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/kf-analyzer.cc   | 42 ++---
 gcc/attribs.cc| 33 ---
 gcc/attribs.h |  2 +-
 gcc/cp/pt.cc  | 18 ++--
 gcc/gimple-ssa-warn-access.cc | 21 ++---
 gcc/pretty-print-markup.h | 17 
 gcc/pretty-print.cc   | 92 +++
 .../c-c++-common/analyzer/escaping-1.c|  9 +-
 8 files changed, 179 insertions(+), 55 deletions(-)

diff --git a/gcc/analyzer/kf-analyzer.cc b/gcc/analyzer/kf-analyzer.cc
index 26c2e41da6ff..da49baa5bff1 100644
--- a/gcc/analyzer/kf-analyzer.cc
+++ b/gcc/analyzer/kf-analyzer.cc
@@ -36,6 +36,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "analyzer/pending-diagnostic.h"
 #include "analyzer/call-details.h"
 #include "make-unique.h"
+#include "pretty-print-markup.h"
 
 #if ENABLE_ANALYZER
 
@@ -176,23 +177,40 @@ public:
probably most user-friendly.  */
 escaped_decls.qsort (cmp_decls_ptr_ptr);
 
-pretty_printer pp;
-pp_format_decoder (&pp) = default_tree_printer;
-pp_show_color (&pp) = pp_show_color (global_dc->m_printer);
-bool first = true;
-for (auto iter : escaped_decls)
+class escaped_list_element : public pp_element
+{
+public:
+  escaped_list_element (auto_vec &escaped_decls)
+  : m_escaped_decls (escaped_decls)
   {
-   if (first)
- first = false;
-   else
- pp_string (&pp, ", ");
-   pp_printf (&pp, "%qD", iter);
   }
+
+  void add_to_phase_2 (pp_markup::context &ctxt) final override
+  {
+   /* We can't call pp_printf directly on ctxt.m_pp from within
+  formatting.  As a workaround, work with a clone of the pp.  */
+   std::unique_ptr pp (ctxt.m_pp.clone ());
+   bool first = true;
+   for (auto iter : m_escaped_decls)
+ {
+   if (first)
+ first = false;
+   else
+ pp_string (pp.get (), ", ");
+   pp_printf (pp.get (), "%qD", iter);
+ }
+   pp_string (&ctxt.m_pp, pp_formatted_text (pp.get ()));
+  }
+
+private:
+

[pushed: r15-3973] diagnostics: add "dump" to pretty_printer and output_buffer

2024-09-30 Thread David Malcolm

No functional change intended.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-3973-g4c7a58ac2617e2.

gcc/ChangeLog:
* pretty-print.cc (output_buffer::dump): New.
(pretty_printer::dump): New.
* pretty-print.h (output_buffer::dump): New decls.
(pretty_printer::dump): New decls.

Signed-off-by: David Malcolm 
---
 gcc/pretty-print.cc | 23 +++
 gcc/pretty-print.h  |  6 ++
 2 files changed, 29 insertions(+)

diff --git a/gcc/pretty-print.cc b/gcc/pretty-print.cc
index 998e06e155f7..2b865212ac55 100644
--- a/gcc/pretty-print.cc
+++ b/gcc/pretty-print.cc
@@ -790,6 +790,21 @@ output_buffer::pop_formatted_chunks ()
   obstack_free (&m_chunk_obstack, old_top);
 }
 
+/* Dump state of this output_buffer to OUT, for debugging.  */
+
+void
+output_buffer::dump (FILE *out) const
+{
+  int depth = 0;
+  for (pp_formatted_chunks *iter = m_cur_formatted_chunks;
+   iter;
+   iter = iter->m_prev, depth++)
+{
+  fprintf (out, "pp_formatted_chunks: depth %i\n", depth);
+  iter->dump (out);
+}
+}
+
 #ifndef PTRDIFF_MAX
 #define PTRDIFF_MAX INTTYPE_MAXIMUM (ptrdiff_t)
 #endif
@@ -3013,6 +3028,14 @@ pretty_printer::end_url ()
 pp_string (this, get_end_url_string (this));
 }
 
+/* Dump state of this pretty_printer to OUT, for debugging.  */
+
+void
+pretty_printer::dump (FILE *out) const
+{
+  m_buffer->dump (out);
+}
+
 /* class pp_markup::context.  */
 
 void
diff --git a/gcc/pretty-print.h b/gcc/pretty-print.h
index b5ded5cdd5e0..ec64a167327b 100644
--- a/gcc/pretty-print.h
+++ b/gcc/pretty-print.h
@@ -93,6 +93,9 @@ public:
   pp_formatted_chunks *push_formatted_chunks ();
   void pop_formatted_chunks ();
 
+  void dump (FILE *out) const;
+  void DEBUG_FUNCTION dump () const { dump (stderr); }
+
   /* Obstack where the text is built up.  */
   struct obstack m_formatted_obstack;
 
@@ -313,6 +316,9 @@ public:
   void set_real_maximum_length ();
   int remaining_character_count_for_line ();
 
+  void dump (FILE *out) const;
+  void DEBUG_FUNCTION dump () const { dump (stderr); }
+
 private:
   /* Where we print external representation of ENTITY.  */
   output_buffer *m_buffer;
-- 
2.26.3

[pushed: r15-3976] diagnostics: isolate diagnostic_context with interface classes [PR116613]

2024-09-30 Thread David Malcolm

As work towards supporting multiple diagnostic outputs (where each
output has its own pretty_printer), avoid passing around
diagnostic_context to the various printing routines, so that we
can be more explicit about which pretty_printer is in use.

Introduce a set of "policy" classes that capture the parts of
diagnostic_context that are needed, and use these rather than
diagnostic_context *.  Pass around pretty_printer & rather than
taking value from context.  Split out the pretty_printer-using code
from class layout into a new class layout_printer, separating the
responsibilities of determining layout when quoting source versus
actually printing the source.

No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-3976-gbe02253af81034.

gcc/analyzer/ChangeLog:
PR other/116613
* program-point.cc (function_point::print_source_line): Replace
call to diagnostic_show_locus with a call to
diagnostic_source_print_policy::print.

gcc/ChangeLog:
PR other/116613
* diagnostic-format-json.cc (json_from_expanded_location): Replace
call to diagnostic_context::converted_column with call to
diagnostic_column_policy::converted_column.
* diagnostic-format-sarif.cc
(sarif_builder::make_location_object): Replace call to
diagnostic_show_locus with call to
diagnostic_source_print_policy::print.
* diagnostic-format-text.cc (get_location_text): Replace call to
diagnostic_context::get_location_text with call to
diagnostic_column_policy::get_location_text.
(diagnostic_text_output_format::report_current_module): Replace call
to diagnostic_context::converted_column with call to
diagnostic_column_policy::converted_column.
* diagnostic-format-text.h
(diagnostic_text_output_format::diagnostic_output_format):
Initialize m_column_policy.
(diagnostic_text_output_format::get_column_policy): New.
(diagnostic_text_output_format::m_column_policy): New.
* diagnostic-path.cc (class path_print_policy): New.
(event_range::maybe_add_event): Replace diagnostic_context param
with path_print_policy.
(event_range::print): Convert "pp" from * to &.  Convert first
param of start_span callback from diagnostic_context to
diagnostic_location_print_policy.
(path_summary::path_summary): Convert first param from
diagnostic_text_output_format to path_print_policy.  Add
colorize param.  Update for changes to
event_range::maybe_add_event.
(thread_event_printer::print_swimlane_for_event_range): Assert
that pp is non-null.  Update for change to event_range::print.
(diagnostic_text_output_format::print_path): Pass
path_print_policy to path_summary's ctor.
(selftest::test_empty_path): Likewise.
(selftest::test_intraprocedural_path): Likewise.
(selftest::test_interprocedural_path_1): Likewise.
(selftest::test_interprocedural_path_2): Likewise.
(selftest::test_recursion): Likewise.
(selftest::test_control_flow_1): Likewise.
(selftest::test_control_flow_2): Likewise.
(selftest::test_control_flow_3): Likewise.
(selftest::assert_cfg_edge_path_streq): Likewise.
(selftest::test_control_flow_5): Likewise.
(selftest::test_control_flow_6): Likewise.
* diagnostic-show-locus.cc (colorizer::set_range): Update for
change to m_pp.
(colorizer::m_pp): Convert from * to &.
(class layout): Add friend class layout_printer and move various
decls to it.
(layout::m_pp): Drop field.
(layout::m_policy): Rename to...
(layout::m_char_policy): ...this.
(layout::m_colorizer): Move field to class layout_printer.
(layout::m_diagnostic_path_p): Drop field.
(class layout_printer): New class, by refactoring class layout.
(colorizer::colorizer): Convert "pp" param from * to &.
(colorizer::set_named_color): Update for above change.
(colorizer::begin_state): Likewise.
(colorizer::finish_state): Likewise.
(make_policy): Rename to...
(make_char_policy): ...this, and update param from
diagnostic_context to diagnostic_source_print_policy.
(layout::layout): Update param from diagnostic_context to
diagnostic_source_print_policy.  Drop params "diagnostic_kind" and
"pp", moving these and other material to class layout_printer.
(layout::maybe_add_location_range): Update for renamed field.
(layout::print_gap_in_line_numbering): Convert to...
(layout_printer::print_gap_in_line_numbering): ...this.
(layout::calculate_x_offset_display): Update for renamed field.
(layout::print_source_line): Convert to...
(layout_printer::print_source_line): ...this.
(layout::p

[pushed: r15-3977] diagnostics: require callers of diagnostic_show_locus to be explicit about the printer [PR116613]

2024-09-30 Thread David Malcolm

As work towards supporting multiple diagnostic outputs (where each
output has its own pretty_printer), update diagnostic_show_locus
so that the pretty_printer must always be explicitly passed in.
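
As an illustration of the API change described above, here is a minimal, self-contained sketch. The types `printer` and `context` and the functions `show_locus_old` and `show_locus_new` are hypothetical stand-ins, not GCC's actual classes; the sketch only contrasts an optional pointer that silently falls back to the context's printer with a parameter that has to be supplied explicitly.

```cpp
#include <string>

// Hypothetical stand-ins for pretty_printer and diagnostic_context.
struct printer { std::string text; };
struct context { printer default_printer; };

// Old style: the printer is optional, and a null pointer silently
// selects the context's own printer.
inline void
show_locus_old (context &ctx, const std::string &msg, printer *pp = nullptr)
{
  if (!pp)
    pp = &ctx.default_printer;
  pp->text += msg;
}

// New style: the caller must name the printer, so it is always clear
// which output receives the text.
inline void
show_locus_new (context &, const std::string &msg, printer &pp)
{
  pp.text += msg;
}
```

With the second form, a caller that wants the context's own printer has to write that down at the call site, which is what the selftest updates in this patch do.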

No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-3977-ge7a8fbe2fed83b.

gcc/c-family/ChangeLog:
PR other/116613
* c-format.cc (selftest::test_type_mismatch_range_labels):
Explicitly pass in dc.m_printer to diagnostic_show_locus.

gcc/ChangeLog:
PR other/116613
* diagnostic-show-locus.cc (diagnostic_context::maybe_show_locus):
Convert param "pp" from * to &.  Drop logic for using the
context's m_printer when the param is null.
* diagnostic.h (diagnostic_context::maybe_show_locus): Convert
param "pp" from * to &.
(diagnostic_show_locus): Drop default "nullptr" value for pp
param.  Assert that it and context are nonnull.  Pass pp by
reference to maybe_show_locus.

gcc/testsuite/ChangeLog:
PR other/116613
* gcc.dg/plugin/expensive_selftests_plugin.c (test_richloc):
Explicitly pass in dc.m_printer to diagnostic_show_locus.

Signed-off-by: David Malcolm 
---
 gcc/c-family/c-format.cc | 2 +-
 gcc/diagnostic-show-locus.cc | 8 ++--
 gcc/diagnostic.h | 8 +---
 gcc/testsuite/gcc.dg/plugin/expensive_selftests_plugin.c | 2 +-
 4 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/gcc/c-family/c-format.cc b/gcc/c-family/c-format.cc
index 614b43266a31..f4a65a5019c3 100644
--- a/gcc/c-family/c-format.cc
+++ b/gcc/c-family/c-format.cc
@@ -5578,7 +5578,7 @@ test_type_mismatch_range_labels ()
   richloc.add_range (param, SHOW_RANGE_WITHOUT_CARET, &param_label);
 
   test_diagnostic_context dc;
-  diagnostic_show_locus (&dc, &richloc, DK_ERROR);
+  diagnostic_show_locus (&dc, &richloc, DK_ERROR, dc.m_printer);
   if (c_dialect_cxx ())
 /* "char*", without a space.  */
 ASSERT_STREQ ("   printf (\"msg: %i\\n\", msg);\n"
diff --git a/gcc/diagnostic-show-locus.cc b/gcc/diagnostic-show-locus.cc
index a1d66cf493d6..b575dc51a78c 100644
--- a/gcc/diagnostic-show-locus.cc
+++ b/gcc/diagnostic-show-locus.cc
@@ -3265,7 +3265,7 @@ add_location_if_nearby (const diagnostic_context &dc,
 void
 diagnostic_context::maybe_show_locus (const rich_location &richloc,
  diagnostic_t diagnostic_kind,
- pretty_printer *pp,
+ pretty_printer &pp,
  diagnostic_source_effect_info *effects)
 {
   const location_t loc = richloc.get_loc ();
@@ -3287,12 +3287,8 @@ diagnostic_context::maybe_show_locus (const rich_location &richloc,
 
   m_last_location = loc;
 
-  if (!pp)
-pp = m_printer;
-  gcc_assert (pp);
-
   diagnostic_source_print_policy source_policy (*this);
-  source_policy.print (*pp, richloc, diagnostic_kind, effects);
+  source_policy.print (pp, richloc, diagnostic_kind, effects);
 }
 
 diagnostic_source_print_policy::
diff --git a/gcc/diagnostic.h b/gcc/diagnostic.h
index 54b7f307f849..447e3b183d92 100644
--- a/gcc/diagnostic.h
+++ b/gcc/diagnostic.h
@@ -529,7 +529,7 @@ public:
 
   void maybe_show_locus (const rich_location &richloc,
 diagnostic_t diagnostic_kind,
-pretty_printer *pp,
+pretty_printer &pp,
 diagnostic_source_effect_info *effect_info);
 
   void emit_diagram (const diagnostic_diagram &diagram);
@@ -970,11 +970,13 @@ inline void
 diagnostic_show_locus (diagnostic_context *context,
   rich_location *richloc,
   diagnostic_t diagnostic_kind,
-  pretty_printer *pp = nullptr,
+  pretty_printer *pp,
   diagnostic_source_effect_info *effect_info = nullptr)
 {
+  gcc_assert (context);
   gcc_assert (richloc);
-  context->maybe_show_locus (*richloc, diagnostic_kind, pp, effect_info);
+  gcc_assert (pp);
+  context->maybe_show_locus (*richloc, diagnostic_kind, *pp, effect_info);
 }
 
 /* Because we read source files a second time after the frontend did it the
diff --git a/gcc/testsuite/gcc.dg/plugin/expensive_selftests_plugin.c b/gcc/testsuite/gcc.dg/plugin/expensive_selftests_plugin.c
index 3c534005a419..554dad6fa35a 100644
--- a/gcc/testsuite/gcc.dg/plugin/expensive_selftests_plugin.c
+++ b/gcc/testsuite/gcc.dg/plugin/expensive_selftests_plugin.c
@@ -48,7 +48,7 @@ test_richloc (rich_location *richloc)
 {
   /* Run the diagnostic and fix-it printing code.  */
   test_diagnostic_context dc;
-  diagnostic_show_locus (&dc, richloc, DK_ERROR);
+  diagnostic_show_locus (&dc, richloc, DK_ERROR, dc.m_printer);
 
   /* Generate a diff.  */
   edit_context ec (global_dc->get_file_cache ());
-- 
2.26.3

Re: [PATCH] aarch64: Optimise calls to ldexp with SVE FSCALE instruction

2024-09-30 Thread Saurabh Jha


Hi Soumya,

Thank you for the patch. Two clarifications:

In the instruction pattern's output string, why did you add the 'Z' 
prefix before operands? (%0 -> %Z0).


Also, maybe you can make your test cases more precise by specifying 
which functions generate which instructions. I don't have an SVE test 
off the top of my head but have a look at

/gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen.c
for example.

Regards,
Saurabh



On 9/30/2024 5:26 PM, Soumya AR wrote:

This patch uses the FSCALE instruction provided by SVE to implement the
standard ldexp family of functions.

Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the
following code:

float
test_ldexpf (float x, int i)
{
return __builtin_ldexpf (x, i);
}

double
test_ldexp (double x, int i)
{
return __builtin_ldexp(x, i);
}

GCC Output:

test_ldexpf:
b ldexpf

test_ldexp:
b ldexp

Since SVE has support for an FSCALE instruction, we can use this to process
scalar floats by moving them to a vector register and performing an fscale call,
similar to how LLVM tackles an ldexp builtin as well.

New Output:

test_ldexpf:
fmov s31, w0
ptrue p7.b, all
fscale z0.s, p7/m, z0.s, z31.s
ret

test_ldexp:
sxtw x0, w0
ptrue p7.b, all
fmov d31, x0
fscale z0.d, p7/m, z0.d, z31.d
ret
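
For readers less familiar with the function family, the following is a rough reference sketch of what ldexp computes, namely x * 2^i. The helper name `ldexp_reference` is made up for illustration, and the loop is only expected to agree with the library function while every intermediate result stays in the normal range (no overflow or subnormals); it says nothing about how FSCALE is implemented.

```cpp
#include <cmath>

// Reference behaviour of ldexp for in-range values: scale X by 2**I.
// Multiplying by 2.0 or 0.5 is exact as long as the result stays normal.
inline double
ldexp_reference (double x, int i)
{
  double r = x;
  for (; i > 0; --i)
    r *= 2.0;
  for (; i < 0; ++i)
    r *= 0.5;
  return r;
}
```

For example, `ldexp_reference (1.5, 4)` and `std::ldexp (1.5, 4)` both yield 24.0.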

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Soumya AR 

gcc/ChangeLog:

* config/aarch64/aarch64-sve.md
(ldexp3): Added a new pattern to match ldexp calls with scalar
floating modes and expand to the existing pattern for FSCALE.
(@aarch64_pred_): Extended the pattern to accept SVE
operands as well as scalar floating modes.

* config/aarch64/iterators.md:
SVE_FULL_F_SCALAR: Added an iterator to match all FP SVE modes as well
as SF and DF.
VPRED: Extended the attribute to handle GPF modes.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/fscale.c: New test.

RE: [PATCH]middle-end: check explicitly for external or constants when checking for loop invariant [PR116817]

2024-09-30 Thread Richard Biener

On Mon, 30 Sep 2024, Tamar Christina wrote:

> > > > Can you explain how you get to see constant/external defs with
> > a stmt_vec_info?  That's somehow a violation of some inherent invariant in the
> > vectorizer.
> > >
> > > I'm not sure I actually get any. It could be the condition is never hit
> > > with a stmt_vec_info. I had assumed however since the condition is part
> > > of a gimple_cond and if one of the arguments of the gimple_cond is loop
> > > bound, that the condition would be analyzed too.
> > >
> > > So if you're saying you never get a stmt_vec_info for invariants at this
> > > point (I assume you could see them in the corresponding slp
> > > tree) then maybe checking for the stmt_vec_info is enough.
> > >
> > > However, when I was looking around for how to check for externals I
> > > noticed other patterns also check for externals and constants. So I
> > > assumed that you could indeed get them.
> > 
> > You usually check that after doing vect_is_simple_use on the SSA name
> > or constant which internally makes all stmts with a stmt_vec_info
> > one of the internal def kinds.
> > 
> > So I guess you could do vect_is_simple_use on 'var' as well and check
> > the 'dt' it will populate
> > 
> 
> Ah I see, I did see it being called in some other patterns but wasn't sure
> what it was providing.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

OK.

Thanks,
Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/116817
>   * tree-vect-patterns.cc (vect_recog_bool_pattern): Check for const or
>   externals.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/116817
>   * g++.dg/vect/pr116817.cc: New test.
> 
> -- inline copy of patch --
> 
> diff --git a/gcc/testsuite/g++.dg/vect/pr116817.cc b/gcc/testsuite/g++.dg/vect/pr116817.cc
> new file mode 100644
> index 
> ..7e28982fb138c24f956aedb03fa454d9d858
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/vect/pr116817.cc
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O3" } */
> +
> +int main_ulData0;
> +unsigned *main_pSrcBuffer;
> +int main(void) {
> +  int iSrc = 0;
> +  bool bData0;
> +  for (; iSrc < 4; iSrc++) {
> +if (bData0)
> +  main_pSrcBuffer[iSrc] = main_ulData0;
> +else
> +  main_pSrcBuffer[iSrc] = 0;
> +bData0 = !bData0;
> +  }
> +}
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index e7e877dd2adb55262822f1660f8d92b42d44e6d0..b174ff1e705cec8e7bb414c760eb170ca98222cb 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -6062,12 +6062,15 @@ vect_recog_bool_pattern (vec_info *vinfo,
>if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
>   return NULL;
>  
> +  enum vect_def_type dt;
>if (check_bool_pattern (var, vinfo, bool_stmts))
>   var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
>else if (integer_type_for_mask (var, vinfo))
>   return NULL;
>else if (TREE_CODE (TREE_TYPE (var)) == BOOLEAN_TYPE
> -&& !vect_get_internal_def (vinfo, var))
> +&& vect_is_simple_use (var, vinfo, &dt)
> +&& (dt == vect_external_def
> +|| dt == vect_constant_def))
>   {
> /* If the condition is already a boolean then manually convert it to a
>mask of the given integer type but don't set a vectype.  */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

[PATCH 2/2]AArch64: support encoding integer immediates using floating point moves

2024-09-30 Thread Tamar Christina

Hi All,

This patch extends our immediate SIMD generation cases to support generating
integer immediates using floating point operation if the integer immediate maps
to an exact FP value.

As an example:

uint32x4_t f1() {
return vdupq_n_u32(0x3f800000);
}

currently generates:

f1:
adrp    x0, .LC0
ldr q0, [x0, #:lo12:.LC0]
ret

i.e. a load, but with this change:

f1:
fmov    v0.4s, 1.0e+0
ret

Such immediates are common in e.g. our Math routines in glibc because they are
created to extract or mark part of an FP immediate as masks.
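
To make "the integer immediate maps to an exact FP value" concrete, here is a small standalone sketch. It assumes a 32-bit IEEE-754 `float`, and the helper names `bits_to_float` and `float_to_bits` are illustrative only. It shows that the 32-bit pattern 0x3f800000 is the encoding of 1.0f, which is why a vector of that integer can be materialised with a floating-point move of 1.0.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit integer as a single-precision float.
// memcpy is the portable way to do this kind of type punning.
inline float
bits_to_float (std::uint32_t bits)
{
  static_assert (sizeof (float) == sizeof (std::uint32_t),
                 "sketch assumes a 32-bit float");
  float f;
  std::memcpy (&f, &bits, sizeof f);
  return f;
}

// The inverse direction: recover the raw encoding of a float.
inline std::uint32_t
float_to_bits (float f)
{
  std::uint32_t bits;
  std::memcpy (&bits, &f, sizeof bits);
  return bits;
}
```

Here `bits_to_float (0x3f800000u)` compares equal to `1.0f`, and `float_to_bits (1.0f)` gives back `0x3f800000u`.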

Bootstrapped Regtested on aarch64-none-linux-gnu and  issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_float_const_representable_p):
Add overload.
* config/aarch64/aarch64.cc (aarch64_float_const_zero_rtx_p): Reject
integer modes.
(aarch64_simd_valid_immediate, aarch64_float_const_representable_p):
Check if integer value maps to an exact FP constant.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/const_create_using_fmov.c: New test.

---
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 7a84acc59569da0b50af2300615db561a5de460a..6c683ea2d93e1b733cfe49fac38381ea6451fd55 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -974,6 +974,7 @@ void aarch64_split_simd_move (rtx, rtx);
 
 /* Check for a legitimate floating point constant for FMOV.  */
 bool aarch64_float_const_representable_p (rtx, machine_mode);
+bool aarch64_float_const_representable_p (rtx *, rtx, machine_mode);
 
 extern int aarch64_epilogue_uses (int);
 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 1842f6ecf6330f11a64545d0903240c89b104ffc..2d44608d93b8e7542ea8d5eb4c3f99c9f88e70ed 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -10991,7 +10991,8 @@ aarch64_float_const_zero_rtx_p (rtx x)
   /* 0.0 in Decimal Floating Point cannot be represented by #0 or
  zr as our callers expect, so no need to check the actual
  value if X is of Decimal Floating Point type.  */
-  if (GET_MODE_CLASS (GET_MODE (x)) == MODE_DECIMAL_FLOAT)
+  if (GET_MODE_CLASS (GET_MODE (x)) == MODE_DECIMAL_FLOAT
+  || !CONST_DOUBLE_P (x))
 return false;
 
   if (REAL_VALUE_MINUS_ZERO (*CONST_DOUBLE_REAL_VALUE (x)))
@@ -23026,17 +23027,30 @@ aarch64_simd_valid_immediate (rtx op, simd_immediate_info *info,
   else
 return false;
 
-  scalar_float_mode elt_float_mode;
-  if (n_elts == 1
-  && is_a <scalar_float_mode> (elt_mode, &elt_float_mode))
+  if (n_elts == 1)
 {
   rtx elt = CONST_VECTOR_ENCODED_ELT (op, 0);
+  rtx new_elt = NULL_RTX;
   if (aarch64_float_const_zero_rtx_p (elt)
- || aarch64_float_const_representable_p (elt, elt_mode))
-   {
- if (info)
-   *info = simd_immediate_info (elt_float_mode, elt);
- return true;
+ || aarch64_float_const_representable_p (&new_elt, elt, elt_mode))
+   {
+ scalar_float_mode elt_float_mode;
+ auto bitsize = GET_MODE_UNIT_BITSIZE (elt_mode);
+ if (is_a <scalar_float_mode> (elt_mode))
+   elt_float_mode = as_a <scalar_float_mode> (elt_mode);
+ else if (which == AARCH64_CHECK_MOV
+  && new_elt
+  && float_mode_for_size (bitsize).exists (&elt_float_mode))
+   elt = new_elt;
+ else
+   elt = NULL_RTX;
+
+ if (elt != NULL_RTX)
+   {
+ if (info)
+   *info = simd_immediate_info (elt_float_mode, elt);
+ return true;
+   }
}
 }
 
@@ -25121,8 +25135,22 @@ aarch64_c_mode_for_suffix (char suffix)
 
 /* Return true iff X with mode MODE can be represented by a quarter-precision
floating point immediate operand X.  Note, we cannot represent 0.0.  */
+
 bool
 aarch64_float_const_representable_p (rtx x, machine_mode mode)
+{
+  return aarch64_float_const_representable_p (NULL, x, mode);
+}
+
+
+/* Return true iff X with mode MODE can be represented by a quarter-precision
+   floating point immediate operand X.  Note, we cannot represent 0.0.
+   If the value is a CONST_INT that can be represented as an exact floating
+   point then OUT will contain the new floating point value to emit to generate
+   the integer constant.  */
+
+bool
+aarch64_float_const_representable_p (rtx *out, rtx x, machine_mode mode)
 {
   /* This represents our current view of how many bits
  make up the mantissa.  */
@@ -25134,14 +25162,45 @@ aarch64_float_const_representable_p (rtx x, machine_mode mode)
 
   x = unwrap_const_vec_duplicate (x);
   mode = GET_MODE_INNER (mode);
-  if (!CONST_DOUBLE_P (x))
+  if (!CONST_DOUBLE_P (x)
+  && !CONST_INT_P (x))
 return false;
 
   if (mode == VOIDmode
-  || (mode == HFmode && !TARGET_FP_F16INST))
+  || ((mode == HFmode || mode == HImode) && !TARGET_FP_F16INST))
 return false;

Re: [testcase] Fix absfloat16.c testcase

2024-09-30 Thread Jeff Law





On 9/29/24 10:46 PM, Kugan Vivekanandarajah wrote:

Hi,

This patch fixes the absfloat16.c testcase so that dg-add-options float16
appears in the correct order. Due to this mix-up, the test is failing for
some arm variants.

Is this OK for trunk?

OK
jeff

[PATCH] middle-end: Fix ifcvt predicate generation for masked function calls

2024-09-30 Thread Victor Do Nascimento

Up until now, due to a latent bug in the code for the ifcvt pass,
irrespective of the branch taken in a conditional statement, the
original condition for the if statement was used in masking the
function call.

Thus, for code such as:

  if (a[i] > limit)
b[i] = fixed_const;
  else
b[i] = fn (a[i]);

we would generate the following (wrong) if-converted tree code:

  _1 = a[i_1];
  _2 = _1 > limit;
  _3 = .MASK_CALL (fn, _1, _2);
  cstore_4 = _2 ? fixed_const : _3;

as opposed to the correct expected sequence:

  _1 = a[i_1];
  _2 = _1 > limit;
  _3 = ~_2;
  _4 = .MASK_CALL (fn, _1, _3);
  cstore_5 = _2 ? fixed_const : _4;

This patch ensures that the correct predicate mask generation is
carried out such that, upon autovectorization, the correct vector
lanes are selected in the vectorized function call.
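
The corrected sequence can be mimicked in plain scalar code. The sketch below is only a model: `fn` is a hypothetical stand-in for the called function, `mask_call` imitates the effect of .MASK_CALL on a single lane, and `if_converted` is an illustrative name. The point it shows is that the call in the else branch has to be guarded by the inverted condition, while the final select still uses the original one.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the function called in the else branch.
inline int fn (int x) { return x * 2; }

// Model of .MASK_CALL for one lane: only call fn when the lane is active.
inline int
mask_call (int arg, bool mask, int inactive = 0)
{
  return mask ? fn (arg) : inactive;
}

// Scalar model of the correct if-converted loop body.
template <std::size_t N>
std::array<int, N>
if_converted (const std::array<int, N> &a, int limit, int fixed_const)
{
  std::array<int, N> b{};
  for (std::size_t i = 0; i < N; ++i)
    {
      bool cond = a[i] > limit;            // _2 = _1 > limit
      bool mask = !cond;                   // _3 = ~_2
      int call = mask_call (a[i], mask);   // _4 = .MASK_CALL (fn, _1, _3)
      b[i] = cond ? fixed_const : call;    // cstore_5 = _2 ? fixed_const : _4
    }
  return b;
}
```

Passing `cond` instead of `mask` to `mask_call` reproduces the bug: the call would then be active exactly on the lanes whose result is discarded by the select.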

gcc/ChangeLog:

* tree-if-conv.cc (predicate_statements): Fix handling of
predicated function calls.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-fncall-mask.c: New.
---
 gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c | 31 
 gcc/tree-if-conv.cc  | 14 -
 2 files changed, 44 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c
new file mode 100644
index 000..554488e0630
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c
@@ -0,0 +1,31 @@
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-additional-options "-march=armv8.2-a+sve -fdump-tree-ifcvt-raw -Ofast" { target { aarch64*-*-* } } } */
+
+extern int __attribute__ ((simd, const)) fn (int);
+
+const int N = 20;
+const float lim = 101.0;
+const float cst =  -1.0;
+float tot =   0.0;
+
+float b[20];
+float a[20] = { [0 ... 9] = 1.7014118e39, /* If branch. */
+   [10 ... 19] = 100.0 };/* Else branch.  */
+
+int main (void)
+{
+  #pragma omp simd
+  for (int i = 0; i < N; i += 1)
+{
+  if (a[i] > lim)
+   b[i] = cst;
+  else
+   b[i] = fn (a[i]);
+  tot += b[i];
+}
+  return (0);
+}
+
+/* { dg-final { scan-tree-dump {gimple_assign } ifcvt } } */
+/* { dg-final { scan-tree-dump {gimple_assign } ifcvt } } */
+/* { dg-final { scan-tree-dump {gimple_call <.MASK_CALL, _3, fn, _2, _34>} ifcvt } } */
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 0346a1376c5..246a6bb5bd1 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -2907,6 +2907,8 @@ predicate_statements (loop_p loop)
 This will cause the vectorizer to match the "in branch"
 clone variants, and serves to build the mask vector
 in a natural way.  */
+ tree mask = cond;
+ gimple_seq stmts = NULL;
  gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi));
  tree orig_fn = gimple_call_fn (call);
  int orig_nargs = gimple_call_num_args (call);
@@ -2914,7 +2916,17 @@ predicate_statements (loop_p loop)
  args.safe_push (orig_fn);
  for (int i = 0; i < orig_nargs; i++)
args.safe_push (gimple_call_arg (call, i));
- args.safe_push (cond);
+ /* If `swap', we invert the mask used for the if branch for use
+when masking the function call.  */
+ if (swap)
+   {
+ tree true_val
+   = constant_boolean_node (true, TREE_TYPE (mask));
+ mask = gimple_build (&stmts, BIT_XOR_EXPR,
+  TREE_TYPE (mask), mask, true_val);
+   }
+ gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);
+ args.safe_push (mask);
 
  /* Replace the call with a IFN_MASK_CALL that has the extra
 condition parameter. */
-- 
2.34.1

Re: [PATCH 2/3] bpf: calls do not promote attr access_index on lhs

2024-09-30 Thread David Faust




On 9/27/24 09:49, Cupertino Miranda wrote:
> When traversing gimple to introduce CO-RE relocation entries to
> expressions that are accesses to attributed preserve_access_index types,
> the access is likely to be split in multiple gimple statements.
> In order to keep doing the proper CO-RE conversion we will need to mark
> the LHS tree nodes of gimple expressions as explicit CO-RE accesses,
> such that the gimple traverser will further convert the sub-expressions.
> 
> This patch makes sure that this LHS marking will not happen in case the
> gimple statement is a function call, in which case it is no longer
> expecting to keep generating CO-RE accesses with the remainder of the
> expression.

OK, LGTM.
Thanks!

> ---
>  gcc/config/bpf/core-builtins.cc   |  1 +
>  .../gcc.target/bpf/core-attr-calls.c  | 49 +++
>  2 files changed, 50 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/bpf/core-attr-calls.c
> 
> diff --git a/gcc/config/bpf/core-builtins.cc b/gcc/config/bpf/core-builtins.cc
> index 86e2e9d6e39..cdfb356660e 100644
> --- a/gcc/config/bpf/core-builtins.cc
> +++ b/gcc/config/bpf/core-builtins.cc
> @@ -1822,6 +1822,7 @@ make_gimple_core_safe_access_index (tree *tp,
>  
>    tree lhs;
>    if (!wi->is_lhs
> +   && gimple_code (wi->stmt) != GIMPLE_CALL
> && (lhs = gimple_get_lhs (wi->stmt)) != NULL_TREE)
>   core_mark_as_access_index (lhs);
>  }
> diff --git a/gcc/testsuite/gcc.target/bpf/core-attr-calls.c b/gcc/testsuite/gcc.target/bpf/core-attr-calls.c
> new file mode 100644
> index 000..87290c5c211
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/bpf/core-attr-calls.c
> @@ -0,0 +1,49 @@
> +/* Test for BPF CO-RE __attribute__((preserve_access_index)) with accesses on
> +   LHS and both LHS and RHS of assignment with calls involved.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -dA -gbtf -mco-re -masm=normal" } */
> +
> +struct U {
> +  int c;
> +  struct V {
> +int d;
> +int e[4];
> +int f;
> +int *g;
> +  } v;
> +};
> +
> +struct T {
> +  int a;
> +  int b;
> +  struct U u;
> +  struct U *ptr_u;
> +  struct U *array_u;
> +} __attribute__((preserve_access_index));
> +
> +extern struct U *get_other_u(struct U *);
> +extern struct V *get_other_v(struct V *);
> +
> +void
> +func (struct T *t, int i)
> +{
> +  /* Since we are using the builtin all accesses are converted to CO-RE.  */
> +  /* 0:30:0   */
> +  __builtin_preserve_access_index(({ get_other_u(t->ptr_u)->c = 42; }));
> +
> +  /* This should not pass-through CO-RE accesses beyond the call since struct U
> + is not explicitly marked with preserve_access_index. */
> +  /* 0:3  */
> +  get_other_u(t->ptr_u)->c = 43;
> +
> +  /* 0:2:1  */
> +  get_other_v(&t->u.v)->d = 44;
> +}
> +
> +/* { dg-final { scan-assembler-times "bpfcr_astr_off \\(\"0:3\"\\)" 2 } } */
> +/* { dg-final { scan-assembler-times "bpfcr_astr_off \\(\"0:0\"\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "bpfcr_astr_off \\(\"0:2:1\"\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "bpfcr_type \\(struct T\\)" 3 } } */
> +/* { dg-final { scan-assembler-times "bpfcr_type \\(struct U\\)" 1 } } */
> +

RE: [PATCH 2/2]AArch64: support encoding integer immediates using floating point moves

2024-09-30 Thread Tamar Christina

Thanks for the review,
Will get started on it but one question...

> -Original Message-
> From: Richard Sandiford 
> Sent: Monday, September 30, 2024 6:33 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; ktkac...@gcc.gnu.org
> Subject: Re: [PATCH 2/2]AArch64: support encoding integer immediates using
> floating point moves
> 
> Tamar Christina  writes:
> > Hi All,
> >
> > This patch extends our immediate SIMD generation cases to support generating
> > integer immediates using floating point operation if the integer immediate
> > maps to an exact FP value.
> >
> > As an example:
> >
> > uint32x4_t f1() {
> > return vdupq_n_u32(0x3f800000);
> > }
> >
> > currently generates:
> >
> > f1:
> > adrp    x0, .LC0
> > ldr q0, [x0, #:lo12:.LC0]
> > ret
> >
> > i.e. a load, but with this change:
> >
> > f1:
> > fmov    v0.4s, 1.0e+0
> > ret
> >
> > Such immediates are common in e.g. our Math routines in glibc because
> > they are created to extract or mark part of an FP immediate as masks.
> 
> I agree this is a good thing to do.  The current code is too beholden
> to the original vector mode.  This patch relaxes it so that it isn't
> beholden to the original mode's class (integer vs. float), but it would
> still be beholden to the original mode's element size.
> 
> It looks like an alternative would be to remove:
> 
>   scalar_float_mode elt_float_mode;
>   if (n_elts == 1
>   && is_a <scalar_float_mode> (elt_mode, &elt_float_mode))
> {
>   rtx elt = CONST_VECTOR_ENCODED_ELT (op, 0);
>   if (aarch64_float_const_zero_rtx_p (elt)
> || aarch64_float_const_representable_p (elt))
>   {
> if (info)
>   *info = simd_immediate_info (elt_float_mode, elt);
> return true;
>   }
> }
> 
> and instead insert code:
> 
>   /* Get the repeating 8-byte value as an integer.  No endian correction
>  is needed here because bytes is already in lsb-first order.  */
>   unsigned HOST_WIDE_INT val64 = 0;
>   for (unsigned int i = 0; i < 8; i++)
> val64 |= ((unsigned HOST_WIDE_INT) bytes[i % nbytes]
> << (i * BITS_PER_UNIT));
> 
> ---> here
> 
>   if (vec_flags & VEC_SVE_DATA)
> return aarch64_sve_valid_immediate (val64, info);
>   else
> return aarch64_advsimd_valid_immediate (val64, info, which);
> 
> that tries to reduce val64 to the smallest repeating pattern,
> then tries to interpret that pattern as a float.  The reduction step
> could reuse the first part of aarch64_sve_valid_immediate, which
> calculates the narrowest repeating integer mode:
> 
>   scalar_int_mode mode = DImode;
>   unsigned int val32 = val64 & 0xffffffff;
>   if (val32 == (val64 >> 32))
> {
>   mode = SImode;
>   unsigned int val16 = val32 & 0xffff;
>   if (val16 == (val32 >> 16))
>   {
> mode = HImode;
> unsigned int val8 = val16 & 0xff;
> if (val8 == (val16 >> 8))
>   mode = QImode;
>   }
> }
> 
> This would give us the candidate integer mode, to which we could
> apply float_mode_for_size (...).exists, as in the patch.
> 
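
The reduction step suggested above can be tried out in isolation. The following standalone sketch uses a made-up helper name, `narrowest_repeating_bits`, and returns a bit width rather than a GCC machine mode; it follows the same nested comparison as the quoted code to find the narrowest pattern that repeats to fill the 64-bit value.

```cpp
#include <cstdint>

// Return the width in bits (8, 16, 32 or 64) of the narrowest pattern
// that, when repeated, reproduces VAL64.
inline unsigned
narrowest_repeating_bits (std::uint64_t val64)
{
  std::uint32_t val32 = val64 & 0xffffffffu;
  if (val32 != (val64 >> 32))
    return 64;   // halves differ: nothing narrower than 64 bits repeats

  std::uint16_t val16 = val32 & 0xffffu;
  if (val16 != (val32 >> 16))
    return 32;   // 32-bit pattern repeats, 16-bit does not

  std::uint8_t val8 = val16 & 0xffu;
  if (val8 != (val16 >> 8))
    return 16;   // 16-bit pattern repeats, 8-bit does not

  return 8;      // a single byte repeats throughout
}
```

For the constant in the original example, the replicated value 0x3f8000003f800000 reduces to a 32-bit pattern, which is the width at which a float interpretation (1.0f) would then be attempted.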

I was doubting whether it's safe to use this or not.  That's why I listed
the modes using a switch statement.  Namely I'm concerned about the
multiple float16 formats.  From looking at the source of
float_mode_for_size, it seems to just return the first float mode, which
makes it pretty sensitive to the order of definition in
aarch64/aarch64-modes.def.

Is it safe to assume that storage only formats like BF16 will always be
listed after general compute types?

Thanks,
Tamar

> In this case we would have the value as an integer, rather than
> as an rtx, so I think it would make sense to split out the part of
> aarch64_float_const_representable_p that processes the REAL_VALUE_TYPE.
> aarch64_simd_valid_immediate could then use the patch's:
> 
> > +  long int as_long_ints[2];
> > +  as_long_ints[0] = buf & 0xffffffff;
> > +  as_long_ints[1] = (buf >> 32) & 0xffffffff;
> > [...]
> > +  real_from_target (&r, as_long_ints, fmode);
> 
> with "buf" being "val64" in the code above, and "fmode" being the result
> of float_mode_for_size (...).exists.  aarch64_simd_valid_immediate
> would then pass "r" and "fmode" to the new, split-out variant of
> aarch64_float_const_representable_p.  (I haven't checked the endianness
> requirements for real_from_target.)
> 
> The split-out variant would still perform the HFmode test in:
> 
>   if (GET_MODE (x) == VOIDmode
>   || (GET_MODE (x) == HFmode && !TARGET_FP_F16INST))
> return false;
> 
> The VOIDmode test is redundant and can be dropped.  AArch64 has always
> been a CONST_WIDE_INT target.
> 
> If we do that, we should probably also pass the integer mode calculated
> by the code quoted above down to aarch64_sve_valid_immediate (where it
> came from) and aarch64_advsimd_valid_immediate, since both of them would
> find it useful.  E.g.:

[PATCH v3] c++: concept in default argument [PR109859]

2024-09-30 Thread Marek Polacek

On Mon, Sep 30, 2024 at 10:53:04AM -0400, Jason Merrill wrote:
> On 9/27/24 5:30 PM, Marek Polacek wrote:
> > On Fri, Sep 27, 2024 at 04:57:58PM -0400, Jason Merrill wrote:
> > > On 9/18/24 5:06 PM, Marek Polacek wrote:
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > 
> > > > -- >8 --
> > > > 1) We're hitting the assert in cp_parser_placeholder_type_specifier.
> > > > It says that if it turns out to be false, we should do error() instead.
> > > > Do so, then.
> > > > 
> > > > 2) lambda-targ8.C should compile fine, though.  The problem was that
> > > > local_variables_forbidden_p wasn't cleared when we're about to parse
> > > > the optional template-parameter-list for a lambda in a default argument.
> > > > 
> > > > PR c++/109859
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > > * parser.cc (cp_parser_lambda_declarator_opt): Temporarily clear
> > > > local_variables_forbidden_p.
> > > > (cp_parser_placeholder_type_specifier): Turn an assert into an 
> > > > error.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > > 
> > > > * g++.dg/cpp2a/concepts-defarg3.C: New test.
> > > > * g++.dg/cpp2a/lambda-targ8.C: New test.
> > > > ---
> > > >gcc/cp/parser.cc  |  9 +++--
> > > >gcc/testsuite/g++.dg/cpp2a/concepts-defarg3.C |  8 
> > > >gcc/testsuite/g++.dg/cpp2a/lambda-targ8.C | 10 ++
> > > >3 files changed, 25 insertions(+), 2 deletions(-)
> > > >create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-defarg3.C
> > > >create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ8.C
> > > > 
> > > > diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> > > > index 4dd9474cf60..bdc4fef243a 100644
> > > > --- a/gcc/cp/parser.cc
> > > > +++ b/gcc/cp/parser.cc
> > > > @@ -11891,6 +11891,11 @@ cp_parser_lambda_declarator_opt (cp_parser* 
> > > > parser, tree lambda_expr)
> > > >  "lambda templates are only available with "
> > > >  "%<-std=c++20%> or %<-std=gnu++20%>");
> > > > +  /* Even though the whole lambda may be a default argument, its
> > > > +template-parameter-list is a context where it's OK to create
> > > > +new parameters.  */
> > > > +  auto lvf = make_temp_override 
> > > > (parser->local_variables_forbidden_p, 0u);
> > > > +
> > > >  cp_lexer_consume_token (parser->lexer);
> > > >  template_param_list = cp_parser_template_parameter_list 
> > > > (parser);
> > > > @@ -20978,8 +20983,8 @@ cp_parser_placeholder_type_specifier (cp_parser 
> > > > *parser, location_t loc,
> > > >  /* In a default argument we may not be creating new 
> > > > parameters.  */
> > > >  if (parser->local_variables_forbidden_p & LOCAL_VARS_FORBIDDEN)
> > > > {
> > > > - /* If this assert turns out to be false, do error() instead.  
> > > > */
> > > > - gcc_assert (tentative);
> > > > + if (!tentative)
> > > > +   error_at (loc, "local variables may not appear in this 
> > > > context");
> > > 
> > > There's no local variable in the new testcase, the error should talk 
> > > about a
> > > concept-name.
> > 
> > Ah sure.  So like this?
> > 
> > Tested dg.exp.
> > 
> > -- >8 --
> > 1) We're hitting the assert in cp_parser_placeholder_type_specifier.
> > It says that if it turns out to be false, we should do error() instead.
> > Do so, then.
> > 
> > 2) lambda-targ8.C should compile fine, though.  The problem was that
> > local_variables_forbidden_p wasn't cleared when we're about to parse
> > the optional template-parameter-list for a lambda in a default argument.
> > 
> > PR c++/109859
> > 
> > gcc/cp/ChangeLog:
> > 
> > * parser.cc (cp_parser_lambda_declarator_opt): Temporarily clear
> > local_variables_forbidden_p.
> > (cp_parser_placeholder_type_specifier): Turn an assert into an error.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp2a/concepts-defarg3.C: New test.
> > * g++.dg/cpp2a/lambda-targ8.C: New test.
> > ---
> >   gcc/cp/parser.cc  |  9 +++--
> >   gcc/testsuite/g++.dg/cpp2a/concepts-defarg3.C |  8 
> >   gcc/testsuite/g++.dg/cpp2a/lambda-targ8.C | 10 ++
> >   3 files changed, 25 insertions(+), 2 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-defarg3.C
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ8.C
> > 
> > diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> > index f50534f5f39..a92e6a29ba6 100644
> > --- a/gcc/cp/parser.cc
> > +++ b/gcc/cp/parser.cc
> > @@ -11891,6 +11891,11 @@ cp_parser_lambda_declarator_opt (cp_parser* 
> > parser, tree lambda_expr)
> >  "lambda templates are only available with "
> >  "%<-std=c++20%> or %<-std=gnu++20%>");
> > +  /* Even though the whole lambda may be a default argument, its
> > +template-parameter-list is a context where it's OK to create
> > +new param

Re: [PATCH 3/3] bpf: set index entry for a VAR_DECL in CO-RE relocs

2024-09-30 Thread David Faust




On 9/27/24 09:49, Cupertino Miranda wrote:
> CO-RE accesses with non-pointer struct variables will also generate a
> "0" string access within the CO-RE relocation.
> The first index within the access string has a somewhat different
> meaning than the remaining indexes.
> For i0:i1:...:in being an access index for a "struct A a" declaration, its
> semantics are represented by:
>   (&a + (sizeof(struct A) * i0) + offsetof(i1:...:in))

I can guess the answer, but is this semantic actually documented anywhere?

We may want to see about adding this in the "official" kernel BTF docs
since IMO this special meaning of the first index is not at all obvious.
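
To make the quoted first-index semantics concrete, here is a small sketch
that computes the byte offset for an access string of the form i0:1 on a
made-up struct.  The struct layout and the function name are hypothetical
and only illustrate the formula; they are not taken from the patch or from
the BTF documentation.

```cpp
#include <cstddef>

// Hypothetical struct; field 'b' has member index 1.
struct A
{
  int a;
  int b;
  char c;
};

// Byte offset described by the access string "i0:1", following
//   &a + sizeof (struct A) * i0 + offsetof (i1:...:in)
// For a plain variable (the VAR_DECL case) i0 is simply 0.
std::size_t
core_offset_of_b (std::size_t i0)
{
  return sizeof (A) * i0 + offsetof (A, b);
}
```

With i0 == 0 this is just offsetof (A, b); with a non-zero i0 it matches
ordinary pointer arithmetic on an array of struct A, which is why the first
index behaves differently from the member indexes that follow it.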

Patch LGTM.
Thanks.

> ---
>  gcc/config/bpf/core-builtins.cc  |  5 -
>  gcc/testsuite/gcc.target/bpf/core-builtin-1.c| 16 
>  gcc/testsuite/gcc.target/bpf/core-builtin-2.c|  3 ++-
>  .../gcc.target/bpf/core-builtin-exprlist-1.c | 16 
>  4 files changed, 22 insertions(+), 18 deletions(-)
> 
> diff --git a/gcc/config/bpf/core-builtins.cc b/gcc/config/bpf/core-builtins.cc
> index cdfb356660e..fc6379cf028 100644
> --- a/gcc/config/bpf/core-builtins.cc
> +++ b/gcc/config/bpf/core-builtins.cc
> @@ -698,10 +698,13 @@ compute_field_expr (tree node, unsigned int *accessors,
> access_node, false, callback);
>return n;
>  
> +case VAR_DECL:
> +  accessors[0] = 0;
> +  return 1;
> +
>  case ADDR_EXPR:
>  case CALL_EXPR:
>  case SSA_NAME:
> -case VAR_DECL:
>  case PARM_DECL:
>return 0;
>  default:
> diff --git a/gcc/testsuite/gcc.target/bpf/core-builtin-1.c 
> b/gcc/testsuite/gcc.target/bpf/core-builtin-1.c
> index b4f9998afb8..0706005f0e5 100644
> --- a/gcc/testsuite/gcc.target/bpf/core-builtin-1.c
> +++ b/gcc/testsuite/gcc.target/bpf/core-builtin-1.c
> @@ -24,16 +24,16 @@ unsigned long ula[8];
>  unsigned long
>  func (void)
>  {
> -  /* 1 */
> +  /* 0:1 */
>int b = _(my_s.b);
>  
> -  /* 2 */
> +  /* 0:2 */
>char c = _(my_s.c);
>  
> -  /* 2:3 */
> +  /* 0:2:3 */
>unsigned char uc = _(my_u.uc[3]);
>  
> -  /* 6 */
> +  /* 0:6 */
>unsigned long ul = _(ula[6]);
>  
>return b + c + uc + ul;
> @@ -55,10 +55,10 @@ u_ptr (union U *pu)
>return x;
>  }
>  
> -/* { dg-final { scan-assembler-times "ascii \"1.0\"\[\t 
> \]+\[^\n\]*btf_aux_string" 1 } } */
> -/* { dg-final { scan-assembler-times "ascii \"2.0\"\[\t 
> \]+\[^\n\]*btf_aux_string" 1 } } */
> -/* { dg-final { scan-assembler-times "ascii \"2:3.0\"\[\t 
> \]+\[^\n\]*btf_aux_string" 1 } } */
> -/* { dg-final { scan-assembler-times "ascii \"6.0\"\[\t 
> \]+\[^\n\]*btf_aux_string" 1 } } */
> +/* { dg-final { scan-assembler-times "ascii \"0:1.0\"\[\t 
> \]+\[^\n\]*btf_aux_string" 1 } } */
> +/* { dg-final { scan-assembler-times "ascii \"0:2.0\"\[\t 
> \]+\[^\n\]*btf_aux_string" 1 } } */
> +/* { dg-final { scan-assembler-times "ascii \"0:2:3.0\"\[\t 
> \]+\[^\n\]*btf_aux_string" 1 } } */
> +/* { dg-final { scan-assembler-times "ascii \"0:6.0\"\[\t 
> \]+\[^\n\]*btf_aux_string" 1 } } */
>  /* { dg-final { scan-assembler-times "ascii \"0:2.0\"\[\t 
> \]+\[^\n\]*btf_aux_string" 1 } } */
>  /* { dg-final { scan-assembler-times "ascii \"0:2:3.0\"\[\t 
> \]+\[^\n\]*btf_aux_string" 1 } } */
>  
> diff --git a/gcc/testsuite/gcc.target/bpf/core-builtin-2.c 
> b/gcc/testsuite/gcc.target/bpf/core-builtin-2.c
> index b72e2566b71..04b3f6b2652 100644
> --- a/gcc/testsuite/gcc.target/bpf/core-builtin-2.c
> +++ b/gcc/testsuite/gcc.target/bpf/core-builtin-2.c
> @@ -16,11 +16,12 @@ struct S foo;
>  
>  void func (void)
>  {
> +  /* 0:1:3:2 */
>char *x = __builtin_preserve_access_index (&foo.u[3].c);
>  
>*x = 's';
>  }
>  
>  /* { dg-final { scan-assembler-times "\[\t \]0x402\[\t 
> \]+\[^\n\]*btt_info" 1 } } */
> -/* { dg-final { scan-assembler-times "ascii \"1:3:2.0\"\[\t 
> \]+\[^\n\]*btf_aux_string" 1 } } */
> +/* { dg-final { scan-assembler-times "ascii \"0:1:3:2.0\"\[\t 
> \]+\[^\n\]*btf_aux_string" 1 } } */
>  /* { dg-final { scan-assembler-times "bpfcr_type" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/bpf/core-builtin-exprlist-1.c 
> b/gcc/testsuite/gcc.target/bpf/core-builtin-exprlist-1.c
> index 8ce4a6e70de..c53daf81c5f 100644
> --- a/gcc/testsuite/gcc.target/bpf/core-builtin-exprlist-1.c
> +++ b/gcc/testsuite/gcc.target/bpf/core-builtin-exprlist-1.c
> @@ -31,16 +31,16 @@ func (void)
>int ic;
>  
>__builtin_preserve_access_index (({
> -/* 1 */
> +/* 0:1 */
>  b = my_s.b;
>  
> -/* 2 */
> +/* 0:2 */
>  ic = my_s.c;
>  
> -/* 2:3 */
> +/* 0:2:3 */
>  uc = my_u.uc[3];
>  
> -/* 6 */
> +/* 0:6 */
>  ul = ula[6];
>}));
>  
> @@ -65,10 +65,10 @@ u_ptr (union U *pu)
>return x;
>  }
>  
> -/* { dg-final { scan-assembler-times "ascii \"1.0\"\[\t 
> \]+\[^\n\]*btf_aux_string" 1 } } */
> -/* { dg-final { scan-assembler-times "ascii \"2.0\"\[\t 
> \]+\[^\n\]*btf_aux_string" 1 } } */
> -/* { dg-f

RE: [PATCH] i386: Add _MM_FROUND_TO_NEAREST_TIES_EVEN to smmintrin.h

2024-09-30 Thread Paul Caprioli


Hi,

I'm writing to ask that someone with write access to the git repo apply 
this patch, which provides the macro definition 
`_MM_FROUND_TO_NEAREST_TIES_EVEN`.

Intrinsics such as `_mm512_add_round_ps` take a rounding mode argument to 
specify the floating point rounding mode. This and similar instructions do NOT 
round their result to an integer. Thus it is inappropriate for user code to 
specify the existing `_MM_FROUND_TO_NEAREST_INT` when desiring to round to the 
nearest floating point number. This patch adds a suitable macro definition.

Note that a few instructions, e.g., `ROUNDPS`, do round to an integer, 
so the existing macro definition ought not be deprecated.

Note that IEEE Std 754-2019 for floating-point arithmetic specifies two 
rounding direction attributes to nearest: `roundTiesToEven` and 
`roundTiesToAway`. Also, it specifies three directed rounding attributes: 
`roundTowardPositive`, `roundTowardNegative`, and `roundTowardZero`.
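
A portable sketch of the distinction, without any intrinsics: an ordinary
float addition rounds its result to the nearest representable float (ties to
even), not to an integer.  It assumes IEEE binary32 floats and the default
rounding mode; the helper name is hypothetical.

```cpp
#include <limits>

// Adds two floats at run time.  The volatile temporaries discourage the
// compiler from folding the sum or keeping it in a wider format.
float
add_float (float a, float b)
{
  volatile float x = a;
  volatile float y = b;
  volatile float sum = x + y;
  return sum;
}

// eps is the gap between 1.0f and the next float (2^-23 for binary32).
// 1 + eps/2 is exactly halfway between 1 and 1 + eps; ties-to-even
// picks 1, whose last significand bit is 0.
// (1 + eps) + eps/2 is halfway between 1 + eps and 1 + 2*eps; ties-to-even
// picks 1 + 2*eps.  Neither result is an integer rounding of the sum.
const float eps = std::numeric_limits<float>::epsilon ();
```

This is the behaviour that a name such as `_MM_FROUND_TO_NEAREST_TIES_EVEN`
describes for intrinsics like `_mm512_add_round_ps`.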

Please note that I do not have write access to the git repo and that I am 
not a member of this email alias.  Your adding p...@hpkfft.com to any 
discussions would be appreciated.

Regards,
Paul

From c3f03dcd6a804a159c5789b44cb0ecac76439ee7 Mon Sep 17 00:00:00 2001
From: Paul Caprioli 
Date: Sat, 20 Jul 2024 11:31:06 -0700
Subject: [PATCH] Add _MM_FROUND_TO_NEAREST_TIES_EVEN to smmintrin.h

---
 gcc/config/i386/smmintrin.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/i386/smmintrin.h b/gcc/config/i386/smmintrin.h
index 4c315feec36..22ef9e966a7 100644
--- a/gcc/config/i386/smmintrin.h
+++ b/gcc/config/i386/smmintrin.h
@@ -39,6 +39,7 @@
 
 /* Rounding mode macros. */
 #define _MM_FROUND_TO_NEAREST_INT	0x00
+#define _MM_FROUND_TO_NEAREST_TIES_EVEN	0x00
 #define _MM_FROUND_TO_NEG_INF		0x01
 #define _MM_FROUND_TO_POS_INF		0x02
 #define _MM_FROUND_TO_ZERO		0x03
-- 
2.39.5

RE: [PATCH]middle-end: check explicitly for external or constants when checking for loop invariant [PR116817]

2024-09-30 Thread Tamar Christina

> > > Can you explain how you get to see constant/external defs with
> > > a stmt_vec_info?  That's somehow a violation of some inherent invariant
> > > in the vectorizer.
> >
> > I'm not sure I actually get any. It could be the condition is never hit
> > with a stmt_vec_info. I had assumed however since the condition is part
> > of a gimple_cond and if one of the arguments of the gimple_cond is loop
> > bound, that the condition would be analyzed too.
> >
> > So if you're saying you never get a stmt_vec_info for invariants at this
> > point (I assume you could see them in the corresponding slp
> > tree) then maybe checking for the stmt_vec_info is enough.
> >
> > However, when I was looking around for how to check for externals I
> > noticed other patterns also check for externals and constants. So I
> > assumed that you could indeed get them.
> 
> You usually check that after doing vect_is_simple_use on the SSA name
> or constant which internally makes all stmts with a stmt_vec_info
> one of the internal def kinds.
> 
> So I guess you could do vect_is_simple_use on 'var' as well and check
> the 'dt' it will populate
> 

Ah I see, I did see it being called in some other patterns but wasn't sure
what it was providing.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/116817
* tree-vect-patterns.cc (vect_recog_bool_pattern): Check for const or
externals.

gcc/testsuite/ChangeLog:

PR tree-optimization/116817
* g++.dg/vect/pr116817.cc: New test.

-- inline copy of patch --

diff --git a/gcc/testsuite/g++.dg/vect/pr116817.cc 
b/gcc/testsuite/g++.dg/vect/pr116817.cc
new file mode 100644
index 
..7e28982fb138c24f956aedb03fa454d9d858
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr116817.cc
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+
+int main_ulData0;
+unsigned *main_pSrcBuffer;
+int main(void) {
+  int iSrc = 0;
+  bool bData0;
+  for (; iSrc < 4; iSrc++) {
+if (bData0)
+  main_pSrcBuffer[iSrc] = main_ulData0;
+else
+  main_pSrcBuffer[iSrc] = 0;
+bData0 = !bData0;
+  }
+}
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 
e7e877dd2adb55262822f1660f8d92b42d44e6d0..b174ff1e705cec8e7bb414c760eb170ca98222cb
 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -6062,12 +6062,15 @@ vect_recog_bool_pattern (vec_info *vinfo,
   if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
return NULL;
 
+  enum vect_def_type dt;
   if (check_bool_pattern (var, vinfo, bool_stmts))
var = adjust_bool_stmts (vinfo, bool_stmts, type, stmt_vinfo);
   else if (integer_type_for_mask (var, vinfo))
return NULL;
   else if (TREE_CODE (TREE_TYPE (var)) == BOOLEAN_TYPE
-  && !vect_get_internal_def (vinfo, var))
+  && vect_is_simple_use (var, vinfo, &dt)
+  && (dt == vect_external_def
+  || dt == vect_constant_def))
{
  /* If the condition is already a boolean then manually convert it to a
 mask of the given integer type but don't set a vectype.  */



[PATCH 1/2]AArch64: refactor aarch64_float_const_representable_p to take additional mode param

2024-09-30 Thread Tamar Christina

Hi All,

This is a refactoring to allow aarch64_float_const_representable_p
to take an additional mode parameter which is the mode of the constant being
analyzed.  This will be required by the next patch in the series.

No functional change is expected from this change.
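
For context, the immediates this function is meant to accept (per the
comment touched in the diff below) have the form (-1)^s * (n/16) * 2^r with
16 <= n <= 31 and -3 <= r <= 4.  The following standalone sketch checks
that property for a plain double.  It is not GCC's implementation: it works
on a host double rather than a REAL_VALUE_TYPE, ignores the HFmode and
BFmode special cases, and the function name is hypothetical.

```cpp
#include <cmath>

// True if x == (-1)^s * (n/16) * 2^r for integers 16 <= n <= 31 and
// -3 <= r <= 4.  Zero, infinities and NaNs are rejected.
// Hypothetical helper for illustration only.
bool
fmov_imm_representable (double x)
{
  if (!std::isfinite (x) || x == 0.0)
    return false;

  int e;
  // |x| = m * 2^e with 0.5 <= m < 1.
  double m = std::frexp (std::fabs (x), &e);
  // Rewrite as |x| = (n/16) * 2^(e-1), so n = 32 * m lies in [16, 32).
  double n = m * 32.0;
  int r = e - 1;
  return n == std::floor (n) && r >= -3 && r <= 4;
}
```

Under this check 1.0, 0.125 and 31.0 are representable, while 0.0, 0.1 and
32.0 are not.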

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_float_const_representable_p):
Add mode param.
* config/aarch64/aarch64.cc (aarch64_float_const_representable_p):
Add mode param.
(aarch64_print_operand, aarch64_rtx_costs,
aarch64_simd_valid_immediate): Use it.
* config/aarch64/aarch64.md: Likewise.
* config/aarch64/constraints.md: Likewise.
* config/aarch64/predicates.md: Likewise.

---
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 
d03c1fe798b2ccc2258b8581473a6eb7dc4af850..7a84acc59569da0b50af2300615db561a5de460a
 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -973,7 +973,7 @@ bool aarch64_mov128_immediate (rtx);
 void aarch64_split_simd_move (rtx, rtx);
 
 /* Check for a legitimate floating point constant for FMOV.  */
-bool aarch64_float_const_representable_p (rtx);
+bool aarch64_float_const_representable_p (rtx, machine_mode);
 
 extern int aarch64_epilogue_uses (int);
 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
68913beaee2092d65279801c362d6e742269b3c4..1842f6ecf6330f11a64545d0903240c89b104ffc
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -12317,7 +12317,7 @@ aarch64_print_operand (FILE *f, rtx x, int code)
  fputc ('0', f);
  break;
}
- else if (aarch64_float_const_representable_p (x))
+ else if (aarch64_float_const_representable_p (x, GET_MODE (x)))
{
 #define buf_size 20
  char float_buf[buf_size] = {'\0'};
@@ -14233,7 +14233,7 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int outer 
ATTRIBUTE_UNUSED,
 
   /* First determine number of instructions to do the move
  as an integer constant.  */
-  if (!aarch64_float_const_representable_p (x)
+  if (!aarch64_float_const_representable_p (x, mode)
   && !aarch64_can_const_movi_rtx_p (x, mode)
   && aarch64_float_const_rtx_p (x))
{
@@ -14252,7 +14252,7 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int outer 
ATTRIBUTE_UNUSED,
   if (speed)
{
  /* mov[df,sf]_aarch64.  */
- if (aarch64_float_const_representable_p (x))
+ if (aarch64_float_const_representable_p (x, mode))
/* FMOV (scalar immediate).  */
*cost += extra_cost->fp[mode == DFmode || mode == DDmode].fpconst;
  else if (!aarch64_float_const_zero_rtx_p (x))
@@ -23032,7 +23032,7 @@ aarch64_simd_valid_immediate (rtx op, 
simd_immediate_info *info,
 {
   rtx elt = CONST_VECTOR_ENCODED_ELT (op, 0);
   if (aarch64_float_const_zero_rtx_p (elt)
- || aarch64_float_const_representable_p (elt))
+ || aarch64_float_const_representable_p (elt, elt_mode))
{
  if (info)
*info = simd_immediate_info (elt_float_mode, elt);
@@ -25119,10 +25119,10 @@ aarch64_c_mode_for_suffix (char suffix)
  'n' is an integer in the range 16 <= n <= 31.
  'r' is an integer in the range -3 <= r <= 4.  */
 
-/* Return true iff X can be represented by a quarter-precision
+/* Return true iff X with mode MODE can be represented by a quarter-precision
floating point immediate operand X.  Note, we cannot represent 0.0.  */
 bool
-aarch64_float_const_representable_p (rtx x)
+aarch64_float_const_representable_p (rtx x, machine_mode mode)
 {
   /* This represents our current view of how many bits
  make up the mantissa.  */
@@ -25133,11 +25133,12 @@ aarch64_float_const_representable_p (rtx x)
   bool fail;
 
   x = unwrap_const_vec_duplicate (x);
+  mode = GET_MODE_INNER (mode);
   if (!CONST_DOUBLE_P (x))
 return false;
 
-  if (GET_MODE (x) == VOIDmode
-  || (GET_MODE (x) == HFmode && !TARGET_FP_F16INST))
+  if (mode == VOIDmode
+  || (mode == HFmode && !TARGET_FP_F16INST))
 return false;
 
   r = *CONST_DOUBLE_REAL_VALUE (x);
@@ -25150,7 +25151,7 @@ aarch64_float_const_representable_p (rtx x)
 return false;
 
   /* For BFmode, only handle 0.0. */
-  if (GET_MODE (x) == BFmode)
+  if (mode == BFmode)
 return real_iszero (&r, false);
 
   /* Extract exponent.  */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
c54b29cd64b9e0dc6c6d12735049386ccedc5408..20e131403071b6cf68aa06c0df7c90ef9c656cae
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1712,7 +1712,7 @@ (define_split
(match_operand:GPF_HF 1 "const_double_operand"))]
   "can_create_pseudo_p ()
&& !aarch64_can_const_movi_rt

[PATCH 1/2] libstdc++: Implement C++23 <flat_map> (P0429R9)

2024-09-30 Thread Patrick Palka

This implements the C++23 container adaptors std::flat_map and
std::flat_multimap from P0429R9.  The implementation is shared
as much as possible between the two adaptors via a common base
class that's parameterized according to key uniqueness.

The main known issues are:

  * the range insert() overload exceeds its complexity requirements
    since an idiomatic efficient implementation needs a non-buggy
    ranges::inplace_merge
  * exception safety is likely incomplete/buggy
  * unimplemented from_range_t constructors and insert_range function
  * the main workhorse function _M_try_emplace is probably buggy
    wrt its handling of the hint parameter and could be simplified
  * more extensive testcases are a WIP

The iterator type is encoded as a {pointer, index} pair instead of an
{iterator, iterator} pair.  I'm not sure which encoding is preferable?
It seems the latter would allow for better debuggability when the
underlying iterators are debug iterators.
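
To illustrate the {pointer, index} encoding discussed above, here is a
deliberately tiny sketch of a flat map over int keys and char values.  It is
not the code from this patch: the type name, members and fixed key/value
types are all hypothetical, and it omits almost everything a real container
adaptor needs.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical miniature of the flat_map idea: keys and values live in two
// parallel vectors, both ordered by key.
struct tiny_flat_map
{
  std::vector<int> keys;
  std::vector<char> values;

  // Iterator encoded as {pointer to the container, index}.
  struct iterator
  {
    const tiny_flat_map *cont;
    std::size_t index;

    std::pair<const int &, const char &> operator* () const
    { return {cont->keys[index], cont->values[index]}; }

    iterator &operator++ () { ++index; return *this; }

    bool operator== (const iterator &o) const
    { return cont == o.cont && index == o.index; }
    bool operator!= (const iterator &o) const { return !(*this == o); }
  };

  iterator begin () const { return {this, 0}; }
  iterator end () const { return {this, keys.size ()}; }

  // Insert (k, v) if k is absent, keeping both vectors sorted by key.
  std::pair<iterator, bool> insert (int k, char v)
  {
    auto it = std::lower_bound (keys.begin (), keys.end (), k);
    std::size_t i = it - keys.begin ();
    if (it != keys.end () && *it == k)
      return {iterator{this, i}, false};
    keys.insert (it, k);
    values.insert (values.begin () + i, v);
    return {iterator{this, i}, true};
  }
};
```

One index addresses both vectors, so the iterator stays small; the trade-off
mentioned above is that an {iterator, iterator} pair would carry whatever
checking the underlying containers' debug iterators provide.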

libstdc++-v3/ChangeLog:

* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/bits/stl_function.h (__transparent_comparator): Define.
* include/bits/utility.h (sorted_unique_t): Define for C++23.
(sorted_unique): Likewise.
(sorted_equivalent_t): Likewise.
(sorted_equivalent): Likewise.
* include/bits/version.def (flat_map): Define.
* include/bits/version.h: Regenerate.
* include/std/flat_map: New file.
* testsuite/23_containers/flat_map/1.cc: New test.
* testsuite/23_containers/flat_multimap/1.cc: New test.
---
 libstdc++-v3/include/Makefile.am  |1 +
 libstdc++-v3/include/Makefile.in  |1 +
 libstdc++-v3/include/bits/stl_function.h  |6 +
 libstdc++-v3/include/bits/utility.h   |8 +
 libstdc++-v3/include/bits/version.def |8 +
 libstdc++-v3/include/bits/version.h   |   10 +
 libstdc++-v3/include/std/flat_map | 1477 +
 .../testsuite/23_containers/flat_map/1.cc |   90 +
 .../23_containers/flat_multimap/1.cc  |   77 +
 9 files changed, 1678 insertions(+)
 create mode 100644 libstdc++-v3/include/std/flat_map
 create mode 100644 libstdc++-v3/testsuite/23_containers/flat_map/1.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/flat_multimap/1.cc

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 422a0f4bd0a..632bbafa63e 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -70,6 +70,7 @@ std_headers = \
${std_srcdir}/deque \
${std_srcdir}/execution \
${std_srcdir}/filesystem \
+   ${std_srcdir}/flat_map \
${std_srcdir}/format \
${std_srcdir}/forward_list \
${std_srcdir}/fstream \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 9fd4ab4848c..1ac963c4415 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -426,6 +426,7 @@ std_freestanding = \
 @GLIBCXX_HOSTED_TRUE@  ${std_srcdir}/deque \
 @GLIBCXX_HOSTED_TRUE@  ${std_srcdir}/execution \
 @GLIBCXX_HOSTED_TRUE@  ${std_srcdir}/filesystem \
+@GLIBCXX_HOSTED_TRUE@  ${std_srcdir}/flat_map \
 @GLIBCXX_HOSTED_TRUE@  ${std_srcdir}/format \
 @GLIBCXX_HOSTED_TRUE@  ${std_srcdir}/forward_list \
 @GLIBCXX_HOSTED_TRUE@  ${std_srcdir}/fstream \
diff --git a/libstdc++-v3/include/bits/stl_function.h 
b/libstdc++-v3/include/bits/stl_function.h
index c9123ccecae..c579ba9f47b 100644
--- a/libstdc++-v3/include/bits/stl_function.h
+++ b/libstdc++-v3/include/bits/stl_function.h
@@ -1426,6 +1426,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 using __has_is_transparent_t
   = typename __has_is_transparent<_Func, _SfinaeType>::type;
+
+#if __cpp_concepts
+  template
+concept __transparent_comparator
+  = requires { typename _Func::is_transparent; };
+#endif
 #endif
 
 _GLIBCXX_END_NAMESPACE_VERSION
diff --git a/libstdc++-v3/include/bits/utility.h 
b/libstdc++-v3/include/bits/utility.h
index 4a6c16dc2e0..9e10ce2cb1c 100644
--- a/libstdc++-v3/include/bits/utility.h
+++ b/libstdc++-v3/include/bits/utility.h
@@ -308,6 +308,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*/
   _GLIBCXX17_INLINE constexpr _Swallow_assign ignore{};
 
+#if __glibcxx_flat_map || __glibcxx_flat_set // >= C++23
+  struct sorted_unique_t { explicit sorted_unique_t() = default; };
+  inline constexpr sorted_unique_t sorted_unique{};
+
+  struct sorted_equivalent_t { explicit sorted_equivalent_t() = default; };
+  inline constexpr sorted_equivalent_t sorted_equivalent{};
+#endif
+
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
 
diff --git a/libstdc++-v3/include/bits/version.def 
b/libstdc++-v3/include/bits/version.def
index f2e28175b08..631eca7beac 100644
--- a/libstdc++-v3/include/bits/version.def
+++ b/libstdc++-v3/include/bits/version.def
@@ -1658,6 +1658,14 @@ ftms = {
   };
 };
 
+ftms = {
+  name = flat

[PATCH 2/2] libstdc++: Implement C++23 <flat_set> (P1222R4)

2024-09-30 Thread Patrick Palka

This implements the C++23 container adaptors std::flat_set and
std::flat_multiset from P1222R4.  The implementation is essentially
a simpler, pared-down version of std::flat_map.

The main known issues are:

  * exception safety is likely incomplete/buggy
  * unimplemented from_range_t constructors and insert_range function
  * the main workhorse function _M_try_emplace is probably buggy
    wrt its handling of the hint parameter and could be simplified
  * more extensive testcases are a WIP

libstdc++-v3/ChangeLog:

* include/Makefile.am: Add new header <flat_set>.
* include/Makefile.in: Regenerate.
* include/bits/version.def (__cpp_flat_set): Define.
* include/bits/version.h: Regenerate.
* include/std/flat_set: New file.
* testsuite/23_containers/flat_multiset/1.cc: New test.
* testsuite/23_containers/flat_set/1.cc: New test.
---
 libstdc++-v3/include/Makefile.am  |   1 +
 libstdc++-v3/include/Makefile.in  |   1 +
 libstdc++-v3/include/bits/version.def |   8 +
 libstdc++-v3/include/bits/version.h   |  10 +
 libstdc++-v3/include/std/flat_set | 968 ++
 .../23_containers/flat_multiset/1.cc  |  73 ++
 .../testsuite/23_containers/flat_set/1.cc |  78 ++
 7 files changed, 1139 insertions(+)
 create mode 100644 libstdc++-v3/include/std/flat_set
 create mode 100644 libstdc++-v3/testsuite/23_containers/flat_multiset/1.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/flat_set/1.cc

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 632bbafa63e..e49cdb23c55 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -71,6 +71,7 @@ std_headers = \
${std_srcdir}/execution \
${std_srcdir}/filesystem \
${std_srcdir}/flat_map \
+   ${std_srcdir}/flat_set \
${std_srcdir}/format \
${std_srcdir}/forward_list \
${std_srcdir}/fstream \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 1ac963c4415..8e6ee44cc0e 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -427,6 +427,7 @@ std_freestanding = \
 @GLIBCXX_HOSTED_TRUE@  ${std_srcdir}/execution \
 @GLIBCXX_HOSTED_TRUE@  ${std_srcdir}/filesystem \
 @GLIBCXX_HOSTED_TRUE@  ${std_srcdir}/flat_map \
+@GLIBCXX_HOSTED_TRUE@  ${std_srcdir}/flat_set \
 @GLIBCXX_HOSTED_TRUE@  ${std_srcdir}/format \
 @GLIBCXX_HOSTED_TRUE@  ${std_srcdir}/forward_list \
 @GLIBCXX_HOSTED_TRUE@  ${std_srcdir}/fstream \
diff --git a/libstdc++-v3/include/bits/version.def 
b/libstdc++-v3/include/bits/version.def
index 631eca7beac..827582cf2ea 100644
--- a/libstdc++-v3/include/bits/version.def
+++ b/libstdc++-v3/include/bits/version.def
@@ -1666,6 +1666,14 @@ ftms = {
   };
 };
 
+ftms = {
+  name = flat_set;
+  values = {
+v = 202207;
+cxxmin = 23;
+  };
+};
+
 ftms = {
   name = formatters;
   values = {
diff --git a/libstdc++-v3/include/bits/version.h 
b/libstdc++-v3/include/bits/version.h
index 1f3040fcbde..311586461e3 100644
--- a/libstdc++-v3/include/bits/version.h
+++ b/libstdc++-v3/include/bits/version.h
@@ -1840,6 +1840,16 @@
 #endif /* !defined(__cpp_lib_flat_map) && defined(__glibcxx_want_flat_map) */
 #undef __glibcxx_want_flat_map
 
+#if !defined(__cpp_lib_flat_set)
+# if (__cplusplus >= 202100L)
+#  define __glibcxx_flat_set 202207L
+#  if defined(__glibcxx_want_all) || defined(__glibcxx_want_flat_set)
+#   define __cpp_lib_flat_set 202207L
+#  endif
+# endif
+#endif /* !defined(__cpp_lib_flat_set) && defined(__glibcxx_want_flat_set) */
+#undef __glibcxx_want_flat_set
+
 #if !defined(__cpp_lib_formatters)
 # if (__cplusplus >= 202100L) && _GLIBCXX_HOSTED
 #  define __glibcxx_formatters 202302L
diff --git a/libstdc++-v3/include/std/flat_set 
b/libstdc++-v3/include/std/flat_set
new file mode 100644
index 000..bbb63408cc2
--- /dev/null
+++ b/libstdc++-v3/include/std/flat_set
@@ -0,0 +1,968 @@
+//  -*- C++ -*-
+
+// Copyright The GNU Toolchain Authors.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYIN

[PATCH v2] RISC-V: Implement TARGET_CAN_INLINE_P

2024-09-30 Thread Yangyu Chen

Currently, we lack support for TARGET_CAN_INLINE_P on the RISC-V
ISA. As a result, certain functions cannot be optimized with inlining
when specific options, such as __attribute__((target("arch=+v"))), are
used.  This can lead to potential performance issues when building
retargetable binaries for RISC-V.

To address this, I have implemented the riscv_can_inline_p function.
This addition enables inlining when the callee either has no special
options or when some options match, while also ensuring that the
callee's ISA is a subset of the caller's. I also check some other
options when always_inline is not set.
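
The subset rule can be sketched with a table of pointer-to-member entries,
in the spirit of riscv_ext_flag_table.  Everything below is a simplified
stand-in: the struct, the mask values and the function are hypothetical and
do not reproduce the patch's riscv_ext_is_subset.

```cpp
// Hypothetical stand-in for the saved per-function target options.
struct target_opts
{
  int target_flags;
  int zi_subext;
};

// Pointer to an int member of target_opts.
typedef int (target_opts::*opt_var_ref_t);

struct ext_flag_entry
{
  const char *ext;
  opt_var_ref_t var_ref;
  int mask;
};

// Made-up mask values, for illustration only.
constexpr int MASK_MUL = 1 << 0;
constexpr int MASK_VECTOR = 1 << 1;
constexpr int MASK_ZICSR = 1 << 0;

static const ext_flag_entry ext_flag_table[] =
{
  {"m", &target_opts::target_flags, MASK_MUL},
  {"v", &target_opts::target_flags, MASK_VECTOR},
  {"zicsr", &target_opts::zi_subext, MASK_ZICSR},
};

// True if every extension enabled for the callee is also enabled for the
// caller, i.e. the callee's ISA is a subset of the caller's.
bool
ext_is_subset (const target_opts &caller, const target_opts &callee)
{
  for (const ext_flag_entry &e : ext_flag_table)
    if (((callee.*(e.var_ref)) & e.mask) && !((caller.*(e.var_ref)) & e.mask))
      return false;
  return true;
}
```

A callee built for "m" can be inlined into a caller built for "m", "v" and
"zicsr" under this rule, but not the other way round.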

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (cl_opt_var_ref_t): Add
cl_opt_var_ref_t pointer to member of cl_target_option.
(struct riscv_ext_flag_table_t): Add new cl_opt_var_ref_t field.
(RISCV_EXT_FLAG_ENTRY): New macro to simplify the definition of
riscv_ext_flag_table.
(riscv_ext_is_subset): New function to check if the callee's ISA
is a subset of the caller's.
(riscv_x_target_flags_isa_mask): New function to get the mask of
ISA extension in x_target_flags of gcc_options.
* config/riscv/riscv-subset.h (riscv_ext_is_subset): Declare
riscv_ext_is_subset function.
(riscv_x_target_flags_isa_mask): Declare
riscv_x_target_flags_isa_mask function.
* config/riscv/riscv.cc (riscv_can_inline_p): New function.
(TARGET_CAN_INLINE_P): Implement TARGET_CAN_INLINE_P.

Signed-off-by: Yangyu Chen 
---
 gcc/common/config/riscv/riscv-common.cc | 370 +---
 gcc/config/riscv/riscv-subset.h |   3 +
 gcc/config/riscv/riscv.cc   |  59 
 3 files changed, 267 insertions(+), 165 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index bd42fd01532..90c386e76dc 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1567,191 +1567,196 @@ riscv_arch_str (bool version_p)
 return std::string();
 }
 
-/* Type for pointer to member of gcc_options.  */
+/* Type for pointer to member of gcc_options and cl_target_option.  */
 typedef int (gcc_options::*opt_var_ref_t);
+typedef int (cl_target_option::*cl_opt_var_ref_t);
 
 /* Types for recording extension to internal flag.  */
 struct riscv_ext_flag_table_t {
   const char *ext;
   opt_var_ref_t var_ref;
+  cl_opt_var_ref_t cl_var_ref;
   int mask;
 };
 
+#define RISCV_EXT_FLAG_ENTRY(NAME, VAR, MASK) \
+  {NAME, &gcc_options::VAR, &cl_target_option::VAR, MASK}
+
 /* Mapping table between extension to internal flag.  */
 static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
 {
-  {"e", &gcc_options::x_target_flags, MASK_RVE},
-  {"m", &gcc_options::x_target_flags, MASK_MUL},
-  {"a", &gcc_options::x_target_flags, MASK_ATOMIC},
-  {"f", &gcc_options::x_target_flags, MASK_HARD_FLOAT},
-  {"d", &gcc_options::x_target_flags, MASK_DOUBLE_FLOAT},
-  {"c", &gcc_options::x_target_flags, MASK_RVC},
-  {"v", &gcc_options::x_target_flags, MASK_FULL_V},
-  {"v", &gcc_options::x_target_flags, MASK_VECTOR},
-
-  {"zicsr",&gcc_options::x_riscv_zi_subext, MASK_ZICSR},
-  {"zifencei", &gcc_options::x_riscv_zi_subext, MASK_ZIFENCEI},
-  {"zicond",   &gcc_options::x_riscv_zi_subext, MASK_ZICOND},
-
-  {"za64rs",  &gcc_options::x_riscv_za_subext, MASK_ZA64RS},
-  {"za128rs", &gcc_options::x_riscv_za_subext, MASK_ZA128RS},
-  {"zawrs",   &gcc_options::x_riscv_za_subext, MASK_ZAWRS},
-  {"zaamo",   &gcc_options::x_riscv_za_subext, MASK_ZAAMO},
-  {"zalrsc",  &gcc_options::x_riscv_za_subext, MASK_ZALRSC},
-  {"zabha",   &gcc_options::x_riscv_za_subext, MASK_ZABHA},
-  {"zacas",   &gcc_options::x_riscv_za_subext, MASK_ZACAS},
-
-  {"zba",&gcc_options::x_riscv_zb_subext, MASK_ZBA},
-  {"zbb",&gcc_options::x_riscv_zb_subext, MASK_ZBB},
-  {"zbc",&gcc_options::x_riscv_zb_subext, MASK_ZBC},
-  {"zbs",&gcc_options::x_riscv_zb_subext, MASK_ZBS},
-
-  {"zfinx",&gcc_options::x_riscv_zinx_subext, MASK_ZFINX},
-  {"zdinx",&gcc_options::x_riscv_zinx_subext, MASK_ZDINX},
-  {"zhinx",&gcc_options::x_riscv_zinx_subext, MASK_ZHINX},
-  {"zhinxmin", &gcc_options::x_riscv_zinx_subext, MASK_ZHINXMIN},
-
-  {"zbkb",   &gcc_options::x_riscv_zk_subext, MASK_ZBKB},
-  {"zbkc",   &gcc_options::x_riscv_zk_subext, MASK_ZBKC},
-  {"zbkx",   &gcc_options::x_riscv_zk_subext, MASK_ZBKX},
-  {"zknd",   &gcc_options::x_riscv_zk_subext, MASK_ZKND},
-  {"zkne",   &gcc_options::x_riscv_zk_subext, MASK_ZKNE},
-  {"zknh",   &gcc_options::x_riscv_zk_subext, MASK_ZKNH},
-  {"zkr",&gcc_options::x_riscv_zk_subext, MASK_ZKR},
-  {"zksed",  &gcc_options::x_riscv_zk_subext, MASK_ZKSED},
-  {"zksh",   &gcc_options::x_riscv_zk_subext, MASK_ZKSH},
-  {"zkt",&gcc_options::x_riscv_zk_subext, MASK_ZKT},
-
-  {"zihintntl", &gcc_options::x_riscv_zi_subext, MASK_ZIHINTNTL},
-  {"zihintpause", &gcc_options::x_riscv_zi_subext, MASK_ZIHINTPA

[PATCH] tree-optimization/116879 - failure to recognize non-empty latch

2024-09-30 Thread Richard Biener

When we relaxed the vectorizer's constraint on loop structure, verifying
the emptiness of the latch became too loose, as can be seen in the case
of PR116879, where the latch effectively consists of two basic blocks,
one of which is an unmerged forwarder that's not empty.
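
As a rough illustration of scanning every block that forms the latch, here
is a toy model. The Block type and latch_chain_empty are hypothetical and
only approximate the idea; they are not GCC's basic_block API.

```cpp
#include <cassert>
#include <vector>

// Toy basic block for this sketch; not GCC's basic_block.
struct Block
{
  bool empty = true;            // no statements and no PHI nodes
  std::vector<Block *> preds;   // predecessor blocks
  std::vector<Block *> succs;   // successor blocks
};

// Walk backwards from the latch through the chain of blocks that have a
// single successor and report whether all of them are empty.
bool latch_chain_empty (const Block *latch)
{
  const Block *bb = latch;
  while (bb && bb->succs.size () == 1)
    {
      if (!bb->empty)
        return false;
      if (bb->preds.size () != 1)
        break;                  // toy guard against malformed input
      bb = bb->preds[0];
    }
  return true;
}
```

In this model a non-empty forwarder sitting between the loop header and the
latch is detected even though the latch block itself is empty.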

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/116879
* tree-vect-loop.cc (vect_analyze_loop_form): Scan all
blocks that form the latch.

* gcc.dg/pr116879.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr116879.c | 15 +++
 gcc/tree-vect-loop.cc   | 15 +++
 2 files changed, 26 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr116879.c

diff --git a/gcc/testsuite/gcc.dg/pr116879.c b/gcc/testsuite/gcc.dg/pr116879.c
new file mode 100644
index 000..73ddb2b658c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr116879.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fallow-store-data-races -fno-tree-ch 
-ftree-loop-distribution" } */
+
+static int b;
+int *a, c, *d = &c;
+int main() {
+  int e = 0;
+  for (; e < 8; e = (char)(e + 1)) {
+int *f = &b, g[8], h = 0;
+for (; h < 8; h++)
+  g[h] = 0;
+--*f != (*d = g[0] || a);
+  }
+  return 0;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 5cd4bdb32e0..0ce1bf8ebba 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1851,10 +1851,17 @@ vect_analyze_loop_form (class loop *loop, 
vect_loop_form_info *info)
   " too many incoming edges.\n");
 
   /* We assume that the latch is empty.  */
-  if (!empty_block_p (loop->latch)
-  || !gimple_seq_empty_p (phi_nodes (loop->latch)))
-return opt_result::failure_at (vect_location,
-  "not vectorized: latch block not empty.\n");
+  basic_block latch = loop->latch;
+  do
+{
+  if (!empty_block_p (latch)
+ || !gimple_seq_empty_p (phi_nodes (latch)))
+   return opt_result::failure_at (vect_location,
+  "not vectorized: latch block not "
+  "empty.\n");
+  latch = single_pred (latch);
+}
+  while (single_succ_p (latch));
 
   /* Make sure there is no abnormal exit.  */
   auto_vec<edge> exits = get_loop_exit_edges (loop);
-- 
2.43.0

Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen5 CPU with znver5 scheduler Model

2024-09-30 Thread Jan Hubicka

Hi,
I have now backported this patch to active branches (12 and 13).

Honza

Re: Zen5 tuning part 1: avoid FMA chains

2024-09-30 Thread Jan Hubicka

Hi,
> 
> gcc/ChangeLog:
> 
>   * config/i386/x86-tune.def (X86_TUNE_AVOID_128FMA_CHAINS): Enable for
>   znver5.
>   (X86_TUNE_AVOID_256FMA_CHAINS): Likewise.
>   (X86_TUNE_AVOID_512FMA_CHAINS): Likewise.
This patch is also now backported to active branches.

Honza

Re: [PATCH v2] c++: Don't ICE due to artificial constructor parameters [PR116722]

2024-09-30 Thread Jason Merrill


On 9/23/24 4:44 AM, Simon Martin wrote:

Hi Jason,

On 20 Sep 2024, at 18:01, Jason Merrill wrote:


On 9/20/24 5:21 PM, Simon Martin wrote:

The following code triggers an ICE

=== cut here ===
class base {};
class derived : virtual public base {
public:
template <typename Arg> constexpr derived(Arg) {}
};
int main() {
derived obj(1.);
}
=== cut here ===

The problem is that cxx_bind_parameters_in_call ends up attempting to
convert a REAL_CST (the first non artificial parameter) to INTEGER_TYPE
(the type of the __in_chrg parameter), which ICEs.

This patch teaches cxx_bind_parameters_in_call to handle the __in_chrg
and __vtt_parm parameters that {con,de}structors might have.

Note that in the test case, the constructor is not constexpr-suitable,
however it's OK since it's a template according to my read of paragraph
(3) of [dcl.constexpr].


Agreed.

It looks like your patch doesn't correct the mismatching of arguments
to parameters that you describe, but at least for now it should be
enough to set *non_constant_p and return if we see a VTT or in-charge
parameter.


Thanks, it’s true that my initial patch was wrong in that we’d leave
cxx_bind_parameters_in_call thinking the expression was actually a
constant expression :-/

The attached revised patch follows your suggestion (thanks!).
Successfully tested on x86_64-pc-linux-gnu. OK for trunk?


After this patch I'm seeing a regression on constexpr-dynamic10.C with
-fimplicit-constexpr; we also need to give an error here when
(!ctx->quiet).

Jason

RE: [PATCH] middle-end: Fix ifcvt predicate generation for masked function calls

2024-09-30 Thread Tamar Christina

Hi Victor,

Thanks! This looks good to me with one minor comment:

> -Original Message-
> From: Victor Do Nascimento 
> Sent: Monday, September 30, 2024 2:34 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Tamar Christina ; richard.guent...@gmail.com;
> Victor Do Nascimento 
> Subject: [PATCH] middle-end: Fix ifcvt predicate generation for masked 
> function
> calls
> 
> Up until now, due to a latent bug in the code for the ifcvt pass,
> irrespective of the branch taken in a conditional statement, the
> original condition for the if statement was used in masking the
> function call.
> 
> Thus, for code such as:
> 
>   if (a[i] > limit)
> b[i] = fixed_const;
>   else
> b[i] = fn (a[i]);
> 
> we would generate the following (wrong) if-converted tree code:
> 
>   _1 = a[i_1];
>   _2 = _1 > limit;
>   _3 = .MASK_CALL (fn, _1, _2);
>   cstore_4 = _2 ? fixed_const : _3;
> 
> as opposed to the correct expected sequence:
> 
>   _1 = a[i_1];
>   _2 = _1 > limit;
>   _3 = ~_2;
>   _4 = .MASK_CALL (fn, _1, _3);
>   cstore_5 = _2 ? fixed_const : _4;
> 
> This patch ensures that the correct predicate mask generation is
> carried out such that, upon autovectorization, the correct vector
> lanes are selected in the vectorized function call.
> 
> gcc/ChangeLog:
> 
>   * tree-if-conv.cc (predicate_statements): Fix handling of
>   predicated function calls.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-fncall-mask.c: New.
> ---
>  gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c | 31 
>  gcc/tree-if-conv.cc  | 14 -
>  2 files changed, 44 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c
> b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c
> new file mode 100644
> index 000..554488e0630
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile { target { aarch64*-*-* } } } */
> +/* { dg-additional-options "-march=armv8.2-a+sve -fdump-tree-ifcvt-raw 
> -Ofast"
> { target { aarch64*-*-* } } } */
> +
> +extern int __attribute__ ((simd, const)) fn (int);
> +
> +const int N = 20;
> +const float lim = 101.0;
> +const float cst =  -1.0;
> +float tot =   0.0;
> +
> +float b[20];
> +float a[20] = { [0 ... 9] = 1.7014118e39, /* If branch. */
> + [10 ... 19] = 100.0 };/* Else branch.  */
> +
> +int main (void)
> +{
> +  #pragma omp simd
> +  for (int i = 0; i < N; i += 1)
> +{
> +  if (a[i] > lim)
> + b[i] = cst;
> +  else
> + b[i] = fn (a[i]);
> +  tot += b[i];
> +}
> +  return (0);
> +}
> +
> +/* { dg-final { scan-tree-dump {gimple_assign  NULL>} ifcvt } } */
> +/* { dg-final { scan-tree-dump {gimple_assign  NULL>} ifcvt } } */
> +/* { dg-final { scan-tree-dump {gimple_call <.MASK_CALL, _3, fn, _2, _34>} 
> ifcvt } }
> */
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> index 0346a1376c5..246a6bb5bd1 100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -2907,6 +2907,8 @@ predicate_statements (loop_p loop)
>This will cause the vectorizer to match the "in branch"
>clone variants, and serves to build the mask vector
>in a natural way.  */
> +   tree mask = cond;
> +   gimple_seq stmts = NULL;
> gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi));
> tree orig_fn = gimple_call_fn (call);
> int orig_nargs = gimple_call_num_args (call);
> @@ -2914,7 +2916,17 @@ predicate_statements (loop_p loop)
> args.safe_push (orig_fn);
> for (int i = 0; i < orig_nargs; i++)
>   args.safe_push (gimple_call_arg (call, i));
> -   args.safe_push (cond);
> +   /* If `swap', we invert the mask used for the if branch for use
> +  when masking the function call.  */
> +   if (swap)
> + {
> +   tree true_val
> + = constant_boolean_node (true, TREE_TYPE (mask));
> +   mask = gimple_build (&stmts, BIT_XOR_EXPR,
> +TREE_TYPE (mask), mask, true_val);
> + }
> +   gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);

Looks like this mirrors what is currently being done for gimple_assign, but
you can move the gsi_insert_seq and the declaration of stmts into the if
block since they're only used there.

Otherwise looks good to me but can't approve. 

Thanks,
Tamar

> +   args.safe_push (mask);
> 
> /* Replace the call with a IFN_MASK_CALL that has the extra
>condition parameter. */
> --
> 2.34.1
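
The predicate inversion discussed in this thread can be modelled in scalar
code. The sketch below is an assumption-laden illustration: masked_call,
if_convert and twice are hypothetical helpers, not the ifcvt implementation.
It only shows that a call living in the else branch must be masked with the
inverted condition.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical callee standing in for `fn'.
int twice (int x) { return 2 * x; }

// Toy model of a masked call: fn runs only in lanes whose mask is true.
template <std::size_t N>
std::array<int, N> masked_call (int (*fn) (int), const std::array<int, N> &a,
                                const std::array<bool, N> &mask)
{
  std::array<int, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      r[i] = fn (a[i]);
  return r;
}

// if (a[i] > limit) b[i] = fixed_const; else b[i] = fn (a[i]);
template <std::size_t N>
std::array<int, N> if_convert (int (*fn) (int), const std::array<int, N> &a,
                               int limit, int fixed_const)
{
  std::array<bool, N> cond{}, not_cond{};
  for (std::size_t i = 0; i < N; ++i)
    {
      cond[i] = a[i] > limit;
      not_cond[i] = !cond[i];   // the call sits in the else branch
    }
  std::array<int, N> called = masked_call (fn, a, not_cond);
  std::array<int, N> b{};
  for (std::size_t i = 0; i < N; ++i)
    b[i] = cond[i] ? fixed_const : called[i];
  return b;
}
```

Passing cond instead of not_cond to masked_call would leave the else lanes
without a computed value, which mirrors the wrong sequence quoted above.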

[PATCH] phi-opt: Improve factor heurstic with constants and conversions from bool [PR116890]

2024-09-30 Thread Andrew Pinski

Take:
```
  if (t_3(D) != 0)
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 3>:
  _8 = c_4(D) != 0;
  _9 = (int) _8;

  <bb 4>:
  # e_2 = PHI <_9(3), 0(2)>
```

We should factor out the conversion here as that will allow a simplification to
`(t_3 != 0) & (c_4 != 0)`. Unlike most other types, `a ? b : CST` with a boolean
result type will simplify to either `a | b` or `a & b`, so allowing this conversion
for all operations will always be profitable.
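
The boolean identities behind this claim can be checked exhaustively. The
helper names below are made up for this sketch; it covers the two forms
`a ? b : 0` and `a ? 1 : b`.

```cpp
#include <cassert>

// a ? b : 0 on booleans
constexpr bool select_zero (bool a, bool b) { return a ? b : false; }
// a ? 1 : b on booleans
constexpr bool select_one (bool a, bool b) { return a ? true : b; }

// Checks all four input combinations against `a & b' and `a | b'.
constexpr bool identities_hold ()
{
  for (int a = 0; a < 2; ++a)
    for (int b = 0; b < 2; ++b)
      {
        bool x = a != 0, y = b != 0;
        if (select_zero (x, y) != (x && y))
          return false;
        if (select_one (x, y) != (x || y))
          return false;
      }
  return true;
}
```

Since the selects collapse to a single bitwise operation, no branch is
needed once the conversion has been factored out of the PHI.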

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Note on the phi-opt-7.c testcase change: thanks to the factoring out, we are
now able to optimize this and remove the if, so this is an improvement.

PR tree-optimization/116890

gcc/ChangeLog:

* tree-ssa-phiopt.cc (factor_out_conditional_operation): Conversions
from bool should also be considered as wanting to happen.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-7.c: Update testcase for no ifs left.
* gcc.dg/tree-ssa/phi-opt-42.c: New test.
* gcc.dg/tree-ssa/phi-opt-43.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-42.c | 19 +++
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-43.c | 19 +++
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-7.c  |  6 +++---
 gcc/tree-ssa-phiopt.cc | 10 +-
 4 files changed, 50 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-42.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-43.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-42.c 
b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-42.c
new file mode 100644
index 000..62556945159
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-42.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-optimized" } */
+
+/* PR tree-optimization/116890 */
+
+int f(int a, int b, int c)
+{
+  int x;
+  if (c) x = a == 0;
+  else x = 0;
+  return x;
+}
+
+
+/* The if should have been removed as the the conversion from bool to int 
should have been factored out.  */
+/* { dg-final { scan-tree-dump-not "if" "optimized" }  }*/
+/* { dg-final { scan-tree-dump-times "\[^\r\n\]*_\[0-9\]* = a_\[0-9\]*.D. == 
0" 1 "optimized"  } } */
+/* { dg-final { scan-tree-dump-times "\[^\r\n\]*_\[0-9\]* = c_\[0-9\]*.D. != 
0" 1 "optimized"  } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-43.c 
b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-43.c
new file mode 100644
index 000..1d16f283f27
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-43.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-optimized" } */
+
+/* PR tree-optimization/116890 */
+
+int f(_Bool a, _Bool b, int c)
+{
+  int x;
+  if (c) x = a & b;
+  else x = 0;
+  return x;
+}
+
+
+/* The if should have been removed as the the conversion from bool to int 
should have been factored out.  */
+/* { dg-final { scan-tree-dump-not "if" "optimized" }  }*/
+/* { dg-final { scan-tree-dump-times "\[^\r\n\]*_\[0-9\]* = a_\[0-9\]*.D. & 
b_\[0-9\]*.D." 1 "optimized"  } } */
+/* { dg-final { scan-tree-dump-times "\[^\r\n\]*_\[0-9\]* = c_\[0-9\]*.D. != 
0" 1 "optimized"  } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-7.c 
b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-7.c
index 51e1f6dfa75..3ee43e55692 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-7.c
@@ -15,8 +15,8 @@ int f(int t, int c)
   return g(d,e);
 }
 
-/* There should be one ifs as one of them should be changed into
-   a conditional and the other should be there still.  */
-/* { dg-final { scan-tree-dump-times "if" 1 "optimized" }  }*/
+/* There should be no ifs as this is converted into `(t != 0) & (c != 0)`.  */
+/* { dg-final { scan-tree-dump-not "if" "optimized" }  }*/
 /* { dg-final { scan-tree-dump-times "\[^\r\n\]*_\[0-9\]* = c_\[0-9\]*.D. != 
0" 1 "optimized"  } } */
+/* { dg-final { scan-tree-dump-times "\[^\r\n\]*_\[0-9\]* = t_\[0-9\]*.D. != 
0" 1 "optimized"  } } */
 
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index d43832b390b..bd7f9607eb9 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -345,9 +345,17 @@ factor_out_conditional_operation (edge e0, edge e1, gphi 
*phi,
  if (gassign *assign = dyn_cast <gassign *> (stmt))
{
  tree lhs = gimple_assign_lhs (assign);
+ tree lhst = TREE_TYPE (lhs);
  enum tree_code ass_code
= gimple_assign_rhs_code (assign);
- if (ass_code != MAX_EXPR && ass_code != MIN_EXPR)
+ if (ass_code != MAX_EXPR && ass_code != MIN_EXPR
+ /* Conversions from boolean like types is ok
+as `a?1:b` and `a?0:b` will always simplify
+to `a & b` or `a | b`.
+See PR 116890.  */
+ && !(INTEGRAL_TYPE_P (lhst)
+  && TYPE_UNSIGNED (lhst)
+

Re: [PATCH] c++: Avoid "infinite parsing" because of cp_parser_decltype [PR114858]

2024-09-30 Thread Jason Merrill


On 9/17/24 8:14 AM, Simon Martin wrote:

The invalid test case in this PR highlights a bad interaction between
the tentative_firewall and error recovery in cp_parser_decltype: the
firewall makes cp_parser_skip_to_closing_parenthesis a no-op, and the
parser does not make any progress, running "forever".

This patch calls cp_parser_commit_to_tentative_parse before initiating
error recovery.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/114858

gcc/cp/ChangeLog:

* parser.cc (cp_parser_decltype): Commit tentative parse before
initiating error recovery.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/decltype10.C: Adjust test expectation.
* g++.dg/cpp2a/pr114858.C: New test.
---
  gcc/cp/parser.cc|  3 +++
  gcc/testsuite/g++.dg/cpp0x/decltype10.C |  2 ++
  gcc/testsuite/g++.dg/cpp2a/pr114858.C   | 25 +
  3 files changed, 30 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/pr114858.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 4dd9474cf60..3a7c5ffe4c8 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -17508,6 +17508,9 @@ cp_parser_decltype (cp_parser *parser)
/* Parse to the closing `)'.  */
if (expr == error_mark_node || !parens.require_close (parser))
  {
+  /* Commit to the tentative_firewall so we actually skip to the closing
+parenthesis.  */
+  cp_parser_commit_to_tentative_parse (parser);


I don't think this is right.

Earlier in cp_parser_decltype I see

  /* If in_declarator_p, a reparse as an expression might succeed (60361).  
 Otherwise, commit now for better diagnostics.  */

  if (cp_parser_uncommitted_to_tentative_parse_p (parser)
  && !parser->in_declarator_p)
cp_parser_commit_to_topmost_tentative_parse (parser);


Here we're in a declarator, so we didn't commit at that point.  And we 
still don't want to commit if parsing fails; as the comment says, when 
reparsing as an expression-statement it might work.  Though there seems 
not to be a testcase for that...


In trying to come up with a testcase, I wrote this one that already 
fails because the error doesn't happen until after the decltype, so we 
memorize the wrong result:


struct Helper { Helper(int, ...); };
template  struct C;
template<> struct C {};
char A = 1;
Helper testFail(int(A), C{}); // { dg-bogus "C" }

So in the long term we need to overhaul this code to handle reparsing 
even without a syntax error.  But it's not a high priority.


Getting back to your patch, I think the problem is in 
cp_parser_simple_type_specifier:



case RID_DECLTYPE:
  /* Since DR 743, decltype can either be a simple-type-specifier by
 itself or begin a nested-name-specifier.  Parsing it will replace  
 it with a CPP_DECLTYPE, so just rewind and let the CPP_DECLTYPE
 handling below decide what to do.  */

  cp_parser_decltype (parser);
  cp_lexer_set_token_position (parser->lexer, token);
  break;


This assumes that cp_parser_decltype will always succeed, which is 
wrong.  We need to check whether the token actually became CPP_DECLTYPE 
and parser_error if not.


Jason

Re: [PATCH v3] c++: concept in default argument [PR109859]

2024-09-30 Thread Marek Polacek

On Mon, Sep 30, 2024 at 03:02:39PM -0400, Jason Merrill wrote:
> On 9/30/24 1:45 PM, Marek Polacek wrote:
> > On Mon, Sep 30, 2024 at 10:53:04AM -0400, Jason Merrill wrote:
> > > On 9/27/24 5:30 PM, Marek Polacek wrote:
> > > > On Fri, Sep 27, 2024 at 04:57:58PM -0400, Jason Merrill wrote:
> > > > > On 9/18/24 5:06 PM, Marek Polacek wrote:
> > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > > > 
> > > > > > -- >8 --
> > > > > > 1) We're hitting the assert in cp_parser_placeholder_type_specifier.
> > > > > > It says that if it turns out to be false, we should do error() 
> > > > > > instead.
> > > > > > Do so, then.
> > > > > > 
> > > > > > 2) lambda-targ8.C should compile fine, though.  The problem was that
> > > > > > local_variables_forbidden_p wasn't cleared when we're about to parse
> > > > > > the optional template-parameter-list for a lambda in a default 
> > > > > > argument.
> > > > > > 
> > > > > > PR c++/109859
> > > > > > 
> > > > > > gcc/cp/ChangeLog:
> > > > > > 
> > > > > > * parser.cc (cp_parser_lambda_declarator_opt): Temporarily clear
> > > > > > local_variables_forbidden_p.
> > > > > > (cp_parser_placeholder_type_specifier): Turn an assert into an 
> > > > > > error.
> > > > > > 
> > > > > > gcc/testsuite/ChangeLog:
> > > > > > 
> > > > > > * g++.dg/cpp2a/concepts-defarg3.C: New test.
> > > > > > * g++.dg/cpp2a/lambda-targ8.C: New test.
> > > > > > ---
> > > > > > gcc/cp/parser.cc  |  9 +++--
> > > > > > gcc/testsuite/g++.dg/cpp2a/concepts-defarg3.C |  8 
> > > > > > gcc/testsuite/g++.dg/cpp2a/lambda-targ8.C | 10 ++
> > > > > > 3 files changed, 25 insertions(+), 2 deletions(-)
> > > > > > create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-defarg3.C
> > > > > > create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ8.C
> > > > > > 
> > > > > > diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> > > > > > index 4dd9474cf60..bdc4fef243a 100644
> > > > > > --- a/gcc/cp/parser.cc
> > > > > > +++ b/gcc/cp/parser.cc
> > > > > > @@ -11891,6 +11891,11 @@ cp_parser_lambda_declarator_opt 
> > > > > > (cp_parser* parser, tree lambda_expr)
> > > > > >  "lambda templates are only available with "
> > > > > >  "%<-std=c++20%> or %<-std=gnu++20%>");
> > > > > > +  /* Even though the whole lambda may be a default argument, 
> > > > > > its
> > > > > > +template-parameter-list is a context where it's OK to create
> > > > > > +new parameters.  */
> > > > > > +  auto lvf = make_temp_override 
> > > > > > (parser->local_variables_forbidden_p, 0u);
> > > > > > +
> > > > > >   cp_lexer_consume_token (parser->lexer);
> > > > > >   template_param_list = cp_parser_template_parameter_list 
> > > > > > (parser);
> > > > > > @@ -20978,8 +20983,8 @@ cp_parser_placeholder_type_specifier 
> > > > > > (cp_parser *parser, location_t loc,
> > > > > >   /* In a default argument we may not be creating new 
> > > > > > parameters.  */
> > > > > >   if (parser->local_variables_forbidden_p & 
> > > > > > LOCAL_VARS_FORBIDDEN)
> > > > > > {
> > > > > > - /* If this assert turns out to be false, do error() instead.  
> > > > > > */
> > > > > > - gcc_assert (tentative);
> > > > > > + if (!tentative)
> > > > > > +   error_at (loc, "local variables may not appear in this 
> > > > > > context");
> > > > > 
> > > > > There's no local variable in the new testcase, the error should talk 
> > > > > about a
> > > > > concept-name.
> > > > 
> > > > Ah sure.  So like this?
> > > > 
> > > > Tested dg.exp.
> > > > 
> > > > -- >8 --
> > > > 1) We're hitting the assert in cp_parser_placeholder_type_specifier.
> > > > It says that if it turns out to be false, we should do error() instead.
> > > > Do so, then.
> > > > 
> > > > 2) lambda-targ8.C should compile fine, though.  The problem was that
> > > > local_variables_forbidden_p wasn't cleared when we're about to parse
> > > > the optional template-parameter-list for a lambda in a default argument.
> > > > 
> > > > PR c++/109859
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > > * parser.cc (cp_parser_lambda_declarator_opt): Temporarily clear
> > > > local_variables_forbidden_p.
> > > > (cp_parser_placeholder_type_specifier): Turn an assert into an 
> > > > error.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > > 
> > > > * g++.dg/cpp2a/concepts-defarg3.C: New test.
> > > > * g++.dg/cpp2a/lambda-targ8.C: New test.
> > > > ---
> > > >gcc/cp/parser.cc  |  9 +++--
> > > >gcc/testsuite/g++.dg/cpp2a/concepts-defarg3.C |  8 
> > > >gcc/testsuite/g++.dg/cpp2a/lambda-targ8.C | 10 ++
> > > >3 files changed, 25 insertions(+), 2 deletions(-)
> > > >create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-defarg3.C
> > > >create mode 100644

Re: [PATCH v3] c++: concept in default argument [PR109859]

2024-09-30 Thread Jason Merrill


On 9/30/24 1:45 PM, Marek Polacek wrote:

On Mon, Sep 30, 2024 at 10:53:04AM -0400, Jason Merrill wrote:

On 9/27/24 5:30 PM, Marek Polacek wrote:

On Fri, Sep 27, 2024 at 04:57:58PM -0400, Jason Merrill wrote:

On 9/18/24 5:06 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
1) We're hitting the assert in cp_parser_placeholder_type_specifier.
It says that if it turns out to be false, we should do error() instead.
Do so, then.

2) lambda-targ8.C should compile fine, though.  The problem was that
local_variables_forbidden_p wasn't cleared when we're about to parse
the optional template-parameter-list for a lambda in a default argument.
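
The "temporarily clear" behaviour can be pictured as a scoped guard that
restores the old value on exit. The sketch below is a toy under assumptions:
TempOverride and ToyParser are hypothetical and are not GCC's
make_temp_override or cp_parser.

```cpp
#include <cassert>

// Toy RAII guard: sets a variable now, restores the old value on scope exit.
template <typename T>
class TempOverride
{
public:
  TempOverride (T &var, T value) : var_ (var), saved_ (var) { var_ = value; }
  ~TempOverride () { var_ = saved_; }
  TempOverride (const TempOverride &) = delete;
  TempOverride &operator= (const TempOverride &) = delete;

private:
  T &var_;    // variable being overridden
  T saved_;   // value to restore
};

// Hypothetical parser state holding only the flag of interest.
struct ToyParser
{
  unsigned local_variables_forbidden_p = 0;
};

// Returns the flag value seen while the override is active.
unsigned flag_while_parsing_template_params (ToyParser &parser)
{
  TempOverride<unsigned> guard (parser.local_variables_forbidden_p, 0u);
  return parser.local_variables_forbidden_p;
}
```

Inside the guarded scope the flag reads as cleared, and the caller sees the
original value again once the function returns.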

PR c++/109859

gcc/cp/ChangeLog:

* parser.cc (cp_parser_lambda_declarator_opt): Temporarily clear
local_variables_forbidden_p.
(cp_parser_placeholder_type_specifier): Turn an assert into an error.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-defarg3.C: New test.
* g++.dg/cpp2a/lambda-targ8.C: New test.
---
gcc/cp/parser.cc  |  9 +++--
gcc/testsuite/g++.dg/cpp2a/concepts-defarg3.C |  8 
gcc/testsuite/g++.dg/cpp2a/lambda-targ8.C | 10 ++
3 files changed, 25 insertions(+), 2 deletions(-)
create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-defarg3.C
create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ8.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 4dd9474cf60..bdc4fef243a 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -11891,6 +11891,11 @@ cp_parser_lambda_declarator_opt (cp_parser* parser, 
tree lambda_expr)
 "lambda templates are only available with "
 "%<-std=c++20%> or %<-std=gnu++20%>");
+  /* Even though the whole lambda may be a default argument, its
+template-parameter-list is a context where it's OK to create
+new parameters.  */
+  auto lvf = make_temp_override (parser->local_variables_forbidden_p, 0u);
+
  cp_lexer_consume_token (parser->lexer);
  template_param_list = cp_parser_template_parameter_list (parser);
@@ -20978,8 +20983,8 @@ cp_parser_placeholder_type_specifier (cp_parser 
*parser, location_t loc,
  /* In a default argument we may not be creating new parameters.  */
  if (parser->local_variables_forbidden_p & LOCAL_VARS_FORBIDDEN)
{
- /* If this assert turns out to be false, do error() instead.  */
- gcc_assert (tentative);
+ if (!tentative)
+   error_at (loc, "local variables may not appear in this context");


There's no local variable in the new testcase, the error should talk about a
concept-name.


Ah sure.  So like this?

Tested dg.exp.

-- >8 --
1) We're hitting the assert in cp_parser_placeholder_type_specifier.
It says that if it turns out to be false, we should do error() instead.
Do so, then.

2) lambda-targ8.C should compile fine, though.  The problem was that
local_variables_forbidden_p wasn't cleared when we're about to parse
the optional template-parameter-list for a lambda in a default argument.

PR c++/109859

gcc/cp/ChangeLog:

* parser.cc (cp_parser_lambda_declarator_opt): Temporarily clear
local_variables_forbidden_p.
(cp_parser_placeholder_type_specifier): Turn an assert into an error.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-defarg3.C: New test.
* g++.dg/cpp2a/lambda-targ8.C: New test.
---
   gcc/cp/parser.cc  |  9 +++--
   gcc/testsuite/g++.dg/cpp2a/concepts-defarg3.C |  8 
   gcc/testsuite/g++.dg/cpp2a/lambda-targ8.C | 10 ++
   3 files changed, 25 insertions(+), 2 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-defarg3.C
   create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ8.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index f50534f5f39..a92e6a29ba6 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -11891,6 +11891,11 @@ cp_parser_lambda_declarator_opt (cp_parser* parser, 
tree lambda_expr)
 "lambda templates are only available with "
 "%<-std=c++20%> or %<-std=gnu++20%>");
+  /* Even though the whole lambda may be a default argument, its
+template-parameter-list is a context where it's OK to create
+new parameters.  */
+  auto lvf = make_temp_override (parser->local_variables_forbidden_p, 0u);
+
 cp_lexer_consume_token (parser->lexer);
 template_param_list = cp_parser_template_parameter_list (parser);
@@ -20989,8 +20994,8 @@ cp_parser_placeholder_type_specifier (cp_parser 
*parser, location_t loc,
 /* In a default argument we may not be creating new parameters.  */
 if (parser->local_variables_forbidden_p & LOCAL_VARS_FORBIDDEN)
{
- /* If this assert turns out to be false, do error() instead.  */
- gcc_assert (tentative);
+ if (!tentati

Re: [PATCH 3/4] rs6000, Remove redundant built-in __builtin_vsx_xvcvuxwdp

2024-09-30 Thread Carl Love


GCC maintainers:

Here are my responses to the review comments by Kewen.  Unfortunately, 
Kewen is no longer working on GCC power.


I will submit an updated version of the patch with Kewen's suggested 
changes.


 Carl


On 8/9/24 3:11 AM, Kewen.Lin wrote:

rs6000, Remove redundant built-in __builtin_vsx_xvcvuxwdp

The built-in __builtin_vsx_xvcvuxwdp is a duplicate of the overloaded
built-in vec_doubleo.  There are no test cases or documentation for

I think this wording is wrong, __builtin_vsx_xvcvuxwdp is a bif doing
1-1 map to xvcvuxwdp, but vec_doubleo with vector unsigned int is only
mapped to xvcvuxwdp on LE while it's vec_doublee on BE.  So how about
"... __builtin_vsx_xvcvuxwdp can be covered with PVIPR function
vec_doubleo on LE and vec_doublee on BE...".

OK with this wording tweaked, thanks!

Yes, the mapping is LE/BE dependent.  Updated the description as suggested.

    Carl

Re: [PATCH 4/4] rs6000, Add tests and documentation for vector, conversions between integer and float

2024-09-30 Thread Carl Love


GCC maintainers:

Here are my responses to the review comments by Kewen.  Unfortunately, 
Kewen is no longer working on GCC power.


I will submit an updated version of the patch with Kewen's suggested 
changes.


 Carl


On 8/20/24 12:54 AM, Kewen.Lin wrote:

Hi Carl,

on 2024/8/8 01:15, Carl Love wrote:


  GCC maintainers:

The following patch fixes errors in the definition of the 
__builtin_vsx_uns_floate_v2di, __builtin_vsx_uns_floato_v2di and 
__builtin_vsx_uns_float2_v2di built-ins.  The arguments should be unsigned but 
are listed as signed.

Additionally, there are a number of test cases that are missing for the various 
instances of the built-ins.  Additionally, the documentation for the various 
built-ins is missing.

This patch adds the missing test cases and documentation.

The patch has been tested on Power 10 LE and BE with no regressions.

Please let me know if it is acceptable for mainline.  Thanks.

     Carl
-
rs6000, Add tests and documentation for vector conversions between integer and 
float

The arguments for the __builtin_vsx_uns_floate_v2di,
__builtin_vsx_uns_floato_v2di and __builtin_vsx_uns_float2_v2di built-ins
should be unsigned.

Add tests for the following existing integer and long long int to float
built-ins:
   __builtin_altivec_float_sisf (vsi);
   __builtin_altivec_uns_float_sisf (vui);
   __builtin_vsx_floate_v2di (vsll);
   __builtin_vsx_uns_floate_v2di (vull);
   __builtin_vsx_floato_v2di (vsll);
   __builtin_vsx_uns_floato_v2di (vull);
   __builtin_vsx_float2_v2di (vsll, vsll);
   __builtin_vsx_uns_float2_v2di (vull, vull);

Add tests for the vector float to vector int built-ins:
   __builtin_altivec_fix_sfsi
   __builtin_altivec_fixuns_sfsi

The various built-ins are not documented.  The patch adds the missing
documentation for the various built-ins.

This patch fixes the incorrect __builtin_vsx_uns_float[o|e|2]_v2di
argument types and adds test cases for each of the built-ins listed above.

gcc/ChangeLog:
     * config/rs6000/rs6000-builtins.def (__builtin_vsx_uns_floate_v2di,
     __builtin_vsx_uns_floato_v2di,__builtin_vsx_uns_float2_v2di): Change
     argument from signed to unsigned.
     * doc/extend.texi: Add documentation for each of the built-ins.

gcc/testsuite/ChangeLog:
     * gcc.target/powerpc/vsx-int-to-float-runnable.c: New file.
---
  gcc/config/rs6000/rs6000-builtins.def |   6 +-
  gcc/doc/extend.texi   |  37 +++
  .../powerpc/vsx-int-to-float-runnable.c   | 260 ++
  3 files changed, 300 insertions(+), 3 deletions(-)
  create mode 100644 
gcc/testsuite/gcc.target/powerpc/vsx-int-to-float-runnable.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index f2bebd299b2..1227daa1555 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1463,10 +1463,10 @@
    const vd __builtin_vsx_uns_doubleo_v4si (vsi);
  UNS_DOUBLEO_V4SI unsdoubleov4si2 {}

I noticed there are extra four that should be updated together:

const vd __builtin_vsx_uns_doublee_v4si (vsi);
  UNS_DOUBLEE_V4SI unsdoubleev4si2 {}

const vd __builtin_vsx_uns_doubleh_v4si (vsi);
  UNS_DOUBLEH_V4SI unsdoublehv4si2 {}

const vd __builtin_vsx_uns_doublel_v4si (vsi);
  UNS_DOUBLEL_V4SI unsdoublelv4si2 {}

const vd __builtin_vsx_uns_doubleo_v4si (vsi);
  UNS_DOUBLEO_V4SI unsdoubleov4si2 {}


Yes, those definitions are also incorrect.  Fixed.


-  const vf __builtin_vsx_uns_floate_v2di (vsll);
+  const vf __builtin_vsx_uns_floate_v2di (vull);
  UNS_FLOATE_V2DI unsfloatev2di {}

-  const vf __builtin_vsx_uns_floato_v2di (vsll);
+  const vf __builtin_vsx_uns_floato_v2di (vull);
  UNS_FLOATO_V2DI unsfloatov2di {}

    const vsll __builtin_vsx_vsigned_v2df (vd);
@@ -2272,7 +2272,7 @@
    const vss __builtin_vsx_revb_v8hi (vss);
  REVB_V8HI revb_v8hi {}

-  const vf __builtin_vsx_uns_float2_v2di (vsll, vsll);
+  const vf __builtin_vsx_uns_float2_v2di (vull, vull);
  UNS_FLOAT2_V2DI uns_float2_v2di {}

    const vsi __builtin_vsx_vsigned2_v2df (vd, vd);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index bf6f4094040..7ec4f19a6bf 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22919,6 +22919,43 @@ but the index value must be 0.

  Only functions excluded from the PVIPR are listed here.

+The following built-ins convert signed and unsigned vectors of ints and
+long long ints to a vector of 32-bit floating point values.
+
+@smallexample
+vector float __builtin_altivec_float_sisf (vector int);
+vector float __builtin_altivec_uns_float_sisf (vector unsigned int);

These functions are to convert vector {un,}signed int to vector float,
PVIPR has defined "vec_float" for this kind of conversion.  For now,
this function only considers VSX:

[VEC_FLOA

Re: [PATCH 1/4] rs6000, add testcases to the overloaded vec_perm built-in

2024-09-30 Thread Carl Love


GCC maintainers:

Here are my responses to the review comments by Kewen.  Unfortunately,
Kewen is no longer working on GCC power.


I will submit an updated version of the patch with Kewen's suggested 
changes.


 Carl

On 8/9/24 3:11 AM, Kewen.Lin wrote:

Hi Carl,

on 2024/8/8 01:15, Carl Love wrote:

GCC maintainers:

The following patch adds missing test cases for the overloaded vec_perm 
built-in.  It also fixes an issue with printing the 128-bit values in the 
DEBUG section that was noticed when adding the additional test cases.

The patch has been tested on Power 10 LE and BE with no regressions.

Please let me know if it is acceptable for mainline.  Thanks.

   Carl

-

rs6000, add testcases to the overloaded vec_perm built-in

The overloaded vec_perm built-in supports permuting signed and unsigned
vectors of char, bool char, short int, short bool, int, bool,
long long int, long long bool, int128, float and double.  However, not all
of the supported arguments are included in the test cases.  This patch adds
the missing test cases.
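
For readers unfamiliar with the permute operation being tested, here is a minimal scalar sketch of a 16-byte permute.  It assumes the big-endian byte numbering of the underlying instruction and ignores the endian-specific element numbering the compiler applies; perm_bytes_model is a hypothetical helper, not a GCC API.

```c
#include <stdint.h>

/* Scalar model of a 16-byte permute: each control byte selects one byte
   from the 32-byte concatenation of A followed by B.  Only the low five
   bits of a control byte are used.  */
static void
perm_bytes_model (const uint8_t a[16], const uint8_t b[16],
                  const uint8_t ctrl[16], uint8_t result[16])
{
  for (int i = 0; i < 16; i++)
    {
      unsigned sel = ctrl[i] & 0x1f;
      result[i] = sel < 16 ? a[sel] : b[sel - 16];
    }
}
```

The element type of the data vectors does not change this byte-level behaviour, which is why one overloaded built-in can cover every supported vector type.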

Additionally, in the 128-bit debug print statements the expected result and
the result need to be cast to unsigned long long to print correctly.  The
patch makes this additional change to the print statements.
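
A minimal sketch of the kind of cast described above, assuming a target where GCC provides unsigned __int128.  printf has no conversion for 128-bit integers, so the value is split into two 64-bit halves; format_u128 is a hypothetical helper and is not taken from the test case.

```c
#include <stdio.h>

/* Format a 128-bit value as two 64-bit hexadecimal halves.  Each half is
   cast to unsigned long long so that it matches the %llx conversion.  */
static int
format_u128 (char *buf, size_t len, unsigned __int128 value)
{
  unsigned long long hi = (unsigned long long) (value >> 64);
  unsigned long long lo = (unsigned long long) value;
  return snprintf (buf, len, "0x%016llx%016llx", hi, lo);
}
```

Passing the 128-bit value directly to a %llx conversion would be undefined behaviour, which is one way debug output of this kind can end up wrong.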

gcc/ChangeLog:
     * doc/extend.texi: Fix spelling mistake in description of the
     vec_sel built-in.
     Add documentation of the 128-bit vec_perm instance.

gcc/testsuite/ChangeLog:
     * gcc.target/powerpc/vsx-builtin-3.c: Add vec_perm test cases    for
     arguments of type vector signed long long int, long long bool,
     bool, bool short, bool char and pixel,
     vector unsigned long long int, unsigned int, unsigned short int,
     unsigned char.
     Cast arguments for debug prints to unsigned long long.
     * gcc.target/powerpc/builtins-4-int128-runnable.c: Add vec_perm
     test cases for signed and unsigned int128 arguments.

Nit: Some changelog lines have unnecessary newlines and spaces.


Fixed.

---
  gcc/doc/extend.texi   |  12 +-
  .../powerpc/builtins-4-int128-runnable.c  | 108 +++---
  .../gcc.target/powerpc/vsx-builtin-3.c    |  18 +++
  3 files changed, 121 insertions(+), 17 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 48b27ff9f39..bf6f4094040 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21553,9 +21553,19 @@ vector bool __int128 vec_sel (vector bool __int128,
     vector bool __int128, vector unsigned __int128);
  @end smallexample

-The instance is an extension of the exiting overloaded built-in @code{vec_sel}
+The instance is an extension of the existing overloaded built-in @code{vec_sel}
  that is documented in the PVIPR.

Good catch!


+@smallexample
+vector signed __int128 vec_perm (vector signed __int128,
+   vector signed __int128);
+vector unsigned __int128 vec_perm (vector unsigned __int128,
+   vector unsigned __int128);
+@end smallexample
+
+The 128-bit integer arguments for the @code{vec_perm} built-in are in addition
+to the instances that are documented in the PVIPR.

Nit: Maybe just copy the above wording for @code{vec_sel} but replaced with
@code{vec_perm} to keep them consistent.


OK, made them consistent.





diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
index 67c93be1469..b3b76be34b9 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
@@ -39,10 +39,17 @@

  #include 

+extern __vector long long int sll[][4];
There is a "extern __vector long long sll[][4]" below.


+extern __vector long long bool bll[][4];
  extern __vector int si[][4];
+extern __vector bool int bi[][4];

Similar, having "... __vector __bool int bi[][4]" below.


  extern __vector short ss[][4];
+extern __vector bool short bs[][4];

Similar, having "... __vector __bool short bs[][4]" below.


  extern __vector signed char sc[][4];
+extern __vector bool char bc[][4];

Ditto.


+extern __vector pixel p[][4];

Similar, having "... __vector __pixel p[][4]" below.


  extern __vector float f[][4];
+extern __vector unsigned long long int ull[][4];

As above, I think we only need "bll" and "ull" here.


Yes, it looks like I didn't notice that they were previously defined.  It
looks like all I really needed to add is the bll.  There is a ull definition
already for __VSX__, which I think needs to be moved so it is always there.


Surprised the compiler didn't complain about multiple definitions.

   Carl

Re: [PATCH 2/4] rs6000, remove built-ins __builtin_vsx_vperm_8hi and __builtin_vsx_vperm_8hi_uns

2024-09-30 Thread Carl Love


GCC maintainers:

Here are my responses to the review comments by Kewen.  Unfortunately,
Kewen is no longer working on GCC power.


I will submit an updated version of the patch with Kewen's suggested 
changes.


 Carl


On 8/9/24 3:11 AM, Kewen.Lin wrote:

Hi Carl,

on 2024/8/8 01:15, Carl Love wrote:

GCC maintainers:

The following patch removes two redundant built-ins __builtin_vsx_vperm_8hi and 
__builtin_vsx_vperm_8hi_uns.  The built-ins are covered by the overloaded 
vec_perm built-in.

The patch has been tested on Power 10 LE and BE with no regressions.

Please let me know if it is acceptable for mainline.  Thanks.

   Carl

-
rs6000, remove built-ins __builtin_vsx_vperm_8hi and __builtin_vsx_vperm_8hi_uns

The two built-ins __builtin_vsx_vperm_8hi and __builtin_vsx_vperm_8hi_uns
are redundant.  They are covered by the overloaded vec_perm built-in.  The
built-ins are not documented and do not have test cases.

OK for trunk, maybe also mention this is a follow up of r15-1923, thanks!


Yes, added:

  The removal of these built-ins was missed in commit gcc r15-1923 on 
7/9/2024.


to the patch description.

 Carl

[PATCH]middle-end: support SLP early break

2024-09-30 Thread Tamar Christina

Hi all,

This patch introduces feature parity for early break in the SLP-only
vectorizer.

The approach taken here is to treat the early exits as root statements for an
SLP tree.  This means that we don't need any changes to build_slp to support
gconds.

Codegen for the gcond itself now has to be done out of line but the body of the
SLP blocks itself is simply driven by SLP scheduling.  There is a slight
awkwardness in having re-used vectorizable_early_exit for both SLP and non-SLP
but I've documented the differences and when I did try to refactor it, it wasn't
really worth it given that this is a temporary state anyway.

This version is restricted to lane = 1, as such we can re-use the existing
move_early_break function instead of having to do safety update through
scheduling.  I have a branch where I'm working on that but lane > 1 is out of
scope for GCC 15 anyway.   The only reason I will try to get moving through
scheduling done as a stretch goal is so we get epilogue vectorization back for
early break.

The example:

unsigned test4(unsigned x)
{
 unsigned ret = 0;
 for (int i = 0; i < N; i++)
 {
   vect_b[i] = x + i;
   if (vect_a[i]*2 != x)
     break;
   vect_a[i] = x;
 }
 return ret;
}
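
To make the example above compile on its own, the sketch below supplies the missing declarations.  The value of N and the global unsigned arrays are assumptions; the mail does not show them.

```c
#define N 8   /* Assumed size; the mail does not give N.  */

unsigned vect_a[N];
unsigned vect_b[N];

unsigned
test4 (unsigned x)
{
  unsigned ret = 0;
  for (int i = 0; i < N; i++)
    {
      vect_b[i] = x + i;       /* Store before the exit test.  */
      if (vect_a[i] * 2 != x)
        break;                 /* Early exit: the gcond in question.  */
      vect_a[i] = x;           /* Store after the exit test.  */
    }
  return ret;
}
```

In the iteration that takes the break, the store to vect_b[i] has already happened while the store to vect_a[i] has not, so a vectorized version has to preserve exactly that difference.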

builds the following SLP instance for early break:

note:   Analyzing vectorizable control flow: if (patt_6 != 0)
note:   Starting SLP discovery for
note: patt_6 = _4 != x_9(D);
note:   starting SLP discovery for node 0x63abc80
note:   Build SLP for patt_6 = _4 != x_9(D);
note:   precomputed vectype: vector(4) 
note:   nunits = 4
note:   vect_is_simple_use: operand x_9(D), type of def: external
note:   vect_is_simple_use: operand # RANGE [irange] unsigned int [0, 0][2, 
+INF] MASK 0x
_3 * 2, type of def: internal
note:   starting SLP discovery for node 0x63abdc0
note:   Build SLP for _4 = _3 * 2;
note:   precomputed vectype: vector(4) unsigned int
note:   nunits = 4
note:   vect_is_simple_use: operand #
vect_aD.4416[i_15], type of def: internal
note:   vect_is_simple_use: operand 2, type of def: constant
note:   starting SLP discovery for node 0x63abe60
note:   Build SLP for _3 = vect_a[i_15];
note:   precomputed vectype: vector(4) unsigned int
note:   nunits = 4
note:   SLP discovery for node 0x63abe60 succeeded
note:   SLP discovery for node 0x63abdc0 succeeded
note:   SLP discovery for node 0x63abc80 succeeded
note:   SLP size 3 vs. limit 10.
note:   Final SLP tree for instance 0x6474190:
note:   node 0x63abc80 (max_nunits=4, refcnt=2) vector(4) 
note:   op template: patt_6 = _4 != x_9(D);
note:   stmt 0 patt_6 = _4 != x_9(D);
note:   children 0x63abd20 0x63abdc0
note:   node (external) 0x63abd20 (max_nunits=1, refcnt=1)
note:   { x_9(D) }
note:   node 0x63abdc0 (max_nunits=4, refcnt=2) vector(4) unsigned int
note:   op template: _4 = _3 * 2;
note:   stmt 0 _4 = _3 * 2;
note:   children 0x63abe60 0x63abf00
note:   node 0x63abe60 (max_nunits=4, refcnt=2) vector(4) unsigned int
note:   op template: _3 = vect_a[i_15];
note:   stmt 0 _3 = vect_a[i_15];
note:   load permutation { 0 }
note:   node (constant) 0x63abf00 (max_nunits=1, refcnt=1)
note:   { 2 }

and during codegen:

note:   -->vectorizing SLP node starting from: patt_6 = _4 != x_9(D);
note:   vect_is_simple_use: operand # RANGE [irange] unsigned int [0, 0][2, 
+INF] MASK 0x
_3 * 2, type of def: internal
note:   add new stmt: mask_patt_6.18_58 = _53 != vect__4.17_57;
note:=== vectorizable_early_exit ===
note:transform early-exit.
note:   vectorizing stmts using SLP.
note:   Vectorizing SLP tree:
note:   node 0x63abfa0 (max_nunits=4, refcnt=1) vector(4) int
note:   op template: i_12 = i_15 + 1;
note:   stmt 0 i_12 = i_15 + 1;
note:   children 0x63aba00 0x63ac040
note:   node 0x63aba00 (max_nunits=4, refcnt=2) vector(4) int
note:   op template: i_15 = PHI 
note:   [l] stmt 0 i_15 = PHI 
note:   children (nil) (nil)
note:   node (constant) 0x63ac040 (max_nunits=1, refcnt=1) vector(4) int
note:   { 1 }

Bootstrapped and regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
x86_64-pc-linux-gnu -m32, -m64 and no issues.

Also bootstrapped --with-build-config='bootstrap-O3 bootstrap-lto'
--enable-checking=release,yes,rtl,extra on aarch64-none-linux-gnu and
x86_64-pc-linux-gnu -m32, -m64 and no issues.

Ok for master?

gcc/ChangeLog:

* tree-vectorizer.h (enum slp_instance_kind): Add slp_inst_kind_gcond.
(LOOP_VINFO_EARLY_BREAKS_LIVE_STMTS): New.
(vectorizable_early_exit): Expose.
(class _loop_vec_info): Add early_break_live_stmts.
* tree-vect-slp.cc (vect_build_slp_instance, vect_analyze_slp_instance):
Support gcond instances.
(vect_analyze_slp): Analyze gcond roots and early break live statements.
(maybe_push_to_hybrid_worklist): Don't sink gconds.
(vect_slp_analyze_node_operations): Support gconds.
(vect_slp_check_for_roots):

Re: [PATCH] aarch64: Optimise calls to ldexp with SVE FSCALE instruction

2024-09-30 Thread Kyrylo Tkachov

Hi Soumya

> On 30 Sep 2024, at 18:26, Soumya AR wrote:
> 
> This patch uses the FSCALE instruction provided by SVE to implement the
> standard ldexp family of functions.
> 
> Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the
> following code:
> 
> float
> test_ldexpf (float x, int i)
> {
>return __builtin_ldexpf (x, i);
> }
> 
> double
> test_ldexp (double x, int i)
> {
>return __builtin_ldexp(x, i);
> }
> 
> GCC Output:
> 
> test_ldexpf:
>b ldexpf
> 
> test_ldexp:
>b ldexp
> 
> Since SVE has support for an FSCALE instruction, we can use this to process
> scalar floats by moving them to a vector register and performing an fscale
> call, similar to how LLVM tackles an ldexp builtin as well.
> 
> New Output:
> 
> test_ldexpf:
>fmov s31, w0
>ptrue p7.b, all
>fscale z0.s, p7/m, z0.s, z31.s
>ret
> 
> test_ldexp:
>sxtw x0, w0
>ptrue p7.b, all
>fmov d31, x0
>fscale z0.d, p7/m, z0.d, z31.d
>ret
> 
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?
> 
> Signed-off-by: Soumya AR 
> 
> gcc/ChangeLog:
> 
> * config/aarch64/aarch64-sve.md
> (ldexp3): Added a new pattern to match ldexp calls with scalar
> floating modes and expand to the existing pattern for FSCALE.
> (@aarch64_pred_): Extended the pattern to accept SVE
> operands as well as scalar floating modes.
> 
> * config/aarch64/iterators.md:
> SVE_FULL_F_SCALAR: Added an iterator to match all FP SVE modes as well
> as SF and DF.
> VPRED: Extended the attribute to handle GPF modes.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/aarch64/sve/fscale.c: New test.
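
For reference, the operation that the quoted patch maps onto FSCALE is a multiplication by a power of two.  A minimal scalar model, assuming normal in-range results (ldexp_model is a hypothetical name and does not call libm):

```c
/* Reference model of ldexp: multiply X by two raised to the power EXP.
   Repeated doubling or halving is exact for normal, in-range results;
   overflow and subnormal rounding are not modelled faithfully.  */
static double
ldexp_model (double x, int exp)
{
  for (; exp > 0; exp--)
    x *= 2.0;
  for (; exp < 0; exp++)
    x *= 0.5;
  return x;
}
```

For example, ldexp_model (3.0, 4) yields 48.0, the same value ldexp (3.0, 4) returns.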

This patch fixes the bugzilla report at 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111733
So it should be referenced in the ChangeLog entries like so:

PR target/111733
* config/aarch64/aarch64-sve.md 

That way the commit hooks will pick it up and update the bug tracker
accordingly.

> 
> <0001-aarch64-Optimise-calls-to-ldexp-with-SVE-FSCALE-inst.patch>

+(define_expand "ldexp3"
+  [(set (match_operand:GPF 0 "register_operand" "=w")
+   (unspec:GPF
+ [(match_operand:GPF 1 "register_operand" "w")
+  (match_operand: 2 "register_operand" "w")]
+ UNSPEC_COND_FSCALE))]
+  "TARGET_SVE"
+  {
+rtx ptrue = aarch64_ptrue_reg (mode);
+rtx strictness = gen_int_mode (SVE_RELAXED_GP, SImode);
+emit_insn (gen_aarch64_pred_fscale (operands[0], ptrue, operands[1], 
operands[2], strictness));
+DONE;
+  }

Lines should not exceed 80 columns; this should be wrapped.

The patch looks good to me otherwise.
Thanks,
Kyrill
