Re: [committed] i386: Generate strict_low_part QImode insn with high input register

2023-11-15 Thread Uros Bizjak
On Tue, Nov 14, 2023 at 6:37 PM Uros Bizjak  wrote:

> PR target/78904
>
> gcc/ChangeLog:
>
> * config/i386/i386.md (*addqi_ext_1_slp):
> New define_insn_and_split pattern.
> (*subqi_ext_1_slp): Ditto.
> (*qi_ext_1_slp): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr78904-7.c: New test.
> * gcc.target/i386/pr78904-7a.c: New test.
> * gcc.target/i386/pr78904-7b.c: New test.
>
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Eh, I somehow managed to mix up patches. The attached patch is also
needed to avoid testsuite ICE in gcc.c-torture/execute/pr82524.c.

Will commit it later today.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6136e46b1bc..0a9d14e9c08 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6626,7 +6626,7 @@ (define_insn_and_split "*addqi_ext_1_slp"
   (const_int 8)]) 0)
  (match_operand:QI 1 "nonimmediate_operand" "0,!Q")))
(clobber (reg:CC FLAGS_REG))]
-  ""
+  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
   "@
add{b}\t{%h2, %0|%0, %h2}
#"
@@ -6638,8 +6638,8 @@ (define_insn_and_split "*addqi_ext_1_slp"
   (plus:QI
 (subreg:QI
   (match_op_dup 3
-[(match_dup 0) (const_int 8) (const_int 8)]) 0)
-  (match_dup 1)))
+[(match_dup 2) (const_int 8) (const_int 8)]) 0)
+  (match_dup 0)))
   (clobber (reg:CC FLAGS_REG))])]
   ""
   [(set_attr "type" "alu")
@@ -7669,7 +7669,7 @@ (define_insn_and_split "*subqi_ext_1_slp"
   (const_int 8)
   (const_int 8)]) 0)))
(clobber (reg:CC FLAGS_REG))]
-  ""
+  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
   "@
sub{b}\t{%h2, %0|%0, %h2}
#"
@@ -7679,10 +7679,10 @@ (define_insn_and_split "*subqi_ext_1_slp"
(parallel
  [(set (strict_low_part (match_dup 0))
   (minus:QI
-  (match_dup 1)
+(match_dup 0)
 (subreg:QI
   (match_op_dup 3
-[(match_dup 0) (const_int 8) (const_int 8)]) 0)))
+[(match_dup 2) (const_int 8) (const_int 8)]) 0)))
   (clobber (reg:CC FLAGS_REG))])]
   ""
   [(set_attr "type" "alu")
@@ -11494,7 +11494,7 @@ (define_insn_and_split "*qi_ext_1_slp"
   (const_int 8)]) 0)
  (match_operand:QI 1 "nonimmediate_operand" "0,!Q")))
(clobber (reg:CC FLAGS_REG))]
-  ""
+  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
   "@
{b}\t{%h2, %0|%0, %h2}
#"
@@ -11504,10 +11504,10 @@ (define_insn_and_split "*qi_ext_1_slp"
(parallel
  [(set (strict_low_part (match_dup 0))
   (any_logic:QI
-  (match_dup 1)
 (subreg:QI
   (match_op_dup 3
-[(match_dup 0) (const_int 8) (const_int 8)]) 0)))
+[(match_dup 2) (const_int 8) (const_int 8)]) 0)))
+  (match_dup 0)
   (clobber (reg:CC FLAGS_REG))])]
   ""
   [(set_attr "type" "alu")


Re: [PATCH] rs6000: Only enable PCREL on supported ABIs [PR111045]

2023-11-15 Thread Kewen.Lin
Hi,

on 2023/11/15 11:01, Peter Bergner wrote:
> PCREL data accesses are only officially supported on ELFv2.  We currently
> incorrectly enable PCREL on all Power10 compiles in which prefix instructions
> are also enabled.  Rework the option override code so we only enable PCREL
> for those ABIs that actually support it.
> 
> Jeevitha has confirmed this patch fixes the testsuite fallout seen with her
> PR110320 patch.
> 
> This has been bootstrapped and regtested with no regressions on the following
> builds: powerpc64le-linux, powerpc64le-linux --with-cpu=power10 and
> powerpc64-linux - testsuite run in both 32-bit and 64-bit modes.
> Ok for trunk?

OK for trunk and backporting, but wait for two days or so in case Segher and
David have some comments, thanks!

BR,
Kewen

> 
> Ok for the release branches after some burn-in on trunk?
> 
> Peter
> 
> 
> gcc/
>   PR target/111045
>   * config/rs6000/linux64.h (PCREL_SUPPORTED_BY_OS): Only test the ABI.
>   * config/rs6000/rs6000-cpus.def (RS6000_CPU): Remove OPTION_MASK_PCREL
>   from power10.
>   * config/rs6000/predicates.md: Use TARGET_PCREL.
>   * config/rs6000/rs6000-logue.cc (rs6000_decl_ok_for_sibcall): Likewise.
>   (rs6000_global_entry_point_prologue_needed_p): Likewise.
>   (rs6000_output_function_prologue): Likewise.
>   * config/rs6000/rs6000.md: Likewise.
>   * config/rs6000/rs6000.cc (rs6000_option_override_internal): Rework
>   the logic for enabling PCREL by default.
>   (rs6000_legitimize_tls_address): Use TARGET_PCREL.
>   (rs6000_call_template_1): Likewise.
>   (rs6000_indirect_call_template_1): Likewise.
>   (rs6000_longcall_ref): Likewise.
>   (rs6000_call_aix): Likewise.
>   (rs6000_sibcall_aix): Likewise.
>   (rs6000_pcrel_p): Remove.
>   * config/rs6000/rs6000-protos.h (rs6000_pcrel_p): Likewise.
> 
> gcc/testsuite/
>   PR target/111045
>   * gcc.target/powerpc/pr111045.c: New test.
>   * gcc.target/powerpc/float128-constant.c: Add instruction counts for
>   non-pcrel compiles.
> 
> diff --git a/gcc/config/rs6000/linux64.h b/gcc/config/rs6000/linux64.h
> index 98b7255c95f..5b77bd7fd51 100644
> --- a/gcc/config/rs6000/linux64.h
> +++ b/gcc/config/rs6000/linux64.h
> @@ -563,8 +563,5 @@ extern int dot_symbols;
>  #define TARGET_FLOAT128_ENABLE_TYPE 1
>  
>  /* Enable using prefixed PC-relative addressing on POWER10 if the ABI
> -   supports it.  The ELF v2 ABI only supports PC-relative relocations for
> -   the medium code model.  */
> -#define PCREL_SUPPORTED_BY_OS(TARGET_POWER10 && TARGET_PREFIXED  
> \
> -  && ELFv2_ABI_CHECK \
> -  && TARGET_CMODEL == CMODEL_MEDIUM)
> +   supports it.  */
> +#define PCREL_SUPPORTED_BY_OS(ELFv2_ABI_CHECK)
> diff --git a/gcc/config/rs6000/rs6000-cpus.def 
> b/gcc/config/rs6000/rs6000-cpus.def
> index 4f350da378c..fe01a2312ae 100644
> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -256,7 +256,8 @@ RS6000_CPU ("power8", PROCESSOR_POWER8, MASK_POWERPC64 | 
> ISA_2_7_MASKS_SERVER
>   | OPTION_MASK_HTM)
>  RS6000_CPU ("power9", PROCESSOR_POWER9, MASK_POWERPC64 | ISA_3_0_MASKS_SERVER
>   | OPTION_MASK_HTM)
> -RS6000_CPU ("power10", PROCESSOR_POWER10, MASK_POWERPC64 | 
> ISA_3_1_MASKS_SERVER)
> +RS6000_CPU ("power10", PROCESSOR_POWER10, MASK_POWERPC64
> + | (ISA_3_1_MASKS_SERVER & ~OPTION_MASK_PCREL))
>  RS6000_CPU ("powerpc", PROCESSOR_POWERPC, 0)
>  RS6000_CPU ("powerpc64", PROCESSOR_POWERPC64, OPTION_MASK_PPC_GFXOPT
>   | MASK_POWERPC64)
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index ef7d3f214c4..0b76541fc0a 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -1216,7 +1216,7 @@
>&& SYMBOL_REF_DECL (op) != NULL
>&& TREE_CODE (SYMBOL_REF_DECL (op)) == FUNCTION_DECL
>&& (rs6000_fndecl_pcrel_p (SYMBOL_REF_DECL (op))
> -  != rs6000_pcrel_p ()))")))
> +  != TARGET_PCREL))")))
>  
>  ;; Return 1 if this operand is a valid input for a move insn.
>  (define_predicate "input_operand"
> diff --git a/gcc/config/rs6000/rs6000-logue.cc 
> b/gcc/config/rs6000/rs6000-logue.cc
> index 98846f781ec..9e08d9bb4d2 100644
> --- a/gcc/config/rs6000/rs6000-logue.cc
> +++ b/gcc/config/rs6000/rs6000-logue.cc
> @@ -1106,7 +1106,7 @@ rs6000_decl_ok_for_sibcall (tree decl)
>r2 for its caller's TOC.  Such a function may make sibcalls to any
>function, whether local or external, without restriction based on
>TOC-save/restore rules.  */
> -  if (rs6000_pcrel_p ())
> +  if (TARGET_PCREL)
>   return true;
>  
>/* Otherwise, under the AIX or ELFv2 ABIs we can't allow sibcalls
> @@ -2583,7 +2583,7 @@ rs6000_global_entry_point_prologue_needed_p (voi

Re: [PATCH] Clean up by_pieces_ninsns

2023-11-15 Thread Kewen.Lin
Hi,

on 2023/11/15 10:26, HAO CHEN GUI wrote:
> Hi,
>   This patch cleans up by_pieces_ninsns and does following things.
> 1. Do the length and alignment adjustment for by pieces compare when
> overlap operation is enabled.
> 2. Remove unnecessary mov_optab checks.
> 
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with
> no regressions. Is this OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> Clean up by_pieces_ninsns
> 
> Compare by pieces can also be implemented with overlapped operations,
> so it should be taken into account when adjusting the length and
> alignment for overlap.  The mode returned from
> widest_fixed_size_mode_for_size has already been checked against
> mov_optab in by_pieces_mode_supported_p (called from
> widest_fixed_size_mode_for_size), so there is no need to check
> mov_optab again in by_pieces_ninsns.  The patch fixes both issues.
> 
> gcc/
>   * expr.cc (by_pieces_ninsns): Include by pieces compare when
>   do the adjustment for overlap operations.  Remove unnecessary
>   mov_optab check.
> 
> patch.diff
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 3e2a678710d..7cb2c935177 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -1090,18 +1090,15 @@ by_pieces_ninsns (unsigned HOST_WIDE_INT l, unsigned 
> int align,
>unsigned HOST_WIDE_INT n_insns = 0;
>fixed_size_mode mode;
> 
> -  if (targetm.overlap_op_by_pieces_p () && op != COMPARE_BY_PIECES)
> +  if (targetm.overlap_op_by_pieces_p ())
>  {
>/* NB: Round up L and ALIGN to the widest integer mode for
>MAX_SIZE.  */
>mode = widest_fixed_size_mode_for_size (max_size, op);
> -  if (optab_handler (mov_optab, mode) != CODE_FOR_nothing)

These changes are to generic code, so this is not a review.  :)

If it's guaranteed previously, maybe we can replace it with an assertion
like: gcc_assert (optab_handler (mov_optab, mode) != CODE_FOR_nothing);

> - {
> -   unsigned HOST_WIDE_INT up = ROUND_UP (l, GET_MODE_SIZE (mode));
> -   if (up > l)
> - l = up;
> -   align = GET_MODE_ALIGNMENT (mode);
> - }
> +  unsigned HOST_WIDE_INT up = ROUND_UP (l, GET_MODE_SIZE (mode));
> + if (up > l)
> +   l = up;
> +  align = GET_MODE_ALIGNMENT (mode);
>  }
> 
>align = alignment_for_piecewise_move (MOVE_MAX_PIECES, align);
> @@ -1109,12 +1106,10 @@ by_pieces_ninsns (unsigned HOST_WIDE_INT l, unsigned 
> int align,
>while (max_size > 1 && l > 0)
>  {
>mode = widest_fixed_size_mode_for_size (max_size, op);
> -  enum insn_code icode;
> 
>unsigned int modesize = GET_MODE_SIZE (mode);
> 
> -  icode = optab_handler (mov_optab, mode);

... likewise.

BR,
Kewen

> -  if (icode != CODE_FOR_nothing && align >= GET_MODE_ALIGNMENT (mode))
> +  if (align >= GET_MODE_ALIGNMENT (mode))
>   {
> unsigned HOST_WIDE_INT n_pieces = l / modesize;
> l %= modesize;
>


Re: [PATCH] RISC-V: Support trailing vec_init optimization

2023-11-15 Thread Robin Dapp
Hi Juzhe,

thanks, LGTM as it is just a refinement of what we already have.

Regards
 Robin


[PATCH] sched: Remove debug counter sched_block

2023-11-15 Thread Kewen.Lin
Hi,

on 2023/11/10 01:40, Alexander Monakov wrote:

> I agree with the concern. I hoped that solving the problem by skipping the BB
> like the (bit-rotted) debug code needs to would be a minor surgery. As things
> look now, it may be better to remove the non-working sched_block debug counter
> entirely and implement a good solution for the problem at hand.
> 

According to this comment, I made and tested the below patch to remove the
problematic debug counter:

Subject: [PATCH] sched: Remove debug counter sched_block

Currently the debug counter sched_block doesn't work well:
we create dependencies for some insns and expect those
dependencies to be resolved while scheduling, but they can
get skipped when a block is skipped to respect the
sched_block debug counter.

For example, for the below test case:
--
int a, b, c, e, f;
float d;

void
g ()
{
  float h, i[1];
  for (; f;)
if (c)
  {
d *e;
if (b)
  {
float *j = i;
j[0] = 0;
  }
h = d;
  }
  if (h)
a = i[0];
}
--
ICE occurs with option "-O2 -fdbg-cnt=sched_block:1".

As discussed in [1], this debug counter seems useless and can
be removed.  It's also implied that if it were useful and used
often, the above issue would have been noticed and resolved
earlier.  So this patch removes the debug counter.

Bootstrapped and regtested on x86_64-redhat-linux and
powerpc64{,le}-linux-gnu.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635852.html

Is it ok for trunk?

BR,
Kewen
-

gcc/ChangeLog:

* dbgcnt.def (sched_block): Remove.
* sched-rgn.cc (schedule_region): Remove the support of debug count
sched_block.
---
 gcc/dbgcnt.def   |  1 -
 gcc/sched-rgn.cc | 19 ++-
 2 files changed, 6 insertions(+), 14 deletions(-)

diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
index 871cbf75d93..a8c4e61e13d 100644
--- a/gcc/dbgcnt.def
+++ b/gcc/dbgcnt.def
@@ -198,7 +198,6 @@ DEBUG_COUNTER (pre_insn)
 DEBUG_COUNTER (prefetch)
 DEBUG_COUNTER (registered_jump_thread)
 DEBUG_COUNTER (sched2_func)
-DEBUG_COUNTER (sched_block)
 DEBUG_COUNTER (sched_breakdep)
 DEBUG_COUNTER (sched_func)
 DEBUG_COUNTER (sched_insn)
diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
index e5964f54ead..1c8acf5068a 100644
--- a/gcc/sched-rgn.cc
+++ b/gcc/sched-rgn.cc
@@ -3198,20 +3198,13 @@ schedule_region (int rgn)
   current_sched_info->queue_must_finish_empty = current_nr_blocks == 1;

   curr_bb = first_bb;
-  if (dbg_cnt (sched_block))
-{
- int saved_last_basic_block = last_basic_block_for_fn (cfun);
+  int saved_last_basic_block = last_basic_block_for_fn (cfun);

- schedule_block (&curr_bb, bb_state[first_bb->index]);
- gcc_assert (EBB_FIRST_BB (bb) == first_bb);
- sched_rgn_n_insns += sched_n_insns;
- realloc_bb_state_array (saved_last_basic_block);
- save_state_for_fallthru_edge (last_bb, curr_state);
-}
-  else
-{
-  sched_rgn_n_insns += rgn_n_insns;
-}
+  schedule_block (&curr_bb, bb_state[first_bb->index]);
+  gcc_assert (EBB_FIRST_BB (bb) == first_bb);
+  sched_rgn_n_insns += sched_n_insns;
+  realloc_bb_state_array (saved_last_basic_block);
+  save_state_for_fallthru_edge (last_bb, curr_state);

   /* Clean up.  */
   if (current_nr_blocks > 1)
--
2.39.1


Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-15 Thread Kewen.Lin
Hi Alexander/Richard/Jeff,

Thanks for the insightful comments!

on 2023/11/10 22:41, Alexander Monakov wrote:
> 
> On Fri, 10 Nov 2023, Richard Biener wrote:
> 
>> On Fri, Nov 10, 2023 at 3:18 PM Alexander Monakov  wrote:
>>>
>>>
>>> On Fri, 10 Nov 2023, Richard Biener wrote:
>>>
> I'm afraid ignoring debug-only BBs goes contrary to overall var-tracking 
> design:
> DEBUG_INSNs participate in dependency graph so that schedulers can remove 
> or
> mutate them as needed when moving real insns across them.

 Note that debug-only BBs do not exist - the BB would be there even without 
 debug
 insns!
>>>
>>> Yep, sorry, I misspoke when I earlier said
>>>
> and cause divergence when passing through a debug-only BB which would not 
> be
> present at all without -g.
>>>
>>> They are present in the region, but skipped via no_real_insns_p.
>>>
 So instead you have to handle BBs with just debug insns the same you
 handle a completely empty BB.
>>>
>>> Yeah. There would be no problem if the scheduler never used no_real_insns_p
>>> and handled empty and non-empty BBs the same way.
>>
>> And I suppose it would be OK to do that.  Empty BBs are usually removed by
>> CFG cleanup so the situation should only happen in rare corner cases where
>> the fix would be to actually run CFG cleanup ...
> 
> Yeah, sel-sched invokes 'cfg_cleanup (0)' up front, and I suppose that
> may be a preferable compromise for sched-rgn as well.
> 

Inspired by this discussion, I tested the attached patch 1 which is to run
cleanup_cfg (0) first in haifa_sched_init, it's bootstrapped and
regress-tested on x86_64-redhat-linux and powerpc64{,le}-linux-gnu.

Then I assumed some of the current uses of no_real_insns_p won't encounter
empty blocks any more, so I made patch 2 with some explicit assertions, but
unfortunately I got ICEs during bootstrapping in function
compute_priorities.  I'm going to investigate further and post more
findings, but this is a heads-up to check whether this is on the right track.

BR,
Kewen

From 7652655f278cfe0f6271c50aecb56e68e0877cc2 Mon Sep 17 00:00:00 2001
From: Kewen Lin 
Date: Tue, 14 Nov 2023 15:16:47 +0800
Subject: [PATCH] sched: cleanup cfg for empty blocks first in haifa_sched_init
 [PR108273]

PR108273 exposed an inconsistent-states issue between
non-debug mode and debug mode.  As discussed in [1], we
can follow the current practice in sel_global_init and
run cleanup_cfg (0) first to remove empty blocks.

This patch is to follow this direction and remove empty
blocks by cleanup_cfg (0) in haifa_sched_init which
affects sms, ebb and rgn schedulings.

PR rtl-optimization/108273

gcc/ChangeLog:

* haifa-sched.cc (haifa_sched_init): Call cleanup_cfg (0) to remove
empty blocks.
---
 gcc/haifa-sched.cc | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
index 8e8add709b3..e348d1a2119 100644
--- a/gcc/haifa-sched.cc
+++ b/gcc/haifa-sched.cc
@@ -7375,6 +7375,12 @@ haifa_sched_init (void)
   sched_deps_info->generate_spec_deps = 1;
 }
 
+  /* Remove empty blocks to avoid some inconsistency like: we skip
+ empty block in scheduling but don't for empty block + only
+ debug_insn, it could result in different subsequent states
+ and unexpected insn sequence difference.  */
+  cleanup_cfg (0);
+
   /* Initialize luids, dependency caches, target and h_i_d for the
  whole function.  */
   {
-- 
2.39.1

From efe1ed8fb5b151c9c4819ff1fa9af579151e259c Mon Sep 17 00:00:00 2001
From: Kewen Lin 
Date: Tue, 14 Nov 2023 15:39:24 +0800
Subject: [PATCH] sched: Assert we don't have any chance to get empty blocks

att.

gcc/ChangeLog:

* sched-ebb.cc (schedule_ebb): Assert no empty blocks.
* sched-rgn.cc (compute_priorities): Likewise.
(schedule_region): Likewise.
---
 gcc/sched-ebb.cc | 5 -
 gcc/sched-rgn.cc | 7 ++-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/gcc/sched-ebb.cc b/gcc/sched-ebb.cc
index 110fcdbca4d..9fdfd72b6fc 100644
--- a/gcc/sched-ebb.cc
+++ b/gcc/sched-ebb.cc
@@ -492,7 +492,10 @@ schedule_ebb (rtx_insn *head, rtx_insn *tail, bool 
modulo_scheduling)
   last_bb = BLOCK_FOR_INSN (tail);
 
   if (no_real_insns_p (head, tail))
-return BLOCK_FOR_INSN (tail);
+{
+  gcc_unreachable ();
+  return BLOCK_FOR_INSN (tail);
+}
 
   gcc_assert (INSN_P (head) && INSN_P (tail));
 
diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
index 1c8acf5068a..795c455872e 100644
--- a/gcc/sched-rgn.cc
+++ b/gcc/sched-rgn.cc
@@ -3025,7 +3025,10 @@ compute_priorities (void)
   get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
 
   if (no_real_insns_p (head, tail))
-   continue;
+   {
+ gcc_unreachable ();
+ continue;
+   }
 
   rgn_n_insns += set_priorities (head, tail);
 }
@@ -3160,6 +3163,7 @@ schedule_region (int rgn)
 
  if (no_real_insns_p (head, tail

[PATCH] rs6000: New pass to mitigate SP float load perf issue on Power10

2023-11-15 Thread Kewen.Lin
Hi,

As the Power ISA defines, when loading a scalar single precision (SP)
floating point value from memory, the target register holds the value
converted to double precision (DP) format; this is unlike some other
architectures which support SP and DP in registers with separate
formats.  The scalar SP instructions operate on the DP format value
in the register and round the result to fit in SP (but still keep
the value in DP format).

On Power10, a scalar SP floating point load insn is cracked into
two internal operations: one loads the value, the other converts
SP to DP format.  Compared to an uncracked load like a vector SP
load, it has an extra 3-cycle load-to-use penalty.  When
evaluating some critical workloads, we found that for some cases we
don't really need the conversion if all the involved operations are
only with SP format.  In this case, we can replace the scalar SP
loads with vector SP load and splat (no conversion), replace all
involved computation with the corresponding vector operations (with
Power10 slice-based design, we expect the latency of scalar operation
and its equivalent vector operation is the same), that is to promote
the scalar SP loads and their affected computation to vector
operations.

For example for the below case:

void saxpy (int n, float a, float * restrict x, float * restrict y)
{
  for (int i = 0; i < n; ++i)
  y[i] = a*x[i] + y[i];
}

At -O2, the loop body would end up with:

.L3:
lfsx 12,6,9// conv
lfsx 0,5,9 // conv
fmadds 0,0,1,12
stfsx 0,6,9
addi 9,9,4
bdnz .L3

but it can be implemented with:

.L3:
lxvwsx 0,5,9   // load and splat
lxvwsx 12,6,9
xvmaddmsp 0,1,12
stxsiwx 0,6,9  // just store word 1 (BE ordering)
addi 9,9,4
bdnz .L3

Evaluated on Power10, the latter can speed up 23% against the former.

So this patch is to introduce a pass to recognize such case and
change the scalar SP operations with the appropriate vector SP
operations when it's proper.

The processing of this pass starts from scalar SP loads: first it
checks whether a load is valid, then checks all the stmts using its
loaded result, and propagates from them.  This propagation is mainly
done by function visit_stmt, which first checks the validity of the
given stmt, then recursively checks the feeders of its use operands
with visit_stmt, and finally recursively checks all the stmts using
its def with visit_stmt.  The purpose is to ensure all propagated
stmts are valid to be transformed into their equivalent vector
operations.  For some special operands like a constant or a
GIMPLE_NOP def ssa, record them as splatting candidates.  There are
some validity checks like: if the addressing mode can satisfy index
form with some adjustments, if there is the corresponding vector
operation support, and so on.  Once all propagated stmts from one
load are valid, they are transformed by function transform_stmt by
respecting the information in stmt_info like sf_type, new_ops etc.

For example, for the below test case:

  _4 = MEM[(float *)x_13(D) + ivtmp.13_24 * 1];  // stmt1
  _7 = MEM[(float *)y_15(D) + ivtmp.13_24 * 1];  // stmt2
  _8 = .FMA (_4, a_14(D), _7);   // stmt3
  MEM[(float *)y_15(D) + ivtmp.13_24 * 1] = _8;  // stmt4

The processing starts from stmt1, which is taken as valid, adds it
into the chain, then processes its use stmt stmt3, which is also
valid, iterating its operands _4 whose def is stmt1 (visited), a_14
which needs splatting and _7 whose def stmt2 is to be processed.
Then stmt2 is taken as a valid load and it's added into the chain.
All operands _4, a_14 and _7 of stmt3 are processed well, then it's
added into the chain.  Then it processes use stmts of _8 (result of
stmt3), so checks stmt4 which is a valid store.  Since all these
involved stmts are valid to be transformed, we get below finally:

  sf_5 = __builtin_vsx_lxvwsx (ivtmp.13_24, x_13(D));
  sf_25 = __builtin_vsx_lxvwsx (ivtmp.13_24, y_15(D));
  sf_22 = {a_14(D), a_14(D), a_14(D), a_14(D)};
  sf_20 = .FMA (sf_5, sf_22, sf_25);
  __builtin_vsx_stxsiwx (sf_20, ivtmp.13_24, y_15(D));

Since it needs to do some validity checks and adjustments, such as
checking whether a scalar operation has corresponding vector support,
considering that a scalar SP load allows reg + {reg, disp} addressing
modes while vector SP load and splat only allows reg + reg, and also
considering the efficiency of getting UD/DF chains for the affected
operations, we make this a gimple pass.

Since the gimple_isel pass does some gimple massaging, this pass is
placed just before it.  Because this pass can generate some extra
vector construction (like constants, or values converted from int,
etc.) compared to the original scalar code, and it uses more vector
resources than before, it's conservatively not turned on by default
for now.

With the extra code to make this default on Power10, it's bootstrapped
and alm

[PATCH] [i386] APX: Fix EGPR usage in several patterns.

2023-11-15 Thread Hongyu Wang
Hi,

vextract/insert{if}128 cannot use EGPR in their memory operand, so all
related patterns should be adjusted to disable EGPR usage for them.
Also fix a wrong gpr16 attr for insertps.

Bootstrapped/regtested on x86-64-pc-linux-gnu{-m32,}

Ok for master?

gcc/ChangeLog:

* config/i386/sse.md (vec_extract_hi_): Add noavx512vl
alternative with attr addr gpr16 and "jm" constraint.
(vec_extract_hi_): Likewise for SF vector modes.
(@vec_extract_hi_): Likewise.
(*vec_extractv2ti): Likewise.
(vec_set_hi_): Likewise.
* config/i386/mmx.md (@sse4_1_insertps_): Correct gpr16 attr for
each alternative.
---
 gcc/config/i386/mmx.md |  2 +-
 gcc/config/i386/sse.md | 32 
 2 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index a3d08bb9d3b..355538749d1 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1215,7 +1215,7 @@ (define_insn "@sse4_1_insertps_"
 }
 }
   [(set_attr "isa" "noavx,noavx,avx")
-   (set_attr "addr" "*,*,gpr16")
+   (set_attr "addr" "gpr16,gpr16,*")
(set_attr "type" "sselog")
(set_attr "prefix_data16" "1,1,*")
(set_attr "prefix_extra" "1")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index c502582102e..472c2190f89 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -12049,9 +12049,9 @@ (define_insn "vec_extract_hi__mask"
(set_attr "mode" "")])
 
 (define_insn "vec_extract_hi_"
-  [(set (match_operand: 0 "nonimmediate_operand" "=vm")
+  [(set (match_operand: 0 "nonimmediate_operand" "=xjm,vm")
(vec_select:
- (match_operand:VI8F_256 1 "register_operand" "v")
+ (match_operand:VI8F_256 1 "register_operand" "x,v")
  (parallel [(const_int 2) (const_int 3)])))]
   "TARGET_AVX"
 {
@@ -12065,7 +12065,9 @@ (define_insn "vec_extract_hi_"
   else
 return "vextract\t{$0x1, %1, %0|%0, %1, 0x1}";
 }
-  [(set_attr "type" "sselog1")
+  [(set_attr "isa" "noavx512vl,avx512vl")
+   (set_attr "addr" "gpr16,*")
+   (set_attr "type" "sselog1")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "prefix" "vex")
@@ -12132,7 +12134,7 @@ (define_insn "vec_extract_hi__mask"
(set_attr "mode" "")])
 
 (define_insn "vec_extract_hi_"
-  [(set (match_operand: 0 "nonimmediate_operand" "=xm, vm")
+  [(set (match_operand: 0 "nonimmediate_operand" "=xjm, vm")
(vec_select:
  (match_operand:VI4F_256 1 "register_operand" "x, v")
  (parallel [(const_int 4) (const_int 5)
@@ -12141,7 +12143,8 @@ (define_insn "vec_extract_hi_"
   "@
 vextract\t{$0x1, %1, %0|%0, %1, 0x1}
 vextract32x4\t{$0x1, %1, %0|%0, %1, 0x1}"
-  [(set_attr "isa" "*, avx512vl")
+  [(set_attr "isa" "noavx512vl, avx512vl")
+   (set_attr "addr" "gpr16,*")
(set_attr "prefix" "vex, evex")
(set_attr "type" "sselog1")
(set_attr "length_immediate" "1")
@@ -1,7 +12225,7 @@ (define_insn_and_split "@vec_extract_lo_"
   "operands[1] = gen_lowpart (mode, operands[1]);")
 
 (define_insn "@vec_extract_hi_"
-  [(set (match_operand: 0 "nonimmediate_operand" "=xm,vm")
+  [(set (match_operand: 0 "nonimmediate_operand" "=xjm,vm")
(vec_select:
  (match_operand:V16_256 1 "register_operand" "x,v")
  (parallel [(const_int 8) (const_int 9)
@@ -12236,7 +12239,8 @@ (define_insn "@vec_extract_hi_"
   [(set_attr "type" "sselog1")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
-   (set_attr "isa" "*,avx512vl")
+   (set_attr "isa" "noavx512vl,avx512vl")
+   (set_attr "addr" "gpr16,*")
(set_attr "prefix" "vex,evex")
(set_attr "mode" "OI")])
 
@@ -20465,7 +20469,7 @@ (define_split
 })
 
 (define_insn "*vec_extractv2ti"
-  [(set (match_operand:TI 0 "nonimmediate_operand" "=xm,vm")
+  [(set (match_operand:TI 0 "nonimmediate_operand" "=xjm,vm")
(vec_select:TI
  (match_operand:V2TI 1 "register_operand" "x,v")
  (parallel
@@ -20477,6 +20481,8 @@ (define_insn "*vec_extractv2ti"
   [(set_attr "type" "sselog")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
+   (set_attr "isa" "noavx512vl,avx512vl")
+   (set_attr "addr" "gpr16,*")
(set_attr "prefix" "vex,evex")
(set_attr "mode" "OI")])
 
@@ -27556,12 +27562,12 @@ (define_insn "vec_set_lo_"
(set_attr "mode" "")])
 
 (define_insn "vec_set_hi_"
-  [(set (match_operand:VI8F_256 0 "register_operand" "=v")
+  [(set (match_operand:VI8F_256 0 "register_operand" "=x,v")
(vec_concat:VI8F_256
  (vec_select:
-   (match_operand:VI8F_256 1 "register_operand" "v")
+   (match_operand:VI8F_256 1 "register_operand" "x,v")
(parallel [(const_int 0) (const_int 1)]))
- (match_operand: 2 "nonimmediate_operand" "vm")))]
+ (match_operand: 2 "nonimmediate_operand" "xjm,vm")))]
   "TARGET_AVX && "
 {
   if (TARGET_AVX512DQ)
@@ -27571,7 +27577,9 @@ (define_insn "vec_set_hi

Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-15 Thread Alexander Monakov


On Wed, 15 Nov 2023, Kewen.Lin wrote:

> >> And I suppose it would be OK to do that.  Empty BBs are usually removed by
> >> CFG cleanup so the situation should only happen in rare corner cases where
> >> the fix would be to actually run CFG cleanup ...
> > 
> > Yeah, sel-sched invokes 'cfg_cleanup (0)' up front, and I suppose that
> > may be a preferable compromise for sched-rgn as well.
> 
> Inspired by this discussion, I tested the attached patch 1 which is to run
> cleanup_cfg (0) first in haifa_sched_init, it's bootstrapped and
> regress-tested on x86_64-redhat-linux and powerpc64{,le}-linux-gnu.

I don't think you can run cleanup_cfg after sched_init.  I would suggest
putting it early in schedule_insns.

> Then I assumed some of the current uses of no_real_insns_p won't encounter
> empty blocks any more, so made a patch 2 with some explicit assertions, but
> unfortunately I got ICEs during bootstrapping happens in function
> compute_priorities.  I'm going to investigate it further and post more
> findings, but just heads-up to ensure if this is on the right track.

I suspect this may be caused by invoking cleanup_cfg too late.

Alexander


[PATCH 02/16] [APX NDD] Restrict TImode register usage when NDD enabled

2023-11-15 Thread Hongyu Wang
Under APX NDD, the previous TImode allocation has an issue: TImode
values could be allocated to overlapping consecutive pairs, like
rax:rdi, rdi:rdx.

This causes problems for all TImode NDD patterns.  For NDD we no longer
assume arithmetic operations like add have a dependency between dest
and src1, so the write to the 1st highpart rdi will be overridden by
the 2nd lowpart rdi if the 2nd lowpart rdi has a different src as
input; the write to the 1st highpart rdi is then lost and causes
miscompilation.

To resolve this, under TARGET_APX_NDD we only allow registers with even
regno to be allocated for TImode, so TImode registers are allocated in
non-overlapping pairs.

There could be errors for inline assembly that forcibly allocates
__int128 to an odd-numbered general register.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_hard_regno_mode_ok): Restrict even regno
for TImode if APX NDD enabled.
---
 gcc/config/i386/i386.cc | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 683ac643bc8..3779d5b1206 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -20824,6 +20824,16 @@ ix86_hard_regno_mode_ok (unsigned int regno, 
machine_mode mode)
return true;
   return !can_create_pseudo_p ();
 }
+  /* With TImode we previously have assumption that src1/dest will use same
+ register, so the allocation of highpart/lowpart can be consecutive, and
+ 2 TImode insn would held their low/highpart in continuous sequence like
+ rax:rdx, rdx:rcx. This will not work for APX_NDD since NDD allows
+ different registers as dest/src1, when writes to 2nd lowpart will impact
+ the writes to 1st highpart, then the insn will be optimized out. So for
+ TImode pattern if we support NDD form, the allowed register number should
+ be even to avoid such mixed high/low part override. */
+  else if (TARGET_APX_NDD && mode == TImode)
+return regno % 2 == 0;
   /* We handle both integer and floats in the general purpose registers.  */
   else if (VALID_INT_MODE_P (mode)
   || VALID_FP_MODE_P (mode))
-- 
2.31.1



[PATCH 00/16] Support Intel APX NDD

2023-11-15 Thread Hongyu Wang
Hi,

The Intel APX NDD feature has been released in [1].

NDD means New Data Destination.  In such forms, NDD is the new
destination register receiving the result of the computation, and all
other operands (including the original destination operand) become
read-only source operands.  For example:

Existing form | Existing semantics | NDD extension | NDD semantics
INC r/m   | r/m := r/m + 1 | INC ndd, r/m  | ndd := r/m + 1
SUB r/m, imm  | r/m := r/m - imm   | SUB ndd, r/m, imm | ndd(v) := r/m - imm
SUB r/m, reg  | r/m := r/m - reg   | SUB ndd, r/m, reg | ndd(v) := r/m - reg
SUB reg, r/m  | reg := reg - r/m   | SUB ndd, reg, r/m | ndd(v) := reg - r/m
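As a rough illustration of the table above (plain C modeling the data flow, not real codegen): the legacy form consumes its first source as the destination, while the NDD form writes a separate destination and leaves both sources live, which is what saves an extra register-to-register mov when the original value must stay around.

```c
#include <assert.h>

/* Toy model of the two SUB forms.  Legacy SUB reuses its first source as
   the destination; the NDD form writes a separate destination and leaves
   both sources unchanged, so no preparatory mov is needed when the
   original value must stay live.  */
static long
sub_legacy (long *dst_src1, long src2)   /* SUB r/m, reg */
{
  *dst_src1 -= src2;                     /* first source is clobbered */
  return *dst_src1;
}

static long
sub_ndd (long src1, long src2)           /* SUB ndd, r/m, reg */
{
  return src1 - src2;                    /* src1 survives in its register */
}
```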

Theoretically, this will provide more flexibility for compiler optimization.

In this series of patches, we support the following instructions as basic NDD
support:
INC, DEC, NOT, NEG, ADD, SUB, ADC, SBB, AND, OR, XOR, SAL, SAR, SHL, SHR, RCL,
RCR, ROL, ROR, SHLD, SHRD, CMOVcc

In GCC, legacy insns have constraint "0" on operands[1], so for the NDD form
we add extra alternatives like "r, rm, r" and restrict them to NDD only.
We also made the necessary changes in ix86_fixup_*_operators for binary/unary
operations to allow different src and dest, and added several adjustments
to avoid miscompiles under the NDD alternatives, which are explained in each
standalone patch (e.g. there are implicit assumptions in the TImode
doubleword splitter that operands[0] and operands[1] are the same).

This series of patches provides basic NDD support. In the future we will
continue to add NDD optimizations.

Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and SDE; also passed an
SPEC simulation run under SDE.

Hongyu Wang (7):
  [APX NDD] Restrict TImode register usage when NDD enabled
  [APX NDD] Disable seg_prefixed memory usage for NDD add
  [APX NDD] Support APX NDD for left shift insns
  [APX NDD] Support APX NDD for right shift insns
  [APX NDD] Support APX NDD for rotate insns
  [APX NDD] Support APX NDD for shld/shrd insns
  [APX NDD] Support APX NDD for cmove insns

Kong Lingling (9):
  [APX NDD] Support Intel APX NDD for legacy add insn
  [APX NDD] Support APX NDD for optimization patterns of add
  [APX NDD] Support APX NDD for adc insns
  [APX NDD] Support APX NDD for sub insns
  [APX NDD] Support APX NDD for sbb insn
  [APX NDD] Support APX NDD for neg insn
  [APX NDD] Support APX NDD for not insn
  [APX NDD] Support APX NDD for and insn
  [APX NDD] Support APX NDD for or/xor insn

 gcc/config/i386/constraints.md|5 +
 gcc/config/i386/i386-expand.cc|   51 +-
 gcc/config/i386/i386-options.cc   |3 +
 gcc/config/i386/i386-protos.h |   15 +-
 gcc/config/i386/i386.cc   |   40 +-
 gcc/config/i386/i386.md   | 2367 +++--
 gcc/testsuite/gcc.target/i386/apx-ndd-adc.c   |   15 +
 gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c  |   16 +
 gcc/testsuite/gcc.target/i386/apx-ndd-sbb.c   |6 +
 .../gcc.target/i386/apx-ndd-shld-shrd.c   |   24 +
 gcc/testsuite/gcc.target/i386/apx-ndd.c   |  202 ++
 .../gcc.target/i386/apx-spill_to_egprs-1.c|8 +-
 12 files changed, 1984 insertions(+), 768 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-adc.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-sbb.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-shld-shrd.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd.c

-- 
2.31.1



[PATCH 04/16] [APX NDD] Disable seg_prefixed memory usage for NDD add

2023-11-15 Thread Hongyu Wang
NDD uses an EVEX prefix, so when a segment prefix is also applied the
instruction could exceed its 15-byte limit, especially when adding immediates.
This can happen because the "e" constraint accepts any
UNSPEC_TPOFF/UNSPEC_NTPOFF constant, whose offset is applied relative to a
segment register and is therefore encoded with a segment prefix. Disable those
*POFF constants in the NDD add alternatives with a new constraint.
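For context, the *POFF unspecs come from thread-local storage accesses such as the one below. With the local-exec TLS model on x86-64, the load is typically addressed through %fs, and that segment prefix is the extra encoding byte at issue. The asm in the comment is illustrative, not taken from this patch:

```c
#include <assert.h>

/* A TLS access of the kind that produces UNSPEC_TPOFF/UNSPEC_NTPOFF
   offsets.  With the local-exec model on x86-64, reads of `counter'
   are addressed relative to the thread pointer in %fs, e.g. something
   like `movl %fs:counter@tpoff, %eax', so the segment prefix adds a
   byte to the instruction encoding.  */
static _Thread_local int counter = 42;

static int
read_counter (void)
{
  return counter;
}
```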

gcc/ChangeLog:

* config/i386/constraints.md (je): New constraint.
* config/i386/i386-protos.h (x86_no_poff_operand_p): New prototype.
* config/i386/i386.cc (x86_no_poff_operand_p): New function to
check any *POFF constant.
* config/i386/i386.md (*add_1): Split out je alternative for add.
---
 gcc/config/i386/constraints.md |  5 +
 gcc/config/i386/i386-protos.h  |  1 +
 gcc/config/i386/i386.cc| 25 +
 gcc/config/i386/i386.md| 10 +-
 4 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index cbee31fa40a..c6b51324294 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -433,3 +433,8 @@ (define_address_constraint "jb"
 
 (define_register_constraint  "jc"
  "TARGET_APX_EGPR && !TARGET_AVX ? GENERAL_GPR16 : GENERAL_REGS")
+
+(define_constraint  "je"
+  "@internal constant that does not allow any unspec global offsets"
+  (and (match_operand 0 "x86_64_immediate_operand")
+   (match_test "x86_no_poff_operand_p (op)")))
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 3e08eae4e79..5d902e2925b 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -66,6 +66,7 @@ extern bool x86_extended_QIreg_mentioned_p (rtx_insn *);
 extern bool x86_extended_reg_mentioned_p (rtx);
 extern bool x86_extended_rex2reg_mentioned_p (rtx);
 extern bool x86_evex_reg_mentioned_p (rtx [], int);
+extern bool x86_no_poff_operand_p (rtx);
 extern bool x86_maybe_negate_const_int (rtx *, machine_mode);
 extern machine_mode ix86_cc_mode (enum rtx_code, rtx, rtx);
 
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 3779d5b1206..47159b06f7d 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -23292,6 +23292,31 @@ x86_evex_reg_mentioned_p (rtx operands[], int nops)
   return false;
 }
 
+/* Return true when the rtx operand does not contain any UNSPEC_*POFF related
+   constant, to avoid APX_NDD instructions exceeding the encoding length
+   limit.  */
+bool
+x86_no_poff_operand_p (rtx operand)
+{
+  if (GET_CODE (operand) == CONST)
+{
+  rtx op = XEXP (operand, 0);
+  if (GET_CODE (op) == PLUS)
+   op = XEXP (op, 0);
+   
+  if (GET_CODE (op) == UNSPEC)
+   {
+ int unspec = XINT (op, 1);
+ return (unspec != UNSPEC_NTPOFF
+ && unspec != UNSPEC_TPOFF
+ && unspec != UNSPEC_DTPOFF
+ && unspec != UNSPEC_GOTTPOFF
+ && unspec != UNSPEC_GOTNTPOFF
+ && unspec != UNSPEC_INDNTPOFF);
+   }
+}
+  return true;
+}
+
 /* If profitable, negate (without causing overflow) integer constant
of mode MODE at location LOC.  Return true in this case.  */
 bool
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 7ddb2cb2a71..ecd06625a7d 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6359,15 +6359,15 @@ (define_insn_and_split 
"*add3_doubleword_concat_zext"
  "split_double_mode (mode, &operands[0], 1, &operands[0], &operands[5]);")
 
 (define_insn "*add_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,r,r,r")
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,r,r,r,r,r")
(plus:SWI48
- (match_operand:SWI48 1 "nonimmediate_operand" "%0,0,r,r,rm,r")
- (match_operand:SWI48 2 "x86_64_general_operand" "re,BM,0,le,re,BM")))
+ (match_operand:SWI48 1 "nonimmediate_operand" "%0,0,r,r,rm,r,m,r")
+ (match_operand:SWI48 2 "x86_64_general_operand" 
"re,BM,0,le,r,e,je,BM")))
(clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (PLUS, mode, operands,
ix86_can_use_ndd_p (PLUS))"
 {
-  bool use_ndd = (which_alternative == 4 || which_alternative == 5);
+  bool use_ndd = (which_alternative >= 4);
   switch (get_attr_type (insn))
 {
 case TYPE_LEA:
@@ -6398,7 +6398,7 @@ (define_insn "*add_1"
: "add{}\t{%2, %0|%0, %2}";
 }
 }
-  [(set_attr "isa" "*,*,*,*,apx_ndd,apx_ndd")
+  [(set_attr "isa" "*,*,*,*,apx_ndd,apx_ndd,apx_ndd,apx_ndd")
(set (attr "type")
  (cond [(eq_attr "alternative" "3")
   (const_string "lea")
-- 
2.31.1



[PATCH 06/16] [APX NDD] Support APX NDD for sub insns

2023-11-15 Thread Hongyu Wang
From: Kong Lingling 

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_fixup_binary_operands_no_copy):
Add use_ndd parameter.
(ix86_can_use_ndd_p): Add MINUS.
* config/i386/i386-protos.h (ix86_fixup_binary_operands_no_copy):
Change define.
* config/i386/i386.md (sub3): Add NDD constraints.
(*sub_1): Likewise.
(*subsi_1_zext): Likewise.
(*sub_2): Likewise.
(*subsi_2_zext): Likewise.
(subv4): Likewise.
(*subv4): Likewise.
(subv4_1): Likewise.
(usubv4): Likewise.
(*sub_3): Likewise.
(*subsi_3_zext): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Add test for ndd sub.

[APX NDD] Support APX NDD for more optimized sub insn

gcc/ChangeLog:

* config/i386/i386.md
---
 gcc/config/i386/i386-expand.cc  |   6 +-
 gcc/config/i386/i386-protos.h   |   2 +-
 gcc/config/i386/i386.md | 152 
 gcc/testsuite/gcc.target/i386/apx-ndd.c |  13 ++
 4 files changed, 118 insertions(+), 55 deletions(-)
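One subtlety in this patch is the doubleword sub split when the low word of the subtrahend is zero: only the high halves need a real SUB, but under NDD the destination no longer has to alias src1, so the low half must be copied over explicitly (the emit_move_insn added in the hunk below). A rough standalone model, with an invented helper name and plain arrays standing in for register pairs:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the doubleword-sub special case: when the low word of the
   subtrahend is zero, there is no borrow, so only the high halves need a
   SUB.  With NDD the destination may differ from src1, so the low half
   has to be moved over explicitly rather than assumed to be in place.  */
static void
doubleword_sub_lo0 (uint64_t dst[2], const uint64_t src1[2],
		    uint64_t src2_hi)
{
  dst[0] = src1[0];           /* explicit move: NDD dest need not alias src1 */
  dst[1] = src1[1] - src2_hi; /* single-word sub on the high halves */
}
```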

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index ea0e5881087..e5f75875e3b 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -1270,6 +1270,7 @@ bool ix86_can_use_ndd_p (enum rtx_code code)
   switch (code)
 {
 case PLUS:
+case MINUS:
   return true;
 default:
   return false;
@@ -1342,9 +1343,10 @@ ix86_fixup_binary_operands (enum rtx_code code, 
machine_mode mode,
 
 void
 ix86_fixup_binary_operands_no_copy (enum rtx_code code,
-   machine_mode mode, rtx operands[])
+   machine_mode mode, rtx operands[],
+   bool use_ndd)
 {
-  rtx dst = ix86_fixup_binary_operands (code, mode, operands);
+  rtx dst = ix86_fixup_binary_operands (code, mode, operands, use_ndd);
   gcc_assert (dst == operands[0]);
 }
 
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 5d902e2925b..ad895fac72d 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -111,7 +111,7 @@ extern void ix86_expand_vector_move_misalign (machine_mode, 
rtx[]);
 extern rtx ix86_fixup_binary_operands (enum rtx_code,
   machine_mode, rtx[], bool = false);
 extern void ix86_fixup_binary_operands_no_copy (enum rtx_code,
-   machine_mode, rtx[]);
+   machine_mode, rtx[], bool = 
false);
 extern void ix86_expand_binary_operator (enum rtx_code,
 machine_mode, rtx[], bool = false);
 extern void ix86_expand_vector_logical_operator (enum rtx_code,
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index f23859d1172..1aa8469d666 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -7637,7 +7637,8 @@ (define_expand "sub3"
(minus:SDWIM (match_operand:SDWIM 1 "nonimmediate_operand")
 (match_operand:SDWIM 2 "")))]
   ""
-  "ix86_expand_binary_operator (MINUS, mode, operands); DONE;")
+  "ix86_expand_binary_operator (MINUS, mode, operands,
+   ix86_can_use_ndd_p (MINUS)); DONE;")
 
 (define_insn_and_split "*sub3_doubleword"
   [(set (match_operand: 0 "nonimmediate_operand" "=ro,r")
@@ -7663,7 +7664,10 @@ (define_insn_and_split "*sub3_doubleword"
   split_double_mode (mode, &operands[0], 3, &operands[0], &operands[3]);
   if (operands[2] == const0_rtx)
 {
-  ix86_expand_binary_operator (MINUS, mode, &operands[3]);
+  if (!rtx_equal_p (operands[0], operands[1]))
+   emit_move_insn (operands[0], operands[1]);
+  ix86_expand_binary_operator (MINUS, mode, &operands[3],
+  ix86_can_use_ndd_p (MINUS));
   DONE;
 }
 })
@@ -7692,25 +7696,35 @@ (define_insn_and_split "*sub3_doubleword_zext"
   "split_double_mode (mode, &operands[0], 2, &operands[0], 
&operands[3]);")
 
 (define_insn "*sub_1"
-  [(set (match_operand:SWI 0 "nonimmediate_operand" "=m,")
+  [(set (match_operand:SWI 0 "nonimmediate_operand" "=m,i,r,r")
(minus:SWI
- (match_operand:SWI 1 "nonimmediate_operand" "0,0")
- (match_operand:SWI 2 "" ",")))
+ (match_operand:SWI 1 "nonimmediate_operand" "0,0,rm,r")
+ (match_operand:SWI 2 "" ",,r,")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (MINUS, mode, operands)"
-  "sub{}\t{%2, %0|%0, %2}"
-  [(set_attr "type" "alu")
+  "ix86_binary_operator_ok (MINUS, mode, operands,
+   ix86_can_use_ndd_p (MINUS))"
+  "@
+  sub{}\t{%2, %0|%0, %2}
+  sub{}\t{%2, %0|%0, %2}
+  sub{}\t{%2, %1, %0|%0, %1, %2}
+  sub{}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "alu")
(set_attr "mode" "")])
 
 (define_insn "*subsi_1_zext"
-  [(set 

[PATCH 01/16] [APX NDD] Support Intel APX NDD for legacy add insn

2023-11-15 Thread Hongyu Wang
From: Kong Lingling 

APX NDD provides an extra destination register operand for several GPR
related legacy insns, so a new alternative with an "r" constraint can be
adopted for operands[1].

This first patch supports NDD for the add instruction, and keeps using lea
when all operands are registers, since lea has a shorter encoding. For
add operations containing mem, NDD will be adopted to save an extra move.

The legacy x86 binary operation expander forces operands[0] and
operands[1] to be the same, so add a helper function to allow NDD form
patterns in which operands[0] and operands[1] can differ.
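The fixup condition described above boils down to a small predicate. Here is a hypothetical standalone model (plain C, not GCC internals; the name `must_copy_src1` and the boolean flags are invented for illustration):

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the ix86_fixup_binary_operands tweak: in the legacy
   two-operand form, a memory src1 that does not already match the
   destination must first be loaded into a register, but the NDD
   three-operand form can take a memory src1 directly, so the
   force_reg copy is skipped.  */
static bool
must_copy_src1 (bool src1_is_mem, bool src1_equals_dst, bool use_ndd)
{
  return !use_ndd && src1_is_mem && !src1_equals_dst;
}
```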

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_can_use_ndd_p): New function.
(ix86_fixup_binary_operands): Add new use_ndd flag to check
whether ndd can be used for this binop and adjust operand emit.
(ix86_binary_operator_ok): Likewise.
(ix86_expand_binary_operator): Likewise, and avoid the postreload
expand generating the lea pattern when use_ndd is explicitly passed.
* config/i386/i386-options.cc (ix86_option_override_internal):
Prohibit apx subfeatures when not in 64bit mode.
* config/i386/i386-protos.h (ix86_binary_operator_ok):
Add use_ndd flag.
(ix86_fixup_binary_operand): Likewise.
(ix86_expand_binary_operand): Likewise.
* config/i386/i386.md (*add_1): Extend with new alternatives
to support NDD, and adjust output template.
(*addhi_1): Likewise.
(*addqi_1): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: New test.
---
 gcc/config/i386/i386-expand.cc  |  31 +--
 gcc/config/i386/i386-options.cc |   3 +
 gcc/config/i386/i386-protos.h   |   7 +-
 gcc/config/i386/i386.md | 109 ++--
 gcc/testsuite/gcc.target/i386/apx-ndd.c |  21 +
 5 files changed, 118 insertions(+), 53 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index a8d871d321e..ea0e5881087 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -1260,6 +1260,22 @@ ix86_swap_binary_operands_p (enum rtx_code code, 
machine_mode mode,
   return false;
 }
 
+/* APX extends most (but not all) integer instructions with a new form that
+   has a third register operand called a nondestructive destination (NDD). */
+
+bool ix86_can_use_ndd_p (enum rtx_code code)
+{
+  if (!TARGET_APX_NDD)
+return false;
+  switch (code)
+{
+case PLUS:
+  return true;
+default:
+  return false;
+  }
+  return false;
+}
 
 /* Fix up OPERANDS to satisfy ix86_binary_operator_ok.  Return the
destination to use for the operation.  If different from the true
@@ -1267,7 +1283,7 @@ ix86_swap_binary_operands_p (enum rtx_code code, 
machine_mode mode,
 
 rtx
 ix86_fixup_binary_operands (enum rtx_code code, machine_mode mode,
-   rtx operands[])
+   rtx operands[], bool use_ndd)
 {
   rtx dst = operands[0];
   rtx src1 = operands[1];
@@ -1307,7 +1323,7 @@ ix86_fixup_binary_operands (enum rtx_code code, 
machine_mode mode,
 src1 = force_reg (mode, src1);
 
   /* Source 1 cannot be a non-matching memory.  */
-  if (MEM_P (src1) && !rtx_equal_p (dst, src1))
+  if (!use_ndd && MEM_P (src1) && !rtx_equal_p (dst, src1))
 src1 = force_reg (mode, src1);
 
   /* Improve address combine.  */
@@ -1338,11 +1354,11 @@ ix86_fixup_binary_operands_no_copy (enum rtx_code code,
 
 void
 ix86_expand_binary_operator (enum rtx_code code, machine_mode mode,
-rtx operands[])
+rtx operands[], bool use_ndd)
 {
   rtx src1, src2, dst, op, clob;
 
-  dst = ix86_fixup_binary_operands (code, mode, operands);
+  dst = ix86_fixup_binary_operands (code, mode, operands, use_ndd);
   src1 = operands[1];
   src2 = operands[2];
 
@@ -1352,7 +1368,8 @@ ix86_expand_binary_operator (enum rtx_code code, 
machine_mode mode,
 
   if (reload_completed
   && code == PLUS
-  && !rtx_equal_p (dst, src1))
+  && !rtx_equal_p (dst, src1)
+  && !use_ndd)
 {
   /* This is going to be an LEA; avoid splitting it later.  */
   emit_insn (op);
@@ -1451,7 +1468,7 @@ ix86_expand_vector_logical_operator (enum rtx_code code, 
machine_mode mode,
 
 bool
 ix86_binary_operator_ok (enum rtx_code code, machine_mode mode,
-rtx operands[3])
+rtx operands[3], bool use_ndd)
 {
   rtx dst = operands[0];
   rtx src1 = operands[1];
@@ -1475,7 +1492,7 @@ ix86_binary_operator_ok (enum rtx_code code, machine_mode 
mode,
 return false;
 
   /* Source 1 cannot be a non-matching memory.  */
-  if (MEM_P (src1) && !rtx_equal_p (dst, src1))
+  if (!use_ndd && MEM_P (src1) && !rtx_equal_p (dst, src1))
 /* Support "andhi/andsi/anddi" as a zero-extending move.  */
 return (code == AND
&& (mode == HImode
d

[PATCH 08/16] [APX NDD] Support APX NDD for neg insn

2023-11-15 Thread Hongyu Wang
From: Kong Lingling 

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_can_use_ndd_p): Add NEG
support.
(ix86_expand_unary_operator): Add use_ndd parameter and adjust for NDD.
* config/i386/i386-protos.h : Add use_ndd parameter for
ix86_unary_operator_ok and ix86_expand_unary_operator.
* config/i386/i386.cc (ix86_unary_operator_ok): Add ndd constraints,
and add use_ndd parameter.
* config/i386/i386.md (neg2): Add ndd constraints.
(*neg_1): Likewise.
(*neg2_doubleword): Likewise.
(*negsi_1_zext): Likewise.
(*neg_2): Likewise.
(*negsi_2_zext): Likewise.
(*neg_ccc_1): Likewise.
(*neg_ccc_2): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Add neg test.
---
 gcc/config/i386/i386-expand.cc  |  5 +-
 gcc/config/i386/i386-protos.h   |  5 +-
 gcc/config/i386/i386.cc |  5 +-
 gcc/config/i386/i386.md | 79 -
 gcc/testsuite/gcc.target/i386/apx-ndd.c | 29 +
 5 files changed, 90 insertions(+), 33 deletions(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index e5f75875e3b..995cc792c5f 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -1271,6 +1271,7 @@ bool ix86_can_use_ndd_p (enum rtx_code code)
 {
 case PLUS:
 case MINUS:
+case NEG:
   return true;
 default:
   return false;
@@ -1511,7 +1512,7 @@ ix86_binary_operator_ok (enum rtx_code code, machine_mode 
mode,
 
 void
 ix86_expand_unary_operator (enum rtx_code code, machine_mode mode,
-   rtx operands[])
+   rtx operands[], bool use_ndd)
 {
   bool matching_memory = false;
   rtx src, dst, op, clob;
@@ -1530,7 +1531,7 @@ ix86_expand_unary_operator (enum rtx_code code, 
machine_mode mode,
 }
 
   /* When source operand is memory, destination must match.  */
-  if (MEM_P (src) && !matching_memory)
+  if (!use_ndd && MEM_P (src) && !matching_memory)
 src = force_reg (mode, src);
 
   /* Emit the instruction.  */
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index ad895fac72d..0010fd71011 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -128,7 +128,7 @@ extern bool ix86_vec_interleave_v2df_operator_ok (rtx 
operands[3], bool high);
 extern bool ix86_dep_by_shift_count (const_rtx set_insn, const_rtx use_insn);
 extern bool ix86_agi_dependent (rtx_insn *set_insn, rtx_insn *use_insn);
 extern void ix86_expand_unary_operator (enum rtx_code, machine_mode,
-   rtx[]);
+   rtx[], bool = false);
 extern rtx ix86_build_const_vector (machine_mode, bool, rtx);
 extern rtx ix86_build_signbit_mask (machine_mode, bool, bool);
 extern HOST_WIDE_INT ix86_convert_const_vector_to_integer (rtx,
@@ -148,7 +148,8 @@ extern void ix86_split_fp_absneg_operator (enum rtx_code, 
machine_mode,
   rtx[]);
 extern void ix86_expand_copysign (rtx []);
 extern void ix86_expand_xorsign (rtx []);
-extern bool ix86_unary_operator_ok (enum rtx_code, machine_mode, rtx[2]);
+extern bool ix86_unary_operator_ok (enum rtx_code, machine_mode, rtx[2],
+   bool = false);
 extern bool ix86_match_ccmode (rtx, machine_mode);
 extern bool ix86_match_ptest_ccmode (rtx);
 extern void ix86_expand_branch (enum rtx_code, rtx, rtx, rtx);
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 47159b06f7d..9b0715943f7 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -16160,11 +16160,12 @@ ix86_dep_by_shift_count (const_rtx set_insn, 
const_rtx use_insn)
 bool
 ix86_unary_operator_ok (enum rtx_code,
machine_mode,
-   rtx operands[2])
+   rtx operands[2],
+   bool use_ndd)
 {
   /* If one of operands is memory, source and destination must match.  */
   if ((MEM_P (operands[0])
-   || MEM_P (operands[1]))
+   || (!use_ndd && MEM_P (operands[1])))
   && ! rtx_equal_p (operands[0], operands[1]))
 return false;
   return true;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index c3dcfaf52e1..8ba524e9e44 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12952,13 +12952,15 @@ (define_expand "neg2"
   [(set (match_operand:SDWIM 0 "nonimmediate_operand")
(neg:SDWIM (match_operand:SDWIM 1 "nonimmediate_operand")))]
   ""
-  "ix86_expand_unary_operator (NEG, mode, operands); DONE;")
+  "ix86_expand_unary_operator (NEG, mode, operands,
+  ix86_can_use_ndd_p (NEG)); DONE;")
 
 (define_insn_and_split "*neg2_doubleword"
-  [(set (match_operand: 0 "nonimmediate_operand" "=ro")
-   (neg: (match_operand: 1 "nonimmediate_operand" "0")))
+  [(set (match_operand: 0 "n

[PATCH 14/16] [APX NDD] Support APX NDD for rotate insns

2023-11-15 Thread Hongyu Wang
gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_can_use_ndd_p): Add ROTATE
and ROTATERT.
* config/i386/i386.md (*3_1): Extend with a new
alternative to support NDD for SI/DI rotate, and adjust output
template.
(*si3_1_zext): Likewise.
(*3_1): Likewise for QI/HI modes.
(rcrsi2): Likewise.
(rcrdi2): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Add test for left/right rotate.
---
 gcc/config/i386/i386-expand.cc  |  2 +
 gcc/config/i386/i386.md | 91 -
 gcc/testsuite/gcc.target/i386/apx-ndd.c | 20 ++
 3 files changed, 80 insertions(+), 33 deletions(-)
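The output-template selection in the rotate hunks below can be modeled as a small pure function. This is only my reading of the new guard: the short by-one form appears to be skipped when the NDD source register is %rcx, presumably because a `%cl`-style first operand would read as a variable-count rotate. The function and its strings are invented for illustration (using `rol` as a stand-in for the iterator):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Model of the template choice: the implicit-count by-one form is used
   only when allowed by TARGET_SHIFT1/size optimization, and under NDD it
   is additionally avoided when src1 is the count register, falling back
   to an explicit `$1' count.  */
static const char *
rotate_template (bool count_is_one, bool shift1_ok, bool use_ndd,
		 bool src1_is_rcx)
{
  if (count_is_one && shift1_ok && !(use_ndd && src1_is_rcx))
    return use_ndd ? "rol %1, %0" : "rol %0";
  return use_ndd ? "rol %2, %1, %0" : "rol %2, %0";
}
```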

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 8e040346fbb..ab6f14485d6 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -1279,6 +1279,8 @@ bool ix86_can_use_ndd_p (enum rtx_code code)
 case ASHIFT:
 case ASHIFTRT:
 case LSHIFTRT:
+case ROTATE:
+case ROTATERT:
   return true;
 default:
   return false;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 3ff333d4a41..760c0d32f4d 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -16362,13 +16362,15 @@ (define_insn "*bmi2_rorx3_1"
(set_attr "mode" "")])
 
 (define_insn "*3_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r")
(any_rotate:SWI48
- (match_operand:SWI48 1 "nonimmediate_operand" "0,rm")
- (match_operand:QI 2 "nonmemory_operand" "c,")))
+ (match_operand:SWI48 1 "nonimmediate_operand" "0,rm,rm")
+ (match_operand:QI 2 "nonmemory_operand" "c,,c")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (, mode, operands)"
+  "ix86_binary_operator_ok (, mode, operands,
+   ix86_can_use_ndd_p ())"
 {
+  bool use_ndd = (which_alternative == 2);
   switch (get_attr_type (insn))
 {
 case TYPE_ROTATEX:
@@ -16376,14 +16378,18 @@ (define_insn "*3_1"
 
 default:
   if (operands[2] == const1_rtx
- && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
-   return "{}\t%0";
+ && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+ && !(use_ndd && REG_P (operands[1])
+  && REGNO (operands[1]) == CX_REG))
+   return use_ndd ? "{}\t{%1, %0|%0, %1}"
+  : "{}\t%0";
   else
-   return "{}\t{%2, %0|%0, %2}";
+   return use_ndd ? "{}\t{%2, %1, %0|%0, %1, %2}"
+  : "{}\t{%2, %0|%0, %2}";
 }
 }
-  [(set_attr "isa" "*,bmi2")
-   (set_attr "type" "rotate,rotatex")
+  [(set_attr "isa" "*,bmi2,apx_ndd")
+   (set_attr "type" "rotate,rotatex,rotate")
(set (attr "preferred_for_size")
  (cond [(eq_attr "alternative" "0")
  (symbol_ref "true")]
@@ -16433,13 +16439,14 @@ (define_insn "*bmi2_rorxsi3_1_zext"
(set_attr "mode" "SI")])
 
 (define_insn "*si3_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r,r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
(zero_extend:DI
- (any_rotate:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm")
-(match_operand:QI 2 "nonmemory_operand" "cI,I"
+ (any_rotate:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm,rm")
+(match_operand:QI 2 "nonmemory_operand" "cI,I,cI"
(clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT && ix86_binary_operator_ok (, SImode, operands)"
 {
+  bool use_ndd = (which_alternative == 2);
   switch (get_attr_type (insn))
 {
 case TYPE_ROTATEX:
@@ -16447,14 +16454,18 @@ (define_insn "*si3_1_zext"
 
 default:
   if (operands[2] == const1_rtx
- && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
-   return "{l}\t%k0";
+ && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+ && !(use_ndd && REG_P (operands[1])
+  && REGNO (operands[1]) == CX_REG))
+   return use_ndd ? "{l}\t{%1, %k0|%k0, %1}"
+  : "{l}\t%k0";
   else
-   return "{l}\t{%2, %k0|%k0, %2}";
+   return use_ndd ? "{l}\t{%2, %1, %k0|%k0, %1, %2}"
+  : "{l}\t{%2, %k0|%k0, %2}";
 }
 }
-  [(set_attr "isa" "*,bmi2")
-   (set_attr "type" "rotate,rotatex")
+  [(set_attr "isa" "*,bmi2,apx_ndd")
+   (set_attr "type" "rotate,rotatex,rotate")
(set (attr "preferred_for_size")
  (cond [(eq_attr "alternative" "0")
  (symbol_ref "true")]
@@ -16498,19 +16509,27 @@ (define_split
(zero_extend:DI (rotatert:SI (match_dup 1) (match_dup 2])
 
 (define_insn "*3_1"
-  [(set (match_operand:SWI12 0 "nonimmediate_operand" "=m")
-   (any_rotate:SWI12 (match_operand:SWI12 1 "nonimmediate_operand" "0")
- (match_operand:QI 2 "nonmemory_operand" "c")))
+  [(set (match_operand:SWI12 0 "non

[PATCH 03/16] [APX NDD] Support APX NDD for optimization patterns of add

2023-11-15 Thread Hongyu Wang
From: Kong Lingling 

gcc/ChangeLog:

* config/i386/i386.md (addsi_1_zext): Add new alternatives for NDD
and adjust output templates.
(*add_2): Likewise.
(*addsi_2_zext): Likewise.
(*add_3): Likewise.
(*addsi_3_zext): Likewise.
(*adddi_4): Likewise.
(*add_4): Likewise.
(*add_5): Likewise.
(*addv4): Likewise.
(*addv4_1): Likewise.
(*add3_cconly_overflow_1): Likewise.
(*add3_cc_overflow_1): Likewise.
(*addsi3_zext_cc_overflow_1): Likewise.
(*add3_cconly_overflow_2): Likewise.
(*add3_cc_overflow_2): Likewise.
(*addsi3_zext_cc_overflow_2): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Add more test.
---
 gcc/config/i386/i386.md | 314 +++-
 gcc/testsuite/gcc.target/i386/apx-ndd.c |  53 ++--
 2 files changed, 236 insertions(+), 131 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index daab634fea0..7ddb2cb2a71 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6420,12 +6420,13 @@ (define_insn "*add_1"
 ;; patterns constructed from addsi_1 to match.
 
 (define_insn "addsi_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r,r")
(zero_extend:DI
- (plus:SI (match_operand:SI 1 "nonimmediate_operand" "%0,r,r")
-  (match_operand:SI 2 "x86_64_general_operand" "rBMe,0,le"
+ (plus:SI (match_operand:SI 1 "nonimmediate_operand" "%0,r,r,r")
+  (match_operand:SI 2 "x86_64_general_operand" 
"rBMe,0,le,rBMe"
(clobber (reg:CC FLAGS_REG))]
-  "TARGET_64BIT && ix86_binary_operator_ok (PLUS, SImode, operands)"
+  "TARGET_64BIT && ix86_binary_operator_ok (PLUS, SImode, operands,
+   ix86_can_use_ndd_p (PLUS))"
 {
   switch (get_attr_type (insn))
 {
@@ -6434,11 +6435,13 @@ (define_insn "addsi_1_zext"
 
 case TYPE_INCDEC:
   if (operands[2] == const1_rtx)
-return "inc{l}\t%k0";
+return which_alternative == 3 ? "inc{l}\t{%1, %k0|%k0, %1}"
+ : "inc{l}\t%k0";
   else
 {
  gcc_assert (operands[2] == constm1_rtx);
-  return "dec{l}\t%k0";
+ return which_alternative == 3 ? "dec{l}\t{%1, %k0|%k0, %1}"
+   : "dec{l}\t%k0";
}
 
 default:
@@ -6448,12 +6451,15 @@ (define_insn "addsi_1_zext"
 std::swap (operands[1], operands[2]);
 
   if (x86_maybe_negate_const_int (&operands[2], SImode))
-return "sub{l}\t{%2, %k0|%k0, %2}";
+return which_alternative == 3 ? "sub{l}\t{%2, %1, %k0|%k0, %1, %2}"
+ : "sub{l}\t{%2, %k0|%k0, %2}";
 
-  return "add{l}\t{%2, %k0|%k0, %2}";
+  return which_alternative == 3 ? "add{l}\t{%2, %1, %k0|%k0, %1, %2}"
+   : "add{l}\t{%2, %k0|%k0, %2}";
 }
 }
-  [(set (attr "type")
+  [(set_attr "isa" "*,*,*,apx_ndd")
+   (set (attr "type")
  (cond [(eq_attr "alternative" "2")
  (const_string "lea")
(match_operand:SI 2 "incdec_operand")
@@ -6697,37 +6703,43 @@ (define_insn "*add_2"
   [(set (reg FLAGS_REG)
(compare
  (plus:SWI
-   (match_operand:SWI 1 "nonimmediate_operand" "%0,0,")
-   (match_operand:SWI 2 "" ",,0"))
+   (match_operand:SWI 1 "nonimmediate_operand" "%0,0,,rm,r")
+   (match_operand:SWI 2 "" ",,0,r,"))
  (const_int 0)))
-   (set (match_operand:SWI 0 "nonimmediate_operand" "=m,,")
+   (set (match_operand:SWI 0 "nonimmediate_operand" "=m,,,r,r")
(plus:SWI (match_dup 1) (match_dup 2)))]
   "ix86_match_ccmode (insn, CCGOCmode)
-   && ix86_binary_operator_ok (PLUS, mode, operands)"
+   && ix86_binary_operator_ok (PLUS, mode, operands,
+  ix86_can_use_ndd_p (PLUS))"
 {
+  bool use_ndd = (which_alternative == 3 || which_alternative == 4);
   switch (get_attr_type (insn))
 {
 case TYPE_INCDEC:
   if (operands[2] == const1_rtx)
-return "inc{}\t%0";
+return use_ndd ? "inc{}\t{%1, %0|%0, %1}"
+  : "inc{}\t%0";
   else
 {
  gcc_assert (operands[2] == constm1_rtx);
-  return "dec{}\t%0";
+ return use_ndd ? "dec{}\t{%1, %0|%0, %1}"
+: "dec{}\t%0";
}
 
 default:
   if (which_alternative == 2)
 std::swap (operands[1], operands[2]);
 
-  gcc_assert (rtx_equal_p (operands[0], operands[1]));
   if (x86_maybe_negate_const_int (&operands[2], mode))
-return "sub{}\t{%2, %0|%0, %2}";
+return use_ndd ? "sub{}\t{%2, %1, %0|%0, %1, %2}"
+  : "sub{}\t{%2, %0|%0, %2}";
 
-  return "add{}\t{%2, %0|%0, %2}";
+  return use_ndd ? "add{}\t{%2, %1, %0|%0, %1, %2}"
+

[PATCH 09/16] [APX NDD] Support APX NDD for not insn

2023-11-15 Thread Hongyu Wang
From: Kong Lingling 

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_can_use_ndd_p): Add NOT
support.
* config/i386/i386.md (one_cmpl2): Add NDD constraints, adjust
output template.
(*one_cmpl2_1): Likewise.
(*one_cmplqi2_1): Likewise.
(*one_cmpl2_doubleword): Likewise.
(*one_cmplsi2_1_zext): Likewise.
(*one_cmpl2_2): Likewise.
(*one_cmplsi2_2_zext): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Add not test.
---
 gcc/config/i386/i386-expand.cc  |  1 +
 gcc/config/i386/i386.md | 73 +++--
 gcc/testsuite/gcc.target/i386/apx-ndd.c | 11 
 3 files changed, 55 insertions(+), 30 deletions(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 995cc792c5f..be77ba4a476 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -1272,6 +1272,7 @@ bool ix86_can_use_ndd_p (enum rtx_code code)
 case PLUS:
 case MINUS:
 case NEG:
+case NOT:
   return true;
 default:
   return false;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 8ba524e9e44..9758e4e5144 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -13673,64 +13673,73 @@ (define_expand "one_cmpl2"
   [(set (match_operand:SDWIM 0 "nonimmediate_operand")
(not:SDWIM (match_operand:SDWIM 1 "nonimmediate_operand")))]
   ""
-  "ix86_expand_unary_operator (NOT, mode, operands); DONE;")
+  "ix86_expand_unary_operator (NOT, mode, operands,
+  ix86_can_use_ndd_p (NOT)); DONE;")
 
 (define_insn_and_split "*one_cmpl2_doubleword"
-  [(set (match_operand: 0 "nonimmediate_operand" "=ro")
-   (not: (match_operand: 1 "nonimmediate_operand" "0")))]
-  "ix86_unary_operator_ok (NOT, mode, operands)"
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r")
+   (not: (match_operand: 1 "nonimmediate_operand" "0,ro")))]
+  "ix86_unary_operator_ok (NOT, mode, operands,
+  ix86_can_use_ndd_p (NOT))"
   "#"
   "&& reload_completed"
   [(set (match_dup 0)
(not:DWIH (match_dup 1)))
(set (match_dup 2)
(not:DWIH (match_dup 3)))]
-  "split_double_mode (mode, &operands[0], 2, &operands[0], &operands[2]);")
+  "split_double_mode (mode, &operands[0], 2, &operands[0], &operands[2]);"
+  [(set_attr "isa" "*,apx_ndd")])
 
 (define_insn "*one_cmpl2_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,?k")
-   (not:SWI248 (match_operand:SWI248 1 "nonimmediate_operand" "0,k")))]
-  "ix86_unary_operator_ok (NOT, mode, operands)"
+  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r,?k")
+   (not:SWI248 (match_operand:SWI248 1 "nonimmediate_operand" "0,rm,k")))]
+  "ix86_unary_operator_ok (NOT, mode, operands,
+  ix86_can_use_ndd_p (NOT))"
   "@
not{}\t%0
+   not{}\t{%1, %0|%0, %1}
#"
-  [(set_attr "isa" "*,")
-   (set_attr "type" "negnot,msklog")
+  [(set_attr "isa" "*,apx_ndd,")
+   (set_attr "type" "negnot,negnot,msklog")
(set_attr "mode" "")])
 
 (define_insn "*one_cmplsi2_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r,?k")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,?k")
(zero_extend:DI
- (not:SI (match_operand:SI 1 "register_operand" "0,k"]
-  "TARGET_64BIT && ix86_unary_operator_ok (NOT, SImode, operands)"
+ (not:SI (match_operand:SI 1 "register_operand" "0,r,k"]
+  "TARGET_64BIT && ix86_unary_operator_ok (NOT, SImode, operands,
+  ix86_can_use_ndd_p (NOT))"
   "@
not{l}\t%k0
+   not{l}\t{%k1, %k0|%k0, %k1}
#"
-  [(set_attr "isa" "x64,avx512bw_512")
-   (set_attr "type" "negnot,msklog")
-   (set_attr "mode" "SI,SI")])
+  [(set_attr "isa" "x64,apx_ndd,avx512bw_512")
+   (set_attr "type" "negnot,negnot,msklog")
+   (set_attr "mode" "SI,SI,SI")])
 
 (define_insn "*one_cmplqi2_1"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,?k")
-   (not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,k")))]
-  "ix86_unary_operator_ok (NOT, QImode, operands)"
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,r,?k")
+   (not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,rm,k")))]
+  "ix86_unary_operator_ok (NOT, QImode, operands,
+  ix86_can_use_ndd_p (NOT))"
   "@
not{b}\t%0
not{l}\t%k0
+   not{l}\t{%k1, %k0|%k0, %k1}
#"
-  [(set_attr "isa" "*,*,avx512f")
-   (set_attr "type" "negnot,negnot,msklog")
+  [(set_attr "isa" "*,*,apx_ndd,avx512f")
+   (set_attr "type" "negnot,negnot,negnot,msklog")
(set (attr "mode")
-   (cond [(eq_attr "alternative" "1")
+   (cond [(eq_attr "alternative" "1,2")
 (const_string "SI")
-   (and (eq_attr "alternative" "2")
+   (and (eq_attr "alternative" "3")
 (match_test "!TARGET_AVX512DQ"))
 (const_str

[PATCH 12/16] [APX NDD] Support APX NDD for left shift insns

2023-11-15 Thread Hongyu Wang
For left shifts there is an optimization, TARGET_DOUBLE_WITH_ADD, under which
a shift by 1 can be rewritten as an add. Since the NDD form of add requires
the source operand to be a register (NDD cannot take two memory sources), we
currently just keep the NDD-form shift instead of converting it to an add.

The TARGET_SHIFT1 optimization tries to omit the constant shift count of 1,
but under NDD this can create an ambiguous mnemonic such as sal %ecx, %edx,
which would be encoded as the legacy shift sal %cl, %edx; that changes the
expected behavior, since %ecx is actually meant as the NDD source. In that
case we emit $1 explicitly when operands[1] is the CX register.
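The shift-by-1/add equivalence behind TARGET_DOUBLE_WITH_ADD, and the non-destructive behavior the NDD form preserves, can be sketched in plain C. This is a hedged illustration with invented names, not GCC code:

```c
#include <assert.h>
#include <stdint.h>

/* shl by 1 is the same as adding the value to itself, which is why
   TARGET_DOUBLE_WITH_ADD can rewrite it as an add.  */
static uint64_t shl1_as_add (uint64_t src)
{
  return src + src;
}

/* The NDD form writes to a separate destination and leaves src intact:
   conceptually "sal $1, %src, %dst" instead of the destructive
   "sal $1, %dst".  */
static uint64_t shl1_ndd (uint64_t src)
{
  return src << 1;
}
```

Both forms compute the same value; the difference is only whether the source register survives.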

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_can_use_ndd_p): Add ASHIFT.
* config/i386/i386.md (*ashl3_1): Extend with new
alternatives to support NDD, limit the new alternative to
generate sal only, and adjust output template for NDD.
(*ashlsi3_1_zext): Likewise.
(*ashlhi3_1): Likewise.
(*ashlqi3_1): Likewise.
(*ashl3_cmp): Likewise.
(*ashlsi3_cmp_zext): Likewise.
(*ashl3_cconly): Likewise.
(*ashl3_doubleword): Likewise.
(*ashl3_doubleword_highpart): Adjust codegen for NDD.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Add tests for sal.
---
 gcc/config/i386/i386-expand.cc  |   1 +
 gcc/config/i386/i386.md | 194 
 gcc/testsuite/gcc.target/i386/apx-ndd.c |  22 +++
 3 files changed, 150 insertions(+), 67 deletions(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 5f02d557a50..7e3080482a6 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -1276,6 +1276,7 @@ bool ix86_can_use_ndd_p (enum rtx_code code)
 case AND:
 case IOR:
 case XOR:
+case ASHIFT:
   return true;
 default:
   return false;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index cf9842d1a49..a0e81545f17 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14103,13 +14103,14 @@ (define_insn_and_split "*ashl3_doubleword_mask_1"
 })
 
 (define_insn "ashl3_doubleword"
-  [(set (match_operand:DWI 0 "register_operand" "=&r")
-   (ashift:DWI (match_operand:DWI 1 "reg_or_pm1_operand" "0n")
-   (match_operand:QI 2 "nonmemory_operand" "c")))
+  [(set (match_operand:DWI 0 "register_operand" "=&r,r")
+   (ashift:DWI (match_operand:DWI 1 "reg_or_pm1_operand" "0n,r")
+   (match_operand:QI 2 "nonmemory_operand" "c,c")))
(clobber (reg:CC FLAGS_REG))]
   ""
   "#"
-  [(set_attr "type" "multi")])
+  [(set_attr "isa" "*,apx_ndd")
+   (set_attr "type" "multi")])
 
 (define_split
   [(set (match_operand:DWI 0 "register_operand")
@@ -14149,11 +14150,14 @@ (define_insn_and_split "*ashl3_doubleword_highpart"
   [(const_int 0)]
 {
   split_double_mode (mode, &operands[0], 1, &operands[0], &operands[3]);
+  bool use_ndd = ix86_can_use_ndd_p (ASHIFT)
+&& !rtx_equal_p (operands[3], operands[1]);
   int bits = INTVAL (operands[2]) - ( * BITS_PER_UNIT);
-  if (!rtx_equal_p (operands[3], operands[1]))
+  if (!rtx_equal_p (operands[3], operands[1]) || !use_ndd)
 emit_move_insn (operands[3], operands[1]);
+  rtx op_tmp = use_ndd ? operands[1] : operands[3];
   if (bits > 0)
-emit_insn (gen_ashl3 (operands[3], operands[3], GEN_INT (bits)));
+emit_insn (gen_ashl3 (operands[3], op_tmp, GEN_INT (bits)));
   ix86_expand_clear (operands[0]);
   DONE;
 })
@@ -14460,12 +14464,14 @@ (define_insn "*bmi2_ashl3_1"
(set_attr "mode" "")])
 
 (define_insn "*ashl3_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,?k")
-   (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,l,rm,k")
- (match_operand:QI 2 "nonmemory_operand" "c,M,r,")))
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,?k,r")
+   (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,l,rm,k,rm")
+ (match_operand:QI 2 "nonmemory_operand" "c,M,r,,c")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (ASHIFT, mode, operands)"
+  "ix86_binary_operator_ok (ASHIFT, mode, operands,
+   ix86_can_use_ndd_p (ASHIFT))"
 {
+  bool use_ndd = (which_alternative == 4);
   switch (get_attr_type (insn))
 {
 case TYPE_LEA:
@@ -14480,18 +14486,24 @@ (define_insn "*ashl3_1"
 
 default:
   if (operands[2] == const1_rtx
- && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
-   return "sal{}\t%0";
+ && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+ && !(use_ndd && REG_P (operands[1])
+  && REGNO (operands[1]) == CX_REG))
+   return use_ndd ? "sal{}\t{%1, %0|%0, %1}"
+  : "sal{}\t%0";
   else
-   return "sal{}\t{%2, %0|%0, %2}";
+   return use_ndd ? "sal{}\t{%2, %1, %0|%0, %1, %2}"
+  : "sal{}\t{%2, %0|%0, %2}";
 }
 }
-  [(set_attr "isa" "*,*,bmi

[PATCH 05/16] [APX NDD] Support APX NDD for adc insns

2023-11-15 Thread Hongyu Wang
From: Kong Lingling 

Legacy adc patterns are commonly used for TImode add. When extending the
TImode add to its NDD version, operands[0] and operands[1] can differ, so an
extra move must be emitted in the paths of those patterns that optimize the
addition of const0_rtx.
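The add/adc lowering of a doubleword (TImode) add that these patterns perform can be modeled in C. This is an illustrative sketch with invented names, not the actual expander code:

```c
#include <assert.h>
#include <stdint.h>

/* Lower a 128-bit add into 64-bit halves: "add" on the low halves
   produces a carry, and "adc" folds that carry into the high halves.  */
static void add128 (uint64_t dst[2], const uint64_t a[2], const uint64_t b[2])
{
  dst[0] = a[0] + b[0];            /* add: low halves         */
  uint64_t cf = dst[0] < a[0];     /* carry out of the add    */
  dst[1] = a[1] + b[1] + cf;       /* adc: high halves + CF   */
}
```

With a distinct NDD destination, dst need not alias a, which is exactly why the const0_rtx shortcut paths must first copy operands[1] into operands[0].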

gcc/ChangeLog:

* config/i386/i386.md (*add3_doubleword): Add ndd constraints, and
move operands[1] to operands[0] when they are not equal.
(*add3_doubleword_cc_overflow_1): Likewise.
(*add3_doubleword_zext): Add ndd constraints.
(*addv4_doubleword): Likewise.
(*addv4_doubleword_1): Likewise.
(addv4_overflow_1): Likewise.
(*addv4_overflow_2): Likewise.
(@add3_carry): Likewise.
(*add3_carry_0): Likewise.
(*addsi3_carry_zext): Likewise.
(*addsi3_carry_zext_0): Likewise.
(addcarry): Likewise.
(addcarry_0): Likewise.
(*addcarry_1): Likewise.
(*add3_eq): Likewise.
(*add3_ne): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd-adc.c: New test.
---
 gcc/config/i386/i386.md | 203 +---
 gcc/testsuite/gcc.target/i386/apx-ndd-adc.c |  15 ++
 2 files changed, 146 insertions(+), 72 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-adc.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index ecd06625a7d..f23859d1172 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6235,12 +6235,13 @@ (define_expand "add3"
ix86_can_use_ndd_p (PLUS)); DONE;")
 
 (define_insn_and_split "*add3_doubleword"
-  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r")
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r,r,r")
(plus:
- (match_operand: 1 "nonimmediate_operand" "%0,0")
- (match_operand: 2 "x86_64_hilo_general_operand" "r,o")))
+ (match_operand: 1 "nonimmediate_operand" "%0,0,ro,r")
+ (match_operand: 2 "x86_64_hilo_general_operand" "r,o,r,r")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (PLUS, mode, operands)"
+  "ix86_binary_operator_ok (PLUS, mode, operands,
+   ix86_can_use_ndd_p (PLUS))"
   "#"
   "&& reload_completed"
   [(parallel [(set (reg:CCC FLAGS_REG)
@@ -6260,24 +6261,35 @@ (define_insn_and_split "*add3_doubleword"
   split_double_mode (mode, &operands[0], 3, &operands[0], &operands[3]);
   if (operands[2] == const0_rtx)
 {
+  /* Under NDD, op0 and op1 may not be equal; do not delete the insn then.  */
+
+  bool emit_insn_deleted_note_p = true;
+  if (!rtx_equal_p (operands[0], operands[1]))
+   {
+ emit_move_insn (operands[0], operands[1]);
+ emit_insn_deleted_note_p = false;
+   }
   if (operands[5] != const0_rtx)
-   ix86_expand_binary_operator (PLUS, mode, &operands[3]);
+   ix86_expand_binary_operator (PLUS, mode, &operands[3],
+ix86_can_use_ndd_p (PLUS));
   else if (!rtx_equal_p (operands[3], operands[4]))
emit_move_insn (operands[3], operands[4]);
-  else
+  else if (emit_insn_deleted_note_p)
emit_note (NOTE_INSN_DELETED);
   DONE;
 }
-})
+}
+[(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 (define_insn_and_split "*add3_doubleword_zext"
-  [(set (match_operand: 0 "nonimmediate_operand" "=r,o")
+  [(set (match_operand: 0 "nonimmediate_operand" "=r,o,r,r")
(plus:
  (zero_extend:
-   (match_operand:DWIH 2 "nonimmediate_operand" "rm,r")) 
- (match_operand: 1 "nonimmediate_operand" "0,0")))
+   (match_operand:DWIH 2 "nonimmediate_operand" "rm,r,rm,r"))
+ (match_operand: 1 "nonimmediate_operand" "0,0,r,m")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (UNKNOWN, mode, operands)"
+  "ix86_binary_operator_ok (UNKNOWN, mode, operands,
+ ix86_can_use_ndd_p (PLUS))"
   "#"
   "&& reload_completed"
   [(parallel [(set (reg:CCC FLAGS_REG)
@@ -6293,7 +6305,8 @@ (define_insn_and_split "*add3_doubleword_zext"
   (match_dup 4))
 (const_int 0)))
  (clobber (reg:CC FLAGS_REG))])]
- "split_double_mode (mode, &operands[0], 2, &operands[0], &operands[3]);")
+ "split_double_mode (mode, &operands[0], 2, &operands[0], &operands[3]);"
+ [(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 (define_insn_and_split "*add3_doubleword_concat"
   [(set (match_operand: 0 "register_operand" "=&r")
@@ -7269,14 +7282,15 @@ (define_insn_and_split "*addv4_doubleword"
(eq:CCO
  (plus:
(sign_extend:
- (match_operand: 1 "nonimmediate_operand" "%0,0"))
+ (match_operand: 1 "nonimmediate_operand" "%0,0,ro,r"))
(sign_extend:
- (match_operand: 2 "nonimmediate_operand" "r,o")))
+ (match_operand: 2 "nonimmediate_operand" "r,o,r,o")))
  (sign_extend:
(plus: (match_dup 1) (match_dup 2)
-   (set (

[PATCH 07/16] [APX NDD] Support APX NDD for sbb insn

2023-11-15 Thread Hongyu Wang
From: Kong Lingling 

Similar to *add3_doubleword, operands[1] may not equal operands[0], so an
extra move is required.
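The sub/sbb lowering is symmetric to the adc case; here is a minimal C model, with names invented for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Lower a 128-bit subtract into 64-bit halves: "sub" on the low halves
   produces a borrow (the carry flag), and "sbb" subtracts that borrow
   from the high halves.  */
static void sub128 (uint64_t dst[2], const uint64_t a[2], const uint64_t b[2])
{
  uint64_t borrow = a[0] < b[0];   /* CF from the sub        */
  dst[0] = a[0] - b[0];            /* sub: low halves        */
  dst[1] = a[1] - b[1] - borrow;   /* sbb: high halves - CF  */
}
```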

gcc/ChangeLog:

* config/i386/i386.md (*sub3_doubleword): Add ndd constraints, and
emit move when operands[0] not equal to operands[1].
(*sub3_doubleword_zext): Likewise.
(*subv4_doubleword): Likewise.
(*subv4_doubleword_1): Likewise.
(*subv4_overflow_1): Likewise.
(*subv4_overflow_2): Likewise.
(*addsi3_carry_zext_0r): Likewise.
(@sub3_carry): Add NDD alternatives and adjust output templates.
(*subsi3_carry_zext): Likewise.
(subborrow): Likewise.
(subborrow_0): Likewise.
(*sub3_eq): Likewise.
(*sub3_ne): Likewise.
(*sub3_eq_1): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd-sbb.c: New test.
---
 gcc/config/i386/i386.md | 159 
 gcc/testsuite/gcc.target/i386/apx-ndd-sbb.c |   6 +
 2 files changed, 106 insertions(+), 59 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-sbb.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 1aa8469d666..c3dcfaf52e1 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -7641,12 +7641,13 @@ (define_expand "sub3"
ix86_can_use_ndd_p (MINUS)); DONE;")
 
 (define_insn_and_split "*sub3_doubleword"
-  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r")
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r,r,r")
(minus:
- (match_operand: 1 "nonimmediate_operand" "0,0")
- (match_operand: 2 "x86_64_hilo_general_operand" "r,o")))
+ (match_operand: 1 "nonimmediate_operand" "0,0,ro,r")
+ (match_operand: 2 "x86_64_hilo_general_operand" "r,o,r,o")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (MINUS, mode, operands)"
+  "ix86_binary_operator_ok (MINUS, mode, operands,
+   ix86_can_use_ndd_p (MINUS))"
   "#"
   "&& reload_completed"
   [(parallel [(set (reg:CC FLAGS_REG)
@@ -7670,16 +7671,18 @@ (define_insn_and_split "*sub3_doubleword"
   ix86_can_use_ndd_p (MINUS));
   DONE;
 }
-})
+}
+[(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 (define_insn_and_split "*sub3_doubleword_zext"
-  [(set (match_operand: 0 "nonimmediate_operand" "=r,o")
+  [(set (match_operand: 0 "nonimmediate_operand" "=r,o,r,r")
(minus:
- (match_operand: 1 "nonimmediate_operand" "0,0")
+ (match_operand: 1 "nonimmediate_operand" "0,0,r,o")
  (zero_extend:
-   (match_operand:DWIH 2 "nonimmediate_operand" "rm,r"
+   (match_operand:DWIH 2 "nonimmediate_operand" "rm,r,rm,r"
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (UNKNOWN, mode, operands)"
+  "ix86_binary_operator_ok (UNKNOWN, mode, operands,
+   ix86_can_use_ndd_p (MINUS))"
   "#"
   "&& reload_completed"
   [(parallel [(set (reg:CC FLAGS_REG)
@@ -7693,7 +7696,8 @@ (define_insn_and_split "*sub3_doubleword_zext"
   (ltu:DWIH (reg:CC FLAGS_REG) (const_int 0)))
 (const_int 0)))
  (clobber (reg:CC FLAGS_REG))])]
-  "split_double_mode (mode, &operands[0], 2, &operands[0], &operands[3]);")
+  "split_double_mode (mode, &operands[0], 2, &operands[0], &operands[3]);"
+[(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 (define_insn "*sub_1"
   [(set (match_operand:SWI 0 "nonimmediate_operand" "=m,i,r,r")
@@ -7929,14 +7933,15 @@ (define_insn_and_split "*subv4_doubleword"
(eq:CCO
  (minus:
(sign_extend:
- (match_operand: 1 "nonimmediate_operand" "0,0"))
+ (match_operand: 1 "nonimmediate_operand" "0,0,ro,r"))
(sign_extend:
- (match_operand: 2 "nonimmediate_operand" "r,o")))
+ (match_operand: 2 "nonimmediate_operand" "r,o,r,o")))
  (sign_extend:
(minus: (match_dup 1) (match_dup 2)
-   (set (match_operand: 0 "nonimmediate_operand" "=ro,r")
+   (set (match_operand: 0 "nonimmediate_operand" "=ro,r,r,r")
(minus: (match_dup 1) (match_dup 2)))]
-  "ix86_binary_operator_ok (MINUS, mode, operands)"
+  "ix86_binary_operator_ok (MINUS, mode, operands,
+   ix86_can_use_ndd_p (MINUS))"
   "#"
   "&& reload_completed"
   [(parallel [(set (reg:CC FLAGS_REG)
@@ -7964,22 +7969,24 @@ (define_insn_and_split "*subv4_doubleword"
 (match_dup 5)))])]
 {
   split_double_mode (mode, &operands[0], 3, &operands[0], &operands[3]);
-})
+}
+[(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 (define_insn_and_split "*subv4_doubleword_1"
   [(set (reg:CCO FLAGS_REG)
(eq:CCO
  (minus:
(sign_extend:
- (match_operand: 1 "nonimmediate_operand" "0"))
+ (match_operand: 1 "nonimmediate_operand" "0,ro"))
(match_operand: 3 "const_scalar_int_operand"))
 

[PATCH 16/16] [APX NDD] Support APX NDD for cmove insns

2023-11-15 Thread Hongyu Wang
gcc/ChangeLog:

* config/i386/i386.md (*movcc_noc): Extend with new constraints
to support NDD.
(*movsicc_noc_zext): Likewise.
(*movsicc_noc_zext_1): Likewise.
(*movqicc_noc): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd-cmov.c: New test.
---
 gcc/config/i386/i386.md  | 48 
 gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c | 16 +++
 2 files changed, 45 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 2e3d37d08b0..2ae9aaf59fb 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -24119,47 +24119,56 @@ (define_split
(neg:SWI (ltu:SWI (reg:CCC FLAGS_REG) (const_int 0])
 
 (define_insn "*movcc_noc"
-  [(set (match_operand:SWI248 0 "register_operand" "=r,r")
+  [(set (match_operand:SWI248 0 "register_operand" "=r,r,r,r")
(if_then_else:SWI248 (match_operator 1 "ix86_comparison_operator"
   [(reg FLAGS_REG) (const_int 0)])
- (match_operand:SWI248 2 "nonimmediate_operand" "rm,0")
- (match_operand:SWI248 3 "nonimmediate_operand" "0,rm")))]
+ (match_operand:SWI248 2 "nonimmediate_operand" "rm,0,rm,r")
+ (match_operand:SWI248 3 "nonimmediate_operand" "0,rm,r,rm")))]
   "TARGET_CMOVE && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "@
cmov%O2%C1\t{%2, %0|%0, %2}
-   cmov%O2%c1\t{%3, %0|%0, %3}"
-  [(set_attr "type" "icmov")
+   cmov%O2%c1\t{%3, %0|%0, %3}
+   cmov%O2%C1\t{%2, %3, %0|%0, %3, %2}
+   cmov%O2%c1\t{%3, %2, %0|%0, %2, %3}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "icmov")
(set_attr "mode" "")])
 
 (define_insn "*movsicc_noc_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r,r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r,r")
(if_then_else:DI (match_operator 1 "ix86_comparison_operator"
   [(reg FLAGS_REG) (const_int 0)])
  (zero_extend:DI
-   (match_operand:SI 2 "nonimmediate_operand" "rm,0"))
+   (match_operand:SI 2 "nonimmediate_operand" "rm,0,rm,r"))
  (zero_extend:DI
-   (match_operand:SI 3 "nonimmediate_operand" "0,rm"]
+   (match_operand:SI 3 "nonimmediate_operand" "0,rm,r,rm"]
   "TARGET_64BIT
&& TARGET_CMOVE && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "@
cmov%O2%C1\t{%2, %k0|%k0, %2}
-   cmov%O2%c1\t{%3, %k0|%k0, %3}"
-  [(set_attr "type" "icmov")
+   cmov%O2%c1\t{%3, %k0|%k0, %3}
+   cmov%O2%C1\t{%2, %3, %k0|%k0, %3, %2}
+   cmov%O2%c1\t{%3, %2, %k0|%k0, %2, %3}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "icmov")
(set_attr "mode" "SI")])
 
 (define_insn "*movsicc_noc_zext_1"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r")
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r,r")
(zero_extend:DI
  (if_then_else:SI (match_operator 1 "ix86_comparison_operator"
 [(reg FLAGS_REG) (const_int 0)])
-(match_operand:SI 2 "nonimmediate_operand" "rm,0")
-(match_operand:SI 3 "nonimmediate_operand" "0,rm"]
+(match_operand:SI 2 "nonimmediate_operand" "rm,0,rm,r")
+(match_operand:SI 3 "nonimmediate_operand" "0,rm,r,rm"]
   "TARGET_64BIT
&& TARGET_CMOVE && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "@
cmov%O2%C1\t{%2, %k0|%k0, %2}
-   cmov%O2%c1\t{%3, %k0|%k0, %3}"
-  [(set_attr "type" "icmov")
+   cmov%O2%c1\t{%3, %k0|%k0, %3}
+   cmov%O2%C1\t{%2, %3, %k0|%k0, %3, %2}
+   cmov%O2%c1\t{%3, %2, %k0|%k0, %2, %3}"
+  [(set_attr "isa" "*,*,apx_ndd,apx_ndd")
+   (set_attr "type" "icmov")
(set_attr "mode" "SI")])
 
 
@@ -24184,14 +24193,15 @@ (define_split
 })
 
 (define_insn "*movqicc_noc"
-  [(set (match_operand:QI 0 "register_operand" "=r,r")
+  [(set (match_operand:QI 0 "register_operand" "=r,r,r")
(if_then_else:QI (match_operator 1 "ix86_comparison_operator"
   [(reg FLAGS_REG) (const_int 0)])
- (match_operand:QI 2 "register_operand" "r,0")
- (match_operand:QI 3 "register_operand" "0,r")))]
+ (match_operand:QI 2 "register_operand" "r,0,r")
+ (match_operand:QI 3 "register_operand" "0,r,r")))]
   "TARGET_CMOVE && !TARGET_PARTIAL_REG_STALL"
   "#"
-  [(set_attr "type" "icmov")
+  [(set_attr "isa" "*,*,apx_ndd")
+   (set_attr "type" "icmov")
(set_attr "mode" "QI")])
 
 (define_split
diff --git a/gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c b/gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c
new file mode 100644
index 000..459dc965342
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/apx-ndd-cmov.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -m64 -mapxf" } */
+/* { dg-final { scan-assembler-times "cmove\[^\n\r]*, %eax" 1 } } */
+/* 

[PATCH 10/16] [APX NDD] Support APX NDD for and insn

2023-11-15 Thread Hongyu Wang
From: Kong Lingling 

For the NDD form of the AND insn, three splitter fixes are needed after
extending the legacy patterns.

1. APX NDD does not support the high QImode registers ah, bh, ch and dh, so
optimization splitters that generate a high-part zero_extract for QImode
must be prohibited for the NDD pattern.

2. The legacy AND insn uses the r/qm/L constraints, and a post-reload splitter
transforms it into a zero_extend move. For the NDD form of AND the splitter is
not strict enough: it assumes the AND has a const_int operand matching the
"L" constraint, while the NDD form allows a const_int with any QImode value.
Restrict the splitter condition to match the "L" constraint, which strictly
matches the zero-extend semantics.

3. The legacy AND insn adopts the r/0/Z constraints, and a splitter tries to
optimize that form into a strict_lowpart QImode AND when bit 7 is not set. But
the splitter wrongly converts the non-zext NDD form of AND with a memory
source: the strict_lowpart transform then matches alternative 1 of *_slp_1 and
generates *movstrict_1, so the zero-extend semantics are lost. This can leave
the high part of the destination uncleared and generate wrong code. Disable
the splitter when NDD is adopted and operands[0] and operands[1] are not equal.
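Both fixes hinge on zero-extend semantics. Sketched in C as a hedged model with invented names: a mask matching the "L" constraint (0xff or 0xffff) really is a zero-extension, while splitting an NDD AND with a distinct destination into a strict-low-part byte operation leaves the destination's upper bits stale.

```c
#include <assert.h>
#include <stdint.h>

/* An "L"-constraint mask (0xff here) is exactly a zero-extend of the
   low byte; an arbitrary QImode constant such as 0x7f is not.  */
static uint64_t and_ff (uint64_t x)  { return x & 0xff; }
static uint64_t zext_qi (uint64_t x) { return (uint8_t) x; }

/* Correct NDD semantics: the whole destination becomes src & 0x7f.  */
static uint64_t ndd_and (uint64_t src) { return src & 0x7f; }

/* What the bad strict_lowpart split does: only the low byte of the
   (distinct) destination is written, and its upper bits keep whatever
   stale value they held before.  */
static uint64_t slp_and (uint64_t old_dst, uint64_t src)
{
  return (old_dst & ~0xffULL) | ((src & 0x7f) & 0xff);
}
```

When old_dst has nonzero upper bits, slp_and and ndd_and disagree, which is the wrong-code scenario the splitter fix avoids.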

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_can_use_ndd_p): Add AND
support.
* config/i386/i386.md (and3): Add NDD alternatives and adjust
output template.
(*anddi_1): Likewise.
(*and_1): Likewise.
(*andqi_1): Likewise.
(*andsi_1_zext): Likewise.
(*anddi_2): Likewise.
(*andsi_2_zext): Likewise.
(*andqi_2_maybe_si): Likewise.
(*and_2): Likewise.
(*and3_doubleword): Add NDD constraints, emit move for optimized
case if operands[0] not equal to operands[1].
(define_split for QI highpart AND): Prohibit splitter to split NDD
form AND insn to qi_ext_3.
(define_split for QI strict_lowpart optimization): Prohibit splitter to
split NDD form AND insn to *3_1_slp.
(define_split for zero_extend and optimization): Prohibit splitter to
split NDD form AND insn to zero_extend insn.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Add and test.
* gcc.target/i386/apx-spill_to_egprs-1.c: Adjust some checks.
---
 gcc/config/i386/i386-expand.cc|   1 +
 gcc/config/i386/i386.md   | 177 --
 gcc/testsuite/gcc.target/i386/apx-ndd.c   |  13 ++
 .../gcc.target/i386/apx-spill_to_egprs-1.c|   8 +-
 4 files changed, 135 insertions(+), 64 deletions(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index be77ba4a476..662f687abc3 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -1273,6 +1273,7 @@ bool ix86_can_use_ndd_p (enum rtx_code code)
 case MINUS:
 case NEG:
 case NOT:
+case AND:
   return true;
 default:
   return false;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 9758e4e5144..4bf0c16f401 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -11471,18 +11471,20 @@ (define_expand "and3"
   (operands[0], gen_lowpart (mode, operands[1]),
mode, mode, 1));
   else
-ix86_expand_binary_operator (AND, mode, operands);
+ix86_expand_binary_operator (AND, mode, operands,
+ix86_can_use_ndd_p (AND));
 
   DONE;
 })
 
 (define_insn_and_split "*and3_doubleword"
-  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r")
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r,r,r")
(and:
-(match_operand: 1 "nonimmediate_operand" "%0,0")
-(match_operand: 2 "x86_64_hilo_general_operand" "r,o")))
+(match_operand: 1 "nonimmediate_operand" "%0,0,ro,r")
+(match_operand: 2 "x86_64_hilo_general_operand" "r,o,r,o")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (AND, mode, operands)"
+  "ix86_binary_operator_ok (AND, mode, operands,
+   ix86_can_use_ndd_p (AND))"
   "#"
   "&& reload_completed"
   [(const_int:DWIH 0)]
@@ -11494,39 +11496,53 @@ (define_insn_and_split "*and3_doubleword"
   if (operands[2] == const0_rtx)
 emit_move_insn (operands[0], const0_rtx);
   else if (operands[2] == constm1_rtx)
-emit_insn_deleted_note_p = true;
+{
+  if (!rtx_equal_p (operands[0], operands[1]))
+   emit_move_insn (operands[0], operands[1]);
+  else
+   emit_insn_deleted_note_p = true;
+}
   else
-ix86_expand_binary_operator (AND, mode, &operands[0]);
+ix86_expand_binary_operator (AND, mode, &operands[0],
+   ix86_can_use_ndd_p (AND));
 
   if (operands[5] == const0_rtx)
 emit_move_insn (operands[3], const0_rtx);
   else if (operands[5] == constm1_rtx)
 {
-  if (emit_insn_deleted_note_p)
+  if (!rtx_equal_p (operands[3], operands[4]))
+   emit_move_in

[PATCH 11/16] [APX NDD] Support APX NDD for or/xor insn

2023-11-15 Thread Hongyu Wang
From: Kong Lingling 

Similar to the AND insn, two splitters need to be adjusted to prevent
misoptimization of NDD OR/XOR.

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_can_use_ndd_p): Add IOR/XOR
support.
* config/i386/i386.md (3): Add NDD alternative and adjust
output templates.
(*_1): Likewise.
(*qi_1): Likewise.
(*notxor_1): Likewise.
(*si_1_zext): Likewise.
(*si_1_zext_imm): Likewise.
(*notxorqi_1): Likewise.
(*_2): Likewise.
(*si_2_zext): Likewise.
(*si_2_zext_imm): Likewise.
(*3_doubleword): Add NDD constraints, emit move for
optimized case if operands[0] != operands[1] or operands[4]
!= operands[5].
(define_split for QI highpart OR/XOR): Prohibit splitter to split NDD
form OR/XOR insn to qi_ext_3.
(define_split for QI strict_lowpart optimization): Prohibit splitter to
split NDD form AND insn to *3_1_slp.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Add or and xor test.
---
 gcc/config/i386/i386-expand.cc  |   2 +
 gcc/config/i386/i386.md | 180 +++-
 gcc/testsuite/gcc.target/i386/apx-ndd.c |  26 
 3 files changed, 143 insertions(+), 65 deletions(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 662f687abc3..5f02d557a50 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -1274,6 +1274,8 @@ bool ix86_can_use_ndd_p (enum rtx_code code)
 case NEG:
 case NOT:
 case AND:
+case IOR:
+case XOR:
   return true;
 default:
   return false;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 4bf0c16f401..cf9842d1a49 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12372,17 +12372,19 @@ (define_expand "3"
   && !x86_64_hilo_general_operand (operands[2], mode))
 operands[2] = force_reg (mode, operands[2]);
 
-  ix86_expand_binary_operator (, mode, operands);
+  ix86_expand_binary_operator (, mode, operands,
+  ix86_can_use_ndd_p ());
   DONE;
 })
 
 (define_insn_and_split "*3_doubleword"
-  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r")
+  [(set (match_operand: 0 "nonimmediate_operand" "=ro,r,r,r")
(any_or:
-(match_operand: 1 "nonimmediate_operand" "%0,0")
-(match_operand: 2 "x86_64_hilo_general_operand" "r,o")))
+(match_operand: 1 "nonimmediate_operand" "%0,0,ro,r")
+(match_operand: 2 "x86_64_hilo_general_operand" "r,o,r,o")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (, mode, operands)"
+  "ix86_binary_operator_ok (, mode, operands,
+   ix86_can_use_ndd_p ())"
   "#"
   "&& reload_completed"
   [(const_int:DWIH 0)]
@@ -12394,20 +12396,29 @@ (define_insn_and_split "*3_doubleword"
   split_double_mode (mode, &operands[0], 3, &operands[0], &operands[3]);
 
   if (operands[2] == const0_rtx)
-emit_insn_deleted_note_p = true;
+{
+  if (!rtx_equal_p (operands[0], operands[1]))
+   emit_move_insn (operands[0], operands[1]);
+  else
+   emit_insn_deleted_note_p = true;
+}
   else if (operands[2] == constm1_rtx)
 {
   if ( == IOR)
emit_move_insn (operands[0], constm1_rtx);
   else
-   ix86_expand_unary_operator (NOT, mode, &operands[0]);
+   ix86_expand_unary_operator (NOT, mode, &operands[0],
+   ix86_can_use_ndd_p (NOT));
 }
   else
-ix86_expand_binary_operator (, mode, &operands[0]);
+ix86_expand_binary_operator (, mode, &operands[0],
+ix86_can_use_ndd_p ());
 
   if (operands[5] == const0_rtx)
 {
-  if (emit_insn_deleted_note_p)
+  if (!rtx_equal_p (operands[3], operands[4]))
+   emit_move_insn (operands[3], operands[4]);
+  else if (emit_insn_deleted_note_p)
emit_note (NOTE_INSN_DELETED);
 }
   else if (operands[5] == constm1_rtx)
@@ -12415,37 +12426,44 @@ (define_insn_and_split "*3_doubleword"
   if ( == IOR)
emit_move_insn (operands[3], constm1_rtx);
   else
-   ix86_expand_unary_operator (NOT, mode, &operands[3]);
+   ix86_expand_unary_operator (NOT, mode, &operands[3],
+   ix86_can_use_ndd_p (NOT));
 }
   else
-ix86_expand_binary_operator (, mode, &operands[3]);
+ix86_expand_binary_operator (, mode, &operands[3],
+ix86_can_use_ndd_p ());
 
   DONE;
-})
+}
+[(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
 (define_insn "*_1"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r,?k")
+  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r,r,r,?k")
(any_or:SWI248
-(match_operand:SWI248 1 "nonimmediate_operand" "%0,0,k")
-(match_operand:SWI248 2 "" "r,,k")))
+(match_operand:SWI248 1 "nonimmediate_operand" "%0,0,rm,r,k")
+   

[PATCH 15/16] [APX NDD] Support APX NDD for shld/shrd insns

2023-11-15 Thread Hongyu Wang
For the shld/shrd insns, the old patterns use match_dup 0 as the shift source,
with +r*m as its constraint. To support NDD we added new define_insns that
handle the NDD-form pattern, with an extra input operand and the destination
operand fixed to a register.
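The operation these patterns implement, funnel-shifting bits from a second register into the vacated positions, can be stated as a short C model (illustrative only; the names are invented):

```c
#include <assert.h>
#include <stdint.h>

/* shld: shift a left by n, filling the vacated low bits from the top
   of b.  The count is masked to 6 bits, matching the (and:QI ... 63)
   in the 64-bit pattern; n == 0 is special-cased because a shift by
   64 is undefined in C.  */
static uint64_t shld64 (uint64_t a, uint64_t b, unsigned n)
{
  n &= 63;
  if (n == 0)
    return a;
  return (a << n) | (b >> (64 - n));
}
```

The NDD variants write this result to a third register instead of back into the operand holding a.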

gcc/ChangeLog:

* config/i386/i386.md (x86_64_shld_ndd): New define_insn.
(x86_64_shld_ndd_1): Likewise.
(*x86_64_shld_ndd_2): Likewise.
(x86_shld_ndd): Likewise.
(x86_shld_ndd_1): Likewise.
(*x86_shld_ndd_2): Likewise.
(x86_64_shrd_ndd): Likewise.
(x86_64_shrd_ndd_1): Likewise.
(*x86_64_shrd_ndd_2): Likewise.
(x86_shrd_ndd): Likewise.
(x86_shrd_ndd_1): Likewise.
(*x86_shrd_ndd_2): Likewise.
(*x86_64_shld_shrd_1_nozext): Adjust codegen under TARGET_APX_NDD.
(*x86_shld_shrd_1_nozext): Likewise.
(*x86_64_shrd_shld_1_nozext): Likewise.
(*x86_shrd_shld_1_nozext): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd-shld-shrd.c: New test.
---
 gcc/config/i386/i386.md   | 323 +-
 .../gcc.target/i386/apx-ndd-shld-shrd.c   |  24 ++
 2 files changed, 345 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-shld-shrd.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 760c0d32f4d..2e3d37d08b0 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14183,6 +14183,24 @@ (define_insn "x86_64_shld"
(set_attr "amdfam10_decode" "vector")
(set_attr "bdver1_decode" "vector")])
 
+(define_insn "x86_64_shld_ndd"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+(ior:DI (ashift:DI (match_operand:DI 1 "nonimmediate_operand" "rm")
+ (and:QI (match_operand:QI 3 "nonmemory_operand" "Jc")
+ (const_int 63)))
+   (subreg:DI
+ (lshiftrt:TI
+   (zero_extend:TI
+ (match_operand:DI 2 "register_operand" "r"))
+   (minus:QI (const_int 64)
+ (and:QI (match_dup 3) (const_int 63 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_64BIT && TARGET_APX_NDD"
+  "shld{q}\t{%s3%2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ishift")
+   (set_attr "prefix_0f" "1")
+   (set_attr "mode" "DI")])
+
 (define_insn "x86_64_shld_1"
   [(set (match_operand:DI 0 "nonimmediate_operand" "+r*m")
 (ior:DI (ashift:DI (match_dup 0)
@@ -14204,6 +14222,24 @@ (define_insn "x86_64_shld_1"
(set_attr "amdfam10_decode" "vector")
(set_attr "bdver1_decode" "vector")])
 
+(define_insn "x86_64_shld_ndd_1"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+(ior:DI (ashift:DI (match_operand:DI 1 "nonimmediate_operand" "rm")
+  (match_operand:QI 3 "const_0_to_63_operand"))
+   (subreg:DI
+ (lshiftrt:TI
+   (zero_extend:TI
+ (match_operand:DI 2 "register_operand" "r"))
+   (match_operand:QI 4 "const_0_to_255_operand")) 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_64BIT && TARGET_APX_NDD
+   && INTVAL (operands[4]) == 64 - INTVAL (operands[3])"
+  "shld{q}\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "type" "ishift")
+   (set_attr "mode" "DI")
+   (set_attr "length_immediate" "1")])
+
+
 (define_insn_and_split "*x86_64_shld_shrd_1_nozext"
   [(set (match_operand:DI 0 "nonimmediate_operand")
(ior:DI (ashift:DI (match_operand:DI 4 "nonimmediate_operand")
@@ -14229,6 +14265,23 @@ (define_insn_and_split "*x86_64_shld_shrd_1_nozext"
   operands[4] = force_reg (DImode, operands[4]);
   emit_insn (gen_x86_64_shrd_1 (operands[0], operands[4], operands[3], 
operands[2]));
 }
+  else if (TARGET_APX_NDD)
+{
+ rtx tmp = gen_reg_rtx (DImode);
+ if (MEM_P (operands[4]))
+   {
+operands[1] = force_reg (DImode, operands[1]);
+emit_insn (gen_x86_64_shld_ndd_1 (tmp, operands[4], operands[1],
+  operands[2], operands[3]));
+   }
+ else if (MEM_P (operands[1]))
+   emit_insn (gen_x86_64_shrd_ndd_1 (tmp, operands[1], operands[4],
+operands[3], operands[2]));
+ else
+   emit_insn (gen_x86_64_shld_ndd_1 (tmp, operands[4], operands[1],
+operands[2], operands[3]));
+ emit_move_insn (operands[0], tmp);
+}
   else
{
  operands[1] = force_reg (DImode, operands[1]);
@@ -14261,6 +14314,33 @@ (define_insn_and_split "*x86_64_shld_2"
   (const_int 63)))) 0)))
  (clobber (reg:CC FLAGS_REG))])])
 
+(define_insn_and_split "*x86_64_shld_ndd_2"
+  [(set (match_operand:DI 0 "nonimmediate_operand")
+   (ior:DI (ashift:DI (match_operand:DI 1 "nonimmediate_operand")
+  (match_operand:QI 3 "nonmemory_operand"))
+   (lshiftrt:DI (match_op

[PATCH 13/16] [APX NDD] Support APX NDD for right shift insns

2023-11-15 Thread Hongyu Wang
Similar to LSHIFT, rshift should also emit $1 for NDD form with CX_REG as
operands[1].

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_can_use_ndd_p): Add ASHIFTRT
and LSHIFTRT.
* config/i386/i386.md (ashr<mode>3_cvt): Extend with new
alternatives to support NDD, and adjust output templates.
(*ashrsi3_cvt_zext): Likewise.
(*ashr<mode>3_1): Likewise for SI/DI mode.
(*highpartdisi2): Likewise.
(*lshr<mode>3_1): Likewise.
(*<insn>si3_1_zext): Likewise.
(*ashr<mode>3_1): Likewise for QI/HI mode.
(*lshrqi3_1): Likewise.
(*lshrhi3_1): Likewise.
(<shift_insn><mode>3_cmp): Likewise.
(*<shift_insn>si3_cmp_zext): Likewise.
(*<shift_insn><mode>3_cconly): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Add l/ashiftrt tests.
---
 gcc/config/i386/i386-expand.cc  |   2 +
 gcc/config/i386/i386.md | 265 +++-
 gcc/testsuite/gcc.target/i386/apx-ndd.c |  24 +++
 3 files changed, 191 insertions(+), 100 deletions(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 7e3080482a6..8e040346fbb 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -1277,6 +1277,8 @@ bool ix86_can_use_ndd_p (enum rtx_code code)
 case IOR:
 case XOR:
 case ASHIFT:
+case ASHIFTRT:
+case LSHIFTRT:
   return true;
 default:
   return false;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index a0e81545f17..3ff333d4a41 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -15490,39 +15490,45 @@ (define_mode_attr cvt_mnemonic
   [(SI "{cltd|cdq}") (DI "{cqto|cqo}")])
 
(define_insn "ashr<mode>3_cvt"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=*d,rm")
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=*d,rm,r")
(ashiftrt:SWI48
- (match_operand:SWI48 1 "nonimmediate_operand" "*a,0")
+ (match_operand:SWI48 1 "nonimmediate_operand" "*a,0,rm")
  (match_operand:QI 2 "const_int_operand")))
(clobber (reg:CC FLAGS_REG))]
   "INTVAL (operands[2]) == GET_MODE_BITSIZE (<MODE>mode)-1
&& (TARGET_USE_CLTD || optimize_function_for_size_p (cfun))
-   && ix86_binary_operator_ok (ASHIFTRT, <MODE>mode, operands)"
+   && ix86_binary_operator_ok (ASHIFTRT, <MODE>mode, operands,
+  ix86_can_use_ndd_p (ASHIFTRT))"
   "@
   <cvt_mnemonic>
-   sar{<imodesuffix>}\t{%2, %0|%0, %2}"
-  [(set_attr "type" "imovx,ishift")
-   (set_attr "prefix_0f" "0,*")
-   (set_attr "length_immediate" "0,*")
-   (set_attr "modrm" "0,1")
+   sar{<imodesuffix>}\t{%2, %0|%0, %2}
+   sar{<imodesuffix>}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,*,apx_ndd")
+   (set_attr "type" "imovx,ishift,ishift")
+   (set_attr "prefix_0f" "0,*,*")
+   (set_attr "length_immediate" "0,*,*")
+   (set_attr "modrm" "0,1,1")
(set_attr "mode" "<MODE>")])
 
 (define_insn "*ashrsi3_cvt_zext"
-  [(set (match_operand:DI 0 "register_operand" "=*d,r")
+  [(set (match_operand:DI 0 "register_operand" "=*d,r,r")
(zero_extend:DI
- (ashiftrt:SI (match_operand:SI 1 "register_operand" "*a,0")
+ (ashiftrt:SI (match_operand:SI 1 "register_operand" "*a,0,r")
   (match_operand:QI 2 "const_int_operand"
(clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT && INTVAL (operands[2]) == 31
&& (TARGET_USE_CLTD || optimize_function_for_size_p (cfun))
-   && ix86_binary_operator_ok (ASHIFTRT, SImode, operands)"
+   && ix86_binary_operator_ok (ASHIFTRT, SImode, operands,
+  ix86_can_use_ndd_p (ASHIFTRT))"
   "@
{cltd|cdq}
-   sar{l}\t{%2, %k0|%k0, %2}"
-  [(set_attr "type" "imovx,ishift")
-   (set_attr "prefix_0f" "0,*")
-   (set_attr "length_immediate" "0,*")
-   (set_attr "modrm" "0,1")
+   sar{l}\t{%2, %k0|%k0, %2}
+   sar{l}\t{%2, %1, %k0|%k0, %1, %2}"
+  [(set_attr "isa" "*,*,apx_ndd")
+   (set_attr "type" "imovx,ishift,ishift")
+   (set_attr "prefix_0f" "0,*,*")
+   (set_attr "length_immediate" "0,*,*")
+   (set_attr "modrm" "0,1,1")
(set_attr "mode" "SI")])
 
 (define_expand "@x86_shift_adj_3"
@@ -15564,13 +15570,15 @@ (define_insn "*bmi2_<insn><mode>3_1"
(set_attr "mode" "<MODE>")])
 
(define_insn "*ashr<mode>3_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r")
(ashiftrt:SWI48
- (match_operand:SWI48 1 "nonimmediate_operand" "0,rm")
- (match_operand:QI 2 "nonmemory_operand" "c,r")))
+ (match_operand:SWI48 1 "nonimmediate_operand" "0,rm,rm")
+ (match_operand:QI 2 "nonmemory_operand" "c,r,c")))
(clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (ASHIFTRT, <MODE>mode, operands)"
+  "ix86_binary_operator_ok (ASHIFTRT, <MODE>mode, operands,
+   ix86_can_use_ndd_p (ASHIFTRT))"
 {
+  bool use_ndd = (which_alternative == 2);
   switch (get_attr_type (insn))
 {
 case TYPE_ISHIFTX:
@@ -15578,14 +15586,18 @@ (define_insn "*ashr<mode>3_1"
 
 default:
   if (operands[2] == const1_rtx
-   

Re: [PATCH v2] LoongArch: Remove redundant barrier instructions before LL-SC loops

2023-11-15 Thread chenglulu



On 2023/11/15 05:52, Xi Ruoyao wrote:

This is isomorphic to the LLVM changes [1-2].

On LoongArch, the LL and SC instructions have memory barrier semantics:

- LL: <memory-barrier> + <load-exclusive>
- SC: <store-conditional> + <memory-barrier>

But the compare and swap operation is allowed to fail, and if it fails
the SC instruction is not executed, thus the guarantee of acquiring
semantics cannot be ensured. Therefore, an acquire barrier needs to be
generated when failure_memorder includes an acquire operation.

On CPUs implementing LoongArch v1.10 or later, "dbar 0b10100" is an
acquire barrier; on CPUs implementing LoongArch v1.00, it is a full
barrier.  So it's always enough for acquire semantics.  OTOH if an
acquire semantic is not needed, we still need the "dbar 0x700" as the
load-load barrier like all LL-SC loops.

[1]:https://github.com/llvm/llvm-project/pull/67391
[2]:https://github.com/llvm/llvm-project/pull/69339


I've done the test. There's no problem.

Thanks.


gcc/ChangeLog:

* config/loongarch/loongarch.cc
(loongarch_memmodel_needs_release_fence): Remove.
(loongarch_cas_failure_memorder_needs_acquire): New static
function.
(loongarch_print_operand): Redefine 'G' for the barrier on CAS
failure.
* config/loongarch/sync.md (atomic_cas_value_strong):
Remove the redundant barrier before the LL instruction, and
emit an acquire barrier on failure if needed by
failure_memorder.
(atomic_cas_value_cmp_and_7_): Likewise.
(atomic_cas_value_add_7_): Remove the unnecessary barrier
before the LL instruction.
(atomic_cas_value_sub_7_): Likewise.
(atomic_cas_value_and_7_): Likewise.
(atomic_cas_value_xor_7_): Likewise.
(atomic_cas_value_or_7_): Likewise.
(atomic_cas_value_nand_7_): Likewise.
(atomic_cas_value_exchange_7_): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/cas-acquire.c: New test.
---

v1: https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635304.html

Changes from v1:

- Remove yet not reachable "case MEMMODEL_CONSUME".
- Fix the test case so it almost always fail with (buggy) GCC 13 and
   LA664.

Bootstrapped and regtested on loongarch64-linux-gnu running on LA664.
Ok for trunk?

  gcc/config/loongarch/loongarch.cc | 30 ---
  gcc/config/loongarch/sync.md  | 49 +--
  .../gcc.target/loongarch/cas-acquire.c| 82 +++
  3 files changed, 119 insertions(+), 42 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/cas-acquire.c

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 2998bf740d4..738911661d7 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -5833,27 +5833,27 @@ loongarch_memmodel_needs_rel_acq_fence (enum memmodel 
model)
  }
  }
  
-/* Return true if a FENCE should be emitted to before a memory access to

-   implement the release portion of memory model MODEL.  */
+/* Return true if a FENCE should be emitted after a failed CAS to
+   implement the acquire semantic of failure_memorder.  */
  
  static bool

-loongarch_memmodel_needs_release_fence (enum memmodel model)
+loongarch_cas_failure_memorder_needs_acquire (enum memmodel model)
  {
-  switch (model)
+  switch (memmodel_base (model))
  {
+case MEMMODEL_ACQUIRE:
  case MEMMODEL_ACQ_REL:
  case MEMMODEL_SEQ_CST:
-case MEMMODEL_SYNC_SEQ_CST:
-case MEMMODEL_RELEASE:
-case MEMMODEL_SYNC_RELEASE:
return true;
  
-case MEMMODEL_ACQUIRE:

-case MEMMODEL_CONSUME:
-case MEMMODEL_SYNC_ACQUIRE:
  case MEMMODEL_RELAXED:
+case MEMMODEL_RELEASE:
return false;
  
+/* MEMMODEL_CONSUME is deliberately not handled because it's always

+   replaced by MEMMODEL_ACQUIRE as at now.  If you see an ICE caused by
+   MEMMODEL_CONSUME, read the change (re)introducing it carefully and
+   decide what to do.  See PR 59448 and get_memmodel in builtins.cc.  */
  default:
gcc_unreachable ();
  }
@@ -5966,7 +5966,8 @@ loongarch_print_operand_reloc (FILE *file, rtx op, bool 
hi64_part,
 'd'Print CONST_INT OP in decimal.
 'E'Print CONST_INT OP element 0 of a replicated CONST_VECTOR in 
decimal.
 'F'Print the FPU branch condition for comparison OP.
-   'G' Print a DBAR insn if the memory model requires a release.
+   'G' Print a DBAR insn for CAS failure (with an acquire semantic if
+   needed, otherwise a simple load-load barrier).
 'H'  Print address 52-61bit relocation associated with OP.
 'h'  Print the high-part relocation associated with OP.
 'i'Print i if the operand is not a register.
@@ -6057,8 +6058,11 @@ loongarch_print_operand (FILE *file, rtx op, int letter)
break;
  
  case 'G':

-  if (loongarch_memmodel_needs_release_fence ((enum memmodel) INTVAL (op)))
-   fputs ("dbar\t0", file);
+  if (loongarch_cas_fa

Re: [RFC PATCH] Detecting lifetime-dse issues via Valgrind [PR66487]

2023-11-15 Thread Daniil Frolov

On 2023-11-13 02:53, Sam James wrote:

Sam James  writes:


Alexander Monakov  writes:
[...]


I'm very curious what you mean by "this has come up with LLVM [] 
too": ttbomk,
LLVM doesn't do such lifetime-based optimization yet, which is why 
compiling
LLVM with LLVM doesn't break it. Can you share some examples? Or do 
you mean
instances when libLLVM-miscompiled-with-GCC was linked elsewhere, and 
that

program crashed mysteriously as a result?

Indeed this work is inspired by the LLVM incident in PR 106943.


[...]
I had some vague memories in the back of my head so I went digging
because I enjoy this:
[...]


I ended up stumbling on two more:

* charm (https://github.com/UIUC-PPL/charm/issues/1045)
* firebird (https://github.com/FirebirdSQL/firebird/issues/5384, 
starring richi)


Now I'm really done :)


[...]


Alexander


thanks,
sam


Thanks for your prompt response; it is greatly appreciated.

We conducted tests on two packages from your provided list and obtained
some preliminary results:

* crypto++: No violations were detected during their own tests, which
were executed using 'make valgrind' and our custom option
--fvalgrind-emit-annotations.

* firebird: An issue was identified with the global object isqlGlobal.  It
appears that developers are assuming the global object will be
zero-initialized, but the C++ standard guarantees this only for static
initialization.  The presence of a non-trivial constructor, IsqlGlobals(),
means that isqlGlobal has formally uninitialized fields.  A snippet from
the Valgrind dump is as follows:

==106087== Conditional jump or move depends on uninitialised value(s)
==106087==at 0x4378F0: create_db(char const*, char*) (isql.cpp:5838)
==106087==by 0x44CBE3: frontend(char const*) (isql.cpp:6699)
==106087==by 0x44EBC1: get_statement (isql.cpp:7638)
==106087==by 0x44EBC1: do_isql() (isql.cpp:6008)
==106087==by 0x45039C: ISQL_main(int, char**) (isql.cpp:1854)
==106087==by 0x4EE1082: (below main) (libc-start.c:308)
==106087==  Uninitialised value was created by a client request
==106087==at 0x5A617C: __valgrind_make_mem_undefined (valgrind.c:48)
==106087==by 0x42E365: IsqlGlobals::IsqlGlobals() (isql.cpp:1378)
==106087==by 0x418388: __static_initialization_and_destruction_0 (isql.cpp:1098)
==106087==by 0x418388: _GLOBAL__sub_I_isql.cpp (isql.cpp:11536)
==106087==by 0x5A61DC: __libc_csu_init (in ...)
==106087==by 0x4EE100F: (below main) (libc-start.c:264)

---
With best regards,
Daniil


RE: [PATCH] RISC-V: Support trailing vec_init optimization

2023-11-15 Thread Li, Pan2
Committed, thanks Robin..

Pan

-Original Message-
From: Robin Dapp  
Sent: Wednesday, November 15, 2023 4:59 PM
To: Juzhe-Zhong ; gcc-patches@gcc.gnu.org
Cc: rdapp@gmail.com; kito.ch...@gmail.com; kito.ch...@sifive.com; 
jeffreya...@gmail.com
Subject: Re: [PATCH] RISC-V: Support trailing vec_init optimization

Hi Juzhe,

thanks, LGTM as it is just a refinement of what we already have.

Regards
 Robin


Re: gfortran.dg/dg.exp debug messages pollute test output

2023-11-15 Thread FX Coudert
> FX submitted the patch series, I can find the reference if you need it.

Patch was submitted in this thread: 
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630096.html


>> Besides,
>> it's unclear if those messages can just be removed (they are pretty
>> cryptic as is) or at least changed to use verbose instead of puts.
>> Please fix.

I don’t see value in this output, so I think it’s best to remove the puts calls 
entirely. Attached patch does that.
Testing under progress, OK if it passes? (or does that count as obvious fix-up 
of the previous patch)

FX




0001-Testsuite-silence-some-noise-in-output.patch
Description: Binary data


Re: Re: [PATCH] RISC-V: testsuite: Fix 32-bit FAILs.

2023-11-15 Thread 钟居哲
Hi, Kito. Could you take a look at this issue?

-march parser is not consistent between non-linux and linux.

You can simplify verify it with these cases:

FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c -std=c99 -O3 
-ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess 
errors)
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-runu.c -std=c99 -O3 
-ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess 
errors)
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c -std=c99 
-O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for 
excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c -std=c99 -O3 
-ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess 
errors)
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c -std=c99 -O3 
-ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess 
errors)
FAIL: gcc.target/riscv/rvv/autovec/vmv-imm-run.c -O3 -ftree-vectorize (test for 
excess errors)

These cases failed on non-linux toolchain, but pass on linux toolchain.
This inconsistency is caused by your previous multilib patch as Lehua said:
https://github.com/gcc-mirror/gcc/commit/17d683d 




juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-11-13 19:27
To: kito.cheng; Robin Dapp
CC: juzhe.zh...@rivai.ai; gcc-patches; palmer; jeffreyalaw
Subject: Re: [PATCH] RISC-V: testsuite: Fix 32-bit FAILs.
Hi Kito,
 
On 2023/11/13 19:13, Lehua Ding wrote:
> Hi Robin,
> 
> On 2023/11/13 18:33, Robin Dapp wrote:
>>> On 2023/11/13 18:22, juzhe.zh...@rivai.ai wrote:
 If there is a difference between them. I think we should fix 
 riscv-common.cc.
 Since I think "zvfh_zfh" should not be different with "zfh_zvfh"
>>>
>>> It's possible. Let me debug it and see if there's a problem.
>>
>> I don't think it is different.  Just checked and it still works for me.
>>
>> Could you please tell me how you invoke the testsuite?
> 
> This looks to be the difference between the linux and elf versions of 
> gcc. The elf version of gcc we are build will have this problem, the 
> linux version of gcc will not. I think the linux version of gcc has a 
> wrong behavior.:
> 
> ➜  riscv-gnu-toolchain-push git:(tintin-dev) 
> ./build/dev-rv32gcv_zfh_zvfh-ilp32d-medany-newlib-spike-debug/install/bin/riscv32-unknown-elf-gcc
>  -march=rv32gcv_zfh 
> build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/hello.c
> riscv32-unknown-elf-gcc: fatal error: Cannot find suitable multilib set 
> for 
> '-march=rv32imafdcv_zicsr_zifencei_zfh_zfhmin_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b'/'-mabi=ilp32d'
> compilation terminated.
> ➜  riscv-gnu-toolchain-push git:(tintin-dev) 
> ./build/dev-rv32gcv_zfh_zvfh-ilp32d-medany-linux-spike-debug/install/bin/riscv32-unknown-linux-gnu-gcc
>  -march=rv32gcv_zfh 
> build/dev-rv64gcv_zvfh_zfh-lp64d-medany-newlib-spike-debug/hello.c
> 
 
It looks like this commit[1] from you make the difference between elf 
and linux. Can you help to see if it makes sense to behave differently 
now? elf version --with-arch is rv32gcv_zvfh_zfh, and the user will get 
an error with -march=rv32gcv_zfh. linux version will not.
 
[1] https://github.com/gcc-mirror/gcc/commit/17d683d
 
-- 
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai


[committed] arm: testsuite: fix test for armv6t2 hardware

2023-11-15 Thread Richard Earnshaw

My previous patch series added a new function to check for armv6t2
compatible hardware.  But the test was not correctly implemented and
also did not follow the standard naming convention for Arm hw
compatibility tests.  Fix both of these issues.

gcc/testsuite:

* lib/target-supports.exp (check_effective_target_arm_arch_v6t2_hw_ok):
Rename to...
(check_effective_target_arm_arch_v6t2_hw): ... this.  Fix checks.
* gcc.target/arm/acle/data-intrinsics-armv6.c: Update pre-check.
* gcc.target/arm/acle/data-intrinsics-rbit.c: Likewise.
---
 gcc/testsuite/gcc.target/arm/acle/data-intrinsics-armv6.c | 2 +-
 gcc/testsuite/gcc.target/arm/acle/data-intrinsics-rbit.c  | 2 +-
 gcc/testsuite/lib/target-supports.exp | 6 +++---
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-armv6.c b/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-armv6.c
index 6dc8c55e2f9..c231fa4c1ae 100644
--- a/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-armv6.c
+++ b/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-armv6.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target arm_arch_v6t2_hw_ok } */
+/* { dg-require-effective-target arm_arch_v6t2_hw } */
 /* { dg-add-options arm_arch_v6t2 } */
 
 #include "arm_acle.h"
diff --git a/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-rbit.c b/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-rbit.c
index b01c4219a7e..ac358bce02d 100644
--- a/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-rbit.c
+++ b/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-rbit.c
@@ -1,6 +1,6 @@
 /* Test the ACLE data intrinsics existence for specific instruction.  */
 /* { dg-do run } */
-/* { dg-require-effective-target arm_arch_v6t2_hw_ok } */
+/* { dg-require-effective-target arm_arch_v6t2_hw } */
 /* { dg-additional-options "--save-temps -O1" } */
 /* { dg-add-options arm_arch_v6t2 } */
 /* { dg-final { check-function-bodies "**" "" "" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 43a040e135c..b6a2e4fd096 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5607,9 +5607,9 @@ proc check_effective_target_arm_thumb1_cbz_ok {} {
 # Return 1 if this is an Arm target which supports the Armv6t2 extensions.
 # This can be either in Arm state or in Thumb state.
 
-proc check_effective_target_arm_arch_v6t2_hw_ok {} {
-if [check_effective_target_arm_thumb1_ok] {
-	return [check_no_compiler_messages arm_movt object {
+proc check_effective_target_arm_arch_v6t2_hw {} {
+if [check_effective_target_arm_arch_v6t2_ok] {
+	return [check_runtime arm_arch_v6t2 {
 	int
 	main (void)
 	{


[PATCH 2/4] libsanitizer: Apply local patches

2023-11-15 Thread Jakub Jelinek
Hi!

This patch just reapplies local patches (will be noted in LOCAL_PATCHES).

diff --git a/libsanitizer/asan/asan_globals.cpp 
b/libsanitizer/asan/asan_globals.cpp
index 4d391cb2a88..01a243927ca 100644
--- a/libsanitizer/asan/asan_globals.cpp
+++ b/libsanitizer/asan/asan_globals.cpp
@@ -158,23 +158,6 @@ static void CheckODRViolationViaIndicator(const Global *g) 
{
   }
 }
 
-// Check ODR violation for given global G by checking if it's already poisoned.
-// We use this method in case compiler doesn't use private aliases for global
-// variables.
-static void CheckODRViolationViaPoisoning(const Global *g) {
-  if (__asan_region_is_poisoned(g->beg, g->size_with_redzone)) {
-// This check may not be enough: if the first global is much larger
-// the entire redzone of the second global may be within the first global.
-for (ListOfGlobals *l = list_of_all_globals; l; l = l->next) {
-  if (g->beg == l->g->beg &&
-  (flags()->detect_odr_violation >= 2 || g->size != l->g->size) &&
-  !IsODRViolationSuppressed(g->name))
-ReportODRViolation(g, FindRegistrationSite(g),
-   l->g, FindRegistrationSite(l->g));
-}
-  }
-}
-
 // Clang provides two different ways for global variables protection:
 // it can poison the global itself or its private alias. In former
 // case we may poison same symbol multiple times, that can help us to
@@ -220,8 +203,6 @@ static void RegisterGlobal(const Global *g) {
 // where two globals with the same name are defined in different modules.
 if (UseODRIndicator(g))
   CheckODRViolationViaIndicator(g);
-else
-  CheckODRViolationViaPoisoning(g);
   }
   if (CanPoisonMemory())
 PoisonRedZones(*g);
diff --git a/libsanitizer/asan/asan_interceptors.h 
b/libsanitizer/asan/asan_interceptors.h
index c4bf087ea17..9a6c22c764a 100644
--- a/libsanitizer/asan/asan_interceptors.h
+++ b/libsanitizer/asan/asan_interceptors.h
@@ -81,7 +81,12 @@ void InitializePlatformInterceptors();
 #if ASAN_HAS_EXCEPTIONS && !SANITIZER_WINDOWS && !SANITIZER_SOLARIS && \
 !SANITIZER_NETBSD
 # define ASAN_INTERCEPT___CXA_THROW 1
-# define ASAN_INTERCEPT___CXA_RETHROW_PRIMARY_EXCEPTION 1
+# if ! defined(ASAN_HAS_CXA_RETHROW_PRIMARY_EXCEPTION) \
+ || ASAN_HAS_CXA_RETHROW_PRIMARY_EXCEPTION
+#   define ASAN_INTERCEPT___CXA_RETHROW_PRIMARY_EXCEPTION 1
+# else
+#   define ASAN_INTERCEPT___CXA_RETHROW_PRIMARY_EXCEPTION 0
+# endif
 # if defined(_GLIBCXX_SJLJ_EXCEPTIONS) || (SANITIZER_IOS && defined(__arm__))
 #  define ASAN_INTERCEPT__UNWIND_SJLJ_RAISEEXCEPTION 1
 # else
diff --git a/libsanitizer/asan/asan_mapping.h b/libsanitizer/asan/asan_mapping.h
index c5f95c07a21..47ccf8444d3 100644
--- a/libsanitizer/asan/asan_mapping.h
+++ b/libsanitizer/asan/asan_mapping.h
@@ -190,7 +190,7 @@
 #  elif defined(__aarch64__)
 #define ASAN_SHADOW_OFFSET_CONST 0x0010
 #  elif defined(__powerpc64__)
-#define ASAN_SHADOW_OFFSET_CONST 0x1000
+#define ASAN_SHADOW_OFFSET_CONST 0x0200
 #  elif defined(__s390x__)
 #define ASAN_SHADOW_OFFSET_CONST 0x0010
 #  elif SANITIZER_FREEBSD
diff --git a/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp 
b/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp
index 37b2b57c0c8..2720a3cab2c 100644
--- a/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp
+++ b/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp
@@ -838,9 +838,13 @@ u32 GetNumberOfCPUs() {
 #elif SANITIZER_SOLARIS
   return sysconf(_SC_NPROCESSORS_ONLN);
 #else
+#if defined(CPU_COUNT)
   cpu_set_t CPUs;
   CHECK_EQ(sched_getaffinity(0, sizeof(cpu_set_t), &CPUs), 0);
   return CPU_COUNT(&CPUs);
+#else
+  return 1;
+#endif
 #endif
 }
 
diff --git a/libsanitizer/sanitizer_common/sanitizer_mac.cpp 
b/libsanitizer/sanitizer_common/sanitizer_mac.cpp
index 24e3d111252..e1f83e4002a 100644
--- a/libsanitizer/sanitizer_common/sanitizer_mac.cpp
+++ b/libsanitizer/sanitizer_common/sanitizer_mac.cpp
@@ -38,7 +38,7 @@
 extern char **environ;
 #  endif
 
-#  if defined(__has_include) && __has_include(<os/trace.h>)
+#  if defined(__has_include) && __has_include(<os/trace.h>) && defined(__BLOCKS__)
 #define SANITIZER_OS_TRACE 1
#include <os/trace.h>
 #  else
@@ -71,7 +71,15 @@ extern char ***_NSGetArgv(void);
 #  include 
 #  include 
 #  include 
-#  include <os/log.h>
+#  if defined(__has_builtin) && __has_builtin(__builtin_os_log_format)
+#include <os/log.h>
+#  else
+ /* Without support for __builtin_os_log_format, fall back to the older
+method.  */
+#define OS_LOG_DEFAULT 0
+#define os_log_error(A,B,C) \
+   asl_log(nullptr, nullptr, ASL_LEVEL_ERR, "%s", (C));
+#  endif
 #  include 
 #  include 
 #  include 
diff --git a/libsanitizer/sanitizer_common/sanitizer_mac.h 
b/libsanitizer/sanitizer_common/sanitizer_mac.h
index f0a97d098ee..1cf2e298cc9 100644
--- a/libsanitizer/sanitizer_common/sanitizer_mac.h
+++ b/libsanitizer/sanitizer_common/sanitizer_mac.h
@@ -14,6 +14,26 @@
 
 #include "sanitizer_c

[PATCH 3/4] libsanitizer: Adjust the asan/sanity-check-pure-c-1.c test

2023-11-15 Thread Jakub Jelinek
Hi!

The updated libasan doesn't print __interceptor_free (or __interceptor_malloc)
but free (or malloc), the following patch adjusts the testcase so that it
accepts it.

2023-11-15  Jakub Jelinek  

* c-c++-common/asan/sanity-check-pure-c-1.c: Adjust for interceptor_
or wrap_ substrings possibly not being emitted in newer libasan.

--- gcc/testsuite/c-c++-common/asan/sanity-check-pure-c-1.c.jj  2020-01-14 
20:02:46.646611886 +0100
+++ gcc/testsuite/c-c++-common/asan/sanity-check-pure-c-1.c 2023-11-15 
10:51:50.921621770 +0100
@@ -10,7 +10,7 @@ int main() {
 }
 
 /* { dg-output "heap-use-after-free.*(\n|\r\n|\r)" } */
-/* { dg-output "#0 \[^\n\r]*(in 
_*(interceptor_|wrap_)free|\[(\])\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "#0 \[^\n\r]*(in 
_*(interceptor_|wrap_)?free|\[(\])\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "#1 \[^\n\r]*(in _*main 
(\[^\n\r]*sanity-check-pure-c-1.c:8|\[^\n\r]*:0)|\[(\]).*(\n|\r\n|\r)" } */
-/* { dg-output "#0 \[^\n\r]*(in 
_*(interceptor_|wrap_)malloc|\[(\])\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "#0 \[^\n\r]*(in 
_*(interceptor_|wrap_)?malloc|\[(\])\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "#1 \[^\n\r]*(in _*main 
(\[^\n\r]*sanity-check-pure-c-1.c:7|\[^\n\r]*:0)|\[(\])\[^\n\r]*(\n|\r\n|\r)" } 
*/

Jakub



[PATCH 4/4] libsanitizer: Readd __ubsan_handle_function_type_mismatch_v1{,_abort}

2023-11-15 Thread Jakub Jelinek
Hi!

So that we don't have to bump libubsan.so.1 SONAME, the following patch
reverts part of the changes which removed two handlers.  While we don't
actually use them from GCC, we shouldn't remove supported entrypoints
unless SONAME is changed (removal of __interceptor_* or ___interceptor_*
is fine).  This is the only removal, other libraries just added some
symbols.

2023-11-15  Jakub Jelinek  

* ubsan/ubsan_handlers_cxx.h (FunctionTypeMismatchData): Forward
declare.
(__ubsan_handle_function_type_mismatch_v1,
__ubsan_handle_function_type_mismatch_v1_abort): Declare.
* ubsan/ubsan_handlers_cxx.cpp (handleFunctionTypeMismatch,
__ubsan_handle_function_type_mismatch_v1,
__ubsan_handle_function_type_mismatch_v1_abort): New functions readded
for backwards compatibility from older ubsan.
* ubsan/ubsan_interface.inc (__ubsan_handle_function_type_mismatch_v1,
__ubsan_handle_function_type_mismatch_v1_abort): Readd.

--- libsanitizer/ubsan/ubsan_handlers_cxx.h.jj  2023-11-14 23:52:59.417503473 
+0100
+++ libsanitizer/ubsan/ubsan_handlers_cxx.h 2023-11-15 11:36:34.961739772 
+0100
@@ -33,6 +33,19 @@ void __ubsan_handle_dynamic_type_cache_m
 extern "C" SANITIZER_INTERFACE_ATTRIBUTE
 void __ubsan_handle_dynamic_type_cache_miss_abort(
   DynamicTypeCacheMissData *Data, ValueHandle Pointer, ValueHandle Hash);
+
+struct FunctionTypeMismatchData;
+
+extern "C" SANITIZER_INTERFACE_ATTRIBUTE void
+__ubsan_handle_function_type_mismatch_v1(FunctionTypeMismatchData *Data,
+ ValueHandle Val,
+ ValueHandle calleeRTTI,
+ ValueHandle fnRTTI);
+extern "C" SANITIZER_INTERFACE_ATTRIBUTE void
+__ubsan_handle_function_type_mismatch_v1_abort(FunctionTypeMismatchData *Data,
+   ValueHandle Val,
+   ValueHandle calleeRTTI,
+   ValueHandle fnRTTI);
 }
 
 #endif // UBSAN_HANDLERS_CXX_H
--- libsanitizer/ubsan/ubsan_handlers_cxx.cpp.jj2023-11-14 
23:52:59.417503473 +0100
+++ libsanitizer/ubsan/ubsan_handlers_cxx.cpp   2023-11-15 11:31:56.241672876 
+0100
@@ -156,6 +156,50 @@ void __ubsan_handle_cfi_bad_type(CFIChec
 Diag(Loc, DL_Note, ET, "check failed in %0, vtable located in %1")
 << SrcModule << DstModule;
 }
+
+static bool handleFunctionTypeMismatch(FunctionTypeMismatchData *Data,
+   ValueHandle Function,
+   ValueHandle calleeRTTI,
+   ValueHandle fnRTTI, ReportOptions Opts) 
{
+  if (checkTypeInfoEquality(reinterpret_cast(calleeRTTI),
+reinterpret_cast(fnRTTI)))
+return false;
+
+  SourceLocation CallLoc = Data->Loc.acquire();
+  ErrorType ET = ErrorType::FunctionTypeMismatch;
+
+  if (ignoreReport(CallLoc, Opts, ET))
+return true;
+
+  ScopedReport R(Opts, CallLoc, ET);
+
+  SymbolizedStackHolder FLoc(getSymbolizedLocation(Function));
+  const char *FName = FLoc.get()->info.function;
+  if (!FName)
+FName = "(unknown)";
+
+  Diag(CallLoc, DL_Error, ET,
+   "call to function %0 through pointer to incorrect function type %1")
+  << FName << Data->Type;
+  Diag(FLoc, DL_Note, ET, "%0 defined here") << FName;
+  return true;
+}
+
+void __ubsan_handle_function_type_mismatch_v1(FunctionTypeMismatchData *Data,
+  ValueHandle Function,
+  ValueHandle calleeRTTI,
+  ValueHandle fnRTTI) {
+  GET_REPORT_OPTIONS(false);
+  handleFunctionTypeMismatch(Data, Function, calleeRTTI, fnRTTI, Opts);
+}
+
+void __ubsan_handle_function_type_mismatch_v1_abort(
+FunctionTypeMismatchData *Data, ValueHandle Function,
+ValueHandle calleeRTTI, ValueHandle fnRTTI) {
+  GET_REPORT_OPTIONS(true);
+  if (handleFunctionTypeMismatch(Data, Function, calleeRTTI, fnRTTI, Opts))
+Die();
+}
 }  // namespace __ubsan
 
 #endif // CAN_SANITIZE_UB
--- libsanitizer/ubsan/ubsan_interface.inc.jj   2023-11-14 23:52:59.417503473 
+0100
+++ libsanitizer/ubsan/ubsan_interface.inc  2023-11-15 11:32:57.430809418 
+0100
@@ -21,6 +21,8 @@ INTERFACE_FUNCTION(__ubsan_handle_dynami
 INTERFACE_FUNCTION(__ubsan_handle_dynamic_type_cache_miss_abort)
 INTERFACE_FUNCTION(__ubsan_handle_float_cast_overflow)
 INTERFACE_FUNCTION(__ubsan_handle_float_cast_overflow_abort)
+INTERFACE_FUNCTION(__ubsan_handle_function_type_mismatch_v1)
+INTERFACE_FUNCTION(__ubsan_handle_function_type_mismatch_v1_abort)
 INTERFACE_FUNCTION(__ubsan_handle_function_type_mismatch)
 INTERFACE_FUNCTION(__ubsan_handle_function_type_mismatch_abort)
 INTERFACE_FUNCTION(__ubsan_handle_implicit_conversion)

Jakub



[PING] [PATCH v2] A new copy propagation and PHI elimination pass

2023-11-15 Thread Filip Kastl
- Forwarded message from Filip Kastl  -

From: Filip Kastl 
To: gcc-patches@gcc.gnu.org
Cc: rguent...@suse.de, hubi...@ucw.cz
Subject: [PATCH v2] A new copy propagation and PHI elimination pass
Date: Thu, 2 Nov 2023 14:00:02 +0100
Message-ID: 

> Hi,
> 
> this is a patch that I submitted two months ago as an RFC. I added some polish
> since.
> 
> It is a new lightweight pass that removes redundant PHI functions and as a
> bonus does basic copy propagation. With Jan Hubička we measured that it is 
> able
> to remove usually more than 5% of all PHI functions when run among early 
> passes
> (sometimes even 13% or more). Those are mostly PHI functions that would be
> later optimized away but with this pass it is possible to remove them early
> enough so that they don't get streamed when running LTO (and also potentially
> inlined at multiple places). It is also able to remove some redundant PHIs
> that otherwise would still be present during RTL expansion.
> 
> Jakub Jelínek was concerned about debug info coverage so I compiled cc1plus
> with and without this patch. These are the sizes of .debug_info and
> .debug_loclists
> 
> .debug_info without patch 181694311
> .debug_info    with patch 181692320
> +0.0011% change
> 
> .debug_loclists without patch 47934753
> .debug_loclists with patch 47934966
> -0.0004% change
> 
> I wanted to use dwlocstat to compare debug coverages but didn't manage to get
> the program working on my machine sadly. Hope this suffices. Seems to me that
> my patch doesn't have a significant impact on debug info.
> 
> Bootstrapped and tested* on x86_64-pc-linux-gnu.
> 
> * One testcase (pr79691.c) did regress. However that is because the test is
> dependent on a certain variable not being copy propagated. I will go into more
> detail about this in a reply to this mail.
> 
> Ok to commit?

This is a second version of the patch.  In this version, I modified the
pr79691.c testcase so that it works as intended with other changes from the
patch.

The pr79691.c testcase checks that we get constants from snprintf calls and
that they simplify into a single constant.  The testcase doesn't account for
the fact that this constant may be further copy propagated which is exactly
what happens with this patch applied.

Bootstrapped and tested on x86_64-pc-linux-gnu.

Ok to commit?

Filip Kastl

-- >8 --

This patch adds the strongly-connected copy propagation (SCCOPY) pass.
It is a lightweight GIMPLE copy propagation pass that also removes some
redundant PHI statements. It handles degenerate PHIs, e.g.:

_5 = PHI <_1>;
_6 = PHI <_6, _6, _1, _1>;
_7 = PHI <16, _7>;
// Replaces occurrences of _5 and _6 by _1 and _7 by 16

It also handles more complicated situations, e.g.:

_8 = PHI <_9, _10>;
_9 = PHI <_8, _10>;
_10 = PHI <_8, _9, _1>;
// Replaces occurrences of _8, _9 and _10 by _1
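The idea scales from the degenerate case to arbitrary cycles. As a from-scratch illustration (invented names and data layout, not the GIMPLE implementation in tree-ssa-sccopy.cc), the core can be sketched by running Tarjan's algorithm over the graph that links each PHI result to its arguments, and collapsing every strongly connected component whose inputs from outside the component all resolve to a single value:

```python
def sccopy(phis):
    """phis: dict mapping each PHI result to its argument list.
    Returns a map from every PHI result to its final replacement."""
    import sys
    sys.setrecursionlimit(100000)
    index, low = {}, {}
    stack, onstack, sccs, counter = [], set(), [], [0]

    def strongconnect(v):               # Tarjan's SCC algorithm
        index[v] = low[v] = counter[0]; counter[0] += 1
        stack.append(v); onstack.add(v)
        for w in phis[v]:               # edges go to PHI arguments
            if w in phis and w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in onstack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            scc = set()
            while True:
                w = stack.pop(); onstack.discard(w); scc.add(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in phis:
        if v not in index:
            strongconnect(v)

    subst = {}
    def resolve(x):
        while x in subst:
            x = subst[x]
        return x

    # Tarjan emits SCCs in reverse topological order, so everything an
    # SCC refers to outside itself is already fully resolved.
    for scc in sccs:
        outer = {resolve(a) for v in scc for a in phis[v]
                 if resolve(a) not in scc}
        if len(outer) == 1:             # the whole SCC is one copy
            val = outer.pop()
            for v in scc:
                subst[v] = val
    return {v: resolve(v) for v in phis}
```

For the two examples above this maps `_5` and `_6` to `_1`, `_7` to `16`, and all of `_8`, `_9`, `_10` to `_1`.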

gcc/ChangeLog:

* Makefile.in: Added sccopy pass.
* passes.def: Added sccopy pass before LTO streaming and before
  RTL expansion.
* tree-pass.h (make_pass_sccopy): Added sccopy pass.
* tree-ssa-sccopy.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr79691.c: Updated scan-tree-dump to account
  for additional copy propagation this patch adds.
* gcc.dg/sccopy-1.c: New test.

Signed-off-by: Filip Kastl 
---
 gcc/Makefile.in |   1 +
 gcc/passes.def  |   3 +
 gcc/testsuite/gcc.dg/sccopy-1.c |  78 +++
 gcc/testsuite/gcc.dg/tree-ssa/pr79691.c |   2 +-
 gcc/tree-pass.h |   1 +
 gcc/tree-ssa-sccopy.cc  | 867 
 6 files changed, 951 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/sccopy-1.c
 create mode 100644 gcc/tree-ssa-sccopy.cc

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index a25a1e32fbc..2bd5a015676 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1736,6 +1736,7 @@ OBJS = \
tree-ssa-pre.o \
tree-ssa-propagate.o \
tree-ssa-reassoc.o \
+   tree-ssa-sccopy.o \
tree-ssa-sccvn.o \
tree-ssa-scopedtables.o \
tree-ssa-sink.o \
diff --git a/gcc/passes.def b/gcc/passes.def
index 1e1950bdb39..fa6c5a2c9fa 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -100,6 +100,7 @@ along with GCC; see the file COPYING3.  If not see
  NEXT_PASS (pass_if_to_switch);
  NEXT_PASS (pass_convert_switch);
  NEXT_PASS (pass_cleanup_eh);
+ NEXT_PASS (pass_sccopy);
  NEXT_PASS (pass_profile);
  NEXT_PASS (pass_local_pure_const);
  NEXT_PASS (pass_modref);
@@ -368,6 +369,7 @@ along with GCC; see the file COPYING3.  If not see
 However, this also causes us to misdiagnose cases that should be
 real warnings (e.g., testsuite/gcc.dg/pr18501.c).  */
   NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
+  NEXT_PASS (pass_sccopy);
   NEXT_PASS (pass_tail_calls);
   /* Split critical edges

Re: [PATCH 1/4] libsanitizer: merge from upstream (c425db2eb558c263)

2023-11-15 Thread Sam James


Jakub Jelinek  writes:

> Hi!
>
> The following patch is result of libsanitizer/merge.sh
> from c425db2eb558c263 (yesterday evening).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux (together with
> the follow-up 3 patches I'm about to post).
>
> Iain, could you please check Darwin?
>
> And anyone else their favourite platform?
>
> BTW, seems upstream has added riscv64 support for I think lsan/tsan,
> so if anyone is willing to try it there, it would be a matter of
> copying e.g. the s390*-*-linux* libsanitizer/configure.tgt entry
> to riscv64-*-linux* with the obvious s/s390x/riscv64/ change in it.

Thank you for doing it.

This will fix PR109882.

>
> Thanks.
>
>   Jakub
>
> [2. application/gzip; gcc14-libsanitizer-merge.patch.gz]...



[committed] libstdc++: std::stacktrace tweaks

2023-11-15 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk. Backports to follow.

The new hash.cc test was failing for armv8l-unknown-linux-gnueabihf
according to Linaro CI. This should fix it (but there are still other
failures for std::stacktrace, so I opened PR 112541).

-- >8 --

Fix a typo in a string literal and make the new hash.cc test gracefully
handle missing stacktrace data (see PR 112541).

libstdc++-v3/ChangeLog:

* include/std/stacktrace (basic_stacktrace::at): Fix class name
in exception message.
* testsuite/19_diagnostics/stacktrace/hash.cc: Do not fail if
current() returns an empty stacktrace.
---
 libstdc++-v3/include/std/stacktrace  | 2 +-
 libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/std/stacktrace b/libstdc++-v3/include/std/stacktrace
index 9a0d0b16068..9d5f6396aed 100644
--- a/libstdc++-v3/include/std/stacktrace
+++ b/libstdc++-v3/include/std/stacktrace
@@ -425,7 +425,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   at(size_type __n) const
   {
if (__n >= size())
- __throw_out_of_range("basic_stack_trace::at: bad frame number");
+ __throw_out_of_range("basic_stacktrace::at: bad frame number");
return begin()[__n];
   }
 
diff --git a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc
index 88831efd687..21705098ff0 100644
--- a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc
+++ b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc
@@ -12,9 +12,10 @@ test_hash()
   using Alloc = __gnu_test::uneq_allocator;
   using S = std::basic_stacktrace;
   S s;
+  S cur = S::current();
   std::size_t h = std::hash()(s);
-  std::size_t h2 = std::hash()(S::current());
-  VERIFY( h != h2 );
+  std::size_t h2 = std::hash()(cur);
+  VERIFY( cur.empty() == (h == h2) );
 }
 
 int main()
-- 
2.41.0



[committed] libstdc++: Fix std::deque::operator[] Xmethod [PR112491]

2023-11-15 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk. Backports to follow.

-- >8 --

The Xmethod for std::deque::operator[] has the same bug that I recently
fixed for the std::deque::size() Xmethod. The first node might have
unused capacity at the start, which needs to be accounted for when
indexing into the deque.
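The indexing rule can be modeled outside GDB. This is a simplified sketch with invented names, not the Xmethod code itself: the first node contributes only the elements between its current and last pointers, so any unused front capacity must be skipped before falling back to whole-node arithmetic:

```python
BUFSIZE = 4  # elements per node; libstdc++ computes this from the type

def deque_index(nodes, front_slack, idx):
    """nodes: list of BUFSIZE-element lists; front_slack: number of
    unused slots at the front of nodes[0] (nonzero after pop_front)."""
    first_node_size = BUFSIZE - front_slack
    if idx < first_node_size:
        return nodes[0][front_slack + idx]
    idx -= first_node_size              # skip the partial first node
    return nodes[1 + idx // BUFSIZE][idx % BUFSIZE]
```

The buggy version effectively indexed `first_node + idx // bufsize` directly, which is only correct when the first node is full.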

libstdc++-v3/ChangeLog:

PR libstdc++/112491
* python/libstdcxx/v6/xmethods.py (DequeWorkerBase.index):
Correctly handle unused capacity at the start of the first node.
* testsuite/libstdc++-xmethods/deque.cc: Check index operator
when elements have been removed from the front.
---
 libstdc++-v3/python/libstdcxx/v6/xmethods.py   | 12 
 libstdc++-v3/testsuite/libstdc++-xmethods/deque.cc |  6 +-
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/python/libstdcxx/v6/xmethods.py b/libstdc++-v3/python/libstdcxx/v6/xmethods.py
index dcef285180a..f29bc2d40fb 100644
--- a/libstdc++-v3/python/libstdcxx/v6/xmethods.py
+++ b/libstdc++-v3/python/libstdcxx/v6/xmethods.py
@@ -195,7 +195,7 @@ class DequeWorkerBase(gdb.xmethod.XMethodWorker):
 def size(self, obj):
 start = obj['_M_impl']['_M_start']
 finish = obj['_M_impl']['_M_finish']
-if not start['_M_node']:
+if start['_M_cur'] == finish['_M_cur']:
 return 0
 return (self._bufsize
 * (finish['_M_node'] - start['_M_node'] - 1)
@@ -203,9 +203,13 @@ class DequeWorkerBase(gdb.xmethod.XMethodWorker):
 + (start['_M_last'] - start['_M_cur']))
 
 def index(self, obj, idx):
-first_node = obj['_M_impl']['_M_start']['_M_node']
-index_node = first_node + int(idx) // self._bufsize
-return index_node[0][idx % self._bufsize]
+start = obj['_M_impl']['_M_start']
+first_node_size = start['_M_last'] - start['_M_cur']
+if idx < first_node_size:
+return start['_M_cur'][idx]
+idx = idx - first_node_size
+index_node = start['_M_node'][1 + int(idx) // self._bufsize]
+return index_node[idx % self._bufsize]
 
 
 class DequeEmptyWorker(DequeWorkerBase):
diff --git a/libstdc++-v3/testsuite/libstdc++-xmethods/deque.cc b/libstdc++-v3/testsuite/libstdc++-xmethods/deque.cc
index e4077c14ff5..6058d23c87a 100644
--- a/libstdc++-v3/testsuite/libstdc++-xmethods/deque.cc
+++ b/libstdc++-v3/testsuite/libstdc++-xmethods/deque.cc
@@ -69,20 +69,24 @@ main ()
 
   // PR libstdc++/112491
   std::deque q5;
-  q5.push_front(0);
+  q5.push_front(5);
 // { dg-final { note-test q5.size() 1 } }
+// { dg-final { note-test q5\[0\] 5 } }
   std::deque q6 = q1;
   q6.pop_front();
 // { dg-final { note-test {q6.size() == (q1_size-1)} true } }
+// { dg-final { note-test q6\[1\] 102 } }
   std::deque q7 = q2;
   q7.pop_front();
   q7.pop_front();
 // { dg-final { note-test {q7.size() == (q2_size-2)} true } }
+// { dg-final { note-test q7\[1\] 203 } }
   std::deque q8 = q3;
   q8.pop_front();
   q8.pop_front();
   q8.pop_front();
 // { dg-final { note-test {q8.size() == (q3_size-3)} true } }
+// { dg-final { note-test q8\[1\] 304 } }
   std::deque q9 = q8;
   q9.clear();
 // { dg-final { note-test q9.size() 0 } }
-- 
2.41.0



Re: [PATCH v2] LoongArch: Remove redundant barrier instructions before LL-SC loops

2023-11-15 Thread Xi Ruoyao
Pushed r14-5486.

/* snip */

> > * gcc.target/loongarch/cas-acquire.c: New test.

This test fails with GCC 12/13 on LA664, and it indicates a correctness
issue.  May I backport this patch to 12/13 as well? 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH 1/4] libsanitizer: merge from upstream (c425db2eb558c263)

2023-11-15 Thread Richard Biener
On Wed, 15 Nov 2023, Jakub Jelinek wrote:

> Hi!
> 
> The following patch is result of libsanitizer/merge.sh
> from c425db2eb558c263 (yesterday evening).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux (together with
> the follow-up 3 patches I'm about to post).
> 
> Iain, could you please check Darwin?
> 
> And anyone else their favourite platform?
> 
> BTW, seems upstream has added riscv64 support for I think lsan/tsan,
> so if anyone is willing to try it there, it would be a matter of
> copying e.g. the s390*-*-linux* libsanitizer/configure.tgt entry
> to riscv64-*-linux* with the obvious s/s390x/riscv64/ change in it.

OK for the series.

> Thanks.
> 
>   Jakub
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] Only allow (copysign x, NEG_CONST) -> (fneg (fabs x)) simplification for constant folding [PR112483]

2023-11-15 Thread Richard Biener
On Tue, Nov 14, 2023 at 10:14 PM Xi Ruoyao  wrote:
>
> On Tue, 2023-11-14 at 11:44 +0100, Richard Biener wrote:
> > > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> > > index 2d2e5a3c1ca..f3745d86aea 100644
> > > --- a/gcc/simplify-rtx.cc
> > > +++ b/gcc/simplify-rtx.cc
> > > @@ -4392,7 +4392,7 @@ simplify_ashift:
> > >real_convert (&f1, mode, CONST_DOUBLE_REAL_VALUE (trueop1));
> > >rtx tmp = simplify_gen_unary (ABS, mode, op0, mode);
> > >if (REAL_VALUE_NEGATIVE (f1))
> > > -   tmp = simplify_gen_unary (NEG, mode, tmp, mode);
> > > +   tmp = simplify_unary_operation (NEG, mode, tmp, mode);
> > >   return tmp;
> > > }
> >
> > shouldn't that be when either the ABS or the NEG simplify?
>
> Simplify (copysign x, POSTIVE_CONST) to (abs x) is an optimization.  So
> for a positive f1, tmp will just be (abs x) and we return it.

Ah, OK.

> > And I wonder when that happens - I suppose when op0 is CONST_DOUBLE only?
>
> Yes, it's Andrew's intention.

The patch is fine then.

Richard.

> --
> Xi Ruoyao 
> School of Aerospace Science and Technology, Xidian University
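The folding rule being discussed can be expressed in a toy model (illustrative only, not GCC's RTL machinery): `copysign (x, c)` with constant `c` simplifies to `(abs x)` when `c` is non-negative, while for negative `c` the negation of `(abs x)` is only constant-folded when the operand is itself a constant, mirroring `simplify_unary_operation` returning null for non-constant operands:

```python
import math

def fold_copysign(x, c):
    """x: a float constant or a symbolic operand name (str);
    c: the constant sign source.  Returns the folded form, or None
    when no simplification applies."""
    absx = abs(x) if isinstance(x, float) else ('abs', x)
    if not (math.copysign(1.0, c) < 0.0):
        # copysign(x, +c) == abs(x): always a valid simplification.
        return absx
    # copysign(x, -c) == -abs(x): only fold the negation for a
    # constant operand; otherwise give up rather than build new RTL.
    if isinstance(x, float):
        return -absx
    return None
```

`math.copysign` is used so that a negative zero sign source is treated as negative, matching `REAL_VALUE_NEGATIVE` semantics more closely than a plain `c < 0` test would.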


Re: [PATCH v4] gcc: Introduce -fhardened

2023-11-15 Thread Richard Biener
On Tue, Nov 14, 2023 at 5:00 PM Marek Polacek  wrote:
>
> On Tue, Nov 14, 2023 at 08:46:16AM +0100, Richard Biener wrote:
> > On Fri, Nov 3, 2023 at 11:51 PM Marek Polacek  wrote:
> > >
> > > On Thu, Oct 26, 2023 at 05:55:56PM +0200, Richard Biener wrote:
> > > >
> > > >
> > > > > Am 24.10.2023 um 21:09 schrieb Marek Polacek :
> > > > >
> > > > > On Tue, Oct 24, 2023 at 09:22:25AM +0200, Richard Biener wrote:
> > > > >>> On Mon, Oct 23, 2023 at 9:26 PM Marek Polacek  
> > > > >>> wrote:
> > > > >>>
> > > > >>> On Thu, Oct 19, 2023 at 02:24:11PM +0200, Richard Biener wrote:
> > > >  Can you see how our
> > > >  primary and secondary targets (+ host OS) behave here?
> > > > >>>
> > > > >>> That's very reasonable.  I tried to build gcc on Compile Farm 119 
> > > > >>> (AIX) but
> > > > >>> that fails with:
> > > > >>>
> > > > >>> ar  -X64 x ../ppc64/libgcc/libgcc_s.a shr.o
> > > > >>> ar: 0707-100 ../ppc64/libgcc/libgcc_s.a does not exist.
> > > > >>> make[2]: *** 
> > > > >>> [/home/polacek/gcc/libgcc/config/rs6000/t-slibgcc-aix:98: all] 
> > > > >>> Error 1
> > > > >>> make[2]: Leaving directory 
> > > > >>> '/home/polacek/x/trunk/powerpc-ibm-aix7.3.1.0/libgcc'
> > > > >>>
> > > > >>> and I tried Darwin (104) and that fails with
> > > > >>>
> > > > >>> *** Configuration aarch64-apple-darwin21.6.0 not supported
> > > > >>>
> > > > >>> Is anyone else able to build gcc on those machines, or test the 
> > > > >>> attached
> > > > >>> patch?
> > > > >>>
> > > >  I think the
> > > >  documentation should elaborate a bit on expectations for 
> > > >  non-Linux/GNU
> > > >  targets, specifically I think the default configuration for a 
> > > >  target should
> > > >  with -fhardened _not_ have any -Whardened diagnostics.  Maybe we 
> > > >  can
> > > >  have a testcase for this?
> > > > >>>
> > > > >>> Sorry, I'm not sure how to test that.  I suppose if -fhardened 
> > > > >>> enables
> > > > >>> something not supported on those systems, and it's something for 
> > > > >>> which
> > > > >>> we have a configure test, then we shouldn't warn.  This is already 
> > > > >>> the
> > > > >>> case for -pie, -z relro, and -z now.
> > > > >>
> > > > >> I was thinking of
> > > > >>
> > > > >> /* { dg-do compile } */
> > > > >> /* { dg-additional-options "-fhardened -Whardened" } */
> > > > >>
> > > > >> int main () {}
> > > > >>
> > > > >> and excess errors should catch "misconfigurations"?
> > > > >
> > > > > I see.  fhardened-3.c is basically just like this (-Whardened is on 
> > > > > by default).
> > > > >
> > > > >>> Should the docs say something like the following for features 
> > > > >>> without
> > > > >>> configure checks?
> > > > >>>
> > > > >>> @option{-fhardened} can, on certain systems, attempt to enable 
> > > > >>> features
> > > > >>> not supported on that particular system.  In that case, it's 
> > > > >>> possible to
> > > > >>> prevent the warning using the @option{-Wno-hardened} option.
> > > > >>
> > > > >> Yeah, but ideally
> > > > >>
> > > > >> @option{-fhardened} can, on certain systems, not enable features not
> > > > >> available on those systems and @option{-Whardened} will not diagnose
> > > > >> those as missing.
> > > > >>
> > > > >> But I understand it doesn't work like that?
> > > > >
> > > > > Right.  It will not diagnose missing features if they have a configure
> > > > > check, otherwise it will.  And I don't know if we want a configure 
> > > > > check
> > > > > for every feature.  Maybe we can add them in the future if the current
> > > > > patch turns out to be problematical in practice?
> > > >
> > > > Maybe we can have a switch on known target triples and statically 
> > > > configure based
> > > > On that, eventually even not support -fhardened for targets not listed. 
> > > >  That’s certainly easier than detecting the target system features 
> > > > (think of cross compilers)
> > >
> > > You mean like the following?  The only difference is the addition of
> > > HAVE_FHARDENED_SUPPORT and updating the tests to only run on gnu/linux
> > > targets.  If other OSs want to use -fhardened, they need to update the
> > > configure test.  Thanks,
> >
> > Yes, something like this.  IMHO we should aim to at least support all
> > our primary platforms (and maybe secondary if they have a relevant
> > host OS part).
>
> That sounds good.  Do you want to see any other changes in this patch
> or are you fine with it as-is (provided that someone else also acks it)?

I'm fine with it as-is if somebody else also acks it.

Richard.

> Thanks,
>
> Marek
>


Re: gfortran.dg/dg.exp debug messages pollute test output

2023-11-15 Thread Rainer Orth
Hi FX,

>> FX submitted the patch series, I can find the reference if you need it.
>
> Patch was submitted in this thread:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630096.html

ah, I see.  I'd been looking for the patch summary and Iain's name in my
searches; that's why I came up blank.

>>> Besides,
>>> it's unclear if those messages can just be removed (they are pretty
>>> cryptic as is) or at least changed to use verbose instead of puts.
>>> Please fix.
>
> I don’t see value in this output, so I think it’s best to remove the puts
> calls entirely. Attached patch does that.
> Testing under progress, OK if it passes? (or does that count as obvious
> fix-up of the previous patch)

Ok for trunk.  However, this perfectly fits the obvious criteria, too.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH] Fix ICE with SLP and -fdbg-cnt

2023-11-15 Thread Richard Biener
We have to clear the visited flag on stmts.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-vect-slp.cc (vect_slp_region): Also clear visited flag when
we skipped an instance due to -fdbg-cnt.
---
 gcc/tree-vect-slp.cc | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 80e279d8f50..33c4d1308f6 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -7652,6 +7652,7 @@ vect_slp_region (vec bbs, vec datarefs,
 
  bb_vinfo->shared->check_datarefs ();
 
+ bool force_clear = false;
  auto_vec profitable_subgraphs;
  for (slp_instance instance : BB_VINFO_SLP_INSTANCES (bb_vinfo))
{
@@ -7674,15 +7675,17 @@ vect_slp_region (vec bbs, vec datarefs,
 
  vect_location = saved_vect_location;
  if (!dbg_cnt (vect_slp))
-   continue;
+   {
+ force_clear = true;
+ continue;
+   }
 
  profitable_subgraphs.safe_push (instance);
}
 
  /* When we're vectorizing an if-converted loop body make sure
 we vectorized all if-converted code.  */
- if (!profitable_subgraphs.is_empty ()
- && orig_loop)
+ if ((!profitable_subgraphs.is_empty () || force_clear) && orig_loop)
{
  gcc_assert (bb_vinfo->bbs.length () == 1);
  for (gimple_stmt_iterator gsi = gsi_start_bb (bb_vinfo->bbs[0]);
-- 
2.35.3


[PATCH] tree-optimization/112282 - wrong-code with ifcvt hoisting

2023-11-15 Thread Richard Biener
The following avoids hoisting of invariants from conditionally
executed parts of an if-converted loop.  That now makes a difference
since we perform bitfield lowering even when we do not actually
if-convert the loop.  if-conversion deals with resetting flow-sensitive
info when necessary already.
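An invented example (not the PR112282 reproducer) of why hoisting from a conditionally executed block is unsafe: a statement can be loop-invariant and yet only safe to evaluate under its guard.

```python
# The division below is loop-invariant, but it only executes when the
# guard d != 0 holds.  "Hoisting" it to the loop preheader would
# evaluate it unconditionally and divide by zero when d == 0.
def sum_guarded(a, d):
    s = 0
    for x in a:
        if d != 0:          # conditionally executed part of the loop
            r = 100 // d    # invariant, but must stay under the guard
            s += x + r
        else:
            s += x
    return s
```

Statements in the loop header, by contrast, execute on every iteration once the loop is entered, so restricting `ifcvt_hoist_invariants` to the header avoids the problem.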

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/112282
* tree-if-conv.cc (ifcvt_hoist_invariants): Only hoist from
the loop header.

* gcc.dg/torture/pr112282.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr112282.c | 132 
 gcc/tree-if-conv.cc |  44 
 2 files changed, 153 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr112282.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr112282.c b/gcc/testsuite/gcc.dg/torture/pr112282.c
new file mode 100644
index 000..23e0ed64b82
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr112282.c
@@ -0,0 +1,132 @@
+/* { dg-do run } */
+
+int printf(const char *, ...);
+void __assert_fail();
+int a, g, h, i, v, w = 2, x, y, ab, ac, ad, ae, af, ag;
+static int f, j, m, n, p, r, u, aa;
+struct b {
+  int c : 20;
+  int d : 20;
+  int e : 10;
+};
+static struct b l, o, q = {3, 3, 5};
+int s(int z) {
+  struct b ah;
+  int ai = 1, aj[7] = {1, 1, 1, 1, 1, 1, 1};
+ak:
+  for (u = -22; u < 2; ++u) {
+struct b al[8] = {{2, 7, 9}, {8, 7, 1}, {2, 7, 9}, {8, 7, 1}, {2, 7, 9}, {8, 7, 1}, {2, 7, 9}};
+y = z = 0;
+for (; z < 2; z++) {
+  int am[18], k;
+  ab = ac = 0;
+  for (; ac < 1; ac++)
+for (k = 0; k < 9; k++)
+  am[k] = 0;
+  n = 0;
+  while (1) {
+v = u < 0 || a;
+h = z < ~u && 4 & q.c;
+if ((aa <= l.c) > q.d && p)
+  return o.c;
+if (w)
+  break;
+return q.e;
+  }
+  a = j;
+}
+  }
+  for (x = 0; x < 2; x++) {
+struct b an = {1, 8, 4};
+int ao[28] = {5, 0, 0, 9, 0, 3, 0, 5, 0, 0, 9, 0, 3, 0, 5, 0, 0, 9, 0, 3, 0, 5, 0, 0, 9, 0, 3, 0};
+if (q.e) {
+  int ap = ai || l.c + q.c, aq = q.d, ar = p & f;
+  q.d = q.d || ar || ap;
+  p = 0;
+  if (!j && ai)
+goto as;
+  if (q.d) {
+printf("", l);
+q.d = f >> j;
+  }
+  p = l.c = aq;
+  an = q;
+} else {
+  int at[12][1] = {{9}, {9}, {5}, {9}, {9}, {5}, {9}, {9}, {5}, {9}, {9}, {5}};
+  struct b au;
+  if (o.c)
+aa = ah.e;
+  if (an.d)
+ah.e = (j & (aa * m)) ^ au.d;
+  o.c = m + aa;
+  int av = o.c || 0, aw = ai || q.c & l.c, ax = n;
+  if (q.e < ai)
+q = an;
+  if (r)
+break;
+  ai = aw - av;
+  an.e = 0;
+  if (ai) {
+an.e = l.c || 0;
+f = q.c;
+ah.e = l.c % q.d;
+q.c = au.e;
+if ((q.d && q.c) || ah.e)
+  __assert_fail();
+q.c = 0;
+if (au.d > m || ah.e)
+  w = au.c | (n & ah.c);
+  as:
+ae = af = ah.c;
+int ay = au.d & q.e & au.c || o.c, az = 0 || o.c, ba = m & ah.d;
+if (n)
+  au.c = au.e = (q.e || ah.d) ^ (o.c + (az / au.e));
+n = au.c || au.e;
+if (ba) {
+  printf("", ax);
+  x = q.e | m;
+  continue;
+}
+m = ay;
+n = printf("", au);
+  }
+  if (ah.d)
+o.c = l.c & o.c & q.c;
+  if (q.d)
+__assert_fail();
+  printf("", an);
+  printf("", q);
+  printf("", au);
+  if (ah.e)
+while (u++) {
+  struct b al[7] = {{7, 9, 8}, {7, 1, 2}, {7, 9, 8}, {7, 1, 2}, {7, 9, 8}, {7, 1, 2}, {7, 9, 0}};
+  if (an.d) {
+int d[8] = {0, 1, 0, 1, 0, 1, 0, 1};
+if (ad)
+  goto ak;
+while (ag)
+  g = an.d = i = m;
+f = j;
+  }
+  n++;
+}
+  f = q.d;
+}
+if (l.c && m) {
+  int d[7] = {1, 0, 1, 0, 1, 0, 1};
+  if (x)
+h = an.d;
+  else
+g = 0;
+}
+  }
+  int bb = (q.d ^ ah.c) | aa | (q.e & q.c) | (f & ah.d);
+  if (bb)
+return x;
+  return 0;
+}
+int main() {
+  j = 1;
+  s(0);
+  return 0;
+}
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 0190cf2369e..0bde281c246 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -3468,30 +3468,28 @@ ifcvt_can_hoist (class loop *loop, edge pe, gimple *stmt)
 static void
 ifcvt_hoist_invariants (class loop *loop, edge pe)
 {
+  /* Only hoist from the now unconditionally executed part of the loop.  */
+  basic_block bb = loop->header;
   gimple_stmt_iterator hoist_gsi = {};
-  unsigned int num_blocks = loop->num_nodes;
-  basic_block *body = get_loop_body (loop);
-  for (unsigned int i = 0; i < num_blocks; ++i)
-for (gimple_stmt_iterator gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi);)
-  {
-   gimple *stmt = gsi_stmt (gsi);
-   if (ifcvt_can_hoist (loop, pe, stmt))
- {
-   /* Once 

RE: [PATCH 5/21]middle-end: update vectorizer's control update to support picking an exit other than loop latch

2023-11-15 Thread Richard Biener
On Tue, 14 Nov 2023, Tamar Christina wrote:

> > > OK, but then I think the fix is to not use
> > > standard_iv_increment_position (it's a weird API anyway).  Instead insert
> > before the main exit condition.
> > 
> > I figured as much. Almost done respinning it with the vectorizer's own 
> > simpler
> > copy.
> > Should be out today with the rest.
> > 
> > >
> > > Btw, I assumed this order of main / early exit cannot happen.  But I
> > > didn't re- review the main exit identification code yet.
> > >
> > 
> > It can happen because we allowed vec_init_loop_exit_info to pick the last
> > analyzable exit.  In cases like these it happens because the final exit has 
> > no
> > information from SCEV. It then picks the last exit it could analyze which by
> > default is an early exit.
> > 
> > It's very tricky to deal with and have just finished cleaning up the IV 
> > update
> > code to make it easier to follow... but it does seem to add about 970 more
> > vectorized cases (most of which are execution tests).
> > 
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

OK.

Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-loop-manip.cc (vect_iv_increment_position): New.
>   (vect_set_loop_controls_directly): Use it.
>   (vect_set_loop_condition_partial_vectors_avx512): Likewise.
>   (vect_set_loop_condition_normal): Likewise.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 
> fafbf924e8db18eb4eec7a4a1906d10f6ce9812f..a5a612dc6b47436730592469176623685a7a413f
>  100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -448,6 +448,20 @@ vect_adjust_loop_lens_control (tree iv_type, gimple_seq *seq,
>  }
>  }
>  
> +/* Stores the standard position for induction variable increment belonging to
> +   LOOP_EXIT (just before the exit condition of the given exit to BSI.
> +   INSERT_AFTER is set to true if the increment should be inserted after
> +   *BSI.  */
> +
> +static void
> +vect_iv_increment_position (edge loop_exit, gimple_stmt_iterator *bsi,
> + bool *insert_after)
> +{
> +  basic_block bb = loop_exit->src;
> +  *bsi = gsi_last_bb (bb);
> +  *insert_after = false;
> +}
> +
>  /* Helper for vect_set_loop_condition_partial_vectors.  Generate definitions
> for all the rgroup controls in RGC and return a control that is nonzero
> when the loop needs to iterate.  Add any new preheader statements to
> @@ -531,7 +545,8 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
>tree index_before_incr, index_after_incr;
>gimple_stmt_iterator incr_gsi;
>bool insert_after;
> -  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> +  edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> +  vect_iv_increment_position (exit_e, &incr_gsi, &insert_after);
>if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
>  {
>/* Create an IV that counts down from niters_total and whose step
> @@ -1017,7 +1032,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
>tree index_before_incr, index_after_incr;
>gimple_stmt_iterator incr_gsi;
>bool insert_after;
> -  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> +  vect_iv_increment_position (exit_edge, &incr_gsi, &insert_after);
>create_iv (niters_adj, MINUS_EXPR, iv_step, NULL_TREE, loop,
>&incr_gsi, insert_after, &index_before_incr,
>&index_after_incr);
> @@ -1185,7 +1200,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop,
> loop handles exactly VF scalars per iteration.  */
>  
>  static gcond *
> -vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
> +vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
>   class loop *loop, tree niters, tree step,
>   tree final_iv, bool niters_maybe_zero,
>   gimple_stmt_iterator loop_cond_gsi)
> @@ -1278,7 +1293,7 @@ vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge exit_edge,
>   }
>  }
>  
> -  standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> +  vect_iv_increment_position (exit_edge, &incr_gsi, &insert_after);
>create_iv (init, PLUS_EXPR, step, NULL_TREE, loop,
>   &incr_gsi, insert_after, &indx_before_incr, &indx_after_incr);
>indx_after_incr = force_gimple_operand_gsi (&loop_cond_gsi, indx_after_incr,
> @@ -1446,7 +1461,7 @@ slpeel_tree_duplicate_loop_for_vectorization (class loop *loop, edge loop_exit,
>redirect_edge_and_branch (exit, dest);
>  }
>  
> -  /* Only fush the main exit, the remaining exits we need to match the order
> +  /* Only flush the main exit, the remaining exits we need to match the order
>   in the loop->header which with multiple exits

[PATCH] c++/modules: Allow exporting const-qualified namespace-scope variables [PR99232]

2023-11-15 Thread Nathaniel Shead
Bootstrapped and regtested on x86_64-pc-linux-gnu. I don't have write
access.

-- >8 --

By [basic.link] p3.2.1, a non-template non-volatile const-qualified
variable is not necessarily internal linkage in a module declaration,
and rather may have module linkage (or external linkage if it is
exported, see p4.8).

PR c++/99232

gcc/cp/ChangeLog:

* decl.cc (grokvardecl): Don't mark variables attached to
modules as internal.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr99232_a.C: New test.
* g++.dg/modules/pr99232_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/decl.cc   | 3 ++-
 gcc/testsuite/g++.dg/modules/pr99232_a.C | 8 
 gcc/testsuite/g++.dg/modules/pr99232_b.C | 7 +++
 3 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/pr99232_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/pr99232_b.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index d2ed46b1453..173dd93ef5b 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -10992,7 +10992,8 @@ grokvardecl (tree type,
&& (DECL_THIS_EXTERN (decl)
|| ! constp
|| volatilep
-   || inlinep));
+   || inlinep
+   || module_attach_p ()));
   TREE_STATIC (decl) = ! DECL_EXTERNAL (decl);
 }
   /* Not at top level, only `static' makes a static definition.  */
diff --git a/gcc/testsuite/g++.dg/modules/pr99232_a.C b/gcc/testsuite/g++.dg/modules/pr99232_a.C
new file mode 100644
index 000..33b3b783399
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr99232_a.C
@@ -0,0 +1,8 @@
+// PR c++/99232
+// { dg-additional-options "-fmodules-ts" }
+// { dg-module-cmi pr99232 }
+
+export module pr99232;
+
+export const double lambda{ 1.3 };
+export constexpr int a = 42;
diff --git a/gcc/testsuite/g++.dg/modules/pr99232_b.C b/gcc/testsuite/g++.dg/modules/pr99232_b.C
new file mode 100644
index 000..98f3c52a51c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr99232_b.C
@@ -0,0 +1,7 @@
+// PR c++/99232
+// { dg-additional-options "-fmodules-ts" }
+
+import pr99232;
+
+double foo() { return lambda * 2.0; }
+static_assert(a == 42);
-- 
2.42.0



Re: [PATCH v4] gcc: Introduce -fhardened

2023-11-15 Thread Jakub Jelinek
On Fri, Nov 03, 2023 at 06:51:16PM -0400, Marek Polacek wrote:
> +  if (flag_hardened)
> + {
> +   if (!fortify_seen_p && optimize > 0)
> + {
> +   if (TARGET_GLIBC_MAJOR == 2 && TARGET_GLIBC_MINOR >= 35)
> + cpp_define (parse_in, "_FORTIFY_SOURCE=3");
> +   else
> + cpp_define (parse_in, "_FORTIFY_SOURCE=2");
> + }

I don't like the above in generic code, the fact that gcc was configured
against glibc target headers doesn't mean it is targetting glibc.
E.g. for most *-linux* targets, config/linux.opt provides the
-mbionic/-mglibc/-muclibc/-mmusl options.

One ugly way around would be to do
#ifdef OPTION_GLIBC
  if (OPTION_GLIBC && TARGET_GLIBC_MAJOR == 2 && TARGET_GLIBC_MINOR >= 35)
cpp_define (parse_in, "_FORTIFY_SOURCE=3");
  else
#endif
cpp_define (parse_in, "_FORTIFY_SOURCE=2");
(assuming OPTION_GLIBC at that point is already computed); a cleaner way
would be to introduce a target hook for that, say
fortify_source_default_level or something similar, where the default hook
would return 2 and next to linux_libc_has_function one would override it
for OPTION_GLIBC && TARGET_GLIBC_MAJOR == 2 && TARGET_GLIBC_MINOR >= 35
to 3.  That way, in the future other targets (say *BSD) can choose to do
something similar more easily.

The rest LGTM.

Jakub



Re: building GNU gettext on AIX

2023-11-15 Thread Bruno Haible
[CCing bug-gettext]

David Edelsohn wrote in
:
> The current gettext-0.22.3 fails to build for me on AIX.

Here are some hints to get a successful build of GNU gettext on AIX:

1. Set the recommended environment variables before running configure:
   https://gitlab.com/ghwiki/gnow-how/-/wikis/Platforms/Configuration

   Namely:
   * for a 32-bit build with gcc:
 CC=gcc
 CXX=g++
 CPPFLAGS="-I$PREFIX/include"
 LDFLAGS="-L$PREFIX/lib"
 unset AR NM
   * for a 32-bit build with xlc:
 CC="xlc -qthreaded -qtls"
 CXX="xlC -qthreaded -qtls"
 CPPFLAGS="-I$PREFIX/include"
 LDFLAGS="-L$PREFIX/lib"
 unset AR NM
   * for a 64-bit build with gcc:
 CC="gcc -maix64"
 CXX="g++ -maix64"
 CPPFLAGS="-I$PREFIX/include"
 LDFLAGS="-L$PREFIX/lib"
 AR="ar -X 64"; NM="nm -X 64 -B"
   * for a 64-bit build with xlc:
 CC="xlc -q64 -qthreaded -qtls"
 CXX="xlC -q64 -qthreaded -qtls"
 CPPFLAGS="-I$PREFIX/include"
 LDFLAGS="-L$PREFIX/lib"
 AR="ar -X 64"; NM="nm -X 64 -B"

   where $PREFIX is the value that you pass to the --prefix configure option.

   Rationale: you can run into all sorts of problems if you choose compiler
   options at random and have no experience with compiler options on that
   platform.
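
As a concrete illustration, the 64-bit gcc case could be scripted along these lines; the prefix value and the commented-out configure step are assumptions, not part of the official instructions:

```shell
#!/bin/sh
# Assumed install prefix; substitute your own.
PREFIX="$HOME/opt/gettext"

# Recommended environment for a 64-bit build with gcc on AIX.
CC="gcc -maix64";              export CC
CXX="g++ -maix64";             export CXX
CPPFLAGS="-I$PREFIX/include";  export CPPFLAGS
LDFLAGS="-L$PREFIX/lib";       export LDFLAGS
AR="ar -X 64";                 export AR
NM="nm -X 64 -B";              export NM

# Then, from the gettext source tree:
# ./configure --prefix="$PREFIX" && make && make install
```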

2. Don't use ibm-clang.

   Rationale: It's broken.

3. Don't use -Wall with gcc 10.3.

   Rationale: If you specify -Wall, gettext's configure adds -fanalyzer, which
   has excessive memory requirements in gcc 10.x. In particular, on AIX, it
   makes cc1 crash while compiling regex.c after it has consumed 1 GiB of RAM.

4. Avoid using a --prefix that contains earlier installations of the same
   package.

   Rationale: Because the AIX linker hardcodes directory names in shared
   libraries, GNU libtool has a peculiar configuration on AIX. It ends up
   mixing the in-build-tree libraries with the libraries in the install
   locations, leading to all sorts of errors.

   If you really need to use a --prefix that contains an earlier
   installation of the same package:
 - Either use --disable-shared and remove libgettextlib.a and
   libgettextsrc.a from $PREFIX/lib before starting the build.
 - Or use a mix of "make -k", "make -k install" and ad-hoc workarounds
   that cannot be described in a general way.

Bruno





RE: [PATCH 4/21]middle-end: update loop peeling code to maintain LCSSA form for early breaks

2023-11-15 Thread Richard Biener
On Wed, 15 Nov 2023, Tamar Christina wrote:

> Patch updated to latest trunk,
> 
> This splits the part of the function that does peeling for loops at exits to
> a different function.  In this new function we also peel for early breaks.
> 
> Peeling for early breaks works by redirecting all early break exits to a
> single "early break" block and combine them and the normal exit edge together
> later in a different block which then goes into the epilog preheader.
> 
> This allows us to re-use all the existing code for IV updates.  Additionally
> this also enables correct linking for multiple vector epilogues.
> 
> flush_pending_stmts cannot be used in this scenario since it updates the PHI
> nodes in the order that they are in the exit destination blocks.  This means
> they are in CFG visit order.  With a single exit this doesn't matter but with
> multiple exits with different live values through the different exits the 
> order
> usually does not line up.
> 
> Additionally the vectorizer helper functions expect to be able to iterate over
> the nodes in the order that they occur in the loop header blocks.  This is an
> invariant we must maintain.  To do this we just inline the work
> flush_pending_stmts but maintain the order by using the header blocks to guide
> the work.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-loop-manip.cc (vect_is_loop_exit_latch_pred): New.
>   (slpeel_tree_duplicate_loop_for_vectorization): New.
>   (slpeel_tree_duplicate_loop_to_edge_cfg): use it.
>   * tree-vectorizer.h (is_loop_header_bb_p): Drop assert.
>   (slpeel_tree_duplicate_loop_to_edge_cfg): Update signature.
>   (vect_is_loop_exit_latch_pred): New.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 
> b9161274ce401a7307f3e61ad23aa036701190d7..fafbf924e8db18eb4eec7a4a1906d10f6ce9812f
>  100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -1392,6 +1392,153 @@ vect_set_loop_condition (class loop *loop, edge 
> loop_e, loop_vec_info loop_vinfo
>(gimple *) cond_stmt);
>  }
>  
> +/* Determine if the exit chosen by the loop vectorizer differs from the
> +   natural loop exit, i.e. whether the exit leads to the loop latch or not.
> +   When this happens we need to flip the understanding of main and other
> +   exits by peeling and IV updates.  */
> +
> +bool inline
> +vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)

Ick, bad name - didn't see its use(s) in this patch?


> +{
> +  return single_pred (loop->latch) == loop_exit->src;
> +}
> +
> +/* Perform peeling for when the peeled loop is placed after the original 
> loop.
> +   This maintains LCSSA and creates the appropriate blocks for multiple exit
> +   vectorization.   */
> +
> +void static
> +slpeel_tree_duplicate_loop_for_vectorization (class loop *loop, edge 
> loop_exit,
> +   vec &loop_exits,
> +   class loop *new_loop,
> +   bool flow_loops,
> +   basic_block new_preheader)

also bad name ;)  I don't see a strong reason to factor this out.

> +{
> +  bool multiple_exits_p = loop_exits.length () > 1;
> +  basic_block main_loop_exit_block = new_preheader;
> +  if (multiple_exits_p)
> +{
> +  edge loop_entry = single_succ_edge (new_preheader);
> +  new_preheader = split_edge (loop_entry);
> +}
> +
> +  auto_vec  new_phis;
> +  hash_map  new_phi_args;
> +  /* First create the empty phi nodes so that when we flush the
> + statements they can be filled in.   However because there is no order
> + between the PHI nodes in the exits and the loop headers we need to
> + order them based on the order of the two headers.  First record the new
> + phi nodes.  Then redirect the edges and flush the changes.  This writes
> + out the new SSA names.  */
> +  for (auto gsi_from = gsi_start_phis (loop_exit->dest);
> +   !gsi_end_p (gsi_from); gsi_next (&gsi_from))
> +{
> +  gimple *from_phi = gsi_stmt (gsi_from);
> +  tree new_res = copy_ssa_name (gimple_phi_result (from_phi));
> +  gphi *res = create_phi_node (new_res, main_loop_exit_block);
> +  new_phis.safe_push (res);
> +}
> +
> +  for (auto exit : loop_exits)
> +{
> +  basic_block dest
> + = exit == loop_exit ? main_loop_exit_block : new_preheader;
> +  redirect_edge_and_branch (exit, dest);
> +}
> +
> +  /* Only flush the main exit; the remaining exits need to match the order
> +     in the loop->header, which with multiple exits may not be the same.  */
> +  flush_pending_stmts (loop_exit);
> +
> +  /* Record the new SSA names in the cache so that we can skip materializing
> + them again when we fill in the rest of the LCSSA

RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits

2023-11-15 Thread Richard Biener
On Wed, 15 Nov 2023, Tamar Christina wrote:

> Patch updated to latest trunk:
> 
> Hi All,
> 
> This changes the PHI node updates to support early breaks.
> It has to support both the case where the loop's exit matches the normal loop
> exit and one where the early exit is "inverted", i.e. it's an early exit edge.
> 
> In the latter case we must always restart the loop for VF iterations.  For an
> early exit the reason is obvious, but there are cases where the "normal" exit
> is located before the early one.  This exit then does a check on ivtmp 
> resulting
> in us leaving the loop since it thinks we're done.
> 
> In these cases we may still have side-effects to perform so we also go to the
> scalar loop.
> 
> For the "normal" exit niters has already been adjusted for peeling, for the
> early exits we must find out how many iterations we actually did.  So we have
> to recalculate the new position for each exit.
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-loop-manip.cc (vect_set_loop_condition_normal): Hide unused.
>   (vect_update_ivs_after_vectorizer): Support early break.
>   (vect_do_peeling): Use it.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 
> d3fa8699271c4d7f404d648a38a95beabeabc99a..e1d210ab4617c894dab3d2654cf1c842baac58f5
>  100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -1200,7 +1200,7 @@ vect_set_loop_condition_partial_vectors_avx512 (class 
> loop *loop,
> loop handles exactly VF scalars per iteration.  */
>  
>  static gcond *
> -vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge exit_edge,
> +vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge 
> exit_edge,
>   class loop *loop, tree niters, tree step,
>   tree final_iv, bool niters_maybe_zero,
>   gimple_stmt_iterator loop_cond_gsi)
> @@ -1412,7 +1412,7 @@ vect_set_loop_condition (class loop *loop, edge loop_e, 
> loop_vec_info loop_vinfo
> When this happens we need to flip the understanding of main and other
> exits by peeling and IV updates.  */
>  
> -bool inline
> +bool
>  vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)
>  {
>return single_pred (loop->latch) == loop_exit->src;
> @@ -2142,6 +2142,7 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
>   Input:
>   - LOOP - a loop that is going to be vectorized. The last few iterations
>of LOOP were peeled.
> + - VF   - The chosen vectorization factor for LOOP.
>   - NITERS - the number of iterations that LOOP executes (before it is
>  vectorized). i.e, the number of times the ivs should be 
> bumped.
>   - UPDATE_E - a successor edge of LOOP->exit that is on the (only) path

the comment on this is now a bit misleading, can you try to update it
and/or move the comment bits to the docs on EARLY_EXIT?

> @@ -2152,6 +2153,9 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
>The phi args associated with the edge UPDATE_E in the bb
>UPDATE_E->dest are updated accordingly.
>  
> + - restart_loop - Indicates whether the scalar loop needs to restart the

params are ALL_CAPS

> +   iteration count where the vector loop began.
> +
>   Assumption 1: Like the rest of the vectorizer, this function assumes
>   a single loop exit that has a single predecessor.
>  
> @@ -2169,18 +2173,22 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
>   */
>  
>  static void
> -vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> -   tree niters, edge update_e)
> +vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo, poly_uint64 vf,

LOOP_VINFO_VECT_FACTOR?

> +   tree niters, edge update_e, bool restart_loop)

I think 'bool early_exit' is better here?  I wonder if we have an "early"
exit after the main exit we are probably sure there are no side-effects
to re-execute and could avoid this restarting?

>  {
>gphi_iterator gsi, gsi1;
>class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>basic_block update_bb = update_e->dest;
> -
> -  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> -
> -  /* Make sure there exists a single-predecessor exit bb:  */
> -  gcc_assert (single_pred_p (exit_bb));
> -  gcc_assert (single_succ_edge (exit_bb) == update_e);
> +  bool inversed_iv
> + = !vect_is_loop_exit_latch_pred (LOOP_VINFO_IV_EXIT (loop_vinfo),
> +  LOOP_VINFO_LOOP (loop_vinfo));
> +  bool needs_interm_block = LOOP_VINFO_EARLY_BREAKS (loop_vinfo)
> + && flow_bb_inside_loop_p (loop, update_e->src);
> +  edge loop_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> +  gcond *cond = get_loop_exit_condition (loop_e);
> +  basic_block exit_bb = loop_e->dest;
> +  basic_block iv_block =

RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits

2023-11-15 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, November 15, 2023 1:01 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early
> breaks and arbitrary exits
> 
> On Wed, 15 Nov 2023, Tamar Christina wrote:
> 
> > Patch updated to latest trunk:
> >
> > Hi All,
> >
> > This changes the PHI node updates to support early breaks.
> > It has to support both the case where the loop's exit matches the
> > normal loop exit and one where the early exit is "inverted", i.e. it's an 
> > early
> exit edge.
> >
> > In the latter case we must always restart the loop for VF iterations.
> > For an early exit the reason is obvious, but there are cases where the
> > "normal" exit is located before the early one.  This exit then does a
> > check on ivtmp resulting in us leaving the loop since it thinks we're done.
> >
> > In these cases we may still have side-effects to perform so we also go
> > to the scalar loop.
> >
> > For the "normal" exit niters has already been adjusted for peeling,
> > for the early exits we must find out how many iterations we actually
> > did.  So we have to recalculate the new position for each exit.
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * tree-vect-loop-manip.cc (vect_set_loop_condition_normal): Hide
> unused.
> > (vect_update_ivs_after_vectorizer): Support early break.
> > (vect_do_peeling): Use it.
> >
> > --- inline copy of patch ---
> >
> > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > index
> >
> d3fa8699271c4d7f404d648a38a95beabeabc99a..e1d210ab4617c894dab3
> d2654cf1
> > c842baac58f5 100644
> > --- a/gcc/tree-vect-loop-manip.cc
> > +++ b/gcc/tree-vect-loop-manip.cc
> > @@ -1200,7 +1200,7 @@ vect_set_loop_condition_partial_vectors_avx512
> (class loop *loop,
> > loop handles exactly VF scalars per iteration.  */
> >
> >  static gcond *
> > -vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge
> > exit_edge,
> > +vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge
> > +exit_edge,
> > class loop *loop, tree niters, tree step,
> > tree final_iv, bool niters_maybe_zero,
> > gimple_stmt_iterator loop_cond_gsi) @@ -
> 1412,7 +1412,7 @@
> > vect_set_loop_condition (class loop *loop, edge loop_e, loop_vec_info
> loop_vinfo
> > When this happens we need to flip the understanding of main and other
> > exits by peeling and IV updates.  */
> >
> > -bool inline
> > +bool
> >  vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)  {
> >return single_pred (loop->latch) == loop_exit->src; @@ -2142,6
> > +2142,7 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
> >   Input:
> >   - LOOP - a loop that is going to be vectorized. The last few 
> > iterations
> >of LOOP were peeled.
> > + - VF   - The chosen vectorization factor for LOOP.
> >   - NITERS - the number of iterations that LOOP executes (before it is
> >  vectorized). i.e, the number of times the ivs should be 
> > bumped.
> >   - UPDATE_E - a successor edge of LOOP->exit that is on the
> > (only) path
> 
> the comment on this is now a bit misleading, can you try to update it and/or
> move the comment bits to the docs on EARLY_EXIT?
> 
> > @@ -2152,6 +2153,9 @@ vect_can_advance_ivs_p (loop_vec_info
> loop_vinfo)
> >The phi args associated with the edge UPDATE_E in the bb
> >UPDATE_E->dest are updated accordingly.
> >
> > + - restart_loop - Indicates whether the scalar loop needs to
> > + restart the
> 
> params are ALL_CAPS
> 
> > + iteration count where the vector loop began.
> > +
> >   Assumption 1: Like the rest of the vectorizer, this function assumes
> >   a single loop exit that has a single predecessor.
> >
> > @@ -2169,18 +2173,22 @@ vect_can_advance_ivs_p (loop_vec_info
> loop_vinfo)
> >   */
> >
> >  static void
> > -vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > - tree niters, edge update_e)
> > +vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > +poly_uint64 vf,
> 
> LOOP_VINFO_VECT_FACTOR?
> 
> > + tree niters, edge update_e, bool
> restart_loop)
> 
> I think 'bool early_exit' is better here?  I wonder if we have an "early"
> exit after the main exit we are probably sure there are no side-effects to re-
> execute and could avoid this restarting?

Side effects yes, but the actual check may not have been performed yet.
If you remember https://gist.github.com/Mistuke/66f14fe5c1be32b91ce149bd9b8bb35f
There, in the clz loop, even when leaving through the "main" exit you still
have to check whether that iteration actually found the entry.  This is
because the loop counter is incremented before you iterate.

> 
> >  {
> >gphi_iterator gsi, 

[PATCH] s390: Fix ICE in testcase pr89233

2023-11-15 Thread Juergen Christ
When using GNU vector extensions, an access outside of the vector size
caused an ICE on s390.  Fix this by aligning with the vec_extract
builtin, i.e., computing the constant index modulo the number of lanes.

Fixes testcase gcc.target/s390/pr89233.c.

Bootstrapped and tested on s390.  OK for mainline?

gcc/ChangeLog:

* config/s390/vector.md (*vec_extract): Compute constant index
modulo the number of lanes.

Signed-off-by: Juergen Christ 
---
 gcc/config/s390/vector.md | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 7d1eb36e8446..deda5990a035 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -532,12 +532,14 @@
  (match_operand:V1 "nonmemory_operand"  "v,v")
  (parallel
   [(match_operand:SI 2 "nonmemory_operand" "an,I")])))]
-  "TARGET_VX
-   && (!CONST_INT_P (operands[2])
-       || UINTVAL (operands[2]) < GET_MODE_NUNITS (<MODE>mode))"
-  "@
-   vlgv<bhfgq>\t%0,%v1,%Y2
-   vste<bhfgq>\t%v1,%0,%2"
+  "TARGET_VX"
+  {
+    if (CONST_INT_P (operands[2]))
+      operands[2] = GEN_INT (UINTVAL (operands[2])
+			     & (GET_MODE_NUNITS (<MODE>mode) - 1));
+    if (which_alternative == 0)
+      return "vlgv<bhfgq>\t%0,%v1,%Y2";
+    return "vste<bhfgq>\t%v1,%0,%2";
+  }
   [(set_attr "op_type" "VRS,VRX")])
 
 ; vlgvb, vlgvh, vlgvf, vlgvg
-- 
2.39.3



[PATCH] s390: split int128 load

2023-11-15 Thread Juergen Christ
Issue two loads when using GPRs instead of one load-multiple.

Bootstrapped and tested on s390.  OK for mainline?

gcc/ChangeLog:

* config/s390/s390.md: Split TImode loads.

gcc/testsuite/ChangeLog:

* gcc.target/s390/int128load.c: New test.

Signed-off-by: Juergen Christ 
---
 gcc/config/s390/s390.md|  4 
 gcc/testsuite/gcc.target/s390/int128load.c | 14 ++
 2 files changed, 14 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/int128load.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 3f29ba214427..5bff69aeb350 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -1687,8 +1687,6 @@
   [(set (match_operand:TI 0 "nonimmediate_operand" "")
 (match_operand:TI 1 "general_operand" ""))]
   "TARGET_ZARCH && reload_completed
-   && !s_operand (operands[0], TImode)
-   && !s_operand (operands[1], TImode)
&& s390_split_ok_p (operands[0], operands[1], TImode, 0)"
   [(set (match_dup 2) (match_dup 4))
(set (match_dup 3) (match_dup 5))]
@@ -1703,8 +1701,6 @@
   [(set (match_operand:TI 0 "nonimmediate_operand" "")
 (match_operand:TI 1 "general_operand" ""))]
   "TARGET_ZARCH && reload_completed
-   && !s_operand (operands[0], TImode)
-   && !s_operand (operands[1], TImode)
&& s390_split_ok_p (operands[0], operands[1], TImode, 1)"
   [(set (match_dup 2) (match_dup 4))
(set (match_dup 3) (match_dup 5))]
diff --git a/gcc/testsuite/gcc.target/s390/int128load.c 
b/gcc/testsuite/gcc.target/s390/int128load.c
new file mode 100644
index ..35d5380704b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/int128load.c
@@ -0,0 +1,14 @@
+/* Check that int128 loads and stores are split.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -mzarch -march=zEC12" } */
+
+__int128 global;
+
+void f(__int128 x)
+{
+  global = x;
+}
+
+/* { dg-final { scan-assembler-times "lg\t" 2 } } */
+/* { dg-final { scan-assembler-times "stg\t" 2 } } */
-- 
2.39.3



[PATCH] s390: implement flags output

2023-11-15 Thread Juergen Christ
Implement flags output for inline assemblies.  Only use one output constraint
that captures the whole condition code.  No breakout into different condition
codes is allowed.  Also, only one condition code variable is allowed.

Add further logic to canonicalize various situations where checks against
different condition codes are combined.

Bootstrapped and tested on s390.  OK for mainline?

gcc/ChangeLog:

* config/s390/s390-c.cc (s390_cpu_cpp_builtins): Define
__GCC_ASM_FLAG_OUTPUTS__.
* config/s390/s390.cc (s390_canonicalize_comparison): More
UNSPEC_CC_TO_INT cases.
(s390_md_asm_adjust): Implement flags output.
* config/s390/s390.md (ccstore4): Allow mask operands.
* doc/extend.texi: Document flags output.

gcc/testsuite/ChangeLog:

* gcc.target/s390/ccor.c: New test.

Signed-off-by: Juergen Christ 
---
 gcc/config/s390/s390-c.cc|   1 +
 gcc/config/s390/s390.cc  | 139 ++-
 gcc/config/s390/s390.md  |   8 +-
 gcc/doc/extend.texi  |   5 +
 gcc/testsuite/gcc.target/s390/ccor.c |  88 +
 5 files changed, 232 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/ccor.c

diff --git a/gcc/config/s390/s390-c.cc b/gcc/config/s390/s390-c.cc
index 269f4f8e978d..c126e6d323d7 100644
--- a/gcc/config/s390/s390-c.cc
+++ b/gcc/config/s390/s390-c.cc
@@ -409,6 +409,7 @@ s390_cpu_cpp_builtins (cpp_reader *pfile)
 cpp_define (pfile, "__LONG_DOUBLE_128__");
   cl_target_option_save (&opts, &global_options, &global_options_set);
   s390_cpu_cpp_builtins_internal (pfile, &opts, NULL);
+  cpp_define (pfile, "__GCC_ASM_FLAG_OUTPUTS__");
 }
 
 #if S390_USE_TARGET_ATTRIBUTE
diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 61c5f88de8af..a19dd7849b84 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -1877,6 +1877,97 @@ s390_canonicalize_comparison (int *code, rtx *op0, rtx 
*op1,
  *code = new_code;
}
 }
+  /* Remove UNSPEC_CC_TO_INT from connectives.  This happens for
+ checks against multiple condition codes. */
+  if (GET_CODE (*op0) == AND
+  && GET_CODE (XEXP (*op0, 0)) == UNSPEC
+  && XINT (XEXP (*op0, 0), 1) == UNSPEC_CC_TO_INT
+  && XVECLEN (XEXP (*op0, 0), 0) == 1
+  && REGNO (XVECEXP (XEXP (*op0, 0), 0, 0)) == CC_REGNUM
+  && CONST_INT_P (XEXP (*op0, 1))
+  && CONST_INT_P (*op1)
+  && INTVAL (XEXP (*op0, 1)) == -3
+  && *code == EQ)
+{
+  if (INTVAL (*op1) == 0)
+   {
+ /* case cc == 0 || cc == 2 => mask = 0xa */
+ *op0 = XVECEXP (XEXP (*op0, 0), 0, 0);
+ *op1 = gen_rtx_CONST_INT (VOIDmode, 0xa);
+   }
+  else if (INTVAL (*op1) == 1)
+   {
+ /* case cc == 1 || cc == 3 => mask = 0x5 */
+ *op0 = XVECEXP (XEXP (*op0, 0), 0, 0);
+ *op1 = gen_rtx_CONST_INT (VOIDmode, 0x5);
+   }
+}
+  if (GET_CODE (*op0) == PLUS
+  && GET_CODE (XEXP (*op0, 0)) == UNSPEC
+  && XINT (XEXP (*op0, 0), 1) == UNSPEC_CC_TO_INT
+  && XVECLEN (XEXP (*op0, 0), 0) == 1
+  && REGNO (XVECEXP (XEXP (*op0, 0), 0, 0)) == CC_REGNUM
+  && CONST_INT_P (XEXP (*op0, 1))
+  && CONST_INT_P (*op1)
+  && (*code == LEU || *code == GTU))
+{
+  if (INTVAL (*op1) == 1)
+   {
+ if (INTVAL (XEXP (*op0, 1)) == -1)
+   {
+ /* case cc == 1 || cc == 2 => mask = 0x6 */
+ *op0 = XVECEXP (XEXP (*op0, 0), 0, 0);
+ *op1 = gen_rtx_CONST_INT (VOIDmode, 0x6);
+ *code = *code == GTU ? NE : EQ;
+   }
+ else if (INTVAL (XEXP (*op0, 1)) == -2)
+   {
+ /* case cc == 2 || cc == 3 => mask = 0x3 */
+ *op0 = XVECEXP (XEXP (*op0, 0), 0, 0);
+ *op1 = gen_rtx_CONST_INT (VOIDmode, 0x3);
+ *code = *code == GTU ? NE : EQ;
+   }
+   }
+  else if (INTVAL (*op1) == 2
+  && INTVAL (XEXP (*op0, 1)) == -1)
+   {
+ /* case cc == 1 || cc == 2 || cc == 3 => mask = 0x7 */
+ *op0 = XVECEXP (XEXP (*op0, 0), 0, 0);
+ *op1 = gen_rtx_CONST_INT (VOIDmode, 0x7);
+ *code = *code == GTU ? NE : EQ;
+   }
+}
+  else if (*code == LEU || *code == GTU)
+{
+  if (GET_CODE (*op0) == UNSPEC
+ && XINT (*op0, 1) == UNSPEC_CC_TO_INT
+ && XVECLEN (*op0, 0) == 1
+ && REGNO (XVECEXP (*op0, 0, 0)) == CC_REGNUM
+ && CONST_INT_P (*op1))
+   {
+ if (INTVAL (*op1) == 1)
+   {
+ /* case cc == 0 || cc == 1 => mask = 0xc */
+ *op0 = XVECEXP (*op0, 0, 0);
+ *op1 = gen_rtx_CONST_INT (VOIDmode, 0xc);
+ *code = *code == GTU ? NE : EQ;
+   }
+ else if (INTVAL (*op1) == 2)
+   {
+ /* case cc == 0 || cc == 1 || cc == 2 => mask = 0xd */
+ *op0 = XVECEXP (*op0, 0, 0);
+ *op1 = gen_rtx_CONST_INT

RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits

2023-11-15 Thread Richard Biener
On Wed, 15 Nov 2023, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Wednesday, November 15, 2023 1:01 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> > Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early
> > breaks and arbitrary exits
> > 
> > On Wed, 15 Nov 2023, Tamar Christina wrote:
> > 
> > > Patch updated to latest trunk:
> > >
> > > Hi All,
> > >
> > > This changes the PHI node updates to support early breaks.
> > > It has to support both the case where the loop's exit matches the
> > > normal loop exit and one where the early exit is "inverted", i.e. it's an 
> > > early
> > exit edge.
> > >
> > > In the latter case we must always restart the loop for VF iterations.
> > > For an early exit the reason is obvious, but there are cases where the
> > > "normal" exit is located before the early one.  This exit then does a
> > > check on ivtmp resulting in us leaving the loop since it thinks we're 
> > > done.
> > >
> > > In these cases we may still have side-effects to perform so we also go
> > > to the scalar loop.
> > >
> > > For the "normal" exit niters has already been adjusted for peeling,
> > > for the early exits we must find out how many iterations we actually
> > > did.  So we have to recalculate the new position for each exit.
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >   * tree-vect-loop-manip.cc (vect_set_loop_condition_normal): Hide
> > unused.
> > >   (vect_update_ivs_after_vectorizer): Support early break.
> > >   (vect_do_peeling): Use it.
> > >
> > > --- inline copy of patch ---
> > >
> > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > index
> > >
> > d3fa8699271c4d7f404d648a38a95beabeabc99a..e1d210ab4617c894dab3
> > d2654cf1
> > > c842baac58f5 100644
> > > --- a/gcc/tree-vect-loop-manip.cc
> > > +++ b/gcc/tree-vect-loop-manip.cc
> > > @@ -1200,7 +1200,7 @@ vect_set_loop_condition_partial_vectors_avx512
> > (class loop *loop,
> > > loop handles exactly VF scalars per iteration.  */
> > >
> > >  static gcond *
> > > -vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge
> > > exit_edge,
> > > +vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */, edge
> > > +exit_edge,
> > >   class loop *loop, tree niters, tree step,
> > >   tree final_iv, bool niters_maybe_zero,
> > >   gimple_stmt_iterator loop_cond_gsi) @@ -
> > 1412,7 +1412,7 @@
> > > vect_set_loop_condition (class loop *loop, edge loop_e, loop_vec_info
> > loop_vinfo
> > > When this happens we need to flip the understanding of main and other
> > > exits by peeling and IV updates.  */
> > >
> > > -bool inline
> > > +bool
> > >  vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)  {
> > >return single_pred (loop->latch) == loop_exit->src; @@ -2142,6
> > > +2142,7 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
> > >   Input:
> > >   - LOOP - a loop that is going to be vectorized. The last few 
> > > iterations
> > >of LOOP were peeled.
> > > + - VF   - The chosen vectorization factor for LOOP.
> > >   - NITERS - the number of iterations that LOOP executes (before it is
> > >  vectorized). i.e, the number of times the ivs should be 
> > > bumped.
> > >   - UPDATE_E - a successor edge of LOOP->exit that is on the
> > > (only) path
> > 
> > the comment on this is now a bit misleading, can you try to update it and/or
> > move the comment bits to the docs on EARLY_EXIT?
> > 
> > > @@ -2152,6 +2153,9 @@ vect_can_advance_ivs_p (loop_vec_info
> > loop_vinfo)
> > >The phi args associated with the edge UPDATE_E in the 
> > > bb
> > >UPDATE_E->dest are updated accordingly.
> > >
> > > + - restart_loop - Indicates whether the scalar loop needs to
> > > + restart the
> > 
> > params are ALL_CAPS
> > 
> > > +   iteration count where the vector loop began.
> > > +
> > >   Assumption 1: Like the rest of the vectorizer, this function assumes
> > >   a single loop exit that has a single predecessor.
> > >
> > > @@ -2169,18 +2173,22 @@ vect_can_advance_ivs_p (loop_vec_info
> > loop_vinfo)
> > >   */
> > >
> > >  static void
> > > -vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > -   tree niters, edge update_e)
> > > +vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
> > > +poly_uint64 vf,
> > 
> > LOOP_VINFO_VECT_FACTOR?
> > 
> > > +   tree niters, edge update_e, bool
> > restart_loop)
> > 
> > I think 'bool early_exit' is better here?  I wonder if we have an "early"
> > exit after the main exit we are probably sure there are no side-effects to 
> > re-
> > execute and could avoid this restarting?
> 
> Side effects yes, but the actual check may not have been performed yet.
> If yo

[PATCH] s390: Fix generation of s390-gen-builtins.h

2023-11-15 Thread Stefan Schulze Frielinghaus
By default the preprocessed output includes linemarkers.  This leads to
an error if -pedantic is used as e.g. during bootstrap:

s390-gen-builtins.h:1:3: error: style of line directive is a GCC extension [-Werror]

Fixed by omitting linemarkers while generating s390-gen-builtins.h.

gcc/ChangeLog:

* config/s390/t-s390: Generate s390-gen-builtins.h without
linemarkers.
---
 gcc/config/s390/t-s390 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/s390/t-s390 b/gcc/config/s390/t-s390
index 4ab9718f6e2..2e884c367de 100644
--- a/gcc/config/s390/t-s390
+++ b/gcc/config/s390/t-s390
@@ -33,4 +33,4 @@ s390-d.o: $(srcdir)/config/s390/s390-d.cc
$(POSTCOMPILE)
 
 s390-gen-builtins.h: $(srcdir)/config/s390/s390-builtins.h
-   $(COMPILER) -E $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $< > $@
+	$(COMPILER) -E -P $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $< > $@
-- 
2.41.0



Re: [PATCH] s390: Fix generation of s390-gen-builtins.h

2023-11-15 Thread Andreas Krebbel
On 11/15/23 14:29, Stefan Schulze Frielinghaus wrote:
> By default the preprocessed output includes linemarkers.  This leads to
> an error if -pedantic is used as e.g. during bootstrap:
> 
> s390-gen-builtins.h:1:3: error: style of line directive is a GCC extension [-Werror]
> 
> Fixed by omitting linemarkers while generating s390-gen-builtins.h.
> 
> gcc/ChangeLog:
> 
>   * config/s390/t-s390: Generate s390-gen-builtins.h without
>   linemarkers.

Ok, Thanks!

Andreas


> ---
>  gcc/config/s390/t-s390 | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/s390/t-s390 b/gcc/config/s390/t-s390
> index 4ab9718f6e2..2e884c367de 100644
> --- a/gcc/config/s390/t-s390
> +++ b/gcc/config/s390/t-s390
> @@ -33,4 +33,4 @@ s390-d.o: $(srcdir)/config/s390/s390-d.cc
>   $(POSTCOMPILE)
>  
>  s390-gen-builtins.h: $(srcdir)/config/s390/s390-builtins.h
> - $(COMPILER) -E $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $< > $@
> + $(COMPILER) -E -P $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) $< > $@



RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits

2023-11-15 Thread Richard Biener
On Wed, 15 Nov 2023, Tamar Christina wrote:

> Patch updated to trunk.
> 
> This adds support to vectorizable_live_reduction to handle multiple exits by

vectorizable_live_operation, but I do wonder how you handle reductions?

> doing a search for which exit the live value should be materialized in.
> 
> Additionally which value in the index we're after depends on whether the exit
> it's materialized in is an early exit or whether the loop's main exit is
> different from the loop's natural one (i.e. the one with the same src block as
> the latch).
> 
> In those two cases we want the first rather than the last value as we're going
> to restart the iteration in the scalar loop.  For VLA this means we need to
> reverse both the mask and vector since there's only a way to get the last
> active element and not the first.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-loop.cc (vectorizable_live_operation): Support early exits.
>   * tree-vect-stmts.cc (perm_mask_for_reverse): Expose.
>   * tree-vectorizer.h (perm_mask_for_reverse): Expose.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 
> 4cf7f65dc164db27a498b31fe7ce0d9af3f3e299..2476e59ef488fd0a3b296ced7b0d4d3e76a3634f
>  100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -10627,12 +10627,60 @@ vectorizable_live_operation (vec_info *vinfo, 
> stmt_vec_info stmt_info,
>  lhs' = new_tree;  */
>  
>class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> -  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> +  /* A value can only be live in one exit.  So figure out which one.  */

Well, a value can be live across multiple exits!

> +  edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> +  /* Check if we have a loop where the chosen exit is not the main exit,
> +  in these cases for an early break we restart the iteration the vector 
> code
> +  did.  For the live values we want the value at the start of the 
> iteration
> +  rather than at the end.  */
> +  bool restart_loop = false;
> +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> + {
> +   FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> + if (!is_gimple_debug (use_stmt)
> + && !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
> +   {

In fact when you get here you know the use is in a LC PHI.  Use
FOR_EACH_IMM_USE_FAST and you can get at the edge
via phi_arg_index_from_use and gimple_phi_arg_edge.

As said you have to process all exits the value is live on, not only
the first.
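As a side note, here is a tiny standalone C sketch (my own illustration, not part of the patch) of a value that is live on more than one exit:

```c
/* 'i' is live on both exits: the early-break exit (a[i] == key) and
   the normal latch exit (i == n).  Whichever way the loop is left,
   the value flows to the use after the loop, so a vectorizer handling
   early breaks must materialize it correctly on every exit edge.  */
int first_match (const int *a, int n, int key)
{
  int i;
  for (i = 0; i < n; i++)
    if (a[i] == key)
      break;
  return i;
}
```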

> + basic_block use_bb = gimple_bb (use_stmt);
> + for (auto edge : get_loop_exit_edges (loop))
> +   {
> + /* Alternative exits can have an intermediate BB in
> +between to update the IV.  In those cases we need to
> +look one block further.  */
> + if (use_bb == edge->dest
> + || (single_succ_p (edge->dest)
> + && use_bb == single_succ (edge->dest)))
> +   {
> + exit_e = edge;
> + goto found;
> +   }
> +   }
> +   }
> +found:
> +   /* If the edge isn't a single pred then split the edge so we have a
> +  location to place the live operations.  Perhaps we should always
> +  split during IV updating.  But this way the CFG is cleaner to
> +  follow.  */
> +   restart_loop = !vect_is_loop_exit_latch_pred (exit_e, loop);
> +   if (!single_pred_p (exit_e->dest))
> + exit_e = single_pred_edge (split_edge (exit_e));
> +
> +   /* For early exit where the exit is not in the BB that leads to the
> +  latch then we're restarting the iteration in the scalar loop. So
> +  get the first live value.  */
> +   if (restart_loop)
> + {
> +   vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
> +   vec_lhs = gimple_get_lhs (vec_stmt);
> +   bitstart = build_zero_cst (TREE_TYPE (bitstart));

No, this doesn't work for SLP.  Note this also gets you the "first" live
value _after_ the vector iteration.  Btw, I fail to see why you need
to handle STMT_VINFO_LIVE at all for the early exits - this is
scalar values live _after_ all iterations of the loop, thus it's
provided by the scalar epilog that always runs when we exit the vector
loop early.

The story is different for reductions though (unless we fail to support
early breaks for those at the moment).

Richard.


> + }
> + }
> +
> +  basic_block exit_bb = exit_e->dest;
>gcc_assert (single_pred_p (exit_bb));
>  
>tree vec_lhs_phi = copy_ssa_name (vec_lhs);
>gimple *phi = create_phi_node (vec_lhs_phi, exit_bb);
> -  SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_E

[committed] amdgcn: simplify secondary reload patterns

2023-11-15 Thread Andrew Stubbs
This patch makes no functional changes, but cleans up the code a little 
to make way for my next patch.


The confusing "reload_in" and "reload_out" define_expand patterns were 
used solely for secondary reload and were nothing more than aliases for the 
"sgprbase" instructions.  I've now learned that the constraints on these 
patterns were active (unusually for define_expand), so having them hide 
or duplicate the constraints from the real insns is pointless.


Also, whatever restriction previously prevented use of the "@" feature, 
and led to creating the "CODE_FOR" macros, no longer exists (maybe 
moving to C++ fixed it?), so that can get cleaned up too.


Andrew


amdgcn: simplify secondary reload patterns

Remove some unnecessary complexity; no functional change is intended,
although LRA appears to use the constraints from the reload_in/out
patterns, so it's probably an improvement for it to see the real sgprbase
constraints.

gcc/ChangeLog:

* config/gcn/gcn-valu.md (mov_sgprbase): Add @ modifier.
(reload_in): Delete.
(reload_out): Delete.
* config/gcn/gcn.cc (CODE_FOR): Delete.
(get_code_for_##PREFIX##vN##SUFFIX): Delete.
(CODE_FOR_OP): Delete.
(get_code_for_##PREFIX): Delete.
(gcn_secondary_reload): Replace "get_code_for" with "code_for".

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 8c441696ca4..8dc93e8c82e 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -641,7 +641,7 @@ (define_insn "mov_exec"
 ;   vT += Sv
 ;   flat_load v, vT
 
-(define_insn "mov_sgprbase"
+(define_insn "@mov_sgprbase"
   [(set (match_operand:V_1REG 0 "nonimmediate_operand")
(unspec:V_1REG
  [(match_operand:V_1REG 1 "general_operand")]
@@ -655,7 +655,7 @@ (define_insn "mov_sgprbase"
   [m,v ,&v;*   ,12] #
   })
 
-(define_insn "mov_sgprbase"
+(define_insn "@mov_sgprbase"
   [(set (match_operand:V_2REG 0 "nonimmediate_operand" "= v, v, m")
(unspec:V_2REG
  [(match_operand:V_2REG 1 "general_operand"   "vDB, m, v")]
@@ -672,7 +672,7 @@ (define_insn "mov_sgprbase"
   [(set_attr "type" "vmult,*,*")
(set_attr "length" "8,12,12")])
 
-(define_insn "mov_sgprbase"
+(define_insn "@mov_sgprbase"
   [(set (match_operand:V_4REG 0 "nonimmediate_operand")
(unspec:V_4REG
  [(match_operand:V_4REG 1 "general_operand")]
@@ -685,31 +685,6 @@ (define_insn "mov_sgprbase"
   [m,v  ,&v;*,12] #
   })
 
-; reload_in was once a standard name, but here it's only referenced by
-; gcn_secondary_reload.  It allows a reload with a scratch register.
-
-(define_expand "reload_in"
-  [(set (match_operand:V_MOV 0 "register_operand" "= v")
-   (match_operand:V_MOV 1 "memory_operand"   "  m"))
-   (clobber (match_operand: 2 "register_operand" "=&v"))]
-  ""
-  {
-emit_insn (gen_mov_sgprbase (operands[0], operands[1], operands[2]));
-DONE;
-  })
-
-; reload_out is similar to reload_in, above.
-
-(define_expand "reload_out"
-  [(set (match_operand:V_MOV 0 "memory_operand"  "= m")
-   (match_operand:V_MOV 1 "register_operand" "  v"))
-   (clobber (match_operand: 2 "register_operand" "=&v"))]
-  ""
-  {
-emit_insn (gen_mov_sgprbase (operands[0], operands[1], operands[2]));
-DONE;
-  })
-
 ; Expand scalar addresses into gather/scatter patterns
 
 (define_split
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index ac299259213..28065c50bfd 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -1388,64 +1388,6 @@ GEN_VN_NOEXEC (vec_series,si, A(rtx dest, rtx x, rtx c), 
A(dest, x, c))
 #undef GET_VN_FN
 #undef A
 
-/* Get icode for vector instructions without an optab.  */
-
-#define CODE_FOR(PREFIX, SUFFIX) \
-static int \
-get_code_for_##PREFIX##vN##SUFFIX (int nunits) \
-{ \
-  switch (nunits) \
-{ \
-case 2: return CODE_FOR_##PREFIX##v2##SUFFIX; \
-case 4: return CODE_FOR_##PREFIX##v4##SUFFIX; \
-case 8: return CODE_FOR_##PREFIX##v8##SUFFIX; \
-case 16: return CODE_FOR_##PREFIX##v16##SUFFIX; \
-case 32: return CODE_FOR_##PREFIX##v32##SUFFIX; \
-case 64: return CODE_FOR_##PREFIX##v64##SUFFIX; \
-} \
-  \
-  gcc_unreachable (); \
-  return CODE_FOR_nothing; \
-}
-
-#define CODE_FOR_OP(PREFIX) \
- CODE_FOR (PREFIX, qi) \
-   CODE_FOR (PREFIX, hi) \
-   CODE_FOR (PREFIX, hf) \
-   CODE_FOR (PREFIX, si) \
-   CODE_FOR (PREFIX, sf) \
-   CODE_FOR (PREFIX, di) \
-   CODE_FOR (PREFIX, df) \
-   CODE_FOR (PREFIX, ti) \
-static int \
-get_code_for_##PREFIX (machine_mode mode) \
-{ \
-  int vf = GET_MODE_NUNITS (mode); \
-  machine_mode smode = GET_MODE_INNER (mode); \
-  \
-  switch (smode) \
-{ \
-case E_QImode: return get_code_for_##PREFIX##vNqi (vf); \
-case E_HImode: return get_code_for_##PREFIX##vNhi (vf); \
-case E_HFmode: return get_code_for_##PREFIX##vNhf (vf); \
-case E_SImode: return get_code_for_##PREFIX##vNsi (vf); \
-case E_SFmode: return ge

[committed] amdgcn: Add Accelerator VGPR registers

2023-11-15 Thread Andrew Stubbs
AMD GPUs since CDNA1 have had a new register file with an additional 256 
32-bit-by-64-lane vector registers.  This doubles the number of vector 
registers on the device, compared to previous models.  The way the 
hardware works is that the register file is divided between all the 
running threads, so a single thread cannot use all this capacity without 
limiting parallelism; doubling the number makes this much nicer.


The new registers can only be used for selected operations (mostly 
related to matrices), none of which GCC supports easily, but we can use 
them as spill space and avoid costly stack accesses for very large 
registers.


In CDNA2 there were additional instruction encodings added for load and 
store to and from these new registers, so that opens up more 
possibilities for optimizations.


This patch adds the new registers as CALL_USED (so they will never add 
to function call overhead), configures them as spill space and 
load/store targets (CDNA2 only), and provides the necessary move 
instructions. There are many tweaks to the target hooks to handle the 
new cases, but no functional changes are intended for any other 
registers or instructions.


The original work was done by Andrew Jenner, and I've finished off the 
task with debug and tidy-up.


Andrew

amdgcn: Add Accelerator VGPR registers

Add the new CDNA register file.  We don't support any of the specialized
instructions that use these registers, but they're useful to relieve
register pressure without spilling to stack.

Co-authored-by: Andrew Jenner  

gcc/ChangeLog:

* config/gcn/constraints.md: Add "a" AVGPR constraint.
* config/gcn/gcn-valu.md (*mov): Add AVGPR alternatives.
(*mov_4reg): Likewise.
(@mov_sgprbase): Likewise.
(gather_insn_1offset): Likewise.
(gather_insn_1offset_ds): Likewise.
(gather_insn_2offsets): Likewise.
(scatter_expr): Likewise.
(scatter_insn_1offset_ds): Likewise.
(scatter_insn_2offsets): Likewise.
* config/gcn/gcn.cc (MAX_NORMAL_AVGPR_COUNT): Define.
(gcn_class_max_nregs): Handle AVGPR_REGS and ALL_VGPR_REGS.
(gcn_hard_regno_mode_ok): Likewise.
(gcn_regno_reg_class): Likewise.
(gcn_spill_class): Allow spilling to AVGPRs on TARGET_CDNA1_PLUS.
(gcn_sgpr_move_p): Handle AVGPRs.
(gcn_secondary_reload): Reload AVGPRs via VGPRs.
(gcn_conditional_register_usage): Handle AVGPRs.
(gcn_vgpr_equivalent_register_operand): New function.
(gcn_valid_move_p): Check for validity of AVGPR moves.
(gcn_compute_frame_offsets): Handle AVGPRs.
(gcn_memory_move_cost): Likewise.
(gcn_register_move_cost): Likewise.
(gcn_vmem_insn_p): Handle TYPE_VOP3P_MAI.
(gcn_md_reorg): Handle AVGPRs.
(gcn_hsa_declare_function_name): Likewise.
(print_reg): Likewise.
(gcn_dwarf_register_number): Likewise.
* config/gcn/gcn.h (FIRST_AVGPR_REG): Define.
(AVGPR_REGNO): Define.
(LAST_AVGPR_REG): Define.
(SOFT_ARG_REG): Update.
(FRAME_POINTER_REGNUM): Update.
(DWARF_LINK_REGISTER): Update.
(FIRST_PSEUDO_REGISTER): Update.
(AVGPR_REGNO_P): Define.
(enum reg_class): Add AVGPR_REGS and ALL_VGPR_REGS.
(REG_CLASS_CONTENTS): Add new register classes and add entries for
AVGPRs to all classes.
(REGISTER_NAMES): Add AVGPRs.
* config/gcn/gcn.md (FIRST_AVGPR_REG, LAST_AVGPR_REG): Define.
(AP_REGNUM, FP_REGNUM): Update.
(define_attr "type"): Add vop3p_mai.
(define_attr "unit"): Handle vop3p_mai.
(define_attr "gcn_version"): Add "cdna2".
(define_attr "enabled"): Handle cdna2.
(*mov_insn): Add AVGPR alternatives.
(*movti_insn): Likewise.
* config/gcn/mkoffload.cc (isa_has_combined_avgprs): New.
(process_asm): Process avgpr_count.
* config/gcn/predicates.md (gcn_avgpr_register_operand): New.
(gcn_avgpr_hard_register_operand): New.
* doc/md.texi: Document the "a" constraint.

gcc/testsuite/ChangeLog:

* gcc.target/gcn/avgpr-mem-double.c: New test.
* gcc.target/gcn/avgpr-mem-int.c: New test.
* gcc.target/gcn/avgpr-mem-long.c: New test.
* gcc.target/gcn/avgpr-mem-short.c: New test.
* gcc.target/gcn/avgpr-spill-double.c: New test.
* gcc.target/gcn/avgpr-spill-int.c: New test.
* gcc.target/gcn/avgpr-spill-long.c: New test.
* gcc.target/gcn/avgpr-spill-short.c: New test.

libgomp/ChangeLog:

* plugin/plugin-gcn.c (max_isa_vgprs): New.
(run_kernel): CDNA2 devices have more VGPRs.

diff --git a/gcc/config/gcn/constraints.md b/gcc/config/gcn/constraints.md
index efe462a0bd6..b29dc5b6643 100644
--- a/gcc/config/gcn/constraints.md
+++ b/gcc/config/gcn/constraints.md
@@ -77,6 +77,9 @@ (define_constraint "Y"
 (define_register_constraint "v" "VGPR_REGS"
   "VGPR registers

RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits

2023-11-15 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, November 15, 2023 1:23 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early
> breaks and arbitrary exits
> 
> On Wed, 15 Nov 2023, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Wednesday, November 15, 2023 1:01 PM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd ;
> j...@ventanamicro.com
> > > Subject: RE: [PATCH 7/21]middle-end: update IV update code to
> > > support early breaks and arbitrary exits
> > >
> > > On Wed, 15 Nov 2023, Tamar Christina wrote:
> > >
> > > > Patch updated to latest trunk:
> > > >
> > > > Hi All,
> > > >
> > > > This changes the PHI node updates to support early breaks.
> > > > It has to support both the case where the loop's exit matches the
> > > > normal loop exit and one where the early exit is "inverted", i.e.
> > > > it's an early
> > > exit edge.
> > > >
> > > > In the latter case we must always restart the loop for VF iterations.
> > > > For an early exit the reason is obvious, but there are cases where
> > > > the "normal" exit is located before the early one.  This exit then
> > > > does a check on ivtmp resulting in us leaving the loop since it thinks 
> > > > we're
> done.
> > > >
> > > > In these case we may still have side-effects to perform so we also
> > > > go to the scalar loop.
> > > >
> > > > For the "normal" exit niters has already been adjusted for
> > > > peeling, for the early exits we must find out how many iterations
> > > > we actually did.  So we have to recalculate the new position for each 
> > > > exit.
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * tree-vect-loop-manip.cc (vect_set_loop_condition_normal): Hide
> > > unused.
> > > > (vect_update_ivs_after_vectorizer): Support early break.
> > > > (vect_do_peeling): Use it.
> > > >
> > > > --- inline copy of patch ---
> > > >
> > > > diff --git a/gcc/tree-vect-loop-manip.cc
> > > > b/gcc/tree-vect-loop-manip.cc index
> > > >
> > >
> d3fa8699271c4d7f404d648a38a95beabeabc99a..e1d210ab4617c894dab3
> > > d2654cf1
> > > > c842baac58f5 100644
> > > > --- a/gcc/tree-vect-loop-manip.cc
> > > > +++ b/gcc/tree-vect-loop-manip.cc
> > > > @@ -1200,7 +1200,7 @@
> > > > vect_set_loop_condition_partial_vectors_avx512
> > > (class loop *loop,
> > > > loop handles exactly VF scalars per iteration.  */
> > > >
> > > >  static gcond *
> > > > -vect_set_loop_condition_normal (loop_vec_info loop_vinfo, edge
> > > > exit_edge,
> > > > +vect_set_loop_condition_normal (loop_vec_info /* loop_vinfo */,
> > > > +edge exit_edge,
> > > > class loop *loop, tree niters, tree 
> > > > step,
> > > > tree final_iv, bool niters_maybe_zero,
> > > > gimple_stmt_iterator loop_cond_gsi) @@ -
> > > 1412,7 +1412,7 @@
> > > > vect_set_loop_condition (class loop *loop, edge loop_e,
> > > > loop_vec_info
> > > loop_vinfo
> > > > When this happens we need to flip the understanding of main and
> other
> > > > exits by peeling and IV updates.  */
> > > >
> > > > -bool inline
> > > > +bool
> > > >  vect_is_loop_exit_latch_pred (edge loop_exit, class loop *loop)  {
> > > >return single_pred (loop->latch) == loop_exit->src; @@ -2142,6
> > > > +2142,7 @@ vect_can_advance_ivs_p (loop_vec_info loop_vinfo)
> > > >   Input:
> > > >   - LOOP - a loop that is going to be vectorized. The last few 
> > > > iterations
> > > >of LOOP were peeled.
> > > > + - VF   - The chosen vectorization factor for LOOP.
> > > >   - NITERS - the number of iterations that LOOP executes (before it 
> > > > is
> > > >  vectorized). i.e, the number of times the ivs should 
> > > > be bumped.
> > > >   - UPDATE_E - a successor edge of LOOP->exit that is on the
> > > > (only) path
> > >
> > > the comment on this is now a bit misleading, can you try to update
> > > it and/or move the comment bits to the docs on EARLY_EXIT?
> > >
> > > > @@ -2152,6 +2153,9 @@ vect_can_advance_ivs_p (loop_vec_info
> > > loop_vinfo)
> > > >The phi args associated with the edge UPDATE_E in 
> > > > the bb
> > > >UPDATE_E->dest are updated accordingly.
> > > >
> > > > + - restart_loop - Indicates whether the scalar loop needs to
> > > > + restart the
> > >
> > > params are ALL_CAPS
> > >
> > > > + iteration count where the vector loop began.
> > > > +
> > > >   Assumption 1: Like the rest of the vectorizer, this function 
> > > > assumes
> > > >   a single loop exit that has a single predecessor.
> > > >
> > > > @@ -2169,18 +2173,22 @@ vect_can_advance_ivs_p (loop_vec_info
> > > loop_vinfo)
> > > >   */
> > > >
> > > >  static void
> > > > -vect_update_ivs_after_v

Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-15 Thread Arsen Arsenović

David Edelsohn  writes:

> GCC had been working on AIX with NLS, using "--with-included-gettext".
> --disable-nls gets past the breakage, but GCC does not build for me on AIX
> with NLS enabled.

That should still work with gettext 0.22+ extracted in-tree (it should
be fetched by download_prerequisites).

> A change in dependencies for GCC should have been announced and more widely
> socialized in the GCC development mailing list, not just GCC patches
> mailing list.
>
> I have tried both the AIX Open Source libiconv and libgettext package, and
> the ones that I previously built.  Both fail because GCC configure decides
> to disable NLS, despite being requested, while libcpp is satisfied, so
> tools in the gcc subdirectory don't link against libiconv and the build
> fails.  With the included gettext, I was able to rely on a self-consistent
> solution.

That is interesting.  They should be using the same checks.  I've
checked trunk and regenerated files on it, and saw no significant diff
(some whitespace changes only).  Could you post the config.log of both?

I've never used AIX.  Can I reproduce this on one of the cfarm machines
to poke around?  I've tried cfarm119, but that one lacked git, and I
haven't poked around much further due to time constraints.

TIA, sorry about the inconvenience.  Have a lovely day.

> The current gettext-0.22.3 fails to build for me on AIX.
>
> libcpp configure believes that NLS functions on AIX, but gcc configure
> fails in its tests of gettext functionality, which leads to an inconsistent
> configuration and build breakage.
>
> Thanks, David


-- 
Arsen Arsenović




RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits

2023-11-15 Thread Tamar Christina



> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, November 15, 2023 1:42 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: RE: [PATCH 8/21]middle-end: update vectorizable_live_reduction
> with support for multiple exits and different exits
> 
> On Wed, 15 Nov 2023, Tamar Christina wrote:
> 
> > Patch updated to trunk.
> >
> > This adds support to vectorizable_live_reduction to handle multiple
> > exits by
> 
> vectorizable_live_operation, but I do wonder how you handle reductions?

In the testcases I have, reductions all seem to work fine, since reductions are
placed in the merge block between the two loops and always have the
"value so far from full loop iterations".  These will just be used as seed for
the scalar loop for any partial iterations.

> 
> > doing a search for which exit the live value should be materialized in.
> >
> > Additionally which value in the index we're after depends on whether
> > the exit it's materialized in is an early exit or whether the loop's
> > main exit is different from the loop's natural one (i.e. the one with
> > the same src block as the latch).
> >
> > In those two cases we want the first rather than the last value as
> > we're going to restart the iteration in the scalar loop.  For VLA this
> > means we need to reverse both the mask and vector since there's only a
> > way to get the last active element and not the first.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * tree-vect-loop.cc (vectorizable_live_operation): Support early exits.
> > * tree-vect-stmts.cc (perm_mask_for_reverse): Expose.
> > * tree-vectorizer.h (perm_mask_for_reverse): Expose.
> >
> > --- inline copy of patch ---
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index
> >
> 4cf7f65dc164db27a498b31fe7ce0d9af3f3e299..2476e59ef488fd0a3b296c
> ed7b0d
> > 4d3e76a3634f 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -10627,12 +10627,60 @@ vectorizable_live_operation (vec_info
> *vinfo, stmt_vec_info stmt_info,
> >lhs' = new_tree;  */
> >
> >class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > -  basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest;
> > +  /* A value can only be live in one exit.  So figure out which
> > + one.  */
> 
> Well, a value can be live across multiple exits!

The same value can only be live across multiple early exits, no?  In which
case they'll all still be in the same block, as all the early exits end in
the same merge block.

So this code is essentially just figuring out if you're an early or normal exit.
Perhaps the comment is unclear.

> 
> > +  edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo);
> > +  /* Check if we have a loop where the chosen exit is not the main 
> > exit,
> > +in these cases for an early break we restart the iteration the vector
> code
> > +did.  For the live values we want the value at the start of the 
> > iteration
> > +rather than at the end.  */
> > +  bool restart_loop = false;
> > +  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > +   {
> > + FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> > +   if (!is_gimple_debug (use_stmt)
> > +   && !flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
> > + {
> 
> In fact when you get here you know the use is in a LC PHI.  Use
> FOR_EACH_IMM_USE_FAST and you can get at the edge via
> phi_arg_index_from_use and gimple_phi_arg_edge.
> 
> As said you have to process all exits the value is live on, not only the 
> first.
> 
> > +   basic_block use_bb = gimple_bb (use_stmt);
> > +   for (auto edge : get_loop_exit_edges (loop))
> > + {
> > +   /* Alternative exits can have an intermediate BB in
> > +  between to update the IV.  In those cases we need to
> > +  look one block further.  */
> > +   if (use_bb == edge->dest
> > +   || (single_succ_p (edge->dest)
> > +   && use_bb == single_succ (edge->dest)))
> > + {
> > +   exit_e = edge;
> > +   goto found;
> > + }
> > + }
> > + }
> > +found:
> > + /* If the edge isn't a single pred then split the edge so we have a
> > +location to place the live operations.  Perhaps we should always
> > +split during IV updating.  But this way the CFG is cleaner to
> > +follow.  */
> > + restart_loop = !vect_is_loop_exit_latch_pred (exit_e, loop);
> > + if (!single_pred_p (exit_e->dest))
> > +   exit_e = single_pred_edge (split_edge (exit_e));
> > +
> > + /* For early exit where the exit is not in the BB that leads to the
> > +latch then we're restarting the iteration in the scalar loop. So
> > +get the first li

Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for rv32gcv.

2023-11-15 Thread Robin Dapp
> Looks wrong. Recover back.

When we demote we use two elements where there was one before.
Therefore the vector needs to be able to hold twice as many
elements.  We adjust vl correctly but the mode is not here.

Regards
 Robin


nvptx: Extend 'brev' test cases (was: [PATCH] nvptx: Add suppport for __builtin_nvptx_brev instrinsic)

2023-11-15 Thread Thomas Schwinge
Hi!

On 2023-05-06T17:04:57+0100, "Roger Sayle"  wrote:
> This patch adds support for (a pair of) bit reversal intrinsics
> __builtin_nvptx_brev and __builtin_nvptx_brevll which perform 32-bit
> and 64-bit bit reversal (using nvptx's brev instruction) matching
> the __brev and __brevll intrinsics provided by NVidia's nvcc compiler.
> https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__INT.html
>
> This patch has been tested on nvptx-none which make and make -k check
> with no new failures.  Ok for mainline?
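For readers unfamiliar with the operation: a portable C equivalent of the 32-bit reversal that brev performs (a generic bit-twiddling sketch of my own, not the nvptx builtin itself):

```c
#include <stdint.h>

/* Reverse the bits of a 32-bit word, as nvptx's brev.b32 does:
   bit 0 swaps with bit 31, bit 1 with bit 30, and so on.  */
uint32_t bitrev32 (uint32_t x)
{
  x = ((x & 0x55555555u) << 1) | ((x >> 1) & 0x55555555u); /* swap adjacent bits */
  x = ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u); /* swap bit pairs */
  x = ((x & 0x0f0f0f0fu) << 4) | ((x >> 4) & 0x0f0f0f0fu); /* swap nibbles */
  x = ((x & 0x00ff00ffu) << 8) | ((x >> 8) & 0x00ff00ffu); /* swap bytes */
  return (x << 16) | (x >> 16);                            /* swap halves */
}
```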

(That got pushed in commit c09471fbc7588db2480f036aa56a2403d3c03ae5
"nvptx: Add suppport for __builtin_nvptx_brev instrinsic".)

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/nvptx/brev-1.c
> +[...]

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/nvptx/brev-2.c
> +[...]

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/nvptx/brevll-1.c
> +[...]

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/nvptx/brevll-2.c
> +[...]

Pushed to master branch commit 61c45c055a5ccfc59463c21ab057dece822d973c
"nvptx: Extend 'brev' test cases", see attached.  That's in order to
observe effects of a later patch, and also to exercise the new nvptx
'check-function-bodies' a bit.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From 61c45c055a5ccfc59463c21ab057dece822d973c Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 4 Sep 2023 23:06:27 +0200
Subject: [PATCH] nvptx: Extend 'brev' test cases

In order to observe effects of a later patch, extend the 'brev' test cases
added in commit c09471fbc7588db2480f036aa56a2403d3c03ae5
"nvptx: Add suppport for __builtin_nvptx_brev instrinsic".

	gcc/testsuite/
	* gcc.target/nvptx/brev-1.c: Extend.
	* gcc.target/nvptx/brev-2.c: Rename to...
	* gcc.target/nvptx/brev-2-O2.c: ... this, and extend.  Copy to...
	* gcc.target/nvptx/brev-2-O0.c: ... this, and adapt for '-O0'.
	* gcc.target/nvptx/brevll-1.c: Extend.
	* gcc.target/nvptx/brevll-2.c: Rename to...
	* gcc.target/nvptx/brevll-2-O2.c: ... this, and extend.  Copy to...
	* gcc.target/nvptx/brevll-2-O0.c: ... this, and adapt for '-O0'.
---
 gcc/testsuite/gcc.target/nvptx/brev-1.c   |  12 +-
 gcc/testsuite/gcc.target/nvptx/brev-2-O0.c| 129 
 .../nvptx/{brev-2.c => brev-2-O2.c}   |  27 +++
 gcc/testsuite/gcc.target/nvptx/brevll-1.c |  12 +-
 gcc/testsuite/gcc.target/nvptx/brevll-2-O0.c  | 189 ++
 .../nvptx/{brevll-2.c => brevll-2-O2.c}   |  27 +++
 6 files changed, 392 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/nvptx/brev-2-O0.c
 rename gcc/testsuite/gcc.target/nvptx/{brev-2.c => brev-2-O2.c} (80%)
 create mode 100644 gcc/testsuite/gcc.target/nvptx/brevll-2-O0.c
 rename gcc/testsuite/gcc.target/nvptx/{brevll-2.c => brevll-2-O2.c} (90%)

diff --git a/gcc/testsuite/gcc.target/nvptx/brev-1.c b/gcc/testsuite/gcc.target/nvptx/brev-1.c
index fbb4fff1e59..af875dd4dcc 100644
--- a/gcc/testsuite/gcc.target/nvptx/brev-1.c
+++ b/gcc/testsuite/gcc.target/nvptx/brev-1.c
@@ -1,8 +1,16 @@
 /* { dg-do compile } */
 /* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies {**} {} } } */
+
 unsigned int foo(unsigned int x)
 {
   return __builtin_nvptx_brev(x);
 }
-
-/* { dg-final { scan-assembler "brev.b32" } } */
+/*
+** foo:
+**	...
+**	mov\.u32	(%r[0-9]+), %ar0;
+**	brev\.b32	%value, \1;
+**	st\.param\.u32	\[%value_out\], %value;
+**	ret;
+*/
diff --git a/gcc/testsuite/gcc.target/nvptx/brev-2-O0.c b/gcc/testsuite/gcc.target/nvptx/brev-2-O0.c
new file mode 100644
index 000..ca011ebf472
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/brev-2-O0.c
@@ -0,0 +1,129 @@
+/* { dg-do run } */
+/* { dg-options "-O0" } */
+/* { dg-additional-options -save-temps } */
+/* { dg-final { check-function-bodies {**} {} } } */
+
+inline __attribute__((always_inline))
+unsigned int bitreverse32(unsigned int x)
+{
+  return __builtin_nvptx_brev(x);
+}
+
+int main(void)
+{
+  if (bitreverse32(0x) != 0x)
+__builtin_abort();
+  if (bitreverse32(0x) != 0x)
+__builtin_abort();
+
+  if (bitreverse32(0x0001) != 0x8000)
+__builtin_abort();
+  if (bitreverse32(0x0002) != 0x4000)
+__builtin_abort();
+  if (bitreverse32(0x0004) != 0x2000)
+__builtin_abort();
+  if (bitreverse32(0x0008) != 0x1000)
+__builtin_abort();
+  if (bitreverse32(0x0010) != 0x0800)
+__builtin_abort();
+  if (bitreverse32(0x0020) != 0x0400)
+__builtin_abort();
+  if (bitreverse32(0x0040) != 0x0200)
+__builtin_abort();
+  if (bitreverse32(0x0080) != 0x0100)
+__builtin_abort();
+  if (bitreverse32(0x0100) != 0x0080)
+__builtin_abort();
+  if (bitreverse32(0x0200) != 0x0040)
+__builtin_abort();
+

Re: Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for rv32gcv.

2023-11-15 Thread 钟居哲
Could you show me the example ?

It's used for handling SEW = 64 on RV32.  I don't know why this patch touches
this code.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-11-15 22:27
To: 钟居哲; gcc-patches; palmer; kito.cheng; Jeff Law
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for 
rv32gcv.
> Looks wrong. Recover back.
 
When we demote we use two elements where there was one before.
Therefore the vector needs to be able to hold twice as many
elements.  We adjust vl correctly but the mode is not here.
 
Regards
Robin
 


Re: [nvptx PATCH] Update nvptx's bitrev2 pattern to use BITREVERSE rtx.

2023-11-15 Thread Thomas Schwinge
Hi!

On 2023-06-08T00:09:00+0100, "Roger Sayle"  wrote:
> This minor tweak to the nvptx backend switches the representation
> of the brev instruction from an UNSPEC to instead use the new BITREVERSE
> rtx.

ACK.

> This allows various RTL optimizations including evaluation (constant
> folding) of integer constant arguments at compile-time.

..., which we're then observing via
commit 61c45c055a5ccfc59463c21ab057dece822d973c
"nvptx: Extend 'brev' test cases" that I just pushed.

> This patch has been tested on nvptx-none with make and make -k check
> with no new failures.  Ok for mainline?

I've thus updated the test cases for these changes here, and pushed to
master branch commit 75c20a99b3a242121eef8a532f5224c00c471b56
"Update nvptx's bitrev2 pattern to use BITREVERSE rtx.", see
attached.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From 75c20a99b3a242121eef8a532f5224c00c471b56 Mon Sep 17 00:00:00 2001
From: Roger Sayle 
Date: Thu, 8 Jun 2023 00:09:00 +0100
Subject: [PATCH] Update nvptx's bitrev2 pattern to use BITREVERSE rtx.

This minor tweak to the nvptx backend switches the representation
of the brev instruction from an UNSPEC to instead use the new BITREVERSE
rtx.  This allows various RTL optimizations including evaluation (constant
folding) of integer constant arguments at compile-time.

	gcc/
	* config/nvptx/nvptx.md (UNSPEC_BITREV): Delete.
	(bitrev<mode>2): Represent using bitreverse.
	gcc/testsuite/
	* gcc.target/nvptx/brev-2-O2.c: Adjust.
	* gcc.target/nvptx/brevll-2-O2.c: Likewise.

Co-authored-by: Thomas Schwinge 
---
 gcc/config/nvptx/nvptx.md|  5 +---
 gcc/testsuite/gcc.target/nvptx/brev-2-O2.c   | 25 ++--
 gcc/testsuite/gcc.target/nvptx/brevll-2-O2.c | 25 ++--
 3 files changed, 5 insertions(+), 50 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 1bb93045403..7a7c9948f45 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -34,8 +34,6 @@
UNSPEC_FPINT_CEIL
UNSPEC_FPINT_NEARBYINT
 
-   UNSPEC_BITREV
-
UNSPEC_ALLOCA
 
UNSPEC_SET_SOFTSTACK
@@ -636,8 +634,7 @@
 
 (define_insn "bitrev<mode>2"
   [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
-	(unspec:SDIM [(match_operand:SDIM 1 "nvptx_register_operand" "R")]
-		 UNSPEC_BITREV))]
+	(bitreverse:SDIM (match_operand:SDIM 1 "nvptx_register_operand" "R")))]
   ""
   "%.\\tbrev.b%T0\\t%0, %1;")
 
diff --git a/gcc/testsuite/gcc.target/nvptx/brev-2-O2.c b/gcc/testsuite/gcc.target/nvptx/brev-2-O2.c
index e35052208d0..c707a87f356 100644
--- a/gcc/testsuite/gcc.target/nvptx/brev-2-O2.c
+++ b/gcc/testsuite/gcc.target/nvptx/brev-2-O2.c
@@ -1,7 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2" } */
 /* { dg-additional-options -save-temps } */
-/* { dg-final { check-function-bodies {**} {} } } */
 
 inline __attribute__((always_inline))
 unsigned int bitreverse32(unsigned int x)
@@ -96,26 +95,6 @@ int main(void)
 
   return 0;
 }
-/*
-** main:
-**	...
-**	mov\.u32	(%r[0-9]+), 0;
-**	brev\.b32	(%r[0-9]+), \1;
-**	setp\.[^.]+\.u32	%r[0-9]+, \2, 0;
-**	...
-**	mov\.u32	(%r[0-9]+), -1;
-**	brev\.b32	(%r[0-9]+), \3;
-**	setp\.[^.]+\.u32	%r[0-9]+, \4, -1;
-**	...
-**	mov\.u32	(%r[0-9]+), 1;
-**	brev\.b32	(%r[0-9]+), \5;
-**	setp\.[^.]+\.u32	%r[0-9]+, \6, -2147483648;
-**	...
-**	mov\.u32	(%r[0-9]+), 2;
-**	brev\.b32	(%r[0-9]+), \7;
-**	setp\.[^.]+\.u32	%r[0-9]+, \8, 1073741824;
-**	...
-*/
 
-/* { dg-final { scan-assembler-times {\tbrev\.b32\t} 40 } } */
-/* { dg-final { scan-assembler {\mabort\M} } } */
+/* { dg-final { scan-assembler-not {\tbrev\.b32\t} } } */
+/* { dg-final { scan-assembler-not {\mabort\M} } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/brevll-2-O2.c b/gcc/testsuite/gcc.target/nvptx/brevll-2-O2.c
index cbfda1b9601..c89be9627f8 100644
--- a/gcc/testsuite/gcc.target/nvptx/brevll-2-O2.c
+++ b/gcc/testsuite/gcc.target/nvptx/brevll-2-O2.c
@@ -1,7 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2" } */
 /* { dg-additional-options -save-temps } */
-/* { dg-final { check-function-bodies {**} {} } } */
 
 inline __attribute__((always_inline))
 unsigned long long bitreverse64(unsigned long long x)
@@ -156,26 +155,6 @@ int main(void)
 
   return 0;
 }
-/*
-** main:
-**	...
-**	mov\.u64	(%r[0-9]+), 0;
-**	brev\.b64	(%r[0-9]+), \1;
-**	setp\.[^.]+\.u64	%r[0-9]+, \2, 0;
-**	...
-**	mov\.u64	(%r[0-9]+), -1;
-**	brev\.b64	(%r[0-9]+), \3;
-**	setp\.[^.]+\.u64	%r[0-9]+, \4, -1;
-**	...
-**	mov\.u64	(%r[0-9]+), 1;
-**	brev\.b64	(%r[0-9]+), \5;
-**	setp\.[^.]+\.u64	%r[0-9]+, \6, -9223372036854775808;
-**	...
-**	mov\.u64	(%r[0-9]+), 2;
-**	brev\.b64	(%r[0-9]+), \7;
-**	setp\.[^.]+\.u64	%r[0

nvptx: Fix copy'n'paste-o in '__builtin_nvptx_brev' description (was: [PATCH] nvptx: Add suppport for __builtin_nvptx_brev instrinsic)

2023-11-15 Thread Thomas Schwinge
Hi!

On 2023-05-06T17:04:57+0100, "Roger Sayle"  wrote:
> This patch adds support for (a pair of) bit reversal intrinsics
> __builtin_nvptx_brev and __builtin_nvptx_brevll which perform 32-bit
> and 64-bit bit reversal (using nvptx's brev instruction) matching
> the __brev and __brevll instrinsics provided by NVidia's nvcc compiler.
> https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__INT.html

(That got pushed in commit c09471fbc7588db2480f036aa56a2403d3c03ae5
"nvptx: Add suppport for __builtin_nvptx_brev instrinsic".)

> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi

> @@ -17941,6 +17942,20 @@ Enable global interrupt.
>  Disable global interrupt.
>  @enddefbuiltin
>
> +@node Nvidia PTX Built-in Functions
> +@subsection Nvidia PTX Built-in Functions
> +
> +These built-in functions are available for the Nvidia PTX target:
> +
> +@defbuiltin{unsigned int __builtin_nvptx_brev (unsigned int @var{x})}
> +Reverse the bit order of a 32-bit unsigned integer.
> +Disable global interrupt.

Pushed to master branch commit 4450984d0a18cd4e352d396231ba2c457d20feea
"nvptx: Fix copy'n'paste-o in '__builtin_nvptx_brev' description", see
attached.

> +@enddefbuiltin
> +
> +@defbuiltin{unsigned long long __builtin_nvptx_brevll (unsigned long long 
> @var{x})}
> +Reverse the bit order of a 64-bit unsigned integer.
> +@enddefbuiltin
> +
>  @node Basic PowerPC Built-in Functions
>  @subsection Basic PowerPC Built-in Functions


Grüße
 Thomas


>From 4450984d0a18cd4e352d396231ba2c457d20feea Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 4 Sep 2023 17:20:28 +0200
Subject: [PATCH] nvptx: Fix copy'n'paste-o in '__builtin_nvptx_brev'
 description

Minor fix-up for commit c09471fbc7588db2480f036aa56a2403d3c03ae5
"nvptx: Add suppport for __builtin_nvptx_brev instrinsic".

	gcc/
	* doc/extend.texi (Nvidia PTX Built-in Functions): Fix
	copy'n'paste-o in '__builtin_nvptx_brev' description.
---
 gcc/doc/extend.texi | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 406ccc9bc75..a95121b0124 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -18471,7 +18471,6 @@ These built-in functions are available for the Nvidia PTX target:
 
 @defbuiltin{unsigned int __builtin_nvptx_brev (unsigned int @var{x})}
 Reverse the bit order of a 32-bit unsigned integer.
-Disable global interrupt.
 @enddefbuiltin
 
 @defbuiltin{unsigned long long __builtin_nvptx_brevll (unsigned long long @var{x})}
-- 
2.34.1



[PATCH]middle-end: skip checking loop exits if loop malformed [PR111878]

2023-11-15 Thread Tamar Christina
Hi All,

Before my refactoring, if loop->latch was incorrect, find_loop_location
skipped checking the edges and would eventually return a dummy location.

It turns out that a loop can satisfy
loops_state_satisfies_p (LOOPS_HAVE_RECORDED_EXITS) yet still not have a latch,
in which case get_loop_exit_edges traps.

This restores the old behavior.

Bootstrapped Regtested on x86_64-pc-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/111878
* tree-vect-loop-manip.cc (find_loop_location): Skip edges check if
latch incorrect.

gcc/testsuite/ChangeLog:

PR tree-optimization/111878
* gcc.dg/graphite/pr111878.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/graphite/pr111878.c 
b/gcc/testsuite/gcc.dg/graphite/pr111878.c
new file mode 100644
index 
..6722910062e43c827e94c53b43f106af1848852a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/pr111878.c
@@ -0,0 +1,19 @@
+/* { dg-options "-O3 -fgraphite-identity -fsave-optimization-record" } */
+
+int long_c2i_ltmp;
+int *long_c2i_cont;
+
+void
+long_c2i (long utmp, int i)
+{
+  int neg = 1;
+  switch (long_c2i_cont[0])
+case 0:
+neg = 0;
+  for (; i; i++)
+if (neg)
+  utmp |= long_c2i_cont[i] ^ 5;
+else
+  utmp |= long_c2i_cont[i];
+  long_c2i_ltmp = utmp;
+}
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 
b9161274ce401a7307f3e61ad23aa036701190d7..ff188840c1762d0b5fb6655cb93b5a8662b31343
 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1792,7 +1792,8 @@ find_loop_location (class loop *loop)
   if (!loop)
 return dump_user_location_t ();
 
-  if (loops_state_satisfies_p (LOOPS_HAVE_RECORDED_EXITS))
+  if (loops_state_satisfies_p (LOOPS_HAVE_RECORDED_EXITS)
+  && loop->latch != EXIT_BLOCK_PTR_FOR_FN (cfun))
 {
   /* We only care about the loop location, so use any exit with location
 information.  */





[PATCH]AArch64 Add pattern for unsigned widenings (uxtl) to zip{1,2}

2023-11-15 Thread Tamar Christina
Hi All,

This changes unpack instructions to use zip{1,2} when doing a zero-extending
widening operation.  Permutes generally have a higher throughput than the
widening operations. Zeros are shuffled into the top half of the registers.

The testcase

void d2 (unsigned * restrict a, unsigned short *b, int n)
{
  for (int i = 0; i < (n & -8); i++)
    a[i] = b[i];
}

now generates:

        movi    v1.4s, 0
.L3:
        ldr     q0, [x1], 16
        zip1    v2.8h, v0.8h, v1.8h
        zip2    v0.8h, v0.8h, v1.8h
        stp     q2, q0, [x0]
        add     x0, x0, 32
        cmp     x1, x2
        bne     .L3


instead of:

.L3:
        ldr     q0, [x1], 16
        uxtl    v1.4s, v0.4h
        uxtl2   v0.4s, v0.8h
        stp     q1, q0, [x0]
        add     x0, x0, 32
        cmp     x1, x2
        bne     .L3
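The two sequences compute the same result because zero-extending a lane is the same as interleaving it with a zero lane; a little-endian scalar model of the low-half case (function names invented for illustration):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Little-endian model: widening the low four u16 lanes to u32 (uxtl)
   equals interleaving them with zero lanes (zip1 with a zero vector).
   Function names are invented for illustration. */
static void uxtl_lo(const uint16_t src[8], uint32_t dst[4])
{
  for (int i = 0; i < 4; i++)
    dst[i] = (uint32_t) src[i];                 /* zero-extend each lane */
}

static void zip1_with_zero(const uint16_t src[8], uint32_t dst[4])
{
  uint16_t lanes[8];
  for (int i = 0; i < 4; i++)
    {
      lanes[2 * i] = src[i];                    /* data in the low half  */
      lanes[2 * i + 1] = 0;                     /* zero in the high half */
    }
  memcpy (dst, lanes, sizeof lanes);            /* reinterpret as 4x u32 */
}
```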

Since we need the extra 0 register we do this only for the vectorizer's lo/hi
pairs when we know the 0 will be floated outside of the loop.

This gives an 8% speed-up on Imagick in SPEC CPU 2017 on Neoverse V2.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (vec_unpack_lo__lo___zip): New.
(aarch64_uaddw__zip): New.
* config/aarch64/iterators.md (PERM_EXTEND, perm_index): New.
(perm_hilo): Add UNSPEC_ZIP1, UNSPEC_ZIP2.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/simd/vmovl_high_1.c: Update codegen.
* gcc.target/aarch64/uxtl-combine-1.c: New test.
* gcc.target/aarch64/uxtl-combine-2.c: New test.
* gcc.target/aarch64/uxtl-combine-3.c: New test.
* gcc.target/aarch64/uxtl-combine-4.c: New test.
* gcc.target/aarch64/uxtl-combine-5.c: New test.
* gcc.target/aarch64/uxtl-combine-6.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
81ff5bad03d598fa0d48df93d172a28bc0d1d92e..3d811007dd94dcd9176d6021a41a196c12fe9c3f
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1988,26 +1988,60 @@ (define_insn "aarch64_simd_vec_unpack_hi_"
   [(set_attr "type" "neon_shift_imm_long")]
 )
 
-(define_expand "vec_unpack_hi_"
+(define_expand "vec_unpacku_hi_"
   [(match_operand: 0 "register_operand")
-   (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))]
+   (match_operand:VQW 1 "register_operand")]
+  "TARGET_SIMD"
+  {
+rtx res = gen_reg_rtx (mode);
+rtx tmp = aarch64_gen_shareable_zero (mode);
+if (BYTES_BIG_ENDIAN)
+  emit_insn (gen_aarch64_zip2 (res, tmp, operands[1]));
+else
+ emit_insn (gen_aarch64_zip2 (res, operands[1], tmp));
+emit_move_insn (operands[0],
+  simplify_gen_subreg (mode, res, mode, 0));
+DONE;
+  }
+)
+
+(define_expand "vec_unpacks_hi_"
+  [(match_operand: 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")]
   "TARGET_SIMD"
   {
 rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
-emit_insn (gen_aarch64_simd_vec_unpack_hi_ (operands[0],
- operands[1], p));
+emit_insn (gen_aarch64_simd_vec_unpacks_hi_ (operands[0],
+  operands[1], p));
+DONE;
+  }
+)
+
+(define_expand "vec_unpacku_lo_"
+  [(match_operand: 0 "register_operand")
+   (match_operand:VQW 1 "register_operand")]
+  "TARGET_SIMD"
+  {
+rtx res = gen_reg_rtx (mode);
+rtx tmp = aarch64_gen_shareable_zero (mode);
+if (BYTES_BIG_ENDIAN)
+   emit_insn (gen_aarch64_zip1 (res, tmp, operands[1]));
+else
+   emit_insn (gen_aarch64_zip1 (res, operands[1], tmp));
+emit_move_insn (operands[0],
+  simplify_gen_subreg (mode, res, mode, 0));
 DONE;
   }
 )
 
-(define_expand "vec_unpack_lo_"
+(define_expand "vec_unpacks_lo_"
   [(match_operand: 0 "register_operand")
-   (ANY_EXTEND: (match_operand:VQW 1 "register_operand"))]
+   (match_operand:VQW 1 "register_operand")]
   "TARGET_SIMD"
   {
 rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
-emit_insn (gen_aarch64_simd_vec_unpack_lo_ (operands[0],
- operands[1], p));
+emit_insn (gen_aarch64_simd_vec_unpacks_lo_ (operands[0],
+  operands[1], p));
 DONE;
   }
 )
@@ -4735,6 +4769,34 @@ (define_insn 
"aarch64_subw2_internal"
   [(set_attr "type" "neon_sub_widen")]
 )
 
+(define_insn "aarch64_usubw__zip"
+  [(set (match_operand: 0 "register_operand" "=w")
+   (minus:
+ (match_operand: 1 "register_operand" "w")
+ (subreg:
+   (unspec: [
+   (match_operand:VQW 2 "register_operand" "w")
+   (match_operand:VQW 3 "aarch64_simd_imm_zero" "Dz")
+  ] PERM_EXTEND) 0)))]
+  "TARGET_SIMD"
+  "usubw\\t%0., %1., %2."
+  [(set_attr "type" "neon_sub_widen")]
+)
+
+(define_insn "aarch6

[committed] i386: Fix strict_low_part QImode insn with high input register patterns [PR112540]

2023-11-15 Thread Uros Bizjak
PR target/112540

gcc/ChangeLog:

* config/i386/i386.md (*addqi_ext_1_slp):
Correct operand numbers in split pattern.  Replace !Q constraint
Correct operand numbers in split pattern.  Replace !Q constraint
of operand 1 with !qm.  Add insn constraint.
(*subqi_ext_1_slp): Ditto.
(*<code>qi_ext_1_slp): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
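For context, the kind of source-level pattern these strict_low_part insns match can be sketched in C (a hypothetical example, not taken from the testsuite):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C pattern for the *addqi_ext_1_slp insn: add the high
   byte of b into the low byte of a while preserving a's upper byte
   (strict_low_part semantics).  Invented for illustration only. */
static uint16_t add_high_into_low(uint16_t a, uint16_t b)
{
  uint8_t lo = (uint8_t) ((uint8_t) a + (uint8_t) (b >> 8)); /* e.g. addb %bh, %al */
  return (uint16_t) ((a & 0xff00u) | lo);
}
```

On x86 the byte addition can map to a single `add{b}` with a high-byte register operand, which is exactly what the corrected split pattern has to reproduce.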
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6136e46b1bc..29ec9425200 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6624,9 +6624,9 @@ (define_insn_and_split "*addqi_ext_1_slp"
  [(match_operand 2 "int248_register_operand" "Q,Q")
   (const_int 8)
   (const_int 8)]) 0)
- (match_operand:QI 1 "nonimmediate_operand" "0,!Q")))
+ (match_operand:QI 1 "nonimmediate_operand" "0,!qm")))
(clobber (reg:CC FLAGS_REG))]
-  ""
+  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
   "@
add{b}\t{%h2, %0|%0, %h2}
#"
@@ -6638,8 +6638,8 @@ (define_insn_and_split "*addqi_ext_1_slp"
   (plus:QI
 (subreg:QI
   (match_op_dup 3
-[(match_dup 0) (const_int 8) (const_int 8)]) 0)
-  (match_dup 1)))
+[(match_dup 2) (const_int 8) (const_int 8)]) 0)
+  (match_dup 0)))
   (clobber (reg:CC FLAGS_REG))])]
   ""
   [(set_attr "type" "alu")
@@ -7662,14 +7662,14 @@ (define_insn_and_split "*sub_1_slp"
 (define_insn_and_split "*subqi_ext_1_slp"
   [(set (strict_low_part (match_operand:QI 0 "register_operand" "+Q,&Q"))
(minus:QI
- (match_operand:QI 1 "nonimmediate_operand" "0,!Q")
+ (match_operand:QI 1 "nonimmediate_operand" "0,!qm")
  (subreg:QI
(match_operator:SWI248 3 "extract_operator"
  [(match_operand 2 "int248_register_operand" "Q,Q")
   (const_int 8)
   (const_int 8)]) 0)))
(clobber (reg:CC FLAGS_REG))]
-  ""
+  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
   "@
sub{b}\t{%h2, %0|%0, %h2}
#"
@@ -7679,10 +7679,10 @@ (define_insn_and_split "*subqi_ext_1_slp"
(parallel
  [(set (strict_low_part (match_dup 0))
   (minus:QI
-  (match_dup 1)
+(match_dup 0)
 (subreg:QI
   (match_op_dup 3
-[(match_dup 0) (const_int 8) (const_int 8)]) 0)))
+[(match_dup 2) (const_int 8) (const_int 8)]) 0)))
   (clobber (reg:CC FLAGS_REG))])]
   ""
   [(set_attr "type" "alu")
@@ -11492,9 +11492,9 @@ (define_insn_and_split "*<code>qi_ext_1_slp"
  [(match_operand 2 "int248_register_operand" "Q,Q")
   (const_int 8)
   (const_int 8)]) 0)
- (match_operand:QI 1 "nonimmediate_operand" "0,!Q")))
+ (match_operand:QI 1 "nonimmediate_operand" "0,!qm")))
(clobber (reg:CC FLAGS_REG))]
-  ""
+  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
   "@
<logic>{b}\t{%h2, %0|%0, %h2}
#"
@@ -11504,10 +11504,10 @@ (define_insn_and_split "*<code>qi_ext_1_slp"
(parallel
  [(set (strict_low_part (match_dup 0))
   (any_logic:QI
-  (match_dup 1)
 (subreg:QI
   (match_op_dup 3
-[(match_dup 0) (const_int 8) (const_int 8)]) 0)))
+[(match_dup 2) (const_int 8) (const_int 8)]) 0)
+  (match_dup 0)))
   (clobber (reg:CC FLAGS_REG))])]
   ""
   [(set_attr "type" "alu")


Re: PR111754

2023-11-15 Thread Prathamesh Kulkarni
On Wed, 8 Nov 2023 at 21:57, Prathamesh Kulkarni
 wrote:
>
> On Thu, 26 Oct 2023 at 09:43, Prathamesh Kulkarni
>  wrote:
> >
> > On Thu, 26 Oct 2023 at 04:09, Richard Sandiford
> >  wrote:
> > >
> > > Prathamesh Kulkarni  writes:
> > > > On Wed, 25 Oct 2023 at 02:58, Richard Sandiford
> > > >  wrote:
> > > >>
> > > >> Hi,
> > > >>
> > > >> Sorry the slow review.  I clearly didn't think this through properly
> > > >> when doing the review of the original patch, so I wanted to spend
> > > >> some time working on the code to get a better understanding of
> > > >> the problem.
> > > >>
> > > >> Prathamesh Kulkarni  writes:
> > > >> > Hi,
> > > >> > For the following test-case:
> > > >> >
> > > >> > typedef float __attribute__((__vector_size__ (16))) F;
> > > >> > F foo (F a, F b)
> > > >> > {
> > > >> >   F v = (F) { 9 };
> > > >> >   return __builtin_shufflevector (v, v, 1, 0, 1, 2);
> > > >> > }
> > > >> >
> > > >> > Compiling with -O2 results in following ICE:
> > > >> > foo.c: In function ‘foo’:
> > > >> > foo.c:6:10: internal compiler error: in decompose, at rtl.h:2314
> > > >> > 6 |   return __builtin_shufflevector (v, v, 1, 0, 1, 2);
> > > >> >   |  ^~
> > > >> > 0x7f3185 wi::int_traits
> > > >> >>::decompose(long*, unsigned int, std::pair
> > > >> > const&)
> > > >> > ../../gcc/gcc/rtl.h:2314
> > > >> > 0x7f3185 wide_int_ref_storage > > >> > false>::wide_int_ref_storage
> > > >> >>(std::pair const&)
> > > >> > ../../gcc/gcc/wide-int.h:1089
> > > >> > 0x7f3185 generic_wide_int
> > > >> >>::generic_wide_int
> > > >> >>(std::pair const&)
> > > >> > ../../gcc/gcc/wide-int.h:847
> > > >> > 0x7f3185 poly_int<1u, generic_wide_int > > >> > false> > >::poly_int
> > > >> >>(poly_int_full, std::pair const&)
> > > >> > ../../gcc/gcc/poly-int.h:467
> > > >> > 0x7f3185 poly_int<1u, generic_wide_int > > >> > false> > >::poly_int
> > > >> >>(std::pair const&)
> > > >> > ../../gcc/gcc/poly-int.h:453
> > > >> > 0x7f3185 wi::to_poly_wide(rtx_def const*, machine_mode)
> > > >> > ../../gcc/gcc/rtl.h:2383
> > > >> > 0x7f3185 rtx_vector_builder::step(rtx_def*, rtx_def*) const
> > > >> > ../../gcc/gcc/rtx-vector-builder.h:122
> > > >> > 0xfd4e1b vector_builder > > >> > rtx_vector_builder>::elt(unsigned int) const
> > > >> > ../../gcc/gcc/vector-builder.h:253
> > > >> > 0xfd4d11 rtx_vector_builder::build()
> > > >> > ../../gcc/gcc/rtx-vector-builder.cc:73
> > > >> > 0xc21d9c const_vector_from_tree
> > > >> > ../../gcc/gcc/expr.cc:13487
> > > >> > 0xc21d9c expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> > > >> > expand_modifier, rtx_def**, bool)
> > > >> > ../../gcc/gcc/expr.cc:11059
> > > >> > 0xaee682 expand_expr(tree_node*, rtx_def*, machine_mode, 
> > > >> > expand_modifier)
> > > >> > ../../gcc/gcc/expr.h:310
> > > >> > 0xaee682 expand_return
> > > >> > ../../gcc/gcc/cfgexpand.cc:3809
> > > >> > 0xaee682 expand_gimple_stmt_1
> > > >> > ../../gcc/gcc/cfgexpand.cc:3918
> > > >> > 0xaee682 expand_gimple_stmt
> > > >> > ../../gcc/gcc/cfgexpand.cc:4044
> > > >> > 0xaf28f0 expand_gimple_basic_block
> > > >> > ../../gcc/gcc/cfgexpand.cc:6100
> > > >> > 0xaf4996 execute
> > > >> > ../../gcc/gcc/cfgexpand.cc:6835
> > > >> >
> > > >> > IIUC, the issue is that fold_vec_perm returns a vector having float 
> > > >> > element
> > > >> > type with res_nelts_per_pattern == 3, and later ICE's when it tries
> > > >> > to derive element v[3], not present in the encoding, while trying to
> > > >> > build rtx vector
> > > >> > in rtx_vector_builder::build():
> > > >> >  for (unsigned int i = 0; i < nelts; ++i)
> > > >> > RTVEC_ELT (v, i) = elt (i);
> > > >> >
> > > >> > The attached patch tries to fix this by returning false from
> > > >> > valid_mask_for_fold_vec_perm_cst if sel has a stepped sequence and
> > > >> > input vector has non-integral element type, so for VLA vectors, it
> > > >> > will only build result with dup sequence (nelts_per_pattern < 3) for
> > > >> > non-integral element type.
> > > >> >
> > > >> > For VLS vectors, this will still work for stepped sequence since it
> > > >> > will then use the "VLS exception" in fold_vec_perm_cst, and set:
> > > >> > res_npattern = res_nelts and
> > > >> > res_nelts_per_pattern = 1
> > > >> >
> > > >> > and fold the above case to:
> > > >> > F foo (F a, F b)
> > > >> > {
> > > >> >[local count: 1073741824]:
> > > >> >   return { 0.0, 9.0e+0, 0.0, 0.0 };
> > > >> > }
> > > >> >
> > > >> > But I am not sure if this is entirely correct, since:
> > > >> > tree res = out_elts.build ();
> > > >> > will canonicalize the encoding and may result in a stepped sequence
> > > >> > (vector_builder::finalize() may reduce npatterns at the cost of 
> > > >> > increasing
> > > >> > nelts_per_pattern)  ?
> > > >> >
> > > >> > PS: This issue is now latent after PR111648 fix, since
> > > >> > valid_mask
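The fold under discussion, `__builtin_shufflevector (v, v, 1, 0, 1, 2)` with `v = {9, 0, 0, 0}`, can be modelled in plain C to check the expected constant result (an illustrative scalar model, not GCC's folding code):

```c
#include <assert.h>

/* Scalar model of a 4-lane shufflevector: index i selects a[idx[i]] when
   idx[i] < 4, otherwise b[idx[i] - 4].  Illustrative only. */
static void shuffle4(const float a[4], const float b[4],
                     const int idx[4], float out[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = idx[i] < 4 ? a[idx[i]] : b[idx[i] - 4];
}
```

Running this on the example reproduces the folded result `{ 0.0, 9.0e+0, 0.0, 0.0 }` quoted above.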

Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-15 Thread Xi Ruoyao
On Wed, 2023-11-15 at 15:14 +0100, Arsen Arsenović wrote:
> That is interesting.  They should be using the same checks.  I've
> checked trunk and regenerated files on it, and saw no significant diff
> (some whitespace changes only).  Could you post the config.log of
> both?

You did not regenerate config.in.  But I've regenerated it in r14-5434
anyway.

The related changes:

+/* Define to 1 if you have the Mac OS X function
+   CFLocaleCopyPreferredLanguages in the CoreFoundation framework. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_CFLOCALECOPYPREFERREDLANGUAGES
+#endif
+
+
+/* Define to 1 if you have the Mac OS X function
+   CFPreferencesCopyAppValue in the CoreFoundation framework. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_CFPREFERENCESCOPYAPPVALUE
+#endif

+/* Define if the GNU dcgettext() function is already present or preinstalled.
+   */
+#ifndef USED_FOR_TARGET
+#undef HAVE_DCGETTEXT
+#endif

+/* Define if the GNU gettext() function is already present or preinstalled. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_GETTEXT
+#endif

I don't know if they are related to the issue on AIX though.
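These config.in macros are typically consumed along these lines (an illustrative sketch, not GCC's actual intl glue):

```c
#include <string.h>

/* Illustrative consumer of the HAVE_GETTEXT define from config.in; GCC's
   real wrapper lives in its intl headers and is more involved. */
#ifdef HAVE_GETTEXT
# include <libintl.h>
# define _(msgid) gettext (msgid)
#else
# define _(msgid) (msgid)   /* fall back to the untranslated string */
#endif
```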

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] Fortran: fix reallocation on assignment of polymorphic variables [PR110415]

2023-11-15 Thread Andrew Jenner

This patch adds the testcase from PR110415 and fixes the bug.

The problem is that in a couple of places in trans_class_assignment in 
trans-expr.cc, we need to get the run-time size of the polymorphic 
object from the vtbl, but we are currently getting that vtbl from the 
lhs of the assignment rather than the rhs. This gives us the old value 
of the size but we need to pass the new size to __builtin_malloc and 
__builtin_realloc.
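The bug can be reduced to a small allocation model in C (all names invented; the real code builds trees in trans-expr.cc):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Reduced model of the fix: on polymorphic assignment the payload must be
   (re)allocated with the size recorded in the RHS's vtable, not the stale
   size reachable from the LHS.  Every name here is invented. */
struct vtab { size_t size; };
struct class_obj { void *data; const struct vtab *vptr; };

static void class_assign(struct class_obj *lhs, const struct class_obj *rhs)
{
  size_t new_size = rhs->vptr->size;      /* the fix: rhs vptr, not lhs */
  lhs->data = realloc (lhs->data, new_size);
  memcpy (lhs->data, rhs->data, new_size);
  lhs->vptr = rhs->vptr;
}
```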


I'm fixing this by adding a parameter to trans_class_vptr_len_assignment
to retrieve the tree corresponding to the vptr from the object on the rhs
of the assignment, and then passing this where it is needed. In the case
where trans_class_vptr_len_assignment returns NULL_TREE for the rhs vptr
we use the lhs vptr as before.


To get this to work I also needed to change the implementation of 
trans_class_vptr_len_assignment to create a temporary for the assignment 
in more circumstances. Currently, the "a = func()" assignment in MAIN__ 
doesn't hit the "Create a temporary for complicated expressions" case 
on line 9951 because "DECL_P (rse->expr)" is true - the expression has 
already been placed into a temporary. That means we don't hit the "if 
(temp_rhs ..." case on line 10038 and go on to get the vptr_expr from 
"gfc_lval_expr_from_sym (gfc_find_vtab (&re->ts))" on line 10057 which 
is the vtbl of the static type rather than the dynamic one from the rhs. 
So with this fix we create an extra temporary, but that should be 
optimised away in the middle-end so there should be no run-time effect.


I'm not sure if this is the best way to fix this (the Fortran front-end 
is new territory for me) but I've verified that the testcase passes with 
this change, fails without it, and that the change does not introduce 
any FAILs when running the gfortran testcases on x86_64-pc-linux-gnu.


Is this OK for mainline, GCC 13 and OG13?

Thanks,

Andrew

gcc/fortran/
* trans-expr.cc (trans_class_vptr_len_assignment): Add
from_vptrp parameter. Populate it. Don't check for DECL_P
when deciding whether to create temporary.
(trans_class_pointer_fcn, gfc_trans_pointer_assignment): Add
NULL argument to trans_class_vptr_len_assignment calls.
(trans_class_assignment): Get rhs_vptr from
trans_class_vptr_len_assignment and use it for determining size
for allocation/reallocation.

gcc/testsuite/
* gfortran.dg/pr110415.f90: New test.diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 50c4604a025..f1618b55add 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -9936,7 +9936,8 @@ trans_get_upoly_len (stmtblock_t *block, gfc_expr *expr)
 static tree
 trans_class_vptr_len_assignment (stmtblock_t *block, gfc_expr * le,
 gfc_expr * re, gfc_se *rse,
-tree * to_lenp, tree * from_lenp)
+tree * to_lenp, tree * from_lenp,
+tree * from_vptrp)
 {
   gfc_se se;
   gfc_expr * vptr_expr;
@@ -9944,10 +9945,11 @@ trans_class_vptr_len_assignment (stmtblock_t *block, 
gfc_expr * le,
   bool set_vptr = false, temp_rhs = false;
   stmtblock_t *pre = block;
   tree class_expr = NULL_TREE;
+  tree from_vptr = NULL_TREE;
 
   /* Create a temporary for complicated expressions.  */
   if (re->expr_type != EXPR_VARIABLE && re->expr_type != EXPR_NULL
-  && rse->expr != NULL_TREE && !DECL_P (rse->expr))
+  && rse->expr != NULL_TREE)
 {
   if (re->ts.type == BT_CLASS && !GFC_CLASS_TYPE_P (TREE_TYPE (rse->expr)))
class_expr = gfc_get_class_from_expr (rse->expr);
@@ -10044,6 +10046,7 @@ trans_class_vptr_len_assignment (stmtblock_t *block, 
gfc_expr * le,
tmp = rse->expr;
 
  se.expr = gfc_class_vptr_get (tmp);
+ from_vptr = se.expr;
  if (UNLIMITED_POLY (re))
from_len = gfc_class_len_get (tmp);
 
@@ -10065,6 +10068,7 @@ trans_class_vptr_len_assignment (stmtblock_t *block, 
gfc_expr * le,
  gfc_free_expr (vptr_expr);
  gfc_add_block_to_block (block, &se.pre);
  gcc_assert (se.post.head == NULL_TREE);
+ from_vptr = se.expr;
}
   gfc_add_modify (pre, lhs_vptr, fold_convert (TREE_TYPE (lhs_vptr),
se.expr));
@@ -10093,11 +10097,13 @@ trans_class_vptr_len_assignment (stmtblock_t *block, 
gfc_expr * le,
}
 }
 
-  /* Return the _len trees only, when requested.  */
+  /* Return the _len and _vptr trees only, when requested.  */
   if (to_lenp)
 *to_lenp = to_len;
   if (from_lenp)
 *from_lenp = from_len;
+  if (from_vptrp)
+*from_vptrp = from_vptr;
   return lhs_vptr;
 }
 
@@ -10166,7 +10172,7 @@ trans_class_pointer_fcn (stmtblock_t *block, gfc_se 
*lse, gfc_se *rse,
 {
   expr1_vptr = trans_class_vptr_len_assignment (block, expr1,
expr2, rse,

[PATCH] Add support for function attributes and variable attributes

2023-11-15 Thread Guillaume Gomez
Hi,

This patch adds the (incomplete) support for function and variable
attributes. The added attributes are the ones we're using in
rustc_codegen_gcc but all the groundwork is done to add more (and we
will very likely add more as we didn't add all the ones we use in
rustc_codegen_gcc yet).

The only big question with this patch is about `inline`. We currently
handle it as an attribute because it is more convenient for us, but is
that OK, or should we create a separate function to mark a function as
inlined?

Thanks in advance for the review.
From df75f0eb8aacba249b6e791603752e35778951a4 Mon Sep 17 00:00:00 2001
From: Guillaume Gomez 
Date: Mon, 20 Jun 2022 14:34:39 -0400
Subject: [PATCH] [PATCH] Add support for function attributes and variable
 attributes.

gcc/jit/ChangeLog:

	* dummy-frontend.cc (handle_alias_attribute): New function.
	(handle_always_inline_attribute): New function.
	(handle_cold_attribute): New function.
	(handle_fnspec_attribute): New function.
	(handle_format_arg_attribute): New function.
	(handle_format_attribute): New function.
	(handle_noinline_attribute): New function.
	(handle_target_attribute): New function.
	(handle_used_attribute): New function.
	(handle_visibility_attribute): New function.
	(handle_weak_attribute): New function.
	(handle_alias_ifunc_attribute): New function.
	* jit-playback.cc (fn_attribute_to_string): New function.
	(variable_attribute_to_string): New function.
	(global_new_decl): Add attributes support.
	(set_variable_attribute): New function.
	(new_global): Add attributes support.
	(new_global_initialized): Add attributes support.
	(new_local): Add attributes support.
	* jit-playback.h (fn_attribute_to_string): New function.
	(set_variable_attribute): New function.
	* jit-recording.cc (recording::lvalue::add_attribute): New function.
	(recording::function::function): New function.
	(recording::function::write_to_dump): Add attributes support.
	(recording::function::add_attribute): New function.
	(recording::function::add_string_attribute): New function.
	(recording::function::add_integer_array_attribute): New function.
	(recording::global::replay_into): Add attributes support.
	(recording::local::replay_into): Add attributes support.
	* libgccjit.cc (gcc_jit_function_add_attribute): New function.
	(gcc_jit_function_add_string_attribute): New function.
	(gcc_jit_function_add_integer_array_attribute): New function.
	(gcc_jit_lvalue_add_attribute): New function.
	* libgccjit.h (enum gcc_jit_fn_attribute): New enum.
	(gcc_jit_function_add_attribute): New function.
	(gcc_jit_function_add_string_attribute): New function.
	(gcc_jit_function_add_integer_array_attribute): New function.
	(enum gcc_jit_variable_attribute): New function.
	(gcc_jit_lvalue_add_string_attribute): New function.
	* libgccjit.map: Declare new functions.

gcc/testsuite/ChangeLog:

	* jit.dg/jit.exp: Add `jit-verify-assembler-output-not` test command.
	* jit.dg/test-restrict.c: New test.
	* jit.dg/test-restrict-attribute.c: New test.
	* jit.dg/test-alias-attribute.c: New test.
	* jit.dg/test-always_inline-attribute.c: New test.
	* jit.dg/test-cold-attribute.c: New test.
	* jit.dg/test-const-attribute.c: New test.
	* jit.dg/test-noinline-attribute.c: New test.
	* jit.dg/test-nonnull-attribute.c: New test.
	* jit.dg/test-pure-attribute.c: New test.
	* jit.dg/test-used-attribute.c: New test.
	* jit.dg/test-variable-attribute.c: New test.
	* jit.dg/test-weak-attribute.c: New test.

gcc/jit/ChangeLog:

	* docs/topics/compatibility.rst: Add documentation for LIBGCCJIT_ABI_26.
	* docs/topics/types.rst: Add documentation for new functions.

Co-authored-by: Antoni Boucher 
Signed-off-by: Guillaume Gomez 
---
 gcc/jit/docs/topics/compatibility.rst |  12 +
 gcc/jit/docs/topics/types.rst |  77 +++
 gcc/jit/dummy-frontend.cc | 504 --
 gcc/jit/jit-playback.cc   | 165 +-
 gcc/jit/jit-playback.h|  37 +-
 gcc/jit/jit-recording.cc  | 166 +-
 gcc/jit/jit-recording.h   |  19 +-
 gcc/jit/libgccjit.cc  |  45 ++
 gcc/jit/libgccjit.h   |  49 ++
 gcc/jit/libgccjit.map |   8 +
 gcc/testsuite/jit.dg/jit.exp  |  33 ++
 gcc/testsuite/jit.dg/test-alias-attribute.c   |  50 ++
 .../jit.dg/test-always_inline-attribute.c | 153 ++
 gcc/testsuite/jit.dg/test-cold-attribute.c|  54 ++
 gcc/testsuite/jit.dg/test-const-attribute.c   | 134 +
 .../jit.dg/test-noinline-attribute.c  | 114 
 gcc/testsuite/jit.dg/test-nonnull-attribute.c |  94 
 gcc/testsuite/jit.dg/test-pure-attribute.c| 134 +
 ...t-restrict.c => test-restrict-attribute.c} |   4 +-
 gcc/testsuite/jit.dg/test-used-attribute.c| 112 
 .../jit.dg/test-variable-attribute.c  |  46 ++
 gcc/testsuite/jit.dg/test-weak-attribute.c|  41 ++
 22 files changed, 1986 insertions(+), 65 deletions(-)

Re: [PATCH] Add support for function attributes and variable attributes

2023-11-15 Thread Antoni Boucher
David: another thing I remember from your review of an earlier version
of this patch is the usage of `std::pair`.
I can't find where you said it, but I recall you suggested that we
should use a struct instead.
Can you please elaborate again?
Thanks.

On Wed, 2023-11-15 at 17:53 +0100, Guillaume Gomez wrote:
> Hi,
> 
> This patch adds the (incomplete) support for function and variable
> attributes. The added attributes are the ones we're using in
> rustc_codegen_gcc but all the groundwork is done to add more (and we
> will very likely add more as we didn't add all the ones we use in
> rustc_codegen_gcc yet).
> 
> The only big question with this patch is about `inline`. We currently
> handle it as an attribute because it is more convenient for us but is
> it ok or should we create a separate function to mark a function as
> inlined?
> 
> Thanks in advance for the review.



[PATCH]AArch64: only discount MLA for vector and scalar statements

2023-11-15 Thread Tamar Christina
Hi All,

In testcases gcc.dg/tree-ssa/slsr-19.c  and gcc.dg/tree-ssa/slsr-20.c we have a
fairly simple computation.  On the current generic costing we generate:

f:
add w0, w0, 2
maddw1, w0, w1, w1
lsl w0, w1, 1
ret

but on any cost model other than generic (including the new upcoming generic)
we generate:

f:
adrpx2, .LC0
dup v31.2s, w0
fmovs30, w1
ldr d29, [x2, #:lo12:.LC0]
add v31.2s, v31.2s, v29.2s
mul v31.2s, v31.2s, v30.s[0]
addpv31.2s, v31.2s, v31.2s
fmovw0, s31
ret
.LC0:
.word   2
.word   4

This seems to be because the vectorizer thinks the vector transfers are free:

x1_4 + x2_6 1 times vector_stmt costs 0 in body
x1_4 + x2_6 1 times vec_to_scalar costs 0 in body  

This happens because the stmt it's using to get the cost of register transfers
for the given type happens to be one feeding into a MUL, so we incorrectly
discount the + for the register transfer.

This is fixed by guarding the check for aarch64_multiply_add_p with a kind
check and only doing it for scalar_stmt and vector_stmt.

I'm sending this separately from my patch series but it's required for it.
It also seems to fix overvectorization cases in fotonik3d_r in SPECCPU 2017.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_adjust_stmt_cost): Guard mla.
(aarch64_vector_costs::count_ops): Likewise.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
06ec22057e10fd591710aa4c795a78f34eeaa8e5..0f05877ead3dca6477ebc70f53c632e4eb48d439
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -14587,7 +14587,7 @@ aarch64_adjust_stmt_cost (vec_info *vinfo, 
vect_cost_for_stmt kind,
}
 
   gassign *assign = dyn_cast (STMT_VINFO_STMT (stmt_info));
-  if (assign)
+  if ((kind == scalar_stmt || kind == vector_stmt) && assign)
{
  /* For MLA we need to reduce the cost since MLA is 1 instruction.  */
  if (!vect_is_reduction (stmt_info)
@@ -14669,7 +14669,9 @@ aarch64_vector_costs::count_ops (unsigned int count, 
vect_cost_for_stmt kind,
 }
 
   /* Assume that multiply-adds will become a single operation.  */
-  if (stmt_info && aarch64_multiply_add_p (m_vinfo, stmt_info, m_vec_flags))
+  if (stmt_info
+  && (kind == scalar_stmt || kind == vector_stmt)
+  && aarch64_multiply_add_p (m_vinfo, stmt_info, m_vec_flags))
 return;
 
   /* Assume that bool AND with compare operands will become a single









[PATCH 2/6]AArch64: Remove special handling of generic cpu.

2023-11-15 Thread Tamar Christina
Hi All,

In anticipation of adding new generic tuning values this removes the hardcoding
of the "generic" CPU and instead just specifies it as a normal CPU.

No change in behavior is expected.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR target/111370
* config/aarch64/aarch64-cores.def: Add generic.
* config/aarch64/aarch64-opts.h (enum aarch64_proc): Remove generic.
* config/aarch64/aarch64-tune.md: Regenerate
* config/aarch64/aarch64.cc (all_cores): Remove generic
* config/aarch64/aarch64.h (enum target_cpus): Remove
TARGET_CPU_generic.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 
eae40b29df6f8ae353d168b6f73845846d1da94b..3e363bd0e8bbc10cb5b28d6183647736318e6d40
 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -189,4 +189,7 @@ AARCH64_CORE("neoverse-n2", neoversen2, cortexa57, V9A, 
(I8MM, BF16, SVE2_BITPER
 AARCH64_CORE("neoverse-v2", neoversev2, cortexa57, V9A, (I8MM, BF16, 
SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversev2, 0x41, 0xd4f, -1)
 AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, 
RNG, MEMTAG, PROFILE), neoversev2, 0x41, 0xd4f, -1)
 
+/* Generic Architecture Processors.  */
+AARCH64_CORE("generic",  generic, cortexa53, V8A,  (), generic, 0x0, 0x0, -1)
+
 #undef AARCH64_CORE
diff --git a/gcc/config/aarch64/aarch64-opts.h 
b/gcc/config/aarch64/aarch64-opts.h
index 
831e28ab52a4271ef5467965039a32d078755d42..01151e93d17979f499523cabb74a449170483a70
 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -32,8 +32,6 @@ enum aarch64_processor
 #define AARCH64_CORE(NAME, INTERNAL_IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, 
PART, VARIANT) \
   INTERNAL_IDENT,
 #include "aarch64-cores.def"
-  /* Used to indicate that no processor has been specified.  */
-  generic,
   /* Used to mark the end of the processor table.  */
   aarch64_none
 };
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index 
c969277d617ad5fd070a915bfedb83323eb71e6c..cd5d79ea9c221874578a4d5804e4f618e671ebcd
 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter"
+   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter,generic"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
d74e9116fc56cfa85558cc0810f76479e7280f69..b178bb5b62dbdcb1f5edbad4155416d6093a11f3
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -720,7 +720,6 @@ enum target_cpus
 #define AARCH64_CORE(NAME, INTERNAL_IDENT, SCHED, ARCH, FLAGS, COSTS, IMP, 
PART, VARIANT) \
   TARGET_CPU_##INTERNAL_IDENT,
 #include "aarch64-cores.def"
-  TARGET_CPU_generic
 };
 
 /* If there is no CPU defined at configure, use generic as default.  */
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
07b1cde39209f5c7740e336b499e9aed31e4c515..086448632700bc97b0d4c75d85cef63f820e9944
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -427,8 +427,6 @@ static const struct processor 

[PATCH 3/6]AArch64: Add new generic-armv8-a CPU and make it the default.

2023-11-15 Thread Tamar Christina
Hi All,

This patch adds a new generic scheduling model "generic-armv8-a" and makes it
the default for all Armv8 architectures.

-mcpu=generic and -mtune=generic are kept around for those that really want the
deprecated cost model.

This shows on SPECCPU 2017 the following:

generic:  SPECINT 1.0% improvement in geomean, SPECFP -0.6%.  The SPECFP is due
  to fotonik3d_r where we vectorize an FP calculation that only ever
  needs one lane of the result.  This I believe is a generic costing bug
  but at the moment we can't change costs of FP and INT independently.
  So will defer updating that cost to stage3 after Richard's other
  costing updates land.

generic SVE: SPECINT 1.1% improvement in geomean, SPECFP 0.7% improvement.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR target/111370
* config/aarch64/aarch64-arches.def (armv8-9, armv8-a, armv8.1-a,
armv8.2-a, armv8.3-a, armv8.4-a, armv8.5-a, armv8.6-a, armv8.7-a,
armv8.8-a): Update to generic_armv8_a.
* config/aarch64/aarch64-cores.def (generic-armv8-a): New.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.cc: Include generic_armv8_a.h
* config/aarch64/aarch64.h (TARGET_CPU_DEFAULT): Change to
TARGET_CPU_generic_armv8_a.
* config/aarch64/tuning_models/generic_armv8_a.h: New file.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-arches.def 
b/gcc/config/aarch64/aarch64-arches.def
index 
7ae92aa8e984e0a77efd5c5a5061c4c6f86e0118..f89e4ea1f48acc2875c9a834d93d94c94163cddc
 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -30,19 +30,19 @@
Due to the assumptions about the positions of these fields in config.gcc,
NAME should be kept as the first argument.  */
 
-AARCH64_ARCH("armv8-a",   generic,   V8A,   8,  (SIMD))
-AARCH64_ARCH("armv8.1-a", generic,   V8_1A, 8,  (V8A, LSE, CRC, 
RDMA))
-AARCH64_ARCH("armv8.2-a", generic,   V8_2A, 8,  (V8_1A))
-AARCH64_ARCH("armv8.3-a", generic,   V8_3A, 8,  (V8_2A, PAUTH, 
RCPC))
-AARCH64_ARCH("armv8.4-a", generic,   V8_4A, 8,  (V8_3A, F16FML, 
DOTPROD, FLAGM))
-AARCH64_ARCH("armv8.5-a", generic,   V8_5A, 8,  (V8_4A, SB, SSBS, 
PREDRES))
-AARCH64_ARCH("armv8.6-a", generic,   V8_6A, 8,  (V8_5A, I8MM, 
BF16))
-AARCH64_ARCH("armv8.7-a", generic,   V8_7A, 8,  (V8_6A, LS64))
-AARCH64_ARCH("armv8.8-a", generic,   V8_8A, 8,  (V8_7A, MOPS))
-AARCH64_ARCH("armv8-r",   generic,   V8R  , 8,  (V8_4A))
-AARCH64_ARCH("armv9-a",   generic,   V9A  , 9,  (V8_5A, SVE2))
-AARCH64_ARCH("armv9.1-a", generic,   V9_1A, 9,  (V8_6A, V9A))
-AARCH64_ARCH("armv9.2-a", generic,   V9_2A, 9,  (V8_7A, V9_1A))
-AARCH64_ARCH("armv9.3-a", generic,   V9_3A, 9,  (V8_8A, V9_2A))
+AARCH64_ARCH("armv8-a",   generic_armv8_a,   V8A,   8,  (SIMD))
+AARCH64_ARCH("armv8.1-a", generic_armv8_a,   V8_1A, 8,  (V8A, LSE, 
CRC, RDMA))
+AARCH64_ARCH("armv8.2-a", generic_armv8_a,   V8_2A, 8,  (V8_1A))
+AARCH64_ARCH("armv8.3-a", generic_armv8_a,   V8_3A, 8,  (V8_2A, PAUTH, 
RCPC))
+AARCH64_ARCH("armv8.4-a", generic_armv8_a,   V8_4A, 8,  (V8_3A, 
F16FML, DOTPROD, FLAGM))
+AARCH64_ARCH("armv8.5-a", generic_armv8_a,   V8_5A, 8,  (V8_4A, SB, 
SSBS, PREDRES))
+AARCH64_ARCH("armv8.6-a", generic_armv8_a,   V8_6A, 8,  (V8_5A, I8MM, 
BF16))
+AARCH64_ARCH("armv8.7-a", generic_armv8_a,   V8_7A, 8,  (V8_6A, LS64))
+AARCH64_ARCH("armv8.8-a", generic_armv8_a,   V8_8A, 8,  (V8_7A, MOPS))
+AARCH64_ARCH("armv8-r",   generic_armv8_a,   V8R  , 8,  (V8_4A))
+AARCH64_ARCH("armv9-a",   generic,   V9A  , 9,  (V8_5A, SVE2))
+AARCH64_ARCH("armv9.1-a", generic,   V9_1A, 9,  (V8_6A, V9A))
+AARCH64_ARCH("armv9.2-a", generic,   V9_2A, 9,  (V8_7A, V9_1A))
+AARCH64_ARCH("armv9.3-a", generic,   V9_3A, 9,  (V8_8A, V9_2A))
 
 #undef AARCH64_ARCH
diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 
3e363bd0e8bbc10cb5b28d6183647736318e6d40..30f4dd04ed71823bc34c0c405d49963b6b2d1375
 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -191,5 +191,6 @@ AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, 
BF16, SVE2_BITPERM, RNG,
 
 /* Generic Architecture Processors.  */
 AARCH64_CORE("generic",  generic, cortexa53, V8A,  (), generic, 0x0, 0x0, -1)
+AARCH64_CORE("generic-armv8-a",  generic_armv8_a, cortexa53, V8A,  (), 
generic_armv8_a, 0x0, 0x0, -1)
 
 #undef AARCH64_CORE
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index 
cd5d79ea9c221874578a4d5804e4f618e671ebcd..0a32056f255de455f47a0b7395dfef0a

[PATCH 6/6]AArch64: only emit mismatch error when features would be disabled.

2023-11-15 Thread Tamar Christina
Hi All,

At the moment we emit a warning whenever you specify both -march and -mcpu
and the architecture of them differ.  The idea originally was that the user may
not be aware of this change.

However this has a few problems:

1.  Architecture revisions are not an observable part of the architecture;
extensions are.  Starting with GCC 14 we have therefore relaxed the rule
that all extensions can be enabled at any architecture level.  As such it's
incorrect, or at least not useful, to keep the check on architecture.

2.  It's problematic in Makefiles and other build systems, where you want to
enable CPU-specific builds for certain files, e.g. you may by default be
building for -march=armv8-a but build some files with -mcpu=neoverse-n1.
Since there's no easy way to remove the earlier options we end up warning,
and there's no way to disable just this warning.  Build systems compiling
with -Werror then find that compiling with GCC is needlessly hard.

3. It doesn't actually warn for cases that may lead to issues, so e.g.
   -march=armv8.2-a+sve -mcpu=neoverse-n1 does not give a warning that SVE would
   be disabled.

For this reason I have one of two proposals:

1.  Just remove this warning all together.

2.  Rework the warning based on extensions and only warn when features would be
disabled by the presence of the -mcpu.  This is the approach this patch has
taken.

As examples:

> aarch64-none-linux-gnu-gcc -march=armv8.2-a+sve -mcpu=neoverse-n1
cc1: warning: switch ‘-mcpu=neoverse-n1’ conflicts with ‘-march=armv8.2-a+sve’ 
switch and resulted in options +crc+sve+norcpc+nodotprod being added

.arch armv8.2-a+crc+sve

> aarch64-none-linux-gnu-gcc -march=armv8.2-a -mcpu=neoverse-n1
> aarch64-none-linux-gnu-gcc -march=armv8.2-a+dotprod -mcpu=neoverse-n1
> aarch64-none-linux-gnu-gcc -march=armv8.2-a+dotprod -mcpu=neoverse-n2


The one remaining issue here is that if both -march and -mcpu are specified we
pick the -march.  This is not particularly obvious and for the use case to be
more useful I think it makes sense to pick the CPU's arch?

I did not make that change in the patch as it changes semantics.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Note that I can't write a test for this because dg-warning expects warnings to
be at a particular line and doesn't support warnings at the "global" level.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_override_options): Rework warnings.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
caf80d66b3a744cc93899645aa5f9374983cd3db..3afd222ad3bdcfb922cc010dcc0b138db29caf7f
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16388,12 +16388,22 @@ aarch64_override_options (void)
   if (cpu && arch)
 {
   /* If both -mcpu and -march are specified, warn if they are not
-architecturally compatible and prefer the -march ISA flags.  */
-  if (arch->arch != cpu->arch)
-   {
- warning (0, "switch %<-mcpu=%s%> conflicts with %<-march=%s%> switch",
+feature compatible.  feature compatible means that the inclusion of the
+cpu features would end up disabling an achitecture feature.  In
+otherwords the cpu features need to be a strict superset of the arch
+features and if so prefer the -march ISA flags.  */
+  auto full_arch_flags = arch->flags | arch_isa;
+  auto full_cpu_flags = cpu->flags | cpu_isa;
+  if (~full_cpu_flags & full_arch_flags)
+   {
+ std::string ext_diff
+   = aarch64_get_extension_string_for_isa_flags (full_arch_flags,
+ full_cpu_flags);
+ warning (0, "switch %<-mcpu=%s%> conflicts with %<-march=%s%> switch "
+ "and resulted in options %s being added",
   aarch64_cpu_string,
-  aarch64_arch_string);
+  aarch64_arch_string,
+  ext_diff.c_str ());
}
 
   selected_arch = arch->arch;





[PATCH 4/6]AArch64: Add new generic-armv9-a CPU and make it the default for Armv9

2023-11-15 Thread Tamar Christina
Hi All,

This patch adds a new generic scheduling model "generic-armv9-a" and makes it
the default for all Armv9 architectures.

-mcpu=generic and -mtune=generic are kept around for those that really want the
deprecated cost model.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR target/111370
* config/aarch64/aarch64-arches.def (armv9-a, armv9.1-a, armv9.2-a,
armv9.3-a): Update to generic-armv9-a.
* config/aarch64/aarch64-cores.def (generic-armv9-a): New.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.cc: Include generic_armv9_a.h.
* config/aarch64/tuning_models/generic_armv9_a.h: New file.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-arches.def 
b/gcc/config/aarch64/aarch64-arches.def
index 
f89e4ea1f48acc2875c9a834d93d94c94163cddc..6b9a19c490ba0b35082077e877b19906138f039b
 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -40,9 +40,9 @@ AARCH64_ARCH("armv8.6-a", generic_armv8_a,   V8_6A, 
8,  (V8_5A, I8MM, BF
 AARCH64_ARCH("armv8.7-a", generic_armv8_a,   V8_7A, 8,  (V8_6A, LS64))
 AARCH64_ARCH("armv8.8-a", generic_armv8_a,   V8_8A, 8,  (V8_7A, MOPS))
 AARCH64_ARCH("armv8-r",   generic_armv8_a,   V8R  , 8,  (V8_4A))
-AARCH64_ARCH("armv9-a",   generic,   V9A  , 9,  (V8_5A, SVE2))
-AARCH64_ARCH("armv9.1-a", generic,   V9_1A, 9,  (V8_6A, V9A))
-AARCH64_ARCH("armv9.2-a", generic,   V9_2A, 9,  (V8_7A, V9_1A))
-AARCH64_ARCH("armv9.3-a", generic,   V9_3A, 9,  (V8_8A, V9_2A))
+AARCH64_ARCH("armv9-a",   generic_armv9_a,   V9A  , 9,  (V8_5A, SVE2))
+AARCH64_ARCH("armv9.1-a", generic_armv9_a,   V9_1A, 9,  (V8_6A, V9A))
+AARCH64_ARCH("armv9.2-a", generic_armv9_a,   V9_2A, 9,  (V8_7A, V9_1A))
+AARCH64_ARCH("armv9.3-a", generic_armv9_a,   V9_3A, 9,  (V8_8A, V9_2A))
 
 #undef AARCH64_ARCH
diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 
30f4dd04ed71823bc34c0c405d49963b6b2d1375..16752b77f4baf8d1aa8a5406826aa29e367120c5
 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -191,6 +191,7 @@ AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, 
BF16, SVE2_BITPERM, RNG,
 
 /* Generic Architecture Processors.  */
 AARCH64_CORE("generic",  generic, cortexa53, V8A,  (), generic, 0x0, 0x0, -1)
-AARCH64_CORE("generic-armv8-a",  generic_armv8_a, cortexa53, V8A,  (), 
generic_armv8_a, 0x0, 0x0, -1)
+AARCH64_CORE("generic-armv8-a",  generic_armv8_a, cortexa53, V8A, (), 
generic_armv8_a, 0x0, 0x0, -1)
+AARCH64_CORE("generic-armv9-a",  generic_armv9_a, cortexa53, V9A, (), 
generic_armv9_a, 0x0, 0x0, -1)
 
 #undef AARCH64_CORE
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index 
0a32056f255de455f47a0b7395dfef0af84c6b5e..61bb85211252970f0a0526929d6b88353bdd930f
 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter,generic,generic_armv8_a"
+   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter,generic,generic_armv8_a,generic_armv9_a"
(const (symbol_ref "((enum attr_tune

Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-15 Thread David Edelsohn
On Wed, Nov 15, 2023 at 9:22 AM Arsen Arsenović  wrote:

>
> David Edelsohn  writes:
>
> > GCC had been working on AIX with NLS, using "--with-included-gettext".
> > --disable-nls gets past the breakage, but GCC does not build for me on
> AIX
> > with NLS enabled.
>
> That should still work with gettext 0.22+ extracted in-tree (it should
> be fetched by download_prerequisites).
>
> > A change in dependencies for GCC should have been announced and more
> widely
> > socialized in the GCC development mailing list, not just GCC patches
> > mailing list.
> >
> > I have tried both the AIX Open Source libiconv and libgettext package,
> and
> > the ones that I previously built.  Both fail because GCC configure
> decides
> > to disable NLS, despite being requested, while libcpp is satisfied, so
> > tools in the gcc subdirectory don't link against libiconv and the build
> > fails.  With the included gettext, I was able to rely on a
> self-consistent
> > solution.
>
> That is interesting.  They should be using the same checks.  I've
> checked trunk and regenerated files on it, and saw no significant diff
> (some whitespace changes only).  Could you post the config.log of both?
>
> I've never used AIX.  Can I reproduce this on one of the cfarm machines
> to poke around?  I've tried cfarm119, but that one lacked git, and I
> haven't poked around much further due to time constraints.
>

The AIX system in the Compile Farm has a complete complement of Open Source
software installed.

Please ensure that /opt/freeware/bin is in your path.  Also, the GCC Wiki
Compile Farm page has build tips that include AIX

https://gcc.gnu.org/wiki/CompileFarm#Services_and_software_installed_on_farm_machines

which recommends the --with-included-gettext configuration option.

Thanks, David


>
> TIA, sorry about the inconvenience.  Have a lovely day.
>
> > The current gettext-0.22.3 fails to build for me on AIX.
> >
> > libcpp configure believes that NLS functions on AIX, but gcc configure
> > fails in its tests of gettext functionality, which leads to an
> inconsistent
> > configuration and build breakage.
> >
> > Thanks, David
>
>
> --
> Arsen Arsenović
>


[Committed] RISC-V: elide unnecessary sign extend when expanding cmp_and_jump

2023-11-15 Thread Vineet Gupta
RV64 compare and branch instructions only support 64-bit operands.
At Expand time, the backend conservatively zero/sign extends
its operands even if not needed, such as incoming function args
which ABI/ISA guarantee to be sign-extended already (this is true for
SI, HI, QI operands).

And subsequently REE fails to eliminate them as
   "missing definition(s)" or "multiple definition(s)"
since function args don't have explicit definition.

So during expand riscv_extend_comparands (), if an operand is a
subreg-promoted SI with inner DI, which is representative of a function
arg, just peel away the subreg to expose the DI, eliding the sign
extension. As Jeff noted this routine is also used in if-conversion so
potentially can also help there.

Note there's currently patches floating around to improve REE and also a
new pass to eliminate unnecessary extensions, but it is still beneficial
to not generate those extra extensions in first place. It is obviously
less work for post-reload passes such as REE, but even for earlier
passes, such as combine, having to deal with one less thing and ensuing
fewer combinations is a win too.

Many existing tests exhibit this issue,
e.g. gcc.c-torture/compile/20190827-1.c -O2 -march=rv64gc,
where it eliminates the SEXT.W.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_sign_extend_if_not_subreg_prom): New.
(riscv_extend_comparands): Call new function on operands.

Tested-by: Patrick O'Neill  # pre-commit-CI #676
Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv.cc | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index e919850fc6cb..e466d4f168af 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3695,6 +3695,24 @@ riscv_zero_if_equal (rtx cmp0, rtx cmp1)
   cmp0, cmp1, 0, 0, OPTAB_DIRECT);
 }
 
+/* Helper function for riscv_extend_comparands to Sign-extend the OP.
+   However if the OP is SI subreg promoted with an inner DI, such as
+   (subreg/s/v:SI (reg/v:DI) 0)
+   just peel off the SUBREG to get DI, avoiding extraneous extension.  */
+
+static void
+riscv_sign_extend_if_not_subreg_prom (rtx *op)
+{
+  if (GET_CODE (*op) == SUBREG
+  && SUBREG_PROMOTED_VAR_P (*op)
+  && SUBREG_PROMOTED_SIGNED_P (*op)
+  && (GET_MODE_SIZE (GET_MODE (XEXP (*op, 0))).to_constant ()
+ == GET_MODE_SIZE (word_mode)))
+*op = XEXP (*op, 0);
+  else
+*op = gen_rtx_SIGN_EXTEND (word_mode, *op);
+}
+
 /* Sign- or zero-extend OP0 and OP1 for integer comparisons.  */
 
 static void
@@ -3724,9 +3742,10 @@ riscv_extend_comparands (rtx_code code, rtx *op0, rtx 
*op1)
}
   else
{
- *op0 = gen_rtx_SIGN_EXTEND (word_mode, *op0);
+ riscv_sign_extend_if_not_subreg_prom (op0);
+
  if (*op1 != const0_rtx)
-   *op1 = gen_rtx_SIGN_EXTEND (word_mode, *op1);
+   riscv_sign_extend_if_not_subreg_prom (op1);
}
 }
 }
-- 
2.34.1



[Committed] RISC-V: fix vsetvli pass testsuite failure [PR/112447]

2023-11-15 Thread Vineet Gupta
From: Juzhe-Zhong 

Fixes: f0e28d8c1371 ("RISC-V: Fix failed hoist in LICM of vmv.v.x instruction")

Since above commit, we have following failure:

  FAIL: gcc.c-torture/execute/memset-3.c   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions execution test
  FAIL: gcc.c-torture/execute/memset-3.c   -O3 -g  execution test

The issue was not the commit itself; rather, it exposed a latent issue in
the vsetvli pass.

Here's Juzhe's analysis:

We have 2 types of global vsetvls insertion.
One is earliest fusion of each end of the block.
The other is LCM suggested edge vsetvls.

So before this patch, insertion as follows:

|  (insn 2817 2820 2818 361 (set (reg:SI 67 vtype)
|(unspec:SI [
|(const_int 8 [0x8])
|(const_int 7 [0x7])
|(const_int 1 [0x1]) repeated x2
|] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
| (nil))
|  (insn 2818 2817 999 361 (set (reg:SI 67 vtype)
|(unspec:SI [
|(const_int 32 [0x20])
|(const_int 1 [0x1]) repeated x3
|] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
| (nil))

After this patch:

|  (insn 2817 2820 2819 361 (set (reg:SI 67 vtype)
|(unspec:SI [
|(const_int 32 [0x20])
|(const_int 1 [0x1]) repeated x3
|] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
| (nil))
|  (insn 2819 2817 999 361 (set (reg:SI 67 vtype)
|(unspec:SI [
|(const_int 8 [0x8])
|(const_int 7 [0x7])
|(const_int 1 [0x1]) repeated x2
|] UNSPEC_VSETVL)) 1708 {vsetvl_vtype_change_only}
| (nil))

The original insertion order is incorrect.

We should insert the earliest-fusion vsetvls first, since that is the
vsetvl information that was already there and was seen by the later LCM;
we merely delayed its insertion. It should therefore come before the
LCM-suggested insertion.
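As an illustrative sketch (a toy model, not code from riscv-vsetvl.cc; the helper name and the vsetvl strings below are hypothetical), the fix amounts to queuing the locally-known vsetvls on an edge ahead of the LCM-suggested ones:

```python
# Toy model of the insertion-order fix: insns queued first on an edge are
# emitted first, so the earliest-fusion (locally known) vsetvls must be
# queued before the LCM-suggested ones.

def emit_block_vsetvls(earliest_fusion, lcm_suggested):
    """Queue local vsetvl info before the LCM-suggested info."""
    return earliest_fusion + lcm_suggested

# Hypothetical SEW/LMUL settings, loosely mirroring the dumps above.
local_info = ["vsetvl e32,m1"]   # earliest fusion, already seen by LCM
lcm_info = ["vsetvl e8,mf2"]     # LCM-suggested state for the edge

insns = emit_block_vsetvls(local_info, lcm_info)
print(insns)  # ['vsetvl e32,m1', 'vsetvl e8,mf2']
```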

PR target/112447

gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::emit_vsetvl): Insert
local vsetvl info before LCM suggested one.

Tested-by: Patrick O'Neill  # pre-commit-CI #679
Co-developed-by: Vineet Gupta 

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv-vsetvl.cc | 70 
 1 file changed, 35 insertions(+), 35 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 8466b5d019ea..74367ec8d8e9 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3229,6 +3229,41 @@ pre_vsetvl::emit_vsetvl ()
   remove_vsetvl_insn (item);
 }
 
+  /* Insert vsetvl info that was not deleted after lift up.  */
+  for (const bb_info *bb : crtl->ssa->bbs ())
+{
+  const vsetvl_block_info &block_info = get_block_info (bb);
+  if (!block_info.has_info ())
+   continue;
+
+  const vsetvl_info &footer_info = block_info.get_exit_info ();
+
+  if (footer_info.delete_p ())
+   continue;
+
+  edge eg;
+  edge_iterator eg_iterator;
+  FOR_EACH_EDGE (eg, eg_iterator, bb->cfg_bb ()->succs)
+   {
+ gcc_assert (!(eg->flags & EDGE_ABNORMAL));
+ if (dump_file)
+   {
+ fprintf (
+   dump_file,
+   "\n  Insert missed vsetvl info at edge(bb %u -> bb %u): ",
+   eg->src->index, eg->dest->index);
+ footer_info.dump (dump_file, "");
+   }
+ start_sequence ();
+ insert_vsetvl_insn (EMIT_DIRECT, footer_info);
+ rtx_insn *rinsn = get_insns ();
+ end_sequence ();
+ default_rtl_profile ();
+ insert_insn_on_edge (rinsn, eg);
+ need_commit = true;
+   }
+}
+
   /* m_insert vsetvl as LCM suggest. */
   for (int ed = 0; ed < NUM_EDGES (m_edges); ed++)
 {
@@ -3267,41 +3302,6 @@ pre_vsetvl::emit_vsetvl ()
   insert_insn_on_edge (rinsn, eg);
 }
 
-  /* Insert vsetvl info that was not deleted after lift up.  */
-  for (const bb_info *bb : crtl->ssa->bbs ())
-{
-  const vsetvl_block_info &block_info = get_block_info (bb);
-  if (!block_info.has_info ())
-   continue;
-
-  const vsetvl_info &footer_info = block_info.get_exit_info ();
-
-  if (footer_info.delete_p ())
-   continue;
-
-  edge eg;
-  edge_iterator eg_iterator;
-  FOR_EACH_EDGE (eg, eg_iterator, bb->cfg_bb ()->succs)
-   {
- gcc_assert (!(eg->flags & EDGE_ABNORMAL));
- if (dump_file)
-   {
- fprintf (
-   dump_file,
-   "\n  Insert missed vsetvl info at edge(bb %u -> bb %u): ",
-   eg->src->index, eg->dest->index);
- footer_info.dump (dump_file, "");
-   }
- start_sequence ();
- insert_vsetvl_insn (EMIT_DIRECT, footer_info);
- rtx_insn *rinsn = get_insns ();
- end_sequence ();
- default_rtl_profile ();
- insert_insn_on_edge (rinsn,

Re: [PATCH 04/14] c++: use _P() defines from tree.h

2023-11-15 Thread Bernhard Reutner-Fischer
On Tue, 8 Aug 2023 16:31:39 -0400
Jason Merrill  wrote:

> On 8/2/23 12:51, Patrick Palka via Gcc-patches wrote:
> > On Thu, Jun 1, 2023 at 2:11 PM Bernhard Reutner-Fischer
> >  wrote:  
> >>
> >> Hi David, Patrick,
> >>
> >> On Thu, 1 Jun 2023 18:33:46 +0200
> >> Bernhard Reutner-Fischer  wrote:
> >>  
> >>> On Thu, 1 Jun 2023 11:24:06 -0400
> >>> Patrick Palka  wrote:
> >>>  
>  On Sat, May 13, 2023 at 7:26 PM Bernhard Reutner-Fischer via
>  Gcc-patches  wrote:  
> >>>  
> > diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
> > index 131b212ff73..19dfb3ed782 100644
> > --- a/gcc/cp/tree.cc
> > +++ b/gcc/cp/tree.cc
> > @@ -1173,7 +1173,7 @@ build_cplus_array_type (tree elt_type, tree 
> > index_type, int dependent)
> >   }
> >
> > /* Avoid spurious warnings with VLAs (c++/54583).  */
> > -  if (TYPE_SIZE (t) && EXPR_P (TYPE_SIZE (t)))
> > +  if (CAN_HAVE_LOCATION_P (TYPE_SIZE (t)))  
> 
>  Hmm, this change seems undesirable...  
> >>>
> >>> mhm, yes that is misleading. I'll prepare a patch to revert this.
> >>> Let me have a look if there were other such CAN_HAVE_LOCATION_P changes
> >>> that we'd want to revert.  
> >>
> >> Sorry for that!
> >> I'd revert the hunk above and the one in gcc-rich-location.cc
> >> (maybe_range_label_for_tree_type_mismatch::get_text), please see
> >> attached. Bootstrap running, ok for trunk if it passes?  
> > 
> > LGTM!  
> 
> Yes, OK.

Now applied as r14-5508 (186331063dfbcf1eacb445c473d92634c9baa90f)

thanks


[PATCH] c++: constantness of call to function pointer [PR111703]

2023-11-15 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk/13/12 (to match the PR107939 / r13-6525-ge09bc034d1b4d6 backports)?

-- >8 --

potential_constant_expression for a CALL_EXPR to a non-overload tests
FUNCTION_POINTER_TYPE_P on the callee rather than on the type of the
callee, which means we always pass want_rval=any when recursing and so
may fail to properly treat a non-constant function pointer callee as such.
Fixing this turns out to further work around the PR111703 issue.

PR c++/111703
PR c++/107939

gcc/cp/ChangeLog:

* constexpr.cc (potential_constant_expression_1) :
Fix FUNCTION_POINTER_TYPE_P test.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-fn8.C: Extend test.
* g++.dg/diagnostic/constexpr4.C: New test.
---
 gcc/cp/constexpr.cc  | 4 +++-
 gcc/testsuite/g++.dg/cpp2a/concepts-fn8.C| 2 ++
 gcc/testsuite/g++.dg/diagnostic/constexpr4.C | 9 +
 3 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/constexpr4.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 8a6b210144a..5ecc30117a1 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -9547,7 +9547,9 @@ potential_constant_expression_1 (tree t, bool want_rval, 
bool strict, bool now,
  }
else if (fun)
   {
-   if (RECUR (fun, FUNCTION_POINTER_TYPE_P (fun) ? rval : any))
+   if (RECUR (fun, (TREE_TYPE (fun)
+&& FUNCTION_POINTER_TYPE_P (TREE_TYPE (fun))
+? rval : any)))
  /* Might end up being a constant function pointer.  But it
 could also be a function object with constexpr op(), so
 we pass 'any' so that the underlying VAR_DECL is deemed
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-fn8.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-fn8.C
index 3f63a5b28d7..c63d26c931d 100644
--- a/gcc/testsuite/g++.dg/cpp2a/concepts-fn8.C
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-fn8.C
@@ -15,10 +15,12 @@ struct P {
 };
 
 void (*f)(P);
+P (*h)(P);
 
 template
 constexpr bool g() {
   P x;
   f(x); // { dg-bogus "from here" }
+  f(h(x)); // { dg-bogus "from here" }
   return true;
 }
diff --git a/gcc/testsuite/g++.dg/diagnostic/constexpr4.C 
b/gcc/testsuite/g++.dg/diagnostic/constexpr4.C
new file mode 100644
index 000..f971f533b08
--- /dev/null
+++ b/gcc/testsuite/g++.dg/diagnostic/constexpr4.C
@@ -0,0 +1,9 @@
+// Verify we diagnose a call to a non-constant function pointer ahead of time.
+// { dg-do compile { target c++11 } }
+
+int (*f)(int);
+
+template
+void g() {
+  static_assert(f(N) == 0, ""); // { dg-error "non-constant|'f' is not usable" 
}
+}
-- 
2.43.0.rc1



[Committed] RISC-V: Fix ICE in non-canonical march parsing

2023-11-15 Thread Patrick O'Neill
Updated testcase names and committed.

Thanks,
Patrick

---

Passing in a base extension in non-canonical order (i, e, g) causes GCC
to ICE:
xgcc: error: '-march=rv64ge': ISA string is not in canonical order. 'e'
xgcc: internal compiler error: in add, at 
common/config/riscv/riscv-common.cc:671
...

This is fixed by skipping to the next extension when a non-canonical
order is detected.
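To make the skip-and-continue control flow concrete, here is a hedged Python sketch (the canonical-order string is abbreviated and hypothetical; the real table and parser live in riscv-common.cc):

```python
# Sketch of the parse_std_ext fix: on an out-of-order extension letter,
# diagnose it, restore the previous search position, and move on to the
# next letter instead of leaving the parser in an inconsistent state.

CANONICAL = "iemafdqcbvh"  # hypothetical, abbreviated canonical order

def parse_std_exts(ext_string):
    parsed, errors = [], []
    pos = 0
    for ch in ext_string:
        idx = CANONICAL.find(ch, pos)
        if idx == -1 and ch in CANONICAL:
            # Non-canonical order: emit an error, keep the prior position
            # (like `std_exts = prior_std_exts`), and skip this letter.
            errors.append(f"ISA string is not in canonical order. '{ch}'")
            continue
        if idx != -1:
            parsed.append(ch)
            pos = idx + 1
        # Letters not in CANONICAL at all are ignored here for brevity.
    return parsed, errors

parsed, errors = parse_std_exts("iam")  # 'm' is out of canonical order
print(errors)  # ["ISA string is not in canonical order. 'm'"]
```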

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_subset_list::parse_std_ext): Emit an error and skip to
the next extension when a non-canonical ordering is detected.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-27.c: New test.
* gcc.target/riscv/arch-28.c: New test.

Signed-off-by: Patrick O'Neill 
---
Tested using rv64gc glibc on QEMU.
---
 gcc/common/config/riscv/riscv-common.cc  | 17 +
 gcc/testsuite/gcc.target/riscv/arch-27.c |  7 +++
 gcc/testsuite/gcc.target/riscv/arch-28.c |  7 +++
 3 files changed, 27 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-27.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-28.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 526dbb7603b..57fe856063e 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1017,15 +1017,24 @@ riscv_subset_list::parse_std_ext (const char *p)
   std_ext = *p;
 
   /* Checking canonical order.  */
+  const char *prior_std_exts = std_exts;
+
   while (*std_exts && std_ext != *std_exts)
std_exts++;
 
   subset[0] = std_ext;
   if (std_ext != *std_exts && standard_extensions_p (subset))
-   error_at (m_loc,
- "%<-march=%s%>: ISA string is not in canonical order. "
- "%<%c%>",
- m_arch, *p);
+   {
+ error_at (m_loc,
+   "%<-march=%s%>: ISA string is not in canonical order. "
+   "%<%c%>",
+   m_arch, *p);
+ /* Extension ordering is invalid.  Ignore this extension and keep
+searching for other issues with remaining extensions.  */
+ std_exts = prior_std_exts;
+ p++;
+ continue;
+   }
 
   std_exts++;
 
diff --git a/gcc/testsuite/gcc.target/riscv/arch-27.c 
b/gcc/testsuite/gcc.target/riscv/arch-27.c
new file mode 100644
index 000..70143b2156f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-27.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64ge -mabi=lp64d" } */
+int foo()
+{
+}
+
+/* { dg-error "ISA string is not in canonical order. 'e'" "" { target *-*-* } 
0 } */
diff --git a/gcc/testsuite/gcc.target/riscv/arch-28.c 
b/gcc/testsuite/gcc.target/riscv/arch-28.c
new file mode 100644
index 000..934399a7b3a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-28.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64imaefcv -mabi=lp64d" } */
+int foo()
+{
+}
+
+/* { dg-error "ISA string is not in canonical order. 'e'" "" { target *-*-* } 
0 } */
-- 
2.34.1




Re: [PATCH] RISC-V: Fix ICE in non-canonical march parsing

2023-11-15 Thread Patrick O'Neill

Does relax mean no longer enforcing the canonical order of extensions?

Patrick

On 11/14/23 17:52, Kito Cheng wrote:


LGTM, and BTW...I am thinking we could relax the canonical order
during parsing. Do you have interest and time to work on that
item?

On Wed, Nov 15, 2023 at 9:35 AM Patrick O'Neill  wrote:

Passing in a base extension in non-canonical order (i, e, g) causes GCC
to ICE:
xgcc: error: '-march=rv64ge': ISA string is not in canonical order. 'e'
xgcc: internal compiler error: in add, at 
common/config/riscv/riscv-common.cc:671
...

This is fixed by skipping to the next extension when a non-canonical
order is detected.

gcc/ChangeLog:

 * common/config/riscv/riscv-common.cc
 (riscv_subset_list::parse_std_ext): Emit an error and skip to
 the next extension when a non-canonical ordering is detected.

Re: building GNU gettext on AIX

2023-11-15 Thread David Edelsohn
When I try to configure gettext-0.22.3, I receive the following error:

checking for socklen_t equivalent... configure: error: Cannot find a type
to use in place of socklen_t

configure: error:
/nasfarm/edelsohn/src/gettext-0.22.3/libtextstyle/configure failed for
libtextstyle


configure:43943: /nasfarm/edelsohn/install/GCC12/bin/gcc -c -g -O2
-D_THREAD_SAFE
conftest.c >&5

conftest.c:112:18: error: two or more data types in declaration specifiers

  112 | #define intmax_t long long

  |  ^~~~

conftest.c:112:23: error: two or more data types in declaration specifiers

  112 | #define intmax_t long long

  |   ^~~~

In file included from conftest.c:212:

conftest.c:214:24: error: conflicting types for 'ngetpeername'; have
'int(int,  void *, long unsigned int *)'

  214 |int getpeername (int, void *, unsigned long int
*);

  |^~~

/nasfarm/edelsohn/install/GCC12/lib/gcc/powerpc-ibm-aix7.2.5.0/12.1.1/include-fixed/sys/socket.h:647:9:
note: previous declaration of 'ngetpeername' with type 'int(int,  struct
sockaddr * restrict,  socklen_t * restrict)' {aka 'int(int,  struct
sockaddr * restrict,  long unsigned int * restrict)'}

  647 | int getpeername(int, struct sockaddr *__restrict__, socklen_t
*__restrict__);

  | ^~~


configure and config.h seems to get itself confused about types.


David



On Wed, Nov 15, 2023 at 7:29 AM Bruno Haible  wrote:

> [CCing bug-gettext]
>
> David Edelsohn wrote in
> :
> > The current gettext-0.22.3 fails to build for me on AIX.
>
> Here are some hints to get a successful build of GNU gettext on AIX:
>
> 1. Set the recommended environment variables before running configure:
>https://gitlab.com/ghwiki/gnow-how/-/wikis/Platforms/Configuration
>
>Namely:
>* for a 32-bit build with gcc:
>  CC=gcc
>  CXX=g++
>  CPPFLAGS="-I$PREFIX/include"
>  LDFLAGS="-L$PREFIX/lib"
>  unset AR NM
>* for a 32-bit build with xlc:
>  CC="xlc -qthreaded -qtls"
>  CXX="xlC -qthreaded -qtls"
>  CPPFLAGS="-I$PREFIX/include"
>  LDFLAGS="-L$PREFIX/lib"
>  unset AR NM
>* for a 64-bit build with gcc:
>  CC="gcc -maix64"
>  CXX="g++ -maix64"
>  CPPFLAGS="-I$PREFIX/include"
>  LDFLAGS="-L$PREFIX/lib"
>  AR="ar -X 64"; NM="nm -X 64 -B"
>* for a 64-bit build with xlc:
>  CC="xlc -q64 -qthreaded -qtls"
>  CXX="xlC -q64 -qthreaded -qtls"
>  CPPFLAGS="-I$PREFIX/include"
>  LDFLAGS="-L$PREFIX/lib"
>  AR="ar -X 64"; NM="nm -X 64 -B"
>
>where $PREFIX is the value that you pass to the --prefix configure
> option.
>
>Rationale: you can run into all sorts of problems if you choose compiler
>options at random and haven't experience with compiler options on that
>platform.
>
> 2. Don't use ibm-clang.
>
>Rationale: It's broken.
>
> 3. Don't use -Wall with gcc 10.3.
>
>Rationale: If you specify -Wall, gettext's configure adds -fanalyzer,
> which
>has excessive memory requirements in gcc 10.x. In particular, on AIX, it
>makes cc1 crash while compiling regex.c after it has consumed 1 GiB of
> RAM.
>
> 4. Avoid using a --prefix that contains earlier installations of the same
>package.
>
>Rationale: Because the AIX linker hardcodes directory names in shared
>libraries, GNU libtool has a peculiar configuration on AIX. It ends up
>mixing the in-build-tree libraries with the libraries in the install
>locations, leading to all sorts of errors.
>
>If you really need to use a --prefix that contains an earlier
>installation of the same package:
>  - Either use --disable-shared and remove libgettextlib.a and
>libgettextsrc.a from $PREFIX/lib before starting the build.
>  - Or use a mix of "make -k", "make -k install" and ad-hoc workarounds
>that cannot be described in a general way.
>
> Bruno
>
>
>
>


Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-15 Thread David Edelsohn
On Wed, Nov 15, 2023 at 9:22 AM Arsen Arsenović  wrote:

>
> David Edelsohn  writes:
>
> > GCC had been working on AIX with NLS, using "--with-included-gettext".
> > --disable-nls gets past the breakage, but GCC does not build for me on
> AIX
> > with NLS enabled.
>
> That should still work with gettext 0.22+ extracted in-tree (it should
> be fetched by download_prerequisites).
>
> > A change in dependencies for GCC should have been announced and more
> widely
> > socialized in the GCC development mailing list, not just GCC patches
> > mailing list.
> >
> > I have tried both the AIX Open Source libiconv and libgettext package,
> and
> > the ones that I previously built.  Both fail because GCC configure
> decides
> > to disable NLS, despite being requested, while libcpp is satisfied, so
> > tools in the gcc subdirectory don't link against libiconv and the build
> > fails.  With the included gettext, I was able to rely on a
> self-consistent
> > solution.
>
> That is interesting.  They should be using the same checks.  I've
> checked trunk and regenerated files on it, and saw no significant diff
> (some whitespace changes only).  Could you post the config.log of both?
>

GCC configured with --with-libintl-prefix and --with-libiconv-prefix

libcpp/config.log:

configure:7610: checking for GNU gettext in libc

configure:7639: /nasfarm/edelsohn/install/GCC12/bin/gcc -std=gnu99 -o
conftest -g  -static-libstdc++ -static-libgcc -Wl,-bbigtoc conftest.c  >&5

conftest.c:71:10: fatal error: libintl.h: No such file or directory

   71 | #include 

  |  ^~~

configure:8318: checking for GNU gettext in libintl

configure:8355: /nasfarm/edelsohn/install/GCC12/bin/gcc -std=gnu99 -o
conftest -g -I/nasfarm/edelsohn/install/include  -static-libstdc++
-static-libgcc -Wl,-bbigtoc conftest.c  /nasfarm/edelsohn/install/lib/
libintl.a >&5

ld: 0711-317 ERROR: Undefined symbol: .libiconv_open

ld: 0711-317 ERROR: Undefined symbol: .libiconv_set_relocation_prefix

ld: 0711-317 ERROR: Undefined symbol: .libiconv_close

ld: 0711-317 ERROR: Undefined symbol: .libiconv

ld: 0711-345 Use the -bloadmap or -bnoquiet option to obtain more
information.

collect2: error: ld returned 8 exit status

configure:8355: $? = 1

configure:8392: /nasfarm/edelsohn/install/GCC12/bin/gcc -std=gnu99 -o
conftest -g -I/nasfarm/edelsohn/install/include  -static-libstdc++
-static-libgcc -Wl,-bbigtoc conftest.c  /nasfarm/edelsohn/install/lib/
libintl.a /nasfarm/edelsohn/install/lib/libiconv.a >&5

configure:8392: $? = 0

configure:8405: result: yes

configure:8440: checking whether to use NLS

configure:8442: result: yes

configure:8445: checking where the gettext function comes from

configure:8456: result: external libintl

configure:8464: checking how to link with libintl

configure:8466: result: /nasfarm/edelsohn/install/lib/libintl.a
/nasfarm/edelsohn/install/lib/libiconv.a

configure:8525: checking whether NLS is requested

configure:8531: result: yes

gcc/config.log:

configure:14002: checking for GNU gettext in libc

configure:14031: /nasfarm/edelsohn/install/GCC12/bin/g++ -std=c++11 -o
conftest

-g-static-libstdc++ -static-libgcc -Wl,-bbigtoc  conftest.cpp  >&5

conftest.cpp:196:10: fatal error: libintl.h: No such file or directory

  196 | #include 

  |  ^~~

configure:14710: checking for GNU gettext in libintl

configure:14747: /nasfarm/edelsohn/install/GCC12/bin/g++ -std=c++11 -o
conftest -g-I/nasfarm/edelsohn/install/include -static-libstdc++
-static-libgcc -Wl,-bbigtoc  conftest.cpp  /nasfarm/edelsohn/install/lib/
libintl.a >&5

ld: 0711-317 ERROR: Undefined symbol: .libiconv_open

ld: 0711-317 ERROR: Undefined symbol: .libiconv_set_relocation_prefix

ld: 0711-317 ERROR: Undefined symbol: .libiconv_close

ld: 0711-317 ERROR: Undefined symbol: .libiconv

ld: 0711-345 Use the -bloadmap or -bnoquiet option to obtain more
information.

collect2: error: ld returned 8 exit status

configure:14747: $? = 1

configure:14797: result: no

configure:14832: checking whether to use NLS

configure:14834: result: no

configure:14917: checking whether NLS is requested

configure:14920: result: no




> I've never used AIX.  Can I reproduce this on one of the cfarm machines
> to poke around?  I've tried cfarm119, but that one lacked git, and I
> haven't poked around much further due to time constraints.
>
> TIA, sorry about the inconvenience.  Have a lovely day.
>
> > The current gettext-0.22.3 fails to build for me on AIX.
> >
> > libcpp configure believes that NLS functions on AIX, but gcc configure
> > fails in its tests of gettext functionality, which leads to an
> inconsistent
> > configuration and build breakage.
> >
> > Thanks, David
>
>
> --
> Arsen Arsenović
>


Re: Darwin: Replace environment runpath with embedded [PR88590]

2023-11-15 Thread Thomas Schwinge
Hi!

On 2023-10-30T19:08:18+, Iain Sandoe  wrote:
>> On 30 Oct 2023, at 16:31, FX Coudert  wrote:
>>
>>> +enable_darwin_at_rpath_$1=no
>>
>> I actually don’t understand why this one would have $1 in the name, unlike 
>> all other regenerated configure files. What value do we expect for $1 at 
>> this point in the file? That’s just plain weird.
>
> I’ve committed the missing hunk - at least that should appease CI.
>
> Agreed, it is weird, (actually, I’ve never quite understood why fixincludes 
> wants libtool.m4 given that it is host-side and not building any libraries) ..

So I currently see the following in my build logs:

[...]
mkdir -p -- ./fixincludes
Configuring in ./fixincludes
configure: creating cache ./config.cache
[...]/source-gcc/fixincludes/configure: line 3030: 
enable_darwin_at_rpath_--srcdir=[...]/source-gcc/fixincludes=no: No such file 
or directory
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking target system type... nvptx-unknown-none
[...]

I'm not convinced that's achieving what it means to achieve?


Grüße
 Thomas
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


[COMMITTED] Regenerate libiberty/aclocal.m4 with aclocal 1.15.1

2023-11-15 Thread Mark Wielaard
There is a new buildbot check that all autotool files are generated
with the correct versions (automake 1.15.1 and autoconf 2.69).
https://builder.sourceware.org/buildbot/#/builders/gcc-autoregen

Correct one file that was generated with the wrong version.

libiberty/
* aclocal.m4: Rebuild.
---
 libiberty/aclocal.m4 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libiberty/aclocal.m4 b/libiberty/aclocal.m4
index f327865aaf9..0757688d52a 100644
--- a/libiberty/aclocal.m4
+++ b/libiberty/aclocal.m4
@@ -1,6 +1,6 @@
-# generated automatically by aclocal 1.16.5 -*- Autoconf -*-
+# generated automatically by aclocal 1.15.1 -*- Autoconf -*-
 
-# Copyright (C) 1996-2021 Free Software Foundation, Inc.
+# Copyright (C) 1996-2017 Free Software Foundation, Inc.
 
 # This file is free software; the Free Software Foundation
 # gives unlimited permission to copy and/or distribute it,
-- 
2.39.3



Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for rv32gcv.

2023-11-15 Thread Robin Dapp
On 11/15/23 15:29, 钟居哲 wrote:
> Could you show me the example ?
> 
> It's used by handling SEW = 64 on RV32. I don't know why this patch touch 
> this code.

Use gather_load_run-1.c with the 64-bit index patterns disabled
on rv32.  We insert (mem:DI (reg:SI)) into a vector so use the
SEW = 64 demote handler.  There we set vl = vl * 2 (which is correct)
but the mode (i.e. vector) just changes from DI to SI while
keeping the number of elements the same.  Then we e.g. go
from V8DI to V8SI and slide down 16 elements, losing the lower
half.
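A hedged byte-level sketch of the size mismatch (a toy model, not vectorizer code): demoting 8 doubleword elements to word elements must double the element count; keeping the count at 8 covers only half of the underlying bytes.

```python
import struct

# Toy model: a V8DI vector is 64 bytes.  Viewing it with SEW = 32 must
# give 16 elements (V16SI, vl = vl * 2); keeping 8 elements (V8SI)
# reads only the first 32 bytes -- the "lost" half described above.

v8di = list(range(8))                                # 8 x 64-bit values
raw = b"".join(struct.pack("<q", x) for x in v8di)   # 64 bytes total

v16si = struct.unpack("<16i", raw)           # correct demotion: 16 elements
v8si_wrong = struct.unpack("<8i", raw[:32])  # same element count: half lost

print(len(v16si), len(v8si_wrong))  # 16 8
```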

Regards
 Robin


Re: Darwin: Replace environment runpath with embedded [PR88590]

2023-11-15 Thread FX Coudert
> So I currently see the following in my build logs:
> 
>[...]
>mkdir -p -- ./fixincludes
>Configuring in ./fixincludes
>configure: creating cache ./config.cache
>[...]/source-gcc/fixincludes/configure: line 3030: 
> enable_darwin_at_rpath_--srcdir=[...]/source-gcc/fixincludes=no: No such file 
> or directory
>checking build system type... x86_64-pc-linux-gnu
>checking host system type... x86_64-pc-linux-gnu
>checking target system type... nvptx-unknown-none
>[...]
> 
> I'm not convinced that's achieving what it means to achieve?

I’ve tried to understand where that line gets expanded from:

>>> +enable_darwin_at_rpath_$1=no

It comes from:

> _LT_TAGVAR(enable_darwin_at_rpath, $1)=no

in the top-level libtool.m4. I can’t say that I understand why that line is 
there. All the other definitions using this structure are inside the 
definition of _LT_-prefixed functions, defined by m4_defun. This one line 
stands alone, outside of any function.

If I remove the line from libtool.m4 (innocent smile) I see that 
fixincludes/configure is better, and it does not appear to change the 
regenerated files in other directories (I didn’t do a build yet, just tried to 
regenerate with some manual autoconf invocations).

Food for thought.
FX

Re: building GNU gettext on AIX

2023-11-15 Thread Bruno Haible
David Edelsohn wrote:
> When I try to configure gettext-0.22.3, I receive the following error:
> 
> checking for socklen_t equivalent... configure: error: Cannot find a type
> to use in place of socklen_t
> 
> configure: error:
> /nasfarm/edelsohn/src/gettext-0.22.3/libtextstyle/configure failed for
> libtextstyle
> 
> 
> configure:43943: /nasfarm/edelsohn/install/GCC12/bin/gcc -c -g -O2
> -D_THREAD_SAFE
> conftest.c >&5
> 
> conftest.c:112:18: error: two or more data types in declaration specifiers
> 
>   112 | #define intmax_t long long
> 
>   |  ^~~~
> 
> conftest.c:112:23: error: two or more data types in declaration specifiers
> 
>   112 | #define intmax_t long long
> 
>   |   ^~~~
> 
> In file included from conftest.c:212:
> 
> conftest.c:214:24: error: conflicting types for 'ngetpeername'; have
> 'int(int,  void *, long unsigned int *)'
> 
>   214 |int getpeername (int, void *, unsigned long int
> *);
> 
>   |^~~
> 
> /nasfarm/edelsohn/install/GCC12/lib/gcc/powerpc-ibm-aix7.2.5.0/12.1.1/include-fixed/sys/socket.h:647:9:
> note: previous declaration of 'ngetpeername' with type 'int(int,  struct
> sockaddr * restrict,  socklen_t * restrict)' {aka 'int(int,  struct
> sockaddr * restrict,  long unsigned int * restrict)'}
> 
>   647 | int getpeername(int, struct sockaddr *__restrict__, socklen_t
> *__restrict__);
> 
>   | ^~~
> 
> 
> configure and config.h seems to get itself confused about types.

There seem to be two problems, both related to the include files of
your compiler:

  - The configure test "checking for intmax_t..." must have found the
answer "no". But on a modern system,  should be defining
intmax_t already.

  - This configure test tries to find the getpeername declaration,
but cannot find it (maybe because of the first problem?):


 for arg2 in "struct sockaddr" void; do
   for t in int size_t "unsigned int" "long int" "unsigned long int"; do
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
/* end confdefs.h.  */
#include 
   #include 

   int getpeername (int, $arg2 *, $t *);
int
main (void)
{
$t len;
  getpeername (0, 0, &len);
  ;
  return 0;
}
_ACEOF
if ac_fn_c_try_compile "$LINENO"
then :
  gl_cv_socklen_t_equiv="$t"
fi


I would concentrate on the first problem. If you don't get it fixed, then I'd
suggest to try 'gcc' from the AIX Toolbox [1] or 'xlc' (as an IBM product)
instead of 'gcc' (that looks like you built it yourself).

Bruno

[1] 
https://public.dhe.ibm.com/aix/freeSoftware/aixtoolbox/SPECS/gcc12-12.3.0-1.spec





[committed] i386: Optimize strict_low_part QImode insn with high input registers

2023-11-15 Thread Uros Bizjak
Following testcase:

struct S1
{
  unsigned char val;
  unsigned char pad1;
  unsigned short pad2;
};

struct S2
{
  unsigned char pad1;
  unsigned char val;
  unsigned short pad2;
};

struct S1 test_add (struct S1 a, struct S2 b, struct S2 c)
{
  a.val = b.val + c.val;

  return a;
}

compiles with -O2 to:

movl%edi, %eax
movzbl  %dh, %edx
movl%esi, %ecx
movb%dl, %al
addb%ch, %al

The insert to %al can go directly from %dh:

movl%edi, %eax
movl%esi, %ecx
movb%dh, %al
addb%ch, %al

The patch introduces strict_low_part QImode insn patterns with both of
their input arguments extracted from a high register.  This invalid
insn is split after reload into a lowpart insert from the high register
and a qi_ext_1_slp instruction.

PR target/78904

gcc/ChangeLog:

* config/i386/i386.md (*movstrictqi_ext_1): New insn pattern.
(*addqi_ext_2_slp): New define_insn_and_split pattern.
(*subqi_ext_2_slp): Ditto.
(*qi_ext_2_slp): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr78904-8.c: New test.
* gcc.target/i386/pr78904-8a.c: New test.
* gcc.target/i386/pr78904-8b.c: New test.
* gcc.target/i386/pr78904-9.c: New test.
* gcc.target/i386/pr78904-9a.c: New test.
* gcc.target/i386/pr78904-9b.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 32535621db4..26cdb21d3c0 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3335,6 +3335,19 @@ (define_insn "*movstrict_xor"
(set_attr "mode" "")
(set_attr "length_immediate" "0")])
 
+(define_insn "*movstrictqi_ext_1"
+  [(set (strict_low_part
+ (match_operand:QI 0 "register_operand" "+Q"))
+ (subreg:QI
+   (match_operator:SWI248 2 "extract_operator"
+ [(match_operand 1 "int248_register_operand" "Q")
+  (const_int 8)
+  (const_int 8)]) 0))]
+  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
+  "mov{b}\t{%h1, %0|%0, %h1}"
+  [(set_attr "type" "imov")
+   (set_attr "mode" "QI")])
+
 (define_expand "extv"
   [(set (match_operand:SWI24 0 "register_operand")
(sign_extract:SWI24 (match_operand:SWI24 1 "register_operand")
@@ -6645,6 +6658,39 @@ (define_insn_and_split "*addqi_ext_1_slp"
   [(set_attr "type" "alu")
(set_attr "mode" "QI")])
 
+(define_insn_and_split "*addqi_ext_2_slp"
+  [(set (strict_low_part (match_operand:QI 0 "register_operand" "+&Q"))
+   (plus:QI
+ (subreg:QI
+   (match_operator:SWI248 3 "extract_operator"
+ [(match_operand 1 "int248_register_operand" "Q")
+  (const_int 8)
+  (const_int 8)]) 0)
+   (subreg:QI
+ (match_operator:SWI248 4 "extract_operator"
+   [(match_operand 2 "int248_register_operand" "Q")
+(const_int 8)
+(const_int 8)]) 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
+  "#"
+  "&& reload_completed"
+  [(set (strict_low_part (match_dup 0))
+   (subreg:QI
+ (match_op_dup 4
+   [(match_dup 2) (const_int 8) (const_int 8)]) 0))
+   (parallel
+ [(set (strict_low_part (match_dup 0))
+  (plus:QI
+(subreg:QI
+  (match_op_dup 3
+[(match_dup 1) (const_int 8) (const_int 8)]) 0)
+  (match_dup 0)))
+  (clobber (reg:CC FLAGS_REG))])]
+  ""
+  [(set_attr "type" "alu")
+   (set_attr "mode" "QI")])
+
 ;; Split non destructive adds if we cannot use lea.
 (define_split
   [(set (match_operand:SWI48 0 "register_operand")
@@ -7688,6 +7734,39 @@ (define_insn_and_split "*subqi_ext_1_slp"
   [(set_attr "type" "alu")
(set_attr "mode" "QI")])
 
+(define_insn_and_split "*subqi_ext_2_slp"
+  [(set (strict_low_part (match_operand:QI 0 "register_operand" "+&Q"))
+   (minus:QI
+ (subreg:QI
+   (match_operator:SWI248 3 "extract_operator"
+ [(match_operand 1 "int248_register_operand" "Q")
+  (const_int 8)
+  (const_int 8)]) 0)
+   (subreg:QI
+ (match_operator:SWI248 4 "extract_operator"
+   [(match_operand 2 "int248_register_operand" "Q")
+(const_int 8)
+(const_int 8)]) 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
+  "#"
+  "&& reload_completed"
+  [(set (strict_low_part (match_dup 0))
+   (subreg:QI
+ (match_op_dup 3
+   [(match_dup 1) (const_int 8) (const_int 8)]) 0))
+   (parallel
+ [(set (strict_low_part (match_dup 0))
+  (minus:QI
+  (match_dup 0)
+(subreg:QI
+  (match_op_dup 4
+[(match_dup 2) (const_int 8) (const_int 8)]) 0)))
+  (clobber (reg:CC FLAGS_REG))])]
+  ""
+  [(set_attr "type" "alu")
+   (

Re: building GNU gettext on AIX

2023-11-15 Thread David Edelsohn
On Wed, Nov 15, 2023 at 4:22 PM Bruno Haible  wrote:

> David Edelsohn wrote:
> > When I try to configure gettext-0.22.3, I receive the following error:
> >
> > checking for socklen_t equivalent... configure: error: Cannot find a type
> > to use in place of socklen_t
> >
> > configure: error:
> > /nasfarm/edelsohn/src/gettext-0.22.3/libtextstyle/configure failed for
> > libtextstyle
> >
> >
> > configure:43943: /nasfarm/edelsohn/install/GCC12/bin/gcc -c -g -O2
> > -D_THREAD_SAFE
> > conftest.c >&5
> >
> > conftest.c:112:18: error: two or more data types in declaration
> specifiers
> >
> >   112 | #define intmax_t long long
> >
> >   |  ^~~~
> >
> > conftest.c:112:23: error: two or more data types in declaration
> specifiers
> >
> >   112 | #define intmax_t long long
> >
> >   |   ^~~~
> >
> > In file included from conftest.c:212:
> >
> > conftest.c:214:24: error: conflicting types for 'ngetpeername'; have
> > 'int(int,  void *, long unsigned int *)'
> >
> >   214 |int getpeername (int, void *, unsigned long
> int
> > *);
> >
> >   |^~~
> >
> >
> /nasfarm/edelsohn/install/GCC12/lib/gcc/powerpc-ibm-aix7.2.5.0/12.1.1/include-fixed/sys/socket.h:647:9:
> > note: previous declaration of 'ngetpeername' with type 'int(int,  struct
> > sockaddr * restrict,  socklen_t * restrict)' {aka 'int(int,  struct
> > sockaddr * restrict,  long unsigned int * restrict)'}
> >
> >   647 | int getpeername(int, struct sockaddr *__restrict__, socklen_t
> > *__restrict__);
> >
> >   | ^~~
> >
> >
> > configure and config.h seems to get itself confused about types.
>
> There seem to be two problems, both related to the include files of
> your compiler:
>
>   - The configure test "checking for intmax_t..." must have found the
> answer "no". But on a modern system, <stdint.h> should be defining
> intmax_t already.
>
>   - This configure test that tries to find the getpeername declaration,
> but cannot find it (maybe because of the first problem?):
>
>
> 
>  for arg2 in "struct sockaddr" void; do
>for t in int size_t "unsigned int" "long int" "unsigned long
> int"; do
>  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> /* end confdefs.h.  */
> #include <sys/types.h>
>    #include <sys/socket.h>
>
>int getpeername (int, $arg2 *, $t *);
> int
> main (void)
> {
> $t len;
>   getpeername (0, 0, &len);
>   ;
>   return 0;
> }
> _ACEOF
> if ac_fn_c_try_compile "$LINENO"
> then :
>   gl_cv_socklen_t_equiv="$t"
> fi
>
> 
>
> I would concentrate on the first problem. If you don't get it fixed,
> then I'd suggest trying 'gcc' from the AIX Toolbox [1] or 'xlc' (as an
> IBM product) instead of 'gcc' (that looks like you built it yourself).
>
> Bruno
>
> [1]
> https://public.dhe.ibm.com/aix/freeSoftware/aixtoolbox/SPECS/gcc12-12.3.0-1.spec


Bruno,

I am using my own install of GCC for a reason.  The build of GCC works for
everything else, including bootstrap of GCC, GDB, GMP, etc.  The only
problem is gettext.

Thanks, David


Re: Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for rv32gcv.

2023-11-15 Thread 钟居哲
OK. Makes sense.
LGTM as long as you remove all
GET_MODE_BITSIZE (GET_MODE_INNER (mode)) <= GET_MODE_BITSIZE (Pmode)



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-11-16 04:30
To: 钟居哲; gcc-patches; palmer; kito.cheng; Jeff Law
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for 
rv32gcv.
On 11/15/23 15:29, 钟居哲 wrote:
> Could you show me the example ?
> 
> It's used for handling SEW = 64 on RV32. I don't know why this patch
> touches this code.
 
Use gather_load_run-1.c with the 64-bit index patterns disabled
on rv32.  We insert (mem:DI (reg:SI)) into a vector so use the
SEW = 64 demote handler.  There we set vl = vl * 2 (which is correct)
but the mode (i.e. vector) just changes from DI to SI while
keeping the number of elements the same.  Then we go from e.g.
V8DI to V8SI and slide down 16 elements, losing the lower
half.
 
Regards
Robin
 

