[PATCH] s390: Fix strict_low_part generation

2024-08-16 Thread Stefan Schulze Frielinghaus
In s390_expand_insv(), if generating code for ICM et al., src is a MEM
and gen_lowpart might force src into a register such that we end up with
patterns which do not match anymore.  Use adjust_address() instead in
order to preserve a MEM.

Furthermore, it is not straightforward to enforce a subreg.  For
example, in case of a paradoxical subreg, gen_lowpart() may return a
register.  To compensate for this, s390_gen_lowpart_subreg() emits
a reference to a pseudo which does not coincide with its definition,
which is wrong.  Additionally, if dest is a paradoxical subreg, then do
not try to emit a strict_low_part since it could mean that dest was not
initialized even though this might be fixed up later by init-regs.

Splitters for insns *get_tp_64, *zero_extendhisi2_31,
*zero_extendqisi2_31, *zero_extendqihi2_31 are applied after reload.
Thus, operands[0] is a hard register and gen_lowpart (m, operands[0])
just returns the hard register for mode m which is fine to use as an
argument for strict_low_part, i.e., we do not need to enforce subregs
here since after reload subregs are supposed to be eliminated anyway.

This fixes gcc.dg/torture/pr111821.c.
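
For readers less familiar with strict_low_part, the following is a minimal
C sketch of the semantics discussed above, not GCC code.  It assumes a
64-bit destination and a 32-bit low part on a big-endian target; the
function names model_strict_low_part_si and model_lowpart_offset are
hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not GCC code: the effect of
     (set (strict_low_part (subreg:SI (reg:DI dest) 4)) src)
   on a 64-bit register.  Only the low 32 bits are replaced, the
   upper 32 bits of DEST are preserved.  */
static uint64_t
model_strict_low_part_si (uint64_t dest, uint32_t src)
{
  return (dest & 0xffffffff00000000ULL) | (uint64_t) src;
}

/* Byte offset of the SMODE-sized low part inside a MODE-sized value
   stored in big-endian memory.  This mirrors the expression
   GET_MODE_SIZE (mode) - GET_MODE_SIZE (smode) passed to
   adjust_address() in the patch.  */
static unsigned
model_lowpart_offset (unsigned mode_size, unsigned smode_size)
{
  return mode_size - smode_size;
}
```

The second helper shows why a MEM source can stay a MEM: the low part is
simply the same memory at a larger byte offset.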

gcc/ChangeLog:

* config/s390/s390-protos.h (s390_gen_lowpart_subreg): Remove.
* config/s390/s390.cc (s390_gen_lowpart_subreg): Remove.
(s390_expand_insv): Use adjust_address() and emit a
strict_low_part only in case of a natural subreg.
* config/s390/s390.md: Use gen_lowpart() instead of
s390_gen_lowpart_subreg().
---
 Bootstrapped and regtested on s390.  Ok for mainline, gcc12, gcc13, gcc14?

 gcc/config/s390/s390-protos.h |  1 -
 gcc/config/s390/s390.cc   | 47 +++
 gcc/config/s390/s390.md   | 13 +-
 3 files changed, 20 insertions(+), 41 deletions(-)

diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index b4646ccb606..e7ac59d17da 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -50,7 +50,6 @@ extern void s390_set_has_landing_pad_p (bool);
 extern bool s390_hard_regno_rename_ok (unsigned int, unsigned int);
 extern int s390_class_max_nregs (enum reg_class, machine_mode);
 extern bool s390_return_addr_from_memory(void);
-extern rtx s390_gen_lowpart_subreg (machine_mode, rtx);
 extern bool s390_fma_allowed_p (machine_mode);
 #if S390_USE_TARGET_ATTRIBUTE
 extern tree s390_valid_target_attribute_tree (tree args,
diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 7aea776da2f..7cdcebfc08b 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -516,31 +516,6 @@ s390_return_addr_from_memory ()
   return cfun_gpr_save_slot(RETURN_REGNUM) == SAVE_SLOT_STACK;
 }
 
-/* Generate a SUBREG for the MODE lowpart of EXPR.
-
-   In contrast to gen_lowpart it will always return a SUBREG
-   expression.  This is useful to generate STRICT_LOW_PART
-   expressions.  */
-rtx
-s390_gen_lowpart_subreg (machine_mode mode, rtx expr)
-{
-  rtx lowpart = gen_lowpart (mode, expr);
-
-  /* There might be no SUBREG in case it could be applied to the hard
- REG rtx or it could be folded with a paradoxical subreg.  Bring
- it back.  */
-  if (!SUBREG_P (lowpart))
-{
-  machine_mode reg_mode = TARGET_ZARCH ? DImode : SImode;
-  gcc_assert (REG_P (lowpart));
-  lowpart = gen_lowpart_SUBREG (mode,
-   gen_rtx_REG (reg_mode,
-REGNO (lowpart)));
-}
-
-  return lowpart;
-}
-
 /* Return nonzero if it's OK to use fused multiply-add for MODE.  */
 bool
 s390_fma_allowed_p (machine_mode mode)
@@ -7112,15 +7087,21 @@ s390_expand_insv (rtx dest, rtx op1, rtx op2, rtx src)
   /* Emit a strict_low_part pattern if possible.  */
   if (smode_bsize == bitsize && bitpos == mode_bsize - smode_bsize)
{
- rtx low_dest = s390_gen_lowpart_subreg (smode, dest);
- rtx low_src = gen_lowpart (smode, src);
-
- switch (smode)
+ rtx low_dest = gen_lowpart (smode, dest);
+ if (SUBREG_P (low_dest) && !paradoxical_subreg_p (low_dest))
{
-   case E_QImode: emit_insn (gen_movstrictqi (low_dest, low_src)); 
return true;
-   case E_HImode: emit_insn (gen_movstricthi (low_dest, low_src)); 
return true;
-   case E_SImode: emit_insn (gen_movstrictsi (low_dest, low_src)); 
return true;
-   default: break;
+ poly_int64 offset = GET_MODE_SIZE (mode) - GET_MODE_SIZE (smode);
+ rtx low_src = adjust_address (src, smode, offset);
+ switch (smode)
+   {
+   case E_QImode: emit_insn (gen_movstrictqi (low_dest, low_src));
+  return true;
+   case E_HImode: emit_insn (gen_movstricthi (low_dest, low_src));
+  return true;
+   case E_SImode: emit_insn (gen_movstrictsi (low_dest, low_src));
+  retu

Re: [PATCH v3 0/5] aarch64: Fix intrinsic availability [PR112108]

2024-08-16 Thread Kyrylo Tkachov


> On 15 Aug 2024, at 18:48, Andrew Carlotti  wrote:
> 
> 
> On Thu, Aug 15, 2024 at 05:15:03PM +0100, Richard Sandiford wrote:
>> Andrew Carlotti  writes:
>>> This series of patches fixes issues with some intrinsics being incorrectly
>>> gated by global target options, instead of just using function-specific
>>> target options.  These issues have been present since the +tme, +memtag and
>>> +ls64 intrinsics were introduced.
>>> 
>>> Compared to the previous version, this series no longer adds feature checks
>>> to the intrinsic expanders, and fixes various formatting issues pointed out
>>> by Richard Sandiford.
>>> 
>>> Additionally, the series now refactors the checking of
>>> TARGET_GENERAL_REGS_ONLY in check_required_extensions.  This refactor is
>>> included as a new patch (1/5) to make the diffs more readable.
>>> 
>>> 
>>> Bootstrapped and regression tested on aarch64.  Ok to merge?
>> 
>> LGTM, thanks.  OK if there are no other comments before the weekend.
>> 
>>> Also, ok for backports to affected versions (with regression tests)?
>> 
>> Hmm, it seems a bit invasive.  And if the GCC 11 tag in the PR is
>> anything to go by, it sounds like this is already unfixable behaviour
>> in at least one release series.
> 
> I think the impact is minimal prior to FMV support, so backporting is less
> important for older versions.  The series should backport cleanly to GCC 14,
> but would have conflicts in earlier versions, so I think it would be sensible
> to backport to GCC 14 and not further.

I think backporting only to GCC 14 is sensible. To be honest, the intrinsics in
question target hardware that is shipping or will ship, and I don't expect that
hardware to be used with older compilers often enough to be worth the risk of
adjusting the patches for those branches.
Thanks,
Kyrill


> 
>> Let's see if anyone else has any opinions.
>> 
>> Richard



Re: Re: [PATCH] Re-add calling emit_clobber in lower-subreg.cc's resolve_simple_move.

2024-08-16 Thread 钟居哲
Sorry for the long silence on the subreg work.
I am working on it, but I recently got tied up in another project. I will be
back after I finish that project :)



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2024-08-15 05:55
To: Jeff Law; Xianmiao Qu; gcc-patches; roger; juzhe.zhong; richard.sandiford
Subject: Re: [PATCH] Re-add calling emit_clobber in lower-subreg.cc's resolve_simple_move.
 
 
On 8/14/24 3:53 PM, Richard Sandiford wrote:
 
> 
> FWIW, I think the work to add a df subreg liveness tracking problem
> and use it in LRA/IRA would solve the live range problem without needing
> a clobber.  I wonder how that's going?  In my last review I suggested
> a change in representation (a single bitmap rather than per-register
> bitmaps), but the general approach was good and very welcome.  Hope I
> didn't scupper the whole thing :(
I think Lehua may have left rivai, so I'm not sure there's someone 
working on it right now.
 
jeff
 


[PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]

2024-08-16 Thread 钟居哲
Hi, Zeng.
Thanks for fixing it.
LGTM from my side, but I am no expert on DWARF,
so I'd like Kito or Robin to review it as well.

Thanks.



juzhe.zh...@rivai.ai


[PATCH] s390: Fix TF to FPRX2 conversion [PR115860]

2024-08-16 Thread Stefan Schulze Frielinghaus
Currently subregs originating from *tf_to_fprx2_0 and *tf_to_fprx2_1
survive register allocation.  This in turn leads to wrong register
renaming.  Keeping the current approach would mean we need two insns for
*tf_to_fprx2_0 and *tf_to_fprx2_1, respectively.  Something along the
lines of

(define_insn "*tf_to_fprx2_0"
  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 0)
(unspec:DF [(match_operand:TF 1 "general_operand" "v")]
   UNSPEC_TF_TO_FPRX2_0))]
  "TARGET_VXE"
  "#")

(define_insn "*tf_to_fprx2_0"
  [(set (match_operand:DF 0 "nonimmediate_operand" "=f")
(unspec:DF [(match_operand:TF 1 "general_operand" "v")]
   UNSPEC_TF_TO_FPRX2_0))]
  "TARGET_VXE"
  "vpdi\t%v0,%v1,%v0,1"
  [(set_attr "op_type" "VRR")])

and similar for *tf_to_fprx2_1.  Note, pre register allocation operand 0
has mode FPRX2 and afterwards DF once subregs have been eliminated.

Since we always copy a whole vector register into a floating-point
register pair, another way to fix this is to merge *tf_to_fprx2_0 and
*tf_to_fprx2_1 into a single insn which means we don't have to use
subregs at all.  The downside of this is that the assembler template
now contains two instructions.  The upside is that we don't have to
come up with some artificial insn before RA, which might be more
readable/maintainable.  That is implemented by this patch.

In commit r11-4872-ge627cda5686592, the output operand specifier %V was
introduced, which is now used in tf_to_fprx2 only.  I didn't come up
with its counterpart like %F for floating-point registers.  Instead I
printed the register pair in the output function directly.  This spares
us a new and "rare" format specifier for a single insn.  I don't have a
strong opinion which option to choose; however, we should either add %F
in order to mimic the same behaviour as %V or get rid of %V and
inline the logic in the output function.  I lean towards the latter.
Any preferences?
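
To make the "print the register pair in the output function" option more
concrete, here is a small standalone C sketch of how such a two-load
template can be assembled with snprintf.  It is not the patch itself: the
helper name build_pair_load_template is hypothetical, and its second_reg
argument merely stands in for reg_names[REGNO (operands[0]) + 1].  The
register spelling "%f2" used in the note below is an assumption made for
the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build an output template that loads both halves
   of a register pair from memory operand 1.  SECOND_REG is the name of
   the second register of the pair.  */
static void
build_pair_load_template (char *buf, size_t size, const char *second_reg)
{
  /* In the format string "%%" yields a literal '%', so "%%f0" and "%%1"
     end up as the operand references "%f0" and "%1" in the template.
     The extra "%%" in front of "%s" escapes the register name once
     more for the later template expansion.  */
  snprintf (buf, size, "ld\t%%f0,%%1;ld\t%%%s,8+%%1", second_reg);
}
```

With second_reg set to "%f2" the buffer holds
"ld\t%f0,%1;ld\t%%f2,8+%1", i.e. the register name is emitted literally
while %f0 and %1 are still substituted later.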
---
 gcc/config/s390/s390.md|  2 +
 gcc/config/s390/vector.md  | 66 +++---
 gcc/testsuite/gcc.target/s390/pr115860-1.c | 26 +
 3 files changed, 60 insertions(+), 34 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/pr115860-1.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 3d5759d6252..31240899934 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -241,6 +241,8 @@
UNSPEC_VEC_VFMIN
UNSPEC_VEC_VFMAX
 
+   UNSPEC_TF_TO_FPRX2
+
UNSPEC_NNPA_VCLFNHS_V8HI
UNSPEC_NNPA_VCLFNLS_V8HI
UNSPEC_NNPA_VCRNFS_V8HI
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index a75b7cb5825..561182e0c2c 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -907,36 +907,36 @@
   "vmrlg\t%0,%1,%2";
   [(set_attr "op_type" "VRR")])
 
-
-(define_insn "*tf_to_fprx2_0"
-  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 0)
-   (subreg:DF (match_operand:TF1 "general_operand"   "v") 0))]
-  "TARGET_VXE"
-  ; M4 == 1 corresponds to %v0[0] = %v1[0]; %v0[1] = %v0[1];
-  "vpdi\t%v0,%v1,%v0,1"
-  [(set_attr "op_type" "VRR")])
-
-(define_insn "*tf_to_fprx2_1"
-  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 8)
-   (subreg:DF (match_operand:TF1 "general_operand"   "v") 8))]
+(define_insn "tf_to_fprx2"
+  [(set (match_operand:FPRX2 0 "register_operand" "=f,f ,f")
+   (unspec:FPRX2 [(match_operand:TF 1 "general_operand"   "v,AR,AT")]
+ UNSPEC_TF_TO_FPRX2))]
   "TARGET_VXE"
-  ; M4 == 5 corresponds to %V0[0] = %v1[1]; %V0[1] = %V0[1];
-  "vpdi\t%V0,%v1,%V0,5"
-  [(set_attr "op_type" "VRR")])
-
-(define_insn_and_split "tf_to_fprx2"
-  [(set (match_operand:FPRX20 "nonimmediate_operand" "=f,f")
-   (subreg:FPRX2 (match_operand:TF 1 "general_operand"   "v,AR") 0))]
-  "TARGET_VXE"
-  "#"
-  "!(MEM_P (operands[1]) && MEM_VOLATILE_P (operands[1]))"
-  [(set (match_dup 2) (match_dup 3))
-   (set (match_dup 4) (match_dup 5))]
 {
-  operands[2] = simplify_gen_subreg (DFmode, operands[0], FPRX2mode, 0);
-  operands[3] = simplify_gen_subreg (DFmode, operands[1], TFmode, 0);
-  operands[4] = simplify_gen_subreg (DFmode, operands[0], FPRX2mode, 8);
-  operands[5] = simplify_gen_subreg (DFmode, operands[1], TFmode, 8);
+  char buf[64];
+  switch (which_alternative)
+{
+case 0:
+  if (REGNO (operands[0]) == REGNO (operands[1]))
+   return "vpdi\t%V0,%v1,%V0,5";
+  else
+   return "ldr\t%f0,%f1;vpdi\t%V0,%v1,%V0,5";
+case 1:
+  {
+   const char *reg_pair = reg_names[REGNO (operands[0]) + 1];
+   snprintf (buf, sizeof (buf), "ld\t%%f0,%%1;ld\t%%%s,8+%%1", reg_pair);
+   output_asm_insn (buf, operands);
+   return "";
+  }
+case 2:
+  {
+   const char *reg_pair = reg_names[REGNO (operands[0]) + 1];
+   snprintf (buf, sizeof (buf), "ldy\t%%f0,%%1;ldy\t%%%s,8+%%1", reg_pair);
+ 

Re: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]

2024-08-16 Thread Kito Cheng
LGTM, thanks for fixing that :)

On Wed, Aug 14, 2024 at 2:06 PM 曾治金  wrote:
>
> This patch fixes the bug (BugId: 116305) introduced by commit
> bd93ef for the RISC-V target.
>
> Commit bd93ef changes chunk_num from 1 to TARGET_MIN_VLEN/128
> if TARGET_MIN_VLEN is larger than 128 in riscv_convert_vector_bits, so
> it changes the value of BYTES_PER_RISCV_VECTOR. For example, before
> commit bd93ef and with TARGET_MIN_VLEN equal to 256, the value
> of BYTES_PER_RISCV_VECTOR was [8, 8], but now it is [16, 16]. The values
> of riscv_bytes_per_vector_chunk and BYTES_PER_RISCV_VECTOR are no longer
> equal.
>
> The prologue uses BYTES_PER_RISCV_VECTOR.coeffs[1] to estimate the vlenb
> register value in riscv_legitimize_poly_move, and dwarf2cfi also
> gets the estimated vlenb register value in riscv_dwarf_poly_indeterminate_value
> to calculate the factor by which to multiply the vlenb register value.
>
> So we need to change the factor from riscv_bytes_per_vector_chunk to
> BYTES_PER_RISCV_VECTOR, otherwise we get incorrect DWARF
> information. An incorrect example follows:
>
> ```
> csrr t0,vlenb
> slli t1,t0,1
> sub sp,sp,t1
>
> .cfi_escape 0xf,0xb,0x72,0,0x92,0xa2,0x38,0,0x34,0x1e,0x23,0x50,0x22
> ```
>
> The sequence '0x92,0xa2,0x38,0' means the vlenb register, '0x34' means
> the literal 4, and '0x1e' means the multiply operation. But in fact, the
> vlenb register value only needs to be multiplied by the literal 2.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_dwarf_poly_indeterminate_value):
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/scalable_vector_cfi.c: New test.
>
> Signed-off-by: Zhijin Zeng 
> ---
>  gcc/config/riscv/riscv.cc |  4 +--
>  .../riscv/rvv/base/scalable_vector_cfi.c  | 32 +++
>  2 files changed, 34 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 5fe4273beb7..e740fc159dd 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -10773,12 +10773,12 @@ static unsigned int
>  riscv_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor,
>   int *offset)
>  {
> -  /* Polynomial invariant 1 == (VLENB / riscv_bytes_per_vector_chunk) - 1.
> +  /* Polynomial invariant 1 == (VLENB / BYTES_PER_RISCV_VECTOR) - 1.
>   1. TARGET_MIN_VLEN == 32, polynomial invariant 1 == (VLENB / 4) - 1.
>   2. TARGET_MIN_VLEN > 32, polynomial invariant 1 == (VLENB / 8) - 1.
>*/
>gcc_assert (i == 1);
> -  *factor = riscv_bytes_per_vector_chunk;
> +  *factor = BYTES_PER_RISCV_VECTOR.coeffs[1];
>*offset = 1;
>return RISCV_DWARF_VLENB;
>  }
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c b/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c
> new file mode 100644
> index 000..184da10caf3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c
> @@ -0,0 +1,32 @@
> +/* { dg-do compile } */
> +/* { dg-options "-g -O3 -march=rv64gcv -mabi=lp64d" } */
> +/* { dg-skip-if "" { *-*-* } {"-O2" "-O1" "-O0" "-Og" "-Oz" "-flto"} } */
> +/* { dg-final { scan-assembler {cfi_escape .*0x92,0xa2,0x38,0,0x32,0x1e} } } */
> +
> +#include "riscv_vector.h"
> +
> +#define PI_2 1.570796326795
> +
> +extern void func(float *result);
> +
> +void test(const float *ys, const float *xs, float *result, size_t length) {
> +size_t gvl = __riscv_vsetvlmax_e32m2();
> +vfloat32m2_t vpi2 = __riscv_vfmv_v_f_f32m2(PI_2, gvl);
> +
> +for(size_t i = 0; i < length;) {
> +gvl = __riscv_vsetvl_e32m2(length - i);
> +vfloat32m2_t y = __riscv_vle32_v_f32m2(ys, gvl);
> +vfloat32m2_t x = __riscv_vle32_v_f32m2(xs, gvl);
> +vbool16_t mask0  = __riscv_vmflt_vv_f32m2_b16(x, y, gvl);
> +vfloat32m2_t fixpi = __riscv_vfrsub_vf_f32m2_mu(mask0, vpi2, vpi2, 0, gvl);
> +
> +__riscv_vse32_v_f32m2(result, fixpi, gvl);
> +
> +func(result);
> +
> +i += gvl;
> +ys += gvl;
> +xs += gvl;
> +result += gvl;
> +}
> +}
> --
> 2.34.1
>
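
As a reading aid for the .cfi_escape bytes quoted above, here is a small
standalone C sketch that decodes the two pieces the explanation relies on:
the ULEB128 register number following DW_OP_bregx (0x92) and the value of
a DW_OP_lit<n> opcode.  The helper names decode_uleb128 and
dw_op_lit_value are made up for this example; only the DWARF encodings
themselves (ULEB128, DW_OP_lit0 at 0x30) are taken as given.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one unsigned LEB128 value starting at P and store the number
   of bytes consumed in *LEN.  */
static uint64_t
decode_uleb128 (const uint8_t *p, size_t *len)
{
  uint64_t value = 0;
  unsigned shift = 0;
  size_t i = 0;
  uint8_t byte;

  do
    {
      byte = p[i++];
      value |= (uint64_t) (byte & 0x7f) << shift;
      shift += 7;
    }
  while (byte & 0x80);

  *len = i;
  return value;
}

/* DW_OP_lit0 .. DW_OP_lit31 are the opcodes 0x30 .. 0x4f.  Return the
   literal encoded by OPCODE, or -1 if it is not a DW_OP_lit<n>.  */
static int
dw_op_lit_value (uint8_t opcode)
{
  return (opcode >= 0x30 && opcode <= 0x4f) ? opcode - 0x30 : -1;
}
```

Applied to the bytes 0xa2,0x38 this gives register number 7202, and 0x34
versus 0x32 decode to the literals 4 and 2, which is the difference
between the incorrect sequence and the one the new test scans for.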

Re: [PATCH] libcpp, c-family, v3: Add (dumb) C23 N3017 #embed support [PR105863]

2024-08-16 Thread Jakub Jelinek
On Fri, Aug 16, 2024 at 01:43:58AM +0200, Jakub Jelinek wrote:
> My reading of it wasn't that whether it is
> # embed < h-char-sequence > embed-parameter-sequence[opt] new-line
> or
> # embed " q-char-sequence " embed-parameter-sequence[opt] new-line
> or
> # embed pp-tokens new-line
> depends solely on the filename part in there, but also whether
> embed-parameter-sequence is syntactically valid (if specified).

But if so, doesn't that mean that also
#define foo bar
#define bar baz
#define limit suffix (1
#embed  limit )
should be treated as
#embed  suffix (1)
?
I'd think that for filenames that would be quite surprising.

I think what is implemented by the 3 implementations makes more sense,
is easier to implement and understand, but it would mean a DR.

Make just the filename part macro unexpanded or expanded based on whether
it matches
#embed < h-char-sequence > pp-tokens[opt] new-line
or
#embed " q-char-sequence " pp-tokens[opt] new-line
and if not use
#embed pp-tokens new-line
and say that the pp-tokens in the first two forms are macro expanded and
should be valid embed-parameter-sequence[opt] after macro expansion.

Because what is strictly specified right now means essentially lexing the
line (or more if \ is used at end of lines) twice, once with disabled macro
expansion, silently see if it matches the grammar and then depending on that
lex it again with or without macro expansion.  Plus really unsure what to do
about the limit argument if it is balanced before expansion and not balanced
after it and whether to macro expand prefix/suffix/if_empty argument at all
and when if yes.  If prefix/suffix/if_empty is macro unexpanded, it would be
a loop-hole how to introduce tokens without macro expansion into the source
(other than temporarily #undefining macros or using (foo) (args) to avoid
function-like macro expansion).

One could simply
#include 
#embed  if_empty (__linux__ = S_ISDIR (1, 2, 3, 4))
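
For reference, the "(foo) (args)" trick mentioned above can be shown in
plain C without #embed.  This is only a sketch of that one preprocessor
rule; the names twice and its function counterpart are invented for the
example.

```c
#include <assert.h>

/* A function-like macro and a real function sharing one name.  */
#define twice(x) ((x) * 2)

/* Here the name is followed by ')' rather than '(', so the
   function-like macro is not expanded and this defines an ordinary
   function called twice.  It deliberately returns a different value
   so the two can be told apart.  */
static int
(twice) (int x)
{
  return x + x + 1;
}
```

A call written as twice (3) goes through the macro, while (twice) (3)
reaches the function, since a function-like macro is only expanded when
its name is directly followed by an opening parenthesis.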

Jakub



[PATCH-1] Builtins: Fold isinf on IBM long double to isinf on high-order double [PR97786]

2024-08-16 Thread HAO CHEN GUI
Hi,
  This patch folds builtin_isinf on IBM long double to builtin_isinf on
its high-order double.

  The isinf_optab was already implemented by this patch:
https://gcc.gnu.org/g:53945be1efb502f235d84ff67ceafe4a764b6e1c

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
Builtins: Fold isinf on IBM long double to isinf on high-order double

For IBM long double, Inf is encoded in the high-order double value only.
So the builtin_isinf on IBM long double can be folded to builtin_isinf on
double type.  As a former patch implemented the DFmode isinf_optab, this
patch converts builtin_isinf on IBM long double to builtin_isinf on double
type if the DFmode isinf_optab exists.
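
The idea can be sketched in standalone C as follows.  This is a model for
illustration only, assuming the usual double-double layout where the value
is the sum of a high-order and a low-order double; struct ibm_long_double
and model_isinf are hypothetical names, not part of GCC or the patch.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical model of an IBM long double: the value is hi + lo.  */
struct ibm_long_double
{
  double hi;  /* High-order double; carries Inf according to the text.  */
  double lo;  /* Low-order double.  */
};

/* Model of the folded check: only the high-order double is inspected.
   The result is normalized to 0 or 1.  */
static int
model_isinf (struct ibm_long_double x)
{
  return isinf (x.hi) != 0;
}
```

The low-order part never enters the test, which is why a DFmode isinf
pattern is enough.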

gcc/
PR target/97786
* builtins.cc (fold_builtin_interclass_mathfn): Fold isinf on IBM
long double to isinf on high-order double when DFmode isinf_optab
exists.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-3.c: New test.

patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 0b902896ddd..c20bd7b5f31 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -9568,6 +9568,15 @@ fold_builtin_interclass_mathfn (location_t loc, tree fndecl, tree arg)
type = double_type_node;
mode = DFmode;
arg = fold_build1_loc (loc, NOP_EXPR, type, arg);
+
+   /* If isinf icode exists, build the call with high-order
+  double value only.  */
+   tree const isinf_fn = builtin_decl_explicit (BUILT_IN_ISINF);
+   if (interclass_mathfn_icode (arg, isinf_fn) != CODE_FOR_nothing)
+ {
+   result = build_call_expr (isinf_fn, 1, arg);
+   return result;
+ }
  }
get_max_float (REAL_MODE_FORMAT (mode), buf, sizeof (buf), false);
real_from_string (&r, buf);
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-3.c b/gcc/testsuite/gcc.target/powerpc/pr97786-3.c
new file mode 100644
index 000..6a8d9f2df53
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx -mabi=ibmlongdouble -Wno-psabi" } */
+/* { dg-require-effective-target powerpc_vsx } */
+
+int test1 (long double x)
+{
+  return __builtin_isinf (x);
+}
+
+int test2 (long double x)
+{
+  return __builtin_isinfl (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmpu\M} } } */
+/* { dg-final { scan-assembler-times {\mxststdcdp\M} 2 } } */


[PATCH-2] Builtins: Fold isfinite on IBM long double to isfinite on high-order double [PR97786]

2024-08-16 Thread HAO CHEN GUI
Hi,
  This patch folds builtin_isfinite on IBM long double to builtin_isfinite
on its high-order double.

  The isfinite_optab was already implemented by this patch:
https://gcc.gnu.org/g:44eb45c2ef7192eb6a811fd46fcb2c7fbeb6f865

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
Builtins: Fold isfinite on IBM long double to isfinite on high-order double

For IBM long double, INF and NAN are encoded in the high-order double value
only.  So the builtin_isfinite on IBM long double can be folded to
builtin_isfinite on double type.  As a former patch implemented the DFmode
isfinite_optab, this patch converts builtin_isfinite on IBM long double to
builtin_isfinite on double type if the DFmode isfinite_optab exists.

gcc/
PR target/97786
* builtins.cc (fold_builtin_interclass_mathfn): Fold isfinite on
IBM long double to isfinite on high-order double when DFmode
isfinite_optab exists.

gcc/testsuite/
PR target/97786
* gcc.target/powerpc/pr97786-6.c: New test.

patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index c20bd7b5f31..c3d8ce5313a 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -9601,6 +9601,15 @@ fold_builtin_interclass_mathfn (location_t loc, tree fndecl, tree arg)
type = double_type_node;
mode = DFmode;
arg = fold_build1_loc (loc, NOP_EXPR, type, arg);
+
+   /* If isfinite icode exists, build the call with high-order
+  double value only.  */
+   tree const isfinite_fn = builtin_decl_explicit (BUILT_IN_ISFINITE);
+   if (interclass_mathfn_icode (arg, isfinite_fn) != CODE_FOR_nothing)
+ {
+   result = build_call_expr (isfinite_fn, 1, arg);
+   return result;
+ }
  }
get_max_float (REAL_MODE_FORMAT (mode), buf, sizeof (buf), false);
real_from_string (&r, buf);
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-6.c b/gcc/testsuite/gcc.target/powerpc/pr97786-6.c
new file mode 100644
index 000..1b25bcecd5a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97786-6.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ppc_float128_hw } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx -mabi=ibmlongdouble -Wno-psabi" } */
+/* { dg-require-effective-target powerpc_vsx } */
+
+int test1 (long double x)
+{
+  return __builtin_isfinite (x);
+}
+
+/* { dg-final { scan-assembler-not {\mfcmpu\M} } } */
+/* { dg-final { scan-assembler {\mxststdcdp\M} } } */


Re: [Ping x 3, Patch, Fortran, PR84244, v3] Fix ICE in recompute_tree_invariant_for_addr_expr, at tree.c:4535

2024-08-16 Thread Andre Vehreschild
Hi all,

Anyone for a review? This patch is over a month old and is starting to rot.

Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?

- Andre

On Fri, 9 Aug 2024 16:27:42 +0200
Andre Vehreschild  wrote:

> Ping!
>
> On Wed, 17 Jul 2024 15:11:33 +0200
> Andre Vehreschild  wrote:
>
> > Hi all,
> >
> > and the last ping.
> >
> > Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?
> >
> > Regards,
> > Andre
> >
> > On Thu, 11 Jul 2024 16:05:09 +0200
> > Andre Vehreschild  wrote:
> >
> > > Hi all,
> > >
> > > > the attached patch fixes a segfault in the compiler, where for pointer
> > > > components of a derived type the caf_token in the component was not
> > > > set, when the derived type was previously used outside of a coarray.
> > >
> > > Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?
> > >
> > > Regards,
> > >   Andre
> >
> >
> > --
> > Andre Vehreschild * Email: vehre ad gmx dot de
>
>
> --
> Andre Vehreschild * Email: vehre ad gmx dot de


--
Andre Vehreschild * Email: vehre ad gmx dot de
From 6cb0fc042ec3121b58c1e04b86c9a5c24ca581b1 Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Thu, 11 Jul 2024 15:44:56 +0200
Subject: [PATCH] [Fortran] Fix ICE in recompute_tree_invariant_for_addr_expr,
 at tree.c:4535 [PR84244]

Declaring an unused function with a derived type having a pointer
component and using that derived type as a coarray led the compiler to
ICE because the caf_token for the pointer was not linked into the
component correctly.

	PR fortran/84244

gcc/fortran/ChangeLog:

	* trans-types.cc (gfc_get_derived_type): When a caf_sub_token is
	generated for a component, link it to the component it is
	generated for (the previous one).

gcc/testsuite/ChangeLog:

	* gfortran.dg/coarray/ptr_comp_5.f08: New test.
---
 gcc/fortran/trans-types.cc|  6 +-
 .../gfortran.dg/coarray/ptr_comp_5.f08| 19 +++
 2 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/coarray/ptr_comp_5.f08

diff --git a/gcc/fortran/trans-types.cc b/gcc/fortran/trans-types.cc
index e6da8e1a58b..bc582085f57 100644
--- a/gcc/fortran/trans-types.cc
+++ b/gcc/fortran/trans-types.cc
@@ -2661,7 +2661,7 @@ gfc_get_derived_type (gfc_symbol * derived, int codimen)
   tree *chain = NULL;
   bool got_canonical = false;
   bool unlimited_entity = false;
-  gfc_component *c;
+  gfc_component *c, *last_c = nullptr;
   gfc_namespace *ns;
   tree tmp;
   bool coarray_flag, class_coarray_flag;
@@ -2961,10 +2961,14 @@ gfc_get_derived_type (gfc_symbol * derived, int codimen)
 	 types.  */
   if (class_coarray_flag || !c->backend_decl)
 	c->backend_decl = field;
+  if (c->attr.caf_token && last_c)
+	last_c->caf_token = field;

   if (c->attr.pointer && (c->attr.dimension || c->attr.codimension)
 	  && !(c->ts.type == BT_DERIVED && strcmp (c->name, "_data") == 0))
 	GFC_DECL_PTR_ARRAY_P (c->backend_decl) = 1;
+
+  last_c = c;
 }

   /* Now lay out the derived type, including the fields.  */
diff --git a/gcc/testsuite/gfortran.dg/coarray/ptr_comp_5.f08 b/gcc/testsuite/gfortran.dg/coarray/ptr_comp_5.f08
new file mode 100644
index 000..ed3a8db13fa
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/coarray/ptr_comp_5.f08
@@ -0,0 +1,19 @@
+! { dg-do compile }
+
+! Check PR84244 does not ICE anymore.
+
+program ptr_comp_5
+  integer, target :: dest = 42
+  type t
+integer, pointer :: p
+  end type
+  type(t) :: o[*]
+
+  o%p => dest
+contains
+  ! This unused routine is crucial for the ICE.
+  function f(x)
+type(t), intent(in) ::x
+  end function
+end program
+
--
2.45.2



[PATCH] testsuite: Add -fwrapv to signbit-5.c

2024-08-16 Thread Torbjörn SVENSSON
Ok for trunk and releases/gcc-14?

Verified this on x86_64 and arm-none-eabi.
Don't know if the other "truth type" dg-lines can be removed as well.

--

On Cortex-M55 with MVE, the test case fails due to -INT_MAX being
undefined. Adding -fwrapv solves the issue.

Regtested on x86_64-pc-linux and arm-none-eabi for
Cortex-M0/M3/M4/M7/M33/M55/M85/A7.

gcc/testsuite/ChangeLog:

* gcc.dg/signbit-5.c: Add -fwrapv and remove x86 exception.

Signed-off-by: Torbjörn SVENSSON 
Co-authored-by: Yvan ROUX 
---
 gcc/testsuite/gcc.dg/signbit-5.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/signbit-5.c b/gcc/testsuite/gcc.dg/signbit-5.c
index 1e1b237a0e0..2bca640f930 100644
--- a/gcc/testsuite/gcc.dg/signbit-5.c
+++ b/gcc/testsuite/gcc.dg/signbit-5.c
@@ -1,8 +1,7 @@
 /* { dg-do run } */
-/* { dg-options "-O3" } */
+/* { dg-options "-O3 -fwrapv" } */
 
 /* This test does not work when the truth type does not match vector type.  */
-/* { dg-additional-options "-mno-avx512f" { target { i?86-*-* x86_64-*-* } } } */
 /* { dg-additional-options "-march=armv8-a" { target aarch64_sve } } */
 /* { dg-xfail-run-if "truth type does not match vector type" { amdgcn-*-* } } */
 /* { dg-xfail-run-if "truth type does not match vector type" { riscv_v } } */
-- 
2.25.1



[PATCH v2 02/10] fortran: Disable frontend passes for inlinable MINLOC/MAXLOC [PR90608]

2024-08-16 Thread Mikael Morin
From: Mikael Morin 

Regression-tested on x86_64-pc-linux-gnu.
OK for master?

-- >8 --

Disable rewriting of MINLOC/MAXLOC expressions for which inline code
generation is supported.  Update the existing
gfc_inline_intrinsic_function_p predicate accordingly, reflecting the
current state of MINLOC/MAXLOC inlining support, that is, only the cases
of a scalar result and non-CHARACTER argument for now.

This change has no effect currently, as the MINLOC/MAXLOC front-end passes
only change expressions of rank 1, but the inlining control predicate
gfc_inline_intrinsic_function_p returns false for those.  However, later
changes will extend MINLOC/MAXLOC inline expansion support to array
expressions and update the inlining control predicate, and this will become
effective.
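
As a reading aid, the condition added to gfc_inline_intrinsic_function_p
in the trans-intrinsic.cc hunk of this patch can be modelled in
standalone C roughly as follows.  This is a sketch only: enum model_type,
struct model_minmaxloc_call and model_inline_minmaxloc_p are hypothetical
stand-ins for the front-end data structures, and the rank/DIM test simply
mirrors what the hunk checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the front-end types, for illustration.  */
enum model_type { MODEL_INTEGER, MODEL_REAL, MODEL_CHARACTER };

struct model_minmaxloc_call
{
  enum model_type array_type;  /* Type of the ARRAY argument.  */
  int array_rank;              /* Rank of the ARRAY argument.  */
  bool has_dim;                /* Whether a DIM argument is present.  */
};

/* Model of the MINLOC/MAXLOC case of the inlining predicate.  */
static bool
model_inline_minmaxloc_p (const struct model_minmaxloc_call *call,
                          bool optimize_size)
{
  /* No inline expansion if code size matters.  */
  if (optimize_size)
    return false;

  /* Only INTEGER and REAL arrays are handled inline.  */
  if (call->array_type != MODEL_INTEGER && call->array_type != MODEL_REAL)
    return false;

  /* Rank-1 ARRAY with DIM present gives a scalar result.  */
  return call->array_rank == 1 && call->has_dim;
}
```

Since optimize_minmaxloc only rewrites rank 1 expressions and returns
early when this predicate is true, the two checks stay consistent as the
predicate is extended by later patches.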

PR fortran/90608

gcc/fortran/ChangeLog:

* frontend-passes.cc (optimize_minmaxloc): Skip if we can generate
inline code for the unmodified expression.
* trans-intrinsic.cc (gfc_inline_intrinsic_function_p): Add
MINLOC and MAXLOC cases.
---
 gcc/fortran/frontend-passes.cc |  3 ++-
 gcc/fortran/trans-intrinsic.cc | 23 +++
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/frontend-passes.cc b/gcc/fortran/frontend-passes.cc
index 3c06018fdbb..8e4c6310ba8 100644
--- a/gcc/fortran/frontend-passes.cc
+++ b/gcc/fortran/frontend-passes.cc
@@ -2277,7 +2277,8 @@ optimize_minmaxloc (gfc_expr **e)
   || fn->value.function.actual == NULL
   || fn->value.function.actual->expr == NULL
   || fn->value.function.actual->expr->ts.type == BT_CHARACTER
-  || fn->value.function.actual->expr->rank != 1)
+  || fn->value.function.actual->expr->rank != 1
+  || gfc_inline_intrinsic_function_p (fn))
 return;
 
   *e = gfc_get_array_expr (fn->ts.type, fn->ts.kind, &fn->where);
diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index 84a378ef310..2c8512060cc 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -11652,6 +11652,29 @@ gfc_inline_intrinsic_function_p (gfc_expr *expr)
 case GFC_ISYM_TRANSPOSE:
   return true;
 
+case GFC_ISYM_MINLOC:
+case GFC_ISYM_MAXLOC:
+  {
+   /* Disable inline expansion if code size matters.  */
+   if (optimize_size)
+ return false;
+
+   gfc_actual_arglist *array_arg = expr->value.function.actual;
+   gfc_actual_arglist *dim_arg = array_arg->next;
+
+   gfc_expr *array = array_arg->expr;
+   gfc_expr *dim = dim_arg->expr;
+
+   if (!(array->ts.type == BT_INTEGER
+ || array->ts.type == BT_REAL))
+ return false;
+
+   if (array->rank == 1 && dim != nullptr)
+ return true;
+
+   return false;
+  }
+
 default:
   return false;
 }
-- 
2.43.0



[PATCH v2 06/10] fortran: Inline integral MINLOC/MAXLOC with no DIM and no MASK [PR90608]

2024-08-16 Thread Mikael Morin
From: Mikael Morin 

Regression-tested on x86_64-pc-linux-gnu.
OK for master?

-- >8 --

Enable generation of inline code for the MINLOC and MAXLOC intrinsics,
if the ARRAY argument is of integral type and of any rank (only the rank 1
case was previously inlined), and neither DIM nor MASK arguments are
present.

This needs a few adjustments in gfc_conv_intrinsic_minmaxloc,
mainly to replace the single variables POS and OFFSET with collections
of variables, one variable per dimension each.

The restriction to integral ARRAY and absent MASK limits the scope of
the change to the cases where we generate single loop inline code.  The
code generation for the second loop is only accessible with ARRAY of rank
1, so it can continue using a single variable.  A later change will extend
inlining to the double loop cases.

There is some bounds checking code that was previously handled by the
library, and that needed some changes in the scalarizer to avoid regressing.
The bounds check code generation was already supported by the scalarizer,
but it was only applying to array reference sections, checking both
for array bound violation and for shape conformability between all the
involved arrays.  With this change, for MINLOC or MAXLOC, enable the
conformability check between all the scalarized arrays, and disable the
array bound violation check.

PR fortran/90608

gcc/fortran/ChangeLog:

* trans-array.cc (gfc_conv_ss_startstride): Set the MINLOC/MAXLOC
result upper bound using the rank of the ARRAY argument.  Adjust
the error message for intrinsic result arrays.  Only check array
bounds for array references.  Move bound check decision code...
(bounds_check_needed): ... here as a new predicate.  Allow bound
check for MINLOC/MAXLOC intrinsic results.
* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Change the
result array upper bound to the rank of ARRAY.  Update the NONEMPTY
variable to depend on the non-empty extent of every dimension.  Use
one variable per dimension instead of a single variable for the
position and the offset.  Update their declaration, initialization,
and update to affect the variable of each dimension.  Use the first
variable only in areas only accessed with rank 1 ARRAY argument.
Set every element of the result using its corresponding variable.
(gfc_inline_intrinsic_function_p): Return true for integral ARRAY
and absent DIM and MASK.

gcc/testsuite/ChangeLog:

* gfortran.dg/maxloc_bounds_4.f90: Additionally accept the error
message emitted by the scalarizer.
---
 gcc/fortran/trans-array.cc|  70 ++--
 gcc/fortran/trans-intrinsic.cc| 150 +-
 gcc/testsuite/gfortran.dg/maxloc_bounds_4.f90 |   4 +-
 3 files changed, 166 insertions(+), 58 deletions(-)

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index e578b676fcc..1190bfa6c02 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -4956,6 +4956,35 @@ add_check_section_in_array_bounds (stmtblock_t *inner, gfc_ss_info *ss_info,
 }
 
 
+/* Tells whether we need to generate bounds checking code for the array
+   associated with SS.  */
+
+bool
+bounds_check_needed (gfc_ss *ss)
+{
+  /* Catch allocatable lhs in f2003.  */
+  if (flag_realloc_lhs && ss->no_bounds_check)
+return false;
+
+  gfc_ss_info *ss_info = ss->info;
+  if (ss_info->type == GFC_SS_SECTION)
+return true;
+
+  if (!(ss_info->type == GFC_SS_INTRINSIC
+   && ss_info->expr
+   && ss_info->expr->expr_type == EXPR_FUNCTION))
+return false;
+
+  gfc_intrinsic_sym *isym = ss_info->expr->value.function.isym;
+  if (!(isym
+   && (isym->id == GFC_ISYM_MAXLOC
+   || isym->id == GFC_ISYM_MINLOC)))
+return false;
+
+  return gfc_inline_intrinsic_function_p (ss_info->expr);
+}
+
+
 /* Calculates the range start and stride for a SS chain.  Also gets the
descriptor and data pointer.  The range of vector subscripts is the size
of the vector.  Array bounds are also checked.  */
@@ -5057,10 +5086,17 @@ done:
info->data = gfc_conv_array_data (info->descriptor);
info->data = gfc_evaluate_now (info->data, &outer_loop->pre);
 
-   info->offset = gfc_index_zero_node;
+   gfc_expr *array = expr->value.function.actual->expr;
+   tree rank = build_int_cst (gfc_array_index_type, array->rank);
+
+   tree tmp = fold_build2_loc (input_location, MINUS_EXPR,
+   gfc_array_index_type, rank,
+   gfc_index_one_node);
+
+   info->end[0] = gfc_evaluate_now (tmp, &outer_loop->pre);
info->start[0] = gfc_index_zero_node;
-   info->end[0] = gfc_index_zero_node;
info->stride[0] = gfc_index_one_node;
+   info->offse

[PATCH v2 05/10] fortran: Outline array bound check generation code

2024-08-16 Thread Mikael Morin
From: Mikael Morin 

The next patch will need reindenting of the array bound check generation
code.  This outlines it to its own function beforehand, reducing the churn
in the next patch.

Regression-tested on x86_64-pc-linux-gnu.
OK for master?

-- >8 --

gcc/fortran/ChangeLog:

* trans-array.cc (gfc_conv_ss_startstride): Move array bound check
generation code...
(add_check_section_in_array_bounds): ... here as a new function.
---
 gcc/fortran/trans-array.cc | 297 ++---
 1 file changed, 143 insertions(+), 154 deletions(-)

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 46e2152d0f0..e578b676fcc 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -4816,6 +4816,146 @@ gfc_conv_section_startstride (stmtblock_t * block, gfc_ss * ss, int dim)
 }
 
 
+/* Generate in INNER the bounds checking code along the dimension DIM for
+   the array associated with SS_INFO.  */
+
+static void
+add_check_section_in_array_bounds (stmtblock_t *inner, gfc_ss_info *ss_info,
+  int dim)
+{
+  gfc_expr *expr = ss_info->expr;
+  locus *expr_loc = &expr->where;
+  const char *expr_name = expr->symtree->name;
+
+  gfc_array_info *info = &ss_info->data.array;
+
+  bool check_upper;
+  if (dim == info->ref->u.ar.dimen - 1
+  && info->ref->u.ar.as->type == AS_ASSUMED_SIZE)
+check_upper = false;
+  else
+check_upper = true;
+
+  /* Zero stride is not allowed.  */
+  tree tmp = fold_build2_loc (input_location, EQ_EXPR, logical_type_node,
+ info->stride[dim], gfc_index_zero_node);
+  char * msg = xasprintf ("Zero stride is not allowed, for dimension %d "
+ "of array '%s'", dim + 1, expr_name);
+  gfc_trans_runtime_check (true, false, tmp, inner, expr_loc, msg);
+  free (msg);
+
+  tree desc = info->descriptor;
+
+  /* This is the run-time equivalent of resolve.cc's
+ check_dimension.  The logical is more readable there
+ than it is here, with all the trees.  */
+  tree lbound = gfc_conv_array_lbound (desc, dim);
+  tree end = info->end[dim];
+  tree ubound = check_upper ? gfc_conv_array_ubound (desc, dim) : NULL_TREE;
+
+  /* non_zerosized is true when the selected range is not
+ empty.  */
+  tree stride_pos = fold_build2_loc (input_location, GT_EXPR, logical_type_node,
+info->stride[dim], gfc_index_zero_node);
+  tmp = fold_build2_loc (input_location, LE_EXPR, logical_type_node,
+info->start[dim], end);
+  stride_pos = fold_build2_loc (input_location, TRUTH_AND_EXPR,
+   logical_type_node, stride_pos, tmp);
+
+  tree stride_neg = fold_build2_loc (input_location, LT_EXPR, logical_type_node,
+info->stride[dim], gfc_index_zero_node);
+  tmp = fold_build2_loc (input_location, GE_EXPR, logical_type_node,
+info->start[dim], end);
+  stride_neg = fold_build2_loc (input_location, TRUTH_AND_EXPR,
+   logical_type_node, stride_neg, tmp);
+  tree non_zerosized = fold_build2_loc (input_location, TRUTH_OR_EXPR,
+   logical_type_node, stride_pos,
+   stride_neg);
+
+  /* Check the start of the range against the lower and upper
+ bounds of the array, if the range is not empty.
+ If upper bound is present, include both bounds in the
+ error message.  */
+  if (check_upper)
+{
+  tmp = fold_build2_loc (input_location, LT_EXPR, logical_type_node,
+info->start[dim], lbound);
+  tmp = fold_build2_loc (input_location, TRUTH_AND_EXPR, logical_type_node,
+non_zerosized, tmp);
+  tree tmp2 = fold_build2_loc (input_location, GT_EXPR, logical_type_node,
+  info->start[dim], ubound);
+  tmp2 = fold_build2_loc (input_location, TRUTH_AND_EXPR, logical_type_node,
+ non_zerosized, tmp2);
+  msg = xasprintf ("Index '%%ld' of dimension %d of array '%s' outside of "
+  "expected range (%%ld:%%ld)", dim + 1, expr_name);
+  gfc_trans_runtime_check (true, false, tmp, inner, expr_loc, msg,
+ fold_convert (long_integer_type_node, info->start[dim]),
+ fold_convert (long_integer_type_node, lbound),
+ fold_convert (long_integer_type_node, ubound));
+  gfc_trans_runtime_check (true, false, tmp2, inner, expr_loc, msg,
+ fold_convert (long_integer_type_node, info->start[dim]),
+ fold_convert (long_integer_type_node, lbound),
+ fold_convert (long_integer_type_node, ubound));
+  free (msg);
+}
+  else
+{
+  tmp = fold_build2_loc (input_location, LT_EXPR, logical_type_node,
+info->start[dim], lbound);
+  tmp = fold_build2_loc (input_location, TRUTH_AND_EX

[PATCH v2 04/10] fortran: Remove MINLOC/MAXLOC frontend optimization

2024-08-16 Thread Mikael Morin
From: Mikael Morin 

This patch is new in the V2 series.

Regression-tested on x86_64-pc-linux-gnu.
OK for master?

-- >8 --

Remove the frontend pass rewriting calls of MINLOC/MAXLOC without DIM to
calls with one-valued DIM enclosed in an array constructor.  This
transformation was circumventing the limitation of inline MINLOC/MAXLOC code
generation to scalar cases only, allowing inline code to be generated if
ARRAY had rank 1 and DIM was absent.  As MINLOC/MAXLOC has gained support for
inline code generation in that case, the limitation is no longer effective,
and the transformation is no longer necessary.

gcc/fortran/ChangeLog:

* frontend-passes.cc (optimize_minmaxloc): Remove.
(optimize_expr): Remove dispatch to optimize_minmaxloc.
---
 gcc/fortran/frontend-passes.cc | 57 --
 1 file changed, 57 deletions(-)

diff --git a/gcc/fortran/frontend-passes.cc b/gcc/fortran/frontend-passes.cc
index 8e4c6310ba8..31d553e9844 100644
--- a/gcc/fortran/frontend-passes.cc
+++ b/gcc/fortran/frontend-passes.cc
@@ -36,7 +36,6 @@ static bool optimize_op (gfc_expr *);
 static bool optimize_comparison (gfc_expr *, gfc_intrinsic_op);
 static bool optimize_trim (gfc_expr *);
 static bool optimize_lexical_comparison (gfc_expr *);
-static void optimize_minmaxloc (gfc_expr **);
 static bool is_empty_string (gfc_expr *e);
 static void doloop_warn (gfc_namespace *);
 static int do_intent (gfc_expr **);
@@ -356,17 +355,6 @@ optimize_expr (gfc_expr **e, int *walk_subtrees ATTRIBUTE_UNUSED,
   if ((*e)->expr_type == EXPR_OP && optimize_op (*e))
 gfc_simplify_expr (*e, 0);
 
-  if ((*e)->expr_type == EXPR_FUNCTION && (*e)->value.function.isym)
-switch ((*e)->value.function.isym->id)
-  {
-  case GFC_ISYM_MINLOC:
-  case GFC_ISYM_MAXLOC:
-   optimize_minmaxloc (e);
-   break;
-  default:
-   break;
-  }
-
   if (function_expr)
 count_arglist --;
 
@@ -2262,51 +2250,6 @@ optimize_trim (gfc_expr *e)
   return true;
 }
 
-/* Optimize minloc(b), where b is rank 1 array, into
-   (/ minloc(b, dim=1) /), and similarly for maxloc,
-   as the latter forms are expanded inline.  */
-
-static void
-optimize_minmaxloc (gfc_expr **e)
-{
-  gfc_expr *fn = *e;
-  gfc_actual_arglist *a;
-  char *name, *p;
-
-  if (fn->rank != 1
-  || fn->value.function.actual == NULL
-  || fn->value.function.actual->expr == NULL
-  || fn->value.function.actual->expr->ts.type == BT_CHARACTER
-  || fn->value.function.actual->expr->rank != 1
-  || gfc_inline_intrinsic_function_p (fn))
-return;
-
-  *e = gfc_get_array_expr (fn->ts.type, fn->ts.kind, &fn->where);
-  (*e)->shape = fn->shape;
-  fn->rank = 0;
-  fn->shape = NULL;
-  gfc_constructor_append_expr (&(*e)->value.constructor, fn, &fn->where);
-
-  name = XALLOCAVEC (char, strlen (fn->value.function.name) + 1);
-  strcpy (name, fn->value.function.name);
-  p = strstr (name, "loc0");
-  p[3] = '1';
-  fn->value.function.name = gfc_get_string ("%s", name);
-  if (fn->value.function.actual->next)
-{
-  a = fn->value.function.actual->next;
-  gcc_assert (a->expr == NULL);
-}
-  else
-{
-  a = gfc_get_actual_arglist ();
-  fn->value.function.actual->next = a;
-}
-  a->expr = gfc_get_constant_expr (BT_INTEGER, gfc_default_integer_kind,
-  &fn->where);
-  mpz_set_ui (a->expr->value.integer, 1);
-}
-
 /* Data package to hand down for DO loop checks in a contained
procedure.  */
 typedef struct contained_info
-- 
2.43.0



[PATCH v2 08/10] fortran: Inline non-character MINLOC/MAXLOC with no DIM [PR90608]

2024-08-16 Thread Mikael Morin
From: Mikael Morin 

Regression-tested on x86_64-pc-linux-gnu.
OK for master?

-- >8 --

Enable generation of inline MINLOC/MAXLOC code in the case where DIM
is not present, and either ARRAY is of floating point type or MASK is an
array.  Those cases are the remaining bits to fully support inlining of
non-CHARACTER MINLOC/MAXLOC without DIM.  They are treated together because
they generate similar code, the NANs for REAL types being handled a bit like
a second level of masking.  These are the cases for which we generate two
sets of loops.

This change affects the code generating the second loop, which was previously
accessible only when ARRAY has rank 1.  The single variable
initialization and update are changed to apply to multiple variables, one
per dimension.

The code generated is as follows (if ARRAY has rank 2):

for (idx11 in lower1..upper1)
  {
    for (idx12 in lower2..upper2)
      {
        ...
        if (...)
          {
            ...
            goto second_loop;
          }
      }
  }
second_loop:
for (idx21 in lower1..upper1)
  {
    for (idx22 in lower2..upper2)
      {
        ...
      }
  }

This code leads to processing the first elements redundantly, both in the
first set of loops and in the second one.  The loop over idx22 could
start from idx12 the first time it is run, but as it has to start from
lower2 for the rest of the runs, this change uses the same bounds for both
sets of loops for simplicity.  In the rank 1 case, this makes the generated
code worse compared to the inline code that was generated before.  A later
change will introduce conditionals to avoid the duplicate processing and
restore the generated code in that case.
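
To make the two sets of loops more concrete, the following C sketch mimics
them for a floating-point ARRAY of rank 2 with an array MASK.  It is only an
approximation of the semantics under assumed names (minloc_real_rank2, flat
zero-based storage with dimension 0 varying fastest), not the code the
compiler emits.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical illustration, not GCC output.  POS receives the 1-based
   position of the first minimum among the unmasked, non-NaN elements.
   If all unmasked elements are NaN, the first unmasked position is
   returned; if nothing is unmasked, POS is all zeros.  */
void
minloc_real_rank2 (const double *a, const bool *mask, int n0, int n1,
                   int pos[2])
{
  assert (n0 >= 0 && n1 >= 0);

  double limit = INFINITY;
  int pos0 = 0, pos1 = 0;

  /* First set of loops: look for the first element that is neither
     masked out nor NaN (comparisons with NaN are false).  */
  for (int s1 = 0; s1 < n1; s1++)
    for (int s0 = 0; s0 < n0; s0++)
      {
        int i = s1 * n0 + s0;
        if (mask[i])
          {
            if (pos0 == 0)
              {
                pos0 = s0 + 1;
                pos1 = s1 + 1;
              }
            if (a[i] <= limit)
              {
                limit = a[i];
                pos0 = s0 + 1;
                pos1 = s1 + 1;
                goto second_loop;
              }
          }
      }
  goto done;

second_loop:
  /* Second set of loops: plain minimum search, restarted from the
     lower bounds, so the leading elements are visited twice.  */
  for (int s1 = 0; s1 < n1; s1++)
    for (int s0 = 0; s0 < n0; s0++)
      {
        int i = s1 * n0 + s0;
        if (mask[i] && a[i] < limit)
          {
            limit = a[i];
            pos0 = s0 + 1;
            pos1 = s1 + 1;
          }
      }

done:
  pos[0] = pos0;
  pos[1] = pos1;
}
```

The NaN test and the mask test both act as filters in the first set of
loops, which is why the two cases can share the same structure.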

PR fortran/90608

gcc/fortran/ChangeLog:

* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Initialize
and update all the variables.  Put the label and goto in the
outermost scalarizer loop.  Don't start the second loop where the
first stopped.
(gfc_inline_intrinsic_function_p): Also return TRUE for array MASK
or for any REAL type.

gcc/testsuite/ChangeLog:

* gfortran.dg/maxloc_bounds_5.f90: Additionally accept error
messages reported by the scalarizer.
* gfortran.dg/maxloc_bounds_6.f90: Ditto.
---
 gcc/fortran/trans-intrinsic.cc| 127 --
 gcc/testsuite/gfortran.dg/maxloc_bounds_5.f90 |   4 +-
 gcc/testsuite/gfortran.dg/maxloc_bounds_6.f90 |   4 +-
 3 files changed, 87 insertions(+), 48 deletions(-)

diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index cd7a43f58fb..a92b733cf2f 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -5358,12 +5358,55 @@ strip_kind_from_actual (gfc_actual_arglist * actual)
   }
   S++;
 }
-   B: ARRAY has rank 1, and DIM is absent.  Use the same code as the scalar
-  case and wrap the result in an array.
-   C: ARRAY has rank > 1, NANs are not supported, and DIM and MASK are absent.
-  Generate code similar to the single loop scalar case, but using one
-  variable per dimension, for example if ARRAY has rank 2:
-  4) NAN's aren't supported, no MASK:
+   B: Array result, non-CHARACTER type, DIM absent
+  Generate similar code as in the scalar case, using a collection of
+  variables (one per dimension) instead of a single variable as result.
+  Picking only cases 1) and 4) with ARRAY of rank 2, the generated code
+  becomes:
+  1) Array mask is used and NaNs need to be supported:
+limit = Infinity;
+pos0 = 0;
+pos1 = 0;
+S1 = from1;
+while (S1 <= to1) {
+  S0 = from0;
+  while (s0 <= to0 {
+if (mask[S1][S0]) {
+  if (pos0 == 0) {
+pos0 = S0 + (1 - from0);
+pos1 = S1 + (1 - from1);
+  }
+  if (a[S1][S0] <= limit) {
+limit = a[S1][S0];
+pos0 = S0 + (1 - from0);
+pos1 = S1 + (1 - from1);
+goto lab1;
+  }
+}
+S0++;
+  }
+  S1++;
+}
+goto lab2;
+lab1:;
+S1 = from1;
+while (S1 <= to1) {
+  S0 = from0;
+  while (S0 <= to0) {
+if (mask[S1][S0])
+  if (a[S1][S0] < limit) {
+limit = a[S1][S0];
+pos0 = S + (1 - from0);
+pos1 = S + (1 - from1);
+  }
+S0++;
+  }
+  S1++;
+}
+lab2:;
+result = { pos0, pos1 };
+  ...
+  4) NANs aren't supported, no array mask.
 limit = infinities_supported ? Infinity : huge (limit);
 pos0 = (from0 <= to0 && from1 <= to1) ? 1 : 0;
 pos1 = (fr

[PATCH v2 00/10] fortran: Inline MINLOC/MAXLOC without DIM argument [PR90608]

2024-08-16 Thread Mikael Morin
From: Mikael Morin 

Hello,

this is the second version of the inline MINLOC/MAXLOC without DIM patchset
whose first version was posted before at:
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658909.html

Apart from the NAN skipping conditional likeliness which is left unchanged,
it takes into account the review comments I got so far, including the still
controversial -finline-intrinsics flag.  Regarding the conditional
likeliness, I looked at its effect on the generated assembly for
minmaxloc_18, and I really can't tell which is better just from the look
of it.  I prefer to not touch that part.

Patches 4 (minmaxloc frontend pass removal) and 10 (-finline-intrinsics flag)
are new, and patch 1 (tests) has been modified to move the NAN tests to a
separate file and use the ieee_arithmetic intrinsic module.  The rest of the
patches are rebased versions of the previously posted patches.
Details below and in the patches themselves.

This series of patches enables the generation of inline code for the MINLOC
and MAXLOC intrinsics, when the DIM argument is not present.  The
generated code is based on the inline implementation already generated in
the scalar case, that is when ARRAY has rank 1 and DIM is present.  The
code is extended by using several variables (one for each dimension) where
the scalar code used just one, and collecting the variables to an array
before returning.

The patches are split in a way that allows inlining in more and more cases
as controlled by the gfc_inline_intrinsic_function_p predicate which evolves with
the patches.

Changes from V1:

 - In patch 1/10, use intrinsic ieee_arithmetic module to get NAN values in
   tests.  This required splitting the tests using ieee_arithmetic into
   a separate file in the ieee/ subdirectory.

 - Add patch 4/10 removing the frontend minmaxloc pass.

 - Add patch 10/10 adding -finline-intrinsics flag to control MINLOC/MAXLOC
   inlining from the command line.

Mikael Morin (10):
  fortran: Add tests covering inline MINLOC/MAXLOC without DIM [PR90608]
  fortran: Disable frontend passes for inlinable MINLOC/MAXLOC [PR90608]
  fortran: Inline MINLOC/MAXLOC with no DIM and ARRAY of rank 1
[PR90608]
  fortran: Remove MINLOC/MAXLOC frontend optimization
  fortran: Outline array bound check generation code
  fortran: Inline integral MINLOC/MAXLOC with no DIM and no MASK
[PR90608]
  fortran: Inline integral MINLOC/MAXLOC with no DIM and scalar MASK
[PR90608]
  fortran: Inline non-character MINLOC/MAXLOC with no DIM [PR90608]
  fortran: Continue MINLOC/MAXLOC second loop where the first stopped
[PR90608]
  fortran: Add -finline-intrinsics flag for MINLOC/MAXLOC [PR90608]

 gcc/flag-types.h  |  30 +
 gcc/fortran/frontend-passes.cc|  56 --
 gcc/fortran/invoke.texi   |  24 +
 gcc/fortran/lang.opt  |  27 +
 gcc/fortran/options.cc|  21 +-
 gcc/fortran/trans-array.cc| 382 +
 gcc/fortran/trans-intrinsic.cc| 489 ---
 .../gfortran.dg/ieee/maxloc_nan_1.f90 |  44 +
 .../gfortran.dg/ieee/minloc_nan_1.f90 |  44 +
 gcc/testsuite/gfortran.dg/maxloc_7.f90| 208 +
 gcc/testsuite/gfortran.dg/maxloc_bounds_4.f90 |   4 +-
 gcc/testsuite/gfortran.dg/maxloc_bounds_5.f90 |   4 +-
 gcc/testsuite/gfortran.dg/maxloc_bounds_6.f90 |   4 +-
 gcc/testsuite/gfortran.dg/maxloc_bounds_7.f90 |   4 +-
 .../gfortran.dg/maxloc_with_mask_1.f90| 373 +
 gcc/testsuite/gfortran.dg/minloc_8.f90| 208 +
 .../gfortran.dg/minloc_with_mask_1.f90| 372 +
 gcc/testsuite/gfortran.dg/minmaxloc_18.f90| 772 ++
 gcc/testsuite/gfortran.dg/minmaxloc_18a.f90   |  10 +
 gcc/testsuite/gfortran.dg/minmaxloc_18b.f90   |  10 +
 gcc/testsuite/gfortran.dg/minmaxloc_18c.f90   |  10 +
 gcc/testsuite/gfortran.dg/minmaxloc_18d.f90   |  10 +
 22 files changed, 2760 insertions(+), 346 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/ieee/maxloc_nan_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/ieee/minloc_nan_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/maxloc_7.f90
 create mode 100644 gcc/testsuite/gfortran.dg/maxloc_with_mask_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minloc_8.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minloc_with_mask_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minmaxloc_18.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minmaxloc_18a.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minmaxloc_18b.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minmaxloc_18c.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minmaxloc_18d.f90

-- 
2.43.0



[PATCH v2 07/10] fortran: Inline integral MINLOC/MAXLOC with no DIM and scalar MASK [PR90608]

2024-08-16 Thread Mikael Morin
From: Mikael Morin 

Regression-tested on x86_64-pc-linux-gnu.
OK for master?

-- >8 --

Enable the generation of inline code for MINLOC/MAXLOC when argument ARRAY
is of integral type, DIM is not present, and MASK is present and is scalar
(only absent MASK or rank 1 ARRAY were inlined before).

Scalar masks are implemented with a wrapping condition around the code one
would generate if MASK wasn't present, so they are easy to support once
inline code without MASK is working.
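
A small C sketch of the wrapping condition may help here.  It is a model of
the semantics under assumed names (minloc_int_rank2_scalar_mask, flat
zero-based storage), not what gfc_conv_intrinsic_minmaxloc literally emits.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical illustration, not GCC output.  With a scalar MASK the
   unmasked search is enclosed in an if statement; the else branch
   zeroes the position variable of every dimension.  */
void
minloc_int_rank2_scalar_mask (const int *a, int n0, int n1, bool mask,
                              int pos[2])
{
  assert (n0 >= 0 && n1 >= 0);

  if (mask)
    {
      /* Same code as if MASK was absent.  */
      int limit = INT_MAX;
      int nonempty = n0 > 0 && n1 > 0;
      pos[0] = nonempty ? 1 : 0;
      pos[1] = nonempty ? 1 : 0;
      for (int s1 = 0; s1 < n1; s1++)
        for (int s0 = 0; s0 < n0; s0++)
          if (a[s1 * n0 + s0] < limit)
            {
              limit = a[s1 * n0 + s0];
              pos[0] = s0 + 1;
              pos[1] = s1 + 1;
            }
    }
  else
    {
      /* One initialization per dimension, as in the else block.  */
      for (int i = 0; i < 2; i++)
        pos[i] = 0;
    }
}
```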

PR fortran/90608

gcc/fortran/ChangeLog:

* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Generate
variable initialization for each dimension in the else branch of
the toplevel condition.
(gfc_inline_intrinsic_function_p): Return TRUE for scalar MASK.

gcc/testsuite/ChangeLog:

* gfortran.dg/maxloc_bounds_7.f90: Additionally accept the error message
reported by the scalarizer.
---
 gcc/fortran/trans-intrinsic.cc| 13 -
 gcc/testsuite/gfortran.dg/maxloc_bounds_7.f90 |  4 ++--
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index b8a7faf5459..cd7a43f58fb 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -5914,7 +5914,6 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * expr, enum tree_code op)
   /* For a scalar mask, enclose the loop in an if statement.  */
   if (maskexpr && maskss == NULL)
 {
-  gcc_assert (loop.dimen == 1);
   tree ifmask;
 
   gfc_init_se (&maskse, NULL);
@@ -5929,7 +5928,8 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * expr, enum tree_code op)
 the pos variable the same way as above.  */
 
   gfc_init_block (&elseblock);
-  gfc_add_modify (&elseblock, pos[0], gfc_index_zero_node);
+  for (int i = 0; i < loop.dimen; i++)
+   gfc_add_modify (&elseblock, pos[i], gfc_index_zero_node);
   elsetmp = gfc_finish_block (&elseblock);
   ifmask = conv_mask_condition (&maskse, maskexpr, optional_mask);
   tmp = build3_v (COND_EXPR, ifmask, tmp, elsetmp);
@@ -11823,9 +11823,12 @@ gfc_inline_intrinsic_function_p (gfc_expr *expr)
if (array->rank == 1)
  return true;
 
-   if (array->ts.type == BT_INTEGER
-   && dim == nullptr
-   && mask == nullptr)
+   if (array->ts.type != BT_INTEGER
+   || dim != nullptr)
+ return false;
+
+   if (mask == nullptr
+   || mask->rank == 0)
  return true;
 
return false;
diff --git a/gcc/testsuite/gfortran.dg/maxloc_bounds_7.f90 b/gcc/testsuite/gfortran.dg/maxloc_bounds_7.f90
index 206a29b149d..3aa9d3dcebe 100644
--- a/gcc/testsuite/gfortran.dg/maxloc_bounds_7.f90
+++ b/gcc/testsuite/gfortran.dg/maxloc_bounds_7.f90
@@ -1,6 +1,6 @@
 ! { dg-do run }
 ! { dg-options "-fbounds-check" }
-! { dg-shouldfail "Incorrect extent in return value of MAXLOC intrinsic: is 3, should be 2" }
+! { dg-shouldfail "Incorrect extent in return value of MAXLOC intrinsic: is 3, should be 2|Array bound mismatch for dimension 1 of array 'res' .3/2." }
 module tst
 contains
   subroutine foo(res)
@@ -18,4 +18,4 @@ program main
   integer :: res(3)
   call foo(res)
 end program main
-! { dg-output "Fortran runtime error: Incorrect extent in return value of MAXLOC intrinsic: is 3, should be 2" }
+! { dg-output "Fortran runtime error: Incorrect extent in return value of MAXLOC intrinsic: is 3, should be 2|Array bound mismatch for dimension 1 of array 'res' .3/2." }
-- 
2.43.0



[PATCH v2 09/10] fortran: Continue MINLOC/MAXLOC second loop where the first stopped [PR90608]

2024-08-16 Thread Mikael Morin
From: Mikael Morin 

Regression-tested on x86_64-pc-linux-gnu.
OK for master?

-- >8 --

Continue the second set of loops where the first one stopped in the
generated inline MINLOC/MAXLOC code in the cases where the generated code
contains two sets of loops.  This fixes a regression that was introduced
when enabling the generation of inline MINLOC/MAXLOC code with ARRAY of rank
greater than 1, no DIM argument, and either non-scalar MASK or floating-
point ARRAY.

In the cases where two sets of loops are generated as inline MINLOC/MAXLOC
code, we previously generated code such as (for rank 2 ARRAY, so with two
levels of nesting):

for (idx11 in lower1..upper1)
  {
    for (idx12 in lower2..upper2)
      {
        ...
        if (...)
          {
            ...
            goto second_loop;
          }
      }
  }
second_loop:
for (idx21 in lower1..upper1)
  {
    for (idx22 in lower2..upper2)
      {
        ...
      }
  }

which means we process the first elements twice, once in the first set
of loops and once in the second one.  This change avoids this duplicate
processing by using a conditional as lower bound for the second set of
loops, generating code like:

second_loop_entry = false;
for (idx11 in lower1..upper1)
  {
    for (idx12 in lower2..upper2)
      {
        ...
        if (...)
          {
            ...
            second_loop_entry = true;
            goto second_loop;
          }
      }
  }
second_loop:
for (idx21 in (second_loop_entry ? idx11 : lower1)..upper1)
  {
    for (idx22 in (second_loop_entry ? idx12 : lower2)..upper2)
      {
        ...
        second_loop_entry = false;
      }
  }

It was expected that the compiler optimizations would be able to remove the
state variable second_loop_entry.  This is the case if ARRAY has rank 1 (so
without loop nesting): the variable is removed and the loop bounds become
unconditional, which restores the previously generated code, fully fixing the
regression.  For larger rank, unfortunately, the state variable and
conditional loop bounds remain, but those cases were previously using
library calls, so it's not a regression.
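
For reference, the resumed second set of loops can be modelled in C as
follows.  This is a sketch of the intended behaviour for a rank 2 REAL ARRAY
with an array MASK; the function name, the idx0/idx1 variables and the flat
zero-based storage are assumptions for the example rather than the
generated code.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical illustration, not GCC output.  Same result as restarting
   the second set of loops from the lower bounds, but the elements
   already rejected by the first set of loops are not revisited.  */
void
minloc_real_rank2_resume (const double *a, const bool *mask, int n0,
                          int n1, int pos[2])
{
  assert (n0 >= 0 && n1 >= 0);

  double limit = INFINITY;
  int pos0 = 0, pos1 = 0;
  int idx0 = 0, idx1 = 0;          /* Where the first loops stopped.  */
  bool second_loop_entry = false;

  for (int s1 = 0; s1 < n1; s1++)
    for (int s0 = 0; s0 < n0; s0++)
      {
        int i = s1 * n0 + s0;
        if (mask[i])
          {
            if (pos0 == 0)
              {
                pos0 = s0 + 1;
                pos1 = s1 + 1;
              }
            if (a[i] <= limit)
              {
                limit = a[i];
                pos0 = s0 + 1;
                pos1 = s1 + 1;
                idx0 = s0;
                idx1 = s1;
                second_loop_entry = true;
                goto second_loop;
              }
          }
      }
  goto done;

second_loop:
  /* The conditional lower bounds only apply to the first run of each
     loop; the flag is cleared in the body, so later runs of the inner
     loop start from the regular lower bound.  */
  for (int s1 = second_loop_entry ? idx1 : 0; s1 < n1; s1++)
    for (int s0 = second_loop_entry ? idx0 : 0; s0 < n0; s0++)
      {
        int i = s1 * n0 + s0;
        if (mask[i] && a[i] < limit)
          {
            limit = a[i];
            pos0 = s0 + 1;
            pos1 = s1 + 1;
          }
        second_loop_entry = false;
      }

done:
  pos[0] = pos0;
  pos[1] = pos1;
}
```

With a single loop level the flag is always set when the second loop is
reached, which is what lets the optimizers drop it in the rank 1 case.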

PR fortran/90608

gcc/fortran/ChangeLog:

* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Generate a set
of index variables.  Set them using the loop indexes before leaving
the first set of loops.  Generate a new loop entry predicate.
Initialize it.  Set it before leaving the first set of loops.  Clear
it in the body of the second set of loops.  For the second set of
loops, update each loop lower bound to use the corresponding index
variable if the predicate variable is set.
---
 gcc/fortran/trans-intrinsic.cc | 33 +++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index a92b733cf2f..b03f7b1653e 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -5368,6 +5368,7 @@ strip_kind_from_actual (gfc_actual_arglist * actual)
 pos0 = 0;
 pos1 = 0;
 S1 = from1;
+second_loop_entry = false;
 while (S1 <= to1) {
   S0 = from0;
   while (s0 <= to0 {
@@ -5380,6 +5381,7 @@ strip_kind_from_actual (gfc_actual_arglist * actual)
 limit = a[S1][S0];
 pos0 = S0 + (1 - from0);
 pos1 = S1 + (1 - from1);
+second_loop_entry = true;
 goto lab1;
   }
 }
@@ -5389,9 +5391,9 @@ strip_kind_from_actual (gfc_actual_arglist * actual)
 }
 goto lab2;
 lab1:;
-S1 = from1;
+S1 = second_loop_entry ? S1 : from1;
 while (S1 <= to1) {
-  S0 = from0;
+  S0 = second_loop_entry ? S0 : from0;
   while (S0 <= to0) {
 if (mask[S1][S0])
   if (a[S1][S0] < limit) {
@@ -5399,6 +5401,7 @@ strip_kind_from_actual (gfc_actual_arglist * actual)
 pos0 = S + (1 - from0);
 pos1 = S + (1 - from1);
   }
+second_loop_entry = false;
 S0++;
   }
   S1++;
@@ -5470,6 +5473,7 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * expr, enum tree_code op)
   gfc_expr *backexpr;
   gfc_se backse;
   tree pos[GFC_MAX_DIMENSIONS];
+  tree idx[GFC_MAX_DIMENSIONS];
   tree result_var = NULL_TREE;
   int n;
   bool optional_mask;
@@ -5551,6 +,8 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * 
expr, enum tree_code op)
   gfc_get_string ("pos%d", i));
   offset[i] = gfc_creat

[PATCH v2 03/10] fortran: Inline MINLOC/MAXLOC with no DIM and ARRAY of rank 1 [PR90608]

2024-08-16 Thread Mikael Morin
From: Mikael Morin 

Regression-tested on x86_64-pc-linux-gnu.
OK for master?

-- >8 --

Enable inline code generation for the MINLOC and MAXLOC intrinsics, if the
DIM argument is not present and ARRAY has rank 1.  This case is similar to
the case where the result is scalar (DIM present and rank 1 ARRAY), which
already supports inline expansion of the intrinsic.  Both cases return
the same value, with the difference that the result is an array of size 1 if
DIM is absent, whereas it's a scalar if DIM is present.  So all there is
to do for the new case to work is hook the inline expansion with the
scalarizer.
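
The relationship between the two cases can be pictured with a short C
sketch.  The function names are made up for the illustration and the code
ignores MASK, BACK and non-integer types; it only shows that the array
result wraps the value computed by the scalar case.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical model of the scalar case (rank 1 ARRAY, DIM present):
   returns the 1-based position of the first minimum, 0 if empty.  */
int
minloc_rank1_scalar (const int *a, int n)
{
  assert (n >= 0);

  int limit = INT_MAX;
  int pos = n > 0 ? 1 : 0;
  for (int s = 0; s < n; s++)
    if (a[s] < limit)
      {
        limit = a[s];
        pos = s + 1;
      }
  return pos;
}

/* Hypothetical model of the new case (rank 1 ARRAY, DIM absent):
   the same value, stored in the single element of an array result.  */
void
minloc_rank1_array (const int *a, int n, int result[1])
{
  result[0] = minloc_rank1_scalar (a, n);
}
```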

PR fortran/90608

gcc/fortran/ChangeLog:

* trans-array.cc (gfc_conv_ss_startstride): Set the scalarization
rank based on the MINLOC/MAXLOC rank if needed.  Call the inline
code generation and setup the scalarizer array descriptor info
in the MINLOC and MAXLOC cases.
* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Return the
result array element if the scalarizer is setup and we are inside
the loops.  Restrict library function call dispatch to the case
where inline expansion is not supported.  Declare an array result
if the expression isn't scalar.  Initialize the array result single
element and return the result variable if the expression isn't
scalar.
(walk_inline_intrinsic_minmaxloc): New function.
(walk_inline_intrinsic_function): Add MINLOC and MAXLOC cases,
dispatching to walk_inline_intrinsic_minmaxloc.
(gfc_add_intrinsic_ss_code): Add MINLOC and MAXLOC cases.
(gfc_inline_intrinsic_function_p): Return true if ARRAY has rank 1,
regardless of DIM.
---
 gcc/fortran/trans-array.cc |  25 
 gcc/fortran/trans-intrinsic.cc | 224 +++--
 2 files changed, 181 insertions(+), 68 deletions(-)

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 9fb0b2b398d..46e2152d0f0 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -4851,6 +4851,8 @@ gfc_conv_ss_startstride (gfc_loopinfo * loop)
case GFC_ISYM_UBOUND:
case GFC_ISYM_LCOBOUND:
case GFC_ISYM_UCOBOUND:
+   case GFC_ISYM_MAXLOC:
+   case GFC_ISYM_MINLOC:
case GFC_ISYM_SHAPE:
case GFC_ISYM_THIS_IMAGE:
  loop->dimen = ss->dimen;
@@ -4900,6 +4902,29 @@ done:
case GFC_SS_INTRINSIC:
  switch (expr->value.function.isym->id)
{
+   case GFC_ISYM_MINLOC:
+   case GFC_ISYM_MAXLOC:
+ {
+   gfc_se se;
+   gfc_init_se (&se, nullptr);
+   se.loop = loop;
+   se.ss = ss;
+   gfc_conv_intrinsic_function (&se, expr);
+   gfc_add_block_to_block (&outer_loop->pre, &se.pre);
+   gfc_add_block_to_block (&outer_loop->post, &se.post);
+
+   info->descriptor = se.expr;
+
+   info->data = gfc_conv_array_data (info->descriptor);
+   info->data = gfc_evaluate_now (info->data, &outer_loop->pre);
+
+   info->offset = gfc_index_zero_node;
+   info->start[0] = gfc_index_zero_node;
+   info->end[0] = gfc_index_zero_node;
+   info->stride[0] = gfc_index_one_node;
+   continue;
+ }
+
/* Fall through to supply start and stride.  */
case GFC_ISYM_LBOUND:
case GFC_ISYM_UBOUND:
diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index 2c8512060cc..9fcb57a9cc4 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -5273,66 +5273,95 @@ strip_kind_from_actual (gfc_actual_arglist * actual)
we need to handle.  For performance reasons we sometimes create two
loops instead of one, where the second one is much simpler.
Examples for minloc intrinsic:
-   1) Result is an array, a call is generated
-   2) Array mask is used and NaNs need to be supported:
-  limit = Infinity;
-  pos = 0;
-  S = from;
-  while (S <= to) {
-   if (mask[S]) {
- if (pos == 0) pos = S + (1 - from);
- if (a[S] <= limit) { limit = a[S]; pos = S + (1 - from); goto lab1; }
-   }
-   S++;
-  }
-  goto lab2;
-  lab1:;
-  while (S <= to) {
-   if (mask[S]) if (a[S] < limit) { limit = a[S]; pos = S + (1 - from); }
-   S++;
-  }
-  lab2:;
-   3) NaNs need to be supported, but it is known at compile time or cheaply
-  at runtime whether array is nonempty or not:
-  limit = Infinity;
-  pos = 0;
-  S = from;
-  while (S <= to) {
-   if (a[S] <= limit) { limit = a[S]; pos = S + (1 - from); goto lab1; }
-   S++;
-  }
-  if (from <= to) pos = 1;
-  goto lab2;
-  lab1:;
-  while (S <= to) {
-   if (a[S] < limit) { limit = a[S]; pos = S + (1 - from); }
-   S++

[PATCH v2 01/10] fortran: Add tests covering inline MINLOC/MAXLOC without DIM [PR90608]

2024-08-16 Thread Mikael Morin
From: Mikael Morin 

Compared to the previous version of the patch at
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658916.html
this uses the IEEE_ARITHMETIC module to generate NAN values in the
tests.  This change required moving the affected tests to a separate file
in the ieee/ subdirectory, so that the compiler when run has the intrinsic
module path correctly provided and can load the intrinsic module.

Tested on x86_64-pc-linux-gnu.
OK for master?

-- >8 --

Add the tests covering the various cases for which we are about to implement
inline expansion of MINLOC and MAXLOC.  Those are cases where the DIM
argument is not present.

PR fortran/90608

gcc/testsuite/ChangeLog:

* gfortran.dg/ieee/maxloc_nan_1.f90: New test.
* gfortran.dg/ieee/minloc_nan_1.f90: New test.
* gfortran.dg/maxloc_7.f90: New test.
* gfortran.dg/maxloc_with_mask_1.f90: New test.
* gfortran.dg/minloc_8.f90: New test.
* gfortran.dg/minloc_with_mask_1.f90: New test.
---
 .../gfortran.dg/ieee/maxloc_nan_1.f90 |  44 +++
 .../gfortran.dg/ieee/minloc_nan_1.f90 |  44 +++
 gcc/testsuite/gfortran.dg/maxloc_7.f90| 208 ++
 .../gfortran.dg/maxloc_with_mask_1.f90| 373 ++
 gcc/testsuite/gfortran.dg/minloc_8.f90| 208 ++
 .../gfortran.dg/minloc_with_mask_1.f90| 372 +
 6 files changed, 1249 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/ieee/maxloc_nan_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/ieee/minloc_nan_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/maxloc_7.f90
 create mode 100644 gcc/testsuite/gfortran.dg/maxloc_with_mask_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minloc_8.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minloc_with_mask_1.f90

diff --git a/gcc/testsuite/gfortran.dg/ieee/maxloc_nan_1.f90 b/gcc/testsuite/gfortran.dg/ieee/maxloc_nan_1.f90
new file mode 100644
index 000..329b54e8e1f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/ieee/maxloc_nan_1.f90
@@ -0,0 +1,44 @@
+! { dg-do run }
+!
+! PR fortran/90608
+! Check the correct behaviour of the inline MAXLOC implementation,
+! when ARRAY is filled with NANs.
+
+program p
+  implicit none
+  call check_without_mask
+  call check_with_mask
+contains
+  subroutine check_without_mask()
+use, intrinsic :: ieee_arithmetic
+real, allocatable :: a(:,:,:)
+real :: nan
+integer, allocatable :: m(:)
+if (.not. ieee_support_nan(nan)) return
+nan = ieee_value(nan, ieee_quiet_nan)
+allocate(a(3,3,3), source = nan)
+m = maxloc(a)
+if (size(m, dim=1) /= 3) stop 32
+if (any(m /= (/ 1, 1, 1 /))) stop 35
+  end subroutine
+  subroutine check_with_mask()
+use, intrinsic :: ieee_arithmetic
+real, allocatable :: a(:,:,:)
+logical, allocatable :: m(:,:,:)
+real :: nan
+integer, allocatable :: r(:)
+if (.not. ieee_support_nan(nan)) return
+nan = ieee_value(nan, ieee_quiet_nan)
+allocate(a(3,3,3), source = nan)
+allocate(m(3,3,3))
+m(:,:,:) = reshape((/ .false., .false., .true. , .true. , .false., &
+  .true. , .false., .false., .false., .true. , &
+  .true. , .false., .true. , .true. , .true. , &
+  .false., .false., .true. , .true. , .false., &
+  .false., .true. , .false., .false., .true. , &
+  .true. , .true. /), shape(m))
+r = maxloc(a, mask = m)
+if (size(r, dim = 1) /= 3) stop 62
+if (any(r /= (/ 3, 1, 1 /))) stop 65
+  end subroutine
+end program p
diff --git a/gcc/testsuite/gfortran.dg/ieee/minloc_nan_1.f90 b/gcc/testsuite/gfortran.dg/ieee/minloc_nan_1.f90
new file mode 100644
index 000..8f71b4c4398
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/ieee/minloc_nan_1.f90
@@ -0,0 +1,44 @@
+! { dg-do run }
+!
+! PR fortran/90608
+! Check the correct behaviour of the inline MINLOC implementation,
+! when ARRAY is filled with NANs.
+
+program p
+  implicit none
+  call check_without_mask
+  call check_with_mask
+contains
+  subroutine check_without_mask()
+use, intrinsic :: ieee_arithmetic
+real, allocatable :: a(:,:,:)
+real :: nan
+integer, allocatable :: m(:)
+if (.not. ieee_support_nan(nan)) return
+nan = ieee_value(nan, ieee_quiet_nan)
+allocate(a(3,3,3), source = nan)
+m = minloc(a)
+if (size(m, dim=1) /= 3) stop 32
+if (any(m /= (/ 1, 1, 1 /))) stop 35
+  end subroutine
+  subroutine check_with_mask()
+use, intrinsic :: ieee_arithmetic
+real, allocatable :: a(:,:,:)
+logical, allocatable :: m(:,:,:)
+real :: nan
+integer, allocatable :: r(:)
+if (.not. ieee_support_nan(nan)) return
+nan = ieee_value(nan, ieee_quiet_nan)
+allocate(a(3,3,3), source = nan)
+allocate(m(3,3,3))
+m(:,:,:) = reshape((/ .false., .false., .true. , .true. , .false., &
+  .true. , .false., .false., .fals

[PATCH v2 10/10] fortran: Add -finline-intrinsics flag for MINLOC/MAXLOC [PR90608]

2024-08-16 Thread Mikael Morin
From: Mikael Morin 

This patch is new in the V2 series.

Regression-tested on x86_64-pc-linux-gnu.
OK for master?

-- >8 --

Introduce the -finline-intrinsics flag to control from the command line
whether to generate either inline code or calls to the functions from the
library, for the MINLOC and MAXLOC intrinsics.

The flag allows specifying inlining either independently for each intrinsic
(either MINLOC or MAXLOC), or all together.  For each intrinsic, a default
value is set if none was set.  The default value depends on the optimization
setting: inlining is avoided if not optimizing or if optimizing for size;
otherwise inlining is preferred.

There is no direct support for this behaviour provided by the .opt options
framework.  It is obtained by defining three different variants of the flag
(finline-intrinsics, fno-inline-intrinsics, finline-intrinsics=) all using
the same underlying option variable.  Each enum value (corresponding to an
intrinsic function) uses two identical bits, and the variable is initialized
with alternating bits, so that we can tell whether the value was left at its
initial (unset) state by checking whether the two bits have different values.
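
A minimal sketch of the two-bits-per-intrinsic encoding described above, not
the patch's actual code.  The helper names (value_for, is_set, set_inline,
wants_inline, kUnsetMask, kInitial) are hypothetical, and the alternating-bit
mask 0x55555555 is an assumption made for illustration, not a value taken
from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Each intrinsic owns a pair of adjacent bits.  An explicit setting writes
// both bits of the pair to the same value (11 = inline, 00 = do not inline).
constexpr std::uint32_t value_for (unsigned idx) { return 3u << (2 * idx); }

// ASSUMPTION: alternating-bit mask, giving every pair the pattern 01.
constexpr std::uint32_t kUnsetMask = 0x55555555u;

// Initial state of the option variable: every pair is 01, i.e. "unset".
constexpr std::uint32_t kInitial = kUnsetMask;

// A pair has been set explicitly if and only if its two bits are equal.
constexpr bool is_set (std::uint32_t flags, unsigned idx)
{
  return ((flags >> (2 * idx)) & 1u) == ((flags >> (2 * idx + 1)) & 1u);
}

// Enable or disable inlining for one intrinsic, leaving other pairs alone.
constexpr std::uint32_t set_inline (std::uint32_t flags, unsigned idx, bool on)
{
  return on ? (flags | value_for (idx)) : (flags & ~value_for (idx));
}

// Whether inlining is requested; meaningful only once is_set() is true.
constexpr bool wants_inline (std::uint32_t flags, unsigned idx)
{
  return (flags & value_for (idx)) == value_for (idx);
}
```

With this layout a post-options hook can fill in a default only for the pairs
that still have differing bits, which is the behaviour the ChangeLog entry
for gfc_post_options describes.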

PR fortran/90608

gcc/ChangeLog:

* flag-types.h (enum gfc_inlineable_intrinsics): New type.

gcc/fortran/ChangeLog:

* invoke.texi (finline-intrinsics): Document new flag.
* lang.opt (finline-intrinsics, finline-intrinsics=,
fno-inline-intrinsics): New flags.
* options.cc (gfc_post_options): If the option variable controlling
the inlining of MAXLOC (respectively MINLOC) has not been set, set
it or clear it depending on the optimization option variables.
* trans-intrinsic.cc (gfc_inline_intrinsic_function_p): Return false
if inlining for the intrinsic is disabled according to the option
variable.

gcc/testsuite/ChangeLog:

* gfortran.dg/minmaxloc_18.f90: New test.
* gfortran.dg/minmaxloc_18a.f90: New test.
* gfortran.dg/minmaxloc_18b.f90: New test.
* gfortran.dg/minmaxloc_18c.f90: New test.
* gfortran.dg/minmaxloc_18d.f90: New test.
---
 gcc/flag-types.h|  30 +
 gcc/fortran/invoke.texi |  24 +
 gcc/fortran/lang.opt|  27 +
 gcc/fortran/options.cc  |  21 +-
 gcc/fortran/trans-intrinsic.cc  |  13 +-
 gcc/testsuite/gfortran.dg/minmaxloc_18.f90  | 772 
 gcc/testsuite/gfortran.dg/minmaxloc_18a.f90 |  10 +
 gcc/testsuite/gfortran.dg/minmaxloc_18b.f90 |  10 +
 gcc/testsuite/gfortran.dg/minmaxloc_18c.f90 |  10 +
 gcc/testsuite/gfortran.dg/minmaxloc_18d.f90 |  10 +
 10 files changed, 922 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/minmaxloc_18.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minmaxloc_18a.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minmaxloc_18b.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minmaxloc_18c.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minmaxloc_18d.f90

diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index 1e497f0bb91..df56337f7e8 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -451,6 +451,36 @@ enum gfc_convert
 };
 
 
+/* gfortran -finline-intrinsics= values;
+   We use two identical bits for each value, and initialize with alternated
+   bits, so that we can check whether a value has been set by checking whether
+   the two bits have identical value.  */
+
+#define GFC_INL_INTR_VAL(idx) (3 << (2 * idx))
+#define GFC_INL_INTR_UNSET_VAL(val) (0x & (val))
+
+enum gfc_inlineable_intrinsics
+{
+  GFC_FLAG_INLINE_INTRINSIC_NONE = 0,
+  GFC_FLAG_INLINE_INTRINSIC_MAXLOC = GFC_INL_INTR_VAL (0),
+  GFC_FLAG_INLINE_INTRINSIC_MINLOC = GFC_INL_INTR_VAL (1),
+  GFC_FLAG_INLINE_INTRINSIC_ALL = GFC_FLAG_INLINE_INTRINSIC_MAXLOC
+ | GFC_FLAG_INLINE_INTRINSIC_MINLOC,
+
+  GFC_FLAG_INLINE_INTRINSIC_NONE_UNSET
+ = GFC_INL_INTR_UNSET_VAL (GFC_FLAG_INLINE_INTRINSIC_NONE),
+  GFC_FLAG_INLINE_INTRINSIC_MAXLOC_UNSET
+ = GFC_INL_INTR_UNSET_VAL (GFC_FLAG_INLINE_INTRINSIC_MAXLOC),
+  GFC_FLAG_INLINE_INTRINSIC_MINLOC_UNSET
+ = GFC_INL_INTR_UNSET_VAL (GFC_FLAG_INLINE_INTRINSIC_MINLOC),
+  GFC_FLAG_INLINE_INTRINSIC_ALL_UNSET
+ = GFC_INL_INTR_UNSET_VAL (GFC_FLAG_INLINE_INTRINSIC_ALL)
+};
+
+#undef GFC_INL_INTR_UNSET_VAL
+#undef GFC_INL_INTR_VAL
+
+
 /* Inline String Operations functions.  */
 enum ilsop_fn
 {
diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi
index 6bc42afe2c4..53b6de1c92b 100644
--- a/gcc/fortran/invoke.texi
+++ b/gcc/fortran/invoke.texi
@@ -194,6 +194,7 @@ and warnings}.
 -finit-character=@var{n} -finit-integer=@var{n} -finit-local-zero
 -finit-derived -finit-logical=@var{}
 -finit-real=@var{}
+-finline-intrinsics[=<@var{minloc},@var{maxloc}>]
 -finline-matmul-limit=@var{n}
 -finline-arg-packing -fmax-array-constructor=@var{n}
 -fma

Re: [PATCH v3 04/12] OpenMP: C front end support for metadirectives

2024-08-16 Thread Jakub Jelinek
On Sat, Jul 20, 2024 at 02:42:23PM -0600, Sandra Loosemore wrote:
> --- a/gcc/c/c-parser.cc
> +++ b/gcc/c/c-parser.cc
> @@ -263,9 +263,24 @@ struct GTY(()) c_parser {
>   otherwise NULL.  */
>vec *in_omp_attribute_pragma;
>  
> +  /* When in_omp_attribute_pragma is non-null, these fields save the values
> + of the tokens and tokens_avail fields, so that they can be restored
> + after parsing the attribute.  Note that parsing the body of a
> + metadirective uses its own save/restore mechanism as those can be
> + nested with or without the attribute pragmas in the body.  */
> +c_token * GTY((skip)) save_tokens;
> +unsigned int save_tokens_avail;

The indentation of the above 2 is wrong.
Plus if those members are for metadirective parsing, their names are too
generic.

> +
>/* Set for omp::decl attribute parsing to the decl to which it
>   appertains.  */
>tree in_omp_decl_attribute;
> +
> +  /* Set if we are processing a statement body associated with a
> + metadirective variant.  */
> +  BOOL_BITFIELD in_metadirective_body : 1;

And the member ordering creates just too much padding.
Pointer, 32-bit int, pointer, 1-bit bitfield, pointer, 32-bit int,
reordering them slightly would get rid of that.
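
To illustrate the padding point, here is a small sketch with two hypothetical
structs (interleaved and grouped are made-up names, not the real c_parser
layout).  The first follows the member order listed above; the second groups
the pointers first.  The exact sizes depend on the target ABI; on a typical
LP64 target such as x86_64 Linux one would expect 48 versus 40 bytes.

```cpp
#include <cassert>
#include <cstddef>

// Field order mirroring the review comment: pointer, 32-bit int, pointer,
// 1-bit bitfield, pointer, 32-bit int.  All names are hypothetical.
struct interleaved
{
  void *p1;
  unsigned int n1;
  void *p2;
  unsigned int flag : 1;
  void *p3;
  unsigned int n2;
};

// Same members, pointers first, small members grouped at the end.
struct grouped
{
  void *p1;
  void *p2;
  void *p3;
  unsigned int n1;
  unsigned int n2;
  unsigned int flag : 1;
};

// Grouping must never make the struct larger on a sane ABI.
static_assert (sizeof (grouped) <= sizeof (interleaved),
               "grouping members should not increase the size");

// Number of bytes saved by the reordering on the current target.
constexpr std::size_t bytes_saved ()
{
  return sizeof (interleaved) - sizeof (grouped);
}
```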

> +
> +  vec * GTY((skip)) metadirective_body_labels;
> +  unsigned int metadirective_region_num;

But more importantly, for something parsed really rarely, wouldn't it be
better to just add a single pointer to a new structure that contains
all you need for metadirective parsing?

> +  const char *old_name = IDENTIFIER_POINTER (name);
> +  char *new_name = (char *) alloca (strlen (old_name) + 32);

  char *new_name = XALLOCAVEC (char, strlen (old_name) + 32);
please.
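
As a rough, self-contained sketch of the naming scheme in this hunk (the
"%s_MDR%u" format), the following uses std::string instead of
alloca/XALLOCAVEC so that it compiles outside of GCC.  mangle_label is a
hypothetical stand-in, not the function from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical stand-in for mangle_metadirective_region_label: appends
// "_MDR<region>" to the label name, using the same format string.
inline std::string
mangle_label (const std::string &old_name, unsigned region_num)
{
  // "_MDR" (4) + at most 10 decimal digits for a 32-bit unsigned + NUL (1)
  // needs 15 extra bytes, so a "+ 32" allowance leaves headroom.
  std::string buf (old_name.size () + 32, '\0');
  int len = std::snprintf (&buf[0], buf.size (), "%s_MDR%u",
                           old_name.c_str (), region_num);
  // snprintf returns the length without the terminating NUL.
  buf.resize (len > 0 ? static_cast<std::size_t> (len) : 0);
  return buf;
}
```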

> +  sprintf (new_name, "%s_MDR%u", old_name, parser->metadirective_region_num);
> +  return get_identifier (new_name);
> +}
> +
>  /* Parse a label (C90 6.6.1, C99 6.8.1, C11 6.8.1).
>  
> label:
> @@ -7431,6 +7483,9 @@ c_parser_label (c_parser *parser, tree std_attrs)
>gcc_assert (c_parser_next_token_is (parser, CPP_COLON));
>c_parser_consume_token (parser);
>attrs = c_parser_gnu_attributes (parser);
> +  if (parser->in_metadirective_body
> +   && parser->metadirective_body_labels->contains (name))
> + name = mangle_metadirective_region_label (parser, name);
>tlab = define_label (loc2, name);
>if (tlab)
>   {
> @@ -7658,8 +7713,11 @@ c_parser_statement_after_labels (c_parser *parser, bool *if_p,
> c_parser_consume_token (parser);
> if (c_parser_next_token_is (parser, CPP_NAME))
>   {
> -   stmt = c_finish_goto_label (loc,
> -   c_parser_peek_token (parser)->value);
> +   tree name = c_parser_peek_token (parser)->value;
> +   if (parser->in_metadirective_body
> +   && parser->metadirective_body_labels->contains (name))
> + name = mangle_metadirective_region_label (parser, name);
> +   stmt = c_finish_goto_label (loc, name);
> c_parser_consume_token (parser);
>   }
> else if (c_parser_next_token_is (parser, CPP_MULT))
> @@ -14736,6 +14794,10 @@ c_parser_pragma (c_parser *parser, enum pragma_context context, bool *if_p)
>c_parser_omp_nothing (parser);
>return false;
>  
> +case PRAGMA_OMP_METADIRECTIVE:
> +  c_parser_omp_metadirective (parser, if_p);
> +  return true;
> +
>  case PRAGMA_OMP_ERROR:
>return c_parser_omp_error (parser, context);
>  
> @@ -24879,7 +24941,7 @@ c_parser_omp_declare_simd (c_parser *parser, enum pragma_context context)
>  
>  static tree
>  c_parser_omp_context_selector (c_parser *parser, enum omp_tss_code set,
> -tree parms)
> +tree parms, bool metadirective_p)
>  {
>tree ret = NULL_TREE;
>do
> @@ -25026,12 +25088,18 @@ c_parser_omp_context_selector (c_parser *parser, enum omp_tss_code set,
>   {
> mark_exp_read (t);
> t = c_fully_fold (t, false, NULL);
> -   /* FIXME: this is bogus, both device_num and
> -  condition selectors allow arbitrary expressions.  */

Not in 5.0, that is a 5.1 feature that hasn't been implemented in the
initial declare variant implementation (and target_device set with
device_num didn't exist there at all).

> -   if (!INTEGRAL_TYPE_P (TREE_TYPE (t))
> -   || !tree_fits_shwi_p (t))
> - error_at (token->location, "property must be "
> -   "constant integer expression");
> +   /* FIXME: I believe it is an unimplemented feature rather
> +  than a user error to have non-constant expressions
> +  inside "declare variant".  */
> +   if (!metadirective_p
> +   && (!INTEGRAL_TYPE_P (TREE_TYPE (t))
> +   || !tree_fits_s

Re: [PATCH v2] c++: Ensure ANNOTATE_EXPRs remain outermost expressions in conditions [PR116140]

2024-08-16 Thread Alex Coplan
On 15/08/2024 16:55, Jason Merrill wrote:
> On 8/12/24 1:55 PM, Alex Coplan wrote:
> > Hi!
> > 
> > This is a v2 patch of:
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659968.html
> > that addresses Jakub's feedback.
> > 
> > FWIW, I tried to contrive a testcase where convert_from_reference kicks
> > in and we get called with an ANNOTATE_EXPR in maybe_convert_cond, but to
> > no avail.
> 
> Yes, the convert_from_reference shouldn't have any effect here, that should
> have happened already when processing the condition expression.
> 
> > However, I did see cases (both in hand-written testcases and
> > in the testsuite, e.g. g++.dg/ext/pr114409-2.C) where the subsequent
> > call to condition_conversion would change the type (e.g. from int to
> > bool), which shows the need for updating the types in the ANNOTATE_EXPR
> > chain -- thanks for pointing that out, Jakub!
> > 
> > Personally, I feel the handling of the flags (in this patch, as per
> > Jakub's suggestion) is a bit of a premature optimization.  It seems
> > cleaner (and safer) to me just to re-build the annotations if needed
> > (i.e. in the case that the type changed).  You could even have a nice
> > abstraction that encapsulates the stripping and re-building of
> > ANNOTATE_EXPRs, so that it doesn't clutter the caller quite so much.
> 
> I'm sympathetic that the optimization is not very significant, but neither
> is updating the flags.  You could also factor it out for the same less
> clutter in the caller?

Good point, I'll see if I can't factor things out with the in-place update
approach.

> 
> > +  /* If the type of *CONDP changed (e.g. due to convert_from_reference) then
> 
> As discussed, this is much more likely to be from condition_conversion.
> 
> > +the flags may have changed too.  The logic in the loop below relies on
> > +the flags only being changed in the following directions (if at all):
> > +  TREE_SIDE_EFFECTS : 0 -> 1
> > +  TREE_READONLY : 1 -> 0
> > +thus avoiding re-computing the flags from scratch (e.g. via build3), so
> > +let's verify that this precondition holds.  */
> 
> Is there any case where an ANNOTATE_EXPR can have different
> READONLY/SIDE_EFFECTS flags from its operand?  It would be simpler to just
> copy the flags and not bother with the checking.

Looking at the calls to build3 (ANNOTATE_EXPR, ...) in cp/semantics.cc,
it looks like the other two operands of ANNOTATE_EXPRs are only ever
INTEGER_CSTs (the code in tree-cfg.cc:replace_loop_annotate_in_block
corroborates this).

I think an INTEGER_CST C will always have:

  TREE_SIDE_EFFECTS (C) = TREE_READONLY (C) = 0

and since the TREE_READONLY flags are conjunctive and TREE_SIDE_EFFECTS
flags are disjunctive, for an ANNOTATE_EXPR A we will necessarily have:

 - TREE_READONLY (A) = 0
 - TREE_SIDE_EFFECTS (A) = TREE_SIDE_EFFECTS (TREE_OPERAND (A, 0))

so indeed I think this can be simplified significantly, since the above
means we needn't update TREE_READONLY, and TREE_SIDE_EFFECTS can be set
to that of the updated operand (without the checking).

I'll adjust the patch to account for this and try to factor things out
as suggested above.

Thanks a lot for the review.

Alex

> 
> > +#define CHECK_FLAG_CHANGE(prop, value)\
> > +  gcc_checking_assert (prop (orig_inner) == prop (*condp) || prop (*condp) == value)
> > +  CHECK_FLAG_CHANGE (TREE_SIDE_EFFECTS, 1);
> > +  CHECK_FLAG_CHANGE (TREE_READONLY, 0);
> > +#undef CHECK_FLAG_CHANGE
> > +  for (tree c = cond; c != *condp; c = TREE_OPERAND (c, 0))
> > +   {
> > + gcc_checking_assert (TREE_CODE (c) == ANNOTATE_EXPR);
> > + TREE_TYPE (c) = TREE_TYPE (*condp);
> > + TREE_SIDE_EFFECTS (c) |= TREE_SIDE_EFFECTS (*condp);
> > + TREE_READONLY (c) &= TREE_READONLY (*condp);
> > +   }
> 
> 


Re: [PATCH v2] c++: Ensure ANNOTATE_EXPRs remain outermost expressions in conditions [PR116140]

2024-08-16 Thread Jakub Jelinek
On Fri, Aug 16, 2024 at 11:38:03AM +0100, Alex Coplan wrote:
> Looking at the calls to build3 (ANNOTATE_EXPR, ...) in cp/semantics.cc,
> it looks like the other two operands of ANNOTATE_EXPRs are only ever
> INTEGER_CSTs (the code in tree-cfg.cc:replace_loop_annotate_in_block
> corroborates this).

As long as we don't add new ANNOTATE_EXPR kinds with non-constant arguments,
but we don't have them right now.

> I think an INTEGER_CST C will always have:
> 
>   TREE_SIDE_EFFECTS (C) = TREE_READONLY (C) = 0

That is true.
> 
> and since the TREE_READONLY flags are conjunctive and TREE_SIDE_EFFECTS
> flags are disjunctive, for an ANNOTATE_EXPR A we will necessarily have:
> 
>  - TREE_READONLY (A) = 0

No.  The TREE_READONLY computation is:
read_only = 1;
...
if (!TREE_READONLY (arg##N) \
&& !CONSTANT_CLASS_P (arg##N))  \
  (void) (read_only = 0);   \
While INTEGER_CST isn't TREE_READONLY, it is CONSTANT_CLASS_P.
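
A toy model of that computation, to make the consequence concrete.  The node
struct and build3_flags are hypothetical simplifications and not GCC's tree
API; the disjunctive handling of side effects follows the description given
earlier in the thread.

```cpp
#include <cassert>

// Toy stand-in for the three tree properties discussed in the thread.
struct node
{
  bool readonly;        // models TREE_READONLY
  bool side_effects;    // models TREE_SIDE_EFFECTS
  bool constant_class;  // models CONSTANT_CLASS_P, e.g. an INTEGER_CST
};

// Combine operand flags the way the quoted PROCESS_ARG fragment does:
// readonly is cleared only by an operand that is neither readonly nor a
// constant; side effects are OR-ed together.
inline node
build3_flags (const node &op0, const node &op1, const node &op2)
{
  node result = { true, false, false };
  const node *ops[] = { &op0, &op1, &op2 };
  for (const node *op : ops)
    {
      if (!op->readonly && !op->constant_class)
        result.readonly = false;
      if (op->side_effects)
        result.side_effects = true;
    }
  return result;
}

// An INTEGER_CST-like operand: not readonly, no side effects, constant class.
const node integer_cst = { false, false, true };
```

In this model the two constant operands never clear the readonly flag, so the
result depends only on operand 0.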

>  - TREE_SIDE_EFFECTS (A) = TREE_SIDE_EFFECTS (TREE_OPERAND (A, 0))

So, unless we add non-INTEGER_CST extra arguments to ANNOTATE_EXPR,
  TREE_READONLY (A) = TREE_READONLY (TREE_OPERAND (A, 0))
  || CONSTANT_CLASS_P (TREE_OPERAND (A, 0));
Not really sure if the first argument will ever be say INTEGER_CST,
#pragma GCC unroll 8
while (1)
{
  if (something)
break;
}
?

Jakub



[PATCH] Do not emit a redundant DW_TAG_lexical_block for inlined subroutines

2024-08-16 Thread Bernd Edlinger
While this already works correctly for the case when an inlined
subroutine contains only one subrange, a redundant DW_TAG_lexical_block
is still emitted when the subroutine has multiple blocks.
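
The intended traversal can be sketched with a toy model.  The block struct
and visit_subblocks below are hypothetical and only record which action
would be taken; in the patch the corresponding calls are decls_for_scope for
the first subblock and gen_block_die for its siblings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a BLOCK: `chain` models BLOCK_CHAIN (next sibling).
struct block
{
  std::string name;
  block *chain;
};

// Records which action would be taken for each subblock, in order.
inline std::vector<std::string>
visit_subblocks (block *sub, bool unwrap_one)
{
  std::vector<std::string> actions;
  if (!unwrap_one || !sub)
    return actions;
  // The first subblock is unwrapped: its declarations go directly into
  // the inlined-subroutine DIE instead of a separate lexical block.
  actions.push_back ("unwrap " + sub->name);
  // Remaining siblings each still get their own block DIE.
  for (sub = sub->chain; sub; sub = sub->chain)
    actions.push_back ("block " + sub->name);
  return actions;
}
```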

Fixes: ac02e5b75451 ("re PR debug/37801 (DWARF output for inlined functions
  doesn't always use DW_TAG_inlined_subroutine)")

gcc/ChangeLog:

PR debug/87440
* dwarf2out.cc (gen_inlined_subroutine_die): Handle the case
of multiple subranges correctly.
---
some more context is here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87440#c5
Bootstrapped and regression-tested on x86_64-pc-linux-gnu, OK for trunk?

 gcc/dwarf2out.cc | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index 357efaa5990..346feeb53c8 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -25171,9 +25171,10 @@ gen_inlined_subroutine_die (tree stmt, dw_die_ref context_die)
  Do that by doing the recursion to subblocks on the single subblock
  of STMT.  */
   bool unwrap_one = false;
-  if (BLOCK_SUBBLOCKS (stmt) && !BLOCK_CHAIN (BLOCK_SUBBLOCKS (stmt)))
+  tree sub = BLOCK_SUBBLOCKS (stmt);
+  if (sub)
 {
-  tree origin = block_ultimate_origin (BLOCK_SUBBLOCKS (stmt));
+  tree origin = block_ultimate_origin (sub);
   if (origin
  && TREE_CODE (origin) == BLOCK
  && BLOCK_SUPERCONTEXT (origin) == decl)
@@ -25181,7 +25182,11 @@ gen_inlined_subroutine_die (tree stmt, dw_die_ref context_die)
 }
   decls_for_scope (stmt, subr_die, !unwrap_one);
   if (unwrap_one)
-decls_for_scope (BLOCK_SUBBLOCKS (stmt), subr_die);
+{
+  decls_for_scope (sub, subr_die);
+  for (sub = BLOCK_CHAIN (sub); sub; sub = BLOCK_CHAIN (sub))
+   gen_block_die (sub, subr_die);
+}
 }
 
 /* Generate a DIE for a field in a record, or structure.  CTX is required: see
-- 
2.39.2



Re: [PATCH v3 05/12] OpenMP: C++ front-end support for metadirectives

2024-08-16 Thread Jakub Jelinek
On Sat, Jul 20, 2024 at 02:42:24PM -0600, Sandra Loosemore wrote:
> +  const char *old_name = IDENTIFIER_POINTER (name);
> +  char *new_name = (char *) alloca (strlen (old_name) + 32);

XALLOCAVEC like for the C FE patch.

> +   /* FIXME: I believe it is an unimplemented feature rather
> +  than a user error to have non-constant expressions
> +  inside "declare variant".  */
> +   t = metadirective_p
> + ? cp_parser_expression (parser)
> + : cp_parser_constant_expression (parser);
> if (t != error_mark_node)
>   {
> t = fold_non_dependent_expr (t);
> -   if (!value_dependent_expression_p (t)
> +   if (!metadirective_p
> +   && !value_dependent_expression_p (t)
> && (!INTEGRAL_TYPE_P (TREE_TYPE (t))
> || !tree_fits_shwi_p (t)))
>   error_at (token->location, "property must be "
> "constant integer expression");
> +   if (metadirective_p
> +   && !INTEGRAL_TYPE_P (TREE_TYPE (t)))

Shouldn't this be && !type_dependent_expression_p (t) before the
!INTEGRAL_TYPE_P check?
I mean
template 
void
foo ()
{
  #pragma omp metadirective ... user={condition(N)} ...
...
}
should be valid, or just typename T and foo (T x) and condition(x).

> + error_at (token->location,
> +   "property must be integer expression");
> --- a/gcc/cp/parser.h
> +++ b/gcc/cp/parser.h
> @@ -450,6 +450,13 @@ struct GTY(()) cp_parser {
>/* Pointer to state for parsing omp_loops.  Managed by
>   cp_parser_omp_for_loop in parser.cc and not used outside that file.  */
>struct omp_for_parse_data * GTY((skip)) omp_for_parse_state;
> +
> +  /* Set if we are processing a statement body associated with a
> + metadirective variant.  */
> +  bool in_metadirective_body;
> +
> +  vec * GTY((skip)) metadirective_body_labels;
> +  unsigned metadirective_region_num;

Again, there is unnecessary padding here (pointer, 8-bit bool, pointer,
32-bit unsigned) and maybe put the stuff into a separate structure and just
use a pointer to it?  Like the omp_for_parse_state.  Though, it is less
important than in the C FE.
>  };
>  
>  /* In parser.cc  */
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 108e929b8ee..109121be501 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -17851,6 +17851,79 @@ tsubst_omp_clauses (tree clauses, enum c_omp_region_type ort,
>return new_clauses;
>  }
>  
> +/* Like tsubst_copy, but specifically for OpenMP context selectors.  */
> +static tree
> +tsubst_omp_context_selector (tree ctx, tree args, tsubst_flags_t complain,
> +  tree in_decl)
> +{
> +  tree new_ctx = NULL_TREE;
> +  for (tree set = ctx; set; set = TREE_CHAIN (set))
> +{
> +  tree selectors = NULL_TREE;
> +  for (tree sel = OMP_TSS_TRAIT_SELECTORS (set); sel;
> +sel = TREE_CHAIN (sel))
> + {
> +   enum omp_ts_code code = OMP_TS_CODE (sel);
> +   tree properties = NULL_TREE;
> +   tree score = OMP_TS_SCORE (sel);
> +   tree t;
> +
> +   if (score)
> + {
> +   score = tsubst_expr (score, args, complain, in_decl);
> +   score = fold_non_dependent_expr (score);

I think processing_template_decl can be true here for partial template
specializations, so I wonder whether in that case we should again avoid
diagnosing anything if score is still a value-dependent expression.

> +   switch (omp_ts_map[OMP_TS_CODE (sel)].tp_type)
> +   {
> +   case OMP_TRAIT_PROPERTY_DEV_NUM_EXPR:
> +   case OMP_TRAIT_PROPERTY_BOOL_EXPR:
> + t = tsubst_expr (OMP_TP_VALUE (OMP_TS_PROPERTIES (sel)),
> +  args, complain, in_decl);
> + t = fold_non_dependent_expr (t);
> + if (!INTEGRAL_TYPE_P (TREE_TYPE (t)))
> +   error_at (cp_expr_loc_or_input_loc (t),
> + "property must be integer expression");

And similarly here.  Also, where do we do instantiation of the declare
variant selectors (if we do that at all)?

Jakub



Re: [PATCH v2] c++: Ensure ANNOTATE_EXPRs remain outermost expressions in conditions [PR116140]

2024-08-16 Thread Alex Coplan
On 16/08/2024 12:47, Jakub Jelinek wrote:
> On Fri, Aug 16, 2024 at 11:38:03AM +0100, Alex Coplan wrote:
> > Looking at the calls to build3 (ANNOTATE_EXPR, ...) in cp/semantics.cc,
> > it looks like the other two operands of ANNOTATE_EXPRs are only ever
> > INTEGER_CSTs (the code in tree-cfg.cc:replace_loop_annotate_in_block
> > corroborates this).
> 
> As long as we don't add new ANNOTATE_EXPR kinds with non-constant arguments,
> but we don't have them right now.
> 
> > I think an INTEGER_CST C will always have:
> > 
> >   TREE_SIDE_EFFECTS (C) = TREE_READONLY (C) = 0
> 
> That is true.
> > 
> > and since the TREE_READONLY flags are conjunctive and TREE_SIDE_EFFECTS
> > flags are disjunctive, for an ANNOTATE_EXPR A we will necessarily have:
> > 
> >  - TREE_READONLY (A) = 0
> 
> No.  The TREE_READONLY computation is:
> read_only = 1;
> ...
> if (!TREE_READONLY (arg##N) \
> && !CONSTANT_CLASS_P (arg##N))  \
>   (void) (read_only = 0);   \
> While INTEGER_CST isn't TREE_READONLY, it is CONSTANT_CLASS_P.
> 
> >  - TREE_SIDE_EFFECTS (A) = TREE_SIDE_EFFECTS (TREE_OPERAND (A, 0))
> 
> So, unless we add non-INTEGER_CST extra arguments to ANNOTATE_EXPR,
>   TREE_READONLY (A) = TREE_READONLY (TREE_OPERAND (A, 0))
> || CONSTANT_CLASS_P (TREE_OPERAND (A, 0));

Ah, right.  I was going off memory of what we discussed so far and didn't look
at what PROCESS_ARG actually does.  Thanks.

In any case, this avoids the need for the checking in the change of
direction of the flags (although perhaps pushes the problem elsewhere in
that now we arguably need to check that operands 1 and 2 of each
ANNOTATE_EXPR are INTEGER_CSTs, unless we want to just rely on that
assumption unchecked).

> Not really sure if the first argument will ever be say INTEGER_CST,
> #pragma GCC unroll 8
> while (1)
> {
>   if (something)
> break;
> }
> ?
> 
>   Jakub
> 


Re: [PATCH v3 06/12] OpenMP: common c/c++ testcases for metadirectives

2024-08-16 Thread Jakub Jelinek
On Sat, Jul 20, 2024 at 02:42:25PM -0600, Sandra Loosemore wrote:
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/gomp/attrs-metadirective-1.c
> @@ -0,0 +1,41 @@
> +/* { dg-do compile { target { c || c++11 } } } */
> +/* { dg-options "-fopenmp -std=c23" { target { c } } } */
> +
> +#define N 100
> +
> +void f (int a[], int b[], int c[])

Unless testcases specifically test for other formatting, usually
they should follow the normal GNU coding conventions, so
void
f (int a[], int b[], int c[])

> +int main (void)

Ditto.

> +/* { dg-final { scan-tree-dump-times "#pragma omp metadirective" 1 "original" } } */
> +/* { dg-final { scan-tree-dump-times "when \\(device = .*arch.*nvptx.*\\):" 1 "original" } } */
> +/* { dg-final { scan-tree-dump-times "#pragma omp teams" 1 "original" } } */
> +/* { dg-final { scan-tree-dump-times "otherwise:" 1 "original" } } */
> +/* { dg-final { scan-tree-dump-times "#pragma omp parallel" 1 "original" } } */
> +/* { dg-final { scan-tree-dump-times "#pragma omp loop" 2 "original" } } */
> +
> +/* { dg-final { scan-tree-dump-times "#pragma omp metadirective" 1 "gimple" } } */
> +
> +/* { dg-final { scan-tree-dump-not "#pragma omp metadirective" "optimized" } } */

Have you tested all the new scan-tree-dump* testcases both in builds
configured with and without offloading?  Those very often suffer from dump
differences...

And, as I wrote in the C++ FE patch review, there should be test coverage
for when device_num or condition is used in a template and is not value or
type dependent, or when it is not type dependent but value dependent, and
when it is type dependent.

Jakub



[Fortran, Patch, PR46371, v1] Fix coarrays use in select type

2024-08-16 Thread Andre Vehreschild
Hi all,

The attached patch is a follow-up to the pr110033 patch and fixes two ICEs
reported in pr46371. The patch also fixes pr56496, although that may already
have been fixed by pr110033. I just added the testcase from pr56496 here
as coarray/select_type_3.f90 (I like it when the name of the test gives a rough
idea of what is tested instead of having just the pr#) to have it covered.

Bootstraps and regtests ok on x86_64-pc-linux-gnu. Ok for mainline?

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de
From 205e001e9df7d7b84667a16deee776d2cc8129ca Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Thu, 15 Aug 2024 20:23:23 +0200
Subject: [PATCH] [Fortran] Allow coarrays in select type. [PR46371, PR56496]

Fix ICE when scalar coarrays are used in a select type. Prevent
coindexing in associate/select type/select rank selector expression.

gcc/fortran/ChangeLog:

	PR fortran/46371
	PR fortran/56496

	* expr.cc (gfc_is_coindexed): Also detect coindexing when the
	expression has been rewritten to caf_get.
	* trans-stmt.cc (trans_associate_var): Always accept a
	descriptor for coarrays.

gcc/testsuite/ChangeLog:

	* gfortran.dg/coarray/select_type_1.f90: New test.
	* gfortran.dg/coarray/select_type_2.f90: New test.
	* gfortran.dg/coarray/select_type_3.f90: New test.
---
 gcc/fortran/expr.cc   |  4 +++
 gcc/fortran/trans-stmt.cc | 10 ++
 .../gfortran.dg/coarray/select_type_1.f90 | 34 +++
 .../gfortran.dg/coarray/select_type_2.f90 | 19 +++
 .../gfortran.dg/coarray/select_type_3.f90 | 23 +
 5 files changed, 83 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/coarray/select_type_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/coarray/select_type_2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/coarray/select_type_3.f90

diff --git a/gcc/fortran/expr.cc b/gcc/fortran/expr.cc
index d3a1f8c0ba1..4f2d80c04f8 100644
--- a/gcc/fortran/expr.cc
+++ b/gcc/fortran/expr.cc
@@ -5803,6 +5803,10 @@ gfc_is_coindexed (gfc_expr *e)
 {
   gfc_ref *ref;

+  if (e->expr_type == EXPR_FUNCTION && e->value.function.isym
+  && e->value.function.isym->id == GFC_ISYM_CAF_GET)
+e = e->value.function.actual->expr;
+
   for (ref = e->ref; ref; ref = ref->next)
 if (ref->type == REF_ARRAY && ref->u.ar.codimen > 0)
   return !gfc_ref_this_image (ref);
diff --git a/gcc/fortran/trans-stmt.cc b/gcc/fortran/trans-stmt.cc
index 3b09a139dc0..023b1739b85 100644
--- a/gcc/fortran/trans-stmt.cc
+++ b/gcc/fortran/trans-stmt.cc
@@ -2200,16 +2200,12 @@ trans_associate_var (gfc_symbol *sym, gfc_wrapped_block *block)
 		  else
 		stmp = gfc_class_data_get (ctmp);

-		  /* Coarray scalar component expressions can emerge from
-		 the front end as array elements of the _data field.  */
-		  if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (stmp)))
-		stmp = gfc_conv_descriptor_data_get (stmp);
-
-		  if (!POINTER_TYPE_P (TREE_TYPE (stmp)))
+		  if (!CLASS_DATA (sym)->attr.codimension
+		  && !POINTER_TYPE_P (TREE_TYPE (stmp)))
 		stmp = gfc_build_addr_expr (NULL, stmp);

 		  dtmp = gfc_class_data_get (ctree);
-		  stmp = fold_convert (TREE_TYPE (dtmp), stmp);
+		  stmp = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (dtmp), stmp);
 		  gfc_add_modify (&se.pre, dtmp, stmp);
 		  stmp = gfc_class_vptr_get (ctmp);
 		  dtmp = gfc_class_vptr_get (ctree);
diff --git a/gcc/testsuite/gfortran.dg/coarray/select_type_1.f90 b/gcc/testsuite/gfortran.dg/coarray/select_type_1.f90
new file mode 100644
index 000..7f12fb9aec7
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/coarray/select_type_1.f90
@@ -0,0 +1,34 @@
+!{ dg-do run }
+
+! Check PR46371 is fixed.
+! Contributed by Tobias Burnus  
+
+program pr46371
+  type :: foo
+integer :: i = 0
+  end type
+
+  class(foo), allocatable :: o_foo[:]
+  integer :: j
+
+  allocate(foo :: o_foo[*])
+  if (this_image() == 1) then
+
+select type(a => o_foo)
+  type is(foo)
+  j = a[1]%i
+  a[1]%i = 3
+end select
+
+if (j /= 0) stop 1
+
+select type(o_foo)
+  type is(foo)
+  j = o_foo[1]%i
+end select
+
+if (o_foo[1]%i /= 3) stop 2
+if (j /= 3) stop 3
+  end if
+end program pr46371
+
diff --git a/gcc/testsuite/gfortran.dg/coarray/select_type_2.f90 b/gcc/testsuite/gfortran.dg/coarray/select_type_2.f90
new file mode 100644
index 000..1694d095708
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/coarray/select_type_2.f90
@@ -0,0 +1,19 @@
+!{ dg-do compile }
+
+! Check PR46371 is fixed.
+! Contributed by Tobias Burnus  
+
+program pr46371
+  type :: foo
+integer :: i = 0
+  end type
+
+  class(foo), allocatable :: o_foo[:]
+  integer :: j
+
+  select type(a => o_foo[2])  !{ dg-error "must not be coindexed" }
+type is(foo)
+j = a%i
+  end select
+end program pr46371
+
diff --git a/gcc/testsuite/gfortran.dg/coarray/select_type_3.f90 b/gcc/testsuite/gfortran.dg/coarray/select_type_3.f90
new file mode 100644
index 000

RE: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]

2024-08-16 Thread Li, Pan2
Hi there,

Please feel free to let me know if you don't have authority to commit it. I can 
help to commit this patch.

Pan


-Original Message-
From: Kito Cheng  
Sent: Friday, August 16, 2024 3:48 PM
To: 曾治金 
Cc: gcc-patches@gcc.gnu.org; gcc-b...@gcc.gnu.org
Subject: Re: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value 
[PR116305]

LGTM, thanks for fixing that :)

On Wed, Aug 14, 2024 at 2:06 PM 曾治金  wrote:
>
> This patch is to fix the bug (BugId:116305) introduced by the commit
> bd93ef for risc-v target.
>
> The commit bd93ef changes the chunk_num from 1 to TARGET_MIN_VLEN/128
> if TARGET_MIN_VLEN is larger than 128 in riscv_convert_vector_bits. So
> it changes the value of BYTES_PER_RISCV_VECTOR. For example, before
> merging the commit bd93ef and if TARGET_MIN_VLEN is 256, the value
> of BYTES_PER_RISCV_VECTOR should be [8, 8], but now [16, 16]. The value
> of riscv_bytes_per_vector_chunk and BYTES_PER_RISCV_VECTOR are no longer
> equal.
>
> Prologue will use BYTES_PER_RISCV_VECTOR.coeffs[1] to estimate the vlenb
> register value in riscv_legitimize_poly_move, and dwarf2cfi will also
> get the estimated vlenb register value in riscv_dwarf_poly_indeterminate_value
> to calculate the number of times to multiply the vlenb register value.
>
> So need to change the factor from riscv_bytes_per_vector_chunk to
> BYTES_PER_RISCV_VECTOR, otherwise we will get the incorrect dwarf
> information. The incorrect example as follow:
>
> ```
> csrr    t0,vlenb
> slli    t1,t0,1
> sub     sp,sp,t1
>
> .cfi_escape 0xf,0xb,0x72,0,0x92,0xa2,0x38,0,0x34,0x1e,0x23,0x50,0x22
> ```
>
> The sequence '0x92,0xa2,0x38,0' means the vlenb register, '0x34' means
> the literal 4, '0x1e' means the multiply operation. But in fact, the
> vlenb register value just need to multiply the literal 2.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_dwarf_poly_indeterminate_value):
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/scalable_vector_cfi.c: New test.
>
> Signed-off-by: Zhijin Zeng 
> ---
>  gcc/config/riscv/riscv.cc |  4 +--
>  .../riscv/rvv/base/scalable_vector_cfi.c  | 32 +++
>  2 files changed, 34 insertions(+), 2 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 5fe4273beb7..e740fc159dd 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -10773,12 +10773,12 @@ static unsigned int
>  riscv_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor,
>   int *offset)
>  {
> -  /* Polynomial invariant 1 == (VLENB / riscv_bytes_per_vector_chunk) - 1.
> +  /* Polynomial invariant 1 == (VLENB / BYTES_PER_RISCV_VECTOR) - 1.
>   1. TARGET_MIN_VLEN == 32, polynomial invariant 1 == (VLENB / 4) - 1.
>   2. TARGET_MIN_VLEN > 32, polynomial invariant 1 == (VLENB / 8) - 1.
>*/
>gcc_assert (i == 1);
> -  *factor = riscv_bytes_per_vector_chunk;
> +  *factor = BYTES_PER_RISCV_VECTOR.coeffs[1];
>*offset = 1;
>return RISCV_DWARF_VLENB;
>  }
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c
> new file mode 100644
> index 000..184da10caf3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c
> @@ -0,0 +1,32 @@
> +/* { dg-do compile } */
> +/* { dg-options "-g -O3 -march=rv64gcv -mabi=lp64d" } */
> +/* { dg-skip-if "" { *-*-* } {"-O2" "-O1" "-O0" "-Og" "-Oz" "-flto"} } */
> +/* { dg-final { scan-assembler {cfi_escape .*0x92,0xa2,0x38,0,0x32,0x1e} } } */
> +
> +#include "riscv_vector.h"
> +
> +#define PI_2 1.570796326795
> +
> +extern void func(float *result);
> +
> +void test(const float *ys, const float *xs, float *result, size_t length) {
> +    size_t gvl = __riscv_vsetvlmax_e32m2();
> +    vfloat32m2_t vpi2 = __riscv_vfmv_v_f_f32m2(PI_2, gvl);
> +
> +    for(size_t i = 0; i < length;) {
> +        gvl = __riscv_vsetvl_e32m2(length - i);
> +        vfloat32m2_t y = __riscv_vle32_v_f32m2(ys, gvl);
> +        vfloat32m2_t x = __riscv_vle32_v_f32m2(xs, gvl);
> +        vbool16_t mask0  = __riscv_vmflt_vv_f32m2_b16(x, y, gvl);
> +        vfloat32m2_t fixpi = __riscv_vfrsub_vf_f32m2_mu(mask0, vpi2, vpi2, 0, gvl);
> +
> +        __riscv_vse32_v_f32m2(result, fixpi, gvl);
> +
> +        func(result);
> +
> +        i += gvl;
> +        ys += gvl;
> +        xs += gvl;
> +        result += gvl;
> +    }
> +}
> --
> 2.34.1
>
>

Re: [PATCH v3 07/12] OpenMP: Fortran front-end support for metadirectives.

2024-08-16 Thread Jakub Jelinek
On Sat, Jul 20, 2024 at 02:42:26PM -0600, Sandra Loosemore wrote:
> This patch adds support for metadirectives to the Fortran front end.
> +  else if (c->op == EXEC_OMP_METADIRECTIVE)
> +{
> +  gfc_omp_variant *variant
> + = c->ext.omp_variants;

Why two lines?  This is short enough to fit on one.

> +  if (begin_p && directive != ST_NONE
> +   && gfc_omp_end_stmt (directive) == ST_NONE)

When the whole condition doesn't fit on one line, each && (or ||) should
be on a separate line.

> +   gfc_error (
> + "Unexpected %s statement in an OMP METADIRECTIVE block at %C",
> + gfc_ascii_statement (st));

Please avoid calls with opening ( at end of line if possible, they are too
ugly.
  gfc_error ("Unexpected %s statement in an OMP METADIRECTIVE "
 "block at %C", gfc_ascii_statement (st));
looks better.

> + case ST_OMP_END_METADIRECTIVE:
> +   if (gfc_state_stack->state == COMP_OMP_BEGIN_METADIRECTIVE)
> + {
> +   st = next_statement ();
> +   return st;
> + }
> +   /* FALLTHRU */

When the stmt to fall through is just return st;, /* FALLTHRU */
seems unnecessary.  Just do
  if (gfc_state_stack->state == COMP_OMP_BEGIN_METADIRECTIVE)
    return next_statement ();
  else
    return st;
> +
>   default:
> return st;
>   }

> --- a/gcc/fortran/trans-decl.cc
> +++ b/gcc/fortran/trans-decl.cc
> @@ -331,7 +331,10 @@ gfc_get_label_decl (gfc_st_label * lp)
>gcc_assert (lp != NULL && lp->value <= MAX_LABEL_VALUE);
>  
>/* Build a mangled name for the label.  */
> -  sprintf (label_name, "__label_%.6d", lp->value);
> +  if (lp->omp_region)
> + sprintf (label_name, "__label_%d_%.6d", lp->omp_region, lp->value);
> +  else
> + sprintf (label_name, "__label_%.6d", lp->value);

Makes me wonder what will happen if there are nested metadirectives (or is
omp_region unique in the whole function or TU rather than just in one
metadirective)?

!$omp metadirective ... say conditional teams here
!$omp metadirective ... say conditional parallel here
some code that needs labels
!$omp end metadirective
!$omp end metadirective
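
To make the question concrete, here is a minimal C sketch of the naming scheme from the hunk above. It only mirrors the two format strings; format_label_name and its parameters are hypothetical stand-ins, not the gfc_get_label_decl interface, and it assumes omp_region is an integer that is zero outside any metadirective.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the label mangling in the hunk above:
   a nonzero omp_region is folded into the name, otherwise the old
   "__label_NNNNNN" form is used.  */
static void
format_label_name (char *buf, size_t size, int omp_region, int value)
{
  assert (buf != NULL && size > 0);
  if (omp_region)
    snprintf (buf, size, "__label_%d_%.6d", omp_region, value);
  else
    snprintf (buf, size, "__label_%.6d", value);
}
```

With this scheme, label 10 in regions 1 and 2 would become "__label_1_000010" and "__label_2_000010", so the names stay distinct only as long as the nested metadirectives really receive different omp_region values. If the region number were reused between an outer and an inner metadirective, the two labels would collide.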

Jakub



RE: [PATCH v2] RISC-V: Make sure high bits of usadd operands is clean for HI/QI [PR116278]

2024-08-16 Thread Li, Pan2
Thanks, Jeff and Andrew, for the comments.

> What's more important is that we get the RTL semantics right, the fact
> that it seems to work due to addiw seems to be more of an accident than
> by design.

SImode has been handled differently since day 1; the handling follows the
algorithm only up to a point.

  if (mode == SImode && mode != Xmode)
    { /* Take addw to avoid the sum truncate.  */
      rtx simode_sum = gen_reg_rtx (SImode);
      riscv_emit_binary (PLUS, simode_sum, x, y);
      emit_move_insn (xmode_sum, gen_lowpart (Xmode, simode_sum));
    }

> I think your overall point still holds, though.

Got the point, but I would like to double-check that the two additional insns
below are acceptable for this change (or we can eliminate them later).

sat_u_add_uint32_t_fmt_1:
slli    a5,a0,32   // additional insn for taking care SI in rv64
srli    a5,a5,32   // Ditto.
addw    a0,a0,a1
sltu    a5,a0,a5
neg     a5,a5
or      a0,a5,a0
sext.w  a0,a0
ret

If so, I will prepare the v3 for the SImode in RV64.
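
For reference, the branchless form under discussion can be sketched in plain C. This is only an illustration of the arithmetic, not the RTL that the RISC-V expander emits; all three function names are made up for the sketch, and the "unmasked" variant is a deliberately wrong version that shows what happens when the sum is compared without first truncating it to the narrow mode.

```c
#include <stdint.h>

/* Hypothetical helper: 16-bit unsigned saturating add.  x + y is computed
   in int after integer promotion, so 65535 + 2 is 65537 at that point; the
   cast truncates it to 1, which plays the role of clearing the high bits.  */
static uint16_t
sat_u_add_u16 (uint16_t x, uint16_t y)
{
  uint16_t sum = (uint16_t) (x + y);
  return (uint16_t) (sum | -(sum < x));
}

/* Hypothetical counter-example: the same computation carried out in a
   wider register without truncating the sum.  The overflow test compares
   65537 against 65535 and misses the wrap.  */
static uint16_t
sat_u_add_u16_unmasked (uint16_t x, uint16_t y)
{
  uint32_t wide_x = x;
  uint32_t wide_sum = wide_x + y;
  uint32_t mask = -(uint32_t) (wide_sum < wide_x);
  return (uint16_t) (wide_sum | mask);
}

/* Hypothetical helper: 32-bit unsigned saturating add.  The sum already
   wraps in 32 bits, so no extra truncation is needed at the C level.  */
static uint32_t
sat_u_add_u32 (uint32_t x, uint32_t y)
{
  uint32_t sum = x + y;
  return sum | -(uint32_t) (sum < x);
}
```

At the C level the 32-bit case needs no masking because uint32_t arithmetic wraps by definition. The open question in this thread is a different one: whether the RTL for SImode on rv64 may rely on how the operands happen to be extended in 64-bit registers.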

Pan

-Original Message-
From: Andrew Waterman  
Sent: Friday, August 16, 2024 12:28 PM
To: Jeff Law 
Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2] RISC-V: Make sure high bits of usadd operands is clean 
for HI/QI [PR116278]

On Thu, Aug 15, 2024 at 9:23 PM Jeff Law  wrote:
>
>
>
> On 8/13/24 10:16 PM, Li, Pan2 wrote:
> >> How specifically is it avoided for SI?  ISTM it should have the exact
> >> same problem with a constant like 0x8000 in SImode on rv64 which is
> >> going to be extended to 0x8000.
> >
> > HI and QI need some special handling for sum. For example, for HImode.
> >
> > 65535 + 2 = 65537, when compare sum and 2, we need to cleanup the high bits 
> > (aka make 65537 become 1) to tell the HImode overflow.
> > Thus, for HI and QI, we need to clean up highest bits of mode.
> >
> > But for SI, we don't need that as we have addw insn, the sign extend will 
> > take care of this as well as the sltu. For example, SImode.
> >
> > lw      a1,0(a5)  // a1 is -40, aka 0xffd8
> > lui     a0,0x1a   //
> > addiw   a5,a1,9   // a5 is -31, aka 0xffe1
> > // For QI and HI, we need to mask the high bits, but not applicable for SI.
> > sltu    a1,a5,a1  // compare a1 and a5, a5 > a1, then no-overflow as expected.
> What's more important is that we get the RTL semantics right, the fact
> that it seems to work due to addiw seems to be more of an accident than
> by design.  Also note that addiw isn't available unless ZBA is enabled,
> so we don't want to depend on that to save us.

addiw is always available in RV64; you're probably thinking of add.uw,
which is an RV64_Zba instruction.  I think your overall point still
holds, though.

>
> I still think we should be handling SI on rv64 in a manner similar to
> QI/HI are handled on rv32/rv64.
>
> jeff
>


Re: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]

2024-08-16 Thread Zhijin Zeng
Hi Pan,
I am new to GCC and don't have commit access. Please help to commit this
patch. Thank you very much.
Zhijin


Re: [PATCH v3 08/12] OpenMP: Reject other properties with kind(any)

2024-08-16 Thread Jakub Jelinek
On Sat, Jul 20, 2024 at 02:42:27PM -0600, Sandra Loosemore wrote:
> The OpenMP spec says:
> 
> "If trait-property any is specified in the kind trait-selector of the
> device selector set or the target_device selector sets, no other
> trait-property may be specified in the same selector set."

That is an OpenMP 5.1 addition; the code was written for OpenMP 5.0, so it was
valid at that time.

> GCC was not previously enforcing this restriction and several testcases
> included such valid constructs.  This patch fixes it.
> 
> gcc/ChangeLog
>   * omp-general.cc (omp_check_context_selector): Reject other
>   properties in the same selector set with kind(any).
> 
> gcc/testsuite/ChangeLog
>   * c-c++-common/gomp/declare-variant-10.c: Fix broken tests.
>   * c-c++-common/gomp/declare-variant-3.c: Likewise.
>   * c-c++-common/gomp/declare-variant-9.c: Likewise.
>   * c-c++-common/gomp/declare-variant-any.c: New.
>   * gfortran.dg/gomp/declare-variant-10.f90: Fix broken tests.
>   * gfortran.dg/gomp/declare-variant-3.f90: Likewise.
>   * gfortran.dg/gomp/declare-variant-9.f90: Likewise.
>   * gfortran.dg/gomp/declare-variant-any.f90: Likewise.
> ---
>  gcc/omp-general.cc| 31 +++
>  .../c-c++-common/gomp/declare-variant-10.c|  4 +--
>  .../c-c++-common/gomp/declare-variant-3.c | 10 ++
>  .../c-c++-common/gomp/declare-variant-9.c |  4 +--
>  .../c-c++-common/gomp/declare-variant-any.c   | 10 ++
>  .../gfortran.dg/gomp/declare-variant-10.f90   |  4 +--
>  .../gfortran.dg/gomp/declare-variant-3.f90| 12 ++-
>  .../gfortran.dg/gomp/declare-variant-9.f90|  2 +-
>  .../gfortran.dg/gomp/declare-variant-any.f90  | 28 +
>  9 files changed, 82 insertions(+), 23 deletions(-)
>  create mode 100644 gcc/testsuite/c-c++-common/gomp/declare-variant-any.c
>  create mode 100644 gcc/testsuite/gfortran.dg/gomp/declare-variant-any.f90
> 
> diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
> index 87a245ec8b3..12f178c5a2d 100644
> --- a/gcc/omp-general.cc
> +++ b/gcc/omp-general.cc
> @@ -1288,6 +1288,8 @@ omp_check_context_selector (location_t loc, tree ctx, 
> bool metadirective_p)
>for (tree tss = ctx; tss; tss = TREE_CHAIN (tss))
>  {
>enum omp_tss_code tss_code = OMP_TSS_CODE (tss);
> +  bool saw_any_prop = false;
> +  bool saw_other_prop = false;
>  
>/* FIXME: not implemented yet.  */
>if (!metadirective_p && tss_code == OMP_TRAIT_SET_TARGET_DEVICE)
> @@ -1325,6 +1327,27 @@ omp_check_context_selector (location_t loc, tree ctx, 
> bool metadirective_p)
> else
>   ts_seen[ts_code] = true;
>  
> +
> +   /* If trait-property "any" is specified in the "kind"
> +  trait-selector of the "device" selector set or the
> +  "target_device" selector sets, no other trait-property
> +  may be specified in the same selector set.  */
> +   if (ts_code == OMP_TRAIT_DEVICE_KIND)
> + for (tree p = OMP_TS_PROPERTIES (ts); p; p = TREE_CHAIN (p))
> +   {
> + const char *prop = omp_context_name_list_prop (p);
> + if (!prop)
> +   continue;
> + else if (strcmp (prop, "any") == 0)
> +   saw_any_prop = true;
> + else
> +   saw_other_prop = true;
> +   }
> + else if (ts_code == OMP_TRAIT_DEVICE_ARCH

The indentation looks wrong here, should be indented by 2 more columns to
the right.

Anyway, while 5.0/5.1/5.2 had
"Each trait-selector-name can only be specified once."
restriction and in 5.0 it made perfect sense, with the addition
of target_device set in 5.1 this seems to be weird and TR13 has
"Each trait-selector-name may only be specified once in a trait selector set."
So, in 5.1/5.2 pedantically
  device={kind(host)},target_device={device_num(whatever),kind(nohost)}
was invalid and I think the code still rejects it, for 6.0 it will be
valid, so wonder if we shouldn't have
  OMP_TRAIT_TARGET_DEVICE_KIND,
  OMP_TRAIT_TARGET_DEVICE_ISA,
  OMP_TRAIT_TARGET_DEVICE_ARCH,
and whether
  OMP_TRAIT_DEVICE_NUM,
is appropriate and shouldn't be instead
  OMP_TRAIT_TARGET_DEVICE_DEVICE_NUM,

Also wonder if
"If trait-property any is specified in the kind trait-selector of the device
selector set or the target_device selector sets, no other trait-property may
be specified in the same selector set."
shouldn't have an exception for device_num, one could possibly just want to
declare that some particular device_num has any kind, so
  target_device={device_num(whatever),kind(any)}
rather than just the only pedantically allowed
  target_device={kind(any)}
where the latter specifies something (well, nothing) about the default device
num while the former would specify something (well, nothing) about a chosen
device num.
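
The restriction and the possible device_num exception can be sketched as follows. This is a simplified model under stated assumptions, not the omp-general.cc data structures: the enum, struct selector, check_kind_any and the allow_device_num flag are all hypothetical, and every selector other than kind is assumed to count as "another trait-property".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of one trait selector inside a
   device or target_device selector set.  */
enum trait { TRAIT_KIND, TRAIT_ISA, TRAIT_ARCH, TRAIT_DEVICE_NUM };

struct selector
{
  enum trait code;
  const char *const *props;	/* Property names; unused for device_num.  */
  size_t nprops;
};

/* Return true if the selector set is acceptable.  With allow_device_num
   set, device_num is exempt from the "no other trait-property" rule,
   which is the exception wondered about above.  */
static bool
check_kind_any (const struct selector *sel, size_t n, bool allow_device_num)
{
  bool saw_any = false;
  bool saw_other = false;

  for (size_t i = 0; i < n; i++)
    {
      if (sel[i].code == TRAIT_DEVICE_NUM && allow_device_num)
	continue;
      if (sel[i].code != TRAIT_KIND)
	{
	  saw_other = true;
	  continue;
	}
      for (size_t j = 0; j < sel[i].nprops; j++)
	if (strcmp (sel[i].props[j], "any") == 0)
	  saw_any = true;
	else
	  saw_other = true;
    }
  return !(saw_any && saw_other);
}
```

Under this model, kind(any) on its own is accepted, kind(any) together with an arch selector is rejected, and device_num(whatever) with kind(any) is accepted only when allow_device_num is set.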

Anyway, if OMP_TRAIT_TARGET_DEVICE_{KIND,ISA,ARCH} are separate from
OMP_TRAIT_DEVICE_{KIND,ISA,ARCH}, all this checking could be d

[PATCH] testsuite: Add -fshort-enums to pr33738.C

2024-08-16 Thread Torbjörn SVENSSON
Ok for trunk and releases/gcc-14?

--

For some targets, like Cortex-M on arm-none-eabi, -fshort-enums is
enabled by default. For these targets, the test case fails as
sizeof(Alpha) < sizeof(int).
To make the test case behave identically for targets that enable
-fshort-enums and those that do not, force the option in the test
case and verify that the warning is emitted.

Regtested on x86_64-pc-linux-gnu and arm-none-eabi.

gcc/testsuite/ChangeLog:

* g++.dg/warn/pr33738.C: Added -fshort-enums and removed xfail.

Signed-off-by: Torbjörn SVENSSON 
---
 gcc/testsuite/g++.dg/warn/pr33738.C | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/g++.dg/warn/pr33738.C b/gcc/testsuite/g++.dg/warn/pr33738.C
index 73e98d5e083..84bbdaeecc7 100644
--- a/gcc/testsuite/g++.dg/warn/pr33738.C
+++ b/gcc/testsuite/g++.dg/warn/pr33738.C
@@ -1,5 +1,5 @@
 // { dg-do run }
-// { dg-options "-O2 -Wtype-limits -fstrict-enums" }
+// { dg-options "-O2 -Wtype-limits -fstrict-enums -fshort-enums" }
 extern void link_error (void);
 
 enum Alpha {
@@ -15,11 +15,11 @@ int GetM1() {
 
 int main() {
  a2 = static_cast<Alpha>(GetM1());
- if (a2 == -1) {   // { dg-warning "always false due" "" { xfail *-*-* } } */
+ if (a2 == -1) {   // { dg-warning "always false due" } */
 link_error ();
  }
  a2 = static_cast<Alpha>(GetM1());
- if (-1 == a2) {   // { dg-warning "always false due" "" { xfail *-*-* } } */
+ if (-1 == a2) {   // { dg-warning "always false due" } */
 link_error ();
  }
  return 0;
-- 
2.25.1



Re: [PATCH] testsuite: Add -fshort-enums to pr33738.C

2024-08-16 Thread Jakub Jelinek
On Fri, Aug 16, 2024 at 02:58:05PM +0200, Torbjörn SVENSSON wrote:
> Ok for trunk and releases/gcc-14?
> 
> --
> 
> For some targets, like Cortex-M on arm-none-eabi, -fshort-enums is
> enabled by default. For these targets, the test case fails as
> sizeof(Alpha) < sizeof(int).
> To make the test case behave identically for targets that enable
> -fshort-enums and those that do not, force the option in the test
> case and verify that the warning is emitted.
> 
> Regtested on x86_64-pc-linux-gnu and arm-none-eabi.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/warn/pr33738.C: Added -fshort-enums and removed xfail.

That looks wrong, what the test tested is no longer tested on most arches.

Better would be to use -fno-short-enums explicitly, and add another test
which #includes (or copies over) this test and has -fshort-enums and tests
what happens in that case.
> 
> Signed-off-by: Torbjörn SVENSSON 

Jakub



RE: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]

2024-08-16 Thread Li, Pan2
Is this your newest version?
https://patchwork.sourceware.org/project/gcc/patch/8fd4328940034d8778cca67eaad54e5a2c2b1a6c.1c2f51e1.0a9a.4367.9762.9b6eccc3b...@feishu.cn/

If so, you may need to rebase on upstream; I got a conflict when running git am.

Applying: RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]
error: corrupt patch at line 20
Patch failed at 0001 RISC-V: Fix factor in dwarf_poly_indeterminate_value 
[PR116305]
hint: Use 'git am --show-current-patch=diff' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Pan


Re: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]

2024-08-16 Thread Zhijin Zeng
Sorry, the line numbers changed. The newest version is as follows:

This patch is to fix the bug (BugId:116305) introduced by the commit
bd93ef for risc-v target.

The commit bd93ef changes the chunk_num from 1 to TARGET_MIN_VLEN/128
if TARGET_MIN_VLEN is larger than 128 in riscv_convert_vector_bits. So
it changes the value of BYTES_PER_RISCV_VECTOR. For example, before
merging the commit bd93ef and if TARGET_MIN_VLEN is 256, the value
of BYTES_PER_RISCV_VECTOR should be [8, 8], but now [16, 16]. The value
of riscv_bytes_per_vector_chunk and BYTES_PER_RISCV_VECTOR are no longer
equal.

Prologue will use BYTES_PER_RISCV_VECTOR.coeffs[1] to estimate the vlenb
register value in riscv_legitimize_poly_move, and dwarf2cfi will also
get the estimated vlenb register value in riscv_dwarf_poly_indeterminate_value
to calculate the number of times to multiply the vlenb register value.

So we need to change the factor from riscv_bytes_per_vector_chunk to
BYTES_PER_RISCV_VECTOR, otherwise we will get incorrect DWARF
information. An incorrect example follows:

```
csrr    t0,vlenb
slli    t1,t0,1
sub     sp,sp,t1

.cfi_escape 0xf,0xb,0x72,0,0x92,0xa2,0x38,0,0x34,0x1e,0x23,0x50,0x22
```

The sequence '0x92,0xa2,0x38,0' refers to the vlenb register, '0x34' is
the literal 4, and '0x1e' is the multiply operation. But in fact, the
vlenb register value only needs to be multiplied by the literal 2.
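The byte sequence above can be checked by hand. The following is a minimal
sketch, not part of the patch: it decodes the "register, literal, multiply"
fragment of the expression using the standard DWARF opcode values
(DW_OP_bregx = 0x92, DW_OP_lit0 = 0x30, DW_OP_mul = 0x1e). The helper names
read_uleb128 and vlenb_multiplier are hypothetical, and the sketch assumes the
register offset is the single byte 0, as in the example. The ULEB128 bytes
0xa2,0x38 decode to 7202, which matches 4096 + 0xc22, the DWARF number for the
vlenb CSR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode values taken from the DWARF specification.  */
enum
{
  DW_OP_mul = 0x1e,
  DW_OP_lit0 = 0x30,  /* DW_OP_lit0 .. DW_OP_lit31 occupy 0x30 .. 0x4f.  */
  DW_OP_lit31 = 0x4f,
  DW_OP_bregx = 0x92
};

/* Decode one unsigned LEB128 value starting at BUF[*POS]; advance *POS.  */
static uint64_t
read_uleb128 (const uint8_t *buf, size_t len, size_t *pos)
{
  uint64_t value = 0;
  unsigned shift = 0;

  assert (buf != NULL && pos != NULL);
  while (*pos < len && shift < 64)
    {
      uint8_t byte = buf[(*pos)++];
      value |= (uint64_t) (byte & 0x7f) << shift;
      if ((byte & 0x80) == 0)
        break;
      shift += 7;
    }
  return value;
}

/* Hypothetical helper: expect "DW_OP_bregx <reg> 0; DW_OP_lit<N>; DW_OP_mul"
   at OPS.  Store the register number in *REG and return N, or -1 if the
   bytes do not have that shape.  */
int
vlenb_multiplier (const uint8_t *ops, size_t len, uint64_t *reg)
{
  size_t pos = 0;

  if (len < 1 || ops[pos++] != DW_OP_bregx)
    return -1;
  *reg = read_uleb128 (ops, len, &pos);
  if (pos >= len || ops[pos++] != 0)  /* Register offset, assumed to be 0.  */
    return -1;
  if (pos + 1 >= len)                 /* Need the literal and the multiply.  */
    return -1;
  uint8_t lit = ops[pos++];
  if (lit < DW_OP_lit0 || lit > DW_OP_lit31)
    return -1;
  if (ops[pos] != DW_OP_mul)
    return -1;
  return lit - DW_OP_lit0;
}
```

Feeding it the bytes 0x92,0xa2,0x38,0,0x34,0x1e yields a multiplier of 4,
while the sequence the new test scans for (0x32 in place of 0x34) yields 2,
which agrees with the `slli t1,t0,1` in the prologue.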

gcc/ChangeLog:

        * config/riscv/riscv.cc (riscv_dwarf_poly_indeterminate_value):

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/base/scalable_vector_cfi.c: New test.

Signed-off-by: Zhijin Zeng 
---
 gcc/config/riscv/riscv.cc                     |  4 +--
 .../riscv/rvv/base/scalable_vector_cfi.c      | 32 +++
 2 files changed, 34 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 1f60d8f9711..8b7123e043e 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -11010,12 +11010,12 @@ static unsigned int
 riscv_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor,
                                      int *offset)
 {
-  /* Polynomial invariant 1 == (VLENB / riscv_bytes_per_vector_chunk) - 1.
+  /* Polynomial invariant 1 == (VLENB / BYTES_PER_RISCV_VECTOR) - 1.
      1. TARGET_MIN_VLEN == 32, polynomial invariant 1 == (VLENB / 4) - 1.
      2. TARGET_MIN_VLEN > 32, polynomial invariant 1 == (VLENB / 8) - 1.
   */
   gcc_assert (i == 1);
-  *factor = riscv_bytes_per_vector_chunk;
+  *factor = BYTES_PER_RISCV_VECTOR.coeffs[1];
   *offset = 1;
   return RISCV_DWARF_VLENB;
 }
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c b/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c
new file mode 100644
index 000..184da10caf3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-g -O3 -march=rv64gcv -mabi=lp64d" } */
+/* { dg-skip-if "" { *-*-* } {"-O2" "-O1" "-O0" "-Og" "-Oz" "-flto"} } */
+/* { dg-final { scan-assembler {cfi_escape .*0x92,0xa2,0x38,0,0x32,0x1e} } } */
+
+#include "riscv_vector.h"
+
+#define PI_2 1.570796326795
+
+extern void func(float *result);
+
+void test(const float *ys, const float *xs, float *result, size_t length) {
+    size_t gvl = __riscv_vsetvlmax_e32m2();
+    vfloat32m2_t vpi2 = __riscv_vfmv_v_f_f32m2(PI_2, gvl);
+
+    for(size_t i = 0; i < length;) {
+        gvl = __riscv_vsetvl_e32m2(length - i);
+        vfloat32m2_t y = __riscv_vle32_v_f32m2(ys, gvl);
+        vfloat32m2_t x = __riscv_vle32_v_f32m2(xs, gvl);
+        vbool16_t mask0  = __riscv_vmflt_vv_f32m2_b16(x, y, gvl);
+        vfloat32m2_t fixpi = __riscv_vfrsub_vf_f32m2_mu(mask0, vpi2, vpi2, 0, gvl);
+
+        __riscv_vse32_v_f32m2(result, fixpi, gvl);
+
+        func(result);
+
+        i += gvl;
+        ys += gvl;
+        xs += gvl;
+        result += gvl;
+    }
+}
--
2.34.1

> From: "Li, Pan2"
> Date:  Fri, Aug 16, 2024, 21:05
> Subject:  RE: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]
> To: "Zhijin Zeng"
> Cc: "gcc-patches@gcc.gnu.org", 
> "gcc-b...@gcc.gnu.org", "Kito 
> Cheng"
> Is this your newest version?
> https://patchwork.sourceware.org/project/gcc/patch/8fd4328940034d8778cca67eaad54e5a2c2b1a6c.1c2f51e1.0a9a.4367.9762.9b6eccc3b...@feishu.cn/
> 
> If so, you may need to rebase on upstream; I got a conflict when running git am.
> 
> Applying: RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]
> error: corrupt patch at line 20
> Patch failed at 0001 RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]
> hint: Use 'git am --show-current-patch=diff' to see the failed patch
> When you have resolved this problem, run "git am --continue".
> If you prefer to skip this patch, run "git am --skip" instead.
> To restore the original branch and stop patching, run "git am --abort".
> 
> P

[PATCH] c++: ICE with enum and conversion fn in template [PR115657]

2024-08-16 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Here we initialize an enumerator with a class prvalue with a conversion
function.  When we fold it in build_enumerator, we create a TARGET_EXPR
for the object, and subsequently crash in tsubst_expr, which should not
see such a code.

Normally, we fix similar problems by using an IMPLICIT_CONV_EXPR but here
I may get away with not using the result of fold_non_dependent_expr unless
the result is a constant.  A TARGET_EXPR is not constant.

PR c++/115657

gcc/cp/ChangeLog:

* decl.cc (build_enumerator): Call maybe_fold_non_dependent_expr
instead of fold_non_dependent_expr.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-recursion2.C: New test.
* g++.dg/template/conv21.C: New test.
---
 gcc/cp/decl.cc| 10 +++--
 .../g++.dg/cpp1y/constexpr-recursion2.C   | 22 +++
 gcc/testsuite/g++.dg/template/conv21.C| 14 
 3 files changed, 44 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-recursion2.C
 create mode 100644 gcc/testsuite/g++.dg/template/conv21.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index f23b635aec9..12139e1d862 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -17387,9 +17387,15 @@ build_enumerator (tree name, tree value, tree enumtype, tree attributes,
   tree type;
 
   /* scalar_constant_value will pull out this expression, so make sure
- it's folded as appropriate.  */
+ it's folded as appropriate.
+
+ Creating a TARGET_EXPR in a template breaks when substituting, and
+ here we would create it for instance when using a class prvalue with
+ a user-defined conversion function.  So don't use such a tree.  We
+ instantiate VALUE here to get errors about bad enumerators even in
+ a template that does not get instantiated.  */
   if (processing_template_decl)
-value = fold_non_dependent_expr (value);
+value = maybe_fold_non_dependent_expr (value);
 
   /* If the VALUE was erroneous, pretend it wasn't there; that will
  result in the enum being assigned the next value in sequence.  */
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-recursion2.C b/gcc/testsuite/g++.dg/cpp1y/constexpr-recursion2.C
new file mode 100644
index 000..f268f52e2b5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-recursion2.C
@@ -0,0 +1,22 @@
+// PR c++/115657
+// { dg-do compile { target c++14 } }
+// { dg-options "-Wall" }
+
+// Like constexpr-recursion1.C but use a class with a conversion function.
+
+struct X {
+  constexpr operator int() { return 0; }
+};
+
+template 
+constexpr X f1 ()
+{
+  enum E { a = f1<0> () }; // { dg-error "called in a constant expression before its definition is complete|is not an integer constant" }
+  return {};
+}
+
+constexpr X f3 ()
+{
+  enum E { a = f3 () };// { dg-error "called in a constant expression before its definition is complete|is not an integer constant" }
+  return {};
+}
diff --git a/gcc/testsuite/g++.dg/template/conv21.C b/gcc/testsuite/g++.dg/template/conv21.C
new file mode 100644
index 000..1dc7b3d50d9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/conv21.C
@@ -0,0 +1,14 @@
+// PR c++/115657
+// { dg-do compile { target c++11 } }
+
+struct NonIntegral
+{
+constexpr operator int() { return 0; }
+};
+
+template struct TemplatedStructural
+{
+enum { e = NonIntegral{} };
+};
+
+template struct TemplatedStructural;

base-commit: 9cdde72d1cefdf252ad2eec1ff465dccb3ab
-- 
2.46.0



[PATCH v2] testsuite: Verify -fshort-enums and -fno-short-enums in pr33738.C

2024-08-16 Thread Torbjörn SVENSSON
Ok for trunk and releases/gcc-14?

Changes since v1:

- Changed original test case to use -fno-short-enums.
- Added pruning of "use of enum values across objects may fail" warnings.
- Created a copy of the original test case, added -fshort-enums and removed
  the xfail.

--

For some targets, like Cortex-M on arm-none-eabi, -fshort-enums is
enabled by default. For these targets, the test case fails as
sizeof(Alpha) < sizeof(int).
To make the test case behave identically for targets that enable
-fshort-enums by default and those that do not, add -fno-short-enums to
the test case and verify that the warning is not emitted. Then also
create a copy, run it with -fshort-enums, and verify that the warning is
emitted.
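
The size dependence can be illustrated outside the testsuite. This is a
minimal C sketch, not the C++ test itself, and it assumes GCC's behaviour of
giving an enum without negative enumerators an unsigned underlying type
(unsigned int by default, unsigned char with -fshort-enums). The function
name minus_one_survives is hypothetical.

```c
#include <assert.h>

enum Alpha { ZERO = 0, ONE, TWO, THREE };

/* Convert V to enum Alpha and report whether it still compares equal
   to -1.  With an int-sized unsigned enum, -1 is converted to the same
   all-ones value on both sides and the comparison can be true.  With a
   one-byte unsigned enum the stored value is promoted to a non-negative
   int, so the comparison is always false.  */
int
minus_one_survives (int v)
{
  enum Alpha a = (enum Alpha) v;
  return a == -1;
}
```

Under those assumptions, minus_one_survives (-1) returns 1 when
sizeof (enum Alpha) == sizeof (int) and 0 when the enum is shortened, which
is the situation the "always false" warning describes.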

Regtested on x86_64-pc-linux-gnu and arm-none-eabi.

gcc/testsuite/ChangeLog:

* g++.dg/warn/pr33738.C: Added -fno-short-enums.
* g++.dg/warn/pr33738-2.C: Duplicate g++.dg/warn/pr33738.C with
-fshort-enums and removed xfail.

Signed-off-by: Torbjörn SVENSSON 
---
 gcc/testsuite/g++.dg/warn/pr33738-2.C | 27 +++
 gcc/testsuite/g++.dg/warn/pr33738.C   |  3 ++-
 2 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/pr33738-2.C

diff --git a/gcc/testsuite/g++.dg/warn/pr33738-2.C b/gcc/testsuite/g++.dg/warn/pr33738-2.C
new file mode 100644
index 000..84bbdaeecc7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/pr33738-2.C
@@ -0,0 +1,27 @@
+// { dg-do run }
+// { dg-options "-O2 -Wtype-limits -fstrict-enums -fshort-enums" }
+extern void link_error (void);
+
+enum Alpha {
+ ZERO = 0, ONE, TWO, THREE
+};
+
+Alpha a2;
+
+int m1 = -1;
+int GetM1() {
+ return m1;
+}
+
+int main() {
+ a2 = static_cast<Alpha>(GetM1());
+ if (a2 == -1) {   // { dg-warning "always false due" } */
+link_error ();
+ }
+ a2 = static_cast<Alpha>(GetM1());
+ if (-1 == a2) {   // { dg-warning "always false due" } */
+link_error ();
+ }
+ return 0;
+}
+
diff --git a/gcc/testsuite/g++.dg/warn/pr33738.C b/gcc/testsuite/g++.dg/warn/pr33738.C
index 73e98d5e083..17ef158e56e 100644
--- a/gcc/testsuite/g++.dg/warn/pr33738.C
+++ b/gcc/testsuite/g++.dg/warn/pr33738.C
@@ -1,5 +1,6 @@
 // { dg-do run }
-// { dg-options "-O2 -Wtype-limits -fstrict-enums" }
+/* { dg-prune-output "use of enum values across objects may fail" } */
+// { dg-options "-O2 -Wtype-limits -fstrict-enums -fno-short-enums" }
 extern void link_error (void);
 
 enum Alpha {
-- 
2.25.1



Re: [PATCH] testsuite: Add -fwrapv to signbit-5.c

2024-08-16 Thread Jeff Law




On 8/16/24 4:12 AM, Torbjörn SVENSSON wrote:
> Ok for trunk and releases/gcc-14?
>
> Verified this on x86_64 and arm-none-eabi.
> Don't know if the other "truth type" dg-lines can be removed as well.
>
> --
>
> On Cortex-M55 with MVE, the test case fails due to -INT_MAX being
> undefined. Adding -fwrapv solves the issues.
>
> Regtested on x86_64-pc-linux and arm-none-eabi for
> Cortex-M0/M3/M4/M7/M33/M55/M85/A7.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/signbit-5.c: Add -fwrapv and remove x86 exception.

Presumably the -x[i] when i == 0 cases?   I'm a bit surprised that doing
a -INT_MIN didn't produce -INT_MIN, but it's still a bad thing to do due
to the overflow.

So, OK for the trunk and release branches.  If we need to adjust risc-v
we'll know in a few days :-)


jeff



Re: [PATCH] Dump aliases in -fcallgraph-info

2024-08-16 Thread Jeff Law




On 8/15/24 9:56 PM, Alexandre Oliva wrote:
> Dump ICF-unified decls, thunks, aliases and whatnot along with their
> ultimate targets, with edges from the alias to the target.
>
> Add support for dropping the source file's suffix when forming from
> dump-base, so that auxiliary files can be scanned, such as the .ci
> files generated by -fcallgraph-info, as in the testcase.
>
> Regstrapped on x86_64-linux-gnu.  Ok to install?
>
> for  gcc/ChangeLog
>
> * toplev.cc (dump_final_alias_vcg): New.
> (dump_final_node_vcg): Dump aliases along with node.
>
> for  gcc/testsuite/ChangeLog
>
> * lib/scandump.exp (dump-base): Support {} in dump base suffix
> to drop it.
> * gcc.dg/callgraph-info-1.c: New.

OK
jeff



Re: [PATCH] testsuite: Reduce cut-&-paste in scanltranstree.exp

2024-08-16 Thread Jeff Law




On 8/15/24 6:55 AM, Richard Sandiford wrote:
> scanltranstree.exp defines some LTO wrappers around standard
> non-LTO scanners.  Four of them are cut-&-paste variants of
> one another, so this patch generates them from a single template.
> It also does the same for scan-ltrans-tree-dump-times, so that
> other *-times scanners can be added easily in future.
>
> The scanners seem to be lightly used.  gcc.dg/ipa/ipa-icf-38.c uses
> scan-ltrans-tree-dump{,-not} and libgomp.c/declare-variant-1.c
> uses scan-ltrans-tree-dump-{not,times}.  Nothing currently seems
> to use scan-ltrans-tree-dump-dem*.
>
> Tested on the files above so far.  Surprisingly, it worked first time,
> but I tested that deliberately introduced mistakes were flagged.
> (That's my story anyway.)  OK if it passes full testing on
> aarch64-linux-gnu & x86_64-linux-gnu?
>
> Richard
>
> gcc/testsuite/
> * lib/scanltranstree.exp: Redefine the routines using two
> templates.

OK
jeff



[PATCH] testsuite: Add -fno-short-enums to pr97315-1.C

2024-08-16 Thread Torbjörn SVENSSON
Ok for trunk and releases/gcc-14?

--

The test case assumes that sizeof(tree_code) >= 2. On some targets, like
Cortex-M on arm-none-eabi, -fshort-enums is enabled by default and in
that case, sizeof(tree_code) will be 1 and the following warning is
emitted:

.../pr97315-1.C:8:13: warning: width of 'tree_base::code' exceeds its type

Avoid the warning by forcing -fno-short-enums.

gcc/testsuite/ChangeLog:

* g++.dg/opt/pr97315-1.C: Add -fno-short-enums.

Signed-off-by: Torbjörn SVENSSON 
---
 gcc/testsuite/g++.dg/opt/pr97315-1.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/opt/pr97315-1.C 
b/gcc/testsuite/g++.dg/opt/pr97315-1.C
index 5a618d8e1e8..3e439c5f179 100644
--- a/gcc/testsuite/g++.dg/opt/pr97315-1.C
+++ b/gcc/testsuite/g++.dg/opt/pr97315-1.C
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fno-exceptions" } */
+/* { dg-options "-O3 -fno-exceptions -fno-short-enums" } */
 
 typedef struct tree_node *tree;
 enum tree_code { RECORD_TYPE, QUAL_UNION_TYPE };
-- 
2.25.1



[PATCH v2] match: Fix A || B not optimized to true when !B implies A [PR114326]

2024-08-16 Thread Konstantinos Eleftheriou
From: kelefth 

In expressions like (a != b || ((a ^ b) & CST0) == CST1) and
(a != b || (a ^ b) == CST), (a ^ b) is folded to 0.
In the equivalent expressions (((a ^ b) & CST0) == CST1 || a != b) and
((a ^ b) == CST || a != b) this is not happening.

This patch adds the following simplifications in match.pd:
((a ^ b) & CST0) == CST1 || a != b --> (0 == CST1) || a != b
(a ^ b) == CST || a != b --> 0 == CST || (a != b)
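
The reasoning behind the first simplification: if a != b holds, the whole
disjunction is true either way; if it does not, then a == b, so a ^ b is 0
and the left operand reduces to a comparison of 0 with CST1. A minimal
sketch that checks this exhaustively over a small range follows. It covers
only the == comparison, reads the replacement pattern as
(0 == CST1) || a != b, and uses hypothetical function names.

```c
#include <assert.h>

/* The expression as written in the source, xor-and-compare first.  */
int
original_form (unsigned a, unsigned b, unsigned cst0, unsigned cst1)
{
  return ((a ^ b) & cst0) == cst1 || a != b;
}

/* The expression after replacing (a ^ b) & cst0 by 0.  */
int
simplified_form (unsigned a, unsigned b, unsigned cst1)
{
  return 0 == cst1 || a != b;
}

/* Return 1 if both forms agree for every combination of small inputs.  */
int
forms_agree (void)
{
  for (unsigned a = 0; a < 16; a++)
    for (unsigned b = 0; b < 16; b++)
      for (unsigned cst0 = 0; cst0 < 16; cst0++)
        for (unsigned cst1 = 0; cst1 < 4; cst1++)
          if (original_form (a, b, cst0, cst1)
              != simplified_form (a, b, cst1))
            return 0;
  return 1;
}
```

With CST1 == 0, as in the new tests, the simplified form is always true,
which is why the tests expect the xor to disappear from the optimized dump.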

PR tree-optimization/114326

gcc/ChangeLog:

* match.pd: Add two patterns to fold a ^ b to 0, when a == b.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/fold-xor-and-or-1.c: New test.
* gcc.dg/tree-ssa/fold-xor-and-or-2.c: New test.
* gcc.dg/tree-ssa/fold-xor-or-1.c: New test.
* gcc.dg/tree-ssa/fold-xor-or-2.c: New test.

Reviewed-by: Christoph Müllner 
Signed-off-by: Philipp Tomsich 
Signed-off-by: Konstantinos Eleftheriou 
---
 gcc/match.pd  | 30 +++
 .../gcc.dg/tree-ssa/fold-xor-and-or-1.c   | 17 +++
 .../gcc.dg/tree-ssa/fold-xor-and-or-2.c   | 19 
 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or-1.c | 17 +++
 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or-2.c | 19 
 5 files changed, 102 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or-2.c

diff --git a/gcc/match.pd b/gcc/match.pd
index c9c8478d286..1c55bd72f09 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -10680,3 +10680,33 @@ and,
   }
   (if (full_perm_p)
(vec_perm (op@3 @0 @1) @3 @2))
+
+/* ((a ^ b) & CST0) == CST1 || a != b --> 0 == (CST1 || a != b). */
+(for cmp (simple_comparison)
+  (simplify
+(bit_ior
+  (cmp
+   (bit_and
+ (bit_xor @0 @1)
+ INTEGER_CST)
+   @3)
+(ne@4 @0 @1))
+  (bit_ior
+(cmp
+  { build_zero_cst (TREE_TYPE (@0)); }
+  @3)
+@4)))
+
+/* (a ^ b) == CST || a != b --> 0 == CST || (a != b). */
+(for cmp (simple_comparison)
+  (simplify
+(bit_ior
+  (cmp
+   (bit_xor @0 @1)
+   @2)
+  (ne@3 @0 @1))
+(bit_ior
+  (cmp
+   {build_zero_cst (TREE_TYPE (@0)); }
+   @2)
+  @3)))
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or-1.c b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or-1.c
new file mode 100644
index 000..0e6fc1d5515
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int cmp1(int d1, int d2) {
+  if (((d1 ^ d2) & 0xabcd) == 0 || d1 != d2)
+return 0;
+  return 1;
+}
+
+int cmp2(int d1, int d2) {
+  if (d1 != d2 || ((d1 ^ d2) & 0xabcd) == 0)
+return 0;
+  return 1;
+}
+
+/* The if should be removed, so the condition should not exist */
+/* { dg-final { scan-tree-dump-not "d1_\[0-9\]+.D. \\^ d2_\[0-9\]+.D." "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or-2.c b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or-2.c
new file mode 100644
index 000..3f8da111354
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+typedef unsigned long int uint64_t;
+
+int cmp1(uint64_t d1, uint64_t d2) {
+  if (((d1 ^ d2) & 0xabcd) == 0 || d1 != d2)
+return 0;
+  return 1;
+}
+
+int cmp2(uint64_t d1, uint64_t d2) {
+  if (d1 != d2 || ((d1 ^ d2) & 0xabcd) == 0)
+return 0;
+  return 1;
+}
+
+/* The if should be removed, so the condition should not exist */
+/* { dg-final { scan-tree-dump-not "d1_\[0-9\]+.D. \\^ d2_\[0-9\]+.D." "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or-1.c b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or-1.c
new file mode 100644
index 000..0bc849a2d74
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int cmp1(int d1, int d2) {
+  if ((d1 ^ d2) == 0xabcd || d1 != d2)
+return 0;
+  return 1;
+}
+
+int cmp2(int d1, int d2) {
+  if (d1 != d2 || (d1 ^ d2) == 0xabcd)
+return 0;
+  return 1;
+}
+
+/* The if should be removed, so the condition should not exist */
+/* { dg-final { scan-tree-dump-not "d1_\[0-9\]+.D. \\^ d2_\[0-9\]+.D." "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or-2.c b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or-2.c
new file mode 100644
index 000..2276fc1c2ba
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+typedef unsigned long int uint64_t;
+
+int cmp1(uint64_t d1, uint64_t d2) {
+  if ((d1 ^ d2) == 0xabcd || d1 != d2)

Re: [PATCH] Re-add calling emit_clobber in lower-subreg.cc's resolve_simple_move.

2024-08-16 Thread Xianmiao Qu
On Tue, Aug 13, 2024 at 09:58:31PM -0600, Jeff Law wrote:
> Note this changes target independent code.  So it needs to be bootstrapped
> and regression tested on one of the primary platforms:
> 
> > The primary platforms are:
> > 
> > aarch64-none-linux-gnu
> > arm-linux-gnueabi
> > i586-unknown-freebsd
> > i686-pc-linux-gnu
> > powerpc64-unknown-linux-gnu
> > powerpc64le-unknown-linux-gnu
> > sparc-sun-solaris2.11
> > x86_64-pc-linux-gnu
> 
> 
> I'll ACK once there's a confirmation that it has passed the bootstrap and
> regression test on one of those platforms.
> 
> jeff
>

The test results before and after adding this patch are as follows:
  https://gcc.gnu.org/pipermail/gcc-testresults/2024-August/822478.html
  https://gcc.gnu.org/pipermail/gcc-testresults/2024-August/822481.html
No new errors.


Thanks,
Xianmiao 


Re: [PATCH] testsuite: Add -fwrapv to signbit-5.c

2024-08-16 Thread Torbjorn SVENSSON




On 2024-08-16 16:07, Jeff Law wrote:
> On 8/16/24 4:12 AM, Torbjörn SVENSSON wrote:
> > Ok for trunk and releases/gcc-14?
> >
> > Verified this on x86_64 and arm-none-eabi.
> > Don't know if the other "truth type" dg-lines can be removed as well.
> >
> > --
> >
> > On Cortex-M55 with MVE, the test case fails due to -INT_MAX being
> > undefined. Adding -fwrapv solves the issues.
> >
> > Regtested on x86_64-pc-linux and arm-none-eabi for
> > Cortex-M0/M3/M4/M7/M33/M55/M85/A7.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/signbit-5.c: Add -fwrapv and remove x86 exception.
>
> Presumably the -x[i] when i == 0 cases?   I'm a bit surprised that doing
> a -INT_MIN didn't produce -INT_MIN, but it's still a bad thing to do due
> to the overflow.


On the Cortex-M55 with MVE, -INT_MIN will result in INT_MIN, i.e. a
large negative value. The negated INT_MIN value cannot be represented
in two's complement form with the same number of bits.
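
The wrapping result described here can be reproduced without signed
overflow. The following is a minimal sketch that mimics what -fwrapv
guarantees for negation by doing the arithmetic in unsigned; the name
wrapping_negate is hypothetical, and the final conversion back to int relies
on GCC's documented modulo behaviour for out-of-range values, which the C
standard leaves implementation-defined.

```c
#include <assert.h>
#include <limits.h>

/* Negate X with two's complement wraparound.  Unsigned subtraction is
   defined to wrap, so 0u - (unsigned) INT_MIN gives back the same bit
   pattern, which converts to INT_MIN again.  */
int
wrapping_negate (int x)
{
  unsigned int ux = (unsigned int) x;
  return (int) (0u - ux);
}
```

For every other input the function returns the ordinary negation; only
INT_MIN maps to itself, which is the case the test has to tolerate.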


> So, OK for the trunk and release branches.  If we need to adjust risc-v
> we'll know in a few days :-)


Ok.


Pushed as r15-2950 and r14-10592.

Kind regards,
Torbjörn


Re: [RFC/RFA] [PATCH v2 09/12] Add symbolic execution support.

2024-08-16 Thread Jeff Law




On 8/12/24 5:13 AM, Matevos Mehrabyan wrote:


On Fri, Aug 2, 2024, 14:25 Richard Biener wrote:
 > On Wed, Jul 31, 2024 at 12:42 PM Mariam Arutunian wrote:
 >
 >     Gives an opportunity to execute the code on bit level,
 >    assigning symbolic values to the variables which don't have initial values.
 >    Supports only CRC specific operations.
 >
 >    Example:
 >
 >    uint8_t crc;
 >    uint8_t pol = 1;
 >    crc = crc ^ pol;
 >
 >    during symbolic execution crc's value will be:
 >    crc(8), crc(7), ... crc(1), crc(0) ^ 1

There seem to be quite some functions without a function comment.


I added more comments for functions in the new patch.

I see

+enum value_type {
+  SYMBOLIC_BIT,
+  BIT,
+  BIT_XOR_EXPRESSION,
+  BIT_AND_EXPRESSION,
+  BIT_OR_EXPRESSION,
+  BIT_COMPLEMENT_EXPRESSION,
+  SHIFT_RIGHT_EXPRESSION,
+  SHIFT_LEFT_EXPRESSION,
+  ADD_EXPRESSION,
+  SUB_EXPRESSION,
+  BIT_CONDITION
+};

is there a specific reason to make the expressions not use enum
tree_code?


This enum is used for internal purposes. It represents a single bit. It
is used by 'is_a_helper<>' for type checking and some printing. As
you can see it also has values such as 'BIT' and 'SYMBOLIC_BIT' that
represent constant and symbolic bits. 'tree_code' doesn't have
similar values. At most, I can remove the Expression values from it and
use both 'value_type' and 'tree_code', but it wouldn't be handy.

Right.  IIRC we need to track that a given bit has the same value as
some other bit.  We could probably fake it by finding a tree code in the
same neighborhood as SYMBOLIC_BIT and BIT, but I'm not sure doing so
really helps maintenance in any notable way.





How is this all used and what are the constraints?  It does look like
a generic framework which means documentation in the internals
manual would be useful to be had.


Currently, this is only used for CRC candidate functions. It
supports the limited set of expressions that we met while analyzing CRC
functions, but can be extended. New expressions must be represented
at the bit level as the symbolic executor operates on the bit level.

Right.  It could potentially be useful for other bitwise analysis.
It's got a pretty limited set of operations, but they could always be
extended.
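
To make "operates on the bit level" concrete, here is a minimal sketch of the
crc ^ pol example quoted earlier in the thread. It is an illustration only
and not the data structure from the patch: struct sym_bit and both function
names are hypothetical, and only XOR with a constant is modelled.

```c
#include <assert.h>
#include <stddef.h>

#define WIDTH 8

/* One bit of a value.  SYMBOL is the index of the unknown input bit it
   refers to; CONSTANT is a known bit that is XORed onto that symbol.  */
struct sym_bit
{
  int symbol;
  int constant;
};

/* Model an uninitialized WIDTH-bit variable: bit I is the symbol I.  */
void
sym_make_unknown (struct sym_bit bits[WIDTH])
{
  assert (bits != NULL);
  for (int i = 0; i < WIDTH; i++)
    {
      bits[i].symbol = i;
      bits[i].constant = 0;
    }
}

/* XOR the constant C into the value, one bit at a time.  */
void
sym_xor_constant (struct sym_bit bits[WIDTH], unsigned char c)
{
  assert (bits != NULL);
  for (int i = 0; i < WIDTH; i++)
    bits[i].constant ^= (c >> i) & 1;
}
```

After sym_make_unknown followed by sym_xor_constant with the value 1, bit 0
reads as "symbol 0 XOR 1" and every other bit is still the bare symbol,
which corresponds to the "crc(0) ^ 1" in the quoted example.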


I'd kind of hoped we could extend the existing symbolic execution and 
propagation engines rather than adding another.  But that never really 
panned out the way I'd hoped -- in particular I think we could use those 
existing mechanisms for the polynomial extraction, but they'd need 
significant work for the validation step IIRC.




I can add documentation for this version. In which section should
this be added, or should I add the documentation in the source file
as is done for gimple-ssa-store-merging.cc?

I'm torn.  We tend to document in the .cc files which just encourages
folks to look at the implementation rather than defining crisp APIs that
can be understood and used solely from looking at headers.  I've never
liked that approach.


But I also don't want to have a module like the bitwise symbolic 
execution do something significantly differently than what we do 
elsewhere.  So document in the appropriate .cc file.


I think the question to Richi is whether or not converting the store 
merging code to this new symbolic execution engine (or using the 
store-merging symbolic execution engine in the CRC validator) is a 
requirement to move forward.


Jeff


Re: [PATCH v5] Target-independent store forwarding avoidance.

2024-08-16 Thread Richard Sandiford
Manolis Tsamis  writes:
> This pass detects cases of expensive store forwarding and tries to avoid them
> by reordering the stores and using suitable bit insertion sequences.
> For example it can transform this:
>
>  strb    w2, [x1, 1]
>  ldr     x0, [x1]      # Expensive store forwarding to larger load.
>
> To:
>
>  ldr     x0, [x1]
>  strb    w2, [x1]
>  bfi     x0, x2, 0, 8
>
> Assembly like this can appear with bitfields or type punning / unions.
> On stress-ng when running the cpu-union microbenchmark the following speedups
> have been observed.
>
>   Neoverse-N1:      +29.4%
>   Intel Coffeelake: +13.1%
>   AMD 5950X:        +17.5%
>
> gcc/ChangeLog:
>
>   * Makefile.in: Add avoid-store-forwarding.o
>   * common.opt: New option -favoid-store-forwarding.
>   * doc/invoke.texi: New param store-forwarding-max-distance.
>   * doc/passes.texi: Document new pass.
>   * doc/tm.texi: Regenerate.
>   * doc/tm.texi.in: Document new pass.
>   * params.opt: New param store-forwarding-max-distance.
>   * passes.def: Schedule a new pass.
>   * target.def (HOOK_PREFIX): New target hook avoid_store_forwarding_p.
>   * target.h (struct store_fwd_info): Declare.
>   * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
>   * avoid-store-forwarding.cc: New file.
>   * avoid-store-forwarding.h: New file.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/avoid-store-forwarding-1.c: New test.
>   * gcc.target/aarch64/avoid-store-forwarding-2.c: New test.
>   * gcc.target/aarch64/avoid-store-forwarding-3.c: New test.
>   * gcc.target/aarch64/avoid-store-forwarding-4.c: New test.
>   * gcc.target/aarch64/avoid-store-forwarding-5.c: New test.
>
> Signed-off-by: Manolis Tsamis 
> ---
>
> Changes in v5:
> - Fix bug with BIG_ENDIAN targets.
> - Fix bug with unrecognized instructions.
> - Fix / simplify pass init/fini.
>
> Changes in v4:
> - Change pass scheduling to run after sched1.
> - Add target hook to decide whether a store forwarding instance
> should be avoided or not.
> - Fix bugs.
>
> Changes in v3:
> - Only emit SUBREG after calling validate_subreg.
> - Fix memory corruption due to vec self-reference.
> - Fix bitmap_bit_in_range_p ICE due to BLKMode.
> - Reject MEM to MEM sets.
> - Add get_load_mem comment.
> - Add new testcase.
>
> Changes in v2:
> - Allow modes that are not scalar_int_mode.
> - Introduce simple costing to avoid unprofitable transformations.
> - Reject bit insert sequences that spill to memory.
> - Document new pass.
> - Fix and add testcases.
>
>  gcc/Makefile.in   |   1 +
>  gcc/avoid-store-forwarding.cc | 616 ++
>  gcc/avoid-store-forwarding.h  |  56 ++
>  gcc/common.opt|   4 +
>  gcc/doc/invoke.texi   |   9 +
>  gcc/doc/passes.texi   |   8 +
>  gcc/doc/tm.texi   |   9 +
>  gcc/doc/tm.texi.in|   2 +
>  gcc/params.opt|   4 +
>  gcc/passes.def|   1 +
>  gcc/target.def|  11 +
>  gcc/target.h  |   3 +
>  .../aarch64/avoid-store-forwarding-1.c|  28 +
>  .../aarch64/avoid-store-forwarding-2.c|  39 ++
>  .../aarch64/avoid-store-forwarding-3.c|  31 +
>  .../aarch64/avoid-store-forwarding-4.c|  23 +
>  .../aarch64/avoid-store-forwarding-5.c|  38 ++
>  gcc/tree-pass.h   |   1 +
>  18 files changed, 884 insertions(+)
>  create mode 100644 gcc/avoid-store-forwarding.cc
>  create mode 100644 gcc/avoid-store-forwarding.h
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-1.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-2.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-3.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-4.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-5.c
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 8fba8f7db6a..43675288399 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1682,6 +1682,7 @@ OBJS = \
>   statistics.o \
>   stmt.o \
>   stor-layout.o \
> + avoid-store-forwarding.o \
>   store-motion.o \
>   streamer-hooks.o \
>   stringpool.o \
> diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc
> new file mode 100644
> index 000..4a0343c0314
> --- /dev/null
> +++ b/gcc/avoid-store-forwarding.cc
> @@ -0,0 +1,616 @@
> +/* Avoid store forwarding optimization pass.
> +   Copyright (C) 2024 Free Softwar

Re: [PATCH] testsuite: Add -fno-short-enums to pr97315-1.C

2024-08-16 Thread Jakub Jelinek
On Fri, Aug 16, 2024 at 04:15:10PM +0200, Torbjörn SVENSSON wrote:
> Ok for trunk and releases/gcc-14?
> 
> --
> 
> The test case assumes that sizeof(tree_code) >= 2. On some targets, like
> Cortex-M on arm-none-eabi, -fshort-enums is enabled by default and in
> that case, sizeof(tree_code) will be 1 and the following warning is
> emitted:
> 
> .../pr97315-1.C:8:13: warning: width of 'tree_base::code' exceeds its type
> 
> Avoid the warning by forcing -fno-short-enums.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/opt/pr97315-1.C: Add -fno-short-enums.
> 
> Signed-off-by: Torbjörn SVENSSON 

Ok, thanks.

Jakub



Re: [PATCH] match: Fix A || B not optimized to true when !B implies A [PR114326]

2024-08-16 Thread Konstantinos Eleftheriou
Thanks, fixed (https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660631.html).

On Thu, Aug 15, 2024 at 3:57 PM Sam James  wrote:

> Konstantinos Eleftheriou  writes:
>
> > From: kelefth 
> >
> > In expressions like (a != b || ((a ^ b) & CST0) == CST1) and
> > (a != b || (a ^ b) == CST), (a ^ b) is folded to 0.
> > In the equivalent expressions (((a ^ b) & CST0) == CST1 || a != b) and
> > ((a ^ b) == CST || a != b) this is not happening.
> >
> > This patch adds the following simplifications in match.pd:
> > ((a ^ b) & CST0) == CST1 || a != b --> 0 == (CST1 || a != b)
> > (a ^ b) == CST || a != b --> 0 == CST || (a != b)
> >
> >   PR tree-optimization/114326
> >
> > gcc/ChangeLog:
> >
> >   * match.pd: Add two patterns to fold a ^ b to 0, when a == b.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.dg/tree-ssa/fold-xor-and-or-1.c: New test.
> >   * gcc.dg/tree-ssa/fold-xor-and-or-2.c: New test.
> >   * gcc.dg/tree-ssa/fold-xor-or-1.c: New test.
> >   * gcc.dg/tree-ssa/fold-xor-or-2.c: New test.
> >
> > Reviewed-by: Christoph Müllner 
> > Signed-off-by: Philipp Tomsich 
> > Signed-off-by: Konstantinos Eleftheriou <
> konstantinos.elefther...@vrull.eu>
> > ---
> >  gcc/match.pd  | 30 +++
> >  .../gcc.dg/tree-ssa/fold-xor-and-or-1.c   | 17 +++
> >  .../gcc.dg/tree-ssa/fold-xor-and-or-2.c   | 19 
> >  gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or-1.c | 17 +++
> >  gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or-2.c | 19 
> >  5 files changed, 102 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or-2.c
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/fold-xor-or-2.c
> >
> > [...]
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/fold-xor-and-or-2.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do-compile } */
>
> /* { dg-do compile } */
>
> Please fix each instance of that. Thanks!
>
> > [...]
>
> sam
>


Re: [PATCH] testsuite: Add -fno-short-enums to pr97315-1.C

2024-08-16 Thread Torbjorn SVENSSON




On 2024-08-16 16:37, Jakub Jelinek wrote:
> On Fri, Aug 16, 2024 at 04:15:10PM +0200, Torbjörn SVENSSON wrote:
> > Ok for trunk and releases/gcc-14?
> >
> > --
> >
> > The test case assumes that sizeof(tree_code) >= 2. On some targets, like
> > Cortex-M on arm-none-eabi, -fshort-enums is enabled by default and in
> > that case, sizeof(tree_code) will be 1 and the following warning is
> > emitted:
> >
> > .../pr97315-1.C:8:13: warning: width of 'tree_base::code' exceeds its type
> >
> > Avoid the warning by forcing -fno-short-enums.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * g++.dg/opt/pr97315-1.C: Add -fno-short-enums.
> >
> > Signed-off-by: Torbjörn SVENSSON
>
> Ok, thanks.
>
> Jakub




Pushed as r15-2951 and r14-10593.

Kind regards,
Torbjörn
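
To make the size assumption behind this warning concrete, here is a small sketch (not part of the test case): the storage size of an enum decides how wide a bit-field of that type may be, and with -fshort-enums a small enum may occupy a single byte. The enum and helper names are hypothetical stand-ins for tree_code and tree_base::code.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for an enum with only a few enumerators.  */
enum tree_code_like { CODE_A, CODE_B, CODE_C };

/* Number of bits in the enum's storage.  With -fshort-enums this may be
   as small as CHAR_BIT; without it, typically the width of int.  */
static size_t
enum_storage_bits (void)
{
  return sizeof (enum tree_code_like) * CHAR_BIT;
}

/* Nonzero when a bit-field of WIDTH bits fits in the enum's storage,
   i.e. when no "width exceeds its type" diagnostic would be expected.  */
static int
bitfield_width_fits (size_t width)
{
  return width <= enum_storage_bits ();
}
```

On a target where the enum is one byte wide, bitfield_width_fits (16) would be zero, which corresponds to the situation the warning reports.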


Re: [PATCH v2] testsuite: Verify -fshort-enums and -fno-short-enums in pr33738.C

2024-08-16 Thread Jakub Jelinek
On Fri, Aug 16, 2024 at 03:51:01PM +0200, Torbjörn SVENSSON wrote:
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/warn/pr33738.C: Added -fno-short-enums.
>   * g++.dg/warn/pr33738-2.C: Duplicate g++.dg/warn/pr33738.C with
>   -fshort-enums and removed xfail.
> 
> Signed-off-by: Torbjörn SVENSSON 
> ---
>  gcc/testsuite/g++.dg/warn/pr33738-2.C | 27 +++
>  gcc/testsuite/g++.dg/warn/pr33738.C   |  3 ++-
>  2 files changed, 29 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/warn/pr33738-2.C
> 
> --- a/gcc/testsuite/g++.dg/warn/pr33738.C
> +++ b/gcc/testsuite/g++.dg/warn/pr33738.C
> @@ -1,5 +1,6 @@
>  // { dg-do run }
> -// { dg-options "-O2 -Wtype-limits -fstrict-enums" }
> +/* { dg-prune-output "use of enum values across objects may fail" } */
> +// { dg-options "-O2 -Wtype-limits -fstrict-enums -fno-short-enums" }

When the test already uses // style comments, use that for the new
dg-prune-output as well.

>  extern void link_error (void);
>  
>  enum Alpha {
> -- 
> 2.25.1

Ok for trunk with that nit fixed.

Jakub



Re: [PATCH v2] testsuite: Verify -fshort-enums and -fno-short-enums in pr33738.C

2024-08-16 Thread Torbjorn SVENSSON




On 2024-08-16 16:45, Jakub Jelinek wrote:

On Fri, Aug 16, 2024 at 03:51:01PM +0200, Torbjörn SVENSSON wrote:

gcc/testsuite/ChangeLog:

* g++.dg/warn/pr33738.C: Added -fno-short-enums.
* g++.dg/warn/pr33738-2.C: Duplicate g++.dg/warn/pr33738.C with
-fshort-enums and removed xfail.

Signed-off-by: Torbjörn SVENSSON 
---
  gcc/testsuite/g++.dg/warn/pr33738-2.C | 27 +++
  gcc/testsuite/g++.dg/warn/pr33738.C   |  3 ++-
  2 files changed, 29 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/warn/pr33738-2.C

--- a/gcc/testsuite/g++.dg/warn/pr33738.C
+++ b/gcc/testsuite/g++.dg/warn/pr33738.C
@@ -1,5 +1,6 @@
  // { dg-do run }
-// { dg-options "-O2 -Wtype-limits -fstrict-enums" }
+/* { dg-prune-output "use of enum values across objects may fail" } */
+// { dg-options "-O2 -Wtype-limits -fstrict-enums -fno-short-enums" }


When the test already uses // style comments, use that for the new
dg-prune-output as well.


I copy-pasted from another test case and didn't notice that it used C-style 
comments until after I sent the patch. I intended to fix that before 
merging, regardless of whether anyone commented on it.





  extern void link_error (void);
  
  enum Alpha {

--
2.25.1


Ok for trunk with that nit fixed.

Jakub




Pushed as r15-2952 and r14-10594.

Kind regards,
Torbjörn


Re: [PATCH 4/4] rs6000, Add tests and documentation for vector, conversions between integer and float

2024-08-16 Thread Carl Love

Kewen:

Ping.

  Carl

On 8/7/24 10:15 AM, Carl Love wrote:



 GCC maintainers:

The following patch fixes errors in the definition of the 
__builtin_vsx_uns_floate_v2di, __builtin_vsx_uns_floato_v2di and 
__builtin_vsx_uns_float2_v2di built-ins.  The arguments should be 
unsigned but are listed as signed.


Additionally, test cases are missing for a number of the built-in 
instances, and the documentation for the built-ins is missing.


This patch adds the missing test cases and documentation.

The patch has been tested on Power 10 LE and BE with no regressions.

Please let me know if it is acceptable for mainline.  Thanks.

    Carl
- 

rs6000, Add tests and documentation for vector conversions between 
integer and float


The arguments for the __builtin_vsx_uns_floate_v2di,
__builtin_vsx_uns_floato_v2di and __builtin_vsx_uns_float2_v2di built-ins
should be unsigned.

Add tests for the following existing integer and long long int to float
built-ins:
  __builtin_altivec_float_sisf (vsi);
  __builtin_altivec_uns_float_sisf (vui);
  __builtin_vsx_floate_v2di (vsll);
  __builtin_vsx_uns_floate_v2di (vull);
  __builtin_vsx_floato_v2di (vsll);
  __builtin_vsx_uns_floato_v2di (vull);
  __builtin_vsx_float2_v2di (vsll, vsll);
  __builtin_vsx_uns_float2_v2di (vull, vull);

Add tests for the vector float to vector int built-ins:
  __builtin_altivec_fix_sfsi
  __builtin_altivec_fixuns_sfsi

The various built-ins are not documented.  The patch adds the missing
documentation for the various built-ins.

This patch fixes the incorrect __builtin_vsx_uns_float[o|e|2]_v2di
argument types and adds test cases for each of the built-ins listed 
above.


gcc/ChangeLog:
    * config/rs6000/rs6000-builtins.def (__builtin_vsx_uns_floate_v2di,
    __builtin_vsx_uns_floato_v2di,__builtin_vsx_uns_float2_v2di): Change
    argument from signed to unsigned.
    * doc/extend.texi: Add documentation for each of the built-ins.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/vsx-int-to-float-runnable.c: New file.
---
 gcc/config/rs6000/rs6000-builtins.def |   6 +-
 gcc/doc/extend.texi   |  37 +++
 .../powerpc/vsx-int-to-float-runnable.c   | 260 ++
 3 files changed, 300 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-int-to-float-runnable.c


diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index f2bebd299b2..1227daa1555 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1463,10 +1463,10 @@
   const vd __builtin_vsx_uns_doubleo_v4si (vsi);
 UNS_DOUBLEO_V4SI unsdoubleov4si2 {}

-  const vf __builtin_vsx_uns_floate_v2di (vsll);
+  const vf __builtin_vsx_uns_floate_v2di (vull);
 UNS_FLOATE_V2DI unsfloatev2di {}

-  const vf __builtin_vsx_uns_floato_v2di (vsll);
+  const vf __builtin_vsx_uns_floato_v2di (vull);
 UNS_FLOATO_V2DI unsfloatov2di {}

   const vsll __builtin_vsx_vsigned_v2df (vd);
@@ -2272,7 +2272,7 @@
   const vss __builtin_vsx_revb_v8hi (vss);
 REVB_V8HI revb_v8hi {}

-  const vf __builtin_vsx_uns_float2_v2di (vsll, vsll);
+  const vf __builtin_vsx_uns_float2_v2di (vull, vull);
 UNS_FLOAT2_V2DI uns_float2_v2di {}

   const vsi __builtin_vsx_vsigned2_v2df (vd, vd);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index bf6f4094040..7ec4f19a6bf 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22919,6 +22919,43 @@ but the index value must be 0.

 Only functions excluded from the PVIPR are listed here.

+The following built-ins convert signed and unsigned vectors of ints and
+long long ints to a vector of 32-bit floating point values.
+
+@smallexample
+vector float __builtin_altivec_float_sisf (vector int);
+vector float __builtin_altivec_uns_float_sisf (vector unsigned int);
+vector float __builtin_vsx_floate_v2di (vector signed long long int);
+vector float __builtin_vsx_uns_floate_v2di (vector unsigned long long int);
+vector float __builtin_vsx_floato_v2di (vector signed long long int);
+vector float __builtin_vsx_uns_floato_v2di (vector unsigned long long int);
+vector float __builtin_vsx_float2_v2di (vector signed long long int,
+    vector signed long long int);
+vector float __builtin_vsx_uns_float2_v2di (vector unsigned long long int,
+    vector signed long long int);
+@end smallexample
+
+The @code{__builtin_altivec_float_sisf} and
+@code{__builtin_altivec_uns_float_sisf} built-ins convert signed and
+unsigned vectors of 32-bit integers to a vector of 32-bit floating point
+values.  The @code{__builtin_vsx_floate_v2di} and
+@code{__builtin_vsx_uns_floate_v2di} built-ins converts a vector
+long long ints to 32-bit floating point values stori

Re: [PATCH ver 2] rs6000,extend and document built-ins vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros

2024-08-16 Thread Carl Love

Ping.

 Carl

On 8/9/24 8:57 AM, Carl Love wrote:


GCC maintainers:

Version 2: based on the discussion, additional overloaded instances of 
the vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins have 
been added.  The additional instances are for arguments of vector signed 
char and vector bool char.  The patch has been tested on Power 10 LE 
and BE with no regressions.


Per a report from a user, the existing vec_test_lsbb_all_ones and 
vec_test_lsbb_all_zeros built-ins are not documented in the GCC 
documentation file.


The following patch adds the missing documentation for the 
vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins.


Please let me know if the patch is acceptable for mainline. Thanks.

  Carl

rs6000,extend and document built-ins vec_test_lsbb_all_ones and 
vec_test_lsbb_all_zeros


The built-ins currently support unsigned char arguments.  Extend the
built-ins to also support vector signed char and vector bool char 
arguments.


Add documentation for the Power 10 built-ins vec_test_lsbb_all_ones
and vec_test_lsbb_all_zeros.  The vec_test_lsbb_all_ones built-in
returns 1 if the least significant bit in each byte is a 1, returns
0 otherwise.  Similarly, vec_test_lsbb_all_zeros returns a 1 if
the least significant bit in each byte is a zero and 0 otherwise.

Add additional test cases for the built-ins in files:
  gcc/testsuite/gcc.target/powerpc/lsbb.c
  gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c

gcc/ChangeLog:
    * config/rs6000/rs6000-overload.def (vec_test_lsbb_all_ones,
    vec_test_lsbb_all_zeros): Add built-in instances for vector signed
    char and vector bool char.
    * doc/extend.texi (vec_test_lsbb_all_ones,
    vec_test_lsbb_all_zeros): Add documentation for the
    existing built-ins.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/lsbb-runnable.c: Add test cases for the vector
    signed char and vector bool char instances of
    vec_test_lsbb_all_zeros and vec_test_lsbb_all_ones built-ins.
    * gcc.target/powerpc/lsbb.c: Add compile test cases for the vector
    signed char and vector bool char instances of
    vec_test_lsbb_all_zeros and vec_test_lsbb_all_ones built-ins.
---
 gcc/config/rs6000/rs6000-overload.def |  12 +-
 gcc/doc/extend.texi   |  19 +++
 .../gcc.target/powerpc/lsbb-runnable.c    | 131 ++
 gcc/testsuite/gcc.target/powerpc/lsbb.c   |  24 +++-
 4 files changed, 156 insertions(+), 30 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index 87495aded49..7d9e31c3f9e 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -4403,12 +4403,20 @@
 XXEVAL  XXEVAL_VUQ

 [VEC_TEST_LSBB_ALL_ONES, vec_test_lsbb_all_ones, __builtin_vec_xvtlsbb_all_ones]
+  signed int __builtin_vec_xvtlsbb_all_ones (vsc);
+    XVTLSBB_ONES LSBB_ALL_ONES_VSC
   signed int __builtin_vec_xvtlsbb_all_ones (vuc);
-    XVTLSBB_ONES
+    XVTLSBB_ONES LSBB_ALL_ONES_VUC
+  signed int __builtin_vec_xvtlsbb_all_ones (vbc);
+    XVTLSBB_ONES LSBB_ALL_ONES_VBC

 [VEC_TEST_LSBB_ALL_ZEROS, vec_test_lsbb_all_zeros, __builtin_vec_xvtlsbb_all_zeros]
+  signed int __builtin_vec_xvtlsbb_all_zeros (vsc);
+    XVTLSBB_ZEROS LSBB_ALL_ZEROS_VSC
   signed int __builtin_vec_xvtlsbb_all_zeros (vuc);
-    XVTLSBB_ZEROS
+    XVTLSBB_ZEROS LSBB_ALL_ZEROS_VUC
+  signed int __builtin_vec_xvtlsbb_all_zeros (vbc);
+    XVTLSBB_ZEROS LSBB_ALL_ZEROS_VBC

 [VEC_TRUNC, vec_trunc, __builtin_vec_trunc]
   vf __builtin_vec_trunc (vf);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 89fe5db7aed..5ca87889831 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -23332,6 +23332,25 @@ signed long long will sign extend the rightmost byte of each doubleword.

 The following additional built-in functions are also available for the
 PowerPC family of processors, starting with ISA 3.1 (@option{-mcpu=power10}):

+@smallexample
+@exdent int vec_test_lsbb_all_ones (vector signed char);
+@exdent int vec_test_lsbb_all_ones (vector unsigned char);
+@exdent int vec_test_lsbb_all_ones (vector bool char);
+@end smallexample
+@findex vec_test_lsbb_all_ones
+
+The builtin @code{vec_test_lsbb_all_ones} returns 1 if the least significant
+bit in each byte is equal to 1.  It returns a 0 otherwise.
+
+@smallexample
+@exdent int vec_test_lsbb_all_zeros (vector signed char);
+@exdent int vec_test_lsbb_all_zeros (vector unsigned char);
+@exdent int vec_test_lsbb_all_zeros (vector bool char);
+@end smallexample
+@findex vec_test_lsbb_all_zeros
+
+The builtin @code{vec_test_lsbb_all_zeros} returns 1 if the least significant
+bit in each byte is equal to zero.  It returns a 0 otherwise.

 @smallexample
 @exdent vector unsigned long long int
diff --git a/gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c 
b/gcc/testsuite/gcc.targ
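
The documented semantics of the two built-ins can be modelled in scalar C as follows. This is only a sketch of the behaviour described in the commit message, assuming a 16-byte vector; it is not how the built-ins are implemented, and the function names are hypothetical.

```c
#include <stdint.h>

#define VEC_BYTES 16

/* Scalar model: return 1 if the least significant bit of every byte
   is 1, and 0 otherwise.  */
static int
model_test_lsbb_all_ones (const uint8_t bytes[VEC_BYTES])
{
  for (int i = 0; i < VEC_BYTES; i++)
    if ((bytes[i] & 1) == 0)
      return 0;
  return 1;
}

/* Scalar model: return 1 if the least significant bit of every byte
   is 0, and 0 otherwise.  */
static int
model_test_lsbb_all_zeros (const uint8_t bytes[VEC_BYTES])
{
  for (int i = 0; i < VEC_BYTES; i++)
    if ((bytes[i] & 1) != 0)
      return 0;
  return 1;
}
```

Note that both models return 0 for a vector whose bytes mix odd and even values, so the two built-ins are not simple negations of each other.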

Re: [RFC/RFA][PATCH v3 06/12] aarch64: Implement new expander for efficient CRC computation

2024-08-16 Thread Mariam Arutunian
On Fri, Aug 9, 2024 at 7:22 PM Richard Sandiford wrote:

> Sorry again for the slow review. :(
>
> I only really looked at the unreversed version earlier, on the basis
> that the comments would apply to both versions.  But I've got a couple
> of comments about the reversed version below:
>
> Mariam Arutunian  writes:
> > [...]
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index ee12d8897a8..546a379fd74 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -30265,6 +30265,126 @@ aarch64_retrieve_sysreg (const char *regname,
> bool write_p, bool is128op)
> >return sysreg->encoding;
> >  }
> >
> > +/* Generate assembly to calculate CRC
> > +   using carry-less multiplication instruction.
> > +   OPERANDS[1] is input CRC,
> > +   OPERANDS[2] is data (message),
> > +   OPERANDS[3] is the polynomial without the leading 1.  */
> > +
> > +void
> > +aarch64_expand_crc_using_pmull (scalar_mode crc_mode,
> > + scalar_mode data_mode,
> > + rtx *operands)
> > +{
> > +  /* Check and keep arguments.  */
> > +  gcc_assert (!CONST_INT_P (operands[0]));
> > +  gcc_assert (CONST_INT_P (operands[3]));
> > +  rtx crc = operands[1];
> > +  rtx data = operands[2];
> > +  rtx polynomial = operands[3];
> > +
> > +  unsigned HOST_WIDE_INT crc_size = GET_MODE_BITSIZE (crc_mode);
> > +  unsigned HOST_WIDE_INT data_size = GET_MODE_BITSIZE (data_mode);
> > +  gcc_assert (crc_size <= 32);
> > +  gcc_assert (data_size <= crc_size);
> > +
> > +  /* Calculate the quotient.  */
> > +  unsigned HOST_WIDE_INT
> > +  q = gf2n_poly_long_div_quotient (UINTVAL (polynomial), crc_size);
> > +  /* CRC calculation's main part.  */
> > +  if (crc_size > data_size)
> > +crc = expand_shift (RSHIFT_EXPR, DImode, crc, crc_size - data_size,
> > + NULL_RTX, 1);
> > +
> > +  rtx t0 = force_reg (DImode, gen_int_mode (q, DImode));
> > +  polynomial = simplify_gen_unary (ZERO_EXTEND, DImode, polynomial,
> > +GET_MODE (polynomial));
> > +  rtx t1 = force_reg (DImode, polynomial);
> > +
> > +  rtx a0 = expand_binop (DImode, xor_optab, crc, data, NULL_RTX, 1,
> > +  OPTAB_WIDEN);
> > +
> > +  rtx clmul_res = gen_reg_rtx (TImode);
> > +  emit_insn (gen_aarch64_crypto_pmulldi (clmul_res, a0, t0));
> > +  a0 = gen_lowpart (DImode, clmul_res);
> > +
> > +  a0 = expand_shift (RSHIFT_EXPR, DImode, a0, crc_size, NULL_RTX, 1);
> > +
> > +  emit_insn (gen_aarch64_crypto_pmulldi (clmul_res, a0, t1));
> > +  a0 = gen_lowpart (DImode, clmul_res);
> > +
> > +  if (crc_size > data_size)
> > +{
> > +  rtx crc_part = expand_shift (LSHIFT_EXPR, DImode, operands[1],
> data_size,
> > +NULL_RTX, 0);
> > +  a0 = expand_binop (DImode, xor_optab, a0, crc_part, NULL_RTX, 1,
> > +  OPTAB_DIRECT);
> > +}
> > +
> > +  /* Zero upper bits beyond crc_size.  */
>
> The comment no longer applies.  Otherwise this function looks good to me.
>
>
Ok.)


> > +  aarch64_emit_move (operands[0], gen_lowpart (crc_mode, a0));
> > +}
> > +
> > +/* Generate assembly to calculate reversed CRC
> > +   using carry-less multiplication instruction.
> > +   OPERANDS[1] is input CRC,
> > +   OPERANDS[2] is data,
> > +   OPERANDS[3] is the polynomial without the leading 1.  */
> > +
> > +void
> > +aarch64_expand_reversed_crc_using_pmull (scalar_mode crc_mode,
> > +  scalar_mode data_mode,
> > +  rtx *operands)
> > +{
> > +  /* Check and keep arguments.  */
> > +  gcc_assert (!CONST_INT_P (operands[0]));
> > +  gcc_assert (CONST_INT_P (operands[3]));
> > +  rtx crc = operands[1];
> > +  rtx data = operands[2];
> > +  rtx polynomial = operands[3];
> > +
> > +  unsigned HOST_WIDE_INT crc_size = GET_MODE_BITSIZE (crc_mode);
> > +  unsigned HOST_WIDE_INT data_size = GET_MODE_BITSIZE (data_mode);
> > +  gcc_assert (crc_size <= 32);
> > +  gcc_assert (data_size <= crc_size);
> > +
> > +  /* Calculate the quotient.  */
> > +  unsigned HOST_WIDE_INT
> > +  q = gf2n_poly_long_div_quotient (UINTVAL (polynomial), crc_size);
> > +  /* Reflect the calculated quotient.  */
> > +  q = reflect (q);
> > +  rtx t0 = force_reg (DImode, gen_int_mode (q >> (data_size - 4),
> DImode));
> > +
> > +  /* Reflect the polynomial.  */
> > +  unsigned HOST_WIDE_INT ref_polynomial = reflect (UINTVAL
> (polynomial));
>
> It looks like reflect() autodetects the bitwidth based on the assumption
> that the upper half will be nonzero.  But that might not be true for all
> possible polynomials (when the implicit leading coefficient is absent)
> E.g. it looks like the 64-bit HDLC CRC polynomial is 0x1b (just the
> lowest byte nonzero), and although we don't support 64-bit polynomials
> here, the approach wouldn't work for it.
>
> I think it'd be safer to pass the bitwidth as an explicit paramete
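
A minimal sketch of the suggestion above, assuming a standalone helper rather than the reflect() function in the patch: the bit width is passed explicitly, so a polynomial whose upper bits happen to be zero (such as the 0x1b example) is still reflected across the full width. The name reflect_bits and its signature are hypothetical.

```c
#include <stdint.h>

/* Reverse the low WIDTH bits of VALUE (1 <= WIDTH <= 64).  Bits above
   WIDTH in VALUE are ignored.  The width is supplied by the caller
   instead of being inferred from the highest set bit.  */
static uint64_t
reflect_bits (uint64_t value, unsigned width)
{
  uint64_t result = 0;
  for (unsigned i = 0; i < width; i++)
    {
      /* Move the current lowest bit of VALUE into RESULT, pushing
         earlier bits towards the top.  */
      result = (result << 1) | (value & 1);
      value >>= 1;
    }
  return result;
}
```

With an explicit width, reflecting the 32-bit CRC polynomial 0x04C11DB7 gives the familiar reversed form 0xEDB88320, and reflecting 0x1b over 64 bits places the set bits in the top byte.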

RE: [nvptx] Pass -m32/-m64 to host_compiler if it has multilib support

2024-08-16 Thread Prathamesh Kulkarni


> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, August 13, 2024 10:06 PM
> To: Thomas Schwinge 
> Cc: Prathamesh Kulkarni ; Andrew Pinski
> ; gcc-patches@gcc.gnu.org; Jakub Jelinek
> 
> Subject: Re: [nvptx] Pass -m32/-m64 to host_compiler if it has
> multilib support
> 
> 
> > On 13.08.2024 at 17:48, Thomas Schwinge wrote:
> >
> > Hi Prathamesh!
> >
> > On 2024-08-12T07:50:07+, Prathamesh Kulkarni wrote:
> >>> From: Thomas Schwinge 
> >>> Sent: Friday, August 9, 2024 12:55 AM
> >
> >>> On 2024-08-08T06:46:25-0700, Andrew Pinski wrote:
> >>>> On Thu, Aug 8, 2024 at 6:11 AM Prathamesh Kulkarni wrote:
> >>>>> After differing NUM_POLY_INT_COEFFS fix for AArch64/nvptx
> >>>>> offloading, the following minimal test:
> >>>
> >>> First, thanks for your work on enabling this!  I will say that I had
> >>> the plan to re-engage with Nvidia to hire us (as initial
> >>> implementors of GCC/nvptx offloading) to make AArch64/nvptx
> >>> offloading work, but now that Nvidia has its own GCC team, that's
> >>> great that you're able to work on this yourself!  :-)
> >>>
> >>> Please CC me for GCC/nvptx issues for (at least potentially...)
> >>> faster response times.
> >> Thanks, will do 😊
> >
> > Heh, so much for "potentially": I'm not able to spend a lot of time on
> > this right now, as I shall soon be out of office.  Quickly:
> >
> >>>>> compiled with -fopenmp -foffload=nvptx-none now fails with:
> >>>>> gcc: error: unrecognized command-line option '-m64'
> >>>>> nvptx mkoffload: fatal error: ../install/bin/gcc returned 1 exit status
> >>>>> compilation terminated.
> >>>
> >>> Heh.  Yeah...
> >>>
> >>>>> As mentioned in RFC email, this happens because
> >>>>> nvptx/mkoffload.cc:compile_native passes -m64/-m32 to host compiler
> >>>>> depending on whether offload_abi is OFFLOAD_ABI_LP64 or
> >>>>> OFFLOAD_ABI_ILP32, and aarch64 backend doesn't recognize these
> >>>>> options.
> >
> >>> So, my idea is: instead of the current strategy that the host
> >>> 'TARGET_OFFLOAD_OPTIONS' synthesizes '-foffload-abi=lp64' etc.,
> >>> which the 'mkoffload's then interpret and re-synthesize '-m64' etc.
> >>> -- how about we instead directly tell the 'mkoffload's the relevant
> >>> ABI options?  That is, 'TARGET_OFFLOAD_OPTIONS' instead synthesizes
> >>> '-foffload-abi=-m64'
> >>> etc., which the 'mkoffload's can then readily use.  Could you please
> >>> give that a try, and/or does anyone see any issues with that approach?
> >>>
> >>> And use something like '-foffload-abi=disable' to replace the current:
> >>>
> >>>    /* PR libgomp/65099: Currently, we only support offloading in 64-bit
> >>>       configurations.  */
> >>>if (offload_abi == OFFLOAD_ABI_LP64)
> >>>  {
> >>>
> >>> (As discussed before, this should be done differently altogether,
> >>> but that's for another day.)
> >> Sorry, I don't quite follow. Currently we enable offloading if
> >> offload_abi == OFFLOAD_ABI_LP64, which is synthesized from
> >> -foffload-abi=lp64. If we change -foffload-abi to instead specify
> >> host-specific ABI opts, I guess mkoffload will still need to somehow
> >> figure out which ABI is used, so it can disable offloading for 32-bit?
> >> I suppose we could adjust TARGET_OFFLOAD_OPTIONS for each host to
> >> pass -foffload-abi=disable if TARGET_ILP32 is set and offload target
> >> is nvptx, but not sure if that'd be correct?
> >
> > Basically, yes.  My idea was that all 'TARGET_OFFLOAD_OPTIONS'
> > implementations return either the correct host flags to be used by the
> > 'mkoffload's (the case that offloading is supported for the current
> > host flags/ABI configuration), or otherwise return
> > '-foffload-abi=disable'.
> > For example (untested):
> >
> >> char *
> >> ix86_offload_options (void)
> >> {
> >>   if (TARGET_LP64)
> >> -return xstrdup ("-foffload-abi=lp64");
> >> +return xstrdup ("-foffload-abi=-m64");
> >> -  return xstrdup ("-foffload-abi=ilp32");
> >> +  return xstrdup ("-foffload-abi=disable");
> >> }
> >
> > That is, only for 'TARGET_LP64' offloading is supported, and via
> > '-foffload-abi=-m64' the 'mkoffload's know that they need to specify
> > '-m64'.  For other host flags/ABI configuration, the 'mkoffload's see
> > '-foffload-abi=disable' and thus disable offload code generation
> > (replacing the current 'if (offload_abi == OFFLOAD_ABI_LP64)' in
> > 'mkoffload').
> >
> >> In the attached patch
> >
> > Yes, that's going in the right direction, thanks!
> >
> >> I added another option -foffload-abi-host-opts to specify host abi
> >> opts, and leave -foffload-abi to specify if ABI is 32/64 bit which
> >> mkoffload can use to enable/disable offloading (as before).
> >
> > I'm not sure however, if this additional option is really necessary?
Well, my concern was if that'd change the behavior for TARGET_ILP32 ?
IIUC, currently for -foffload-abi=ilp32, mkoffload will create empty 
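
For illustration only, the proposal quoted above (passing the host flag directly via '-foffload-abi=-m64', or '-foffload-abi=disable' when offloading is unsupported) could be interpreted on the receiving side roughly as follows. This is a hypothetical sketch, not code from mkoffload; the helper name and its return convention are assumptions.

```c
#include <string.h>

/* Hypothetical helper, not taken from mkoffload: interpret an argument
   of the proposed form "-foffload-abi=<host-flag>" or
   "-foffload-abi=disable".  Returns 1 and stores the host flag in
   *HOST_FLAG when offloading is enabled; returns 0 and stores NULL when
   the argument is "disable", empty, or not an -foffload-abi option.  */
static int
parse_offload_abi (const char *arg, const char **host_flag)
{
  static const char prefix[] = "-foffload-abi=";
  size_t len = sizeof prefix - 1;

  *host_flag = NULL;
  if (strncmp (arg, prefix, len) != 0)
    return 0;

  const char *value = arg + len;
  if (*value == '\0' || strcmp (value, "disable") == 0)
    return 0;

  /* The remainder is passed through to the host compiler unchanged.  */
  *host_flag = value;
  return 1;
}
```

The point of the design is that the tool no longer needs to know which host flags correspond to which ABI; it either forwards the flag it was given or skips offload code generation.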

[PATCH] c++: default targ eligibility refinement [PR101463]

2024-08-16 Thread Patrick Palka
> > Here during default template argument substitution we wrongly consider
> > the (substituted) default arguments v and vt as value-dependent[1]
> > which ultimately leads to deduction failure for the calls.
> >
> > The bogus value_dependent_expression_p result aside, I noticed
> > type_unification_real during default targ substitution keeps track of
> > whether all previous targs are known and non-dependent, as is the case
> > for these calls.  And in such cases it should be safe to avoid checking
> > dependence of the substituted default targ and just assume it's not.
> > This patch implements this optimization, which lets us accept both
> > testcases by sidestepping the value_dependent_expression_p issue
> > altogether.
>
> Hmm, maybe instead of substituting and asking if it's dependent, we should
> specifically look for undeduced parameters.

Makes sense, like so?  Bootstrapped and regtested on x86_64-pc-linux-gnu.

PR c++/101463

gcc/cp/ChangeLog:

* pt.cc (type_unification_real): Directly look for undeduced
parameters in the default argument instead of substituting
and asking if it's dependent.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/nontype6.C: New test.
* g++.dg/cpp1z/nontype6a.C: New test.
---
 gcc/cp/pt.cc   | 41 ++
 gcc/testsuite/g++.dg/cpp1z/nontype6.C  | 24 +++
 gcc/testsuite/g++.dg/cpp1z/nontype6a.C | 25 
 3 files changed, 71 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/nontype6.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/nontype6a.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 8725a5eeb3f..ad0f73c2f43 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -23607,28 +23607,31 @@ type_unification_real (tree tparms,
 is important if the default argument contains something that
 might be instantiation-dependent like access (87480).  */
  processing_template_decl_sentinel s (!any_dependent_targs);
- tree substed = NULL_TREE;
- if (saw_undeduced == 1 && !any_dependent_targs)
+
+ tree used_tparms = NULL_TREE;
+ if (saw_undeduced == 1)
{
- /* First instatiate in template context, in case we still
-depend on undeduced template parameters.  */
- ++processing_template_decl;
- substed = tsubst_template_arg (arg, full_targs, complain,
-NULL_TREE);
- --processing_template_decl;
- if (substed != error_mark_node
- && !uses_template_parms (substed))
-   /* We replaced all the tparms, substitute again out of
-  template context.  */
-   substed = NULL_TREE;
+ tree tparms_list = build_tree_list (size_int (1), tparms);
+ used_tparms = find_template_parameters (arg, tparms_list);
+ for (; used_tparms; used_tparms = TREE_CHAIN (used_tparms))
+   {
+ int level, index;
+ template_parm_level_and_index (TREE_VALUE (used_tparms),
+&level, &index);
+ if (TREE_VEC_ELT (targs, index) == NULL_TREE)
+   break;
+   }
}
- if (!substed)
-   substed = tsubst_template_arg (arg, full_targs, complain,
-  NULL_TREE);
 
- if (!uses_template_parms (substed))
-   arg = convert_template_argument (parm, substed, full_targs,
-complain, i, NULL_TREE);
+ if (!used_tparms)
+   {
+ /* All template parameters used within this default argument
+are deduced, so we can use it.  */
+ arg = tsubst_template_arg (arg, full_targs, complain,
+NULL_TREE);
+ arg = convert_template_argument (parm, arg, full_targs,
+  complain, i, NULL_TREE);
+   }
  else if (saw_undeduced == 1)
arg = NULL_TREE;
  else if (!any_dependent_targs)
diff --git a/gcc/testsuite/g++.dg/cpp1z/nontype6.C b/gcc/testsuite/g++.dg/cpp1z/nontype6.C
new file mode 100644
index 000..06cd234cc61
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/nontype6.C
@@ -0,0 +1,24 @@
+// PR c++/101463
+// { dg-do compile { target c++17 } }
+
+int a;
+
+int& v = a;
+
+template
+void f(int) { }
+
+template
+void g(T) { }
+
+template
+int& vt = a;
+
+template>
+void h(T) { }
+
+int main() {
+  f(0);
+  g(0);
+  h(0);
+}
diff --git a/gcc/testsuite/g++.dg/cpp1z/nontype6a.C b/gcc/testsuite/g++.dg/cpp1z/nontype6a.C
new file mode 

Re: [PATCH] libcpp, c-family, v3: Add (dumb) C23 N3017 #embed support [PR105863]

2024-08-16 Thread Joseph Myers
On Fri, 16 Aug 2024, Jakub Jelinek wrote:

> > Apart from any consequences for arguments of prefix/suffix/if_empty (where 
> > there is a plausible argument that the argument should get expanded at 
> > some point and that the current wording is undesirable for usability), 
> > this would also mean that e.g.
> > 
> > #define LIMIT limit
> > #embed "file" LIMIT(1)
> > 
> > isn't valid because LIMIT doesn't get expanded (the syntax for 
> > non-expanded #embed is met, with an unknown parameter LIMIT), while
> > 
> > #define limit !
> > #embed "file" limit(1)
> > 
> > *is* valid, because limit doesn't get expanded (which may be convenient 
> > for usability - it means headers don't need to use __limit__ if using 
> > #embed, even if files including the header might have defined limit as a 
> > macro).
> 
> Is there an agreement on that?

I think it's understood to be what the words mean (apart from the recent 
discussion, also noted in February 2022 discussion - e.g. 
).  Whether it's desirable 
is another matter.  Apart from likely wanting to be able to use macros 
within the prefix/suffix/if_empty arguments, it would also seem fairly 
reasonable to want to have a macro expanding to such parameters used with 
more than one directive.

#define PARAMS prefix(X) suffix(Y)

#embed "a" PARAMS

#embed "b" PARAMS

without needing to arrange for the directive not to look like one of the 
other forms before macro expansion.

I've added this to my list of issues to file once we have an issue 
tracking system for the C standard in operation.

> > > +  if (CPP_PEDANTIC (pfile))
> > > +{
> > > +  if (CPP_OPTION (pfile, cplusplus))
> > > + cpp_error (pfile, CPP_DL_PEDWARN,
> > > +"#%s is a GCC extension", "embed");
> > > +  else if (!CPP_OPTION (pfile, warning_directive))
> > > + cpp_error (pfile, CPP_DL_PEDWARN,
> > > +"#%s before C23 is a GCC extension", "embed");
> > 
> > I don't think warning_directive directive should be used here as the 
> > condition for diagnosing #embed as an extension; adding a separate 
> > embed_directive would be better.  (Especially if a future C++ version ends 
> > up adding #embed; you could then use embed_directive as a condition for 
> > the pedwarn for both C and C++, whereas warning_directive wouldn't work as 
> > a condition for C++ since #warning is already in C++23.)
> 
> Ok, will change that (there are really too many features and the table
> already needs 147 columns before this change, so wanted to avoid adding new
> stuff there unless necessary).

My concern here is more with the use of the fields in cpp_options, than 
with the initialization of that from lang_flags.  Maybe that large 
lang_defaults table isn't the optimal way of initializing all those 
cpp_options fields (although initializing CPP_OPTION (pfile, 
embed_directive) from l->warning_directive would also be awkward; the 
question is more whether computing the cpp_options fields with logic like 
"C from version X onwards, C++ from version Y onwards" would be an 
improvement on a table taking 147 columns).

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [COMMITTED] Regenerate avr.opt.urls

2024-08-16 Thread Georg-Johann Lay

On 13.08.24 at 10:59, Mark Wielaard wrote:

avr added an -mlra option, but the avr.opt.urls file wasn't
regenerated.

Note that commit 149a23ee2568 ("AVR: -mlra is not documeted in TEXI.")
did add the Undocumented flag, but that still needs the avr.opt.urls
file to be updated.

Fixes: 09a87ea666b2 ("AVR: ad target/113934 - Add option -mlra to enable LRA.")


So I wonder why regenerate-opt-urls.py is searching for documentation
in the first place, because -mlra is tagged "Undocumented".

Johann


gcc/ChangeLog:

* config/avr/avr.opt.urls: Regenerate.
---
  gcc/config/avr/avr.opt.urls | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/gcc/config/avr/avr.opt.urls b/gcc/config/avr/avr.opt.urls
index f38e67384ab1..6acc418b407d 100644
--- a/gcc/config/avr/avr.opt.urls
+++ b/gcc/config/avr/avr.opt.urls
@@ -1,5 +1,7 @@
  ; Autogenerated by regenerate-opt-urls.py from gcc/config/avr/avr.opt and generated HTML
  
+; skipping UrlSuffix for 'mlra' due to finding no URLs
+
  mcall-prologues
  UrlSuffix(gcc/AVR-Options.html#index-mcall-prologues)
  


Re: [PATCH] libcpp, c-family, v3: Add (dumb) C23 N3017 #embed support [PR105863]

2024-08-16 Thread Joseph Myers
On Fri, 16 Aug 2024, Jakub Jelinek wrote:

> On Fri, Aug 16, 2024 at 01:43:58AM +0200, Jakub Jelinek wrote:
> > My reading of it wasn't that whether it is
> > # embed < h-char-sequence > embed-parameter-sequence[opt] new-line
> > or
> > # embed " q-char-sequence " embed-parameter-sequence[opt] new-line
> > or
> > # embed pp-tokens new-line
> > depends solely on the filename part in there, but also whether
> > embed-parameter-sequence is syntactically valid (if specified).
> 
> But if so, doesn't that mean that also
> #define foo bar
> #define bar baz
> #define limit suffix (1
> #embed  limit )
> should be treated as
> #embed  suffix (1)
> ?
> I'd think that for filenames that would be quite surprising.

I think the header-name preprocessing token still gets recognized inside 
#embed, so preventing macro expansion inside the filename, because there 
are no identifier preprocessing tokens inside that filename.  Just as

#define EMPTY
#define foobar stdio
#include  EMPTY

doesn't end up including  after macro expansion.  (There is the 
subsequent implementation-defined process for combining preprocessing 
tokens between < and > into a header name, so allowing a header name to 
result from macro expansion without needing a header name as an argument 
to the macro, but no such process to split up a header name before macro 
expansion.)
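
A minimal sketch of the two behaviours described above, using #include
because that is the analogy drawn here.  It assumes a GCC-like
preprocessor and that the system <stdio.h> never uses a bare identifier
named stdio; the macro names and the helper headers_visible are
hypothetical.

```c
/* 'stdio' is defined as a macro, yet the directive below still finds
   <stdio.h>: the text between < and > is lexed as one header-name
   token, so there is no identifier token to expand.  */
#define stdio not_a_header
#include <stdio.h>
#undef stdio

/* The other direction is allowed: tokens that result from macro
   expansion may be combined into a header name (the process is
   implementation-defined).  */
#define STRING_HEADER <string.h>
#include STRING_HEADER

/* Hypothetical helper: returns 1 if both headers were really included.  */
int
headers_visible (void)
{
  return EOF < 0 && strlen ("embed") == 5;
}
```

If the header name were split into identifiers before expansion, the
first directive would fail to find a header at all.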

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] Fortran: fix doumentation of intrinsic RANDOM_INIT [PR114146]

2024-08-16 Thread Harald Anlauf

Hi Andre,

Am 16.08.24 um 07:46 schrieb Andre Vehreschild:

Hi Harald,

s/doumentation/documentation/

in the commit's title.


oops!  Thanks for pointing this out.

Fixed and pushed as r15-2955-g07ece73d4712c6 .

Harald


Thanks for the patch, ok to commit.

- Andre

On Thu, 15 Aug 2024 22:37:53 +0200
Harald Anlauf  wrote:


Dear all,

here's a documentation bugfix.  The previous wording was in conflict
with the standard, while the runtime behavior is apparently fine.

Checked with make dvi pdf .

OK for mainline?

Thanks,
Harald




--
Andre Vehreschild * Email: vehre ad gmx dot de





Re: [PATCH] libcpp, c-family, v3: Add (dumb) C23 N3017 #embed support [PR105863]

2024-08-16 Thread Jakub Jelinek
On Fri, Aug 16, 2024 at 04:05:52PM +, Joseph Myers wrote:
> On Fri, 16 Aug 2024, Jakub Jelinek wrote:
> 
> > > Apart from any consequences for arguments of prefix/suffix/is_empty 
> > > (where 
> > > there is a plausible argument that the argument should get expanded at 
> > > some point and that the current wording is undesirable for usability), 
> > > this would also mean that e.g.
> > > 
> > > #define LIMIT limit
> > > #embed "file" LIMIT(1)
> > > 
> > > isn't valid because LIMIT doesn't get expanded (the syntax for 
> > > non-expanded #embed is met, with an unknown parameter LIMIT), while
> > > 
> > > #define limit !
> > > #embed "file" limit(1)
> > > 
> > > *is* valid, because limit doesn't get expanded (which may be convenient 
> > > for usability - it means headers don't need to use __limit__ if using 
> > > #embed, even if files including the header might have defined limit as a 
> > > macro).
> > 
> > Is there an agreement on that?
> 
> I think it's understood to be what the words mean (apart from the recent 
> discussion, also noted in February 2022 discussion - e.g. 
> ).  Whether it's desirable 
> is another matter.  Apart from likely wanting to be able to use macros 
> within the prefix/suffix/if_empty arguments, it would also seem fairly 
> reasonable to want to have a macro expanding to such parameters used with 
> more than one directive.
> 
> #define PARAMS prefix(X) suffix(Y)
> 
> #embed "a" PARAMS
> 
> #embed "b" PARAMS
> 
> without needing to arrange for the directive not to look like one of the 
> other forms before macro expansion.
> 
> I've added this to my list of issues to file once we have an issue 
> tracking system for the C standard in operation.

Ok.  So for now, should I work on a patch variant which tries to follow
what is in C23 right now?
I.e. most likely set pfile->state.prevent_expansion = 1; initially,
keep checking the tokens just for the basic syntax match (header token or
string literal, followed by a check for pp-parameter matches, if any, until
the end of line), decide based on that, and push all the read tokens back
to lookaside?
Still, as I wrote, not really sure if it goes with the
pfile->state.prevent_expansion = 1 decision when temporarily switching that
off for the limit argument real parsing when the closing ) (i.e.
non-balanced) comes from a macro, and whether to ignore the macro expansion
of prefix/suffix/if_empty altogether, or push those expanded when actually
using them (and if so, again, what to do with unbalanced case; maybe in that
case it would be desirable and would allow to insert unbalanced {,(,[,],),}
to the token stream).

> My concern here is more with the use of the fields in cpp_options, than 
> with the initialization of that from lang_flags.  Maybe that large 
> lang_defaults table isn't the optimal way of initializing all those 
> cpp_options fields (although initializing CPP_OPTION (pfile, 
> embed_directive) from l->warning_directive would also be awkward; the 
> question is more whether computing the cpp_options fields with logic like 
> "C from version X onwards, C++ from version Y onwards" would be an 
> improvement on a table taking 147 columns).

Ok, will change it.  And think how to improve the table next.
Indeed, having some macro expressing that an option is C >= n && C++ >= m,
whether GNU only or standard etc., would be nice.
Right now it is a 28x23 bit matrix.
One possibility would be

/*                                              u         e w
                                          b d   8         l a   t
                              x         u i i   c v s   s i r d r
                        x     i   d u r d n g t h a c   z f n e u
                    c c n x c d s i l l l c s r l o o d l d d l f
                    9 + u i 1 i t g i i i s e i i p p f i e i i a
                    9 + m d 1 d d r t t t t p g t t e p t f r m l  */
{ /* GNUC89   */  { 0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0 },
  /* GNUC99   */  { 1,0,1,1,0,0,0,1,1,1,0,0,0,0,0,1,1,0,0,0,0,0,0 },
  /* GNUC11   */  { 1,0,1,1,1,0,0,1,1,1,0,0,0,0,0,1,1,0,0,0,0,0,0 },
  /* GNUC17   */  { 1,0,1,1,1,0,0,1,1,1,0,0,0,0,0,1,1,0,0,0,0,0,0 },
  /* GNUC23   */  { 1,0,1,1,1,1,0,1,1,1,0,1,1,0,1,1,1,1,0,1,1,0,1 },
  /* GNUC2Y   */  { 1,0,1,1,1,1,0,1,1,1,0,1,1,0,1,1,1,1,0,1,1,0,1 },
  /* STDC89   */  { 0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0 },
  /* STDC94   */  { 0,0,0,0,0,0,1,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0 },
  /* STDC99   */  { 1,0,1,1,0,0,1,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0 },
  /* STDC11   */  { 1,0,1,1,1,0,1,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0 },
  /* STDC17   */  { 1,0,1,1,1,0,1,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0 },
  /* STDC23   */  { 1,0,1,1,1,1,1,1,1,0,0,1,1,0,1,1,1,1,0,1,1,0,1 },
  /* STDC2Y   */  { 1,0,1,1,1,1,1,1,1,0,0,1,1,0,1,1,1,1,0,1,1,0,1 },
  /* GNUCXX   */  { 0,1,1,1,0,1,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,1 },
  /* CXX98*/  { 0,1,0,1,0,1,1,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1 },
  /* GNUCXX11 */  { 1,1,1,1,1,1,0,1,1,1,1,0,0,

Re: [PATCH] libcpp, v2: Add support for gnu::offset #embed/__has_embed parameter

2024-08-16 Thread Joseph Myers
On Thu, 15 Aug 2024, Jakub Jelinek wrote:

> +   else
> + {
> +   if (res > INTTYPE_MAXIMUM (off_t))
> + cpp_error_with_line (pfile, CPP_DL_ERROR, loc, 0,
> +  "too large 'gnu::offset' argument");

Having a testcase for this diagnostic would be a good idea.  Also one for 
a negative argument for gnu::offset (the errors for negative arguments are 
already tested for limit, but I think testing that for gnu::offset is a 
good idea as well).

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [Fortran, Patch, PR46371, v1] Fix coarrays use in select type

2024-08-16 Thread Harald Anlauf

Hi Andre,

Am 16.08.24 um 14:10 schrieb Andre Vehreschild:

Hi all,

attached patch is a follow up on the pr110033 patch and fixes two ICEs
reported in pr46371. With the patch also pr56496 is fixed, although that could
have been fixed by pr110033 already. I just added the testcase from pr56496 here
as coarray/select_type_3.f90 (I like it when the name of the test gives a rough
idea on what is tested instead of having just the pr#) to have it covered.

Bootstraps and regtests ok on x86_64-pc-linux-gnu. Ok for mainline?


this looks good to me.

I think with this patch also pr99837 is resolved.  Can you have a look,
and if so, close it?

Thanks for the patch!

Harald


Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de




Re: [PATCH] libcpp, c-family, v3: Add (dumb) C23 N3017 #embed support [PR105863]

2024-08-16 Thread Joseph Myers
On Fri, 16 Aug 2024, Jakub Jelinek wrote:

> Ok.  So for now, should I work on a patch variant which tries to follow
> what is in C23 right now?

Not sure how useful having such a patch variant would be until we have a 
better idea of what the desired semantics actually are.

-- 
Joseph S. Myers
josmy...@redhat.com



[PATCH] Extend check-function-bodies to cover directives

2024-08-16 Thread H.J. Lu
As PR target/116174 shows, we may need to verify the directive order.
Extend check-function-bodies to cover directives.

* gcc.target/i386/pr116174.c: Use check-function-bodies.
* lib/scanasm.exp (configure_check-function-bodies): Add an
argument for fluff.  Set up_config(fluff) to $fluff if not
empty.
(check-function-bodies): Add an optional argument for fluff and
pass it to configure_check-function-bodies.

Signed-off-by: H.J. Lu 
---
 gcc/testsuite/gcc.target/i386/pr116174.c | 16 ++--
 gcc/testsuite/lib/scanasm.exp| 17 -
 2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr116174.c 
b/gcc/testsuite/gcc.target/i386/pr116174.c
index 8877d0b51af..75c62964d97 100644
--- a/gcc/testsuite/gcc.target/i386/pr116174.c
+++ b/gcc/testsuite/gcc.target/i386/pr116174.c
@@ -1,6 +1,20 @@
 /* { dg-do compile { target *-*-linux* } } */
 /* { dg-options "-O2 -fcf-protection=branch" } */
+/* Keep directives ('.p2align', '.cfi_startproc').
+/* { dg-final { check-function-bodies "**" "" "" { target "*-*-*" } 
{^\s*(?://|$)} } } */
 
+/*
+**foo:
+**...
+** .cfi_startproc
+** (
+** endbr64
+** |
+** endbr32
+** )
+** .p2align 5
+**...
+*/
 char *
 foo (char *dest, const char *src)
 {
@@ -8,5 +22,3 @@ foo (char *dest, const char *src)
 /* nothing */;
   return --dest;
 }
-
-/* { dg-final { scan-assembler "\t\.cfi_startproc\n\tendbr(32|64)\n" } } */
diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
index 42c719c512c..5165284608f 100644
--- a/gcc/testsuite/lib/scanasm.exp
+++ b/gcc/testsuite/lib/scanasm.exp
@@ -863,7 +863,7 @@ proc scan-lto-assembler { args } {
 
 # Set up CONFIG for check-function-bodies.
 
-proc configure_check-function-bodies { config } {
+proc configure_check-function-bodies { config fluff } {
 upvar $config up_config
 
 # Regexp for the start of a function definition (name in \1).
@@ -890,7 +890,9 @@ proc configure_check-function-bodies { config } {
 }
 
 # Regexp for lines that aren't interesting.
-if { [istarget nvptx*-*-*] } {
+if {$fluff ne ""} then {
+   set up_config(fluff) $fluff
+} elseif { [istarget nvptx*-*-*] } {
# Skip lines beginning with '//' comments ('-fverbose-asm', for
# example).
set up_config(fluff) {^\s*(?://)}
@@ -982,7 +984,7 @@ proc check_function_body { functions name body_regexp } {
 
 # Check the implementations of functions against expected output.  Used as:
 #
-# { dg-do { check-function-bodies PREFIX TERMINATOR[ OPTION[ SELECTOR]] } }
+# { dg-do { check-function-bodies PREFIX TERMINATOR[ OPTION[ SELECTOR 
[FLUFF]]] } }
 #
 # See sourcebuild.texi for details.
 
@@ -990,7 +992,7 @@ proc check-function-bodies { args } {
 if { [llength $args] < 2 } {
error "too few arguments to check-function-bodies"
 }
-if { [llength $args] > 4 } {
+if { [llength $args] > 5 } {
error "too many arguments to check-function-bodies"
 }
 
@@ -1029,6 +1031,11 @@ proc check-function-bodies { args } {
}
 }
 
+set fluff ""
+if { [llength $args] >= 5 } {
+   set fluff [lindex $args 4]
+}
+
 set testcase [testname-for-summary]
 # The name might include a list of options; extract the file name.
 set filename [lindex $testcase 0]
@@ -1048,7 +1055,7 @@ proc check-function-bodies { args } {
 # (name in \1).  This may be different from '$config(start)'.
 set start_expected {^(\S+):$}
 
-configure_check-function-bodies config
+configure_check-function-bodies config $fluff
 set have_bodies 0
 if { [is_remote host] } {
remote_upload host "$filename"
-- 
2.46.0



Re: [Ping x 3, Patch, Fortran, PR84244, v3] Fix ICE in recompute_tree_invariant_for_addr_expr, at tree.c:4535

2024-08-16 Thread Harald Anlauf

Hi Andre,

Am 16.08.24 um 12:05 schrieb Andre Vehreschild:

Hi all,

any one for a review? This patch is over a month old and starts to rot.

Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?


this is good to go.

Thanks for the patch!

Harald


- Andre

On Fri, 9 Aug 2024 16:27:42 +0200
Andre Vehreschild  wrote:


Ping!

On Wed, 17 Jul 2024 15:11:33 +0200
Andre Vehreschild  wrote:


Hi all,

and the last ping.

Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?

Regards,
Andre

On Thu, 11 Jul 2024 16:05:09 +0200
Andre Vehreschild  wrote:


Hi all,

the attached patch fixes a segfault in the compiler, where for pointer
components of a derived type the caf_token in the component was not
set, when the derived was previously used outside of a coarray.

Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?

Regards,
Andre



--
Andre Vehreschild * Email: vehre ad gmx dot de



--
Andre Vehreschild * Email: vehre ad gmx dot de



--
Andre Vehreschild * Email: vehre ad gmx dot de




[PATCH] libcpp: Adjust lang_defaults

2024-08-16 Thread Jakub Jelinek
Hi!

Here it is in patch form, at the same time I've turned it into bit-fields.
On x86_64-linux, this reduced .rodata by 532 bytes (so 5.75x reduction
of the variable) and grew the cpp_set_lang function by 26 bytes (8.4%
growth).

So far smoke tested, ok for trunk if it passes full bootstrap/regtest?
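
The size figures above can be reproduced with a rough sketch like the
following.  It assumes a 32-bit unsigned int and a table of 28 rows
with 23 flags each; the struct and function names are hypothetical
stand-ins, not the libcpp definitions.

```c
#include <stddef.h>

/* Hypothetical stand-in for the old layout: 23 one-byte flags.  */
struct flags_as_chars
{
  char flag[23];
};

/* Hypothetical stand-in for the new layout: 23 one-bit flags, which
   fit into a single unsigned int on common ABIs.  */
struct flags_as_bits
{
  unsigned int f0 : 1, f1 : 1, f2 : 1, f3 : 1, f4 : 1, f5 : 1;
  unsigned int f6 : 1, f7 : 1, f8 : 1, f9 : 1, f10 : 1, f11 : 1;
  unsigned int f12 : 1, f13 : 1, f14 : 1, f15 : 1, f16 : 1, f17 : 1;
  unsigned int f18 : 1, f19 : 1, f20 : 1, f21 : 1, f22 : 1;
};

/* Assumed number of rows (one per language/standard variant).  */
enum { N_LANGS = 28 };

/* Total bytes of the table with one char per flag.  */
size_t
table_bytes_chars (void)
{
  return N_LANGS * sizeof (struct flags_as_chars);
}

/* Total bytes of the table with one bit per flag.  */
size_t
table_bytes_bits (void)
{
  return N_LANGS * sizeof (struct flags_as_bits);
}
```

Under those assumptions the table shrinks from 28 * 23 = 644 bytes to
28 * 4 = 112 bytes, which matches the 532 byte saving and the 5.75x
ratio quoted above.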

--- libcpp/init.cc.jj   2024-08-15 12:03:35.880901322 +0200
+++ libcpp/init.cc  2024-08-16 19:37:31.564755070 +0200
@@ -77,61 +77,67 @@ END
requires.  */
 struct lang_flags
 {
-  char c99;
-  char cplusplus;
-  char extended_numbers;
-  char extended_identifiers;
-  char c11_identifiers;
-  char xid_identifiers;
-  char std;
-  char digraphs;
-  char uliterals;
-  char rliterals;
-  char user_literals;
-  char binary_constants;
-  char digit_separators;
-  char trigraphs;
-  char utf8_char_literals;
-  char va_opt;
-  char scope;
-  char dfp_constants;
-  char size_t_literals;
-  char elifdef;
-  char warning_directive;
-  char delimited_escape_seqs;
-  char true_false;
+  unsigned int c99 : 1;
+  unsigned int cplusplus : 1;
+  unsigned int extended_numbers : 1;
+  unsigned int extended_identifiers : 1;
+  unsigned int c11_identifiers : 1;
+  unsigned int xid_identifiers : 1;
+  unsigned int std : 1;
+  unsigned int digraphs : 1;
+  unsigned int uliterals : 1;
+  unsigned int rliterals : 1;
+  unsigned int user_literals : 1;
+  unsigned int binary_constants : 1;
+  unsigned int digit_separators : 1;
+  unsigned int trigraphs : 1;
+  unsigned int utf8_char_literals : 1;
+  unsigned int va_opt : 1;
+  unsigned int scope : 1;
+  unsigned int dfp_constants : 1;
+  unsigned int size_t_literals : 1;
+  unsigned int elifdef : 1;
+  unsigned int warning_directive : 1;
+  unsigned int delimited_escape_seqs : 1;
+  unsigned int true_false : 1;
 };
 
-static const struct lang_flags lang_defaults[] =
-{ /*  c99 c++ xnum xid c11 xidid std digr ulit rlit udlit bincst 
digsep trig u8chlit vaopt scope dfp szlit elifdef warndir delim trufal */
-  /* GNUC89   */  { 0,  0,  1,  0,  0,  0,0,  1,   0,   0,   0,0, 
0, 0,   0,  1,   1, 0,   0,   0,  0,  0,0 },
-  /* GNUC99   */  { 1,  0,  1,  1,  0,  0,0,  1,   1,   1,   0,0, 
0, 0,   0,  1,   1, 0,   0,   0,  0,  0,0 },
-  /* GNUC11   */  { 1,  0,  1,  1,  1,  0,0,  1,   1,   1,   0,0, 
0, 0,   0,  1,   1, 0,   0,   0,  0,  0,0 },
-  /* GNUC17   */  { 1,  0,  1,  1,  1,  0,0,  1,   1,   1,   0,0, 
0, 0,   0,  1,   1, 0,   0,   0,  0,  0,0 },
-  /* GNUC23   */  { 1,  0,  1,  1,  1,  1,0,  1,   1,   1,   0,1, 
1, 0,   1,  1,   1, 1,   0,   1,  1,  0,1 },
-  /* GNUC2Y   */  { 1,  0,  1,  1,  1,  1,0,  1,   1,   1,   0,1, 
1, 0,   1,  1,   1, 1,   0,   1,  1,  0,1 },
-  /* STDC89   */  { 0,  0,  0,  0,  0,  0,1,  0,   0,   0,   0,0, 
0, 1,   0,  0,   0, 0,   0,   0,  0,  0,0 },
-  /* STDC94   */  { 0,  0,  0,  0,  0,  0,1,  1,   0,   0,   0,0, 
0, 1,   0,  0,   0, 0,   0,   0,  0,  0,0 },
-  /* STDC99   */  { 1,  0,  1,  1,  0,  0,1,  1,   0,   0,   0,0, 
0, 1,   0,  0,   0, 0,   0,   0,  0,  0,0 },
-  /* STDC11   */  { 1,  0,  1,  1,  1,  0,1,  1,   1,   0,   0,0, 
0, 1,   0,  0,   0, 0,   0,   0,  0,  0,0 },
-  /* STDC17   */  { 1,  0,  1,  1,  1,  0,1,  1,   1,   0,   0,0, 
0, 1,   0,  0,   0, 0,   0,   0,  0,  0,0 },
-  /* STDC23   */  { 1,  0,  1,  1,  1,  1,1,  1,   1,   0,   0,1, 
1, 0,   1,  1,   1, 1,   0,   1,  1,  0,1 },
-  /* STDC2Y   */  { 1,  0,  1,  1,  1,  1,1,  1,   1,   0,   0,1, 
1, 0,   1,  1,   1, 1,   0,   1,  1,  0,1 },
-  /* GNUCXX   */  { 0,  1,  1,  1,  0,  1,0,  1,   0,   0,   0,0, 
0, 0,   0,  1,   1, 0,   0,   0,  0,  0,1 },
-  /* CXX98*/  { 0,  1,  0,  1,  0,  1,1,  1,   0,   0,   0,0, 
0, 1,   0,  0,   1, 0,   0,   0,  0,  0,1 },
-  /* GNUCXX11 */  { 1,  1,  1,  1,  1,  1,0,  1,   1,   1,   1,0, 
0, 0,   0,  1,   1, 0,   0,   0,  0,  0,1 },
-  /* CXX11*/  { 1,  1,  0,  1,  1,  1,1,  1,   1,   1,   1,0, 
0, 1,   0,  0,   1, 0,   0,   0,  0,  0,1 },
-  /* GNUCXX14 */  { 1,  1,  1,  1,  1,  1,0,  1,   1,   1,   1,1, 
1, 0,   0,  1,   1, 0,   0,   0,  0,  0,1 },
-  /* CXX14*/  { 1,  1,  0,  1,  1,  1,1,  1,   1,   1,   1,1, 
1, 1,   0,  0,   1, 0,   0,   0,  0,  0,1 },
-  /* GNUCXX17 */  { 1,  1,  1,  1,  1,  1,0,  1,   1,   1,   1,1, 
1, 0,   1,  1,   1, 0,   0,   0,  0,  0,1 },
-  /* CXX17*/  { 1,  1,  1,  1,  1,  

[PATCH] c++: fix ICE in convert_nontype_argument [PR116384]

2024-08-16 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Here we ICE since r14-8291 in C++11/C++14 modes.  Fortunately
this is an easy one.

The important bit of r14-8291 is this:

@@ -20056,9 +20071,12 @@ tsubst_expr (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
RETURN (retval);
  }
if (IMPLICIT_CONV_EXPR_NONTYPE_ARG (t))
- /* We'll pass this to convert_nontype_argument again, we don't need
-to actually perform any conversion here.  */
- RETURN (expr);
+ {
+   tree r = convert_nontype_argument (type, expr, complain);
+   if (r == NULL_TREE)
+ r = error_mark_node;
+   RETURN (r);
+ }

which obviously means that instead of returning right away we go
to convert_nontype_argument.  When type is error_mark_node and we're
in C++17, in convert_nontype_argument we go down this path:

  else if (INTEGRAL_OR_ENUMERATION_TYPE_P (type)
   || cxx_dialect >= cxx17)
{
  expr = build_converted_constant_expr (type, expr, complain);
  if (expr == error_mark_node)
return (complain & tf_error) ? NULL_TREE : error_mark_node;
  // ...
}

but pre-C++17, we take a different route and end up crashing on
gcc_unreachable.

It would of course also work to check for error_mark_node early in
build_converted_constant_expr.

PR c++/116384

gcc/cp/ChangeLog:

* pt.cc (tsubst_expr) : Bail if tsubst
returns error_mark_node.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/vt-116384.C: New test.
---
 gcc/cp/pt.cc   |  2 ++
 gcc/testsuite/g++.dg/cpp0x/vt-116384.C | 26 ++
 2 files changed, 28 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/vt-116384.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 8725a5eeb3f..684ee0c8a60 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -20217,6 +20217,8 @@ tsubst_expr (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
 case IMPLICIT_CONV_EXPR:
   {
tree type = tsubst (TREE_TYPE (t), args, complain, in_decl);
+   if (type == error_mark_node)
+ RETURN (error_mark_node);
tree expr = RECUR (TREE_OPERAND (t, 0));
if (dependent_type_p (type) || type_dependent_expression_p (expr))
  {
diff --git a/gcc/testsuite/g++.dg/cpp0x/vt-116384.C 
b/gcc/testsuite/g++.dg/cpp0x/vt-116384.C
new file mode 100644
index 000..54d7f0774c5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/vt-116384.C
@@ -0,0 +1,26 @@
+// PR c++/116384
+// { dg-do compile { target c++11 } }
+
+namespace a {
+template  struct c;
+template  struct d;
+}
+namespace e {
+namespace g {
+template  using h = void;
+template  class, typename...> struct detector {};
+template  class i, typename... args>
+struct detector>, i, args...>;
+}
+template  class i, typename... args>
+using j = g::detector;
+template  using l = typename a::c::m;
+template  struct conjunction;
+namespace g {
+template  using n = l>::p>;
+}
+template  = true> class o;
+}
+struct r;
+template  using q = e::o;
+void s() { e::j f; }

base-commit: c8981bde45d365330a5e7c2e33c8dbaf3495248a
-- 
2.46.0



Re: [PATCH V3 08/10] rs6000: Adjust altivec dot-product backend patterns

2024-08-16 Thread Peter Bergner
rs6000 patches should CC the rs6000 port maintainers.  I've CC'd them on
this note.

Peter


On 8/15/24 3:44 AM, Victor Do Nascimento wrote:
> Following the migration of the dot_prod optab from a direct to a
> conversion-type optab, ensure all back-end patterns incorporate the
> second machine mode into pattern names.
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/altivec.md (udot_prod): Renamed to...
>   (udot_prodv4si): ...this.
>   (sdot_prodv8hi): Renamed to...
>   (sdot_prodv4siv8hi): ...this.
> ---
>  gcc/config/rs6000/altivec.md | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 1f5489b974f..0911c1792a8 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -3698,7 +3698,7 @@ (define_expand "neg2"
>  }
>  })
>  
> -(define_expand "udot_prod"
> +(define_expand "udot_prodv4si"
>[(set (match_operand:V4SI 0 "register_operand" "=v")
>  (plus:V4SI (match_operand:V4SI 3 "register_operand" "v")
> (unspec:V4SI [(match_operand:VIshort 1 "register_operand" 
> "v")  
> @@ -3710,7 +3710,7 @@ (define_expand "udot_prod"
>DONE;
>  })
>  
> -(define_expand "sdot_prodv8hi"
> +(define_expand "sdot_prodv4siv8hi"
>[(set (match_operand:V4SI 0 "register_operand" "=v")
>  (plus:V4SI (match_operand:V4SI 3 "register_operand" "v")
> (unspec:V4SI [(match_operand:V8HI 1 "register_operand" 
> "v")



Re: [PATCH 1/3] Write CodeView information about local static variables

2024-08-16 Thread Mark Harmstone

Thanks Jeff. No, CodeView is effectively Windows-specific - it relies on PE for 
reporting the PDB filename, and COFF for the .secidx relocation. I might look 
into moving these bits into the config once I get down to plumbing it for 
aarch64-w64-mingw32.

Mark

On 14/08/2024 05:09, Jeff Law wrote:



On 8/12/24 6:24 PM, Mark Harmstone wrote:

Outputs CodeView S_LDATA32 symbols, for static variables within
functions, along with S_BLOCK32 and S_END for the beginning and end of
lexical blocks.

gcc/
* dwarf2codeview.cc (enum cv_sym_type): Add S_END and S_BLOCK32.
(write_local_s_ldata32): New function.
(write_unoptimized_local_variable): New function.
(write_s_block32): New function.
(write_s_end): New function.
(write_unoptimized_function_vars): New function.
(write_function): Call write_unoptimized_function_vars.

This series is fine.  I'm not particularly jazzed about how much
target-specific data shows up in patch #2.  It's probably safe to assume the
mapping of register number to the codeview number doesn't match the dwarf map.
It's probably also safe to assume we're not supporting codeview on any targets
other than x86 and ix86?

jeff




[PATCH 1/2] builtins: Don't expand bit query builtins for __int128_t if the target supports an optab for it

2024-08-16 Thread Andrew Pinski
On aarch64 (without CSSC instructions), since popcount is implemented using
the SIMD instruction cnt, it is better to use one 128-bit cnt (V16QI mode)
instead of two SIMD cnt (V8QI mode), and only one reduction addition instead
of two. Currently fold_builtin_bit_query always expands without checking
whether there is an optab for the type, so this changes that to check the
optab to see if we should expand or have the backend handle it.

Bootstrapped and tested on x86_64-linux-gnu and built and tested for 
aarch64-linux-gnu.
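
For reference, the generic __int128 expansion that this patch skips when
an optab exists amounts to roughly the split below.  This is a sketch
only: it assumes a compiler that provides unsigned __int128 and
__builtin_popcountll (GCC or Clang), and popcount128_split is a
hypothetical name, not a function in builtins.cc.

```c
/* Count the set bits of a 128-bit value as two 64-bit popcounts.
   Assumes unsigned long long is 64 bits wide.  */
int
popcount128_split (unsigned __int128 x)
{
  unsigned long long lo = (unsigned long long) x;
  unsigned long long hi = (unsigned long long) (x >> 64);
  return __builtin_popcountll (lo) + __builtin_popcountll (hi);
}
```

With the optab check in place, a target can instead handle the whole
128-bit operation in its backend.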

gcc/ChangeLog:

* builtins.cc (fold_builtin_bit_query): Don't expand double
`unsigned long long` types if there is an optab entry for that
type.

Signed-off-by: Andrew Pinski 
---
 gcc/builtins.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 0b902896ddd..b4d51eaeba5 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -10185,7 +10185,9 @@ fold_builtin_bit_query (location_t loc, enum 
built_in_function fcode,
   tree call = NULL_TREE, tem;
   if (TYPE_PRECISION (arg0_type) == MAX_FIXED_MODE_SIZE
   && (TYPE_PRECISION (arg0_type)
- == 2 * TYPE_PRECISION (long_long_unsigned_type_node)))
+ == 2 * TYPE_PRECISION (long_long_unsigned_type_node))
+  /* If the target supports the optab, then don't do the expansion. */
+  && !direct_internal_fn_supported_p (ifn, arg0_type, OPTIMIZE_FOR_BOTH))
 {
   /* __int128 expansions using up to 2 long long builtins.  */
   arg0 = save_expr (arg0);
-- 
2.43.0



[PATCH 2/2] aarch64: Implement popcountti2 pattern [PR113042]

2024-08-16 Thread Andrew Pinski
When CSSC is not enabled, 128bit popcount can be implemented
just via the vector (v16qi) cnt instruction followed by a reduction,
like how the 64bit one is currently implemented instead of
splitting into 2 64bit popcount.

Built and tested for aarch64-linux-gnu.

PR target/113042

gcc/ChangeLog:

* config/aarch64/aarch64.md (popcountti2): New define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/popcnt10.c: New test.
* gcc.target/aarch64/popcnt9.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64.md   | 16 +
 gcc/testsuite/gcc.target/aarch64/popcnt10.c | 25 +
 gcc/testsuite/gcc.target/aarch64/popcnt9.c  | 25 +
 3 files changed, 66 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt10.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt9.c

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 12dcc16529a..73506e71f43 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5378,6 +5378,22 @@ (define_expand "popcount2"
 }
 })
 
+(define_expand "popcountti2"
+  [(set (match_operand:TI 0 "register_operand")
+   (popcount:TI (match_operand:TI 1 "register_operand")))]
+  "TARGET_SIMD && !TARGET_CSSC"
+{
+  rtx v = gen_reg_rtx (V16QImode);
+  rtx v1 = gen_reg_rtx (V16QImode);
+  emit_move_insn (v, gen_lowpart (V16QImode, operands[1]));
+  emit_insn (gen_popcountv16qi2 (v1, v));
+  rtx out = gen_reg_rtx (DImode);
+  emit_insn (gen_aarch64_zero_extenddi_reduc_plus_v16qi (out, v1));
+  out = convert_to_mode (TImode, out, true);
+  emit_move_insn (operands[0], out);
+  DONE;
+})
+
 (define_insn "clrsb2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt10.c 
b/gcc/testsuite/gcc.target/aarch64/popcnt10.c
new file mode 100644
index 000..4d01fc67022
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt10.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+/* PR target/113042 */
+
+#pragma GCC target "+cssc"
+
+/*
+** h128:
+** ldp x([0-9]+), x([0-9]+), \[x0\]
+** cnt x([0-9]+), x([0-9]+)
+** cnt x([0-9]+), x([0-9]+)
+** add w0, w([0-9]+), w([0-9]+)
+** ret
+*/
+
+
+unsigned h128 (const unsigned __int128 *a) {
+  return __builtin_popcountg (a[0]);
+}
+
+/* popcount with CSSC should be split into 2 sections. */
+/* { dg-final { scan-tree-dump-not "POPCOUNT " "optimized" } } */
+/* { dg-final { scan-tree-dump-times " __builtin_popcount" 2 "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt9.c 
b/gcc/testsuite/gcc.target/aarch64/popcnt9.c
new file mode 100644
index 000..c778fc7f420
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt9.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+/* PR target/113042 */
+
+#pragma GCC target "+nocssc"
+
+/*
+** h128:
+** ldr q([0-9]+), \[x0\]
+** cnt v([0-9]+).16b, v\1.16b
+** addv b([0-9]+), v\2.16b
+** fmov w0, s\3
+** ret
+*/
+
+
+unsigned h128 (const unsigned __int128 *a) {
+ return __builtin_popcountg (a[0]);
+}
+
+/* There should be only one POPCOUNT. */
+/* { dg-final { scan-tree-dump-times "POPCOUNT " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-not " __builtin_popcount"  "optimized" } } */
+
-- 
2.43.0



RE: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]

2024-08-16 Thread Li, Pan2
Never mind, it still looks like there is a conflict; could you please help
double check it?
Current upstream should be 3c9c93f3c923c4a0ccd42db4fd26a944a3c91458.

$ git apply tmp.patch
error: patch failed: gcc/config/riscv/riscv.cc:11010
error: gcc/config/riscv/riscv.cc: patch does not apply

Pan

-Original Message-
From: Zhijin Zeng  
Sent: Friday, August 16, 2024 9:30 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; gcc-b...@gcc.gnu.org; Kito Cheng 

Subject: Re: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value 
[PR116305]

Sorry, the line number changed. The newest version is as follows:

This patch is to fix the bug (BugId:116305) introduced by the commit
bd93ef for risc-v target.

The commit bd93ef changes the chunk_num from 1 to TARGET_MIN_VLEN/128
if TARGET_MIN_VLEN is larger than 128 in riscv_convert_vector_bits. So
it changes the value of BYTES_PER_RISCV_VECTOR. For example, before
merging the commit bd93ef and if TARGET_MIN_VLEN is 256, the value
of BYTES_PER_RISCV_VECTOR should be [8, 8], but now [16, 16]. The value
of riscv_bytes_per_vector_chunk and BYTES_PER_RISCV_VECTOR are no longer
equal.

Prologue will use BYTES_PER_RISCV_VECTOR.coeffs[1] to estimate the vlenb
register value in riscv_legitimize_poly_move, and dwarf2cfi will also
get the estimated vlenb register value in riscv_dwarf_poly_indeterminate_value
to calculate the number of times to multiply the vlenb register value.

So we need to change the factor from riscv_bytes_per_vector_chunk to
BYTES_PER_RISCV_VECTOR, otherwise we will get incorrect dwarf
information. An incorrect example is as follows:

```
csrr    t0,vlenb
slli    t1,t0,1
sub     sp,sp,t1

.cfi_escape 0xf,0xb,0x72,0,0x92,0xa2,0x38,0,0x34,0x1e,0x23,0x50,0x22
```

The sequence '0x92,0xa2,0x38,0' means the vlenb register, '0x34' means
the literal 4, '0x1e' means the multiply operation. But in fact, the
vlenb register value only needs to be multiplied by the literal 2.
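
As a cross-check of the byte sequence above: 0x92 is DW_OP_bregx, whose
first operand is a ULEB128 register number, followed by an SLEB128
offset (0 here).  The sketch below decodes that operand; the function
name uleb128_decode is hypothetical and the code is only an
illustration, not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one unsigned LEB128 value starting at P.  If LEN is non-null,
   store the number of bytes consumed there.  Input is assumed to be
   well formed; decoding stops once 64 bits have been filled.  */
uint64_t
uleb128_decode (const uint8_t *p, size_t *len)
{
  uint64_t value = 0;
  unsigned int shift = 0;
  size_t i = 0;
  uint8_t byte;

  do
    {
      byte = p[i++];
      /* Each byte carries 7 payload bits, least significant group first.  */
      value |= (uint64_t) (byte & 0x7f) << shift;
      shift += 7;
    }
  while ((byte & 0x80) != 0 && shift < 64);

  if (len)
    *len = i;
  return value;
}
```

Decoding 0xa2,0x38 gives 7202, i.e. 4096 + 0xc22, the DWARF register
number for the vlenb CSR.  The operations that follow are DW_OP_lit4
(0x34) and DW_OP_mul (0x1e); the new test instead expects 0x32, i.e.
DW_OP_lit2.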

gcc/ChangeLog:

        * config/riscv/riscv.cc (riscv_dwarf_poly_indeterminate_value):

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/base/scalable_vector_cfi.c: New test.

Signed-off-by: Zhijin Zeng 
---
 gcc/config/riscv/riscv.cc                     |  4 +--
 .../riscv/rvv/base/scalable_vector_cfi.c      | 32 +++
 2 files changed, 34 insertions(+), 2 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 1f60d8f9711..8b7123e043e 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -11010,12 +11010,12 @@ static unsigned int
 riscv_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor,
                                      int *offset)
 {
-  /* Polynomial invariant 1 == (VLENB / riscv_bytes_per_vector_chunk) - 1.
+  /* Polynomial invariant 1 == (VLENB / BYTES_PER_RISCV_VECTOR) - 1.
      1. TARGET_MIN_VLEN == 32, polynomial invariant 1 == (VLENB / 4) - 1.
      2. TARGET_MIN_VLEN > 32, polynomial invariant 1 == (VLENB / 8) - 1.
   */
   gcc_assert (i == 1);
-  *factor = riscv_bytes_per_vector_chunk;
+  *factor = BYTES_PER_RISCV_VECTOR.coeffs[1];
   *offset = 1;
   return RISCV_DWARF_VLENB;
 }
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c
new file mode 100644
index 000..184da10caf3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-g -O3 -march=rv64gcv -mabi=lp64d" } */
+/* { dg-skip-if "" { *-*-* } {"-O2" "-O1" "-O0" "-Og" "-Oz" "-flto"} } */
+/* { dg-final { scan-assembler {cfi_escape .*0x92,0xa2,0x38,0,0x32,0x1e} } } */
+
+#include "riscv_vector.h"
+
+#define PI_2 1.570796326795
+
+extern void func(float *result);
+
+void test(const float *ys, const float *xs, float *result, size_t length) {
+    size_t gvl = __riscv_vsetvlmax_e32m2();
+    vfloat32m2_t vpi2 = __riscv_vfmv_v_f_f32m2(PI_2, gvl);
+
+    for(size_t i = 0; i < length;) {
+        gvl = __riscv_vsetvl_e32m2(length - i);
+        vfloat32m2_t y = __riscv_vle32_v_f32m2(ys, gvl);
+        vfloat32m2_t x = __riscv_vle32_v_f32m2(xs, gvl);
+        vbool16_t mask0  = __riscv_vmflt_vv_f32m2_b16(x, y, gvl);
+        vfloat32m2_t fixpi = __riscv_vfrsub_vf_f32m2_mu(mask0, vpi2, vpi2, 0, 
gvl);
+
+        __riscv_vse32_v_f32m2(result, fixpi, gvl);
+
+        func(result);
+
+        i += gvl;
+        ys += gvl;
+        xs += gvl;
+        result += gvl;
+    }
+}
--
2.34.1

> From: "Li, Pan2"
> Date:  Fri, Aug 16, 2024, 21:05
> Subject:  RE: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value 
> [PR116305]
> To: "Zhijin Zeng"
> Cc: "gcc-patches@gcc.gnu.org", 
> "gcc-b...@gcc

[PATCH] PHIOPT: move factor_out_conditional_operation over to use gimple_match_op

2024-08-16 Thread Andrew Pinski
To start handling expressions with more than one operand, converting
over to gimple_match_op is needed.
A side effect is that factor_out_conditional_operation can now support
builtins/internal calls that have one operand without any extra code added.

Note on the changed testcases:
* pr87007-5.c: the test checks that partial register stalls are avoided
for the sqrt and that the register is zeroed only once before the
branch; phiopt would now merge the sqrts, so disable phiopt.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* gimple-match-exports.cc 
(gimple_match_op::operands_occurs_in_abnormal_phi):
New function.
* gimple-match.h (gimple_match_op): Add operands_occurs_in_abnormal_phi.
* tree-ssa-phiopt.cc (factor_out_conditional_operation): Use 
gimple_match_op
instead of manually extracting from/creating the gimple.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr87007-5.c: Disable phi-opt.

Signed-off-by: Andrew Pinski 
---
 gcc/gimple-match-exports.cc   | 14 +
 gcc/gimple-match.h|  2 +
 gcc/testsuite/gcc.target/i386/pr87007-5.c |  5 +-
 gcc/tree-ssa-phiopt.cc| 66 ++-
 4 files changed, 49 insertions(+), 38 deletions(-)

diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
index aacf3ff0414..15d54b7d843 100644
--- a/gcc/gimple-match-exports.cc
+++ b/gcc/gimple-match-exports.cc
@@ -126,6 +126,20 @@ gimple_match_op::resimplify (gimple_seq *seq, tree 
(*valueize)(tree))
 }
 }
 
+/* Returns true if any of the operands of THIS occurs
+   in abnormal phis. */
+bool
+gimple_match_op::operands_occurs_in_abnormal_phi() const
+{
+  for (unsigned int i = 0; i < num_ops; i++)
+{
+   if (TREE_CODE (ops[i]) == SSA_NAME
+  && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (ops[i]))
+   return true;
+}
+  return false;
+}
+
 /* Return whether T is a constant that we'll dispatch to fold to
evaluate fully constant expressions.  */
 
diff --git a/gcc/gimple-match.h b/gcc/gimple-match.h
index d710fcbace2..8edff578ba9 100644
--- a/gcc/gimple-match.h
+++ b/gcc/gimple-match.h
@@ -136,6 +136,8 @@ public:
 
   /* The operands to CODE.  Only the first NUM_OPS entries are meaningful.  */
   tree ops[MAX_NUM_OPS];
+
+  bool operands_occurs_in_abnormal_phi() const;
 };
 
 inline
diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c 
b/gcc/testsuite/gcc.target/i386/pr87007-5.c
index 8f2dc947f6c..1a240adef63 100644
--- a/gcc/testsuite/gcc.target/i386/pr87007-5.c
+++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
@@ -1,8 +1,11 @@
 /* { dg-do compile } */
-/* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse -fno-tree-vectorize 
-fdump-tree-cddce3-details -fdump-tree-lsplit-optimized" } */
+/* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse -fno-tree-vectorize 
-fdump-tree-cddce3-details -fdump-tree-lsplit-optimized -fno-ssa-phiopt" } */
 /* Load of d2/d3 is hoisted out, the loop is split, store of d1 and sqrt
are sunk out of the loop and the loop is elided.  One vsqrtsd with
memory operand needs a xor to avoid partial dependence.  */
+/* Phi-OPT needs to ne disabled otherwise, sqrt calls are merged which is 
better
+   but we are testing to make sure the partial register stall for SSE is still 
avoided
+   for sqrts.  */
 
 #include
 
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index aa414f6..2d4aba5b087 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -220,13 +220,12 @@ static gphi *
 factor_out_conditional_operation (edge e0, edge e1, gphi *phi,
   tree arg0, tree arg1, gimple *cond_stmt)
 {
-  gimple *arg0_def_stmt = NULL, *arg1_def_stmt = NULL, *new_stmt;
-  tree new_arg0 = NULL_TREE, new_arg1 = NULL_TREE;
+  gimple *arg0_def_stmt = NULL, *arg1_def_stmt = NULL;
   tree temp, result;
   gphi *newphi;
   gimple_stmt_iterator gsi, gsi_for_def;
   location_t locus = gimple_location (phi);
-  enum tree_code op_code;
+  gimple_match_op arg0_op, arg1_op;
 
   /* Handle only PHI statements with two arguments.  TODO: If all
  other arguments to PHI are INTEGER_CST or if their defining
@@ -250,31 +249,31 @@ factor_out_conditional_operation (edge e0, edge e1, gphi 
*phi,
   /* Check if arg0 is an SSA_NAME and the stmt which defines arg0 is
  an unary operation.  */
   arg0_def_stmt = SSA_NAME_DEF_STMT (arg0);
-  if (!is_gimple_assign (arg0_def_stmt)
-  || (gimple_assign_rhs_class (arg0_def_stmt) != GIMPLE_UNARY_RHS
- && gimple_assign_rhs_code (arg0_def_stmt) != VIEW_CONVERT_EXPR))
+  if (!gimple_extract_op (arg0_def_stmt, &arg0_op))
 return NULL;
 
-  /* Use the RHS as new_arg0.  */
-  op_code = gimple_assign_rhs_code (arg0_def_stmt);
-  new_arg0 = gimple_assign_rhs1 (arg0_def_stmt);
-  if (op_code == VIEW_CONVERT_EXPR)
-{
-  new_arg0 = TREE_OPERAND (new_arg0, 0);
-  if (!is

Re: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]

2024-08-16 Thread Zhijin Zeng
The patch for 3c9c93 is as follows. It is a little strange, though, because the
patch has not changed and I don't know why it fails to apply. Could you modify
riscv.cc directly if this version still conflicts? Only two lines of riscv.cc
change. Thank you again.
Zhijin

This patch fixes the bug (BugId:116305) introduced by the commit
bd93ef for the RISC-V target.

The commit bd93ef changes chunk_num from 1 to TARGET_MIN_VLEN/128
if TARGET_MIN_VLEN is larger than 128 in riscv_convert_vector_bits, so
it changes the value of BYTES_PER_RISCV_VECTOR. For example, with
TARGET_MIN_VLEN equal to 256, the value of BYTES_PER_RISCV_VECTOR
was [8, 8] before the commit bd93ef, but is now [16, 16]. The values
of riscv_bytes_per_vector_chunk and BYTES_PER_RISCV_VECTOR are no longer
equal.

The prologue uses BYTES_PER_RISCV_VECTOR.coeffs[1] to estimate the vlenb
register value in riscv_legitimize_poly_move, and dwarf2cfi also
gets the estimated vlenb register value from riscv_dwarf_poly_indeterminate_value
to calculate the number of times to multiply the vlenb register value.

So the factor needs to change from riscv_bytes_per_vector_chunk to
BYTES_PER_RISCV_VECTOR; otherwise we get incorrect DWARF
information. An example of the incorrect output follows:

```
csrr    t0,vlenb
slli    t1,t0,1
sub     sp,sp,t1

.cfi_escape 0xf,0xb,0x72,0,0x92,0xa2,0x38,0,0x34,0x1e,0x23,0x50,0x22
```

The sequence '0x92,0xa2,0x38,0' means the vlenb register, '0x34' means
the literal 4, and '0x1e' means the multiply operation. But in fact, the
vlenb register value only needs to be multiplied by the literal 2.
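
As a cross-check of the byte values quoted above, here is a minimal C sketch
that decodes the register operand and the literal opcode. The helper names
(decode_uleb128, dw_op_literal) are hypothetical and not part of GCC; the
opcode values are the ones the DWARF standard assigns to DW_OP_lit0 (0x30),
DW_OP_mul (0x1e) and DW_OP_bregx (0x92). The bytes 0xa2,0x38 decode to 7202,
which equals 4096 + 0xc22, assuming the RISC-V psABI convention of numbering
CSRs as 4096 + CSR address (vlenb is CSR 0xc22).

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode values as assigned by the DWARF standard.  */
enum { DW_OP_LIT0 = 0x30, DW_OP_MUL = 0x1e, DW_OP_BREGX = 0x92 };

/* Decode one ULEB128 value starting at P and store the number of
   bytes consumed in *LEN.  Hypothetical helper, not GCC code.  */
uint64_t
decode_uleb128 (const unsigned char *p, size_t *len)
{
  uint64_t value = 0;
  unsigned int shift = 0;
  size_t i = 0;

  for (;;)
    {
      unsigned char byte = p[i++];
      /* The low 7 bits carry payload, the top bit means "more bytes".  */
      value |= (uint64_t) (byte & 0x7f) << shift;
      shift += 7;
      if ((byte & 0x80) == 0)
        break;
    }
  *len = i;
  return value;
}

/* Return the constant pushed by a DW_OP_lit<n> opcode (n in 0..31),
   or -1 if OP is not such an opcode.  */
int
dw_op_literal (unsigned char op)
{
  return (op >= DW_OP_LIT0 && op <= DW_OP_LIT0 + 31) ? op - DW_OP_LIT0 : -1;
}
```

With these helpers, 0x34 maps to the literal 4 (the wrong factor) and 0x32,
which the new test scans for, maps to the literal 2.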

gcc/ChangeLog:

        * config/riscv/riscv.cc (riscv_dwarf_poly_indeterminate_value):

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/base/scalable_vector_cfi.c: New test.

Signed-off-by: Zhijin Zeng 
---
 gcc/config/riscv/riscv.cc                     |  4 +--
 .../riscv/rvv/base/scalable_vector_cfi.c      | 32 +++
 2 files changed, 34 insertions(+), 2 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 1f60d8f9711..8b7123e043e 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -11010,12 +11010,12 @@ static unsigned int
 riscv_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor,
                                      int *offset)
 {
-  /* Polynomial invariant 1 == (VLENB / riscv_bytes_per_vector_chunk) - 1.
+  /* Polynomial invariant 1 == (VLENB / BYTES_PER_RISCV_VECTOR) - 1.
      1. TARGET_MIN_VLEN == 32, polynomial invariant 1 == (VLENB / 4) - 1.
      2. TARGET_MIN_VLEN > 32, polynomial invariant 1 == (VLENB / 8) - 1.
   */
   gcc_assert (i == 1);
-  *factor = riscv_bytes_per_vector_chunk;
+  *factor = BYTES_PER_RISCV_VECTOR.coeffs[1];
   *offset = 1;
   return RISCV_DWARF_VLENB;
 }
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c
new file mode 100644
index 000..184da10caf3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalable_vector_cfi.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-g -O3 -march=rv64gcv -mabi=lp64d" } */
+/* { dg-skip-if "" { *-*-* } {"-O2" "-O1" "-O0" "-Og" "-Oz" "-flto"} } */
+/* { dg-final { scan-assembler {cfi_escape .*0x92,0xa2,0x38,0,0x32,0x1e} } } */
+
+#include "riscv_vector.h"
+
+#define PI_2 1.570796326795
+
+extern void func(float *result);
+
+void test(const float *ys, const float *xs, float *result, size_t length) {
+    size_t gvl = __riscv_vsetvlmax_e32m2();
+    vfloat32m2_t vpi2 = __riscv_vfmv_v_f_f32m2(PI_2, gvl);
+
+    for(size_t i = 0; i < length;) {
+        gvl = __riscv_vsetvl_e32m2(length - i);
+        vfloat32m2_t y = __riscv_vle32_v_f32m2(ys, gvl);
+        vfloat32m2_t x = __riscv_vle32_v_f32m2(xs, gvl);
+        vbool16_t mask0  = __riscv_vmflt_vv_f32m2_b16(x, y, gvl);
+        vfloat32m2_t fixpi = __riscv_vfrsub_vf_f32m2_mu(mask0, vpi2, vpi2, 0, 
gvl);
+
+        __riscv_vse32_v_f32m2(result, fixpi, gvl);
+
+        func(result);
+
+        i += gvl;
+        ys += gvl;
+        xs += gvl;
+        result += gvl;
+    }
+}
--
2.34.1


> From: "Li, Pan2"
> Date:  Sat, Aug 17, 2024, 09:20
> Subject:  RE: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value 
> [PR116305]
> To: "Zhijin Zeng"
> Cc: "gcc-patches@gcc.gnu.org", 
> "gcc-b...@gcc.gnu.org", "Kito 
> Cheng"
> Never mind, looks still conflict, could you please help to double check about 
> it?
> Current upstream should be 3c9c93f3c923c4a0ccd42db4fd26a944a3c91458.
> 
> └─(09:18:27 on master ✭)──> git apply tmp.patch ──(Sat,Aug17)─┘
> error: patch failed: gcc/config/riscv/riscv.cc:11010
> error: gcc/config/riscv/riscv.cc: patch does not apply
> 
> Pan
>

[PATCH v4] RISC-V: Make sure high bits of usadd operands is clean for non-Xmode [PR116278]

2024-08-16 Thread pan2 . li
From: Pan Li 

For QI/HImode .SAT_ADD, the operands may be sign-extended, so the
high bits of the Xmode register may be all ones, which is not expected.
For example, consider the code below.

signed char b[1];
unsigned short c;
signed char *d = b;
int main() {
  b[0] = -40;
  c = ({ (unsigned short)d[0] < 0xFFF6 ? (unsigned short)d[0] : 0xFFF6; }) + 9;
  __builtin_printf("%d\n", c);
}

After expanding we have:

;; _6 = .SAT_ADD (_3, 9);
(insn 8 7 9 (set (reg:DI 143)
(high:DI (symbol_ref:DI ("d") [flags 0x86]  )))
 (nil))
(insn 9 8 10 (set (reg/f:DI 142)
(mem/f/c:DI (lo_sum:DI (reg:DI 143)
(symbol_ref:DI ("d") [flags 0x86]  )) [1 d+0 S8 
A64]))
 (nil))
(insn 10 9 11 (set (reg:HI 144 [ _3 ])
(sign_extend:HI (mem:QI (reg/f:DI 142) [0 *d.0_1+0 S1 A8]))) 
"test.c":7:10 -1
 (nil))

The conversion from signed char to unsigned short produces the sign_extend
rtl shown above, which finally becomes the lb insn below:

lb   a1,0(a5)    // a1 is -40, aka 0xffffffffffffffd8
lui  a0,0x1a
addi a5,a1,9
slli a5,a5,0x30
srli a5,a5,0x30  // a5 is 65505
sltu a1,a5,a1    // compare 65505 and 0xffffffffffffffd8 => TRUE

The sltu compares 65505 with 0xffffffffffffffd8 here, but we
actually want to compare 65505 with 65496 (0xffd8).  Thus we need to
clear the high bits of the operands to ensure this.
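
The effect can be reproduced outside the compiler with a small C model of
the expanded HImode sequence on RV64. This is only a sketch under stated
assumptions: usadd_hi_model and its clean_operands flag are hypothetical
names for illustration, and the model mirrors the add / truncate / sltu /
negate / or steps rather than the actual RTL.

```c
#include <stdint.h>

/* Model of the HImode usadd expansion with operands held in 64-bit
   registers.  If CLEAN_OPERANDS is nonzero, the operands are zero
   extended from 16 bits first, as the fix does; otherwise any
   sign-extension bits are left in place.  Hypothetical helper.  */
uint16_t
usadd_hi_model (uint64_t x_reg, uint64_t y_reg, int clean_operands)
{
  if (clean_operands)
    {
      x_reg &= 0xffff;
      y_reg &= 0xffff;
    }

  uint64_t sum = (x_reg + y_reg) & 0xffff; /* slli/srli by 48.  */
  uint64_t lt = sum < x_reg;               /* sltu: overflow check.  */
  lt = -lt;                                /* 0 or all ones.  */
  return (uint16_t) (sum | lt);
}
```

Calling it with x_reg = (uint64_t) (int64_t) (int8_t) -40 and y_reg = 9
saturates to 65535 when the operands are left dirty, but yields the expected
65505 once the high bits are cleared.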

The following test suites pass with this patch:
* The rv64gcv full regression test.

PR target/116278

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Add new
func impl to zero extend rtx.
(riscv_expand_usadd): Leverage above func to cleanup operands
and sum.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr116278-run-1.c: New test.
* gcc.target/riscv/pr116278-run-2.c: New test.

PR 116278

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Add new
func impl to zero extend rtx.
(riscv_expand_usadd): Leverage above func to cleanup operands 0
and remove the special handing for SImode in RV64.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_u_add-11.c: Adjust asm check body.
* gcc.target/riscv/sat_u_add-15.c: Ditto.
* gcc.target/riscv/sat_u_add-19.c: Ditto.
* gcc.target/riscv/sat_u_add-23.c: Ditto.
* gcc.target/riscv/sat_u_add-3.c: Ditto.
* gcc.target/riscv/sat_u_add-7.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-11.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-15.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-3.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-7.c: Ditto.
* gcc.target/riscv/pr116278-run-1.c: New test.
* gcc.target/riscv/pr116278-run-2.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv.cc | 34 ---
 .../gcc.target/riscv/pr116278-run-1.c | 20 +++
 .../gcc.target/riscv/pr116278-run-2.c | 20 +++
 gcc/testsuite/gcc.target/riscv/sat_u_add-11.c |  6 +++-
 gcc/testsuite/gcc.target/riscv/sat_u_add-15.c |  6 +++-
 gcc/testsuite/gcc.target/riscv/sat_u_add-19.c |  6 +++-
 gcc/testsuite/gcc.target/riscv/sat_u_add-23.c |  6 +++-
 gcc/testsuite/gcc.target/riscv/sat_u_add-3.c  |  6 +++-
 gcc/testsuite/gcc.target/riscv/sat_u_add-7.c  |  6 +++-
 .../gcc.target/riscv/sat_u_add_imm-11.c   |  6 +++-
 .../gcc.target/riscv/sat_u_add_imm-15.c   |  6 +++-
 .../gcc.target/riscv/sat_u_add_imm-3.c|  6 +++-
 .../gcc.target/riscv/sat_u_add_imm-7.c|  6 +++-
 13 files changed, 112 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr116278-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr116278-run-2.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 1f60d8f9711..453a061428e 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -11801,12 +11801,29 @@ riscv_get_raw_result_mode (int regno)
   return default_get_reg_raw_mode (regno);
 }
 
+/* Generate a new rtx of Xmode based on the rtx and mode in define pattern.
+   The rtx x will be zero extended to Xmode if the mode is HI/QImode,  and
+   the new zero extended Xmode rtx will be returned.
+   Or the gen_lowpart rtx of Xmode will be returned.  */
+
+static rtx
+riscv_gen_zero_extend_rtx (rtx x, machine_mode mode)
+{
+  if (mode == Xmode)
+return x;
+
+  rtx xmode_reg = gen_reg_rtx (Xmode);
+  riscv_emit_unary (ZERO_EXTEND, xmode_reg, x);
+
+  return xmode_reg;
+}
+
 /* Implements the unsigned saturation add standard name usadd for int mode.
 
z = SAT_ADD(x, y).
=>
1. sum = x + y.
-   2. sum = truncate (sum) for QI and HI only.
+   2. sum = truncate (sum) for non-Xmode.
3. lt = sum < x.
4. lt = -lt.
5. z = sum | lt.  */
@@ -11817,22 +11834,15 @@ riscv_expand_usadd (rtx dest, rtx x, rtx y)
   machine_mode mode = GET_MODE (dest);
   rtx xmode_sum = gen_reg_rtx (Xmode);
   rtx xmode_lt = gen_reg_rtx (Xmode);
-  rtx xmode_x = gen_lowpart (Xmode, x)

RE: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]

2024-08-16 Thread Li, Pan2
OK, I will apply the change manually and commit it if the tests show no surprises.

Pan


RE: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]

2024-08-16 Thread Li, Pan2
Should be in upstream already.

Pan


Re: [PATCH] RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]

2024-08-16 Thread Zhijin Zeng
This is my first time submitting a patch to GCC, and I sincerely thank you all
for your help.
Zhijin


RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call

2024-08-16 Thread Li, Pan2
Thanks Richard for the confirmation. Sorry, I almost forgot this thread.

Hi Jakub,

Please feel free to let me know if there is anything I can do to fix this
issue. Thanks a lot.

Pan


-Original Message-
From: Richard Biener  
Sent: Tuesday, July 16, 2024 11:29 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com; Liu, 
Hongtao ; Jakub Jelinek 
Subject: Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for 
vectorizable_call

On Tue, Jul 16, 2024 at 3:22 PM Li, Pan2  wrote:
>
> > I think that's a bug.  Do you say __builtin_add_overflow fails to promote
> > (constant) arguments?
>
> I double checked the tree types of the __builtin_add_overflow operands in
> the 022t.ssa dump. It looks like the 2 operands of .ADD_OVERFLOW have
> different tree types when one of them is constant.
> One is unsigned DI, and the other is int.

I think that's a bug (and a downside of internal functions, as they
have no prototype the type verifier could work with).

That you see them in 022t.ssa means that either the frontend mishandles
the builtin call parsing, or fold_builtin_arith_overflow, which is
responsible for the rewriting to an internal function, is wrong.
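
For readers following along, here is a minimal source-level sketch of the situation being discussed. It assumes a GCC-compatible compiler; the helper names (add_mixed, add_promoted, check_examples) are made up for illustration and do not appear in the patch. It only shows that __builtin_add_overflow is type-generic at the source level, so a literal such as 129 has type int unless it is cast explicitly. It makes no claim about what the GIMPLE operands ought to look like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mixed operand types: X is a 64-bit unsigned value, the literal 129 has
   type int.  The builtin computes the sum as if in infinite precision and
   then checks whether it fits into *RES.  */
static bool
add_mixed (uint64_t x, uint64_t *res)
{
  return __builtin_add_overflow (x, 129, res);
}

/* Same operation with the constant explicitly given the operand type.  */
static bool
add_promoted (uint64_t x, uint64_t *res)
{
  return __builtin_add_overflow (x, (uint64_t) 129, res);
}

/* Check that both variants agree on a few inputs (illustrative only).  */
static void
check_examples (void)
{
  const uint64_t inputs[] = { 0, 1, UINT64_MAX - 129, UINT64_MAX - 128,
                              UINT64_MAX };
  for (unsigned i = 0; i < sizeof inputs / sizeof inputs[0]; i++)
    {
      uint64_t r1, r2;
      bool o1 = add_mixed (inputs[i], &r1);
      bool o2 = add_promoted (inputs[i], &r2);
      assert (o1 == o2 && r1 == r2);
    }
}
```

Calling check_examples () from a small main is enough to exercise both variants; at the source level they behave identically, which is why the type difference only becomes visible in the dumps.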

I've CCed Jakub who added those.

I think we could add verification for internal functions in the set of
commutative_binary_fn_p, commutative_ternary_fn_p, associative_binary_fn_p
and possibly others where we can constrain argument and result types.
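
One possible reading of that verification idea, as a toy model only: the structs and functions below (toy_type, toy_call, toy_verify_commutative_binary) are hypothetical stand-ins and are not GCC's tree or gimple API. The sketch just shows the kind of constraint meant here, namely that both arguments of a commutative binary call agree in precision and signedness.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a tree type: only precision and signedness.  */
struct toy_type
{
  unsigned precision;
  bool is_unsigned;
};

/* Hypothetical stand-in for a call to a commutative binary internal
   function with two arguments.  */
struct toy_call
{
  struct toy_type arg0;
  struct toy_type arg1;
};

/* Two toy types are compatible if precision and signedness match.  */
static bool
toy_types_compatible_p (struct toy_type a, struct toy_type b)
{
  return a.precision == b.precision && a.is_unsigned == b.is_unsigned;
}

/* Return true if CALL satisfies the assumed constraint that both
   arguments have compatible types.  */
static bool
toy_verify_commutative_binary (const struct toy_call *call)
{
  return toy_types_compatible_p (call->arg0, call->arg1);
}
```

Under this model, a call whose arguments are unsigned DI (precision 64) and int (precision 32, signed), as in the dumps quoted below, would be rejected, while a call with two unsigned DI arguments would pass.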

Richard.

> (gdb) call debug_gimple_stmt(stmt)
> _14 = .ADD_OVERFLOW (_4, 129);
> (gdb) call debug_tree (gimple_call_arg(stmt, 0))
>   type  public unsigned DI
> size 
> unit-size 
> align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
> 0x76a437e0 precision:64 min  max 
> 
> pointer_to_this >
> visited
> def_stmt _4 = *_3;
> version:4>
> (gdb) call debug_tree (gimple_call_arg(stmt, 1))
>   constant 
> 129>
> (gdb)
>
> In the vect pass, we can also see that the operands of .ADD_OVERFLOW
> have different tree types.
> As I understand it, the constant operand should have type unsigned DI here.
>
> (gdb) layout src
> (gdb) list
> 506     if (gimple_call_num_args (_c4) == 2)
> 507       {
> 508         tree _q40 = gimple_call_arg (_c4, 0);
> 509         _q40 = do_valueize (valueize, _q40);
> 510         tree _q41 = gimple_call_arg (_c4, 1);
> 511         _q41 = do_valueize (valueize, _q41);
> 512         if (integer_zerop (_q21))
> 513           {
> 514             if (integer_minus_onep (_p1))
> 515               {
> (gdb) call debug_tree (_q40)
>   type  public unsigned DI
> size 
> unit-size 
> align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 
> 0x76a437e0 precision:64 min  max 
> 
> pointer_to_this >
> visited
> def_stmt _4 = *_3;
> version:4>
> (gdb) call debug_tree (_q41)
>   constant 
> 129>
>
> Pan
>
> -----Original Message-----
> From: Richard Biener 
> Sent: Wednesday, July 10, 2024 7:36 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com; Liu, 
> Hongtao 
> Subject: Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for 
> vectorizable_call
>
> On Wed, Jul 10, 2024 at 11:28 AM  wrote:
> >
> > From: Pan Li 
> >
> > The .SAT_ADD has 2 operands and one of the operands may be INTEGER_CST.
> > For example, _1 = .SAT_ADD (_2, 9) comes from the sample code below.
> >
> > Form 3:
> >   #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM)  \
> >   T __attribute__((noinline))  \
> >   vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \
> >   {\
> > unsigned i;\
> > T ret; \
> > for (i = 0; i < limit; i++)\
> >   {\
> > out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 :