RFC [PATCH] c: Add missing cases where vla sizes are not instrumented by UBSan [PR98608]

2023-11-01 Thread Martin Uecker



Here is a patch that adds the missing cases for vla size instrumentation.
This now includes all cases where a type with size < 0 is created,
which is already UB and not just cases where a VLA is allocated.  But
a VLA can be allocated based on an typedef, which is also now
indirectly protected in this way.

I moved the instrumentation from the size itself as its own term into 
the expression that evaluate size expressions for side effects. This
avoids confusing other warning code that looks at the size expressions
(-Wvla-parameter).

There is one open question though:  How to to treat n == 0? 

Here I preliminary changed this to n > 0 (also for the existing case),
because when also detecting n == 0 this tools especially when
instrumenting all the types becomes basically useless because of 
the very common (and unproblematic) use of n == 0.  

But strictly speaking n == 0 is also UB and as pointed out in  PR98609
the error message is then not entirely accurate because it says
non-positive and not negative. I do not think it is confusing though
because it is still always correct.

One could consider splitting it up into vla-bound / vla-bound-strict,
but changing the error message would require further upstream changes
and dealing with this far exceeds the time I can afford contributing
to this this.

Another complication is that we ran out of bits for sanitizer flags in
unsigned int, so this would also require more changes.

Any advice?


I think it would be important to have complete UBSan coverage for all
size and bounds issues related to VM types and it would be nice to
get this in GCC 14. (I find this extremely useful in my projects).

Martin




c: Add missing cases where vla sizes are not instrumented by UBSan [PR98608]

Add vla-bound instrumentation for all VLAs including VLAs in parameters
and fields, but allow zero-sized errors.

Bootstrapped and regression tested on x86.

PR c/98608

gcc/c:
* c-decl.cc (grokdeclarator): Instrument all VLAs.

gcc/c-family:
* c-ubsan.cc (ubsan_instrument_vla): Do not include zero length.

gcc/testsuite:
* gcc.dg/ubsan/pr89608.c: New test.
* gcc.dg/ubsan/vla-1.c: New test.
* gcc.dg/ubsan/vla-2.c: New test.
* gcc.dg/ubsan/vla-3.c: New test.
* c-c++-common/ubsan/vla-1.c: Adapt.

diff --git a/gcc/c-family/c-ubsan.cc b/gcc/c-family/c-ubsan.cc
index b2c58c65d97..8983ede0166 100644
--- a/gcc/c-family/c-ubsan.cc
+++ b/gcc/c-family/c-ubsan.cc
@@ -313,7 +313,7 @@ ubsan_instrument_vla (location_t loc, tree size)
   tree type = TREE_TYPE (size);
   tree t, tt;
 
-  t = fold_build2 (LE_EXPR, boolean_type_node, size, build_int_cst (type, 0));
+  t = fold_build2 (LT_EXPR, boolean_type_node, size, build_int_cst (type, 0));
   if (flag_sanitize_trap & SANITIZE_VLA)
 tt = build_call_expr_loc (loc, builtin_decl_explicit (BUILT_IN_TRAP), 0);
   else
diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 7a145bed281..752a65d6729 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -7201,16 +7201,20 @@ grokdeclarator (const struct c_declarator *declarator,
   with known value.  */
this_size_varies = size_varies = true;
warn_variable_length_array (name, size);
-   if (sanitize_flags_p (SANITIZE_VLA)
-   && current_function_decl != NULL_TREE
-   && decl_context == NORMAL)
+   if (sanitize_flags_p (SANITIZE_VLA))
  {
/* Evaluate the array size only once.  */
size = save_expr (size);
size = c_fully_fold (size, false, NULL);
-   size = fold_build2 (COMPOUND_EXPR, TREE_TYPE (size),
-   ubsan_instrument_vla (loc, size),
-   size);
+   tree instr = ubsan_instrument_vla (loc, size);
+   /* We have to build this in the right order, so
+  instrumentation is done before the size can
+  be used in other parameters.  */
+   if (*expr)
+ *expr = build2 (COMPOUND_EXPR, TREE_TYPE (instr),
+ *expr, instr);
+   else
+ *expr = instr;
  }
  }
 
diff --git a/gcc/testsuite/c-c++-common/ubsan/vla-1.c 
b/gcc/testsuite/c-c++-common/ubsan/vla-1.c
index c97465edae1..cc441ffc80b 100644
--- a/gcc/testsuite/c-c++-common/ubsan/vla-1.c
+++ b/gcc/testsuite/c-c++-common/ubsan/vla-1.c
@@ -110,9 +110,7 @@ main (void)
 /* { dg-output "\[^\n\r]*variable length array bound evaluates to non-positive 
value -1\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*variable length array bound evaluates to non-positive 
value 

[PUSHED] i386: Improve stack protector patterns and peephole2s

2023-11-01 Thread Uros Bizjak
Improve stack protector patterns and peephole2s to substitute stack
protector scratch register clear with unrelated subsequent register
initialization in several ways:

a. Explicitly generate scratch register as named pseudo.  This allows
optimizers to eventually reuse the zero value in the register.

b. Allow scratch register in different mode (SWI48) than PTR mode:

d000:65 48 8b 04 25 28 00 mov%gs:0x28,%rax
d007:00 00
d009:48 89 44 24 08   mov%rax,0x8(%rsp)
d00e:8b 87 e0 01 00 00mov0x1e0(%rdi),%eax

   SImode moves on x86 zero-extend to the whole DImode register,
   so stack protector paranoia is not compromised.

c. Relax peephole2 constraint that stack protector scratch register
   must match new initialized register.  This relaxation substantially
   improves peephole2 opportunities, and generates sequences like:

a310:65 4c 8b 34 25 28 00 mov%gs:0x28,%r14
a317:00 00
a319:4c 89 74 24 08   mov%r14,0x8(%rsp)
a31e:4c 8b b7 98 00 00 00 mov0x98(%rdi),%r14

   We have to ensure the new scratch is dead in front of the sequence.

The patch also fixes omission of earlyclobbers for all alternatives of
new initialized register in *stack_protect_set_3, avoiding the need for
reg_overlap_mentioned_p constraint.  Earlyclobbers are per alternative,
not per operand.

Also, instructions are already valid in peephole2 pass, so we don't
have to explicitly re-check their operands for validity.

gcc/ChangeLog:

* config/i386/i386.md (stack_protect_set): Explicitly
generate scratch register in word mode.
(@stack_protect_set_1_): Rename to ...
(@stack_protect_set_1__): ... this.
Use SWI48 mode iterator to match scratch register.
(stack_protexct_set_1 peephole2): Use PTR, W and SWI48 mode
iterators to match peephole sequence.  Use general_operand
predicate for operand 4.  Allow different operand 2 and operand 3
registers and use peep2_reg_dead_p to ensure new scratch
register is dead before peephole seqeunce. Use peep2_reg_dead_p
to ensure old scratch register is dead after peephole sequence.
(*stack_protect_set_2_): Rename to ...
(*stack_protect_set_2__si): .. this.
(*stack_protect_set_3): Rename to ...
(*stack_protect_set_2__di): ... this.
Use PTR mode iterator to match stack protector memory move.
Use earlyclobber for all alternatives of operand 1.
(stack_protexct_set_2 peephole2): Use PTR, W and SWI48 mode
iterators to match peephole sequence.  Use general_operand
predicate for operand 4.  Allow different operand 2 and operand 3
registers and use peep2_reg_dead_p to ensure new scratch
register is dead before peephole seqeunce. Use peep2_reg_dead_p
to ensure old scratch register is dead after peephole sequence.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 75dd4b4061f..35d073c9a21 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -25653,40 +25653,58 @@ (define_expand "stack_protect_set"
(match_operand 1 "memory_operand")]
   ""
 {
+  rtx scratch = gen_reg_rtx (word_mode);
+
   emit_insn (gen_stack_protect_set_1
-(ptr_mode, operands[0], operands[1]));
+(ptr_mode, word_mode, operands[0], operands[1], scratch));
   DONE;
 })
 
-(define_insn "@stack_protect_set_1_"
+(define_insn "@stack_protect_set_1__"
   [(set (match_operand:PTR 0 "memory_operand" "=m")
(unspec:PTR [(match_operand:PTR 1 "memory_operand" "m")]
UNSPEC_SP_SET))
-   (set (match_scratch:PTR 2 "=&r") (const_int 0))
+   (set (match_operand:SWI48 2 "register_operand" "=&r") (const_int 0))
(clobber (reg:CC FLAGS_REG))]
   ""
 {
-  output_asm_insn ("mov{}\t{%1, %2|%2, %1}", operands);
-  output_asm_insn ("mov{}\t{%2, %0|%0, %2}", operands);
+  output_asm_insn ("mov{}\t{%1, %2|%2, %1}",
+  operands);
+  output_asm_insn ("mov{}\t{%2, %0|%0, %2}",
+  operands);
   return "xor{l}\t%k2, %k2";
 }
   [(set_attr "type" "multi")])
 
 ;; Patterns and peephole2s to optimize stack_protect_set_1_
-;; immediately followed by *mov{s,d}i_internal to the same register,
-;; where we can avoid the xor{l} above.  We don't split this, so that
-;; scheduling or anything else doesn't separate the *stack_protect_set*
-;; pattern from the set of the register that overwrites the register
-;; with a new value.
-(define_insn "*stack_protect_set_2_"
+;; immediately followed by *mov{s,d}i_internal, where we can avoid
+;; the xor{l} above.  We don't split this, so that scheduling or
+;; anything else doesn't separate the *stack_protect_set* pattern from
+;; the set of the register that overwrites the register with a new value.
+
+(define_peephole2
+  [(parallel [(set (match_operand:PTR 0 "memory_operand")
+  (unspec:PTR [(match_operand:PTR 1 "memory_operand")]
+ 

Re: [x86_64 PATCH] PR target/110551: Tweak mulx register allocation using peephole2.

2023-11-01 Thread Uros Bizjak
On Mon, Oct 30, 2023 at 6:27 PM Roger Sayle  wrote:
>
>
> This patch is a follow-up to my previous PR target/110551 patch, this
> time to address the additional move after mulx, seen on TARGET_BMI2
> architectures (such as -march=haswell).  The complication here is
> that the flexible multiple-set mulx instruction is introduced into
> RTL after reload, by split2, and therefore can't benefit from register
> preferencing.  This results in RTL like the following:
>
> (insn 32 31 17 2 (parallel [
> (set (reg:DI 4 si [orig:101 r ] [101])
> (mult:DI (reg:DI 1 dx [109])
> (reg:DI 5 di [109])))
> (set (reg:DI 5 di [ r+8 ])
> (umul_highpart:DI (reg:DI 1 dx [109])
> (reg:DI 5 di [109])))
> ]) "pr110551-2.c":8:17 -1
>  (nil))
>
> (insn 17 32 9 2 (set (reg:DI 0 ax [107])
> (reg:DI 5 di [ r+8 ])) "pr110551-2.c":9:40 90 {*movdi_internal}
>  (expr_list:REG_DEAD (reg:DI 5 di [ r+8 ])
> (nil)))
>
> Here insn 32, the mulx instruction, places its results in si and di,
> and then immediately after decides to move di to ax, with di now dead.
> This can be trivially cleaned up by a peephole2.  I've added an
> additional constraint that the two SET_DESTs can't be the same
> register to avoid confusing the middle-end, but this has well-defined
> behaviour on x86_64/BMI2, encoding a umul_highpart.
>
> For the new test case, compiled on x86_64 with -O2 -march=haswell:
>
> Before:
> mulx64: movabsq $-7046029254386353131, %rdx
> mulx%rdi, %rsi, %rdi
> movq%rdi, %rax
> xorq%rsi, %rax
> ret
>
> After:
> mulx64: movabsq $-7046029254386353131, %rdx
> mulx%rdi, %rsi, %rax
> xorq%rsi, %rax
> ret
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?

It looks that your previous PR110551 patch regressed
-march=cascadelake [1]. Let's fix these regressions first.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634660.html

Uros.

> 2023-10-30  Roger Sayle  
>
> gcc/ChangeLog
> PR target/110551
> * config/i386/i386.md (*bmi2_umul3_1): Tidy condition
> as operands[2] with predicate register_operand must be !MEM_P.
> (peephole2): Optimize a mulx followed by a register-to-register
> move, to place result in the correct destination if possible.
>
> gcc/testsuite/ChangeLog
> PR target/110551
> * gcc.target/i386/pr110551-2.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
>


[PATCH] Support cmul{_conj}v4hf3/cmla{_conj}v4hf4 with AVX512FP16 instruction.

2023-11-01 Thread liuhongt
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
Ready push to trunk.

gcc/ChangeLog:

* config/i386/mmx.md (cmlav4hf4): New expander.
(cmla_conjv4hf4): Ditto.
(cmulv4hf3): Ditto.
(cmul_conjv4hf3): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/part-vect-complexhf.c: New test.
---
 gcc/config/i386/mmx.md| 86 +++
 .../gcc.target/i386/part-vect-complexhf.c | 40 +
 2 files changed, 126 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/part-vect-complexhf.c

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 2b97bb8fa98..ba81ff72551 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -2622,6 +2622,92 @@ (define_expand "vec_fmsubaddv4hf4"
   DONE;
 })
 
+
+;;
+;; Parallel half-precision floating point complex type operations
+;;
+
+
+(define_expand "cmlav4hf4"
+  [(match_operand:V4HF 0 "register_operand")
+   (match_operand:V4HF 1 "vector_operand")
+   (match_operand:V4HF 2 "vector_operand")
+   (match_operand:V4HF 3 "vector_operand")]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+{
+  rtx op3 = gen_reg_rtx (V8HFmode);
+  rtx op2 = gen_reg_rtx (V8HFmode);
+  rtx op1 = gen_reg_rtx (V8HFmode);
+  rtx op0 = gen_reg_rtx (V8HFmode);
+
+  emit_insn (gen_movq_v4hf_to_sse (op3, operands[3]));
+  emit_insn (gen_movq_v4hf_to_sse (op2, operands[2]));
+  emit_insn (gen_movq_v4hf_to_sse (op1, operands[1]));
+
+  emit_insn (gen_cmlav8hf4 (op0, op1, op2, op3));
+
+  emit_move_insn (operands[0], lowpart_subreg (V4HFmode, op0, V8HFmode));
+  DONE;
+})
+
+(define_expand "cmla_conjv4hf4"
+  [(match_operand:V4HF 0 "register_operand")
+   (match_operand:V4HF 1 "vector_operand")
+   (match_operand:V4HF 2 "vector_operand")
+   (match_operand:V4HF 3 "vector_operand")]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+{
+  rtx op3 = gen_reg_rtx (V8HFmode);
+  rtx op2 = gen_reg_rtx (V8HFmode);
+  rtx op1 = gen_reg_rtx (V8HFmode);
+  rtx op0 = gen_reg_rtx (V8HFmode);
+
+  emit_insn (gen_movq_v4hf_to_sse (op3, operands[3]));
+  emit_insn (gen_movq_v4hf_to_sse (op2, operands[2]));
+  emit_insn (gen_movq_v4hf_to_sse (op1, operands[1]));
+
+  emit_insn (gen_cmla_conjv8hf4 (op0, op1, op2, op3));
+
+  emit_move_insn (operands[0], lowpart_subreg (V4HFmode, op0, V8HFmode));
+  DONE;
+})
+
+(define_expand "cmulv4hf3"
+  [(match_operand:V4HF 0 "register_operand")
+   (match_operand:V4HF 1 "vector_operand")
+   (match_operand:V4HF 2 "vector_operand")]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+{
+  rtx op2 = gen_reg_rtx (V8HFmode);
+  rtx op1 = gen_reg_rtx (V8HFmode);
+  rtx op0 = gen_reg_rtx (V8HFmode);
+
+  emit_insn (gen_movq_v4hf_to_sse (op2, operands[2]));
+  emit_insn (gen_movq_v4hf_to_sse (op1, operands[1]));
+
+  emit_insn (gen_cmulv8hf3 (op0, op1, op2));
+  emit_move_insn (operands[0], lowpart_subreg (V4HFmode, op0, V8HFmode));
+  DONE;
+})
+
+(define_expand "cmul_conjv4hf3"
+  [(match_operand:V4HF 0 "register_operand")
+   (match_operand:V4HF 1 "vector_operand")
+   (match_operand:V4HF 2 "vector_operand")]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+{
+  rtx op2 = gen_reg_rtx (V8HFmode);
+  rtx op1 = gen_reg_rtx (V8HFmode);
+  rtx op0 = gen_reg_rtx (V8HFmode);
+
+  emit_insn (gen_movq_v4hf_to_sse (op2, operands[2]));
+  emit_insn (gen_movq_v4hf_to_sse (op1, operands[1]));
+
+  emit_insn (gen_cmul_conjv8hf3 (op0, op1, op2));
+  emit_move_insn (operands[0], lowpart_subreg (V4HFmode, op0, V8HFmode));
+  DONE;
+})
+
 ;
 ;;
 ;; Parallel half-precision floating point conversion operations
diff --git a/gcc/testsuite/gcc.target/i386/part-vect-complexhf.c 
b/gcc/testsuite/gcc.target/i386/part-vect-complexhf.c
new file mode 100644
index 000..b9f4ba2f4cf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/part-vect-complexhf.c
@@ -0,0 +1,40 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O1 -ftree-vectorize -ffast-math -mavx512fp16 -mavx512vl" } */
+/* { dg-final { scan-assembler-times "vfmaddcph\[ \\t\]" 1 } } */
+/* { dg-final { scan-assembler-not "vfmadd\[123]*ph\[ \\t\]"} } */
+/* { dg-final { scan-assembler-not "vfmadd\[123]*sh\[ \\t\]"} } */
+/* { dg-final { scan-assembler-times "vfcmaddcph\[ \\t\]" 1 } } */
+/* { dg-final { scan-assembler-times "vfmulcph\[ \\t\]" 1 } } */
+/* { dg-final { scan-assembler-times "vfcmulcph\[ \\t\]" 1 } } */
+
+#include
+#define TYPE _Float16
+#define N 2
+
+void fma0 (_Complex TYPE *a, _Complex TYPE *b,
+   _Complex TYPE * __restrict c)
+{
+  for (int i = 0; i < N; i++)
+c[i] += a[i] * b[i];
+}
+
+void fmaconj (_Complex TYPE a[restrict N], _Complex TYPE b[restrict N],
+ _Complex TYPE c[restrict N])
+{
+  for (int i = 0; i < N; i++)
+c[i] += a[i] * ~b[i];
+}
+
+void fmul (_Complex TYPE a[restrict N], _Complex TYPE b[restr

RE: [PATCH v4] libgfortran: Replace mutex with rwlock

2023-11-01 Thread Zhu, Lipeng
> >
> > Hi Lipeng,
> >
> > >>> Sure, as your comments, in the patch V6, I added 3 test cases with
> > >>> OpenMP to test different cases in concurrency respectively:
> > >>> 1. find and create unit very frequently to stress read lock and write 
> > >>> lock.
> > >>> 2. only access the unit which exist in cache to stress read lock.
> > >>> 3. access the same unit in concurrency.
> > >>> For the third test case, it also help to find a bug:  When unit
> > >>> can't be found in cache nor unit list in read phase, then threads
> > >>> will try to acquire write lock to insert the same unit, this will
> > >>> cause duplicate key
> > >> error.
> > >>> To fix this bug, I get the unit from unit list once again before
> > >>> insert in write
> > >> lock.
> > >>> More details you can refer the patch v6.
> > >>>
> > >>
> > >> Could you help to review this update? I really appreciate your 
> > >> assistance.
> > >>
> >
> > > Could you help to review this update?  Any concern will be appreciated.
> >
> > Fortran parts are OK (I think I wrote that already), we need somebody
> > for the non-Fortran parts.
> >
> Hi Thomas,
> 
> Thanks for your response. Very appreciate for your patience and help.
> 
> > Jakub, could you maybe take a look?
> >
> > Best regards
> >
> > Thomas
> 
> Hi Jakub,
> 
> Can you help to take a look at the change for libgcc part that added several
> rwlock macros in libgcc/gthr-posix.h?
> 

Hi Jakub,

Could you help to review this, any comment will be greatly appreciated.

> Best Regards,
> Lipeng Zhu



RE: [x86_64 PATCH] PR target/110551: Tweak mulx register allocation using peephole2.

2023-11-01 Thread Roger Sayle


Hi Uros,

> From: Uros Bizjak 
> Sent: 01 November 2023 10:05
> Subject: Re: [x86_64 PATCH] PR target/110551: Tweak mulx register allocation
> using peephole2.
> 
> On Mon, Oct 30, 2023 at 6:27 PM Roger Sayle 
> wrote:
> >
> >
> > This patch is a follow-up to my previous PR target/110551 patch, this
> > time to address the additional move after mulx, seen on TARGET_BMI2
> > architectures (such as -march=haswell).  The complication here is that
> > the flexible multiple-set mulx instruction is introduced into RTL
> > after reload, by split2, and therefore can't benefit from register
> > preferencing.  This results in RTL like the following:
> >
> > (insn 32 31 17 2 (parallel [
> > (set (reg:DI 4 si [orig:101 r ] [101])
> > (mult:DI (reg:DI 1 dx [109])
> > (reg:DI 5 di [109])))
> > (set (reg:DI 5 di [ r+8 ])
> > (umul_highpart:DI (reg:DI 1 dx [109])
> > (reg:DI 5 di [109])))
> > ]) "pr110551-2.c":8:17 -1
> >  (nil))
> >
> > (insn 17 32 9 2 (set (reg:DI 0 ax [107])
> > (reg:DI 5 di [ r+8 ])) "pr110551-2.c":9:40 90 {*movdi_internal}
> >  (expr_list:REG_DEAD (reg:DI 5 di [ r+8 ])
> > (nil)))
> >
> > Here insn 32, the mulx instruction, places its results in si and di,
> > and then immediately after decides to move di to ax, with di now dead.
> > This can be trivially cleaned up by a peephole2.  I've added an
> > additional constraint that the two SET_DESTs can't be the same
> > register to avoid confusing the middle-end, but this has well-defined
> > behaviour on x86_64/BMI2, encoding a umul_highpart.
> >
> > For the new test case, compiled on x86_64 with -O2 -march=haswell:
> >
> > Before:
> > mulx64: movabsq $-7046029254386353131, %rdx
> > mulx%rdi, %rsi, %rdi
> > movq%rdi, %rax
> > xorq%rsi, %rax
> > ret
> >
> > After:
> > mulx64: movabsq $-7046029254386353131, %rdx
> > mulx%rdi, %rsi, %rax
> > xorq%rsi, %rax
> > ret
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> 
> It looks that your previous PR110551 patch regressed -march=cascadelake [1].
> Let's fix these regressions first.
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634660.html
> 
> Uros.

This patch fixes that "regression".  Originally, the test case in PR110551 
contained
one unnecessary mov on "default" x86_targets, but two extra movs on BMI2
targets, including -march=haswell and -march=cascadelake.  The first patch
eliminated one of these MOVs, this patch eliminates the second.  I'm not sure
that you can call it a regression, the added test failed when run with a 
non-standard
-march setting.  The good news is that test case doesn't have to be changed with
this patch applied, i.e. the correct intended behaviour is no MOVs on all 
architectures.

I'll admit the timing is unusual; I had already written and was regression 
testing a
patch for the BMI2 issue, when the -march=cascadelake regression tester let me
know it was required for folks that helpfully run the regression suite with non
standard settings.  i.e. a long standing bug that wasn't previously tested for 
by
the testsuite.

> > 2023-10-30  Roger Sayle  
> >
> > gcc/ChangeLog
> > PR target/110551
> > * config/i386/i386.md (*bmi2_umul3_1): Tidy condition
> > as operands[2] with predicate register_operand must be !MEM_P.
> > (peephole2): Optimize a mulx followed by a register-to-register
> > move, to place result in the correct destination if possible.
> >
> > gcc/testsuite/ChangeLog
> > PR target/110551
> > * gcc.target/i386/pr110551-2.c: New test case.
> >

Thanks again,
Roger
--




[PING] Re: [PATCH] Add files to discourage submissions of PRs to the GitHub mirror.

2023-11-01 Thread Eric Gallager
Hi, I'd like to ping the following patch:

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633191.html

On Mon, Oct 16, 2023 at 8:50 PM Eric Gallager  wrote:
>
> On Mon, Oct 16, 2023 at 7:58 PM Andrew Pinski  wrote:
> >
> > On Mon, Oct 16, 2023, 16:39 Eric Gallager  wrote:
> >>
> >> Currently there is an unofficial mirror of GCC on GitHub that people
> >> sometimes submit pull requests to:
> >> https://github.com/gcc-mirror/gcc
> >> However, this is not the proper way to contribute to GCC, so that means
> >> that someone (usually Jonathan Wakely) has to go through the PRs and
> >> manually tell people that they're sending their PRs to the wrong place.
> >> One thing that would help mitigate this problem would be files in a
> >> special .github directory that GitHub would automatically open when
> >> contributors attempt to open a PR, that would then tell them the proper
> >> way to contribute instead. This patch attempts to add two such files.
> >> They are written in Markdown, which I'm realizing might require some
> >> special handling in this repository, since the ".md" extension is also
> >> used for GCC's "Machine Description" files here, but I'm not quite sure
> >> how to go about handling that. Also note that I adapted these files from
> >> equivalent files in the git repository for Git itself:
> >> https://github.com/git/git/blob/master/.github/CONTRIBUTING.md
> >> https://github.com/git/git/blob/master/.github/PULL_REQUEST_TEMPLATE.md
> >> What do people think?
> >
> >
> >
> > I think this is a great idea. Is a similar one for opening issues too?
> >
>
> One for issues isn't necessary, because the GitHub mirror has never
> had issues enabled in the first place, so people already can't open
> issues there.
>
> > Thanks,
> > Andrew
> >
> >
> >> ChangeLog:
> >>
> >> * .github/CONTRIBUTING.md: New file.
> >> * .github/PULL_REQUEST_TEMPLATE.md: New file.
> >> ---
> >>  .github/CONTRIBUTING.md  | 18 ++
> >>  .github/PULL_REQUEST_TEMPLATE.md |  5 +
> >>  2 files changed, 23 insertions(+)
> >>  create mode 100644 .github/CONTRIBUTING.md
> >>  create mode 100644 .github/PULL_REQUEST_TEMPLATE.md
> >>
> >> diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md
> >> new file mode 100644
> >> index ..4f7b3abca5f4
> >> --- /dev/null
> >> +++ b/.github/CONTRIBUTING.md
> >> @@ -0,0 +1,18 @@
> >> +## Contributing to GCC
> >> +
> >> +Thanks for taking the time to contribute to GCC! Please be advised that 
> >> if you are
> >> +viewing this on `github.com`, that the mirror there is unofficial and 
> >> unmonitored.
> >> +The GCC community does not use `github.com` for their contributions. 
> >> Instead, we use
> >> +a mailing list (`gcc-patches@gcc.gnu.org`) for code submissions, code
> >> +reviews, and bug reports.
> >> +
> >> +Perhaps one day it will be possible to use 
> >> [GitGitGadget](https://gitgitgadget.github.io/) to
> >> +conveniently send Pull Requests commits to GCC's mailing list, the way 
> >> that the Git project currently allows it to be used to send PRs to their 
> >> mailing list, but until that day arrives, please send your patches to the 
> >> mailing list manually.
> >> +
> >> +Please read ["Contributing to GCC"](https://gcc.gnu.org/contribute.html) 
> >> on the main GCC website
> >> +to learn how the GCC project is managed, and how you can work with it.
> >> +In addition, we highly recommend you to read [our guidelines for 
> >> read-write Git access](https://gcc.gnu.org/gitwrite.html).
> >> +
> >> +Or, you can follow the ["Contributing to GCC in 10 easy 
> >> steps"](https://gcc.gnu.org/wiki/GettingStarted#Basics:_Contributing_to_GCC_in_10_easy_steps)
> >>  section of the ["Getting Started" 
> >> page](https://gcc.gnu.org/wiki/GettingStarted) on [the 
> >> wiki](https://gcc.gnu.org/wiki) for another example of the contribution 
> >> process.
> >> +
> >> +Your friendly GCC community!
> >> diff --git a/.github/PULL_REQUEST_TEMPLATE.md 
> >> b/.github/PULL_REQUEST_TEMPLATE.md
> >> new file mode 100644
> >> index ..6417392c8cf3
> >> --- /dev/null
> >> +++ b/.github/PULL_REQUEST_TEMPLATE.md
> >> @@ -0,0 +1,5 @@
> >> +Thanks for taking the time to contribute to GCC! Please be advised that 
> >> if you are
> >> +viewing this on `github.com`, that the mirror there is unofficial and 
> >> unmonitored.
> >> +The GCC community does not use `github.com` for their contributions. 
> >> Instead, we use
> >> +a mailing list (`gcc-patches@gcc.gnu.org`) for code submissions, code 
> >> reviews, and
> >> +bug reports. Please send patches there instead.


Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-01 Thread Qing Zhao


> On Oct 31, 2023, at 6:14 PM, Joseph Myers  wrote:
> 
> On Tue, 31 Oct 2023, Qing Zhao wrote:
> 
>> 2.3 A new semantic requirement in the user documentation of "counted_by"
>> 
>> For the following structure including a FAM with a counted_by attribute:
>> 
>>  struct A
>>  {
>>   size_t size;
>>   char buf[] __attribute__((counted_by(size)));
>>  };
>> 
>> for any object with such type:
>> 
>>  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
>> 
>> The setting to the size field should be done before the first reference 
>> to the FAM field.
>> 
>> Such requirement to the user will guarantee that the first reference to 
>> the FAM knows the size of the FAM.
>> 
>> We need to add this additional requirement to the user document.
> 
> Make sure the manual is very specific about exactly when size is 
> considered to be an accurate representation of the space available for buf 
> (given that, after malloc or realloc, it's going to be temporarily 
> inaccurate).  If the intent is that inaccurate size at such a time means 
> undefined behavior, say so explicitly.

Yes, good point. We need to define this clearly in the beginning. 
We need to explicit say that 

the size of the FAM is defined by the latest “counted_by” value. And it’s an 
undefined behavior when the size field is not defined when the FAM is 
referenced.

Is the above good enough?


> 
>> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
>> 
>> In C FE:
>> 
>> for every reference to a FAM, for example, "obj->buf" in the small example,
>>  check whether the corresponding FIELD_DECL has a "counted_by" attribute?
>>  if YES, replace the reference to "obj->buf" with a call to
>>  .ACCESS_WITH_SIZE (obj->buf, obj->size, -1); 
> 
> This seems plausible - but you should also consider the case of static 
> initializers - remember the GNU extension for statically allocated objects 
> with flexible array members (unless you're not allowing it with 
> counted_by).
> 
> static struct A x = { sizeof "hello", "hello" };
> static char *y = &x.buf;
> 
> I'd expect that to be valid - and unless you say such a usage is invalid, 

At this moment, I think that this should be valid.

I,e, the following:

struct A
{
 size_t size;
 char buf[] __attribute__((counted_by(size)));
};

static struct A x = {sizeof "hello", "hello”};

Should be valid, and x.size represents the number of elements of x.buf. 
Both x.size and x.buf are initialized statically. 

> you should avoid the replacement in such a static initializer context when 
> the FAM reference is to an object with a constant address (if 
> .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant 
> expression; if it works fine as a constant-address lvalue, then the 
> replacement would be OK).

Then if such usage for the “counted_by” is valid, we need to replace the FAM 
reference by a call to  .ACCESS_WITH_SIZE as well.
Otherwise the “counted_by” relationship will be lost to the Middle end. 

With the current definition of .ACCESS_WITH_SIZE

PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)

Isn’t the PTR (return value of the call) a LVALUE? 

Qing
> 
> -- 
> Joseph S. Myers
> jos...@codesourcery.com



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-01 Thread Martin Uecker
Am Mittwoch, dem 01.11.2023 um 14:47 + schrieb Qing Zhao:
> 
> > On Oct 31, 2023, at 6:14 PM, Joseph Myers  wrote:
> > 
> > On Tue, 31 Oct 2023, Qing Zhao wrote:
> > 
> > > 2.3 A new semantic requirement in the user documentation of "counted_by"
> > > 
> > > For the following structure including a FAM with a counted_by attribute:
> > > 
> > >  struct A
> > >  {
> > >   size_t size;
> > >   char buf[] __attribute__((counted_by(size)));
> > >  };
> > > 
> > > for any object with such type:
> > > 
> > >  struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> > > 
> > > The setting to the size field should be done before the first reference 
> > > to the FAM field.
> > > 
> > > Such requirement to the user will guarantee that the first reference to 
> > > the FAM knows the size of the FAM.
> > > 
> > > We need to add this additional requirement to the user document.
> > 
> > Make sure the manual is very specific about exactly when size is 
> > considered to be an accurate representation of the space available for buf 
> > (given that, after malloc or realloc, it's going to be temporarily 
> > inaccurate).  If the intent is that inaccurate size at such a time means 
> > undefined behavior, say so explicitly.
> 
> Yes, good point. We need to define this clearly in the beginning. 
> We need to explicit say that 
> 
> the size of the FAM is defined by the latest “counted_by” value. And it’s an 
> undefined behavior when the size field is not defined when the FAM is 
> referenced.

It is defined by the latest "counted_by" value before x.buf
is referenced, but not the latest before x.buf is dereferenced.

> 
> Is the above good enough?
> 
> 
> > 
> > > 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
> > > 
> > > In C FE:
> > > 
> > > for every reference to a FAM, for example, "obj->buf" in the small 
> > > example,
> > >  check whether the corresponding FIELD_DECL has a "counted_by" attribute?
> > >  if YES, replace the reference to "obj->buf" with a call to
> > >  .ACCESS_WITH_SIZE (obj->buf, obj->size, -1); 
> > 
> > This seems plausible - but you should also consider the case of static 
> > initializers - remember the GNU extension for statically allocated objects 
> > with flexible array members (unless you're not allowing it with 
> > counted_by).
> > 
> > static struct A x = { sizeof "hello", "hello" };
> > static char *y = &x.buf;
> > 
> > I'd expect that to be valid - and unless you say such a usage is invalid, 
> 
> At this moment, I think that this should be valid.
> 
> I,e, the following:
> 
> struct A
> {
>  size_t size;
>  char buf[] __attribute__((counted_by(size)));
> };
> 
> static struct A x = {sizeof "hello", "hello”};
> 
> Should be valid, and x.size represents the number of elements of x.buf. 
> Both x.size and x.buf are initialized statically. 

Joseph is talking about the compile-time initialization of y.

> 
> > you should avoid the replacement in such a static initializer context when 
> > the FAM reference is to an object with a constant address (if 
> > .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant 
> > expression; if it works fine as a constant-address lvalue, then the 
> > replacement would be OK).
> 
> Then if such usage for the “counted_by” is valid, we need to replace the FAM 
> reference by a call to  .ACCESS_WITH_SIZE as well.
> Otherwise the “counted_by” relationship will be lost to the Middle end. 
> 
> With the current definition of .ACCESS_WITH_SIZE
> 
> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
> 
> Isn’t the PTR (return value of the call) a LVALUE? 

The question is whether we get an address constant
that can be used for compile-time initialization.

I think it would be good to collect a list of test
cases and to include this example.

Martin

> 
> Qing
> > 
> > -- 
> > Joseph S. Myers
> > jos...@codesourcery.com
> 



Re: [RFC] Make genautomata.cc output reflect insn-attr.h expectation:

2023-11-01 Thread Vladimir Makarov



On 10/31/23 18:51, Edwin Lu wrote:

genattr.cc currently generates insn-attr.h with the following structure:

#if CPU_UNITS_QUERY
extern int get_cpu_unit_code (const char *);
extern int cpu_unit_reservation_p (state_t, int);
#endif
extern bool insn_has_dfa_reservation_p (rtx_insn *);

however genautomata.cc generates insn-automata.cc with the following structure:
#if CPU_UNITS_QUERY
int get_cpu_unit_code (const char * ) { ... }
int cpu_unit_reservation_p (state_t, int) { ... }
bool insn_has_dfa_reservation_p (rtx_insn *) { ... }
#endif

I'm not sure if insn_has_dfa_reservation_p is supposed to be a part of the
CPU_UNITS_QUERY conditional group or not. For consistency, I would like to
move it outside of the group.


No, it should  be not considered a part of cpu unit query group. The 
function just says that there is any cpu reservation by insns.


Two other functions say that the state is still reserving a particular 
cpu unit.  Using these 2 functions requires a lot of memory for their 
implementation and prevent further dfa minimizations.  The functions 
should be used mostly for VLIW CPUs when we need this information to 
generate machine insns (e.g, ia64 VLIW insn template).



This would move insn_has_dfa_reservation_p out of the #if CPU_UNITS_QUERY
conditional inside of insn-automata.cc. This would allow us to see if the
scheduler is trying to schedule an insn with a type which is not associated
with a cpu unit or insn reservation through the TARGET_SCHED_VARIABLE_ISSUE
hook.

If there is a reason for insn_has_dfa_reservation_p being within the
conditional, please let me know!


It seems a typo.

The patch is ok for me.  Thank you for finding this out.


gcc/Changelog:

* genautomata.cc (write_automata): move endif

Signed-off-by: Edwin Lu 
---
  gcc/genautomata.cc | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/genautomata.cc b/gcc/genautomata.cc
index 72f01686d6b..9dda25e5ba2 100644
--- a/gcc/genautomata.cc
+++ b/gcc/genautomata.cc
@@ -9503,9 +9503,9 @@ write_automata (void)
fprintf (output_file, "\n#if %s\n\n", CPU_UNITS_QUERY_MACRO_NAME);
output_get_cpu_unit_code_func ();
output_cpu_unit_reservation_p ();
-  output_insn_has_dfa_reservation_p ();
fprintf (output_file, "\n#endif /* #if %s */\n\n",
   CPU_UNITS_QUERY_MACRO_NAME);
+  output_insn_has_dfa_reservation_p ();
output_dfa_clean_insn_cache_func ();
output_dfa_start_func ();
output_dfa_finish_func ();




Re: [PATCH] c++: constantness of local var in constexpr fn [PR111703, PR112269]

2023-11-01 Thread Patrick Palka
On Tue, 31 Oct 2023, Patrick Palka wrote:

> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk?  Does it look OK for release branches as well for sake of PR111703?
> 
> -- >8 --
> 
> potential_constant_expression was incorrectly treating most local
> variables from a constexpr function as (potentially) constant because it
> wasn't considering the 'now' parameter.  This patch fixes this by
> relaxing some var_in_maybe_constexpr_fn checks accordingly, which turns
> out to partially fix two recently reported regressions:
> 
> PR111703 is a regression caused by r11-550-gf65a3299a521a4 for
> restricting constexpr evaluation during warning-dependent folding.
> The mechanism is intended to restrict only constant evaluation of the
> instantiated non-dependent expression, but it also ends up restricting
> constant evaluation (as part of satisfaction) during instantiation of
> the expression, in particular when resolving the ck_rvalue conversion of
> the 'x' argument into a copy constructor call.

Oops, this analysis is inaccurate for this specific testcase (although
the general idea is the same)...  We don't call fold_for_warn on 'f(x)'
but rather on its 'x' argument that has been processed by
convert_arguments into an IMPLICIT_CONV_EXPR.  And it's the
instantiation of this IMPLICIT_CONV_EXPR that turns it into a copy
constructor call.  There is no ck_rvalue conversion at all here since
'f' is a function pointer, not an actual function, and so ICSes don't
get computed (IIUC).  If 'f' is changed to be an actual function then
there's no issue since build_over_call doesn't perform argument
conversions when in a template context and therefore doesn't call
check_function_arguments on the converted arguments (from which the
problematic fold_for_warn call occurs).

> This seems like a bug in
> the mechanism[1], though I don't know if we want to refine the mechanism
> or get rid of it completely since the original testcases which motivated
> the mechanism are fixed more simply by r13-1225-gb00b95198e6720.  In any
> case, this patch partially fixes this by making us correctly treat 'x'
> and therefore 'f(x)' in the below testcase as non-constant, which
> prevents the problematic warning-dependent folding from occurring at
> all.  If this bug crops up again then I figure we could decide what to
> do with the mechanism then.
> 
> PR112269 is caused by r14-4796-g3e3d73ed5e85e7 for merging tsubst_copy
> into tsubst_copy_and_build.  tsubst_copy used to exit early when 'args'
> was empty, behavior which that commit deliberately didn't preserve.
> This early exit masked the fact that COMPLEX_EXPR wasn't handled by
> tsubst at all, and is a tree code that apparently we could see during
> warning-dependent folding on some targets.  A complete fix is to add
> handling for this tree code in tsubst_expr, but this patch should fix
> the reported testsuite failures since the situations where COMPLEX_EXPR
> crops up in  turn out to not be constant expressions in the
> first place after this patch.
> 
> [1]: The mechanism incorrectly assumes that instantiation of the
> non-dependent expression shouldn't induce any template instantiation
> since ahead of time checking of the expression should've already induced
> whatever template instantiation was needed, but in this case although
> overload resolution was performed ahead of time, a ck_rvalue conversion
> gets resolved to a copy constructor call only at instantiation time.
> 
>   PR c++/111703
> 
> gcc/cp/ChangeLog:
> 
>   * constexpr.cc (potential_constant_expression_1) :
>   Only consider var_in_maybe_constexpr_fn if 'now' is false.
>   : Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp2a/concepts-fn8.C: New test.
> ---
>  gcc/cp/constexpr.cc   |  4 ++--
>  gcc/testsuite/g++.dg/cpp2a/concepts-fn8.C | 24 +++
>  2 files changed, 26 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-fn8.C
> 
> diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
> index c05760e6789..8a6b210144a 100644
> --- a/gcc/cp/constexpr.cc
> +++ b/gcc/cp/constexpr.cc
> @@ -9623,7 +9623,7 @@ potential_constant_expression_1 (tree t, bool 
> want_rval, bool strict, bool now,
> return RECUR (DECL_VALUE_EXPR (t), rval);
>   }
>if (want_rval
> -   && !var_in_maybe_constexpr_fn (t)
> +   && (now || !var_in_maybe_constexpr_fn (t))
> && !type_dependent_expression_p (t)
> && !decl_maybe_constant_var_p (t)
> && (strict
> @@ -9737,7 +9737,7 @@ potential_constant_expression_1 (tree t, bool 
> want_rval, bool strict, bool now,
>  STRIP_NOPS (x);
>  if (is_this_parameter (x) && !is_capture_proxy (x))
> {
> - if (!var_in_maybe_constexpr_fn (x))
> + if (now || !var_in_maybe_constexpr_fn (x))
> {
>   if (flags & tf_error)
> constexpr_error (loc, fundef_p, "use of % in

Re: [PING] Re: [PATCH] Add files to discourage submissions of PRs to the GitHub mirror.

2023-11-01 Thread Jeff Law




On 11/1/23 08:11, Eric Gallager wrote:

Hi, I'd like to ping the following patch:

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633191.html

OK for the trunk.

jeff


Re: [PING] [C PATCH, v2] Add Walloc-size to warn about insufficient size in allocations [PR71219]

2023-11-01 Thread Martin Uecker
Am Dienstag, dem 31.10.2023 um 22:19 + schrieb Joseph Myers:
> On Tue, 31 Oct 2023, Martin Uecker wrote:
> 
> > > +   if (TREE_CODE (arg) == INTEGER_CST
> > > +   && tree_int_cst_lt (arg, TYPE_SIZE_UNIT (ttl)))
> 
> What if TYPE_SIZE_UNIT (ttl) is not an INTEGER_CST?  I don't see any tests 
> of the case of assigning to a pointer to a variably sized type.
> 

Right. Thanks! Revised version attached.

Martin



c: Add Walloc-size to warn about insufficient size in allocations [PR71219]

Add option Walloc-size that warns about allocations that have
insufficient storage for the target type of the pointer the
storage is assigned to. Added to Wextra.

PR c/71219
gcc:
* doc/invoke.texi: Document -Walloc-size option.

gcc/c-family:

* c.opt (Walloc-size): New option.

gcc/c:
* c-typeck.cc (convert_for_assignment): Add warning.

gcc/testsuite:

* gcc.dg/Walloc-size-1.c: New test.
* gcc.dg/Walloc-size-2.c: New test.

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 44b9c862c14..29d3d789a49 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -331,6 +331,10 @@ Walloca
 C ObjC C++ ObjC++ Var(warn_alloca) Warning
 Warn on any use of alloca.
 
+Walloc-size
+C ObjC Var(warn_alloc_size) Warning LangEnabledBy(C ObjC, Wextra)
+Warn when allocating insufficient storage for the target type of the assigned 
pointer.
+
 Walloc-size-larger-than=
 C ObjC C++ LTO ObjC++ Var(warn_alloc_size_limit) Joined Host_Wide_Int ByteSize 
Warning Init(HOST_WIDE_INT_MAX)
 -Walloc-size-larger-than=   Warn for calls to allocation functions 
that
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 0de4662bfc6..16fadfb5468 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -7347,6 +7347,34 @@ convert_for_assignment (location_t location, location_t 
expr_loc, tree type,
"request for implicit conversion "
"from %qT to %qT not permitted in C++", rhstype, type);
 
+  /* Warn of new allocations that are not big enough for the target
+type.  */
+  tree fndecl;
+  if (warn_alloc_size
+ && TREE_CODE (rhs) == CALL_EXPR
+ && (fndecl = get_callee_fndecl (rhs)) != NULL_TREE
+ && DECL_IS_MALLOC (fndecl))
+   {
+ tree fntype = TREE_TYPE (fndecl);
+ tree fntypeattrs = TYPE_ATTRIBUTES (fntype);
+ tree alloc_size = lookup_attribute ("alloc_size", fntypeattrs);
+ if (alloc_size)
+   {
+ tree args = TREE_VALUE (alloc_size);
+ int idx = TREE_INT_CST_LOW (TREE_VALUE (args)) - 1;
+ /* For calloc only use the second argument.  */
+ if (TREE_CHAIN (args))
+   idx = TREE_INT_CST_LOW (TREE_VALUE (TREE_CHAIN (args))) - 1;
+ tree arg = CALL_EXPR_ARG (rhs, idx);
+ if (TREE_CODE (arg) == INTEGER_CST
+ && INTEGER_CST == TREE_CODE (TYPE_SIZE_UNIT (ttl))
+ && tree_int_cst_lt (arg, TYPE_SIZE_UNIT (ttl)))
+warning_at (location, OPT_Walloc_size, "allocation of "
+"insufficient size %qE for type %qT with "
+"size %qE", arg, ttl, TYPE_SIZE_UNIT (ttl));
+   }
+   }
+
   /* See if the pointers point to incompatible address spaces.  */
   asl = TYPE_ADDR_SPACE (ttl);
   asr = TYPE_ADDR_SPACE (ttr);
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5a9284d635c..815a33d4b87 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8138,6 +8138,16 @@ always leads to a call to another @code{cold} function 
such as wrappers of
 C++ @code{throw} or fatal error reporting functions leading to @code{abort}.
 @end table
 
+@opindex Wno-alloc-size
+@opindex Walloc-size
+@item -Walloc-size
+Warn about calls to allocation functions decorated with attribute
+@code{alloc_size} that specify insufficient size for the target type of
+the pointer the result is assigned to, including those to the built-in
+forms of the functions @code{aligned_alloc}, @code{alloca},
+@code{calloc},
+@code{malloc}, and @code{realloc}.
+
 @opindex Wno-alloc-zero
 @opindex Walloc-zero
 @item -Walloc-zero
diff --git a/gcc/testsuite/gcc.dg/Walloc-size-1.c 
b/gcc/testsuite/gcc.dg/Walloc-size-1.c
new file mode 100644
index 000..61806f58192
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Walloc-size-1.c
@@ -0,0 +1,36 @@
+/* Tests the warnings for insufficient allocation size.
+   { dg-do compile }
+   { dg-options "-Walloc-size" }
+ * */
+#include 
+#include 
+
+struct b { int x[10]; };
+
+void fo0(void)
+{
+struct b *p = malloc(sizeof *p);
+}
+
+void fo1(void)
+{
+struct b *p = malloc(sizeof p);/* { dg-warning 
"allocation of insufficient size" } */
+}
+
+void fo2(void)
+{
+struct b *p = alloca(sizeof p);/* { dg-warning 
"allocation of insufficient

Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-01 Thread Qing Zhao


> On Nov 1, 2023, at 11:00 AM, Martin Uecker  wrote:
> 
> Am Mittwoch, dem 01.11.2023 um 14:47 + schrieb Qing Zhao:
>> 
>>> On Oct 31, 2023, at 6:14 PM, Joseph Myers  wrote:
>>> 
>>> On Tue, 31 Oct 2023, Qing Zhao wrote:
>>> 
 2.3 A new semantic requirement in the user documentation of "counted_by"
 
 For the following structure including a FAM with a counted_by attribute:
 
 struct A
 {
  size_t size;
  char buf[] __attribute__((counted_by(size)));
 };
 
 for any object with such type:
 
 struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
 
 The setting to the size field should be done before the first reference 
 to the FAM field.
 
 Such requirement to the user will guarantee that the first reference to 
 the FAM knows the size of the FAM.
 
 We need to add this additional requirement to the user document.
>>> 
>>> Make sure the manual is very specific about exactly when size is 
>>> considered to be an accurate representation of the space available for buf 
>>> (given that, after malloc or realloc, it's going to be temporarily 
>>> inaccurate).  If the intent is that inaccurate size at such a time means 
>>> undefined behavior, say so explicitly.
>> 
>> Yes, good point. We need to define this clearly in the beginning. 
>> We need to explicit say that 
>> 
>> the size of the FAM is defined by the latest “counted_by” value. And it’s an 
>> undefined behavior when the size field is not defined when the FAM is 
>> referenced.
> 
> It is defined by the latest "counted_by" value before x.buf
> is referenced, but not the latest before x.buf is dereferenced.

Then:

The size of the FAM is defined by the latest “counted_by” value before the FAM 
is referenced. 
It’s an undefined behavior when the “counted_by” value is not initialized 
before the FAM is referenced. 

> 
>> 
>> Is the above good enough?
>> 
>> 
>>> 
 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
 
 In C FE:
 
 for every reference to a FAM, for example, "obj->buf" in the small example,
 check whether the corresponding FIELD_DECL has a "counted_by" attribute?
 if YES, replace the reference to "obj->buf" with a call to
 .ACCESS_WITH_SIZE (obj->buf, obj->size, -1); 
>>> 
>>> This seems plausible - but you should also consider the case of static 
>>> initializers - remember the GNU extension for statically allocated objects 
>>> with flexible array members (unless you're not allowing it with 
>>> counted_by).
>>> 
>>> static struct A x = { sizeof "hello", "hello" };
>>> static char *y = &x.buf;
>>> 
>>> I'd expect that to be valid - and unless you say such a usage is invalid, 
>> 
>> At this moment, I think that this should be valid.
>> 
>> I,e, the following:
>> 
>> struct A
>> {
>> size_t size;
>> char buf[] __attribute__((counted_by(size)));
>> };
>> 
>> static struct A x = {sizeof "hello", "hello”};
>> 
>> Should be valid, and x.size represents the number of elements of x.buf. 
>> Both x.size and x.buf are initialized statically. 
> 
> Joseph is talking about the compile-time initialization of y.

Okay, -:) 
so, this is the point where the x.buf is referenced,
 and I think that replacing this reference to a call to .ACCESS_WITH_SIZE is 
still needed.
Otherwise, the “counted_by” relationship will NOT be seen by the middle-end 
anymore.


> 
>> 
>>> you should avoid the replacement in such a static initializer context when 
>>> the FAM reference is to an object with a constant address (if 
>>> .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant 
>>> expression; if it works fine as a constant-address lvalue, then the 
>>> replacement would be OK).
>> 
>> Then if such usage for the “counted_by” is valid, we need to replace the FAM 
>> reference by a call to  .ACCESS_WITH_SIZE as well.
>> Otherwise the “counted_by” relationship will be lost to the Middle end. 
>> 
>> With the current definition of .ACCESS_WITH_SIZE
>> 
>> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
>> 
>> Isn’t the PTR (return value of the call) a LVALUE? 
> 
> The question is whether we get an address constant
> that can be used for compile-time initialization.

Oh, I see.

So, now, PTR is already an constant at FE, the replacement will be

.ACCESS_WITH_SIZE( CONSTANT_ADDRESS, SIZE, ACCESS_MODE)

This looks awkward….
Should we allow this?

If not allowed, then the “counted_by” attribute will not work for the static 
initialization. 

> 
> I think it would be good to collect a list of test
> cases and to include this example.

Yes, I will put this into the testing case list.

Qing
> 
> Martin
> 
>> 
>> Qing
>>> 
>>> -- 
>>> Joseph S. Myers
>>> jos...@codesourcery.com



[PATCH v2] RISC-V: Use riscv_subword_address for atomic_test_and_set

2023-11-01 Thread Patrick O'Neill
Other subword atomic patterns use riscv_subword_address to calculate
the aligned address, shift amount, mask and !mask. atomic_test_and_set
was implemented before the common function was added. After this patch
all subword atomic patterns use riscv_subword_address.

gcc/ChangeLog:

* config/riscv/sync.md:  Use riscv_subword_address function to
calculate the address and shift in atomic_test_and_set.

Signed-off-by: Patrick O'Neill 
---
Changelog:
v2: Comment out the diff in the foreword so git doesn't get confused
when applying the patch
---
Tested using r14-5040-g5dc2ba333f8.

This patch causes this codegen to regress (adds a mv) but *only* on -O0.

extern void abort();

short x;

int main()
{
  if ( __atomic_test_and_set(&x, __ATOMIC_SEQ_CST))
abort();
}

Baseline:

main:
addisp,sp,-16
sd  ra,8(sp)
sd  s0,0(sp)
addis0,sp,16
lui a5,%hi(x)
addia5,a5,%lo(x)
andia4,a5,-4
andia5,a5,3
li  a3,1
slliw   a5,a5,3
sllwa2,a3,a5
amoor.w.aqrla3,a2,0(a4)
srlwa5,a3,a5
andia5,a5,0xff
beq a5,zero,.L2
callabort
.L2:
li  a5,0
mv  a0,a5
ld  ra,8(sp)
ld  s0,0(sp)
addisp,sp,16
jr  ra

After patch there is an additional mv:

main:
addisp,sp,-16
sd  ra,8(sp)
sd  s0,0(sp)
addis0,sp,16
lui a5,%hi(x)
addia5,a5,%lo(x)
andia3,a5,-4
andia5,a5,3
slliw   a5,a5,3
li  a4,1
sllwa2,a4,a5
amoor.w.aqrla4,a2,0(a3)
srawa4,a4,a5
>   mv  a5,a4
andia5,a5,0xff
beq a5,zero,.L2
callabort
.L2:
li  a5,0
mv  a0,a5
ld  ra,8(sp)
ld  s0,0(sp)
addisp,sp,16
jr  ra

This can be fixed using:
> diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
> index ad4751febd2..a9539977321 100644
> --- a/gcc/config/riscv/sync.md
> +++ b/gcc/config/riscv/sync.md
> @@ -530,10 +530,9 @@

>emit_insn (gen_atomic_fetch_orsi (old, aligned_mem, shifted_set, model));

> -  emit_move_insn (old, gen_rtx_ASHIFTRT (SImode, old,
> -gen_lowpart (QImode, shift)));
> -
> -  emit_move_insn (operands[0], gen_lowpart (QImode, old));
> +  emit_move_insn (gen_lowpart (SImode, operands[0]),
> + gen_rtx_ASHIFTRT (SImode, old,
> +   gen_lowpart (QImode, shift)));

>DONE;
>  })

But I think it hurts read/grokability of the .md sequence. If it's worth
changing for -O0 generated sequences, let me know and I'll send a follow
up patch.
---
 gcc/config/riscv/sync.md | 41 +---
 1 file changed, 17 insertions(+), 24 deletions(-)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 6ff3493b5ce..ad4751febd2 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -504,43 +504,36 @@
(set (attr "length") (const_int 28))])

 (define_expand "atomic_test_and_set"
-  [(match_operand:QI 0 "register_operand" "") ;; bool output
+  [(match_operand:QI 0 "register_operand" "");; bool output
(match_operand:QI 1 "memory_operand" "+A");; memory
-   (match_operand:SI 2 "const_int_operand" "")]   ;; model
+   (match_operand:SI 2 "const_int_operand" "")]  ;; model
   "TARGET_ATOMIC"
 {
   /* We have no QImode atomics, so use the address LSBs to form a mask,
  then use an aligned SImode atomic.  */
-  rtx result = operands[0];
+  rtx old = gen_reg_rtx (SImode);
   rtx mem = operands[1];
   rtx model = operands[2];
-  rtx addr = force_reg (Pmode, XEXP (mem, 0));
-
-  rtx aligned_addr = gen_reg_rtx (Pmode);
-  emit_move_insn (aligned_addr, gen_rtx_AND (Pmode, addr, GEN_INT (-4)));
+  rtx set = gen_reg_rtx (QImode);
+  rtx aligned_mem = gen_reg_rtx (SImode);
+  rtx shift = gen_reg_rtx (SImode);

-  rtx aligned_mem = change_address (mem, SImode, aligned_addr);
-  set_mem_alias_set (aligned_mem, 0);
+  /* Unused.  */
+  rtx _mask = gen_reg_rtx (SImode);
+  rtx _not_mask = gen_reg_rtx (SImode);

-  rtx offset = gen_reg_rtx (SImode);
-  emit_move_insn (offset, gen_rtx_AND (SImode, gen_lowpart (SImode, addr),
-  GEN_INT (3)));
+  riscv_subword_address (mem, &aligned_mem, &shift, &_mask, &_not_mask);

-  rtx tmp = gen_reg_rtx (SImode);
-  emit_move_insn (tmp, GEN_INT (1));
+  emit_move_insn (set, GEN_INT (1));
+  rtx shifted_set = gen_reg_rtx (SImode);
+  riscv_lshift_subword (QImode, set, shift, &shifted_set);

-  rtx shmt = gen_reg_rtx (SImode);
-  emit_move_insn (shmt, gen_rtx_ASHIFT (SImode, offset, GEN_INT (3)));
+  emit_insn (gen_atomic_fetch_orsi (old, aligned_mem, shifted_set, model));

-  rtx word = gen_reg_rtx (SImode);
-  emit_move_insn (word, gen_rtx_AS

Re: [PATCH v4] VECT: Refine the type size restriction of call vectorizer

2023-11-01 Thread Richard Biener



> Am 31.10.2023 um 16:10 schrieb pan2...@intel.com:
> 
> From: Pan Li 
> 
> Update in v4:
> 
> * Append the check to vectorizable_internal_function.
> 
> Update in v3:
> 
> * Add func to predicate type size is legal or not for vectorizer call.
> 
> Update in v2:
> 
> * Fix one ICE of type assertion.
> * Adjust some test cases for aarch64 sve and riscv vector.
> 
> Original log:
> 
> The vectoriable_call has one restriction of the size of data type.
> Aka DF to DI is allowed but SF to DI isn't. You may see below message
> when try to vectorize function call like lrintf.
> 
> void
> test_lrintf (long *out, float *in, unsigned count)
> {
>  for (unsigned i = 0; i < count; i++)
>out[i] = __builtin_lrintf (in[i]);
> }
> 
> lrintf.c:5:26: missed: couldn't vectorize loop
> lrintf.c:5:26: missed: not vectorized: unsupported data-type
> 
> Then the standard name pattern like lrintmn2 cannot work for different
> data type size like SF => DI. This patch would like to refine this data
> type size check and unblock the standard name like lrintmn2 on conditions.
> 
> The type size of vectype_out need to be exactly the same as the type
> size of vectype_in when the vectype_out size isn't participating in
> the optab selection. While there is no such restriction when the
> vectype_out is somehow a part of the optab query.
> 
> The below test are passed for this patch.
> 
> * The risc-v regression tests.
> * Ensure the lrintf standard name in risc-v.
> 
> The below test are ongoing.
> 
> * The x86 bootstrap and regression test.
> * The aarch64 regression test.
> 

Ok

Thanks,
Richard 

> gcc/ChangeLog:
> 
>* tree-vect-stmts.cc (vectorizable_internal_function): Add type
>size check for vectype_out doesn't participating for optab query.
>(vectorizable_call): Remove the type size check.
> 
> Signed-off-by: Pan Li 
> ---
> gcc/tree-vect-stmts.cc | 22 +-
> 1 file changed, 9 insertions(+), 13 deletions(-)
> 
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index a9200767f67..799b4ab10c7 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -1420,8 +1420,17 @@ vectorizable_internal_function (combined_fn cfn, tree 
> fndecl,
>   const direct_internal_fn_info &info = direct_internal_fn (ifn);
>   if (info.vectorizable)
>{
> +  bool same_size_p = TYPE_SIZE (vectype_in) == TYPE_SIZE (vectype_out);
>  tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
>  tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
> +
> +  /* The type size of both the vectype_in and vectype_out should be
> + exactly the same when vectype_out isn't participating the optab.
> + While there is no restriction for type size when vectype_out
> + is part of the optab query.  */
> +  if (type0 != vectype_out && type1 != vectype_out && !same_size_p)
> +return IFN_LAST;
> +
>  if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1),
>  OPTIMIZE_FOR_SPEED))
>return ifn;
> @@ -3361,19 +3370,6 @@ vectorizable_call (vec_info *vinfo,
> 
>   return false;
> }
> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
> - just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
> - are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
> - by a pack of the two vectors into an SI vector.  We would need
> - separate code to handle direct VnDI->VnSI IFN_CTZs.  */
> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
> -{
> -  if (dump_enabled_p ())
> -dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> - "mismatched vector sizes %T and %T\n",
> - vectype_in, vectype_out);
> -  return false;
> -}
> 
>   if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
>   != VECTOR_BOOLEAN_TYPE_P (vectype_in))
> -- 
> 2.34.1
> 


Re: [Committed] Make genautomata.cc output reflect insn-attr.h expectation:

2023-11-01 Thread Edwin Lu

On 11/1/2023 8:07 AM, Vladimir Makarov wrote:


On 10/31/23 18:51, Edwin Lu wrote:

genattr.cc currently generates insn-attr.h with the following structure:

#if CPU_UNITS_QUERY
extern int get_cpu_unit_code (const char *);
extern int cpu_unit_reservation_p (state_t, int);
#endif
extern bool insn_has_dfa_reservation_p (rtx_insn *);

however genautomata.cc generates insn-automata.cc with the following 
structure:

#if CPU_UNITS_QUERY
int get_cpu_unit_code (const char * ) { ... }
int cpu_unit_reservation_p (state_t, int) { ... }
bool insn_has_dfa_reservation_p (rtx_insn *) { ... }
#endif

I'm not sure if insn_has_dfa_reservation_p is supposed to be a part of 
the
CPU_UNITS_QUERY conditional group or not. For consistency, I would 
like to

move it outside of the group.


No, it should  be not considered a part of cpu unit query group. The 
function just says that there is any cpu reservation by insns.


Two other functions say that the state is still reserving a particular 
cpu unit.  Using these 2 functions requires a lot of memory for their 
implementation and prevent further dfa minimizations.  The functions 
should be used mostly for VLIW CPUs when we need this information to 
generate machine insns (e.g, ia64 VLIW insn template).



Thanks for the clarification!


This would move insn_has_dfa_reservation_p out of the #if CPU_UNITS_QUERY
conditional inside of insn-automata.cc. This would allow us to see if the
scheduler is trying to schedule an insn with a type which is not 
associated
with a cpu unit or insn reservation through the 
TARGET_SCHED_VARIABLE_ISSUE

hook.

If there is a reason for insn_has_dfa_reservation_p being within the
conditional, please let me know!


It seems a typo.

The patch is ok for me.  Thank you for finding this out.


Committed!

Edwin

gcc/Changelog:

* genautomata.cc (write_automata): move endif

Signed-off-by: Edwin Lu 
---
  gcc/genautomata.cc | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/genautomata.cc b/gcc/genautomata.cc
index 72f01686d6b..9dda25e5ba2 100644
--- a/gcc/genautomata.cc
+++ b/gcc/genautomata.cc
@@ -9503,9 +9503,9 @@ write_automata (void)
    fprintf (output_file, "\n#if %s\n\n", CPU_UNITS_QUERY_MACRO_NAME);
    output_get_cpu_unit_code_func ();
    output_cpu_unit_reservation_p ();
-  output_insn_has_dfa_reservation_p ();
    fprintf (output_file, "\n#endif /* #if %s */\n\n",
 CPU_UNITS_QUERY_MACRO_NAME);
+  output_insn_has_dfa_reservation_p ();
    output_dfa_clean_insn_cache_func ();
    output_dfa_start_func ();
    output_dfa_finish_func ();








[Patch, fortran] PR98498 - Interp request: defined operators and unlimited polymorphic

2023-11-01 Thread Paul Richard Thomas
The interpretation request came in a long time ago but I only just got
around to implementing it.

The updated text from the standard is in the comment. Now I am writing
this, I think that I should perhaps use switch(op)/case rather than using
if/else if and depending on the order of the gfc_intrinsic_op enum being
maintained. Thoughts?

The testcase runs fine with both mainline and nagfor. I think that
compile-only with counts of star-eq and star_not should suffice.

Regtests with no regressions. OK for mainline?

Paul

Fortran: Defined operators with unlimited polymorphic args [PR98498]

2023-11-01  Paul Thomas  

gcc/fortran
PR fortran/98498
* interface.cc (upoly_ok): New function.
(gfc_extend_expr): Use new function to ensure that defined
operators using unlimited polymorphic formal arguments do not
override their intrinsic uses.

gcc/testsuite/
PR fortran/98498
* gfortran.dg/interface_50.f90: New test.
diff --git a/gcc/fortran/interface.cc b/gcc/fortran/interface.cc
index 8c4571e0aa6..ba7fb5dfea5 100644
--- a/gcc/fortran/interface.cc
+++ b/gcc/fortran/interface.cc
@@ -4616,6 +4616,35 @@ build_compcall_for_operator (gfc_expr* e, gfc_actual_arglist* actual,
 }
 
 
+/* Check if the type of an actual argument is OK to use with an
+   unlimited polymorphic formal argument in a defined operation.  */
+
+static bool
+upoly_ok (bt type, gfc_intrinsic_op op)
+{
+  bool ok = false;
+  if (type == BT_DERIVED || type == BT_CLASS)
+ok = true;
+  else if ((op >= INTRINSIC_UPLUS && op <= INTRINSIC_POWER)
+	   && (type == BT_LOGICAL || type == BT_CHARACTER))
+ok = true;
+  else if ((op == INTRINSIC_CONCAT) && (type != BT_CHARACTER))
+ok = true;
+  else if ((op >= INTRINSIC_GT && op <= INTRINSIC_LE)
+	   && (type == BT_COMPLEX))
+ok = true;
+  else if ((op >= INTRINSIC_GT_OS) && (op <= INTRINSIC_LE_OS)
+	   && (type == BT_COMPLEX))
+ok = true;
+  else if ((op >= INTRINSIC_AND) && (op <= INTRINSIC_NEQV)
+	   && (type != BT_LOGICAL))
+ok = true;
+  else if ((op == INTRINSIC_NOT) && (type != BT_LOGICAL))
+ok = true;
+  return ok;
+}
+
+
 /* This subroutine is called when an expression is being resolved.
The expression node in question is either a user defined operator
or an intrinsic operator with arguments that aren't compatible
@@ -4737,6 +4766,24 @@ gfc_extend_expr (gfc_expr *e)
 	  if (sym != NULL)
 	break;
 	}
+
+  /* F2018(15.4.3.4.2): "If the operator is an intrinsic-operator (R608),
+	 the number of dummy arguments shall be consistent with the intrinsic
+	 uses of that operator, and the types, kind type parameters, or ranks
+	 of the dummy arguments shall differ from those required for the
+	 intrinsic operation (10.1.5)." ie. the use of unlimited polymorphic
+	 formal arguments must not override the intrinsic uses.  */
+  if (sym && (UNLIMITED_POLY (sym->formal->sym)
+		  || (sym->formal->next
+		  && UNLIMITED_POLY (sym->formal->next->sym
+	{
+	  bool arg2 = (actual->next != NULL);
+	  bool a1ok = upoly_ok (actual->expr->ts.type, e->value.op.op);
+	  bool a2ok = arg2 && upoly_ok (actual->next->expr->ts.type,
+	e->value.op.op);
+	  if ((!arg2 && !a1ok) || (arg2 && (!a1ok && !a2ok)))
+	sym = NULL;
+	}
 }
 
   /* TODO: Do an ambiguity-check and error if multiple matching interfaces are
! { dg-do compile }
! { dg-options "-fdump-tree-original" }
!
! Tests the fix for PR98498, which was subject to an interpretation request
! as to whether or not the interface operator overrode the intrinsic use.
! (See PR for correspondence)
!
! Contributed by Paul Thomas  
!
MODULE mytypes
  IMPLICIT none

  TYPE pvar
 character(len=20) :: name
 integer   :: level
  end TYPE pvar

  interface operator (==)
 module procedure star_eq
  end interface

  interface operator (.not.)
 module procedure star_not
  end interface

contains
  function star_eq(a, b)
implicit none
class(*), intent(in) :: a, b
logical :: star_eq
select type (a)
  type is (pvar)
  select type (b)
type is (pvar)
  if((a%level .eq. b%level) .and. (a%name .eq. b%name)) then
star_eq = .true.
  else
star_eq = .false.
  end if
type is (integer)
  star_eq = (a%level == b)
  end select
  class default
star_eq = .false.
end select
  end function star_eq

  function star_not (a)
implicit none
class(*), intent(in) :: a
type(pvar) :: star_not
select type (a)
  type is (pvar)
star_not = a
star_not%level = -star_not%level
  type is (real)
star_not = pvar ("real", -int(a))
  class default
star_not = pvar ("noname", 0)
end select
  end function

end MODULE mytypes

program test_eq
   use mytypes
   implicit none

   type(pvar) x, y
   integer :: i = 4
   real :: r = 2.0
! Check that intrinsic use of .not. and == is not overridden.
   if (.not.(i == 2*int (r))) stop 1
   if (r == 1.0) stop 2

! Test defined operat

Help: which routine in C FE I should look at for the reference to a FAM field?

2023-11-01 Thread Qing Zhao
Joseph and Martin,

For the task to replace every reference to a FAM field with an call to 
.ACCESS_WITH_SIZE, 
Where in the C FE I should look at?

Thanks a lot for the help.


Qing

Ping: [PATCH v3] libiberty: Use posix_spawn in pex-unix when available.

2023-11-01 Thread Brendan Shanks
Polite ping on this.

> On Oct 4, 2023, at 11:28 AM, Brendan Shanks  wrote:
> 
> Hi,
> 
> This patch implements pex_unix_exec_child using posix_spawn when
> available.
> 
> This should especially benefit recent macOS (where vfork just calls
> fork), but should have equivalent or faster performance on all
> platforms.
> In addition, the implementation is substantially simpler than the
> vfork+exec code path.
> 
> Tested on x86_64-linux.
> 
> v2: Fix error handling (previously the function would be run twice in
> case of error), and don't use a macro that changes control flow.
> 
> v3: Match file style for error-handling blocks, don't close
> in/out/errdes on error, and check close() for errors.
> 
> libiberty/
> * configure.ac (AC_CHECK_HEADERS): Add spawn.h.
> (checkfuncs): Add posix_spawn, posix_spawnp.
> (AC_CHECK_FUNCS): Add posix_spawn, posix_spawnp.
> * configure, config.in: Rebuild.
> * pex-unix.c [HAVE_POSIX_SPAWN] (pex_unix_exec_child): New function.
> 
> Signed-off-by: Brendan Shanks 
> ---
> libiberty/configure.ac |   8 +-
> libiberty/pex-unix.c   | 168 +
> 2 files changed, 173 insertions(+), 3 deletions(-)
> 
> diff --git a/libiberty/configure.ac b/libiberty/configure.ac
> index 0748c592704..2488b031bc8 100644
> --- a/libiberty/configure.ac
> +++ b/libiberty/configure.ac
> @@ -289,7 +289,7 @@ AC_SUBST_FILE(host_makefile_frag)
> # It's OK to check for header files.  Although the compiler may not be
> # able to link anything, it had better be able to at least compile
> # something.
> -AC_CHECK_HEADERS(sys/file.h sys/param.h limits.h stdlib.h malloc.h string.h 
> unistd.h strings.h sys/time.h time.h sys/resource.h sys/stat.h sys/mman.h 
> fcntl.h alloca.h sys/pstat.h sys/sysmp.h sys/sysinfo.h machine/hal_sysinfo.h 
> sys/table.h sys/sysctl.h sys/systemcfg.h stdint.h stdio_ext.h process.h 
> sys/prctl.h)
> +AC_CHECK_HEADERS(sys/file.h sys/param.h limits.h stdlib.h malloc.h string.h 
> unistd.h strings.h sys/time.h time.h sys/resource.h sys/stat.h sys/mman.h 
> fcntl.h alloca.h sys/pstat.h sys/sysmp.h sys/sysinfo.h machine/hal_sysinfo.h 
> sys/table.h sys/sysctl.h sys/systemcfg.h stdint.h stdio_ext.h process.h 
> sys/prctl.h spawn.h)
> AC_HEADER_SYS_WAIT
> AC_HEADER_TIME
> 
> @@ -412,7 +412,8 @@ funcs="$funcs setproctitle"
> vars="sys_errlist sys_nerr sys_siglist"
> 
> checkfuncs="__fsetlocking canonicalize_file_name dup3 getrlimit getrusage \
> - getsysinfo gettimeofday on_exit pipe2 psignal pstat_getdynamic 
> pstat_getstatic \
> + getsysinfo gettimeofday on_exit pipe2 posix_spawn posix_spawnp psignal \
> + pstat_getdynamic pstat_getstatic \
>  realpath setrlimit spawnve spawnvpe strerror strsignal sysconf sysctl \
>  sysmp table times wait3 wait4"
> 
> @@ -435,7 +436,8 @@ if test "x" = "y"; then
> index insque \
> memchr memcmp memcpy memmem memmove memset mkstemps \
> on_exit \
> -pipe2 psignal pstat_getdynamic pstat_getstatic putenv \
> +pipe2 posix_spawn posix_spawnp psignal \
> +pstat_getdynamic pstat_getstatic putenv \
> random realpath rename rindex \
> sbrk setenv setproctitle setrlimit sigsetmask snprintf spawnve spawnvpe \
>  stpcpy stpncpy strcasecmp strchr strdup \
> diff --git a/libiberty/pex-unix.c b/libiberty/pex-unix.c
> index 33b5bce31c2..336799d1125 100644
> --- a/libiberty/pex-unix.c
> +++ b/libiberty/pex-unix.c
> @@ -58,6 +58,9 @@ extern int errno;
> #ifdef HAVE_PROCESS_H
> #include 
> #endif
> +#ifdef HAVE_SPAWN_H
> +#include 
> +#endif
> 
> #ifdef vfork /* Autoconf may define this to fork for us. */
> # define VFORK_STRING "fork"
> @@ -559,6 +562,171 @@ pex_unix_exec_child (struct pex_obj *obj 
> ATTRIBUTE_UNUSED,
>   return (pid_t) -1;
> }
> 
> +#elif defined(HAVE_POSIX_SPAWN) && defined(HAVE_POSIX_SPAWNP)
> +/* Implementation of pex->exec_child using posix_spawn.*/
> +
> +static pid_t
> +pex_unix_exec_child (struct pex_obj *obj ATTRIBUTE_UNUSED,
> + int flags, const char *executable,
> + char * const * argv, char * const * env,
> + int in, int out, int errdes,
> + int toclose, const char **errmsg, int *err)
> +{
> +  int ret;
> +  pid_t pid = -1;
> +  posix_spawnattr_t attr;
> +  posix_spawn_file_actions_t actions;
> +  int attr_initialized = 0, actions_initialized = 0;
> +
> +  *err = 0;
> +
> +  ret = posix_spawnattr_init (&attr);
> +  if (ret)
> +{
> +  *err = ret;
> +  *errmsg = "posix_spawnattr_init";
> +  goto exit;
> +}
> +  attr_initialized = 1;
> +
> +  /* Use vfork() on glibc <=2.24. */
> +#ifdef POSIX_SPAWN_USEVFORK
> +  ret = posix_spawnattr_setflags (&attr, POSIX_SPAWN_USEVFORK);
> +  if (ret)
> +{
> +  *err = ret;
> +  *errmsg = "posix_spawnattr_setflags";
> +  goto exit;
> +}
> +#endif
> +
> +  ret = posix_spawn_file_actions_init (&actions);
> +  if (ret)
> +{
> +  *err = ret;
> +  *errmsg = "posix_spawn_file_actions_init";
> +  goto exit;
> +}
> +  actions_initialized = 1;
> +
> +  if (in != STDIN_

[PATCH] RISC-V: Add check for types without insn reservations

2023-11-01 Thread Edwin Lu
Now that all insns are guaranteed to have a type, ensure every insn
is associated with a cpu unit/insn reservation.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_sched_variable_issue): add disabled 
assert

Signed-off-by: Edwin Lu 
---
 gcc/config/riscv/riscv.cc | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 0148a4f2e43..0bfd06902e5 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7751,6 +7751,12 @@ riscv_sched_variable_issue (FILE *, int, rtx_insn *insn, 
int more)
  an assert so we can find and fix this problem.  */
   gcc_assert (get_attr_type (insn) != TYPE_UNKNOWN);
 
+  /* If we ever encounter an insn without an insn reservation, trip
+ an assert so we can find and fix this problem.  */
+#if 0
+  gcc_assert (insn_has_dfa_reservation_p (insn));
+#endif
+
   return more - 1;
 }
 
-- 
2.34.1



Re: Help: which routine in C FE I should look at for the reference to a FAM field?

2023-11-01 Thread Martin Uecker
Am Mittwoch, dem 01.11.2023 um 18:14 + schrieb Qing Zhao:
> Joseph and Martin,
> 
> For the task to replace every reference to a FAM field with an call to 
> .ACCESS_WITH_SIZE, 
> Where in the C FE I should look at?
> 
> Thanks a lot for the help.
> 
> 

build_component_ref in c_decl.cc

Martin

> Qing



Re: [PATCH] RISC-V: Add check for types without insn reservations

2023-11-01 Thread Jeff Law




On 11/1/23 12:17, Edwin Lu wrote:

Now that all insns are guaranteed to have a type, ensure every insn
is associated with a cpu unit/insn reservation.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_sched_variable_issue): add disabled 
assert
OK.  Really interested to see how often this trips in practice.  I 
suspect often right now ;-)


jeff




Re: [PATCH v2] RISC-V: Use riscv_subword_address for atomic_test_and_set

2023-11-01 Thread Jeff Law




On 11/1/23 10:14, Patrick O'Neill wrote:

Other subword atomic patterns use riscv_subword_address to calculate
the aligned address, shift amount, mask and !mask. atomic_test_and_set
was implemented before the common function was added. After this patch
all subword atomic patterns use riscv_subword_address.

gcc/ChangeLog:

* config/riscv/sync.md:  Use riscv_subword_address function to
calculate the address and shift in atomic_test_and_set.
OK.  No strong opinions on the extraneous move at -O0.  It does make a 
tiny bit of work for the optimizers, but I doubt it matters in practice.


Your call if you want to fix that as a follow-up or not.

jeff


Re: [PATCH] RISC-V: Allow dest operand and accumulator operand overlap of widen reduction instruction[PR112327]

2023-11-01 Thread Jeff Law




On 11/1/23 00:56, Juzhe-Zhong wrote:


Consider this following intrinsic code:

void rvv_dot_prod(int16_t *pSrcA, int16_t *pSrcB, uint32_t n, int64_t *result)
{
 size_t vl;
 vint16m4_t vSrcA, vSrcB;
 vint64m1_t vSum = __riscv_vmv_s_x_i64m1(0, 1);
 while (n > 0) {
 vl = __riscv_vsetvl_e16m4(n);
 vSrcA = __riscv_vle16_v_i16m4(pSrcA, vl);
 vSrcB = __riscv_vle16_v_i16m4(pSrcB, vl);
 vSum = __riscv_vwredsum_vs_i32m8_i64m1(__riscv_vwmul_vv_i32m8(vSrcA, 
vSrcB, vl), vSum, vl);
 pSrcA += vl;
 pSrcB += vl;
 n -= vl;
 }
 *result = __riscv_vmv_x_s_i64m1_i64(vSum);
}

https://godbolt.org/z/vWd35W7G6

Before this patch:

...
Loop:
...
vmv1r.v v2,v1
...
vwredsum.vs v1,v8,v2
...

After this patch:

...
Loop:
...
vwredsum.vs v1,v8,v1
...

PR target/112327

gcc/ChangeLog:

* config/riscv/vector.md: Add '0'.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr112327-1.c: New test.
* gcc.target/riscv/rvv/base/pr112327-2.c: New test.

OK
jeff


Re: [PATCH v2] RISC-V: Enable ztso tests on rv32

2023-11-01 Thread Jeff Law




On 10/31/23 17:25, Patrick O'Neill wrote:

This patch transitions the ztso testcases to use the testsuite infrastructure,
enabling the tests on both rv64 and rv32 targets.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/amo-table-ztso-amo-add-1.c: Add Ztso extension to
dg-options for dg-do compile.
 * gcc.target/riscv/amo-table-ztso-amo-add-2.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-amo-add-3.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-amo-add-4.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-amo-add-5.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-compare-exchange-1.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-compare-exchange-2.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-compare-exchange-3.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-compare-exchange-4.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-compare-exchange-5.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-compare-exchange-6.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-compare-exchange-7.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-fence-1.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-fence-2.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-fence-3.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-fence-4.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-fence-5.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-load-1.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-load-2.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-load-3.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-store-1.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-store-2.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-store-3.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c: Ditto.
 * lib/target-supports.exp: Add testing infrastructure to require the
Ztso extension or add it to an existing -march.

Signed-off-by: Patrick O'Neill 
---
Before committing v1, I ran the full testsuite as a sanity check and found
failures that don't happen when running the testcases individually. v2 fixes
those failures using common-sense fixes.
OK for the trunk.  Thanks for doing the deeper testing and addressing 
the issues that showed up.


Re: [PATCH] RISC-V: fix TARGET_PROMOTE_FUNCTION_MODE hook for libcalls

2023-11-01 Thread Jeff Law




On 10/31/23 12:35, Vineet Gupta wrote:

riscv_promote_function_mode doesn't promote a SI to DI for libcalls
case.

The fix is what generic promote_mode () in explow.cc does. I really
don't understand why the old code didn't work, but stepping thru the
debugger shows old code didn't and fixed does.

This showed up when testing Ajit's REE ABI extension series which probes
the ABI (using a NULL tree type) and ends up hitting the libcall code path.

[Usual caveat, I'll wait for Pre-commit CI to run the tests and report]

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_promote_function_mode): Fix mode
  returned for libcall case.
Macro that change their arguments are evil ;(   I think Juzhe's patch in 
this space a few months back wasn't supposed to change behavior.


OK once CI finishes without regressions.

jeff


Re: [PATCH] RISC-V: fix TARGET_PROMOTE_FUNCTION_MODE hook for libcalls

2023-11-01 Thread Vineet Gupta




On 11/1/23 12:11, Jeff Law wrote:



On 10/31/23 12:35, Vineet Gupta wrote:

riscv_promote_function_mode doesn't promote a SI to DI for libcalls
case.

The fix is what generic promote_mode () in explow.cc does. I really
don't understand why the old code didn't work, but stepping thru the
debugger shows old code didn't and fixed does.

This showed up when testing Ajit's REE ABI extension series which probes
the ABI (using a NULL tree type) and ends up hitting the libcall code 
path.


[Usual caveat, I'll wait for Pre-commit CI to run the tests and report]

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_promote_function_mode): Fix mode
  returned for libcall case.
Macro that change their arguments are evil ;(   I think Juzhe's patch 
in this space a few months back wasn't supposed to change behavior.


Oh, its a regression. I can add a Fixes: tag



OK once CI finishes without regressions.


Thx,
-Vineet


Re: [x86_64 PATCH] PR target/110551: Tweak mulx register allocation using peephole2.

2023-11-01 Thread Uros Bizjak
On Wed, Nov 1, 2023 at 1:58 PM Roger Sayle  wrote:
>
>
> Hi Uros,
>
> > From: Uros Bizjak 
> > Sent: 01 November 2023 10:05
> > Subject: Re: [x86_64 PATCH] PR target/110551: Tweak mulx register allocation
> > using peephole2.
> >
> > On Mon, Oct 30, 2023 at 6:27 PM Roger Sayle 
> > wrote:
> > >
> > >
> > > This patch is a follow-up to my previous PR target/110551 patch, this
> > > time to address the additional move after mulx, seen on TARGET_BMI2
> > > architectures (such as -march=haswell).  The complication here is that
> > > the flexible multiple-set mulx instruction is introduced into RTL
> > > after reload, by split2, and therefore can't benefit from register
> > > preferencing.  This results in RTL like the following:
> > >
> > > (insn 32 31 17 2 (parallel [
> > > (set (reg:DI 4 si [orig:101 r ] [101])
> > > (mult:DI (reg:DI 1 dx [109])
> > > (reg:DI 5 di [109])))
> > > (set (reg:DI 5 di [ r+8 ])
> > > (umul_highpart:DI (reg:DI 1 dx [109])
> > > (reg:DI 5 di [109])))
> > > ]) "pr110551-2.c":8:17 -1
> > >  (nil))
> > >
> > > (insn 17 32 9 2 (set (reg:DI 0 ax [107])
> > > (reg:DI 5 di [ r+8 ])) "pr110551-2.c":9:40 90 {*movdi_internal}
> > >  (expr_list:REG_DEAD (reg:DI 5 di [ r+8 ])
> > > (nil)))
> > >
> > > Here insn 32, the mulx instruction, places its results in si and di,
> > > and then immediately after decides to move di to ax, with di now dead.
> > > This can be trivially cleaned up by a peephole2.  I've added an
> > > additional constraint that the two SET_DESTs can't be the same
> > > register to avoid confusing the middle-end, but this has well-defined
> > > behaviour on x86_64/BMI2, encoding a umul_highpart.
> > >
> > > For the new test case, compiled on x86_64 with -O2 -march=haswell:
> > >
> > > Before:
> > > mulx64: movabsq $-7046029254386353131, %rdx
> > > mulx%rdi, %rsi, %rdi
> > > movq%rdi, %rax
> > > xorq%rsi, %rax
> > > ret
> > >
> > > After:
> > > mulx64: movabsq $-7046029254386353131, %rdx
> > > mulx%rdi, %rsi, %rax
> > > xorq%rsi, %rax
> > > ret
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check, both with and without --target_board=unix{-m32}
> > > with no new failures.  Ok for mainline?
> >
> > It looks that your previous PR110551 patch regressed -march=cascadelake [1].
> > Let's fix these regressions first.
> >
> > [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634660.html
> >
> > Uros.
>
> This patch fixes that "regression".  Originally, the test case in PR110551 
> contained
> one unnecessary mov on "default" x86_targets, but two extra movs on BMI2
> targets, including -march=haswell and -march=cascadelake.  The first patch
> eliminated one of these MOVs, this patch eliminates the second.  I'm not sure
> that you can call it a regression, the added test failed when run with a 
> non-standard
> -march setting.  The good news is that test case doesn't have to be changed 
> with
> this patch applied, i.e. the correct intended behaviour is no MOVs on all 
> architectures.

I was not worried about the extra mov, but more about [2], also
referred from [1], but it looks that that regression was wrongly
attributed to your patch.

[2] https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078391.html

> I'll admit the timing is unusual; I had already written and was regression 
> testing a
> patch for the BMI2 issue, when the -march=cascadelake regression tester let me
> know it was required for folks that helpfully run the regression suite with 
> non
> standard settings.  i.e. a long standing bug that wasn't previously tested 
> for by
> the testsuite.
>
> > > 2023-10-30  Roger Sayle  
> > >
> > > gcc/ChangeLog
> > > PR target/110551
> > > * config/i386/i386.md (*bmi2_umul3_1): Tidy condition
> > > as operands[2] with predicate register_operand must be !MEM_P.
> > > (peephole2): Optimize a mulx followed by a register-to-register
> > > move, to place result in the correct destination if possible.
> > >
> > > gcc/testsuite/ChangeLog
> > > PR target/110551
> > > * gcc.target/i386/pr110551-2.c: New test case.

The patch is OK.

Thanks,
Uros.


Re: [Patch, fortran] PR98498 - Interp request: defined operators and unlimited polymorphic

2023-11-01 Thread Harald Anlauf

Hi Paul,

Am 01.11.23 um 19:02 schrieb Paul Richard Thomas:

The interpretation request came in a long time ago but I only just got
around to implementing it.

The updated text from the standard is in the comment. Now I am writing
this, I think that I should perhaps use switch(op)/case rather than using
if/else if and depending on the order of the gfc_intrinsic_op enum being
maintained. Thoughts?


the logic is likely harder to parse with if/else than with 
switch(op)/case.  However, I do not think that the order of

the enum will ever be changed, as the module format relies
on that very order.


The testcase runs fine with both mainline and nagfor. I think that
compile-only with counts of star-eq and star_not should suffice.


I found other cases that are rejected even with your patch,
but which are accepted by nagfor.  Example:

   print *, ('a' == c)

Nagfor prints F at runtime as expected, as it correctly resolves
this to star_eq.  Further examples can be easily constructed.

Can you have a look?

Thanks,
Harald


Regtests with no regressions. OK for mainline?

Paul

Fortran: Defined operators with unlimited polymorphic args [PR98498]

2023-11-01  Paul Thomas  

gcc/fortran
PR fortran/98498
* interface.cc (upoly_ok): New function.
(gfc_extend_expr): Use new function to ensure that defined
operators using unlimited polymorphic formal arguments do not
override their intrinsic uses.

gcc/testsuite/
PR fortran/98498
* gfortran.dg/interface_50.f90: New test.






Re: [PING] [C PATCH, v2] Add Walloc-size to warn about insufficient size in allocations [PR71219]

2023-11-01 Thread Joseph Myers
On Wed, 1 Nov 2023, Martin Uecker wrote:

> Am Dienstag, dem 31.10.2023 um 22:19 + schrieb Joseph Myers:
> > On Tue, 31 Oct 2023, Martin Uecker wrote:
> > 
> > > > + if (TREE_CODE (arg) == INTEGER_CST
> > > > + && tree_int_cst_lt (arg, TYPE_SIZE_UNIT (ttl)))
> > 
> > What if TYPE_SIZE_UNIT (ttl) is not an INTEGER_CST?  I don't see any tests 
> > of the case of assigning to a pointer to a variably sized type.
> > 
> 
> Right. Thanks! Revised version attached.

This version is OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] RISC-V: fix TARGET_PROMOTE_FUNCTION_MODE hook for libcalls

2023-11-01 Thread Patrick O'Neill



On 11/1/23 12:19, Vineet Gupta wrote:



On 11/1/23 12:11, Jeff Law wrote:



On 10/31/23 12:35, Vineet Gupta wrote:

riscv_promote_function_mode doesn't promote a SI to DI for libcalls
case.

The fix is what generic promote_mode () in explow.cc does. I really
don't understand why the old code didn't work, but stepping thru the
debugger shows old code didn't and fixed does.

This showed up when testing Ajit's REE ABI extension series which 
probes
the ABI (using a NULL tree type) and ends up hitting the libcall 
code path.


[Usual caveat, I'll wait for Pre-commit CI to run the tests and report]

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_promote_function_mode): Fix mode
  returned for libcall case.
Macro that change their arguments are evil ;(   I think Juzhe's patch 
in this space a few months back wasn't supposed to change behavior.


Oh, its a regression. I can add a Fixes: tag



OK once CI finishes without regressions.


Thx,
-Vineet


It passes precommit CI without any new failures:
https://github.com/ewlu/gcc-precommit-ci/issues/526#issuecomment-1787891174

Tested-by: Patrick O'Neill 

Thanks,
Patrick


[[Committed]] RISC-V: fix TARGET_PROMOTE_FUNCTION_MODE hook for libcalls

2023-11-01 Thread Vineet Gupta
Fixes: 3496ca4e6566 ("RISC-V: Add runtime invariant support")

riscv_promote_function_mode doesn't promote a SI to DI for libcalls
case. It intends to do that however the code is broken (regression).

The fix is what generic promote_mode () in explow.cc does. I really
don't understand why the old code didn't work, but stepping thru the
debugger shows old code didn't and fixed does.

This showed up when testing Ajit's REE ABI extension series which probes
the ABI (using a NULL tree type) and ends up hitting the libcall code path.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_promote_function_mode): Fix mode
returned for libcall case.

Tested-by: Patrick O'Neill  # pre-commit-CI #526
Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 0148a4f2e431..895a098cd9a6 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -8630,9 +8630,10 @@ riscv_promote_function_mode (const_tree type 
ATTRIBUTE_UNUSED,
 return promote_mode (type, mode, punsignedp);
 
   unsignedp = *punsignedp;
-  PROMOTE_MODE (as_a  (mode), unsignedp, type);
+  scalar_mode smode = as_a  (mode);
+  PROMOTE_MODE (smode, unsignedp, type);
   *punsignedp = unsignedp;
-  return mode;
+  return smode;
 }
 
 /* Implement TARGET_MACHINE_DEPENDENT_REORG.  */
-- 
2.34.1



[PATCH] preprocessor: Reinitialize frontend parser after loading a PCH [PR112319]

2023-11-01 Thread Lewis Hyatt
Hello-

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112319

This is a one-line patch to fix the GCC 14 regression noted in the
PR. Bootstrap + regtest all languages on x86-64 looks good. Is it OK please?
Thanks!

-Lewis

-- >8 --

Since r14-2893, the frontend parser object needs to exist when running in
preprocess-only mode, because pragma_lex() is now called in that mode and
needs to make use of it. This is handled by calling c_init_preprocess() at
startup. If -fpch-preprocess is in effect (commonly, because of
-save-temps), a PCH file may be loaded during preprocessing, in which
case the parser will be destroyed, causing the issue noted in the
PR. Resolve it by reinitializing the frontend parser after loading the PCH.

gcc/c-family/ChangeLog:

PR pch/112319
* c-ppoutput.cc (cb_read_pch): Reinitialize the frontend parser
after loading a PCH.

gcc/testsuite/ChangeLog:

PR pch/112319
* g++.dg/pch/pr112319.C: New test.
* g++.dg/pch/pr112319.Hs: New test.
* gcc.dg/pch/pr112319.c: New test.
* gcc.dg/pch/pr112319.hs: New test.
---
 gcc/c-family/c-ppoutput.cc   | 5 +
 gcc/testsuite/g++.dg/pch/pr112319.C  | 5 +
 gcc/testsuite/g++.dg/pch/pr112319.Hs | 1 +
 gcc/testsuite/gcc.dg/pch/pr112319.c  | 5 +
 gcc/testsuite/gcc.dg/pch/pr112319.hs | 1 +
 5 files changed, 17 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/pch/pr112319.C
 create mode 100644 gcc/testsuite/g++.dg/pch/pr112319.Hs
 create mode 100644 gcc/testsuite/gcc.dg/pch/pr112319.c
 create mode 100644 gcc/testsuite/gcc.dg/pch/pr112319.hs

diff --git a/gcc/c-family/c-ppoutput.cc b/gcc/c-family/c-ppoutput.cc
index 4aa2bef2c0f..4f973767976 100644
--- a/gcc/c-family/c-ppoutput.cc
+++ b/gcc/c-family/c-ppoutput.cc
@@ -862,4 +862,9 @@ cb_read_pch (cpp_reader *pfile, const char *name,
 
   fprintf (print.outf, "#pragma GCC pch_preprocess \"%s\"\n", name);
   print.src_line++;
+
+  /* The process of reading the PCH has destroyed the frontend parser,
+ so ask the frontend to reinitialize it, in case we need it to
+ process any #pragma directives encountered while preprocessing.  */
+  c_init_preprocess ();
 }
diff --git a/gcc/testsuite/g++.dg/pch/pr112319.C 
b/gcc/testsuite/g++.dg/pch/pr112319.C
new file mode 100644
index 000..9e0457e8aec
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pch/pr112319.C
@@ -0,0 +1,5 @@
+/* { dg-additional-options "-Wpragmas -save-temps" } */
+#include "pr112319.H"
+#pragma GCC diagnostic error "-Wpragmas"
+#pragma GCC diagnostic ignored "oops" /* { dg-error "oops" } */
+/* { dg-regexp {[^[:space:]]*: some warnings being treated as errors} } */
diff --git a/gcc/testsuite/g++.dg/pch/pr112319.Hs 
b/gcc/testsuite/g++.dg/pch/pr112319.Hs
new file mode 100644
index 000..3b6178bfae0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pch/pr112319.Hs
@@ -0,0 +1 @@
+/* This space intentionally left blank.  */
diff --git a/gcc/testsuite/gcc.dg/pch/pr112319.c 
b/gcc/testsuite/gcc.dg/pch/pr112319.c
new file mode 100644
index 000..043881463c5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pch/pr112319.c
@@ -0,0 +1,5 @@
+/* { dg-additional-options "-Wpragmas -save-temps" } */
+#include "pr112319.h"
+#pragma GCC diagnostic error "-Wpragmas"
+#pragma GCC diagnostic ignored "oops" /* { dg-error "oops" } */
+/* { dg-regexp {[^[:space:]]*: some warnings being treated as errors} } */
diff --git a/gcc/testsuite/gcc.dg/pch/pr112319.hs 
b/gcc/testsuite/gcc.dg/pch/pr112319.hs
new file mode 100644
index 000..3b6178bfae0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pch/pr112319.hs
@@ -0,0 +1 @@
+/* This space intentionally left blank.  */


[Committed] RISC-V: Enable ztso tests on rv32

2023-11-01 Thread Patrick O'Neill



On 11/1/23 12:03, Jeff Law wrote:



On 10/31/23 17:25, Patrick O'Neill wrote:
This patch transitions the ztso testcases to use the testsuite 
infrastructure,

enabling the tests on both rv64 and rv32 targets.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/amo-table-ztso-amo-add-1.c: Add Ztso 
extension to

dg-options for dg-do compile.
 * gcc.target/riscv/amo-table-ztso-amo-add-2.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-amo-add-3.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-amo-add-4.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-amo-add-5.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-compare-exchange-1.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-compare-exchange-2.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-compare-exchange-3.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-compare-exchange-4.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-compare-exchange-5.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-compare-exchange-6.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-compare-exchange-7.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-fence-1.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-fence-2.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-fence-3.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-fence-4.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-fence-5.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-load-1.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-load-2.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-load-3.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-store-1.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-store-2.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-store-3.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c: Ditto.
 * gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c: Ditto.
 * lib/target-supports.exp: Add testing infrastructure to 
require the

Ztso extension or add it to an existing -march.

Signed-off-by: Patrick O'Neill 
---
Before committing v1, I ran the full testsuite as a sanity check and 
found
failures that don't happen when running the testcases individually. 
v2 fixes

those failures using common-sense fixes.
OK for the trunk.  Thanks for doing the deeper testing and addressing 
the issues that showed up.


Committed.

Thanks,
Patrick



[Committed] RISC-V: Use riscv_subword_address for atomic_test_and_set

2023-11-01 Thread Patrick O'Neill



On 11/1/23 12:00, Jeff Law wrote:



On 11/1/23 10:14, Patrick O'Neill wrote:

Other subword atomic patterns use riscv_subword_address to calculate
the aligned address, shift amount, mask and !mask. atomic_test_and_set
was implemented before the common function was added. After this patch
all subword atomic patterns use riscv_subword_address.

gcc/ChangeLog:

* config/riscv/sync.md:  Use riscv_subword_address function to
calculate the address and shift in atomic_test_and_set.
OK.  No strong opinions on the extraneous move at -O0.  It does make a 
tiny bit of work for the optimizers, but I doubt it matters in practice.


Your call if you want to fix that as a follow-up or not.

jeff

Committed.

I think I'll leave it as-is since IMO sync.md is more readable when 
those statements are broken up.
If someone finds issue with it in the future we can easily apply the 
diff from the patch.


Thanks,
Patrick


[PATCH] Fortran: passing of allocatable/pointer arguments to OPTIONAL+VALUE [PR92887]

2023-11-01 Thread Harald Anlauf
Dear all,

I've dusted off and cleaned up a previous attempt to fix the handling
of allocatable or pointer actual arguments to OPTIONAL+VALUE dummies.
The standard says that a non-allocated / non-associated actual argument
in that case shall be treated as non-present.

However, gfortran's calling conventions demand that the presence status
for OPTIONAL+VALUE is passed as a hidden argument, while we need to
pass something on the stack which has the right type.  The solution
is to conditionally create a temporary when needed.

Testcase checked with NAG.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From 6927612d97a8e7360e651bb081745fc7659a4c4b Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Wed, 1 Nov 2023 22:55:36 +0100
Subject: [PATCH] Fortran: passing of allocatable/pointer arguments to
 OPTIONAL+VALUE [PR92887]

gcc/fortran/ChangeLog:

	PR fortran/92887
	* trans-expr.cc (conv_cond_temp): Helper function for creation of a
	conditional temporary.
	(gfc_conv_procedure_call): Handle passing of allocatable or pointer
	actual argument to dummy with OPTIONAL + VALUE attribute.  Actual
	arguments that are not allocated or associated are treated as not
	present.

gcc/testsuite/ChangeLog:

	PR fortran/92887
	* gfortran.dg/value_optional_1.f90: New test.
---
 gcc/fortran/trans-expr.cc | 50 ++-
 .../gfortran.dg/value_optional_1.f90  | 83 +++
 2 files changed, 130 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/value_optional_1.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 1b8be081a17..1c06ecb3c28 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6030,6 +6030,28 @@ post_call:
 }


+/* Create "conditional temporary" to handle scalar dummy variables with the
+   OPTIONAL+VALUE attribute that shall not be dereferenced.  Use null value
+   as fallback.  Only instances of intrinsic basic type are supported.  */
+
+void
+conv_cond_temp (gfc_se * parmse, gfc_expr * e, tree cond)
+{
+  tree temp;
+  gcc_assert (e->ts.type != BT_DERIVED && e->ts.type != BT_CLASS);
+  gcc_assert (e->rank == 0);
+  temp = gfc_create_var (TREE_TYPE (parmse->expr), "condtemp");
+  TREE_STATIC (temp) = 1;
+  TREE_CONSTANT (temp) = 1;
+  TREE_READONLY (temp) = 1;
+  DECL_INITIAL (temp) = build_zero_cst (TREE_TYPE (temp));
+  parmse->expr = fold_build3_loc (input_location, COND_EXPR,
+  TREE_TYPE (parmse->expr),
+  cond, parmse->expr, temp);
+  parmse->expr = gfc_evaluate_now (parmse->expr, &parmse->pre);
+}
+
+
 /* Generate code for a procedure call.  Note can return se->post != NULL.
If se->direct_byref is set then se->expr contains the return parameter.
Return nonzero, if the call has alternate specifiers.
@@ -6470,9 +6492,31 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 			&& fsym->ts.type != BT_CLASS
 			&& fsym->ts.type != BT_DERIVED)
 		  {
-			if (e->expr_type != EXPR_VARIABLE
-			|| !e->symtree->n.sym->attr.optional
-			|| e->ref != NULL)
+			/* F2018:15.5.2.12 Argument presence and
+			   restrictions on arguments not present.  */
+			if (e->expr_type == EXPR_VARIABLE
+			&& (gfc_expr_attr (e).allocatable
+|| gfc_expr_attr (e).pointer))
+			  {
+			gfc_se argse;
+			tree cond;
+			gfc_init_se (&argse, NULL);
+			argse.want_pointer = 1;
+			gfc_conv_expr (&argse, e);
+			cond = fold_convert (TREE_TYPE (argse.expr),
+		 null_pointer_node);
+			cond = fold_build2_loc (input_location, NE_EXPR,
+		logical_type_node,
+		argse.expr, cond);
+			vec_safe_push (optionalargs,
+	   fold_convert (boolean_type_node,
+			 cond));
+			/* Create "conditional temporary".  */
+			conv_cond_temp (&parmse, e, cond);
+			  }
+			else if (e->expr_type != EXPR_VARIABLE
+ || !e->symtree->n.sym->attr.optional
+ || e->ref != NULL)
 			  vec_safe_push (optionalargs, boolean_true_node);
 			else
 			  {
diff --git a/gcc/testsuite/gfortran.dg/value_optional_1.f90 b/gcc/testsuite/gfortran.dg/value_optional_1.f90
new file mode 100644
index 000..2f95316de52
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/value_optional_1.f90
@@ -0,0 +1,83 @@
+! { dg-do run }
+! PR fortran/92887
+!
+! Test passing nullified/disassociated pointer or unalloc allocatable
+! to OPTIONAL + VALUE
+
+program p
+  implicit none !(type, external)
+  integer,  allocatable :: aa
+  real, pointer :: pp
+  character,allocatable :: ca
+  character,pointer :: cp
+  complex,  allocatable :: za
+  complex,  pointer :: zp
+  type t
+ integer,  allocatable :: aa
+ real, pointer :: pp => NULL()
+ complex,  allocatable :: za
+  end type t
+  type(t) :: tt
+  nullify (pp, cp, zp)
+  call sub (aa, pp, ca, cp, za)
+  call sub (tt% aa, tt% pp, z=tt% za)
+  allocate (aa, pp, ca, cp, za, zp, tt% za)
+  aa = 1; pp = 2.; ca = "c"; cp = "d"; za = 3.; zp = 4

Re: [committed] libstdc++: Minor update to installation docs

2023-11-01 Thread Gerald Pfeifer
On Mon, 18 Sep 2023, Jonathan Wakely via Gcc-patches wrote:
> @@ -103,8 +103,10 @@ ln -s libiconv-1.16 libiconv
>   
> If GCC 3.1.0 or later on is being used on GNU/Linux, an attempt
> will be made to use "C" library functionality necessary for
> -   C++ named locale support.  For GCC 4.6.0 and later, this
> -   means that glibc 2.3 or later is required.
> +   C++ named locale support, e.g. the newlocale
> +   and uselocale functions.
> +   For GCC 4.6.0 and later,
> +   this means that glibc 2.3 or later is required.

Do we still need to provide those details on GCC 3.1+ and GCC 4.6+?

Would it make sense to simply require glibc 2.3 (or higher)?

Gerald


Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-11-01 Thread waffl3x
Just want to quickly check, when is the cutoff for GCC14 exactly? I
want to know how much time I have left to try to tackle this bug with
subscript. I'm going to be crunching out final stuff this week and I'll
try to get a (probably non-final) patch for you to review today.

Alex


Re: [PING] Re: [PATCH] Add files to discourage submissions of PRs to the GitHub mirror.

2023-11-01 Thread Eric Gallager
On Wed, Nov 1, 2023 at 11:31 AM Jeff Law  wrote:
>
>
>
> On 11/1/23 08:11, Eric Gallager wrote:
> > Hi, I'd like to ping the following patch:
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633191.html
> OK for the trunk.
>

Thanks, committed as r14-5064-g2b9778c8d9d331:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=2b9778c8d9d33174de63716b74b2f114d700e104

> jeff


Re: [PING] Re: [PATCH] Add files to discourage submissions of PRs to the GitHub mirror.

2023-11-01 Thread Eric Gallager
On Wed, Nov 1, 2023 at 7:25 PM Eric Gallager  wrote:
>
> On Wed, Nov 1, 2023 at 11:31 AM Jeff Law  wrote:
> >
> >
> >
> > On 11/1/23 08:11, Eric Gallager wrote:
> > > Hi, I'd like to ping the following patch:
> > >
> > > https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633191.html
> > OK for the trunk.
> >
>
> Thanks, committed as r14-5064-g2b9778c8d9d331:
> https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=2b9778c8d9d33174de63716b74b2f114d700e104

...and then I cleaned up the formatting a bit in
r14-5065-g4968e4844a3ce3, which I pushed under the "obvious" rule:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=4968e4844a3ce30143ae2e267895c418f5c636a1

>
> > jeff


RE: [PATCH] RISC-V: Allow dest operand and accumulator operand overlap of widen reduction instruction[PR112327]

2023-11-01 Thread Li, Pan2
Committed, thanks Jeff.

Pan

-Original Message-
From: Jeff Law  
Sent: Thursday, November 2, 2023 3:02 AM
To: Juzhe-Zhong ; gcc-patches@gcc.gnu.org
Cc: kito.ch...@gmail.com; kito.ch...@sifive.com; rdapp@gmail.com
Subject: Re: [PATCH] RISC-V: Allow dest operand and accumulator operand overlap 
of widen reduction instruction[PR112327]



On 11/1/23 00:56, Juzhe-Zhong wrote:
> 
> Consider this following intrinsic code:
> 
> void rvv_dot_prod(int16_t *pSrcA, int16_t *pSrcB, uint32_t n, int64_t *result)
> {
>  size_t vl;
>  vint16m4_t vSrcA, vSrcB;
>  vint64m1_t vSum = __riscv_vmv_s_x_i64m1(0, 1);
>  while (n > 0) {
>  vl = __riscv_vsetvl_e16m4(n);
>  vSrcA = __riscv_vle16_v_i16m4(pSrcA, vl);
>  vSrcB = __riscv_vle16_v_i16m4(pSrcB, vl);
>  vSum = __riscv_vwredsum_vs_i32m8_i64m1(__riscv_vwmul_vv_i32m8(vSrcA, 
> vSrcB, vl), vSum, vl);
>  pSrcA += vl;
>  pSrcB += vl;
>  n -= vl;
>  }
>  *result = __riscv_vmv_x_s_i64m1_i64(vSum);
> }
> 
> https://godbolt.org/z/vWd35W7G6
> 
> Before this patch:
> 
> ...
> Loop:
> ...
> vmv1r.v v2,v1
> ...
> vwredsum.vs v1,v8,v2
> ...
> 
> After this patch:
> 
> ...
> Loop:
> ...
> vwredsum.vs   v1,v8,v1
> ...
> 
>   PR target/112327
> 
> gcc/ChangeLog:
> 
>   * config/riscv/vector.md: Add '0'.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/rvv/base/pr112327-1.c: New test.
>   * gcc.target/riscv/rvv/base/pr112327-2.c: New test.
OK
jeff


RE: [PATCH v4] VECT: Refine the type size restriction of call vectorizer

2023-11-01 Thread Li, Pan2
Committed, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Thursday, November 2, 2023 12:43 AM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 
; kito.ch...@gmail.com; Liu, Hongtao 

Subject: Re: [PATCH v4] VECT: Refine the type size restriction of call 
vectorizer



> Am 31.10.2023 um 16:10 schrieb pan2...@intel.com:
> 
> From: Pan Li 
> 
> Update in v4:
> 
> * Append the check to vectorizable_internal_function.
> 
> Update in v3:
> 
> * Add func to predicate type size is legal or not for vectorizer call.
> 
> Update in v2:
> 
> * Fix one ICE of type assertion.
> * Adjust some test cases for aarch64 sve and riscv vector.
> 
> Original log:
> 
> The vectoriable_call has one restriction of the size of data type.
> Aka DF to DI is allowed but SF to DI isn't. You may see below message
> when try to vectorize function call like lrintf.
> 
> void
> test_lrintf (long *out, float *in, unsigned count)
> {
>  for (unsigned i = 0; i < count; i++)
>out[i] = __builtin_lrintf (in[i]);
> }
> 
> lrintf.c:5:26: missed: couldn't vectorize loop
> lrintf.c:5:26: missed: not vectorized: unsupported data-type
> 
> Then the standard name pattern like lrintmn2 cannot work for different
> data type size like SF => DI. This patch would like to refine this data
> type size check and unblock the standard name like lrintmn2 on conditions.
> 
> The type size of vectype_out need to be exactly the same as the type
> size of vectype_in when the vectype_out size isn't participating in
> the optab selection. While there is no such restriction when the
> vectype_out is somehow a part of the optab query.
> 
> The below test are passed for this patch.
> 
> * The risc-v regression tests.
> * Ensure the lrintf standard name in risc-v.
> 
> The below test are ongoing.
> 
> * The x86 bootstrap and regression test.
> * The aarch64 regression test.
> 

Ok

Thanks,
Richard 

> gcc/ChangeLog:
> 
>* tree-vect-stmts.cc (vectorizable_internal_function): Add type
>size check for vectype_out doesn't participating for optab query.
>(vectorizable_call): Remove the type size check.
> 
> Signed-off-by: Pan Li 
> ---
> gcc/tree-vect-stmts.cc | 22 +-
> 1 file changed, 9 insertions(+), 13 deletions(-)
> 
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index a9200767f67..799b4ab10c7 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -1420,8 +1420,17 @@ vectorizable_internal_function (combined_fn cfn, tree 
> fndecl,
>   const direct_internal_fn_info &info = direct_internal_fn (ifn);
>   if (info.vectorizable)
>{
> +  bool same_size_p = TYPE_SIZE (vectype_in) == TYPE_SIZE (vectype_out);
>  tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
>  tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
> +
> +  /* The type size of both the vectype_in and vectype_out should be
> + exactly the same when vectype_out isn't participating the optab.
> + While there is no restriction for type size when vectype_out
> + is part of the optab query.  */
> +  if (type0 != vectype_out && type1 != vectype_out && !same_size_p)
> +return IFN_LAST;
> +
>  if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1),
>  OPTIMIZE_FOR_SPEED))
>return ifn;
> @@ -3361,19 +3370,6 @@ vectorizable_call (vec_info *vinfo,
> 
>   return false;
> }
> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
> - just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
> - are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
> - by a pack of the two vectors into an SI vector.  We would need
> - separate code to handle direct VnDI->VnSI IFN_CTZs.  */
> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
> -{
> -  if (dump_enabled_p ())
> -dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> - "mismatched vector sizes %T and %T\n",
> - vectype_in, vectype_out);
> -  return false;
> -}
> 
>   if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
>   != VECTOR_BOOLEAN_TYPE_P (vectype_in))
> -- 
> 2.34.1
> 


[PATCH] RISC-V: Support vcreate intrinsics for non-tuple types

2023-11-01 Thread Li Xu
From: xuli 

https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/288

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Expand non-tuple 
intrinsics.
* config/riscv/riscv-vector-builtins-functions.def (vcreate): Define 
non-tuple intrinsics.
* config/riscv/riscv-vector-builtins-shapes.cc (struct vcreate_def): 
Ditto.
* config/riscv/riscv-vector-builtins.cc: Add arg types.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/tuple_create.c: Rename to vcreate.c.
* gcc.target/riscv/rvv/base/vcreate.c: New test.
---
 .../riscv/riscv-vector-builtins-bases.cc  |  21 +-
 .../riscv/riscv-vector-builtins-functions.def |   6 +
 .../riscv/riscv-vector-builtins-shapes.cc |  25 +-
 gcc/config/riscv/riscv-vector-builtins.cc |  53 
 .../gcc.target/riscv/rvv/base/tuple_create.c  | 123 -
 .../gcc.target/riscv/rvv/base/vcreate.c   | 260 ++
 6 files changed, 357 insertions(+), 131 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/tuple_create.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vcreate.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 0b1409a52e0..25ba31e2659 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1798,6 +1798,10 @@ public:
   {
 unsigned int nargs = gimple_call_num_args (f.call);
 tree lhs_type = TREE_TYPE (f.lhs);
+/* LMUL > 1 non-tuple vector types are not structure,
+   we can't use __val[index] to set the subpart.  */
+if (!riscv_v_ext_tuple_mode_p (TYPE_MODE (lhs_type)))
+  return NULL;
 
 /* Replace the call with a clobber of the result (to prevent it from
becoming upwards exposed) followed by stores into each individual
@@ -1823,9 +1827,22 @@ public:
 return clobber;
   }
 
-  rtx expand (function_expander &) const override
+  rtx expand (function_expander &e) const override
   {
-gcc_unreachable ();
+if (!e.target)
+  return NULL_RTX;
+gcc_assert (riscv_v_ext_vector_mode_p (GET_MODE (e.target)));
+unsigned int nargs = call_expr_nargs (e.exp);
+for (unsigned int i = 0; i < nargs; i++)
+  {
+   rtx src = expand_normal (CALL_EXPR_ARG (e.exp, i));
+   poly_int64 offset = i * GET_MODE_SIZE (GET_MODE (src));
+   rtx subreg = simplify_gen_subreg (GET_MODE (src), e.target,
+ GET_MODE (e.target), offset);
+   emit_move_insn (subreg, src);
+  }
+
+return e.target;
   }
 };
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 911fd520195..1c37fd5fffe 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -617,6 +617,12 @@ DEF_RVV_FUNCTION (vget, vget, none_preds, 
all_v_vget_lmul1_x8_ops)
 DEF_RVV_FUNCTION (vget, vget, none_preds, all_v_vget_lmul2_x2_ops)
 DEF_RVV_FUNCTION (vget, vget, none_preds, all_v_vget_lmul2_x4_ops)
 DEF_RVV_FUNCTION (vget, vget, none_preds, all_v_vget_lmul4_x2_ops)
+DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_lmul1_x2_ops)
+DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_lmul1_x4_ops)
+DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_lmul1_x8_ops)
+DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_lmul2_x2_ops)
+DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_lmul2_x4_ops)
+DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_lmul4_x2_ops)
 
 // Tuple types
 DEF_RVV_FUNCTION (vset, vset, none_preds, all_v_vset_tuple_ops)
diff --git a/gcc/config/riscv/riscv-vector-builtins-shapes.cc 
b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
index 0bda934ae16..72b0d6a96a3 100644
--- a/gcc/config/riscv/riscv-vector-builtins-shapes.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
@@ -728,13 +728,17 @@ struct vcreate_def : public build_base
  if (!return_type)
continue;
 
- machine_mode mode = TYPE_MODE (return_type);
- unsigned int nf = get_nf (mode);
+ tree arg_type = function_instance.op_info->args[0].get_tree_type (
+   function_instance.type.index);
 
- for (unsigned int i = 0; i < nf; i++)
-   argument_types.quick_push (
- function_instance.op_info->args[0].get_tree_type (
-   function_instance.type.index));
+ machine_mode outer_mode = TYPE_MODE (return_type);
+ machine_mode inner_mode = TYPE_MODE (arg_type);
+ unsigned int nargs
+   = exact_div (GET_MODE_SIZE (outer_mode), GET_MODE_SIZE (inner_mode))
+   .to_constant ();
+
+ for (unsigned int i = 0; i < nargs; i++)
+   argument_types.quick_push (arg_type);
 
  b.add_unique_function (function_instance, (*group.shape), return_type,
  

[tree-optimization/111721] VECT: Support SLP for MASK_LEN_GATHER_LOAD with dummy mask

2023-11-01 Thread Juzhe-Zhong
This patch fixes following FAILs for RVV:
FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect 
"Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP 
stmts"

Bootstrap on X86 and regtest passed.

Tested on aarch64 passed.

Ok for trunk ?

PR tree-optimization/111721

gcc/ChangeLog:

* tree-vect-slp.cc (vect_get_and_check_slp_defs): Support SLP for dummy 
mask -1.
* tree-vect-stmts.cc (vectorizable_load): Ditto.

---
 gcc/tree-vect-slp.cc   | 14 --
 gcc/tree-vect-stmts.cc |  8 +++-
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 43d742e3c92..23ca0318e31 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -756,8 +756,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char 
swap,
{
  tree type = TREE_TYPE (oprnd);
  dt = dts[i];
- if ((dt == vect_constant_def
-  || dt == vect_external_def)
+ if (dt == vect_external_def
  && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
  && (TREE_CODE (type) == BOOLEAN_TYPE
  || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
@@ -769,6 +768,17 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned 
char swap,
 "for variable-length SLP %T\n", oprnd);
  return -1;
}
+ if (dt == vect_constant_def
+ && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
+ && !can_duplicate_and_interleave_p (vinfo, stmts.length (), type))
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"Build SLP failed: invalid type of def "
+"for variable-length SLP %T\n",
+oprnd);
+ return -1;
+   }
 
  /* For the swapping logic below force vect_reduction_def
 for the reduction op in a SLP reduction group.  */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 6ce4868d3e1..6c47121e158 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -9859,10 +9859,16 @@ vectorizable_load (vec_info *vinfo,
   mask_index = internal_fn_mask_index (ifn);
   if (mask_index >= 0 && slp_node)
mask_index = vect_slp_child_index_for_operand (call, mask_index);
+  slp_tree slp_op = NULL;
   if (mask_index >= 0
  && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
- &mask, NULL, &mask_dt, &mask_vectype))
+ &mask, &slp_op, &mask_dt, &mask_vectype))
return false;
+  /* MASK_LEN_GATHER_LOAD dummy mask -1 should always match the
+MASK_VECTYPE.  */
+  if (mask_index >= 0 && slp_node && mask_dt == vect_constant_def
+ && !vect_maybe_update_slp_op_vectype (slp_op, mask_vectype))
+   gcc_unreachable ();
 }
 
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
-- 
2.36.3



Re: [PATCH] RISC-V: Support vcreate intrinsics for non-tuple types

2023-11-01 Thread juzhe.zh...@rivai.ai
LGTM. Thanks.



juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-11-02 08:54
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; xuli
Subject: [PATCH] RISC-V: Support vcreate intrinsics for non-tuple types
From: xuli 
 
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/288
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc: Expand non-tuple intrinsics.
* config/riscv/riscv-vector-builtins-functions.def (vcreate): Define non-tuple 
intrinsics.
* config/riscv/riscv-vector-builtins-shapes.cc (struct vcreate_def): Ditto.
* config/riscv/riscv-vector-builtins.cc: Add arg types.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/tuple_create.c: Rename to vcreate.c.
* gcc.target/riscv/rvv/base/vcreate.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  21 +-
.../riscv/riscv-vector-builtins-functions.def |   6 +
.../riscv/riscv-vector-builtins-shapes.cc |  25 +-
gcc/config/riscv/riscv-vector-builtins.cc |  53 
.../gcc.target/riscv/rvv/base/tuple_create.c  | 123 -
.../gcc.target/riscv/rvv/base/vcreate.c   | 260 ++
6 files changed, 357 insertions(+), 131 deletions(-)
delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/tuple_create.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vcreate.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 0b1409a52e0..25ba31e2659 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1798,6 +1798,10 @@ public:
   {
 unsigned int nargs = gimple_call_num_args (f.call);
 tree lhs_type = TREE_TYPE (f.lhs);
+/* LMUL > 1 non-tuple vector types are not structure,
+   we can't use __val[index] to set the subpart.  */
+if (!riscv_v_ext_tuple_mode_p (TYPE_MODE (lhs_type)))
+  return NULL;
 /* Replace the call with a clobber of the result (to prevent it from
becoming upwards exposed) followed by stores into each individual
@@ -1823,9 +1827,22 @@ public:
 return clobber;
   }
-  rtx expand (function_expander &) const override
+  rtx expand (function_expander &e) const override
   {
-gcc_unreachable ();
+if (!e.target)
+  return NULL_RTX;
+gcc_assert (riscv_v_ext_vector_mode_p (GET_MODE (e.target)));
+unsigned int nargs = call_expr_nargs (e.exp);
+for (unsigned int i = 0; i < nargs; i++)
+  {
+ rtx src = expand_normal (CALL_EXPR_ARG (e.exp, i));
+ poly_int64 offset = i * GET_MODE_SIZE (GET_MODE (src));
+ rtx subreg = simplify_gen_subreg (GET_MODE (src), e.target,
+   GET_MODE (e.target), offset);
+ emit_move_insn (subreg, src);
+  }
+
+return e.target;
   }
};
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 911fd520195..1c37fd5fffe 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -617,6 +617,12 @@ DEF_RVV_FUNCTION (vget, vget, none_preds, 
all_v_vget_lmul1_x8_ops)
DEF_RVV_FUNCTION (vget, vget, none_preds, all_v_vget_lmul2_x2_ops)
DEF_RVV_FUNCTION (vget, vget, none_preds, all_v_vget_lmul2_x4_ops)
DEF_RVV_FUNCTION (vget, vget, none_preds, all_v_vget_lmul4_x2_ops)
+DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_lmul1_x2_ops)
+DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_lmul1_x4_ops)
+DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_lmul1_x8_ops)
+DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_lmul2_x2_ops)
+DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_lmul2_x4_ops)
+DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_lmul4_x2_ops)
// Tuple types
DEF_RVV_FUNCTION (vset, vset, none_preds, all_v_vset_tuple_ops)
diff --git a/gcc/config/riscv/riscv-vector-builtins-shapes.cc 
b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
index 0bda934ae16..72b0d6a96a3 100644
--- a/gcc/config/riscv/riscv-vector-builtins-shapes.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
@@ -728,13 +728,17 @@ struct vcreate_def : public build_base
  if (!return_type)
continue;
-   machine_mode mode = TYPE_MODE (return_type);
-   unsigned int nf = get_nf (mode);
+   tree arg_type = function_instance.op_info->args[0].get_tree_type (
+ function_instance.type.index);
-   for (unsigned int i = 0; i < nf; i++)
- argument_types.quick_push (
-   function_instance.op_info->args[0].get_tree_type (
- function_instance.type.index));
+   machine_mode outer_mode = TYPE_MODE (return_type);
+   machine_mode inner_mode = TYPE_MODE (arg_type);
+   unsigned int nargs
+ = exact_div (GET_MODE_SIZE (outer_mode), GET_MODE_SIZE (inner_mode))
+ .to_constant ();
+
+   for (unsigned int i = 0; i < nargs; i++)
+ argument_types.quick_push (arg_type);
  b.add_unique_function (function_instance, (*group.shape), return_type,
argument_types);
@@ -748,6 +752,15 @@ struct vcreate_d

Re: Re: [PATCH] RISC-V: Support vcreate intrinsics for non-tuple types

2023-11-01 Thread Li Xu
Committed, thanks juzhe.



xu...@eswincomputing.com
 
From: juzhe.zh...@rivai.ai
Date: 2023-11-02 09:00
To: Li Xu; gcc-patches
CC: kito.cheng; palmer; Li Xu
Subject: Re: [PATCH] RISC-V: Support vcreate intrinsics for non-tuple types
LGTM. Thanks.



juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-11-02 08:54
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; xuli
Subject: [PATCH] RISC-V: Support vcreate intrinsics for non-tuple types
From: xuli 
 
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/288
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc: Expand non-tuple intrinsics.
* config/riscv/riscv-vector-builtins-functions.def (vcreate): Define non-tuple 
intrinsics.
* config/riscv/riscv-vector-builtins-shapes.cc (struct vcreate_def): Ditto.
* config/riscv/riscv-vector-builtins.cc: Add arg types.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/tuple_create.c: Rename to vcreate.c.
* gcc.target/riscv/rvv/base/vcreate.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  21 +-
.../riscv/riscv-vector-builtins-functions.def |   6 +
.../riscv/riscv-vector-builtins-shapes.cc |  25 +-
gcc/config/riscv/riscv-vector-builtins.cc |  53 
.../gcc.target/riscv/rvv/base/tuple_create.c  | 123 -
.../gcc.target/riscv/rvv/base/vcreate.c   | 260 ++
6 files changed, 357 insertions(+), 131 deletions(-)
delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/tuple_create.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vcreate.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 0b1409a52e0..25ba31e2659 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1798,6 +1798,10 @@ public:
   {
 unsigned int nargs = gimple_call_num_args (f.call);
 tree lhs_type = TREE_TYPE (f.lhs);
+/* LMUL > 1 non-tuple vector types are not structure,
+   we can't use __val[index] to set the subpart.  */
+if (!riscv_v_ext_tuple_mode_p (TYPE_MODE (lhs_type)))
+  return NULL;
 /* Replace the call with a clobber of the result (to prevent it from
becoming upwards exposed) followed by stores into each individual
@@ -1823,9 +1827,22 @@ public:
 return clobber;
   }
-  rtx expand (function_expander &) const override
+  rtx expand (function_expander &e) const override
   {
-gcc_unreachable ();
+if (!e.target)
+  return NULL_RTX;
+gcc_assert (riscv_v_ext_vector_mode_p (GET_MODE (e.target)));
+unsigned int nargs = call_expr_nargs (e.exp);
+for (unsigned int i = 0; i < nargs; i++)
+  {
+ rtx src = expand_normal (CALL_EXPR_ARG (e.exp, i));
+ poly_int64 offset = i * GET_MODE_SIZE (GET_MODE (src));
+ rtx subreg = simplify_gen_subreg (GET_MODE (src), e.target,
+   GET_MODE (e.target), offset);
+ emit_move_insn (subreg, src);
+  }
+
+return e.target;
   }
};
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 911fd520195..1c37fd5fffe 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -617,6 +617,12 @@ DEF_RVV_FUNCTION (vget, vget, none_preds, 
all_v_vget_lmul1_x8_ops)
DEF_RVV_FUNCTION (vget, vget, none_preds, all_v_vget_lmul2_x2_ops)
DEF_RVV_FUNCTION (vget, vget, none_preds, all_v_vget_lmul2_x4_ops)
DEF_RVV_FUNCTION (vget, vget, none_preds, all_v_vget_lmul4_x2_ops)
+DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_lmul1_x2_ops)
+DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_lmul1_x4_ops)
+DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_lmul1_x8_ops)
+DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_lmul2_x2_ops)
+DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_lmul2_x4_ops)
+DEF_RVV_FUNCTION (vcreate, vcreate, none_preds, all_v_vcreate_lmul4_x2_ops)
// Tuple types
DEF_RVV_FUNCTION (vset, vset, none_preds, all_v_vset_tuple_ops)
diff --git a/gcc/config/riscv/riscv-vector-builtins-shapes.cc 
b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
index 0bda934ae16..72b0d6a96a3 100644
--- a/gcc/config/riscv/riscv-vector-builtins-shapes.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
@@ -728,13 +728,17 @@ struct vcreate_def : public build_base
  if (!return_type)
continue;
-   machine_mode mode = TYPE_MODE (return_type);
-   unsigned int nf = get_nf (mode);
+   tree arg_type = function_instance.op_info->args[0].get_tree_type (
+ function_instance.type.index);
-   for (unsigned int i = 0; i < nf; i++)
- argument_types.quick_push (
-   function_instance.op_info->args[0].get_tree_type (
- function_instance.type.index));
+   machine_mode outer_mode = TYPE_MODE (return_type);
+   machine_mode inner_mode = TYPE_MODE (arg_type);
+   unsigned int nargs
+ = exact_div (GET_MODE_SIZE (outer_mode), GET_MODE_SIZE (inner_mode))
+ .to_c

RE: [x86_64 PATCH] PR target/110551: Tweak mulx register allocation using peephole2.

2023-11-01 Thread Jiang, Haochen
> -Original Message-
> From: Uros Bizjak 
> Sent: Thursday, November 2, 2023 3:23 AM
> To: Roger Sayle 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [x86_64 PATCH] PR target/110551: Tweak mulx register allocation
> using peephole2.
> 
> On Wed, Nov 1, 2023 at 1:58 PM Roger Sayle 
> wrote:
> >
> >
> > Hi Uros,
> >
> > > From: Uros Bizjak 
> > > Sent: 01 November 2023 10:05
> > > Subject: Re: [x86_64 PATCH] PR target/110551: Tweak mulx register
> allocation
> > > using peephole2.
> > >
> > > On Mon, Oct 30, 2023 at 6:27 PM Roger Sayle
> 
> > > wrote:
> > > >
> > > >
> > > > This patch is a follow-up to my previous PR target/110551 patch, this
> > > > time to address the additional move after mulx, seen on TARGET_BMI2
> > > > architectures (such as -march=haswell).  The complication here is that
> > > > the flexible multiple-set mulx instruction is introduced into RTL
> > > > after reload, by split2, and therefore can't benefit from register
> > > > preferencing.  This results in RTL like the following:
> > > >
> > > > (insn 32 31 17 2 (parallel [
> > > > (set (reg:DI 4 si [orig:101 r ] [101])
> > > > (mult:DI (reg:DI 1 dx [109])
> > > > (reg:DI 5 di [109])))
> > > > (set (reg:DI 5 di [ r+8 ])
> > > > (umul_highpart:DI (reg:DI 1 dx [109])
> > > > (reg:DI 5 di [109])))
> > > > ]) "pr110551-2.c":8:17 -1
> > > >  (nil))
> > > >
> > > > (insn 17 32 9 2 (set (reg:DI 0 ax [107])
> > > > (reg:DI 5 di [ r+8 ])) "pr110551-2.c":9:40 90 {*movdi_internal}
> > > >  (expr_list:REG_DEAD (reg:DI 5 di [ r+8 ])
> > > > (nil)))
> > > >
> > > > Here insn 32, the mulx instruction, places its results in si and di,
> > > > and then immediately after decides to move di to ax, with di now dead.
> > > > This can be trivially cleaned up by a peephole2.  I've added an
> > > > additional constraint that the two SET_DESTs can't be the same
> > > > register to avoid confusing the middle-end, but this has well-defined
> > > > behaviour on x86_64/BMI2, encoding a umul_highpart.
> > > >
> > > > For the new test case, compiled on x86_64 with -O2 -march=haswell:
> > > >
> > > > Before:
> > > > mulx64: movabsq $-7046029254386353131, %rdx
> > > > mulx%rdi, %rsi, %rdi
> > > > movq%rdi, %rax
> > > > xorq%rsi, %rax
> > > > ret
> > > >
> > > > After:
> > > > mulx64: movabsq $-7046029254386353131, %rdx
> > > > mulx%rdi, %rsi, %rax
> > > > xorq%rsi, %rax
> > > > ret
> > > >
> > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > > and make -k check, both with and without --target_board=unix{-m32}
> > > > with no new failures.  Ok for mainline?
> > >
> > > It looks that your previous PR110551 patch regressed -march=cascadelake 
> > > [1].

Actually it is not only on -march=cascadelake, w/o -march=cascadelake will also
fail.

Thx,
Haochen

> > > Let's fix these regressions first.
> > >
> > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634660.html
> > >
> > > Uros.
> >
> > This patch fixes that "regression".  Originally, the test case in PR110551 
> > contained
> > one unnecessary mov on "default" x86_targets, but two extra movs on BMI2
> > targets, including -march=haswell and -march=cascadelake.  The first patch
> > eliminated one of these MOVs, this patch eliminates the second.  I'm not 
> > sure
> > that you can call it a regression, the added test failed when run with a 
> > non-standard
> > -march setting.  The good news is that test case doesn't have to be changed 
> > with
> > this patch applied, i.e. the correct intended behaviour is no MOVs on all
> architectures.
> 
> I was not worried about the extra mov, but more about [2], also
> referred from [1], but it looks that that regression was wrongly
> attributed to your patch.
> 
> [2] https://gcc.gnu.org/pipermail/gcc-regression/2023-October/078391.html
> 
> > I'll admit the timing is unusual; I had already written and was regression 
> > testing a
> > patch for the BMI2 issue, when the -march=cascadelake regression tester let 
> > me
> > know it was required for folks that helpfully run the regression suite with 
> > non
> > standard settings.  i.e. a long standing bug that wasn't previously tested 
> > for by
> > the testsuite.
> >
> > > > 2023-10-30  Roger Sayle  
> > > >
> > > > gcc/ChangeLog
> > > > PR target/110551
> > > > * config/i386/i386.md (*bmi2_umul3_1): Tidy condition
> > > > as operands[2] with predicate register_operand must be !MEM_P.
> > > > (peephole2): Optimize a mulx followed by a register-to-register
> > > > move, to place result in the correct destination if possible.
> > > >
> > > > gcc/testsuite/ChangeLog
> > > > PR target/110551
> > > > * gcc.target/i386/pr110551-2.c: New test case.
> 
> The patch is OK.
> 
> Thanks,
> Uros.


[Committed] RISC-V: Fix redundant attributes

2023-11-01 Thread Juzhe-Zhong
Notice that there are some reundant 'vimov' codes in attribute.

Committed as it is obvious.

gcc/ChangeLog:

* config/riscv/vector.md: Fix redundant codes in attributes.

---
 gcc/config/riscv/vector.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index fabd18b4db7..28baee59a9b 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -827,13 +827,13 @@
 
 ;; The avl type value.
 (define_attr "avl_type_idx" ""
-  (cond [(eq_attr "type" 
"vlde,vldff,vste,vimov,vimov,vimov,vfmov,vext,vimerge,\
+  (cond [(eq_attr "type" "vlde,vldff,vste,vimov,vfmov,vext,vimerge,\
  vfsqrt,vfrecp,vfmerge,vfcvtitof,vfcvtftoi,vfwcvtitof,\
  
vfwcvtftoi,vfwcvtftof,vfncvtitof,vfncvtftoi,vfncvtftof,\
  vfclass,vired,viwred,vfredu,vfredo,vfwredu,vfwredo,\
  vimovxv,vfmovfv,vlsegde,vlsegdff")
   (const_int 7)
-(eq_attr "type" "vldm,vstm,vimov,vmalu,vmalu")
+(eq_attr "type" "vldm,vstm,vmalu,vmalu")
   (const_int 5)
 
 ;; If operands[3] of "vlds" is not vector mode, it is pred_broadcast.
-- 
2.36.3



[PATCH] RISC-V: Fix redundant vsetvl in fixed-vlmax vectorized codes[PR112326]

2023-11-01 Thread Juzhe-Zhong
With compile option --param=riscv-autovec-preference=fixed-vlmax, we have
redundant AVL/VL toggling:

vsetvli a5,a3,e8,mf4,ta,ma -> should be changed into e32m1
vle32.v v1,0(a1)
vle32.v v2,0(a0)
vsetivlizero,4,e32,m1,ta,ma -> redundant
sllia2,a5,2
vadd.vv v1,v1,v2
sub a3,a3,a5
vsetvli zero,a5,e32,m1,ta,ma -> redundant
vse32.v v1,0(a4)
add a0,a0,a2
add a1,a1,a2
add a4,a4,a2
bne a3,zero,.L3

The root cause is because we simplify AVL into immediate AVL too early
in FIXED-VLMAX situation. The later avlprop PASS failed to propagate AVL
generated by (SELECT_VL/vsetvl VL, AVL) into the normal RVV instruction.

So we need to remove immedate AVL simplification in 'expand' stage.

After this patch:

vsetvli a5,a3,e32,m1,ta,ma
sllia2,a5,2
vle32.v v1,0(a1)
vle32.v v2,0(a0)
sub a3,a3,a5
vadd.vv v1,v1,v2
vse32.v v1,0(a4)
add a0,a0,a2
add a1,a1,a2
add a4,a4,a2
bne a3,zero,.L3

After the removed simplification, the following situation should be fixed:
typedef int8_t vnx2qi __attribute__ ((vector_size (2)));

__attribute__ ((noipa)) void
f_vnx2qi (int8_t a, int8_t b, int8_t *out)
{
  vnx2qi v = {a, b};
  *(vnx2qi *) out = v;
}

We should use vsetvili zero, 2 instead of vsetvl a5,zero.
Such simplification is done in avlprop PASS which is also included in this patch
to fix regression of these situation.

PR target/112326

gcc/ChangeLog:

* config/riscv/riscv-avlprop.cc (get_insn_vtype_mode): New function.
(simplify_replace_vlmax_avl): Ditto.
(pass_avlprop::execute): Add immediate AVL simplification.
* config/riscv/riscv-protos.h (imm_avl_p): Rename.
* config/riscv/riscv-v.cc (const_vlmax_p): Ditto.
(imm_avl_p): Ditto.
(emit_vlmax_insn): Adapt for new interface name.
* config/riscv/vector.md (mode_idx): New attribute.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr112326.c: New test.

---
 gcc/config/riscv/riscv-avlprop.cc | 76 +++
 gcc/config/riscv/riscv-protos.h   |  1 +
 gcc/config/riscv/riscv-v.cc   | 24 ++
 gcc/config/riscv/vector.md| 29 ++-
 .../gcc.target/riscv/rvv/autovec/pr112326.c   | 16 
 5 files changed, 113 insertions(+), 33 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112326.c

diff --git a/gcc/config/riscv/riscv-avlprop.cc 
b/gcc/config/riscv/riscv-avlprop.cc
index c59eb7f6fa3..e4d14a89a91 100644
--- a/gcc/config/riscv/riscv-avlprop.cc
+++ b/gcc/config/riscv/riscv-avlprop.cc
@@ -109,6 +109,36 @@ vlmax_ta_p (rtx_insn *rinsn)
   return vlmax_avl_type_p (rinsn) && tail_agnostic_p (rinsn);
 }
 
+static machine_mode
+get_insn_vtype_mode (rtx_insn *rinsn)
+{
+  extract_insn_cached (rinsn);
+  int mode_idx = get_attr_mode_idx (rinsn);
+  gcc_assert (mode_idx != INVALID_ATTRIBUTE);
+  return GET_MODE (recog_data.operand[mode_idx]);
+}
+
+static void
+simplify_replace_vlmax_avl (rtx_insn *rinsn, rtx new_avl)
+{
+  /* Replace AVL operand.  */
+  extract_insn_cached (rinsn);
+  rtx avl = recog_data.operand[get_attr_vl_op_idx (rinsn)];
+  int count = count_regno_occurrences (rinsn, REGNO (avl));
+  gcc_assert (count == 1);
+  rtx new_pat = simplify_replace_rtx (PATTERN (rinsn), avl, new_avl);
+  validate_change_or_fail (rinsn, &PATTERN (rinsn), new_pat, false);
+
+  /* Change AVL TYPE into NONVLMAX if it is VLMAX.  */
+  if (vlmax_avl_type_p (rinsn))
+{
+  int index = get_attr_avl_type_idx (rinsn);
+  gcc_assert (index != INVALID_ATTRIBUTE);
+  validate_change_or_fail (rinsn, recog_data.operand_loc[index],
+  get_avl_type_rtx (avl_type::NONVLMAX), false);
+}
+}
+
 const pass_data pass_data_avlprop = {
   RTL_PASS, /* type */
   "avlprop",/* name */
@@ -385,22 +415,7 @@ pass_avlprop::execute (function *fn)
  print_rtl_single (dump_file, rinsn);
}
   /* Replace AVL operand.  */
-  extract_insn_cached (rinsn);
-  rtx avl = recog_data.operand[get_attr_vl_op_idx (rinsn)];
-  int count = count_regno_occurrences (rinsn, REGNO (avl));
-  gcc_assert (count == 1);
-  rtx new_pat = simplify_replace_rtx (PATTERN (rinsn), avl, prop.second);
-  validate_change_or_fail (rinsn, &PATTERN (rinsn), new_pat, false);
-
-  /* Change AVL TYPE into NONVLMAX if it is VLMAX.  */
-  if (vlmax_avl_type_p (rinsn))
-   {
- int index = get_attr_avl_type_idx (rinsn);
- gcc_assert (index != INVALID_ATTRIBUTE);
- validate_change_or_fail (rinsn, recog_data.operand_loc[index],
-  get_avl_type_rtx (avl_type::NONVLMAX),
-  false);
-   }
+  simplify_replace_vlmax_avl (rinsn, prop.second);
   if (dump

[PATCH v1] EXPMED: Allow vector mode for DSE extract_low_bits [PR111720]

2023-11-01 Thread pan2 . li
From: Pan Li 

The extract_low_bits only try the scalar mode if the bitsize of
the mode and src_mode is not equal. When vector mode is given
from get_stored_val in DSE, it will always fail and return NULL_RTX.

This patch would like to allow the vector mode in the extract_low_bits
if and only if the size of mode is less than or equals to the size of
the src_mode.

Given below example code with --param=riscv-autovec-preference=fixed-vlmax.

vuint8m1_t test () {
  uint8_t arr[32] = {
1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9,
1, 2, 7, 1, 3, 4, 5, 3, 1, 0, 1, 2, 4, 4, 9, 9,
  };

  return __riscv_vle8_v_u8m1(arr, 32);
}

Before this patch:

test:
  lui a5,%hi(.LANCHOR0)
  addisp,sp,-32
  addia5,a5,%lo(.LANCHOR0)
  li  a3,32
  vl2re64.v   v2,0(a5)
  vsetvli zero,a3,e8,m1,ta,ma
  vs2r.v  v2,0(sp) <== Unnecessary store to stack
  vle8.v  v1,0(sp) <== Ditto
  vs1r.v  v1,0(a0)
  addisp,sp,32
  jr  ra

After this patch:

test:
  lui a5,%hi(.LANCHOR0)
  addia5,a5,%lo(.LANCHOR0)
  li  a4,32
  addisp,sp,-32
  vsetvli zero,a4,e8,m1,ta,ma
  vle8.v  v1,0(a5)
  vs1r.v  v1,0(a0)
  addisp,sp,32
  jr  ra

Below tests are passed within this patch:

* The x86 bootstrap and regression test.
* The aarch64 regression test.
* The risc-v regression test.

PR target/111720

gcc/ChangeLog:

* expmed.cc (extract_low_bits): Allow vector mode if the
mode size is less than or equal to src_mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr111720-0.c: New test.
* gcc.target/riscv/rvv/base/pr111720-1.c: New test.
* gcc.target/riscv/rvv/base/pr111720-10.c: New test.
* gcc.target/riscv/rvv/base/pr111720-2.c: New test.
* gcc.target/riscv/rvv/base/pr111720-3.c: New test.
* gcc.target/riscv/rvv/base/pr111720-4.c: New test.
* gcc.target/riscv/rvv/base/pr111720-5.c: New test.
* gcc.target/riscv/rvv/base/pr111720-6.c: New test.
* gcc.target/riscv/rvv/base/pr111720-7.c: New test.
* gcc.target/riscv/rvv/base/pr111720-8.c: New test.
* gcc.target/riscv/rvv/base/pr111720-9.c: New test.

Signed-off-by: Pan Li 
---
 gcc/expmed.cc | 44 ---
 .../gcc.target/riscv/rvv/base/pr111720-0.c| 18 
 .../gcc.target/riscv/rvv/base/pr111720-1.c| 18 
 .../gcc.target/riscv/rvv/base/pr111720-10.c   | 18 
 .../gcc.target/riscv/rvv/base/pr111720-2.c| 18 
 .../gcc.target/riscv/rvv/base/pr111720-3.c| 18 
 .../gcc.target/riscv/rvv/base/pr111720-4.c| 18 
 .../gcc.target/riscv/rvv/base/pr111720-5.c| 18 
 .../gcc.target/riscv/rvv/base/pr111720-6.c| 18 
 .../gcc.target/riscv/rvv/base/pr111720-7.c| 21 +
 .../gcc.target/riscv/rvv/base/pr111720-8.c| 18 
 .../gcc.target/riscv/rvv/base/pr111720-9.c| 15 +++
 12 files changed, 227 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111720-9.c

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index b294eabb08d..5db83fe638c 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -2403,8 +2403,6 @@ extract_split_bit_field (rtx op0, opt_scalar_int_mode 
op0_mode,
 rtx
 extract_low_bits (machine_mode mode, machine_mode src_mode, rtx src)
 {
-  scalar_int_mode int_mode, src_int_mode;
-
   if (mode == src_mode)
 return src;
 
@@ -2437,22 +2435,38 @@ extract_low_bits (machine_mode mode, machine_mode 
src_mode, rtx src)
 return x;
 }
 
-  if (!int_mode_for_mode (src_mode).exists (&src_int_mode)
-  || !int_mode_for_mode (mode).exists (&int_mode))
-return NULL_RTX;
+  if (VECTOR_MODE_P (mode) && VECTOR_MODE_P (src_mode))
+{
+  if (maybe_gt (GET_MODE_BITSIZE (mode), GET_MODE_BITSIZE (src_mode))
+   || !targetm.modes_tieable_p (mode, src_mode))
+   return NULL_RTX;
 
-  if (!targetm.modes_tieable_p (src_int_mode, src_mode))
-return NULL_RTX;
-  if (!targetm.modes_tieable_p (int_mode, mode))
-return NULL_RTX;
+  /* For vector mode,  only the bitsize (mode) <= bitsize (src_mode) and
+tieable is allowed here.  */
+  src = gen_lowpart (mode, src);
+}
+  else
+{

[PATCH V2] RISC-V: Fix redundant vsetvl in fixed-vlmax vectorized codes[PR112326]

2023-11-01 Thread Juzhe-Zhong
With compile option --param=riscv-autovec-preference=fixed-vlmax, we have
redundant AVL/VL toggling:

vsetvli a5,a3,e8,mf4,ta,ma -> should be changed into e32m1
vle32.v v1,0(a1)
vle32.v v2,0(a0)
vsetivlizero,4,e32,m1,ta,ma -> redundant
sllia2,a5,2
vadd.vv v1,v1,v2
sub a3,a3,a5
vsetvli zero,a5,e32,m1,ta,ma -> redundant
vse32.v v1,0(a4)
add a0,a0,a2
add a1,a1,a2
add a4,a4,a2
bne a3,zero,.L3

The root cause is because we simplify AVL into immediate AVL too early
in FIXED-VLMAX situation. The later avlprop PASS failed to propagate AVL
generated by (SELECT_VL/vsetvl VL, AVL) into the normal RVV instruction.

So we need to remove immedate AVL simplification in 'expand' stage.

After this patch:

vsetvli a5,a3,e32,m1,ta,ma
sllia2,a5,2
vle32.v v1,0(a1)
vle32.v v2,0(a0)
sub a3,a3,a5
vadd.vv v1,v1,v2
vse32.v v1,0(a4)
add a0,a0,a2
add a1,a1,a2
add a4,a4,a2
bne a3,zero,.L3

After the removed simplification, the following situation should be fixed:
typedef int8_t vnx2qi __attribute__ ((vector_size (2)));

__attribute__ ((noipa)) void
f_vnx2qi (int8_t a, int8_t b, int8_t *out)
{
  vnx2qi v = {a, b};
  *(vnx2qi *) out = v;
}

We should use vsetvili zero, 2 instead of vsetvl a5,zero.
Such simplification is done in avlprop PASS which is also included in this patch
to fix regression of these situation.

PR target/112326

gcc/ChangeLog:

* config/riscv/riscv-avlprop.cc (get_insn_vtype_mode): New function.
(simplify_replace_vlmax_avl): Ditto.
(pass_avlprop::execute): Add immediate AVL simplification.
* config/riscv/riscv-protos.h (imm_avl_p): Rename.
* config/riscv/riscv-v.cc (const_vlmax_p): Ditto.
(imm_avl_p): Ditto.
(emit_vlmax_insn): Adapt for new interface name.
* config/riscv/vector.md (mode_idx): New attribute.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr112326.c: New test.

---
 gcc/config/riscv/riscv-avlprop.cc | 95 ++-
 gcc/config/riscv/riscv-protos.h   |  1 +
 gcc/config/riscv/riscv-v.cc   | 24 ++---
 gcc/config/riscv/vector.md| 29 +-
 .../gcc.target/riscv/rvv/autovec/pr112326.c   | 16 
 5 files changed, 122 insertions(+), 43 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112326.c

diff --git a/gcc/config/riscv/riscv-avlprop.cc 
b/gcc/config/riscv/riscv-avlprop.cc
index c59eb7f6fa3..bec1e3c715a 100644
--- a/gcc/config/riscv/riscv-avlprop.cc
+++ b/gcc/config/riscv/riscv-avlprop.cc
@@ -109,6 +109,48 @@ vlmax_ta_p (rtx_insn *rinsn)
   return vlmax_avl_type_p (rinsn) && tail_agnostic_p (rinsn);
 }
 
+static machine_mode
+get_insn_vtype_mode (rtx_insn *rinsn)
+{
+  extract_insn_cached (rinsn);
+  int mode_idx = get_attr_mode_idx (rinsn);
+  gcc_assert (mode_idx != INVALID_ATTRIBUTE);
+  return GET_MODE (recog_data.operand[mode_idx]);
+}
+
+static void
+simplify_replace_vlmax_avl (rtx_insn *rinsn, rtx new_avl)
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf (dump_file, "\nPropagating AVL: ");
+  print_rtl_single (dump_file, new_avl);
+  fprintf (dump_file, "into: ");
+  print_rtl_single (dump_file, rinsn);
+}
+  /* Replace AVL operand.  */
+  extract_insn_cached (rinsn);
+  rtx avl = recog_data.operand[get_attr_vl_op_idx (rinsn)];
+  int count = count_regno_occurrences (rinsn, REGNO (avl));
+  gcc_assert (count == 1);
+  rtx new_pat = simplify_replace_rtx (PATTERN (rinsn), avl, new_avl);
+  validate_change_or_fail (rinsn, &PATTERN (rinsn), new_pat, false);
+
+  /* Change AVL TYPE into NONVLMAX if it is VLMAX.  */
+  if (vlmax_avl_type_p (rinsn))
+{
+  int index = get_attr_avl_type_idx (rinsn);
+  gcc_assert (index != INVALID_ATTRIBUTE);
+  validate_change_or_fail (rinsn, recog_data.operand_loc[index],
+  get_avl_type_rtx (avl_type::NONVLMAX), false);
+}
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf (dump_file, "Successfully to match this instruction: ");
+  print_rtl_single (dump_file, rinsn);
+}
+}
+
 const pass_data pass_data_avlprop = {
   RTL_PASS, /* type */
   "avlprop",/* name */
@@ -377,34 +419,35 @@ pass_avlprop::execute (function *fn)
   for (const auto prop : *m_avl_propagations)
 {
   rtx_insn *rinsn = prop.first->rtl ();
+  simplify_replace_vlmax_avl (rinsn, prop.second);
+}
+
+  if (riscv_autovec_preference == RVV_FIXED_VLMAX)
+{
+  /* Simplify VLMAX AVL into immediate AVL.
+E.g. Simplify this following case:
+
+ vsetvl a5, zero, e32, m1
+ vadd.vv
+
+   into:
+
+ vsetvl zero, 4, e32, m1
+ vadd.vv
+if GET_MODE_NUNITS (RVVM1SImode) == 4. 

Re: [PATCH] RISC-V: Fix redundant vsetvl in fixed-vlmax vectorized codes[PR112326]

2023-11-01 Thread juzhe.zh...@rivai.ai
update with more dump information in V2:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/634950.html 




juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-11-02 11:06
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Fix redundant vsetvl in fixed-vlmax vectorized 
codes[PR112326]
With compile option --param=riscv-autovec-preference=fixed-vlmax, we have
redundant AVL/VL toggling:
 
vsetvli a5,a3,e8,mf4,ta,ma -> should be changed into e32m1
vle32.v v1,0(a1)
vle32.v v2,0(a0)
vsetivli zero,4,e32,m1,ta,ma -> redundant
slli a2,a5,2
vadd.vv v1,v1,v2
sub a3,a3,a5
vsetvli zero,a5,e32,m1,ta,ma -> redundant
vse32.v v1,0(a4)
add a0,a0,a2
add a1,a1,a2
add a4,a4,a2
bne a3,zero,.L3
 
The root cause is because we simplify AVL into immediate AVL too early
in FIXED-VLMAX situation. The later avlprop PASS failed to propagate AVL
generated by (SELECT_VL/vsetvl VL, AVL) into the normal RVV instruction.
 
So we need to remove immedate AVL simplification in 'expand' stage.
 
After this patch:
 
  vsetvli a5,a3,e32,m1,ta,ma
slli a2,a5,2
vle32.v v1,0(a1)
vle32.v v2,0(a0)
sub a3,a3,a5
vadd.vv v1,v1,v2
vse32.v v1,0(a4)
add a0,a0,a2
add a1,a1,a2
add a4,a4,a2
bne a3,zero,.L3
 
After the removed simplification, the following situation should be fixed:
typedef int8_t vnx2qi __attribute__ ((vector_size (2)));
 
__attribute__ ((noipa)) void
f_vnx2qi (int8_t a, int8_t b, int8_t *out)
{
  vnx2qi v = {a, b};
  *(vnx2qi *) out = v;
}
 
We should use vsetvili zero, 2 instead of vsetvl a5,zero.
Such simplification is done in avlprop PASS which is also included in this patch
to fix regression of these situation.
 
PR target/112326
 
gcc/ChangeLog:
 
* config/riscv/riscv-avlprop.cc (get_insn_vtype_mode): New function.
(simplify_replace_vlmax_avl): Ditto.
(pass_avlprop::execute): Add immediate AVL simplification.
* config/riscv/riscv-protos.h (imm_avl_p): Rename.
* config/riscv/riscv-v.cc (const_vlmax_p): Ditto.
(imm_avl_p): Ditto.
(emit_vlmax_insn): Adapt for new interface name.
* config/riscv/vector.md (mode_idx): New attribute.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/pr112326.c: New test.
 
---
gcc/config/riscv/riscv-avlprop.cc | 76 +++
gcc/config/riscv/riscv-protos.h   |  1 +
gcc/config/riscv/riscv-v.cc   | 24 ++
gcc/config/riscv/vector.md| 29 ++-
.../gcc.target/riscv/rvv/autovec/pr112326.c   | 16 
5 files changed, 113 insertions(+), 33 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112326.c
 
diff --git a/gcc/config/riscv/riscv-avlprop.cc 
b/gcc/config/riscv/riscv-avlprop.cc
index c59eb7f6fa3..e4d14a89a91 100644
--- a/gcc/config/riscv/riscv-avlprop.cc
+++ b/gcc/config/riscv/riscv-avlprop.cc
@@ -109,6 +109,36 @@ vlmax_ta_p (rtx_insn *rinsn)
   return vlmax_avl_type_p (rinsn) && tail_agnostic_p (rinsn);
}
+static machine_mode
+get_insn_vtype_mode (rtx_insn *rinsn)
+{
+  extract_insn_cached (rinsn);
+  int mode_idx = get_attr_mode_idx (rinsn);
+  gcc_assert (mode_idx != INVALID_ATTRIBUTE);
+  return GET_MODE (recog_data.operand[mode_idx]);
+}
+
+static void
+simplify_replace_vlmax_avl (rtx_insn *rinsn, rtx new_avl)
+{
+  /* Replace AVL operand.  */
+  extract_insn_cached (rinsn);
+  rtx avl = recog_data.operand[get_attr_vl_op_idx (rinsn)];
+  int count = count_regno_occurrences (rinsn, REGNO (avl));
+  gcc_assert (count == 1);
+  rtx new_pat = simplify_replace_rtx (PATTERN (rinsn), avl, new_avl);
+  validate_change_or_fail (rinsn, &PATTERN (rinsn), new_pat, false);
+
+  /* Change AVL TYPE into NONVLMAX if it is VLMAX.  */
+  if (vlmax_avl_type_p (rinsn))
+{
+  int index = get_attr_avl_type_idx (rinsn);
+  gcc_assert (index != INVALID_ATTRIBUTE);
+  validate_change_or_fail (rinsn, recog_data.operand_loc[index],
+get_avl_type_rtx (avl_type::NONVLMAX), false);
+}
+}
+
const pass_data pass_data_avlprop = {
   RTL_PASS, /* type */
   "avlprop", /* name */
@@ -385,22 +415,7 @@ pass_avlprop::execute (function *fn)
  print_rtl_single (dump_file, rinsn);
}
   /* Replace AVL operand.  */
-  extract_insn_cached (rinsn);
-  rtx avl = recog_data.operand[get_attr_vl_op_idx (rinsn)];
-  int count = count_regno_occurrences (rinsn, REGNO (avl));
-  gcc_assert (count == 1);
-  rtx new_pat = simplify_replace_rtx (PATTERN (rinsn), avl, prop.second);
-  validate_change_or_fail (rinsn, &PATTERN (rinsn), new_pat, false);
-
-  /* Change AVL TYPE into NONVLMAX if it is VLMAX.  */
-  if (vlmax_avl_type_p (rinsn))
- {
-   int index = get_attr_avl_type_idx (rinsn);
-   gcc_assert (index != INVALID_ATTRIBUTE);
-   validate_change_or_fail (rinsn, recog_data.operand_loc[index],
-get_avl_type_rtx (avl_type::NONVLMAX),
-false);
- }
+  simplify_replace_vlmax_avl (rinsn, prop.second);
   if (dump_file && (dump_flags & TDF_DETAILS))
{
  fprintf (dump_file, "Successfully to match this instruction: ");
@