Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.

2023-06-12 Thread Manolis Tsamis
On Fri, Jun 9, 2023 at 3:57 AM Jeff Law  wrote:
>
>
>
> On 5/25/23 06:35, Manolis Tsamis wrote:
> > Implementation of the new RISC-V optimization pass for memory offset
> > calculations, documentation and testcases.
> >
> > gcc/ChangeLog:
> >
> >   * config.gcc: Add riscv-fold-mem-offsets.o to extra_objs.
> >   * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Schedule a new
> >   pass.
> >   * config/riscv/riscv-protos.h (make_pass_fold_mem_offsets): Declare.
> >   * config/riscv/riscv.opt: New options.
> >   * config/riscv/t-riscv: New build rule.
> >   * doc/invoke.texi: Document new option.
> >   * config/riscv/riscv-fold-mem-offsets.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/fold-mem-offsets-1.c: New test.
> >   * gcc.target/riscv/fold-mem-offsets-2.c: New test.
> >   * gcc.target/riscv/fold-mem-offsets-3.c: New test.
> So a followup.
>
> While I think we probably could create a variety of backend patterns,
> perhaps disallow the frame pointer as the addend argument to a shadd
> pattern and the like and capture the key cases from mcf and probably
> deepsjeng it's probably not the best direction.
>
> What I suspect would ultimately happen is we'd be presented with
> additional cases over time that would require an ever increasing number
> of patterns.  sign vs zero extension, increasing depth of search space
> to find reassociation opportunities, different variants with and without
> shadd/zbb, etc etc.
>
> So with that space explored a bit the next big question is target
> specific or generic.  I'll poke in there a bit over the coming days.  In
> the meantime I do have some questions/comments on the code itself.
> There may be more over time..
>
>
>
> > +static rtx_insn*
> > +get_single_def_in_bb (rtx_insn *insn, rtx reg)
> [ ... ]
>
>
> > +  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
> > +{
> > +  /* Problem getting some definition for this instruction.  */
> > +  if (ref_link->ref == NULL)
> > + return NULL;
> > +  if (DF_REF_INSN_INFO (ref_link->ref) == NULL)
> > + return NULL;
> > +  if (global_regs[REGNO (reg)]
> > +   && !set_of (reg, DF_REF_INSN (ref_link->ref)))
> > + return NULL;
> > +}
> That last condition feels a bit odd.  It would seem that you wanted an
> OR boolean rather than AND.
>

Most of this function I didn't write myself; I used existing code to get
definitions, taken from REE's get_defs.
In the original there's a comment about this line that explains it:

  As global regs are assumed to be defined at each function call
  dataflow can report a call_insn as being a definition of REG.
  But we can't do anything with that in this pass so proceed only
  if the instruction really sets REG in a way that can be deduced
  from the RTL structure.

This function is the only one I copied without changing much (because
I didn't quite understand it), so I don't know whether that condition is
of any use for f-m-o.
Also, the code duplication here is a bit unfortunate; maybe it would be
preferable to create a generic version that can be used by both?

>
> > +
> > +  unsigned int dest_regno = REGNO (dest);
> > +
> > +  /* We don't want to fold offsets from instructions that change some
> > + particular registers with potentially global side effects.  */
> > +  if (!GP_REG_P (dest_regno)
> > +  || dest_regno == STACK_POINTER_REGNUM
> > +  || (frame_pointer_needed && dest_regno == HARD_FRAME_POINTER_REGNUM)
> > +  || dest_regno == GP_REGNUM
> > +  || dest_regno == THREAD_POINTER_REGNUM
> > +  || dest_regno == RETURN_ADDR_REGNUM)
> > +return 0;
> I'd think most of this would be captured by testing fixed_registers
> rather than trying to list each register individually.  In fact, if we
> need to generalize this to work on other targets we almost certainly
> want a more general test.
>

Thanks, I knew there would be some proper way to test this but wasn't
aware which one is correct.
Should it look like the code below? Or is GP_REG_P redundant and
fixed_regs alone will do?

  if (!GP_REG_P (dest_regno) || fixed_regs[dest_regno])
    return 0;

>
> > +  else if ((
> > + GET_CODE (src) == SIGN_EXTEND
> > + || GET_CODE (src) == ZERO_EXTEND
> > +   )
> > +   && MEM_P (XEXP (src, 0)))
> Formatting is goofy above...
>

Noted.

>
>
> > +
> > +   if (dump_file)
> > + {
> > +   fprintf (dump_file, "Instruction deleted from folding:");
> > +   print_rtl_single (dump_file, insn);
> > + }
> > +
> > +   if (REGNO (dest) != REGNO (arg1))
> > + {
> > +   /* If the dest register is different than the first argument
> > +  then the addition with constant 0 is equivalent to a move
> > +  instruction.  We emit the move and let the subsequent
> > +  pass cprop_hardreg eliminate that if possible.  */
> > +  

Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.

2023-06-12 Thread Manolis Tsamis
On Thu, Jun 8, 2023 at 8:37 AM Jeff Law  wrote:
>
>
>
> On 5/25/23 06:35, Manolis Tsamis wrote:
> > Implementation of the new RISC-V optimization pass for memory offset
> > calculations, documentation and testcases.
> >
> > gcc/ChangeLog:
> >
> >   * config.gcc: Add riscv-fold-mem-offsets.o to extra_objs.
> >   * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Schedule a new
> >   pass.
> >   * config/riscv/riscv-protos.h (make_pass_fold_mem_offsets): Declare.
> >   * config/riscv/riscv.opt: New options.
> >   * config/riscv/t-riscv: New build rule.
> >   * doc/invoke.texi: Document new option.
> >   * config/riscv/riscv-fold-mem-offsets.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/fold-mem-offsets-1.c: New test.
> >   * gcc.target/riscv/fold-mem-offsets-2.c: New test.
> >   * gcc.target/riscv/fold-mem-offsets-3.c: New test.
> So not going into the guts of the patch yet.
>
>  From a benchmark standpoint the only two that get out of the +-0.05%
> range are mcf and deepsjeng (from a dynamic instruction standpoint).  So
> from an evaluation standpoint we can probably focus our efforts there.
> And as we know, mcf is actually memory bound, so while improving its
> dynamic instruction count is good, the end performance improvement may
> be marginal.
>

Even if late, one question about the dynamic instruction numbers:
was this measured with just f-m-o, or with the stack pointer fold patch
applied too?
I remember getting better improvements in the past, but most of the
cases had to do with the stack pointer, so the second patch is
necessary.

> As I mentioned to Philipp many months ago this reminds me a lot of a
> problem I've seen before.  Basically register elimination emits code
> that can be terrible in some circumstances.  So I went and poked at this
> again.
>
> I think the key difference between now and what I was dealing with
> before is for the cases that really matter for rv64 we have a shNadd
> insn in the sequence.  That private port I was working on before did not
> have shNadd (don't ask, I probably can't tell).  Our target also had
> reg+reg addressing modes.  What I can't remember was if we were trying
> harder to fold the constant terms into the memory reference or if we
> were more focused on the reg+reg.  Ultimately it's probably not that
> important to remember -- the key is there are very significant
> differences in the target's capabilities which impact how we should be
> generating code in this case.  Those differences affect the code we
> generate *and* the places where we can potentially get control and do
> some address rewriting.
>
> A  key sequence in mcf looks something like this in IRA, others have
> similar structure:
>
> > (insn 237 234 239 26 (set (reg:DI 377)
> > (plus:DI (ashift:DI (reg:DI 200 [ _173 ])
> > (const_int 3 [0x3]))
> > (reg/f:DI 65 frame))) "pbeampp.c":139:15 333 {*shNadd}
> >  (nil))
> > (insn 239 237 235 26 (set (reg/f:DI 380)
> > (plus:DI (reg:DI 513)
> > (reg:DI 377))) "pbeampp.c":139:15 5 {adddi3}
> >  (expr_list:REG_DEAD (reg:DI 377)
> > (expr_list:REG_EQUAL (plus:DI (reg:DI 377)
> > (const_int -32768 [0x8000]))
> > (nil
> [ ... ]
> > (insn 240 235 255 26 (set (reg/f:DI 204 [ _177 ])
> > (mem/f:DI (plus:DI (reg/f:DI 380)
> > (const_int 280 [0x118])) [7 *_176+0 S8 A64])) 
> > "pbeampp.c":139:15 179 {*movdi_64bit}
> >  (expr_list:REG_DEAD (reg/f:DI 380)
> > (nil)))
>
>
> The key here is insn 237.  It's generally going to be bad to have FP
> show up in a shadd insn because its going to be eliminated into
> sp+offset.  That'll generate an input reload before insn 237 and we
> can't do any combination with the constant in insn 239.
>
> After LRA it looks like this:
>
> > (insn 1540 234 1541 26 (set (reg:DI 11 a1 [750])
> > (const_int 32768 [0x8000])) "pbeampp.c":139:15 179 {*movdi_64bit}
> >  (nil))
> > (insn 1541 1540 1611 26 (set (reg:DI 12 a2 [749])
> > (plus:DI (reg:DI 11 a1 [750])
> > (const_int -272 [0xfef0]))) "pbeampp.c":139:15 5 
> > {adddi3}
> >  (expr_list:REG_EQUAL (const_int 32496 [0x7ef0])
> > (nil)))
> > (insn 1611 1541 1542 26 (set (reg:DI 29 t4 [795])
> > (plus:DI (reg/f:DI 2 sp)
> > (const_int 64 [0x40]))) "pbeampp.c":139:15 5 {adddi3}
> >  (nil))
> > (insn 1542 1611 237 26 (set (reg:DI 12 a2 [749])
> > (plus:DI (reg:DI 12 a2 [749])
> > (reg:DI 29 t4 [795]))) "pbeampp.c":139:15 5 {adddi3}
> >  (nil))
> > (insn 237 1542 239 26 (set (reg:DI 12 a2 [377])
> > (plus:DI (ashift:DI (reg:DI 14 a4 [orig:200 _173 ] [200])
> > (const_int 3 [0x3]))
> > (reg:DI 12 a2 [749]))) "pbeampp.c":139:15 333 {*shNadd}
> >  (nil))
> > (insn 239 237 235 26 (set (reg/f:DI 12 a2 [380])
> > (plus:

[PATCH v1] RISC-V: Support RVV FP16 MISC vget/vset intrinsic API

2023-06-12 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch supports the intrinsic API of FP16 ZVFHMIN vget/vset. From
the user's perspective, it is reasonable to do some get/set operations
for the vfloat16*_t types when only ZVFHMIN is enabled.
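
For illustration, a minimal user-level sketch of what becomes possible (the
intrinsic names mirror the tests below; the exact -march string is an
assumption, not part of this patch):

  #include <riscv_vector.h>

  /* Compile with something like -march=rv64gcv_zvfhmin: only ZVFHMIN is
     needed for these whole-register get/set operations.  */
  vfloat16m2_t set_lo (vfloat16m2_t dest, vfloat16m1_t val)
  {
    return __riscv_vset_v_f16m1_f16m2 (dest, 0, val);
  }

  vfloat16m4_t get_hi (vfloat16m8_t src)
  {
    return __riscv_vget_v_f16m8_f16m4 (src, 1);
  }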

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-types.def
(vfloat16m1_t): Add type to lmul1 ops.
(vfloat16m2_t): Likewise.
(vfloat16m4_t): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: Add new test cases.
* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Likewise.
---
 .../riscv/riscv-vector-builtins-types.def |  3 ++
 .../riscv/rvv/base/zvfh-over-zvfhmin.c| 15 +++--
 .../riscv/rvv/base/zvfhmin-intrinsic.c| 32 ++-
 3 files changed, 40 insertions(+), 10 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def 
b/gcc/config/riscv/riscv-vector-builtins-types.def
index db8e61fea6a..4926bd8a2d2 100644
--- a/gcc/config/riscv/riscv-vector-builtins-types.def
+++ b/gcc/config/riscv/riscv-vector-builtins-types.def
@@ -1091,6 +1091,7 @@ DEF_RVV_LMUL1_OPS (vuint8m1_t, 0)
 DEF_RVV_LMUL1_OPS (vuint16m1_t, 0)
 DEF_RVV_LMUL1_OPS (vuint32m1_t, 0)
 DEF_RVV_LMUL1_OPS (vuint64m1_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_LMUL1_OPS (vfloat16m1_t, RVV_REQUIRE_ELEN_FP_16)
 DEF_RVV_LMUL1_OPS (vfloat32m1_t, RVV_REQUIRE_ELEN_FP_32)
 DEF_RVV_LMUL1_OPS (vfloat64m1_t, RVV_REQUIRE_ELEN_FP_64)
 
@@ -1102,6 +1103,7 @@ DEF_RVV_LMUL2_OPS (vuint8m2_t, 0)
 DEF_RVV_LMUL2_OPS (vuint16m2_t, 0)
 DEF_RVV_LMUL2_OPS (vuint32m2_t, 0)
 DEF_RVV_LMUL2_OPS (vuint64m2_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_LMUL2_OPS (vfloat16m2_t, RVV_REQUIRE_ELEN_FP_16)
 DEF_RVV_LMUL2_OPS (vfloat32m2_t, RVV_REQUIRE_ELEN_FP_32)
 DEF_RVV_LMUL2_OPS (vfloat64m2_t, RVV_REQUIRE_ELEN_FP_64)
 
@@ -1113,6 +1115,7 @@ DEF_RVV_LMUL4_OPS (vuint8m4_t, 0)
 DEF_RVV_LMUL4_OPS (vuint16m4_t, 0)
 DEF_RVV_LMUL4_OPS (vuint32m4_t, 0)
 DEF_RVV_LMUL4_OPS (vuint64m4_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_LMUL4_OPS (vfloat16m4_t, RVV_REQUIRE_ELEN_FP_16)
 DEF_RVV_LMUL4_OPS (vfloat32m4_t, RVV_REQUIRE_ELEN_FP_32)
 DEF_RVV_LMUL4_OPS (vfloat64m4_t, RVV_REQUIRE_ELEN_FP_64)
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
index c3ed4191a36..1d82cc8de2d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
@@ -61,6 +61,14 @@ vfloat16m8_t test_vundefined_f16m8() {
   return __riscv_vundefined_f16m8();
 }
 
+vfloat16m2_t test_vset_v_f16m1_f16m2(vfloat16m2_t dest, size_t index, 
vfloat16m1_t val) {
+  return __riscv_vset_v_f16m1_f16m2(dest, 0, val);
+}
+
+vfloat16m4_t test_vget_v_f16m8_f16m4(vfloat16m8_t src, size_t index) {
+  return __riscv_vget_v_f16m8_f16m4(src, 0);
+}
+
 /* { dg-final { scan-assembler-times 
{vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 3 } } */
 /* { dg-final { scan-assembler-times 
{vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*m4,\s*t[au],\s*m[au]} 2 } } */
 /* { dg-final { scan-assembler-times 
{vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*m8,\s*t[au],\s*m[au]} 1 } } */
@@ -71,7 +79,10 @@ vfloat16m8_t test_vundefined_f16m8() {
 /* { dg-final { scan-assembler-times {vfncvt\.f\.f\.w\s+v[0-9]+,\s*v[0-9]+} 2 
} } */
 /* { dg-final { scan-assembler-times {vle16\.v\s+v[0-9]+,\s*0\([0-9ax]+\)} 7 } 
} */
 /* { dg-final { scan-assembler-times {vse16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 
6 } } */
-/* { dg-final { scan-assembler-times 
{vl4re16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 1 } } */
+/* { dg-final { scan-assembler-times 
{vl1re16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 1 } } */
+/* { dg-final { scan-assembler-times 
{vl2re16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 1 } } */
+/* { dg-final { scan-assembler-times 
{vl4re16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 3 } } */
 /* { dg-final { scan-assembler-times 
{vl8re16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 1 } } */
-/* { dg-final { scan-assembler-times {vs4r\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 1 
} } */
+/* { dg-final { scan-assembler-times {vs2r\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 1 
} } */
+/* { dg-final { scan-assembler-times {vs4r\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 3 
} } */
 /* { dg-final { scan-assembler-times {vs8r\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 5 
} } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c
index 8d39a2ed4c2..1026b3f82f1 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c
@@ -165,6 +165,22 @@ vfloat16m8_t test_vundefined_f16m8() {
   return __riscv_vundefined_f16m8();
 }
 
+vfloat16m2_t test_vset_v_f16m1_f16m2(vfloat16m2_t dest, size_t index, 
vfloat16m1_t val) {
+  return __riscv_vset_v_f16m1_f16m2(dest, 0, val);
+}
+
+vfloat16m8_t test_vset_v_f16m4_f16m8(vfloat16m8_t dest, size_t index, 
vfloat16m4_t val) {
+  return __riscv_vset_v_f16m4_f16m8(des

Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.

2023-06-12 Thread Manolis Tsamis
On Sat, Jun 10, 2023 at 6:49 PM Jeff Law  wrote:
>
>
>
> On 5/25/23 06:35, Manolis Tsamis wrote:
> > Implementation of the new RISC-V optimization pass for memory offset
> > calculations, documentation and testcases.
> >
> > gcc/ChangeLog:
> >
> >   * config.gcc: Add riscv-fold-mem-offsets.o to extra_objs.
> >   * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Schedule a new
> >   pass.
> >   * config/riscv/riscv-protos.h (make_pass_fold_mem_offsets): Declare.
> >   * config/riscv/riscv.opt: New options.
> >   * config/riscv/t-riscv: New build rule.
> >   * doc/invoke.texi: Document new option.
> >   * config/riscv/riscv-fold-mem-offsets.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/fold-mem-offsets-1.c: New test.
> >   * gcc.target/riscv/fold-mem-offsets-2.c: New test.
> >   * gcc.target/riscv/fold-mem-offsets-3.c: New test.
>
> So I made a small number of changes so that this could be run on other
> targets.
>
>
> I had an hppa compiler handy, so it was trivial to do some light testing
> with that.  f-m-o didn't help at all on the included tests.  But I think
> that's more likely an artifact of the port supporting scaled indexed
> loads and doing fairly aggressive address rewriting to encourage that
> addressing mode.
>
> Next I had an H8 compiler handy.  All three included tests showed
> improvement, both in terms of instruction count and size.  What was most
> interesting here is that f-m-o removed some redundant address
> calculations without needing to adjust the memory references which was a
> pleasant surprise.
>
> Given the fact that both ports worked and the H8 showed an improvement,
> the next step was to put the patch into my tester.  It tests 30+
> distinct processor families.  The goal wasn't to evaluate effectiveness,
> but to validate that those targets could still build their target
> libraries and successfully run their testsuites.
>
> That's run through the various crosses.  Things like the hppa, alpha,
> m68k bootstraps only run once a week as they take many hours each.  The
> result is quite encouraging.  None of the crosses had any build issues
> or regressions.
>

That's all great news!

> The net result I think is we should probably move this to a target
> independent optimization pass.  We only need to generalize a few things.
>

I also think that's where this should end up since most of the pass is
target independent anyway.
I just couldn't figure out what would be a proper way to model the
propagation rules for each target.
Is a target hook necessary for that?

> Most importantly we need to get a resolution on the conditional I asked
> about inside get_single_def_in_bb.   There's some other refactoring I
> think we should do, but I'd really like to get a resolution on the code
> in get_single_def_in_bb first, then we ought to be able to move forward
> pretty quickly on the refactoring and integration.
>

Just replied to that in my previous response :)

> jeff

Thanks,
Manolis


Re: [PATCH v1] RISC-V: Support RVV FP16 MISC vget/vset intrinsic API

2023-06-12 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-12 15:40
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV FP16 MISC vget/vset intrinsic API
From: Pan Li 
 
This patch supports the intrinsic API of FP16 ZVFHMIN vget/vset. From
the user's perspective, it is reasonable to do some get/set operations
for the vfloat16*_t types when only ZVFHMIN is enabled.
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-types.def
(vfloat16m1_t): Add type to lmul1 ops.
(vfloat16m2_t): Likewise.
(vfloat16m4_t): Likewise.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: Add new test cases.
* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Likewise.
---
.../riscv/riscv-vector-builtins-types.def |  3 ++
.../riscv/rvv/base/zvfh-over-zvfhmin.c| 15 +++--
.../riscv/rvv/base/zvfhmin-intrinsic.c| 32 ++-
3 files changed, 40 insertions(+), 10 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def 
b/gcc/config/riscv/riscv-vector-builtins-types.def
index db8e61fea6a..4926bd8a2d2 100644
--- a/gcc/config/riscv/riscv-vector-builtins-types.def
+++ b/gcc/config/riscv/riscv-vector-builtins-types.def
@@ -1091,6 +1091,7 @@ DEF_RVV_LMUL1_OPS (vuint8m1_t, 0)
DEF_RVV_LMUL1_OPS (vuint16m1_t, 0)
DEF_RVV_LMUL1_OPS (vuint32m1_t, 0)
DEF_RVV_LMUL1_OPS (vuint64m1_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_LMUL1_OPS (vfloat16m1_t, RVV_REQUIRE_ELEN_FP_16)
DEF_RVV_LMUL1_OPS (vfloat32m1_t, RVV_REQUIRE_ELEN_FP_32)
DEF_RVV_LMUL1_OPS (vfloat64m1_t, RVV_REQUIRE_ELEN_FP_64)
@@ -1102,6 +1103,7 @@ DEF_RVV_LMUL2_OPS (vuint8m2_t, 0)
DEF_RVV_LMUL2_OPS (vuint16m2_t, 0)
DEF_RVV_LMUL2_OPS (vuint32m2_t, 0)
DEF_RVV_LMUL2_OPS (vuint64m2_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_LMUL2_OPS (vfloat16m2_t, RVV_REQUIRE_ELEN_FP_16)
DEF_RVV_LMUL2_OPS (vfloat32m2_t, RVV_REQUIRE_ELEN_FP_32)
DEF_RVV_LMUL2_OPS (vfloat64m2_t, RVV_REQUIRE_ELEN_FP_64)
@@ -1113,6 +1115,7 @@ DEF_RVV_LMUL4_OPS (vuint8m4_t, 0)
DEF_RVV_LMUL4_OPS (vuint16m4_t, 0)
DEF_RVV_LMUL4_OPS (vuint32m4_t, 0)
DEF_RVV_LMUL4_OPS (vuint64m4_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_LMUL4_OPS (vfloat16m4_t, RVV_REQUIRE_ELEN_FP_16)
DEF_RVV_LMUL4_OPS (vfloat32m4_t, RVV_REQUIRE_ELEN_FP_32)
DEF_RVV_LMUL4_OPS (vfloat64m4_t, RVV_REQUIRE_ELEN_FP_64)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
index c3ed4191a36..1d82cc8de2d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
@@ -61,6 +61,14 @@ vfloat16m8_t test_vundefined_f16m8() {
   return __riscv_vundefined_f16m8();
}
+vfloat16m2_t test_vset_v_f16m1_f16m2(vfloat16m2_t dest, size_t index, 
vfloat16m1_t val) {
+  return __riscv_vset_v_f16m1_f16m2(dest, 0, val);
+}
+
+vfloat16m4_t test_vget_v_f16m8_f16m4(vfloat16m8_t src, size_t index) {
+  return __riscv_vget_v_f16m8_f16m4(src, 0);
+}
+
/* { dg-final { scan-assembler-times 
{vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 3 } } */
/* { dg-final { scan-assembler-times 
{vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*m4,\s*t[au],\s*m[au]} 2 } } */
/* { dg-final { scan-assembler-times 
{vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*m8,\s*t[au],\s*m[au]} 1 } } */
@@ -71,7 +79,10 @@ vfloat16m8_t test_vundefined_f16m8() {
/* { dg-final { scan-assembler-times {vfncvt\.f\.f\.w\s+v[0-9]+,\s*v[0-9]+} 2 } 
} */
/* { dg-final { scan-assembler-times {vle16\.v\s+v[0-9]+,\s*0\([0-9ax]+\)} 7 } 
} */
/* { dg-final { scan-assembler-times {vse16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 6 
} } */
-/* { dg-final { scan-assembler-times 
{vl4re16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 1 } } */
+/* { dg-final { scan-assembler-times 
{vl1re16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 1 } } */
+/* { dg-final { scan-assembler-times 
{vl2re16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 1 } } */
+/* { dg-final { scan-assembler-times 
{vl4re16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 3 } } */
/* { dg-final { scan-assembler-times {vl8re16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 
1 } } */
-/* { dg-final { scan-assembler-times {vs4r\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 1 
} } */
+/* { dg-final { scan-assembler-times {vs2r\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 1 
} } */
+/* { dg-final { scan-assembler-times {vs4r\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 3 
} } */
/* { dg-final { scan-assembler-times {vs8r\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 5 
} } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c
index 8d39a2ed4c2..1026b3f82f1 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c
@@ -165,6 +165,22 @@ vfloat16m8_t test_vundefined_f16m8() {
   return __riscv_vundefined_f16m8();
}
+vfloat16m2_t test_vset_v_f16m1_f16m2(vfloat16m2_t dest, size_t index, 
vfloat16m1_t val) {
+  return __riscv_vset_v_f16m1_f16m

Re: [PATCH 2/2] ipa-cp: Feed results of IPA-CP into value numbering

2023-06-12 Thread Richard Biener via Gcc-patches
On Fri, 9 Jun 2023, Martin Jambor wrote:

> Hi,
> 
> thanks for looking at this.
> 
> On Fri, Jun 02 2023, Richard Biener wrote:
> > On Mon, 29 May 2023, Martin Jambor wrote:
> >
> 
> [...]
> 
> >> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> >> index 27c84e78fcf..33215b5fc82 100644
> >> --- a/gcc/tree-ssa-sccvn.cc
> >> +++ b/gcc/tree-ssa-sccvn.cc
> >> @@ -74,6 +74,9 @@ along with GCC; see the file COPYING3.  If not see
> >>  #include "ipa-modref-tree.h"
> >>  #include "ipa-modref.h"
> >>  #include "tree-ssa-sccvn.h"
> >> +#include "alloc-pool.h"
> >> +#include "symbol-summary.h"
> >> +#include "ipa-prop.h"
> >>  
> >>  /* This algorithm is based on the SCC algorithm presented by Keith
> >> Cooper and L. Taylor Simpson in "SCC-Based Value numbering"
> >> @@ -2327,7 +2330,7 @@ vn_walk_cb_data::push_partial_def (pd_data pd,
> >> with the current VUSE and performs the expression lookup.  */
> >>  
> >>  static void *
> >> -vn_reference_lookup_2 (ao_ref *op ATTRIBUTE_UNUSED, tree vuse, void 
> >> *data_)
> >> +vn_reference_lookup_2 (ao_ref *op, tree vuse, void *data_)
> >>  {
> >>vn_walk_cb_data *data = (vn_walk_cb_data *)data_;
> >>vn_reference_t vr = data->vr;
> >> @@ -2361,6 +2364,37 @@ vn_reference_lookup_2 (ao_ref *op ATTRIBUTE_UNUSED, 
> >> tree vuse, void *data_)
> >>return *slot;
> >>  }
> >>  
> >> +  if (SSA_NAME_IS_DEFAULT_DEF (vuse))
> >> +{
> >> +  HOST_WIDE_INT offset, size;
> >> +  tree v = NULL_TREE;
> >> +  if (op->base && TREE_CODE (op->base) == PARM_DECL
> >> +&& op->offset.is_constant (&offset)
> >> +&& op->size.is_constant (&size)
> >> +&& op->max_size_known_p ()
> >> +&& known_eq (op->size, op->max_size))
> >> +  v = ipcp_get_aggregate_const (cfun, op->base, false, offset, size);
> >
> > We've talked about partial definition support, this does not
> > have this implemented AFAICS.  But that means you cannot simply
> > do ->finish () without verifying data->partial_defs.is_empty ().
> >
> 
> You are right, partial definitions are not implemented.  I have added
> the is_empty check to the patch.  I'll continue looking into adding the
> support as a follow-up.
> 
> >> +  else if (op->ref)
> >> +  {
> >
> > does this ever happen to improve things?
> 
> Yes, this branch is necessary for propagation of all known constants
> passed in memory pointed to by a POINTER_TYPE_P parameter.  It handles
> the second testcase added by the patch.
> 
> > There's the remote
> > possibility op->base isn't initialized yet, for this reason
> > above you should use ao_ref_base (op) instead of accessing
> > op->base directly.
> 
> OK
> 
> >
> >> +HOST_WIDE_INT offset, size;
> >> +bool reverse;
> >> +tree base = get_ref_base_and_extent_hwi (op->ref, &offset,
> >> + &size, &reverse);
> >> +if (base
> >> +&& TREE_CODE (base) == MEM_REF
> >> +&& integer_zerop (TREE_OPERAND (base, 1))
> >> +&& TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME
> >
> > And this then should be done within the above branch as well,
> > just keyed off base == MEM_REF.
> 
> I am sorry but I don't understand this comment, can you please try to
> re-phrase it?  The previous branch handles direct accesses to
> PARM_DECLs, MEM_REFs don't need to be there at all.

See below

> Updated (bootstrap and testing passing) patch is below for reference,
> but I obviously expect to incorporate the above comment as well before
> proposing to push it.
> 
> Thanks,
> 
> Martin
> 
> 
> Subject: [PATCH 2/2] ipa-cp: Feed results of IPA-CP into value numbering
> 
> PRs 68930 and 92497 show that when IPA-CP figures out constants in
> aggregate parameters or when passed by reference but the loads happen
> in an inlined function the information is lost.  This happens even
> when the inlined function itself was known to have - or even cloned to
> have - such constants in incoming parameters because the transform
> phase of IPA passes is not run on them.  See discussion in the bugs
> for reasons why.
> 
> Honza suggested that we can plug the results of IPA-CP analysis into
> value numbering, so that FRE can figure out that some loads fetch
> known constants.  This is what this patch attempts to do.
> 
> This version of the patch uses the new way we represent aggregate
> constants discovered by IPA-CP and so avoids a linear scan to find them.
> Similarly, it depends on the previous patch which avoids potentially
> slow linear look ups of indices of PARM_DECLs when there are many of
> them.
> 
> gcc/ChangeLog:
> 
> 2023-06-07  Martin Jambor  
> 
>   PR ipa/68930
>   PR ipa/92497
>   * ipa-prop.h (ipcp_get_aggregate_const): Declare.
>   * ipa-prop.cc (ipcp_get_aggregate_const): New function.
>   (ipcp_transform_function): Do not deallocate transformation info.
>   * tree-ssa-sccvn.cc: Include alloc-pool.h, symbol-summary.h and
>   ipa-prop.h.
>   (vn_reference_lookup_2): When hitting 

[PATCH] combine: Narrow comparison of memory and constant

2023-06-12 Thread Stefan Schulze Frielinghaus via Gcc-patches
Comparisons between memory and constants might be done in a smaller mode,
resulting in smaller constants which might finally end up as immediates
instead of in the literal pool.

For example, on s390x a non-symmetric comparison like
  x <= 0x3fff
results in the constant being spilled to the literal pool, and an 8-byte
memory comparison is emitted.  Ideally, an equivalent comparison
  x0 <= 0x3f
where x0 is the most significant byte of x, would be emitted instead; its
constant is smaller and more likely to materialize as an immediate.

Similarly, comparisons of the form
  x >= 0x4000
can be shortened into x0 >= 0x40.
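
To make the logic concrete, here is a sketch of the underlying equivalence
(the full 64-bit constants are my assumption; the archived mail truncates the
hex).  Because the low bits of the constant are all ones (resp. all zeros),
only the most significant byte of the memory operand matters, and on a
big-endian target like s390x that byte is the first one in memory:

  #include <stdint.h>

  /* x <= 0x3fffffffffffffff  <=>  (x >> 56) <= 0x3f  */
  int le_wide (const uint64_t *p)   { return *p <= 0x3fffffffffffffffULL; }
  int le_narrow (const uint64_t *p) { return *(const uint8_t *) p <= 0x3f; }

  /* x >= 0x4000000000000000  <=>  (x >> 56) >= 0x40  */
  int ge_wide (const uint64_t *p)   { return *p >= 0x4000000000000000ULL; }
  int ge_narrow (const uint64_t *p) { return *(const uint8_t *) p >= 0x40; }

The narrow variants only match the wide ones on big-endian; the point is that
the byte comparison needs just a 1-byte load and an immediate instead of a
literal-pool constant.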

I'm not entirely sure whether combine is the right place to implement
something like this.  In my first try I implemented it in
TARGET_CANONICALIZE_COMPARISON but then thought other targets might
profit from it, too.  simplify_context::simplify_relational_operation_1
seems to be the wrong place since code/mode may change.  Any opinions?

gcc/ChangeLog:

* combine.cc (simplify_compare_const): Narrow comparison of
memory and constant.
(try_combine): Adapt new function signature.
(simplify_comparison): Adapt new function signature.

gcc/testsuite/ChangeLog:

* gcc.target/s390/cmp-mem-const-1.c: New test.
* gcc.target/s390/cmp-mem-const-2.c: New test.
---
 gcc/combine.cc| 82 ++-
 .../gcc.target/s390/cmp-mem-const-1.c | 99 +++
 .../gcc.target/s390/cmp-mem-const-2.c | 23 +
 3 files changed, 200 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/cmp-mem-const-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/cmp-mem-const-2.c

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 5aa0ec5c45a..6ad1600dc1b 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -460,7 +460,7 @@ static rtx simplify_shift_const (rtx, enum rtx_code, 
machine_mode, rtx,
 static int recog_for_combine (rtx *, rtx_insn *, rtx *);
 static rtx gen_lowpart_for_combine (machine_mode, rtx);
 static enum rtx_code simplify_compare_const (enum rtx_code, machine_mode,
-rtx, rtx *);
+rtx *, rtx *);
 static enum rtx_code simplify_comparison (enum rtx_code, rtx *, rtx *);
 static void update_table_tick (rtx);
 static void record_value_for_reg (rtx, rtx_insn *, rtx);
@@ -3185,7 +3185,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
  compare_code = orig_compare_code = GET_CODE (*cc_use_loc);
  if (is_a  (GET_MODE (i2dest), &mode))
compare_code = simplify_compare_const (compare_code, mode,
-  op0, &op1);
+  &op0, &op1);
  target_canonicalize_comparison (&compare_code, &op0, &op1, 1);
}
 
@@ -11800,9 +11800,10 @@ gen_lowpart_for_combine (machine_mode omode, rtx x)
 
 static enum rtx_code
 simplify_compare_const (enum rtx_code code, machine_mode mode,
-   rtx op0, rtx *pop1)
+   rtx *pop0, rtx *pop1)
 {
   scalar_int_mode int_mode;
+  rtx op0 = *pop0;
   HOST_WIDE_INT const_op = INTVAL (*pop1);
 
   /* Get the constant we are comparing against and turn off all bits
@@ -11987,6 +11988,79 @@ simplify_compare_const (enum rtx_code code, 
machine_mode mode,
   break;
 }
 
+  /* Narrow non-symmetric comparison of memory and constant as e.g.
+ x0...x7 <= 0x3fff into x0 <= 0x3f where x0 is the most
+ significant byte.  Likewise, transform x0...x7 >= 0x4000 into
+ x0 >= 0x40.  */
+  if ((code == LEU || code == LTU || code == GEU || code == GTU)
+  && is_a  (GET_MODE (op0), &int_mode)
+  && MEM_P (op0)
+  && !MEM_VOLATILE_P (op0)
+  && (unsigned HOST_WIDE_INT)const_op > 0xff)
+{
+  unsigned HOST_WIDE_INT n = (unsigned HOST_WIDE_INT)const_op;
+  enum rtx_code adjusted_code = code;
+
+  /* If the least significant bit is already zero, then adjust the
+comparison in the hope that we hit cases like
+  op0  <= 0x3dfe
+where the adjusted comparison
+  op0  <  0x3dff
+can be shortened into
+  op0' <  0x3d.  */
+  if (code == LEU && (n & 1) == 0)
+   {
+ ++n;
+ adjusted_code = LTU;
+   }
+  /* or e.g. op0 < 0x4020  */
+  else if (code == LTU && (n & 1) == 0)
+   {
+ --n;
+ adjusted_code = LEU;
+   }
+  /* or op0 >= 0x4001  */
+  else if (code == GEU && (n & 1) == 1)
+   {
+ --n;
+ adjusted_code = GTU;
+   }
+  /* or op0 > 0x3fff.  */
+  else if (code == GTU && (n & 1) == 1)
+   {
+ ++n;
+ adjusted_code = GEU;
+   }
+
+  scalar_int_mode narrow_mode_iter;
+  bool lower_p = code == LEU || code ==

Re: [PATCH] Make sure SCALAR_INT_MODE_P before invoke try_const_anchors

2023-06-12 Thread Richard Biener via Gcc-patches
On Mon, 12 Jun 2023, Jiufu Guo wrote:

> Richard Biener  writes:
> 
> > On Fri, 9 Jun 2023, Jiufu Guo wrote:
> >
> >> 
> >> Hi,
> >> 
> >> Richard Biener  writes:
> >> 
> >> > On Fri, 9 Jun 2023, Jiufu Guo wrote:
> >> >
> >> >> 
> >> >> Hi,
> >> >> 
> >> >> Richard Biener  writes:
> >> >> 
> >> >> > On Fri, 9 Jun 2023, Richard Sandiford wrote:
> >> >> >
> >> >> >> guojiufu  writes:
> >> >> >> > Hi,
> >> >> >> >
> >> >> >> > On 2023-06-09 16:00, Richard Biener wrote:
> >> >> >> >> On Fri, 9 Jun 2023, Jiufu Guo wrote:
> >> >> >> >> 
> >> >> >> >>> Hi,
> >> >> >> >>> 
> >> ...
> >> >> >> >>> 
> >> >> >> >>> This patch is raised when drafting below one.
> >> >> >> >>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603530.html.
> >> >> >> >>> With that patch, "{[%1:DI]=0;} stack_tie" with BLKmode runs into
> >> >> >> >>> try_const_anchors, and hits the assert/ice.
> >> >> >> >>> 
> >> >> >> >>> Boostrap and regtest pass on ppc64{,le} and x86_64.
> >> >> >> >>> Is this ok for trunk?
> >> >> >> >> 
> >> >> >> >> Iff the correct fix at all (how can a CONST_INT have BLKmode?) 
> >> >> >> >> then
> >> >> >> >> I suggest to instead fix try_const_anchors to change
> >> >> >> >> 
> >> >> >> >>   /* CONST_INT is used for CC modes, but we should leave those 
> >> >> >> >> alone.  
> >> >> >> >> */
> >> >> >> >>   if (GET_MODE_CLASS (mode) == MODE_CC)
> >> >> >> >> return NULL_RTX;
> >> >> >> >> 
> >> >> >> >>   gcc_assert (SCALAR_INT_MODE_P (mode));
> >> >> >> >> 
> >> >> >> >> to
> >> >> >> >> 
> >> >> >> >>   /* CONST_INT is used for CC modes, leave any non-scalar-int 
> >> >> >> >> mode 
> >> >> >> >> alone.  */
> >> >> >> >>   if (!SCALAR_INT_MODE_P (mode))
> >> >> >> >> return NULL_RTX;
> >> >> >> >> 
> >> >> >> >
> >> >> >> > This is also able to fix this issue.  there is a "Punt on CC 
> >> >> >> > modes" 
> >> >> >> > patch
> >> >> >> > to return NULL_RTX in try_const_anchors.
> >> >> >> >
> >> >> >> >> but as said I wonder how we arrive at a BLKmode CONST_INT and 
> >> >> >> >> whether
> >> >> >> >> we should have fended this off earlier.  Can you share more 
> >> >> >> >> complete
> >> >> >> >> RTL of that stack_tie?
> >> >> >> >
> >> >> >> >
> >> >> >> > (insn 15 14 16 3 (parallel [
> >> >> >> >  (set (mem/c:BLK (reg/f:DI 1 1) [1  A8])
> >> >> >> >  (const_int 0 [0]))
> >> >> >> >  ]) "/home/guojiufu/temp/gdb.c":13:3 922 {stack_tie}
> >> >> >> >   (nil))
> >> >> >> >
> >> >> >> > It is "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])".
> >> >> >> 
> >> >> >> I'm not convinced this is correct RTL.  (unspec:BLK [(const_int 0)] 
> >> >> >> ...)
> >> >> >> would be though.  It's arguably more accurate too, since the effect
> >> >> >> on the stack locations is unspecified rather than predictable.
> >> >> >
> >> >> > powerpc seems to be the only port with a stack_tie that's not
> >> >> > using an UNSPEC RHS.
> >> >> In rs6000.md, it is
> >> >> 
> >> >> ; This is to explain that changes to the stack pointer should
> >> >> ; not be moved over loads from or stores to stack memory.
> >> >> (define_insn "stack_tie"
> >> >>   [(match_parallel 0 "tie_operand"
> >> >>[(set (mem:BLK (reg 1)) (const_int 0))])]
> >> >>   ""
> >> >>   ""
> >> >>   [(set_attr "length" "0")])
> >> >> 
> >> >> This would be just a placeholder insn, and acts as the comment says.
> >> >> UNSPEC_ would work like on other targets.  Still, I'm wondering about
> >> >> the concerns on "set (mem:BLK (reg 1)) (const_int 0)".
> >> >> MODEs between SET_DEST and SET_SRC?
> >> >
> >> > I don't think the issue is the mode but the issue is that
> >> > the pattern as-is says some memory is zeroed while that's not
> >> > actually true (not specifying a size means we can't really do
> >> > anything with this MEM, but still).  Using an UNSPEC avoids
> >> > implying anything for the stored value.
> >> >
> >> > Of course I think a MEM SET_DEST without a specified size is bogus
> >> > as well, but there's larger precedent for this...
> >> 
> >> Thanks for your kind comments!
> >> Using "(set (mem:BLK (reg 1)) (const_int 0))" here may be because this
> >> insn does not generate a real thing (not a real store and no asm code),
> >> much like a barrier.
> >> 
> >> That said, I agree that using UNSPEC may be clearer and avoid mis-reading.
> >
> > Btw, another way to avoid the issue in CSE is to make it not process
> > (aka record anything for optimization) SETs from MEMs with
> > !MEM_SIZE_KNOWN_P
> 
> Thanks! Yes, this would make sense.
> Then, there are two ideas (patches) to handle this issue:
> which one would be preferable?  This one (from a compile-time perspective)?
> 
> And maybe the change to rs6000's stack_tie to use unspec
> can be a standalone enhancement besides the cse patch.
> 
> Thanks for comments!
> 
> BR,
> Jeff (Jiufu Guo)
> 
>  patch 1
> diff --git a/gcc/cse.cc b/gcc/cse.cc
> index 2bb63ac4105..06ecdadecbc 100644
> --- a/gcc/cse.cc
> +++ b/gcc/cse.cc
> @@ -4271,6 +4271,8 @@ fi

[PATCH v5] RISC-V: Add vector psabi checking.

2023-06-12 Thread yanzhang.wang--- via Gcc-patches
From: Yanzhang Wang 

This patch adds support for checking whether a function's argument or return
value is a vector type, and emits a warning if so.

There are two exceptions:
  - The vector_size attribute.
  - The intrinsic functions.
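
A hedged sketch of what does and does not warn (this is not one of the
vector-abi-*.c tests added below, just an illustration; assumes an RVV-enabled
-march such as rv64gcv):

  #include <riscv_vector.h>

  /* Scalable vector type passed and returned by value: -Wpsabi warning,
     because the vector calling convention is still experimental.  */
  vint32m1_t scale (vint32m1_t a) { return a; }

  /* Fixed-length vector via the vector_size attribute: no warning.  */
  typedef int v4si __attribute__ ((vector_size (16)));
  v4si scale_fixed (v4si a) { return a; }

  /* Calls to intrinsics such as __riscv_vadd_vv_i32m1 are also exempt.  */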

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_init_cumulative_args): Set
  warning flag if func is not builtin
* config/riscv/riscv.cc
(riscv_scalable_vector_type_p): Determine whether the type is scalable 
vector.
(riscv_arg_has_vector): Determine whether the arg is vector type.
(riscv_pass_in_vector_p): Check the vector type param is passed by 
value.
(riscv_init_cumulative_args): The same as header.
(riscv_get_arg_info): Add the checking.
(riscv_function_value): Check the func return and set warning flag
* config/riscv/riscv.h (INIT_CUMULATIVE_ARGS): Add a flag to
  determine whether warning psabi or not.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add -Wno-psabi
* gcc.target/riscv/vector-abi-1.c: New test.
* gcc.target/riscv/vector-abi-2.c: New test.
* gcc.target/riscv/vector-abi-3.c: New test.
* gcc.target/riscv/vector-abi-4.c: New test.
* gcc.target/riscv/vector-abi-5.c: New test.
* gcc.target/riscv/vector-abi-6.c: New test.

Signed-off-by: Yanzhang Wang 
Co-authored-by: Kito Cheng 
---
 gcc/config/riscv/riscv-protos.h   |   2 +
 gcc/config/riscv/riscv.cc | 112 +-
 gcc/config/riscv/riscv.h  |   5 +-
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +-
 gcc/testsuite/gcc.target/riscv/vector-abi-1.c |  14 +++
 gcc/testsuite/gcc.target/riscv/vector-abi-2.c |  15 +++
 gcc/testsuite/gcc.target/riscv/vector-abi-3.c |  14 +++
 gcc/testsuite/gcc.target/riscv/vector-abi-4.c |  16 +++
 gcc/testsuite/gcc.target/riscv/vector-abi-5.c |  15 +++
 gcc/testsuite/gcc.target/riscv/vector-abi-6.c |  20 
 10 files changed, 212 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/vector-abi-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/vector-abi-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/vector-abi-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/vector-abi-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/vector-abi-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/vector-abi-6.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 66c1f535d60..90fde5f8be3 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -302,4 +302,6 @@ th_mempair_output_move (rtx[4], bool, machine_mode, 
RTX_CODE);
 #endif
 
 extern bool riscv_use_divmod_expander (void);
+void riscv_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree, int);
+
 #endif /* ! GCC_RISCV_PROTOS_H */
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index de30bf4e567..dd5361c2bd2 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3795,6 +3795,99 @@ riscv_pass_fpr_pair (machine_mode mode, unsigned regno1,
   GEN_INT (offset2;
 }
 
+/* Use the TYPE_SIZE to distinguish the type with vector_size attribute and
+   intrinsic vector type.  Because we can't get the decl for the params.  */
+
+static bool
+riscv_scalable_vector_type_p (const_tree type)
+{
+  tree size = TYPE_SIZE (type);
+  if (size && TREE_CODE (size) == INTEGER_CST)
+return false;
+
+  /* For the data type like vint32m1_t, the size code is POLY_INT_CST.  */
+  return true;
+}
+
+static bool
+riscv_arg_has_vector (const_tree type)
+{
+  bool is_vector = false;
+
+  switch (TREE_CODE (type))
+{
+case RECORD_TYPE:
+  if (!COMPLETE_TYPE_P (type))
+   break;
+
+  for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f))
+   if (TREE_CODE (f) == FIELD_DECL)
+ {
+   tree field_type = TREE_TYPE (f);
+   if (!TYPE_P (field_type))
+ break;
+
+   /* Ignore it if it's fixed length vector.  */
+   if (VECTOR_TYPE_P (field_type))
+ is_vector = riscv_scalable_vector_type_p (field_type);
+   else
+ is_vector = riscv_arg_has_vector (field_type);
+ }
+
+  break;
+
+case VECTOR_TYPE:
+  is_vector = riscv_scalable_vector_type_p (type);
+  break;
+
+default:
+  is_vector = false;
+  break;
+}
+
+  return is_vector;
+}
+
+/* Pass the type to check whether it's a vector type or contains vector type.
+   Only check the value type and no checking for vector pointer type.  */
+
+static void
+riscv_pass_in_vector_p (const_tree type)
+{
+  static int warned = 0;
+
+  if (type && riscv_arg_has_vector (type) && !warned)
+{
+  warning (OPT_Wpsabi, "ABI for the scalable vector type is currently in "
+  "experimental stage and may change in the upcoming version of "
+  "GCC.");
+  warned = 1;
+}
+}
+

[PATCH v2] [PR96339] Optimise svlast[ab]

2023-06-12 Thread Tejas Belagod via Gcc-patches
From: Tejas Belagod 

  This PR optimizes an SVE intrinsics sequence where a scalar is selected
  based on a constant predicate and a variable vector, as in
svlasta (svptrue_pat_b8 (SV_VL1), x)
  Such a sequence is optimized to return the corresponding element of a NEON
  vector.  For example,
svlasta (svptrue_pat_b8 (SV_VL1), x)
  returns
umov    w0, v0.b[1]
  Likewise,
svlastb (svptrue_pat_b8 (SV_VL1), x)
  returns
umov    w0, v0.b[0]
  This optimization only works provided the constant predicate maps to a range
  that is within the bounds of a 128-bit NEON register.
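
A minimal sketch of the kind of source this folds (the function is
illustrative, not taken from the patch's testcases; assumes arm_sve.h and an
SVE-enabled -march):

  #include <arm_sve.h>

  int8_t after_first_lane (svint8_t x)
  {
    /* VL1 predicate: only lane 0 is active, so svlasta picks the element
       after the last active one, i.e. lane 1 of the low 128 bits, and can
       be folded to a single Advanced SIMD lane extract (umov/smov).  */
    return svlasta (svptrue_pat_b8 (SV_VL1), x);
  }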

gcc/ChangeLog:

PR target/96339
* config/aarch64/aarch64-sve-builtins-base.cc (svlast_impl::fold): Fold 
sve
calls that have a constant input predicate vector.
(svlast_impl::is_lasta): Query to check if intrinsic is svlasta.
(svlast_impl::is_lastb): Query to check if intrinsic is svlastb.
(svlast_impl::vect_all_same): Check if all vector elements are equal.

gcc/testsuite/ChangeLog:

PR target/96339
* gcc.target/aarch64/sve/acle/general-c/svlast.c: New.
* gcc.target/aarch64/sve/acle/general-c/svlast128_run.c: New.
* gcc.target/aarch64/sve/acle/general-c/svlast256_run.c: New.
* gcc.target/aarch64/sve/pcs/return_4.c (caller_bf16): Fix asm
to expect optimized code for function body.
* gcc.target/aarch64/sve/pcs/return_4_128.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_4_256.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_4_512.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_4_1024.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_4_2048.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_5.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_5_128.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_5_256.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_5_512.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_5_1024.c (caller_bf16): Likewise.
* gcc.target/aarch64/sve/pcs/return_5_2048.c (caller_bf16): Likewise.
---
 .../aarch64/aarch64-sve-builtins-base.cc  | 133 
 .../aarch64/sve/acle/general-c/svlast.c   |  63 
 .../sve/acle/general-c/svlast128_run.c| 313 +
 .../sve/acle/general-c/svlast256_run.c| 314 ++
 .../gcc.target/aarch64/sve/pcs/return_4.c |   2 -
 .../aarch64/sve/pcs/return_4_1024.c   |   2 -
 .../gcc.target/aarch64/sve/pcs/return_4_128.c |   2 -
 .../aarch64/sve/pcs/return_4_2048.c   |   2 -
 .../gcc.target/aarch64/sve/pcs/return_4_256.c |   2 -
 .../gcc.target/aarch64/sve/pcs/return_4_512.c |   2 -
 .../gcc.target/aarch64/sve/pcs/return_5.c |   2 -
 .../aarch64/sve/pcs/return_5_1024.c   |   2 -
 .../gcc.target/aarch64/sve/pcs/return_5_128.c |   2 -
 .../aarch64/sve/pcs/return_5_2048.c   |   2 -
 .../gcc.target/aarch64/sve/pcs/return_5_256.c |   2 -
 .../gcc.target/aarch64/sve/pcs/return_5_512.c |   2 -
 16 files changed, 823 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast128_run.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast256_run.c

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index cd9cace3c9b..9b766ffa817 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -1056,6 +1056,139 @@ class svlast_impl : public quiet
 public:
   CONSTEXPR svlast_impl (int unspec) : m_unspec (unspec) {}
 
+  bool is_lasta () const { return m_unspec == UNSPEC_LASTA; }
+  bool is_lastb () const { return m_unspec == UNSPEC_LASTB; }
+
+  bool vect_all_same (tree v, int step) const
+  {
+int i;
+int nelts = vector_cst_encoded_nelts (v);
+tree first_el = VECTOR_CST_ENCODED_ELT (v, 0);
+
+for (i = 0; i < nelts; i += step)
+  if (!operand_equal_p (VECTOR_CST_ENCODED_ELT (v, i), first_el, 0))
+   return false;
+
+return true;
+  }
+
+  /* Fold a svlast{a/b} call with constant predicate to a BIT_FIELD_REF.
+ BIT_FIELD_REF lowers to Advanced SIMD element extract, so we have to
+ ensure the index of the element being accessed is in the range of a
+ Advanced SIMD vector width.  */
+  gimple *fold (gimple_folder & f) const override
+  {
+tree pred = gimple_call_arg (f.call, 0);
+tree val = gimple_call_arg (f.call, 1);
+
+if (TREE_CODE (pred) == VECTOR_CST)
+  {
+   HOST_WIDE_INT pos;
+   int i = 0;
+   int step = f.type_suffix (0).element_bytes;
+   int step_1 = gcd (step, VECTOR_CST_NPATTERNS (pred));
+   int npats = VECTOR_CST_NPATTERNS (pred);
+   unsigned HOST_WIDE_INT e

Re: [PATCH] [PR96339] AArch64: Optimise svlast[ab]

2023-06-12 Thread Tejas Belagod via Gcc-patches


From: Richard Sandiford 
Date: Friday, May 19, 2023 at 3:20 PM
To: Tejas Belagod 
Cc: gcc-patches@gcc.gnu.org 
Subject: Re: [PATCH] [PR96339] AArch64: Optimise svlast[ab]
Tejas Belagod  writes:
> Am I correct to understand that we still need to check for the case when
> there are repeating non-zero elements in the case of NELTS_PER_PATTERN == 2?
> e.g. { 0, 0, 1, 1, 1, 1,} which should be encoded as {0, 0, 1, 1} with
> NPATTERNS = 2 ?

Yeah, that's right.  The current handling for NPATTERNS==2 looked
good to me.  It was the other two cases that I was worried about.

Thanks,
Richard

Thanks for all the reviews. I’ve posted a new version of the patch here - 
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621310.html

Thanks,
Tejas.



Re: Splitting up 27_io/basic_istream/ignore/wchar_t/94749.cc (takes too long)

2023-06-12 Thread Bernhard Reutner-Fischer via Gcc-patches
On Sat, 10 Jun 2023 11:29:36 -0700
Mike Stump  wrote:

> On Jun 9, 2023, at 2:47 PM, Bernhard Reutner-Fischer
>  wrote:

> > But well. Either way, what
> > should we do about remote env, if there is one? If the target
> > supports it, send it and skip otherwise?  

> So, to focus a
> conversation, which target, which host, canadian? Which part of the
> environment? What part is missing you want to fix? Want to unify
> between targets... and so on.
> 

The most recent target where this came up again was GCN, I think.
See the last block in
https://inbox.sourceware.org/gcc-patches/20230508195217.4897009f@nbbrfq/
and Thomas' reference therein to Tobias'
https://inbox.sourceware.org/gcc-patches/018bcdeb-b3bb-1859-cd0b-a8a92e26d...@codesourcery.com/

thoughts?

thanks,


[PING] [PATCH v2] rs6000: fmr gets used instead of faster xxlor [PR93571]

2023-06-12 Thread Ajit Agarwal via Gcc-patches
Hello Segher:

Please review and let me know your feedback so this can be submitted to trunk.

Thanks & Regards
Ajit

On 25/02/23 3:20 pm, Ajit Agarwal via Gcc-patches wrote:
> Hello All:
> 
> Here is the patch that uses xxlor instead of fmr where possible.
> Performance results show that fmr is better in power9 and
> power10 architectures whereas xxlor is better in power7 and
> power8 architectures. fmr is the only option before p7.
> 
> Bootstrapped and regtested on powerpc64-linux-gnu
> 
> Thanks & Regards
> Ajit
> 
>   rs6000: Use xxlor instead of fmr where possible
> 
>   Replaces fmr with xxlor instruction for power7 and power8
>   architectures whereas for power9 and power10 keep fmr
>   instruction.
> 
>   Perf measurement results:
> 
>   Power9 fmr:  201,847,661 cycles.
>   Power9 xxlor: 201,877,78 cycles.
>   Power8 fmr: 200,901,043 cycles.
>   Power8 xxlor: 201,020,518 cycles.
>   Power7 fmr: 201,059,524 cycles.
>   Power7 xxlor: 201,042,851 cycles.
> 
>   2023-02-25  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.md (*movdf_hardfloat64): Use xxlor for power7
>   and power8 and fmr for power9 and power10.
> ---
>  gcc/config/rs6000/rs6000.md | 44 +++--
>  1 file changed, 28 insertions(+), 16 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 81bffb04ceb..e101f7f5fc1 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -354,7 +354,7 @@ (define_attr "cpu"
>(const (symbol_ref "(enum attr_cpu) rs6000_tune")))
>  
>  ;; The ISA we implement.
> -(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10"
> +(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p7p8v,p9,p9v,p9kf,p9tf,p10"
>(const_string "any"))
>  
>  ;; Is this alternative enabled for the current CPU/ISA/etc.?
> @@ -402,6 +402,11 @@ (define_attr "enabled" ""
>   (and (eq_attr "isa" "p10")
> (match_test "TARGET_POWER10"))
>   (const_int 1)
> +  
> + (and (eq_attr "isa" "p7p8v")
> +   (match_test "TARGET_VSX && !TARGET_P9_VECTOR"))
> + (const_int 1)
> +
>  ] (const_int 0)))
>  
>  ;; If this instruction is microcoded on the CELL processor
> @@ -8436,27 +8441,29 @@ (define_insn "*mov_softfloat32"
>  
>  (define_insn "*mov_hardfloat64"
>[(set (match_operand:FMOVE64 0 "nonimmediate_operand"
> -   "=m,   d,  d,  ,   wY,
> - ,Z,  ,  ,  !r,
> +   "=m,   d,  ,  ,   wY,
> + ,Z,  wa, ,  !r,
>   YZ,  r,  !r, *c*l,   !r,
> -*h,   r,  ,   wa")
> +*h,   r,  ,   d,  wn,
> +wa")
>   (match_operand:FMOVE64 1 "input_operand"
> -"d,   m,  d,  wY, ,
> - Z,   ,   ,  ,  ,
> +"d,   m,  ,  wY, ,
> + Z,   ,   wa, ,  ,
>   r,   YZ, r,  r,  *h,
> - 0,   ,   r,  eP"))]
> + 0,   ,   r,  d,  wn,
> + eP"))]
>"TARGET_POWERPC64 && TARGET_HARD_FLOAT
> && (gpc_reg_operand (operands[0], mode)
> || gpc_reg_operand (operands[1], mode))"
>"@
> stfd%U0%X0 %1,%0
> lfd%U1%X1 %0,%1
> -   fmr %0,%1
> +   xxlor %x0,%x1,%x1
> lxsd %0,%1
> stxsd %1,%0
> lxsdx %x0,%y1
> stxsdx %x1,%y0
> -   xxlor %x0,%x1,%x1
> +   fmr %0,%1
> xxlxor %x0,%x0,%x0
> li %0,0
> std%U0%X0 %1,%0
> @@ -8467,23 +8474,28 @@ (define_insn "*mov_hardfloat64"
> nop
> mfvsrd %0,%x1
> mtvsrd %x0,%1
> +   fmr %0,%1
> +   fmr %0,%1
> #"
>[(set_attr "type"
> -"fpstore, fpload, fpsimple,   fpload, fpstore,
> +"fpstore, fpload, veclogical, fpload, fpstore,
>   fpload,  fpstore,veclogical, veclogical, integer,
>   store,   load,   *,  mtjmpr, mfjmpr,
> - *,   mfvsr,  mtvsr,  vecperm")
> + *,   mfvsr,  mtvsr,  fpsimple,   fpsimple,
> + vecperm")
> (set_attr "size" "64")
> (set_attr "isa"
> -"*,   *,  *,  p9v,p9v,
> - p7v, p7v,*,  *,  *,
> - *,   *,  *,  *,  *,
> - *,   p8v,p8v,p10")
> +"*,   *,  p7p8v,p9v,p9v,
> + p7v, p7v,*,   *,  *,
> + *,   *,  *,   *,  *,
> + *,   p8v,p8v, *,  *,
> + p10")
> (set_attr "prefixed"
>  "*,  

Re: [PATCH v2] [PR96339] Optimise svlast[ab]

2023-06-12 Thread Richard Sandiford via Gcc-patches
Tejas Belagod  writes:
> From: Tejas Belagod 
>
>   This PR optimizes an SVE intrinsics sequence where
> svlasta (svptrue_pat_b8 (SV_VL1), x)
>   a scalar is selected based on a constant predicate and a variable vector.
>   This sequence is optimized to return the correspoding element of a NEON
>   vector. For eg.
> svlasta (svptrue_pat_b8 (SV_VL1), x)
>   returns
> umovw0, v0.b[1]
>   Likewise,
> svlastb (svptrue_pat_b8 (SV_VL1), x)
>   returns
>  umovw0, v0.b[0]
>   This optimization only works provided the constant predicate maps to a range
>   that is within the bounds of a 128-bit NEON register.
>
> gcc/ChangeLog:
>
>   PR target/96339
>   * config/aarch64/aarch64-sve-builtins-base.cc (svlast_impl::fold): Fold 
> sve
>   calls that have a constant input predicate vector.
>   (svlast_impl::is_lasta): Query to check if intrinsic is svlasta.
>   (svlast_impl::is_lastb): Query to check if intrinsic is svlastb.
>   (svlast_impl::vect_all_same): Check if all vector elements are equal.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/96339
>   * gcc.target/aarch64/sve/acle/general-c/svlast.c: New.
>   * gcc.target/aarch64/sve/acle/general-c/svlast128_run.c: New.
>   * gcc.target/aarch64/sve/acle/general-c/svlast256_run.c: New.
>   * gcc.target/aarch64/sve/pcs/return_4.c (caller_bf16): Fix asm
>   to expect optimized code for function body.
>   * gcc.target/aarch64/sve/pcs/return_4_128.c (caller_bf16): Likewise.
>   * gcc.target/aarch64/sve/pcs/return_4_256.c (caller_bf16): Likewise.
>   * gcc.target/aarch64/sve/pcs/return_4_512.c (caller_bf16): Likewise.
>   * gcc.target/aarch64/sve/pcs/return_4_1024.c (caller_bf16): Likewise.
>   * gcc.target/aarch64/sve/pcs/return_4_2048.c (caller_bf16): Likewise.
>   * gcc.target/aarch64/sve/pcs/return_5.c (caller_bf16): Likewise.
>   * gcc.target/aarch64/sve/pcs/return_5_128.c (caller_bf16): Likewise.
>   * gcc.target/aarch64/sve/pcs/return_5_256.c (caller_bf16): Likewise.
>   * gcc.target/aarch64/sve/pcs/return_5_512.c (caller_bf16): Likewise.
>   * gcc.target/aarch64/sve/pcs/return_5_1024.c (caller_bf16): Likewise.
>   * gcc.target/aarch64/sve/pcs/return_5_2048.c (caller_bf16): Likewise.

OK, thanks.

Richard


[pushed] c++: build initializer_list in a loop [PR105838]

2023-06-12 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

I previously applied this change in r13-4565 but reverted it due to
PR108071.  That PR was then fixed by r13-4712, but I didn't re-apply this
change then because we weren't making the array static; since r14-1500 for
PR110070 we now make the initializer array static, so let's bring this back.

In situations where the maybe_init_list_as_range optimization isn't viable,
we can build an initializer_list with a loop over a constant array
of string literals.

This is represented using a VEC_INIT_EXPR, which required adjusting a couple
of places that expected the initializer array to have the same type as the
target array and fixing build_vec_init not to undo our efforts.

PR c++/105838

gcc/cp/ChangeLog:

* call.cc (convert_like_internal) [ck_list]: Use
maybe_init_list_as_array.
* constexpr.cc (cxx_eval_vec_init_1): Init might have
a different type.
* tree.cc (build_vec_init_elt): Likewise.
* init.cc (build_vec_init): Handle from_array from a
TARGET_EXPR.  Retain TARGET_EXPR of a different type.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/initlist-opt5.C: New test.
---
 gcc/cp/call.cc| 11 -
 gcc/cp/constexpr.cc   |  6 ++---
 gcc/cp/init.cc| 13 --
 gcc/cp/tree.cc|  2 --
 gcc/testsuite/g++.dg/tree-ssa/initlist-opt5.C | 24 +++
 5 files changed, 48 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/initlist-opt5.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 354773f00c6..68cf878308e 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -8541,7 +8541,16 @@ convert_like_internal (conversion *convs, tree expr, 
tree fn, int argnum,
unsigned len = CONSTRUCTOR_NELTS (expr);
tree array;
 
-   if (len)
+   if (tree init = maybe_init_list_as_array (elttype, expr))
+ {
+   elttype = cp_build_qualified_type
+ (elttype, cp_type_quals (elttype) | TYPE_QUAL_CONST);
+   array = build_array_of_n_type (elttype, len);
+   array = build_vec_init_expr (array, init, complain);
+   array = get_target_expr (array);
+   array = cp_build_addr_expr (array, complain);
+ }
+   else if (len)
  {
tree val; unsigned ix;
 
diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 8f7f0b7d325..bbecf86d58b 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -5240,12 +5240,12 @@ cxx_eval_vec_init_1 (const constexpr_ctx *ctx, tree 
atype, tree init,
   else
{
  /* Copying an element.  */
- gcc_assert (same_type_ignoring_top_level_qualifiers_p
- (atype, TREE_TYPE (init)));
  eltinit = cp_build_array_ref (input_location, init, idx, complain);
  if (!lvalue_p (init))
eltinit = move (eltinit);
- eltinit = force_rvalue (eltinit, complain);
+ eltinit = (perform_implicit_conversion_flags
+(elttype, eltinit, complain,
+ LOOKUP_IMPLICIT|LOOKUP_NO_NARROWING));
  eltinit = cxx_eval_constant_expression (&new_ctx, eltinit, lval,
  non_constant_p, overflow_p);
}
diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index 6ccda365b04..af6e30f511e 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -4475,7 +4475,9 @@ build_vec_init (tree base, tree maxindex, tree init,
   /* Look through the TARGET_EXPR around a compound literal.  */
   if (init && TREE_CODE (init) == TARGET_EXPR
   && TREE_CODE (TARGET_EXPR_INITIAL (init)) == CONSTRUCTOR
-  && from_array != 2)
+  && from_array != 2
+  && (same_type_ignoring_top_level_qualifiers_p
+ (TREE_TYPE (init), atype)))
 init = TARGET_EXPR_INITIAL (init);
 
   if (tree vi = get_vec_init_expr (init))
@@ -4601,7 +4603,14 @@ build_vec_init (tree base, tree maxindex, tree init,
 {
   if (lvalue_kind (init) & clk_rvalueref)
xvalue = true;
-  base2 = decay_conversion (init, complain);
+  if (TREE_CODE (init) == TARGET_EXPR)
+   {
+ /* Avoid error in decay_conversion.  */
+ base2 = decay_conversion (TARGET_EXPR_SLOT (init), complain);
+ base2 = cp_build_compound_expr (init, base2, tf_none);
+   }
+  else
+   base2 = decay_conversion (init, complain);
   if (base2 == error_mark_node)
return error_mark_node;
   itype = TREE_TYPE (base2);
diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index 4d5e3f53c5e..751c9adeb62 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -785,8 +785,6 @@ build_vec_init_elt (tree type, tree init, tsubst_flags_t 
complain)
   releasing_vec argvec;
   if (init && !BRACE_ENCLOSED_INITIALIZER_P (init))
 {
-  gcc_assert (same_type_ignoring_top_level_qualifiers_p
-

[PATCH] middle-end/110200 - genmatch force-leaf and convert interaction

2023-06-12 Thread Richard Biener via Gcc-patches
The following fixes GENERIC code generation for (convert! ...), which
currently generates

  if (TREE_TYPE (_o1[0]) != type)
_r1 = fold_build1_loc (loc, NOP_EXPR, type, _o1[0]);
if (EXPR_P (_r1))
  goto next_after_fail867;
  else
_r1 = _o1[0];

where obviously braces are missing.
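
With the braces added by the patch below, the same generated fragment should
read roughly as follows (reconstruction, indentation approximate):

  if (TREE_TYPE (_o1[0]) != type)
    {
      _r1 = fold_build1_loc (loc, NOP_EXPR, type, _o1[0]);
      if (EXPR_P (_r1))
        goto next_after_fail867;
    }
  else
    _r1 = _o1[0];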

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to trunk,
will push down to branches as well.

PR middle-end/110200
* genmatch.cc (expr::gen_transform): Put braces around
the if arm for the (convert ...) short-cut.
---
 gcc/genmatch.cc | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index bd6ce3a28f8..5fceeec9780 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -2625,7 +2625,8 @@ expr::gen_transform (FILE *f, int indent, const char 
*dest, bool gimple,
{
  fprintf_indent (f, indent, "if (TREE_TYPE (_o%d[0]) != %s)\n",
  depth, type);
- indent += 2;
+ fprintf_indent (f, indent + 2, "{\n");
+ indent += 4;
}
   if (opr->kind == id_base::CODE)
fprintf_indent (f, indent, "_r%d = fold_build%d_loc (loc, %s, %s",
@@ -2648,7 +2649,8 @@ expr::gen_transform (FILE *f, int indent, const char 
*dest, bool gimple,
}
   if (*opr == CONVERT_EXPR)
{
- indent -= 2;
+ fprintf_indent (f, indent - 2, "}\n");
+ indent -= 4;
  fprintf_indent (f, indent, "else\n");
  fprintf_indent (f, indent, "  _r%d = _o%d[0];\n", depth, depth);
}
-- 
2.35.3


Re: [r14-1624 Regression] FAIL: std/time/year_month_day_last/1.cc (test for excess errors) on Linux/x86_64

2023-06-12 Thread Jason Merrill via Gcc-patches
This should be fixed by r14-1660-g953bbeaeff050f

On Sun, Jun 11, 2023 at 11:33 PM haochen.jiang 
wrote:

> On Linux/x86_64,
>
> 28db36e2cfca1b7106adc8d371600fa3a325c4e2 is the first bad commit
> commit 28db36e2cfca1b7106adc8d371600fa3a325c4e2
> Author: Jason Merrill 
> Date:   Wed Jun 7 05:15:02 2023 -0400
>
> c++: allow NRV and non-NRV returns [PR58487]
>
> caused
>
> FAIL: 25_algorithms/minmax/constrained.cc (test for excess errors)
> FAIL: g++.dg/cpp2a/spaceship-synth10.C  -std=gnu++20 (internal compiler
> error: Segmentation fault)
> FAIL: g++.dg/cpp2a/spaceship-synth10.C  -std=gnu++20 (test for excess
> errors)
> FAIL: g++.dg/cpp2a/spaceship-synth12.C  -std=c++20 (internal compiler
> error: Segmentation fault)
> FAIL: g++.dg/cpp2a/spaceship-synth12.C  -std=c++20 (test for excess errors)
> FAIL: g++.dg/cpp2a/spaceship-synth13.C  -std=c++20 (internal compiler
> error: Segmentation fault)
> FAIL: g++.dg/cpp2a/spaceship-synth13.C  -std=c++20 (test for excess errors)
> FAIL: g++.dg/cpp2a/spaceship-synth14.C  -std=c++20 (internal compiler
> error: Segmentation fault)
> FAIL: g++.dg/cpp2a/spaceship-synth14.C  -std=c++20 (test for excess errors)
> FAIL: g++.dg/cpp2a/spaceship-synth1a.C  -std=c++20 (internal compiler
> error: Segmentation fault)
> FAIL: g++.dg/cpp2a/spaceship-synth1a.C  -std=c++20 (test for excess errors)
> FAIL: g++.dg/cpp2a/spaceship-synth1.C  -std=c++20 (internal compiler
> error: Segmentation fault)
> FAIL: g++.dg/cpp2a/spaceship-synth1.C  -std=c++20 (test for excess errors)
> FAIL: g++.dg/cpp2a/spaceship-synth2a.C  -std=c++20 (internal compiler
> error: Segmentation fault)
> FAIL: g++.dg/cpp2a/spaceship-synth2a.C  -std=c++20 (test for excess errors)
> FAIL: g++.dg/cpp2a/spaceship-synth2b.C  -std=c++20 (internal compiler
> error: Segmentation fault)
> FAIL: g++.dg/cpp2a/spaceship-synth2b.C  -std=c++20 (test for excess errors)
> FAIL: g++.dg/cpp2a/spaceship-synth2.C  -std=c++20 (internal compiler
> error: Segmentation fault)
> FAIL: g++.dg/cpp2a/spaceship-synth2.C  -std=c++20 (test for excess errors)
> FAIL: g++.dg/cpp2a/spaceship-synth4.C  -std=c++20 (internal compiler
> error: Segmentation fault)
> FAIL: g++.dg/cpp2a/spaceship-synth4.C  -std=c++20 (test for excess errors)
> FAIL: g++.dg/cpp2a/spaceship-synth5.C  -std=c++20 (internal compiler
> error: Segmentation fault)
> FAIL: g++.dg/cpp2a/spaceship-synth5.C  -std=c++20 (test for excess errors)
> FAIL: g++.dg/cpp2a/spaceship-weak1.C  -std=c++20 (internal compiler error:
> Segmentation fault)
> FAIL: g++.dg/cpp2a/spaceship-weak1.C  -std=c++20 (test for excess errors)
> FAIL: std/time/month_day/1.cc (test for excess errors)
> FAIL: std/time/month_day_last/1.cc (test for excess errors)
> FAIL: std/time/year_month/1.cc (test for excess errors)
> FAIL: std/time/year_month_day/1.cc (test for excess errors)
> FAIL: std/time/year_month_day/4.cc (test for excess errors)
> FAIL: std/time/year_month_day_last/1.cc (test for excess errors)
>
> with GCC configured with
>
> ../../gcc/configure
> --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-1624/usr
> --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld
> --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet
> --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
>
> To reproduce:
>
> $ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check
> RUNTESTFLAGS="conformance.exp=25_algorithms/minmax/constrained.cc
> --target_board='unix{-m32}'"
> $ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check
> RUNTESTFLAGS="conformance.exp=25_algorithms/minmax/constrained.cc
> --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="dg.exp=g++.dg/cpp2a/spaceship-synth10.C
> --target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="dg.exp=g++.dg/cpp2a/spaceship-synth10.C
> --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="dg.exp=g++.dg/cpp2a/spaceship-synth12.C
> --target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="dg.exp=g++.dg/cpp2a/spaceship-synth12.C
> --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="dg.exp=g++.dg/cpp2a/spaceship-synth13.C
> --target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="dg.exp=g++.dg/cpp2a/spaceship-synth13.C
> --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="dg.exp=g++.dg/cpp2a/spaceship-synth14.C
> --target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="dg.exp=g++.dg/cpp2a/spaceship-synth14.C
> --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="dg.exp=g++.dg/cpp2a/spaceship-synth1a.C
> --target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check
> RUNTESTFLAGS="dg.exp=g++.dg/cpp2a/spaceship-synth1a.C
> --target_board='unix{-m32\ -march=cascadelake}'"
> $ 

Re: [PATCH] [x86] Add missing vec_pack/unpacks patterns for _Float16 <-> int/float conversion.

2023-06-12 Thread Hongtao Liu via Gcc-patches
On Mon, Jun 5, 2023 at 9:26 AM liuhongt  wrote:
>
> This patch only supports vec_pack/unpacks optabs for vector modes whose
> length >= 128.
> For 32/64-bit vectors, they're mostly handled by the BB vectorizer with
> truncmn2/extendmn2/fix{,uns}_truncmn2.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ready to push to trunk.
Committed.
>
> gcc/ChangeLog:
>
> * config/i386/sse.md (vec_pack_float_): New 
> expander.
> (vec_unpack_fix_trunc_lo_): Ditto.
> (vec_unpack_fix_trunc_hi_): Ditto.
> (vec_unpacks_lo_: Ditto.
> (vec_unpacks_hi_: Ditto.
> (sse_movlhps_): New define_insn.
> (ssse3_palignr_perm): Extend to V_128H.
> (V_128H): New mode iterator.
> (ssepackPHmode): New mode attribute.
> (vunpck_extract_mode>: Ditto.
> (vpckfloat_concat_mode): Extend to VxSI/VxSF for _Float16.
> (vpckfloat_temp_mode): Ditto.
> (vpckfloat_op_mode): Ditto.
> (vunpckfixt_mode): Extend to VxHF.
> (vunpckfixt_model): Ditto.
> (vunpckfixt_extract_mode): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/vec_pack_fp16-1.c: New test.
> * gcc.target/i386/vec_pack_fp16-2.c: New test.
> * gcc.target/i386/vec_pack_fp16-3.c: New test.
> ---
>  gcc/config/i386/sse.md| 216 +-
>  .../gcc.target/i386/vec_pack_fp16-1.c |  34 +++
>  .../gcc.target/i386/vec_pack_fp16-2.c |   9 +
>  .../gcc.target/i386/vec_pack_fp16-3.c |   8 +
>  4 files changed, 258 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/vec_pack_fp16-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/vec_pack_fp16-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/vec_pack_fp16-3.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index a92f50e96b5..1eb2dd077ff 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -291,6 +291,9 @@ (define_mode_iterator V
>  (define_mode_iterator V_128
>[V16QI V8HI V4SI V2DI V4SF (V2DF "TARGET_SSE2")])
>
> +(define_mode_iterator V_128H
> +  [V16QI V8HI V8HF V8BF V4SI V2DI V4SF (V2DF "TARGET_SSE2")])
> +
>  ;; All 256bit vector modes
>  (define_mode_iterator V_256
>[V32QI V16HI V8SI V4DI V8SF V4DF])
> @@ -1076,6 +1079,12 @@ (define_mode_attr ssePHmodelower
> (V8DI "v8hf") (V4DI "v4hf") (V2DI "v2hf")
> (V8DF "v8hf") (V16SF "v16hf") (V8SF "v8hf")])
>
> +
> +;; Mapping of vector modes to packed vector hf modes of same sized.
> +(define_mode_attr ssepackPHmode
> +  [(V16SI "V32HF") (V8SI "V16HF") (V4SI "V8HF")
> +   (V16SF "V32HF") (V8SF "V16HF") (V4SF "V8HF")])
> +
>  ;; Mapping of vector modes to packed single mode of the same size
>  (define_mode_attr ssePSmode
>[(V16SI "V16SF") (V8DF "V16SF")
> @@ -6918,6 +6927,61 @@ (define_mode_attr qq2phsuff
> (V16SF "") (V8SF "{y}") (V4SF "{x}")
> (V8DF "{z}") (V4DF "{y}") (V2DF "{x}")])
>
> +(define_mode_attr vunpck_extract_mode
> +  [(V32HF "v32hf") (V16HF "v16hf") (V8HF "v16hf")])
> +
> +(define_expand "vec_unpacks_lo_"
> +  [(match_operand: 0 "register_operand")
> +   (match_operand:VF_AVX512FP16VL 1 "register_operand")]
> +  "TARGET_AVX512FP16"
> +{
> +  rtx tem = operands[1];
> +  rtx (*gen) (rtx, rtx);
> +  if (mode != V8HFmode)
> +{
> +  tem = gen_reg_rtx (mode);
> +  emit_insn (gen_vec_extract_lo_ (tem,
> +  operands[1]));
> +  gen = gen_extend2;
> +}
> +  else
> +gen = gen_avx512fp16_float_extend_phv4sf2;
> +
> +  emit_insn (gen (operands[0], tem));
> +  DONE;
> +})
> +
> +(define_expand "vec_unpacks_hi_"
> +  [(match_operand: 0 "register_operand")
> +   (match_operand:VF_AVX512FP16VL 1 "register_operand")]
> +  "TARGET_AVX512FP16"
> +{
> +  rtx tem = operands[1];
> +  rtx (*gen) (rtx, rtx);
> +  if (mode != V8HFmode)
> +{
> +  tem = gen_reg_rtx (mode);
> +  emit_insn (gen_vec_extract_hi_ (tem,
> +  operands[1]));
> +  gen = gen_extend2;
> +}
> +  else
> +{
> +  tem = gen_reg_rtx (V8HFmode);
> +  rtvec tmp = rtvec_alloc (8);
> +  for (int i = 0; i != 8; i++)
> +   RTVEC_ELT (tmp, i) = GEN_INT((i+4)%8);
> +
> +  rtx selector = gen_rtx_PARALLEL (VOIDmode, tmp);
> +  emit_move_insn (tem,
> +gen_rtx_VEC_SELECT (V8HFmode, operands[1], selector));
> +  gen = gen_avx512fp16_float_extend_phv4sf2;
> +}
> +
> +  emit_insn (gen (operands[0], tem));
> +  DONE;
> +})
> +
>  (define_insn 
> "avx512fp16_vcvtph2_"
>[(set (match_operand:VI248_AVX512VL 0 "register_operand" "=v")
>  (unspec:VI248_AVX512VL
> @@ -8314,11 +8378,17 @@ (define_expand "floatv2div2sf2"
>  })
>
>  (define_mode_attr vpckfloat_concat_mode
> -  [(V8DI "v16sf") (V4DI "v8sf") (V2DI "v8sf")])
> +  [(V8DI "v16sf") (V4DI "v8sf") (V2DI "v8sf")
> +   (V16SI "v32hf") (V8SI "v16hf") (V4SI "v16hf")
> +   (V

Re: [PATCH] LoongArch: Set 4 * (issue rate) as the default for -falign-functions and -falign-loops

2023-06-12 Thread Xi Ruoyao via Gcc-patches
On Tue, 2023-05-30 at 09:30 +0800, Lulu Cheng wrote:
> 
> 在 2023/5/29 下午2:09, Xi Ruoyao 写道:
> > On Tue, 2023-04-18 at 21:06 +0800, Lulu Cheng wrote:
> > > Hi, Ruoyao:
> > > 
> > > Thank you so much for making this submission. But we are testing the
> > > impact of these two alignment parameters (also including -falign-jumps
> > > and -falign-labels) on performance. So before the result comes out,
> > > this patch will not be merged into the main branch for the time being.
> > Hi!
> > 
> > Is there an estimate when the benchmark will be done?  If it will be
> > done soon I'll wait for the result before performing a full system
> > rebuild, otherwise I'll use my gut feeling to specify a -falign-
> > functions= value for the build :).
> > 
> Sorry for taking so long to reply to the email. From our current test
> results, the performance of SPEC is best when combined with
> -falign-loops=16, -falign-jumps=16, -falign-functions=32 and
> -falign-labels=16.

I've completed a system rebuild with -falign-
{jumps,functions,labels}=16.  I've missed -falign-loops=16 but the doc
says -falign-labels=16 implies -falign-jumps=16 and -falign-loops=16 (if
-falign-jumps or -falign-loops are not set explicitly with a larger
value).

I'll make a patch to set -falign-functions=32 and -falign-labels=16 with
-mtune={la464,loongarch64} after setting up a basic development environment
on the new system...  And I'm wondering if things will change with LA664
:).


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] RISC-V: Add ZVFHMIN autovec block testcase

2023-06-12 Thread juzhe . zhong
From: Juzhe-Zhong 

To be safe, add a ZVFHMIN autovec block testcase to make sure
we won't enable autovec in ZVFHMIN by mistake.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/zvfhmin-1.c: New test.

---
 .../gcc.target/riscv/rvv/autovec/zvfhmin-1.c  | 34 +++
 1 file changed, 34 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/zvfhmin-1.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zvfhmin-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zvfhmin-1.c
new file mode 100644
index 000..934d42c5d5c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zvfhmin-1.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv_zvfhmin -mabi=ilp32d --param 
riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+void f0 (_Float16 * __restrict a, _Float16 * __restrict b, int n)
+{
+  for (int i = 0; i < n; i++)
+a[i] = -b[i];
+}
+
+void f1 (_Float16 * __restrict a, _Float16 * __restrict b, int n)
+{
+  for (int i = 0; i < n; i++)
+a[i] = a[i]+b[i];
+}
+
+void f2 (_Float16 * __restrict a, _Float16 * __restrict b, int n)
+{
+  for (int i = 0; i < n; i++)
+a[i] = a[i]-b[i];
+}
+
+void f3 (_Float16 * __restrict a, _Float16 * __restrict b, int n)
+{
+  for (int i = 0; i < n; i++)
+a[i] = a[i]*b[i];
+}
+
+void f4 (_Float16 * __restrict a, _Float16 * __restrict b, int n)
+{
+  for (int i = 0; i < n; i++)
+a[i] = a[i]/b[i];
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" 
} } */
-- 
2.36.3



Re: [PATCH] RISC-V: Add ZVFHMIN autovec block testcase

2023-06-12 Thread Robin Dapp via Gcc-patches
Hi Juzhe,

no complaints here.  Just please make sure you add the commit
message or something related as a top comment to the test when
committing.
Somebody who reads the test is not going to want to look up
the commit message to know what's going on.

Regards
 Robin


[PATCH V2] RISC-V: Add ZVFHMIN block autovec testcase

2023-06-12 Thread juzhe . zhong
From: Juzhe-Zhong 

To be safe, add a ZVFHMIN autovec block testcase to make sure
we won't enable autovec in ZVFHMIN by mistake.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/zvfhmin-1.c: New test.

---
 .../gcc.target/riscv/rvv/autovec/zvfhmin-1.c  | 35 +++
 1 file changed, 35 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/zvfhmin-1.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/zvfhmin-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zvfhmin-1.c
new file mode 100644
index 000..08da48d0270
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/zvfhmin-1.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv_zvfhmin -mabi=ilp32d --param 
riscv-autovec-preference=scalable -fdump-tree-vect-details" } */
+
+void f0 (_Float16 * __restrict a, _Float16 * __restrict b, int n)
+{
+  for (int i = 0; i < n; i++)
+a[i] = -b[i];
+}
+
+void f1 (_Float16 * __restrict a, _Float16 * __restrict b, int n)
+{
+  for (int i = 0; i < n; i++)
+a[i] = a[i]+b[i];
+}
+
+void f2 (_Float16 * __restrict a, _Float16 * __restrict b, int n)
+{
+  for (int i = 0; i < n; i++)
+a[i] = a[i]-b[i];
+}
+
+void f3 (_Float16 * __restrict a, _Float16 * __restrict b, int n)
+{
+  for (int i = 0; i < n; i++)
+a[i] = a[i]*b[i];
+}
+
+void f4 (_Float16 * __restrict a, _Float16 * __restrict b, int n)
+{
+  for (int i = 0; i < n; i++)
+a[i] = a[i]/b[i];
+}
+
+/* We can't enable FP16 NEG/PLUS/MINUS/MULT/DIV auto-vectorization when 
-march="*zvfhmin*".  */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" 
} } */
-- 
2.36.3



Re: Re: [PATCH] RISC-V: Add ZVFHMIN autovec block testcase

2023-06-12 Thread juzhe.zh...@rivai.ai
Ok. Add comments in V2 patch.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-12 17:39
To: juzhe.zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Add ZVFHMIN autovec block testcase
Hi Juzhe,
 
no complaints here.  Just please make sure you add the commit
message or something related as top comment to the test when
committing.
Somebody who reads the test is not going to want to lookup
the commit message to know what's going on.
 
Regards
Robin
 


Re: [PATCH V2] RISC-V: Add ZVFHMIN block autovec testcase

2023-06-12 Thread Robin Dapp via Gcc-patches
> +/* We can't enable FP16 NEG/PLUS/MINUS/MULT/DIV auto-vectorization when 
> -march="*zvfhmin*".  */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 
> "vect" } } */

Thanks.  OK from my side.

Regards
 Robin


Re: [PATCH] RISC-V: Fix V_WHOLE && V_FRACT iterator requirement

2023-06-12 Thread Robin Dapp via Gcc-patches
> +  (VNx16QI "TARGET_MIN_VLEN <= 128")
> +  (VNx32QI "TARGET_MIN_VLEN <= 256")
> +  (VNx64QI "TARGET_MIN_VLEN >= 64 && TARGET_MIN_VLEN <= 512")
> +  (VNx128QI "TARGET_MIN_VLEN >= 128 && TARGET_MIN_VLEN <= 1024")
> 
> This not correct, we always use VNx16QI as LMUL = m1 for min_vlen >= 128.
> Requirement of TARGET_MIN_VLEN <= 128 is incorrect for VNx16QI.
> VNx32QI,...etc likewise.

Please elaborate.  What happens with a VNx16QI on a target with
min_vlen == 256?  Is it a full 256-bit vector with only the first half
populated?  If so, this need documentation either here or somewhere
else (but with a reference here).

Either you can pick my testcase and amend your patch (plus
streamline formatting as well adding a proper comment) or I change
mine.  Your call.

Regards
 Robin


Re: Re: [PATCH] RISC-V: Fix V_WHOLE && V_FRACT iterator requirement

2023-06-12 Thread juzhe.zh...@rivai.ai
I'd like to defer to you: please commit my patch together with your test (Jeff
has approved my patch, so feel free to commit).

Here is the description:
We have 3 configurations for "-march":
1. zve32*  (TARGET_MIN_VLEN == 32), the LMUL = 1 mode will be VNx4QI, VNx2HI, 
VNx1SI
2. zve64*  (TARGET_MIN_VLEN == 64), the LMUL = 1 mode will be VNx8QI, VNx4HI, 
VNx2SI
3. zve64*_zvl128b  (TARGET_MIN_VLEN >= 128), the LMUL = 1 mode will be VNx16QI, 
VNx8HI, VNx4SI

We dynamically adjust BYTES_PER_VECTOR according to TARGET_MIN_VLEN.
For TARGET_MIN_VLEN = 32 (chunk=32), the LMUL = 1 size = (4,4) bytes.
For TARGET_MIN_VLEN = 64 (chunk=64), the LMUL = 1 size = (8,8) bytes.
For TARGET_MIN_VLEN >= 128 (chunk=128), the LMUL = 1 size = (16*n,16*n) bytes.
I have explained it many times.
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614935.html 
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612574.html 




juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-12 17:51
To: 钟居哲; gcc-patches
CC: rdapp.gcc; kito.cheng; palmer; Jeff Law
Subject: Re: [PATCH] RISC-V: Fix V_WHOLE && V_FRACT iterator requirement
> +  (VNx16QI "TARGET_MIN_VLEN <= 128")
> +  (VNx32QI "TARGET_MIN_VLEN <= 256")
> +  (VNx64QI "TARGET_MIN_VLEN >= 64 && TARGET_MIN_VLEN <= 512")
> +  (VNx128QI "TARGET_MIN_VLEN >= 128 && TARGET_MIN_VLEN <= 1024")
> 
> This not correct, we always use VNx16QI as LMUL = m1 for min_vlen >= 128.
> Requirement of TARGET_MIN_VLEN <= 128 is incorrect for VNx16QI.
> VNx32QI,...etc likewise.
 
Please elaborate.  What happens with a VNx16QI on a target with
min_vlen == 256?  Is it a full 256-bit vector with only the first half
populated?  If so, this need documentation either here or somewhere
else (but with a reference here).
 
Either you can pick my testcase and amend your patch (plus
streamline formatting as well adding a proper comment) or I change
mine.  Your call.
 
Regards
Robin
 


RE: gcc/config.in was not regenerated

2023-06-12 Thread Tamar Christina via Gcc-patches
Hi Coudert,

Sorry, missed that one.

I'll fix that.

Tamar.

> -Original Message-
> From: FX Coudert 
> Sent: Saturday, June 10, 2023 9:21 PM
> To: Tamar Christina 
> Cc: g...@gcc.gnu.org; Jeff Law ; gcc-
> patc...@gcc.gnu.org
> Subject: gcc/config.in was not regenerated
> 
> Hi,
> 
> Building GCC in maintainer mode leads to changes in gcc/config.in:
> 
> > diff --git a/gcc/config.in b/gcc/config.in index
> > 4cad077bfbe..25442c59aec 100644
> > --- a/gcc/config.in
> > +++ b/gcc/config.in
> > @@ -67,6 +67,12 @@
> >  #endif
> > +/* Define to larger than one set the number of match.pd
> > partitions to make. */
> > +#ifndef USED_FOR_TARGET
> > +#undef DEFAULT_MATCHPD_PARTITIONS
> > +#endif
> > +
> > +
> >  /* Define to larger than zero set the default stack clash protector
> > size. */  #ifndef USED_FOR_TARGET  #undef
> DEFAULT_STK_CLASH_GUARD_SIZE
> 
> which I think are because this commit
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=0a85544e1aaeca41133ecfc4
> 38cda913dbc0f122
> should have regenerated and committed config.in 
> 
> Christina, can you please have a look?
> 
> FX


[PATCH][committed] Regenerate config.in

2023-06-12 Thread Tamar Christina via Gcc-patches
Hi All,

Looks like I forgot to regenerate config.in which
causes updates when you enable maintainer mode.

Bootstrapped aarch64-none-linux-gnu.

Committed under obvious rule.

Thanks,
Tamar

gcc/ChangeLog:

* config.in: Regenerate.

--- inline copy of patch -- 
diff --git a/gcc/config.in b/gcc/config.in
index 
4cad077bfbed7fd73b3c04ce6405fd2f49178412..cf2f284378447c8f8e2f838a786dba23d6086fe3
 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -67,6 +67,12 @@
 #endif
 
 
+/* Define to larger than one set the number of match.pd partitions to make. */
+#ifndef USED_FOR_TARGET
+#undef DEFAULT_MATCHPD_PARTITIONS
+#endif
+
+
 /* Define to larger than zero set the default stack clash protector size. */
 #ifndef USED_FOR_TARGET
 #undef DEFAULT_STK_CLASH_GUARD_SIZE
@@ -2239,8 +2245,7 @@
 #endif
 
 
-/* Define to the sub-directory in which libtool stores uninstalled libraries.
-   */
+/* Define to the sub-directory where libtool stores uninstalled libraries. */
 #ifndef USED_FOR_TARGET
 #undef LT_OBJDIR
 #endif









Re: [RFC] Add stdckdint.h header for C23

2023-06-12 Thread Eric Gallager via Gcc-patches
On Sat, Jun 10, 2023 at 6:38 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> Hi!
>
> The following patch is an attempt to implement the C23 stdckdint.h
> header on top of our GNU extension - __builtin_{add,sub,mul}_overflow
> builtins.
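
As a rough sketch of the mapping Jakub describes (assumed shape only; the
real header additionally needs the C23 type checking discussed below):

  #define ckd_add(r, a, b) __builtin_add_overflow ((a), (b), (r))
  #define ckd_sub(r, a, b) __builtin_sub_overflow ((a), (b), (r))
  #define ckd_mul(r, a, b) __builtin_mul_overflow ((a), (b), (r))

  /* Each macro stores the wrapped result in *(r) and returns true iff the
     infinite-precision result did not fit.  */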
>
> I have looked at gnulib stdckdint.h and they are full of workarounds
> for various compilers, EDG doesn't do this, clang <= 14 can't multiply
> __int128, ..., so I think the header belongs into the compiler rather
> than C library, because it would be a nightmare to maintain it there.
>
> What I'm struggling with is enforcing the weird restrictions
> C23 imposes on these.
>
> The builtins error on the result pointer not being writable, or
> having boolean or enumeral type (the reason for disallowing bool
> was that it would be questionable whether it should act as if
> storing to an unsigned 1-bit precision type which would overflow
> if result is not in [0,1] or whether it would never overflow
> for bool * result and simply store false if the infinite precision
> result is 0 and true otherwise, and for enums because of the
> uncertainities on just the enumerators vs. range from smallest to
> largest enumerator vs. strict enum precision with underlying type).
> They do allow storing result in plain char.  And the source operands
> can have any integral types, including plain char, including booleans
> and including enumeral types.  The plain is to allow even _BitInt(N)
> as both source and result later on.
>
> Now, C23 says that suitable types for both type2/type3 and type1
> are integral types other than plain char, bool, a bit-precise integer type,
> or an enumerated type.
>
> And it also says:
> It is recommended to produce a diagnostic message if type2 or type3 are
> not suitable integer types, or if *result is not a modifiable lvalue of
> a suitable integer type.
>
> I've tried to first check it with:
>   static_assert (_Generic ((a), char: 0, const char: 0, volatile char: 0, 
> const volatile char: 0,
>default: __builtin_classify_type (a) - 1 <= 1U),
>  "...")
> but that only catches plain char and doesn't catch _Bool/bool and
> doesn't catch enumerated types (note, for the *result we diagnose
> it for the builtins, but not for the other args), because
> __builtin_classify_type sadly promotes its argument.
>
> The _Generic in the patch below is slightly better, it catches
> also _Bool/bool, but doesn't catch enumerated types, comptypes
> used by _Generic says enumeral type is compatible with the underlying
> integer type.  But catching just plain char and bool would be
> also doable with just _Generic listing the non-allowed types.
>
> I think changing __builtin_classify_type behavior after 35 years
> would be dangerous, shall we introduce a new similar builtin which
> would just never promote the argument/perform array/function/enum
> conversions on it, so that
> __builtin_type_classify (true) == boolean_type_class
> enum E { E1, E2 } e;
> __builtin_type_classify (e) == enumeral_type_class
> int a[2];
> __builtin_type_classify (a) == array_type_class
> etc.?
> Seems clang changed __builtin_type_classify at some point
> so that it e.g. returns enumeral_type_class for enum arguments
> and array_type_class for arrays, but doesn't return boolean_type_class
> for _Bool argument.
>
> Also, shall we introduce __typeof_unqual__ keyword which could be used in
> < C23 modes and perhaps C++?
>

I think I remember a desire for a __typeof_unqual__ keyword in other
contexts as well, so it would probably be worthwhile anyway...

> 2023-06-10  Jakub Jelinek  
>
> * Makefile.in (USER_H): Add stdckdint.h.
> * ginclude/stdckdint.h: New file.
>
> * gcc.dg/stdckdint-1.c: New test.
> * gcc.dg/stdckdint-2.c: New test.
>
> --- gcc/Makefile.in.jj  2023-06-06 20:02:35.581211930 +0200
> +++ gcc/Makefile.in 2023-06-10 10:17:05.062270115 +0200
> @@ -466,6 +466,7 @@ USER_H = $(srcdir)/ginclude/float.h \
>  $(srcdir)/ginclude/stdnoreturn.h \
>  $(srcdir)/ginclude/stdalign.h \
>  $(srcdir)/ginclude/stdatomic.h \
> +$(srcdir)/ginclude/stdckdint.h \
>  $(EXTRA_HEADERS)
>
>  USER_H_INC_NEXT_PRE = @user_headers_inc_next_pre@
> --- gcc/ginclude/stdckdint.h.jj 2023-06-10 09:20:39.154053338 +0200
> +++ gcc/ginclude/stdckdint.h2023-06-10 12:02:33.454947780 +0200
> @@ -0,0 +1,78 @@
> +/* Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 3, or (at your option)
> +any later version.
> +
> +GCC is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +GNU General Public License for more details.
> +
> +Under Section 7 of GPL version 3, you ar

RE: [PATCH] simplify-rtx: Implement constant folding of SS_TRUNCATE, US_TRUNCATE

2023-06-12 Thread Kyrylo Tkachov via Gcc-patches
Hi Richard,

> -Original Message-
> From: Richard Sandiford 
> Sent: Friday, June 9, 2023 7:08 PM
> To: Kyrylo Tkachov via Gcc-patches 
> Cc: Kyrylo Tkachov 
> Subject: Re: [PATCH] simplify-rtx: Implement constant folding of
> SS_TRUNCATE, US_TRUNCATE
> 
> Kyrylo Tkachov via Gcc-patches  writes:
> > Hi all,
> >
> > This patch implements RTL constant-folding for the SS_TRUNCATE and
> US_TRUNCATE codes.
> > The semantics are a clamping operation on the argument with the min and
> max of the narrow mode,
> > followed by a truncation. The signedness of the clamp and the min/max
> extrema is derived from
> > the signedness of the saturating operation.
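
As a small illustrative sketch of the clamp-then-truncate semantics described
above, for the signed 32-bit to 16-bit case (editorial example, not part of
the patch; for US_TRUNCATE the bounds and comparisons would be unsigned):

  #include <cstdint>

  int16_t ss_truncate_si_to_hi (int32_t x)
  {
    if (x > INT16_MAX) x = INT16_MAX;   // clamp to the narrow mode's max
    if (x < INT16_MIN) x = INT16_MIN;   // clamp to the narrow mode's min
    return (int16_t) x;                 // then truncate
  }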
> >
> > We have a number of instructions in aarch64 that use SS_TRUNCATE and
> US_TRUNCATE to represent
> > their operations and we have pretty thorough runtime tests in
> gcc.target/aarch64/advsimd-intrinsics/vqmovn*.c.
> > With this patch the instructions are folded away at optimisation levels and
> the correctness checks still
> > pass.
> >
> > Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-
> elf.
> > Ok for trunk?
> >
> > Thanks,
> > Kyrill
> >
> > gcc/ChangeLog:
> >
> > * simplify-rtx.cc (simplify_const_unary_operation):
> > Handle US_TRUNCATE, SS_TRUNCATE.
> >
> > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> > index
> 276be67aa67247dd46361ab9badc46ab089d6df0..5983a06e5a8ca89c717e864
> 8be410024147b16e6 100644
> > --- a/gcc/simplify-rtx.cc
> > +++ b/gcc/simplify-rtx.cc
> > @@ -2131,6 +2131,22 @@ simplify_const_unary_operation (enum
> rtx_code code, machine_mode mode,
> >   result = wide_int::from (op0, width, UNSIGNED);
> >   break;
> >
> > +   case US_TRUNCATE:
> > +   case SS_TRUNCATE:
> > + {
> > +   signop sgn = code == US_TRUNCATE ? UNSIGNED : SIGNED;
> > +   wide_int nmax
> > + = wide_int::from (wi::max_value (width, sgn),
> > +   GET_MODE_PRECISION (imode), sgn);
> > +   wide_int nmin
> > + = wide_int::from (wi::min_value (width, sgn),
> > +   GET_MODE_PRECISION (imode), sgn);
> > +   result
> > + = wide_int::from (op0, GET_MODE_PRECISION (imode), sgn);
> > +   result = wi::min (wi::max (result, nmin, sgn), nmax, sgn);
> 
> FWIW, it looks like this could be:
> 
>   result = wi::min (wi::max (op0, nmin, sgn), nmax, sgn);
> 
> without the first assignment to result.  That feels more natural IMO,
> since no conversion is being done on op0.

Thanks, that works indeed.
I'll push the attached patch to trunk once bootstrap and testing completes.
Kyrill

> 
> Thanks,
> Richard
> 
> > +   result = wide_int::from (result, width, sgn);
> > +   break;
> > + }
> > case SIGN_EXTEND:
> >   result = wide_int::from (op0, width, SIGNED);
> >   break;


sstrunc.patch
Description: sstrunc.patch


Re: [PATCH][committed] Regenerate config.in

2023-06-12 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 12, 2023 at 11:20:00AM +0100, Tamar Christina via Gcc-patches wrote:
> Hi All,
> 
> Looks like I forgot to regenerate config.in which
> causes updates when you enable maintainer mode.
> 
> Bootstrapped aarch64-none-linux-gnu.
> 
> Committed under obvious rule.

Do you use the DEFAULT_MATCHPD_PARTITIONS macro anywhere?
If not, why the AC_DEFINE_UNQUOTED at all and not just the AC_SUBST?

Jakub



RE: [PATCH][committed] Regenerate config.in

2023-06-12 Thread Tamar Christina via Gcc-patches
> 
> Do you use the DEFAULT_MATCHPD_PARTITIONS macro anywhere?
> If not, why the AC_DEFINE_UNQUOTED at all and not just the AC_SUBST?
> 

It used to be used to change the default in genmatch.cc, but the default is now
not to split anymore, so I guess I can remove it.

Will follow up...


Re: [committed] libstdc++: Fix P2510R3 "Formatting pointers" [PR110149]

2023-06-12 Thread Prathamesh Kulkarni via Gcc-patches
On Fri, 9 Jun 2023 at 17:41, Jonathan Wakely via Gcc-patches
 wrote:
>
> Tested powerpc64le-linux. Pushed to trunk.
Hi Jonathan,
This patch causes the following regression on armv8l-unknown-linux-gnueabihf:
FAIL: std/format/functions/format.cc execution test
/home/tcwg-buildslave/workspace/tcwg_gnu_3/abe/snapshots/gcc.git~master/libstdc++-v3/testsuite/std/format/functions/format.cc:368:
void test_pointer(): Assertion 's == (str_int + ' ' + str_int + "
0x0")' failed.
timeout: the monitored command dumped core

Full libstdc++.log:
https://people.linaro.org/~prathamesh.kulkarni/libstdc++.log.0.xz
Could you please check ?

Thanks,
Prathamesh



>
> I'll backport it to gcc-13 later.
>
> -- >8 --
>
> I had intended to support the P2510R3 proposal unconditionally in C++20
> mode, but I left it half implemented. The parse function supported the
> new extensions, but the format function didn't.
>
> This adds the missing pieces, and makes it only enabled for C++26 and
> non-strict modes.
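
As a rough usage illustration of the two extensions described above (editorial
sketch; pointer values and therefore outputs are hypothetical):

  #include <format>

  int main ()
  {
    int i = 42;
    void *p = &i;
    auto s1 = std::format ("{:P}", p);    // e.g. "0X7FFD1234ABCD" (uppercase hex)
    auto s2 = std::format ("{:018p}", p); // e.g. "0x00007ffd1234abcd" (zero-filled)
  }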
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/110149
> * include/std/format (formatter::parse):
> Only alow 0 and P for C++26 and non-strict modes.
> (formatter::format): Use toupper for P
> type, and insert zero-fill characters for 0 option.
> * testsuite/std/format/functions/format.cc: Check pointer
> formatting. Only check P2510R3 extensions conditionally.
> * testsuite/std/format/parse_ctx.cc: Only check P2510R3
> extensions conditionally.
> ---
>  libstdc++-v3/include/std/format   | 56 ---
>  .../testsuite/std/format/functions/format.cc  | 42 ++
>  .../testsuite/std/format/parse_ctx.cc | 15 +++--
>  3 files changed, 101 insertions(+), 12 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
> index 6edc3208afa..96a1e62ccc8 100644
> --- a/libstdc++-v3/include/std/format
> +++ b/libstdc++-v3/include/std/format
> @@ -830,7 +830,7 @@ namespace __format
> {
>   if (_M_spec._M_type == _Pres_esc)
> {
> - // TODO: C++20 escaped string presentation
> + // TODO: C++23 escaped string presentation
> }
>
>   if (_M_spec._M_width_kind == _WP_none
> @@ -2081,19 +2081,31 @@ namespace __format
> if (__finished())
>   return __first;
>
> -   // _GLIBCXX_RESOLVE_LIB_DEFECTS
> -   // P2519R3 Formatting pointers
> +// _GLIBCXX_RESOLVE_LIB_DEFECTS
> +// P2510R3 Formatting pointers
> +#define _GLIBCXX_P2518R3 (__cplusplus > 202302L || ! defined __STRICT_ANSI__)
> +
> +#if _GLIBCXX_P2518R3
> __first = __spec._M_parse_zero_fill(__first, __last);
> if (__finished())
>   return __first;
> +#endif
>
> __first = __spec._M_parse_width(__first, __last, __pc);
>
> -   if (__first != __last && (*__first == 'p' || *__first == 'P'))
> +   if (__first != __last)
>   {
> -   if (*__first == 'P')
> +   if (*__first == 'p')
> + ++__first;
> +#if _GLIBCXX_P2518R3
> +   else if (*__first == 'P')
> +   {
> + // _GLIBCXX_RESOLVE_LIB_DEFECTS
> + // P2510R3 Formatting pointers
>   __spec._M_type = __format::_Pres_P;
> -   ++__first;
> + ++__first;
> +   }
> +#endif
>   }
>
> if (__finished())
> @@ -2110,9 +2122,21 @@ namespace __format
>   char __buf[2 + sizeof(__v) * 2];
>   auto [__ptr, __ec] = std::to_chars(__buf + 2, std::end(__buf),
>  __u, 16);
> - const int __n = __ptr - __buf;
> + int __n = __ptr - __buf;
>   __buf[0] = '0';
>   __buf[1] = 'x';
> +#if _GLIBCXX_P2518R3
> + if (_M_spec._M_type == __format::_Pres_P)
> +   {
> + __buf[1] = 'X';
> + for (auto __p = __buf + 2; __p != __ptr; ++__p)
> +#if __has_builtin(__builtin_toupper)
> +   *__p = __builtin_toupper(*__p);
> +#else
> +   *__p = std::toupper(*__p);
> +#endif
> +   }
> +#endif
>
>   basic_string_view<_CharT> __str;
>   if constexpr (is_same_v<_CharT, char>)
> @@ -2126,6 +2150,24 @@ namespace __format
>   __str = wstring_view(__p, __n);
> }
>
> +#if _GLIBCXX_P2518R3
> + if (_M_spec._M_zero_fill)
> +   {
> + size_t __width = _M_spec._M_get_width(__fc);
> + if (__width <= __str.size())
> +   return __format::__write(__fc.out(), __str);
> +
> + auto __out = __fc.out();
> + // Write "0x" or "0X" prefix before zero-filling.
> + __out = __format::__write(std::move(__out), __str.substr(0, 2));
> + __str.remove_prefix(2);
> + size_t __nfill = __width - __n;
> + return __format::__write_padded(std::move(__out), __str,
> + __format::

[PATCH] rs6000: Change bitwise xor to inequality operator [PR106907]

2023-06-12 Thread P Jeevitha via Gcc-patches
PR106907 reports a few warnings spotted by cppcheck. Among them are
precedence-clarification warnings where boolean results are used in a
bitwise operation. Bitwise xor performed on bools is equivalent to
checking inequality, so the bitwise xor (^) is changed to the inequality
operator (!=). The comment indentation is also fixed.
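
A tiny self-contained check of the equivalence relied on here (editorial
example, not part of the patch):

  #include <cassert>

  int main ()
  {
    for (bool a : {false, true})
      for (bool b : {false, true})
        assert ((a ^ b) == (a != b));   // xor on bools is exactly inequality
    return 0;
  }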

2023-06-12  Jeevitha Palanisamy  

gcc/
PR target/106907
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Change 
bitwise
xor to inequality and fix comment indentation.


diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index ea68ca6faef..ea7efda8dcd 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -23396,10 +23396,10 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, 
rtx op1,
  && GET_MODE (XEXP (op0, 0)) != V8HImode)))
continue;
 
-  /* For little-endian, the two input operands must be swapped
- (or swapped back) to ensure proper right-to-left numbering
- from 0 to 2N-1.  */
- if (swapped ^ !BYTES_BIG_ENDIAN
+ /* For little-endian, the two input operands must be swapped
+(or swapped back) to ensure proper right-to-left numbering
+from 0 to 2N-1.  */
+ if (swapped != !BYTES_BIG_ENDIAN
  && icode != CODE_FOR_vsx_xxpermdi_v16qi)
std::swap (op0, op1);
  if (imode != V16QImode)





Re: [PATCH] Make sure SCALAR_INT_MODE_P before invoke try_const_anchors

2023-06-12 Thread Jiufu Guo via Gcc-patches


Hi,

Richard Biener  writes:

> On Mon, 12 Jun 2023, Jiufu Guo wrote:
>
>> Richard Biener  writes:
>> 
>> > On Fri, 9 Jun 2023, Jiufu Guo wrote:
>> >
>> >> 
>> >> Hi,
>> >> 
>> >> Richard Biener  writes:
>> >> 
>> >> > On Fri, 9 Jun 2023, Jiufu Guo wrote:
>> >> >
>> >> >> 
>> >> >> Hi,
>> >> >> 
>> >> >> Richard Biener  writes:
>> >> >> 
>> >> >> > On Fri, 9 Jun 2023, Richard Sandiford wrote:
>> >> >> >
>> >> >> >> guojiufu  writes:
>> >> >> >> > Hi,
>> >> >> >> >
>> >> >> >> > On 2023-06-09 16:00, Richard Biener wrote:
>> >> >> >> >> On Fri, 9 Jun 2023, Jiufu Guo wrote:
>> >> >> >> >> 
>> >> >> >> >>> Hi,
>> >> >> >> >>> 
>> >> ...
>> >> >> >> >>> 
>> >> >> >> >>> This patch is raised when drafting below one.
>> >> >> >> >>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603530.html.
>> >> >> >> >>> With that patch, "{[%1:DI]=0;} stack_tie" with BLKmode runs into
>> >> >> >> >>> try_const_anchors, and hits the assert/ice.
>> >> >> >> >>> 
>> >> >> >> >>> Boostrap and regtest pass on ppc64{,le} and x86_64.
>> >> >> >> >>> Is this ok for trunk?
>> >> >> >> >> 
>> >> >> >> >> Iff the correct fix at all (how can a CONST_INT have BLKmode?) 
>> >> >> >> >> then
>> >> >> >> >> I suggest to instead fix try_const_anchors to change
>> >> >> >> >> 
>> >> >> >> >>   /* CONST_INT is used for CC modes, but we should leave those 
>> >> >> >> >> alone.  
>> >> >> >> >> */
>> >> >> >> >>   if (GET_MODE_CLASS (mode) == MODE_CC)
>> >> >> >> >> return NULL_RTX;
>> >> >> >> >> 
>> >> >> >> >>   gcc_assert (SCALAR_INT_MODE_P (mode));
>> >> >> >> >> 
>> >> >> >> >> to
>> >> >> >> >> 
>> >> >> >> >>   /* CONST_INT is used for CC modes, leave any non-scalar-int 
>> >> >> >> >> mode 
>> >> >> >> >> alone.  */
>> >> >> >> >>   if (!SCALAR_INT_MODE_P (mode))
>> >> >> >> >> return NULL_RTX;
>> >> >> >> >> 
>> >> >> >> >
>> >> >> >> > This is also able to fix this issue.  there is a "Punt on CC 
>> >> >> >> > modes" 
>> >> >> >> > patch
>> >> >> >> > to return NULL_RTX in try_const_anchors.
>> >> >> >> >
>> >> >> >> >> but as said I wonder how we arrive at a BLKmode CONST_INT and 
>> >> >> >> >> whether
>> >> >> >> >> we should have fended this off earlier.  Can you share more 
>> >> >> >> >> complete
>> >> >> >> >> RTL of that stack_tie?
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > (insn 15 14 16 3 (parallel [
>> >> >> >> >  (set (mem/c:BLK (reg/f:DI 1 1) [1  A8])
>> >> >> >> >  (const_int 0 [0]))
>> >> >> >> >  ]) "/home/guojiufu/temp/gdb.c":13:3 922 {stack_tie}
>> >> >> >> >   (nil))
>> >> >> >> >
>> >> >> >> > It is "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])".
>> >> >> >> 
>> >> >> >> I'm not convinced this is correct RTL.  (unspec:BLK [(const_int 0)] 
>> >> >> >> ...)
>> >> >> >> would be though.  It's arguably more accurate too, since the effect
>> >> >> >> on the stack locations is unspecified rather than predictable.
>> >> >> >
>> >> >> > powerpc seems to be the only port with a stack_tie that's not
>> >> >> > using an UNSPEC RHS.
>> >> >> In rs6000.md, it is
>> >> >> 
>> >> >> ; This is to explain that changes to the stack pointer should
>> >> >> ; not be moved over loads from or stores to stack memory.
>> >> >> (define_insn "stack_tie"
>> >> >>   [(match_parallel 0 "tie_operand"
>> >> >>   [(set (mem:BLK (reg 1)) (const_int 0))])]
>> >> >>   ""
>> >> >>   ""
>> >> >>   [(set_attr "length" "0")])
>> >> >> 
>> >> >> This would be just an placeholder insn, and acts as the comments.
>> >> >> UNSPEC_ would works like other targets.  While, I'm wondering
>> >> >> the concerns on "set (mem:BLK (reg 1)) (const_int 0)".
>> >> >> MODEs between SET_DEST and SET_SRC?
>> >> >
>> >> > I don't think the issue is the mode but the issue is that
>> >> > the patter as-is says some memory is zeroed while that's not
>> >> > actually true (not specifying a size means we can't really do
>> >> > anything with this MEM, but still).  Using an UNSPEC avoids
>> >> > implying anything for the stored value.
>> >> >
>> >> > Of course I think a MEM SET_DEST without a specified size is bougs
>> >> > as well, but there's larger precedent for this...
>> >> 
>> >> Thanks for your kindly comments!
>> >> Using "(set (mem:BLK (reg 1)) (const_int 0))" here, may because this
>> >> insn does not generate real thing (not a real store and no asm code),
>> >> may like barrier.
>> >> 
>> >> While I agree that, using UNSPEC may be more clear to avoid mis-reading.
>> >
>> > Btw, another way to avoid the issue in CSE is to make it not process
>> > (aka record anything for optimization) for SET from MEMs with
>> > !MEM_SIZE_KNOWN_P
>> 
>> Thanks! Yes, this would make sense.
>> Then, there are two ideas(patches) to handle this issue:
>> Which one would be preferable?  This one (from compiling time aspect)?
>> 
>> And maybe, the changes in rs6000 stack_tie through using unspec
>> can be a standalone enhancement besides cse patch.
>> 
>> Thanks for comments!
>> 
>> BR,
>> Jeff (Jiufu Guo)
>> 
>> -

Re: [PATCH V2] RISC-V: Add ZVFHMIN block autovec testcase

2023-06-12 Thread Kito Cheng via Gcc-patches
LGTM too, thanks

On Mon, Jun 12, 2023 at 5:46 PM Robin Dapp via Gcc-patches
 wrote:
>
> > +/* We can't enable FP16 NEG/PLUS/MINUS/MULT/DIV auto-vectorization when 
> > -march="*zvfhmin*".  */
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 
> > "vect" } } */
>
> Thanks.  OK from my side.
>
> Regards
>  Robin


[PATCH v1] RISC-V: Fix one potential test failure for RVV vsetvl

2023-06-12 Thread Pan Li via Gcc-patches
From: Pan Li 

The test will fail when run with the multi-threaded command below.  The
failure comes from one missing "Oz" option in the vsetvl check.

make -j $(nproc) report RUNTESTFLAGS="rvv.exp riscv.exp"

For some reason, this failure cannot be reproduced with RUNTESTFLAGS="rvv.exp"
alone or with make without the -j option. We would like to fix it now and
root-cause the reason later.

Signed-off-by: Pan Li 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vsetvl-23.c: Adjust test checking.
---
 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c
index 66c90ac10e7..f3420be8ab6 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c
@@ -34,4 +34,4 @@ void f(int8_t *base, int8_t *out, size_t vl, size_t m, size_t 
k) {
 /* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*4} 1 { 
target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
 /* { dg-final { scan-assembler-times {srli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*8} 1 { 
target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli} 5 { target { no-opts "-O0" 
no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
-/* { dg-final { scan-assembler-times 
{vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*tu,\s*m[au]} 5 { target { no-opts 
"-O0" no-opts "-Os" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times 
{vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*tu,\s*m[au]} 5 { target { no-opts 
"-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } 
*/
-- 
2.34.1



RE: [PATCH V2] RISC-V: Add ZVFHMIN block autovec testcase

2023-06-12 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito and Robin.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Kito Cheng via Gcc-patches
Sent: Monday, June 12, 2023 8:19 PM
To: Robin Dapp 
Cc: juzhe.zh...@rivai.ai; gcc-patches@gcc.gnu.org; kito.ch...@sifive.com; 
pal...@dabbelt.com; pal...@rivosinc.com; jeffreya...@gmail.com
Subject: Re: [PATCH V2] RISC-V: Add ZVFHMIN block autovec testcase

LGTM too, thanks

On Mon, Jun 12, 2023 at 5:46 PM Robin Dapp via Gcc-patches 
 wrote:
>
> > +/* We can't enable FP16 NEG/PLUS/MINUS/MULT/DIV auto-vectorization 
> > +when -march="*zvfhmin*".  */
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in 
> > +function" 0 "vect" } } */
>
> Thanks.  OK from my side.
>
> Regards
>  Robin


Re: [PATCH] RISC-V: Add RVV narrow shift right lowering auto-vectorization

2023-06-12 Thread juzhe.zh...@rivai.ai
Is this patch ok for trunk?



juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-06-12 10:41
To: gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Add RVV narrow shift right lowering auto-vectorization
From: Juzhe-Zhong 
 
Optimize the following auto-vectorization codes:
void foo (int16_t * __restrict a, int32_t * __restrict b, int32_t c, int n)
{
for (int i = 0; i < n; i++)
  a[i] = b[i] >> c;
}
 
Before this patch:
foo:
ble a3,zero,.L5
.L3:
vsetvli a5,a3,e32,m1,ta,ma
vle32.v v1,0(a1)
vsetvli a4,zero,e32,m1,ta,ma
vsra.vx v1,v1,a2
vsetvli zero,zero,e16,mf2,ta,ma
sllia7,a5,2
vncvt.x.x.w v1,v1
sllia6,a5,1
vsetvli zero,a5,e16,mf2,ta,ma
sub a3,a3,a5
vse16.v v1,0(a0)
add a1,a1,a7
add a0,a0,a6
bne a3,zero,.L3
.L5:
ret
 
After this patch:
foo:
ble a3,zero,.L5
.L3:
vsetvli a5,a3,e32,m1,ta,ma
vle32.v v1,0(a1)
vsetvli a7,zero,e16,mf2,ta,ma
slli a6,a5,2
vnsra.wx v1,v1,a2
slli a4,a5,1
vsetvli zero,a5,e16,mf2,ta,ma
sub a3,a3,a5
vse16.v v1,0(a0)
add a1,a1,a6
add a0,a0,a4
bne a3,zero,.L3
.L5:
ret
 
gcc/ChangeLog:
 
* config/riscv/autovec-opt.md 
(*vtrunc): New pattern.
(*trunc): Ditto.
* config/riscv/autovec.md (3): Change to 
define_insn_and_split.
(v3): Ditto.
(trunc2): Ditto.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/binop/narrow-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/narrow-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/narrow-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/narrow_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/narrow_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/narrow_run-3.c: New test.
 
---
gcc/config/riscv/autovec-opt.md   | 46 +
gcc/config/riscv/autovec.md   | 43 ++--
.../riscv/rvv/autovec/binop/narrow-1.c| 31 
.../riscv/rvv/autovec/binop/narrow-2.c| 32 
.../riscv/rvv/autovec/binop/narrow-3.c| 31 
.../riscv/rvv/autovec/binop/narrow_run-1.c| 50 +++
.../riscv/rvv/autovec/binop/narrow_run-2.c| 46 +
.../riscv/rvv/autovec/binop/narrow_run-3.c| 46 +
8 files changed, 311 insertions(+), 14 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow-3.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow_run-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow_run-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow_run-3.c
 
diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 7bb93eed220..aef28e445e1 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -330,3 +330,49 @@
   }
   [(set_attr "type" "viwmuladd")
(set_attr "mode" "")])
+
+;; -
+;;  [INT] Binary narrow shifts.
+;; -
+;; Includes:
+;; - vnsrl.wv/vnsrl.wx/vnsrl.wi
+;; - vnsra.wv/vnsra.wx/vnsra.wi
+;; -
+
+(define_insn_and_split "*vtrunc"
+  [(set (match_operand: 0 "register_operand"   "=vr,vr")
+(truncate:
+  (any_shiftrt:VWEXTI
+(match_operand:VWEXTI 1 "register_operand" " vr,vr")
+ (any_extend:VWEXTI
+  (match_operand: 2 "vector_shift_operand" " 
vr,vk")]
+  "TARGET_VECTOR"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(const_int 0)]
+{
+  insn_code icode = code_for_pred_narrow (, mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, operands);
+  DONE;
+}
+ [(set_attr "type" "vnshift")
+  (set_attr "mode" "")])
+
+(define_insn_and_split "*trunc"
+  [(set (match_operand: 0 "register_operand" "=vr")
+(truncate:
+  (any_shiftrt:VWEXTI
+(match_operand:VWEXTI 1 "register_operand"   " vr")
+ (match_operand: 2 "csr_operand" " rK"]
+  "TARGET_VECTOR"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(const_int 0)]
+{
+  operands[2] = gen_lowpart (Pmode, operands[2]);
+  insn_code icode = code_for_pred_narrow_scalar (, 
mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, operands);
+  DONE;
+}
+ [(set_attr "type" "vnshift")
+  (set_attr "mode" "")])
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index b7070099f29..eadc2c5b595 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -150,18 +150,2

Re: Re: [PATCH] RISC-V: Fix V_WHOLE && V_FRACT iterator requirement

2023-06-12 Thread Kito Cheng via Gcc-patches
Some more detail here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616051.html




On Mon, Jun 12, 2023 at 5:58 PM juzhe.zh...@rivai.ai
 wrote:
>
> I'd like to defer to you: please commit my patch together with your test
> (Jeff has approved my patch, so feel free to commit).
>
> Here is the description:
> We have 3 configuration for "-march"
> 1. zve32*  (TARGET_MIN_VLEN == 32), the LMUL = 1 mode will be VNx4QI, VNx2HI, 
> VNx1SI
> 2. zve64*  (TARGET_MIN_VLEN == 64), the LMUL = 1 mode will be VNx8QI, VNx4HI, 
> VNx2SI
> 3. zve64*_zvl128b  (TARGET_MIN_VLEN >= 128), the LMUL = 1 mode will be 
> VNx16QI, VNx8HI, VNx4SI
>
> We dynamically adjust BYTES_PER_VECTOR according to TARGET_MIN_VLEN.
> For TARGET_MIN_VLEN = 32 (chunk=32), the LMUL = 1 size = (4,4) bytes.
> For TARGET_MIN_VLEN = 64 (chunk=64), the LMUL = 1 size = (8,8) bytes.
> For TARGET_MIN_VLEN >= 128 (chunk=128), the LMUL = 1 size = (16*n,16*n) bytes.
> I have explained it many times.
> https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614935.html
> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612574.html
>
>
>
>
> juzhe.zh...@rivai.ai
>
> From: Robin Dapp
> Date: 2023-06-12 17:51
> To: 钟居哲; gcc-patches
> CC: rdapp.gcc; kito.cheng; palmer; Jeff Law
> Subject: Re: [PATCH] RISC-V: Fix V_WHOLE && V_FRACT iterator requirement
> > +  (VNx16QI "TARGET_MIN_VLEN <= 128")
> > +  (VNx32QI "TARGET_MIN_VLEN <= 256")
> > +  (VNx64QI "TARGET_MIN_VLEN >= 64 && TARGET_MIN_VLEN <= 512")
> > +  (VNx128QI "TARGET_MIN_VLEN >= 128 && TARGET_MIN_VLEN <= 1024")
> >
> > This not correct, we always use VNx16QI as LMUL = m1 for min_vlen >= 128.
> > Requirement of TARGET_MIN_VLEN <= 128 is incorrect for VNx16QI.
> > VNx32QI,...etc likewise.
>
> Please elaborate.  What happens with a VNx16QI on a target with
> min_vlen == 256?  Is it a full 256-bit vector with only the first half
> populated?  If so, this need documentation either here or somewhere
> else (but with a reference here).
>
> Either you can pick my testcase and amend your patch (plus
> streamline formatting as well adding a proper comment) or I change
> mine.  Your call.
>
> Regards
> Robin
>


Re: [PATCH v1] RISC-V: Fix one potential test failure for RVV vsetvl

2023-06-12 Thread Kito Cheng via Gcc-patches
OK for this patch, and I am thinking we should adjust rvv.exp to
just exclude -O0, -Os and -Oz for some testcase runs in order to simplify
many testcases.


On Mon, Jun 12, 2023 at 8:20 PM Pan Li via Gcc-patches
 wrote:
>
> From: Pan Li 
>
> The test will fail on below command with multi-thread like below.  However,
> it comes from one missed "Oz" option when check vsetvl.
>
> make -j $(nproc) report RUNTESTFLAGS="rvv.exp riscv.exp"
>
> To some reason, this failure cannot be reproduced by RUNTESTFLAGS="rvv.exp"
> or make without -j option. We would like to fix it and root cause the
> reason later.
>
> Signed-off-by: Pan Li 
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/vsetvl/vsetvl-23.c: Adjust test checking.
> ---
>  gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c
> index 66c90ac10e7..f3420be8ab6 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c
> @@ -34,4 +34,4 @@ void f(int8_t *base, int8_t *out, size_t vl, size_t m, 
> size_t k) {
>  /* { dg-final { scan-assembler-times {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*4} 1 
> { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
>  /* { dg-final { scan-assembler-times {srli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*8} 1 
> { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
>  /* { dg-final { scan-assembler-times {vsetvli} 5 { target { no-opts "-O0" 
> no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
> -/* { dg-final { scan-assembler-times 
> {vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*tu,\s*m[au]} 5 { target { 
> no-opts "-O0" no-opts "-Os" no-opts "-g" no-opts "-funroll-loops" } } } } */
> +/* { dg-final { scan-assembler-times 
> {vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*tu,\s*m[au]} 5 { target { 
> no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts 
> "-funroll-loops" } } } } */
> --
> 2.34.1
>


Re: [PATCH] RISC-V: Add RVV narrow shift right lowering auto-vectorization

2023-06-12 Thread Kito Cheng via Gcc-patches
We have two predicate styles for those define_insn_and_split patterns,
"TARGET_VECTOR"/"&& can_create_pseudo_p ()" and "TARGET_VECTOR &&
can_create_pseudo_p ()"/"&& 1"; could you unify them all to the latter
form?  I feel that would be safer since those patterns are really only
valid before RA (can_create_pseudo_p () == true).  They are mostly used
by the combine pass so it is mostly safe as-is, but IMO we should fix
this now rather than wait until we hit the problem later.

OK for this patch as it is, and I would like to have a separate patch
to fix all those issues.

On Mon, Jun 12, 2023 at 8:27 PM juzhe.zh...@rivai.ai
 wrote:
>
> Is this patch ok for trunk?
>
>
>
> juzhe.zh...@rivai.ai
>
> From: juzhe.zhong
> Date: 2023-06-12 10:41
> To: gcc-patches
> CC: kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; 
> Juzhe-Zhong
> Subject: [PATCH] RISC-V: Add RVV narrow shift right lowering 
> auto-vectorization
> From: Juzhe-Zhong 
>
> Optimize the following auto-vectorization codes:
> void foo (int16_t * __restrict a, int32_t * __restrict b, int32_t c, int n)
> {
> for (int i = 0; i < n; i++)
>   a[i] = b[i] >> c;
> }
>
> Before this patch:
> foo:
> ble a3,zero,.L5
> .L3:
> vsetvli a5,a3,e32,m1,ta,ma
> vle32.v v1,0(a1)
> vsetvli a4,zero,e32,m1,ta,ma
> vsra.vx v1,v1,a2
> vsetvli zero,zero,e16,mf2,ta,ma
> slli a7,a5,2
> vncvt.x.x.w v1,v1
> slli a6,a5,1
> vsetvli zero,a5,e16,mf2,ta,ma
> sub a3,a3,a5
> vse16.v v1,0(a0)
> add a1,a1,a7
> add a0,a0,a6
> bne a3,zero,.L3
> .L5:
> ret
>
> After this patch:
> foo:
> ble a3,zero,.L5
> .L3:
> vsetvli a5,a3,e32,m1,ta,ma
> vle32.v v1,0(a1)
> vsetvli a7,zero,e16,mf2,ta,ma
> slli a6,a5,2
> vnsra.wx v1,v1,a2
> slli a4,a5,1
> vsetvli zero,a5,e16,mf2,ta,ma
> sub a3,a3,a5
> vse16.v v1,0(a0)
> add a1,a1,a6
> add a0,a0,a4
> bne a3,zero,.L3
> .L5:
> ret
>
> gcc/ChangeLog:
>
> * config/riscv/autovec-opt.md 
> (*vtrunc): New pattern.
> (*trunc): Ditto.
> * config/riscv/autovec.md (3): Change to 
> define_insn_and_split.
> (v3): Ditto.
> (trunc2): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/binop/narrow-1.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/narrow-2.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/narrow-3.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/narrow_run-1.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/narrow_run-2.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/narrow_run-3.c: New test.
>
> ---
> gcc/config/riscv/autovec-opt.md   | 46 +
> gcc/config/riscv/autovec.md   | 43 ++--
> .../riscv/rvv/autovec/binop/narrow-1.c| 31 
> .../riscv/rvv/autovec/binop/narrow-2.c| 32 
> .../riscv/rvv/autovec/binop/narrow-3.c| 31 
> .../riscv/rvv/autovec/binop/narrow_run-1.c| 50 +++
> .../riscv/rvv/autovec/binop/narrow_run-2.c| 46 +
> .../riscv/rvv/autovec/binop/narrow_run-3.c| 46 +
> 8 files changed, 311 insertions(+), 14 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow-1.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow-2.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow-3.c
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow_run-1.c
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow_run-2.c
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow_run-3.c
>
> diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
> index 7bb93eed220..aef28e445e1 100644
> --- a/gcc/config/riscv/autovec-opt.md
> +++ b/gcc/config/riscv/autovec-opt.md
> @@ -330,3 +330,49 @@
>}
>[(set_attr "type" "viwmuladd")
> (set_attr "mode" "")])
> +
> +;; -
> +;;  [INT] Binary narrow shifts.
> +;; -
> +;; Includes:
> +;; - vnsrl.wv/vnsrl.wx/vnsrl.wi
> +;; - vnsra.wv/vnsra.wx/vnsra.wi
> +;; -
> +
> +(define_insn_and_split "*vtrunc"
> +  [(set (match_operand: 0 "register_operand"   "=vr,vr")
> +(truncate:
> +  (any_shiftrt:VWEXTI
> +(match_operand:VWEXTI 1 "register_operand" " vr,vr")
> + (any_extend:VWEXTI
> +  (match_operand: 2 "vector_shift_operand" " 
> vr,vk")]
> +  "TARGET_VECTOR"
> +  "#"
> +  "&& can_create_pseudo_p ()"
> +  [(const_int 0)]
> +{
> +  insn_code icode = code_for_pred_narrow (, mode);
> +  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, operands)

Re: [PATCH] inline: improve internal function costs

2023-06-12 Thread Andre Vieira (lists) via Gcc-patches




On 05/06/2023 04:04, Jan Hubicka wrote:

On Thu, 1 Jun 2023, Andre Vieira (lists) wrote:


Hi,

This is a follow-up of the internal function patch to add widening and
narrowing patterns.  This patch improves the inliner cost estimation for
internal functions.


I have no idea why calls are special in IPA analyze_function_body
and so I cannot say whether treating all internal fn calls as
non-calls is correct there.  Honza?


The reason is that normal statements are accounted as part of the
function body, while calls have their costs attached to call edges
(so the cost can be adjusted when the call is inlined or otherwise optimized).

However, since internal functions have no cgraph edges, this looks like
a bug that we do not test for.  (The code was written before internal
calls were introduced.)



This sounds to me like you agree with my approach to treat internal
calls differently from regular calls.



I wonder if we want to have an is_noninternal_gimple_call predicate that
could be used by IPA code to test whether a cgraph edge should exist for
the statement.


I'm happy to add such a helper function.  @richi, @rsandifo: are you OK with that?
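
Something along these lines would do; a minimal sketch using the existing
is_gimple_call and gimple_call_internal_p predicates (the name and exact
placement are only illustrative, not what would actually be committed):

/* Illustrative sketch only: true if STMT is a call statement that should
   have a cgraph edge, i.e. a GIMPLE_CALL that is not an internal call.  */
static inline bool
is_noninternal_gimple_call (const gimple *stmt)
{
  return is_gimple_call (stmt) && !gimple_call_internal_p (stmt);
}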


The tree-inline.cc change is OK though (you can push that separately).

The rest is OK too.
Honza


Thanks,
Richard.


Bootstrapped and regression tested on aarch64-unknown-linux-gnu.

gcc/ChangeLog:

 * ipa-fnsummary.cc (analyze_function_body): Correctly handle
 non-zero costed internal functions.
 * tree-inline.cc (estimate_num_insns): Improve costing for internal
 functions.



--
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: Re: [PATCH] RISC-V: Add RVV narrow shift right lowering auto-vectorization

2023-06-12 Thread juzhe.zh...@rivai.ai
You mean change all split patterns like this?
;; This helps to match zero_extend + sign_extend + fma.
(define_insn_and_split "*zero_sign_extend_fma"
  [(set (match_operand:VWEXTI 0 "register_operand")
  (plus:VWEXTI
(mult:VWEXTI
  (zero_extend:VWEXTI
(match_operand: 2 "register_operand"))
  (sign_extend:VWEXTI
(match_operand: 3 "register_operand")))
(match_operand:VWEXTI 1 "register_operand")))]
  "TARGET_VECTOR && can_create_pseudo_p ()"
  "#"
  "&& 1"
  [(const_int 0)]



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-06-12 20:37
To: juzhe.zh...@rivai.ai
CC: gcc-patches; Kito.cheng; palmer; palmer; jeffreyalaw; Robin Dapp
Subject: Re: [PATCH] RISC-V: Add RVV narrow shift right lowering 
auto-vectorization
We have two predicate styles for those define_insn_and_split patterns,
"TARGET_VECTOR"/"&& can_create_pseudo_p ()" and "TARGET_VECTOR &&
can_create_pseudo_p ()"/"&& 1"; could you unify them all to the latter
form?  I feel that would be safer since those patterns are really only
valid before RA (can_create_pseudo_p () == true).  They are mostly used
by the combine pass so it is mostly safe as-is, but IMO we should fix
this now rather than wait until we hit the problem later.
 
OK for this patch as it is, and I would like to have a separate patch
to fix all those issues.
 
On Mon, Jun 12, 2023 at 8:27 PM juzhe.zh...@rivai.ai
 wrote:
>
> Is this patch ok for trunk?
>
>
>
> juzhe.zh...@rivai.ai
>
> From: juzhe.zhong
> Date: 2023-06-12 10:41
> To: gcc-patches
> CC: kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; 
> Juzhe-Zhong
> Subject: [PATCH] RISC-V: Add RVV narrow shift right lowering 
> auto-vectorization
> From: Juzhe-Zhong 
>
> Optimize the following auto-vectorization codes:
> void foo (int16_t * __restrict a, int32_t * __restrict b, int32_t c, int n)
> {
> for (int i = 0; i < n; i++)
>   a[i] = b[i] >> c;
> }
>
> Before this patch:
> foo:
> ble a3,zero,.L5
> .L3:
> vsetvli a5,a3,e32,m1,ta,ma
> vle32.v v1,0(a1)
> vsetvli a4,zero,e32,m1,ta,ma
> vsra.vx v1,v1,a2
> vsetvli zero,zero,e16,mf2,ta,ma
> slli a7,a5,2
> vncvt.x.x.w v1,v1
> slli a6,a5,1
> vsetvli zero,a5,e16,mf2,ta,ma
> sub a3,a3,a5
> vse16.v v1,0(a0)
> add a1,a1,a7
> add a0,a0,a6
> bne a3,zero,.L3
> .L5:
> ret
>
> After this patch:
> foo:
> ble a3,zero,.L5
> .L3:
> vsetvli a5,a3,e32,m1,ta,ma
> vle32.v v1,0(a1)
> vsetvli a7,zero,e16,mf2,ta,ma
> slli a6,a5,2
> vnsra.wx v1,v1,a2
> slli a4,a5,1
> vsetvli zero,a5,e16,mf2,ta,ma
> sub a3,a3,a5
> vse16.v v1,0(a0)
> add a1,a1,a6
> add a0,a0,a4
> bne a3,zero,.L3
> .L5:
> ret
>
> gcc/ChangeLog:
>
> * config/riscv/autovec-opt.md 
> (*vtrunc): New pattern.
> (*trunc): Ditto.
> * config/riscv/autovec.md (3): Change to 
> define_insn_and_split.
> (v3): Ditto.
> (trunc2): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/binop/narrow-1.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/narrow-2.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/narrow-3.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/narrow_run-1.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/narrow_run-2.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/narrow_run-3.c: New test.
>
> ---
> gcc/config/riscv/autovec-opt.md   | 46 +
> gcc/config/riscv/autovec.md   | 43 ++--
> .../riscv/rvv/autovec/binop/narrow-1.c| 31 
> .../riscv/rvv/autovec/binop/narrow-2.c| 32 
> .../riscv/rvv/autovec/binop/narrow-3.c| 31 
> .../riscv/rvv/autovec/binop/narrow_run-1.c| 50 +++
> .../riscv/rvv/autovec/binop/narrow_run-2.c| 46 +
> .../riscv/rvv/autovec/binop/narrow_run-3.c| 46 +
> 8 files changed, 311 insertions(+), 14 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow-1.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow-2.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow-3.c
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow_run-1.c
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow_run-2.c
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow_run-3.c
>
> diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
> index 7bb93eed220..aef28e445e1 100644
> --- a/gcc/config/riscv/autovec-opt.md
> +++ b/gcc/config/riscv/autovec-opt.md
> @@ -330,3 +330,49 @@
>}
>[(set_attr "type" "viwmuladd")
> (set_attr "mode" "")])
> +
> +;; -
> +;;  [INT] Binary narrow shifts.
> +;; ---

RE: [PATCH v5] RISC-V: Add vector psabi checking.

2023-06-12 Thread Wang, Yanzhang via Gcc-patches
I found there are still some test cases that do not pass. I'll push
another version soon. Sorry for the inconvenience.

> -Original Message-
> From: Wang, Yanzhang 
> Sent: Monday, June 12, 2023 4:08 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Li, Pan2
> ; Wang, Yanzhang 
> Subject: [PATCH v5] RISC-V: Add vector psabi checking.
> 
> From: Yanzhang Wang 
> 
> This patch adds support for checking whether a function's argument or return
> value is a vector type, and emits a warning if so.
> 
> There're two exceptions,
>   - The vector_size attribute.
>   - The intrinsic functions.
> 
> gcc/ChangeLog:
> 
>   * config/riscv/riscv-protos.h (riscv_init_cumulative_args): Set
> warning flag if func is not builtin
>   * config/riscv/riscv.cc
>   (riscv_scalable_vector_type_p): Determine whether the type is scalable
> vector.
>   (riscv_arg_has_vector): Determine whether the arg is vector type.
>   (riscv_pass_in_vector_p): Check the vector type param is passed by
> value.
>   (riscv_init_cumulative_args): The same as header.
>   (riscv_get_arg_info): Add the checking.
>   (riscv_function_value): Check the func return and set warning flag
>   * config/riscv/riscv.h (INIT_CUMULATIVE_ARGS): Add a flag to
> determine whether warning psabi or not.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/rvv/rvv.exp: Add -Wno-psabi
>   * gcc.target/riscv/vector-abi-1.c: New test.
>   * gcc.target/riscv/vector-abi-2.c: New test.
>   * gcc.target/riscv/vector-abi-3.c: New test.
>   * gcc.target/riscv/vector-abi-4.c: New test.
>   * gcc.target/riscv/vector-abi-5.c: New test.
>   * gcc.target/riscv/vector-abi-6.c: New test.
> 
> Signed-off-by: Yanzhang Wang 
> Co-authored-by: Kito Cheng 
> ---
>  gcc/config/riscv/riscv-protos.h   |   2 +
>  gcc/config/riscv/riscv.cc | 112 +-
>  gcc/config/riscv/riscv.h  |   5 +-
>  gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +-
>  gcc/testsuite/gcc.target/riscv/vector-abi-1.c |  14 +++
> gcc/testsuite/gcc.target/riscv/vector-abi-2.c |  15 +++
> gcc/testsuite/gcc.target/riscv/vector-abi-3.c |  14 +++
> gcc/testsuite/gcc.target/riscv/vector-abi-4.c |  16 +++
> gcc/testsuite/gcc.target/riscv/vector-abi-5.c |  15 +++
> gcc/testsuite/gcc.target/riscv/vector-abi-6.c |  20 
>  10 files changed, 212 insertions(+), 3 deletions(-)  create mode 100644
> gcc/testsuite/gcc.target/riscv/vector-abi-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/vector-abi-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/vector-abi-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/vector-abi-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/vector-abi-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/vector-abi-6.c
> 
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-
> protos.h index 66c1f535d60..90fde5f8be3 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -302,4 +302,6 @@ th_mempair_output_move (rtx[4], bool, machine_mode,
> RTX_CODE);  #endif
> 
>  extern bool riscv_use_divmod_expander (void);
> +void riscv_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree,
> +int);
> +
>  #endif /* ! GCC_RISCV_PROTOS_H */
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index
> de30bf4e567..dd5361c2bd2 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -3795,6 +3795,99 @@ riscv_pass_fpr_pair (machine_mode mode, unsigned
> regno1,
>  GEN_INT (offset2;
>  }
> 
> +/* Use the TYPE_SIZE to distinguish the type with vector_size attribute
> and
> +   intrinsic vector type.  Because we can't get the decl for the
> +params.  */
> +
> +static bool
> +riscv_scalable_vector_type_p (const_tree type) {
> +  tree size = TYPE_SIZE (type);
> +  if (size && TREE_CODE (size) == INTEGER_CST)
> +return false;
> +
> +  /* For the data type like vint32m1_t, the size code is POLY_INT_CST.
> +*/
> +  return true;
> +}
> +
> +static bool
> +riscv_arg_has_vector (const_tree type)
> +{
> +  bool is_vector = false;
> +
> +  switch (TREE_CODE (type))
> +{
> +case RECORD_TYPE:
> +  if (!COMPLETE_TYPE_P (type))
> + break;
> +
> +  for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f))
> + if (TREE_CODE (f) == FIELD_DECL)
> +   {
> + tree field_type = TREE_TYPE (f);
> + if (!TYPE_P (field_type))
> +   break;
> +
> + /* Ignore it if it's fixed length vector.  */
> + if (VECTOR_TYPE_P (field_type))
> +   is_vector = riscv_scalable_vector_type_p (field_type);
> + else
> +   is_vector = riscv_arg_has_vector (field_type);
> +   }
> +
> +  break;
> +
> +case VECTOR_TYPE:
> +  is_vector = riscv_scalable_vector_type_p (type);
> +  break;
> +
> +default:
> +  is_vector = false;
> + 

Re: [PATCH v5] RISC-V: Add vector psabi checking.

2023-06-12 Thread Kito Cheng via Gcc-patches
Hi Yan-Zhang:

OK with one minor, go ahead IF the regression is clean.

Hi Pan:

Could you help to verify this patch and commit if the regression is clean?

thanks :)

> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp 
> b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> index 5e69235a268..ad79d0e9a8d 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> @@ -43,7 +43,7 @@ dg-init
>  # Main loop.
>  set CFLAGS "$DEFAULT_CFLAGS -march=$gcc_march -mabi=$gcc_mabi -O3"

Add -Wno-psabi here rather than below, and also add it for
g++.target/riscv/rvv/rvv.exp

>  dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/base/*.\[cS\]]] \
> -   "" $CFLAGS
> +   "-Wno-psabi" $CFLAGS
>  gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vsetvl/*.\[cS\]]] \
> "" $CFLAGS
>  dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/*.\[cS\]]] \


Re: Re: [PATCH] RISC-V: Add RVV narrow shift right lowering auto-vectorization

2023-06-12 Thread Kito Cheng via Gcc-patches
Yes, change all define_insn_and_split patterns to that style, "TARGET_VECTOR &&
can_create_pseudo_p ()" / "&& 1"; my understanding is that all those
patterns should only work before RA, so using "TARGET_VECTOR &&
can_create_pseudo_p ()" everywhere is more reasonable.

On Mon, Jun 12, 2023 at 8:41 PM juzhe.zh...@rivai.ai
 wrote:
>
> You mean change all split pattern like this ?
> ;; This helps to match zero_extend + sign_extend + fma.
> (define_insn_and_split "*zero_sign_extend_fma"
>   [(set (match_operand:VWEXTI 0 "register_operand")
>   (plus:VWEXTI
> (mult:VWEXTI
>   (zero_extend:VWEXTI
> (match_operand: 2 "register_operand"))
>   (sign_extend:VWEXTI
> (match_operand: 3 "register_operand")))
> (match_operand:VWEXTI 1 "register_operand")))]
>   "TARGET_VECTOR && can_create_pseudo_p ()"
>   "#"
>   "&& 1"
>   [(const_int 0)]
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: Kito Cheng
> Date: 2023-06-12 20:37
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; Kito.cheng; palmer; palmer; jeffreyalaw; Robin Dapp
> Subject: Re: [PATCH] RISC-V: Add RVV narrow shift right lowering 
> auto-vectorization
> We have two style predictor for those define_insn_and_split patterns,
> "TARGET_VECTOR"/"&& can_create_pseudo_p ()" and "TARGET_VECTOR &&
> can_create_pseudo_p ()"/"&& 1", could you unify all to later form? I
> feel that would be safer since those patterns are really only valid
> before RA(can_create_pseudo_p() == true), although it's mostly used by
> combine pass so it's mostly safe, but IMO we should fix this soon
> rather than fix that until we hit this later.
>
> OK for this patch as it is, and I would like to have a separated patch
> to fix all those issues.
>
> On Mon, Jun 12, 2023 at 8:27 PM juzhe.zh...@rivai.ai
>  wrote:
> >
> > Is this patch ok for trunk?
> >
> >
> >
> > juzhe.zh...@rivai.ai
> >
> > From: juzhe.zhong
> > Date: 2023-06-12 10:41
> > To: gcc-patches
> > CC: kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; 
> > Juzhe-Zhong
> > Subject: [PATCH] RISC-V: Add RVV narrow shift right lowering 
> > auto-vectorization
> > From: Juzhe-Zhong 
> >
> > Optimize the following auto-vectorization codes:
> > void foo (int16_t * __restrict a, int32_t * __restrict b, int32_t c, int n)
> > {
> > for (int i = 0; i < n; i++)
> >   a[i] = b[i] >> c;
> > }
> >
> > Before this patch:
> > foo:
> > ble a3,zero,.L5
> > .L3:
> > vsetvli a5,a3,e32,m1,ta,ma
> > vle32.v v1,0(a1)
> > vsetvli a4,zero,e32,m1,ta,ma
> > vsra.vx v1,v1,a2
> > vsetvli zero,zero,e16,mf2,ta,ma
> > slli a7,a5,2
> > vncvt.x.x.w v1,v1
> > slli a6,a5,1
> > vsetvli zero,a5,e16,mf2,ta,ma
> > sub a3,a3,a5
> > vse16.v v1,0(a0)
> > add a1,a1,a7
> > add a0,a0,a6
> > bne a3,zero,.L3
> > .L5:
> > ret
> >
> > After this patch:
> > foo:
> > ble a3,zero,.L5
> > .L3:
> > vsetvli a5,a3,e32,m1,ta,ma
> > vle32.v v1,0(a1)
> > vsetvli a7,zero,e16,mf2,ta,ma
> > slli a6,a5,2
> > vnsra.wx v1,v1,a2
> > slli a4,a5,1
> > vsetvli zero,a5,e16,mf2,ta,ma
> > sub a3,a3,a5
> > vse16.v v1,0(a0)
> > add a1,a1,a6
> > add a0,a0,a4
> > bne a3,zero,.L3
> > .L5:
> > ret
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/autovec-opt.md 
> > (*vtrunc): New pattern.
> > (*trunc): Ditto.
> > * config/riscv/autovec.md (3): Change to 
> > define_insn_and_split.
> > (v3): Ditto.
> > (trunc2): Ditto.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/rvv/autovec/binop/narrow-1.c: New test.
> > * gcc.target/riscv/rvv/autovec/binop/narrow-2.c: New test.
> > * gcc.target/riscv/rvv/autovec/binop/narrow-3.c: New test.
> > * gcc.target/riscv/rvv/autovec/binop/narrow_run-1.c: New test.
> > * gcc.target/riscv/rvv/autovec/binop/narrow_run-2.c: New test.
> > * gcc.target/riscv/rvv/autovec/binop/narrow_run-3.c: New test.
> >
> > ---
> > gcc/config/riscv/autovec-opt.md   | 46 +
> > gcc/config/riscv/autovec.md   | 43 ++--
> > .../riscv/rvv/autovec/binop/narrow-1.c| 31 
> > .../riscv/rvv/autovec/binop/narrow-2.c| 32 
> > .../riscv/rvv/autovec/binop/narrow-3.c| 31 
> > .../riscv/rvv/autovec/binop/narrow_run-1.c| 50 +++
> > .../riscv/rvv/autovec/binop/narrow_run-2.c| 46 +
> > .../riscv/rvv/autovec/binop/narrow_run-3.c| 46 +
> > 8 files changed, 311 insertions(+), 14 deletions(-)
> > create mode 100644 
> > gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow-1.c
> > create mode 100644 
> > gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow-2.c
> > create mode 100644 
> > gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow-3.c
> > create mode 100644 
> > gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/narrow_run-1.c
> > create m

Re: [PATCH v1] RISC-V: Support RVV FP16 MISC vget/vset intrinsic API

2023-06-12 Thread Kito Cheng via Gcc-patches
lgtm

On Mon, Jun 12, 2023 at 3:43 PM juzhe.zh...@rivai.ai
 wrote:
>
> LGTM
>
>
>
> juzhe.zh...@rivai.ai
>
> From: pan2.li
> Date: 2023-06-12 15:40
> To: gcc-patches
> CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
> Subject: [PATCH v1] RISC-V: Support RVV FP16 MISC vget/vset intrinsic API
> From: Pan Li 
>
> This patch supports the intrinsic API of FP16 ZVFHMIN vget/vset. From
> the user's perspective, it is reasonable to do some get/set operations
> for the vfloat16*_t types when only ZVFHMIN is enabled.
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-types.def
> (vfloat16m1_t): Add type to lmul1 ops.
> (vfloat16m2_t): Likewise.
> (vfloat16m4_t): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: Add new test cases.
> * gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Likewise.
> ---
> .../riscv/riscv-vector-builtins-types.def |  3 ++
> .../riscv/rvv/base/zvfh-over-zvfhmin.c| 15 +++--
> .../riscv/rvv/base/zvfhmin-intrinsic.c| 32 ++-
> 3 files changed, 40 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def 
> b/gcc/config/riscv/riscv-vector-builtins-types.def
> index db8e61fea6a..4926bd8a2d2 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-types.def
> +++ b/gcc/config/riscv/riscv-vector-builtins-types.def
> @@ -1091,6 +1091,7 @@ DEF_RVV_LMUL1_OPS (vuint8m1_t, 0)
> DEF_RVV_LMUL1_OPS (vuint16m1_t, 0)
> DEF_RVV_LMUL1_OPS (vuint32m1_t, 0)
> DEF_RVV_LMUL1_OPS (vuint64m1_t, RVV_REQUIRE_ELEN_64)
> +DEF_RVV_LMUL1_OPS (vfloat16m1_t, RVV_REQUIRE_ELEN_FP_16)
> DEF_RVV_LMUL1_OPS (vfloat32m1_t, RVV_REQUIRE_ELEN_FP_32)
> DEF_RVV_LMUL1_OPS (vfloat64m1_t, RVV_REQUIRE_ELEN_FP_64)
> @@ -1102,6 +1103,7 @@ DEF_RVV_LMUL2_OPS (vuint8m2_t, 0)
> DEF_RVV_LMUL2_OPS (vuint16m2_t, 0)
> DEF_RVV_LMUL2_OPS (vuint32m2_t, 0)
> DEF_RVV_LMUL2_OPS (vuint64m2_t, RVV_REQUIRE_ELEN_64)
> +DEF_RVV_LMUL2_OPS (vfloat16m2_t, RVV_REQUIRE_ELEN_FP_16)
> DEF_RVV_LMUL2_OPS (vfloat32m2_t, RVV_REQUIRE_ELEN_FP_32)
> DEF_RVV_LMUL2_OPS (vfloat64m2_t, RVV_REQUIRE_ELEN_FP_64)
> @@ -1113,6 +1115,7 @@ DEF_RVV_LMUL4_OPS (vuint8m4_t, 0)
> DEF_RVV_LMUL4_OPS (vuint16m4_t, 0)
> DEF_RVV_LMUL4_OPS (vuint32m4_t, 0)
> DEF_RVV_LMUL4_OPS (vuint64m4_t, RVV_REQUIRE_ELEN_64)
> +DEF_RVV_LMUL4_OPS (vfloat16m4_t, RVV_REQUIRE_ELEN_FP_16)
> DEF_RVV_LMUL4_OPS (vfloat32m4_t, RVV_REQUIRE_ELEN_FP_32)
> DEF_RVV_LMUL4_OPS (vfloat64m4_t, RVV_REQUIRE_ELEN_FP_64)
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
> index c3ed4191a36..1d82cc8de2d 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
> @@ -61,6 +61,14 @@ vfloat16m8_t test_vundefined_f16m8() {
>return __riscv_vundefined_f16m8();
> }
> +vfloat16m2_t test_vset_v_f16m1_f16m2(vfloat16m2_t dest, size_t index, 
> vfloat16m1_t val) {
> +  return __riscv_vset_v_f16m1_f16m2(dest, 0, val);
> +}
> +
> +vfloat16m4_t test_vget_v_f16m8_f16m4(vfloat16m8_t src, size_t index) {
> +  return __riscv_vget_v_f16m8_f16m4(src, 0);
> +}
> +
> /* { dg-final { scan-assembler-times 
> {vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 3 } } */
> /* { dg-final { scan-assembler-times 
> {vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*m4,\s*t[au],\s*m[au]} 2 } } */
> /* { dg-final { scan-assembler-times 
> {vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*m8,\s*t[au],\s*m[au]} 1 } } */
> @@ -71,7 +79,10 @@ vfloat16m8_t test_vundefined_f16m8() {
> /* { dg-final { scan-assembler-times {vfncvt\.f\.f\.w\s+v[0-9]+,\s*v[0-9]+} 2 
> } } */
> /* { dg-final { scan-assembler-times {vle16\.v\s+v[0-9]+,\s*0\([0-9ax]+\)} 7 
> } } */
> /* { dg-final { scan-assembler-times {vse16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 
> 6 } } */
> -/* { dg-final { scan-assembler-times 
> {vl4re16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 1 } } */
> +/* { dg-final { scan-assembler-times 
> {vl1re16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 1 } } */
> +/* { dg-final { scan-assembler-times 
> {vl2re16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 1 } } */
> +/* { dg-final { scan-assembler-times 
> {vl4re16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 3 } } */
> /* { dg-final { scan-assembler-times 
> {vl8re16\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 1 } } */
> -/* { dg-final { scan-assembler-times {vs4r\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 
> 1 } } */
> +/* { dg-final { scan-assembler-times {vs2r\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 
> 1 } } */
> +/* { dg-final { scan-assembler-times {vs4r\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 
> 3 } } */
> /* { dg-final { scan-assembler-times {vs8r\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 
> 5 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c
> index 8d39a2ed4c2..1026b3f82f1 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c
> +++ b/gcc/tes

RE: [PATCH v5] RISC-V: Add vector psabi checking.

2023-06-12 Thread Li, Pan2 via Gcc-patches
Sure thing, will commit it after all riscv.exp and rvv.exp tests pass.

Pan

-Original Message-
From: Kito Cheng  
Sent: Monday, June 12, 2023 8:43 PM
To: Wang, Yanzhang 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Li, Pan2 
Subject: Re: [PATCH v5] RISC-V: Add vector psabi checking.

Hi Yan-Zhang:

OK with one minor, go ahead IF the regression is clean.

Hi Pan:

Could you help to verify this patch and commit if the regression is clean?

thanks :)

> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp 
> b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> index 5e69235a268..ad79d0e9a8d 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> @@ -43,7 +43,7 @@ dg-init
>  # Main loop.
>  set CFLAGS "$DEFAULT_CFLAGS -march=$gcc_march -mabi=$gcc_mabi -O3"

Add -Wno-psabi here rather than below, and also add it for
g++.target/riscv/rvv/rvv.exp

>  dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/base/*.\[cS\]]] \
> -   "" $CFLAGS
> +   "-Wno-psabi" $CFLAGS
>  gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vsetvl/*.\[cS\]]] \
> "" $CFLAGS
>  dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/*.\[cS\]]] \


Re: [PATCH v2] rs6000: fmr gets used instead of faster xxlor [PR93571]

2023-06-12 Thread Segher Boessenkool
Hi!

On Sat, Feb 25, 2023 at 03:20:33PM +0530, Ajit Agarwal wrote:
> Here is the patch that uses xxlor instead of fmr where possible.
> Performance results show that fmr is better on power9 and
> power10 architectures whereas xxlor is better on power7 and
> power8 architectures.  fmr is the only option before p7.

>  ;; The ISA we implement.
> -(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10"
> +(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p7p8v,p9,p9v,p9kf,p9tf,p10"
>(const_string "any"))

This isn't really about what insn we *can* use here.

> + (and (eq_attr "isa" "p7p8v")
> +   (match_test "TARGET_VSX && !TARGET_P9_VECTOR"))
> + (const_int 1)

What is needed here is to test the *tune* setting.  For example, if someone
uses -mcpu=power8 -mtune=power9 (this is a setting that is really used,
or was a few years ago anyway), you *do* want fmr insns generated.

So don't do this via the isa attribute at all, just add some insn
condition (testing the tune setting)?


Segher


[PATCH] Remove DEFAULT_MATCHPD_PARTITIONS macro

2023-06-12 Thread Tamar Christina via Gcc-patches
Hi All,

As Jakub pointed out, DEFAULT_MATCHPD_PARTITIONS
is now unused and can be removed.

Bootstrapped aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config.in: Regenerate.
* configure: Regenerate.
* configure.ac: Remove DEFAULT_MATCHPD_PARTITIONS.

--- inline copy of patch -- 
diff --git a/gcc/config.in b/gcc/config.in
index 
cf2f284378447c8f8e2f838a786dba23d6086fe3..0e62b9fbfc93da8fb511bf581ef9457e55c8bc6c
 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -67,12 +67,6 @@
 #endif
 
 
-/* Define to larger than one set the number of match.pd partitions to make. */
-#ifndef USED_FOR_TARGET
-#undef DEFAULT_MATCHPD_PARTITIONS
-#endif
-
-
 /* Define to larger than zero set the default stack clash protector size. */
 #ifndef USED_FOR_TARGET
 #undef DEFAULT_STK_CLASH_GUARD_SIZE
diff --git a/gcc/configure b/gcc/configure
index 
5f67808b77441ba730183eef90367b70a51b08a0..3aa2534f4d4aa4136e9aaf5de51b8e6b67c48d5a
 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -7908,11 +7908,6 @@ if (test $DEFAULT_MATCHPD_PARTITIONS -lt 1); then
 fi
 
 
-cat >>confdefs.h <<_ACEOF
-#define DEFAULT_MATCHPD_PARTITIONS $DEFAULT_MATCHPD_PARTITIONS
-_ACEOF
-
-
 
 # Enable __cxa_atexit for C++.
 # Check whether --enable-__cxa_atexit was given.
@@ -19850,7 +19845,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19853 "configure"
+#line 19848 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -19956,7 +19951,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19959 "configure"
+#line 19954 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 
cc8dd9e20bf4e3994af99a74ec2a0fe61b0fb1ae..524ef76ec7deb6357d616b6dc6e016d2a9804816
 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -932,8 +932,6 @@ if (test $DEFAULT_MATCHPD_PARTITIONS -lt 1); then
Cannot be negative.]))
 fi
 
-AC_DEFINE_UNQUOTED(DEFAULT_MATCHPD_PARTITIONS, $DEFAULT_MATCHPD_PARTITIONS,
-   [Define to larger than one set the number of match.pd partitions to 
make.])
 AC_SUBST(DEFAULT_MATCHPD_PARTITIONS)
 
 # Enable __cxa_atexit for C++.









Re: [PATCH] inline: improve internal function costs

2023-06-12 Thread Richard Biener via Gcc-patches
On Mon, 12 Jun 2023, Andre Vieira (lists) wrote:

> 
> 
> On 05/06/2023 04:04, Jan Hubicka wrote:
> >> On Thu, 1 Jun 2023, Andre Vieira (lists) wrote:
> >>
> >>> Hi,
> >>>
> >>> This is a follow-up of the internal function patch to add widening and
> >>> narrowing patterns.  This patch improves the inliner cost estimation for
> >>> internal functions.
> >>
> >> I have no idea why calls are special in IPA analyze_function_body
> >> and so I cannot say whether treating all internal fn calls as
> >> non-calls is correct there.  Honza?
> > 
> > The reason is that normal statements are accounted as part of the
> > function body, while calls have their costs attached to call edges
> > (so the cost can be adjusted when the call is inlined or otherwise optimized).
> > 
> > However, since internal functions have no cgraph edges, this looks like
> > a bug that we do not test for.  (The code was written before internal
> > calls were introduced.)
> >
> 
> This sounds to me like you agree with my approach to treat internal calls
> different to regular calls.
> 
> > I wonder if we don't want to have is_noninternal_gimple_call that could
> > be used by IPA code to test whether cgraph edge should exist for
> > the statement.
> 
> I'm happy to add such a helper function @richi,rsandifo: you ok with that?

It's a bit of an ugly name; if we want something that keys on calls
that have an edge, the name should make it obvious that it does that.
I wouldn't add is_noninternal_gimple_call.  With LTO and libgcc and internal
optab fns it's also less obvious in cases where we want to have, say,
.DIVMODDI3 (...), which in the end maps to an LTO'd libcall from libgcc.a
...

Richard.


Re: [PATCH] Remove DEFAULT_MATCHPD_PARTITIONS macro

2023-06-12 Thread Richard Biener via Gcc-patches
On Mon, 12 Jun 2023, Tamar Christina wrote:

> Hi All,
> 
> As Jakub pointed out, DEFAULT_MATCHPD_PARTITIONS
> is now unused and can be removed.
> 
> Bootstrapped aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

OK.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * config.in: Regenerate.
>   * configure: Regenerate.
>   * configure.ac: Remove DEFAULT_MATCHPD_PARTITIONS.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/config.in b/gcc/config.in
> index 
> cf2f284378447c8f8e2f838a786dba23d6086fe3..0e62b9fbfc93da8fb511bf581ef9457e55c8bc6c
>  100644
> --- a/gcc/config.in
> +++ b/gcc/config.in
> @@ -67,12 +67,6 @@
>  #endif
>  
>  
> -/* Define to larger than one set the number of match.pd partitions to make. 
> */
> -#ifndef USED_FOR_TARGET
> -#undef DEFAULT_MATCHPD_PARTITIONS
> -#endif
> -
> -
>  /* Define to larger than zero set the default stack clash protector size. */
>  #ifndef USED_FOR_TARGET
>  #undef DEFAULT_STK_CLASH_GUARD_SIZE
> diff --git a/gcc/configure b/gcc/configure
> index 
> 5f67808b77441ba730183eef90367b70a51b08a0..3aa2534f4d4aa4136e9aaf5de51b8e6b67c48d5a
>  100755
> --- a/gcc/configure
> +++ b/gcc/configure
> @@ -7908,11 +7908,6 @@ if (test $DEFAULT_MATCHPD_PARTITIONS -lt 1); then
>  fi
>  
>  
> -cat >>confdefs.h <<_ACEOF
> -#define DEFAULT_MATCHPD_PARTITIONS $DEFAULT_MATCHPD_PARTITIONS
> -_ACEOF
> -
> -
>  
>  # Enable __cxa_atexit for C++.
>  # Check whether --enable-__cxa_atexit was given.
> @@ -19850,7 +19845,7 @@ else
>lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
>lt_status=$lt_dlunknown
>cat > conftest.$ac_ext <<_LT_EOF
> -#line 19853 "configure"
> +#line 19848 "configure"
>  #include "confdefs.h"
>  
>  #if HAVE_DLFCN_H
> @@ -19956,7 +19951,7 @@ else
>lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
>lt_status=$lt_dlunknown
>cat > conftest.$ac_ext <<_LT_EOF
> -#line 19959 "configure"
> +#line 19954 "configure"
>  #include "confdefs.h"
>  
>  #if HAVE_DLFCN_H
> diff --git a/gcc/configure.ac b/gcc/configure.ac
> index 
> cc8dd9e20bf4e3994af99a74ec2a0fe61b0fb1ae..524ef76ec7deb6357d616b6dc6e016d2a9804816
>  100644
> --- a/gcc/configure.ac
> +++ b/gcc/configure.ac
> @@ -932,8 +932,6 @@ if (test $DEFAULT_MATCHPD_PARTITIONS -lt 1); then
>   Cannot be negative.]))
>  fi
>  
> -AC_DEFINE_UNQUOTED(DEFAULT_MATCHPD_PARTITIONS, $DEFAULT_MATCHPD_PARTITIONS,
> - [Define to larger than one set the number of match.pd partitions to 
> make.])
>  AC_SUBST(DEFAULT_MATCHPD_PARTITIONS)
>  
>  # Enable __cxa_atexit for C++.
> 
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: [PATCH] Remove DEFAULT_MATCHPD_PARTITIONS macro

2023-06-12 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 12, 2023 at 01:53:26PM +0100, Tamar Christina wrote:
> gcc/ChangeLog:
> 
>   * config.in: Regenerate.
>   * configure: Regenerate.
>   * configure.ac: Remove DEFAULT_MATCHPD_PARTITIONS.

Ok, thanks.

Jakub



[PATCH] Fix disambiguation against .MASK_STORE

2023-06-12 Thread Richard Biener via Gcc-patches
Alias analysis was treating .MASK_STORE as storing a full vector,
which means we disambiguate against decls smaller than the vector size.
That's of course wrong, and a similar issue was fixed for DSE already.
The following makes sure we set the size of the access to unknown
and only constrain max_size.

This fixes runtime execution FAILs of gfortran.dg/matmul_2.f90,
gfortran.dg/matmul_6.f90 and gfortran.dg/pr91577.f90 when using
AVX512 with full masked loop vectorization on Zen4.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to
trunk so far.

* tree-ssa-alias.cc (call_may_clobber_ref_p_1): For
.MASK_STORE and friend set the size of the access to
unknown.
---
 gcc/tree-ssa-alias.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/tree-ssa-alias.cc b/gcc/tree-ssa-alias.cc
index 79ed956e300..b5476e8b41e 100644
--- a/gcc/tree-ssa-alias.cc
+++ b/gcc/tree-ssa-alias.cc
@@ -3072,6 +3072,9 @@ call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref, bool 
tbaa_p)
  ao_ref lhs_ref;
  ao_ref_init_from_ptr_and_size (&lhs_ref, gimple_call_arg (call, 0),
 TYPE_SIZE_UNIT (TREE_TYPE (rhs)));
+ /* We cannot make this a known-size access since otherwise
+we disambiguate against stores to decls that are smaller.  */
+ lhs_ref.size = -1;
  lhs_ref.ref_alias_set = lhs_ref.base_alias_set
= tbaa_p ? get_deref_alias_set
   (TREE_TYPE (gimple_call_arg (call, 1))) : 0;
-- 
2.35.3


[PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-12 Thread Jiufu Guo via Gcc-patches
Hi,

For stack_tie, currently the insn below is generated:
(insn 15 14 16 3 (parallel [
 (set (mem/c:BLK (reg/f:DI 1 1) [1  A8])
 (const_int 0 [0]))
 ]) "/home/guojiufu/temp/gdb.c":13:3 922 {stack_tie}
  (nil))

It is "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])".  This maybe
looks like "a memory block is zerored", while actually stack_tie
may be more like a placeholder, and does not generate any thing.

To avoid potential misunderstand, "UNPSEC:BLK [(const_int 0)].." could
be used here like other ports.

This patch does this.  Bootstrap®test pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu Guo)

---
 gcc/config/rs6000/predicates.md   | 11 +++
 gcc/config/rs6000/rs6000-logue.cc |  4 +++-
 gcc/config/rs6000/rs6000.cc   |  4 
 gcc/config/rs6000/rs6000.md   | 14 ++
 4 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index a16ee30f0c0..4748cb37ce8 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1854,10 +1854,13 @@ (define_predicate "stmw_operation"
 (define_predicate "tie_operand"
   (match_code "parallel")
 {
-  return (GET_CODE (XVECEXP (op, 0, 0)) == SET
- && MEM_P (XEXP (XVECEXP (op, 0, 0), 0))
- && GET_MODE (XEXP (XVECEXP (op, 0, 0), 0)) == BLKmode
- && XEXP (XVECEXP (op, 0, 0), 1) == const0_rtx);
+  rtx set = XVECEXP (op, 0, 0);
+  return (GET_CODE (set) == SET
+ && MEM_P (SET_DEST (set))
+ && GET_MODE (SET_DEST (set)) == BLKmode
+ && GET_CODE (SET_SRC (set)) == UNSPEC
+ && XINT (SET_SRC (set), 1) == UNSPEC_TIE
+ && XVECEXP (SET_SRC (set), 0, 0) == const0_rtx);
 })
 
 ;; Match a small code model toc reference (or medium and large
diff --git a/gcc/config/rs6000/rs6000-logue.cc 
b/gcc/config/rs6000/rs6000-logue.cc
index bc6b153b59f..b99f43a8282 100644
--- a/gcc/config/rs6000/rs6000-logue.cc
+++ b/gcc/config/rs6000/rs6000-logue.cc
@@ -1463,7 +1463,9 @@ rs6000_emit_stack_tie (rtx fp, bool hard_frame_needed)
   while (--i >= 0)
 {
   rtx mem = gen_frame_mem (BLKmode, regs[i]);
-  RTVEC_ELT (p, i) = gen_rtx_SET (mem, const0_rtx);
+  RTVEC_ELT (p, i)
+   = gen_rtx_SET (mem, gen_rtx_UNSPEC (BLKmode, gen_rtvec (1, const0_rtx),
+   UNSPEC_TIE));
 }
 
   emit_insn (gen_stack_tie (gen_rtx_PARALLEL (VOIDmode, p)));
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index d197c3f3289..0c81ebea711 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1760,6 +1760,10 @@ static const struct attribute_spec 
rs6000_attribute_table[] =
 
 #undef TARGET_UPDATE_IPA_FN_TARGET_INFO
 #define TARGET_UPDATE_IPA_FN_TARGET_INFO rs6000_update_ipa_fn_target_info
+
+#undef TARGET_CONST_ANCHOR
+#define TARGET_CONST_ANCHOR 0x8000
+
 
 
 /* Processor table.  */
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b0db8ae508d..fdcf8347812 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -158,6 +158,7 @@ (define_c_enum "unspec"
UNSPEC_HASHCHK
UNSPEC_XXSPLTIDP_CONST
UNSPEC_XXSPLTIW_CONST
+   UNSPEC_TIE
   ])
 
 ;;
@@ -10828,7 +10829,9 @@ (define_expand "restore_stack_block"
   operands[4] = gen_frame_mem (Pmode, operands[1]);
   p = rtvec_alloc (1);
   RTVEC_ELT (p, 0) = gen_rtx_SET (gen_frame_mem (BLKmode, operands[0]),
- const0_rtx);
+ gen_rtx_UNSPEC (BLKmode,
+ gen_rtvec (1, const0_rtx),
+ UNSPEC_TIE));
   operands[5] = gen_rtx_PARALLEL (VOIDmode, p);
 })
 
@@ -10866,7 +10869,9 @@ (define_expand "restore_stack_nonlocal"
   operands[5] = gen_frame_mem (Pmode, operands[3]);
   p = rtvec_alloc (1);
   RTVEC_ELT (p, 0) = gen_rtx_SET (gen_frame_mem (BLKmode, operands[0]),
- const0_rtx);
+ gen_rtx_UNSPEC (BLKmode,
+ gen_rtvec (1, const0_rtx),
+ UNSPEC_TIE));
   operands[6] = gen_rtx_PARALLEL (VOIDmode, p);
 })
 
@@ -13898,7 +13903,8 @@ (define_insn "*save_fpregs__r1"
 ; not be moved over loads from or stores to stack memory.
 (define_insn "stack_tie"
   [(match_parallel 0 "tie_operand"
-  [(set (mem:BLK (reg 1)) (const_int 0))])]
+  [(set (mem:BLK (reg 1))
+   (unspec:BLK [(const_int 0)] UNSPEC_TIE))])]
   ""
   ""
   [(set_attr "length" "0")])
@@ -13910,7 +13916,7 @@ (define_insn "stack_restore_tie"
   [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
(plus:SI (match_operand:SI 1 "gpc_reg_operand" "r,r")
 (match_operand:SI 2 "reg_or_cint_operand" "O,rI")))
-   (set (mem:BLK (scratch)) (const_int 0))]
+   (set (mem:BLK (scratc

[PATCH] RISC-V: Enhance RVV VLA SLP auto-vectorization with decompress operation

2023-06-12 Thread juzhe . zhong
From: Juzhe-Zhong 

According to RVV ISA:
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc

We can enhance VLA SLP auto-vectorization with the decompress operation
(16.5.1. Synthesizing vdecompress).

Case 1 (nunits = POLY_INT_CST [16, 16]):
_48 = VEC_PERM_EXPR <_37, _35, { 0, POLY_INT_CST [16, 16], 1, POLY_INT_CST [17, 
16], 2, POLY_INT_CST [18, 16], ... }>;
We can optimize such a VLA SLP permutation pattern into:
_48 = vdecompress (_37, _35, mask = { 0, 1, 0, 1, ... });

Case 2 (nunits = POLY_INT_CST [16, 16]):
_23 = VEC_PERM_EXPR <_46, _44, { POLY_INT_CST [1, 1], POLY_INT_CST [3, 3], 
POLY_INT_CST [2, 1], POLY_INT_CST [4, 3], POLY_INT_CST [3, 1], POLY_INT_CST [5, 
3], ... }>;
We can optimize such a VLA SLP permutation pattern into:
_48 = vdecompress (slidedown(_46, 1/2 nunits), slidedown(_44, 1/2 nunits), mask
= { 0, 1, 0, 1, ... });

For example:
void __attribute__ ((noinline, noclone))
vec_slp (uint64_t *restrict a, uint64_t b, uint64_t c, int n)
{
  for (int i = 0; i < n; ++i)
{
  a[i * 2] += b;
  a[i * 2 + 1] += c;
}
}

ASM:
...
vid.v   v0
vand.vi v0,v0,1
vmseq.vi v0,v0,1  ===> mask = { 0, 1, 0, 1, ... }
vdecompress:
viota.m v3,v0   
vrgather.vv v2,v1,v3,v0.t
...

gcc/ChangeLog:

* config/riscv/riscv-v.cc (emit_vlmax_decompress_insn): New function.
(expand_const_vector): Enhance repeating sequence mask.
(shuffle_decompress_patterns): New function.
(expand_vec_perm_const_1): Add decompress optimization.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/slp-8.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-9.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-9.c: New test.

---
 gcc/config/riscv/riscv-v.cc   | 146 +-
 .../riscv/rvv/autovec/partial/slp-8.c |  30 
 .../riscv/rvv/autovec/partial/slp-9.c |  31 
 .../riscv/rvv/autovec/partial/slp_run-8.c |  30 
 .../riscv/rvv/autovec/partial/slp_run-9.c |  30 
 5 files changed, 260 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-9.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-9.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index e1b85a5af91..3cea6b25261 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -836,6 +836,46 @@ emit_vlmax_masked_gather_mu_insn (rtx target, rtx op, rtx 
sel, rtx mask)
   emit_vlmax_masked_mu_insn (icode, RVV_BINOP_MU, ops);
 }
 
+/* According to RVV ISA spec (16.5.1. Synthesizing vdecompress):
+   https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc
+
+  There is no inverse vdecompress provided, as this operation can be readily
+  synthesized using iota and a masked vrgather:
+
+  Desired functionality of 'vdecompress'
+   7 6 5 4 3 2 1 0 # vid
+
+ e d c b a # packed vector of 5 elements
+   1 0 0 1 1 1 0 1 # mask vector of 8 elements
+   p q r s t u v w # destination register before vdecompress
+
+   e q r d c b v a # result of vdecompress
+   # v0 holds mask
+   # v1 holds packed data
+   # v11 holds input expanded vector and result
+   viota.m v10, v0 # Calc iota from mask in v0
+   vrgather.vv v11, v1, v10, v0.t  # Expand into destination
+ p q r s t u v w  # v11 destination register
+  e d c b a  # v1 source vector
+ 1 0 0 1 1 1 0 1  # v0 mask vector
+
+ 4 4 4 3 2 1 1 0  # v10 result of viota.m
+ e q r d c b v a  # v11 destination after vrgather using viota.m under mask
+*/
+static void
+emit_vlmax_decompress_insn (rtx target, rtx op, rtx mask)
+{
+  machine_mode data_mode = GET_MODE (target);
+  machine_mode sel_mode = related_int_vector_mode (data_mode).require ();
+  if (GET_MODE_INNER (data_mode) == QImode)
+sel_mode = get_vector_mode (HImode, GET_MODE_NUNITS (data_mode)).require 
();
+
+  rtx sel = gen_reg_rtx (sel_mode);
+  rtx iota_ops[] = {sel, mask};
+  emit_vlmax_insn (code_for_pred_iota (sel_mode), RVV_UNOP, iota_ops);
+  emit_vlmax_masked_gather_mu_insn (target, op, sel, mask);
+}
+
 /* Emit merge instruction.  */
 
 static machine_mode
@@ -934,14 +974,41 @@ expand_const_vector (rtx target, rtx src)
 {
   machine_mode mode = GET_MODE (target);
   scalar_mode elt_mode = GET_MODE_INNER (mode);
+  poly_uint64 nunits = GET_MODE_NUNITS (mode);
+  unsigned int nelts_per_pattern = CONST_VECTOR_NELTS_PER_PATTERN (src);
+  unsigned int npatterns = CONST_VECTOR_NPATTERNS (src);
   if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
 {
   rtx elt;
-  gcc_assert (
-   const_vec_duplicate_p (s

RE: [PATCH v5] RISC-V: Add vector psabi checking.

2023-06-12 Thread Wang, Yanzhang via Gcc-patches
I found that adding -Wno-psabi to CFLAGS will be overridden by
dg-options. It seems we can only add this option to the third
arg of dg-runtest. Attaching the dg-runtest comments:

# dg-runtest -- simple main loop useful to most testsuites
#
# OPTIONS is a set of options to always pass.
# DEFAULT_EXTRA_OPTIONS is a set of options to pass if the testcase
# doesn't specify any (with dg-option).

> -Original Message-
> From: Kito Cheng 
> Sent: Monday, June 12, 2023 8:43 PM
> To: Wang, Yanzhang 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Li, Pan2
> 
> Subject: Re: [PATCH v5] RISC-V: Add vector psabi checking.
> 
> Hi Yan-Zhang:
> 
> OK with one minor, go ahead IF the regression is clean.
> 
> Hi Pan:
> 
> Could you help to verify this patch and commit if the regression is clean?
> 
> thanks :)
> 
> > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> > index 5e69235a268..ad79d0e9a8d 100644
> > --- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> > +++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> > @@ -43,7 +43,7 @@ dg-init
> >  # Main loop.
> >  set CFLAGS "$DEFAULT_CFLAGS -march=$gcc_march -mabi=$gcc_mabi -O3"
> 
> Add -Wno-psabi here rather than below, and also add it for
> g++.target/riscv/rvv/rvv.exp
> 
> >  dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/base/*.\[cS\]]] \
> > -   "" $CFLAGS
> > +   "-Wno-psabi" $CFLAGS
> >  gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vsetvl/*.\[cS\]]]
> \
> > "" $CFLAGS
> >  dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/*.\[cS\]]] \


[PATCH] New finish_compare_by_pieces target hook (for x86).

2023-06-12 Thread Roger Sayle

The following simple test case, from PR 104610, shows that memcmp () == 0
can result in some bizarre code sequences on x86.

int foo(char *a)
{
static const char t[] = "0123456789012345678901234567890";
return __builtin_memcmp(a, &t[0], sizeof(t)) == 0;
}

with -O2 currently contains both:
        xorl    %eax, %eax
        xorl    $1, %eax
and also
        movl    $1, %eax
        xorl    $1, %eax

Changing the return type of foo to _Bool results in the equally
bizarre:
        xorl    %eax, %eax
        testl   %eax, %eax
        sete    %al
and also
        movl    $1, %eax
        testl   %eax, %eax
        sete    %al

All these sequences set the result to a constant, but this optimization
opportunity only occurs very late during compilation, by basic block
duplication in the 322r.bbro pass, too late for CSE or peephole2 to
do anything about it.  The problem is that the idiom expanded by
compare_by_pieces for __builtin_memcmp_eq contains basic blocks that
can't easily be optimized by if-conversion due to the multiple
incoming edges on the fail block.

In summary, compare_by_pieces generates code that looks like:

if (x[0] != y[0]) goto fail_label;
if (x[1] != y[1]) goto fail_label;
...
if (x[n] != y[n]) goto fail_label;
result = 1;
goto end_label;
fail_label:
result = 0;
end_label:

In theory, the RTL if-conversion pass could be enhanced to tackle
arbitrarily complex if-then-else graphs, but the solution proposed
here is to allow suitable targets to perform if-conversion during
compare_by_pieces.  The x86, for example, can take advantage that
all of the above comparisons set and test the zero flag (ZF), which
can then be used in combination with sete.  Hence compare_by_pieces
could instead generate:

if (x[0] != y[0]) goto fail_label;
if (x[1] != y[1]) goto fail_label;
...
if (x[n] != y[n]) goto fail_label;
fail_label:
sete result

which requires one less basic block, and the redundant conditional
branch to a label immediately after is cleaned up by GCC's existing
RTL optimizations.

For the test case above, where -O2 -msse4 previously generated:

foo:    movdqu  (%rdi), %xmm0
        pxor    .LC0(%rip), %xmm0
        ptest   %xmm0, %xmm0
        je      .L5
.L2:    movl    $1, %eax
        xorl    $1, %eax
        ret
.L5:    movdqu  16(%rdi), %xmm0
        pxor    .LC1(%rip), %xmm0
        ptest   %xmm0, %xmm0
        jne     .L2
        xorl    %eax, %eax
        xorl    $1, %eax
        ret

we now generate:

foo:    movdqu  (%rdi), %xmm0
        pxor    .LC0(%rip), %xmm0
        ptest   %xmm0, %xmm0
        jne     .L2
        movdqu  16(%rdi), %xmm0
        pxor    .LC1(%rip), %xmm0
        ptest   %xmm0, %xmm0
.L2:    sete    %al
        movzbl  %al, %eax
        ret

Using a target hook allows the large amount of intelligence already in
compare_by_pieces to be re-used by the i386 backend, but this can also
help other backends with condition flags where the equality result can
be materialized.
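
For illustration, the generic fallback that keeps today's behaviour would
roughly follow the idiom shown earlier; this is only a sketch of what the
default hook could look like (the actual body lives in the targhooks.cc
change of the patch and may differ in detail):

/* Sketch of a default finish_compare_by_pieces: materialize 1 on the
   fall-through (all pieces equal) path and 0 at the fail label, matching
   the existing compare_by_pieces idiom.  Illustrative only.  */
void
default_finish_compare_by_pieces (rtx target, rtx_code_label *fail_label)
{
  rtx_code_label *end_label = gen_label_rtx ();
  emit_move_insn (target, const1_rtx);
  emit_jump (end_label);
  emit_label (fail_label);
  emit_move_insn (target, const0_rtx);
  emit_label (end_label);
}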

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-06-12  Roger Sayle  

gcc/ChangeLog
* config/i386/i386.cc (ix86_finish_compare_by_pieces): New
function to provide a backend specific implementation.
(TARGET_FINISH_COMPARE_BY_PIECES): Use the above function.

* doc/tm.texi.in (TARGET_FINISH_COMPARE_BY_PIECES): New @hook.
* doc/tm.texi: Regenerate.

* expr.cc (compare_by_pieces): Call finish_compare_by_pieces in
targetm to finalize the RTL expansion.  Move the current
implementation to a default target hook.
* target.def (finish_compare_by_pieces): New target hook to allow
compare_by_pieces to be customized by the target.
* targhooks.cc (default_finish_compare_by_pieces): Default
implementation moved here from expr.cc's compare_by_pieces.
* targhooks.h (default_finish_compare_by_pieces): Prototype.

gcc/testsuite/ChangeLog
* gcc.target/i386/pieces-memcmp-1.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 3a1444d..509c0ee 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -16146,6 +16146,20 @@ ix86_fp_compare_code_to_integer (enum rtx_code code)
 }
 }
 
+/* Override compare_by_pieces' default implementation using the state
+   of the CCZmode FLAGS_REG and sete instruction.  TARGET is the integral
+   mode result, and FAIL_LABEL is the branch target of mismatched
+   comparisons.  */
+
+void
+ix86_finish_compare_by_pieces (rtx target, rtx_code_label *fail_label)
+{
+  rtx tmp = gen_reg_rtx (QImode);
+  emit_label (fail_label);
+  ix86_expand_setcc (tmp, NE, gen_rtx_REG (CCZmode, FLAGS_REG), const0_rtx);
+  convert_move (target, tmp, 1);
+}
+
 /* Zero extend possibly SImode

Re: [PATCH v5] RISC-V: Add vector psabi checking.

2023-06-12 Thread Kito Cheng via Gcc-patches
How about appending to DEFAULT_CFLAGS?

On Mon, Jun 12, 2023 at 9:38 PM Wang, Yanzhang via Gcc-patches
 wrote:
>
> I found that add the -Wno-psabi to CFLAGS will be overrode by
> dg-options. It seems we can only add this option to the third
> arg of dg-runtest. Attach the dg-runtest comments,
>
> # dg-runtest -- simple main loop useful to most testsuites
> #
> # OPTIONS is a set of options to always pass.
> # DEFAULT_EXTRA_OPTIONS is a set of options to pass if the testcase
> # doesn't specify any (with dg-option).
>
> > -Original Message-
> > From: Kito Cheng 
> > Sent: Monday, June 12, 2023 8:43 PM
> > To: Wang, Yanzhang 
> > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Li, Pan2
> > 
> > Subject: Re: [PATCH v5] RISC-V: Add vector psabi checking.
> >
> > Hi Yan-Zhang:
> >
> > OK with one minor, go ahead IF the regression is clean.
> >
> > Hi Pan:
> >
> > Could you help to verify this patch and commit if the regression is clean?
> >
> > thanks :)
> >
> > > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> > b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> > > index 5e69235a268..ad79d0e9a8d 100644
> > > --- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> > > +++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> > > @@ -43,7 +43,7 @@ dg-init
> > >  # Main loop.
> > >  set CFLAGS "$DEFAULT_CFLAGS -march=$gcc_march -mabi=$gcc_mabi -O3"
> >
> > Add -Wno-psabi here rather than below, and also add it for
> > g++.target/riscv/rvv/rvv.exp
> >
> > >  dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/base/*.\[cS\]]] \
> > > -   "" $CFLAGS
> > > +   "-Wno-psabi" $CFLAGS
> > >  gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vsetvl/*.\[cS\]]]
> > \
> > > "" $CFLAGS
> > >  dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/*.\[cS\]]] \


RE: Re: [PATCH] RISC-V: Add RVV narrow shift right lowering auto-vectorization

2023-06-12 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito; I will take care of the define_insn_and_split part in
another PATCH.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Kito Cheng via Gcc-patches
Sent: Monday, June 12, 2023 8:45 PM
To: juzhe.zh...@rivai.ai
Cc: kito.cheng ; gcc-patches ; 
palmer ; palmer ; jeffreyalaw 
; Robin Dapp 
Subject: Re: Re: [PATCH] RISC-V: Add RVV narrow shift right lowering 
auto-vectorization

Yes, change all define_insn_and_split patterns to that style, "TARGET_VECTOR &&
can_create_pseudo_p ()" / "&& 1".  My understanding is that all those patterns
should only work before RA, so using "TARGET_VECTOR && can_create_pseudo_p ()"
everywhere is more reasonable.

On Mon, Jun 12, 2023 at 8:41 PM juzhe.zh...@rivai.ai  
wrote:
>
> You mean change all split pattern like this ?
> ;; This helps to match zero_extend + sign_extend + fma.
> (define_insn_and_split "*zero_sign_extend_fma"
>   [(set (match_operand:VWEXTI 0 "register_operand")
>   (plus:VWEXTI
> (mult:VWEXTI
>   (zero_extend:VWEXTI
> (match_operand: 2 "register_operand"))
>   (sign_extend:VWEXTI
> (match_operand: 3 "register_operand")))
> (match_operand:VWEXTI 1 "register_operand")))]
>   "TARGET_VECTOR && can_create_pseudo_p ()"
>   "#"
>   "&& 1"
>   [(const_int 0)]
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: Kito Cheng
> Date: 2023-06-12 20:37
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; Kito.cheng; palmer; palmer; jeffreyalaw; Robin Dapp
> Subject: Re: [PATCH] RISC-V: Add RVV narrow shift right lowering
> auto-vectorization
>
> We have two styles of predicate for those define_insn_and_split
> patterns, "TARGET_VECTOR"/"&& can_create_pseudo_p ()" and
> "TARGET_VECTOR && can_create_pseudo_p ()"/"&& 1"; could you unify all
> of them to the latter form? I feel that would be safer since those
> patterns are really only valid before RA (can_create_pseudo_p () ==
> true).  They are mostly used by the combine pass so it's mostly safe,
> but IMO we should fix this now rather than wait until we hit it later.
>
> OK for this patch as it is, and I would like to have a separated patch 
> to fix all those issues.
>
> On Mon, Jun 12, 2023 at 8:27 PM juzhe.zh...@rivai.ai 
>  wrote:
> >
> > Is this patch ok for trunk?
> >
> >
> >
> > juzhe.zh...@rivai.ai
> >
> > From: juzhe.zhong
> > Date: 2023-06-12 10:41
> > To: gcc-patches
> > CC: kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; 
> > Juzhe-Zhong
> > Subject: [PATCH] RISC-V: Add RVV narrow shift right lowering 
> > auto-vectorization
> > From: Juzhe-Zhong 
> >
> > Optimize the following auto-vectorization codes:
> > void foo (int16_t * __restrict a, int32_t * __restrict b, int32_t c, 
> > int n) {
> > for (int i = 0; i < n; i++)
> >   a[i] = b[i] >> c;
> > }
> >
> > Before this patch:
> > foo:
> > ble a3,zero,.L5
> > .L3:
> > vsetvli a5,a3,e32,m1,ta,ma
> > vle32.v v1,0(a1)
> > vsetvli a4,zero,e32,m1,ta,ma
> > vsra.vx v1,v1,a2
> > vsetvli zero,zero,e16,mf2,ta,ma
> > slli a7,a5,2
> > vncvt.x.x.w v1,v1
> > slli a6,a5,1
> > vsetvli zero,a5,e16,mf2,ta,ma
> > sub a3,a3,a5
> > vse16.v v1,0(a0)
> > add a1,a1,a7
> > add a0,a0,a6
> > bne a3,zero,.L3
> > .L5:
> > ret
> >
> > After this patch:
> > foo:
> > ble a3,zero,.L5
> > .L3:
> > vsetvli a5,a3,e32,m1,ta,ma
> > vle32.v v1,0(a1)
> > vsetvli a7,zero,e16,mf2,ta,ma
> > slli a6,a5,2
> > vnsra.wx v1,v1,a2
> > slli a4,a5,1
> > vsetvli zero,a5,e16,mf2,ta,ma
> > sub a3,a3,a5
> > vse16.v v1,0(a0)
> > add a1,a1,a6
> > add a0,a0,a4
> > bne a3,zero,.L3
> > .L5:
> > ret
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/autovec-opt.md 
> > (*vtrunc): New pattern.
> > (*trunc): Ditto.
> > * config/riscv/autovec.md (3): Change to 
> > define_insn_and_split.
> > (v3): Ditto.
> > (trunc2): Ditto.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/rvv/autovec/binop/narrow-1.c: New test.
> > * gcc.target/riscv/rvv/autovec/binop/narrow-2.c: New test.
> > * gcc.target/riscv/rvv/autovec/binop/narrow-3.c: New test.
> > * gcc.target/riscv/rvv/autovec/binop/narrow_run-1.c: New test.
> > * gcc.target/riscv/rvv/autovec/binop/narrow_run-2.c: New test.
> > * gcc.target/riscv/rvv/autovec/binop/narrow_run-3.c: New test.
> >
> > ---
> > gcc/config/riscv/autovec-opt.md   | 46 +
> > gcc/config/riscv/autovec.md   | 43 ++--
> > .../riscv/rvv/autovec/binop/narrow-1.c| 31 
> > .../riscv/rvv/autovec/binop/narrow-2.c| 32 
> > .../riscv/rvv/autovec/binop/narrow-3.c| 31 
> > .../riscv/rvv/autovec/binop/narrow_run-1.c| 50 +++
> > .../riscv/rvv/autovec/binop/narrow_run-2.c| 46 +
> > .../riscv/rvv/autovec/binop/narrow_run-3.c| 46 

RE: [PATCH v1] RISC-V: Support RVV FP16 MISC vlmul ext intrinsic API

2023-06-12 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito and Juzhe.

Pan

From: Kito Cheng 
Sent: Monday, June 12, 2023 11:33 AM
To: 钟居哲 
Cc: Li, Pan2 ; gcc-patches ; 
rdapp.gcc ; Jeff Law ; Wang, 
Yanzhang 
Subject: Re: [PATCH v1] RISC-V: Support RVV FP16 MISC vlmul ext intrinsic API

Lgtm too :)

钟居哲  wrote on Mon, Jun 12, 2023 at 05:48:
LGTM



juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-06-11 08:33
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; 
yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV FP16 MISC vlmul ext intrinsic API
From: Pan Li 

This patch supports the intrinsic API of FP16 ZVFHMIN vlmul ext.  Aka:

vfloat16*_t <==> vfloat16*_t.

From the user's perspective, it is reasonable to do some type conversion
between the different vfloat16*_t types when only ZVFHMIN is enabled.
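
For illustration, a user-side cast might look like the sketch below (not part
of this patch; the intrinsic spelling follows the usual __riscv_vlmul_ext_v_*
naming and should be checked against riscv_vector.h):

#include <riscv_vector.h>

/* Hypothetical sketch: widen the register-group size (LMUL) of an FP16
   value while only ZVFHMIN is enabled -- a pure register-group cast,
   no FP16 arithmetic involved.  */
vfloat16m1_t
ext_f16mf4_to_m1 (vfloat16mf4_t v)
{
  return __riscv_vlmul_ext_v_f16mf4_f16m1 (v);
}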

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-types.def
(vfloat16mf4_t): Add type to X2/X4/X8/X16/X32 vlmul ext ops.
(vfloat16mf2_t): Ditto.
(vfloat16m1_t): Ditto.
(vfloat16m2_t): Ditto.
(vfloat16m4_t): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: Add new test cases.
* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: Add new test cases.
---
.../riscv/riscv-vector-builtins-types.def | 15 ++
.../riscv/rvv/base/zvfh-over-zvfhmin.c| 18 +--
.../riscv/rvv/base/zvfhmin-intrinsic.c| 54 +--
3 files changed, 79 insertions(+), 8 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def 
b/gcc/config/riscv/riscv-vector-builtins-types.def
index 589ea532727..db8e61fea6a 100644
--- a/gcc/config/riscv/riscv-vector-builtins-types.def
+++ b/gcc/config/riscv/riscv-vector-builtins-types.def
@@ -978,6 +978,11 @@ DEF_RVV_X2_VLMUL_EXT_OPS (vuint32m4_t, 0)
DEF_RVV_X2_VLMUL_EXT_OPS (vuint64m1_t, RVV_REQUIRE_ELEN_64)
DEF_RVV_X2_VLMUL_EXT_OPS (vuint64m2_t, RVV_REQUIRE_ELEN_64)
DEF_RVV_X2_VLMUL_EXT_OPS (vuint64m4_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_X2_VLMUL_EXT_OPS (vfloat16mf4_t, RVV_REQUIRE_ELEN_FP_16 | 
RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_X2_VLMUL_EXT_OPS (vfloat16mf2_t, RVV_REQUIRE_ELEN_FP_16)
+DEF_RVV_X2_VLMUL_EXT_OPS (vfloat16m1_t, RVV_REQUIRE_ELEN_FP_16)
+DEF_RVV_X2_VLMUL_EXT_OPS (vfloat16m2_t, RVV_REQUIRE_ELEN_FP_16)
+DEF_RVV_X2_VLMUL_EXT_OPS (vfloat16m4_t, RVV_REQUIRE_ELEN_FP_16)
DEF_RVV_X2_VLMUL_EXT_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 | 
RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_X2_VLMUL_EXT_OPS (vfloat32m1_t, RVV_REQUIRE_ELEN_FP_32)
DEF_RVV_X2_VLMUL_EXT_OPS (vfloat32m2_t, RVV_REQUIRE_ELEN_FP_32)
@@ -1014,6 +1019,10 @@ DEF_RVV_X4_VLMUL_EXT_OPS (vuint32m1_t, 0)
DEF_RVV_X4_VLMUL_EXT_OPS (vuint32m2_t, 0)
DEF_RVV_X4_VLMUL_EXT_OPS (vuint64m1_t, RVV_REQUIRE_ELEN_64)
DEF_RVV_X4_VLMUL_EXT_OPS (vuint64m2_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_X4_VLMUL_EXT_OPS (vfloat16mf4_t, RVV_REQUIRE_ELEN_FP_16 | 
RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_X4_VLMUL_EXT_OPS (vfloat16mf2_t, RVV_REQUIRE_ELEN_FP_16)
+DEF_RVV_X4_VLMUL_EXT_OPS (vfloat16m1_t, RVV_REQUIRE_ELEN_FP_16)
+DEF_RVV_X4_VLMUL_EXT_OPS (vfloat16m2_t, RVV_REQUIRE_ELEN_FP_16)
DEF_RVV_X4_VLMUL_EXT_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 | 
RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_X4_VLMUL_EXT_OPS (vfloat32m1_t, RVV_REQUIRE_ELEN_FP_32)
DEF_RVV_X4_VLMUL_EXT_OPS (vfloat32m2_t, RVV_REQUIRE_ELEN_FP_32)
@@ -1040,6 +1049,9 @@ DEF_RVV_X8_VLMUL_EXT_OPS (vuint16m1_t, 0)
DEF_RVV_X8_VLMUL_EXT_OPS (vuint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_X8_VLMUL_EXT_OPS (vuint32m1_t, 0)
DEF_RVV_X8_VLMUL_EXT_OPS (vuint64m1_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_X8_VLMUL_EXT_OPS (vfloat16mf4_t, RVV_REQUIRE_ELEN_FP_16 | 
RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_X8_VLMUL_EXT_OPS (vfloat16mf2_t, RVV_REQUIRE_ELEN_FP_16)
+DEF_RVV_X8_VLMUL_EXT_OPS (vfloat16m1_t, RVV_REQUIRE_ELEN_FP_16)
DEF_RVV_X8_VLMUL_EXT_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 | 
RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_X8_VLMUL_EXT_OPS (vfloat32m1_t, RVV_REQUIRE_ELEN_FP_32)
DEF_RVV_X8_VLMUL_EXT_OPS (vfloat64m1_t, RVV_REQUIRE_ELEN_FP_64)
@@ -1056,6 +1068,8 @@ DEF_RVV_X16_VLMUL_EXT_OPS (vuint8mf2_t, 0)
DEF_RVV_X16_VLMUL_EXT_OPS (vuint16mf4_t, RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_X16_VLMUL_EXT_OPS (vuint16mf2_t, 0)
DEF_RVV_X16_VLMUL_EXT_OPS (vuint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_X16_VLMUL_EXT_OPS (vfloat16mf4_t, RVV_REQUIRE_ELEN_FP_16 | 
RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_X16_VLMUL_EXT_OPS (vfloat16mf2_t, RVV_REQUIRE_ELEN_FP_16)
DEF_RVV_X16_VLMUL_EXT_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 | 
RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_X32_VLMUL_EXT_OPS (vint8mf8_t, RVV_REQUIRE_MIN_VLEN_64)
@@ -1064,6 +1078,7 @@ DEF_RVV_X32_VLMUL_EXT_OPS (vint16mf4_t, 
RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_X32_VLMUL_EXT_OPS (vuint8mf8_t, RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_X32_VLMUL_EXT_OPS (vuint8mf4_t, 0)
DEF_RVV_X32_VLMUL_EXT_OPS (vuint16mf4_t, RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_X32_VLMUL_EXT_OPS (vfloat16mf4_t, RVV_REQUIRE_ELEN_FP_16 | 
RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_X64_VLMUL_

RE: [PATCH v1] RISC-V: Fix one potential test failure for RVV vsetvl

2023-06-12 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito.

Sounds good to me.  Not sure if there are some tests that focus on -O0/-Os/-Oz;
we can refine this in another PATCH.

Pan

-Original Message-
From: Kito Cheng  
Sent: Monday, June 12, 2023 8:30 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; rdapp@gmail.com; 
jeffreya...@gmail.com; Wang, Yanzhang 
Subject: Re: [PATCH v1] RISC-V: Fix one potential test failure for RVV vsetvl

OK for this patch, and I am thinking we should adjust rvv.exp to just exclude 
-O0, -Os and -Oz for some testcases run to simplify many testcases.


On Mon, Jun 12, 2023 at 8:20 PM Pan Li via Gcc-patches 
 wrote:
>
> From: Pan Li 
>
> The test will fail with the below command when running with multiple threads.
> However, it comes from one missing "Oz" option when checking vsetvl.
>
> make -j $(nproc) report RUNTESTFLAGS="rvv.exp riscv.exp"
>
> For some reason, this failure cannot be reproduced by RUNTESTFLAGS="rvv.exp"
> alone or by make without the -j option. We would like to fix it now and
> root-cause the reason later.
>
> Signed-off-by: Pan Li 
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/vsetvl/vsetvl-23.c: Adjust test checking.
> ---
>  gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c
> index 66c90ac10e7..f3420be8ab6 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c
> @@ -34,4 +34,4 @@ void f(int8_t *base, int8_t *out, size_t vl, size_t 
> m, size_t k) {
>  /* { dg-final { scan-assembler-times 
> {slli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*4} 1 { target { no-opts "-O0" 
> no-opts "-g" no-opts "-funroll-loops" } } } } */
>  /* { dg-final { scan-assembler-times 
> {srli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*8} 1 { target { no-opts "-O0" 
> no-opts "-g" no-opts "-funroll-loops" } } } } */
>  /* { dg-final { scan-assembler-times {vsetvli} 5 { target { no-opts 
> "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts 
> "-funroll-loops" } } } } */
> -/* { dg-final { scan-assembler-times 
> {vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*tu,\s*m[au]} 5 { target { 
> no-opts "-O0" no-opts "-Os" no-opts "-g" no-opts "-funroll-loops" } } 
> } } */
> +/* { dg-final { scan-assembler-times 
> +{vsetvli\s+zero,\s*[a-x0-9]+,\s*e8,\s*mf8,\s*tu,\s*m[au]} 5 { target 
> +{ no-opts "-O0" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts 
> +"-funroll-loops" } } } } */
> --
> 2.34.1
>


RE: [PATCH v5] RISC-V: Add vector psabi checking.

2023-06-12 Thread Wang, Yanzhang via Gcc-patches
It's the same behavior, because DEFAULT_CFLAGS will be copied to
CFLAGS and then passed as DEFAULT_EXTRA_OPTIONS to dg-runtest.

> -Original Message-
> From: Kito Cheng 
> Sent: Monday, June 12, 2023 10:08 PM
> To: Wang, Yanzhang 
> Cc: Kito Cheng ; gcc-patches@gcc.gnu.org;
> juzhe.zh...@rivai.ai; Li, Pan2 
> Subject: Re: [PATCH v5] RISC-V: Add vector psabi checking.
> 
> How about appending to DEFAULT_CFLAGS?
> 
> On Mon, Jun 12, 2023 at 9:38 PM Wang, Yanzhang via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > I found that add the -Wno-psabi to CFLAGS will be overrode by
> > dg-options. It seems we can only add this option to the third arg of
> > dg-runtest. Attach the dg-runtest comments,
> >
> > # dg-runtest -- simple main loop useful to most testsuites
> > #
> > # OPTIONS is a set of options to always pass.
> > # DEFAULT_EXTRA_OPTIONS is a set of options to pass if the testcase
> > # doesn't specify any (with dg-option).
> >
> > > -Original Message-
> > > From: Kito Cheng 
> > > Sent: Monday, June 12, 2023 8:43 PM
> > > To: Wang, Yanzhang 
> > > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Li, Pan2
> > > 
> > > Subject: Re: [PATCH v5] RISC-V: Add vector psabi checking.
> > >
> > > Hi Yan-Zhang:
> > >
> > > OK with one minor, go ahead IF the regression is clean.
> > >
> > > Hi Pan:
> > >
> > > Could you help to verify this patch and commit if the regression is
> clean?
> > >
> > > thanks :)
> > >
> > > > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> > > b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> > > > index 5e69235a268..ad79d0e9a8d 100644
> > > > --- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> > > > +++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> > > > @@ -43,7 +43,7 @@ dg-init
> > > >  # Main loop.
> > > >  set CFLAGS "$DEFAULT_CFLAGS -march=$gcc_march -mabi=$gcc_mabi -O3"
> > >
> > > Add -Wno-psabi here rather than below, and also add it for
> > > g++.target/riscv/rvv/rvv.exp
> > >
> > > >  dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/base/*.\[cS\]]]
> \
> > > > -   "" $CFLAGS
> > > > +   "-Wno-psabi" $CFLAGS
> > > >  gcc-dg-runtest [lsort [glob -nocomplain
> > > > $srcdir/$subdir/vsetvl/*.\[cS\]]]
> > > \
> > > > "" $CFLAGS
> > > >  dg-runtest [lsort [glob -nocomplain
> > > > $srcdir/$subdir/autovec/*.\[cS\]]] \


Re: [PATCH v5] RISC-V: Add vector psabi checking.

2023-06-12 Thread Jeff Law via Gcc-patches




On 6/12/23 07:36, Wang, Yanzhang via Gcc-patches wrote:

> I found that add the -Wno-psabi to CFLAGS will be overrode by
> dg-options. It seems we can only add this option to the third
> arg of dg-runtest. Attach the dg-runtest comments,

I think we default to -Wno-psabi to avoid triggering diagnostics in the 
common case where we aren't concerned about such issues.  So not a 
surprise that we'll need to work a bit harder to get it added when we do 
want to check for psabi issues.


jeff


Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.

2023-06-12 Thread Jeff Law via Gcc-patches




On 6/12/23 01:36, Manolis Tsamis wrote:





> Even if late, one question for the dynamic instruction numbers.
> Was this measured just with f-m-o or with the stack pointer fold patch
> applied too?
> I remember I was getting better improvements in the past, but most of
> the cases had to do with the stack pointer so the second patch is
> necessary.

It was just the main f-m-o patch, so there'll be additional benefits 
with the ability to cprop the stack pointer.   And even if we don't get 
measurable wins for something like mcf due to its memory bound nature, 
smaller, tighter code is always preferable.


Jeff


Re: [PATCH] RISC-V: Enhance RVV VLA SLP auto-vectorization with decompress operation

2023-06-12 Thread Robin Dapp via Gcc-patches
Hi Juzhe,

seems a nice improvement, looks good to me.  While reading I was wondering
if vzext could help synthesize some (zero-based) patterns as well
(e.g. 0 3 0 3...).
However the sequences I could come up with were not shorter than what we
are already emitting, so probably not.

Regards
 Robin


Re: Re: [PATCH] RISC-V: Enhance RVV VLA SLP auto-vectorization with decompress operation

2023-06-12 Thread 钟居哲
No. The pattern you pointed out is already supported.
The operation is very simple:
just a single vmv.v.i with a larger SEW is enough. No need for vzext.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-12 22:43
To: juzhe.zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Enhance RVV VLA SLP auto-vectorization with 
decompress operation
Hi Juzhe,
 
seems a nice improvement, looks good to me.  While reading I was wondering
if vzext could help synthesize some (zero-based) patterns as well
(e.g. 0 3 0 3...).
However the sequences I could come up with were not shorter than what we
are already emitting, so probably not.
 
Regards
Robin
 


Re: [PATCH v5] RISC-V: Add vector psabi checking.

2023-06-12 Thread Kito Cheng via Gcc-patches
Hmmm, yeah, I think let's add it case by case... I assume we should get
rid of it before GCC 14; it is mostly used for the transition period
before we settle down the ABI, and for GCC 13.

On Mon, Jun 12, 2023 at 10:34 PM Jeff Law  wrote:
>
>
>
> On 6/12/23 07:36, Wang, Yanzhang via Gcc-patches wrote:
> > I found that add the -Wno-psabi to CFLAGS will be overrode by
> > dg-options. It seems we can only add this option to the third
> > arg of dg-runtest. Attach the dg-runtest comments,
> I think we default to -Wno-psabi to avoid triggering diagnostics in the
> common case where we aren't concerned about such issues.  So not a
> surprise that we'll need to work a bit harder to get it added when we do
> want to check for psabi issues.
>
> jeff


Re: Re: [PATCH] RISC-V: Enhance RVV VLA SLP auto-vectorization with decompress operation

2023-06-12 Thread Kito Cheng via Gcc-patches
I didn't take a close review yet (and I suspect I can't find time
before I start my vacation :P), but I am thinking we may add
selftests for expand_const_vector in the *future*; again, not a blocker
for this patch :)

On Mon, Jun 12, 2023 at 10:51 PM 钟居哲  wrote:
>
> No. Such pattern you pointed I already supported.
> The operation is very simple.
> Just use a single vmv.v.i but larger SEW is enough. No need vzext.
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: Robin Dapp
> Date: 2023-06-12 22:43
> To: juzhe.zhong; gcc-patches
> CC: rdapp.gcc; kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw
> Subject: Re: [PATCH] RISC-V: Enhance RVV VLA SLP auto-vectorization with 
> decompress operation
> Hi Juzhe,
>
> seems a nice improvement, looks good to me.  While reading I was wondering
> if vzext could help synthesize some (zero-based) patterns as well
> (e.g. 0 3 0 3...).
> However the sequences I could come up with were not shorter than what we
> are already emitting, so probably not.
>
> Regards
> Robin
>


[PATCH] RISC-V: Implement vec_set and vec_extract.

2023-06-12 Thread Robin Dapp via Gcc-patches
Hi,

this implements the vec_set and vec_extract patterns for integer and
floating-point data types.  For vec_set we broadcast the insert value to
a vector register and then perform a vslideup with effective length 1 to
the requested index.

vec_extract is done by sliding down the requested element to index 0
and v(f)mv.[xf].s to a scalar register.
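
As an illustration, the kind of fixed-length vector code these expanders
target looks roughly like the sketch below (illustrative only; the committed
vls-vlmax tests are the authoritative examples, and the typedef name here is
made up):

#include <stdint.h>

typedef int64_t v2di __attribute__ ((vector_size (16)));

v2di
insert (v2di v, int64_t x)
{
  v[1] = x;     /* vec_set: broadcast x, vslideup with effective length 1.  */
  return v;
}

int64_t
extract (v2di v)
{
  return v[1];  /* vec_extract: vslidedown to index 0, then vmv.x.s.  */
}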

The patch does not include vector-vector extraction which
will be done at a later time.

The vec_set tests required a vector calling convention/ABI because
a vector is being returned.  I'm currently experimenting with adding
preliminary vector ABI support locally and still finishing some tests
after discussing with Juzhe.  Consequently, I would not push this
before ABI support is upstream.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/autovec.md (vec_set): Implement.
(vec_extract): Implement.
* config/riscv/riscv-protos.h (enum insn_type): Add slide insn.
(emit_vlmax_slide_insn): Declare.
(emit_nonvlmax_slide_tu_insn): Declare.
(emit_scalar_move_insn): Export.
(emit_nonvlmax_integer_move_insn): Export.
* config/riscv/riscv-v.cc (emit_vlmax_slide_insn): New function.
(emit_nonvlmax_slide_tu_insn): New function.
(emit_vlmax_masked_mu_insn): No change.
(emit_vlmax_integer_move_insn): Export.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: New test.
---
 gcc/config/riscv/autovec.md   |  79 ++
 gcc/config/riscv/riscv-protos.h   |   5 +
 gcc/config/riscv/riscv-v.cc   |  62 -
 .../rvv/autovec/vls-vlmax/vec_extract-1.c |  49 
 .../rvv/autovec/vls-vlmax/vec_extract-2.c |  58 +
 .../rvv/autovec/vls-vlmax/vec_extract-3.c |  59 +
 .../rvv/autovec/vls-vlmax/vec_extract-4.c |  60 +
 .../rvv/autovec/vls-vlmax/vec_extract-run.c   | 230 ++
 .../riscv/rvv/autovec/vls-vlmax/vec_set-1.c   |  52 
 .../riscv/rvv/autovec/vls-vlmax/vec_set-2.c   |  62 +
 .../riscv/rvv/autovec/vls-vlmax/vec_set-3.c   |  63 +
 .../riscv/rvv/autovec/vls-vlmax/vec_set-4.c   |  64 +
 .../riscv/rvv/autovec/vls-vlmax/vec_set-run.c | 230 ++
 13 files changed, 1071 insertions(+), 2 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index b7070099f29..9cfa48f94b5 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -640,3 +640,82 @@ (define_expand "select_vl"
   riscv_vector::expand_select_vl (operands);
   DONE;
 })
+
+;; -
+;;  [INT,FP] Insert a vector element.
+;; -
+
+(define_expand "vec_set"
+  [(match_operand:V 0 "register_operand")
+   (match_operand: 1 "register_operand")
+   (match_operand  2 "immediate_operand")]
+  "TARGET_VECTOR"
+{
+  /* If we set the first element, emit an v(f)mv.s.[xf].  */
+  if (operands[2] == const0_rtx)
+{
+  rtx ops[] = {operands[0], riscv_vector::gen_scalar_move_mask (mode),
+  RVV_VUNDEF (mode), operands[1]};
+  riscv_vector::emit_scalar_move_insn
+ (code_for_pred_broadcast (mode), ops);
+}
+  else
+{
+  /* Move the desired value into a vector register and insert
+it at the proper position using vslideup with an
+"ef

Re: [PATCH] RISC-V: Enhance RVV VLA SLP auto-vectorization with decompress operation

2023-06-12 Thread Jeff Law via Gcc-patches




On 6/12/23 08:54, Kito Cheng wrote:

> I didn't take a close review yet (and I suspect I can't find time
> before I start my vacation :P), but I am thinking we may add
> selftests for expand_const_vector in the *future*; again, not a blocker
> for this patch :)

I'll take this one.  Go enjoy your vacation!

jeff


[PATCH] RISC-V: Add sign-extending variants for vmv.x.s.

2023-06-12 Thread Robin Dapp via Gcc-patches
Hi,

when the destination register of a vmv.x.s needs to be sign extended to
XLEN we currently emit an sext insn.  Since vmv.x.s performs this
implicitly this patch adds two instruction patterns (intended for
combine et al.) that include sign_extend for the destination operand.

The tests extend the vec_extract tests sent before.
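
To make the motivation concrete, here is a sketch of the kind of code
affected (illustrative only; the real coverage is in the adjusted
vec_extract tests):

#include <stdint.h>

typedef int32_t v4si __attribute__ ((vector_size (16)));

int64_t
extract_sext (v4si v, int idx)
{
  /* The extracted SImode element must be sign extended to XLEN.  vmv.x.s
     already does that implicitly, so the separate sext insn is redundant
     and the new combine patterns let it be dropped.  */
  return v[idx];
}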

Regards
 Robin

gcc/ChangeLog:

* config/riscv/vector-iterators.md: Add VI_QH iterator.
* config/riscv/autovec-opt.md
(@pred_extract_first_sextdi): New vmv.x.s pattern
that includes sign extension.
(@pred_extract_first_sextsi): Dito for SImode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c: Expect
no sext instructions.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c: Dito.
---
 gcc/config/riscv/autovec-opt.md   | 29 +++
 gcc/config/riscv/vector-iterators.md  |  5 
 .../rvv/autovec/vls-vlmax/vec_extract-1.c |  2 ++
 .../rvv/autovec/vls-vlmax/vec_extract-2.c |  2 ++
 .../rvv/autovec/vls-vlmax/vec_extract-3.c |  2 ++
 .../rvv/autovec/vls-vlmax/vec_extract-4.c |  2 ++
 7 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 7bb93eed220..82bd967dc38 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -330,3 +330,32 @@ (define_insn_and_split "*zero_sign_extend_fma"
   }
   [(set_attr "type" "viwmuladd")
(set_attr "mode" "")])
+
+;; -
+;;  Sign-extension for vmv.x.s.
+;; -
+(define_insn "@pred_extract_first_sextdi"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (sign_extend:DI
+  (unspec:
+   [(vec_select:
+  (match_operand:VI_QHS 1 "register_operand""vr")
+  (parallel [(const_int 0)]))
+(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)))]
+  "TARGET_VECTOR && Pmode == DImode"
+  "vmv.x.s\t%0,%1"
+  [(set_attr "type" "vimovvx")
+   (set_attr "mode" "")])
+
+(define_insn "@pred_extract_first_sextsi"
+  [(set (match_operand:SI 0 "register_operand"   "=r")
+   (sign_extend:SI
+  (unspec:
+   [(vec_select:
+  (match_operand:VI_QH 1 "register_operand"  "vr")
+  (parallel [(const_int 0)]))
+(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)))]
+  "TARGET_VECTOR && Pmode == SImode"
+  "vmv.x.s\t%0,%1"
+  [(set_attr "type" "vimovvx")
+   (set_attr "mode" "")])
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 6abd777c1ad..e8b39d63d28 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -352,6 +352,11 @@ (define_mode_iterator VFULLI [
   (VNx2DI "TARGET_FULL_V") (VNx4DI "TARGET_FULL_V") (VNx8DI "TARGET_FULL_V") 
(VNx16DI "TARGET_FULL_V")
 ])
 
+(define_mode_iterator VI_QH [
+  (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI VNx16QI VNx32QI 
(VNx64QI "TARGET_MIN_VLEN > 32") (VNx128QI "TARGET_MIN_VLEN >= 128")
+  (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI VNx16HI (VNx32HI 
"TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128")
+])
+
 (define_mode_iterator VI_QHS [
   (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI VNx16QI VNx32QI 
(VNx64QI "TARGET_MIN_VLEN > 32") (VNx128QI "TARGET_MIN_VLEN >= 128")
   (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI VNx16HI (VNx32HI 
"TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128")
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c
index b631fdb9cc6..dedd56a3d3b 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c
@@ -47,3 +47,5 @@ TEST_ALL1 (VEC_EXTRACT)
 
 /* { dg-final { scan-assembler-times {\tvfmv.f.s} 5 } } */
 /* { dg-final { scan-assembler-times {\tvmv.x.s} 13 } } */
+
+/* { dg-final { scan-assembler-not {\tsext} } } */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c
index 0a93752bd4b..f63cee4c2a4 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c
@@ -56,3 +56,5 @@ TEST_ALL2 (VEC_EXTRACT)
 
 /* { dg-final { scan-assembler-times {\tvfmv.f.s} 9 } } */
 /* { dg-final { scan-assembler-times {\tvmv.x.s} 19 } } */
+
+/* { dg-final { scan-assembler-not {\tsext} } } */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec

[PATCH V2] RISC-V: Enhance RVV VLA SLP auto-vectorization with decompress operation

2023-06-12 Thread juzhe . zhong
From: Juzhe-Zhong 

According to RVV ISA:
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc

We can enhance VLA SLP auto-vectorization with the decompress operation
(16.5.1. Synthesizing vdecompress).

Case 1 (nunits = POLY_INT_CST [16, 16]):
_48 = VEC_PERM_EXPR <_37, _35, { 0, POLY_INT_CST [16, 16], 1, POLY_INT_CST [17, 
16], 2, POLY_INT_CST [18, 16], ... }>;
We can optimize such a VLA SLP permutation pattern into:
_48 = vdecompress (_37, _35, mask = { 0, 1, 0, 1, ... });

Case 2 (nunits = POLY_INT_CST [16, 16]):
_23 = VEC_PERM_EXPR <_46, _44, { POLY_INT_CST [1, 1], POLY_INT_CST [3, 3], 
POLY_INT_CST [2, 1], POLY_INT_CST [4, 3], POLY_INT_CST [3, 1], POLY_INT_CST [5, 
3], ... }>;
We can optimize such a VLA SLP permutation pattern into:
_48 = vdecompress (slidedown(_46, 1/2 nunits), slidedown(_44, 1/2 nunits), mask
= { 0, 1, 0, 1, ... });

For example:
void __attribute__ ((noinline, noclone))
vec_slp (uint64_t *restrict a, uint64_t b, uint64_t c, int n)
{
  for (int i = 0; i < n; ++i)
{
  a[i * 2] += b;
  a[i * 2 + 1] += c;
}
}

ASM:
...
vid.v   v0
vand.vi v0,v0,1
vmseq.vi v0,v0,1  ===> mask = { 0, 1, 0, 1, ... }
vdecompress:
viota.m v3,v0   
vrgather.vv v2,v1,v3,v0.t
Loop:
vsetvli zero,a5,e64,m1,ta,ma
vle64.v v1,0(a0)
vsetvli a6,zero,e64,m1,ta,ma
vadd.vv v1,v2,v1
vsetvli zero,a5,e64,m1,ta,ma
mv  a5,a3
vse64.v v1,0(a0)
add a3,a3,a1
add a0,a0,a2
bgtu a5,a4,.L4


gcc/ChangeLog:

* config/riscv/riscv-v.cc (emit_vlmax_decompress_insn): New function.
(shuffle_decompress_patterns): New function.
(expand_vec_perm_const_1): Add decompress optimization.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/slp-8.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-9.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-9.c: New test.

---
 gcc/config/riscv/riscv-v.cc   | 111 ++
 .../riscv/rvv/autovec/partial/slp-8.c |  30 +
 .../riscv/rvv/autovec/partial/slp-9.c |  31 +
 .../riscv/rvv/autovec/partial/slp_run-8.c |  30 +
 .../riscv/rvv/autovec/partial/slp_run-9.c |  30 +
 5 files changed, 232 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-9.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-9.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index e1b85a5af91..fb970344521 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -836,6 +836,46 @@ emit_vlmax_masked_gather_mu_insn (rtx target, rtx op, rtx 
sel, rtx mask)
   emit_vlmax_masked_mu_insn (icode, RVV_BINOP_MU, ops);
 }
 
+/* According to RVV ISA spec (16.5.1. Synthesizing vdecompress):
+   https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc
+
+  There is no inverse vdecompress provided, as this operation can be readily
+  synthesized using iota and a masked vrgather:
+
+  Desired functionality of 'vdecompress'
+   7 6 5 4 3 2 1 0 # vid
+
+ e d c b a # packed vector of 5 elements
+   1 0 0 1 1 1 0 1 # mask vector of 8 elements
+   p q r s t u v w # destination register before vdecompress
+
+   e q r d c b v a # result of vdecompress
+   # v0 holds mask
+   # v1 holds packed data
+   # v11 holds input expanded vector and result
+   viota.m v10, v0 # Calc iota from mask in v0
+   vrgather.vv v11, v1, v10, v0.t  # Expand into destination
+ p q r s t u v w  # v11 destination register
+  e d c b a  # v1 source vector
+ 1 0 0 1 1 1 0 1  # v0 mask vector
+
+ 4 4 4 3 2 1 1 0  # v10 result of viota.m
+ e q r d c b v a  # v11 destination after vrgather using viota.m under mask
+*/
+static void
+emit_vlmax_decompress_insn (rtx target, rtx op, rtx mask)
+{
+  machine_mode data_mode = GET_MODE (target);
+  machine_mode sel_mode = related_int_vector_mode (data_mode).require ();
+  if (GET_MODE_INNER (data_mode) == QImode)
+sel_mode = get_vector_mode (HImode, GET_MODE_NUNITS (data_mode)).require 
();
+
+  rtx sel = gen_reg_rtx (sel_mode);
+  rtx iota_ops[] = {sel, mask};
+  emit_vlmax_insn (code_for_pred_iota (sel_mode), RVV_UNOP, iota_ops);
+  emit_vlmax_masked_gather_mu_insn (target, op, sel, mask);
+}
+
 /* Emit merge instruction.  */
 
 static machine_mode
@@ -2337,6 +2377,75 @@ struct expand_vec_perm_d
   bool testing_p;
 };
 
+/* Recognize decompress patterns:
+
+   1. VEC_PERM_EXPR op0 and op1
+  with isel = { 0, nunits, 1, nunits + 1, ... }.
+  Decompress op0 and op1 vector with the mask = { 

Re: [PATCH] RISC-V: Implement vec_set and vec_extract.

2023-06-12 Thread 钟居哲
+  /* If the slide offset fits into 5 bits we can
+ use the immediate variant instead of the register variant.
+ The expander's operand[2] is ops[3] here. */
+  if (!satisfies_constraint_K (ops[3]))
+ops[3] = force_reg (Pmode, ops[3]);

I don't think we need this. maybe_expand_insn should be able to handle this.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-12 22:55
To: gcc-patches; palmer; Kito Cheng; juzhe.zh...@rivai.ai; jeffreyalaw
CC: rdapp.gcc
Subject: [PATCH] RISC-V: Implement vec_set and vec_extract.
Hi,
 
this implements the vec_set and vec_extract patterns for integer and
floating-point data types.  For vec_set we broadcast the insert value to
a vector register and then perform a vslideup with effective length 1 to
the requested index.
 
vec_extract is done by sliding down the requested element to index 0
and v(f)mv.[xf].s to a scalar register.
 
The patch does not include vector-vector extraction which
will be done at a later time.
 
The vec_set tests required a vector calling convention/ABI because
a vector is being returned.  I'm currently experimenting with adding
preliminary vector ABI support locally and still finishing some tests
after discussing with Juzhe.  Consequently, I would not push this
before ABI support is upstream.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (vec_set): Implement.
(vec_extract): Implement.
* config/riscv/riscv-protos.h (enum insn_type): Add slide insn.
(emit_vlmax_slide_insn): Declare.
(emit_nonvlmax_slide_tu_insn): Declare.
(emit_scalar_move_insn): Export.
(emit_nonvlmax_integer_move_insn): Export.
* config/riscv/riscv-v.cc (emit_vlmax_slide_insn): New function.
(emit_nonvlmax_slide_tu_insn): New function.
(emit_vlmax_masked_mu_insn): No change.
(emit_vlmax_integer_move_insn): Export.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: New test.
---
gcc/config/riscv/autovec.md   |  79 ++
gcc/config/riscv/riscv-protos.h   |   5 +
gcc/config/riscv/riscv-v.cc   |  62 -
.../rvv/autovec/vls-vlmax/vec_extract-1.c |  49 
.../rvv/autovec/vls-vlmax/vec_extract-2.c |  58 +
.../rvv/autovec/vls-vlmax/vec_extract-3.c |  59 +
.../rvv/autovec/vls-vlmax/vec_extract-4.c |  60 +
.../rvv/autovec/vls-vlmax/vec_extract-run.c   | 230 ++
.../riscv/rvv/autovec/vls-vlmax/vec_set-1.c   |  52 
.../riscv/rvv/autovec/vls-vlmax/vec_set-2.c   |  62 +
.../riscv/rvv/autovec/vls-vlmax/vec_set-3.c   |  63 +
.../riscv/rvv/autovec/vls-vlmax/vec_set-4.c   |  64 +
.../riscv/rvv/autovec/vls-vlmax/vec_set-run.c | 230 ++
13 files changed, 1071 insertions(+), 2 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index b7070099f29..9cfa48f94b5 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -640,3 +640,82 @@ (define_expand "select_vl"
   riscv_vector::expand_select_vl (operands);
   DONE;
})
+
+;; -
+;;  [INT,FP] Insert a vector element.
+;; -
+
+(define_expand "vec_set"
+  [(match_operand:V 0 "register_operand")
+   (match_operand: 1 "register_operand")
+   (match_operand 2 "immediate_operand")]
+  "TARGET_VECTOR"
+{
+  /* If we set the first element, emit an v(f)mv.s.[xf].  */
+  if (operands[2] == const0_rtx)
+{
+  rtx ops[] = {operands[0], ri

Re: [PATCH] RISC-V: Add sign-extending variants for vmv.x.s.

2023-06-12 Thread 钟居哲
Change 
+(define_insn "@pred_extract_first_sextdi"
into 
(define_insn "*pred_extract_first_sextdi"

Change
+(define_insn "@pred_extract_first_sextsi"
into
(define_insn "*pred_extract_first_sextsi"

I don't think we will call these combine patterns in autovec-opt.md by name
in the future.  Using "*" instead of "@" can save resources during building.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-12 23:04
To: gcc-patches; palmer; Kito Cheng; juzhe.zh...@rivai.ai; jeffreyalaw
CC: rdapp.gcc
Subject: [PATCH] RISC-V: Add sign-extending variants for vmv.x.s.
Hi,
 
when the destination register of a vmv.x.s needs to be sign extended to
XLEN we currently emit an sext insn.  Since vmv.x.s performs this
implicitly this patch adds two instruction patterns (intended for
combine et al.) that include sign_extend for the destination operand.
 
The tests extend the vec_extract tests sent before.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/vector-iterators.md: Add VI_QH iterator.
* config/riscv/autovec-opt.md
(@pred_extract_first_sextdi): New vmv.x.s pattern
that includes sign extension.
(@pred_extract_first_sextsi): Dito for SImode.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c: Expect
no sext instructions.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c: Dito.
---
gcc/config/riscv/autovec-opt.md   | 29 +++
gcc/config/riscv/vector-iterators.md  |  5 
.../rvv/autovec/vls-vlmax/vec_extract-1.c |  2 ++
.../rvv/autovec/vls-vlmax/vec_extract-2.c |  2 ++
.../rvv/autovec/vls-vlmax/vec_extract-3.c |  2 ++
.../rvv/autovec/vls-vlmax/vec_extract-4.c |  2 ++
7 files changed, 43 insertions(+), 1 deletion(-)
 
diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 7bb93eed220..82bd967dc38 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -330,3 +330,32 @@ (define_insn_and_split "*zero_sign_extend_fma"
   }
   [(set_attr "type" "viwmuladd")
(set_attr "mode" "")])
+
+;; -
+;;  Sign-extension for vmv.x.s.
+;; -
+(define_insn "@pred_extract_first_sextdi"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+ (sign_extend:DI
+  (unspec:
+ [(vec_select:
+(match_operand:VI_QHS 1 "register_operand""vr")
+(parallel [(const_int 0)]))
+  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)))]
+  "TARGET_VECTOR && Pmode == DImode"
+  "vmv.x.s\t%0,%1"
+  [(set_attr "type" "vimovvx")
+   (set_attr "mode" "")])
+
+(define_insn "@pred_extract_first_sextsi"
+  [(set (match_operand:SI 0 "register_operand"   "=r")
+ (sign_extend:SI
+  (unspec:
+ [(vec_select:
+(match_operand:VI_QH 1 "register_operand"  "vr")
+(parallel [(const_int 0)]))
+  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)))]
+  "TARGET_VECTOR && Pmode == SImode"
+  "vmv.x.s\t%0,%1"
+  [(set_attr "type" "vimovvx")
+   (set_attr "mode" "")])
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 6abd777c1ad..e8b39d63d28 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -352,6 +352,11 @@ (define_mode_iterator VFULLI [
   (VNx2DI "TARGET_FULL_V") (VNx4DI "TARGET_FULL_V") (VNx8DI "TARGET_FULL_V") 
(VNx16DI "TARGET_FULL_V")
])
+(define_mode_iterator VI_QH [
+  (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI VNx16QI VNx32QI 
(VNx64QI "TARGET_MIN_VLEN > 32") (VNx128QI "TARGET_MIN_VLEN >= 128")
+  (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI VNx16HI (VNx32HI 
"TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128")
+])
+
(define_mode_iterator VI_QHS [
   (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI VNx16QI VNx32QI 
(VNx64QI "TARGET_MIN_VLEN > 32") (VNx128QI "TARGET_MIN_VLEN >= 128")
   (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI VNx16HI (VNx32HI 
"TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128")
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c
index b631fdb9cc6..dedd56a3d3b 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c
@@ -47,3 +47,5 @@ TEST_ALL1 (VEC_EXTRACT)
/* { dg-final { scan-assembler-times {\tvfmv.f.s} 5 } } */
/* { dg-final { scan-assembler-times {\tvmv.x.s} 13 } } */
+
+/* { dg-final { scan-assembler-not {\tsext} } } */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c
index 0a93752bd4b..f63cee4c2a4 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-

Re: [PATCH 3/4] [RISC-V] resolve confilct between zcmp multi push/pop and shrink-wrap-separate

2023-06-12 Thread Kito Cheng via Gcc-patches
I would suggest breaking this patch into two parts: the RISC-V part and
the generic part (shrink-wrap.h / shrink-wrap.cc).


On Wed, Jun 7, 2023 at 1:55 PM Fei Gao  wrote:
>
> Disable zcmp multi push/pop if shrink-wrap-separate is active.
>
> So in -Os that prefers smaller code size, by default shrink-wrap-separate
> is disabled while zcmp multi push/pop is enabled.
>
> And in -O2 and others that prefer speed, by default shrink-wrap-separate
> is enabled while zcmp multi push/pop is disabled. To force enabling zcmp multi
> push/pop in this case, -fno-shrink-wrap-separate has to be explicitly given.
>
> The following TC shows the issues in -O2 before this patch with both
> shrink-wrap-separate and zcmp multi push/pop active.
> 1. duplicated store of s regs.
> 2. cm.push pushes ra, s0-s11 in the reverse order of what the normal
> prologue does, causing stack corruption and failure to restore s regs.
>
> TC: zcmp_shrink_wrap_separate.c included in this patch.
>
> output asm before this patch:
> calc_func:
> cm.push {ra, s0-s3}, -32
> ...
> beq a5,zero,.L2
> ...
> .L2:
> ...
> sw  s1,20(sp) //issue here
> sw  s3,12(sp) //issue here
> ...
> sw  s2,16(sp) //issue here
>
> output asm after this patch:
> calc_func:
> addisp,sp,-32
> sw  s0,24(sp)
> ...
> beq a5,zero,.L2
> ...
> .L2:
> ...
> sw  s1,20(sp)
> sw  s3,12(sp)
> ...
> sw  s2,16(sp)
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc
> (riscv_avoid_shrink_wrapping_separate): wrap the condition check in
> riscv_avoid_shrink_wrapping_separate.
> (riscv_avoid_multi_push): avoid multi push if shrink_wrapping_separate
>   is active.
> (riscv_get_separate_components): call 
> riscv_avoid_shrink_wrapping_separate
> * shrink-wrap.cc (try_shrink_wrapping_separate): call
>   use_shrink_wrapping_separate.
> (use_shrink_wrapping_separate):wrap the condition
>   check in use_shrink_wrapping_separate
> * shrink-wrap.h (use_shrink_wrapping_separate): add to extern
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/zcmp_shrink_wrap_separate.c: New test.
> * gcc.target/riscv/zcmp_shrink_wrap_separate2.c: New test.
>
> Signed-off-by: Fei Gao 
> Co-Authored-By: Zhangjin Liao 
> ---
>  gcc/config/riscv/riscv.cc | 19 +++-
>  gcc/shrink-wrap.cc| 25 +++--
>  gcc/shrink-wrap.h |  1 +
>  .../riscv/zcmp_shrink_wrap_separate.c | 97 +++
>  .../riscv/zcmp_shrink_wrap_separate2.c| 97 +++
>  5 files changed, 228 insertions(+), 11 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/zcmp_shrink_wrap_separate2.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index f60c241a526..b505cdeca34 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "cfghooks.h"
>  #include "cfgloop.h"
>  #include "cfgrtl.h"
> +#include "shrink-wrap.h"
>  #include "sel-sched.h"
>  #include "fold-const.h"
>  #include "gimple-iterator.h"
> @@ -389,6 +390,7 @@ static const struct riscv_tune_param 
> optimize_size_tune_info = {
>false,   /* use_divmod_expansion */
>  };
>
> +static bool riscv_avoid_shrink_wrapping_separate ();
>  static tree riscv_handle_fndecl_attribute (tree *, tree, tree, int, bool *);
>  static tree riscv_handle_type_attribute (tree *, tree, tree, int, bool *);
>
> @@ -4910,6 +4912,8 @@ riscv_avoid_multi_push(const struct riscv_frame_info 
> *frame)
>|| cfun->machine->interrupt_handler_p
>|| cfun->machine->varargs_size != 0
>|| crtl->args.pretend_args_size != 0
> +  || (use_shrink_wrapping_separate ()
> +  && !riscv_avoid_shrink_wrapping_separate ())
>|| (frame->mask & ~ MULTI_PUSH_GPR_MASK))
>  return true;
>
> @@ -6077,6 +6081,17 @@ riscv_epilogue_uses (unsigned int regno)
>return false;
>  }
>
> +static bool
> +riscv_avoid_shrink_wrapping_separate ()
> +{
> +  if (riscv_use_save_libcall (&cfun->machine->frame)
> +  || cfun->machine->interrupt_handler_p
> +  || !cfun->machine->frame.gp_sp_offset.is_constant ())
> +return true;
> +
> +  return false;
> +}
> +
>  /* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS.  */
>
>  static sbitmap
> @@ -6086,9 +6101,7 @@ riscv_get_separate_components (void)
>sbitmap components = sbitmap_alloc (FIRST_PSEUDO_REGISTER);
>bitmap_clear (components);
>
> -  if (riscv_use_save_libcall (&cfun->machine->frame)
> -  || cfun->machine->interrupt_handler_p
> -  || !cfun->machine->frame.gp_sp_offset.is_constant ())
> +  if (ris

Re: [PATCH] RISC-V: Add sign-extending variants for vmv.x.s.

2023-06-12 Thread Robin Dapp via Gcc-patches
> Change 
> 
> +(define_insn "@pred_extract_first_sextdi"
> 
> into 
> 
> (define_insn "*pred_extract_first_sextdi"

Yeah, I was thinking about this as well right after sending.
We will probably never call this directly.

Regards
 Robin


Re: [PATCH] RISC-V: Implement vec_set and vec_extract.

2023-06-12 Thread Robin Dapp via Gcc-patches
> +  /* If the slide offset fits into 5 bits we can
> + use the immediate variant instead of the register variant.
> + The expander's operand[2] is ops[3] here. */
> +  if (!satisfies_constraint_K (ops[3]))
> +    ops[3] = force_reg (Pmode, ops[3]);
> 
> I don't think we need this. maybe_expand_insn should be able to handle this.

Yes, removed it locally and retested, clean.

Regards
 Robin


[COMMITTED 0/17] - Range-op dispatch unification rework

2023-06-12 Thread Andrew MacLeod via Gcc-patches

This patch set completes the range-op dispatch and unification rework.

The first 7 patches move the remainder of the integral table to the 
unified table, and remove the integer table.


The 8th patch moves all the pointer-specific code into a new file,
range-op-ptr.cc.


Patches 9-12 introduce a "hybrid" operator class for the 4 operations
where pointers and integers share a TREE_CODE but have different
implementations.  An extra hybrid class is introduced in the pointer
file which inherits from the integer version and adds new overloads for
the used methods; these look at the type being passed in and do the
dispatch themselves, either to the inherited integer version or to the
pointer version of the opcode.


This allows us to have a unified entry for those 4 operators
(BIT_AND_EXPR, BIT_IOR_EXPR, MIN_EXPR, and MAX_EXPR) and move on.  When
we introduce a pointer range type (i.e. PRANGE), we can simply add the
prange signature to the appropriate range_operator methods and remove
the pointer and hybrid classes.
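
Roughly, the hybrid idea looks like the sketch below (a minimal illustration
of the shape described above, not the committed code; the class name and the
way the pointer operator is reached are made up for the example):

// Hypothetical sketch: dispatch on the type at the method level.
class hybrid_min : public operator_min     // inherits the integral behavior
{
public:
  using operator_min::wi_fold;
  void wi_fold (irange &r, tree type,
                const wide_int &lh_lb, const wide_int &lh_ub,
                const wide_int &rh_lb, const wide_int &rh_ub) const override
  {
    if (POINTER_TYPE_P (type))
      // Defer to the pointer-specific MIN/MAX implementation.
      return op_ptr_min_max.wi_fold (r, type, lh_lb, lh_ub, rh_lb, rh_ub);
    // Otherwise fall back to the inherited integer version.
    operator_min::wi_fold (r, type, lh_lb, lh_ub, rh_lb, rh_ub);
  }
};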


Patches 13 through 16 do some tweaking to range_op_handler and how it's
used. It now provides a default operator under the covers, so you no
longer need to check if it's valid.  The valid check now simply
indicates whether it has a custom operator implemented or not. This means
you can simply write:


if (range_op_handler (CONVERT_EXPR).fold_range (...  ))

without worrying about whether there is an entry.  If there is no
CONVERT_EXPR operator implemented, you'll simply get false back from all
the calls.


Combined with the previous work, it is now always safe to call any 
range_operator routine via range_op_handler with any set of types for 
vrange parameters (including unsupported types)  on any tree code, and 
you will simply get false back if it isn't implemented.
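
A minimal usage sketch of that behavior (the variable names are illustrative,
not from the patches):

  int_range_max r;
  // 'type', 'op1' and 'op2' stand for whatever the caller already has.
  if (range_op_handler (CONVERT_EXPR).fold_range (r, type, op1, op2))
    {
      // A custom CONVERT_EXPR operator exists and the fold succeeded; use r.
    }
  // Otherwise fold_range simply returned false; nothing needs checking first.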


Andrew



[COMMITTED 1/17] Move operator_addr_expr to the unified range-op table.

2023-06-12 Thread Andrew MacLeod via Gcc-patches

Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

From 438f8281ad2d821e09eaf5691d1b76b6f2f39b4c Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Sat, 10 Jun 2023 15:56:15 -0400
Subject: [PATCH 01/17] Move operator_addr_expr to the unified range-op table.

	* range-op-mixed.h (class operator_addr_expr): Move from...
	* range-op.cc (unified_table::unified_table): Add ADDR_EXPR.
	(class operator_addr_expr): Move from here.
	(integral_table::integral_table): Remove ADDR_EXPR.
	(pointer_table::pointer_table): Remove ADDR_EXPR.
---
 gcc/range-op-mixed.h | 13 +
 gcc/range-op.cc  | 23 +--
 2 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/gcc/range-op-mixed.h b/gcc/range-op-mixed.h
index 52b8570cb2a..d31b144169d 100644
--- a/gcc/range-op-mixed.h
+++ b/gcc/range-op-mixed.h
@@ -501,4 +501,17 @@ public:
 		relation_kind kind) const final override;
 };
 
+class operator_addr_expr : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  bool fold_range (irange &r, tree type,
+		   const irange &op1, const irange &op2,
+		   relation_trio rel = TRIO_VARYING) const final override;
+  bool op1_range (irange &r, tree type,
+		  const irange &lhs, const irange &op2,
+		  relation_trio rel = TRIO_VARYING) const final override;
+};
+
 #endif // GCC_RANGE_OP_MIXED_H
diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 028631c6851..20cc9b0dc9c 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -75,6 +75,7 @@ operator_abs op_abs;
 operator_minus op_minus;
 operator_negate op_negate;
 operator_mult op_mult;
+operator_addr_expr op_addr;
 
 // Invoke the initialization routines for each class of range.
 
@@ -102,6 +103,10 @@ unified_table::unified_table ()
   set (MINUS_EXPR, op_minus);
   set (NEGATE_EXPR, op_negate);
   set (MULT_EXPR, op_mult);
+
+  // Occur in both integer and pointer tables, but currently share
+  // integral implelmentation.
+  set (ADDR_EXPR, op_addr);
 }
 
 // The tables are hidden and accessed via a simple extern function.
@@ -4366,21 +4371,6 @@ operator_negate::op1_range (irange &r, tree type,
 }
 
 
-class operator_addr_expr : public range_operator
-{
-  using range_operator::fold_range;
-  using range_operator::op1_range;
-public:
-  virtual bool fold_range (irange &r, tree type,
-			   const irange &op1,
-			   const irange &op2,
-			   relation_trio rel = TRIO_VARYING) const;
-  virtual bool op1_range (irange &r, tree type,
-			  const irange &lhs,
-			  const irange &op2,
-			  relation_trio rel = TRIO_VARYING) const;
-} op_addr;
-
 bool
 operator_addr_expr::fold_range (irange &r, tree type,
 const irange &lh,
@@ -4613,7 +4603,6 @@ integral_table::integral_table ()
   set (BIT_IOR_EXPR, op_bitwise_or);
   set (BIT_XOR_EXPR, op_bitwise_xor);
   set (BIT_NOT_EXPR, op_bitwise_not);
-  set (ADDR_EXPR, op_addr);
 }
 
 // Initialize any integral operators to the primary table
@@ -4644,8 +4633,6 @@ pointer_table::pointer_table ()
   set (MIN_EXPR, op_ptr_min_max);
   set (MAX_EXPR, op_ptr_min_max);
 
-  set (ADDR_EXPR, op_addr);
-
   set (BIT_NOT_EXPR, op_bitwise_not);
   set (BIT_XOR_EXPR, op_bitwise_xor);
 }
-- 
2.40.1



[COMMITTED 2/17] - Move operator_bitwise_not to the unified range-op table.

2023-06-12 Thread Andrew MacLeod via Gcc-patches

Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From 5bb4c53870db1331592a89119f41beee2b17d832 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Sat, 10 Jun 2023 15:59:43 -0400
Subject: [PATCH 02/17] Move operator_bitwise_not to the unified range-op
 table.

	* range-op-mixed.h (class operator_bitwise_not): Move from...
	* range-op.cc (unified_table::unified_table): Add BIT_NOT_EXPR.
	(class operator_bitwise_not): Move from here.
	(integral_table::integral_table): Remove BIT_NOT_EXPR.
	(pointer_table::pointer_table): Remove BIT_NOT_EXPR.
---
 gcc/range-op-mixed.h | 13 +
 gcc/range-op.cc  | 21 +++--
 2 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/gcc/range-op-mixed.h b/gcc/range-op-mixed.h
index d31b144169d..ba04c51a2d8 100644
--- a/gcc/range-op-mixed.h
+++ b/gcc/range-op-mixed.h
@@ -514,4 +514,17 @@ public:
 		  relation_trio rel = TRIO_VARYING) const final override;
 };
 
+class operator_bitwise_not : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  bool fold_range (irange &r, tree type,
+		   const irange &lh, const irange &rh,
+		   relation_trio rel = TRIO_VARYING) const final override;
+  bool op1_range (irange &r, tree type,
+		  const irange &lhs, const irange &op2,
+		  relation_trio rel = TRIO_VARYING) const final override;
+};
+
 #endif // GCC_RANGE_OP_MIXED_H
diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 20cc9b0dc9c..107582a9571 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -76,6 +76,7 @@ operator_minus op_minus;
 operator_negate op_negate;
 operator_mult op_mult;
 operator_addr_expr op_addr;
+operator_bitwise_not op_bitwise_not;
 
 // Invoke the initialization routines for each class of range.
 
@@ -105,8 +106,9 @@ unified_table::unified_table ()
   set (MULT_EXPR, op_mult);
 
   // Occur in both integer and pointer tables, but currently share
-  // integral implelmentation.
+  // integral implementation.
   set (ADDR_EXPR, op_addr);
+  set (BIT_NOT_EXPR, op_bitwise_not);
 }
 
 // The tables are hidden and accessed via a simple extern function.
@@ -4080,21 +4082,6 @@ operator_logical_not::op1_range (irange &r,
 }
 
 
-class operator_bitwise_not : public range_operator
-{
-  using range_operator::fold_range;
-  using range_operator::op1_range;
-public:
-  virtual bool fold_range (irange &r, tree type,
-			   const irange &lh,
-			   const irange &rh,
-			   relation_trio rel = TRIO_VARYING) const;
-  virtual bool op1_range (irange &r, tree type,
-			  const irange &lhs,
-			  const irange &op2,
-			  relation_trio rel = TRIO_VARYING) const;
-} op_bitwise_not;
-
 bool
 operator_bitwise_not::fold_range (irange &r, tree type,
   const irange &lh,
@@ -4602,7 +4589,6 @@ integral_table::integral_table ()
   set (BIT_AND_EXPR, op_bitwise_and);
   set (BIT_IOR_EXPR, op_bitwise_or);
   set (BIT_XOR_EXPR, op_bitwise_xor);
-  set (BIT_NOT_EXPR, op_bitwise_not);
 }
 
 // Initialize any integral operators to the primary table
@@ -4633,7 +4619,6 @@ pointer_table::pointer_table ()
   set (MIN_EXPR, op_ptr_min_max);
   set (MAX_EXPR, op_ptr_min_max);
 
-  set (BIT_NOT_EXPR, op_bitwise_not);
   set (BIT_XOR_EXPR, op_bitwise_xor);
 }
 
-- 
2.40.1



[COMMITTED 3/17] - Move operator_bitwise_xor to the unified range-op table.

2023-06-12 Thread Andrew MacLeod via Gcc-patches

Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From cc18db2826c5449e84366644fa461816fa5f3f99 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Sat, 10 Jun 2023 16:01:05 -0400
Subject: [PATCH 03/17] Move operator_bitwise_xor to the unified range-op
 table.

	* range-op-mixed.h (class operator_bitwise_xor): Move from...
	* range-op.cc (unified_table::unified_table): Add BIT_XOR_EXPR.
	(class operator_bitwise_xor): Move from here.
	(integral_table::integral_table): Remove BIT_XOR_EXPR.
	(pointer_table::pointer_table): Remove BIT_XOR_EXPR.
---
 gcc/range-op-mixed.h | 23 +++
 gcc/range-op.cc  | 36 +++-
 2 files changed, 30 insertions(+), 29 deletions(-)

diff --git a/gcc/range-op-mixed.h b/gcc/range-op-mixed.h
index ba04c51a2d8..644473053e0 100644
--- a/gcc/range-op-mixed.h
+++ b/gcc/range-op-mixed.h
@@ -527,4 +527,27 @@ public:
 		  relation_trio rel = TRIO_VARYING) const final override;
 };
 
+class operator_bitwise_xor : public range_operator
+{
+public:
+  using range_operator::op1_range;
+  using range_operator::op2_range;
+  bool op1_range (irange &r, tree type,
+		  const irange &lhs, const irange &op2,
+		  relation_trio rel = TRIO_VARYING) const final override;
+  bool op2_range (irange &r, tree type,
+		  const irange &lhs, const irange &op1,
+		  relation_trio rel = TRIO_VARYING) const final override;
+  bool op1_op2_relation_effect (irange &lhs_range,
+	tree type,
+	const irange &op1_range,
+	const irange &op2_range,
+	relation_kind rel) const;
+  void update_bitmask (irange &r, const irange &lh,
+		   const irange &rh) const final override;
+private:
+  void wi_fold (irange &r, tree type, const wide_int &lh_lb,
+		const wide_int &lh_ub, const wide_int &rh_lb,
+		const wide_int &rh_ub) const final override;
+};
 #endif // GCC_RANGE_OP_MIXED_H
diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 107582a9571..11f576c55c5 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -77,6 +77,7 @@ operator_negate op_negate;
 operator_mult op_mult;
 operator_addr_expr op_addr;
 operator_bitwise_not op_bitwise_not;
+operator_bitwise_xor op_bitwise_xor;
 
 // Invoke the initialization routines for each class of range.
 
@@ -109,6 +110,7 @@ unified_table::unified_table ()
   // integral implementation.
   set (ADDR_EXPR, op_addr);
   set (BIT_NOT_EXPR, op_bitwise_not);
+  set (BIT_XOR_EXPR, op_bitwise_xor);
 }
 
 // The tables are hidden and accessed via a simple extern function.
@@ -3732,33 +3734,12 @@ operator_bitwise_or::op2_range (irange &r, tree type,
   return operator_bitwise_or::op1_range (r, type, lhs, op1);
 }
 
-
-class operator_bitwise_xor : public range_operator
+void
+operator_bitwise_xor::update_bitmask (irange &r, const irange &lh,
+  const irange &rh) const
 {
-  using range_operator::op1_range;
-  using range_operator::op2_range;
-public:
-  virtual void wi_fold (irange &r, tree type,
-		const wide_int &lh_lb,
-		const wide_int &lh_ub,
-		const wide_int &rh_lb,
-		const wide_int &rh_ub) const;
-  virtual bool op1_range (irange &r, tree type,
-			  const irange &lhs,
-			  const irange &op2,
-			  relation_trio rel = TRIO_VARYING) const;
-  virtual bool op2_range (irange &r, tree type,
-			  const irange &lhs,
-			  const irange &op1,
-			  relation_trio rel = TRIO_VARYING) const;
-  virtual bool op1_op2_relation_effect (irange &lhs_range,
-	tree type,
-	const irange &op1_range,
-	const irange &op2_range,
-	relation_kind rel) const;
-  void update_bitmask (irange &r, const irange &lh, const irange &rh) const
-{ update_known_bitmask (r, BIT_XOR_EXPR, lh, rh); }
-} op_bitwise_xor;
+  update_known_bitmask (r, BIT_XOR_EXPR, lh, rh);
+}
 
 void
 operator_bitwise_xor::wi_fold (irange &r, tree type,
@@ -4588,7 +4569,6 @@ integral_table::integral_table ()
   set (MAX_EXPR, op_max);
   set (BIT_AND_EXPR, op_bitwise_and);
   set (BIT_IOR_EXPR, op_bitwise_or);
-  set (BIT_XOR_EXPR, op_bitwise_xor);
 }
 
 // Initialize any integral operators to the primary table
@@ -4618,8 +4598,6 @@ pointer_table::pointer_table ()
   set (BIT_IOR_EXPR, op_pointer_or);
   set (MIN_EXPR, op_ptr_min_max);
   set (MAX_EXPR, op_ptr_min_max);
-
-  set (BIT_XOR_EXPR, op_bitwise_xor);
 }
 
 // Initialize any pointer operators to the primary table
-- 
2.40.1



[COMMITTED 7/17] - Move operator_max to the unified range-op table.

2023-06-12 Thread Andrew MacLeod via Gcc-patches
This is the last of the integral operators, so also remove the integral 
table.


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From 6585fa54e0f2a54f1a398b49b5b4b6a9cd6da4ea Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Sat, 10 Jun 2023 16:10:54 -0400
Subject: [PATCH 07/17] Move operator_max to the unified range-op table.

Also remove the integral table.

	* range-op-mixed.h (class operator_max): Move from...
	* range-op.cc (unified_table::unified_table): Add MAX_EXPR.
	(get_op_handler): Remove the integral table.
	(class operator_max): Move from here.
	(integral_table::integral_table): Delete.
	* range-op.h (class integral_table): Delete.
---
 gcc/range-op-mixed.h | 10 ++
 gcc/range-op.cc  | 34 --
 gcc/range-op.h   |  9 -
 3 files changed, 18 insertions(+), 35 deletions(-)

diff --git a/gcc/range-op-mixed.h b/gcc/range-op-mixed.h
index 7bd9b5e1129..cd137acd0e6 100644
--- a/gcc/range-op-mixed.h
+++ b/gcc/range-op-mixed.h
@@ -607,4 +607,14 @@ private:
 		const wide_int &rh_ub) const final override;
 };
 
+class operator_max : public range_operator
+{
+public:
+  void update_bitmask (irange &r, const irange &lh,
+  const irange &rh) const final override;
+private:
+  void wi_fold (irange &r, tree type, const wide_int &lh_lb,
+		const wide_int &lh_ub, const wide_int &rh_lb,
+		const wide_int &rh_ub) const final override;
+};
 #endif // GCC_RANGE_OP_MIXED_H
diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index a777fb0d8a3..e83f627a722 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -49,7 +49,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-ccp.h"
 #include "range-op-mixed.h"
 
-integral_table integral_tree_table;
 pointer_table pointer_tree_table;
 
 // Instantiate a range_op_table for unified operations.
@@ -81,6 +80,7 @@ operator_bitwise_xor op_bitwise_xor;
 operator_bitwise_and op_bitwise_and;
 operator_bitwise_or op_bitwise_or;
 operator_min op_min;
+operator_max op_max;
 
 // Invoke the initialization routines for each class of range.
 
@@ -121,6 +121,7 @@ unified_table::unified_table ()
   set (BIT_AND_EXPR, op_bitwise_and);
   set (BIT_IOR_EXPR, op_bitwise_or);
   set (MIN_EXPR, op_min);
+  set (MAX_EXPR, op_max);
 }
 
 // The tables are hidden and accessed via a simple extern function.
@@ -132,16 +133,7 @@ get_op_handler (enum tree_code code, tree type)
   if (POINTER_TYPE_P (type) && pointer_tree_table[code])
 return pointer_tree_table[code];
 
-  if (unified_tree_table[code])
-{
-  // Should not be in any other table if it is in the unified table.
-  gcc_checking_assert (!integral_tree_table[code]);
-  return unified_tree_table[code];
-}
-
-  if (INTEGRAL_TYPE_P (type))
-return integral_tree_table[code];
-  return NULL;
+  return unified_tree_table[code];
 }
 
 range_op_handler::range_op_handler ()
@@ -2001,17 +1993,12 @@ operator_min::wi_fold (irange &r, tree type,
 }
 
 
-class operator_max : public range_operator
+void
+operator_max::update_bitmask (irange &r, const irange &lh,
+			  const irange &rh) const
 {
-public:
-  virtual void wi_fold (irange &r, tree type,
-		const wide_int &lh_lb,
-		const wide_int &lh_ub,
-		const wide_int &rh_lb,
-		const wide_int &rh_ub) const;
-  void update_bitmask (irange &r, const irange &lh, const irange &rh) const
-{ update_known_bitmask (r, MAX_EXPR, lh, rh); }
-} op_max;
+  update_known_bitmask (r, MAX_EXPR, lh, rh);
+}
 
 void
 operator_max::wi_fold (irange &r, tree type,
@@ -4529,11 +4516,6 @@ pointer_or_operator::wi_fold (irange &r, tree type,
 r.set_varying (type);
 }
 
-integral_table::integral_table ()
-{
-  set (MAX_EXPR, op_max);
-}
-
 // Initialize any integral operators to the primary table
 
 void
diff --git a/gcc/range-op.h b/gcc/range-op.h
index 0f5ee41f96c..08c51bace40 100644
--- a/gcc/range-op.h
+++ b/gcc/range-op.h
@@ -299,15 +299,6 @@ range_op_table::set (enum tree_code code, range_operator &op)
   m_range_tree[code] = &op;
 }
 
-// This holds the range op tables
-
-class integral_table : public range_op_table
-{
-public:
-  integral_table ();
-};
-extern integral_table integral_tree_table;
-
 // Instantiate a range op table for pointer operations.
 
 class pointer_table : public range_op_table
-- 
2.40.1



[COMMITTED 11/17] - Add a hybrid MIN_EXPR operator for integer and pointer.

2023-06-12 Thread Andrew MacLeod via Gcc-patches
Add a hybrid operator to choose between integer and pointer versions at 
runtime.
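
The dispatch is just a runtime type check in each method; a minimal sketch of
the shape (condensed from the full hybrid_min_operator in the patch below):

   void wi_fold (irange &r, tree type, const wide_int &lh_lb,
		 const wide_int &lh_ub, const wide_int &rh_lb,
		 const wide_int &rh_ub) const final override
     {
       // Integral types keep the integer semantics; pointers fall back
       // to the pointer min/max operator.
       if (INTEGRAL_TYPE_P (type))
	 return operator_min::wi_fold (r, type, lh_lb, lh_ub, rh_lb, rh_ub);
       else
	 return op_ptr_min_max.wi_fold (r, type, lh_lb, lh_ub, rh_lb, rh_ub);
     }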


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From 08f2e419b1e29f114857b3d817904abf3b4891be Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Sat, 10 Jun 2023 16:34:26 -0400
Subject: [PATCH 11/17] Add a hybrid MIN_EXPR operator for integer and pointer.

This adds an operator to the unified table for MIN_EXPR which will
select either the pointer or integer version based on the type passed
to the method.  This is for use until we have a separate PRANGE class.

	* range-op-mixed.h (operator_min): Remove final.
	* range-op-ptr.cc (pointer_table::pointer_table): Remove MIN_EXPR.
	(class hybrid_min_operator): New.
	(range_op_table::initialize_pointer_ops): Add hybrid_min_operator.
	* range-op.cc (unified_table::unified_table): Comment out MIN_EXPR.
---
 gcc/range-op-mixed.h |  6 +++---
 gcc/range-op-ptr.cc  | 28 +++-
 gcc/range-op.cc  |  2 +-
 3 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/gcc/range-op-mixed.h b/gcc/range-op-mixed.h
index e4852e974c4..a65935435c2 100644
--- a/gcc/range-op-mixed.h
+++ b/gcc/range-op-mixed.h
@@ -625,11 +625,11 @@ class operator_min : public range_operator
 {
 public:
   void update_bitmask (irange &r, const irange &lh,
-		   const irange &rh) const final override;
-private:
+		   const irange &rh) const override;
+protected:
   void wi_fold (irange &r, tree type, const wide_int &lh_lb,
 		const wide_int &lh_ub, const wide_int &rh_lb,
-		const wide_int &rh_ub) const final override;
+		const wide_int &rh_ub) const override;
 };
 
 class operator_max : public range_operator
diff --git a/gcc/range-op-ptr.cc b/gcc/range-op-ptr.cc
index 7b22d0bf05b..483e43ca994 100644
--- a/gcc/range-op-ptr.cc
+++ b/gcc/range-op-ptr.cc
@@ -270,7 +270,6 @@ operator_pointer_diff::op1_op2_relation_effect (irange &lhs_range, tree type,
 
 pointer_table::pointer_table ()
 {
-  set (MIN_EXPR, op_ptr_min_max);
   set (MAX_EXPR, op_ptr_min_max);
 }
 
@@ -380,6 +379,32 @@ public:
 }
 } op_hybrid_or;
 
+// Temporary class which dispatches routines to either the INT version or
+// the pointer version depending on the type.  Once PRANGE is a range
+// class, we can remove the hybrid.
+
+class hybrid_min_operator : public operator_min
+{
+public:
+  void update_bitmask (irange &r, const irange &lh,
+		   const irange &rh) const final override
+{
+  if (!r.undefined_p () && INTEGRAL_TYPE_P (r.type ()))
+	operator_min::update_bitmask (r, lh, rh);
+}
+
+  void wi_fold (irange &r, tree type, const wide_int &lh_lb,
+		const wide_int &lh_ub, const wide_int &rh_lb,
+		const wide_int &rh_ub) const final override
+{
+  if (INTEGRAL_TYPE_P (type))
+	return operator_min::wi_fold (r, type, lh_lb, lh_ub, rh_lb, rh_ub);
+  else
+	return op_ptr_min_max.wi_fold (r, type, lh_lb, lh_ub, rh_lb, rh_ub);
+}
+} op_hybrid_min;
+
+
 
 
 // Initialize any pointer operators to the primary table
@@ -391,4 +416,5 @@ range_op_table::initialize_pointer_ops ()
   set (POINTER_DIFF_EXPR, op_pointer_diff);
   set (BIT_AND_EXPR, op_hybrid_and);
   set (BIT_IOR_EXPR, op_hybrid_or);
+  set (MIN_EXPR, op_hybrid_min);
 }
diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 0a9a3297de7..481f3b1324d 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -123,7 +123,7 @@ unified_table::unified_table ()
 
   // set (BIT_AND_EXPR, op_bitwise_and);
   // set (BIT_IOR_EXPR, op_bitwise_or);
-  set (MIN_EXPR, op_min);
+  // set (MIN_EXPR, op_min);
   set (MAX_EXPR, op_max);
 }
 
-- 
2.40.1



[COMMITTED 4/17] - Move operator_bitwise_and to the unified range-op table.

2023-06-12 Thread Andrew MacLeod via Gcc-patches

Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From f2166fc81194a3e4e9ef185a7404551b410bb752 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Sat, 10 Jun 2023 16:02:09 -0400
Subject: [PATCH 04/17] Move operator_bitwise_and to the unified range-op
 table.

At this point, the remaining 4 integral operations have different
implementations than pointers, so we now check for a pointer table
entry first and, if there is nothing there, fall back to the unified table.

	* range-op-mixed.h (class operator_bitwise_and): Move from...
	* range-op.cc (unified_table::unified_table): Add BIT_AND_EXPR.
	(get_op_handler): Check for a pointer table entry first.
	(class operator_bitwise_and): Move from here.
	(integral_table::integral_table): Remove BIT_AND_EXPR.
---
 gcc/range-op-mixed.h | 27 
 gcc/range-op.cc  | 49 ++--
 2 files changed, 42 insertions(+), 34 deletions(-)

diff --git a/gcc/range-op-mixed.h b/gcc/range-op-mixed.h
index 644473053e0..b3d51f8a54e 100644
--- a/gcc/range-op-mixed.h
+++ b/gcc/range-op-mixed.h
@@ -550,4 +550,31 @@ private:
 		const wide_int &lh_ub, const wide_int &rh_lb,
 		const wide_int &rh_ub) const final override;
 };
+
+class operator_bitwise_and : public range_operator
+{
+public:
+  using range_operator::op1_range;
+  using range_operator::op2_range;
+  using range_operator::lhs_op1_relation;
+  bool op1_range (irange &r, tree type,
+		  const irange &lhs, const irange &op2,
+		  relation_trio rel = TRIO_VARYING) const final override;
+  bool op2_range (irange &r, tree type,
+		  const irange &lhs, const irange &op1,
+		  relation_trio rel = TRIO_VARYING) const final override;
+  relation_kind lhs_op1_relation (const irange &lhs,
+  const irange &op1, const irange &op2,
+  relation_kind) const final override;
+  void update_bitmask (irange &r, const irange &lh,
+		   const irange &rh) const final override;
+private:
+  void wi_fold (irange &r, tree type, const wide_int &lh_lb,
+		const wide_int &lh_ub, const wide_int &rh_lb,
+		const wide_int &rh_ub) const final override;
+  void simple_op1_range_solver (irange &r, tree type,
+const irange &lhs,
+const irange &op2) const;
+};
+
 #endif // GCC_RANGE_OP_MIXED_H
diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 11f576c55c5..57bd95a1151 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -78,6 +78,7 @@ operator_mult op_mult;
 operator_addr_expr op_addr;
 operator_bitwise_not op_bitwise_not;
 operator_bitwise_xor op_bitwise_xor;
+operator_bitwise_and op_bitwise_and;
 
 // Invoke the initialization routines for each class of range.
 
@@ -111,6 +112,11 @@ unified_table::unified_table ()
   set (ADDR_EXPR, op_addr);
   set (BIT_NOT_EXPR, op_bitwise_not);
   set (BIT_XOR_EXPR, op_bitwise_xor);
+
+  // These are in both integer and pointer tables, but pointer has a different
+  // implementation.  These also remain in the pointer table until a pointer
+  // speifc version is provided.
+  set (BIT_AND_EXPR, op_bitwise_and);
 }
 
 // The tables are hidden and accessed via a simple extern function.
@@ -118,16 +124,17 @@ unified_table::unified_table ()
 range_operator *
 get_op_handler (enum tree_code code, tree type)
 {
+  // If this is pointer type and there is pointer specifc routine, use it.
+  if (POINTER_TYPE_P (type) && pointer_tree_table[code])
+return pointer_tree_table[code];
+
   if (unified_tree_table[code])
 {
   // Should not be in any other table if it is in the unified table.
-  gcc_checking_assert (!pointer_tree_table[code]);
   gcc_checking_assert (!integral_tree_table[code]);
   return unified_tree_table[code];
 }
 
-  if (POINTER_TYPE_P (type))
-return pointer_tree_table[code];
   if (INTEGRAL_TYPE_P (type))
 return integral_tree_table[code];
   return NULL;
@@ -3121,37 +3128,12 @@ operator_logical_and::op2_range (irange &r, tree type,
 }
 
 
-class operator_bitwise_and : public range_operator
+void
+operator_bitwise_and::update_bitmask (irange &r, const irange &lh,
+  const irange &rh) const
 {
-  using range_operator::op1_range;
-  using range_operator::op2_range;
-  using range_operator::lhs_op1_relation;
-public:
-  virtual bool op1_range (irange &r, tree type,
-			  const irange &lhs,
-			  const irange &op2,
-			  relation_trio rel = TRIO_VARYING) const;
-  virtual bool op2_range (irange &r, tree type,
-			  const irange &lhs,
-			  const irange &op1,
-			  relation_trio rel = TRIO_VARYING) const;
-  virtual void wi_fold (irange &r, tree type,
-		const wide_int &lh_lb,
-		const wide_int &lh_ub,
-		const wide_int &rh_lb,
-		const wide_int &rh_ub) const;
-  virtual relation_kind lhs_op1_relation (const irange &lhs,
-	  const irange &op1,
-	  const irange &op2,
-	  relation_kind) const;
-  void update_bitmask (irange &r, const irange &lh, const irange &rh) const
-{ update_known_bitmask (r, BIT_AND_EXPR, lh, rh); }
-private:
-  void

[COMMITTED 5/17] - Move operator_bitwise_or to the unified range-op table.

2023-06-12 Thread Andrew MacLeod via Gcc-patches

Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From a71ee5c2d48691280f76a90e2838d968f45de0c8 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Sat, 10 Jun 2023 16:05:33 -0400
Subject: [PATCH 05/17] Move operator_bitwise_or to the unified range-op table.

	* range-op-mixed.h (class operator_bitwise_or): Move from...
	* range-op.cc (unified_table::unified_table): Add BIT_IOR_EXPR.
	(class operator_bitwise_or): Move from here.
	(integral_table::integral_table): Remove BIT_IOR_EXPR.
---
 gcc/range-op-mixed.h | 19 +++
 gcc/range-op.cc  | 28 +++-
 2 files changed, 26 insertions(+), 21 deletions(-)

diff --git a/gcc/range-op-mixed.h b/gcc/range-op-mixed.h
index b3d51f8a54e..8a11d61220c 100644
--- a/gcc/range-op-mixed.h
+++ b/gcc/range-op-mixed.h
@@ -577,4 +577,23 @@ private:
 const irange &op2) const;
 };
 
+class operator_bitwise_or : public range_operator
+{
+public:
+  using range_operator::op1_range;
+  using range_operator::op2_range;
+  bool op1_range (irange &r, tree type,
+		  const irange &lhs, const irange &op2,
+		  relation_trio rel = TRIO_VARYING) const final override;
+  bool op2_range (irange &r, tree type,
+		  const irange &lhs, const irange &op1,
+		  relation_trio rel = TRIO_VARYING) const final override;
+  void update_bitmask (irange &r, const irange &lh,
+		   const irange &rh) const final override;
+private:
+  void wi_fold (irange &r, tree type, const wide_int &lh_lb,
+		const wide_int &lh_ub, const wide_int &rh_lb,
+		const wide_int &rh_ub) const final override;
+};
+
 #endif // GCC_RANGE_OP_MIXED_H
diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 57bd95a1151..07e0c88e209 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -79,6 +79,7 @@ operator_addr_expr op_addr;
 operator_bitwise_not op_bitwise_not;
 operator_bitwise_xor op_bitwise_xor;
 operator_bitwise_and op_bitwise_and;
+operator_bitwise_or op_bitwise_or;
 
 // Invoke the initialization routines for each class of range.
 
@@ -117,6 +118,7 @@ unified_table::unified_table ()
   // implementation.  These also remain in the pointer table until a pointer
   // speifc version is provided.
   set (BIT_AND_EXPR, op_bitwise_and);
+  set (BIT_IOR_EXPR, op_bitwise_or);
 }
 
 // The tables are hidden and accessed via a simple extern function.
@@ -3608,27 +3610,12 @@ operator_logical_or::op2_range (irange &r, tree type,
 }
 
 
-class operator_bitwise_or : public range_operator
+void
+operator_bitwise_or::update_bitmask (irange &r, const irange &lh,
+ const irange &rh) const
 {
-  using range_operator::op1_range;
-  using range_operator::op2_range;
-public:
-  virtual bool op1_range (irange &r, tree type,
-			  const irange &lhs,
-			  const irange &op2,
-			  relation_trio rel = TRIO_VARYING) const;
-  virtual bool op2_range (irange &r, tree type,
-			  const irange &lhs,
-			  const irange &op1,
-			  relation_trio rel = TRIO_VARYING) const;
-  virtual void wi_fold (irange &r, tree type,
-		const wide_int &lh_lb,
-		const wide_int &lh_ub,
-		const wide_int &rh_lb,
-		const wide_int &rh_ub) const;
-  void update_bitmask (irange &r, const irange &lh, const irange &rh) const
-{ update_known_bitmask (r, BIT_IOR_EXPR, lh, rh); }
-} op_bitwise_or;
+  update_known_bitmask (r, BIT_IOR_EXPR, lh, rh);
+}
 
 void
 operator_bitwise_or::wi_fold (irange &r, tree type,
@@ -4549,7 +4536,6 @@ integral_table::integral_table ()
 {
   set (MIN_EXPR, op_min);
   set (MAX_EXPR, op_max);
-  set (BIT_IOR_EXPR, op_bitwise_or);
 }
 
 // Initialize any integral operators to the primary table
-- 
2.40.1



[COMMITTED 8/17] - Split pointer based range operators to range-op-ptr.cc

2023-06-12 Thread Andrew MacLeod via Gcc-patches
This patch moves all the pointer-specific code into a new file, 
range-op-ptr.cc.


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From cb511d2209fa3a05801983a6965656734c1592c6 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Sat, 10 Jun 2023 16:17:51 -0400
Subject: [PATCH 08/17] Split pointer based range operators to range-op-ptr.cc

Move the pointer table and all pointer-specific operators into a
new file for pointers.

	* Makefile.in (OBJS): Add range-op-ptr.o.
	* range-op-mixed.h (update_known_bitmask): Move prototype here.
	(minus_op1_op2_relation_effect): Move prototype here.
	(wi_includes_zero_p): Move function to here.
	(wi_zero_p): Ditto.
	* range-op.cc (update_known_bitmask): Remove static.
	(wi_includes_zero_p): Move to header.
	(wi_zero_p): Move to header.
	(minus_op1_op2_relation_effect): Remove static.
	(operator_pointer_diff): Move class and routines to range-op-ptr.cc.
	(pointer_plus_operator): Ditto.
	(pointer_min_max_operator): Ditto.
	(pointer_and_operator): Ditto.
	(pointer_or_operator): Ditto.
	(pointer_table): Ditto.
	(range_op_table::initialize_pointer_ops): Ditto.
	* range-op-ptr.cc: New.
---
 gcc/Makefile.in  |   1 +
 gcc/range-op-mixed.h |  25 
 gcc/range-op-ptr.cc  | 286 +++
 gcc/range-op.cc  | 258 +-
 4 files changed, 314 insertions(+), 256 deletions(-)
 create mode 100644 gcc/range-op-ptr.cc

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 0c02f312985..4be82e83b9e 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1588,6 +1588,7 @@ OBJS = \
 	range.o \
 	range-op.o \
 	range-op-float.o \
+	range-op-ptr.o \
 	read-md.o \
 	read-rtl.o \
 	read-rtl-function.o \
diff --git a/gcc/range-op-mixed.h b/gcc/range-op-mixed.h
index cd137acd0e6..b188f5a516e 100644
--- a/gcc/range-op-mixed.h
+++ b/gcc/range-op-mixed.h
@@ -22,6 +22,31 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_RANGE_OP_MIXED_H
 #define GCC_RANGE_OP_MIXED_H
 
+void update_known_bitmask (irange &, tree_code, const irange &, const irange &);
+bool minus_op1_op2_relation_effect (irange &lhs_range, tree type,
+const irange &, const irange &,
+relation_kind rel);
+
+
+// Return TRUE if 0 is within [WMIN, WMAX].
+
+inline bool
+wi_includes_zero_p (tree type, const wide_int &wmin, const wide_int &wmax)
+{
+  signop sign = TYPE_SIGN (type);
+  return wi::le_p (wmin, 0, sign) && wi::ge_p (wmax, 0, sign);
+}
+
+// Return TRUE if [WMIN, WMAX] is the singleton 0.
+
+inline bool
+wi_zero_p (tree type, const wide_int &wmin, const wide_int &wmax)
+{
+  unsigned prec = TYPE_PRECISION (type);
+  return wmin == wmax && wi::eq_p (wmin, wi::zero (prec));
+}
+
+
 enum bool_range_state { BRS_FALSE, BRS_TRUE, BRS_EMPTY, BRS_FULL };
 bool_range_state get_bool_state (vrange &r, const vrange &lhs, tree val_type);
 
diff --git a/gcc/range-op-ptr.cc b/gcc/range-op-ptr.cc
new file mode 100644
index 000..55c37cc8c86
--- /dev/null
+++ b/gcc/range-op-ptr.cc
@@ -0,0 +1,286 @@
+/* Code for range operators.
+   Copyright (C) 2017-2023 Free Software Foundation, Inc.
+   Contributed by Andrew MacLeod 
+   and Aldy Hernandez .
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "insn-codes.h"
+#include "rtl.h"
+#include "tree.h"
+#include "gimple.h"
+#include "cfghooks.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "optabs-tree.h"
+#include "gimple-pretty-print.h"
+#include "diagnostic-core.h"
+#include "flags.h"
+#include "fold-const.h"
+#include "stor-layout.h"
+#include "calls.h"
+#include "cfganal.h"
+#include "gimple-iterator.h"
+#include "gimple-fold.h"
+#include "tree-eh.h"
+#include "gimple-walk.h"
+#include "tree-cfg.h"
+#include "wide-int.h"
+#include "value-relation.h"
+#include "range-op.h"
+#include "tree-ssa-ccp.h"
+#include "range-op-mixed.h"
+
+class pointer_plus_operator : public range_operator
+{
+  using range_operator::op2_range;
+public:
+  virtual void wi_fold (irange &r, tree type,
+			const wide_int &lh_lb,
+			const wide_int &lh_ub,
+			const wide_int &rh_lb,
+			const wide_int &rh_ub) const;
+  virtual bool op2_range (irange &r, tree type,
+			  const irange &lhs,
+			  const irange &op1,
+			  relation_trio = TRIO_VARYING) const;
+  void update_bitmask (irange &r, const irange &lh, con

[COMMITTED 13/17] - Remove type from range_op_handler table selection

2023-06-12 Thread Andrew MacLeod via Gcc-patches
Lucky 13.  With the unified table complete, it is no longer necessary to 
specify a type when constructing a range_op_handler.  This patch removes 
that requirement.
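
A minimal sketch of the call-site change (PLUS_EXPR and the variable names
here are only illustrative; the gori change below shows the real site):

   // Before: a type was needed to select the integral or pointer table.
   range_op_handler handler (PLUS_EXPR, TREE_TYPE (op1));
   // After: the unified table is indexed by the tree code alone.
   range_op_handler handler (PLUS_EXPR);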


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From 8934830333933349d41e62f9fd6a3d21ab71150c Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Sat, 10 Jun 2023 16:41:20 -0400
Subject: [PATCH 13/17] Remove type from range_op_handler table selection

With the unified table complete, we no longer need to specify a type
to choose a table when setting a range_op_handler.

	* gimple-range-gori.cc (gori_compute::condexpr_adjust): Do not
	pass type.
	* gimple-range-op.cc (get_code): Rename from get_code_and_type
	and simplify.
	(gimple_range_op_handler::supported_p): No need for type.
	(gimple_range_op_handler::gimple_range_op_handler): Ditto.
	(cfn_copysign::fold_range): Ditto.
	(cfn_ubsan::fold_range): Ditto.
	* ipa-cp.cc (ipa_vr_operation_and_type_effects): Ditto.
	* ipa-fnsummary.cc (evaluate_conditions_for_known_args): Ditto.
	* range-op-float.cc (operator_plus::op1_range): Ditto.
	(operator_mult::op1_range): Ditto.
	(range_op_float_tests): Ditto.
	* range-op.cc (get_op_handler): Remove.
	(range_op_handler::set_op_handler): Remove.
	(operator_plus::op1_range): No need for type.
	(operator_minus::op1_range): Ditto.
	(operator_mult::op1_range): Ditto.
	(operator_exact_divide::op1_range): Ditto.
	(operator_cast::op1_range): Ditto.
	(operator_bitwise_not::fold_range): Ditto.
	(operator_negate::fold_range): Ditto.
	* range-op.h (range_op_handler::range_op_handler): Remove type param.
	(range_cast): No need for type.
	(range_op_table::operator[]): Check for enum_code >= 0.
	* tree-data-ref.cc (compute_distributive_range): No need for type.
	* tree-ssa-loop-unswitch.cc (unswitch_predicate): Ditto.
	* value-query.cc (range_query::get_tree_range): Ditto.
	* value-relation.cc (relation_oracle::validate_relation): Ditto.
	* vr-values.cc (range_of_var_in_loop): Ditto.
	(simplify_using_ranges::fold_cond_with_ops): Ditto.
---
 gcc/gimple-range-gori.cc  |  2 +-
 gcc/gimple-range-op.cc| 42 ++-
 gcc/ipa-cp.cc |  6 ++---
 gcc/ipa-fnsummary.cc  |  6 ++---
 gcc/range-op-float.cc |  6 ++---
 gcc/range-op.cc   | 39 
 gcc/range-op.h| 10 +++--
 gcc/tree-data-ref.cc  |  4 ++--
 gcc/tree-ssa-loop-unswitch.cc |  2 +-
 gcc/value-query.cc|  5 ++---
 gcc/value-relation.cc |  2 +-
 gcc/vr-values.cc  |  6 ++---
 12 files changed, 43 insertions(+), 87 deletions(-)

diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc
index a1c8d51e484..abc70cd54ee 100644
--- a/gcc/gimple-range-gori.cc
+++ b/gcc/gimple-range-gori.cc
@@ -1478,7 +1478,7 @@ gori_compute::condexpr_adjust (vrange &r1, vrange &r2, gimple *, tree cond,
   tree type = TREE_TYPE (gimple_assign_rhs1 (cond_def));
   if (!range_compatible_p (type, TREE_TYPE (gimple_assign_rhs2 (cond_def
 return false;
-  range_op_handler hand (gimple_assign_rhs_code (cond_def), type);
+  range_op_handler hand (gimple_assign_rhs_code (cond_def));
   if (!hand)
 return false;
 
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index b6b10e47b78..4cbc981ee04 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -94,28 +94,14 @@ gimple_range_base_of_assignment (const gimple *stmt)
 
 // If statement is supported by range-ops, set the CODE and return the TYPE.
 
-static tree
-get_code_and_type (gimple *s, enum tree_code &code)
+static inline enum tree_code
+get_code (gimple *s)
 {
-  tree type = NULL_TREE;
-  code = NOP_EXPR;
-
   if (const gassign *ass = dyn_cast (s))
-{
-  code = gimple_assign_rhs_code (ass);
-  // The LHS of a comparison is always an int, so we must look at
-  // the operands.
-  if (TREE_CODE_CLASS (code) == tcc_comparison)
-	type = TREE_TYPE (gimple_assign_rhs1 (ass));
-  else
-	type = TREE_TYPE (gimple_assign_lhs (ass));
-}
-  else if (const gcond *cond = dyn_cast (s))
-{
-  code = gimple_cond_code (cond);
-  type = TREE_TYPE (gimple_cond_lhs (cond));
-}
-  return type;
+return gimple_assign_rhs_code (ass);
+  if (const gcond *cond = dyn_cast (s))
+return gimple_cond_code (cond);
+  return ERROR_MARK;
 }
 
 // If statement S has a supported range_op handler return TRUE.
@@ -123,9 +109,8 @@ get_code_and_type (gimple *s, enum tree_code &code)
 bool
 gimple_range_op_handler::supported_p (gimple *s)
 {
-  enum tree_code code;
-  tree type = get_code_and_type (s, code);
-  if (type && range_op_handler (code, type))
+  enum tree_code code = get_code (s);
+  if (range_op_handler (code))
 return true;
   if (is_a  (s) && gimple_range_op_handler (s))
 return true;
@@ -135,14 +120,11 @@ gimple_range_op_handler::supported_p (gimple *s)
 // Construct a handler object for statement S.
 
 gimple_range_op_handler::gimple_range

[COMMITTED 6/17] - Move operator_min to the unified range-op table.

2023-06-12 Thread Andrew MacLeod via Gcc-patches

Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From 508645fd461ceb8b743837e24411df2e17bd3950 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Sat, 10 Jun 2023 16:09:58 -0400
Subject: [PATCH 06/17] Move operator_min to the unified range-op table.

	* range-op-mixed.h (class operator_min): Move from...
	* range-op.cc (unified_table::unified_table): Add MIN_EXPR.
	(class operator_min): Move from here.
	(integral_table::integral_table): Remove MIN_EXPR.
---
 gcc/range-op-mixed.h | 11 +++
 gcc/range-op.cc  | 18 +++---
 2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/gcc/range-op-mixed.h b/gcc/range-op-mixed.h
index 8a11d61220c..7bd9b5e1129 100644
--- a/gcc/range-op-mixed.h
+++ b/gcc/range-op-mixed.h
@@ -596,4 +596,15 @@ private:
 		const wide_int &rh_ub) const final override;
 };
 
+class operator_min : public range_operator
+{
+public:
+  void update_bitmask (irange &r, const irange &lh,
+		   const irange &rh) const final override;
+private:
+  void wi_fold (irange &r, tree type, const wide_int &lh_lb,
+		const wide_int &lh_ub, const wide_int &rh_lb,
+		const wide_int &rh_ub) const final override;
+};
+
 #endif // GCC_RANGE_OP_MIXED_H
diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 07e0c88e209..a777fb0d8a3 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -80,6 +80,7 @@ operator_bitwise_not op_bitwise_not;
 operator_bitwise_xor op_bitwise_xor;
 operator_bitwise_and op_bitwise_and;
 operator_bitwise_or op_bitwise_or;
+operator_min op_min;
 
 // Invoke the initialization routines for each class of range.
 
@@ -119,6 +120,7 @@ unified_table::unified_table ()
   // speifc version is provided.
   set (BIT_AND_EXPR, op_bitwise_and);
   set (BIT_IOR_EXPR, op_bitwise_or);
+  set (MIN_EXPR, op_min);
 }
 
 // The tables are hidden and accessed via a simple extern function.
@@ -1980,17 +1982,12 @@ operator_pointer_diff::op1_op2_relation_effect (irange &lhs_range, tree type,
 }
 
 
-class operator_min : public range_operator
+void
+operator_min::update_bitmask (irange &r, const irange &lh,
+			  const irange &rh) const
 {
-public:
-  virtual void wi_fold (irange &r, tree type,
-		const wide_int &lh_lb,
-		const wide_int &lh_ub,
-		const wide_int &rh_lb,
-		const wide_int &rh_ub) const;
-  void update_bitmask (irange &r, const irange &lh, const irange &rh) const
-{ update_known_bitmask (r, MIN_EXPR, lh, rh); }
-} op_min;
+  update_known_bitmask (r, MIN_EXPR, lh, rh);
+}
 
 void
 operator_min::wi_fold (irange &r, tree type,
@@ -4534,7 +4531,6 @@ pointer_or_operator::wi_fold (irange &r, tree type,
 
 integral_table::integral_table ()
 {
-  set (MIN_EXPR, op_min);
   set (MAX_EXPR, op_max);
 }
 
-- 
2.40.1



[COMMITTED 14/17] - Switch from unified table to range_op_table. There can be only one.

2023-06-12 Thread Andrew MacLeod via Gcc-patches
Now that the unified table is the only one,  remove it and simply use 
range_op_table as the class instead of inheriting from it.


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From 5bb9d2acd1987f788a52a2be9bca10c47033020a Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Sat, 10 Jun 2023 16:56:06 -0400
Subject: [PATCH 14/17] Switch from unified table to range_op_table.  There can
 be only one.

Now that there is only a single range_op_table, make the base table the
only table.

	* range-op.cc (unified_table): Delete.
	(range_op_table operator_table): Instantiate.
	(range_op_table::range_op_table): Rename from unified_table.
	(range_op_handler::range_op_handler): Use range_op_table.
	* range-op.h (range_op_table::operator []): Inline.
	(range_op_table::set): Inline.
---
 gcc/range-op.cc | 14 +-
 gcc/range-op.h  | 33 +++--
 2 files changed, 16 insertions(+), 31 deletions(-)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 3e8b1222b1c..382f5d50ffa 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -49,13 +49,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-ccp.h"
 #include "range-op-mixed.h"
 
-// Instantiate a range_op_table for unified operations.
-class unified_table : public range_op_table
-{
-  public:
-unified_table ();
-} unified_tree_table;
-
 // Instantiate the operators which apply to multiple types here.
 
 operator_equal op_equal;
@@ -80,9 +73,12 @@ operator_bitwise_or op_bitwise_or;
 operator_min op_min;
 operator_max op_max;
 
+// Instantaite a range operator table.
+range_op_table operator_table;
+
 // Invoke the initialization routines for each class of range.
 
-unified_table::unified_table ()
+range_op_table::range_op_table ()
 {
   initialize_integral_ops ();
   initialize_pointer_ops ();
@@ -134,7 +130,7 @@ range_op_handler::range_op_handler ()
 
 range_op_handler::range_op_handler (tree_code code)
 {
-  m_operator = unified_tree_table[code];
+  m_operator = operator_table[code];
 }
 
 // Create a dispatch pattern for value range discriminators LHS, OP1, and OP2.
diff --git a/gcc/range-op.h b/gcc/range-op.h
index 295e5116dd1..328910d0ec5 100644
--- a/gcc/range-op.h
+++ b/gcc/range-op.h
@@ -266,35 +266,24 @@ extern void wi_set_zero_nonzero_bits (tree type,
 class range_op_table
 {
 public:
-  range_operator *operator[] (enum tree_code code);
-  void set (enum tree_code code, range_operator &op);
+  range_op_table ();
+  inline range_operator *operator[] (enum tree_code code)
+{
+  gcc_checking_assert (code >= 0 && code < MAX_TREE_CODES);
+  return m_range_tree[code];
+}
 protected:
+  inline void set (enum tree_code code, range_operator &op)
+{
+  gcc_checking_assert (m_range_tree[code] == NULL);
+  m_range_tree[code] = &op;
+}
   range_operator *m_range_tree[MAX_TREE_CODES];
   void initialize_integral_ops ();
   void initialize_pointer_ops ();
   void initialize_float_ops ();
 };
 
-
-// Return a pointer to the range_operator instance, if there is one
-// associated with tree_code CODE.
-
-inline range_operator *
-range_op_table::operator[] (enum tree_code code)
-{
-  gcc_checking_assert (code >= 0 && code < MAX_TREE_CODES);
-  return m_range_tree[code];
-}
-
-// Add OP to the handler table for CODE.
-
-inline void
-range_op_table::set (enum tree_code code, range_operator &op)
-{
-  gcc_checking_assert (m_range_tree[code] == NULL);
-  m_range_tree[code] = &op;
-}
-
 extern range_operator *ptr_op_widen_mult_signed;
 extern range_operator *ptr_op_widen_mult_unsigned;
 extern range_operator *ptr_op_widen_plus_signed;
-- 
2.40.1



[COMMITTED 16/17] - Provide interface for non-standard operators.

2023-06-12 Thread Andrew MacLeod via Gcc-patches
This patch removes the hack introduced late last year for the 
non-standard range-op support.


Instead of adding a pointer to a range_operator in the header file, 
and then setting the operator from another file via that pointer, the 
table itself is extended and we provide new #defines to declare new 
operators.
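
A minimal sketch of how such an operator is now looked up (this is the shape
used for the WIDEN_MULT handling in the patch below; the extra opcodes simply
index the same table beyond the tree codes):

   range_op_handler signed_op (OP_WIDEN_MULT_SIGNED);
   gcc_checking_assert (signed_op);
   m_operator = signed_op.range_op ();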


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From 6d3b6847bcb36221185a6259d19d743f4cfe1b5a Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Sat, 10 Jun 2023 17:06:36 -0400
Subject: [PATCH 16/17] Provide interface for non-standard operators.

This removes the hack introduced for WIDEN_MULT which exported a pointer
to the operator so that gimple-range-op.cc could set the operator to this
pointer when it was appropriate.

Instead, we simply change the range-op table to be unsigned indexed,
and add new opcodes to the end of the table, allowing them to be indexed
directly via range_op_handler::range_op.

	* gimple-range-op.cc (gimple_range_op_handler::maybe_non_standard):
	Use range_op_handler directly.
	* range-op.cc (range_op_handler::range_op_handler): Unsigned
	param instead of tree-code.
	(ptr_op_widen_plus_signed): Delete.
	(ptr_op_widen_plus_unsigned): Delete.
	(ptr_op_widen_mult_signed): Delete.
	(ptr_op_widen_mult_unsigned): Delete.
	(range_op_table::initialize_integral_ops): Add new opcodes.
	* range-op.h (range_op_handler): Use unsigned.
	(OP_WIDEN_MULT_SIGNED): New.
	(OP_WIDEN_MULT_UNSIGNED): New.
	(OP_WIDEN_PLUS_SIGNED): New.
	(OP_WIDEN_PLUS_UNSIGNED): New.
	(RANGE_OP_TABLE_SIZE): New.
	(range_op_table::operator []): Use unsigned.
	(range_op_table::set): Use unsigned.
	(m_range_tree): Make unsigned.
	(ptr_op_widen_mult_signed): Remove.
	(ptr_op_widen_mult_unsigned): Remove.
	(ptr_op_widen_plus_signed): Remove.
	(ptr_op_widen_plus_unsigned): Remove.
---
 gcc/gimple-range-op.cc | 11 +++
 gcc/range-op.cc| 11 ++-
 gcc/range-op.h | 26 --
 3 files changed, 29 insertions(+), 19 deletions(-)

diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 021a9108ecf..72c7b866f90 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1168,8 +1168,11 @@ public:
 void
 gimple_range_op_handler::maybe_non_standard ()
 {
-  range_operator *signed_op = ptr_op_widen_mult_signed;
-  range_operator *unsigned_op = ptr_op_widen_mult_unsigned;
+  range_op_handler signed_op (OP_WIDEN_MULT_SIGNED);
+  gcc_checking_assert (signed_op);
+  range_op_handler unsigned_op (OP_WIDEN_MULT_UNSIGNED);
+  gcc_checking_assert (unsigned_op);
+
   if (gimple_code (m_stmt) == GIMPLE_ASSIGN)
 switch (gimple_assign_rhs_code (m_stmt))
   {
@@ -1195,9 +1198,9 @@ gimple_range_op_handler::maybe_non_standard ()
 	std::swap (m_op1, m_op2);
 
 	  if (signed1 || signed2)
-	m_operator = signed_op;
+	m_operator = signed_op.range_op ();
 	  else
-	m_operator = unsigned_op;
+	m_operator = unsigned_op.range_op ();
 	  break;
 	}
 	default:
diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index a271e00fa07..8a661fdb042 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -135,7 +135,7 @@ range_op_handler::range_op_handler ()
 // Create a range_op_handler for CODE.  Use a default operatoer if CODE
 // does not have an entry.
 
-range_op_handler::range_op_handler (tree_code code)
+range_op_handler::range_op_handler (unsigned code)
 {
   m_operator = operator_table[code];
   if (!m_operator)
@@ -1726,7 +1726,6 @@ public:
 			const wide_int &rh_lb,
 			const wide_int &rh_ub) const;
 } op_widen_plus_signed;
-range_operator *ptr_op_widen_plus_signed = &op_widen_plus_signed;
 
 void
 operator_widen_plus_signed::wi_fold (irange &r, tree type,
@@ -1760,7 +1759,6 @@ public:
 			const wide_int &rh_lb,
 			const wide_int &rh_ub) const;
 } op_widen_plus_unsigned;
-range_operator *ptr_op_widen_plus_unsigned = &op_widen_plus_unsigned;
 
 void
 operator_widen_plus_unsigned::wi_fold (irange &r, tree type,
@@ -2184,7 +2182,6 @@ public:
 			const wide_int &rh_ub)
 const;
 } op_widen_mult_signed;
-range_operator *ptr_op_widen_mult_signed = &op_widen_mult_signed;
 
 void
 operator_widen_mult_signed::wi_fold (irange &r, tree type,
@@ -2217,7 +2214,6 @@ public:
 			const wide_int &rh_ub)
 const;
 } op_widen_mult_unsigned;
-range_operator *ptr_op_widen_mult_unsigned = &op_widen_mult_unsigned;
 
 void
 operator_widen_mult_unsigned::wi_fold (irange &r, tree type,
@@ -4298,6 +4294,11 @@ range_op_table::initialize_integral_ops ()
   set (IMAGPART_EXPR, op_unknown);
   set (REALPART_EXPR, op_unknown);
   set (ABSU_EXPR, op_absu);
+  set (OP_WIDEN_MULT_SIGNED, op_widen_mult_signed);
+  set (OP_WIDEN_MULT_UNSIGNED, op_widen_mult_unsigned);
+  set (OP_WIDEN_PLUS_SIGNED, op_widen_plus_signed);
+  set (OP_WIDEN_PLUS_UNSIGNED, op_widen_plus_unsigned);
+
 }
 
 #if CHECKING_P
diff --git a/gcc/range-op.h b/gcc/range-op.h
index 8243258eea5..3602bc4e123 100644
--- a/gcc/range-op.h
+++ b/gcc/range-op.h
@@ -185,7 +185,7 @@ class range_op_handler

[COMMITTED 9/17] - Add a hybrid BIT_AND_EXPR operator for integer and pointer.

2023-06-12 Thread Andrew MacLeod via Gcc-patches
Add a hybrid operator to choose between integer and pointer versions at 
runtime.


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From 8adb8b2fd5797706e9fbb353d52fda123545431d Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Sat, 10 Jun 2023 16:28:40 -0400
Subject: [PATCH 09/17] Add a hybrid BIT_AND_EXPR operator for integer and
 pointer.

This adds an operator to the unified table for BIT_AND_EXPR which will
select either the pointer or integer version based on the type passed
to the method.  This is for use until we have a separate PRANGE class.

	* range-op-mixed.h (operator_bitwise_and): Remove final.
	* range-op-ptr.cc (pointer_table::pointer_table): Remove BIT_AND_EXPR.
	(class hybrid_and_operator): New.
	(range_op_table::initialize_pointer_ops): Add hybrid_and_operator.
	* range-op.cc (unified_table::unified_table): Comment out BIT_AND_EXPR.
---
 gcc/range-op-mixed.h | 12 -
 gcc/range-op-ptr.cc  | 62 +++-
 gcc/range-op.cc  |  9 ---
 3 files changed, 73 insertions(+), 10 deletions(-)

diff --git a/gcc/range-op-mixed.h b/gcc/range-op-mixed.h
index b188f5a516e..4177818e4b9 100644
--- a/gcc/range-op-mixed.h
+++ b/gcc/range-op-mixed.h
@@ -584,19 +584,19 @@ public:
   using range_operator::lhs_op1_relation;
   bool op1_range (irange &r, tree type,
 		  const irange &lhs, const irange &op2,
-		  relation_trio rel = TRIO_VARYING) const final override;
+		  relation_trio rel = TRIO_VARYING) const override;
   bool op2_range (irange &r, tree type,
 		  const irange &lhs, const irange &op1,
-		  relation_trio rel = TRIO_VARYING) const final override;
+		  relation_trio rel = TRIO_VARYING) const override;
   relation_kind lhs_op1_relation (const irange &lhs,
   const irange &op1, const irange &op2,
-  relation_kind) const final override;
+  relation_kind) const override;
   void update_bitmask (irange &r, const irange &lh,
-		   const irange &rh) const final override;
-private:
+		   const irange &rh) const override;
+protected:
   void wi_fold (irange &r, tree type, const wide_int &lh_lb,
 		const wide_int &lh_ub, const wide_int &rh_lb,
-		const wide_int &rh_ub) const final override;
+		const wide_int &rh_ub) const override;
   void simple_op1_range_solver (irange &r, tree type,
 const irange &lhs,
 const irange &op2) const;
diff --git a/gcc/range-op-ptr.cc b/gcc/range-op-ptr.cc
index 55c37cc8c86..941026994ed 100644
--- a/gcc/range-op-ptr.cc
+++ b/gcc/range-op-ptr.cc
@@ -270,12 +270,71 @@ operator_pointer_diff::op1_op2_relation_effect (irange &lhs_range, tree type,
 
 pointer_table::pointer_table ()
 {
-  set (BIT_AND_EXPR, op_pointer_and);
   set (BIT_IOR_EXPR, op_pointer_or);
   set (MIN_EXPR, op_ptr_min_max);
   set (MAX_EXPR, op_ptr_min_max);
 }
 
+// --
+// Hybrid operators for the 4 operations which integer and pointers share,
+// but which have different implementations.  Simply check the type in
+// the call and choose the appropriate method.
+// Once there is a PRANGE signature, simply add the appropriate
+// prototypes in the rmixed range class, and remove these hybrid classes.
+
+class hybrid_and_operator : public operator_bitwise_and
+{
+public:
+  using range_operator::op1_range;
+  using range_operator::op2_range;
+  using range_operator::lhs_op1_relation;
+  bool op1_range (irange &r, tree type,
+		  const irange &lhs, const irange &op2,
+		  relation_trio rel = TRIO_VARYING) const final override
+{
+  if (INTEGRAL_TYPE_P (type))
+	return operator_bitwise_and::op1_range (r, type, lhs, op2, rel);
+  else
+	return false;
+}
+  bool op2_range (irange &r, tree type,
+		  const irange &lhs, const irange &op1,
+		  relation_trio rel = TRIO_VARYING) const final override
+{
+  if (INTEGRAL_TYPE_P (type))
+	return operator_bitwise_and::op2_range (r, type, lhs, op1, rel);
+  else
+	return false;
+}
+  relation_kind lhs_op1_relation (const irange &lhs,
+  const irange &op1, const irange &op2,
+  relation_kind rel) const final override
+{
+  if (!lhs.undefined_p () && INTEGRAL_TYPE_P (lhs.type ()))
+	return operator_bitwise_and::lhs_op1_relation (lhs, op1, op2, rel);
+  else
+	return VREL_VARYING;
+}
+  void update_bitmask (irange &r, const irange &lh,
+		   const irange &rh) const final override
+{
+  if (!r.undefined_p () && INTEGRAL_TYPE_P (r.type ()))
+	operator_bitwise_and::update_bitmask (r, lh, rh);
+}
+
+  void wi_fold (irange &r, tree type, const wide_int &lh_lb,
+		const wide_int &lh_ub, const wide_int &rh_lb,
+		const wide_int &rh_ub) const final override
+{
+  if (INTEGRAL_TYPE_P (type))
+	return operator_bitwise_and::wi_fold (r, type, lh_lb, lh_ub,
+	  rh_lb, rh_ub);
+  else
+	return op_pointer_and.wi_fold (r, type, lh_lb, lh_ub, rh_lb, rh_ub);
+}
+} op_hybrid_and;
+
+
 // Initialize any pointer operators to the primary 

[COMMITTED 15/17] - Provide a default range_operator via range_op_handler.

2023-06-12 Thread Andrew MacLeod via Gcc-patches
This provides range_op_handler with a default range_operator, so you no 
longer need to check if it has a valid handler or not.


The valid check now turns into an "is this something other than a default 
operator" check.  It means you can now simply invoke fold without 
checking; i.e., instead of


range_op_handler handler(CONVERT_EXPR);
if (handler &&  handler.fold_range (..))

we can simply write
if (range_op_handler(CONVERT_EXPR).fold_range ())

The new method range_op() will return a pointer to the custom 
range_operator, or NULL if it is the default.  This allows use of 
range_op_handler() to behave as if you were indexing a range table, if 
that happens to be needed.
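
A minimal sketch of that use, condensed from the gimple_range_op_handler
constructor change below (oper and m_operator as in that code; the type
support checks are omitted here):

   range_op_handler oper (get_code (s));
   if (oper)
     // Stash the underlying operator; NULL would mean the default.
     m_operator = oper.range_op ();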


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From 3c4399657d35a0b5bf7caeb88c6ddc0461322d3f Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Sat, 10 Jun 2023 16:59:38 -0400
Subject: [PATCH 15/17] Provide a default range_operator via range_op_handler.

range_op_handler now provides a default range_operator for any opcode,
so there is no longer a need to check for a valid operator.

	* gimple-range-op.cc (gimple_range_op_handler): Set m_operator
	manually as there is no access to the default operator.
	(cfn_copysign::fold_range): Don't check for validity.
	(cfn_ubsan::fold_range): Ditto.
	(gimple_range_op_handler::maybe_builtin_call): Don't set to NULL.
	* range-op.cc (default_operator): New.
	(range_op_handler::range_op_handler): Use default_operator
	instead of NULL.
	(range_op_handler::operator bool): Move from header, compare
	against default operator.
	(range_op_handler::range_op): New.
	* range-op.h (range_op_handler::operator bool): Move.
---
 gcc/gimple-range-op.cc | 28 +---
 gcc/range-op.cc| 32 ++--
 gcc/range-op.h |  3 ++-
 3 files changed, 45 insertions(+), 18 deletions(-)

diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 4cbc981ee04..021a9108ecf 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -120,21 +120,22 @@ gimple_range_op_handler::supported_p (gimple *s)
 // Construct a handler object for statement S.
 
 gimple_range_op_handler::gimple_range_op_handler (gimple *s)
-  : range_op_handler (get_code (s))
 {
+  range_op_handler oper (get_code (s));
   m_stmt = s;
   m_op1 = NULL_TREE;
   m_op2 = NULL_TREE;
 
-  if (m_operator)
+  if (oper)
 switch (gimple_code (m_stmt))
   {
 	case GIMPLE_COND:
 	  m_op1 = gimple_cond_lhs (m_stmt);
 	  m_op2 = gimple_cond_rhs (m_stmt);
 	  // Check that operands are supported types.  One check is enough.
-	  if (!Value_Range::supports_type_p (TREE_TYPE (m_op1)))
-	m_operator = NULL;
+	  if (Value_Range::supports_type_p (TREE_TYPE (m_op1)))
+	m_operator = oper.range_op ();
+	  gcc_checking_assert (m_operator);
 	  return;
 	case GIMPLE_ASSIGN:
 	  m_op1 = gimple_range_base_of_assignment (m_stmt);
@@ -153,7 +154,9 @@ gimple_range_op_handler::gimple_range_op_handler (gimple *s)
 	m_op2 = gimple_assign_rhs2 (m_stmt);
 	  // Check that operands are supported types.  One check is enough.
 	  if ((m_op1 && !Value_Range::supports_type_p (TREE_TYPE (m_op1
-	m_operator = NULL;
+	return;
+	  m_operator = oper.range_op ();
+	  gcc_checking_assert (m_operator);
 	  return;
 	default:
 	  gcc_unreachable ();
@@ -165,6 +168,7 @@ gimple_range_op_handler::gimple_range_op_handler (gimple *s)
 maybe_builtin_call ();
   else
 maybe_non_standard ();
+  gcc_checking_assert (m_operator);
 }
 
 // Calculate what we can determine of the range of this unary
@@ -364,11 +368,10 @@ public:
 			   const frange &rh, relation_trio) const override
   {
 frange neg;
-range_op_handler abs_op (ABS_EXPR);
-range_op_handler neg_op (NEGATE_EXPR);
-if (!abs_op || !abs_op.fold_range (r, type, lh, frange (type)))
+if (!range_op_handler (ABS_EXPR).fold_range (r, type, lh, frange (type)))
   return false;
-if (!neg_op || !neg_op.fold_range (neg, type, r, frange (type)))
+if (!range_op_handler (NEGATE_EXPR).fold_range (neg, type, r,
+		frange (type)))
   return false;
 
 bool signbit;
@@ -1073,14 +1076,11 @@ public:
   virtual bool fold_range (irange &r, tree type, const irange &lh,
 			   const irange &rh, relation_trio rel) const
   {
-range_op_handler handler (m_code);
-gcc_checking_assert (handler);
-
 bool saved_flag_wrapv = flag_wrapv;
 // Pretend the arithmetic is wrapping.  If there is any overflow,
 // we'll complain, but will actually do wrapping operation.
 flag_wrapv = 1;
-bool result = handler.fold_range (r, type, lh, rh, rel);
+bool result = range_op_handler (m_code).fold_range (r, type, lh, rh, rel);
 flag_wrapv = saved_flag_wrapv;
 
 // If for both arguments vrp_valueize returned non-NULL, this should
@@ -1230,8 +1230,6 @@ gimple_range_op_handler::maybe_builtin_call ()
 	m_operator = &op_cfn_constant_p;
   else if (frange::supports_p (TREE_TYPE (m_o

[COMMITTED 10/17] - Add a hybrid BIT_IOR_EXPR operator for integer and pointer.

2023-06-12 Thread Andrew MacLeod via Gcc-patches
Add a hybrid operator to choose between integer and pointer versions at 
runtime.


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From 80f402e832a2ce402ee1562030d5c67ebc276f7c Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Sat, 10 Jun 2023 16:33:17 -0400
Subject: [PATCH 10/17] Add a hybrid BIT_IOR_EXPR operator for integer and
 pointer.

This adds an operator to the unified table for BIT_IOR_EXPR which will
select either the pointer or integer version based on the type passed
to the method.  This is for use until we have a separate PRANGE class.

	* range-op-mixed.h (operator_bitwise_or): Remove final.
	* range-op-ptr.cc (pointer_table::pointer_table): Remove BIT_IOR_EXPR.
	(class hybrid_or_operator): New.
	(range_op_table::initialize_pointer_ops): Add hybrid_or_operator.
	* range-op.cc (unified_table::unified_table): Comment out BIT_IOR_EXPR.
---
 gcc/range-op-mixed.h | 10 -
 gcc/range-op-ptr.cc  | 52 ++--
 gcc/range-op.cc  |  4 ++--
 3 files changed, 57 insertions(+), 9 deletions(-)

diff --git a/gcc/range-op-mixed.h b/gcc/range-op-mixed.h
index 4177818e4b9..e4852e974c4 100644
--- a/gcc/range-op-mixed.h
+++ b/gcc/range-op-mixed.h
@@ -609,16 +609,16 @@ public:
   using range_operator::op2_range;
   bool op1_range (irange &r, tree type,
 		  const irange &lhs, const irange &op2,
-		  relation_trio rel = TRIO_VARYING) const final override;
+		  relation_trio rel = TRIO_VARYING) const override;
   bool op2_range (irange &r, tree type,
 		  const irange &lhs, const irange &op1,
-		  relation_trio rel = TRIO_VARYING) const final override;
+		  relation_trio rel = TRIO_VARYING) const override;
   void update_bitmask (irange &r, const irange &lh,
-		   const irange &rh) const final override;
-private:
+		   const irange &rh) const override;
+protected:
   void wi_fold (irange &r, tree type, const wide_int &lh_lb,
 		const wide_int &lh_ub, const wide_int &rh_lb,
-		const wide_int &rh_ub) const final override;
+		const wide_int &rh_ub) const override;
 };
 
 class operator_min : public range_operator
diff --git a/gcc/range-op-ptr.cc b/gcc/range-op-ptr.cc
index 941026994ed..7b22d0bf05b 100644
--- a/gcc/range-op-ptr.cc
+++ b/gcc/range-op-ptr.cc
@@ -184,9 +184,9 @@ pointer_and_operator::wi_fold (irange &r, tree type,
 
 class pointer_or_operator : public range_operator
 {
+public:
   using range_operator::op1_range;
   using range_operator::op2_range;
-public:
   virtual bool op1_range (irange &r, tree type,
 			  const irange &lhs,
 			  const irange &op2,
@@ -270,7 +270,6 @@ operator_pointer_diff::op1_op2_relation_effect (irange &lhs_range, tree type,
 
 pointer_table::pointer_table ()
 {
-  set (BIT_IOR_EXPR, op_pointer_or);
   set (MIN_EXPR, op_ptr_min_max);
   set (MAX_EXPR, op_ptr_min_max);
 }
@@ -334,6 +333,54 @@ public:
 }
 } op_hybrid_and;
 
+// Temporary class which dispatches routines to either the INT version or
+// the pointer version depending on the type.  Once PRANGE is a range
+// class, we can remove the hybrid.
+
+class hybrid_or_operator : public operator_bitwise_or
+{
+public:
+  using range_operator::op1_range;
+  using range_operator::op2_range;
+  using range_operator::lhs_op1_relation;
+  bool op1_range (irange &r, tree type,
+		  const irange &lhs, const irange &op2,
+		  relation_trio rel = TRIO_VARYING) const final override
+{
+  if (INTEGRAL_TYPE_P (type))
+	return operator_bitwise_or::op1_range (r, type, lhs, op2, rel);
+  else
+	return op_pointer_or.op1_range (r, type, lhs, op2, rel);
+}
+  bool op2_range (irange &r, tree type,
+		  const irange &lhs, const irange &op1,
+		  relation_trio rel = TRIO_VARYING) const final override
+{
+  if (INTEGRAL_TYPE_P (type))
+	return operator_bitwise_or::op2_range (r, type, lhs, op1, rel);
+  else
+	return op_pointer_or.op2_range (r, type, lhs, op1, rel);
+}
+  void update_bitmask (irange &r, const irange &lh,
+		   const irange &rh) const final override
+{
+  if (!r.undefined_p () && INTEGRAL_TYPE_P (r.type ()))
+	operator_bitwise_or::update_bitmask (r, lh, rh);
+}
+
+  void wi_fold (irange &r, tree type, const wide_int &lh_lb,
+		const wide_int &lh_ub, const wide_int &rh_lb,
+		const wide_int &rh_ub) const final override
+{
+  if (INTEGRAL_TYPE_P (type))
+	return operator_bitwise_or::wi_fold (r, type, lh_lb, lh_ub,
+	  rh_lb, rh_ub);
+  else
+	return op_pointer_or.wi_fold (r, type, lh_lb, lh_ub, rh_lb, rh_ub);
+}
+} op_hybrid_or;
+
+
 
 // Initialize any pointer operators to the primary table
 
@@ -343,4 +390,5 @@ range_op_table::initialize_pointer_ops ()
   set (POINTER_PLUS_EXPR, op_pointer_plus);
   set (POINTER_DIFF_EXPR, op_pointer_diff);
   set (BIT_AND_EXPR, op_hybrid_and);
+  set (BIT_IOR_EXPR, op_hybrid_or);
 }
diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index dcb922143ce..0a9a3297de7 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -121,8 +121,8 @@ unified_table::

[COMMITTED 17/17] PR tree-optimization/110205 - Add some overrides.

2023-06-12 Thread Andrew MacLeod via Gcc-patches
Add some missing overrides, and add the dispatch pattern for FII, which
will be used for integer to float conversion.
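
To make the new dispatch case easier to picture, here is a minimal stand-alone
C++ sketch; every name in it is invented for illustration (the real classes are
range_operator, irange and frange, and the real handler switches on a dispatch
enum such as RO_FII rather than using dynamic_cast).  It only shows the shape of
the FII case: a float result range folded from two integer operand ranges, with
the overload picked at run time from the kinds of the arguments.

/* fii-dispatch-sketch.cc: stand-alone illustration only; all names invented.  */
#include <iostream>

struct vrange_s { virtual ~vrange_s () {} };
struct irange_s : vrange_s { long lo = 0, hi = 0; };
struct frange_s : vrange_s { double lo = 0, hi = 0; };

struct op_sketch
{
  virtual ~op_sketch () {}
  /* Integer result from integer operands ("III").  */
  virtual bool fold_range (irange_s &, const irange_s &, const irange_s &) const
  { return false; }
  /* Float result from integer operands ("FII"), e.g. for (float) int_var.  */
  virtual bool fold_range (frange_s &, const irange_s &, const irange_s &) const
  { return false; }
};

struct int_to_float_sketch : op_sketch
{
  bool fold_range (frange_s &r, const irange_s &lh,
		   const irange_s &) const override
  {
    r.lo = lh.lo;
    r.hi = lh.hi;
    return true;
  }
};

/* Pick the overload from the dynamic kinds of the result and operands,
   the way the real handler switches on its dispatch value.  */
static bool
dispatch_fold (const op_sketch &op, vrange_s &r,
	       const vrange_s &lh, const vrange_s &rh)
{
  if (dynamic_cast<frange_s *> (&r) && dynamic_cast<const irange_s *> (&lh))
    return op.fold_range (static_cast<frange_s &> (r),
			  static_cast<const irange_s &> (lh),
			  static_cast<const irange_s &> (rh));
  return op.fold_range (static_cast<irange_s &> (r),
			static_cast<const irange_s &> (lh),
			static_cast<const irange_s &> (rh));
}

int
main ()
{
  int_to_float_sketch op;
  irange_s lh, rh;
  lh.lo = 1;
  lh.hi = 5;
  frange_s res;
  if (dispatch_fold (op, res, lh, rh))
    std::cout << res.lo << " .. " << res.hi << "\n";   /* prints 1 .. 5 */
  return 0;
}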


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From 1bed4b49302e2fd7bf89426117331ae89ebdc90b Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Mon, 12 Jun 2023 09:47:43 -0400
Subject: [PATCH 17/17] Add some overrides.

	PR tree-optimization/110205
	* range-op-float.cc (range_operator::fold_range): Add default FII
	fold routine.
	(Class operator_gt): Add missing final overrides.
	* range-op.cc (range_op_handler::fold_range): Add RO_FII case.
	(operator_lshift ::update_bitmask): Add final override.
	(operator_rshift ::update_bitmask): Add final override.
	* range-op.h (range_operator::fold_range): Add FII prototype.
---
 gcc/range-op-float.cc | 10 ++
 gcc/range-op-mixed.h  |  9 +
 gcc/range-op.cc   | 10 --
 gcc/range-op.h|  4 
 4 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
index 24f2235884f..f5c0cec75c4 100644
--- a/gcc/range-op-float.cc
+++ b/gcc/range-op-float.cc
@@ -157,6 +157,16 @@ range_operator::fold_range (irange &r ATTRIBUTE_UNUSED,
   return false;
 }
 
+bool
+range_operator::fold_range (frange &r ATTRIBUTE_UNUSED,
+			tree type ATTRIBUTE_UNUSED,
+			const irange &lh ATTRIBUTE_UNUSED,
+			const irange &rh ATTRIBUTE_UNUSED,
+			relation_trio) const
+{
+  return false;
+}
+
 bool
 range_operator::op1_range (frange &r ATTRIBUTE_UNUSED,
  tree type ATTRIBUTE_UNUSED,
diff --git a/gcc/range-op-mixed.h b/gcc/range-op-mixed.h
index bdc488b8754..6944742ecbc 100644
--- a/gcc/range-op-mixed.h
+++ b/gcc/range-op-mixed.h
@@ -239,26 +239,27 @@ public:
   using range_operator::op1_op2_relation;
   bool fold_range (irange &r, tree type,
 		   const irange &op1, const irange &op2,
-		   relation_trio = TRIO_VARYING) const;
+		   relation_trio = TRIO_VARYING) const final override;
   bool fold_range (irange &r, tree type,
 		   const frange &op1, const frange &op2,
 		   relation_trio = TRIO_VARYING) const final override;
 
   bool op1_range (irange &r, tree type,
 		  const irange &lhs, const irange &op2,
-		  relation_trio = TRIO_VARYING) const;
+		  relation_trio = TRIO_VARYING) const final override;
   bool op1_range (frange &r, tree type,
 		  const irange &lhs, const frange &op2,
 		  relation_trio = TRIO_VARYING) const final override;
 
   bool op2_range (irange &r, tree type,
 		  const irange &lhs, const irange &op1,
-		  relation_trio = TRIO_VARYING) const;
+		  relation_trio = TRIO_VARYING) const final override;
   bool op2_range (frange &r, tree type,
 		  const irange &lhs, const frange &op1,
 		  relation_trio = TRIO_VARYING) const final override;
   relation_kind op1_op2_relation (const irange &lhs) const final override;
-  void update_bitmask (irange &r, const irange &lh, const irange &rh) const;
+  void update_bitmask (irange &r, const irange &lh,
+		   const irange &rh) const final override;
 };
 
 class operator_ge :  public range_operator
diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 8a661fdb042..f0dff53ec1e 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -219,6 +219,10 @@ range_op_handler::fold_range (vrange &r, tree type,
 	return m_operator->fold_range (as_a  (r), type,
    as_a  (lh),
    as_a  (rh), rel);
+  case RO_FII:
+    return m_operator->fold_range (as_a <frange> (r), type,
+                                   as_a <irange> (lh),
+                                   as_a <irange> (rh), rel);
   default:
 	return false;
 }
@@ -2401,7 +2405,8 @@ public:
 tree type,
 const wide_int &,
 const wide_int &) const;
-  void update_bitmask (irange &r, const irange &lh, const irange &rh) const
+  void update_bitmask (irange &r, const irange &lh,
+		   const irange &rh) const final override
 { update_known_bitmask (r, LSHIFT_EXPR, lh, rh); }
 } op_lshift;
 
@@ -2432,7 +2437,8 @@ public:
 	   const irange &op1,
 	   const irange &op2,
 	   relation_kind rel) const;
-  void update_bitmask (irange &r, const irange &lh, const irange &rh) const
+  void update_bitmask (irange &r, const irange &lh,
+		   const irange &rh) const final override
 { update_known_bitmask (r, RSHIFT_EXPR, lh, rh); }
 } op_rshift;
 
diff --git a/gcc/range-op.h b/gcc/range-op.h
index 3602bc4e123..af94c2756a7 100644
--- a/gcc/range-op.h
+++ b/gcc/range-op.h
@@ -72,6 +72,10 @@ public:
 			   const frange &lh,
 			   const frange &rh,
 			   relation_trio = TRIO_VARYING) const;
+  virtual bool fold_range (frange &r, tree type,
+			   const irange &lh,
+			   const irange &rh,
+			   relation_trio = TRIO_VARYING) const;
 
   // Return the range for op[12] in the general case.  LHS is the range for
   // the LHS of the expression, OP[12]is the range for the other
-- 
2.40.1



[COMMITTED 12/17] - Add a hybrid MAX_EXPR operator for integer and pointer.

2023-06-12 Thread Andrew MacLeod via Gcc-patches
Add a hybrid operator to choose between integer and pointer versions at 
runtime.


This is the last use of the pointer table, so it is also removed.
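
In case a reader is joining the series here, the following is a tiny
stand-alone sketch of the hybrid idea; all names are invented and plain ints
stand in for the wide_int bounds.  The hybrid object is the one placed in the
unified table, and at run time it forwards each request either to the inherited
integer implementation or to a pointer implementation, keyed off the type it
was handed.  Once a separate PRANGE class exists, the dispatch can move back
into the type system and the hybrid goes away.

/* hybrid-sketch.cc: stand-alone illustration only, not the GCC classes.  */
#include <iostream>

enum type_kind { INTEGER_K, POINTER_K };

struct operator_max_sketch
{
  virtual ~operator_max_sketch () {}
  /* The integer MAX_EXPR folding.  */
  virtual int wi_fold (type_kind, int lb, int ub) const
  { return lb > ub ? lb : ub; }
};

struct pointer_min_max_sketch
{
  /* Pointer MIN/MAX folding is deliberately conservative; model that by
     returning a "varying" marker.  */
  int wi_fold (type_kind, int, int) const { return -1; }
};

static const pointer_min_max_sketch ptr_impl {};

/* The hybrid: one object in the unified table, dispatching on the type.  */
struct hybrid_max_sketch : operator_max_sketch
{
  int wi_fold (type_kind k, int lb, int ub) const final
  {
    if (k == INTEGER_K)
      return operator_max_sketch::wi_fold (k, lb, ub);   /* integer version */
    return ptr_impl.wi_fold (k, lb, ub);                  /* pointer version */
  }
};

int
main ()
{
  hybrid_max_sketch op;
  std::cout << op.wi_fold (INTEGER_K, 3, 7) << "\n";   /* 7 */
  std::cout << op.wi_fold (POINTER_K, 3, 7) << "\n";   /* -1, i.e. varying */
  return 0;
}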

Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From cd194f582c5be3cc91e025e304e2769f61ceb6b6 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Sat, 10 Jun 2023 16:35:18 -0400
Subject: [PATCH 12/17] Add a hybrid MAX_EXPR operator for integer and pointer.

This adds an operator to the unified table for MAX_EXPR which will
select either the pointer or integer version based on the type passed
to the method.  This is for use until we have a separate PRANGE class.

This also removes the pointer table, which is no longer needed.

	* range-op-mixed.h (operator_max): Remove final.
	* range-op-ptr.cc (pointer_table::pointer_table): Remove MAX_EXPR.
	(pointer_table::pointer_table): Remove.
	(class hybrid_max_operator): New.
	(range_op_table::initialize_pointer_ops): Add hybrid_max_operator.
	* range-op.cc (pointer_tree_table): Remove.
	(unified_table::unified_table): Comment out MAX_EXPR.
	(get_op_handler): Remove check of pointer table.
	* range-op.h (class pointer_table): Remove.
---
 gcc/range-op-mixed.h |  6 +++---
 gcc/range-op-ptr.cc  | 30 --
 gcc/range-op.cc  | 10 ++
 gcc/range-op.h   |  9 -
 4 files changed, 25 insertions(+), 30 deletions(-)

diff --git a/gcc/range-op-mixed.h b/gcc/range-op-mixed.h
index a65935435c2..bdc488b8754 100644
--- a/gcc/range-op-mixed.h
+++ b/gcc/range-op-mixed.h
@@ -636,10 +636,10 @@ class operator_max : public range_operator
 {
 public:
   void update_bitmask (irange &r, const irange &lh,
-  const irange &rh) const final override;
-private:
+  const irange &rh) const override;
+protected:
   void wi_fold (irange &r, tree type, const wide_int &lh_lb,
 		const wide_int &lh_ub, const wide_int &rh_lb,
-		const wide_int &rh_ub) const final override;
+		const wide_int &rh_ub) const override;
 };
 #endif // GCC_RANGE_OP_MIXED_H
diff --git a/gcc/range-op-ptr.cc b/gcc/range-op-ptr.cc
index 483e43ca994..ea66fe9056b 100644
--- a/gcc/range-op-ptr.cc
+++ b/gcc/range-op-ptr.cc
@@ -157,7 +157,6 @@ pointer_min_max_operator::wi_fold (irange &r, tree type,
 r.set_varying (type);
 }
 
-
 class pointer_and_operator : public range_operator
 {
 public:
@@ -265,14 +264,6 @@ operator_pointer_diff::op1_op2_relation_effect (irange &lhs_range, tree type,
 	rel);
 }
 
-// When PRANGE is implemented, these are all the opcodes which are currently
-// expecting routines with PRANGE signatures.
-
-pointer_table::pointer_table ()
-{
-  set (MAX_EXPR, op_ptr_min_max);
-}
-
 // --
 // Hybrid operators for the 4 operations which integer and pointers share,
 // but which have different implementations.  Simply check the type in
@@ -404,8 +395,26 @@ public:
 }
 } op_hybrid_min;
 
+class hybrid_max_operator : public operator_max
+{
+public:
+  void update_bitmask (irange &r, const irange &lh,
+		   const irange &rh) const final override
+{
+  if (!r.undefined_p () && INTEGRAL_TYPE_P (r.type ()))
+	operator_max::update_bitmask (r, lh, rh);
+}
 
-
+  void wi_fold (irange &r, tree type, const wide_int &lh_lb,
+		const wide_int &lh_ub, const wide_int &rh_lb,
+		const wide_int &rh_ub) const final override
+{
+  if (INTEGRAL_TYPE_P (type))
+	return operator_max::wi_fold (r, type, lh_lb, lh_ub, rh_lb, rh_ub);
+  else
+	return op_ptr_min_max.wi_fold (r, type, lh_lb, lh_ub, rh_lb, rh_ub);
+}
+} op_hybrid_max;
 
 // Initialize any pointer operators to the primary table
 
@@ -417,4 +426,5 @@ range_op_table::initialize_pointer_ops ()
   set (BIT_AND_EXPR, op_hybrid_and);
   set (BIT_IOR_EXPR, op_hybrid_or);
   set (MIN_EXPR, op_hybrid_min);
+  set (MAX_EXPR, op_hybrid_max);
 }
diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 481f3b1324d..046b7691bb6 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -49,8 +49,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-ccp.h"
 #include "range-op-mixed.h"
 
-pointer_table pointer_tree_table;
-
 // Instantiate a range_op_table for unified operations.
 class unified_table : public range_op_table
 {
@@ -124,18 +122,14 @@ unified_table::unified_table ()
   // set (BIT_AND_EXPR, op_bitwise_and);
   // set (BIT_IOR_EXPR, op_bitwise_or);
   // set (MIN_EXPR, op_min);
-  set (MAX_EXPR, op_max);
+  // set (MAX_EXPR, op_max);
 }
 
 // The tables are hidden and accessed via a simple extern function.
 
 range_operator *
-get_op_handler (enum tree_code code, tree type)
+get_op_handler (enum tree_code code, tree)
 {
-  // If this is pointer type and there is pointer specifc routine, use it.
-  if (POINTER_TYPE_P (type) && pointer_tree_table[code])
-return pointer_tree_table[code];
-
   return unified_tree_table[code];
 }
 
diff --git a/gcc/range-op.h b/gcc/range-op.h
index 08c51bace40..15c45137af2 100644
--- a/gcc/range-op.h
+

[committed] OpenMP: Cleanups related to the 'present' modifier

2023-06-12 Thread Tobias Burnus

Cleanup follow up to
  r14-1579-g4ede915d5dde93 "openmp: Add support for the 'present' modifier"
committed 6 days ago.

Namely:
* Replace, for the program → libgomp ABI, GOMP_MAP_PRESENT_[ALLOC,TO,FROM,TOFROM]
  by the preexisting GOMP_MAP_FORCE_PRESENT, but keep the other enum values
  (and use them until gimplification).

* Improve the wording if a non-existing/unsupported map-type modifier was used,
  by not referring to 'omp target' as it could also be target (enter/exit) data.
  + Add a testcase for enter/exit data + data.

* Unify + improve wording shown for 'present' when not present on the device.

* Extend the testcases to check that data actually gets copied with
  'target update' and 'map' when the 'present' modifier is present (a short,
  purely illustrative usage example follows right after this list).
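
For readers who have not used the modifier yet, here is a short, purely
illustrative C/C++ example of the user-facing side these cleanups touch; it is
not one of the committed testcases and needs -fopenmp plus an offloading-enabled
toolchain.

/* The 'present' modifier asserts that the data is already mapped on the
   device; if it is not, libgomp aborts with the (now unified) error message.  */
#define N 8

int
main (void)
{
  double a[N];
  for (int i = 0; i < N; i++)
    a[i] = i;

  /* Map the array first so the 'present' assertions below hold.  */
  #pragma omp target enter data map(to: a)

  /* Behaves like a "force present" mapping: abort if absent, otherwise reuse
     the existing device copy without extra transfers.  */
  #pragma omp target map(present, tofrom: a)
  for (int i = 0; i < N; i++)
    a[i] *= 2.0;

  /* 'target update' with 'present' still copies the data back.  */
  #pragma omp target update from(present: a)

  #pragma omp target exit data map(delete: a)
  return 0;
}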

Committed as Rev. r14-1736-g38944ec2a6fa10

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
commit 38944ec2a6fa108d24e5cfbb24c52020f9aa3015
Author: Tobias Burnus 
Date:   Mon Jun 12 18:15:28 2023 +0200

OpenMP: Cleanups related to the 'present' modifier

Reduce number of enum values passed to libgomp as
GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC} have the same semantic as
GOMP_MAP_FORCE_PRESENT (i.e. abort if not present, otherwise ignore);
that's different to GOMP_MAP_ALWAYS_PRESENT_{TO,TOFROM,FROM} which also
abort if not present but copy data when present. This is a follow-up to
the commit r14-1579-g4ede915d5dde93 done 6 days ago.

Additionally, the commit improves a libgomp run-time and a C/C++ compile-time
error wording and extends testcases a tiny bit.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_clause_map): Reword error message for
clearness especially with 'omp target (enter/exit) data.'

gcc/cp/ChangeLog:

* parser.cc (cp_parser_omp_clause_map): Reword error message for
clearness especially with 'omp target (enter/exit) data.'
* semantics.cc (handle_omp_array_sections): Handle
GOMP_MAP_{ALWAYS_,}PRESENT_{TO,TOFROM,FROM,ALLOC} enum values.

gcc/ChangeLog:

* gimplify.cc (gimplify_adjust_omp_clauses_1): Use
GOMP_MAP_FORCE_PRESENT for 'present alloc' implicit mapping.
(gimplify_adjust_omp_clauses): Change
GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC} to the equivalent
GOMP_MAP_FORCE_PRESENT.
* omp-low.cc (lower_omp_target): Remove handling of no-longer valid
GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC}; update map kinds used for
to/from clauses with present modifier.

include/ChangeLog:

* gomp-constants.h (enum gomp_map_kind): Change the enum values
GOMP_MAP_PRESENT_{TO,TOFROM,FROM,ALLOC} to be compiler only.
(GOMP_MAP_PRESENT_P): Update to include also GOMP_MAP_FORCE_PRESENT.

libgomp/ChangeLog:

* target.c (gomp_to_device_kind_p, gomp_map_vars_internal): Replace
GOMP_MAP_PRESENT_{FROM,TO,TOFROM,ALLOC} by GOMP_MAP_FORCE_PRESENT.
(gomp_map_vars_internal, gomp_update): Likewise; unify and improve
error message.
* testsuite/libgomp.c-c++-common/target-present-2.c: Update for
changed error message.
* testsuite/libgomp.fortran/target-present-1.f90: Likewise.
* testsuite/libgomp.fortran/target-present-2.f90: Likewise.
* testsuite/libgomp.oacc-c-c++-common/present-1.c: Likewise.
* testsuite/libgomp.c-c++-common/target-present-1.c: Likewise and
extend testcase to check that data is copied when needed.
* testsuite/libgomp.c-c++-common/target-present-3.c: Likewise.
* testsuite/libgomp.fortran/target-present-3.f90: Likewise.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/defaultmap-4.c: Update scan-tree-dump.
* c-c++-common/gomp/map-9.c: Likewise.
* gfortran.dg/gomp/defaultmap-8.f90: Likewise.
* gfortran.dg/gomp/map-11.f90: Likewise.
* gfortran.dg/gomp/target-update-1.f90: Likewise.
* gfortran.dg/gomp/map-12.f90: Likewise; also check original dump.
* c-c++-common/gomp/map-6.c: Update dg-error and also check
clause error with 'target (enter/exit) data'.
---
 gcc/c/c-parser.cc  |  5 +-
 gcc/cp/parser.cc   |  5 +-
 gcc/cp/semantics.cc|  7 +++
 gcc/gimplify.cc| 13 -
 gcc/omp-low.cc | 14 ++
 gcc/testsuite/c-c++-common/gomp/defaultmap-4.c |  4 +-
 gcc/testsuite/c-c++-common/gomp/map-6.c| 14 +-
 gcc/testsuite/c-c++-common/

Re: [aarch64] Code-gen for vector initialization involving constants

2023-06-12 Thread Prathamesh Kulkarni via Gcc-patches
On Wed, 31 May 2023 at 00:23, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > Hi Richard,
> > The s32 case for single constant patch doesn't regress now after the
> > above commit.
> > Bootstrapped+tested on aarch64-linux-gnu, and verified that the new
> > tests pass for aarch64_be-linux-gnu.
> > Is it OK to commit ?
> >
> > Thanks,
> > Prathamesh
> >
> > [aarch64] Improve code-gen for vector initialization with single constant 
> > element.
> >
> > gcc/ChangeLog:
> >   * config/aarch64/aarch64.cc (aarch64_expand_vector_init): Tweak 
> > condition
> >   if (n_var == n_elts && n_elts <= 16) to allow a single constant,
> >   and if maxv == 1, use constant element for duplicating into register.
> >
> > gcc/testsuite/ChangeLog:
> >   * gcc.target/aarch64/vec-init-single-const.c: New test.
> >   * gcc.target/aarch64/vec-init-single-const-be.c: Likewise.
> >   * gcc.target/aarch64/vec-init-single-const-2.c: Likewise.
>
> OK, thanks.
Hi Richard,
Sorry for the delay, I was away on vacation. Committed the patch after
rebasing on ToT, and verifying bootstrap+test passes on
aarch64-linux-gnu:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=9eb757d11746c006c044ff45538b956be7f5859c

Thanks,
Prathamesh
>
> Richard
>
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index 5b046d32b37..30d6e3e8d83 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -22192,7 +22192,7 @@ aarch64_expand_vector_init_fallback (rtx target, 
> > rtx vals)
> >   and matches[X][1] with the count of duplicate elements (if X is the
> >   earliest element which has duplicates).  */
> >
> > -  if (n_var == n_elts && n_elts <= 16)
> > +  if (n_var >= n_elts - 1 && n_elts <= 16)
> >  {
> >int matches[16][2] = {0};
> >for (int i = 0; i < n_elts; i++)
> > @@ -22209,12 +22209,23 @@ aarch64_expand_vector_init_fallback (rtx target, 
> > rtx vals)
> >   }
> >int maxelement = 0;
> >int maxv = 0;
> > +  rtx const_elem = NULL_RTX;
> > +  int const_elem_pos = 0;
> > +
> >for (int i = 0; i < n_elts; i++)
> > - if (matches[i][1] > maxv)
> > -   {
> > - maxelement = i;
> > - maxv = matches[i][1];
> > -   }
> > + {
> > +   if (matches[i][1] > maxv)
> > + {
> > +   maxelement = i;
> > +   maxv = matches[i][1];
> > + }
> > +   if (CONST_INT_P (XVECEXP (vals, 0, i))
> > +   || CONST_DOUBLE_P (XVECEXP (vals, 0, i)))
> > + {
> > +   const_elem_pos = i;
> > +   const_elem = XVECEXP (vals, 0, i);
> > + }
> > + }
> >
> >/* Create a duplicate of the most common element, unless all elements
> >are equally useless to us, in which case just immediately set the
> > @@ -22252,8 +22263,19 @@ aarch64_expand_vector_init_fallback (rtx target, 
> > rtx vals)
> >vector register.  For big-endian we want that position to hold
> >the last element of VALS.  */
> > maxelement = BYTES_BIG_ENDIAN ? n_elts - 1 : 0;
> > -   rtx x = force_reg (inner_mode, XVECEXP (vals, 0, maxelement));
> > -   aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
> > +
> > +   /* If we have a single constant element, use that for duplicating
> > +  instead.  */
> > +   if (const_elem)
> > + {
> > +   maxelement = const_elem_pos;
> > +   aarch64_emit_move (target, gen_vec_duplicate (mode, 
> > const_elem));
> > + }
> > +   else
> > + {
> > +   rtx x = force_reg (inner_mode, XVECEXP (vals, 0, maxelement));
> > +   aarch64_emit_move (target, lowpart_subreg (mode, x, 
> > inner_mode));
> > + }
> >   }
> >else
> >   {
> > diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-single-const-2.c 
> > b/gcc/testsuite/gcc.target/aarch64/vec-init-single-const-2.c
> > new file mode 100644
> > index 000..f4dcab429c1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/vec-init-single-const-2.c
> > @@ -0,0 +1,30 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +
> > +#include 
> > +
> > +/* In case where there are no duplicate elements in vector initializer,
> > +   check that the constant is used for duplication.  */
> > +
> > +int8x16_t f_s8(int8_t a0, int8_t a1, int8_t a2, int8_t a3, int8_t a4,
> > +int8_t a5, int8_t a6, int8_t a7, int8_t a8, int8_t a9,
> > +int8_t a10, int8_t a11, int8_t a12, int8_t a13, int8_t a14)
> > +{
> > +  return (int8x16_t) { a0, a1, a2, a3, a4, a5, a6, a7,
> > +   a8, a9, a10, a11, a12, a13, a14, 1 };
> > +}
> > +
> > +int16x8_t f_s16(int16_t a0, int16_t a1, int16_t a2, int16_t a3, int16_t a4,
> > + int16_t a5, int16_t a6)
> > +{
> > +  return (int16x8_t) { a0, a1, a2, a3, a4, a5, a6, 1 };
> > +}
> > +
> > +int32x4_t f_s32(int32_t a0, int32_t a1, in

Re: [PATCH v4 2/6] libstdc++: use new built-in trait __is_reference for std::is_reference

2023-06-12 Thread François Dumont via Gcc-patches

Same remark for all your similar patches.

On 11/06/2023 04:43, Ken Matsui via Libstdc++ wrote:

This patch gets std::is_reference to dispatch to new built-in trait
__is_reference.

libstdc++-v3/ChangeLog:

* include/std/type_traits (is_reference): Use __is_reference built-in
trait.
(is_reference_v): Likewise.

Signed-off-by: Ken Matsui 
---
  libstdc++-v3/include/std/type_traits | 13 -
  1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 0e7a9c9c7f3..b2eb4bd3e7f 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -639,6 +639,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
// Composite type categories.
  
/// is_reference

+#if __has_builtin(__is_reference)

Here you are checking __has_builtin(__is_reference)...

+  template<typename _Tp>
+struct is_reference
+: public __bool_constant<__is_reference(_Tp)>
+{ };
+#else
template
  struct is_reference
  : public false_type
@@ -653,6 +659,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  struct is_reference<_Tp&&>
  : public true_type
  { };
+#endif
  
/// is_arithmetic

template
@@ -3193,11 +3200,7 @@ template <typename _Tp>
  template <typename _Tp>
    inline constexpr bool is_function_v = is_function<_Tp>::value;
  template <typename _Tp>
-  inline constexpr bool is_reference_v = false;
-template <typename _Tp>
-  inline constexpr bool is_reference_v<_Tp&> = true;
-template <typename _Tp>
-  inline constexpr bool is_reference_v<_Tp&&> = true;
+  inline constexpr bool is_reference_v = __is_reference(_Tp);

...but not here, why ?
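
For concreteness, the guarded form this question is pointing at would look
roughly like the following, within the same region of <type_traits>; this is an
illustrative sketch only, not code from the posted patch.

#if __has_builtin(__is_reference)
  template <typename _Tp>
    inline constexpr bool is_reference_v = __is_reference(_Tp);
#else
  template <typename _Tp>
    inline constexpr bool is_reference_v = false;
  template <typename _Tp>
    inline constexpr bool is_reference_v<_Tp&> = true;
  template <typename _Tp>
    inline constexpr bool is_reference_v<_Tp&&> = true;
#endif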


