RE: [PATCH v2] RISC-V: Refactor riscv mode after for VXRM and FRM

2023-07-12 Thread Li, Pan2 via Gcc-patches
Thanks Juzhe for the review. Sure, let me hold v3 for Kito's comments.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Wednesday, July 12, 2023 2:11 PM
To: Li, Pan2 ; gcc-patches 
Cc: Robin Dapp ; jeffreyalaw ; Li, 
Pan2 ; Wang, Yanzhang ; kito.cheng 

Subject: Re: [PATCH v2] RISC-V: Refactor riscv mode after for VXRM and FRM


+regnum_definition_p (rtx_insn *insn, unsigned int regno)

I prefer it to be reg_set_p.



+insn_asm_p (rtx_insn *insn)

asm_insn_p



+global_vxrm_state_unknown_p

vxrm_unknown_p



+global_frm_state_unknown_p (rtx_insn *insn)

FRM of a CALL function is not "UNKNOWN", unlike VXRM.

It just changes into another dynamic mode (which may be the same as or
different from the previous dynamic mode).

frm_unknown_dynamic_p



The rest of the refactoring looks good.

Let's see whether kito has more comments.



Thanks.


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-07-12 13:50
To: gcc-patches
CC: juzhe.zhong; 
rdapp.gcc; 
jeffreyalaw; pan2.li; 
yanzhang.wang; 
kito.cheng
Subject: [PATCH v2] RISC-V: Refactor riscv mode after for VXRM and FRM
From: Pan Li <pan2...@intel.com>

When investigating the FRM dynamic rounding mode, we found that the
global unknown status is quite different between fixed-point and
floating-point. Thus, we separate the unknown functions while extracting
some common inner functions.

We will also prepare more test cases in another PATCH.

Signed-off-by: Pan Li <pan2...@intel.com>

gcc/ChangeLog:

* config/riscv/riscv.cc (regnum_definition_p): New function.
(insn_asm_p): Ditto.
(riscv_vxrm_mode_after): New function for fixed-point.
(global_vxrm_state_unknown_p): Ditto.
(riscv_frm_mode_after): New function for floating-point.
(global_frm_state_unknown_p): Ditto.
(riscv_mode_after): Leverage new functions.
(riscv_entity_mode_after): Removed.
---
gcc/config/riscv/riscv.cc | 96 +--
1 file changed, 82 insertions(+), 14 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 38d8eb2fcf5..553fbb4435a 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7742,19 +7742,91 @@ global_state_unknown_p (rtx_insn *insn, unsigned int regno)
   return false;
}
+static bool
+regnum_definition_p (rtx_insn *insn, unsigned int regno)
+{
+  df_ref ref;
+  struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
+
+  /* Return true if there is a definition of regno.  */
+  for (ref = DF_INSN_INFO_DEFS (insn_info); ref; ref = DF_REF_NEXT_LOC (ref))
+if (DF_REF_REGNO (ref) == regno)
+  return true;
+
+  return false;
+}
+
+static bool
+insn_asm_p (rtx_insn *insn)
+{
+  extract_insn (insn);
+
+  return recog_data.is_asm;
+}
+
+static bool
+global_vxrm_state_unknown_p (rtx_insn *insn)
+{
+  /* Return true if there is a definition of VXRM.  */
+  if (regnum_definition_p (insn, VXRM_REGNUM))
+return true;
+
+  /* A CALL function may contain an instruction that modifies the VXRM,
+ return true in this situation.  */
+  if (CALL_P (insn))
+return true;
+
+  /* Return true for all assembly since users may hardcode an assembly
+ like this: asm volatile ("csrwi vxrm, 0").  */
+  if (insn_asm_p (insn))
+return true;
+
+  return false;
+}
+
+static bool
+global_frm_state_unknown_p (rtx_insn *insn)
+{
+  /* Return true if there is a definition of FRM.  */
+  if (regnum_definition_p (insn, FRM_REGNUM))
+return true;
+
+  /* A CALL function may contain an instruction that modifies the FRM,
+ return true in this situation.  */
+  if (CALL_P (insn))
+return true;
+
+  return false;
+}
+
static int
-riscv_entity_mode_after (int regnum, rtx_insn *insn, int mode,
- int (*get_attr_mode) (rtx_insn *), int default_mode)
+riscv_vxrm_mode_after (rtx_insn *insn, int mode)
{
-  if (global_state_unknown_p (insn, regnum))
-return default_mode;
-  else if (recog_memoized (insn) < 0)
+  if (global_vxrm_state_unknown_p (insn))
+return VXRM_MODE_NONE;
+
+  if (recog_memoized (insn) < 0)
+return mode;
+
+  if (reg_mentioned_p (gen_rtx_REG (SImode, VXRM_REGNUM), PATTERN (insn)))
+return get_attr_vxrm_mode (insn);
+  else
 return mode;
+}
-  rtx reg = gen_rtx_REG (SImode, regnum);
-  bool mentioned_p = reg_mentioned_p (reg, PATTERN (insn));
+static int
+riscv_frm_mode_after (rtx_insn *insn, int mode)
+{
+  if (global_frm_state_unknown_p (insn))
+return FRM_MODE_NONE;
-  return mentioned_p ? get_attr_mode (insn): mode;
+  if (recog_memoized (insn) < 0)
+return mode;
+
+  if (reg_mentioned_p (gen_rtx_REG (SImode, FRM_REGNUM), PATTERN (insn)))
+return get_attr_frm_mode (insn);
+  else
+return mode;
}
/* Return the mode that an insn results in.  */
@@ -7765,13 +7837,9 @@ riscv_mode_after (int entity,
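The mode-after rules in the patch above can be modeled roughly as follows (a hand-written Python sketch for illustration only, not the GCC code; the `insn` dictionary keys are made-up names standing in for `regnum_definition_p`, `CALL_P`, `insn_asm_p`, `recog_memoized` and `get_attr_vxrm_mode`):

```python
# Simplified model of riscv_vxrm_mode_after: an insn that writes VXRM,
# a call, or an asm makes the global VXRM state unknown (MODE_NONE);
# an unrecognized insn keeps the current mode; otherwise the mode only
# changes if the insn actually mentions VXRM.
VXRM_MODE_NONE = "none"

def vxrm_mode_after(insn, mode):
    if insn.get("defines_vxrm") or insn.get("is_call") or insn.get("is_asm"):
        return VXRM_MODE_NONE
    if not insn.get("recognized", True):
        return mode
    if insn.get("mentions_vxrm"):
        return insn["vxrm_attr"]
    return mode
```

The FRM variant described above differs only in that a call leads to another dynamic mode rather than a truly unknown one.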

Re: [PATCH 1/1] riscv: thead: Fix ICE when enable XTheadMemPair ISA extension.

2023-07-12 Thread Kito Cheng via Gcc-patches
Hi Xianmiao:


> Hi Christoph and Kito,
>
> That's great that this bug has been resolved. If you merge this patch,
> it would be best to also merge it to the gcc-13 branch.

Yeah, that sounds reasonable. The convention for backports is to
wait one week to make sure it's stable, so I will backport it one week
later :)

>
>
> Thanks,
> Cooper


Re: [PATCH 1/1] riscv: thead: Fix ICE when enable XTheadMemPair ISA extension.

2023-07-12 Thread Philipp Tomsich
Looks like I missed the OK on this one.
I can pick it up today, unless you, Kito, already have it in flight?

Thanks,
Philipp.

On Tue, 11 Jul 2023 at 17:51, Kito Cheng  wrote:

> Hi Christoph:
>
> Ooops, I thought Philipp will push those patches, does here any other
> patches got approved but not committed? I can help to push those
> patches tomorrow.
>
> On Tue, Jul 11, 2023 at 11:42 PM Christoph Müllner
>  wrote:
> >
> > Hi Cooper,
> >
> > I addressed this in April this year.
> > It even got an "ok", but nobody pushed it:
> >   https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616972.html
> >
> > BR
> > Christoph
> >
> > On Tue, Jul 11, 2023 at 5:39 PM Xianmiao Qu 
> wrote:
> > >
> > > The frame-related load/store instructions should not be
> > > scheduled between each other, and the REG_FRAME_RELATED_EXPR
> > > expression note should be added to those instructions
> > > to prevent this.
> > > This bug causes an ICE during GCC bootstrap, and it will also ICE
> > > in the simplified case mempair-4.c; compilation fails with:
> > > during RTL pass: dwarf2
> > > theadmempair-4.c:20:1: internal compiler error: in
> dwarf2out_frame_debug_cfa_offset, at dwarf2cfi.cc:1376
> > > 0xa8c017 dwarf2out_frame_debug_cfa_offset
> > > ../../../gcc/gcc/dwarf2cfi.cc:1376
> > > 0xa8c017 dwarf2out_frame_debug
> > > ../../../gcc/gcc/dwarf2cfi.cc:2285
> > > 0xa8c017 scan_insn_after
> > > ../../../gcc/gcc/dwarf2cfi.cc:2726
> > > 0xa8cc97 scan_trace
> > > ../../../gcc/gcc/dwarf2cfi.cc:2893
> > > 0xa8d84d create_cfi_notes
> > > ../../../gcc/gcc/dwarf2cfi.cc:2933
> > > 0xa8d84d execute_dwarf2_frame
> > > ../../../gcc/gcc/dwarf2cfi.cc:3309
> > > 0xa8d84d execute
> > > ../../../gcc/gcc/dwarf2cfi.cc:3799
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/riscv/thead.cc (th_mempair_save_regs): Add
> > > REG_FRAME_RELATED_EXPR note for mempair instructions.
> > >
> > > gcc/testsuite/ChangeLog:
> > > * gcc.target/riscv/xtheadmempair-4.c: New test.
> > > ---
> > >  gcc/config/riscv/thead.cc |  6 +++--
> > >  .../gcc.target/riscv/xtheadmempair-4.c| 26 +++
> > >  2 files changed, 30 insertions(+), 2 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-4.c
> > >
> > > diff --git a/gcc/config/riscv/thead.cc b/gcc/config/riscv/thead.cc
> > > index 75203805310..2df709226f9 100644
> > > --- a/gcc/config/riscv/thead.cc
> > > +++ b/gcc/config/riscv/thead.cc
> > > @@ -366,10 +366,12 @@ th_mempair_save_regs (rtx operands[4])
> > >  {
> > >rtx set1 = gen_rtx_SET (operands[0], operands[1]);
> > >rtx set2 = gen_rtx_SET (operands[2], operands[3]);
> > > +  rtx dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (2));
> > >rtx insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2,
> set1, set2)));
> > >RTX_FRAME_RELATED_P (insn) = 1;
> > > -  add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set1));
> > > -  add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set2));
> > > +  XVECEXP (dwarf, 0, 0) = copy_rtx (set1);
> > > +  XVECEXP (dwarf, 0, 1) = copy_rtx (set2);
> > > +  add_reg_note (insn, REG_FRAME_RELATED_EXPR, dwarf);
> > >  }
> > >
> > >  /* Similar like riscv_restore_reg, but restores two registers from
> memory
> > > diff --git a/gcc/testsuite/gcc.target/riscv/xtheadmempair-4.c
> b/gcc/testsuite/gcc.target/riscv/xtheadmempair-4.c
> > > new file mode 100644
> > > index 000..d653f056ef4
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/xtheadmempair-4.c
> > > @@ -0,0 +1,26 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-g" "-Oz" "-Os" "-flto" }
> } */
> > > +/* { dg-options "-march=rv64gc_xtheadmempair -O2 -g
> -mtune=thead-c906" { target { rv64 } } } */
> > > +/* { dg-options "-march=rv32gc_xtheadmempair -O2 -g
> -mtune=thead-c906" { target { rv32 } } } */
> > > +
> > > +void a();
> > > +void b(char *);
> > > +void m_fn1(int);
> > > +int e;
> > > +
> > > +int foo(int ee, int f, int g) {
> > > +  char *h = (char *)__builtin_alloca(1);
> > > +  b(h);
> > > +  b("");
> > > +  int i = ee;
> > > +  e = g;
> > > +  m_fn1(f);
> > > +  a();
> > > +  e = i;
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-times "th.ldd\t" 3 { target { rv64 } }
> } } */
> > > +/* { dg-final { scan-assembler-times "th.sdd\t" 3 { target { rv64 } }
> } } */
> > > +
> > > +/* { dg-final { scan-assembler-times "th.lwd\t" 3 { target { rv32 } }
> } } */
> > > +/* { dg-final { scan-assembler-times "th.swd\t" 3 { target { rv32 } }
> } } */
> > > --
> > > 2.17.1
> > >
>


Re: [PATCH 2/3] testsuite: Require 128-bit vectors for bb-slp-pr95839.c

2023-07-12 Thread Richard Biener via Gcc-patches
On Tue, Jul 11, 2023 at 5:01 PM Maciej W. Rozycki  wrote:
>
> On Fri, 7 Jul 2023, Richard Biener wrote:
>
> > > The bb-slp-pr95839.c test assumes quad-single float vector support, but
> > > some targets only support pairs of floats, causing this test to fail
> > > with such targets.  Limit this test to targets that support at least
> > > 128-bit vectors then, and add a complementing test that can be run with
> > > targets that have support for 64-bit vectors only.  There is no need to
> > > adjust bb-slp-pr95839-2.c as 128 bits are needed even for the smallest
> > > vector of doubles, so support is implied by the presence of vectors of
> > > doubles.
> >
> > I wonder why you see the testcase FAIL, on x86-64 when doing
> >
> > typedef float __attribute__((vector_size(32))) v4f32;
> >
> > v4f32 f(v4f32 a, v4f32 b)
> > {
> >   /* Check that we vectorize this CTOR without any loads.  */
> >   return (v4f32){a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3],
> >   a[4] + b[4], a[5] + b[5], a[6] + b[6], a[7] + b[7]};
> > }
> >
> > I see we vectorize the add and the "store".  We fail to perform
> > extraction from the incoming vectors (unless you enable AVX),
> > that's a missed optimization.
> >
> > So with paired floats I would expect sth similar?  Maybe
> > x86 is saved by kind-of-presence (but disabled) of V8SFmode vectors.
>
>  I am not familiar enough with this stuff to answer your question.
>
>  As we pass and return V2SF data in FP registers just as with complex
> float data with this hardware the function from my bb-slp-pr95839-v8.c
> expands to a single vector FP add instruction, followed by a function
> return.
>
>  Conversely, the original function from bb-slp-pr95839.c expands to a
> sequence of 22 instructions to extract incoming vector FP data from 4
> 64-bit GPRs into 8 FPRs, add the vectors piecemeal with 4 scalar FP add
> instructions, and then insert outgoing vector FP data from 4 FPRs back to
> 2 64-bit GPRs.  As an experiment I have modified the backend minimally so
> as to pass and return V4SF data in FP registers as well, but that didn't
> make the vectoriser trigger.
>
> > That said, we should handle this better so can you file an
> > enhancement bugreport for this?
>
>  Filed as PR -optimization/110630.

Thanks!

>  I can't publish RISC-V information
> related to the hardware affected, but as a quick check I ran the MIPS
> compiler:
>
> $ mips-linux-gnu-gcc -march=mips64 -mabi=64 -mpaired-single -O2 -S 
> bb-slp-pr95839*.c
>
> and got this code for bb-slp-pr95839-v8.c (mind the branch delay slot):
>
> jr  $31
> add.ps  $f0,$f12,$f13
>
> vs code for bb-slp-pr95839.c:
>
> daddiu  $sp,$sp,-64
> sd  $5,24($sp)
> sd  $7,40($sp)
> lwc1$f0,24($sp)
> lwc1$f1,40($sp)
> sd  $4,16($sp)
> sd  $6,32($sp)
> add.s   $f3,$f0,$f1
> lwc1$f0,28($sp)
> lwc1$f1,44($sp)
> lwc1$f4,36($sp)
> swc1$f3,56($sp)
> add.s   $f2,$f0,$f1
> lwc1$f0,16($sp)
> lwc1$f1,32($sp)
> swc1$f2,60($sp)
> add.s   $f1,$f0,$f1
> lwc1$f0,20($sp)
> ld  $3,56($sp)
> add.s   $f0,$f0,$f4
> swc1$f1,48($sp)
> swc1$f0,52($sp)
> ld  $2,48($sp)
> jr  $31
> daddiu  $sp,$sp,64
>
> so this is essentially the same scenario (up to the machine instruction
> count), and therefore it seems backend-agnostic.  I can imagine the latter
> case could expand to something like (instruction reordering surely needed
> for performance omitted for clarity):
>
> dmtc1   $4,$f0
> dmtc1   $5,$f1
> dmtc1   $6,$f2
> dmtc1   $7,$f3
> add.ps  $f0,$f0,$f1
> add.ps  $f2,$f2,$f3
> dmfc1   $2,$f0
> jr  $31
> dmfc1   $3,$f2
>
> saving a lot of cycles, and removing the need for spilling temporaries to
> the stack and for frame creation in the first place.
>
>  Do you agree it still makes sense to include bb-slp-pr95839-v8.c with the
> testsuite?

Sure, more coverage is always nice.

Richard.

>   Maciej
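For context, the 64-bit-vector variant Maciej describes (his bb-slp-pr95839-v8.c) would have roughly this shape — a hypothetical reconstruction, mirroring the bb-slp-pr95839.c snippet Richard quotes above rather than the actual testsuite file:

```c
#include <assert.h>

/* Two-float vector type; on targets with paired-single FP support
   (e.g. MIPS -mpaired-single) the whole function can reduce to a
   single vector add.  */
typedef float __attribute__((vector_size(8))) v2f32;

v2f32 f(v2f32 a, v2f32 b)
{
  /* Check that we vectorize this CTOR without any loads.  */
  return (v2f32){a[0] + b[0], a[1] + b[1]};
}
```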


Re: [PATCH 3/3] testsuite: Require vectors of doubles for pr97428.c

2023-07-12 Thread Richard Biener via Gcc-patches
On Tue, Jul 11, 2023 at 5:01 PM Maciej W. Rozycki  wrote:
>
> On Fri, 7 Jul 2023, Richard Biener wrote:
>
> > > The pr97428.c test assumes support for vectors of doubles, but some
> > > targets only support vectors of floats, causing this test to fail with
> > > such targets.  Limit this test to targets that support vectors of
> > > doubles then.
> >
> > OK.
>
>  Applied, thanks.  OK to backport to the active branches?

Yes.

>   Maciej


Re: [PATCH v2] genopinit: Allow more than 256 modes.

2023-07-12 Thread Richard Biener via Gcc-patches
On Tue, 11 Jul 2023, Robin Dapp wrote:

> Attached is v2 that does not switch to uint64_t but stays within
> 32 bits by shifting the optab by 20 and the mode(s) by 10 bits.

LGTM.

> Regards
>  Robin
> 
> Upcoming changes for RISC-V will have us exceed 255 modes or 8 bits.
> This patch increases the limit to 10 bits and adjusts the hashing
> function for the gen* and optabs-query lookups accordingly.
> Consequently, the number of optabs is limited to 4095.
> 
> gcc/ChangeLog:
> 
>   * genopinit.cc (main): Adjust maximal number of optabs and
>   machine modes.
>   * gensupport.cc (find_optab): Shift optab by 20 and mode by
>   10 bits.
>   * optabs-query.h (optab_handler): Ditto.
>   (convert_optab_handler): Ditto.
> ---
>  gcc/genopinit.cc   | 5 ++---
>  gcc/gensupport.cc  | 2 +-
>  gcc/optabs-query.h | 4 ++--
>  3 files changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/genopinit.cc b/gcc/genopinit.cc
> index 6bd8858a1d9..2a841006884 100644
> --- a/gcc/genopinit.cc
> +++ b/gcc/genopinit.cc
> @@ -182,8 +182,7 @@ main (int argc, const char **argv)
>  
>progname = "genopinit";
>  
> -  if (NUM_OPTABS > 0xffff
> -|| MAX_MACHINE_MODE >= ((1 << MACHINE_MODE_BITSIZE) - 1))
> +  if (NUM_OPTABS > 0xfff || NUM_MACHINE_MODES > 0x3ff)
>  fatal ("genopinit range assumptions invalid");
>  
>if (!init_rtx_reader_args_cb (argc, argv, handle_arg))
> @@ -439,7 +438,7 @@ main (int argc, const char **argv)
>  "bool\n"
>  "swap_optab_enable (optab op, machine_mode m, bool set)\n"
>  "{\n"
> -"  unsigned scode = (op << 16) | m;\n"
> +"  unsigned scode = (op << 20) | m;\n"
>  "  int i = lookup_handler (scode);\n"
>  "  if (i >= 0)\n"
>  "{\n"
> diff --git a/gcc/gensupport.cc b/gcc/gensupport.cc
> index e39e6dacce2..959d1d9c83c 100644
> --- a/gcc/gensupport.cc
> +++ b/gcc/gensupport.cc
> @@ -3806,7 +3806,7 @@ find_optab (optab_pattern *p, const char *name)
>   {
> p->name = name;
> p->op = optabs[pindex].op;
> -   p->sort_num = (p->op << 16) | (p->m2 << 8) | p->m1;
> +   p->sort_num = (p->op << 20) | (p->m2 << 10) | p->m1;
> return true;
>   }
>  }
> diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h
> index 043e9791bc1..920eb6a1b67 100644
> --- a/gcc/optabs-query.h
> +++ b/gcc/optabs-query.h
> @@ -37,7 +37,7 @@ convert_optab_p (optab op)
>  inline enum insn_code
>  optab_handler (optab op, machine_mode mode)
>  {
> -  unsigned scode = (op << 16) | mode;
> +  unsigned scode = (op << 20) | mode;
>gcc_assert (op > LAST_CONV_OPTAB);
>return raw_optab_handler (scode);
>  }
> @@ -50,7 +50,7 @@ inline enum insn_code
>  convert_optab_handler (convert_optab op, machine_mode to_mode,
>  machine_mode from_mode)
>  {
> -  unsigned scode = (op << 16) | (from_mode << 8) | to_mode;
> +  unsigned scode = (op << 20) | (from_mode << 10) | to_mode;
>gcc_assert (convert_optab_p (op));
>return raw_optab_handler (scode);
>  }
> 
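The packed lookup key the patch adjusts can be sketched as follows (an illustration of the bit layout, not the actual GCC code; function names are made up): the optab now occupies bits [20..31], and each machine mode gets a 10-bit field.

```python
# Model of the post-patch scode packing: optab limited to 0xfff,
# machine modes limited to 0x3ff, so the whole key fits in 32 bits.
OPTAB_SHIFT = 20
MODE_SHIFT = 10
MODE_MASK = 0x3FF

def pack_convert_scode(op, from_mode, to_mode):
    assert op <= 0xFFF and from_mode <= MODE_MASK and to_mode <= MODE_MASK
    return (op << OPTAB_SHIFT) | (from_mode << MODE_SHIFT) | to_mode

def unpack_convert_scode(scode):
    return (scode >> OPTAB_SHIFT,
            (scode >> MODE_SHIFT) & MODE_MASK,
            scode & MODE_MASK)
```

This is why the commit message caps the number of optabs at 4095: 12 bits remain once two 10-bit mode fields are reserved.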

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: [PATCH 1/1] riscv: thead: Fix ICE when enable XTheadMemPair ISA extension.

2023-07-12 Thread Kito Cheng via Gcc-patches
Yeah, I've applied the patches on my local tree and am running the testsuite.

On Wed, Jul 12, 2023 at 3:11 PM Philipp Tomsich
 wrote:
>
> Looks like I missed the OK on this one.
> I can pick it up today, unless you Kito already has it in flight?
>
> Thanks,
> Philipp.
>
> On Tue, 11 Jul 2023 at 17:51, Kito Cheng  wrote:
>>
>> Hi Christoph:
>>
>> Ooops, I thought Philipp will push those patches, does here any other
>> patches got approved but not committed? I can help to push those
>> patches tomorrow.
>>
>> On Tue, Jul 11, 2023 at 11:42 PM Christoph Müllner
>>  wrote:
>> >
>> > Hi Cooper,
>> >
>> > I addressed this in April this year.
>> > It even got an "ok", but nobody pushed it:
>> >   https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616972.html
>> >
>> > BR
>> > Christoph
>> >
>> > On Tue, Jul 11, 2023 at 5:39 PM Xianmiao Qu  
>> > wrote:
>> > >


Re: [PATCH 1/1] riscv: thead: Fix ICE when enable XTheadMemPair ISA extension.

2023-07-12 Thread Philipp Tomsich
Awesome, thanks!

On Wed, 12 Jul 2023 at 09:18, Kito Cheng  wrote:

> Yeah, I've applied patches on my local tree and running the testsuite.
>
> On Wed, Jul 12, 2023 at 3:11 PM Philipp Tomsich
>  wrote:
> >
> > Looks like I missed the OK on this one.
> > I can pick it up today, unless you Kito already has it in flight?
> >
> > Thanks,
> > Philipp.
> >
> > On Tue, 11 Jul 2023 at 17:51, Kito Cheng  wrote:
> >>
> >> Hi Christoph:
> >>
> >> Ooops, I thought Philipp will push those patches, does here any other
> >> patches got approved but not committed? I can help to push those
> >> patches tomorrow.
> >>
> >> On Tue, Jul 11, 2023 at 11:42 PM Christoph Müllner
> >>  wrote:
> >> >
> >> > Hi Cooper,
> >> >
> >> > I addressed this in April this year.
> >> > It even got an "ok", but nobody pushed it:
> >> >   https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616972.html
> >> >
> >> > BR
> >> > Christoph
> >> >
> >> > On Tue, Jul 11, 2023 at 5:39 PM Xianmiao Qu <
> cooper...@linux.alibaba.com> wrote:
> >> > >

Re: [PATCH V2] Provide -fcf-protection=branch,return.

2023-07-12 Thread Hongtao Liu via Gcc-patches
ping.

On Mon, May 22, 2023 at 4:08 PM Hongtao Liu  wrote:
>
> ping.
>
> On Sat, May 13, 2023 at 5:20 PM liuhongt  wrote:
> >
> > > I think this could be simplified if you use either EnumSet or
> > > EnumBitSet instead in common.opt for `-fcf-protection=`.
> >
> > Use EnumSet instead of EnumBitSet since CF_FULL is not a power of 2.
> > Set classification is a bit tricky: cf_branch and cf_return
> > should be in different sets, but they both "conflict" with cf_full and
> > cf_none, and the current EnumSet doesn't handle this well.
> >
> > So in the current implementation, only cf_full and cf_none are exclusive
> > to each other, but they can be combined with any of cf_branch, cf_return
> > and cf_check. It's not perfect, but still an improvement over the
> > original.
> >
> > gcc/ChangeLog:
> >
> > * common.opt: (fcf-protection=): Add EnumSet attribute to
> > support combination of params.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * c-c++-common/fcf-protection-10.c: New test.
> > * c-c++-common/fcf-protection-11.c: New test.
> > * c-c++-common/fcf-protection-12.c: New test.
> > * c-c++-common/fcf-protection-8.c: New test.
> > * c-c++-common/fcf-protection-9.c: New test.
> > * gcc.target/i386/pr89701-1.c: New test.
> > * gcc.target/i386/pr89701-2.c: New test.
> > * gcc.target/i386/pr89701-3.c: New test.
> > ---
> >  gcc/common.opt | 12 ++--
> >  gcc/testsuite/c-c++-common/fcf-protection-10.c |  2 ++
> >  gcc/testsuite/c-c++-common/fcf-protection-11.c |  2 ++
> >  gcc/testsuite/c-c++-common/fcf-protection-12.c |  2 ++
> >  gcc/testsuite/c-c++-common/fcf-protection-8.c  |  2 ++
> >  gcc/testsuite/c-c++-common/fcf-protection-9.c  |  2 ++
> >  gcc/testsuite/gcc.target/i386/pr89701-1.c  |  4 
> >  gcc/testsuite/gcc.target/i386/pr89701-2.c  |  4 
> >  gcc/testsuite/gcc.target/i386/pr89701-3.c  |  4 
> >  9 files changed, 28 insertions(+), 6 deletions(-)
> >  create mode 100644 gcc/testsuite/c-c++-common/fcf-protection-10.c
> >  create mode 100644 gcc/testsuite/c-c++-common/fcf-protection-11.c
> >  create mode 100644 gcc/testsuite/c-c++-common/fcf-protection-12.c
> >  create mode 100644 gcc/testsuite/c-c++-common/fcf-protection-8.c
> >  create mode 100644 gcc/testsuite/c-c++-common/fcf-protection-9.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr89701-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr89701-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr89701-3.c
> >
> > diff --git a/gcc/common.opt b/gcc/common.opt
> > index a28ca13385a..02f2472959a 100644
> > --- a/gcc/common.opt
> > +++ b/gcc/common.opt
> > @@ -1886,7 +1886,7 @@ fcf-protection
> >  Common RejectNegative Alias(fcf-protection=,full)
> >
> >  fcf-protection=
> > -Common Joined RejectNegative Enum(cf_protection_level) 
> > Var(flag_cf_protection) Init(CF_NONE)
> > +Common Joined RejectNegative Enum(cf_protection_level) EnumSet 
> > Var(flag_cf_protection) Init(CF_NONE)
> >  -fcf-protection=[full|branch|return|none|check]Instrument 
> > functions with checks to verify jump/call/return control-flow transfer
> >  instructions have valid targets.
> >
> > @@ -1894,19 +1894,19 @@ Enum
> >  Name(cf_protection_level) Type(enum cf_protection_level) 
> > UnknownError(unknown Control-Flow Protection Level %qs)
> >
> >  EnumValue
> > -Enum(cf_protection_level) String(full) Value(CF_FULL)
> > +Enum(cf_protection_level) String(full) Value(CF_FULL) Set(1)
> >
> >  EnumValue
> > -Enum(cf_protection_level) String(branch) Value(CF_BRANCH)
> > +Enum(cf_protection_level) String(branch) Value(CF_BRANCH) Set(2)
> >
> >  EnumValue
> > -Enum(cf_protection_level) String(return) Value(CF_RETURN)
> > +Enum(cf_protection_level) String(return) Value(CF_RETURN) Set(3)
> >
> >  EnumValue
> > -Enum(cf_protection_level) String(check) Value(CF_CHECK)
> > +Enum(cf_protection_level) String(check) Value(CF_CHECK) Set(4)
> >
> >  EnumValue
> > -Enum(cf_protection_level) String(none) Value(CF_NONE)
> > +Enum(cf_protection_level) String(none) Value(CF_NONE) Set(1)
> >
> >  finstrument-functions
> >  Common Var(flag_instrument_function_entry_exit,1)
> > diff --git a/gcc/testsuite/c-c++-common/fcf-protection-10.c 
> > b/gcc/testsuite/c-c++-common/fcf-protection-10.c
> > new file mode 100644
> > index 000..b271d134e52
> > --- /dev/null
> > +++ b/gcc/testsuite/c-c++-common/fcf-protection-10.c
> > @@ -0,0 +1,2 @@
> > +/* { dg-do compile { target { "i?86-*-* x86_64-*-*" } } } */
> > +/* { dg-options "-fcf-protection=branch,check" } */
> > diff --git a/gcc/testsuite/c-c++-common/fcf-protection-11.c 
> > b/gcc/testsuite/c-c++-common/fcf-protection-11.c
> > new file mode 100644
> > index 000..2e566350ccd
> > --- /dev/null
> > +++ b/gcc/testsuite/c-c++-common/fcf-protection-11.c
> > @@ -0,0 +1,2 @@
> > +/* { dg-do compile { target { "i?86-*-* x86_64-*-*" } } } */
> > +/* { dg-options "-fcf-protection=branch,return" } */

Re: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization

2023-07-12 Thread Richard Biener via Gcc-patches
On Wed, 12 Jul 2023, juzhe.zh...@rivai.ai wrote:

> I understand your concern. I CC Richards to see whether this piece of codes 
> is  unsafe.
> 
> Hi, Richard and Richi:
> 
> Jeff is worrying about this codes in "expand_gather_scatter" of supporting 
> len_mask_gather_load/len_mask_scatter_store in RISC-V port.
> 
> The codes are as follows:
> 
>  +/* Return true if it is the strided load/store. */
> +static bool
> +strided_load_store_p (rtx vec_offset, rtx *base, rtx *step)
> +{
> +  if (const_vec_series_p (vec_offset, base, step))
> +return true;
> +
> +  /* For strided load/store, vectorizer always generates
> + VEC_SERIES_EXPR for vec_offset.  */
> +  tree expr = REG_P (vec_offset) ? REG_EXPR (vec_offset) : NULL_TREE;
> +  if (!expr || TREE_CODE (expr) != SSA_NAME)
> +return false;
> +
> +  /* Check if it is GIMPLE like: _88 = VEC_SERIES_EXPR <0, _87>;  */
> +  gimple *def_stmt = SSA_NAME_DEF_STMT (expr);
> +  if (!def_stmt || !is_gimple_assign (def_stmt)
> +  || gimple_assign_rhs_code (def_stmt) != VEC_SERIES_EXPR)
> +return false;
> +
> +  tree baset = gimple_assign_rhs1 (def_stmt);
> +  tree stept = gimple_assign_rhs2 (def_stmt);
> +  *base = expand_normal (baset);
> +  *step = expand_normal (stept);
> +
> +  if (!rtx_equal_p (*base, const0_rtx))
> +return false;
> +  return true;
> +}
> In this codes, I tried to query the SSA_NAME_DEF_STMT to see whether the 
> vector offset of gather/scatter is VEC_SERIES
> If it is VEC_SERIES, I will lower them into RVV strided load/stores 
> (vlse.v/vsse.v) which is using scalar stride, 
> if it is not, then use common RVV indexed load/store with vector offset 
> (vluxei/vsuxei).
> 
> Jeff is worrying about whether we are using SSA_NAME_DEF_STMT at this point  
> (during the stage "expand" expanding gimple ->rtl).

Using SSA_NAME_DEF_STMT during expansion is OK, but I don't think you
can rely on REG_EXPR here since you don't know whether any coalescing
happened.  That said, maybe the implementation currently guarantees
you'll only see a REG_EXPR SSA name if there's a single definition
of that register, but at least I'm not aware of that and this is also
not documented.

I wonder if you can recover vlse.v at combine time though?

That said, if the ISA supports gather/scatter with an affine offset
the more appropriate way would be to add additional named expanders
for this and deal with the above in the middle-end during RTL
expansion instead.

Richard.

> I am also wondering whether I am doing wrong here.
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Jeff Law
> Date: 2023-07-12 13:32
> To: juzhe.zh...@rivai.ai; gcc-patches
> CC: Kito.cheng; Robin Dapp
> Subject: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV 
> auto-vectorization
>  
>  
> On 7/11/23 20:34, juzhe.zh...@rivai.ai wrote:
> > Hi, Jeff.
> > 
> >  >> Hmm, I'm not sure this is safe, especially if gimple->rtl expansion is
> >>>complete.  While you might be able to get REG_EXPR, I would not really
> >>>expect SSA_NAME_DEF_STMT to be correct.  At the least it'll need some
> >>>way to make sure it's not called at an inappropriate time.
> > I think it's safe, if SSA_NAME_DEF_STMT is NULL, then just return it.
> > 
> >>>Should this have been known_lt rather than known_le?
> > It should be LE, since I will pass through GET_MODE_NUNITS/GET_MODE_SIZE 
> > for SLP.
> Thanks for double checking.  It looked slightly odd checking ge or le.
>  
>  
> > 
> >>>Something's off in your formatting here.  I'd guess spaces vs tabs
> > Ok.
> > 
> >>>In a few places you're using expand_binop.  Those interfaces are really
> >>>more for gimple->RTL.  But code like expand_gather_scatter is really
> >>>RTL, not gimple/tree.   Is there a reason why you're not using pure RTL
> >>>interfaces?
> > I saw ARM SVE is using them in many places for expanding patterns.
> > And I think it's convenient so that's why I use them.
> OK.
>  
> I still think we need a resolution on strided_load_store_p.  As I 
> mentioned in my original email, I'm not sure you can depend on getting 
> to the SSA_NAME_DEF_STMT at this point -- in particular if it's a 
> dangling pointer, then bad things are going to happen.  So let's chase 
> that down.  Presumably this is called during gimple->rtl expansion, 
> right?  Is it ever called later?
>  
> I think my concerns about expand_gather_scatter are a non-issue after 
> looking at it again -- I missed the GET_MODE (step) != Pmode conditional 
> when I first looked at that code.
>  
>  
> jeff
>  
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


RE: [x86 PATCH] Tweak ix86_expand_int_compare to use PTEST for vector equality.

2023-07-12 Thread Roger Sayle


> From: Hongtao Liu 
> Sent: 12 July 2023 01:45
> 
> On Wed, Jul 12, 2023 at 4:57 AM Roger Sayle 
> > > From: Hongtao Liu 
> > > Sent: 28 June 2023 04:23
> > > > From: Roger Sayle 
> > > > Sent: 27 June 2023 20:28
> > > >
> > > > I've also come up with an alternate/complementary/supplementary
> > > > fix of generating the PTEST during RTL expansion, rather than rely
> > > > on this being caught/optimized later during STV.
> > > >
> > > > You may notice in this patch, the tests for TARGET_SSE4_1 and
> > > > TImode appear last.  When I was writing this, I initially also
> > > > added support for AVX VPTEST and OImode, before realizing that x86
> > > > doesn't (yet) support 256-bit OImode (which also explains why we
> > > > don't have an OImode to V1OImode scalar-to-vector pass).
> > > > Retaining this clause ordering should minimize the lines changed if 
> > > > things
> change in future.
> > > >
> > > > This patch has been tested on x86_64-pc-linux-gnu with make
> > > > bootstrap and make -k check, both with and without
> > > > --target_board=unix{-m32} with no new failures.  Ok for mainline?
> > > >
> > > >
> > > > 2023-06-27  Roger Sayle  
> > > >
> > > > gcc/ChangeLog
> > > > * config/i386/i386-expand.cc (ix86_expand_int_compare): If
> > > > testing a TImode SUBREG of a 128-bit vector register against
> > > > zero, use a PTEST instruction instead of first moving it to
> > > > to scalar registers.
> > > >
> > >
> > > +  /* Attempt to use PTEST, if available, when testing vector modes for
> > > + equality/inequality against zero.  */  if (op1 == const0_rtx
> > > +  && SUBREG_P (op0)
> > > +  && cmpmode == CCZmode
> > > +  && SUBREG_BYTE (op0) == 0
> > > +  && REG_P (SUBREG_REG (op0))
> > > Just register_operand (op0, TImode),
> >
> > I completely agree that in most circumstances, the early RTL
> > optimizers should use standard predicates, such as register_operand,
> > that don't distinguish between REG and SUBREG, allowing the choice
> > (assignment) to be left to register allocation (reload).
> >
> > However in this case, unusually, the presence of the SUBREG, and
> > treating it differently from a REG is critical (in fact the reason for
> > the patch).  x86_64 can very efficiently test whether a 128-bit value
> > is zero, setting ZF, either in TImode, using orq %rax,%rdx in a single
> > cycle/single instruction, or in V1TImode, using ptest %xmm0,%xmm0, in a 
> > single
> cycle/single instruction.
> > There's no reason to prefer one form over the other.  A SUBREG,
> > however, that moves the value from the scalar registers to a vector
> > register, or from a vector registers to scalar registers, requires two or 
> > three
> instructions, often reading
> > and writing values via memory, at a huge performance penalty.   Hence the
> > goal is to eliminate the (VIEW_CONVERT) SUBREG, and choose the
> > appropriate single-cycle test instruction for where the data is
> > located.  Hence we want to leave REG_P alone, but optimize (only) the
> SUBREG_P cases.
> > register_operand doesn't help with this.
> >
> > Note this is counter to the usual advice.  Normally, a SUBREG between
> > scalar registers is cheap (in fact free) on x86, hence it safe for
> > predicates to ignore them prior to register allocation.  But another
> > use of SUBREG, to represent a VIEW_CONVERT_EXPR/transfer between
> > processing units is closer to a conversion, and a very expensive one
> > (going via memory with different size reads vs writes) at that.
> >
> >
> > > +  && VECTOR_MODE_P (GET_MODE (SUBREG_REG (op0)))
> > > +  && TARGET_SSE4_1
> > > +  && GET_MODE (op0) == TImode
> > > +  && GET_MODE_SIZE (GET_MODE (SUBREG_REG (op0))) == 16)
> > > +{
> > > +  tmp = SUBREG_REG (op0);
> > > and tmp = lowpart_subreg (V1TImode, force_reg (TImode, op0));?
> > > I think RA can handle SUBREG correctly, no need for extra predicates.
> >
> > Likewise, your "tmp = lowpart_subreg (V1TImode, force_reg (TImode, ...))"
> > is forcing there to always be an inter-unit transfer/pipeline stall,
> > when this is idiom that we're trying to eliminate.
> >
> > I should have repeated the motivating example from my original post at
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622706.html
> >
> > typedef long long __m128i __attribute__ ((__vector_size__ (16))); int
> > foo (__m128i x, __m128i y) {
> >   return (__int128)x == (__int128)y;
> > }
> >
> > is currently generated as:
> > foo:    movaps  %xmm0, -40(%rsp)
> >         movq    -32(%rsp), %rdx
> >         movq    %xmm0, %rax
> >         movq    %xmm1, %rsi
> >         movaps  %xmm1, -24(%rsp)
> >         movq    -16(%rsp), %rcx
> >         xorq    %rsi, %rax
> >         xorq    %rcx, %rdx
> >         orq     %rdx, %rax
> >         sete    %al
> >         movzbl  %al, %eax
> >         ret
> >
> > with this patch (to eliminate the interunit SUBREG) this becomes:
> >
> > foo:    pxor    %xmm1, %xmm0
> >         xorl    %eax, %

Re: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization

2023-07-12 Thread juzhe.zh...@rivai.ai
Thanks Richard.

Is it correct that the better way is to add optabs 
(len_strided_load/len_strided_store),
then expand LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE to 
len_strided_load/len_strided_store optab (if it is strided load/store) in
expand_gather_load_optab_fn 
expand_scatter_store_optab_fn

of internal-fn.cc

Am I right? Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-07-12 15:27
To: juzhe.zh...@rivai.ai
CC: jeffreyalaw; gcc-patches; Kito.cheng; Robin Dapp; richard.sandiford
Subject: Re: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV 
auto-vectorization
On Wed, 12 Jul 2023, juzhe.zh...@rivai.ai wrote:
 
> I understand your concern. I CC Richards to see whether this piece of codes 
> is  unsafe.
> 
> Hi, Richard and Richi:
> 
> Jeff is worrying about this codes in "expand_gather_scatter" of supporting 
> len_mask_gather_load/len_mask_scatter_store in RISC-V port.
> 
> The codes are as follows:
> 
>  +/* Return true if it is the strided load/store. */
> +static bool
> +strided_load_store_p (rtx vec_offset, rtx *base, rtx *step)
> +{
> +  if (const_vec_series_p (vec_offset, base, step))
> +return true;
> +
> +  /* For strided load/store, vectorizer always generates
> + VEC_SERIES_EXPR for vec_offset.  */
> +  tree expr = REG_P (vec_offset) ? REG_EXPR (vec_offset) : NULL_TREE;
> +  if (!expr || TREE_CODE (expr) != SSA_NAME)
> +return false;
> +
> +  /* Check if it is GIMPLE like: _88 = VEC_SERIES_EXPR <0, _87>;  */
> +  gimple *def_stmt = SSA_NAME_DEF_STMT (expr);
> +  if (!def_stmt || !is_gimple_assign (def_stmt)
> +  || gimple_assign_rhs_code (def_stmt) != VEC_SERIES_EXPR)
> +return false;
> +
> +  tree baset = gimple_assign_rhs1 (def_stmt);
> +  tree stept = gimple_assign_rhs2 (def_stmt);
> +  *base = expand_normal (baset);
> +  *step = expand_normal (stept);
> +
> +  if (!rtx_equal_p (*base, const0_rtx))
> +return false;
> +  return true;
> +}
> In this codes, I tried to query the SSA_NAME_DEF_STMT to see whether the 
> vector offset of gather/scatter is VEC_SERIES
> If it is VEC_SERIES, I will lower them into RVV strided load/stores 
> (vlse.v/vsse.v) which is using scalar stride, 
> if it is not, then use common RVV indexed load/store with vector offset 
> (vluxei/vsuxei).
> 
> Jeff is worrying about whether we are using SSA_NAME_DEF_STMT at this point  
> (during the stage "expand" expanding gimple ->rtl).
 
Using SSA_NAME_DEF_STMT during expansion is OK, but I don't think you
can rely on REG_EXPR here since you don't know whether any coalescing
happened.  That said, maybe the implementation currently guarantees
you'll only see a REG_EXPR SSA name if there's a single definition
of that register, but at least I'm not aware of that and this is also
not documented.
 
I wonder if you can recover vlse.v at combine time though?
 
That said, if the ISA supports gather/scatter with an affine offset
the more appropriate way would be to add additional named expanders
for this and deal with the above in the middle-end during RTL
expansion instead.
 
Richard.
 
> I am also wondering whether I am doing wrong here.
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Jeff Law
> Date: 2023-07-12 13:32
> To: juzhe.zh...@rivai.ai; gcc-patches
> CC: Kito.cheng; Robin Dapp
> Subject: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV 
> auto-vectorization
>  
>  
> On 7/11/23 20:34, juzhe.zh...@rivai.ai wrote:
> > Hi, Jeff.
> > 
> >  >> Hmm, I'm not sure this is safe, especially if gimple->rtl expansion is
> >>>complete.  While you might be able to get REG_EXPR, I would not really
> >>>expect SSA_NAME_DEF_STMT to be correct.  At the least it'll need some
> >>>way to make sure it's not called at an inappropriate time.
> > I think it's safe, if SSA_NAME_DEF_STMT is NULL, then just return it.
> > 
> >>>Should this have been known_lt rather than known_le?
> > It should be LE, since I will pass through GET_MODE_NUNITS/GET_MODE_SIZE 
> > for SLP.
> Thanks for double checking.  It looked slightly odd checking ge or le.
>  
>  
> > 
> >>>Something's off in your formatting here.  I'd guess spaces vs tabs
> > Ok.
> > 
> >>>In a few places you're using expand_binop.  Those interfaces are really
> >>>more for gimple->RTL.  But code like expand_gather_scatter is really
> >>>RTL, not gimple/tree.   Is there a reason why you're not using pure RTL
> >>>interfaces?
> > I saw ARM SVE is using them in many places for expanding patterns.
> > And I think it's convenient so that's why I use them.
> OK.
>  
> I still think we need a resolution on strided_load_store_p.  As I 
> mentioned in my original email, I'm not sure you can depend on getting 
> to the SSA_NAME_DEF_STMT at this point -- in particular if it's a 
> dangling pointer, then bad things are going to happen.  So let's chase 
> that down.  Presumably this is called during gimple->rtl expansion, 
> right?  Is it ever called later?
>  
> I think my concerns about expand_gather_scatter are a non-issue after 
> looking at it again -- I missed the GET_MODE (step) != Pmode conditional 
> when I first looked at that code.

Re: [PATCH 0/9] Add btf_decl_tag C attribute

2023-07-12 Thread Richard Biener via Gcc-patches
On Tue, Jul 11, 2023 at 11:58 PM David Faust via Gcc-patches
 wrote:
>
> Hello,
>
> This series adds support for a new attribute, "btf_decl_tag" in GCC.
> The same attribute is already supported in clang, and is used by various
> components of the BPF ecosystem.
>
> The purpose of the attribute is to allow to associate (to "tag")
> declarations with arbitrary string annotations, which are emitted into
> debugging information (DWARF and/or BTF) to facilitate post-compilation
> analysis (the motivating use case being the Linux kernel BPF verifier).
> Multiple tags are allowed on the same declaration.
>
> These strings are not interpreted by the compiler, and the attribute
> itself has no effect on generated code, other than to produce additional
> DWARF DIEs and/or BTF records conveying the annotations.
>
> This entails:
>
> - A new C-language-level attribute which allows to associate (to "tag")
>   particular declarations with arbitrary strings.
>
> - The conveyance of that information in DWARF in the form of a new DIE,
>   DW_TAG_GNU_annotation, with tag number (0x6000) and format matching
>   that of the DW_TAG_LLVM_annotation extension supported in LLVM for
>   the same purpose. These DIEs are already supported by BPF tooling,
>   such as pahole.
>
> - The conveyance of that information in BTF debug info in the form of
>   BTF_KIND_DECL_TAG records. These records are already supported by
>   LLVM and other tools in the eBPF ecosystem, such as the Linux kernel
>   eBPF verifier.
>
>
> Background
> ==
>
> The purpose of these tags is to convey additional semantic information
> to post-compilation consumers, in particular the Linux kernel eBPF
> verifier. The verifier can make use of that information while analyzing
> a BPF program to aid in determining whether to allow or reject the
> program to be run. More background on these tags can be found in the
> early support for them in the kernel here [1] and [2].
>
> The "btf_decl_tag" attribute is half the story; the other half is a
> sibling attribute "btf_type_tag" which serves the same purpose but
> applies to types. Support for btf_type_tag will come in a separate
> patch series, since it is impacted by GCC bug 110439 which needs to be
> addressed first.
>
> I submitted an initial version of this work (including btf_type_tag)
> last spring [3], however at the time there were some open questions
> about the behavior of the btf_type_tag attribute and issues with its
> implementation. Since then we have clarified these details and agreed
> to solutions with the BPF community and LLVM BPF folks.
>
> The main motivation for emitting the tags in DWARF is that the Linux
> kernel generates its BTF information via pahole, using DWARF as a source:
>
>     +--------+  BTF              BTF  +----------+
>     | pahole |-------> vmlinux.btf -->| verifier |
>     +--------+                        +----------+
>         ^                                  ^
>         |                                  |
>   DWARF |                              BTF |
>         |                                  |
>      vmlinux                       +-------------+
>      module1.ko                    | BPF program |
>      module2.ko                    +-------------+
>       ...
>
> This is because:
>
> a)  pahole adds additional kernel-specific information into the
> produced BTF based on additional analysis of kernel objects.
>
> b)  Unlike GCC, LLVM will only generate BTF for BPF programs.
>
> c)  GCC can generate BTF for whatever target with -gbtf, but there is no
> support for linking/deduplicating BTF in the linker.
>
> In the scenario above, the verifier needs access to the pointer tags of
> both the kernel types/declarations (conveyed in the DWARF and translated
> to BTF by pahole) and those of the BPF program (available directly in BTF).
>
>
> DWARF Representation
> 
>
> As noted above, btf_decl_tag is represented in DWARF via a new DIE
> DW_TAG_GNU_annotation, with identical format to the LLVM DWARF
> extension DW_TAG_LLVM_annotation serving the same purpose. The DIE has
> the following format:
>
>   DW_TAG_GNU_annotation (0x6000)
> DW_AT_name: "btf_decl_tag"
> DW_AT_const_value: 
>
> These DIEs are placed in the DWARF tree as children of the DIE for the
> appropriate declaration, and one such DIE is created for each occurrence
> of the btf_decl_tag attribute on a declaration.
>
> For example:
>
>   const int * c __attribute__((btf_decl_tag ("__c"), btf_decl_tag 
> ("devicemem")));
>
> This declaration produces the following DWARF:
>
>  <1><1e>: Abbrev Number: 2 (DW_TAG_variable)
> <1f>   DW_AT_name: c
> <24>   DW_AT_type: <0x49>
> ...
>  <2><36>: Abbrev Number: 3 (User TAG value: 0x6000)
> <37>   DW_AT_name: (indirect string, offset: 0x4c): btf_decl_tag
> <3b>   DW_AT_const_value : (indirect string, offset: 0): devicemem
>  <2>

Re: [RFC] Store_bit_field_1: Use mode of SUBREG instead of REG

2023-07-12 Thread Richard Biener via Gcc-patches
On Wed, Jul 12, 2023 at 5:20 AM YunQiang Su  wrote:
>
> PR #104914
>
> When work with
>   int val;
>   ((unsigned char*)&val)[0] = *buf;
> The RTX mode is obtained from REG instead of SUBREG,
> which makes the wider mode of the REG used instead of the
> narrower mode of the SUBREG.
> Thus something wrong happens on sign-extend default architectures,
> like MIPS64.
>
> gcc/ChangeLog:
> PR: 104914.
> * expmed.cc(store_bit_field_1): Get mode from original
> str_rtx instead of op0.
> ---
>  gcc/expmed.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/expmed.cc b/gcc/expmed.cc
> index fbd4ce2d42f..37f90912122 100644
> --- a/gcc/expmed.cc
> +++ b/gcc/expmed.cc
> @@ -849,7 +849,7 @@ store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, 
> poly_uint64 bitnum,
>   if we aren't.  This must come after the entire register case above,
>   since that case is valid for any mode.  The following cases are only
>   valid for integral modes.  */
> -  opt_scalar_int_mode op0_mode = int_mode_for_mode (GET_MODE (op0));
> +  opt_scalar_int_mode op0_mode = int_mode_for_mode (GET_MODE (str_rtx));

I don't think this is correct - op0_mode is used to store into op0, and we are
just requiring that it is an integer mode and equal to the original
mode.  I suppose
your patch makes us go to the fallback code instead, but it's surely
for the wrong
reason.  I also wonder why we don't just check GET_MODE_CLASS
(GET_MODE (op0)) == MODE_CLASS_INT ...

>scalar_int_mode imode;
>if (!op0_mode.exists (&imode) || imode != GET_MODE (op0))
>  {
> --
> 2.30.2
>


Re: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization

2023-07-12 Thread Richard Biener via Gcc-patches
On Wed, 12 Jul 2023, juzhe.zh...@rivai.ai wrote:

> Thanks Richard.
> 
> Is it correct that the better way is to add optabs 
> (len_strided_load/len_strided_store),
> then expand LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE to 
> len_strided_load/len_strided_store optab (if it is strided load/store) in
> expand_gather_load_optab_fn 
> expand_scatter_store_optab_fn
> 
> of internal-fn.cc
> 
> Am I right? Thanks.

Yes.

In principle the vectorizer can also directly take advantage of this
and code generate an internal .LEN_STRIDED_LOAD ifn.

Richard.

> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-07-12 15:27
> To: juzhe.zh...@rivai.ai
> CC: jeffreyalaw; gcc-patches; Kito.cheng; Robin Dapp; richard.sandiford
> Subject: Re: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV 
> auto-vectorization
> On Wed, 12 Jul 2023, juzhe.zh...@rivai.ai wrote:
>  
> > I understand your concern. I CC Richards to see whether this piece of codes 
> > is  unsafe.
> > 
> > Hi, Richard and Richi:
> > 
> > Jeff is worrying about this codes in "expand_gather_scatter" of supporting 
> > len_mask_gather_load/len_mask_scatter_store in RISC-V port.
> > 
> > The codes are as follows:
> > 
> >  +/* Return true if it is the strided load/store. */
> > +static bool
> > +strided_load_store_p (rtx vec_offset, rtx *base, rtx *step)
> > +{
> > +  if (const_vec_series_p (vec_offset, base, step))
> > +return true;
> > +
> > +  /* For strided load/store, vectorizer always generates
> > + VEC_SERIES_EXPR for vec_offset.  */
> > +  tree expr = REG_P (vec_offset) ? REG_EXPR (vec_offset) : NULL_TREE;
> > +  if (!expr || TREE_CODE (expr) != SSA_NAME)
> > +return false;
> > +
> > +  /* Check if it is GIMPLE like: _88 = VEC_SERIES_EXPR <0, _87>;  */
> > +  gimple *def_stmt = SSA_NAME_DEF_STMT (expr);
> > +  if (!def_stmt || !is_gimple_assign (def_stmt)
> > +  || gimple_assign_rhs_code (def_stmt) != VEC_SERIES_EXPR)
> > +return false;
> > +
> > +  tree baset = gimple_assign_rhs1 (def_stmt);
> > +  tree stept = gimple_assign_rhs2 (def_stmt);
> > +  *base = expand_normal (baset);
> > +  *step = expand_normal (stept);
> > +
> > +  if (!rtx_equal_p (*base, const0_rtx))
> > +return false;
> > +  return true;
> > +}
> > In this codes, I tried to query the SSA_NAME_DEF_STMT to see whether the 
> > vector offset of gather/scatter is VEC_SERIES
> > If it is VEC_SERIES, I will lower them into RVV strided load/stores 
> > (vlse.v/vsse.v) which is using scalar stride, 
> > if it is not, then use common RVV indexed load/store with vector offset 
> > (vluxei/vsuxei).
> > 
> > Jeff is worrying about whether we are using SSA_NAME_DEF_STMT at this point 
> >  (during the stage "expand" expanding gimple ->rtl).
>  
> Using SSA_NAME_DEF_STMT during expansion is OK, but I don't think you
> can rely on REG_EXPR here since you don't know whether any coalescing
> happened.  That said, maybe the implementation currently guarantees
> you'll only see a REG_EXPR SSA name if there's a single definition
> of that register, but at least I'm not aware of that and this is also
> not documented.
>  
> I wonder if you can recover vlse.v at combine time though?
>  
> That said, if the ISA supports gather/scatter with an affine offset
> the more appropriate way would be to add additional named expanders
> for this and deal with the above in the middle-end during RTL
> expansion instead.
>  
> Richard.
>  
> > I am also wondering whether I am doing wrong here.
> > Thanks.
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Jeff Law
> > Date: 2023-07-12 13:32
> > To: juzhe.zh...@rivai.ai; gcc-patches
> > CC: Kito.cheng; Robin Dapp
> > Subject: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV 
> > auto-vectorization
> >  
> >  
> > On 7/11/23 20:34, juzhe.zh...@rivai.ai wrote:
> > > Hi, Jeff.
> > > 
> > >  >> Hmm, I'm not sure this is safe, especially if gimple->rtl expansion is
> > >>>complete.  While you might be able to get REG_EXPR, I would not really
> > >>>expect SSA_NAME_DEF_STMT to be correct.  At the least it'll need some
> > >>>way to make sure it's not called at an inappropriate time.
> > > I think it's safe, if SSA_NAME_DEF_STMT is NULL, then just return it.
> > > 
> > >>>Should this have been known_lt rather than known_le?
> > > It should be LE, since I will pass through GET_MODE_NUNITS/GET_MODE_SIZE 
> > > for SLP.
> > Thanks for double checking.  It looked slightly odd checking ge or le.
> >  
> >  
> > > 
> > >>>Something's off in your formatting here.  I'd guess spaces vs tabs
> > > Ok.
> > > 
> > >>>In a few places you're using expand_binop.  Those interfaces are really
> > >>>more for gimple->RTL.  But code like expand_gather_scatter is really
> > >>>RTL, not gimple/tree.   Is there a reason why you're not using pure RTL
> > >>>interfaces?
> > > I saw ARM SVE is using them in many places for expanding patterns.
> > > And I think it's convenient so that's why I use them.
> > OK.
> >  
> > I still think we need a resolution on strided_load_store_p.

Re: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization

2023-07-12 Thread juzhe.zh...@rivai.ai
Thanks Richard so much!

I am gonna prepare V7 of this patch with dropping the strided load/store 
support on RISC-V backend.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-07-12 15:56
To: juzhe.zh...@rivai.ai
CC: jeffreyalaw; gcc-patches; Kito.cheng; Robin Dapp; richard.sandiford
Subject: Re: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV 
auto-vectorization
On Wed, 12 Jul 2023, juzhe.zh...@rivai.ai wrote:
 
> Thanks Richard.
> 
> Is it correct that the better way is to add optabs 
> (len_strided_load/len_strided_store),
> then expand LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE to 
> len_strided_load/len_strided_store optab (if it is strided load/store) in
> expand_gather_load_optab_fn 
> expand_scatter_store_optab_fn
> 
> of internal-fn.cc
> 
> Am I right? Thanks.
 
Yes.
 
In principle the vectorizer can also directly take advantage of this
and code generate an internal .LEN_STRIDED_LOAD ifn.
 
Richard.
 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-07-12 15:27
> To: juzhe.zh...@rivai.ai
> CC: jeffreyalaw; gcc-patches; Kito.cheng; Robin Dapp; richard.sandiford
> Subject: Re: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV 
> auto-vectorization
> On Wed, 12 Jul 2023, juzhe.zh...@rivai.ai wrote:
>  
> > I understand your concern. I CC Richards to see whether this piece of codes 
> > is  unsafe.
> > 
> > Hi, Richard and Richi:
> > 
> > Jeff is worrying about this codes in "expand_gather_scatter" of supporting 
> > len_mask_gather_load/len_mask_scatter_store in RISC-V port.
> > 
> > The codes are as follows:
> > 
> >  +/* Return true if it is the strided load/store. */
> > +static bool
> > +strided_load_store_p (rtx vec_offset, rtx *base, rtx *step)
> > +{
> > +  if (const_vec_series_p (vec_offset, base, step))
> > +return true;
> > +
> > +  /* For strided load/store, vectorizer always generates
> > + VEC_SERIES_EXPR for vec_offset.  */
> > +  tree expr = REG_P (vec_offset) ? REG_EXPR (vec_offset) : NULL_TREE;
> > +  if (!expr || TREE_CODE (expr) != SSA_NAME)
> > +return false;
> > +
> > +  /* Check if it is GIMPLE like: _88 = VEC_SERIES_EXPR <0, _87>;  */
> > +  gimple *def_stmt = SSA_NAME_DEF_STMT (expr);
> > +  if (!def_stmt || !is_gimple_assign (def_stmt)
> > +  || gimple_assign_rhs_code (def_stmt) != VEC_SERIES_EXPR)
> > +return false;
> > +
> > +  tree baset = gimple_assign_rhs1 (def_stmt);
> > +  tree stept = gimple_assign_rhs2 (def_stmt);
> > +  *base = expand_normal (baset);
> > +  *step = expand_normal (stept);
> > +
> > +  if (!rtx_equal_p (*base, const0_rtx))
> > +return false;
> > +  return true;
> > +}
> > In this codes, I tried to query the SSA_NAME_DEF_STMT to see whether the 
> > vector offset of gather/scatter is VEC_SERIES
> > If it is VEC_SERIES, I will lower them into RVV strided load/stores 
> > (vlse.v/vsse.v) which is using scalar stride, 
> > if it is not, then use common RVV indexed load/store with vector offset 
> > (vluxei/vsuxei).
> > 
> > Jeff is worrying about whether we are using SSA_NAME_DEF_STMT at this point 
> >  (during the stage "expand" expanding gimple ->rtl).
>  
> Using SSA_NAME_DEF_STMT during expansion is OK, but I don't think you
> can rely on REG_EXPR here since you don't know whether any coalescing
> happened.  That said, maybe the implementation currently guarantees
> you'll only see a REG_EXPR SSA name if there's a single definition
> of that register, but at least I'm not aware of that and this is also
> not documented.
>  
> I wonder if you can recover vlse.v at combine time though?
>  
> That said, if the ISA supports gather/scatter with an affine offset
> the more appropriate way would be to add additional named expanders
> for this and deal with the above in the middle-end during RTL
> expansion instead.
>  
> Richard.
>  
> > I am also wondering whether I am doing wrong here.
> > Thanks.
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Jeff Law
> > Date: 2023-07-12 13:32
> > To: juzhe.zh...@rivai.ai; gcc-patches
> > CC: Kito.cheng; Robin Dapp
> > Subject: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV 
> > auto-vectorization
> >  
> >  
> > On 7/11/23 20:34, juzhe.zh...@rivai.ai wrote:
> > > Hi, Jeff.
> > > 
> > >  >> Hmm, I'm not sure this is safe, especially if gimple->rtl expansion is
> > >>>complete.  While you might be able to get REG_EXPR, I would not really
> > >>>expect SSA_NAME_DEF_STMT to be correct.  At the least it'll need some
> > >>>way to make sure it's not called at an inappropriate time.
> > > I think it's safe, if SSA_NAME_DEF_STMT is NULL, then just return it.
> > > 
> > >>>Should this have been known_lt rather than known_le?
> > > It should be LE, since I will pass through GET_MODE_NUNITS/GET_MODE_SIZE 
> > > for SLP.
> > Thanks for double checking.  It looked slightly odd checking ge or le.
> >  
> >  
> > > 
> > >>>Something's off in your formatting here.  I'd guess spaces vs tabs
> > > Ok.
> > > 
> > >>>In a few places you're using expand_binop.  Those interfaces are really
> > >>>more for gimple->RTL.

Re: [PATCH] aarch64: Fix warnings during libgcc build

2023-07-12 Thread Szabolcs Nagy via Gcc-patches
The 07/11/2023 17:20, Florian Weimer wrote:
> * Richard Earnshaw:
> 
> > On 11/07/2023 10:37, Florian Weimer via Gcc-patches wrote:
> >> libgcc/
> >>* config/aarch64/aarch64-unwind.h
> >> (aarch64_cie_signed_with_b_key):
> >>Add missing const qualifier.  Cast from const unsigned char *
> >>to const char *.  Use __builtin_strchr to avoid an implicit
> >>function declaration.
> >>* config/aarch64/linux-unwind.h (aarch64_fallback_frame_state):
> >>Add missing cast.
> >> ---
> >> diff --git a/libgcc/config/aarch64/linux-unwind.h 
> >> b/libgcc/config/aarch64/linux-unwind.h
> >> index 00eba866049..93da7a9537d 100644
> >> --- a/libgcc/config/aarch64/linux-unwind.h
> >> +++ b/libgcc/config/aarch64/linux-unwind.h
> >> @@ -77,7 +77,7 @@ aarch64_fallback_frame_state (struct _Unwind_Context 
> >> *context,
> >>   }
> >>   rt_ = context->cfa;
> >> -  sc = &rt_->uc.uc_mcontext;
> >> +  sc = (struct sigcontext *) &rt_->uc.uc_mcontext;
> >> /* This define duplicates the definition in aarch64.md */
> >>   #define SP_REGNUM 31
> >> 
> >
> > This looks somewhat dubious.  I'm not particularly familiar with the
> > kernel headers, but a quick look suggests an mcontext_t is nothing
> > like a sigcontext_t.  So isn't the cast just papering over some more
> > fundamental problem?
> 
> I agree it looks dubious.  Note that it's struct sigcontext, not
> (not-struct) sigcontext_t.  I don't know why the uc_mcontext members
> aren't accessed directly, so I can't really write a good comment about
> it.

Historically glibc typedef'ed mcontext_t to the Linux struct sigcontext,
so this used to work fine.  (I don't know about other OSes.)

Then at some point glibc fixed the namespace-polluting fields
when building for POSIX, which required a separate mcontext_t.

I guess either fix works: moving to the correct mcontext_t or
casting to struct sigcontext, but the former means the fields must
be changed when building in a POSIX-conforming mode (I guess
libgcc is built with _GNU_SOURCE so it may not be an issue) and
they may be different across different libcs (or even different
versions of glibc).

> 
> Obviously it works quite well as-is. 8-)  Similar code is present in
> many, many Linux targets.
> 
> Thanks,
> Florian
> 


[PATCH] RISC-V: Support integer mult highpart auto-vectorization

2023-07-12 Thread juzhe . zhong
From: Ju-Zhe Zhong 

This patch adds an obviously missing mult_high auto-vectorization pattern.

Consider this following case:
#define DEF_LOOP(TYPE)  \
void __attribute__ ((noipa))\
mod_##TYPE (TYPE *__restrict dst, TYPE *__restrict src, int count)  \
{   \
  for (int i = 0; i < count; ++i)   \
dst[i] = src[i] / 17;   \
}

#define TEST_ALL(T) \
  T (int32_t) \

TEST_ALL (DEF_LOOP)

Before this patch:
mod_int32_t:
ble a2,zero,.L5
li  a5,17
vsetvli a3,zero,e32,m1,ta,ma
vmv.v.x v2,a5
.L3:
vsetvli a5,a2,e8,mf4,ta,ma
vle32.v v1,0(a1)
vsetvli a3,zero,e32,m1,ta,ma
slli    a4,a5,2
vdiv.vv v1,v1,v2
sub a2,a2,a5
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v1,0(a0)
add a1,a1,a4
add a0,a0,a4
bne a2,zero,.L3
.L5:
ret

After this patch:
mod_int32_t:
ble a2,zero,.L5
li  a5,2021163008
addiw   a5,a5,-1927
vsetvli a3,zero,e32,m1,ta,ma
vmv.v.x v3,a5
.L3:
vsetvli a5,a2,e8,mf4,ta,ma
vle32.v v2,0(a1)
vsetvli a3,zero,e32,m1,ta,ma
slli    a4,a5,2
vmulh.vv        v1,v2,v3
sub a2,a2,a5
vsra.vi v2,v2,31
vsra.vi v1,v1,3
vsub.vv v1,v1,v2
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v1,0(a0)
add a1,a1,a4
add a0,a0,a4
bne a2,zero,.L3
.L5:
ret

Even though a single "vdiv" is lowered into "1 vmulh + 2 vsra + 1 vsub" and
4 more instructions are generated, we believe it's much better than before
since division is very slow in the hardware.

gcc/ChangeLog:

* config/riscv/autovec.md (smul<mode>3_highpart): New pattern.
(umul<mode>3_highpart): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/mulh-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/mulh-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/mulh_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/mulh_run-2.c: New test.

---
 gcc/config/riscv/autovec.md   | 30 +++
 .../riscv/rvv/autovec/binop/mulh-1.c  | 26 
 .../riscv/rvv/autovec/binop/mulh-2.c  | 27 +
 .../riscv/rvv/autovec/binop/mulh_run-1.c  | 29 ++
 .../riscv/rvv/autovec/binop/mulh_run-2.c  | 29 ++
 5 files changed, 141 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/mulh-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/mulh-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/mulh_run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/mulh_run-2.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 9e61b2e41d8..d98a63c285e 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1178,3 +1178,33 @@
 riscv_vector::RVV_BINOP, operands);
   DONE;
 })
+
+;; -------------------------------------------------------------------------
+;;  [INT] Highpart multiplication
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vmulh.vv
+;; - vmulhu.vv
+;; -------------------------------------------------------------------------
+
+(define_expand "smul<mode>3_highpart"
+  [(match_operand:VFULLI 0 "register_operand")
+   (match_operand:VFULLI 1 "register_operand")
+   (match_operand:VFULLI 2 "register_operand")]
+  "TARGET_VECTOR"
+{
+  insn_code icode = code_for_pred_mulh (UNSPEC_VMULHS, <MODE>mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, operands);
+  DONE;
+})
+
+(define_expand "umul<mode>3_highpart"
+  [(match_operand:VFULLI 0 "register_operand")
+   (match_operand:VFULLI 1 "register_operand")
+   (match_operand:VFULLI 2 "register_operand")]
+  "TARGET_VECTOR"
+{
+  insn_code icode = code_for_pred_mulh (UNSPEC_VMULHU, <MODE>mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, operands);
+  DONE;
+})
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/mulh-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/mulh-1.c
new file mode 100644
index 000..265a332712a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/mulh-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d 
--param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
+
+#include <stdint.h>
+
+#define DEF_LOOP(TYPE)                                                       \
+  void __attribute__ ((noipa)) mod_##TYPE (TYPE *dst, TYPE *src, int count)  \
+  {                                                                          \
+    for (int i = 0; i < count; ++i)                                          \
+  d

Devirtualization of objects in array

2023-07-12 Thread Ng YongXiang via Gcc-patches
Hello,

I'm writing to seek a review of an issue I filed some time ago.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110057 . A proposed patch is
attached in the bug tracker as well.

The gist of the issue is that in an array, we know the exact type of the
objects: an array must hold the type it is declared with, and it can never
hold objects of a derived type. Hence, there is a "devirtualization"
opportunity here. GCC currently invokes the virtual destructor of the
objects in the container when the array is destroyed, which shouldn't be
necessary since we know the objects' type for sure.

Thank you.

Best regards,
Yong Xiang


Re: Devirtualization of objects in array

2023-07-12 Thread Xi Ruoyao via Gcc-patches
On Wed, 2023-07-12 at 16:58 +0800, Ng YongXiang via Gcc-patches wrote:
> I'm writing to seek a review of an issue I filed some time ago.
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110057 . A proposed patch is
> attached in the bug tracker as well.

You should send the patch to gcc-patches@gcc.gnu.org for a review, see
https://gcc.gnu.org/contribute.html for the details.  Generally we
consider patches attached in bugzilla as drafts.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v2] Implement new RTL optimizations pass: fold-mem-offsets.

2023-07-12 Thread Manolis Tsamis
On Mon, Jul 10, 2023 at 12:37 AM Hans-Peter Nilsson  wrote:
>
> On Thu, 15 Jun 2023, Manolis Tsamis wrote:
>
> > This is a new RTL pass that tries to optimize memory offset calculations
> > by moving them from add immediate instructions to the memory loads/stores.
> > For example it can transform this:
> >
> >   addi t4,sp,16
> >   add  t2,a6,t4
> >   shl  t3,t2,1
> >   ld   a2,0(t3)
> >   addi a2,1
> >   sd   a2,8(t2)
> >
> > into the following (one instruction less):
> >
> >   add  t2,a6,sp
> >   shl  t3,t2,1
> >   ld   a2,32(t3)
> >   addi a2,1
> >   sd   a2,24(t2)
> >
> > Although there are places where this is done already, this pass is more
> > powerful and can handle the more difficult cases that are currently not
> > optimized. Also, it runs late enough and can optimize away unnecessary
> > stack pointer calculations.
>
> It punts on all "use" insns that are not SET.
> Why not use single_set there too?
>

The issue was that single_set will potentially discard clobbers, but
if we have any clobbers it may be invalid to propagate through that
instruction.
Rejecting anything that is not a SET is enough to handle anything strange.
Although this can be improved (look through clobbers/use?) the
implementation will be more complicated without any obvious (large)
benefit.

Manolis

> brgds, H-P


Re: [PATCH v2] Implement new RTL optimizations pass: fold-mem-offsets.

2023-07-12 Thread Manolis Tsamis
On Mon, Jul 10, 2023 at 12:58 AM Hans-Peter Nilsson  wrote:
>
> On Sun, 9 Jul 2023, Hans-Peter Nilsson wrote:
>
> > On Thu, 15 Jun 2023, Manolis Tsamis wrote:
> >
> > > This is a new RTL pass that tries to optimize memory offset calculations
> > > by moving them from add immediate instructions to the memory loads/stores.
>
> > It punts on all "use" insns that are not SET.
> > Why not use single_set there too?
>
> Also, I don't see insn costs considered?
> (Also: typo "immidiate".)
>

The only change that this pass does is to change offsets where
possible and then simplify add immediate instructions to register
moves.
I don't see how this could result in worse performance and by
extension I don't see where insn costs could be used.
Do you have any thoughts about where to use the costs?

Thanks!
Manolis

> brgds, H-P


Re: [PATCH] RISC-V: Support integer mult highpart auto-vectorization

2023-07-12 Thread Kito Cheng via Gcc-patches
LGTM, thanks:)

 wrote on Wed, 12 Jul 2023 at 16:40:

> From: Ju-Zhe Zhong 
>
> This patch is adding an obvious missing mult_high auto-vectorization
> pattern.
>
> Consider this following case:
> #define DEF_LOOP(TYPE)  \
> void __attribute__ ((noipa))\
> mod_##TYPE (TYPE *__restrict dst, TYPE *__restrict src, int count)  \
> {   \
>   for (int i = 0; i < count; ++i)   \
> dst[i] = src[i] / 17;   \
> }
>
> #define TEST_ALL(T) \
>   T (int32_t) \
>
> TEST_ALL (DEF_LOOP)
>
> Before this patch:
> mod_int32_t:
> ble a2,zero,.L5
> li  a5,17
> vsetvli a3,zero,e32,m1,ta,ma
> vmv.v.x v2,a5
> .L3:
> vsetvli a5,a2,e8,mf4,ta,ma
> vle32.v v1,0(a1)
> vsetvli a3,zero,e32,m1,ta,ma
> slli    a4,a5,2
> vdiv.vv v1,v1,v2
> sub a2,a2,a5
> vsetvli zero,a5,e32,m1,ta,ma
> vse32.v v1,0(a0)
> add a1,a1,a4
> add a0,a0,a4
> bne a2,zero,.L3
> .L5:
> ret
>
> After this patch:
> mod_int32_t:
> ble a2,zero,.L5
> li  a5,2021163008
> addiw   a5,a5,-1927
> vsetvli a3,zero,e32,m1,ta,ma
> vmv.v.x v3,a5
> .L3:
> vsetvli a5,a2,e8,mf4,ta,ma
> vle32.v v2,0(a1)
> vsetvli a3,zero,e32,m1,ta,ma
> slli    a4,a5,2
> vmulh.vv        v1,v2,v3
> sub a2,a2,a5
> vsra.vi v2,v2,31
> vsra.vi v1,v1,3
> vsub.vv v1,v1,v2
> vsetvli zero,a5,e32,m1,ta,ma
> vse32.v v1,0(a0)
> add a1,a1,a4
> add a0,a0,a4
> bne a2,zero,.L3
> .L5:
> ret
>
> Even though a single "vdiv" is lowered into "1 vmulh + 2 vsra + 1 vsub" and
> 4 more instructions are generated, we believe it's much better than before
> since division is very slow in the hardware.
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md (smul<mode>3_highpart): New pattern.
> (umul<mode>3_highpart): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/binop/mulh-1.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/mulh-2.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/mulh_run-1.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/mulh_run-2.c: New test.
>
> ---
>  gcc/config/riscv/autovec.md   | 30 +++
>  .../riscv/rvv/autovec/binop/mulh-1.c  | 26 
>  .../riscv/rvv/autovec/binop/mulh-2.c  | 27 +
>  .../riscv/rvv/autovec/binop/mulh_run-1.c  | 29 ++
>  .../riscv/rvv/autovec/binop/mulh_run-2.c  | 29 ++
>  5 files changed, 141 insertions(+)
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/mulh-1.c
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/mulh-2.c
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/mulh_run-1.c
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/mulh_run-2.c
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 9e61b2e41d8..d98a63c285e 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -1178,3 +1178,33 @@
>  riscv_vector::RVV_BINOP, operands);
>DONE;
>  })
> +
> +;; -------------------------------------------------------------------------
> +;;  [INT] Highpart multiplication
> +;; -------------------------------------------------------------------------
> +;; Includes:
> +;; - vmulh.vv
> +;; - vmulhu.vv
> +;; -------------------------------------------------------------------------
> +
> +(define_expand "smul<mode>3_highpart"
> +  [(match_operand:VFULLI 0 "register_operand")
> +   (match_operand:VFULLI 1 "register_operand")
> +   (match_operand:VFULLI 2 "register_operand")]
> +  "TARGET_VECTOR"
> +{
> +  insn_code icode = code_for_pred_mulh (UNSPEC_VMULHS, <MODE>mode);
> +  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, operands);
> +  DONE;
> +})
> +
> +(define_expand "umul<mode>3_highpart"
> +  [(match_operand:VFULLI 0 "register_operand")
> +   (match_operand:VFULLI 1 "register_operand")
> +   (match_operand:VFULLI 2 "register_operand")]
> +  "TARGET_VECTOR"
> +{
> +  insn_code icode = code_for_pred_mulh (UNSPEC_VMULHU, <MODE>mode);
> +  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, operands);
> +  DONE;
> +})
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/mulh-1.c
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/mulh-1.c
> new file mode 100644
> index 000..265a332712a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/mulh-1.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d
> --param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
> +
> +#include <stdint.h>
> +
> +#def

Re: [PATCH] [vect] Use intermediate integer type for float_expr/fix_trunc_expr when direct optab does not exist.

2023-07-12 Thread Robin Dapp via Gcc-patches
> int32_t x = (int32_t)0x1.0p32;
> int32_t y = (int32_t)(int64_t)0x1.0p32;
>
> sets x to 2147483647 and y to 0.
>>>
>>> Hmm, good question.  GENERIC has a direct truncation to unsigned char
>>> for example, the C standard generally says if the integral part cannot
>>> be represented then the behavior is undefined.  So I think we should be
>>> safe here (0x1.0p32 doesn't fit an int).
>>
>> We should be following Annex F (unspecified value plus "invalid" exception
>> for out-of-range floating-to-integer conversions rather than undefined
>> behavior).  But we don't achieve that very well at present (see bug 93806
>> comments 27-29 for examples of how such conversions produce wobbly
>> values).
> 
> That would mean guarding this with !flag_trapping_math would be the 
> appropriate
> thing to do.

Follow-up on this:  When we do a NARROW_DST multiple-step conversion we
do not guard with !flag_trapping_math.  Is this intentional and if so, why
do we only require it for the NONE case?

I was thinking of implementing an expander for double -> int16 conversion
for RISC-V with multiple steps but that would just circumvent the
!flag_trapping_math check.  Then I wondered why we vectorize this
using multiple steps on x86 even with trapping math and it turns out
that the difference is the NARROW_DST modifier but we emit
VEC_PACK_FIX_TRUNC_EXPR anyway.  Is this "safe"?

Regards
 Robin


Re: [PATCH] VECT: Apply COND_LEN_* into vectorizable_operation

2023-07-12 Thread Richard Sandiford via Gcc-patches
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong 
>
> Hi, Richard and Richi.
> As we discussed before, COND_LEN_* patterns were added for multiple
> situations.
> This patch applies COND_LEN_* to the following situation:
>
> Support for the situation that in "vectorizable_operation":
>   /* If operating on inactive elements could generate spurious traps,
>  we need to restrict the operation to active lanes.  Note that this
>  specifically doesn't apply to unhoisted invariants, since they
>  operate on the same value for every lane.
>
>  Similarly, if this operation is part of a reduction, a fully-masked
>  loop should only change the active lanes of the reduction chain,
>  keeping the inactive lanes as-is.  */
>   bool mask_out_inactive = ((!is_invariant && gimple_could_trap_p (stmt))
>   || reduc_idx >= 0);
>
> This applies when mask_out_inactive is true with length loop control.
>
> So, we can handle these 2 following cases:
>
> 1. Integer division:
>
>#define TEST_TYPE(TYPE)\
>__attribute__((noipa)) \
>void vrem_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)  \
>{  \
>  for (int i = 0; i < n; i++)  \
>dst[i] = a[i] % b[i];  \
>}
>#define TEST_ALL() \
>TEST_TYPE(int8_t)  \
>TEST_ALL()
>
> With this patch:
>   
>   _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
>   ivtmp_45 = _61 * 4;
>   vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
>   vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
>   vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
> vect__4.8_48, _61, 0);
>   .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);
>
> 2. Floating-point arithmetic **WITHOUT** -ffast-math
>   
>#define TEST_TYPE(TYPE)\
>__attribute__((noipa)) \
>void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)  \
>{  \
>  for (int i = 0; i < n; i++)  \
>dst[i] = a[i] + b[i];  \
>}
>#define TEST_ALL() \
>TEST_TYPE(float)   \
>TEST_ALL()
>
> With this patch:
>
>   _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
>   ivtmp_45 = _61 * 4;
>   vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
>   vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
>   vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
> vect__4.8_48, _61, 0);
>   .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);
>
> With this patch, we can make sure operations won't trap for elements that 
> "mask_out_inactive".
>
> gcc/ChangeLog:
>
> * internal-fn.cc (FOR_EACH_CODE_LEN_MAPPING): Add COND_LEN_*.
> (get_conditional_len_internal_fn): New function.
> (CASE): Add COND_LEN_*.
> * internal-fn.h (get_conditional_len_internal_fn): New function.
> * tree-vect-stmts.cc (vectorizable_operation): Apply COND_LEN_* into 
> operation could trap.
>
> ---
>  gcc/internal-fn.cc | 48 +
>  gcc/internal-fn.h  |  1 +
>  gcc/tree-vect-stmts.cc | 60 ++
>  3 files changed, 104 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index f9aaf66cf2a..e46dd57b7f0 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4337,6 +4337,54 @@ conditional_internal_fn_code (internal_fn ifn)
>  }
>  }
>  
> +/* Invoke T(CODE, IFN) for each conditional len function IFN that maps to a
> +   tree code CODE.  */
> +#define FOR_EACH_CODE_LEN_MAPPING(T)                                         \
> +  T (PLUS_EXPR, IFN_COND_LEN_ADD)                                            \
> +  T (MINUS_EXPR, IFN_COND_LEN_SUB)                                           \
> +  T (MULT_EXPR, IFN_COND_LEN_MUL)                                            \
> +  T (TRUNC_DIV_EXPR, IFN_COND_LEN_DIV)                                       \
> +  T (TRUNC_MOD_EXPR, IFN_COND_LEN_MOD)                                       \
> +  T (RDIV_EXPR, IFN_COND_LEN_RDIV)                                           \
> +  T (MIN_EXPR, IFN_COND_LEN_MIN)                                             \
> +  T (MAX_EXPR, IFN_COND_LEN_MAX)                                             \
> +  T (BIT_AND_EXPR, IFN_COND_LEN_AND)                                         \
> +  T (BIT_IOR_EXPR, IFN_COND_LEN_IOR)                                         \
> +  T (BIT_XOR_EXPR, IFN_COND_LEN_XOR)                                         \
> +  T (LSHIFT_EXPR, IFN_COND_LEN_SHL)                                          \
> + 

Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization

2023-07-12 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Wed, 12 Jul 2023, juzhe.zh...@rivai.ai wrote:
>
>> Thanks Richard.
>> 
>> Is it correct that the better way is to add optabs 
>> (len_strided_load/len_strided_store),
>> then expand LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE to 
>> len_strided_load/len_strided_store optab (if it is strided load/store) in
>> expand_gather_load_optab_fn 
>> expand_scatter_store_optab_fn
>> 
>> of internal-fn.cc
>> 
>> Am I right? Thanks.
>
> Yes.
>
> In principle the vectorizer can also directly take advantage of this
> and code generate an internal .LEN_STRIDED_LOAD ifn.

Yeah, in particular, having a strided load should relax some
of the restrictions around the relationship of the vector offset
type to the loaded/stored data.  E.g. a "gather" of N bytes with a
64-bit stride would in principle be possible without needing an
Nx64-bit vector offset type.

Richard


Re: [x86 PATCH] PR target/110598: Fix rega = 0; rega ^= rega regression.

2023-07-12 Thread Uros Bizjak via Gcc-patches
On Tue, Jul 11, 2023 at 9:07 PM Roger Sayle  wrote:
>
>
> This patch fixes the regression PR target/110598 caused by my recent
> addition of a peephole2.  The intention of that optimization was to
> simplify zeroing a register, followed by an IOR, XOR or PLUS operation
> on it into a move, or as described in the comment:
> ;; Peephole2 rega = 0; rega op= regb into rega = regb.
>
> The issue is that I'd failed to consider the (rare and unusual) case,
> where regb is rega, where the transformation leads to the incorrect
> "rega = rega", when it should be "rega = 0".  The minimal fix is to
> add a !reg_mentioned_p check to the recent peephole2.
>
> In addition to resolving the regression, I've added a second peephole2
> to optimize the problematic case above, which contains a false
> dependency and is therefore tricky to optimize elsewhere.  This is an
> improvement over GCC 13, for example, that generates the redundant:
>
> xorl%edx, %edx
> xorq%rdx, %rdx
>
>
> 2023-07-11  Roger Sayle  
>
> gcc/ChangeLog
> PR target/110598
> * config/i386/i386.md (peephole2): Check !reg_mentioned_p when
> optimizing rega = 0; rega op= regb for op in [XOR,IOR,PLUS].
> (peephole2): Simplify rega = 0; rega op= rega cases.
>
> gcc/testsuite/ChangeLog
> PR target/110598
> * gcc.target/i386/pr110598.c: New test case.

OK.

Thanks,
Uros.

>
>
> Thanks in advance (and apologies for any inconvenience),
> Roger
> --
>


Re: Loop-ch improvements, part 1

2023-07-12 Thread Richard Biener via Gcc-patches
On Tue, 11 Jul 2023, Jan Hubicka wrote:

> Hi,
> this patch improves the profile update in loop-ch to handle the situation
> where the duplicated header has a loop-invariant test.  In this case we know
> that all the count of the exit edge belongs to the duplicated loop header
> edge and we can update probabilities accordingly.
> Since we also do all the work to track this information from analysis to
> duplication, I also added code to turn those conditionals into constants so
> we do not need a later jump threading pass to clean up.
> 
> This made me work out that the propagation was buggy in a few aspects
>  1) it handled every PHI as a PHI in the header and incorrectly assigned some
> PHIs to be IV-like when they are not
>  2) it did not check for novops calls that are not required to return the same
> value on every invocation.
>  3) I also added a check for asm statements since those are not necessarily
> reproducible either.
> 
> I would like to do more changes, but tried to prevent this patch from
> snowballing.  The analysis of what statements will remain after duplication 
> can
> be improved.  I think we should use ranger query for other than first basic
> block, too, and possibly drop the IV heuristics then.  Also it seems that a
> lot of this logic is pretty much the same as the analysis in the peeling
> pass, so unifying this would be nice.

Indeed.

> I also think I should move the profile update out of
> gimple_duplicate_sese_region (it is now very specific to ch) and rename it,
> since those regions are single entry multiple exit.

Please.  Maybe move it to tree-ssa-loop-ch.cc as well.

> Bootstrapped/regtested on x86_64-linux, OK?

OK, thanks for improving this.

Richard.

> Honza
> 
> gcc/ChangeLog:
> 
>   * tree-cfg.cc (gimple_duplicate_sese_region): Add ORIG_ELIMINATED_EDGES
>   parameter and rewrite profile updating code to handle edges elimination.
>   * tree-cfg.h (gimple_duplicate_sese_region): Update prototype.
>   * tree-ssa-loop-ch.cc (loop_invariant_op_p): New function.
>   (loop_iv_derived_p): New function.
>   (should_duplicate_loop_header_p): Track invariant exit edges; fix 
> handling
>   of PHIs and propagation of IV derived variables.
>   (ch_base::copy_headers): Pass around the invariant edges hash set.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/tree-ssa/loop-ch-profile-1.c: Remove xfail.
> 
> diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
> index 4989906706c..3879fb7c4c1 100644
> --- a/gcc/tree-cfg.cc
> +++ b/gcc/tree-cfg.cc
> @@ -6661,14 +6661,16 @@ add_phi_args_after_copy (basic_block *region_copy, 
> unsigned n_region,
> true otherwise.
>  
> ELIMINATED_EDGE is an edge that is known to be removed in the duplicated
> -   region.  */
> +   region.  ORIG_ELIMINATED_EDGES, if non-NULL is set of edges known to be
> +   removed from the original region.  */
>  
>  bool
>  gimple_duplicate_sese_region (edge entry, edge exit,
> basic_block *region, unsigned n_region,
> basic_block *region_copy,
> bool update_dominance,
> -   edge eliminated_edge)
> +   edge eliminated_edge,
> +   hash_set  *orig_eliminated_edges)
>  {
>unsigned i;
>bool free_region_copy = false, copying_header = false;
> @@ -6747,7 +6749,8 @@ gimple_duplicate_sese_region (edge entry, edge exit,
>   split_edge_bb_loc (entry), update_dominance);
>if (total_count.initialized_p () && entry_count.initialized_p ())
>  {
> -  if (!eliminated_edge)
> +  if (!eliminated_edge
> +   && (!orig_eliminated_edges || orig_eliminated_edges->is_empty ()))
>   {
> scale_bbs_frequencies_profile_count (region, n_region,
>  total_count - entry_count,
> @@ -6765,7 +6768,7 @@ gimple_duplicate_sese_region (edge entry, edge exit,
>if (cond1) <- this condition will become false
>  and we update probabilities
>  goto loop_exit;
> -  if (cond2)
> +  if (cond2) <- this condition is loop invariant
>  goto loop_exit;
>goto loop_header   <- this will be redirected to loop.
>  // region_copy_end
> @@ -6776,6 +6779,7 @@ gimple_duplicate_sese_region (edge entry, edge exit,
> +if (cond1)   <- we need to update probability here
>goto loop_exit;
>  if (cond2)   <- and determine scaling factor here.
> +moreover cond2 is now always true
>goto loop_exit;
>  else
>goto loop;
> @@ -6785,53 +6789,84 @@ gimple_duplicate_sese_region (edge entry, edge exit,
>but only consumer so far is tree-ssa-loop-ch and it uses only this
>to handle the common case of peelin

Re: [x86 PATCH] Fix FAIL of gcc.target/i386/pr91681-1.c

2023-07-12 Thread Uros Bizjak via Gcc-patches
On Tue, Jul 11, 2023 at 10:07 PM Roger Sayle  wrote:
>
>
> The recent change in TImode parameter passing on x86_64 results in the
> FAIL of pr91681-1.c.  The issue is that with the extra flexibility,
> the combine pass is now spoilt for choice between using either the
> *add3_doubleword_concat or the *add3_doubleword_zext
> patterns, when one operand is a *concat and the other is a zero_extend.
> The solution proposed below is provide an *add3_doubleword_concat_zext
> define_insn_and_split, that can benefit both from the register allocation
> of *concat, and still avoid the xor normally required by zero extension.
>
> I'm investigating a follow-up refinement to improve register allocation
> further by avoiding the early clobber in the =&r, and handling (custom)
> reloads explicitly, but this piece resolves the testcase failure.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-07-11  Roger Sayle  
>
> gcc/ChangeLog
> PR target/91681
> * config/i386/i386.md (*add<mode>3_doubleword_concat_zext): New
> define_insn_and_split derived from *add<mode>3_doubleword_concat
> and *add<mode>3_doubleword_zext.

OK.

Thanks,
Uros.

>
>
> Thanks,
> Roger
> --
>


Re: [PATCH] Add __builtin_iseqsig()

2023-07-12 Thread FX Coudert via Gcc-patches
ping**3


>> On 6 June 2023 at 20:15, FX Coudert  wrote:
>> 
>> Hi,
>> 
>> (It took me a while to get back to this.)
>> 
>> This is a new and improved version of the patch at 
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602932.html
>> It addresses the comment from Joseph that FE_INVALID should really be tested 
>> in the case of both quiet and signaling NaNs, which is now done 
>> systematically.
>> 
>> Bootstrapped and regtested on x86_64-pc-linux-gnu
>> OK to commit?
>> 
>> FX



0001-Add-__builtin_iseqsig.patch
Description: Binary data


Re: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization

2023-07-12 Thread juzhe.zhong--- via Gcc-patches
Thanks Richard.

I have addressed all comments on V7 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624220.html 

Drop the vlse/vsse codegen optimization in the RISC-V backend; instead I will
support LEN_MASK_STRIDED_LOAD/LEN_MASK_STRIDED_STORE in the future.

Thanks. 


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-07-12 17:33
To: Richard Biener
CC: juzhe.zhong\@rivai.ai; jeffreyalaw; gcc-patches; Kito.cheng; Robin Dapp
Subject: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV 
auto-vectorization
Richard Biener  writes:
> On Wed, 12 Jul 2023, juzhe.zh...@rivai.ai wrote:
>
>> Thanks Richard.
>> 
>> Is it correct that the better way is to add optabs 
>> (len_strided_load/len_strided_store),
>> then expand LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE to 
>> len_strided_load/len_strided_store optab (if it is strided load/store) in
>> expand_gather_load_optab_fn 
>> expand_scatter_store_optab_fn
>> 
>> of internal-fn.cc
>> 
>> Am I right? Thanks.
>
> Yes.
>
> In principle the vectorizer can also directly take advantage of this
> and code generate an internal .LEN_STRIDED_LOAD ifn.
 
Yeah, in particular, having a strided load should relax some
of the restrictions around the relationship of the vector offset
type to the loaded/stored data.  E.g. a "gather" of N bytes with a
64-bit stride would in principle be possible without needing an
Nx64-bit vector offset type.
 
Richard
 


Re: [PATCH 1/2] c++, libstdc++: implement __is_pointer built-in trait

2023-07-12 Thread Jonathan Wakely via Gcc-patches
On Wed, 12 Jul 2023 at 05:41, François Dumont via Libstdc++
 wrote:
>
>
> On 10/07/2023 07:23, Ken Matsui via Libstdc++ wrote:
> > This patch implements built-in trait for std::is_pointer.
> >
> > gcc/cp/ChangeLog:
> >
> >   * cp-trait.def: Define __is_pointer.
> >   * constraint.cc (diagnose_trait_expr): Handle CPTK_IS_POINTER.
> >   * semantics.cc (trait_expr_value): Likewise.
> >   (finish_trait_expr): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * g++.dg/ext/has-builtin-1.C: Test existence of __is_pointer.
> >   * g++.dg/ext/is_pointer.C: New test.
> >   * g++.dg/tm/pr46567.C (__is_pointer): Rename to ...
> >   (is_pointer): ... this.
> >   * g++.dg/torture/20070621-1.C: Likewise.
> >   * g++.dg/torture/pr57107.C: Likewise.
> >
> > libstdc++-v3/ChangeLog:
> >
> >   * include/bits/cpp_type_traits.h (__is_pointer): Rename to ...
> >   (is_pointer): ... this.
> >   * include/bits/deque.tcc: Use is_pointer instead.
> >   * include/bits/stl_algobase.h: Likewise.
> >
> > Signed-off-by: Ken Matsui 
> > ---
> >   gcc/cp/constraint.cc|  3 ++
> >   gcc/cp/cp-trait.def |  1 +
> >   gcc/cp/semantics.cc |  4 ++
> >   gcc/testsuite/g++.dg/ext/has-builtin-1.C|  3 ++
> >   gcc/testsuite/g++.dg/ext/is_pointer.C   | 51 +
> >   gcc/testsuite/g++.dg/tm/pr46567.C   | 22 -
> >   gcc/testsuite/g++.dg/torture/20070621-1.C   |  4 +-
> >   gcc/testsuite/g++.dg/torture/pr57107.C  |  4 +-
> >   libstdc++-v3/include/bits/cpp_type_traits.h |  6 +--
> >   libstdc++-v3/include/bits/deque.tcc |  6 +--
> >   libstdc++-v3/include/bits/stl_algobase.h|  6 +--
> >   11 files changed, 86 insertions(+), 24 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/ext/is_pointer.C
> >
> > diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> > index 8cf0f2d0974..30266204eb5 100644
> > --- a/gcc/cp/constraint.cc
> > +++ b/gcc/cp/constraint.cc
> > @@ -3751,6 +3751,9 @@ diagnose_trait_expr (tree expr, tree args)
> >   case CPTK_IS_UNION:
> > inform (loc, "  %qT is not a union", t1);
> > break;
> > +case CPTK_IS_POINTER:
> > +  inform (loc, "  %qT is not a pointer", t1);
> > +  break;
> >   case CPTK_IS_AGGREGATE:
> > inform (loc, "  %qT is not an aggregate", t1);
> > break;
> > diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
> > index 8b7fece0cc8..b7c263e9a77 100644
> > --- a/gcc/cp/cp-trait.def
> > +++ b/gcc/cp/cp-trait.def
> > @@ -82,6 +82,7 @@ DEFTRAIT_EXPR (IS_TRIVIALLY_ASSIGNABLE, 
> > "__is_trivially_assignable", 2)
> >   DEFTRAIT_EXPR (IS_TRIVIALLY_CONSTRUCTIBLE, 
> > "__is_trivially_constructible", -1)
> >   DEFTRAIT_EXPR (IS_TRIVIALLY_COPYABLE, "__is_trivially_copyable", 1)
> >   DEFTRAIT_EXPR (IS_UNION, "__is_union", 1)
> > +DEFTRAIT_EXPR (IS_POINTER, "__is_pointer", 1)
> >   DEFTRAIT_EXPR (REF_CONSTRUCTS_FROM_TEMPORARY, 
> > "__reference_constructs_from_temporary", 2)
> >   DEFTRAIT_EXPR (REF_CONVERTS_FROM_TEMPORARY, 
> > "__reference_converts_from_temporary", 2)
> >   /* FIXME Added space to avoid direct usage in GCC 13.  */
> > diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
> > index 8fb47fd179e..68f8a4fe85b 100644
> > --- a/gcc/cp/semantics.cc
> > +++ b/gcc/cp/semantics.cc
> > @@ -12118,6 +12118,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, 
> > tree type2)
> >   case CPTK_IS_UNION:
> > return type_code1 == UNION_TYPE;
> >
> > +case CPTK_IS_POINTER:
> > +  return TYPE_PTR_P (type1);
> > +
> >   case CPTK_IS_ASSIGNABLE:
> > return is_xible (MODIFY_EXPR, type1, type2);
> >
> > @@ -12296,6 +12299,7 @@ finish_trait_expr (location_t loc, cp_trait_kind 
> > kind, tree type1, tree type2)
> >   case CPTK_IS_ENUM:
> >   case CPTK_IS_UNION:
> >   case CPTK_IS_SAME:
> > +case CPTK_IS_POINTER:
> > break;
> >
> >   case CPTK_IS_LAYOUT_COMPATIBLE:
> > diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
> > b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> > index f343e153e56..9dace5cbd48 100644
> > --- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> > +++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> > @@ -146,3 +146,6 @@
> >   #if !__has_builtin (__remove_cvref)
> >   # error "__has_builtin (__remove_cvref) failed"
> >   #endif
> > +#if !__has_builtin (__is_pointer)
> > +# error "__has_builtin (__is_pointer) failed"
> > +#endif
> > diff --git a/gcc/testsuite/g++.dg/ext/is_pointer.C 
> > b/gcc/testsuite/g++.dg/ext/is_pointer.C
> > new file mode 100644
> > index 000..d6e39565950
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/ext/is_pointer.C
> > @@ -0,0 +1,51 @@
> > +// { dg-do compile { target c++11 } }
> > +
> > +#define SA(X) static_assert((X),#X)
> > +
> > +SA(!__is_pointer(int));
> > +SA(__is_pointer(int*));
> > +SA(__is_pointer(int**));
> > +
> > +SA(__is_pointer(const int*));
> > +SA(__is_poi

Re: [PATCH 1/2] c++, libstdc++: implement __is_pointer built-in trait

2023-07-12 Thread Jonathan Wakely via Gcc-patches
On Mon, 10 Jul 2023 at 06:24, Ken Matsui via Libstdc++
 wrote:
>
> This patch implements built-in trait for std::is_pointer.
>
> gcc/cp/ChangeLog:
>
> * cp-trait.def: Define __is_pointer.
> * constraint.cc (diagnose_trait_expr): Handle CPTK_IS_POINTER.
> * semantics.cc (trait_expr_value): Likewise.
> (finish_trait_expr): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/ext/has-builtin-1.C: Test existence of __is_pointer.
> * g++.dg/ext/is_pointer.C: New test.
> * g++.dg/tm/pr46567.C (__is_pointer): Rename to ...
> (is_pointer): ... this.
> * g++.dg/torture/20070621-1.C: Likewise.
> * g++.dg/torture/pr57107.C: Likewise.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/cpp_type_traits.h (__is_pointer): Rename to ...
> (____is_pointer): ... this.

Please pick another name, ____is_pointer is not OK for the library trait.

(It might be OK for the g++.dg tests, I have no opinion on that.)

You could use __is_pointer_type or __is_ptr, I don't really mind which
(it's only used in two places).



Re: [PATCH v2 2/2] libstdc++: use new built-in trait __is_scalar for std::is_scalar

2023-07-12 Thread Jonathan Wakely via Gcc-patches
On Sat, 8 Jul 2023 at 05:47, Ken Matsui via Libstdc++
 wrote:
>
> This patch gets std::is_scalar to dispatch to new built-in trait
> __is_scalar.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/type_traits (is_scalar): Use __is_scalar built-in
> trait.
> (is_scalar_v): Likewise.

OK for trunk (conditional on the front-end change being committed
first of course).


>
> Signed-off-by: Ken Matsui 
> ---
>  libstdc++-v3/include/std/type_traits | 14 ++
>  1 file changed, 14 insertions(+)
>
> diff --git a/libstdc++-v3/include/std/type_traits 
> b/libstdc++-v3/include/std/type_traits
> index 0e7a9c9c7f3..bc90b2c61ca 100644
> --- a/libstdc++-v3/include/std/type_traits
> +++ b/libstdc++-v3/include/std/type_traits
> @@ -678,11 +678,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  struct is_member_pointer;
>
>/// is_scalar
> +#if __has_builtin(__is_scalar)
> +  template<typename _Tp>
> +struct is_scalar
> +: public __bool_constant<__is_scalar(_Tp)>
> +{ };
> +#else
>template<typename _Tp>
>  struct is_scalar
>  : public __or_<is_arithmetic<_Tp>, is_enum<_Tp>, is_pointer<_Tp>,
> is_member_pointer<_Tp>, is_null_pointer<_Tp>>::type
>  { };
> +#endif
>
>/// is_compound
>template
> @@ -3204,8 +3211,15 @@ template <typename _Tp>
>inline constexpr bool is_fundamental_v = is_fundamental<_Tp>::value;
>  template <typename _Tp>
>inline constexpr bool is_object_v = is_object<_Tp>::value;
> +
> +#if __has_builtin(__is_scalar)
> +template <typename _Tp>
> +  inline constexpr bool is_scalar_v = __is_scalar(_Tp);
> +#else
>  template <typename _Tp>
>inline constexpr bool is_scalar_v = is_scalar<_Tp>::value;
> +#endif
> +
>  template <typename _Tp>
>inline constexpr bool is_compound_v = is_compound<_Tp>::value;
>  template <typename _Tp>
> --
> 2.41.0
>



Re: [PATCH v2] libstdc++: use __is_enum built-in trait

2023-07-12 Thread Jonathan Wakely via Gcc-patches
On Sat, 8 Jul 2023 at 05:50, Ken Matsui via Libstdc++
 wrote:
>
> This patch replaces is_enum::value with __is_enum built-in trait in
> the type_traits header.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/type_traits (__make_unsigned_selector): Use
> __is_enum built-in trait.
> (__make_signed_selector): Likewise.
> (__underlying_type_impl): Likewise.
>
> Signed-off-by: Ken Matsui 

OK for trunk, thanks!


> ---
>  libstdc++-v3/include/std/type_traits | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/type_traits 
> b/libstdc++-v3/include/std/type_traits
> index 0e7a9c9c7f3..9f086992ebc 100644
> --- a/libstdc++-v3/include/std/type_traits
> +++ b/libstdc++-v3/include/std/type_traits
> @@ -1740,7 +1740,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>// Select between integral and enum: not possible to be both.
>template<typename _Tp,
>bool _IsInt = is_integral<_Tp>::value,
> -  bool _IsEnum = is_enum<_Tp>::value>
> +  bool _IsEnum = __is_enum(_Tp)>
>  class __make_unsigned_selector;
>
>template<typename _Tp>
> @@ -1900,7 +1900,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>// Select between integral and enum: not possible to be both.
>template<typename _Tp,
>bool _IsInt = is_integral<_Tp>::value,
> -  bool _IsEnum = is_enum<_Tp>::value>
> +  bool _IsEnum = __is_enum(_Tp)>
>  class __make_signed_selector;
>
>template<typename _Tp>
> @@ -2353,7 +2353,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  struct __common_type_fold<_CTp, _Rp, void>
>  { };
>
> -  template<typename _Tp, bool = is_enum<_Tp>::value>
> +  template<typename _Tp>
>  struct __underlying_type_impl
>  {
>using type = __underlying_type(_Tp);
> --
> 2.41.0
>



Re: [PATCH v2 1/2] c++, libstdc++: implement __is_pointer built-in trait

2023-07-12 Thread Jonathan Wakely via Gcc-patches
On Mon, 10 Jul 2023 at 06:51, Ken Matsui via Libstdc++
 wrote:
>
> Hi,
>
> Here is the benchmark result for is_pointer:
>
> https://github.com/ken-matsui/gcc-benches/blob/main/is_pointer.md#sun-jul--9-103948-pm-pdt-2023
>
> Time: -62.1344%
> Peak Memory Usage: -52.4281%
> Total Memory Usage: -53.5889%

Wow!

Although maybe we could have improved our std::is_pointer_v anyway, like so:

template <typename _Tp>
  inline constexpr bool is_pointer_v = false;
template <typename _Tp>
  inline constexpr bool is_pointer_v<_Tp*> = true;
template <typename _Tp>
  inline constexpr bool is_pointer_v<_Tp* const> = true;
template <typename _Tp>
  inline constexpr bool is_pointer_v<_Tp* volatile> = true;
template <typename _Tp>
  inline constexpr bool is_pointer_v<_Tp* const volatile> = true;

I'm not sure why I didn't already do that.

Could you please benchmark that? And if it is better than the current
impl using is_pointer<_Tp>::value then we should do this in the
library:

#if __has_builtin(__is_pointer)
template <typename _Tp>
  inline constexpr bool is_pointer_v = __is_pointer(_Tp);
#else
template <typename _Tp>
  inline constexpr bool is_pointer_v = false;
template <typename _Tp>
  inline constexpr bool is_pointer_v<_Tp*> = true;
template <typename _Tp>
  inline constexpr bool is_pointer_v<_Tp* const> = true;
template <typename _Tp>
  inline constexpr bool is_pointer_v<_Tp* volatile> = true;
template <typename _Tp>
  inline constexpr bool is_pointer_v<_Tp* const volatile> = true;
#endif



Re: [PATCH 2/2] libstdc++: use new built-in trait __is_signed

2023-07-12 Thread Jonathan Wakely via Gcc-patches
On Sun, 9 Jul 2023 at 09:50, Ken Matsui via Libstdc++
 wrote:
>
> This patch lets libstdc++ use new built-in trait __is_signed.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/type_traits (is_signed): Use __is_signed built-in trait.
> (is_signed_v): Likewise.
>
> Signed-off-by: Ken Matsui 

OK for trunk after the front-end implementation of __is_signed is committed.


> ---
>  libstdc++-v3/include/std/type_traits | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/std/type_traits 
> b/libstdc++-v3/include/std/type_traits
> index 0e7a9c9c7f3..23ab5a4b1e5 100644
> --- a/libstdc++-v3/include/std/type_traits
> +++ b/libstdc++-v3/include/std/type_traits
> @@ -865,6 +865,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  : public __bool_constant<__is_abstract(_Tp)>
>  { };
>
> +  /// is_signed
> +#if __has_builtin(__is_signed)
> +  template<typename _Tp>
> +struct is_signed
> +: public __bool_constant<__is_signed(_Tp)>
> +{ };
> +#else
>/// @cond undocumented
>template<typename _Tp,
>bool = is_arithmetic<_Tp>::value>
> @@ -877,11 +884,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  { };
>/// @endcond
>
> -  /// is_signed
>template<typename _Tp>
>  struct is_signed
>  : public __is_signed_helper<_Tp>::type
>  { };
> +#endif
>
>/// is_unsigned
>template<typename _Tp>
> @@ -3240,8 +3247,14 @@ template <typename _Tp>
>  template <typename _Tp>
>inline constexpr bool is_final_v = __is_final(_Tp);
> 
> +#if __has_builtin(__is_signed)
> +template <typename _Tp>
> +  inline constexpr bool is_signed_v = __is_signed(_Tp);
> +#else
>  template <typename _Tp>
>inline constexpr bool is_signed_v = is_signed<_Tp>::value;
> +#endif
> +
>  template <typename _Tp>
>inline constexpr bool is_unsigned_v = is_unsigned<_Tp>::value;
>
> --
> 2.41.0
>



Re: [PATCH v4 2/2] libstdc++: use new built-in trait __is_unsigned

2023-07-12 Thread Jonathan Wakely via Gcc-patches
On Sat, 8 Jul 2023 at 12:14, Ken Matsui via Libstdc++
 wrote:
>
> This patch lets libstdc++ use new built-in trait __is_unsigned.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/type_traits (is_unsigned): Use __is_unsigned built-in
> trait.
> (is_unsigned_v): Likewise.
>
> Signed-off-by: Ken Matsui 

OK for trunk after the front-end implementation of __is_unsigned is committed.


> ---
>  libstdc++-v3/include/std/type_traits | 13 +
>  1 file changed, 13 insertions(+)
>
> diff --git a/libstdc++-v3/include/std/type_traits 
> b/libstdc++-v3/include/std/type_traits
> index 0e7a9c9c7f3..7eeb0da7a27 100644
> --- a/libstdc++-v3/include/std/type_traits
> +++ b/libstdc++-v3/include/std/type_traits
> @@ -884,10 +884,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  { };
>
>/// is_unsigned
> +#if __has_builtin(__is_unsigned)
> +  template<typename _Tp>
> +struct is_unsigned
> +: public __bool_constant<__is_unsigned(_Tp)>
> +{ };
> +#else
>template<typename _Tp>
>  struct is_unsigned
>  : public __and_<is_arithmetic<_Tp>, __not_<is_signed<_Tp>>>::type
>  { };
> +#endif
>
>/// @cond undocumented
>template<typename _Tp, typename _Up = _Tp&&>
> @@ -3242,8 +3249,14 @@ template 
>
>  template <typename _Tp>
>inline constexpr bool is_signed_v = is_signed<_Tp>::value;
> +
> +#if __has_builtin(__is_unsigned)
> +template <typename _Tp>
> +  inline constexpr bool is_unsigned_v = __is_unsigned(_Tp);
> +#else
>  template <typename _Tp>
>inline constexpr bool is_unsigned_v = is_unsigned<_Tp>::value;
> +#endif
>
>  template <typename _Tp, typename... _Args>
>inline constexpr bool is_constructible_v = __is_constructible(_Tp, 
> _Args...);
> --
> 2.41.0
>



Re: [pushed][LRA][PR110372]: Refine reload pseudo class

2023-07-12 Thread Richard Sandiford via Gcc-patches
Vladimir Makarov via Gcc-patches  writes:
> The following patch solves
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110372
>
> The patch was successfully bootstrapped and tested on x86-64.
>
> commit 1f7e5a7b91862b999aab88ee0319052aaf00f0f1
> Author: Vladimir N. Makarov 
> Date:   Fri Jul 7 09:53:38 2023 -0400
>
> LRA: Refine reload pseudo class
> 
> For the given testcase, a reload pseudo happened to occur only in reload
> insns created on one constraint sub-pass.  Therefore its initial class
> (ALL_REGS) was not refined and the reload insns were not processed on
> the next constraint sub-passes.  This resulted in the wrong insn.
> 
> PR rtl-optimization/110372
> 
> gcc/ChangeLog:
> 
> * lra-assigns.cc (assign_by_spills): Add reload insns involving
> reload pseudos with non-refined class to be processed on the next
> sub-pass.
> * lra-constraints.cc (enough_allocatable_hard_regs_p): New func.
> (in_class_p): Use it.
> (print_curr_insn_alt): New func.
> (process_alt_operands): Use it.  Improve debug info.
> (curr_insn_transform): Use print_curr_insn_alt.  Refine reload
> pseudo class if it is not refined yet.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/i386/pr110372.c: New.
>
> diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
> index 73fbef29912..2f95121df06 100644
> --- a/gcc/lra-assigns.cc
> +++ b/gcc/lra-assigns.cc
> @@ -1443,10 +1443,11 @@ assign_by_spills (void)
>pass.  Indicate that it is no longer spilled.  */
> bitmap_clear_bit (&all_spilled_pseudos, regno);
> assign_hard_regno (hard_regno, regno);
> -   if (! reload_p)
> - /* As non-reload pseudo assignment is changed we
> -should reconsider insns referring for the
> -pseudo.  */
> +   if (! reload_p || regno_allocno_class_array[regno] == ALL_REGS)

Is this test meaningful on all targets?  We have some for which
GENERAL_REGS == ALL_REGS (e.g. nios2 and nvptx), so ALL_REGS can
be a valid allocation class.

Thanks,
Richard

> + /* As non-reload pseudo assignment is changed we should
> +reconsider insns referring for the pseudo.  Do the same if a
> +reload pseudo did not refine its class, which can happen
> +when the pseudo occurs only in reload insns.  */
>   bitmap_set_bit (&changed_pseudo_bitmap, regno);
>   }
>   }
> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
> index 4dc2d70c402..123ff662cbc 100644
> --- a/gcc/lra-constraints.cc
> +++ b/gcc/lra-constraints.cc
> @@ -233,6 +233,34 @@ get_reg_class (int regno)
>return NO_REGS;
>  }
>  
> +/* Return true if REG_CLASS has enough allocatable hard regs to keep value of
> +   REG_MODE.  */
> +static bool
> +enough_allocatable_hard_regs_p (enum reg_class reg_class,
> + enum machine_mode reg_mode)
> +{
> +  int i, j, hard_regno, class_size, nregs;
> +  
> +  if (hard_reg_set_subset_p (reg_class_contents[reg_class], 
> lra_no_alloc_regs))
> +return false;
> +  class_size = ira_class_hard_regs_num[reg_class];
> +  for (i = 0; i < class_size; i++)
> +{
> +  hard_regno = ira_class_hard_regs[reg_class][i];
> +  nregs = hard_regno_nregs (hard_regno, reg_mode);
> +  if (nregs == 1)
> + return true;
> +  for (j = 0; j < nregs; j++)
> + if (TEST_HARD_REG_BIT (lra_no_alloc_regs, hard_regno + j)
> + || ! TEST_HARD_REG_BIT (reg_class_contents[reg_class],
> + hard_regno + j))
> +   break;
> +  if (j >= nregs)
> + return true;
> +}
> +  return false;
> +}
> +
>  /* Return true if REG satisfies (or will satisfy) reg class constraint
> CL.  Use elimination first if REG is a hard register.  If REG is a
> reload pseudo created by this constraints pass, assume that it will
> @@ -252,7 +280,6 @@ in_class_p (rtx reg, enum reg_class cl, enum reg_class 
> *new_class,
>enum reg_class rclass, common_class;
>machine_mode reg_mode;
>rtx src;
> -  int class_size, hard_regno, nregs, i, j;
>int regno = REGNO (reg);
>  
>if (new_class != NULL)
> @@ -291,26 +318,7 @@ in_class_p (rtx reg, enum reg_class cl, enum reg_class 
> *new_class,
>common_class = ira_reg_class_subset[rclass][cl];
>if (new_class != NULL)
>   *new_class = common_class;
> -  if (hard_reg_set_subset_p (reg_class_contents[common_class],
> -  lra_no_alloc_regs))
> - return false;
> -  /* Check that there are enough allocatable regs.  */
> -  class_size = ira_class_hard_regs_num[common_class];
> -  for (i = 0; i < class_size; i++)
> - {
> -   hard_regno = ira_class_hard_regs[common_class][i];
> -   nregs = hard_regno_nregs (hard_regno, reg_mode);
> -   if (n

Re: [PATCH v8 2/6] libstdc++: use new built-in trait __is_reference for std::is_reference

2023-07-12 Thread Jonathan Wakely via Gcc-patches
On Sat, 8 Jul 2023 at 06:13, Ken Matsui via Libstdc++
 wrote:
>
> This patch gets std::is_reference to dispatch to new built-in trait
> __is_reference.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/type_traits (is_reference): Use __is_reference built-in
> trait.
> (is_reference_v): Likewise.
>
> Signed-off-by: Ken Matsui 

OK for trunk after the front-end __is_reference is committed.


> ---
>  libstdc++-v3/include/std/type_traits | 14 ++
>  1 file changed, 14 insertions(+)
>
> diff --git a/libstdc++-v3/include/std/type_traits 
> b/libstdc++-v3/include/std/type_traits
> index 0e7a9c9c7f3..2a14df7e5f9 100644
> --- a/libstdc++-v3/include/std/type_traits
> +++ b/libstdc++-v3/include/std/type_traits
> @@ -639,6 +639,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>// Composite type categories.
>
>/// is_reference
> +#if __has_builtin(__is_reference)
> +  template<typename _Tp>
> +struct is_reference
> +: public __bool_constant<__is_reference(_Tp)>
> +{ };
> +#else
>template<typename _Tp>
>  struct is_reference
>  : public false_type
> @@ -653,6 +659,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  struct is_reference<_Tp&&>
>  : public true_type
>  { };
> +#endif
>
>/// is_arithmetic
>template<typename _Tp>
> @@ -3192,12 +3199,19 @@ template <typename _Tp>
>inline constexpr bool is_class_v = __is_class(_Tp);
>  template <typename _Tp>
>inline constexpr bool is_function_v = is_function<_Tp>::value;
> +
> +#if __has_builtin(__is_reference)
> +template <typename _Tp>
> +  inline constexpr bool is_reference_v = __is_reference(_Tp);
> +#else
>  template <typename _Tp>
>inline constexpr bool is_reference_v = false;
>  template <typename _Tp>
>inline constexpr bool is_reference_v<_Tp&> = true;
>  template <typename _Tp>
>inline constexpr bool is_reference_v<_Tp&&> = true;
> +#endif
> +
>  template <typename _Tp>
>inline constexpr bool is_arithmetic_v = is_arithmetic<_Tp>::value;
>  template <typename _Tp>
> --
> 2.41.0
>



Re: [PATCH v8 4/6] libstdc++: use new built-in trait __is_function for std::is_function

2023-07-12 Thread Jonathan Wakely via Gcc-patches
On Sat, 8 Jul 2023 at 06:15, Ken Matsui via Libstdc++
 wrote:
>
> This patch gets std::is_function to dispatch to new built-in trait
> __is_function.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/type_traits (is_function): Use __is_function built-in
> trait.
> (is_function_v): Likewise.
>
> Signed-off-by: Ken Matsui 
> ---
>  libstdc++-v3/include/std/type_traits | 13 +
>  1 file changed, 13 insertions(+)
>
> diff --git a/libstdc++-v3/include/std/type_traits 
> b/libstdc++-v3/include/std/type_traits
> index 2a14df7e5f9..954b57518de 100644
> --- a/libstdc++-v3/include/std/type_traits
> +++ b/libstdc++-v3/include/std/type_traits
> @@ -594,6 +594,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  { };
>
>/// is_function
> +#if __has_builtin(__is_function)
> +  template<typename _Tp>
> +struct is_function
> +: public __bool_constant<__is_function(_Tp)>
> +{ };
> +#else
>template<typename _Tp>
>  struct is_function
>  : public __bool_constant<!is_const<const _Tp>::value> { };
> @@ -605,6 +611,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>template<typename _Tp>
>  struct is_function<_Tp&&>
>  : public false_type { };
> +#endif
>
>  #define __cpp_lib_is_null_pointer 201309L
>
> @@ -3197,8 +3204,14 @@ template <typename _Tp>
>inline constexpr bool is_union_v = __is_union(_Tp);
>  template <typename _Tp>
>inline constexpr bool is_class_v = __is_class(_Tp);
> +
> +#if __has_builtin(__is_function)
> +template <typename _Tp>
> +  inline constexpr bool is_function_v = __is_function(_Tp);
> +#else
>  template <typename _Tp>
>inline constexpr bool is_function_v = is_function<_Tp>::value;

This fallback could be:

template <typename _Tp>
   inline constexpr bool is_function_v = !is_const_v<const _Tp>;

That would avoid instantiating std::is_function and std::is_const and
std::integral_constant, which should be significant.



RE: [PATCH] RISC-V: Support integer mult highpart auto-vectorization

2023-07-12 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Kito Cheng via Gcc-patches
Sent: Wednesday, July 12, 2023 5:17 PM
To: 钟居哲 
Cc: GCC Patches ; Kito Cheng ; 
Jeff Law ; Robin Dapp 
Subject: Re: [PATCH] RISC-V: Support integer mult highpart auto-vectorization

LGTM, thanks:)

 wrote on Wed, 12 Jul 2023 at 16:40:

> From: Ju-Zhe Zhong 
>
> This patch adds an obviously missing mult_high auto-vectorization
> pattern.
>
> Consider this following case:
> #define DEF_LOOP(TYPE)  \
> void __attribute__ ((noipa))\
> mod_##TYPE (TYPE *__restrict dst, TYPE *__restrict src, int count)  \
> {   \
>   for (int i = 0; i < count; ++i)   \
> dst[i] = src[i] / 17;   \
> }
>
> #define TEST_ALL(T) \
>   T (int32_t) \
>
> TEST_ALL (DEF_LOOP)
>
> Before this patch:
> mod_int32_t:
> ble a2,zero,.L5
> li  a5,17
> vsetvli a3,zero,e32,m1,ta,ma
> vmv.v.x v2,a5
> .L3:
> vsetvli a5,a2,e8,mf4,ta,ma
> vle32.v v1,0(a1)
> vsetvli a3,zero,e32,m1,ta,ma
> slli a4,a5,2
> vdiv.vv v1,v1,v2
> sub a2,a2,a5
> vsetvli zero,a5,e32,m1,ta,ma
> vse32.v v1,0(a0)
> add a1,a1,a4
> add a0,a0,a4
> bne a2,zero,.L3
> .L5:
> ret
>
> After this patch:
> mod_int32_t:
> ble a2,zero,.L5
> li  a5,2021163008
> addiw   a5,a5,-1927
> vsetvli a3,zero,e32,m1,ta,ma
> vmv.v.x v3,a5
> .L3:
> vsetvli a5,a2,e8,mf4,ta,ma
> vle32.v v2,0(a1)
> vsetvli a3,zero,e32,m1,ta,ma
> slli a4,a5,2
> vmulh.vv v1,v2,v3
> sub a2,a2,a5
> vsra.vi v2,v2,31
> vsra.vi v1,v1,3
> vsub.vv v1,v1,v2
> vsetvli zero,a5,e32,m1,ta,ma
> vse32.v v1,0(a0)
> add a1,a1,a4
> add a0,a0,a4
> bne a2,zero,.L3
> .L5:
> ret
>
> Even though a single "vdiv" is lowered into "1 vmulh + 2 vsra + 1 vsub",
> generating 4 more instructions, we believe it's much better than before
> since division is very slow in hardware.
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md (smul<mode>3_highpart): New pattern.
> (umul<mode>3_highpart): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/binop/mulh-1.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/mulh-2.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/mulh_run-1.c: New test.
> * gcc.target/riscv/rvv/autovec/binop/mulh_run-2.c: New test.
>
> ---
>  gcc/config/riscv/autovec.md   | 30 +++
>  .../riscv/rvv/autovec/binop/mulh-1.c  | 26 
>  .../riscv/rvv/autovec/binop/mulh-2.c  | 27 +
>  .../riscv/rvv/autovec/binop/mulh_run-1.c  | 29 ++
>  .../riscv/rvv/autovec/binop/mulh_run-2.c  | 29 ++
>  5 files changed, 141 insertions(+)
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/mulh-1.c
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/mulh-2.c
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/mulh_run-1.c
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/mulh_run-2.c
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 9e61b2e41d8..d98a63c285e 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -1178,3 +1178,33 @@
>  riscv_vector::RVV_BINOP, operands);
>DONE;
>  })
> +
> +;;
> -
> +;;  [INT] Highpart multiplication
> +;;
> -
> +;; Includes:
> +;; - vmulh.vv
> +;; - vmulhu.vv
> +;;
> -
> +
> +(define_expand "smul<mode>3_highpart"
> +  [(match_operand:VFULLI 0 "register_operand")
> +   (match_operand:VFULLI 1 "register_operand")
> +   (match_operand:VFULLI 2 "register_operand")]
> +  "TARGET_VECTOR"
> +{
> +  insn_code icode = code_for_pred_mulh (UNSPEC_VMULHS, <MODE>mode);
> +  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP,
> operands);
> +  DONE;
> +})
> +
> +(define_expand "umul<mode>3_highpart"
> +  [(match_operand:VFULLI 0 "register_operand")
> +   (match_operand:VFULLI 1 "register_operand")
> +   (match_operand:VFULLI 2 "register_operand")]
> +  "TARGET_VECTOR"
> +{
> +  insn_code icode = code_for_pred_mulh (UNSPEC_VMULHU, <MODE>mode);
> +  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP,
> operands);
> +  DONE;
> +})
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/mulh-1.c
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/mulh-1.c
> new file mode 100644
> index 000.

Re: [PATCH 1/2] c++, libstdc++: implement __is_signed built-in trait

2023-07-12 Thread Jonathan Wakely via Gcc-patches
On Sun, 9 Jul 2023 at 09:50, Ken Matsui via Libstdc++
 wrote:
>
> This patch implements built-in trait for std::is_signed.
>
> gcc/cp/ChangeLog:
>
> * cp-trait.def: Define __is_signed.
> * constraint.cc (diagnose_trait_expr): Handle CPTK_IS_SIGNED.
> * semantics.cc (trait_expr_value): Likewise.
> (finish_trait_expr): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/ext/has-builtin-1.C: Test existence of __is_signed.
> * g++.dg/ext/is_signed.C: New test.
> * g++.dg/tm/pr46567.C (__is_signed): Rename to ...
> (is_signed): ... this.
>
> libstdc++-v3/ChangeLog:
>
> * include/ext/numeric_traits.h (__is_signed): Rename to ...
> (____is_signed): ... this.

Again, please do not use four underscores.

This data member of __numeric_traits_integer could be __signed or
__is_signed_integer. I think I prefer __signed here, since the
"integer" part is redundant with __numeric_traits_integer.




> * include/bits/charconv.h: Use is_signed instead.
> * include/bits/locale_facets.tcc: Likewise.
> * include/bits/uniform_int_dist.h: Likewise.
>
> Signed-off-by: Ken Matsui 
> ---
>  gcc/cp/constraint.cc |  3 ++
>  gcc/cp/cp-trait.def  |  1 +
>  gcc/cp/semantics.cc  |  4 ++
>  gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 ++
>  gcc/testsuite/g++.dg/ext/is_signed.C | 47 
>  gcc/testsuite/g++.dg/tm/pr46567.C| 12 ++---
>  libstdc++-v3/include/bits/charconv.h |  2 +-
>  libstdc++-v3/include/bits/locale_facets.tcc  |  6 +--
>  libstdc++-v3/include/bits/uniform_int_dist.h |  4 +-
>  libstdc++-v3/include/ext/numeric_traits.h| 18 
>  10 files changed, 79 insertions(+), 21 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/ext/is_signed.C
>
> diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> index 8cf0f2d0974..73fcbfe39e8 100644
> --- a/gcc/cp/constraint.cc
> +++ b/gcc/cp/constraint.cc
> @@ -3751,6 +3751,9 @@ diagnose_trait_expr (tree expr, tree args)
>  case CPTK_IS_UNION:
>inform (loc, "  %qT is not a union", t1);
>break;
> +case CPTK_IS_SIGNED:
> +  inform (loc, "  %qT is not a signed type", t1);
> +  break;
>  case CPTK_IS_AGGREGATE:
>inform (loc, "  %qT is not an aggregate", t1);
>break;
> diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
> index 8b7fece0cc8..576d5528d05 100644
> --- a/gcc/cp/cp-trait.def
> +++ b/gcc/cp/cp-trait.def
> @@ -82,6 +82,7 @@ DEFTRAIT_EXPR (IS_TRIVIALLY_ASSIGNABLE, 
> "__is_trivially_assignable", 2)
>  DEFTRAIT_EXPR (IS_TRIVIALLY_CONSTRUCTIBLE, "__is_trivially_constructible", 
> -1)
>  DEFTRAIT_EXPR (IS_TRIVIALLY_COPYABLE, "__is_trivially_copyable", 1)
>  DEFTRAIT_EXPR (IS_UNION, "__is_union", 1)
> +DEFTRAIT_EXPR (IS_SIGNED, "__is_signed", 1)
>  DEFTRAIT_EXPR (REF_CONSTRUCTS_FROM_TEMPORARY, 
> "__reference_constructs_from_temporary", 2)
>  DEFTRAIT_EXPR (REF_CONVERTS_FROM_TEMPORARY, 
> "__reference_converts_from_temporary", 2)
>  /* FIXME Added space to avoid direct usage in GCC 13.  */
> diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
> index 8fb47fd179e..17aad992f96 100644
> --- a/gcc/cp/semantics.cc
> +++ b/gcc/cp/semantics.cc
> @@ -12118,6 +12118,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, 
> tree type2)
>  case CPTK_IS_UNION:
>return type_code1 == UNION_TYPE;
>
> +case CPTK_IS_SIGNED:
> +  return ARITHMETIC_TYPE_P (type1) && TYPE_SIGN (type1) == SIGNED;
> +
>  case CPTK_IS_ASSIGNABLE:
>return is_xible (MODIFY_EXPR, type1, type2);
>
> @@ -12296,6 +12299,7 @@ finish_trait_expr (location_t loc, cp_trait_kind 
> kind, tree type1, tree type2)
>  case CPTK_IS_ENUM:
>  case CPTK_IS_UNION:
>  case CPTK_IS_SAME:
> +case CPTK_IS_SIGNED:
>break;
>
>  case CPTK_IS_LAYOUT_COMPATIBLE:
> diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
> b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> index f343e153e56..a43202d0d59 100644
> --- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> +++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> @@ -146,3 +146,6 @@
>  #if !__has_builtin (__remove_cvref)
>  # error "__has_builtin (__remove_cvref) failed"
>  #endif
> +#if !__has_builtin (__is_signed)
> +# error "__has_builtin (__is_signed) failed"
> +#endif
> diff --git a/gcc/testsuite/g++.dg/ext/is_signed.C 
> b/gcc/testsuite/g++.dg/ext/is_signed.C
> new file mode 100644
> index 000..a04b548105d
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/ext/is_signed.C
> @@ -0,0 +1,47 @@
> +// { dg-do compile { target c++11 } }
> +
> +#include 
> +
> +using namespace __gnu_test;
> +
> +#define SA(X) static_assert((X),#X)
> +#define SA_TEST_CATEGORY(TRAIT, X, expect) \
> +  SA(TRAIT(X) == expect);  \
> +  SA(TRAIT(const X) == expect);\
> +  SA(TRAIT(volatile X) == expect); \
> +  SA(TRAIT(const volatile X) == expect)

Re: [PATCH v2 2/2] libstdc++: use new built-in trait __remove_pointer

2023-07-12 Thread Jonathan Wakely via Gcc-patches
On Sat, 8 Jul 2023 at 06:31, Ken Matsui via Libstdc++
 wrote:
>
> This patch lets libstdc++ use new built-in trait __remove_pointer.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/type_traits (remove_pointer): Use __remove_pointer 
> built-in trait.
>
> Signed-off-by: Ken Matsui 

OK for trunk after the front-end __remove_pointer is committed.


> ---
>  libstdc++-v3/include/std/type_traits | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/std/type_traits 
> b/libstdc++-v3/include/std/type_traits
> index 0e7a9c9c7f3..81497e2f3e1 100644
> --- a/libstdc++-v3/include/std/type_traits
> +++ b/libstdc++-v3/include/std/type_traits
> @@ -2023,6 +2023,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>// Pointer modifications.
>
> +  /// remove_pointer
> +#if __has_builtin(__remove_pointer)
> +  template<typename _Tp>
> +struct remove_pointer
> +{ using type = __remove_pointer(_Tp); };
> +#else
>template<typename _Tp, typename>
>  struct __remove_pointer_helper
>  { using type = _Tp; };
> @@ -2031,11 +2037,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  struct __remove_pointer_helper<_Tp, _Up*>
>  { using type = _Up; };
>
> -  /// remove_pointer
>template<typename _Tp>
>  struct remove_pointer
>  : public __remove_pointer_helper<_Tp, __remove_cv_t<_Tp>>
>  { };
> +#endif
>
>template<typename _Tp, typename = void>
>  struct __add_pointer_helper
> --
> 2.41.0
>



Re: [PATCH] simplify-rtx: Fix invalid simplification with paradoxical subregs [PR110206]

2023-07-12 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> On Mon, Jul 10, 2023 at 1:01 PM Uros Bizjak  wrote:
>>
>> On Mon, Jul 10, 2023 at 11:47 AM Richard Biener
>>  wrote:
>> >
>> > On Mon, Jul 10, 2023 at 11:26 AM Uros Bizjak  wrote:
>> > >
>> > > On Mon, Jul 10, 2023 at 11:17 AM Richard Biener
>> > >  wrote:
>> > > >
>> > > > On Sun, Jul 9, 2023 at 10:53 AM Uros Bizjak via Gcc-patches
>> > > >  wrote:
>> > > > >
>> > > > > As shown in the PR, simplify_gen_subreg call in 
>> > > > > simplify_replace_fn_rtx:
>> > > > >
>> > > > > (gdb) list
>> > > > > 469   if (code == SUBREG)
>> > > > > 470 {
>> > > > > 471   op0 = simplify_replace_fn_rtx (SUBREG_REG (x),
>> > > > > old_rtx, fn, data);
>> > > > > 472   if (op0 == SUBREG_REG (x))
>> > > > > 473 return x;
>> > > > > 474   op0 = simplify_gen_subreg (GET_MODE (x), op0,
>> > > > > 475  GET_MODE (SUBREG_REG 
>> > > > > (x)),
>> > > > > 476  SUBREG_BYTE (x));
>> > > > > 477   return op0 ? op0 : x;
>> > > > > 478 }
>> > > > >
>> > > > > simplifies with following arguments:
>> > > > >
>> > > > > (gdb) p debug_rtx (op0)
>> > > > > (const_vector:V4QI [
>> > > > > (const_int -52 [0xffcc]) repeated x4
>> > > > > ])
>> > > > > (gdb) p debug_rtx (x)
>> > > > > (subreg:V16QI (reg:V4QI 98) 0)
>> > > > >
>> > > > > to:
>> > > > >
>> > > > > (gdb) p debug_rtx (op0)
>> > > > > (const_vector:V16QI [
>> > > > > (const_int -52 [0xffcc]) repeated x16
>> > > > > ])
>> > > > >
>> > > > > This simplification is invalid, it is not possible to get V16QImode 
>> > > > > vector
>> > > > > from V4QImode vector, even when all elements are duplicates.
>> >
>> > ^^^
>> >
>> > I think this simplification is valid.  A simplification to
>> >
>> > (const_vector:V16QI [
>> >  (const_int -52 [0xffcc]) repeated x4
>> >  (const_int 0 [0]) repeated x12
>> >  ])
>> >
>> > would be valid as well.
>> >
>> > > > > The simplification happens in simplify_context::simplify_subreg:
>> > > > >
>> > > > > (gdb) list
>> > > > > 7558  if (VECTOR_MODE_P (outermode)
>> > > > > 7559  && GET_MODE_INNER (outermode) == GET_MODE_INNER 
>> > > > > (innermode)
>> > > > > 7560  && vec_duplicate_p (op, &elt))
>> > > > > 7561return gen_vec_duplicate (outermode, elt);
>> > > > >
>> > > > > but the above simplification is valid only for non-paradoxical 
>> > > > > registers,
>> > > > > where outermode <= innermode.  We should not assume that elements 
>> > > > > outside
>> > > > > the original register are valid, let alone all duplicates.
>> > > >
>> > > > Hmm, but looking at the audit trail the x86 backend expects them to be 
>> > > > zero?
>> > > > Isn't that wrong as well?
>> > >
>> > > If you mean Comment #10, it is just an observation that
>> > > simplify_replace_rtx simplifies arguments from Comment #9 to:
>> > >
>> > > (gdb) p debug_rtx (src)
>> > > (const_vector:V8HI [
>> > > (const_int 204 [0xcc]) repeated x4
>> > > (const_int 0 [0]) repeated x4
>> > > ])
>> > >
>> > > instead of:
>> > >
>> > > (gdb) p debug_rtx (src)
>> > > (const_vector:V8HI [
>> > > (const_int 204 [0xcc]) repeated x8
>> > > ])
>> > >
>> > > which is in line with the statement below.
>> > > >
>> > > > That is, I think putting any random value into the upper lanes when
>> > > > constant folding
>> > > > a paradoxical subreg sounds OK to me, no?
>> > >
>> > > The compiler is putting zero there as can be seen from the above new RTX.
>> > >
>> > > > Of course we might choose to not do such constant propagation for
>> > > > efficiency reason - at least
>> > > > when the resulting CONST_* would require a larger constant pool entry
>> > > > or more costly
>> > > > construction.
>> > >
>> > > This is probably a follow-up improvement, where this patch tries to
>> > > fix a specific invalid simplification of simplify_replace_rtx that is
>> > > invalid universally.
>> >
>> > How so?  What specifies the values of the paradoxical subreg for the
>> > bytes not covered by the subreg operand?
>>
>> I don't know why 0 is generated here (and if it is valid) for
>> paradoxical bytes, but 0xcc is not correct, since it sets REG_EQUAL to
>> the wrong constant and triggers unwanted propagation later on.
>
> Quoting what I wrote in the PR below.  I think pragmatically the fix is
> good - we might miss some opportunistic folding this way but we for
> sure may not optimistically register an equality via REG_EQUAL without
> enforcing it (removing the producer and replacing it with the optimistic
> constant).
>
> So consider the patch approved if no other RTL maintainer chimes in
> within 48h.

Sorry, can you hold off a bit longer?  Wanted to have a look but the
deadline is about to expire.

I think at least a comment is needed, since like Richard says,
the tra

[PATCH V2] VECT: Apply COND_LEN_* into vectorizable_operation

2023-07-12 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, Richard and Richi.
As we discussed before, COND_LEN_* patterns were added for multiple situations.
This patch applies COND_LEN_* to the following situation:

Support for the case in "vectorizable_operation" where:
  /* If operating on inactive elements could generate spurious traps,
 we need to restrict the operation to active lanes.  Note that this
 specifically doesn't apply to unhoisted invariants, since they
 operate on the same value for every lane.

 Similarly, if this operation is part of a reduction, a fully-masked
 loop should only change the active lanes of the reduction chain,
 keeping the inactive lanes as-is.  */
  bool mask_out_inactive = ((!is_invariant && gimple_could_trap_p (stmt))
|| reduc_idx >= 0);

That is, the case where mask_out_inactive is true with length loop control.

So, we can now vectorize the following 2 cases:

1. Integer division:

   #define TEST_TYPE(TYPE)  \
   __attribute__((noipa))   \
   void vrem_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
   {\
 for (int i = 0; i < n; i++)\
   dst[i] = a[i] % b[i];\
   }
   #define TEST_ALL()   \
   TEST_TYPE(int8_t)\
   TEST_ALL()

With this patch:
  
  _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
  ivtmp_45 = _61 * 4;
  vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
  vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
  vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
vect__4.8_48, _61, 0);
  .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);

2. Floating-point arithmetic **WITHOUT** -ffast-math
  
   #define TEST_TYPE(TYPE)  \
   __attribute__((noipa))   \
   void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
   {\
 for (int i = 0; i < n; i++)\
   dst[i] = a[i] + b[i];\
   }
   #define TEST_ALL()   \
   TEST_TYPE(float) \
   TEST_ALL()

With this patch:
   
  _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
  ivtmp_45 = _61 * 4;
  vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
  vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
  vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
vect__4.8_48, _61, 0);
  .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);

With this patch, we can make sure operations won't trap for the elements
that "mask_out_inactive" refers to.

gcc/ChangeLog:

* internal-fn.cc (FOR_EACH_CODE_MAPPING): Adapt for COND_LEN_* support.
(CASE): Ditto.
(get_conditional_len_internal_fn): New function.
* internal-fn.h (get_conditional_len_internal_fn): Ditto.
* tree-vect-stmts.cc (vectorizable_operation): Adapt for COND_LEN_* 
support.

---
 gcc/internal-fn.cc | 65 ++
 gcc/internal-fn.h  |  1 +
 gcc/tree-vect-stmts.cc | 48 ---
 3 files changed, 85 insertions(+), 29 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index f9aaf66cf2a..7e3a8cc8412 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4276,23 +4276,24 @@ static void (*const internal_fn_expanders[]) 
(internal_fn, gcall *) = {
   0
 };
 
-/* Invoke T(CODE, IFN) for each conditional function IFN that maps to a
-   tree code CODE.  */
+/* Invoke T(CODE, SUFFIX) for each conditional function IFN_COND_##SUFFIX
+   that maps to a tree code CODE.  There is also an IFN_COND_LEN_##SUFFIX
+   for each such IFN_COND_##SUFFIX.  */
 #define FOR_EACH_CODE_MAPPING(T) \
-  T (PLUS_EXPR, IFN_COND_ADD) \
-  T (MINUS_EXPR, IFN_COND_SUB) \
-  T (MULT_EXPR, IFN_COND_MUL) \
-  T (TRUNC_DIV_EXPR, IFN_COND_DIV) \
-  T (TRUNC_MOD_EXPR, IFN_COND_MOD) \
-  T (RDIV_EXPR, IFN_COND_RDIV) \
-  T (MIN_EXPR, IFN_COND_MIN) \
-  T (MAX_EXPR, IFN_COND_MAX) \
-  T (BIT_AND_EXPR, IFN_COND_AND) \
-  T (BIT_IOR_EXPR, IFN_COND_IOR) \
-  T (BIT_XOR_EXPR, IFN_COND_XOR) \
-  T (LSHIFT_EXPR, IFN_COND_SHL) \
-  T (RSHIFT_EXPR, IFN_COND_SHR) \
-  T (NEGATE_EXPR, IFN_COND_NEG)
+  T (PLUS_EXPR, ADD) \
+  T (MINUS_EXPR, SUB) \
+  T (MULT_EXPR, MUL) \
+  T (TRUNC_DIV_EXPR, DIV) \
+  T (TRUNC_MOD_EXPR, MOD) \
+  T (RDIV_EXPR, RDIV) \
+  T (MIN_EXPR, MIN) \
+  T (MAX_EXPR, MAX) \
+  T (BIT_AND_EXPR, AND) \
+  T (BIT_IOR_EXPR, IOR) \
+  T (BIT_XOR_EXPR, XOR) \
+  T (LSHIFT_EXPR, SHL) \
+  T (RSHIFT_EXPR, SHR) \
+  T (NEGATE_EXPR, NEG)
 
 /* Return a function that only performs CODE when a certain condition is met
and that uses a given fallback value otherwise.  For example, if CODE is
@@ -4313,7 +4314,7 @@ get_conditional_internal_fn (tree_code code)
 {
   switch (c

Re: Re: [PATCH] VECT: Apply COND_LEN_* into vectorizable_operation

2023-07-12 Thread juzhe.zh...@rivai.ai
Thank you so much.
I have addressed all comments with V2 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624237.html 

Could you take a look at it whether it looks reasonable to you?

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-07-12 17:29
To: juzhe.zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH] VECT: Apply COND_LEN_* into vectorizable_operation
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong 
>
> Hi, Richard and Richi.
> As we disscussed before, COND_LEN_* patterns were added for multiple 
> situations.
> This patch apply CON_LEN_* for the following situation:
>
> Support for the situation that in "vectorizable_operation":
>   /* If operating on inactive elements could generate spurious traps,
>  we need to restrict the operation to active lanes.  Note that this
>  specifically doesn't apply to unhoisted invariants, since they
>  operate on the same value for every lane.
>
>  Similarly, if this operation is part of a reduction, a fully-masked
>  loop should only change the active lanes of the reduction chain,
>  keeping the inactive lanes as-is.  */
>   bool mask_out_inactive = ((!is_invariant && gimple_could_trap_p (stmt))
> || reduc_idx >= 0);
>
> For mask_out_inactive is true with length loop control.
>
> So, we can these 2 following cases:
>
> 1. Integer division:
>
>#define TEST_TYPE(TYPE) \
>__attribute__((noipa)) \
>void vrem_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
>{ \
>  for (int i = 0; i < n; i++) \
>dst[i] = a[i] % b[i]; \
>}
>#define TEST_ALL() \
>TEST_TYPE(int8_t) \
>TEST_ALL()
>
> With this patch:
>   
>   _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
>   ivtmp_45 = _61 * 4;
>   vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
>   vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
>   vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
> vect__4.8_48, _61, 0);
>   .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);
>
> 2. Floating-point arithmetic **WITHOUT** -ffast-math
>   
>#define TEST_TYPE(TYPE) \
>__attribute__((noipa)) \
>void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
>{ \
>  for (int i = 0; i < n; i++) \
>dst[i] = a[i] + b[i]; \
>}
>#define TEST_ALL() \
>TEST_TYPE(float) \
>TEST_ALL()
>
> With this patch:
>
>   _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
>   ivtmp_45 = _61 * 4;
>   vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
>   vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
>   vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
> vect__4.8_48, _61, 0);
>   .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);
>
> With this patch, we can make sure operations won't trap for elements that 
> "mask_out_inactive".
>
> gcc/ChangeLog:
>
> * internal-fn.cc (FOR_EACH_CODE_LEN_MAPPING): Add COND_LEN_*.
> (get_conditional_len_internal_fn): New function.
> (CASE): Add COND_LEN_*.
> * internal-fn.h (get_conditional_len_internal_fn): New function.
> * tree-vect-stmts.cc (vectorizable_operation): Apply COND_LEN_* into 
> operation could trap.
>
> ---
>  gcc/internal-fn.cc | 48 +
>  gcc/internal-fn.h  |  1 +
>  gcc/tree-vect-stmts.cc | 60 ++
>  3 files changed, 104 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index f9aaf66cf2a..e46dd57b7f0 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4337,6 +4337,54 @@ conditional_internal_fn_code (internal_fn ifn)
>  }
>  }
>  
> +/* Invoke T(CODE, IFN) for each conditional len function IFN that maps to a
> +   tree code CODE.  */
> +#define FOR_EACH_CODE_LEN_MAPPING(T) 
>   \
> +  T (PLUS_EXPR, IFN_COND_LEN_ADD)
>   \
> +  T (MINUS_EXPR, IFN_COND_LEN_SUB)   
>   \
> +  T (MULT_EXPR, IFN_COND_LEN_MUL)
>   \
> +  T (TRUNC_DIV_EXPR, IFN_COND_LEN_DIV)   
>   \
> +  T (TRUNC_MOD_EXPR, IFN_COND_LEN_MOD)   
>   \
> +  T (RDIV_EXPR, IFN_COND_LEN_RDIV)   
>   \
> +  T (MIN_EXPR, IFN_COND_LEN_MIN) 
>   \
> +  T (MAX_EXPR, IFN_COND_LEN_MAX) 
>   \
> +  T (BIT_AND_EXPR, IFN_COND_LEN_AND) 
>   \
> +  T (BIT_IOR_EXPR, IFN_COND_LEN_IOR) 
>   \
> +  T (BIT_XOR_EXPR, IFN_COND_LEN_XOR) 
>   \
> +  T (LSHIFT_EXPR, IFN_COND_LEN_SHL)  

Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization

2023-07-12 Thread Richard Biener via Gcc-patches
On Wed, 12 Jul 2023, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Wed, 12 Jul 2023, juzhe.zh...@rivai.ai wrote:
> >
> >> Thanks Richard.
> >> 
> >> Is it correct that the better way is to add optabs 
> >> (len_strided_load/len_strided_store),
> >> then expand LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE to 
> >> len_strided_load/len_strided_store optab (if it is strided load/store) in
> >> expand_gather_load_optab_fn 
> >> expand_scatter_store_optab_fn
> >> 
> >> of internal-fn.cc
> >> 
> >> Am I right? Thanks.
> >
> > Yes.
> >
> > In priciple the vectorizer can also directly take advantage of this
> > and code generate an internal .LEN_STRIDED_LOAD ifn.
> 
> Yeah, in particular, having a strided load should relax some
> of the restrictions around the relationship of the vector offset
> type to the loaded/stored data.  E.g. a "gather" of N bytes with a
> 64-bit stride would in principle be possible without needing an
> Nx64-bit vector offset type.

And it can be used to do the VMAT_ELEMENTWISE/VMAT_STRIDED_SLP in
a more efficient way as well.  We never got around using gather/scatter
for these (because in practice those tend to be slower than what we
do now there).

Richard.


Re: [PATCH V2] VECT: Apply COND_LEN_* into vectorizable_operation

2023-07-12 Thread Richard Sandiford via Gcc-patches
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong 
>
> Hi, Richard and Richi.
> As we disscussed before, COND_LEN_* patterns were added for multiple 
> situations.
> This patch apply CON_LEN_* for the following situation:
>
> Support for the situation that in "vectorizable_operation":
>   /* If operating on inactive elements could generate spurious traps,
>  we need to restrict the operation to active lanes.  Note that this
>  specifically doesn't apply to unhoisted invariants, since they
>  operate on the same value for every lane.
>
>  Similarly, if this operation is part of a reduction, a fully-masked
>  loop should only change the active lanes of the reduction chain,
>  keeping the inactive lanes as-is.  */
>   bool mask_out_inactive = ((!is_invariant && gimple_could_trap_p (stmt))
>   || reduc_idx >= 0);
>
> For mask_out_inactive is true with length loop control.
>
> So, we can these 2 following cases:
>
> 1. Integer division:
>
>#define TEST_TYPE(TYPE)\
>__attribute__((noipa)) \
>void vrem_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)  \
>{  \
>  for (int i = 0; i < n; i++)  \
>dst[i] = a[i] % b[i];  \
>}
>#define TEST_ALL() \
>TEST_TYPE(int8_t)  \
>TEST_ALL()
>
> With this patch:
>   
>   _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
>   ivtmp_45 = _61 * 4;
>   vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
>   vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
>   vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
> vect__4.8_48, _61, 0);
>   .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);
>
> 2. Floating-point arithmetic **WITHOUT** -ffast-math
>   
>#define TEST_TYPE(TYPE)\
>__attribute__((noipa)) \
>void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)  \
>{  \
>  for (int i = 0; i < n; i++)  \
>dst[i] = a[i] + b[i];  \
>}
>#define TEST_ALL() \
>TEST_TYPE(float)   \
>TEST_ALL()
>
> With this patch:
>
>   _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
>   ivtmp_45 = _61 * 4;
>   vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
>   vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
>   vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
> vect__4.8_48, _61, 0);
>   .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);
>
> With this patch, we can make sure operations won't trap for elements that 
> "mask_out_inactive".
>
> gcc/ChangeLog:
>
> * internal-fn.cc (FOR_EACH_CODE_MAPPING): Adapt for COND_LEN_* 
> support.
> (CASE): Ditto.
> (get_conditional_len_internal_fn): New function.
> * internal-fn.h (get_conditional_len_internal_fn): Ditto.
> * tree-vect-stmts.cc (vectorizable_operation): Adapt for COND_LEN_* 
> support.
>
> ---
>  gcc/internal-fn.cc | 65 ++
>  gcc/internal-fn.h  |  1 +
>  gcc/tree-vect-stmts.cc | 48 ---
>  3 files changed, 85 insertions(+), 29 deletions(-)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index f9aaf66cf2a..7e3a8cc8412 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4276,23 +4276,24 @@ static void (*const internal_fn_expanders[]) 
> (internal_fn, gcall *) = {
>0
>  };
>  
> -/* Invoke T(CODE, IFN) for each conditional function IFN that maps to a
> -   tree code CODE.  */
> +/* Invoke T(CODE, SUFFIX) for each conditional function IFN_COND_##SUFFIX
> +   that maps to a tree code CODE.  There is also an IFN_COND_LEN_##SUFFIX
> +   for each such IFN_COND_##SUFFIX.  */
>  #define FOR_EACH_CODE_MAPPING(T) \
> -  T (PLUS_EXPR, IFN_COND_ADD) \
> -  T (MINUS_EXPR, IFN_COND_SUB) \
> -  T (MULT_EXPR, IFN_COND_MUL) \
> -  T (TRUNC_DIV_EXPR, IFN_COND_DIV) \
> -  T (TRUNC_MOD_EXPR, IFN_COND_MOD) \
> -  T (RDIV_EXPR, IFN_COND_RDIV) \
> -  T (MIN_EXPR, IFN_COND_MIN) \
> -  T (MAX_EXPR, IFN_COND_MAX) \
> -  T (BIT_AND_EXPR, IFN_COND_AND) \
> -  T (BIT_IOR_EXPR, IFN_COND_IOR) \
> -  T (BIT_XOR_EXPR, IFN_COND_XOR) \
> -  T (LSHIFT_EXPR, IFN_COND_SHL) \
> -  T (RSHIFT_EXPR, IFN_COND_SHR) \
> -  T (NEGATE_EXPR, IFN_COND_NEG)
> +  T (PLUS_EXPR, ADD) \
> +  T (MINUS_EXPR, SUB) \
> +  T (MULT_EXPR, MUL) \
> +  T (TRUNC_DIV_EXPR, DIV) \
> +  T (TRUNC_MOD_EXPR, MOD) \
> +  T (RDIV_EXPR, RDIV) \
> +  T (MIN_EXPR, MIN) \
> +  T (MAX_EXPR, MAX) \
> +  T (BIT_AND_EXPR, AND) \
> +  T (BIT_IOR_EXPR, IOR) \
> +  T (BIT_XOR_EXPR, XOR) \
> +  T (LSHIFT_EXPR, SHL) \
> +  T (RSHIFT_EXPR, SHR) \
> +  T (NEGATE_EXPR, 

Re: Re: [PATCH V2] VECT: Apply COND_LEN_* into vectorizable_operation

2023-07-12 Thread juzhe.zh...@rivai.ai
Thank you so much.
I am gonna wait for Richi's final approval.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-07-12 18:53
To: juzhe.zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH V2] VECT: Apply COND_LEN_* into vectorizable_operation
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong 
>
> Hi, Richard and Richi.
> As we disscussed before, COND_LEN_* patterns were added for multiple 
> situations.
> This patch apply CON_LEN_* for the following situation:
>
> Support for the situation that in "vectorizable_operation":
>   /* If operating on inactive elements could generate spurious traps,
>  we need to restrict the operation to active lanes.  Note that this
>  specifically doesn't apply to unhoisted invariants, since they
>  operate on the same value for every lane.
>
>  Similarly, if this operation is part of a reduction, a fully-masked
>  loop should only change the active lanes of the reduction chain,
>  keeping the inactive lanes as-is.  */
>   bool mask_out_inactive = ((!is_invariant && gimple_could_trap_p (stmt))
> || reduc_idx >= 0);
>
> For mask_out_inactive is true with length loop control.
>
> So, we can these 2 following cases:
>
> 1. Integer division:
>
>#define TEST_TYPE(TYPE) \
>__attribute__((noipa)) \
>void vrem_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
>{ \
>  for (int i = 0; i < n; i++) \
>dst[i] = a[i] % b[i]; \
>}
>#define TEST_ALL() \
>TEST_TYPE(int8_t) \
>TEST_ALL()
>
> With this patch:
>   
>   _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
>   ivtmp_45 = _61 * 4;
>   vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
>   vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
>   vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
> vect__4.8_48, _61, 0);
>   .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);
>
> 2. Floating-point arithmetic **WITHOUT** -ffast-math
>   
>#define TEST_TYPE(TYPE) \
>__attribute__((noipa)) \
>void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
>{ \
>  for (int i = 0; i < n; i++) \
>dst[i] = a[i] + b[i]; \
>}
>#define TEST_ALL() \
>TEST_TYPE(float) \
>TEST_ALL()
>
> With this patch:
>
>   _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
>   ivtmp_45 = _61 * 4;
>   vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
>   vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
>   vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
> vect__4.8_48, _61, 0);
>   .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);
>
> With this patch, we can make sure operations won't trap for elements that 
> "mask_out_inactive".
>
> gcc/ChangeLog:
>
> * internal-fn.cc (FOR_EACH_CODE_MAPPING): Adapt for COND_LEN_* 
> support.
> (CASE): Ditto.
> (get_conditional_len_internal_fn): New function.
> * internal-fn.h (get_conditional_len_internal_fn): Ditto.
> * tree-vect-stmts.cc (vectorizable_operation): Adapt for COND_LEN_* 
> support.
>
> ---
>  gcc/internal-fn.cc | 65 ++
>  gcc/internal-fn.h  |  1 +
>  gcc/tree-vect-stmts.cc | 48 ---
>  3 files changed, 85 insertions(+), 29 deletions(-)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index f9aaf66cf2a..7e3a8cc8412 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4276,23 +4276,24 @@ static void (*const internal_fn_expanders[]) 
> (internal_fn, gcall *) = {
>0
>  };
>  
> -/* Invoke T(CODE, IFN) for each conditional function IFN that maps to a
> -   tree code CODE.  */
> +/* Invoke T(CODE, SUFFIX) for each conditional function IFN_COND_##SUFFIX
> +   that maps to a tree code CODE.  There is also an IFN_COND_LEN_##SUFFIX
> +   for each such IFN_COND_##SUFFIX.  */
>  #define FOR_EACH_CODE_MAPPING(T) \
> -  T (PLUS_EXPR, IFN_COND_ADD) \
> -  T (MINUS_EXPR, IFN_COND_SUB) \
> -  T (MULT_EXPR, IFN_COND_MUL) \
> -  T (TRUNC_DIV_EXPR, IFN_COND_DIV) \
> -  T (TRUNC_MOD_EXPR, IFN_COND_MOD) \
> -  T (RDIV_EXPR, IFN_COND_RDIV) \
> -  T (MIN_EXPR, IFN_COND_MIN) \
> -  T (MAX_EXPR, IFN_COND_MAX) \
> -  T (BIT_AND_EXPR, IFN_COND_AND) \
> -  T (BIT_IOR_EXPR, IFN_COND_IOR) \
> -  T (BIT_XOR_EXPR, IFN_COND_XOR) \
> -  T (LSHIFT_EXPR, IFN_COND_SHL) \
> -  T (RSHIFT_EXPR, IFN_COND_SHR) \
> -  T (NEGATE_EXPR, IFN_COND_NEG)
> +  T (PLUS_EXPR, ADD) \
> +  T (MINUS_EXPR, SUB) \
> +  T (MULT_EXPR, MUL) \
> +  T (TRUNC_DIV_EXPR, DIV) \
> +  T (TRUNC_MOD_EXPR, MOD) \
> +  T (RDIV_EXPR, RDIV) \
> +  T (MIN_EXPR, MIN) \
> +  T (MAX_EXPR, MAX) \
> +  T (BIT_AND_EXPR, AND) \
> +  T (BIT_IOR_EXPR, IOR) \
> +  T (BIT_XOR_EXPR, XOR) \
> +  T (LSHIFT_EXPR, SHL) \
> +  T (RSHIFT_EXPR, SHR) \
> +  T (NEGATE_EXPR, NEG)
>  
>  /* Return a function that only performs CODE when a certain condition is met

Re: [PATCH] simplify-rtx: Fix invalid simplification with paradoxical subregs [PR110206]

2023-07-12 Thread Uros Bizjak via Gcc-patches
On Wed, Jul 12, 2023 at 12:23 PM Richard Sandiford
 wrote:
>
> Richard Biener via Gcc-patches  writes:
> > On Mon, Jul 10, 2023 at 1:01 PM Uros Bizjak  wrote:
> >>
> >> On Mon, Jul 10, 2023 at 11:47 AM Richard Biener
> >>  wrote:
> >> >
> >> > On Mon, Jul 10, 2023 at 11:26 AM Uros Bizjak  wrote:
> >> > >
> >> > > On Mon, Jul 10, 2023 at 11:17 AM Richard Biener
> >> > >  wrote:
> >> > > >
> >> > > > On Sun, Jul 9, 2023 at 10:53 AM Uros Bizjak via Gcc-patches
> >> > > >  wrote:
> >> > > > >
> >> > > > > As shown in the PR, simplify_gen_subreg call in 
> >> > > > > simplify_replace_fn_rtx:
> >> > > > >
> >> > > > > (gdb) list
> >> > > > > 469   if (code == SUBREG)
> >> > > > > 470 {
> >> > > > > 471   op0 = simplify_replace_fn_rtx (SUBREG_REG (x),
> >> > > > > old_rtx, fn, data);
> >> > > > > 472   if (op0 == SUBREG_REG (x))
> >> > > > > 473 return x;
> >> > > > > 474   op0 = simplify_gen_subreg (GET_MODE (x), op0,
> >> > > > > 475  GET_MODE (SUBREG_REG 
> >> > > > > (x)),
> >> > > > > 476  SUBREG_BYTE (x));
> >> > > > > 477   return op0 ? op0 : x;
> >> > > > > 478 }
> >> > > > >
> >> > > > > simplifies with following arguments:
> >> > > > >
> >> > > > > (gdb) p debug_rtx (op0)
> >> > > > > (const_vector:V4QI [
> >> > > > > (const_int -52 [0xffcc]) repeated x4
> >> > > > > ])
> >> > > > > (gdb) p debug_rtx (x)
> >> > > > > (subreg:V16QI (reg:V4QI 98) 0)
> >> > > > >
> >> > > > > to:
> >> > > > >
> >> > > > > (gdb) p debug_rtx (op0)
> >> > > > > (const_vector:V16QI [
> >> > > > > (const_int -52 [0xffcc]) repeated x16
> >> > > > > ])
> >> > > > >
> >> > > > > This simplification is invalid, it is not possible to get 
> >> > > > > V16QImode vector
> >> > > > > from V4QImode vector, even when all elements are duplicates.
> >> >
> >> > ^^^
> >> >
> >> > I think this simplification is valid.  A simplification to
> >> >
> >> > (const_vector:V16QI [
> >> >  (const_int -52 [0xffcc]) repeated x4
> >> >  (const_int 0 [0]) repeated x12
> >> >  ])
> >> >
> >> > would be valid as well.
> >> >
> >> > > > > The simplification happens in simplify_context::simplify_subreg:
> >> > > > >
> >> > > > > (gdb) list
> >> > > > > 7558  if (VECTOR_MODE_P (outermode)
> >> > > > > 7559  && GET_MODE_INNER (outermode) == GET_MODE_INNER 
> >> > > > > (innermode)
> >> > > > > 7560  && vec_duplicate_p (op, &elt))
> >> > > > > 7561return gen_vec_duplicate (outermode, elt);
> >> > > > >
> >> > > > > but the above simplification is valid only for non-paradoxical 
> >> > > > > registers,
> >> > > > > where outermode <= innermode.  We should not assume that elements 
> >> > > > > outside
> >> > > > > the original register are valid, let alone all duplicates.
> >> > > >
> >> > > > Hmm, but looking at the audit trail the x86 backend expects them to 
> >> > > > be zero?
> >> > > > Isn't that wrong as well?
> >> > >
> >> > > If you mean Comment #10, it is just an observation that
> >> > > simplify_replace_rtx simplifies arguments from Comment #9 to:
> >> > >
> >> > > (gdb) p debug_rtx (src)
> >> > > (const_vector:V8HI [
> >> > > (const_int 204 [0xcc]) repeated x4
> >> > > (const_int 0 [0]) repeated x4
> >> > > ])
> >> > >
> >> > > instead of:
> >> > >
> >> > > (gdb) p debug_rtx (src)
> >> > > (const_vector:V8HI [
> >> > > (const_int 204 [0xcc]) repeated x8
> >> > > ])
> >> > >
> >> > > which is in line with the statement below.
> >> > > >
> >> > > > That is, I think putting any random value into the upper lanes when
> >> > > > constant folding
> >> > > > a paradoxical subreg sounds OK to me, no?
> >> > >
> >> > > The compiler is putting zero there as can be seen from the above new 
> >> > > RTX.
> >> > >
> >> > > > Of course we might choose to not do such constant propagation for
> >> > > > efficiency reason - at least
> >> > > > when the resulting CONST_* would require a larger constant pool entry
> >> > > > or more costly
> >> > > > construction.
> >> > >
> >> > > This is probably a follow-up improvement, where this patch tries to
> >> > > fix a specific invalid simplification of simplify_replace_rtx that is
> >> > > invalid universally.
> >> >
> >> > How so?  What specifies the values of the paradoxical subreg for the
> >> > bytes not covered by the subreg operand?
> >>
> >> I don't know why 0 is generated here (and if it is valid) for
> >> paradoxical bytes, but 0xcc is not correct, since it sets REG_EQUAL to
> >> the wrong constant and triggers unwanted propagation later on.
> >
> > Quoting what I wrote in the PR below.  I think pragmatically the fix is
> > good - we might miss some opportunistic folding this way but we for
> > sure may not optimistically register an equality via REG_EQUAL without
> > enforcing it (remov

Re: Re: [PATCH V2] VECT: Apply COND_LEN_* into vectorizable_operation

2023-07-12 Thread Richard Biener via Gcc-patches
On Wed, 12 Jul 2023, juzhe.zh...@rivai.ai wrote:

> Thank you so much.
> I am gonna wait for Richi's final approval.

It's good enough when either of us approves unless we explicitly ask
to wait for somebody else.

LGTM anyway.

Richard.

> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Sandiford
> Date: 2023-07-12 18:53
> To: juzhe.zhong
> CC: gcc-patches; rguenther
> Subject: Re: [PATCH V2] VECT: Apply COND_LEN_* into vectorizable_operation
> juzhe.zh...@rivai.ai writes:
> > From: Ju-Zhe Zhong 
> >
> > Hi, Richard and Richi.
> > As we disscussed before, COND_LEN_* patterns were added for multiple 
> > situations.
> > This patch apply CON_LEN_* for the following situation:
> >
> > Support for the situation that in "vectorizable_operation":
> >   /* If operating on inactive elements could generate spurious traps,
> >  we need to restrict the operation to active lanes.  Note that this
> >  specifically doesn't apply to unhoisted invariants, since they
> >  operate on the same value for every lane.
> >
> >  Similarly, if this operation is part of a reduction, a fully-masked
> >  loop should only change the active lanes of the reduction chain,
> >  keeping the inactive lanes as-is.  */
> >   bool mask_out_inactive = ((!is_invariant && gimple_could_trap_p (stmt))
> > || reduc_idx >= 0);
> >
> > For mask_out_inactive is true with length loop control.
> >
> > So, we can these 2 following cases:
> >
> > 1. Integer division:
> >
> >#define TEST_TYPE(TYPE) \
> >__attribute__((noipa)) \
> >void vrem_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
> >{ \
> >  for (int i = 0; i < n; i++) \
> >dst[i] = a[i] % b[i]; \
> >}
> >#define TEST_ALL() \
> >TEST_TYPE(int8_t) \
> >TEST_ALL()
> >
> > With this patch:
> >   
> >   _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
> >   ivtmp_45 = _61 * 4;
> >   vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
> >   vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
> >   vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
> > vect__4.8_48, _61, 0);
> >   .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, 
> > vect__8.12_53);
> >
> > 2. Floating-point arithmetic **WITHOUT** -ffast-math
> >   
> >#define TEST_TYPE(TYPE) \
> >__attribute__((noipa)) \
> >void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
> >{ \
> >  for (int i = 0; i < n; i++) \
> >dst[i] = a[i] + b[i]; \
> >}
> >#define TEST_ALL() \
> >TEST_TYPE(float) \
> >TEST_ALL()
> >
> > With this patch:
> >
> >   _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
> >   ivtmp_45 = _61 * 4;
> >   vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
> >   vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
> >   vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
> > vect__4.8_48, _61, 0);
> >   .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, 
> > vect__8.12_53);
> >
> > With this patch, we can make sure operations won't trap for elements that 
> > "mask_out_inactive".
> >
> > gcc/ChangeLog:
> >
> > * internal-fn.cc (FOR_EACH_CODE_MAPPING): Adapt for COND_LEN_* 
> > support.
> > (CASE): Ditto.
> > (get_conditional_len_internal_fn): New function.
> > * internal-fn.h (get_conditional_len_internal_fn): Ditto.
> > * tree-vect-stmts.cc (vectorizable_operation): Adapt for COND_LEN_* 
> > support.
> >
> > ---
> >  gcc/internal-fn.cc | 65 ++
> >  gcc/internal-fn.h  |  1 +
> >  gcc/tree-vect-stmts.cc | 48 ---
> >  3 files changed, 85 insertions(+), 29 deletions(-)
> >
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index f9aaf66cf2a..7e3a8cc8412 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -4276,23 +4276,24 @@ static void (*const internal_fn_expanders[]) 
> > (internal_fn, gcall *) = {
> >0
> >  };
> >  
> > -/* Invoke T(CODE, IFN) for each conditional function IFN that maps to a
> > -   tree code CODE.  */
> > +/* Invoke T(CODE, SUFFIX) for each conditional function IFN_COND_##SUFFIX
> > +   that maps to a tree code CODE.  There is also an IFN_COND_LEN_##SUFFIX
> > +   for each such IFN_COND_##SUFFIX.  */
> >  #define FOR_EACH_CODE_MAPPING(T) \
> > -  T (PLUS_EXPR, IFN_COND_ADD) \
> > -  T (MINUS_EXPR, IFN_COND_SUB) \
> > -  T (MULT_EXPR, IFN_COND_MUL) \
> > -  T (TRUNC_DIV_EXPR, IFN_COND_DIV) \
> > -  T (TRUNC_MOD_EXPR, IFN_COND_MOD) \
> > -  T (RDIV_EXPR, IFN_COND_RDIV) \
> > -  T (MIN_EXPR, IFN_COND_MIN) \
> > -  T (MAX_EXPR, IFN_COND_MAX) \
> > -  T (BIT_AND_EXPR, IFN_COND_AND) \
> > -  T (BIT_IOR_EXPR, IFN_COND_IOR) \
> > -  T (BIT_XOR_EXPR, IFN_COND_XOR) \
> > -  T (LSHIFT_EXPR, IFN_COND_SHL) \
> > -  T (RSHIFT_EXPR, IFN_COND_SHR) \
> > -  T (NEGATE_EXPR, IFN_COND_NEG)
> > +  T (PLUS_EXPR, ADD) \
>

Re: Re: [PATCH V2] VECT: Apply COND_LEN_* into vectorizable_operation

2023-07-12 Thread juzhe.zh...@rivai.ai
Hi, Richard.
I just noticed:
+ The values of the remaining elements are undefined.

This is not true for RVV. Actually we want them to be the old values; that's
why we have an additional pass-through operand.

For a reduction,

for (int i = ...)
   res += a[i]

the length generated by SELECT_VL at the last iteration is
probably a partial vector, so the elements >= length should
keep their original values.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-07-12 18:53
To: juzhe.zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH V2] VECT: Apply COND_LEN_* into vectorizable_operation
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong 
>
> Hi, Richard and Richi.
> As we discussed before, COND_LEN_* patterns were added for multiple
> situations.
> This patch applies COND_LEN_* in the following situation:
>
> Support for the situation that in "vectorizable_operation":
>   /* If operating on inactive elements could generate spurious traps,
>  we need to restrict the operation to active lanes.  Note that this
>  specifically doesn't apply to unhoisted invariants, since they
>  operate on the same value for every lane.
>
>  Similarly, if this operation is part of a reduction, a fully-masked
>  loop should only change the active lanes of the reduction chain,
>  keeping the inactive lanes as-is.  */
>   bool mask_out_inactive = ((!is_invariant && gimple_could_trap_p (stmt))
> || reduc_idx >= 0);
>
> For mask_out_inactive is true with length loop control.
>
> So, we can support the following 2 cases:
>
> 1. Integer division:
>
>#define TEST_TYPE(TYPE) \
>__attribute__((noipa)) \
>void vrem_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
>{ \
>  for (int i = 0; i < n; i++) \
>dst[i] = a[i] % b[i]; \
>}
>#define TEST_ALL() \
>TEST_TYPE(int8_t) \
>TEST_ALL()
>
> With this patch:
>   
>   _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
>   ivtmp_45 = _61 * 4;
>   vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
>   vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
>   vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
> vect__4.8_48, _61, 0);
>   .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);
>
> 2. Floating-point arithmetic **WITHOUT** -ffast-math
>   
>#define TEST_TYPE(TYPE) \
>__attribute__((noipa)) \
>void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
>{ \
>  for (int i = 0; i < n; i++) \
>dst[i] = a[i] + b[i]; \
>}
>#define TEST_ALL() \
>TEST_TYPE(float) \
>TEST_ALL()
>
> With this patch:
>
>   _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
>   ivtmp_45 = _61 * 4;
>   vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
>   vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
>   vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
> vect__4.8_48, _61, 0);
>   .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);
>
> With this patch, we can make sure operations won't trap on inactive elements
> when "mask_out_inactive" is true.
>
> gcc/ChangeLog:
>
> * internal-fn.cc (FOR_EACH_CODE_MAPPING): Adapt for COND_LEN_* 
> support.
> (CASE): Ditto.
> (get_conditional_len_internal_fn): New function.
> * internal-fn.h (get_conditional_len_internal_fn): Ditto.
> * tree-vect-stmts.cc (vectorizable_operation): Adapt for COND_LEN_* 
> support.
>
> ---
>  gcc/internal-fn.cc | 65 ++
>  gcc/internal-fn.h  |  1 +
>  gcc/tree-vect-stmts.cc | 48 ---
>  3 files changed, 85 insertions(+), 29 deletions(-)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index f9aaf66cf2a..7e3a8cc8412 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4276,23 +4276,24 @@ static void (*const internal_fn_expanders[]) 
> (internal_fn, gcall *) = {
>0
>  };
>  
> -/* Invoke T(CODE, IFN) for each conditional function IFN that maps to a
> -   tree code CODE.  */
> +/* Invoke T(CODE, SUFFIX) for each conditional function IFN_COND_##SUFFIX
> +   that maps to a tree code CODE.  There is also an IFN_COND_LEN_##SUFFIX
> +   for each such IFN_COND_##SUFFIX.  */
>  #define FOR_EACH_CODE_MAPPING(T) \
> -  T (PLUS_EXPR, IFN_COND_ADD) \
> -  T (MINUS_EXPR, IFN_COND_SUB) \
> -  T (MULT_EXPR, IFN_COND_MUL) \
> -  T (TRUNC_DIV_EXPR, IFN_COND_DIV) \
> -  T (TRUNC_MOD_EXPR, IFN_COND_MOD) \
> -  T (RDIV_EXPR, IFN_COND_RDIV) \
> -  T (MIN_EXPR, IFN_COND_MIN) \
> -  T (MAX_EXPR, IFN_COND_MAX) \
> -  T (BIT_AND_EXPR, IFN_COND_AND) \
> -  T (BIT_IOR_EXPR, IFN_COND_IOR) \
> -  T (BIT_XOR_EXPR, IFN_COND_XOR) \
> -  T (LSHIFT_EXPR, IFN_COND_SHL) \
> -  T (RSHIFT_EXPR, IFN_COND_SHR) \
> -  T (NEGATE_EXPR, IFN_COND_NEG)
> +  T (PLUS_EXPR, ADD) \
> +  T (MINUS_EXPR, SUB) \
> +  T (MULT_EXPR, MUL) \
> +  T (TRUNC_DIV_EXPR, DIV) \
> +  T (TRUNC

[PATCH] tree-optimization/110630 - enhance SLP permute support

2023-07-12 Thread Richard Biener via Gcc-patches
The following enhances the existing lowpart extraction support for
SLP VEC_PERM nodes to cover all vector aligned extractions.  This
allows the existing bb-slp-pr95839.c testcase to be vectorized
with mips -mpaired-single and the new bb-slp-pr95839-3.c testcase
with SSE2.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/110630
* tree-vect-slp.cc (vect_add_slp_permutation): New
offset parameter, honor that for the extract code generation.
(vectorizable_slp_permutation_1): Handle offsetted identities.

* gcc.dg/vect/bb-slp-pr95839.c: Make stricter.
* gcc.dg/vect/bb-slp-pr95839-3.c: New variant testcase.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-3.c | 15 +++
 gcc/testsuite/gcc.dg/vect/bb-slp-pr95839.c   |  1 +
 gcc/tree-vect-slp.cc | 14 +-
 3 files changed, 25 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-3.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-3.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-3.c
new file mode 100644
index 000..aaee8febf37
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_float } */
+/* { dg-additional-options "-w -Wno-psabi" } */
+
+typedef float __attribute__((vector_size(32))) v8f32;
+
+v8f32 f(v8f32 a, v8f32 b)
+{
+  /* Check that we vectorize this CTOR without any loads.  */
+  return (v8f32){a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3],
+a[4] + b[4], a[5] + b[5], a[6] + b[6], a[7] + b[7]};
+}
+
+/* { dg-final { scan-tree-dump-not "from scalars" "slp2" } } */
+/* { dg-final { scan-tree-dump "optimized: basic block" "slp2" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839.c
index 931fd46..d87bbf125c0 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839.c
@@ -10,4 +10,5 @@ v4f32 f(v4f32 a, v4f32 b)
   return (v4f32){a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]};
 }
 
+/* { dg-final { scan-tree-dump-not "from scalars" "slp2" } } */
 /* { dg-final { scan-tree-dump "optimized: basic block" "slp2" } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 355d078d66e..693621ca990 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -8432,7 +8432,7 @@ vect_transform_slp_perm_load (vec_info *vinfo,
 static void
 vect_add_slp_permutation (vec_info *vinfo, gimple_stmt_iterator *gsi,
  slp_tree node, tree first_def, tree second_def,
- tree mask_vec)
+ tree mask_vec, poly_uint64 identity_offset)
 {
   tree vectype = SLP_TREE_VECTYPE (node);
 
@@ -8470,14 +8470,17 @@ vect_add_slp_permutation (vec_info *vinfo, 
gimple_stmt_iterator *gsi,
   else if (!types_compatible_p (TREE_TYPE (first_def), vectype))
 {
   /* For identity permutes we still need to handle the case
-of lowpart extracts or concats.  */
+of offsetted extracts or concats.  */
   unsigned HOST_WIDE_INT c;
   auto first_def_nunits
= TYPE_VECTOR_SUBPARTS (TREE_TYPE (first_def));
   if (known_le (TYPE_VECTOR_SUBPARTS (vectype), first_def_nunits))
{
+ unsigned HOST_WIDE_INT elsz
+   = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (TREE_TYPE (first_def;
  tree lowpart = build3 (BIT_FIELD_REF, vectype, first_def,
-TYPE_SIZE (vectype), bitsize_zero_node);
+TYPE_SIZE (vectype),
+bitsize_int (identity_offset * elsz));
  perm_stmt = gimple_build_assign (perm_dest, lowpart);
}
   else if (constant_multiple_p (TYPE_VECTOR_SUBPARTS (vectype),
@@ -8709,7 +8712,8 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, 
gimple_stmt_iterator *gsi,
{
  indices.new_vector (mask, second_vec.first == -1U ? 1 : 2,
  TYPE_VECTOR_SUBPARTS (op_vectype));
- bool identity_p = indices.series_p (0, 1, 0, 1);
+ bool identity_p = (indices.series_p (0, 1, mask[0], 1)
+&& constant_multiple_p (mask[0], nunits));
  machine_mode vmode = TYPE_MODE (vectype);
  machine_mode op_vmode = TYPE_MODE (op_vectype);
  unsigned HOST_WIDE_INT c;
@@ -8762,7 +8766,7 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, 
gimple_stmt_iterator *gsi,
= vect_get_slp_vect_def (second_node,
 second_vec.second + vi);
  vect_add_slp_permutation (vinfo, gsi, node, first_def,
-   second_def, mask_vec);
+   second_def, mask_vec, mask[0]);
}
}
 
-- 
2.35.3


Re: Re: [PATCH V2] VECT: Apply COND_LEN_* into vectorizable_operation

2023-07-12 Thread juzhe.zh...@rivai.ai
Oh, sorry for the incorrect pseudocode that I just realized.

> + for (int i = 0; i < LEN + BIAS; i++)
> +   LHS[i] = COND[i] ? A[i] CODE B[i] : ELSE[i];


I think it should be:

 for (int i = 0; i < NUNITS; i++) {
   if (cond[i] && i < LEN + BIAS)
     LHS[i] = A[i] CODE B[i];
   else
     LHS[i] = ELSE[i];
 }

Does it look reasonable to you?

Thanks.



juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-07-12 18:53
To: juzhe.zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH V2] VECT: Apply COND_LEN_* into vectorizable_operation
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong 
>
> Hi, Richard and Richi.
> As we discussed before, COND_LEN_* patterns were added for multiple
> situations.
> This patch applies COND_LEN_* in the following situation:
>
> Support for the situation that in "vectorizable_operation":
>   /* If operating on inactive elements could generate spurious traps,
>  we need to restrict the operation to active lanes.  Note that this
>  specifically doesn't apply to unhoisted invariants, since they
>  operate on the same value for every lane.
>
>  Similarly, if this operation is part of a reduction, a fully-masked
>  loop should only change the active lanes of the reduction chain,
>  keeping the inactive lanes as-is.  */
>   bool mask_out_inactive = ((!is_invariant && gimple_could_trap_p (stmt))
> || reduc_idx >= 0);
>
> For mask_out_inactive is true with length loop control.
>
> So, we can support the following 2 cases:
>
> 1. Integer division:
>
>#define TEST_TYPE(TYPE) \
>__attribute__((noipa)) \
>void vrem_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
>{ \
>  for (int i = 0; i < n; i++) \
>dst[i] = a[i] % b[i]; \
>}
>#define TEST_ALL() \
>TEST_TYPE(int8_t) \
>TEST_ALL()
>
> With this patch:
>   
>   _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
>   ivtmp_45 = _61 * 4;
>   vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
>   vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
>   vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
> vect__4.8_48, _61, 0);
>   .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);
>
> 2. Floating-point arithmetic **WITHOUT** -ffast-math
>   
>#define TEST_TYPE(TYPE) \
>__attribute__((noipa)) \
>void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
>{ \
>  for (int i = 0; i < n; i++) \
>dst[i] = a[i] + b[i]; \
>}
>#define TEST_ALL() \
>TEST_TYPE(float) \
>TEST_ALL()
>
> With this patch:
>
>   _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
>   ivtmp_45 = _61 * 4;
>   vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
>   vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
>   vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
> vect__4.8_48, _61, 0);
>   .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);
>
> With this patch, we can make sure operations won't trap on inactive elements
> when "mask_out_inactive" is true.
>
> gcc/ChangeLog:
>
> * internal-fn.cc (FOR_EACH_CODE_MAPPING): Adapt for COND_LEN_* 
> support.
> (CASE): Ditto.
> (get_conditional_len_internal_fn): New function.
> * internal-fn.h (get_conditional_len_internal_fn): Ditto.
> * tree-vect-stmts.cc (vectorizable_operation): Adapt for COND_LEN_* 
> support.
>
> ---
>  gcc/internal-fn.cc | 65 ++
>  gcc/internal-fn.h  |  1 +
>  gcc/tree-vect-stmts.cc | 48 ---
>  3 files changed, 85 insertions(+), 29 deletions(-)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index f9aaf66cf2a..7e3a8cc8412 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4276,23 +4276,24 @@ static void (*const internal_fn_expanders[]) 
> (internal_fn, gcall *) = {
>0
>  };
>  
> -/* Invoke T(CODE, IFN) for each conditional function IFN that maps to a
> -   tree code CODE.  */
> +/* Invoke T(CODE, SUFFIX) for each conditional function IFN_COND_##SUFFIX
> +   that maps to a tree code CODE.  There is also an IFN_COND_LEN_##SUFFIX
> +   for each such IFN_COND_##SUFFIX.  */
>  #define FOR_EACH_CODE_MAPPING(T) \
> -  T (PLUS_EXPR, IFN_COND_ADD) \
> -  T (MINUS_EXPR, IFN_COND_SUB) \
> -  T (MULT_EXPR, IFN_COND_MUL) \
> -  T (TRUNC_DIV_EXPR, IFN_COND_DIV) \
> -  T (TRUNC_MOD_EXPR, IFN_COND_MOD) \
> -  T (RDIV_EXPR, IFN_COND_RDIV) \
> -  T (MIN_EXPR, IFN_COND_MIN) \
> -  T (MAX_EXPR, IFN_COND_MAX) \
> -  T (BIT_AND_EXPR, IFN_COND_AND) \
> -  T (BIT_IOR_EXPR, IFN_COND_IOR) \
> -  T (BIT_XOR_EXPR, IFN_COND_XOR) \
> -  T (LSHIFT_EXPR, IFN_COND_SHL) \
> -  T (RSHIFT_EXPR, IFN_COND_SHR) \
> -  T (NEGATE_EXPR, IFN_COND_NEG)
> +  T (PLUS_EXPR, ADD) \
> +  T (MINUS_EXPR, SUB) \
> +  T (MULT_EXPR, MUL) \
> +  T (TRUNC_DIV_EXPR, DIV) \
> +  T (TRUNC_MOD_EXPR, MOD) \
> +  T (RDIV_EXPR, RDIV) \
> +  T (MIN_EX

Re: [PATCH] simplify-rtx: Fix invalid simplification with paradoxical subregs [PR110206]

2023-07-12 Thread Uros Bizjak via Gcc-patches
On Wed, Jul 12, 2023 at 12:58 PM Uros Bizjak  wrote:
>
> On Wed, Jul 12, 2023 at 12:23 PM Richard Sandiford
>  wrote:
> >
> > Richard Biener via Gcc-patches  writes:
> > > On Mon, Jul 10, 2023 at 1:01 PM Uros Bizjak  wrote:
> > >>
> > >> On Mon, Jul 10, 2023 at 11:47 AM Richard Biener
> > >>  wrote:
> > >> >
> > >> > On Mon, Jul 10, 2023 at 11:26 AM Uros Bizjak  wrote:
> > >> > >
> > >> > > On Mon, Jul 10, 2023 at 11:17 AM Richard Biener
> > >> > >  wrote:
> > >> > > >
> > >> > > > On Sun, Jul 9, 2023 at 10:53 AM Uros Bizjak via Gcc-patches
> > >> > > >  wrote:
> > >> > > > >
> > >> > > > > As shown in the PR, simplify_gen_subreg call in 
> > >> > > > > simplify_replace_fn_rtx:
> > >> > > > >
> > >> > > > > (gdb) list
> > >> > > > > 469   if (code == SUBREG)
> > >> > > > > 470 {
> > >> > > > > 471   op0 = simplify_replace_fn_rtx (SUBREG_REG (x),
> > >> > > > > old_rtx, fn, data);
> > >> > > > > 472   if (op0 == SUBREG_REG (x))
> > >> > > > > 473 return x;
> > >> > > > > 474   op0 = simplify_gen_subreg (GET_MODE (x), op0,
> > >> > > > > 475  GET_MODE 
> > >> > > > > (SUBREG_REG (x)),
> > >> > > > > 476  SUBREG_BYTE (x));
> > >> > > > > 477   return op0 ? op0 : x;
> > >> > > > > 478 }
> > >> > > > >
> > >> > > > > simplifies with following arguments:
> > >> > > > >
> > >> > > > > (gdb) p debug_rtx (op0)
> > >> > > > > (const_vector:V4QI [
> > >> > > > > (const_int -52 [0xffcc]) repeated x4
> > >> > > > > ])
> > >> > > > > (gdb) p debug_rtx (x)
> > >> > > > > (subreg:V16QI (reg:V4QI 98) 0)
> > >> > > > >
> > >> > > > > to:
> > >> > > > >
> > >> > > > > (gdb) p debug_rtx (op0)
> > >> > > > > (const_vector:V16QI [
> > >> > > > > (const_int -52 [0xffcc]) repeated x16
> > >> > > > > ])
> > >> > > > >
> > >> > > > > This simplification is invalid, it is not possible to get 
> > >> > > > > V16QImode vector
> > >> > > > > from V4QImode vector, even when all elements are duplicates.
> > >> >
> > >> > ^^^
> > >> >
> > >> > I think this simplification is valid.  A simplification to
> > >> >
> > >> > (const_vector:V16QI [
> > >> >  (const_int -52 [0xffcc]) repeated x4
> > >> >  (const_int 0 [0]) repeated x12
> > >> >  ])
> > >> >
> > >> > would be valid as well.
> > >> >
> > >> > > > > The simplification happens in simplify_context::simplify_subreg:
> > >> > > > >
> > >> > > > > (gdb) list
> > >> > > > > 7558  if (VECTOR_MODE_P (outermode)
> > >> > > > > 7559  && GET_MODE_INNER (outermode) == 
> > >> > > > > GET_MODE_INNER (innermode)
> > >> > > > > 7560  && vec_duplicate_p (op, &elt))
> > >> > > > > 7561return gen_vec_duplicate (outermode, elt);
> > >> > > > >
> > >> > > > > but the above simplification is valid only for non-paradoxical 
> > >> > > > > registers,
> > >> > > > > where outermode <= innermode.  We should not assume that 
> > >> > > > > elements outside
> > >> > > > > the original register are valid, let alone all duplicates.
> > >> > > >
> > >> > > > Hmm, but looking at the audit trail the x86 backend expects them 
> > >> > > > to be zero?
> > >> > > > Isn't that wrong as well?
> > >> > >
> > >> > > If you mean Comment #10, it is just an observation that
> > >> > > simplify_replace_rtx simplifies arguments from Comment #9 to:
> > >> > >
> > >> > > (gdb) p debug_rtx (src)
> > >> > > (const_vector:V8HI [
> > >> > > (const_int 204 [0xcc]) repeated x4
> > >> > > (const_int 0 [0]) repeated x4
> > >> > > ])
> > >> > >
> > >> > > instead of:
> > >> > >
> > >> > > (gdb) p debug_rtx (src)
> > >> > > (const_vector:V8HI [
> > >> > > (const_int 204 [0xcc]) repeated x8
> > >> > > ])
> > >> > >
> > >> > > which is in line with the statement below.
> > >> > > >
> > >> > > > That is, I think putting any random value into the upper lanes when
> > >> > > > constant folding
> > >> > > > a paradoxical subreg sounds OK to me, no?
> > >> > >
> > >> > > The compiler is putting zero there as can be seen from the above new 
> > >> > > RTX.
> > >> > >
> > >> > > > Of course we might choose to not do such constant propagation for
> > >> > > > efficiency reason - at least
> > >> > > > when the resulting CONST_* would require a larger constant pool 
> > >> > > > entry
> > >> > > > or more costly
> > >> > > > construction.
> > >> > >
> > >> > > This is probably a follow-up improvement, where this patch tries to
> > >> > > fix a specific invalid simplification of simplify_replace_rtx that is
> > >> > > invalid universally.
> > >> >
> > >> > How so?  What specifies the values of the paradoxical subreg for the
> > >> > bytes not covered by the subreg operand?
> > >>
> > >> I don't know why 0 is generated here (and if it is valid) for
> > >> paradoxical bytes, but 0xcc is not correct, since it sets REG_EQUAL to
>

[IRA] Skip empty register classes in setup_reg_class_relations

2023-07-12 Thread SenthilKumar.Selvaraj--- via Gcc-patches
Hi,

  I've been spending some (spare) time trying to get LRA working
  for the avr target. After making a couple of changes to get
  libgcc going, I'm now hitting an assert at
  lra-constraints.cc:4423 for a subarch (avrtiny) that has a
  couple of regclasses with no available registers.

  The assert fires because in_class_p (correctly) returns
  false for get_reg_class (regno) = ALL_REGS, and new_class =
  NO_LD_REGS. For avrtiny, NO_LD_REGS is an empty regset, and
  therefore hard_reg_set_subset_p (NO_LD_REGS, lra_no_alloc_regs)
  is always true, making in_class_p return false.

  in_class_p picks NO_LD_REGS as new_class because common_class =
  ira_reg_class_subset[ALL_REGS][NO_REGS] evaluates as
  NO_LD_REGS. This appears wrong to me - it should be NO_REGS
  instead (lra-constraints.cc:4421 checks for NO_REGS).

  ira.cc:setup_reg_class_relations sets up
  ira_reg_class_subset (among other things), and the problem
  appears to be a missing continue statement if
  reg_class_contents[cl3] (in the innermost loop) is empty.

  In this case, for cl1 = ALL_REGS and cl2 = NO_REGS, cl3 =
  NO_LD_REGS, temp_hard_regset and temp_set2 are both empty, and
  hard_reg_set_subset_p (temp_hard_regset, temp_set2) is always true, so
  ira_reg_class_subset[ALL_REGS][NO_REGS] ends up being set to
  cl3 = NO_LD_REGS. Adding a continue if hard_reg_set_empty_p (temp_hard_regset)
  fixes the problem for me.

  Does the below patch look ok? Bootstrapping and regression
  testing passed on x86_64.

Regards
Senthil

gcc/ChangeLog:

* ira.cc (setup_reg_class_relations): Skip if
cl3 is an empty register class.


--- gcc/ira.cc
+++ gcc/ira.cc
@@ -1259,6 +1259,9 @@ setup_reg_class_relations (void)
  for (cl3 = 0; cl3 < N_REG_CLASSES; cl3++)
{
  temp_hard_regset = reg_class_contents[cl3] & ~no_unit_alloc_regs;
+ if (hard_reg_set_empty_p (temp_hard_regset))
+   continue;
+
  if (hard_reg_set_subset_p (temp_hard_regset, intersection_set))
{
  /* CL3 allocatable hard register set is inside of


Re: [PATCH] simplify-rtx: Fix invalid simplification with paradoxical subregs [PR110206]

2023-07-12 Thread Richard Biener via Gcc-patches
On Wed, Jul 12, 2023 at 1:05 PM Uros Bizjak  wrote:
>
> On Wed, Jul 12, 2023 at 12:58 PM Uros Bizjak  wrote:
> >
> > On Wed, Jul 12, 2023 at 12:23 PM Richard Sandiford
> >  wrote:
> > >
> > > Richard Biener via Gcc-patches  writes:
> > > > On Mon, Jul 10, 2023 at 1:01 PM Uros Bizjak  wrote:
> > > >>
> > > >> On Mon, Jul 10, 2023 at 11:47 AM Richard Biener
> > > >>  wrote:
> > > >> >
> > > >> > On Mon, Jul 10, 2023 at 11:26 AM Uros Bizjak  
> > > >> > wrote:
> > > >> > >
> > > >> > > On Mon, Jul 10, 2023 at 11:17 AM Richard Biener
> > > >> > >  wrote:
> > > >> > > >
> > > >> > > > On Sun, Jul 9, 2023 at 10:53 AM Uros Bizjak via Gcc-patches
> > > >> > > >  wrote:
> > > >> > > > >
> > > >> > > > > As shown in the PR, simplify_gen_subreg call in 
> > > >> > > > > simplify_replace_fn_rtx:
> > > >> > > > >
> > > >> > > > > (gdb) list
> > > >> > > > > 469   if (code == SUBREG)
> > > >> > > > > 470 {
> > > >> > > > > 471   op0 = simplify_replace_fn_rtx (SUBREG_REG 
> > > >> > > > > (x),
> > > >> > > > > old_rtx, fn, data);
> > > >> > > > > 472   if (op0 == SUBREG_REG (x))
> > > >> > > > > 473 return x;
> > > >> > > > > 474   op0 = simplify_gen_subreg (GET_MODE (x), op0,
> > > >> > > > > 475  GET_MODE 
> > > >> > > > > (SUBREG_REG (x)),
> > > >> > > > > 476  SUBREG_BYTE (x));
> > > >> > > > > 477   return op0 ? op0 : x;
> > > >> > > > > 478 }
> > > >> > > > >
> > > >> > > > > simplifies with following arguments:
> > > >> > > > >
> > > >> > > > > (gdb) p debug_rtx (op0)
> > > >> > > > > (const_vector:V4QI [
> > > >> > > > > (const_int -52 [0xffcc]) repeated x4
> > > >> > > > > ])
> > > >> > > > > (gdb) p debug_rtx (x)
> > > >> > > > > (subreg:V16QI (reg:V4QI 98) 0)
> > > >> > > > >
> > > >> > > > > to:
> > > >> > > > >
> > > >> > > > > (gdb) p debug_rtx (op0)
> > > >> > > > > (const_vector:V16QI [
> > > >> > > > > (const_int -52 [0xffcc]) repeated x16
> > > >> > > > > ])
> > > >> > > > >
> > > >> > > > > This simplification is invalid, it is not possible to get 
> > > >> > > > > V16QImode vector
> > > >> > > > > from V4QImode vector, even when all elements are duplicates.
> > > >> >
> > > >> > ^^^
> > > >> >
> > > >> > I think this simplification is valid.  A simplification to
> > > >> >
> > > >> > (const_vector:V16QI [
> > > >> >  (const_int -52 [0xffcc]) repeated x4
> > > >> >  (const_int 0 [0]) repeated x12
> > > >> >  ])
> > > >> >
> > > >> > would be valid as well.
> > > >> >
> > > >> > > > > The simplification happens in 
> > > >> > > > > simplify_context::simplify_subreg:
> > > >> > > > >
> > > >> > > > > (gdb) list
> > > >> > > > > 7558  if (VECTOR_MODE_P (outermode)
> > > >> > > > > 7559  && GET_MODE_INNER (outermode) == 
> > > >> > > > > GET_MODE_INNER (innermode)
> > > >> > > > > 7560  && vec_duplicate_p (op, &elt))
> > > >> > > > > 7561return gen_vec_duplicate (outermode, elt);
> > > >> > > > >
> > > >> > > > > but the above simplification is valid only for non-paradoxical 
> > > >> > > > > registers,
> > > >> > > > > where outermode <= innermode.  We should not assume that 
> > > >> > > > > elements outside
> > > >> > > > > the original register are valid, let alone all duplicates.
> > > >> > > >
> > > >> > > > Hmm, but looking at the audit trail the x86 backend expects them 
> > > >> > > > to be zero?
> > > >> > > > Isn't that wrong as well?
> > > >> > >
> > > >> > > If you mean Comment #10, it is just an observation that
> > > >> > > simplify_replace_rtx simplifies arguments from Comment #9 to:
> > > >> > >
> > > >> > > (gdb) p debug_rtx (src)
> > > >> > > (const_vector:V8HI [
> > > >> > > (const_int 204 [0xcc]) repeated x4
> > > >> > > (const_int 0 [0]) repeated x4
> > > >> > > ])
> > > >> > >
> > > >> > > instead of:
> > > >> > >
> > > >> > > (gdb) p debug_rtx (src)
> > > >> > > (const_vector:V8HI [
> > > >> > > (const_int 204 [0xcc]) repeated x8
> > > >> > > ])
> > > >> > >
> > > >> > > which is in line with the statement below.
> > > >> > > >
> > > >> > > > That is, I think putting any random value into the upper lanes 
> > > >> > > > when
> > > >> > > > constant folding
> > > >> > > > a paradoxical subreg sounds OK to me, no?
> > > >> > >
> > > >> > > The compiler is putting zero there as can be seen from the above 
> > > >> > > new RTX.
> > > >> > >
> > > >> > > > Of course we might choose to not do such constant propagation for
> > > >> > > > efficiency reason - at least
> > > >> > > > when the resulting CONST_* would require a larger constant pool 
> > > >> > > > entry
> > > >> > > > or more costly
> > > >> > > > construction.
> > > >> > >
> > > >> > > This is probably a follow-up improvement, where this patch tries to
> > > >> > > fix a specific invalid simplificatio

[PATCH V3] VECT: Apply COND_LEN_* into vectorizable_operation

2023-07-12 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, Richard and Richi.
As we discussed before, COND_LEN_* patterns were added for multiple situations.
This patch applies COND_LEN_* in the following situation:

Support the following situation in "vectorizable_operation":
  /* If operating on inactive elements could generate spurious traps,
 we need to restrict the operation to active lanes.  Note that this
 specifically doesn't apply to unhoisted invariants, since they
 operate on the same value for every lane.

 Similarly, if this operation is part of a reduction, a fully-masked
 loop should only change the active lanes of the reduction chain,
 keeping the inactive lanes as-is.  */
  bool mask_out_inactive = ((!is_invariant && gimple_could_trap_p (stmt))
|| reduc_idx >= 0);

This handles the case where mask_out_inactive is true with length loop control.

So, we can support the following 2 cases:

1. Integer division:

   #define TEST_TYPE(TYPE)  \
   __attribute__((noipa))   \
   void vrem_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
   {\
 for (int i = 0; i < n; i++)\
   dst[i] = a[i] % b[i];\
   }
   #define TEST_ALL()   \
   TEST_TYPE(int8_t)\
   TEST_ALL()

With this patch:
  
  _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
  ivtmp_45 = _61 * 4;
  vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
  vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
  vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
vect__4.8_48, _61, 0);
  .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);

2. Floating-point arithmetic **WITHOUT** -ffast-math
  
   #define TEST_TYPE(TYPE)  \
   __attribute__((noipa))   \
   void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
   {\
 for (int i = 0; i < n; i++)\
   dst[i] = a[i] + b[i];\
   }
   #define TEST_ALL()   \
   TEST_TYPE(float) \
   TEST_ALL()

With this patch:
   
  _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
  ivtmp_45 = _61 * 4;
  vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
  vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
  vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
vect__4.8_48, _61, 0);
  .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);

With this patch, we can make sure operations won't trap on the inactive elements
when "mask_out_inactive" is true.

gcc/ChangeLog:

* internal-fn.cc (FOR_EACH_CODE_MAPPING): Adapt for COND_LEN_* support.
(CASE): Ditto.
(get_conditional_len_internal_fn): New function.
* internal-fn.h (get_conditional_len_internal_fn): Ditto.
* tree-vect-stmts.cc (vectorizable_operation): Adapt for COND_LEN_* 
support.

---
 gcc/internal-fn.cc | 73 +++---
 gcc/internal-fn.h  |  1 +
 gcc/tree-vect-stmts.cc | 48 ---
 3 files changed, 93 insertions(+), 29 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index f9aaf66cf2a..b288ac6fe6b 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4276,23 +4276,24 @@ static void (*const internal_fn_expanders[]) 
(internal_fn, gcall *) = {
   0
 };
 
-/* Invoke T(CODE, IFN) for each conditional function IFN that maps to a
-   tree code CODE.  */
+/* Invoke T(CODE, SUFFIX) for each conditional function IFN_COND_##SUFFIX
+   that maps to a tree code CODE.  There is also an IFN_COND_LEN_##SUFFIX
+   for each such IFN_COND_##SUFFIX.  */
 #define FOR_EACH_CODE_MAPPING(T) \
-  T (PLUS_EXPR, IFN_COND_ADD) \
-  T (MINUS_EXPR, IFN_COND_SUB) \
-  T (MULT_EXPR, IFN_COND_MUL) \
-  T (TRUNC_DIV_EXPR, IFN_COND_DIV) \
-  T (TRUNC_MOD_EXPR, IFN_COND_MOD) \
-  T (RDIV_EXPR, IFN_COND_RDIV) \
-  T (MIN_EXPR, IFN_COND_MIN) \
-  T (MAX_EXPR, IFN_COND_MAX) \
-  T (BIT_AND_EXPR, IFN_COND_AND) \
-  T (BIT_IOR_EXPR, IFN_COND_IOR) \
-  T (BIT_XOR_EXPR, IFN_COND_XOR) \
-  T (LSHIFT_EXPR, IFN_COND_SHL) \
-  T (RSHIFT_EXPR, IFN_COND_SHR) \
-  T (NEGATE_EXPR, IFN_COND_NEG)
+  T (PLUS_EXPR, ADD) \
+  T (MINUS_EXPR, SUB) \
+  T (MULT_EXPR, MUL) \
+  T (TRUNC_DIV_EXPR, DIV) \
+  T (TRUNC_MOD_EXPR, MOD) \
+  T (RDIV_EXPR, RDIV) \
+  T (MIN_EXPR, MIN) \
+  T (MAX_EXPR, MAX) \
+  T (BIT_AND_EXPR, AND) \
+  T (BIT_IOR_EXPR, IOR) \
+  T (BIT_XOR_EXPR, XOR) \
+  T (LSHIFT_EXPR, SHL) \
+  T (RSHIFT_EXPR, SHR) \
+  T (NEGATE_EXPR, NEG)
 
 /* Return a function that only performs CODE when a certain condition is met
and that uses a given fallback value otherwise.  For example, if CODE is
@@ -4313,7 +4314,7 @@ get_conditional_internal_fn (tree_code code)
 {
   switch (code)

Re: Re: [PATCH V2] VECT: Apply COND_LEN_* into vectorizable_operation

2023-07-12 Thread juzhe.zh...@rivai.ai
Thanks Richard and Richi so much!

I have sent V3, to be merged after bootstrap && regression testing finishes.

It adds clearer comments as follows:

+/* Like get_conditional_internal_fn, but return a function that
+   additionally restricts the operation to the leading elements
+   of a vector.  The number of elements to process is given by
+   a length and bias pair.  The function only performs the CODE
+   when a certain condition is met and the element is located
+   within LEN + BIAS (i < LEN + BIAS), and uses a given fallback value
+   otherwise.
+
+   For example, if CODE is [PLUS, MINUS, ... etc]:
+
+ LHS = FN (COND, A, B, ELSE, LEN, BIAS)
+
+   is equivalent to the C expression:
+
+ for (int i = 0; i < NUNITS; i++)
+  {
+   if (COND[i] && i < (LEN + BIAS))
+ LHS[i] = A[i] CODE B[i];
+   else
+ LHS[i] = ELSE[i];
+  }
+*/
Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-07-12 18:59
To: juzhe.zh...@rivai.ai
CC: richard.sandiford; gcc-patches
Subject: Re: Re: [PATCH V2] VECT: Apply COND_LEN_* into vectorizable_operation
On Wed, 12 Jul 2023, juzhe.zh...@rivai.ai wrote:
 
> Thank you so much.
> I am gonna wait for Richi's final approval.
 
It's good enough when either of us approves unless we explicitly ask
to wait for somebody else.
 
LGTM anyway.
 
Richard.
 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Sandiford
> Date: 2023-07-12 18:53
> To: juzhe.zhong
> CC: gcc-patches; rguenther
> Subject: Re: [PATCH V2] VECT: Apply COND_LEN_* into vectorizable_operation
> juzhe.zh...@rivai.ai writes:
> > From: Ju-Zhe Zhong 
> >
> > Hi, Richard and Richi.
> > As we discussed before, COND_LEN_* patterns were added for multiple 
> > situations.
> > This patch applies COND_LEN_* in the following situation:
> >
> > Support for the situation that in "vectorizable_operation":
> >   /* If operating on inactive elements could generate spurious traps,
> >  we need to restrict the operation to active lanes.  Note that this
> >  specifically doesn't apply to unhoisted invariants, since they
> >  operate on the same value for every lane.
> >
> >  Similarly, if this operation is part of a reduction, a fully-masked
> >  loop should only change the active lanes of the reduction chain,
> >  keeping the inactive lanes as-is.  */
> >   bool mask_out_inactive = ((!is_invariant && gimple_could_trap_p (stmt))
> > || reduc_idx >= 0);
> >
> > For mask_out_inactive is true with length loop control.
> >
> > So, we can support the following 2 cases:
> >
> > 1. Integer division:
> >
> >#define TEST_TYPE(TYPE) \
> >__attribute__((noipa)) \
> >void vrem_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
> >{ \
> >  for (int i = 0; i < n; i++) \
> >dst[i] = a[i] % b[i]; \
> >}
> >#define TEST_ALL() \
> >TEST_TYPE(int8_t) \
> >TEST_ALL()
> >
> > With this patch:
> >   
> >   _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
> >   ivtmp_45 = _61 * 4;
> >   vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
> >   vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
> >   vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
> > vect__4.8_48, _61, 0);
> >   .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, 
> > vect__8.12_53);
> >
> > 2. Floating-point arithmetic **WITHOUT** -ffast-math
> >   
> >#define TEST_TYPE(TYPE) \
> >__attribute__((noipa)) \
> >void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
> >{ \
> >  for (int i = 0; i < n; i++) \
> >dst[i] = a[i] + b[i]; \
> >}
> >#define TEST_ALL() \
> >TEST_TYPE(float) \
> >TEST_ALL()
> >
> > With this patch:
> >
> >   _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
> >   ivtmp_45 = _61 * 4;
> >   vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
> >   vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
> >   vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
> > vect__4.8_48, _61, 0);
> >   .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, 
> > vect__8.12_53);
> >
> > With this patch, we can make sure operations won't trap for elements that 
> > "mask_out_inactive".
> >
> > gcc/ChangeLog:
> >
> > * internal-fn.cc (FOR_EACH_CODE_MAPPING): Adapt for COND_LEN_* 
> > support.
> > (CASE): Ditto.
> > (get_conditional_len_internal_fn): New function.
> > * internal-fn.h (get_conditional_len_internal_fn): Ditto.
> > * tree-vect-stmts.cc (vectorizable_operation): Adapt for COND_LEN_* 
> > support.
> >
> > ---
> >  gcc/internal-fn.cc | 65 ++
> >  gcc/internal-fn.h  |  1 +
> >  gcc/tree-vect-stmts.cc | 48 ---
> >  3 files changed, 85 insertions(+), 29 deletions(-)
> >
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index f9aaf66cf2a..7e3a8cc8412 

Re: [RFC] Store_bit_field_1: Use mode of SUBREG instead of REG

2023-07-12 Thread YunQiang Su


> On Jul 12, 2023, at 15:44, Richard Biener  wrote:
> 
> On Wed, Jul 12, 2023 at 5:20 AM YunQiang Su  wrote:
>> 
>> PR #104914
>> 
>> When working with
>>  int val;
>>  ((unsigned char*)&val)[0] = *buf;
>> the RTX mode is obtained from the REG instead of the SUBREG,
>> which makes the wider register mode be used instead of the
>> narrower SUBREG mode.
>> Thus something wrong happens on architectures that sign-extend
>> by default, like MIPS64.
>> 
>> gcc/ChangeLog:
>>PR: 104914.
>>* expmed.cc(store_bit_field_1): Get mode from original
>>str_rtx instead of op0.
>> ---
>> gcc/expmed.cc | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/gcc/expmed.cc b/gcc/expmed.cc
>> index fbd4ce2d42f..37f90912122 100644
>> --- a/gcc/expmed.cc
>> +++ b/gcc/expmed.cc
>> @@ -849,7 +849,7 @@ store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, 
>> poly_uint64 bitnum,
>>  if we aren't.  This must come after the entire register case above,
>>  since that case is valid for any mode.  The following cases are only
>>  valid for integral modes.  */
>> -  opt_scalar_int_mode op0_mode = int_mode_for_mode (GET_MODE (op0));
>> +  opt_scalar_int_mode op0_mode = int_mode_for_mode (GET_MODE (str_rtx));
> 
> I don't think this is correct - op0_mode is used to store into op0, and we are
> just requiring that it is an integer mode and equal to the original
> mode.  I suppose
> your patch makes us go to the fallback code instead, but it's surely
> for the wrong

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index fbd4ce2d42f..feee8c82f59 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -850,6 +861,7 @@ store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, 
poly_uint64 bitnum,
  since that case is valid for any mode.  The following cases are only
  valid for integral modes.  */
   opt_scalar_int_mode op0_mode = int_mode_for_mode (GET_MODE (op0));
+  opt_scalar_int_mode str_mode = int_mode_for_mode (GET_MODE (str_rtx));
   scalar_int_mode imode;
   if (!op0_mode.exists (&imode) || imode != GET_MODE (op0))
 {
@@ -881,8 +893,14 @@ store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, 
poly_uint64 bitnum,
op0 = gen_lowpart (op0_mode.require (), op0);
 }
 
-  return store_integral_bit_field (op0, op0_mode, ibitsize, ibitnum,
-  bitregion_start, bitregion_end,
+  bool use_str_mode = false;
+  if (GET_MODE_CLASS (GET_MODE (str_rtx)) == MODE_INT
+  && GET_MODE_CLASS (GET_MODE (op0)) == MODE_INT
+  && known_gt (GET_MODE_SIZE (GET_MODE (op0)),
+   GET_MODE_SIZE (GET_MODE (str_rtx))))
+   use_str_mode = true;
+  return store_integral_bit_field (op0,
+  use_str_mode ? str_mode : op0_mode,
+  ibitsize, ibitnum, bitregion_start,
+  bitregion_end,
   fieldmode, value, reverse, fallback_p);
 }

> reason.  I also wonder why we don't just check GET_MODE_CLASS
> (GET_MODE (op0)) == MODE_CLASS_INT ...
> 

In fact I have no idea. Maybe there are some other tricky cases.

>>   scalar_int_mode imode;
>>   if (!op0_mode.exists (&imode) || imode != GET_MODE (op0))
>> {
>> --
>> 2.30.2




Re: [Patch, Fortran, committed] Allow ref'ing PDT's len() in parameter-initializer [PR102003]

2023-07-12 Thread Andre Vehreschild via Gcc-patches
Hi all, hi Harald,

thanks for the review. I chose to use gfc_replace_expr() and retested.
Everything went fine now.

Also thank you for clarifying that a PDT as a component in a derived type
is still a bug and that I didn't do it wrong.

I have pushed the attached patch seconds ago.

Thanks for your help,
Andre

On Tue, 11 Jul 2023 22:24:37 +0200
Harald Anlauf  wrote:

> Hi Andre,
>
> this looks much better now!
>
> This looks mostly good to me, except for a typo in the testcase:
>
> +  if (p% ci% len /= 42) stop 4
>
> There is no component "ci", only "c".  The testsuite would fail.
>
> Regarding the memleak: replacing
>
>// TODO: Fix leaking expr tmp, when simplify is done twice.
>tmp = gfc_copy_expr (*newp);
>
> by
>
>if (inquiry->next)
>   {
> gfc_free_expr (tmp);
> tmp = gfc_copy_expr (*newp);
>   }
>
> or rather
>
>if (inquiry->next)
>   gfc_replace_expr (tmp, *newp);
>
> at least shrinks the leak a bit.  (Untested otherwise).
>
> OK with one of the above changes (provided it survives regtesting).
>
> Thanks for the patch!
>
> Harald
>
>
> Am 11.07.23 um 18:23 schrieb Andre Vehreschild via Gcc-patches:
> > Hi Harald,
> >
> > attached is a new version of the patch. This now also respects inquiry-LEN.
> > Btw, there is a potential memory leak in the simplify for inquiry
> > functions. I have added a note into the code.
> >
> > I tried to use a pdt within a derived type as a component. Is that not
> > allowed by the standard? I know, I could hunt in the standard for it, but
> > when someone knows out of his head, he could greatly help me out.
> >
> > Regtests ok on x86_64-linux-gnu/F37.
> >
> > Regards,
> > Andre
> >
> > On Mon, 10 Jul 2023 20:55:29 +0200
> > Harald Anlauf  wrote:
> >
> >> Hi Andre,
> >>
> >> thanks for looking into this!
> >>
> >> While it fixes the original PR, here is a minor extension of the
> >> testcase that ICEs here with your patch:
> >>
> >> program pr102003
> >> type pdt(n)
> >>integer, len :: n = 8
> >>character(len=n) :: c
> >> end type pdt
> >> type(pdt(42)) :: p
> >> integer, parameter :: m = len (p% c)
> >> integer, parameter :: n = p% c% len
> >>
> >> if (m /= 42) stop 1
> >> if (len (p% c) /= 42) stop 2
> >> print *, p% c% len   ! OK
> >> if (p% c% len  /= 42) stop 3 ! OK
> >> print *, n   ! ICE
> >> end
> >>
> >> I get:
> >>
> >> pdt_33.f03:14:27:
> >>
> >>  14 |   integer, parameter :: n = p% c% len
> >> |   1
> >> Error: non-constant initialization expression at (1)
> >> pdt_33.f03:20:31:
> >>
> >>  20 |   print *, n   ! ICE
> >> |   1
> >> internal compiler error: tree check: expected record_type or union_type
> >> or qual_union_type, have integer_type in gfc_conv_component_ref, at
> >> fortran/trans-expr.cc:2757
> >> 0x84286c tree_check_failed(tree_node const*, char const*, int, char
> >> const*, ...)
> >>   ../../gcc-trunk/gcc/tree.cc:8899
> >> 0xa6d6fb tree_check3(tree_node*, char const*, int, char const*,
> >> tree_code, tree_code, tree_code)
> >>   ../../gcc-trunk/gcc/tree.h:3617
> >> 0xa90847 gfc_conv_component_ref(gfc_se*, gfc_ref*)
> >>   ../../gcc-trunk/gcc/fortran/trans-expr.cc:2757
> >> 0xa91bbc gfc_conv_variable
> >>   ../../gcc-trunk/gcc/fortran/trans-expr.cc:3137
> >> 0xaa8e9c gfc_conv_expr(gfc_se*, gfc_expr*)
> >>   ../../gcc-trunk/gcc/fortran/trans-expr.cc:9594
> >> 0xaa92ae gfc_conv_expr_reference(gfc_se*, gfc_expr*)
> >>   ../../gcc-trunk/gcc/fortran/trans-expr.cc:9713
> >> 0xad67f6 gfc_trans_transfer(gfc_code*)
> >>   ../../gcc-trunk/gcc/fortran/trans-io.cc:2607
> >> 0xa43cb7 trans_code
> >>   ../../gcc-trunk/gcc/fortran/trans.cc:2449
> >> 0xad37c6 build_dt
> >>   ../../gcc-trunk/gcc/fortran/trans-io.cc:2051
> >> 0xa43cd7 trans_code
> >>   ../../gcc-trunk/gcc/fortran/trans.cc:2421
> >> 0xa84711 gfc_generate_function_code(gfc_namespace*)
> >>   ../../gcc-trunk/gcc/fortran/trans-decl.cc:7762
> >> 0x9d9ca7 translate_all_program_units
> >>   ../../gcc-trunk/gcc/fortran/parse.cc:6929
> >> 0x9d9ca7 gfc_parse_file()
> >>   ../../gcc-trunk/gcc/fortran/parse.cc:7235
> >> 0xa40a1f gfc_be_parse_file
> >>   ../../gcc-trunk/gcc/fortran/f95-lang.cc:229
> >>
> >> The fortran-dump confirms that n is not simplified to a constant.
> >> So while you're at it, do you also see a solution to this variant?
> >>
> >> Harald
> >>
> >>
> >> Am 10.07.23 um 17:48 schrieb Andre Vehreschild via Gcc-patches:
> >>> Hi all,
> >>>
> >>> while browsing the pdt meta-bug I came across 102003 and thought to
> >>> myself: Well, that one is easy. How foolish of me...
> >>>
> >>> Anyway, the solution attached prevents a pdt_len (or pdt_kind) expression
> >>> in a function call (e.g. len() or kind()) from marking the whole expression
> >>> as a 

Re: [PATCH] simplify-rtx: Fix invalid simplification with paradoxical subregs [PR110206]

2023-07-12 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Wed, Jul 12, 2023 at 1:05 PM Uros Bizjak  wrote:
>>
>> On Wed, Jul 12, 2023 at 12:58 PM Uros Bizjak  wrote:
>> >
>> > On Wed, Jul 12, 2023 at 12:23 PM Richard Sandiford
>> >  wrote:
>> > >
>> > > Richard Biener via Gcc-patches  writes:
>> > > > On Mon, Jul 10, 2023 at 1:01 PM Uros Bizjak  wrote:
>> > > >>
>> > > >> On Mon, Jul 10, 2023 at 11:47 AM Richard Biener
>> > > >>  wrote:
>> > > >> >
>> > > >> > On Mon, Jul 10, 2023 at 11:26 AM Uros Bizjak  
>> > > >> > wrote:
>> > > >> > >
>> > > >> > > On Mon, Jul 10, 2023 at 11:17 AM Richard Biener
>> > > >> > >  wrote:
>> > > >> > > >
>> > > >> > > > On Sun, Jul 9, 2023 at 10:53 AM Uros Bizjak via Gcc-patches
>> > > >> > > >  wrote:
>> > > >> > > > >
>> > > >> > > > > As shown in the PR, simplify_gen_subreg call in 
>> > > >> > > > > simplify_replace_fn_rtx:
>> > > >> > > > >
>> > > >> > > > > (gdb) list
>> > > >> > > > > 469   if (code == SUBREG)
>> > > >> > > > > 470 {
>> > > >> > > > > 471   op0 = simplify_replace_fn_rtx (SUBREG_REG 
>> > > >> > > > > (x),
>> > > >> > > > > old_rtx, fn, data);
>> > > >> > > > > 472   if (op0 == SUBREG_REG (x))
>> > > >> > > > > 473 return x;
>> > > >> > > > > 474   op0 = simplify_gen_subreg (GET_MODE (x), 
>> > > >> > > > > op0,
>> > > >> > > > > 475  GET_MODE 
>> > > >> > > > > (SUBREG_REG (x)),
>> > > >> > > > > 476  SUBREG_BYTE (x));
>> > > >> > > > > 477   return op0 ? op0 : x;
>> > > >> > > > > 478 }
>> > > >> > > > >
>> > > >> > > > > simplifies with following arguments:
>> > > >> > > > >
>> > > >> > > > > (gdb) p debug_rtx (op0)
>> > > >> > > > > (const_vector:V4QI [
>> > > >> > > > > (const_int -52 [0xffcc]) repeated x4
>> > > >> > > > > ])
>> > > >> > > > > (gdb) p debug_rtx (x)
>> > > >> > > > > (subreg:V16QI (reg:V4QI 98) 0)
>> > > >> > > > >
>> > > >> > > > > to:
>> > > >> > > > >
>> > > >> > > > > (gdb) p debug_rtx (op0)
>> > > >> > > > > (const_vector:V16QI [
>> > > >> > > > > (const_int -52 [0xffcc]) repeated x16
>> > > >> > > > > ])
>> > > >> > > > >
>> > > >> > > > > This simplification is invalid, it is not possible to get 
>> > > >> > > > > V16QImode vector
>> > > >> > > > > from V4QImode vector, even when all elements are duplicates.
>> > > >> >
>> > > >> > ^^^
>> > > >> >
>> > > >> > I think this simplification is valid.  A simplification to
>> > > >> >
>> > > >> > (const_vector:V16QI [
>> > > >> >  (const_int -52 [0xffcc]) repeated x4
>> > > >> >  (const_int 0 [0]) repeated x12
>> > > >> >  ])
>> > > >> >
>> > > >> > would be valid as well.
>> > > >> >
>> > > >> > > > > The simplification happens in 
>> > > >> > > > > simplify_context::simplify_subreg:
>> > > >> > > > >
>> > > >> > > > > (gdb) list
>> > > >> > > > > 7558  if (VECTOR_MODE_P (outermode)
>> > > >> > > > > 7559  && GET_MODE_INNER (outermode) == 
>> > > >> > > > > GET_MODE_INNER (innermode)
>> > > >> > > > > 7560  && vec_duplicate_p (op, &elt))
>> > > >> > > > > 7561return gen_vec_duplicate (outermode, elt);
>> > > >> > > > >
>> > > >> > > > > but the above simplification is valid only for 
>> > > >> > > > > non-paradoxical registers,
>> > > >> > > > > where outermode <= innermode.  We should not assume that 
>> > > >> > > > > elements outside
>> > > >> > > > > the original register are valid, let alone all duplicates.
>> > > >> > > >
>> > > >> > > > Hmm, but looking at the audit trail the x86 backend expects 
>> > > >> > > > them to be zero?
>> > > >> > > > Isn't that wrong as well?
>> > > >> > >
>> > > >> > > If you mean Comment #10, it is just an observation that
>> > > >> > > simplify_replace_rtx simplifies arguments from Comment #9 to:
>> > > >> > >
>> > > >> > > (gdb) p debug_rtx (src)
>> > > >> > > (const_vector:V8HI [
>> > > >> > > (const_int 204 [0xcc]) repeated x4
>> > > >> > > (const_int 0 [0]) repeated x4
>> > > >> > > ])
>> > > >> > >
>> > > >> > > instead of:
>> > > >> > >
>> > > >> > > (gdb) p debug_rtx (src)
>> > > >> > > (const_vector:V8HI [
>> > > >> > > (const_int 204 [0xcc]) repeated x8
>> > > >> > > ])
>> > > >> > >
>> > > >> > > which is in line with the statement below.
>> > > >> > > >
>> > > >> > > > That is, I think putting any random value into the upper lanes 
>> > > >> > > > when
>> > > >> > > > constant folding
>> > > >> > > > a paradoxical subreg sounds OK to me, no?
>> > > >> > >
>> > > >> > > The compiler is putting zero there as can be seen from the above 
>> > > >> > > new RTX.
>> > > >> > >
>> > > >> > > > Of course we might choose to not do such constant propagation 
>> > > >> > > > for
>> > > >> > > > efficiency reason - at least
>> > > >> > > > when the resulting CONST_* would require a larger constant pool 
>> > > >> > > > entry
>> > > >> > > > or m

Re: [Patch] libgomp: Use libnuma for OpenMP's partition=nearest allocation trait

2023-07-12 Thread Tobias Burnus

Now committed as r14-2462-g450b05ce54d3f0.

Changes to the patch in previous email:
* I fixed some issues found on the way,
* The wording in the .texi has been improved/expanded, and
* I included two testcases to exercise the two libraries (or
  the default allocator when it is not available at runtime).

Given that the default allocation already works fine (nearest)
and the normal "malloc" is more economical in terms of memory
handling (not multiples of page size or requesting a fixed
pool size), I was wondering whether this patch is really needed.

But in the end: the default can be changed (cf. below), and giving
the user the choice makes sense. The manual states what GCC does
which should help to make a conscious choice.

* * *

I did experiment with the testcase attached to previous email
plus using dlopen to obtain the functions from libnuma if available.

It was also using:
/* { dg-do run { target { dlopen } } } */
/* { dg-additional-options "-ldl" } */

However, the Linux kernel too often placed the allocated memory
on the "wrong" node to be usable as a testcase. I did get between
0 and 15 misplaced allocations, depending on the run.

Hence, there is no such testcase. Using numactl --preferred=1 I
could force the normal allocation to (mostly) use node 1 for
allocations such that the difference between partition = default/environment
vs. partition = nearest was clearly visible. Hence it does work.

Otherwise, the same applies as I wrote yesterday:

On 11.07.23 12:35, Tobias Burnus wrote:


While by default 'malloc' allocates memory on the same node as the
calling
process/thread ('numactl --show' shows 'preferred node: current',
Linux kernel memory policy MPOL_DEFAULT), this can be changed.
For instance, when running the program as follows, 'malloc' now
prefers to allocate on the second node:
  numactl --preferred=1 ./myproc

Thus, it seems to be sensible to provide a means to ensure the 'nearest'
allocation.  The MPOL_LOCAL policy does so, as provided by
libnuma's numa_alloc_local. (Which is just a wrapper around the syscalls
mmap and mbind.) As with (lib)memkind, there is a run-time dlopen check
for (lib)numa - and no numa*.h is required when building GCC.

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
commit 450b05ce54d3f08c583c3b5341233ce0df99725b
Author: Tobias Burnus 
Date:   Wed Jul 12 13:50:21 2023 +0200

libgomp: Use libnuma for OpenMP's partition=nearest allocation trait

As with the memkind library, it is only used when found at runtime;
it does not need to be present when building GCC.

The included testcase does not check whether the memory has been placed
on the nearest node as the Linux kernel memory handling too often ignores
that hint, using a different node for the allocation.  However, when
running with 'numactl --preferred= ./executable', it is clearly
visible that the feature works by comparing malloc/default vs. nearest
placement (using get_mempolicy to obtain the node for a mem addr).

libgomp/ChangeLog:

* allocator.c: Add ifdef for LIBGOMP_USE_LIBNUMA.
(enum gomp_numa_memkind_kind): Renamed from gomp_memkind_kind;
add GOMP_MEMKIND_LIBNUMA.
(struct gomp_libnuma_data, gomp_init_libnuma, gomp_get_libnuma): New.
(omp_init_allocator): Handle partition=nearest with libnuma if avail.
(omp_aligned_alloc, omp_free, omp_aligned_calloc, omp_realloc): Add
numa_alloc_local (+ memset), numa_free, and numa_realloc calls as
needed.
* config/linux/allocator.c (LIBGOMP_USE_LIBNUMA): Define
* libgomp.texi: Fix a typo; use 'fi' instead of its ligature char.
(Memory allocation): Renamed from 'Memory allocation with libmemkind';
updated for libnuma usage.
* testsuite/libgomp.c-c++-common/alloc-11.c: New test.
* testsuite/libgomp.c-c++-common/alloc-12.c: New test.
---
 libgomp/allocator.c   | 202 ---
 libgomp/config/linux/allocator.c  |   1 +
 libgomp/libgomp.texi  |  42 +++-
 libgomp/testsuite/libgomp.c-c++-common/alloc-11.c | 285 ++
 libgomp/testsuite/libgomp.c-c++-common/alloc-12.c | 217 
 5 files changed, 708 insertions(+), 39 deletions(-)

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 25c0f150302..b3187ab2911 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -31,13 +31,13 @@
 #include "libgomp.h"
 #include 
 #include 
-#ifdef LIBGOMP_USE_MEMKIND
+#if defined(LIBGOMP_USE_MEMKIND) || defined(LIBGOMP_USE_LIBNUMA)
 #include 
 #endif
 
 #define omp_max_predefined_alloc omp_thread_mem_alloc
 
-enum gomp_memkind_kind
+enum gomp_numa_memkind_kind

Re: [PATCH V3] VECT: Apply COND_LEN_* into vectorizable_operation

2023-07-12 Thread Richard Sandiford via Gcc-patches
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong 
>
> Hi, Richard and Richi.
> As we discussed before, COND_LEN_* patterns were added for multiple 
> situations.
> This patch applies COND_LEN_* in the following situation:
>
> Support for the situation that in "vectorizable_operation":
>   /* If operating on inactive elements could generate spurious traps,
>  we need to restrict the operation to active lanes.  Note that this
>  specifically doesn't apply to unhoisted invariants, since they
>  operate on the same value for every lane.
>
>  Similarly, if this operation is part of a reduction, a fully-masked
>  loop should only change the active lanes of the reduction chain,
>  keeping the inactive lanes as-is.  */
>   bool mask_out_inactive = ((!is_invariant && gimple_could_trap_p (stmt))
>   || reduc_idx >= 0);
>
> For mask_out_inactive is true with length loop control.
>
> So, we can support the following 2 cases:
>
> 1. Integer division:
>
>#define TEST_TYPE(TYPE)\
>__attribute__((noipa)) \
>void vrem_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)  \
>{  \
>  for (int i = 0; i < n; i++)  \
>dst[i] = a[i] % b[i];  \
>}
>#define TEST_ALL() \
>TEST_TYPE(int8_t)  \
>TEST_ALL()
>
> With this patch:
>   
>   _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
>   ivtmp_45 = _61 * 4;
>   vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
>   vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
>   vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
> vect__4.8_48, _61, 0);
>   .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);
>
> 2. Floating-point arithmetic **WITHOUT** -ffast-math
>   
>#define TEST_TYPE(TYPE)\
>__attribute__((noipa)) \
>void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)  \
>{  \
>  for (int i = 0; i < n; i++)  \
>dst[i] = a[i] + b[i];  \
>}
>#define TEST_ALL() \
>TEST_TYPE(float)   \
>TEST_ALL()
>
> With this patch:
>
>   _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
>   ivtmp_45 = _61 * 4;
>   vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
>   vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
>   vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
> vect__4.8_48, _61, 0);
>   .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);
>
> With this patch, we can make sure operations won't trap for elements that 
> "mask_out_inactive".
>
> gcc/ChangeLog:
>
> * internal-fn.cc (FOR_EACH_CODE_MAPPING): Adapt for COND_LEN_* 
> support.
> (CASE): Ditto.
> (get_conditional_len_internal_fn): New function.
> * internal-fn.h (get_conditional_len_internal_fn): Ditto.
> * tree-vect-stmts.cc (vectorizable_operation): Adapt for COND_LEN_* 
> support.
>
> ---
>  gcc/internal-fn.cc | 73 +++---
>  gcc/internal-fn.h  |  1 +
>  gcc/tree-vect-stmts.cc | 48 ---
>  3 files changed, 93 insertions(+), 29 deletions(-)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index f9aaf66cf2a..b288ac6fe6b 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4276,23 +4276,24 @@ static void (*const internal_fn_expanders[]) 
> (internal_fn, gcall *) = {
>0
>  };
>  
> -/* Invoke T(CODE, IFN) for each conditional function IFN that maps to a
> -   tree code CODE.  */
> +/* Invoke T(CODE, SUFFIX) for each conditional function IFN_COND_##SUFFIX
> +   that maps to a tree code CODE.  There is also an IFN_COND_LEN_##SUFFIX
> +   for each such IFN_COND_##SUFFIX.  */
>  #define FOR_EACH_CODE_MAPPING(T) \
> -  T (PLUS_EXPR, IFN_COND_ADD) \
> -  T (MINUS_EXPR, IFN_COND_SUB) \
> -  T (MULT_EXPR, IFN_COND_MUL) \
> -  T (TRUNC_DIV_EXPR, IFN_COND_DIV) \
> -  T (TRUNC_MOD_EXPR, IFN_COND_MOD) \
> -  T (RDIV_EXPR, IFN_COND_RDIV) \
> -  T (MIN_EXPR, IFN_COND_MIN) \
> -  T (MAX_EXPR, IFN_COND_MAX) \
> -  T (BIT_AND_EXPR, IFN_COND_AND) \
> -  T (BIT_IOR_EXPR, IFN_COND_IOR) \
> -  T (BIT_XOR_EXPR, IFN_COND_XOR) \
> -  T (LSHIFT_EXPR, IFN_COND_SHL) \
> -  T (RSHIFT_EXPR, IFN_COND_SHR) \
> -  T (NEGATE_EXPR, IFN_COND_NEG)
> +  T (PLUS_EXPR, ADD) \
> +  T (MINUS_EXPR, SUB) \
> +  T (MULT_EXPR, MUL) \
> +  T (TRUNC_DIV_EXPR, DIV) \
> +  T (TRUNC_MOD_EXPR, MOD) \
> +  T (RDIV_EXPR, RDIV) \
> +  T (MIN_EXPR, MIN) \
> +  T (MAX_EXPR, MAX) \
> +  T (BIT_AND_EXPR, AND) \
> +  T (BIT_IOR_EXPR, IOR) \
> +  T (BIT_XOR_EXPR, XOR) \
> +  T (LSHIFT_EXPR, SHL) \
> +  T (RSHIFT_EXPR, SHR) \
> +  T (NEGATE_EXPR, NEG)

Re: [PATCH] ci: Add a linux CI

2023-07-12 Thread Christophe Lyon via Gcc-patches
Hi,


On Sun, 9 Jul 2023 at 19:13, Tal Regev via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> Description: adding a CI in a GitHub repo. Every time a user opens a PR to
> the master branch or a release branch, it will activate the CI on their repo.
> For example: https://github.com/talregev/gcc/pull/1. This can help users
> verify their own changes before submitting a patch.
>
> ChangeLog: Add a linux CI
>
> Bootstrapping and testing: I tested it on linux with
> host: x86_64-linux-gnu
> target: x86_64-linux-gnu
> some tests are failing. You can see the results in my CI yourself.
>
>
Thanks for sharing your patch & idea.
I think GCC validation is and has been a problem for a long time ;-)

I am not a maintainer, so take my comments with a grain of salt ;-)

- I don't know if the GCC project would want to accept such patches,
pointing to github etc...
- github is not the main GCC repository, it hosts several mirrors AFAIK
- these mirrors are updated by individuals, I think, I don't know at which
frequency etc... correct me if I'm wrong
- would this mean that each time each such mirror/fork is updated, this
triggers builds on github servers? Would that handle the load? I don't
think so (also: how many "free" minutes of CPU time can be used?)
- as you have noticed the GCC testsuite is not 100% clean (i.e. there are
failures, so 'make check' always exits with an error code), making such a
step useless. What we need is to compare to a baseline (eg. results of
previous build) and report if there were detections. Several companies have
CI systems doing this (either internally, or on publicly accessible servers)

In particular, at Linaro we monitor regressions for several arm and aarch64
flavors, and we are also experimenting with "pre-commit CI", based on
patchwork.

Thanks anyway for sharing, it's good to see such initiatives ;-)

Christophe



> Patch: attached to this email.
>


Re: [PATCH 0/9] Add btf_decl_tag C attribute

2023-07-12 Thread Jose E. Marchesi via Gcc-patches


[Added Eduard Zingerman in CC, who is implementing this same feature in
 clang/llvm and also the consumer component in the kernel (pahole).]

Hi Richard.

> On Tue, Jul 11, 2023 at 11:58 PM David Faust via Gcc-patches
>  wrote:
>>
>> Hello,
>>
>> This series adds support for a new attribute, "btf_decl_tag" in GCC.
>> The same attribute is already supported in clang, and is used by various
>> components of the BPF ecosystem.
>>
>> The purpose of the attribute is to allow to associate (to "tag")
>> declarations with arbitrary string annotations, which are emitted into
>> debugging information (DWARF and/or BTF) to facilitate post-compilation
>> analysis (the motivating use case being the Linux kernel BPF verifier).
>> Multiple tags are allowed on the same declaration.
>>
>> These strings are not interpreted by the compiler, and the attribute
>> itself has no effect on generated code, other than to produce additional
>> DWARF DIEs and/or BTF records conveying the annotations.
>>
>> This entails:
>>
>> - A new C-language-level attribute which allows to associate (to "tag")
>>   particular declarations with arbitrary strings.
>>
>> - The conveyance of that information in DWARF in the form of a new DIE,
>>   DW_TAG_GNU_annotation, with tag number (0x6000) and format matching
>>   that of the DW_TAG_LLVM_annotation extension supported in LLVM for
>>   the same purpose. These DIEs are already supported by BPF tooling,
>>   such as pahole.
>>
>> - The conveyance of that information in BTF debug info in the form of
>>   BTF_KIND_DECL_TAG records. These records are already supported by
>>   LLVM and other tools in the eBPF ecosystem, such as the Linux kernel
>>   eBPF verifier.
>>
>>
>> Background
>> ==
>>
>> The purpose of these tags is to convey additional semantic information
>> to post-compilation consumers, in particular the Linux kernel eBPF
>> verifier. The verifier can make use of that information while analyzing
>> a BPF program to aid in determining whether to allow or reject the
>> program to be run. More background on these tags can be found in the
>> early support for them in the kernel here [1] and [2].
>>
>> The "btf_decl_tag" attribute is half the story; the other half is a
>> sibling attribute "btf_type_tag" which serves the same purpose but
>> applies to types. Support for btf_type_tag will come in a separate
>> patch series, since it is impacted by GCC bug 110439 which needs to be
>> addressed first.
>>
>> I submitted an initial version of this work (including btf_type_tag)
>> last spring [3], however at the time there were some open questions
>> about the behavior of the btf_type_tag attribute and issues with its
>> implementation. Since then we have clarified these details and agreed
>> to solutions with the BPF community and LLVM BPF folks.
>>
>> The main motivation for emitting the tags in DWARF is that the Linux
>> kernel generates its BTF information via pahole, using DWARF as a source:
>>
>>   +--------+  BTF                BTF  +----------+
>>   | pahole | ---> vmlinux.btf ------->| verifier |
>>   +--------+                          +----------+
>>       ^                                    ^
>>       |                                    |
>> DWARF |                                BTF |
>>       |                                    |
>>   vmlinux                          +-------------+
>>   module1.ko                       | BPF program |
>>   module2.ko                       +-------------+
>>   ...
>>
>> This is because:
>>
>> a)  pahole adds additional kernel-specific information into the
>> produced BTF based on additional analysis of kernel objects.
>>
>> b)  Unlike GCC, LLVM will only generate BTF for BPF programs.
>>
>> c)  GCC can generate BTF for any target with -gbtf, but there is no
>> support for linking/deduplicating BTF in the linker.
>>
>> In the scenario above, the verifier needs access to the pointer tags of
>> both the kernel types/declarations (conveyed in the DWARF and translated
>> to BTF by pahole) and those of the BPF program (available directly in BTF).
>>
>>
>> DWARF Representation
>> 
>>
>> As noted above, btf_decl_tag is represented in DWARF via a new DIE
>> DW_TAG_GNU_annotation, with identical format to the LLVM DWARF
>> extension DW_TAG_LLVM_annotation serving the same purpose. The DIE has
>> the following format:
>>
>>   DW_TAG_GNU_annotation (0x6000)
>> DW_AT_name: "btf_decl_tag"
>> DW_AT_const_value: 
>>
>> These DIEs are placed in the DWARF tree as children of the DIE for the
>> appropriate declaration, and one such DIE is created for each occurrence
>> of the btf_decl_tag attribute on a declaration.
>>
>> For example:
>>
>>   const int * c __attribute__((btf_decl_tag ("__c"), btf_decl_tag 
>> ("devicemem")));
>>
>> This declaration produces the following DWARF:
>>
>>  <1><1e>: Abbrev Number: 2 (DW_TAG_variable)
>> <1f>   DW_AT_name  

Re: [PATCH] riscv: thead: Fix failing XTheadCondMov tests (indirect-rv[32|64])

2023-07-12 Thread Christoph Müllner
On Wed, Jul 12, 2023 at 4:05 AM Jeff Law  wrote:
>
>
>
> On 7/10/23 22:44, Christoph Muellner wrote:
> > From: Christoph Müllner 
> >
> > Recently, two identical XTheadCondMov tests have been added, which both 
> > fail.
> > Let's fix that by changing the following:
> > * Merge both files into one (no need for separate tests for rv32 and rv64)
> > * Drop unrelated attribute check test (we already test for `th.mveqz`
> >and `th.mvnez` instructions, so there is little additional value)
> > * Fix the pattern to allow matching
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/xtheadcondmov-indirect-rv32.c: Moved to...
> >   * gcc.target/riscv/xtheadcondmov-indirect.c: ...here.
> >   * gcc.target/riscv/xtheadcondmov-indirect-rv64.c: Removed.
> I thought this stuff got fixed recently.  Certainly happy to see the
> files merged though.  Here's what I got from the July 4 run:

I see the following failures with GCC master from today
(a454325bea77a0dd79415480d48233a7c296bc0a):

FAIL: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2
scan-assembler .attribute arch,
"rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_xtheadcondmov1p0"
FAIL: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2
scan-assembler .attribute arch,
"rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_xtheadcondmov1p0"

With this patch the fails are gone.

BR
Christoph

>
> > UNSUPPORTED: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O0
> > UNSUPPORTED: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O1
> > PASS: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2  (test for 
> > excess errors)
> > PASS: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2   
> > check-function-bodies ConEmv_imm_imm_reg
> > PASS: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2   
> > check-function-bodies ConEmv_imm_reg_reg
> > PASS: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2   
> > check-function-bodies ConEmv_reg_imm_reg
> > PASS: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2   
> > check-function-bodies ConEmv_reg_reg_reg
> > PASS: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2   
> > check-function-bodies ConNmv_imm_imm_reg
> > PASS: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2   
> > check-function-bodies ConNmv_imm_reg_reg
> > PASS: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2   
> > check-function-bodies ConNmv_reg_imm_reg
> > PASS: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2   
> > check-function-bodies ConNmv_reg_reg_reg
> > PASS: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2   scan-assembler 
> > .attribute arch, 
> > "rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_xtheadcondmov1p0"
> > UNSUPPORTED: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O3 -g
> > UNSUPPORTED: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -Os
> > UNSUPPORTED: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2 -flto 
> > -fno-use-linker-plugin -flto-partition=none
> > UNSUPPORTED: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2 -flto 
> > -fuse-linker-plugin -fno-fat-lto-objects
> > UNSUPPORTED: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O0
> > UNSUPPORTED: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O1
> > PASS: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2  (test for 
> > excess errors)
> > PASS: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2   
> > check-function-bodies ConEmv_imm_imm_reg
> > PASS: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2   
> > check-function-bodies ConEmv_imm_reg_reg
> > PASS: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2   
> > check-function-bodies ConEmv_reg_imm_reg
> > PASS: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2   
> > check-function-bodies ConEmv_reg_reg_reg
> > PASS: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2   
> > check-function-bodies ConNmv_imm_imm_reg
> > PASS: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2   
> > check-function-bodies ConNmv_imm_reg_reg
> > PASS: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2   
> > check-function-bodies ConNmv_reg_imm_reg
> > PASS: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2   
> > check-function-bodies ConNmv_reg_reg_reg
> > PASS: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2   scan-assembler 
> > .attribute arch, 
> > "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_xtheadcondmov1p0"
> > UNSUPPORTED: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O3 -g
> > UNSUPPORTED: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -Os
> > UNSUPPORTED: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2 -flto 
> > -fno-use-linker-plugin -flto-partition=none
> > UNSUPPORTED: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2 -flto 
> > -fuse-linker-plugin -fno-fat-lto-objects
>
>
> jeff


[PATCH V4] VECT: Apply COND_LEN_* into vectorizable_operation

2023-07-12 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, Richard and Richi.
As we discussed before, COND_LEN_* patterns were added for multiple situations.
This patch applies COND_LEN_* for the following situation:

Support for the situation that in "vectorizable_operation":
  /* If operating on inactive elements could generate spurious traps,
 we need to restrict the operation to active lanes.  Note that this
 specifically doesn't apply to unhoisted invariants, since they
 operate on the same value for every lane.

 Similarly, if this operation is part of a reduction, a fully-masked
 loop should only change the active lanes of the reduction chain,
 keeping the inactive lanes as-is.  */
  bool mask_out_inactive = ((!is_invariant && gimple_could_trap_p (stmt))
|| reduc_idx >= 0);

This covers the case where mask_out_inactive is true under length loop control.

So, we can handle the 2 following cases:

1. Integer division:

   #define TEST_TYPE(TYPE)  \
   __attribute__((noipa))   \
   void vrem_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
   {\
 for (int i = 0; i < n; i++)\
   dst[i] = a[i] % b[i];\
   }
   #define TEST_ALL()   \
   TEST_TYPE(int8_t)\
   TEST_ALL()

With this patch:
  
  _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
  ivtmp_45 = _61 * 4;
  vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
  vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
  vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
vect__4.8_48, _61, 0);
  .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);

2. Floating-point arithmetic **WITHOUT** -ffast-math
  
   #define TEST_TYPE(TYPE)  \
   __attribute__((noipa))   \
   void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
   {\
 for (int i = 0; i < n; i++)\
   dst[i] = a[i] + b[i];\
   }
   #define TEST_ALL()   \
   TEST_TYPE(float) \
   TEST_ALL()

With this patch:
   
  _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
  ivtmp_45 = _61 * 4;
  vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
  vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
  vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
vect__4.8_48, _61, 0);
  .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);

With this patch, we can make sure operations won't trap for elements that
are masked out as inactive.
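
For readers unfamiliar with these internal functions, the semantics of a
.COND_LEN_* call can be modeled roughly in scalar C. This is an illustrative
sketch only, not GCC's implementation; the function and parameter names here
are invented for the example:

```c
#include <stddef.h>
#include <stdint.h>

/* Rough scalar model of .COND_LEN_ADD (mask, a, b, fallback, len, bias):
   only the first len + bias lanes whose mask bit is set perform the
   operation; every other lane simply takes the fallback value.  Because
   inactive lanes never execute the operation, they can never trap.  */
static void
cond_len_add (const int *mask, const int32_t *a, const int32_t *b,
              const int32_t *fallback, size_t len, size_t bias,
              int32_t *out, size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    {
      if (i < len + bias && mask[i])
        out[i] = a[i] + b[i];   /* active lane: do the addition */
      else
        out[i] = fallback[i];   /* inactive lane: pass fallback through */
    }
}
```

Since inactive lanes never execute the operation, potentially trapping
operations such as integer division or non-fast-math FP arithmetic stay safe
under length-based loop control.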

gcc/ChangeLog:

* internal-fn.cc (FOR_EACH_CODE_MAPPING): Adapt for COND_LEN_* support.
(CASE): Ditto.
(get_conditional_len_internal_fn): New function.
* internal-fn.h (get_conditional_len_internal_fn): Ditto.
* tree-vect-stmts.cc (vectorizable_operation): Adapt for COND_LEN_* 
support.

---
 gcc/internal-fn.cc | 71 +++---
 gcc/internal-fn.h  |  1 +
 gcc/tree-vect-stmts.cc | 48 +---
 3 files changed, 91 insertions(+), 29 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index f9aaf66cf2a..c11123a1173 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4276,23 +4276,24 @@ static void (*const internal_fn_expanders[]) 
(internal_fn, gcall *) = {
   0
 };
 
-/* Invoke T(CODE, IFN) for each conditional function IFN that maps to a
-   tree code CODE.  */
+/* Invoke T(CODE, SUFFIX) for each conditional function IFN_COND_##SUFFIX
+   that maps to a tree code CODE.  There is also an IFN_COND_LEN_##SUFFIX
+   for each such IFN_COND_##SUFFIX.  */
 #define FOR_EACH_CODE_MAPPING(T) \
-  T (PLUS_EXPR, IFN_COND_ADD) \
-  T (MINUS_EXPR, IFN_COND_SUB) \
-  T (MULT_EXPR, IFN_COND_MUL) \
-  T (TRUNC_DIV_EXPR, IFN_COND_DIV) \
-  T (TRUNC_MOD_EXPR, IFN_COND_MOD) \
-  T (RDIV_EXPR, IFN_COND_RDIV) \
-  T (MIN_EXPR, IFN_COND_MIN) \
-  T (MAX_EXPR, IFN_COND_MAX) \
-  T (BIT_AND_EXPR, IFN_COND_AND) \
-  T (BIT_IOR_EXPR, IFN_COND_IOR) \
-  T (BIT_XOR_EXPR, IFN_COND_XOR) \
-  T (LSHIFT_EXPR, IFN_COND_SHL) \
-  T (RSHIFT_EXPR, IFN_COND_SHR) \
-  T (NEGATE_EXPR, IFN_COND_NEG)
+  T (PLUS_EXPR, ADD) \
+  T (MINUS_EXPR, SUB) \
+  T (MULT_EXPR, MUL) \
+  T (TRUNC_DIV_EXPR, DIV) \
+  T (TRUNC_MOD_EXPR, MOD) \
+  T (RDIV_EXPR, RDIV) \
+  T (MIN_EXPR, MIN) \
+  T (MAX_EXPR, MAX) \
+  T (BIT_AND_EXPR, AND) \
+  T (BIT_IOR_EXPR, IOR) \
+  T (BIT_XOR_EXPR, XOR) \
+  T (LSHIFT_EXPR, SHL) \
+  T (RSHIFT_EXPR, SHR) \
+  T (NEGATE_EXPR, NEG)
 
 /* Return a function that only performs CODE when a certain condition is met
and that uses a given fallback value otherwise.  For example, if CODE is
@@ -4313,7 +4314,7 @@ get_conditional_internal_fn (tree_code code)
 {
   switch (code

Re: Re: [PATCH V3] VECT: Apply COND_LEN_* into vectorizable_operation

2023-07-12 Thread 钟居哲
I fixed the comments as you suggested.

Thanks a lot!
I will merge it soon, once bootstrap && regression testing finish.



juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-07-12 20:14
To: juzhe.zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH V3] VECT: Apply COND_LEN_* into vectorizable_operation
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong 
>
> Hi, Richard and Richi.
> As we discussed before, COND_LEN_* patterns were added for multiple
> situations.
> This patch applies COND_LEN_* for the following situation:
>
> Support for the situation that in "vectorizable_operation":
>   /* If operating on inactive elements could generate spurious traps,
>  we need to restrict the operation to active lanes.  Note that this
>  specifically doesn't apply to unhoisted invariants, since they
>  operate on the same value for every lane.
>
>  Similarly, if this operation is part of a reduction, a fully-masked
>  loop should only change the active lanes of the reduction chain,
>  keeping the inactive lanes as-is.  */
>   bool mask_out_inactive = ((!is_invariant && gimple_could_trap_p (stmt))
> || reduc_idx >= 0);
>
> This covers the case where mask_out_inactive is true under length loop
> control.
>
> So, we can handle the 2 following cases:
>
> 1. Integer division:
>
>#define TEST_TYPE(TYPE) \
>__attribute__((noipa)) \
>void vrem_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
>{ \
>  for (int i = 0; i < n; i++) \
>dst[i] = a[i] % b[i]; \
>}
>#define TEST_ALL() \
>TEST_TYPE(int8_t) \
>TEST_ALL()
>
> With this patch:
>   
>   _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
>   ivtmp_45 = _61 * 4;
>   vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
>   vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
>   vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
> vect__4.8_48, _61, 0);
>   .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);
>
> 2. Floating-point arithmetic **WITHOUT** -ffast-math
>   
>#define TEST_TYPE(TYPE) \
>__attribute__((noipa)) \
>void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
>{ \
>  for (int i = 0; i < n; i++) \
>dst[i] = a[i] + b[i]; \
>}
>#define TEST_ALL() \
>TEST_TYPE(float) \
>TEST_ALL()
>
> With this patch:
>
>   _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
>   ivtmp_45 = _61 * 4;
>   vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
>   vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
>   vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
> vect__4.8_48, _61, 0);
>   .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);
>
> With this patch, we can make sure operations won't trap for elements that 
> "mask_out_inactive".
>
> gcc/ChangeLog:
>
> * internal-fn.cc (FOR_EACH_CODE_MAPPING): Adapt for COND_LEN_* 
> support.
> (CASE): Ditto.
> (get_conditional_len_internal_fn): New function.
> * internal-fn.h (get_conditional_len_internal_fn): Ditto.
> * tree-vect-stmts.cc (vectorizable_operation): Adapt for COND_LEN_* 
> support.
>
> ---
>  gcc/internal-fn.cc | 73 +++---
>  gcc/internal-fn.h  |  1 +
>  gcc/tree-vect-stmts.cc | 48 ---
>  3 files changed, 93 insertions(+), 29 deletions(-)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index f9aaf66cf2a..b288ac6fe6b 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4276,23 +4276,24 @@ static void (*const internal_fn_expanders[]) 
> (internal_fn, gcall *) = {
>0
>  };
>  
> -/* Invoke T(CODE, IFN) for each conditional function IFN that maps to a
> -   tree code CODE.  */
> +/* Invoke T(CODE, SUFFIX) for each conditional function IFN_COND_##SUFFIX
> +   that maps to a tree code CODE.  There is also an IFN_COND_LEN_##SUFFIX
> +   for each such IFN_COND_##SUFFIX.  */
>  #define FOR_EACH_CODE_MAPPING(T) \
> -  T (PLUS_EXPR, IFN_COND_ADD) \
> -  T (MINUS_EXPR, IFN_COND_SUB) \
> -  T (MULT_EXPR, IFN_COND_MUL) \
> -  T (TRUNC_DIV_EXPR, IFN_COND_DIV) \
> -  T (TRUNC_MOD_EXPR, IFN_COND_MOD) \
> -  T (RDIV_EXPR, IFN_COND_RDIV) \
> -  T (MIN_EXPR, IFN_COND_MIN) \
> -  T (MAX_EXPR, IFN_COND_MAX) \
> -  T (BIT_AND_EXPR, IFN_COND_AND) \
> -  T (BIT_IOR_EXPR, IFN_COND_IOR) \
> -  T (BIT_XOR_EXPR, IFN_COND_XOR) \
> -  T (LSHIFT_EXPR, IFN_COND_SHL) \
> -  T (RSHIFT_EXPR, IFN_COND_SHR) \
> -  T (NEGATE_EXPR, IFN_COND_NEG)
> +  T (PLUS_EXPR, ADD) \
> +  T (MINUS_EXPR, SUB) \
> +  T (MULT_EXPR, MUL) \
> +  T (TRUNC_DIV_EXPR, DIV) \
> +  T (TRUNC_MOD_EXPR, MOD) \
> +  T (RDIV_EXPR, RDIV) \
> +  T (MIN_EXPR, MIN) \
> +  T (MAX_EXPR, MAX) \
> +  T (BIT_AND_EXPR, AND) \
> +  T (BIT_IOR_EXPR, IOR) \
> +  T (BIT_XOR_EXPR, XOR) \
> +  T (LSHIFT_EXPR, SHL) \
> +  T (RSHIFT_EXPR, SHR) \
> +  T (NEGATE_EXPR, NEG)
>  
>  /* Return a function that only performs C

Re: [PATCH 0/9] Add btf_decl_tag C attribute

2023-07-12 Thread Richard Biener via Gcc-patches
On Wed, Jul 12, 2023 at 2:44 PM Jose E. Marchesi
 wrote:
>
>
> [Added Eduard Zingerman in CC, who is implementing this same feature in
>  clang/llvm and also the consumer component in the kernel (pahole).]
>
> Hi Richard.
>
> > On Tue, Jul 11, 2023 at 11:58 PM David Faust via Gcc-patches
> >  wrote:
> >>
> >> Hello,
> >>
> >> This series adds support for a new attribute, "btf_decl_tag" in GCC.
> >> The same attribute is already supported in clang, and is used by various
> >> components of the BPF ecosystem.
> >>
> >> The purpose of the attribute is to allow to associate (to "tag")
> >> declarations with arbitrary string annotations, which are emitted into
> >> debugging information (DWARF and/or BTF) to facilitate post-compilation
> >> analysis (the motivating use case being the Linux kernel BPF verifier).
> >> Multiple tags are allowed on the same declaration.
> >>
> >> These strings are not interpreted by the compiler, and the attribute
> >> itself has no effect on generated code, other than to produce additional
> >> DWARF DIEs and/or BTF records conveying the annotations.
> >>
> >> This entails:
> >>
> >> - A new C-language-level attribute which allows to associate (to "tag")
> >>   particular declarations with arbitrary strings.
> >>
> >> - The conveyance of that information in DWARF in the form of a new DIE,
> >>   DW_TAG_GNU_annotation, with tag number (0x6000) and format matching
> >>   that of the DW_TAG_LLVM_annotation extension supported in LLVM for
> >>   the same purpose. These DIEs are already supported by BPF tooling,
> >>   such as pahole.
> >>
> >> - The conveyance of that information in BTF debug info in the form of
> >>   BTF_KIND_DECL_TAG records. These records are already supported by
> >>   LLVM and other tools in the eBPF ecosystem, such as the Linux kernel
> >>   eBPF verifier.
> >>
> >>
> >> Background
> >> ==
> >>
> >> The purpose of these tags is to convey additional semantic information
> >> to post-compilation consumers, in particular the Linux kernel eBPF
> >> verifier. The verifier can make use of that information while analyzing
> >> a BPF program to aid in determining whether to allow or reject the
> >> program to be run. More background on these tags can be found in the
> >> early support for them in the kernel here [1] and [2].
> >>
> >> The "btf_decl_tag" attribute is half the story; the other half is a
> >> sibling attribute "btf_type_tag" which serves the same purpose but
> >> applies to types. Support for btf_type_tag will come in a separate
>> >> patch series, since it is impacted by GCC bug 110439 which needs to be
> >> addressed first.
> >>
> >> I submitted an initial version of this work (including btf_type_tag)
> >> last spring [3], however at the time there were some open questions
> >> about the behavior of the btf_type_tag attribute and issues with its
> >> implementation. Since then we have clarified these details and agreed
> >> to solutions with the BPF community and LLVM BPF folks.
> >>
> >> The main motivation for emitting the tags in DWARF is that the Linux
> >> kernel generates its BTF information via pahole, using DWARF as a source:
> >>
> >>             +--------+  BTF                       BTF      +----------+
> >>             | pahole | -----> vmlinux.btf ---------------> | verifier |
> >>             +--------+                                     +----------+
> >>                 ^                                               ^
> >>                 |                                               |
> >>           DWARF |                                           BTF |
> >>                 |                                               |
> >>             vmlinux                                      +-------------+
> >>             module1.ko                                   | BPF program |
> >>             module2.ko                                   +-------------+
> >>               ...
> >>
> >> This is because:
> >>
> >> a)  pahole adds additional kernel-specific information into the
> >> produced BTF based on additional analysis of kernel objects.
> >>
> >> b)  Unlike GCC, LLVM will only generate BTF for BPF programs.
> >>
> >> c)  GCC can generate BTF for whatever target with -gbtf, but there is no
> >> support for linking/deduplicating BTF in the linker.
> >>
> >> In the scenario above, the verifier needs access to the pointer tags of
> >> both the kernel types/declarations (conveyed in the DWARF and translated
> >> to BTF by pahole) and those of the BPF program (available directly in BTF).
> >>
> >>
> >> DWARF Representation
> >> 
> >>
> >> As noted above, btf_decl_tag is represented in DWARF via a new DIE
> >> DW_TAG_GNU_annotation, with identical format to the LLVM DWARF
> >> extension DW_TAG_LLVM_annotation serving the same purpose. The DIE has
> >> the following format:
> >>
> >>   DW_TAG_GNU_annotation (0x6000)
> >> DW_AT_name: "btf_decl_tag"
> >> DW_AT_const_value: 
> >>
> >> These DIEs are placed in the DWARF tree as children of the DIE for the
> >> appropriate declaration, and one such DIE is created for each occurrence
> >> of the btf_decl_t

[PATCH] tree-optimization/94864 - vector insert of vector extract simplification

2023-07-12 Thread Richard Biener via Gcc-patches
The PRs ask for optimizing of

  _1 = BIT_FIELD_REF ;
  result_4 = BIT_INSERT_EXPR ;

to a vector permutation.  The following implements this as a match.pd
pattern, improving code generation on x86_64.

On the RTL level we face the issue that backend patterns inconsistently
use vec_merge and vec_select of vec_concat to represent permutes.

I think using a (supported) permute is almost always better
than an extract plus insert, maybe excluding the case where we extract
element zero and that is aliased to a register that can be used
directly for insertion (not sure how to query that).

But this regresses for example gcc.target/i386/pr54855-8.c because PRE
now realizes that

  _1 = BIT_FIELD_REF ;
  if (_1 > a_4(D))
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 536870913]:

   [local count: 1073741824]:
  # iftmp.0_2 = PHI <_1(3), a_4(D)(2)>
  x_5 = BIT_INSERT_EXPR ;

is equal to

   [local count: 1073741824]:
  _1 = BIT_FIELD_REF ;
  if (_1 > a_4(D))
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 536870912]:
  _7 = BIT_INSERT_EXPR ;

   [local count: 1073741824]:
  # prephitmp_8 = PHI 

and that no longer produces the desired maxsd operation at the RTL
level (we fail to match .FMAX at the GIMPLE level earlier).

Bootstrapped and tested on x86_64-unknown-linux-gnu with regressions:

FAIL: gcc.target/i386/pr54855-13.c scan-assembler-times vmaxsh[ t] 1
FAIL: gcc.target/i386/pr54855-13.c scan-assembler-not vcomish[ t]
FAIL: gcc.target/i386/pr54855-8.c scan-assembler-times maxsd 1
FAIL: gcc.target/i386/pr54855-8.c scan-assembler-not movsd
FAIL: gcc.target/i386/pr54855-9.c scan-assembler-times minss 1
FAIL: gcc.target/i386/pr54855-9.c scan-assembler-not movss

I think this is also PR88540 (the lack of min/max detection, not
sure if the SSE min/max are suitable here)

PR tree-optimization/94864
PR tree-optimization/94865
* match.pd (bit_insert @0 (BIT_FIELD_REF @1 ..) ..): New pattern
for vector insertion from vector extraction.

* gcc.target/i386/pr94864.c: New testcase.
* gcc.target/i386/pr94865.c: Likewise.
---
 gcc/match.pd| 25 +
 gcc/testsuite/gcc.target/i386/pr94864.c | 13 +
 gcc/testsuite/gcc.target/i386/pr94865.c | 13 +
 3 files changed, 51 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr94864.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr94865.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 8543f777a28..8cc106049c4 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7770,6 +7770,31 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  wi::to_wide (@ipos) + isize))
 (BIT_FIELD_REF @0 @rsize @rpos)
 
+/* Simplify vector inserts of other vector extracts to a permute.  */
+(simplify
+ (bit_insert @0 (BIT_FIELD_REF@2 @1 @rsize @rpos) @ipos)
+ (if (VECTOR_TYPE_P (type)
+  && types_match (@0, @1)
+  && types_match (TREE_TYPE (TREE_TYPE (@0)), TREE_TYPE (@2))
+  && TYPE_VECTOR_SUBPARTS (type).is_constant ())
+  (with
+   {
+ unsigned HOST_WIDE_INT elsz
+   = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (TREE_TYPE (@1;
+ poly_uint64 relt = exact_div (tree_to_poly_uint64 (@rpos), elsz);
+ poly_uint64 ielt = exact_div (tree_to_poly_uint64 (@ipos), elsz);
+ unsigned nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
+ vec_perm_builder builder;
+ builder.new_vector (nunits, nunits, 1);
+ for (unsigned i = 0; i < nunits; ++i)
+   builder.quick_push (known_eq (ielt, i) ? nunits + relt : i);
+ vec_perm_indices sel (builder, 2, nunits);
+   }
+   (if (!VECTOR_MODE_P (TYPE_MODE (type))
+   || can_vec_perm_const_p (TYPE_MODE (type), TYPE_MODE (type), sel, 
false))
+(vec_perm @0 @1 { vec_perm_indices_to_tree
+(build_vector_type (ssizetype, nunits), sel); })
+
 (if (canonicalize_math_after_vectorization_p ())
  (for fmas (FMA)
   (simplify
diff --git a/gcc/testsuite/gcc.target/i386/pr94864.c 
b/gcc/testsuite/gcc.target/i386/pr94864.c
new file mode 100644
index 000..69cb481fcfe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr94864.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-avx" } */
+
+typedef double v2df __attribute__((vector_size(16)));
+
+v2df move_sd(v2df a, v2df b)
+{
+v2df result = a;
+result[0] = b[1];
+return result;
+}
+
+/* { dg-final { scan-assembler "unpckhpd\[\\t \]%xmm0, %xmm1" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr94865.c 
b/gcc/testsuite/gcc.target/i386/pr94865.c
new file mode 100644
index 000..84065ac2467
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr94865.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-avx" } */
+
+typedef double v2df __attribute__((vector_size(16)));
+
+v2df move_sd(v2df a, v2df b)
+{
+v2df result = a;
+result[1] = b[1];
+return result;
+}
+
+/* { dg-final { scan-assembler "shufpd\[\\t 

[PATCH] [og13] OpenACC: Vector length warning fixes for implicit mapping/declare create tests

2023-07-12 Thread Julian Brown
This patch adds expected "vector length" warnings to several tests
for NVPTX.

Tested with offloading to NVPTX. I will apply (to og13) shortly.

2023-07-11  Julian Brown  

libgomp/
* testsuite/libgomp.oacc-c-c++-common/implicit-mapping-1.c: Add
expected warning.
* testsuite/libgomp.oacc-fortran/declare-create-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/declare-create-2.f90: Likewise.
* testsuite/libgomp.oacc-fortran/declare-create-3.f90: Likewise.
* testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-1.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-2.f90:
Likewise.
---
 libgomp/testsuite/libgomp.oacc-c-c++-common/implicit-mapping-1.c | 1 +
 libgomp/testsuite/libgomp.oacc-fortran/declare-create-1.f90  | 1 +
 libgomp/testsuite/libgomp.oacc-fortran/declare-create-2.f90  | 1 +
 libgomp/testsuite/libgomp.oacc-fortran/declare-create-3.f90  | 1 +
 .../testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-1.f90 | 1 +
 .../testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-2.f90 | 1 +
 6 files changed, 6 insertions(+)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/implicit-mapping-1.c 
b/libgomp/testsuite/libgomp.oacc-c-c++-common/implicit-mapping-1.c
index 4825e875998..ed0ab94cd8f 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/implicit-mapping-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/implicit-mapping-1.c
@@ -12,6 +12,7 @@ int main(void)
 #pragma acc enter data copyin(arr[30:10])
 
 #pragma acc serial
+/* { dg-warning {using .vector_length \(32\)., ignoring 1} "" { target 
openacc_nvidia_accel_selected } .-1 } */
   {
 arr[33] = 66;
   }
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/declare-create-1.f90 
b/libgomp/testsuite/libgomp.oacc-fortran/declare-create-1.f90
index 9e7e60f1440..057b5eb958a 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/declare-create-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-create-1.f90
@@ -11,6 +11,7 @@ use m
 mint = 0
 
 !$acc serial
+! { dg-warning {using .vector_length \(32\)., ignoring 1} "" { target 
openacc_nvidia_accel_selected } .-1 }
 mint = 5
 !$acc end serial
 
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/declare-create-2.f90 
b/libgomp/testsuite/libgomp.oacc-fortran/declare-create-2.f90
index 675f6902775..dd7c9798fba 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/declare-create-2.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-create-2.f90
@@ -13,6 +13,7 @@ allocate(mint)
 mint = 0
 
 !$acc serial
+! { dg-warning {using .vector_length \(32\)., ignoring 1} "" { target 
openacc_nvidia_accel_selected } .-1 }
 mint = 5
 !$acc end serial
 
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/declare-create-3.f90 
b/libgomp/testsuite/libgomp.oacc-fortran/declare-create-3.f90
index 16651cb1f5e..7cceaa5f8a3 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/declare-create-3.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-create-3.f90
@@ -13,6 +13,7 @@ allocate(mint(1:20))
 mint = 0
 
 !$acc serial
+! { dg-warning {using .vector_length \(32\)., ignoring 1} "" { target 
openacc_nvidia_accel_selected } .-1 }
 mint = 5
 !$acc end serial
 
diff --git 
a/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-1.f90 
b/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-1.f90
index 4b61e1cee9b..8b173c72d88 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-1.f90
@@ -19,6 +19,7 @@ integer :: arr(*)
 !$acc enter data copyin(arr(1:10))
 
 !$acc serial
+! { dg-warning {using .vector_length \(32\)., ignoring 1} "" { target 
openacc_nvidia_accel_selected } .-1 }
 arr(5) = 5
 !$acc end serial
 
diff --git 
a/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-2.f90 
b/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-2.f90
index daf7089915f..659fe8e3c06 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-2.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-2.f90
@@ -30,6 +30,7 @@ integer :: arr(*)
 ! overwritten.
 
 !$acc serial
+! { dg-warning {using .vector_length \(32\)., ignoring 1} "" { target 
openacc_nvidia_accel_selected } .-1 }
 ! This access is then done via the on-target pointer.
 arr(5) = 5
 !$acc end serial
-- 
2.41.0



[PATCH] OpenMP: Strided/rectangular 'target update' out-of-bounds array lookup fix

2023-07-12 Thread Julian Brown
This patch fixes a bug with the calculation of array bounds in the
metadata for noncontiguous 'target update' directives.  We record the
array base address, a bias and the array length to pass to libgomp --
but at present, we use the 'whole array size' for the last, which means
that at runtime we might look up an array with lower bound "base+bias"
and upper bound "base+bias+length", which for non-zero bias will overflow
the actual bounds of the array on the host and will (sometimes) return
an unrelated block instead of the correct one.

The fix is to instead calculate a size for the array that encloses the
elements to be transferred, and is guaranteed to be entirely within the
array (user errors excepted).
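
The corrected arithmetic can be sketched in plain C (illustrative only; the
function and parameter names are invented, and only the innermost
dimension's length and stride matter, matching the i == 0 case in the
patch):

```c
#include <stddef.h>

/* Sketch of the enclosing-volume computation: dims[0] is the innermost
   (fastest-varying) dimension, whose whole extent is replaced by the span
   actually covered by the transfer; dims[1..ndims-1] contribute their
   full extents.  len0/stride0 are the innermost section length and
   stride, in elements.  */
static size_t
enclosing_elems (const size_t *dims, int ndims, size_t len0, size_t stride0)
{
  size_t outer = 1;
  for (int i = 1; i < ndims; i++)
    outer *= dims[i];           /* full extent of every outer dimension */

  /* The last element touched in the innermost dimension is at offset
     (len0 - 1) * stride0, so this many elements cover them all.  */
  size_t inner = (len0 - 1) * stride0 + 1;

  return outer * inner;
}
```

For stride 1 this degenerates to len0 times the outer volume; for strided
sections it stops at the last transferred element rather than at the end of
the array, so the runtime lookup stays inside the mapped block.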

Tested with offloading to NVPTX.  I will apply (to og13) shortly.

2023-07-11  Julian Brown  

gcc/
* omp-low.cc (lower_omp_target): Calculate volume enclosing
transferred elements instead of using whole array size for
noncontiguous 'target update' operations.
---
 gcc/omp-low.cc | 23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 05ac917fb27..c7706a5921f 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -1,11 +1,13 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, 
omp_context *ctx)
 
tree bias = size_zero_node;
tree volume = size_one_node;
+   tree enclosure = size_one_node;
for (i = dims - 1; i >= 0; i--)
  {
tree dim = (*vdim)[i].value;
tree index = (*vindex)[i].value;
tree stride = (*vstride)[i].value;
+   tree len = (*vlen)[i].value;
 
/* For the bias we want, e.g.:
 
@@ -14463,6 +14465,20 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, 
omp_context *ctx)
   size_binop (MULT_EXPR, volume,
   index_stride));
volume = size_binop (MULT_EXPR, volume, dim);
+
+   if (i == 0)
+ {
+   tree elems_covered = size_binop (MINUS_EXPR, len,
+size_one_node);
+   elems_covered = size_binop (MULT_EXPR, elems_covered,
+   stride);
+   elems_covered = size_binop (PLUS_EXPR, elems_covered,
+   size_one_node);
+   enclosure = size_binop (MULT_EXPR, enclosure,
+   elems_covered);
+ }
+   else
+ enclosure = volume;
  }
 
/* If we don't have a separate span size, use the element size
@@ -14470,10 +14486,9 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, 
omp_context *ctx)
if (!span)
  span = fold_convert (sizetype, elsize);
 
-   /* The size of the whole array -- to make sure we find any
-  part of the array via splay-tree lookup that might be
-  mapped on the target at runtime.  */
-   OMP_CLAUSE_SIZE (oc) = size_binop (MULT_EXPR, arrsize, span);
+   /* The size of a volume enclosing the elements to be
+  transferred.  */
+   OMP_CLAUSE_SIZE (oc) = size_binop (MULT_EXPR, enclosure, span);
/* And the bias of the first element we will update.  */
OMP_CLAUSE_SIZE (dn) = size_binop (MULT_EXPR, bias, span);
 
-- 
2.41.0



Re: [PATCH 0/9] Add btf_decl_tag C attribute

2023-07-12 Thread Jose E. Marchesi via Gcc-patches


> On Wed, Jul 12, 2023 at 2:44 PM Jose E. Marchesi
>  wrote:
>>
>>
>> [Added Eduard Zingerman in CC, who is implementing this same feature in
>>  clang/llvm and also the consumer component in the kernel (pahole).]
>>
>> Hi Richard.
>>
>> > On Tue, Jul 11, 2023 at 11:58 PM David Faust via Gcc-patches
>> >  wrote:
>> >>
>> >> Hello,
>> >>
>> >> This series adds support for a new attribute, "btf_decl_tag" in GCC.
>> >> The same attribute is already supported in clang, and is used by various
>> >> components of the BPF ecosystem.
>> >>
>> >> The purpose of the attribute is to allow to associate (to "tag")
>> >> declarations with arbitrary string annotations, which are emitted into
>> >> debugging information (DWARF and/or BTF) to facilitate post-compilation
>> >> analysis (the motivating use case being the Linux kernel BPF verifier).
>> >> Multiple tags are allowed on the same declaration.
>> >>
>> >> These strings are not interpreted by the compiler, and the attribute
>> >> itself has no effect on generated code, other than to produce additional
>> >> DWARF DIEs and/or BTF records conveying the annotations.
>> >>
>> >> This entails:
>> >>
>> >> - A new C-language-level attribute which allows to associate (to "tag")
>> >>   particular declarations with arbitrary strings.
>> >>
>> >> - The conveyance of that information in DWARF in the form of a new DIE,
>> >>   DW_TAG_GNU_annotation, with tag number (0x6000) and format matching
>> >>   that of the DW_TAG_LLVM_annotation extension supported in LLVM for
>> >>   the same purpose. These DIEs are already supported by BPF tooling,
>> >>   such as pahole.
>> >>
>> >> - The conveyance of that information in BTF debug info in the form of
>> >>   BTF_KIND_DECL_TAG records. These records are already supported by
>> >>   LLVM and other tools in the eBPF ecosystem, such as the Linux kernel
>> >>   eBPF verifier.
>> >>
>> >>
>> >> Background
>> >> ==
>> >>
>> >> The purpose of these tags is to convey additional semantic information
>> >> to post-compilation consumers, in particular the Linux kernel eBPF
>> >> verifier. The verifier can make use of that information while analyzing
>> >> a BPF program to aid in determining whether to allow or reject the
>> >> program to be run. More background on these tags can be found in the
>> >> early support for them in the kernel here [1] and [2].
>> >>
>> >> The "btf_decl_tag" attribute is half the story; the other half is a
>> >> sibling attribute "btf_type_tag" which serves the same purpose but
>> >> applies to types. Support for btf_type_tag will come in a separate
>> >> patch series, since it is impacted by GCC bug 110439 which needs to be
>> >> addressed first.
>> >>
>> >> I submitted an initial version of this work (including btf_type_tag)
>> >> last spring [3], however at the time there were some open questions
>> >> about the behavior of the btf_type_tag attribute and issues with its
>> >> implementation. Since then we have clarified these details and agreed
>> >> to solutions with the BPF community and LLVM BPF folks.
>> >>
>> >> The main motivation for emitting the tags in DWARF is that the Linux
>> >> kernel generates its BTF information via pahole, using DWARF as a source:
>> >>
>> >>             +--------+  BTF                     BTF   +----------+
>> >>             | pahole |-------> vmlinux.btf ---------->| verifier |
>> >>             +--------+                                +----------+
>> >>                 ^                                          ^
>> >>                 |                                          |
>> >>           DWARF |                                      BTF |
>> >>                 |                                          |
>> >>              vmlinux                               +-------------+
>> >>            module1.ko                              | BPF program |
>> >>            module2.ko                              +-------------+
>> >>               ...
>> >>
>> >> This is because:
>> >>
>> >> a)  pahole adds additional kernel-specific information into the
>> >> produced BTF based on additional analysis of kernel objects.
>> >>
>> >> b)  Unlike GCC, LLVM will only generate BTF for BPF programs.
>> >>
>> >> c)  GCC can generate BTF for whatever target with -gbtf, but there is no
>> >> support for linking/deduplicating BTF in the linker.
>> >>
>> >> In the scenario above, the verifier needs access to the pointer tags of
>> >> both the kernel types/declarations (conveyed in the DWARF and translated
>> >> to BTF by pahole) and those of the BPF program (available directly in 
>> >> BTF).
>> >>
>> >>
>> >> DWARF Representation
>> >> 
>> >>
>> >> As noted above, btf_decl_tag is represented in DWARF via a new DIE
>> >> DW_TAG_GNU_annotation, with identical format to the LLVM DWARF
>> >> extension DW_TAG_LLVM_annotation serving the same purpose. The DIE has
>> >> the following format:
>> >>
>> >>   DW_TAG_GNU_annotation (0x6000)
>> >> DW_AT_name: "btf_decl_tag"
>> >> DW_AT_const_value: 
>> >>
>> >> These DIEs are placed in the DWARF tree as childre

Re: [PATCH] tree-optimization/94864 - vector insert of vector extract simplification

2023-07-12 Thread Jeff Law via Gcc-patches




On 7/12/23 07:36, Richard Biener via Gcc-patches wrote:

The PRs ask for optimizing of

   _1 = BIT_FIELD_REF ;
   result_4 = BIT_INSERT_EXPR ;

to a vector permutation.  The following implements this as
match.pd pattern, improving code generation on x86_64.

On the RTL level we face the issue that backend patterns inconsistently
use vec_merge and vec_select of vec_concat to represent permutes.

I think using a (supported) permute is almost always better
than an extract plus insert, maybe excluding the case we extract
element zero and that's aliased to a register that can be used
directly for insertion (not sure how to query that).
So for a target with aliases at the register level, I'd bet they're 
already aware of the aliasing and are prepared to deal with it in the 
target (and are probably already trying to take advantage of that quirk 
when possible).


So I'd just punt that problem to the targets.  If it turns out to be 
common, then we can try to address it, probably at the gimple->rtl border.


jeff



Re: [PATCH] riscv: thead: Fix failing XTheadCondMov tests (indirect-rv[32|64])

2023-07-12 Thread Jeff Law via Gcc-patches




On 7/12/23 06:48, Christoph Müllner wrote:

On Wed, Jul 12, 2023 at 4:05 AM Jeff Law  wrote:




On 7/10/23 22:44, Christoph Muellner wrote:

From: Christoph Müllner 

Recently, two identical XTheadCondMov tests have been added, which both fail.
Let's fix that by changing the following:
* Merge both files into one (no need for separate tests for rv32 and rv64)
* Drop unrelated attribute check test (we already test for `th.mveqz`
and `th.mvnez` instructions, so there is little additional value)
* Fix the pattern to allow matching

gcc/testsuite/ChangeLog:

   * gcc.target/riscv/xtheadcondmov-indirect-rv32.c: Moved to...
   * gcc.target/riscv/xtheadcondmov-indirect.c: ...here.
   * gcc.target/riscv/xtheadcondmov-indirect-rv64.c: Removed.

I thought this stuff got fixed recently.  Certainly happy to see the
files merged though.  Here's what I got from the July 4 run:


I have the following with a GCC master from today
(a454325bea77a0dd79415480d48233a7c296bc0a):

FAIL: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2
scan-assembler .attribute arch,
"rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_xtheadcondmov1p0"
FAIL: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2
scan-assembler .attribute arch,
"rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_xtheadcondmov1p0"

With this patch the fails are gone.

Then it's fine with me :-)

jeff


Re: [PATCH] riscv: thead: Fix failing XTheadCondMov tests (indirect-rv[32|64])

2023-07-12 Thread Philipp Tomsich
On Wed, 12 Jul 2023 at 16:05, Jeff Law  wrote:

>
>
> On 7/12/23 06:48, Christoph Müllner wrote:
> > On Wed, Jul 12, 2023 at 4:05 AM Jeff Law  wrote:
> >>
> >>
> >>
> >> On 7/10/23 22:44, Christoph Muellner wrote:
> >>> From: Christoph Müllner 
> >>>
> >>> Recently, two identical XTheadCondMov tests have been added, which
> both fail.
> >>> Let's fix that by changing the following:
> >>> * Merge both files into one (no need for separate tests for rv32 and
> rv64)
> >>> * Drop unrelated attribute check test (we already test for `th.mveqz`
> >>> and `th.mvnez` instructions, so there is little additional value)
> >>> * Fix the pattern to allow matching
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>>* gcc.target/riscv/xtheadcondmov-indirect-rv32.c: Moved to...
> >>>* gcc.target/riscv/xtheadcondmov-indirect.c: ...here.
> >>>* gcc.target/riscv/xtheadcondmov-indirect-rv64.c: Removed.
> >> I thought this stuff got fixed recently.  Certainly happy to see the
> >> files merged though.  Here's what I got from the July 4 run:
> >
> > I have the following with a GCC master from today
> > (a454325bea77a0dd79415480d48233a7c296bc0a):
> >
> > FAIL: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2
> > scan-assembler .attribute arch,
> > "rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_xtheadcondmov1p0"
> > FAIL: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2
> > scan-assembler .attribute arch,
> > "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_xtheadcondmov1p0"
> >
> > With this patch the fails are gone.
> Then it's fine with me :-)


For the avoidance of all doubt: could I hear an "OK"?

Thanks,
Philipp.


Re: [PATCH] riscv: thead: Fix failing XTheadCondMov tests (indirect-rv[32|64])

2023-07-12 Thread Jeff Law via Gcc-patches




On 7/12/23 08:07, Philipp Tomsich wrote:



On Wed, 12 Jul 2023 at 16:05, Jeff Law <jeffreya...@gmail.com> wrote:




On 7/12/23 06:48, Christoph Müllner wrote:
 > On Wed, Jul 12, 2023 at 4:05 AM Jeff Law <jeffreya...@gmail.com> wrote:
 >>
 >>
 >>
 >> On 7/10/23 22:44, Christoph Muellner wrote:
 >>> From: Christoph Müllner <christoph.muell...@vrull.eu>
 >>>
 >>> Recently, two identical XTheadCondMov tests have been added,
which both fail.
 >>> Let's fix that by changing the following:
 >>> * Merge both files into one (no need for separate tests for
rv32 and rv64)
 >>> * Drop unrelated attribute check test (we already test for
`th.mveqz`
 >>>     and `th.mvnez` instructions, so there is little additional
value)
 >>> * Fix the pattern to allow matching
 >>>
 >>> gcc/testsuite/ChangeLog:
 >>>
 >>>        * gcc.target/riscv/xtheadcondmov-indirect-rv32.c: Moved
to...
 >>>        * gcc.target/riscv/xtheadcondmov-indirect.c: ...here.
 >>>        * gcc.target/riscv/xtheadcondmov-indirect-rv64.c: Removed.
 >> I thought this stuff got fixed recently.  Certainly happy to see the
 >> files merged though.  Here's what I got from the July 4 run:
 >
 > I have the following with a GCC master from today
 > (a454325bea77a0dd79415480d48233a7c296bc0a):
 >
 > FAIL: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2
 > scan-assembler .attribute arch,
 >
"rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_xtheadcondmov1p0"
 > FAIL: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2
 > scan-assembler .attribute arch,
 >
"rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_xtheadcondmov1p0"
 >
 > With this patch the fails are gone.
Then it's fine with me :-)


For the avoidance of all doubt: could I hear an "OK"?

OK for the trunk.
jeff


Re: [PATCH] riscv: thead: Fix failing XTheadCondMov tests (indirect-rv[32|64])

2023-07-12 Thread Kito Cheng via Gcc-patches
Ok

Philipp Tomsich wrote on Wednesday, 12 July 2023 at 22:08:

> On Wed, 12 Jul 2023 at 16:05, Jeff Law  wrote:
>
> >
> >
> > On 7/12/23 06:48, Christoph Müllner wrote:
> > > On Wed, Jul 12, 2023 at 4:05 AM Jeff Law 
> wrote:
> > >>
> > >>
> > >>
> > >> On 7/10/23 22:44, Christoph Muellner wrote:
> > >>> From: Christoph Müllner 
> > >>>
> > >>> Recently, two identical XTheadCondMov tests have been added, which
> > both fail.
> > >>> Let's fix that by changing the following:
> > >>> * Merge both files into one (no need for separate tests for rv32 and
> > rv64)
> > >>> * Drop unrelated attribute check test (we already test for `th.mveqz`
> > >>> and `th.mvnez` instructions, so there is little additional value)
> > >>> * Fix the pattern to allow matching
> > >>>
> > >>> gcc/testsuite/ChangeLog:
> > >>>
> > >>>* gcc.target/riscv/xtheadcondmov-indirect-rv32.c: Moved to...
> > >>>* gcc.target/riscv/xtheadcondmov-indirect.c: ...here.
> > >>>* gcc.target/riscv/xtheadcondmov-indirect-rv64.c: Removed.
> > >> I thought this stuff got fixed recently.  Certainly happy to see the
> > >> files merged though.  Here's what I got from the July 4 run:
> > >
> > > I have the following with a GCC master from today
> > > (a454325bea77a0dd79415480d48233a7c296bc0a):
> > >
> > > FAIL: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2
> > > scan-assembler .attribute arch,
> > >
> "rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_xtheadcondmov1p0"
> > > FAIL: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2
> > > scan-assembler .attribute arch,
> > >
> "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_xtheadcondmov1p0"
> > >
> > > With this patch the fails are gone.
> > Then it's fine with me :-)
>
>
> For the avoidance of all doubt: could I hear an "OK"?
>
> Thanks,
> Philipp.
>


[PATCH] - Devirtualization of array destruction (C++) - 110057

2023-07-12 Thread Ng YongXiang via Gcc-patches
Component:
c++

Bug ID:
110057

Bugzilla link:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110057

Description:
Array should not call virtual destructor of object when array is destructed

ChangeLog:

2023-07-12  Ng YongXiang  

	PR c++/110057
	* Devirtualize auto generated destructor calls of array

cp/
	* init.c: Call non virtual destructor of objects in array

testsuite/
	* g++.dg/devirt-array-destructor-1.C: New.
	* g++.dg/devirt-array-destructor-2.C: New.
	* g++.dg/warn/pr83054.C: Change expected number of devirtualized calls


On Wed, Jul 12, 2023 at 5:02 PM Xi Ruoyao  wrote:

> On Wed, 2023-07-12 at 16:58 +0800, Ng YongXiang via Gcc-patches wrote:
> > I'm writing to seek for a review for an issue I filed some time ago.
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110057 . A proposed patch
> is
> > attached in the bug tracker as well.
>
> You should send the patch to gcc-patches@gcc.gnu.org for a review, see
> https://gcc.gnu.org/contribute.html for the details.  Generally we
> consider patches attached in bugzilla as drafts.
>
> --
> Xi Ruoyao 
> School of Aerospace Science and Technology, Xidian University
>
From aafa45669695520c26504479eb3f21d61ea81edb Mon Sep 17 00:00:00 2001
From: yongxiangng 
Date: Sat, 3 Jun 2023 00:36:32 +0800
Subject: [PATCH] Devirtualize auto generated destructor calls of arrays

---
 gcc/cp/init.cc|  8 +++---
 .../g++.dg/devirt-array-destructor-1.C| 27 ++
 .../g++.dg/devirt-array-destructor-2.C| 28 +++
 gcc/testsuite/g++.dg/warn/pr83054.C   | 24 +++-
 4 files changed, 69 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/devirt-array-destructor-1.C
 create mode 100644 gcc/testsuite/g++.dg/devirt-array-destructor-2.C

diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index 6ccda365b04..69ab51d0a4b 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -4112,8 +4112,8 @@ build_vec_delete_1 (location_t loc, tree base, tree maxindex, tree type,
   if (type_build_dtor_call (type))
 	{
 	  tmp = build_delete (loc, ptype, base, sfk_complete_destructor,
-			  LOOKUP_NORMAL|LOOKUP_DESTRUCTOR, 1,
-			  complain);
+			  LOOKUP_NORMAL|LOOKUP_DESTRUCTOR|LOOKUP_NONVIRTUAL,
+			  1, complain);
 	  if (tmp == error_mark_node)
 	return error_mark_node;
 	}
@@ -4143,8 +4143,8 @@ build_vec_delete_1 (location_t loc, tree base, tree maxindex, tree type,
 return error_mark_node;
   body = build_compound_expr (loc, body, tmp);
   tmp = build_delete (loc, ptype, tbase, sfk_complete_destructor,
-		  LOOKUP_NORMAL|LOOKUP_DESTRUCTOR, 1,
-		  complain);
+		  LOOKUP_NORMAL|LOOKUP_DESTRUCTOR|LOOKUP_NONVIRTUAL,
+		  1, complain);
   if (tmp == error_mark_node)
 return error_mark_node;
   body = build_compound_expr (loc, body, tmp);
diff --git a/gcc/testsuite/g++.dg/devirt-array-destructor-1.C b/gcc/testsuite/g++.dg/devirt-array-destructor-1.C
new file mode 100644
index 000..be2d16ae761
--- /dev/null
+++ b/gcc/testsuite/g++.dg/devirt-array-destructor-1.C
@@ -0,0 +1,27 @@
+/* { dg-do run } */
+/* Virtual calls should be devirtualized because we know dynamic type of object in array at compile time */
+/* { dg-options "-O3 -fdump-tree-optimized -fno-inline"  } */
+
+class A
+{
+public:
+  virtual ~A()
+  {
+  }
+};
+
+class B : public A
+{
+public:
+  virtual ~B()
+  {
+  }
+};
+
+int main()
+{
+  B b[10];
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "OBJ_TYPE_REF" 0 "optimized"} } */
diff --git a/gcc/testsuite/g++.dg/devirt-array-destructor-2.C b/gcc/testsuite/g++.dg/devirt-array-destructor-2.C
new file mode 100644
index 000..0b3ab2ca9d0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/devirt-array-destructor-2.C
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* Virtual calls should be devirtualized because we know dynamic type of object in array at compile time */
+/* { dg-options "-O3 -fdump-tree-optimized -fno-inline"  } */
+
+class A
+{
+public:
+  virtual ~A()
+  {
+  }
+};
+
+class B : public A
+{
+public:
+  virtual ~B()
+  {
+  }
+};
+
+int main()
+{
+  B* ptr = new B[10];
+  delete[] ptr;
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "OBJ_TYPE_REF" 0 "optimized"} } */
diff --git a/gcc/testsuite/g++.dg/warn/pr83054.C b/gcc/testsuite/g++.dg/warn/pr83054.C
index 5285f94acee..7cd0951713d 100644
--- a/gcc/testsuite/g++.dg/warn/pr83054.C
+++ b/gcc/testsuite/g++.dg/warn/pr83054.C
@@ -10,7 +10,7 @@
 #endif
 
 extern "C" int printf (const char *, ...);
-struct foo // { dg-warning "final would enable devirtualization of 5 calls" }
+struct foo // { dg-warning "final would enable devirtualization of 1 call" }
 {
   static int count;
   void print (int i, int j) { printf ("foo[%d][%d] = %d\n", i, j, x); }
@@ -29,19 +29,15 @@ int foo::count;
 
 int main ()
 {
-  {
-foo array[3][3];
-for (int i = 0; i < 3; i++)
-  {
-	for (int j = 0; j < 3; j++)
-	  {
-	printf("&a[%d][%d] = %x\n", i, j, (void *)&arra

Re: [PATCH] RISC-V: Support COND_LEN_* patterns

2023-07-12 Thread 钟居哲
The middle-end vectorizer patch is approved and soon will be merged.

The middle-end dependency is resolved.

Ok for trunk?


juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-07-12 12:44
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Support COND_LEN_* patterns
This patch is depending on the following patch on Vectorizer:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624179.html
 
With this patch, we can handle operations that may trap on elements outside of the loop.
 
The following 2 cases will be addressed by this patch:
 
1. integer division:
 
  #define TEST_TYPE(TYPE) \
  __attribute__((noipa)) \
  void vrem_##TYPE (TYPE * __restrict dst, TYPE * __restrict a, TYPE * 
__restrict b, int n) \
  { \
for (int i = 0; i < n; i++) \
  dst[i] = a[i] % b[i]; \
  }
  #define TEST_ALL() \
   TEST_TYPE(int8_t) \
  TEST_ALL()
 
  Before this patch:
 
   vrem_int8_t:
        ble     a3,zero,.L14
        csrr    t4,vlenb
        addiw   a5,a3,-1
        addiw   a4,t4,-1
        sext.w  t5,a3
        bltu    a5,a4,.L10
        csrr    t3,vlenb
        subw    t3,t5,t3
        li      a5,0
        vsetvli t6,zero,e8,m1,ta,ma
.L4:
        add     a6,a2,a5
        add     a7,a0,a5
        add     t1,a1,a5
        mv      a4,a5
        add     a5,a5,t4
        vl1re8.v        v2,0(a6)
        vl1re8.v        v1,0(t1)
        sext.w  a6,a5
        vrem.vv v1,v1,v2
        vs1r.v  v1,0(a7)
        bleu    a6,t3,.L4
        csrr    a5,vlenb
        addw    a4,a4,a5
        sext.w  a5,a4
        beq     t5,a4,.L16
.L3:
        csrr    a6,vlenb
        subw    t5,t5,a4
        srli    a6,a6,1
        addiw   t1,t5,-1
        addiw   a7,a6,-1
        bltu    t1,a7,.L9
        slli    a4,a4,32
        srli    a4,a4,32
        add     t0,a1,a4
        add     t6,a2,a4
        add     a4,a0,a4
        vsetvli a7,zero,e8,mf2,ta,ma
        sext.w  t3,a6
        vle8.v  v1,0(t0)
        vle8.v  v2,0(t6)
        subw    t4,t5,a6
        vrem.vv v1,v1,v2
        vse8.v  v1,0(a4)
        mv      t1,t3
        bltu    t4,t3,.L7
        csrr    t1,vlenb
        add     a4,a4,a6
        add     t0,t0,a6
        add     t6,t6,a6
        sext.w  t1,t1
        vle8.v  v1,0(t0)
        vle8.v  v2,0(t6)
        vrem.vv v1,v1,v2
        vse8.v  v1,0(a4)
.L7:
        addw    a5,t1,a5
        beq     t5,t1,.L14
.L9:
        add     a4,a1,a5
        add     a6,a2,a5
        lb      a6,0(a6)
        lb      a4,0(a4)
        add     a7,a0,a5
        addi    a5,a5,1
        remw    a4,a4,a6
        sext.w  a6,a5
        sb      a4,0(a7)
        bgt     a3,a6,.L9
.L14:
        ret
.L10:
        li      a4,0
        li      a5,0
        j       .L3
.L16:
        ret
 
After this patch:
 
   vrem_int8_t:
ble a3,zero,.L5
.L3:
vsetvli a5,a3,e8,m1,tu,ma
vle8.v v1,0(a1)
vle8.v v2,0(a2)
sub a3,a3,a5
vrem.vv v1,v1,v2
vse8.v v1,0(a0)
add a1,a1,a5
add a2,a2,a5
add a0,a0,a5
bne a3,zero,.L3
.L5:
ret
 
2. Floating-point operation **WITHOUT** -ffast-math:
 
#define TEST_TYPE(TYPE) \
__attribute__((noipa)) \
void vadd_##TYPE (TYPE * __restrict dst, TYPE *__restrict a, TYPE 
*__restrict b, int n) \
{ \
  for (int i = 0; i < n; i++) \
dst[i] = a[i] + b[i]; \
}
 
#define TEST_ALL() \
 TEST_TYPE(float) \
 
TEST_ALL()
   
Before this patch:
   
   vadd_float:
        ble     a3,zero,.L10
        csrr    a4,vlenb
        srli    t3,a4,2
        addiw   a5,a3,-1
        addiw   a6,t3,-1
        sext.w  t6,a3
        bltu    a5,a6,.L7
        subw    t5,t6,t3
        mv      t1,a1
        mv      a7,a2
        mv      a6,a0
        li      a5,0
        vsetvli t4,zero,e32,m1,ta,ma
.L4:
        vl1re32.v       v1,0(t1)
        vl1re32.v       v2,0(a7)
        addw    a5,a5,t3
        vfadd.vv        v1,v1,v2
        vs1r.v  v1,0(a6)
        add     t1,t1,a4
        add     a7,a7,a4
        add     a6,a6,a4
        bgeu    t5,a5,.L4
        beq     t6,a5,.L10
        sext.w  a5,a5
.L3:
        slli    a4,a5,2
.L6:
        add     a6,a1,a4
        add     a7,a2,a4
        flw     fa4,0(a6)
        flw     fa5,0(a7)
        add     a6,a0,a4
        addiw   a5,a5,1
        fadd.s  fa5,fa5,fa4
        addi    a4,a4,4
        fsw     fa5,0(a6)
        bgt     a3,a5,.L6
.L10:
        ret
.L7:
        li      a5,0
        j       .L3
 
After this patch:
 
   vadd_float:
ble a3,zero,.L5
.L3:
vsetvli a5,a3,e32,m1,tu,ma
slli a4,a5,2
vle32.v v1,0(a1)
vle32.v v2,0(a2)
sub a3,a3,a5
vfadd.vv v1,v1,v2
vse32.v v1,0(a0)
add a1,a1,a4
add a2,a2,a4
add a0,a0,a4
bne a3,zero,.L3
.L5:
ret
  
gcc/ChangeLog:
 
* config/riscv/autovec.md (cond_len_): New pattern.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_cond_len_binop): New function.
* config/riscv/riscv-v.cc (emit_nonvlmax_tu_insn): Ditto.
(emit_nonvlmax_fp_tu_insn): Ditto.
(need_frm_p): Ditto.
(expand_cond_len_binop): Ditto.
* config/riscv/riscv.cc

Re: [PATCH v2] Implement new RTL optimizations pass: fold-mem-offsets.

2023-07-12 Thread Jeff Law via Gcc-patches




On 7/12/23 03:12, Manolis Tsamis wrote:

On Mon, Jul 10, 2023 at 12:58 AM Hans-Peter Nilsson  wrote:


On Sun, 9 Jul 2023, Hans-Peter Nilsson wrote:


On Thu, 15 Jun 2023, Manolis Tsamis wrote:


This is a new RTL pass that tries to optimize memory offset calculations
by moving them from add immediate instructions to the memory loads/stores.



It punts on all "use" insns that are not SET.
Why not use single_set there too?


Also, I don't see insn costs considered?
(Also: typo "immidiate".)



The only change that this pass does is to change offsets where
possible and then simplify add immediate instructions to register
moves.
I don't see how this could result in worse performance and by
extension I don't see where insn costs could be used.
Do you have any thoughts about where to use the costs?
If the offset crosses an architectural size boundary such that the 
instruction was longer, but still valid, it could affect the cost.


That's the most obvious case to me.  There may be others.

Any progress on that m68k issue?  I've also got a report of x264 failing 
to build on riscv64 with the V2 variant, but I haven't distilled that 
down to a testcase yet.


jeff


Re: [PATCH] tree-optimization/94864 - vector insert of vector extract simplification

2023-07-12 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> The PRs ask for optimizing of
>
>   _1 = BIT_FIELD_REF ;
>   result_4 = BIT_INSERT_EXPR ;
>
> to a vector permutation.  The following implements this as
> match.pd pattern, improving code generation on x86_64.
>
> On the RTL level we face the issue that backend patterns inconsistently
> use vec_merge and vec_select of vec_concat to represent permutes.

Yeah, the current RTL codes probably overlap a bit too much.

Maybe we should have a rule that a vec_merge with a constant
third operand should be canonicalised to a vec_select?  And maybe
change the first operand of vec_select to be an rtvec, so that
no separate vec_concat (and thus wider mode) is needed for two-input
permutes?  Would be a lot of work though...

> I think using a (supported) permute is almost always better
> than an extract plus insert, maybe excluding the case we extract
> element zero and that's aliased to a register that can be used
> directly for insertion (not sure how to query that).

Yeah, extraction of the low element (0 for LE, N-1 for BE) is special
in RTL, in that it is now folded to a subreg.  But IMO it's reasonable
for even that case to go through TARGET_VECTORIZE_VEC_PERM_CONST,
maybe with a target-independent helper function to match permute
vectors that are equivalent to extract-and-insert.

On AArch64, extract-and-insert is a single operation for other
elements too, e.g.:

ins v0.s[2], v1.s[1]

is a thing.  But if the helper returns the index of the extracted
elements, targets can decide for themselves whether the index is
supported or not.

Agree that this is the right thing for gimple to do FWIW.

Thanks,
Richard

> But this regresses for example gcc.target/i386/pr54855-8.c because PRE
> now realizes that
>
>   _1 = BIT_FIELD_REF ;
>   if (_1 > a_4(D))
> goto ; [50.00%]
>   else
> goto ; [50.00%]
>
>[local count: 536870913]:
>
>[local count: 1073741824]:
>   # iftmp.0_2 = PHI <_1(3), a_4(D)(2)>
>   x_5 = BIT_INSERT_EXPR ;
>
> is equal to
>
>[local count: 1073741824]:
>   _1 = BIT_FIELD_REF ;
>   if (_1 > a_4(D))
> goto ; [50.00%]
>   else
> goto ; [50.00%]
>
>[local count: 536870912]:
>   _7 = BIT_INSERT_EXPR ;
>
>[local count: 1073741824]:
>   # prephitmp_8 = PHI 
>
> and that no longer produces the desired maxsd operation at the RTL
> level (we fail to match .FMAX at the GIMPLE level earlier).
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu with regressions:
>
> FAIL: gcc.target/i386/pr54855-13.c scan-assembler-times vmaxsh[ t] 1
> FAIL: gcc.target/i386/pr54855-13.c scan-assembler-not vcomish[ t]
> FAIL: gcc.target/i386/pr54855-8.c scan-assembler-times maxsd 1
> FAIL: gcc.target/i386/pr54855-8.c scan-assembler-not movsd
> FAIL: gcc.target/i386/pr54855-9.c scan-assembler-times minss 1
> FAIL: gcc.target/i386/pr54855-9.c scan-assembler-not movss
>
> I think this is also PR88540 (the lack of min/max detection, not
> sure if the SSE min/max are suitable here)
>
>   PR tree-optimization/94864
>   PR tree-optimization/94865
>   * match.pd (bit_insert @0 (BIT_FIELD_REF @1 ..) ..): New pattern
>   for vector insertion from vector extraction.
>
>   * gcc.target/i386/pr94864.c: New testcase.
>   * gcc.target/i386/pr94865.c: Likewise.
> ---
>  gcc/match.pd| 25 +
>  gcc/testsuite/gcc.target/i386/pr94864.c | 13 +
>  gcc/testsuite/gcc.target/i386/pr94865.c | 13 +
>  3 files changed, 51 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr94864.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr94865.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 8543f777a28..8cc106049c4 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -7770,6 +7770,31 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> wi::to_wide (@ipos) + isize))
>  (BIT_FIELD_REF @0 @rsize @rpos)
>  
> +/* Simplify vector inserts of other vector extracts to a permute.  */
> +(simplify
> + (bit_insert @0 (BIT_FIELD_REF@2 @1 @rsize @rpos) @ipos)
> + (if (VECTOR_TYPE_P (type)
> +  && types_match (@0, @1)
> +  && types_match (TREE_TYPE (TREE_TYPE (@0)), TREE_TYPE (@2))
> +  && TYPE_VECTOR_SUBPARTS (type).is_constant ())
> +  (with
> +   {
> + unsigned HOST_WIDE_INT elsz
> +   = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (TREE_TYPE (@1;
> + poly_uint64 relt = exact_div (tree_to_poly_uint64 (@rpos), elsz);
> + poly_uint64 ielt = exact_div (tree_to_poly_uint64 (@ipos), elsz);
> + unsigned nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
> + vec_perm_builder builder;
> + builder.new_vector (nunits, nunits, 1);
> + for (unsigned i = 0; i < nunits; ++i)
> +   builder.quick_push (known_eq (ielt, i) ? nunits + relt : i);
> + vec_perm_indices sel (builder, 2, nunits);
> +   }
> +   (if (!VECTOR_MODE_P (TYPE_MODE (type))
> + || can_vec_perm_const_p (TYPE_MODE (type), TYPE_MODE (type), sel, 
> false))
> + 

[committed] libgomp.texi: add cross ref, remove duplicated entry

2023-07-12 Thread Tobias Burnus

Committed as r14-2468-g13c3e29d47e359

"Some are only stubs" sounded worse than the actual status, and we now
have a rather extensive and complete section about this topic.

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
commit 13c3e29d47e359b2f05ea98d61710fc162ba6d31
Author: Tobias Burnus 
Date:   Wed Jul 12 16:14:20 2023 +0200

libgomp.texi: add cross ref, remove duplicated entry

libgomp/

* libgomp.texi (OpenMP 5.0): Replace '... stub' by @ref to
'Memory allocation' section which contains the full status.
(TR11): Remove differently worded duplicated entry.
---
 libgomp/libgomp.texi | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 9d910e6883c..1645cc0a2d3 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -221,7 +221,7 @@ The OpenMP 4.5 specification is fully supported.
 @item @code{mutexinoutset} @emph{dependence-type} for @code{depend} clause
   @tab Y @tab
 @item Predefined memory spaces, memory allocators, allocator traits
-  @tab Y @tab Some are only stubs
+  @tab Y @tab See also @ref{Memory allocation}
 @item Memory management routines @tab Y @tab
 @item @code{allocate} directive @tab N @tab
 @item @code{allocate} clause @tab P @tab Initial support
@@ -487,8 +487,6 @@ Technical Report (TR) 11 is the first preview for OpenMP 6.0.
 @item Mapping lambda captures @tab N @tab
 @item For Fortran, atomic compare with storing the comparison result
   @tab N @tab
-@item @code{aligned} clause changes for @code{simd} and @code{declare simd}
-  @tab N @tab
 @end multitable
 
 


Re: [PATCH] RISC-V: Support COND_LEN_* patterns

2023-07-12 Thread Robin Dapp via Gcc-patches
Hi Juzhe,

> +/* Return true if the operation is the floating-point operation need FRM.  */
> +static bool
> +need_frm_p (rtx_code code, machine_mode mode)
> +{
> +  if (!FLOAT_MODE_P (mode))
> +return false;
> +  return code != SMIN && code != SMAX;
> +}

Return true if the operation requires a rounding mode operand.  Maybe also
call it needs_fp_rounding?

> +  if (need_frm_p (code, mode))
> + emit_nonvlmax_fp_tu_insn (icode, RVV_BINOP_MU, ops, len);
> +  else
> + emit_nonvlmax_tu_insn (icode, RVV_BINOP_MU, ops, len);
> +}

This feels like we could decide it inside emit_nonvlmax_tu_insn.
Same for without _tu.  But let's keep it like this for now in
order not to stall progress.

> +/* Implement TARGET_PREFERRED_ELSE_VALUE.  For binary operations,
> +   prefer to use the first arithmetic operand as the else value if
> +   the else value doesn't matter, since that exactly matches the SVE
> +   destructive merging form.  For ternary operations we could either
> +   pick the first operand and use FMAD-like instructions or the last
> +   operand and use FMLA-like instructions; the latter seems more
> +   natural.  */

What's FMLA?  That's SVE I suppose and ours is fmacc?

Apart from that fine from my side, thanks for supporting this.

Regards
 Robin



Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization

2023-07-12 Thread Jeff Law via Gcc-patches




On 7/12/23 01:27, Richard Biener wrote:



Using SSA_NAME_DEF_STMT during expansion is OK, but I don't think you
can rely on REG_EXPR here since you don't know whether any coalescing
happened.  That said, maybe the implementation currently guarantees
you'll only see a REG_EXPR SSA name if there's a single definition
of that register, but at least I'm not aware of that and this is also
not documented.
If anyone knows if the implementation guarantees that, it'd probably be 
Michael, since he did the revamping of the expansion code years ago.




I wonder if you can recover vlse.v at combine time though?
It may be hard to recover at combine time -- our vector insns aren't in 
forms that are easily digested by combine.  In this specific case we 
have hope though.  Essentially combine would need to recognize the 
offsets vector as a simple stride and adjust appropriately.




That said, if the ISA supports gather/scatter with an affine offset
the more appropriate way would be to add additional named expanders
for this and deal with the above in the middle-end during RTL
expansion instead.
It's worth a try.  I didn't have much luck with this at Tachyum, but I 
always expected it was a misunderstanding of some parts of the 
vectorizer on my part.  I was deep inside this class of problems when I 
had to push it on the stack to develop a golang port :(


We were basically going down the path of treating everything as a 
scatter-gather, but trying to recognize strides in the offsets vector as 
a degenerate case.


jeff


RE: Re: [PATCH V3] VECT: Apply COND_LEN_* into vectorizable_operation

2023-07-12 Thread Li, Pan2 via Gcc-patches
Committed v4 as it passed both the regression and bootstrap tests; thanks to 
both Richards.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of ???
Sent: Wednesday, July 12, 2023 9:19 PM
To: richard.sandiford 
Cc: gcc-patches ; rguenther 
Subject: Re: Re: [PATCH V3] VECT: Apply COND_LEN_* into vectorizable_operation

I fixed the comments as you suggested.

Thanks a lot!
 Will merge it soon once I finish the bootstrap && regression.



juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-07-12 20:14
To: juzhe.zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH V3] VECT: Apply COND_LEN_* into vectorizable_operation
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong 
>
> Hi, Richard and Richi.
> As we discussed before, COND_LEN_* patterns were added for multiple 
> situations.
> This patch applies COND_LEN_* to the following situations:
>
> Support for the situation that in "vectorizable_operation":
>   /* If operating on inactive elements could generate spurious traps,
>  we need to restrict the operation to active lanes.  Note that this
>  specifically doesn't apply to unhoisted invariants, since they
>  operate on the same value for every lane.
>
>  Similarly, if this operation is part of a reduction, a fully-masked
>  loop should only change the active lanes of the reduction chain,
>  keeping the inactive lanes as-is.  */
>   bool mask_out_inactive = ((!is_invariant && gimple_could_trap_p (stmt))
> || reduc_idx >= 0);
>
> This covers the case where mask_out_inactive is true with length loop control.
>
> So, we can handle the 2 following cases:
>
> 1. Integer division:
>
>#define TEST_TYPE(TYPE) \
>__attribute__((noipa)) \
>void vrem_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
>{ \
>  for (int i = 0; i < n; i++) \
>dst[i] = a[i] % b[i]; \
>}
>#define TEST_ALL() \
>TEST_TYPE(int8_t) \
>TEST_ALL()
>
> With this patch:
>   
>   _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
>   ivtmp_45 = _61 * 4;
>   vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
>   vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
>   vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
> vect__4.8_48, _61, 0);
>   .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);
>
> 2. Floating-point arithmetic **WITHOUT** -ffast-math
>   
>#define TEST_TYPE(TYPE) \
>__attribute__((noipa)) \
>void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n) \
>{ \
>  for (int i = 0; i < n; i++) \
>dst[i] = a[i] + b[i]; \
>}
>#define TEST_ALL() \
>TEST_TYPE(float) \
>TEST_ALL()
>
> With this patch:
>
>   _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
>   ivtmp_45 = _61 * 4;
>   vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
>   vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
>   vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, 
> vect__4.8_48, _61, 0);
>   .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);
>
> With this patch, we can make sure operations won't trap for elements that 
> "mask_out_inactive".
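A scalar model of what the commit message describes, with hypothetical names: under length control only the first LEN lanes are evaluated, so a trapping operand (for example a zero divisor) in an inactive tail lane is never touched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a .COND_LEN_MOD-style operation: lanes below len are
   computed; lanes at or beyond len keep their old dst value and never
   read b[i], so a zero divisor in the tail cannot trap.  */
static void
cond_len_mod (int32_t *dst, const int32_t *a, const int32_t *b,
              size_t len, size_t lanes)
{
  for (size_t i = 0; i < lanes; i++)
    if (i < len)
      dst[i] = a[i] % b[i];
}
```

An unconditional vector divide over the whole register would instead evaluate every lane, including the one with the zero divisor.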
>
> gcc/ChangeLog:
>
> * internal-fn.cc (FOR_EACH_CODE_MAPPING): Adapt for COND_LEN_* 
> support.
> (CASE): Ditto.
> (get_conditional_len_internal_fn): New function.
> * internal-fn.h (get_conditional_len_internal_fn): Ditto.
> * tree-vect-stmts.cc (vectorizable_operation): Adapt for COND_LEN_* 
> support.
>
> ---
>  gcc/internal-fn.cc | 73 +++---
>  gcc/internal-fn.h  |  1 +
>  gcc/tree-vect-stmts.cc | 48 ---
>  3 files changed, 93 insertions(+), 29 deletions(-)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index f9aaf66cf2a..b288ac6fe6b 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4276,23 +4276,24 @@ static void (*const internal_fn_expanders[]) 
> (internal_fn, gcall *) = {
>0
>  };
>  
> -/* Invoke T(CODE, IFN) for each conditional function IFN that maps to a
> -   tree code CODE.  */
> +/* Invoke T(CODE, SUFFIX) for each conditional function IFN_COND_##SUFFIX
> +   that maps to a tree code CODE.  There is also an IFN_COND_LEN_##SUFFIX
> +   for each such IFN_COND_##SUFFIX.  */
>  #define FOR_EACH_CODE_MAPPING(T) \
> -  T (PLUS_EXPR, IFN_COND_ADD) \
> -  T (MINUS_EXPR, IFN_COND_SUB) \
> -  T (MULT_EXPR, IFN_COND_MUL) \
> -  T (TRUNC_DIV_EXPR, IFN_COND_DIV) \
> -  T (TRUNC_MOD_EXPR, IFN_COND_MOD) \
> -  T (RDIV_EXPR, IFN_COND_RDIV) \
> -  T (MIN_EXPR, IFN_COND_MIN) \
> -  T (MAX_EXPR, IFN_COND_MAX) \
> -  T (BIT_AND_EXPR, IFN_COND_AND) \
> -  T (BIT_IOR_EXPR, IFN_COND_IOR) \
> -  T (BIT_XOR_EXPR, IFN_COND_XOR) \
> -  T (LSHIFT_EXPR, IFN_COND_SHL) \
> -  T (RSHIFT_EXPR, IFN_COND_SHR) \
> -  T (NEGATE_EXPR, IFN_COND_NEG)
> +  T (PLUS_EXPR, ADD) \
> +  T (MINUS_EXPR, SUB) \
> +  T (MULT_EXPR, MUL) \
> +  T (TRUNC_DIV_EXPR, DIV) 

Re: Re: [PATCH] RISC-V: Support COND_LEN_* patterns

2023-07-12 Thread 钟居哲 (Juzhe Zhong)
>> Return true if the operation requires a rounding mode operand.  Maybe also
>>call it needs_fp_rounding?
ok

>>What's FMLA?  That's SVE I suppose and ours is fmacc?
Yes, the comment is misleading; I will fix it soon.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-07-12 22:24
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Support COND_LEN_* patterns
Hi Juzhe,
 
> +/* Return true if the operation is the floating-point operation need FRM.  */
> +static bool
> +need_frm_p (rtx_code code, machine_mode mode)
> +{
> +  if (!FLOAT_MODE_P (mode))
> +return false;
> +  return code != SMIN && code != SMAX;
> +}
 
Return true if the operation requires a rounding mode operand.  Maybe also
call it needs_fp_rounding?
 
> +  if (need_frm_p (code, mode))
> + emit_nonvlmax_fp_tu_insn (icode, RVV_BINOP_MU, ops, len);
> +  else
> + emit_nonvlmax_tu_insn (icode, RVV_BINOP_MU, ops, len);
> +}
 
This feels like we could decide it inside emit_nonvlmax_tu_insn.
Same for without _tu.  But let's keep it like this for now in
order not to stall progress.
 
> +/* Implement TARGET_PREFERRED_ELSE_VALUE.  For binary operations,
> +   prefer to use the first arithmetic operand as the else value if
> +   the else value doesn't matter, since that exactly matches the SVE
> +   destructive merging form.  For ternary operations we could either
> +   pick the first operand and use FMAD-like instructions or the last
> +   operand and use FMLA-like instructions; the latter seems more
> +   natural.  */
 
What's FMLA?  That's SVE I suppose and ours is fmacc?
 
Apart from that fine from my side, thanks for supporting this.
 
Regards
Robin
 
 


[committed] ifcvt: Change return type of predicate functions from int to bool

2023-07-12 Thread Uros Bizjak via Gcc-patches
Also change some internal variables and function arguments from int to bool.

gcc/ChangeLog:

* ifcvt.cc (cond_exec_changed_p): Change variable to bool.
(last_active_insn): Change "skip_use_p" function argument to bool.
(noce_operand_ok): Change return type from int to bool.
(find_cond_trap): Ditto.
(block_jumps_and_fallthru_p): Change "fallthru_p" and
"jump_p" variables to bool.
(noce_find_if_block): Change return type from int to bool.
(cond_exec_find_if_block): Ditto.
(find_if_case_1): Ditto.
(find_if_case_2): Ditto.
(dead_or_predicable): Ditto. Change "reversep" function arg to bool.
(block_jumps_and_fallthru): Rename from block_jumps_and_fallthru_p.
(cond_exec_process_insns): Change return type from int to bool.
Change "mod_ok" function arg to bool.
(cond_exec_process_if_block): Change return type from int to bool.
Change "do_multiple_p" function arg to bool.  Change "then_mod_ok"
variable to bool.
(noce_emit_store_flag): Change return type from int to bool.
Change "reversep" function arg to bool.  Change "cond_complex"
variable to bool.
(noce_try_move): Change return type from int to bool.
(noce_try_ifelse_collapse): Ditto.
(noce_try_store_flag): Ditto. Change "reversep" variable to bool.
(noce_try_addcc): Change return type from int to bool.  Change
"subtract" variable to bool.
(noce_try_store_flag_constants): Change return type from int to bool.
(noce_try_store_flag_mask): Ditto.  Change "reversep" variable to bool.
(noce_try_cmove): Change return type from int to bool.
(noce_try_cmove_arith): Ditto. Change "is_mem" variable to bool.
(noce_try_minmax): Change return type from int to bool.  Change
"unsignedp" variable to bool.
(noce_try_abs): Change return type from int to bool.  Change
"negate" variable to bool.
(noce_try_sign_mask): Change return type from int to bool.
(noce_try_move): Ditto.
(noce_try_store_flag_constants): Ditto.
(noce_try_cmove): Ditto.
(noce_try_cmove_arith): Ditto.
(noce_try_minmax): Ditto.  Change "unsignedp" variable to bool.
(noce_try_bitop): Change return type from int to bool.
(noce_operand_ok): Ditto.
(noce_convert_multiple_sets): Ditto.
(noce_convert_multiple_sets_1): Ditto.
(noce_process_if_block): Ditto.
(check_cond_move_block): Ditto.
(cond_move_process_if_block): Ditto. Change "success_p"
variable to bool.
(rest_of_handle_if_conversion): Change return type to void.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 0b180b4568f..a0af553b9ff 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -73,29 +73,29 @@ static int num_updated_if_blocks;
 static int num_true_changes;
 
 /* Whether conditional execution changes were made.  */
-static int cond_exec_changed_p;
+static bool cond_exec_changed_p;
 
 /* Forward references.  */
 static int count_bb_insns (const_basic_block);
 static bool cheap_bb_rtx_cost_p (const_basic_block, profile_probability, int);
 static rtx_insn *first_active_insn (basic_block);
-static rtx_insn *last_active_insn (basic_block, int);
+static rtx_insn *last_active_insn (basic_block, bool);
 static rtx_insn *find_active_insn_before (basic_block, rtx_insn *);
 static rtx_insn *find_active_insn_after (basic_block, rtx_insn *);
 static basic_block block_fallthru (basic_block);
 static rtx cond_exec_get_condition (rtx_insn *, bool);
 static rtx noce_get_condition (rtx_insn *, rtx_insn **, bool);
-static int noce_operand_ok (const_rtx);
+static bool noce_operand_ok (const_rtx);
 static void merge_if_block (ce_if_block *);
-static int find_cond_trap (basic_block, edge, edge);
+static bool find_cond_trap (basic_block, edge, edge);
 static basic_block find_if_header (basic_block, int);
-static int block_jumps_and_fallthru_p (basic_block, basic_block);
-static int noce_find_if_block (basic_block, edge, edge, int);
-static int cond_exec_find_if_block (ce_if_block *);
-static int find_if_case_1 (basic_block, edge, edge);
-static int find_if_case_2 (basic_block, edge, edge);
-static int dead_or_predicable (basic_block, basic_block, basic_block,
-  edge, int);
+static int block_jumps_and_fallthru (basic_block, basic_block);
+static bool noce_find_if_block (basic_block, edge, edge, int);
+static bool cond_exec_find_if_block (ce_if_block *);
+static bool find_if_case_1 (basic_block, edge, edge);
+static bool find_if_case_2 (basic_block, edge, edge);
+static bool dead_or_predicable (basic_block, basic_block, basic_block,
+   edge, bool);
 static void noce_emit_move_insn (rtx, rtx);
 static rtx_insn *block_has_only_trap (basic_block);
 static void need_cmov_or_rewire (basic_block, hash_set *,
@@ -234,7 +234,7 @@ first_active_insn (basic_block bb)
 /* Return the last non-jump active (non-jump) insn in the basic block.  */
 
 static rtx_insn *
-last_active_insn

Re: [PATCH v2] RISC-V: Refactor riscv mode after for VXRM and FRM

2023-07-12 Thread Kito Cheng via Gcc-patches
Li, Pan2 via Gcc-patches wrote on Wednesday, July 12, 2023, at 15:07:

> Thank Juzhe for review. Sure, let me hold the v3 for kito's comments.
>
> Pan
>
> From: juzhe.zh...@rivai.ai 
> Sent: Wednesday, July 12, 2023 2:11 PM
> To: Li, Pan2 ; gcc-patches 
> Cc: Robin Dapp ; jeffreyalaw ;
> Li, Pan2 ; Wang, Yanzhang ;
> kito.cheng 
> Subject: Re: [PATCH v2] RISC-V: Refactor riscv mode after for VXRM and FRM
>
>
> +regnum_definition_p (rtx_insn *insn, unsigned int regno)
>
> I prefer it to be reg_set_p.
>
>
>
> +insn_asm_p (rtx_insn *insn)
>
> asm_insn_p
>
>
>
> +global_vxrm_state_unknown_p
>
> vxrm_unknown_p
>
>
>
> +global_frm_state_unknown_p (rtx_insn *insn)
>
> The FRM of a CALL function is not "UNKNOWN", unlike VXRM.
>
> It just changes into another Dynamic mode (which may be the same as or
> different from the previous dynamic mode).
>
> frm_unknown_dynamic_p
>
>
>
> The rest of the refactoring looks good.
>
> Let's see whether kito has more comments.
>
>
>
> Thanks.
>
> 
> juzhe.zh...@rivai.ai
>
> From: pan2.li
> Date: 2023-07-12 13:50
> To: gcc-patches
> CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li;
> yanzhang.wang; kito.cheng
> Subject: [PATCH v2] RISC-V: Refactor riscv mode after for VXRM and FRM
> From: Pan Li mailto:pan2...@intel.com>>
>
> When investigating the FRM dynamic rounding mode, we found that the global
> unknown status is quite different between the fixed-point and
> floating-point cases. Thus, we separate the unknown functions, extracting
> some common inner functions.
>
> We will also prepare more test cases in another PATCH.
>
> Signed-off-by: Pan Li mailto:pan2...@intel.com>>
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (regnum_definition_p): New function.
> (insn_asm_p): Ditto.
> (riscv_vxrm_mode_after): New function for fixed-point.
> (global_vxrm_state_unknown_p): Ditto.
> (riscv_frm_mode_after): New function for floating-point.
> (global_frm_state_unknown_p): Ditto.
> (riscv_mode_after): Leverage new functions.
> (riscv_entity_mode_after): Removed.
> ---
> gcc/config/riscv/riscv.cc | 96 +--
> 1 file changed, 82 insertions(+), 14 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 38d8eb2fcf5..553fbb4435a 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -7742,19 +7742,91 @@ global_state_unknown_p (rtx_insn *insn, unsigned
> int regno)
>return false;
> }
> +static bool
> +regnum_definition_p (rtx_insn *insn, unsigned int regno)
> +{
> +  df_ref ref;
> +  struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
> +
> +  /* Return true if there is a definition of regno.  */
> +  for (ref = DF_INSN_INFO_DEFS (insn_info); ref; ref = DF_REF_NEXT_LOC
> (ref))
> +if (DF_REF_REGNO (ref) == regno)
> +  return true;
> +
> +  return false;
> +}
> +
> +static bool
> +insn_asm_p (rtx_insn *insn)
> +{
> +  extract_insn (insn);
> +
> +  return recog_data.is_asm;
> +}
> +
> +static bool
> +global_vxrm_state_unknown_p (rtx_insn *insn)
> +{
> +  /* Return true if there is a definition of VXRM.  */
> +  if (regnum_definition_p (insn, VXRM_REGNUM))
> +return true;
> +
> +  /* A CALL function may contain an instruction that modifies the VXRM,
> + return true in this situation.  */
> +  if (CALL_P (insn))
> +return true;
> +
> +  /* Return true for all assembly since users may hardcode an assembly
> + like this: asm volatile ("csrwi vxrm, 0").  */
> +  if (insn_asm_p (insn))
> +return true;
> +
> +  return false;
> +}
> +
> +static bool
> +global_frm_state_unknown_p (rtx_insn *insn)
> +{
> +  /* Return true if there is a definition of FRM.  */
> +  if (regnum_definition_p (insn, FRM_REGNUM))
> +return true;
> +
> +  /* A CALL function may contain an instruction that modifies the FRM,
> + return true in this situation.  */
> +  if (CALL_P (insn))
> +return true;
> +
> +  return false;
> +}
> +
> static int
> -riscv_entity_mode_after (int regnum, rtx_insn *insn, int mode,
> - int (*get_attr_mode) (rtx_insn *), int default_mode)
> +riscv_vxrm_mode_after (rtx_insn *insn, int mode)
> {
> -  if (global_state_unknown_p (insn, regnum))
> -return default_mode;
> -  else if (recog_memoized (insn) < 0)
> +  if (global_vxrm_state_unknown_p (insn))
> +return VXRM_MODE_NONE;
> +
> +  if (recog_memoized (insn) < 0)
> +return mode;
> +
> +  if (reg_mentioned_p (gen_rtx_REG (SImode, VXRM_REGNUM), PATTERN (insn)))


Extract the vxrm reg into a local static variable to avoid constructing it
again and again.
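Kito's point, as a generic C stand-in (not GCC code: the allocation below plays the role of gen_rtx_REG (SImode, VXRM_REGNUM)): hoist the constructed object into a function-local static so repeated queries reuse one instance instead of rebuilding it per insn.

```c
#include <assert.h>
#include <stdlib.h>

static int construction_count = 0;

/* Stand-in for a cached VXRM reg rtx: built on first use, reused after.  */
static int *
get_vxrm_reg (void)
{
  static int *reg = NULL;
  if (!reg)
    {
      reg = malloc (sizeof *reg);
      construction_count++;
    }
  return reg;
}
```

Every caller then sees the same object, and the construction cost is paid exactly once.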


> +return get_attr_vxrm_mode (insn);
> +  else
>  return mode;
> +}
> -  rtx reg = gen_rtx_REG (SImode, regnum);
> -  bool mentioned_p = reg_mentioned_p (reg, PATTERN (insn));
> +static int
> +riscv_frm_mode_after (rtx

Re: [PATCH] riscv: thead: Fix failing XTheadCondMov tests (indirect-rv[32|64])

2023-07-12 Thread Philipp Tomsich
Thanks, applied to trunk!

Philipp.

On Wed, 12 Jul 2023 at 16:08, Jeff Law  wrote:

>
>
> On 7/12/23 08:07, Philipp Tomsich wrote:
> >
> >
> > On Wed, 12 Jul 2023 at 16:05, Jeff Law  > > wrote:
> >
> >
> >
> > On 7/12/23 06:48, Christoph Müllner wrote:
> >  > On Wed, Jul 12, 2023 at 4:05 AM Jeff Law  > > wrote:
> >  >>
> >  >>
> >  >>
> >  >> On 7/10/23 22:44, Christoph Muellner wrote:
> >  >>> From: Christoph Müllner  > >
> >  >>>
> >  >>> Recently, two identical XTheadCondMov tests have been added,
> > which both fail.
> >  >>> Let's fix that by changing the following:
> >  >>> * Merge both files into one (no need for separate tests for
> > rv32 and rv64)
> >  >>> * Drop unrelated attribute check test (we already test for
> > `th.mveqz`
> >  >>> and `th.mvnez` instructions, so there is little additional
> > value)
> >  >>> * Fix the pattern to allow matching
> >  >>>
> >  >>> gcc/testsuite/ChangeLog:
> >  >>>
> >  >>>* gcc.target/riscv/xtheadcondmov-indirect-rv32.c: Moved
> > to...
> >  >>>* gcc.target/riscv/xtheadcondmov-indirect.c: ...here.
> >  >>>* gcc.target/riscv/xtheadcondmov-indirect-rv64.c:
> Removed.
> >  >> I thought this stuff got fixed recently.  Certainly happy to see
> the
> >  >> files merged though.  Here's what I got from the July 4 run:
> >  >
> >  > I have the following with a GCC master from today
> >  > (a454325bea77a0dd79415480d48233a7c296bc0a):
> >  >
> >  > FAIL: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2
> >  > scan-assembler .attribute arch,
> >  >
> >
>  "rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_xtheadcondmov1p0"
> >  > FAIL: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2
> >  > scan-assembler .attribute arch,
> >  >
> >
>  "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_xtheadcondmov1p0"
> >  >
> >  > With this patch the fails are gone.
> > Then it's fine with me :-)
> >
> >
> > For the avoidance of all doubt: could I hear an "OK"?
> OK for the trunk.
> jeff
>


[PATCH v1] RISC-V: Add more tests for RVV floating-point FRM.

2023-07-12 Thread Pan Li via Gcc-patches
From: Pan Li 

Add more test cases, including both asm checks and run tests, for RVV FRM.

Signed-off-by: Pan Li 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-insert-10.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-7.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-8.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-insert-9.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-run-1.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-run-2.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm-run-3.c: New test.
---
 .../rvv/base/float-point-frm-insert-10.c  | 23 ++
 .../riscv/rvv/base/float-point-frm-insert-7.c | 29 +++
 .../riscv/rvv/base/float-point-frm-insert-8.c | 27 +++
 .../riscv/rvv/base/float-point-frm-insert-9.c | 24 ++
 .../riscv/rvv/base/float-point-frm-run-1.c| 79 +++
 .../riscv/rvv/base/float-point-frm-run-2.c| 71 +
 .../riscv/rvv/base/float-point-frm-run-3.c| 73 +
 7 files changed, 326 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-10.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-7.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-9.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-run-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-run-3.c

diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-10.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-10.c
new file mode 100644
index 000..d35ee6d2131
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-10.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+void
+test_float_point_frm_static (float *out, vfloat32m1_t op1, vfloat32m1_t op2,
+size_t vl)
+{
+  asm volatile (
+"addi %0, %0, 0x12"
+:"=r"(vl)
+:
+:
+  );
+
+  vfloat32m1_t result = __riscv_vfadd_vv_f32m1_rm (op1, op2, 2, vl);
+  result = __riscv_vfadd_vv_f32m1_rm (op1, result, 3, vl);
+  *(vfloat32m1_t *)out = result;
+}
+
+/* { dg-final { scan-assembler-times 
{vfadd\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[fav]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {fsrm\s+[ax][0-9]+,\s*[ax][0-9]+} 2 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-7.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-7.c
new file mode 100644
index 000..7b1602fd509
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-7.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+size_t __attribute__ ((noinline))
+normalize_vl (size_t vl)
+{
+  if (vl % 4 == 0)
+return vl;
+
+  return ((vl / 4) + 1) * 4;
+}
+
+void
+test_float_point_frm_static (float *out, vfloat32m1_t op1, vfloat32m1_t op2,
+size_t vl)
+{
+  vfloat32m1_t result = __riscv_vfadd_vv_f32m1_rm (op1, op2, 2, vl);
+
+  vl = normalize_vl (vl);
+
+  result = __riscv_vfadd_vv_f32m1_rm (op1, result, 3, vl);
+
+  *(vfloat32m1_t *)out = result;
+}
+
+/* { dg-final { scan-assembler-times 
{vfadd\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[fav]+[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {fsrm\s+[ax][0-9]+,\s*[ax][0-9]+} 2 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-8.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-8.c
new file mode 100644
index 000..37481ddac38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-8.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+size_t __attribute__ ((noinline))
+normalize_vl (size_t vl)
+{
+  if (vl % 4 == 0)
+return vl;
+
+  return ((vl / 4) + 1) * 4;
+}
+
+void
+test_float_point_frm_static (float *out, vfloat32m1_t op1, vfloat32m1_t op2,
+size_t vl)
+{
+  vl = normalize_vl (vl);
+
+  vfloat32m1_t result = __riscv_vfadd_vv_f32m1_rm (op1, op2, 2, vl);
+
+  *(vfloat32m1_t *)out = result;
+}
+
+/* { dg-final { scan-assembler-times 
{vfadd\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[fav]+[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {fsrm\s+[ax][0-9]+,\s*[ax][0-9]+} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-9.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-9.c
new file mode 100644
index 0

Re: [PATCH v1] RISC-V: Add more tests for RVV floating-point FRM.

2023-07-12 Thread Kito Cheng via Gcc-patches
Pan Li via Gcc-patches wrote on Wednesday, July 12, 2023, at 23:07:

> From: Pan Li 
>
> Add more test cases, including both asm checks and run tests, for RVV FRM.
>
> Signed-off-by: Pan Li 
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/float-point-frm-insert-10.c: New test.
> * gcc.target/riscv/rvv/base/float-point-frm-insert-7.c: New test.
> * gcc.target/riscv/rvv/base/float-point-frm-insert-8.c: New test.
> * gcc.target/riscv/rvv/base/float-point-frm-insert-9.c: New test.
> * gcc.target/riscv/rvv/base/float-point-frm-run-1.c: New test.
> * gcc.target/riscv/rvv/base/float-point-frm-run-2.c: New test.
> * gcc.target/riscv/rvv/base/float-point-frm-run-3.c: New test.
> ---
>  .../rvv/base/float-point-frm-insert-10.c  | 23 ++
>  .../riscv/rvv/base/float-point-frm-insert-7.c | 29 +++
>  .../riscv/rvv/base/float-point-frm-insert-8.c | 27 +++
>  .../riscv/rvv/base/float-point-frm-insert-9.c | 24 ++
>  .../riscv/rvv/base/float-point-frm-run-1.c| 79 +++
>  .../riscv/rvv/base/float-point-frm-run-2.c| 71 +
>  .../riscv/rvv/base/float-point-frm-run-3.c| 73 +
>  7 files changed, 326 insertions(+)
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-10.c
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-7.c
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-8.c
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-9.c
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-run-1.c
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-run-2.c
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-run-3.c
>
> diff --git
> a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-10.c
> b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-10.c
> new file mode 100644
> index 000..d35ee6d2131
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-10.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
> +
> +#include "riscv_vector.h"
> +
> +void
> +test_float_point_frm_static (float *out, vfloat32m1_t op1, vfloat32m1_t
> op2,
> +size_t vl)
> +{
> +  asm volatile (
> +"addi %0, %0, 0x12"
> +:"=r"(vl)


Should be "+" rather than "=" here.
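The difference Kito flags, in a target-independent sketch (the asm template is left empty so it compiles wherever GNU-style inline asm is accepted): "=r" declares the operand write-only, so reading vl inside the template would see an indeterminate register, while "+r" declares it read-write and keeps the incoming value.

```c
#include <assert.h>

/* With "+r" the compiler must place the incoming value of vl in the
   operand's register before the asm runs; with "=r" it need not, and a
   template like the test's "addi %0, %0, 0x12" would read garbage.  */
static int
adjust_vl (int vl)
{
  __asm__ volatile ("" : "+r" (vl));
  return vl;
}
```

With the corrected constraint the value flows through the asm unchanged.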

>
> +:
> +:
> +  );
> +
> +  vfloat32m1_t result = __riscv_vfadd_vv_f32m1_rm (op1, op2, 2, vl);
> +  result = __riscv_vfadd_vv_f32m1_rm (op1, result, 3, vl);
> +  *(vfloat32m1_t *)out = result;
> +}
> +
> +/* { dg-final { scan-assembler-times
> {vfadd\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[fav]+[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {fsrm\s+[ax][0-9]+,\s*[ax][0-9]+} 2
> } } */
> diff --git
> a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-7.c
> b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-7.c
> new file mode 100644
> index 000..7b1602fd509
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-7.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
> +
> +#include "riscv_vector.h"
> +
> +size_t __attribute__ ((noinline))
> +normalize_vl (size_t vl)
> +{
> +  if (vl % 4 == 0)
> +return vl;
> +
> +  return ((vl / 4) + 1) * 4;
> +}
> +
> +void
> +test_float_point_frm_static (float *out, vfloat32m1_t op1, vfloat32m1_t
> op2,
> +size_t vl)
> +{
> +  vfloat32m1_t result = __riscv_vfadd_vv_f32m1_rm (op1, op2, 2, vl);
> +
> +  vl = normalize_vl (vl);
> +
> +  result = __riscv_vfadd_vv_f32m1_rm (op1, result, 3, vl);
> +
> +  *(vfloat32m1_t *)out = result;
> +}
> +
> +/* { dg-final { scan-assembler-times
> {vfadd\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[fav]+[0-9]+} 2 } } */
> +/* { dg-final { scan-assembler-times {fsrm\s+[ax][0-9]+,\s*[ax][0-9]+} 2
> } } */
> diff --git
> a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-8.c
> b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-8.c
> new file mode 100644
> index 000..37481ddac38
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-8.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
> +
> +#include "riscv_vector.h"
> +
> +size_t __attribute__ ((noinline))
> +normalize_vl (size_t vl)
> +{
> +  if (vl % 4 == 0)
> +return vl;
> +
> +  return ((vl / 4) + 1) * 4;
> +}
> +
> +void
> +test_float_point_frm_static (float *out, vfloat32m1_t op1, vfloat32m1_t
> op2,
> +size_t vl)
> +{
> +  vl = normalize_vl (vl);
> +
> +  vfloat32m1_t result = __riscv_vfadd_vv_f32m1_rm (op1, op2, 2, vl);
> +
> +  *(vfloat32m1_t *)out = result;
> +}
> +
> 

[PATCH V2] RISC-V: Support COND_LEN_* patterns

2023-07-12 Thread Juzhe-Zhong
This middle-end patch has been merged:
https://github.com/gcc-mirror/gcc/commit/0d4dd7e07a879d6c07a33edb2799710faa95651e

With this patch, we can handle operations that may trap on elements outside the loop.
 
These 2 following cases will be addressed by this patch:
 
1. integer division:
 
  #define TEST_TYPE(TYPE) \
  __attribute__((noipa)) \
  void vrem_##TYPE (TYPE * __restrict dst, TYPE * __restrict a, TYPE * 
__restrict b, int n) \
  { \
for (int i = 0; i < n; i++) \
  dst[i] = a[i] % b[i]; \
  }
  #define TEST_ALL() \
   TEST_TYPE(int8_t) \
  TEST_ALL()
 
  Before this patch:
 
   vrem_int8_t:
ble a3,zero,.L14
csrrt4,vlenb
addiw   a5,a3,-1
addiw   a4,t4,-1
sext.w  t5,a3
bltua5,a4,.L10
csrrt3,vlenb
subwt3,t5,t3
li  a5,0
vsetvli t6,zero,e8,m1,ta,ma
.L4:
add a6,a2,a5
add a7,a0,a5
add t1,a1,a5
mv  a4,a5
add a5,a5,t4
vl1re8.vv2,0(a6)
vl1re8.vv1,0(t1)
sext.w  a6,a5
vrem.vv v1,v1,v2
vs1r.v  v1,0(a7)
bleua6,t3,.L4
csrra5,vlenb
addwa4,a4,a5
sext.w  a5,a4
beq t5,a4,.L16
.L3:
csrra6,vlenb
subwt5,t5,a4
srlia6,a6,1
addiw   t1,t5,-1
addiw   a7,a6,-1
bltut1,a7,.L9
sllia4,a4,32
srlia4,a4,32
add t0,a1,a4
add t6,a2,a4
add a4,a0,a4
vsetvli a7,zero,e8,mf2,ta,ma
sext.w  t3,a6
vle8.v  v1,0(t0)
vle8.v  v2,0(t6)
subwt4,t5,a6
vrem.vv v1,v1,v2
vse8.v  v1,0(a4)
mv  t1,t3
bltut4,t3,.L7
csrrt1,vlenb
add a4,a4,a6
add t0,t0,a6
add t6,t6,a6
sext.w  t1,t1
vle8.v  v1,0(t0)
vle8.v  v2,0(t6)
vrem.vv v1,v1,v2
vse8.v  v1,0(a4)
.L7:
addwa5,t1,a5
beq t5,t1,.L14
.L9:
add a4,a1,a5
add a6,a2,a5
lb  a6,0(a6)
lb  a4,0(a4)
add a7,a0,a5
addia5,a5,1
remwa4,a4,a6
sext.w  a6,a5
sb  a4,0(a7)
bgt a3,a6,.L9
.L14:
ret
.L10:
li  a4,0
li  a5,0
j   .L3
.L16:
ret
 
After this patch:
 
   vrem_int8_t:
ble a3,zero,.L5
.L3:
vsetvli a5,a3,e8,m1,tu,ma
vle8.v v1,0(a1)
vle8.v v2,0(a2)
sub a3,a3,a5
vrem.vv v1,v1,v2
vse8.v v1,0(a0)
add a1,a1,a5
add a2,a2,a5
add a0,a0,a5
bne a3,zero,.L3
.L5:
ret
 
2. Floating-point operation **WITHOUT** -ffast-math:
 
#define TEST_TYPE(TYPE) \
__attribute__((noipa)) \
void vadd_##TYPE (TYPE * __restrict dst, TYPE *__restrict a, TYPE 
*__restrict b, int n) \
{ \
  for (int i = 0; i < n; i++) \
dst[i] = a[i] + b[i]; \
}
 
#define TEST_ALL() \
 TEST_TYPE(float) \
 
TEST_ALL()
   
Before this patch:
   
   vadd_float:
ble a3,zero,.L10
csrra4,vlenb
srlit3,a4,2
addiw   a5,a3,-1
addiw   a6,t3,-1
sext.w  t6,a3
bltua5,a6,.L7
subwt5,t6,t3
mv  t1,a1
mv  a7,a2
mv  a6,a0
li  a5,0
vsetvli t4,zero,e32,m1,ta,ma
.L4:
vl1re32.v   v1,0(t1)
vl1re32.v   v2,0(a7)
addwa5,a5,t3
vfadd.vvv1,v1,v2
vs1r.v  v1,0(a6)
add t1,t1,a4
add a7,a7,a4
add a6,a6,a4
bgeut5,a5,.L4
beq t6,a5,.L10
sext.w  a5,a5
.L3:
sllia4,a5,2
.L6:
add a6,a1,a4
add a7,a2,a4
flw fa4,0(a6)
flw fa5,0(a7)
add a6,a0,a4
addiw   a5,a5,1
fadd.s  fa5,fa5,fa4
addia4,a4,4
fsw fa5,0(a6)
bgt a3,a5,.L6
.L10:
ret
.L7:
li  a5,0
j   .L3
 
After this patch:
 
   vadd_float:
ble a3,zero,.L5
.L3:
vsetvli a5,a3,e32,m1,tu,ma
slli a4,a5,2
vle32.v v1,0(a1)
vle32.v v2,0(a2)
sub a3,a3,a5
vfadd.vv v1,v1,v2
vse32.v v1,0(a0)
add a1,a1,a4
add a2,a2,a4
add a0,a0,a4
bne a3,zero,.L3
.L5:
ret
  
gcc/ChangeLog:
 
* config/riscv/autovec.md (cond_len_): New pattern.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_cond_len_binop): New function.
* config/riscv/riscv-v.cc (emit_nonvlmax_tu_insn): Ditto.
(emit_nonvlmax_fp_tu_insn): Ditto.
(need_fp_rounding_p): Ditto.
(expand_cond_len_binop): Ditto.
* config/riscv/riscv.cc (riscv_preferred_else_value): Ditto.
(TARGET_PREFERRED_ELSE_VALUE): New target hook.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Adapt testcase.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c

Re: [PATCH v2] RISC-V: Refactor riscv mode after for VXRM and FRM

2023-07-12 Thread Jeff Law via Gcc-patches




On 7/11/23 23:50, pan2...@intel.com wrote:

From: Pan Li 

When investigating the FRM dynamic rounding mode, we found that the global
unknown status is quite different between the fixed-point and
floating-point cases. Thus, we separate the unknown functions, extracting
some common inner functions.

We will also prepare more test cases in another PATCH.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv.cc (regnum_definition_p): New function.
(insn_asm_p): Ditto.
(riscv_vxrm_mode_after): New function for fixed-point.
(global_vxrm_state_unknown_p): Ditto.
(riscv_frm_mode_after): New function for floating-point.
(global_frm_state_unknown_p): Ditto.
(riscv_mode_after): Leverage new functions.
(riscv_entity_mode_after): Removed.
---
  gcc/config/riscv/riscv.cc | 96 +--
  1 file changed, 82 insertions(+), 14 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 38d8eb2fcf5..553fbb4435a 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7742,19 +7742,91 @@ global_state_unknown_p (rtx_insn *insn, unsigned int regno)
return false;
  }
  
+static bool
+regnum_definition_p (rtx_insn *insn, unsigned int regno)
Needs a function comment.  This is true for each new function added.  In
this specific case something like this might be appropriate:


/* Return TRUE if REGNO is set in INSN, FALSE otherwise.  */

Which begs the question, is there some reason why we're not using the 
existing reg_set_p or simple_regno_set from rtlanal.cc?




Jeff


Re: [PATCH V2] RISC-V: Support COND_LEN_* patterns

2023-07-12 Thread Jeff Law via Gcc-patches




On 7/12/23 09:24, Juzhe-Zhong wrote:

This middle-end patch has been merged:
https://github.com/gcc-mirror/gcc/commit/0d4dd7e07a879d6c07a33edb2799710faa95651e

With this patch, we can handle operations that may trap on elements outside the loop.
  
The following two cases are addressed by this patch:
  
1. integer division:
  
   #define TEST_TYPE(TYPE) \

   __attribute__((noipa)) \
   void vrem_##TYPE (TYPE * __restrict dst, TYPE * __restrict a, TYPE * __restrict b, int n) \
   { \
 for (int i = 0; i < n; i++) \
   dst[i] = a[i] % b[i]; \
   }
   #define TEST_ALL() \
TEST_TYPE(int8_t) \
   TEST_ALL()
  
   Before this patch:
  
vrem_int8_t:

        ble     a3,zero,.L14
        csrr    t4,vlenb
        addiw   a5,a3,-1
        addiw   a4,t4,-1
        sext.w  t5,a3
        bltu    a5,a4,.L10
        csrr    t3,vlenb
        subw    t3,t5,t3
        li      a5,0
        vsetvli t6,zero,e8,m1,ta,ma
.L4:
        add     a6,a2,a5
        add     a7,a0,a5
        add     t1,a1,a5
        mv      a4,a5
        add     a5,a5,t4
        vl1re8.v        v2,0(a6)
        vl1re8.v        v1,0(t1)
        sext.w  a6,a5
        vrem.vv v1,v1,v2
        vs1r.v  v1,0(a7)
        bleu    a6,t3,.L4
        csrr    a5,vlenb
        addw    a4,a4,a5
        sext.w  a5,a4
        beq     t5,a4,.L16
.L3:
        csrr    a6,vlenb
        subw    t5,t5,a4
        srli    a6,a6,1
        addiw   t1,t5,-1
        addiw   a7,a6,-1
        bltu    t1,a7,.L9
        slli    a4,a4,32
        srli    a4,a4,32
        add     t0,a1,a4
        add     t6,a2,a4
        add     a4,a0,a4
        vsetvli a7,zero,e8,mf2,ta,ma
        sext.w  t3,a6
        vle8.v  v1,0(t0)
        vle8.v  v2,0(t6)
        subw    t4,t5,a6
        vrem.vv v1,v1,v2
        vse8.v  v1,0(a4)
        mv      t1,t3
        bltu    t4,t3,.L7
        csrr    t1,vlenb
        add     a4,a4,a6
        add     t0,t0,a6
        add     t6,t6,a6
        sext.w  t1,t1
        vle8.v  v1,0(t0)
        vle8.v  v2,0(t6)
        vrem.vv v1,v1,v2
        vse8.v  v1,0(a4)
.L7:
        addw    a5,t1,a5
        beq     t5,t1,.L14
.L9:
        add     a4,a1,a5
        add     a6,a2,a5
        lb      a6,0(a6)
        lb      a4,0(a4)
        add     a7,a0,a5
        addi    a5,a5,1
        remw    a4,a4,a6
        sext.w  a6,a5
        sb      a4,0(a7)
        bgt     a3,a6,.L9
.L14:
        ret
.L10:
        li      a4,0
        li      a5,0
        j       .L3
.L16:
        ret
  
After this patch:
  
vrem_int8_t:

ble a3,zero,.L5
.L3:
vsetvli a5,a3,e8,m1,tu,ma
vle8.v v1,0(a1)
vle8.v v2,0(a2)
sub a3,a3,a5
vrem.vv v1,v1,v2
vse8.v v1,0(a0)
add a1,a1,a5
add a2,a2,a5
add a0,a0,a5
bne a3,zero,.L3
.L5:
ret
  
2. Floating-point operation **WITHOUT** -ffast-math:
  
 #define TEST_TYPE(TYPE) \

 __attribute__((noipa)) \
 void vadd_##TYPE (TYPE * __restrict dst, TYPE *__restrict a, TYPE *__restrict b, int n) \
 { \
   for (int i = 0; i < n; i++) \
 dst[i] = a[i] + b[i]; \
 }
  
 #define TEST_ALL() \

  TEST_TYPE(float) \
  
 TEST_ALL()

Before this patch:

vadd_float:

        ble     a3,zero,.L10
        csrr    a4,vlenb
        srli    t3,a4,2
        addiw   a5,a3,-1
        addiw   a6,t3,-1
        sext.w  t6,a3
        bltu    a5,a6,.L7
        subw    t5,t6,t3
        mv      t1,a1
        mv      a7,a2
        mv      a6,a0
        li      a5,0
        vsetvli t4,zero,e32,m1,ta,ma
.L4:
        vl1re32.v       v1,0(t1)
        vl1re32.v       v2,0(a7)
        addw    a5,a5,t3
        vfadd.vv        v1,v1,v2
        vs1r.v  v1,0(a6)
        add     t1,t1,a4
        add     a7,a7,a4
        add     a6,a6,a4
        bgeu    t5,a5,.L4
        beq     t6,a5,.L10
        sext.w  a5,a5
.L3:
        slli    a4,a5,2
.L6:
        add     a6,a1,a4
        add     a7,a2,a4
        flw     fa4,0(a6)
        flw     fa5,0(a7)
        add     a6,a0,a4
        addiw   a5,a5,1
        fadd.s  fa5,fa5,fa4
        addi    a4,a4,4
        fsw     fa5,0(a6)
        bgt     a3,a5,.L6
.L10:
        ret
.L7:
        li      a5,0
        j       .L3
  
After this patch:
  
vadd_float:

ble a3,zero,.L5
.L3:
vsetvli a5,a3,e32,m1,tu,ma
slli a4,a5,2
vle32.v v1,0(a1)
vle32.v v2,0(a2)
sub a3,a3,a5
vfadd.vv v1,v1,v2
vse32.v v1,0(a0)
add a1,a1,a4
add a2,a2,a4
add a0,a0,a4
bne a3,zero,.L3
.L5:
ret
   
gcc/ChangeLog:
  
 * config/riscv/autovec.md (cond_len_): New pattern.

 * config/riscv/riscv-protos.h (enum insn_type): New enum.
 (expand_cond_len_binop): New function.
 * config/riscv/riscv-v.cc (emit_nonvlmax_tu_insn): Ditto.
 (emit_nonvlmax_fp_tu_insn): Ditto.
 (need_fp_rounding_p): Ditto.
 (expand_cond_len_binop): Ditto.
 * config/riscv/riscv.cc (riscv_preferred_else_value): Ditto.
 (TARGET_PREFERRED_ELSE_VALUE): New target hook.
  
gcc/testsuite/Change

[PATCH] c++: constrained surrogate calls [PR110535]

2023-07-12 Thread Patrick Palka via Gcc-patches
We're not checking constraints of pointer/reference-to-function conversion
functions during overload resolution, which causes us to ICE on the first
testcase and incorrectly reject the second testcase.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/13?

PR c++/110535

gcc/cp/ChangeLog:

* call.cc (add_conv_candidate): Check constraints.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-surrogate1.C: New test.
* g++.dg/cpp2a/concepts-surrogate2.C: New test.
---
 gcc/cp/call.cc   |  8 
 gcc/testsuite/g++.dg/cpp2a/concepts-surrogate1.C | 12 
 gcc/testsuite/g++.dg/cpp2a/concepts-surrogate2.C | 14 ++
 3 files changed, 34 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-surrogate1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-surrogate2.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 15a3d6f2a1f..81935b83908 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -2588,6 +2588,14 @@ add_conv_candidate (struct z_candidate **candidates, tree fn, tree obj,
   if (*candidates && (*candidates)->fn == totype)
 return NULL;
 
+  if (!constraints_satisfied_p (fn))
+{
+  reason = constraint_failure ();
+  viable = 0;
+  return add_candidate (candidates, fn, obj, arglist, len, convs,
+   access_path, conversion_path, viable, reason, flags);
+}
+
   for (i = 0; i < len; ++i)
 {
   tree arg, argtype, convert_type = NULL_TREE;
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-surrogate1.C b/gcc/testsuite/g++.dg/cpp2a/concepts-surrogate1.C
new file mode 100644
index 000..e8481a31656
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-surrogate1.C
@@ -0,0 +1,12 @@
+// PR c++/110535
+// { dg-do compile { target c++20 } }
+
+using F = int(int);
+
+template<bool B>
+struct A {
+  operator F*() requires B;
+};
+
+int i = A<true>{}(0);  // OK
+int j = A<false>{}(0); // { dg-error "no match" }
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-surrogate2.C b/gcc/testsuite/g++.dg/cpp2a/concepts-surrogate2.C
new file mode 100644
index 000..8bf8364beb7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-surrogate2.C
@@ -0,0 +1,14 @@
+// PR c++/110535
+// { dg-do compile { target c++20 } }
+
+using F = int(int);
+using G = long(int);
+
+template<bool B>
+struct A {
+  operator F&() requires B;
+  operator G&() requires (!B);
+};
+
+int i = A<true>{}(0);  // { dg-bogus "ambiguous" }
+int j = A<false>{}(0); // { dg-bogus "ambiguous" }
-- 
2.41.0.327.gaa9166bcc0



Re: [PATCH] c++: constrained surrogate calls [PR110535]

2023-07-12 Thread Patrick Palka via Gcc-patches
On Wed, 12 Jul 2023, Patrick Palka wrote:

> We're not checking constraints of pointer/reference-to-function conversion
> functions during overload resolution, which causes us to ICE on the first
> testcase and incorrectly reject the second testcase.

Er, I noticed [over.call.object] doesn't exactly say that surrogate
call functions inherit the constraints of the corresponding conversion
function, but I reckon that's the intent?

> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk/13?
> 
>   PR c++/110535
> 
> gcc/cp/ChangeLog:
> 
>   * call.cc (add_conv_candidate): Check constraints.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp2a/concepts-surrogate1.C: New test.
>   * g++.dg/cpp2a/concepts-surrogate2.C: New test.
> ---
>  gcc/cp/call.cc   |  8 
>  gcc/testsuite/g++.dg/cpp2a/concepts-surrogate1.C | 12 
>  gcc/testsuite/g++.dg/cpp2a/concepts-surrogate2.C | 14 ++
>  3 files changed, 34 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-surrogate1.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-surrogate2.C
> 
> diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
> index 15a3d6f2a1f..81935b83908 100644
> --- a/gcc/cp/call.cc
> +++ b/gcc/cp/call.cc
> @@ -2588,6 +2588,14 @@ add_conv_candidate (struct z_candidate **candidates, 
> tree fn, tree obj,
>if (*candidates && (*candidates)->fn == totype)
>  return NULL;
>  
> +  if (!constraints_satisfied_p (fn))
> +{
> +  reason = constraint_failure ();
> +  viable = 0;
> +  return add_candidate (candidates, fn, obj, arglist, len, convs,
> + access_path, conversion_path, viable, reason, 
> flags);
> +}
> +
>for (i = 0; i < len; ++i)
>  {
>tree arg, argtype, convert_type = NULL_TREE;
> diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-surrogate1.C 
> b/gcc/testsuite/g++.dg/cpp2a/concepts-surrogate1.C
> new file mode 100644
> index 000..e8481a31656
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-surrogate1.C
> @@ -0,0 +1,12 @@
> +// PR c++/110535
> +// { dg-do compile { target c++20 } }
> +
> +using F = int(int);
> +
> +template<bool B>
> +struct A {
> +  operator F*() requires B;
> +};
> +
> +int i = A<true>{}(0);  // OK
> +int j = A<false>{}(0); // { dg-error "no match" }
> diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-surrogate2.C 
> b/gcc/testsuite/g++.dg/cpp2a/concepts-surrogate2.C
> new file mode 100644
> index 000..8bf8364beb7
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-surrogate2.C
> @@ -0,0 +1,14 @@
> +// PR c++/110535
> +// { dg-do compile { target c++20 } }
> +
> +using F = int(int);
> +using G = long(int);
> +
> +template<bool B>
> +struct A {
> +  operator F&() requires B;
> +  operator G&() requires (!B);
> +};
> +
> +int i = A<true>{}(0);  // { dg-bogus "ambiguous" }
> +int j = A<false>{}(0); // { dg-bogus "ambiguous" }
> -- 
> 2.41.0.327.gaa9166bcc0
> 
> 



Re: [PATCH] RISC-V: Throw compilation error for unknown sub-extension or supervisor extension

2023-07-12 Thread Jeff Law via Gcc-patches




On 7/11/23 21:30, juzhe.zh...@rivai.ai wrote:

LGTM

OK for the trunk.
jeff

