[PATCH, V1 0/1] RISC-V: Make R_RISCV_SUB6 conforms to riscv abi standard

2022-11-15 Thread zengxiao
From: zengxiao 

Hi all RISC-V folks:

While using riscv-objdump to dump DWARF information, I ran into errors
like:
DW_CFA_??? (User defined call frame op: 0x3c)

This error occurs because riscv-objdump does not follow the R_RISCV_SUB6
definition in the RISC-V psABI.
riscv-readelf is correct because it follows the R_RISCV_SUB6 definition.

There are test cases in 
https://github.com/zeng-xiao/gnu-bug-fix/tree/main/EG-769
that describe the error in detail. 

zengxiao (1):
  RISC-V: Make R_RISCV_SUB6 conforms to riscv abi standard

 bfd/elfxx-riscv.c |  7 +
 .../testsuite/binutils-all/riscv/dwarf-SUB6.d | 31 +++
 .../testsuite/binutils-all/riscv/dwarf-SUB6.s | 12 +++
 3 files changed, 50 insertions(+)
 create mode 100644 binutils/testsuite/binutils-all/riscv/dwarf-SUB6.d
 create mode 100644 binutils/testsuite/binutils-all/riscv/dwarf-SUB6.s

-- 
2.34.1



[PATCH, V1 1/1] RISC-V: Make R_RISCV_SUB6 conforms to riscv abi standard

2022-11-15 Thread zengxiao
From: zengxiao 

This patch makes R_RISCV_SUB6 conform to the RISC-V ABI standard.
For R_RISCV_SUB6, only the lower 6 bits of the relocated byte are valid;
the upper 2 bits must be preserved.
The proposed specification can be found in section 8.5, "Relocations", of
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/releases/download/v1.0-rc4/riscv-abi.pdf
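
As a rough illustration of the rule above, here is a standalone sketch. It
is not the BFD code: `apply_sub6` and `SUB6_MASK` are hypothetical names,
with `SUB6_MASK` standing in for howto->dst_mask. The subtraction is done
modulo 64 on the low 6 bits only, and the upper 2 bits of the byte are
carried over unchanged:

```c
#include <stdint.h>

/* Hypothetical mask for the 6-bit field; stands in for howto->dst_mask.  */
#define SUB6_MASK 0x3fu

/* Subtract ADDEND from the low 6 bits of OLD_VALUE (modulo 64) and
   keep the upper 2 bits of OLD_VALUE unchanged.  */
static uint8_t
apply_sub6 (uint8_t old_value, uint8_t addend)
{
  uint8_t low = (uint8_t) (((old_value & SUB6_MASK) - addend) & SUB6_MASK);
  uint8_t high = (uint8_t) (old_value & ~SUB6_MASK);
  return (uint8_t) (low | high);
}
```

For example, 0x4a is a DW_CFA_advance_loc byte (opcode bits 0x40) with a
delta of 10; apply_sub6 (0x4a, 4) gives 0x46, so the opcode bits survive
and only the delta changes.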

bfd/ChangeLog:

* elfxx-riscv.c (riscv_elf_add_sub_reloc):

binutils/ChangeLog:

* testsuite/binutils-all/riscv/dwarf-SUB6.d: New test.
* testsuite/binutils-all/riscv/dwarf-SUB6.s: New test.

reviewed-by: gao...@eswincomputing.com
 jinyanji...@eswincomputing.com

Signed-off-by: zengxiao 
---
 bfd/elfxx-riscv.c |  7 +
 .../testsuite/binutils-all/riscv/dwarf-SUB6.d | 31 +++
 .../testsuite/binutils-all/riscv/dwarf-SUB6.s | 12 +++
 3 files changed, 50 insertions(+)
 create mode 100644 binutils/testsuite/binutils-all/riscv/dwarf-SUB6.d
 create mode 100644 binutils/testsuite/binutils-all/riscv/dwarf-SUB6.s

diff --git a/bfd/elfxx-riscv.c b/bfd/elfxx-riscv.c
index 300ccf49534..e71d4a456f2 100644
--- a/bfd/elfxx-riscv.c
+++ b/bfd/elfxx-riscv.c
@@ -994,6 +994,13 @@ riscv_elf_add_sub_reloc (bfd *abfd,
   relocation = old_value + relocation;
   break;
 case R_RISCV_SUB6:
+  {
+bfd_vma six_bit_valid_value = old_value & howto->dst_mask;
+six_bit_valid_value -= relocation;
+relocation = (six_bit_valid_value & howto->dst_mask) |
+ (old_value & ~howto->dst_mask);
+  }
+  break;
 case R_RISCV_SUB8:
 case R_RISCV_SUB16:
 case R_RISCV_SUB32:
diff --git a/binutils/testsuite/binutils-all/riscv/dwarf-SUB6.d 
b/binutils/testsuite/binutils-all/riscv/dwarf-SUB6.d
new file mode 100644
index 000..47d5ae570d7
--- /dev/null
+++ b/binutils/testsuite/binutils-all/riscv/dwarf-SUB6.d
@@ -0,0 +1,31 @@
+#PROG: objcopy
+#objdump: --dwarf=frames
+
+tmpdir/riscvcopy.o: file format elf32-littleriscv
+
+Contents of the .eh_frame section:
+
+
+ 0020  CIE
+  Version:   3
+  Augmentation:  "zR"
+  Code alignment factor: 1
+  Data alignment factor: -4
+  Return address column: 1
+  Augmentation data: 1b
+  DW_CFA_def_cfa_register: r2 \(sp\)
+  DW_CFA_def_cfa_offset: 48
+  DW_CFA_offset: r1 \(ra\) at cfa-4
+  DW_CFA_offset: r8 \(s0\) at cfa-8
+  DW_CFA_def_cfa: r8 \(s0\) ofs 0
+  DW_CFA_restore: r1 \(ra\)
+  DW_CFA_restore: r8 \(s0\)
+  DW_CFA_def_cfa: r2 \(sp\) ofs 48
+  DW_CFA_def_cfa_offset: 0
+  DW_CFA_nop
+
+0024 0010 0028 FDE cie= pc=002c..002c
+  DW_CFA_nop
+  DW_CFA_nop
+  DW_CFA_nop
+
diff --git a/binutils/testsuite/binutils-all/riscv/dwarf-SUB6.s 
b/binutils/testsuite/binutils-all/riscv/dwarf-SUB6.s
new file mode 100644
index 000..fe959f59d9b
--- /dev/null
+++ b/binutils/testsuite/binutils-all/riscv/dwarf-SUB6.s
@@ -0,0 +1,12 @@
+.attribute arch, "rv32i2p0_m2p0_a2p0_f2p0_c2p0"
+.cfi_startproc
+.cfi_def_cfa_offset 48
+.cfi_offset 1, -4
+.cfi_offset 8, -8
+.cfi_def_cfa 8, 0
+.cfi_restore 1
+.cfi_restore 8
+.cfi_def_cfa 2, 48
+.cfi_def_cfa_offset 0
+.cfi_endproc
+
\ No newline at end of file
-- 
2.34.1



[PATCH v2 2/2] RISC-V: Optimize RVV epilogue logic.

2022-11-15 Thread jiawei
Sometimes "step1 -= scalable_frame" leaves the adjustment equal to
zero, which generates a redundant "addi sp,sp,0" instruction.
Add a check to skip that case.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_expand_epilogue):
Add check to skip a zero adjustment.

---
 gcc/config/riscv/riscv.cc | 35 +++
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 02a01ca0b7c..433b9b13eb6 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5185,25 +5185,28 @@ riscv_expand_epilogue (int style)
  step1 -= scalable_frame;
}
 
-  /* Get an rtx for STEP1 that we can add to BASE.  */
-  rtx adjust = GEN_INT (step1.to_constant ());
-  if (!SMALL_OPERAND (step1.to_constant ()))
-   {
- riscv_emit_move (RISCV_PROLOGUE_TEMP (Pmode), adjust);
- adjust = RISCV_PROLOGUE_TEMP (Pmode);
-   }
-
-  insn = emit_insn (
-  gen_add3_insn (stack_pointer_rtx, stack_pointer_rtx, adjust));
-
-  rtx dwarf = NULL_RTX;
-  rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
+  /* Get an rtx for STEP1 that we can add to BASE.  
+ Skip if adjust equal to zero.  */
+  if (step1.to_constant () != 0)
+  {
+rtx adjust = GEN_INT (step1.to_constant ());
+if (!SMALL_OPERAND (step1.to_constant ()))
+{
+  riscv_emit_move (RISCV_PROLOGUE_TEMP (Pmode), adjust);
+  adjust = RISCV_PROLOGUE_TEMP (Pmode);
+}
+
+insn = emit_insn (
+   gen_add3_insn (stack_pointer_rtx, stack_pointer_rtx, adjust));
+rtx dwarf = NULL_RTX;
+rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
 GEN_INT (step2));
 
-  dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
-  RTX_FRAME_RELATED_P (insn) = 1;
+dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
+RTX_FRAME_RELATED_P (insn) = 1;
 
-  REG_NOTES (insn) = dwarf;
+REG_NOTES (insn) = dwarf;
+ }
 }
   else if (frame_pointer_needed)
 {
-- 
2.25.1



[PATCH v2 0/2] RISC-V: Optimize RVV epilogue logic.

2022-11-15 Thread jiawei
The current epilogue generates a redundant "addi sp,sp,0" instruction.

```
csrr    t0,vlenb
slli    t1,t0,1
add     sp,sp,t1
addi    sp,sp,0
ld      s0,24(sp)
addi    sp,sp,32
jr      ra
```

Optimize it by checking whether the adjustment is zero and, if so, skipping the redundant insn generation.

```
csrr    t0,vlenb
slli    t1,t0,1
add     sp,sp,t1
ld      s0,24(sp)
addi    sp,sp,32
jr      ra
```
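
The idea can be sketched outside of GCC as follows. This is a toy model
under stated assumptions, not the riscv.cc code: `emit_sp_adjust` is a
hypothetical helper that formats the instruction as text, and it ignores
the large-constant case that the real epilogue handles through a temporary
register.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write "addi sp,sp,STEP" into BUF unless STEP is
   zero.  Returns the number of instructions emitted (0 or 1).  */
static int
emit_sp_adjust (char *buf, size_t size, long step)
{
  if (step == 0)
    {
      /* Nothing to adjust, so emit no instruction at all.  */
      if (size > 0)
        buf[0] = '\0';
      return 0;
    }
  snprintf (buf, size, "addi\tsp,sp,%ld", step);
  return 1;
}
```

With a zero step the buffer stays empty, which corresponds to the second
listing above; a nonzero step such as 32 still produces the usual addi.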

Thanks to Kito and Jeff for their suggestions; this version adds a testcase and fixes the code format.

jiawei (2):
  RISC-V: Add spill sp adjust check testcase.
  RISC-V: Optimize RVV epilogue logic.

 gcc/config/riscv/riscv.cc | 35 ++-
 .../riscv/rvv/base/spill-sp-adjust.c  | 13 +++
 2 files changed, 32 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-sp-adjust.c

-- 
2.25.1



[PATCH v2 1/2] RISC-V: Add spill sp adjust check testcase.

2022-11-15 Thread jiawei
This testcase combines the existing spill-1.c with a new function to check
whether redundant addi instructions are generated. Idea provided by Jeff Law.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/spill-sp-adjust.c: New test.

---
 .../gcc.target/riscv/rvv/base/spill-sp-adjust.c | 13 +
 1 file changed, 13 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-sp-adjust.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/spill-sp-adjust.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/spill-sp-adjust.c
new file mode 100644
index 000..0226554abf3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/spill-sp-adjust.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv" } */
+
+#include "spill-1.c"
+
+void
+spill_sp_adjust (int8_t *v)
+{
+  vint8mf8_t v1 = *(vint8mf8_t*)v; 
+}
+
+/* Make sure we do not have a useless SP adjustment.  */
+/* { dg-final { scan-assembler-not "addi\tsp,sp,0" } } */
-- 
2.25.1



Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector

2022-11-15 Thread Hongtao Liu via Gcc-patches
Hi:
  I came here from https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606040.html.
>  }
>
>/* See if we can get a better vector mode before extracting.  */
> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> index 
> cff37ccb0dfc3dd79b97d0abfd872f340855dc96..f338df410265dfe55b6896160090a453cc6a28d9
>  100644
> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -6267,6 +6267,7 @@ expand_vec_perm_const (machine_mode mode, rtx v0, rtx 
> v1,
>v0_qi = gen_lowpart (qimode, v0);
>v1_qi = gen_lowpart (qimode, v1);
>if (targetm.vectorize.vec_perm_const != NULL
> + && targetm.can_change_mode_class (mode, qimode, ALL_REGS)
It looks like you want to guard gen_lowpart; wouldn't it be better to
use validate_subreg or (tmp = gen_lowpart_if_possible (mode,
target_qi))?
IMHO, targetm.can_change_mode_class is mostly used for RA, not to
guard gen_lowpart.
I did something similar in
https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579296.html
(Also, ALL_REGS doesn't cover every case for registers that are
available for both qimode and mode. A failure for ALL_REGS doesn't mean
a subreg is impossible; it just means that parts of ALL_REGS can't be
subreg'd. For a subset of ALL_REGS, there could be a reg class for which
targetm.can_change_mode_class returns true.)
>   && targetm.vectorize.vec_perm_const (qimode, qimode, target_qi, 
> v0_qi,
>v1_qi, qimode_indices))
> return gen_lowpart (mode, target_qi);
> @@ -6311,7 +6312,8 @@ expand_vec_perm_const (machine_mode mode, rtx v0, rtx 
> v1,
>  }
>
>if (qimode != VOIDmode
> -  && selector_fits_mode_p (qimode, qimode_indices))
> +  && selector_fits_mode_p (qimode, qimode_indices)
> +  && targetm.can_change_mode_class (mode, qimode, ALL_REGS))
>  {
>icode = direct_optab_handler (vec_perm_optab, qimode);
>if (icode != CODE_FOR_nothing)
> diff --git a/gcc/testsuite/gcc.target/aarch64/ext_1.c 
> b/gcc/testsuite/gcc.target/aarch64/ext_1.c
> new file mode 100644
> index 
> ..18a10a14f1161584267a8472e571b3bc2ddf887a




-- 
BR,
Hongtao


Re: [PATCH v2 0/2] RISC-V: Optimize RVV epilogue logic.

2022-11-15 Thread juzhe.zh...@rivai.ai
LGTM. Thanks for fixing my mistake. 
Let's see whether other RISC-V folks are happy with this patch.



juzhe.zh...@rivai.ai
 
From: jiawei
Date: 2022-11-15 16:33
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; jeffreyalaw; christoph.muellner; 
philipp.tomsich; wuwei2016; jiawei
Subject: [PATCH v2 0/2] RISC-V: Optimize RVV epilogue logic.
The current epilogue generates a redundant "addi sp,sp,0" instruction.

```
csrr    t0,vlenb
slli    t1,t0,1
add     sp,sp,t1
addi    sp,sp,0
ld      s0,24(sp)
addi    sp,sp,32
jr      ra
```

Optimize it by checking whether the adjustment is zero and, if so, skipping the redundant insn generation.

```
csrr    t0,vlenb
slli    t1,t0,1
add     sp,sp,t1
ld      s0,24(sp)
addi    sp,sp,32
jr      ra
```

Thanks to Kito and Jeff for their suggestions; this version adds a testcase and fixes the code format.
 
jiawei (2):
  RISC-V: Add spill sp adjust check testcase.
  RISC-V: Optimize RVV epilogue logic.
 
gcc/config/riscv/riscv.cc | 35 ++-
.../riscv/rvv/base/spill-sp-adjust.c  | 13 +++
2 files changed, 32 insertions(+), 16 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/spill-sp-adjust.c
 
-- 
2.25.1
 
 


RE: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector

2022-11-15 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Hongtao Liu 
> Sent: Tuesday, November 15, 2022 8:36 AM
> To: Tamar Christina 
> Cc: Richard Sandiford ; Tamar Christina via
> Gcc-patches ; nd ;
> rguent...@suse.de
> Subject: Re: [PATCH 3/8]middle-end: Support extractions of subvectors from
> arbitrary element position inside a vector
> 
> Hi:
>   I'm from https://gcc.gnu.org/pipermail/gcc-patches/2022-
> November/606040.html.
> >  }
> >
> >/* See if we can get a better vector mode before extracting.  */
> > diff --git a/gcc/optabs.cc b/gcc/optabs.cc index
> >
> cff37ccb0dfc3dd79b97d0abfd872f340855dc96..f338df410265dfe55b689616009
> 0
> > a453cc6a28d9 100644
> > --- a/gcc/optabs.cc
> > +++ b/gcc/optabs.cc
> > @@ -6267,6 +6267,7 @@ expand_vec_perm_const (machine_mode mode,
> rtx v0, rtx v1,
> >v0_qi = gen_lowpart (qimode, v0);
> >v1_qi = gen_lowpart (qimode, v1);
> >if (targetm.vectorize.vec_perm_const != NULL
> > + && targetm.can_change_mode_class (mode, qimode, ALL_REGS)
> It looks like you want to guard gen_lowpart, shouldn't it be better to use
> validate_subreg  or (tmp = gen_lowpart_if_possible (mode, target_qi)).
> IMHO, targetm.can_change_mode_class is mostly used for RA, but not to
> guard gen_lowpart.

Hmm, I don't think this is quite true; there are existing uses in expr.cc and
rtlanal.cc that do this and aren't part of RA.  As I mentioned before, the
canonicalization of vec_select to subreg in rtlanal, for instance, uses this.

So there is already existing precedent for this.  And the documentation for
the hook says:

"This hook returns true if it is possible to bitcast values held in registers 
of class rclass from mode from to mode to and if doing so preserves the 
low-order bits that are common to both modes. The result is only meaningful if 
rclass has registers that can hold both from and to. The default implementation 
returns true"

So it looks like its use outside of RA is perfectly valid, and the
documentation's example also mentions use from the mid-end.

But if the mid-end maintainers are happy I'll use something else.

Tamar

> I did similar things in
> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579296.html
> (and ALL_REGS doesn't cover all cases for registers which are both available
> for qimode and mode, ALL_REGS fail doesn't mean it can't be subreg, it just
> means parts of ALL_REGS can't be subreg. but with a subset of ALL_REGS,
> there could be a reg class which return true for
> targetm.can_change_mode_class)
> >   && targetm.vectorize.vec_perm_const (qimode, qimode, target_qi,
> v0_qi,
> >v1_qi, qimode_indices))
> > return gen_lowpart (mode, target_qi); @@ -6311,7 +6312,8 @@
> > expand_vec_perm_const (machine_mode mode, rtx v0, rtx v1,
> >  }
> >
> >if (qimode != VOIDmode
> > -  && selector_fits_mode_p (qimode, qimode_indices))
> > +  && selector_fits_mode_p (qimode, qimode_indices)
> > +  && targetm.can_change_mode_class (mode, qimode, ALL_REGS))
> >  {
> >icode = direct_optab_handler (vec_perm_optab, qimode);
> >if (icode != CODE_FOR_nothing)
> > diff --git a/gcc/testsuite/gcc.target/aarch64/ext_1.c
> > b/gcc/testsuite/gcc.target/aarch64/ext_1.c
> > new file mode 100644
> > index
> >
> ..18a10a14f1161584267a8472e5
> 71
> > b3bc2ddf887a
> 
> 
> 
> 
> --
> BR,
> Hongtao


[PATCH] c++, v2: Allow attributes on concepts - DR 2428

2022-11-15 Thread Jakub Jelinek via Gcc-patches
On Mon, Nov 14, 2022 at 07:00:54PM -0500, Jason Merrill wrote:
> > The following patch adds parsing of attributes to concept definition,
> > allows deprecated attribute to be specified (some ugliness needed
> > because CONCEPT_DECL is a cp/*.def attribute and so can't be mentioned
> > in c-family/ directly; used what is used for objc method decls,
> > an alternative would be a langhook)
> 
> Several of the codes in c-common.def are C++-only, you might just move it
> over?
> 
> > and checks TREE_DEPRECATED in
> > build_standard_check (not sure if that is the right spot, or whether
> > it shouldn't be checked also for variable and function concepts and
> > how to write testcase coverage for that).
> 
> I wouldn't bother with var/fn concepts, they're obsolete.

Ok, so like this?
The previous version passed bootstrap/regtest on x86_64-linux and i686-linux,
I'll of course test this one as well.

Jakub



Re: [PATCH] c++, v2: Allow attributes on concepts - DR 2428

2022-11-15 Thread Jakub Jelinek via Gcc-patches
On Tue, Nov 15, 2022 at 09:54:00AM +0100, Jakub Jelinek via Gcc-patches wrote:
> On Mon, Nov 14, 2022 at 07:00:54PM -0500, Jason Merrill wrote:
> > > The following patch adds parsing of attributes to concept definition,
> > > allows deprecated attribute to be specified (some ugliness needed
> > > because CONCEPT_DECL is a cp/*.def attribute and so can't be mentioned
> > > in c-family/ directly; used what is used for objc method decls,
> > > an alternative would be a langhook)
> > 
> > Several of the codes in c-common.def are C++-only, you might just move it
> > over?
> > 
> > > and checks TREE_DEPRECATED in
> > > build_standard_check (not sure if that is the right spot, or whether
> > > it shouldn't be checked also for variable and function concepts and
> > > how to write testcase coverage for that).
> > 
> > I wouldn't bother with var/fn concepts, they're obsolete.
> 
> Ok, so like this?
> The previous version passed bootstrap/regtest on x86_64-linux and i686-linux,
> I'll of course test this one as well.

Better with a patch.  Sorry.

2022-11-15  Jakub Jelinek  

gcc/c-family/
* c-common.def (CONCEPT_DECL): New tree, moved here from
cp-tree.def.
* c-common.cc (c_common_init_ts): Handle CONCEPT_DECL.
* c-attribs.cc (handle_deprecated_attribute): Allow deprecated
attribute on CONCEPT_DECL.
gcc/cp/
* cp-tree.def (CONCEPT_DECL): Move to c-common.def.
* cp-objcp-common.cc (cp_common_init_ts): Don't handle CONCEPT_DECL
here.
* cp-tree.h (finish_concept_definition): Add ATTRS parameter.
* parser.cc (cp_parser_concept_definition): Parse attributes in
between identifier and =.  Adjust finish_concept_definition
caller.
* pt.cc (finish_concept_definition): Add ATTRS parameter.  Call
cplus_decl_attributes.
* constraint.cc (build_standard_check): If CONCEPT_DECL is
TREE_DEPRECATED, emit -Wdeprecated-declarations warnings.
gcc/testsuite/
* g++.dg/cpp2a/concepts-dr2428.C: New test.

--- gcc/c-family/c-common.def.jj   2022-10-14 09:28:27.975164491 +0200
+++ gcc/c-family/c-common.def   2022-11-15 09:34:01.384591076 +0100
@@ -81,6 +81,14 @@ DEFTREECODE (CONTINUE_STMT, "continue_st
SWITCH_STMT_SCOPE, respectively.  */
 DEFTREECODE (SWITCH_STMT, "switch_stmt", tcc_statement, 4)
 
+/* Extensions for C++ Concepts. */
+
+/* Concept definition. This is not entirely different than a VAR_DECL
+   except that a) it must be a template, and b) doesn't have the wide
+   range of value and linkage options available to variables.  Used
+   by C++ FE and in c-family attribute handling.  */
+DEFTREECODE (CONCEPT_DECL, "concept_decl", tcc_declaration, 0)
+
 /*
 Local variables:
 mode:c
--- gcc/c-family/c-common.cc.jj 2022-11-13 12:29:08.165504692 +0100
+++ gcc/c-family/c-common.cc   2022-11-15 09:34:48.828950083 +0100
@@ -8497,6 +8497,8 @@ c_common_init_ts (void)
   MARK_TS_EXP (FOR_STMT);
   MARK_TS_EXP (SWITCH_STMT);
   MARK_TS_EXP (WHILE_STMT);
+
+  MARK_TS_DECL_COMMON (CONCEPT_DECL);
 }
 
 /* Build a user-defined numeric literal out of an integer constant type VALUE
--- gcc/c-family/c-attribs.cc.jj   2022-11-14 13:35:34.184160348 +0100
+++ gcc/c-family/c-attribs.cc   2022-11-15 09:30:57.370081060 +0100
@@ -4211,7 +4211,8 @@ handle_deprecated_attribute (tree *node,
  || VAR_OR_FUNCTION_DECL_P (decl)
  || TREE_CODE (decl) == FIELD_DECL
  || TREE_CODE (decl) == CONST_DECL
- || objc_method_decl (TREE_CODE (decl)))
+ || objc_method_decl (TREE_CODE (decl))
+ || TREE_CODE (decl) == CONCEPT_DECL)
TREE_DEPRECATED (decl) = 1;
   else if (TREE_CODE (decl) == LABEL_DECL)
{
--- gcc/cp/cp-tree.def.jj   2022-09-29 18:11:34.83800 +0200
+++ gcc/cp/cp-tree.def  2022-11-15 09:32:17.456996090 +0100
@@ -495,11 +495,6 @@ DEFTREECODE (OMP_DEPOBJ, "omp_depobj", t
 
 /* Extensions for Concepts. */
 
-/* Concept definition. This is not entirely different than a VAR_DECL
-   except that a) it must be a template, and b) doesn't have the wide
-   range of value and linkage options available to variables.  */
-DEFTREECODE (CONCEPT_DECL, "concept_decl", tcc_declaration, 0)
-
 /* Used to represent information associated with constrained declarations. */
 DEFTREECODE (CONSTRAINT_INFO, "constraint_info", tcc_exceptional, 0)
 
--- gcc/cp/cp-objcp-common.cc.jj   2022-09-30 18:38:55.349607203 +0200
+++ gcc/cp/cp-objcp-common.cc   2022-11-15 09:34:21.963313049 +0100
@@ -473,7 +473,6 @@ cp_common_init_ts (void)
   /* New decls.  */
   MARK_TS_DECL_COMMON (TEMPLATE_DECL);
   MARK_TS_DECL_COMMON (WILDCARD_DECL);
-  MARK_TS_DECL_COMMON (CONCEPT_DECL);
 
   MARK_TS_DECL_NON_COMMON (USING_DECL);
 
--- gcc/cp/cp-tree.h.jj 2022-11-15 08:17:07.561388452 +0100
+++ gcc/cp/cp-tree.h   2022-11-15 09:30:57.371081046 +0100
@@ -8324,7 +8324,7 @@ struct diagnosing_failed_constraint
 extern cp_expr finish_constraint_or_expr   (location_t, cp_expr, cp_expr);

Re: [PATCH v2] LoongArch: Add prefetch instructions.

2022-11-15 Thread Xi Ruoyao via Gcc-patches
On Sat, 2022-11-12 at 17:45 +0800, Xi Ruoyao via Gcc-patches wrote:
> void prefetch(char *ptr, int off)
> {
> return __builtin_prefetch(ptr + off);
> }
> 
> It's compiled to "preldx 0,$r4,$r5".  I don't think it's correct
> because
> according to the doc, rk should contains several bit-fields instead of
> an offset.

Hi Lulu,

Considering we are in stage 3 now and we can still push patches which
have been reviewed (in the first week of stage 3), I guess we can add
preld for GCC 13 and try preldx in the next development cycle?

BTW if preldx behaves exactly as how the manual says, I think it's not
possible to invoke it correctly in GNU C unless using inline assembly...

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v2] LoongArch: Add prefetch instructions.

2022-11-15 Thread Lulu Cheng



On 2022/11/15 at 5:17 PM, Xi Ruoyao wrote:
> On Sat, 2022-11-12 at 17:45 +0800, Xi Ruoyao via Gcc-patches wrote:
> > void prefetch(char *ptr, int off)
> > {
> > return __builtin_prefetch(ptr + off);
> > }
> > 
> > It's compiled to "preldx 0,$r4,$r5".  I don't think it's correct
> > because
> > according to the doc, rk should contains several bit-fields instead of
> > an offset.
> 
> Hi Lulu,
> 
> Considering we are in stage 3 now and we can still push patches which
> have been reviewed (in the first week of stage 3), I guess we can add
> preld for GCC 13 and try preldx in the next development cycle?
> 
> BTW if preldx behaves exactly as how the manual says, I think it's not
> possible to invoke it correctly in GNU C unless using inline assembly...

Well, I also want to add preld and instant load optimization support in
this release.

I will send a patch in the next two days :-)




Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector

2022-11-15 Thread Hongtao Liu via Gcc-patches
On Tue, Nov 15, 2022 at 4:51 PM Tamar Christina  wrote:
>
> > -Original Message-
> > From: Hongtao Liu 
> > Sent: Tuesday, November 15, 2022 8:36 AM
> > To: Tamar Christina 
> > Cc: Richard Sandiford ; Tamar Christina via
> > Gcc-patches ; nd ;
> > rguent...@suse.de
> > Subject: Re: [PATCH 3/8]middle-end: Support extractions of subvectors from
> > arbitrary element position inside a vector
> >
> > Hi:
> >   I'm from https://gcc.gnu.org/pipermail/gcc-patches/2022-
> > November/606040.html.
> > >  }
> > >
> > >/* See if we can get a better vector mode before extracting.  */
> > > diff --git a/gcc/optabs.cc b/gcc/optabs.cc index
> > >
> > cff37ccb0dfc3dd79b97d0abfd872f340855dc96..f338df410265dfe55b689616009
> > 0
> > > a453cc6a28d9 100644
> > > --- a/gcc/optabs.cc
> > > +++ b/gcc/optabs.cc
> > > @@ -6267,6 +6267,7 @@ expand_vec_perm_const (machine_mode mode,
> > rtx v0, rtx v1,
> > >v0_qi = gen_lowpart (qimode, v0);
> > >v1_qi = gen_lowpart (qimode, v1);
> > >if (targetm.vectorize.vec_perm_const != NULL
> > > + && targetm.can_change_mode_class (mode, qimode, ALL_REGS)
> > It looks like you want to guard gen_lowpart, shouldn't it be better to use
> > validate_subreg  or (tmp = gen_lowpart_if_possible (mode, target_qi)).
> > IMHO, targetm.can_change_mode_class is mostly used for RA, but not to
> > guard gen_lowpart.
>
> Hmm I don't think this is quite true, there are existing usages in expr.cc 
> and rtanal.cc
> That do this and aren't part of RA.  As I mentioned before for instance the
> canoncalization of vec_select to subreg in rtlanal for instances uses this.
In theory, we need to iterate through all reg classes that can be
assigned for both qimode and mode; if any reg class returns true for
targetm.can_change_mode_class, the bitcast (validate_subreg) should be
OK.
Here we just pass ALL_REGS.
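
The difference between the two kinds of check can be sketched with a toy
model. Everything here is hypothetical and unrelated to any real target:
`toy_class_ok` stands in for per-class answers to the hook for one fixed
pair of modes, and the ALL_REGS-style query is assumed to fail as soon as
one member class fails.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-class answers for one (from, to) mode pair on a toy
   target: can a value held in this class be bitcast between the modes?  */
static const bool toy_class_ok[] = {
  false, /* general registers */
  true,  /* vector registers */
  false, /* mask registers */
};

#define TOY_NUM_CLASSES (sizeof toy_class_ok / sizeof toy_class_ok[0])

/* Models a query on the union of all classes: fails if any member fails.  */
static bool
toy_ok_for_all_regs (void)
{
  for (size_t i = 0; i < TOY_NUM_CLASSES; i++)
    if (!toy_class_ok[i])
      return false;
  return true;
}

/* Models the per-class iteration: succeeds if at least one class works.  */
static bool
toy_ok_for_some_class (void)
{
  for (size_t i = 0; i < TOY_NUM_CLASSES; i++)
    if (toy_class_ok[i])
      return true;
  return false;
}
```

In this model the union query rejects the mode change even though the
vector class alone would allow it, which is the case the iteration is
meant to catch.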
>
> So there are already existing precedence for this.  And the documentation for
> the hook says:
>
> "This hook returns true if it is possible to bitcast values held in registers 
> of class rclass from mode from to mode to and if doing so preserves the 
> low-order bits that are common to both modes. The result is only meaningful 
> if rclass has registers that can hold both from and to. The default 
> implementation returns true"
>
> So it looks like it's use outside of RA is perfectly valid.. and the 
> documentation also mentions
> in the example the use from the mid-end as an example.
>
> But if the mid-end maintainers are happy I'll use something else.
>
> Tamar
>
> > I did similar things in
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579296.html
> > (and ALL_REGS doesn't cover all cases for registers which are both available
> > for qimode and mode, ALL_REGS fail doesn't mean it can't be subreg, it just
> > means parts of ALL_REGS can't be subreg. but with a subset of ALL_REGS,
> > there could be a reg class which return true for
> > targetm.can_change_mode_class)
> > >   && targetm.vectorize.vec_perm_const (qimode, qimode, target_qi,
> > v0_qi,
> > >v1_qi, qimode_indices))
> > > return gen_lowpart (mode, target_qi); @@ -6311,7 +6312,8 @@
> > > expand_vec_perm_const (machine_mode mode, rtx v0, rtx v1,
> > >  }
> > >
> > >if (qimode != VOIDmode
> > > -  && selector_fits_mode_p (qimode, qimode_indices))
> > > +  && selector_fits_mode_p (qimode, qimode_indices)
> > > +  && targetm.can_change_mode_class (mode, qimode, ALL_REGS))
> > >  {
> > >icode = direct_optab_handler (vec_perm_optab, qimode);
> > >if (icode != CODE_FOR_nothing)
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/ext_1.c
> > > b/gcc/testsuite/gcc.target/aarch64/ext_1.c
> > > new file mode 100644
> > > index
> > >
> > ..18a10a14f1161584267a8472e5
> > 71
> > > b3bc2ddf887a
> >
> >
> >
> >
> > --
> > BR,
> > Hongtao



-- 
BR,
Hongtao


RE: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector

2022-11-15 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Hongtao Liu 
> Sent: Tuesday, November 15, 2022 9:37 AM
> To: Tamar Christina 
> Cc: Richard Sandiford ; Tamar Christina via
> Gcc-patches ; nd ;
> rguent...@suse.de
> Subject: Re: [PATCH 3/8]middle-end: Support extractions of subvectors from
> arbitrary element position inside a vector
> 
> On Tue, Nov 15, 2022 at 4:51 PM Tamar Christina
>  wrote:
> >
> > > -Original Message-
> > > From: Hongtao Liu 
> > > Sent: Tuesday, November 15, 2022 8:36 AM
> > > To: Tamar Christina 
> > > Cc: Richard Sandiford ; Tamar Christina
> > > via Gcc-patches ; nd ;
> > > rguent...@suse.de
> > > Subject: Re: [PATCH 3/8]middle-end: Support extractions of
> > > subvectors from arbitrary element position inside a vector
> > >
> > > Hi:
> > >   I'm from https://gcc.gnu.org/pipermail/gcc-patches/2022-
> > > November/606040.html.
> > > >  }
> > > >
> > > >/* See if we can get a better vector mode before extracting.
> > > > */ diff --git a/gcc/optabs.cc b/gcc/optabs.cc index
> > > >
> > >
> cff37ccb0dfc3dd79b97d0abfd872f340855dc96..f338df410265dfe55b68961600
> > > 9
> > > 0
> > > > a453cc6a28d9 100644
> > > > --- a/gcc/optabs.cc
> > > > +++ b/gcc/optabs.cc
> > > > @@ -6267,6 +6267,7 @@ expand_vec_perm_const (machine_mode
> mode,
> > > rtx v0, rtx v1,
> > > >v0_qi = gen_lowpart (qimode, v0);
> > > >v1_qi = gen_lowpart (qimode, v1);
> > > >if (targetm.vectorize.vec_perm_const != NULL
> > > > + && targetm.can_change_mode_class (mode, qimode,
> > > > + ALL_REGS)
> > > It looks like you want to guard gen_lowpart, shouldn't it be better
> > > to use validate_subreg  or (tmp = gen_lowpart_if_possible (mode,
> target_qi)).
> > > IMHO, targetm.can_change_mode_class is mostly used for RA, but not
> > > to guard gen_lowpart.
> >
> > Hmm I don't think this is quite true, there are existing usages in
> > expr.cc and rtanal.cc That do this and aren't part of RA.  As I
> > mentioned before for instance the canoncalization of vec_select to subreg
> in rtlanal for instances uses this.
> In theory, we need to iterate through all reg classes that can be assigned for
> both qimode and mode, if any regclass returns true for
> targetm.can_change_mode_class, the bitcast(validate_subreg) should be ok.
> Here we just passed ALL_REGS.

Yes, and most targets where this transformation is valid return true here.

I've checked:
 * alpha
 * arm
 * aarch64
 * rs6000
 * s390
 * sparc
 * pa
 * mips

And even the default example from the documentation that other targets use
would return true, as the sizes of the modes are the same.

X86 and RISC-V are the only two targets I found (though I didn't check them all)
that return a blanket result based on just the register classes.

That is to say, there are more targets that adhere to the interpretation that
rclass here means "should be possible in some class in rclass" rather than
"should be possible in ALL classes of rclass".

> >
> > So there are already existing precedence for this.  And the
> > documentation for the hook says:
> >
> > "This hook returns true if it is possible to bitcast values held in 
> > registers of
> class rclass from mode from to mode to and if doing so preserves the low-
> order bits that are common to both modes. The result is only meaningful if
> rclass has registers that can hold both from and to. The default
> implementation returns true"
> >
> > So it looks like it's use outside of RA is perfectly valid.. and the
> > documentation also mentions in the example the use from the mid-end as
> an example.
> >
> > But if the mid-end maintainers are happy I'll use something else.
> >
> > Tamar
> >
> > > I did similar things in
> > > https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579296.html
> > > (and ALL_REGS doesn't cover all cases for registers which are both
> > > available for qimode and mode, ALL_REGS fail doesn't mean it can't
> > > be subreg, it just means parts of ALL_REGS can't be subreg. but with
> > > a subset of ALL_REGS, there could be a reg class which return true
> > > for
> > > targetm.can_change_mode_class)
> > > >   && targetm.vectorize.vec_perm_const (qimode, qimode,
> > > > target_qi,
> > > v0_qi,
> > > >v1_qi, qimode_indices))
> > > > return gen_lowpart (mode, target_qi); @@ -6311,7 +6312,8
> > > > @@ expand_vec_perm_const (machine_mode mode, rtx v0, rtx v1,
> > > >  }
> > > >
> > > >if (qimode != VOIDmode
> > > > -  && selector_fits_mode_p (qimode, qimode_indices))
> > > > +  && selector_fits_mode_p (qimode, qimode_indices)
> > > > +  && targetm.can_change_mode_class (mode, qimode, ALL_REGS))
> > > >  {
> > > >icode = direct_optab_handler (vec_perm_optab, qimode);
> > > >if (icode != CODE_FOR_nothing) diff --git
> > > > a/gcc/testsuite/gcc.target/aarch64/ext_1.c
> > > > b/gcc/testsuite/gcc.target/aarch64/ext_1.c
> > > > new file mode 100644
> > > > index
> > > >
> > >
> 000

Re: GCC 13.0.0 Status Report (2022-11-14), Stage 3 in effect now

2022-11-15 Thread Martin Liška
On 11/14/22 18:21, Xi Ruoyao wrote:
> Hi Martin,
> 

Hello.

> Is it allowed to merge libsanitizer from LLVM in stage 3?  If not I'd
> like to cherry pick some commits from LLVM [to fix some stupid errors
> I've made in LoongArch libasan :(].

I'm sorry, but I was really busy porting the documentation to Sphinx.

Anyway, yes, we should make one libsanitizer merge, but the RMs should likely
approve it: Richi, Jakub, do you support it?

Thanks,
Martin

> 
> On Mon, 2022-11-14 at 13:21 +, Richard Biener via Gcc-patches wrote:
>> Status
>> ==
>>
>> The GCC development branch which will become GCC 13 is now in
>> bugfixing mode (Stage 3) until the end of Jan 15th.
>>
>> As usual the first weeks of Stage 3 are used to feature patches
>> posted late during Stage 1.  At some point unreviewed features
>> need to be postponed for the next Stage 1.
>>
>>
>> Quality Data
>> ============
>>
>> Priority          #   Change from last report
>> --------        ---   -----------------------
>> P1               33
>> P2              473
>> P3              113   +  29
>> P4              253   +   6
>> P5               25
>> --------        ---   -----------------------
>> Total P1-P3     619   +  29
>> Total           897   +  35
>>
>>
>> Previous Report
>> ===============
>>
>> https://gcc.gnu.org/pipermail/gcc/2022-October/239690.html
> 



Re: Revert Sphinx documentation [Was: Issues with Sphinx]

2022-11-15 Thread Martin Liška
On 11/14/22 14:06, Gerald Pfeifer wrote:
> On Mon, 14 Nov 2022, Martin Liška wrote:
>> The situation with the Sphinx migration went out of control. The TODO 
>> list overwhelmed me and there are road-blocks that can't be easily fixed 
>> with what Sphinx currently supports.
> 
> This migration was/is a huge and complex undertaking, and you have been 
> patiently chipping away at obstacle after obstacle.
> 
> So while it probably is disappointing it did not go through this time,
> you made a lot of progress and important contributions - and we all 
> learned quite a bit more, also in terms of (not so obvious) requirements,
> dependencies, and road blocks left which you summarized.

Hello.

I see it similarly and I'm trying to take the best out of it.

> 
> 
> Timing was tricky for me being on the road last week and I am definitely
> committed to keep helping with this transition. Maybe soon after we are in 
> stage 1 again?

For now, I'm not planning to work on that in the future. Maybe someone else
can carry on working on that.

> 
> And would it make sense to convert at least our installation docs and
> https://gcc.gnu.org/install/ for the GCC 13 release?

Yes, it's definitely doable as it's quite small, not split among too many .rst
files and the HTML version is mainly built at our server in order to populate
gcc.gnu.org/install

I'm willing to help anybody, but it won't be me who will suggest/send
the patches.

Thanks for understanding.
Martin

> 
> Gerald



Re: GCC 13.0.0 Status Report (2022-11-14), Stage 3 in effect now

2022-11-15 Thread Jakub Jelinek via Gcc-patches
On Tue, Nov 15, 2022 at 11:02:53AM +0100, Martin Liška wrote:
> > Is it allowed to merge libsanitizer from LLVM in stage 3?  If not I'd
> > like to cherry pick some commits from LLVM [to fix some stupid errors
> > I've made in LoongArch libasan :(].
> 
> I'm sorry but I was really busy with the porting of the documentation to 
> Sphinx.
> 
> Anyway, yes, we should make one libsanitizer merge, but RM should likely
> approve it: Richi, Jakub, do you support it?

Could you please prepare a patch, so that we can see how much actually
changed and decide based on that whether to go for a merge or cherry-picking
one or more commits?
I think last merge was done by you at the end of August, so we have
2.5 months of changes to potentially merge.

Jakub



Re: Rust frontend patches v3

2022-11-15 Thread Arthur Cohen

Hi Richard,

On 11/10/22 11:52, Richard Biener wrote:

On Wed, Oct 26, 2022 at 10:16 AM  wrote:


This is the fixed version of our previous patch set for gccrs. We've addressed
the comments raised in our previous emails.

This patch set does not contain any work that was not previously included, such
as closure support, the constant evaluator port, or the better implementation
of target hooks by Iain Buclaw. They will follow up in subsequent patch sets.

Thanks again to Open Source Security, inc and Embecosm who have accompanied us
for this work.

Many thanks to all of the contributors and our community, who made this
possible.

A very special thanks to Philip Herron, without whose mentoring I would have
never been in a position to send these patches.

You can see the current status of our work on our branch:
https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/devel/rust/master

The patch set contains the following:


Can you mark the patches that have been reviewed/approved?  Can you
maybe either split the series or organize it in a way to separate the
pieces touching common parts of GCC from the gcc/rust/ parts?
Can you separate testsuite infrastructure from actual tests, can
you mark/separate target specific changes?  And for those (then small)
changes CC the appropriate maintainers?


Thanks a lot for all the feedback. I'll apply the required changes and 
make sure the patchset(s) are a bit easier to review.


All the best,

Arthur



Thanks,
Richard.


[PATCH Rust front-end v3 01/46] Use DW_ATE_UTF for the Rust 'char'
[PATCH Rust front-end v3 02/46] gccrs: Add nessecary hooks for a Rust
[PATCH Rust front-end v3 03/46] gccrs: Add Debug info testsuite
[PATCH Rust front-end v3 04/46] gccrs: Add link cases testsuite
[PATCH Rust front-end v3 05/46] gccrs: Add general compilation test
[PATCH Rust front-end v3 06/46] gccrs: Add execution test cases
[PATCH Rust front-end v3 07/46] gccrs: Add gcc-check-target
[PATCH Rust front-end v3 08/46] gccrs: Add Rust front-end base AST
[PATCH Rust front-end v3 09/46] gccrs: Add definitions of Rust Items
[PATCH Rust front-end v3 10/46] gccrs: Add full definitions of Rust
[PATCH Rust front-end v3 11/46] gccrs: Add Rust AST visitors
[PATCH Rust front-end v3 12/46] gccrs: Add Lexer for Rust front-end
[PATCH Rust front-end v3 13/46] gccrs: Add Parser for Rust front-end
[PATCH Rust front-end v3 14/46] gccrs: Add Parser for Rust front-end
[PATCH Rust front-end v3 15/46] gccrs: Add expansion pass for the
[PATCH Rust front-end v3 16/46] gccrs: Add name resolution pass to
[PATCH Rust front-end v3 17/46] gccrs: Add declarations for Rust HIR
[PATCH Rust front-end v3 18/46] gccrs: Add HIR definitions and
[PATCH Rust front-end v3 19/46] gccrs: Add AST to HIR lowering pass
[PATCH Rust front-end v3 20/46] gccrs: Add wrapper for make_unique
[PATCH Rust front-end v3 21/46] gccrs: Add port of FNV hash used
[PATCH Rust front-end v3 22/46] gccrs: Add Rust ABI enum helpers
[PATCH Rust front-end v3 23/46] gccrs: Add Base62 implementation
[PATCH Rust front-end v3 24/46] gccrs: Add implementation of Optional
[PATCH Rust front-end v3 25/46] gccrs: Add attributes checker
[PATCH Rust front-end v3 26/46] gccrs: Add helpers mappings canonical
[PATCH Rust front-end v3 27/46] gccrs: Add type resolution and trait
[PATCH Rust front-end v3 28/46] gccrs: Add Rust type information
[PATCH Rust front-end v3 29/46] gccrs: Add remaining type system
[PATCH Rust front-end v3 30/46] gccrs: Add unsafe checks for Rust
[PATCH Rust front-end v3 31/46] gccrs: Add const checker
[PATCH Rust front-end v3 32/46] gccrs: Add privacy checks
[PATCH Rust front-end v3 33/46] gccrs: Add dead code scan on HIR
[PATCH Rust front-end v3 34/46] gccrs: Add unused variable scan
[PATCH Rust front-end v3 35/46] gccrs: Add metadata ouptput pass
[PATCH Rust front-end v3 36/46] gccrs: Add base for HIR to GCC
[PATCH Rust front-end v3 37/46] gccrs: Add HIR to GCC GENERIC
[PATCH Rust front-end v3 38/46] gccrs: Add HIR to GCC GENERIC
[PATCH Rust front-end v3 39/46] gccrs: These are wrappers ported from
[PATCH Rust front-end v3 40/46] gccrs: Add GCC Rust front-end
[PATCH Rust front-end v3 41/46] gccrs: Add config-lang.in
[PATCH Rust front-end v3 42/46] gccrs: Add lang-spec.h
[PATCH Rust front-end v3 43/46] gccrs: Add lang.opt
[PATCH Rust front-end v3 44/46] gccrs: Add compiler driver
[PATCH Rust front-end v3 45/46] gccrs: Compiler proper interface
[PATCH Rust front-end v3 46/46] gccrs: Add README, CONTRIBUTING and



--
Arthur Cohen 

Toolchain Engineer

Embecosm GmbH

Geschäftsführer: Jeremy Bennett
Niederlassung: Nürnberg
Handelsregister: HR-B 36368
www.embecosm.de

Fürther Str. 27
90429 Nürnberg


Tel.: 091 - 128 707 040
Fax: 091 - 128 707 077




[PATCH]middle-end: replace GET_MODE_WIDER_MODE with GET_MODE_NEXT_MODE

2022-11-15 Thread Tamar Christina via Gcc-patches
Hi All,

After the fix to the addsub patch yesterday for bootstrap I had only regtested 
on x86.
While looking today it seemed the new tests were failing, this was caused
by a change in the behavior of the GET_MODE_WIDER_MODE macro on trunk.

This patch fixes that issue. Sorry for the mess, have rebased all branches now.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* match.pd: Replace GET_MODE_WIDER_MODE with
GET_MODE_NEXT_MODE.

--- inline copy of patch -- 
diff --git a/gcc/match.pd b/gcc/match.pd
index 
1b0ab7cf60fa4772fbe8304c622b0b8fab1bdefa..28191a992039c6f3a1dab5f7c0e35dd58dc47092
 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7997,7 +7997,7 @@ and,
machine_mode wide_mode;
  }
  (if (sel.series_p (0, 2, 0, 2)
-  && GET_MODE_WIDER_MODE (vec_mode).exists (&wide_mode)
+  && GET_MODE_NEXT_MODE (vec_mode).exists (&wide_mode)
  && VECTOR_MODE_P (wide_mode)
  && (GET_MODE_UNIT_BITSIZE (vec_mode) * 2
  == GET_MODE_UNIT_BITSIZE (wide_mode)))









Re: [PATCH] RISC-V: Optimal RVV epilogue logic.

2022-11-15 Thread Philipp Tomsich
On Mon, 14 Nov 2022 at 17:29, jiawei  wrote:
>
> Skip generating the add insn if the adjust size is equal to zero.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_expand_epilogue):
> New if control segment.
>
> ---
>  gcc/config/riscv/riscv.cc | 18 ++
>  1 file changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 02a01ca0b7c..af138db7545 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -5186,24 +5186,26 @@ riscv_expand_epilogue (int style)
> }
>
>/* Get an rtx for STEP1 that we can add to BASE.  */
> -  rtx adjust = GEN_INT (step1.to_constant ());
> -  if (!SMALL_OPERAND (step1.to_constant ()))
> +  if (step1.to_constant () != 0){
> +rtx adjust = GEN_INT (step1.to_constant ());
> +if (!SMALL_OPERAND (step1.to_constant ()))

Please take a look at the recent improvements for the add3
expander (recently submitted as
https://patchwork.ozlabs.org/project/gcc/patch/20221109230718.3240479-1-philipp.toms...@vrull.eu/).
Maybe you also want to use the test for the addi_operand(...) instead
of SMALL_OPERAND?
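
For readers following along, here is a minimal standalone sketch of the
decision being discussed, assuming the check amounts to asking whether the
adjustment fits the 12-bit signed immediate of addi.  The names below
(fits_addi_imm, classify_sp_adjust, enum sp_adjust) are hypothetical and
are not GCC's macros or predicates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not GCC's SMALL_OPERAND: true if VALUE fits the
   12-bit signed immediate of a RISC-V addi, i.e. [-2048, 2047].  */
bool fits_addi_imm (int64_t value)
{
  return value >= -2048 && value <= 2047;
}

/* Hypothetical classification of how an epilogue could adjust sp.  */
enum sp_adjust { ADJUST_NONE, ADJUST_ADDI, ADJUST_VIA_TEMP };

enum sp_adjust classify_sp_adjust (int64_t step)
{
  if (step == 0)
    return ADJUST_NONE;      /* Nothing to emit at all.  */
  if (fits_addi_imm (step))
    return ADJUST_ADDI;      /* A single addi sp, sp, step suffices.  */
  return ADJUST_VIA_TEMP;    /* Load STEP into a temporary, then add.  */
}
```

The zero case is the one the quoted patch wants to skip; the other two
correspond to the existing direct and via-temporary paths.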

> {
>   riscv_emit_move (RISCV_PROLOGUE_TEMP (Pmode), adjust);
>   adjust = RISCV_PROLOGUE_TEMP (Pmode);
> }
>
> -  insn = emit_insn (
> +insn = emit_insn (
>gen_add3_insn (stack_pointer_rtx, stack_pointer_rtx, adjust));
>
> -  rtx dwarf = NULL_RTX;
> -  rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
> +rtx dwarf = NULL_RTX;
> +rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
>  GEN_INT (step2));
>
> -  dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
> -  RTX_FRAME_RELATED_P (insn) = 1;
> +dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
> +RTX_FRAME_RELATED_P (insn) = 1;
>
> -  REG_NOTES (insn) = dwarf;
> +REG_NOTES (insn) = dwarf;
> +  }
>  }
>else if (frame_pointer_needed)
>  {
> --
> 2.25.1
>


Re: [PATCH 2/2]AArch64 Support new tbranch optab.

2022-11-15 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hello,
>
> Ping and updated patch.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.md (*tb1): Rename to...
> (*tb1): ... this.
> (tbranch4): New.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/tbz_1.c: New test.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> 2bc2684b82c35a44e0a2cea6e3aaf32d939f8cdf..d7684c93fba5b717d568e1a4fd712bde55c7c72e
>  100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -943,12 +943,29 @@ (define_insn "*cb1"
>   (const_int 1)))]
>  )
>
> -(define_insn "*tb1"
> +(define_expand "tbranch4"
>[(set (pc) (if_then_else
> - (EQL (zero_extract:DI (match_operand:GPI 0 "register_operand" 
> "r")
> -   (const_int 1)
> -   (match_operand 1
> - "aarch64_simd_shift_imm_" "n"))
> +   (match_operator 0 "aarch64_comparison_operator"
> +[(match_operand:ALLI 1 "register_operand")
> + (match_operand:ALLI 2 
> "aarch64_simd_shift_imm_")])
> +   (label_ref (match_operand 3 "" ""))
> +   (pc)))]
> +  "optimize > 0"

Why's the pattern conditional on optimize?  Seems a valid choice at -O0 too.

I think the split here shows the difficulty with having a single optab
and a comparison operator though.  operand 0 can be something like:

  (eq x 1)

but we're not comparing x for equality with 1.  We're testing whether
bit 1 is zero.  This means that operand 0 can't be taken literally
and can't be used directly in insn patterns.
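
As a rough illustration in plain C (the function names are made up for
this sketch and do not appear in the patch), the literal reading and the
intended single-bit test disagree for most inputs:

```c
#include <stdbool.h>
#include <stdint.h>

/* What (eq x 1) would mean if taken literally: compare x with 1.  */
bool literal_eq (uint64_t x)
{
  return x == 1;
}

/* What the branch actually tests: whether bit number BITPOS of x is zero.
   BITPOS is assumed to be smaller than 64.  */
bool bit_is_zero (uint64_t x, unsigned bitpos)
{
  return ((x >> bitpos) & 1) == 0;
}
```

For x = 5, literal_eq is false while bit_is_zero (x, 1) is true, which is
why the operator cannot be reused as-is in an insn pattern.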

In an earlier review, I'd said:

  For the TB instructions (and for other similar instructions that I've
  seen on other architectures) it would be more useful to have a single-bit
  test, with operand 4 specifying the bit position.  Arguably it might then
  be better to have separate eq and ne optabs, to avoid the awkward doubling
  of the operands (operand 1 contains operands 2 and 3).

I think we should do that eq/ne split (sorry for not pushing harder for
it before).

Thanks,
Richard



> +{
> +  rtx bitvalue = gen_reg_rtx (DImode);
> +  rtx tmp = simplify_gen_subreg (DImode, operands[1], GET_MODE 
> (operands[1]), 0);
> +  emit_insn (gen_extzv (bitvalue, tmp, const1_rtx, operands[2]));
> +  operands[2] = const0_rtx;
> +  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), bitvalue,
> +operands[2]);
> +})
> +
> +(define_insn "*tb1"
> +  [(set (pc) (if_then_else
> + (EQL (zero_extract:GPI (match_operand:ALLI 0 "register_operand" 
> "r")
> +(const_int 1)
> +(match_operand 1
> +  "aarch64_simd_shift_imm_" 
> "n"))
>(const_int 0))
>  (label_ref (match_operand 2 "" ""))
>  (pc)))
> @@ -959,15 +976,15 @@ (define_insn "*tb1"
>{
> if (get_attr_far_branch (insn) == 1)
>   return aarch64_gen_far_branch (operands, 2, "Ltb",
> -"\\t%0, %1, ");
> +"\\t%0, %1, ");
> else
>   {
> operands[1] = GEN_INT (HOST_WIDE_INT_1U << UINTVAL (operands[1]));
> -   return "tst\t%0, %1\;\t%l2";
> +   return "tst\t%0, %1\;\t%l2";
>   }
>}
>  else
> -  return "\t%0, %1, %l2";
> +  return "\t%0, %1, %l2";
>}
>[(set_attr "type" "branch")
> (set (attr "length")
> diff --git a/gcc/testsuite/gcc.target/aarch64/tbz_1.c 
> b/gcc/testsuite/gcc.target/aarch64/tbz_1.c
> new file mode 100644
> index 
> ..86f5d3e23cf7f1ea6f3596549ce1a0cff6774463
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/tbz_1.c
> @@ -0,0 +1,95 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O2 -std=c99  -fno-unwind-tables 
> -fno-asynchronous-unwind-tables" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> +
> +#include 
> +
> +void h(void);
> +
> +/*
> +** g1:
> +** tbnzx[0-9]+, #?0, .L([0-9]+)
> +** ret
> +** ...
> +*/
> +void g1(bool x)
> +{
> +  if (__builtin_expect (x, 0))
> +h ();
> +}
> +
> +/*
> +** g2:
> +** tbz x[0-9]+, #?0, .L([0-9]+)
> +** b   h
> +** ...
> +*/
> +void g2(bool x)
> +{
> +  if (__builtin_expect (x, 1))
> +h ();
> +}
> +
> +/*
> +** g3_ge:
> +** tbnzw[0-9]+, #?31, .L[0-9]+
> +** b   h
> +** ...
> +*/
> +void g3_ge(int x)
> +{
> +  if (__builtin_expect (x >= 0, 1))
> +h ();
> +}
> +
> +/*
> +** g3_gt:
> +** cmp w[0-9]+, 0
> +** ble .L[0-9]+
> +** b   h
> +** ...
> +*

RE: [PATCH 2/2]AArch64 Support new tbranch optab.

2022-11-15 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Sandiford 
> Sent: Tuesday, November 15, 2022 10:36 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> ; nd ; Marcus Shawcroft
> 
> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> 
> Tamar Christina  writes:
> > Hello,
> >
> > Ping and updated patch.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64.md (*tb1): Rename to...
> > (*tb1): ... this.
> > (tbranch4): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/tbz_1.c: New test.
> >
> > --- inline copy of patch ---
> >
> > diff --git a/gcc/config/aarch64/aarch64.md
> > b/gcc/config/aarch64/aarch64.md index
> >
> 2bc2684b82c35a44e0a2cea6e3aaf32d939f8cdf..d7684c93fba5b717d568e1a4fd
> 71
> > 2bde55c7c72e 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -943,12 +943,29 @@ (define_insn "*cb1"
> >   (const_int 1)))]
> >  )
> >
> > -(define_insn "*tb1"
> > +(define_expand "tbranch4"
> >[(set (pc) (if_then_else
> > - (EQL (zero_extract:DI (match_operand:GPI 0 "register_operand"
> "r")
> > -   (const_int 1)
> > -   (match_operand 1
> > - "aarch64_simd_shift_imm_" "n"))
> > +   (match_operator 0 "aarch64_comparison_operator"
> > +[(match_operand:ALLI 1 "register_operand")
> > + (match_operand:ALLI 2
> "aarch64_simd_shift_imm_")])
> > +   (label_ref (match_operand 3 "" ""))
> > +   (pc)))]
> > +  "optimize > 0"
> 
> Why's the pattern conditional on optimize?  Seems a valid choice at -O0 too.
> 

Hi,

I had explained the reason why in the original patch, just didn't repeat it in 
the ping:

Instead of emitting the instruction directly I've chosen to expand the pattern 
using a zero extract and generating the existing pattern for comparisons for two
reasons:

  1. Allows for CSE of the actual comparison.
  2. It looks like the code in expand marks the label as unused and removes it
     if it doesn't see a separate reference to it.

Because of this expansion, though, I disable the pattern at -O0: we have no
combine in that case, so we'd end up with worse code.  I did try emitting the
pattern directly, but as mentioned in #2, expand would then kill the label.

Basically, if I emit the pattern directly, the label is immediately marked as
dead during expand for some weird reason.

Tamar.

> I think the split here shows the difficulty with having a single optab and a
> comparison operator though.  operand 0 can be something like:
> 
>   (eq x 1)
> 
> but we're not comparing x for equality with 1.  We're testing whether bit 1 is
> zero.  This means that operand 0 can't be taken literally and can't be used
> directly in insn patterns.
> 
> In an earlier review, I'd said:
> 
>   For the TB instructions (and for other similar instructions that I've
>   seen on other architectures) it would be more useful to have a single-bit
>   test, with operand 4 specifying the bit position.  Arguably it might then
>   be better to have separate eq and ne optabs, to avoid the awkward
> doubling
>   of the operands (operand 1 contains operands 2 and 3).
> 
> I think we should do that eq/ne split (sorry for not pushing harder for it
> before).
> 
> Thanks,
> Richard
> 
> 
> 
> > +{
> > +  rtx bitvalue = gen_reg_rtx (DImode);
> > +  rtx tmp = simplify_gen_subreg (DImode, operands[1], GET_MODE
> > +(operands[1]), 0);
> > +  emit_insn (gen_extzv (bitvalue, tmp, const1_rtx, operands[2]));
> > +  operands[2] = const0_rtx;
> > +  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]),
> bitvalue,
> > +operands[2]);
> > +})
> > +
> > +(define_insn "*tb1"
> > +  [(set (pc) (if_then_else
> > + (EQL (zero_extract:GPI (match_operand:ALLI 0 
> > "register_operand"
> "r")
> > +(const_int 1)
> > +(match_operand 1
> > +
> > +"aarch64_simd_shift_imm_" "n"))
> >(const_int 0))
> >  (label_ref (match_operand 2 "" ""))
> >  (pc)))
> > @@ -959,15 +976,15 @@ (define_insn "*tb1"
> >{
> > if (get_attr_far_branch (insn) == 1)
> >   return aarch64_gen_far_branch (operands, 2, "Ltb",
> > -"\\t%0, %1, ");
> > +"\\t%0, %1,
> > + ");
> > else
> >   {
> > operands[1] = GEN_INT (HOST_WIDE_INT_1U << UINTVAL
> (operands[1]));
> > -   return "tst\t%0, %1\;\t%l2";
> > +   return "tst\t%0, %1\;\t%l2";
> >   }
> >}
> >  else
> > -  return "\t%0, %1, %l2";
> > +  return "\t%0, %1, %l2";

Re: [PATCH v2] aarch64: Add support for Ampere-1A (-mcpu=ampere1a) CPU

2022-11-15 Thread Richard Sandiford via Gcc-patches
Philipp Tomsich  writes:
> Richard,
>
> is this OK for backport to GCC-12 and GCC-11?

The fusion part seems potentially risky for a stable branch, but since
it's conditional on the new flag (and thus new CPU), I think it should
be OK.

So yeah, OK for both, thanks.

Richard
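
For context, the idioms the new fusion pair in the quoted patch is about
(A+B+1 and A-B-1) look like this at the source level.  This is only a
sketch with made-up function names; whether a given compiler emits the two
dependent instructions adjacently is not something the sketch shows.

```c
#include <stdint.h>

/* Add two registers, then add one: typically an add followed by a
   dependent add of the constant 1.  Unsigned to keep wraparound defined.  */
uint64_t add_plus_one (uint64_t a, uint64_t b)
{
  return a + b + 1;
}

/* Subtract two registers, then subtract one: typically a sub followed by
   a dependent sub of the constant 1.  */
uint64_t sub_minus_one (uint64_t a, uint64_t b)
{
  return a - b - 1;
}
```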

> Thanks,
> Philipp.
>
> On Mon, 14 Nov 2022 at 14:53, Philipp Tomsich  
> wrote:
>>
>> This patch adds support for Ampere-1A CPU:
>>  - recognize the name of the core and provide detection for -mcpu=native,
>>  - updated extra_costs,
>>  - adds a new fusion pair for (A+B+1 and A-B-1).
>>
>> Ampere-1A and Ampere-1 have more timing difference than the extra
>> costs indicate, but these don't propagate through to the headline
>> items in our extra costs (e.g. the change in latency for scalar sqrt
>> doesn't have a corresponding table entry).
>>
>> gcc/ChangeLog:
>>
>> * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add ampere1a.
>> * config/aarch64/aarch64-cost-tables.h: Add ampere1a_extra_costs.
>> * config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSION_PAIR):
>> Define a new fusion pair for A+B+1/A-B-1 (i.e., add/subtract two
>> registers and then +1/-1).
>> * config/aarch64/aarch64-tune.md: Regenerate.
>> * config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement
>> idiom-matcher for the new fusion pair.
>> * doc/invoke.texi: Add ampere1a.
>>
>> Signed-off-by: Philipp Tomsich 
>> ---
>>
>> Changes in v2:
>> - break line in fusion matcher to stay below 80 characters
>> - rename fusion pair addsub_2reg_const1
>> - document 'ampere1a' in invoke.texi
>>
>>  gcc/config/aarch64/aarch64-cores.def|   1 +
>>  gcc/config/aarch64/aarch64-cost-tables.h| 107 
>>  gcc/config/aarch64/aarch64-fusion-pairs.def |   1 +
>>  gcc/config/aarch64/aarch64-tune.md  |   2 +-
>>  gcc/config/aarch64/aarch64.cc   |  64 
>>  gcc/doc/invoke.texi |   2 +-
>>  6 files changed, 175 insertions(+), 2 deletions(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64-cores.def 
>> b/gcc/config/aarch64/aarch64-cores.def
>> index d2671778928..aead587cec1 100644
>> --- a/gcc/config/aarch64/aarch64-cores.def
>> +++ b/gcc/config/aarch64/aarch64-cores.def
>> @@ -70,6 +70,7 @@ AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  
>> V8A,  (CRC, CRYPTO), thu
>>
>>  /* Ampere Computing ('\xC0') cores. */
>>  AARCH64_CORE("ampere1", ampere1, cortexa57, V8_6A, (F16, RNG, AES, SHA3), 
>> ampere1, 0xC0, 0xac3, -1)
>> +AARCH64_CORE("ampere1a", ampere1a, cortexa57, V8_6A, (F16, RNG, AES, SHA3, 
>> MEMTAG), ampere1a, 0xC0, 0xac4, -1)
>>  /* Do not swap around "emag" and "xgene1",
>> this order is required to handle variant correctly. */
>>  AARCH64_CORE("emag",emag,  xgene1,V8A,  (CRC, CRYPTO), 
>> emag, 0x50, 0x000, 3)
>> diff --git a/gcc/config/aarch64/aarch64-cost-tables.h 
>> b/gcc/config/aarch64/aarch64-cost-tables.h
>> index 760d7b30368..48522606fbe 100644
>> --- a/gcc/config/aarch64/aarch64-cost-tables.h
>> +++ b/gcc/config/aarch64/aarch64-cost-tables.h
>> @@ -775,4 +775,111 @@ const struct cpu_cost_table ampere1_extra_costs =
>>}
>>  };
>>
>> +const struct cpu_cost_table ampere1a_extra_costs =
>> +{
>> +  /* ALU */
>> +  {
>> +0, /* arith.  */
>> +0, /* logical.  */
>> +0, /* shift.  */
>> +COSTS_N_INSNS (1), /* shift_reg.  */
>> +0, /* arith_shift.  */
>> +COSTS_N_INSNS (1), /* arith_shift_reg.  */
>> +0, /* log_shift.  */
>> +COSTS_N_INSNS (1), /* log_shift_reg.  */
>> +0, /* extend.  */
>> +COSTS_N_INSNS (1), /* extend_arith.  */
>> +0, /* bfi.  */
>> +0, /* bfx.  */
>> +0, /* clz.  */
>> +0, /* rev.  */
>> +0, /* non_exec.  */
>> +true   /* non_exec_costs_exec.  */
>> +  },
>> +  {
>> +/* MULT SImode */
>> +{
>> +  COSTS_N_INSNS (3),   /* simple.  */
>> +  COSTS_N_INSNS (3),   /* flag_setting.  */
>> +  COSTS_N_INSNS (3),   /* extend.  */
>> +  COSTS_N_INSNS (4),   /* add.  */
>> +  COSTS_N_INSNS (4),   /* extend_add.  */
>> +  COSTS_N_INSNS (19)   /* idiv.  */
>> +},
>> +/* MULT DImode */
>> +{
>> +  COSTS_N_INSNS (3),   /* simple.  */
>> +  0,   /* flag_setting (N/A).  */
>> +  COSTS_N_INSNS (3),   /* extend.  */
>> +  COSTS_N_INSNS (4),   /* add.  */
>> +  COSTS_N_INSNS (4),   /* extend_add.  */
>> +  COSTS_N_INSNS (35)   /* idiv.  */
>> +}
>> +  },
>> +  /* LD/ST */
>> +  {
>> +COSTS_N_INSNS (4), /* load.  */
>> +COSTS_N_INSNS (4), /* load_sign_extend.  */
>> +0, /* ldrd (n/a).  */
>> +0, /* ldm_1st.  */
>> +0,

PING: [PATCH v6] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2022-11-15 Thread Di Zhao OS via Gcc-patches
Hi,

I saw that Stage 1 of GCC 13 development has just ended. Is this patch
still being considered? Or should I bring it up again when general
development is reopened?


Thanks,
Di Zhao


> -Original Message-
> From: Di Zhao OS
> Sent: Tuesday, October 25, 2022 8:18 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Biener 
> Subject: [PATCH v6] tree-optimization/101186 - extend FRE with "equivalence
> map" for condition prediction
> 
> Sorry for the late update. I've been on a vacation and then I
> spent some time updating and verifying the patch.
> 
> Attached is a new version of the patch. There are some changes:
> 
> 1. Store equivalences in a vn_pval chain in vn_ssa_aux, rather than
>in the expression hash table. (Following Richard's suggestion.)
> 2. Extracted insert_single_predicated_value function.
> 3. Simplify record_equiv_from_prev_phi a bit.
> 4. Changed some of the functions' names and tried to improve the
>comments a little.
> 
> Current status of the new testcases in the patch:
> 
> ssa-fre-200.c Can also be optimized by evrp
> ssa-fre-201.c Not optimized in trunk.
> ssa-fre-202.c foo() can be removed by evrp; while x + b is not
>   folded.
> ssa-pre-34.c  Not optimized in trunk.
> 
> Initially, this patch was motivated by removing unreachable code
> in cases like ssa-pre-34.c, in which we need to use an equivalence
> relation produced by a preceding condition for another condition.
> VRP didn't optimize that because it needs jump threading to make
> the relation valid at the second condition.
> 
> After browsing the mechanisms of VRP and FRE, it seems to me there
> are two options: 1) Teach VRP to identify related but not threaded
> conditions. That might require introducing value-numbering into VRP
> to detect common expressions, which I think is too much for this.
> 2) Introduce temporary equivalences in sccvn, which I thought would
> change less of the current code. (And along the reviews and patch
> updates I see how ad hoc it was.)
> 
> I saw from the talk about VN that there's a plan to replace predicated
> values with ranger. How is that going? Is there something I can help
> with? (For the case ssa-pre-34.c, I think it may still need the
> predicated-value support, to look up related conditional expressions.)
> 
> Below are about questions in the last review:
> 
> > >  /* Valid hashtables storing information we have proven to be
> > > correct.  */
> > > @@ -490,9 +492,9 @@ VN_INFO (tree name)
> > >   nary->predicated_values = 0;
> > >   nary->u.result = boolean_true_node;
> > >   vn_nary_op_insert_into (nary, valid_info->nary);
> > > - gcc_assert (nary->unwind_to == NULL);
> >
> > why's that?  doesn't this mean unwinding will be broken?
> 
> Previously, the predicate "argument_x == NULL" or "argument_x != NULL"
> was always new here (because argument_x's VN had just been inserted).
> With the patch, there can already be a slot for "argument_x == NULL"
> or "argument_x != NULL". This won't break unwinding, as the
> new value is not linked into the unwind chain.
> 
> >
> > >   /* Also do not link it into the undo chain.  */
> > >   last_inserted_nary = nary->next;
> > > + /* There could be a predicate already.  */
> > >   nary->next = (vn_nary_op_t)(void *)-1;
> > >   nary = alloc_vn_nary_op_noinit (2, &vn_tables_insert_obstack);
> > >   init_vn_nary_op_from_pieces (nary, 2, EQ_EXPR,
> 
> > >  /* Compute and return the hash value for nary operation VBO1.  */
> > >
> > >  hashval_t
> > > @@ -4226,6 +4342,9 @@ init_vn_nary_op_from_stmt (vn_nary_op_t vno, gassign
> *stmt)
> > >for (i = 0; i < vno->length; ++i)
> > >   vno->op[i] = gimple_op (stmt, i + 1);
> > >  }
> > > +  /* Insert and lookup N-ary results by the operands' equivalence heads.
> */
> > > +  if (gimple_bb (stmt))
> > > +lookup_equiv_heads (vno->length, vno->op, vno->op, gimple_bb (stmt));
> >
> > That seems like the wrong place, the function didn't even valueize before.
> 
> To utilize temp-equivalences and get more simplified results, n-ary
> expressions should always be inserted and looked up by the operands'
> equivalence heads. So practically everywhere
> init_vn_nary_op_from_stmt is used, lookup_equiv_heads (changed to
> get_equiv_heads) should be called. As I haven't found a better place
> to put that, I just left it here in the patch.
> 
> > >  visit_nary_op (tree lhs, gassign *stmt)
> > >  {
> > >vn_nary_op_t vnresult;
> > > -  tree result = vn_nary_op_lookup_stmt (stmt, &vnresult);
> > > -  if (! result && vnresult)
> > > +  unsigned length = vn_nary_length_from_stmt (stmt);
> > > +  vn_nary_op_t vno
> > > += XALLOCAVAR (struct vn_nary_op_s, sizeof_vn_nary_op (length));
> > > +  init_vn_nary_op_from_stmt (vno, stmt);
> > > +  tree result = NULL_TREE;
> > > +  /* Try to get a simplified result.  */
> > > +  /* Do not simplify variable used in PHI at loop exit, or
> > > + simplify_peeled_chrec/constant_after_peeling may miss the loop.  */
> > > +  gimple *use_stmt;
>

Re: [PATCH] [PR68097] Try to avoid recursing for floats in tree_*_nonnegative_warnv_p.

2022-11-15 Thread Aldy Hernandez via Gcc-patches



On 11/15/22 08:15, Richard Biener wrote:

On Mon, Nov 14, 2022 at 8:05 PM Aldy Hernandez  wrote:




On 11/14/22 10:12, Richard Biener wrote:

On Sat, Nov 12, 2022 at 7:30 PM Aldy Hernandez  wrote:


It irks me that a PR named "we should track ranges for floating-point"
hasn't been closed in this release.  This is an attempt to do just
that.

As mentioned in the PR, even though we track ranges for floats, it has
been suggested that avoiding recursing through SSA defs in
gimple_assign_nonnegative_warnv_p is also a goal.  We can do this with
various ranger components without the need for a heavy handed approach
(i.e. a full ranger).

I have implemented two versions of known_float_sign_p() that answer
the question whether we definitely know the sign for an operation or a
tree expression.

Both versions use get_global_range_query, which is a wrapper to query
global ranges.  This means that no caching or propagation is done.
In the case of an SSA, we just return the global range for it (think
SSA_NAME_RANGE_INFO).  In the case of a tree code with operands, we
also use get_global_range_query to resolve the operands, and then call
into range-ops, which is our lowest level component.  There is no
ranger or gori involved.  All we're doing is resolving the operation
with the ranges passed.

This is enough to avoid recursing in the case where we definitely know
the sign of a range.  Otherwise, we still recurse.

Note that instead of get_global_range_query(), we could use
get_range_query() which uses a ranger (if active in a pass), or
get_global_range_query if not.  This would allow passes that have an
active ranger (with enable_ranger) to use a full ranger.  These passes
are currently VRP, loop unswitching, DOM, loop versioning, etc.  If
no ranger is active, get_range_query defaults to global ranges, so
there's no additional penalty.

Would this be acceptable, at least enough to close (or rename the PR ;-))?


I think the checks would belong to the gimple_stmt_nonnegative_warnv_p function
only (that's the SSA name entry from the fold-const.cc ones)?


That was my first approach, but I thought I'd cover the unary and binary
operators as well, since they had other callers.  But I'm happy with
just the top-level tweak.  It's a lot less code :).


@@ -9234,6 +9235,15 @@ bool
  gimple_stmt_nonnegative_warnv_p (gimple *stmt, bool *strict_overflow_p,
  int depth)
  {
+  tree type = gimple_range_type (stmt);
+  if (type && frange::supports_p (type))
+{
+  frange r;
+  bool sign;
+  return (get_global_range_query ()->range_of_stmt (r, stmt)
+ && r.signbit_p (sign)
+ && sign == false);
+}

the above means we never fall through to the switch below if
frange::supports_p (type) - that's eventually good enough, I
don't think we ever call this very function directly but it gets
invoked via recursion through operands only.  But of course


Woah, sorry.  That was not intended.  For that matter, the patch as 
posted caused:


FAIL: gcc.dg/builtins-10.c (test for excess errors)
FAIL: gcc.dg/builtins-57.c (test for excess errors)
FAIL: gcc.dg/torture/builtin-nonneg-1.c   -O1  (test for excess errors)
FAIL: gcc.dg/torture/builtin-nonneg-1.c   -O2  (test for excess errors)
FAIL: gcc.dg/torture/builtin-nonneg-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
FAIL: gcc.dg/torture/builtin-nonneg-1.c   -O3 -g  (test for excess errors)
FAIL: gcc.dg/torture/builtin-nonneg-1.c   -Os  (test for excess errors)
FAIL: gcc.dg/torture/builtin-power-1.c   -O1  (test for excess errors)
FAIL: gcc.dg/torture/builtin-power-1.c   -O2  (test for excess errors)
FAIL: gcc.dg/torture/builtin-power-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
FAIL: gcc.dg/torture/builtin-power-1.c   -O3 -g  (test for excess errors)
FAIL: gcc.dg/torture/builtin-power-1.c   -Os  (test for excess errors)

Note that ranger folding calls this function, though it won't run the
risk of endless recursion because range_of_stmt uses the LHS, and only
uses global ranges to solve the LHS.


Also, frange::supports_p() does not support all floats:

  static bool supports_p (const_tree type)
  {
// ?? Decimal floats can have multiple representations for the
// same number.  Supporting them may be as simple as just
// disabling them in singleton_p.  No clue.
return SCALAR_FLOAT_TYPE_P (type) && !DECIMAL_FLOAT_TYPE_P (type);
  }

Finally, my patch is more conservative than what the *nonnegative_warn* 
friends do.  We only return true when we're sure about the sign bit and 
it's FALSE.  As I mentioned elsewhere, tree_call_nonnegative_warn_p() 
always returns true for:


CASE_CFN_ACOS:
CASE_CFN_ACOS_FN:
CASE_CFN_ACOSH:
CASE_CFN_ACOSH_FN:
CASE_CFN_CABS:
CASE_CFN_CABS_FN:
...
...
  /* Always true.  */
  return true;

This means that we'll return true for a NAN, but we're incorrectly 
assuming 

Re: [PATCH 2/2]AArch64 Support new tbranch optab.

2022-11-15 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Tuesday, November 15, 2022 10:36 AM
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
>> ; nd ; Marcus Shawcroft
>> 
>> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
>> 
>> Tamar Christina  writes:
>> > Hello,
>> >
>> > Ping and updated patch.
>> >
>> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >
>> > Ok for master?
>> >
>> > Thanks,
>> > Tamar
>> >
>> > gcc/ChangeLog:
>> >
>> > * config/aarch64/aarch64.md (*tb1): Rename to...
>> > (*tb1): ... this.
>> > (tbranch4): New.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> > * gcc.target/aarch64/tbz_1.c: New test.
>> >
>> > --- inline copy of patch ---
>> >
>> > diff --git a/gcc/config/aarch64/aarch64.md
>> > b/gcc/config/aarch64/aarch64.md index
>> >
>> 2bc2684b82c35a44e0a2cea6e3aaf32d939f8cdf..d7684c93fba5b717d568e1a4fd
>> 71
>> > 2bde55c7c72e 100644
>> > --- a/gcc/config/aarch64/aarch64.md
>> > +++ b/gcc/config/aarch64/aarch64.md
>> > @@ -943,12 +943,29 @@ (define_insn "*cb1"
>> >   (const_int 1)))]
>> >  )
>> >
>> > -(define_insn "*tb1"
>> > +(define_expand "tbranch4"
>> >[(set (pc) (if_then_else
>> > - (EQL (zero_extract:DI (match_operand:GPI 0 "register_operand"
>> "r")
>> > -   (const_int 1)
>> > -   (match_operand 1
>> > - "aarch64_simd_shift_imm_" "n"))
>> > +   (match_operator 0 "aarch64_comparison_operator"
>> > +[(match_operand:ALLI 1 "register_operand")
>> > + (match_operand:ALLI 2
>> "aarch64_simd_shift_imm_")])
>> > +   (label_ref (match_operand 3 "" ""))
>> > +   (pc)))]
>> > +  "optimize > 0"
>> 
>> Why's the pattern conditional on optimize?  Seems a valid choice at -O0 too.
>> 
>
> Hi,
>
> I had explained the reason why in the original patch, just didn't repeat it 
> in the ping:
>
> Instead of emitting the instruction directly I've chosen to expand the 
> pattern using a zero extract and generating the existing pattern for 
> comparisons for two
> reasons:
>
>   1. Allows for CSE of the actual comparison.
>   2. It looks like the code in expand makes the label as unused and removed it
>  if it doesn't see a separate reference to it.
>
> Because of this expansion though I disable the pattern at -O0 since we have 
> no combine in that case so we'd end up with worse code.  I did try emitting 
> the pattern directly, but as mentioned in no#2 expand would then kill the 
> label.
>
> Basically I emit the pattern directly, immediately during expand the label is 
> marked as dead for some weird reason.

Isn't #2 a bug though?  It seems like something we should fix rather than
work around.

Thanks,
Richard


>
> Tamar.
>
>> I think the split here shows the difficulty with having a single optab and a
>> comparison operator though.  operand 0 can be something like:
>> 
>>   (eq x 1)
>> 
>> but we're not comparing x for equality with 1.  We're testing whether bit 1 
>> is
>> zero.  This means that operand 0 can't be taken literally and can't be used
>> directly in insn patterns.
>> 
>> In an earlier review, I'd said:
>> 
>>   For the TB instructions (and for other similar instructions that I've
>>   seen on other architectures) it would be more useful to have a single-bit
>>   test, with operand 4 specifying the bit position.  Arguably it might then
>>   be better to have separate eq and ne optabs, to avoid the awkward
>> doubling
>>   of the operands (operand 1 contains operands 2 and 3).
>> 
>> I think we should do that eq/ne split (sorry for not pushing harder for it
>> before).
>> 
>> Thanks,
>> Richard
>> 
>> 
>> 
>> > +{
>> > +  rtx bitvalue = gen_reg_rtx (DImode);
>> > +  rtx tmp = simplify_gen_subreg (DImode, operands[1], GET_MODE
>> > +(operands[1]), 0);
>> > +  emit_insn (gen_extzv (bitvalue, tmp, const1_rtx, operands[2]));
>> > +  operands[2] = const0_rtx;
>> > +  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]),
>> bitvalue,
>> > +operands[2]);
>> > +})
>> > +
>> > +(define_insn "*tb1"
>> > +  [(set (pc) (if_then_else
>> > + (EQL (zero_extract:GPI (match_operand:ALLI 0 
>> > "register_operand"
>> "r")
>> > +(const_int 1)
>> > +(match_operand 1
>> > +
>> > +"aarch64_simd_shift_imm_" "n"))
>> >(const_int 0))
>> >  (label_ref (match_operand 2 "" ""))
>> >  (pc)))
>> > @@ -959,15 +976,15 @@ (define_insn "*tb1"
>> >{
>> > if (get_attr_far_branch (insn) == 1)
>> >   return aarch64_gen_far_branch (operands, 2, "Ltb",
>> > -"\\t%0, %1, ");
>> > +"\\t%0, %1,
>> > + ");
>> > else
>> >

Re: [PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676]

2022-11-15 Thread Jonathan Wakely via Gcc-patches

On 15/11/22 11:35 +0800, Hongyu Wang wrote:

Hi,

According to PR 107676, the document of -mrelax-cmpxchg-loop is nonsensical.
Adjust the wording according to the comments.

Bootstrapped on x86_64-pc-linux-gnu, ok for trunk?

gcc/ChangeLog:

PR target/107676
* doc/invoke.texi: Reword the description of
-mrelax-cmpxchg-loop.
---
gcc/doc/invoke.texi | 10 ++
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 40f667a630a..bdd7c319aef 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -33805,10 +33805,12 @@ registers.

@item -mrelax-cmpxchg-loop
@opindex mrelax-cmpxchg-loop
-Relax cmpxchg loop by emitting an early load and compare before cmpxchg,
-execute pause if load value is not expected. This reduces excessive
-cachline bouncing when and works for all atomic logic fetch builtins
-that generates compare and swap loop.
+For compare and swap loops that emitted by some __atomic_* builtins


s/that emitted/that are emitted/


+(e.g. __atomic_fetch_(or|and|xor|nand) and their __atomic_*_fetch
+counterparts), emit an atomic load before cmpxchg instruction. If the


s/before cmpxchg/before the cmpxchg/


+loaded value is not equal to expected, execute a pause instead of


s/not equal to expected/not equal to the expected/


+directly run the cmpxchg instruction. This might reduce excessive


s/directly run/directly running/


+cacheline bouncing.

@item -mindirect-branch=@var{choice}
@opindex mindirect-branch
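
For readers unfamiliar with the loop shape the reworded text describes, here
is a rough standalone sketch written with std::atomic.  It is only an
illustration under stated assumptions, not what GCC emits: relaxed_fetch_nand
is a hypothetical helper name, and std::this_thread::yield () stands in for
the pause instruction.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical helper: an atomic fetch-NAND built from a compare-and-swap
// loop, with an early load and compare in front of the CAS.  The yield()
// call is a stand-in for the x86 pause instruction.
inline std::uint32_t relaxed_fetch_nand(std::atomic<std::uint32_t>& target,
                                        std::uint32_t mask) {
    std::uint32_t expected = target.load(std::memory_order_relaxed);
    for (;;) {
        // Early load: check whether the value still matches what we
        // expect before attempting the (more expensive) CAS.
        const std::uint32_t current = target.load(std::memory_order_relaxed);
        if (current != expected) {
            std::this_thread::yield();  // back off instead of issuing the CAS
            expected = current;
            continue;
        }
        const std::uint32_t desired = ~(expected & mask);
        if (target.compare_exchange_weak(expected, desired,
                                         std::memory_order_seq_cst,
                                         std::memory_order_relaxed))
            return expected;  // old value, as a fetch-style builtin returns
        // On failure compare_exchange_weak refreshed `expected`; retry.
    }
}
```

The point of the early load is that a plain load does not need exclusive
ownership of the cache line, whereas a failed cmpxchg still does.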




RE: [PATCH 2/2]AArch64 Support new tbranch optab.

2022-11-15 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Sandiford 
> Sent: Tuesday, November 15, 2022 10:51 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> ; nd ; Marcus Shawcroft
> 
> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> 
> Tamar Christina  writes:
> >> -Original Message-
> >> From: Richard Sandiford 
> >> Sent: Tuesday, November 15, 2022 10:36 AM
> >> To: Tamar Christina 
> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> >> ; nd ; Marcus Shawcroft
> >> 
> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> >>
> >> Tamar Christina  writes:
> >> > Hello,
> >> >
> >> > Ping and updated patch.
> >> >
> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >> >
> >> > Ok for master?
> >> >
> >> > Thanks,
> >> > Tamar
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> > * config/aarch64/aarch64.md (*tb1): Rename to...
> >> > (*tb1): ... this.
> >> > (tbranch4): New.
> >> >
> >> > gcc/testsuite/ChangeLog:
> >> >
> >> > * gcc.target/aarch64/tbz_1.c: New test.
> >> >
> >> > --- inline copy of patch ---
> >> >
> >> > diff --git a/gcc/config/aarch64/aarch64.md
> >> > b/gcc/config/aarch64/aarch64.md index
> >> >
> >>
> 2bc2684b82c35a44e0a2cea6e3aaf32d939f8cdf..d7684c93fba5b717d568e1a4fd
> >> 71
> >> > 2bde55c7c72e 100644
> >> > --- a/gcc/config/aarch64/aarch64.md
> >> > +++ b/gcc/config/aarch64/aarch64.md
> >> > @@ -943,12 +943,29 @@ (define_insn "*cb1"
> >> >   (const_int 1)))]
> >> >  )
> >> >
> >> > -(define_insn "*tb1"
> >> > +(define_expand "tbranch4"
> >> >[(set (pc) (if_then_else
> >> > - (EQL (zero_extract:DI (match_operand:GPI 0 
> >> > "register_operand"
> >> "r")
> >> > -   (const_int 1)
> >> > -   (match_operand 1
> >> > - "aarch64_simd_shift_imm_" 
> >> > "n"))
> >> > +   (match_operator 0 "aarch64_comparison_operator"
> >> > +[(match_operand:ALLI 1 "register_operand")
> >> > + (match_operand:ALLI 2
> >> "aarch64_simd_shift_imm_")])
> >> > +   (label_ref (match_operand 3 "" ""))
> >> > +   (pc)))]
> >> > +  "optimize > 0"
> >>
> >> Why's the pattern conditional on optimize?  Seems a valid choice at -O0
> too.
> >>
> >
> > Hi,
> >
> > I had explained the reason why in the original patch, just didn't repeat it 
> > in
> the ping:
> >
> > Instead of emitting the instruction directly I've chosen to expand the
> > pattern using a zero extract and generating the existing pattern for
> > comparisons for two
> > reasons:
> >
> >   1. Allows for CSE of the actual comparison.
> >   2. It looks like the code in expand makes the label as unused and removed
> it
> >  if it doesn't see a separate reference to it.
> >
> > Because of this expansion though I disable the pattern at -O0 since we
> have no combine in that case so we'd end up with worse code.  I did try
> emitting the pattern directly, but as mentioned in no#2 expand would then
> kill the label.
> >
> > Basically I emit the pattern directly, immediately during expand the label 
> > is
> marked as dead for some weird reason.
> 
> Isn't #2 a bug though?  It seems like something we should fix rather than
> work around.

Yes it's a bug ☹ ok if I'm going to fix that bug then do I need to split the
optabs still? Isn't the problem atm that I need the split?  If I'm emitting
the instruction directly then the recog pattern for it can just be
(eq (vec_extract x 1) 0) which is the correct semantics?

Thanks,
Tamar
> 
> Thanks,
> Richard
> 
> 
> >
> > Tamar.
> >
> >> I think the split here shows the difficulty with having a single
> >> optab and a comparison operator though.  operand 0 can be something
> like:
> >>
> >>   (eq x 1)
> >>
> >> but we're not comparing x for equality with 1.  We're testing whether
> >> bit 1 is zero.  This means that operand 0 can't be taken literally
> >> and can't be used directly in insn patterns.
> >>
> >> In an earlier review, I'd said:
> >>
> >>   For the TB instructions (and for other similar instructions that I've
> >>   seen on other architectures) it would be more useful to have a single-bit
> >>   test, with operand 4 specifying the bit position.  Arguably it might then
> >>   be better to have separate eq and ne optabs, to avoid the awkward
> >> doubling
> >>   of the operands (operand 1 contains operands 2 and 3).
> >>
> >> I think we should do that eq/ne split (sorry for not pushing harder
> >> for it before).
> >>
> >> Thanks,
> >> Richard
> >>
> >>
> >>
> >> > +{
> >> > +  rtx bitvalue = gen_reg_rtx (DImode);
> >> > +  rtx tmp = simplify_gen_subreg (DImode, operands[1], GET_MODE
> >> > +(operands[1]), 0);
> >> > +  emit_insn (gen_extzv (bitvalue, tmp, const1_rtx, operands[2]));
> >> > +  operands[2] = const0_rtx;
> >> > +  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]),
> >> bitvalue,
> >> > + 

Re: [PATCH]AArch64 Extend umov and sbfx patterns.

2022-11-15 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi,
>
>> > --- a/gcc/config/aarch64/aarch64-simd.md
>> > +++ b/gcc/config/aarch64/aarch64-simd.md
>> > @@ -4259,7 +4259,7 @@ (define_insn
>> "*aarch64_get_lane_zero_extend"
>> >  ;; Extracting lane zero is split into a simple move when it is
>> > between SIMD  ;; registers or a store.
>> >  (define_insn_and_split "aarch64_get_lane"
>> > -  [(set (match_operand: 0 "aarch64_simd_nonimmediate_operand"
>> > "=?r, w, Utv")
>> > +  [(set (match_operand: 0 "aarch64_simd_nonimmediate_operand"
>> > + "=r, w, Utv")
>> >(vec_select:
>> >  (match_operand:VALL_F16_FULL 1 "register_operand" "w, w, w")
>> >  (parallel [(match_operand:SI 2 "immediate_operand" "i, i, i")])))]
>> 
>> Which testcase does this help with?  It didn't look like the new tests do any
>> vector stuff.
>> 
>
> Right, sorry about that, splitting up my patches resulted in this sneaking in 
> from a different series.
> Moved now.
>
>> > -(define_insn "*_ashl"
>> > +(define_insn "*_ashl"
>> >[(set (match_operand:GPI 0 "register_operand" "=r")
>> >(ANY_EXTEND:GPI
>> > -   (ashift:SHORT (match_operand:SHORT 1 "register_operand" "r")
>> > +   (ashift:ALLX (match_operand:ALLX 1 "register_operand" "r")
>> >   (match_operand 2 "const_int_operand" "n"]
>> > -  "UINTVAL (operands[2]) < GET_MODE_BITSIZE (mode)"
>> > +  "UINTVAL (operands[2]) < GET_MODE_BITSIZE (mode)"
>> 
>> It'd be better to avoid even defining si<-si or si<-di "extensions"
>> (even though nothing should try to match them), so how about adding:
>> 
>>>  &&
>> 
>> or similar to the beginning of the condition?  The conditions for the invalid
>> combos will then be provably false at compile time and the patterns will be
>> compiled out.
>> 
>
> Done.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.md
>   (*_ashl): Renamed to...
>   (*_ashl): ...this.
>   (*zero_extend_lshr): Renamed to...
>   (*zero_extend_lshr): ...this.
>   (*extend_ashr): Rename to...
>   (*extend_ashr): ...this.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/bitmove_1.c: New test.
>   * gcc.target/aarch64/bitmove_2.c: New test.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> d7684c93fba5b717d568e1a4fd712bde55c7c72e..d230bbb833f97813c8371aa07b587bd8b0292cee
>  100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -5711,40 +5711,43 @@ (define_insn "*extrsi5_insn_di"
>[(set_attr "type" "rotate_imm")]
>  )
>  
> -(define_insn "*_ashl"
> +(define_insn "*_ashl"
>[(set (match_operand:GPI 0 "register_operand" "=r")
>   (ANY_EXTEND:GPI
> -  (ashift:SHORT (match_operand:SHORT 1 "register_operand" "r")
> +  (ashift:ALLX (match_operand:ALLX 1 "register_operand" "r")
>  (match_operand 2 "const_int_operand" "n"]
> -  "UINTVAL (operands[2]) < GET_MODE_BITSIZE (mode)"
> +  " > 
> +   && UINTVAL (operands[2]) < GET_MODE_BITSIZE (mode)"
>  {
> -  operands[3] = GEN_INT ( - UINTVAL (operands[2]));
> +  operands[3] = GEN_INT ( - UINTVAL (operands[2]));
>return "bfiz\t%0, %1, %2, %3";
>  }
>[(set_attr "type" "bfx")]
>  )
>  
> -(define_insn "*zero_extend_lshr"
> +(define_insn "*zero_extend_lshr"
>[(set (match_operand:GPI 0 "register_operand" "=r")
>   (zero_extend:GPI
> -  (lshiftrt:SHORT (match_operand:SHORT 1 "register_operand" "r")
> -  (match_operand 2 "const_int_operand" "n"]
> -  "UINTVAL (operands[2]) < GET_MODE_BITSIZE (mode)"
> +  (lshiftrt:ALLX (match_operand:ALLX 1 "register_operand" "r")
> + (match_operand 2 "const_int_operand" "n"]
> +  " > 
> +   && UINTVAL (operands[2]) < GET_MODE_BITSIZE (mode)"
>  {
> -  operands[3] = GEN_INT ( - UINTVAL (operands[2]));
> +  operands[3] = GEN_INT ( - UINTVAL (operands[2]));
>return "ubfx\t%0, %1, %2, %3";
>  }
>[(set_attr "type" "bfx")]
>  )
>  
> -(define_insn "*extend_ashr"
> +(define_insn "*extend_ashr"
>[(set (match_operand:GPI 0 "register_operand" "=r")
>   (sign_extend:GPI
> -  (ashiftrt:SHORT (match_operand:SHORT 1 "register_operand" "r")
> -  (match_operand 2 "const_int_operand" "n"]
> -  "UINTVAL (operands[2]) < GET_MODE_BITSIZE (mode)"
> +  (ashiftrt:ALLX (match_operand:ALLX 1 "register_operand" "r")
> + (match_operand 2 "const_int_operand" "n"]
> +  " > 
> +   && UINTVAL (operands[2]) < GET_MODE_BITSIZE (mode)"
>  {
> -  operands[3] = GEN_INT ( - UINTVAL (operands[2]));
> +  operands[3] = GEN_INT ( - UINTVAL (operands[2]));
>return "sbfx\\t%0, %1, %2, %3";
>  }
>[(set_attr "type" "bfx")]
> diff --git a/gcc/testsuite/gcc.target/aarch64/bitmove_1.c 
> b/gcc/testsuite/gcc.target/aarch64/bitmove_1.c
> new file mode 100644
> index 
> 000

Re: [PATCH 2/2]AArch64 Support new tbranch optab.

2022-11-15 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Tuesday, November 15, 2022 10:51 AM
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
>> ; nd ; Marcus Shawcroft
>> 
>> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
>> 
>> Tamar Christina  writes:
>> >> -Original Message-
>> >> From: Richard Sandiford 
>> >> Sent: Tuesday, November 15, 2022 10:36 AM
>> >> To: Tamar Christina 
>> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
>> >> ; nd ; Marcus Shawcroft
>> >> 
>> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
>> >>
>> >> Tamar Christina  writes:
>> >> > Hello,
>> >> >
>> >> > Ping and updated patch.
>> >> >
>> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >> >
>> >> > Ok for master?
>> >> >
>> >> > Thanks,
>> >> > Tamar
>> >> >
>> >> > gcc/ChangeLog:
>> >> >
>> >> > * config/aarch64/aarch64.md (*tb1): Rename to...
>> >> > (*tb1): ... this.
>> >> > (tbranch4): New.
>> >> >
>> >> > gcc/testsuite/ChangeLog:
>> >> >
>> >> > * gcc.target/aarch64/tbz_1.c: New test.
>> >> >
>> >> > --- inline copy of patch ---
>> >> >
>> >> > diff --git a/gcc/config/aarch64/aarch64.md
>> >> > b/gcc/config/aarch64/aarch64.md index
>> >> >
>> >>
>> 2bc2684b82c35a44e0a2cea6e3aaf32d939f8cdf..d7684c93fba5b717d568e1a4fd
>> >> 71
>> >> > 2bde55c7c72e 100644
>> >> > --- a/gcc/config/aarch64/aarch64.md
>> >> > +++ b/gcc/config/aarch64/aarch64.md
>> >> > @@ -943,12 +943,29 @@ (define_insn "*cb1"
>> >> >   (const_int 1)))]
>> >> >  )
>> >> >
>> >> > -(define_insn "*tb1"
>> >> > +(define_expand "tbranch4"
>> >> >[(set (pc) (if_then_else
>> >> > - (EQL (zero_extract:DI (match_operand:GPI 0 
>> >> > "register_operand"
>> >> "r")
>> >> > -   (const_int 1)
>> >> > -   (match_operand 1
>> >> > - "aarch64_simd_shift_imm_" 
>> >> > "n"))
>> >> > +   (match_operator 0 "aarch64_comparison_operator"
>> >> > +[(match_operand:ALLI 1 "register_operand")
>> >> > + (match_operand:ALLI 2
>> >> "aarch64_simd_shift_imm_")])
>> >> > +   (label_ref (match_operand 3 "" ""))
>> >> > +   (pc)))]
>> >> > +  "optimize > 0"
>> >>
>> >> Why's the pattern conditional on optimize?  Seems a valid choice at -O0
>> too.
>> >>
>> >
>> > Hi,
>> >
>> > I had explained the reason why in the original patch, just didn't repeat 
>> > it in
>> the ping:
>> >
>> > Instead of emitting the instruction directly I've chosen to expand the
>> > pattern using a zero extract and generating the existing pattern for
>> > comparisons for two
>> > reasons:
>> >
>> >   1. Allows for CSE of the actual comparison.
>> >   2. It looks like the code in expand makes the label as unused and removed
>> it
>> >  if it doesn't see a separate reference to it.
>> >
>> > Because of this expansion though I disable the pattern at -O0 since we
>> have no combine in that case so we'd end up with worse code.  I did try
>> emitting the pattern directly, but as mentioned in no#2 expand would then
>> kill the label.
>> >
>> > Basically I emit the pattern directly, immediately during expand the label 
>> > is
>> marked as dead for some weird reason.
>> 
>> Isn't #2 a bug though?  It seems like something we should fix rather than
>> work around.
>
> Yes it's a bug ☹ ok if I'm going to fix that bug then do I need to split the 
> optabs
> still? Isn't the problem atm that I need the split?  If I'm emitting the 
> instruction
> directly then the recog pattern for it can just be (eq (vec_extract x 1) 0) 
> which is
> the correct semantics?

What rtx does the code that uses the optab pass for operand 0?

Richard


Re: [PATCH][i386]: Update ix86_can_change_mode_class target hook to accept QImode conversions

2022-11-15 Thread Hongtao Liu via Gcc-patches
On Fri, Nov 11, 2022 at 10:47 PM Tamar Christina via Gcc-patches
 wrote:
>
> Hi All,
>
> The current i386 implementation of TARGET_CAN_CHANGE_MODE_CLASS is not
> useful before regalloc.
>
> In particular, before regalloc, optimization passes query the hook using
> ALL_REGS, but because of the
>
>   if (MAYBE_FLOAT_CLASS_P (regclass))
>   return false;
>
> The hook returns false for all modes, even integer ones because ALL_REGS
> overlaps with floating point regs.
>
> The vector permute fallback cases used to unconditionally convert vector
> integer permutes to vector QImode ones as a fallback plan.  This is incorrect
> and can result in incorrect code if the target doesn't support this
> conversion.
>
> To fix this some more checks were added, however that ended up introducing
> ICEs in the i386 backend because e.g. the hook would reject conversions
> between modes like V2TImode and V32QImode.
>
> My understanding is that for x87 we don't want to allow floating point
> conversions, but integers are fine.  So I have modified the check such that it
> also checks the modes, not just the register class groups.
>
> The second part of the code is needed because now that integer modes aren't
> uniformly rejected the i386 backend triggers further optimizations.  However
> the backend lacks instructions to deal with canonical RTL representations of
> certain instructions.  For instance, the back-end seems to prefer vec_select 0
> instead of subregs.
If the canonical form of the lower half of the vector in rtl is subreg
instead of vec_select, the x86 backend needs to update the pattern
accordingly to avoid divergence.
I can reproduce those failures when adding VALID_FP_MODE_P in
targetm.can_change_mode_class.
>
> So to prevent the canonicalization I reject integer modes when the sizes of to
> and from don't match and when we would have exited with false previously.
>
> This fixes all the ICEs and codegen regressions, but perhaps an x86 maintainer
> should take a deeper look at this hook implementation.
>
> Bootstrapped Regtested on x86_64-pc-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> * config/i386/i386.cc (ix86_can_change_mode_class): Update the target
> hook.
>
> --- inline copy of patch --
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 
> c4d0e36e9c0a2256f5dde1f4dc021c0328aa0cba..477dd007ea80272680751b61e35cc3eec79b66c3
>  100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -19682,7 +19682,15 @@ ix86_can_change_mode_class (machine_mode from, 
> machine_mode to,
>
>/* x87 registers can't do subreg at all, as all values are reformatted
>   to extended precision.  */
> -  if (MAYBE_FLOAT_CLASS_P (regclass))
> +  if (MAYBE_FLOAT_CLASS_P (regclass)
> +  && VALID_FP_MODE_P (from)
> +  && VALID_FP_MODE_P (to))
> +return false;
> +
> +  /* Reject integer modes if the sizes aren't the same.  It would have
> + normally exited above.  */
> +  if (MAYBE_FLOAT_CLASS_P (regclass)
> +  && GET_MODE_SIZE (from) != GET_MODE_SIZE (to))
>  return false;
>
>if (MAYBE_SSE_CLASS_P (regclass) || MAYBE_MMX_CLASS_P (regclass))
>
>
>
>
> --



-- 
BR,
Hongtao


RE: [PATCH 2/2]AArch64 Support new tbranch optab.

2022-11-15 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Sandiford 
> Sent: Tuesday, November 15, 2022 11:15 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> ; nd ; Marcus Shawcroft
> 
> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> 
> Tamar Christina  writes:
> >> -Original Message-
> >> From: Richard Sandiford 
> >> Sent: Tuesday, November 15, 2022 10:51 AM
> >> To: Tamar Christina 
> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> >> ; nd ; Marcus Shawcroft
> >> 
> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> >>
> >> Tamar Christina  writes:
> >> >> -Original Message-
> >> >> From: Richard Sandiford 
> >> >> Sent: Tuesday, November 15, 2022 10:36 AM
> >> >> To: Tamar Christina 
> >> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> >> >> ; nd ; Marcus
> Shawcroft
> >> >> 
> >> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> >> >>
> >> >> Tamar Christina  writes:
> >> >> > Hello,
> >> >> >
> >> >> > Ping and updated patch.
> >> >> >
> >> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >> >> >
> >> >> > Ok for master?
> >> >> >
> >> >> > Thanks,
> >> >> > Tamar
> >> >> >
> >> >> > gcc/ChangeLog:
> >> >> >
> >> >> > * config/aarch64/aarch64.md (*tb1): Rename
> to...
> >> >> > (*tb1): ... this.
> >> >> > (tbranch4): New.
> >> >> >
> >> >> > gcc/testsuite/ChangeLog:
> >> >> >
> >> >> > * gcc.target/aarch64/tbz_1.c: New test.
> >> >> >
> >> >> > --- inline copy of patch ---
> >> >> >
> >> >> > diff --git a/gcc/config/aarch64/aarch64.md
> >> >> > b/gcc/config/aarch64/aarch64.md index
> >> >> >
> >> >>
> >>
> 2bc2684b82c35a44e0a2cea6e3aaf32d939f8cdf..d7684c93fba5b717d568e1a4fd
> >> >> 71
> >> >> > 2bde55c7c72e 100644
> >> >> > --- a/gcc/config/aarch64/aarch64.md
> >> >> > +++ b/gcc/config/aarch64/aarch64.md
> >> >> > @@ -943,12 +943,29 @@ (define_insn "*cb1"
> >> >> >   (const_int 1)))]
> >> >> >  )
> >> >> >
> >> >> > -(define_insn "*tb1"
> >> >> > +(define_expand "tbranch4"
> >> >> >[(set (pc) (if_then_else
> >> >> > - (EQL (zero_extract:DI (match_operand:GPI 0
> "register_operand"
> >> >> "r")
> >> >> > -   (const_int 1)
> >> >> > -   (match_operand 1
> >> >> > - "aarch64_simd_shift_imm_" 
> >> >> > "n"))
> >> >> > +   (match_operator 0 "aarch64_comparison_operator"
> >> >> > +[(match_operand:ALLI 1 "register_operand")
> >> >> > + (match_operand:ALLI 2
> >> >> "aarch64_simd_shift_imm_")])
> >> >> > +   (label_ref (match_operand 3 "" ""))
> >> >> > +   (pc)))]
> >> >> > +  "optimize > 0"
> >> >>
> >> >> Why's the pattern conditional on optimize?  Seems a valid choice
> >> >> at -O0
> >> too.
> >> >>
> >> >
> >> > Hi,
> >> >
> >> > I had explained the reason why in the original patch, just didn't
> >> > repeat it in
> >> the ping:
> >> >
> >> > Instead of emitting the instruction directly I've chosen to expand
> >> > the pattern using a zero extract and generating the existing
> >> > pattern for comparisons for two
> >> > reasons:
> >> >
> >> >   1. Allows for CSE of the actual comparison.
> >> >   2. It looks like the code in expand makes the label as unused and
> >> > removed
> >> it
> >> >  if it doesn't see a separate reference to it.
> >> >
> >> > Because of this expansion though I disable the pattern at -O0 since
> >> > we
> >> have no combine in that case so we'd end up with worse code.  I did
> >> try emitting the pattern directly, but as mentioned in no#2 expand
> >> would then kill the label.
> >> >
> >> > Basically I emit the pattern directly, immediately during expand
> >> > the label is
> >> marked as dead for some weird reason.
> >>
> >> Isn't #2 a bug though?  It seems like something we should fix rather
> >> than work around.
> >
> > Yes it's a bug ☹ ok if I'm going to fix that bug then do I need to
> > split the optabs still? Isn't the problem atm that I need the split?
> > If I'm emitting the instruction directly then the recog pattern for it
> > can just be (eq (vec_extract x 1) 0) which is the correct semantics?
> 
> What rtx does the code that uses the optab pass for operand 0?

It gets passed the full comparison:

(eq (reg/v:SI 92 [ x ])
(const_int 0 [0]))

of which we only look at the operator.
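
For reference, the distinction raised earlier in the thread (an rtx such as
(eq x 1) versus "bit 1 of x is zero") can be sketched in plain C++.  This is
only an illustration of the intended single-bit-test semantics, not backend
code; bit_is_zero is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the condition a test-bit-and-branch evaluates:
// "is bit number `bit` of x clear?".  This differs from comparing the
// whole value x against `bit`.
inline bool bit_is_zero(std::uint64_t x, unsigned bit) {
    assert(bit < 64);  // shifting by 64 or more would be undefined
    return ((x >> bit) & 1u) == 0;
}
```

For example, with x = 1 a literal reading of (eq x 1) is true because the
values are equal, and bit_is_zero (1, 1) is also true, but only because bit 1
happens to be clear.  With x = 3 the two readings disagree in a different way:
x is not equal to 1, and bit 1 is set, so neither holds, whereas with x = 5
the equality is false while the bit test is true.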

Tamar.

> 
> Richard


Re: [PATCH] ira: Remove duplicate `memset' over `full_costs' from `assign_hard_reg'

2022-11-15 Thread Maciej W. Rozycki
On Mon, 14 Nov 2022, Jeff Law wrote:

> > gcc/
> > * ira-color.cc (assign_hard_reg): Remove duplicate `memset' over
> > `full_costs'.
> > ---
> > Hi,
> > 
> >   I find this fairly obvious, OK to apply?
> 
> Seems obvious to me as well.  OK.

 Thanks, now committed.

  Maciej


Re: [PATCH 2/2]AArch64 Support new tbranch optab.

2022-11-15 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Tuesday, November 15, 2022 11:15 AM
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
>> ; nd ; Marcus Shawcroft
>> 
>> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
>> 
>> Tamar Christina  writes:
>> >> -Original Message-
>> >> From: Richard Sandiford 
>> >> Sent: Tuesday, November 15, 2022 10:51 AM
>> >> To: Tamar Christina 
>> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
>> >> ; nd ; Marcus Shawcroft
>> >> 
>> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
>> >>
>> >> Tamar Christina  writes:
>> >> >> -Original Message-
>> >> >> From: Richard Sandiford 
>> >> >> Sent: Tuesday, November 15, 2022 10:36 AM
>> >> >> To: Tamar Christina 
>> >> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
>> >> >> ; nd ; Marcus
>> Shawcroft
>> >> >> 
>> >> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
>> >> >>
>> >> >> Tamar Christina  writes:
>> >> >> > Hello,
>> >> >> >
>> >> >> > Ping and updated patch.
>> >> >> >
>> >> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >> >> >
>> >> >> > Ok for master?
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Tamar
>> >> >> >
>> >> >> > gcc/ChangeLog:
>> >> >> >
>> >> >> > * config/aarch64/aarch64.md (*tb1): Rename
>> to...
>> >> >> > (*tb1): ... this.
>> >> >> > (tbranch4): New.
>> >> >> >
>> >> >> > gcc/testsuite/ChangeLog:
>> >> >> >
>> >> >> > * gcc.target/aarch64/tbz_1.c: New test.
>> >> >> >
>> >> >> > --- inline copy of patch ---
>> >> >> >
>> >> >> > diff --git a/gcc/config/aarch64/aarch64.md
>> >> >> > b/gcc/config/aarch64/aarch64.md index
>> >> >> >
>> >> >>
>> >>
>> 2bc2684b82c35a44e0a2cea6e3aaf32d939f8cdf..d7684c93fba5b717d568e1a4fd
>> >> >> 71
>> >> >> > 2bde55c7c72e 100644
>> >> >> > --- a/gcc/config/aarch64/aarch64.md
>> >> >> > +++ b/gcc/config/aarch64/aarch64.md
>> >> >> > @@ -943,12 +943,29 @@ (define_insn "*cb1"
>> >> >> >   (const_int 1)))]
>> >> >> >  )
>> >> >> >
>> >> >> > -(define_insn "*tb1"
>> >> >> > +(define_expand "tbranch4"
>> >> >> >[(set (pc) (if_then_else
>> >> >> > - (EQL (zero_extract:DI (match_operand:GPI 0
>> "register_operand"
>> >> >> "r")
>> >> >> > -   (const_int 1)
>> >> >> > -   (match_operand 1
>> >> >> > - 
>> >> >> > "aarch64_simd_shift_imm_" "n"))
>> >> >> > +   (match_operator 0 "aarch64_comparison_operator"
>> >> >> > +[(match_operand:ALLI 1 "register_operand")
>> >> >> > + (match_operand:ALLI 2
>> >> >> "aarch64_simd_shift_imm_")])
>> >> >> > +   (label_ref (match_operand 3 "" ""))
>> >> >> > +   (pc)))]
>> >> >> > +  "optimize > 0"
>> >> >>
>> >> >> Why's the pattern conditional on optimize?  Seems a valid choice
>> >> >> at -O0
>> >> too.
>> >> >>
>> >> >
>> >> > Hi,
>> >> >
>> >> > I had explained the reason why in the original patch, just didn't
>> >> > repeat it in
>> >> the ping:
>> >> >
>> >> > Instead of emitting the instruction directly I've chosen to expand
>> >> > the pattern using a zero extract and generating the existing
>> >> > pattern for comparisons for two
>> >> > reasons:
>> >> >
>> >> >   1. Allows for CSE of the actual comparison.
>> >> >   2. It looks like the code in expand makes the label as unused and
>> >> > removed
>> >> it
>> >> >  if it doesn't see a separate reference to it.
>> >> >
>> >> > Because of this expansion though I disable the pattern at -O0 since
>> >> > we
>> >> have no combine in that case so we'd end up with worse code.  I did
>> >> try emitting the pattern directly, but as mentioned in no#2 expand
>> >> would then kill the label.
>> >> >
>> >> > Basically I emit the pattern directly, immediately during expand
>> >> > the label is
>> >> marked as dead for some weird reason.
>> >>
>> >> Isn't #2 a bug though?  It seems like something we should fix rather
>> >> than work around.
>> >
>> > Yes it's a bug ☹ ok if I'm going to fix that bug then do I need to
>> > split the optabs still? Isn't the problem atm that I need the split?
>> > If I'm emitting the instruction directly then the recog pattern for it
>> > can just be (eq (vec_extract x 1) 0) which is the correct semantics?
>> 
>> What rtx does the code that uses the optab pass for operand 0?
>
> It gets passed the full comparison:
>
> (eq (reg/v:SI 92 [ x ])
> (const_int 0 [0]))
>
> of which we only look at the operator.

OK, that's what I thought.  The problem is then the one I mentioned above.
This rtx doesn't describe the operation that the optab is supposed to
perform, so it can never be used in the instruction pattern.  (This is
different from something like cbranch, where operand 0 can be used directly
if the target supports a very general compare-and-branch instruction.)

If we want to use a single op

[committed] libstdc++: Document use of Markdown for Doxygen comments

2022-11-15 Thread Jonathan Wakely via Gcc-patches
Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* doc/xml/manual/documentation_hacking.xml: Document use of
Markdown for Doxygen comments. Tweak formatting.
* doc/html/manual/documentation_hacking.html: Regenerate.
---
 .../html/manual/documentation_hacking.html| 21 --
 .../doc/xml/manual/documentation_hacking.xml  | 28 ++-
 2 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/libstdc++-v3/doc/xml/manual/documentation_hacking.xml 
b/libstdc++-v3/doc/xml/manual/documentation_hacking.xml
index 776d5e857b5..44672f6e26d 100644
--- a/libstdc++-v3/doc/xml/manual/documentation_hacking.xml
+++ b/libstdc++-v3/doc/xml/manual/documentation_hacking.xml
@@ -286,7 +286,9 @@
formatting system, and will require the expansion of TeX's memory
capacity. Specifically, the pool_size
variable in the configuration file texmf.cnf may
-   need to be increased by a minimum factor of two.
+   need to be increased by a minimum factor of two. Alternatively, using
+   LATEX_CMD=lualatex might allow the docs to be
+   build without running out of memory.
   
 
 
@@ -515,9 +517,12 @@
   
 
   
-   Please use markup tags like @p and @a when referring to things
-   such as the names of function parameters. Use @e for emphasis
-   when necessary. Use @c to refer to other standard names.
+   Markdown can be used for formatting text. Doxygen is configured to
+   support this, and it is a good compromise between readable comments
+   in the C++ source and nice formatting in the generated HTML.
+   Please format the names of function parameters in either code font
+   or italics. Use underscores or @e for emphasis when necessary.
+   Use backticks or @c to refer to other standard names.
(Examples of all these abound in the present code.)
   
 
@@ -595,6 +600,7 @@
 
   HTML
   Doxygen
+  Markdown
 
   
 
@@ -602,41 +608,49 @@
 
   \
   \\
+  \\
 
 
 
   "
   \"
+  \"
 
 
 
   '
   \'
+  \'
 
 
 
   
   @a word
+  _word_ or *word*
 
 
 
   
   @b word
+  **word** or __word__
 
 
 
   
   @c word
+  `word`
 
 
 
   
   @a word
+  _word_ or *word*
 
 
 
   
   two words or more
+  _two words or more_
 
   
 
@@ -719,7 +733,7 @@
 
 
   
-   Editing the DocBook sources requires an XML editor. Many
+   An XML editor is recommended for editing the DocBook sources. Many
exist: some notable options
include emacs, Kate,
or Conglomerate.
@@ -815,8 +829,8 @@
   
 
   
-   The doc-html-docbook-regenerate target will generate
-   the HTML files and copy them back to the libstdc++ source tree.
+   The doc-html-docbook-regenerate target will
+   generate the HTML files and copy them back to the libstdc++ source tree.
This can be used to update the HTML files that are checked in to
version control.
   
-- 
2.38.1



RE: [PATCH 2/2]AArch64 Support new tbranch optab.

> -Original Message-
> From: Richard Sandiford 
> Sent: Tuesday, November 15, 2022 11:34 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> ; nd ; Marcus Shawcroft
> 
> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> 
> Tamar Christina  writes:
> >> -Original Message-
> >> From: Richard Sandiford 
> >> Sent: Tuesday, November 15, 2022 11:15 AM
> >> To: Tamar Christina 
> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> >> ; nd ; Marcus Shawcroft
> >> 
> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> >>
> >> Tamar Christina  writes:
> >> >> -Original Message-
> >> >> From: Richard Sandiford 
> >> >> Sent: Tuesday, November 15, 2022 10:51 AM
> >> >> To: Tamar Christina 
> >> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> >> >> ; nd ; Marcus
> Shawcroft
> >> >> 
> >> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> >> >>
> >> >> Tamar Christina  writes:
> >> >> >> -Original Message-
> >> >> >> From: Richard Sandiford 
> >> >> >> Sent: Tuesday, November 15, 2022 10:36 AM
> >> >> >> To: Tamar Christina 
> >> >> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> >> >> >> ; nd ; Marcus
> >> Shawcroft
> >> >> >> 
> >> >> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> >> >> >>
> >> >> >> Tamar Christina  writes:
> >> >> >> > Hello,
> >> >> >> >
> >> >> >> > Ping and updated patch.
> >> >> >> >
> >> >> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no
> issues.
> >> >> >> >
> >> >> >> > Ok for master?
> >> >> >> >
> >> >> >> > Thanks,
> >> >> >> > Tamar
> >> >> >> >
> >> >> >> > gcc/ChangeLog:
> >> >> >> >
> >> >> >> > * config/aarch64/aarch64.md (*tb1):
> >> >> >> > Rename
> >> to...
> >> >> >> > (*tb1): ... this.
> >> >> >> > (tbranch4): New.
> >> >> >> >
> >> >> >> > gcc/testsuite/ChangeLog:
> >> >> >> >
> >> >> >> > * gcc.target/aarch64/tbz_1.c: New test.
> >> >> >> >
> >> >> >> > --- inline copy of patch ---
> >> >> >> >
> >> >> >> > diff --git a/gcc/config/aarch64/aarch64.md
> >> >> >> > b/gcc/config/aarch64/aarch64.md index
> >> >> >> >
> >> >> >>
> >> >>
> >>
> 2bc2684b82c35a44e0a2cea6e3aaf32d939f8cdf..d7684c93fba5b717d568e1a4fd
> >> >> >> 71
> >> >> >> > 2bde55c7c72e 100644
> >> >> >> > --- a/gcc/config/aarch64/aarch64.md
> >> >> >> > +++ b/gcc/config/aarch64/aarch64.md
> >> >> >> > @@ -943,12 +943,29 @@ (define_insn "*cb1"
> >> >> >> >   (const_int 1)))]
> >> >> >> >  )
> >> >> >> >
> >> >> >> > -(define_insn "*tb1"
> >> >> >> > +(define_expand "tbranch4"
> >> >> >> >[(set (pc) (if_then_else
> >> >> >> > - (EQL (zero_extract:DI (match_operand:GPI 0
> >> "register_operand"
> >> >> >> "r")
> >> >> >> > -   (const_int 1)
> >> >> >> > -   (match_operand 1
> >> >> >> > - 
> >> >> >> > "aarch64_simd_shift_imm_" "n"))
> >> >> >> > +   (match_operator 0 "aarch64_comparison_operator"
> >> >> >> > +[(match_operand:ALLI 1 "register_operand")
> >> >> >> > + (match_operand:ALLI 2
> >> >> >> "aarch64_simd_shift_imm_")])
> >> >> >> > +   (label_ref (match_operand 3 "" ""))
> >> >> >> > +   (pc)))]
> >> >> >> > +  "optimize > 0"
> >> >> >>
> >> >> >> Why's the pattern conditional on optimize?  Seems a valid
> >> >> >> choice at -O0
> >> >> too.
> >> >> >>
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > I had explained the reason why in the original patch, just
> >> >> > didn't repeat it in
> >> >> the ping:
> >> >> >
> >> >> > Instead of emitting the instruction directly I've chosen to
> >> >> > expand the pattern using a zero extract and generating the
> >> >> > existing pattern for comparisons for two
> >> >> > reasons:
> >> >> >
> >> >> >   1. Allows for CSE of the actual comparison.
> >> >> >   2. It looks like the code in expand makes the label as unused
> >> >> > and removed
> >> >> it
> >> >> >  if it doesn't see a separate reference to it.
> >> >> >
> >> >> > Because of this expansion though I disable the pattern at -O0
> >> >> > since we
> >> >> have no combine in that case so we'd end up with worse code.  I
> >> >> did try emitting the pattern directly, but as mentioned in no#2
> >> >> expand would then kill the label.
> >> >> >
> >> >> > Basically I emit the pattern directly, immediately during expand
> >> >> > the label is
> >> >> marked as dead for some weird reason.
> >> >>
> >> >> Isn't #2 a bug though?  It seems like something we should fix
> >> >> rather than work around.
> >> >
> >> > Yes it's a bug ☹ ok if I'm going to fix that bug then do I need to
> >> > split the optabs still? Isn't the problem atm that I need the split?
> >> > If I'm emitting the instruction directly then the recog pattern for
> >> > it can just be (eq (vec_extract x 1) 0) which is the correct semantics?
> >>
> >> What rtx does the code that uses the optab pass for op

Re: [PATCH]middle-end: replace GET_MODE_WIDER_MODE with GET_MODE_NEXT_MODE

Tamar Christina via Gcc-patches  writes:
> Hi All,
>
> After the fix to the addsub patch yesterday for bootstrap I had only 
> regtested on x86.
> While looking today it seemed the new tests were failing, this was caused
> by a change in the behavior of the GET_MODE_WIDER_MODE macro on trunk.
>
> This patch fixes that issue. Sorry for the mess, have rebased all branches 
> now.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * match.pd: Replace GET_MODE_WIDER_MODE with
>   GET_MODE_NEXT_MODE.
>
> --- inline copy of patch -- 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 
> 1b0ab7cf60fa4772fbe8304c622b0b8fab1bdefa..28191a992039c6f3a1dab5f7c0e35dd58dc47092
>  100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -7997,7 +7997,7 @@ and,
> machine_mode wide_mode;
>   }
>   (if (sel.series_p (0, 2, 0, 2)
> -  && GET_MODE_WIDER_MODE (vec_mode).exists (&wide_mode)
> +  && GET_MODE_NEXT_MODE (vec_mode).exists (&wide_mode)
> && VECTOR_MODE_P (wide_mode)
> && (GET_MODE_UNIT_BITSIZE (vec_mode) * 2
> == GET_MODE_UNIT_BITSIZE (wide_mode)))

Does anything guarantee that the next mode will be the right one?
I think it would be safer to replace the last three && conditions with:

   && GET_MODE_2XWIDER_MODE (GET_MODE_INNER (vec_mode)).exists (&wide_elt_mode)
   && multiple_p (GET_MODE_NUNITS (vec_mode), 2, &wide_nunits)
   && related_vector_mode (vec_mode, wide_elt_mode,
   wide_nunits).exists (&wide_mode)

Thanks,
Richard
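
As a rough illustration of the concern above, here is a toy model, not GCC
code.  ToyMode, toy_modes, widen_via_next and widen_via_lookup are all
hypothetical names, and the table ordering is invented; the real mode
iteration order depends on the target.  The sketch only shows why "take the
next mode and check it" can come up empty where "ask for double-width
elements and half the lanes" succeeds.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a vector machine mode: element width in bits, lane count.
struct ToyMode {
    std::string name;
    unsigned elt_bits;
    unsigned nunits;
};

// Hypothetical mode table.  The ordering is an assumption made up for this
// sketch; it does not reflect any real target.
const std::vector<ToyMode>& toy_modes() {
    static const std::vector<ToyMode> table = {
        {"V8HI", 16, 8},
        {"V16HI", 16, 16},  // happens to follow V8HI in this toy ordering
        {"V4SI", 32, 4},
        {"V8SI", 32, 8},
    };
    return table;
}

// Approach 1: take whatever mode follows in the table, then validate it.
// Returns nullptr when the neighbour does not have double-width elements.
const ToyMode* widen_via_next(const ToyMode& m) {
    const std::vector<ToyMode>& t = toy_modes();
    for (std::size_t i = 0; i + 1 < t.size(); ++i) {
        if (t[i].name == m.name) {
            const ToyMode& next = t[i + 1];
            return next.elt_bits == m.elt_bits * 2 ? &next : nullptr;
        }
    }
    return nullptr;
}

// Approach 2: look up the mode with double-width elements and half the
// lanes directly, independent of table order.
const ToyMode* widen_via_lookup(const ToyMode& m) {
    if (m.nunits % 2 != 0)
        return nullptr;
    for (const ToyMode& c : toy_modes())
        if (c.elt_bits == m.elt_bits * 2 && c.nunits == m.nunits / 2)
            return &c;
    return nullptr;
}
```

In this toy table, widen_via_next gives up on V8HI because its neighbour is
V16HI, while widen_via_lookup finds V4SI.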


RE: [PATCH][X86_64] Separate znver4 insn reservations from older znvers

[Public]

Hi,

Thank you for reviewing the patch.

> Hi. I'm still waiting for feedback on fixes for existing models:
> https://inbox.sourceware.org/gcc-patches/5ae6fc21-edc6-133-aee2-a41e16eb5b7@ispras.ru/T/#t
> did you have a chance to look at those?

I have yet to evaluate that patch; I will get back to you soon.

> Why are you modeling 'fdiv' and 'ssediv' separately? When preparing the
> above patches, I checked that x87 and SSE divisions use the same hardware
> unit, and I don't see a strong reason to artificially clone it in the model.

I thought of modelling them separately as they are different ISA groups.
But yes, since they execute in the same unit, we can model them in the same 
automaton.

> I have a question on AVX512 modeling in your patch:
> 
> > +;; AVX instructions
> > +(define_insn_reservation "znver4_sse_log" 1
> > +  (and (eq_attr "cpu" "znver4")
> > +   (and (eq_attr "type" "sselog,sselog1")
> > +(and (eq_attr "mode" "V4SF,V8SF,V2DF,V4DF")
> > + (eq_attr "memory" "none"
> > +  "znver4-direct,znver4-fpu")
> > +
> > +(define_insn_reservation "znver4_sse_log_evex" 1
> > +  (and (eq_attr "cpu" "znver4")
> > +   (and (eq_attr "type" "sselog,sselog1")
> > +(and (eq_attr "mode" "V16SF,V8DF")
> > + (eq_attr "memory" "none"
> > +
> > +"znver4-direct,znver4-fpu0+znver4-fpu1|znver4-fpu2+znver4-fpu3")
> > +
> 
> This is an AVX512 instruction, and you're modeling that it occupies two ports
> at once and thus has half throughput, but later in the AVX512 section:
> 
> > +;; AVX512 instructions
> > +(define_insn_reservation "znver4_sse_mul_evex" 3
> > +  (and (eq_attr "cpu" "znver4")
> > +   (and (eq_attr "type" "ssemul")
> > +(and (eq_attr "mode" "V16SF,V8DF")
> > + (eq_attr "memory" "none"
> > +  "znver4-double,znver4-fpu0|znver4-fpu3")
> 
> none of the instructions are modeled this way. If that's on purpose, can you
> add a comment? It's surprising, since generally AVX512 has half throughput
> compared to AVX256 on Zen 4, but the model doesn't seem to reflect that.

> > +"znver4-direct,znver4-fpu0+znver4-fpu1|znver4-fpu2+znver4-fpu3")

AVX512 instructions (512 bits wide) occupy 2 consecutive cycles in the pipes
they execute in, so they should be modelled as shown below:

(define_insn_reservation "znver4_sse_log_evex" 1
 (and (eq_attr "cpu" "znver4")
      (and (eq_attr "type" "sselog")
           (and (eq_attr "mode" "V16SF,V8DF,XI")
                (eq_attr "memory" "none"))))
 "znver4-double,(znver4-fpu)*2")

(define_insn_reservation "znver4_sse_mul_evex" 3
 (and (eq_attr "cpu" "znver4")
      (and (eq_attr "type" "ssemul")
           (and (eq_attr "mode" "V16SF,V8DF")
                (eq_attr "memory" "none"))))
 "znver4-double,(znver4-fpu0|znver4-fpu1)*2")

Doing it this way increased the insn-automata.cc size from 201402 lines to
212189.  I hope that is a tolerable increase; do you have any suggestions?
I will revise all AVX512 instructions and post an updated patch.

Thanks and Regards,
Tejas


[PATCH] Optimize testcase

From: Oria Chen 

gcc/testsuite/ChangeLog:

2022-11-15  Yixuan Chen  

* gcc.dg/fold-overflow-1.c: Optimize testcase, because riscv will use 
".LC0" instead of ".LC1" and ".LC2" with "-O" compile option
---
 gcc/testsuite/gcc.dg/fold-overflow-1.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/fold-overflow-1.c 
b/gcc/testsuite/gcc.dg/fold-overflow-1.c
index 108df4e3155..02598bfa3d3 100644
--- a/gcc/testsuite/gcc.dg/fold-overflow-1.c
+++ b/gcc/testsuite/gcc.dg/fold-overflow-1.c
@@ -18,5 +18,6 @@ float foo2(void)
   return 1.0f/0.0f;
 }
 
-/* { dg-final { scan-assembler-times "2139095040" 2 { target { ! mmix-*-* } } 
} } */
+/* { dg-final { scan-assembler-times "2139095040" 2 { target { ! { mmix-*-*  
riscv*-*-* } } } } } */
 /* { dg-final { scan-assembler-times "#7f80" 2 { target mmix-*-* } } } */
+/* { dg-final { scan-assembler-times "2139095040" 3 { target riscv*-*-* } } } 
*/
-- 
2.37.2



Re: [PATCH v3] c++: parser - Support for target address spaces in C++





Am 14.11.22 um 18:55 schrieb Jason Merrill:

On 11/10/22 06:40, Georg-Johann Lay wrote:



Am 10.11.22 um 15:08 schrieb Paul Iannetta:

On Thu, Nov 03, 2022 at 02:38:39PM +0100, Georg-Johann Lay wrote:

[PATCH v3] c++: parser - Support for target address spaces in C++

2. Will it work with compound literals?
===

Currently, the following C code works for target avr:

const __flash char *pHallo = (const __flash char[]) { "Hallo" };

This is a pointer in RAM (AS0) that holds the address of a string in flash
(AS1) and is initialized with that address. Unfortunately, this does not
work locally:

const __flash char* get_hallo (void)
{
    [static] const __flash char *p2 = (const __flash char[]) { "Hallo2" };

    return p2;
}

foo.c: In function 'get_hallo':
foo.c: error: compound literal qualified by address-space qualifier

Is there any way to make this work now? Would be great!


I don't object to allowing this, but what's the advantage of this pattern
over

static __flash const char p2[] = "Hallo2";

?


Hi Jason.

Take an example that's a bit more complicated, like:

#define FSTR(X) (const __flash char[]) { X }

const __flash char* strings[] =
{
FSTR("cat"),
FSTR("dog"),
FSTR("petrophaga lorioti")
};

This won't work in a function just because GCC rejects it, no matter
whether strings[] itself is in __flash or not.  One workaround would be:


const __flash char str_cat[] = "cat";
const __flash char str_dog[] = "dog";
const __flash char str_petrophaga_lorioti[] = "petrophaga lorioti";

const __flash char* strings[] =
{
str_cat,
str_dog,
str_petrophaga_lorioti
};

but anyone would prefer the first alternative.

In a broader context, code without ASes like

const char* strings[] =
{
"cat",
"dog",
"petrophaga lorioti"
};

that makes perfect sense in C/C++ should also be possible for
address spaces, no matter whether the pointers and/or strings[] are in a
non-generic AS.


Unfortunately, ISO/IEC TR 18037 seems to never have considered such 
code, and they mostly are concerned with how code is being accessed, but 
not how code is being located.


Johann



Currently, I implement the same restrictions as the C front-end, but I
think that this restriction could be lifted.

3. Will TARGET_ADDR_SPACE_DIAGNOSE_USAGE still work?


Currently there is target hook TARGET_ADDR_SPACE_DIAGNOSE_USAGE.
I did not see it in your patches, so maybe I just missed it? See
https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gccint/Named-Address-Spaces.html#index-TARGET_005fADDR_005fSPACE_005fDIAGNOSE_005fUSAGE


That was a point I overlooked in my previous patch.  This will be in
my new revision where I also add implicit conversion between
address spaces and also expose TARGET_ADDR_SPACE_CONVERT.


4. Will it be possible to put C++ virtual tables in ASs, and how?
=


Currently, I do not allow the declaration of instances of classes in
an address space, mainly to not have to cope with the handling of the
this pointer.  That is,

   __flash Myclass *t;

does not work.  Nevertheless, I admit that this would be nice to
have.

One big complaint about avr-g++ is that there is no way to put vtables in
flash (address-space 1) and to access them accordingly.  How can this be
achieved with C++ address spaces?


Do you want only the vtables in the flash address space, or do you want
to be able to have the whole class content there?


My question is about vtables, not the bits that represent some object.
vtables are stored independently of objects, usually in .rodata + 
comdat.  Notice that vtables are read-only and in static storage, even 
if objects are neither.


The problem with vtables is that the user has no handle to specify
where to locate them -- and even if they had one, due to AVR quirks the
right instruction must be used to access them.  Thus just putting vtables
in flash by means of some section attribute won't work; only address
spaces can do the trick.



1. If you only want the vtables, I think that a target hook called
at vtable creation would do the trick.


Yes, that would be enough, see https://gcc.gnu.org/PR43745


As you say there, this would be an ABI change, so there would need to be 
a transition strategy.  I don't know to what extent AVR users try to use 
older compiled code vs. always rebuilding everything.



Johann


2. If you want to be able to have pointers to classes in __flash, I
will need to extend the support I have currently implemented so that
the this pointer can be qualified with an address space.
In retrospect, I think this has to be implemented.

Paul


Would be great if this would work, but I think this can be really 
tricky, because it's already tricky for non-class objects.


Indeed, especially if objects of the same type can live either in flash 
or RAM: you'd need 2 or more of each method for the different accesses. 
Perhaps via 

Re: [PATCH] Optimize testcase

On Tue, Nov 15, 2022 at 08:13:45PM +0800, Yixuan Chen wrote:
> From: Oria Chen 
> 
> gcc/testsuite/ChangeLog:
> 
> 2022-11-15  Yixuan Chen  
> 
> * gcc.dg/fold-overflow-1.c: Optimize testcase, because riscv will use 
> ".LC0" instead of ".LC1" and ".LC2" with "-O" compile option

This is wrong.
See https://gcc.gnu.org/PR107608

Jakub



[PATCH] c++: Fix up calls to static operator() or operator[] [PR107624]

Hi!

On Mon, Nov 14, 2022 at 06:29:44PM -0500, Jason Merrill wrote:
> Indeed.  The code in build_new_method_call for this case has the comment
> 
>   /* In an expression of the form `a->f()' where `f' turns
>  out to be a static member function, `a' is
>  none-the-less evaluated.  */

I had to tweak 3 spots for this.  Furthermore, I found that when static
operator[] is accepted in non-pedantic C++20 compilation, we required it to
have exactly 2 parameters; I think it is better to require exactly one,
because that is the only case that will actually work in C++20 and older.

Lightly tested so far, ok for trunk if it passes bootstrap/regtest?

Or do you want to outline the
  if (result != error_mark_node
  && TREE_CODE (TREE_TYPE (cand->fn)) != METHOD_TYPE
  && TREE_SIDE_EFFECTS (obj))
{
  /* But avoid the implicit lvalue-rvalue conversion when 'a'
 is volatile.  */
  tree a = obj;
  if (TREE_THIS_VOLATILE (a))
a = build_this (a);
  if (TREE_SIDE_EFFECTS (a))
result = build2 (COMPOUND_EXPR, TREE_TYPE (result), a, result);
}
part that is now repeated 4 times to some helper function?  If yes,
any suggestion on a good name?
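
For reference, the language rule quoted in the comment above can be seen in a
small standalone program.  This is only a sketch: Widget, next_widget,
evaluations and call_through_object are made-up names, and it uses an
ordinary static member function rather than a C++23 static operator() so
that it builds with older compilers.  The object expression is still
evaluated even though the callee never needs an object.

```cpp
#include <cassert>

struct Widget {
    // Static member function: no `this`, so no object is needed to run it.
    static int answer() { return 42; }
};

int evaluations = 0;    // counts how often the object expression is evaluated
Widget shared_widget;

// Object expression with a visible side effect.
Widget* next_widget() {
    ++evaluations;
    return &shared_widget;
}

// `a->f()` where `f` is static: `a` (here next_widget()) is still evaluated.
int call_through_object() {
    return next_widget()->answer();
}
```

After one call to call_through_object, evaluations is 1, which is the side
effect the COMPOUND_EXPR in the patch is meant to preserve for the operator
forms.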

2022-11-15  Jakub Jelinek  

PR c++/107624
* call.cc (build_op_call): If obj has side-effects
and operator() is static member function, return COMPOUND_EXPR
with the obj side-effects other than reading from volatile
object.
(build_op_subscript): Likewise.
(build_new_op): Similarly for ARRAY_REF, just for arg1 rather than
obj.
* decl.cc (grok_op_properties): For C++20 and earlier, if operator[]
is static member function, require exactly one parameter rather than
exactly two parameters.

* g++.dg/cpp23/static-operator-call4.C: New test.
* g++.dg/cpp23/subscript10.C: New test.
* g++.dg/cpp23/subscript11.C: New test.

--- gcc/cp/call.cc.jj   2022-11-15 07:59:57.337231337 +0100
+++ gcc/cp/call.cc  2022-11-15 13:02:33.369531156 +0100
@@ -5137,7 +5137,24 @@ build_op_call (tree obj, vecfn) == FUNCTION_DECL
   && DECL_OVERLOADED_OPERATOR_P (cand->fn)
   && DECL_OVERLOADED_OPERATOR_IS (cand->fn, CALL_EXPR))
-   result = build_over_call (cand, LOOKUP_NORMAL, complain);
+   {
+ result = build_over_call (cand, LOOKUP_NORMAL, complain);
+ /* In an expression of the form `a()' where cand->fn
+which is operator() turns out to be a static member function,
+`a' is none-the-less evaluated.  */
+ if (result != error_mark_node
+ && TREE_CODE (TREE_TYPE (cand->fn)) != METHOD_TYPE
+ && TREE_SIDE_EFFECTS (obj))
+   {
+ /* But avoid the implicit lvalue-rvalue conversion when 'a'
+is volatile.  */
+ tree a = obj;
+ if (TREE_THIS_VOLATILE (a))
+   a = build_this (a);
+ if (TREE_SIDE_EFFECTS (a))
+   result = build2 (COMPOUND_EXPR, TREE_TYPE (result), a, result);
+   }
+   }
   else
{
  if (TREE_CODE (cand->fn) == FUNCTION_DECL)
@@ -7046,6 +7063,24 @@ build_new_op (const op_location_t &loc,
  gcc_unreachable ();
}
}
+
+ /* In an expression of the form `a[]' where cand->fn
+which is operator[] turns out to be a static member function,
+`a' is none-the-less evaluated.  */
+ if (code == ARRAY_REF
+ && result
+ && result != error_mark_node
+ && TREE_CODE (TREE_TYPE (cand->fn)) != METHOD_TYPE
+ && TREE_SIDE_EFFECTS (arg1))
+   {
+ /* But avoid the implicit lvalue-rvalue conversion when 'a'
+is volatile.  */
+ tree a = arg1;
+ if (TREE_THIS_VOLATILE (a))
+   a = build_this (a);
+ if (TREE_SIDE_EFFECTS (a))
+   result = build2 (COMPOUND_EXPR, TREE_TYPE (result), a, result);
+   }
}
   else
{
@@ -7302,6 +7337,24 @@ build_op_subscript (const op_location_t
  /* Specify evaluation order as per P0145R2.  */
  CALL_EXPR_ORDERED_ARGS (call) = op_is_ordered (ARRAY_REF) == 1;
}
+
+ /* In an expression of the form `a[]' where cand->fn
+which is operator[] turns out to be a static member function,
+`a' is none-the-less evaluated.  */
+ if (result
+ && result != error_mark_node
+ && TREE_CODE (TREE_TYPE (cand->fn)) != METHOD_TYPE
+ && TREE_SIDE_EFFECTS (obj))
+   {
+ /* But avoid the implicit lvalue-rvalue conversion when 'a'
+is volatile.  */
+ tree a = obj;
+ if (TREE_THIS_VOLATILE (a))
+

Re: GCC 13.0.0 Status Report (2022-11-14), Stage 3 in effect now

On 11/15/22 11:07, Jakub Jelinek wrote:
> On Tue, Nov 15, 2022 at 11:02:53AM +0100, Martin Liška wrote:
>>> Is it allowed to merge libsanitizer from LLVM in stage 3?  If not I'd
>>> like to cherry pick some commits from LLVM [to fix some stupid errors
>>> I've made in LoongArch libasan :(].
>>
>> I'm sorry but I was really busy with the porting of the documentation to 
>> Sphinx.
>>
>> Anyway, yes, we should make one libsanitizer merge, but RM should likely
>> approve it: Richi, Jakub, do you support it?
> 
> Could you please prepare a patch, so that we can see how much actually
> changed and decide based on that whether to go for a merge or cherry-picking
> one or more commits?

Sure, here it is. There's a minor change in output format that I address in
the 0003 patch.

Apart from that, I was able to run all tests on x86_64-linux-gnu.
Patch statistics:
 46 files changed, 524 insertions(+), 252 deletions(-)

I'm running a build on ppc64le, and if you're fine with it, I'm going to
finish the proper libsanitizer testing procedure.

Martin

> I think last merge was done by you at the end of August, so we have
> 2.5 months of changes to potentially merge.
> 
>   Jakub
> 
From b9da933ec8860e0c217e2f6fc08f08687d40725f Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 15 Nov 2022 12:02:36 +0100
Subject: [PATCH 3/3] asan: update expected format based on ASAN

gcc/testsuite/ChangeLog:

	* c-c++-common/asan/global-overflow-1.c: Update
	expected format.
	* c-c++-common/asan/heap-overflow-1.c: Likewise.
	* c-c++-common/asan/strlen-overflow-1.c: Likewise.
	* c-c++-common/asan/strncpy-overflow-1.c: Likewise.
	* c-c++-common/hwasan/heap-overflow.c: Likewise.
	* g++.dg/asan/asan_mem_test.cc: Likewise.
	* g++.dg/asan/asan_oob_test.cc: Likewise.
	* g++.dg/asan/asan_str_test.cc: Likewise.
	* g++.dg/asan/asan_test.cc: Likewise.
	* g++.dg/asan/large-func-test-1.C: Likewise.
---
 .../c-c++-common/asan/global-overflow-1.c |  2 +-
 .../c-c++-common/asan/heap-overflow-1.c   |  2 +-
 .../c-c++-common/asan/strlen-overflow-1.c |  2 +-
 .../c-c++-common/asan/strncpy-overflow-1.c|  2 +-
 .../c-c++-common/hwasan/heap-overflow.c   |  2 +-
 gcc/testsuite/g++.dg/asan/asan_mem_test.cc| 20 +--
 gcc/testsuite/g++.dg/asan/asan_oob_test.cc| 12 +++
 gcc/testsuite/g++.dg/asan/asan_str_test.cc|  4 +--
 gcc/testsuite/g++.dg/asan/asan_test.cc| 36 +--
 gcc/testsuite/g++.dg/asan/large-func-test-1.C |  2 +-
 10 files changed, 42 insertions(+), 42 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/asan/global-overflow-1.c b/gcc/testsuite/c-c++-common/asan/global-overflow-1.c
index ec412231be0..b97801da2b7 100644
--- a/gcc/testsuite/c-c++-common/asan/global-overflow-1.c
+++ b/gcc/testsuite/c-c++-common/asan/global-overflow-1.c
@@ -25,5 +25,5 @@ int main() {
 /* { dg-skip-if "inaccurate debug info" { mips*-*-* } { "*" } { "-O0" } } */
 /* { dg-output "READ of size 1 at 0x\[0-9a-f\]+ thread T0.*(\n|\r\n|\r)" } */
 /* { dg-output "#0 0x\[0-9a-f\]+ +(in _*main (\[^\n\r]*global-overflow-1.c:20|\[^\n\r]*:0|\[^\n\r]*\\+0x\[0-9a-z\]*)|\[(\])\[^\n\r]*(\n|\r\n|\r).*" } */
-/* { dg-output "0x\[0-9a-f\]+ is located 0 bytes to the right of global variable" } */
+/* { dg-output "0x\[0-9a-f\]+ is located 0 bytes after global variable" } */
 /* { dg-output ".*YYY\[^\n\r]* of size 10\[^\n\r]*(\n|\r\n|\r)" } */
diff --git a/gcc/testsuite/c-c++-common/asan/heap-overflow-1.c b/gcc/testsuite/c-c++-common/asan/heap-overflow-1.c
index 7ef048e636f..7d8744852ae 100644
--- a/gcc/testsuite/c-c++-common/asan/heap-overflow-1.c
+++ b/gcc/testsuite/c-c++-common/asan/heap-overflow-1.c
@@ -25,7 +25,7 @@ int main(int argc, char **argv) {
 
 /* { dg-output "READ of size 1 at 0x\[0-9a-f\]+ thread T0.*(\n|\r\n|\r)" } */
 /* { dg-output "#0 0x\[0-9a-f\]+ +(in _*main (\[^\n\r]*heap-overflow-1.c:21|\[^\n\r]*:0|\[^\n\r]*\\+0x\[0-9a-z\]*)|\[(\]).*(\n|\r\n|\r)" } */
-/* { dg-output "\[^\n\r]*0x\[0-9a-f\]+ is located 0 bytes to the right of 10-byte region\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*0x\[0-9a-f\]+ is located 0 bytes after 10-byte region\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*allocated by thread T0 here:\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "#0 0x\[0-9a-f\]+ +(in _*(interceptor_|wrap_|)malloc|\[(\])\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "#1 0x\[0-9a-f\]+ +(in _*main (\[^\n\r]*heap-overflow-1.c:19|\[^\n\r]*:0|\[^\n\r]*\\+0x\[0-9a-z\]*)|\[(\])\[^\n\r]*(\n|\r\n|\r)" } */
diff --git a/gcc/testsuite/c-c++-common/asan/strlen-overflow-1.c b/gcc/testsuite/c-c++-common/asan/strlen-overflow-1.c
index 86a79fd5d06..34c20c8ed50 100644
--- a/gcc/testsuite/c-c++-common/asan/strlen-overflow-1.c
+++ b/gcc/testsuite/c-c++-common/asan/strlen-overflow-1.c
@@ -21,4 +21,4 @@ int main () {
 
 /* { dg-output "READ of size 2 at 0x\[0-9a-f\]+ thread T0.*(\n|\r\n|\r)" } */
 /* { dg-output "#1 0x\[0-9a-f\]+ +(in _*main (\[^\n\r]*strlen-overflow-1.c:19|\[^\n\r]*:0)|\[(\]).*(\n|\r\n|\r)" } *

RE: [PATCH][X86_64] Separate znver4 insn reservations from older znvers



On Tue, 15 Nov 2022, Joshi, Tejas Sanjay wrote:

> > > +;; AVX instructions
> > > +(define_insn_reservation "znver4_sse_log" 1
> > > +  (and (eq_attr "cpu" "znver4")
> > > +   (and (eq_attr "type" "sselog,sselog1")
> > > +(and (eq_attr "mode" 
> > > "V4SF,V8SF,V2DF,V4DF")
> > > + (eq_attr "memory" "none"
> > > +  "znver4-direct,znver4-fpu")
> > > +
> > > +(define_insn_reservation "znver4_sse_log_evex" 1
> > > +  (and (eq_attr "cpu" "znver4")
> > > +   (and (eq_attr "type" "sselog,sselog1")
> > > +(and (eq_attr "mode" "V16SF,V8DF")
> > > + (eq_attr "memory" "none"
> > > +
> > > +"znver4-direct,znver4-fpu0+znver4-fpu1|znver4-fpu2+znver4-fpu3")
> > > +
> > 
> > This is an AVX512 instruction, and you're modeling that it occupies two 
> > ports
> > at once and thus has half throughput, but later in the AVX512 section:
> > 
> > > +;; AVX512 instructions
> > > +(define_insn_reservation "znver4_sse_mul_evex" 3
> > > +  (and (eq_attr "cpu" "znver4")
> > > +   (and (eq_attr "type" "ssemul")
> > > +(and (eq_attr "mode" "V16SF,V8DF")
> > > + (eq_attr "memory" "none"))))
> > > +  "znver4-double,znver4-fpu0|znver4-fpu3")
> > 
> > none of the instructions are modeled this way. If that's on purpose, can you
> > add a comment? It's surprising, since generally AVX512 has half throughput
> > compared to AVX256 on Zen 4, but the model doesn't seem to reflect that.
> 
> > > +"znver4-direct,znver4-fpu0+znver4-fpu1|znver4-fpu2+znver4-fpu3")
> 
> AVX512 instructions (512 bits wide) occupy 2 consecutive cycles in the pipes
> they execute in. So, it should be modelled as shown below:
> 
> (define_insn_reservation "znver4_sse_log_evex" 1
>(and (eq_attr "cpu" "znver4")
> (and (eq_attr "type" "sselog")
>  (and (eq_attr "mode" "V16SF,V8DF,XI")
>   (eq_attr "memory" "none"))))
>"znver4-double,(znver4-fpu)*2")

I think instead of (znver4-fpu)*2 there should be

  znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2

assuming the instruction occupies the same pipe on both cycles (your
variant models as if it can move from one pipe to another).
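
To make the distinction concrete, here is a minimal sketch that enumerates
which (cycle 0, cycle 1) pipe pairs each spelling permits. It assumes that
znver4-fpu is an alternation of four pipes, znver4-fpu0 through znver4-fpu3,
and it models pipes as plain integers; the function names are hypothetical
and nothing here is taken from the GCC sources.

```cpp
#include <cassert>
#include <set>
#include <utility>

// Hypothetical model: four FP pipes numbered 0..3, standing in for
// znver4-fpu0 .. znver4-fpu3.
constexpr int kNumPipes = 4;

// (pipe used in cycle 0, pipe used in cycle 1)
using TwoCycleUse = std::pair<int, int>;

// "(znver4-fpu)*2": the alternation is repeated, so each cycle picks a
// pipe independently of the other.
std::set<TwoCycleUse> any_pipe_each_cycle() {
  std::set<TwoCycleUse> uses;
  for (int first = 0; first < kNumPipes; ++first)
    for (int second = 0; second < kNumPipes; ++second)
      uses.insert(TwoCycleUse(first, second));
  return uses;
}

// "znver4-fpu0*2|znver4-fpu1*2|znver4-fpu2*2|znver4-fpu3*2": one pipe is
// chosen and then held for both cycles.
std::set<TwoCycleUse> same_pipe_both_cycles() {
  std::set<TwoCycleUse> uses;
  for (int pipe = 0; pipe < kNumPipes; ++pipe)
    uses.insert(TwoCycleUse(pipe, pipe));
  return uses;
}
```

Under these assumptions the first spelling admits 16 pairs, including ones
such as (0, 1) where the instruction hops between pipes, while the second
admits only the 4 pairs that stay on one pipe.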

> (define_insn_reservation "znver4_sse_mul_evex" 3
>(and (eq_attr "cpu" "znver4")
> (and (eq_attr "type" "ssemul")
>  (and (eq_attr "mode" "V16SF,V8DF")
>   (eq_attr "memory" "none"))))
>"znver4-double,(znver4-fpu0|znver4-fpu1)*2")

Likewise here, znver4-fpu0*2|znver4-fpu1*2.

> Doing it this way increased the insn-automata.cc size from 201402 lines to
> 212189.

Please reevaluate on top of my patches, the impact will be different.

> Hope it is a tolerable increase or do you have any suggestions?

Please take the corrections above into account.

Also I think it's better to use znver4-direct rather than znver4-double for
AVX512 instructions, because they are decoded as one uop, not two (it won't
make a practical difference due to a "Fix me", but it's a simple improvement).

Thanks.

Alexander


Re: GCC 13.0.0 Status Report (2022-11-14), Stage 3 in effect now

On Tue, Nov 15, 2022 at 01:49:36PM +0100, Martin Liška wrote:
> On 11/15/22 11:07, Jakub Jelinek wrote:
> > On Tue, Nov 15, 2022 at 11:02:53AM +0100, Martin Liška wrote:
> >>> Is it allowed to merge libsanitizer from LLVM in stage 3?  If not I'd
> >>> like to cherry pick some commits from LLVM [to fix some stupid errors
> >>> I've made in LoongArch libasan :(].
> >>
> >> I'm sorry but I was really busy with the porting of the documentation to
> >> Sphinx.
> >>
> >> Anyway, yes, we should make one libsanitizer merge, but RM should likely
> >> approve it: Richi, Jakub, do you support it?
> > 
> > Could you please prepare a patch, so that we can see how much actually
> > changed and decide based on that whether to go for a merge or cherry-picking
> > one or more commits?
> 
> Sure, there it is. There's a minor change in output format that I address in 
> 0003 patch.
> 
> Apart from that, I was able to run all tests on x86_64-linux-gnu.
> Patch statistics:
>  46 files changed, 524 insertions(+), 252 deletions(-)
> 
> I'm running build on ppc64le and if you're fine, I'm going to finish
> a proper libsanitizer testing procedure.

Ok.

Jakub



[PATCH] LoongArch: Fix atomic_exchange make comparison and may jump out

gcc/ChangeLog:

* config/loongarch/sync.md:
Add atomic_cas_value_exchange_and_7 and fix atomic_exchange.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/sync-1.c: New test.
---
 gcc/config/loongarch/sync.md|  27 -
 gcc/testsuite/gcc.target/loongarch/sync-1.c | 104 
 2 files changed, 129 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/sync-1.c

diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
index 0c4f1983e..8a8e6247b 100644
--- a/gcc/config/loongarch/sync.md
+++ b/gcc/config/loongarch/sync.md
@@ -448,6 +448,29 @@
 }
   [(set (attr "length") (const_int 32))])
 
+(define_insn "atomic_cas_value_exchange_and_7_"
+  [(set (match_operand:GPR 0 "register_operand" "=&r")
+   (match_operand:GPR 1 "memory_operand" "+ZC"))
+   (set (match_dup 1)
+   (unspec_volatile:GPR [(match_operand:GPR 2 "reg_or_0_operand" "rJ")
+ (match_operand:GPR 3 "reg_or_0_operand" "rJ")
+ (match_operand:GPR 4 "reg_or_0_operand" "rJ")
+ (match_operand:GPR 5 "reg_or_0_operand"  "rJ")
+ (match_operand:SI 6 "const_int_operand")] ;; model
+UNSPEC_SYNC_EXCHANGE))
+   (clobber (match_scratch:GPR 7 "=&r"))]
+  ""
+{
+  return "%G6\\n\\t"
+"1:\\n\\t"
+"ll.\\t%0,%1\\n\\t"
+"and\\t%7,%0,%z3\\n\\t"
+"or%i5\\t%7,%7,%5\\n\\t"
+"sc.\\t%7,%1\\n\\t"
+"beqz\\t%7,1b\\n\\t";
+}
+  [(set (attr "length") (const_int 20))])
+
 (define_expand "atomic_exchange"
   [(set (match_operand:SHORT 0 "register_operand")
(unspec_volatile:SHORT
@@ -459,9 +482,9 @@
   ""
 {
   union loongarch_gen_fn_ptrs generator;
-  generator.fn_7 = gen_atomic_cas_value_cmp_and_7_si;
+  generator.fn_7 = gen_atomic_cas_value_exchange_and_7_si;
   loongarch_expand_atomic_qihi (generator, operands[0], operands[1],
-   operands[1], operands[2], operands[3]);
+   const0_rtx, operands[2], operands[3]);
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/loongarch/sync-1.c 
b/gcc/testsuite/gcc.target/loongarch/sync-1.c
new file mode 100644
index 0..cebed6a9b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/sync-1.c
@@ -0,0 +1,104 @@
+/* Test __sync_test_and_set in atomic_exchange */
+/* { dg-do run } */
+/* { dg-options "-lpthread -std=c11" } */
+
+#include 
+#include 
+#include 
+#include 
+
+#define NR_THREAD 16
+#define NR_DATA 1000
+#define ITER_COUNT 1
+
+static int _data[NR_DATA];
+static char _lock;
+static int _overcnt;
+
+static inline void proc_yield(int cnt)
+{
+  __asm__ __volatile__("":::"memory");
+}
+
+static void unlock()
+{
+  return atomic_store_explicit(&_lock, 0, memory_order_seq_cst);
+}
+
+static int trylock()
+{
+  return atomic_exchange_explicit(&_lock, 1, memory_order_acquire) == 0;
+}
+
+static void lockslow()
+{
+  for (int i = 0;; i++) {
+if (i < 10)
+  proc_yield(i);
+else
+  sched_yield();
+if (atomic_load_explicit(&_lock, memory_order_relaxed) == 0
+  && atomic_exchange_explicit(&_lock, 1, memory_order_acquire) == 0)
+  return;
+  }
+}
+
+static void lock()
+{
+  if (trylock())
+return;
+  lockslow();
+}
+
+static void checkeq(int a, int b)
+{
+  if (a != b)
+__builtin_abort();
+}
+
+static void adddata()
+{
+  int i, v;
+  lock();
+  v = _data[0];
+  for (i = 0; i < NR_DATA; i++) {
+checkeq(_data[i], v);
+_data[i]++;
+  }
+  unlock();
+}
+
+static void backoff()
+{
+  int i, data[NR_DATA] = {0};
+  for (i = 0; i < NR_DATA; i++) {
+data[i]++;
+checkeq(data[i], 1);
+  }
+}
+
+static void *write_mutex_thread(void *unused)
+{
+  int i;
+  for (i = 0; i < ITER_COUNT; i++) {
+adddata();
+backoff();
+  }
+  atomic_fetch_add(&_overcnt, 1);
+}
+
+int main()
+{
+  int cnt;
+
+  pthread_t threads[NR_THREAD];
+  for (int i = 0; i < NR_THREAD; i++)
+pthread_create(&threads[i], 0, write_mutex_thread, NULL);
+  for (int i = 0; i < NR_THREAD; i++)
+pthread_detach(threads[i]);
+  while(cnt != NR_THREAD) {
+sched_yield();
+cnt = atomic_load(&_overcnt);
+  }
+  return 0;
+}
-- 
2.34.3



RE: [PATCH]middle-end: replace GET_MODE_WIDER_MODE with GET_MODE_NEXT_MODE

> -Original Message-
> From: Richard Sandiford 
> Sent: Tuesday, November 15, 2022 11:59 AM
> To: Tamar Christina via Gcc-patches 
> Cc: Tamar Christina ; nd ;
> rguent...@suse.de; j...@ventanamicro.com
> Subject: Re: [PATCH]middle-end: replace GET_MODE_WIDER_MODE with
> GET_MODE_NEXT_MODE
> 
> Tamar Christina via Gcc-patches  writes:
> > Hi All,
> >
> > After the fix to the addsub patch yesterday for bootstrap I had only
> > regtested on x86.
> > While looking today it seemed the new tests were failing, this was
> > caused by a change in the behavior of the GET_MODE_WIDER_MODE
> > macro on trunk.
> >
> > This patch fixes that issue. Sorry for the mess, have rebased all branches
> > now.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * match.pd: Replace GET_MODE_WIDER_MODE with
> > GET_MODE_NEXT_MODE.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/match.pd b/gcc/match.pd index
> >
> 1b0ab7cf60fa4772fbe8304c622b0b8fab1bdefa..28191a992039c6f3a1dab5f7c0
> e3
> > 5dd58dc47092 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -7997,7 +7997,7 @@ and,
> > machine_mode wide_mode;
> >   }
> >   (if (sel.series_p (0, 2, 0, 2)
> > -  && GET_MODE_WIDER_MODE (vec_mode).exists (&wide_mode)
> > +  && GET_MODE_NEXT_MODE (vec_mode).exists (&wide_mode)
> >   && VECTOR_MODE_P (wide_mode)
> >   && (GET_MODE_UNIT_BITSIZE (vec_mode) * 2
> >   == GET_MODE_UNIT_BITSIZE (wide_mode)))
> 
> Does anything guarantee that the next mode will be the right one?
> I think it would be safer to replace the last three && conditions with:
> 
>    && GET_MODE_2XWIDER_MODE (GET_MODE_INNER (vec_mode)).exists (&wide_elt_mode)
>&& multiple_p (GET_MODE_NUNITS (vec_mode), 2, &wide_nunits)
>&& related_vector_mode (vec_mode, wide_elt_mode,
>  wide_nunits).exists (&wide_mode)

I see, respun patch accordingly.

Ok for master?
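
For readers following the review, the suggested condition can be sketched
outside GCC as below. This is only a toy model: ToyVecMode and the toy_*
helpers are hypothetical stand-ins for GET_MODE_2XWIDER_MODE, multiple_p and
related_vector_mode, and the set of element widths that have a wider
counterpart is an assumption made for illustration.

```cpp
#include <cassert>

// Hypothetical description of a vector mode: element width and lane count.
struct ToyVecMode {
  unsigned elt_bits;
  unsigned nunits;
};

// Stand-in for GET_MODE_2XWIDER_MODE (...).exists (&out).  Assumption:
// only 8-, 16- and 32-bit elements have a twice-as-wide counterpart.
bool toy_2xwider_elt(unsigned elt_bits, unsigned* wide_elt_bits) {
  if (elt_bits != 8 && elt_bits != 16 && elt_bits != 32)
    return false;
  *wide_elt_bits = elt_bits * 2;
  return true;
}

// Stand-in for multiple_p (value, factor, &quotient).
bool toy_multiple_p(unsigned value, unsigned factor, unsigned* quotient) {
  if (value % factor != 0)
    return false;
  *quotient = value / factor;
  return true;
}

// Find the mode with elements twice as wide and half as many lanes,
// checking each property explicitly instead of trusting "the next mode".
bool toy_wide_mode(const ToyVecMode& vec, ToyVecMode* wide) {
  unsigned wide_elt_bits = 0;
  unsigned wide_nunits = 0;
  if (!toy_2xwider_elt(vec.elt_bits, &wide_elt_bits)
      || !toy_multiple_p(vec.nunits, 2, &wide_nunits))
    return false;
  wide->elt_bits = wide_elt_bits;
  wide->nunits = wide_nunits;
  return true;
}
```

The point of the explicit checks is that the result is correct by
construction: the total vector size is preserved, and the lookup fails
cleanly when the lane count is odd or no wider element exists.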

--- inline copy of patch ---

diff --git a/gcc/match.pd b/gcc/match.pd
index 
1b0ab7cf60fa4772fbe8304c622b0b8fab1bdefa..82f05bbc912e4f80f3984d930c4a8dcb010136e1
 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7995,12 +7995,15 @@ and,
vec_perm_indices sel (builder, 2, nelts);
machine_mode vec_mode = TYPE_MODE (type);
machine_mode wide_mode;
+   scalar_mode wide_elt_mode;
+   poly_uint64 wide_nunits;
+   scalar_mode inner_mode = GET_MODE_INNER (vec_mode);
  }
  (if (sel.series_p (0, 2, 0, 2)
-  && GET_MODE_WIDER_MODE (vec_mode).exists (&wide_mode)
- && VECTOR_MODE_P (wide_mode)
- && (GET_MODE_UNIT_BITSIZE (vec_mode) * 2
- == GET_MODE_UNIT_BITSIZE (wide_mode)))
+ && GET_MODE_2XWIDER_MODE (inner_mode).exists (&wide_elt_mode)
+ && multiple_p (GET_MODE_NUNITS (vec_mode), 2, &wide_nunits)
+ && related_vector_mode (vec_mode, wide_elt_mode,
+ wide_nunits).exists (&wide_mode))
(with
 {
   tree stype




[PATCH (pushed)] libsanitizer: use git clone --depth 1

Using --depth 1 makes cloning much faster.

libsanitizer/ChangeLog:

* merge.sh: Use git clone --depth 1.
---
 libsanitizer/merge.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libsanitizer/merge.sh b/libsanitizer/merge.sh
index 95ded4f9634..7d1b553e2bb 100755
--- a/libsanitizer/merge.sh
+++ b/libsanitizer/merge.sh
@@ -6,7 +6,7 @@
 
 get_upstream() {
   rm -rf upstream
-  git clone https://github.com/llvm/llvm-project.git upstream
+  git clone --depth 1 https://github.com/llvm/llvm-project.git upstream
 }
 
 get_current_rev() {
-- 
2.38.1



Re: [PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676]

On Tue, 15 Nov 2022, Jonathan Wakely via Gcc-patches wrote:

> > @item -mrelax-cmpxchg-loop
> > @opindex mrelax-cmpxchg-loop
> >-Relax cmpxchg loop by emitting an early load and compare before cmpxchg,
> >-execute pause if load value is not expected. This reduces excessive
> >-cachline bouncing when and works for all atomic logic fetch builtins
> >-that generates compare and swap loop.
> >+For compare and swap loops that emitted by some __atomic_* builtins
> 
> s/that emitted/that are emitted/
> 
> >+(e.g. __atomic_fetch_(or|and|xor|nand) and their __atomic_*_fetch
> >+counterparts), emit an atomic load before cmpxchg instruction. If the
> 
> s/before cmpxchg/before the cmpxchg/
> 
> >+loaded value is not equal to expected, execute a pause instead of
> 
> s/not equal to expected/not equal to the expected/
> 
> >+directly run the cmpxchg instruction. This might reduce excessive
> 
> s/directly run/directly running/

This results in "... execute a pause instead of directly running the
cmpxchg instruction", which needs further TLC because:

* 'a pause' should be 'the PAUSE instruction';
* 'directly running [an instruction]' does not seem correct in context.

The option also applies to __sync builtins, not just __atomic.


How about the following:

When emitting a compare-and-swap loop for @ref{__sync Builtins}
and @ref{__atomic Builtins} lacking a native instruction, optimize
for the highly contended case by issuing an atomic load before the
@code{CMPXCHG} instruction, and invoke the @code{PAUSE} instruction
when restarting the loop.

Alexander


Re: [PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676]

On Tue, 15 Nov 2022 at 13:34, Alexander Monakov  wrote:
>
> On Tue, 15 Nov 2022, Jonathan Wakely via Gcc-patches wrote:
>
> > > @item -mrelax-cmpxchg-loop
> > > @opindex mrelax-cmpxchg-loop
> > >-Relax cmpxchg loop by emitting an early load and compare before cmpxchg,
> > >-execute pause if load value is not expected. This reduces excessive
> > >-cachline bouncing when and works for all atomic logic fetch builtins
> > >-that generates compare and swap loop.
> > >+For compare and swap loops that emitted by some __atomic_* builtins
> >
> > s/that emitted/that are emitted/
> >
> > >+(e.g. __atomic_fetch_(or|and|xor|nand) and their __atomic_*_fetch
> > >+counterparts), emit an atomic load before cmpxchg instruction. If the
> >
> > s/before cmpxchg/before the cmpxchg/
> >
> > >+loaded value is not equal to expected, execute a pause instead of
> >
> > s/not equal to expected/not equal to the expected/
> >
> > >+directly run the cmpxchg instruction. This might reduce excessive
> >
> > s/directly run/directly running/
>
> This results in "... execute a pause instead of directly running the
> cmpxchg instruction", which needs further TLC because:
>
> * 'a pause' should be 'the PAUSE instruction';
> * 'directly running [an instruction]' does not seem correct in context.
>
> The option also applies to __sync builtins, not just __atomic.
>
>
> How about the following:
>
> When emitting a compare-and-swap loop for @ref{__sync Builtins}
> and @ref{__atomic Builtins} lacking a native instruction, optimize
> for the highly contended case by issuing an atomic load before the
> @code{CMPXCHG} instruction, and invoke the @code{PAUSE} instruction
> when restarting the loop.

That's much better, thanks. My only remaining quibble would be that
"invoking" an instruction seems only marginally better than running
one. Emitting? Issuing? Using? Adding?



Re: [PATCH Rust front-end v3 38/46] gccrs: Add HIR to GCC GENERIC lowering entry point




On 11/9/22 14:53, Richard Biener wrote:

On Wed, Oct 26, 2022 at 10:37 AM  wrote:


From: Philip Herron 

This patch contains the entry point and utilities used for the lowering
of HIR nodes to `tree`s. It also contains a constant evaluator, ported
over from the C++ frontend.

Co-authored-by: David Faust 
Co-authored-by: Faisal Abbas <90.abbasfai...@gmail.com>
---
  gcc/rust/backend/rust-compile-context.cc | 146 
  gcc/rust/backend/rust-compile-context.h  | 343 ++
  gcc/rust/backend/rust-compile.cc | 414 +
  gcc/rust/backend/rust-compile.h  |  47 +++
  gcc/rust/backend/rust-constexpr.cc   | 441 +++
  gcc/rust/backend/rust-constexpr.h|  31 ++
  6 files changed, 1422 insertions(+)
  create mode 100644 gcc/rust/backend/rust-compile-context.cc
  create mode 100644 gcc/rust/backend/rust-compile-context.h
  create mode 100644 gcc/rust/backend/rust-compile.cc
  create mode 100644 gcc/rust/backend/rust-compile.h
  create mode 100644 gcc/rust/backend/rust-constexpr.cc
  create mode 100644 gcc/rust/backend/rust-constexpr.h

diff --git a/gcc/rust/backend/rust-compile-context.cc 
b/gcc/rust/backend/rust-compile-context.cc
new file mode 100644
index 000..cb2addf6c21
--- /dev/null
+++ b/gcc/rust/backend/rust-compile-context.cc
@@ -0,0 +1,146 @@
+// Copyright (C) 2020-2022 Free Software Foundation, Inc.
+
+// This file is part of GCC.
+
+// GCC is free software; you can redistribute it and/or modify it under
+// the terms of the GNU General Public License as published by the Free
+// Software Foundation; either version 3, or (at your option) any later
+// version.
+
+// GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+// WARRANTY; without even the implied warranty of MERCHANTABILITY or
+// FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+// for more details.
+
+// You should have received a copy of the GNU General Public License
+// along with GCC; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+#include "rust-compile-context.h"
+#include "rust-compile-type.h"
+
+namespace Rust {
+namespace Compile {
+
+Context::Context (::Backend *backend)
+  : backend (backend), resolver (Resolver::Resolver::get ()),
+tyctx (Resolver::TypeCheckContext::get ()),
+mappings (Analysis::Mappings::get ()), mangler (Mangler ())
+{
+  setup_builtins ();
+}
+
+void
+Context::setup_builtins ()
+{
+  auto builtins = resolver->get_builtin_types ();
+  for (auto it = builtins.begin (); it != builtins.end (); it++)
+{
+  HirId ref;
+  bool ok = tyctx->lookup_type_by_node_id ((*it)->get_node_id (), &ref);
+  rust_assert (ok);
+
+  TyTy::BaseType *lookup;
+  ok = tyctx->lookup_type (ref, &lookup);
+  rust_assert (ok);
+
+  TyTyResolveCompile::compile (this, lookup);
+}
+}
+
+hashval_t
+Context::type_hasher (tree type)
+{
+  inchash::hash hstate;
+
+  hstate.add_int (TREE_CODE (type));
+
+  if (TYPE_NAME (type))
+{
+  hashval_t record_name_hash
+   = IDENTIFIER_HASH_VALUE (DECL_NAME (TYPE_NAME (type)));
+  hstate.add_object (record_name_hash);
+}


The following does look a bit like type_hash_canon_hash.  I'll probably see what
we use tree type hashing for, just wondering here.


+  for (tree t = TYPE_ATTRIBUTES (type); t; t = TREE_CHAIN (t))
+/* Just the identifier is adequate to distinguish.  */
+hstate.add_object (IDENTIFIER_HASH_VALUE (TREE_PURPOSE (t)));
+
+  switch (TREE_CODE (type))
+{
+case METHOD_TYPE:
+  hstate.add_object (TYPE_HASH (TYPE_METHOD_BASETYPE (type)));
+  /* FALLTHROUGH. */
+case FUNCTION_TYPE:
+  for (tree t = TYPE_ARG_TYPES (type); t; t = TREE_CHAIN (t))
+   if (TREE_VALUE (t) != error_mark_node)
+ hstate.add_object (TYPE_HASH (TREE_VALUE (t)));
+  break;
+
+case OFFSET_TYPE:
+  hstate.add_object (TYPE_HASH (TYPE_OFFSET_BASETYPE (type)));
+  break;
+
+  case ARRAY_TYPE: {


GCC coding conventions would say the { goes on the next line, indented.
The rust FE might intentionally diverge from that standard; if so, a
pointer in some README in rust/ would be helpful.


This is not our intention. We would like to stick to the GCC coding 
convention, and use a `.clang-format` file to do so and apply it before 
merging any code. However it clearly has some limitations. I'll be on 
the lookout for these patterns and fix them by hand, or try and figure 
out how to edit the clang-format file.



+   if (TYPE_DOMAIN (type))
+ hstate.add_object (TYPE_HASH (TYPE_DOMAIN (type)));
+   if (!AGGREGATE_TYPE_P (TREE_TYPE (type)))
+ {
+   unsigned typeless = TYPE_TYPELESS_STORAGE (type);
+   hstate.add_object (typeless);
+ }
+  }
+  break;
+
+  case INTEGER_TYPE: {
+   tree t = TYPE_MAX_VALUE (type);
+   if (!t)
+ t = TYPE_MIN_VALUE (type);
+   for (int i = 0; i < TREE_INT_

Re: [PATCH] [PR68097] Try to avoid recursing for floats in tree_*_nonnegative_warnv_p.

On Mon, Nov 14, 2022 at 10:12 AM Richard Biener
 wrote:
>
> On Sat, Nov 12, 2022 at 7:30 PM Aldy Hernandez  wrote:
> >
> > It irks me that a PR named "we should track ranges for floating-point"
> > hasn't been closed in this release.  This is an attempt to do just
> > that.
> >
> > As mentioned in the PR, even though we track ranges for floats, it has
> > been suggested that avoiding recursing through SSA defs in
> > gimple_assign_nonnegative_warnv_p is also a goal.  We can do this with
> > various ranger components without the need for a heavy handed approach
> > (i.e. a full ranger).
> >
> > I have implemented two versions of known_float_sign_p() that answer
> > the question whether we definitely know the sign for an operation or a
> > tree expression.
> >
> > Both versions use get_global_range_query, which is a wrapper to query
> > global ranges.  This means that no caching or propagation is done.
> > In the case of an SSA, we just return the global range for it (think
> > SSA_NAME_RANGE_INFO).  In the case of a tree code with operands, we
> > also use get_global_range_query to resolve the operands, and then call
> > into range-ops, which is our lowest level component.  There is no
> > ranger or gori involved.  All we're doing is resolving the operation
> > with the ranges passed.
> >
> > This is enough to avoid recursing in the case where we definitely know
> > the sign of a range.  Otherwise, we still recurse.
> >
> > Note that instead of get_global_range_query(), we could use
> > get_range_query() which uses a ranger (if active in a pass), or
> > get_global_range_query if not.  This would allow passes that have an
> > active ranger (with enable_ranger) to use a full ranger.  These passes
> > are currently, VRP, loop unswitching, DOM, loop versioning, etc.  If
> > no ranger is active, get_range_query defaults to global ranges, so
> > there's no additional penalty.
> >
> > Would this be acceptable, at least enough to close (or rename the PR ;-))?
>
> I think the checks would belong to the gimple_stmt_nonnegative_warnv_p 
> function
> only (that's the SSA name entry from the fold-const.cc ones)?
>
> I also notice the use of 'bool' for the "sign".  That's not really
> descriptive.  We
> have SIGNED and UNSIGNED (aka enum signop), not sure if that's the
> perfect match vs. NEGATIVE and NONNEGATIVE.  Maybe the functions
> name is just bad and they should be known_float_negative_p?

Yeah, SIGNED and UNSIGNED don't seem to be much clearer than "bool signbit".

For instance, we have the following in frange:

  void set_nan (tree type, bool sign);
  void update_nan (bool sign);
  bool maybe_isnan (bool sign) const;
  bool signbit_p (bool &signbit) const;

I'm OK changing them to enum signop if you prefer.  I'm just not
totally convinced it's more readable.
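
As a way to compare the two styles side by side, here is a small sketch,
assuming a toy closed range of doubles with no NaN handling. ToyFloatRange
and ToySign are hypothetical names for illustration only; they are not
frange or enum signop, and only the shape of signbit_p (a bool result plus
a bool out-parameter) is borrowed from the declarations quoted above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical closed range [lo, hi] with lo <= hi and no NaNs.
struct ToyFloatRange {
  double lo;
  double hi;

  // Same shape as "bool signbit_p (bool &signbit) const": returns true
  // only when every value in the range has the same sign bit.
  bool signbit_p(bool& signbit) const {
    if (std::signbit(lo) != std::signbit(hi))
      return false;
    signbit = std::signbit(lo);
    return true;
  }
};

// Enum-based alternative, so call sites read as names rather than bools.
enum class ToySign { Nonnegative, Negative, Unknown };

ToySign known_sign(const ToyFloatRange& range) {
  bool signbit = false;
  if (!range.signbit_p(signbit))
    return ToySign::Unknown;
  return signbit ? ToySign::Negative : ToySign::Nonnegative;
}
```

With the bool form the caller has to remember that true means "negative";
with the enum form the third state (unknown) is also spelled out, at the
cost of one more type.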

??

Aldy



Re: [PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676]



On Tue, 15 Nov 2022, Jonathan Wakely wrote:

> > How about the following:
> >
> > When emitting a compare-and-swap loop for @ref{__sync Builtins}
> > and @ref{__atomic Builtins} lacking a native instruction, optimize
> > for the highly contended case by issuing an atomic load before the
> > @code{CMPXCHG} instruction, and invoke the @code{PAUSE} instruction
> > when restarting the loop.
> 
> That's much better, thanks. My only remaining quibble would be that
> "invoking" an instruction seems only marginally better than running
> one. Emitting? Issuing? Using? Adding?

Right, it should be 'using'; let me also add 'to save CPU power':

When emitting a compare-and-swap loop for @ref{__sync Builtins}
and @ref{__atomic Builtins} lacking a native instruction, optimize
for the highly contended case by issuing an atomic load before the
@code{CMPXCHG} instruction, and using the @code{PAUSE} instruction
to save CPU power when restarting the loop.
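
To illustrate the loop shape that this wording describes, here is a rough
C++ sketch of a fetch-or built from a compare-and-swap loop with an early
load and a pause. It is not the code GCC emits: fetch_or_relaxed_loop and
cpu_relax are hypothetical names, and the pause is only issued when
compiling for x86 with a GCC-compatible compiler (otherwise it is a no-op).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: issue PAUSE on x86, do nothing elsewhere.
inline void cpu_relax() {
#if (defined(__GNUC__) || defined(__clang__)) \
    && (defined(__x86_64__) || defined(__i386__))
  __builtin_ia32_pause();
#endif
}

// Atomically OR `bits` into `obj` and return the previous value.
std::uint32_t fetch_or_relaxed_loop(std::atomic<std::uint32_t>& obj,
                                    std::uint32_t bits) {
  std::uint32_t expected = obj.load(std::memory_order_relaxed);
  for (;;) {
    const std::uint32_t desired = expected | bits;
    // Early load: a plain read does not take the cache line exclusively.
    const std::uint32_t current = obj.load(std::memory_order_relaxed);
    if (current != expected) {
      // Another thread changed the value; back off before restarting.
      cpu_relax();
      expected = current;
      continue;
    }
    // Only now attempt the expensive compare-and-swap.
    if (obj.compare_exchange_weak(expected, desired,
                                  std::memory_order_seq_cst,
                                  std::memory_order_relaxed))
      return expected;
    // On failure `expected` already holds the fresh value; loop again.
  }
}
```

The design choice is to skip the compare-and-swap whenever the early load
already shows it would fail, which is where the reduced contention on the
cache line is expected to come from.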

Alexander


Re: [PATCH] LoongArch: Fix atomic_exchange make comparison and may jump out

On Tue, 2022-11-15 at 21:03 +0800, Jinyang He wrote:
> gcc/ChangeLog:
> 
> * config/loongarch/sync.md:
> Add atomic_cas_value_exchange_and_7 and fix atomic_exchange.

nit:

* config/loongarch/sync.md (atomic_cas_value_exchange_and_7): 
New define_insn.
(atomic_exchange): Use atomic_cas_value_exchange_and_7 instead 
of atomic_cas_value_cmp_and.

> gcc/testsuite/ChangeLog:
>
> * gcc.target/loongarch/sync-1.c: New test.

Likewise, ChangeLog content should be indented with a tab. (Not 8
spaces: if my mail client changes my tab to 8 spaces I'm sorry).

/* snip */

> +  return "%G6\\n\\t"
> +    "1:\\n\\t"
> +    "ll.\\t%0,%1\\n\\t"
> +    "and\\t%7,%0,%z3\\n\\t"
> +    "or%i5\\t%7,%7,%5\\n\\t"
> +    "sc.\\t%7,%1\\n\\t"
> +    "beqz\\t%7,1b\\n\\t";

Do we need a "dbar 0x700" after beqz?
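
For readers less familiar with sub-word atomics, the loop quoted above can
be approximated in portable C++ as follows. This is a sketch under stated
assumptions: a 32-bit compare_exchange_weak stands in for the ll/sc pair,
exchange_byte is a hypothetical name, and the mapping of the "and"/"or"
steps onto operands %3 and %5 is my reading of the pattern rather than
something the patch states.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Atomically replace one byte inside a 32-bit word and return the old byte.
// `shift` is the bit offset of the byte within the word (0, 8, 16 or 24).
std::uint8_t exchange_byte(std::atomic<std::uint32_t>& word, unsigned shift,
                           std::uint8_t new_value) {
  const std::uint32_t mask = std::uint32_t(0xFF) << shift;
  const std::uint32_t shifted_new = std::uint32_t(new_value) << shift;

  // Plays the role of "ll": read the containing word.
  std::uint32_t old_word = word.load(std::memory_order_relaxed);
  std::uint32_t new_word;
  do {
    // "and" clears the target byte, "or" merges in the new value.
    new_word = (old_word & ~mask) | shifted_new;
    // Plays the role of "sc" + "beqz": retry only if the store failed.
    // There is no comparison against an expected byte, so the exchange
    // never exits early without storing.
  } while (!word.compare_exchange_weak(old_word, new_word,
                                       std::memory_order_acq_rel,
                                       std::memory_order_relaxed));
  return static_cast<std::uint8_t>((old_word & mask) >> shift);
}
```

Note that exchanging a byte with the value it already holds still performs
the store and returns the old byte, which is the behaviour an unconditional
exchange is expected to have.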

/* snip */

> diff --git a/gcc/testsuite/gcc.target/loongarch/sync-1.c 
> b/gcc/testsuite/gcc.target/loongarch/sync-1.c
> new file mode 100644
> index 0..cebed6a9b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/loongarch/sync-1.c
> @@ -0,0 +1,104 @@
> +/* Test __sync_test_and_set in atomic_exchange */
> +/* { dg-do run } */
> +/* { dg-options "-lpthread -std=c11" } */

This test does not seem deterministic.  And the use of sched_yield is very
tricky, as the man page says:

   sched_yield() is intended for use with real-time scheduling policies
   (i.e., SCHED_FIFO or SCHED_RR).  Use of sched_yield() with
   nondeterministic scheduling policies such as SCHED_OTHER is unspecified
   and very likely means your application design is broken.

I'd suggest creating a bug report at https://gcc.gnu.org/bugzilla and
post this test in the PR.  Then add the PR number into the changelog,
and just add a { dg-do compile } and { dg-final { scan-assembler ... } }
test into the testsuite to ensure the correct ll/sc loop is generated.

A bug report also emphasises that this is a bug fix, which is suitable
for GCC 13 (in stage 3 now) and GCC 12 (the fix will be backported).

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[committed] libstdc++: Fix detection of std::format support for __float128 [PR107693]

Tested x86_64-linux and x86_64-w64-mingw32. Pushed to trunk.

-- >8 --

std::format gives linker errors on targets that define __float128 but
do not support using it with std::to_chars. This improves the handling
of 128-bit floating-point types so they are disabled if unsupportable.
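
The underlying idea, treating a type as formattable only if a conversion
routine actually accepts it, can be sketched without C++20 concepts. The
example below is a simplified illustration, not the libstdc++ code: the
demo namespace, its two to_chars stand-ins and the is_formattable_float
trait are all hypothetical, and the real patch expresses the check as a
concept over std::to_chars.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

namespace demo {

// Hypothetical stand-ins for conversion routines: only float and double
// are "supported" in this toy library.
inline char* to_chars(char* first, char*, float) { return first; }
inline char* to_chars(char* first, char*, double) { return first; }

// Small void_t helper so the sketch does not depend on C++17.
template <typename...>
struct make_void { typedef void type; };

// Primary template: not formattable unless the call below is well-formed.
template <typename T, typename = void>
struct is_formattable_float : std::false_type {};

// Chosen only when demo::to_chars (char*, char*, T) resolves to exactly
// one overload.
template <typename T>
struct is_formattable_float<
    T, typename make_void<decltype(demo::to_chars(
           std::declval<char*>(), std::declval<char*>(),
           std::declval<T>()))>::type> : std::true_type {};

// A type with no conversion routine at all.
struct Opaque {};

}  // namespace demo
```

In this toy setup long double is rejected because the call is ambiguous
between the two overloads, which loosely mirrors the situation in the
commit message: a type that exists on the target but that the conversion
routines cannot handle is simply reported as not formattable instead of
failing at link time.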

libstdc++-v3/ChangeLog:

PR libstdc++/107693
* include/std/format (_GLIBCXX_FORMAT_F128): Define to 2 when
basic_format_arg needs to use its _M_f128 member.
(__extended_floating_point, __floating_point): Replace with ...
(__formattable_floating_point): New concept.
* testsuite/std/format/functions/format.cc: Check whether
__float128 is supported. Also test _Float128.
---
 libstdc++-v3/include/std/format   | 77 ++-
 .../testsuite/std/format/functions/format.cc  | 20 -
 2 files changed, 59 insertions(+), 38 deletions(-)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 1796362ceef..c79c8f2ce31 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -1213,40 +1213,35 @@ namespace __format
   _Spec<_CharT> _M_spec{};
 };
 
+  // Decide how 128-bit floating-point types should be formatted (or not).
+  // When supported, the typedef __format::__float128_t is the type that
+  // format arguments should be converted to for storage in basic_format_arg.
+  // Define the macro _GLIBCXX_FORMAT_F128 to say they're supported.
+  // _GLIBCXX_FORMAT_F128=1 means __float128, _Float128 etc. will be formatted
+  // by converting them to long double (or __ieee128 for powerpc64le).
+  // _GLIBCXX_FORMAT_F128=2 means basic_format_arg needs to enable explicit
+  // support for _Float128, rather than formatting it as another type.
+#undef _GLIBCXX_FORMAT_F128
+
 #ifdef _GLIBCXX_LONG_DOUBLE_ALT128_COMPAT
-# define _GLIBCXX_FORMAT_F128 1
+
+  // Format 128-bit floating-point types using __ieee128.
   using __float128_t = __ieee128;
-#elif defined _GLIBCXX_LDOUBLE_IS_IEEE_BINARY128
 # define _GLIBCXX_FORMAT_F128 1
+
+#elif defined _GLIBCXX_LDOUBLE_IS_IEEE_BINARY128
+
+  // Format 128-bit floating-point types using long double.
   using __float128_t = long double;
-#elif __FLT128_DIG__
-# define _GLIBCXX_FORMAT_F128 2
+# define _GLIBCXX_FORMAT_F128 1
+
+#elif __FLT128_DIG__ && defined(__GLIBC_PREREQ) // see floating_to_chars.cc
+
+  // Format 128-bit floating-point types using _Float128.
   using __float128_t = _Float128;
-#else
-# undef _GLIBCXX_FORMAT_F128
-#endif
+# define _GLIBCXX_FORMAT_F128 2
 
-#ifdef _GLIBCXX_LONG_DOUBLE_ALT128_COMPAT
-  template
-concept __extended_floating_point = __is_same(_Tp, _Float128)
- || __is_same(_Tp, __ibm128)
- || __is_same(_Tp, __ieee128);
-#elif _GLIBCXX_FORMAT_F128
-  template
-concept __extended_floating_point = __is_same(_Tp, __float128_t);
-#else
-  template
-concept __extended_floating_point = false;
-#endif
-
-  template
-concept __floating_point = std::floating_point<_Tp>
-|| __extended_floating_point<_Tp>;
-
-  using std::to_chars;
-
-#if _GLIBCXX_FORMAT_F128 == 2 \
-  && (__cplusplus == 202002L || !defined(_GLIBCXX_HAVE_FLOAT128_MATH))
+# if __cplusplus == 202002L || !defined(_GLIBCXX_HAVE_FLOAT128_MATH)
   // These overloads exist in the library, but are not declared for C++20.
   // Make them available as std::__format::to_chars.
   to_chars_result
@@ -1260,8 +1255,16 @@ namespace __format
   to_chars_result
   to_chars(char*, char*, _Float128, chars_format, int) noexcept
 __asm("_ZSt8to_charsPcS_DF128_St12chars_formati");
+# endif
 #endif
 
+  using std::to_chars;
+
+  // We can format a floating-point type iff it is usable with to_chars.
+  template
+concept __formattable_float = requires (_Tp __t, char* __p)
+{ __format::to_chars(__p, __p, __t, chars_format::scientific, 6); };
+
   template<__char _CharT>
 struct __formatter_fp
 {
@@ -1984,7 +1987,7 @@ namespace __format
 #endif
 
   /// Format a floating-point value.
-  template<__format::__floating_point _Tp, __format::__char _CharT>
+  template<__format::__formattable_float _Tp, __format::__char _CharT>
 struct formatter<_Tp, _CharT>
 {
   formatter() = default;
@@ -2607,7 +2610,7 @@ namespace __format
 #ifdef _GLIBCXX_LONG_DOUBLE_ALT128_COMPAT
__ieee128 _M_f128;
__ibm128  _M_ibm128;
-#elif _GLIBCXX_FORMAT_F128
+#elif _GLIBCXX_FORMAT_F128 == 2
__float128_t _M_f128;
 #endif
   };
@@ -2663,7 +2666,7 @@ namespace __format
  else if constexpr (is_same_v<_Tp, unsigned __int128>)
return __u._M_u128;
 #endif
-#if _GLIBCXX_FORMAT_F128
+#if _GLIBCXX_FORMAT_F128 == 2
  else if constexpr (is_same_v<_Tp, __float128_t>)
return __u._M_f128;
 #endif
@@ -2843,13 +2846,15 @@ namespace __format
return type_identity<_Float64>();
 # endif
 #endif
-#ifdef __FLT128_DIG__
+#if 

[committed] libstdc++: std::formattable concept should not be defined for C++20

Tested x86_64-linux and x86_64-w64-mingw32. Pushed to trunk.

-- >8 --

This concept was added by a C++23 proposal, so don't define it for
C++20.

Split the format/formatter/formatter.cc test into two parts, one that
tests the C++20 requirements and one that tests the C++23 concept.

libstdc++-v3/ChangeLog:

* include/std/format (formattable): Only define for C++23.
* testsuite/std/format/formatter.cc: Moved to...
* testsuite/std/format/formatter/requirements.cc: ...here.
* testsuite/std/format/formatter/concept.cc: New test.
* testsuite/std/format/functions/format.cc: Replace use of
std::formattable in C++20.
---
 libstdc++-v3/include/std/format   | 11 +++-
 .../testsuite/std/format/formatter/concept.cc | 46 
 .../requirements.cc}  | 54 +--
 .../testsuite/std/format/functions/format.cc  | 12 -
 4 files changed, 77 insertions(+), 46 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/format/formatter/concept.cc
 rename libstdc++-v3/testsuite/std/format/{formatter.cc => 
formatter/requirements.cc} (50%)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index c79c8f2ce31..204a1710aca 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -2181,11 +2181,14 @@ namespace __format
 } // namespace __format
 /// @endcond
 
+#if __cplusplus > 202002L
   // [format.formattable], concept formattable
   template
 concept formattable
   = __format::__formattable_impl, _CharT>;
+#endif
 
+#if __cpp_lib_format_ranges
   /// @cond undocumented
 namespace __format
 {
@@ -2199,6 +2202,7 @@ namespace __format
   = conditional_t<__const_formattable_range<_Rg, _CharT>, const _Rg, _Rg>;
 } // namespace __format
   /// @endcond
+#endif // format_ranges
 
   /// An iterator after the last character written, and the number of
   /// characters that would have been written.
@@ -3485,16 +3489,19 @@ namespace __format
 
std::visit_format_arg([this](auto& __arg) {
  using _Type = remove_reference_t;
+ using _Formatter = typename _Context::template formatter_type<_Type>;
  if constexpr (is_same_v<_Type, monostate>)
__format::__invalid_arg_id_in_format_string();
  else if constexpr (is_same_v<_Type, handle>)
__arg.format(this->_M_pc, this->_M_fc);
- else
+ else if constexpr (is_default_constructible_v<_Formatter>)
{
- typename _Context::template formatter_type<_Type> __f;
+ _Formatter __f;
  this->_M_pc.advance_to(__f.parse(this->_M_pc));
  this->_M_fc.advance_to(__f.format(__arg, this->_M_fc));
}
+ else
+   static_assert(__format::__formattable_with<_Type, _Context>);
}, _M_fc.arg(__id));
   }
 };
diff --git a/libstdc++-v3/testsuite/std/format/formatter/concept.cc 
b/libstdc++-v3/testsuite/std/format/formatter/concept.cc
new file mode 100644
index 000..fe56dc44a68
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/format/formatter/concept.cc
@@ -0,0 +1,46 @@
+// { dg-options "-std=gnu++23" }
+// { dg-do compile { target c++23 } }
+
+#include 
+
+struct S { };
+
+template<> struct std::formatter : std::formatter {
+  template
+  auto format(S, std::basic_format_context& ctx) const {
+return formatter::format("ess", ctx);
+  }
+};
+
+struct T { };
+
+template<> struct std::formatter : std::formatter {
+  // This function only accepts std::format_context, not other contexts.
+  auto format(T, std::format_context& ctx) const {
+return formatter::format("tee", ctx);
+  }
+};
+
+struct U { };
+
+void
+test_concept() // [format.formattable]
+{
+  static_assert( std::formattable );
+  static_assert( std::formattable );
+  static_assert( std::formattable );
+  static_assert( std::formattable );
+  static_assert( std::formattable );
+  static_assert( std::formattable );
+  static_assert( std::formattable );
+  static_assert( std::formattable );
+  static_assert( std::formattable );
+  static_assert( ! std::formattable );
+  static_assert( ! std::formattable );
+  static_assert( ! std::formattable );
+  static_assert( std::formattable );
+  static_assert( std::formattable );
+  static_assert( ! std::formattable ); // only formats as char
+  static_assert( ! std::formattable ); // formatter not generic
+  static_assert( ! std::formattable ); // no formatter
+}
diff --git a/libstdc++-v3/testsuite/std/format/formatter.cc 
b/libstdc++-v3/testsuite/std/format/formatter/requirements.cc
similarity index 50%
rename from libstdc++-v3/testsuite/std/format/formatter.cc
rename to libstdc++-v3/testsuite/std/format/formatter/requirements.cc
index 64ff2dbfbfd..3bff8bdbd5d 100644
--- a/libstdc++-v3/testsuite/std/format/formatter.cc
+++ b/libstdc++-v3/testsuite/std/format/formatter/requirements.cc
@@ -4,48 +4,6 @@
 #include 
 #include 
 
-struct S { };
-
-template<> struct 

[committed] libstdc++: Fix std::format test for strict -std=c++20 mode

Tested x86_64-linux and x86_64-w64-mingw32. Pushed to trunk.

-- >8 --

Adjust a test to avoid using std::make_unsigned_t<__int128>. That's
ill-formed in strict modes because std::is_integral_v<__int128> is
false.
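
The technique can be sketched without std::format at all. This is a minimal illustration, assuming a hypothetical helper named `unsigned_width`: the unsigned type becomes a defaulted template parameter, and a caller that passes a second argument has it deduced, so std::make_unsigned_t is never instantiated for that call.

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

// U defaults to std::make_unsigned_t<T>, but the default is only consulted
// when the caller omits the second argument. Passing one lets U be deduced
// from it instead.
template<typename T, typename U = std::make_unsigned_t<T>>
constexpr int unsigned_width(T, U = 0)
{
  static_assert(sizeof(T) == sizeof(U), "U should be T's unsigned counterpart");
  return static_cast<int>(sizeof(U) * CHAR_BIT);
}

// Ordinary types can rely on the default.
static_assert(unsigned_width(std::int32_t(0)) == 32, "default U");

#ifdef __SIZEOF_INT128__
// For __int128 the unsigned type is passed explicitly, as in the test.
static_assert(unsigned_width(__int128(0), (unsigned __int128)(0)) == 128,
              "explicit U");
#endif
```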

libstdc++-v3/ChangeLog:

* testsuite/std/format/functions/format.cc: Do not use
std::make_unsigned_t<__int128>.
---
 libstdc++-v3/testsuite/std/format/functions/format.cc | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/testsuite/std/format/functions/format.cc 
b/libstdc++-v3/testsuite/std/format/functions/format.cc
index c01405eac90..8019fbdf712 100644
--- a/libstdc++-v3/testsuite/std/format/functions/format.cc
+++ b/libstdc++-v3/testsuite/std/format/functions/format.cc
@@ -233,7 +233,7 @@ test_wchar()
 void
 test_minmax()
 {
-  auto check = [](T) {
+  auto check = []>(T, U = 0) {
 const int digits = std::numeric_limits::digits;
 const std::string zeros(digits, '0');
 const std::string ones(digits, '1');
@@ -241,7 +241,6 @@ test_minmax()
 VERIFY( s == "-1" + zeros );
 s = std::format("{:b}" , std::numeric_limits::max());
 VERIFY( s == ones );
-using U = std::make_unsigned_t;
 s = std::format("{:0{}b}" , std::numeric_limits::min(), digits + 1);
 VERIFY( s == '0' + zeros );
 s = std::format("{:b}" , std::numeric_limits::max());
@@ -252,7 +251,9 @@ test_minmax()
   check(std::int32_t(0));
   check(std::int64_t(0));
 #ifdef __SIZEOF_INT128__
-  check(__int128(0));
+  // std::make_unsigned_t<__int128> is invalid for strict -std=c++20 mode,
+  // so pass a second argument of the unsigned type.
+  check(__int128(0), (unsigned __int128)(0));
 #endif
 }
 
-- 
2.38.1



Re: [committed] libstdc++: Fix detection of std::format support for __float128 [PR107693]

On Tue, Nov 15, 2022 at 02:31:19PM +, Jonathan Wakely via Gcc-patches wrote:
> Tested x86_64-linux and x86_64-w64-mingw32. Pushed to trunk.
> 
> -- >8 --
> 
> std::format gives linker errors on targets that define __float128 but
> do not support using it with std::to_chars. This improves the handling
> of 128-bit floating-point types so they are disabled if unsupportable.
> 
> libstdc++-v3/ChangeLog:
> 
>   PR libstdc++/107693
>   * include/std/format (_GLIBCXX_FORMAT_F128): Define to 2 when
>   basic_format_arg needs to use its _M_f128 member.
>   (__extended_floating_point, __floating_point): Replace with ...
>   (__formattable_floating_point): New concept.
>   * testsuite/std/format/functions/format.cc: Check whether
>   __float128 is supported. Also test _Float128.

> --- a/libstdc++-v3/include/std/format
> +++ b/libstdc++-v3/include/std/format

> +#elif __FLT128_DIG__ && defined(__GLIBC_PREREQ) // see floating_to_chars.cc

I'd just use here
#elif __FLT128_DIG__ && defined(_GLIBCXX_HAVE_FLOAT128_MATH)
instead.

The reason for defined(__GLIBC_PREREQ) in floating_{to,from}_chars.cc
is that I didn't want to make the ABI of linux libstdc++.so.6 dependent
on whether gcc was built against glibc 2.26+ or older glibc.
So, the symbols exist in libstdc++.so.6 even for older glibcs, but it will
actually only work properly (without losing precision; otherwise it will
just go through long double) if at runtime one uses glibc 2.26+.

But in the headers, defined(_GLIBCXX_HAVE_FLOAT128_MATH) is used everywhere
else (which is true only when compiling against glibc 2.26+).

Jakub



Re: [PATCH]middle-end: replace GET_MODE_WIDER_MODE with GET_MODE_NEXT_MODE

Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Tuesday, November 15, 2022 11:59 AM
>> To: Tamar Christina via Gcc-patches 
>> Cc: Tamar Christina ; nd ;
>> rguent...@suse.de; j...@ventanamicro.com
>> Subject: Re: [PATCH]middle-end: replace GET_MODE_WIDER_MODE with
>> GET_MODE_NEXT_MODE
>> 
>> Tamar Christina via Gcc-patches  writes:
>> > Hi All,
>> >
>> > After the fix to the addsub patch yesterday for bootstrap I had only
>> regtested on x86.
>> > While looking today it seemed the new tests were failing, this was
>> > caused by a change in the behavior of the GET_MODE_WIDER_MODE
>> macro on trunk.
>> >
>> > This patch fixes that issue. Sorry for the mess, have rebased all branches
>> now.
>> >
>> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >
>> > Ok for master?
>> >
>> > Thanks,
>> > Tamar
>> >
>> > gcc/ChangeLog:
>> >
>> >* match.pd: Replace GET_MODE_WIDER_MODE with
>> >GET_MODE_NEXT_MODE.
>> >
>> > --- inline copy of patch --
>> > diff --git a/gcc/match.pd b/gcc/match.pd index
>> >
>> 1b0ab7cf60fa4772fbe8304c622b0b8fab1bdefa..28191a992039c6f3a1dab5f7c0
>> e3
>> > 5dd58dc47092 100644
>> > --- a/gcc/match.pd
>> > +++ b/gcc/match.pd
>> > @@ -7997,7 +7997,7 @@ and,
>> > machine_mode wide_mode;
>> >   }
>> >   (if (sel.series_p (0, 2, 0, 2)
>> > -  && GET_MODE_WIDER_MODE (vec_mode).exists (&wide_mode)
>> > +  && GET_MODE_NEXT_MODE (vec_mode).exists (&wide_mode)
>> >  && VECTOR_MODE_P (wide_mode)
>> >  && (GET_MODE_UNIT_BITSIZE (vec_mode) * 2
>> >  == GET_MODE_UNIT_BITSIZE (wide_mode)))
>> 
>> Does anything guarantee that the next mode will be the right one?
>> I think it would be safer to replace the last three && conditions with:
>> 
>>&& GET_MODE_2XWIDER_MODE (GET_MODE_INNER (vec_mode)).exists
>> (&wide_elt_mode)
>>&& multiple_p (GET_MODE_NUNITS (vec_mode), 2, &wide_nunits)
>>&& related_vector_mode (vec_mode, wide_elt_mode,
>> wide_nunits).exists (&wide_mode)
>
> I see, respun patch accordingly.

LGTM, but I'm nervous when it comes to match.pd stuff so I'd prefer
Richi or Jeff to have the final say.
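
For readers less familiar with machine modes, the suggested check can be modelled outside GCC. This is only a toy sketch: `VecMode`, `widen_pairwise` and the 64-bit element cap are invented for illustration and are not GCC APIs. It also leaves out the last step, where related_vector_mode asks whether the target actually has such a vector mode.

```cpp
#include <optional>

// Toy stand-in for a vector mode: element width in bits and lane count.
struct VecMode {
  unsigned elt_bits;
  unsigned nunits;
};

// Derive the mode with elements twice as wide and half as many lanes,
// rather than assuming that the "next" mode in some list has that shape.
std::optional<VecMode> widen_pairwise(VecMode m, unsigned max_elt_bits = 64)
{
  if (m.elt_bits * 2 > max_elt_bits)
    return std::nullopt;            // no element mode that is 2x wider
  if (m.nunits % 2 != 0)
    return std::nullopt;            // lanes cannot be paired up
  return VecMode{m.elt_bits * 2, m.nunits / 2};
}
```

In this model a 4 x 32-bit vector widens to 2 x 64-bit, while a 2 x 64-bit vector or a 3-lane vector is rejected.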

Thanks,
Richard

>
> Ok for master?
>
> --- inline copy of patch ---
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 
> 1b0ab7cf60fa4772fbe8304c622b0b8fab1bdefa..82f05bbc912e4f80f3984d930c4a8dcb010136e1
>  100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -7995,12 +7995,15 @@ and,
> vec_perm_indices sel (builder, 2, nelts);
> machine_mode vec_mode = TYPE_MODE (type);
> machine_mode wide_mode;
> +   scalar_mode wide_elt_mode;
> +   poly_uint64 wide_nunits;
> +   scalar_mode inner_mode = GET_MODE_INNER (vec_mode);
>   }
>   (if (sel.series_p (0, 2, 0, 2)
> -  && GET_MODE_WIDER_MODE (vec_mode).exists (&wide_mode)
> -   && VECTOR_MODE_P (wide_mode)
> -   && (GET_MODE_UNIT_BITSIZE (vec_mode) * 2
> -   == GET_MODE_UNIT_BITSIZE (wide_mode)))
> +   && GET_MODE_2XWIDER_MODE (inner_mode).exists (&wide_elt_mode)
> +   && multiple_p (GET_MODE_NUNITS (vec_mode), 2, &wide_nunits)
> +   && related_vector_mode (vec_mode, wide_elt_mode,
> +   wide_nunits).exists (&wide_mode))
>   (with
>{
>  tree stype


[PING] Re: [PATCH v7 00/34] libgcc: Thumb-1 Floating-Point Assembly for Cortex M0

Hello, 

Is there still any interest in merging this patch? 

Thanks,
Daniel


On Mon, Oct 31, 2022, at 8:44 AM, Daniel Engel wrote:
> Hi Richard,
>
> I am re-submitting my libgcc patch from 2021:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563585.html
> https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587383.html
>
> I believe I have finally made the stage1 window. 
>
> Regards,
> Daniel
>
> ---
>
> Changes since v6:
>
> * Rebased and tested with gcc-13
>
> There are no regressions for -march={armv4t,armv6s-m,armv7-m,armv7-a}.
> Clean master:
>
> # of expected passes            529397
> # of unexpected failures        41160
> # of unexpected successes       12
> # of expected failures          3442
> # of unresolved testcases       978
> # of unsupported tests          28993
>
> Patched master:
>
> # of expected passes            529397
> # of unexpected failures        41160
> # of unexpected successes       12
> # of expected failures          3442
> # of unresolved testcases       978
> # of unsupported tests          28993
>
> ---
>
> This patch series adds an assembly-language implementation of IEEE-754
> compliant single-precision functions designed for the Cortex M0 (v6m)
> architecture.  There are improvements to most of the EABI integer functions
> as well.  This is the libgcc component of a larger library project
> originally proposed in 2018:
>
> https://gcc.gnu.org/legacy-ml/gcc/2018-11/msg00043.html
>
> As one point of comparison, a test program [1] links 916 bytes from libgcc
> with the patched toolchain vs 10276 bytes with the
> gcc-arm-none-eabi-9-2020-q2 toolchain.  That's a 90% size reduction.
>
> I have extensive test vectors [2], and this patch passes all tests on an
> STM32F051.  These vectors were derived from UCB [3], Testfloat [4], and
> IEEECC754 [5], plus many of my own generation.
>
> There may be some follow-on projects worth discussing:
>
> * The library is currently integrated into the ARM v6s-m multilib only.
> It is likely that some other architectures would benefit from these
> routines.  However, I have NOT profiled the existing implementations
> (ieee754-sf.S) to estimate where improvements may be found.
>
> * GCC currently lacks tests for some functions, such as
> __aeabi_[u]ldivmod().  There may be useful bits in [1] that can be
> integrated.
>
> On Cortex M0, the library has (approximately) the following properties:
>
> Function(s)                     Size (bytes)      Cycles            Stack  Accuracy
> __clzsi2                        50                20                0      exact
> __clzsi2 (OPTIMIZE_SIZE)        22                51                0      exact
> __clzdi2                        8+__clzsi2        4+__clzsi2        0      exact
>
> __clrsbsi2                      8+__clzsi2        6+__clzsi2        0      exact
> __clrsbdi2                      18+__clzsi2       (8..10)+__clzsi2  0      exact
>
> __ctzsi2                        52                21                0      exact
> __ctzsi2 (OPTIMIZE_SIZE)        24                52                0      exact
> __ctzdi2                        8+__ctzsi2        5+__ctzsi2        0      exact
>
> __ffssi2                        8                 6..(5+__ctzsi2)   0      exact
> __ffsdi2                        14+__ctzsi2       9..(8+__ctzsi2)   0      exact
>
> __popcountsi2                   52                25                0      exact
> __popcountsi2 (OPTIMIZE_SIZE)   14                9..201            0      exact
> __popcountdi2                   34+__popcountsi2  46                0      exact
> __popcountdi2 (OPTIMIZE_SIZE)   12+__popcountsi2  17..401           0      exact
>
> __paritysi2                     24                14                0      exact
> __paritysi2 (OPTIMIZE_SIZE)     16                38                0      exact
> __paritydi2                     2+__paritysi2     1+__paritysi2     0      exact
>
> __umulsidi3                     44                24                0      exact
> __mulsidi3                      30+__umulsidi3    24+__umulsidi3    8      exact
> __muldi3 (__aeabi_lmul)         10+__umulsidi3    6+__umulsidi3     0      exact
> __ashldi3 (__aeabi_llsl)        22                13                0      exact
> __lshrdi3 (__aeabi_llsr)        22                13                0      exact
> __ashrdi3 (__aeabi_lasr)        22                13                0      exact
>
> __aeabi_lcmp                    20                13                0      exact
> __aeabi_ulcmp                   16                10                0      exact
>
> __udivsi3 (__aeabi_uidiv)       56                72..385           0      < 1 lsb

Re: [PATCH 2/2] Add a new warning option -Wstrict-flex-arrays.

Ping on this patch.

thanks.

Qing

> On Nov 8, 2022, at 9:51 AM, Qing Zhao  wrote:
> 
> '-Wstrict-flex-arrays'
> Warn about improper usage of flexible array members according to
> the LEVEL of the 'strict_flex_array (LEVEL)' attribute attached to
> the trailing array field of a structure if it's available,
> otherwise according to the LEVEL of the option
> '-fstrict-flex-arrays=LEVEL'.
> 
> This option is effective only when LEVEL is bigger than 0.
> Otherwise, it will be ignored with a warning.
> 
> when LEVEL=1, warnings will be issued for a trailing array
> reference of a structure that has 2 or more elements if the
> trailing array is referenced as a flexible array member.
> 
> when LEVEL=2, in addition to LEVEL=1, additional warnings will be
> issued for a trailing one-element array reference of a structure if
> the array is referenced as a flexible array member.
> 
> when LEVEL=3, in addition to LEVEL=2, additional warnings will be
> issued for a trailing zero-length array reference of a structure if
> the array is referenced as a flexible array member.
> 
> At the same time, keep -Warray-bounds=[1|2] warnings unchanged from
> -fstrict-flex-arrays.
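
To make the levels concrete, here is a small sketch of the kind of code the option is aimed at, going only by the description above. The struct names and the `bytes_for` helper are invented for illustration. The zero-length case covered by LEVEL=3 is omitted because `int tail[0]` is a GNU extension.

```cpp
#include <cstddef>

// Trailing arrays with a fixed declared size that callers nevertheless
// treat as flexible array members by over-allocating the enclosing struct.
struct TailOfFour { int n; int tail[4]; };  // per the text: diagnosed from LEVEL=1
struct TailOfOne  { int n; int tail[1]; };  // per the text: diagnosed from LEVEL=2

// Bytes to allocate so that 'tail' could hold 'count' ints. Indexing
// p->tail[i] beyond the declared bound afterwards is the "referenced as a
// flexible array member" pattern the option description talks about.
template<typename S>
constexpr std::size_t bytes_for(std::size_t count)
{
  return offsetof(S, tail) + count * sizeof(int);
}
```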
> 
> gcc/ChangeLog:
> 
>   * attribs.cc (strict_flex_array_level_of): New function.
>   * attribs.h (strict_flex_array_level_of): Prototype for new function.
>   * doc/invoke.texi: Document -Wstrict-flex-arrays option. Update
>   -fstrict-flex-arrays[=n] options.
>   * gimple-array-bounds.cc (array_bounds_checker::check_array_ref):
>   Issue warnings for -Wstrict-flex-arrays.
>   (get_up_bounds_for_array_ref): New function.
>   (check_out_of_bounds_and_warn): New function.
>   * opts.cc (finish_options): Issue warnings for unsupported combination
>   of -Warray-bounds and -fstrict-flex-arrays, -Wstrict-flex-arrays and
>   -fstrict-flex-arrays.
>   * tree-vrp.cc (execute_vrp): Enable the pass when
>   warn_strict_flex_array is true.
>   (execute_ranger_vrp): Likewise.
>   * tree.cc (array_ref_flexible_size_p): Add one new argument.
>   (component_ref_sam_type): New function.
>   (component_ref_size): Add one new argument,
>   * tree.h (array_ref_flexible_size_p): Update prototype.
>   (enum struct special_array_member): Add two new enum values.
>   (component_ref_sam_type): New prototype.
>   (component_ref_size): Update prototype.
> 
> gcc/c-family/ChangeLog:
> 
>   * c.opt (Wstrict-flex-arrays): New option.
> 
> gcc/c/ChangeLog:
> 
>   * c-decl.cc (is_flexible_array_member_p): Call new function
>   strict_flex_array_level_of.
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/Wstrict-flex-arrays.c: New test.
>   * c-c++-common/Wstrict-flex-arrays_2.c: New test.
>   * gcc.dg/Wstrict-flex-arrays-2.c: New test.
>   * gcc.dg/Wstrict-flex-arrays-3.c: New test.
>   * gcc.dg/Wstrict-flex-arrays-4.c: New test.
>   * gcc.dg/Wstrict-flex-arrays-5.c: New test.
>   * gcc.dg/Wstrict-flex-arrays-6.c: New test.
>   * gcc.dg/Wstrict-flex-arrays-7.c: New test.
>   * gcc.dg/Wstrict-flex-arrays-8.c: New test.
>   * gcc.dg/Wstrict-flex-arrays-9.c: New test.
>   * gcc.dg/Wstrict-flex-arrays.c: New test.
> ---
> gcc/attribs.cc|  30 ++
> gcc/attribs.h |   2 +
> gcc/c-family/c.opt|   5 +
> gcc/c/c-decl.cc   |  22 +-
> gcc/doc/invoke.texi   |  33 ++-
> gcc/gimple-array-bounds.cc| 264 +-
> gcc/opts.cc   |  15 +
> .../c-c++-common/Wstrict-flex-arrays.c|   9 +
> .../c-c++-common/Wstrict-flex-arrays_2.c  |   9 +
> gcc/testsuite/gcc.dg/Wstrict-flex-arrays-2.c  |  46 +++
> gcc/testsuite/gcc.dg/Wstrict-flex-arrays-3.c  |  46 +++
> gcc/testsuite/gcc.dg/Wstrict-flex-arrays-4.c  |  49 
> gcc/testsuite/gcc.dg/Wstrict-flex-arrays-5.c  |  48 
> gcc/testsuite/gcc.dg/Wstrict-flex-arrays-6.c  |  48 
> gcc/testsuite/gcc.dg/Wstrict-flex-arrays-7.c  |  50 
> gcc/testsuite/gcc.dg/Wstrict-flex-arrays-8.c  |  49 
> gcc/testsuite/gcc.dg/Wstrict-flex-arrays-9.c  |  49 
> gcc/testsuite/gcc.dg/Wstrict-flex-arrays.c|  46 +++
> gcc/tree-vrp.cc   |   6 +-
> gcc/tree.cc   | 165 ---
> gcc/tree.h|  15 +-
> 21 files changed, 870 insertions(+), 136 deletions(-)
> create mode 100644 gcc/testsuite/c-c++-common/Wstrict-flex-arrays.c
> create mode 100644 gcc/testsuite/c-c++-common/Wstrict-flex-arrays_2.c
> create mode 100644 gcc/testsuite/gcc.dg/Wstrict-flex-arrays-2.c
> create mode 100644 gcc/testsuite/gcc.dg/Wstrict-flex-arrays-3.c
> create mode 100644 gcc/testsuite/gcc.dg/Wstrict-flex-arrays-4.c
> create mode 100644 gcc/testsuite/gcc.dg/Wstrict-flex-arrays-5.c
>

Re: [committed] libstdc++: Fix detection of std::format support for __float128 [PR107693]

On Tue, 15 Nov 2022 at 14:42, Jakub Jelinek wrote:
>
> On Tue, Nov 15, 2022 at 02:31:19PM +, Jonathan Wakely via Gcc-patches 
> wrote:
> > Tested x86_64-linux and x86_64-w64-mingw32. Pushed to trunk.
> >
> > -- >8 --
> >
> > std::format gives linker errors on targets that define __float128 but
> > do not support using it with std::to_chars. This improves the handling
> > of 128-bit floating-point types so they are disabled if unsupportable.
> >
> > libstdc++-v3/ChangeLog:
> >
> >   PR libstdc++/107693
> >   * include/std/format (_GLIBCXX_FORMAT_F128): Define to 2 when
> >   basic_format_arg needs to use its _M_f128 member.
> >   (__extended_floating_point, __floating_point): Replace with ...
> >   (__formattable_floating_point): New concept.
> >   * testsuite/std/format/functions/format.cc: Check whether
> >   __float128 is supported. Also test _Float128.
>
> > --- a/libstdc++-v3/include/std/format
> > +++ b/libstdc++-v3/include/std/format
>
> > +#elif __FLT128_DIG__ && defined(__GLIBC_PREREQ) // see floating_to_chars.cc
>
> I'd just use here
> #elif __FLT128_DIG__ && defined(_GLIBCXX_HAVE_FLOAT128_MATH)
> instead.
>
> The reason for defined(__GLIBC_PREREQ) in floating_{to,from}_chars.cc
> is that I didn't want to make the ABI of linux libstdc++.so.6 dependent
> on whether gcc was built against glibc 2.26+ or older glibc.
> So, the symbols exist in libstdc++.so.6 even for older glibcs, but it will
> actually only work properly (without losing precision; otherwise it will
> just go through long double) if at runtime one uses glibc 2.26+.

Yes, and my intention was that std::format would also support
__float128, with the same imprecise behaviour. But we could just limit
std::format support to when std::to_chars works properly.

> But in the headers, defined(_GLIBCXX_HAVE_FLOAT128_MATH) is used everywhere
> else (which is true only when compiling against glibc 2.26+).



libsanitizer: sync from master

Hi.

I've just pushed a libsanitizer update that was tested on x86_64-linux and
ppc64le-linux systems.
Moreover, I ran a bootstrap on x86_64-linux and checked the ABI difference with
abidiff.

Pushed as r13-4068-g3037f11fb86eda.

Cheers,
Martin


Re: [PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops




On 11/11/2022 17:40, Stam Markianos-Wright via Gcc-patches wrote:

Hi all,

This is the 2/2 patch that contains the functional changes needed
for MVE Tail Predicated Low Overhead Loops.  See my previous email
for a general introduction of MVE LOLs.

This support is added through the already existing loop-doloop
mechanisms that are used for non-MVE dls/le looping.

Changes are:

1) Relax the loop-doloop mechanism in the mid-end to allow for
   decrement numbers other than -1 and for `count` to be an
   rtx containing the number of elements to be processed, rather
   than an expression for calculating the number of iterations.
2) Add an `allow_elementwise_doloop` target hook. This allows the
   target backend to manipulate the iteration count as it needs:
   in our case to change it from a pre-calculation of the number
   of iterations to the number of elements to be processed.
3) The doloop_end target-insn now has an additional parameter:
   the `count` (note: this is before it gets modified to just be
   the number of elements), so that the decrement value is
   extracted from that parameter.
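
The difference between counting iterations and counting elements can be sketched in scalar C++. This is only a model of the idea, not of the generated code: `kLanes` and `add_tail_predicated` are hypothetical names, and the inner loop stands in for a single predicated vector operation.

```cpp
#include <cstddef>

constexpr std::size_t kLanes = 4;  // hypothetical number of vector lanes

// Adds two arrays element-wise and returns how many loop trips were taken.
std::size_t add_tail_predicated(const int* a, const int* b, int* out,
                                std::size_t n)
{
  std::size_t iterations = 0;
  // 'remaining' holds the number of ELEMENTS left, not the number of
  // iterations, and drops by up to kLanes per trip instead of by 1.
  for (std::size_t remaining = n, base = 0; remaining > 0; base += kLanes)
    {
      // Tail predication: only the first 'active' lanes do any work.
      const std::size_t active = remaining < kLanes ? remaining : kLanes;
      for (std::size_t lane = 0; lane < active; ++lane)
        out[base + lane] = a[base + lane] + b[base + lane];
      remaining -= active;
      ++iterations;
    }
  return iterations;
}
```

With ten elements and four lanes the loop takes three trips, the last one with only two active lanes, and no separate scalar tail loop is needed.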

And many things in the backend to implement the above optimisation:

4)  Appropriate changes to the define_expand of doloop_end and new
    patterns for dlstp and letp.
5) `arm_attempt_dlstp_transform`: (called from the define_expand of
    doloop_end) this function checks for the loop's suitability for
    dlstp/letp transformation and then implements it, if possible.
6) `arm_mve_get_loop_unique_vctp`: A function that loops through
    the loop contents and returns the vctp VPR-generating operation
    within the loop, if there is exactly one vctp within the loop.
7) A couple of utility functions: `arm_mve_get_vctp_lanes` to map
   from vctp unspecs to number of lanes, and `arm_get_required_vpr_reg`
   to check an insn to see if it requires the VPR or not.

No regressions on arm-none-eabi with various targets and on
aarch64-none-elf. Thoughts on getting this into trunk?

Thank you,
Stam Markianos-Wright

gcc/ChangeLog:

    * config/aarch64/aarch64.md: Add extra doloop_end arg.
    * config/arm/arm-protos.h (arm_attempt_dlstp_transform): New.
    * config/arm/arm.cc (TARGET_ALLOW_ELEMENTWISE_DOLOOP): New.
    (arm_mve_get_vctp_lanes): New.
    (arm_get_required_vpr_reg): New.
    (arm_mve_get_loop_unique_vctp): New.
    (arm_attempt_dlstp_transform): New.
    (arm_allow_elementwise_doloop): New.
    * config/arm/iterators.md:
    * config/arm/mve.md (*predicated_doloop_end_internal): New.
    (dlstp_insn): New.
    * config/arm/thumb2.md (doloop_end): Update for MVE LOLs.
    * config/arm/unspecs.md: New unspecs.
    * config/ia64/ia64.md: Add extra doloop_end arg.
    * config/pru/pru.md: Add extra doloop_end arg.
    * config/rs6000/rs6000.md: Add extra doloop_end arg.
    * config/s390/s390.md: Add extra doloop_end arg.
    * config/v850/v850.md: Add extra doloop_end arg.
    * doc/tm.texi: Document new hook.
    * doc/tm.texi.in: Likewise.
    * loop-doloop.cc (doloop_condition_get): Relax conditions.
    (doloop_optimize): Add support for elementwise LoLs.
    * target-insns.def (doloop_end): Add extra arg.
    * target.def (allow_elementwise_doloop): New hook.
    * targhooks.cc (default_allow_elementwise_doloop): New.
    * targhooks.h (default_allow_elementwise_doloop): New.

gcc/testsuite/ChangeLog:

    * gcc.target/arm/lob.h: Update framework.
    * gcc.target/arm/lob1.c: Likewise.
    * gcc.target/arm/lob6.c: Likewise.
    * gcc.target/arm/dlstp-int16x8.c: New test.
    * gcc.target/arm/dlstp-int32x4.c: New test.
    * gcc.target/arm/dlstp-int64x2.c: New test.
    * gcc.target/arm/dlstp-int8x16.c: New test.


### Inline copy of patch ###

diff --git a/gcc/config/aarch64/aarch64.md 
b/gcc/config/aarch64/aarch64.md
index 
f2e3d905dbbeb2949f2947f5cfd68208c94c9272..7a6d24a80060b4a704a481ccd1a32d96e7b0f369 
100644

--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7366,7 +7366,8 @@
 ;; knows what to generate.
 (define_expand "doloop_end"
   [(use (match_operand 0 "" ""))  ; loop pseudo
-   (use (match_operand 1 "" ""))] ; label
+   (use (match_operand 1 "" ""))  ; label
+   (use (match_operand 2 "" ""))] ; decrement constant
   "optimize > 0 && flag_modulo_sched"
 {
   rtx s0;
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 
550272facd12e60a49bf8a3b20f811cc13765b3a..7684620f0f4d161dd9e9ad2d70308021ec3d3d34 
100644

--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -63,7 +63,7 @@ extern void arm_decompose_di_binop (rtx, rtx, rtx *, 
rtx *, rtx *, rtx *);

 extern bool arm_q_bit_access (void);
 extern bool arm_ge_bits_access (void);
 extern bool arm_target_insn_ok_for_lob (rtx);
-
+extern rtx arm_attempt_dlstp_transform (rtx, rtx);
 #ifdef RTX_CODE
 enum reg_class

Re: [PATCH] libstdc++: Enable building libstdc++.{a,so} when !HOSTED

On Thu, 20 Oct 2022 at 16:53, Arsen Arsenović via Libstdc++
 wrote:
>
> This enables us to provide symbols for placeholders and numeric limits,


I'm not convinced this is worth doing.

The placeholders and the numeric_limits members are all inline
variables for C++17 and later, and C++17 is the compiler's default
mode. The placeholders aren't even required to exist for freestanding
prior to C++23. For the numeric_limits definitions, I suppose it is a
problem that users can't take their address in freestanding today
unless they compile as C++17.
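
A minimal sketch of the case I mean, with a made-up function name:

```cpp
#include <limits>

// Taking the address odr-uses the static data member. Before C++17 that
// needs an out-of-line definition supplied by the library. From C++17 on
// the member is implicitly an inline variable, so the header is enough.
const int* address_of_digits()
{
  return &std::numeric_limits<int>::digits;
}
```

Merely reading the value, as in `int d = std::numeric_limits<int>::digits;`, does not need the definition, which is why most code never runs into this.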

> and allows users to mess about with linker flags less.

i.e. they don't have to use -nostdlib and/or link with gcc -lsupc++,
but can just use g++?

That seems more compelling than providing definitions of the
placeholders and limits members.


>
> libstdc++-v3/ChangeLog:
>
> * Makefile.am [!_GLIBCXX_HOSTED]: Enable src/ subdirectory.
> * Makefile.in: Regenerate.
> * src/Makefile.am [!_GLIBCXX_HOSTED]: Omit compatibility files.
> There's no history to be compatible with.
> * src/c++11/Makefile.am [!_GLIBCXX_HOSTED]: Omit hosted-only
> source files from the build.
> * src/c++17/Makefile.am [!_GLIBCXX_HOSTED]: Likewise.
> * src/c++20/Makefile.am [!_GLIBCXX_HOSTED]: Likewise.
> * src/c++98/Makefile.am [!_GLIBCXX_HOSTED]: Likewise.
> * src/Makefile.in: Regenerate.
> * src/c++11/Makefile.in: Regenerate.
> * src/c++17/Makefile.in: Regenerate.
> * src/c++20/Makefile.in: Regenerate.
> * src/c++98/Makefile.in: Regenerate.
> ---
> Afternoon,
>
> With these changes, when we aren't hosted, we get a libstdc++ library that
> contains only library facilities available in freestanding (i.e. placeholders
> and limits.cc).  This is, AFAICT, the only code in libstdc++.{a,so} that can
> (and should) be available in freestanding.
>
> As an implementation note, this could be a little bit faster (at
> build/configure time), though not necessarily nicer, by having
> src/Makefile.am not try to build convenience libraries for versions of
> C++ that provide nothing.  I opted not to do this since it'd make
> src/Makefile.am even more complex, and make future changes harder to 
> implement.
> libstdc++ also isn't that slow to build, anyway.
>
> Tested on i686-elf.
>
> Have a good day!
>
>  libstdc++-v3/Makefile.am   |  4 ++--
>  libstdc++-v3/Makefile.in   |  4 ++--
>  libstdc++-v3/src/Makefile.am   |  6 +
>  libstdc++-v3/src/Makefile.in   |  8 +--
>  libstdc++-v3/src/c++11/Makefile.am | 16 ++---
>  libstdc++-v3/src/c++11/Makefile.in | 37 +++---
>  libstdc++-v3/src/c++17/Makefile.am |  4 
>  libstdc++-v3/src/c++17/Makefile.in |  6 +++--
>  libstdc++-v3/src/c++20/Makefile.am |  4 
>  libstdc++-v3/src/c++20/Makefile.in |  6 +++--
>  libstdc++-v3/src/c++98/Makefile.am |  4 
>  libstdc++-v3/src/c++98/Makefile.in |  6 +++--
>  12 files changed, 77 insertions(+), 28 deletions(-)
>
> diff --git a/libstdc++-v3/Makefile.am b/libstdc++-v3/Makefile.am
> index 0d147ad3ffe..d7f2b6e76a5 100644
> --- a/libstdc++-v3/Makefile.am
> +++ b/libstdc++-v3/Makefile.am
> @@ -24,11 +24,11 @@ include $(top_srcdir)/fragment.am
>
>  if GLIBCXX_HOSTED
>  ## Note that python must come after src.
> -  hosted_source = src doc po testsuite python
> +  hosted_source = doc po testsuite python
>  endif
>
>  ## Keep this list sync'd with acinclude.m4:GLIBCXX_CONFIGURE.
> -SUBDIRS = include libsupc++ $(hosted_source)
> +SUBDIRS = include libsupc++ src $(hosted_source)
>
>  ACLOCAL_AMFLAGS = -I . -I .. -I ../config
>
> diff --git a/libstdc++-v3/src/Makefile.am b/libstdc++-v3/src/Makefile.am
> index b83c222d51d..4eb78e76297 100644
> --- a/libstdc++-v3/src/Makefile.am
> +++ b/libstdc++-v3/src/Makefile.am
> @@ -121,7 +121,13 @@ cxx11_sources = \
> ${cxx0x_compat_sources} \
> ${ldbl_alt128_compat_sources}
>
> +if GLIBCXX_HOSTED
>  libstdc___la_SOURCES = $(cxx98_sources) $(cxx11_sources)
> +else
> +# When freestanding, there's currently no compatibility to preserve.  Should
> +# that change, any compatibility sources can be added here.
> +libstdc___la_SOURCES =
> +endif
>
>  libstdc___la_LIBADD = \
> $(GLIBCXX_LIBS) \
> diff --git a/libstdc++-v3/src/c++11/Makefile.am 
> b/libstdc++-v3/src/c++11/Makefile.am
> index ecd46aafc01..72f05100c98 100644
> --- a/libstdc++-v3/src/c++11/Makefile.am
> +++ b/libstdc++-v3/src/c++11/Makefile.am
> @@ -51,6 +51,10 @@ else
>  cxx11_abi_sources =
>  endif
>
> +sources_freestanding = \
> +   limits.cc \
> +   placeholders.cc
> +
>  sources = \
> chrono.cc \
> codecvt.cc \
> @@ -66,9 +70,7 @@ sources = \
> hashtable_c++0x.cc \
> ios.cc \
> ios_errcat.cc \
> -   limits.cc \
> mutex.cc \
> -   placeholders.cc \
> random.cc \
> regex.cc  \
> shared_ptr.cc \
> @@ -118,7 +120,15 @@ endif
>
>  vpath % $(top_srcdir)/src

Re: PING: [PATCH v6] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction




On 11/15/22 03:44, Di Zhao OS via Gcc-patches wrote:

Hi,

I saw that Stage 1 of GCC 13 development has just ended. Is this patch
still being considered? Or should I bring it up again when general
development reopens?


Yes, it can still be considered.  The guideline is the patch must be 
posted before stage1 closes, which this patch obviously was. Richi as 
both the release manager and the expert on this part of GCC is by far 
the best person to judge the technical aspect of the patch and the 
cost/benefit of including it as we work our way through stage3 in our 
development cycle.



jeff




Re: [PATCH 2/2] libstdc++: Move stream initialization into compiled library [PR44952]

> From: Patrick Palka via Gcc-patches 
> Date: Fri, 4 Nov 2022 16:05:25 +0100

> This patch moves the global object for constructing the standard streams
> out from <iostream> and into the compiled library on targets that support
> the init_priority attribute.  This means that <iostream> no longer
> introduces a separate global constructor in each TU that includes it.
> 
> We can do this only if the init_priority attribute is supported because
> we need to force the stream initialization to run before any
> user-defined global initializer in programs that use a static
> libstdc++.a.
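
For context, the attribute's effect can be shown with a small stand-alone sketch. This is not the library code: `Tracker`, `next_position` and the priority value 200 are arbitrary choices for illustration, and the attribute itself is a GNU extension that may be unavailable on some targets.

```cpp
// Zero-initialized statically, i.e. before any dynamic initializer runs.
static int next_position = 0;

// Records the order in which objects of this type are constructed.
struct Tracker {
  int position;
  Tracker() : position(++next_position) {}
};

// Defined first, but with no explicit priority it is initialized along
// with the ordinary dynamic initializers.
Tracker ordinary;

// Lower numbers are initialized earlier; user code may use 101..65535.
Tracker prioritized __attribute__((init_priority(200)));
```

With GCC on an ELF target I would expect `prioritized` to be constructed before `ordinary` despite the definition order. That is the same property the patch relies on to get the streams ready before user-defined initializers.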
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look right?
> Unfortunately I don't have access to a system that truly doesn't support
> init priorities, so I instead tested that situation by artificially
> disabling the init_priority attribute on x86_64.
> 
>   PR libstdc++/44952
>   PR libstdc++/98108
> 
> libstdc++-v3/ChangeLog:
> 
>   * include/bits/c++config (_GLIBCXX_HAS_ATTRIBUTE): Define.
>   (_GLIBCXX_HAVE_ATTRIBUTE_INIT_PRIORITY): Define.
>   * include/std/iostream (__ioinit): Define only if init_priority
>   attribute isn't usable.
>   * src/c++98/ios_init.cc (__ioinit): Define here instead if
>   the init_priority is usable.
>   * src/c++98/ios_base_init.h: New file.

This (r13-3707-g4e4e3ffd10f53e) broke statically linked programs
using iostreams (affected embedded targets + "native" with
-static).  For me it manifests as adding some 100+ fails to my
cris-elf autotester also repeatable using a native Debian 11
x86_64 build running the test-suite with -static like "make
check-gcc-c++ 'RUNTESTFLAGS=--target_board=unix/-static
old-deja.exp=15071.C'".

I opened PR107701 for it.

brgds, H-P


Re: [PATCH] Fix gdb FilteringTypePrinter (again)


On 06/10/22 19:38 +0200, François Dumont wrote:

Hi

Looks like the previous patch was not enough. When using it in the
context of a build without dual abi and versioned namespace I started
having failures again. I guess I hadn't rebuilt everything properly.

This time I think the problem was in those lines:

    if self.type_obj == type_obj:
        return strip_inline_namespaces(self.name)

I've added a call to gdb.types.get_basic_type so that we do not compare
a type with its typedef.

Thanks for the pointer to the doc !

While doing so I ended up using your code, Jonathan, to make
FilteringTypePrinter more specific to a given instantiation.

    libstdc++: Fix gdb FilteringTypePrinter

    Once we found a matching FilteringTypePrinter instance we look for
the associated
    typedef and check that the returned Python Type is equal to the
Type to recognize.
    But gdb Python Type includes properties to distinguish a typedef
from the actual
    type. So use gdb.types.get_basic_type to check if we are indeed on
the same type.

    Additionally enhance FilteringTypePrinter matching mechanism by
introducing targ1 that,
    if not None, will be used as the 1st template parameter.

    libstdc++-v3/ChangeLog:

    * python/libstdcxx/v6/printers.py (FilteringTypePrinter):
Rename 'match' field
    'template'. Add self.targ1 to specify the first template
parameter of the instantiation
    to match.
    (add_one_type_printer): Add targ1 optional parameter,
default to None.
    Use gdb.types.get_basic_type to compare the type to
recognize and the type
    returned from the typedef lookup.
    (register_type_printers): Adapt calls to add_one_type_printers.

Tested under Linux x86_64 normal, versioned namespace with or without dual
abi.

François

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py 
b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 0fa7805183e..52339b247d8 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -2040,62 +2040,72 @@ def add_one_template_type_printer(obj, name, defargs):

class FilteringTypePrinter(object):
r"""
-A type printer that uses typedef names for common template specializations.
+A type printer that uses typedef names for common template instantiations.

Args:
-match (str): The class template to recognize.
+template (str): The class template to recognize.
name (str): The typedef-name that will be used instead.
+targ1 (str): The first template argument.
+If arg1 is provided (not None), only template instantiations with 
this type
+as the first template argument, e.g. if 
template='basic_string is the same type as
+e.g. for template='basic_istream', name='istream', if any instantiation of
+std::basic_istream is the same type as std::istream then print it as
+std::istream.
+
+e.g. for template='basic_istream', name='istream', targ1='char', if any
+instantiation of std::basic_istream is the same type as
std::istream then print it as std::istream.
"""


These are template specializations, not instantiations. Please undo
the changes to the comments, because the comments are 100% correct
now, and would become wrong with this patch.

template<typename T> struct foo { };
using F = foo<int>; // #1
template<typename T> struct foo<T*> { }; // #2
template<> struct foo<void> { }; // #3

#1 is a *specialization* of the class template foo. It is
*instantiated* when you construct one or depend on its size, or its
members.
#2 is a *partial specialization* and #3 is an explicit specialization.
But #1 is a specialization, not an instantiation.

Instantiation is a process that happens during compilation. A
specialization is a type (or function, or variable) generated from a
template by substituting arguments for the template parameters. The
python type printer matches specializations.
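
The distinction can be checked with a small sketch.  Everything named here (holder, Incomplete, Named, pointer_only, instantiated_size) is hypothetical and only meant to illustrate the point above: naming a specialization does not instantiate it, while asking for its size does.

```cpp
#include <cstddef>
#include <type_traits>

struct Incomplete;  // deliberately never defined

// Instantiating holder<T> requires T to be complete, because of the member.
template<typename T>
struct holder { T value; };

// Names the specialization holder<Incomplete>.  This compiles because
// nothing here requires the specialization to be instantiated.
using Named = holder<Incomplete>;
Named* pointer_only = nullptr;  // a pointer does not need a complete type

// Comparing the two names does not instantiate anything either.
static_assert(std::is_same<Named, holder<Incomplete>>::value,
              "the alias and the template-id name the same specialization");

// sizeof depends on the layout, so holder<int> is implicitly instantiated here.
std::size_t instantiated_size() { return sizeof(holder<int>); }
```

Replacing pointer_only with an object of type Named, or taking sizeof(Named), would force instantiation and fail to compile, since Incomplete is never defined.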



-def __init__(self, match, name):
-self.match = match
+def __init__(self, template, name, targ1):


Is there a reason to require targ1 here, instead of making it
optional, by using =None as the default?




+self.template = template
self.name = name
+self.targ1 = targ1
self.enabled = True

class _recognizer(object):
"The recognizer class for FilteringTypePrinter."

-def __init__(self, match, name):
-self.match = match
+def __init__(self, template, name, targ1):
+self.template = template
self.name = name
+self.targ1 = targ1
self.type_obj = None

def recognize(self, type_obj):
"""
-If type_obj starts with self.match and is the same type as
+If type_obj starts with self.template and is the same type as
self.name then return self.name, otherwise None.
"""
if type_obj.tag is None:
return None


Re: [PATCH]middle-end: replace GET_MODE_WIDER_MODE with GET_MODE_NEXT_MODE




On 11/15/22 07:54, Richard Sandiford via Gcc-patches wrote:

Tamar Christina  writes:

-Original Message-
From: Richard Sandiford 
Sent: Tuesday, November 15, 2022 11:59 AM
To: Tamar Christina via Gcc-patches 
Cc: Tamar Christina ; nd ;
rguent...@suse.de; j...@ventanamicro.com
Subject: Re: [PATCH]middle-end: replace GET_MODE_WIDER_MODE with
GET_MODE_NEXT_MODE

Tamar Christina via Gcc-patches  writes:

Hi All,

After the fix to the addsub patch yesterday for bootstrap I had only

regtested on x86.

While looking today it seemed the new tests were failing, this was
caused by a change in the behavior of the GET_MODE_WIDER_MODE

macro on trunk.

This patch fixes that issue. Sorry for the mess, have rebased all branches

now.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* match.pd: Replace GET_MODE_WIDER_MODE with
GET_MODE_NEXT_MODE.

--- inline copy of patch --
diff --git a/gcc/match.pd b/gcc/match.pd index


1b0ab7cf60fa4772fbe8304c622b0b8fab1bdefa..28191a992039c6f3a1dab5f7c0
e3

5dd58dc47092 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7997,7 +7997,7 @@ and,
 machine_mode wide_mode;
   }
   (if (sel.series_p (0, 2, 0, 2)
-  && GET_MODE_WIDER_MODE (vec_mode).exists (&wide_mode)
+  && GET_MODE_NEXT_MODE (vec_mode).exists (&wide_mode)
  && VECTOR_MODE_P (wide_mode)
  && (GET_MODE_UNIT_BITSIZE (vec_mode) * 2
  == GET_MODE_UNIT_BITSIZE (wide_mode)))

Does anything guarantee that the next mode will be the right one?
I think it would be safer to replace the last three && conditions with:

&& GET_MODE_2XWIDER_MODE (GET_MODE_INNER (vec_mode)).exists
(&wide_elt_mode)
&& multiple_p (GET_MODE_NUNITS (vec_mode), 2, &wide_nunits)
&& related_vector_mode (vec_mode, wide_elt_mode,
   wide_nunits).exists (&wide_mode)

I see, respun patch accordingly.

LGTM, but I'm nervous when it comes to match.pd stuff so I'd prefer
Richi or Jeff to have the final say.


It's just a matter of finding that 2X wider mode to make the 
transformation possible.  So I don't see any concerns here.



jeff




[PATCH] [range-ops] Minor readability fix.

I know it's past the end of stage1, but I'm afraid we'll drag this
around forever in the GCC12 branch, and it's an easy readability fix.

p.s. Or if you prefer:

if (!lb_nan && !ub_nan && !maybe_nan && )
  r.clear_nan ();

OK for trunk?

gcc/ChangeLog:

* range-op-float.cc (range_operator_float::fold_range): Make check
for maybe_isnan more readable.
---
 gcc/range-op-float.cc | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
index 53a0928c6aa..86107e16553 100644
--- a/gcc/range-op-float.cc
+++ b/gcc/range-op-float.cc
@@ -83,10 +83,12 @@ range_operator_float::fold_range (frange &r, tree type,
 
   r.set (type, lb, ub);
 
-  if (lb_nan || ub_nan || maybe_nan)
+  if (lb_nan || ub_nan || maybe_nan
+  || op1.maybe_isnan ()
+  || op2.maybe_isnan ())
 // Keep the default NAN (with a varying sign) set by the setter.
 ;
-  else if (!op1.maybe_isnan () && !op2.maybe_isnan ())
+  else
 r.clear_nan ();
 
   return true;
-- 
2.38.1
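
As a sanity check on the claim that this is only a readability change, the two forms of the condition can be compared exhaustively.  This is a standalone sketch, not GCC code: plain bools stand in for lb_nan, ub_nan, maybe_nan and the results of op1.maybe_isnan () and op2.maybe_isnan (), and the function names are hypothetical.

```cpp
// Returns true when the code before the patch would call r.clear_nan ().
bool clears_nan_before(bool lb_nan, bool ub_nan, bool maybe_nan,
                       bool op1_nan, bool op2_nan) {
  if (lb_nan || ub_nan || maybe_nan)
    return false;  // keep the default NAN
  else if (!op1_nan && !op2_nan)
    return true;   // r.clear_nan ()
  return false;    // neither branch taken: NAN is kept
}

// Returns true when the code after the patch would call r.clear_nan ().
bool clears_nan_after(bool lb_nan, bool ub_nan, bool maybe_nan,
                      bool op1_nan, bool op2_nan) {
  if (lb_nan || ub_nan || maybe_nan || op1_nan || op2_nan)
    return false;  // keep the default NAN
  else
    return true;   // r.clear_nan ()
}

// Walks all 32 input combinations and reports whether both forms agree.
bool forms_agree() {
  for (unsigned bits = 0; bits < 32; ++bits) {
    bool a = bits & 1, b = bits & 2, c = bits & 4, d = bits & 8, e = bits & 16;
    if (clears_nan_before(a, b, c, d, e) != clears_nan_after(a, b, c, d, e))
      return false;
  }
  return true;
}
```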



Re: [RFC PATCH] ipa-guarded-deref: Add new pass to dereference function pointers




On 11/14/22 08:38, Christoph Müllner wrote:


Ok, I will add the check.

Still, I don't think that the optimization changes the behavior in the
case of different ABIs
(i.e. the situation is problematic regardless if there is an indirect call
or a guarded direct call).
What could be done in addition to dropping the candidate is to emit a
warning (similar to what ipa-devirt does for ODR violations).


The one target I'm aware of where this is most likely to cause problems 
would be the older 32bit PA SOM ABI.  It has the property that the ABI 
uses different registers for parameter passing for direct vs indirect 
calls.  But with all this happening before RTL generation we should be 
fine, even for that ABI (and it's generally safe in that ABI to change an 
indirect to a direct call, but not vice-versa).



Jeff


Re: [PATCH] RISC-V: Use .p2align for code-alignment




On 11/13/22 13:41, Philipp Tomsich wrote:

RISC-V's .p2align (currently) ignores the max-skip argument.  As we
have experimental patches underway to address this in a
backwards-compatible manner, let's prepare GCC for the day when
binutils gets updated.

gcc/ChangeLog:

* config/riscv/riscv.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Implement.



What are the implications if we start using p2align immediately with the 
current (broken?) state of the assembler?  I'm pretty sure configure is 
already turning on HAVE_GAS_MAX_SKIP_P2ALIGN.  From a native risc-v build:



auto-host.h:#define HAVE_GAS_MAX_SKIP_P2ALIGN 1



jeff



Re: [PATCH] RISC-V: Zihintpause: add __builtin_riscv_pause




On 11/13/22 13:41, Philipp Tomsich wrote:

The Zihintpause extension uses an opcode from the 'fence' opcode range
to add a true hint instruction (i.e. if it is not supported on any
given platform, the 'fence' that is encoded will not enforce any
specific ordering on memory accesses) for entering a low-power state
(e.g. in an idle thread).  We expose this new instruction through a
machine-dependent builtin to allow generating it without a requirement
for any inline assembly.

Given that the encoding of 'pause' is valid (as a 'fence' encoding)
even for processors that do not (yet) support Zihintpause, we make
this builtin available without any further TARGET_* constraints.
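
A possible use of the builtin in a polling loop is sketched below.  Only __builtin_riscv_pause comes from the patch; cpu_relax and spin_until_set are hypothetical helper names, and the preprocessor guard is an assumption added so that the sketch also builds on other targets or on compilers without the builtin.

```cpp
#include <atomic>

// Hint to the core that we are busy-waiting.  Expands to nothing when the
// builtin from this patch is not available.
inline void cpu_relax() {
#if defined(__riscv) && defined(__has_builtin)
#  if __has_builtin(__builtin_riscv_pause)
  __builtin_riscv_pause();  // void -> void, as described in the ChangeLog
#  endif
#endif
}

// Polls `flag` until it becomes true and returns how many times the loop
// had to wait.
unsigned spin_until_set(const std::atomic<bool>& flag) {
  unsigned polls = 0;
  while (!flag.load(std::memory_order_acquire)) {
    cpu_relax();
    ++polls;
  }
  return polls;
}
```

Because the patch makes the builtin available without further TARGET_* constraints, the RISC-V guard above could be simplified on a compiler known to include it.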

gcc/ChangeLog:

* config/riscv/riscv-builtins.cc (struct riscv_builtin_description):
add the pause machine-dependent builtin with no result and no
 arguments; mark it as always present (pause is a true hint
 that encodes into a fence-insn, if not supported with the new
 pause semantics).
* config/riscv/riscv-ftypes.def: Add type for void -> void.
* config/riscv/riscv.md (riscv_pause): Add riscv_pause and UNSPECV_PAUSE
* 
doc/gcc/extensions-to-the-c-language-family/target-builtins/risc-v-built-in-functions.rst:
Document.
* optabs.cc (maybe_gen_insn): Allow nops == 0 (void -> void).

gcc/testsuite/ChangeLog:

* gcc.target/riscv/builtin_pause.c: New test.


OK.  Though I think you'll need to adjust the doc patch now with the 
sphinx work reverted.



Jeff




[PATCH] testsuite: Fix missing EFFECTIVE_TARGETS variable errors

Permit running vector tests outside `check_vect_support_and_set_flags' 
environment, removing errors such as:

ERROR: gcc.dg/analyzer/torture/pr93350.c   -O0 : can't read 
"EFFECTIVE_TARGETS": no such variable for " dg-require-effective-target 1 
vect_int "

or:

ERROR: gcc.dg/bic-bitmask-13.c: error executing dg-final: can't read 
"EFFECTIVE_TARGETS": no such variable

with `mips-linux-gnu' target testing.

The EFFECTIVE_TARGETS variable has originated from commit 9b7937cf8a06 
("Add support to run auto-vectorization tests for multiple effective 
targets."), where arrangements have been made to run vector tests 
within `check_vect_support_and_set_flags' environment iteratively over 
all the vector unit variants available in the architecture using extra 
compilation flags regardless of whether the target environment arranged 
for a particular testsuite run has vector support enabled by default.  
So far this has been used for the MIPS target only.

Vector tests have since been added though that run outside environment 
set up by `check_vect_support_and_set_flags' just using the current
compilation environment with no extra flags added.  This works for most 
targets, however causes problems with the MIPS target, because outside 
`check_vect_support_and_set_flags' environment the EFFECTIVE_TARGETS 
variable will not have been correctly set up even if it was added to 
the particular script invoking the test in question.

Fix this by using just the current compilation environment whenever a 
vector feature is requested by `et-is-effective-target' in the absence 
of the EFFECTIVE_TARGETS variable.  This required some modification to 
individual vector feature tests, which always added the compilation 
flags required for the determination of whether the given vector unit 
variant can be verified with the current testsuite run (except for the 
Loongson MMI variant).  Now explicit flags are only passed in setting up 
EFFECTIVE_TARGETS and otherwise the current compilation environment will 
determine whether such a vector test is applicable.

This changes how Loongson MMI is handled in that the `-mloongson-mmi' 
flag is explicitly passed for the determination of whether this vector 
unit variant can be verified, which I gather is how it was supposed to 
be arranged anyway because the flag is then added for testing the 
Loongson MMI variant.

gcc/testsuite/
* lib/target-supports.exp
(check_effective_target_mpaired_single): Add `args' argument and
pass it to `check_no_compiler_messages' replacing
`-mpaired-single'.
(check_effective_target_mips_loongson_mmi): Add `args' argument and 
pass it to `check_no_compiler_messages'.
(check_effective_target_mips_msa): Add `args' argument and pass 
it to `check_no_compiler_messages' replacing `-mmsa'.
(check_effective_target_mpaired_single_runtime)
(add_options_for_mpaired_single): Pass `-mpaired-single' to
`check_effective_target_mpaired_single'.
(check_effective_target_mips_loongson_mmi_runtime)
(add_options_for_mips_loongson_mmi): Pass `-mloongson-mmi' to
`check_effective_target_mips_loongson_mmi'.
(check_effective_target_mips_msa_runtime)
(add_options_for_mips_msa): Pass `-mmsa' to
`check_effective_target_mips_msa'.
(et-is-effective-target): Verify that EFFECTIVE_TARGETS exists
and if not, just check if the current compilation environment
supports the target feature requested.
(check_vect_support_and_set_flags): Pass `-mpaired-single',
`-mloongson-mmi', and `-mmsa' to the respective target feature
checks.
---
Hi,

 This removes said errors and depending on the compilation flags used with 
a testsuite invocation may give scores such as:

UNSUPPORTED: gcc.dg/analyzer/torture/pr93350.c   -O0

or:

PASS: gcc.dg/analyzer/torture/pr93350.c   -O0  (test for excess errors)

In some cases, including in particular Loongson MMI now enabled, it causes 
extra failures to appear, but my interpretation is they are preexisting 
issues either with the compiler or the respective test cases, which just 
did not previously show up simply because the test cases were not run.  
Therefore I conclude they are not issues with the test framework update 
proposed here.

 OK to apply then?

  Maciej
---
 gcc/testsuite/lib/target-supports.exp |   41 +++---
 1 file changed, 23 insertions(+), 18 deletions(-)

gcc-test-effective-targets.diff
Index: gcc/gcc/testsuite/lib/target-supports.exp
===
--- gcc.orig/gcc/testsuite/lib/target-supports.exp
+++ gcc/gcc/testsuite/lib/target-supports.exp
@@ -1329,10 +1329,10 @@ proc check_effective_target_pie { } {
 
 # Return true if the target supports -mpaired-single (as used on MIPS).
 
-proc check_effective_target_mpaired_single { } {
+proc check_effective_target_mpaired_single { args } {
  

Re: [PATCH] RISC-V: Split "(a & (1UL << bitno)) ? 0 : -1" to bext + addi




On 11/13/22 13:48, Philipp Tomsich wrote:

For a straightforward application of bext for the following function
   long bext64(long a, char bitno)
   {
 return (a & (1UL << bitno)) ? 0 : -1;
   }
we generate
    srl     a0,a0,a1        # 7     [c=4 l=4]  lshrdi3
    andi    a0,a0,1         # 8     [c=4 l=4]  anddi3/1
    addi    a0,a0,-1        # 14    [c=4 l=4]  adddi3/1
due to the following failed match at combine time:
   (set (reg:DI 82)
(zero_extract:DI (reg:DI 83)
 (const_int 1 [0x1])
 (reg:DI 84)))

The existing pattern for bext requires the 3rd argument to
zero_extract to be a QImode register wrapped in a zero_extension.
This adds an additional pattern that allows an Xmode argument.

With this change, the testcase compiles to
    bext    a0,a0,a1        # 8     [c=4 l=4]  *bextdi
    addi    a0,a0,-1        # 14    [c=4 l=4]  adddi3/1
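
To make the arithmetic behind the two-instruction sequence explicit, here is a small sketch.  bext64 is the function from the description above; bext64_extract_then_add is a hypothetical name for a C-level rendering of "extract one bit, then add -1", and it assumes bitno is smaller than the width of long.

```cpp
// The testcase from the patch description.
long bext64(long a, char bitno) {
  return (a & (1UL << bitno)) ? 0 : -1;
}

// Extract the selected bit (what bext does), then add -1 (the addi):
// a set bit gives 1 - 1 = 0, a clear bit gives 0 - 1 = -1.
long bext64_extract_then_add(long a, char bitno) {
  long bit = static_cast<long>((static_cast<unsigned long>(a) >> bitno) & 1UL);
  return bit - 1;
}
```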

gcc/ChangeLog:

* config/riscv/bitmanip.md (*bext): Add an additional
pattern that allows the 3rd argument to zero_extract to be
an Xmode register operand.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbs-bext.c: Add testcases.
* gcc.target/riscv/zbs-bexti.c: Add testcases.


It's fairly common to want variants with extraction as well as a simple 
register operand.   The biggest concern is typically around 
SHIFT_COUNT_TRUNCATED, but given we already have an extract variant 
y'all should have already addressed concerns around SHIFT_COUNT_TRUNCATED.


OK.

jeff




Re: [PATCH v2 0/2] Basic support for the Ventana VT1 w/ instruction fusion


On Mon, 14 Nov 2022 23:25:54 PST (-0800), richard.guent...@gmail.com wrote:

On Tue, Nov 15, 2022 at 12:01 AM Philipp Tomsich
 wrote:


On Mon, 14 Nov 2022 at 23:47, Palmer Dabbelt  wrote:
>
> [Trying to join the threads here.]
>
> On Mon, 14 Nov 2022 13:28:23 PST (-0800), philipp.toms...@vrull.eu wrote:
> > Jeff,
> >
> > On Mon, 14 Nov 2022 at 22:23, Jeff Law  wrote:
> >>
> >>
> >> On 11/14/22 13:00, Palmer Dabbelt wrote:
> >> > On Sun, 13 Nov 2022 12:48:22 PST (-0800), philipp.toms...@vrull.eu wrote:
> >> >>
> >> >> This series provides support for the Ventana VT1 (a 4-way superscalar
> >> >> rv64gc_zba_zbb_zbc_zbs_zifencei_xventanacondops core) including support
> >> >> for the supported instruction fusion patterns.
> >> >>
> >> >> This includes the addition of the fusion-aware scheduling
> >> >> infrastructure for RISC-V and implements idiom recognition for the
> >> >> fusion patterns supported by VT1.
> >> >>
> >> >> Note that we don't signal support for XVentanaCondOps at this point,
> >> >> as the XVentanaCondOps support is in-flight separately. Changing the
> >> >> defaults for VT1 can happen late in the cycle, so no need to link the
> >> >> two different changesets.
> >> >>
> >> >> Changes in v2:
> >> >> - Rebased and changed over to .rst-based documentation
> >> >> - Updated to catch more fusion cases
> >> >> - Signals support for Zifencei
> >> >>
> >> >> Philipp Tomsich (2):
> >> >>   RISC-V: Add basic support for the Ventana-VT1 core
> >> >>   RISC-V: Add instruction fusion (for ventana-vt1)
> >> >>
> >> >>  gcc/config/riscv/riscv-cores.def  |   3 +
> >> >>  gcc/config/riscv/riscv-opts.h |   2 +-
> >> >>  gcc/config/riscv/riscv.cc | 233 ++
> >> >>  .../risc-v-options.rst|   5 +-
> >> >>  4 files changed, 240 insertions(+), 3 deletions(-)
> >> >
> >> > I guess we never really properly talked about this on the GCC mailing
> >> > lists, but IMO it's fine to start taking code for designs that have
> >> > been announced under the assumption that if the hardware doesn't
> >> > actually show up according to those timelines that it will be assumed
> >> > to have never existed and thus be removed more quickly than usual.
> >> Absolutely.   I have zero interest in carrying around code for
> >> nonexistent or dead variants.
> >> >
> >> > That said, I can't find anything describing that the VT-1 exists aside
> >> > from these patches.  Is there anything that describes this design and
> >> > when it's expected to be available?
> >>
> >> What do you need?  I can give some broad overview information on the
> >> design, but it would likely just mirror what's already been mentioned in
> >> these patches.
> >>
> >>
> >> As far as schedules.  I'm not sure what I can say.  I'll check on that.
>
> I'm less worried about the "does this pipeline model match the HW" bits,
> at least until the HW is publicly available then all we can do is rely
> on the vendor (and even after the HW is public the vendor might be the
> only one who cares enough to figure things out, nothing we can really do
> upstream there).  We've had some issues with nobody caring enough about
> the C906 pipeline model to sort out whether some patches are a net win,
> but if nobody (including the vendor) cares about the HW enough to
> benchmark things then there's not much we can do.
>
> My bigger worry is getting roped in to supporting a bunch of hardware
> that doesn't actually exist yet and may never make it outside some
> vendor's lab.  That can generally be a ton of work and filters
> throughout GCC, even outside of the RISC-V backend.  We've already got
> enough chaos just trying to follow the ISA, chasing down issues related
> to hardware that may not ever manifest is just going to lead to
> craziness.
>
> So on my end the point of the schedule is to have something we can look
> at and determine that the hardware is somehow defunct.  The fairest way
> we could come up with was to tie it to some sort of company announcement
> of the hardware: obviously everyone knows their internal timelines, but
> that's not fair to companies that don't employ someone with commit
> access.  Requiring some sort of public announcement means everyone has
> the same rules to play by, IMO that's really important in RISC-V land as
> there's so many vendors.
>
> >> It was never my intention to bypass any process/procedures here. So if I
> >> did, my apologies.
> >
> > The controversial part is XVentanaCondOps (as it is a vendor-defined
> > extension), so I'll certainly hold off on that until both you and
> > Palmer are in agreement on how to proceed there.
>
> The pipeline models are essentially in the same spot.  We've got a bit
> of a precedent there for taking them just based on an announcement, but
> there isn't one here.
>
> [and the other side of the thread]
>
> On Mon, 14 Nov 2022 13:14:35 PST (-0800), philipp.toms...@vrull.eu wrote:
> > On Mon, 14 Nov 2022 at 21:58, Palm

Re: [PATCH] RISC-V: Split "(a & (1UL << bitno)) ? 0 : 1" to bext + xori




On 11/13/22 13:48, Philipp Tomsich wrote:

We avoid reassociating "(~(a >> BIT_NO)) & 1" into "((~a) >> BIT_NO) & 1"
by splitting it into a zero-extraction (bext) and an xori.  This both
avoids burning a register on a temporary and generates a sequence that
clearly captures 'extract bit, then invert bit'.

This change improves the previously generated
 srl   a0,a0,a1
 not  a0,a0
 andi  a0,a0,1
into
 bext  a0,a0,a1
 xori  a0,a0,1
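
The three forms involved can be written out in C++ as below.  This is an illustrative sketch with hypothetical function names; it assumes bit_no is smaller than the width of unsigned long.

```cpp
// Source-level idiom named in the subject line.
int bit_is_clear(unsigned long a, unsigned bit_no) {
  return (a & (1UL << bit_no)) ? 0 : 1;
}

// Shape of the previously generated code: shift, invert, mask (srl, not, andi).
int shift_not_mask(unsigned long a, unsigned bit_no) {
  return static_cast<int>((~(a >> bit_no)) & 1UL);
}

// Shape of the new code: extract the bit, then flip it (bext, xori).
int extract_then_flip(unsigned long a, unsigned bit_no) {
  return static_cast<int>(((a >> bit_no) & 1UL) ^ 1UL);
}
```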

Signed-off-by: Philipp Tomsich 

gcc/ChangeLog:

* config/riscv/bitmanip.md: Add split covering
"(a & (1 << BIT_NO)) ? 0 : 1".

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbs-bext.c: Add testcases.
* gcc.target/riscv/zbs-bexti.c: Add testcases.


OK.   Not terribly happy with the SUBREG, but I can guess that's an 
artifact of other patterns which require that operand to be QImode.



Jeff




Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector

Tamar Christina  writes:
>> -Original Message-
>> From: Hongtao Liu 
>> Sent: Tuesday, November 15, 2022 9:37 AM
>> To: Tamar Christina 
>> Cc: Richard Sandiford ; Tamar Christina via
>> Gcc-patches ; nd ;
>> rguent...@suse.de
>> Subject: Re: [PATCH 3/8]middle-end: Support extractions of subvectors from
>> arbitrary element position inside a vector
>>
>> On Tue, Nov 15, 2022 at 4:51 PM Tamar Christina
>>  wrote:
>> >
>> > > -Original Message-
>> > > From: Hongtao Liu 
>> > > Sent: Tuesday, November 15, 2022 8:36 AM
>> > > To: Tamar Christina 
>> > > Cc: Richard Sandiford ; Tamar Christina
>> > > via Gcc-patches ; nd ;
>> > > rguent...@suse.de
>> > > Subject: Re: [PATCH 3/8]middle-end: Support extractions of
>> > > subvectors from arbitrary element position inside a vector
>> > >
>> > > Hi:
>> > >   I'm from https://gcc.gnu.org/pipermail/gcc-patches/2022-
>> > > November/606040.html.
>> > > >  }
>> > > >
>> > > >/* See if we can get a better vector mode before extracting.
>> > > > */ diff --git a/gcc/optabs.cc b/gcc/optabs.cc index
>> > > >
>> > >
>> cff37ccb0dfc3dd79b97d0abfd872f340855dc96..f338df410265dfe55b68961600
>> > > 9
>> > > 0
>> > > > a453cc6a28d9 100644
>> > > > --- a/gcc/optabs.cc
>> > > > +++ b/gcc/optabs.cc
>> > > > @@ -6267,6 +6267,7 @@ expand_vec_perm_const (machine_mode
>> mode,
>> > > rtx v0, rtx v1,
>> > > >v0_qi = gen_lowpart (qimode, v0);
>> > > >v1_qi = gen_lowpart (qimode, v1);
>> > > >if (targetm.vectorize.vec_perm_const != NULL
>> > > > + && targetm.can_change_mode_class (mode, qimode,
>> > > > + ALL_REGS)
>> > > It looks like you want to guard gen_lowpart, shouldn't it be better
>> > > to use validate_subreg  or (tmp = gen_lowpart_if_possible (mode,
>> target_qi)).
>> > > IMHO, targetm.can_change_mode_class is mostly used for RA, but not
>> > > to guard gen_lowpart.
>> >
>> > Hmm I don't think this is quite true, there are existing usages in
>> > expr.cc and rtlanal.cc that do this and aren't part of RA.  As I
>> > mentioned before, for instance, the canonicalization of vec_select to subreg
>> in rtlanal uses this.
>> In theory, we need to iterate through all reg classes that can be assigned 
>> for
>> both qimode and mode, if any regclass returns true for
>> targetm.can_change_mode_class, the bitcast(validate_subreg) should be ok.
>> Here we just passed ALL_REGS.
>
> Yes, and most targets where this transformation is valid return true here.
>
> I've checked:
>  * alpha
>  * arm
>  * aarch64
>  * rs6000
>  * s390
>  * sparc
>  * pa
>  * mips
>
> And even the default example that other targets use from the documentation
> would return true as the size of the modes are the same.
>
> X86 and RISCV are the only two targets that I found (but didn't check all) 
> that
> blankly return a result based on just the register classes.
>
> That is to say, there are more targets that adhere to the interpretation that
> rclass here means "should be possible in some class in rclass" rather than
> "should be possible in ALL classes of rclass".

Yeah, I agree.  A query "can something stored in ALL_REGS change from
mode M1 to mode M2?" is meaningful if at least one register R in ALL_REGS
can hold both M1 and M2.  It's then the target's job to answer
conservatively so that the result covers all such R.

In principle it's OK for a target to err on the side of caution and forbid
things that are actually OK.  But that's going to risk losing performance
in some cases, and sometimes that loss of performance will be unacceptable.
IMO that's what's happening here.  The target is applying x87 rules to
things that (AIUI) are never stored in x87 registers, and so losing
performance as a result.

Note that the RA also uses ALL_REGS for some things, so this usage
isn't specific to non-RA code.

IMO it's not the job of target-independent code to iterate through
individual classes and aggregate the result.  One of the reasons for
having union classes is to avoid the need to do that.  And ALL_REGS
is the ultimate union class. :-)

The patch looks correct to me.

Thanks,
Richard

>> >
>> > So there is already existing precedent for this.  And the
>> > documentation for the hook says:
>> >
>> > "This hook returns true if it is possible to bitcast values held in 
>> > registers of
>> class rclass from mode from to mode to and if doing so preserves the low-
>> order bits that are common to both modes. The result is only meaningful if
>> rclass has registers that can hold both from and to. The default
>> implementation returns true"
>> >
>> > So it looks like its use outside of RA is perfectly valid, and the
>> > documentation also mentions use from the mid-end as
>> an example.
>> >
>> > But if the mid-end maintainers are happy I'll use something else.
>> >
>> > Tamar
>> >
>> > > I did similar things in
>> > > https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579296.html
>> > > (and ALL_REGS doesn't cover all cases for regist

Re: [PATCH] Allow prologues and epilogues to be inserted later




On 11/11/22 09:21, Richard Sandiford via Gcc-patches wrote:

Arm's SME adds a new processor mode called streaming mode.
This mode enables some new (matrix-oriented) instructions and
disables several existing groups of instructions, such as most
Advanced SIMD vector instructions and a much smaller set of SVE
instructions.  It can also change the current vector length.

There are instructions to switch in and out of streaming mode.
However, their effect on the ISA and vector length can't be represented
directly in RTL, so they need to be emitted late in the pass pipeline,
close to md_reorg.

It's sometimes the responsibility of the prologue and epilogue to
switch modes, which means we need to emit the prologue and epilogue
sequences late as well.  (This loses shrink-wrapping and scheduling
opportunities, but that's a price worth paying.)

This patch therefore adds a target hook for forcing prologue
and epilogue insertion to happen later in the pipeline.

Tested on aarch64-linux-gnu (including with a follow-on patch)
and x86_64-linux-gnu.  OK to install?
  I'll ob
Richard


gcc/
* target.def (use_late_prologue_epilogue): New hook.
* doc/gccint/target-macros/miscellaneous-parameters.rst: Add
TARGET_USE_LATE_PROLOGUE_EPILOGUE.
* doc/gccint/target-macros/tm.rst.in: Regenerate.
* passes.def (pass_late_thread_prologue_and_epilogue): New pass.
* tree-pass.h (make_pass_late_thread_prologue_and_epilogue): Declare.
* function.cc (pass_thread_prologue_and_epilogue::gate): New function.
(pass_data_late_thread_prologue_and_epilogue): New pass variable.
(pass_late_thread_prologue_and_epilogue): New pass class.
(make_pass_late_thread_prologue_and_epilogue): New function.


I'm not sure how we'll enforce the no target independent code motion 
limitation that this seems to need and the exception made for reorg is 
hackish in that it appears we just rely on the fact that reorg isn't run 
for the one target where this matters.  That does make me wonder if we 
should future proof this ever so slightly -- is there a reasonably easy 
way to fail if a target were to define delay slots and the need for late 
prologue/epilogue?  If so, that seems advisable.



No objection to the meat of the patch, just wondering a bit about the 
additional sanity checking we can do...



Jeff



Re: [PATCH 2/2] aarch64: Add support for widening LDAPR instructions

"Andre Vieira (lists)"  writes:
> Updated version of the patch to account for the testsuite changes in the 
> first patch.
>
> On 10/11/2022 11:20, Andre Vieira (lists) via Gcc-patches wrote:
>> Hi,
>>
>> This patch adds support for the widening LDAPR instructions.
>>
>> Bootstrapped and regression tested on aarch64-none-linux-gnu.
>>
>> OK for trunk?
>>
>> 2022-11-09  Andre Vieira  
>>     Kyrylo Tkachov  
>>
>> gcc/ChangeLog:
>>
>>     * config/aarch64/atomics.md 
>> (*aarch64_atomic_load_rcpc_zext): New pattern.
>>     (*aarch64_atomic_load_rcpc_zext): Likewise.
>>
>> gcc/testsuite/ChangeLog:
>>
>>     * gcc.target/aarch64/ldapr-ext.c: New test.
>
> diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
> index 
> dc5f52ee8a4b349c0d8466a16196f83604893cbb..9670bef7d8cb2b32c5146536d806a7e8bdffb2e3
>  100644
> --- a/gcc/config/aarch64/atomics.md
> +++ b/gcc/config/aarch64/atomics.md
> @@ -704,6 +704,28 @@
>}
>  )
>  
> +(define_insn "*aarch64_atomic_load_rcpc_zext"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +(zero_extend:GPI
> +  (unspec_volatile:ALLX
> +[(match_operand:ALLX 1 "aarch64_sync_memory_operand" "Q")
> + (match_operand:SI 2 "const_int_operand")]   ;; model
> +   UNSPECV_LDAP)))]
> +  "TARGET_RCPC"
> +  "ldapr\t%0, %1"

It would be good to add:

   > 

to the condition, so that we don't provide bogus SI->SI and DI->DI
extensions.  (They shouldn't be generated, but it's better not to provide
them anyway.)

Thanks,
Richard

> +)
> +
> +(define_insn "*aarch64_atomic_load_rcpc_sext"
> +  [(set (match_operand:GPI  0 "register_operand" "=r")
> +(sign_extend:GPI
> +  (unspec_volatile:ALLX
> +[(match_operand:ALLX 1 "aarch64_sync_memory_operand" "Q")
> + (match_operand:SI 2 "const_int_operand")]   ;; model
> +   UNSPECV_LDAP)))]
> +  "TARGET_RCPC"
> +  "ldaprs\t%0, %1"
> +)
> +
>  (define_insn "atomic_store"
>[(set (match_operand:ALLI 0 "aarch64_rcpc_memory_operand" "=Q,Ust")
>  (unspec_volatile:ALLI
> diff --git a/gcc/testsuite/gcc.target/aarch64/ldapr-ext.c 
> b/gcc/testsuite/gcc.target/aarch64/ldapr-ext.c
> new file mode 100644
> index 
> ..aed27e06235b1d266decf11745dacf94cc59e76d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/ldapr-ext.c
> @@ -0,0 +1,94 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -std=c99" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +#include <stdatomic.h>
> +
> +#pragma GCC target "+rcpc"
> +
> +atomic_ullong u64;
> +atomic_llong s64;
> +atomic_uint u32;
> +atomic_int s32;
> +atomic_ushort u16;
> +atomic_short s16;
> +atomic_uchar u8;
> +atomic_schar s8;
> +
> +#define TEST(name, ldsize, rettype)  \
> +rettype  \
> +test_##name (void)   \
> +{\
> +  return atomic_load_explicit (&ldsize, memory_order_acquire);   \
> +}
> +
> +/*
> +**test_u8_u64:
> +**...
> +**   ldaprb  x0, \[x[0-9]+\]
> +**   ret
> +*/
> +
> +TEST(u8_u64, u8, unsigned long long)
> +
> +/*
> +**test_s8_s64:
> +**...
> +**   ldaprsb x0, \[x[0-9]+\]
> +**   ret
> +*/
> +
> +TEST(s8_s64, s8, long long)
> +
> +/*
> +**test_u16_u64:
> +**...
> +**   ldaprh  x0, \[x[0-9]+\]
> +**   ret
> +*/
> +
> +TEST(u16_u64, u16, unsigned long long)
> +
> +/*
> +**test_s16_s64:
> +**...
> +**   ldaprsh x0, \[x[0-9]+\]
> +**   ret
> +*/
> +
> +TEST(s16_s64, s16, long long)
> +
> +/*
> +**test_u8_u32:
> +**...
> +**   ldaprb  w0, \[x[0-9]+\]
> +**   ret
> +*/
> +
> +TEST(u8_u32, u8, unsigned)
> +
> +/*
> +**test_s8_s32:
> +**...
> +**   ldaprsb w0, \[x[0-9]+\]
> +**   ret
> +*/
> +
> +TEST(s8_s32, s8, int)
> +
> +/*
> +**test_u16_u32:
> +**...
> +**   ldaprh  w0, \[x[0-9]+\]
> +**   ret
> +*/
> +
> +TEST(u16_u32, u16, unsigned)
> +
> +/*
> +**test_s16_s32:
> +**...
> +**   ldaprsh w0, \[x[0-9]+\]
> +**   ret
> +*/
> +
> +TEST(s16_s32, s16, int)


Re: [AArch64] Enable generation of FRINTNZ instructions

"Andre Vieira (lists)"  writes:
> On 07/11/2022 11:05, Richard Biener wrote:
>> On Fri, 4 Nov 2022, Andre Vieira (lists) wrote:
>>
>>> Sorry for the delay, just been reminded I still had this patch outstanding
>>> from last stage 1. Hopefully since it has been mostly reviewed it could go 
>>> in
>>> for this stage 1?
>>>
>>> I addressed the comments and gave the slp-part of vectorizable_call some TLC
>>> to make it work.
>>>
>>> I also changed vect_get_slp_defs as I noticed that the call from
>>> vectorizable_call was creating an auto_vec with 'nargs' that might be less
>>> than the number of children in the slp_node
>> how so?  Please fix that in the caller.  It looks like it probably
>> should use vect_nargs instead?
> Well that was my first intuition, but when I looked at it further the 
> variant it's calling:
> void vect_get_slp_defs (vec_info *, slp_tree slp_node, vec<vec<tree> > 
> *vec_oprnds, unsigned n)
>
> Is actually creating a vector of vectors of slp defs. So for each child 
> of slp_node it calls:
> void vect_get_slp_defs (slp_tree slp_node, vec<tree> *vec_defs)
>
> Which returns a vector of vectorized defs. So vect_nargs would be the 
> right size for the inner vec of vec_defs, but the outer should 
> have the same number of elements as the original slp_node has children.
>
> However, at the call site (vectorizable_call), the operand we pass to 
> vect_get_slp_defs 'vec_defs', is initialized before the code-path is 
> specialized for slp_node. I'll go see if I can change the call site to 
> not have to do that, given the continue at the end of the if (slp_node) 
> BB I don't think it needs to use vec_defs after it, but it may require 
> some massaging to be able to define it separately for each code-path.
>
>>
>>> , so that quick_push might not be
>>> safe as is, so I added the reserve (n) to ensure it's safe to push. I didn't
>>> actually come across any failure because of it though. Happy to split this
>>> into a separate patch if needed.
>>>
>>> Bootstrapped and regression tested on aarch64-none-linux-gnu and
>>> x86_64-pc-linux-gnu.
>>>
>>> OK for trunk?
>> I'll leave final approval to Richard but
>>
>> - This only needs 1 bit, but occupies the full 16 to ensure a nice
>> + This only needs 1 bit, but occupies the full 15 to ensure a nice
>>layout.  */
>> unsigned int vectorizable : 16;
>>
>> you don't actually change the width of the bitfield.  I would find
>> it more natural to have
>>
>>signed int type0 : 7;
>>signed int type0_vtrans : 1;
>>signed int type1 : 7;
>>signed int type1_vtrans : 1;
>>
>> with typeN_vtrans specifying how the types transform when vectorized.
>> I would imagine another variant we could need is narrow/widen
>> according to either result or other argument type?  That said,
>> just your flag would then be
>>
>>signed int type0 : 7;
>>signed int pad   : 1;
>>signed int type1 : 7;
>>signed int type1_vect_as_scalar : 1;
>>
>> ?
> That's a cool idea! I'll leave it as a single bit for now like that, if 
> we want to re-use it for multiple transformations we will obviously need 
> to rename & give it more bits.

I think we should steal bits from vectorizable rather than shrink
type0 and type1 though.  Then add a 14-bit padding field to show
how many bits are left.
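
As a rough sketch of that layout (not the actual patch): type0 and type1
keep their 8 bits, the new flag comes out of the 16 bits previously given
to vectorizable, and the remainder is spelled out as padding.  The struct
name and the "unused" field name below are made up for illustration;
type1_is_scalar_p is the name used in the patch.

```cpp
#include <cstdint>

// Hypothetical stand-in for direct_internal_fn_info; the struct name and
// the "unused" field are assumptions made for this illustration.
struct direct_internal_fn_info_sketch
{
  signed int type0 : 8;                // width left unchanged
  signed int type1 : 8;                // width left unchanged
  unsigned int vectorizable : 1;       // only ever needed one bit
  unsigned int type1_is_scalar_p : 1;  // new flag, taken from vectorizable
  unsigned int unused : 14;            // explicit padding: 16 - 1 - 1 bits
};

// Bit-field layout is implementation-defined; on the usual ABIs with a
// 32-bit int the five fields are expected to share one 32-bit unit.
static_assert (sizeof (direct_internal_fn_info_sketch)
               == sizeof (std::uint32_t),
               "expected the fields to pack into 32 bits");
```

The explicit padding field keeps the overall size the same while making
it obvious how many bits remain for future flags.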

> @@ -3340,9 +3364,20 @@ vectorizable_call (vec_info *vinfo,
>rhs_type = unsigned_type_node;
>  }
> 
> +  /* The argument that is not of the same type as the others.  */
>int mask_opno = -1;
> +  int scalar_opno = -1;
>if (internal_fn_p (cfn))
> -mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> +{
> +  internal_fn ifn = as_internal_fn (cfn);
> +  if (direct_internal_fn_p (ifn)
> +   && direct_internal_fn (ifn).type1_is_scalar_p)
> + scalar_opno = direct_internal_fn (ifn).type1;
> +  else
> + /* For masked operations this represents the argument that carries the
> +mask.  */
> + mask_opno = internal_fn_mask_index (as_internal_fn (cfn));

This doesn't seem logically like an else.  We should do both.

LGTM otherwise for the bits outside match.pd.  If Richard's happy with
the match.pd bits then I think the patch is OK with those changes and
without the vect_get_slp_defs thing (as you mentioned downthread).
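
To illustrate the "do both" point, here is a minimal sketch with made-up
stand-in types (fn_info_sketch and special_operands are not GCC types, and
the fields only model the real accessors).  The two indices are computed
independently instead of one excluding the other:

```cpp
// Hypothetical, simplified stand-ins for the GCC data involved.
struct fn_info_sketch
{
  bool direct_p;           // models direct_internal_fn_p (ifn)
  bool type1_is_scalar_p;  // models direct_internal_fn (ifn).type1_is_scalar_p
  int type1;               // models direct_internal_fn (ifn).type1
  int mask_index;          // models internal_fn_mask_index (ifn); -1 if none
};

struct special_operands
{
  int mask_opno = -1;
  int scalar_opno = -1;
};

// Compute both operand indices; neither lookup depends on the other.
inline special_operands
find_special_operands (const fn_info_sketch &info)
{
  special_operands result;
  if (info.direct_p && info.type1_is_scalar_p)
    result.scalar_opno = info.type1;
  // The mask lookup runs unconditionally rather than in an else branch.
  result.mask_opno = info.mask_index;
  return result;
}
```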

Thanks,
Richard


>>
>>> gcc/ChangeLog:
>>>
>>>      * config/aarch64/aarch64.md (ftrunc2): New
>>> pattern.
>>>      * config/aarch64/iterators.md (FRINTNZ): New iterator.
>>>      (frintnz_mode): New int attribute.
>>>      (VSFDF): Make iterator conditional.
>>>      * internal-fn.def (FTRUNC_INT): New IFN.
>>>      * internal-fn.cc (ftrunc_int_direct): New define.
>>>      (expand_ftrunc_int_optab_fn): New custom expander.
>>>      (direct_ftrunc_int_optab_supported_p): New supported_p.
>>>      * internal-fn.h (direct_internal_fn_info): Add new member
>>>      type1_is_scalar_p.
>>>      * match.pd: Add to the existing TRUNC pattern match.
>>>     

Re: [PATCH] libstdc++: Enable building libstdc++.{a,so} when !HOSTED


Jonathan Wakely  writes:

> On Thu, 20 Oct 2022 at 16:53, Arsen Arsenović via Libstdc++
>  wrote:
>>
>> This enables us to provide symbols for placeholders and numeric limits,
>
>
> I'm not convinced this is worth doing.
>
> The placeholders and the numeric_limits members are all inline
> variables for C++17 and later, and C++17 is the compiler's default
> mode. The placeholders aren't even required to exist for freestanding
> prior to C++23. For the numeric_limits definitions, I suppose it is a
> problem that users can't take their address in freestanding today
> unless they compile as C++17.
>
>> and allows users to mess about with linker flags less.
>
> i.e. they don't have to use -nostdlib and/or link with gcc -lsupc++,
> but can just use g++?
>
> That seems more compelling than providing definitions of the
> placeholders and limits members.

Indeed, with just a few more changes that I didn't get the chance to
polish up before S3 (setting the MATH_LIBRARY target macro to "" and
disabling linking crt0 when building --without-headers
--without-newlib), and the other patches I did submit, I could get a
freestanding test program building with just -nolibc:

  i686-elf-g++ -ffreestanding -ggdb3 -O2 -fno-omit-frame-pointer \
-std=gnu++20 -T ../linkscript.ld -nolibc \
-Wl,-z,max-page-size=0x1000 -o frello frello-main.o frello-abort.o \
-frello-support.o frello-memory.o frello-logging.o frello-assert.o \
frello-entry.o

Most of these flags serve unrelated purposes, and the linker script is
the default one for this configuration, with some sections shifted
about.

Have a great evening!
-- 
Arsen Arsenović




[PATCH v3] c, analyzer: support named constants in analyzer [PR106302]

On Mon, 2022-11-14 at 15:42 -0500, Marek Polacek wrote:
> On Fri, Nov 11, 2022 at 10:23:10PM -0500, David Malcolm wrote:
> > Changes since v1: ported the doc changes from texinfo to sphinx
> > 
> > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > 
> > Are the C frontend parts OK for trunk?  (I can self-approve the
> > analyzer parts)
> 
> Sorry for the delay.
>  
> > The patch adds an interface for frontends to call into the analyzer
> > as
> > the translation unit finishes.  The analyzer can then call back
> > into the
> > frontend to ask about the values of the named constants it cares
> > about
> > whilst the frontend's data structures are still around.
> > 
> > The patch implements this for the C frontend, which looks up the
> > names
> > by looking for named CONST_DECLs (which handles enum values). 
> > Failing
> > that, it attempts to look up the values of macros but only the
> > simplest
> > cases are supported (a non-traditional macro with a single
> > CPP_NUMBER
> > token).  It does this by building a buffer containing the macro
> > definition and rerunning a lexer on it.
> > 
> > The analyzer gracefully handles the cases where named values aren't
> > found (such as anything more complicated than described above).
> > 
> > The patch ports the analyzer to use this mechanism for "O_RDONLY",
> > "O_WRONLY", and "O_ACCMODE".  I have successfully tested my socket
> > patch
> > to also use this for "SOCK_STREAM" and "SOCK_DGRAM", so the
> > technique
> > seems to work.
> 
> So this works well for code like
> 
> enum __socket_type {
> SOCK_STREAM = 1,
> 
> #define SOCK_STREAM SOCK_STREAM
> };
> 
> ?

Yes: c_translation_unit::lookup_constant_by_id does the "lookup_name"
first, and this finds the CONST_DECL, so it doesn't need to look at
macros for this case.
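
For readers who want the lookup order spelled out, here is a standalone
sketch of the idea, not the frontend code: tu_sketch and both function
names are hypothetical, and plain maps stand in for the frontend's decl
and macro tables.  Declarations are consulted first, and a macro is only
accepted if it expands to exactly one numeric token.

```cpp
#include <cstdlib>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model of a translation unit: enum constants by name, and
// object-like macros stored as their list of tokens.
struct tu_sketch
{
  std::unordered_map<std::string, long> enum_constants;
  std::unordered_map<std::string, std::vector<std::string>> macros;
};

// Parse TOKEN as one integer literal (decimal, octal or hex); anything
// else is rejected.
inline std::optional<long>
parse_number_token (const std::string &token)
{
  if (token.empty () || token[0] < '0' || token[0] > '9')
    return std::nullopt;   // rejects "-1", "(1)", identifiers, ...
  char *end = nullptr;
  long value = std::strtol (token.c_str (), &end, 0);
  if (*end != '\0')
    return std::nullopt;   // suffixes and other trailing text: not handled
  return value;
}

inline std::optional<long>
lookup_constant_by_name (const tu_sketch &tu, const std::string &name)
{
  // Declarations first: an enum value wins over a same-named macro.
  auto decl = tu.enum_constants.find (name);
  if (decl != tu.enum_constants.end ())
    return decl->second;

  // Then macros, but only those expanding to exactly one numeric token.
  auto macro = tu.macros.find (name);
  if (macro != tu.macros.end () && macro->second.size () == 1)
    return parse_number_token (macro->second[0]);

  return std::nullopt;
}
```

With SOCK_STREAM present both as an enum value and as a self-referential
macro, the first branch returns the enum value and the macro is never
examined.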

I've added a testcase for this in the v3 patch (gcc.dg/analyzer/named-
constants-via-enum-and-macro.c)

> 
> > diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
> > index d70697b1d63..efe19fbe70b 100644
> > --- a/gcc/c/c-parser.cc
> > +++ b/gcc/c/c-parser.cc
> > @@ -72,6 +72,8 @@ along with GCC; see the file COPYING3.  If not
> > see
> >  #include "memmodel.h"
> >  #include "c-family/known-headers.h"
> >  #include "bitmap.h"
> > +#include "analyzer/analyzer-language.h"
> > +#include "toplev.h"
> >  
> >  /* We need to walk over decls with incomplete struct/union/enum
> > types
> > after parsing the whole translation unit.
> > @@ -1662,6 +1664,87 @@ static bool
> > c_parser_objc_diagnose_bad_element_prefix
> >(c_parser *, struct c_declspecs *);
> >  static location_t c_parser_parse_rtl_body (c_parser *, char *);
> >  
> > +#if ENABLE_ANALYZER
> > +
> > +namespace ana {
> > +
> > +/* Concrete implementation of ana::translation_unit for the C
> > frontend.  */
> > +
> > +class c_translation_unit : public translation_unit
> > +{
> > +public:
> > +  /* Implementation of translation_unit::lookup_constant_by_id for
> > use by the
> > + analyzer to look up named constants in the user's source
> > code.  */
> > +  tree lookup_constant_by_id (tree id) const final override
> > +  {
> > +/* Consider decls.  */
> > +if (tree decl = lookup_name (id))
> > +  if (TREE_CODE (decl) == CONST_DECL)
> > +   if (tree value = DECL_INITIAL (decl))
> > + if (TREE_CODE (value) == INTEGER_CST)
> > +   return value;
> > +
> > +/* Consider macros.  */
> > +cpp_hashnode *hashnode = C_CPP_HASHNODE (id);
> > +if (cpp_macro_p (hashnode))
> > +  if (tree value = consider_macro (hashnode->value.macro))
> > +   return value;
> > +
> > +return NULL_TREE;
> > +  }
> > +
> > +private:
> > +  /* Attempt to get an INTEGER_CST from MACRO.
> > + Only handle the simplest cases: where MACRO's definition is a
> > single
> > + token containing a number, by lexing the number again.
> > + This will handle e.g.
> > +   #define NAME 42
> > + and other bases but not negative numbers, parentheses or e.g.
> > +   #define NAME 1 << 7
> > + as doing so would require a parser.  */
> > +  tree consider_macro (cpp_macro *macro) const
> > +  {
> > +if (macro->paramc > 0)
> > +  return NULL_TREE;
> > +if (macro->kind == cmk_traditional)
> 
> Do you really want to handle cmk_assert?  I'd say you want
> 
>   if (macro->kind != cmk_macro)

Thanks; fixed in the v3 patch.

> 
> > +  return NULL_TREE;
> > +if (macro->count != 1)
> > +  return NULL_TREE;
> > +const cpp_token &tok = macro->exp.tokens[0];
> > +if (tok.type != CPP_NUMBER)
> > +  return NULL_TREE;
> > +
> > +cpp_reader *old_parse_in = parse_in;
> > +parse_in = cpp_create_reader (c_dialect_cxx () ? CLK_GNUCXX:
> > CLK_GNUC89,
> > + ident_hash, line_table);
> 
> Why not always CLK_GNUC89 since we're in the C FE?

Fixed (I was copying and pasting from a selftest in input.c, IIRC).

> 
> > +
> > +pretty_printer pp;
> > +pp_string (&pp, (const char *)tok.val.s

Re: [PATCH v3] c, analyzer: support named constants in analyzer [PR106302]

On Tue, Nov 15, 2022 at 01:35:05PM -0500, David Malcolm wrote:
> On Mon, 2022-11-14 at 15:42 -0500, Marek Polacek wrote:
> > On Fri, Nov 11, 2022 at 10:23:10PM -0500, David Malcolm wrote:
> > > Changes since v1: ported the doc changes from texinfo to sphinx
> > > 
> > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > > 
> > > Are the C frontend parts OK for trunk?  (I can self-approve the
> > > analyzer parts)
> > 
> > Sorry for the delay.
> >  
> > > The patch adds an interface for frontends to call into the analyzer
> > > as
> > > the translation unit finishes.  The analyzer can then call back
> > > into the
> > > frontend to ask about the values of the named constants it cares
> > > about
> > > whilst the frontend's data structures are still around.
> > > 
> > > The patch implements this for the C frontend, which looks up the
> > > names
> > > by looking for named CONST_DECLs (which handles enum values). 
> > > Failing
> > > that, it attempts to look up the values of macros but only the
> > > simplest
> > > cases are supported (a non-traditional macro with a single
> > > CPP_NUMBER
> > > token).  It does this by building a buffer containing the macro
> > > definition and rerunning a lexer on it.
> > > 
> > > The analyzer gracefully handles the cases where named values aren't
> > > found (such as anything more complicated than described above).
> > > 
> > > The patch ports the analyzer to use this mechanism for "O_RDONLY",
> > > "O_WRONLY", and "O_ACCMODE".  I have successfully tested my socket
> > > patch
> > > to also use this for "SOCK_STREAM" and "SOCK_DGRAM", so the
> > > technique
> > > seems to work.
> > 
> > So this works well for code like
> > 
> > enum __socket_type {
> > SOCK_STREAM = 1,
> > 
> > #define SOCK_STREAM SOCK_STREAM
> > };
> > 
> > ?
> 
> Yes: c_translation_unit::lookup_constant_by_id does the "lookup_name"
> first, and this finds the CONST_DECL, so it doesn't need to look at
> macros for this case.

Ah, nice.
 
> I've added a testcase for this in the v3 patch (gcc.dg/analyzer/named-
> constants-via-enum-and-macro.c)

Thanks.

> Thanks for the review.  Here's a v3 version of the patch.
> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> 
> Are the C FE parts OK for trunk?

Ok, thanks.
 

Marek



[Patch] nvptx/mkoffload.cc: Fix "$nohost" check


Found when working on real reverse offload - as
the reverse-offload stub function was added to the reverse-offload table.
Reason - as mentioned in the commit log: lhd_set_decl_assembler_name.

I intend to commit it tomorrow as obvious, unless there are further
comments.

Tobias
-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 
Munich; limited liability company; Managing Directors: Thomas 
Heurung, Frank Thürauf; Registered office: Munich; Registry court: 
Munich, HRB 106955
nvptx/mkoffload.cc: Fix "$nohost" check

If lhd_set_decl_assembler_name is invoked - in particular if
!TREE_PUBLIC (decl) && !DECL_FILE_SCOPE_P (decl) - the '.nohost' suffix
might change to '.nohost.2'. This happens for the existing reverse offload
testcases via cgraph_node::analyze and is a side effect of
r13-3455-g178ac530fe67e4f2fc439cc4ce89bc19d571ca31 for some reason.

The solution is to not only check for a trailing '$nohost' but also for
'$nohost$' in nvptx/mkoffload.cc.

gcc/ChangeLog:

	* config/nvptx/mkoffload.cc (process): Recognize '$nohost$...'
	besides trailing '$nohost' as being for reverse offload.

 gcc/config/nvptx/mkoffload.cc | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 854cd72f3c7..5d89ba8a788 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -364,7 +364,8 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 	 Alternatively, besides searching for 'BEGIN FUNCTION DECL',
 	 checking for '.visible .entry ' + id->ptx_name would be
 	 required.  */
-	  if (!endswith (id->ptx_name, "$nohost"))
+	  if (!endswith (id->ptx_name, "$nohost")
+	  && !strstr (id->ptx_name, "$nohost$"))
 	continue;
 	  fprintf (out, "\t\".extern ");
 	  const char *p = input + file_idx[fidx];
@@ -402,7 +403,8 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
 		"$offload_func_table[] = {");
   for (comma = "", id = func_ids; id; comma = ",", id = id->next)
 	fprintf (out, "%s\"\n\t\t\"%s", comma,
-		 endswith (id->ptx_name, "$nohost") ? id->ptx_name : "0");
+		 (endswith (id->ptx_name, "$nohost")
+		  || strstr (id->ptx_name, "$nohost$")) ? id->ptx_name : "0");
   fprintf (out, "};\\n\";\n\n");
 }
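
For illustration only, the combined check can be read as the following
standalone predicate.  Both helper names are hypothetical (the patch
spells the test out inline, using endswith and strstr), and
std::string_view stands in for the C strings used in mkoffload.cc.

```cpp
#include <string_view>

// Hypothetical helper mirroring what endswith () checks in the patch.
inline bool
ends_with (std::string_view str, std::string_view suffix)
{
  return str.size () >= suffix.size ()
         && str.substr (str.size () - suffix.size ()) == suffix;
}

// True if PTX_NAME looks like a reverse-offload function: either it ends
// in "$nohost", or it contains "$nohost$" (e.g. a "$nohost$2" clone name).
inline bool
is_reverse_offload_name (std::string_view ptx_name)
{
  return ends_with (ptx_name, "$nohost")
         || ptx_name.find ("$nohost$") != std::string_view::npos;
}
```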
 


Re: why does gcc jit require pthread?

[Fixing typo in the Subject ("git" -> "jit" ); CCing jit mailing list]

On Fri, 2022-11-11 at 17:16 +0000, Jonathan Wakely wrote:
> On Mon, 7 Nov 2022 at 13:51, Jonathan Wakely wrote:
> > 
> > On Mon, 7 Nov 2022 at 13:33, LIU Hao wrote:
> > > 
> > > On 2022-11-07 20:57, Jonathan Wakely wrote:
> > > > It would be a lot nicer if playback::context met the C++
> > > > Lockable
> > > > requirements, and playback::context::compile () could just take
> > > > a
> > > > scoped lock on *this:
> > > > 
> > > > 
> > > 
> > > Yeah yeah that makes a lot of sense. Would you please just commit
> > > that? I don't have write access to
> > > GCC repo, and it takes a couple of hours for me to bootstrap GCC
> > > just for this tiny change.
> > 
> > Somebody else needs to approve it first. I'll combine our patches
> > and
> > test and submit it properly for approval.
> 
> Here's a complete patch that actually builds now, although I'm seeing
> a stage 2 vs stage 3 comparison error which I don't have time to look
> into right now.

I confess that I'm not familiar with C++11's mutex and locking types,
but having read through the relevant entries on cppreference.com, the
patch looks correct to me.

Are these classes well-supported on the minimum compiler version we
support?  (Jonathan, I defer to your judgement here)

Jonathan: you said in your followup email that it "bootstraps and
passes testing on x86_64-pc-linux-gnu (CentOS 8 Stream)".  This is
possibly a silly question, but did this testing include the jit
testsuite?  A gotcha here is that --enable-languages=all does *not*
enable jit.

The patch is OK for trunk if you have favorable answers for the above
two questions.

Thanks!
Dave



Re: why does gcc jit require pthread?

On Tue, 15 Nov 2022 at 18:50, David Malcolm  wrote:
>
> [Fixing typo in the Subject ("git" -> "jit" ); CCing jit mailing list]
>
> On Fri, 2022-11-11 at 17:16 +0000, Jonathan Wakely wrote:
> > On Mon, 7 Nov 2022 at 13:51, Jonathan Wakely wrote:
> > >
> > > On Mon, 7 Nov 2022 at 13:33, LIU Hao wrote:
> > > >
> > > > On 2022-11-07 20:57, Jonathan Wakely wrote:
> > > > > It would be a lot nicer if playback::context met the C++
> > > > > Lockable
> > > > > requirements, and playback::context::compile () could just take
> > > > > a
> > > > > scoped lock on *this:
> > > > >
> > > > >
> > > >
> > > > Yeah yeah that makes a lot of sense. Would you please just commit
> > > > that? I don't have write access to
> > > > GCC repo, and it takes a couple of hours for me to bootstrap GCC
> > > > just for this tiny change.
> > >
> > > Somebody else needs to approve it first. I'll combine our patches
> > > and
> > > test and submit it properly for approval.
> >
> > Here's a complete patch that actually builds now, although I'm seeing
> > a stage 2 vs stage 3 comparison error which I don't have time to look
> > into right now.
>
> I confess that I'm not familiar with C++11's mutex and locking types,
> but having read through the relevant entries on cppreference.com, the
> patch looks correct to me.
>
> Are these classes well-supported on the minimum compiler version we
> support?  (Jonathan, I defer to your judgement here)

std::mutex has been supported since 4.4.0 and is very simple. The
implementation on trunk is identical to the one in gcc 4.8.5 except
for adding 'noexcept' to mutex::native_handle (), which is not
relevant to this change.
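
For anyone not familiar with the idiom discussed upthread, here is a
minimal sketch of a class meeting the Lockable requirements and taking a
scoped lock on *this.  It is not the jit patch: context_sketch is a
made-up class, and the per-object mutex and the counter exist only to
keep the example self-contained.

```cpp
#include <mutex>

// Hypothetical stand-in for playback::context; only the locking is shown.
class context_sketch
{
public:
  // Lockable requirements: lock, unlock and try_lock.
  void lock () { m_mutex.lock (); }
  void unlock () { m_mutex.unlock (); }
  bool try_lock () { return m_mutex.try_lock (); }

  int compile ()
  {
    // The guard locks *this here and unlocks it on every exit path,
    // including early returns and exceptions.
    std::lock_guard<context_sketch> guard (*this);
    return ++m_compile_count;
  }

private:
  std::mutex m_mutex;
  int m_compile_count = 0;
};
```

Only lock and unlock are needed for std::lock_guard; try_lock is what
takes the class from BasicLockable to Lockable.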

> Jonathan: you said in your followup email that it "bootstraps and
> passes testing on x86_64-pc-linux-gnu (CentOS 8 Stream)".  This is
> possibly a silly question, but did this testing include the jit
> testsuite?  A gotcha here is that --enable-languages=all does *not*
> enable jit.

Yes, I built with --enable-languages=c,c++,jit --enable-host-shared

> The patch is OK for trunk if you have favorable answers for the above
> two questions.
>
> Thanks!
> Dave
>


Re: [PATCH v2] analyzer: add warnings relating to sockets [PR106140]

On Fri, 2022-11-11 at 22:27 -0500, David Malcolm wrote:
> Changed in v2: ported doc changes from texinfo to sphinx
> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> 
> I can self-approve this patch, but it depends on the named constants
> patch here:
>   * [PATCH v2] c, analyzer: support named constants in analyzer
> [PR106302]
>     *
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605835.html
> which requires review of the C frontend changes.

Marek approved v3 of the named constants patch (thanks!), so I've now
committed both that, and this (ported back to texinfo from sphinx) to
trunk, as:
  r13-4073-gd8aba860b34203
and
  r13-4074-g86a90006864840
respectively.

Dave



[committed] bpf: avoid possible use of uninitialized variable

Fix a maybe-uninitialized warning introduced in commit:
068baae1864 bpf: add preserve_field_info builtin

Thanks to Jan-Benedict Glaw for pointing this out.

Tested on bpf-unknown-none, committed as obvious.

gcc/

* config/bpf/bpf.cc (bpf_expand_builtin): Avoid use of uninitialized
variable in error case.
---
 gcc/config/bpf/bpf.cc | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index 16af2412bf6..51e46955015 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -1254,11 +1254,14 @@ bpf_expand_builtin (tree exp, rtx target 
ATTRIBUTE_UNUSED,
   /* A resolved overloaded __builtin_preserve_field_info.  */
   tree src = CALL_EXPR_ARG (exp, 0);
   tree kind_tree = CALL_EXPR_ARG (exp, 1);
-  unsigned HOST_WIDE_INT kind_val;
+  unsigned HOST_WIDE_INT kind_val = 0;
   if (tree_fits_uhwi_p (kind_tree))
kind_val = tree_to_uhwi (kind_tree);
   else
-   error ("invalid argument to built-in function");
+   {
+ error ("invalid argument to built-in function");
+ return expand_normal (error_mark_node);
+   }
 
   enum btf_core_reloc_kind kind = (enum btf_core_reloc_kind) kind_val;
 
-- 
2.37.2



[PATCH] RISC-V uninit-pred-9_b.c failure

The gimple generated by riscv is identical to that of powerpc64
currently. It seems like the change at 
r12-4790-4b3a325f07acebf47e82de227ce1d5ba62f5bcae also affected riscv64 like 
powerpc64 and cris.

gcc/testsuite/ChangeLog:

* gcc.dg/uninit-pred-9_b.c: Xfail for riscv64
---
 gcc/testsuite/gcc.dg/uninit-pred-9_b.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/uninit-pred-9_b.c 
b/gcc/testsuite/gcc.dg/uninit-pred-9_b.c
index 53c4a5399ea..843f5323713 100644
--- a/gcc/testsuite/gcc.dg/uninit-pred-9_b.c
+++ b/gcc/testsuite/gcc.dg/uninit-pred-9_b.c
@@ -17,7 +17,7 @@ int foo (int n, int l, int m, int r)
 
   if (l > 100)
 if ( (n <= 9) &&  (m < 100)  && (r < 19) )
-  blah(v); /* { dg-bogus "uninitialized" "bogus warning" { xfail 
powerpc64*-*-* cris-*-* } } */
+  blah(v); /* { dg-bogus "uninitialized" "bogus warning" { xfail 
powerpc64*-*-* cris-*-* riscv64*-*-* } } */
 
   if ( (n <= 8) &&  (m < 99)  && (r < 19) )
   blah(v); /* { dg-bogus "uninitialized" "pr101674" { xfail mmix-*-* } } */
-- 
2.25.1



Re: [PATCH] RISC-V uninit-pred-9_b.c failure




On 11/15/22 12:08, Kevin Lee wrote:

The gimple generated by riscv is identical to that of powerpc64
currently. It seems like the change at 
r12-4790-4b3a325f07acebf47e82de227ce1d5ba62f5bcae also affected riscv64 like 
powerpc64 and cris.

gcc/testsuite/ChangeLog:

* gcc.dg/uninit-pred-9_b.c: Xfail for riscv64


Note that if we start adjusting BRANCH_COST then this may need further 
refinement, but I think it's fine for now.



OK

jeff




Re: [PATCH Rust front-end v3 01/46] Use DW_ATE_UTF for the Rust 'char' type

Mark Wielaard  writes:

> https://code.wildebeest.org/git/user/mjw/gccrs/commit/?h=no-Rust-old
> if someone wants to push that, to merge for a v4.

Sorry, missed that part, taking care of merging it right now :)

https://github.com/Rust-GCC/gccrs/pull/1649

Thanks,
Marc


[PATCH] Fortran: ICE in simplification of array expression involving power [PR107680]

Dear all,

when constant expressions involve parentheses, array constructors,
typespecs, and the power operator (**), we could fail with an ICE
during simplification in arith_power.

Debugging of the testcase showed we call the proper type conversions
needed for the arithmetic operation, but under certain circumstances
we seem to lose the typespec on the way to the invocation of the
simplification.  We then run into unhandled combinations of operand
types.

The attached patch is likely a sort of a band-aid to the problem:
we check the operand types in arith_power, and if we see that a
conversion is (still) needed, we punt and defer the simplification.
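
Reduced to a standalone sketch, the check amounts to the following.  The
enums are cut-down, hypothetical versions of gfortran's, and
check_power_operands is a made-up name; in the patch the test sits
directly in arith_power.

```cpp
// Hypothetical, reduced versions of gfortran's enums, for illustration.
enum basic_type_sketch { BT_INTEGER, BT_REAL, BT_COMPLEX };
enum arith_sketch { ARITH_OK, ARITH_NOT_REDUCED };

// The result takes op1's type.  Return ARITH_NOT_REDUCED when that type
// could not hold op1 ** op2, i.e. an operand conversion is still pending.
inline arith_sketch
check_power_operands (basic_type_sketch op1, basic_type_sketch op2)
{
  if ((op1 == BT_INTEGER && op2 != BT_INTEGER)
      || (op1 == BT_REAL && op2 == BT_COMPLEX))
    return ARITH_NOT_REDUCED;
  return ARITH_OK;
}
```

So integer ** real and real ** complex are deferred, while real ** integer
or complex ** real proceed to simplification as before.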

AFAICT this is safe.  It does not address a possibly deeper, hidden
issue in gfortran, which was suspected when analyzing pr107000.
But as that issue is elusive, it may be hard to locate
and fix.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From efe9dafabc2e2cc2dab079dfa3be3d09b3471c0f Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Tue, 15 Nov 2022 21:20:20 +0100
Subject: [PATCH] Fortran: ICE in simplification of array expression involving
 power [PR107680]

gcc/fortran/ChangeLog:

	PR fortran/107680
	* arith.cc (arith_power): Check that operands are properly converted
	before attempting to simplify.

gcc/testsuite/ChangeLog:

	PR fortran/107680
	* gfortran.dg/pr107680.f90: New test.
---
 gcc/fortran/arith.cc                   |  7 +++++++
 gcc/testsuite/gfortran.dg/pr107680.f90 | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/pr107680.f90

diff --git a/gcc/fortran/arith.cc b/gcc/fortran/arith.cc
index fc9224ebc5c..c4ab75b401c 100644
--- a/gcc/fortran/arith.cc
+++ b/gcc/fortran/arith.cc
@@ -845,6 +845,13 @@ arith_power (gfc_expr *op1, gfc_expr *op2, gfc_expr **resultp)
   if (!gfc_numeric_ts (&op1->ts) || !gfc_numeric_ts (&op2->ts))
 return ARITH_INVALID_TYPE;

+  /* The result type is derived from op1 and must be compatible with the
+ result of the simplification.  Otherwise postpone simplification until
+ after operand conversions usually done by gfc_type_convert_binary.  */
+  if ((op1->ts.type == BT_INTEGER && op2->ts.type != BT_INTEGER)
+  || (op1->ts.type == BT_REAL && op2->ts.type == BT_COMPLEX))
+return ARITH_NOT_REDUCED;
+
   rc = ARITH_OK;
   result = gfc_get_constant_expr (op1->ts.type, op1->ts.kind, &op1->where);

diff --git a/gcc/testsuite/gfortran.dg/pr107680.f90 b/gcc/testsuite/gfortran.dg/pr107680.f90
new file mode 100644
index 000..4ed431eb06f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr107680.f90
@@ -0,0 +1,34 @@
+! { dg-do compile }
+! { dg-options "-fdump-tree-original" }
+! PR fortran/107680 - ICE in arith_power
+! Contributed by G.Steinmetz
+
+program p
+  real,parameter :: x(*) = [real:: ([1])]   **  2.0
+  complex, parameter :: y(*) = [real:: ([1])]   ** (2.0,1.0)
+  complex, parameter :: z(*) = [complex :: ([1])]   ** (2.0,1.0)
+  complex, parameter :: u(*) = [complex :: ([1.0])] ** (2.0,1.0)
+  complex, parameter :: v(*) = [real:: ([(1.0,2.0)])] ** (3.0,1.0)
+  complex, parameter :: w(*) = [integer :: ([(1.0,2.0)])] ** (3.0,1.0)
+  print *, [real:: ([3])]   **  2
+  print *, [real:: ([3])]   **  2.0
+  print *, [real:: ([1])]   ** (1.0,2.0)
+  print *, [real:: ([1.0])] ** (1.0,2.0)
+  print *, [complex :: ([3])]   **  2
+  print *, [complex :: ([3])]   **  2.0
+  print *, [complex :: ([1])]   ** (1.0,2.0)
+  print *, [complex :: ([1.0])] ** (1.0,2.0)
+  print *, [integer :: ([3.0])] **  2
+  print *, [integer :: ([3.0])] **  2.0
+  print *, [integer :: ([1.0])] ** (1.0,2.0)
+  print *, [integer :: ([(1.0,2.0)])] ** (3.0,1.0)
+  print *, v(1)
+  if (u(1) /= 1) stop 1
+  if (v(1) /= 1) stop 2
+  if (w(1) /= 1) stop 3
+  if (x(1) /= 1) stop 4
+  if (y(1) /= 1) stop 5
+  if (z(1) /= 1) stop 6
+end
+
+! { dg-final { scan-tree-dump-not "_gfortran_stop_numeric" "original" } }
--
2.35.3



Re: [PATCH] RISC-V: Use .p2align for code-alignment

Jeff,

On Tue, 15 Nov 2022 at 17:37, Jeff Law  wrote:
>
>
> On 11/13/22 13:41, Philipp Tomsich wrote:
>
> RISC-V's .p2align (currently) ignores the max-skip argument.  As we
> have experimental patches underway to address this in a
> backwards-compatible manner, let's prepare GCC for the day when
> binutils gets updated.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Implement.
>
>
> What are the implications if we start using p2align immediately, given the 
> current (broken?) state of the assembler?  I'm pretty sure configure is 
> already turning on HAVE_GAS_MAX_SKIP_P2ALIGN.  From a native risc-v build:
>
>
> auto-host.h:#define HAVE_GAS_MAX_SKIP_P2ALIGN 1

This is your tree, which has the partial fix for .p2align (i.e., the best
we can do without breaking backward compatibility).
When building against upstream binutils, this should not be defined.
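
For context, the max-skip behaviour that GAS documents for .p2align can be
sketched as below.  This is a model of the documented semantics, not
assembler code, and p2align_padding is a made-up name.

```cpp
#include <cstdint>

// Padding in bytes that a ".p2align log2_align, , max_skip" request would
// emit at ADDRESS under the documented GAS semantics; 0 means no padding.
inline std::uint64_t
p2align_padding (std::uint64_t address, unsigned log2_align,
                 std::uint64_t max_skip)
{
  const std::uint64_t align = std::uint64_t (1) << log2_align;
  const std::uint64_t padding
    = (align - (address & (align - 1))) & (align - 1);
  // If more than max_skip bytes would be needed, the alignment is not
  // done at all.
  return padding <= max_skip ? padding : 0;
}
```

An assembler that ignores the third argument would always return the full
padding, e.g. 12 bytes at 0x1004 for a 16-byte alignment even with a
max-skip of 7.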

Philipp.


Re: [PATCH] c++: Fix up calls to static operator() or operator[] [PR107624]


On 11/15/22 02:28, Jakub Jelinek wrote:

Hi!

On Mon, Nov 14, 2022 at 06:29:44PM -0500, Jason Merrill wrote:

Indeed.  The code in build_new_method_call for this case has the comment

   /* In an expression of the form `a->f()' where `f' turns
  out to be a static member function, `a' is
  none-the-less evaluated.  */
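
The language rule behind that comment can be seen in a small standalone
example (C++17, nothing GCC-specific; all names are made up): calling a
static member function through an object expression still evaluates that
expression.

```cpp
struct widget_sketch
{
  static int f () { return 42; }
};

// Counts how often the object expression below is evaluated.
inline int evaluations = 0;

inline widget_sketch *
get_widget ()
{
  static widget_sketch w;
  ++evaluations;
  return &w;
}

inline int
call_through_object ()
{
  // f is static, so no 'this' is passed, yet get_widget () is still called.
  return get_widget ()->f ();
}
```

The static operator() and operator[] cases in this thread need the same
treatment: the side effects of the object expression have to be kept even
though the object itself is not used by the call.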


Had to tweak 3 spots for this.  Furthermore, found that if in non-pedantic
C++20 compilation static operator[] is accepted, we required that it has 2
arguments, I think it is better to require exactly one because that case
is the only one that will actually work in C++20 and older.

Lightly tested so far, ok for trunk if it passes bootstrap/regtest?

Or do you want to outline the
  if (result != error_mark_node
  && TREE_CODE (TREE_TYPE (cand->fn)) != METHOD_TYPE
  && TREE_SIDE_EFFECTS (obj))
{
  /* But avoid the implicit lvalue-rvalue conversion when 'a'
 is volatile.  */
  tree a = obj;
  if (TREE_THIS_VOLATILE (a))
a = build_this (a);
  if (TREE_SIDE_EFFECTS (a))
result = build2 (COMPOUND_EXPR, TREE_TYPE (result), a, result);
}
part that is now repeated 4 times to some helper function?  If yes,
any suggestion on a good name?


Please.  Maybe keep_unused_object_arg?


2022-11-15  Jakub Jelinek  

PR c++/107624
* call.cc (build_op_call): If obj has side-effects
and operator() is static member function, return COMPOUND_EXPR
with the obj side-effects other than reading from volatile
object.
(build_op_subscript): Likewise.
(build_new_op): Similarly for ARRAY_REF, just for arg1 rather than
obj.
* decl.cc (grok_op_properties): For C++20 and earlier, if operator[]
is static member function, require exactly one parameter rather than
exactly two parameters.

* g++.dg/cpp23/static-operator-call4.C: New test.
* g++.dg/cpp23/subscript10.C: New test.
* g++.dg/cpp23/subscript11.C: New test.

--- gcc/cp/call.cc.jj   2022-11-15 07:59:57.337231337 +0100
+++ gcc/cp/call.cc  2022-11-15 13:02:33.369531156 +0100
@@ -5137,7 +5137,24 @@ build_op_call (tree obj, vec<tree, va_gc
   if (TREE_CODE (cand->fn) == FUNCTION_DECL
   && DECL_OVERLOADED_OPERATOR_P (cand->fn)
   && DECL_OVERLOADED_OPERATOR_IS (cand->fn, CALL_EXPR))
-   result = build_over_call (cand, LOOKUP_NORMAL, complain);
+   {
+ result = build_over_call (cand, LOOKUP_NORMAL, complain);
+ /* In an expression of the form `a()' where cand->fn
+which is operator() turns out to be a static member function,
+`a' is none-the-less evaluated.  */
+ if (result != error_mark_node
+ && TREE_CODE (TREE_TYPE (cand->fn)) != METHOD_TYPE
+ && TREE_SIDE_EFFECTS (obj))
+   {
+ /* But avoid the implicit lvalue-rvalue conversion when 'a'
+is volatile.  */
+ tree a = obj;
+ if (TREE_THIS_VOLATILE (a))
+   a = build_this (a);
+ if (TREE_SIDE_EFFECTS (a))
+   result = build2 (COMPOUND_EXPR, TREE_TYPE (result), a, result);
+   }
+   }
else
{
  if (TREE_CODE (cand->fn) == FUNCTION_DECL)
@@ -7046,6 +7063,24 @@ build_new_op (const op_location_t &loc,
  gcc_unreachable ();
}
}
+
+ /* In an expression of the form `a[]' where cand->fn
+which is operator[] turns out to be a static member function,
+`a' is none-the-less evaluated.  */
+ if (code == ARRAY_REF
+ && result
+ && result != error_mark_node
+ && TREE_CODE (TREE_TYPE (cand->fn)) != METHOD_TYPE
+ && TREE_SIDE_EFFECTS (arg1))
+   {
+ /* But avoid the implicit lvalue-rvalue conversion when 'a'
+is volatile.  */
+ tree a = arg1;
+ if (TREE_THIS_VOLATILE (a))
+   a = build_this (a);
+ if (TREE_SIDE_EFFECTS (a))
+   result = build2 (COMPOUND_EXPR, TREE_TYPE (result), a, result);
+   }
}
else
{
@@ -7302,6 +7337,24 @@ build_op_subscript (const op_location_t
  /* Specify evaluation order as per P0145R2.  */
  CALL_EXPR_ORDERED_ARGS (call) = op_is_ordered (ARRAY_REF) == 1;
}
+
+ /* In an expression of the form `a[]' where cand->fn
+which is operator[] turns out to be a static member function,
+`a' is none-the-less evaluated.  */
+ if (result
+ && result != error_mark_node
+ && TREE_CODE (TREE_TYPE (cand->fn)) != METHOD_TYPE
+ && TREE_SIDE_EFFECTS (obj))
+   {
+ /* But avoid the implicit lvalue-rvalue conversion when 'a'
+is volatile

Re: [PATCH] c++, v2: Allow attributes on concepts - DR 2428


On 11/14/22 22:54, Jakub Jelinek wrote:

On Tue, Nov 15, 2022 at 09:54:00AM +0100, Jakub Jelinek via Gcc-patches wrote:

On Mon, Nov 14, 2022 at 07:00:54PM -0500, Jason Merrill wrote:

The following patch adds parsing of attributes to concept definition,
allows deprecated attribute to be specified (some ugliness needed
because CONCEPT_DECL is a cp/*.def tree code and so can't be mentioned
in c-family/ directly; used what is used for objc method decls,
an alternative would be a langhook)


Several of the codes in c-common.def are C++-only, you might just move it
over?


and checks TREE_DEPRECATED in
build_standard_check (not sure if that is the right spot, or whether
it shouldn't be checked also for variable and function concepts and
how to write testcase coverage for that).


I wouldn't bother with var/fn concepts, they're obsolete.


Ok, so like this?
The previous version passed bootstrap/regtest on x86_64-linux and i686-linux,
I'll of course test this one as well.


Better with a patch.  Sorry.


OK.


2022-11-15  Jakub Jelinek  

gcc/c-family/
* c-common.def (CONCEPT_DECL): New tree, moved here from
cp-tree.def.
* c-common.cc (c_common_init_ts): Handle CONCEPT_DECL.
* c-attribs.cc (handle_deprecated_attribute): Allow deprecated
attribute on CONCEPT_DECL.
gcc/cp/
* cp-tree.def (CONCEPT_DECL): Move to c-common.def.
* cp-objcp-common.cc (cp_common_init_ts): Don't handle CONCEPT_DECL
here.
* cp-tree.h (finish_concept_definition): Add ATTRS parameter.
* parser.cc (cp_parser_concept_definition): Parse attributes in
between identifier and =.  Adjust finish_concept_definition
caller.
* pt.cc (finish_concept_definition): Add ATTRS parameter.  Call
cplus_decl_attributes.
* constraint.cc (build_standard_check): If CONCEPT_DECL is
TREE_DEPRECATED, emit -Wdeprecated-declarations warnings.
gcc/testsuite/
* g++.dg/cpp2a/concepts-dr2428.C: New test.

--- gcc/c-family/c-common.def.jj	2022-10-14 09:28:27.975164491 +0200
+++ gcc/c-family/c-common.def   2022-11-15 09:34:01.384591076 +0100
@@ -81,6 +81,14 @@ DEFTREECODE (CONTINUE_STMT, "continue_st
 SWITCH_STMT_SCOPE, respectively.  */
  DEFTREECODE (SWITCH_STMT, "switch_stmt", tcc_statement, 4)
  
+/* Extensions for C++ Concepts. */
+
+/* Concept definition. This is not entirely different than a VAR_DECL
+   except that a) it must be a template, and b) doesn't have the wide
+   range of value and linkage options available to variables.  Used
+   by C++ FE and in c-family attribute handling.  */
+DEFTREECODE (CONCEPT_DECL, "concept_decl", tcc_declaration, 0)
+
  /*
  Local variables:
  mode:c
--- gcc/c-family/c-common.cc.jj 2022-11-13 12:29:08.165504692 +0100
+++ gcc/c-family/c-common.cc	2022-11-15 09:34:48.828950083 +0100
@@ -8497,6 +8497,8 @@ c_common_init_ts (void)
MARK_TS_EXP (FOR_STMT);
MARK_TS_EXP (SWITCH_STMT);
MARK_TS_EXP (WHILE_STMT);
+
+  MARK_TS_DECL_COMMON (CONCEPT_DECL);
  }
  
  /* Build a user-defined numeric literal out of an integer constant type VALUE

--- gcc/c-family/c-attribs.cc.jj	2022-11-14 13:35:34.184160348 +0100
+++ gcc/c-family/c-attribs.cc   2022-11-15 09:30:57.370081060 +0100
@@ -4211,7 +4211,8 @@ handle_deprecated_attribute (tree *node,
  || VAR_OR_FUNCTION_DECL_P (decl)
  || TREE_CODE (decl) == FIELD_DECL
  || TREE_CODE (decl) == CONST_DECL
- || objc_method_decl (TREE_CODE (decl)))
+ || objc_method_decl (TREE_CODE (decl))
+ || TREE_CODE (decl) == CONCEPT_DECL)
TREE_DEPRECATED (decl) = 1;
else if (TREE_CODE (decl) == LABEL_DECL)
{
--- gcc/cp/cp-tree.def.jj   2022-09-29 18:11:34.83800 +0200
+++ gcc/cp/cp-tree.def  2022-11-15 09:32:17.456996090 +0100
@@ -495,11 +495,6 @@ DEFTREECODE (OMP_DEPOBJ, "omp_depobj", t
  
  /* Extensions for Concepts. */
  
-/* Concept definition. This is not entirely different than a VAR_DECL
-   except that a) it must be a template, and b) doesn't have the wide
-   range of value and linkage options available to variables.  */
-DEFTREECODE (CONCEPT_DECL, "concept_decl", tcc_declaration, 0)
-
  /* Used to represent information associated with constrained declarations. */
  DEFTREECODE (CONSTRAINT_INFO, "constraint_info", tcc_exceptional, 0)
  
--- gcc/cp/cp-objcp-common.cc.jj	2022-09-30 18:38:55.349607203 +0200
+++ gcc/cp/cp-objcp-common.cc   2022-11-15 09:34:21.963313049 +0100
@@ -473,7 +473,6 @@ cp_common_init_ts (void)
/* New decls.  */
MARK_TS_DECL_COMMON (TEMPLATE_DECL);
MARK_TS_DECL_COMMON (WILDCARD_DECL);
-  MARK_TS_DECL_COMMON (CONCEPT_DECL);
  
MARK_TS_DECL_NON_COMMON (USING_DECL);
  
--- gcc/cp/cp-tree.h.jj	2022-11-15 08:17:07.561388452 +0100
+++ gcc/cp/cp-tree.h	2022-11-15 09:30:57.371081046 +0100
@@ -8324,7 +8324,7 @@ struct diagnosing_failed_constraint
  extern cp_expr finish_constraint_or_expr  (location_t, cp_expr, cp_expr);
  extern 

Re: [PATCH v2] c++: Disable -Wignored-qualifiers for template args [PR107492]


On 11/14/22 14:33, Marek Polacek wrote:

On Thu, Nov 03, 2022 at 03:22:12PM -0400, Jason Merrill wrote:

On 11/1/22 13:01, Marek Polacek wrote:

It seems wrong to issue a -Wignored-qualifiers warning for code like:

static_assert(!is_same_v<void(*)(), const void(*)()>);

because there the qualifier matters.  Likewise in template
specialization:

template<typename T> struct S { };
template<> struct S<void(*)()> { };
template<> struct S<const void(*)()> { }; // OK, not a redefinition

I'm of the mind that we should disable the warning for template
arguments, as in the patch below.


Hmm, I'm not sure why we would want to treat template arguments differently
from other type-ids.  Maybe only warn if funcdecl_p?


I think that makes sense.  There are other contexts in which cv-quals
matter, for instance trailing-return-type.


Well, technically they matter in all contexts, including function 
declaration:


const void f();
template  struct same;
template  struct same{};
same s;

but that case is much more likely to be a confused user, whereas in a
template context it's likely to be some deep magic.  :)
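
To make the "they matter in all contexts" point concrete, here is a
self-contained check (an illustration written for this digest, not the
snippet from the original mail): the cv-qualifier on the return type is
part of the function type, and of pointer-to-function types built from it.
The constant names are hypothetical.

```cpp
#include <type_traits>

// Declaration only; -Wignored-qualifiers still warns here after the patch.
const void f();

// The 'const' is part of f's function type...
constexpr bool fn_type_keeps_const =
    std::is_same<decltype(f), const void()>::value;

// ...so it is a different type from the unqualified one.
constexpr bool fn_type_differs =
    !std::is_same<decltype(f), void()>::value;

// The same holds for pointer-to-function template arguments,
// which is the case from the PR.
constexpr bool ptr_types_differ =
    !std::is_same<void (*)(), const void (*)()>::value;

static_assert(fn_type_keeps_const && fn_type_differs && ptr_types_differ,
              "return-type cv-qualifiers are part of the function type");
```

So the qualifier is never literally ignored by the type system; the warning
is a heuristic about likely user confusion, which is why restricting it to
function declarations seems reasonable.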



Updated patch below, plus I've extended the testcase.  Thanks,

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


-- >8 --
It seems wrong to issue a -Wignored-qualifiers warning for code like:

   static_assert(!is_same_v<void(*)(), const void(*)()>);

because there the qualifier matters.  Likewise in template
specialization:

   template<typename T> struct S { };
   template<> struct S<void(*)()> { };
   template<> struct S<const void(*)()> { }; // OK, not a redefinition

And likewise in other type-id contexts such as trailing-return-type:

   auto g() -> const void (*)();

This patch limits the warning to the function declaration context only.

PR c++/107492

gcc/cp/ChangeLog:

* decl.cc (grokdeclarator): Only emit a -Wignored-qualifiers warning
when funcdecl_p.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wignored-qualifiers3.C: New test.
---
  gcc/cp/decl.cc|  6 -
  .../g++.dg/warn/Wignored-qualifiers3.C| 24 +++
  2 files changed, 29 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/warn/Wignored-qualifiers3.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 890cfcabd35..67b9f24d7d6 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -13038,7 +13038,11 @@ grokdeclarator (const cp_declarator *declarator,
  
  	if (type_quals != TYPE_UNQUALIFIED)
  {
-   if (SCALAR_TYPE_P (type) || VOID_TYPE_P (type))
+   /* It's wrong, for instance, to issue a -Wignored-qualifiers
+  warning for
+   static_assert(!is_same_v<void(*)(), const void(*)()>);
+   because there the qualifier matters.  */
+   if (funcdecl_p && (SCALAR_TYPE_P (type) || VOID_TYPE_P (type)))
  warning_at (typespec_loc, OPT_Wignored_qualifiers, "type "
  "qualifiers ignored on function return type");
/* [dcl.fct] "A volatile-qualified return type is
diff --git a/gcc/testsuite/g++.dg/warn/Wignored-qualifiers3.C b/gcc/testsuite/g++.dg/warn/Wignored-qualifiers3.C
new file mode 100644
index 000..dedb38fc995
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wignored-qualifiers3.C
@@ -0,0 +1,24 @@
+// PR c++/107492
+// { dg-do compile { target c++14 } }
+// { dg-additional-options "-Wignored-qualifiers" }
+
+// Here the 'const' matters, so don't warn.
+template<typename T> struct S { };
+template<> struct S<void(*)()> { };
+template<> struct S<const void(*)()> { }; // { dg-bogus "ignored" }
+
+template<typename T, typename U> constexpr bool is_same_v = false;
+template<typename T> constexpr bool is_same_v<T, T> = true;
+
+static_assert( ! is_same_v< void(*)(), const void(*)() >, ""); // { dg-bogus "ignored" }
+
+// Here the 'const' matters as well -> don't warn.
+auto g() -> const void (*)(); // { dg-bogus "ignored" }
+auto g() -> const void (*)() { return nullptr; } // { dg-bogus "ignored" }
+
+// Here as well.
+const void (*h)() = static_cast<const void (*)()>(h); // { dg-bogus "ignored" }
+
+// But let's keep the warning here.
+const void f(); // { dg-warning "ignored" }
+const void f() { } // { dg-warning "ignored" }

base-commit: c41bbfcaf9d6ef5b57a7e89bba70b861c08a686b




Re: [PATCH] RISC-V: Zihintpause: add __builtin_riscv_pause

On Tue, 15 Nov 2022 at 17:40, Jeff Law  wrote:
>
>
> On 11/13/22 13:41, Philipp Tomsich wrote:
> > The Zihintpause extension uses an opcode from the 'fence' opcode range
> > to add a true hint instruction (i.e. if it is not supported on any
> > given platform, the 'fence' that is encoded will not enforce any
> > specific ordering on memory accesses) for entering a low-power state
> > (e.g. in an idle thread).  We expose this new instruction through a
> > machine-dependent builtin to allow generating it without a requirement
> > for any inline assembly.
> >
> > Given that the encoding of 'pause' is valid (as a 'fence' encoding)
> > even for processors that do not (yet) support Zihintpause, we make
> > this builtin available without any further TARGET_* constraints.
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/riscv-builtins.cc (struct riscv_builtin_description):
> >   add the pause machine-dependent builtin with no result and no
> >  arguments; mark it as always present (pause is a true hint
> >  that encodes into a fence-insn, if not supported with the new
> >  pause semantics).
> >   * config/riscv/riscv-ftypes.def: Add type for void -> void.
> >   * config/riscv/riscv.md (riscv_pause): Add riscv_pause and
> >   UNSPECV_PAUSE.
> >   * 
> > doc/gcc/extensions-to-the-c-language-family/target-builtins/risc-v-built-in-functions.rst:
> >   Document.
> >   * optabs.cc (maybe_gen_insn): Allow nops == 0 (void -> void).
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/builtin_pause.c: New test.
>
> OK.  Though I think you'll need to adjust the doc patch now with the
> sphinx work reverted.

Applied to master with the earlier changes to texinfo restored. Thanks!
--Philipp.
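
For anyone wanting to try the builtin, a minimal sketch of a spin-wait
helper follows.  Only __builtin_riscv_pause comes from the patch; cpu_relax,
spin_wait and HAVE_RISCV_PAUSE_BUILTIN are hypothetical names, and the
__has_builtin guard is an assumption about how one might keep the code
building on other targets or older compilers (where it degrades to a plain
polling loop).

```cpp
#include <atomic>

// Detect the builtin where the compiler lets us ask; otherwise fall back.
#if defined(__has_builtin)
#  if __has_builtin(__builtin_riscv_pause)
#    define HAVE_RISCV_PAUSE_BUILTIN 1
#  endif
#endif

// Hint to the core that we are busy-waiting.
inline void cpu_relax()
{
#ifdef HAVE_RISCV_PAUSE_BUILTIN
    __builtin_riscv_pause();   // void -> void, no arguments
#endif
}

// Poll `flag' at most max_spins times, pausing between polls.
// Returns whether the flag was observed set.
inline bool spin_wait(const std::atomic<bool>& flag, unsigned max_spins)
{
    for (unsigned i = 0; i < max_spins; ++i) {
        if (flag.load(std::memory_order_acquire))
            return true;
        cpu_relax();
    }
    return flag.load(std::memory_order_acquire);
}
```

Since 'pause' is a hint, the loop must remain correct when the instruction
does nothing, which matches the description of the encoding above.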


Re: [PATCH 7/7] riscv: Add support for str(n)cmp inline expansion

On Tue, Nov 15, 2022 at 1:46 AM Kito Cheng  wrote:

> Hi Christoph:
>
> > This patch implements expansions for the cmpstrsi and the cmpstrnsi
> > builtins using Zbb instructions (if available).
> > This allows calls to strcmp() and strncmp() to be inlined.
> >
> > The expansion basically emits a peeled comparison sequence (i.e. a peeled
> > comparison loop) which compares XLEN bits per step if possible.
> >
> > The emitted sequence can be controlled by setting the maximum number
> > of compared bytes (-mstring-compare-inline-limit).
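
As a rough picture of what "compares XLEN bits per step" with a byte limit
means, here is a portable sketch for a 64-bit word size.  It is not the
RTL expansion from the patch: has_zero_byte and strncmp_wordwise are
hypothetical names, the zero-byte test uses the classic bit trick where
hardware would likely use a Zbb instruction such as orc.b, and the sketch
assumes both buffers have at least `limit' readable bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// True if any byte of w is zero.
inline bool has_zero_byte(std::uint64_t w)
{
    return ((w - 0x0101010101010101ULL) & ~w & 0x8080808080808080ULL) != 0;
}

// Compare at most `limit' bytes, one 64-bit word per step.
// Assumption: a and b each point to at least `limit' readable bytes.
inline int strncmp_wordwise(const char* a, const char* b, std::size_t limit)
{
    std::size_t i = 0;
    for (; i + 8 <= limit; i += 8) {
        std::uint64_t wa, wb;
        std::memcpy(&wa, a + i, 8);
        std::memcpy(&wb, b + i, 8);
        // Words equal and no terminator: nothing to decide in this step.
        if (wa == wb && !has_zero_byte(wa))
            continue;
        break;  // Resolve the result bytewise below.
    }
    // Bytewise tail: covers the deciding word and any leftover bytes.
    for (; i < limit; ++i) {
        const unsigned char ca = static_cast<unsigned char>(a[i]);
        const unsigned char cb = static_cast<unsigned char>(b[i]);
        if (ca != cb)
            return ca < cb ? -1 : 1;
        if (ca == 0)
            return 0;
    }
    return 0;
}
```

The limit plays the role of the inline-limit option discussed here: past
that many bytes an inline expansion would presumably hand over to the
library routine, whereas this sketch simply stops comparing.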
>
> I would like to have a unified option interface,
> maybe -m[no-]inline-str[n]cmp and -minline-str[n]cmp-limit.
>

Ok, I don't mind (in fact, I thought about this as well).
The reason why it is how it is: I took inspiration from the rs6000 backend.


> And add some option like this:
> -minline-str[n]cmp=[bitmanip|vector|auto] in future,
> since I assume we'll have different versions of those things.
>
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/riscv-protos.h (riscv_expand_strn_compare): New
> >   prototype.
> > * config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
> >   macros.
> > (GEN_EMIT_HELPER2): New helper macros.
> > (expand_strncmp_zbb_sequence): New function.
> > (riscv_emit_str_compare_zbb): New function.
> > (riscv_expand_strn_compare): New function.
> > * config/riscv/riscv.md (cmpstrnsi): Invoke expansion functions
> >   for strn_compare.
> > (cmpstrsi): Invoke expansion functions for strn_compare.
> > * config/riscv/riscv.opt: Add new parameter
> >   '-mstring-compare-inline-limit'.
>
> We need to document this option.
>


[PATCH] doc: invoke: riscv: Fix closing block bracket

From: Christoph Müllner 

This patch fixes a misplaced closing bracket in the RISC-V section of
invoke.texi.

gcc/ChangeLog:

* doc/invoke.texi: Fix closing block bracket

Signed-off-by: Christoph Müllner 
---
 gcc/doc/invoke.texi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 057439a004c..dfac7c85844 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1220,8 +1220,8 @@ See RS/6000 and PowerPC Options.
 -malign-data=@var{type} @gol
 -mbig-endian  -mlittle-endian @gol
 -mstack-protector-guard=@var{guard}  -mstack-protector-guard-reg=@var{reg} @gol
--mstack-protector-guard-offset=@var{offset}}
--mcsr-check -mno-csr-check @gol
+-mstack-protector-guard-offset=@var{offset} @gol
+-mcsr-check -mno-csr-check @gol}
 
 @emph{RL78 Options}
 @gccoptlist{-msim  -mmul=none  -mmul=g13  -mmul=g14  -mallregs @gol
-- 
2.38.1


