[PATCH 1/4] xtensa: Implement bswaphi2 insn pattern

2022-05-29 Thread Takayuki 'January June' Suwa via Gcc-patches

This patch adds a bswaphi2 insn pattern that is one instruction shorter than
the default expansion.
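
For illustration only, the three-instruction sequence in the new pattern can
be modelled in C as follows.  This is a sketch, not part of the patch: the
function name bswap16_model is hypothetical, and the comments map each step
onto the EXTUI/SLLI/OR instructions of the output template.

```c
#include <stdint.h>

/* Hypothetical C model of the bswaphi2 output template.  */
uint16_t bswap16_model(uint16_t x)
{
    uint32_t tmp = ((uint32_t)x >> 8) & 0xFFu;  /* extui tmp, x, 8, 8  */
    uint32_t dst = (uint32_t)x << 8;            /* slli  dst, x, 8     */
    dst |= tmp;                                 /* or    dst, dst, tmp */
    return (uint16_t)dst;  /* only the low 16 bits form the HImode result */
}
```

The bits left above bit 15 by the shift are simply ignored, which is why no
extra masking instruction is needed for a HImode result.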

gcc/ChangeLog:

* config/xtensa/xtensa.md (bswaphi2): New insn pattern.
---
 gcc/config/xtensa/xtensa.md | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 6f5cbc541d8..217879bde15 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -471,6 +471,16 @@
 
 ;; Byte swap.

+(define_insn "bswaphi2"
+  [(set (match_operand:HI 0 "register_operand" "=a")
+   (bswap:HI (match_operand:HI 1 "register_operand" "r")))
+   (clobber (match_scratch:HI 2 "=&a"))]
+  ""
+  "extui\t%2, %1, 8, 8\;slli\t%0, %1, 8\;or\t%0, %0, %2"
+   [(set_attr "type" "arith")
+(set_attr "mode" "HI")
+(set_attr "length"   "9")])
+
 (define_expand "bswapsi2"
   [(set (match_operand:SI 0 "register_operand" "")
 (bswap:SI (match_operand:SI 1 "register_operand" "")))]
--
2.20.1


[PATCH 3/4] xtensa: Optimize '(~x & y)' to '((x & y) ^ y)'

2022-05-29 Thread Takayuki 'January June' Suwa via Gcc-patches

In Xtensa ISA, there is no single machine instruction that calculates unary
bitwise negation.
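
As a small sanity check of the rewrite named in the subject line, the two
forms can be compared in C.  This is only a sketch with hypothetical function
names; it shows why '(~x & y)' and '((x & y) ^ y)' agree: where a bit of x is
set, the AND keeps y's bit and the XOR cancels it, and where a bit of x is
clear, the AND yields 0 and the XOR passes y's bit through.

```c
#include <stdint.h>

/* Direct form: needs a bitwise NOT, which Xtensa lacks as one insn.  */
uint32_t andn_direct(uint32_t x, uint32_t y)
{
    return ~x & y;
}

/* Rewritten form: one AND followed by one XOR, no constant needed.  */
uint32_t andn_rewritten(uint32_t x, uint32_t y)
{
    uint32_t t = x & y;   /* bits of y that x would mask out */
    return t ^ y;         /* remove exactly those bits from y */
}
```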

gcc/ChangeLog:

* config/xtensa/xtensa.md (*andsi3_bitcmpl):
New insn_and_split pattern.

gcc/testsuite/ChangeLog:

* gcc.target/xtensa/check_zero_byte.c: New.
---
 gcc/config/xtensa/xtensa.md   | 20 +++
 .../gcc.target/xtensa/check_zero_byte.c   |  9 +
 2 files changed, 29 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/xtensa/check_zero_byte.c

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index f11ae4910f8..4aa128eab64 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -601,6 +601,26 @@
(set_attr "mode"  "SI")
(set_attr "length" "3,3")])

+(define_insn_and_split "*andsi3_bitcmpl"
+  [(set (match_operand:SI 0 "register_operand" "=a")
+   (and:SI (not:SI (match_operand:SI 1 "register_operand" "r"))
+   (match_operand:SI 2 "register_operand" "r")))]
+  ""
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(set (match_dup 3)
+   (and:SI (match_dup 1)
+   (match_dup 2)))
+   (set (match_dup 0)
+   (xor:SI (match_dup 3)
+   (match_dup 2)))]
+{
+  operands[3] = gen_reg_rtx (SImode);
+}
+  [(set_attr "type"  "arith")
+   (set_attr "mode"  "SI")
+   (set_attr "length" "6")])
+
 (define_insn "iorsi3"
   [(set (match_operand:SI 0 "register_operand" "=a")
(ior:SI (match_operand:SI 1 "register_operand" "%r")
diff --git a/gcc/testsuite/gcc.target/xtensa/check_zero_byte.c b/gcc/testsuite/gcc.target/xtensa/check_zero_byte.c
new file mode 100644
index 000..6a04aaeefa0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/xtensa/check_zero_byte.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+int check_zero_byte(int v)
+{
+  return (v - 0x01010101) & ~v & 0x80808080;
+}
+
+/* { dg-final { scan-assembler-not "movi" } } */
--
2.20.1


[PATCH 2/4] xtensa: Make one_cmplsi2 optimizer-friendly

2022-05-29 Thread Takayuki 'January June' Suwa via Gcc-patches

In Xtensa ISA, there is no single machine instruction that calculates unary
bitwise negation.  But a few optimizers assume that bitwise negation can be
done by a single insn.

As a result, '((x < 0) ? ~x : x)' could not previously be optimized to
'(x ^ (x >> 31))', for example.

This patch relaxes that limitation by deferring the insn expansion until
the split pass.
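
The equivalence mentioned above can be written out in C.  This is a sketch
with hypothetical function names, and it assumes that '>>' on a negative
signed value is an arithmetic shift (which GCC documents, but which ISO C
leaves implementation-defined).

```c
#include <stdint.h>

/* Branching form, as in the new test case.  */
int32_t one_cmpl_abs_branchy(int32_t x)
{
    return x < 0 ? ~x : x;
}

/* Branch-free form: (x >> 31) is -1 for negative x and 0 otherwise,
   and XOR with -1 is bitwise NOT while XOR with 0 is the identity.  */
int32_t one_cmpl_abs_branchless(int32_t x)
{
    return x ^ (x >> 31);
}
```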

gcc/ChangeLog:

* config/xtensa/xtensa.md (one_cmplsi2):
Rearrange as an insn_and_split pattern.

gcc/testsuite/ChangeLog:

* gcc.target/xtensa/one_cmpl_abs.c: New.
---
 gcc/config/xtensa/xtensa.md   | 26 +--
 .../gcc.target/xtensa/one_cmpl_abs.c  |  9 +++
 2 files changed, 27 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/xtensa/one_cmpl_abs.c

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 217879bde15..f11ae4910f8 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -556,16 +556,26 @@
(set_attr "mode"  "SI")
(set_attr "length" "3")])

-(define_expand "one_cmplsi2"
-  [(set (match_operand:SI 0 "register_operand" "")
-   (not:SI (match_operand:SI 1 "register_operand" "")))]
+(define_insn_and_split "one_cmplsi2"
+  [(set (match_operand:SI 0 "register_operand" "=a")
+   (not:SI (match_operand:SI 1 "register_operand" "r")))]
   ""
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(set (match_dup 2)
+   (const_int -1))
+   (set (match_dup 0)
+   (xor:SI (match_dup 1)
+   (match_dup 2)))]
 {
-  rtx temp = gen_reg_rtx (SImode);
-  emit_insn (gen_movsi (temp, constm1_rtx));
-  emit_insn (gen_xorsi3 (operands[0], temp, operands[1]));
-  DONE;
-})
+  operands[2] = gen_reg_rtx (SImode);
+}
+  [(set_attr "type"  "arith")
+   (set_attr "mode"  "SI")
+   (set (attr "length")
+   (if_then_else (match_test "TARGET_DENSITY")
+ (const_int 5)
+ (const_int 6)))])

 (define_insn "negsf2"
   [(set (match_operand:SF 0 "register_operand" "=f")
diff --git a/gcc/testsuite/gcc.target/xtensa/one_cmpl_abs.c b/gcc/testsuite/gcc.target/xtensa/one_cmpl_abs.c
new file mode 100644
index 000..608f65fd777
--- /dev/null
+++ b/gcc/testsuite/gcc.target/xtensa/one_cmpl_abs.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+
+int one_cmpl_abs(int a)
+{
+  return a < 0 ? ~a : a;
+}
+
+/* { dg-final { scan-assembler-not "bgez" } } */
--
2.20.1


[PATCH 4/4] xtensa: Add clrsbsi2 insn pattern

2022-05-29 Thread Takayuki 'January June' Suwa via Gcc-patches

> (clrsb:m x)
> Represents the number of redundant leading sign bits in x, represented
> as an integer of mode m, starting at the most significant bit position.

This is exactly what the NSA instruction (never emitted before) calculates
in the Xtensa ISA.
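
For readers unfamiliar with clrsb, the following C sketch mirrors the
non-NSA fallback that this patch adds to lib1funcs.S.  The function names
are hypothetical, and the portable loop merely stands in for the do_nsau
macro; it is an illustration, not the libgcc implementation.

```c
#include <stdint.h>

/* Count leading zeros of a nonzero 32-bit value (stand-in for do_nsau).  */
static int clz32_nonzero(uint32_t v)
{
    int n = 0;
    while (!(v & 0x80000000u)) {
        v <<= 1;
        n++;
    }
    return n;
}

/* Number of redundant leading sign bits in x (0..31).  */
int clrsb32_model(int32_t x)
{
    uint32_t sign = (x < 0) ? 0xFFFFFFFFu : 0u;  /* srai a3, a2, 31 */
    uint32_t t = sign ^ (uint32_t)x;             /* xor  a3, a3, a2 */
    if (t == 0)
        return 31;                               /* movi a2, 31; beqz a3 */
    return clz32_nonzero(t) - 1;                 /* do_nsau; addi a2, a2, -1 */
}
```

XORing with the replicated sign bit turns leading sign bits into leading
zeros, so the result is the leading-zero count minus the sign bit itself.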

gcc/ChangeLog:

* config/xtensa/xtensa.md (clrsbsi2): New insn pattern.

libgcc/ChangeLog:

* config/xtensa/lib1funcs.S (__clrsbsi2): New function.
* config/xtensa/t-xtensa (LIB1ASMFUNCS): Add _clrsbsi2.
---
 gcc/config/xtensa/xtensa.md  | 12 +++-
 libgcc/config/xtensa/lib1funcs.S | 23 +++
 libgcc/config/xtensa/t-xtensa|  2 +-
 3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 4aa128eab64..dec49cc942b 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -429,7 +429,17 @@
(set_attr "length" "3")])

 
-;; Count leading/trailing zeros and find first bit.
+;; Count redundant leading sign bits and leading/trailing zeros,
+;; and find first bit.
+
+(define_insn "clrsbsi2"
+  [(set (match_operand:SI 0 "register_operand" "=a")
+   (clrsb:SI (match_operand:SI 1 "register_operand" "r")))]
+  "TARGET_NSA"
+  "nsa\t%0, %1"
+  [(set_attr "type"  "arith")
+   (set_attr "mode"  "SI")
+   (set_attr "length" "3")])

 (define_insn "clzsi2"
   [(set (match_operand:SI 0 "register_operand" "=a")
diff --git a/libgcc/config/xtensa/lib1funcs.S b/libgcc/config/xtensa/lib1funcs.S
index 5a2bd20534f..3932d206256 100644
--- a/libgcc/config/xtensa/lib1funcs.S
+++ b/libgcc/config/xtensa/lib1funcs.S
@@ -456,6 +456,29 @@ __nsau_data:
 #endif /* L_clz */


+#ifdef L_clrsbsi2
+   .align  4
+   .global __clrsbsi2
+   .type   __clrsbsi2, @function
+__clrsbsi2:
+   leaf_entry sp, 16
+#if XCHAL_HAVE_NSA
+   nsa a2, a2
+#else
+   srai    a3, a2, 31
+   xor a3, a3, a2
+   movi    a2, 31
+   beqz    a3, .Lreturn
+   do_nsau a2, a3, a4, a5
+   addi    a2, a2, -1
+.Lreturn:
+#endif
+   leaf_return
+   .size   __clrsbsi2, . - __clrsbsi2
+
+#endif /* L_clrsbsi2 */
+
+
 #ifdef L_clzsi2
.align  4
.global __clzsi2
diff --git a/libgcc/config/xtensa/t-xtensa b/libgcc/config/xtensa/t-xtensa
index 9836c96aefc..084618b382e 100644
--- a/libgcc/config/xtensa/t-xtensa
+++ b/libgcc/config/xtensa/t-xtensa
@@ -1,6 +1,6 @@
 LIB1ASMSRC = xtensa/lib1funcs.S
 LIB1ASMFUNCS = _mulsi3 _divsi3 _modsi3 _udivsi3 _umodsi3 \
-   _umulsidi3 _clz _clzsi2 _ctzsi2 _ffssi2 \
+   _umulsidi3 _clz _clrsbsi2 _clzsi2 _ctzsi2 _ffssi2 \
_ashldi3 _ashrdi3 _lshrdi3 \
_bswapsi2 _bswapdi2 \
_negsf2 _addsubsf3 _mulsf3 _divsf3 _cmpsf2 _fixsfsi _fixsfdi \
--
2.20.1


[PATCH v5] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2022-05-29 Thread Di Zhao OS via Gcc-patches
Hi, attached is a new version of the patch. The changes are:
- Skip using temporary equivalences for floating-point values, because
folding expressions can generate incorrect values. For example, 
operations on 0.0 and -0.0 may have different results.
- Avoid inserting duplicated back-refs from value-number to predicates.
- Disable fre in testsuite/g++.dg/pr83541.C.
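
To illustrate the floating-point point in the first item: two values can
compare equal and still not be interchangeable.  The sketch below (the
function name is hypothetical) shows that 0.0 and -0.0 satisfy '==' yet
differ in their sign bit, so substituting one for the other on the strength
of an equality test can change later results.

```c
#include <math.h>

/* Returns nonzero when a == b holds but a and b still differ in sign,
   which is the case for 0.0 and -0.0 under IEEE 754 arithmetic.  */
int equal_but_distinguishable(double a, double b)
{
    int sign_a = signbit(a) != 0;
    int sign_b = signbit(b) != 0;
    return a == b && sign_a != sign_b;
}
```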

Summary of the previous versions:
https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587346.html

Is the patch still being considered?

Thanks,
Di Zhao

---

Extend FRE with temporary equivalences.

2022-05-29  Di Zhao  

gcc/ChangeLog:
PR tree-optimization/101186
* tree-ssa-sccvn.c (VN_INFO): Remove assertions (there could be a
predicate already).
(dominated_by_p_w_unex): Moved upward.
(vn_nary_op_get_predicated_value): Moved upward.
(is_vn_valid_at_bb): Check if vn_pval is valid at BB.
(lookup_equiv_head): Lookup the "equivalence head" of given node.
(lookup_equiv_heads): Lookup the "equivalence head"s of given nodes.
(vn_tracking_edge): Extracted utility function.
(init_vn_nary_op_from_stmt): Insert and lookup by "equivalence head"s.
(vn_nary_op_insert_into): Insert new value at the front.
(vn_nary_op_insert_pieces_predicated_1): Insert as predicated values
from pieces.
(fold_const_from_equiv_heads): Fold N-ary expression of equiv-heads.
(push_new_nary_ref): Insert a back-reference to vn_nary_op_t.
(val_equiv_insert): Record temporary equivalence.
(vn_nary_op_insert_pieces_predicated): Record equivalences instead of
some predicates; insert back-refs.
(record_equiv_from_prev_phi_1): Record temporary equivalences generated
by PHI nodes.
(record_equiv_from_prev_phi): Given an outgoing edge of a conditional
expression taken, record equivalences generated by PHI nodes.
(visit_nary_op): Add lookup previous results of N-ary operations by
equivalences.
(insert_related_predicates_on_edge): Some predicates can be computed
from equivalences, no need to insert them.
(process_bb): Add lookup predicated values by equivalences.
(struct unwind_state): Unwind state of back-refs to vn_nary_op_t.
(do_unwind): Unwind the back-refs to vn_nary_op_t.
(do_rpo_vn): Update back-reference unwind state.
* tree-ssa-sccvn.h (struct nary_ref): Hold a list of references to the
nary map entries.

gcc/testsuite/ChangeLog:

* g++.dg/pr83541.C: Disable fre.
* gcc.dg/tree-ssa/pr68619-2.c: Disable fre.
* gcc.dg/tree-ssa/pr71947-1.c: Disable fre.
* gcc.dg/tree-ssa/pr71947-2.c: Disable fre.
* gcc.dg/tree-ssa/pr71947-3.c: Disable fre.
* gcc.dg/tree-ssa/pr71947-5.c: Disable fre.
* gcc.dg/tree-ssa/pr71947-7.c: Disable fre.
* gcc.dg/tree-ssa/pr71947-8.c: Disable fre.
* gcc.dg/tree-ssa/pr71947-9.c: Disable fre.
* gcc.dg/tree-ssa/vrp03.c: Disable fre.
* gcc.dg/tree-ssa/ssa-fre-100.c: New test.
* gcc.dg/tree-ssa/ssa-fre-101.c: New test.
* gcc.dg/tree-ssa/ssa-fre-102.c: New test.
* gcc.dg/tree-ssa/ssa-pre-34.c: New test.


v5-tree-optimization-101186.patch
Description: v5-tree-optimization-101186.patch


[pushed] Darwin: Fix empty g++ command lines [PR105599].

2022-05-29 Thread Iain Sandoe via Gcc-patches
An empty g++ command line should produce a diagnostic that there are no
inputs.  The PR is that currently Darwin produces a diagnostic about missing
link items instead - this is because (erroneously), for this driver, we are
creating a link job for empty command lines.

The problem occurs in four stages:

 The g++ driver appends -shared-libgcc to the command line.

 The Darwin driver_init code in the backend does not see this (it sees an
 empty command line).

 When the back end driver code sees an empty command line, it does not
 add any supplementary flags (e.g. asm-macosx-version-min) - precisely to
 avoid anything being claimed as an input_file and therefore triggering a link
 line.

 Since we do not have a value for asm-macosx-version-min when processing the
 driver specs, we unconditionally inject 'multiply_defined suppress' which is
 used with shared libgcc (but only intended on very old Darwin).  This then
 causes the generation of a link job.

The solution, for the present, is to move version-specific link params to the
LINK_SPEC so that they are only processed when a link job has already been
decided.

tested on x86-64-darwin18/19, pushed to master,
thanks
Iain

Signed-off-by: Iain Sandoe 

PR target/105599

gcc/ChangeLog:

* config/darwin.h: Move version-specific handling of multiply_defined
from SUBTARGET_DRIVER_SELF_SPECS to LINK_SPEC.
---
 gcc/config/darwin.h | 17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
index b73e12372d8..f82ec62cf20 100644
--- a/gcc/config/darwin.h
+++ b/gcc/config/darwin.h
@@ -143,10 +143,7 @@ extern GTY(()) int darwin_ms_struct;
Right now there's no mechanism to split up the "variable portion" (%*) of
the matched spec string, so where we have some driver specs that take 2
or 3 arguments, these cannot be processed here, but are deferred until the
-   LINK_SPEC, where they are copied verbatim.
-   We have a "safe" version of the MacOS version string, that's been sanity-
-   checked and truncated to minor version.  If the 'tiny' (3rd) portion of the
-   value is not significant, it's better to use this in version-compare().  */
+   LINK_SPEC, where they are copied verbatim.  */
 
 #undef SUBTARGET_DRIVER_SELF_SPECS
 #define SUBTARGET_DRIVER_SELF_SPECS\
@@ -220,13 +217,8 @@ extern GTY(()) int darwin_ms_struct;
   "%{image_base*:-Xlinker -image_base -Xlinker %*} %

Re: [PATCH] libcpp: Ignore CPP_PADDING tokens in _cpp_parse_expr [PR105732]

2022-05-29 Thread Jason Merrill via Gcc-patches

On 5/27/22 04:16, Jakub Jelinek wrote:

Hi!

The first part of the following testcase (m1-m3 macros and its use)
regressed with my PR89971 fix, but as the m1,m4-m5 and its use part shows,
the problem isn't new, we can emit a CPP_PADDING token to avoid it from
being adjacent to whatever comes after the __VA_OPT__ (in this case there
is nothing afterwards, true).

In most cases these CPP_PADDING tokens don't matter, all other
callers of cpp_get_token_with_location either ignore CPP_PADDING tokens
completely (e.g. c_lex_with_flags) or they just remember them and
take them into account when printing stuff whether there should be
added whitespace or not (scan_translation_unit + token_streamer::stream).
So, I think we should just ignore CPP_PADDING tokens the same way in
_cpp_parse_expr.
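
As a toy model of the approach described above (not libcpp code; the enum,
the function and the token kinds are all hypothetical), a token loop can
treat a padding token as if it had never been fetched by undoing the count
and moving on to the next token:

```c
/* Hypothetical token kinds for the sketch.  */
enum toy_kind { TOY_NUMBER, TOY_PADDING, TOY_EOF };

/* Count the significant tokens before TOY_EOF, skipping padding.  */
int toy_count_significant(const enum toy_kind *toks)
{
    int lex_count = 0;
    for (;;) {
        enum toy_kind kind = *toks++;
        lex_count++;               /* every fetched token is counted first */
        switch (kind) {
        case TOY_PADDING:
            lex_count--;           /* undo the count ... */
            continue;              /* ... and fetch the next token */
        case TOY_EOF:
            return lex_count - 1;  /* the terminator itself is not counted */
        default:
            break;
        }
    }
}
```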

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2022-05-27  Jakub Jelinek  

PR preprocessor/105732
* expr.cc (_cpp_parse_expr): Handle CPP_PADDING by just another
token.

* c-c++-common/cpp/va-opt-10.c: New test.

--- libcpp/expr.cc.jj   2022-01-18 11:59:00.258972399 +0100
+++ libcpp/expr.cc  2022-05-26 15:39:54.348780446 +0200
@@ -1366,6 +1366,10 @@ _cpp_parse_expr (cpp_reader *pfile, bool
op.op = CPP_UMINUS;
  break;
  
+	case CPP_PADDING:
+ lex_count--;
+ continue;
+
default:
  if ((int) op.op <= (int) CPP_EQ || (int) op.op >= (int) CPP_PLUS_EQ)
SYNTAX_ERROR2_AT (op.loc,
--- gcc/testsuite/c-c++-common/cpp/va-opt-10.c.jj   2022-05-26 15:54:40.279766330 +0200
+++ gcc/testsuite/c-c++-common/cpp/va-opt-10.c  2022-05-26 15:54:24.028928687 +0200
@@ -0,0 +1,18 @@
+/* PR preprocessor/105732 */
+/* { dg-do compile } */
+/* { dg-options "-std=gnu99" { target c } } */
+/* { dg-options "-std=c++20" { target c++ } } */
+
+#define m1(p1, p2, p3) p3
+#define m2(p1, ...) 1##__VA_OPT__(foo)
+#define m3(...) m1(1, 2, m2)
+#define m4(p1, ...) 1 __VA_OPT__()
+#define m5(...) m1(1, 2, m4)
+#if m3(,)(,)
+#else
+#error
+#endif
+#if m5(,)(,)
+#else
+#error
+#endif

Jakub





Re: [PATCH] c++: document comp_template_args's default args

2022-05-29 Thread Jason Merrill via Gcc-patches

On 5/27/22 14:07, Patrick Palka wrote:

In passing, use bool for its return type.


OK.


gcc/cp/ChangeLog:

* cp-tree.h (comp_template_args): Change return type to bool.
* pt.cc (comp_template_args): Document default arguments.
Change return type to bool and adjust returns accordingly.
---
  gcc/cp/cp-tree.h |  2 +-
  gcc/cp/pt.cc | 24 +++-
  2 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index d77fd1eb8a9..da8898155e0 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7327,7 +7327,7 @@ extern tree get_template_info (const_tree);
  extern int template_class_depth   (tree);
  extern int is_specialization_of   (tree, tree);
  extern bool is_specialization_of_friend   (tree, tree);
-extern int comp_template_args  (tree, tree, tree * = NULL,
+extern bool comp_template_args (tree, tree, tree * = NULL,
 tree * = NULL, bool = false);
  extern int template_args_equal  (tree, tree, bool = false);
  extern tree maybe_process_partial_specialization (tree);
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index ec168234325..b5064990857 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -9368,27 +9368,25 @@ template_args_equal (tree ot, tree nt, bool partial_order /* = false */)
  }
  }
  
-/* Returns 1 iff the OLDARGS and NEWARGS are in fact identical sets of
-   template arguments.  Returns 0 otherwise, and updates OLDARG_PTR and
+/* Returns true iff the OLDARGS and NEWARGS are in fact identical sets of
+   template arguments.  Returns false otherwise, and updates OLDARG_PTR and
 NEWARG_PTR with the offending arguments if they are non-NULL.  */
  
-int
+bool
  comp_template_args (tree oldargs, tree newargs,
-   tree *oldarg_ptr, tree *newarg_ptr,
-   bool partial_order)
+   tree *oldarg_ptr /* = NULL */, tree *newarg_ptr /* = NULL */,
+   bool partial_order /* = false */)
  {
-  int i;
-
if (oldargs == newargs)
-return 1;
+return true;
  
if (!oldargs || !newargs)
-return 0;
+return false;
  
if (TREE_VEC_LENGTH (oldargs) != TREE_VEC_LENGTH (newargs))
-return 0;
+return false;
  
-  for (i = 0; i < TREE_VEC_LENGTH (oldargs); ++i)
+  for (int i = 0; i < TREE_VEC_LENGTH (oldargs); ++i)
  {
tree nt = TREE_VEC_ELT (newargs, i);
tree ot = TREE_VEC_ELT (oldargs, i);
@@ -9399,10 +9397,10 @@ comp_template_args (tree oldargs, tree newargs,
*oldarg_ptr = ot;
  if (newarg_ptr != NULL)
*newarg_ptr = nt;
- return 0;
+ return false;
}
  }
-  return 1;
+  return true;
  }
  
  inline bool




Re: [PATCH] c++: use current_template_constraints more

2022-05-29 Thread Jason Merrill via Gcc-patches

On 5/27/22 14:05, Patrick Palka wrote:

gcc/cp/ChangeLog:

* decl.cc (grokvardecl): Use current_template_constraints.
(xref_tag): Likewise.
* semantics.cc (finish_template_template_parm): Likewise.


OK.


---
  gcc/cp/decl.cc  | 13 +++--
  gcc/cp/semantics.cc |  3 +--
  2 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 892e4a4b19b..26428ca7122 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -10789,9 +10789,7 @@ grokvardecl (tree type,
else if (flag_concepts
   && current_template_depth > template_class_depth (scope))
  {
-  tree reqs = TEMPLATE_PARMS_CONSTRAINTS (current_template_parms);
-  tree ci = build_constraints (reqs, NULL_TREE);
-
+  tree ci = current_template_constraints ();
set_constraints (decl, ci);
  }
  
@@ -15852,13 +15850,8 @@ xref_tag (enum tag_types tag_code, tree name,
  {
/* Check that we aren't trying to overload a class with different
   constraints.  */
-  tree constr = NULL_TREE;
-  if (current_template_parms)
-{
-  tree reqs = TEMPLATE_PARMS_CONSTRAINTS (current_template_parms);
-  constr = build_constraints (reqs, NULL_TREE);
-}
- if (!redeclare_class_template (t, current_template_parms, constr))
+ if (!redeclare_class_template (t, current_template_parms,
+current_template_constraints ()))
return error_mark_node;
  }
else if (!processing_template_decl
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index cd7a2818feb..efdeb9318a7 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -3387,8 +3387,7 @@ finish_template_template_parm (tree aggr, tree identifier)
  
/* Associate the constraints with the underlying declaration,
   not the template.  */
-  tree reqs = TEMPLATE_PARMS_CONSTRAINTS (current_template_parms);
-  tree constr = build_constraints (reqs, NULL_TREE);
+  tree constr = current_template_constraints ();
set_constraints (decl, constr);
  
end_template_decl ();




nvptx: forward '-v' command-line option to assembler, linker (was: [MentorEmbedded/nvptx-tools] Issue 30: Ignore not-supported sm_* error without --verify (PR #31))

2022-05-29 Thread Thomas Schwinge
Hi!

One item from 
:

On 2022-04-11T04:25:01-0700, Tobias Burnus  wrote:
> we could consider invoking nvptx-as with `-v` from GCC to make this more 
> prominent – at least when building GCC itself. _(The user could do `-Wa,-v` 
> to force this warning.)_

Not sure if that's what you had in mind, but what do you think about the
attached "nvptx: forward '-v' command-line option to assembler, linker"?
OK to push to GCC master branch (after merging

"Put '-v' verbose output onto stderr instead of stdout")?


Regards,
 Thomas


>From 17c35607d4927299b0c4bd19dd6fd205c85c4a4b Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Sun, 29 May 2022 22:31:43 +0200
Subject: [PATCH] nvptx: forward '-v' command-line option to assembler, linker

For example, for offloading compilation with '-save-temps -v', before vs. after
word-diff then looks like:

[...]
 [...]/build-gcc-offload-nvptx-none/gcc/as {+-v -v+} -o ./a.xnvptx-none.mkoffload.o ./a.xnvptx-none.mkoffload.s
{+Verifying sm_30 code with sm_35 code generation.+}
{+ ptxas -c -o /dev/null ./a.xnvptx-none.mkoffload.o --gpu-name sm_35 -O0+}
[...]
 [...]/build-gcc-offload-nvptx-none/gcc/collect2 {+-v -v+} -o ./a.xnvptx-none.mkoffload [...] @./a.xnvptx-none.mkoffload.args.1 -lgomp -lgcc -lc -lgcc
{+collect2 version 12.0.1 20220428 (experimental)+}
{+[...]/build-gcc-offload-nvptx-none/gcc/collect-ld -v -v -o ./a.xnvptx-none.mkoffload [...] ./a.xnvptx-none.mkoffload.o -lgomp -lgcc -lc -lgcc+}
{+Linking ./a.xnvptx-none.mkoffload.o as 0+}
{+trying lib libc.a+}
{+trying lib libgcc.a+}
{+trying lib libgomp.a+}
{+Resolving abort+}
{+Resolving acc_on_device+}
{+Linking libgomp.a::oacc-init.o/ as 1+}
{+Linking libc.a::lib_a-abort.o/   as 2+}
[...]

(This depends on 
"Put '-v' verbose output onto stderr instead of stdout".)

	gcc/
	* config/nvptx/nvptx.h (ASM_SPEC, LINK_SPEC): Define.
---
 gcc/config/nvptx/nvptx.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index ed72c253191..b184f1d0150 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -27,6 +27,13 @@
 
 /* Run-time Target.  */
 
+/* Assembler supports '-v' option; handle similar to
+   '../../gcc.cc:asm_options', 'HAVE_GNU_AS'.  */
+#define ASM_SPEC "%{v}"
+
+/* Linker supports '-v' option.  */
+#define LINK_SPEC "%{v}"
+
 #define STARTFILE_SPEC "%{mmainkernel:crt0.o}"
 
 #define TARGET_CPU_CPP_BUILTINS() nvptx_cpu_cpp_builtins ()
-- 
2.25.1



Re: [PATCH v3] DSE: Use the constant store source if possible

2022-05-29 Thread H.J. Lu via Gcc-patches
On Sat, May 28, 2022 at 11:37 AM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 5/26/2022 2:43 PM, H.J. Lu via Gcc-patches wrote:
> > On Thu, May 26, 2022 at 04:14:17PM +0100, Richard Sandiford wrote:
> >> "H.J. Lu"  writes:
> >>> On Wed, May 25, 2022 at 12:30 AM Richard Sandiford
> >>>  wrote:
>  "H.J. Lu via Gcc-patches"  writes:
> > On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote:
> >> On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches
> >>  wrote:
> >>> When recording store for RTL dead store elimination, check if the source
> >>> register is set only once to a constant.  If yes, record the constant
> >>> as the store source.  It eliminates unrolled zero stores after memset 0
> >>> in a loop where a vector register is used as the zero store source.
> >>>
> >>> gcc/
> >>>
> >>>  PR rtl-optimization/105638
> >>>  * dse.cc (record_store): Use the constant source if the source
> >>>  register is set only once.
> >>>
> >>> gcc/testsuite/
> >>>
> >>>  PR rtl-optimization/105638
> >>>  * g++.target/i386/pr105638.C: New test.
> >>> ---
> >>>   gcc/dse.cc   | 19 ++
> >>>   gcc/testsuite/g++.target/i386/pr105638.C | 44 
> >>> 
> >>>   2 files changed, 63 insertions(+)
> >>>   create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
> >>>
> >>> diff --git a/gcc/dse.cc b/gcc/dse.cc
> >>> index 30c11cee034..0433dd3d846 100644
> >>> --- a/gcc/dse.cc
> >>> +++ b/gcc/dse.cc
> >>> @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)
> >>>
> >>>if (tem && CONSTANT_P (tem))
> >>>  const_rhs = tem;
> >>> + else
> >>> +   {
> >>> + /* If RHS is set only once to a constant, set CONST_RHS
> >>> +to the constant.  */
> >>> + df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> >>> + if (def != nullptr
> >>> + && !DF_REF_IS_ARTIFICIAL (def)
> >>> + && !DF_REF_NEXT_REG (def))
> >>> +   {
> >>> + rtx_insn *def_insn = DF_REF_INSN (def);
> >>> + rtx def_body = PATTERN (def_insn);
> >>> + if (GET_CODE (def_body) == SET)
> >>> +   {
> >>> + rtx def_src = SET_SRC (def_body);
> >>> + if (CONSTANT_P (def_src))
> >>> +   const_rhs = def_src;
> >> doesn't DSE have its own tracking of stored values?  Shouldn't we
> > It tracks stored values only within the basic block.  When RTL loop
> > invariant motion hoists a constant initialization out of the loop into
> > a separate basic block, the constant store value becomes unknown
> > within the original basic block.
> >
> >> improve _that_ if it is not enough?  I also wonder if you need to
> > My patch extends DSE stored value tracking to include the constant which
> > is set only once in another basic block.
> >
> >> verify the SET isn't partial?
> >>
> > Here is the v2 patch to check that the constant is set by a non-partial
> > unconditional load.
> >
> > OK for master?
> >
> > Thanks.
> >
> > H.J.
> > ---
> > RTL DSE tracks redundant constant stores within a basic block.  When RTL
> > loop invariant motion hoists a constant initialization out of the loop
> > into a separate basic block, the constant store value becomes unknown
> > within the original basic block.  When recording store for RTL DSE, check
> > if the source register is set only once to a constant by a non-partial
> > unconditional load.  If yes, record the constant as the constant store
> > source.  It eliminates unrolled zero stores after memset 0 in a loop
> > where a vector register is used as the zero store source.
> >
> > gcc/
> >
> >PR rtl-optimization/105638
> >* dse.cc (record_store): Use the constant source if the source
> >register is set only once.
> >
> > gcc/testsuite/
> >
> >PR rtl-optimization/105638
> >* g++.target/i386/pr105638.C: New test.
> > ---
> >   gcc/dse.cc   | 22 
> >   gcc/testsuite/g++.target/i386/pr105638.C | 44 
> >   2 files changed, 66 insertions(+)
> >   create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
> >
> > diff --git a/gcc/dse.cc b/gcc/dse.cc
> > index 30c11cee034..af8e88dac32 100644
> > --- a/gcc/dse.cc
> > +++ b/gcc/dse.cc
> > @@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info)
> >
> >  if (tem && CONSTANT_P (tem))
> >   

[PATCH v2] RISC-V: bitmanip: improve constant-loading for (1ULL << 31) in DImode

2022-05-29 Thread Philipp Tomsich
The SINGLE_BIT_MASK_OPERAND() is overly restrictive, triggering for
bits above 31 only (to side-step any issues with the negative SImode
value 0x80000000/(-1ull << 31)/(1 << 31)).  This moves the special
handling of this SImode value (i.e. the check for (-1ull << 31)) to
riscv.cc and relaxes the SINGLE_BIT_MASK_OPERAND() test.

With this, the code-generation for loading (1ULL << 31) changes from:
li  a0,1
slli    a0,a0,31
to:
bseti   a0,zero,31
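
The two ideas above can be sketched in C.  The helper names are hypothetical
and merely stand in for GCC's pow2p_hwi test and for the sign-extension that
RV64 applies to 32-bit values; this is an illustration under those
assumptions, not the code in riscv.cc.

```c
#include <stdint.h>

/* Stand-in for the relaxed SINGLE_BIT_MASK_OPERAND test: exactly one
   bit set, at any position.  */
int is_single_bit(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Sign-extend the low 32 bits, as RV64 does for values held in SImode.
   For (1 << 31) this yields the negative value (-1 << 31).  */
int64_t simode_canonical(uint64_t v)
{
    int64_t low = (int64_t)(v & 0xFFFFFFFFu);
    return (low ^ 0x80000000LL) - 0x80000000LL;
}
```

This is why (1 << 31) needs separate treatment in SImode: as a 64-bit
quantity it is a single-bit mask, but its canonical SImode form has all of
the upper 33 bits set.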

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_build_integer_1): Rewrite value as
(-1 << 31) for the single-bit case, when operating on (1 << 31)
in SImode.
* config/riscv/riscv.h (SINGLE_BIT_MASK_OPERAND): Allow for
any single-bit value, moving the special case for (1 << 31) to
riscv_build_integer_1 (in riscv.cc).

Signed-off-by: Philipp Tomsich 

---

Changes in v2:
- Use HOST_WIDE_INT_1U/HOST_WIDE_INT_M1U instead of constants.
- Fix some typos in the comment above the rewrite of the value.
- Update the comment to clarify that we expect a LUI to be emitted for
  the SImode case (i.e. sign-extended for RV64) of (1 << 31).

 gcc/config/riscv/riscv.cc |  9 +
 gcc/config/riscv/riscv.h  | 11 ---
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index f83dc796d88..2e83ca07394 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -420,6 +420,15 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
   /* Simply BSETI.  */
   codes[0].code = UNKNOWN;
   codes[0].value = value;
+
+  /* RISC-V sign-extends all 32bit values that live in a 32bit
+register.  To avoid paradoxes, we thus need to use the
+sign-extended (negative) representation (-1 << 31) for the
+value, if we want to build (1 << 31) in SImode.  This will
+then expand to an LUI instruction.  */
+  if (mode == SImode && value == (HOST_WIDE_INT_1U << 31))
+   codes[0].value = (HOST_WIDE_INT_M1U << 31);
+
   return 1;
 }
 
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 5083a1c24b0..6f7f4d3fbdc 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -528,13 +528,10 @@ enum reg_class
   (((VALUE) | ((1UL<<31) - IMM_REACH)) == ((1UL<<31) - IMM_REACH)  \
|| ((VALUE) | ((1UL<<31) - IMM_REACH)) + IMM_REACH == 0)
 
-/* If this is a single bit mask, then we can load it with bseti.  But this
-   is not useful for any of the low 31 bits because we can use addi or lui
-   to load them.  It is wrong for loading SImode 0x80000000 on rv64 because it
-   needs to be sign-extended.  So we restrict this to the upper 32-bits
-   only.  */
-#define SINGLE_BIT_MASK_OPERAND(VALUE) \
-  (pow2p_hwi (VALUE) && (ctz_hwi (VALUE) >= 32))
+/* If this is a single bit mask, then we can load it with bseti.  Special
+   handling of SImode 0x80000000 on RV64 is done in riscv_build_integer_1. */
+#define SINGLE_BIT_MASK_OPERAND(VALUE) \
+  (pow2p_hwi (VALUE))
 
 /* Stack layout; function entry, exit and calling.  */
 
-- 
2.34.1



Re: [PATCH v3] DSE: Use the constant store source if possible

2022-05-29 Thread Jeff Law via Gcc-patches




On 5/29/2022 3:43 PM, H.J. Lu wrote:

On Sat, May 28, 2022 at 11:37 AM Jeff Law via Gcc-patches
 wrote:



On 5/26/2022 2:43 PM, H.J. Lu via Gcc-patches wrote:

On Thu, May 26, 2022 at 04:14:17PM +0100, Richard Sandiford wrote:

"H.J. Lu"  writes:

On Wed, May 25, 2022 at 12:30 AM Richard Sandiford
 wrote:

"H.J. Lu via Gcc-patches"  writes:

On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote:

On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches
 wrote:

When recording store for RTL dead store elimination, check if the source
register is set only once to a constant.  If yes, record the constant
as the store source.  It eliminates unrolled zero stores after memset 0
in a loop where a vector register is used as the zero store source.

gcc/

  PR rtl-optimization/105638
  * dse.cc (record_store): Use the constant source if the source
  register is set only once.

gcc/testsuite/

  PR rtl-optimization/105638
  * g++.target/i386/pr105638.C: New test.
---
   gcc/dse.cc   | 19 ++
   gcc/testsuite/g++.target/i386/pr105638.C | 44 
   2 files changed, 63 insertions(+)
   create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C

diff --git a/gcc/dse.cc b/gcc/dse.cc
index 30c11cee034..0433dd3d846 100644
--- a/gcc/dse.cc
+++ b/gcc/dse.cc
@@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)

if (tem && CONSTANT_P (tem))
  const_rhs = tem;
+ else
+   {
+ /* If RHS is set only once to a constant, set CONST_RHS
+to the constant.  */
+ df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
+ if (def != nullptr
+ && !DF_REF_IS_ARTIFICIAL (def)
+ && !DF_REF_NEXT_REG (def))
+   {
+ rtx_insn *def_insn = DF_REF_INSN (def);
+ rtx def_body = PATTERN (def_insn);
+ if (GET_CODE (def_body) == SET)
+   {
+ rtx def_src = SET_SRC (def_body);
+ if (CONSTANT_P (def_src))
+   const_rhs = def_src;

doesn't DSE have its own tracking of stored values?  Shouldn't we

It tracks stored values only within the basic block.  When RTL loop
invariant motion hoists a constant initialization out of the loop into
a separate basic block, the constant store value becomes unknown
within the original basic block.


improve _that_ if it is not enough?  I also wonder if you need to

My patch extends DSE stored value tracking to include the constant which
is set only once in another basic block.


verify the SET isn't partial?


Here is the v2 patch to check that the constant is set by a non-partial
unconditional load.

OK for master?

Thanks.

H.J.
---
RTL DSE tracks redundant constant stores within a basic block.  When RTL
loop invariant motion hoists a constant initialization out of the loop
into a separate basic block, the constant store value becomes unknown
within the original basic block.  When recording store for RTL DSE, check
if the source register is set only once to a constant by a non-partial
unconditional load.  If yes, record the constant as the constant store
source.  It eliminates unrolled zero stores after memset 0 in a loop
where a vector register is used as the zero store source.
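
A rough illustration of the source shape this description is about, assuming a loop that clears each element with memset and then stores zero again to some fields. This is not the pr105638.C test, which is not shown in this thread; the type and function names are made up.

```cpp
#include <cassert>
#include <cstring>

// Made-up aggregate, large enough that a compiler may clear it with
// vector stores.
struct Item
{
  long a, b, c, d;
};

// Clear every element, then store zero to two fields a second time.
// The two explicit stores are redundant after the memset; stores of this
// kind are what the description above wants RTL DSE to remove.
void reset_items(Item *items, int count)
{
  assert(count >= 0);
  for (int i = 0; i < count; ++i)
    {
      std::memset(&items[i], 0, sizeof items[i]);
      items[i].a = 0;
      items[i].b = 0;
    }
}
```

Whether a given GCC build actually removes the extra stores depends on the target, the options, and on whether the zero constant was hoisted out of the loop, so this only shows the input pattern.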

gcc/

PR rtl-optimization/105638
* dse.cc (record_store): Use the constant source if the source
register is set only once.

gcc/testsuite/

PR rtl-optimization/105638
* g++.target/i386/pr105638.C: New test.
---
   gcc/dse.cc   | 22 
   gcc/testsuite/g++.target/i386/pr105638.C | 44 
   2 files changed, 66 insertions(+)
   create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C

diff --git a/gcc/dse.cc b/gcc/dse.cc
index 30c11cee034..af8e88dac32 100644
--- a/gcc/dse.cc
+++ b/gcc/dse.cc
@@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info)

  if (tem && CONSTANT_P (tem))
const_rhs = tem;
+   else
+ {
+   /* If RHS is set only once to a constant, set CONST_RHS
+  to the constant.  */
+   df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
+   if (def != nullptr
+   && !DF_REF_IS_ARTIFICIAL (def)
+   && !(DF_REF_FLAGS (def)
+& (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
+   && !DF_REF_NEXT_REG (def))

Can we introduce a helper for this?  There are already similar tests
in ira and loop-iv, and it seems a bit too complex to have to open-code
each time.

I can use find_single_def_src in loop-iv.cc:

/* If REGNO has a single definition, return its known value, otherwise return
 null.  */

rtx
find_single_def_src (unsigned int regno)

Yeah, reusing that sounds good.  Perhaps we should move it into df-core.cc,
alongside the df_reg_used group of functions.
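
The open-coded test being discussed boils down to "exactly one real, full, unconditional definition, and its source is a constant". A toy model of such a helper is sketched below. The Def structure is an assumption made for illustration and has nothing to do with the layout of GCC's df_ref, and single_constant_def is not find_single_def_src.

```cpp
#include <cassert>

// Toy definition record; NOT GCC's df_ref.
struct Def
{
  bool artificial;   // models DF_REF_IS_ARTIFICIAL
  bool partial;      // models the DF_REF_PARTIAL flag
  bool conditional;  // models the DF_REF_CONDITIONAL flag
  bool is_constant;  // models CONSTANT_P on the SET source
  long value;        // the constant, when is_constant is true
  const Def *next;   // models DF_REF_NEXT_REG
};

// Return a pointer to the constant when CHAIN holds exactly one
// non-artificial, non-partial, unconditional definition of a constant;
// otherwise return nullptr.
const long *single_constant_def(const Def *chain)
{
  if (chain == nullptr || chain->next != nullptr)
    return nullptr;
  if (chain->artificial || chain->partial || chain->conditional)
    return nullptr;
  return chain->is_constant ? &chain->value : nullptr;
}

// Usage: one qualifying chain and one chain with two definitions.
void single_def_demo()
{
  Def only = {false, false, false, true, 0, nullptr};
  assert(single_constant_def(&only) != nullptr);

  Def second = {false, false, false, true, 1, nullptr};
  Def first = {false, false, false, true, 0, &second};
  assert(single_constant_def(&first) == nullptr);
}
```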

I think 

Ping [PATCH v3, rs6000] Implemented f[min/max]_optab by xs[min/max]dp [PR103605]

2022-05-29 Thread HAO CHEN GUI via Gcc-patches
Hi,
   Gentle ping this:
https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595164.html
Thanks.

On 18/5/2022 4:52 PM, HAO CHEN GUI wrote:
> Hi,
>   This patch implements optab f[min/max]_optab by xs[min/max]dp on rs6000.
> Tests show that outputs of xs[min/max]dp are consistent with the standard
> of C99 fmin/max.
> 
>   This patch also binds __builtin_vsx_xs[min/max]dp to fmin/max instead
> of smin/max. So the builtins always generate xs[min/max]dp on all
> platforms.
> 
>   Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> ChangeLog
> 2022-05-18 Haochen Gui 
> 
> gcc/
>   PR target/103605
>   * rs6000.md (FMINMAX): New.
>   (minmax_op): New.
>   (f3): New pattern by UNSPEC_FMAX and UNSPEC_FMIN.
>   * rs6000-builtins.def (__builtin_vsx_xsmaxdp): Set pattern to fmaxdf3.
>   (__builtin_vsx_xsmindp): Set pattern to fmindf3.
> 
> gcc/testsuite/
>   PR target/103605
>   * gcc.dg/pr103605.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index f4a9f24bcc5..8b735493b40 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1613,10 +1613,10 @@
>  XSCVSPDP vsx_xscvspdp {}
> 
>const double __builtin_vsx_xsmaxdp (double, double);
> -XSMAXDP smaxdf3 {}
> +XSMAXDP fmaxdf3 {}
> 
>const double __builtin_vsx_xsmindp (double, double);
> -XSMINDP smindf3 {}
> +XSMINDP fmindf3 {}
> 
>const double __builtin_vsx_xsrdpi (double);
>  XSRDPI vsx_xsrdpi {}
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index bf85baa5370..197de0838ee 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -158,6 +158,8 @@ (define_c_enum "unspec"
> UNSPEC_HASHCHK
> UNSPEC_XXSPLTIDP_CONST
> UNSPEC_XXSPLTIW_CONST
> +   UNSPEC_FMAX
> +   UNSPEC_FMIN
>])
> 
>  ;;
> @@ -5341,6 +5343,22 @@ (define_insn_and_split "*s3_fpr"
>DONE;
>  })
> 
> +
> +(define_int_iterator FMINMAX [UNSPEC_FMAX UNSPEC_FMIN])
> +
> +(define_int_attr  minmax_op [(UNSPEC_FMAX "max")
> +  (UNSPEC_FMIN "min")])
> +
> +(define_insn "f3"
> +  [(set (match_operand:SFDF 0 "vsx_register_operand" "=wa")
> + (unspec:SFDF [(match_operand:SFDF 1 "vsx_register_operand" "wa")
> +   (match_operand:SFDF 2 "vsx_register_operand" "wa")]
> +   FMINMAX))]
> +"TARGET_VSX"
> +"xsdp %x0,%x1,%x2"
> +[(set_attr "type" "fp")]
> +)
> +
>  (define_expand "movcc"
> [(set (match_operand:GPR 0 "gpc_reg_operand")
>(if_then_else:GPR (match_operand 1 "comparison_operator")
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103605.c 
> b/gcc/testsuite/gcc.target/powerpc/pr103605.c
> new file mode 100644
> index 000..e43ac40c2d1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr103605.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O1 -mvsx" } */
> +/* { dg-final { scan-assembler-times {\mxsmaxdp\M} 3 } } */
> +/* { dg-final { scan-assembler-times {\mxsmindp\M} 3 } } */
> +
> +#include 
> +
> +double test1 (double d0, double d1)
> +{
> +  return fmin (d0, d1);
> +}
> +
> +float test2 (float d0, float d1)
> +{
> +  return fmin (d0, d1);
> +}
> +
> +double test3 (double d0, double d1)
> +{
> +  return fmax (d0, d1);
> +}
> +
> +float test4 (float d0, float d1)
> +{
> +  return fmax (d0, d1);
> +}
> +
> +double test5 (double d0, double d1)
> +{
> +  return __builtin_vsx_xsmindp (d0, d1);
> +}
> +
> +double test6 (double d0, double d1)
> +{
> +  return __builtin_vsx_xsmaxdp (d0, d1);
> +}
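
The C99 behaviour the quoted message refers to is that fmin and fmax return the numeric operand when exactly one operand is a NaN. A hand-written model of that rule is sketched below, next to a plain comparison-based minimum for contrast. Both functions are illustrative: model_fmin is not the libm implementation, and compare_min is not a statement about how GCC defines its smin pattern.

```cpp
#include <cassert>
#include <cmath>  // only for the NAN macro

// Model of the C99 fmin rule: if exactly one argument is a NaN,
// return the other argument.
double model_fmin(double a, double b)
{
  if (a != a)  // a is a NaN
    return b;
  if (b != b)  // b is a NaN
    return a;
  return a < b ? a : b;
}

// Plain comparison-based minimum.  Any comparison with a NaN is false,
// so the result is always the second operand when either input is a NaN.
double compare_min(double a, double b)
{
  return a < b ? a : b;
}
```

With these definitions model_fmin (1.0, NAN) yields 1.0, whereas compare_min (1.0, NAN) yields a NaN, which is why the two operations cannot be treated as interchangeable when NaNs are possible.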


Ping [PATCH v2, rs6000] Fix ICE on expand bcd__ [PR100736]

2022-05-29 Thread HAO CHEN GUI via Gcc-patches
Hi,
   Gentle ping this:
https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595661.html
Thanks.

On 26/5/2022 3:35 PM, HAO CHEN GUI wrote:
> Hi,
>   This patch fixes the ICE reported in PR100736. It removes the condition
> that the finite-math-only flag must not be set from the "*_cc" pattern.
> With or without this flag, we can still use "cror" to check whether either
> of two CC bits is set for "fp_two" codes. We don't need a reverse
> comparison (implemented by crnot) here when the finite math flag is set,
> as the latencies of "cror" and "crnor" are the same.
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> ChangeLog
> 2022-05-26 Haochen Gui 
> 
> gcc/
>   * config/rs6000/rs6000.md (*_cc): Remove condition of
>   finite math only flag not setting.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/pr100736.c: New.
> 
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index fdfbc6566a5..a6f9cbc9b8b 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -12995,9 +12995,9 @@ (define_insn_and_split "*_cc"
>[(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
>   (fp_two:GPR (match_operand:CCFP 1 "cc_reg_operand" "y")
> (const_int 0)))]
> -  "!flag_finite_math_only"
> +  ""
>"#"
> -  "&& 1"
> +  ""
>[(pc)]
>  {
>rtx cc = rs6000_emit_fp_cror (, mode, operands[1]);
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr100736.c 
> b/gcc/testsuite/gcc.target/powerpc/pr100736.c
> new file mode 100644
> index 000..32cb6df6cd9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr100736.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-mdejagnu-cpu=power8 -O2 -ffinite-math-only" } */
> +
> +typedef __attribute__ ((altivec (vector__))) unsigned char v;
> +
> +int foo (v a, v b)
> +{
> +  return __builtin_vec_bcdsub_ge (a, b, 0);
> +}
> +
> +/* { dg-final { scan-assembler {\mcror\M} } } */
> 
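
The argument in the quoted message can be pictured with a small model of a condition-register field: a two-bit code such as "greater or equal" is the OR of two bits, and only when unordered results are excluded does it coincide with the negation of a single bit. The sketch below is an assumption-laden illustration on doubles. It does not model the BCD builtin from the test case, and the struct and function names are invented.

```cpp
#include <cassert>
#include <cmath>  // only for the NAN macro

// Invented model of a 4-bit CR field after a floating-point compare.
struct CrField
{
  bool lt, gt, eq, un;
};

CrField fp_compare(double a, double b)
{
  CrField cr = {false, false, false, false};
  if (a != a || b != b)
    cr.un = true;  // unordered: at least one NaN
  else if (a < b)
    cr.lt = true;
  else if (a > b)
    cr.gt = true;
  else
    cr.eq = true;
  return cr;
}

// "ge" as the OR of two bits; this models what a cror would compute.
bool ge_by_or(const CrField &cr)
{
  return cr.gt || cr.eq;
}

// "ge" as the negation of one bit; this models the reverse comparison
// and is only equivalent when the unordered bit can never be set.
bool ge_by_not_lt(const CrField &cr)
{
  return !cr.lt;
}
```

For ordered inputs both functions agree. For an unordered compare they differ, which is why the OR form is valid with or without the finite-math assumption.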


Re: [PATCH] c++: Add !TYPE_P assert to type_dependent_expression_p [PR99080]

2022-05-29 Thread Jason Merrill via Gcc-patches

On 5/27/22 12:43, Marek Polacek wrote:

On Fri, May 27, 2022 at 11:52:12AM -0400, Jason Merrill wrote:

On 5/26/22 20:33, Marek Polacek wrote:

As discussed here:
,
type_dependent_expression_p should not be called with a type argument.

I promised I'd add an assert so here it is.  One place needed adjusting,
the comment explains why.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/99080

gcc/cp/ChangeLog:

* pt.cc (type_dependent_expression_p): Assert !TYPE_P.
* semantics.cc (finish_id_expression_1): Don't call
type_dependent_expression_p for a type.
---
   gcc/cp/pt.cc| 2 ++
   gcc/cp/semantics.cc | 4 +++-
   2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 24bbe2f4060..89156cb88b4 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -27727,6 +27727,8 @@ type_dependent_expression_p (tree expression)
 if (expression == NULL_TREE || expression == error_mark_node)
   return false;
+  gcc_checking_assert (!TYPE_P (expression));
+
 STRIP_ANY_LOCATION_WRAPPER (expression);
 /* An unresolved name is always dependent.  */
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index cd7a2818feb..7f8502f49b0 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -4141,7 +4141,9 @@ finish_id_expression_1 (tree id_expression,
   }
 else
   {
-  bool dependent_p = type_dependent_expression_p (decl);
+  /* DECL could be e.g. UNBOUND_CLASS_TEMPLATE which is a type which
+t_d_e_p doesn't accept.  */
+  bool dependent_p = !TYPE_P (decl) && type_dependent_expression_p (decl);


Maybe instead we could handle UNBOUND_CLASS_TEMPLATE at a higher level in
the function, like with an 'else if' before this 'else'?


Maybe, but I think I'd have to duplicate (parts of) this block:
  else if (scope)
    {
      if (TREE_CODE (decl) == SCOPE_REF)
        {
          gcc_assert (same_type_p (scope, TREE_OPERAND (decl, 0)));
          decl = TREE_OPERAND (decl, 1);
        }

      decl = (adjust_result_of_qualified_name_lookup
              (decl, scope, current_nonlambda_class_type()));

      cp_warn_deprecated_use_scopes (scope);

      if (TYPE_P (scope))
        decl = finish_qualified_id_expr (scope,
                                         decl,
                                         done,
                                         address_p,
                                         template_p,
                                         template_arg_p,
                                         tf_warning_or_error);
      else
        decl = convert_from_reference (decl);
    }

Would that be acceptable?  Can't do

   else if (TREE_CODE (decl) == UNBOUND_CLASS_TEMPLATE)
 {
   gcc_checking_assert (scope);
   *idk = CP_ID_KIND_QUALIFIED;
   goto do_scope;
 }
because that will complain about skipping the initialization of dependent_p.
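
The complaint mentioned here is the C++ rule that a goto may not jump past the initialization of a variable that is still in scope at the label. A minimal sketch of the rule and of one common workaround (hoisting the declaration above the jump) follows. It is not what the patch below does, and the function and its names are hypothetical.

```cpp
#include <cassert>

// Ill-formed variant, rejected by the compiler because the jump would
// skip the initialization of dependent_p:
//
//   if (kind == 0)
//     goto do_scope;
//   bool dependent_p = (kind > 1);
//  do_scope:
//   return dependent_p ? 2 : 1;

// Well-formed variant: the variable is declared and initialized before
// the goto, so no jump bypasses its initialization.
int classify(int kind)
{
  assert(kind >= 0);
  bool dependent_p = false;

  if (kind == 0)
    goto do_scope;

  dependent_p = (kind > 1);

do_scope:
  return dependent_p ? 2 : 1;
}
```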

Here's a patch with the partial duplication, which passes dg.exp:

-- >8 --
As discussed here:
,
type_dependent_expression_p should not be called with a type argument.

I promised I'd add an assert so here it is.  One place needed adjusting.

PR c++/99080

gcc/cp/ChangeLog:

* pt.cc (type_dependent_expression_p): Assert !TYPE_P.
* semantics.cc (finish_id_expression_1): Handle UNBOUND_CLASS_TEMPLATE
specifically.
---
  gcc/cp/pt.cc|  2 ++
  gcc/cp/semantics.cc | 11 +++
  2 files changed, 13 insertions(+)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 24bbe2f4060..89156cb88b4 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -27727,6 +27727,8 @@ type_dependent_expression_p (tree expression)
if (expression == NULL_TREE || expression == error_mark_node)
  return false;
  
+  gcc_checking_assert (!TYPE_P (expression));

+
STRIP_ANY_LOCATION_WRAPPER (expression);
  
/* An unresolved name is always dependent.  */

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index cdc91a38e25..f62b0a4a736 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -4139,6 +4139,17 @@ finish_id_expression_1 (tree id_expression,
}
return r;
  }
+  else if (TREE_CODE (decl) == UNBOUND_CLASS_TEMPLATE)
+{
+  gcc_checking_assert (scope);
+  *idk = CP_ID_KIND_QUALIFIED;
+  decl = (adjust_result_of_qualified_name_lookup
+ (decl, scope, current_nonlambda_class_type()));


This call should have no effect, it only affects BASELINKs.  OK without 
this statement.



+  cp_warn_deprecated_use_scopes (scope);
+  decl = finish_qualified_id_expr (scope, decl, 

Re: [PATCH] c++: don't substitute TEMPLATE_PARM_CONSTRAINT [PR100374]

2022-05-29 Thread Jason Merrill via Gcc-patches

On 5/27/22 14:05, Patrick Palka wrote:

This makes us avoid substituting into the TEMPLATE_PARM_CONSTRAINT of
each template parameter except as necessary for (friend) declaration
matching, like we already do for the overall TEMPLATE_PARMS_CONSTRAINTS
of a template parameter list.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk and perhaps 12.2?  Also tested on range-v3 and cmcstl2.


Are there already tests that cover the friend cases?


PR c++/100374

gcc/cp/ChangeLog:

* pt.cc (tsubst_each_template_parm_constraint): Define.
(tsubst_friend_function): Use it.
(tsubst_friend_class): Use it.
(tsubst_template_parm): Don't substitute TEMPLATE_PARM_CONSTRAINT.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-template-parm11.C: New test.
---
  gcc/cp/pt.cc  | 35 ---
  .../g++.dg/cpp2a/concepts-template-parm11.C   | 16 +
  2 files changed, 47 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-template-parm11.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 24bbe2f4060..ec168234325 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -184,6 +184,7 @@ static int unify_pack_expansion (tree, tree, tree,
 tree, unification_kind_t, bool, bool);
  static tree copy_template_args (tree);
  static tree tsubst_template_parms (tree, tree, tsubst_flags_t);
+static void tsubst_each_template_parm_constraint (tree, tree, tsubst_flags_t);
  tree most_specialized_partial_spec (tree, tsubst_flags_t);
  static tree tsubst_aggr_type (tree, tree, tsubst_flags_t, tree, int);
  static tree tsubst_arg_types (tree, tree, tree, tsubst_flags_t, tree);
@@ -11254,7 +11255,12 @@ tsubst_friend_function (tree decl, tree args)
tree parms = DECL_TEMPLATE_PARMS (new_friend);
tree treqs = TEMPLATE_PARMS_CONSTRAINTS (parms);
treqs = maybe_substitute_reqs_for (treqs, new_friend);
-  TEMPLATE_PARMS_CONSTRAINTS (parms) = treqs;
+  if (treqs != TEMPLATE_PARMS_CONSTRAINTS (parms))
+   {
+ TEMPLATE_PARMS_CONSTRAINTS (parms) = treqs;
+ /* As well as each TEMPLATE_PARM_CONSTRAINT.  */
+ tsubst_each_template_parm_constraint (parms, args, 
tf_warning_or_error);
+   }
  }
  
/* The mangled name for the NEW_FRIEND is incorrect.  The function

@@ -11500,6 +11506,8 @@ tsubst_friend_class (tree friend_tmpl, tree args)
{
  tree parms = tsubst_template_parms (DECL_TEMPLATE_PARMS (friend_tmpl),
  args, tf_warning_or_error);
+ tsubst_each_template_parm_constraint (parms, args,
+   tf_warning_or_error);
location_t saved_input_location = input_location;
input_location = DECL_SOURCE_LOCATION (friend_tmpl);
tree cons = get_constraints (tmpl);
@@ -11534,6 +11542,8 @@ tsubst_friend_class (tree friend_tmpl, tree args)
   DECL_FRIEND_CONTEXT (friend_tmpl));
  --processing_template_decl;
  set_constraints (tmpl, ci);
+ tsubst_each_template_parm_constraint (DECL_TEMPLATE_PARMS (tmpl),
+   args, tf_warning_or_error);
}
  
  	  /* Inject this template into the enclosing namspace scope.  */

@@ -13656,7 +13666,6 @@ tsubst_template_parm (tree t, tree args, tsubst_flags_t 
complain)
  
default_value = TREE_PURPOSE (t);

parm_decl = TREE_VALUE (t);
-  tree constraint = TEMPLATE_PARM_CONSTRAINTS (t);
  
parm_decl = tsubst (parm_decl, args, complain, NULL_TREE);

if (TREE_CODE (parm_decl) == PARM_DECL
@@ -13664,13 +13673,31 @@ tsubst_template_parm (tree t, tree args, 
tsubst_flags_t complain)
  parm_decl = error_mark_node;
default_value = tsubst_template_arg (default_value, args,
   complain, NULL_TREE);
-  constraint = tsubst_constraint (constraint, args, complain, NULL_TREE);
  
tree r = build_tree_list (default_value, parm_decl);

-  TEMPLATE_PARM_CONSTRAINTS (r) = constraint;
+  TEMPLATE_PARM_CONSTRAINTS (r) = TEMPLATE_PARM_CONSTRAINTS (t);
return r;
  }
  
+/* Substitute in-place the TEMPLATE_PARM_CONSTRAINT of each template

+   parameter in PARMS for sake of declaration matching.  */
+
+static void
+tsubst_each_template_parm_constraint (tree parms, tree args,
+ tsubst_flags_t complain)
+{
+  ++processing_template_decl;
+  for (; parms; parms = TREE_CHAIN (parms))
+{
+  tree level = TREE_VALUE (parms);
+  for (tree parm : tree_vec_range (level))
+   TEMPLATE_PARM_CONSTRAINTS (parm)
+ = tsubst_constraint (TEMPLATE_PARM_CONSTRAINTS (parm), args,
+  complain, NULL_TREE);
+}
+  --processing_template_decl;
+}
+
  /* Substitute the ARGS into the indicated aggregate (or enumeration)

Re: [PATCH] c++: don't substitute TEMPLATE_PARM_CONSTRAINT [PR100374]

2022-05-29 Thread Jason Merrill via Gcc-patches

On 5/29/22 22:10, Jason Merrill wrote:

On 5/27/22 14:05, Patrick Palka wrote:

This makes us avoid substituting into the TEMPLATE_PARM_CONSTRAINT of
each template parameter except as necessary for (friend) declaration
matching, like we already do for the overall TEMPLATE_PARMS_CONSTRAINTS
of a template parameter list.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk and perhaps 12.2?  Also tested on range-v3 and cmcstl2.


Are there already tests that cover the friend cases?


Also, don't you also need to handle specialization of partial 
instantiations?



PR c++/100374

gcc/cp/ChangeLog:

* pt.cc (tsubst_each_template_parm_constraint): Define.
(tsubst_friend_function): Use it.
(tsubst_friend_class): Use it.
(tsubst_template_parm): Don't substitute TEMPLATE_PARM_CONSTRAINT.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-template-parm11.C: New test.
---
  gcc/cp/pt.cc  | 35 ---
  .../g++.dg/cpp2a/concepts-template-parm11.C   | 16 +
  2 files changed, 47 insertions(+), 4 deletions(-)
  create mode 100644 
gcc/testsuite/g++.dg/cpp2a/concepts-template-parm11.C


diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 24bbe2f4060..ec168234325 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -184,6 +184,7 @@ static int unify_pack_expansion (tree, tree, tree,
   tree, unification_kind_t, bool, bool);
  static tree copy_template_args (tree);
  static tree tsubst_template_parms (tree, tree, tsubst_flags_t);
+static void tsubst_each_template_parm_constraint (tree, tree, 
tsubst_flags_t);

  tree most_specialized_partial_spec (tree, tsubst_flags_t);
  static tree tsubst_aggr_type (tree, tree, tsubst_flags_t, tree, int);
  static tree tsubst_arg_types (tree, tree, tree, tsubst_flags_t, tree);
@@ -11254,7 +11255,12 @@ tsubst_friend_function (tree decl, tree args)
    tree parms = DECL_TEMPLATE_PARMS (new_friend);
    tree treqs = TEMPLATE_PARMS_CONSTRAINTS (parms);
    treqs = maybe_substitute_reqs_for (treqs, new_friend);
-  TEMPLATE_PARMS_CONSTRAINTS (parms) = treqs;
+  if (treqs != TEMPLATE_PARMS_CONSTRAINTS (parms))
+    {
+  TEMPLATE_PARMS_CONSTRAINTS (parms) = treqs;
+  /* As well as each TEMPLATE_PARM_CONSTRAINT.  */
+  tsubst_each_template_parm_constraint (parms, args, 
tf_warning_or_error);

+    }
  }
    /* The mangled name for the NEW_FRIEND is incorrect.  The function
@@ -11500,6 +11506,8 @@ tsubst_friend_class (tree friend_tmpl, tree args)
  {
    tree parms = tsubst_template_parms (DECL_TEMPLATE_PARMS 
(friend_tmpl),

    args, tf_warning_or_error);
+  tsubst_each_template_parm_constraint (parms, args,
+    tf_warning_or_error);
    location_t saved_input_location = input_location;
    input_location = DECL_SOURCE_LOCATION (friend_tmpl);
    tree cons = get_constraints (tmpl);
@@ -11534,6 +11542,8 @@ tsubst_friend_class (tree friend_tmpl, tree args)
 DECL_FRIEND_CONTEXT (friend_tmpl));
    --processing_template_decl;
    set_constraints (tmpl, ci);
+  tsubst_each_template_parm_constraint (DECL_TEMPLATE_PARMS 
(tmpl),

+    args, tf_warning_or_error);
  }
    /* Inject this template into the enclosing namspace scope.  */
@@ -13656,7 +13666,6 @@ tsubst_template_parm (tree t, tree args, 
tsubst_flags_t complain)

    default_value = TREE_PURPOSE (t);
    parm_decl = TREE_VALUE (t);
-  tree constraint = TEMPLATE_PARM_CONSTRAINTS (t);
    parm_decl = tsubst (parm_decl, args, complain, NULL_TREE);
    if (TREE_CODE (parm_decl) == PARM_DECL
@@ -13664,13 +13673,31 @@ tsubst_template_parm (tree t, tree args, 
tsubst_flags_t complain)

  parm_decl = error_mark_node;
    default_value = tsubst_template_arg (default_value, args,
 complain, NULL_TREE);
-  constraint = tsubst_constraint (constraint, args, complain, 
NULL_TREE);

    tree r = build_tree_list (default_value, parm_decl);
-  TEMPLATE_PARM_CONSTRAINTS (r) = constraint;
+  TEMPLATE_PARM_CONSTRAINTS (r) = TEMPLATE_PARM_CONSTRAINTS (t);
    return r;
  }
+/* Substitute in-place the TEMPLATE_PARM_CONSTRAINT of each template
+   parameter in PARMS for sake of declaration matching.  */
+
+static void
+tsubst_each_template_parm_constraint (tree parms, tree args,
+  tsubst_flags_t complain)
+{
+  ++processing_template_decl;
+  for (; parms; parms = TREE_CHAIN (parms))
+    {
+  tree level = TREE_VALUE (parms);
+  for (tree parm : tree_vec_range (level))
+    TEMPLATE_PARM_CONSTRAINTS (parm)
+  = tsubst_constraint (TEMPLATE_PARM_CONSTRAINTS (parm), args,
+   complain, NULL_TREE);
+    }
+  --processing_template_decl;
+}
+
  /* Substitute the ARGS into the indicated aggregate (or enumeration)
 type T.  If T is not an aggregate or enumeration type, it is
 handled as 

RE: [PATCH] Add a bit dislike for separate mem alternative when op is REG_P.

2022-05-29 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Alexander Monakov 
> Sent: Friday, May 27, 2022 5:39 PM
> To: Liu, Hongtao 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Add a bit dislike for separate mem alternative when op is
> REG_P.
> 
> On Wed, 25 May 2022, liuhongt via Gcc-patches wrote:
> 
> > Right now, mem_cost for the separate mem alternative is 1 * frequency, which
> > is pretty small and caused the unnecessary SSE spill in the PR. I've
> > tried to rework the backend cost model, but RA is still not happy with
> > that (it regresses somewhere else). I think the root cause is that the cost
> > of the separate 'm' alternative is too small, especially considering that
> > the mov cost of GPRs is 2 (the default for REGISTER_MOVE_COST). So this patch
> > increases mem_cost to 2 * frequency, and also increases the reg_class cost
> > by 1 for the 'm' alternative.
> 
> In the PR, the spill happens in the initial basic block of the function, i.e.
> the one with the highest frequency.
> 
> Also as noted in the PR, swapping the 'unlikely' branch to 'likely' avoids
> the spill, even though it does not affect the frequency of the initial basic
> block, and makes the block with the use more rarely executed.

The spill is mainly decided by 3 insns related to r92

(insn 3 61 4 2 (set (reg/v:SF 92 [ x ])
        (reg:SF 102)) "test3.c":7:1 142 {*movsf_internal}
     (expr_list:REG_DEAD (reg:SF 102)

(insn 9 4 12 2 (set (reg:SI 89 [ _11 ])
        (subreg:SI (reg/v:SF 92 [ x ]) 0)) "test3.c":3:36 81 {*movsi_internal}
     (nil))

And
(insn 28 27 29 5 (set (reg:DF 98)
        (float_extend:DF (reg/v:SF 92 [ x ]))) "test3.c":11:13 163 {*extendsfdf2}
     (expr_list:REG_DEAD (reg/v:SF 92 [ x ])
        (nil)))
(insn 29 28 30 5 (s

The frequency of INSN 3 and INSN 9 is not affected, but the frequency of
INSN 28 drops from 805 to 89 after swapping "unlikely" and "likely".
Because of that, the GPR cost decreases a lot, which finally makes the RA
choose GPR instead of MEM.

GENERAL_REGS:2356,2356 
SSE_REGS:6000,6000
MEM:4089,4089

Dump of 301.ira:
  a4(r92,l0) costs: AREG:2356,2356 DREG:2356,2356 CREG:2356,2356
    BREG:2356,2356 SIREG:2356,2356 DIREG:2356,2356 AD_REGS:2356,2356
    CLOBBERED_REGS:2356,2356 Q_REGS:2356,2356 NON_Q_REGS:2356,2356
    TLS_GOTBASE_REGS:2356,2356 GENERAL_REGS:2356,2356 SSE_FIRST_REG:6000,6000
    NO_REX_SSE_REGS:6000,6000 SSE_REGS:6000,6000
    MMX_REGS:19534,19534 INT_SSE_REGS:19534,19534 ALL_REGS:214534,214534
    MEM:4089,4089
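
As a reading aid for the numbers above, the sketch below simply picks the entry with the lowest cost out of a list of (class, cost) pairs. This is a deliberate simplification: the real allocator weighs much more than a single minimum, and the type and function names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical (class name, cost) pair as printed in the summary above.
struct ClassCost
{
  std::string name;
  int cost;
};

// Return the name of the entry with the smallest cost; the first entry
// wins on ties.
std::string cheapest_class(const std::vector<ClassCost> &costs)
{
  assert(!costs.empty());
  const ClassCost *best = &costs[0];
  for (const ClassCost &candidate : costs)
    if (candidate.cost < best->cost)
      best = &candidate;
  return best->name;
}
```

Fed with GENERAL_REGS:2356, SSE_REGS:6000 and MEM:4089, this returns GENERAL_REGS, matching the choice described above.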

And although there's no spill, there's an extra VMOVD in the later BB, which
looks suboptimal (I guess we can live with that since it's cold).

        vmovd   %eax, %xmm2
        vcvtss2sd   %xmm2, %xmm2, %xmm1
        vmulsd  %xmm0, %xmm1, %xmm0
        vcvtsd2ss   %xmm0, %xmm0, %xmm0
> 
> Do you have a root cause analysis that explains the above?
> 
> Alexander


Re: [PATCH] Add a bit dislike for separate mem alternative when op is REG_P.

2022-05-29 Thread Hongtao Liu via Gcc-patches
On Fri, May 27, 2022 at 5:12 AM Vladimir Makarov via Gcc-patches
 wrote:
>
>
> On 2022-05-24 23:39, liuhongt wrote:
> > Right now, mem_cost for the separate mem alternative is 1 * frequency, which
> > is pretty small and caused the unnecessary SSE spill in the PR. I've tried
> > to rework the backend cost model, but RA is still not happy with that (it
> > regresses somewhere else). I think the root cause is that the cost of the
> > separate 'm' alternative is too small, especially considering that the mov
> > cost of GPRs is 2 (the default for REGISTER_MOVE_COST). So this patch
> > increases mem_cost to 2 * frequency, and also increases the reg_class cost
> > by 1 for the 'm' alternative.
> >
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk?
>
> Thank you for addressing this problem. And sorry I can not approve this
> patch at least w/o your additional work on benchmarking this change.
>
> This code is very old.  It is coming from older RA (former file
> regclass.c) and existed practically since GCC day 1.  People tried many
> times to improve this code.  The code also affects many targets.
Yes, that's why I increased it as little as possible, so it won't regress
#c6 in the PR.
>
> I can approve this patch if you show that there is no regression at
> least on x86-64 on some credible benchmark, e.g. SPEC2006 or SPEC2017.
>
I've tested the patch on SPEC2017 with both -march=cascadelake
-Ofast -flto and -O2 -mtune=generic.
No obvious regression is observed. The binaries are all different from
before, so I looked at 2 of them; the difference mainly comes from
different choices of registers (xmm13 -> xmm12).
Ok for trunk then?
> I know it is a big work but when I myself do such changes I check
> SPEC2017.  I rejected my changes like this one several times when I
> benchmarked them on SPEC2017 although at the first glance they looked
> reasonable.
>
> > gcc/ChangeLog:
> >
> >   PR target/105513
> >   * ira-costs.cc (record_reg_classes): Increase both mem_cost
> >   and reg class cost by 1 for separate mem alternative when
> >   REG_P (op).
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/i386/pr105513-1.c: New test.
> > ---
> >   gcc/ira-costs.cc   | 26 +-
> >   gcc/testsuite/gcc.target/i386/pr105513-1.c | 16 +
> >   2 files changed, 31 insertions(+), 11 deletions(-)
> >   create mode 100644 gcc/testsuite/gcc.target/i386/pr105513-1.c
> >
> > diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc
> > index 964c94a06ef..f7b8325e195 100644
> > --- a/gcc/ira-costs.cc
> > +++ b/gcc/ira-costs.cc
> > @@ -625,7 +625,8 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
> > for (k = cost_classes_ptr->num - 1; k >= 0; k--)
> >   {
> > rclass = cost_classes[k];
> > -   pp_costs[k] = mem_cost[rclass][0] * frequency;
> > +   pp_costs[k] = (mem_cost[rclass][0]
> > +  + 1) * frequency;
> >   }
> >   }
> > else
> > @@ -648,7 +649,8 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
> > for (k = cost_classes_ptr->num - 1; k >= 0; k--)
> >   {
> > rclass = cost_classes[k];
> > -   pp_costs[k] = mem_cost[rclass][1] * frequency;
> > +   pp_costs[k] = (mem_cost[rclass][1]
> > +  + 1) * frequency;
> >   }
> >   }
> > else
> > @@ -670,9 +672,9 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
> > for (k = cost_classes_ptr->num - 1; k >= 0; k--)
> >   {
> > rclass = cost_classes[k];
> > -   pp_costs[k] = ((mem_cost[rclass][0]
> > -   + mem_cost[rclass][1])
> > -  * frequency);
> > +   pp_costs[k] = (mem_cost[rclass][0]
> > +  + mem_cost[rclass][1]
> > +  + 2) * frequency;
> >   }
> >   }
> > else
> > @@ -861,7 +863,8 @@ record_reg_classes (int n_alts, int n_ops, rtx *ops,
> > for (k = cost_classes_ptr->num - 1; k >= 0; k--)
> >   {
> > rclass = cost_classes[k];
> > -   pp_costs[k] = mem_cost[rclass][0] * frequency;
> > +   pp_costs[k] = (mem_cost[rclass][0]
> > +  + 1) * frequency;
> >   }
> >   }
> > else
> > @@ -884,7 +887,8 @@ record_re
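
The arithmetic in the quoted ira-costs.cc hunks can be restated as a small model: each memory-move cost that enters the sum is bumped by one before the result is scaled by the frequency. The function and parameter names below are made up, and cost0 and cost1 merely stand for mem_cost[rclass][0] and mem_cost[rclass][1].

```cpp
#include <cassert>

// One-direction case before the patch: cost * frequency.
long one_way_before(long cost, long frequency)
{
  return cost * frequency;
}

// One-direction case after the patch: (cost + 1) * frequency.
long one_way_after(long cost, long frequency)
{
  return (cost + 1) * frequency;
}

// Two-direction case before the patch: (cost0 + cost1) * frequency.
long two_way_before(long cost0, long cost1, long frequency)
{
  return (cost0 + cost1) * frequency;
}

// Two-direction case after the patch: (cost0 + cost1 + 2) * frequency.
long two_way_after(long cost0, long cost1, long frequency)
{
  return (cost0 + cost1 + 2) * frequency;
}

// Usage: with a cost of 1, the one-direction result doubles, which is the
// "1 * frequency" to "2 * frequency" change the description mentions.
void cost_demo()
{
  assert(one_way_before(1, 1000) == 1000);
  assert(one_way_after(1, 1000) == 2000);
}
```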

Re: [PATCH v3] RISC-V/testsuite: constraint some of tests to hard_float

2022-05-29 Thread Kito Cheng via Gcc-patches
Committed, thanks!

On Fri, May 27, 2022 at 10:37 AM Vineet Gupta  wrote:
>
> Commit 9ddd44b58649d1d ("RISC-V: Provide `fmin'/`fmax' RTL pattern") added
> tests which check for hard-float instructions, which obviously fail on
> soft-float ABI builds.
>
> And my recent commit b646d7d279ae ("RISC-V: Inhibit FP <--> int register
> moves via tune param") is guilty of the same crime.
>
> So constrain these tests with "dg-require-effective-target hard_float".
>
> This removes a bunch of new RV failures.
>
> |   = Summary of gcc testsuite =
> |                            | # of unexpected case / # of unique unexpected case
> |                            |      gcc |    g++ | gfortran |
> |   rv64imac/   lp64/ medlow | 134 / 22 |  0 / 0 |        - |  BEFORE
> |   rv64imac/   lp64/ medlow |  22 /  9 |  0 / 0 |        - |  AFTER
> |
>
> gcc/testsuite/Changelog:
> * gcc.target/riscv/fmax.c: Add dg-require-effective-target hard_float.
> * gcc.target/riscv/fmaxf.c: Ditto.
> * gcc.target/riscv/fmin.c: Ditto.
> * gcc.target/riscv/fminf.c: Ditto.
> * gcc.target/riscv/smax-ieee.c: Ditto.
> * gcc.target/riscv/smax.c: Ditto.
> * gcc.target/riscv/smaxf-ieee.c: Ditto.
> * gcc.target/riscv/smaxf.c: Ditto.
> * gcc.target/riscv/smin-ieee.c: Ditto.
> * gcc.target/riscv/smin.c: Ditto.
> * gcc.target/riscv/sminf-ieee.c: Ditto.
> * gcc.target/riscv/sminf.c: Ditto.
> * gcc.target/riscv/pr105666.c: Ditto.
>
> Signed-off-by: Vineet Gupta 
> ---
> v3:
> Added fix to pr105666.c as well.
> v2:
> Fixed the SoB snafu in v1
> ---
>  gcc/testsuite/gcc.target/riscv/fmax.c   | 1 +
>  gcc/testsuite/gcc.target/riscv/fmaxf.c  | 1 +
>  gcc/testsuite/gcc.target/riscv/fmin.c   | 1 +
>  gcc/testsuite/gcc.target/riscv/fminf.c  | 1 +
>  gcc/testsuite/gcc.target/riscv/pr105666.c   | 1 +
>  gcc/testsuite/gcc.target/riscv/smax-ieee.c  | 1 +
>  gcc/testsuite/gcc.target/riscv/smax.c   | 1 +
>  gcc/testsuite/gcc.target/riscv/smaxf-ieee.c | 1 +
>  gcc/testsuite/gcc.target/riscv/smaxf.c  | 1 +
>  gcc/testsuite/gcc.target/riscv/smin-ieee.c  | 1 +
>  gcc/testsuite/gcc.target/riscv/smin.c   | 1 +
>  gcc/testsuite/gcc.target/riscv/sminf-ieee.c | 1 +
>  gcc/testsuite/gcc.target/riscv/sminf.c  | 1 +
>  13 files changed, 13 insertions(+)
>
> diff --git a/gcc/testsuite/gcc.target/riscv/fmax.c 
> b/gcc/testsuite/gcc.target/riscv/fmax.c
> index c71d35c9f9dc..e1b7fa8f918c 100644
> --- a/gcc/testsuite/gcc.target/riscv/fmax.c
> +++ b/gcc/testsuite/gcc.target/riscv/fmax.c
> @@ -1,4 +1,5 @@
>  /* { dg-do compile } */
> +/* { dg-require-effective-target hard_float } */
>  /* { dg-options "-fno-finite-math-only -fsigned-zeros -fno-signaling-nans -dp" } */
>
>  double
> diff --git a/gcc/testsuite/gcc.target/riscv/fmaxf.c b/gcc/testsuite/gcc.target/riscv/fmaxf.c
> index f9980166887a..8da0513dc8f6 100644
> --- a/gcc/testsuite/gcc.target/riscv/fmaxf.c
> +++ b/gcc/testsuite/gcc.target/riscv/fmaxf.c
> @@ -1,4 +1,5 @@
>  /* { dg-do compile } */
> +/* { dg-require-effective-target hard_float } */
>  /* { dg-options "-fno-finite-math-only -fsigned-zeros -fno-signaling-nans -dp" } */
>
>  float
> diff --git a/gcc/testsuite/gcc.target/riscv/fmin.c b/gcc/testsuite/gcc.target/riscv/fmin.c
> index 9634abd19af8..01993d49bc21 100644
> --- a/gcc/testsuite/gcc.target/riscv/fmin.c
> +++ b/gcc/testsuite/gcc.target/riscv/fmin.c
> @@ -1,4 +1,5 @@
>  /* { dg-do compile } */
> +/* { dg-require-effective-target hard_float } */
>  /* { dg-options "-fno-finite-math-only -fsigned-zeros -fno-signaling-nans -dp" } */
>
>  double
> diff --git a/gcc/testsuite/gcc.target/riscv/fminf.c b/gcc/testsuite/gcc.target/riscv/fminf.c
> index 9a3687be3092..32ce363e10d8 100644
> --- a/gcc/testsuite/gcc.target/riscv/fminf.c
> +++ b/gcc/testsuite/gcc.target/riscv/fminf.c
> @@ -1,4 +1,5 @@
>  /* { dg-do compile } */
> +/* { dg-require-effective-target hard_float } */
>  /* { dg-options "-fno-finite-math-only -fsigned-zeros -fno-signaling-nans -dp" } */
>
>  float
> diff --git a/gcc/testsuite/gcc.target/riscv/pr105666.c b/gcc/testsuite/gcc.target/riscv/pr105666.c
> index 904f3bc0763f..dd996eec8efc 100644
> --- a/gcc/testsuite/gcc.target/riscv/pr105666.c
> +++ b/gcc/testsuite/gcc.target/riscv/pr105666.c
> @@ -6,6 +6,7 @@
> spilling to stack.  */
>
>  /* { dg-do compile } */
> +/* { dg-require-effective-target hard_float } */
>  /* { dg-options "-march=rv64g -ffast-math" } */
>
>  #define NITER 4
> diff --git a/gcc/testsuite/gcc.target/riscv/smax-ieee.c b/gcc/testsuite/gcc.target/riscv/smax-ieee.c
> index 3a98aeb45add..2dbccefe2f4d 100644
> --- a/gcc/testsuite/gcc.target/riscv/smax-ieee.c
> +++ b/gcc/testsuite/gcc.target/riscv/smax-ieee.c
> @@ -1,4 +1,5 @@
>  /* { dg-do compile } */
> +/* { dg-require-effective-target hard_float } */
>  /* { dg-options "-ffinite-math-only -fsigned-zeros -dp" } */

[PATCH] testsuite: constraint some of fp tests to hard_float

2022-05-29 Thread Vineet Gupta
These tests validate fp conversions with various rounding modes which
would not work on soft-float ABIs.

On -march=rv64imac/-mabi=lp64 this removes 5 unique failures (35 overall,
because of builds with multiple flag combinations).

gcc/testsuite/ChangeLog:
* gcc.dg/torture/fp-double-convert-float-1.c: Add
dg-require-effective-target hard_float.
* gcc.dg/torture/fp-int-convert-timode-3.c: Ditto.
* gcc.dg/torture/fp-int-convert-timode-4.c: Ditto.
* gcc.dg/torture/fp-uint64-convert-double-1.c: Ditto.
* gcc.dg/torture/fp-uint64-convert-double-2.c: Ditto.

Signed-off-by: Vineet Gupta 
---
 gcc/testsuite/gcc.dg/torture/fp-double-convert-float-1.c  | 1 +
 gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-3.c    | 1 +
 gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-4.c    | 1 +
 gcc/testsuite/gcc.dg/torture/fp-uint64-convert-double-1.c | 1 +
 gcc/testsuite/gcc.dg/torture/fp-uint64-convert-double-2.c | 1 +
 5 files changed, 5 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/torture/fp-double-convert-float-1.c b/gcc/testsuite/gcc.dg/torture/fp-double-convert-float-1.c
index ec23274ea989..1c28a9e101eb 100644
--- a/gcc/testsuite/gcc.dg/torture/fp-double-convert-float-1.c
+++ b/gcc/testsuite/gcc.dg/torture/fp-double-convert-float-1.c
@@ -1,6 +1,7 @@
 /* PR57245 */
 /* { dg-do run } */
 /* { dg-require-effective-target fenv } */
+/* { dg-require-effective-target hard_float } */
 /* { dg-additional-options "-frounding-math" } */
 
 #include 
diff --git a/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-3.c b/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-3.c
index 707d539335fe..6f9a8d3f0d3e 100644
--- a/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-3.c
+++ b/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-3.c
@@ -3,6 +3,7 @@
 /* { dg-do run } */
 /* { dg-require-effective-target int128 } */
 /* { dg-require-effective-target fenv } */
+/* { dg-require-effective-target hard_float } */
 /* { dg-options "-frounding-math" } */
 
 #include 
diff --git a/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-4.c b/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-4.c
index 09600f909031..15f478d15e24 100644
--- a/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-4.c
+++ b/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode-4.c
@@ -3,6 +3,7 @@
 /* { dg-do run } */
 /* { dg-require-effective-target int128 } */
 /* { dg-require-effective-target fenv } */
+/* { dg-require-effective-target hard_float } */
 /* { dg-options "-frounding-math" } */
 
 #include 
diff --git a/gcc/testsuite/gcc.dg/torture/fp-uint64-convert-double-1.c b/gcc/testsuite/gcc.dg/torture/fp-uint64-convert-double-1.c
index fadad8c31981..0c7bf003e93e 100644
--- a/gcc/testsuite/gcc.dg/torture/fp-uint64-convert-double-1.c
+++ b/gcc/testsuite/gcc.dg/torture/fp-uint64-convert-double-1.c
@@ -1,6 +1,7 @@
 /* PR84407 */
 /* { dg-do run } */
 /* { dg-require-effective-target fenv } */
+/* { dg-require-effective-target hard_float } */
 /* { dg-additional-options "-frounding-math -fexcess-precision=standard" } */
 
 #include 
diff --git a/gcc/testsuite/gcc.dg/torture/fp-uint64-convert-double-2.c b/gcc/testsuite/gcc.dg/torture/fp-uint64-convert-double-2.c
index 952f96b33c92..ac24b351a46d 100644
--- a/gcc/testsuite/gcc.dg/torture/fp-uint64-convert-double-2.c
+++ b/gcc/testsuite/gcc.dg/torture/fp-uint64-convert-double-2.c
@@ -1,6 +1,7 @@
 /* PR84407 */
 /* { dg-do run } */
 /* { dg-require-effective-target fenv } */
+/* { dg-require-effective-target hard_float } */
 /* { dg-additional-options "-frounding-math" } */
 
 #include 
-- 
2.32.0
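
For readers unfamiliar with the directive, here is a minimal sketch of a test guarded the same way. The function name and body are hypothetical and are not taken from any of the patched files; only the dg- comment lines mirror directives that appear in this thread. When the hard_float check fails for the target, DejaGnu reports the test as UNSUPPORTED instead of running it.

```c
/* { dg-do compile } */
/* { dg-require-effective-target hard_float } */
/* { dg-additional-options "-frounding-math" } */

#include <stdint.h>

/* Hypothetical example, not one of the patched tests.
   Returns 1 if V survives a round trip through double unchanged.
   Precondition: v < 2^63, so the converted value always fits back
   into uint64_t.  */
int
uint64_roundtrips_exactly (uint64_t v)
{
  /* volatile keeps the conversion from being folded at compile time.  */
  volatile double d = (double) v;
  return (uint64_t) d == v;
}
```

Under the default round-to-nearest mode, a value such as 2^53 + 1 has no exact double representation, so the function returns 0 for it; the real tests listed above go further and check the result under each rounding mode.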



RE: [PATCH] Add a bit dislike for separate mem alternative when op is REG_P.

2022-05-29 Thread Alexander Monakov via Gcc-patches
> > In the PR, the spill happens in the initial basic block of the function,
> > i.e. the one with the highest frequency.
> > 
> > Also as noted in the PR, swapping the 'unlikely' branch to 'likely' avoids
> > the spill, even though it does not affect the frequency of the initial
> > basic block, and makes the block with the use more rarely executed.
> 
> The spill is mainly decided by 3 insns related to r92
> 
> (insn 3 61 4 2 (set (reg/v:SF 92 [ x ])
>         (reg:SF 102)) "test3.c":7:1 142 {*movsf_internal}
>      (expr_list:REG_DEAD (reg:SF 102)
> 
> (insn 9 4 12 2 (set (reg:SI 89 [ _11 ])
>         (subreg:SI (reg/v:SF 92 [ x ]) 0)) "test3.c":3:36 81 {*movsi_internal}
>      (nil))
> 
> And
> (insn 28 27 29 5 (set (reg:DF 98)
>         (float_extend:DF (reg/v:SF 92 [ x ]))) "test3.c":11:13 163 {*extendsfdf2}
>      (expr_list:REG_DEAD (reg/v:SF 92 [ x ])
>         (nil)))
> (insn 29 28 30 5 (s
> 
> The frequency of INSN 3 and INSN 9 is not affected, but the frequency of INSN
> 28 drops from 805 to 89 after swapping "unlikely" and "likely".  Because of
> that, the GPR cost decreases a lot, which finally makes the RA choose GPR
> instead of MEM.
> 
> GENERAL_REGS:2356,2356 
> SSE_REGS:6000,6000
> MEM:4089,4089

But why are SSE_REGS costed so high? r92 is used in SFmode, so it doesn't make
sense that selecting a GPR for it looks cheaper than xmm0.

> Dump of 301.ira:
> a4(r92,l0) costs: AREG:2356,2356 DREG:2356,2356 CREG:2356,2356
>   BREG:2356,2356 SIREG:2356,2356 DIREG:2356,2356 AD_REGS:2356,2356
>   CLOBBERED_REGS:2356,2356 Q_REGS:2356,2356 NON_Q_REGS:2356,2356
>   TLS_GOTBASE_REGS:2356,2356 GENERAL_REGS:2356,2356 SSE_FIRST_REG:6000,6000
>   NO_REX_SSE_REGS:6000,6000 SSE_REGS:6000,6000 MMX_REGS:19534,19534
>   INT_SSE_REGS:19534,19534 ALL_REGS:214534,214534 MEM:4089,4089
> 
> And although there's no spill, there's an extra VMOVD in the later BB, which
> looks suboptimal (I guess we can live with that since it's cold).

I think that falls out of the wrong decision for SSE_REGS cost.

Alexander

> 
> vmovd   %eax, %xmm2
> vcvtss2sd   %xmm2, %xmm2, %xmm1
> vmulsd  %xmm0, %xmm1, %xmm0
> vcvtsd2ss   %xmm0, %xmm0, %xmm0
> > 
> > Do you have a root cause analysis that explains the above?
> > 
> > Alexander
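
To make the cost comparison in the quoted dump concrete, here is a toy sketch that picks the cheapest entry from the three totals quoted for r92 (GENERAL_REGS, SSE_REGS, MEM). It is not how IRA is implemented; the struct, array and function names are made up for illustration, and only the three numbers come from the dump.

```c
#include <stddef.h>

/* Hypothetical types and names for illustration only; IRA's real data
   structures and cost logic are considerably more involved.  */
struct class_cost
{
  const char *name;
  int cost;
};

/* The three totals quoted in the thread for r92 after swapping the
   branch hints.  */
static const struct class_cost r92_costs[] = {
  { "GENERAL_REGS", 2356 },
  { "SSE_REGS", 6000 },
  { "MEM", 4089 },
};

/* Return the entry with the lowest cost, or NULL if N is 0.  */
static const struct class_cost *
cheapest_class (const struct class_cost *costs, size_t n)
{
  const struct class_cost *best = NULL;
  for (size_t i = 0; i < n; i++)
    if (best == NULL || costs[i].cost < best->cost)
      best = &costs[i];
  return best;
}
```

With these numbers the cheapest entry is GENERAL_REGS, and MEM still ranks ahead of SSE_REGS, which is the ordering questioned in the reply.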