Re: [PATCH 3/4] gcov: Add gen_counter_update()

2023-11-19 Thread Dimitar Dimitrov
On Tue, Nov 14, 2023 at 11:08:24PM +0100, Sebastian Huber wrote:
> Move the counter update to the new gen_counter_update() helper function.  Use
> it in gimple_gen_edge_profiler() and gimple_gen_time_profiler().  The resulting
> gimple instructions should be identical with the exception of the removed
> unshare_expr() call.  The unshare_expr() call was used in
> gimple_gen_edge_profiler().
> 
> gcc/ChangeLog:
> 
>   * tree-profile.cc (gen_assign_counter_update): New.
>   (gen_counter_update): Likewise.
>   (gimple_gen_edge_profiler): Use gen_counter_update().
>   (gimple_gen_time_profiler): Likewise.
> ---
>  gcc/tree-profile.cc | 133 +---
>  1 file changed, 62 insertions(+), 71 deletions(-)
> 

Hi Sebastian,

This patch caused a bunch of test failures on arm-none-eabi and
pru-unknown-elf targets.  One example:

/home/dinux/projects/pru/testbot-workspace/gcc/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c: In function 'main':
/home/dinux/projects/pru/testbot-workspace/gcc/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c:19:1: error: incorrect sharing of tree nodes
__gcov0.main[0]
# .MEM_12 = VDEF <.MEM_9>
__gcov0.main[0] = PROF_edge_counter_4;
during IPA pass: profile
/home/dinux/projects/pru/testbot-workspace/gcc/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c:19:1: internal compiler error: verify_gimple failed
0xfd9c7d verify_gimple_in_cfg(function*, bool, bool)
/home/dinux/projects/pru/testbot-workspace/gcc/gcc/tree-cfg.cc:5662
0xe586a4 execute_function_todo
/home/dinux/projects/pru/testbot-workspace/gcc/gcc/passes.cc:2088
0xe58ba2 do_per_function
/home/dinux/projects/pru/testbot-workspace/gcc/gcc/passes.cc:1694
0xe58ba2 do_per_function
/home/dinux/projects/pru/testbot-workspace/gcc/gcc/passes.cc:1684
0xe58bfe execute_todo
/home/dinux/projects/pru/testbot-workspace/gcc/gcc/passes.cc:2142
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.
compiler exited with status 1
FAIL: gcc.dg/no_profile_instrument_function-attr-1.c (internal compiler error: verify_gimple failed)
FAIL: gcc.dg/no_profile_instrument_function-attr-1.c (test for excess errors)
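
For readers unfamiliar with this verifier failure, the idea can be sketched
with a toy model.  This is not GCC code: Expr, unshare() and
verify_no_sharing() are hypothetical stand-ins for a tree node,
unshare_expr() and the sharing check that verify_gimple performs.  The
sketch only illustrates why using one node as an operand of two statements
is rejected and why a deep copy avoids the error.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a tree node such as an array reference.
struct Expr {
  std::string label;
  std::shared_ptr<Expr> operand;  // null for a leaf
};
using ExprPtr = std::shared_ptr<Expr>;

// Deep copy; plays the role of unshare_expr() in this toy model.
ExprPtr unshare(const ExprPtr &e) {
  if (!e)
    return nullptr;
  return std::make_shared<Expr>(Expr{e->label, unshare(e->operand)});
}

// True when no node is reachable from more than one statement operand.
bool verify_no_sharing(const std::vector<ExprPtr> &stmt_operands) {
  std::set<const Expr *> seen;
  for (const ExprPtr &op : stmt_operands)
    for (const Expr *e = op.get(); e; e = e->operand.get())
      if (!seen.insert(e).second)
        return false;  // the same node was seen twice
  return true;
}

// Self-check: reusing one node for a load and a store fails, a copy passes.
void demo_sharing() {
  ExprPtr counter = std::make_shared<Expr>(Expr{"__gcov0.main[0]", nullptr});
  assert(!verify_no_sharing({counter, counter}));
  assert(verify_no_sharing({counter, unshare(counter)}));
}
```

Calling demo_sharing() from a main() function exercises both cases.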


I'm using the following script to build and test:
  https://github.com/dinuxbg/gnupru/blob/master/testing/manual-test-pru.sh

Regards,
Dimitar


[PATCH 2/3] OpenMP: Unify representation of name-list properties.

2023-11-19 Thread Sandra Loosemore
Previously, name-list properties specified as identifiers were stored
in the TREE_PURPOSE/OMP_TP_NAME slot, while those specified as strings
were stored in the TREE_VALUE/OMP_TP_VALUE slot.  This patch puts both
representations in OMP_TP_VALUE with a magic cookie in OMP_TP_NAME.
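
As a rough illustration of the unified layout (a toy model, not the GCC
implementation): Node, Property, demo_namelist_tag and
demo_name_list_prop() are hypothetical names invented here.  The tag object
plays the role of the OMP_TP_NAMELIST_NODE cookie, and the name is read
from the value slot whether it was written as an identifier or as a string.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a tree node; not GCC's representation.
struct Node {
  enum Kind { Identifier, StringCst } kind;
  std::string text;
};

// Plays the role of the OMP_TP_NAMELIST_NODE cookie: only its address matters.
static const Node demo_namelist_tag{Node::Identifier, " namelist"};

struct Property {
  const Node *name;   // role of OMP_TP_NAME
  const Node *value;  // role of OMP_TP_VALUE
};

// One lookup path for both spellings; returns nullptr for other properties.
const char *demo_name_list_prop(const Property &p) {
  if (p.name != &demo_namelist_tag || p.value == nullptr)
    return nullptr;
  return p.value->text.c_str();
}

// Self-check: the identifier form and the string form yield the same name.
void demo_namelist() {
  Node ident{Node::Identifier, "gnu"};
  Node str{Node::StringCst, "gnu"};
  Property from_ident{&demo_namelist_tag, &ident};
  Property from_string{&demo_namelist_tag, &str};
  assert(std::string(demo_name_list_prop(from_ident)) == "gnu");
  assert(std::string(demo_name_list_prop(from_string)) == "gnu");
}
```

With a single slot to inspect, callers no longer need separate branches for
the identifier and string cases.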

gcc/ChangeLog
* omp-general.h (OMP_TP_NAMELIST_NODE): New.
* omp-general.cc (omp_context_name_list_prop): Move earlier
in the file, and adjust for new representation.
(omp_check_context_selector): Adjust this too.
(omp_context_selector_props_compare): Likewise.

gcc/c/ChangeLog
* c-parser.cc (c_parser_omp_context_selector): Adjust for new
namelist property representation.

gcc/cp/ChangeLog
* parser.cc (cp_parser_omp_context_selector): Adjust for new
namelist property representation.
* pt.cc (tsubst_attribute): Likewise.

gcc/fortran/ChangeLog
* trans-openmp.cc (gfc_trans_omp_declare_variant): Adjust for
new namelist property representation.
---
 gcc/c/c-parser.cc   |  5 ++-
 gcc/cp/parser.cc|  5 ++-
 gcc/cp/pt.cc|  4 +-
 gcc/fortran/trans-openmp.cc |  5 ++-
 gcc/omp-general.cc  | 84 +
 gcc/omp-general.h   |  1 +
 6 files changed, 61 insertions(+), 43 deletions(-)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index fcbacd461c7..a2ff381e0c1 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -24217,11 +24217,12 @@ c_parser_omp_context_selector (c_parser *parser, tree set, tree parms)
case CTX_PROPERTY_NAME_LIST:
  do
{
- tree prop = NULL_TREE, value = NULL_TREE;
+ tree prop = OMP_TP_NAMELIST_NODE;
+ tree value = NULL_TREE;
  if (c_parser_next_token_is (parser, CPP_KEYWORD)
  || c_parser_next_token_is (parser, CPP_NAME))
{
- prop = c_parser_peek_token (parser)->value;
+ value = c_parser_peek_token (parser)->value;
  c_parser_consume_token (parser);
}
  else if (c_parser_next_token_is (parser, CPP_STRING))
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index dd773570981..9030365644d 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -47469,11 +47469,12 @@ cp_parser_omp_context_selector (cp_parser *parser, tree set, bool has_parms_p)
case CTX_PROPERTY_NAME_LIST:
  do
{
- tree prop = NULL_TREE, value = NULL_TREE;
+ tree prop = OMP_TP_NAMELIST_NODE;
+ tree value = NULL_TREE;
  if (cp_lexer_next_token_is (parser->lexer, CPP_KEYWORD)
  || cp_lexer_next_token_is (parser->lexer, CPP_NAME))
{
- prop = cp_lexer_peek_token (parser->lexer)->u.value;
+ value = cp_lexer_peek_token (parser->lexer)->u.value;
  cp_lexer_consume_token (parser->lexer);
}
  else if (cp_lexer_next_token_is (parser->lexer, CPP_STRING))
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 3af793dfe20..c3815733651 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -11892,7 +11892,9 @@ tsubst_attribute (tree t, tree *decl_p, tree args,
}
  properties = copy_list (OMP_TS_PROPERTIES (ts));
  for (tree p = properties; p; p = TREE_CHAIN (p))
-   if (OMP_TP_VALUE (p))
+   if (OMP_TP_NAME (p) == OMP_TP_NAMELIST_NODE)
+ continue;
+   else if (OMP_TP_VALUE (p))
  {
bool allow_string
  = (OMP_TS_ID (ts) != condition || set[0] != 'u');
diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index fe8044a57cd..60154ff3751 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -8235,9 +8235,10 @@ gfc_trans_omp_declare_variant (gfc_namespace *ns)
  break;
case CTX_PROPERTY_NAME_LIST:
  {
-   tree prop = NULL_TREE, value = NULL_TREE;
+   tree prop = OMP_TP_NAMELIST_NODE;
+   tree value = NULL_TREE;
if (otp->is_name)
- prop = get_identifier (otp->name);
+ value = get_identifier (otp->name);
else
  value = gfc_conv_constant_to_tree (otp->expr);
 
diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index 4ea0d971273..e4e3890449e 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -1114,6 +1114,30 @@ omp_maybe_offloaded (void)
   return false;
 }
 
+/* Return a name from PROP, a property in selectors accepting
+   name lists.  */
+
+static const char *
+omp_contex

[PATCH 1/3] OpenMP: Introduce accessor macros and constructors for context selectors.

2023-11-19 Thread Sandra Loosemore
This patch hides the underlying nested TREE_LIST structure of context
selectors behind accessor macros that have more meaningful names than
the generic TREE_PURPOSE/TREE_VALUE accessors.  There is a slight
change to the representation in that the score expression in
trait-selectors has a distinguished tag and is separated from the
ordinary properties, although internally it is still represented as
the first item in the TREE_VALUE of the selector.  This patch also renames
some local variables with slightly more descriptive names so it is easier
to track whether something is a selector-set, selector, or property.
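
A minimal sketch of the accessor idea, using a toy cons cell rather than
GCC's tree type.  Cell and every DEMO_* macro are hypothetical and exist
only for illustration; they loosely mirror how macros such as OMP_TSS_ID
and OMP_TSS_TRAIT_SELECTORS give role-specific names to the generic
TREE_PURPOSE and TREE_VALUE slots.

```cpp
#include <cassert>

// Hypothetical miniature of a TREE_LIST cell.
struct Cell {
  const char *purpose;  // generic slot, like TREE_PURPOSE
  Cell *value;          // generic slot, like TREE_VALUE
  Cell *chain;          // next cell, like TREE_CHAIN
};

// Descriptive names layered over the generic slots.
#define DEMO_TSS_ID(n) ((n)->purpose)
#define DEMO_TSS_TRAIT_SELECTORS(n) ((n)->value)
#define DEMO_TS_ID(n) ((n)->purpose)
#define DEMO_CHAIN(n) ((n)->chain)

// Count the trait selectors that belong to one trait-set selector.
int demo_count_selectors(Cell *tss) {
  int n = 0;
  for (Cell *ts = DEMO_TSS_TRAIT_SELECTORS(tss); ts; ts = DEMO_CHAIN(ts))
    ++n;
  return n;
}

// Self-check: a "device" set holding the selectors "kind" and "arch".
void demo_accessors() {
  Cell arch{"arch", nullptr, nullptr};
  Cell kind{"kind", nullptr, &arch};
  Cell device{"device", &kind, nullptr};
  assert(demo_count_selectors(&device) == 2);
  assert(DEMO_TS_ID(DEMO_TSS_TRAIT_SELECTORS(&device)) == kind.purpose);
}
```

The loop reads in terms of selector sets and selectors, even though the
storage underneath is the same generic list cell at every level.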

gcc/ChangeLog
* omp-general.h (OMP_TS_SCORE_NODE): New.
(OMP_TSS_ID, OMP_TSS_TRAIT_SELECTORS): New.
(OMP_TS_ID, OMP_TS_SCORE, OMP_TS_PROPERTIES): New.
(OMP_TP_NAME, OMP_TP_VALUE): New.
(make_trait_set_selector): Declare.
(make_trait_selector): Declare.
(make_trait_property): Declare.
(omp_constructor_traits_to_codes): Rename to
omp_construct_traits_to_codes.
* omp-general.cc (omp_constructor_traits_to_codes): Rename
to omp_construct_traits_to_codes.  Use new accessors.
(omp_check_context_selector): Use new accessors.
(make_trait_set_selector): New.
(make_trait_selector): New.
(make_trait_property): New.
(omp_context_name_list_prop): Use new accessors.
(omp_context_selector_matches): Use new accessors.
(omp_context_selector_props_compare): Use new accessors.
(omp_context_selector_set_compare): Use new accessors.
(omp_get_context_selector): Use new accessors.
(omp_context_compute_score): Use new accessors.
* gimplify.cc (omp_construct_selector_matches): Adjust for renaming
of omp_constructor_traits_to_codes.

gcc/c/ChangeLog
* c-parser.cc (c_parser_omp_context_selector): Use new constructors.

gcc/cp/ChangeLog
* parser.cc (cp_parser_omp_context_selector): Use new constructors.
* pt.cc: Include omp-general.h.
(tsubst_attribute): Use new context selector accessors and
 constructors.

gcc/fortran/ChangeLog
* trans-openmp.cc (gfc_trans_omp_declare_variant): Use new
constructors.
---
 gcc/c/c-parser.cc   |  27 ++--
 gcc/cp/parser.cc|  30 ++--
 gcc/cp/pt.cc|  82 ++
 gcc/fortran/trans-openmp.cc |  27 ++--
 gcc/gimplify.cc |   4 +-
 gcc/omp-general.cc  | 293 ++--
 gcc/omp-general.h   |  48 +-
 7 files changed, 297 insertions(+), 214 deletions(-)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 703f9570dbc..fcbacd461c7 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -24032,7 +24032,10 @@ static const char *const omp_user_selectors[] = {
  trait-selector-name[([trait-score:]trait-property[,trait-property[,...]])]
 
trait-score:
- score(score-expression)  */
+ score(score-expression)
+
+   Note that this function returns a list of trait selectors for the
+   trait-selector-set SET.  */
 
 static tree
 c_parser_omp_context_selector (c_parser *parser, tree set, tree parms)
@@ -24051,6 +24054,7 @@ c_parser_omp_context_selector (c_parser *parser, tree set, tree parms)
}
 
   tree properties = NULL_TREE;
+  tree scoreval = NULL_TREE;
   const char *const *selectors = NULL;
   bool allow_score = true;
   bool allow_user = false;
@@ -24157,8 +24161,7 @@ c_parser_omp_context_selector (c_parser *parser, tree set, tree parms)
error_at (token->location, "score argument must be "
  "non-negative");
  else
-   properties = tree_cons (get_identifier (" score"),
-   score, properties);
+   scoreval = score;
}
  token = c_parser_peek_token (parser);
}
@@ -24171,7 +24174,8 @@ c_parser_omp_context_selector (c_parser *parser, tree set, tree parms)
{
  t = c_parser_expr_no_commas (parser, NULL).value;
  if (TREE_CODE (t) == STRING_CST)
-   properties = tree_cons (NULL_TREE, t, properties);
+   properties = make_trait_property (NULL_TREE, t,
+ properties);
  else if (t != error_mark_node)
{
  mark_exp_read (t);
@@ -24182,7 +24186,8 @@ c_parser_omp_context_selector (c_parser *parser, tree set, tree parms)
  "constant integer expression or string "
  "literal");
  else
-   properties = tree_cons (NULL_TREE, t, properties);
+   properties = make_trait_property (NULL_TREE, t,
+ properties);

[PATCH 0/3] OpenMP: Improve data abstractions for context selectors

2023-11-19 Thread Sandra Loosemore
While trying to track down some bugs in the metadirective patches
(currently on the OG13 branch), I found that I was getting totally
lost in the undocumented data structures for context selectors; the
multiple levels of TREE_PURPOSE and TREE_VALUE that don't hint at what
kind of object is being accessed, generic variable names like "t1" and
"t2" likewise.  Similarly the inconsistent and undocumented
representation of different properties, switch statements over the
first character of the trait selector set name, etc. added to my
confusion.  It's not surprising that adding new features made this
foundation pretty creaky and I think that adding the additional
selector features in OMP 5.2 and 6.* is going to cause it to fall over
completely.

This series of patches adds a layer of data abstraction, using at
least slightly more descriptive names, and then tries to address some
of the representation and coding issues.

Part 1 introduces some macros (e.g., OMP_TSS_ID instead of
TREE_PURPOSE to get the name of a selector) and renames a bunch of
variables (e.g., tss for a trait-set selector, ts for a trait
selector, tp for a trait property).  Those changes were relatively
mechanical.  I also added some abstraction for the trait-score so that
it need not be handled explicitly when processing property lists.

Part 2 changes the representation of name-list properties so that both
the string and identifier forms store the name in the same place.

Part 3 is a more radical change: it replaces the string names of
trait-set and trait selectors with enumerators, which allows clean-up
of those funky switch statements.  I also made things more
table-driven.  Alas, this part is still WIP; there's an ICE in one of
the test cases I haven't been able to track down yet.

I can continue to work on this patch set in the next couple of weeks
if the general direction is seen as a good thing.  I believe there is
a little more latitude re the end of stage 1 with OpenMP (as there is
with target-specific patches) since it is not enabled by default; in any
case I'd like to get feedback on the general direction before continuing too
much farther with this, and adapting the metadirective patches to match it.

-Sandra

Sandra Loosemore (3):
  OpenMP: Introduce accessor macros and constructors for context
selectors.
  OpenMP: Unify representation of name-list properties.
  OpenMP: Use enumerators for names of trait-sets and traits

 gcc/c/c-parser.cc   | 212 ---
 gcc/cp/decl.cc  |   8 +-
 gcc/cp/parser.cc| 212 ---
 gcc/cp/pt.cc|  93 +++--
 gcc/fortran/trans-openmp.cc |  65 +++-
 gcc/gimplify.cc |   4 +-
 gcc/omp-general.cc  | 713 ++--
 gcc/omp-general.h   | 132 ++-
 8 files changed, 811 insertions(+), 628 deletions(-)

-- 
2.31.1



[PATCH 3/3] OpenMP: Use enumerators for names of trait-sets and traits

2023-11-19 Thread Sandra Loosemore
This patch introduces enumerators to represent trait-set names and
trait names, which makes it easier to use tables to control other
behavior and for switch statements to dispatch on the tags.  The tags
are stored in the same place in the TREE_LIST structure (OMP_TSS_ID or
OMP_TS_ID) and are encoded there as integer constants.
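
A hedged sketch of the table-driven approach follows.  The demo_* names are
invented for illustration and are not the patch's omp_tss_map or
omp_lookup_tss_code; only the four trait-set names (construct, device,
implementation, user) come from OpenMP itself.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical enumerators for the trait-set names.
enum demo_tss_code {
  DEMO_TSS_CONSTRUCT,
  DEMO_TSS_DEVICE,
  DEMO_TSS_IMPLEMENTATION,
  DEMO_TSS_USER,
  DEMO_TSS_INVALID  // also the number of valid codes
};

// Table indexed by code: the single place where the spellings live.
static const char *const demo_tss_map[DEMO_TSS_INVALID] = {
  "construct", "device", "implementation", "user"
};

// Map a spelling to its code; unknown names yield DEMO_TSS_INVALID.
demo_tss_code demo_lookup_tss_code(const char *name) {
  for (int i = 0; i < DEMO_TSS_INVALID; i++)
    if (std::strcmp(name, demo_tss_map[i]) == 0)
      return static_cast<demo_tss_code>(i);
  return DEMO_TSS_INVALID;
}

// Self-check: lookups succeed for known names and fail cleanly otherwise.
void demo_lookup() {
  assert(demo_lookup_tss_code("device") == DEMO_TSS_DEVICE);
  assert(demo_lookup_tss_code("bogus") == DEMO_TSS_INVALID);
  assert(std::strcmp(demo_tss_map[DEMO_TSS_USER], "user") == 0);
}
```

Once the name has been converted to a code, later passes can use an
ordinary switch on the enumerator instead of inspecting characters of the
name.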

This patch has only been lightly tested and still has at least one bug
that causes an ICE.  :-(

gcc/ChangeLog
* omp-general.h (enum omp_tss_code): New.
(enum omp_ts_code): New.
(enum omp_tp_type): New.
(omp_tss_map): New.
(struct omp_ts_info): New.
(omp_ts_map): New.
(OMP_TSS_CODE, OMP_TSS_NAME): New.
(OMP_TS_CODE, OMP_TS_NAME): New.
(make_trait_set_selector, make_trait_selector): Adjust declarations.
(omp_context_selector_set_compare): Likewise.
(omp_get_context_selector): Likewise.
(omp_get_context_selector_list): New.
(omp_lookup_tss_code): New.
(omp_lookup_ts_code): New.
* omp-general.cc (omp_construct_traits_to_codes): Make it
table-driven.
(omp_tss_map): New.
(kind_properties, vendor_properties, extension_properties): New.
(atomic_default_mem_order_properties): New.
(omp_ts_map): New.
(omp_check_context_selector): Simplify lookup and dispatch logic.
(omp_mark_declare_variant): Adjust for new representation.
(make_trait_set_selector, make_trait_selector): Adjust for new
representations.
(omp_context_selector_matches): Simplify dispatch logic, also
avoid fixed-sized buffers.
(omp_context_selector_props_compare): Adjust for new representations
and simplify dispatch logic.
(omp_context_selector_set_compare): Likewise.
(omp_context_selector_compare): Likewise.
(omp_get_context_selector): Adjust for new representations, and split
out...
(omp_get_context_selector_list): New function.
(omp_lookup_tss_code): New.
(omp_lookup_ts_code): New.
(omp_context_compute_score): Adjust for new representations.  Avoid
fixed-sized buffers and magic numbers.

gcc/c/ChangeLog
* c-parser.cc (omp_construct_selectors): Delete.
(omp_device_selectors): Delete.
(omp_implementation_selectors): Delete.
(omp_user_selectors): Delete.
(c_parser_omp_context_selector): Adjust for new representations
and simplify dispatch logic.
(c_parser_omp_context_selector_specification): Likewise.
(c_finish_omp_declare_variant): Adjust for new representations.

gcc/cp/ChangeLog
* decl.cc (omp_declare_variant_finalize_one): Adjust for new
representations.
* parser.cc (omp_construct_selectors): Delete.
(omp_device_selectors): Delete.
(omp_implementation_selectors): Delete.
(omp_user_selectors): Delete.
(cp_parser_omp_context_selector): Adjust for new representations
and simplify dispatch logic.
(cp_parser_omp_context_selector_specification): Likewise.
* pt.cc (tsubst_attribute): Adjust for new representations.

gcc/fortran/ChangeLog
* trans-openmp.cc (gfc_trans_omp_declare_variant): Adjust for
new representations.
---
 gcc/c/c-parser.cc   | 192 --
 gcc/cp/decl.cc  |   8 +-
 gcc/cp/parser.cc| 189 --
 gcc/cp/pt.cc|  15 +-
 gcc/fortran/trans-openmp.cc |  41 ++-
 gcc/omp-general.cc  | 496 +++-
 gcc/omp-general.h   |  87 ++-
 7 files changed, 555 insertions(+), 473 deletions(-)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index a2ff381e0c1..70c0e1828ca 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -24016,16 +24016,6 @@ c_parser_omp_declare_simd (c_parser *parser, enum pragma_context context)
 }
 }
 
-static const char *const omp_construct_selectors[] = {
-  "simd", "target", "teams", "parallel", "for", NULL };
-static const char *const omp_device_selectors[] = {
-  "kind", "isa", "arch", NULL };
-static const char *const omp_implementation_selectors[] = {
-  "vendor", "extension", "atomic_default_mem_order", "unified_address",
-  "unified_shared_memory", "dynamic_allocators", "reverse_offload", NULL };
-static const char *const omp_user_selectors[] = {
-  "condition", NULL };
-
 /* OpenMP 5.0:
 
trait-selector:
@@ -24038,7 +24028,8 @@ static const char *const omp_user_selectors[] = {
trait-selector-set SET.  */
 
 static tree
-c_parser_omp_context_selector (c_parser *parser, tree set, tree parms)
+c_parser_omp_context_selector (c_parser *parser, enum omp_tss_code set,
+  tree parms)
 {
   tree ret = NULL_TREE;
   do
@@ -24052,80 +24043,52 @@ c_parser_omp_context_selector (c_parser *parser, tree set, tree parms)
  c_parser_error (parser, "expected trait selector name");
  return error_mark_n

[PATCH] RISC-V: Remove duplicate `order_operator' predicate

2023-11-19 Thread Maciej W. Rozycki
Remove our RISC-V-specific `order_operator' predicate, which is exactly 
the same as generic `ordered_comparison_operator' one.

gcc/
* config/riscv/predicates.md (order_operator): Remove predicate.
* config/riscv/riscv.cc (riscv_rtx_costs): Update accordingly.
* config/riscv/riscv.md (*branch, *movcc)
(cstore4): Likewise.
---
Hi,

 Verified with the `riscv64-linux-gnu' target and the C language 
testsuite.  OK to apply?

  Maciej
---
 gcc/config/riscv/predicates.md |3 ---
 gcc/config/riscv/riscv.cc  |2 +-
 gcc/config/riscv/riscv.md  |6 +++---
 3 files changed, 4 insertions(+), 7 deletions(-)

gcc-riscv-ordered-comparison-operator.diff
Index: gcc/gcc/config/riscv/predicates.md
===
--- gcc.orig/gcc/config/riscv/predicates.md
+++ gcc/gcc/config/riscv/predicates.md
@@ -339,9 +339,6 @@
 (define_predicate "equality_operator"
   (match_code "eq,ne"))
 
-(define_predicate "order_operator"
-  (match_code "eq,ne,lt,ltu,le,leu,ge,geu,gt,gtu"))
-
 (define_predicate "signed_order_operator"
   (match_code "eq,ne,lt,le,ge,gt"))
 
Index: gcc/gcc/config/riscv/riscv.cc
===
--- gcc.orig/gcc/config/riscv/riscv.cc
+++ gcc/gcc/config/riscv/riscv.cc
@@ -2914,7 +2914,7 @@ riscv_rtx_costs (rtx x, machine_mode mod
  *total = COSTS_N_INSNS (SINGLE_SHIFT_COST + 1);
  return true;
}
- if (order_operator (XEXP (x, 0), mode))
+ if (ordered_comparison_operator (XEXP (x, 0), mode))
{
  *total = COSTS_N_INSNS (1);
  return true;
Index: gcc/gcc/config/riscv/riscv.md
===
--- gcc.orig/gcc/config/riscv/riscv.md
+++ gcc/gcc/config/riscv/riscv.md
@@ -2640,7 +2640,7 @@
 (define_insn "*branch"
   [(set (pc)
(if_then_else
-(match_operator 1 "order_operator"
+(match_operator 1 "ordered_comparison_operator"
 [(match_operand:X 2 "register_operand" "r")
  (match_operand:X 3 "reg_or_0_operand" "rJ")])
 (label_ref (match_operand 0 "" ""))
@@ -2716,7 +2716,7 @@
 (define_insn "*movcc"
   [(set (match_operand:GPR 0 "register_operand" "=r,r")
(if_then_else:GPR
-(match_operator 5 "order_operator"
+(match_operator 5 "ordered_comparison_operator"
[(match_operand:X 1 "register_operand" "r,r")
 (match_operand:X 2 "reg_or_0_operand" "rJ,rJ")])
 (match_operand:GPR 3 "register_operand" "0,0")
@@ -2902,7 +2902,7 @@
 
 (define_expand "cstore4"
   [(set (match_operand:SI 0 "register_operand")
-   (match_operator:SI 1 "order_operator"
+   (match_operator:SI 1 "ordered_comparison_operator"
[(match_operand:GPR 2 "register_operand")
 (match_operand:GPR 3 "nonmemory_operand")]))]
   ""


[PATCH] testsuite: Fix subexpressions with `scan-assembler-times'

2023-11-19 Thread Maciej W. Rozycki
We have an issue with `scan-assembler-times' handling expressions using 
subexpressions as produced by capturing parentheses `()' in an odd way, 
and one that is inconsistent with `scan-assembler', `scan-assembler-not', 
etc.  The problem comes from calling `regexp' with `-inline -all', which 
causes a list to be returned that would otherwise be placed in match 
variables.

Consequently if we have say:

/* { dg-final { scan-assembler-times "\\s(foo|bar)\\s" 1 } } */

in a test case and there is a lone `foo' present in output being matched, 
then our invocation of `regexp -inline -all' in `scan-assembler-times' 
will return:

{ foo } foo

and that in turn will confuse our match count calculation as `llength' 
will return 2 rather than 1, making the test fail even though `foo' was 
only actually matched once.

It seems unclear why we chose to call `regexp' in such an odd way in the 
first place just to figure out the number of matches.  The first version 
of TCL that supports the `-all' option to `regexp' is 8.3, and according 
to its documentation[1][2] `regexp' already returns the number of matches 
found whenever `-all' has been used *unless* `-inline' has also been used.

Remove the `-inline' option then along with the `llength' invocation.
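
The miscount can be reproduced outside Tcl.  The sketch below is an
analogy in C++ using std::regex, not the testsuite code: count_matches()
and count_flattened() are hypothetical helpers.  The first counts matches,
as `regexp -all' does; the second counts the entries of a flattened
match-plus-subexpressions list, as taking `llength' of the
`regexp -inline -all' result does.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Number of non-overlapping matches of RE in TEXT.
std::size_t count_matches(const std::string &text, const std::regex &re) {
  return static_cast<std::size_t>(
      std::distance(std::sregex_iterator(text.begin(), text.end(), re),
                    std::sregex_iterator()));
}

// Length of the flattened list: every match contributes the whole match
// plus one entry per capturing group.
std::size_t count_flattened(const std::string &text, const std::regex &re) {
  std::size_t n = 0;
  for (std::sregex_iterator it(text.begin(), text.end(), re), end;
       it != end; ++it)
    n += it->size();
  return n;
}

// Self-check: one lone `foo' is one match but two list entries.
void demo_counts() {
  const std::regex re(R"(\s(foo|bar)\s)");
  const std::string text = "\tfoo\n";
  assert(count_matches(text, re) == 1);
  assert(count_flattened(text, re) == 2);
}
```

With one capturing group the flattened count is always twice the match
count, which is why a test expecting 1 saw 2.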

References:

[1] "Tcl Built-In Commands - regexp manual page", 


[2] "Tcl Built-In Commands - regexp manual page", 


gcc/testsuite/
* lib/scanasm.exp (scan-assembler-times): Remove the `-inline' 
option to `regexp' and the wrapping `llength' call.
---
Hi,

 Verified with the `riscv64-linux-gnu' target and the C language
testsuite.  OK to apply?

  Maciej
---
 gcc/testsuite/lib/scanasm.exp |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

gcc-test-scan-assembler-times-count.diff
Index: gcc/gcc/testsuite/lib/scanasm.exp
===
--- gcc.orig/gcc/testsuite/lib/scanasm.exp
+++ gcc/gcc/testsuite/lib/scanasm.exp
@@ -505,7 +505,7 @@ proc scan-assembler-times { args } {
 close $fd
 regsub -all {(^|\n)[[:space:]]*\.section[[:space:]]*\.gnu\.lto_(?:[^\n]*\n(?![[:space:]]*\.(section|text|data|bss)))*[^\n]*\n} $text {\1} text
 
-set result_count [llength [regexp -inline -all -- $pattern $text]]
+set result_count [regexp -all -- $pattern $text]
 if {$result_count == $times} {
pass "$testcase scan-assembler-times $pp_pattern $times"
 } else {


Re: Propagate value ranges of return values

2023-11-19 Thread Sam James


Jan Hubicka  writes:

> Hi,
> this patch implements very basic propagation of return value ranges from VRP
> pass.  This helps std::vector's push_back since we work out value range of
> allocated block.  This propagates only within single translation unit.  I hoped
> we will also do the propagation at WPA stage, but that needs more work on
> ipa-cp side.
>
> I also added code auto-detecting returns_nonnull and corresponding -Wsuggest-attribute
>
> Variant of this patch bootstrapped/regtested x86_64-linux, testing with
> this version is running.  I plan to commit the patch on Monday provided
> there are no issues.
>
> gcc/ChangeLog:
>
>   * cgraph.cc (add_detected_attribute_1): New function.
>   (cgraph_node::add_detected_attribute): New member function.
>   * cgraph.h (struct cgraph_node): Declare it.
>   * common.opt: Add Wsuggest-attribute=returns_nonnull.
>   * doc/invoke.texi: Document +Wsuggest-attribute=returns_nonnull.
>   * gimple-range-fold.cc: Include ipa-prop and dependencies.
>   (fold_using_range::range_of_call): Look for return value range.
>   * ipa-prop.cc (struct ipa_return_value_summary): New structure.
>   (class ipa_return_value_sum_t): New summary.
>   (ipa_record_return_value_range): New function.
>   (ipa_return_value_range): New function.
>   * ipa-prop.h (ipa_return_value_range): Declare.
>   (ipa_record_return_value_range): Declare.
>   * ipa-pure-const.cc (warn_function_returns_nonnull): New function.
>   * ipa-utils.h (warn_function_returns_nonnull): Declare.
>   * symbol-summary.h: Fix comment typo.
>   * tree-vrp.cc (execute_ranger_vrp): Record return values.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.dg/tree-ssa/return-value-range-1.c: New test.
>
> diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
> index e41e5ad3ae7..71dacf23ce1 100644
> --- a/gcc/cgraph.cc
> +++ b/gcc/cgraph.cc
> @@ -2629,6 +2629,54 @@ cgraph_node::set_malloc_flag (bool malloc_p)
>return changed;
>  }
>  
> +/* Worker to set malloc flag.  */
> +static void
> +add_detected_attribute_1 (cgraph_node *node, const char *attr, bool *changed)
> +{
> +  if (!lookup_attribute (attr, DECL_ATTRIBUTES (node->decl)))
> +{
> +  DECL_ATTRIBUTES (node->decl) = tree_cons (get_identifier (attr),
> +  NULL_TREE, DECL_ATTRIBUTES (node->decl));
> +  *changed = true;
> +}
> +
> +  ipa_ref *ref;
> +  FOR_EACH_ALIAS (node, ref)
> +{
> +  cgraph_node *alias = dyn_cast (ref->referring);
> +  if (alias->get_availability () > AVAIL_INTERPOSABLE)
> + add_detected_attribute_1 (alias, attr, changed);
> +}
> +
> +  for (cgraph_edge *e = node->callers; e; e = e->next_caller)
> +if (e->caller->thunk
> + && (e->caller->get_availability () > AVAIL_INTERPOSABLE))
> +  add_detected_attribute_1 (e->caller, attr, changed);
> +}
> +
> +/* Set DECL_IS_MALLOC on NODE's decl and on NODE's aliases if any.  */
> +
> +bool
> +cgraph_node::add_detected_attribute (const char *attr)
> +{
> +  bool changed = false;
> +
> +  if (get_availability () > AVAIL_INTERPOSABLE)
> +add_detected_attribute_1 (this, attr, &changed);
> +  else
> +{
> +  ipa_ref *ref;
> +
> +  FOR_EACH_ALIAS (this, ref)
> + {
> +   cgraph_node *alias = dyn_cast (ref->referring);
> +   if (alias->get_availability () > AVAIL_INTERPOSABLE)
> + add_detected_attribute_1 (alias, attr, &changed);
> + }
> +}
> +  return changed;
> +}
> +
>  /* Worker to set noreturng flag.  */
>  static void
>  set_noreturn_flag_1 (cgraph_node *node, bool noreturn_p, bool *changed)
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index cedaaac3a45..cfdd9f693a8 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -1190,6 +1190,10 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
>  
>bool set_pure_flag (bool pure, bool looping);
>  
> +  /* Add attribute ATTR to cgraph_node's decl and on aliases of the node
> + if any.  */
> +  bool add_detected_attribute (const char *attr);
> +
>/* Call callback on function and aliases associated to the function.
>   When INCLUDE_OVERWRITABLE is false, overwritable aliases and thunks are
>   skipped. */
> diff --git a/gcc/common.opt b/gcc/common.opt
> index d21db5d4a20..0be4f02677c 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -781,6 +781,10 @@ Wsuggest-attribute=malloc
>  Common Var(warn_suggest_attribute_malloc) Warning
>  Warn about functions which might be candidates for __attribute__((malloc)).
>  
> +Wsuggest-attribute=returns_nonnull

Should this be '-' or '_'?

(If changing it, needs adjustment in rest of patch too.)

> +Common Var(warn_suggest_attribute_malloc) Warning
> +Warn about functions which might be candidates for __attribute__((malloc)).
> +

Typo: s/malloc/nonnull/?


[pushed] libcpp: split decls out to rich-location.h

2023-11-19 Thread David Malcolm
The various decls relating to rich_location are in
libcpp/include/line-map.h, but they don't relate to line maps.

Split them out to their own header: libcpp/include/rich-location.h

No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-5593-g78d132d73ec378.

gcc/ChangeLog:
* Makefile.in (CPPLIB_H): Add libcpp/include/rich-location.h.
* coretypes.h (class rich_location): New forward decl.

gcc/analyzer/ChangeLog:
* analyzer.h: Include "rich-location.h".

gcc/c-family/ChangeLog:
* c-lex.cc: Include "rich-location.h".

gcc/cp/ChangeLog:
* mapper-client.cc: Include "rich-location.h".

gcc/ChangeLog:
* diagnostic.h: Include "rich-location.h".
* edit-context.h (class fixit_hint): New forward decl.
* gcc-rich-location.h: Include "rich-location.h".
* genmatch.cc: Likewise.
* pretty-print.h: Likewise.

gcc/rust/ChangeLog:
* rust-location.h: Include "rich-location.h".

libcpp/ChangeLog:
* Makefile.in (TAGS_SOURCES): Add "include/rich-location.h".
* include/cpplib.h (class rich_location): New forward decl.
* include/line-map.h (class range_label)
(enum range_display_kind, struct location_range)
(class semi_embedded_vec, class rich_location, class label_text)
(class range_label, class fixit_hint): Move to...
* include/rich-location.h: ...this new file.
* internal.h: Include "rich-location.h".
---
 gcc/Makefile.in|   1 +
 gcc/analyzer/analyzer.h|   1 +
 gcc/c-family/c-lex.cc  |   1 +
 gcc/coretypes.h|   1 +
 gcc/cp/mapper-client.cc|   1 +
 gcc/diagnostic.h   |   1 +
 gcc/edit-context.h |   1 +
 gcc/gcc-rich-location.h|   2 +
 gcc/genmatch.cc|   1 +
 gcc/pretty-print.h |   1 +
 gcc/rust/rust-location.h   |   1 +
 libcpp/Makefile.in |   4 +-
 libcpp/include/cpplib.h|   2 +
 libcpp/include/line-map.h  | 671 ---
 libcpp/include/rich-location.h | 695 +
 libcpp/internal.h  |   1 +
 16 files changed, 713 insertions(+), 672 deletions(-)
 create mode 100644 libcpp/include/rich-location.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 7228b79f223a..753f2f36618e 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1038,6 +1038,7 @@ SYSTEM_H = system.h hwint.h $(srcdir)/../include/libiberty.h \
$(HASHTAB_H)
 PREDICT_H = predict.h predict.def
 CPPLIB_H = $(srcdir)/../libcpp/include/line-map.h \
+   $(srcdir)/../libcpp/include/rich-location.h \
$(srcdir)/../libcpp/include/cpplib.h
 CODYLIB_H = $(srcdir)/../libcody/cody.hh
 INPUT_H = $(srcdir)/../libcpp/include/line-map.h input.h
diff --git a/gcc/analyzer/analyzer.h b/gcc/analyzer/analyzer.h
index f08572bb633e..cf32d4b85b15 100644
--- a/gcc/analyzer/analyzer.h
+++ b/gcc/analyzer/analyzer.h
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_ANALYZER_ANALYZER_H
 #define GCC_ANALYZER_ANALYZER_H
 
+#include "rich-location.h"
 #include "function.h"
 #include "json.h"
 #include "tristate.h"
diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc
index 06c2453c89a6..86ec679aebfe 100644
--- a/gcc/c-family/c-lex.cc
+++ b/gcc/c-family/c-lex.cc
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "file-prefix-map.h" /* remap_macro_filename()  */
 #include "langhooks.h"
 #include "attribs.h"
+#include "rich-location.h"
 
 /* We may keep statistics about how long which files took to compile.  */
 static int header_time, body_time;
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 9848cde2b97b..fe5b868fb4f3 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -156,6 +156,7 @@ struct cl_optimization;
 struct cl_option;
 struct cl_decoded_option;
 struct cl_option_handlers;
+class rich_location;
 class diagnostic_context;
 class pretty_printer;
 class diagnostic_event_id_t;
diff --git a/gcc/cp/mapper-client.cc b/gcc/cp/mapper-client.cc
index 927271952468..f1a0c4cc009a 100644
--- a/gcc/cp/mapper-client.cc
+++ b/gcc/cp/mapper-client.cc
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 
 #include "line-map.h"
+#include "rich-location.h"
 #include "diagnostic-core.h"
 #include "mapper-client.h"
 #include "intl.h"
diff --git a/gcc/diagnostic.h b/gcc/diagnostic.h
index dbf972d25875..cbd25541f502 100644
--- a/gcc/diagnostic.h
+++ b/gcc/diagnostic.h
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_DIAGNOSTIC_H
 #define GCC_DIAGNOSTIC_H
 
+#include "rich-location.h"
 #include "pretty-print.h"
 #include "diagnostic-core.h"
 
diff --git a/gcc/edit-context.h b/gcc/edit-context.h
index 3ae9ba103ca7..71735c8b9c6f 100644
--- a/gcc/edit-context.h
+++ b/gcc/edit-context.h
@@ -22,6 +22,7 @@ along with GCC; see the file COPYING3

Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-11-19 Thread waffl3x
Funnily enough, I ended up removing the ones I was thinking about; that
seems to always happen when I ask style questions, but I'm glad to hear
it's okay going forward.

I'm having trouble fixing this bug. Based on what Gasper said in
PR102609, I am pretty sure I know what the semantics should be: since
the capture is not used in the body of the function, it should be
well-formed to call the function with an unrelated type.

I had begun trying to tackle the case that Gasper mentioned and got the
following ICE. I also have another case that ICEs, so I suspect that
small changes won't be enough to fix this. I've been looking at this
for a few hours now, and given that we are past the deadline, I figured
I should see what others think.

int main()
{
  int x = 42;
  auto f1 = [x](this auto&& self) {};

  static_cast(decltype(f1)::operator());
}

explicit-obj-lambdaX3.C: In instantiation of 'main():: static 
[with auto:1 = int&]':
explicit-obj-lambdaX3.C:33:53:   required from here
   33 |   static_cast(decltype(f1)::operator());
  | ^
explicit-obj-lambdaX3.C:31:33: internal compiler error: tree check: expected 
record_type or union_type or qual_union_type, have integer_type in 
finish_non_static_data_member, at cp/semantics.cc:2294
   31 |   auto f1 = [x](this auto&& self) {};
  | ^
0x1c66dda tree_check_failed(tree_node const*, char const*, int, char const*, 
...)
/home/waffl3x/projects/gcc-dev/workspace/src/xobj-next/gcc/tree.cc:8949
0xb2e125 tree_check3(tree_node*, char const*, int, char const*, tree_code, 
tree_code, tree_code)
/home/waffl3x/projects/gcc-dev/workspace/src/xobj-next/gcc/tree.h:3638
0xedfaf4 finish_non_static_data_member(tree_node*, tree_node*, tree_node*, int)

/home/waffl3x/projects/gcc-dev/workspace/src/xobj-next/gcc/cp/semantics.cc:2294
0xe8b9b8 tsubst_expr(tree_node*, tree_node*, int, tree_node*)

/home/waffl3x/projects/gcc-dev/workspace/src/xobj-next/gcc/cp/pt.cc:20864
0xe6d713 tsubst_decl

/home/waffl3x/projects/gcc-dev/workspace/src/xobj-next/gcc/cp/pt.cc:15387
0xe6fb1b tsubst(tree_node*, tree_node*, int, tree_node*)

/home/waffl3x/projects/gcc-dev/workspace/src/xobj-next/gcc/cp/pt.cc:15967
0xe7bd81 tsubst_stmt

/home/waffl3x/projects/gcc-dev/workspace/src/xobj-next/gcc/cp/pt.cc:18299
0xe7df18 tsubst_stmt

/home/waffl3x/projects/gcc-dev/workspace/src/xobj-next/gcc/cp/pt.cc:18554
0xea6982 instantiate_body

/home/waffl3x/projects/gcc-dev/workspace/src/xobj-next/gcc/cp/pt.cc:26743
0xea83e9 instantiate_decl(tree_node*, bool, bool)

/home/waffl3x/projects/gcc-dev/workspace/src/xobj-next/gcc/cp/pt.cc:27030
0xb5f9c9 resolve_address_of_overloaded_function

/home/waffl3x/projects/gcc-dev/workspace/src/xobj-next/gcc/cp/class.cc:8802
0xb60be1 instantiate_type(tree_node*, tree_node*, int)

/home/waffl3x/projects/gcc-dev/workspace/src/xobj-next/gcc/cp/class.cc:9061
0xaf9992 standard_conversion

/home/waffl3x/projects/gcc-dev/workspace/src/xobj-next/gcc/cp/call.cc:1244
0xafcb57 implicit_conversion

/home/waffl3x/projects/gcc-dev/workspace/src/xobj-next/gcc/cp/call.cc:2081
0xb2a8cb perform_direct_initialization_if_possible(tree_node*, tree_node*, 
bool, int)

/home/waffl3x/projects/gcc-dev/workspace/src/xobj-next/gcc/cp/call.cc:13456
0xf69db8 build_static_cast_1

/home/waffl3x/projects/gcc-dev/workspace/src/xobj-next/gcc/cp/typeck.cc:8356
0xf6af1b build_static_cast(unsigned int, tree_node*, tree_node*, int)

/home/waffl3x/projects/gcc-dev/workspace/src/xobj-next/gcc/cp/typeck.cc:8566
0xd9fc02 cp_parser_postfix_expression

/home/waffl3x/projects/gcc-dev/workspace/src/xobj-next/gcc/cp/parser.cc:7531
0xda45af cp_parser_unary_expression

/home/waffl3x/projects/gcc-dev/workspace/src/xobj-next/gcc/cp/parser.cc:9244
0xda5db4 cp_parser_cast_expression

/home/waffl3x/projects/gcc-dev/workspace/src/xobj-next/gcc/cp/parser.cc:10148

As I said, there is another case that ICEs in the same region.
It's making me think that some core assumptions are being violated in
the code leading up to finish_non_static_data_member.

int main()
{
  int x = 42;
  auto f1 = [x](this auto self) {};
}

explicit-obj-lambdaX3.C: In lambda function:
explicit-obj-lambdaX3.C:31:31: internal compiler error: Segmentation fault
   31 |   auto f1 = [x](this auto self) {};
  |   ^
0x1869eaa crash_signal
/home/waffl3x/projects/gcc-dev/workspace/src/xobj-next/gcc/toplev.cc:315
0xb2ea0b strip_array_types(tree_node*)
/home/waffl3x/projects/gcc-dev/workspace/src/xobj-next/gcc/tree.h:4955
0xf773e2 cp_type_quals(tree_node const*)

/home/waffl3x/projects/gcc-dev/workspace/src/xobj-next/gcc/cp/typeck.cc:11509
0xedf993 finish_non_static_data_member(tree_node*, tree_node*, tree_node*, int)

/home/waffl3x/projects/gcc-dev/workspace/src/xobj-n

[Committed V2] RISC-V: Optimize constant AVL for LRA pattern

2023-11-19 Thread Juzhe-Zhong
This optimization was discovered while working on the tuple move split bug fix patch.

Before this patch:

vsetivli zero,4,e16,mf2,ta,ma
lhu a3,96(a5)
vlseg8e16.v v1,(a5)
lw  a4,%lo(e)(a2)
vsetvli a6,zero,e64,m2,ta,ma
addi a0,a7,8
vse16.v v1,0(a7)
vse16.v v2,0(a0)
addi a0,a0,8
vse16.v v3,0(a0)
addi a0,a0,8
vse16.v v4,0(a0)
addi a0,a0,8
vse16.v v5,0(a0)
addi a0,a0,8
vse16.v v6,0(a0)
addi a0,a0,8
vse16.v v7,0(a0)
addi a0,a0,8
vse16.v v8,0(a0)

After this patch:

vsetivli zero,4,e64,m2,ta,ma
addi a0,a7,8
vlseg8e16.v v1,(a5)
vse16.v v1,0(a7)
vse16.v v2,0(a0)
addi a0,a0,8
vse16.v v3,0(a0)
addi a0,a0,8
vse16.v v4,0(a0)
addi a0,a0,8
vse16.v v5,0(a0)
addi a0,a0,8
vse16.v v6,0(a0)
addi a0,a0,8
vse16.v v7,0(a0)
addi a0,a0,8
vse16.v v8,0(a0)
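
The two sequences differ in whether the vector length is passed as an
immediate or through a register.  A rough C++ sketch of that decision
follows.  It relies on the fact that vsetivli encodes its AVL as a 5-bit
unsigned immediate (0 to 31), and it assumes the imm_avl_p () check in the
patch tests for exactly that kind of constant element count.  The names
avl_fits_immediate and choose_vset_form are hypothetical and are not GCC
internals.

```cpp
#include <cassert>
#include <optional>

// Which vsetvl-family form to emit for a whole-vector (VLMAX) operation.
enum class VsetForm { Immediate, VlmaxRegister };

// Hypothetical stand-in for the check in the patch: the element count must
// be a compile-time constant that fits vsetivli's 5-bit unsigned immediate.
// An empty optional models a length that is not known at compile time.
bool avl_fits_immediate(std::optional<unsigned> nunits)
{
  return nunits.has_value() && *nunits <= 31u;
}

VsetForm choose_vset_form(std::optional<unsigned> nunits)
{
  return avl_fits_immediate(nunits) ? VsetForm::Immediate
                                    : VsetForm::VlmaxRegister;
}

// Small self-check of the sketch.
void check_choose_vset_form()
{
  assert(choose_vset_form(4u) == VsetForm::Immediate);
  assert(choose_vset_form(32u) == VsetForm::VlmaxRegister);
  assert(choose_vset_form(std::nullopt) == VsetForm::VlmaxRegister);
}
```

In the example above the mode has 4 elements, so the immediate form applies
and the separate "vsetvli a6,zero" disappears.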

gcc/ChangeLog:

* config/riscv/riscv-v.cc (emit_vlmax_insn_lra): Optimize constant AVL.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/post-ra-avl.c: New test.

---
 gcc/config/riscv/riscv-v.cc   | 20 ---
 .../riscv/rvv/autovec/post-ra-avl.c   | 16 +++
 2 files changed, 33 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index f769c1474e0..594cc4dd145 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -374,10 +374,24 @@ void
 emit_vlmax_insn_lra (unsigned icode, unsigned insn_flags, rtx *ops, rtx vl)
 {
   gcc_assert (!can_create_pseudo_p ());
+  machine_mode mode = GET_MODE (ops[0]);
 
-  insn_expander e (insn_flags, true);
-  e.set_vl (vl);
-  e.emit_insn ((enum insn_code) icode, ops);
+  if (imm_avl_p (mode))
+{
+  /* Even though VL is a real hardreg already allocated since
+it is post-RA now, we still gain benefits that we emit
+vsetivli zero, imm instead of vsetvli VL, zero which is
+we can be more flexible in post-RA instruction scheduling.  */
+  insn_expander e (insn_flags, false);
+  e.set_vl (gen_int_mode (GET_MODE_NUNITS (mode), Pmode));
+  e.emit_insn ((enum insn_code) icode, ops);
+}
+  else
+{
+  insn_expander e (insn_flags, true);
+  e.set_vl (vl);
+  e.emit_insn ((enum insn_code) icode, ops);
+}
 }
 
 /* Emit an RVV insn with a predefined vector length.  Contrary to
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c
new file mode 100644
index 000..f3d12bac7cd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/post-ra-avl.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=fixed-vlmax" } */
+
+int a, b, c, e;
+short d[7][7] = {};
+int foo() {
+  short f;
+  c = 0;
+  for (; c <= 6; c++) {
+e |= d[c][c] & 1;
+b &= f & 3;
+  }
+  return a;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero} 1 } } */
-- 
2.36.3



[PATCH v2 2/3] LoongArch: Use standard pattern name and RTX code for LSX/LASX muh instructions

2023-11-19 Thread Xi Ruoyao
Remove unnecessary UNSPECs and make the muh instructions useful with
GNU vectors or auto vectorization.
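
For readers unfamiliar with the highpart operation, here is a minimal
scalar sketch in C++ of what smul_highpart and umul_highpart compute per
element, shown for 16-bit elements.  The function names are hypothetical,
and the loop is only an example of the kind of code an auto-vectorizer could
map to the muh instructions; it is not taken from vect-muh.c.

```cpp
#include <cassert>
#include <cstdint>

// Upper 16 bits of the full 32-bit signed product.  The right shift of a
// negative value is assumed to be arithmetic, as it is with GCC.
inline int16_t smul_highpart16(int16_t a, int16_t b)
{
  return static_cast<int16_t>((static_cast<int32_t>(a) * b) >> 16);
}

// Upper 16 bits of the full 32-bit unsigned product.
inline uint16_t umul_highpart16(uint16_t a, uint16_t b)
{
  return static_cast<uint16_t>((static_cast<uint32_t>(a) * b) >> 16);
}

// Element-wise loop over arrays: widen, multiply, keep the high half.
void mulh_loop(const int16_t *a, const int16_t *b, int16_t *out, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = smul_highpart16(a[i], b[i]);
}

// Small self-check of the sketch.
void check_mulh()
{
  assert(smul_highpart16(0x4000, 0x4000) == 0x1000);
  assert(umul_highpart16(0xFFFF, 0xFFFF) == 0xFFFE);
}
```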

gcc/ChangeLog:

* config/loongarch/simd.md (muh): New code attribute mapping
any_extend to smul_highpart or umul_highpart.
(mul3_highpart): New define_insn.
* config/loongarch/lsx.md (UNSPEC_LSX_VMUH_S): Remove.
(UNSPEC_LSX_VMUH_U): Remove.
(lsx_vmuh_s_): Remove.
(lsx_vmuh_u_): Remove.
* config/loongarch/lasx.md (UNSPEC_LASX_XVMUH_S): Remove.
(UNSPEC_LASX_XVMUH_U): Remove.
(lasx_xvmuh_s_): Remove.
(lasx_xvmuh_u_): Remove.
* config/loongarch/loongarch-builtins.cc (CODE_FOR_lsx_vmuh_b):
Redefine to standard pattern name.
(CODE_FOR_lsx_vmuh_h): Likewise.
(CODE_FOR_lsx_vmuh_w): Likewise.
(CODE_FOR_lsx_vmuh_d): Likewise.
(CODE_FOR_lsx_vmuh_bu): Likewise.
(CODE_FOR_lsx_vmuh_hu): Likewise.
(CODE_FOR_lsx_vmuh_wu): Likewise.
(CODE_FOR_lsx_vmuh_du): Likewise.
(CODE_FOR_lasx_xvmuh_b): Likewise.
(CODE_FOR_lasx_xvmuh_h): Likewise.
(CODE_FOR_lasx_xvmuh_w): Likewise.
(CODE_FOR_lasx_xvmuh_d): Likewise.
(CODE_FOR_lasx_xvmuh_bu): Likewise.
(CODE_FOR_lasx_xvmuh_hu): Likewise.
(CODE_FOR_lasx_xvmuh_wu): Likewise.
(CODE_FOR_lasx_xvmuh_du): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vect-muh.c: New test.
---
 gcc/config/loongarch/lasx.md  | 22 
 gcc/config/loongarch/loongarch-builtins.cc| 32 -
 gcc/config/loongarch/lsx.md   | 22 
 gcc/config/loongarch/simd.md  | 16 +
 gcc/testsuite/gcc.target/loongarch/vect-muh.c | 36 +++
 5 files changed, 68 insertions(+), 60 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-muh.c

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index a5eb878a612..51574bf043d 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -68,8 +68,6 @@ (define_c_enum "unspec" [
   UNSPEC_LASX_BRANCH
   UNSPEC_LASX_BRANCH_V
 
-  UNSPEC_LASX_XVMUH_S
-  UNSPEC_LASX_XVMUH_U
   UNSPEC_LASX_MXVEXTW_U
   UNSPEC_LASX_XVSLLWIL_S
   UNSPEC_LASX_XVSLLWIL_U
@@ -2835,26 +2833,6 @@ (define_insn "neg2"
   [(set_attr "type" "simd_logic")
(set_attr "mode" "")])
 
-(define_insn "lasx_xvmuh_s_"
-  [(set (match_operand:ILASX 0 "register_operand" "=f")
-   (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f")
-  (match_operand:ILASX 2 "register_operand" "f")]
- UNSPEC_LASX_XVMUH_S))]
-  "ISA_HAS_LASX"
-  "xvmuh.\t%u0,%u1,%u2"
-  [(set_attr "type" "simd_int_arith")
-   (set_attr "mode" "")])
-
-(define_insn "lasx_xvmuh_u_"
-  [(set (match_operand:ILASX 0 "register_operand" "=f")
-   (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f")
-  (match_operand:ILASX 2 "register_operand" "f")]
- UNSPEC_LASX_XVMUH_U))]
-  "ISA_HAS_LASX"
-  "xvmuh.\t%u0,%u1,%u2"
-  [(set_attr "type" "simd_int_arith")
-   (set_attr "mode" "")])
-
 (define_insn "lasx_xvsllwil_s__"
   [(set (match_operand: 0 "register_operand" "=f")
(unspec: [(match_operand:ILASX_WHB 1 "register_operand" "f")
diff --git a/gcc/config/loongarch/loongarch-builtins.cc 
b/gcc/config/loongarch/loongarch-builtins.cc
index cbd833aa283..a6fcc1c731e 100644
--- a/gcc/config/loongarch/loongarch-builtins.cc
+++ b/gcc/config/loongarch/loongarch-builtins.cc
@@ -319,6 +319,14 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
 #define CODE_FOR_lsx_vmod_hu CODE_FOR_umodv8hi3
 #define CODE_FOR_lsx_vmod_wu CODE_FOR_umodv4si3
 #define CODE_FOR_lsx_vmod_du CODE_FOR_umodv2di3
+#define CODE_FOR_lsx_vmuh_b CODE_FOR_smulv16qi3_highpart
+#define CODE_FOR_lsx_vmuh_h CODE_FOR_smulv8hi3_highpart
+#define CODE_FOR_lsx_vmuh_w CODE_FOR_smulv4si3_highpart
+#define CODE_FOR_lsx_vmuh_d CODE_FOR_smulv2di3_highpart
+#define CODE_FOR_lsx_vmuh_bu CODE_FOR_umulv16qi3_highpart
+#define CODE_FOR_lsx_vmuh_hu CODE_FOR_umulv8hi3_highpart
+#define CODE_FOR_lsx_vmuh_wu CODE_FOR_umulv4si3_highpart
+#define CODE_FOR_lsx_vmuh_du CODE_FOR_umulv2di3_highpart
 #define CODE_FOR_lsx_vmul_b CODE_FOR_mulv16qi3
 #define CODE_FOR_lsx_vmul_h CODE_FOR_mulv8hi3
 #define CODE_FOR_lsx_vmul_w CODE_FOR_mulv4si3
@@ -439,14 +447,6 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
 #define CODE_FOR_lsx_vfnmsub_s CODE_FOR_vfnmsubv4sf4_nmsub4
 #define CODE_FOR_lsx_vfnmsub_d CODE_FOR_vfnmsubv2df4_nmsub4
 
-#define CODE_FOR_lsx_vmuh_b CODE_FOR_lsx_vmuh_s_b
-#define CODE_FOR_lsx_vmuh_h CODE_FOR_lsx_vmuh_s_h
-#define CODE_FOR_lsx_vmuh_w CODE_FOR_lsx_vmuh_s_w
-#define CODE_FOR_lsx_vmuh_d CODE_FOR_lsx_vmuh_s_d
-#define CODE_FOR_lsx_vmuh_bu CODE_FOR_lsx_vmuh_u_bu
-#define CODE_FOR_lsx_vmuh_hu CODE_FOR_lsx_vmuh_u_hu
-#define CODE_FOR_lsx_vmuh_wu CODE_FOR_lsx_vmuh_u_wu
-#define CODE_FOR_lsx_vmuh_du CODE_FOR_lsx_vmuh_u_du
 #define CODE_FOR_lsx_vsllwil

[PATCH v2 3/3] LoongArch: Use standard pattern name and RTX code for LSX/LASX rotate shift

2023-11-19 Thread Xi Ruoyao
Remove unnecessary UNSPECs and make the [x]vrotr[i] instructions useful
with GNU vectors and auto vectorization.
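
As a reminder of the operation involved, here is a minimal C++ sketch of a
per-element rotate right, written in the shift-and-or form that compilers
commonly recognize as a rotate.  The name rotr32 and the loop are
illustrative assumptions, not code from vect-rotr.c.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit value right by n bits.  Masking both shift counts keeps
// them in [0, 31], so n == 0 does not shift by the full width.
inline uint32_t rotr32(uint32_t x, unsigned n)
{
  n &= 31u;
  return (x >> n) | (x << ((32u - n) & 31u));
}

// Element-wise rotate with a per-element count (the register form) ...
void rotr_loop(const uint32_t *x, const uint32_t *cnt, uint32_t *out, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = rotr32(x[i], cnt[i]);
}

// ... and with one constant count for all elements (the immediate form).
void rotri_loop(const uint32_t *x, uint32_t *out, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = rotr32(x[i], 8);
}

// Small self-check of the sketch.
void check_rotr()
{
  assert(rotr32(0x00000001u, 1) == 0x80000000u);
  assert(rotr32(0x12345678u, 8) == 0x78123456u);
}
```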

gcc/ChangeLog:

* config/loongarch/lsx.md (bitimm): Move to ...
(UNSPEC_LSX_VROTR): Remove.
(lsx_vrotr_): Remove.
(lsx_vrotri_): Remove.
* config/loongarch/lasx.md (UNSPEC_LASX_XVROTR): Remove.
(lasx_xvrotr_): Remove.
(lasx_xvrotri_): Remove.
* config/loongarch/simd.md (bitimm): ... here.  Expand it to
cover LASX modes.
(vrotr3): New define_insn.
(vrotri3): New define_insn.
* config/loongarch/loongarch-builtins.cc:
(CODE_FOR_lsx_vrotr_b): Use standard pattern name.
(CODE_FOR_lsx_vrotr_h): Likewise.
(CODE_FOR_lsx_vrotr_w): Likewise.
(CODE_FOR_lsx_vrotr_d): Likewise.
(CODE_FOR_lasx_xvrotr_b): Likewise.
(CODE_FOR_lasx_xvrotr_h): Likewise.
(CODE_FOR_lasx_xvrotr_w): Likewise.
(CODE_FOR_lasx_xvrotr_d): Likewise.
(CODE_FOR_lsx_vrotri_b): Define to standard pattern name.
(CODE_FOR_lsx_vrotri_h): Likewise.
(CODE_FOR_lsx_vrotri_w): Likewise.
(CODE_FOR_lsx_vrotri_d): Likewise.
(CODE_FOR_lasx_xvrotri_b): Likewise.
(CODE_FOR_lasx_xvrotri_h): Likewise.
(CODE_FOR_lasx_xvrotri_w): Likewise.
(CODE_FOR_lasx_xvrotri_d): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vect-rotr.c: New test.
---
 gcc/config/loongarch/lasx.md  | 22 
 gcc/config/loongarch/loongarch-builtins.cc| 16 +
 gcc/config/loongarch/lsx.md   | 28 ---
 gcc/config/loongarch/simd.md  | 29 +++
 .../gcc.target/loongarch/vect-rotr.c  | 36 +++
 5 files changed, 81 insertions(+), 50 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-rotr.c

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 51574bf043d..3e135387173 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -138,7 +138,6 @@ (define_c_enum "unspec" [
   UNSPEC_LASX_XVHSUBW_Q_D
   UNSPEC_LASX_XVHADDW_QU_DU
   UNSPEC_LASX_XVHSUBW_QU_DU
-  UNSPEC_LASX_XVROTR
   UNSPEC_LASX_XVADD_Q
   UNSPEC_LASX_XVSUB_Q
   UNSPEC_LASX_XVREPLVE
@@ -4244,18 +4243,6 @@ (define_insn "lasx_xvhsubw_qu_du"
   [(set_attr "type" "simd_int_arith")
(set_attr "mode" "V4DI")])
 
-;;XVROTR.B   XVROTR.H   XVROTR.W   XVROTR.D
-;;TODO-478
-(define_insn "lasx_xvrotr_"
-  [(set (match_operand:ILASX 0 "register_operand" "=f")
-   (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f")
-  (match_operand:ILASX 2 "register_operand" "f")]
- UNSPEC_LASX_XVROTR))]
-  "ISA_HAS_LASX"
-  "xvrotr.\t%u0,%u1,%u2"
-  [(set_attr "type" "simd_int_arith")
-   (set_attr "mode" "")])
-
 ;;XVADD.Q
 ;;TODO2
 (define_insn "lasx_xvadd_q"
@@ -4438,15 +4425,6 @@ (define_insn "lasx_xvexth_qu_du"
   [(set_attr "type" "simd_fcvt")
(set_attr "mode" "V4DI")])
 
-(define_insn "lasx_xvrotri_"
-  [(set (match_operand:ILASX 0 "register_operand" "=f")
-   (rotatert:ILASX (match_operand:ILASX 1 "register_operand" "f")
-  (match_operand 2 "const__operand" "")))]
-  "ISA_HAS_LASX"
-  "xvrotri.\t%u0,%u1,%2"
-  [(set_attr "type" "simd_shf")
-   (set_attr "mode" "")])
-
 (define_insn "lasx_xvextl_q_d"
   [(set (match_operand:V4DI 0 "register_operand" "=f")
(unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f")]
diff --git a/gcc/config/loongarch/loongarch-builtins.cc 
b/gcc/config/loongarch/loongarch-builtins.cc
index a6fcc1c731e..5d037ab7f10 100644
--- a/gcc/config/loongarch/loongarch-builtins.cc
+++ b/gcc/config/loongarch/loongarch-builtins.cc
@@ -369,6 +369,14 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
 #define CODE_FOR_lsx_vsrli_h CODE_FOR_vlshrv8hi3
 #define CODE_FOR_lsx_vsrli_w CODE_FOR_vlshrv4si3
 #define CODE_FOR_lsx_vsrli_d CODE_FOR_vlshrv2di3
+#define CODE_FOR_lsx_vrotr_b CODE_FOR_vrotrv16qi3
+#define CODE_FOR_lsx_vrotr_h CODE_FOR_vrotrv8hi3
+#define CODE_FOR_lsx_vrotr_w CODE_FOR_vrotrv4si3
+#define CODE_FOR_lsx_vrotr_d CODE_FOR_vrotrv2di3
+#define CODE_FOR_lsx_vrotri_b CODE_FOR_rotrv16qi3
+#define CODE_FOR_lsx_vrotri_h CODE_FOR_rotrv8hi3
+#define CODE_FOR_lsx_vrotri_w CODE_FOR_rotrv4si3
+#define CODE_FOR_lsx_vrotri_d CODE_FOR_rotrv2di3
 #define CODE_FOR_lsx_vsub_b CODE_FOR_subv16qi3
 #define CODE_FOR_lsx_vsub_h CODE_FOR_subv8hi3
 #define CODE_FOR_lsx_vsub_w CODE_FOR_subv4si3
@@ -634,6 +642,14 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
 #define CODE_FOR_lasx_xvsrli_h CODE_FOR_vlshrv16hi3
 #define CODE_FOR_lasx_xvsrli_w CODE_FOR_vlshrv8si3
 #define CODE_FOR_lasx_xvsrli_d CODE_FOR_vlshrv4di3
+#define CODE_FOR_lasx_xvrotr_b CODE_FOR_vrotrv32qi3
+#define CODE_FOR_lasx_xvrotr_h CODE_FOR_vrotrv16hi3
+#define CODE_FOR_lasx_xvrotr_w CODE_FOR_vrotrv8si3
+#define CODE_FOR_lasx_xvrotr_d CODE_FOR_vrotrv4di3
+#define CODE_FOR_lasx_xvrotri_b CODE_FOR_rotrv32qi3
+#define COD

[PATCH v2 0/3] LoongArch: SIMD fixes and optimizations

2023-11-19 Thread Xi Ruoyao
The [1/3] patch is the PR112578 fix at
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637097.html
unchanged.  As I've made two other patches that depend on the simd.md
file it introduces, I'm resending it as part of this series.

As many LASX instructions are only differentiated from the corresponding
LSX instruction with operand length, create simd.md file to contain the
RTX templates sharable by LSX and LASX.  This makes the code cleaner and
easier to maintain.

The [2/3] and [3/3] patches make the vector product highpart and rotate
shift operations usable with GNU vectors and auto vectorization.

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

Xi Ruoyao (3):
  LoongArch: Fix usage of LSX and LASX frint/ftint instructions
[PR112578]
  LoongArch: Use standard pattern name and RTX code for LSX/LASX muh
instructions
  LoongArch: Use standard pattern name and RTX code for LSX/LASX rotate
shift

 gcc/config/loongarch/lasx.md  | 271 -
 gcc/config/loongarch/loongarch-builtins.cc|  52 ++--
 gcc/config/loongarch/loongarch.md |   7 +-
 gcc/config/loongarch/lsx.md   | 284 --
 gcc/config/loongarch/simd.md  | 238 +++
 .../loongarch/vect-frint-no-inexact.c |  48 +++
 .../gcc.target/loongarch/vect-frint.c |  82 +
 .../loongarch/vect-ftint-no-inexact.c |  44 +++
 .../gcc.target/loongarch/vect-ftint.c |  80 +
 gcc/testsuite/gcc.target/loongarch/vect-muh.c |  36 +++
 .../gcc.target/loongarch/vect-rotr.c  |  36 +++
 11 files changed, 598 insertions(+), 580 deletions(-)
 create mode 100644 gcc/config/loongarch/simd.md
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-no-inexact.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-ftint-no-inexact.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-ftint.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-muh.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-rotr.c

-- 
2.42.1



[PATCH v2 1/3] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

2023-11-19 Thread Xi Ruoyao
The usage of LSX and LASX frint/ftint instructions had some problems:

1. These instructions raise FE_INEXACT, which is not allowed with
   -fno-fp-int-builtin-inexact for most C2x section F.10.6 functions
   (the only exceptions are rint, lrint, and llrint).
2. The "frint" instruction without explicit rounding mode is used for
   roundM2.  This is incorrect because roundM2 is defined as "rounding
   operand 1 to the *nearest* integer, rounding away from zero in the
   event of a tie".  We actually don't have such an instruction.  Our
   frintrne instruction is roundevenM2 (unfortunately, this is not
   documented).
3. These define_insn's are written in a way that is not easy to modify.

So I removed these instructions and created a "simd.md" file, then added
them and the corresponding expanders there.  The advantage of the
simd.md file is we don't need to duplicate the RTL template twice (in
lsx.md and lasx.md).
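
To make the difference in point 2 concrete, here is a small C++ sketch that
contrasts the two tie-breaking rules using standard library functions.  It
assumes the default floating-point environment (round-to-nearest), in which
std::nearbyint rounds ties to even.  The wrapper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// roundM2 semantics: nearest integer, ties away from zero.
double round_ties_away(double x)
{
  return std::round(x);
}

// roundevenM2 semantics: nearest integer, ties to even.  This relies on the
// current rounding mode being the default round-to-nearest.
double round_ties_even(double x)
{
  return std::nearbyint(x);
}

// Small self-check of the sketch: the two only differ on exact ties.
void check_rounding()
{
  assert(round_ties_away(2.5) == 3.0);
  assert(round_ties_even(2.5) == 2.0);
  assert(round_ties_away(2.4) == round_ties_even(2.4));
}
```

Using a round-to-nearest-even instruction for roundM2 therefore gives a
different result for inputs such as 2.5.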

gcc/ChangeLog:

PR target/112578
* config/loongarch/lsx.md (UNSPEC_LSX_VFTINT_S,
UNSPEC_LSX_VFTINTRNE, UNSPEC_LSX_VFTINTRP,
UNSPEC_LSX_VFTINTRM, UNSPEC_LSX_VFRINTRNE_S,
UNSPEC_LSX_VFRINTRNE_D, UNSPEC_LSX_VFRINTRZ_S,
UNSPEC_LSX_VFRINTRZ_D, UNSPEC_LSX_VFRINTRP_S,
UNSPEC_LSX_VFRINTRP_D, UNSPEC_LSX_VFRINTRM_S,
UNSPEC_LSX_VFRINTRM_D): Remove.
(ILSX, FLSX): Move into ...
(VIMODE): Move into ...
(FRINT_S, FRINT_D): Remove.
(frint_pattern_s, frint_pattern_d, frint_suffix): Remove.
(lsx_vfrint_, lsx_vftint_s__,
lsx_vftintrne_w_s, lsx_vftintrne_l_d, lsx_vftintrp_w_s,
lsx_vftintrp_l_d, lsx_vftintrm_w_s, lsx_vftintrm_l_d,
lsx_vfrintrne_s, lsx_vfrintrne_d, lsx_vfrintrz_s,
lsx_vfrintrz_d, lsx_vfrintrp_s, lsx_vfrintrp_d,
lsx_vfrintrm_s, lsx_vfrintrm_d,
v4sf2,
v2df2, round2): Remove.
* config/loongarch/lasx.md: Likewise.
* config/loongarch/simd.md: New file.
(ILSX, ILASX, FLSX, FLASX, VIMODE): ... here.
(IVEC, FVEC): New mode iterators.
(VIMODE): ... here.  Extend it to work for all LSX/LASX vector
modes.
(x, wu, simd_isa, WVEC, vimode, simdfmt, simdifmt_for_f,
elebits): New mode attributes.
(UNSPEC_SIMD_FRINTRP, UNSPEC_SIMD_FRINTRZ, UNSPEC_SIMD_FRINT,
UNSPEC_SIMD_FRINTRM, UNSPEC_SIMD_FRINTRNE): New unspecs.
(SIMD_FRINT): New int iterator.
(simd_frint_rounding, simd_frint_pattern): New int attributes.
(_vfrint_): New
define_insn template for frint instructions.
(_vftint__):
Likewise, but for ftint instructions.
(2): New define_expand with
flag_fp_int_builtin_inexact checked.
(l2): Likewise.
(rint2): New define_expand.  It does not require
flag_fp_int_builtin_inexact.
(ftrunc2): Likewise.
(lrint2): Likewise.
(fix_trunc2): Likewise.
(include): Add lsx.md and lasx.md.
* config/loongarch/loongarch.md (include): Include simd.md,
instead of including lsx.md and lasx.md directly.
* config/loongarch/loongarch-builtins.cc
(CODE_FOR_lsx_vftint_w_s, CODE_FOR_lsx_vftint_l_d,
CODE_FOR_lasx_xvftint_w_s, CODE_FOR_lasx_xvftint_l_d):
Remove.

gcc/testsuite/ChangeLog:

PR target/112578
* gcc.target/loongarch/vect-frint.c: New test.
* gcc.target/loongarch/vect-frint-no-inexact.c: New test.
* gcc.target/loongarch/vect-ftint.c: New test.
* gcc.target/loongarch/vect-ftint-no-inexact.c: New test.
---
 gcc/config/loongarch/lasx.md  | 227 -
 gcc/config/loongarch/loongarch-builtins.cc|   4 -
 gcc/config/loongarch/loongarch.md |   7 +-
 gcc/config/loongarch/lsx.md   | 234 --
 gcc/config/loongarch/simd.md  | 193 +++
 .../loongarch/vect-frint-no-inexact.c |  48 
 .../gcc.target/loongarch/vect-frint.c |  82 ++
 .../loongarch/vect-ftint-no-inexact.c |  44 
 .../gcc.target/loongarch/vect-ftint.c |  80 ++
 9 files changed, 449 insertions(+), 470 deletions(-)
 create mode 100644 gcc/config/loongarch/simd.md
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-no-inexact.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-ftint-no-inexact.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-ftint.c

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 2e11f061202..a5eb878a612 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -53,7 +53,6 @@ (define_c_enum "unspec" [
   UNSPEC_LASX_XVFCMP_SULT
   UNSPEC_LASX_XVFCMP_SUN
   UNSPEC_LASX_XVFCMP_SUNE
-  UNSPEC_LASX_XVFTINT_S
   UNSPEC_LASX_XVFTINT_U
   UNSPEC_LASX_XVCLO
   UNSPEC_LASX_XVSAT_S
@@ -92,12 +91,6 @@ (define_c_enum "unspec" [
   UNSPEC_LASX_XVEXTRINS
   UNSPEC

Re: Propagate value ranges of return values

2023-11-19 Thread Jan Hubicka
> > +Wsuggest-attribute=returns_nonnull
> 
> - or _?
> 
> (If changing it, needs adjustment in rest of patch too.)
I was thinking of this and I am not sure what is better.
Sure _ in command line option looks odd, but this is an identifier
and it is returns_nonnull and not returns-nonnull.

I am not sure we have an earlier situation like that.  I can live with
both variants and would be happy to hear opinions on this.
> 
> > +Common Var(warn_suggest_attribute_malloc) Warning
> > +Warn about functions which might be candidates for __attribute__((malloc)).
> > +
> 
> Typo: s/malloc/nonnull/?
Thanks!
Honza


Re: [PATCH 3/4] gcov: Add gen_counter_update()

2023-11-19 Thread Sebastian Huber

Hello Dimitar,

On 19.11.23 10:00, Dimitar Dimitrov wrote:

On Tue, Nov 14, 2023 at 11:08:24PM +0100, Sebastian Huber wrote:

Move the counter update to the new gen_counter_update() helper function.  Use
it in gimple_gen_edge_profiler() and gimple_gen_time_profiler().  The resulting
gimple instructions should be identical with the exception of the removed
unshare_expr() call.  The unshare_expr() call was used in
gimple_gen_edge_profiler().

gcc/ChangeLog:

* tree-profile.cc (gen_assign_counter_update): New.
(gen_counter_update): Likewise.
(gimple_gen_edge_profiler): Use gen_counter_update().
(gimple_gen_time_profiler): Likewise.
---
  gcc/tree-profile.cc | 133 +---
  1 file changed, 62 insertions(+), 71 deletions(-)


Hi Sebastian,

This patch caused a bunch of test failures on arm-none-eabi and
pru-unknown-elf targets.


thanks for the report. I will have a look at this next week. I guess it 
has something to do with the removed unshare_expr() call. I don't really 
know what it does, but I will try to figure this out.


--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Registration court: Amtsgericht München
Registration number: HRB 157899
Managing directors authorized to represent the company: Peter Rasmussen, Thomas Dörfler
Our privacy policy can be found here:
https://embedded-brains.de/datenschutzerklaerung/


Re: Propagate value ranges of return values

2023-11-19 Thread Jan Hubicka
Hi,
this is an updated version which also adds the testsuite compensation
I lost earlier while maintaining the patch in my testing tree.
There are quite a few testcases that use constant return values to hide
something from the optimizer.
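
A short C++ sketch of why those testcases are affected.  The function names
are hypothetical, and the noipa attribute is applied only when compiling
with GCC; this is an illustration, not one of the adjusted tests.

```cpp
#include <cassert>

// The result always lies in [0, 9].  Once the range of the return value is
// recorded, callers can be optimized with that knowledge.
static int last_digit(unsigned x)
{
  return static_cast<int>(x % 10u);
}

int classify(unsigned x)
{
  int d = last_digit(x);
  if (d > 9)      // can be folded away once the return range is known
    return -1;
  return d;
}

#if defined(__GNUC__) && !defined(__clang__)
#define NOIPA __attribute__((noipa))
#else
#define NOIPA
#endif

// Testcase-style helper: a constant return value meant to stay opaque to
// the caller.  Without noipa, propagating the return range would let the
// optimizer see through it.
NOIPA int hidden_zero()
{
  return 0;
}

// Small self-check of the sketch.
void check_return_ranges()
{
  assert(classify(1234u) == 4);
  assert(hidden_zero() == 0);
}
```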

Bootstrapped/regtested x86_64-linux.
gcc/ChangeLog:

* cgraph.cc (add_detected_attribute_1): New function.
(cgraph_node::add_detected_attribute): Likewise.
* cgraph.h (cgraph_node::add_detected_attribute): Declare.
* common.opt: Add -Wsuggest-attribute=returns_nonnull.
* doc/invoke.texi: Document new flag.
* gimple-range-fold.cc (fold_using_range::range_of_call):
Use known return value ranges.
* ipa-prop.cc (struct ipa_return_value_summary): New type.
(class ipa_return_value_sum_t): New type.
(ipa_return_value_sum): New summary.
(ipa_record_return_value_range): New function.
(ipa_return_value_range): New function.
* ipa-prop.h (ipa_return_value_range): Declare.
(ipa_record_return_value_range): Declare.
* ipa-pure-const.cc (warn_function_returns_nonnull): New function.
* ipa-utils.h (warn_function_returns_nonnull): Declare.
* symbol-summary.h: Fix comment.
* tree-vrp.cc (execute_ranger_vrp): Record return values.

gcc/testsuite/ChangeLog:

* g++.dg/ipa/devirt-2.C: Add noipa attribute to prevent ipa-vrp.
* g++.dg/ipa/devirt-7.C: Disable ipa-vrp.
* g++.dg/ipa/ipa-icf-2.C: Disable ipa-vrp.
* g++.dg/ipa/ipa-icf-3.C: Disable ipa-vrp.
* g++.dg/ipa/ivinline-1.C: Disable ipa-vrp.
* g++.dg/ipa/ivinline-3.C: Disable ipa-vrp.
* g++.dg/ipa/ivinline-5.C: Disable ipa-vrp.
* g++.dg/ipa/ivinline-8.C: Disable ipa-vrp.
* g++.dg/ipa/nothrow-1.C: Disable ipa-vrp.
* g++.dg/ipa/pure-const-1.C: Disable ipa-vrp.
* g++.dg/ipa/pure-const-2.C: Disable ipa-vrp.
* g++.dg/lto/inline-crossmodule-1_0.C: Disable ipa-vrp.
* gcc.c-torture/compile/pr106433.c: Add noipa attribute to prevent 
ipa-vrp.
* gcc.c-torture/execute/frame-address.c: Likewise.
* gcc.dg/ipa/fopt-info-inline-1.c: Disable ipa-vrp.
* gcc.dg/ipa/ipa-icf-25.c: Disable ipa-vrp.
* gcc.dg/ipa/ipa-icf-38.c: Disable ipa-vrp.
* gcc.dg/ipa/pure-const-1.c: Disable ipa-vrp.
* gcc.dg/ipa/remref-0.c: Add noipa attribute to prevent ipa-vrp.
* gcc.dg/tree-prof/time-profiler-1.c: Disable ipa-vrp.
* gcc.dg/tree-prof/time-profiler-2.c: Disable ipa-vrp.
* gcc.dg/tree-ssa/pr110269.c: Disable ipa-vrp.
* gcc.dg/tree-ssa/pr20701.c: Disable ipa-vrp.
* gcc.dg/tree-ssa/vrp05.c: Disable ipa-vrp.
* gcc.dg/tree-ssa/return-value-range-1.c: New test.

diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
index e41e5ad3ae7..71dacf23ce1 100644
--- a/gcc/cgraph.cc
+++ b/gcc/cgraph.cc
@@ -2629,6 +2629,54 @@ cgraph_node::set_malloc_flag (bool malloc_p)
   return changed;
 }
 
+/* Worker to set malloc flag.  */
+static void
+add_detected_attribute_1 (cgraph_node *node, const char *attr, bool *changed)
+{
+  if (!lookup_attribute (attr, DECL_ATTRIBUTES (node->decl)))
+{
+  DECL_ATTRIBUTES (node->decl) = tree_cons (get_identifier (attr),
+NULL_TREE, DECL_ATTRIBUTES (node->decl));
+  *changed = true;
+}
+
+  ipa_ref *ref;
+  FOR_EACH_ALIAS (node, ref)
+{
+  cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);
+  if (alias->get_availability () > AVAIL_INTERPOSABLE)
+   add_detected_attribute_1 (alias, attr, changed);
+}
+
+  for (cgraph_edge *e = node->callers; e; e = e->next_caller)
+if (e->caller->thunk
+   && (e->caller->get_availability () > AVAIL_INTERPOSABLE))
+  add_detected_attribute_1 (e->caller, attr, changed);
+}
+
+/* Set DECL_IS_MALLOC on NODE's decl and on NODE's aliases if any.  */
+
+bool
+cgraph_node::add_detected_attribute (const char *attr)
+{
+  bool changed = false;
+
+  if (get_availability () > AVAIL_INTERPOSABLE)
+add_detected_attribute_1 (this, attr, &changed);
+  else
+{
+  ipa_ref *ref;
+
+  FOR_EACH_ALIAS (this, ref)
+   {
+ cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);
+ if (alias->get_availability () > AVAIL_INTERPOSABLE)
+   add_detected_attribute_1 (alias, attr, &changed);
+   }
+}
+  return changed;
+}
+
 /* Worker to set noreturng flag.  */
 static void
 set_noreturn_flag_1 (cgraph_node *node, bool noreturn_p, bool *changed)
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index cedaaac3a45..cfdd9f693a8 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1190,6 +1190,10 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : 
public symtab_node
 
   bool set_pure_flag (bool pure, bool looping);
 
+  /* Add attribute ATTR to cgraph_node's decl and on aliases of the node
+ if any.  */
+  bool add_detected_attribute (const char *attr);
+
   /* Call callback on function and aliases associated to the function.

Re: [PATCH] RISC-V: Remove duplicate `order_operator' predicate

2023-11-19 Thread Jeff Law




On 11/19/23 04:24, Maciej W. Rozycki wrote:

Remove our RISC-V-specific `order_operator' predicate, which is exactly
the same as generic `ordered_comparison_operator' one.

gcc/
* config/riscv/predicates.md (order_operator): Remove predicate.
* config/riscv/riscv.cc (riscv_rtx_costs): Update accordingly.
* config/riscv/riscv.md (*branch, *movcc)
(cstore4): Likewise.
---
Hi,

  Verified with the `riscv64-linux-gnu' target and the C language
testsuite.  OK to apply?

OK
jeff


Re: [PATCH] testsuite: scev: expect fail on ilp32

2023-11-19 Thread Jeff Law




On 11/19/23 00:30, Alexandre Oliva wrote:


I've recently patched scev-3.c and scev-5.c because they only passed by
accident on ia32.  They also fail on some (but not all) arm-eabi
variants.  It seems hard to characterize the conditions in which the
optimization is supposed to pass, but expecting them to fail on ilp32
targets, though probably a little excessive and possibly noisy, is not
quite as alarming as getting a fail in test reports, so I propose
changing the xfail marker from ia32 to ilp32.

I'm also proposing to add a similar marker to scev-4.c.  Though it
doesn't appear to be failing for me, I've got reports that suggest it
still does for others, and it certainly did for us as well.

Regstrapped on x86_64-linux-gnu, also tested on arm-eabi with default
cpu on trunk, and with tms570 on gcc-13.  Ok to install?


for  gcc/testsuite/ChangeLog

* gcc.dg/tree-ssa/scev-3.c: xfail on all ilp32 targets,
though some of these do pass.
* gcc.dg/tree-ssa/scev-4.c: Likewise.
* gcc.dg/tree-ssa/scev-5.c: Likewise.

OK.  Though hopefully someone will figure out what properties actually
cause the differences so that we can do the right thing without the
noisy XPASS at some point.


jeff


Re: [PATCH] testsuite: analyzer: expect alignment warning with -fshort-enums

2023-11-19 Thread Jeff Law




On 11/19/23 00:36, Alexandre Oliva wrote:


On targets that have -fshort-enums enabled by default, the type casts
in the pr108251 analyzer tests warn that the byte-aligned enums may
not be sufficiently aligned to be a struct connection *.  The function
can't know better, the warning is reasonable, the code doesn't
expect enums to be shorter and less aligned than the struct.
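
The alignment mismatch described above can be illustrated with a small C sketch.  It is only an approximation: it relies on GCC's documented behaviour that the `packed' attribute on an enum is equivalent to -fshort-enums for that type, and the enum, the field layout of `struct connection' and the helper names are hypothetical stand-ins rather than the contents of the pr108251 tests.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for an enum compiled with -fshort-enums: with GCC the
   `packed' attribute makes the enum use the smallest suitable integer
   type.  The enumerator names are hypothetical.  */
enum __attribute__ ((packed)) short_state { STATE_A, STATE_B };

/* Hypothetical stand-in for the struct the tests cast to; only its
   alignment matters here.  */
struct connection
{
  void *owner;
  int flags;
};

/* Alignment of the short enum, in bytes.  */
size_t
enum_alignment (void)
{
  return _Alignof (enum short_state);
}

/* Alignment of the struct, in bytes.  */
size_t
struct_alignment (void)
{
  return _Alignof (struct connection);
}

/* Non-zero when a pointer to the enum may be insufficiently aligned
   to be used as a pointer to the struct.  */
int
cast_may_be_underaligned (void)
{
  return enum_alignment () < struct_alignment ();
}
```

On a target where short enums are byte-sized, `cast_may_be_underaligned ()' would be expected to return non-zero, which is the situation the warning reports.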

Rather than use -fno-short-enums, I decided to embrace the warning on
targets that have short_enums enabled by default.

However, C++ doesn't issue the warning, because even with
-fshort-enums, enumeration types are not TYPE_PACKED, and the
expression is not sufficiently simplified by the C++ front-end for
check_and_warn_address_or_pointer_of_packed_member to identify the
insufficiently aligned pointer.  So don't expect the warning there.

(I've got followup patches in testing to get the same warnings in C++)

Regstrapped on x86_64-linux-gnu, also tested on arm-eabi with default
cpu on trunk, and with tms570 on gcc-13.  Ok to install?


for  gcc/testsuite/ChangeLog

* c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c:
Expect "unaligned pointer value" warning on short_enums
targets, but not in c++.
* c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early.c:
Likewise.

OK.  Hell of a filename for a single test :-)

jeff


Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-11-19 Thread Jason Merrill

On 11/19/23 08:39, waffl3x wrote:

Funny enough I ended up removing the ones I was thinking about, seems
to always happen when I ask style questions but I'm glad to hear it's
okay going forward.

I'm having trouble fixing this bug, based on what Gasper said in
PR102609 I am pretty sure I know what the semantics should be. Since
the capture is not used in the body of the function, it should be well
formed to call the function with an unrelated type.


I don't think so: https://eel.is/c++draft/expr.prim#lambda.closure-5
says the type of the xobj parameter must be related.


I had begun trying to tackle the case that Gasper mentioned and got the
following ICE. I also have another case that ICEs so I've been thinking
I don't get to do little changes to fix this. I've been looking at this
for a few hours now and given we are past the deadline now I figured I
should see what others think.

int main()
{
   int x = 42;
   auto f1 = [x](this auto&& self) {};

   static_cast(decltype(f1)::operator());
}


This should be rejected when we try to instantiate the op() with int.


As I said, there is also this case that also ICEs in the same region.
It's making me think that some core assumptions are being violated in
the code leading up to finish_non_static_data_member.

int main()
{
   int x = 42;
   auto f1 = [x](this auto self) {};
}


Here I think the problem is in build_capture_proxy:


  /* The proxy variable forwards to the capture field.  */
  object = build_fold_indirect_ref (DECL_ARGUMENTS (fn));
  object = finish_non_static_data_member (member, object, NULL_TREE);


The call to build_fold_indirect_ref assumes that 'this' is a pointer, 
which it is not here.  I think you can just make that conditional on it 
being a pointer or reference?


Jason



Re: [PATCH 15/44] RISC-V/testsuite: Add branched cases for GEU and LEU cond-move operations

2023-11-19 Thread Jeff Law




On 11/18/23 22:38, Maciej W. Rozycki wrote:

Verify, for Ventana and Zicond targets and the GEU and LEU
conditional-move operations, that if-conversion does *not* trigger at
`-mbranch-cost=3' setting, which makes original branched code sequences
cheaper than their branchless equivalents if-conversion would emit.

gcc/testsuite/
* gcc.target/riscv/movdibgtu-ventana.c: New test.
* gcc.target/riscv/movdibgtu-zicond.c: New test.
* gcc.target/riscv/movdibltu-ventana.c: New test.
* gcc.target/riscv/movdibltu-zicond.c: New test.
* gcc.target/riscv/movsibgtu-ventana.c: New test.
* gcc.target/riscv/movsibgtu-zicond.c: New test.
* gcc.target/riscv/movsibltu-ventana.c: New test.
* gcc.target/riscv/movsibltu-zicond.c: New test.

OK
jeff





Re: [PATCH 17/44] RISC-V: Avoid extraneous EQ or NE operation in cond-move expansion

2023-11-19 Thread Jeff Law




On 11/18/23 22:38, Maciej W. Rozycki wrote:

In the non-zero case there is no need for the conditional value used by
Ventana and Zicond integer conditional operations to be specifically 1.
Regardless we canonicalize it by producing an extraneous conditional-set
operation, such as with the sequence below:

(insn 22 6 23 2 (set (reg:DI 141)
 (minus:DI (reg/v:DI 135 [ w ])
 (reg/v:DI 136 [ x ]))) 11 {subdi3}
  (nil))
(insn 23 22 24 2 (set (reg:DI 140)
 (ne:DI (reg:DI 141)
 (const_int 0 [0]))) 307 {*sne_zero_didi}
  (nil))
(insn 24 23 25 2 (set (reg:DI 143)
 (if_then_else:DI (eq:DI (reg:DI 140)
 (const_int 0 [0]))
 (const_int 0 [0])
 (reg:DI 13 a3 [ z ]))) 27913 {*czero.eqz.didi}
  (nil))
(insn 25 24 26 2 (set (reg:DI 142)
 (if_then_else:DI (ne:DI (reg:DI 140)
 (const_int 0 [0]))
 (const_int 0 [0])
 (reg/v:DI 137 [ y ]))) 27914 {*czero.nez.didi}
  (nil))
(insn 26 25 18 2 (set (reg/v:DI 138 [ z ])
 (ior:DI (reg:DI 142)
 (reg:DI 143))) 105 {iordi3}
  (nil))

where insn 23 can well be removed without changing the semantics of the
sequence.  This is actually fixed up later on by combine and the insn
does not make it to output meaning no SNEZ (or SEQZ in the reverse case)
appears in the assembly produced, however it counts towards the cost of
the sequence calculated by if-conversion, raising the trigger level for
the branchless sequence to be chosen.  Arguably, emitting this extraneous
operation can also be considered rather sloppy of our backend.

Remove the check for operand 1 being constant 0 in the Ventana/Zicond
case for equality comparisons then, observing that `riscv_zero_if_equal'
called via `riscv_emit_int_compare' will canonicalize the comparison if
required, removing the extraneous insn from output:

(insn 22 6 23 2 (set (reg:DI 142)
 (minus:DI (reg/v:DI 135 [ w ])
 (reg/v:DI 136 [ x ]))) 11 {subdi3}
  (nil))
(insn 23 22 24 2 (set (reg:DI 141)
 (if_then_else:DI (eq:DI (reg:DI 142)
 (const_int 0 [0]))
 (const_int 0 [0])
 (reg:DI 13 a3 [ z ]))) 27913 {*czero.eqz.didi}
  (nil))
(insn 24 23 25 2 (set (reg:DI 140)
 (if_then_else:DI (ne:DI (reg:DI 142)
 (const_int 0 [0]))
 (const_int 0 [0])
 (reg/v:DI 137 [ y ]))) 27914 {*czero.nez.didi}
  (nil))
(insn 25 24 18 2 (set (reg/v:DI 138 [ z ])
 (ior:DI (reg:DI 140)
 (reg:DI 141))) 105 {iordi3}
  (nil))

while keeping actual assembly produced the same.
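
A minimal C model may help show why the extra conditional-set step is redundant.  It assumes the Zicond semantics (czero.eqz yields zero when the condition is zero, czero.nez yields zero when the condition is non-zero); the function names are hypothetical and the code only mirrors the two RTL sequences quoted above, it is not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Model of czero.eqz: zero if COND is zero, VAL otherwise.  */
int64_t
czero_eqz (int64_t val, int64_t cond)
{
  return cond == 0 ? 0 : val;
}

/* Model of czero.nez: zero if COND is non-zero, VAL otherwise.  */
int64_t
czero_nez (int64_t val, int64_t cond)
{
  return cond != 0 ? 0 : val;
}

/* Difference of W and X, computed in unsigned arithmetic so the model
   has no signed-overflow issues.  */
int64_t
difference (int64_t w, int64_t x)
{
  return (int64_t) ((uint64_t) w - (uint64_t) x);
}

/* First sequence: the difference is canonicalized to 0/1 (the
   SNEZ-like insn 23) before feeding the czero operations.  */
int64_t
select_with_snez (int64_t w, int64_t x, int64_t y, int64_t z)
{
  int64_t cond = difference (w, x) != 0;
  return czero_nez (y, cond) | czero_eqz (z, cond);
}

/* Second sequence: the difference feeds the czero operations
   directly, since they only test for zero versus non-zero.  */
int64_t
select_without_snez (int64_t w, int64_t x, int64_t y, int64_t z)
{
  int64_t cond = difference (w, x);
  return czero_nez (y, cond) | czero_eqz (z, cond);
}
```

Both functions return Y when W equals X and Z otherwise, because the czero operations never look at the exact non-zero value of the condition.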

Adjust branch costs across the test cases affected accordingly.

gcc/
* config/riscv/riscv.cc (riscv_expand_conditional_move): Remove
the check for operand 1 being constant 0 in the Ventana/Zicond
case for equality comparisons.

gcc/testsuite/
* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_imm.c:
Lower `-mbranch-cost=' setting.
* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_reg.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_reg_reg.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_imm.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_reg.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_reg_reg.c:
Likewise.

OK.  Thanks for catching this!

jeff


Re: [PATCH 18/44] RISC-V/testsuite: Add branched cases for equality cond-move operations

2023-11-19 Thread Jeff Law




On 11/18/23 22:38, Maciej W. Rozycki wrote:

Verify, for Ventana and Zicond targets and the equality conditional-move
operations, that if-conversion does *not* trigger at the respective
sufficiently low `-mbranch-cost=' settings that make original branched
code sequences cheaper than their branchless equivalents if-conversion
would emit.

gcc/testsuite/
* gcc.target/riscv/movdibeq-ventana.c: New test.
* gcc.target/riscv/movdibeq-zicond.c: New test.
* gcc.target/riscv/movdibne-ventana.c: New test.
* gcc.target/riscv/movdibne-zicond.c: New test.
* gcc.target/riscv/movsibeq-ventana.c: New test.
* gcc.target/riscv/movsibeq-zicond.c: New test.
* gcc.target/riscv/movsibne-ventana.c: New test.
* gcc.target/riscv/movsibne-zicond.c: New test.

OK
jeff


Re: [PATCH 19/44] RISC-V/testsuite: Add branchless cases for equality cond-move operations

2023-11-19 Thread Jeff Law




On 11/18/23 22:39, Maciej W. Rozycki wrote:

Verify, for Ventana and Zicond targets and the equality conditional-move
operations, that if-conversion triggers via `noce_try_cmove' at the
respective sufficiently high `-mbranch-cost=' settings that make
branchless code sequences produced by if-conversion cheaper than their
original branched equivalents, and that extraneous instructions such as
SNEZ, etc. are not present in output.

gcc/testsuite/
* gcc.target/riscv/movdieq-ventana.c: New test.
* gcc.target/riscv/movdieq-zicond.c: New test.
* gcc.target/riscv/movdine-ventana.c: New test.
* gcc.target/riscv/movdine-zicond.c: New test.
* gcc.target/riscv/movsieq-ventana.c: New test.
* gcc.target/riscv/movsieq-zicond.c: New test.
* gcc.target/riscv/movsine-ventana.c: New test.
* gcc.target/riscv/movsine-zicond.c: New test.

OK
jeff


Re: [PATCH 20/44] RISC-V: Also accept constants for T-Head cond-move comparison operands

2023-11-19 Thread Jeff Law




On 11/18/23 22:39, Maciej W. Rozycki wrote:

There is no need for the requirement for conditional-move comparison
operands to be stricter for T-Head targets than for other targets and
limit them to registers only.  Constants will be reloaded if required
just as with branches or other-target conditional-move operations and
there is no extra overhead specific to the T-Head case.  This enables
more opportunities for a branchless sequence to be produced.

gcc/
* config/riscv/riscv.cc (riscv_expand_conditional_move): Also
accept constants for T-Head comparison operands.

OK
jeff


Re: [PATCH 21/44] RISC-V: Also accept constants for T-Head cond-move data input operands

2023-11-19 Thread Jeff Law




On 11/18/23 22:39, Maciej W. Rozycki wrote:

There is no need for the requirement for conditional-move data input
operands to be stricter for T-Head targets than for short forward branch
targets and limit them to registers only.  They are keyed according to
the `sfb_alu_operand' predicate, which lets certain constants through.
Such constants are already forced into a register for the `cons' operand
in the analogous short forward branch case and we can force them for the
`alt' operand and T-Head as well.  This enables more opportunities for a
branchless sequence to be produced.

gcc/
* config/riscv/riscv.cc (riscv_expand_conditional_move): Also
 accept constants for T-Head data input operands.

OK.

Jeff


Re: [PATCH 23/44] RISC-V/testsuite: Add branched cases for T-Head non-equality cond moves

2023-11-19 Thread Jeff Law




On 11/18/23 22:40, Maciej W. Rozycki wrote:

Verify, for T-Head targets and the non-equality integer conditional-move
operations, that if-conversion does *not* trigger at `-mbranch-cost=1'
setting, which makes original branched code sequences cheaper than their
branchless equivalents if-conversion would emit.

gcc/testsuite/
* gcc.target/riscv/movdibge-thead.c: New test.
* gcc.target/riscv/movdibgeu-thead.c: New test.
* gcc.target/riscv/movdibgt-thead.c: New test.
* gcc.target/riscv/movdibgtu-thead.c: New test.
* gcc.target/riscv/movdible-thead.c: New test.
* gcc.target/riscv/movdibleu-thead.c: New test.
* gcc.target/riscv/movdiblt-thead.c: New test.
* gcc.target/riscv/movdibltu-thead.c: New test.
* gcc.target/riscv/movsibge-thead.c: New test.
* gcc.target/riscv/movsibgeu-thead.c: New test.
* gcc.target/riscv/movsibgt-thead.c: New test.
* gcc.target/riscv/movsibgtu-thead.c: New test.
* gcc.target/riscv/movsible-thead.c: New test.
* gcc.target/riscv/movsibleu-thead.c: New test.
* gcc.target/riscv/movsiblt-thead.c: New test.
* gcc.target/riscv/movsibltu-thead.c: New test.

OK.
jeff


Re: [PATCH 24/44] RISC-V/testsuite: Add branchless cases for T-Head non-equality cond moves

2023-11-19 Thread Jeff Law




On 11/18/23 22:40, Maciej W. Rozycki wrote:

Verify, for T-Head targets and the non-equality integer conditional-move
operations, that if-conversion triggers via `noce_try_cmove' at
`-mbranch-cost=2' setting, which makes branchless code sequences
produced by if-conversion cheaper than their original branched
equivalents, and that extraneous instructions such as SNEZ, etc. are not
present in output.

gcc/testsuite/
* gcc.target/riscv/movdige-thead.c: New test.
* gcc.target/riscv/movdigeu-thead.c: New test.
* gcc.target/riscv/movdigt-thead.c: New test.
* gcc.target/riscv/movdigtu-thead.c: New test.
* gcc.target/riscv/movdile-thead.c: New test.
* gcc.target/riscv/movdileu-thead.c: New test.
* gcc.target/riscv/movdilt-thead.c: New test.
* gcc.target/riscv/movdiltu-thead.c: New test.
* gcc.target/riscv/movsige-thead.c: New test.
* gcc.target/riscv/movsigeu-thead.c: New test.
* gcc.target/riscv/movsigt-thead.c: New test.
* gcc.target/riscv/movsigtu-thead.c: New test.
* gcc.target/riscv/movsile-thead.c: New test.
* gcc.target/riscv/movsileu-thead.c: New test.
* gcc.target/riscv/movsilt-thead.c: New test.
* gcc.target/riscv/movsiltu-thead.c: New test.

OK
jeff


Re: [PATCH 25/44] RISC-V: Implement `riscv_emit_unary' helper

2023-11-19 Thread Jeff Law




On 11/18/23 22:40, Maciej W. Rozycki wrote:

Add a `riscv_emit_unary' helper for unary operations, complementing
`riscv_emit_binary'.

gcc/
* config/riscv/riscv-protos.h (riscv_emit_unary): New prototype.
* config/riscv/riscv.cc (riscv_emit_unary): New function.

OK
jeff


Re: [PATCH 26/44] RISC-V: Add `movMODEcc' implementation for generic targets

2023-11-19 Thread Jeff Law




On 11/18/23 22:40, Maciej W. Rozycki wrote:

Provide RTL expansion of conditional-move operations for generic targets
using a suitable sequence of base integer machine instructions according
to cost evaluation by if-conversion.  Add `-mmovcc' command line option
to enable this transformation, off by default.

For the generic sequences small immediates as per the `arith_operand'
predicate are cost-equivalent to registers as we can use them as input,
alternative to a register, to the respective AND[I] machine operations,
however we need to reject immediates fulfilling `lui_operand', because
they would require reloading into a register, making the operation more
costly.  Therefore add `movcc_operand' predicate and use it accordingly.

There is a need to adjust zbs-bext-02.c, which can also serve as emitted
code example, because with certain compilation options an AND operation
can now legitimately appear in output despite BEXT having been produced
as expected, such as with `-march=rv64gc -O2':

foo:
mv  a3,a0
li  a5,0
mv  a0,a1
li  a2,64
li  a1,1
.L3:
sll a4,a1,a5
and a4,a4,a3
addiw   a5,a5,1
beq a4,zero,.L2
addiw   a0,a0,1
.L2:
bne a5,a2,.L3
ret

vs `-march=rv64gc_zbs -O2':

foo:
mv  a4,a0
li  a5,0
mv  a0,a1
li  a3,64
.L3:
bext    a2,a4,a5
beq a2,zero,.L2
addiw   a0,a0,1
.L2:
addiw   a5,a5,1
bne a5,a3,.L3
ret

and then with `-march=rv64gc -mmovcc -mbranch-cost=7':

foo:
mv  a6,a0
li  a4,0
mv  a0,a1
li  a7,1
li  a1,64
.L3:
sll a5,a7,a4
and a5,a5,a6
snez    a5,a5
neg a5,a5
not a2,a5
addiw   a3,a0,1
and a5,a5,a3
and a0,a2,a0
addiw   a4,a4,1
or  a0,a5,a0
bne a4,a1,.L3
ret

vs `-march=rv64gc_zbs -mmovcc -mbranch-cost=7':

foo:
mv  a6,a0
li  a4,0
mv  a0,a1
li  a1,64
.L3:
bext    a5,a6,a4
neg a5,a5
not a2,a5
addiw   a3,a0,1
and a5,a5,a3
and a0,a2,a0
addiw   a4,a4,1
or  a0,a5,a0
bne a4,a1,.L3
ret

However BEXT is supposed to replace an SLL operation so adjust the test
case to reject SLL rather than AND, letting the test case pass even with
`/-mmovcc/-mbranch-cost=7' specified as DejaGNU test flags (and in the
absence of target-specific conditional-move operations enabled either by
default or with other test flags).
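
The generic branchless sequence in the `-mmovcc' listings above (SNEZ, NEG, NOT, AND, AND, OR) can be sketched in C roughly as follows.  The function names and the bit-counting loop are hypothetical reconstructions that merely have the same shape as the `foo' listings; they are not the source of zbs-bext-02.c.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless select: IF_TRUE when COND is non-zero, IF_FALSE
   otherwise, using only base integer operations.  */
uint64_t
select_branchless (uint64_t cond, uint64_t if_true, uint64_t if_false)
{
  uint64_t flag = cond != 0;   /* snez: 0 or 1 */
  uint64_t mask = 0 - flag;    /* neg: all zeros or all ones */
  /* not, and, and, or */
  return (mask & if_true) | (~mask & if_false);
}

/* Branched reference: add one to START for every set bit of W.  */
uint64_t
count_bits_branched (uint64_t w, uint64_t start)
{
  uint64_t count = start;
  for (unsigned i = 0; i < 64; i++)
    if (w & ((uint64_t) 1 << i))
      count++;
  return count;
}

/* The same loop with the conditional increment expressed as a
   branchless select.  */
uint64_t
count_bits_branchless (uint64_t w, uint64_t start)
{
  uint64_t count = start;
  for (unsigned i = 0; i < 64; i++)
    count = select_branchless (w & ((uint64_t) 1 << i), count + 1, count);
  return count;
}
```

With Zbs the bit test already produces 0 or 1, which is presumably why the SNEZ step disappears from the last listing while the NEG and the masking steps remain.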

gcc/
* config/riscv/predicates.md (movcc_operand): New predicate.
* config/riscv/riscv.cc (riscv_expand_conditional_move): Handle
generic targets.
* config/riscv/riscv.md (movcc): Likewise.
* config/riscv/riscv.opt (mmovcc): New option.
* doc/invoke.texi (Option Summary): Document it.

gcc/testsuite/
* gcc.target/riscv/zbs-bext-02.c: Adjust to reject SLL rather
than AND.

OK.  Just curious are y'all seeing significant interest in this case
from customers or is this more a case of rounding out the implementation
to cover all potential possibilities?




Jeff


Re: [PATCH 27/44] RISC-V/testsuite: Add branched cases for generic integer cond moves

2023-11-19 Thread Jeff Law




On 11/18/23 22:40, Maciej W. Rozycki wrote:

Verify, for generic integer conditional-move operations, if-conversion
*not* to trigger at the respective sufficiently low `-mbranch-cost='
settings that make original branched code sequences cheaper than their
branchless equivalents if-conversion would emit.  Cover all integer
relational operations to make sure no corner case escapes.

gcc/testsuite/
* gcc.target/riscv/movdibeq.c: New test.
* gcc.target/riscv/movdibge.c: New test.
* gcc.target/riscv/movdibgeu.c: New test.
* gcc.target/riscv/movdibgt.c: New test.
* gcc.target/riscv/movdibgtu.c: New test.
* gcc.target/riscv/movdible.c: New test.
* gcc.target/riscv/movdibleu.c: New test.
* gcc.target/riscv/movdiblt.c: New test.
* gcc.target/riscv/movdibltu.c: New test.
* gcc.target/riscv/movdibne.c: New test.
* gcc.target/riscv/movsibeq.c: New test.
* gcc.target/riscv/movsibge.c: New test.
* gcc.target/riscv/movsibgeu.c: New test.
* gcc.target/riscv/movsibgt.c: New test.
* gcc.target/riscv/movsibgtu.c: New test.
* gcc.target/riscv/movsible.c: New test.
* gcc.target/riscv/movsibleu.c: New test.
* gcc.target/riscv/movsiblt.c: New test.
* gcc.target/riscv/movsibltu.c: New test.
* gcc.target/riscv/movsibne.c: New test.

OK
jeff


Re: [PATCH 28/44] RISC-V/testsuite: Add branchless cases for generic integer cond moves

2023-11-19 Thread Jeff Law




On 11/18/23 22:41, Maciej W. Rozycki wrote:

Verify, for generic integer conditional-move operations, if-conversion
to trigger via `noce_try_cmove' at the respective sufficiently high
`-mbranch-cost=' settings that make branchless code sequences produced
by if-conversion cheaper than their original branched equivalents, and,
where applicable, that extraneous instructions such as SNEZ, etc. are
not present in output.  Cover all integer relational operations to make
sure no corner case escapes.

gcc/testsuite/
* gcc.target/riscv/movdieq.c: New test.
* gcc.target/riscv/movdige.c: New test.
* gcc.target/riscv/movdigeu.c: New test.
* gcc.target/riscv/movdigt.c: New test.
* gcc.target/riscv/movdigtu.c: New test.
* gcc.target/riscv/movdile.c: New test.
* gcc.target/riscv/movdileu.c: New test.
* gcc.target/riscv/movdilt.c: New test.
* gcc.target/riscv/movdiltu.c: New test.
* gcc.target/riscv/movdine.c: New test.
* gcc.target/riscv/movsieq.c: New test.
* gcc.target/riscv/movsige.c: New test.
* gcc.target/riscv/movsigeu.c: New test.
* gcc.target/riscv/movsigt.c: New test.
* gcc.target/riscv/movsigtu.c: New test.
* gcc.target/riscv/movsile.c: New test.
* gcc.target/riscv/movsileu.c: New test.
* gcc.target/riscv/movsilt.c: New test.
* gcc.target/riscv/movsiltu.c: New test.
* gcc.target/riscv/movsine.c: New test.

OK
jeff


Re: [PATCH 29/44] RISC-V: Add `addMODEcc' implementation for generic targets

2023-11-19 Thread Jeff Law




On 11/18/23 22:41, Maciej W. Rozycki wrote:

Provide RTL expansion of conditional-add operations for generic targets
using a suitable sequence of base integer machine instructions according
to cost evaluation by if-conversion.  Use existing `-mmovcc' command
line option to enable this transformation.

gcc/
* config/riscv/riscv.md (addcc): New expander.

Is this an improvement over what if-convert creates for a conditional
add or is the goal to expose the sequence earlier in the pipeline rather
than waiting for ifcvt?


Either way this is fine, just questioning slightly if it really improves
things.  I don't see any way it'd be hurting.
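
For reference, a conditional add can be written without a branch using base integer operations along the following lines.  This is a common idiom shown purely as an illustration; the thread does not quote the sequence the new expander emits, so the code below is an assumption and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Branched form: add ADDEND to BASE only when COND is non-zero.  */
uint64_t
cond_add_branched (uint64_t cond, uint64_t base, uint64_t addend)
{
  if (cond != 0)
    base += addend;
  return base;
}

/* Branchless form: turn the condition into an all-ones or all-zeros
   mask, mask the addend with it and add unconditionally.  */
uint64_t
cond_add_branchless (uint64_t cond, uint64_t base, uint64_t addend)
{
  uint64_t mask = 0 - (uint64_t) (cond != 0);
  return base + (mask & addend);
}
```

Compared with a full conditional move, only one masking operation is needed because adding zero leaves the base value unchanged.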


jeff


Re: [PATCH 30/44] RISC-V/testsuite: Add branched cases for generic integer cond adds

2023-11-19 Thread Jeff Law




On 11/18/23 22:41, Maciej W. Rozycki wrote:

Verify, for generic integer conditional-add operations, if-conversion
*not* to trigger at the respective sufficiently low `-mbranch-cost='
settings that make original branched code sequences cheaper than their
branchless equivalents if-conversion would emit.  Cover all integer
relational operations to make sure no corner case escapes.

gcc/testsuite/
* gcc.target/riscv/adddibeq.c: New test.
* gcc.target/riscv/adddibge.c: New test.
* gcc.target/riscv/adddibgeu.c: New test.
* gcc.target/riscv/adddibgt.c: New test.
* gcc.target/riscv/adddibgtu.c: New test.
* gcc.target/riscv/adddible.c: New test.
* gcc.target/riscv/adddibleu.c: New test.
* gcc.target/riscv/adddiblt.c: New test.
* gcc.target/riscv/adddibltu.c: New test.
* gcc.target/riscv/adddibne.c: New test.
* gcc.target/riscv/addsibeq.c: New test.
* gcc.target/riscv/addsibge.c: New test.
* gcc.target/riscv/addsibgeu.c: New test.
* gcc.target/riscv/addsibgt.c: New test.
* gcc.target/riscv/addsibgtu.c: New test.
* gcc.target/riscv/addsible.c: New test.
* gcc.target/riscv/addsibleu.c: New test.
* gcc.target/riscv/addsiblt.c: New test.
* gcc.target/riscv/addsibltu.c: New test.
* gcc.target/riscv/addsibne.c: New test.

OK
jeff


Re: [PATCH 31/44] RISC-V/testsuite: Add branchless cases for generic integer cond adds

2023-11-19 Thread Jeff Law




On 11/18/23 22:41, Maciej W. Rozycki wrote:

Verify, for generic integer conditional-add operations, if-conversion
to trigger via `noce_try_addcc' at the respective sufficiently high
`-mbranch-cost=' settings that make branchless code sequences produced
by if-conversion cheaper than their original branched equivalents, and,
where applicable, that extraneous instructions such as SNEZ, etc. are
not present in output.  Cover all integer relational operations to make
sure no corner case escapes.

The reason to XFAIL SImode tests for RV64 targets is the compiler thinks
it has to sign-extend addends, which causes if-conversion to give up.

WRT extension and causing if-conversion to give up.  Yes, it's a real
issue.  In fact when we had Jivan do some analysis work on missed 
if-conversions, better handling of 32bit operations on rv64 was the 
biggest class of missed cases.


We've got a bit of internal code to address that.  But I've been having 
trouble finding the time to clean it up enough to post.




gcc/testsuite/
* gcc.target/riscv/adddieq.c: New test.
* gcc.target/riscv/adddige.c: New test.
* gcc.target/riscv/adddigeu.c: New test.
* gcc.target/riscv/adddigt.c: New test.
* gcc.target/riscv/adddigtu.c: New test.
* gcc.target/riscv/adddile.c: New test.
* gcc.target/riscv/adddileu.c: New test.
* gcc.target/riscv/adddilt.c: New test.
* gcc.target/riscv/adddiltu.c: New test.
* gcc.target/riscv/adddine.c: New test.
* gcc.target/riscv/addsieq.c: New test.
* gcc.target/riscv/addsige.c: New test.
* gcc.target/riscv/addsigeu.c: New test.
* gcc.target/riscv/addsigt.c: New test.
* gcc.target/riscv/addsigtu.c: New test.
* gcc.target/riscv/addsile.c: New test.
* gcc.target/riscv/addsileu.c: New test.
* gcc.target/riscv/addsilt.c: New test.
* gcc.target/riscv/addsiltu.c: New test.
* gcc.target/riscv/addsine.c: New test.

OK
jeff



Re: [PATCH 32/44] RISC-V: Only use SUBREG if applicable in `riscv_expand_float_scc'

2023-11-19 Thread Jeff Law




On 11/18/23 22:42, Maciej W. Rozycki wrote:

A subsequent change to enable the processing of conditional moves on a
floating-point condition by `riscv_expand_conditional_move' will cause
`riscv_expand_float_scc' to be called for word-mode target RTX with RV64
targets.  In that case an invalid insn such as:

(insn 25 24 0 (set (reg:DI 141)
 (subreg:SI (reg:DI 143) 0)) -1
  (nil))

would be produced, which would crash the compiler later on.  Since the
output operand of the SET operation to be produced already has the same
mode as the input operand does, just omit the use of SUBREG and assign
directly.

gcc/
* config/riscv/riscv.cc (riscv_expand_float_scc): Suppress the
use of SUBREG if the conditional-set target is word-mode.

OK
jeff


Re: [PATCH 33/44] RISC-V: Also allow FP conditions in `riscv_expand_conditional_move'

2023-11-19 Thread Jeff Law




On 11/18/23 22:42, Maciej W. Rozycki wrote:

In `riscv_expand_conditional_move' we only let integer conditions
through at the moment, even though code has already been prepared to
handle floating-point conditions as well.

Lift this restriction and only bail out if a non-word-mode integer
condition has been requested, as we cannot handle this specific case
owing to machine instruction set restriction.  We already take care of
the non-integer, non-floating-point case later on.

gcc/
* config/riscv/riscv.cc (riscv_expand_conditional_move): Don't
bail out in floating-point conditions.

I probably goof'd something when merging up the eswin, vrull and ventana
changes in this space.  I recall fixing multiple bugs in handling FP 
conditional moves when building/testing spec2017 internally -- so this 
was supposed to be working.



OK for the trunk.

Jeff


Re: [PATCH 22/44] RISC-V: Fold all the cond-move variants together

2023-11-19 Thread Jeff Law




On 11/18/23 22:40, Maciej W. Rozycki wrote:

Code in `riscv_expand_conditional_move' for Ventana and Zicond targets
seems like bolted on as an afterthought rather than properly merged so
as to handle all the cases together.

You could characterize it that way.  It was mostly a desire to not muck
up any of the thead or sfb code which I know next to nothing about.  I 
think with the improvements in the testsuite from your series it's a lot 
more feasible to unify the implementations and be sure we haven't broken 
something along the way.





Fold the existing code pieces together then (observing that for short
forward branch targets no integer comparisons need to be canonicalized),
letting T-Head targets produce branchless sequences for all the integer
comparisons rather than for equality ones only, and preparing for the
handling of floating-point comparisons here across all conditional-move
targets.

gcc/
* config/riscv/riscv.cc (riscv_expand_conditional_move): Unify
conditional-move handling across all the relevant targets.

OK

jeff


Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-11-19 Thread waffl3x
On Sunday, November 19th, 2023 at 9:31 AM, Jason Merrill wrote:


>
>
> On 11/19/23 08:39, waffl3x wrote:
>
> > Funny enough I ended up removing the ones I was thinking about, seems
> > to always happen when I ask style questions but I'm glad to hear it's
> > okay going forward.
> >
> > I'm having trouble fixing this bug, based on what Gasper said in
> > PR102609 I am pretty sure I know what the semantics should be. Since
> > the capture is not used in the body of the function, it should be well
> > formed to call the function with an unrelated type.
>
>
> I don't think so: https://eel.is/c++draft/expr.prim#lambda.closure-5
> says the type of the xobj parameter must be related.

Well, thanks for bringing that to my attention, that makes things
easier but I'm kinda disappointed, I almost wanted an excuse to write
more code. I wonder why Gasper thought this wasn't the case, perhaps it
was a later decision?

Okay, I checked, that paragraph is in the original paper, I thought I
could just take what one of the paper's authors said for granted but I
guess not, I'm not going to make that mistake again. On the other hand,
maybe he just misunderstood my question, what can you do.

Well regardless, that reduces the things I have left to do by a whole
lot.

> > I had begun trying to tackle the case that Gasper mentioned and got the
> > following ICE. I also have another case that ICEs so I've been thinking
> > I don't get to do little changes to fix this. I've been looking at this
> > for a few hours now and given we are past the deadline now I figured I
> > should see what others think.
> >
> > int main()
> > {
> > int x = 42;
> > auto f1 = [x](this auto&& self) {};
> >
> > static_cast(decltype(f1)::operator());
> > }
>
>
> This should be rejected when we try to instantiate the op() with int.

Yep, absolutely, that is clear now.

Just for the record, clang accepts it, but since I can't think of any
use cases for this I don't think there's any value in supporting it.

>
> > As I said, there is also this case that also ICEs in the same region.
> > It's making me think that some core assumptions are being violated in
> > the code leading up to finish_non_static_data_member.
> >
> > int main()
> > {
> > int x = 42;
> > auto f1 = [x](this auto self) {};
> > }
>
>
> Here I think the problem is in build_capture_proxy:
>
> > /* The proxy variable forwards to the capture field. */
> > object = build_fold_indirect_ref (DECL_ARGUMENTS (fn));
> > object = finish_non_static_data_member (member, object, NULL_TREE);
>
>
> The call to build_fold_indirect_ref assumes that 'this' is a pointer,
> which it is not here. I think you can just make that conditional on it
> being a pointer or reference?
>
> Jason

Thanks, I will take a look at that area.

I'm having trouble fixing the error for this case, the control flow
when the functions are overloaded is much more complex.

struct S {
  void f(this S&) {}
  void f(this S&, int) {}

  void g() {
    void (*fp)(S&) = &f;
  }
};

This seemed to have fixed the non overloaded case, but I'm also not
very happy with it, it feels kind of icky. Especially since the expr's
location isn't available here, although, it just occurred to me that
the expr's location is probably stored in the node.

typeck.cc:cp_build_addr_expr_1
```
case BASELINK:
  arg = BASELINK_FUNCTIONS (arg);
  if (DECL_XOBJ_MEMBER_FUNC_P (
{
  error ("You must qualify taking address of xobj member functions");
  return error_mark_node;
}
```

Anyway, I'm quite tired but I'll try to finish off the lambda stuff before
calling it, then I'll run a bootstrap and tests and if all is well I
will submit the patch. I will probably skimp on the changelog and
commit message as that's the part I have the hardest time on,
especially when I'm tired.

Alex


Re: [PATCH 09/44] RISC-V: Rework branch costing model for if-conversion

2023-11-19 Thread Jeff Law




On 11/18/23 22:36, Maciej W. Rozycki wrote:

The generic branch costing model for if-conversion assumes a fixed cost
of COSTS_N_INSNS (2) for a conditional branch, and that one half of that
cost comes from a preceding condition-set instruction, such as with
MODE_CC targets, and then the other half of that cost is for the actual
branch instruction.  This is hardcoded for `if_info.original_cost' in
`noce_find_if_block' and regardless of the cost set for branches via
BRANCH_COST.

Then `default_max_noce_ifcvt_seq_cost' instructs if-conversion to prefer
a branchless sequence costing as much as triple the BRANCH_COST value
set.  This is apparently to make up for the inability to accurately
guess the branch penalty.

Consequently for the BRANCH_COST of 3 we commonly set for tuning,
if-conversion will consider branchless sequences costing 3 * 3 - 2 = 7
instruction units more than a corresponding branch sequence.  For the
BRANCH_COST of 4 such as with `sifive-7-series' tuning this is even
worse, at 3 * 4 - 2 = 10.  Effectively it means a branchless sequence
will always be chosen if available, even a very inefficient one.
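
The arithmetic above can be reproduced with a small sketch.  It works in
whole instruction units and assumes the generic model exactly as described:
a threshold of three times BRANCH_COST for the branchless sequence and a
fixed cost of two for the original branch.  The name `noce_slack_insns' is
made up for illustration and is not a GCC function.

```cpp
// Sketch only: models the generic if-conversion costing described above in
// whole instruction units.  `noce_slack_insns' is a hypothetical helper.

// How many instruction units more than the branched sequence a branchless
// replacement may cost and still be accepted.
constexpr int
noce_slack_insns (int branch_cost)
{
  const int max_seq_cost = 3 * branch_cost;  // triple the BRANCH_COST value
  const int original_cost = 2;               // fixed COSTS_N_INSNS (2) branch cost
  return max_seq_cost - original_cost;
}

static_assert (noce_slack_insns (3) == 7, "BRANCH_COST of 3");
static_assert (noce_slack_insns (4) == 10, "BRANCH_COST of 4");
```

The slack grows by three units for every extra unit of BRANCH_COST, which is
why higher tuning values make almost any branchless sequence acceptable.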

Rework the branch costing model to better match our architecture,
observing in particular that we have no preparatory instructions for
branches so that the cost of a branch is naked BRANCH_COST plus any
extra overhead the processing of a branch's source RTX might incur.

Provide TARGET_INSN_COST and TARGET_MAX_NOCE_IFCVT_SEQ_COST handlers
that return a suitable cost based on BRANCH_COST.  The latter hook
usually returns a value that is lower than the cost of the corresponding
branched sequence.  This is because we don't really want to produce a
branchless sequence that is more expensive than the original branched
sequence.  If this turns out too conservative for some corner case, then
this choice might be revisited.

Then we don't want to fiddle with `noce_find_if_block' without a lot of
cross-target verification, so add TARGET_NOCE_CONVERSION_PROFITABLE_P
defined such that it subtracts the fixed COSTS_N_INSNS (2) cost from the
cost of the original branched sequence supplied and instead adds actual
branch cost calculated from the conditional branch instruction used.  It
is then further tweaked according to simple analysis of the replacement
branchless sequence produced so as to cancel the cost of an extraneous
zero extend operation produced by `noce_try_store_flag_mask' as observed
with gcc/testsuite/gcc.target/riscv/pr105314.c.

Tweak the testsuite accordingly and set `-mbranch-cost=' explicitly for
the relevant cases so that the expected if-conversion transformation is
made regardless of the default BRANCH_COST value of tuning in effect.
Some of these settings will be lowered later on as deficiencies in
branchless sequence generation have been fixed that lower their cost
calculated by if-conversion.

As I suspect you know, a big part of the problem here is that BRANCH_COST
and rtx_cost don't have any common scale and thus trying to compare
BRANCH_COST to RTX_COST doesn't have a well-defined meaning.


That hasn't kept us from trying to do precisely that and the result has 
always been less than satisfactory.  You're introducing more, but I 
don't think there's a reasonable way out of this mess at this point.





gcc/
* config/riscv/riscv.cc (riscv_insn_cost): New function.
(riscv_max_noce_ifcvt_seq_cost): Likewise.
(riscv_noce_conversion_profitable_p): Likewise.
(TARGET_INSN_COST): New macro.
(TARGET_MAX_NOCE_IFCVT_SEQ_COST): New macro.
(TARGET_NOCE_CONVERSION_PROFITABLE_P): New macro.

gcc/testsuite/
* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_imm.c:
Explicitly set the branch cost.
* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_reg.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_reg_reg.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_imm.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_reg.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_reg_reg.c:
Likewise.
---
FWIW I don't understand why the test cases absolutely HAD to have such
overlong names guaranteed to exceed our 80 column limit in any context.
It's such a pain to handle.

I dislike the long names as well.  I nearly changed them myself as part
of the eswin submission, but that seemed a bit gratuitous to me so I
left them as-is.


If you wanted to rename them, be my guest, consider it pre-approved ;-)

WRT the extraneous zero-extension: isn't that arguably a bug in the scc
expander for risc-v?  Fixing that isn't a prerequisite here, but it's
probably worth a bit of someone's time.


OK for the trunk.

jeff


[committed] Fix missing mode on a few unspec/unspec_volatile operands

2023-11-19 Thread Jeff Law
This is a fix for a minor problem Jivan and I found while testing the
ext-dce work originally from Joern.


The ext-dce pass will transform zero/sign extensions into subreg 
accesses when the upper bits are actually unused.  So it's more likely 
with the ext-dce work to get a sequence like this prior to combine:






(insn 10 9 11 2 (set (reg:SI 144)
(unspec_volatile [
(const_int 0 [0])
] UNSPECV_FRFLAGS)) "j.c":11:3 discrim 1 362 {riscv_frflags}
 (nil))
(insn 11 10 55 2 (set (reg:DI 140 [ _12 ])
(subreg:DI (reg:SI 144) 0)) "j.c":11:3 discrim 1 206 {*movdi_64bit}
 (expr_list:REG_DEAD (reg:SI 144)
(nil))) 


When we try to combine insn 10->11 we'll ultimately call simplify_subreg 
with something like


(subreg:DI (unspec_volatile [...]) 0)

Note the lack of a mode on the unspec_volatile.  That in turn will cause 
simplify_subreg to trigger an assertion.


The modeless unspec is generated by the RISC-V backend and the more I've 
pondered this issue over the last few days the more I'm convinced it's a 
backend bug.  Basically if the LHS of the set has a mode, then the RHS 
of the set should have a mode as well.


I've audited the various backends and only found a few problems, which
are fixed by this patch.  I've tested the relevant ports in my tester:
c6x, sh, mips and s390[x].


There are other patterns that are potentially problematical in various 
ports.  They have a REG destination and an UNSPEC source, but the REG 
has no mode in the pattern.  Since it wasn't clear what mode to give the 
UNSPEC, I left those alone.


Pushing to the trunk.

jeff
commit 07da9b7f13c92a21d12172a9df85ad762591b998
Author: Jeff Law 
Date:   Sun Nov 19 11:56:57 2023 -0700

[committed] Fix missing mode on a few unspec/unspec_volatile operands

This is a fix for a minor problem Jivan and I found while testing the ext-dce
work originally from Joern.

The ext-dce pass will transform zero/sign extensions into subreg accesses when
the upper bits are actually unused.  So it's more likely with the ext-dce work
to get a sequence like this prior to combine:

> (insn 10 9 11 2 (set (reg:SI 144)
> (unspec_volatile [
> (const_int 0 [0])
> ] UNSPECV_FRFLAGS)) "j.c":11:3 discrim 1 362 {riscv_frflags}
>  (nil))
> (insn 11 10 55 2 (set (reg:DI 140 [ _12 ])
> (subreg:DI (reg:SI 144) 0)) "j.c":11:3 discrim 1 206 {*movdi_64bit}
>  (expr_list:REG_DEAD (reg:SI 144)
> (nil)))

When we try to combine insn 10->11 we'll ultimately call simplify_subreg with
something like

(subreg:DI (unspec_volatile [...]) 0)

Note the lack of a mode on the unspec_volatile.  That in turn will cause
simplify_subreg to trigger an assertion.

The modeless unspec is generated by the RISC-V backend and the more I've
pondered this issue over the last few days the more I'm convinced it's a
backend bug.  Basically if the LHS of the set has a mode, then the RHS of the
set should have a mode as well.

I've audited the various backends and only found a few problems, which are
fixed by this patch.  I've tested the relevant ports in my tester: c6x, sh,
mips and s390[x].

There are other patterns that are potentially problematical in various ports.
They have a REG destination and an UNSPEC source, but the REG has no mode in
the pattern.  Since it wasn't clear what mode to give the UNSPEC, I left those
alone.

gcc/

* config/c6x/c6x.md (mvilc): Add mode to UNSPEC source.
* config/mips/mips.md (rdhwr_synci_step_): Likewise.
* config/riscv/riscv.md (riscv_frcsr, riscv_frflags): Likewise.
* config/s390/s390.md (@split_stack_call): Likewise.
(@split_stack_cond_call): Likewise.
* config/sh/sh.md (sp_switch_1): Likewise.

diff --git a/gcc/config/c6x/c6x.md b/gcc/config/c6x/c6x.md
index 88b9291ae23..906fdb34a82 100644
--- a/gcc/config/c6x/c6x.md
+++ b/gcc/config/c6x/c6x.md
@@ -1440,7 +1440,7 @@ (define_expand "doloop_end"
 
 (define_insn "mvilc"
   [(set (reg:SI REG_ILC)
-   (unspec [(match_operand:SI 0 "register_operand" "a,b")] UNSPEC_MVILC))]
+   (unspec:SI [(match_operand:SI 0 "register_operand" "a,b")] UNSPEC_MVILC))]
   "TARGET_INSNS_64PLUS"
   "%|%.\\tmvc\\t%$\\t%0, ILC"
   [(set_attr "predicable" "no")
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index a25454783fb..0666310734e 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -5732,7 +5732,7 @@ (define_insn "synci"
 
 (define_insn "rdhwr_synci_step_"
   [(set (match_operand:P 0 "register_operand" "=d")
-(unspec_volatile [(const_int 1)]
+(unspec_volatile:P [(const_int 1)]
 UNSPEC_RDHWR))]
   "ISA_HAS_SYNCI"
   "rdhwr\t%0,$1")
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv

Re: [PATCH v2 1/3] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

2023-11-19 Thread Xi Ruoyao
On Sun, 2023-11-19 at 22:30 +0800, Xi Ruoyao wrote:
> +;; All these are controlled by -ffp-int-builtin-inexact.
> +(define_int_attr simd_frint_pattern
> +  [(UNSPEC_SIMD_FRINTRP  "ceil")
> +   (UNSPEC_SIMD_FRINTRZ  "btrunc")
> +   (UNSPEC_SIMD_FRINT"nearbyint")
> +   (UNSPEC_SIMD_FRINTRNE "roundeven")
> +   (UNSPEC_SIMD_FRINTRM  "floor")])

This is wrong.  nearbyint is not allowed to raise FE_INEXACT even with
-ffp-int-builtin-inexact.  Please abandon this series and I'll send V3.
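
For reference, the requirement can be observed from user code.  The
following is a minimal sketch, assuming a C library that follows the C
standard's rule that nearbyint does not raise the "inexact" exception; the
function name is made up for illustration.

```cpp
#include <cfenv>
#include <cmath>

// Sketch: report whether std::nearbyint raised FE_INEXACT for the given
// value.  `nearbyint_raises_inexact' is a hypothetical helper name.
bool
nearbyint_raises_inexact (double value)
{
  volatile double input = value;   // volatile keeps the call from being folded
  std::feclearexcept (FE_INEXACT);
  volatile double rounded = std::nearbyint (input);
  (void) rounded;
  return std::fetestexcept (FE_INEXACT) != 0;
}
```

Calling it with a non-integral value such as 2.5 is expected to return false
on a conforming implementation, which is why a pattern that may raise the
exception cannot be used to expand nearbyint.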

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH 34/44] RISC-V: Provide FP conditional-branch instructions for if-conversion

2023-11-19 Thread Jeff Law




On 11/18/23 22:42, Maciej W. Rozycki wrote:

Do not expand floating-point conditional-branch RTL instructions right
away that use a comparison operation that is either directly available
as a machine conditional-set instruction or is NE, which can be emulated
by EQ.  This is so that if-conversion sees them in their original form
and can produce fewer operations tried in a branchless code sequence
compared to when such an instruction has been already converted to a
sequence of a floating-point conditional-set RTL instruction followed by
an integer conditional-branch RTL instruction.  Split any floating-point
conditional-branch RTL instructions still remaining after reload then.

Adjust the testsuite accordingly: since the middle end uses the inverse
condition internally, an inverse conditional-set instruction may make it
to assembly output and also `cond_move_process_if_block' will be used by
if-conversion rather than `noce_process_if_block', because the latter
function has not yet been updated to handle inverted conditions.

gcc/
* config/riscv/predicates.md (ne_operator): New predicate.
* config/riscv/riscv.cc (riscv_insn_cost): Handle branches on a
floating-point condition.
* config/riscv/riscv.md (@cbranch4): Rename expander to...
(@cbranch4): ... this.  Only expand the RTX via
`riscv_expand_conditional_branch' for `!signed_order_operator'
operators, otherwise let it through.
(*cbranch4, *cbranch4): New insns and
splitters.

gcc/testsuite/
* gcc.target/riscv/movdifge-sfb.c: Reject "if-conversion
succeeded through" rather than accepting it.
* gcc.target/riscv/movdifge-thead.c: Likewise.
* gcc.target/riscv/movdifge-ventana.c: Likewise.
* gcc.target/riscv/movdifge-zicond.c: Likewise.
* gcc.target/riscv/movdifgt-sfb.c: Likewise.
* gcc.target/riscv/movdifgt-thead.c: Likewise.
* gcc.target/riscv/movdifgt-ventana.c: Likewise.
* gcc.target/riscv/movdifgt-zicond.c: Likewise.
* gcc.target/riscv/movdifle-sfb.c: Likewise.
* gcc.target/riscv/movdifle-thead.c: Likewise.
* gcc.target/riscv/movdifle-ventana.c: Likewise.
* gcc.target/riscv/movdifle-zicond.c: Likewise.
* gcc.target/riscv/movdiflt-sfb.c: Likewise.
* gcc.target/riscv/movdiflt-thead.c: Likewise.
* gcc.target/riscv/movdiflt-ventana.c: Likewise.
* gcc.target/riscv/movdiflt-zicond.c: Likewise.
* gcc.target/riscv/movsifge-sfb.c: Likewise.
* gcc.target/riscv/movsifge-thead.c: Likewise.
* gcc.target/riscv/movsifge-ventana.c: Likewise.
* gcc.target/riscv/movsifge-zicond.c: Likewise.
* gcc.target/riscv/movsifgt-sfb.c: Likewise.
* gcc.target/riscv/movsifgt-thead.c: Likewise.
* gcc.target/riscv/movsifgt-ventana.c: Likewise.
* gcc.target/riscv/movsifgt-zicond.c: Likewise.
* gcc.target/riscv/movsifle-sfb.c: Likewise.
* gcc.target/riscv/movsifle-thead.c: Likewise.
* gcc.target/riscv/movsifle-ventana.c: Likewise.
* gcc.target/riscv/movsifle-zicond.c: Likewise.
* gcc.target/riscv/movsiflt-sfb.c: Likewise.
* gcc.target/riscv/movsiflt-thead.c: Likewise.
* gcc.target/riscv/movsiflt-ventana.c: Likewise.
* gcc.target/riscv/movsiflt-zicond.c: Likewise.
* gcc.target/riscv/smax-ieee.c: Also accept FLT.D.
* gcc.target/riscv/smaxf-ieee.c: Also accept FLT.S.
* gcc.target/riscv/smin-ieee.c: Also accept FGT.D.
* gcc.target/riscv/sminf-ieee.c: Also accept FGT.S.

So this is a more gradual lowering of the FP branches to allow ifcvt to 
do a better job.  Seems generally reasonable.  I don't expect that we're 
missing any significant simplifications, though I probably could 
construct a missed CSE/GCSE if I worked at it for a bit.


Presumably the length computation can't be handled by the generic code 
we've already got in place?


OK for the trunk.

jeff


Re: [PATCH 35/44] RISC-V: Avoid extraneous integer comparison for FP comparisons

2023-11-19 Thread Jeff Law




On 11/18/23 22:42, Maciej W. Rozycki wrote:

We have floating-point conditional-set machine instructions for a subset
of FP comparisons, so avoid going through a comparison against constant
zero in `riscv_expand_float_scc' where not necessary, preventing an
extraneous RTL instruction from being produced that counts against the
cost of the replacement branchless code sequence in if-conversion, e.g.:

(insn 29 6 30 2 (set (reg:DI 142)
 (ge:DI (reg/v:DF 135 [ w ])
 (reg/v:DF 136 [ x ]))) 297 {*cstoredfdi4}
  (nil))
(insn 30 29 31 2 (set (reg:DI 143)
 (ne:DI (reg:DI 142)
 (const_int 0 [0]))) 319 {*sne_zero_didi}
  (nil))
(insn 31 30 32 2 (set (reg:DI 141)
 (reg:DI 143)) 206 {*movdi_64bit}
  (nil))
(insn 32 31 33 2 (set (reg:DI 144)
 (neg:DI (reg:DI 141))) 15 {negdi2}
  (nil))
(insn 33 32 34 2 (set (reg:DI 145)
 (and:DI (reg:DI 144)
 (reg/v:DI 137 [ y ]))) 102 {*anddi3}
  (nil))
(insn 34 33 35 2 (set (reg:DI 146)
 (not:DI (reg:DI 144))) 111 {one_cmpldi2}
  (nil))
(insn 35 34 36 2 (set (reg:DI 147)
 (and:DI (reg:DI 146)
 (reg/v:DI 138 [ z ]))) 102 {*anddi3}
  (nil))
(insn 36 35 21 2 (set (reg/v:DI 138 [ z ])
 (ior:DI (reg:DI 145)
 (reg:DI 147))) 105 {iordi3}
  (nil))

where the second insn effectively just copies its input.  This now gets
simplified to:

(insn 29 6 30 2 (set (reg:DI 141)
 (ge:DI (reg/v:DF 135 [ w ])
 (reg/v:DF 136 [ x ]))) 297 {*cstoredfdi4}
  (nil))
(insn 30 29 31 2 (set (reg:DI 142)
 (neg:DI (reg:DI 141))) 15 {negdi2}
  (nil))
(insn 31 30 32 2 (set (reg:DI 143)
 (and:DI (reg:DI 142)
 (reg/v:DI 137 [ y ]))) 102 {*anddi3}
  (nil))
(insn 32 31 33 2 (set (reg:DI 144)
 (not:DI (reg:DI 142))) 111 {one_cmpldi2}
  (nil))
(insn 33 32 34 2 (set (reg:DI 145)
 (and:DI (reg:DI 144)
 (reg/v:DI 138 [ z ]))) 102 {*anddi3}
  (nil))
(insn 34 33 21 2 (set (reg/v:DI 138 [ z ])
 (ior:DI (reg:DI 143)
 (reg:DI 145))) 105 {iordi3}
  (nil))

lowering the cost of the code sequence produced (even though combine
would swallow the second insn anyway).
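
Read as source code, the simplified sequence is a mask-based select.  The
following sketch mirrors the six insns one for one; the function and
variable names are illustrative only and the comments name the patterns
from the dump above.

```cpp
#include <cstdint>

// Sketch of the simplified sequence: z = (w >= x) ? y : z without a branch.
// `select_ge' and its parameter names are illustrative only.
std::int64_t
select_ge (double w, double x, std::int64_t y, std::int64_t z)
{
  std::int64_t flag = (w >= x);     // cstoredfdi4: 0 or 1
  std::int64_t mask = -flag;        // negdi2: 0 or all ones
  std::int64_t keep_y = mask & y;   // anddi3
  std::int64_t inv = ~mask;         // one_cmpldi2
  std::int64_t keep_z = inv & z;    // anddi3
  return keep_y | keep_z;           // iordi3
}
```

The removed `sne_zero' step would sit between the first two statements and
only map 0 to 0 and 1 to 1, so dropping it does not change the result.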

We still need to produce a comparison against constant zero where the
instruction following a floating-point conditional-set operation is a
branch, so add canonicalization to `riscv_expand_conditional_branch'
instead.

gcc/
* config/riscv/riscv.cc (riscv_emit_float_compare) : Handle
separately.
: Return operands supplied as is.
(riscv_emit_binary): Call `riscv_emit_binary' directly rather
than going through a temporary register for word-mode targets.
(riscv_expand_conditional_branch): Canonicalize the comparison
if not against constant zero.

OK
jeff


Re: [PATCH 36/44] RISC-V/testsuite: Add branched cases for generic FP cond moves

2023-11-19 Thread Jeff Law




On 11/18/23 22:42, Maciej W. Rozycki wrote:

Verify, for generic floating-point conditional-move operations that have
a corresponding conditional-set machine instruction, that if-conversion
does *not* trigger at `-mbranch-cost=4' setting, which makes original
branched code sequences cheaper than their branchless equivalents
if-conversion would emit.  Cover all the relevant floating-point
relational operations to make sure no corner case escapes.

gcc/testsuite/
* gcc.target/riscv/movdibfge.c: New test.
* gcc.target/riscv/movdibfgt.c: New test.
* gcc.target/riscv/movdibfle.c: New test.
* gcc.target/riscv/movdibflt.c: New test.
* gcc.target/riscv/movdibfne.c: New test.
* gcc.target/riscv/movsibfge.c: New test.
* gcc.target/riscv/movsibfgt.c: New test.
* gcc.target/riscv/movsibfle.c: New test.
* gcc.target/riscv/movsibflt.c: New test.
* gcc.target/riscv/movsibfne.c: New test.

OK
jeff


Re: [PATCH 37/44] RISC-V/testsuite: Add branchless cases for generic FP cond moves

2023-11-19 Thread Jeff Law




On 11/18/23 22:43, Maciej W. Rozycki wrote:

Verify, for generic floating-point conditional-move operations that have
a corresponding conditional-set machine instruction, that if-conversion
triggers (via `cond_move_convert_if_block', which doesn't report) at
`-mbranch-cost=5' setting, which makes branchless code sequences emitted
by if-conversion cheaper than their original branched equivalents, and
that extraneous instructions such as SNEZ, etc. are not present in
output.

gcc/testsuite/
* gcc.target/riscv/movdifge.c: New test.
* gcc.target/riscv/movdifgt.c: New test.
* gcc.target/riscv/movdifle.c: New test.
* gcc.target/riscv/movdiflt.c: New test.
* gcc.target/riscv/movdifne.c: New test.
* gcc.target/riscv/movsifge.c: New test.
* gcc.target/riscv/movsifgt.c: New test.
* gcc.target/riscv/movsifle.c: New test.
* gcc.target/riscv/movsiflt.c: New test.
* gcc.target/riscv/movsifne.c: New test.

OK
jeff


Re: [PATCH 38/44] RISC-V/testsuite: Add branched cases for generic FP cond adds

2023-11-19 Thread Jeff Law




On 11/18/23 22:43, Maciej W. Rozycki wrote:

Verify, for generic floating-point conditional-add operations that have
a corresponding conditional-set machine instruction, that if-conversion
does *not* trigger at `-mbranch-cost=2' setting, which makes original
branched code sequences cheaper than their branchless equivalents
if-conversion would emit.  Cover all the relevant floating-point
relational operations to make sure no corner case escapes.

gcc/testsuite/
* gcc.target/riscv/adddibfeq.c: New test.
* gcc.target/riscv/adddibfge.c: New test.
* gcc.target/riscv/adddibfgt.c: New test.
* gcc.target/riscv/adddibfle.c: New test.
* gcc.target/riscv/adddibflt.c: New test.
* gcc.target/riscv/addsibfeq.c: New test.
* gcc.target/riscv/addsibfge.c: New test.
* gcc.target/riscv/addsibfgt.c: New test.
* gcc.target/riscv/addsibfle.c: New test.
* gcc.target/riscv/addsibflt.c: New test.

OK
jeff


Re: [PATCH 39/44] RISC-V/testsuite: Add branchless cases for generic FP cond adds

2023-11-19 Thread Jeff Law




On 11/18/23 22:43, Maciej W. Rozycki wrote:

Verify, for generic floating-point conditional-add operations that have
a corresponding conditional-set machine instruction, that if-conversion
triggers via `noce_try_addcc' at `-mbranch-cost=3' setting, which makes
branchless code sequences emitted by if-conversion cheaper than their
original branched equivalents, and that extraneous instructions such as
SNEZ, etc. are not present in output.

The reason to XFAIL SImode tests for RV64 targets is the compiler thinks
it has to sign-extend addends, which causes if-conversion to give up.

gcc/testsuite/
* gcc.target/riscv/adddifeq.c: New test.
* gcc.target/riscv/adddifge.c: New test.
* gcc.target/riscv/adddifgt.c: New test.
* gcc.target/riscv/adddifle.c: New test.
* gcc.target/riscv/adddiflt.c: New test.
* gcc.target/riscv/addsifeq.c: New test.
* gcc.target/riscv/addsifge.c: New test.
* gcc.target/riscv/addsifgt.c: New test.
* gcc.target/riscv/addsifle.c: New test.
* gcc.target/riscv/addsiflt.c: New test.

OK
jeff


Re: building GNU gettext on AIX

2023-11-19 Thread Bruno Haible
I wrote in
:
> > The latest issue is that a few files in gettext ignore --disable-pthreads
> > and creates a dependency on pthread_mutex.
> ...
>   * If no, then the simple solution would be to pass the configure option
>   --enable-threads=isoc
> This should not introduce a link dependency, because the mtx_lock,
> mtx_unlock, and mtx_init functions are in libc in AIX ≥ 7.2. Currently it
> does not work (it still uses pthread_mutex_lock and pthread_mutex_unlock
> despite --enable-threads=isoc). But I could make this work and release
> a gettext 0.22.4 with the fix.

Alas, this approach does not help reduce the dependency on libpthreads.
On AIX, pthread_mutex_t and mtx_t are the same type. Thus code like this
===
#include <pthread.h>
#include <stdlib.h>
#include <threads.h>

pthread_mutex_t lock1 = PTHREAD_MUTEX_INITIALIZER;

int main ()
{
  if (mtx_lock (&lock1) != thrd_success)
    abort ();

  if (mtx_unlock (&lock1) != thrd_success)
    abort ();
}
===
compiles and runs fine. But the library dependencies still contain libpthreads.
This is in 32-bit mode:

$ ldd a.out
a.out needs:
 /usr/lib/libc.a(shr.o)
 /usr/lib/libc.a(cthread.o)
 /usr/lib/libc.a(_shr.o)
 /unix
 /usr/lib/libcrypt.a(shr.o)
 /usr/lib/libpthreads.a(shr_xpg5.o)
 /usr/lib/libpthreads.a(_shr_xpg5.o)
 /usr/lib/libpthreads.a(shr_comm.o)

and this in 64-bit mode:

$ ldd a.out 
a.out needs:
 /usr/lib/libc.a(shr_64.o)
 /usr/lib/libc.a(cthread_64.o)
 /usr/lib/libc.a(_shr_64.o)
 /unix
 /usr/lib/libcrypt.a(shr_64.o)
 /usr/lib/libpthreads.a(shr_xpg5_64.o)
 /usr/lib/libpthreads.a(_shr_xpg5_64.o)

Apparently the mtx_* functions are provided by
  /usr/lib/libc.a(cthread_64.o)
and this one depends on
  /usr/lib/libpthreads.a(shr_xpg5_64.o)
  /usr/lib/libpthreads.a(_shr_xpg5_64.o)

So, there can be only three ways to build GCC on AIX:

  - With --disable-nls. No i18n, no libpthreads dependency.
  - With --enable-nls, linked against a libintl created in the build
    tree with --disable-shared --disable-threads (requires gettext ≥ 0.22.4).
    Has i18n, but no libpthreads dependency.
  - With --enable-nls, linked against a public libintl. Depends on libpthreads.

Bruno





Re: [PATCH 40/44] RISC-V: Handle FP NE operator via inversion in cond-operation expansion

2023-11-19 Thread Jeff Law




On 11/18/23 22:43, Maciej W. Rozycki wrote:

We have no FNE.fmt machine instructions, but we can emulate them for the
purpose of conditional-move and conditional-add operations by using the
respective FEQ.fmt instruction and then swapping the data input operands
or complementing the mask for the conditional addend respectively, so
update our handlers accordingly.
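
In source terms the two emulations look roughly as follows.  This is a
sketch of the idea only, with made-up function names, not the code the
handlers emit.

```cpp
#include <cstdint>

// (a != b) ? x : y, computed from the EQ result by swapping the data inputs.
std::int64_t
cond_move_ne (double a, double b, std::int64_t x, std::int64_t y)
{
  // FEQ.fmt result, negated into a mask of 0 or all ones.
  std::int64_t eq_mask = -static_cast<std::int64_t> (a == b);
  return (eq_mask & y) | (~eq_mask & x);   // inputs swapped relative to EQ
}

// z + ((a != b) ? addend : 0), computed by complementing the EQ mask.
std::int64_t
cond_add_ne (double a, double b, std::int64_t z, std::int64_t addend)
{
  std::int64_t eq_mask = -static_cast<std::int64_t> (a == b);
  return z + (~eq_mask & addend);
}
```

Both variants need only the FEQ comparison, so no separate inversion of the
comparison result is required.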

gcc/
* config/riscv/riscv-protos.h (riscv_expand_float_scc): Add
`invert_ptr' parameter.
* config/riscv/riscv.cc (riscv_emit_float_compare): Add NE
inversion handling.
(riscv_expand_float_scc): Pass `invert_ptr' through to
`riscv_emit_float_compare'.
(riscv_expand_conditional_move): Pass `&invert' to
`riscv_expand_float_scc'.
* config/riscv/riscv.md (addcc): Likewise.

This and the rest of the patches (41, 42, 43, 44) in this series are OK.

I think between Kito and myself, we've reviewed the whole set, right?

jeff


Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-11-19 Thread Jason Merrill

On 11/19/23 13:36, waffl3x wrote:

I'm having trouble fixing the error for this case, the control flow
when the functions are overloaded is much more complex.

struct S {
   void f(this S&) {}
   void f(this S&, int) {}

   void g() {
     void (*fp)(S&) = &f;
   }
};

This seemed to have fixed the non overloaded case, but I'm also not
very happy with it, it feels kind of icky. Especially since the expr's
location isn't available here, although, it just occurred to me that
the expr's location is probably stored in the node.

typeck.cc:cp_build_addr_expr_1
```
 case BASELINK:
   arg = BASELINK_FUNCTIONS (arg);
   if (DECL_XOBJ_MEMBER_FUNC_P (
     {
       error ("You must qualify taking address of xobj member functions");
       return error_mark_node;
     }
```


The loc variable was set earlier in the function, you can use that.

The overloaded case we want to handle here in 
resolve_address_of_overloaded_function:



  if (DECL_NONSTATIC_MEMBER_FUNCTION_P (fn)
      && !(complain & tf_ptrmem_ok) && !flag_ms_extensions)
    {
      static int explained;

      if (!(complain & tf_error))
        return error_mark_node;

      auto_diagnostic_group d;
      if (permerror (input_location, "assuming pointer to member %qD", fn)
          && !explained)
        {
          inform (input_location, "(a pointer to member can only be "
                  "formed with %<&%E%>)", fn);
          explained = 1;
        }
    }


Jason



[PATCH, v2] Fortran: restrictions on integer arguments to SYSTEM_CLOCK [PR112609]

2023-11-19 Thread Harald Anlauf

Hi Steve,

On 11/19/23 01:04, Steve Kargl wrote:

On Sat, Nov 18, 2023 at 11:12:55PM +0100, Harald Anlauf wrote:

Regtested on x86_64-pc-linux-gnu.  OK for mainline?



Not in its current form.


  {
+  int first_int_kind = -1;
+  bool f2023 = ((gfc_option.allow_std & GFC_STD_F2023) != 0
+   && (gfc_option.allow_std & GFC_STD_GNU) == 0);
+


If you use the gfc_notify_std(), then you should not need the
above check on GFC_STD_GNU as it should include GFC_STD_F2023.


this is actually the question (and problem).  For all new features,
-std=gnu shall include everything allowed by -std=f2023.

Here we have the problem that the testcase is valid F2018 and is
silently accepted by gfortran-13 for -std=gnu and -std=f2018.

I prefer to keep it that way also for gfortran-14, and apply the
new restrictions only for -std=f2023.  Do we agree on this?

Now what should happen for -std=gnu -pedantic (-w)?

I have thought some more and came up with the revised attached
patch, which still has the above condition.  It now marks the
diagnostics as GNU extensions beyond F2023 for -std=f2023.

The mask f2023 in the above form suppresses new warnings even
for -pedantic; one would normally use -w to suppress them.

Now if you remove the second part of the condition, we will
regress on testcases system_clock_1.f90 and system_clock_3.f90
because they would emit GNU extension warnings because the
testsuite runs with -pedantic.

The options I see:

- use patch-V1 (although diagnostics are better in V2),

- use patch-V2,

- use patch-V2, but enable -pedantic warnings for previously
  valid code, and adjust the failing testcases

- ???


Elsewhere in the FE, gfortran uses gfc_notify_std() to enforce
requirements of a Fortran standard.  The above would be

   if (count->ts.kind < gfc_default_integer_kind
       && gfc_notify_std (GFC_STD_F2023, "COUNT argument to SYSTEM_CLOCK "
                          "at %L must have kind of at least default integer",
                          &count->where))


I tried this first, and it did not do the job.

The logic in gfc_notify_std is:

  estd = std & ~gfc_option.allow_std;  /* Standard to error about.  */
  error = (estd != 0);
  if (error)
    msg = notify_std_msg (estd);
...

So for -std=f2023 we get estd=0, error=false, and *NO* error.
For -std=f2018 we get error=true and an error message.
This is the opposite of what is needed.
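
To make the point concrete, here is a small sketch of just that masking
step.  The bit values and names are invented for the example and do not
match the real GFC_STD_* definitions; only the expression for estd follows
the code quoted above.

```cpp
// Sketch of the check quoted above.  The bit assignments are hypothetical.
enum : unsigned
{
  STD_F2018 = 1u << 0,
  STD_F2023 = 1u << 1,
};

// True when a feature tagged with `std' is treated as an error under the
// set of allowed standards `allow_std'.
constexpr bool
notify_std_errors (unsigned std, unsigned allow_std)
{
  return (std & ~allow_std) != 0;   // estd: standard to error about
}

// -std=f2023 allows F2023 features, so nothing is diagnosed ...
static_assert (!notify_std_errors (STD_F2023, STD_F2018 | STD_F2023), "");
// ... while -std=f2018 does not allow them, so an error is given.
static_assert (notify_std_errors (STD_F2023, STD_F2018), "");
```

A restriction that should apply only when F2023 is selected therefore cannot
be expressed by tagging the diagnostic with GFC_STD_F2023 alone.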

Can you please try yourself?


Note, gfc_notify_std() should add the 'Fortran 2023: ' string,
if not, that should be fixed.


This I did fix.


Of course, I seldom provide patches; if others don't have a comment,
then do as you like.


Thanks for your feedback!

Harald

From 2a85dc469696c85524459380ce11faa20e558680 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Sun, 19 Nov 2023 21:14:37 +0100
Subject: [PATCH] Fortran: restrictions on integer arguments to SYSTEM_CLOCK
 [PR112609]

Fortran 2023 added restrictions on integer arguments to SYSTEM_CLOCK to
have a decimal exponent range at least as large as a default integer,
and that all integer arguments have the same kind type parameter.

gcc/fortran/ChangeLog:

	PR fortran/112609
	* check.cc (gfc_check_system_clock): Add checks on integer arguments
	to SYSTEM_CLOCK specific to F2023.
	* error.cc (notify_std_msg): Adjust to handle new features added
	in F2023.

gcc/testsuite/ChangeLog:

	PR fortran/112609
	* gfortran.dg/system_clock_4.f90: New test.
---
 gcc/fortran/check.cc | 52 
 gcc/fortran/error.cc |  4 +-
 gcc/testsuite/gfortran.dg/system_clock_4.f90 | 24 +
 3 files changed, 79 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/system_clock_4.f90

diff --git a/gcc/fortran/check.cc b/gcc/fortran/check.cc
index 6c45e6542f0..faaea853bc4 100644
--- a/gcc/fortran/check.cc
+++ b/gcc/fortran/check.cc
@@ -6774,6 +6774,10 @@ bool
 gfc_check_system_clock (gfc_expr *count, gfc_expr *count_rate,
 			gfc_expr *count_max)
 {
+  int first_int_kind = -1;
+  bool f2023 = ((gfc_option.allow_std & GFC_STD_F2023) != 0
+		&& (gfc_option.allow_std & GFC_STD_GNU) == 0);
+
   if (count != NULL)
 {
   if (!scalar_check (count, 0))
@@ -6788,8 +6792,17 @@ gfc_check_system_clock (gfc_expr *count, gfc_expr *count_rate,
 			  &count->where))
 	return false;
 
+  if (f2023 && count->ts.kind < gfc_default_integer_kind
+	  && !gfc_notify_std (GFC_STD_GNU, "Fortran 2023 requires "
+			  "COUNT argument to SYSTEM_CLOCK at %L "
+			  "to have a kind of at least default integer",
+			  &count->where))
+	return false;
+
   if (!variable_check (count, 0, false))
 	return false;
+
+  first_int_kind = count->ts.kind;
 }
 
   if (count_rate != NULL)
@@ -6816,6 +6829,16 @@ gfc_check_system_clock (gfc_expr *count, gfc_expr *count_rate,
   "SYSTEM_CLOCK at %L has non-default kind",
   &count_rate->where))
 	return false;
+
+	  if (f2023 && count_rate->ts.kind < gfc_default_integ

[committed] RISC-V: Infrastructure for instruction fusion

2023-11-19 Thread Jeff Law




I've been meaning to extract this and upstream it for a long time.  The 
work is primarily Philipp from VRULL with one case added by Raphael and 
light bugfixing on my part.


Essentially there's 10 distinct fusions supported and they can be 
selected individually by building a suitable mask in the uarch tuning 
structure.  Additional cases can be added -- the bulk of the effort is 
in recognizing the two fusible instructions.
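
As a rough illustration of the mask idea, the sketch below reuses a few of
the enumerators from the patch; the struct and the helper are simplified
stand-ins with made-up names and may differ from the committed code.

```cpp
// Sketch only: enumerator names and values are taken from the patch, the
// rest is hypothetical.
enum riscv_fusion_pairs
{
  RISCV_FUSE_NOTHING = 0,
  RISCV_FUSE_ZEXTW = (1 << 0),
  RISCV_FUSE_ZEXTH = (1 << 1),
  RISCV_FUSE_LUI_ADDI = (1 << 4),
  RISCV_FUSE_AUIPC_ADDI = (1 << 5),
};

struct tune_sketch   // hypothetical stand-in for riscv_tune_param
{
  unsigned int fusible_ops;
};

// A fusion is enabled when its bit is set in the tuning mask.
constexpr bool
fusion_enabled_p (const tune_sketch &tune, riscv_fusion_pairs op)
{
  return (tune.fusible_ops & op) != 0;
}

// Example tuning that enables two of the fusions.
constexpr tune_sketch example_tune = { RISCV_FUSE_ZEXTW | RISCV_FUSE_LUI_ADDI };
```

A uarch that supports no fusion would simply use RISCV_FUSE_NOTHING for the
field, as the existing tuning structures in the patch do.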


The cases supported in this patch are all from the Veyron V1 processor, 
though the hope is they will be useful elsewhere.  I would encourage 
those familiar with other uarch implementations to enable fusion cases 
for those uarchs and extend the set of supported cases if any are missing.


Pushing to the trunk...

jeff

commit c177f28d601408180fdb2db0d5ba89d53b370b5e
Author: Philipp Tomsich 
Date:   Sun Nov 19 14:11:45 2023 -0700

[committed] RISC-V: Infrastructure for instruction fusion

I've been meaning to extract this and upstream it for a long time.  The work is
primarily Philipp from VRULL with one case added by Raphael and light bugfixing
on my part.

Essentially there's 10 distinct fusions supported and they can be selected
individually by building a suitable mask in the uarch tuning structure.
Additional cases can be added -- the bulk of the effort is in recognizing the
two fusible instructions.

The cases supported in this patch are all from the Veyron V1 processor, though
the hope is they will be useful elsewhere.  I would encourage those familiar
with other uarch implementations to enable fusion cases for those uarchs and
extend the set of supported cases if any are missing.

gcc/
* config/riscv/riscv-protos.h (extract_base_offset_in_addr): Prototype.
* config/riscv/riscv.cc (riscv_fusion_pairs): New enum.
(riscv_tune_param): Add fusible_ops field.
(riscv_tune_param_rocket_tune_info): Initialize new field.
(riscv_tune_param_sifive_7_tune_info): Likewise.
(thead_c906_tune_info): Likewise.
(generic_oo_tune_info): Likewise.
(optimize_size_tune_info): Likewise.
(riscv_macro_fusion_p): New function.
(riscv_fusion_enabled_p): Likewise.
(riscv_macro_fusion_pair_p): Likewise.
(TARGET_SCHED_MACRO_FUSION_P): Define.
(TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise.
(extract_base_offset_in_addr): Moved into riscv.cc from...
* config/riscv/thead.cc: Here.

Co-authored-by: Raphael Zinsly 
Co-authored-by: Jeff Law 

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 196b53f10f3..ae528db1898 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -626,6 +626,7 @@ extern bool riscv_expand_strcmp (rtx, rtx, rtx, rtx, rtx);
 extern bool riscv_expand_strlen (rtx, rtx, rtx, rtx);
 
 /* Routines implemented in thead.cc.  */
+extern bool extract_base_offset_in_addr (rtx, rtx *, rtx *);
 extern bool th_mempair_operands_p (rtx[4], bool, machine_mode);
 extern void th_mempair_order_operands (rtx[4], bool, machine_mode);
 extern void th_mempair_prepare_save_restore_operands (rtx[4], bool,
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index c2bd1c2ed29..3701f41b1b3 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -249,6 +249,21 @@ struct riscv_integer_op {
The worst case is LUI, ADDI, SLLI, ADDI, SLLI, ADDI, SLLI, ADDI.  */
 #define RISCV_MAX_INTEGER_OPS 8
 
+enum riscv_fusion_pairs
+{
+  RISCV_FUSE_NOTHING = 0,
+  RISCV_FUSE_ZEXTW = (1 << 0),
+  RISCV_FUSE_ZEXTH = (1 << 1),
+  RISCV_FUSE_ZEXTWS = (1 << 2),
+  RISCV_FUSE_LDINDEXED = (1 << 3),
+  RISCV_FUSE_LUI_ADDI = (1 << 4),
+  RISCV_FUSE_AUIPC_ADDI = (1 << 5),
+  RISCV_FUSE_LUI_LD = (1 << 6),
+  RISCV_FUSE_AUIPC_LD = (1 << 7),
+  RISCV_FUSE_LDPREINCREMENT = (1 << 8),
+  RISCV_FUSE_ALIGNED_STD = (1 << 9),
+};
+
 /* Costs of various operations on the different architectures.  */
 
 struct riscv_tune_param
@@ -264,6 +279,7 @@ struct riscv_tune_param
   unsigned short fmv_cost;
   bool slow_unaligned_access;
   bool use_divmod_expansion;
+  unsigned int fusible_ops;
 };
 
 
@@ -344,6 +360,7 @@ static const struct riscv_tune_param rocket_tune_info = {
   8,   /* fmv_cost */
  true,   /* slow_unaligned_access */
   false,   /* use_divmod_expansion */
+  RISCV_FUSE_NOTHING,   /* fusible_ops */
 };
 
 /* Costs to use when optimizing for Sifive 7 Series.  */
@@ -359,6 +376,7 @@ static const struct riscv_tune_param sifive_7_tune_info = {
   8,   /* fmv_cost */
  true,   /* slow_unaligned_access */
   false,   

Re: [PATCH] RISC-V: Implement -mmemcpy-strategy= options[PR112537]

2023-11-19 Thread Jeff Law




On 11/16/23 22:12, Li Xu wrote:

From: xuli

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112537

-mmemcpy-strategy=[auto|libcall|scalar|vector]

auto: Current status, use scalar or vector instructions.
libcall: Always use a library call.
scalar: Only use scalar instructions.
vector: Only use vector instructions.
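
As a rough illustration of the four settings listed above, the sketch below
maps the option string to an enum and then to the inline expansions that may
be tried.  The names memcpy_strategy, parse_memcpy_strategy and
allowed_expansions are made up for this example; the patch itself uses
riscv_stringop_strategy_enum and the option machinery in riscv.opt.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the four -mmemcpy-strategy= values.
enum class memcpy_strategy { automatic, libcall, scalar, vector };

// Parse the option argument; return false for an unknown value.
inline bool
parse_memcpy_strategy (const std::string &arg, memcpy_strategy &out)
{
  if (arg == "auto")
    out = memcpy_strategy::automatic;
  else if (arg == "libcall")
    out = memcpy_strategy::libcall;
  else if (arg == "scalar")
    out = memcpy_strategy::scalar;
  else if (arg == "vector")
    out = memcpy_strategy::vector;
  else
    return false;
  return true;
}

// Which inline expansions may be attempted for a given strategy.
struct inline_expansions
{
  bool scalar;
  bool vector;
};

inline inline_expansions
allowed_expansions (memcpy_strategy s)
{
  switch (s)
    {
    case memcpy_strategy::automatic:
      return {true, true};   // Current behaviour: either may be used.
    case memcpy_strategy::libcall:
      return {false, false}; // Always fall back to the library call.
    case memcpy_strategy::scalar:
      return {true, false};
    case memcpy_strategy::vector:
      return {false, true};
    }
  return {false, false};
}
```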

PR target/112537

gcc/ChangeLog:

* config/riscv/riscv-opts.h (enum riscv_stringop_strategy_enum): Strategy enum.
* config/riscv/riscv-string.cc (riscv_expand_block_move): Disabled based on options.
(expand_block_move): Ditto.
* config/riscv/riscv.opt: Add -mmemcpy-strategy=.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/cpymem-strategy-1.c: New test.
* gcc.target/riscv/rvv/base/cpymem-strategy-2.c: New test.
* gcc.target/riscv/rvv/base/cpymem-strategy-3.c: New test.
* gcc.target/riscv/rvv/base/cpymem-strategy-4.c: New test.
* gcc.target/riscv/rvv/base/cpymem-strategy-5.c: New test.
* gcc.target/riscv/rvv/base/cpymem-strategy.h: New test.

This is OK assuming you have tested it to ensure there aren't any
regressions in the testsuite.  I don't expect problems, but let's be
sure :-)


Thanks,
jeff



Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-11-19 Thread waffl3x






On Sunday, November 19th, 2023 at 1:34 PM, Jason Merrill wrote:


> 
> 
> On 11/19/23 13:36, waffl3x wrote:
> 
> > I'm having trouble fixing the error for this case, the control flow
> > when the functions are overloaded is much more complex.
> > 
> > struct S {
> > void f(this S&) {}
> > void f(this S&, int) {}
> > 
> > void g() {
> > void (*fp)(S&) = &f;
> > }
> > };
> > 
> > This seemed to have fixed the non overloaded case, but I'm also not
> > very happy with it, it feels kind of icky. Especially since the expr's
> > location isn't available here, although, it just occurred to me that
> > the expr's location is probably stored in the node.
> > 
> > typeck.cc:cp_build_addr_expr_1
> > ```
> > case BASELINK:
> > arg = BASELINK_FUNCTIONS (arg);
> > if (DECL_XOBJ_MEMBER_FUNC_P (
> > {
> > error ("You must qualify taking address of xobj member functions");
> > return error_mark_node;
> > }
> 
> 
> The loc variable was set earlier in the function, you can use that.

Will do.

> The overloaded case we want to handle here in
> resolve_address_of_overloaded_function:
> 
> > if (DECL_NONSTATIC_MEMBER_FUNCTION_P (fn)
> > && !(complain & tf_ptrmem_ok) && !flag_ms_extensions)
> > {
> > static int explained;
> > 
> > if (!(complain & tf_error))
> > return error_mark_node;
> > 
> > auto_diagnostic_group d;
> > if (permerror (input_location, "assuming pointer to member %qD", fn)
> > && !explained)
> > {
> > inform (input_location, "(a pointer to member can only be "
> > "formed with %<&%E%>)", fn);
> > explained = 1;
> > }
> > }
> 
> 
> Jason

I'll check that out now, I just mostly finished the first lambda crash.

What is the proper way to error out of instantiate_body? What I have
right now is just not recursing down further if there's a problem. Also,
I'm starting to wonder if I should actually be erroring in
instantiate_decl instead.

I guess it will be better to just finish and you can share your
comments upon review though.

Alex


Re: [PATCH v2 2/5] c-family: Simplify attribute exclusion handling

2023-11-19 Thread Jeff Law




On 11/16/23 19:53, Andrew Carlotti wrote:

This patch changes the handling of mutual exclusions involving the
target and target_clones attributes to use the generic attribute
exclusion lists.  Additionally, the duplicate handling for the
always_inline and noinline attribute exclusion is removed.

The only change in functionality is the choice of warning message
displayed - due to either a change in the wording for mutual exclusion
warnings, or a change in the order in which different checks occur.

Ok for master?

gcc/c-family/ChangeLog:

* c-attribs.cc (attr_always_inline_exclusions): New.
(attr_target_exclusions): Ditto.
(attr_target_clones_exclusions): Ditto.
(c_common_attribute_table): Add new exclusion lists.
(handle_noinline_attribute): Remove custom exclusion handling.
(handle_always_inline_attribute): Ditto.
(handle_target_attribute): Ditto.
(handle_target_clones_attribute): Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/mvc2.C:
* g++.target/i386/mvc3.C:

OK
jeff


Re: [PATCH] c++: Set DECL_CONTEXT for __cxa_thread_atexit [PR99187]

2023-11-19 Thread Nathan Sidwell

On 11/16/23 16:39, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu. I don't have write
access.

-- >8 --

Modules streaming requires DECL_CONTEXT to be set on declarations that
are streamed. This ensures that __cxa_thread_atexit is given translation
unit context much like is already done with many other support
functions.

PR c++/99187

gcc/cp/ChangeLog:

* cp-tree.h (enum cp_tree_index):
(thread_atexit_node):
* decl.cc (get_thread_atexit_node):

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr99187.C: New test.

Signed-off-by: Nathaniel Shead 


thanks, I've committed it for you.

nathan

--
Nathan Sidwell



Re: [PATCH 1/5] Add register filter operand to define_register_constraint

2023-11-19 Thread Jeff Law




On 11/12/23 07:52, Richard Sandiford wrote:

The main way of enforcing registers to be aligned is through
HARD_REGNO_MODE_OK.  But this is a global property that applies
to all operands.  A given (regno, mode) pair is either globally
valid or globally invalid.

This patch instead adds a way of specifying that individual operands
must be aligned.  More generally, it allows constraints to specify
a C++ condition that the operand's REGNO must satisfy.  The condition
must be invariant for a given set of target options, so that it can
be precomputed and cached as a HARD_REG_SET.

This information will be used in very compile-time-sensitive
parts of the compiler.  A lot of the complication is in allowing
the information to be stored and tested without much memory cost,
and without impacting targets that don't use the feature.

Specifically:

- Constraints are encouraged to test the absolute REGNO rather than
   an offset from the start of the containing class.  For example,
   all constraints for even registers should use the same condition,
   such as "regno % 2 == 0".  This requires the classes to start at
   even register boundaries, but that's already an implicit
   requirement due to things like the ira-costs.cc code that begins:

   /* Some targets allow pseudos to be allocated to unaligned sequences
 of hard registers.  However, selecting an unaligned sequence can
 unnecessarily restrict later allocations.  So increase the cost of
 unaligned hard regs to encourage the use of aligned hard regs.  */

- Each unique condition is given a "filter identifier".

- The total number of filters is given by NUM_REGISTER_FILTERS,
   defined automatically in insn-config.h.  Structures can therefore use
   a bitfield of NUM_REGISTER_FILTERS to represent a mask of filters.

- There is a new target global, target_constraints, that caches the
   HARD_REG_SET for each filter.

- There is a function for looking up the HARD_REG_SET filter for a given
   constraint and one for looking up the filter id.  Both simply return
   a constant on targets that don't use the feature.

- There are functions for testing a register against a specific filter,
   or against a mask of filters.
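
The idea in the list above, a condition on the absolute REGNO that is
precomputed once and then tested cheaply, can be sketched roughly as follows.
This is only an illustration: std::bitset stands in for HARD_REG_SET, and
NUM_HARD_REGS, build_filter and test_register_filter are hypothetical names,
not the generated interfaces.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical register count; a real target defines its own.
constexpr std::size_t NUM_HARD_REGS = 32;

// Stand-in for HARD_REG_SET: one bit per hard register.
using hard_reg_set = std::bitset<NUM_HARD_REGS>;

// Evaluate COND once per register and cache the result as a set.
// COND must not change for a given set of target options.
template <typename Cond>
hard_reg_set
build_filter (Cond cond)
{
  hard_reg_set set;
  for (std::size_t regno = 0; regno < NUM_HARD_REGS; ++regno)
    if (cond (regno))
      set.set (regno);
  return set;
}

// Test a register against a cached filter; a single bit lookup.
inline bool
test_register_filter (const hard_reg_set &filter, std::size_t regno)
{
  return regno < NUM_HARD_REGS && filter.test (regno);
}
```

With the "regno % 2 == 0" condition from the description, the filter would be
built once as `build_filter ([] (std::size_t r) { return r % 2 == 0; })` and
then only consulted, never recomputed, in the compile-time-sensitive paths.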

This patch just adds the information.  Later ones make use of it.

gcc/
* rtl.def (DEFINE_REGISTER_CONSTRAINT): Add an optional filter
operand.
* doc/md.texi (define_register_constraint): Document it.
* doc/tm.texi.in: Reference it in discussion about aligned registers.
* doc/tm.texi: Regenerate.
* gensupport.h (register_filters, get_register_filter_id): Declare.
* gensupport.cc (register_filter_map, register_filters): New variables.
(get_register_filter_id): New function.
(process_define_register_constraint): Likewise.
(process_rtx): Pass define_register_constraints to
process_define_register_constraint.
* genconfig.cc (main): Emit a definition of NUM_REGISTER_FILTERS.
* genpreds.cc (constraint_data): Add a filter field.
(add_constraint): Update accordingly.
(process_define_register_constraint): Pass the filter operand.
(write_init_reg_class_start_regs): New function.
(write_get_register_filter): Likewise.
(write_get_register_filter_id): Likewise.
(write_tm_preds_h): Write a definition of target_constraints,
plus helpers to test its contents.  Write the get_register_filter*
functions.
(write_insn_preds_c): Write init_reg_class_start_regs.
* reginfo.cc (init_reg_class_start_regs): Declare.
(init_reg_sets): Call it.
* target-globals.h (this_target_constraints): Declare.
(target_globals): Add a constraints field.
(restore_target_globals): Update accordingly.
* target-globals.cc: Include tm_p.h.
(default_target_globals): Initialize the constraints field.
(save_target_globals): Handle the constraints field.
(target_globals::~target_globals): Likewise.

OK.  Mostly focused on the concept -- if we need to iterate on the
implementation after your using it we can certainly do that.


Jeff


Re: [PATCH 2/5] recog: Handle register filters

2023-11-19 Thread Jeff Law




On 11/12/23 07:52, Richard Sandiford wrote:

The main (but simplest) part of this patch makes constrain_operands
take register filters into account.

The rest of the patch adds register filter information to
operand_alternative.  Generally, if two register constraints
have different register filters, it's better if they're in separate
alternatives.  However, the syntax doesn't enforce that, and we can't
assert it due to inline asms.  So it's a choice between (a) adding
code to enforce consistent filters or (b) dealing with mixes of filters
in a conservatively correct way (in the sense of not allowing invalid
operands).  The latter seems much easier.

The patch therefore adds a mask of the filters that apply
to at least one constraint in a given operand alternative.
A register is OK if it passes all of the filters in the mask.
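
A small sketch of the conservative rule above: a register is accepted only if
it is in every filter whose bit is set in the alternative's mask.  The filter
contents, NUM_REGS, NUM_FILTERS and the function names are assumptions made
for the example, not the recog.h interface.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical sizes for the example.
constexpr std::size_t NUM_REGS = 16;
constexpr std::size_t NUM_FILTERS = 2;

using hard_reg_set = std::bitset<NUM_REGS>;
using filter_table = std::array<hard_reg_set, NUM_FILTERS>;

// Example filters: id 0 accepts even registers, id 1 accepts
// registers that are a multiple of four.
inline filter_table
make_example_filters ()
{
  filter_table filters;
  for (std::size_t regno = 0; regno < NUM_REGS; ++regno)
    {
      filters[0].set (regno, regno % 2 == 0);
      filters[1].set (regno, regno % 4 == 0);
    }
  return filters;
}

// Return true if REGNO passes every filter selected by MASK.
// An empty mask accepts any register in range.
inline bool
register_ok_for_mask (const filter_table &filters, unsigned int mask,
                      std::size_t regno)
{
  if (regno >= NUM_REGS)
    return false;
  for (std::size_t id = 0; id < NUM_FILTERS; ++id)
    if ((mask & (1u << id)) != 0 && !filters[id].test (regno))
      return false;
  return true;
}
```

Requiring all selected filters to pass can reject a register that one of the
mixed constraints would have allowed, but it never accepts an invalid one,
which is the "conservatively correct" trade-off described above.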

gcc/
* recog.h (operand_alternative): Add a register_filters field.
(alternative_register_filters): New function.
* recog.cc (preprocess_constraints): Calculate the filters field.
(constrain_operands): Check register filters.

OK
jeff


Re: [PATCH 5/5] Add an aligned_register_operand predicate

2023-11-19 Thread Jeff Law




On 11/12/23 07:52, Richard Sandiford wrote:

This patch adds a target-independent aligned_register_operand
predicate, for use with register constraints that use filters
to impose an alignment.  The definition deliberately jettisons
some of the historical baggage in general_operand.

gcc/
* common.md (aligned_register_operand): New predicate.

OK
jeff


libstdc++: Speed up push_back

2023-11-19 Thread Jan Hubicka
Hi,
this patch speeds up push_back at -O3 significantly by allowing the
reallocation path to be inlined by default.  _M_realloc_insert is a general
insertion routine that takes an iterator pointing to the location where the
value should be inserted.  As such it contains code to move other entries
around, which is quite large.

Since appending to the end of an array is a common operation, I think we should
have specialized code for that.  Sadly it is really hard to work this out
from IPA passes, since we basically care whether the iterator points to
the same place as the end pointer, which are both passed by reference.
This is inter-procedural value numbering that is quite out of reach.

I also added an extra check making it clear that the new length of the vector
is non-zero.  This saves extra conditionals.  Again this is quite a hard case
since _M_check_len seems to be able to return 0 if its parameter is 0.
This never happens here, but we are not able to propagate this early nor
at the IPA stage.
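
To make the size difference between the two paths concrete, here is a toy
growable array of int.  It is only a sketch and not the libstdc++ code: the
int_vec type, its members and the doubling growth policy are assumptions for
illustration.  The general path has to copy the elements on both sides of the
insertion point, while the append path copies one contiguous block.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Toy growable array; not exception safe and int-only on purpose.
struct int_vec
{
  int *start = nullptr;
  std::size_t size = 0;
  std::size_t capacity = 0;

  int_vec () = default;
  int_vec (const int_vec &) = delete;
  int_vec &operator= (const int_vec &) = delete;
  ~int_vec () { delete[] start; }

  // General path: reallocate and insert VALUE at index POS.
  // Needs two copies, one for the head and one for the shifted tail.
  void realloc_insert (std::size_t pos, int value)
  {
    const std::size_t new_cap = capacity ? 2 * capacity : 1;
    int *fresh = new int[new_cap];
    std::copy (start, start + pos, fresh);
    fresh[pos] = value;
    std::copy (start + pos, start + size, fresh + pos + 1);
    delete[] start;
    start = fresh;
    ++size;
    capacity = new_cap;
  }

  // Append-only path: there is no tail, so a single copy suffices.
  void realloc_append (int value)
  {
    const std::size_t new_cap = capacity ? 2 * capacity : 1;
    int *fresh = new int[new_cap];
    std::copy (start, start + size, fresh);
    fresh[size] = value;
    delete[] start;
    start = fresh;
    ++size;
    capacity = new_cap;
  }

  // Fast path stores in place; the slow path only ever appends.
  void push_back (int value)
  {
    if (size < capacity)
      start[size++] = value;
    else
      realloc_append (value);
  }
};
```

Because new_cap is visibly non-zero here, the compiler needs no extra hint; in
the real code that fact is hidden behind _M_check_len, which is what the added
check is for.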

Would it be OK to duplicate code like this?  The resulting code is still not
quite optimal.

Regtested on x86_64-linux, OK?

Honza

void std::vector::_M_realloc_append (struct vector * const this, const struct pair_t & __args#0)
{
  struct _Guard __guard;
  struct pair_t * __new_finish;
  struct pair_t * __old_finish;
  struct pair_t * __old_start;
  struct _Vector_impl * _1;
  long unsigned int _2;
  struct pair_t * _3;
  struct pair_t * _4;
  long int _5;
  long int _6;
  long unsigned int _7;
  long unsigned int _8;
  struct pair_t * _9;
  const size_type _13;
  struct pair_t * _16;
  struct _Vector_impl * _18;
  long int _27;
  long unsigned int _34;

   [local count: 1073741824]:
  _13 = std::vector::_M_check_len (this_11(D), 1, "vector::_M_realloc_append");
  if (_13 == 0)
goto ; [0.00%]
  else
goto ; [100.00%]

   [count: 0]:
  __builtin_unreachable ();

   [local count: 1073741824]:
  __old_start_14 = this_11(D)->D.26060._M_impl.D.25361._M_start;
  __old_finish_15 = this_11(D)->D.26060._M_impl.D.25361._M_finish;
  _27 = __old_finish_15 - __old_start_14;
  _18 = &MEM[(struct _Vector_base *)this_11(D)]._M_impl;
  _16 = std::__new_allocator::allocate (_18, _13, 0B);
  _1 = &this_11(D)->D.26060._M_impl;
  __guard ={v} {CLOBBER};
  __guard._M_alloc = _1;
  _2 = (long unsigned int) _27;
  _3 = _16 + _2;
  *_3 = *__args#0_17(D);
  if (_27 > 0)
goto ; [41.48%]
  else
goto ; [58.52%]

   [local count: 445388112]:
  __builtin_memmove (_16, __old_start_14, _2);

   [local count: 1073741824]:
  _34 = _2 + 8;
  __new_finish_19 = _16 + _34;
  __guard._M_storage = __old_start_14;
  _4 = this_11(D)->D.26060._M_impl.D.25361._M_end_of_storage;
  _5 = _4 - __old_start_14;
  _6 = _5 /[ex] 8;
  _7 = (long unsigned int) _6;
  __guard._M_len = _7;
  std::vector::_M_realloc_append(const pair_t&)::_Guard::~_Guard (&__guard);
  __guard ={v} {CLOBBER(eol)};
  this_11(D)->D.26060._M_impl.D.25361._M_start = _16;
  this_11(D)->D.26060._M_impl.D.25361._M_finish = __new_finish_19;
  _8 = _13 * 8;
  _9 = _16 + _8;
  this_11(D)->D.26060._M_impl.D.25361._M_end_of_storage = _9;
  return;

}

Notice that the memmove could be a memcpy and that the test for a non-zero
block size is useless.


libstdc++-v3/ChangeLog:

PR libstdc++/110287
* include/bits/stl_vector.h (_M_realloc_append): New member function.
(push_back): Use it.
* include/bits/vector.tcc (emplace_back): Use it.
(_M_realloc_insert): Let compiler know that new vector size is non-zero.
(_M_realloc_append): New member function.

diff --git a/libstdc++-v3/include/bits/stl_vector.h b/libstdc++-v3/include/bits/stl_vector.h
index 5e18f6eedce..973f4d7e2e9 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -1288,7 +1288,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
_GLIBCXX_ASAN_ANNOTATE_GREW(1);
  }
else
- _M_realloc_insert(end(), __x);
+ _M_realloc_append(__x);
   }
 
 #if __cplusplus >= 201103L
@@ -1822,6 +1822,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
   void
   _M_realloc_insert(iterator __position, const value_type& __x);
+
+  void
+  _M_realloc_append(const value_type& __x);
 #else
   // A value_type object constructed with _Alloc_traits::construct()
   // and destroyed with _Alloc_traits::destroy().
@@ -1871,6 +1874,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
void
_M_realloc_insert(iterator __position, _Args&&... __args);
 
+  template
+   _GLIBCXX20_CONSTEXPR
+   void
+   _M_realloc_append(_Args&&... __args);
+
   // Either move-construct at the end, or forward to _M_insert_aux.
   _GLIBCXX20_CONSTEXPR
   iterator
diff --git a/libstdc++-v3/include/bits/vector.tcc b/libstdc++-v3/include/bits/vector.tcc
index 80631d1e2a1..1306676e795 100644
--- a/libstdc++-v3/include/bits/vector.tcc
+++ b/libstdc++-v3/include/bits/vector.tcc
@@ -120,7 +120,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
_GLIBCXX_ASAN_

Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-19 Thread Bruno Haible
David Edelsohn wrote:
> --disable-threads currently does not completely disable threads.  Bruno is
> suggesting --enable-threads=isoc that relies on mtx mutex functions in libc.

Unfortunately, as said in the other mail today, relying only on mtx_* functions
did not drop the dependency towards libpthreads.

So, I've made a new release gettext-0.22.4, that includes only these changes:

  - AM_GNU_GETTEXT now recognizes a statically built libintl on macOS and AIX.

  - Passing --disable-threads now builds a libintl that, on AIX, does not
need -lpthread.

  - Other build fixes on AIX.

> Yes, GCC should configure the in tree gettext with --disable-threads, but
> that configure option is not completely effective and does not produce a
> build without threads references.

Now it is effective. But you (Arsen) should state in the documentation
(gcc/doc/install.texi) that for --disable-threads to have this effect,
one needs gettext version 0.22.4 or newer.

Bruno





Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-19 Thread Bruno Haible
I wrote:
> you (Arsen) should state in the documentation
> (gcc/doc/install.texi) that for --disable-threads to have this effect,
> one needs gettext version 0.22.4 or newer.

Not in gcc/doc/install.texi, but elsewhere. This topic is not relevant to
the average user who installs GCC from a tarball or from a git checkout.
Only to GCC hackers who have peculiar needs.

Bruno





Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-19 Thread Andrew Pinski
On Sun, Nov 19, 2023 at 3:01 PM Bruno Haible  wrote:
>
> I wrote:
> > you (Arsen) should state in the documentation
> > (gcc/doc/install.texi) that for --disable-threads to have this effect,
> > one needs gettext version 0.22.4 or newer.
>
> Not in gcc/doc/install.texi, but elsewhere. This topic is not relevant to
> the average user who installs GCC from a tarball or from a git checkout.
> Only to GCC hackers who have peculiar needs.

That still is documented in install.texi really.
https://gcc.gnu.org/install/specific.html#x-ibm-aix is generated from
that and it talks about other options dealing with NLS there.
It seems that part might need to be updated too.

Thanks,
Andrew


>
> Bruno
>
>
>


Re: [PATCH] testsuite: Fix subexpressions with `scan-assembler-times'

2023-11-19 Thread Jeff Law




On 11/19/23 04:27, Maciej W. Rozycki wrote:

We have an issue with `scan-assembler-times' handling expressions using
subexpressions as produced by capturing parentheses `()' in an odd way,
and one that is inconsistent with `scan-assembler', `scan-assembler-not',
etc.  The problem comes from calling `regexp' with `-inline -all', which
causes a list to be returned that would otherwise be placed in match
variables.

Consequently if we have say:

/* { dg-final { scan-assembler-times "\\s(foo|bar)\\s" 1 } } */

in a test case and there is a lone `foo' present in output being matched,
then our invocation of `regexp -inline -all' in `scan-assembler-times'
will return:

{ foo } foo

and that in turn will confuse our match count calculation as `llength'
will return 2 rather than 1, making the test fail even though `foo' was
only actually matched once.

It seems unclear why we chose to call `regexp' in such an odd way in the
first place just to figure out the number of matches.  The first version
of TCL that supports the `-all' option to `regexp' is 8.3, and according
to its documentation[1][2] `regexp' already returns the number of matches
found whenever `-all' has been used *unless* `-inline' has also been used.

Remove the `-inline' option then along with the `llength' invocation.
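
The miscount can be reproduced outside Tcl.  The sketch below uses std::regex
as an analogy only: count_matches plays the role of `regexp -all', and
count_flattened plays the role of `llength [regexp -inline -all ...]', where
every capturing group adds an extra list element per match.  Both function
names are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Number of matches of RE in TEXT, like `regexp -all'.
inline std::size_t
count_matches (const std::string &text, const std::regex &re)
{
  std::sregex_iterator first (text.begin (), text.end (), re);
  std::sregex_iterator last;
  return static_cast<std::size_t> (std::distance (first, last));
}

// Length of the flattened list holding each whole match followed by
// its subexpressions, like `llength [regexp -inline -all ...]'.
inline std::size_t
count_flattened (const std::string &text, const std::regex &re)
{
  std::size_t n = 0;
  std::sregex_iterator last;
  for (std::sregex_iterator it (text.begin (), text.end (), re);
       it != last; ++it)
    n += it->size (); // Whole match plus one entry per capture group.
  return n;
}
```

For the pattern `\s(foo|bar)\s' and an input containing a lone `foo', the
first function counts one match while the second counts two elements, which
is the discrepancy described above.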

References:

[1] "Tcl Built-In Commands - regexp manual page",
 

[2] "Tcl Built-In Commands - regexp manual page",
 

gcc/testsuite/
* lib/scanasm.exp (scan-assembler-times): Remove the `-inline'
option to `regexp' and the wrapping `llength' call.
---
Hi,

  Verified with the `riscv64-linux-gnu' target and the C language
testsuite.  OK to apply?

Not sure why it is the way it is -- I walked back to Zdenek's change
which introduced scan-assembler-times and found nothing about the -inline
argument.


OK, but be on the lookout for scan-asm problems on other targets over 
the next few days.


Jeff


[PATCH v3 0/5] LoongArch: SIMD fixes and optimizations

2023-11-19 Thread Xi Ruoyao
The [1/5] patch is the PR112578 fix at
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637097.html.
It has been changed to remove the nearbyint pattern (because nearbyint
should not raise FE_INEXACT even if -ffp-int-builtin-inexact).
As the other patches depend on the simd.md file introduced by this patch,
I'm sending it as the first of this series.

As many LASX instructions differ from the corresponding LSX instruction
only in operand length, create a simd.md file to contain the RTX templates
shared by LSX and LASX.  This makes the code cleaner and easier to maintain.

The [2/5] and [3/5] patches make the vector product highpart and rotate
shift operations usable with GNU vectors and auto vectorization.

The [4/5] patch is a simple code cleanup, with no function change.

The [5/5] patch uses LSX for FP scalar rounding operations if LSX is
available and -ffp-int-builtin-inexact is in effect.  We do this because
the base FP ISA does not have such instructions.  Using LSX is overkill,
but still much faster than calling libc functions.

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

Xi Ruoyao (5):
  LoongArch: Fix usage of LSX and LASX frint/ftint instructions
[PR112578]
  LoongArch: Use standard pattern name and RTX code for LSX/LASX muh
instructions
  LoongArch: Use standard pattern name and RTX code for LSX/LASX rotate
shift
  LoongArch: Remove lrint_allow_inexact
  LoongArch: Use LSX for scalar FP rounding with explicit rounding mode

 gcc/config/loongarch/lasx.md  | 283 -
 gcc/config/loongarch/loongarch-builtins.cc|  52 ++--
 gcc/config/loongarch/loongarch.md |  12 +-
 gcc/config/loongarch/lsx.md   | 293 --
 gcc/config/loongarch/simd.md  | 268 
 .../loongarch/vect-frint-no-inexact.c |  48 +++
 .../loongarch/vect-frint-scalar-no-inexact.c  |  23 ++
 .../gcc.target/loongarch/vect-frint-scalar.c  |  43 +++
 .../gcc.target/loongarch/vect-frint.c |  85 +
 .../loongarch/vect-ftint-no-inexact.c |  44 +++
 .../gcc.target/loongarch/vect-ftint.c |  83 +
 gcc/testsuite/gcc.target/loongarch/vect-muh.c |  36 +++
 .../gcc.target/loongarch/vect-rotr.c  |  36 +++
 13 files changed, 701 insertions(+), 605 deletions(-)
 create mode 100644 gcc/config/loongarch/simd.md
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-no-inexact.c
 create mode 100644 
gcc/testsuite/gcc.target/loongarch/vect-frint-scalar-no-inexact.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-ftint-no-inexact.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-ftint.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-muh.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-rotr.c

-- 
2.42.1



[PATCH v3 1/5] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

2023-11-19 Thread Xi Ruoyao
The usage of LSX and LASX frint/ftint instructions had some problems:

1. These instructions raise FE_INEXACT, which is not allowed with
   -fno-fp-int-builtin-inexact for most C2x section F.10.6 functions
   (the only exceptions are rint, lrint, and llrint).
2. The "frint" instruction without explicit rounding mode is used for
   roundM2; this is incorrect because roundM2 is defined "rounding
   operand 1 to the *nearest* integer, rounding away from zero in the
   event of a tie".  We actually don't have such an instruction.  Our
   frintrne instruction is roundevenM2 (unfortunately, this is not
   documented).
3. These define_insn's are written in a way not so easy to hack.
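
The difference behind point 2 is only visible on ties.  As a sketch using the
standard C++ library rather than the LoongArch instructions: std::round breaks
ties away from zero (the roundM2 semantics), while std::nearbyint in
round-to-nearest mode breaks ties to even, which gives the same results as
roundeven.  The two wrapper names are made up for the example.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Ties are rounded away from zero, as roundM2 requires.
inline double
round_ties_away (double x)
{
  return std::round (x);
}

// Ties are rounded to the even neighbour.  The rounding mode is set
// explicitly so the result does not depend on the caller's mode.
inline double
round_ties_even (double x)
{
  const int saved = std::fegetround ();
  std::fesetround (FE_TONEAREST);
  const double r = std::nearbyint (x);
  std::fesetround (saved);
  return r;
}
```

Away from exact ties the two agree, which is why using the wrong one is easy
to miss in testing.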

So I removed these instructions and created a "simd.md" file, then added
them and the corresponding expanders there.  The advantage of the
simd.md file is we don't need to duplicate the RTL template twice (in
lsx.md and lasx.md).

gcc/ChangeLog:

PR target/112578
* config/loongarch/lsx.md (UNSPEC_LSX_VFTINT_S,
UNSPEC_LSX_VFTINTRNE, UNSPEC_LSX_VFTINTRP,
UNSPEC_LSX_VFTINTRM, UNSPEC_LSX_VFRINTRNE_S,
UNSPEC_LSX_VFRINTRNE_D, UNSPEC_LSX_VFRINTRZ_S,
UNSPEC_LSX_VFRINTRZ_D, UNSPEC_LSX_VFRINTRP_S,
UNSPEC_LSX_VFRINTRP_D, UNSPEC_LSX_VFRINTRM_S,
UNSPEC_LSX_VFRINTRM_D): Remove.
(ILSX, FLSX): Move into ...
(VIMODE): Move into ...
(FRINT_S, FRINT_D): Remove.
(frint_pattern_s, frint_pattern_d, frint_suffix): Remove.
(lsx_vfrint_, lsx_vftint_s__,
lsx_vftintrne_w_s, lsx_vftintrne_l_d, lsx_vftintrp_w_s,
lsx_vftintrp_l_d, lsx_vftintrm_w_s, lsx_vftintrm_l_d,
lsx_vfrintrne_s, lsx_vfrintrne_d, lsx_vfrintrz_s,
lsx_vfrintrz_d, lsx_vfrintrp_s, lsx_vfrintrp_d,
lsx_vfrintrm_s, lsx_vfrintrm_d,
v4sf2,
v2df2, round2,
fix_trunc2): Remove.
* config/loongarch/lasx.md: Likewise.
* config/loongarch/simd.md: New file.
(ILSX, ILASX, FLSX, FLASX, VIMODE): ... here.
(IVEC, FVEC): New mode iterators.
(VIMODE): ... here.  Extend it to work for all LSX/LASX vector
modes.
(x, wu, simd_isa, WVEC, vimode, simdfmt, simdifmt_for_f,
elebits): New mode attributes.
(UNSPEC_SIMD_FRINTRP, UNSPEC_SIMD_FRINTRZ, UNSPEC_SIMD_FRINT,
UNSPEC_SIMD_FRINTRM, UNSPEC_SIMD_FRINTRNE): New unspecs.
(SIMD_FRINT): New int iterator.
(simd_frint_rounding, simd_frint_pattern): New int attributes.
(_vfrint_): New
define_insn template for frint instructions.
(_vftint__):
Likewise, but for ftint instructions.
(2): New define_expand with
flag_fp_int_builtin_inexact checked.
(l2): Likewise.
(ftrunc2): New define_expand.  It does not require
flag_fp_int_builtin_inexact.
(fix_trunc2): New define_insn_and_split.  It does
not require flag_fp_int_builtin_inexact.
(include): Add lsx.md and lasx.md.
* config/loongarch/loongarch.md (include): Include simd.md,
instead of including lsx.md and lasx.md directly.
* config/loongarch/loongarch-builtins.cc
(CODE_FOR_lsx_vftint_w_s, CODE_FOR_lsx_vftint_l_d,
CODE_FOR_lasx_xvftint_w_s, CODE_FOR_lasx_xvftint_l_d):
Remove.

gcc/testsuite/ChangeLog:

PR target/112578
* gcc.target/loongarch/vect-frint.c: New test.
* gcc.target/loongarch/vect-frint-no-inexact.c: New test.
* gcc.target/loongarch/vect-ftint.c: New test.
* gcc.target/loongarch/vect-ftint-no-inexact.c: New test.
---
 gcc/config/loongarch/lasx.md  | 239 -
 gcc/config/loongarch/loongarch-builtins.cc|   4 -
 gcc/config/loongarch/loongarch.md |   7 +-
 gcc/config/loongarch/lsx.md   | 243 --
 gcc/config/loongarch/simd.md  | 194 ++
 .../loongarch/vect-frint-no-inexact.c |  48 
 .../gcc.target/loongarch/vect-frint.c |  85 ++
 .../loongarch/vect-ftint-no-inexact.c |  44 
 .../gcc.target/loongarch/vect-ftint.c |  83 ++
 9 files changed, 456 insertions(+), 491 deletions(-)
 create mode 100644 gcc/config/loongarch/simd.md
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-no-inexact.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-ftint-no-inexact.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-ftint.c

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 2e11f061202..d4a56c307c4 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -53,7 +53,6 @@ (define_c_enum "unspec" [
   UNSPEC_LASX_XVFCMP_SULT
   UNSPEC_LASX_XVFCMP_SUN
   UNSPEC_LASX_XVFCMP_SUNE
-  UNSPEC_LASX_XVFTINT_S
   UNSPEC_LASX_XVFTINT_U
   UNSPEC_LASX_XVCLO
   UNSPEC_LASX_XVSAT_S
@@ -92,12 +91,6 @@ (define_c_enum "unspe

[PATCH v3 2/5] LoongArch: Use standard pattern name and RTX code for LSX/LASX muh instructions

2023-11-19 Thread Xi Ruoyao
Remove unnecessary UNSPECs and make the muh instructions useful with
GNU vectors or auto vectorization.
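
For reference, the per-element operation that smul_highpart and umul_highpart
describe is the upper half of the double-width product.  A scalar sketch for
32-bit elements follows; the function names simply reuse the RTX code names
and are not GCC interfaces.

```cpp
#include <cassert>
#include <cstdint>

// Signed high part: upper 32 bits of the 64-bit signed product.
// The right shift of a negative value is arithmetic; this is
// guaranteed since C++20 and is what GCC does in earlier modes.
inline std::int32_t
smul_highpart (std::int32_t a, std::int32_t b)
{
  const std::int64_t wide = static_cast<std::int64_t> (a) * b;
  return static_cast<std::int32_t> (wide >> 32);
}

// Unsigned high part: upper 32 bits of the 64-bit unsigned product.
inline std::uint32_t
umul_highpart (std::uint32_t a, std::uint32_t b)
{
  const std::uint64_t wide = static_cast<std::uint64_t> (a) * b;
  return static_cast<std::uint32_t> (wide >> 32);
}
```

Expressing the instructions with these standard codes instead of UNSPECs is
what lets the vectorizer recognize a loop computing this value per element.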

gcc/ChangeLog:

* config/loongarch/simd.md (muh): New code attribute mapping
any_extend to smul_highpart or umul_highpart.
(mul3_highpart): New define_insn.
* config/loongarch/lsx.md (UNSPEC_LSX_VMUH_S): Remove.
(UNSPEC_LSX_VMUH_U): Remove.
(lsx_vmuh_s_): Remove.
(lsx_vmuh_u_): Remove.
* config/loongarch/lasx.md (UNSPEC_LASX_XVMUH_S): Remove.
(UNSPEC_LASX_XVMUH_U): Remove.
(lasx_xvmuh_s_): Remove.
(lasx_xvmuh_u_): Remove.
* config/loongarch/loongarch-builtins.cc (CODE_FOR_lsx_vmuh_b):
Redefine to standard pattern name.
(CODE_FOR_lsx_vmuh_h): Likewise.
(CODE_FOR_lsx_vmuh_w): Likewise.
(CODE_FOR_lsx_vmuh_d): Likewise.
(CODE_FOR_lsx_vmuh_bu): Likewise.
(CODE_FOR_lsx_vmuh_hu): Likewise.
(CODE_FOR_lsx_vmuh_wu): Likewise.
(CODE_FOR_lsx_vmuh_du): Likewise.
(CODE_FOR_lasx_xvmuh_b): Likewise.
(CODE_FOR_lasx_xvmuh_h): Likewise.
(CODE_FOR_lasx_xvmuh_w): Likewise.
(CODE_FOR_lasx_xvmuh_d): Likewise.
(CODE_FOR_lasx_xvmuh_bu): Likewise.
(CODE_FOR_lasx_xvmuh_hu): Likewise.
(CODE_FOR_lasx_xvmuh_wu): Likewise.
(CODE_FOR_lasx_xvmuh_du): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vect-muh.c: New test.
---
 gcc/config/loongarch/lasx.md  | 22 
 gcc/config/loongarch/loongarch-builtins.cc| 32 -
 gcc/config/loongarch/lsx.md   | 22 
 gcc/config/loongarch/simd.md  | 16 +
 gcc/testsuite/gcc.target/loongarch/vect-muh.c | 36 +++
 5 files changed, 68 insertions(+), 60 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-muh.c

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index d4a56c307c4..023a023b44e 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -68,8 +68,6 @@ (define_c_enum "unspec" [
   UNSPEC_LASX_BRANCH
   UNSPEC_LASX_BRANCH_V
 
-  UNSPEC_LASX_XVMUH_S
-  UNSPEC_LASX_XVMUH_U
   UNSPEC_LASX_MXVEXTW_U
   UNSPEC_LASX_XVSLLWIL_S
   UNSPEC_LASX_XVSLLWIL_U
@@ -2823,26 +2821,6 @@ (define_insn "neg2"
   [(set_attr "type" "simd_logic")
(set_attr "mode" "")])
 
-(define_insn "lasx_xvmuh_s_"
-  [(set (match_operand:ILASX 0 "register_operand" "=f")
-   (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f")
-  (match_operand:ILASX 2 "register_operand" "f")]
- UNSPEC_LASX_XVMUH_S))]
-  "ISA_HAS_LASX"
-  "xvmuh.\t%u0,%u1,%u2"
-  [(set_attr "type" "simd_int_arith")
-   (set_attr "mode" "")])
-
-(define_insn "lasx_xvmuh_u_"
-  [(set (match_operand:ILASX 0 "register_operand" "=f")
-   (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f")
-  (match_operand:ILASX 2 "register_operand" "f")]
- UNSPEC_LASX_XVMUH_U))]
-  "ISA_HAS_LASX"
-  "xvmuh.\t%u0,%u1,%u2"
-  [(set_attr "type" "simd_int_arith")
-   (set_attr "mode" "")])
-
 (define_insn "lasx_xvsllwil_s__"
   [(set (match_operand: 0 "register_operand" "=f")
(unspec: [(match_operand:ILASX_WHB 1 "register_operand" "f")
diff --git a/gcc/config/loongarch/loongarch-builtins.cc 
b/gcc/config/loongarch/loongarch-builtins.cc
index cbd833aa283..a6fcc1c731e 100644
--- a/gcc/config/loongarch/loongarch-builtins.cc
+++ b/gcc/config/loongarch/loongarch-builtins.cc
@@ -319,6 +319,14 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
 #define CODE_FOR_lsx_vmod_hu CODE_FOR_umodv8hi3
 #define CODE_FOR_lsx_vmod_wu CODE_FOR_umodv4si3
 #define CODE_FOR_lsx_vmod_du CODE_FOR_umodv2di3
+#define CODE_FOR_lsx_vmuh_b CODE_FOR_smulv16qi3_highpart
+#define CODE_FOR_lsx_vmuh_h CODE_FOR_smulv8hi3_highpart
+#define CODE_FOR_lsx_vmuh_w CODE_FOR_smulv4si3_highpart
+#define CODE_FOR_lsx_vmuh_d CODE_FOR_smulv2di3_highpart
+#define CODE_FOR_lsx_vmuh_bu CODE_FOR_umulv16qi3_highpart
+#define CODE_FOR_lsx_vmuh_hu CODE_FOR_umulv8hi3_highpart
+#define CODE_FOR_lsx_vmuh_wu CODE_FOR_umulv4si3_highpart
+#define CODE_FOR_lsx_vmuh_du CODE_FOR_umulv2di3_highpart
 #define CODE_FOR_lsx_vmul_b CODE_FOR_mulv16qi3
 #define CODE_FOR_lsx_vmul_h CODE_FOR_mulv8hi3
 #define CODE_FOR_lsx_vmul_w CODE_FOR_mulv4si3
@@ -439,14 +447,6 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
 #define CODE_FOR_lsx_vfnmsub_s CODE_FOR_vfnmsubv4sf4_nmsub4
 #define CODE_FOR_lsx_vfnmsub_d CODE_FOR_vfnmsubv2df4_nmsub4
 
-#define CODE_FOR_lsx_vmuh_b CODE_FOR_lsx_vmuh_s_b
-#define CODE_FOR_lsx_vmuh_h CODE_FOR_lsx_vmuh_s_h
-#define CODE_FOR_lsx_vmuh_w CODE_FOR_lsx_vmuh_s_w
-#define CODE_FOR_lsx_vmuh_d CODE_FOR_lsx_vmuh_s_d
-#define CODE_FOR_lsx_vmuh_bu CODE_FOR_lsx_vmuh_u_bu
-#define CODE_FOR_lsx_vmuh_hu CODE_FOR_lsx_vmuh_u_hu
-#define CODE_FOR_lsx_vmuh_wu CODE_FOR_lsx_vmuh_u_wu
-#define CODE_FOR_lsx_vmuh_du CODE_FOR_lsx_vmuh_u_du
 #define CODE_FOR_lsx_vsllwil

[PATCH v3 3/5] LoongArch: Use standard pattern name and RTX code for LSX/LASX rotate shift

2023-11-19 Thread Xi Ruoyao
Remove unnecessary UNSPECs and make the [x]vrotr[i] instructions useful
with GNU vectors and auto vectorization.
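
For illustration only, the kind of source this is aimed at can be sketched as below.  This is not the content of vect-rotr.c (which is not shown here); the names rotr32 and rotr_array are hypothetical.  Whether the loop is actually vectorized into [x]vrotr depends on the optimization level, the -mlsx/-mlasx flags and the vectorizer's pattern recognition, so treat it as an assumption rather than a guarantee.

```c
#include <stdint.h>

/* Rotate X right by N bits; N is reduced modulo 32 so the shifts stay
   well defined.  */
static inline uint32_t
rotr32 (uint32_t x, unsigned n)
{
  n &= 31;
  return (x >> n) | (x << ((32 - n) & 31));
}

/* Element-wise rotate with a per-element count, the operation that a
   variable vector rotate performs in one instruction.  */
void
rotr_array (uint32_t *dst, const uint32_t *src, const uint32_t *cnt, int len)
{
  for (int i = 0; i < len; i++)
    dst[i] = rotr32 (src[i], cnt[i]);
}
```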

gcc/ChangeLog:

* config/loongarch/lsx.md (bitimm): Move to ...
(UNSPEC_LSX_VROTR): Remove.
(lsx_vrotr_): Remove.
(lsx_vrotri_): Remove.
* config/loongarch/lasx.md (UNSPEC_LASX_XVROTR): Remove.
(lsx_vrotr_): Remove.
(lsx_vrotri_): Remove.
* config/loongarch/simd.md (bitimm): ... here.  Expand it to
cover LASX modes.
(vrotr3): New define_insn.
(vrotri3): New define_insn.
* config/loongarch/loongarch-builtins.cc:
(CODE_FOR_lsx_vrotr_b): Use standard pattern name.
(CODE_FOR_lsx_vrotr_h): Likewise.
(CODE_FOR_lsx_vrotr_w): Likewise.
(CODE_FOR_lsx_vrotr_d): Likewise.
(CODE_FOR_lasx_xvrotr_b): Likewise.
(CODE_FOR_lasx_xvrotr_h): Likewise.
(CODE_FOR_lasx_xvrotr_w): Likewise.
(CODE_FOR_lasx_xvrotr_d): Likewise.
(CODE_FOR_lsx_vrotri_b): Define to standard pattern name.
(CODE_FOR_lsx_vrotri_h): Likewise.
(CODE_FOR_lsx_vrotri_w): Likewise.
(CODE_FOR_lsx_vrotri_d): Likewise.
(CODE_FOR_lasx_xvrotri_b): Likewise.
(CODE_FOR_lasx_xvrotri_h): Likewise.
(CODE_FOR_lasx_xvrotri_w): Likewise.
(CODE_FOR_lasx_xvrotri_d): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vect-rotr.c: New test.
---
 gcc/config/loongarch/lasx.md  | 22 
 gcc/config/loongarch/loongarch-builtins.cc| 16 +
 gcc/config/loongarch/lsx.md   | 28 ---
 gcc/config/loongarch/simd.md  | 29 +++
 .../gcc.target/loongarch/vect-rotr.c  | 36 +++
 5 files changed, 81 insertions(+), 50 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-rotr.c

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 023a023b44e..116b30c0774 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -138,7 +138,6 @@ (define_c_enum "unspec" [
   UNSPEC_LASX_XVHSUBW_Q_D
   UNSPEC_LASX_XVHADDW_QU_DU
   UNSPEC_LASX_XVHSUBW_QU_DU
-  UNSPEC_LASX_XVROTR
   UNSPEC_LASX_XVADD_Q
   UNSPEC_LASX_XVSUB_Q
   UNSPEC_LASX_XVREPLVE
@@ -4232,18 +4231,6 @@ (define_insn "lasx_xvhsubw_qu_du"
   [(set_attr "type" "simd_int_arith")
(set_attr "mode" "V4DI")])
 
-;;XVROTR.B   XVROTR.H   XVROTR.W   XVROTR.D
-;;TODO-478
-(define_insn "lasx_xvrotr_"
-  [(set (match_operand:ILASX 0 "register_operand" "=f")
-   (unspec:ILASX [(match_operand:ILASX 1 "register_operand" "f")
-  (match_operand:ILASX 2 "register_operand" "f")]
- UNSPEC_LASX_XVROTR))]
-  "ISA_HAS_LASX"
-  "xvrotr.\t%u0,%u1,%u2"
-  [(set_attr "type" "simd_int_arith")
-   (set_attr "mode" "")])
-
 ;;XVADD.Q
 ;;TODO2
 (define_insn "lasx_xvadd_q"
@@ -4426,15 +4413,6 @@ (define_insn "lasx_xvexth_qu_du"
   [(set_attr "type" "simd_fcvt")
(set_attr "mode" "V4DI")])
 
-(define_insn "lasx_xvrotri_"
-  [(set (match_operand:ILASX 0 "register_operand" "=f")
-   (rotatert:ILASX (match_operand:ILASX 1 "register_operand" "f")
-  (match_operand 2 "const__operand" "")))]
-  "ISA_HAS_LASX"
-  "xvrotri.\t%u0,%u1,%2"
-  [(set_attr "type" "simd_shf")
-   (set_attr "mode" "")])
-
 (define_insn "lasx_xvextl_q_d"
   [(set (match_operand:V4DI 0 "register_operand" "=f")
(unspec:V4DI [(match_operand:V4DI 1 "register_operand" "f")]
diff --git a/gcc/config/loongarch/loongarch-builtins.cc 
b/gcc/config/loongarch/loongarch-builtins.cc
index a6fcc1c731e..5d037ab7f10 100644
--- a/gcc/config/loongarch/loongarch-builtins.cc
+++ b/gcc/config/loongarch/loongarch-builtins.cc
@@ -369,6 +369,14 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
 #define CODE_FOR_lsx_vsrli_h CODE_FOR_vlshrv8hi3
 #define CODE_FOR_lsx_vsrli_w CODE_FOR_vlshrv4si3
 #define CODE_FOR_lsx_vsrli_d CODE_FOR_vlshrv2di3
+#define CODE_FOR_lsx_vrotr_b CODE_FOR_vrotrv16qi3
+#define CODE_FOR_lsx_vrotr_h CODE_FOR_vrotrv8hi3
+#define CODE_FOR_lsx_vrotr_w CODE_FOR_vrotrv4si3
+#define CODE_FOR_lsx_vrotr_d CODE_FOR_vrotrv2di3
+#define CODE_FOR_lsx_vrotri_b CODE_FOR_rotrv16qi3
+#define CODE_FOR_lsx_vrotri_h CODE_FOR_rotrv8hi3
+#define CODE_FOR_lsx_vrotri_w CODE_FOR_rotrv4si3
+#define CODE_FOR_lsx_vrotri_d CODE_FOR_rotrv2di3
 #define CODE_FOR_lsx_vsub_b CODE_FOR_subv16qi3
 #define CODE_FOR_lsx_vsub_h CODE_FOR_subv8hi3
 #define CODE_FOR_lsx_vsub_w CODE_FOR_subv4si3
@@ -634,6 +642,14 @@ AVAIL_ALL (lasx, ISA_HAS_LASX)
 #define CODE_FOR_lasx_xvsrli_h CODE_FOR_vlshrv16hi3
 #define CODE_FOR_lasx_xvsrli_w CODE_FOR_vlshrv8si3
 #define CODE_FOR_lasx_xvsrli_d CODE_FOR_vlshrv4di3
+#define CODE_FOR_lasx_xvrotr_b CODE_FOR_vrotrv32qi3
+#define CODE_FOR_lasx_xvrotr_h CODE_FOR_vrotrv16hi3
+#define CODE_FOR_lasx_xvrotr_w CODE_FOR_vrotrv8si3
+#define CODE_FOR_lasx_xvrotr_d CODE_FOR_vrotrv4di3
+#define CODE_FOR_lasx_xvrotri_b CODE_FOR_rotrv32qi3
+#define COD

[PATCH v3 4/5] LoongArch: Remove lrint_allow_inexact

2023-11-19 Thread Xi Ruoyao
No functional change, just a cleanup.

gcc/ChangeLog:

* config/loongarch/loongarch.md (lrint_allow_inexact): Remove.
(2): Check if 
== UNSPEC_FTINT instead of .
---
 gcc/config/loongarch/loongarch.md | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 78ed63f2132..1e019815451 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -585,9 +585,6 @@ (define_int_attr lrint_pattern [(UNSPEC_FTINT "lrint")
 (define_int_attr lrint_submenmonic [(UNSPEC_FTINT "")
(UNSPEC_FTINTRM "rm")
(UNSPEC_FTINTRP "rp")])
-(define_int_attr lrint_allow_inexact [(UNSPEC_FTINT "1")
- (UNSPEC_FTINTRM "0")
- (UNSPEC_FTINTRP "0")])
 
 ;; Iterator and attributes for bytepick.d
 (define_int_iterator bytepick_w_ashift_amount [8 16 24])
@@ -2384,7 +2381,7 @@ (define_insn "2"
(unspec:ANYFI [(match_operand:ANYF 1 "register_operand" "f")]
  LRINT))]
   "TARGET_HARD_FLOAT &&
-   (
+   ( == UNSPEC_FTINT
 || flag_fp_int_builtin_inexact
 || !flag_trapping_math)"
   "ftint.. %0,%1"
-- 
2.42.1



[PATCH v3 5/5] LoongArch: Use LSX for scalar FP rounding with explicit rounding mode

2023-11-19 Thread Xi Ruoyao
In the LoongArch FP base ISA there is only the frint.{s/d} instruction,
which reads the global rounding mode.  Utilize LSX for an explicit
rounding mode even if the operand is scalar.  This seems to waste CPU
power, but it is still much faster than calling the library function.

gcc/ChangeLog:

* config/loongarch/simd.md (LSX_SCALAR_FRINT): New int iterator.
(VLSX_FOR_FMODE): New mode attribute.
(2): New expander,
expanding to vreplvei.{w/d} + frint{rp/rz/rm/rne}.{s.d}.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vect-frint-scalar.c: New test.
* gcc.target/loongarch/vect-frint-scalar-no-inexact.c: New test.
---
 gcc/config/loongarch/simd.md  | 29 +
 .../loongarch/vect-frint-scalar-no-inexact.c  | 23 ++
 .../gcc.target/loongarch/vect-frint-scalar.c  | 43 +++
 3 files changed, 95 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/loongarch/vect-frint-scalar-no-inexact.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c

diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
index 6937477e3df..e592de49aa0 100644
--- a/gcc/config/loongarch/simd.md
+++ b/gcc/config/loongarch/simd.md
@@ -150,6 +150,35 @@ (define_expand "ftrunc2"
 UNSPEC_SIMD_FRINTRZ))]
   "")
 
+;; Use LSX for scalar ceil/floor/trunc/roundeven when -mlsx and -ffp-int-
+;; builtin-inexact.  The base FP instruction set lacks these operations.
+;; Yes we are wasting 50% or even 75% of the CPU horsepower, but it's still
+;; much faster than calling a libc function: on LA464 and LA664 there is a
+;; 3x ~ 5x speed up.
+;;
+;; Note that a vreplvei instruction is needed or we'll also operate on the
+;; junk in high bits of the vector register and produce random FP exceptions.
+
+(define_int_iterator LSX_SCALAR_FRINT
+  [UNSPEC_SIMD_FRINTRP
+   UNSPEC_SIMD_FRINTRZ
+   UNSPEC_SIMD_FRINTRM
+   UNSPEC_SIMD_FRINTRNE])
+
+(define_mode_attr VLSX_FOR_FMODE [(DF "V2DF") (SF "V4SF")])
+
+(define_expand "2"
+  [(set (match_dup 2)
+ (vec_duplicate:
+   (match_operand:ANYF 1 "register_operand")))
+   (set (match_dup 2)
+   (unspec: [(match_dup 2)] LSX_SCALAR_FRINT))
+   (set (match_operand:ANYF 0 "register_operand")
+   (vec_select:ANYF (match_dup 2) (parallel [(const_int 0)])))
+   (clobber (match_scratch: 3))]
+  "ISA_HAS_LSX && (flag_fp_int_builtin_inexact || !flag_trapping_math)"
+  "operands[2] = gen_reg_rtx (mode);")
+
 ;; vftint.{/rp/rz/rm}
 (define_insn
   "_vftint__"
diff --git a/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar-no-inexact.c 
b/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar-no-inexact.c
new file mode 100644
index 000..002e3b92df7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar-no-inexact.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mlsx -fno-fp-int-builtin-inexact" } */
+
+#include "vect-frint-scalar.c"
+
+/* cannot use LSX for these with -fno-fp-int-builtin-inexact,
+   call library function.  */
+/* { dg-final { scan-assembler "\tb\t%plt\\(ceil\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(ceilf\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(floor\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(floorf\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(trunc\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(truncf\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(roundeven\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(roundevenf\\)" } } */
+
+/* nearbyint is not allowed to raise FE_INEXACT for decades */
+/* { dg-final { scan-assembler "\tb\t%plt\\(nearbyint\\)" } } */
+/* { dg-final { scan-assembler "\tb\t%plt\\(nearbyintf\\)" } } */
+
+/* rint should just use basic FP operation */
+/* { dg-final { scan-assembler "\tfrint\.s" } } */
+/* { dg-final { scan-assembler "\tfrint\.d" } } */
diff --git a/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c 
b/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c
new file mode 100644
index 000..c7cb40be7d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vect-frint-scalar.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mlsx" } */
+
+#define test(func, suffix) \
+__typeof__ (1.##suffix) \
+_##func##suffix (__typeof__ (1.##suffix) x) \
+{ \
+  return __builtin_##func##suffix (x); \
+}
+
+test (ceil, f)
+test (ceil, )
+test (floor, f)
+test (floor, )
+test (trunc, f)
+test (trunc, )
+test (roundeven, f)
+test (roundeven, )
+test (nearbyint, f)
+test (nearbyint, )
+test (rint, f)
+test (rint, )
+
+/* { dg-final { scan-assembler "\tvfrintrp\.s" } } */
+/* { dg-final { scan-assembler "\tvfrintrm\.s" } } */
+/* { dg-final { scan-assembler "\tvfrintrz\.s" } } */
+/* { dg-final { scan-assembler "\tvfrintrne\.s" } } */
+/* { dg-final { scan-assembler "\tvfrintrp\.d" } } */
+/* { dg-final { scan-assembler "\tvfrintrm\.d" } } */
+/* { dg-final { scan-assembler "\tvfrintrz\.d" } } 

[RFA] New pass for sign/zero extension elimination

2023-11-19 Thread Jeff Law

This is work originally started by Joern @ Embecosm.

There's been a long standing sense that we're generating too many 
sign/zero extensions on the RISC-V port.  REE is useful, but it's really 
focused on a relatively narrow part of the extension problem.


What Joern's patch does is introduce a new pass which tracks liveness of 
chunks of pseudo regs.  Specifically it tracks bits 0..7, 8..15, 16..31 
and 32..63.


If it encounters a sign/zero extend that sets bits that are never read, 
then it replaces the sign/zero extension with a narrowing subreg.  The 
narrowing subreg usually gets eliminated by subsequent passes (it's just 
a copy after all).
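
To make the chunk idea concrete, here is a minimal sketch in C.  It is not the code from ext-dce.cc; the names chunk_of_bit, chunks_for_width and extension_is_dead are hypothetical, and a register is assumed to be at most 64 bits wide.  It maps a bit position to one of the four chunks and checks whether an extension writes only chunks that are never read.

```c
#include <assert.h>
#include <stdbool.h>

/* Four chunks per 64-bit pseudo: bits 0..7, 8..15, 16..31, 32..63.  */
enum { CHUNK_0_7 = 1 << 0, CHUNK_8_15 = 1 << 1,
       CHUNK_16_31 = 1 << 2, CHUNK_32_63 = 1 << 3 };

/* Return the chunk mask containing bit position BIT (0..63).  */
static unsigned
chunk_of_bit (unsigned bit)
{
  assert (bit < 64);
  if (bit < 8)
    return CHUNK_0_7;
  if (bit < 16)
    return CHUNK_8_15;
  if (bit < 32)
    return CHUNK_16_31;
  return CHUNK_32_63;
}

/* Return the mask of chunks covering bits 0..WIDTH-1, where WIDTH is
   8, 16, 32 or 64.  */
static unsigned
chunks_for_width (unsigned width)
{
  assert (width == 8 || width == 16 || width == 32 || width == 64);
  return chunk_of_bit (width - 1) * 2 - 1;
}

/* An extension from FROM_WIDTH bits only computes the chunks above
   FROM_WIDTH.  If none of those are in LIVE (the chunks read later),
   the extension could be replaced by a narrowing subreg.  */
static bool
extension_is_dead (unsigned from_width, unsigned live)
{
  unsigned written_by_extension = 0xf & ~chunks_for_width (from_width);
  return (live & written_by_extension) == 0;
}
```

For example, a sign extension from 32 bits whose result is only ever read through its low 32 bits has live chunks 0x7, so extension_is_dead (32, 0x7) holds.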


Jivan has done some analysis and found that it eliminates roughly 1% of 
the dynamic instruction stream for x264 as well as some redundant 
extensions in the coremark benchmark (both on rv64).  In my own testing 
as I worked through issues on other architectures I clearly saw it 
helping in various places within GCC itself or in the testsuite.


The basic structure is to first do a fairly standard liveness analysis 
on the chunks, seeding original state with the liveness data from DF. 
Once that's stable, we do a final pass to identify the useless 
extensions and transform them into narrowing subregs.


A few key points to remember.

For destination processing it is always safe to ignore a destination. 
Ignoring a destination merely means that whatever was live after the 
given insn will continue to be live before the insn.  What is not safe 
is to clear a bit in the LIVENOW bitmap for a destination chunk that is 
not set.  This comes into play with things like STRICT_LOW_PART.


For source processing the safe thing to do is to set all the chunks in a 
register as live.  It is never safe to fail to process a source operand.
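
The two safety rules can be sketched as follows.  This is an assumption-laden illustration, not the pass itself: LIVENOW is modelled as a small array of 4-bit chunk masks, and NREGS, process_dest and process_src are hypothetical names.

```c
#include <assert.h>

/* One 4-bit chunk mask per pseudo register; NREGS is an arbitrary
   size chosen for this sketch.  */
#define NREGS 8
#define ALL_CHUNKS 0xfu

/* Process a destination: clear only the chunks the insn is known to
   set.  Passing 0 for SET_CHUNKS (ignoring the destination) is always
   safe, it just leaves more chunks live.  Clearing a chunk that is not
   actually set, e.g. the upper chunks of a STRICT_LOW_PART store,
   would be wrong.  */
static void
process_dest (unsigned livenow[NREGS], unsigned regno, unsigned set_chunks)
{
  assert (regno < NREGS);
  livenow[regno] &= ~set_chunks & ALL_CHUNKS;
}

/* Process a source: mark the chunks that are read.  When in doubt pass
   ALL_CHUNKS; a source operand must never be skipped.  */
static void
process_src (unsigned livenow[NREGS], unsigned regno, unsigned used_chunks)
{
  assert (regno < NREGS);
  livenow[regno] |= used_chunks & ALL_CHUNKS;
}
```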


When a destination object is not fully live, we try to transfer that 
limited liveness to the source operands.  So for example if bits 16..63 
are dead in a destination of a PLUS, we need not mark bits 16..63 as 
live for the source operands.  We have to be careful -- consider a shift 
count on a target without SHIFT_COUNT_TRUNCATED set.  So we have both a 
list of RTL codes where we can transfer liveness and a few codes where 
one of the operands may need to be fully live (ex, a shift count) while 
the other input may not need to be fully live (value left shifted).
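
A rough sketch of that transfer, under stated assumptions: the enum op_kind and the function src_chunks_needed are hypothetical, only PLUS and a left shift are modelled, and everything else falls back to "fully live".  Because of carries, a live chunk in a PLUS result needs that chunk and every lower chunk of the inputs; for the left shift the count is kept fully live while the shifted value gets the same treatment as a PLUS input.

```c
#include <assert.h>

#define ALL_CHUNKS 0xfu

/* Hypothetical operation kinds for this sketch.  */
enum op_kind { OP_PLUS, OP_SHIFT_LEFT, OP_OTHER };

/* Return MASK with every chunk at or below its highest set chunk also
   set (4-bit masks only).  */
static unsigned
fill_below (unsigned mask)
{
  mask |= mask >> 1;
  mask |= mask >> 2;
  return mask & ALL_CHUNKS;
}

/* Given the live chunks of the destination, return the chunks that
   must be treated as live in source operand OPNO (0 or 1).  */
static unsigned
src_chunks_needed (enum op_kind op, int opno, unsigned dest_live)
{
  assert (opno == 0 || opno == 1);
  switch (op)
    {
    case OP_PLUS:
      /* Low result bits depend only on equal or lower input bits.  */
      return fill_below (dest_live);
    case OP_SHIFT_LEFT:
      /* Operand 0 is the value shifted, operand 1 the shift count.
	 The count stays fully live; the value only feeds equal or
	 higher result bits.  */
      return opno == 1 ? ALL_CHUNKS : fill_below (dest_live);
    default:
      /* Unknown operation: be conservative.  */
      return ALL_CHUNKS;
    }
}
```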


Locally we have had this enabled at -O1 and above to encourage testing, 
but I'm thinking that for the trunk enabling at -O2 and above is the 
right thing to do.


This has (of course) been tested on rv64.  It's also been bootstrapped 
and regression tested on x86.  Bootstrap and regression tested (C only) 
for m68k, sh4, sh4eb, alpha.  Earlier versions were also bootstrapped 
and regression tested on ppc, hppa and s390x (C only for those as well). 
 It's also been tested on the various crosses in my tester.  So we've 
got reasonable coverage of 16, 32 and 64 bit targets, big and little 
endian, with and without SHIFT_COUNT_TRUNCATED and all kinds of other 
oddities.


The included tests are for RISC-V only because not all targets are going 
to have extraneous extensions.  There are tests from coremark, x264 and 
GCC's bz database.  It probably wouldn't be hard to add aarch64 
testcases.  The BZs listed are improved by this patch for aarch64.


Given the amount of work Jivan and I have done, I'm not comfortable 
self-approving at this time.  I'd much rather have another set of eyes 
on the code.  Hopefully the code is documented well enough for that to 
be a useful exercise.


So, no need to work from Pago Pago for this patch.  I may make another 
attempt at the eswin conditional move work while working virtually in 
Pago Pago though.


Thoughts, comments, recommendations?

Jeff


PR target/95650
PR rtl-optimization/96031
PR rtl-optimization/104387
PR rtl-optimization/111384

gcc/
* Makefile.in (OBJS): Add ext-dce.o.
* common.opt (ext-dce): Add new option.
* df-scan.cc (df_get_exit_block_use_set): No longer static.
* df.h (df_get_exit_block_use_set): Prototype.
* ext-dce.cc: New file.
* passes.def: Add ext-dce before combine.
* tree-pass.h (make_pass_ext_dce): Prototype.

gcc/testsuite
* gcc.target/riscv/core_bench_list.c: New test.
* gcc.target/riscv/core_init_matrix.c: New test.
* gcc.target/riscv/core_list_init.c: New test.
* gcc.target/riscv/matrix_add_const.c: New test.
* gcc.target/riscv/mem-extend.c: New test.
* gcc.target/riscv/pr111384.c: New test.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 753f2f36618..af6f1415507 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1451,6 +1451,7 @@ OBJS = \
explow.o \
expmed.o \
expr.o \
+   ext-dce.o \
fibonacci_heap.o \
file-prefix-map.o \
final.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index d21db5d4a20..141dfdf14fd 10064

Re: [pushed][PATCH v2] LoongArch: Add code generation support for call36 function calls.

2023-11-19 Thread chenglulu



On 2023/11/19 at 1:24 AM, Xi Ruoyao wrote:

On Sat, 2023-11-18 at 16:16 +0800, chenglulu wrote:

Pushed to r14-5567.

On 2023/11/16 at 3:27 PM, Lulu Cheng wrote:

When compiling with '-mcmodel=medium', the function call is made through
'pcaddu18i+jirl' if binutils supports call36, otherwise the
native implementation 'pcalau12i+jirl' is used.

gcc/ChangeLog:

* config.in: Regenerate.
* config/loongarch/loongarch-opts.h (HAVE_AS_SUPPORT_CALL36): Define 
macro.
* config/loongarch/loongarch.cc (loongarch_legitimize_call_address):
If binutils supports call36, the function call is not split over expand.
* config/loongarch/loongarch.md: Add call36 generation code.
* config/loongarch/predicates.md: Likewise.
* configure: Regenerate.
* configure.ac: Check whether binutils supports call36.

With this change I get some test failures with "old" Binutils 2.41:

FAIL: gcc.target/loongarch/func-call-medium-1.c scan-assembler 
test:.*la.global\\t.*g\\n\\tjirl
FAIL: gcc.target/loongarch/func-call-medium-1.c scan-assembler 
test1:.*la.global\\t.*f\\n\\tjirl
FAIL: gcc.target/loongarch/func-call-medium-1.c scan-assembler 
test2:.*la.local\\t.*l\\n\\tjirl
FAIL: gcc.target/loongarch/func-call-medium-2.c scan-assembler 
test:.*la.global\\t.*g\\n\\tjirl
FAIL: gcc.target/loongarch/func-call-medium-2.c scan-assembler 
test1:.*la.local\\t.*f\\n\\tjirl
FAIL: gcc.target/loongarch/func-call-medium-2.c scan-assembler 
test2:.*la.local\\t.*l\\n\\tjirl
FAIL: gcc.target/loongarch/func-call-medium-3.c scan-assembler 
test2:.*la.local\\t.*l\\n\\tjirl
FAIL: gcc.target/loongarch/func-call-medium-4.c scan-assembler 
test1:.*la.local\\t.*f\\n\\tjirl
FAIL: gcc.target/loongarch/func-call-medium-4.c scan-assembler 
test2:.*la.local\\t.*l\\n\\tjirl

Something strange is happening: with -mexplicit-relocs=auto or always I
get pcalau12i + jirl as expected, but with -mexplicit-relocs=none I get
"pcaddu18i $r1,%call36(g)" and jirl.  This seems ironic (!).


Thank you for the revision.



Re: Re: [PATCH] RISC-V: Implement -mmemcpy-strategy= options[PR112537]

2023-11-19 Thread Li Xu
I've tested it and there are no issues with regression testing.

Thanks,
Li Xu



xu...@eswincomputing.com
 
From: Jeff Law
Date: 2023-11-20 05:42
To: Li Xu; gcc-patches
CC: kito.cheng; palmer; juzhe.zhong
Subject: Re: [PATCH] RISC-V: Implement -mmemcpy-strategy= options[PR112537]
 
 
On 11/16/23 22:12, Li Xu wrote:
> From: xuli
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112537
> 
> -mmemcpy-strategy=[auto|libcall|scalar|vector]
> 
> auto: Current status, use scalar or vector instructions.
> libcall: Always use a library call.
> scalar: Only use scalar instructions.
> vector: Only use vector instructions.
> 
> PR target/112537
> 
> gcc/ChangeLog:
> 
> * config/riscv/riscv-opts.h (enum riscv_stringop_strategy_enum): Strategy 
> enum.
> * config/riscv/riscv-string.cc (riscv_expand_block_move): Disabled based on 
> options.
> (expand_block_move): Ditto.
> * config/riscv/riscv.opt: Add -mmemcpy-strategy=.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/riscv/rvv/base/cpymem-strategy-1.c: New test.
> * gcc.target/riscv/rvv/base/cpymem-strategy-2.c: New test.
> * gcc.target/riscv/rvv/base/cpymem-strategy-3.c: New test.
> * gcc.target/riscv/rvv/base/cpymem-strategy-4.c: New test.
>  * gcc.target/riscv/rvv/base/cpymem-strategy-5.c: New test.
> * gcc.target/riscv/rvv/base/cpymem-strategy.h: New test.
This is OK assuming you have tested it to ensure there aren't any 
regressions in the testsuite.  I don't expect problems, but let's be 
sure :-)
 
Thanks,
jeff


Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-19 Thread David Edelsohn
On Sun, Nov 19, 2023 at 5:15 PM Bruno Haible  wrote:

> David Edelsohn wrote:
> > --disable-threads currently does not completely disable threads.  Bruno
> is
> > suggesting --enable-threads=isoc that relies on mtx mutex functions in
> libc.
>
> Unfortunately, as said in the other mail today, relying only on mtx_*
> functions
> did not drop the dependency towards libpthreads.
>
> So, I've made a new release gettext-0.22.4, that includes only these
> changes:
>
>   - AM_GNU_GETTEXT now recognizes a statically built libintl on macOS and
> AIX.
>
>   - Passing --disable-threads now builds a libintl that, on AIX, does not
> need -lpthread.
>
>   - Other build fixes on AIX.
>
> > Yes, GCC should configure the in tree gettext with --disable-threads, but
> > that configure option is not completely effective and does not produce a
> > build without threads references.
>
> Now it is effective. But you (Arsen) should state in the documentation
> (gcc/doc/install.texi) that for --disable-threads to have this effect,
> one needs gettext version 0.22.4 or newer.
>

So the question is do we want to change GCC on AIX to always link against
pthreads so that GCC can build with default, external builds of gettext
libintl.  I don't see a path for i18n support to work for GCC on AIX
without that unfortunate change.

Thanks, David


Re: [RFA] New pass for sign/zero extension elimination

2023-11-19 Thread Oleg Endo


On Sun, 2023-11-19 at 17:47 -0700, Jeff Law wrote:
> This is work originally started by Joern @ Embecosm.
> 
> There's been a long standing sense that we're generating too many 
> sign/zero extensions on the RISC-V port.  REE is useful, but it's really 
> focused on a relatively narrow part of the extension problem.
> 
> What Joern's patch does is introduce a new pass which tracks liveness of 
> chunks of pseudo regs.  Specifically it tracks bits 0..7, 8..15, 16..31 
> and 32..63.
> 
> If it encounters a sign/zero extend that sets bits that are never read, 
> then it replaces the sign/zero extension with a narrowing subreg.  The 
> narrowing subreg usually gets eliminated by subsequent passes (it's just 
> a copy after all).
> 

Have you tried it on SH, too?  (and if so any numbers?)

It sounds like this one would be great to remove some of the sign/zero
extension removal hackery that I've accumulated in the SH backend.

Cheers,
Oleg


Re: [pushed][PATCH v2] LoongArch: Add code generation support for call36 function calls.

2023-11-19 Thread Xi Ruoyao
On Mon, 2023-11-20 at 09:09 +0800, chenglulu wrote:
> 
> On 2023/11/19 at 1:24 AM, Xi Ruoyao wrote:
> > On Sat, 2023-11-18 at 16:16 +0800, chenglulu wrote:
> > > Pushed to r14-5567.
> > > 
> > > On 2023/11/16 at 3:27 PM, Lulu Cheng wrote:
> > > > When compiling with '-mcmodel=medium', the function call is made through
> > > > 'pcaddu18i+jirl' if binutils supports call36, otherwise the
> > > > native implementation 'pcalau12i+jirl' is used.
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > > * config.in: Regenerate.
> > > > * config/loongarch/loongarch-opts.h (HAVE_AS_SUPPORT_CALL36): 
> > > > Define macro.
> > > > * config/loongarch/loongarch.cc 
> > > > (loongarch_legitimize_call_address):
> > > > If binutils supports call36, the function call is not split 
> > > > over expand.
> > > > * config/loongarch/loongarch.md: Add call36 generation code.
> > > > * config/loongarch/predicates.md: Likewise.
> > > > * configure: Regenerate.
> > > > * configure.ac: Check whether binutils supports call36.
> > With this change I get some test failures with "old" Binutils 2.41:
> > 
> > FAIL: gcc.target/loongarch/func-call-medium-1.c scan-assembler 
> > test:.*la.global\\t.*g\\n\\tjirl
> > FAIL: gcc.target/loongarch/func-call-medium-1.c scan-assembler 
> > test1:.*la.global\\t.*f\\n\\tjirl
> > FAIL: gcc.target/loongarch/func-call-medium-1.c scan-assembler 
> > test2:.*la.local\\t.*l\\n\\tjirl
> > FAIL: gcc.target/loongarch/func-call-medium-2.c scan-assembler 
> > test:.*la.global\\t.*g\\n\\tjirl
> > FAIL: gcc.target/loongarch/func-call-medium-2.c scan-assembler 
> > test1:.*la.local\\t.*f\\n\\tjirl
> > FAIL: gcc.target/loongarch/func-call-medium-2.c scan-assembler 
> > test2:.*la.local\\t.*l\\n\\tjirl
> > FAIL: gcc.target/loongarch/func-call-medium-3.c scan-assembler 
> > test2:.*la.local\\t.*l\\n\\tjirl
> > FAIL: gcc.target/loongarch/func-call-medium-4.c scan-assembler 
> > test1:.*la.local\\t.*f\\n\\tjirl
> > FAIL: gcc.target/loongarch/func-call-medium-4.c scan-assembler 
> > test2:.*la.local\\t.*l\\n\\tjirl
> > 
> > Something strange is happening: with -mexplicit-relocs=auto or always I
> > get pcalau12i + jirl as expected, but with -mexplicit-relocs=none I get
> > "pcaddu18i $r1,%call36(g)" and jirl.  This seems ironic (!).
> > 
> Thank you for the revision.

Then I'll push
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637153.html if
this is an approval?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: Re: [PATCH] RISC-V: Implement -mmemcpy-strategy= options[PR112537]

2023-11-19 Thread juzhe.zh...@rivai.ai
Jeff has approved your patch.
You can commit it now.

Btw, CC Robin to let him know about this patch, since he will support
the strcpy/strlen etc. builtins with RVV instruction sequences.
He will definitely need a compile option like the one this patch introduces.

Thanks.


juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-11-20 09:09
To: Jeff Law; gcc-patches
CC: kito.cheng; palmer; juzhe.zhong
Subject: Re: Re: [PATCH] RISC-V: Implement -mmemcpy-strategy= options[PR112537]
I've tested it and there are no issues with regression testing.

Thanks,
Li Xu



xu...@eswincomputing.com
 
From: Jeff Law
Date: 2023-11-20 05:42
To: Li Xu; gcc-patches
CC: kito.cheng; palmer; juzhe.zhong
Subject: Re: [PATCH] RISC-V: Implement -mmemcpy-strategy= options[PR112537]
 
 
On 11/16/23 22:12, Li Xu wrote:
> From: xuli
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112537
> 
> -mmemcpy-strategy=[auto|libcall|scalar|vector]
> 
> auto: Current status, use scalar or vector instructions.
> libcall: Always use a library call.
> scalar: Only use scalar instructions.
> vector: Only use vector instructions.
> 
> PR target/112537
> 
> gcc/ChangeLog:
> 
> * config/riscv/riscv-opts.h (enum riscv_stringop_strategy_enum): Strategy 
> enum.
> * config/riscv/riscv-string.cc (riscv_expand_block_move): Disabled based on 
> options.
> (expand_block_move): Ditto.
> * config/riscv/riscv.opt: Add -mmemcpy-strategy=.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/riscv/rvv/base/cpymem-strategy-1.c: New test.
> * gcc.target/riscv/rvv/base/cpymem-strategy-2.c: New test.
> * gcc.target/riscv/rvv/base/cpymem-strategy-3.c: New test.
> * gcc.target/riscv/rvv/base/cpymem-strategy-4.c: New test.
>  * gcc.target/riscv/rvv/base/cpymem-strategy-5.c: New test.
> * gcc.target/riscv/rvv/base/cpymem-strategy.h: New test.
This is OK assuming you have tested it to ensure there aren't any 
regressions in the testsuite.  I don't expect problems, but let's be 
sure :-)
 
Thanks,
jeff


Re: [pushed][PATCH v2] LoongArch: Add code generation support for call36 function calls.

2023-11-19 Thread chenglulu



On 2023/11/20 at 9:51 AM, Xi Ruoyao wrote:

On Mon, 2023-11-20 at 09:09 +0800, chenglulu wrote:

On 2023/11/19 at 1:24 AM, Xi Ruoyao wrote:

On Sat, 2023-11-18 at 16:16 +0800, chenglulu wrote:

Pushed to r14-5567.

On 2023/11/16 at 3:27 PM, Lulu Cheng wrote:

When compiling with '-mcmodel=medium', the function call is made through
'pcaddu18i+jirl' if binutils supports call36, otherwise the
native implementation 'pcalau12i+jirl' is used.

gcc/ChangeLog:

* config.in: Regenerate.
* config/loongarch/loongarch-opts.h (HAVE_AS_SUPPORT_CALL36): Define 
macro.
* config/loongarch/loongarch.cc (loongarch_legitimize_call_address):
If binutils supports call36, the function call is not split over expand.
* config/loongarch/loongarch.md: Add call36 generation code.
* config/loongarch/predicates.md: Likewise.
* configure: Regenerate.
* configure.ac: Check whether binutils supports call36.

With this change I get some test failures with "old" Binutils 2.41:

FAIL: gcc.target/loongarch/func-call-medium-1.c scan-assembler 
test:.*la.global\\t.*g\\n\\tjirl
FAIL: gcc.target/loongarch/func-call-medium-1.c scan-assembler 
test1:.*la.global\\t.*f\\n\\tjirl
FAIL: gcc.target/loongarch/func-call-medium-1.c scan-assembler 
test2:.*la.local\\t.*l\\n\\tjirl
FAIL: gcc.target/loongarch/func-call-medium-2.c scan-assembler 
test:.*la.global\\t.*g\\n\\tjirl
FAIL: gcc.target/loongarch/func-call-medium-2.c scan-assembler 
test1:.*la.local\\t.*f\\n\\tjirl
FAIL: gcc.target/loongarch/func-call-medium-2.c scan-assembler 
test2:.*la.local\\t.*l\\n\\tjirl
FAIL: gcc.target/loongarch/func-call-medium-3.c scan-assembler 
test2:.*la.local\\t.*l\\n\\tjirl
FAIL: gcc.target/loongarch/func-call-medium-4.c scan-assembler 
test1:.*la.local\\t.*f\\n\\tjirl
FAIL: gcc.target/loongarch/func-call-medium-4.c scan-assembler 
test2:.*la.local\\t.*l\\n\\tjirl

Something strange is happening: with -mexplicit-relocs=auto or always I
get pcalau12i + jirl as expected, but with -mexplicit-relocs=none I get
"pcaddu18i $r1,%call36(g)" and jirl.  This seems ironic (!).


Thank you for the revision.

Then I'll push
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637153.html if
this is an approval?


OK. Thanks.



Re: [committed] libstdc++: Fix aligned formatting of stacktrace_entry and thread::id [PR112564]

2023-11-19 Thread Hans-Peter Nilsson
> From: Jonathan Wakely 
> Date: Thu, 16 Nov 2023 17:20:09 +

>   PR libstdc++/112564
>   * include/std/stacktrace (formatter::format): Format according
>   to format-spec.
>   * include/std/thread (formatter::format): Use _Align_right as
>   default.
>   * testsuite/19_diagnostics/stacktrace/output.cc: Check
>   fill-and-align handling. Change compile test to run.
>   * testsuite/30_threads/thread/id/output.cc: Check fill-and-align
>   handling.

You already know this, so JFTR: this introduced a regression
for some targets, logged as PR112630.

Was this change deliberate:

> --- a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/output.cc
> +++ b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/output.cc
> @@ -1,4 +1,5 @@
> -// { dg-do compile { target c++23 } }
> +// { dg-options "-lstdc++exp" }
> +// { dg-do run { target c++23 } }
>  // { dg-require-effective-target stacktrace }
>  // { dg-add-options no_pch }

i.e. changing from dg-compile to dg-run?

I'm guessing so.  Though the changelog entry and post aren't
explicit, the use of VERIFY is rather clear and most tests
in 19_diagnostics/stacktrace are dg-run.

If so, can the "dg-run-ness" of the test please move to a
separate test and let 19_diagnostics/stacktrace/output.cc be
just dg-compile?  This particular test may not warrant the
consideration, but it would set a pattern to follow for other
tests.

brgds, H-P
PS. Sorry, I have no idea what causes the underlying multi-target problem.


Re: [RFA] New pass for sign/zero extension elimination

2023-11-19 Thread Xi Ruoyao
On Sun, 2023-11-19 at 17:47 -0700, Jeff Law wrote:
> This is work originally started by Joern @ Embecosm.
> 
> There's been a long standing sense that we're generating too many 
> sign/zero extensions on the RISC-V port.  REE is useful, but it's really 
> focused on a relatively narrow part of the extension problem.
> 
> What Joern's patch does is introduce a new pass which tracks liveness of 
> chunks of pseudo regs.  Specifically it tracks bits 0..7, 8..15, 16..31 
> and 32..63.
> 
> If it encounters a sign/zero extend that sets bits that are never read, 
> then it replaces the sign/zero extension with a narrowing subreg.  The
> narrowing subreg usually gets eliminated by subsequent passes (it's just 
> a copy after all).
> 
> Jivan has done some analysis and found that it eliminates roughly 1% of 
> the dynamic instruction stream for x264 as well as some redundant 
> extensions in the coremark benchmark (both on rv64).  In my own testing 
> as I worked through issues on other architectures I clearly saw it 
> helping in various places within GCC itself or in the testsuite.
> 
> The basic structure is to first do a fairly standard liveness analysis
> on the chunks, seeding original state with the liveness data from DF. 
> Once that's stable, we do a final pass to identify the useless 
> extensions and transform them into narrowing subregs.
> 
> A few key points to remember.
> 
> For destination processing it is always safe to ignore a destination. 
> Ignoring a destination merely means that whatever was live after the 
> given insn will continue to be live before the insn.  What is not safe
> is to clear a bit in the LIVENOW bitmap for a destination chunk that is 
> not set.  This comes into play with things like STRICT_LOW_PART.
> 
> For source processing the safe thing to do is to set all the chunks in a 
> register as live.  It is never safe to fail to process a source operand.
> 
> When a destination object is not fully live, we try to transfer that 
> limited liveness to the source operands.  So for example if bits 16..63 
> are dead in a destination of a PLUS, we need not mark bits 16..63 as 
> live for the source operands.  We have to be careful -- consider a shift 
> count on a target without SHIFT_COUNT_TRUNCATED set.  So we have both a 
> list of RTL codes where we can transfer liveness and a few codes where
> one of the operands may need to be fully live (ex, a shift count) while 
> the other input may not need to be fully live (value left shifted).
> 
> Locally we have had this enabled at -O1 and above to encourage testing, 
> but I'm thinking that for the trunk enabling at -O2 and above is the 
> right thing to do.
> 
> This has (of course) been tested on rv64.  It's also been bootstrapped
> and regression tested on x86.  Bootstrap and regression tested (C only) 
> for m68k, sh4, sh4eb, alpha.  Earlier versions were also bootstrapped 
> and regression tested on ppc, hppa and s390x (C only for those as well). 
>   It's also been tested on the various crosses in my tester.  So we've
> got reasonable coverage of 16, 32 and 64 bit targets, big and little 
> endian, with and without SHIFT_COUNT_TRUNCATED and all kinds of other 
> oddities.
> 
> The included tests are for RISC-V only because not all targets are going 
> to have extraneous extensions.   There's tests from coremark, x264 and
> GCC's bz database.  It probably wouldn't be hard to add aarch64 
> testcases.  The BZs listed are improved by this patch for aarch64.
> 
> Given the amount of work Jivan and I have done, I'm not comfortable 
> self-approving at this time.  I'd much rather have another set of eyes
> on the code.  Hopefully the code is documented well enough for that to
> be a useful exercise.
> 
> So, no need to work from Pago Pago for this patch.  I may make another
> attempt at the eswin conditional move work while working virtually in 
> Pago Pago though.
> 
> Thoughts, comments, recommendations?

Unfortunately, I get some ICE building stage 1 libgcc with this patch on
loongarch64-linux-gnu:

during RTL pass: ext_dce
../../../gcc/libgcc/libgcc2.c: In function ‘__absvdi2’:
../../../gcc/libgcc/libgcc2.c:224:1: internal compiler error: Segmentation fault
  224 | }
  | ^
0x120baa477 crash_signal
../../gcc/gcc/toplev.cc:316
0x1216aeeb4 ext_dce_process_sets
../../gcc/gcc/ext-dce.cc:128
0x1216afbaf ext_dce_process_bb
../../gcc/gcc/ext-dce.cc:647
0x1216afbaf ext_dce
../../gcc/gcc/ext-dce.cc:802
0x1216afbaf execute
../../gcc/gcc/ext-dce.cc:868
Please submit a full bug report, with preprocessed source (by using 
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re:[pushed and backport] [PATCH] LoongArch: Modify MUSL_DYNAMIC_LINKER.

2023-11-19 Thread chenglulu

Pushed to trunk as r14-5601.

Backported as r13-8085 and r12-9995.

The r12 and r13 backports also include the patch that changed '/lib64'
to '/lib'.


On 2023/11/18 at 11:15 AM, Lulu Cheng wrote:

Use no suffix at all in the musl dynamic linker name for hard
float ABI. Use -sf and -sp suffixes in musl dynamic linker name
for soft float and single precision ABIs. The following table
outlines the musl interpreter names for the LoongArch64 ABI names.

musl interpreter            | LoongArch64 ABI
--------------------------- | -----------------
ld-musl-loongarch64.so.1    | loongarch64-lp64d
ld-musl-loongarch64-sp.so.1 | loongarch64-lp64f
ld-musl-loongarch64-sf.so.1 | loongarch64-lp64s
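
As a rough illustration of the table above (not part of the patch), the
mapping from ABI name to interpreter file name could be modeled like this.
The function name musl_interpreter_for is hypothetical; only the suffixes
and file names are taken from the table.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the musl interpreter file name for a
// LoongArch64 ABI given as "lp64d", "lp64f" or "lp64s".
std::string musl_interpreter_for(const std::string &abi) {
  std::string suffix;
  if (abi == "lp64d")
    suffix = "";      // hard float: no suffix at all
  else if (abi == "lp64f")
    suffix = "-sp";   // single precision
  else if (abi == "lp64s")
    suffix = "-sf";   // soft float
  else
    throw std::invalid_argument("unknown LoongArch64 ABI: " + abi);
  return "ld-musl-loongarch64" + suffix + ".so.1";
}
```

For example, musl_interpreter_for("lp64f") yields the -sp name from the
second row of the table.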

gcc/ChangeLog:

* config/loongarch/gnu-user.h (MUSL_ABI_SPEC): Modify suffix.
---
  gcc/config/loongarch/gnu-user.h | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/loongarch/gnu-user.h b/gcc/config/loongarch/gnu-user.h
index 9616d6e8a0b..e9f4bcef1d4 100644
--- a/gcc/config/loongarch/gnu-user.h
+++ b/gcc/config/loongarch/gnu-user.h
@@ -34,9 +34,9 @@ along with GCC; see the file COPYING3.  If not see
"/lib" ABI_GRLEN_SPEC "/ld-linux-loongarch-" ABI_SPEC ".so.1"
  
  #define MUSL_ABI_SPEC \

-  "%{mabi=lp64d:-lp64d}" \
-  "%{mabi=lp64f:-lp64f}" \
-  "%{mabi=lp64s:-lp64s}"
+  "%{mabi=lp64d:}" \
+  "%{mabi=lp64f:-sp}" \
+  "%{mabi=lp64s:-sf}"
  
  #undef MUSL_DYNAMIC_LINKER

  #define MUSL_DYNAMIC_LINKER \




gcc-patches@gcc.gnu.org

2023-11-19 Thread Alexandre Oliva


I got spurious fails of tests that required arm_thumb1_movt_ok on a
target cpu that did not support movt.  Looking into it, I found the
arm_movt property to have been cut&pasted into various procs that
checked for different properties.  They shouldn't share the same test
results cache entry, so I'm changing their prop names.
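
To make the failure mode concrete, here is a toy model of a results cache
keyed by property name.  It is only a sketch: ResultCache and check are
hypothetical names, and the real logic lives in the Tcl procs of
target-supports.exp, not in C++.  It shows how a second probe that reuses
the key "arm_movt" never runs and simply inherits the first answer.

```cpp
#include <functional>
#include <map>
#include <string>

// Toy stand-in for a per-property results cache.
class ResultCache {
  std::map<std::string, bool> results_;

public:
  // Runs `probe` only the first time `prop` is seen; later calls with the
  // same key return the stored answer even if the probe is different.
  bool check(const std::string &prop, const std::function<bool()> &probe) {
    auto it = results_.find(prop);
    if (it != results_.end())
      return it->second;
    bool ok = probe();
    results_[prop] = ok;
    return ok;
  }
};
```

With distinct keys such as "arm_cbz" and "arm_v6t2_hw", each probe gets its
own cache entry, which is what the prop name changes below achieve.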

Regstrapped on x86_64-linux-gnu, also tested on arm-eabi with default
cpu on trunk, and with tms570 on gcc-13.  Ok to install?


for  gcc/testsuite/ChangeLog

* lib/target-supports.exp
(check_effective_target_arm_thumb1_cbz_ok): Fix prop name
cut&pasto.
(check_effective_target_arm_arch_v6t2_hw_ok): Likewise.
---
 gcc/testsuite/lib/target-supports.exp |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 240a3815d38a7..e3519207d0e61 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5610,7 +5610,7 @@ proc check_effective_target_arm_thumb1_movt_ok {} {
 
 proc check_effective_target_arm_thumb1_cbz_ok {} {
 if [check_effective_target_arm_thumb1_ok] {
-   return [check_no_compiler_messages arm_movt object {
+   return [check_no_compiler_messages arm_cbz object {
int
foo (void)
{
@@ -5627,7 +5627,7 @@ proc check_effective_target_arm_thumb1_cbz_ok {} {
 
 proc check_effective_target_arm_arch_v6t2_hw_ok {} {
 if [check_effective_target_arm_thumb1_ok] {
-   return [check_no_compiler_messages arm_movt object {
+   return [check_no_compiler_messages arm_v6t2_hw object {
int
main (void)
{

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


[PATCH #2/4] c++: mark short-enums as packed

2023-11-19 Thread Alexandre Oliva


Unlike C, C++ doesn't mark enums shortened by -fshort-enums as packed.
This makes for undesirable warning differences between C and C++,
e.g. c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early*.c
triggers a warning about a type cast from a pointer to enum that, when
packed, might not be sufficiently aligned.

This change is not enough for that warning to trigger.  The tree
expression generated by the C++ front-end is also a little too
complicated for us to get to the base pointer.  A separate patch takes
care of that.
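
For readers unfamiliar with the warning, the alignment mismatch can be
sketched as follows.  This is an illustration under assumptions: a fixed
one-byte underlying type stands in for what -fshort-enums would choose, and
the struct layout is made up rather than copied from the testcase.

```cpp
// Stand-in for an enum shortened by -fshort-enums: the fixed underlying
// type forces a one-byte representation (assumption for this sketch).
enum obj_type : unsigned char { OBJ_TYPE_NONE, OBJ_TYPE_CONN };

// Hypothetical containing struct; the long member gives it a stricter
// alignment requirement than the one-byte enum.
struct connection {
  long flags;
  obj_type type;
};

// A pointer to the enum promises only byte alignment, so casting it (or an
// address derived from it) to connection* may yield an underaligned pointer.
constexpr bool cast_may_be_underaligned =
    alignof(obj_type) < alignof(connection);

static_assert(sizeof(obj_type) == 1, "enum occupies a single byte");
```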

Regstrapped on x86_64-linux-gnu, also tested on arm-eabi with default
cpu on trunk, and with tms570 on gcc-13.  Ok to install?


for  gcc/cp/ChangeLog

* decl.cc (finish_enum_value_list): Set TYPE_PACKED if
use_short_enum, and propagate it to variants.
---
 gcc/cp/decl.cc |6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 038c5ab71f201..f6d5645d5080f 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -16881,6 +16881,12 @@ finish_enum_value_list (tree enumtype)
   /* If -fstrict-enums, still constrain TYPE_MIN/MAX_VALUE.  */
   if (flag_strict_enums)
set_min_and_max_values_for_integral_type (enumtype, precision, sgn);
+
+  if (use_short_enum)
+   {
+ TYPE_PACKED (enumtype) = use_short_enum;
+ fixup_attribute_variants (enumtype);
+   }
 }
   else
 underlying_type = ENUM_UNDERLYING_TYPE (enumtype);


-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


Re: [PATCH #3/4] warn on cast of pointer to packed plus constant

2023-11-19 Thread Alexandre Oliva


c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early.c
gets an unaligned pointer value warning on -fshort-enums targets in C,
but not in C++.  The former simplifies the offset-and-cast expression
enough that check_and_warn_address_or_pointer_of_packed_member finds
no more than a type cast of the base pointer, but in C++, the entire
expression, with cast, constant offsetting, and cast again, is
retained, and that's too much for the warning code.

Or rather it was.  It's easy enough to take the base pointer from
POINTER_PLUS_EXPR, and a constant offset can't possibly increase
alignment for just any pointer of laxer alignment, so we can safely
disregard the offset.

This should improve the warning even in C, if the packed enum is at a
nonzero offset into the containing struct.
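
The stripping step can be sketched on a toy expression tree.  Everything
here is hypothetical (Kind, Expr, strip_to_base); it only mirrors the shape
of the loop in the diff below, where comma expressions are replaced by their
result and pointer-plus-constant nodes by their pointer operand until
nothing changes.

```cpp
// Toy expression node; not GCC's tree representation.
enum class Kind { Base, Compound, PointerPlus };

struct Expr {
  Kind kind;
  const Expr *op0 = nullptr;  // PointerPlus: the pointer operand
  const Expr *op1 = nullptr;  // Compound: the result; PointerPlus: the offset
  bool constant = false;      // true for constant operands
};

// Peel comma expressions and constant offsets until a fixed point is
// reached.  A non-constant offset stops the walk.
const Expr *strip_to_base(const Expr *rhs) {
  const Expr *prev;
  do {
    prev = rhs;
    while (rhs->kind == Kind::Compound)
      rhs = rhs->op1;
    while (rhs->kind == Kind::PointerPlus && rhs->op1->constant)
      rhs = rhs->op0;
  } while (prev != rhs);
  return rhs;
}
```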

Regstrapped on x86_64-linux-gnu, also tested on arm-eabi with default
cpu on trunk, and with tms570 on gcc-13.  Ok to install?


for  gcc/c-family/ChangeLog

* c-warn.cc
(check_and_warn_address_or_pointer_of_packed_member): Take the
base pointer from POINTER_PLUS_EXPR when the addend is
constant.
---
 gcc/c-family/c-warn.cc |   16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/gcc/c-family/c-warn.cc b/gcc/c-family/c-warn.cc
index d2938b91043d3..2ef73d7088f22 100644
--- a/gcc/c-family/c-warn.cc
+++ b/gcc/c-family/c-warn.cc
@@ -3108,10 +3108,20 @@ check_and_warn_address_or_pointer_of_packed_member (tree type, tree rhs)
 
   do
 {
-  while (TREE_CODE (rhs) == COMPOUND_EXPR)
-   rhs = TREE_OPERAND (rhs, 1);
-  orig_rhs = rhs;
+  do
+   {
+ orig_rhs = rhs;
+ while (TREE_CODE (rhs) == COMPOUND_EXPR)
+   rhs = TREE_OPERAND (rhs, 1);
+ /* Constants can't increase the alignment.  */
+ while (TREE_CODE (rhs) == POINTER_PLUS_EXPR
+&& TREE_CONSTANT (TREE_OPERAND (rhs, 1)))
+   rhs = TREE_OPERAND (rhs, 0);
+   }
+  while (orig_rhs != rhs);
   STRIP_NOPS (rhs);
+  while (TREE_CODE (rhs) == VIEW_CONVERT_EXPR)
+   rhs = TREE_OPERAND (rhs, 0);
   nop_p |= orig_rhs != rhs;
 }
   while (orig_rhs != rhs);

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


[PATCH #4/4] testsuite: discard c++ exclusion on underaligned pointer warning

2023-11-19 Thread Alexandre Oliva


Having extended check_and_warn_address_or_pointer_of_packed_member to
find the packed (short) enum pointer in the cast expression coming
from the C++ front-end, and amended the C++ front end to mark short
enums as TYPE_PACKED, C++ issues the same warning that C does for
c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early.c,
so drop the exclusion.

Regstrapped on x86_64-linux-gnu, also tested on arm-eabi with default
cpu on trunk, and with tms570 on gcc-13.  Ok to install?


for  gcc/testsuite/ChangeLog

* c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c:
Expect warning in C++ as well.
* c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early.c:
Likewise.
---
 ...-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c |2 +-
 ...ull-deref-pr108251-smp_fetch_ssl_fc_has_early.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c b/gcc/testsuite/c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c
index aaa2031b6dca4..bf5bf5cc2e278 100644
--- a/gcc/testsuite/c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c
+++ b/gcc/testsuite/c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early-O2.c
@@ -61,7 +61,7 @@ static inline enum obj_type obj_type(const enum obj_type *t)
 }
 static inline struct connection *__objt_conn(enum obj_type *t)
 {
- return ((struct connection *)(((char *)(t)) - ((long)&((struct connection *)0)->obj_type))); /* { dg-warning "unaligned pointer value" "warning" { target { short_enums && { ! c++ } } } } */
+ return ((struct connection *)(((char *)(t)) - ((long)&((struct connection *)0)->obj_type))); /* { dg-warning "unaligned pointer value" "warning" { target short_enums } } */
 }
 static inline struct connection *objt_conn(enum obj_type *t)
 {
diff --git a/gcc/testsuite/c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early.c b/gcc/testsuite/c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early.c
index 6c96f5a76ef1c..7c2710c64d35e 100644
--- a/gcc/testsuite/c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early.c
+++ b/gcc/testsuite/c-c++-common/analyzer/null-deref-pr108251-smp_fetch_ssl_fc_has_early.c
@@ -60,7 +60,7 @@ static inline enum obj_type obj_type(const enum obj_type *t)
 }
 static inline struct connection *__objt_conn(enum obj_type *t)
 {
- return ((struct connection *)(((char *)(t)) - ((long)&((struct connection *)0)->obj_type))); /* { dg-warning "unaligned pointer value" "warning" { target { short_enums && { ! c++ } } } } */
+ return ((struct connection *)(((char *)(t)) - ((long)&((struct connection *)0)->obj_type))); /* { dg-warning "unaligned pointer value" "warning" { target short_enums } } */
 }
 static inline struct connection *objt_conn(enum obj_type *t)
 {

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


Re: [RFA] New pass for sign/zero extension elimination

2023-11-19 Thread Jeff Law




On 11/19/23 18:22, Oleg Endo wrote:
> On Sun, 2023-11-19 at 17:47 -0700, Jeff Law wrote:
> > This is work originally started by Joern @ Embecosm.
> > 
> > There's been a long standing sense that we're generating too many
> > sign/zero extensions on the RISC-V port.  REE is useful, but it's really
> > focused on a relatively narrow part of the extension problem.
> > 
> > What Joern's patch does is introduce a new pass which tracks liveness of
> > chunks of pseudo regs.  Specifically it tracks bits 0..7, 8..15, 16..31
> > and 32..63.
> > 
> > If it encounters a sign/zero extend that sets bits that are never read,
> > then it replaces the sign/zero extension with a narrowing subreg.  The
> > narrowing subreg usually gets eliminated by subsequent passes (it's just
> > a copy after all).
> 
> Have you tried it on SH, too?  (and if so any numbers?)

Just bootstrap with C regression testing on sh4/sh4eb.  No data on
improvements.


Jeff


Re: Re: [PATCH] RISC-V: Implement -mmemcpy-strategy= options[PR112537]

2023-11-19 Thread Li Xu
Committed, thanks jeff and juzhe.

Thanks,
Li Xu


xu...@eswincomputing.com
 
From: juzhe.zh...@rivai.ai
Date: 2023-11-20 09:55
To: Li Xu; jeffreyalaw; gcc-patches
CC: kito.cheng; palmer; Robin Dapp
Subject: Re: Re: [PATCH] RISC-V: Implement -mmemcpy-strategy= options[PR112537]
Jeff has approved your patch.
You can commit it now.

Btw, CC Robin to let him know about this patch, since he will support the
strcpy/strlen etc. builtins with RVV instruction sequences.
He will definitely need a compile option like the one this patch introduces.

Thanks.


juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-11-20 09:09
To: Jeff Law; gcc-patches
CC: kito.cheng; palmer; juzhe.zhong
Subject: Re: Re: [PATCH] RISC-V: Implement -mmemcpy-strategy= options[PR112537]
I've tested it and there are no issues with regression testing.

Thanks,
Li Xu



xu...@eswincomputing.com
 
From: Jeff Law
Date: 2023-11-20 05:42
To: Li Xu; gcc-patches
CC: kito.cheng; palmer; juzhe.zhong
Subject: Re: [PATCH] RISC-V: Implement -mmemcpy-strategy= options[PR112537]
 
 
On 11/16/23 22:12, Li Xu wrote:
> From: xuli
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112537
> 
> -mmemcpy-strategy=[auto|libcall|scalar|vector]
> 
> auto: Current status, use scalar or vector instructions.
> libcall: Always use a library call.
> scalar: Only use scalar instructions.
> vector: Only use vector instructions.
> 
> PR target/112537
> 
> gcc/ChangeLog:
> 
> * config/riscv/riscv-opts.h (enum riscv_stringop_strategy_enum): Strategy 
> enum.
> * config/riscv/riscv-string.cc (riscv_expand_block_move): Disabled based on 
> options.
> (expand_block_move): Ditto.
> * config/riscv/riscv.opt: Add -mmemcpy-strategy=.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/riscv/rvv/base/cpymem-strategy-1.c: New test.
> * gcc.target/riscv/rvv/base/cpymem-strategy-2.c: New test.
> * gcc.target/riscv/rvv/base/cpymem-strategy-3.c: New test.
> * gcc.target/riscv/rvv/base/cpymem-strategy-4.c: New test.
>  * gcc.target/riscv/rvv/base/cpymem-strategy-5.c: New test.
> * gcc.target/riscv/rvv/base/cpymem-strategy.h: New test.
This is OK assuming you have tested it to ensure there aren't any 
regressions in the testsuite.  I don't expect problems, but let's be 
sure :-)
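
For readers skimming the thread, the four -mmemcpy-strategy= values quoted
above can be modeled roughly as below.  This is a sketch only: the enum,
parse_memcpy_strategy and the two predicates are hypothetical names, not the
identifiers used in riscv-opts.h or riscv-string.cc.

```cpp
#include <string>

// Hypothetical model of the four option values.
enum class MemcpyStrategy { Auto, Libcall, Scalar, Vector };

// Parses the text after "-mmemcpy-strategy="; returns false if unknown.
bool parse_memcpy_strategy(const std::string &arg, MemcpyStrategy &out) {
  if (arg == "auto")
    out = MemcpyStrategy::Auto;
  else if (arg == "libcall")
    out = MemcpyStrategy::Libcall;
  else if (arg == "scalar")
    out = MemcpyStrategy::Scalar;
  else if (arg == "vector")
    out = MemcpyStrategy::Vector;
  else
    return false;
  return true;
}

// "auto" keeps the current behaviour (scalar or vector); "libcall" allows
// neither inline expansion.
bool may_use_scalar(MemcpyStrategy s) {
  return s == MemcpyStrategy::Auto || s == MemcpyStrategy::Scalar;
}

bool may_use_vector(MemcpyStrategy s) {
  return s == MemcpyStrategy::Auto || s == MemcpyStrategy::Vector;
}
```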
 
Thanks,
jeff


Re: [RFA] New pass for sign/zero extension elimination

2023-11-19 Thread Jeff Law




On 11/19/23 19:23, Xi Ruoyao wrote:

On Sun, 2023-11-19 at 17:47 -0700, Jeff Law wrote:

This is work originally started by Joern @ Embecosm.

There's been a long standing sense that we're generating too many
sign/zero extensions on the RISC-V port.  REE is useful, but it's really
focused on a relatively narrow part of the extension problem.

What Joern's patch does is introduce a new pass which tracks liveness of
chunks of pseudo regs.  Specifically it tracks bits 0..7, 8..15, 16..31
and 32..63.

If it encounters a sign/zero extend that sets bits that are never read,
then it replaces the sign/zero extension with a narrowing subreg.  The
narrowing subreg usually gets eliminated by subsequent passes (it's just
a copy after all).

Jivan has done some analysis and found that it eliminates roughly 1% of
the dynamic instruction stream for x264 as well as some redundant
extensions in the coremark benchmark (both on rv64).  In my own testing
as I worked through issues on other architectures I clearly saw it
helping in various places within GCC itself or in the testsuite.

The basic structure is to first do a fairly standard liveness analysis
on the chunks, seeding original state with the liveness data from DF.
Once that's stable, we do a final pass to identify the useless
extensions and transform them into narrowing subregs.
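
The chunk bookkeeping described so far can be sketched as a small bitmask
model.  This is not code from the patch: the enum values and both helper
functions are hypothetical, and a 64-bit register is assumed.  It shows why
an extension becomes removable once none of the chunks it defines is live
afterwards.

```cpp
// Hypothetical encoding: one flag per tracked bit range of a 64-bit pseudo.
enum Chunk : unsigned {
  CHUNK_0_7 = 1u << 0,
  CHUNK_8_15 = 1u << 1,
  CHUNK_16_31 = 1u << 2,
  CHUNK_32_63 = 1u << 3,
};

// Chunks covering the low `bits` bits (bits is expected to be 8, 16, 32
// or 64).
unsigned chunks_for_width(unsigned bits) {
  unsigned mask = CHUNK_0_7;
  if (bits > 8)
    mask |= CHUNK_8_15;
  if (bits > 16)
    mask |= CHUNK_16_31;
  if (bits > 32)
    mask |= CHUNK_32_63;
  return mask;
}

// An extension from `src_bits` to 64 bits only computes the chunks above
// `src_bits`.  If none of them is live after the insn, the extension does
// no useful work and a narrowing subreg would do.
bool extension_is_useless(unsigned src_bits, unsigned live_after) {
  unsigned written = chunks_for_width(64) & ~chunks_for_width(src_bits);
  return (live_after & written) == 0;
}
```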

A few key points to remember.

For destination processing it is always safe to ignore a destination.
Ignoring a destination merely means that whatever was live after the
given insn will continue to be live before the insn.  What is not safe
is to clear a bit in the LIVENOW bitmap for a destination chunk that is
not set.  This comes into play with things like STRICT_LOW_PART.

For source processing the safe thing to do is to set all the chunks in a
register as live.  It is never safe to fail to process a source operand.

When a destination object is not fully live, we try to transfer that
limited liveness to the source operands.  So for example if bits 16..63
are dead in a destination of a PLUS, we need not mark bits 16..63 as
live for the source operands.  We have to be careful -- consider a shift
count on a target without SHIFT_COUNT_TRUNCATED set.  So we have both a
list of RTL codes where we can transfer liveness and a few codes where
one of the operands may need to be fully live (ex, a shift count) while
the other input may not need to be fully live (value left shifted).
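
The transfer of limited liveness from a destination to its sources can be
sketched like this.  It is a simplified model, not the pass itself: Op,
SourceLiveness and transfer_to_sources are hypothetical, the four chunks are
encoded as bits 0..3 of a mask, and marking every chunk at or below the
highest live one is a conservative choice made for this sketch (carries in a
PLUS flow upwards from lower bits).

```cpp
// Four tracked chunks, lowest chunk in bit 0 (encoding assumed here).
constexpr unsigned ALL_CHUNKS = 0xF;

enum class Op { Plus, ShiftLeft };

struct SourceLiveness {
  unsigned op0;  // first input (for ShiftLeft: the value being shifted)
  unsigned op1;  // second input (for ShiftLeft: the shift count)
};

// Every chunk at or below the highest live chunk.
unsigned at_or_below_highest(unsigned live) {
  live |= live >> 1;
  live |= live >> 2;
  return live & ALL_CHUNKS;
}

// Given the live chunks of the destination, return the chunks that must be
// treated as live in each source operand.
SourceLiveness transfer_to_sources(Op op, unsigned dest_live) {
  unsigned needed = at_or_below_highest(dest_live);
  switch (op) {
  case Op::Plus:
    // Low result bits depend only on the same or lower input bits.
    return {needed, needed};
  case Op::ShiftLeft:
    // The shifted value inherits the limited liveness; the shift count is
    // kept fully live, as on a target without SHIFT_COUNT_TRUNCATED.
    return {needed, ALL_CHUNKS};
  }
  return {ALL_CHUNKS, ALL_CHUNKS};  // unknown code: mark everything live
}
```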

Locally we have had this enabled at -O1 and above to encourage testing,
but I'm thinking that for the trunk enabling at -O2 and above is the
right thing to do.

This has (of course) been tested on rv64.  It's also been bootstrapped
and regression tested on x86.  Bootstrap and regression tested (C only)
for m68k, sh4, sh4eb, alpha.  Earlier versions were also bootstrapped
and regression tested on ppc, hppa and s390x (C only for those as well).
   It's also been tested on the various crosses in my tester.  So we've
got reasonable coverage of 16, 32 and 64 bit targets, big and little
endian, with and without SHIFT_COUNT_TRUNCATED and all kinds of other
oddities.

The included tests are for RISC-V only because not all targets are going
to have extraneous extensions.   There's tests from coremark, x264 and
GCC's bz database.  It probably wouldn't be hard to add aarch64
testcases.  The BZs listed are improved by this patch for aarch64.

Given the amount of work Jivan and I have done, I'm not comfortable
self-approving at this time.  I'd much rather have another set of eyes
on the code.  Hopefully the code is documented well enough for that to
be a useful exercise.

So, no need to work from Pago Pago for this patch.  I may make another
attempt at the eswin conditional move work while working virtually in
Pago Pago though.

Thoughts, comments, recommendations?


Unfortunately, I get some ICE building stage 1 libgcc with this patch on
loongarch64-linux-gnu:

during RTL pass: ext_dce
../../../gcc/libgcc/libgcc2.c: In function ‘__absvdi2’:
../../../gcc/libgcc/libgcc2.c:224:1: internal compiler error: Segmentation fault
   224 | }
   | ^
0x120baa477 crash_signal
../../gcc/gcc/toplev.cc:316
0x1216aeeb4 ext_dce_process_sets
../../gcc/gcc/ext-dce.cc:128
0x1216afbaf ext_dce_process_bb
../../gcc/gcc/ext-dce.cc:647
0x1216afbaf ext_dce
../../gcc/gcc/ext-dce.cc:802
0x1216afbaf execute
../../gcc/gcc/ext-dce.cc:868
Please submit a full bug report, with preprocessed source (by using 
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.

I think I know what's going on here.

jeff


[pushed] c++: add DECL_IMPLICIT_TEMPLATE_PARM_P macro

2023-11-19 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

Let's use a more informative name instead of DECL_VIRTUAL_P directly.

gcc/cp/ChangeLog:

* cp-tree.h (DECL_TEMPLATE_PARM_CHECK): New.
(DECL_IMPLICIT_TEMPLATE_PARM_P): New.
(decl_template_parm_check): New.
* mangle.cc (write_closure_template_head): Use it.
* parser.cc (synthesize_implicit_template_parm): Likewise.
* pt.cc (template_parameters_equivalent_p): Likewise.
---
 gcc/cp/cp-tree.h | 19 +++
 gcc/cp/mangle.cc |  2 +-
 gcc/cp/parser.cc |  2 +-
 gcc/cp/pt.cc |  3 ++-
 4 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index c7a1cf610c8..7b0b7c6a17e 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -677,10 +677,14 @@ template_info_decl_check (const_tree t, const char* f, int l, const char* fn)
   tree_check_failed (__t, __FILE__, __LINE__, __FUNCTION__, 0);\
  __t; })
 
+#define DECL_TEMPLATE_PARM_CHECK(NODE) \
+  decl_template_parm_check ((NODE), __FILE__, __LINE__, __FUNCTION__)
+
 #else /* ENABLE_TREE_CHECKING */
 
 #define TEMPLATE_INFO_DECL_CHECK(NODE) (NODE)
 #define THUNK_FUNCTION_CHECK(NODE) (NODE)
+#define DECL_TEMPLATE_PARM_CHECK(NODE) (NODE)
 
 #endif /* ENABLE_TREE_CHECKING */
 
@@ -3577,6 +3581,11 @@ struct GTY(()) lang_decl {
need.  But we want a more descriptive name.  */
 #define DECL_VTABLE_OR_VTT_P(NODE) DECL_VIRTUAL_P (VAR_DECL_CHECK (NODE))
 
+/* 1 iff a _DECL for a template parameter came from
+   synthesize_implicit_template_parm.  */
+#define DECL_IMPLICIT_TEMPLATE_PARM_P(NODE) \
+  DECL_VIRTUAL_P (DECL_TEMPLATE_PARM_CHECK (NODE))
+
 /* 1 iff FUNCTION_TYPE or METHOD_TYPE has a ref-qualifier (either & or &&). */
 #define FUNCTION_REF_QUALIFIED(NODE) \
   TREE_LANG_FLAG_4 (FUNC_OR_METHOD_CHECK (NODE))
@@ -5057,6 +5066,16 @@ get_vec_init_expr (tree t)
|| TREE_CODE (NODE) == TYPE_DECL\
|| TREE_CODE (NODE) == TEMPLATE_DECL))
 
+#if ENABLE_TREE_CHECKING
+inline tree
+decl_template_parm_check (const_tree t, const char *f, int l, const char *fn)
+{
+  if (!DECL_TEMPLATE_PARM_P (t))
+tree_check_failed (t, f, l, fn, 0);
+  return const_cast<tree>(t);
+}
+#endif
+
 /* Nonzero for a raw template parameter node.  */
 #define TEMPLATE_PARM_P(NODE)  \
   (TREE_CODE (NODE) == TEMPLATE_TYPE_PARM  \
diff --git a/gcc/cp/mangle.cc b/gcc/cp/mangle.cc
index afa68da871c..5137305ed07 100644
--- a/gcc/cp/mangle.cc
+++ b/gcc/cp/mangle.cc
@@ -1744,7 +1744,7 @@ write_closure_template_head (tree tmpl)
continue;
   parm = TREE_VALUE (parm);
 
-  if (DECL_VIRTUAL_P (parm))
+  if (DECL_IMPLICIT_TEMPLATE_PARM_P (parm))
// A synthetic parm, we're done.
break;
 
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index d1104336215..f556b8f3c01 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -50895,7 +50895,7 @@ synthesize_implicit_template_parm  (cp_parser *parser, tree constr)
  Note that DECL_ARTIFICIAL is used elsewhere for template
  parameters.  */
   if (TREE_VALUE (new_parm) != error_mark_node)
-DECL_VIRTUAL_P (TREE_VALUE (new_parm)) = true;
+DECL_IMPLICIT_TEMPLATE_PARM_P (TREE_VALUE (new_parm)) = true;
 
   tree new_decl = get_local_decls ();
   if (non_type)
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 324f6f01555..1de9d3eb44f 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -3359,7 +3359,8 @@ template_parameters_equivalent_p (const_tree parm1, const_tree parm2)
   /* ... one parameter was introduced by a parameter declaration, then
  both are. This case arises as a result of eagerly rewriting declarations
  during parsing.  */
-  if (DECL_VIRTUAL_P (decl1) != DECL_VIRTUAL_P (decl2))
+  if (DECL_IMPLICIT_TEMPLATE_PARM_P (decl1)
+  != DECL_IMPLICIT_TEMPLATE_PARM_P (decl2))
 return false;
 
   /* ... if either declares a pack, they both do.  */

base-commit: 0d734c79387191005c909c54c7556a88254c401b
-- 
2.39.3



[pushed] c++: compare one level of template parms

2023-11-19 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

There should never be a reason to compare more than one level of template
parameters; additional levels are for the enclosing context, which is either
irrelevant (for a template template parameter) or already compared (for a
member template).

Also, the comp_template_parms handling of type parameters was wrongly
checking for TEMPLATE_TYPE_PARM when a type parameter appears here as a
TYPE_DECL.
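
The idea of comparing a single level can be sketched with toy types.  This
is an illustration only: ParmKind, Parm, ParmLevel and same_innermost_level
are hypothetical and much simpler than the tree structures in pt.cc.  The
point is that the enclosing levels are never visited.

```cpp
#include <cstddef>
#include <vector>

enum class ParmKind { Type, NonType, Template };

struct Parm {
  ParmKind kind;
  bool is_pack;
};

// One level of template parameters plus a link to the enclosing level.
struct ParmLevel {
  std::vector<Parm> parms;
  const ParmLevel *enclosing;
};

// Compares only the given (innermost) level; `enclosing` is deliberately
// ignored, matching the reasoning in the commit message above.
bool same_innermost_level(const ParmLevel &a, const ParmLevel &b) {
  if (&a == &b)
    return true;
  if (a.parms.size() != b.parms.size())
    return false;
  for (std::size_t i = 0; i < a.parms.size(); ++i)
    if (a.parms[i].kind != b.parms[i].kind ||
        a.parms[i].is_pack != b.parms[i].is_pack)
      return false;
  return true;
}
```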

gcc/cp/ChangeLog:

* pt.cc (comp_template_parms): Just one level.
(template_parameter_lists_equivalent_p): Likewise.
---
 gcc/cp/pt.cc | 94 +++-
 1 file changed, 35 insertions(+), 59 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 1de9d3eb44f..ed681afb5d4 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -3274,53 +3274,40 @@ check_explicit_specialization (tree declarator,
 int
 comp_template_parms (const_tree parms1, const_tree parms2)
 {
-  const_tree p1;
-  const_tree p2;
-
   if (parms1 == parms2)
 return 1;
 
-  for (p1 = parms1, p2 = parms2;
-   p1 != NULL_TREE && p2 != NULL_TREE;
-   p1 = TREE_CHAIN (p1), p2 = TREE_CHAIN (p2))
+  tree t1 = TREE_VALUE (parms1);
+  tree t2 = TREE_VALUE (parms2);
+  int i;
+
+  gcc_assert (TREE_CODE (t1) == TREE_VEC);
+  gcc_assert (TREE_CODE (t2) == TREE_VEC);
+
+  if (TREE_VEC_LENGTH (t1) != TREE_VEC_LENGTH (t2))
+return 0;
+
+  for (i = 0; i < TREE_VEC_LENGTH (t2); ++i)
 {
-  tree t1 = TREE_VALUE (p1);
-  tree t2 = TREE_VALUE (p2);
-  int i;
+  tree parm1 = TREE_VALUE (TREE_VEC_ELT (t1, i));
+  tree parm2 = TREE_VALUE (TREE_VEC_ELT (t2, i));
 
-  gcc_assert (TREE_CODE (t1) == TREE_VEC);
-  gcc_assert (TREE_CODE (t2) == TREE_VEC);
+  /* If either of the template parameters are invalid, assume
+they match for the sake of error recovery. */
+  if (error_operand_p (parm1) || error_operand_p (parm2))
+   return 1;
 
-  if (TREE_VEC_LENGTH (t1) != TREE_VEC_LENGTH (t2))
+  if (TREE_CODE (parm1) != TREE_CODE (parm2))
return 0;
 
-  for (i = 0; i < TREE_VEC_LENGTH (t2); ++i)
-   {
-  tree parm1 = TREE_VALUE (TREE_VEC_ELT (t1, i));
-  tree parm2 = TREE_VALUE (TREE_VEC_ELT (t2, i));
-
-  /* If either of the template parameters are invalid, assume
- they match for the sake of error recovery. */
-  if (error_operand_p (parm1) || error_operand_p (parm2))
-return 1;
-
- if (TREE_CODE (parm1) != TREE_CODE (parm2))
-   return 0;
-
- if (TREE_CODE (parm1) == TEMPLATE_TYPE_PARM
-  && (TEMPLATE_TYPE_PARAMETER_PACK (parm1)
-  == TEMPLATE_TYPE_PARAMETER_PACK (parm2)))
-   continue;
- else if (!same_type_p (TREE_TYPE (parm1), TREE_TYPE (parm2)))
-   return 0;
-   }
+  if (TREE_CODE (parm1) == TYPE_DECL
+ && (TEMPLATE_TYPE_PARAMETER_PACK (TREE_TYPE (parm1))
+ == TEMPLATE_TYPE_PARAMETER_PACK (TREE_TYPE (parm2))))
+   continue;
+  else if (!same_type_p (TREE_TYPE (parm1), TREE_TYPE (parm2)))
+   return 0;
 }
 
-  if ((p1 != NULL_TREE) != (p2 != NULL_TREE))
-/* One set of parameters has more parameters lists than the
-   other.  */
-return 0;
-
   return 1;
 }
 
@@ -3403,31 +3390,20 @@ template_parameter_lists_equivalent_p (const_tree parms1, const_tree parms2)
   if (parms1 == parms2)
 return true;
 
-  const_tree p1 = parms1;
-  const_tree p2 = parms2;
-  while (p1 != NULL_TREE && p2 != NULL_TREE)
+  tree list1 = TREE_VALUE (parms1);
+  tree list2 = TREE_VALUE (parms2);
+
+  if (TREE_VEC_LENGTH (list1) != TREE_VEC_LENGTH (list2))
+return 0;
+
+  for (int i = 0; i < TREE_VEC_LENGTH (list2); ++i)
 {
-  tree list1 = TREE_VALUE (p1);
-  tree list2 = TREE_VALUE (p2);
-
-  if (TREE_VEC_LENGTH (list1) != TREE_VEC_LENGTH (list2))
-   return 0;
-
-  for (int i = 0; i < TREE_VEC_LENGTH (list2); ++i)
-   {
- tree parm1 = TREE_VEC_ELT (list1, i);
- tree parm2 = TREE_VEC_ELT (list2, i);
- if (!template_parameters_equivalent_p (parm1, parm2))
-   return false;
-   }
-
-  p1 = TREE_CHAIN (p1);
-  p2 = TREE_CHAIN (p2);
+  tree parm1 = TREE_VEC_ELT (list1, i);
+  tree parm2 = TREE_VEC_ELT (list2, i);
+  if (!template_parameters_equivalent_p (parm1, parm2))
+   return false;
 }
 
-  if ((p1 != NULL_TREE) != (p2 != NULL_TREE))
-return false;
-
   return true;
 }
 

base-commit: 0d734c79387191005c909c54c7556a88254c401b
prerequisite-patch-id: b424bed5ac33d406c713411784f1797effc63375
-- 
2.39.3



[PATCH RFC] c++: mangle function template constraints

2023-11-19 Thread Jason Merrill
Tested x86_64-pc-linux-gnu.  Are the library bits OK?  Any comments before I
push this?

-- 8< --

Per https://github.com/itanium-cxx-abi/cxx-abi/issues/24 and
https://github.com/itanium-cxx-abi/cxx-abi/pull/166

We need to mangle constraints to be able to distinguish between function
templates that only differ in constraints.  From the latter link, we want to
use the template parameter mangling previously specified for lambdas to also
make explicit the form of a template parameter where the argument is not a
"natural" fit for it, such as when the parameter is constrained or deduced.

I'm concerned about how the latter link changes the mangling for some C++98
and C++11 patterns, so I've limited template_parm_natural_p to avoid two
cases found by running the testsuite with -Wabi forced on:

template <typename T, T V> T f() { return V; }
int main() { return f<int,42>(); }

template <int i> int max() { return i; }
template <int i, int j, int... rest> int max()
{
  int sub = max<j, rest...>();
  return i > sub ? i : sub;
}
int main() {  return max<1,2,3>(); }

A third C++11 pattern is changed by this patch:

template <template <typename...> class TT, typename... Ts> TT<Ts...> f();
template <typename> struct A { };
int main() { f<A,int>(); }

I aim to resolve these with the ABI committee before GCC 14.1.

We also need to resolve https://github.com/itanium-cxx-abi/cxx-abi/issues/38
(mangling references to dependent template-ids where the name is fully
resolved) as references to concepts in std:: will consistently run into this
area.  This is why mangle-concepts1.C only refers to concepts in the global
namespace so far.

The library changes are to avoid trying to mangle builtins, which fails.

Demangler support and test coverage are not complete yet.

gcc/cp/ChangeLog:

* cp-tree.h (TEMPLATE_ARGS_TYPE_CONSTRAINT_P): New.
(get_concept_check_template): Declare.
* constraint.cc (combine_constraint_expressions)
(finish_shorthand_constraint): Use UNKNOWN_LOCATION.
* pt.cc (convert_generic_types_to_packs): Likewise.
* mangle.cc (write_constraint_expression)
(write_tparms_constraints, write_type_constraint)
(template_parm_natural_p, write_requirement)
(write_requires_expr): New.
(write_encoding): Mangle trailing requires-clause.
(write_name): Pass parms to write_template_args.
(write_template_param_decl): Factor out from...
(write_closure_template_head): ...here.
(write_template_args): Mangle non-natural parms
and requires-clause.
(write_expression): Handle REQUIRES_EXPR.

include/ChangeLog:

* demangle.h (enum demangle_component_type): Add
DEMANGLE_COMPONENT_CONSTRAINTS.

libiberty/ChangeLog:

* cp-demangle.c (d_make_comp): Handle
DEMANGLE_COMPONENT_CONSTRAINTS.
(d_count_templates_scopes): Likewise.
(d_print_comp_inner): Likewise.
(d_maybe_constraints): New.
(d_encoding, d_template_args_1): Call it.
(d_parmlist): Handle 'Q'.
* testsuite/demangle-expected: Add some constraint tests.

libstdc++-v3/ChangeLog:

* include/std/bit: Avoid builtins in requires-clauses.
* include/std/variant: Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/abi/mangle10.C: Disable compat aliases.
* g++.dg/abi/mangle52.C: Specify ABI 18.
* g++.dg/cpp2a/class-deduction-alias3.C
* g++.dg/cpp2a/class-deduction-alias8.C:
Avoid builtins in requires-clauses.
* g++.dg/abi/mangle-concepts1.C: New test.
* g++.dg/abi/mangle-ttp1.C: New test.
---
 gcc/cp/cp-tree.h  |   7 +
 include/demangle.h|   2 +
 gcc/cp/constraint.cc  |  10 +-
 gcc/cp/mangle.cc  | 369 --
 gcc/cp/pt.cc  |   4 +-
 gcc/testsuite/g++.dg/abi/mangle-concepts1.C   |  88 +
 gcc/testsuite/g++.dg/abi/mangle-ttp1.C|  27 ++
 gcc/testsuite/g++.dg/abi/mangle10.C   |   2 +-
 gcc/testsuite/g++.dg/abi/mangle52.C   |   2 +-
 .../g++.dg/cpp2a/class-deduction-alias3.C |   5 +-
 .../g++.dg/cpp2a/class-deduction-alias8.C |   5 +-
 libiberty/cp-demangle.c   |  86 +++-
 libiberty/testsuite/demangle-expected |   8 +
 libstdc++-v3/include/std/bit  |   2 +-
 libstdc++-v3/include/std/variant  |   4 +-
 15 files changed, 550 insertions(+), 71 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/abi/mangle-concepts1.C
 create mode 100644 gcc/testsuite/g++.dg/abi/mangle-ttp1.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 7b0b7c6a17e..138894dac98 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -3800,6 +3800,12 @@ struct GTY(()) lang_decl {
   : TREE_VEC_LENGTH (INNERMOST_TEMPLATE_ARGS (NODE))
 #endif
 
+/* True iff NODE represents the template args for a type-constraint,
+   in which case the first one represents the constrained type.
+   Currently only set during mangling.  */
+#define TEMPLATE_ARG

[PATCH] [x86] Support reduc_{and, ior, xor}_scal_m for V4HI/V8QI/V4QImode

2023-11-19 Thread liuhongt
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.

gcc/ChangeLog:

PR target/112325
* config/i386/i386-expand.cc (emit_reduc_half): Handle
V8QImode.
* config/i386/mmx.md (reduc_<code>_scal_<mode>): New expander.
(reduc_<code>_scal_v4qi): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112325-mmx-1.c: New test.
---
 gcc/config/i386/i386-expand.cc|  1 +
 gcc/config/i386/mmx.md| 31 +-
 .../gcc.target/i386/pr112325-mmx-1.c  | 40 +++
 3 files changed, 70 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr112325-mmx-1.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index a8d871d321e..fe56d2f6153 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -17748,6 +17748,7 @@ emit_reduc_half (rtx dest, rtx src, int i)
   tem = gen_mmx_lshrv1si3 (d, gen_lowpart (V1SImode, src),
   GEN_INT (i / 2));
   break;
+case E_V8QImode:
 case E_V4HImode:
   d = gen_reg_rtx (V1DImode);
   tem = gen_mmx_lshrv1di3 (d, gen_lowpart (V1DImode, src),
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 355538749d1..c77c9719e9a 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -120,13 +120,15 @@ (define_mode_attr mmxscalarmode
   [(V2SI "SI") (V2SF "SF")
(V4HF "HF") (V4BF "BF")
(V2HF "HF") (V2BF "BF")
-   (V4HI "HI") (V2HI "HI")])
+   (V4HI "HI") (V2HI "HI")
+   (V8QI "QI")])
 
 (define_mode_attr mmxscalarmodelower
   [(V2SI "si") (V2SF "sf")
(V4HF "hf") (V4BF "bf")
(V2HF "hf") (V2BF "bf")
-   (V4HI "hi") (V2HI "hi")])
+   (V4HI "hi") (V2HI "hi")
+   (V8QI "qi")])
 
 (define_mode_attr Yv_Yw
   [(V8QI "Yw") (V4HI "Yw") (V2SI "Yv") (V1DI "Yv") (V2SF "Yv")])
@@ -6094,6 +6096,31 @@ (define_insn "*mmx_psadbw"
(set_attr "type" "mmxshft,sseiadd,sseiadd")
(set_attr "mode" "DI,TI,TI")])
 
+(define_expand "reduc_<code>_scal_<mode>"
+ [(any_logic:MMXMODE12
+(match_operand:<mmxscalarmode> 0 "register_operand")
+(match_operand:MMXMODE12 1 "register_operand"))]
+ "TARGET_MMX_WITH_SSE"
+{
+  rtx tmp = gen_reg_rtx (<MODE>mode);
+  ix86_expand_reduc (gen_<code><mode>3, tmp, operands[1]);
+  emit_insn (gen_vec_extract<mode><mmxscalarmodelower> (operands[0],
+  tmp, const0_rtx));
+  DONE;
+})
+
+(define_expand "reduc_<code>_scal_v4qi"
+ [(any_logic:V4QI
+(match_operand:QI 0 "register_operand")
+(match_operand:V4QI 1 "register_operand"))]
+ "TARGET_SSE2"
+{
+  rtx tmp = gen_reg_rtx (V4QImode);
+  ix86_expand_reduc (gen_<code>v4qi3, tmp, operands[1]);
+  emit_insn (gen_vec_extractv4qiqi (operands[0], tmp, const0_rtx));
+  DONE;
+})
+
 (define_expand "reduc_plus_scal_v8qi"
  [(plus:V8QI
 (match_operand:QI 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/i386/pr112325-mmx-1.c b/gcc/testsuite/gcc.target/i386/pr112325-mmx-1.c
new file mode 100644
index 000..887249fc6ad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr112325-mmx-1.c
@@ -0,0 +1,40 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-msse2 -O2 -fdump-tree-slp2" } */
+/* { dg-final { scan-tree-dump-times ".REDUC_IOR" 3 "slp2" } } */
+
+short
+foo1 (short* a)
+{
+  short sum = 0;
+  sum |= a[0];
+  sum |= a[1];
+  sum |= a[2];
+  sum |= a[3];
+  return sum;
+}
+
+char
+foo2 (char* a)
+{
+  char sum = 0;
+  sum |= a[0];
+  sum |= a[1];
+  sum |= a[2];
+  sum |= a[3];
+  sum |= a[4];
+  sum |= a[5];
+  sum |= a[6];
+  sum |= a[7];
+  return sum;
+}
+
+char
+foo3 (char* a)
+{
+  char sum = 0;
+  sum |= a[0];
+  sum |= a[1];
+  sum |= a[2];
+  sum |= a[3];
+  return sum;
+}
-- 
2.31.1
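
For readers unfamiliar with what a reduc_ior_scal pattern computes for V8QI, here is a minimal standalone sketch in plain C. It is not the expander code from the patch: the function names are hypothetical, and the halving strategy is only modelled on the whole-register logical right shifts that emit_reduc_half emits for the 64-bit modes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of an OR reduction over eight bytes.  The upper
   half of the 64-bit word is folded onto the lower half three times,
   after which the low byte holds the OR of all eight input bytes.  */
uint8_t
reduc_ior_v8qi (const uint8_t a[8])
{
  uint64_t v;
  memcpy (&v, a, sizeof v);	/* Load the eight bytes as one word.  */
  v |= v >> 32;			/* 8 lanes -> 4 lanes.  */
  v |= v >> 16;			/* 4 lanes -> 2 lanes.  */
  v |= v >> 8;			/* 2 lanes -> 1 lane.  */
  return (uint8_t) v;		/* Keep only the low byte.  */
}

/* Straightforward scalar loop, like foo2 in the test case, used here
   as a reference for the folded version.  */
uint8_t
reduc_ior_scalar (const uint8_t a[8])
{
  uint8_t sum = 0;
  for (int i = 0; i < 8; i++)
    sum |= a[i];
  return sum;
}
```

Because OR is commutative and every byte ends up folded into the low byte, the result does not depend on which array element lands in which byte of the word.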



Re: [RFA] New pass for sign/zero extension elimination

2023-11-19 Thread Oleg Endo
On Sun, 2023-11-19 at 19:51 -0700, Jeff Law wrote:
> 
> On 11/19/23 18:22, Oleg Endo wrote:
> > 
> > On Sun, 2023-11-19 at 17:47 -0700, Jeff Law wrote:
> > > This is work originally started by Joern @ Embecosm.
> > > 
> > > There's been a long standing sense that we're generating too many
> > > sign/zero extensions on the RISC-V port.  REE is useful, but it's really
> > > focused on a relatively narrow part of the extension problem.
> > > 
> > > What Joern's patch does is introduce a new pass which tracks liveness of
> > > chunks of pseudo regs.  Specifically it tracks bits 0..7, 8..15, 16..31
> > > and 32..63.
> > > 
> > > If it encounters a sign/zero extend that sets bits that are never read,
> > > then it replaces the sign/zero extension with a narrowing subreg.  The
> > > narrowing subreg usually gets eliminated by subsequent passes (it's just
> > > a copy after all).
> > > 
> > 
> > Have you tried it on SH, too?  (and if so any numbers?)


> Just bootstrap with C regression testing on sh4/sh4eb.  No data on 
> improvements.
> 

Alright.  I'll check what it does for SH once it's in.

Cheers,
Oleg
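
The chunk-liveness idea quoted above can be illustrated with a small standalone sketch. This is not code from the proposed pass; the enum and function names are hypothetical, and it only models the decision "does this extension write any chunk that is later read?" for the four chunks mentioned in the description.

```c
#include <stdbool.h>

/* One bit per tracked chunk of a 64-bit pseudo register.  */
enum
{
  CHUNK_0_7 = 1 << 0,
  CHUNK_8_15 = 1 << 1,
  CHUNK_16_31 = 1 << 2,
  CHUNK_32_63 = 1 << 3
};

/* Mask of the chunks that hold the low WIDTH bits.  WIDTH is expected
   to be 8, 16, 32 or 64; anything else is treated as 64.  */
unsigned
chunks_below (unsigned width)
{
  switch (width)
    {
    case 8:
      return CHUNK_0_7;
    case 16:
      return CHUNK_0_7 | CHUNK_8_15;
    case 32:
      return CHUNK_0_7 | CHUNK_8_15 | CHUNK_16_31;
    default:
      return CHUNK_0_7 | CHUNK_8_15 | CHUNK_16_31 | CHUNK_32_63;
    }
}

/* Return true if a sign/zero extension from SRC_WIDTH bits to 64 bits
   only fills chunks that are not in LIVE_CHUNKS, i.e. chunks that are
   never read afterwards.  Such an extension could be replaced by a
   narrowing subreg.  */
bool
extension_is_redundant (unsigned src_width, unsigned live_chunks)
{
  unsigned all = chunks_below (64);
  unsigned written_by_extension = all & ~chunks_below (src_width);
  return (written_by_extension & live_chunks) == 0;
}
```

For example, an extension from 8 bits is redundant when only bits 0..7 are live, but not when bits 8..15 are also read.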


[PATCH] LoongArch: Add support for xorsign.

2023-11-19 Thread Jiahao Xu
This patch adds support for the xorsign pattern for scalar fp and vector
modes.  The new expanders handle xorsign uniformly with vector bitwise
logical operations.

On LoongArch64, floating-point registers and vector registers share the same
registers, so this patch also allows conversion between LSX vector modes and
scalar fp modes to avoid generating unnecessary instructions.
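
As a reminder of what the xorsign operation computes, here is a minimal scalar sketch in plain C. It is not the expander from this patch: the function name is hypothetical, and it assumes that float is an IEEE 754 binary32 value with the sign in bit 31. It uses the same and-then-xor sequence that the expanders below emit on vector registers.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of xorsign: return X with its sign bit
   flipped when Y is negative, i.e. X * copysign (1.0, Y).
   Assumes float is IEEE 754 binary32.  */
float
xorsign_f32 (float x, float y)
{
  uint32_t xb, yb;
  memcpy (&xb, &x, sizeof xb);		/* Reinterpret X as bits.  */
  memcpy (&yb, &y, sizeof yb);		/* Reinterpret Y as bits.  */
  xb ^= yb & UINT32_C (0x80000000);	/* Mask Y's sign, xor into X.  */
  memcpy (&x, &xb, sizeof x);
  return x;
}
```

Since only bitwise AND and XOR are involved, the same sequence applies unchanged to every lane of a vector register.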

gcc/ChangeLog:

* config/loongarch/lasx.md (xorsign<mode>3): New expander.
* config/loongarch/loongarch.cc (loongarch_can_change_mode_class): Allow
conversion between LSX vector mode and scalar fp mode.
* config/loongarch/loongarch.md (@xorsign<mode>3): New expander.
* config/loongarch/lsx.md (@xorsign<mode>3): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-xorsign-run.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-xorsign.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-xorsign-run.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-xorsign.c: New test.
* gcc.target/loongarch/xorsign-run.c: New test.
* gcc.target/loongarch/xorsign.c: New test.

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index f0f2dd08dd8..5a4be588fb4 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -1120,10 +1120,10 @@ (define_insn "umod<mode>3"
(set_attr "mode" "")])
 
 (define_insn "xor<mode>3"
-  [(set (match_operand:ILASX 0 "register_operand" "=f,f,f")
-   (xor:ILASX
- (match_operand:ILASX 1 "register_operand" "f,f,f")
- (match_operand:ILASX 2 "reg_or_vector_same_val_operand" "f,YC,Urv8")))]
+  [(set (match_operand:LASX 0 "register_operand" "=f,f,f")
+   (xor:LASX
+ (match_operand:LASX 1 "register_operand" "f,f,f")
+ (match_operand:LASX 2 "reg_or_vector_same_val_operand" "f,YC,Urv8")))]
   "ISA_HAS_LASX"
   "@
xvxor.v\t%u0,%u1,%u2
@@ -3147,6 +3147,20 @@ (define_expand "copysign<mode>3"
   operands[5] = gen_reg_rtx (<MODE>mode);
 })
 
+(define_expand "xorsign<mode>3"
+  [(set (match_dup 4)
+(and:FLASX (match_dup 3)
+(match_operand:FLASX 2 "register_operand")))
+   (set (match_operand:FLASX 0 "register_operand")
+(xor:FLASX (match_dup 4)
+ (match_operand:FLASX 1 "register_operand")))]
+  "ISA_HAS_LASX"
+{
+  operands[3] = loongarch_build_signbit_mask (<MODE>mode, 1, 0);
+
+  operands[4] = gen_reg_rtx (<MODE>mode);
+})
+
 
 (define_insn "absv4df2"
   [(set (match_operand:V4DF 0 "register_operand" "=f")
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index d05743bec87..e4cdbcf0f2d 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -6687,6 +6687,11 @@ loongarch_can_change_mode_class (machine_mode from, machine_mode to,
   if (LSX_SUPPORTED_MODE_P (from) && LSX_SUPPORTED_MODE_P (to))
 return true;
 
+  /* Allow conversion between LSX vector mode and scalar fp mode. */
+  if ((LSX_SUPPORTED_MODE_P (from) && SCALAR_FLOAT_MODE_P (to))
+  || ((SCALAR_FLOAT_MODE_P (from) && LSX_SUPPORTED_MODE_P (to))))
+return true;
+
   return !reg_classes_intersect_p (FP_REGS, rclass);
 }
 
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 22814a3679c..117c0924a85 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1146,6 +1146,23 @@ (define_insn "copysign<mode>3"
   "fcopysign.\t%0,%1,%2"
   [(set_attr "type" "fcopysign")
(set_attr "mode" "")])
+
+(define_expand "@xorsign<mode>3"
+  [(match_operand:ANYF 0 "register_operand")
+   (match_operand:ANYF 1 "register_operand")
+   (match_operand:ANYF 2 "register_operand")]
+  "ISA_HAS_LSX"
+{
+  machine_mode lsx_mode
+= <MODE>mode == SFmode ? V4SFmode : V2DFmode;
+  rtx tmp = gen_reg_rtx (lsx_mode);
+  rtx op1 = lowpart_subreg (lsx_mode, operands[1], <MODE>mode);
+  rtx op2 = lowpart_subreg (lsx_mode, operands[2], <MODE>mode);
+  emit_insn (gen_xorsign3 (lsx_mode, tmp, op1, op2));
+  emit_move_insn (operands[0],
+  lowpart_subreg (<MODE>mode, tmp, lsx_mode));
+  DONE;
+})
 
 ;;
 ;;  
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index 55c7d79a030..40500363dc0 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -1027,10 +1027,10 @@ (define_insn "umod<mode>3"
(set_attr "mode" "")])
 
 (define_insn "xor<mode>3"
-  [(set (match_operand:ILSX 0 "register_operand" "=f,f,f")
-   (xor:ILSX
- (match_operand:ILSX 1 "register_operand" "f,f,f")
- (match_operand:ILSX 2 "reg_or_vector_same_val_operand" "f,YC,Urv8")))]
+  [(set (match_operand:LSX 0 "register_operand" "=f,f,f")
+   (xor:LSX
+ (match_operand:LSX 1 "register_operand" "f,f,f")
+ (match_operand:LSX 2 "reg_or_vector_same_val_operand" "f,YC,Urv8")))]
   "ISA_HAS_LSX"
   "@
vxor.v\t%w0,%w1,%w2
@@ -2884,6 +2884,21 @@ (define_expand "copysign<mode>3"
   operands[5] = gen_reg_rtx (<MODE>mode);
 })
 
+(define_expand "@xorsign<mode>3"
+  [(set (match_dup 4)
+(and:FLSX (match_dup 

Re: Re: RISC-V: Support XTheadVector extensions

2023-11-19 Thread juzhe.zh...@rivai.ai
Following Kito's suggestions, I gave it a quick try.

This patch should do the following things:

1. Remove all new APIs that RVV 1.0 doesn't have, e.g. vlb.
They should go into a separate patch to be reviewed.
So the first patch series should be "Support part of the theadvector API
based on the current RVV 1.0 API".

2. Here is another approach which must work for theadvector:

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index ae528db1898..24b514c58df 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -646,6 +646,7 @@ extern bool th_classify_address (struct riscv_address_info *,
 extern const char *th_output_move (rtx, rtx);
 extern bool th_print_operand_address (FILE *, machine_mode, rtx);
 #endif
+extern void th_vector_asm_output_opcode (FILE *, const char *);

 extern bool riscv_use_divmod_expander (void);
 void riscv_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree, int);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 3701f41b1b3..9631a428341 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -10088,6 +10088,13 @@ extract_base_offset_in_addr (rtx mem, rtx *base, rtx *offset)
   return false;
 }

+void
+th_vector_asm_output_opcode (FILE *f, const char *ptr)
+{
+  if (ptr[0] == 'v')
+fprintf (f, "th.");
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 6205d7533f4..be02a926028 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -1206,4 +1206,6 @@ extern void riscv_remove_unneeded_save_restore_calls (void);
 #define HAVE_POST_MODIFY_DISP TARGET_XTHEADMEMIDX
 #define HAVE_PRE_MODIFY_DISP  TARGET_XTHEADMEMIDX

+#define ASM_OUTPUT_OPCODE(STREAM, PTR) th_vector_asm_output_opcode (STREAM, PTR);
+
 #endif /* ! GCC_RISCV_H */

It does work, in the sense that the emitted vector opcodes now carry the th. prefix:

/tmp/cc0yrKxw.s:1692: Error: unrecognized opcode `th.vsetivli zero,8,e8,mf2,ta,ma'
/tmp/cc0yrKxw.s:1693: Error: unrecognized opcode `th.vmv.v.i v1,0'
/tmp/cc0yrKxw.s:1694: Error: unrecognized opcode `th.vse8.v v1,0(a5)'
/tmp/cc0yrKxw.s:1696: Error: unrecognized opcode `th.vse8.v v1,0(a5)'
make[2]: *** [Makefile:935: _gcov.o] Error 1
make[2]: *** Waiting for unfinished jobs
/tmp/cc2KYYTs.s: Assembler messages:
/tmp/cc2KYYTs.s:1606: Error: unrecognized opcode `th.vsetivli zero,8,e8,mf2,ta,ma'
/tmp/cc2KYYTs.s:1610: Error: unrecognized opcode `th.vle8.v v1,0(a1)'
/tmp/cc2KYYTs.s:1615: Error: unrecognized opcode `th.vse8.v v1,0(sp)'
/tmp/cc2KYYTs.s:1617: Error: unrecognized opcode `th.vle8.v v1,0(a2)'
/tmp/cc2KYYTs.s:1618: Error: unrecognized opcode `th.vse8.v v1,0(a5)'
/tmp/cc2KYYTs.s:1651: Error: unrecognized opcode `th.vsetivli zero,8,e8,mf2,ta,ma'
/tmp/cc2KYYTs.s:1671: Error: unrecognized opcode `th.vle8.v v1,0(a4)'
/tmp/cc2KYYTs.s:1674: Error: unrecognized opcode `th.vse8.v v1,0(a0)'
/tmp/cc2KYYTs.s:2469: Error: unrecognized opcode `th.vsetivli zero,8,e8,mf2,ta,ma'
/tmp/cc2KYYTs.s:2569: Error: unrecognized opcode `th.vsetivli zero,8,e8,mf2,ta,ma'
/tmp/cc2KYYTs.s:2580: Error: unrecognized opcode `th.vle8.v v1,0(a2)'
/tmp/cc2KYYTs.s:2581: Error: unrecognized opcode `th.vse8.v v1,0(a5)'
/tmp/cc2KYYTs.s:2643: Error: unrecognized opcode `th.vsetivli zero,8,e8,mf2,ta,ma'
/tmp/cc2KYYTs.s:2671: Error: unrecognized opcode `th.vsetivli zero,8,e8,mf2,ta,ma'
/tmp/cc2KYYTs.s:3294: Error: unrecognized opcode `th.vsetivli zero,8,e8,mf2,ta,ma'
/tmp/cc2KYYTs.s:3317: Error: unrecognized opcode `th.vle8.v v1,0(a4)'
/tmp/cc2KYYTs.s:3319: Error: unrecognized opcode `th.vse8.v v1,0(a4)'
/tmp/cc2KYYTs.s:3322: Error: unrecognized opcode `th.vle8.v v1,0(a4)'
/tmp/cc2KYYTs.s:3324: Error: unrecognized opcode `th.vse8.v v1,0(a4)'

But binutils needs to support theadvector first; otherwise the build will
fail.

3. Add theadvector gating to target-supports.exp. We don't want to run
theadvector tests when theadvector is not enabled.
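
The opcode-prefixing check from item 2 can be tried outside of GCC with a small standalone sketch. This is only an illustration: format_opcode is a hypothetical helper that writes into a buffer, whereas the hook in the patch prints the prefix to the assembly output stream.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write INSN into BUF, adding a "th." prefix when
   the opcode starts with 'v', which is the same purely textual check
   as in th_vector_asm_output_opcode above.  Returns what snprintf
   returns, i.e. the length the full string needs.  */
int
format_opcode (char *buf, size_t size, const char *insn)
{
  const char *prefix = insn[0] == 'v' ? "th." : "";
  return snprintf (buf, size, "%s%s", prefix, insn);
}
```

With this helper, "vle8.v v1,0(a1)" becomes "th.vle8.v v1,0(a1)", while an instruction such as "addi a0,a0,1" is left unchanged.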

Thanks.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-11-18 18:32
To: Philipp Tomsich
CC: Jeff Law; juzhe.zh...@rivai.ai; gcc-patches; kito.cheng; cooper.joshua; 
Robin Dapp; jkridner
Subject: Re: RISC-V: Support XTheadVector extensions
I guess it would be worth stating my thoughts publicly:
 
I *support* adding the T-Head vector extension (a.k.a. vector 0.7) to
upstream GCC, since T-Head vector has already shipped on a large enough
number of boards; also, it's not really T-Head's problem, as Palmer
described in another mail.
 
My biggest concern before was that the T-Head folks weren't very involved
in community work, so accepting this would definitely increase the work for
maintainers. However, I see the T-Head folks trying to contribute to
upstream now, so that may no longer be a concern. I also believe accepting
this patch will encourage them to work more with upstream, which benefits
both sides.
 
Back to the one of the biggest issues for the patch set: GCC 14 or GCC
15. M
