[PATCH 3/9] dwarf: create annotation DIEs for decl tags

2023-07-11 Thread David Faust via Gcc-patches
The "btf_decl_tag" attribute is handled by constructing a
DW_TAG_GNU_annotation DIE for each occurrence to record the argument
string in debug information. The DIEs are children of the declarations
they annotate, with the following format:

  DW_TAG_GNU_annotation
    DW_AT_name        "btf_decl_tag"
    DW_AT_const_value <argument string>

gcc/

* dwarf2out.cc (gen_btf_decl_tag_dies): New function.
(gen_formal_parameter_die): Call it here.
(gen_decl_die): Likewise.
---
 gcc/dwarf2out.cc | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index 238d0a94400..c8c34db2b5a 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -13620,6 +13620,35 @@ long_double_as_float128 (tree type)
   return NULL_TREE;
 }
 
+/* Given a tree T, which should be a decl, process any btf_decl_tag attributes
+   on T, provided in ATTR.  Construct DW_TAG_GNU_annotation DIEs appropriately
+   as children of TARGET, usually the DIE for T.  */
+
+static void
+gen_btf_decl_tag_dies (tree t, dw_die_ref target)
+{
+  dw_die_ref die;
+  tree attr;
+
+  if (t == NULL_TREE || !DECL_P (t) || !target)
+    return;
+
+  attr = lookup_attribute ("btf_decl_tag", DECL_ATTRIBUTES (t));
+  while (attr != NULL_TREE)
+    {
+      die = new_die (DW_TAG_GNU_annotation, target, t);
+      add_name_attribute (die, IDENTIFIER_POINTER (get_attribute_name (attr)));
+      add_AT_string (die, DW_AT_const_value,
+		     TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr))));
+      attr = lookup_attribute ("btf_decl_tag", TREE_CHAIN (attr));
+    }
+
+  /* Strip the decl tag attribute to avoid creating multiple copies if we hit
+     this tree node again in some recursive call.  */
+  DECL_ATTRIBUTES (t)
+    = remove_attribute ("btf_decl_tag", DECL_ATTRIBUTES (t));
+}
+
 /* Given a pointer to an arbitrary ..._TYPE tree node, return a debugging
entry that chains the modifiers specified by CV_QUALS in front of the
given type.  REVERSE is true if the type is to be interpreted in the
@@ -23016,6 +23045,9 @@ gen_formal_parameter_die (tree node, tree origin, bool emit_name_p,
   gcc_unreachable ();
 }
 
+  /* Handle any attribute btf_decl_tag on the decl.  */
+  gen_btf_decl_tag_dies (node, parm_die);
+
   return parm_die;
 }
 
@@ -27170,6 +27202,9 @@ gen_decl_die (tree decl, tree origin, struct vlr_context *ctx,
   break;
 }
 
+  /* Handle any attribute btf_decl_tag on the decl.  */
+  gen_btf_decl_tag_dies (decl_or_origin, lookup_decl_die (decl_or_origin));
+
   return NULL;
 }
 
-- 
2.40.1



[PATCH 4/9] dwarf: expose get_die_parent

2023-07-11 Thread David Faust via Gcc-patches
Expose get_die_parent () so it can be used outside of dwarf2out.cc

gcc/

* dwarf2out.cc (get_die_parent): Make non-static.
* dwarf2out.h (get_die_parent): Add extern declaration here.
---
 gcc/dwarf2out.cc | 2 +-
 gcc/dwarf2out.h  | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index c8c34db2b5a..ba6d91f95cf 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -5457,7 +5457,7 @@ get_AT (dw_die_ref die, enum dwarf_attribute attr_kind)
 
 /* Returns the parent of the declaration of DIE.  */
 
-static dw_die_ref
+dw_die_ref
 get_die_parent (dw_die_ref die)
 {
   dw_die_ref t;
diff --git a/gcc/dwarf2out.h b/gcc/dwarf2out.h
index 870b56a6a37..3be918edc21 100644
--- a/gcc/dwarf2out.h
+++ b/gcc/dwarf2out.h
@@ -453,6 +453,7 @@ extern dw_die_ref base_type_die (tree, bool);
 extern dw_die_ref lookup_decl_die (tree);
 extern dw_die_ref lookup_type_die (tree);
 
+extern dw_die_ref get_die_parent (dw_die_ref);
 extern dw_die_ref dw_get_die_child (dw_die_ref);
 extern dw_die_ref dw_get_die_sib (dw_die_ref);
 extern enum dwarf_tag dw_get_die_tag (dw_die_ref);
-- 
2.40.1



[PATCH 9/9] doc: document btf_decl_tag attribute

2023-07-11 Thread David Faust via Gcc-patches
Add documentation for the btf_decl_tag attribute.

gcc/

* doc/extend.texi (Common Function Attributes): Document btf_decl_tag.
(Common Variable Attributes): Likewise.
---
 gcc/doc/extend.texi | 47 +
 1 file changed, 47 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index d88fd75e06e..57923621c46 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -2856,6 +2856,29 @@ declares that @code{my_alloc1} returns 16-byte aligned pointers and
 that @code{my_alloc2} returns a pointer whose value modulo 32 is equal
 to 8.
 
+@cindex @code{btf_decl_tag} function attribute
+@item btf_decl_tag (@var{argument})
+The @code{btf_decl_tag} attribute may be used to associate (to ``tag'')
+function declarations with arbitrary strings.  Debugging information will
+be emitted to associate the @var{argument} string with the attributed function.
+In DWARF, a @code{DW_TAG_GNU_annotation} DIE is emitted as a child of the
+DIE for the function, holding the @var{argument} string.  In BTF, a
+@code{BTF_KIND_DECL_TAG} record is emitted in the .BTF ELF section.
+
+For example
+
+@smallexample
+extern int bar (char, int) __attribute__((btf_decl_tag("for_user")));
+@end smallexample
+
+@noindent
+associates the string ``for_user'' with the function @code{bar}.  This
+string will be recorded in the BTF and/or DWARF information associated
+with the function.
+
+The @code{btf_decl_tag} attribute can also be used on variables
+(@pxref{Common Variable Attributes}) and field declarations.
+
 @cindex @code{cold} function attribute
 @item cold
 The @code{cold} attribute on functions is used to inform the compiler that
@@ -7570,6 +7593,30 @@ This warning can be disabled by @option{-Wno-if-not-aligned}.
 The @code{warn_if_not_aligned} attribute can also be used for types
 (@pxref{Common Type Attributes}.)
 
+@cindex @code{btf_decl_tag} variable attribute
+@item btf_decl_tag (@var{argument})
+The @code{btf_decl_tag} attribute may be used to associate (to ``tag'')
+variable declarations with arbitrary strings.  Debugging information will
+be emitted to associate the @var{argument} string with the attributed variable.
+In DWARF, a @code{DW_TAG_GNU_annotation} DIE is emitted as a child of the
+DIE for the variable, holding the @var{argument} string.  In BTF, a
+@code{BTF_KIND_DECL_TAG} record is emitted in the .BTF ELF section.
+in the .BTF ELF section.
+
+For example
+
+@smallexample
+int * foo __attribute__((btf_decl_tag("user")));
+@end smallexample
+
+@noindent
+associates the string ``user'' with the variable @code{foo}.  This string
+will be recorded in the BTF and/or DWARF information associated with
+the variable.
+
+The @code{btf_decl_tag} attribute can also be used on functions
+(@pxref{Common Function Attributes}) and field declarations.
+
 @cindex @code{strict_flex_array} variable attribute
 @item strict_flex_array (@var{level})
 The @code{strict_flex_array} attribute should be attached to the trailing
-- 
2.40.1



[PATCH 6/9] dwarf2ctf: convert annotation DIEs to CTF types

2023-07-11 Thread David Faust via Gcc-patches
This patch makes the DWARF-to-CTF conversion process aware of the new
DW_TAG_GNU_annotation DIEs. The DIEs are converted to CTF_K_DECL_TAG
types and added to the compilation unit CTF container to be translated
to BTF and output.

gcc/

* dwarf2ctf.cc (handle_btf_tags): New function.
(gen_ctf_sou_type): Don't try to create member types for children which
are not DW_TAG_member. Call handle_btf_tags if appropriate.
(gen_ctf_function_type): Call handle_btf_tags if appropriate.
(gen_ctf_variable): Likewise.
(gen_ctf_type): Likewise.
---
 gcc/dwarf2ctf.cc | 71 +++-
 1 file changed, 70 insertions(+), 1 deletion(-)

diff --git a/gcc/dwarf2ctf.cc b/gcc/dwarf2ctf.cc
index 549b0cb2dc1..b051aef45a8 100644
--- a/gcc/dwarf2ctf.cc
+++ b/gcc/dwarf2ctf.cc
@@ -32,6 +32,9 @@ along with GCC; see the file COPYING3.  If not see
 static ctf_id_t
 gen_ctf_type (ctf_container_ref, dw_die_ref);
 
+static void
+handle_btf_tags (ctf_container_ref, dw_die_ref, ctf_id_t, int);
+
 /* All the DIE structures we handle come from the DWARF information
generated by GCC.  However, there are three situations where we need
to create our own created DIE structures because GCC doesn't
@@ -547,6 +550,7 @@ gen_ctf_sou_type (ctf_container_ref ctfc, dw_die_ref sou, uint32_t kind)
   /* Now process the struct members.  */
   {
 dw_die_ref c;
+int idx = 0;
 
 c = dw_get_die_child (sou);
 if (c)
@@ -559,6 +563,9 @@ gen_ctf_sou_type (ctf_container_ref ctfc, dw_die_ref sou, uint32_t kind)
 
  c = dw_get_die_sib (c);
 
+ if (dw_get_die_tag (c) != DW_TAG_member)
+   continue;
+
  field_name = get_AT_string (c, DW_AT_name);
  field_type = ctf_get_AT_type (c);
  field_location = ctf_get_AT_data_member_location (c);
@@ -626,6 +633,12 @@ gen_ctf_sou_type (ctf_container_ref ctfc, dw_die_ref sou, uint32_t kind)
 field_name,
 field_type_id,
 field_location);
+
+ /* Handle BTF tags on the member.  */
+ if (btf_debuginfo_p ())
+   handle_btf_tags (ctfc, c, sou_type_id, idx);
+
+ idx++;
}
   while (c != dw_get_die_child (sou));
   }
@@ -718,6 +731,9 @@ gen_ctf_function_type (ctf_container_ref ctfc, dw_die_ref function,
  arg_type = gen_ctf_type (ctfc, ctf_get_AT_type (c));
  /* Add the argument to the existing CTF function type.  */
  ctf_add_function_arg (ctfc, function, arg_name, arg_type);
+
+ if (btf_debuginfo_p ())
+   handle_btf_tags (ctfc, c, function_type_id, i - 1);
}
  else
/* This is a local variable.  Ignore.  */
@@ -833,6 +849,11 @@ gen_ctf_variable (ctf_container_ref ctfc, dw_die_ref die)
   /* Skip updating the number of global objects at this time.  This is updated
  later after pre-processing as some CTF variable records although
  generated now, will not be emitted later.  [PR105089].  */
+
+  /* Handle any BTF tags on the variable.  */
+  if (btf_debuginfo_p ())
+handle_btf_tags (ctfc, die, CTF_NULL_TYPEID, -1);
+
 }
 
 /* Add a CTF function record for the given input DWARF DIE.  */
@@ -850,8 +871,13 @@ gen_ctf_function (ctf_container_ref ctfc, dw_die_ref die)
  counter.  Note that DWARF encodes function types in both
  DW_TAG_subroutine_type and DW_TAG_subprogram in exactly the same
  way.  */
-  (void) gen_ctf_function_type (ctfc, die, true /* from_global_func */);
+  function_type_id
+    = gen_ctf_function_type (ctfc, die, true /* from_global_func */);
   ctfc->ctfc_num_global_funcs += 1;
+
+  /* Handle any BTF tags on the function itself.  */
+  if (btf_debuginfo_p ())
+handle_btf_tags (ctfc, die, function_type_id, -1);
 }
 
 /* Add CTF type record(s) for the given input DWARF DIE and return its type id.
@@ -928,6 +954,10 @@ gen_ctf_type (ctf_container_ref ctfc, dw_die_ref die)
   break;
 }
 
+  /* Handle any BTF tags on the type.  */
+  if (btf_debuginfo_p () && !unrecog_die)
+handle_btf_tags (ctfc, die, type_id, -1);
+
   /* For all types unrepresented in CTF, use an explicit CTF type of kind
  CTF_K_UNKNOWN.  */
   if ((type_id == CTF_NULL_TYPEID) && (!unrecog_die))
@@ -936,6 +966,45 @@ gen_ctf_type (ctf_container_ref ctfc, dw_die_ref die)
   return type_id;
 }
 
+/* BTF support.  Handle any BTF tags attached to a given DIE, and generate
+   intermediate CTF types for them.  */
+
+static void
+handle_btf_tags (ctf_container_ref ctfc, dw_die_ref die, ctf_id_t type_id,
+		 int component_idx)
+{
+  dw_die_ref c;
+  const char * name = NULL;
+  const char * value = NULL;
+
+  c = dw_get_die_child (die);
+  if (c)
+    do
+      {
+	if (dw_get_die_tag (c) != DW_TAG_GNU_annotation)
+	  {
+	    c = dw_get_die_sib (c);
+	    continue;
+	  }
+
+   name = g

[PATCH 8/9] testsuite: add tests for BTF decl tags

2023-07-11 Thread David Faust via Gcc-patches
This patch adds tests for the btf_decl_tag attribute, in both DWARF
and BTF.

gcc/testsuite/

* gcc.dg/debug/btf/btf-decltag-func.c: New test.
* gcc.dg/debug/btf/btf-decltag-sou.c: New test.
* gcc.dg/debug/btf/btf-decltag-var.c: New test.
* gcc.dg/debug/dwarf2/annotation-decl-1.c: New test.
* gcc.dg/debug/dwarf2/annotation-decl-2.c: New test.
* gcc.dg/debug/dwarf2/annotation-decl-3.c: New test.
---
 .../gcc.dg/debug/btf/btf-decltag-func.c   | 21 
 .../gcc.dg/debug/btf/btf-decltag-sou.c| 33 +++
 .../gcc.dg/debug/btf/btf-decltag-var.c| 19 +++
 .../gcc.dg/debug/dwarf2/annotation-decl-1.c   |  9 +
 .../gcc.dg/debug/dwarf2/annotation-decl-2.c   | 18 ++
 .../gcc.dg/debug/dwarf2/annotation-decl-3.c   | 17 ++
 6 files changed, 117 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decltag-func.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decltag-sou.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decltag-var.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/annotation-decl-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/annotation-decl-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/annotation-decl-3.c

diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-decltag-func.c b/gcc/testsuite/gcc.dg/debug/btf/btf-decltag-func.c
new file mode 100644
index 000..12a5eff9ac7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf-decltag-func.c
@@ -0,0 +1,21 @@
+/* { dg-do compile }  */
+/* { dg-options "-O0 -gbtf -dA" } */
+
+#define __tag1 __attribute__((btf_decl_tag("decl-tag-1")))
+#define __tag2 __attribute__((btf_decl_tag("decl-tag-2")))
+#define __tag3 __attribute__((btf_decl_tag("decl-tag-3")))
+
+extern int bar (int __tag1, int __tag2) __tag3;
+
+int __tag1 __tag2 foo (int arg1, int *arg2 __tag2)
+  {
+return bar (arg1 + 1, *arg2 + 2);
+  }
+
+/* { dg-final { scan-assembler-times "\[\t \]0x1100\[\t \]+\[^\n\]*btt_info" 4 } } */
+/* { dg-final { scan-assembler-times "\[\t \]0x\[\t \]+\[^\n\]*decltag_compidx" 3 } } */
+/* { dg-final { scan-assembler-times "\[\t \]0x1\[\t \]+\[^\n\]*decltag_compidx" 1 } } */
+
+/* { dg-final { scan-assembler-times " BTF_KIND_DECL_TAG 'decl-tag-1'\[\\r\\n\]+\[^\\r\\n\]*\[\\r\\n\]+\[^\\r\\n\]*\\(BTF_KIND_FUNC" 1 } } */
+/* { dg-final { scan-assembler-times " BTF_KIND_DECL_TAG 'decl-tag-2'\[\\r\\n\]+\[^\\r\\n\]*\[\\r\\n\]+\[^\\r\\n\]*\\(BTF_KIND_FUNC" 2 } } */
+/* { dg-final { scan-assembler-times " BTF_KIND_DECL_TAG 'decl-tag-3'\[\\r\\n\]+\[^\\r\\n\]*\[\\r\\n\]+\[^\\r\\n\]*\\(BTF_KIND_FUNC" 1 } } */
diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-decltag-sou.c b/gcc/testsuite/gcc.dg/debug/btf/btf-decltag-sou.c
new file mode 100644
index 000..13c9f075b1e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf-decltag-sou.c
@@ -0,0 +1,33 @@
+
+/* { dg-do compile }  */
+/* { dg-options "-O0 -gbtf -dA" } */
+
+/* { dg-final { scan-assembler-times "\[\t \]0x1100\[\t \]+\[^\n\]*btt_info" 13 } } */
+/* { dg-final { scan-assembler-times "\[\t \]0\[\t \]+\[^\n\]*decltag_compidx" 2 } } */
+/* { dg-final { scan-assembler-times "\[\t \]0x1\[\t \]+\[^\n\]*decltag_compidx" 1 } } */
+/* { dg-final { scan-assembler-times "\[\t \]0x2\[\t \]+\[^\n\]*decltag_compidx" 3 } } */
+/* { dg-final { scan-assembler-times "\[\t \]0x3\[\t \]+\[^\n\]*decltag_compidx" 3 } } */
+/* { dg-final { scan-assembler-times "\[\t \]0x4\[\t \]+\[^\n\]*decltag_compidx" 1 } } */
+/* { dg-final { scan-assembler-times "\[\t \]0x\[\t \]+\[^\n\]*decltag_compidx" 3 } } */
+
+#define __tag1 __attribute__((btf_decl_tag("decl-tag-1")))
+#define __tag2 __attribute__((btf_decl_tag("decl-tag-2")))
+#define __tag3 __attribute__((btf_decl_tag("decl-tag-3")))
+
+struct t {
+  int a;
+  long b __tag3;
+  char c __tag2 __tag3;
+};
+
+struct t my_t __tag1 __tag2;
+
+union u {
+  char one __tag1 __tag2;
+  short two;
+  int three __tag1;
+  long four __tag1 __tag2 __tag3;
+  long long five __tag2;
+};
+
+union u my_u __tag3;
diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-decltag-var.c b/gcc/testsuite/gcc.dg/debug/btf/btf-decltag-var.c
new file mode 100644
index 000..563e8838f1a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/debug/btf/btf-decltag-var.c
@@ -0,0 +1,19 @@
+/* { dg-do compile }  */
+/* { dg-options "-O0 -gbtf -dA" } */
+
+unsigned long u __attribute__((btf_decl_tag ("__u")));
+
+const int * c __attribute__((btf_decl_tag ("__c"), btf_decl_tag ("devicemem")));
+
+struct st
+{
+  int a;
+  char c;
+};
+
+struct st my_st __attribute__((btf_decl_tag ("__my_st")));
+
+/* { dg-final { scan-assembler-times " BTF_KIND_DECL_TAG '__u'\[\\r\\n\]+\[^\\r\\n\]*\[\\r\\n\]+\[^\\r\\n\]*\\(BTF_KIND_VAR 'u'" 1 } } */
+/* { dg-final { scan-assembler-times " BTF_KIND_DECL_TAG '__c'\[\\r\\n\]+\[^\\r\\n\]*\[\\r\\n\]+\[^\\r\\n\]*\\(BTF_KIND_VAR 'c'" 1 } } */
+/* { dg-final { scan-assembler-times " BTF_KIND_DECL_TAG

[PATCH 1/9] c-family: add btf_decl_tag attribute

2023-07-11 Thread David Faust via Gcc-patches
Add the "btf_decl_tag" attribute to the attribute table, along with
a simple handler for it.

gcc/c-family/

* c-attribs.cc (c_common_attribute_table): Add btf_decl_tag.
(handle_btf_decl_tag_attribute): Handle new attribute.
---
 gcc/c-family/c-attribs.cc | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index e2792ca6898..0a3de3ea307 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -178,6 +178,8 @@ static tree handle_signed_bool_precision_attribute (tree *, tree, tree, int,
 static tree handle_retain_attribute (tree *, tree, tree, int, bool *);
 static tree handle_fd_arg_attribute (tree *, tree, tree, int, bool *);
 
+static tree handle_btf_decl_tag_attribute (tree *, tree, tree, int, bool *);
+
 /* Helper to define attribute exclusions.  */
 #define ATTR_EXCL(name, function, type, variable)  \
   { name, function, type, variable }
@@ -569,6 +571,9 @@ const struct attribute_spec c_common_attribute_table[] =
 handle_fd_arg_attribute, NULL},
   { "fd_arg_write",   1, 1, false, true, true, false,
 handle_fd_arg_attribute, NULL}, 
+  { "btf_decl_tag",  1, 1, true, false, false, false,
+ handle_btf_decl_tag_attribute, NULL },
+
   { NULL, 0, 0, false, false, false, false, NULL, NULL }
 };
 
@@ -5988,6 +5993,24 @@ handle_tainted_args_attribute (tree *node, tree name, tree, int,
   return NULL_TREE;
 }
 
+/* Handle a "btf_decl_tag" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_btf_decl_tag_attribute (tree *, tree name, tree args, int,
+			       bool *no_add_attrs)
+{
+  if (!args)
+    *no_add_attrs = true;
+  else if (TREE_CODE (TREE_VALUE (args)) != STRING_CST)
+    {
+      error ("%qE attribute requires a string", name);
+      *no_add_attrs = true;
+    }
+
+  return NULL_TREE;
+}
+
 /* Attempt to partially validate a single attribute ATTR as if
it were to be applied to an entity OPER.  */
 
-- 
2.40.1



[PATCH 7/9] btf: create and output BTF_KIND_DECL_TAG types

2023-07-11 Thread David Faust via Gcc-patches
This patch updates btfout.cc to be aware of BTF_KIND_DECL_TAG types and
output them appropriately.

gcc/

* btfout.cc (funcs_map): New hash map.
(btf_emit_preprocess): ... Initialize it here...
(btf_collect_datasec): ... Populate it here...
(btf_finalize): ... And free it here.
(get_btf_kind): Handle BTF_KIND_DECL_TAG.
(calc_num_vbytes): Likewise.
(btf_asm_type): Likewise.
(output_asm_btf_vlen_bytes): Likewise.
(btf_asm_type_ref): Update comment.
---
 gcc/btfout.cc | 79 +--
 1 file changed, 76 insertions(+), 3 deletions(-)

diff --git a/gcc/btfout.cc b/gcc/btfout.cc
index e6acf4e51a5..087d0b40e0a 100644
--- a/gcc/btfout.cc
+++ b/gcc/btfout.cc
@@ -104,6 +104,9 @@ static vec voids;
created. This vector holds them.  */
 static GTY (()) vec *funcs;
 
+/* Maps FUNC_PROTO types to the IDs of the corresponding FUNC types.  */
+static GTY (()) hash_map <ctf_dtdef_ref, unsigned> *funcs_map;
+
 /* The number of BTF variables added to the TU CTF container.  */
 static unsigned int num_vars_added = 0;
 
@@ -153,6 +156,7 @@ get_btf_kind (uint32_t ctf_kind)
 case CTF_K_VOLATILE: return BTF_KIND_VOLATILE;
 case CTF_K_CONST:return BTF_KIND_CONST;
 case CTF_K_RESTRICT: return BTF_KIND_RESTRICT;
+case CTFC_INT_K_DECL_TAG: return BTF_KIND_DECL_TAG;
 default:;
 }
   return BTF_KIND_UNKN;
@@ -316,6 +320,10 @@ btf_calc_num_vbytes (ctf_dtdef_ref dtd)
   vlen_bytes += vlen * sizeof (struct btf_var_secinfo);
   break;
 
+case BTF_KIND_DECL_TAG:
+  vlen_bytes += sizeof (struct btf_decl_tag);
+  break;
+
 default:
   break;
 }
@@ -425,13 +433,15 @@ btf_collect_datasec (ctf_container_ref ctfc)
   func_dtd->dtd_data = dtd->dtd_data;
   func_dtd->dtd_data.ctti_type = dtd->dtd_type;
   func_dtd->linkage = dtd->linkage;
-  func_dtd->dtd_type = num_types_added + num_types_created;
+  /* +1 for the sentinel type not in the types map.  */
+  func_dtd->dtd_type = num_types_added + num_types_created + 1;
 
   /* Only the BTF_KIND_FUNC type actually references the name. The
 BTF_KIND_FUNC_PROTO is always anonymous.  */
   dtd->dtd_data.ctti_name = 0;
 
   vec_safe_push (funcs, func_dtd);
+  funcs_map->put (dtd, func_dtd->dtd_type);
   num_types_created++;
 
   /* Mark any 'extern' funcs and add DATASEC entries for them.  */
@@ -449,7 +459,7 @@ btf_collect_datasec (ctf_container_ref ctfc)
  struct btf_var_secinfo info;
 
  /* +1 for the sentinel type not in the types map.  */
- info.type = func_dtd->dtd_type + 1;
+ info.type = func_dtd->dtd_type;
 
  /* Both zero at compile time.  */
  info.size = 0;
@@ -653,6 +663,7 @@ btf_emit_preprocess (ctf_container_ref ctfc)
 }
 
   btf_var_ids = hash_map::create_ggc (100);
+  funcs_map = hash_map<ctf_dtdef_ref, unsigned>::create_ggc (100);
 
   if (num_ctf_vars)
 {
@@ -709,7 +720,8 @@ btf_asm_type_ref (const char *prefix, ctf_container_ref ctfc, ctf_id_t ref_id)
   else if (ref_id >= num_types_added + 1
   && ref_id < num_types_added + num_vars_added + 1)
 {
-  /* Ref to a variable.  Should only appear in DATASEC entries.  */
+  /* Ref to a variable.
+Should only appear in DATASEC entries or DECL_TAGs.  */
   ctf_id_t var_id = btf_relative_var_id (ref_id);
   ctf_dvdef_ref dvd = ctfc->ctfc_vars_list[var_id];
   dw2_asm_output_data (4, ref_id, "%s: (BTF_KIND_VAR '%s')",
@@ -831,6 +843,59 @@ btf_asm_type (ctf_container_ref ctfc, ctf_dtdef_ref dtd)
 and should write 0.  */
   dw2_asm_output_data (4, 0, "(unused)");
   return;
+case BTF_KIND_DECL_TAG:
+  {
+   /* A decl tag might refer to (be the child DIE of) a variable.  Try to
+  lookup the parent DIE's CTF variable, and if it exists point to the
+  corresponding BTF variable.  This is an odd construction - we have a
+  'type' which refers to a variable, rather than the reverse.  */
+   dw_die_ref parent = get_die_parent (dtd->dtd_key);
+   ctf_dvdef_ref ref_dvd = ctf_dvd_lookup (ctfc, parent);
+   ctf_dtdef_ref ref_dtd = ctf_dtd_lookup (ctfc, parent);
+   if (ref_dvd)
+ {
+   /* The decl tag is on a variable.  */
+   unsigned int *var_id = btf_var_ids->get (ref_dvd);
+   gcc_assert (var_id);
+   btf_asm_type_ref ("btt_type", ctfc,
+ btf_absolute_var_id (*var_id));
+   return;
+ }
+   else if (ref_dtd)
+ {
+   /* Decl tags on functions refer to the FUNC_PROTO record as a
+  result of how they are created.  But we want them in the output
+  to refer to the synthesized FUNC record instead.  */
+   unsigned int *func_id = funcs_map->get (ref_dtd);
+   gcc_assert (func_id);
+   btf_asm_type_ref ("btt_type", ctfc, *func_id);
+   return;
+ }
+ 

Re: [PATCH] Include insn-opinit.h in PLUGIN_H [PR110610]

2023-07-11 Thread Jeff Law via Gcc-patches




On 7/11/23 04:37, Andre Vieira (lists) via Gcc-patches wrote:

Hi,

This patch fixes PR110610 by including OPTABS_H in the INTERNAL_FN_H 
list, as insn-opinit.h is now required by internal-fn.h. This will lead 
to insn-opinit.h, among the other OPTABS_H header files, being installed 
in the plugin directory.


Bootstrapped aarch64-unknown-linux-gnu.

@Jakub: could you check to see if it also addresses PR 110284?


gcc/ChangeLog:

     PR 110610
     * Makefile.in (INTERNAL_FN_H): Add OPTABS_H.
Why use OPTABS_H here?  Isn't the new dependency just on insn-opinit.h 
and insn-codes.h and neither of those #include other headers do they?



Jeff




Re: [PATCH] RISC-V: Optimize permutation codegen with vcompress

2023-07-11 Thread Jeff Law via Gcc-patches




On 7/11/23 00:38, juzhe.zh...@rivai.ai wrote:

From: Ju-Zhe Zhong 

This patch is to recognize specific permutation pattern which can be applied 
compress approach.

Consider this following case:
#include 
typedef int8_t vnx64i __attribute__ ((vector_size (64)));
#define MASK_64							\
  1, 2, 3, 5, 7, 9, 10, 11, 12, 14, 15, 17, 19, 21, 22, 23, 26, 28, 30, 31, \
  37, 38, 41, 46, 47, 53, 54, 55, 60, 61, 62, 63, 76, 77, 78, 79, 80, 81, \
  82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, \
  100, 101, 102, 103, 104, 105, 106, 107
void __attribute__ ((noinline, noclone)) test_1 (int8_t *x, int8_t *y, int8_t *out)
{
   vnx64i v1 = *(vnx64i*)x;
   vnx64i v2 = *(vnx64i*)y;
   vnx64i v3 = __builtin_shufflevector (v1, v2, MASK_64);
   *(vnx64i*)out = v3;
}

https://godbolt.org/z/P33nev6cW

Before this patch:
        lui     a4,%hi(.LANCHOR0)
        addi    a4,a4,%lo(.LANCHOR0)
        vl4re8.v        v4,0(a4)
        li      a4,64
        vsetvli a5,zero,e8,m4,ta,mu
        vl4re8.v        v20,0(a0)
        vl4re8.v        v16,0(a1)
        vmv.v.x v12,a4
        vrgather.vv     v8,v20,v4
        vmsgeu.vv       v0,v4,v12
        vsub.vv v4,v4,v12
        vrgather.vv     v8,v16,v4,v0.t
        vs4r.v  v8,0(a2)
        ret

After this patch:
        lui     a4,%hi(.LANCHOR0)
        addi    a4,a4,%lo(.LANCHOR0)
        vsetvli a5,zero,e8,m4,ta,ma
        vl4re8.v        v12,0(a1)
        vl4re8.v        v8,0(a0)
        vlm.v   v0,0(a4)
        vslideup.vi     v4,v12,20
        vcompress.vm    v4,v8,v0
        vs4r.v  v4,0(a2)
        ret

gcc/ChangeLog:

 * config/riscv/riscv-protos.h (enum insn_type): Add vcompress 
optimization.
 * config/riscv/riscv-v.cc (emit_vlmax_compress_insn): Ditto.
 (shuffle_compress_patterns): Ditto.
 (expand_vec_perm_const_1): Ditto.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rvv/autovec/vls-vlmax/compress-1.c: New test.
 * gcc.target/riscv/rvv/autovec/vls-vlmax/compress-2.c: New test.
 * gcc.target/riscv/rvv/autovec/vls-vlmax/compress-3.c: New test.
 * gcc.target/riscv/rvv/autovec/vls-vlmax/compress-4.c: New test.
 * gcc.target/riscv/rvv/autovec/vls-vlmax/compress-5.c: New test.
 * gcc.target/riscv/rvv/autovec/vls-vlmax/compress-6.c: New test.
 * gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-1.c: New test.
 * gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-2.c: New test.
 * gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-3.c: New test.
 * gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-4.c: New test.
 * gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-5.c: New test.
 * gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-6.c: New test.
I had to look at this a few times, but I think that's because it's been 
polluted by another vector architecture's handling of compressed 
vectors.  What you're doing looks quite reasonable.


OK for the trunk.

jeff



Ping: [PATCH v2] libcpp: Handle extended characters in user-defined literal suffix [PR103902]

2023-07-11 Thread Lewis Hyatt via Gcc-patches
May I please ping this patch again? I think it would be worthwhile to
close this gap in the support for UTF-8 sources. Thanks!
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613247.html

-Lewis

On Fri, Jun 2, 2023 at 9:45 AM Lewis Hyatt  wrote:
>
> Hello-
>
> Ping please? Thanks.
> https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613247.html
>
> -Lewis
>
> On Tue, May 2, 2023 at 9:27 AM Lewis Hyatt  wrote:
> >
> > May I please ping this one? Thanks...
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613247.html
> >
> > On Thu, Mar 2, 2023 at 6:21 PM Lewis Hyatt  wrote:
> > >
> > > The PR complains that we do not handle UTF-8 in the suffix for a 
> > > user-defined
> > > literal, such as:
> > >
> > > bool operator ""_π (unsigned long long);
> > >
> > > In fact we don't handle any extended identifier characters there, whether
> > > UTF-8, UCNs, or the $ sign. We do handle it fine if the optional space 
> > > after
> > > the "" tokens is included, since then the identifier is lexed in the 
> > > "normal"
> > > way as its own token. But when it is lexed as part of the string token, 
> > > this
> > > is handled in lex_string() with a one-off loop that is not aware of 
> > > extended
> > > characters.
> > >
> > > This patch fixes it by adding a new function scan_cur_identifier() that 
> > > can be
> > > used to lex an identifier while in the middle of lexing another token.
> > >
> > > BTW, the other place that has been mis-lexing identifiers is
> > > lex_identifier_intern(), which is used to implement #pragma push_macro
> > > and #pragma pop_macro. This does not support extended characters either.
> > > I will add that in a subsequent patch, because it can't directly reuse the
> > > new function, but rather needs to lex from a string instead of a 
> > > cpp_buffer.
> > >
> > > With scan_cur_identifier(), we do also correctly warn about bidi and
> > > normalization issues in the extended identifiers comprising the suffix.
> > >
> > > libcpp/ChangeLog:
> > >
> > > PR preprocessor/103902
> > > * lex.cc (identifier_diagnostics_on_lex): New function refactoring
> > > some common code.
> > > (lex_identifier_intern): Use the new function.
> > > (lex_identifier): Don't run identifier diagnostics here, rather 
> > > let
> > > the call site do it when needed.
> > > (_cpp_lex_direct): Adjust the call sites of lex_identifier ()
> > > accordingly.
> > > (struct scan_id_result): New struct.
> > > (scan_cur_identifier): New function.
> > > (create_literal2): New function.
> > > (lit_accum::create_literal2): New function.
> > > (is_macro): Folded into new function...
> > > (maybe_ignore_udl_macro_suffix): ...here.
> > > (is_macro_not_literal_suffix): Folded likewise.
> > > (lex_raw_string): Handle UTF-8 in UDL suffix via 
> > > scan_cur_identifier ().
> > > (lex_string): Likewise.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > PR preprocessor/103902
> > > * g++.dg/cpp0x/udlit-extended-id-1.C: New test.
> > > * g++.dg/cpp0x/udlit-extended-id-2.C: New test.
> > > * g++.dg/cpp0x/udlit-extended-id-3.C: New test.
> > > * g++.dg/cpp0x/udlit-extended-id-4.C: New test.
> > > ---
> > >
> > > Notes:
> > > Hello-
> > >
> > > This is the updated version of the patch, incorporating feedback from 
> > > Jakub
> > > and Jason, most recently discussed here:
> > >
> > > https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612073.html
> > >
> > > Please let me know how it looks? It is simpler than before with the 
> > > new
> > > approach. Thanks!
> > >
> > > One thing to note. As Jason clarified for me, a usage like this:
> > >
> > >  #pragma GCC poison _x
> > > const char * operator "" _x (const char *, unsigned long);
> > >
> > > The space between the "" and the _x is currently allowed but will be
> > > deprecated in C++23. GCC currently will complain about the poisoned 
> > > use of
> > > _x in this case, and this patch, which is just focused on handling 
> > > UTF-8
> > > properly, does not change this. But it seems that it would be correct
> > > not to apply poison in this case. I can try to follow up with a patch 
> > > to do
> > > so, if it seems worthwhile? Given the syntax is deprecated, maybe 
> > > it's not
> > > worth it...
> > >
> > > For the time being, this patch does add a testcase for the above and 
> > > xfails
> > > it. For the case where no space is present, which is the part touched 
> > > by the
> > > present patch, existing behavior is preserved correctly and no 
> > > diagnostics
> > > such as poison are issued for the UDL suffix. (Contrary to v1 of this
> > > patch.)
> > >
> > > Thanks! bootstrap + regtested all languages on x86-64 Linux with
> > > no regressions.
> > >
> > > -Lewis
> > >
> > >  .../g++.dg/cpp0x/udlit-extended-i

RE: [PATCH] RISC-V: Optimize permutation codegen with vcompress

2023-07-11 Thread Li, Pan2 via Gcc-patches
Committed, thanks Jeff.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Jeff Law via Gcc-patches
Sent: Wednesday, July 12, 2023 7:19 AM
To: juzhe.zh...@rivai.ai; gcc-patches@gcc.gnu.org
Cc: kito.ch...@gmail.com; kito.ch...@sifive.com; rdapp@gmail.com
Subject: Re: [PATCH] RISC-V: Optimize permutation codegen with vcompress



On 7/11/23 00:38, juzhe.zh...@rivai.ai wrote:
> From: Ju-Zhe Zhong 
> 
> This patch is to recognize specific permutation pattern which can be applied 
> compress approach.
> 
> Consider this following case:
> #include 
> typedef int8_t vnx64i __attribute__ ((vector_size (64)));
> #define MASK_64                                                             \
>   1, 2, 3, 5, 7, 9, 10, 11, 12, 14, 15, 17, 19, 21, 22, 23, 26, 28, 30, 31, \
>   37, 38, 41, 46, 47, 53, 54, 55, 60, 61, 62, 63, 76, 77, 78, 79, 80, 81,   \
>   82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,   \
>   100, 101, 102, 103, 104, 105, 106, 107
> void __attribute__ ((noinline, noclone))
> test_1 (int8_t *x, int8_t *y, int8_t *out)
> {
>vnx64i v1 = *(vnx64i*)x;
>vnx64i v2 = *(vnx64i*)y;
>vnx64i v3 = __builtin_shufflevector (v1, v2, MASK_64);
>*(vnx64i*)out = v3;
> }
> 
> https://godbolt.org/z/P33nev6cW
> 
> Before this patch:
>  lui a4,%hi(.LANCHOR0)
>  addia4,a4,%lo(.LANCHOR0)
>  vl4re8.vv4,0(a4)
>  li  a4,64
>  vsetvli a5,zero,e8,m4,ta,mu
>  vl4re8.vv20,0(a0)
>  vl4re8.vv16,0(a1)
>  vmv.v.x v12,a4
>  vrgather.vv v8,v20,v4
>  vmsgeu.vv   v0,v4,v12
>  vsub.vv v4,v4,v12
>  vrgather.vv v8,v16,v4,v0.t
>  vs4r.v  v8,0(a2)
>  ret
> 
> After this patch:
>   lui a4,%hi(.LANCHOR0)
>   addia4,a4,%lo(.LANCHOR0)
>   vsetvli a5,zero,e8,m4,ta,ma
>   vl4re8.vv12,0(a1)
>   vl4re8.vv8,0(a0)
>   vlm.v   v0,0(a4)
>   vslideup.vi v4,v12,20
>   vcompress.vmv4,v8,v0
>   vs4r.v  v4,0(a2)
>   ret
> 
> gcc/ChangeLog:
> 
>  * config/riscv/riscv-protos.h (enum insn_type): Add vcompress 
> optimization.
>  * config/riscv/riscv-v.cc (emit_vlmax_compress_insn): Ditto.
>  (shuffle_compress_patterns): Ditto.
>  (expand_vec_perm_const_1): Ditto.
> 
> gcc/testsuite/ChangeLog:
> 
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/compress-1.c: New test.
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/compress-2.c: New test.
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/compress-3.c: New test.
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/compress-4.c: New test.
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/compress-5.c: New test.
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/compress-6.c: New test.
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-1.c: New test.
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-2.c: New test.
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-3.c: New test.
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-4.c: New test.
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-5.c: New test.
>  * gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-6.c: New test.
I had to look at this a few times, but I think that's because it's been 
polluted by another vector architecture's handling of compressed 
vectors.  What you're doing looks quite reasonable.

OK for the trunk.

jeff
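The mask property that makes the vcompress rewrite above applicable can be sketched as follows (a standalone Python illustration, not GCC code; `is_compress_pattern` is a made-up name — the real check lives in shuffle_compress_patterns and also handles how the second source is slid into place): a shuffle can use a compress when the selected source indices are strictly increasing, i.e. elements are only dropped, never reordered.

```python
# Sketch (illustrative, not GCC code): a two-source shuffle can be lowered
# to vcompress when the selected indices into the concatenated sources are
# strictly increasing -- relative element order is preserved.

def is_compress_pattern(sel):
    """Return True if the permutation indices are strictly increasing."""
    return all(a < b for a, b in zip(sel, sel[1:]))

mask_64_prefix = [1, 2, 3, 5, 7, 9, 10, 11, 12, 14, 15, 17, 19, 21]
print(is_compress_pattern(mask_64_prefix))   # True: compress applies
print(is_compress_pattern([2, 0, 1]))        # False: order not preserved
```

With the MASK_64 indices above, the property holds, which is why a single vcompress.vm (plus a vslideup.vi for the second source) can replace the two-vrgather.vv sequence.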



Re: [x86 PATCH] Tweak ix86_expand_int_compare to use PTEST for vector equality.

2023-07-11 Thread Hongtao Liu via Gcc-patches
On Wed, Jul 12, 2023 at 4:57 AM Roger Sayle  wrote:
>
>
> > From: Hongtao Liu 
> > Sent: 28 June 2023 04:23
> > > From: Roger Sayle 
> > > Sent: 27 June 2023 20:28
> > >
> > > I've also come up with an alternate/complementary/supplementary
> > > fix of generating the PTEST during RTL expansion, rather than rely on
> > > this being caught/optimized later during STV.
> > >
> > > You may notice in this patch, the tests for TARGET_SSE4_1 and TImode
> > > appear last.  When I was writing this, I initially also added support
> > > for AVX VPTEST and OImode, before realizing that x86 doesn't (yet)
> > > support 256-bit OImode (which also explains why we don't have an
> > > OImode to V1OImode scalar-to-vector pass).  Retaining this clause
> > > ordering should minimize the lines changed if things change in future.
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check, both with and without --target_board=unix{-m32}
> > > with no new failures.  Ok for mainline?
> > >
> > >
> > > 2023-06-27  Roger Sayle  
> > >
> > > gcc/ChangeLog
> > > * config/i386/i386-expand.cc (ix86_expand_int_compare): If
> > > testing a TImode SUBREG of a 128-bit vector register against
> > > zero, use a PTEST instruction instead of first moving it to
> > > to scalar registers.
> > >
> >
> > +  /* Attempt to use PTEST, if available, when testing vector modes for
> > + equality/inequality against zero.  */  if (op1 == const0_rtx
> > +  && SUBREG_P (op0)
> > +  && cmpmode == CCZmode
> > +  && SUBREG_BYTE (op0) == 0
> > +  && REG_P (SUBREG_REG (op0))
> > Just register_operand (op0, TImode),
>
> I completely agree that in most circumstances, the early RTL optimizers
> should use standard predicates, such as register_operand, that don't
> distinguish between REG and SUBREG, allowing the choice (assignment)
> to be left to register allocation (reload).
>
> However in this case, unusually, the presence of the SUBREG, and treating
> it differently from a REG is critical (in fact the reason for the patch).  
> x86_64
> can very efficiently test whether a 128-bit value is zero, setting ZF, either
> in TImode, using orq %rax,%rdx in a single cycle/single instruction, or in
> V1TImode, using ptest %xmm0,%xmm0, in a single cycle/single instruction.
> There's no reason to prefer one form over the other.  A SUBREG, however, that
> moves the value from the scalar registers to a vector register, or from a
> vector register to scalar registers, requires two or three instructions,
> often reading
> and writing values via memory, at a huge performance penalty.   Hence the
> goal is to eliminate the (VIEW_CONVERT) SUBREG, and choose the appropriate
> single-cycle test instruction for where the data is located.  Hence we want
> to leave REG_P alone, but optimize (only) the SUBREG_P cases.
> register_operand doesn't help with this.
>
> Note this is counter to the usual advice.  Normally, a SUBREG between scalar
> registers is cheap (in fact free) on x86, hence it is safe for predicates to
> ignore
> them prior to register allocation.  But another use of SUBREG, to represent
> a VIEW_CONVERT_EXPR/transfer between processing units is closer to a
> conversion, and a very expensive one (going via memory with different size
> reads vs writes) at that.
>
>
> > +  && VECTOR_MODE_P (GET_MODE (SUBREG_REG (op0)))
> > +  && TARGET_SSE4_1
> > +  && GET_MODE (op0) == TImode
> > +  && GET_MODE_SIZE (GET_MODE (SUBREG_REG (op0))) == 16)
> > +{
> > +  tmp = SUBREG_REG (op0);
> > and tmp = lowpart_subreg (V1TImode, force_reg (TImode, op0));?
> > I think RA can handle SUBREG correctly, no need for extra predicates.
>
> Likewise, your "tmp = lowpart_subreg (V1TImode, force_reg (TImode, ...))"
> is forcing there to always be an inter-unit transfer/pipeline stall, when 
> this is
> idiom that we're trying to eliminate.
>
> I should have repeated the motivating example from my original post at
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622706.html
>
> typedef long long __m128i __attribute__ ((__vector_size__ (16)));
> int foo (__m128i x, __m128i y) {
>   return (__int128)x == (__int128)y;
> }
>
> is currently generated as:
> foo:movaps  %xmm0, -40(%rsp)
> movq-32(%rsp), %rdx
> movq%xmm0, %rax
> movq%xmm1, %rsi
> movaps  %xmm1, -24(%rsp)
> movq-16(%rsp), %rcx
> xorq%rsi, %rax
> xorq%rcx, %rdx
> orq %rdx, %rax
> sete%al
> movzbl  %al, %eax
> ret
>
> with this patch (to eliminate the interunit SUBREG) this becomes:
>
> foo:pxor%xmm1, %xmm0
> xorl%eax, %eax
> ptest   %xmm0, %xmm0
> sete%al
> ret
>
> Hopefully, this clarifies things a little.
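As a quick model of why the two sequences above are equivalent (illustrative Python, not compiler code): ptest sets ZF when the bitwise AND of its operands is all-zero, so pxor followed by ptest of the result with itself tests whether the full 128-bit value x ^ y is zero — the same predicate the scalar xorq/xorq/orq sequence computes half by half.

```python
# Model of both code sequences: each computes ZF for "x == y" on a
# 128-bit value, one via two 64-bit scalar ops, one via pxor + ptest
# (ptest sets ZF when the AND of its two operands is all-zero).

M64 = (1 << 64) - 1

def scalar_eq128(x, y):
    # xorq/xorq/orq path: compare low and high halves in GPRs.
    lo = (x & M64) ^ (y & M64)
    hi = (x >> 64) ^ (y >> 64)
    return (lo | hi) == 0

def ptest_eq128(x, y):
    # pxor %xmm1,%xmm0 ; ptest %xmm0,%xmm0 ; sete
    v = x ^ y
    return (v & v) == 0   # ptest's ZF condition

for a, b in [(0, 0), (1, 2), (1 << 100, 1 << 100), (M64, M64 + 1)]:
    assert scalar_eq128(a, b) == ptest_eq128(a, b) == (a == b)
print("ok")
```

Either predicate is one cycle on its own unit; the cost the patch removes is the SUBREG-induced transfer between units, not the comparison itself.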
Thanks for the explanation, the patch LGTM.
One curious question, is there any case SUBREG_BYTE != 0 when inner
and outer mode(TImode) have the s

Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization

2023-07-11 Thread Jeff Law via Gcc-patches




On 7/7/23 08:32, Juzhe-Zhong wrote:

This patch fully supports gather_load/scatter_store:
1. Support single-rgroup on both RV32/RV64.
2. Support indexed element widths that are the same as or smaller than Pmode.
3. Support VLA SLP with gather/scatter.
4. Fully tested all gather/scatter with LMUL = M1/M2/M4/M8 both VLA and VLS.
5. Fix bug of handling (subreg:SI (const_poly_int:DI))
6. Fix bug on vec_perm which is used by gather/scatter SLP.

All kinds of GATHER/SCATTER are normalized into LEN_MASK_*.
We fully support these 4 kinds of gather/scatter:
1. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and dummy mask 
(Full vector).
2. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and real mask.
3. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and dummy mask.
4. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and real mask.

We use vluxei/vsuxei (un-ordered indexed loads/stores of RVV) to code-generate
gather/scatter.

Also, we support strided loads/stores with vlse.v/vsse.v. Consider the
following case:
#define TEST_LOOP(DATA_TYPE, BITS)                                           \
  void __attribute__ ((noinline, noclone))                                   \
  f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
			  INDEX##BITS stride, INDEX##BITS n)                 \
  {                                                                          \
    for (INDEX##BITS i = 0; i < n; ++i)                                      \
      dest[i] += src[i * stride];                                            \
  }

Codegen:
f_int8_t_8:
ble a3,zero,.L10
li  a5,1
mv  a4,a0
bne a2,a5,.L4
li  a2,1
.L6:
vsetvli a5,a3,e8,m2,ta,ma
vle8.v  v2,0(a0)
vlse8.v v4,0(a1),a2
vsetvli a6,zero,e8,m2,ta,ma
sub a3,a3,a5
vadd.vv v2,v2,v4
vsetvli zero,a5,e8,m2,ta,ma
vse8.v  v2,0(a4)
add a0,a0,a5
add a1,a1,a5
add a4,a4,a5
bne a3,zero,.L6
.L10:
ret

We use vlse.v instead of vluxei.
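The strided special case can be sketched as follows (illustrative Python, made-up names — the real predicate is strided_load_store_p in riscv-v.cc): when the gather offsets form an arithmetic progression `0, s, 2s, …` for a loop-invariant stride `s`, the access is strided and a vlse.v/vsse.v can replace the indexed vluxei/vsuxei.

```python
# Sketch of the strided-access recognition idea: offsets that form an
# arithmetic progression starting at 0 admit a strided load/store
# (vlse.v/vsse.v) instead of an indexed one (vluxei/vsuxei).

def strided_access_p(offsets):
    """True if offsets are 0, s, 2s, ... for some constant stride s."""
    if len(offsets) < 2 or offsets[0] != 0:
        return False
    s = offsets[1]
    return all(off == i * s for i, off in enumerate(offsets))

print(strided_access_p([0, 3, 6, 9]))   # True: stride 3 -> vlse.v
print(strided_access_p([0, 3, 5, 9]))   # False: truly indexed -> vluxei
```

In the TEST_LOOP above, `src[i * stride]` produces exactly such a progression, which is why the generated loop uses vlse8.v with the stride in a scalar register.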

This patch has been tested on both RV32 and RV64.

gcc/ChangeLog:

 * config/riscv/autovec.md 
(len_mask_gather_load): New pattern.
 (len_mask_gather_load): Ditto.
 (len_mask_gather_load): Ditto.
 (len_mask_gather_load): Ditto.
 (len_mask_gather_load): Ditto.
 (len_mask_gather_load): Ditto.
 (len_mask_gather_load): Ditto.
 (len_mask_gather_load): Ditto.
 (len_mask_scatter_store): Ditto.
 (len_mask_scatter_store): Ditto.
 (len_mask_scatter_store): Ditto.
 (len_mask_scatter_store): Ditto.
 (len_mask_scatter_store): Ditto.
 (len_mask_scatter_store): Ditto.
 (len_mask_scatter_store): Ditto.
 (len_mask_scatter_store): Ditto.
 * config/riscv/predicates.md (const_1_operand): New predicate.
 (vector_gs_offset_operand): Ditto.
 (vector_gs_scale_operand_16): Ditto.
 (vector_gs_scale_operand_32): Ditto.
 (vector_gs_scale_operand_64): Ditto.
 (vector_gs_extension_operand): Ditto.
 (vector_gs_scale_operand_16_rv32): Ditto.
 (vector_gs_scale_operand_32_rv32): Ditto.
 * config/riscv/riscv-protos.h (enum insn_type): Add gather/scatter.
 (expand_gather_scatter): New function.
 * config/riscv/riscv-v.cc (gen_const_vector_dup): Add gather/scatter.
 (emit_vlmax_masked_store_insn): New function.
 (emit_nonvlmax_masked_store_insn): Ditto.
 (modulo_sel_indices): Ditto.
 (expand_vec_perm): Fix SLP for gather/scatter.
 (prepare_gather_scatter): New function.
 (strided_load_store_p): Ditto.
 (expand_gather_scatter): Ditto.
 * config/riscv/riscv.cc (riscv_legitimize_move): Fix bug of (subreg:SI 
(DI CONST_POLY_INT)).
 * config/riscv/vector-iterators.md: Add gather/scatter.
 * config/riscv/vector.md (vec_duplicate): Use "@" instead.
 (@vec_duplicate): Ditto.
 (@pred_indexed_store): Fix 
name.
 (@pred_indexed_store): Ditto.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rvv/rvv.exp: Add gather/scatter tests.
 * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-1.c: New 
test.
 * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-10.c: New 
test.
 * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-11.c: New 
test.
 * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-12.c: New 
test.
 * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-2.c: New 
test.
 * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-3.c: New 
test.
 * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-4.c: New 
test.
 * gcc.target/riscv/rvv/autovec/gather-scatter/gather_load-5.c: New 
test.

Re: [PATCH] riscv: thead: Fix failing XTheadCondMov tests (indirect-rv[32|64])

2023-07-11 Thread Jeff Law via Gcc-patches




On 7/10/23 22:44, Christoph Muellner wrote:

From: Christoph Müllner 

Recently, two identical XTheadCondMov tests have been added, which both fail.
Let's fix that by changing the following:
* Merge both files into one (no need for separate tests for rv32 and rv64)
* Drop unrelated attribute check test (we already test for `th.mveqz`
   and `th.mvnez` instructions, so there is little additional value)
* Fix the pattern to allow matching

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadcondmov-indirect-rv32.c: Moved to...
* gcc.target/riscv/xtheadcondmov-indirect.c: ...here.
* gcc.target/riscv/xtheadcondmov-indirect-rv64.c: Removed.
I thought this stuff got fixed recently.  Certainly happy to see the 
files merged though.  Here's what I got from the July 4 run:



UNSUPPORTED: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O0
UNSUPPORTED: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O1
PASS: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2  (test for excess 
errors)
PASS: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2   
check-function-bodies ConEmv_imm_imm_reg
PASS: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2   
check-function-bodies ConEmv_imm_reg_reg
PASS: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2   
check-function-bodies ConEmv_reg_imm_reg
PASS: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2   
check-function-bodies ConEmv_reg_reg_reg
PASS: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2   
check-function-bodies ConNmv_imm_imm_reg
PASS: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2   
check-function-bodies ConNmv_imm_reg_reg
PASS: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2   
check-function-bodies ConNmv_reg_imm_reg
PASS: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2   
check-function-bodies ConNmv_reg_reg_reg
PASS: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2   scan-assembler .attribute 
arch, "rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_xtheadcondmov1p0"
UNSUPPORTED: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O3 -g
UNSUPPORTED: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -Os
UNSUPPORTED: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none
UNSUPPORTED: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects
UNSUPPORTED: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O0
UNSUPPORTED: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O1
PASS: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2  (test for excess 
errors)
PASS: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2   
check-function-bodies ConEmv_imm_imm_reg
PASS: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2   
check-function-bodies ConEmv_imm_reg_reg
PASS: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2   
check-function-bodies ConEmv_reg_imm_reg
PASS: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2   
check-function-bodies ConEmv_reg_reg_reg
PASS: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2   
check-function-bodies ConNmv_imm_imm_reg
PASS: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2   
check-function-bodies ConNmv_imm_reg_reg
PASS: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2   
check-function-bodies ConNmv_reg_imm_reg
PASS: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2   
check-function-bodies ConNmv_reg_reg_reg
PASS: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2   scan-assembler .attribute 
arch, "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_xtheadcondmov1p0"
UNSUPPORTED: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O3 -g
UNSUPPORTED: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -Os
UNSUPPORTED: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2 -flto 
-fno-use-linker-plugin -flto-partition=none
UNSUPPORTED: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects



jeff


Re: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization

2023-07-11 Thread juzhe.zh...@rivai.ai
Hi, Jeff.

>> Hmm, I'm not sure this is safe, especially if gimple->rtl expansion is
>> complete.  While you might be able to get REG_EXPR, I would not really
>> expect SSA_NAME_DEF_STMT to be correct.  At the least it'll need some
>> way to make sure it's not called at an inappropriate time.
I think it's safe: if SSA_NAME_DEF_STMT is NULL, we just return.

>> Should this have been known_lt rather than known_le?
It should be LE, since I will pass through GET_MODE_NUNITS/GET_MODE_SIZE for 
SLP.

>> Something's off in your formatting here.  I'd guess spaces vs tabs
Ok.

>>In a few places you're using expand_binop.  Those interfaces are really
>>more for gimple->RTL.  BUt code like expand_gather_scatter is really
>>RTL, not gimple/tree.   Is there a reason why you're not using pure RTL
>>interfaces?
I saw ARM SVE is using them in many places for expanding patterns.
And I think it's convenient so that's why I use them.

Thanks.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-07-12 10:01
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; rdapp.gcc
Subject: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV 
auto-vectorization
 
 
On 7/7/23 08:32, Juzhe-Zhong wrote:
> This patch fully support gather_load/scatter_store:
> 1. Support single-rgroup on both RV32/RV64.
> 2. Support indexed element width can be same as or smaller than Pmode.
> 3. Support VLA SLP with gather/scatter.
> 4. Fully tested all gather/scatter with LMUL = M1/M2/M4/M8 both VLA and VLS.
> 5. Fix bug of handling (subreg:SI (const_poly_int:DI))
> 6. Fix bug on vec_perm which is used by gather/scatter SLP.
> 
> All kinds of GATHER/SCATTER are normalized into LEN_MASK_*.
> We fully supported these 4 kinds of gather/scatter:
> 1. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and dummy 
> mask (Full vector).
> 2. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with dummy length and real
> mask.
> 3. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and dummy
> mask.
> 4. LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE with real length and real mask.
> 
> We use vluxei/vsuxei (un-ordered indexed loads/stores of RVV) to code-generate
> gather/scatter.
> 
> Also, we support strided loads/stores with vlse.v/vsse.v. Consider this 
> following case:
> #define TEST_LOOP(DATA_TYPE, BITS)
>  \
>void __attribute__ ((noinline, noclone))   
>   \
>f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, 
>   \
>   INDEX##BITS stride, INDEX##BITS n)   \
>{  
>   \
>  for (INDEX##BITS i = 0; i < n; ++i)  
>   \
>dest[i] += src[i * stride];
>   \
>}
> 
> Codegen:
> f_int8_t_8:
> ble a3,zero,.L10
> li a5,1
> mv a4,a0
> bne a2,a5,.L4
> li a2,1
> .L6:
> vsetvli a5,a3,e8,m2,ta,ma
> vle8.v v2,0(a0)
> vlse8.v v4,0(a1),a2
> vsetvli a6,zero,e8,m2,ta,ma
> sub a3,a3,a5
> vadd.vv v2,v2,v4
> vsetvli zero,a5,e8,m2,ta,ma
> vse8.v v2,0(a4)
> add a0,a0,a5
> add a1,a1,a5
> add a4,a4,a5
> bne a3,zero,.L6
> .L10:
> ret
> 
> We use vlse.v instead of vluxei.
> 
> This patch has been tested on both RV32 and RV64.
> 
> gcc/ChangeLog:
> 
>  * config/riscv/autovec.md 
> (len_mask_gather_load): New pattern.
>  (len_mask_gather_load): Ditto.
>  (len_mask_gather_load): Ditto.
>  (len_mask_gather_load): Ditto.
>  (len_mask_gather_load): Ditto.
>  (len_mask_gather_load): Ditto.
>  (len_mask_gather_load): Ditto.
>  (len_mask_gather_load): Ditto.
>  (len_mask_scatter_store): Ditto.
>  (len_mask_scatter_store): Ditto.
>  (len_mask_scatter_store): Ditto.
>  (len_mask_scatter_store): Ditto.
>  (len_mask_scatter_store): Ditto.
>  (len_mask_scatter_store): Ditto.
>  (len_mask_scatter_store): Ditto.
>  (len_mask_scatter_store): Ditto.
>  * config/riscv/predicates.md (const_1_operand): New predicate.
>  (vector_gs_offset_operand): Ditto.
>  (vector_gs_scale_operand_16): Ditto.
>  (vector_gs_scale_operand_32): Ditto.
>  (vector_gs_scale_operand_64): Ditto.
>  (vector_gs_extension_operand): Ditto.
>  (vector_gs_scale_operand_16_rv32): Ditto.
>  (vector_gs_scale_operand_32_rv32): Ditto.
>  * config/riscv/riscv-protos.h (enum insn_type): Add gather/scatter.
>  (expand_gather_scatter): New function.
>  * config/riscv/riscv-v.cc (gen_const_vector_dup): Add gather/scatter.
>  (emit_vlmax_masked_store_insn): New function.
>  (emit_nonvlmax_masked_store_insn): Ditto.
>  (modulo_sel_indices): Ditto.
>  (expand_vec_perm): Fix SLP for gather/scatter.
>  (prepare_gather_scatter): New function.
>  (strided_load_store_p): Ditto.
>  (expand

[RFC] Store_bit_field_1: Use mode of SUBREG instead of REG

2023-07-11 Thread YunQiang Su
PR #104914

When working with
  int val;
  ((unsigned char*)&val)[0] = *buf;
the RTX mode is obtained from the REG instead of the SUBREG, so the full
register's mode is used instead of the subreg's mode.
Thus something goes wrong on sign-extend-by-default architectures,
like MIPS64.
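Concretely, the store is a bit-field insertion into bits [0, 8) of a 32-bit object (little-endian byte 0). A sketch of that insertion, with an illustrative helper name (not the expmed.cc logic): the point of the patch is that the expansion should use the subreg's 32-bit mode, since on MIPS64 32-bit values are kept sign-extended in 64-bit registers and picking the 64-bit register mode presumably produces the wrong insertion width.

```python
# Little-endian model (illustrative, not expmed.cc) of
#   ((unsigned char *)&val)[0] = *buf;
# i.e. a bit-field insertion into bits [0, 8) of a 32-bit value.

def insert_byte0_le(val32, byte):
    """Replace the low byte (little-endian byte 0) of a 32-bit value."""
    return (val32 & ~0xFF) | (byte & 0xFF)

print(hex(insert_byte0_le(0x12345678, 0xAB)))  # 0x123456ab
```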

gcc/ChangeLog:
PR 104914
* expmed.cc (store_bit_field_1): Get mode from original
str_rtx instead of op0.
---
 gcc/expmed.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index fbd4ce2d42f..37f90912122 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -849,7 +849,7 @@ store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, 
poly_uint64 bitnum,
  if we aren't.  This must come after the entire register case above,
  since that case is valid for any mode.  The following cases are only
  valid for integral modes.  */
-  opt_scalar_int_mode op0_mode = int_mode_for_mode (GET_MODE (op0));
+  opt_scalar_int_mode op0_mode = int_mode_for_mode (GET_MODE (str_rtx));
   scalar_int_mode imode;
   if (!op0_mode.exists (&imode) || imode != GET_MODE (op0))
 {
-- 
2.30.2



[PATCH] RISC-V: Throw compilation error for unknown sub-extension or supervisor extension

2023-07-11 Thread Lehua Ding
Hi,

This tiny patch adds a check for extensions starting with 'z' or 's' in the
`-march` option. Currently an unknown extension is passed through to the
assembler, which then reports an error. With this patch, the compiler throws a
compilation error if an extension starting with 'z' or 's' is not a standard
sub-extension or supervisor extension.

e.g.:

Running `riscv64-unknown-elf-gcc -march=rv64gcv_zvl128_s123 a.c` will throw
these errors:

riscv64-unknown-elf-gcc: error: '-march=rv64gcv_zvl128_s123': extension 'zvl' 
starts with `z` but is not a standard sub-extension
riscv64-unknown-elf-gcc: error: '-march=rv64gcv_zvl128_s123': extension 's123' 
start with `s` but not a standard supervisor extension
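The added check is essentially a membership test against the known-extension table. A sketch (the table here is a small illustrative subset and `check_extension` is a made-up name — the real scan is standard_extensions_p walking riscv_ext_version_table):

```python
# Sketch of the -march validation: any component that begins with 'z' or
# 's' must name a known standard sub-extension / supervisor extension.
# KNOWN is an illustrative subset of the real extension table.

KNOWN = {"zicsr", "zifencei", "zmmul", "zvl128b", "svnapot", "svinval"}

def check_extension(ext):
    """Return an error message for a bad 'z'/'s' extension, else None."""
    if ext[0] in "zs" and ext not in KNOWN:
        kind = "sub-extension" if ext[0] == "z" else "supervisor extension"
        return ("extension '%s' starts with `%s` but is not a standard %s"
                % (ext, ext[0], kind))
    return None

print(check_extension("zvl128"))   # rejected: needs a length suffix (zvl128b)
print(check_extension("svnapot"))  # None: accepted
```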

Best,
Lehua

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (standard_extensions_p): New func.
(riscv_subset_list::add): Add check.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-3.c: Update -march.
* gcc.target/riscv/arch-5.c: Ditto.
* gcc.target/riscv/arch-8.c: Ditto.
* gcc.target/riscv/attribute-10.c: Ditto.
* gcc.target/riscv/attribute-9.c: Ditto.
* gcc.target/riscv/pr102957.c: Ditto.
* gcc.target/riscv/arch-22.cc: New test.

---
 gcc/common/config/riscv/riscv-common.cc   | 29 +++
 gcc/testsuite/gcc.target/riscv/arch-22.cc |  8 +
 gcc/testsuite/gcc.target/riscv/arch-3.c   |  2 +-
 gcc/testsuite/gcc.target/riscv/arch-5.c   |  2 +-
 gcc/testsuite/gcc.target/riscv/arch-8.c   |  2 +-
 gcc/testsuite/gcc.target/riscv/attribute-10.c |  2 +-
 gcc/testsuite/gcc.target/riscv/attribute-9.c  |  4 +--
 gcc/testsuite/gcc.target/riscv/pr102957.c |  2 ++
 8 files changed, 45 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-22.cc

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 6091d8f281b..df3c256c80c 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -518,6 +518,18 @@ subset_cmp (const std::string &a, const std::string &b)
 }
 }
 
+/* Return true if EXT is a standard extension.  */
+
+static bool
+standard_extensions_p (const char *ext)
+{
+  const riscv_ext_version *ext_ver;
+  for (ext_ver = &riscv_ext_version_table[0]; ext_ver->name != NULL; ++ext_ver)
+if (strcmp (ext, ext_ver->name) == 0)
+  return true;
+  return false;
+}
+
 /* Add new subset to list.  */
 
 void
@@ -546,6 +558,23 @@ riscv_subset_list::add (const char *subset, int 
major_version,
 
   return;
 }
+  else if (subset[0] == 'z' && !standard_extensions_p (subset))
+{
+  error_at (m_loc,
+   "%<-march=%s%>: extension %qs starts with `z` but is not a "
+   "standard sub-extension",
+   m_arch, subset);
+  return;
+}
+  else if (subset[0] == 's' && !standard_extensions_p (subset))
+{
+  error_at (
+   m_loc,
+   "%<-march=%s%>: extension %qs start with `s` but not a standard "
+   "supervisor extension",
+   m_arch, subset);
+  return;
+}
 
   riscv_subset_t *s = new riscv_subset_t ();
   riscv_subset_t *itr;
diff --git a/gcc/testsuite/gcc.target/riscv/arch-22.cc 
b/gcc/testsuite/gcc.target/riscv/arch-22.cc
new file mode 100644
index 000..f9d8b57cb20
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-22.cc
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl128_z123_s123 -mabi=lp64d" } */
+int foo()
+{
+}
+/* { dg-error "extension 'zvl128' start with `z` but not a standard 
sub-extension" "" { target *-*-* } 0 } */
+/* { dg-error "extension 'z123' start with `z` but not a standard 
sub-extension" "" { target *-*-* } 0 } */
+/* { dg-error "extension 's123' start with `s` but not a standard supervisor 
extension" "" { target *-*-* } 0 } */
diff --git a/gcc/testsuite/gcc.target/riscv/arch-3.c 
b/gcc/testsuite/gcc.target/riscv/arch-3.c
index 7aa945eca20..dee0fc6656d 100644
--- a/gcc/testsuite/gcc.target/riscv/arch-3.c
+++ b/gcc/testsuite/gcc.target/riscv/arch-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv32isabc_xbar -mabi=ilp32" } */
+/* { dg-options "-march=rv32isvinval_xbar -mabi=ilp32" } */
 int foo()
 {
 }
diff --git a/gcc/testsuite/gcc.target/riscv/arch-5.c 
b/gcc/testsuite/gcc.target/riscv/arch-5.c
index 8258552214f..8bdaa9d17b2 100644
--- a/gcc/testsuite/gcc.target/riscv/arch-5.c
+++ b/gcc/testsuite/gcc.target/riscv/arch-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv32i_zfoo_sabc_xbar -mabi=ilp32" } */
+/* { dg-options "-march=rv32i_zmmul_svnapot_xbar -mabi=ilp32" } */
 int foo()
 {
 }
diff --git a/gcc/testsuite/gcc.target/riscv/arch-8.c 
b/gcc/testsuite/gcc.target/riscv/arch-8.c
index 1b9e51b0e12..ef557aeb673 100644
--- a/gcc/testsuite/gcc.target/riscv/arch-8.c
+++ b/gcc/testsuite/gcc.target/riscv/arch-8.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv32id_zicsr_zifence -mabi=ilp

Re: [PATCH] RISC-V: Throw compilation error for unknown sub-extension or supervisor extension

2023-07-11 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-07-12 11:27
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; kito.cheng; palmer; jeffreyalaw
Subject: [PATCH] RISC-V: Throw compilation error for unknown sub-extension or 
supervisor extension
Hi,
 
This tiny patch adds a check for extensions starting with 'z' or 's' in the
`-march` option. Currently an unknown extension is passed through to the
assembler, which then reports an error. With this patch, the compiler throws a
compilation error if an extension starting with 'z' or 's' is not a standard
sub-extension or supervisor extension.
 
e.g.:
 
Run `riscv64-unknown-elf-gcc -march=rv64gcv_zvl128_s123 a.c` will throw these 
error:
 
riscv64-unknown-elf-gcc: error: '-march=rv64gcv_zvl128_s123': extension 'zvl' 
starts with `z` but is not a standard sub-extension
riscv64-unknown-elf-gcc: error: '-march=rv64gcv_zvl128_s123': extension 's123' 
start with `s` but not a standard supervisor extension
 
Best,
Lehua
 
gcc/ChangeLog:
 
* common/config/riscv/riscv-common.cc (standard_extensions_p): New func.
(riscv_subset_list::add): Add check.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/arch-3.c: Update -march.
* gcc.target/riscv/arch-5.c: Ditto.
* gcc.target/riscv/arch-8.c: Ditto.
* gcc.target/riscv/attribute-10.c: Ditto.
* gcc.target/riscv/attribute-9.c: Ditto.
* gcc.target/riscv/pr102957.c: Ditto.
* gcc.target/riscv/arch-22.cc: New test.
 
---
gcc/common/config/riscv/riscv-common.cc   | 29 +++
gcc/testsuite/gcc.target/riscv/arch-22.cc |  8 +
gcc/testsuite/gcc.target/riscv/arch-3.c   |  2 +-
gcc/testsuite/gcc.target/riscv/arch-5.c   |  2 +-
gcc/testsuite/gcc.target/riscv/arch-8.c   |  2 +-
gcc/testsuite/gcc.target/riscv/attribute-10.c |  2 +-
gcc/testsuite/gcc.target/riscv/attribute-9.c  |  4 +--
gcc/testsuite/gcc.target/riscv/pr102957.c |  2 ++
8 files changed, 45 insertions(+), 6 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/arch-22.cc
 
diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 6091d8f281b..df3c256c80c 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -518,6 +518,18 @@ subset_cmp (const std::string &a, const std::string &b)
 }
}
+/* Return true if EXT is a standard extension.  */
+
+static bool
+standard_extensions_p (const char *ext)
+{
+  const riscv_ext_version *ext_ver;
+  for (ext_ver = &riscv_ext_version_table[0]; ext_ver->name != NULL; ++ext_ver)
+if (strcmp (ext, ext_ver->name) == 0)
+  return true;
+  return false;
+}
+
/* Add new subset to list.  */
void
@@ -546,6 +558,23 @@ riscv_subset_list::add (const char *subset, int 
major_version,
   return;
 }
+  else if (subset[0] == 'z' && !standard_extensions_p (subset))
+{
+  error_at (m_loc,
+ "%<-march=%s%>: extension %qs starts with `z` but is not a "
+ "standard sub-extension",
+ m_arch, subset);
+  return;
+}
+  else if (subset[0] == 's' && !standard_extensions_p (subset))
+{
+  error_at (
+ m_loc,
+ "%<-march=%s%>: extension %qs start with `s` but not a standard "
+ "supervisor extension",
+ m_arch, subset);
+  return;
+}
   riscv_subset_t *s = new riscv_subset_t ();
   riscv_subset_t *itr;
diff --git a/gcc/testsuite/gcc.target/riscv/arch-22.cc 
b/gcc/testsuite/gcc.target/riscv/arch-22.cc
new file mode 100644
index 000..f9d8b57cb20
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-22.cc
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl128_z123_s123 -mabi=lp64d" } */
+int foo()
+{
+}
+/* { dg-error "extension 'zvl128' start with `z` but not a standard 
sub-extension" "" { target *-*-* } 0 } */
+/* { dg-error "extension 'z123' start with `z` but not a standard 
sub-extension" "" { target *-*-* } 0 } */
+/* { dg-error "extension 's123' start with `s` but not a standard supervisor 
extension" "" { target *-*-* } 0 } */
diff --git a/gcc/testsuite/gcc.target/riscv/arch-3.c 
b/gcc/testsuite/gcc.target/riscv/arch-3.c
index 7aa945eca20..dee0fc6656d 100644
--- a/gcc/testsuite/gcc.target/riscv/arch-3.c
+++ b/gcc/testsuite/gcc.target/riscv/arch-3.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-march=rv32isabc_xbar -mabi=ilp32" } */
+/* { dg-options "-march=rv32isvinval_xbar -mabi=ilp32" } */
int foo()
{
}
diff --git a/gcc/testsuite/gcc.target/riscv/arch-5.c 
b/gcc/testsuite/gcc.target/riscv/arch-5.c
index 8258552214f..8bdaa9d17b2 100644
--- a/gcc/testsuite/gcc.target/riscv/arch-5.c
+++ b/gcc/testsuite/gcc.target/riscv/arch-5.c
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-march=rv32i_zfoo_sabc_xbar -mabi=ilp32" } */
+/* { dg-options "-march=rv32i_zmmul_svnapot_xbar -mabi=ilp32" } */
int foo()
{
}
diff --git a/gcc/testsuite/gcc.target/riscv/arch-8.c 
b/gcc/testsuite/gcc.target/riscv/arch-8.c
index 1b9e51b0e12..ef557aeb673 100644
--- a/gcc/testsuite/gcc.target/riscv/arch-8.c
+++ b/gcc/testsuite/gcc.target/riscv/arch-8.c
@@ -1

[PATCH] mklog: Add --append option to auto add generate ChangeLog to patch file

2023-07-11 Thread Lehua Ding
Hi,

This tiny patch adds an --append option to mklog.py that supports adding the
generated ChangeLog to the corresponding patch file. With this option there is
no need to manually copy the generated ChangeLog into the patch file. E.g.:

Running `mklog.py -a /path/to/this/patch` will add the generated ChangeLog

```
contrib/ChangeLog:

* mklog.py:
```

to the right place of the /path/to/this/patch file.
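The splice the --append option performs can be sketched like this (illustrative, not the patch's exact code): hold a candidate "---" separator until the next line matches the diffstat pattern, then emit the generated ChangeLog before it; the regex mirrors the one used in the patch.

```python
import re

# Sketch of the --append splice: the ChangeLog belongs between the commit
# message and the "---" that opens the diffstat section.

DIFFSTAT = re.compile(r"\s\S+\s+\|\s\d+\s[+\-]+\n")

def splice_changelog(patch_lines, changelog):
    out, pending, done = [], None, False
    for line in patch_lines:
        if not done and pending is None and line == "---\n":
            pending = line                         # maybe the separator
            continue
        if pending is not None:
            if DIFFSTAT.match(line):
                out += [changelog, pending, line]  # ChangeLog before "---"
                pending, done = None, True
                continue
            out.append(pending)                    # false alarm: flush "---"
            pending = None
        out.append(line)
    if pending is not None:
        out.append(pending)
    return out

patch = ["From: ...\n", "---\n", " contrib/mklog.py | 27 ++-\n"]
spliced = splice_changelog(patch, "contrib/ChangeLog:\n\n\t* mklog.py: ...\n")
print(spliced[1].startswith("contrib/ChangeLog"))  # True
```

Holding the "---" until the diffstat is confirmed avoids misfiring on a "---" that appears inside the commit message itself.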

Best,
Lehua

contrib/ChangeLog:

* mklog.py: Add --append option.

---
 contrib/mklog.py | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/contrib/mklog.py b/contrib/mklog.py
index 777212c98d7..26230b9b4f2 100755
--- a/contrib/mklog.py
+++ b/contrib/mklog.py
@@ -358,6 +358,8 @@ if __name__ == '__main__':
  'file')
     parser.add_argument('--update-copyright', action='store_true',
                         help='Update copyright in ChangeLog files')
+    parser.add_argument('-a', '--append', action='store_true',
+                        help='Append the generated ChangeLog to the patch file')
     args = parser.parse_args()
     if args.input == '-':
         args.input = None
@@ -370,7 +372,30 @@ if __name__ == '__main__':
     else:
         output = generate_changelog(data, args.no_functions,
                                     args.fill_up_bug_titles, args.pr_numbers)
-    if args.changelog:
+    if args.append:
+        if (not args.input):
+            raise Exception("`-a or --append` option not support standard input")
+        lines = []
+        with open(args.input, 'r', newline='\n') as f:
+            # 1 -> not find the possible start of diff log
+            # 2 -> find the possible start of diff log
+            # 3 -> finish add ChangeLog to the patch file
+            maybe_diff_log = 1
+            for line in f:
+                if maybe_diff_log == 1 and line == "---\n":
+                    maybe_diff_log = 2
+                elif maybe_diff_log == 2 and \
+                     re.match("\s[^\s]+\s+\|\s\d+\s[+\-]+\n", line):
+                    lines += [output, "---\n", line]
+                    maybe_diff_log = 3
+                else:
+                    # the possible start is not the true start.
+                    if maybe_diff_log == 2:
+                        maybe_diff_log = 1
+                    lines.append(line)
+        with open(args.input, "w") as f:
+            f.writelines(lines)
+    elif args.changelog:
         lines = open(args.changelog).read().split('\n')
         start = list(takewhile(skip_line_in_changelog, lines))
         end = lines[len(start):]
-- 
2.36.1



[PATCH] VECT: Apply COND_LEN_* into vectorizable_operation

2023-07-11 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, Richard and Richi.
As we discussed before, COND_LEN_* patterns were added for multiple situations.
This patch applies COND_LEN_* for the following situation:

Support the following situation in "vectorizable_operation":
  /* If operating on inactive elements could generate spurious traps,
 we need to restrict the operation to active lanes.  Note that this
 specifically doesn't apply to unhoisted invariants, since they
 operate on the same value for every lane.

 Similarly, if this operation is part of a reduction, a fully-masked
 loop should only change the active lanes of the reduction chain,
 keeping the inactive lanes as-is.  */
  bool mask_out_inactive = ((!is_invariant && gimple_could_trap_p (stmt))
|| reduc_idx >= 0);

That is, the case where mask_out_inactive is true with length-based loop control.

So, we can handle the following 2 cases:

1. Integer division:

   #define TEST_TYPE(TYPE)  \
   __attribute__((noipa))   \
   void vrem_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
   {\
 for (int i = 0; i < n; i++)\
   dst[i] = a[i] % b[i];\
   }
   #define TEST_ALL()   \
   TEST_TYPE(int8_t)\
   TEST_ALL()

With this patch:
  
  _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
  ivtmp_45 = _61 * 4;
  vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
  vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
  vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, vect__4.8_48, _61, 0);
  .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);

2. Floating-point arithmetic **WITHOUT** -ffast-math
  
   #define TEST_TYPE(TYPE)  \
   __attribute__((noipa))   \
   void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
   {\
 for (int i = 0; i < n; i++)\
   dst[i] = a[i] + b[i];\
   }
   #define TEST_ALL()   \
   TEST_TYPE(float) \
   TEST_ALL()

With this patch:
   
  _61 = .SELECT_VL (ivtmp_59, POLY_INT_CST [4, 4]);
  ivtmp_45 = _61 * 4;
  vect__4.8_48 = .LEN_MASK_LOAD (vectp_a.6_46, 32B, _61, 0, { -1, ... });
  vect__6.11_52 = .LEN_MASK_LOAD (vectp_b.9_50, 32B, _61, 0, { -1, ... });
  vect__8.12_53 = .COND_LEN_ADD ({ -1, ... }, vect__4.8_48, vect__6.11_52, vect__4.8_48, _61, 0);
  .LEN_MASK_STORE (vectp_dst.13_55, 32B, _61, 0, { -1, ... }, vect__8.12_53);

With this patch, we can make sure operations won't trap for inactive elements
whenever "mask_out_inactive" is true.
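For reference, this is how I read the .COND_LEN_ADD calls in the dumps above (a
behavioral sketch of the semantics, not GCC code; the treatment of the bias
operand as an addend to the length is my assumption):

```python
def cond_len_add(mask, a, b, fallback, length, bias=0):
    """Scalar model of .COND_LEN_ADD (mask, a, b, fallback, len, bias):
    a lane is computed only if it is below len + bias AND active in the
    mask; every other lane keeps the fallback value and the add is never
    evaluated for it, so it cannot trap."""
    n = length + bias
    return [a[i] + b[i] if i < n and mask[i] else fallback[i]
            for i in range(len(a))]

# 4 lanes, length 2: the two tail lanes keep the fallback value.
print(cond_len_add([True] * 4, [1, 2, 3, 4], [10, 20, 30, 40],
                   [0, 0, 0, 0], length=2))  # [11, 22, 0, 0]
```

An inactive mask lane behaves the same way as a tail lane: its result comes
from the fallback operand, never from the arithmetic.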

gcc/ChangeLog:

* internal-fn.cc (FOR_EACH_CODE_LEN_MAPPING): Add COND_LEN_*.
(get_conditional_len_internal_fn): New function.
(CASE): Add COND_LEN_*.
* internal-fn.h (get_conditional_len_internal_fn): New function.
* tree-vect-stmts.cc (vectorizable_operation): Apply COND_LEN_* to
operations that could trap.

---
 gcc/internal-fn.cc | 48 +
 gcc/internal-fn.h  |  1 +
 gcc/tree-vect-stmts.cc | 60 ++
 3 files changed, 104 insertions(+), 5 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index f9aaf66cf2a..e46dd57b7f0 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4337,6 +4337,54 @@ conditional_internal_fn_code (internal_fn ifn)
 }
 }
 
+/* Invoke T(CODE, IFN) for each conditional len function IFN that maps to a
+   tree code CODE.  */
+#define FOR_EACH_CODE_LEN_MAPPING(T) \
+  T (PLUS_EXPR, IFN_COND_LEN_ADD) \
+  T (MINUS_EXPR, IFN_COND_LEN_SUB) \
+  T (MULT_EXPR, IFN_COND_LEN_MUL) \
+  T (TRUNC_DIV_EXPR, IFN_COND_LEN_DIV) \
+  T (TRUNC_MOD_EXPR, IFN_COND_LEN_MOD) \
+  T (RDIV_EXPR, IFN_COND_LEN_RDIV) \
+  T (MIN_EXPR, IFN_COND_LEN_MIN) \
+  T (MAX_EXPR, IFN_COND_LEN_MAX) \
+  T (BIT_AND_EXPR, IFN_COND_LEN_AND) \
+  T (BIT_IOR_EXPR, IFN_COND_LEN_IOR) \
+  T (BIT_XOR_EXPR, IFN_COND_LEN_XOR) \
+  T (LSHIFT_EXPR, IFN_COND_LEN_SHL) \
+  T (RSHIFT_EXPR, IFN_COND_LEN_SHR) \
+  T (NEGATE_EXPR, IFN_COND_LEN_NEG)
+
+/* Return a function that only performs CODE when a certain condition is met
+   and that uses a given fallb

Re: [PATCH 1/2] c++, libstdc++: implement __is_pointer built-in trait

2023-07-11 Thread François Dumont via Gcc-patches



On 10/07/2023 07:23, Ken Matsui via Libstdc++ wrote:

This patch implements a built-in trait for std::is_pointer.

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_pointer.
* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_POINTER.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __is_pointer.
* g++.dg/ext/is_pointer.C: New test.
* g++.dg/tm/pr46567.C (__is_pointer): Rename to ...
(is_pointer): ... this.
* g++.dg/torture/20070621-1.C: Likewise.
* g++.dg/torture/pr57107.C: Likewise.

libstdc++-v3/ChangeLog:

* include/bits/cpp_type_traits.h (__is_pointer): Rename to ...
(is_pointer): ... this.
* include/bits/deque.tcc: Use is_pointer instead.
* include/bits/stl_algobase.h: Likewise.

Signed-off-by: Ken Matsui 
---
  gcc/cp/constraint.cc|  3 ++
  gcc/cp/cp-trait.def |  1 +
  gcc/cp/semantics.cc |  4 ++
  gcc/testsuite/g++.dg/ext/has-builtin-1.C|  3 ++
  gcc/testsuite/g++.dg/ext/is_pointer.C   | 51 +
  gcc/testsuite/g++.dg/tm/pr46567.C   | 22 -
  gcc/testsuite/g++.dg/torture/20070621-1.C   |  4 +-
  gcc/testsuite/g++.dg/torture/pr57107.C  |  4 +-
  libstdc++-v3/include/bits/cpp_type_traits.h |  6 +--
  libstdc++-v3/include/bits/deque.tcc |  6 +--
  libstdc++-v3/include/bits/stl_algobase.h|  6 +--
  11 files changed, 86 insertions(+), 24 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/ext/is_pointer.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 8cf0f2d0974..30266204eb5 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3751,6 +3751,9 @@ diagnose_trait_expr (tree expr, tree args)
  case CPTK_IS_UNION:
inform (loc, "  %qT is not a union", t1);
break;
+case CPTK_IS_POINTER:
+  inform (loc, "  %qT is not a pointer", t1);
+  break;
  case CPTK_IS_AGGREGATE:
inform (loc, "  %qT is not an aggregate", t1);
break;
diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 8b7fece0cc8..b7c263e9a77 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -82,6 +82,7 @@ DEFTRAIT_EXPR (IS_TRIVIALLY_ASSIGNABLE, "__is_trivially_assignable", 2)
  DEFTRAIT_EXPR (IS_TRIVIALLY_CONSTRUCTIBLE, "__is_trivially_constructible", -1)
  DEFTRAIT_EXPR (IS_TRIVIALLY_COPYABLE, "__is_trivially_copyable", 1)
  DEFTRAIT_EXPR (IS_UNION, "__is_union", 1)
+DEFTRAIT_EXPR (IS_POINTER, "__is_pointer", 1)
  DEFTRAIT_EXPR (REF_CONSTRUCTS_FROM_TEMPORARY, "__reference_constructs_from_temporary", 2)
  DEFTRAIT_EXPR (REF_CONVERTS_FROM_TEMPORARY, "__reference_converts_from_temporary", 2)
  /* FIXME Added space to avoid direct usage in GCC 13.  */
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 8fb47fd179e..68f8a4fe85b 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12118,6 +12118,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, tree type2)
  case CPTK_IS_UNION:
return type_code1 == UNION_TYPE;
  
+    case CPTK_IS_POINTER:
+      return TYPE_PTR_P (type1);
+
  case CPTK_IS_ASSIGNABLE:
return is_xible (MODIFY_EXPR, type1, type2);
  
@@ -12296,6 +12299,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, tree type1, tree type2)

  case CPTK_IS_ENUM:
  case CPTK_IS_UNION:
  case CPTK_IS_SAME:
+    case CPTK_IS_POINTER:
break;
  
  case CPTK_IS_LAYOUT_COMPATIBLE:

diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index f343e153e56..9dace5cbd48 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -146,3 +146,6 @@
  #if !__has_builtin (__remove_cvref)
  # error "__has_builtin (__remove_cvref) failed"
  #endif
+#if !__has_builtin (__is_pointer)
+# error "__has_builtin (__is_pointer) failed"
+#endif
diff --git a/gcc/testsuite/g++.dg/ext/is_pointer.C b/gcc/testsuite/g++.dg/ext/is_pointer.C
new file mode 100644
index 000..d6e39565950
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/is_pointer.C
@@ -0,0 +1,51 @@
+// { dg-do compile { target c++11 } }
+
+#define SA(X) static_assert((X),#X)
+
+SA(!__is_pointer(int));
+SA(__is_pointer(int*));
+SA(__is_pointer(int**));
+
+SA(__is_pointer(const int*));
+SA(__is_pointer(const int**));
+SA(__is_pointer(int* const));
+SA(__is_pointer(int** const));
+SA(__is_pointer(int* const* const));
+
+SA(__is_pointer(volatile int*));
+SA(__is_pointer(volatile int**));
+SA(__is_pointer(int* volatile));
+SA(__is_pointer(int** volatile));
+SA(__is_pointer(int* volatile* volatile));
+
+SA(__is_pointer(const volatile int*));
+SA(__is_pointer(const volatile int**));
+SA(__is_pointer(const int* volatile));
+SA(__is_pointer(volatile int* const));
+SA(__is_pointer(int* const volatile));
+SA(__is_poi

[PATCH] RISC-V: Support COND_LEN_* patterns

2023-07-11 Thread Juzhe-Zhong
This patch depends on the following Vectorizer patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624179.html

With this patch, we can handle operations that may trap on elements outside the loop.

These 2 following cases will be addressed by this patch:

1. integer division:

  #define TEST_TYPE(TYPE)   \
  __attribute__((noipa))\
  void vrem_##TYPE (TYPE * __restrict dst, TYPE * __restrict a, TYPE * __restrict b, int n) \
  { \
for (int i = 0; i < n; i++) \
  dst[i] = a[i] % b[i]; \
  }
  #define TEST_ALL()\
   TEST_TYPE(int8_t)\
  TEST_ALL()

  Before this patch:

   vrem_int8_t:
ble     a3,zero,.L14
csrr    t4,vlenb
addiw   a5,a3,-1
addiw   a4,t4,-1
sext.w  t5,a3
bltu    a5,a4,.L10
csrr    t3,vlenb
subw    t3,t5,t3
li      a5,0
vsetvli t6,zero,e8,m1,ta,ma
.L4:
add     a6,a2,a5
add     a7,a0,a5
add     t1,a1,a5
mv      a4,a5
add     a5,a5,t4
vl1re8.v    v2,0(a6)
vl1re8.v    v1,0(t1)
sext.w  a6,a5
vrem.vv v1,v1,v2
vs1r.v  v1,0(a7)
bleu    a6,t3,.L4
csrr    a5,vlenb
addw    a4,a4,a5
sext.w  a5,a4
beq     t5,a4,.L16
.L3:
csrr    a6,vlenb
subw    t5,t5,a4
srli    a6,a6,1
addiw   t1,t5,-1
addiw   a7,a6,-1
bltu    t1,a7,.L9
slli    a4,a4,32
srli    a4,a4,32
add     t0,a1,a4
add     t6,a2,a4
add     a4,a0,a4
vsetvli a7,zero,e8,mf2,ta,ma
sext.w  t3,a6
vle8.v  v1,0(t0)
vle8.v  v2,0(t6)
subw    t4,t5,a6
vrem.vv v1,v1,v2
vse8.v  v1,0(a4)
mv      t1,t3
bltu    t4,t3,.L7
csrr    t1,vlenb
add     a4,a4,a6
add     t0,t0,a6
add     t6,t6,a6
sext.w  t1,t1
vle8.v  v1,0(t0)
vle8.v  v2,0(t6)
vrem.vv v1,v1,v2
vse8.v  v1,0(a4)
.L7:
addw    a5,t1,a5
beq     t5,t1,.L14
.L9:
add     a4,a1,a5
add     a6,a2,a5
lb      a6,0(a6)
lb      a4,0(a4)
add     a7,a0,a5
addi    a5,a5,1
remw    a4,a4,a6
sext.w  a6,a5
sb      a4,0(a7)
bgt     a3,a6,.L9
.L14:
ret
.L10:
li      a4,0
li      a5,0
j       .L3
.L16:
ret

After this patch:

   vrem_int8_t:
ble a3,zero,.L5
.L3:
vsetvli a5,a3,e8,m1,tu,ma
vle8.v  v1,0(a1)
vle8.v  v2,0(a2)
sub a3,a3,a5
vrem.vv v1,v1,v2
vse8.v  v1,0(a0)
add a1,a1,a5
add a2,a2,a5
add a0,a0,a5
bne a3,zero,.L3
.L5:
ret

2. Floating-point operation **WITHOUT** -ffast-math:
 
#define TEST_TYPE(TYPE) \
__attribute__((noipa))  \
void vadd_##TYPE (TYPE * __restrict dst, TYPE *__restrict a, TYPE *__restrict b, int n) \
{   \
  for (int i = 0; i < n; i++)   \
dst[i] = a[i] + b[i];   \
}

#define TEST_ALL()  \
 TEST_TYPE(float)   \

TEST_ALL()
   
Before this patch:
   
   vadd_float:
ble     a3,zero,.L10
csrr    a4,vlenb
srli    t3,a4,2
addiw   a5,a3,-1
addiw   a6,t3,-1
sext.w  t6,a3
bltu    a5,a6,.L7
subw    t5,t6,t3
mv      t1,a1
mv      a7,a2
mv      a6,a0
li      a5,0
vsetvli t4,zero,e32,m1,ta,ma
.L4:
vl1re32.v   v1,0(t1)
vl1re32.v   v2,0(a7)
addw    a5,a5,t3
vfadd.vv    v1,v1,v2
vs1r.v  v1,0(a6)
add     t1,t1,a4
add     a7,a7,a4
add     a6,a6,a4
bgeu    t5,a5,.L4
beq     t6,a5,.L10
sext.w  a5,a5
.L3:
slli    a4,a5,2
.L6:
add     a6,a1,a4
add     a7,a2,a4
flw     fa4,0(a6)
flw     fa5,0(a7)
add     a6,a0,a4
addiw   a5,a5,1
fadd.s  fa5,fa5,fa4
addi    a4,a4,4
fsw     fa5,0(a6)
bgt     a3,a5,.L6
.L10:
ret
.L7:
li      a5,0
j       .L3

After this patch:

   vadd_float:
ble a3,zero,.L5
.L3:
vsetvli a5,a3,e32,m1,tu,ma
slli    a4,a5,2
vle32.v v1,0(a1)
vle32.v v2,0(a2)
sub a3,a3,a5
vfadd.vv    v1,v1,v2
vse32.v v1,0(a0)
add a1,a1,a4
add a2,a2,a4
add a0,a0,a4
bne a3,zero,.L3
.L5:
ret
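The rewritten loop is classic length-controlled strip mining: each iteration
asks vsetvli for at most the remaining element count, the vector instructions
touch exactly that many lanes, and the pointers and counter advance by the
granted length. A scalar model of that control flow (a sketch with a
hypothetical VLMAX of 4, not generated code):

```python
VLMAX = 4  # hypothetical hardware vector length in elements

def strip_mined_rem(dst, a, b, n):
    """Model of the vsetvli-driven loop: vl = min(n, VLMAX) per iteration,
    and only vl lanes are touched, so no lane past the tail is computed."""
    i = 0
    while n > 0:                     # bne a3,zero,.L3
        vl = min(n, VLMAX)           # vsetvli a5,a3,e8,m1,tu,ma
        for j in range(i, i + vl):   # vle8.v / vrem.vv / vse8.v on vl lanes
            dst[j] = a[j] % b[j]
        i += vl                      # add a0/a1/a2,...,a5
        n -= vl                      # sub a3,a3,a5
    return dst

print(strip_mined_rem([0] * 6, [7, 8, 9, 10, 11, 12], [2, 3, 4, 5, 6, 7], 6))
```

Because the final iteration is granted only the leftover count, no scalar
epilogue loop is needed, which is where the code-size win comes from.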
  
gcc/ChangeLog:

* config/riscv/autovec.md (cond_len_): New pattern.
* config/riscv/riscv-protos.h (enum insn_type

Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization

2023-07-11 Thread Jeff Law via Gcc-patches




On 7/11/23 20:34, juzhe.zh...@rivai.ai wrote:

Hi, Jeff.

 >> Hmm, I'm not sure this is safe, especially if gimple->rtl expansion is

complete.  While you might be able to get REG_EXPR, I would not really
expect SSA_NAME_DEF_STMT to be correct.  At the least it'll need some
way to make sure it's not called at an inappropriate time.

I think it's safe, if SSA_NAME_DEF_STMT is NULL, then just return it.


Should this have been known_lt rather than known_le?
It should be LE, since I will pass through GET_MODE_NUNITS/GET_MODE_SIZE 
for SLP.

Thanks for double-checking.  It looked slightly odd checking ge or le.





Something's off in your formatting here.  I'd guess spaces vs tabs

Ok.


In a few places you're using expand_binop.  Those interfaces are really
more for gimple->RTL.  BUt code like expand_gather_scatter is really
RTL, not gimple/tree.   Is there a reason why you're not using pure RTL
interfaces?

I saw ARM SVE is using them in many places for expanding patterns.
And I think it's convenient so that's why I use them.

OK.

I still think we need a resolution on strided_load_store_p.  As I 
mentioned in my original email, I'm not sure you can depend on getting 
to the SSA_NAME_DEF_STMT at this point -- in particular if it's a 
dangling pointer, then bad things are going to happen.  So let's chase 
that down.  Presumably this is called during gimple->rtl expansion, 
right?  Is it ever called later?


I think my concerns about expand_gather_scatter are a non-issue after 
looking at it again -- I missed the GET_MODE (step) != Pmode conditional 
when I first looked at that code.



jeff


[PATCH v1] RISC-V: Refactor riscv mode after for VXRM and FRM

2023-07-11 Thread Pan Li via Gcc-patches
From: Pan Li 

When investigating the FRM dynamic rounding mode, we found that the global
unknown status is quite different between fixed-point and floating-point.
Thus, we split the unknown handling into separate functions while extracting
some common inner functions.

We will also prepare more test cases in another PATCH.
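For readers unfamiliar with the mode-switching hooks, the decision order that
both new helpers share can be modeled as follows (a behavioral sketch, not GCC
code; the names and string modes are illustrative):

```python
def mode_after(mode, unknown, recognized, mentions_reg, attr_mode):
    """Decision order shared by riscv_vxrm_mode_after/riscv_frm_mode_after:
    1. insn may clobber the CSR (explicit def, CALL, or asm) -> unknown mode;
    2. insn not recognized -> the incoming mode flows through;
    3. insn mentions the CSR register -> use the insn's mode attribute;
    4. otherwise -> the incoming mode flows through."""
    if unknown:
        return "MODE_NONE"
    if not recognized:
        return mode
    return attr_mode if mentions_reg else mode

# A CALL makes the state unknown regardless of everything else.
print(mode_after("RDN", unknown=True, recognized=True,
                 mentions_reg=True, attr_mode="RTZ"))  # MODE_NONE
```

The refactoring keeps this order intact for both entities; only the
"unknown" test differs, since VXRM must additionally treat inline asm as
clobbering while FRM (in this version) does not.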

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv.cc (regnum_definition_p): New function.
(insn_asm_p): Ditto.
(riscv_vxrm_mode_after): New function for fixed-point.
(global_vxrm_state_unknown_p): Ditto.
(riscv_frm_mode_after): New function for floating-point.
(global_frm_state_unknown_p): Ditto.
(riscv_mode_after): Leverage new functions.
(riscv_entity_mode_after): Removed.
---
 gcc/config/riscv/riscv.cc | 96 +--
 1 file changed, 82 insertions(+), 14 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 38d8eb2fcf5..dbaf100fd8e 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7742,19 +7742,91 @@ global_state_unknown_p (rtx_insn *insn, unsigned int 
regno)
   return false;
 }
 
+static bool
+regnum_definition_p (rtx_insn *insn, unsigned int regno)
+{
+  df_ref ref;
+  struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
+
+  /* Return true if there is a definition of regno.  */
+  for (ref = DF_INSN_INFO_DEFS (insn_info); ref; ref = DF_REF_NEXT_LOC (ref))
+if (DF_REF_REGNO (ref) == regno)
+  return true;
+
+  return false;
+}
+
+static bool
+insn_asm_p (rtx_insn *insn)
+{
+  extract_insn (insn);
+
+  return recog_data.is_asm;
+}
+
+static bool
+global_vxrm_state_unknown_p (rtx_insn *insn)
+{
+  /* Return true if there is a definition of VXRM.  */
+  if (regnum_definition_p (insn, VXRM_REGNUM))
+return true;
+
+  /* A CALL function may contain an instruction that modifies the VXRM,
+ return true in this situation.  */
+  if (CALL_P (insn))
+return true;
+
+  /* Return true for all assembly since users may hardcode a assembly
+ like this: asm volatile ("csrwi vxrm, 0").  */
+  if (insn_asm_p (insn))
+return true;
+
+  return false;
+}
+
+static bool
+global_frm_state_unknown_p (rtx_insn *insn)
+{
+  /* Return true if there is a definition of FRM.  */
+  if (regnum_definition_p (insn, FRM_REGNUM))
+return true;
+
+  /* A CALL function may contain an instruction that modifies the VXRM,
+ return true in this situation.  */
+  if (CALL_P (insn))
+return true;
+
+  return false;
+}
+
 static int
-riscv_entity_mode_after (int regnum, rtx_insn *insn, int mode,
-int (*get_attr_mode) (rtx_insn *), int default_mode)
+riscv_vxrm_mode_after (rtx_insn *insn, int mode)
 {
-  if (global_state_unknown_p (insn, regnum))
-return default_mode;
-  else if (recog_memoized (insn) < 0)
+  if (global_vxrm_state_unknown_p (insn))
+return VXRM_MODE_NONE;
+
+  if (recog_memoized (insn) < 0)
+return mode;
+
+  if (reg_mentioned_p (gen_rtx_REG (SImode, VXRM_REGNUM), PATTERN (insn)))
+return get_attr_vxrm_mode (insn);
+  else
 return mode;
+}
 
-  rtx reg = gen_rtx_REG (SImode, regnum);
-  bool mentioned_p = reg_mentioned_p (reg, PATTERN (insn));
+static int
+riscv_frm_mode_after (rtx_insn *insn, int mode)
+{
+  if (global_frm_state_unknown_p (insn))
+return FRM_MODE_NONE;
 
-  return mentioned_p ? get_attr_mode (insn): mode;
+  if (recog_memoized (insn) < 0)
+return mode;
+
+  if (reg_mentioned_p (gen_rtx_REG (SImode, FRM_REGNUM), PATTERN (insn)))
+return get_attr_frm_mode (insn);
+  else
+return mode;
 }
 
 /* Return the mode that an insn results in.  */
@@ -7765,13 +7837,9 @@ riscv_mode_after (int entity, int mode, rtx_insn *insn)
   switch (entity)
 {
 case RISCV_VXRM:
-  return riscv_entity_mode_after (VXRM_REGNUM, insn, mode,
- (int (*)(rtx_insn *)) get_attr_vxrm_mode,
- VXRM_MODE_NONE);
+  return riscv_vxrm_mode_after (insn, mode);
 case RISCV_FRM:
-  return riscv_entity_mode_after (FRM_REGNUM, insn, mode,
- (int (*)(rtx_insn *)) get_attr_frm_mode,
- FRM_MODE_DYN);
+  return riscv_frm_mode_after (insn, mode);
 default:
   gcc_unreachable ();
 }
-- 
2.34.1



[PATCH v2] RISC-V: Refactor riscv mode after for VXRM and FRM

2023-07-11 Thread Pan Li via Gcc-patches
From: Pan Li 

When investigating the FRM dynamic rounding mode, we found that the global
unknown status is quite different between fixed-point and floating-point.
Thus, we split the unknown handling into separate functions while extracting
some common inner functions.

We will also prepare more test cases in another PATCH.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv.cc (regnum_definition_p): New function.
(insn_asm_p): Ditto.
(riscv_vxrm_mode_after): New function for fixed-point.
(global_vxrm_state_unknown_p): Ditto.
(riscv_frm_mode_after): New function for floating-point.
(global_frm_state_unknown_p): Ditto.
(riscv_mode_after): Leverage new functions.
(riscv_entity_mode_after): Removed.
---
 gcc/config/riscv/riscv.cc | 96 +--
 1 file changed, 82 insertions(+), 14 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 38d8eb2fcf5..553fbb4435a 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7742,19 +7742,91 @@ global_state_unknown_p (rtx_insn *insn, unsigned int 
regno)
   return false;
 }
 
+static bool
+regnum_definition_p (rtx_insn *insn, unsigned int regno)
+{
+  df_ref ref;
+  struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
+
+  /* Return true if there is a definition of regno.  */
+  for (ref = DF_INSN_INFO_DEFS (insn_info); ref; ref = DF_REF_NEXT_LOC (ref))
+if (DF_REF_REGNO (ref) == regno)
+  return true;
+
+  return false;
+}
+
+static bool
+insn_asm_p (rtx_insn *insn)
+{
+  extract_insn (insn);
+
+  return recog_data.is_asm;
+}
+
+static bool
+global_vxrm_state_unknown_p (rtx_insn *insn)
+{
+  /* Return true if there is a definition of VXRM.  */
+  if (regnum_definition_p (insn, VXRM_REGNUM))
+return true;
+
+  /* A CALL function may contain an instruction that modifies the VXRM,
+ return true in this situation.  */
+  if (CALL_P (insn))
+return true;
+
+  /* Return true for all assembly since users may hardcode a assembly
+ like this: asm volatile ("csrwi vxrm, 0").  */
+  if (insn_asm_p (insn))
+return true;
+
+  return false;
+}
+
+static bool
+global_frm_state_unknown_p (rtx_insn *insn)
+{
+  /* Return true if there is a definition of FRM.  */
+  if (regnum_definition_p (insn, FRM_REGNUM))
+return true;
+
+  /* A CALL function may contain an instruction that modifies the FRM,
+ return true in this situation.  */
+  if (CALL_P (insn))
+return true;
+
+  return false;
+}
+
 static int
-riscv_entity_mode_after (int regnum, rtx_insn *insn, int mode,
-int (*get_attr_mode) (rtx_insn *), int default_mode)
+riscv_vxrm_mode_after (rtx_insn *insn, int mode)
 {
-  if (global_state_unknown_p (insn, regnum))
-return default_mode;
-  else if (recog_memoized (insn) < 0)
+  if (global_vxrm_state_unknown_p (insn))
+return VXRM_MODE_NONE;
+
+  if (recog_memoized (insn) < 0)
+return mode;
+
+  if (reg_mentioned_p (gen_rtx_REG (SImode, VXRM_REGNUM), PATTERN (insn)))
+return get_attr_vxrm_mode (insn);
+  else
 return mode;
+}
 
-  rtx reg = gen_rtx_REG (SImode, regnum);
-  bool mentioned_p = reg_mentioned_p (reg, PATTERN (insn));
+static int
+riscv_frm_mode_after (rtx_insn *insn, int mode)
+{
+  if (global_frm_state_unknown_p (insn))
+return FRM_MODE_NONE;
 
-  return mentioned_p ? get_attr_mode (insn): mode;
+  if (recog_memoized (insn) < 0)
+return mode;
+
+  if (reg_mentioned_p (gen_rtx_REG (SImode, FRM_REGNUM), PATTERN (insn)))
+return get_attr_frm_mode (insn);
+  else
+return mode;
 }
 
 /* Return the mode that an insn results in.  */
@@ -7765,13 +7837,9 @@ riscv_mode_after (int entity, int mode, rtx_insn *insn)
   switch (entity)
 {
 case RISCV_VXRM:
-  return riscv_entity_mode_after (VXRM_REGNUM, insn, mode,
- (int (*)(rtx_insn *)) get_attr_vxrm_mode,
- VXRM_MODE_NONE);
+  return riscv_vxrm_mode_after (insn, mode);
 case RISCV_FRM:
-  return riscv_entity_mode_after (FRM_REGNUM, insn, mode,
- (int (*)(rtx_insn *)) get_attr_frm_mode,
- FRM_MODE_DYN);
+  return riscv_frm_mode_after (insn, mode);
 default:
   gcc_unreachable ();
 }
-- 
2.34.1



[PATCH] Initial Granite Rapids D Support

2023-07-11 Thread Mo, Zewei via Gcc-patches
Hi all,

This patch adds initial support for Granite Rapids D to GCC.

The link to the related information is listed below:
https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Also, the patch removing AMX-COMPLEX from Granite Rapids will be backported
to GCC 13.

This has been tested on x86_64-pc-linux-gnu. Is this ok for trunk? Thank you.

Sincerely,
Zewei Mo
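The detection side of the patch boils down to splitting CPU model 0xae out of
the shared Granite Rapids case. A table-driven sketch of the resulting
dispatch (illustrative only; `detect` is a hypothetical helper mirroring
get_intel_cpu's switch for just these two models):

```python
# Model number -> (cpu name, subtype) after the patch; before the patch
# both 0xad and 0xae mapped to plain "graniterapids".
INTEL_MODELS = {
    0xAD: ("graniterapids", "INTEL_COREI7_GRANITERAPIDS"),
    0xAE: ("graniterapids-d", "INTEL_COREI7_GRANITERAPIDS_D"),
}

def detect(model_number):
    """Hypothetical lookup standing in for get_intel_cpu's switch."""
    return INTEL_MODELS.get(model_number, ("unknown", None))

print(detect(0xAE)[0])  # graniterapids-d
```

Both models remain INTEL_COREI7 cpu types; only the subtype (and hence the
name reported to __builtin_cpu_is) differs.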

gcc/ChangeLog:

* common/config/i386/cpuinfo.h
(get_intel_cpu): Handle Granite Rapids D.
* common/config/i386/i386-common.cc:
(processor_alias_table): Add graniterapids-d.
* common/config/i386/i386-cpuinfo.h
(enum processor_subtypes): Add INTEL_COREI7_GRANITERAPIDS_D.
* config.gcc: Add -march=graniterapids-d.
* config/i386/driver-i386.cc (host_detect_local_cpu):
Handle graniterapids-d.
* gcc/config/i386/i386.h: (PTA_GRANITERAPIDS_D): New.
* doc/extend.texi: Add graniterapids-d.
* doc/invoke.texi: Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/mv16.C: Add graniterapids-d.
* gcc.target/i386/funcspec-56.inc: Handle new march.
---
 gcc/common/config/i386/cpuinfo.h  |  9 -
 gcc/common/config/i386/i386-common.cc |  2 ++
 gcc/common/config/i386/i386-cpuinfo.h |  1 +
 gcc/config.gcc|  2 +-
 gcc/config/i386/driver-i386.cc|  3 +++
 gcc/config/i386/i386.h|  4 +++-
 gcc/doc/extend.texi   |  3 +++
 gcc/doc/invoke.texi   | 11 +++
 gcc/testsuite/g++.target/i386/mv16.C  |  6 ++
 gcc/testsuite/gcc.target/i386/funcspec-56.inc |  1 +
 10 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index ae48bc17771..7c2565c1d93 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -565,7 +565,6 @@ get_intel_cpu (struct __processor_model *cpu_model,
   cpu_model->__cpu_type = INTEL_SIERRAFOREST;
   break;
 case 0xad:
-case 0xae:
   /* Granite Rapids.  */
   cpu = "graniterapids";
   CHECK___builtin_cpu_is ("corei7");
@@ -573,6 +572,14 @@ get_intel_cpu (struct __processor_model *cpu_model,
   cpu_model->__cpu_type = INTEL_COREI7;
   cpu_model->__cpu_subtype = INTEL_COREI7_GRANITERAPIDS;
   break;
+case 0xae:
+  /* Granite Rapids D.  */
+  cpu = "graniterapids-d";
+  CHECK___builtin_cpu_is ("corei7");
+  CHECK___builtin_cpu_is ("graniterapids-d");
+  cpu_model->__cpu_type = INTEL_COREI7;
+  cpu_model->__cpu_subtype = INTEL_COREI7_GRANITERAPIDS_D;
+  break;
 case 0xb6:
   /* Grand Ridge.  */
   cpu = "grandridge";
diff --git a/gcc/common/config/i386/i386-common.cc 
b/gcc/common/config/i386/i386-common.cc
index bf126f14073..8cea3669239 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -2094,6 +2094,8 @@ const pta processor_alias_table[] =
 M_CPU_SUBTYPE (INTEL_COREI7_ALDERLAKE), P_PROC_AVX2},
   {"graniterapids", PROCESSOR_GRANITERAPIDS, CPU_HASWELL, PTA_GRANITERAPIDS,
 M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS), P_PROC_AVX512F},
+  {"graniterapids-d", PROCESSOR_GRANITERAPIDS, CPU_HASWELL, 
PTA_GRANITERAPIDS_D,
+M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS_D), P_PROC_AVX512F},
   {"bonnell", PROCESSOR_BONNELL, CPU_ATOM, PTA_BONNELL,
 M_CPU_TYPE (INTEL_BONNELL), P_PROC_SSSE3},
   {"atom", PROCESSOR_BONNELL, CPU_ATOM, PTA_BONNELL,
diff --git a/gcc/common/config/i386/i386-cpuinfo.h 
b/gcc/common/config/i386/i386-cpuinfo.h
index 2dafbb25a49..254dfec70e5 100644
--- a/gcc/common/config/i386/i386-cpuinfo.h
+++ b/gcc/common/config/i386/i386-cpuinfo.h
@@ -98,6 +98,7 @@ enum processor_subtypes
   ZHAOXIN_FAM7H_LUJIAZUI,
   AMDFAM19H_ZNVER4,
   INTEL_COREI7_GRANITERAPIDS,
+  INTEL_COREI7_GRANITERAPIDS_D,
   CPU_SUBTYPE_MAX
 };
 
diff --git a/gcc/config.gcc b/gcc/config.gcc
index d88071773c9..1446eb2b3ca 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -682,7 +682,7 @@ silvermont knl knm skylake-avx512 cannonlake icelake-client 
icelake-server \
 skylake goldmont goldmont-plus tremont cascadelake tigerlake cooperlake \
 sapphirerapids alderlake rocketlake eden-x2 nano nano-1000 nano-2000 nano-3000 
\
 nano-x2 eden-x4 nano-x4 lujiazui x86-64 x86-64-v2 x86-64-v3 x86-64-v4 \
-sierraforest graniterapids grandridge native"
+sierraforest graniterapids graniterapids-d grandridge native"
 
 # Additional x86 processors supported by --with-cpu=.  Each processor
 # MUST be separated by exactly one space.
diff --git a/gcc/config/i386/driver-i386.cc b/gcc/config/i386/driver-i386.cc
index 54c019a7fa3..4c362ffcfa3 100644
--- a/gcc/config/i386/driver-i386.cc
+++ b/gcc/config/i386/driver-i386.cc
@@ -594,6 +594,9 @@ const char *host_detect_local_cpu (int argc, const char 
**argv)
   

RE: [PATCH] Initial Granite Rapids D Support

2023-07-11 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Mo, Zewei 
> Sent: Wednesday, July 12, 2023 1:56 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; ubiz...@gmail.com
> Subject: [PATCH] Initial Granite Rapids D Support
> 
> Hi all,
> 
> This patch is to add initial support for Granite Rapids D for GCC.
> 
> The link of related information is listed below:
> https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
> 
> Also, the patch of removing AMX-COMPLEX from Granite Rapids will be
> backported to GCC13.
> 
> This has been tested on x86_64-pc-linux-gnu. Is this ok for trunk? Thank you.
Ok.
> 
> Sincerely,
> Zewei Mo
> 
> gcc/ChangeLog:
> 
>   * common/config/i386/cpuinfo.h
>   (get_intel_cpu): Handle Granite Rapids D.
>   * common/config/i386/i386-common.cc:
>   (processor_alias_table): Add graniterapids-d.
>   * common/config/i386/i386-cpuinfo.h
>   (enum processor_subtypes): Add INTEL_COREI7_GRANITERAPIDS_D.
>   * config.gcc: Add -march=graniterapids-d.
>   * config/i386/driver-i386.cc (host_detect_local_cpu):
>   Handle graniterapids-d.
>   * gcc/config/i386/i386.h: (PTA_GRANITERAPIDS_D): New.
>   * doc/extend.texi: Add graniterapids-d.
>   * doc/invoke.texi: Ditto.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.target/i386/mv16.C: Add graniterapids-d.
>   * gcc.target/i386/funcspec-56.inc: Handle new march.
> ---
>  gcc/common/config/i386/cpuinfo.h  |  9 -
>  gcc/common/config/i386/i386-common.cc |  2 ++
>  gcc/common/config/i386/i386-cpuinfo.h |  1 +
>  gcc/config.gcc|  2 +-
>  gcc/config/i386/driver-i386.cc|  3 +++
>  gcc/config/i386/i386.h|  4 +++-
>  gcc/doc/extend.texi   |  3 +++
>  gcc/doc/invoke.texi   | 11 +++
>  gcc/testsuite/g++.target/i386/mv16.C  |  6 ++
>  gcc/testsuite/gcc.target/i386/funcspec-56.inc |  1 +
>  10 files changed, 39 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/common/config/i386/cpuinfo.h
> b/gcc/common/config/i386/cpuinfo.h
> index ae48bc17771..7c2565c1d93 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -565,7 +565,6 @@ get_intel_cpu (struct __processor_model
> *cpu_model,
>cpu_model->__cpu_type = INTEL_SIERRAFOREST;
>break;
>  case 0xad:
> -case 0xae:
>/* Granite Rapids.  */
>cpu = "graniterapids";
>CHECK___builtin_cpu_is ("corei7"); @@ -573,6 +572,14 @@
> get_intel_cpu (struct __processor_model *cpu_model,
>cpu_model->__cpu_type = INTEL_COREI7;
>cpu_model->__cpu_subtype = INTEL_COREI7_GRANITERAPIDS;
>break;
> +case 0xae:
> +  /* Granite Rapids D.  */
> +  cpu = "graniterapids-d";
> +  CHECK___builtin_cpu_is ("corei7");
> +  CHECK___builtin_cpu_is ("graniterapids-d");
> +  cpu_model->__cpu_type = INTEL_COREI7;
> +  cpu_model->__cpu_subtype = INTEL_COREI7_GRANITERAPIDS_D;
> +  break;
>  case 0xb6:
>/* Grand Ridge.  */
>cpu = "grandridge";
> diff --git a/gcc/common/config/i386/i386-common.cc
> b/gcc/common/config/i386/i386-common.cc
> index bf126f14073..8cea3669239 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -2094,6 +2094,8 @@ const pta processor_alias_table[] =
>  M_CPU_SUBTYPE (INTEL_COREI7_ALDERLAKE), P_PROC_AVX2},
>{"graniterapids", PROCESSOR_GRANITERAPIDS, CPU_HASWELL,
> PTA_GRANITERAPIDS,
>  M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS), P_PROC_AVX512F},
> +  {"graniterapids-d", PROCESSOR_GRANITERAPIDS, CPU_HASWELL,
> PTA_GRANITERAPIDS_D,
> +M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS_D),
> P_PROC_AVX512F},
>{"bonnell", PROCESSOR_BONNELL, CPU_ATOM, PTA_BONNELL,
>  M_CPU_TYPE (INTEL_BONNELL), P_PROC_SSSE3},
>{"atom", PROCESSOR_BONNELL, CPU_ATOM, PTA_BONNELL, diff --git
> a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-
> cpuinfo.h
> index 2dafbb25a49..254dfec70e5 100644
> --- a/gcc/common/config/i386/i386-cpuinfo.h
> +++ b/gcc/common/config/i386/i386-cpuinfo.h
> @@ -98,6 +98,7 @@ enum processor_subtypes
>ZHAOXIN_FAM7H_LUJIAZUI,
>AMDFAM19H_ZNVER4,
>INTEL_COREI7_GRANITERAPIDS,
> +  INTEL_COREI7_GRANITERAPIDS_D,
>CPU_SUBTYPE_MAX
>  };
> 
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index d88071773c9..1446eb2b3ca 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -682,7 +682,7 @@ silvermont knl knm skylake-avx512 cannonlake
> icelake-client icelake-server \  skylake goldmont goldmont-plus tremont
> cascadelake tigerlake cooperlake \  sapphirerapids alderlake rocketlake
> eden-x2 nano nano-1000 nano-2000 nano-3000 \
>  nano-x2 eden-x4 nano-x4 lujiazui x86-64 x86-64-v2 x86-64-v3 x86-64-v4 \
> -sierraforest graniterapids grandridge native"
> +sierraforest graniterapids granite

Re: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV auto-vectorization

2023-07-11 Thread juzhe.zh...@rivai.ai
I understand your concern. I have CC'ed the Richards to see whether this piece
of code is unsafe.

Hi, Richard and Richi:

Jeff is worried about this code in "expand_gather_scatter", which supports
len_mask_gather_load/len_mask_scatter_store in the RISC-V port.

The code is as follows:

 +/* Return true if it is the strided load/store. */
+static bool
+strided_load_store_p (rtx vec_offset, rtx *base, rtx *step)
+{
+  if (const_vec_series_p (vec_offset, base, step))
+return true;
+
+  /* For strided load/store, vectorizer always generates
+ VEC_SERIES_EXPR for vec_offset.  */
+  tree expr = REG_P (vec_offset) ? REG_EXPR (vec_offset) : NULL_TREE;
+  if (!expr || TREE_CODE (expr) != SSA_NAME)
+return false;
+
+  /* Check if it is GIMPLE like: _88 = VEC_SERIES_EXPR <0, _87>;  */
+  gimple *def_stmt = SSA_NAME_DEF_STMT (expr);
+  if (!def_stmt || !is_gimple_assign (def_stmt)
+  || gimple_assign_rhs_code (def_stmt) != VEC_SERIES_EXPR)
+return false;
+
+  tree baset = gimple_assign_rhs1 (def_stmt);
+  tree stept = gimple_assign_rhs2 (def_stmt);
+  *base = expand_normal (baset);
+  *step = expand_normal (stept);
+
+  if (!rtx_equal_p (*base, const0_rtx))
+return false;
+  return true;
+}
In this code, I query the SSA_NAME_DEF_STMT to see whether the vector
offset of the gather/scatter is a VEC_SERIES.
If it is a VEC_SERIES, I lower it into RVV strided loads/stores
(vlse.v/vsse.v), which use a scalar stride;
if it is not, I use the common RVV indexed loads/stores with a vector offset
(vluxei/vsuxei).

Jeff is worried about whether we can safely use SSA_NAME_DEF_STMT at this point
(during the "expand" stage, expanding gimple -> RTL).

I am also wondering whether I am doing something wrong here.
Thanks.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-07-12 13:32
To: juzhe.zh...@rivai.ai; gcc-patches
CC: Kito.cheng; Robin Dapp
Subject: Re: [PATCH V5] RISC-V: Support gather_load/scatter RVV 
auto-vectorization
 
 
On 7/11/23 20:34, juzhe.zh...@rivai.ai wrote:
> Hi, Jeff.
> 
>  >> Hmm, I'm not sure this is safe, especially if gimple->rtl expansion is
>>>complete.  While you might be able to get REG_EXPR, I would not really
>>>expect SSA_NAME_DEF_STMT to be correct.  At the least it'll need some
>>>way to make sure it's not called at an inappropriate time.
> I think it's safe, if SSA_NAME_DEF_STMT is NULL, then just return it.
> 
>>>Should this have been known_lt rather than known_le?
> It should be LE, since I will pass through GET_MODE_NUNITS/GET_MODE_SIZE 
> for SLP.
Thanks for double checking.  It looked slightly odd checking ge or le.
 
 
> 
>>>Something's off in your formatting here.  I'd guess spaces vs tabs
> Ok.
> 
>>>In a few places you're using expand_binop.  Those interfaces are really
>>>more for gimple->RTL.  BUt code like expand_gather_scatter is really
>>>RTL, not gimple/tree.   Is there a reason why you're not using pure RTL
>>>interfaces?
> I saw ARM SVE is using them in many places for expanding patterns.
> And I think it's convenient so that's why I use them.
OK.
 
I still think we need a resolution on strided_load_store_p.  As I 
mentioned in my original email, I'm not sure you can depend on getting 
to the SSA_NAME_DEF_STMT at this point -- in particular if it's a 
dangling pointer, then bad things are going to happen.  So let's chase 
that down.  Presumably this is called during gimple->rtl expansion, 
right?  Is it ever called later?
 
I think my concerns about expand_gather_scatter are a non-issue after 
looking at it again -- I missed the GET_MODE (step) != Pmode conditional 
when I first looked at that code.
 
 
jeff
 


Re: [PATCH v2] RISC-V: Refactor riscv mode after for VXRM and FRM

2023-07-11 Thread juzhe.zh...@rivai.ai
+regnum_definition_p (rtx_insn *insn, unsigned int regno)
I prefer it to be reg_set_p.

+insn_asm_p (rtx_insn *insn)
asm_insn_p

+global_vxrm_state_unknown_p
vxrm_unknown_p

+global_frm_state_unknown_p (rtx_insn *insn)
The FRM of a CALL function is not "UNKNOWN", unlike VXRM. It just changes into
another unknown dynamic mode (maybe the same as or different from the previous
dynamic mode).
frm_unknown_dynamic_p

The rest of the refactoring looks good. Let's see whether Kito has more comments.
Thanks.


juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-07-12 13:50
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v2] RISC-V: Refactor riscv mode after for VXRM and FRM
From: Pan Li 
 
While investigating the FRM dynamic rounding mode, we found that the global
unknown status is quite different between fixed-point and floating-point.
Thus, we separate the unknown functions while extracting some common inner
functions.
 
We will also prepare more test cases in another PATCH.
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (regnum_definition_p): New function.
(insn_asm_p): Ditto.
(riscv_vxrm_mode_after): New function for fixed-point.
(global_vxrm_state_unknown_p): Ditto.
(riscv_frm_mode_after): New function for floating-point.
(global_frm_state_unknown_p): Ditto.
(riscv_mode_after): Leverage new functions.
(riscv_entity_mode_after): Removed.
---
gcc/config/riscv/riscv.cc | 96 +--
1 file changed, 82 insertions(+), 14 deletions(-)
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 38d8eb2fcf5..553fbb4435a 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7742,19 +7742,91 @@ global_state_unknown_p (rtx_insn *insn, unsigned int 
regno)
   return false;
}
+static bool
+regnum_definition_p (rtx_insn *insn, unsigned int regno)
+{
+  df_ref ref;
+  struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
+
+  /* Return true if there is a definition of regno.  */
+  for (ref = DF_INSN_INFO_DEFS (insn_info); ref; ref = DF_REF_NEXT_LOC (ref))
+if (DF_REF_REGNO (ref) == regno)
+  return true;
+
+  return false;
+}
+
+static bool
+insn_asm_p (rtx_insn *insn)
+{
+  extract_insn (insn);
+
+  return recog_data.is_asm;
+}
+
+static bool
+global_vxrm_state_unknown_p (rtx_insn *insn)
+{
+  /* Return true if there is a definition of VXRM.  */
+  if (regnum_definition_p (insn, VXRM_REGNUM))
+return true;
+
+  /* A CALL function may contain an instruction that modifies the VXRM,
+ return true in this situation.  */
+  if (CALL_P (insn))
+return true;
+
+  /* Return true for all assembly since users may hardcode an assembly
+ like this: asm volatile ("csrwi vxrm, 0").  */
+  if (insn_asm_p (insn))
+return true;
+
+  return false;
+}
+
+static bool
+global_frm_state_unknown_p (rtx_insn *insn)
+{
+  /* Return true if there is a definition of FRM.  */
+  if (regnum_definition_p (insn, FRM_REGNUM))
+return true;
+
+  /* A CALL function may contain an instruction that modifies the FRM,
+ return true in this situation.  */
+  if (CALL_P (insn))
+return true;
+
+  return false;
+}
+
static int
-riscv_entity_mode_after (int regnum, rtx_insn *insn, int mode,
- int (*get_attr_mode) (rtx_insn *), int default_mode)
+riscv_vxrm_mode_after (rtx_insn *insn, int mode)
{
-  if (global_state_unknown_p (insn, regnum))
-return default_mode;
-  else if (recog_memoized (insn) < 0)
+  if (global_vxrm_state_unknown_p (insn))
+return VXRM_MODE_NONE;
+
+  if (recog_memoized (insn) < 0)
+return mode;
+
+  if (reg_mentioned_p (gen_rtx_REG (SImode, VXRM_REGNUM), PATTERN (insn)))
+return get_attr_vxrm_mode (insn);
+  else
 return mode;
+}
-  rtx reg = gen_rtx_REG (SImode, regnum);
-  bool mentioned_p = reg_mentioned_p (reg, PATTERN (insn));
+static int
+riscv_frm_mode_after (rtx_insn *insn, int mode)
+{
+  if (global_frm_state_unknown_p (insn))
+return FRM_MODE_NONE;
-  return mentioned_p ? get_attr_mode (insn): mode;
+  if (recog_memoized (insn) < 0)
+return mode;
+
+  if (reg_mentioned_p (gen_rtx_REG (SImode, FRM_REGNUM), PATTERN (insn)))
+return get_attr_frm_mode (insn);
+  else
+return mode;
}
/* Return the mode that an insn results in.  */
@@ -7765,13 +7837,9 @@ riscv_mode_after (int entity, int mode, rtx_insn *insn)
   switch (entity)
 {
 case RISCV_VXRM:
-  return riscv_entity_mode_after (VXRM_REGNUM, insn, mode,
-   (int (*)(rtx_insn *)) get_attr_vxrm_mode,
-   VXRM_MODE_NONE);
+  return riscv_vxrm_mode_after (insn, mode);
 case RISCV_FRM:
-  return riscv_entity_mode_after (FRM_REGNUM, insn, mode,
-   (int (*)(rtx_insn *)) get_attr_frm_mode,
-   FRM_MODE_DYN);
+  return riscv_frm_mode_after (insn, mode);
 default:
   gcc_unreachable ();
 }
-- 
2.34.1
 
 


[PATCH] LoongArch: Fix the missing include file when using gcc plugins.

2023-07-11 Thread Guo Jie
From: Sun Haiyong 

gcc/ChangeLog:

* config.gcc: Add some include files to tm_file.

---
 gcc/config.gcc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 51ca5311fa4..b901aa8e5dc 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2478,7 +2478,7 @@ riscv*-*-freebsd*)
 
 loongarch*-*-linux*)
tm_file="elfos.h gnu-user.h linux.h linux-android.h glibc-stdint.h 
${tm_file}"
-   tm_file="${tm_file} loongarch/gnu-user.h loongarch/linux.h"
+   tm_file="${tm_file} loongarch/gnu-user.h loongarch/linux.h 
loongarch/loongarch-def.h loongarch/loongarch-tune.h 
loongarch/loongarch-driver.h"
extra_options="${extra_options} linux-android.opt"
tmake_file="${tmake_file} loongarch/t-linux"
gnu_ld=yes
-- 
2.20.1



Re: [PATCH 1/1] riscv: thead: Fix ICE when enable XTheadMemPair ISA extension.

2023-07-11 Thread Xianmiao Qu via Gcc-patches
On Tue, Jul 11, 2023 at 06:02:18PM +0200, Christoph Müllner wrote:
> Hi Kito,
> 
> I take some of the blame because I have sent a series
> that consisted of fixes followed by new features.
> 
> You have ack'ed patches 1-9 from the series.
> The last two patches (for XTheadMemIdx and XTheadFMemIdx) were
> later reviewed by Jeff and need a bit rework and more testing.
> 
> If it helps, you can find patches 1-9 rebased and retested here:
>   https://github.com/cmuellner/gcc/tree/riscv-thead-improvements
> 
> I have also sent out a fix for two failing T-Head tests earlier today:
>   https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624049.html
> It would be great if you could look at that and push that as well, if it is 
> ok.
> 
> Thanks,
> Christoph
> 
> 
> 
> On Tue, Jul 11, 2023 at 5:51 PM Kito Cheng  wrote:
> >
> > Hi Christoph:
> >
> > Oops, I thought Philipp would push those patches. Are there any other
> > patches that got approved but not committed? I can help push those
> > patches tomorrow.
> >
> > On Tue, Jul 11, 2023 at 11:42 PM Christoph Müllner
> >  wrote:
> > >
> > > Hi Cooper,
> > >
> > > I addressed this in April this year.
> > > It even got an "ok", but nobody pushed it:
> > >   https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616972.html
> > >
> > > BR
> > > Christoph
> > >

Hi Christoph and Kito,

That's great that this bug has been resolved. If you merge this patch,
it would be best to also merge it to the gcc-13 branch.


Thanks,
Cooper


Re: [PATCH, OpenACC 2.7] readonly modifier support in front-ends

2023-07-11 Thread Tobias Burnus

Hi,

just a remark regarding OpenMP. With

  ...omp ... firstprivate(var) allocator(omp_const_mem_alloc: var)

one can also create constant memory in OpenMP.
Likewise with a custom allocator that uses the memory space
omp_const_mem_space, which is then a run-time thing. I don't think
that's particularly useful on the host, as the !PROT_WRITE property is a
memory-page thing which requires allocating a multiple of the page size
(and after writing the value, mprotect can make it read-only). But I
think it can be useful on the device (cf. OpenACC). OpenMP and OpenACC
likely differ in terms of whether an entry is in the mapping table
(firstprivate vs copy) and in the ref count. In any case, it would be
good to have the code written such that both OpenACC's and OpenMP's use
cases can share as much code as possible, even if only OpenACC is
initially supported.

Tobias

PS: I should eventually have a closer look at your patch!

On 10.07.23 20:33, Chung-Lin Tang wrote:

this patch contains support for the 'readonly' modifier in copyin clauses
and the cache directive.

As we discussed earlier, the work for actually linking this to middle-end
points-to analysis is a somewhat non-trivial issue. This first patch allows
the language feature to be used in OpenACC directives first (with no effect for 
now).
The middle-end changes are probably going to be a later patch.



Re: Pushed: [PATCH v2] vect: Fix vectorized BIT_FIELD_REF for signed bit-fields [PR110557]

2023-07-11 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 10 Jul 2023 at 16:43, Xi Ruoyao via Gcc-patches
 wrote:
>
> On Mon, 2023-07-10 at 10:33 +, Richard Biener wrote:
> > On Fri, 7 Jul 2023, Xi Ruoyao wrote:
> >
> > > If a bit-field is signed and it's wider than the output type, we
> > > must
> > > ensure the extracted result sign-extended.  But this was not handled
> > > correctly.
> > >
> > > For example:
> > >
> > > int x : 8;
> > > long y : 55;
> > > bool z : 1;
> > >
> > > The vectorized extraction of y was:
> > >
> > > vect__ifc__49.29_110 =
> > >   MEM  [(struct Item
> > > *)vectp_a.27_108];
> > > vect_patt_38.30_112 =
> > >   vect__ifc__49.29_110 & { 9223372036854775552,
> > > 9223372036854775552 };
> > > vect_patt_39.31_113 = vect_patt_38.30_112 >> 8;
> > > vect_patt_40.32_114 =
> > >   VIEW_CONVERT_EXPR(vect_patt_39.31_113);
> > >
> > > This is obviously incorrect.  This patch has implemented it as:
> > >
> > > vect__ifc__25.16_62 =
> > >   MEM  [(struct Item
> > > *)vectp_a.14_60];
> > > vect_patt_31.17_63 =
> > >   VIEW_CONVERT_EXPR(vect__ifc__25.16_62);
> > > vect_patt_32.18_64 = vect_patt_31.17_63 << 1;
> > > vect_patt_33.19_65 = vect_patt_32.18_64 >> 9;
> >
> > OK.
>
> Pushed r14-2407 and r13-7553.
Hi Xi,
Your commit:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=63ae6bc60c0f67fb2791991bf4b6e7e0a907d420,

seems to cause following regressions on arm-linux-gnueabihf:
FAIL: g++.dg/vect/pr110557.cc  -std=c++98 (test for excess errors)
FAIL: g++.dg/vect/pr110557.cc  -std=c++14 (test for excess errors)
FAIL: g++.dg/vect/pr110557.cc  -std=c++17 (test for excess errors)
FAIL: g++.dg/vect/pr110557.cc  -std=c++20 (test for excess errors)

Excess error:
gcc/testsuite/g++.dg/vect/pr110557.cc:12:8: warning: width of
'Item::y' exceeds its type

Thanks,
Prathamesh
>
> --
> Xi Ruoyao 
> School of Aerospace Science and Technology, Xidian University


Re: [PATCH] x86: improve fast bfloat->float conversion

2023-07-11 Thread Jan Beulich via Gcc-patches
On 11.07.2023 08:45, Liu, Hongtao wrote:
>> -Original Message-
>> From: Jan Beulich 
>> Sent: Tuesday, July 11, 2023 2:08 PM
>>
>> There's nothing AVX512BW-ish in here, so no reason to use Yw as the
>> constraints for the AVX alternative. Furthermore by using the 512-bit form of
>> VPSLLD (in a new alternative) all 32 registers can be used directly by the 
>> insn
>> without AVX512VL needing to be enabled.
> Yes, the instruction vpslld doesn't need AVX512BW, the patch LGTM.

Thanks.

>> ---
>> The corresponding expander, "extendbfsf2", looks to have been dead since
>> its introduction in a1ecc5600464 ("Fix incorrect _mm_cvtsbh_ss"): The builtin
>> references the insn (extendbfsf2_1), not the expander. Can't the expander
>> be deleted and the name of the insn then pruned of the _1 suffix? If so, that
>> further raises the question of the significance of the "!HONOR_NANS
>> (BFmode)" that the expander has, but the insn doesn't have. Which may
>> instead suggest the builtin was meant to reference the expander. Yet then I
>> can't see what the builtin would expand to when HONOR_NANS
>> (BFmode) is true.
> 
> Quote from what Jakub said in [1].
> ---
> This is not correct.
> While using such code for _mm_cvtsbh_ss is fine if it is documented not to
> raise exceptions and turn a sNaN into a qNaN, it is not fine for HONOR_NANS
> (i.e. when -ffast-math is not on), because a __bf16 -> float conversion
> on sNaN should raise invalid exception and turn it into a qNaN.
> We could have extendbfsf2 expander that would FAIL; if HONOR_NANS and
> emit extendbfsf2_1 otherwise. 
> ---
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607108.html

I'm not sure I understand: It sounds like what Jakub said matches my
observation, yet then it seems unlikely that the issue wasn't fixed in
over half a year.

Also having the expander FAIL when HONOR_NANS (matching what I was
thinking) still doesn't clarify to me what then would happen to uses of
the builtin. Is there any (common code) fallback for such a case? I
didn't think there would be, in which case wouldn't this result in an
internal compiler error?

Jan


[PATCH pushed] testsuite: Unbreak pr110557.cc where long is 32-bit (was Re: Pushed: [PATCH v2] vect: Fix vectorized BIT_FIELD_REF for signed bit-fields [PR110557])

2023-07-11 Thread Xi Ruoyao via Gcc-patches
On Tue, 2023-07-11 at 13:04 +0530, Prathamesh Kulkarni wrote:

/* snip */

> Hi Xi,
> Your commit:
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=63ae6bc60c0f67fb2791991bf4b6e7e0a907d420,
> 
> seems to cause following regressions on arm-linux-gnueabihf:
> FAIL: g++.dg/vect/pr110557.cc  -std=c++98 (test for excess errors)
> FAIL: g++.dg/vect/pr110557.cc  -std=c++14 (test for excess errors)
> FAIL: g++.dg/vect/pr110557.cc  -std=c++17 (test for excess errors)
> FAIL: g++.dg/vect/pr110557.cc  -std=c++20 (test for excess errors)
> 
> Excess error:
> gcc/testsuite/g++.dg/vect/pr110557.cc:12:8: warning: width of
> 'Item::y' exceeds its type

Ah sorry, I didn't consider ports with 32-bit long.

The attached patch should fix the issue.  It has been tested and pushed
r14-2427 and r13-7555.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University
From 312839653b8295599c63cae90278a87af528edad Mon Sep 17 00:00:00 2001
From: Xi Ruoyao 
Date: Tue, 11 Jul 2023 15:55:54 +0800
Subject: [PATCH] testsuite: Unbreak pr110557.cc where long is 32-bit

On ports with 32-bit long, the test produced excess errors:

gcc/testsuite/g++.dg/vect/pr110557.cc:12:8: warning: width of
'Item::y' exceeds its type

Reported-by: Prathamesh Kulkarni 

gcc/testsuite/ChangeLog:

	* g++.dg/vect/pr110557.cc: Use long long instead of long for
	64-bit type.
	(test): Remove an unnecessary cast.
---
 gcc/testsuite/g++.dg/vect/pr110557.cc | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/g++.dg/vect/pr110557.cc b/gcc/testsuite/g++.dg/vect/pr110557.cc
index e1fbe1caac4..effb67e2df3 100644
--- a/gcc/testsuite/g++.dg/vect/pr110557.cc
+++ b/gcc/testsuite/g++.dg/vect/pr110557.cc
@@ -1,7 +1,9 @@
 // { dg-additional-options "-mavx" { target { avx_runtime } } }
 
-static inline long
-min (long a, long b)
+typedef long long i64;
+
+static inline i64
+min (i64 a, i64 b)
 {
   return a < b ? a : b;
 }
@@ -9,16 +11,16 @@ min (long a, long b)
 struct Item
 {
   int x : 8;
-  long y : 55;
+  i64 y : 55;
   bool z : 1;
 };
 
-__attribute__ ((noipa)) long
+__attribute__ ((noipa)) i64
 test (Item *a, int cnt)
 {
-  long size = 0;
+  i64 size = 0;
   for (int i = 0; i < cnt; i++)
-size = min ((long)a[i].y, size);
+size = min (a[i].y, size);
   return size;
 }
 
-- 
2.41.0



Re: [PATCH] x86: improve fast bfloat->float conversion

2023-07-11 Thread Jakub Jelinek via Gcc-patches
On Tue, Jul 11, 2023 at 09:50:23AM +0200, Jan Beulich via Gcc-patches wrote:
> > Quote from what Jakub said in [1].
> > ---
> > This is not correct.
> > While using such code for _mm_cvtsbh_ss is fine if it is documented not to
> > raise exceptions and turn a sNaN into a qNaN, it is not fine for HONOR_NANS
> > (i.e. when -ffast-math is not on), because a __bf16 -> float conversion
> > on sNaN should raise invalid exception and turn it into a qNaN.
> > We could have extendbfsf2 expander that would FAIL; if HONOR_NANS and
> > emit extendbfsf2_1 otherwise. 
> > ---
> > [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607108.html
> 
> I'm not sure I understand: It sounds like what Jakub said matches my
> observation, yet then it seems unlikely that the issue wasn't fixed in
> over half a year.
> 
> Also having the expander FAIL when HONOR_NANS (matching what I was
> thinking) still doesn't clarify to me what then would happen to uses of
> the builtin. Is there any (common code) fallback for such a case? I
> didn't think there would be, in which case wouldn't this result in an
> internal compiler error?

There is some bfloat specific generic code I've added last year in expr.cc
and optabs.cc:
grep arm_bfloat_ *.cc
expr.cc:  || (REAL_MODE_FORMAT (from_mode) == 
&arm_bfloat_half_format
expr.cc:  || (REAL_MODE_FORMAT (to_mode) == 
&arm_bfloat_half_format
expr.cc:  if (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
expr.cc:  && REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
optabs.cc:  == &arm_bfloat_half_format
optabs.cc:  == &arm_bfloat_half_format
optabs.cc:  if (REAL_MODE_FORMAT (GET_MODE (from)) == &arm_bfloat_half_format
plus if that doesn't trigger there are libcalls for it in libgcc, e.g.
libgcc/soft-fp/extendbfsf2.c
The generic code ensures one doesn't need a full set of library functions
(extendbf{d,x,t,h}f2.c etc.) when the target has IEEE single support next to
bfloat.
The generic code also uses shifts when not honoring NaNs (etc.) where
possible, but of course target code can override it if it can do the stuff
better than the generic code; I just wanted some fallback, because on
targets which support both bfloat and IEEE single, shifts will be a common
way to extend.

Jakub



RE: [PATCH] x86: improve fast bfloat->float conversion

2023-07-11 Thread Liu, Hongtao via Gcc-patches


> -Original Message-
> From: Jan Beulich 
> Sent: Tuesday, July 11, 2023 3:50 PM
> To: Liu, Hongtao 
> Cc: Kirill Yukhin ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] x86: improve fast bfloat->float conversion
> 
> On 11.07.2023 08:45, Liu, Hongtao wrote:
> >> -Original Message-
> >> From: Jan Beulich 
> >> Sent: Tuesday, July 11, 2023 2:08 PM
> >>
> >> There's nothing AVX512BW-ish in here, so no reason to use Yw as the
> >> constraints for the AVX alternative. Furthermore by using the 512-bit
> >> form of VPSLLD (in a new alternative) all 32 registers can be used
> >> directly by the insn without AVX512VL needing to be enabled.
> > Yes, the instruction vpslld doesn't need AVX512BW, the patch LGTM.
> 
> Thanks.
> 
> >> ---
> >> The corresponding expander, "extendbfsf2", looks to have been dead
> >> since its introduction in a1ecc5600464 ("Fix incorrect
> >> _mm_cvtsbh_ss"): The builtin references the insn (extendbfsf2_1), not
> >> the expander. Can't the expander be deleted and the name of the insn
> >> then pruned of the _1 suffix? If so, that further raises the question
> >> of the significance of the "!HONOR_NANS (BFmode)" that the expander
> >> has, but the insn doesn't have. Which may instead suggest the builtin
> >> was meant to reference the expander. Yet then I can't see what
> >> the builtin would expand to when HONOR_NANS
> >> (BFmode) is true.
> >
> > Quote from what Jakub said in [1].
> > ---
> > This is not correct.
> > While using such code for _mm_cvtsbh_ss is fine if it is documented
> > not to raise exceptions and turn a sNaN into a qNaN, it is not fine
> > for HONOR_NANS (i.e. when -ffast-math is not on), because a __bf16 ->
> > float conversion on sNaN should raise invalid exception and turn it into a
> qNaN.
> > We could have extendbfsf2 expander that would FAIL; if HONOR_NANS
> and
> > emit extendbfsf2_1 otherwise.
> > ---
> > [1]
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607108.html
> 
> I'm not sure I understand: It sounds like what Jakub said matches my
> observation, yet then it seems unlikely that the issue wasn't fixed in over 
> half
> a year.
> 
> Also having the expander FAIL when HONOR_NANS (matching what I was
> thinking) still doesn't clarify to me what then would happen to uses of the
> builtin. Is there any (common code) fallback for such a case? I didn't think
> there would be, in which case wouldn't this result in an internal compiler
> error?
For __bf16 -> float or target-specific builtins, it should be ok since __bf16
is just an extension type.
But extendbfsf2 is a standard pattern name which is also used to expand the
C++23 std::bfloat16_t -> float conversion, which is assumed to raise
exceptions for an sNaN.
Since vpslld won't raise any exception, we need to add HONOR_NANS in the
extendbfsf2 pattern.
That's my understanding; for std::bfloat16_t support, it's mentioned in [2].

[2] https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601865.html
> 
> Jan


Re: [x86-64] RFC: Add nosse abi attribute

2023-07-11 Thread Jan Hubicka via Gcc-patches
> > > FWIW, this particular patch was regstrapped on x86-64-linux
> > > with trunk from a week ago (and sniff-tested on current trunk).
> >
> > This looks really cool.
> 
> The biggest benefit might be from IPA with LTO where we'd carefully place 
> those
> attributes at WPA time (at that time tying our hands for later).

Within single partition IRA already propagates the knowledge about
callee-clobbered registers.

Across partition we already automatically enable regparm with -m32
see ix86_function_regparm and tests for target->local and
can_change_attribute

Enabling SSE at the same spot should be easy.

Honza


Re: [x86-64] RFC: Add nosse abi attribute

2023-07-11 Thread Richard Biener via Gcc-patches
On Tue, Jul 11, 2023 at 10:53 AM Jan Hubicka  wrote:
>
> > > > FWIW, this particular patch was regstrapped on x86-64-linux
> > > > with trunk from a week ago (and sniff-tested on current trunk).
> > >
> > > This looks really cool.
> >
> > The biggest benefit might be from IPA with LTO where we'd carefully place 
> > those
> > attributes at WPA time (at that time tying our hands for later).
>
> Within single partition IRA already propagates the knowledge about
> callee-clobbered registers.
>
> Across partition we already automatically enable regparm with -m32
> see ix86_function_regparm and tests for target->local and
> can_change_attribute
>
> Enabling SSE at the same spot should be easy.

It's probably slightly different since we want to enable it for a "leaf"
sub-callgraph (or where edges to extern have the appropriate ABI
by means of attributes) irrespective of whether the functions are exported
(we're adding to the callee save set, which is ABI compatible
with the default ABI).  But yes, that place would be appropriate.

Richard.

>
> Honza


[PATCH V4] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-07-11 Thread Jiufu Guo via Gcc-patches
Hi,

The integer expression "(X - N * M) / N" can be optimized to "X / N - M"
if there is no wrap/overflow/underflow and "X - N * M" has the same
sign as "X".

Compared with the previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623028.html
- The APIs for checking overflow of range operation are moved to
other files: range-op and gimple-range.
- Improve the patterns with '(X + C)' for unsigned type.

Bootstrap & regtest pass on ppc64{,le} and x86_64.
Is this patch ok for trunk?

BR,
Jeff (Jiufu Guo)


PR tree-optimization/108757

gcc/ChangeLog:

* gimple-range.cc (arith_without_overflow_p): New function.
(same_sign_p): New function.
* gimple-range.h (arith_without_overflow_p): New declare.
(same_sign_p): New declare.
* match.pd ((X - N * M) / N): New pattern.
((X + N * M) / N): New pattern.
((X + C) div_rshift N): New pattern.
* range-op.cc (plus_without_overflow_p): New function.
(minus_without_overflow_p): New function.
(mult_without_overflow_p): New function.
* range-op.h (plus_without_overflow_p): New declare.
(minus_without_overflow_p): New declare.
(mult_without_overflow_p): New declare.
* value-query.h (get_range): New function
* value-range.cc (irange::nonnegative_p): New function.
(irange::nonpositive_p): New function.
* value-range.h (irange::nonnegative_p): New declare.
(irange::nonpositive_p): New declare.

gcc/testsuite/ChangeLog:

* gcc.dg/pr108757-1.c: New test.
* gcc.dg/pr108757-2.c: New test.
* gcc.dg/pr108757.h: New test.

---
 gcc/gimple-range.cc   |  50 +++
 gcc/gimple-range.h|   2 +
 gcc/match.pd  |  64 
 gcc/range-op.cc   |  77 ++
 gcc/range-op.h|   4 +
 gcc/value-query.h |  10 ++
 gcc/value-range.cc|  12 ++
 gcc/value-range.h |   2 +
 gcc/testsuite/gcc.dg/pr108757-1.c |  18 +++
 gcc/testsuite/gcc.dg/pr108757-2.c |  19 +++
 gcc/testsuite/gcc.dg/pr108757.h   | 233 ++
 11 files changed, 491 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr108757-1.c
 create mode 100644 gcc/testsuite/gcc.dg/pr108757-2.c
 create mode 100644 gcc/testsuite/gcc.dg/pr108757.h

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 
01e62d3ff3901143bde33dc73c0debf41d0c0fdd..620fe32e85e5fe3847a933554fc656b2939cf02d
 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -926,3 +926,53 @@ assume_query::dump (FILE *f)
 }
   fprintf (f, "--\n");
 }
+
+/* Return true if the operation "X CODE Y" in type does not overflow
+   underflow or wrap with value range info, otherwise return false.  */
+
+bool
+arith_without_overflow_p (tree_code code, tree x, tree y, tree type)
+{
+  gcc_assert (INTEGRAL_TYPE_P (type));
+
+  if (TYPE_OVERFLOW_UNDEFINED (type))
+return true;
+
+  value_range vr0;
+  value_range vr1;
+  if (!(get_range (vr0, x) && get_range (vr1, y)))
+return false;
+
+  switch (code)
+{
+case PLUS_EXPR:
+  return plus_without_overflow_p (vr0, vr1, type);
+case MINUS_EXPR:
+  return minus_without_overflow_p (vr0, vr1, type);
+case MULT_EXPR:
+  return mult_without_overflow_p (vr0, vr1, type);
+default:
+  gcc_unreachable ();
+}
+
+  return false;
+}
+
+/* Return true if "X" and "Y" have the same sign or zero.  */
+
+bool
+same_sign_p (tree x, tree y, tree type)
+{
+  gcc_assert (INTEGRAL_TYPE_P (type));
+
+  if (TYPE_UNSIGNED (type))
+return true;
+
+  value_range vr0;
+  value_range vr1;
+  if (!(get_range (vr0, x) && get_range (vr1, y)))
+return false;
+
+  return (vr0.nonnegative_p () && vr1.nonnegative_p ())
+|| (vr0.nonpositive_p () && vr1.nonpositive_p ());
+}
diff --git a/gcc/gimple-range.h b/gcc/gimple-range.h
index 
6587e4923ff44e10826a697ecced237a0ad23c88..84eac87392b642ed3305011415c804f5b319e09f
 100644
--- a/gcc/gimple-range.h
+++ b/gcc/gimple-range.h
@@ -101,5 +101,7 @@ protected:
   gori_compute m_gori;
 };
 
+bool arith_without_overflow_p (tree_code code, tree x, tree y, tree type);
+bool same_sign_p (tree x, tree y, tree type);
 
 #endif // GCC_GIMPLE_RANGE_H
diff --git a/gcc/match.pd b/gcc/match.pd
index 
8543f777a28e4f39b2b2a40d0702aed88786bbb3..87e990c5b1ebbd116d7d7efdba62347d3a967cdd
 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -942,6 +942,70 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 #endif

 
+#if GIMPLE
+(for div (trunc_div exact_div)
+ /* Simplify (t + M*N) / N -> t / N + M.  */
+ (simplify
+  (div (plus:c@4 @0 (mult:c@3 @1 @2)) @2)
+  (if (INTEGRAL_TYPE_P (type)
+   && arith_without_overflow_p (MULT_EXPR, @1, @2, type)
+   && arith_without_overflow_p (PLUS_EXPR, @0, @3, type)
+   && same_sign_p (@0, @4, type))
+  (plus (div @0 @2) @1)))
+
+ /* Simplify (t - M*N) / N -> t / N - M.  */
+ (si

[PATCH] Add peephole to eliminate redundant comparison after cmpccxadd.

2023-07-11 Thread liuhongt via Gcc-patches
Similar to what we did for CMPXCHG, but extended to all
ix86_comparison_int_operator cases, since CMPCCXADD sets EFLAGS exactly
the same as CMP.

When the operand order in the CMP insn is the same as in CMPCCXADD,
the CMP insn can be eliminated directly.

When the operand order is swapped in the CMP insn, cmpccxadd + cmpl +
jcc/setcc is optimized to cmpccxadd + jcc/setcc only when FLAGS_REG is
dead after the jcc/setcc, with the condition of the jcc/setcc adjusted
accordingly.

gcc/ChangeLog:

PR target/110591
* config/i386/sync.md (cmpccxadd_): Adjust the pattern
to explicitly set FLAGS_REG like *cmp_1; also add 3 extra
define_peephole2 patterns after it.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110591.c: New test.
* gcc.target/i386/pr110591-2.c: New test.
---
 gcc/config/i386/sync.md| 160 -
 gcc/testsuite/gcc.target/i386/pr110591-2.c |  90 
 gcc/testsuite/gcc.target/i386/pr110591.c   |  66 +
 3 files changed, 315 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr110591-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr110591.c

diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index e1fa1504deb..e84226cf895 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -1093,7 +1093,9 @@ (define_insn "cmpccxadd_"
  UNSPECV_CMPCCXADD))
(set (match_dup 1)
(unspec_volatile:SWI48x [(const_int 0)] UNSPECV_CMPCCXADD))
-   (clobber (reg:CC FLAGS_REG))]
+   (set (reg:CC FLAGS_REG)
+   (compare:CC (match_dup 1)
+   (match_dup 2)))]
   "TARGET_CMPCCXADD && TARGET_64BIT"
 {
   char buf[128];
@@ -1105,3 +1107,159 @@ (define_insn "cmpccxadd_"
   output_asm_insn (buf, operands);
   return "";
 })
+
+(define_peephole2
+  [(set (match_operand:SWI48x 0 "register_operand")
+   (match_operand:SWI48x 1 "x86_64_general_operand"))
+   (parallel [(set (match_dup 0)
+  (unspec_volatile:SWI48x
+[(match_operand:SWI48x 2 "memory_operand")
+ (match_dup 0)
+ (match_operand:SWI48x 3 "register_operand")
+ (match_operand:SI 4 "const_int_operand")]
+UNSPECV_CMPCCXADD))
+ (set (match_dup 2)
+  (unspec_volatile:SWI48x [(const_int 0)] UNSPECV_CMPCCXADD))
+ (set (reg:CC FLAGS_REG)
+  (compare:CC (match_dup 2)
+  (match_dup 0)))])
+   (set (reg FLAGS_REG)
+   (compare (match_operand:SWI48x 5 "register_operand")
+(match_operand:SWI48x 6 "x86_64_general_operand")))]
+  "TARGET_CMPCCXADD && TARGET_64BIT
+   && rtx_equal_p (operands[0], operands[5])
+   && rtx_equal_p (operands[1], operands[6])"
+  [(set (match_dup 0)
+   (match_dup 1))
+   (parallel [(set (match_dup 0)
+  (unspec_volatile:SWI48x
+[(match_dup 2)
+ (match_dup 0)
+ (match_dup 3)
+ (match_dup 4)]
+UNSPECV_CMPCCXADD))
+ (set (match_dup 2)
+  (unspec_volatile:SWI48x [(const_int 0)] UNSPECV_CMPCCXADD))
+ (set (reg:CC FLAGS_REG)
+  (compare:CC (match_dup 2)
+  (match_dup 0)))])
+   (set (match_dup 7)
+   (match_op_dup 8
+ [(match_dup 9) (const_int 0)]))])
+
+(define_peephole2
+  [(set (match_operand:SWI48x 0 "register_operand")
+   (match_operand:SWI48x 1 "x86_64_general_operand"))
+   (parallel [(set (match_dup 0)
+  (unspec_volatile:SWI48x
+[(match_operand:SWI48x 2 "memory_operand")
+ (match_dup 0)
+ (match_operand:SWI48x 3 "register_operand")
+ (match_operand:SI 4 "const_int_operand")]
+UNSPECV_CMPCCXADD))
+ (set (match_dup 2)
+  (unspec_volatile:SWI48x [(const_int 0)] UNSPECV_CMPCCXADD))
+ (set (reg:CC FLAGS_REG)
+  (compare:CC (match_dup 2)
+  (match_dup 0)))])
+   (set (reg FLAGS_REG)
+   (compare (match_operand:SWI48x 5 "register_operand")
+(match_operand:SWI48x 6 "x86_64_general_operand")))
+   (set (match_operand:QI 7 "nonimmediate_operand")
+   (match_operator:QI 8 "ix86_comparison_int_operator"
+ [(reg FLAGS_REG) (const_int 0)]))]
+  "TARGET_CMPCCXADD && TARGET_64BIT
+   && rtx_equal_p (operands[0], operands[6])
+   && rtx_equal_p (operands[1], operands[5])
+   && peep2_regno_dead_p (4, FLAGS_REG)"
+  [(set (match_dup 0)
+   (match_dup 1))
+   (parallel [(set (match_dup 0)
+  (unspec_volatile:SWI48x
+[(match_dup 2)
+ (match_dup 0)
+ (match_dup 3)
+ (match_dup 4)]
+UNSPECV_CMPCCXADD))
+ (set (match_dup 2)
+  (unspec_volatile:S

Re: [COMMITTED] ada: Follow-up fix for compilation issue with recent MinGW-w64 versions

2023-07-11 Thread Eric Botcazou via Gcc-patches
> It turns out that adaint.c includes other Windows header files than just
> windows.h, so defining WIN32_LEAN_AND_MEAN is not sufficient for it.
> 
> gcc/ada/
> 
>   * adaint.c [_WIN32]: Undefine 'abort' macro.

I backported it onto the 13 branch.

-- 
Eric Botcazou




[COMMITTED] ada: Avoid renaming_decl in case of constrained array

2023-07-11 Thread Marc Poulhiès via Gcc-patches
From: Bob Duff 

This patch avoids rewriting "X: S := F(...);" as "X: S renames F(...);".
That rewrite is incorrect if S is a constrained array subtype,
because it changes the semantics. In the original, the
bounds of X are those of S. But constraints are ignored in
renamings, so the bounds of X would come from F'Result.
This can cause spurious Constraint_Errors in some obscure
cases. It causes unnecessary checks to be inserted, and even
when such checks pass (the more common case), they might be less
efficient.

gcc/ada/

* exp_ch3.adb (Expand_N_Object_Declaration): Avoid transforming to
a renaming in case of constrained array that comes from source.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch3.adb | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb
index daf27fb25e9..db27a5f68b6 100644
--- a/gcc/ada/exp_ch3.adb
+++ b/gcc/ada/exp_ch3.adb
@@ -7275,6 +7275,13 @@ package body Exp_Ch3 is
   Rewrite_As_Renaming : Boolean := False;
   --  Whether to turn the declaration into a renaming at the end
 
+  Nominal_Subtype_Is_Constrained_Array : constant Boolean :=
+Comes_From_Source (Obj_Def)
+and then Is_Array_Type (Typ) and then Is_Constrained (Typ);
+  --  Used to avoid rewriting as a renaming for constrained arrays,
+  --  which is only a problem for source arrays; others have the
+  --  correct bounds (see below).
+
--  Start of processing for Expand_N_Object_Declaration
 
begin
@@ -8030,7 +8037,14 @@ package body Exp_Ch3 is
 
or else (Nkind (Expr_Q) = N_Slice
  and then OK_To_Rename_Ref (Prefix (Expr_Q))
- and then not Special_Ret_Obj));
+ and then not Special_Ret_Obj))
+
+--  If we have "X : S := ...;", and S is a constrained array
+--  subtype, then we cannot rename, because renamings ignore
+--  the constraints of S, so that would change the semantics
+--  (sliding would not occur on the initial value).
+
+and then not Nominal_Subtype_Is_Constrained_Array;
 
 --  If the type needs finalization and is not inherently limited,
 --  then the target is adjusted after the copy and attached to the
-- 
2.40.0



[COMMITTED] ada: Fix wrong resolution for hidden discriminant in predicate

2023-07-11 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

The problem occurs for hidden discriminants of private discriminated types.

gcc/ada/

* sem_ch13.adb (Replace_Type_References_Generic.Visible_Component):
In the case of private discriminated types, return a discriminant
only if it is listed in the discriminant part of the declaration.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch13.adb | 49 +---
 1 file changed, 42 insertions(+), 7 deletions(-)

diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
index c3ea8d63566..4f97094aae5 100644
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -15569,15 +15569,11 @@ package body Sem_Ch13 is
 
   function Visible_Component (Comp : Name_Id) return Entity_Id is
  E : Entity_Id;
+
   begin
- --  Types with nameable components are record, task, and protected
- --  types, and discriminated private types.
+ --  Types with nameable components are record, task, protected types
 
- if Ekind (T) in E_Record_Type
-   | E_Task_Type
-   | E_Protected_Type
-   or else (Is_Private_Type (T) and then Has_Discriminants (T))
- then
+ if Ekind (T) in E_Record_Type | E_Task_Type | E_Protected_Type then
 --  This is a sequential search, which seems acceptable
 --  efficiency-wise, given the typical size of component
 --  lists, protected operation lists, task item lists, and
@@ -15591,6 +15587,45 @@ package body Sem_Ch13 is
 
Next_Entity (E);
 end loop;
+
+ --  Private discriminated types may have visible discriminants
+
+ elsif Is_Private_Type (T) and then Has_Discriminants (T) then
+declare
+   Decl : constant Node_Id := Declaration_Node (T);
+   Spec : constant List_Id :=
+ Discriminant_Specifications (Original_Node (Decl));
+
+   Discr : Node_Id;
+
+begin
+   --  Loop over the discriminants listed in the discriminant part
+   --  of the private type declaration to find one with a matching
+   --  name; then, if it exists, return the discriminant entity of
+   --  the same name in the type, which is that of its full view.
+
+   if Present (Spec) then
+  Discr := First (Spec);
+
+  while Present (Discr) loop
+ if Chars (Defining_Identifier (Discr)) = Comp then
+Discr := First_Discriminant (T);
+
+while Present (Discr) loop
+   if Chars (Discr) = Comp then
+  return Discr;
+   end if;
+
+   Next_Discriminant (Discr);
+end loop;
+
+pragma Assert (False);
+ end if;
+
+ Next (Discr);
+  end loop;
+   end if;
+end;
  end if;
 
  --  Nothing by that name
-- 
2.40.0



Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits

2023-07-11 Thread Jan Hubicka via Gcc-patches
> 
> What I saw most wrecking the profile is when passes turn
> if (cond) into if (0/1) leaving the CFG adjustment to CFG cleanup
> which then simply deletes one of the outgoing edges without doing
> anything to the (guessed) profile.

Yep, I agree that this is disturbing.  At the cfg cleanup time one can
hardly do anything useful, since the knowledge of the transform that caused
the profile inconsistency is forgotten.  However, I think it is not a
complete disaster.

With profile feedback the most common case of this happening is a
situation where we duplicated code (by inlining, unrolling etc.) into a
context where it behaves differently than the typical behaviour
represented by the profile.

So if one ends up zapping an edge with large probability, one also knows
that the code being optimized does not exhibit typical behaviour from
the train run and thus is not very hot.  So profile inconsistency should
not affect performance that much.

So doing nothing may IMO end up being safer than trying to get the
in/out counts right without really knowing what is going on.

This is mostly about the scenario "constant propagated this conditional
and profile disagrees with me".  There are other cases where an update is
IMO important, e.g. the vectorizer forgetting to cap the number of
iterations of the epilogue may cause issues, since the epilogue loop
looks more frequent than the main vectorized loop and may cause IRA to
insert spilling into it.

When we duplicate we have a chance to figure out profile updates.
Also we may try to get as much as possible done early.
I think we should again do loop header copying that does not expand code
at early opts.  I have some more plans on cleaning up loop-ch and
then we can give it a try.

With a guessed profile we always have the option to re-do the propagation.
There is TODO_rebuild_frequencies for that, which we do after inlining.
This is mostly to handle possible overflows on large loop nests
constructed by the inliner.

We can re-propagate once again after late cleanup passes. Looking at the
queue, we have:

  NEXT_PASS (pass_remove_cgraph_callee_edges);
  /* Initial scalar cleanups before alias computation.
 They ensure memory accesses are not indirect wherever possible.  */
  NEXT_PASS (pass_strip_predict_hints, false /* early_p */);
  NEXT_PASS (pass_ccp, true /* nonzero_p */);
  /* After CCP we rewrite no longer addressed locals into SSA
 form if possible.  */
  NEXT_PASS (pass_object_sizes);
  NEXT_PASS (pass_post_ipa_warn);
  /* Must run before loop unrolling.  */
  NEXT_PASS (pass_warn_access, /*early=*/true);
  NEXT_PASS (pass_complete_unrolli);
 here we care about profile
  NEXT_PASS (pass_backprop);
  NEXT_PASS (pass_phiprop);
  NEXT_PASS (pass_forwprop);
  /* pass_build_alias is a dummy pass that ensures that we
 execute TODO_rebuild_alias at this point.  */
  NEXT_PASS (pass_build_alias);
  NEXT_PASS (pass_return_slot);
  NEXT_PASS (pass_fre, true /* may_iterate */);
  NEXT_PASS (pass_merge_phi);
  NEXT_PASS (pass_thread_jumps_full, /*first=*/true);
 here

By now we did CCP and FRE so we likely optimized out most of constant
conditionals exposed by inline.
Honza


[PATCH] aarch64: Fix warnings during libgcc build

2023-07-11 Thread Florian Weimer via Gcc-patches
libgcc/

* config/aarch64/aarch64-unwind.h (aarch64_cie_signed_with_b_key):
Add missing const qualifier.  Cast from const unsigned char *
to const char *.  Use __builtin_strchr to avoid an implicit
function declaration.
* config/aarch64/linux-unwind.h (aarch64_fallback_frame_state):
Add missing cast.

---
 libgcc/config/aarch64/aarch64-unwind.h | 4 ++--
 libgcc/config/aarch64/linux-unwind.h   | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/libgcc/config/aarch64/aarch64-unwind.h 
b/libgcc/config/aarch64/aarch64-unwind.h
index 3ad2f8239ed..30e428862c4 100644
--- a/libgcc/config/aarch64/aarch64-unwind.h
+++ b/libgcc/config/aarch64/aarch64-unwind.h
@@ -40,8 +40,8 @@ aarch64_cie_signed_with_b_key (struct _Unwind_Context 
*context)
   const struct dwarf_cie *cie = get_cie (fde);
   if (cie != NULL)
{
- char *aug_str = cie->augmentation;
- return strchr (aug_str, 'B') == NULL ? 0 : 1;
+ const char *aug_str = (const char *) cie->augmentation;
+ return __builtin_strchr (aug_str, 'B') == NULL ? 0 : 1;
}
 }
   return 0;
diff --git a/libgcc/config/aarch64/linux-unwind.h 
b/libgcc/config/aarch64/linux-unwind.h
index 00eba866049..93da7a9537d 100644
--- a/libgcc/config/aarch64/linux-unwind.h
+++ b/libgcc/config/aarch64/linux-unwind.h
@@ -77,7 +77,7 @@ aarch64_fallback_frame_state (struct _Unwind_Context *context,
 }
 
   rt_ = context->cfa;
-  sc = &rt_->uc.uc_mcontext;
+  sc = (struct sigcontext *) &rt_->uc.uc_mcontext;
 
 /* This define duplicates the definition in aarch64.md */
 #define SP_REGNUM 31




[PATCH] m68k: Avoid implicit function declaration in libgcc

2023-07-11 Thread Florian Weimer via Gcc-patches
libgcc/

* config/m68k/fpgnulib.c (__cmpdf2): Declare.

---
 libgcc/config/m68k/fpgnulib.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libgcc/config/m68k/fpgnulib.c b/libgcc/config/m68k/fpgnulib.c
index fe41edf26aa..d5c3411e947 100644
--- a/libgcc/config/m68k/fpgnulib.c
+++ b/libgcc/config/m68k/fpgnulib.c
@@ -395,6 +395,7 @@ double __extendsfdf2 (float);
 float __truncdfsf2 (double);
 long __fixdfsi (double);
 long __fixsfsi (float);
+int __cmpdf2 (double, double);
 
 int
 __unordxf2(long double a, long double b)



[PATCH] csky: Fix -Wincompatible-pointer-types warning during libgcc build

2023-07-11 Thread Florian Weimer via Gcc-patches
libgcc/

* config/csky/linux-unwind.h (csky_fallback_frame_state): Add
missing cast.

---
 libgcc/config/csky/linux-unwind.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgcc/config/csky/linux-unwind.h 
b/libgcc/config/csky/linux-unwind.h
index 66b2a44e047..1acef215974 100644
--- a/libgcc/config/csky/linux-unwind.h
+++ b/libgcc/config/csky/linux-unwind.h
@@ -75,7 +75,7 @@ csky_fallback_frame_state (struct _Unwind_Context *context,
siginfo_t info;
ucontext_t uc;
   } *_rt = context->cfa;
-  sc = &(_rt->uc.uc_mcontext);
+  sc = (struct sigcontext *) &(_rt->uc.uc_mcontext);
 }
   else
 return _URC_END_OF_STACK;



[PATCH] arc: Fix -Wincompatible-pointer-types warning during libgcc build

2023-07-11 Thread Florian Weimer via Gcc-patches
libgcc/

* config/arc/linux-unwind.h (arc_fallback_frame_state): Add
missing cast.

---
 libgcc/config/arc/linux-unwind.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgcc/config/arc/linux-unwind.h b/libgcc/config/arc/linux-unwind.h
index 0292e22ed1b..dec9428a7e5 100644
--- a/libgcc/config/arc/linux-unwind.h
+++ b/libgcc/config/arc/linux-unwind.h
@@ -100,7 +100,7 @@ arc_fallback_frame_state (struct _Unwind_Context *context,
   if (pc[0] == MOV_R8_139)
 {
   rt_ = context->cfa;
-  sc = &rt_->uc.uc_mcontext;
+  sc = (struct sigcontext *) &rt_->uc.uc_mcontext;
 }
   else
 return _URC_END_OF_STACK;



[PATCH] or1k: Fix -Wincompatible-pointer-types warning during libgcc build

2023-07-11 Thread Florian Weimer via Gcc-patches
libgcc/

* config/or1k/linux-unwind.h (or1k_fallback_frame_state): Add
missing cast.

---
 libgcc/config/or1k/linux-unwind.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgcc/config/or1k/linux-unwind.h 
b/libgcc/config/or1k/linux-unwind.h
index aa873791daa..37b0c5aef37 100644
--- a/libgcc/config/or1k/linux-unwind.h
+++ b/libgcc/config/or1k/linux-unwind.h
@@ -51,7 +51,7 @@ or1k_fallback_frame_state (struct _Unwind_Context *context,
 return _URC_END_OF_STACK;
 
   rt = context->cfa;
-  sc = &rt->uc.uc_mcontext;
+  sc = (struct sigcontext *) &rt->uc.uc_mcontext;
 
   new_cfa = sc->regs.gpr[1];
   fs->regs.cfa_how = CFA_REG_OFFSET;



[PATCH] riscv: Fix -Wincompatible-pointer-types warning during libgcc build

2023-07-11 Thread Florian Weimer via Gcc-patches
libgcc/

* config/riscv/linux-unwind.h (riscv_fallback_frame_state): Add
missing cast.

---
 libgcc/config/riscv/linux-unwind.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgcc/config/riscv/linux-unwind.h 
b/libgcc/config/riscv/linux-unwind.h
index 931c2f2795d..e58d0f4113e 100644
--- a/libgcc/config/riscv/linux-unwind.h
+++ b/libgcc/config/riscv/linux-unwind.h
@@ -64,7 +64,7 @@ riscv_fallback_frame_state (struct _Unwind_Context *context,
 return _URC_END_OF_STACK;
 
   rt_ = context->cfa;
-  sc = &rt_->uc.uc_mcontext;
+  sc = (struct sigcontext *) &rt_->uc.uc_mcontext;
 
   new_cfa = (_Unwind_Ptr) sc;
   fs->regs.cfa_how = CFA_REG_OFFSET;



[PATCH] tree-optimization/110614 - SLP splat and re-align (optimized)

2023-07-11 Thread Richard Biener via Gcc-patches
The following properly guards the re-align (optimized) paths used
on old power CPUs for the added case of SLP splats from non-grouped
loads.  Testcases already exist in dg-torture.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/110614
* tree-vect-data-refs.cc (vect_supportable_dr_alignment):
SLP splats are not suitable for re-align ops.
---
 gcc/tree-vect-data-refs.cc | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index ab2af103cb4..9edc8989de9 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -6829,10 +6829,11 @@ vect_supportable_dr_alignment (vec_info *vinfo, 
dr_vec_info *dr_info,
 same alignment, instead it depends on the SLP group size.  */
  if (loop_vinfo
  && STMT_SLP_TYPE (stmt_info)
- && !multiple_p (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
- * (DR_GROUP_SIZE
-(DR_GROUP_FIRST_ELEMENT (stmt_info))),
- TYPE_VECTOR_SUBPARTS (vectype)))
+ && (!STMT_VINFO_GROUPED_ACCESS (stmt_info)
+ || !multiple_p (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
+ * (DR_GROUP_SIZE
+  (DR_GROUP_FIRST_ELEMENT (stmt_info))),
+ TYPE_VECTOR_SUBPARTS (vectype
;
  else if (!loop_vinfo
   || (nested_in_vect_loop
-- 
2.35.3


Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits

2023-07-11 Thread Richard Biener via Gcc-patches
On Tue, 11 Jul 2023, Jan Hubicka wrote:

> > 
> > What I saw most wrecking the profile is when passes turn
> > if (cond) into if (0/1) leaving the CFG adjustment to CFG cleanup
> > which then simply deletes one of the outgoing edges without doing
> > anything to the (guessed) profile.
> 
> Yep, I agree that this is disturbing.  At the cfg cleanup time one can
> hardly do anything useful, since the knowledge of the transform that caused
> the profile inconsistency is forgotten.  However, I think it is not a
> complete disaster.
> 
> With profile feedback the most common case of this happening is a
> situation where we duplicated code (by inlining, unrolling etc.) into a
> context where it behaves differently than the typical behaviour
> represented by the profile.
> 
> So if one ends up zapping an edge with large probability, one also knows
> that the code being optimized does not exhibit typical behaviour from
> the train run and thus is not very hot.  So profile inconsistency should
> not affect performance that much.
> 
> So doing nothing may IMO end up being safer than trying to get the
> in/out counts right without really knowing what is going on.
> 
> This is mostly about the scenario "constant propagated this conditional
> and profile disagrees with me".  There are other cases where an update is
> IMO important, e.g. the vectorizer forgetting to cap the number of
> iterations of the epilogue may cause issues, since the epilogue loop
> looks more frequent than the main vectorized loop and may cause IRA to
> insert spilling into it.
> 
> When we duplicate we have a chance to figure out profile updates.
> Also we may try to get as much as possible done early.
> I think we should again do loop header copying that does not expand code
> at early opts.  I have some more plans on cleaning up loop-ch and
> then we can give it a try.
> 
> With a guessed profile we always have the option to re-do the propagation.
> There is TODO_rebuild_frequencies for that, which we do after inlining.
> This is mostly to handle possible overflows on large loop nests
> constructed by the inliner.
> 
> We can re-propagate once again after late cleanup passes. Looking at the
> queue, we have:
> 
>   NEXT_PASS (pass_remove_cgraph_callee_edges);
>   /* Initial scalar cleanups before alias computation.
>  They ensure memory accesses are not indirect wherever possible.  */
>   NEXT_PASS (pass_strip_predict_hints, false /* early_p */);
>   NEXT_PASS (pass_ccp, true /* nonzero_p */);
>   /* After CCP we rewrite no longer addressed locals into SSA
>  form if possible.  */
>   NEXT_PASS (pass_object_sizes);
>   NEXT_PASS (pass_post_ipa_warn);
>   /* Must run before loop unrolling.  */
>   NEXT_PASS (pass_warn_access, /*early=*/true);
>   NEXT_PASS (pass_complete_unrolli);
>  here we care about profile
>   NEXT_PASS (pass_backprop);
>   NEXT_PASS (pass_phiprop);
>   NEXT_PASS (pass_forwprop);
>   /* pass_build_alias is a dummy pass that ensures that we
>  execute TODO_rebuild_alias at this point.  */
>   NEXT_PASS (pass_build_alias);
>   NEXT_PASS (pass_return_slot);
>   NEXT_PASS (pass_fre, true /* may_iterate */);
>   NEXT_PASS (pass_merge_phi);
>   NEXT_PASS (pass_thread_jumps_full, /*first=*/true);
>  here
> 
> By now we did CCP and FRE so we likely optimized out most of constant
> conditionals exposed by inline.

So maybe we should simply delay re-propagation of the profile?  I
think cunrolli doesn't so much care about the profile - cunrolli
is (was) about abstraction removal.  Jump threading should be
the first pass to care.

Richard.


[PATCH 1/3] fortran: defer class wrapper initialization after deallocation [PR92178]

2023-07-11 Thread Mikael Morin via Gcc-patches
If an actual argument is associated with an INTENT(OUT) dummy, and code
to deallocate it is generated, generate the class wrapper initialization
after the actual argument deallocation.

This is achieved by passing a cleaned up expression to
gfc_conv_class_to_class, so that the class wrapper initialization code
can be isolated and moved independently after the deallocation.

PR fortran/92178

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_conv_procedure_call): Use a separate gfc_se
struct, initialized from parmse, to generate the class wrapper.
After the class wrapper code has been generated, copy it back
depending on whether parameter deallocation code has been
generated.

gcc/testsuite/ChangeLog:

* gfortran.dg/intent_out_19.f90: New test.
---
 gcc/fortran/trans-expr.cc   | 18 -
 gcc/testsuite/gfortran.dg/intent_out_19.f90 | 22 +
 2 files changed, 39 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/intent_out_19.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 7017b652d6e..b7e95e6d04d 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6500,6 +6500,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 
  else
{
+ bool defer_to_dealloc_blk = false;
  if (e->ts.type == BT_CLASS && fsym
  && fsym->ts.type == BT_CLASS
  && (!CLASS_DATA (fsym)->as
@@ -6661,6 +6662,8 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
  stmtblock_t block;
  tree ptr;
 
+ defer_to_dealloc_blk = true;
+
  gfc_init_block  (&block);
  ptr = parmse.expr;
  if (e->ts.type == BT_CLASS)
@@ -6717,7 +6720,12 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
&& ((CLASS_DATA (fsym)->as
 && CLASS_DATA (fsym)->as->type == AS_ASSUMED_RANK)
|| CLASS_DATA (e)->attr.dimension))
-   gfc_conv_class_to_class (&parmse, e, fsym->ts, false,
+   {
+ gfc_se class_se = parmse;
+ gfc_init_block (&class_se.pre);
+ gfc_init_block (&class_se.post);
+
+ gfc_conv_class_to_class (&class_se, e, fsym->ts, false,
 fsym->attr.intent != INTENT_IN
 && (CLASS_DATA (fsym)->attr.class_pointer
 || CLASS_DATA 
(fsym)->attr.allocatable),
@@ -6727,6 +6735,14 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 CLASS_DATA (fsym)->attr.class_pointer
 || CLASS_DATA (fsym)->attr.allocatable);
 
+ parmse.expr = class_se.expr;
+ stmtblock_t *class_pre_block = defer_to_dealloc_blk
+? &dealloc_blk
+: &parmse.pre;
+ gfc_add_block_to_block (class_pre_block, &class_se.pre);
+ gfc_add_block_to_block (&parmse.post, &class_se.post);
+   }
+
  if (fsym && (fsym->ts.type == BT_DERIVED
   || fsym->ts.type == BT_ASSUMED)
  && e->ts.type == BT_CLASS
diff --git a/gcc/testsuite/gfortran.dg/intent_out_19.f90 
b/gcc/testsuite/gfortran.dg/intent_out_19.f90
new file mode 100644
index 000..03036ed382a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/intent_out_19.f90
@@ -0,0 +1,22 @@
+! { dg-do run }
+!
+! PR fortran/92178
+! Check that if a data reference is passed as an actual argument whose dummy
+! has INTENT(OUT) attribute, any other argument depending on the
+! same data reference is evaluated before the data reference deallocation.
+
+program p
+  implicit none
+  class(*),  allocatable :: c
+  c = 3
+  call bar (allocated(c), c, allocated (c))
+  if (allocated (c)) stop 14
+contains
+  subroutine bar (alloc, x, alloc2)
+logical :: alloc, alloc2
+class(*), allocatable, intent(out) :: x(..)
+if (allocated (x)) stop 5
+if (.not. alloc)   stop 6
+if (.not. alloc2)  stop 16
+  end subroutine bar
+end
-- 
2.40.1



[PATCH 0/3] Fix argument evaluation order [PR92178]

2023-07-11 Thread Mikael Morin via Gcc-patches
Hello,

this is a followup to Harald's recent work [1] on the evaluation order
of arguments, when one of them is passed to an intent(out) allocatable
dummy and is deallocated before the call.
This extends Harald's fix to support:
 - scalars passed to assumed rank dummies (patch 1),
 - scalars passed to assumed rank dummies with the data reference
 depending on its own content (patch 2),
 - arrays with the data reference depending on its own content
 (patch 3).

There is one (last?) case which is not supported, for which I have opened
a separate PR [2].

Regression tested on x86_64-pc-linux-gnu. OK for master?

[1] https://gcc.gnu.org/pipermail/fortran/2023-July/059562.html 
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110618

Mikael Morin (3):
  fortran: defer class wrapper initialization after deallocation
[PR92178]
  fortran: Factor data references for scalar class argument wrapping
[PR92178]
  fortran: Reorder array argument evaluation parts [PR92178]

 gcc/fortran/trans-array.cc  |   3 +
 gcc/fortran/trans-expr.cc   | 130 +---
 gcc/fortran/trans.cc|  28 +
 gcc/fortran/trans.h |   8 +-
 gcc/testsuite/gfortran.dg/intent_out_19.f90 |  22 
 gcc/testsuite/gfortran.dg/intent_out_20.f90 |  33 +
 gcc/testsuite/gfortran.dg/intent_out_21.f90 |  33 +
 7 files changed, 236 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/intent_out_19.f90
 create mode 100644 gcc/testsuite/gfortran.dg/intent_out_20.f90
 create mode 100644 gcc/testsuite/gfortran.dg/intent_out_21.f90

-- 
2.40.1



[PATCH 2/3] fortran: Factor data references for scalar class argument wrapping [PR92178]

2023-07-11 Thread Mikael Morin via Gcc-patches
In the case of a scalar actual arg passed to a polymorphic assumed-rank
dummy with INTENT(OUT) attribute, avoid repeatedly evaluating the actual
argument reference by saving a pointer to it.  Repeated evaluation is not
only suboptimal; it may also be invalid, because the data reference may
depend on its own content.  In that case the expression can't be evaluated
after the data has been deallocated.

There are two ways redundant expressions are generated:
 - parmse.expr, which contains the actual argument expression, is
   reused to get or set subfields in gfc_conv_class_to_class.
 - gfc_conv_class_to_class, to get the virtual table pointer associated
   with the argument, generates a new expression from scratch starting
   with the frontend expression.

The first part is fixed by saving parmse.expr to a pointer and using
the pointer instead of the original expression.

The second part is fixed by adding a separate field to gfc_se that
is set to the class container expression when the expression to
evaluate is polymorphic.  This needs the same field in gfc_ss_info
so that its value can be propagated to gfc_conv_class_to_class which
is modified to use that value.  Finally gfc_conv_procedure_call saves the
expression in that field to a pointer in between, to avoid the same
problem as for the first part.

PR fortran/92178

gcc/fortran/ChangeLog:

* trans.h (struct gfc_se): New field class_container.
(struct gfc_ss_info): Ditto.
(gfc_evaluate_data_ref_now): New prototype.
* trans.cc (gfc_evaluate_data_ref_now):  Implement it.
* trans-array.cc (gfc_conv_ss_descriptor): Copy class_container
field from gfc_se struct to gfc_ss_info struct.
(gfc_conv_expr_descriptor): Copy class_container field from
gfc_ss_info struct to gfc_se struct.
* trans-expr.cc (gfc_conv_class_to_class): Use class container
set in class_container field if available.
(gfc_conv_variable): Set class_container field on encountering
class variables or components, clear it on encountering
non-class components.
(gfc_conv_procedure_call): Evaluate data ref to a pointer now,
and replace later references by usage of the pointer.

gcc/testsuite/ChangeLog:

* gfortran.dg/intent_out_20.f90: New test.
---
 gcc/fortran/trans-array.cc  |  3 ++
 gcc/fortran/trans-expr.cc   | 26 
 gcc/fortran/trans.cc| 28 +
 gcc/fortran/trans.h |  6 
 gcc/testsuite/gfortran.dg/intent_out_20.f90 | 33 +
 5 files changed, 96 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/intent_out_20.f90

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index e7c51bae052..1c2af55d436 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -3271,6 +3271,7 @@ gfc_conv_ss_descriptor (stmtblock_t * block, gfc_ss * ss, 
int base)
   gfc_add_block_to_block (block, &se.pre);
   info->descriptor = se.expr;
   ss_info->string_length = se.string_length;
+  ss_info->class_container = se.class_container;
 
   if (base)
 {
@@ -7687,6 +7688,8 @@ gfc_conv_expr_descriptor (gfc_se *se, gfc_expr *expr)
  else if (deferred_array_component)
se->string_length = ss_info->string_length;
 
+ se->class_container = ss_info->class_container;
+
  gfc_free_ss_chain (ss);
  return;
}
diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index b7e95e6d04d..5169fbcd974 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -1266,6 +1266,10 @@ gfc_conv_class_to_class (gfc_se *parmse, gfc_expr *e, 
gfc_typespec class_ts,
 
   slen = build_zero_cst (size_type_node);
 }
+  else if (parmse->class_container != NULL_TREE)
+/* Don't redundantly evaluate the expression if the required information
+   is already available.  */
+tmp = parmse->class_container;
   else
 {
   /* Remove everything after the last class reference, convert the
@@ -3078,6 +3082,11 @@ gfc_conv_variable (gfc_se * se, gfc_expr * expr)
  return;
}
 
+  if (sym->ts.type == BT_CLASS
+ && sym->attr.class_ok
+ && sym->ts.u.derived->attr.is_class)
+   se->class_container = se->expr;
+
   /* Dereference the expression, where needed.  */
   se->expr = gfc_maybe_dereference_var (sym, se->expr, se->descriptor_only,
is_classarray);
@@ -3135,6 +3144,15 @@ gfc_conv_variable (gfc_se * se, gfc_expr * expr)
conv_parent_component_references (se, ref);
 
  gfc_conv_component_ref (se, ref);
+
+ if (ref->u.c.component->ts.type == BT_CLASS
+ && ref->u.c.component->attr.class_ok
+ && ref->u.c.component->ts.u.derived->attr.is_class)
+   se->class_container = se->expr;
+ else if (!(ref->u.c.sym->attr.flavor == FL_DERIVED
+  

[PATCH 3/3] fortran: Reorder array argument evaluation parts [PR92178]

2023-07-11 Thread Mikael Morin via Gcc-patches
In the case of an array actual arg passed to a polymorphic array dummy
with INTENT(OUT) attribute, reorder the argument evaluation code to
the following:
 - first evaluate arguments' values, and data references,
 - deallocate data references associated with an allocatable,
   intent(out) dummy,
 - create a class container using the freed data references.

The ordering used to be incorrect between the first two items,
when one argument was deallocated before a later argument evaluated
its expression depending on the former argument.
r14-2395-gb1079fc88f082d3c5b583c8822c08c5647810259 fixed it by treating
arguments associated with an allocatable, intent(out) dummy in a
separate, later block.  This, however, wasn't working either if the data
reference of such an argument was depending on its own content, as
the class container initialization was trying to use deallocated
content.

This change generates class container initialization code in a separate
block, so that it is moved after the deallocation block without moving
the rest of the argument evaluation code.

This alone is not sufficient to fix the problem, because the class
container generation code repeatedly uses the full expression of
the argument at a place where deallocation might have happened
already.  This is non-optimal, but may also be invalid, because the data
reference may depend on its own content.  In that case the expression
can't be evaluated after the data has been deallocated.

As in the scalar case previously treated, this is fixed by saving
the data reference to a pointer before any deallocation happens,
and then only referring to the pointer.  gfc_reset_vptr is updated
to take into account the already evaluated class container if it's
available.

Contrary to the scalar case, one hunk is needed to wrap the parameter
evaluation in a conditional, to avoid regressing in
optional_class_2.f90.  This used to be handled by the class wrapper
construction which wrapped the whole code in a conditional.  With
this change the class wrapper construction can't see the parameter
evaluation code, so the latter is updated with an additional handling
for optional arguments.

PR fortran/92178

gcc/fortran/ChangeLog:

* trans.h (gfc_reset_vptr): Add class_container argument.
* trans-expr.cc (gfc_reset_vptr): Ditto.  If a valid vptr can
be obtained through class_container argument, bypass evaluation
of e.
(gfc_conv_procedure_call):  Wrap the argument evaluation code
in a conditional if the associated dummy is optional.  Evaluate
the data reference to a pointer now, and replace later
references with usage of the pointer.

gcc/testsuite/ChangeLog:

* gfortran.dg/intent_out_21.f90: New test.
---
 gcc/fortran/trans-expr.cc   | 86 -
 gcc/fortran/trans.h |  2 +-
 gcc/testsuite/gfortran.dg/intent_out_21.f90 | 33 
 3 files changed, 101 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/intent_out_21.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 5169fbcd974..dbb04f8c434 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -529,24 +529,32 @@ gfc_find_and_cut_at_last_class_ref (gfc_expr *e, bool 
is_mold,
 }
 
 
-/* Reset the vptr to the declared type, e.g. after deallocation.  */
+/* Reset the vptr to the declared type, e.g. after deallocation.
+   Use the variable in CLASS_CONTAINER if available.  Otherwise, recreate
+   one with E.  The generated assignment code is added at the end of BLOCK.  */
 
 void
-gfc_reset_vptr (stmtblock_t *block, gfc_expr *e)
+gfc_reset_vptr (stmtblock_t *block, gfc_expr *e, tree class_container)
 {
-  gfc_symbol *vtab;
-  tree vptr;
-  tree vtable;
-  gfc_se se;
+  tree vptr = NULL_TREE;
 
-  /* Evaluate the expression and obtain the vptr from it.  */
-  gfc_init_se (&se, NULL);
-  if (e->rank)
-gfc_conv_expr_descriptor (&se, e);
-  else
-gfc_conv_expr (&se, e);
-  gfc_add_block_to_block (block, &se.pre);
-  vptr = gfc_get_vptr_from_expr (se.expr);
+  if (class_container != NULL_TREE)
+vptr = gfc_get_vptr_from_expr (class_container);
+
+  if (vptr == NULL_TREE)
+{
+  gfc_se se;
+
+  /* Evaluate the expression and obtain the vptr from it.  */
+  gfc_init_se (&se, NULL);
+  if (e->rank)
+   gfc_conv_expr_descriptor (&se, e);
+  else
+   gfc_conv_expr (&se, e);
+  gfc_add_block_to_block (block, &se.pre);
+
+  vptr = gfc_get_vptr_from_expr (se.expr);
+}
 
   /* If a vptr is not found, we can do nothing more.  */
   if (vptr == NULL_TREE)
@@ -556,6 +564,9 @@ gfc_reset_vptr (stmtblock_t *block, gfc_expr *e)
 gfc_add_modify (block, vptr, build_int_cst (TREE_TYPE (vptr), 0));
   else
 {
+  gfc_symbol *vtab;
+  tree vtable;
+
   /* Return the vptr to the address of the declared type.  */
   vtab = gfc_find_derived_vtab (e->ts.u.derived);
   

[Patch] libgomp: Use libnuma for OpenMP's partition=nearest allocation trait

2023-07-11 Thread Tobias Burnus

While by default 'malloc' allocates memory on the same node as the calling
process/thread ('numactl --show' shows 'preferred node: current',
Linux kernel memory policy MPOL_DEFAULT), this can be changed.
For instance, when running the program as follows, 'malloc' now
prefers to allocate on the second node:
  numactl --preferred=1 ./myproc

Thus, it seems to be sensible to provide a means to ensure the 'nearest'
allocation.  The MPOL_LOCAL policy does so, as provided by
libnuma's numa_alloc_local. (Which is just a wrapper around the syscalls
mmap and mbind.) As with (lib)memkind, there is a run-time dlopen check
for (lib)numa - and no numa*.h is required when building GCC.

The patch assumes that yesterday's patch
  'libgomp: Update OpenMP memory allocation doc, fix omp_high_bw_mem_space'
  https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624030.html
has already been applied. (Which is mostly a .texi only patch, except
for one 'return' -> 'break' change.)

This patch has been bootstrapped and manually tested on x86-64.
It also passed "make check".

Comments, remarks, thoughts?

[I really dislike committing patches without any feedback from others,
but I still intend to do so if no one comments. This applies to this patch
and the other one.]

Tobias

PS: I have attached a testcase, but as it needs -lnuma, I do not intend
to commit it.  An alternative could be to do the same as we do in
the patch itself; namely, to use the dlopen handle to obtain the two
libnuma library calls.  I am unsure whether I should do so or
whether I should just leave out the testcase.

Thoughts?
libgomp: Use libnuma for OpenMP's partition=nearest allocation trait

libgomp/ChangeLog:

	* allocator.c: Add ifdef for LIBGOMP_USE_LIBNUMA.
	(enum gomp_numa_memkind_kind): Renamed from gomp_memkind_kind;
	add GOMP_MEMKIND_LIBNUMA.
	(struct gomp_libnuma_data, gomp_init_libnuma, gomp_get_libnuma): New.
	(omp_init_allocator): Handle partition=nearest with libnuma if avail.
	(omp_aligned_alloc, omp_free, omp_aligned_calloc, omp_realloc): Add
	numa_alloc_local (+ memset), numa_free, and numa_realloc calls as
	needed.
	* config/linux/allocator.c (LIBGOMP_USE_LIBNUMA): Define.
	* libgomp.texi (Memory allocation): Renamed from 'Memory allocation
	with libmemkind'; updated for libnuma usage.

 libgomp/allocator.c  | 202 +--
 libgomp/config/linux/allocator.c |   1 +
 libgomp/libgomp.texi |  22 -
 3 files changed, 195 insertions(+), 30 deletions(-)

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 25c0f150302..2632f16e132 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -31,13 +31,13 @@
 #include "libgomp.h"
 #include <stdlib.h>
 #include <string.h>
-#ifdef LIBGOMP_USE_MEMKIND
+#if defined(LIBGOMP_USE_MEMKIND) || defined(LIBGOMP_USE_LIBNUMA)
 #include <dlfcn.h>
 #endif
 
 #define omp_max_predefined_alloc omp_thread_mem_alloc
 
-enum gomp_memkind_kind
+enum gomp_numa_memkind_kind
 {
   GOMP_MEMKIND_NONE = 0,
 #define GOMP_MEMKIND_KINDS \
@@ -50,7 +50,8 @@ enum gomp_memkind_kind
 #define GOMP_MEMKIND_KIND(kind) GOMP_MEMKIND_##kind
   GOMP_MEMKIND_KINDS,
 #undef GOMP_MEMKIND_KIND
-  GOMP_MEMKIND_COUNT
+  GOMP_MEMKIND_COUNT,
+  GOMP_MEMKIND_LIBNUMA = GOMP_MEMKIND_COUNT
 };
 
 struct omp_allocator_data
@@ -65,7 +66,7 @@ struct omp_allocator_data
   unsigned int fallback : 8;
   unsigned int pinned : 1;
   unsigned int partition : 7;
-#ifdef LIBGOMP_USE_MEMKIND
+#if defined(LIBGOMP_USE_MEMKIND) || defined(LIBGOMP_USE_LIBNUMA)
   unsigned int memkind : 8;
 #endif
 #ifndef HAVE_SYNC_BUILTINS
@@ -81,6 +82,14 @@ struct omp_mem_header
   void *pad;
 };
 
+struct gomp_libnuma_data
+{
+  void *numa_handle;
+  void *(*numa_alloc_local) (size_t);
+  void *(*numa_realloc) (void *, size_t, size_t);
+  void (*numa_free) (void *, size_t);
+};
+
 struct gomp_memkind_data
 {
   void *memkind_handle;
@@ -92,6 +101,50 @@ struct gomp_memkind_data
   void **kinds[GOMP_MEMKIND_COUNT];
 };
 
+#ifdef LIBGOMP_USE_LIBNUMA
+static struct gomp_libnuma_data *libnuma_data;
+static pthread_once_t libnuma_data_once = PTHREAD_ONCE_INIT;
+
+static void
+gomp_init_libnuma (void)
+{
+  void *handle = dlopen ("libnuma.so.1", RTLD_LAZY);
+  struct gomp_libnuma_data *data;
+
+  data = calloc (1, sizeof (struct gomp_libnuma_data));
+  if (data == NULL)
+{
+  if (handle)
+	dlclose (handle);
+  return;
+}
+  if (!handle)
+{
+  __atomic_store_n (&libnuma_data, data, MEMMODEL_RELEASE);
+  return;
+}
+  data->numa_handle = handle;
+  data->numa_alloc_local
+= (__typeof (data->numa_alloc_local)) dlsym (handle, "numa_alloc_local");
+  data->numa_realloc
+= (__typeof (data->numa_realloc)) dlsym (handle, "numa_realloc");
+  data->numa_free

[PATCH] Include insn-opinit.h in PLUGIN_H [PR110610]

2023-07-11 Thread Andre Vieira (lists) via Gcc-patches

Hi,

This patch fixes PR110610 by including OPTABS_H in the INTERNAL_FN_H 
list, as insn-opinit.h is now required by internal-fn.h. This will lead 
to insn-opinit.h, among the other OPTABS_H header files, being installed 
in the plugin directory.


Bootstrapped aarch64-unknown-linux-gnu.

@Jakub: could you check to see if it also addresses PR 110284?


gcc/ChangeLog:

PR 110610
* Makefile.in (INTERNAL_FN_H): Add OPTABS_H.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 
c478ec852013eae65b9f3ec0a443e023c7d8b452..d3ff210ee04414f4e238c087400dd21e1cb0fc18
 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -976,7 +976,7 @@ READ_MD_H = $(OBSTACK_H) $(HASHTAB_H) read-md.h
 BUILTINS_DEF = builtins.def sync-builtins.def omp-builtins.def \
gtm-builtins.def sanitizer.def
 INTERNAL_FN_DEF = internal-fn.def
-INTERNAL_FN_H = internal-fn.h $(INTERNAL_FN_DEF)
+INTERNAL_FN_H = internal-fn.h $(INTERNAL_FN_DEF) $(OPTABS_H)
 TREE_CORE_H = tree-core.h $(CORETYPES_H) all-tree.def tree.def \
c-family/c-common.def $(lang_tree_files) \
$(BUILTINS_DEF) $(INPUT_H) statistics.h \


Re: [PATCH] rs6000: Change GPR2 to volatile & non-fixed register for function that does not use TOC [PR110320]

2023-07-11 Thread P Jeevitha via Gcc-patches



On 07/07/2023 at 12:11 am, Peter Bergner wrote:

> I believe the untested patch below should also work, without having to scan
> the (uncommonly used) options.  Jeevitha, can you bootstrap and regtest the
> patch below?

Yeah Peter, I bootstrapped and regtested the patch below on powerpc64le-linux;
there was no regression.

> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index d197c3f3289..7c356a73ac6 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -10160,9 +10160,13 @@ rs6000_conditional_register_usage (void)
>  for (i = 32; i < 64; i++)
>fixed_regs[i] = call_used_regs[i] = 1;
> 
> +  /* For non PC-relative code, GPR2 is unavailable for register allocation.  
> */
> +  if (FIXED_R2 && !rs6000_pcrel_p ())
> +fixed_regs[2] = 1;
> +
>/* The TOC register is not killed across calls in a way that is
>   visible to the compiler.  */
> -  if (DEFAULT_ABI == ABI_AIX || DEFAULT_ABI == ABI_ELFv2)
> +  if (fixed_regs[2] && (DEFAULT_ABI == ABI_AIX || DEFAULT_ABI == ABI_ELFv2))
>  call_used_regs[2] = 0;
> 
>if (DEFAULT_ABI == ABI_V4 && flag_pic == 2)
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index 3503614efbd..2a24fbdf9fd 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -812,7 +812,7 @@ enum data_align { align_abi, align_opt, align_both };
> 
>  #define FIXED_REGISTERS  \
>{/* GPRs */ \
> -   0, 1, FIXED_R2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, FIXED_R13, 0, 0, \
> +   0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, FIXED_R13, 0, 0, \
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
> /* FPRs */ \
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
> 

> 
> 
>> Besides, IMHO we need a corresponding test case to cover this -ffixed-r2 
>> handling.
> 
> Good idea.  I think we can duplicate the pr110320_2.c test case, replacing the
> -mno-pcrel option with -ffixed-r2.  Jeevitha, can you give that a try?
 
Yeah, I'm adding the new test cases below, along with the mentioned changes
to the older ones,

diff --git a/gcc/testsuite/gcc.target/powerpc/pr110320_1.c 
b/gcc/testsuite/gcc.target/powerpc/pr110320_1.c
new file mode 100644
index 000..a4ad34d9303
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr110320_1.c
@@ -0,0 +1,22 @@
+/* PR target/110320 */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -ffixed-r0 -ffixed-r11 -ffixed-r12" 
} */
+
+/* Ensure we use r2 as a normal volatile register for the code below.
+   The test case ensures all of the parameter registers r3 - r10 are used
+   and needed after we compute the expression "x + y" which requires a
+   temporary.  The -ffixed-r* options disallow using the other volatile
+   registers r0, r11 and r12.  That leaves RA to choose from r2 and the more
+   expensive non-volatile registers for the temporary to be assigned to, and
+   RA will always chooses the cheaper volatile r2 register.  */
+
+extern long bar (long, long, long, long, long, long, long, long *);
+
+long
+foo (long r3, long r4, long r5, long r6, long r7, long r8, long r9, long *r10)
+{
+  *r10 = r3 + r4;
+  return bar (r3, r4, r5, r6, r7, r8, r9, r10);
+}
+
+/* { dg-final { scan-assembler {\madd 2,3,4\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr110320_2.c 
b/gcc/testsuite/gcc.target/powerpc/pr110320_2.c
new file mode 100644
index 000..9d6aefedd2e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr110320_2.c
@@ -0,0 +1,21 @@
+/* PR target/110320 */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -mno-pcrel -ffixed-r0 -ffixed-r11 
-ffixed-r12" } */
+
+/* Ensure we don't use r2 as a normal volatile register for the code below.
+   The test case ensures all of the parameter registers r3 - r10 are used
+   and needed after we compute the expression "x + y" which requires a
+   temporary.  The -ffixed-r* options disallow using the other volatile
+   registers r0, r11 and r12.  That only leaves RA to choose from the more
+   expensive non-volatile registers for the temporary to be assigned to.  */
+
+extern long bar (long, long, long, long, long, long, long, long *);
+
+long
+foo (long r3, long r4, long r5, long r6, long r7, long r8, long r9, long *r10)
+{
+  *r10 = r3 + r4;
+  return bar (r3, r4, r5, r6, r7, r8, r9, r10);
+}
+
+/* { dg-final { scan-assembler-not {\madd 2,3,4\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr110320_3.c 
b/gcc/testsuite/gcc.target/powerpc/pr110320_3.c
new file mode 100644
index 000..ea6c6188c8d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr110320_3.c
@@ -0,0 +1,21 @@
+/* PR target/110320 */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -ffixed-r2 -ffixed-r0 -ffixed-r11 
-ffixed-r12" } */
+
+/* Ensure we don't use r2 as a normal volatile register f

Re: [PATCH 1/2]middle-end ifcvt: Reduce comparisons on conditionals by tracking truths [PR109154]

2023-07-11 Thread Richard Biener via Gcc-patches
On Fri, 7 Jul 2023, Tamar Christina wrote:

> Hi All,
> 
> Following on from Jakub's patch in g:de0ee9d14165eebb3d31c84e98260c05c3b33acb
> these two patches finishes the work fixing the regression and improves 
> codegen.
> 
> As explained in that commit, ifconvert sorts PHI args in increasing number of
> occurrences in order to reduce the number of comparisons done while
> traversing the tree.
> 
> The remaining task that this patch fixes is dealing with the long chain of
> comparisons that can be created from phi nodes, particularly when they share
> any common successor (classical example is a diamond node).
> 
> on a PHI-node the true and else branches carry a condition, true will
> carry `a` and false `~a`.  The issue is that at the moment GCC tests both `a`
> and `~a` when the phi node has more than 2 arguments. Clearly this isn't
> needed.  The deeper the nesting of phi nodes the larger the repetition.
> 
> As an example, for
> 
> foo (int *f, int d, int e)
> {
>   for (int i = 0; i < 1024; i++)
> {
>   int a = f[i];
>   int t;
>   if (a < 0)
>   t = 1;
>   else if (a < e)
>   t = 1 - a * d;
>   else
>   t = 0;
>   f[i] = t;
> }
> }
> 
> after Jakub's patch we generate:
> 
>   _7 = a_10 < 0;
>   _21 = a_10 >= 0;
>   _22 = a_10 < e_11(D);
>   _23 = _21 & _22;
>   _ifc__42 = _23 ? t_13 : 0;
>   t_6 = _7 ? 1 : _ifc__42
> 
> but while better than before it is still inefficient, since in the false
> branch, where we know ~_7 is true, we still test _21.
> 
> This leads to superfluous tests for every diamond node.  After this patch we
> generate
> 
>  _7 = a_10 < 0;
>  _22 = a_10 < e_11(D);
>  _ifc__42 = _22 ? t_13 : 0;
>  t_6 = _7 ? 1 : _ifc__42;
> 
> Which correctly elides the test of _21.  This is done by borrowing the
> vectorizer's helper functions to limit predicate mask usages.  Ifcvt will 
> chain
> conditionals on the false edge (unless specifically inverted) so this patch on
> creating cond a ? b : c, will register ~a when traversing c.  If c is a
> conditional then c will be simplified to the smaller possible predicate given
> the assumptions we already know to be true.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Not sure how to write a non-fragile testcase for this, as the
> conditionals chosen depend on threading etc.  Any suggestions?
> 
> Ok for master?

OK.

For a testcase I wonder if you can produce a GIMPLE FE one starting
with pass_fix_loops?  (I think it's still not possible to start
when in LC SSA)

Thanks,
Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/109154
>   * tree-if-conv.cc (gen_simplified_condition,
>   gen_phi_nest_statement): New.
>   (gen_phi_arg_condition, predicate_scalar_phi): Use it.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> index 
> e342532a343a3c066142adeec5fdfaf736a653e5..16b36dd8b0226f796c1a3fc6d45a9059385e812b
>  100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -1870,12 +1870,44 @@ convert_scalar_cond_reduction (gimple *reduc, 
> gimple_stmt_iterator *gsi,
>return rhs;
>  }
>  
> +/* Generate a simplified conditional.  */
> +
> +static tree
> +gen_simplified_condition (tree cond, scalar_cond_masked_set_type &cond_set)
> +{
> +  /* Check if the value is already live in a previous branch.  This resolves
> + nested conditionals from diamond PHI reductions.  */
> +  if (TREE_CODE (cond) == SSA_NAME)
> +{
> +  gimple *stmt = SSA_NAME_DEF_STMT (cond);
> +  gassign *assign = NULL;
> +  if ((assign = as_a <gassign *> (stmt))
> +&& gimple_assign_rhs_code (assign) == BIT_AND_EXPR)
> + {
> +   tree arg1 = gimple_assign_rhs1 (assign);
> +   tree arg2 = gimple_assign_rhs2 (assign);
> +   if (cond_set.contains ({ arg1, 1 }))
> + arg1 = boolean_true_node;
> +   else
> + arg1 = gen_simplified_condition (arg1, cond_set);
> +
> +   if (cond_set.contains ({ arg2, 1 }))
> + arg2 = boolean_true_node;
> +   else
> + arg2 = gen_simplified_condition (arg2, cond_set);
> +
> +   cond = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, arg1, arg2);
> + }
> +}
> +  return cond;
> +}
> +
>  /* Produce condition for all occurrences of ARG in PHI node.  Set *INVERT
> as to whether the condition is inverted.  */
>  
>  static tree
> -gen_phi_arg_condition (gphi *phi, vec *occur,
> -gimple_stmt_iterator *gsi, bool *invert)
> +gen_phi_arg_condition (gphi *phi, vec *occur, gimple_stmt_iterator *gsi,
> +scalar_cond_masked_set_type &cond_set, bool *invert)
>  {
>int len;
>int i;
> @@ -1902,6 +1934,8 @@ gen_phi_arg_condition (gphi *phi, vec *occur,
> c = TREE_OPERAND (c, 0);
> *invert = true;
>   }
> +
> +  c = gen_simplified_condition (c, cond_set);
>c = force_gimple_operand_gsi (gsi, unshare_expr (c),
>   true, N

Re: [PATCH 2/2]middle-end ifcvt: Sort PHI arguments not only occurrences but also complexity [PR109154]

2023-07-11 Thread Richard Biener via Gcc-patches
On Fri, 7 Jul 2023, Tamar Christina wrote:

> Hi All,
> 
> This patch builds on the previous patch by fixing another issue with the
> way ifcvt currently picks which branches to test.
> 
> The issue with the current implementation is while it sorts for
> occurrences of the argument, it doesn't check for complexity of the arguments.
> 
> As an example:
> 
>[local count: 528603100]:
>   ...
>   if (distbb_75 >= 0.0)
> goto ; [59.00%]
>   else
> goto ; [41.00%]
> 
>[local count: 216727269]:
>   ...
>   goto ; [100.00%]
> 
>[local count: 311875831]:
>   ...
>   if (distbb_75 < iftmp.0_98)
> goto ; [20.00%]
>   else
> goto ; [80.00%]
> 
>[local count: 62375167]:
>   ...
> 
>[local count: 528603100]:
>   # prephitmp_175 = PHI <_173(18), 0.0(17), _174(16)>
> 
> All tree arguments to the PHI have the same number of occurrences, namely 1,
> however it makes a big difference which comparison we test first.
> 
> Sorting only on occurrences we'll pick the compares coming from BB 18 and BB 
> 17,
> This means we end up generating 4 comparisons, while 2 would have been enough.
> 
> By keeping track of the "complexity" of the COND in each BB, (i.e. the number
> of comparisons needed to traverse from the start [BB 15] to end [BB 19]) and
> using a key tuple of <occurrences, complexity> we end up selecting the compare
> from BB 16 and BB 18 first.  BB 16 only requires 1 compare, and BB 18, after 
> we
> test BB 16 also only requires one additional compare.  This change paired with
> the one previous above results in the optimal 2 compares.
> 
> For deep nesting, i.e. for
> 
> ...
>   _79 = vr_15 > 20;
>   _80 = _68 & _79;
>   _82 = vr_15 <= 20;
>   _83 = _68 & _82;
>   _84 = vr_15 < -20;
>   _85 = _73 & _84;
>   _87 = vr_15 >= -20;
>   _88 = _73 & _87;
>   _ifc__111 = _55 ? 10 : 12;
>   _ifc__112 = _70 ? 7 : _ifc__111;
>   _ifc__113 = _85 ? 8 : _ifc__112;
>   _ifc__114 = _88 ? 9 : _ifc__113;
>   _ifc__115 = _45 ? 1 : _ifc__114;
>   _ifc__116 = _63 ? 3 : _ifc__115;
>   _ifc__117 = _65 ? 4 : _ifc__116;
>   _ifc__118 = _83 ? 6 : _ifc__117;
>   _ifc__119 = _60 ? 2 : _ifc__118;
>   _ifc__120 = _43 ? 13 : _ifc__119;
>   _ifc__121 = _75 ? 11 : _ifc__120;
>   vw_1 = _80 ? 5 : _ifc__121;
> 
> Most of the comparisons are still needed because the chain of
> occurrences do not negate each other, i.e. _80 is _73 & vr_15 >= -20 and
> _85 is _73 & vr_15 < -20.  clearly given _73 needs to be true in both 
> branches,
> the only additional test needed is on vr_15, where the one test is the 
> negation
> of the other.  So we don't need to do the comparison of _73 twice.
> 
> The changes in the patch reduces the overall number of compares by one, but 
> has
> a bigger effect on the dependency chain.
> 
> Previously we would generate 5 instructions chain:
> 
>   cmple   p7.s, p4/z, z29.s, z30.s
>   cmpne   p7.s, p7/z, z29.s, #0
>   cmple   p6.s, p7/z, z31.s, z30.s
>   cmpge   p6.s, p6/z, z27.s, z25.s
>   cmplt   p15.s, p6/z, z28.s, z21.s
> 
> as the longest chain.  With this patch we generate 3:
> 
>   cmple   p7.s, p3/z, z27.s, z30.s
>   cmpne   p7.s, p7/z, z27.s, #0
>   cmpgt   p7.s, p7/z, z31.s, z30.s
> 
> and I don't think (x <= y) && (x != 0) && (z > y) can be reduced further.
> 
> Bootstrapped and Regtested on aarch64-none-linux-gnu and no issues.
> 
> Not sure how to write a non-fragile testcase for this, as the
> conditionals chosen depend on threading etc.  Any suggestions?
> 
> Ok for master?

OK.

Likewise for the testcase - GIMPLE one starting at fix_loops.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/109154
>   * tree-if-conv.cc (INCLUDE_ALGORITHM): Include.
>   (struct bb_predicate): Add no_predicate_stmts.
>   (set_bb_predicate): Increase predicate count.
>   (set_bb_predicate_gimplified_stmts): Conditionally initialize
>   no_predicate_stmts.
>   (get_bb_num_predicate_stmts): New.
>   (init_bb_predicate): Initialize no_predicate_stmts.
>   (release_bb_predicate): Cleanup no_predicate_stmts.
>   (insert_gimplified_predicates): Preserve no_predicate_stmts.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> index 
> 16b36dd8b0226f796c1a3fc6d45a9059385e812b..0ed50d99c46f99a4d1ea0e827ee2b2a3f494b2da
>  100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -80,6 +80,7 @@ along with GCC; see the file COPYING3.  If not see
>   :;
>  */
>  
> +#define INCLUDE_ALGORITHM
>  #include "config.h"
>  #include "system.h"
>  #include "coretypes.h"
> @@ -231,6 +232,10 @@ struct bb_predicate {
>   recorded here, in order to avoid the duplication of computations
>   that occur in previous conditions.  See PR44483.  */
>gimple_seq predicate_gimplified_stmts;
> +
> +  /* Records the number of statements recorded into
> + PREDICATE_GIMPLIFIED_STMTS.   */
> +  unsigned no_predicate_stmts;
>  };
>  
>  /* Returns true when the basic block BB has a predicate.  */
> @@ -254,10 

Re: [PATCH V2] VECT: Add COND_LEN_* operations for loop control with length targets

2023-07-11 Thread Richard Biener via Gcc-patches
On Mon, 10 Jul 2023, juzhe.zh...@rivai.ai wrote:

> From: Ju-Zhe Zhong 
> 
> Hi, Richard and Richi.
> 
> This patch is adding cond_len_* operations pattern for target support loop 
> control with length.
> 
> These patterns will be used in these following case:
> 
> 1. Integer division:
>void
>f (int32_t *restrict a, int32_t *restrict b, int32_t *restrict c, int n)
>{
>  for (int i = 0; i < n; ++i)
>   {
> a[i] = b[i] / c[i];
>   }
>}
> 
>   ARM SVE IR:
>   
>   ...
>   max_mask_36 = .WHILE_ULT (0, bnd.5_32, { 0, ... });
> 
>   Loop:
>   ...
>   # loop_mask_29 = PHI 
>   ...
>   vect__4.8_28 = .MASK_LOAD (_33, 32B, loop_mask_29);
>   ...
>   vect__6.11_25 = .MASK_LOAD (_20, 32B, loop_mask_29);
>   vect__8.12_24 = .COND_DIV (loop_mask_29, vect__4.8_28, vect__6.11_25, 
> vect__4.8_28);
>   ...
>   .MASK_STORE (_1, 32B, loop_mask_29, vect__8.12_24);
>   ...
>   next_mask_37 = .WHILE_ULT (_2, bnd.5_32, { 0, ... });
>   ...
>   
>   For target like RVV who support loop control with length, we want to see IR 
> as follows:
>   
>   Loop:
>   ...
>   # loop_len_29 = SELECT_VL
>   ...
>   vect__4.8_28 = .LEN_MASK_LOAD (_33, 32B, loop_len_29);
>   ...
>   vect__6.11_25 = .LEN_MASK_LOAD (_20, 32B, loop_len_29);
>   vect__8.12_24 = .COND_LEN_DIV (dummy_mask, vect__4.8_28, vect__6.11_25, 
> vect__4.8_28, loop_len_29, bias);
>   ...
>   .LEN_MASK_STORE (_1, 32B, loop_len_29, vect__8.12_24);
>   ...
>   next_mask_37 = .WHILE_ULT (_2, bnd.5_32, { 0, ... });
>   ...
>   
>   Notice here, we use dummy_mask = { -1, -1, ..., -1 }
> 
> 2. Integer conditional division:
>Similar case to (1), but with a condition:
>void
>f (int32_t *restrict a, int32_t *restrict b, int32_t *restrict c, int32_t 
> * cond, int n)
>{
>  for (int i = 0; i < n; ++i)
>{
>  if (cond[i])
>  a[i] = b[i] / c[i];
>}
>}
>
>ARM SVE:
>...
>max_mask_76 = .WHILE_ULT (0, bnd.6_52, { 0, ... });
> 
>Loop:
>...
># loop_mask_55 = PHI 
>...
>vect__4.9_56 = .MASK_LOAD (_51, 32B, loop_mask_55);
>mask__29.10_58 = vect__4.9_56 != { 0, ... };
>vec_mask_and_61 = loop_mask_55 & mask__29.10_58;
>...
>vect__6.13_62 = .MASK_LOAD (_24, 32B, vec_mask_and_61);
>...
>vect__8.16_66 = .MASK_LOAD (_1, 32B, vec_mask_and_61);
>vect__10.17_68 = .COND_DIV (vec_mask_and_61, vect__6.13_62, vect__8.16_66, 
> vect__6.13_62);
>...
>.MASK_STORE (_2, 32B, vec_mask_and_61, vect__10.17_68);
>...
>next_mask_77 = .WHILE_ULT (_3, bnd.6_52, { 0, ... });
>
>Here, ARM SVE uses vec_mask_and_61 = loop_mask_55 & mask__29.10_58; to 
> guarantee the correct result.
>
>However, a target with length control cannot perform this elegant flow; for 
> RVV, we would expect:
>
>Loop:
>...
>loop_len_55 = SELECT_VL
>...
>mask__29.10_58 = vect__4.9_56 != { 0, ... };
>...
>vect__10.17_68 = .COND_LEN_DIV (mask__29.10_58, vect__6.13_62, 
> vect__8.16_66, vect__6.13_62, loop_len_55, bias);
>...
> 
>Here we expect COND_LEN_DIV predicated by a real mask which is the outcome 
> of comparison: mask__29.10_58 = vect__4.9_56 != { 0, ... };
>and a real length which is produced by loop control : loop_len_55 = 
> SELECT_VL
>
> 3. conditional Floating-point operations (no -ffast-math):
>
> void
> f (float *restrict a, float *restrict b, int32_t *restrict cond, int n)
> {
>   for (int i = 0; i < n; ++i)
> {
>   if (cond[i])
>   a[i] = b[i] + a[i];
> }
> }
>   
>   ARM SVE IR:
>   max_mask_70 = .WHILE_ULT (0, bnd.6_46, { 0, ... });
> 
>   ...
>   # loop_mask_49 = PHI 
>   ...
>   mask__27.10_52 = vect__4.9_50 != { 0, ... };
>   vec_mask_and_55 = loop_mask_49 & mask__27.10_52;
>   ...
>   vect__9.17_62 = .COND_ADD (vec_mask_and_55, vect__6.13_56, vect__8.16_60, 
> vect__6.13_56);
>   ...
>   next_mask_71 = .WHILE_ULT (_22, bnd.6_46, { 0, ... });
>   ...
>   
>   For RVV, we would expect IR:
>   
>   ...
>   loop_len_49 = SELECT_VL
>   ...
>   mask__27.10_52 = vect__4.9_50 != { 0, ... };
>   ...
>   vect__9.17_62 = .COND_LEN_ADD (mask__27.10_52, vect__6.13_56, 
> vect__8.16_60, vect__6.13_56, loop_len_49, bias);
>   ...
> 
> 4. Conditional un-ordered reduction:
>
>int32_t
>f (int32_t *restrict a, 
>int32_t *restrict cond, int n)
>{
>  int32_t result = 0;
>  for (int i = 0; i < n; ++i)
>{
>if (cond[i])
>  result += a[i];
>}
>  return result;
>}
>
>ARM SVE IR:
>  
>  Loop:
>  # vect_result_18.7_37 = PHI 
>  ...
>  # loop_mask_40 = PHI 
>  ...
>  mask__17.11_43 = vect__4.10_41 != { 0, ... };
>  vec_mask_and_46 = loop_mask_40 & mask__17.11_43;
>  ...
>  vect__33.16_51 = .COND_ADD (vec_mask_and_46, vect_result_18.7_37, 
> vect__7.14_47, vect_result_18.7_37);
>  ...
>  next_mask_58 = .WHILE_ULT (_15, bnd.6_36, { 0, ... });
>  ..

RE: [PATCH 5/19]middle-end: Enable bit-field vectorization to work correctly when we're vectoring inside conds

2023-07-11 Thread Richard Biener via Gcc-patches
On Mon, 10 Jul 2023, Tamar Christina wrote:

> > > -  *type_out = STMT_VINFO_VECTYPE (stmt_info);
> > > +  if (cond_cst)
> > > +{
> > > +  append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
> > > +  pattern_stmt
> > > + = gimple_build_cond (gimple_cond_code (cond_stmt),
> > > +  gimple_get_lhs (pattern_stmt),
> > > +  fold_convert (ret_type, cond_cst),
> > > +  gimple_cond_true_label (cond_stmt),
> > > +  gimple_cond_false_label (cond_stmt));
> > > +  *type_out = STMT_VINFO_VECTYPE (stmt_info);
> > 
> > is there any vectype set for a gcond?
> 
> No, because gconds can't be codegen'd yet; at the moment we must replace the
> original gcond when generating code.
> 
> However, looking at the diff of this code, I don't think the else is needed
> here.  Testing an updated patch.
> 
> > 
> > I must say the flow of the function is a bit convoluted now.  Is it 
> > possible to
> > factor out a helper so we can fully separate the gassign vs. gcond handling 
> > in
> > this function?
> 
> I am not sure: the only places that changed are the start (e.g. how we
> determine bf_stmt), how we determine ret_type, and the determination of
> shift_first for the single-use case.
> 
> Now I can't move the ret_type anywhere, as I need to decompose bf_stmt first.
> And shift_first could be simplified by moving it up into the part that
> determines bf_stmt, but then we would walk the immediate uses even in cases
> where we exit early, which seems inefficient.
> 
> Then there's the final clause, which just generates an additional gcond if the
> original statement was a gcond.  But I'm not sure splitting that out would
> help, since it's something done *in addition* to the normal assign.
> 
> So there doesn't seem to be enough divergence, or big enough divergence, to
> justify a split.  I have however made an attempt at cleaning it up a bit; is
> this one better?

Yeah, it is.
 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

OK.

Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-patterns.cc (vect_init_pattern_stmt): Copy STMT_VINFO_TYPE
>   from original statement.
>   (vect_recog_bitfield_ref_pattern): Support bitfields in gcond.
> 
> Co-Authored-By:  Andre Vieira 
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 60bc9be6819af9bd28a81430869417965ba9d82d..b842f7d983405cd04f6760be7d91c1f55b30aac4 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -128,6 +128,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple 
> *pattern_stmt,
>STMT_VINFO_RELATED_STMT (pattern_stmt_info) = orig_stmt_info;
>STMT_VINFO_DEF_TYPE (pattern_stmt_info)
>  = STMT_VINFO_DEF_TYPE (orig_stmt_info);
> +  STMT_VINFO_TYPE (pattern_stmt_info) = STMT_VINFO_TYPE (orig_stmt_info);
>if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
>  {
>gcc_assert (!vectype
> @@ -2441,6 +2442,10 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
> bf_value = BIT_FIELD_REF (container, bitsize, bitpos);
> result = (type_out) bf_value;
>  
> +   or
> +
> +   if (BIT_FIELD_REF (container, bitsize, bitpos) `cmp` )
> +
> where type_out is a non-bitfield type, that is to say, it's precision 
> matches
> 2^(TYPE_SIZE(type_out) - (TYPE_UNSIGNED (type_out) ? 1 : 2)).
>  
> @@ -2450,6 +2455,10 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
> here it starts with:
> result = (type_out) bf_value;
>  
> +   or
> +
> +   if (BIT_FIELD_REF (container, bitsize, bitpos) `cmp` )
> +
> Output:
>  
> * TYPE_OUT: The vector type of the output of this pattern.
> @@ -2482,33 +2491,45 @@ vect_recog_widen_sum_pattern (vec_info *vinfo,
>  
> The shifting is always optional depending on whether bitpos != 0.
>  
> +   When the original bitfield was inside a gcond then a new gcond is also
> +   generated with the new `result` as the operand to the comparison.
> +
>  */
>  
>  static gimple *
>  vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
>tree *type_out)
>  {
> -  gassign *first_stmt = dyn_cast  (stmt_info->stmt);
> -
> -  if (!first_stmt)
> -return NULL;
> -
> -  gassign *bf_stmt;
> -  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (first_stmt))
> -  && TREE_CODE (gimple_assign_rhs1 (first_stmt)) == SSA_NAME)
> +  gimple *bf_stmt = NULL;
> +  tree lhs = NULL_TREE;
> +  tree ret_type = NULL_TREE;
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +  if (gcond *cond_stmt = dyn_cast  (stmt))
> +{
> +  tree op = gimple_cond_lhs (cond_stmt);
> +  if (TREE_CODE (op) != SSA_NAME)
> + return NULL;
> +  bf_stmt = dyn_cast  (SSA_NAME_DEF_STMT (op));
> +  if (TREE_CODE (gimple_cond_rhs (cond_stmt)) != INTEGER_CST)
> + return NULL;
> +}
> +  else if (is_gimple_assign (stmt)
> +&& CONVERT_EXPR_CODE_P (gimple_a

[PATCH] genopinit: Allow more than 256 modes.

2023-07-11 Thread Robin Dapp via Gcc-patches
Hi,

upcoming changes for RISC-V will have us exceed 256 modes or 8 bits. The
helper functions in gen* rely on the opcode as well as two modes fitting
into an unsigned int (a signed int even if we consider the qsort default
comparison function).  This patch changes the type of the index/hash
from unsigned int to unsigned long long and allows up to 16 bits for a
mode as well as 32 bits for an optab.

Despite fearing worse, bootstrap, build and test suite run times on
x86, aarch64, rv64 and power10 are actually unchanged (I didn't check
32-bit architectures but would expect similar results).
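The reason the subtraction comparator has to go once sort_num widens: the 64-bit difference is truncated when converted to qsort's int return value, so distinct keys can compare as equal (or with the wrong sign). A standalone illustration (not GCC code):

```c
#include <assert.h>

/* Why "return a->sort_num - b->sort_num;" breaks for wide keys: the
   64-bit difference is truncated to int, so two distinct keys whose
   difference is a multiple of 2^32 compare as equal.  The three-way
   if/else form used in the patch has no such problem.  */
static int
bad_cmp (unsigned long long a, unsigned long long b)
{
  return (int) (a - b);         /* truncates: wrong for wide keys */
}

static int
good_cmp (unsigned long long a, unsigned long long b)
{
  if (a > b)
    return 1;
  else if (a < b)
    return -1;
  return 0;
}
```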

Regards
 Robin

gcc/ChangeLog:

* genopinit.cc (pattern_cmp): Use if/else for comparison instead
of subtraction.
(main): Change to unsigned long long.
* gensupport.cc (find_optab): Ditto.
* gensupport.h (struct optab_pattern): Ditto.
* optabs-query.h (optab_handler): Ditto.
(convert_optab_handler): Ditto.
---
 gcc/genopinit.cc   | 19 ---
 gcc/gensupport.cc  |  3 ++-
 gcc/gensupport.h   |  2 +-
 gcc/optabs-query.h |  5 +++--
 4 files changed, 18 insertions(+), 11 deletions(-)
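The widened layout packs (optab, mode2, mode1) into one 64-bit key, matching the find_optab change in the patch: optab in the high 32 bits, each machine mode in 16 bits. A round-trip sketch of that encoding (standalone, not GCC code; the decode helper is invented here for illustration):

```c
#include <assert.h>

/* scode layout after this patch: op << 32 | m2 << 16 | m1, with 16
   bits per mode and 32 bits for the optab.  The casts keep the shifts
   in 64-bit arithmetic.  */
static unsigned long long
encode_scode (unsigned op, unsigned m2, unsigned m1)
{
  return ((unsigned long long) op << 32)
         | ((unsigned long long) m2 << 16)
         | m1;
}

static void
decode_scode (unsigned long long scode, unsigned *op, unsigned *m2,
              unsigned *m1)
{
  *op = (unsigned) (scode >> 32);
  *m2 = (scode >> 16) & 0xffff;
  *m1 = scode & 0xffff;
}
```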

diff --git a/gcc/genopinit.cc b/gcc/genopinit.cc
index 6bd8858a1d9..58c1bf7cba8 100644
--- a/gcc/genopinit.cc
+++ b/gcc/genopinit.cc
@@ -51,7 +51,12 @@ pattern_cmp (const void *va, const void *vb)
 {
   const optab_pattern *a = (const optab_pattern *)va;
   const optab_pattern *b = (const optab_pattern *)vb;
-  return a->sort_num - b->sort_num;
+  if (a->sort_num > b->sort_num)
+return 1;
+  else if (a->sort_num < b->sort_num)
+return -1;
+  else
+return 0;
 }
 
 static int
@@ -306,7 +311,7 @@ main (int argc, const char **argv)
   "extern const struct optab_libcall_d 
normlib_def[NUM_NORMLIB_OPTABS];\n"
   "\n"
   "/* Returns the active icode for the given (encoded) optab.  */\n"
-  "extern enum insn_code raw_optab_handler (unsigned);\n"
+  "extern enum insn_code raw_optab_handler (unsigned long long);\n"
   "extern bool swap_optab_enable (optab, machine_mode, bool);\n"
   "\n"
   "/* Target-dependent globals.  */\n"
@@ -358,14 +363,14 @@ main (int argc, const char **argv)
   "#include \"optabs.h\"\n"
   "\n"
   "struct optab_pat {\n"
-  "  unsigned scode;\n"
+  "  unsigned long long scode;\n"
   "  enum insn_code icode;\n"
   "};\n\n");
 
   fprintf (s_file,
   "static const struct optab_pat pats[NUM_OPTAB_PATTERNS] = {\n");
   for (i = 0; patterns.iterate (i, &p); ++i)
-fprintf (s_file, "  { %#08x, CODE_FOR_%s },\n", p->sort_num, p->name);
+fprintf (s_file, "  { %#08llx, CODE_FOR_%s },\n", p->sort_num, p->name);
   fprintf (s_file, "};\n\n");
 
   fprintf (s_file, "void\ninit_all_optabs (struct target_optabs 
*optabs)\n{\n");
@@ -410,7 +415,7 @@ main (int argc, const char **argv)
  the hash entries, which complicates the pat_enable array.  */
   fprintf (s_file,
   "static int\n"
-  "lookup_handler (unsigned scode)\n"
+  "lookup_handler (unsigned long long scode)\n"
   "{\n"
   "  int l = 0, h = ARRAY_SIZE (pats), m;\n"
   "  while (h > l)\n"
@@ -428,7 +433,7 @@ main (int argc, const char **argv)
 
   fprintf (s_file,
   "enum insn_code\n"
-  "raw_optab_handler (unsigned scode)\n"
+  "raw_optab_handler (unsigned long long scode)\n"
   "{\n"
   "  int i = lookup_handler (scode);\n"
   "  return (i >= 0 && this_fn_optabs->pat_enable[i]\n"
@@ -439,7 +444,7 @@ main (int argc, const char **argv)
   "bool\n"
   "swap_optab_enable (optab op, machine_mode m, bool set)\n"
   "{\n"
-  "  unsigned scode = (op << 16) | m;\n"
+  "  unsigned long long scode = ((unsigned long long)op << 32) | m;\n"
   "  int i = lookup_handler (scode);\n"
   "  if (i >= 0)\n"
   "{\n"
diff --git a/gcc/gensupport.cc b/gcc/gensupport.cc
index e39e6dacce2..3fe7428372d 100644
--- a/gcc/gensupport.cc
+++ b/gcc/gensupport.cc
@@ -3806,7 +3806,8 @@ find_optab (optab_pattern *p, const char *name)
{
  p->name = name;
  p->op = optabs[pindex].op;
- p->sort_num = (p->op << 16) | (p->m2 << 8) | p->m1;
+ p->sort_num
+   = ((unsigned long long) p->op << 32) | (p->m2 << 16) | p->m1;
  return true;
}
 }
diff --git a/gcc/gensupport.h b/gcc/gensupport.h
index 7925e22ed41..9f70e2310e2 100644
--- a/gcc/gensupport.h
+++ b/gcc/gensupport.h
@@ -123,7 +123,7 @@ struct optab_pattern
 
   /* An index that provides a lexicographical sort of (OP, M2, M1).
  Used by genopinit.cc.  */
-  unsigned int sort_num;
+  unsigned long long sort_num;
 };
 
 extern rtx add_implicit_parallel (rtvec);
diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h
index 043e9791bc1..5a1d2f75470 100644
--- a/gcc/optabs-query.h
+++ b/gcc/optabs-query.h
@@ -37,7 +37,7 @@ convert_opt

Re: [PATCH] genopinit: Allow more than 256 modes.

2023-07-11 Thread 钟居哲
Thanks for fixing it.
CC Richards to see whether it is appropriate.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-07-11 19:51
To: gcc-patches
CC: rdapp.gcc; jeffreyalaw; juzhe.zh...@rivai.ai
Subject: [PATCH] genopinit: Allow more than 256 modes.
Hi,
 
upcoming changes for RISC-V will have us exceed 256 modes or 8 bits. The
helper functions in gen* rely on the opcode as well as two modes fitting
into an unsigned int (a signed int even if we consider the qsort default
comparison function).  This patch changes the type of the index/hash
from unsigned int to unsigned long long and allows up to 16 bits for a
mode as well as 32 bits for an optab.
 
Despite fearing worse, bootstrap, build and test suite run times on
x86, aarch64, rv64 and power10 are actually unchanged (I didn't check
32-bit architectures but would expect similar results).
 
Regards
Robin
 
gcc/ChangeLog:
 
* genopinit.cc (pattern_cmp): Use if/else for comparison instead
of subtraction.
(main): Change to unsigned long long.
* gensupport.cc (find_optab): Ditto.
* gensupport.h (struct optab_pattern): Ditto.
* optabs-query.h (optab_handler): Ditto.
(convert_optab_handler): Ditto.
---
gcc/genopinit.cc   | 19 ---
gcc/gensupport.cc  |  3 ++-
gcc/gensupport.h   |  2 +-
gcc/optabs-query.h |  5 +++--
4 files changed, 18 insertions(+), 11 deletions(-)
 
diff --git a/gcc/genopinit.cc b/gcc/genopinit.cc
index 6bd8858a1d9..58c1bf7cba8 100644
--- a/gcc/genopinit.cc
+++ b/gcc/genopinit.cc
@@ -51,7 +51,12 @@ pattern_cmp (const void *va, const void *vb)
{
   const optab_pattern *a = (const optab_pattern *)va;
   const optab_pattern *b = (const optab_pattern *)vb;
-  return a->sort_num - b->sort_num;
+  if (a->sort_num > b->sort_num)
+return 1;
+  else if (a->sort_num < b->sort_num)
+return -1;
+  else
+return 0;
}
static int
@@ -306,7 +311,7 @@ main (int argc, const char **argv)
   "extern const struct optab_libcall_d normlib_def[NUM_NORMLIB_OPTABS];\n"
   "\n"
   "/* Returns the active icode for the given (encoded) optab.  */\n"
-"extern enum insn_code raw_optab_handler (unsigned);\n"
+"extern enum insn_code raw_optab_handler (unsigned long long);\n"
   "extern bool swap_optab_enable (optab, machine_mode, bool);\n"
   "\n"
   "/* Target-dependent globals.  */\n"
@@ -358,14 +363,14 @@ main (int argc, const char **argv)
   "#include \"optabs.h\"\n"
   "\n"
   "struct optab_pat {\n"
-"  unsigned scode;\n"
+"  unsigned long long scode;\n"
   "  enum insn_code icode;\n"
   "};\n\n");
   fprintf (s_file,
   "static const struct optab_pat pats[NUM_OPTAB_PATTERNS] = {\n");
   for (i = 0; patterns.iterate (i, &p); ++i)
-fprintf (s_file, "  { %#08x, CODE_FOR_%s },\n", p->sort_num, p->name);
+fprintf (s_file, "  { %#08llx, CODE_FOR_%s },\n", p->sort_num, p->name);
   fprintf (s_file, "};\n\n");
   fprintf (s_file, "void\ninit_all_optabs (struct target_optabs 
*optabs)\n{\n");
@@ -410,7 +415,7 @@ main (int argc, const char **argv)
  the hash entries, which complicates the pat_enable array.  */
   fprintf (s_file,
   "static int\n"
-"lookup_handler (unsigned scode)\n"
+"lookup_handler (unsigned long long scode)\n"
   "{\n"
   "  int l = 0, h = ARRAY_SIZE (pats), m;\n"
   "  while (h > l)\n"
@@ -428,7 +433,7 @@ main (int argc, const char **argv)
   fprintf (s_file,
   "enum insn_code\n"
-"raw_optab_handler (unsigned scode)\n"
+"raw_optab_handler (unsigned long long scode)\n"
   "{\n"
   "  int i = lookup_handler (scode);\n"
   "  return (i >= 0 && this_fn_optabs->pat_enable[i]\n"
@@ -439,7 +444,7 @@ main (int argc, const char **argv)
   "bool\n"
   "swap_optab_enable (optab op, machine_mode m, bool set)\n"
   "{\n"
-"  unsigned scode = (op << 16) | m;\n"
+"  unsigned long long scode = ((unsigned long long)op << 32) | m;\n"
   "  int i = lookup_handler (scode);\n"
   "  if (i >= 0)\n"
   "{\n"
diff --git a/gcc/gensupport.cc b/gcc/gensupport.cc
index e39e6dacce2..3fe7428372d 100644
--- a/gcc/gensupport.cc
+++ b/gcc/gensupport.cc
@@ -3806,7 +3806,8 @@ find_optab (optab_pattern *p, const char *name)
{
  p->name = name;
  p->op = optabs[pindex].op;
-   p->sort_num = (p->op << 16) | (p->m2 << 8) | p->m1;
+   p->sort_num
+ = ((unsigned long long) p->op << 32) | (p->m2 << 16) | p->m1;
  return true;
}
 }
diff --git a/gcc/gensupport.h b/gcc/gensupport.h
index 7925e22ed41..9f70e2310e2 100644
--- a/gcc/gensupport.h
+++ b/gcc/gensupport.h
@@ -123,7 +123,7 @@ struct optab_pattern
   /* An index that provides a lexicographical sort of (OP, M2, M1).
  Used by genopinit.cc.  */
-  unsigned int sort_num;
+  unsigned long long sort_num;
};
extern rtx add_implicit_parallel (rtvec);
diff --git a/gcc/optabs-query.h b/gcc/optabs-query.h
index 043e9791bc1..5a1d2f75470 100644
--- a/gcc/optabs-query.h
+++ b/gcc/optabs-query.h
@@ -37,7 +37,7 @@ convert_optab_p (optab op)
inline enum insn_code
optab_handler (optab op, machine_mode mode)
{
-  unsigned scode = (op << 16) | mode;
+  unsigned long long s

[PATCH] fortran: Release symbols in reversed order [PR106050]

2023-07-11 Thread Mikael Morin via Gcc-patches
Hello,

I saw the light regarding this PR after Paul posted a comment yesterday.

Regression test in progress on x86_64-pc-linux-gnu.
I plan to push in the next hours.

Mikael

-- >8 --

Release symbols in reversed order wrt the order they were allocated.
This fixes an error recovery ICE in the case of a misplaced
derived type declaration.  Such a declaration creates nested
symbols, one for the derived type and one for each type parameter,
which should be immediately released as the declaration is
rejected.  This breaks if the derived type is released first.
As the type parameter symbols are in the namespace of the derived
type, releasing the derived type releases the type parameters, so
one can't access them after that, even to release them.  Hence,
the type parameters should be released first.
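The ordering problem generalizes to any undo list whose later entries live inside earlier ones. A minimal C sketch of the situation (names and structures are illustrative, not the real gfc_symbol machinery): releasing in allocation order frees the type parameter twice, releasing in reverse (as FOR_EACH_VEC_ELT_REVERSE now does) is safe:

```c
#include <assert.h>
#include <stddef.h>

/* Symbol "t" owns symbol "k" in its namespace; releasing t cascades to
   any still-live owned symbol.  */
struct sym
{
  int released;
  struct sym *owned;   /* at most one owned symbol, for simplicity */
};

/* Release S (and its still-live owned symbol); returns the number of
   double frees this call caused.  */
static int
release_sym (struct sym *s)
{
  int double_frees = s->released ? 1 : 0;
  s->released = 1;
  if (s->owned && !s->owned->released)
    double_frees += release_sym (s->owned);
  return double_frees;
}
```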

PR fortran/106050

gcc/fortran/ChangeLog:

* symbol.cc (gfc_restore_last_undo_checkpoint): Release symbols
in reverse order.

gcc/testsuite/ChangeLog:

* gfortran.dg/pdt_33.f90: New test.
---
 gcc/fortran/symbol.cc|  2 +-
 gcc/testsuite/gfortran.dg/pdt_33.f90 | 15 +++
 2 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pdt_33.f90

diff --git a/gcc/fortran/symbol.cc b/gcc/fortran/symbol.cc
index 37a9e8fa0ae..4a71d84b3fe 100644
--- a/gcc/fortran/symbol.cc
+++ b/gcc/fortran/symbol.cc
@@ -3661,7 +3661,7 @@ gfc_restore_last_undo_checkpoint (void)
   gfc_symbol *p;
   unsigned i;
 
-  FOR_EACH_VEC_ELT (latest_undo_chgset->syms, i, p)
+  FOR_EACH_VEC_ELT_REVERSE (latest_undo_chgset->syms, i, p)
 {
   /* Symbol in a common block was new. Or was old and just put in common */
   if (p->common_block
diff --git a/gcc/testsuite/gfortran.dg/pdt_33.f90 b/gcc/testsuite/gfortran.dg/pdt_33.f90
new file mode 100644
index 000..0521513f2f8
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pdt_33.f90
@@ -0,0 +1,15 @@
+! { dg-do compile }
+!
+! PR fortran/106050
+! The following used to trigger an error recovery ICE by releasing
+! the symbol T before the symbol K which was leading to releasing
+! K twice as it's in T's namespace.
+!
+! Contributed by G. Steinmetz 
+
+program p
+   a = 1
+   type t(k)  ! { dg-error "Unexpected derived type declaration" }
+  integer, kind :: k = 4  ! { dg-error "not allowed outside a TYPE definition" }
+   end type   ! { dg-error "Expecting END PROGRAM" }
+end
-- 
2.40.1



[PATCH] c++: coercing variable template from current inst [PR110580]

2023-07-11 Thread Patrick Palka via Gcc-patches
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk?

-- >8 --

Here during ahead of time coercion of the variable template-id v1,
since we pass only the innermost arguments to coerce_template_parms (and
outer arguments are still dependent at this point), substitution of the
default template argument V=U prematurely lowers U from level 2 to level 1.
Thus we incorrectly resolve v1 to v1 (effectively) instead
of to v1.

Coercion of a class/alias template-id on the other hand is always done
using the full set of arguments relative to the most general template,
so ahead of time coercion there does the right thing.  I suppose we
should do the same for variable template-ids.

PR c++/110580

gcc/cp/ChangeLog:

* pt.cc (lookup_template_variable): Pass all arguments to
coerce_template_parms, and use the innermost parameters from
the most general template.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/var-templ83.C: New test.
---
 gcc/cp/pt.cc |  4 +++-
 gcc/testsuite/g++.dg/cpp1y/var-templ83.C | 16 
 2 files changed, 19 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ83.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 076f788281e..fa15b75b9c5 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -10345,7 +10345,9 @@ lookup_template_variable (tree templ, tree arglist, 
tsubst_flags_t complain)
   if (flag_concepts && variable_concept_p (templ))
 return build_concept_check (templ, arglist, tf_none);
 
-  tree parms = DECL_INNERMOST_TEMPLATE_PARMS (templ);
+  tree gen_templ = most_general_template (templ);
+  tree parms = DECL_INNERMOST_TEMPLATE_PARMS (gen_templ);
+  arglist = add_outermost_template_args (templ, arglist);
   arglist = coerce_template_parms (parms, arglist, templ, complain);
   if (arglist == error_mark_node)
 return error_mark_node;
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ83.C b/gcc/testsuite/g++.dg/cpp1y/var-templ83.C
new file mode 100644
index 000..f5268f258d7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ83.C
@@ -0,0 +1,16 @@
+// PR c++/110580
+// { dg-do compile { target c++14 } }
+
+template
+struct A {
+  template
+  static constexpr bool v1 = __is_same(U, V);
+
+  template
+  static constexpr bool v2 = !__is_same(U, V);
+
+  static_assert(v1, "");
+  static_assert(v2, "");
+};
+
+template struct A;
-- 
2.41.0.327.gaa9166bcc0



Re: [PATCH] RISC-V: Optimize permutation codegen with vcompress

2023-07-11 Thread Robin Dapp via Gcc-patches
Hi Juzhe,

looks good from my side, thanks.  While going through it I
thought of some related cases that we could still handle
differently but I didn't bother to formalize them for now.
Most likely we already handle them in the shortest way
anyway.  I'm going to check on that when I find some time
at some point. 

In the tests I noticed that most (all?) of them are pretty
evenly split (half/half) between first and second source vector.
Wouldn't we want some more variety there? Still OK without
that IMHO.

Regards
 Robin



Re: Re: [PATCH] RISC-V: Optimize permutation codegen with vcompress

2023-07-11 Thread 钟居哲
The compress optimization pattern covers every variety.
It's not necessary to force an even (half/half) split; we can apply this
compress pattern to any kind of compress permutation.

You can apply this patch to see.

Thanks.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-07-11 20:17
To: juzhe.zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Optimize permutation codegen with vcompress
Hi Juzhe,
 
looks good from my side, thanks.  While going through it I
thought of some related cases that we could still handle
differently but I didn't bother to formalize them for now.
Most likely we already handle them in the shortest way
anyway.  I'm going to check on that when I find some time
at some point. 
 
In the tests I noticed that most (all?) of them are pretty
evenly split (half/half) between first and second source vector.
Wouldn't we want some more variety there? Still OK without
that IMHO.
 
Regards
Robin
 
 


Re: [PATCH] RISC-V: Optimize permutation codegen with vcompress

2023-07-11 Thread Robin Dapp via Gcc-patches
> The compress optimization pattern covers every variety.
> It's not necessary to force an even (half/half) split; we can apply this
> compress pattern to any kind of compress permutation.

Yes, that's clear.  I meant the testcases are mostly designed
like

MASK4 1, 2, 6, 7

instead of variation like

MASK4 0, 5, 6, 7

or something else.  But this wouldn't add a lot of coverage
anyway as we're searching for the "pivot" anyway.

Regards
 Robin


Re: Re: [PATCH] RISC-V: Optimize permutation codegen with vcompress

2023-07-11 Thread 钟居哲
MASK4 0, 5, 6, 7 definitely also works.

The optimization is generic as long as the permutation index matches the 
compress insn in the RVV ISA spec.
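Both index sets discussed in this thread satisfy the compress property. A minimal check of that property (illustrative only; the real GCC pattern additionally handles the two-source "pivot" and mask construction): a shuffle can map to a single vcompress when the selected source indices are strictly increasing, i.e. elements keep their relative order and are packed toward the front.

```c
#include <assert.h>
#include <stdbool.h>

/* Necessary condition for a compress-style permutation: the selected
   indices are strictly increasing.  */
static bool
is_compress_perm (const int *sel, int out_len)
{
  for (int i = 1; i < out_len; i++)
    if (sel[i] <= sel[i - 1])
      return false;
  return true;
}
```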



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-07-11 20:24
To: 钟居哲; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; Jeff Law
Subject: Re: [PATCH] RISC-V: Optimize permutation codegen with vcompress
> The compress optimization pattern covers every variety.
> It's not necessary to force an even (half/half) split; we can apply this
> compress pattern to any kind of compress permutation.
 
Yes, that's clear.  I meant the testcases are mostly designed
like
 
MASK4 1, 2, 6, 7
 
instead of variation like
 
MASK4 0, 5, 6, 7
 
or something else.  But this wouldn't add a lot of coverage
anyway as we're searching for the "pivot" anyway.
 
Regards
Robin
 


Re: [PATCH] RISC-V: Optimize permutation codegen with vcompress

2023-07-11 Thread Robin Dapp via Gcc-patches
> MASK4 0, 5, 6, 7 also works definitely

Sure :)  My remark was that the tests are all(?)
evenly split and a bit more variation would have been nice.
Not that it doesn't work, I'm OK with it as is.

Regards
 Robin


Re: [PATCH] genopinit: Allow more than 256 modes.

2023-07-11 Thread Richard Sandiford via Gcc-patches
Robin Dapp via Gcc-patches  writes:
> Hi,
>
> upcoming changes for RISC-V will have us exceed 256 modes or 8 bits. The
> helper functions in gen* rely on the opcode as well as two modes fitting
> into an unsigned int (a signed int even if we consider the qsort default
> comparison function).  This patch changes the type of the index/hash
> from unsigned int to unsigned long long and allows up to 16 bits for a
> mode as well as 32 bits for an optab.
>
> Despite fearing worse, bootstrap, build and test suite run times on
> x86, aarch64, rv64 and power10 are actually unchanged (I didn't check
> 32-bit architectures but would expect similar results).

I think for now we should just bump the mode shift to 10 and assert
(statically) that MAX_MACHINE_MODE < 1024.

Thanks,
Richard

> Regards
>  Robin
>
> gcc/ChangeLog:
>
>   * genopinit.cc (pattern_cmp): Use if/else for comparison instead
>   of subtraction.
>   (main): Change to unsigned long long.
>   * gensupport.cc (find_optab): Ditto.
>   * gensupport.h (struct optab_pattern): Ditto.
>   * optabs-query.h (optab_handler): Ditto.
>   (convert_optab_handler): Ditto.
> ---
>  gcc/genopinit.cc   | 19 ---
>  gcc/gensupport.cc  |  3 ++-
>  gcc/gensupport.h   |  2 +-
>  gcc/optabs-query.h |  5 +++--
>  4 files changed, 18 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/genopinit.cc b/gcc/genopinit.cc
> index 6bd8858a1d9..58c1bf7cba8 100644
> --- a/gcc/genopinit.cc
> +++ b/gcc/genopinit.cc
> @@ -51,7 +51,12 @@ pattern_cmp (const void *va, const void *vb)
>  {
>const optab_pattern *a = (const optab_pattern *)va;
>const optab_pattern *b = (const optab_pattern *)vb;
> -  return a->sort_num - b->sort_num;
> +  if (a->sort_num > b->sort_num)
> +return 1;
> +  else if (a->sort_num < b->sort_num)
> +return -1;
> +  else
> +return 0;
>  }
>  
>  static int
> @@ -306,7 +311,7 @@ main (int argc, const char **argv)
>  "extern const struct optab_libcall_d 
> normlib_def[NUM_NORMLIB_OPTABS];\n"
>  "\n"
>  "/* Returns the active icode for the given (encoded) optab.  */\n"
> -"extern enum insn_code raw_optab_handler (unsigned);\n"
> +"extern enum insn_code raw_optab_handler (unsigned long long);\n"
>  "extern bool swap_optab_enable (optab, machine_mode, bool);\n"
>  "\n"
>  "/* Target-dependent globals.  */\n"
> @@ -358,14 +363,14 @@ main (int argc, const char **argv)
>  "#include \"optabs.h\"\n"
>  "\n"
>  "struct optab_pat {\n"
> -"  unsigned scode;\n"
> +"  unsigned long long scode;\n"
>  "  enum insn_code icode;\n"
>  "};\n\n");
>  
>fprintf (s_file,
>  "static const struct optab_pat pats[NUM_OPTAB_PATTERNS] = {\n");
>for (i = 0; patterns.iterate (i, &p); ++i)
> -fprintf (s_file, "  { %#08x, CODE_FOR_%s },\n", p->sort_num, p->name);
> +fprintf (s_file, "  { %#08llx, CODE_FOR_%s },\n", p->sort_num, p->name);
>fprintf (s_file, "};\n\n");
>  
>fprintf (s_file, "void\ninit_all_optabs (struct target_optabs 
> *optabs)\n{\n");
> @@ -410,7 +415,7 @@ main (int argc, const char **argv)
>   the hash entries, which complicates the pat_enable array.  */
>fprintf (s_file,
>  "static int\n"
> -"lookup_handler (unsigned scode)\n"
> +"lookup_handler (unsigned long long scode)\n"
>  "{\n"
>  "  int l = 0, h = ARRAY_SIZE (pats), m;\n"
>  "  while (h > l)\n"
> @@ -428,7 +433,7 @@ main (int argc, const char **argv)
>  
>fprintf (s_file,
>  "enum insn_code\n"
> -"raw_optab_handler (unsigned scode)\n"
> +"raw_optab_handler (unsigned long long scode)\n"
>  "{\n"
>  "  int i = lookup_handler (scode);\n"
>  "  return (i >= 0 && this_fn_optabs->pat_enable[i]\n"
> @@ -439,7 +444,7 @@ main (int argc, const char **argv)
>  "bool\n"
>  "swap_optab_enable (optab op, machine_mode m, bool set)\n"
>  "{\n"
> -"  unsigned scode = (op << 16) | m;\n"
> +"  unsigned long long scode = ((unsigned long long)op << 32) | m;\n"
>  "  int i = lookup_handler (scode);\n"
>  "  if (i >= 0)\n"
>  "{\n"
> diff --git a/gcc/gensupport.cc b/gcc/gensupport.cc
> index e39e6dacce2..3fe7428372d 100644
> --- a/gcc/gensupport.cc
> +++ b/gcc/gensupport.cc
> @@ -3806,7 +3806,8 @@ find_optab (optab_pattern *p, const char *name)
>   {
> p->name = name;
> p->op = optabs[pindex].op;
> -   p->sort_num = (p->op << 16) | (p->m2 << 8) | p->m1;
> +   p->sort_num
> + = ((unsigned long long) p->op << 32) | (p->m2 << 16) | p->m1;
> return true;
>   }
>  }
> diff --git a/gcc/gensupport.h b/gcc/gensupport.h
> index 7925e22ed41..9f70e2310e2 100644
> --- a/gcc/gensupport.h
> +++ b/gcc/gensupport.h
> @@ -123,7 +123,7 @@ struct optab_pattern
>  
>/* An index that provides a lexicographical sort of (OP, M2, M1).
>

Re: [PATCH] genopinit: Allow more than 256 modes.

2023-07-11 Thread Richard Sandiford via Gcc-patches
Richard Sandiford  writes:
> Robin Dapp via Gcc-patches  writes:
>> Hi,
>>
>> upcoming changes for RISC-V will have us exceed 256 modes or 8 bits. The
>> helper functions in gen* rely on the opcode as well as two modes fitting
>> into an unsigned int (a signed int even if we consider the qsort default
>> comparison function).  This patch changes the type of the index/hash
>> from unsigned int to unsigned long long and allows up to 16 bits for a
>> mode as well as 32 bits for an optab.
>>
>> Despite fearing worse, bootstrap, build and test suite run times on
>> x86, aarch64, rv64 and power10 are actually unchanged (I didn't check
>> 32-bit architectures but would expect similar results).
>
> I think for now we should just bump the mode shift to 10 and assert
> (statically) that MAX_MACHINE_MODE < 1024.

Sorry, just remembered that we already have:

  if (NUM_OPTABS > 0x
|| MAX_MACHINE_MODE >= ((1 << MACHINE_MODE_BITSIZE) - 1))
fatal ("genopinit range assumptions invalid");

so it would be a case of changing those instead.

Thanks,
Richard


Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits

2023-07-11 Thread Jan Hubicka via Gcc-patches
> > By now we did CCP and FRE so we likely optimized out most of constant
> > conditionals exposed by inline.
> 
> So maybe we should simply delay re-propagation of the profile?  I
> think cunrolli doesn't so much care about the profile - cunrolli
> is (was) about abstraction removal.  Jump threading should be
> the first pass to care.

That is what I was thinking too.  After inlining, the profile counts may
be in quite bad shape.  If you inline together loops like in exchange, which
has a large loop nest, we will definitely end up capping counts to avoid
overflow.

cunrolli does:

 ret = tree_unroll_loops_completely (optimize >= 3, false);

which sets may_increase_size to true for -O3 and then

 may_increase_size && optimize_loop_nest_for_speed_p (loop)

which seems a reasonable guard but may get random answers on a capped
profile.  It is not a big deal to try propagating before cunrolli and then
again before threading and see how much potential this idea has.
I guess I should also double-check that the other passes are indeed
safe, but I think it is quite obvious they should be.

Honza
> 
> Richard.


RE: [PATCH V2] VECT: Add COND_LEN_* operations for loop control with length targets

2023-07-11 Thread Li, Pan2 via Gcc-patches
Committed, thanks Richard.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Richard Biener via Gcc-patches
Sent: Tuesday, July 11, 2023 7:01 PM
To: Ju-Zhe Zhong 
Cc: gcc-patches@gcc.gnu.org; richard.sandif...@arm.com
Subject: Re: [PATCH V2] VECT: Add COND_LEN_* operations for loop control with 
length targets

On Mon, 10 Jul 2023, juzhe.zh...@rivai.ai wrote:

> From: Ju-Zhe Zhong 
> 
> Hi, Richard and Richi.
> 
> This patch is adding cond_len_* operations pattern for target support loop 
> control with length.
> 
> These patterns will be used in these following case:
> 
> 1. Integer division:
>void
>f (int32_t *restrict a, int32_t *restrict b, int32_t *restrict c, int n)
>{
>  for (int i = 0; i < n; ++i)
>   {
> a[i] = b[i] / c[i];
>   }
>}
> 
>   ARM SVE IR:
>   
>   ...
>   max_mask_36 = .WHILE_ULT (0, bnd.5_32, { 0, ... });
> 
>   Loop:
>   ...
>   # loop_mask_29 = PHI 
>   ...
>   vect__4.8_28 = .MASK_LOAD (_33, 32B, loop_mask_29);
>   ...
>   vect__6.11_25 = .MASK_LOAD (_20, 32B, loop_mask_29);
>   vect__8.12_24 = .COND_DIV (loop_mask_29, vect__4.8_28, vect__6.11_25, 
> vect__4.8_28);
>   ...
>   .MASK_STORE (_1, 32B, loop_mask_29, vect__8.12_24);
>   ...
>   next_mask_37 = .WHILE_ULT (_2, bnd.5_32, { 0, ... });
>   ...
>   
>   For target like RVV who support loop control with length, we want to see IR 
> as follows:
>   
>   Loop:
>   ...
>   # loop_len_29 = SELECT_VL
>   ...
>   vect__4.8_28 = .LEN_MASK_LOAD (_33, 32B, loop_len_29);
>   ...
>   vect__6.11_25 = .LEN_MASK_LOAD (_20, 32B, loop_len_29);
>   vect__8.12_24 = .COND_LEN_DIV (dummy_mask, vect__4.8_28, vect__6.11_25, 
> vect__4.8_28, loop_len_29, bias);
>   ...
>   .LEN_MASK_STORE (_1, 32B, loop_len_29, vect__8.12_24);
>   ...
>   next_mask_37 = .WHILE_ULT (_2, bnd.5_32, { 0, ... });
>   ...
>   
>   Notice here, we use dummy_mask = { -1, -1, ..., -1 }
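The proposed COND_LEN_* functions combine a mask with a length. A scalar model of their per-lane semantics (purely illustrative; names are invented here, and bias is 0 or -1 as for the existing LEN_* loads/stores): lanes below len + bias with an active mask bit compute the operation, every other lane takes the "else" value. With the all-ones dummy mask this degenerates to pure length control:

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of .COND_LEN_DIV (mask, a, b, else, len, bias).  */
static void
cond_len_div (int n, const _Bool *mask, const int32_t *a, const int32_t *b,
              const int32_t *else_vals, int len, int bias, int32_t *out)
{
  for (int i = 0; i < n; i++)
    out[i] = (i < len + bias && mask[i]) ? a[i] / b[i] : else_vals[i];
}
```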
> 
> 2. Integer conditional division:
>Similar case with (1) but with condtion:
>void
>f (int32_t *restrict a, int32_t *restrict b, int32_t *restrict c, int32_t 
> * cond, int n)
>{
>  for (int i = 0; i < n; ++i)
>{
>  if (cond[i])
>  a[i] = b[i] / c[i];
>}
>}
>
>ARM SVE:
>...
>max_mask_76 = .WHILE_ULT (0, bnd.6_52, { 0, ... });
> 
>Loop:
>...
># loop_mask_55 = PHI 
>...
>vect__4.9_56 = .MASK_LOAD (_51, 32B, loop_mask_55);
>mask__29.10_58 = vect__4.9_56 != { 0, ... };
>vec_mask_and_61 = loop_mask_55 & mask__29.10_58;
>...
>vect__6.13_62 = .MASK_LOAD (_24, 32B, vec_mask_and_61);
>...
>vect__8.16_66 = .MASK_LOAD (_1, 32B, vec_mask_and_61);
>vect__10.17_68 = .COND_DIV (vec_mask_and_61, vect__6.13_62, vect__8.16_66, 
> vect__6.13_62);
>...
>.MASK_STORE (_2, 32B, vec_mask_and_61, vect__10.17_68);
>...
>next_mask_77 = .WHILE_ULT (_3, bnd.6_52, { 0, ... });
>
>Here, ARM SVE uses vec_mask_and_61 = loop_mask_55 & mask__29.10_58; to
> guarantee the correct result.
>
>However, targets with length control cannot perform this elegant flow; for
> RVV, we would expect:
>
>Loop:
>...
>loop_len_55 = SELECT_VL
>...
>mask__29.10_58 = vect__4.9_56 != { 0, ... };
>...
>vect__10.17_68 = .COND_LEN_DIV (mask__29.10_58, vect__6.13_62, 
> vect__8.16_66, vect__6.13_62, loop_len_55, bias);
>...
> 
>Here we expect COND_LEN_DIV to be predicated by a real mask, which is the
> outcome of the comparison mask__29.10_58 = vect__4.9_56 != { 0, ... };,
>and by a real length, which is produced by the loop control: loop_len_55 =
> SELECT_VL
>
> 3. Conditional floating-point operations (no -ffast-math):
>
> void
> f (float *restrict a, float *restrict b, int32_t *restrict cond, int n)
> {
>   for (int i = 0; i < n; ++i)
> {
>   if (cond[i])
>   a[i] = b[i] + a[i];
> }
> }
>   
>   ARM SVE IR:
>   max_mask_70 = .WHILE_ULT (0, bnd.6_46, { 0, ... });
> 
>   ...
>   # loop_mask_49 = PHI 
>   ...
>   mask__27.10_52 = vect__4.9_50 != { 0, ... };
>   vec_mask_and_55 = loop_mask_49 & mask__27.10_52;
>   ...
>   vect__9.17_62 = .COND_ADD (vec_mask_and_55, vect__6.13_56, vect__8.16_60, 
> vect__6.13_56);
>   ...
>   next_mask_71 = .WHILE_ULT (_22, bnd.6_46, { 0, ... });
>   ...
>   
>   For RVV, we would expect IR:
>   
>   ...
>   loop_len_49 = SELECT_VL
>   ...
>   mask__27.10_52 = vect__4.9_50 != { 0, ... };
>   ...
>   vect__9.17_62 = .COND_LEN_ADD (mask__27.10_52, vect__6.13_56, 
> vect__8.16_60, vect__6.13_56, loop_len_49, bias);
>   ...
> 
> 4. Conditional unordered reduction:
>
>int32_t
>f (int32_t *restrict a, 
>int32_t *restrict cond, int n)
>{
>  int32_t result = 0;
>  for (int i = 0; i < n; ++i)
>{
>if (cond[i])
>  result += a[i];
>}
>  return result;
>}
>
>ARM SVE IR:
>  
>  Loop:
>  # vect_result_18.7_37 = PHI 
>  ...
>  # loop_mask_40 = PHI

Loop-ch improvements, part 1

2023-07-11 Thread Jan Hubicka via Gcc-patches
Hi,
this patch improves the profile update in loop-ch to handle the situation
where the duplicated header has a loop-invariant test.  In this case we know
that all of the exit edge's count belongs to the duplicated loop header edge
and can update the probabilities accordingly.  Since we already do all the
work to track this information from analysis to duplication, I also added
code to turn those conditionals into constants so we do not need a later
jump threading pass to clean up.

Working on this made me realize that the propagation was buggy in a few
respects:
 1) it handled every PHI as a PHI in the header and incorrectly assigned
some PHIs to be IV-like when they are not
 2) it did not check for novops calls, which are not required to return the
same value on every invocation
 3) I also added a check for asm statements, since those are not necessarily
reproducible either.

I would like to make more changes, but tried to keep this patch from
snowballing.  The analysis of which statements will remain after duplication
can be improved.  I think we should use the ranger query for basic blocks
other than the first, too, and possibly drop the IV heuristics then.  Also,
a lot of this logic is pretty much the same as the analysis in the peeling
pass, so unifying the two would be nice.

I also think I should move the profile update out of
gimple_duplicate_sese_region (it is now very specific to ch) and rename it,
since those regions are single-entry multiple-exit.

Bootstrapped/regtested on x86_64-linux, OK?

Honza

gcc/ChangeLog:

* tree-cfg.cc (gimple_duplicate_sese_region): Add ORIG_ELIMINATED_EDGES
parameter and rewrite profile updating code to handle edges elimination.
* tree-cfg.h (gimple_duplicate_sese_region): Update prototype.
* tree-ssa-loop-ch.cc (loop_invariant_op_p): New function.
(loop_iv_derived_p): New function.
(should_duplicate_loop_header_p): Track invariant exit edges; fix
handling of PHIs and propagation of IV-derived variables.
(ch_base::copy_headers): Pass around the invariant edges hash set.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/loop-ch-profile-1.c: Remove xfail.

diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index 4989906706c..3879fb7c4c1 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -6661,14 +6661,16 @@ add_phi_args_after_copy (basic_block *region_copy, 
unsigned n_region,
true otherwise.
 
ELIMINATED_EDGE is an edge that is known to be removed in the dupicated
-   region.  */
+   region.  ORIG_ELIMINATED_EDGES, if non-NULL is set of edges known to be
+   removed from the original region.  */
 
 bool
 gimple_duplicate_sese_region (edge entry, edge exit,
  basic_block *region, unsigned n_region,
  basic_block *region_copy,
  bool update_dominance,
- edge eliminated_edge)
+ edge eliminated_edge,
+ hash_set  *orig_eliminated_edges)
 {
   unsigned i;
   bool free_region_copy = false, copying_header = false;
@@ -6747,7 +6749,8 @@ gimple_duplicate_sese_region (edge entry, edge exit,
split_edge_bb_loc (entry), update_dominance);
   if (total_count.initialized_p () && entry_count.initialized_p ())
 {
-  if (!eliminated_edge)
+  if (!eliminated_edge
+ && (!orig_eliminated_edges || orig_eliminated_edges->is_empty ()))
{
  scale_bbs_frequencies_profile_count (region, n_region,
   total_count - entry_count,
@@ -6765,7 +6768,7 @@ gimple_duplicate_sese_region (edge entry, edge exit,
 if (cond1) <- this condition will become false
   and we update probabilities
   goto loop_exit;
-if (cond2)
+if (cond2) <- this condition is loop invariant
   goto loop_exit;
 goto loop_header   <- this will be redirected to loop.
   // region_copy_end
@@ -6776,6 +6779,7 @@ gimple_duplicate_sese_region (edge entry, edge exit,
   if (cond1)   <- we need to update probabbility here
 goto loop_exit;
   if (cond2)   <- and determine scaling factor here.
+  moreover cond2 is now always true
 goto loop_exit;
   else
 goto loop;
@@ -6785,53 +6789,84 @@ gimple_duplicate_sese_region (edge entry, edge exit,
 but only consumer so far is tree-ssa-loop-ch and it uses only this
 to handle the common case of peeling headers which have
 conditionals known to be always true upon entry.  */
- gcc_assert (eliminated_edge->src == region[0]
- && EDGE_COUNT (region[0]->succs) == 2
- && copying_header);
-
- edge e, e_copy, eliminated_

Re: [x86-64] RFC: Add nosse abi attribute

2023-07-11 Thread Richard Biener via Gcc-patches
On Mon, Jul 10, 2023 at 9:08 PM Alexander Monakov via Gcc-patches
 wrote:
>
>
> On Mon, 10 Jul 2023, Michael Matz via Gcc-patches wrote:
>
> > Hello,
> >
> > the ELF psABI for x86-64 doesn't have any callee-saved SSE
> > registers (there were actual reasons for that, but those don't
> > matter anymore).  This starts to hurt some uses, as it means that
> > as soon as you have a call (say to memmove/memcpy, even if
> > implicit as libcall) in a loop that manipulates floating point
> > or vector data you get saves/restores around those calls.
> >
> > But in reality many functions can be written such that they only need
> > to clobber a subset of the 16 XMM registers (or do the save/restore
> > themself in the codepaths that needs them, hello memcpy again).
> > So we want to introduce a way to specify this, via an ABI attribute
> > that basically says "doesn't clobber the high XMM regs".
>
> I think the main question is why you're going with this (weak) form
> instead of the (strong) form "may only clobber the low XMM regs":
> as Richi noted, surely for libcalls we'd like to know they preserve
> AVX-512 mask registers as well?
>
> (I realize this is partially answered later)
>
> Note this interacts with anything that interposes between the caller
> and the callee, like the Glibc lazy binding stub (which used to
> zero out high halves of 512-bit arguments in ZMM registers).
> Not an immediate problem for the patch, just something to mind perhaps.
>
> > I've opted to do only the obvious: do something special only for
> > xmm8 to xmm15, without a way to specify the clobber set in more detail.
> > I think such half/half split is reasonable, and as I don't want to
> > change the argument passing anyway (whose regs are always clobbered)
> > there isn't that much wiggle room anyway.
> >
> > I chose to make it possible to write function definitions with that
> > attribute with GCC adding the necessary callee save/restore code in
> > the xlogue itself.
>
> But you can't trivially restore if the callee is sibcalling — what
> happens then (a testcase might be nice)?
>
> > Carefully note that this is only possible for
> > the SSE2 registers, as other parts of them would need instructions
> > that are only optional.
>
> What is supposed to happen on 32-bit x86 with -msse -mno-sse2?
>
> > When a function doesn't contain calls to
> > unknown functions we can be a bit more lenient: we can make it so that
> > GCC simply doesn't touch xmm8-15 at all, then no save/restore is
> > necessary.
>
> What if the source code has a local register variable bound to xmm15,
> i.e. register double x asm("xmm15"); asm("..." : "+x"(x)); ?
> Probably "don't do that", i.e. disallow that in the documentation?
>
> > If a function contains calls then GCC can't know which
> > parts of the XMM regset is clobbered by that, it may be parts
> > which don't even exist yet (say until avx2048 comes out), so we must
> > restrict ourself to only save/restore the SSE2 parts and then of course
> > can only claim to not clobber those parts.
>
> Hm, I guess this is kinda the reason a "weak" form is needed. But this
> highlights the difference between the two: the "weak" form will actively
> preserve some state (so it cannot preserve future extensions), while
> the "strong" form may just passively not touch any state, preserving
> any state it doesn't know about.
>
> > To that end I introduce actually two related attributes (for naming
> > see below):
> > * nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered
>
> This is the weak/active form; I'd suggest "preserve_high_sse".

Isn't it the opposite?  "preserves_low_sse", unless you suggest
the name applies to the caller which has to preserve high parts
when calling nosseclobber.

> > * noanysseclobber: claims (and ensures) that nothing of any of the
> >   registers overlapping xmm8-15 is clobbered (not even future, as of
> >   yet unknown, parts)
>
> This is the strong/passive form; I'd suggest "only_low_sse".

Likewise.

As for mask registers I understand we'd have to split the 8 register
set into two halves to make the same approach work, otherwise
we'd have no registers left to allocate from.

> > Ensuring the first is simple: potentially add saves/restore in xlogue
> > (e.g. when xmm8 is either used explicitely or implicitely by a call).
> > Ensuring the second comes with more: we must also ensure that no
> > functions are called that don't guarantee the same thing (in addition
> > to just removing all xmm8-15 parts alltogether from the available
> > regsters).
> >
> > See also the added testcases for what I intended to support.
> >
> > I chose to use the new target independend function-abi facility for
> > this.  I need some adjustments in generic code:
> > * the "default_abi" is actually more like a "current" abi: it happily
> >   changes its contents according to conditional_register_usage,
> >   and other code assumes that such changes do propagate.
> >   But if that conditonal_reg_usage is ac

Re: [PATCH] genopinit: Allow more than 256 modes.

2023-07-11 Thread Robin Dapp via Gcc-patches
>   if (NUM_OPTABS > 0x
> || MAX_MACHINE_MODE >= ((1 << MACHINE_MODE_BITSIZE) - 1))
> fatal ("genopinit range assumptions invalid");
> 
> so it would be a case of changing those instead.

Thanks, right at the beginning of the file and I didn't see it ;)
MACHINE_MODE_BITSIZE is already 16, Pan changed that for one of
the previous patches.  Should we bump the NUM_OPTABS to
0x now, i.e. in this patch or only when the need arises?

Regards
 Robin


Re: [PATCH] genopinit: Allow more than 256 modes.

2023-07-11 Thread Richard Biener via Gcc-patches
On Tue, 11 Jul 2023, ??? wrote:

> Thanks for fixing it.
> CC Richards to see whether it is appropriate.

I agree with Richard S., but generally please avoid
'long long' and use stdint types when you need specific
precision.

Richard.

> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Robin Dapp
> Date: 2023-07-11 19:51
> To: gcc-patches
> CC: rdapp.gcc; jeffreyalaw; juzhe.zh...@rivai.ai
> Subject: [PATCH] genopinit: Allow more than 256 modes.
> Hi,
>  
> upcoming changes for RISC-V will have us exceed 256 modes or 8 bits. The
> helper functions in gen* rely on the opcode as well as two modes fitting
> into an unsigned int (a signed int even if we consider the qsort default
> comparison function).  This patch changes the type of the index/hash
> from unsigned int to unsigned long long and allows up to 16 bits for a
> mode as well as 32 bits for an optab.
>  
> Despite fearing worse, bootstrap, build and test suite run times on
> x86, aarch64, rv64 and power10 are actually unchanged (I didn't check
> 32-bit architectures but would expect similar results).
>  
> Regards
> Robin
>  
> gcc/ChangeLog:
>  
> * genopinit.cc (pattern_cmp): Use if/else for comparison instead
> of subtraction.
> (main): Change to unsigned long long.
> * gensupport.cc (find_optab): Ditto.
> * gensupport.h (struct optab_pattern): Ditto.
> * optabs-query.h (optab_handler): Ditto.
> (convert_optab_handler): Ditto.
> ---
> gcc/genopinit.cc   | 19 ---
> gcc/gensupport.cc  |  3 ++-
> gcc/gensupport.h   |  2 +-
> gcc/optabs-query.h |  5 +++--
> 4 files changed, 18 insertions(+), 11 deletions(-)
>  
> diff --git a/gcc/genopinit.cc b/gcc/genopinit.cc
> index 6bd8858a1d9..58c1bf7cba8 100644
> --- a/gcc/genopinit.cc
> +++ b/gcc/genopinit.cc
> @@ -51,7 +51,12 @@ pattern_cmp (const void *va, const void *vb)
> {
>const optab_pattern *a = (const optab_pattern *)va;
>const optab_pattern *b = (const optab_pattern *)vb;
> -  return a->sort_num - b->sort_num;
> +  if (a->sort_num > b->sort_num)
> +return 1;
> +  else if (a->sort_num < b->sort_num)
> +return -1;
> +  else
> +return 0;
> }
> static int
> @@ -306,7 +311,7 @@ main (int argc, const char **argv)
>"extern const struct optab_libcall_d normlib_def[NUM_NORMLIB_OPTABS];\n"
>"\n"
>"/* Returns the active icode for the given (encoded) optab.  */\n"
> -"extern enum insn_code raw_optab_handler (unsigned);\n"
> +"extern enum insn_code raw_optab_handler (unsigned long long);\n"
>"extern bool swap_optab_enable (optab, machine_mode, bool);\n"
>"\n"
>"/* Target-dependent globals.  */\n"
> @@ -358,14 +363,14 @@ main (int argc, const char **argv)
>"#include \"optabs.h\"\n"
>"\n"
>"struct optab_pat {\n"
> -"  unsigned scode;\n"
> +"  unsigned long long scode;\n"
>"  enum insn_code icode;\n"
>"};\n\n");
>fprintf (s_file,
>"static const struct optab_pat pats[NUM_OPTAB_PATTERNS] = {\n");
>for (i = 0; patterns.iterate (i, &p); ++i)
> -fprintf (s_file, "  { %#08x, CODE_FOR_%s },\n", p->sort_num, p->name);
> +fprintf (s_file, "  { %#08llx, CODE_FOR_%s },\n", p->sort_num, p->name);
>fprintf (s_file, "};\n\n");
>fprintf (s_file, "void\ninit_all_optabs (struct target_optabs 
> *optabs)\n{\n");
> @@ -410,7 +415,7 @@ main (int argc, const char **argv)
>   the hash entries, which complicates the pat_enable array.  */
>fprintf (s_file,
>"static int\n"
> -"lookup_handler (unsigned scode)\n"
> +"lookup_handler (unsigned long long scode)\n"
>"{\n"
>"  int l = 0, h = ARRAY_SIZE (pats), m;\n"
>"  while (h > l)\n"
> @@ -428,7 +433,7 @@ main (int argc, const char **argv)
>fprintf (s_file,
>"enum insn_code\n"
> -"raw_optab_handler (unsigned scode)\n"
> +"raw_optab_handler (unsigned long long scode)\n"
>"{\n"
>"  int i = lookup_handler (scode);\n"
>"  return (i >= 0 && this_fn_optabs->pat_enable[i]\n"
> @@ -439,7 +444,7 @@ main (int argc, const char **argv)
>"bool\n"
>"swap_optab_enable (optab op, machine_mode m, bool set)\n"
>"{\n"
> -"  unsigned scode = (op << 16) | m;\n"
> +"  unsigned long long scode = ((unsigned long long)op << 32) | m;\n"
>"  int i = lookup_handler (scode);\n"
>"  if (i >= 0)\n"
>"{\n"
> diff --git a/gcc/gensupport.cc b/gcc/gensupport.cc
> index e39e6dacce2..3fe7428372d 100644
> --- a/gcc/gensupport.cc
> +++ b/gcc/gensupport.cc
> @@ -3806,7 +3806,8 @@ find_optab (optab_pattern *p, const char *name)
> {
>   p->name = name;
>   p->op = optabs[pindex].op;
> -   p->sort_num = (p->op << 16) | (p->m2 << 8) | p->m1;
> +   p->sort_num
> + = ((unsigned long long) p->op << 32) | (p->m2 << 16) | p->m1;
>   return true;
> }
>  }
> diff --git a/gcc/gensupport.h b/gcc/gensupport.h
> index 7925e22ed41..9f70e2310e2 100644
> --- a/gcc/gensupport.h
> +++ b/gcc/gensupport.h
> @@ -123,7 +123,7 @@ struct optab_pattern
>/* An index that provides a lexicographical sort of (OP, M2, M1).
>   Used by genopinit.

Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits

2023-07-11 Thread Richard Biener via Gcc-patches
On Tue, 11 Jul 2023, Jan Hubicka wrote:

> > > By now we did CCP and FRE so we likely optimized out most of constant
> > > conditionals exposed by inline.
> > 
> > So maybe we should simply delay re-propagation of the profile?  I
> > think cunrolli doesn't so much care about the profile - cunrolli
> > is (was) about abstraction removal.  Jump threading should be
> > the first pass to care.
> 
> That is what I was thinking too.  After inlining the profile counts may
> be in quite bad shape. If you inline together loop like in exchange that
> has large loop nest, we will definitely end up capping counts to avoid
> overflow.
> 
> cunrolli does:
> 
>  ret = tree_unroll_loops_completely (optimize >= 3, false);

Ah, yeah - that used to be false, false ...

> which sets may_increase_size to true for -O3 and then
> 
>  may_increase_size && optimize_loop_nest_for_speed_p (loop)
> 
> which seems reasonable guard and it may get random answers on capped
> profile.  It is not big deal to try propagating before cunrolli and then
> again before threading and see how much potential this idea has.
> I guess I should also double check that the other passes are indeed
> safe, but I think it is quite obvious they should be.

Yeah.

Richard.


Re: [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]

2023-07-11 Thread Jakub Jelinek via Gcc-patches
On Thu, Jul 06, 2023 at 03:00:28PM +0200, Richard Biener via Gcc-patches wrote:
> On Wed, Jul 5, 2023 at 3:42 PM Drew Ross via Gcc-patches
>  wrote:
> >
> > Adds a simplification for (~X | Y) ^ X to be folded into ~(X & Y).
> > Tested successfully on x86_64 and x86 targets.
> >
> > PR middle-end/109986
> >
> > gcc/ChangeLog:
> >
> > * match.pd ((~X | Y) ^ X -> ~(X & Y)): New simplification.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.c-torture/execute/pr109986.c: New test.
> > * gcc.dg/tree-ssa/pr109986.c: New test.
> > ---
> >  gcc/match.pd  |  11 ++
> >  .../gcc.c-torture/execute/pr109986.c  |  41 
> >  gcc/testsuite/gcc.dg/tree-ssa/pr109986.c  | 177 ++
> >  3 files changed, 229 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr109986.c
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109986.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index a17d6838c14..d9d7d932881 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -1627,6 +1627,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
> >(convert (bit_and @1 (bit_not @0)
> >
> > +/* (~X | Y) ^ X -> ~(X & Y).  */
> > +(simplify
> > + (bit_xor:c (nop_convert1?
> > + (bit_ior:c (nop_convert2? (bit_not (nop_convert3? @0)))
> > +@1)) (nop_convert4? @0))
> 
> you want to reduce the number of nop_convert? - for example
> I wonder if we can canonicalize
> 
>  (T)~X and ~(T)X
> 
> for nop-conversions.  The same might apply to binary bitwise operations
> where we should push those to a direction where they are likely eliminated.
> Usually we'd push them outwards.
> 
> The issue with the above pattern is that nop_convertN? expands to 2^N
> separate patterns.  Together with the two :c you get 64 out of this.
> 
> I do not see that all of the combinations can happen when X has to
> match unless we fail to contract some of them like if we have
> (unsigned)(~(signed)X | Y) ^ X which we could rewrite like
> -> (unsigned)((signed)~X | Y) ^ X -> (~X | (unsigned) Y) ^ X
> with the last step being somewhat difficult unless we do
> (signed)~X | Y -> (signed)(~X | (unsigned)Y).  It feels like a
> propagation problem and less of a direct pattern matching one.

The nop_convert1? in the pattern might seem to be unnecessary
for cases like:
int i, j, k, l;
unsigned u, v, w, x;

void
foo (void)
{
  int t0 = i;
  int t1 = (~t0) | j;
  x = t1 ^ (unsigned) t0;
  unsigned t2 = u;
  unsigned t3 = (~t2) | v;
  i = ((int) t3) ^ (int) t2;
}
we actually optimize it with or without the nop_convert1? in place,
because we have the
/* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
   when profitable.
...
  (bitop (convert@2 @0) (convert?@3 @1))
...
   (convert (bitop @0 (convert @1)
simplification.
Except that on
void
bar (void)
{
  unsigned t0 = u;
  int t1 = (~(int) t0) | j;
  x = t1 ^ t0;
  int t2 = i;
  unsigned t3 = (~(unsigned) t2) | v;
  i = ((int) t3) ^ t2;
}
the optimization doesn't trigger without the nop_convert1? and does
with it.

Perhaps we could get rid of nop_convert3? and nop_convert4?
by introducing a macro/inline function predicate like:
bitwise_equal_p (expr1, expr2) and instead of using
(nop_convert3? @0) and (nop_convert4? @0) in the pattern
use @0 and @2 and then add
if (bitwise_equal_p (@0, @2))
to the condition.
For GENERIC (i.e. in generic-match-head.cc) it could be something like:
static inline bool
bitwise_equal_p (tree expr1, tree expr2)
{
  STRIP_NOPS (expr1);
  STRIP_NOPS (expr2);
  if (expr1 == expr2)
return true;
  if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
return false;
  if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
return wi::to_wide (expr1) == wi::to_wide (expr2);
  return operand_equal_p (expr1, expr2, 0);
}
(the INTEGER_CST special case because operand_equal_p compares wi::to_widest
which could be different if one constant is signed and the other unsigned).
For GIMPLE, I wonder if it shouldn't be a macro that takes valueize into
account, and do something like:
#define bitwise_equal_p(expr1, expr2) gimple_bitwise_equal_p (expr1, expr2, 
valueize)

bool gimple_nop_convert (tree, tree *, tree (*)(tree));

static inline bool
gimple_bitwise_equal_p (tree expr1, tree expr2, tree (*valueize) (tree))
{
  if (expr1 == expr2)
return true;
  if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
return false;
  if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
return wi::to_wide (expr1) == wi::to_wide (expr2);
  if (operand_equal_p (expr1, expr2, 0))
return true;
  tree expr3, expr4;
  if (!gimple_nop_convert (expr1, &expr3, valueize))
expr3 = expr1;
  if (!gimple_nop_convert (expr2, &expr4, valueize))
expr4 = expr2;
  if (expr1 != expr3)
{
  if (operand_

Re: [PATCH] fortran: Release symbols in reversed order [PR106050]

2023-07-11 Thread Paul Richard Thomas via Gcc-patches
Hi Mikhail,

That's more than OK by me.

Thanks for attacking this PR.

I have a couple more of Steve's orphans waiting to be packaged up -
91960 and 104649. I'll submit them this evening. 100607 is closed-fixed
and 103796 seems to be fixed.

Regards

Paul

On Tue, 11 Jul 2023 at 13:08, Mikael Morin via Fortran
 wrote:
>
> Hello,
>
> I saw the light regarding this PR after Paul posted a comment yesterday.
>
> Regression test in progress on x86_64-pc-linux-gnu.
> I plan to push in the next hours.
>
> Mikael
>
> -- >8 --
>
> Release symbols in reversed order wrt the order they were allocated.
> This fixes an error recovery ICE in the case of a misplaced
> derived type declaration.  Such a declaration creates nested
> symbols, one for the derived type and one for each type parameter,
> which should be immediately released as the declaration is
> rejected.  This breaks if the derived type is released first.
> As the type parameter symbols are in the namespace of the derived
> type, releasing the derived type releases the type parameters, so
> one can't access them after that, even to release them.  Hence,
> the type parameters should be released first.
>
> PR fortran/106050
>
> gcc/fortran/ChangeLog:
>
> * symbol.cc (gfc_restore_last_undo_checkpoint): Release symbols
> in reverse order.
>
> gcc/testsuite/ChangeLog:
>
> * gfortran.dg/pdt_33.f90: New test.
> ---
>  gcc/fortran/symbol.cc|  2 +-
>  gcc/testsuite/gfortran.dg/pdt_33.f90 | 15 +++
>  2 files changed, 16 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gfortran.dg/pdt_33.f90
>
> diff --git a/gcc/fortran/symbol.cc b/gcc/fortran/symbol.cc
> index 37a9e8fa0ae..4a71d84b3fe 100644
> --- a/gcc/fortran/symbol.cc
> +++ b/gcc/fortran/symbol.cc
> @@ -3661,7 +3661,7 @@ gfc_restore_last_undo_checkpoint (void)
>gfc_symbol *p;
>unsigned i;
>
> -  FOR_EACH_VEC_ELT (latest_undo_chgset->syms, i, p)
> +  FOR_EACH_VEC_ELT_REVERSE (latest_undo_chgset->syms, i, p)
>  {
>/* Symbol in a common block was new. Or was old and just put in common 
> */
>if (p->common_block
> diff --git a/gcc/testsuite/gfortran.dg/pdt_33.f90 
> b/gcc/testsuite/gfortran.dg/pdt_33.f90
> new file mode 100644
> index 000..0521513f2f8
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/pdt_33.f90
> @@ -0,0 +1,15 @@
> +! { dg-do compile }
> +!
> +! PR fortran/106050
> +! The following used to trigger an error recovery ICE by releasing
> +! the symbol T before the symbol K which was leading to releasing
> +! K twice as it's in T's namespace.
> +!
> +! Contributed by G. Steinmetz 
> +
> +program p
> +   a = 1
> +   type t(k)  ! { dg-error "Unexpected derived type 
> declaration" }
> +  integer, kind :: k = 4  ! { dg-error "not allowed outside a TYPE 
> definition" }
> +   end type   ! { dg-error "Expecting END PROGRAM" }
> +end
> --
> 2.40.1
>


--
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein


Re: [x86-64] RFC: Add nosse abi attribute

2023-07-11 Thread Jan Hubicka via Gcc-patches
> > > When a function doesn't contain calls to
> > > unknown functions we can be a bit more lenient: we can make it so that
> > > GCC simply doesn't touch xmm8-15 at all, then no save/restore is
> > > necessary.

One may also take into account that the first 8 registers are cheaper to
encode than the latter 8, so perhaps we want to choose a range that
contains both.

Honza


[PATCH] genopinit: Allow more than 256 modes.

2023-07-11 Thread Robin Dapp via Gcc-patches
Ok, so the consensus is rather to stay with 32 bits and only change the
shifts to 10/20?  As MACHINE_MODE_BITSIZE is already 16, we would need an
additional check independent of that.  Wouldn't that also be a bit
confusing?

Attached is a "v2" with unsigned long long changed to
uint64_t and checking for NUM_OPTABS <= 0x.

Regards
 Robin

Upcoming changes for RISC-V will have us exceed 256 modes or 8 bits. The
helper functions in gen* rely on the opcode as well as two modes fitting
into an unsigned int (a signed int even if we consider the qsort default
comparison function).  This patch changes the type of the index/hash
from unsigned int to uint64_t and allows up to 16 bits for a mode as well
as 32 bits for an optab.

Despite fearing worse, bootstrap, build and test suite runtimes are
actually unchanged (on 64-bit architectures that is).

gcc/ChangeLog:

* genopinit.cc (pattern_cmp): Use if/else for comparison instead
of subtraction.
(main): Change to uint64_t.
* gensupport.cc (find_optab): Ditto.
* gensupport.h (struct optab_pattern): Ditto.
* optabs-query.h (optab_handler): Ditto.
(convert_optab_handler): Ditto.
---
 gcc/genopinit.cc   | 21 +
 gcc/gensupport.cc  |  2 +-
 gcc/gensupport.h   |  2 +-
 gcc/optabs-query.h |  4 ++--
 4 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/gcc/genopinit.cc b/gcc/genopinit.cc
index 6bd8858a1d9..05316ccb409 100644
--- a/gcc/genopinit.cc
+++ b/gcc/genopinit.cc
@@ -51,7 +51,12 @@ pattern_cmp (const void *va, const void *vb)
 {
   const optab_pattern *a = (const optab_pattern *)va;
   const optab_pattern *b = (const optab_pattern *)vb;
-  return a->sort_num - b->sort_num;
+  if (a->sort_num > b->sort_num)
+return 1;
+  else if (a->sort_num < b->sort_num)
+return -1;
+  else
+return 0;
 }
 
 static int
@@ -182,7 +187,7 @@ main (int argc, const char **argv)
 
   progname = "genopinit";
 
-  if (NUM_OPTABS > 0x
+  if (NUM_OPTABS > 0x
 || MAX_MACHINE_MODE >= ((1 << MACHINE_MODE_BITSIZE) - 1))
 fatal ("genopinit range assumptions invalid");
 
@@ -306,7 +311,7 @@ main (int argc, const char **argv)
   "extern const struct optab_libcall_d 
normlib_def[NUM_NORMLIB_OPTABS];\n"
   "\n"
   "/* Returns the active icode for the given (encoded) optab.  */\n"
-  "extern enum insn_code raw_optab_handler (unsigned);\n"
+  "extern enum insn_code raw_optab_handler (uint64_t);\n"
   "extern bool swap_optab_enable (optab, machine_mode, bool);\n"
   "\n"
   "/* Target-dependent globals.  */\n"
@@ -358,14 +363,14 @@ main (int argc, const char **argv)
   "#include \"optabs.h\"\n"
   "\n"
   "struct optab_pat {\n"
-  "  unsigned scode;\n"
+  "  uint64_t scode;\n"
   "  enum insn_code icode;\n"
   "};\n\n");
 
   fprintf (s_file,
   "static const struct optab_pat pats[NUM_OPTAB_PATTERNS] = {\n");
   for (i = 0; patterns.iterate (i, &p); ++i)
-fprintf (s_file, "  { %#08x, CODE_FOR_%s },\n", p->sort_num, p->name);
+fprintf (s_file, "  { %#08llx, CODE_FOR_%s },\n", p->sort_num, p->name);
   fprintf (s_file, "};\n\n");
 
   fprintf (s_file, "void\ninit_all_optabs (struct target_optabs 
*optabs)\n{\n");
@@ -410,7 +415,7 @@ main (int argc, const char **argv)
  the hash entries, which complicates the pat_enable array.  */
   fprintf (s_file,
   "static int\n"
-  "lookup_handler (unsigned scode)\n"
+  "lookup_handler (uint64_t scode)\n"
   "{\n"
   "  int l = 0, h = ARRAY_SIZE (pats), m;\n"
   "  while (h > l)\n"
@@ -428,7 +433,7 @@ main (int argc, const char **argv)
 
   fprintf (s_file,
   "enum insn_code\n"
-  "raw_optab_handler (unsigned scode)\n"
+  "raw_optab_handler (uint64_t scode)\n"
   "{\n"
   "  int i = lookup_handler (scode);\n"
   "  return (i >= 0 && this_fn_optabs->pat_enable[i]\n"
@@ -439,7 +444,7 @@ main (int argc, const char **argv)
   "bool\n"
   "swap_optab_enable (optab op, machine_mode m, bool set)\n"
   "{\n"
-  "  unsigned scode = (op << 16) | m;\n"
+  "  uint64_t scode = ((uint64_t)op << 32) | m;\n"
   "  int i = lookup_handler (scode);\n"
   "  if (i >= 0)\n"
   "{\n"
diff --git a/gcc/gensupport.cc b/gcc/gensupport.cc
index e39e6dacce2..68df90fce58 100644
--- a/gcc/gensupport.cc
+++ b/gcc/gensupport.cc
@@ -3806,7 +3806,7 @@ find_optab (optab_pattern *p, const char *name)
   

[PATCH] Fix typo in the testcase.

2023-07-11 Thread liuhongt via Gcc-patches
Antony Polukhin 2023-07-11 09:51:58 UTC
There's a typo at 
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/testsuite/g%2B%2B.target/i386/pr110170.C;h=e638b12a5ee2264ecef77acca86432a9f24b103b;hb=d41a57c46df6f8f7dae0c0a8b349e734806a837b#l87

It should be `|| !test3() || !test3r()` rather than `|| !test3() || !test4r()`

Committed as an obvious fix.

gcc/testsuite/ChangeLog:

* g++.target/i386/pr110170.C: Fix typo.
---
 gcc/testsuite/g++.target/i386/pr110170.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.target/i386/pr110170.C 
b/gcc/testsuite/g++.target/i386/pr110170.C
index e638b12a5ee..21cca8f3805 100644
--- a/gcc/testsuite/g++.target/i386/pr110170.C
+++ b/gcc/testsuite/g++.target/i386/pr110170.C
@@ -84,7 +84,7 @@ TEST()
   if (
   !test1() || !test1r()
   || !test2() || !test2r()
-  || !test3() || !test4r()
+  || !test3() || !test3r()
   || !test4() || !test4r()
   ) __builtin_abort();
 }
-- 
2.39.1.388.g2fc9e9ca3c



Re: [PATCH] c++: coercing variable template from current inst [PR110580]

2023-07-11 Thread Jason Merrill via Gcc-patches
Ok.

On Tue, Jul 11, 2023, 9:16 AM Patrick Palka  wrote:

> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk?
>
> -- >8 --
>
> Here during ahead of time coercion of the variable template-id v1,
> since we pass only the innermost arguments to coerce_template_parms (and
> outer arguments are still dependent at this point), substitution of the
> default template argument V=U prematurely lowers U from level 2 to level 1.
> Thus we incorrectly resolve v1<int> to v1<int, int> (effectively) instead
> of to v1<int, U>.
>
> Coercion of a class/alias template-id on the other hand is always done
> using the full set of arguments relative to the most general template,
> so ahead of time coercion there does the right thing.  I suppose we
> should do the same for variable template-ids.
>
> PR c++/110580
>
> gcc/cp/ChangeLog:
>
> * pt.cc (lookup_template_variable): Pass all arguments to
> coerce_template_parms, and use the innermost parameters from
> the most general template.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/cpp1y/var-templ83.C: New test.
> ---
>  gcc/cp/pt.cc |  4 +++-
>  gcc/testsuite/g++.dg/cpp1y/var-templ83.C | 16 
>  2 files changed, 19 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ83.C
>
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 076f788281e..fa15b75b9c5 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -10345,7 +10345,9 @@ lookup_template_variable (tree templ, tree
> arglist, tsubst_flags_t complain)
>if (flag_concepts && variable_concept_p (templ))
>  return build_concept_check (templ, arglist, tf_none);
>
> -  tree parms = DECL_INNERMOST_TEMPLATE_PARMS (templ);
> +  tree gen_templ = most_general_template (templ);
> +  tree parms = DECL_INNERMOST_TEMPLATE_PARMS (gen_templ);
> +  arglist = add_outermost_template_args (templ, arglist);
>arglist = coerce_template_parms (parms, arglist, templ, complain);
>if (arglist == error_mark_node)
>  return error_mark_node;
> diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ83.C
> b/gcc/testsuite/g++.dg/cpp1y/var-templ83.C
> new file mode 100644
> index 000..f5268f258d7
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp1y/var-templ83.C
> @@ -0,0 +1,16 @@
> +// PR c++/110580
> +// { dg-do compile { target c++14 } }
> +
> +template
> +struct A {
> +  template
> +  static constexpr bool v1 = __is_same(U, V);
> +
> +  template
> +  static constexpr bool v2 = !__is_same(U, V);
> +
> +  static_assert(v1, "");
> +  static_assert(v2, "");
> +};
> +
> +template struct A;
> --
> 2.41.0.327.gaa9166bcc0
>
>


Re: [PATCH] m68k: Avoid implicit function declaration in libgcc

2023-07-11 Thread Jeff Law via Gcc-patches




On 7/11/23 03:38, Florian Weimer via Gcc-patches wrote:

libgcc/

* config/m68k/fpgnulib.c (__cmpdf2): Declare.

OK.
jeff


Re: [PATCH] genopinit: Allow more than 256 modes.

2023-07-11 Thread 钟居哲
For example:

https://godbolt.org/z/1d6v5WKhY 

Clang can vectorize but GCC failed even with -ffast-math.
So I think conversions should be well checked again to make sure every variant 
can vectorize.

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-07-11 21:31
To: Robin Dapp via Gcc-patches; jeffreyalaw; juzhe.zh...@rivai.ai; 
richard.sandiford; Richard Biener
CC: rdapp.gcc
Subject: [PATCH] genopinit: Allow more than 256 modes.
Ok so the consensus seems to rather stay with 32 bits and only
change the shift to 10/20?  As MACHINE_MODE_BITSIZE is already
16 we would need an additional check independent of that.
Wouldn't that also be a bit confusing?
 
Attached is a "v2" with unsigned long long changed to
uint64_t and checking for NUM_OPTABS <= 0x.
 
Regards
Robin
 
Upcoming changes for RISC-V will have us exceed 256 modes or 8 bits. The
helper functions in gen* rely on the opcode as well as two modes fitting
into an unsigned int (a signed int even if we consider the qsort default
comparison function).  This patch changes the type of the index/hash
from unsigned int to uint64_t and allows up to 16 bits for a mode as well
as 32 bits for an optab.
 
Despite fearing worse, bootstrap, build and test suite runtimes are
actually unchanged (on 64-bit architectures that is).
 
gcc/ChangeLog:
 
* genopinit.cc (pattern_cmp): Use if/else for comparison instead
of subtraction.
(main): Change to uint64_t.
* gensupport.cc (find_optab): Ditto.
* gensupport.h (struct optab_pattern): Ditto.
* optabs-query.h (optab_handler): Ditto.
(convert_optab_handler): Ditto.
---
gcc/genopinit.cc   | 21 +
gcc/gensupport.cc  |  2 +-
gcc/gensupport.h   |  2 +-
gcc/optabs-query.h |  4 ++--
4 files changed, 17 insertions(+), 12 deletions(-)
 
diff --git a/gcc/genopinit.cc b/gcc/genopinit.cc
index 6bd8858a1d9..05316ccb409 100644
--- a/gcc/genopinit.cc
+++ b/gcc/genopinit.cc
@@ -51,7 +51,12 @@ pattern_cmp (const void *va, const void *vb)
{
   const optab_pattern *a = (const optab_pattern *)va;
   const optab_pattern *b = (const optab_pattern *)vb;
-  return a->sort_num - b->sort_num;
+  if (a->sort_num > b->sort_num)
+return 1;
+  else if (a->sort_num < b->sort_num)
+return -1;
+  else
+return 0;
}
static int
@@ -182,7 +187,7 @@ main (int argc, const char **argv)
   progname = "genopinit";
-  if (NUM_OPTABS > 0x
+  if (NUM_OPTABS > 0x
 || MAX_MACHINE_MODE >= ((1 << MACHINE_MODE_BITSIZE) - 1))
  fatal ("genopinit range assumptions invalid");
@@ -306,7 +311,7 @@ main (int argc, const char **argv)
   "extern const struct optab_libcall_d normlib_def[NUM_NORMLIB_OPTABS];\n"
   "\n"
   "/* Returns the active icode for the given (encoded) optab.  */\n"
-"extern enum insn_code raw_optab_handler (unsigned);\n"
+"extern enum insn_code raw_optab_handler (uint64_t);\n"
   "extern bool swap_optab_enable (optab, machine_mode, bool);\n"
   "\n"
   "/* Target-dependent globals.  */\n"
@@ -358,14 +363,14 @@ main (int argc, const char **argv)
   "#include \"optabs.h\"\n"
   "\n"
   "struct optab_pat {\n"
-"  unsigned scode;\n"
+"  uint64_t scode;\n"
   "  enum insn_code icode;\n"
   "};\n\n");
   fprintf (s_file,
   "static const struct optab_pat pats[NUM_OPTAB_PATTERNS] = {\n");
   for (i = 0; patterns.iterate (i, &p); ++i)
-fprintf (s_file, "  { %#08x, CODE_FOR_%s },\n", p->sort_num, p->name);
+fprintf (s_file, "  { %#08llx, CODE_FOR_%s },\n", p->sort_num, p->name);
   fprintf (s_file, "};\n\n");
   fprintf (s_file, "void\ninit_all_optabs (struct target_optabs 
*optabs)\n{\n");
@@ -410,7 +415,7 @@ main (int argc, const char **argv)
  the hash entries, which complicates the pat_enable array.  */
   fprintf (s_file,
   "static int\n"
-"lookup_handler (unsigned scode)\n"
+"lookup_handler (uint64_t scode)\n"
   "{\n"
   "  int l = 0, h = ARRAY_SIZE (pats), m;\n"
   "  while (h > l)\n"
@@ -428,7 +433,7 @@ main (int argc, const char **argv)
   fprintf (s_file,
   "enum insn_code\n"
-"raw_optab_handler (unsigned scode)\n"
+"raw_optab_handler (uint64_t scode)\n"
   "{\n"
   "  int i = lookup_handler (scode);\n"
   "  return (i >= 0 && this_fn_optabs->pat_enable[i]\n"
@@ -439,7 +444,7 @@ main (int argc, const char **argv)
   "bool\n"
   "swap_optab_enable (optab op, machine_mode m, bool set)\n"
   "{\n"
-"  unsigned scode = (op << 16) | m;\n"
+"  uint64_t scode = ((uint64_t)op << 32) | m;\n"
   "  int i = lookup_handler (scode);\n"
   "  if (i >= 0)\n"
   "{\n"
diff --git a/gcc/gensupport.cc b/gcc/gensupport.cc
index e39e6dacce2..68df90fce58 100644
--- a/gcc/gensup

Re: Re: [PATCH] genopinit: Allow more than 256 modes.

2023-07-11 Thread 钟居哲
Sorry for sending incorrect email.
Forget about this:).



juzhe.zh...@rivai.ai
 

Re: [PATCH] csky: Fix -Wincompatible-pointer-types warning during libgcc build

2023-07-11 Thread Jeff Law via Gcc-patches




On 7/11/23 03:38, Florian Weimer via Gcc-patches wrote:

libgcc/

* config/csky/linux-unwind.h (csky_fallback_frame_state): Add
missing cast.

OK
jeff


Re: [PATCH] riscv: Fix -Wincompatible-pointer-types warning during libgcc build

2023-07-11 Thread Jeff Law via Gcc-patches




On 7/11/23 03:38, Florian Weimer via Gcc-patches wrote:

libgcc/

* config/riscv/linux-unwind.h (riscv_fallback_frame_state): Add
missing cast.

OK
jeff


Re: [PATCH] arc: Fix -Wincompatible-pointer-types warning during libgcc build

2023-07-11 Thread Jeff Law via Gcc-patches




On 7/11/23 03:39, Florian Weimer via Gcc-patches wrote:

libgcc/

* config/arc/linux-unwind.h (arc_fallback_frame_state): Add
missing cast.

OK
jeff


Re: [PATCH] or1k: Fix -Wincompatible-pointer-types warning during libgcc build

2023-07-11 Thread Jeff Law via Gcc-patches




On 7/11/23 03:39, Florian Weimer via Gcc-patches wrote:

libgcc/

* config/or1k/linux-unwind.h (or1k_fallback_frame_state): Add
missing cast.

OK
jeff


Re: [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]

2023-07-11 Thread Richard Biener via Gcc-patches
On Tue, Jul 11, 2023 at 3:08 PM Jakub Jelinek  wrote:
>
> On Thu, Jul 06, 2023 at 03:00:28PM +0200, Richard Biener via Gcc-patches 
> wrote:
> > On Wed, Jul 5, 2023 at 3:42 PM Drew Ross via Gcc-patches
> >  wrote:
> > >
> > > Adds a simplification for (~X | Y) ^ X to be folded into ~(X & Y).
> > > Tested successfully on x86_64 and x86 targets.
> > >
> > > PR middle-end/109986
> > >
> > > gcc/ChangeLog:
> > >
> > > * match.pd ((~X | Y) ^ X -> ~(X & Y)): New simplification.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.c-torture/execute/pr109986.c: New test.
> > > * gcc.dg/tree-ssa/pr109986.c: New test.
> > > ---
> > >  gcc/match.pd  |  11 ++
> > >  .../gcc.c-torture/execute/pr109986.c  |  41 
> > >  gcc/testsuite/gcc.dg/tree-ssa/pr109986.c  | 177 ++
> > >  3 files changed, 229 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr109986.c
> > >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109986.c
> > >
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index a17d6838c14..d9d7d932881 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -1627,6 +1627,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
> > >(convert (bit_and @1 (bit_not @0)
> > >
> > > +/* (~X | Y) ^ X -> ~(X & Y).  */
> > > +(simplify
> > > + (bit_xor:c (nop_convert1?
> > > + (bit_ior:c (nop_convert2? (bit_not (nop_convert3? @0)))
> > > +@1)) (nop_convert4? @0))
> >
> > you want to reduce the number of nop_convert? - for example
> > I wonder if we can canonicalize
> >
> >  (T)~X and ~(T)X
> >
> > for nop-conversions.  The same might apply to binary bitwise operations
> > where we should push those to a direction where they are likely eliminated.
> > Usually we'd push them outwards.
> >
> > The issue with the above pattern is that nop_convertN? expands to 2^N
> > separate patterns.  Together with the two :c you get 64 out of this.
> >
> > I do not see that all of the combinations can happen when X has to
> > match unless we fail to contract some of them like if we have
> > (unsigned)(~(signed)X | Y) ^ X which we could rewrite like
> > -> (unsigned)((signed)~X | Y) ^ X -> (~X | (unsigned) Y) ^ X
> > with the last step being somewhat difficult unless we do
> > (signed)~X | Y -> (signed)(~X | (unsigned)Y).  It feels like a
> > propagation problem and less of a direct pattern matching one.
>
> The nop_convert1? in the pattern might seem to be unnecessary
> for cases like:
> int i, j, k, l;
> unsigned u, v, w, x;
>
> void
> foo (void)
> {
>   int t0 = i;
>   int t1 = (~t0) | j;
>   x = t1 ^ (unsigned) t0;
>   unsigned t2 = u;
>   unsigned t3 = (~t2) | v;
>   i = ((int) t3) ^ (int) t2;
> }
> we actually optimize it with or without the nop_convert1? in place,
> because we have the
> /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
>when profitable.
> ...
>   (bitop (convert@2 @0) (convert?@3 @1))
> ...
>(convert (bitop @0 (convert @1)
> simplification.
> Except that on
> void
> bar (void)
> {
>   unsigned t0 = u;
>   int t1 = (~(int) t0) | j;
>   x = t1 ^ t0;
>   int t2 = i;
>   unsigned t3 = (~(unsigned) t2) | v;
>   i = ((int) t3) ^ t2;
> }
> the optimization doesn't trigger without the nop_convert1? and does
> with it.
>
> Perhaps we could get rid of nop_convert3? and nop_convert4?
> by introducing a macro/inline function predicate like:
> bitwise_equal_p (expr1, expr2) and instead of using
> (nop_convert3? @0) and (nop_convert4? @0) in the pattern
> use @0 and @2 and then add
> if (bitwise_equal_p (@0, @2))
> to the condition.
> For GENERIC (i.e. in generic-match-head.cc) it could be something like:
> static inline bool
> bitwise_equal_p (tree expr1, tree expr2)
> {
>   STRIP_NOPS (expr1);
>   STRIP_NOPS (expr2);
>   if (expr1 == expr2)
> return true;
>   if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
> return false;
>   if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
> return wi::to_wide (expr1) == wi::to_wide (expr2);
>   return operand_equal_p (expr1, expr2, 0);
> }
> (the INTEGER_CST special case because operand_equal_p compares wi::to_widest
> which could be different if one constant is signed and the other unsigned).
> For GIMPLE, I wonder if it shouldn't be a macro that takes valueize into
> account, and do something like:
> #define bitwise_equal_p(expr1, expr2) gimple_bitwise_equal_p (expr1, expr2, 
> valueize)
>
> bool gimple_nop_convert (tree, tree *, tree (*)(tree));
>
> static inline bool
> gimple_bitwise_equal_p (tree expr1, tree expr2, tree (*valueize) (tree))
> {
>   if (expr1 == expr2)
> return true;
>   if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
> return false;
>   if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
>

Re: [x86-64] RFC: Add nosse abi attribute

2023-07-11 Thread Alexander Monakov via Gcc-patches


On Tue, 11 Jul 2023, Richard Biener wrote:

> > > If a function contains calls then GCC can't know which
> > > parts of the XMM regset is clobbered by that, it may be parts
> > > which don't even exist yet (say until avx2048 comes out), so we must
> > > restrict ourself to only save/restore the SSE2 parts and then of course
> > > can only claim to not clobber those parts.
> >
> > Hm, I guess this is kinda the reason a "weak" form is needed. But this
> > highlights the difference between the two: the "weak" form will actively
> > preserve some state (so it cannot preserve future extensions), while
> > the "strong" form may just passively not touch any state, preserving
> > any state it doesn't know about.
> >
> > > To that end I introduce actually two related attributes (for naming
> > > see below):
> > > * nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered
> >
> > This is the weak/active form; I'd suggest "preserve_high_sse".
> 
> Isn't it the opposite?  "preserves_low_sse", unless you suggest
> the name applies to the caller which has to preserve high parts
> when calling nosseclobber.

This is the form where the function annotated with this attribute
consumes 128 bytes on the stack to "blindly" save/restore xmm8-15
if it calls anything with a vanilla ABI.

(actually thinking about it more, I'd like to suggest shelving this part
and only implement the zero-cost variant, noanysseclobber)

> > > * noanysseclobber: claims (and ensures) that nothing of any of the
> > >   registers overlapping xmm8-15 is clobbered (not even future, as of
> > >   yet unknown, parts)
> >
> > This is the strong/passive form; I'd suggest "only_low_sse".
> 
> Likewise.

Sorry if I managed to sow confusion here. In my mind, this is the form where
only xmm0-xmm7 can be written in the function annotated with the attribute,
including its callees. I was thinking that writing to zmm16-31 would be
disallowed too. The initial example was memcpy, where eight vector registers
are sufficient for the job.

> As for mask registers I understand we'd have to split the 8 register
> set into two halves to make the same approach work, otherwise
> we'd have no registers left to allocate from.

I'd suggest to look how many mask registers OpenMP SIMD AVX-512 clones
can receive as implicit arguments, as one data point.

Alexander


Re: [x86-64] RFC: Add nosse abi attribute

2023-07-11 Thread Michael Matz via Gcc-patches
Hello,

On Tue, 11 Jul 2023, Jan Hubicka wrote:

> > > > When a function doesn't contain calls to
> > > > unknown functions we can be a bit more lenient: we can make it so that
> > > > GCC simply doesn't touch xmm8-15 at all, then no save/restore is
> > > > necessary.
> 
> One may also take into account that first 8 registers are cheaper to
> encode than the later 8, so perhaps we may want to choose range that
> contains both.

There is actually none in the low range that's usable.  xmm0/1 are used 
for return values and xmm2-7 are used for argument passing.  Arguments are 
by default callee clobbered, and we do not want to change this (or limit 
the number of register arguments for the alternate ABI).


Ciao,
Michael.


Re: [Patch] libgomp: Update OpenMP memory allocation doc, fix omp_high_bw_mem_space

2023-07-11 Thread Tobias Burnus

I have now committed this (mostly .texi) patch as Rev.
r14-2434-g8c2fc744a25ec4

Changes to my previously posted version: Fixed a typo in .texi and in
the changelog, tweaked the wording for {nearest} to sound better and to
provide more details.

Tobias

On 11.07.23 00:07, Tobias Burnus wrote:

I noted that all memory spaces are supported, some by falling
back to the default ("malloc") - except for omp_high_bw_mem_space
(unless the memkind lib is available).

I think it makes more sense to fallback to 'malloc' also for
omp_high_bw_mem_space.

Additionally, I updated the documentation to more explicitly state
what the current implementation is.

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
commit 8c2fc744a25ec423ab1a317bf4e7d24315c40024
Author: Tobias Burnus 
Date:   Tue Jul 11 16:11:35 2023 +0200

libgomp: Update OpenMP memory allocation doc, fix omp_high_bw_mem_space

libgomp/

* allocator.c (omp_init_allocator): Use malloc for
omp_high_bw_mem_space when the memkind lib is unavailable
instead of returning omp_null_allocator.
* libgomp.texi (OpenMP 5.0): Fix typo.
(Memory allocation with libmemkind): Document implementation
in more detail.
---
 libgomp/allocator.c  |  2 +-
 libgomp/libgomp.texi | 30 --
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index c49931cbad4..25c0f150302 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -301,7 +301,7 @@ omp_init_allocator (omp_memspace_handle_t memspace, int ntraits,
 	  break;
 	}
 #endif
-  return omp_null_allocator;
+  break;
 case omp_large_cap_mem_space:
 #ifdef LIBGOMP_USE_MEMKIND
   memkind_data = gomp_get_memkind ();
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 7d27cc50df5..d1a5e67329a 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -192,7 +192,7 @@ The OpenMP 4.5 specification is fully supported.
   env variable @tab Y @tab
 @item Nested-parallel changes to @var{max-active-levels-var} ICV @tab Y @tab
 @item @code{requires} directive @tab P
-  @tab complete but no non-host devices provides @code{unified_shared_memory}
+  @tab complete but no non-host device provides @code{unified_shared_memory}
 @item @code{teams} construct outside an enclosing target region @tab Y @tab
 @item Non-rectangular loop nests @tab P @tab Full support for C/C++, partial for Fortran
 @item @code{!=} as relational-op in canonical loop form for C/C++ @tab Y @tab
@@ -4634,6 +4634,17 @@ smaller number.  On non-host devices, the value of the
 @node Memory allocation with libmemkind
 @section Memory allocation with libmemkind
 
+For the memory spaces, the following applies:
+@itemize
+@item @code{omp_default_mem_space} is supported
+@item @code{omp_const_mem_space} maps to @code{omp_default_mem_space}
+@item @code{omp_low_lat_mem_space} maps to @code{omp_default_mem_space}
+@item @code{omp_large_cap_mem_space} maps to @code{omp_default_mem_space},
+  unless the memkind library is available
+@item @code{omp_high_bw_mem_space} maps to @code{omp_default_mem_space},
+  unless the memkind library is available
+@end itemize
+
 On Linux systems, where the @uref{https://github.com/memkind/memkind, memkind
 library} (@code{libmemkind.so.0}) is available at runtime, it is used when
 creating memory allocators requesting
@@ -4641,9 +4652,24 @@ creating memory allocators requesting
 @itemize
 @item the memory space @code{omp_high_bw_mem_space}
 @item the memory space @code{omp_large_cap_mem_space}
-@item the partition trait @code{omp_atv_interleaved}
+@item the partition trait @code{omp_atv_interleaved}; note that for
+  @code{omp_large_cap_mem_space} the allocation will not be interleaved
 @end itemize
 
+Additional notes:
+@itemize
+@item The @code{pinned} trait is unsupported.
+@item For the @code{partition} trait, the partition part size will be the same
+  as the requested size (i.e. @code{interleaved} or @code{blocked} has no
+  effect), except for @code{interleaved} when the memkind library is
+  available.  Furthermore, for @code{nearest} the memory might not be
+  on the same NUMA node as the thread that allocated the memory; on Linux,
+  this is in particular the case when the memory placement policy is
+  set to preferred.
+@item The @code{access} trait has no effect such that memory is always
+  accessible by all threads.
+@item The @code{sync_hint} trait has no effect.
+@end itemize
 
 @c -
 @c Offload-Target Specifics


Re: [PATCH] aarch64: Fix warnings during libgcc build

2023-07-11 Thread Richard Earnshaw (lists) via Gcc-patches

On 11/07/2023 10:37, Florian Weimer via Gcc-patches wrote:

libgcc/

* config/aarch64/aarch64-unwind.h (aarch64_cie_signed_with_b_key):
Add missing const qualifier.  Cast from const unsigned char *
to const char *.  Use __builtin_strchr to avoid an implicit
function declaration.
* config/aarch64/linux-unwind.h (aarch64_fallback_frame_state):
Add missing cast.

---
diff --git a/libgcc/config/aarch64/linux-unwind.h 
b/libgcc/config/aarch64/linux-unwind.h
index 00eba866049..93da7a9537d 100644
--- a/libgcc/config/aarch64/linux-unwind.h
+++ b/libgcc/config/aarch64/linux-unwind.h
@@ -77,7 +77,7 @@ aarch64_fallback_frame_state (struct _Unwind_Context *context,
  }
  
rt_ = context->cfa;

-  sc = &rt_->uc.uc_mcontext;
+  sc = (struct sigcontext *) &rt_->uc.uc_mcontext;
  
  /* This define duplicates the definition in aarch64.md */

  #define SP_REGNUM 31




This looks somewhat dubious.  I'm not particularly familiar with the 
kernel headers, but a quick look suggests an mcontext_t is nothing like 
a sigcontext_t.  So isn't the cast just papering over some more 
fundamental problem?


R.

