Re: [PATCH v2] add explicit ABI and align options to pr88233.c

2024-05-25 Thread Alexandre Oliva
On Apr 22, 2024, Alexandre Oliva  wrote:

> for  gcc/testsuite/ChangeLog

>   * gcc.target/powerpc/pr88233.c: Make some alignment strictness
>   and calling conventions assumptions explicit.  Restore uniform
>   codegen expectations

Ping?  https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649823.html

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


[pushed] wwwdocs: git: Fix typo

2024-05-25 Thread Gerald Pfeifer
Trivial fix. Pushed.
 
Gerald
---
 htdocs/git.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/git.html b/htdocs/git.html
index 22c0eec1..a6e88566 100644
--- a/htdocs/git.html
+++ b/htdocs/git.html
@@ -236,7 +236,7 @@ additional branches can also be fetched if necessary.
 
 
 You can download any of the additional branches by adding a suitable
-fetch specification to your local copy of the git repostiory.  For
+fetch specification to your local copy of the git repository.  For
 example, if your remote is called 'origin' (the default with git
 clone) you can add the 'dead' development branches by running:
 
-- 
2.45.0


Re: [PATCH v2] [testsuite] [powerpc] adjust -m32 counts for fold-vec-extract*

2024-05-25 Thread Alexandre Oliva
On Apr 22, 2024, Alexandre Oliva  wrote:

> for  gcc/testsuite/ChangeLog

>   PR testsuite/101169
>   * gcc.target/powerpc/fold-vec-extract-double.p7.c: Adjust addi
>   counts for ilp32.
>   * gcc.target/powerpc/fold-vec-extract-float.p7.c: Likewise.
>   * gcc.target/powerpc/fold-vec-extract-float.p8.c: Likewise.
>   * gcc.target/powerpc/fold-vec-extract-int.p7.c: Likewise.
>   * gcc.target/powerpc/fold-vec-extract-int.p8.c: Likewise.
>   * gcc.target/powerpc/fold-vec-extract-short.p7.c: Likewise.
>   * gcc.target/powerpc/fold-vec-extract-short.p8.c: Likewise.

Ping?  https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649830.html



Re: [PATCH v2] [testsuite] [arm] add effective target and options for pacbti tests

2024-05-25 Thread Alexandre Oliva
On Apr 19, 2024, Alexandre Oliva  wrote:

> for  gcc/testsuite/ChangeLog

>   * gcc.target/arm/bti-1.c: Require arch, use its opts, drop skip.
>   * gcc.target/arm/bti-2.c: Likewise.
>   * gcc.target/arm/acle/pacbti-m-predef-11.c: Likewise.
>   * gcc.target/arm/acle/pacbti-m-predef-12.c: Likewise.
>   * gcc.target/arm/acle/pacbti-m-predef-7.c: Likewise.
>   * g++.target/arm/pac-1.C: Likewise.  Drop +mve.

Ping?  https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649732.html



Re: [PATCH] [tree-prof] skip if errors were seen [PR113681]

2024-05-25 Thread Alexandre Oliva
On Apr 16, 2024, Alexandre Oliva  wrote:

> for  gcc/ChangeLog

>   PR tree-optimization/113681
>   * tree-profiling.cc (pass_ipa_tree_profile::gate): Skip if
>   seen_errors.

> for  gcc/testsuite/ChangeLog

>   PR tree-optimization/113681
>   * c-c++-common/strub-pr113681.c: New.

Ping?  https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649546.html



[pushed] wwwdocs: gcc-13: Reword section on __bf16

2024-05-25 Thread Gerald Pfeifer
I found this section hard to understand at first (in addition to some 
grammar issues) so pushed the following.

Lingling, please advise if you'd like further changes.

Gerald

---
 htdocs/gcc-13/changes.html | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index 2702170d..49261e1b 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -549,10 +549,10 @@ You may also want to check out our
   For both C and C++ the __bf16 type is supported on
   x86 systems with SSE2 and above enabled.
   
-  Use real __bf16 type for AVX512BF16 intrinsics. Previously
-  we use __bfloat16 which is typedef of short. Now we introduced real
-  __bf16 type to x86 psABI. Users need to adjust their
-  AVX512BF16-related source code when upgrading GCC12 to GCC13.
+  Use this __bf16 type for AVX512BF16 intrinsics instead
+  of __bfloat16 which is typedef for short.
+  __bf16 is now part of the x86 psABI. Users need to adjust their
+  AVX512BF16-related source code when upgrading to GCC 13.
   
   New ISA extension support for Intel AMX-COMPLEX was added.
   AMX-COMPLEX intrinsics are available via the -mamx-complex
-- 
2.45.0


Re: [PATCH] Avoid vector -Wfree-nonheap-object warnings

2024-05-25 Thread François Dumont



On 24/05/2024 16:17, Jonathan Wakely wrote:

On Thu, 23 May 2024 at 18:38, François Dumont  wrote:


On 23/05/2024 15:31, Jonathan Wakely wrote:

On 23/05/24 06:55 +0200, François Dumont wrote:

As explained in this email:

https://gcc.gnu.org/pipermail/libstdc++/2024-April/058552.html

I ran into -Wfree-nonheap-object warnings because of my enhancements to
the algorithms.

So here is a patch to extend the usage of the _Guard type to other
parts of vector.

Nice, that fixes the warning you were seeing?

Yes! I indeed forgot to say so :-)



We recently got a bug report about -Wfree-nonheap-object in
std::vector, but that is coming from _M_realloc_append which already
uses the RAII guard :-(
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115016

Note that I also had to move the call to __uninitialized_copy_a before
assigning this->_M_impl._M_start to get rid of the -Wfree-nonheap-object
warning. But _M_realloc_append is already doing potentially throwing
operations before assigning this->_M_impl so it must be something else.

Though it made me notice another occurrence of _Guard in this method. Now
replaced too in this new patch.

  libstdc++: Use RAII to replace try/catch blocks

  Move _Guard into std::vector declaration and use it to guard all
  calls to vector _M_allocate.

  Doing so gives the compiler more visibility into what is done with
  the pointers, so it no longer raises the -Wfree-nonheap-object
  warning.

  libstdc++-v3/ChangeLog:

  * include/bits/vector.tcc (_Guard): Move all the nested duplicated
  class...
  * include/bits/stl_vector.h (_Guard_alloc): ...here.
  (_M_allocate_and_copy): Use latter.
  (_M_initialize_dispatch): Likewise and set _M_finish first from the
  result of __uninitialized_fill_n_a that can throw.
  (_M_range_initialize): Likewise.
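
For readers following the thread, the guard idea can be sketched outside libstdc++ as below. This is only an illustration under stated assumptions: ToyBase, GuardAlloc and allocate_and_fill are hypothetical names, not the patch's code, and a raw int* stands in for the allocator's pointer type. It shows the storage being freed by the destructor unless release() was called, which is what replaces the try/catch blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <stdexcept>

// Hypothetical stand-in for the vector base class: it only counts
// how many allocations are currently live.
struct ToyBase {
  typedef int* pointer;
  int live = 0;

  pointer allocate(std::size_t n) {
    ++live;
    return static_cast<pointer>(::operator new(n * sizeof(int)));
  }
  void deallocate(pointer p, std::size_t) {
    ::operator delete(p);
    --live;
  }
};

// RAII guard: frees the storage unless ownership was released.
struct GuardAlloc {
  ToyBase::pointer storage;
  std::size_t len;
  ToyBase& base;

  GuardAlloc(ToyBase::pointer s, std::size_t l, ToyBase& b)
  : storage(s), len(l), base(b) { }

  ~GuardAlloc() {
    if (storage)
      base.deallocate(storage, len);
  }

  // Hand the storage to the caller; the destructor then does nothing.
  ToyBase::pointer release() {
    ToyBase::pointer res = storage;
    storage = ToyBase::pointer();  // value-initialize rather than "= 0"
    return res;
  }

  GuardAlloc(const GuardAlloc&) = delete;
  GuardAlloc& operator=(const GuardAlloc&) = delete;
};

// Allocate n ints and fill them; `fail` simulates a throwing copy.
ToyBase::pointer allocate_and_fill(ToyBase& base, std::size_t n, bool fail) {
  assert(n > 0);
  GuardAlloc guard(base.allocate(n), n, base);
  if (fail)
    throw std::runtime_error("element copy failed");
  for (std::size_t i = 0; i < n; ++i)
    guard.storage[i] = static_cast<int>(i);
  return guard.release();
}
```

If the simulated copy throws, the guard's destructor runs during unwinding and the allocation count drops back to zero without any explicit catch block.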


diff --git a/libstdc++-v3/include/bits/stl_vector.h
b/libstdc++-v3/include/bits/stl_vector.h
index 31169711a48..4ea74e3339a 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -1607,6 +1607,39 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   clear() _GLIBCXX_NOEXCEPT
   { _M_erase_at_end(this->_M_impl._M_start); }

+private:
+  // RAII guard for allocated storage.
+  struct _Guard

If it's being defined at class scope instead of locally in a member
function, I think a better name would be good. Maybe _Ptr_guard or
_Dealloc_guard or something.

_Guard_alloc chosen.

+  {
+pointer _M_storage;// Storage to deallocate
+size_type _M_len;
+_Base& _M_vect;
+
+_GLIBCXX20_CONSTEXPR
+_Guard(pointer __s, size_type __l, _Base& __vect)
+: _M_storage(__s), _M_len(__l), _M_vect(__vect)
+{ }
+
+_GLIBCXX20_CONSTEXPR
+~_Guard()
+{
+  if (_M_storage)
+_M_vect._M_deallocate(_M_storage, _M_len);
+}
+
+_GLIBCXX20_CONSTEXPR
+pointer
+_M_release()
+{
+  pointer __res = _M_storage;
+  _M_storage = 0;

I don't think the NullablePointer requirements include assigning 0,
only from nullptr, which isn't valid in C++98.

https://en.cppreference.com/w/cpp/named_req/NullablePointer

Please use _M_storage = pointer() instead.

I forgot about user-defined fancy pointers; fixed.



+  return __res;
+}
+
+  private:
+_Guard(const _Guard&);
+  };
+
 protected:
   /**
*  Memory expansion handler.  Uses the member allocation
function to
@@ -1618,18 +1651,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 _M_allocate_and_copy(size_type __n,
  _ForwardIterator __first, _ForwardIterator __last)
 {
-  pointer __result = this->_M_allocate(__n);
-  __try
-{
-  std::__uninitialized_copy_a(__first, __last, __result,
-  _M_get_Tp_allocator());
-  return __result;
-}
-  __catch(...)
-{
-  _M_deallocate(__result, __n);
-  __throw_exception_again;
-}
+  _Guard __guard(this->_M_allocate(__n), __n, *this);
+  std::__uninitialized_copy_a
+(__first, __last, __guard._M_storage, _M_get_Tp_allocator());
+  return __guard._M_release();
 }


@@ -1642,13 +1667,15 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   // 438. Ambiguity in the "do the right thing" clause
   template
 void
-_M_initialize_dispatch(_Integer __n, _Integer __value, __true_type)
+_M_initialize_dispatch(_Integer __int_n, _Integer __value,
__true_type)
 {
-  this->_M_impl._M_start = _M_allocate(_S_check_init_len(
-static_cast(__n), _M_get_Tp_allocator()));
-  this->_M_impl._M_end_of_storage =
-this->_M_impl._M_start + static_cast(__n);
-  _M_fill_initialize(static_cast(__n), __value);

Please fix the comment on _M_fill_initialize if you're removing the
use of it here.

Already done in this initial patch proposal, see below.


+  const size_type __n = static_cast(__int_n);
+  _Guard __guard(_M_allocate(_S_

[PATCH v3 #1/2] enable adjustment of return_pc debug attrs

2024-05-25 Thread Alexandre Oliva
On Apr 27, 2023, Alexandre Oliva  wrote:

> On Apr 14, 2023, Alexandre Oliva  wrote:
>> On Mar 23, 2023, Alexandre Oliva  wrote:
>>> This patch introduces infrastructure for targets to add an offset to
>>> the label issued after the call_insn to set the call_return_pc
>>> attribute.  This will be used on rs6000, that sometimes issues another
>>> instruction after the call proper as part of a call insn.

>> Ping?
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614452.html

Ping?
Refreshed, retested on ppc64le-linux-gnu.  Ok to install?


This patch introduces infrastructure for targets to add an offset to
the label issued after the call_insn to set the call_return_pc
attribute.  This will be used on rs6000, which sometimes issues another
instruction after the call proper as part of a call insn.
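
As a rough illustration of what "adding an offset to the label" means at the assembly level, the sketch below builds the label expression the same way the "%s%+i" format in the patch does. label_with_offset is a hypothetical helper written for this note and uses std::snprintf in place of GCC's xasprintf; the label names are made up.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Return LABEL unchanged for a zero offset, otherwise LABEL followed by
// an explicitly signed offset, e.g. ".LVL3-4" or ".LVL3+8".
std::string label_with_offset(const std::string& label, int offset) {
  if (offset == 0)
    return label;
  char buf[16];  // enough for a sign, ten digits and the terminator
  int n = std::snprintf(buf, sizeof buf, "%+i", offset);
  assert(n > 0 && n < static_cast<int>(sizeof buf));
  (void) n;
  return label + buf;
}
```

The assembler then resolves an expression such as .LVL3-4 to the address four bytes before the label, so no new label has to be emitted.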


for  gcc/ChangeLog

* target.def (call_offset_return_label): New hook.
* gcc/doc/tm.texi.in (TARGET_CALL_OFFSET_RETURN_LABEL): Add
placeholder.
* gcc/doc/tm.texi: Rebuild.
* dwarf2out.cc (struct call_arg_loc_node): Record call_insn
instead of call_arg_loc_note.
(add_AT_lbl_id): Add optional offset argument.
(gen_call_site_die): Compute and pass on a return pc offset.
(gen_subprogram_die): Move call_arg_loc_note computation...
(dwarf2out_var_location): ... from here.  Set call_insn.
---
 gcc/doc/tm.texi|7 +++
 gcc/doc/tm.texi.in |2 ++
 gcc/dwarf2out.cc   |   26 +-
 gcc/target.def |9 +
 4 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index cd50078227d98..8a7aa70d605ba 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5557,6 +5557,13 @@ except the last are treated as named.
 You need not define this hook if it always returns @code{false}.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_CALL_OFFSET_RETURN_LABEL (rtx_insn 
*@var{call_insn})
+While generating call-site debug info for a CALL insn, or a SEQUENCE
+insn starting with a CALL, this target hook is invoked to compute the
+offset to be added to the debug label emitted after the call to obtain
+the return address that should be recorded as the return PC.
+@end deftypefn
+
 @deftypefn {Target Hook} void TARGET_START_CALL_ARGS (cumulative_args_t 
@var{complete_args})
 This target hook is invoked while generating RTL for a function call,
 after the argument values have been computed, and after stack arguments
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 058bd56487a9a..9e0830758aeea 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3887,6 +3887,8 @@ These machine description macros help implement varargs:
 
 @hook TARGET_STRICT_ARGUMENT_NAMING
 
+@hook TARGET_CALL_OFFSET_RETURN_LABEL
+
 @hook TARGET_START_CALL_ARGS
 
 @hook TARGET_CALL_ARGS
diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index 5b064ffd78ad1..1092880738df4 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -3593,7 +3593,7 @@ typedef struct var_loc_list_def var_loc_list;
 
 /* Call argument location list.  */
 struct GTY ((chain_next ("%h.next"))) call_arg_loc_node {
-  rtx GTY (()) call_arg_loc_note;
+  rtx_insn * GTY (()) call_insn;
   const char * GTY (()) label;
   tree GTY (()) block;
   bool tail_call_p;
@@ -3777,7 +3777,8 @@ static void remove_addr_table_entry (addr_table_entry *);
 static void add_AT_addr (dw_die_ref, enum dwarf_attribute, rtx, bool);
 static inline rtx AT_addr (dw_attr_node *);
 static void add_AT_symview (dw_die_ref, enum dwarf_attribute, const char *);
-static void add_AT_lbl_id (dw_die_ref, enum dwarf_attribute, const char *);
+static void add_AT_lbl_id (dw_die_ref, enum dwarf_attribute, const char *,
+  int = 0);
 static void add_AT_lineptr (dw_die_ref, enum dwarf_attribute, const char *);
 static void add_AT_macptr (dw_die_ref, enum dwarf_attribute, const char *);
 static void add_AT_range_list (dw_die_ref, enum dwarf_attribute,
@@ -5353,14 +5354,17 @@ add_AT_symview (dw_die_ref die, enum dwarf_attribute 
attr_kind,
 
 static inline void
 add_AT_lbl_id (dw_die_ref die, enum dwarf_attribute attr_kind,
-   const char *lbl_id)
+  const char *lbl_id, int offset)
 {
   dw_attr_node attr;
 
   attr.dw_attr = attr_kind;
   attr.dw_attr_val.val_class = dw_val_class_lbl_id;
   attr.dw_attr_val.val_entry = NULL;
-  attr.dw_attr_val.v.val_lbl_id = xstrdup (lbl_id);
+  if (!offset)
+attr.dw_attr_val.v.val_lbl_id = xstrdup (lbl_id);
+  else
+attr.dw_attr_val.v.val_lbl_id = xasprintf ("%s%+i", lbl_id, offset);
   if (dwarf_split_debug_info)
 attr.dw_attr_val.val_entry
 = add_addr_table_entry (attr.dw_attr_val.v.val_lbl_id,
@@ -23515,7 +23519,9 @@ gen_call_site_die (tree decl, dw_die_ref subr_die,
   if (stmt_die == NULL)
 stmt_die = subr_die;
   die = new_die (dwarf_TAG (DW_TAG_call_site), stmt_die, NULL_TREE);
-  add_AT_lbl_id (die, dwarf_AT (DW_AT_call_return_pc), ca_loc->label);
+  

[PATCH v3 #1/2] [rs6000] adjust return_pc debug attrs

2024-05-25 Thread Alexandre Oliva
On Apr 27, 2023, Alexandre Oliva  wrote:

> On Apr 14, 2023, Alexandre Oliva  wrote:
>> On Mar 23, 2023, Alexandre Oliva  wrote:
>>> This patch introduces infrastructure for targets to add an offset to
>>> the label issued after the call_insn to set the call_return_pc
>>> attribute.  This will be used on rs6000, that sometimes issues another
>>> instruction after the call proper as part of a call insn.

>> Ping?
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614453.html

> Ping?

Ping?
Refreshed, retested on ppc64le-linux-gnu.  Ok to install?


Some of the rs6000 call patterns, on some ABIs, issue multiple opcodes
out of a single call insn, but the call (bl) or jump (b) is not always
the last opcode in the sequence.

This does not seem to be a problem for exception handling tables, but
the return_pc attribute in the call graph output in dwarf2+ debug
information, which takes the address of a label output right after the
call, does not match the value of the link register even for non-tail
calls.  E.g., with ABI_AIX or ABI_ELFv2, such code as:

  foo ();

outputs:

  bl foo
  nop
 LVL#:
[...]
  .8byte .LVL#  # DW_AT_call_return_pc

but debug info consumers may rely on the return_pc address, and draw
incorrect conclusions from its off-by-4 value.

This patch uses the infrastructure for targets to add an offset to the
label issued after the call_insn to set the call_return_pc attribute,
on rs6000, to account for opcodes issued after the actual call opcode as
part of call insn output patterns.
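
The arithmetic behind the hook can be checked with a small stand-alone sketch. The function names and the example addresses are assumptions made for illustration; only the formula "4 minus the insn length" comes from the patch.

```cpp
#include <cassert>

// Size in bytes of the b/bl opcode that starts every call insn.
const int kCallOpcodeBytes = 4;

// Offset to add to the label emitted after the whole call insn.
// A lone "bl" (4 bytes) gives 0; "bl; nop" (8 bytes) gives -4.
int return_label_offset(int insn_length_bytes) {
  assert(insn_length_bytes >= kCallOpcodeBytes);
  return kCallOpcodeBytes - insn_length_bytes;
}

// Address recorded as the return PC, given where the call insn starts.
long return_pc(long call_insn_addr, int insn_length_bytes) {
  long label_addr = call_insn_addr + insn_length_bytes;
  return label_addr + return_label_offset(insn_length_bytes);
}
```

Whatever the insn length, the result is the call address plus 4, i.e. the value the link register holds after the bl.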


for  gcc/ChangeLog

* config/rs6000/rs6000.cc (TARGET_CALL_OFFSET_RETURN_LABEL):
Override.
(rs6000_call_offset_return_label): New.
---
 gcc/config/rs6000/rs6000.cc |   18 ++
 1 file changed, 18 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index e4dc629ddcc9a..77e6b94a539da 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1779,6 +1779,8 @@ static const scoped_attribute_specs *const 
rs6000_attribute_table[] =
 #undef TARGET_OVERLAP_OP_BY_PIECES_P
 #define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true
 
+#undef TARGET_CALL_OFFSET_RETURN_LABEL
+#define TARGET_CALL_OFFSET_RETURN_LABEL rs6000_call_offset_return_label
 
 
 /* Processor table.  */
@@ -14822,6 +14824,22 @@ rs6000_assemble_integer (rtx x, unsigned int size, int 
aligned_p)
   return default_assemble_integer (x, size, aligned_p);
 }
 
+/* Return the offset to be added to the label output after CALL_INSN
+   to compute the address to be placed in DW_AT_call_return_pc.  */
+
+static int
+rs6000_call_offset_return_label (rtx_insn *call_insn)
+{
+  /* All rs6000 CALL_INSN output patterns start with a b or bl, always
+ a 4-byte instruction, but some output patterns issue other
+ opcodes afterwards.  The return label is issued after the entire
+ call insn, including any such post-call opcodes.  Instead of
+ figuring out which cases need adjustments, we compute the offset
+ back to the address of the call opcode proper, then add the
+ constant 4 bytes, to get the address after that opcode.  */
+  return 4 - get_attr_length (call_insn);
+}
+
 /* Return a template string for assembly to emit when making an
external call.  FUNOP is the call mem argument operand number.  */
 




[PATCH] vect: Use vect representative statement instead of original in patch recog [PR115060]

2024-05-25 Thread Feng Xue OS
Some utility functions (such as vect_look_through_possible_promotion) that
find a certain kind of direct or indirect SSA definition for a value may
return the original SSA name rather than its pattern representative, even
when a pattern is involved. For example,

   a = (T1) patt_b;
   patt_b = (T2) c;// b = ...
   patt_c = not-a-cast;// c = ...

Given 'a', the mentioned function will return 'c' instead of 'patt_c'. This
subtlety can make pattern recog code that is unaware of it misuse the
original statement instead of the new pattern statement, which is
inconsistent with the processing logic of the pattern formation pass. This
patch corrects the issue by making another utility function
(vect_get_internal_def) return the pattern statement information to the
caller by default.
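
A toy model of the lookup may make the intent easier to follow. Everything here is hypothetical (StmtInfo, the string-keyed map, the function names mirror the vectorizer only loosely); it just shows a definition lookup that returns the pattern replacement by default and the original statement on request.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy statement record: pattern_stmt is non-null when a pattern
// statement has replaced the original one.
struct StmtInfo {
  std::string name;
  StmtInfo* pattern_stmt;
};

// Return the statement that will actually be vectorized.
StmtInfo* stmt_to_vectorize(StmtInfo* s) {
  assert(s != nullptr);
  return s->pattern_stmt ? s->pattern_stmt : s;
}

// Look up the definition of OP.  By default return the pattern
// representative; pass false to get the original statement.
StmtInfo* get_internal_def(const std::map<std::string, StmtInfo*>& defs,
                           const std::string& op,
                           bool for_vectorize = true) {
  std::map<std::string, StmtInfo*>::const_iterator it = defs.find(op);
  if (it == defs.end())
    return nullptr;
  return for_vectorize ? stmt_to_vectorize(it->second) : it->second;
}
```

With 'c' registered as having the replacement 'patt_c', the default lookup yields 'patt_c', which matches the behaviour the description asks for.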

Regression tested on x86-64 and aarch64.

Feng
--
gcc/
PR tree-optimization/115060
* tree-vect-patterns.cc (vect_get_internal_def): Add a new parameter
for_vectorize.
(vect_widened_op_tree): Call vect_get_internal_def instead of look_def
to get statement information.
(vect_recog_widen_abd_pattern): No need to call vect_stmt_to_vectorize.
---
 gcc/tree-vect-patterns.cc | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index a313dc64643..fa35bf26372 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -258,15 +258,21 @@ vect_element_precision (unsigned int precision)
 }
 
 /* If OP is defined by a statement that's being considered for vectorization,
-   return information about that statement, otherwise return NULL.  */
+   return information about that statement, otherwise return NULL.
+   FOR_VECTORIZE is used to specify whether original or vectorization
+   representative (if have) statement information is returned.  */
 
 static stmt_vec_info
-vect_get_internal_def (vec_info *vinfo, tree op)
+vect_get_internal_def (vec_info *vinfo, tree op, bool for_vectorize = true)
 {
   stmt_vec_info def_stmt_info = vinfo->lookup_def (op);
   if (def_stmt_info
   && STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_internal_def)
-return def_stmt_info;
+{
+  if (for_vectorize)
+   def_stmt_info = vect_stmt_to_vectorize (def_stmt_info);
+  return def_stmt_info;
+}
   return NULL;
 }
 
@@ -655,7 +661,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info 
stmt_info, tree_code code,
 
  /* Recursively process the definition of the operand.  */
  stmt_vec_info def_stmt_info
-   = vinfo->lookup_def (this_unprom->op);
+   = vect_get_internal_def (vinfo, this_unprom->op);
+
  nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
   widened_code, shift_p, max_nops,
   this_unprom, common_type,
@@ -1739,7 +1746,6 @@ vect_recog_widen_abd_pattern (vec_info *vinfo, 
stmt_vec_info stmt_vinfo,
   if (!abd_pattern_vinfo)
 return NULL;
 
-  abd_pattern_vinfo = vect_stmt_to_vectorize (abd_pattern_vinfo);
   gcall *abd_stmt = dyn_cast  (STMT_VINFO_STMT (abd_pattern_vinfo));
   if (!abd_stmt

[PATCH] vect: Unify bbs in loop_vec_info and bb_vec_info

2024-05-25 Thread Feng Xue OS
Both derived classes (loop_vec_info/bb_vec_info) have their own "bbs"
field, which serve exactly the same purpose of recording all basic blocks
inside the corresponding vect region, but the fields use different data
types: one is a plain array, the other is an auto_vec. This difference
causes some duplicated code even when handling the same thing, mostly in
tree-vect-patterns. One refinement is to lift this field into the base
class "vec_info" and set its value to the contiguous memory area pointed
to by the two old "bbs" in each constructor of the derived classes.
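
The shape of the refactoring can be sketched with ordinary C++ containers. This is a simplified model, not the vectorizer's classes: BasicBlock, VecInfo, LoopVecInfo and BbVecInfo are hypothetical stand-ins, and std::unique_ptr and std::vector replace XCNEWVEC and auto_vec. It shows a base class exposing one pointer-plus-count view so that shared code no longer needs to know which derived class it is looking at.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct BasicBlock { int index; };

// Base class: a uniform view of the region's blocks.
struct VecInfo {
  BasicBlock** bbs = nullptr;
  unsigned nbbs = 0;
  VecInfo() = default;
  VecInfo(const VecInfo&) = delete;  // bbs points into our own storage
  virtual ~VecInfo() = default;
};

// Loop flavour: owns a plain heap array.
class LoopVecInfo : public VecInfo {
  std::unique_ptr<BasicBlock*[]> storage_;
public:
  LoopVecInfo(BasicBlock* const* blocks, unsigned n)
  : storage_(new BasicBlock*[n]) {
    for (unsigned i = 0; i < n; ++i)
      storage_[i] = blocks[i];
    bbs = storage_.get();
    nbbs = n;
  }
};

// Basic-block flavour: owns a vector and publishes its data pointer.
class BbVecInfo : public VecInfo {
  std::vector<BasicBlock*> bbs_as_vector_;
public:
  explicit BbVecInfo(std::vector<BasicBlock*> blocks)
  : bbs_as_vector_(std::move(blocks)) {
    bbs = bbs_as_vector_.data();
    nbbs = static_cast<unsigned>(bbs_as_vector_.size());
  }
};

// Shared code: one loop serves both derived classes.
int sum_indices(const VecInfo& info) {
  int sum = 0;
  for (unsigned i = 0; i < info.nbbs; ++i) {
    assert(info.bbs[i] != nullptr);
    sum += info.bbs[i]->index;
  }
  return sum;
}
```

Keeping the vector private in the second class reflects the point made in the ChangeLog below: once the base class publishes the data pointer, the container must not be resized behind its back.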

Regression tested on x86-64 and aarch64.

Feng
--
gcc/
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Move
initialization of bbs to explicit construction code.  Adjust the
definition of nbbs.
* tree-vect-patterns.cc (vect_determine_precisions): Make
loop_vec_info and bb_vec_info share same code.
(vect_pattern_recog): Remove duplicated vect_pattern_recog_1 loop.
* tree-vect-slp.cc (vect_get_and_check_slp_defs): Access to bbs[0]
via base vec_info class.
(_bb_vec_info::_bb_vec_info): Initialize bbs and nbbs using data
fields of input auto_vec<> bbs.
(_bb_vec_info::_bb_vec_info): Add assertions on bbs and nbbs to ensure
they are not changed externally.
(vect_slp_region): Use access to nbbs to replace original
bbs.length().
(vect_schedule_slp_node): Access to bbs[0] via base vec_info class.
* tree-vectorizer.cc (vec_info::vec_info): Add initialization of
bbs and nbbs.
(vec_info::insert_seq_on_entry): Access to bbs[0] via base vec_info
class.
* tree-vectorizer.h (vec_info): Add new fields bbs and nbbs.
(_loop_vec_info): Remove field bbs.
(_bb_vec_info): Rename old bbs field to bbs_as_vector, and make it
be private.
---
 gcc/tree-vect-loop.cc |   6 +-
 gcc/tree-vect-patterns.cc | 142 +++---
 gcc/tree-vect-slp.cc  |  24 ---
 gcc/tree-vectorizer.cc|   7 +-
 gcc/tree-vectorizer.h |  19 ++---
 5 files changed, 72 insertions(+), 126 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 83c0544b6aa..aef17420a5f 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1028,7 +1028,6 @@ bb_in_loop_p (const_basic_block bb, const void *data)
 _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
   : vec_info (vec_info::loop, shared),
 loop (loop_in),
-bbs (XCNEWVEC (basic_block, loop->num_nodes)),
 num_itersm1 (NULL_TREE),
 num_iters (NULL_TREE),
 num_iters_unchanged (NULL_TREE),
@@ -1079,8 +1078,9 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, 
vec_info_shared *shared)
  case of the loop forms we allow, a dfs order of the BBs would the same
  as reversed postorder traversal, so we are safe.  */
 
-  unsigned int nbbs = dfs_enumerate_from (loop->header, 0, bb_in_loop_p,
- bbs, loop->num_nodes, loop);
+  bbs = XCNEWVEC (basic_block, loop->num_nodes);
+  nbbs = dfs_enumerate_from (loop->header, 0, bb_in_loop_p, bbs,
+loop->num_nodes, loop);
   gcc_assert (nbbs == loop->num_nodes);
 
   for (unsigned int i = 0; i < nbbs; i++)
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index a313dc64643..848a3195a93 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -6925,81 +6925,41 @@ vect_determine_stmt_precisions (vec_info *vinfo, 
stmt_vec_info stmt_info)
 void
 vect_determine_precisions (vec_info *vinfo)
 {
+  basic_block *bbs = vinfo->bbs;
+  unsigned int nbbs = vinfo->nbbs;
+
   DUMP_VECT_SCOPE ("vect_determine_precisions");
 
-  if (loop_vec_info loop_vinfo = dyn_cast  (vinfo))
+  for (unsigned int i = 0; i < nbbs; i++)
 {
-  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-  basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
-  unsigned int nbbs = loop->num_nodes;
-
-  for (unsigned int i = 0; i < nbbs; i++)
+  basic_block bb = bbs[i];
+  for (auto gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
{
- basic_block bb = bbs[i];
- for (auto gsi = gsi_start_phis (bb);
-  !gsi_end_p (gsi); gsi_next (&gsi))
-   {
- stmt_vec_info stmt_info = vinfo->lookup_stmt (gsi.phi ());
- if (stmt_info)
-   vect_determine_mask_precision (vinfo, stmt_info);
-   }
- for (auto si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
-   if (!is_gimple_debug (gsi_stmt (si)))
- vect_determine_mask_precision
-   (vinfo, vinfo->lookup_stmt (gsi_stmt (si)));
+ stmt_vec_info stmt_info = vinfo->lookup_stmt (gsi.phi ());
+ if (stmt_info && STMT_VINFO_VECTORIZABLE (stmt_info))
+   vect_determine_mask_precision (vinfo, stmt_info);
}
-  for (unsigned int i = 0; i < nbbs; 

[PATCH] libcpp: Correct typo 'r' -> '\r'

2024-05-25 Thread Peter Damianov
libcpp/ChangeLog:
* lex.cc (do_peek_prev): Correct typo in argument to __builtin_expect().

Signed-off-by: Peter Damianov 
---
 libcpp/lex.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libcpp/lex.cc b/libcpp/lex.cc
index c9e44e6..de752bdc9c8 100644
--- a/libcpp/lex.cc
+++ b/libcpp/lex.cc
@@ -5038,7 +5038,7 @@ do_peek_prev (const unsigned char *peek, const unsigned 
char *bound)
 
   unsigned char c = *--peek;
   if (__builtin_expect (c == '\n', false)
-  || __builtin_expect (c == 'r', false))
+  || __builtin_expect (c == '\r', false))
 {
   if (peek == bound)
return peek;
-- 
2.39.2



Re: [RFC/RFA] [PATCH 01/12] Implement internal functions for efficient CRC computation

2024-05-25 Thread Jeff Law




On 5/24/24 2:41 AM, Mariam Arutunian wrote:
Add two new internal functions (IFN_CRC, IFN_CRC_REV), to provide faster 
CRC generation.

One performs bit-forward and the other bit-reversed CRC computation.
If CRC optabs are supported, they are used for the CRC computation.
Otherwise, table-based CRC is generated.
The supported data and CRC sizes are 8, 16, 32, and 64 bits.
The polynomial is without the leading 1.
A table with 256 elements is used to store precomputed CRCs.
For the reflection of inputs and the output, a simple algorithm involving
SHIFT, AND, and OR operations is used.
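
For reference, a bit-forward, table-based CRC of the kind described above can be sketched in portable C++ as follows. This is an independent illustration, not the code GCC generates: the function names are invented, the table is built at run time, and only whole bytes of data are handled. The polynomial is passed without its leading 1, as in the description.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mask keeping the low BITS bits (BITS is 8, 16, 32 or 64).
inline std::uint64_t crc_mask(unsigned bits) {
  return bits == 64 ? ~std::uint64_t(0) : (std::uint64_t(1) << bits) - 1;
}

// CRC of one byte value, computed bit by bit, most significant bit first.
std::uint64_t crc_of_byte(std::uint64_t value, std::uint64_t poly,
                          unsigned bits) {
  std::uint64_t crc = value << (bits - 8);
  for (int i = 0; i < 8; ++i) {
    if ((crc >> (bits - 1)) & 1)
      crc = (crc << 1) ^ poly;
    else
      crc <<= 1;
  }
  return crc & crc_mask(bits);
}

// Precompute the 256-entry table for the given polynomial and CRC size.
std::array<std::uint64_t, 256> make_crc_table(std::uint64_t poly,
                                              unsigned bits) {
  assert(bits == 8 || bits == 16 || bits == 32 || bits == 64);
  std::array<std::uint64_t, 256> table{};
  for (unsigned i = 0; i < 256; ++i)
    table[i] = crc_of_byte(i, poly, bits);
  return table;
}

// Fold LEN bytes of DATA into CRC, one table lookup per byte.
std::uint64_t crc_update(const std::array<std::uint64_t, 256>& table,
                         unsigned bits, std::uint64_t crc,
                         const unsigned char* data, std::size_t len) {
  for (std::size_t i = 0; i < len; ++i) {
    unsigned index
      = static_cast<unsigned>((crc >> (bits - 8)) ^ data[i]) & 0xFF;
    crc = ((crc << 8) ^ table[index]) & crc_mask(bits);
  }
  return crc;
}
```

With polynomial 0x1021, 16 bits and a zero initial value this corresponds to the configuration commonly catalogued as CRC-16/XMODEM, whose published check value for the ASCII string "123456789" is 0x31C3.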

Co-authored-by: Joern Rennecke


gcc/

    * doc/md.texi (crc@var{m}@var{n}4,
    crc_rev@var{m}@var{n}4): Document.
    * expr.cc (generate_crc_table): New function.
    (calculate_table_based_CRC): Likewise.
    (expand_crc_table_based): Likewise.
    (gen_common_operation_to_reflect): Likewise.
    (reflect_64_bit_value): Likewise.
    (reflect_32_bit_value): Likewise.
    (reflect_16_bit_value): Likewise.
    (reflect_8_bit_value): Likewise.
    (generate_reflecting_code_standard): Likewise.
    (expand_reversed_crc_table_based): Likewise.
    * expr.h (generate_reflecting_code_standard): New function declaration.
    (expand_crc_table_based): Likewise.
    (expand_reversed_crc_table_based): Likewise.
    * internal-fn.cc: (crc_direct): Define.
    (direct_crc_optab_supported_p): Likewise.
    (expand_crc_optab_fn): New function
    * internal-fn.def (CRC, CRC_REV): New internal functions.
    * optabs.def (crc_optab, crc_rev_optab): New optabs.

Signed-off-by: Mariam Arutunian

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 5730bda80dc..be68ef860f9 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8557,6 +8557,20 @@ operand 2, greater than operand 2 or is unordered with 
operand 2.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{crc@var{m}@var{n}4} instruction pattern

+@item @samp{crc@var{m}@var{n}4}
+Calculate a bit-forward CRC using operands 1, 2 and 3,
+then store the result in operand 0.
+Operands 1 is the initial CRC, operands 2 is the data and operands 3 is the
+polynomial without leading 1.
+Operands 0, 1 and 3 have mode @var{n} and operand 2 has mode @var{m}, where
+both modes are integers.  The size of CRC to be calculated is determined by the
+mode; for example, if @var{n} is 'hi', a CRC16 is calculated.
+
+@cindex @code{crc_rev@var{m}@var{n}4} instruction pattern
+@item @samp{crc_rev@var{m}@var{n}4}
+Similar to @samp{crc@var{m}@var{n}4}, but calculates a bit-reversed CRC.
+
So just to be clear, this is a case where the input (operand 2) may have 
a different mode than the output (operand 0).  That scenario is 
generally discouraged, with a few exceptions (the most common being 
shift counts, which are often QImode objects while the 
value-to-be-shifted and the output value are potentially any scalar 
integer mode).


So I don't think this is a problem, just wanted to point it out to 
anyone else that may be looking at this code.




 @end table
 
 @end ifset

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 1baa39b98eb..18368ae6b6c 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -14091,3 +14091,359 @@ int_expr_size (const_tree exp)
 
   return tree_to_shwi (size);

 }
+
+/* Calculate CRC for the initial CRC and given POLYNOMIAL.
+   CRC_BITS is CRC size.  */
+
+static unsigned HOST_WIDE_INT
+calculate_crc (unsigned HOST_WIDE_INT crc,
+ unsigned HOST_WIDE_INT polynomial,
+ unsigned crc_bits)
Just a nit.  Line up the polynomial & crc_bits declarations with the crc 
declaration.




+{
+  crc = crc << (crc_bits - 8);
+  for (int i = 8; i > 0; --i)
+{
+  if ((crc >> (crc_bits - 1)) & 1)
+   crc = (crc << 1) ^ polynomial;
+  else
+   crc <<= 1;
+}
+
+  crc <<=  (sizeof (crc) * BITS_PER_UNIT - crc_bits);
+  crc >>=  (sizeof (crc) * BITS_PER_UNIT - crc_bits);

Another nit.  Just one space after the <<= or >>= operators.



+
+  return crc;
+}
+
+/* Assemble CRC table with 256 elements for the given POLYNOM and CRC_BITS with
+   given ID.
+   ID is the identifier of the table, the name of the table is unique,
+   contains CRC size and the polynomial.
+   POLYNOM is the polynomial used to calculate the CRC table's elements.
+   CRC_BITS is the size of CRC, may be 8, 16, ... . */
+
+rtx
+assemble_crc_table (tree id, unsigned HOST_WIDE_INT polynom, unsigned crc_bits)
+{
+  unsigned table_el_n = 0x100;
+  tree ar = build_array_type (make_unsigned_type (crc_bits),
+ build_index_type (size_int (table_el_n - 1)));

Nit.  Line up build_index_type at the same indentation as make_unsigned_type.

Note that with TREE_READONLY set, there is at least some chance that the 
linker will find identical tables and merge them.  I haven't tested 
this, but I know it happens for other objects in the constant pools.




+  sprintf (buf, "crc_table_for_crc_%u_polynomial_" H

Re: [RFC/RFA] [PATCH 02/12] Add built-ins and tests for bit-forward and bit-reversed CRCs

2024-05-25 Thread Jeff Law




On 5/24/24 2:41 AM, Mariam Arutunian wrote:
This patch introduces new built-in functions to GCC for computing
bit-forward and bit-reversed CRCs.

These builtins aim to provide efficient CRC calculation capabilities.
When the target architecture supports CRC operations (as indicated by
the presence of a CRC optab), the builtins will utilize the expander to
generate CRC code.  In the absence of hardware support, the builtins
default to generating code for a table-based CRC calculation.
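
As a rough illustration of the table-based fallback mentioned above, the sketch below builds a 256-entry table for a 16-bit bit-forward CRC and uses it to consume one byte per lookup.  The names table_entry, make_table and crc16_update_table are hypothetical, and the update formula is the conventional MSB-first one; that the code generated by the patch has exactly this shape is an assumption.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: CRC of a single byte with a zero initial value,
// computed bit by bit, most significant bit first.  POLY omits the leading 1.
constexpr std::uint16_t table_entry (std::uint8_t byte, std::uint16_t poly)
{
  std::uint16_t crc = static_cast<std::uint16_t> (byte << 8);
  for (int i = 0; i < 8; ++i)
    crc = static_cast<std::uint16_t> ((crc & 0x8000) ? ((crc << 1) ^ poly)
                                                     : (crc << 1));
  return crc;
}

// Build the 256-entry lookup table for the given polynomial.
std::array<std::uint16_t, 256> make_table (std::uint16_t poly)
{
  std::array<std::uint16_t, 256> table{};
  for (unsigned i = 0; i < 256; ++i)
    table[i] = table_entry (static_cast<std::uint8_t> (i), poly);
  return table;
}

// Fold one byte of data into CRC with a single table lookup.
std::uint16_t crc16_update_table (const std::array<std::uint16_t, 256> &table,
                                  std::uint16_t crc, std::uint8_t data)
{
  const std::uint8_t index = static_cast<std::uint8_t> ((crc >> 8) ^ data);
  return static_cast<std::uint16_t> ((crc << 8) ^ table[index]);
}
```

The table depends only on the polynomial and the CRC width, which is consistent with the table name in the patch encoding both.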


The builtins are defined as follows:
__builtin_rev_crc16_data8,
__builtin_rev_crc32_data8, __builtin_rev_crc32_data16, 
__builtin_rev_crc32_data32

__builtin_crc8_data8,
__builtin_crc16_data16, __builtin_crc16_data8,
__builtin_crc32_data8, __builtin_crc32_data16, __builtin_crc32_data32,
__builtin_crc64_data8, __builtin_crc64_data16,  __builtin_crc64_data32, 
__builtin_crc64_data64


Each builtin takes three parameters:
crc: The initial CRC value.
data: The data to be processed.
polynomial: The CRC polynomial without the leading 1.
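
To make the three parameters concrete, here is a minimal bit-by-bit sketch of what a bit-forward CRC step conventionally computes for a given CRC width and data width.  The template ref_crc is a hypothetical reference model, not one of the built-ins, and it is an assumption that the built-ins follow exactly these semantics; the bit-reversed variants are not modelled.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical reference model: fold DATA into CRC, most significant bit
// first.  POLYNOMIAL omits the leading 1, as in the description above.
template <typename Crc, typename Data>
constexpr Crc ref_crc (Crc crc, Data data, Crc polynomial)
{
  static_assert (std::is_unsigned<Crc>::value
                 && std::is_unsigned<Data>::value,
                 "unsigned types only");
  static_assert (sizeof (Data) <= sizeof (Crc),
                 "data must not be wider than the CRC");
  constexpr int crc_bits = sizeof (Crc) * 8;
  constexpr int data_bits = sizeof (Data) * 8;
  const Crc top = static_cast<Crc> (Crc (1) << (crc_bits - 1));

  // Align the data with the top of the CRC register and mix it in.
  crc = static_cast<Crc> (crc ^ (static_cast<Crc> (data)
                                 << (crc_bits - data_bits)));
  for (int i = 0; i < data_bits; ++i)
    {
      const bool msb = (crc & top) != 0;
      crc = static_cast<Crc> (crc << 1);
      if (msb)
        crc = static_cast<Crc> (crc ^ polynomial);
    }
  return crc;
}
```

Because the polynomial is usually written as an integer literal, callers of this sketch need explicit template arguments, for example ref_crc<std::uint16_t, std::uint8_t> (crc, byte, 0x1021).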

To validate the correctness of these builtins, this patch also includes
additions to the GCC testsuite.
This enhancement allows GCC to offer developers high-performance CRC
computation options that automatically adapt to the capabilities of the
target hardware.

Co-authored-by: Joern Rennecke


Not complete. May continue the work if these built-ins are needed.

gcc/

  * builtin-types.def (BT_FN_UINT8_UINT8_UINT8_CONST_SIZE): Define.
  (BT_FN_UINT16_UINT16_UINT8_CONST_SIZE): Likewise.
           (BT_FN_UINT16_UINT16_UINT16_CONST_SIZE): Likewise.
           (BT_FN_UINT32_UINT32_UINT8_CONST_SIZE): Likewise.
           (BT_FN_UINT32_UINT32_UINT16_CONST_SIZE): Likewise.
           (BT_FN_UINT32_UINT32_UINT32_CONST_SIZE): Likewise.
           (BT_FN_UINT64_UINT64_UINT8_CONST_SIZE): Likewise.
           (BT_FN_UINT64_UINT64_UINT16_CONST_SIZE): Likewise.
           (BT_FN_UINT64_UINT64_UINT32_CONST_SIZE): Likewise.
           (BT_FN_UINT64_UINT64_UINT64_CONST_SIZE): Likewise.
           * builtins.cc (associated_internal_fn): Handle BUILT_IN_CRC8_DATA8,
           BUILT_IN_CRC16_DATA8, BUILT_IN_CRC16_DATA16,
           BUILT_IN_CRC32_DATA8, BUILT_IN_CRC32_DATA16, BUILT_IN_CRC32_DATA32,
           BUILT_IN_CRC64_DATA8, BUILT_IN_CRC64_DATA16, BUILT_IN_CRC64_DATA32,
           BUILT_IN_CRC64_DATA64,
           BUILT_IN_REV_CRC8_DATA8,
           BUILT_IN_REV_CRC16_DATA8, BUILT_IN_REV_CRC16_DATA16,
           BUILT_IN_REV_CRC32_DATA8, BUILT_IN_REV_CRC32_DATA16,
           BUILT_IN_REV_CRC32_DATA32.
           (expand_builtin_crc_table_based): New function.
           (expand_builtin): Handle BUILT_IN_CRC8_DATA8,
           BUILT_IN_CRC16_DATA8, BUILT_IN_CRC16_DATA16,
           BUILT_IN_CRC32_DATA8, BUILT_IN_CRC32_DATA16, BUILT_IN_CRC32_DATA32,
           BUILT_IN_CRC64_DATA8, BUILT_IN_CRC64_DATA16, BUILT_IN_CRC64_DATA32,
           BUILT_IN_CRC64_DATA64,
           BUILT_IN_REV_CRC8_DATA8,
           BUILT_IN_REV_CRC16_DATA8, BUILT_IN_REV_CRC16_DATA16,
           BUILT_IN_REV_CRC32_DATA8, BUILT_IN_REV_CRC32_DATA16,
           BUILT_IN_REV_CRC32_DATA32.

           * builtins.def (BUILT_IN_CRC8_DATA8): New builtin.
           (BUILT_IN_CRC16_DATA8): Likewise.
           (BUILT_IN_CRC16_DATA16): Likewise.
           (BUILT_IN_CRC32_DATA8): Likewise.
           (BUILT_IN_CRC32_DATA16): Likewise.
           (BUILT_IN_CRC32_DATA32): Likewise.
           (BUILT_IN_CRC64_DATA8): Likewise.
           (BUILT_IN_CRC64_DATA16): Likewise.
           (BUILT_IN_CRC64_DATA32): Likewise.
           (BUILT_IN_CRC64_DATA64): Likewise.
           (BUILT_IN_REV_CRC8_DATA8): New builtin.
           (BUILT_IN_REV_CRC16_DATA8): Likewise.
           (BUILT_IN_REV_CRC16_DATA16): Likewise.
           (BUILT_IN_REV_CRC32_DATA8): Likewise.
           (BUILT_IN_REV_CRC32_DATA16): Likewise.
           (BUILT_IN_REV_CRC32_DATA32): Likewise.
           * builtins.h (expand_builtin_crc_table_based): New function
           declaration.
           * doc/extend.texi (__builtin_rev_crc16_data8,
           __builtin_rev_crc32_data32, __builtin_rev_crc32_data8,
           __builtin_rev_crc32_data16, __builtin_crc8_data8,
           __builtin_crc16_data16, __builtin_crc16_data8,
           __builtin_crc32_data32, __builtin_crc32_data8,
           __builtin_crc32_data16, __builtin_crc64_data64,
           __builtin_crc64_data8, __builtin_crc64_data16,
           __builtin_crc64_data32): Document.

       gcc/testsuite/

          * gcc.c-torture/compile/crc-builtin-rev-target32.c
          * gcc.c-torture/compile/crc-builtin-rev-target64.c
          * gcc.c-torture/compile/crc-builtin-target32.c
          * gcc.c-torture/compile/crc-builtin-target64.c

Signed-off-by: Mariam Arutunian



diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index f8d94c4b435..b662de91e49 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2207,7 +2207,24 @@ associated_internal_fn (built_in_function fn,

Re: [RFC/RFA] [PATCH 03/12] RISC-V: Add CRC expander to generate faster CRC.

2024-05-25 Thread Jeff Law




On 5/24/24 2:41 AM, Mariam Arutunian wrote:
If the target is ZBC or ZBKC, it uses the clmul instruction for the CRC
calculation.

Otherwise, if the target is ZBKB, it generates a table-based CRC, but uses
the bswap and brev8 instructions to reverse the inputs and the output.
Add new tests to check CRC generation for ZBC, ZBKC and ZBKB targets.
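
For readers unfamiliar with the reflection trick, the following is a software model of reversing the bits of a value with a byte swap followed by a per-byte bit reversal, then shifting right when the value is narrower than the register.  The helper names bswap32, brev8_32 and reflect_low_bits are hypothetical and a 32-bit register is assumed; this is only a sketch of the idea, not the RTL the patch emits.

```cpp
#include <cassert>
#include <cstdint>

// Software model of a byte swap on a 32-bit value.
constexpr std::uint32_t bswap32 (std::uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u) | (x << 24);
}

// Software model of brev8: reverse the bit order inside each byte.
constexpr std::uint32_t brev8_32 (std::uint32_t x)
{
  x = ((x & 0x55555555u) << 1) | ((x >> 1) & 0x55555555u);
  x = ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u);
  x = ((x & 0x0f0f0f0fu) << 4) | ((x >> 4) & 0x0f0f0f0fu);
  return x;
}

// Reflect the low BITS bits of X (1 <= BITS <= 32).  Swapping the bytes
// and then reversing each byte reverses all 32 bits; the final shift drops
// the bits that did not belong to the narrower value.
inline std::uint32_t reflect_low_bits (std::uint32_t x, int bits)
{
  assert (bits >= 1 && bits <= 32);
  return brev8_32 (bswap32 (x)) >> (32 - bits);
}
```

With a 16-bit value of 1 this yields 0x8000 after shifting out 16 bits, which matches the worked example in the comment on generate_reflecting_code_using_brev.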

   gcc/

      * expr.cc (gf2n_poly_long_div_quotient): New function.
      (reflect): Likewise.
      * expr.h (gf2n_poly_long_div_quotient): New function declaration.
      (reflect): Likewise.

   gcc/config/riscv/

      * bitmanip.md (crc_rev4): New expander for reversed CRC.
      (crc4): New expander for bit-forward CRC.
      (SUBX1, ANYI1): New iterators.
      * riscv-protos.h (generate_reflecting_code_using_brev): New
      function declaration.
      (expand_crc_using_clmul): Likewise.
      (expand_reversed_crc_using_clmul): Likewise.
      * riscv.cc (generate_reflecting_code_using_brev): New function.
      (expand_crc_using_clmul): Likewise.
      (expand_reversed_crc_using_clmul): Likewise.
      * riscv.md (UNSPEC_CRC, UNSPEC_CRC_REV):  New unspecs.

   gcc/testsuite/gcc.target/riscv/

         * crc-1-zbc.c: New test.
         * crc-10-zbc.c: Likewise.
         * crc-12-zbc.c: Likewise.
         * crc-13-zbc.c: Likewise.
         * crc-14-zbc.c: Likewise.
         * crc-17-zbc.c: Likewise.
         * crc-18-zbc.c: Likewise.
         * crc-21-zbc.c: Likewise.
         * crc-22-rv64-zbc.c: Likewise.
         * crc-22-zbkb.c: Likewise.
         * crc-23-zbc.c: Likewise.
         * crc-4-zbc.c: Likewise.
         * crc-5-zbc.c: Likewise.
         * crc-5-zbkb.c: Likewise.
         * crc-6-zbc.c: Likewise.
         * crc-7-zbc.c: Likewise.
         * crc-8-zbc.c: Likewise.
         * crc-8-zbkb.c: Likewise.
         * crc-9-zbc.c: Likewise.
         * crc-CCIT-data16-zbc.c: Likewise.
         * crc-CCIT-data8-zbc.c: Likewise.
         * crc-coremark-16bitdata-zbc.c: Likewise.

Signed-off-by: Mariam Arutunian

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 8769a6b818b..c98d451f404 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -973,3 +973,66 @@
   "TARGET_ZBC"
   "clmulr\t%0,%1,%2"
   [(set_attr "type" "clmul")])
+
+
+;; Iterator for hardware integer modes narrower than XLEN, same as SUBX
+(define_mode_iterator SUBX1 [QI HI (SI "TARGET_64BIT")])
+
+;; Iterator for hardware integer modes narrower than XLEN, same as ANYI
+(define_mode_iterator ANYI1 [QI HI SI (DI "TARGET_64BIT")])
If these iterators are the same as existing ones, let's just use the
existing ones, unless we need both SUBX and SUBX1 (or ANYI and ANYI1) in
the same pattern.





+
+;; Reversed CRC 8, 16, 32 for TARGET_64
+(define_expand "crc_rev4"
+   ;; return value (calculated CRC)
+  [(set (match_operand:ANYI 0 "register_operand" "=r")
+ ;; initial CRC
+   (unspec:ANYI [(match_operand:ANYI 1 "register_operand" "r")
+ ;; data
+ (match_operand:ANYI1 2 "register_operand" "r")
+ ;; polynomial without leading 1
+ (match_operand:ANYI 3)]
+ UNSPEC_CRC_REV))]
So the preferred formatting for .md files has the operands of a given
operator at the same indentation level.  In this case SET is the
operator, with two operands (destination/source), so indent the source
and destination at the same level:


  [(set (match_operand:ANYI 0 ...)
        (unspec:ANYI ...)

Similarly for the reversed expander.



diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 85df5b7ab49..123695033a6 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -11394,7 +11394,7 @@ riscv_expand_usadd (rtx dest, rtx x, rtx y)
   if (mode == HImode || mode == QImode)
 {
   int shift_bits = GET_MODE_BITSIZE (Xmode)
-   - GET_MODE_BITSIZE (mode).to_constant ();
+  - GET_MODE_BITSIZE (mode).to_constant ();
 
   gcc_assert (shift_bits > 0);

Looks like an unrelated spurious change.  Drop.


 
@@ -11415,6 +11415,188 @@ riscv_expand_usadd (rtx dest, rtx x, rtx y)

   emit_move_insn (dest, gen_lowpart (mode, xmode_dest));
 }
 
+/* Generate instruction sequence

+   which reflects the value of the OP using bswap and brev8 instructions.
+   OP's mode may be less than word_mode, to get the correct number,
+   after reflecting we shift right the value by SHIFT_VAL.
+   E.g. we have  0001, after reflection (target 32-bit) we will get
+   1000   , if we shift-out 16 bits,
+   we will get the desired one: 1000 .  */
+
+void
+generate_reflecting_code_using_brev (rtx *op, int shift_val)
+{
+
+  riscv_expand_op (BSWAP, word_mode, *op, *op, *op);
+  riscv_expand_op (LSHIFTRT, word_mode, *op, *op,
+  gen_int_mode (shift_val, word_mode));
Formatting nit with the gen_int_mode (...) argument.  It should line up 
with the LSHIFT

Re: [RFC/RFA] [PATCH 12/12] Add tests for CRC detection and generation.

2024-05-25 Thread Jeff Law




On 5/24/24 2:42 AM, Mariam Arutunian wrote:

   gcc/testsuite/gcc.c-torture/compile/

     * crc-11.c: New test.
     * crc-15.c: Likewise.
     * crc-16.c: Likewise.
     * crc-19.c: Likewise.
     * crc-2.c: Likewise.
     * crc-20.c: Likewise.
     * crc-24.c: Likewise.
     * crc-29.c: Likewise.
     * crc-27.c: Likewise.
     * crc-3.c: Likewise.
     * crc-crc32-data24.c: Likewise.
     * crc-from-fedora-packages (1-24).c: Likewise.
     * crc-linux-(1-5).c: Likewise.
     * crc-not-crc-(1-26).c: Likewise.
     * crc-side-instr-(1-17).c: Likewise.

   gcc/testsuite/gcc.c-torture/execute/

     * crc-(1, 4-10, 12-14, 17-18, 21-28).c: New tests.
     * crc-CCIT-data16-xorOutside_InsideFor.c: Likewise.
     * crc-CCIT-data16.c: Likewise.
     * crc-CCIT-data8.c: Likewise.
     * crc-coremark16-data16.c: Likewise.
     * crc-coremark16-data16.c: Likewise.
     * crc-coremark32-data32.c: Likewise.
     * crc-coremark32-data8.c: Likewise.
     * crc-coremark64-data64.c: Likewise.
     * crc-coremark8-data8.c: Likewise.
     * crc-crc32-data16.c: Likewise.
     * crc-crc32-data8.c: Likewise.
     * crc-crc32.c: Likewise.
     * crc-crc64-data32.c: Likewise.
     * crc-crc64-data64.c: Likewise.
     * crc-crc8-data8-loop-xorInFor.c: Likewise.
     * crc-crc8-data8-loop-xorOutsideFor.c: Likewise.
     * crc-crc8-data8-xorOustideFor.c: Likewise.
     * crc-crc8.c: Likewise.

Signed-off-by: Mariam Arutunian

OK once all prerequisites are approved.

jeff



Re: [RFC/RFA] [PATCH 04/12] RISC-V: Add CRC built-ins tests for the target ZBC.

2024-05-25 Thread Jeff Law




On 5/24/24 2:41 AM, Mariam Arutunian wrote:

   gcc/testsuite/gcc.target/riscv/

     * crc-builtin-zbc32.c: New file.
     * crc-builtin-zbc64.c: Likewise.

OK once prerequisites are approved.

jeff



[committed] [v2] More logical op simplifications in simplify-rtx.cc

2024-05-25 Thread Jeff Law

This is a revamp of what started as a target specific patch.

Basically xalan (corrected, I originally thought it was perlbench) has a 
bitset implementation with a bit of an oddity.  Specifically setBit will 
clear the bit before it is set:



if (bitToSet < 32)
{
fBits1 &= ~mask;
fBits1 |= mask;
}
 else
{
fBits2 &= ~mask;
fBits2 |= mask;
}


We can clean this up pretty easily in RTL with a small bit of code in 
simplify-rtx.  While xalan doesn't have other cases, we can synthesize 
tests pretty easily and handle them as well.



It turns out we don't actually have to recognize this stuff at the bit
level; standard logical identities are sufficient.  For example


(X | Y) & ~Y -> X & ~Y
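
The identities involved are easy to check by brute force.  The sketch below verifies them over all pairs of 8-bit values, which suffices because each identity is bitwise; the function name logical_identities_hold is hypothetical and the code is independent of simplify-rtx.cc.

```cpp
#include <cstdint>

// Returns true if the identities hold for every pair of 8-bit values.
bool logical_identities_hold ()
{
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b)
      {
        const std::uint8_t x = static_cast<std::uint8_t> (a);
        const std::uint8_t y = static_cast<std::uint8_t> (b);
        const std::uint8_t not_y = static_cast<std::uint8_t> (~y);

        // (X | Y) & ~Y -> X & ~Y, and the same with XOR in place of IOR.
        if (((x | y) & not_y) != (x & not_y))
          return false;
        if (((x ^ y) & not_y) != (x & not_y))
          return false;

        // (~Y & X) | Y -> X | Y, the clear-then-set case from setBit,
        // and the same with XOR as the outer operation.
        if (((not_y & x) | y) != (x | y))
          return false;
        if (((not_y & x) ^ y) != (x | y))
          return false;
      }
  return true;
}
```

The third check is the setBit pattern quoted above: clearing the mask bits and then setting them gives the same result as setting them directly.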



Andrew P. might poke at this at the gimple level.  The type changes
kind of get in the way in gimple but he's much better at match.pd than I
am, so if he wants to chase it from the gimple side, I'll fully support
that.


Bootstrapped and regression tested on x86.  Also run through my tester 
on its embedded targets.


Pushing to the trunk.

jeff

commit 05daf617ea22e1d818295ed2d037456937e23530
Author: Jeff Law 
Date:   Sat May 25 12:39:05 2024 -0600

[committed] [v2] More logical op simplifications in simplify-rtx.cc

This is a revamp of what started as a target specific patch.

Basically xalan (corrected, I originally thought it was perlbench) has a
bitset implementation with a bit of an oddity.  Specifically setBit will
clear the bit before it is set:

> if (bitToSet < 32)
> {
> fBits1 &= ~mask;
> fBits1 |= mask;
> }
>  else
> {
> fBits2 &= ~mask;
> fBits2 |= mask;
> }
We can clean this up pretty easily in RTL with a small bit of code in
simplify-rtx.  While xalan doesn't have other cases, we can synthesize tests
pretty easily and handle them as well.

It turns out we don't actually have to recognize this stuff at the bit
level; standard logical identities are sufficient.  For example

(X | Y) & ~Y -> X & ~Y

Andrew P. might poke at this at the gimple level.  The type changes kind of
get in the way in gimple but he's much better at match.pd than I am, so if he
wants to chase it from the gimple side, I'll fully support that.

Bootstrapped and regression tested on x86.  Also run through my tester on
its embedded targets.

Pushing to the trunk.

gcc/

* simplify-rtx.cc (simplify_context::simplify_binary_operation_1): Handle
more logical simplifications.

gcc/testsuite/

* g++.target/riscv/redundant-bitmap-1.C: New test.
* g++.target/riscv/redundant-bitmap-2.C: New test.
* g++.target/riscv/redundant-bitmap-3.C: New test.
* g++.target/riscv/redundant-bitmap-4.C: New test.

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 53f54d1d392..5caf1dfd957 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -3549,6 +3549,12 @@ simplify_context::simplify_binary_operation_1 (rtx_code 
code,
return tem;
}
 
+  /* Convert (ior (and (not A) B) A) into A | B.  */
+  if (GET_CODE (op0) == AND
+ && GET_CODE (XEXP (op0, 0)) == NOT
+ && rtx_equal_p (XEXP (XEXP (op0, 0), 0), op1))
+   return simplify_gen_binary (IOR, mode, XEXP (op0, 1), op1);
+
   tem = simplify_byte_swapping_operation (code, mode, op0, op1);
   if (tem)
return tem;
@@ -3801,6 +3807,12 @@ simplify_context::simplify_binary_operation_1 (rtx_code 
code,
return tem;
}
 
+  /* Convert (xor (and (not A) B) A) into A | B.  */
+  if (GET_CODE (op0) == AND
+ && GET_CODE (XEXP (op0, 0)) == NOT
+ && rtx_equal_p (XEXP (XEXP (op0, 0), 0), op1))
+   return simplify_gen_binary (IOR, mode, XEXP (op0, 1), op1);
+
   tem = simplify_byte_swapping_operation (code, mode, op0, op1);
   if (tem)
return tem;
@@ -4006,6 +4018,23 @@ simplify_context::simplify_binary_operation_1 (rtx_code 
code,
  && rtx_equal_p (op1, XEXP (XEXP (op0, 1), 0)))
return simplify_gen_binary (AND, mode, op1, XEXP (op0, 0));
 
+  /* (and (ior/xor (X Y) (not Y)) -> X & ~Y */
+  if ((GET_CODE (op0) == IOR || GET_CODE (op0) == XOR)
+ && GET_CODE (op1) == NOT
+ && rtx_equal_p (XEXP (op1, 0), XEXP (op0, 1)))
+   return simplify_gen_binary (AND, mode, XEXP (op0, 0),
+   simplify_gen_unary (NOT, mode,
+   XEXP (op1, 0),
+   mode));
+  /* (and (ior/xor (Y X) (not Y)) -> X & ~Y */
+  i

[PATCH] c++: canonicity of fn types w/ instantiated eh specs [PR115223]

2024-05-25 Thread Patrick Palka
Bootstrap and regtest on x86_64-pc-linux-gnu in progress,
does this look OK for trunk if successful?

-- >8 --

When propagating structural equality in build_cp_fntype_variant, we
should consider structural equality of the exception-less variant, not
of the given type which might use structural equality only because of
the (complex) noexcept-spec we're intending to replace, as in
maybe_instantiate_noexcept which calls build_exception_variant using
the function type with a deferred noexcept-spec.  Otherwise we might
pessimistically use structural equality for a function type with a simple
instantiated noexcept-spec, leading to a failed LTO-specific sanity
check if we later use that (structural-equality) type as the canonical
version of some other variant.
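
At the source level, the situation looks roughly like the sketch below: a noexcept-spec that depends on a template parameter is "complex" until instantiation, after which it is just noexcept(true).  The declarations h and k are hypothetical additions for comparison, and this only illustrates the language rules; it does not exercise build_cp_fntype_variant or the LTO sanity check.

```cpp
#include <type_traits>

// Declaration only: the noexcept-spec depends on T, so it cannot be
// evaluated until the template is instantiated.
template <class T>
void f () noexcept (bool (T () || true));

// Hypothetical comparison declarations, never defined or called.
void h () noexcept;
void k ();

// Once instantiated with T = int, the spec evaluates to noexcept(true).
static_assert (noexcept (f<int> ()), "f<int> is noexcept after instantiation");

// The instantiated function has the same pointer type as a function
// declared with a simple noexcept.
static_assert (std::is_same<decltype (&f<int>), decltype (&h)>::value,
               "same type as a plainly noexcept function");

#if __cplusplus >= 201703L
// Since C++17 the exception specification is part of the function type.
static_assert (!std::is_same<decltype (&f<int>), decltype (&k)>::value,
               "differs from the potentially-throwing variant");
#endif
```

All checks happen at compile time in unevaluated contexts, so none of the declared functions needs a definition.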

PR c++/115223

gcc/cp/ChangeLog:

* tree.cc (build_cp_fntype_variant): Propagate structural
equality of the exception-less variant.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept87.C: New test.
---
 gcc/cp/tree.cc  |  4 
 gcc/testsuite/g++.dg/cpp0x/noexcept87.C | 11 +++
 2 files changed, 15 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept87.C

diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index 4d87661b4ad..f810b8cd777 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -2796,6 +2796,10 @@ build_cp_fntype_variant (tree type, cp_ref_qualifier 
rqual,
   bool complex_eh_spec_p = (cr && cr != noexcept_true_spec
&& !UNPARSED_NOEXCEPT_SPEC_P (cr));
 
+  if (!complex_eh_spec_p && TYPE_RAISES_EXCEPTIONS (type))
+/* We want to consider structural equality of the exception-less
+   variant since we'll be replacing the exception specification.  */
+type = build_cp_fntype_variant (type, rqual, /*raises=*/NULL_TREE, late);
   if (TYPE_STRUCTURAL_EQUALITY_P (type) || complex_eh_spec_p)
 /* Propagate structural equality.  And always use structural equality
for function types with a complex noexcept-spec since their identity
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept87.C 
b/gcc/testsuite/g++.dg/cpp0x/noexcept87.C
new file mode 100644
index 000..60b1497472b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept87.C
@@ -0,0 +1,11 @@
+// PR c++/115223
+// { dg-do compile { target c++11 } }
+// { dg-additional-options -flto }
+
+template
+void f() noexcept(bool(T() || true));
+
+void g(int n) { f(); }
+
+using type = void;
+type callDestructorIfNecessary() noexcept {}
-- 
2.45.1.246.gb9cfe4845c



[PATCH v1] Gen-Match: Fix gen_kids_1 right hand braces mis-alignment

2024-05-25 Thread pan2 . li
From: Pan Li 

Notice some mis-alignment for gen_kids_1 right hand braces as below:

  if ((_q50 == _q20 && ! TREE_SIDE_EFFECTS (...
{
  if ((_q51 == _q21 && ! TREE_SIDE_EFFECTS (...
{
  {
tree captures[2] ATTRIBUTE_UNUSED = {...
{
  res_ops[0] = captures[0];
  res_ops[1] = captures[1];
  if (UNLIKELY (debug_dump)) ...
  return true;
}
  }
}
}
}  // mis-aligned here.
 }

The tests below passed for this patch:
* The x86 bootstrap test.
* The x86 full regression test.

gcc/ChangeLog:

* genmatch.cc (dt_node::gen_kids_1): Fix mis-aligned indent.

Signed-off-by: Pan Li 
---
 gcc/genmatch.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index c982c95b70f..f1e0e7abe0c 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -3428,7 +3428,7 @@ dt_node::gen_kids_1 (FILE *f, int indent, bool gimple, 
int depth,
  child_opname, kid_opname, j);
}
   preds[i]->gen_kids (f, indent + 4, gimple, depth);
-  fprintf (f, "}\n");
+  fprintf_indent (f, indent, "  }\n");
   indent -= 2;
   fprintf_indent (f, indent, "}\n");
 }
-- 
2.34.1



[COMMITTED] tree-optimization/115208 - Delete gori_map during destruction of GORI.

2024-05-25 Thread Andrew MacLeod

When a GORI object is constructed, we construct both GORI and a gori_map.

During destruction, I neglected to destruct the associated gori_map.  
doh!  sorry.
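
The shape of the bug and the fix can be sketched with simplified stand-in classes.  All names below (map_like, engine_like, query_like) are hypothetical, and a null pointer is used where the real code uses a default_gori sentinel; the point is only that whatever create allocates, destroy must release.

```cpp
#include <cassert>

// Hypothetical stand-ins that count live instances so a leak is visible.
struct map_like
{
  static int live;
  map_like () { ++live; }
  ~map_like () { --live; }
};
int map_like::live = 0;

struct engine_like
{
  static int live;
  explicit engine_like (map_like &m) : m_map_ref (m) { ++live; }
  ~engine_like () { --live; }
  map_like &m_map_ref;
};
int engine_like::live = 0;

class query_like
{
public:
  query_like () = default;
  query_like (const query_like &) = delete;
  query_like &operator= (const query_like &) = delete;
  ~query_like () { destroy (); }

  // Allocates both objects; neither may already exist.
  void create ()
  {
    assert (m_engine == nullptr && m_map == nullptr);
    m_map = new map_like ();
    m_engine = new engine_like (*m_map);
  }

  // Releases both objects.  Forgetting the map here is the leak.
  void destroy ()
  {
    delete m_engine;
    m_engine = nullptr;
    delete m_map;
    m_map = nullptr;
  }

private:
  engine_like *m_engine = nullptr;
  map_like *m_map = nullptr;
};
```

Resetting the pointers after deletion is what lets a later create call assert that nothing is still allocated.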


Bootstraps on x86_64-pc-linux-gnu with no regressions.  And hopefully 
resolves everyone's issues.


Andrew


From e98cf19c2be1ffaf65d625e1d8b927a42e77b009 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Sat, 25 May 2024 12:28:52 -0400
Subject: [PATCH 4/4] Delete gori_map during destruction of a GORI.

Forgot to free the gori_map object when a gori object is freed.

	PR tree-optimization/115208
	* value-query.cc (range_query::create_gori): Confirm gori_map is NULL.
	(range_query::destroy_gori): Free gori_map if one was allocated.
---
 gcc/value-query.cc | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/value-query.cc b/gcc/value-query.cc
index 0d0c0e8058e..556a0f39b09 100644
--- a/gcc/value-query.cc
+++ b/gcc/value-query.cc
@@ -188,6 +188,7 @@ void
 range_query::create_gori (int not_executable_flag, int sw_max_edges)
 {
   gcc_checking_assert (m_gori == &default_gori);
+  gcc_checking_assert (m_map == NULL);
   m_map = new gori_map ();
   gcc_checking_assert (m_map);
   m_gori = new gori_compute (*m_map, not_executable_flag, sw_max_edges);
@@ -199,6 +200,9 @@ range_query::destroy_gori ()
 {
   if (m_gori && m_gori != &default_gori)
 delete m_gori;
+  if (m_map)
+delete m_map;
+  m_map = NULL;
   m_gori= &default_gori;
 }
 
-- 
2.41.0



Re: [RFC/RFA] [PATCH 04/12] RISC-V: Add CRC built-ins tests for the target ZBC.

2024-05-25 Thread Mariam Arutunian
On Sat, May 25, 2024, 22:35 Jeff Law  wrote:

>
>
> On 5/24/24 2:41 AM, Mariam Arutunian wrote:
> >gcc/testsuite/gcc.target/riscv/
> >
> >  * crc-builtin-zbc32.c: New file.
> >  * crc-builtin-zbc64.c: Likewise.
> OK once prerequisites are approved.
>
> jeff
>

Thank you.

>


Re: [RFC/RFA] [PATCH 12/12] Add tests for CRC detection and generation.

2024-05-25 Thread Mariam Arutunian
On Sat, May 25, 2024, 22:34 Jeff Law  wrote:

>
>
> On 5/24/24 2:42 AM, Mariam Arutunian wrote:
> >gcc/testsuite/gcc.c-torture/compile/
> >
> >  * crc-11.c: New test.
> >  * crc-15.c: Likewise.
> >  * crc-16.c: Likewise.
> >  * crc-19.c: Likewise.
> >  * crc-2.c: Likewise.
> >  * crc-20.c: Likewise.
> >  * crc-24.c: Likewise.
> >  * crc-29.c: Likewise.
> >  * crc-27.c: Likewise.
> >  * crc-3.c: Likewise.
> >  * crc-crc32-data24.c: Likewise.
> >  * crc-from-fedora-packages (1-24).c: Likewise.
> >  * crc-linux-(1-5).c: Likewise.
> >  * crc-not-crc-(1-26).c: Likewise.
> >  * crc-side-instr-(1-17).c: Likewise.
> >
> >gcc/testsuite/gcc.c-torture/execute/
> >
> >  * crc-(1, 4-10, 12-14, 17-18, 21-28).c: New tests.
> >  * crc-CCIT-data16-xorOutside_InsideFor.c: Likewise.
> >  * crc-CCIT-data16.c: Likewise.
> >  * crc-CCIT-data8.c: Likewise.
> >  * crc-coremark16-data16.c: Likewise.
> >  * crc-coremark16-data16.c: Likewise.
> >  * crc-coremark32-data32.c: Likewise.
> >  * crc-coremark32-data8.c: Likewise.
> >  * crc-coremark64-data64.c: Likewise.
> >  * crc-coremark8-data8.c: Likewise.
> >  * crc-crc32-data16.c: Likewise.
> >  * crc-crc32-data8.c: Likewise.
> >  * crc-crc32.c: Likewise.
> >  * crc-crc64-data32.c: Likewise.
> >  * crc-crc64-data64.c: Likewise.
> >  * crc-crc8-data8-loop-xorInFor.c: Likewise.
> >  * crc-crc8-data8-loop-xorOutsideFor.c: Likewise.
> >  * crc-crc8-data8-xorOustideFor.c: Likewise.
> >  * crc-crc8.c: Likewise.
> >
> > Signed-off-by: Mariam Arutunian  > >
> OK once all prerequisites are approved.
>
> jeff
>

Thank you.

>
>