Re: [PATCH] tree-optimization/94864 - vector insert of vector extract simplification

2023-07-15 Thread Andrew Pinski via Gcc-patches
On Wed, Jul 12, 2023 at 6:37 AM Richard Biener via Gcc-patches
 wrote:
>
> The PRs ask for optimizing of
>
>   _1 = BIT_FIELD_REF ;
>   result_4 = BIT_INSERT_EXPR ;
>
> to a vector permutation.  The following implements this as
> match.pd pattern, improving code generation on x86_64.

This is really a more general case of PR 93080, where we had:

_1 = BIT_FIELD_REF <_2, 64, 64>;
 result_4 = BIT_INSERT_EXPR <_2, _1, 64>;
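
For readers following along, the two GIMPLE shapes under discussion can be produced from source roughly as follows. This is a minimal sketch using GCC's generic vector extension; the function names are hypothetical, and the first function simply mirrors the move_sd testcase from the patch. Whether either function ends up as a single permute depends on the target and optimization level.

```c
typedef double v2df __attribute__((vector_size(16)));

/* Two-vector shape (as in the pr94864.c testcase): extract element 1
   of b and insert it as element 0 of a.  */
static v2df insert_from_other(v2df a, v2df b)
{
    v2df result = a;
    result[0] = b[1];
    return result;
}

/* Same-vector shape (as quoted for PR 93080): for 64-bit elements,
   bit position 64 is element 1, so this extracts element 1 and
   inserts it back at element 1 of the same vector.  */
static v2df insert_from_self(v2df a)
{
    v2df result = a;
    result[1] = a[1];
    return result;
}
```

In the same-vector shape the extract and insert positions coincide, so the result equals the input; the two-vector shape is the one that needs a real two-input permute.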

Thanks,
Andrew Pinski

>
> On the RTL level we face the issue that backend patterns inconsistently
> use vec_merge and vec_select of vec_concat to represent permutes.
>
> I think using a (supported) permute is almost always better
> than an extract plus insert, maybe excluding the case we extract
> element zero and that's aliased to a register that can be used
> directly for insertion (not sure how to query that).
>
> But this regresses for example gcc.target/i386/pr54855-8.c because PRE
> now realizes that
>
>   _1 = BIT_FIELD_REF ;
>   if (_1 > a_4(D))
> goto ; [50.00%]
>   else
> goto ; [50.00%]
>
>[local count: 536870913]:
>
>[local count: 1073741824]:
>   # iftmp.0_2 = PHI <_1(3), a_4(D)(2)>
>   x_5 = BIT_INSERT_EXPR ;
>
> is equal to
>
>[local count: 1073741824]:
>   _1 = BIT_FIELD_REF ;
>   if (_1 > a_4(D))
> goto ; [50.00%]
>   else
> goto ; [50.00%]
>
>[local count: 536870912]:
>   _7 = BIT_INSERT_EXPR ;
>
>[local count: 1073741824]:
>   # prephitmp_8 = PHI 
>
> and that no longer produces the desired maxsd operation at the RTL
> level (we fail to match .FMAX at the GIMPLE level earlier).
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu with regressions:
>
> FAIL: gcc.target/i386/pr54855-13.c scan-assembler-times vmaxsh[ t] 1
> FAIL: gcc.target/i386/pr54855-13.c scan-assembler-not vcomish[ t]
> FAIL: gcc.target/i386/pr54855-8.c scan-assembler-times maxsd 1
> FAIL: gcc.target/i386/pr54855-8.c scan-assembler-not movsd
> FAIL: gcc.target/i386/pr54855-9.c scan-assembler-times minss 1
> FAIL: gcc.target/i386/pr54855-9.c scan-assembler-not movss
>
> I think this is also PR88540 (the lack of min/max detection, not
> sure if the SSE min/max are suitable here)
>
> PR tree-optimization/94864
> PR tree-optimization/94865
> * match.pd (bit_insert @0 (BIT_FIELD_REF @1 ..) ..): New pattern
> for vector insertion from vector extraction.
>
> * gcc.target/i386/pr94864.c: New testcase.
> * gcc.target/i386/pr94865.c: Likewise.
> ---
>  gcc/match.pd| 25 +
>  gcc/testsuite/gcc.target/i386/pr94864.c | 13 +
>  gcc/testsuite/gcc.target/i386/pr94865.c | 13 +
>  3 files changed, 51 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr94864.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr94865.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 8543f777a28..8cc106049c4 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -7770,6 +7770,31 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   wi::to_wide (@ipos) + isize))
>  (BIT_FIELD_REF @0 @rsize @rpos)
>
> +/* Simplify vector inserts of other vector extracts to a permute.  */
> +(simplify
> + (bit_insert @0 (BIT_FIELD_REF@2 @1 @rsize @rpos) @ipos)
> + (if (VECTOR_TYPE_P (type)
> +  && types_match (@0, @1)
> +  && types_match (TREE_TYPE (TREE_TYPE (@0)), TREE_TYPE (@2))
> +  && TYPE_VECTOR_SUBPARTS (type).is_constant ())
> +  (with
> +   {
> + unsigned HOST_WIDE_INT elsz
> +   = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (TREE_TYPE (@1))));
> + poly_uint64 relt = exact_div (tree_to_poly_uint64 (@rpos), elsz);
> + poly_uint64 ielt = exact_div (tree_to_poly_uint64 (@ipos), elsz);
> + unsigned nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
> + vec_perm_builder builder;
> + builder.new_vector (nunits, nunits, 1);
> + for (unsigned i = 0; i < nunits; ++i)
> +   builder.quick_push (known_eq (ielt, i) ? nunits + relt : i);
> + vec_perm_indices sel (builder, 2, nunits);
> +   }
> +   (if (!VECTOR_MODE_P (TYPE_MODE (type))
> +   || can_vec_perm_const_p (TYPE_MODE (type), TYPE_MODE (type), sel, false))
> +(vec_perm @0 @1 { vec_perm_indices_to_tree
> +(build_vector_type (ssizetype, nunits), sel); })))))
> +
>  (if (canonicalize_math_after_vectorization_p ())
>   (for fmas (FMA)
>(simplify
> diff --git a/gcc/testsuite/gcc.target/i386/pr94864.c 
> b/gcc/testsuite/gcc.target/i386/pr94864.c
> new file mode 100644
> index 000..69cb481fcfe
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr94864.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2 -mno-avx" } */
> +
> +typedef double v2df __attribute__((vector_size(16)));
> +
> +v2df move_sd(v2df a, v2df b)
> +{
> +v2df result = a;
> +result[0] = b[1];
> +return result;
> +}
> +
> +/* { dg-final { scan-assembler "unpckhpd\[\\t \]%xmm0, %xmm1" } } */
> diff --git a/gcc/test
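
The selector loop in the quoted match.pd pattern can be modelled in plain C. The sketch below is not GCC code: build_insert_selector and permute2 are hypothetical helpers, and plain arrays stand in for vec_perm_builder and VEC_PERM_EXPR. It only shows that lane i selects itself from the first input, except the inserted lane, which selects lane relt of the second input (selector values nunits and above name the second input).

```c
#include <assert.h>

/* Fill sel[0..nunits-1] the way the loop in the pattern does.  */
static void build_insert_selector(unsigned *sel, unsigned nunits,
                                  unsigned ielt, unsigned relt)
{
    assert(ielt < nunits && relt < nunits);
    for (unsigned i = 0; i < nunits; ++i)
        sel[i] = (i == ielt) ? nunits + relt : i;
}

/* Reference two-input permute: out[i] = concat(v0, v1)[sel[i]].  */
static void permute2(double *out, const double *v0, const double *v1,
                     const unsigned *sel, unsigned nunits)
{
    for (unsigned i = 0; i < nunits; ++i)
        out[i] = sel[i] < nunits ? v0[sel[i]] : v1[sel[i] - nunits];
}
```

For the move_sd testcase (nunits = 2, ielt = 0, relt = 1) this gives the selector {3, 1}, i.e. element 1 of b followed by element 1 of a.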

Re: [PATCH] tree-optimization/94864 - vector insert of vector extract simplification

2023-07-15 Thread Andrew Pinski via Gcc-patches
On Sat, Jul 15, 2023 at 10:31 AM Andrew Pinski  wrote:
>
> On Wed, Jul 12, 2023 at 6:37 AM Richard Biener via Gcc-patches
>  wrote:
> >
> > The PRs ask for optimizing of
> >
> >   _1 = BIT_FIELD_REF ;
> >   result_4 = BIT_INSERT_EXPR ;
> >
> > to a vector permutation.  The following implements this as
> > match.pd pattern, improving code generation on x86_64.
>
> This is more general case of PR 93080 really where we had:
>
> _1 = BIT_FIELD_REF <_2, 64, 64>;
>  result_4 = BIT_INSERT_EXPR <_2, _1, 64>;

I should mention that the i386 failures show up even with the limited patch
for PR 93080, which is why I didn't move forward with my patch there.

>
> Thanks,
> Andrew Pinski
>
> >
> > On the RTL level we face the issue that backend patterns inconsistently
> > use vec_merge and vec_select of vec_concat to represent permutes.
> >
> > I think using a (supported) permute is almost always better
> > than an extract plus insert, maybe excluding the case we extract
> > element zero and that's aliased to a register that can be used
> > directly for insertion (not sure how to query that).
> >
> > But this regresses for example gcc.target/i386/pr54855-8.c because PRE
> > now realizes that
> >
> >   _1 = BIT_FIELD_REF ;
> >   if (_1 > a_4(D))
> > goto ; [50.00%]
> >   else
> > goto ; [50.00%]
> >
> >[local count: 536870913]:
> >
> >[local count: 1073741824]:
> >   # iftmp.0_2 = PHI <_1(3), a_4(D)(2)>
> >   x_5 = BIT_INSERT_EXPR ;
> >
> > is equal to
> >
> >[local count: 1073741824]:
> >   _1 = BIT_FIELD_REF ;
> >   if (_1 > a_4(D))
> > goto ; [50.00%]
> >   else
> > goto ; [50.00%]
> >
> >[local count: 536870912]:
> >   _7 = BIT_INSERT_EXPR ;
> >
> >[local count: 1073741824]:
> >   # prephitmp_8 = PHI 
> >
> > and that no longer produces the desired maxsd operation at the RTL
> > level (we fail to match .FMAX at the GIMPLE level earlier).
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu with regressions:
> >
> > FAIL: gcc.target/i386/pr54855-13.c scan-assembler-times vmaxsh[ t] 1
> > FAIL: gcc.target/i386/pr54855-13.c scan-assembler-not vcomish[ t]
> > FAIL: gcc.target/i386/pr54855-8.c scan-assembler-times maxsd 1
> > FAIL: gcc.target/i386/pr54855-8.c scan-assembler-not movsd
> > FAIL: gcc.target/i386/pr54855-9.c scan-assembler-times minss 1
> > FAIL: gcc.target/i386/pr54855-9.c scan-assembler-not movss
> >
> > I think this is also PR88540 (the lack of min/max detection, not
> > sure if the SSE min/max are suitable here)
> >
> > PR tree-optimization/94864
> > PR tree-optimization/94865
> > * match.pd (bit_insert @0 (BIT_FIELD_REF @1 ..) ..): New pattern
> > for vector insertion from vector extraction.
> >
> > * gcc.target/i386/pr94864.c: New testcase.
> > * gcc.target/i386/pr94865.c: Likewise.
> > ---
> >  gcc/match.pd| 25 +
> >  gcc/testsuite/gcc.target/i386/pr94864.c | 13 +
> >  gcc/testsuite/gcc.target/i386/pr94865.c | 13 +
> >  3 files changed, 51 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr94864.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr94865.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 8543f777a28..8cc106049c4 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -7770,6 +7770,31 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >   wi::to_wide (@ipos) + isize))
> >  (BIT_FIELD_REF @0 @rsize @rpos)
> >
> > +/* Simplify vector inserts of other vector extracts to a permute.  */
> > +(simplify
> > + (bit_insert @0 (BIT_FIELD_REF@2 @1 @rsize @rpos) @ipos)
> > + (if (VECTOR_TYPE_P (type)
> > +  && types_match (@0, @1)
> > +  && types_match (TREE_TYPE (TREE_TYPE (@0)), TREE_TYPE (@2))
> > +  && TYPE_VECTOR_SUBPARTS (type).is_constant ())
> > +  (with
> > +   {
> > + unsigned HOST_WIDE_INT elsz
> > +   = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (TREE_TYPE (@1))));
> > + poly_uint64 relt = exact_div (tree_to_poly_uint64 (@rpos), elsz);
> > + poly_uint64 ielt = exact_div (tree_to_poly_uint64 (@ipos), elsz);
> > + unsigned nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
> > + vec_perm_builder builder;
> > + builder.new_vector (nunits, nunits, 1);
> > + for (unsigned i = 0; i < nunits; ++i)
> > +   builder.quick_push (known_eq (ielt, i) ? nunits + relt : i);
> > + vec_perm_indices sel (builder, 2, nunits);
> > +   }
> > +   (if (!VECTOR_MODE_P (TYPE_MODE (type))
> > +   || can_vec_perm_const_p (TYPE_MODE (type), TYPE_MODE (type), sel, false))
> > +(vec_perm @0 @1 { vec_perm_indices_to_tree
> > +(build_vector_type (ssizetype, nunits), sel); })))))
> > +
> >  (if (canonicalize_math_after_vectorization_p ())
> >   (for fmas (FMA)
> >(simplify
> > diff --git a/gcc/testsuite/gcc.target/i386/pr94864.c 
> > b/gcc/testsuite/gcc.target/i386/pr94864.c
> > new file mode 100644
> > index 00

[committed] hppa: Modify TLS patterns to provide both 32 and 64-bit support

2023-07-15 Thread John David Anglin
Tested on hppa-unknown-linux-gnu and hppa64-hp-hpux11.11.

Committed to trunk.

Dave
---

hppa: Modify TLS patterns to provide both 32 and 64-bit support.

2023-07-15  John David Anglin  

gcc/ChangeLog:

* config/pa/pa.md: Define constants R1_REGNUM, R19_REGNUM and
R27_REGNUM.
(tgd_load): Restrict to !TARGET_64BIT. Use register constants.
(tld_load): Likewise.
(tgd_load_pic): Change to expander.
(tld_load_pic, tld_offset_load, tp_load): Likewise.
(tie_load_pic, tle_load): Likewise.
(tgd_load_picsi, tgd_load_picdi): New.
(tld_load_picsi, tld_load_picdi): New.
(tld_offset_load): New.
(tp_load): New.
(tie_load_picsi, tie_load_picdi): New.
(tle_load): New.
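
As background, patterns like these come into play when the compiler needs the address of a thread-local variable. A minimal sketch of source that requires such an access is below; the names are hypothetical, and which TLS access model (and therefore which of the patterns above) is used depends on the target, PIC settings, and symbol visibility, none of which this example controls.

```c
/* Externally visible thread-local variable: each thread has its own copy.  */
_Thread_local int tls_counter = 0;

/* Increment the calling thread's copy and return the new value.  */
int bump_tls_counter(void)
{
    return ++tls_counter;
}
```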

diff --git a/gcc/config/pa/pa.md b/gcc/config/pa/pa.md
index 726e12768f8..f603591447d 100644
--- a/gcc/config/pa/pa.md
+++ b/gcc/config/pa/pa.md
@@ -108,6 +108,14 @@
(MAX_17BIT_OFFSET   262100) ; 17-bit branch
   ])
 
+;; Register numbers
+
+(define_constants
+  [(R1_REGNUM   1)
+   (R19_REGNUM 19)
+   (R27_REGNUM 27)
+  ])
+
 ;; Mode and code iterators
 
 ;; This mode iterator allows :P to be used for patterns that operate on
@@ -10262,9 +10270,9 @@ add,l %2,%3,%3\;bv,n %%r0(%3)"
 (define_insn "tgd_load"
  [(set (match_operand:SI 0 "register_operand" "=r")
(unspec:SI [(match_operand 1 "tgd_symbolic_operand" "")] UNSPEC_TLSGD))
-  (clobber (reg:SI 1))
-  (use (reg:SI 27))]
-  ""
+  (clobber (reg:SI R1_REGNUM))
+  (use (reg:SI R27_REGNUM))]
+  "!TARGET_64BIT"
   "*
 {
   return \"addil LR'%1-$tls_gdidx$,%%r27\;ldo RR'%1-$tls_gdidx$(%%r1),%0\";
@@ -10272,12 +10280,25 @@ add,l %2,%3,%3\;bv,n %%r0(%3)"
   [(set_attr "type" "multi")
(set_attr "length" "8")])
 
-(define_insn "tgd_load_pic"
+(define_expand "tgd_load_pic"
+ [(set (match_operand 0 "register_operand")
+   (unspec [(match_operand 1 "tgd_symbolic_operand")] UNSPEC_TLSGD_PIC))
+  (clobber (reg R1_REGNUM))]
+  ""
+{
+  if (TARGET_64BIT)
+emit_insn (gen_tgd_load_picdi (operands[0], operands[1]));
+  else
+emit_insn (gen_tgd_load_picsi (operands[0], operands[1]));
+  DONE;
+})
+
+(define_insn "tgd_load_picsi"
  [(set (match_operand:SI 0 "register_operand" "=r")
(unspec:SI [(match_operand 1 "tgd_symbolic_operand" "")] 
UNSPEC_TLSGD_PIC))
-  (clobber (reg:SI 1))
-  (use (reg:SI 19))]
-  ""
+  (clobber (reg:SI R1_REGNUM))
+  (use (reg:SI R19_REGNUM))]
+  "!TARGET_64BIT"
   "*
 {
   return \"addil LT'%1-$tls_gdidx$,%%r19\;ldo RT'%1-$tls_gdidx$(%%r1),%0\";
@@ -10285,12 +10306,25 @@ add,l %2,%3,%3\;bv,n %%r0(%3)"
   [(set_attr "type" "multi")
(set_attr "length" "8")])
 
+(define_insn "tgd_load_picdi"
+ [(set (match_operand:DI 0 "register_operand" "=r")
+   (unspec:DI [(match_operand 1 "tgd_symbolic_operand" "")] 
UNSPEC_TLSGD_PIC))
+  (clobber (reg:DI R1_REGNUM))
+  (use (reg:DI R27_REGNUM))]
+  "TARGET_64BIT"
+  "*
+{
+  return \"addil LT'%1-$tls_gdidx$,%%r27\;ldo RT'%1-$tls_gdidx$(%%r1),%0\";
+}"
+  [(set_attr "type" "multi")
+   (set_attr "length" "8")])
+
 (define_insn "tld_load"
  [(set (match_operand:SI 0 "register_operand" "=r")
(unspec:SI [(match_operand 1 "tld_symbolic_operand" "")] UNSPEC_TLSLDM))
-  (clobber (reg:SI 1))
-  (use (reg:SI 27))]
-  ""
+  (clobber (reg:SI R1_REGNUM))
+  (use (reg:SI R27_REGNUM))]
+  "!TARGET_64BIT"
   "*
 {
   return \"addil LR'%1-$tls_ldidx$,%%r27\;ldo RR'%1-$tls_ldidx$(%%r1),%0\";
@@ -10298,12 +10332,25 @@ add,l %2,%3,%3\;bv,n %%r0(%3)"
   [(set_attr "type" "multi")
(set_attr "length" "8")])
 
-(define_insn "tld_load_pic"
+(define_expand "tld_load_pic"
+ [(set (match_operand 0 "register_operand")
+   (unspec [(match_operand 1 "tld_symbolic_operand")] UNSPEC_TLSLDM_PIC))
+  (clobber (reg R1_REGNUM))]
+  ""
+{
+  if (TARGET_64BIT)
+emit_insn (gen_tld_load_picdi (operands[0], operands[1]));
+  else
+emit_insn (gen_tld_load_picsi (operands[0], operands[1]));
+  DONE;
+})
+
+(define_insn "tld_load_picsi"
  [(set (match_operand:SI 0 "register_operand" "=r")
(unspec:SI [(match_operand 1 "tld_symbolic_operand" "")] 
UNSPEC_TLSLDM_PIC))
-  (clobber (reg:SI 1))
-  (use (reg:SI 19))]
-  ""
+  (clobber (reg:SI R1_REGNUM))
+  (use (reg:SI R19_REGNUM))]
+  "!TARGET_64BIT"
   "*
 {
   return \"addil LT'%1-$tls_ldidx$,%%r19\;ldo RT'%1-$tls_ldidx$(%%r1),%0\";
@@ -10311,12 +10358,40 @@ add,l %2,%3,%3\;bv,n %%r0(%3)"
   [(set_attr "type" "multi")
(set_attr "length" "8")])
 
-(define_insn "tld_offset_load"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-(plus:SI (unspec:SI [(match_operand 1 "tld_symbolic_operand" "")] 
+(define_insn "tld_load_picdi"
+ [(set (match_operand:DI 0 "register_operand" "=r")
+   (unspec:DI [(match_operand 1 "tld_symbolic_operand" "")] 
UNSPEC_TLSLDM_PIC))
+  (clobber (reg:DI R1_REGNUM))
+  (use (reg:DI R27_REGNUM))]
+  "TARGET_64BIT"
+  "*
+{
+  return \"addil LT'%1-$tls_ldidx$,%%r27\;ldo RT'%1-$tls_ldidx$(%%r1),%0\";
+}"
+  [(set_attr "typ

[PATCH] Update my contrib entry

2023-07-15 Thread Andrew Pinski via Gcc-patches
Committed as obvious after making sure the documentation still builds.

gcc/ChangeLog:

* doc/contrib.texi: Update my entry.
---
 gcc/doc/contrib.texi | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/contrib.texi b/gcc/doc/contrib.texi
index fa551c5f900..d7b73e179a5 100644
--- a/gcc/doc/contrib.texi
+++ b/gcc/doc/contrib.texi
@@ -809,7 +809,8 @@ Marek Polacek for his work on the C front end, the 
sanitizers and general
 bug fixing.
 
 @item
-Andrew Pinski for processing bug reports by the dozen.
+Andrew Pinski for processing bug reports by the dozen, maintenance of the
+Objective-C runtime libraries, and many scalar optimizations.
 
 @item
 Ovidiu Predescu for his work on the Objective-C front end and runtime
-- 
2.31.1



Re: [PATCH v3] Introduce attribute reverse_alias

2023-07-15 Thread Nathan Sidwell via Gcc-patches
Not commenting on the semantics, but the name seems unfortunate (hello 
bikeshed).  The documentation starts with 'attribute causes @var{name} to be 
emitted as an alias to the definition'.  So it is not emitting a 'reverse alias', 
whatever that might be.  It doesn't seem to mention how reverse_alias differs 
from 'alias'.  Why would 'alias' not DTRT?


Is it emitting an additional symbol, i.e. something like 'altname'?  Or is it 
something else?  Is that symbol known in the current TU, or other TUs?
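
For comparison, this is how the existing 'alias' attribute is used today. It is a minimal sketch with hypothetical names, and it needs a target that supports symbol aliases (e.g. ELF). It deliberately does not show the proposed attribute, whose exact syntax is not quoted in this message.

```c
/* The definition that actually gets emitted.  */
int real_impl(int x)
{
    return x + 1;
}

/* Existing 'alias' attribute: it sits on a separate declaration and
   names the target, so 'other_name' becomes a second symbol for the
   same code within this translation unit.  */
int other_name(int x) __attribute__((alias("real_impl")));
```

Going by the patch description quoted below, the proposed attribute is instead attached to the entity being defined and adds the extra names when that definition is output, which seems to be the direction the word "reverse" refers to.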


nathan



On 7/14/23 21:08, Alexandre Oliva wrote:


This patch introduces an attribute to add extra aliases to a symbol
when its definition is output.  The main goal is to ease interfacing
C++ with Ada, as C++ mangled names have to be named, and in some cases
(e.g. when using stdint.h typedefs in function arguments) the symbol
names may vary across platforms.

The attribute is usable in C and C++, presumably in all C-family
languages.  It can be attached to global variables and functions.  In
C++, it can also be attached to namespace-scoped variables and
functions, static data members, member functions, explicit
instantiations and specializations of template functions, members and
classes.

When applied to constructors or destructors, additional reverse_aliases
with _Base and _Del suffixes are defined for variants other than
complete-object ones.  This changes the assumption that clones always
carry the same attributes as their abstract declarations, so there is
now a function to adjust them.

C++ also had a bug in which attributes from local extern declarations
failed to be propagated to a preexisting corresponding
namespace-scoped decl.  I've fixed that, and adjusted acc tests that
distinguished between C and C++ in this regard.

Applying the attribute to class types is only valid in C++, and the
effect is to attach the alias to the RTTI object associated with the
class type.

Regstrapped on x86_64-linux-gnu.  Ok to install?

This is refreshed and renamed from earlier versions that named the
attribute 'exalias', and that AFAICT got stuck in name bikeshedding.
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551614.html


for  gcc/ChangeLog

* attribs.cc: Include cgraph.h.
(decl_attributes): Allow late introduction of reverse_alias in
types.
(create_reverse_alias_decl, create_reverse_alias_decls): New.
* attribs.h: Declare them.
(FOR_EACH_REVERSE_ALIAS): New macro.
* cgraph.cc (cgraph_node::create): Create reverse_alias decls.
* varpool.cc (varpool_node::get_create): Create reverse_alias
decls.
* cgraph.h (symtab_node::remap_reverse_alias_target): New.
* symtab.cc (symtab_node::remap_reverse_alias_target):
Define.
* cgraphunit.cc (cgraph_node::analyze): Create alias_target
node if needed.
(analyze_functions): Fixup visibility of implicit alias only
after its node is analyzed.
* doc/extend.texi (reverse_alias): Document for variables,
functions and types.

for  gcc/ada/ChangeLog

* doc/gnat_rm/interfacing_to_other_languages.rst: Mention
attribute reverse_alias to give RTTI symbols mnemonic names.
* doc/gnat_ugn/the_gnat_compilation_model.rst: Mention
attribute reverse_alias.  Fix incorrect ref to C1 ctor variant.

for  gcc/c-family/ChangeLog

* c-ada-spec.cc (pp_asm_name): Use first reverse_alias if
available.
* c-attribs.cc (handle_reverse_alias_attribute): New.
(c_common_attribute_table): Add reverse_alias.
(handle_copy_attribute): Do not copy reverse_alias.

for  gcc/c/ChangeLog

* c-decl.cc (duplicate_decls): Remap reverse_alias target.

for  gcc/cp/ChangeLog

* class.cc (adjust_clone_attributes): New.
(copy_fndecl_with_name, build_clone): Call it.
* cp-tree.h (adjust_clone_attributes): Declare.
(update_reverse_alias_interface): Declare.
(update_tinfo_reverse_alias): Declare.
* decl.cc (duplicate_decls): Remap reverse_alias target.
Adjust clone attributes.
(grokfndecl): Tentatively create reverse_alias decls after
adding attributes in e.g. a template member function explicit
instantiation.
* decl2.cc (cplus_decl_attributes): Update tinfo
reverse_alias.
(copy_interface, update_reverse_alias_interface): New.
(determine_visibility): Update reverse_alias interface.
(tentative_decl_linkage, import_export_decl): Likewise.
* name-lookup.cc: Include target.h and cgraph.h.
(push_local_extern_decl_alias): Merge attributes with
namespace-scoped decl, and drop duplicate reverse_alias.
* optimize.cc (maybe_clone_body): Re-adjust attributes after
cloning them.  Update reverse_alias interface.
* rtti.cc: Include attribs.h and cgraph.h.
(get_tinfo_decl): Copy reverse_alias attributes from type to
tinfo decl.  Create re

[PATCH v1|GCC-13] RISC-V: Bugfix for riscv-vsetvl pass.

2023-07-15 Thread Pan Li via Gcc-patches
From: Ju-Zhe Zhong 

This patch is part of the change below, which located a bug in the RVV
vsetvl pass during auto-vectorization.

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624523.html

Unfortunately, it is not easy to reproduce this bug with the intrinsic APIs,
but it is worth backporting to GCC 13.

Signed-off-by: Ju-Zhe Zhong 

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Add vl parameter.
(change_vsetvl_insn): Ditto.
(change_insn): Add validate change as well as assert.
(pass_vsetvl::backward_demand_fusion): Allow forward.
---
 gcc/config/riscv/riscv-vsetvl.cc | 29 +++--
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 3355ca4e3fb..fbd26988106 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -633,7 +633,8 @@ gen_vsetvl_pat (enum vsetvl_type insn_type, const 
vl_vtype_info &info, rtx vl)
 }
 
 static rtx
-gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info)
+gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info,
+   rtx vl = NULL_RTX)
 {
   rtx new_pat;
   vl_vtype_info new_info = info;
@@ -644,7 +645,7 @@ gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info 
&info)
   if (vsetvl_insn_p (rinsn) || vlmax_avl_p (info.get_avl ()))
 {
   rtx dest = get_vl (rinsn);
-  new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, dest);
+  new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, vl ? vl : dest);
 }
   else if (INSN_CODE (rinsn) == CODE_FOR_vsetvl_vtype_change_only)
 new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, new_info, NULL_RTX);
@@ -926,7 +927,8 @@ change_insn (rtx_insn *rinsn, rtx new_pat)
   print_rtl_single (dump_file, PATTERN (rinsn));
 }
 
-  validate_change (rinsn, &PATTERN (rinsn), new_pat, false);
+  bool change_p = validate_change (rinsn, &PATTERN (rinsn), new_pat, false);
+  gcc_assert (change_p);
 
   if (dump_file)
 {
@@ -1039,7 +1041,8 @@ change_insn (function_info *ssa, insn_change change, 
insn_info *insn,
 }
 
 static void
-change_vsetvl_insn (const insn_info *insn, const vector_insn_info &info)
+change_vsetvl_insn (const insn_info *insn, const vector_insn_info &info,
+   rtx vl = NULL_RTX)
 {
   rtx_insn *rinsn;
   if (vector_config_insn_p (insn->rtl ()))
@@ -1053,7 +1056,7 @@ change_vsetvl_insn (const insn_info *insn, const 
vector_insn_info &info)
   rinsn = PREV_INSN (insn->rtl ());
   gcc_assert (vector_config_insn_p (rinsn));
 }
-  rtx new_pat = gen_vsetvl_pat (rinsn, info);
+  rtx new_pat = gen_vsetvl_pat (rinsn, info, vl);
   change_insn (rinsn, new_pat);
 }
 
@@ -3331,7 +3334,21 @@ pass_vsetvl::backward_demand_fusion (void)
   new_info))
continue;
 
- change_vsetvl_insn (new_info.get_insn (), new_info);
+ rtx vl = NULL_RTX;
+ /* Backward VLMAX VL:
+  bb 3:
+vsetivli zero, 1 ... -> vsetvli t1, zero
+vmv.s.x
+  bb 5:
+vsetvli t1, zero ... -> to be elided.
+vlse16.v
+
+  We should forward "t1".  */
+ if (!block_info.reaching_out.has_avl_reg ()
+   && vlmax_avl_p (new_info.get_avl ()))
+   vl = get_vl (prop.get_insn ()->rtl ());
+ change_vsetvl_insn (new_info.get_insn (), new_info, vl);
+
  if (block_info.local_dem == block_info.reaching_out)
block_info.local_dem = new_info;
  block_info.reaching_out = new_info;
-- 
2.34.1



RE: Re: [PATCH] RISC-V: Support non-SLP unordered reduction

2023-07-15 Thread Li, Pan2 via Gcc-patches
Filed a separate patch targeting GCC 13 for this bug; the rvv.exp and riscv.exp 
tests pass. Unfortunately, it is not easy to reproduce this with the intrinsic API.

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624574.html

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of ???
Sent: Friday, July 14, 2023 8:51 PM
To: kito.cheng 
Cc: gcc-patches ; kito.cheng ; 
palmer ; rdapp.gcc ; Jeff Law 

Subject: Re: Re: [PATCH] RISC-V: Support non-SLP unordered reduction

So to be safe, I think it should be backported to GCC 13 even though I do not 
have an intrinsic testcase to reproduce it.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-07-14 20:38
To: 钟居哲
CC: GCC Patches; Kito Cheng; Palmer Dabbelt; Robin Dapp; Jeff Law
Subject: Re: [PATCH] RISC-V: Support non-SLP unordered reduction


 wrote on Fri, Jul 14, 2023 at 20:31:
From: Ju-Zhe Zhong 

This patch adds reduc_*_scal patterns to support reduction auto-vectorization.

Use COND_LEN_* + reduc_*_scal to support unordered non-SLP auto-vectorization.

Consider the following case:
#include <stdint.h>

int __attribute__((noipa))
and_loop (int32_t * __restrict x,
          int32_t n, int res)
{
  for (int i = 0; i < n; ++i)
    res &= x[i];
  return res;
}

ASM:
and_loop:
ble a1,zero,.L4
vsetvli a3,zero,e32,m1,ta,ma
vmv.v.i v1,-1
.L3:
vsetvli a5,a1,e32,m1,tu,ma   > MUST BE "TU".
slli a4,a5,2
sub a1,a1,a5
vle32.v v2,0(a0)
add a0,a0,a4
vand.vv v1,v2,v1
bne a1,zero,.L3
vsetivli zero,1,e32,m1,ta,ma
vmv.v.i v2,-1
vsetvli a3,zero,e32,m1,ta,ma
vredand.vs  v1,v1,v2
vmv.x.s a5,v1
and a0,a2,a5
ret
.L4:
mv  a0,a2
ret
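
To see why the loop's vsetvli must use the tail-undisturbed ("TU") policy, the assembly above can be modelled in scalar C. This is only a sketch under stated assumptions: VLMAX = 4 lanes is hypothetical, vl is assumed to be min(n, VLMAX) on every trip, and and_loop_model is not part of the patch.

```c
#include <stdint.h>

#define VLMAX 4 /* hypothetical number of 32-bit lanes per vector register */

/* Scalar model of the vectorized and_loop.  */
static int and_loop_model(const int32_t *x, int32_t n, int res)
{
    int32_t acc[VLMAX];

    /* vmv.v.i v1,-1: start every lane at the AND identity (all ones).  */
    for (int lane = 0; lane < VLMAX; ++lane)
        acc[lane] = -1;

    while (n > 0) {
        /* vsetvli a5,a1,...: process at most VLMAX elements this trip.  */
        int32_t vl = n < VLMAX ? n : VLMAX;

        /* vand.vv with tail undisturbed: lanes >= vl keep the partial
           results accumulated on earlier trips.  */
        for (int32_t lane = 0; lane < vl; ++lane)
            acc[lane] &= x[lane];

        x += vl;
        n -= vl;
    }

    /* vredand.vs plus the final scalar 'and': fold all lanes, then
       combine with the incoming res.  */
    int32_t folded = -1;
    for (int lane = 0; lane < VLMAX; ++lane)
        folded &= acc[lane];
    return res & folded;
}
```

If the final, shorter trip were allowed to clobber the tail lanes, the partial results held there from earlier trips would be lost before the final fold, which is what the "MUST BE TU" note guards against.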

Fix a bug in the VSETVL pass that is exposed by the reduction testcase.


Is it a performance bug or a correctness bug?  If it is a correctness bug, 
does it also appear in GCC 13?


SLP reduction and floating-point in-order reduction are not supported yet.

gcc/ChangeLog:

* config/riscv/autovec.md (reduc_plus_scal_): New pattern.
(reduc_smax_scal_): Ditto.
(reduc_umax_scal_): Ditto.
(reduc_smin_scal_): Ditto.
(reduc_umin_scal_): Ditto.
(reduc_and_scal_): Ditto.
(reduc_ior_scal_): Ditto.
(reduc_xor_scal_): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(emit_nonvlmax_integer_move_insn): Add reduction.
(expand_reduction): New function.
* config/riscv/riscv-v.cc (emit_vlmax_reduction_insn): Ditto.
(emit_vlmax_fp_reduction_insn): Ditto.
(get_m1_mode): Ditto.
(expand_cond_len_binop): Fix name.
(expand_reduction): New function.
* config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Fix bug.
(change_insn): Ditto.
(change_vsetvl_insn): Ditto.
(pass_vsetvl::backward_demand_fusion): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add reduction tests.
* gcc.target/riscv/rvv/autovec/reduc/reduc-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c: New test.

---
 gcc/config/riscv/autovec.md   | 138 ++
 gcc/config/riscv/riscv-protos.h   |   3 +
 gcc/config/riscv/riscv-v.cc   |  84 ++-
 gcc/config/riscv/riscv-vsetvl.cc  |  28 +++-
 .../riscv/rvv/autovec/reduc/reduc-1.c | 118 +++
 .../riscv/rvv/autovec/reduc/reduc-2.c | 129 
 .../riscv/rvv/autovec/reduc/reduc-3.c |  65 +
 .../riscv/rvv/autovec/reduc/reduc-4.c |  59 
 .../riscv/rvv/autovec/reduc/reduc_run-1.c |  56 +++
 .../riscv/rvv/autovec/reduc/reduc_run-2.c |  79 ++
 .../riscv/rvv/autovec/reduc/reduc_run-3.c |  49 +++
 .../riscv/rvv/autovec/reduc/reduc_run-4.c |  66 +
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
 13 files changed, 868 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c
 create mode 100644 
gcc