Re: [PATCH v5 2/2] x86: Add vec_duplicate expander

2021-06-27 Thread Richard Sandiford via Gcc-patches
"H.J. Lu"  writes:
> 1. Update vec_duplicate to allow it to fail, so that the backend can
> allow broadcasting an integer constant to a vector only when a broadcast
> instruction is available.  The memset expander can use this to avoid
> vec_duplicate when loading from the constant pool is more efficient.

I don't see any changes in target-independent code though, other than
the doc update.  It's still the case that (existing) uses of
vec_duplicate_optab do not allow it to fail.

Thanks,
Richard

> 2. Add vec_duplicate expander and enable vec_duplicate from a
> non-standard SSE constant integer only if vector broadcast is available.
>
>   * config/i386/i386-expand.c (ix86_expand_integer_vec_duplicate):
>   New function.
>   * config/i386/i386-protos.h (ix86_expand_integer_vec_duplicate):
>   New prototype.
>   * config/i386/sse.md (INT_BROADCAST_MODE): New mode iterator.
>   (vec_duplicate): New expander.
>   * doc/md.texi: Update vec_duplicate.
> ---
>  gcc/config/i386/i386-expand.c | 24 
>  gcc/config/i386/i386-protos.h |  1 +
>  gcc/config/i386/sse.md| 28 
>  gcc/doc/md.texi   |  2 --
>  4 files changed, 53 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> index e9e89c82764..75c160d4349 100644
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -15742,6 +15742,30 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt)
>  }
>  }
>  
> +/* Expand integer vec_duplicate.  Return true if successful.  */
> +
> +bool
> +ix86_expand_integer_vec_duplicate (rtx *operands)
> +{
> +  /* Enable VEC_DUPLICATE from a non-standard SSE constant integer only
> +     if vector broadcast is available.  */
> +  machine_mode mode = GET_MODE (operands[0]);
> +  if (CONST_INT_P (operands[1])
> +      && (!(TARGET_AVX2
> +            || (TARGET_AVX
> +                && (GET_MODE_INNER (mode) == SImode
> +                    || GET_MODE_INNER (mode) == DImode)))
> +          || standard_sse_constant_p (operands[1], mode)))
> +    return false;
> +
> +  bool ok = ix86_expand_vector_init_duplicate (false, mode,
> +                                               operands[0],
> +                                               operands[1]);
> +  gcc_assert (ok);
> +
> +  return true;
> +}
> +
>  /* Generate code to copy vector bits i / 2 ... i - 1 from vector SRC
> to bits 0 ... i / 2 - 1 of vector DEST, which has the same mode.
> The upper bits of DEST are undefined, though they shouldn't cause
> diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
> index 71745b9a1ea..a6cc09bb75b 100644
> --- a/gcc/config/i386/i386-protos.h
> +++ b/gcc/config/i386/i386-protos.h
> @@ -258,6 +258,7 @@ extern void ix86_expand_mul_widen_hilo (rtx, rtx, rtx, bool, bool);
>  extern void ix86_expand_sse2_mulv4si3 (rtx, rtx, rtx);
>  extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx);
>  extern void ix86_expand_sse2_abs (rtx, rtx);
> +extern bool ix86_expand_integer_vec_duplicate (rtx *);
>  
>  /* In i386-c.c  */
>  extern void ix86_target_macros (void);
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index e4f01e64bc1..53a703fb466 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -24640,3 +24640,31 @@ (define_insn "*aesu8"
>"TARGET_WIDEKL"
>"aes\t{%0}"
>[(set_attr "type" "other")])
> +
> +;; Modes handled by broadcast patterns.  NB: Allow V64QI and V32HI with
> +;; TARGET_AVX512F since ix86_expand_integer_vec_duplicate can expand
> +;; them without TARGET_AVX512BW.  The memset vector broadcast expander
> +;; uses this to expand to XImode with:
> +;;   vmovd          %edi, %xmm15
> +;;   vpbroadcastb   %xmm15, %ymm15
> +;;   vinserti64x4   $0x1, %ymm15, %zmm15, %zmm15
> +
> +(define_mode_iterator INT_BROADCAST_MODE
> +  [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI
> +   (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
> +   (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
> +   (V8DI "TARGET_AVX512F && TARGET_64BIT")
> +   (V4DI "TARGET_AVX && TARGET_64BIT") (V2DI "TARGET_64BIT")])
> +
> +;; Broadcast from an integer.  NB: Enable broadcast only if we can move
> +;; from GPR to SSE register directly.
> +(define_expand "vec_duplicate"
> +  [(set (match_operand:INT_BROADCAST_MODE 0 "register_operand")
> +        (vec_duplicate:INT_BROADCAST_MODE
> +          (match_operand: 1 "general_operand")))]
> +  "TARGET_SSE2 && TARGET_INTER_UNIT_MOVES_TO_VEC"
> +{
> +  if (!ix86_expand_integer_vec_duplicate (operands))
> +    FAIL;
> +  DONE;
> +})
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 1b918144330..a892c94d163 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5077,8 +5077,6 @@ the mode appropriate for one element of @var{m}.
>  This pattern only handles duplicates of non-constant inputs.  Constant
>  vectors go through the @code{mov@var{m}} pattern instead.
>  
> -This pattern is 
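A minimal C model of the guard in ix86_expand_integer_vec_duplicate may make the FAIL condition easier to review. It is a sketch under stated assumptions, not GCC code: struct isa_flags, has_int_broadcast, is_standard_sse_constant and vec_duplicate_accepts are hypothetical names, and standard_sse_constant_p is approximated as "all bits clear or all bits set". The TARGET_SSE2 && TARGET_INTER_UNIT_MOVES_TO_VEC condition on the expander itself is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ISA flags: only the two that the guard looks at.  */
struct isa_flags
{
  bool avx;
  bool avx2;
};

static const struct isa_flags ISA_SSE2 = { false, false };
static const struct isa_flags ISA_AVX = { true, false };
static const struct isa_flags ISA_AVX2 = { true, true };

/* Broadcast test from the patch: AVX2 for any element width, or AVX
   for 32- and 64-bit (SImode/DImode) elements.  */
static bool
has_int_broadcast (struct isa_flags isa, int elt_bits)
{
  return isa.avx2 || (isa.avx && (elt_bits == 32 || elt_bits == 64));
}

/* Assumption: a duplicated CONST_INT counts as a standard SSE constant
   when all bits are clear or all bits are set.  */
static bool
is_standard_sse_constant (int64_t value)
{
  return value == 0 || value == -1;
}

/* True if the expander would emit a broadcast, false if it would FAIL.
   Non-constant operands are always accepted.  */
static bool
vec_duplicate_accepts (struct isa_flags isa, int elt_bits,
                       bool is_const_int, int64_t value)
{
  if (is_const_int
      && (!has_int_broadcast (isa, elt_bits)
          || is_standard_sse_constant (value)))
    return false;
  return true;
}
```

For example, on plain SSE2 a byte duplicate of 0x41 is rejected (leaving a constant-pool load), while on AVX2 it is accepted; a zero constant is always rejected and left to the standard SSE constant path.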

[PATCH] opts: Do not stop when unknown -Wno-error value encountered [PR/65403]

2021-06-27 Thread Nicholas Guriev
gcc
PR c/65403
* opts.c (enable_warning_as_error): Do not enable errors for
negative options.  Add parameter orig_text to better diagnose
mistakes in options.

gcc/testsuite
* gcc.dg/Werror-13.c: Check a note about misspelled no-warning
tag rather than a hard error.
* gcc.dg/Wno-error-1.c, gcc.dg/Wno-error-2.c: New group of tests
for parametrized -Wno-error options.

Signed-off-by: Nicholas Guriev 

---

I am CC'ing Alex Henrie because he has already worked on fixing this bug
and may be interested in my solution.

 gcc/opts.c | 48 --
 gcc/testsuite/gcc.dg/Werror-13.c   |  2 +-
 gcc/testsuite/gcc.dg/Wno-error-1.c |  6 
 gcc/testsuite/gcc.dg/Wno-error-2.c |  8 +
 4 files changed, 40 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/Wno-error-1.c
 create mode 100644 gcc/testsuite/gcc.dg/Wno-error-2.c

diff --git a/gcc/opts.c b/gcc/opts.c
index 52e9e3a9df9..9ec0846198c 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -291,12 +291,10 @@ static void decode_d_option (const char *arg, struct gcc_options *opts,
 static void set_unsafe_math_optimizations_flags (struct gcc_options *opts,
 int set);
 static void enable_warning_as_error (const char *arg, int value,
-unsigned int lang_mask,
-const struct cl_option_handlers *handlers,
-struct gcc_options *opts,
-struct gcc_options *opts_set,
-location_t loc,
-diagnostic_context *dc);
+const char *orig_text, unsigned lang_mask,
+const cl_option_handlers *handlers,
+gcc_options *opts, gcc_options *opts_set,
+location_t loc, diagnostic_context *dc);
 
 /* Handle a back-end option; arguments and return value as for
handle_option.  */
@@ -2526,8 +2524,8 @@ common_handle_option (struct gcc_options *opts,
   if (lang_mask == CL_DRIVER)
break;
 
-  enable_warning_as_error (arg, value, lang_mask, handlers,
-  opts, opts_set, loc, dc);
+  enable_warning_as_error (arg, value, decoded->orig_option_with_args_text,
+  lang_mask, handlers, opts, opts_set, loc, dc);
   break;
 
 case OPT_Wfatal_errors:
@@ -3236,10 +3234,9 @@ decode_d_option (const char *arg, struct gcc_options *opts,
NULL), location LOC.  This is used by -Werror=.  */
 
 static void
-enable_warning_as_error (const char *arg, int value, unsigned int lang_mask,
-const struct cl_option_handlers *handlers,
-struct gcc_options *opts,
-struct gcc_options *opts_set,
+enable_warning_as_error (const char *arg, int value, const char *orig_text,
+unsigned lang_mask, const cl_option_handlers *handlers,
+gcc_options *opts, gcc_options *opts_set,
 location_t loc, diagnostic_context *dc)
 {
   char *new_option;
@@ -3251,19 +3248,24 @@ enable_warning_as_error (const char *arg, int value, unsigned int lang_mask,
   option_index = find_opt (new_option, lang_mask);
   if (option_index == OPT_SPECIAL_unknown)
 {
-  option_proposer op;
-  const char *hint = op.suggest_option (new_option);
-  if (hint)
-   error_at (loc, "%<-W%serror=%s%>: no option %<-%s%>;"
- " did you mean %<-%s%>?", value ? "" : "no-",
- arg, new_option, hint);
-  else
-   error_at (loc, "%<-W%serror=%s%>: no option %<-%s%>",
- value ? "" : "no-", arg, new_option);
+  cl_decoded_option fake_decoded = {};
+  fake_decoded.opt_index = OPT_SPECIAL_unknown;
+  fake_decoded.arg = orig_text;
+
+  if (handlers->unknown_option_callback (&fake_decoded))
+   {
+ option_proposer op;
+ const char *hint = op.suggest_option (new_option);
+ if (hint)
+   error_at (loc, _("%qs: no option %<-%s%>; did you mean %<-%s%>?"),
+ orig_text, new_option, hint);
+ else
+   error_at (loc, _("%qs: no option %<-%s%>"), orig_text, new_option);
+   }
 }
   else if (!(cl_options[option_index].flags & CL_WARNING))
-error_at (loc, "%<-Werror=%s%>: %<-%s%> is not an option that "
- "controls warnings", arg, new_option);
+error_at (loc, _("%qs: %<-%s%> is not an option that controls warnings"),
+ orig_text, new_option);
   else
 {
   const diagnostic_t kind = value ? DK_ERROR : DK_WARNING;
diff --git a/gcc/testsuite/gcc.dg/Werror-13.c b/gcc/testsuite/gcc.dg/Werror-13.c
index 3a02b7ea2b5..6c6c7c85447 100644
--- a/gcc/testsuite
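The decision the patch makes for an unknown option name can be summarized in a small standalone C sketch. It assumes the compiler's unknown_option_callback treats the fake decoded option the way it treats an ordinary unknown -Wno-foo option (postpone the diagnostic and return false), which is what the updated Werror-13.c expectation of a note rather than a hard error suggests; any other spelling is assumed to return true. classify_werror, unknown_option_is_error and the enum values are hypothetical names, and the known / controls_warnings flags stand in for the find_opt and CL_WARNING checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum werror_outcome
{
  WERROR_APPLIED,          /* known warning option: severity changed */
  WERROR_NOT_A_WARNING,    /* known option that does not control warnings */
  WERROR_UNKNOWN_ERROR,    /* unknown option: error, possibly with a hint */
  WERROR_UNKNOWN_DEFERRED  /* unknown -Wno-...: left to the postponed note */
};

/* Hypothetical model of the callback's test on the original option text:
   only spellings starting with "-Wno-" are postponed.  */
static bool
unknown_option_is_error (const char *orig_text)
{
  return strncmp (orig_text, "-Wno-", 5) != 0;
}

static enum werror_outcome
classify_werror (const char *orig_text, bool known, bool controls_warnings)
{
  if (!known)
    return unknown_option_is_error (orig_text)
           ? WERROR_UNKNOWN_ERROR : WERROR_UNKNOWN_DEFERRED;
  if (!controls_warnings)
    return WERROR_NOT_A_WARNING;
  return WERROR_APPLIED;
}
```

Passing orig_text rather than rebuilding the string from value and arg is what lets the diagnostics quote exactly what the user typed.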

Re: [PATCH v5 2/2] x86: Add vec_duplicate expander

2021-06-27 Thread H.J. Lu via Gcc-patches
On Sun, Jun 27, 2021 at 1:43 AM Richard Sandiford
 wrote:
>
> "H.J. Lu"  writes:
> > 1. Update vec_duplicate to allow it to fail, so that the backend can
> > allow broadcasting an integer constant to a vector only when a broadcast
> > instruction is available.  The memset expander can use this to avoid
> > vec_duplicate when loading from the constant pool is more efficient.
>
> I don't see any changes in target-independent code though, other than
> the doc update.  It's still the case that (existing) uses of
> vec_duplicate_optab do not allow it to fail.

I have a followup patch set on

https://gitlab.com/x86-gcc/gcc/-/commits/users/hjl/pieces/broadcast

to use it to expand memset with vector broadcast:

https://gitlab.com/x86-gcc/gcc/-/commit/991c87f8a83ca736ae9ed92baa3ebadca289f6e3

For SSE2, which doesn't have vector broadcast, the constant vector broadcast
expander returns FAIL and a load from the constant pool will be used.
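For reference, the value being materialized is just the byte replicated into every lane. A scalar analogue (a sketch, not what GCC emits) replicates a byte across a 64-bit word by multiplying by 0x0101010101010101; splat_u8 is a hypothetical helper. When the byte is a compile-time constant the result is also a constant, which is why a constant-pool load is a viable fallback when no broadcast instruction exists.

```c
#include <assert.h>
#include <stdint.h>

/* Replicate byte C into all eight bytes of a 64-bit word: a scalar
   analogue of vec_duplicate from QImode into an 8-lane byte vector.
   No carries occur because each partial product is C < 256.  */
static uint64_t
splat_u8 (uint8_t c)
{
  return (uint64_t) c * UINT64_C (0x0101010101010101);
}
```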


Re: [PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-06-27 Thread Aldy Hernandez via Gcc-patches




On 6/25/21 9:38 AM, Richard Biener wrote:

On Thu, Jun 24, 2021 at 5:01 PM Andrew MacLeod  wrote:


On 6/24/21 9:25 AM, Andrew MacLeod wrote:

On 6/24/21 8:29 AM, Richard Biener wrote:


The original function in EVRP currently looks like:

  === BB 2 
  :
 if (a_5(D) == b_6(D))
   goto ; [INV]
 else
   goto ; [INV]

=== BB 8 
Equivalence set : [a_5(D), b_6(D)] edge 2->8 provides
a_5 and b_6 as equivalences
  :
 goto ; [100.00%]

=== BB 6 
  :
 # i_1 = PHI <0(8), i_10(5)>
 if (i_1 < a_5(D))
   goto ; [INV]
 else
   goto ; [INV]

=== BB 3 
Relational : (i_1 < a_5(D)) edge 6->3 provides
this relation
  :
 if (i_1 == b_6(D))
   goto ; [INV]
 else
   goto ; [INV]


So it knows that a_5 and b_6 are equivalent, and it also knows that
i_1 < a_5 in BB3.

So we should be able to determine that i_1 == b_6 is [0,0] (always
false), but we currently don't.  I think I had turned on equivalence
mapping during relational processing, so we should be able to tag that
without transitive relations...  I'll have a look at why.
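A toy model of the query described here: equivalences are kept as union-find sets, and a relation registered against any member applies to the whole set, so i_1 < a_5 together with a_5 == b_6 yields i_1 < b_6, which folds i_1 == b_6 to false. This is a hypothetical sketch (struct oracle, register_equiv, register_relation, query and bb3_query are invented names); it ignores dominance, i.e. which blocks a relation is valid in, and is not the relation oracle's actual data structure.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NAMES 16
#define MAX_RELS 16

enum rel { REL_LT, REL_GT, REL_EQ, REL_UNKNOWN };

/* SSA version numbers used in the dump above.  */
enum { I_1 = 1, A_5 = 5, B_6 = 6 };

struct oracle
{
  int parent[MAX_NAMES];          /* union-find forest of equivalences */
  int lhs[MAX_RELS], rhs[MAX_RELS];
  enum rel kind[MAX_RELS];        /* lhs KIND rhs */
  int nrels;
};

static void
oracle_init (struct oracle *o)
{
  for (int i = 0; i < MAX_NAMES; i++)
    o->parent[i] = i;
  o->nrels = 0;
}

static int
find (struct oracle *o, int x)
{
  while (o->parent[x] != x)
    x = o->parent[x];
  return x;
}

static void
register_equiv (struct oracle *o, int a, int b)
{
  o->parent[find (o, a)] = find (o, b);
}

static void
register_relation (struct oracle *o, int a, enum rel r, int b)
{
  assert (o->nrels < MAX_RELS);
  o->lhs[o->nrels] = a;
  o->kind[o->nrels] = r;
  o->rhs[o->nrels] = b;
  o->nrels++;
}

static enum rel
swap_rel (enum rel r)
{
  return r == REL_LT ? REL_GT : r == REL_GT ? REL_LT : r;
}

/* Relation between X and Y, looking through equivalence sets.  */
static enum rel
query (struct oracle *o, int x, int y)
{
  int rx = find (o, x), ry = find (o, y);
  if (rx == ry)
    return REL_EQ;
  for (int i = 0; i < o->nrels; i++)
    {
      int ra = find (o, o->lhs[i]), rb = find (o, o->rhs[i]);
      if (ra == rx && rb == ry)
        return o->kind[i];
      if (ra == ry && rb == rx)
        return swap_rel (o->kind[i]);
    }
  return REL_UNKNOWN;
}

/* BB 3: a_5 == b_6 (from edge 2->8) and i_1 < a_5 (from edge 6->3).  */
static enum rel
bb3_query (int x, int y)
{
  struct oracle o;
  oracle_init (&o);
  register_equiv (&o, A_5, B_6);
  register_relation (&o, I_1, REL_LT, A_5);
  return query (&o, x, y);
}
```

Here bb3_query (I_1, B_6) returns REL_LT, so a pass consulting the model could fold i_1 == b_6 to false.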

And once we get a bit further along, you will be able to access this
without ranger.. if one wants to simply register the relations directly.

Anyway, I'll get back to you on why it's currently being missed.

Andrew




As promised.  There was a typo in the equivalence comparisons, so it
was getting missed.  With the fix, the oracle identifies the relation
and evrp will now fold that expression away, and the IL becomes:

 :
if (a_5(D) == b_6(D))
  goto ; [INV]
else
  goto ; [INV]

 :
i_10 = i_1 + 1;

 :
# i_1 = PHI <0(2), i_10(3)>
if (i_1 < a_5(D))
  goto ; [INV]
else
  goto ; [INV]

 :
return;

For the other cases you quote, nothing records that this equivalence
holds whenever a != 0...

+  if (a != 0)
+{
+  c = b;
+}

but the oracle would register that c and b are equivalent in the TRUE
block, so some other pass that was interested in tracking the
conditions that make a block relevant would be able to compare relations...


I guess to fully leverage optimizations for cases like

   if (a != 0)
 c = b;
   ...
   if (a != 0)
 {
 if (c == b)
...
 }

one would need to consider the "optimally jump threaded path" to the
program point where the statement to be optimized resides, making
available all relations and equivalences that were originally
conditional but are unconditional on the jump-threaded path.

For VN that could be done by unwinding to the CFG merge after
the first if (a != 0), treating only one of the predecessor edges
as executable and registering the appropriate a != 0 result and
continue VN up to the desired point, committing to the result
until before the CFG merge after the second if (a != 0).  And then
unwinding again for the "else" path.  Sounds like a possible
explosion in complexity as well if second-order opportunities
arise.

That is, we'd do simplifications exposed by jump threading but
without actually doing the jump threading (which will of course
not allow all possible simplifications w/o inserting extra PHIs
for computations we might want to re-use).
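A hand-threaded C version of the example above shows what "simplifications exposed by jump threading" means: once the a != 0 path is duplicated, c = b is unconditional on it and the inner c == b test folds to true. Both functions are hypothetical illustrations written for this note (the loop structure of the PR is omitted), not compiler output.

```c
#include <assert.h>

/* Original shape: c == b is only known on the a != 0 path.  */
static int
f_orig (int a, int b, int c)
{
  if (a != 0)
    c = b;
  /* ... unrelated code ... */
  if (a != 0)
    {
      if (c == b)
        return 1;
      return 2;
    }
  return 0;
}

/* Hand-threaded: each path through the first test is duplicated up to
   the second test, so both tests and the inner c == b fold.  */
static int
f_threaded (int a, int b, int c)
{
  (void) b;
  (void) c;
  if (a != 0)
    return 1;   /* c = b executed on this path, so c == b is true.  */
  return 0;     /* The second a != 0 test is known false here.  */
}
```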


FWIW, as I mention in the PR, if the upcoming threader work could be 
taught to use the relation oracle, it could easily solve the conditional 
flowing through the a!=0 path.  However, we wouldn't be able to thread 
it because in this particular case, the path crosses loop boundaries.


I leave it to Jeff/others to pontificate on whether the jump-threader
path duplicator could be taught to thread through loops.


Aldy



[PATCH] Add AIX 7.3 Configuration

2021-06-27 Thread David Edelsohn via Gcc-patches
aix: Add AIX 7.3 configuration and SPDX License Identifiers.

The anticipated release of AIX 7.3 has been announced.  This
patch adds the configuration bits based on AIX 7.2 configuration.

gcc/ChangeLog:

* config.gcc: Add SPDX License Identifier.
(powerpc-ibm-aix789): Default to aix73.h.
(powerpc-ibm-aix7.2.*): New stanza.
* config/rs6000/aix72.h: Add SPDX License Identifier.
* config/rs6000/aix73.h: New file.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 1be8d96f5e5..0230bb88861 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1,3 +1,4 @@
+# SPDX-License-Identifier: GPL-3.0-or-later
 # GCC target-specific configuration file.
 # Copyright (C) 1997-2021 Free Software Foundation, Inc.

@@ -3099,7 +3100,7 @@ rs6000-ibm-aix7.1.* | powerpc-ibm-aix7.1.*)
use_gcc_stdint=wrap
default_use_cxa_atexit=yes
;;
-rs6000-ibm-aix[789].* | powerpc-ibm-aix[789].*)
+rs6000-ibm-aix7.2.* | powerpc-ibm-aix7.2.*)
tmake_file="rs6000/t-aix52 t-slibgcc"
if test x$cpu_is_64bit = xyes; then
tm_file="${tm_file} rs6000/biarch64.h"
@@ -3112,6 +3113,19 @@ rs6000-ibm-aix[789].* | powerpc-ibm-aix[789].*)
use_gcc_stdint=wrap
default_use_cxa_atexit=yes
;;
+rs6000-ibm-aix[789].* | powerpc-ibm-aix[789].*)
+   tmake_file="rs6000/t-aix52 t-slibgcc"
+   if test x$cpu_is_64bit = xyes; then
+   tm_file="${tm_file} rs6000/biarch64.h"
+   tmake_file="rs6000/t-aix64 t-slibgcc"
+   fi
+   tm_file="${tm_file} rs6000/aix.h rs6000/aix73.h rs6000/xcoff.h rs6000/aix-stdint.h"
+   extra_options="${extra_options} rs6000/aix64.opt"
+   use_collect2=yes
+   thread_file='aix'
+   use_gcc_stdint=wrap
+   default_use_cxa_atexit=yes
+   ;;
 rl78-*-elf*)
tm_file="dbxelf.h elfos.h newlib-stdint.h ${tm_file}"
target_has_targetm_common=no
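Because config.gcc uses a shell case statement, the first matching pattern wins, so the new aix7.2.* stanza has to stay ahead of aix[789].*. The sketch below models that ordering with POSIX fnmatch(); it only includes the two stanzas touched here and assumes the 7.2 stanza keeps using rs6000/aix72.h, as the ChangeLog implies. aix_tm_header and the stanza table are hypothetical.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* Patterns in the order they appear after the patch; like a shell
   case statement, the first match wins.  */
struct stanza
{
  const char *pattern;
  const char *tm_header;
};

static const struct stanza aix_stanzas[] = {
  { "rs6000-ibm-aix7.2.*", "rs6000/aix72.h" },
  { "powerpc-ibm-aix7.2.*", "rs6000/aix72.h" },
  { "rs6000-ibm-aix[789].*", "rs6000/aix73.h" },
  { "powerpc-ibm-aix[789].*", "rs6000/aix73.h" },
};

/* Returns the AIX-version header chosen for TARGET, or NULL if none of
   the modeled stanzas match.  */
static const char *
aix_tm_header (const char *target)
{
  for (size_t i = 0; i < sizeof aix_stanzas / sizeof aix_stanzas[0]; i++)
    if (fnmatch (aix_stanzas[i].pattern, target, 0) == 0)
      return aix_stanzas[i].tm_header;
  return NULL;
}
```

Swapping the two groups of entries would make every 7.2 triplet pick up aix73.h, since aix[789].* also matches aix7.2.*.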


Re: [PATCH v5 2/2] x86: Add vec_duplicate expander

2021-06-27 Thread Richard Sandiford via Gcc-patches
"H.J. Lu via Gcc-patches"  writes:
> On Sun, Jun 27, 2021 at 1:43 AM Richard Sandiford
>  wrote:
>>
>> "H.J. Lu"  writes:
>> > 1. Update vec_duplicate to allow it to fail, so that the backend can
>> > allow broadcasting an integer constant to a vector only when a broadcast
>> > instruction is available.  The memset expander can use this to avoid
>> > vec_duplicate when loading from the constant pool is more efficient.
>>
>> I don't see any changes in target-independent code though, other than
>> the doc update.  It's still the case that (existing) uses of
>> vec_duplicate_optab do not allow it to fail.
>
> I have a followup patch set on
>
> https://gitlab.com/x86-gcc/gcc/-/commits/users/hjl/pieces/broadcast
>
> to use it to expand memset with vector broadcast:
>
> https://gitlab.com/x86-gcc/gcc/-/commit/991c87f8a83ca736ae9ed92baa3ebadca289f6e3
>
> For SSE2, which doesn't have vector broadcast, the constant vector broadcast
> expander returns FAIL and a load from the constant pool will be used.

Hmm, but as Jeff and I mentioned in the earlier replies,
vec_duplicate_optab shouldn't be used for constants.  Constants
should go via the move expanders instead.

In a previous message I suggested:

  … would it work to change:

/* Try using vec_duplicate_optab for uniform vectors.  */
if (!TREE_SIDE_EFFECTS (exp)
&& VECTOR_MODE_P (mode)
&& eltmode == GET_MODE_INNER (mode)
&& ((icode = optab_handler (vec_duplicate_optab, mode))
!= CODE_FOR_nothing)
&& (elt = uniform_vector_p (exp)))

  to something like:

/* Try using vec_duplicate_optab for uniform vectors.  */
if (!TREE_SIDE_EFFECTS (exp)
&& VECTOR_MODE_P (mode)
&& eltmode == GET_MODE_INNER (mode)
&& (elt = uniform_vector_p (exp)))
  {
if (TREE_CODE (elt) == INTEGER_CST
|| TREE_CODE (elt) == POLY_INT_CST
|| TREE_CODE (elt) == REAL_CST
|| TREE_CODE (elt) == FIXED_CST)
  {
rtx src = gen_const_vec_duplicate (mode, expand_normal (node));
emit_move_insn (target, src);
break;
  }
…
  }

if that code was the source of the constant operand.  If we're adding a
new use of vec_duplicate_optab then that should be similarly protected
against constant operands.
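The ordering suggested here can be sketched as a small dispatch function: constant elements never reach vec_duplicate_optab and are instead materialized as a constant vector through the move pattern. The enum names and choose_uniform_vector_path are hypothetical stand-ins for the tree codes and the code quoted above; the non-constant branch (elided as "…" above) is assumed to use vec_duplicate_optab when available, and the PATH_GENERIC fallback is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the tree code of the uniform element.  */
enum elt_kind
{
  ELT_INTEGER_CST, ELT_POLY_INT_CST, ELT_REAL_CST, ELT_FIXED_CST,
  ELT_SSA_NAME
};

enum expand_path
{
  PATH_MOVE_CONST_VEC,   /* gen_const_vec_duplicate + emit_move_insn */
  PATH_VEC_DUPLICATE,    /* vec_duplicate_optab */
  PATH_GENERIC           /* element-by-element fallback (assumed) */
};

static bool
is_constant_elt (enum elt_kind k)
{
  return k == ELT_INTEGER_CST || k == ELT_POLY_INT_CST
         || k == ELT_REAL_CST || k == ELT_FIXED_CST;
}

/* Constants are checked first, so vec_duplicate_optab only ever sees
   non-constant operands.  */
static enum expand_path
choose_uniform_vector_path (enum elt_kind k, bool have_vec_duplicate)
{
  if (is_constant_elt (k))
    return PATH_MOVE_CONST_VEC;
  if (have_vec_duplicate)
    return PATH_VEC_DUPLICATE;
  return PATH_GENERIC;
}
```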

Thanks,
Richard


Re: [PATCH 0.5/2] ipa-sra: Restructure how cloning and call redirection communicate (PR 93385)

2021-06-27 Thread Jan Hubicka
> 
> I was asked by Richi to split my fix for PR 93385 for easier review
> into IPA-SRA materialization refactoring and the actual DCE addition.
> Fortunately it was mostly natural except for a temporary weird
> condition in ipa_param_body_adjustments::modify_call_stmt.
> In addition to the patch I posted previously, this one
> also deallocates the newly added summary in toplev::finalize and fixes
> a mistakenly uninitialized field.
> 
> This is the first part which basically replaces performed_splits in
> clone_info and the code which generates it, keeps it up-to-date and
> consumes it with new edge summaries which are much nicer.  It simply
> contains 1) a mapping from the original argument indices to the actual
> indices in the call statement as it is now, 2) information needed to
> identify arguments representing pass-through IPA-SRA splits which
> have been added to the call arguments in place of an original
> argument/reference, and 3) a delta to the index where va_args may start.
> So it directly holds all the information that the consumer of
> performed_splits had to compute, and we no longer need the weird
> dummy declarations.
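The first and third items can be pictured with a small model: each original argument is replaced by zero (removed), one (kept) or several (split) arguments, which determines both the index mapping and how far any variadic arguments move. This is a hypothetical sketch; build_index_map, va_args_delta, mapped_index and the pieces encoding are not the layout of the new ipa_edge_modification_info summary.

```c
#include <assert.h>

#define MAX_ARGS 32

/* PIECES[i] is how many arguments replace original argument I in the
   modified call: 0 = removed, 1 = kept as is, >1 = split.  Fills MAP[i]
   with the index of the first replacement (-1 if removed) and returns
   the new number of named arguments.  */
static int
build_index_map (const int *pieces, int n_orig, int *map)
{
  int next = 0;
  for (int i = 0; i < n_orig; i++)
    {
      map[i] = pieces[i] > 0 ? next : -1;
      next += pieces[i];
    }
  return next;
}

/* How far variadic arguments, which follow the named ones, move.  */
static int
va_args_delta (const int *pieces, int n_orig)
{
  int map[MAX_ARGS];
  assert (n_orig <= MAX_ARGS);
  return build_index_map (pieces, n_orig, map) - n_orig;
}

/* Index in the modified call of original argument I (-1 if removed).  */
static int
mapped_index (const int *pieces, int n_orig, int i)
{
  int map[MAX_ARGS];
  assert (n_orig <= MAX_ARGS && i < n_orig);
  build_index_map (pieces, n_orig, map);
  return map[i];
}
```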
> 
> The main disadvantage is that the information has to be created (and
> kept up-to-date) for all call graph edges associated with the given
> statement from all clones (including inline clones) of the clone where
> splitting or removal happened first.  But all of this happens during
> clone materialization so the only effect on WPA memory consumption is
> the removal of a pointer from clone_info.
> 
> The statement modification code also has to know the statement from
> the original function in order to be able to locate the edge summaries
> which at this point are still keyed to these.  However, the code is
> already quite heavily dependant on how things are structured in
> tree-inline.c and in order to fix bugs like these it probably has to
> be.
> 
> The subsequent patch needs this new information to be able to remove
> arguments from calls during materialization and communicate this
> information to the call redirection.
> 
> 2021-05-17  Martin Jambor  
> 
>   PR ipa/93385
>   * symtab-clones.h (clone_info): Removed member param_adjustments.
>   * ipa-param-manipulation.h: Adjust initial comment to reflect how we
>   deal with pass-through splits now.
>   (ipa_param_performed_split): Removed.
>   (ipa_param_adjustments::modify_call): Adjusted parameters.
>   (class ipa_param_body_adjustments): Adjusted parameters of
>   register_replacement, modify_gimple_stmt and modify_call_stmt.
>   (ipa_verify_edge_has_no_modifications): Declare.
>   (ipa_edge_modifications_finalize): Declare.
>   * cgraph.c (cgraph_edge::redirect_call_stmt_to_callee): Remove
>   performed_splits processing, pass only edge to padjs->modify_call,
>   check that call arguments were not modified if they should not have
>   been.
>   * cgraphclones.c (cgraph_node::create_clone): Do not copy performed
>   splits.
>   * ipa-param-manipulation.c (struct pass_through_split_map): New type.
>   (ipa_edge_modification_info): Likewise.
>   (ipa_edge_modification_sum): Likewise.
>   (ipa_edge_modifications): New edge summary.
>   (ipa_verify_edge_has_no_modifications): New function.
>   (transitive_split_p): Removed.
>   (transitive_split_map): Likewise.
>   (init_transitive_splits): Likewise.
>   (ipa_param_adjustments::modify_call): Adjusted to use the new edge
>   summary instead of performed_splits.
>   (ipa_param_body_adjustments::register_replacement): Drop dummy
>   parameter, set base_index of the created ipa_param_body_replacement.
>   (phi_arg_will_live_p): New function.
>   (ipa_param_body_adjustments::common_initialization): Do not create
>   IPA_SRA dummy decls.
>   (simple_tree_swap_info): Removed.
>   (remap_split_decl_to_dummy): Likewise.
>   (record_argument_state_1): New function.
>   (record_argument_state): Likewise.
>   (ipa_param_body_adjustments::modify_call_stmt): New parameter
>   orig_stmt.  Do not work with dummy decls, save necessary info about
>   changes to ipa_edge_modifications.
>   (ipa_param_body_adjustments::modify_gimple_stmt): New parameter
>   orig_stmt, pass it to modify_call_stmt.
>   (ipa_param_body_adjustments::modify_cfun_body): Adjust call to
>   modify_gimple_stmt.
>   (ipa_edge_modifications_finalize): New function.
>   * tree-inline.c (remap_gimple_stmt): Pass original statement to
>   modify_gimple_stmt.
>   (copy_phis_for_bb): Do not copy dead PHI nodes.
>   (expand_call_inline): Do not remap performed_splits.
>   (update_clone_info): Likewise.
>   * toplev.c: Include ipa-param-manipulation.h.
>   (toplev::finalize): Call ipa_edge_modifications_finalize.
> ---
>  gcc/cgraph.c |  22 +-
>  gcc/cgraphclones.c   |   3 -
>  

[COMMITTED] Fix PR 101230: ICE in fold_cond_expr_with_comparison

2021-06-27 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

This fixes PR 101230 where I had messed up and forgot that
invert_tree_comparison can return ERROR_MARK if the comparison
is not invertible (floating-point types).
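Why the inversion can fail: with NaNs honored, the false arm of x < y sees !(x < y), which is not the same as x >= y, because every ordered comparison involving a NaN is false. The small C check below illustrates that; not_less and greater_equal are hypothetical helpers, and it assumes default IEEE semantics (no -ffinite-math-only or -ffast-math).

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* What the false arm of "x < y ? a : b" may assume about x and y.  */
static bool
not_less (double x, double y)
{
  return !(x < y);
}

/* Naive inversion of x < y; only equivalent when neither is a NaN.  */
static bool
greater_equal (double x, double y)
{
  return x >= y;
}
```

With x = NAN, not_less is true while greater_equal is false, so folding the COND_EXPR through a plain GE comparison would change behavior.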

Committed as obvious after a bootstrap/test on x86_64-linux-gnu

gcc/ChangeLog:

PR middle-end/101230
* fold-const.c (fold_ternary_loc): Check
the return value of invert_tree_comparison.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr101230-1.c: New test.
---
 gcc/fold-const.c  |  9 +
 gcc/testsuite/gcc.dg/torture/pr101230-1.c | 13 +
 2 files changed, 18 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr101230-1.c

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index e2110b6bffe..dfccbaec683 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -12837,10 +12837,11 @@ fold_ternary_loc (location_t loc, enum tree_code code, tree type,
  tree arg00 = TREE_OPERAND (arg0, 0);
  tree arg01 = TREE_OPERAND (arg0, 1);
  comp_code = invert_tree_comparison (comp_code, HONOR_NANS (arg00));
- tem = fold_cond_expr_with_comparison (loc, type, comp_code,
-   arg00,
-   arg01,
-   op2, op1);
+ if (comp_code != ERROR_MARK)
+   tem = fold_cond_expr_with_comparison (loc, type, comp_code,
+ arg00,
+ arg01,
+ op2, op1);
  if (tem)
return tem;
}
diff --git a/gcc/testsuite/gcc.dg/torture/pr101230-1.c b/gcc/testsuite/gcc.dg/torture/pr101230-1.c
new file mode 100644
index 000..ba9c9eec740
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr101230-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fno-signed-zeros" } */
+
+
+int update_r_k_curr_cluster;
+void update_r_k(void) {
+  double curr_distance = distance3d_sqr_pt4d_pt4d();
+  for (int cluster; cluster; cluster++)
+if (0 < curr_distance) {
+  curr_distance = 0;
+  update_r_k_curr_cluster = cluster;
+}
+}
-- 
2.27.0



[PATCH 0/4] v4 PHI-OPT move abs_replacement to match.pd

2021-06-27 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

To be able to move PHI-OPT's abs_replacement to match.pd, a bunch
of support needed to be added to PHI-OPT.
This is a set of 4 (unapproved) patches that allows us to remove
abs_replacement and even goes one step further, doing a few extra
transformations that abs_replacement did not do (simply because of
the move from fold to match).

v3 to v4 Changes:
* pushed the approved patches which are not dependent on unapproved ones
* Change 1st patch to only duplicate in one direction and expanded comment
  on why it is safe and why it is needed

Andrew Pinski (4):
  Duplicate the range information of the phi onto the new ssa_name
  Allow match-and-simplified phiopt to run in early phiopt
  Try inverted comparison for match_simplify in phiopt
  Port most of the A CMP 0 ? A : -A to match
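For readers unfamiliar with the pattern, these are source-level shapes of "A CMP 0 ? A : -A" that the series aims to turn into ABS_EXPR (or its negation) via match.pd. The function names are hypothetical, and exactly which comparison codes the final patterns cover is defined by the patches, not by this sketch. Like abs(), the functions are undefined for INT_MIN.

```c
#include <assert.h>
#include <stdlib.h>

/* a >= 0 ? a : -a  is  ABS_EXPR <a>.  */
static int
abs_ge (int a)
{
  return a >= 0 ? a : -a;
}

/* a > 0 ? a : -a  is also ABS_EXPR <a> (for a == 0, -0 == 0).  */
static int
abs_gt (int a)
{
  return a > 0 ? a : -a;
}

/* a < 0 ? a : -a  is  -ABS_EXPR <a>.  */
static int
neg_abs_lt (int a)
{
  return a < 0 ? a : -a;
}
```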

 gcc/match.pd   |  60 +
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-15.c |   4 +-
 gcc/tree-ssa-phiopt.c  | 291 +
 3 files changed, 185 insertions(+), 170 deletions(-)

-- 
2.27.0



[PATCH 1/4] Duplicate the range information of the phi onto the new ssa_name

2021-06-27 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

Since match_simplify_replacement uses gimple_simplify, a new SSA name is
sometimes created; when we then replace the PHI edge with this new SSA
name, the range information on the PHI is lost.
Placing the range duplication in replace_phi_edge_with_variable is the
best option, instead of doing it at each place that calls
replace_phi_edge_with_variable, which is what is done today.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.c (replace_phi_edge_with_variable): Duplicate range
info if we're the only things setting the target PHI.
(value_replacement): Don't duplicate range here.
(minmax_replacement): Likewise.
---
 gcc/tree-ssa-phiopt.c | 43 ++-
 1 file changed, 26 insertions(+), 17 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index 1777bff2f7c..ab12e85569d 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -391,6 +391,32 @@ replace_phi_edge_with_variable (basic_block cond_block,
   basic_block bb = gimple_bb (phi);
   basic_block block_to_remove;
   gimple_stmt_iterator gsi;
+  tree phi_result = PHI_RESULT (phi);
+
+  /* Duplicate range info if we're the only things setting the target PHI.
+     This is needed as later on, the new_tree will be replacing
+     the assignment of the PHI.
+     For an example:
+     bb1:
+     _4 = min
+     goto bb2
+
+     range<-INF,255>
+     a_3 = PHI<_4(1)>
+     bb3:
+
+     use(a_3)
+     And _4 gets propagated into the use of a_3, losing the range info.
+     This can't be done for more than 2 incoming edges as the propagation
+     won't happen.  */
+  if (TREE_CODE (new_tree) == SSA_NAME
+      && EDGE_COUNT (gimple_bb (phi)->preds) == 2
+      && INTEGRAL_TYPE_P (TREE_TYPE (phi_result))
+      && !SSA_NAME_RANGE_INFO (new_tree)
+      && SSA_NAME_RANGE_INFO (phi_result))
+    duplicate_ssa_name_range_info (new_tree,
+                                   SSA_NAME_RANGE_TYPE (phi_result),
+                                   SSA_NAME_RANGE_INFO (phi_result));
 
   /* Change the PHI argument to new.  */
   SET_USE (PHI_ARG_DEF_PTR (phi, e->dest_idx), new_tree);
@@ -1385,16 +1411,6 @@ value_replacement (basic_block cond_bb, basic_block middle_bb,
   :
   # u_3 = PHI   */
   reset_flow_sensitive_info (lhs);
-  if (INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
-   {
- /* If available, we can use VR of phi result at least.  */
- tree phires = gimple_phi_result (phi);
- struct range_info_def *phires_range_info
-   = SSA_NAME_RANGE_INFO (phires);
- if (phires_range_info)
-   duplicate_ssa_name_range_info (lhs, SSA_NAME_RANGE_TYPE (phires),
-  phires_range_info);
-   }
   gimple_stmt_iterator gsi_from;
   for (int i = prep_cnt - 1; i >= 0; --i)
{
@@ -1794,13 +1810,6 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb,
   gimple_seq stmts = NULL;
   tree phi_result = PHI_RESULT (phi);
   result = gimple_build (&stmts, minmax, TREE_TYPE (phi_result), arg0, arg1);
-  /* Duplicate range info if we're the only things setting the target PHI.  */
-  if (!gimple_seq_empty_p (stmts)
-  && EDGE_COUNT (gimple_bb (phi)->preds) == 2
-  && !POINTER_TYPE_P (TREE_TYPE (phi_result))
-  && SSA_NAME_RANGE_INFO (phi_result))
-    duplicate_ssa_name_range_info (result, SSA_NAME_RANGE_TYPE (phi_result),
-  SSA_NAME_RANGE_INFO (phi_result));
 
   gsi = gsi_last_bb (cond_bb);
   gsi_insert_seq_before (&gsi, stmts, GSI_NEW_STMT);
-- 
2.27.0
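The condition added to `replace_phi_edge_with_variable` can be modeled outside GCC to see when the PHI result's range is copied onto the replacement name. This is a minimal sketch under stated assumptions: `struct ssa_name` and `maybe_copy_phi_range` are hypothetical stand-ins, not GCC's tree or `range_info_def` representation, and the patch's `TREE_CODE (new_tree) == SSA_NAME` test is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an SSA name and its optional value range;
   this is not GCC's tree or range_info_def representation.  */
struct ssa_name
{
  bool is_integral; /* models INTEGRAL_TYPE_P of the name's type */
  bool has_range;   /* models SSA_NAME_RANGE_INFO != NULL */
  long min, max;    /* meaningful only when has_range is true */
};

/* Copy PHI_RESULT's range onto NEW_NAME under the conditions the patch
   checks (minus the SSA_NAME test): the PHI block has exactly two
   predecessors, the PHI result is integral, NEW_NAME has no range yet
   and PHI_RESULT has one.  Returns true if the range was copied.  */
static bool
maybe_copy_phi_range (struct ssa_name *new_name,
                      const struct ssa_name *phi_result,
                      int phi_pred_count)
{
  if (phi_pred_count != 2
      || !phi_result->is_integral
      || new_name->has_range
      || !phi_result->has_range)
    return false;

  new_name->has_range = true;
  new_name->min = phi_result->min;
  new_name->max = phi_result->max;
  return true;
}
```

With three or more incoming edges the model declines, matching the comment's reasoning that the replacement is not propagated into the uses of the PHI result in that case.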



[PATCH 2/4] Allow match-and-simplified phiopt to run in early phiopt

2021-06-27 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

To move a few more things from phiopt to match-and-simplify,
we need to allow match_simplify_replacement to run in early
phiopt.  To do this we add a replacement for gimple_simplify
that is explicitly for phiopt.

OK? Bootstrapped and tested on x86_64-linux-gnu with no
regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.c (match_simplify_replacement):
Add early_p argument. Call gimple_simplify_phiopt
instead of gimple_simplify.
(tree_ssa_phiopt_worker): Update call to
match_simplify_replacement and allow unconditionally.
(phiopt_early_allow): New function.
(gimple_simplify_phiopt): New function.
---
 gcc/tree-ssa-phiopt.c | 89 ++-
 1 file changed, 70 insertions(+), 19 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index ab12e85569d..17bc597851b 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -50,12 +50,13 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-fold.h"
 #include "internal-fn.h"
 #include "gimple-range.h"
+#include "gimple-match.h"
 
 static unsigned int tree_ssa_phiopt_worker (bool, bool, bool);
 static bool two_value_replacement (basic_block, basic_block, edge, gphi *,
   tree, tree);
 static bool match_simplify_replacement (basic_block, basic_block,
-   edge, edge, gphi *, tree, tree);
+   edge, edge, gphi *, tree, tree, bool);
 static gphi *factor_out_conditional_conversion (edge, edge, gphi *, tree, tree,
gimple *);
 static int value_replacement (basic_block, basic_block,
@@ -345,9 +346,9 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool do_hoist_loads, bool early_p)
  /* Do the replacement of conditional if it can be done.  */
  if (!early_p && two_value_replacement (bb, bb1, e2, phi, arg0, arg1))
cfgchanged = true;
- else if (!early_p
-  && match_simplify_replacement (bb, bb1, e1, e2, phi,
- arg0, arg1))
+ else if (match_simplify_replacement (bb, bb1, e1, e2, phi,
+  arg0, arg1,
+  early_p))
cfgchanged = true;
  else if (abs_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
cfgchanged = true;
@@ -811,6 +812,67 @@ two_value_replacement (basic_block cond_bb, basic_block middle_bb,
   return true;
 }
 
+/* Return TRUE if CODE should be allowed during early phiopt.
+   Currently this is to allow MIN/MAX and ABS/NEGATE.  */
+static bool
+phiopt_early_allow (enum tree_code code)
+{
+  switch (code)
+{
+  case MIN_EXPR:
+  case MAX_EXPR:
+  case ABS_EXPR:
+  case ABSU_EXPR:
+  case NEGATE_EXPR:
+  case SSA_NAME:
+   return true;
+  default:
+   return false;
+}
+}
+
+/* gimple_simplify_phiopt is like gimple_simplify but designed for PHIOPT.
+   Return NULL if nothing can be simplified or the resulting simplified value
+   with parts pushed if EARLY_P was true. Also rejects non allowed tree code
+   if EARLY_P is set.
+   Takes the comparison from COMP_STMT and two args, ARG0 and ARG1 and tries
+   to simplify CMP ? ARG0 : ARG1.  */
+static tree
+gimple_simplify_phiopt (bool early_p, tree type, gimple *comp_stmt,
+   tree arg0, tree arg1,
+   gimple_seq *seq)
+{
+  tree result;
+  enum tree_code comp_code = gimple_cond_code (comp_stmt);
+  location_t loc = gimple_location (comp_stmt);
+  tree cmp0 = gimple_cond_lhs (comp_stmt);
+  tree cmp1 = gimple_cond_rhs (comp_stmt);
+  /* To handle special cases like floating point comparison, it is easier and
+ less error-prone to build a tree and gimplify it on the fly though it is
+ less efficient.
+ Don't use fold_build2 here as that might create (bool)a instead of just
+ "a != 0".  */
+  tree cond = build2_loc (loc, comp_code, boolean_type_node,
+ cmp0, cmp1);
+  gimple_match_op op (gimple_match_cond::UNCOND,
+ COND_EXPR, type, cond, arg0, arg1);
+
+  if (op.resimplify (early_p ? NULL : seq, follow_all_ssa_edges))
+{
+  /* Early we want only to allow some generated tree codes. */
+  if (!early_p
+ || op.code.is_tree_code ()
+ || phiopt_early_allow ((tree_code)op.code))
+   {
+ result = maybe_push_res_to_seq (&op, seq);
+ if (result)
+   return result;
+   }
+}
+
+  return NULL;
+}
+
 /*  The function match_simplify_replacement does the main work of doing the
 replacement using match and simplify.  Return true if the replacement is done.
 Otherwise return false.
@@ -820,10 +882,9 @@ two_value_replacement (basic_block cond_bb, basic_block middle_bb,
 static bool
 match_simplify_replacement (bas

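The intent of `phiopt_early_allow` and of the early-mode filter in `gimple_simplify_phiopt` in the patch above: outside early phiopt any simplification is accepted, while during early phiopt only results whose outermost code is on a short allowlist are kept. A minimal model, assuming a hypothetical `enum code` in place of GCC's `enum tree_code`; `early_allow` and `accept_result` are illustrative names, not GCC functions.

```c
#include <stdbool.h>

/* Hypothetical subset of expression codes; not GCC's enum tree_code.  */
enum code
{
  CODE_MIN, CODE_MAX, CODE_ABS, CODE_ABSU, CODE_NEGATE,
  CODE_SSA_NAME, CODE_PLUS, CODE_COND
};

/* Mirrors the allowlist in phiopt_early_allow.  */
static bool
early_allow (enum code c)
{
  switch (c)
    {
    case CODE_MIN:
    case CODE_MAX:
    case CODE_ABS:
    case CODE_ABSU:
    case CODE_NEGATE:
    case CODE_SSA_NAME:
      return true;
    default:
      return false;
    }
}

/* Accept a simplified result whose outermost code is C: always when not
   in early mode, otherwise only if C is on the allowlist.  */
static bool
accept_result (bool early_p, enum code c)
{
  return !early_p || early_allow (c);
}
```

In the patch itself the simplified operation's code is a `code_helper`, which can also name a combined function rather than a tree code; this model ignores that distinction.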
[PATCH 3/4] Try inverted comparison for match_simplify in phiopt

2021-06-27 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

Since match and simplify does not have all of the inverted
comparison patterns, it makes sense to just have
phi-opt try the inversion and run match and simplify again.

OK? Bootstrapped and tested on x86_64-linux-gnu.

Thanks,
Andrew Pinski

gcc/ChangeLog:

* tree-ssa-phiopt.c (gimple_simplify_phiopt):
If "A ? B : C" fails to simplify, try "(!A) ? C : B".
---
 gcc/tree-ssa-phiopt.c | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index 17bc597851b..9bda1b2a397 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -836,7 +836,8 @@ phiopt_early_allow (enum tree_code code)
with parts pushed if EARLY_P was true. Also rejects non allowed tree code
if EARLY_P is set.
Takes the comparison from COMP_STMT and two args, ARG0 and ARG1 and tries
-   to simplify CMP ? ARG0 : ARG1.  */
+   to simplify CMP ? ARG0 : ARG1.
+   Also try to simplify (!CMP) ? ARG0 : ARG1 if the non-inverse failed.  */
 static tree
 gimple_simplify_phiopt (bool early_p, tree type, gimple *comp_stmt,
tree arg0, tree arg1,
@@ -869,6 +870,30 @@ gimple_simplify_phiopt (bool early_p, tree type, gimple *comp_stmt,
return result;
}
 }
+  /* Try the inverted comparison, that is !COMP ? ARG1 : ARG0. */
+  comp_code = invert_tree_comparison (comp_code, HONOR_NANS (cmp0));
+
+  if (comp_code == ERROR_MARK)
+    return NULL;
+
+  cond = build2_loc (loc,
+comp_code, boolean_type_node,
+cmp0, cmp1);
+  gimple_match_op op1 (gimple_match_cond::UNCOND,
+  COND_EXPR, type, cond, arg1, arg0);
+
+  if (op1.resimplify (early_p ? NULL : seq, follow_all_ssa_edges))
+{
+  /* Early we want only to allow some generated tree codes. */
+  if (!early_p
+ || op1.code.is_tree_code ()
+ || phiopt_early_allow ((tree_code)op1.code))
+   {
+ result = maybe_push_res_to_seq (&op1, seq);
+ if (result)
+   return result;
+   }
+}
 
   return NULL;
 }
-- 
2.27.0
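Why the inverted comparison depends on `HONOR_NANS`: with a NaN operand every ordered comparison is false, so `!(a < b)` is not the same as `a >= b`. That is why `invert_tree_comparison` maps `<` to the unordered `UNGE` when NaNs must be honored. The C sketch below checks this on doubles; `unge` and `not_lt` are hypothetical helpers, with `unge` built on the C99 `isunordered` macro.

```c
#include <math.h>
#include <stdbool.h>

/* UNGE: true if the operands are unordered or A >= B.  */
static bool
unge (double a, double b)
{
  return isunordered (a, b) || a >= b;
}

/* The logical inverse of A < B.  */
static bool
not_lt (double a, double b)
{
  return !(a < b);
}
```

When NaNs need not be honored, plain `>=` is a valid inverse of `<`; the patch bails out when `invert_tree_comparison` returns `ERROR_MARK`, i.e. when no safe inverse exists.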



[PATCH 4/4] Port most of the A CMP 0 ? A : -A to match

2021-06-27 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

To improve phiopt and be able to remove abs_replacement, this ports
most of "A CMP 0 ? A : -A" from fold_cond_expr_with_comparison to
match.pd.  A few extra changes are needed before the
"A CMP 0 ? A : -A" part can be removed from fold_cond_expr_with_comparison:
   * Need to handle the (A - B) case.
   * Need to handle UN* comparisons.

I will handle those in a different patch.
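For integer A, where signed zeros are not an issue, the pattern groups can be checked directly in C. This sketch compares each `A CMP 0 ? A : -A` form (plus the two zero-arm variants) with the result the new patterns produce. `check_abs_patterns` is a hypothetical helper, and it assumes A != INT_MIN so that `-A` is defined.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Check the match.pd rewrites for one int value A (A != INT_MIN so
   that -A is defined).  Returns true if every identity holds.  */
static bool
check_abs_patterns (int a)
{
  bool ok = true;
  ok &= ((a == 0 ? a : -a) == -a);       /* eq: same as -A */
  ok &= ((a == 0 ? 0 : -a) == -a);       /* eq, zero arm: same as -A */
  ok &= ((a != 0 ? a : -a) == a);        /* ne: same as A */
  ok &= ((a != 0 ? a : 0) == a);         /* ne, zero arm: same as A */
  ok &= ((a >= 0 ? a : -a) == abs (a));  /* ge: abs (A) */
  ok &= ((a > 0 ? a : -a) == abs (a));   /* gt: abs (A) */
  ok &= ((a <= 0 ? a : -a) == -abs (a)); /* le: -abs (A) */
  ok &= ((a < 0 ? a : -a) == -abs (a));  /* lt: -abs (A) */
  return ok;
}
```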

Note the phi-opt-15.c test needed to be updated, as we now get ABSU
where previously we got no ABS at all.  When ABSU was added, phiopt was
not updated to use ABSU where it had been avoiding creating ABS.

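Why the `le`/`lt` pattern goes through `ABSU` when the signed type's overflow is undefined: `-abs (INT_MIN)` overflows, whereas `A <= 0 ? A : -A` is well defined for `INT_MIN`. Computing the magnitude in the unsigned type and negating there (modulo 2^N) reproduces A's bit pattern. A sketch, with `absu_int` and `neg_absu` as hypothetical helpers mimicking `ABSU_EXPR` and the pattern's `(negate (absu ...))` part:

```c
#include <limits.h>

/* Unsigned absolute value of X; defined for every int, including INT_MIN.  */
static unsigned
absu_int (int x)
{
  return x < 0 ? 0u - (unsigned) x : (unsigned) x;
}

/* Unsigned-typed value of (negate (absu A)) for A <= 0.  Unsigned
   negation wraps modulo 2^N, so nothing overflows.  */
static unsigned
neg_absu (int a)
{
  return 0u - absu_int (a);
}
```

Converting that unsigned value back to `int` is what the pattern's outer `convert` does; GCC defines such conversions modulo 2^N, so the result is A itself.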
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* match.pd (A CMP 0 ? A : -A): New patterns.
* tree-ssa-phiopt.c (abs_replacement): Delete function.
(tree_ssa_phiopt_worker): Don't call abs_replacement.
Update comment about abs_replacement.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-15.c: Update test to expect
ABSU and still not expect ABS_EXPR.
---
 gcc/match.pd   |  60 +
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-15.c |   4 +-
 gcc/tree-ssa-phiopt.c  | 134 +
 3 files changed, 64 insertions(+), 134 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 39fb57ee1f4..0c790dfa741 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3976,6 +3976,66 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (cnd (logical_inverted_value truth_valued_p@0) @1 @2)
   (cnd @0 @2 @1)))
 
+/* abs/negative simplifications moved from fold_cond_expr_with_comparison,
+   Need to handle (A - B) case as fold_cond_expr_with_comparison does.
+   Need to handle UN* comparisons.
+
+   None of these transformations work for modes with signed
+   zeros.  If A is +/-0, the first two transformations will
+   change the sign of the result (from +0 to -0, or vice
+   versa).  The last four will fix the sign of the result,
+   even though the original expressions could be positive or
+   negative, depending on the sign of A.
+
+   Note that all these transformations are correct if A is
+   NaN, since the two alternatives (A and -A) are also NaNs.  */
+
+(for cnd (cond vec_cond)
+ /* A == 0 ? A : -A    same as -A */
+ (for cmp (eq uneq)
+  (simplify
+   (cnd (cmp @0 zerop) @0 (negate@1 @0))
+(if (!HONOR_SIGNED_ZEROS (element_mode (type)))
+ @1))
+  (simplify
+   (cnd (cmp @0 zerop) zerop (negate@1 @0))
+(if (!HONOR_SIGNED_ZEROS (element_mode (type)))
+ @1))
+ )
+ /* A != 0 ? A : -A    same as A */
+ (for cmp (ne ltgt)
+  (simplify
+   (cnd (cmp @0 zerop) @0 (negate @0))
+(if (!HONOR_SIGNED_ZEROS (element_mode (type)))
+ @0))
+  (simplify
+   (cnd (cmp @0 zerop) @0 zerop)
+(if (!HONOR_SIGNED_ZEROS (element_mode (type)))
+ @0))
+ )
+ /* A >=/> 0 ? A : -A    same as abs (A) */
+ (for cmp (ge gt)
+  (simplify
+   (cnd (cmp @0 zerop) @0 (negate @0))
+(if (!HONOR_SIGNED_ZEROS (element_mode (type))
+&& !TYPE_UNSIGNED (type))
+ (abs @0))))
+ /* A <=/< 0 ? A : -A    same as -abs (A) */
+ (for cmp (le lt)
+  (simplify
+   (cnd (cmp @0 zerop) @0 (negate @0))
+(if (!HONOR_SIGNED_ZEROS (element_mode (type))
+&& !TYPE_UNSIGNED (type))
+ (if (ANY_INTEGRAL_TYPE_P (type)
+ && !TYPE_OVERFLOW_WRAPS (type))
+  (with {
+   tree utype = unsigned_type_for (type);
+   }
+   (convert (negate (absu:utype @0))))
+   (negate (abs @0)))))
+ )
+)
+
 /* -(type)!A -> (type)A - 1.  */
 (simplify
  (negate (convert?:s (logical_inverted_value:s @0)))
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-15.c b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-15.c
index ac3018ef533..6aec68961cf 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-15.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-15.c
@@ -9,4 +9,6 @@ foo (int i)
   return i;
 }
 
-/* { dg-final { scan-tree-dump-not "ABS" "optimized" } } */
+/* We should not have ABS_EXPR but ABSU_EXPR instead. */
+/* { dg-final { scan-tree-dump-not "ABS_EXPR" "optimized" } } */
+/* { dg-final { scan-tree-dump "ABSU" "optimized" } } */
diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index 9bda1b2a397..97540e30d55 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -63,8 +63,6 @@ static int value_replacement (basic_block, basic_block,
  edge, edge, gphi *, tree, tree);
 static bool minmax_replacement (basic_block, basic_block,
edge, edge, gphi *, tree, tree);
-static bool abs_replacement (basic_block, basic_block,
-edge, edge, gphi *, tree, tree);
 static bool spaceship_replacement (basic_block, basic_block,
   edge, edge, gphi *, tree, tree);
 static bool cond_removal_in_popcount_clz_ctz_pattern (basic_block, basic_block,
@@ -350,8 +348,6 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool do_hoist_loads, bool early_p)
   arg0, arg1,

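The comment on the new match.pd block says these rewrites are invalid when signed zeros are honored. A quick check with doubles shows why the `eq` case is excluded: for A = +0.0, `A == 0 ? A : -A` yields +0.0, while the rewrite `-A` yields -0.0. `orig_eq` and `rewritten_eq` are illustrative names for the two sides.

```c
#include <math.h>

/* Original expression: A == 0 ? A : -A.  */
static double
orig_eq (double a)
{
  return a == 0.0 ? a : -a;
}

/* The rewrite used when signed zeros are not honored: -A.  */
static double
rewritten_eq (double a)
{
  return -a;
}
```

The two results still compare equal with `==`, which is why the difference is easy to miss. `HONOR_SIGNED_ZEROS` is false for integer modes and under `-fno-signed-zeros`, which is when the patterns apply.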
Re: [PATCH v5 1/2] x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast

2021-06-27 Thread Hongtao Liu via Gcc-patches
On Sun, Jun 27, 2021 at 4:02 AM H.J. Lu  wrote:
>
> 1. Update move expanders to convert the CONST_WIDE_INT and CONST_VECTOR
> operands to vector broadcast from an integer with AVX2.
> 2. Add ix86_gen_scratch_sse_rtx to return a scratch SSE register which
> won't increase stack alignment requirement and blocks transformation by
> the combine pass.
>
> A small benchmark:
>
> https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast
>
> shows that broadcast is a little bit faster on Intel Core i7-8559U:
>
> $ make
> gcc -g -I. -O2   -c -o test.o test.c
> gcc -g   -c -o memory.o memory.S
> gcc -g   -c -o broadcast.o broadcast.S
> gcc -g   -c -o vec_dup_sse2.o vec_dup_sse2.S
> gcc -o test test.o memory.o broadcast.o vec_dup_sse2.o
> ./test
> memory  : 147215
> broadcast   : 121213
> vec_dup_sse2: 171366
> $
>
> broadcast is also smaller:
>
> $ size memory.o broadcast.o
>    text    data     bss     dec     hex filename
>     132       0       0     132      84 memory.o
>     122       0       0     122      7a broadcast.o
> $
>
> 3. Update PR 87767 tests to expect integer broadcast instead of broadcast
> from memory.
> 4. Update avx512f_cond_move.c to expect integer broadcast.
>
> A small benchmark:
>
> https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/vpaddd/broadcast
>
> shows that integer broadcast is faster than embedded memory broadcast:
>
> $ make
> gcc -g -I. -O2 -march=skylake-avx512   -c -o test.o test.c
> gcc -g   -c -o memory.o memory.S
> gcc -g   -c -o broadcast.o broadcast.S
> gcc -o test test.o memory.o broadcast.o
> ./test
> memory  : 425538
> broadcast   : 375260
> $
>
> gcc/
>
> PR target/100865
> * config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
> New prototype.
> (ix86_byte_broadcast): New function.
> (ix86_convert_const_wide_int_to_broadcast): Likewise.
> (ix86_expand_move): Convert CONST_WIDE_INT to broadcast if mode
> size is 16 bytes or bigger.
> (ix86_broadcast_from_integer_constant): New function.
> (ix86_expand_vector_move): Convert CONST_WIDE_INT and CONST_VECTOR
> to broadcast if mode size is 16 bytes or bigger.
> * config/i386/i386-protos.h (ix86_gen_scratch_sse_rtx): New
> prototype.
> * config/i386/i386.c (ix86_gen_scratch_sse_rtx): New function.
>
> gcc/testsuite/
>
> PR target/100865
> * gcc.target/i386/avx512f-broadcast-pr87767-1.c: Expect integer
> broadcast.
> * gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise.
> * gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise.
> * gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise.
> * gcc.target/i386/avx512f_cond_move.c: Also pass
> -mprefer-vector-width=512 and expect integer broadcast.
> * gcc.target/i386/pr100865-1.c: New test.
> * gcc.target/i386/pr100865-2.c: Likewise.
> * gcc.target/i386/pr100865-3.c: Likewise.
> * gcc.target/i386/pr100865-4a.c: Likewise.
> * gcc.target/i386/pr100865-4b.c: Likewise.
> * gcc.target/i386/pr100865-5a.c: Likewise.
> * gcc.target/i386/pr100865-5b.c: Likewise.
> * gcc.target/i386/pr100865-6a.c: Likewise.
> * gcc.target/i386/pr100865-6b.c: Likewise.
> * gcc.target/i386/pr100865-6c.c: Likewise.
> * gcc.target/i386/pr100865-7a.c: Likewise.
> * gcc.target/i386/pr100865-7b.c: Likewise.
> * gcc.target/i386/pr100865-7c.c: Likewise.
> * gcc.target/i386/pr100865-8a.c: Likewise.
> * gcc.target/i386/pr100865-8b.c: Likewise.
> * gcc.target/i386/pr100865-9a.c: Likewise.
> * gcc.target/i386/pr100865-9b.c: Likewise.
> * gcc.target/i386/pr100865-10a.c: Likewise.
> * gcc.target/i386/pr100865-10b.c: Likewise.
> * gcc.target/i386/pr100865-11a.c: Likewise.
> * gcc.target/i386/pr100865-11b.c: Likewise.
> * gcc.target/i386/pr100865-12a.c: Likewise.
> * gcc.target/i386/pr100865-12b.c: Likewise.
> ---
>  gcc/config/i386/i386-expand.c | 190 --
>  gcc/config/i386/i386-protos.h |   2 +
>  gcc/config/i386/i386.c|  13 ++
>  .../i386/avx512f-broadcast-pr87767-1.c|   7 +-
>  .../i386/avx512f-broadcast-pr87767-5.c|   5 +-
>  .../gcc.target/i386/avx512f_cond_move.c   |   4 +-
>  .../i386/avx512vl-broadcast-pr87767-1.c   |  12 +-
>  .../i386/avx512vl-broadcast-pr87767-5.c   |   9 +-
>  gcc/testsuite/gcc.target/i386/pr100865-1.c|  13 ++
>  gcc/testsuite/gcc.target/i386/pr100865-10a.c  |  33 +++
>  gcc/testsuite/gcc.target/i386/pr100865-10b.c  |   7 +
>  gcc/testsuite/gcc.target/i386/pr100865-11a.c  |  23 +++
>  gcc/testsuite/gcc.target/i386/pr100865-11b.c  |   8 +
>  gcc/testsuite/gcc.target/i386/pr100865-12a.c  |  20 ++
>  gcc/testsuite/gcc.target/i386/pr100865-12b.c  |   8 +
>  gcc/testsuite/gcc.target/i386/pr100865-2.c|  

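The core question the quoted patch asks of a CONST_WIDE_INT or CONST_VECTOR is whether the constant is a single element repeated across the whole vector, so that it can be materialized from an integer register with a broadcast instead of being loaded from the constant pool. A portable sketch of that test on the constant's raw bytes; `is_broadcast_constant` is a hypothetical helper, not the patch's `ix86_broadcast_from_integer_constant`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if the N-byte constant BYTES consists of one ELT_SIZE-byte
   element repeated N / ELT_SIZE times.  Returns false if ELT_SIZE is 0
   or N is not a multiple of ELT_SIZE.  */
static bool
is_broadcast_constant (const unsigned char *bytes, size_t n, size_t elt_size)
{
  if (elt_size == 0 || n % elt_size != 0)
    return false;
  for (size_t off = elt_size; off < n; off += elt_size)
    if (memcmp (bytes, bytes + off, elt_size) != 0)
      return false;
  return true;
}
```

Which element sizes are worth trying, and at which ISA levels a broadcast beats a memory load, are target decisions the sketch does not model.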
[PATCH][gcc] Allow functions without C-style ellipsis to use format attribute

2021-06-27 Thread Tuan Le Quang via Gcc-patches
Hi,

Currently, the format attribute can be used to type-check arguments
against a format string.  However, only functions with a C-style
ellipsis can use it.
Supporting this attribute for non-variadic functions (functions without a
C-style ellipsis) is convenient, especially when writing C++ code: we can
use it for C++ variadic template functions like this

template
__attribute__((format(printf, 1, 2))) void myPrint (const char * fmt,
Args...args)

This patch introduces these changes:
1. It is no longer an error simply to have a function with the format
attribute but no C-style variadic arguments.
2. Functions are subject to warnings/errors as before, except for the error
mentioned in point 1 about not being variadic; for example, when a
non-variadic function has wrong attribute arguments, e.g.
__attribute__((format(printf, 1, 1))), or when its calls are type-checked.

Note that the behaviour of C-style variadic functions does not change;
errors and warnings are given as before.

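For reference, the form GCC already accepts: a C-style variadic function whose third `format` argument points at the `...`. A minimal sketch (the function name `log_msg` is made up) that forwards to `vsnprintf`, so calls are type-checked against the format string:

```c
#include <stdarg.h>
#include <stdio.h>

/* Argument 3 is the format string; checking starts at argument 4, the "...".  */
static int log_msg (char *buf, size_t size, const char *fmt, ...)
  __attribute__ ((format (printf, 3, 4)));

static int
log_msg (char *buf, size_t size, const char *fmt, ...)
{
  va_list ap;
  va_start (ap, fmt);
  int n = vsnprintf (buf, size, fmt, ap);
  va_end (ap);
  return n;
}
```

With -Wformat (enabled by -Wall), a call such as `log_msg (buf, sizeof buf, "%d", "x")` draws a warning. The patch aims to allow the same checking when the arguments come from a C++ variadic template parameter pack rather than a C-style `...`.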
This patch does it by:
1.   Relaxing several conditions for format attribute:
 -  Will only use POSARG_ELLIPSIS flag to call `get_constant` when
getting attribute arguments of a variadic function
 -  Relax the check for the last argument of the attribute (will not
require an ellipsis argument)
 -  (Before this patch) After passing the above check, current gcc will
call `get_constant` to get the function parameter that the third attribute
argument is pointing to. If POSARG_ELLIPSIS is set, `get_constant` will
look for `...`. If not, `get_constant` will look for a C-style string. Note
that POSARG_ELLIPSIS is set automatically for getting the third attribute
argument.
(After this patch) POSARG_ELLIPSIS is set only when the function
has C-style '...'. Now, if POSARG_ELLIPSIS is not set, `get_constant` will
not check whether the third argument of format attribute points to a
C-style string.
2.   Modifying expected outcome of a testcase in objc testsuite, where we
expect a warning instead of an error
3.   Adding 2 test files
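The `has_variable_arg` test in `decode_format_attr` relies on how GCC records a prototype's parameter types: a non-variadic list ends with a `void` terminator, a variadic one does not, so asking `type_argument_type` for argument `n + 1` returns `void` only in the non-variadic case and null for a vararg. A simplified model, assuming a hypothetical NULL-terminated array of type names in place of `TYPE_ARG_TYPES`; `arg_type`, `num_args` and `has_variable_arg` here are illustrative, not GCC's functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Like type_argument_type: the type of argument ARGNO (1-based), "void"
   if ARGNO is past the end of a non-variadic list, NULL for a vararg.  */
static const char *
arg_type (const char *const *types, unsigned argno)
{
  if (argno == 0)
    return NULL;
  for (unsigned i = 1; types[i - 1] != NULL; i++)
    if (i == argno || strcmp (types[i - 1], "void") == 0)
      return types[i - 1];
  return NULL;
}

/* Number of real arguments, excluding the "void" terminator.  */
static unsigned
num_args (const char *const *types)
{
  unsigned n = 0;
  while (types[n] != NULL && strcmp (types[n], "void") != 0)
    n++;
  return n;
}

static bool
has_variable_arg (const char *const *types)
{
  return arg_type (types, num_args (types) + 1) == NULL;
}
```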

Successfully bootstrapped and regression tested on x86_64-pc-linux-gnu.

Signed-off-by: Le Quang Tuan 

gcc/c-family/ChangeLog:

* c-attribs.c (positional_argument): allow third argument of format
attribute to point to parameters of any type if the function is not C-style
variadic
* c-format.c (decode_format_attr): read third argument with POSARG_ELLIPSIS
only if the function has a variable argument
(handle_format_attribute): relax explicit checks for non-variadic functions

gcc/testsuite/ChangeLog:

* gcc.dg/format/attr-3.c: modify comment
* objc.dg/attributes/method-format-1.m: errors do not hold anymore, a
warning is given instead
* g++.dg/warn/format9.C: New test with usage of variadic templates.
* gcc.dg/format/attr-9.c: New test.

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 6bf492afcc0..7a17ce671de 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -714,6 +714,11 @@ positional_argument (const_tree fntype, const_tree atname, tree pos,
   return NULL_TREE;
  }

+  /* For format attribute with argno >= 3, we don't expect any type
+   */
+  if (argno >= 3 && strcmp (IDENTIFIER_POINTER(atname), "format") == 0 && !(flags & POSARG_ELLIPSIS ) )
+    return pos;
+
   /* Where the expected code is STRING_CST accept any pointer
  expected by attribute format (this includes possibly qualified
  char pointers and, for targets like Darwin, also pointers to
diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index bda3b18fcd0..453565ad7b8 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -380,9 +380,15 @@ decode_format_attr (const_tree fntype, tree atname, tree args,
   else
 return false;

+  bool has_variable_arg = !type_argument_type(fntype, type_num_arguments(fntype) + 1);
+  int extra_flag = 0;
+  if (has_variable_arg) {
+    extra_flag = POSARG_ELLIPSIS;
+  }
+
   if (tree val = get_constant (fntype, atname, *first_arg_num_expr,
3, &info->first_arg_num,
-   (POSARG_ZERO | POSARG_ELLIPSIS), validated_p))
+   (POSARG_ZERO | extra_flag), validated_p))
 *first_arg_num_expr = val;
   else
 return false;
@@ -5193,11 +5199,11 @@ handle_format_attribute (tree *node, tree atname, tree args,
   tree arg_type;

   /* Verify that first_arg_num points to the last arg,
- the ...  */
+ if the last arg is  ... */
   FOREACH_FUNCTION_ARGS (type, arg_type, iter)
 arg_num++;

-  if (arg_num != info.first_arg_num)
+  if (arg_num != info.first_arg_num && !type_argument_type(type, arg_num))
 {
   if (!(flags & (int) ATTR_FLAG_BUILT_IN))
  error ("argument to be formatted is not %<...%>");
diff --git a/gcc/testsuite/g++.dg/warn/format9.C b/gcc/testsuite/g++.dg/warn/format9.C
new file mode 100644
index 000..39b615859fc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/format9.C
@@ -0,0 +1,16 @@
+// Test format attribute used with variadic templates
+// { dg-do compile { target c++11 } }
+

Re: [PATCH 3/4] Try inverted comparison for match_simplify in phiopt

2021-06-27 Thread Bernhard Reutner-Fischer via Gcc-patches
On 28 June 2021 01:24:59 CEST, apinski--- via Gcc-patches 
 wrote:
>From: Andrew Pinski 
>
>Since match and simplify does not have all of the inverted
>comparison patterns, it make sense to just have
>phi-opt try to do the inversion and try match and simplify again.
>
>OK? Bootstrapped and tested on x86_64-linux-gnu.
>
>Thanks,
>Andrew Pinski
>
>gcc/ChangeLog:
>
>   * tree-ssa-phiopt.c (gimple_simplify_phiopt):
>   If "A ? B : C" fails to simplify, try "(!A) ? C : B".
>---
> gcc/tree-ssa-phiopt.c | 27 ++-
> 1 file changed, 26 insertions(+), 1 deletion(-)
>
>diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
>index 17bc597851b..9bda1b2a397 100644
>--- a/gcc/tree-ssa-phiopt.c
>+++ b/gcc/tree-ssa-phiopt.c
>@@ -836,7 +836,8 @@ phiopt_early_allow (enum tree_code code)
>with parts pushed if EARLY_P was true. Also rejects non allowed tree
>code
>if EARLY_P is set.
>Takes the comparison from COMP_STMT and two args, ARG0 and ARG1 and
>tries
>-   to simplify CMP ? ARG0 : ARG1.  */
>+   to simplify CMP ? ARG0 : ARG1.
>+   Also try to simplify (!CMP) ? ARG0 : ARG1 if the non-inverse
>failed.  */

I think you need to swap the args above?

thanks,

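The point of the review comment, spelled out: `c ? a : b` equals `!c ? b : a`, not `!c ? a : b`, so the function comment should read `(!CMP) ? ARG1 : ARG0`, matching the `op1` construction and the in-body comment in the patch. A check over both values of the condition; the helper names are illustrative.

```c
#include <stdbool.h>

/* True if c ? a : b == !c ? b : a for both values of c.  */
static bool
swap_identity_holds (int a, int b)
{
  for (int c = 0; c <= 1; c++)
    if ((c ? a : b) != (!c ? b : a))
      return false;
  return true;
}

/* True if c ? a : b == !c ? a : b for both values of c; this only
   holds when a == b.  */
static bool
unswapped_identity_holds (int a, int b)
{
  for (int c = 0; c <= 1; c++)
    if ((c ? a : b) != (!c ? a : b))
      return false;
  return true;
}
```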

[PATCH v2] fixinc: don't "fix" machine names in __has_include(...) [PR91085]

2021-06-27 Thread Xi Ruoyao via Gcc-patches
v2: match for __has_include explicitly, as it may contain a
header name in <...> form.

fixincludes/

* fixfixes.c (machine_name_fix): Don't replace header names in
  __has_include(...).
* inclhack.def (machine_name): Adjust test.
* tests/base/testing.h: Update.
---
 fixincludes/fixfixes.c   | 29 +++--
 fixincludes/inclhack.def |  3 ++-
 fixincludes/tests/base/testing.h |  2 +-
 3 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/fixincludes/fixfixes.c b/fixincludes/fixfixes.c
index 5b23a8b640d..147cba716c7 100644
--- a/fixincludes/fixfixes.c
+++ b/fixincludes/fixfixes.c
@@ -488,7 +488,7 @@ FIX_PROC_HEAD( char_macro_def_fix )
 FIX_PROC_HEAD( machine_name_fix )
 {
   regmatch_t match[2];
-  const char *line, *base, *limit, *p, *q;
+  const char *line, *base, *limit, *p, *q, *r;
   regex_t *label_re, *name_re;
   char scratch[SCRATCHSZ];
   size_t len;
@@ -524,7 +524,7 @@ FIX_PROC_HEAD( machine_name_fix )
   /* If the 'name_pat' matches in between base and limit, we have
  a bogon.  It is not worth the hassle of excluding comments
  because comments on #if/#ifdef lines are rare, and strings on
- such lines are illegal.
+ such lines are only legal in a "__has_include" directive.
 
  REG_NOTBOL means 'base' is not at the beginning of a line, which
  shouldn't matter since the name_re has no ^ anchor, but let's
@@ -544,6 +544,31 @@ FIX_PROC_HEAD( machine_name_fix )
 break;
 
   p = base + match[0].rm_so;
+
+  /* Check if the match is in __has_include(...) (PR 91085). */
+  for (q = base; q < p; q++)
+if (!strncmp (q, "__has_include", 13))
+  {
+r = q + 13;
+while (r < p && ISSPACE (*r))
+  r++;
+
+/* "__has_include" may appear as "defined(__has_include)",
+   search for the next appearance then.  */
+if (*r != '(')
+  continue;
+
+/* To avoid too much complexity, just hope there is never a
+   ')' in a header name.  */
+while (r < limit && *r != ')')
+  r++;
+if (r >= base + match[0].rm_eo)
+  {
+base = r;
+goto again;
+  }
+  }
+
   base += match[0].rm_eo;
 
   /* One more test: if on the same line we have the same string
diff --git a/fixincludes/inclhack.def b/fixincludes/inclhack.def
index 3a4cfe06542..31389396af6 100644
--- a/fixincludes/inclhack.def
+++ b/fixincludes/inclhack.def
@@ -3151,7 +3151,8 @@ fix = {
 c_fix = machine_name;
 
 test_text = "/* MACH_DIFF: */\n"
-"#if defined( i386 ) || defined( sparc ) || defined( vax )"
+"#if defined( i386 ) || defined( sparc ) || defined( vax ) || "
+"defined( linux ) || __has_include (  ) || defined ( linux )"
 "\n/* no uniform test, so be careful  :-) */";
 };
 
diff --git a/fixincludes/tests/base/testing.h b/fixincludes/tests/base/testing.h
index cf95321fb86..00e8dde003e 100644
--- a/fixincludes/tests/base/testing.h
+++ b/fixincludes/tests/base/testing.h
@@ -64,7 +64,7 @@ BSD43__IOWR('T', 1) /* Some are multi-line */
 
 #if defined( MACHINE_NAME_CHECK )
 /* MACH_DIFF: */
-#if defined( i386 ) || defined( sparc ) || defined( vax )
+#if defined( i386 ) || defined( sparc ) || defined( vax ) || defined( linux ) || __has_include (  ) || defined ( linux )
 /* no uniform test, so be careful  :-) */
 #endif  /* MACHINE_NAME_CHECK */
 
-- 
2.32.0
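The idea behind the new check in `machine_name_fix`: a machine-name match that lies inside the parentheses of a `__has_include` call is part of a header name and must be left alone, while a bare `__has_include` not followed by `(` (as in `defined (__has_include)`) does not count. A simplified standalone version over one NUL-terminated line; `inside_has_include` is hypothetical and, like the patch, assumes a header name never contains `)`.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if the match [START, END) of LINE falls inside the
   argument list of some "__has_include (...)" that begins before START.  */
static bool
inside_has_include (const char *line, size_t start, size_t end)
{
  static const char kw[] = "__has_include";
  const size_t kw_len = sizeof kw - 1;

  for (size_t q = 0; q < start; q++)
    {
      if (strncmp (line + q, kw, kw_len) != 0)
        continue;
      size_t r = q + kw_len;
      while (r < start && isspace ((unsigned char) line[r]))
        r++;
      if (line[r] != '(')
        continue;               /* e.g. defined (__has_include) */
      while (line[r] != '\0' && line[r] != ')')
        r++;
      if (r >= end)             /* the ')' comes after the match */
        return true;
    }
  return false;
}
```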





[RFC/PATCH v3] ira: Support more matching constraint forms with param [PR100328]

2021-06-27 Thread Kewen.Lin via Gcc-patches
Hi!

on 2021/6/9 1:18 PM, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> PR100328 has some details about this issue; I will try to
> summarize it here.  In the hottest function LBM_performStreamCollideTRT
> of SPEC2017 bmk 519.lbm_r, there are many FMA style expressions
> (27 FMA, 19 FMS, 11 FNMA).  On rs6000, this kind of FMA style
> insn has two flavors: FLOAT_REG and VSX_REG.  The VSX_REG reg
> class has 64 registers, the first 32 of which make up the
> whole FLOAT_REG class.  There are some differences between these two
> flavors, taking "*fma4_fpr" as an example:
> 
> (define_insn "*fma4_fpr"
>   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=,wa,wa")
>   (fma:SFDF
> (match_operand:SFDF 1 "gpc_reg_operand" "%,wa,wa")
> (match_operand:SFDF 2 "gpc_reg_operand" ",wa,0")
> (match_operand:SFDF 3 "gpc_reg_operand" ",0,wa")))]
> 
> // wa => A VSX register (VSR), vs0…vs63, aka. VSX_REG.
> //  (f/d) => A floating point register, aka. FLOAT_REG.
> 
> So for VSX_REG we only have the destructive form: when a VSX_REG
> alternative is used, operand 2 or operand 3 is required to be the
> same as operand 0.  Reload has to take care of this constraint and
> create some non-free register copies if required.
> 
> Assuming one fma insn looks like:
>   op0 = FMA (op1, op2, op3)
> 
> The best regclass for all of them is VSX_REG.  When op1, op2 and op3 are
> all dead, IRA simply creates three shuffle copies for them (the operand
> order matters here, since at the same freq the one with the smaller
> number takes preference), but IMO both op2 and op3 should take higher
> priority in the copy queue due to the matching constraint.
> 
> I noticed that there is a function, ira_get_dup_out_num, which is meant
> to create this kind of constraint copy, but the code below seems to
> refuse to create one if there is an alternative with a valid regclass
> that is not likely to be spilled.
> 
>   default:
>   {
> enum constraint_num cn = lookup_constraint (str);
> enum reg_class cl = reg_class_for_constraint (cn);
> if (cl != NO_REGS
> && !targetm.class_likely_spilled_p (cl))
>   goto fail
> 
>...
> 
> I cooked up the attached patch to make IRA respect this kind of matching
> constraint, guarded with a parameter.  As I stated in the PR, I was
> not sure this is on the right track.  The RFC patch checks the
> matching constraint in all alternatives; if there is an alternative
> with a matching constraint that matches the current preferred regclass
> (or the best class of the allocno?), it records the output operand
> number and then creates a constraint copy for it.  Normally this copy
> gets priority over shuffle copies, so the matching constraint is more
> likely to be satisfied, and reload does not need to create the extra
> copies it would otherwise need to meet the matching constraint or the
> desirable register class.
> 
> For FMA A,B,C,D, I think ideally copies A/B, A/C, A/D could first stay
> as shuffle copies.  Later, once any of A,B,C,D gets assigned a
> hardware register that is a VSX register (VSX_REG) but not an FP
> register (FLOAT_REG), we would have to pay costs if we could NOT
> go with the VSX alternatives, so at that point it becomes important to
> respect the matching constraint, and we could increase the freq of the
> remaining copies related to it (A/B, A/C, A/D).  This idea requires
> some side tables to record information and seems a bit complicated in
> the current framework, so the proposed patch aggressively emphasizes
> the matching constraint at the time of creating copies.
> 

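A toy model of the ordering problem described in the quoted mail: with equal frequency, the copy involving the lower-numbered operand wins, so op1 (which has no matching constraint) can be preferred over op2/op3; ranking constraint copies ahead of shuffle copies at equal frequency changes that. The `struct copy` and `copy_cmp` below are hypothetical and do not reproduce IRA's own copy sorting or cost model.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical copy between the output operand and input operand OPNO.  */
struct copy
{
  int opno;          /* input operand number (1, 2 or 3 for FMA) */
  int freq;          /* execution frequency */
  bool constraint_p; /* true for a matching-constraint copy */
};

/* qsort comparator: higher frequency first; at equal frequency,
   constraint copies before shuffle copies; then lower operand number.  */
static int
copy_cmp (const void *pa, const void *pb)
{
  const struct copy *a = pa, *b = pb;
  if (a->freq != b->freq)
    return a->freq > b->freq ? -1 : 1;
  if (a->constraint_p != b->constraint_p)
    return a->constraint_p ? -1 : 1;
  return (a->opno > b->opno) - (a->opno < b->opno);
}
```

With all three copies at the same frequency and only op2/op3 marked as constraint copies, the order becomes op2, op3, op1 instead of op1, op2, op3.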
Compared with the original patch (v1), this patch v3 considers
the following (it should be v2 for this mailing list, but I bumped
it to be consistent with the PR's numbering):

  - Excluding the case where, for one preferred register class,
    there are two or more alternatives, one of which has the
    matching constraint while another doesn't.  In that case,
    even if the given operand is assigned a hardware reg that
    doesn't meet the matching constraint, it can simply use the
    alternative without the matching constraint, so no register
    move is needed.  One typical case is define_insn
    *mov_internal2 on rs6000.  So we shouldn't create a
    constraint copy for it.

  - Possible free register moves within the same register class:
    disable this in that case, since the register move needed to
    meet the constraint is considered free.

  - Making it on by default, as suggested by Segher & Vladimir; we
    hope to get rid of the parameter if the benchmarking results
    look good on major targets.

  - Tweaking the cost when either side of the matching constraint
    is a hardware register.  Before this patch, the constraint
    copy is simply taken as a real move insn for pref and
    conflict cost with one hardware register; after this patch,
it's allowed that there are several input operands
respecting the same matching constraint (but in different
alternatives), so we should take it to be like shuffle copy