RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU

2020-12-05 Thread Kumar, Venkataramanan via Gcc-patches
[AMD Public Use]


As per https://gcc.gnu.org/codingconventions.html#ChangeLogs

--Snip--
ChangeLogs
ChangeLog entries are part of git commit messages and are automatically put 
into a corresponding ChangeLog file.
--Snip--

Does this mean the ChangeLog files will be updated automatically?  I did not
touch the ChangeLog files when pushing.
The ChangeLog contents are part of my commit message.

Regards,
Venkat.

> -Original Message-
> From: Gcc-patches  On Behalf Of
> Kumar, Venkataramanan via Gcc-patches
> Sent: Saturday, December 5, 2020 1:09 PM
> To: Jan Hubicka ; Uros Bizjak 
> Cc: gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH] [X86_64]: Enable support for next generation AMD
> Zen3 CPU
> 
> [CAUTION: External Email]
> 
> [AMD Public Use]
> 
> Hi Honza,
> 
> > -Original Message-
> > From: Jan Hubicka 
> > Sent: Saturday, December 5, 2020 1:06 AM
> > To: Uros Bizjak 
> > Cc: Kumar, Venkataramanan ; gcc-
> > patc...@gcc.gnu.org
> > Subject: Re: [PATCH] [X86_64]: Enable support for next generation AMD
> > Zen3 CPU
> >
> > [CAUTION: External Email]
> >
> > > On Fri, Dec 4, 2020 at 6:50 PM Kumar, Venkataramanan
> > >  wrote:
> > > >
> > > > [AMD Public Use]
> > > >
> > > > Hi Uros
> > > >
> > > > > -Original Message-
> > > > > From: Uros Bizjak 
> > > > > Sent: Friday, December 4, 2020 2:30 PM
> > > > > To: Kumar, Venkataramanan 
> > > > > Cc: gcc-patches@gcc.gnu.org; Jan Hubicka (hubi...@ucw.cz)
> > > > > 
> > > > > Subject: Re: [PATCH] [X86_64]: Enable support for next
> > > > > generation AMD
> > > > > Zen3 CPU
> > > > >
> > > > > [CAUTION: External Email]
> > > > >
> > > > > On Thu, Dec 3, 2020 at 4:29 PM Kumar, Venkataramanan
> > > > >  wrote:
> > > > > >
> > > > > > [AMD Public Use]
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Hi Maintainers,
> > > > > >
> > > > > >
> > > > > >
> > > > > > > PFA, the patch that enables support for the next generation AMD
> > > > > > > Zen3 CPU via -march=znver3.
> > > > > > >
> > > > > > > This is a very basic enablement patch. As of now the cost,
> > > > > > > tuning and scheduler changes are kept the same as for znver2.
> > > > > >
> > > > > > Further changes to the cost and tunings will be done later.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Ok for trunk ?
> > > > >
> > > > > > Please also add a new target to multiversioning and
> > > > > > corresponding testcases. For an example of how this is done
> > > > > > nowadays, please see a submission for a different target at [1].
> > > > > >
> > > > > > BTW: It looks like the multiversioning testcases lack AMD targets.
> > > > > > Can you please add a testcase similar to
> > > > > > testsuite/g++.target/i386/mv16.C and also add AMD targets to
> > > > > > testsuite/gcc.target/i386/funcspec-56.inc
> > > > > > (this can be done in a follow-up patch).
> > > > >
> > > > > [1]
> > > > >
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2F
> > > > > gcc
> > > > > .gnu.org%2Fpipermail%2Fgcc-patches%2F2020-
> > > > >
> > July%2F549699.html&data=04%7C01%7CVenkataramanan.Kumar%40
> > > > >
> > amd.com%7Cb53d6be6a0d6439396ae08d8983308e9%7C3dd8961fe4884e
> > > > >
> > 608e11a82d994e183d%7C0%7C0%7C637426692241855598%7CUnknown
> > > > >
> > %7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1ha
> > > > >
> > WwiLCJXVCI6Mn0%3D%7C1000&sdata=VAPPvfzv%2FMCRiXSn2eBNn
> > > > > 7bVIReoEHLkAtFgV%2BTFR4I%3D&reserved=0
> > > > >
> > > >
> > > > Please find attached the version 2 patch.
> > > >
> > > > I have made additional changes as suggested by you.
> > > > 1.  Added the AMD Zen targets to funcspec-56.inc file in the tests.
> > > > 2.  To cover multiversioning, added a new test, similar to mv16.C,
> > > >     with a set of AMD targets detected by builtin_cpus.
> > > >
> > > > Is it OK for trunk?
> > >
> > > LGTM (I didn't review scheduling changes in detail).
> >
> > I checked the scheduling changes and they are OK. So the patch is OK
> > overall.
> >
> > Even with respect to Jason's point on possibly regressing a primary
> > target (breaking -march=native on a zen3 machine counts as a
> > regression), the risks here are low. There is nothing really
> > controversial in the patch.
> >
> > It would be nice to set up regular benchmarking on a zen3 machine,
> > like we do for zen1/2.
> > Honza
> 
> Thank you for reviewing the patch.  I pushed the patch to the gcc trunk.
> 
> Ref:
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=3e2ae3ee285a57455d5a23bd352a68c289130186
> 
> > >
> > > Uros.
> 
> Regards,
> Venkat.


Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU

2020-12-05 Thread Jakub Jelinek via Gcc-patches
On Sat, Dec 05, 2020 at 08:22:23AM +0000, Kumar, Venkataramanan via Gcc-patches
wrote:
> [AMD Public Use]
> 
> 
> As per https://gcc.gnu.org/codingconventions.html#ChangeLogs
> 
> --Snip--
> ChangeLogs
> ChangeLog entries are part of git commit messages and are automatically put 
> into a corresponding ChangeLog file.
> --Snip--
> 
> This means Changelog files will be updated automatically?  I did not do 
> anything to Change log files while pushing. 
> The Change log contents are part of my commit message. 

Yes, a script will do that soon after midnight UTC.

Jakub



[PATCH] phiopt: Improve conditional_replacement for x ? 0 : -1 [PR796232]

2020-12-05 Thread Jakub Jelinek via Gcc-patches
Hi!

As mentioned in the PR, for boolean x we currently optimize
in phiopt x ? 0 : -1 into -(int)!x but it can be optimized as
(int) x - 1 which is one less operation both in GIMPLE and in x86 assembly.
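
For illustration only (this is not part of the patch): a small C sketch,
assuming x is a C _Bool so that (int) x is always 0 or 1, that checks the
three forms mentioned above against each other.  The function names are
made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* The source-level form, which phiopt sees as a PHI of 0 and -1.  */
int select_form (bool x) { return x ? 0 : -1; }

/* The form described above as the current result: invert, convert,
   negate.  */
int negate_form (bool x) { return -(int) !x; }

/* The form the patch aims for: convert, then add -1.  */
int add_form (bool x) { return (int) x - 1; }

/* Exhaustively compare the three forms on both boolean inputs.  */
void check_forms_agree (void)
{
  for (int i = 0; i <= 1; i++)
    {
      bool x = (i != 0);
      assert (select_form (x) == negate_form (x));
      assert (select_form (x) == add_form (x));
    }
}
```

Calling check_forms_agree () from a test driver exercises both inputs; the
add form needs one conversion and one addition where the negate form needs
an inversion, a conversion and a negation.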

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

And/or, shall we have a match.pd optimization to turn that -(type)!x
for BOOLEAN_TYPE (or other 1 bit unsigned precision values) into
(type) x - 1?

2020-12-05  Jakub Jelinek  

PR tree-optimization/96232
* tree-ssa-phiopt.c (conditional_replacement): Optimize
x ? 0 : -1 as (int) x - 1 rather than -(int)!x.

* gcc.dg/tree-ssa/pr96232-1.c: New test.

--- gcc/tree-ssa-phiopt.c.jj   2020-11-04 11:58:58.670252748 +0100
+++ gcc/tree-ssa-phiopt.c   2020-12-04 17:27:53.472837921 +0100
@@ -827,10 +827,24 @@ conditional_replacement (basic_block con
 
   if (neg)
 {
-  cond = fold_convert_loc (gimple_location (stmt),
-   TREE_TYPE (result), cond);
-  cond = fold_build1_loc (gimple_location (stmt),
-  NEGATE_EXPR, TREE_TYPE (cond), cond);
+  if (TREE_CODE (cond) == TRUTH_NOT_EXPR
+ && INTEGRAL_TYPE_P (TREE_TYPE (nonzero_arg)))
+   {
+ /* x ? 0 : -1 is better optimized as (int) x - 1 than
+-(int)!x.  */
+ cond = fold_convert_loc (gimple_location (stmt),
+  TREE_TYPE (result),
+  TREE_OPERAND (cond, 0));
+ cond = fold_build2_loc (gimple_location (stmt), PLUS_EXPR,
+ TREE_TYPE (result), cond, nonzero_arg);
+   }
+  else
+   {
+ cond = fold_convert_loc (gimple_location (stmt),
+  TREE_TYPE (result), cond);
+ cond = fold_build1_loc (gimple_location (stmt),
+ NEGATE_EXPR, TREE_TYPE (cond), cond);
+   }
 }
   else if (shift)
 {
--- gcc/testsuite/gcc.dg/tree-ssa/pr96232-1.c.jj   2020-12-04 17:32:40.607615276 +0100
+++ gcc/testsuite/gcc.dg/tree-ssa/pr96232-1.c   2020-12-04 17:33:09.914286354 +0100
@@ -0,0 +1,11 @@
+/* PR tree-optimization/96232 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump " \\+ -1;" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "~x_\[0-9]*\\\(D\\\)" "optimized" } } */
+
+int
+foo (_Bool x)
+{
+  return x ? 0 : -1;
+}

Jakub



Re: [PATCH 0/8] [RS6000] rs6000_rtx_costs V2

2020-12-05 Thread Alan Modra via Gcc-patches
Hi Segher,
I've been holding off pinging these knowing you had a lot of other
review work, but maybe that's settling down now?  You already OK'd
1/8, 2/8 and 6/8.

[PATCH 3/8] [RS6000] rs6000_rtx_costs tidy AND
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555754.html

[PATCH 4/8] [RS6000] rs6000_rtx_costs tidy break/return
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555755.html

[PATCH 5/8] [RS6000] rs6000_rtx_costs cost IOR
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555756.html

[PATCH 7/8] [RS6000] rs6000_rtx_costs reduce cost for SETs
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555758.html

[PATCH 8/8] [RS6000] rs6000_rtx_costs for !speed
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555759.html

[RS6000] rotate and mask constants
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555760.html

[RS6000] Adjust testcases for power10 instructions V3
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557587.html

-- 
Alan Modra
Australia Development Lab, IBM


[PATCH] phiopt: Handle bool in two_value_replacement [PR796232]

2020-12-05 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch improves code generation on the included testcase by
enabling two_value_replacement on booleans.  It does that only for arg0/arg1
values that conditional_replacement doesn't handle, and only does it if not
in the early phiopt pass, because conditional_replacement isn't done early
either.
I must say I'm not sure about that: in PR87105 / PR87608 you added the
early phiopt pass and specifically excluded conditional_replacement and a
few others from being optimized early, but then 3 months later I added
two_value_replacement and didn't restrict it to !early_p.  Shall we instead
do two_value_replacement only in late phiopt?
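
To make the intended effect concrete, here is a hedged C sketch (not taken
from the patch) of the branch-free forms that the new testcase's dump
patterns, " 38 - " and " + 97;", correspond to.  The helper two_value and
all function names are hypothetical and only model the idea for a _Bool
selector with adjacent constants; the real pass works on GIMPLE and has
further conditions.

```c
#include <assert.h>
#include <stdbool.h>

/* Branchy source forms, as in the new testcase.  */
int foo_select (bool x) { return x ? 37 : 38; }
int bar_select (bool x) { return x ? 98 : 97; }

/* Branch-free equivalents.  */
int foo_arith (bool x) { return 38 - (int) x; }
int bar_arith (bool x) { return (int) x + 97; }

/* Hypothetical generalization: when the two constants are adjacent,
   compute the selected value as if_false + x or if_false - x.
   Returns 1 and stores the value in *out on success; returns 0 and
   leaves *out untouched when the constants are not adjacent.  */
int two_value (bool x, int if_true, int if_false, int *out)
{
  /* Widen before subtracting so the difference cannot overflow.  */
  long long diff = (long long) if_true - (long long) if_false;

  if (diff == 1)
    *out = if_false + (int) x;
  else if (diff == -1)
    *out = if_false - (int) x;
  else
    return 0;
  return 1;
}
```

For example, two_value (true, 37, 38, &r) stores 37 in r, while
two_value (true, 5, 9, &r) returns 0 because 5 and 9 are not adjacent.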

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-12-05  Jakub Jelinek  

PR tree-optimization/96232
* tree-ssa-phiopt.c (two_value_replacement): Add early_p argument.
If false, optimize even boolean lhs cases as long as arg0 has wider
precision and conditional_replacement doesn't handle that case.

* gcc.dg/tree-ssa/pr96232-2.c: New test.

--- gcc/tree-ssa-phiopt.c.jj   2020-12-04 17:27:53.472837921 +0100
+++ gcc/tree-ssa-phiopt.c   2020-12-04 17:54:56.070689584 +0100
@@ -51,7 +51,7 @@ along with GCC; see the file COPYING3.
 
 static unsigned int tree_ssa_phiopt_worker (bool, bool, bool);
 static bool two_value_replacement (basic_block, basic_block, edge, gphi *,
-  tree, tree);
+  tree, tree, bool);
 static bool conditional_replacement (basic_block, basic_block,
 edge, edge, gphi *, tree, tree);
 static gphi *factor_out_conditional_conversion (edge, edge, gphi *, tree, tree,
@@ -337,7 +337,7 @@ tree_ssa_phiopt_worker (bool do_store_el
}
 
  /* Do the replacement of conditional if it can be done.  */
- if (two_value_replacement (bb, bb1, e2, phi, arg0, arg1))
+ if (two_value_replacement (bb, bb1, e2, phi, arg0, arg1, early_p))
cfgchanged = true;
  else if (!early_p
   && conditional_replacement (bb, bb1, e1, e2, phi,
@@ -614,7 +614,7 @@ factor_out_conditional_conversion (edge
 
 static bool
 two_value_replacement (basic_block cond_bb, basic_block middle_bb,
-  edge e1, gphi *phi, tree arg0, tree arg1)
+  edge e1, gphi *phi, tree arg0, tree arg1, bool early_p)
 {
   /* Only look for adjacent integer constants.  */
   if (!INTEGRAL_TYPE_P (TREE_TYPE (arg0))
@@ -635,7 +635,7 @@ two_value_replacement (basic_block cond_
 
   if (TREE_CODE (lhs) != SSA_NAME
   || !INTEGRAL_TYPE_P (TREE_TYPE (lhs))
-  || TREE_CODE (TREE_TYPE (lhs)) == BOOLEAN_TYPE
+  || (early_p && TREE_CODE (TREE_TYPE (lhs)) == BOOLEAN_TYPE)
   || TREE_CODE (rhs) != INTEGER_CST)
 return false;
 
@@ -648,9 +648,25 @@ two_value_replacement (basic_block cond_
   return false;
 }
 
+  /* Defer boolean x ? 0 : {1,-1} or x ? {1,-1} : 0 to
+ conditional_replacement.  */
+  if (TREE_CODE (TREE_TYPE (lhs)) == BOOLEAN_TYPE
+  && (integer_zerop (arg0)
+ || integer_zerop (arg1)
+ || TREE_CODE (TREE_TYPE (arg0)) == BOOLEAN_TYPE
+ || (TYPE_PRECISION (TREE_TYPE (arg0))
+ <= TYPE_PRECISION (TREE_TYPE (lhs)))))
+return false;
+
   wide_int min, max;
-  if (get_range_info (lhs, &min, &max) != VR_RANGE
-  || min + 1 != max
+  if (TREE_CODE (TREE_TYPE (lhs)) == BOOLEAN_TYPE)
+{
+  min = wi::to_wide (boolean_false_node);
+  max = wi::to_wide (boolean_true_node);
+}
+  else if (get_range_info (lhs, &min, &max) != VR_RANGE)
+return false;
+  if (min + 1 != max
   || (wi::to_wide (rhs) != min
  && wi::to_wide (rhs) != max))
 return false;
--- gcc/testsuite/gcc.dg/tree-ssa/pr96232-2.c.jj   2020-12-04 17:56:22.180730234 +0100
+++ gcc/testsuite/gcc.dg/tree-ssa/pr96232-2.c   2020-12-04 17:57:27.973997239 +0100
@@ -0,0 +1,18 @@
+/* PR tree-optimization/96232 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump " 38 - " "optimized" } } */
+/* { dg-final { scan-tree-dump " \\+ 97;" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "PHI <" "optimized" } } */
+
+int
+foo (_Bool x)
+{
+  return x ? 37 : 38;
+}
+
+int
+bar (_Bool x)
+{
+  return x ? 98 : 97;
+}

Jakub



Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU

2020-12-05 Thread Jan Hubicka
> [AMD Public Use]
> 
> 
> As per https://gcc.gnu.org/codingconventions.html#ChangeLogs
> 
> --Snip--
> ChangeLogs
> ChangeLog entries are part of git commit messages and are automatically put 
> into a corresponding ChangeLog file.
> --Snip--
> 
> This means Changelog files will be updated automatically?  I did not do 
> anything to Change log files while pushing. 
> The Change log contents are part of my commit message. 

Yes, ChangeLog files are now autogenerated and commit messages get some
sanity checking at git push time, so if it got accepted it is probably
all fine :)

Honza

Re: introduce overridable clear_cache emitter

2020-12-05 Thread Andreas Schwab
../../../../libffi/src/aarch64/ffi.c: In function 'ffi_prep_closure_loc':
../../../../libffi/src/aarch64/ffi.c:67:3: internal compiler error: in emit_library_call_value_1, at calls.c:5300
   67 |   __builtin___clear_cache (start, end);
  |   ^~~~

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [PATCH] phiopt: Improve conditional_replacement for x ? 0 : -1 [PR796232]

2020-12-05 Thread Richard Biener
On December 5, 2020 10:10:25 AM GMT+01:00, Jakub Jelinek  
wrote:
>Hi!
>
>As mentioned in the PR, for boolean x we currently optimize
>in phiopt x ? 0 : -1 into -(int)!x but it can be optimized as
>(int) x - 1 which is one less operation both in GIMPLE and in x86
>assembly.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
>And/or, shall we have a match.pd optimization to turn that -(type)!x
>for BOOLEAN_TYPE (or other 1 bit unsigned precision values) into
>(type) x - 1?

I think that would make sense. Does that then cover the phiopt case directly? 

>2020-12-05  Jakub Jelinek  
>
>   PR tree-optimization/96232
>   * tree-ssa-phiopt.c (conditional_replacement): Optimize
>   x ? 0 : -1 as (int) x - 1 rather than -(int)!x.
>
>   * gcc.dg/tree-ssa/pr96232-1.c: New test.
>
>--- gcc/tree-ssa-phiopt.c.jj   2020-11-04 11:58:58.670252748 +0100
>+++ gcc/tree-ssa-phiopt.c  2020-12-04 17:27:53.472837921 +0100
>@@ -827,10 +827,24 @@ conditional_replacement (basic_block con
> 
>   if (neg)
> {
>-  cond = fold_convert_loc (gimple_location (stmt),
>-   TREE_TYPE (result), cond);
>-  cond = fold_build1_loc (gimple_location (stmt),
>-  NEGATE_EXPR, TREE_TYPE (cond), cond);
>+  if (TREE_CODE (cond) == TRUTH_NOT_EXPR
>+&& INTEGRAL_TYPE_P (TREE_TYPE (nonzero_arg)))
>+  {
>+/* x ? 0 : -1 is better optimized as (int) x - 1 than
>+   -(int)!x.  */
>+cond = fold_convert_loc (gimple_location (stmt),
>+ TREE_TYPE (result),
>+ TREE_OPERAND (cond, 0));
>+cond = fold_build2_loc (gimple_location (stmt), PLUS_EXPR,
>+TREE_TYPE (result), cond, nonzero_arg);
>+  }
>+  else
>+  {
>+cond = fold_convert_loc (gimple_location (stmt),
>+ TREE_TYPE (result), cond);
>+cond = fold_build1_loc (gimple_location (stmt),
>+NEGATE_EXPR, TREE_TYPE (cond), cond);
>+  }
> }
>   else if (shift)
> {
>--- gcc/testsuite/gcc.dg/tree-ssa/pr96232-1.c.jj   2020-12-04
>17:32:40.607615276 +0100
>+++ gcc/testsuite/gcc.dg/tree-ssa/pr96232-1.c  2020-12-04
>17:33:09.914286354 +0100
>@@ -0,0 +1,11 @@
>+/* PR tree-optimization/96232 */
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -fdump-tree-optimized" } */
>+/* { dg-final { scan-tree-dump " \\+ -1;" "optimized" } } */
>+/* { dg-final { scan-tree-dump-not "~x_\[0-9]*\\\(D\\\)" "optimized" }
>} */
>+
>+int
>+foo (_Bool x)
>+{
>+  return x ? 0 : -1;
>+}
>
>   Jakub



Re: [PATCH] phiopt: Handle bool in two_value_replacement [PR796232]

2020-12-05 Thread Richard Biener
On December 5, 2020 10:14:49 AM GMT+01:00, Jakub Jelinek  
wrote:
>Hi!
>
>The following patch improves code generation on the included testcase
>by
>enabling two_value_replacement on booleans.  It does that only for
>arg0/arg1
>values that conditional_replacement doesn't handle, and only does it if
>not
>in the early phiopt pass, because conditional_replacement isn't done
>early
>either.
>I must say I'm not sure about that, in PR87105 / PR87608 you've added
>the
>early phiopt pass and specifically excluded conditional_replacement and
>a
>few others from being optimized early, but then 3 months later I've
>added
>two_value_replacement and didn't restrict it to !early_p.  Shall we
>instead
>do two_value_replacement only in late phiopt?

Yeah, I guess that would make sense. I don't remember exactly, but I saw
regressions when the early pass did if-conversion beyond min/max/abs detection.

>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok. 

Richard. 

>2020-12-05  Jakub Jelinek  
>
>   PR tree-optimization/96232
>   * tree-ssa-phiopt.c (two_value_replacement): Add early_p argument.
>   If false, optimize even boolean lhs cases as long as arg0 has wider
>   precision and conditional_replacement doesn't handle that case.
>
>   * gcc.dg/tree-ssa/pr96232-2.c: New test.
>
>--- gcc/tree-ssa-phiopt.c.jj   2020-12-04 17:27:53.472837921 +0100
>+++ gcc/tree-ssa-phiopt.c  2020-12-04 17:54:56.070689584 +0100
>@@ -51,7 +51,7 @@ along with GCC; see the file COPYING3.
> 
> static unsigned int tree_ssa_phiopt_worker (bool, bool, bool);
>static bool two_value_replacement (basic_block, basic_block, edge, gphi
>*,
>- tree, tree);
>+ tree, tree, bool);
> static bool conditional_replacement (basic_block, basic_block,
>edge, edge, gphi *, tree, tree);
>static gphi *factor_out_conditional_conversion (edge, edge, gphi *,
>tree, tree,
>@@ -337,7 +337,7 @@ tree_ssa_phiopt_worker (bool do_store_el
>   }
> 
> /* Do the replacement of conditional if it can be done.  */
>-if (two_value_replacement (bb, bb1, e2, phi, arg0, arg1))
>+if (two_value_replacement (bb, bb1, e2, phi, arg0, arg1, early_p))
>   cfgchanged = true;
> else if (!early_p
>  && conditional_replacement (bb, bb1, e1, e2, phi,
>@@ -614,7 +614,7 @@ factor_out_conditional_conversion (edge
> 
> static bool
> two_value_replacement (basic_block cond_bb, basic_block middle_bb,
>- edge e1, gphi *phi, tree arg0, tree arg1)
>+ edge e1, gphi *phi, tree arg0, tree arg1, bool early_p)
> {
>   /* Only look for adjacent integer constants.  */
>   if (!INTEGRAL_TYPE_P (TREE_TYPE (arg0))
>@@ -635,7 +635,7 @@ two_value_replacement (basic_block cond_
> 
>   if (TREE_CODE (lhs) != SSA_NAME
>   || !INTEGRAL_TYPE_P (TREE_TYPE (lhs))
>-  || TREE_CODE (TREE_TYPE (lhs)) == BOOLEAN_TYPE
>+  || (early_p && TREE_CODE (TREE_TYPE (lhs)) == BOOLEAN_TYPE)
>   || TREE_CODE (rhs) != INTEGER_CST)
> return false;
> 
>@@ -648,9 +648,25 @@ two_value_replacement (basic_block cond_
>   return false;
> }
> 
>+  /* Defer boolean x ? 0 : {1,-1} or x ? {1,-1} : 0 to
>+ conditional_replacement.  */
>+  if (TREE_CODE (TREE_TYPE (lhs)) == BOOLEAN_TYPE
>+  && (integer_zerop (arg0)
>+|| integer_zerop (arg1)
>+|| TREE_CODE (TREE_TYPE (arg0)) == BOOLEAN_TYPE
>+|| (TYPE_PRECISION (TREE_TYPE (arg0))
>+<= TYPE_PRECISION (TREE_TYPE (lhs)))))
>+return false;
>+
>   wide_int min, max;
>-  if (get_range_info (lhs, &min, &max) != VR_RANGE
>-  || min + 1 != max
>+  if (TREE_CODE (TREE_TYPE (lhs)) == BOOLEAN_TYPE)
>+{
>+  min = wi::to_wide (boolean_false_node);
>+  max = wi::to_wide (boolean_true_node);
>+}
>+  else if (get_range_info (lhs, &min, &max) != VR_RANGE)
>+return false;
>+  if (min + 1 != max
>   || (wi::to_wide (rhs) != min
> && wi::to_wide (rhs) != max))
> return false;
>--- gcc/testsuite/gcc.dg/tree-ssa/pr96232-2.c.jj   2020-12-04
>17:56:22.180730234 +0100
>+++ gcc/testsuite/gcc.dg/tree-ssa/pr96232-2.c  2020-12-04
>17:57:27.973997239 +0100
>@@ -0,0 +1,18 @@
>+/* PR tree-optimization/96232 */
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -fdump-tree-optimized" } */
>+/* { dg-final { scan-tree-dump " 38 - " "optimized" } } */
>+/* { dg-final { scan-tree-dump " \\+ 97;" "optimized" } } */
>+/* { dg-final { scan-tree-dump-not "PHI <" "optimized" } } */
>+
>+int
>+foo (_Bool x)
>+{
>+  return x ? 37 : 38;
>+}
>+
>+int
>+bar (_Bool x)
>+{
>+  return x ? 98 : 97;
>+}
>
>   Jakub



[PATCH] match.pd: Improve conditional_replacement for x ? 0 : -1 [PR796232]

2020-12-05 Thread Jakub Jelinek via Gcc-patches
On Sat, Dec 05, 2020 at 11:20:11AM +0100, Richard Biener wrote:
> >As mentioned in the PR, for boolean x we currently optimize
> >in phiopt x ? 0 : -1 into -(int)!x but it can be optimized as
> >(int) x - 1 which is one less operation both in GIMPLE and in x86
> >assembly.
> >
> >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> >And/or, shall we have a match.pd optimization to turn that -(type)!x
> >for BOOLEAN_TYPE (or other 1 bit unsigned precision values) into
> >(type) x - 1?
> 
> I think that would make sense. Does that then cover the phiopt case directly? 

That would be the following then.  Seems it works for that case.
Ok for trunk if it passes bootstrap/regtest?
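
As a side note on why the pattern below checks ssa_name_has_boolean_range:
the identity -(type)!A == (type)A - 1 only holds when A is 0 or 1.  A
minimal C sketch of that, with made-up function names and plain int
standing in for the operand type (an assumption for the example):

```c
#include <assert.h>

/* Left-hand side of the transformation: negate the logical inverse.  */
int negate_not (int a) { return -(int) !a; }

/* Right-hand side: the operand minus one.  */
int minus_one (int a) { return a - 1; }

/* Returns 1 when both sides agree for this operand value, else 0.  */
int forms_agree (int a) { return negate_not (a) == minus_one (a); }
```

For a in {0, 1} the two sides agree (-1 and 0 respectively).  For a == 2
they do not: the left side is 0 and the right side is 1.  So without a
guarantee on the operand's range the rewrite would change the result.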

2020-12-05  Jakub Jelinek  

PR tree-optimization/96232
* match.pd (-(type)!A -> (type)A - 1): New optimization.

* gcc.dg/tree-ssa/pr96232-1.c: New test.

--- gcc/match.pd.jj 2020-12-02 11:20:24.765486816 +0100
+++ gcc/match.pd   2020-12-05 11:46:00.554518927 +0100
@@ -3812,6 +3812,16 @@ (define_operator_list COND_TERNARY
   (cnd (logical_inverted_value truth_valued_p@0) @1 @2)
   (cnd @0 @2 @1)))
 
+/* -(type)!A -> (type)A - 1.  */
+(simplify
+ (negate (convert?:s (logical_inverted_value:s @0)))
+ (if (INTEGRAL_TYPE_P (type)
+  && TREE_CODE (type) != BOOLEAN_TYPE
+  && TYPE_PRECISION (type) > 1
+  && TREE_CODE (@0) == SSA_NAME
+  && ssa_name_has_boolean_range (@0))
+  (plus (convert:type @0) { build_all_ones_cst (type); })))
+
 /* A + (B vcmp C ? 1 : 0) -> A - (B vcmp C ? -1 : 0), since vector comparisons
return all -1 or all 0 results.  */
 /* ??? We could instead convert all instances of the vec_cond to negate,
--- gcc/testsuite/gcc.dg/tree-ssa/pr96232-1.c.jj   2020-12-05 11:37:27.804332875 +0100
+++ gcc/testsuite/gcc.dg/tree-ssa/pr96232-1.c   2020-12-05 11:37:27.804332875 +0100
@@ -0,0 +1,11 @@
+/* PR tree-optimization/96232 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump " \\+ -1;" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "~x_\[0-9]*\\\(D\\\)" "optimized" } } */
+
+int
+foo (_Bool x)
+{
+  return x ? 0 : -1;
+}


Jakub



[PATCH] phiopt, v2: Handle bool in two_value_replacement [PR796232]

2020-12-05 Thread Jakub Jelinek via Gcc-patches
On Sat, Dec 05, 2020 at 11:22:24AM +0100, Richard Biener wrote:
> >two_value_replacement and didn't restrict it to !early_p.  Shall we
> >instead
> >do two_value_replacement only in late phiopt?
> 
> Yeah, I guess that would make sense.  I don't remember exactly but I saw
> regressions with early doing if conversion besides min/max/abs detection.

So that would then be the following patch instead.
Ok if it passes bootstrap/regtest?

2020-12-05  Jakub Jelinek  

PR tree-optimization/96232
* tree-ssa-phiopt.c (two_value_replacement): Optimize even boolean lhs
cases as long as arg0 has wider precision and conditional_replacement
doesn't handle that case.
(tree_ssa_phiopt_worker): Don't call two_value_replacement during
early phiopt.

* gcc.dg/tree-ssa/pr96232-2.c: New test.

--- gcc/tree-ssa-phiopt.c.jj   2020-12-05 11:37:39.216203475 +0100
+++ gcc/tree-ssa-phiopt.c   2020-12-05 11:51:57.497472991 +0100
@@ -337,7 +337,7 @@ tree_ssa_phiopt_worker (bool do_store_el
}
 
  /* Do the replacement of conditional if it can be done.  */
- if (two_value_replacement (bb, bb1, e2, phi, arg0, arg1))
+ if (!early_p && two_value_replacement (bb, bb1, e2, phi, arg0, arg1))
cfgchanged = true;
  else if (!early_p
   && conditional_replacement (bb, bb1, e1, e2, phi,
@@ -635,7 +635,6 @@ two_value_replacement (basic_block cond_
 
   if (TREE_CODE (lhs) != SSA_NAME
   || !INTEGRAL_TYPE_P (TREE_TYPE (lhs))
-  || TREE_CODE (TREE_TYPE (lhs)) == BOOLEAN_TYPE
   || TREE_CODE (rhs) != INTEGER_CST)
 return false;
 
@@ -648,9 +647,25 @@ two_value_replacement (basic_block cond_
   return false;
 }
 
+  /* Defer boolean x ? 0 : {1,-1} or x ? {1,-1} : 0 to
+ conditional_replacement.  */
+  if (TREE_CODE (TREE_TYPE (lhs)) == BOOLEAN_TYPE
+  && (integer_zerop (arg0)
+ || integer_zerop (arg1)
+ || TREE_CODE (TREE_TYPE (arg0)) == BOOLEAN_TYPE
+ || (TYPE_PRECISION (TREE_TYPE (arg0))
+ <= TYPE_PRECISION (TREE_TYPE (lhs)))))
+return false;
+
   wide_int min, max;
-  if (get_range_info (lhs, &min, &max) != VR_RANGE
-  || min + 1 != max
+  if (TREE_CODE (TREE_TYPE (lhs)) == BOOLEAN_TYPE)
+{
+  min = wi::to_wide (boolean_false_node);
+  max = wi::to_wide (boolean_true_node);
+}
+  else if (get_range_info (lhs, &min, &max) != VR_RANGE)
+return false;
+  if (min + 1 != max
   || (wi::to_wide (rhs) != min
  && wi::to_wide (rhs) != max))
 return false;
--- gcc/testsuite/gcc.dg/tree-ssa/pr96232-2.c.jj   2020-12-05 11:51:01.891103252 +0100
+++ gcc/testsuite/gcc.dg/tree-ssa/pr96232-2.c   2020-12-05 11:51:01.891103252 +0100
@@ -0,0 +1,18 @@
+/* PR tree-optimization/96232 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump " 38 - " "optimized" } } */
+/* { dg-final { scan-tree-dump " \\+ 97;" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "PHI <" "optimized" } } */
+
+int
+foo (_Bool x)
+{
+  return x ? 37 : 38;
+}
+
+int
+bar (_Bool x)
+{
+  return x ? 98 : 97;
+}


Jakub



Re: testsuite: Adjust target requirements for sad-vectorize and signbit

2020-12-05 Thread Alan Modra via Gcc-patches
On Thu, Oct 29, 2020 at 10:10:58PM +1030, Alan Modra wrote:
> Fixes
> FAIL: gcc.target/powerpc/signbit-1.c scan-assembler-not stxvd2x
> FAIL: gcc.target/powerpc/signbit-1.c scan-assembler-times mfvsrd 3
> FAIL: gcc.target/powerpc/signbit-1.c scan-assembler-times srdi 3
> FAIL: gcc.target/powerpc/signbit-2.c scan-assembler-times ld 1
> FAIL: gcc.target/powerpc/signbit-2.c scan-assembler-times srdi 1
> on powerpc-linux (or powerpc64-linux biarch -m32).

David,
This patch is fixing a regression caused by one of your testsuite
patches.  Please review.

https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557443.html

> signbit-1.c is quite obviously a 64-bit only testcase given the
> scan-assembler directives and the purpose of the testcase, which is to
> verify the 64-bit only UNSPEC_SIGNBIT patterns.  It could be made to pass
> for -m32 by adding -mpowerpc64, but that option isn't very effective
> when bi-arch testing and results in errors on rs6000-aix.  And it is
> pointless to match -m32 stores to the stack followed by loads, which
> is what we do at the moment.
> 
> signbit-2.c on the other hand has more reasonable 32-bit output.
> 
> Regression tested powerpc64-linux biarch.
> 
>   * gcc.target/powerpc/signbit-1.c: Reinstate lp64 condition.
>   * gcc.target/powerpc/signbit-2.c: Match 32-bit output too.
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/signbit-1.c 
> b/gcc/testsuite/gcc.target/powerpc/signbit-1.c
> index eb4f53e397d..1642bf46d7a 100644
> --- a/gcc/testsuite/gcc.target/powerpc/signbit-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/signbit-1.c
> @@ -1,4 +1,5 @@
>  /* { dg-do compile } */
> +/* { dg-require-effective-target lp64 } */
>  /* { dg-require-effective-target ppc_float128_sw } */
>  /* { dg-require-effective-target powerpc_p8vector_ok } */
>  /* { dg-options "-mdejagnu-cpu=power8 -O2 -mfloat128" } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/signbit-2.c 
> b/gcc/testsuite/gcc.target/powerpc/signbit-2.c
> index ff6af963dda..1b792916eba 100644
> --- a/gcc/testsuite/gcc.target/powerpc/signbit-2.c
> +++ b/gcc/testsuite/gcc.target/powerpc/signbit-2.c
> @@ -13,5 +13,7 @@ int do_signbit_kf (__float128 *a) { return 
> __builtin_signbit (*a); }
>  /* { dg-final { scan-assembler-not   "lxvw4x"   } } */
>  /* { dg-final { scan-assembler-not   "lxsd" } } */
>  /* { dg-final { scan-assembler-not   "lxsdx"} } */
> -/* { dg-final { scan-assembler-times "ld" 1 } } */
> -/* { dg-final { scan-assembler-times "srdi"   1 } } */
> +/* { dg-final { scan-assembler-times "ld" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "srdi"   1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "lwz"1 { target ilp32 } } } */
> +/* { dg-final { scan-assembler-times "rlwinm" 1 { target ilp32 } } } */

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] Avoid atomic for guard acquire when that is expensive

2020-12-05 Thread Bernd Edlinger
On 12/2/20 7:57 PM, Jason Merrill wrote:
> On 12/1/20 1:28 PM, Bernd Edlinger wrote:
>> On 11/24/20 11:10 PM, Jason Merrill wrote:
>>> On 11/22/20 3:05 AM, Bernd Edlinger wrote:
>>>> Hi,
>>>>
>>>> this avoids the need to use -fno-threadsafe-statics on
>>>> arm-none-eabi or working around that problem by supplying
>>>> a dummy __sync_synchronize function which might
>>>> just lead to silent code failure of the worst kind
>>>> (non-reproducible, racy) at runtime, as was pointed out
>>>> on previous discussions here.
>>>>
>>>> When the atomic access involves a call to __sync_synchronize
>>>> it is better to call __cxa_guard_acquire unconditionally,
>>>> since it handles the atomics too, or is a non-threaded
>>>> implementation when there is no gthread support for this target.
>>>>
>>>> This fixes also a bug for the ARM EABI big-endian target,
>>>> that is, previously the wrong bit was checked.
>>>
>>> Instead of a new target macro, can't you follow 
>>> fold_builtin_atomic_always_lock_free/can_atomic_load_p?
>>
>> Yes, thanks, that should work too.
>> Would you like this better?
> 
>> +is_atomic_expensive_p (machine_mode mode)
>> +{
>> +  if (!flag_inline_atomics)
>> +    return false;
> 
> Why not true?
> 

Ooops...
Yes, I ought to return true here.
I must have made a mistake when I tested the last version of this patch,
sorry for the confusion.

>> +  if (!can_compare_and_swap_p (mode, false) || !can_atomic_load_p (mode))
>> +    return false;
> 
> This also seems backwards; I'd think we want to return false if either of 
> those tests are true.  Or maybe just can_atomic_load_p, and not bother about 
> compare-and-swap.
> 


Yes, you are right.
Unfortunately can_atomic_load_p is too weak, since it does not cover
the memory barrier.

And can_compare_and_swap_p (..., false) is actually a bit too strong,
but if it returns true, we should be able to use any atomic without
need for a library call.

>> +  tree type = targetm.cxx.guard_mask_bit ()
>> +  ? TREE_TYPE (guard) : char_type_node;
>> +
>> +  if (is_atomic_expensive_p (TYPE_MODE (type)))
>> +    guard = integer_zero_node;
>> +  else
>> +    guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type);
> 
> It should still work to load a single byte, it just needs to be the 
> least-significant byte.  And this isn't an EABI issue; it looks like the 
> non-EABI code is also broken for big-endian targets, both the atomic load and 
> the normal load in get_guard_bits.
> 

I think the non-EABI code is always using bit 0 in the first byte,
by using the endian-neutral #define _GLIBCXX_GUARD_BIT __guard_test_bit (0, 1).

Only ARM EABI uses bit 0 in byte 3 if big-endian and bit 0 in byte 0 otherwise.
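
To illustrate why a single-byte load of the first byte is not endian-neutral
for a word-sized guard, here is a minimal sketch.  The helper names
byte_holding_bit0 and first_byte_sees_bit0 are made up for this example and
are not part of the patch; the guard is modelled as a plain 32-bit word.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the index (0..3) of the byte that holds bit 0 of a 32-bit
   guard word on the machine running this code.  Hypothetical helper,
   for illustration only.  */
static int
byte_holding_bit0 (void)
{
  uint32_t guard = 1;		/* Only bit 0 is set.  */
  unsigned char bytes[sizeof guard];

  memcpy (bytes, &guard, sizeof guard);
  for (int i = 0; i < (int) sizeof guard; i++)
    if (bytes[i] != 0)
      return i;
  return -1;			/* Not reached for a nonzero guard.  */
}

/* A byte load from offset 0 observes bit 0 of the word only when
   byte 0 is the least-significant byte.  */
static int
first_byte_sees_bit0 (void)
{
  return byte_holding_bit0 () == 0;
}
```

On a little-endian host byte_holding_bit0 () yields 0, on a big-endian host
it yields 3, which matches the byte 0 versus byte 3 distinction described
above.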

For all other targets when _GLIBCXX_USE_FUTEX is defined,
__cxa_guard_XXX accesses the value as int* while the memory
is a 64-bit long, so I could imagine that is an aliasing violation.


But nothing that needs to be fixed immediately.


Attached is the corrected patch.

Tested again on arm-none-eabi with arm-sim.
Is it OK for trunk?

Thanks
Bernd.
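
For readers following along, the control flow the patch aims for can be
sketched in plain C11 as below.  This is only a model under stated
assumptions: demo_guard, demo_guard_acquire, demo_guard_release and
demo_need_init are hypothetical stand-ins, not the real __cxa_guard_*
functions or the code GCC emits, and the atomic_is_expensive flag plays the
role of is_atomic_expensive_p.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a guard variable; slow_calls counts how often
   the out-of-line acquire function was entered.  */
typedef struct
{
  atomic_uchar done;
  int slow_calls;
} demo_guard;

/* Stand-in for __cxa_guard_acquire: returns true when the caller
   still has to run the initializer.  */
static bool
demo_guard_acquire (demo_guard *g)
{
  g->slow_calls++;
  return atomic_load_explicit (&g->done, memory_order_acquire) == 0;
}

/* Stand-in for __cxa_guard_release: mark initialization as done.  */
static void
demo_guard_release (demo_guard *g)
{
  atomic_store_explicit (&g->done, 1, memory_order_release);
}

/* When the inline atomic is cheap, test the guard inline first and
   only call the acquire function if initialization is still pending.
   When it is expensive, skip the inline test and always call the
   acquire function, which handles the atomics itself.  */
static bool
demo_need_init (demo_guard *g, bool atomic_is_expensive)
{
  if (!atomic_is_expensive
      && atomic_load_explicit (&g->done, memory_order_acquire) != 0)
    return false;
  return demo_guard_acquire (g);
}
```

With atomic_is_expensive false, the second call after a release never
reaches demo_guard_acquire; with it true, every call does.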

> Jason
> 
From c1a6dcaa906113cba0dc88e36460a65aba35ec38 Mon Sep 17 00:00:00 2001
From: Bernd Edlinger 
Date: Tue, 1 Dec 2020 18:54:48 +0100
Subject: [PATCH] Avoid atomic for guard acquire when that is expensive

When the atomic access involves a call to __sync_synchronize
it is better to call __cxa_guard_acquire unconditionally,
since it handles the atomics too, or is a non-threaded
implementation when there is no gthread support for this target.

This fixes also a bug for the ARM EABI big-endian target,
that is, previously the wrong bit was checked.

2020-11-22  Bernd Edlinger  

	* decl2.c (is_atomic_expensive_p): New helper function.
	(build_atomic_load_byte): Rename to...
	(build_atomic_load_type): ... and add new parameter type.
	(get_guard_cond): Skip the atomic here if that is expensive.
	Use the correct type for the atomic load on certain targets.
---
 gcc/cp/decl2.c | 33 +
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index 1bc7b7e..c7ddcc9 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -48,6 +48,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "intl.h"
 #include "c-family/c-ada-spec.h"
 #include "asan.h"
+#include "optabs-query.h"
 
 /* Id for dumping the raw trees.  */
 int raw_dump_id;
@@ -3297,18 +3298,34 @@ get_guard (tree decl)
   return guard;
 }
 
+/* Returns true if accessing the GUARD atomic is expensive,
+   i.e. involves a call to __sync_synchronize or similar.
+   In this case let __cxa_guard_acquire handle the atomics.  */
+
+static bool
+is_atomic_expensive_p (machine_mode mode)
+{
+  if (!flag_inline_atomics)
+return true;
+
+  if (!can_compare_and_swap_p (mode, false) || !can_atomic_load_p (mode))
+return true;
+
+  return false;
+}
+
 /* Return an atomic load of src with the appropriate memory model.  */
 
 static tree
-build_atomic_load_byte (tree src, HOST_WIDE

Re: [PATCH] match.pd: Improve conditional_replacement for x ? 0 : -1 [PR96232]

2020-12-05 Thread Richard Biener
On December 5, 2020 11:57:46 AM GMT+01:00, Jakub Jelinek  
wrote:
>On Sat, Dec 05, 2020 at 11:20:11AM +0100, Richard Biener wrote:
>> >As mentioned in the PR, for boolean x we currently optimize
>> >in phiopt x ? 0 : -1 into -(int)!x but it can be optimized as
>> >(int) x - 1 which is one less operation both in GIMPLE and in x86
>> >assembly.
>> >
>> >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>> >
>> >And/or, shall we have a match.pd optimization to turn that -(type)!x
>> >for BOOLEAN_TYPE (or other 1 bit unsigned precision values) into
>> >(type) x - 1.
>> 
>> I think that would make sense. Does that then cover the phiopt case
>directly? 
>
>That would be the following then.  Seems it works for that case.
>Ok for trunk if it passes bootstrap/regtest?

Ok. 

Richard. 

>2020-12-05  Jakub Jelinek  
>
>   PR tree-optimization/96232
>   * match.pd (-(type)!A -> (type)A - 1): New optimization.
>
>   * gcc.dg/tree-ssa/pr96232-1.c: New test.
>
>--- gcc/match.pd.jj2020-12-02 11:20:24.765486816 +0100
>+++ gcc/match.pd   2020-12-05 11:46:00.554518927 +0100
>@@ -3812,6 +3812,16 @@ (define_operator_list COND_TERNARY
>   (cnd (logical_inverted_value truth_valued_p@0) @1 @2)
>   (cnd @0 @2 @1)))
> 
>+/* -(type)!A -> (type)A - 1.  */
>+(simplify
>+ (negate (convert?:s (logical_inverted_value:s @0)))
>+ (if (INTEGRAL_TYPE_P (type)
>+  && TREE_CODE (type) != BOOLEAN_TYPE
>+  && TYPE_PRECISION (type) > 1
>+  && TREE_CODE (@0) == SSA_NAME
>+  && ssa_name_has_boolean_range (@0))
>+  (plus (convert:type @0) { build_all_ones_cst (type); })))
>+
>/* A + (B vcmp C ? 1 : 0) -> A - (B vcmp C ? -1 : 0), since vector
>comparisons
>return all -1 or all 0 results.  */
>/* ??? We could instead convert all instances of the vec_cond to
>negate,
>--- gcc/testsuite/gcc.dg/tree-ssa/pr96232-1.c.jj   2020-12-05
>11:37:27.804332875 +0100
>+++ gcc/testsuite/gcc.dg/tree-ssa/pr96232-1.c  2020-12-05
>11:37:27.804332875 +0100
>@@ -0,0 +1,11 @@
>+/* PR tree-optimization/96232 */
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -fdump-tree-optimized" } */
>+/* { dg-final { scan-tree-dump " \\+ -1;" "optimized" } } */
>+/* { dg-final { scan-tree-dump-not "~x_\[0-9]*\\\(D\\\)" "optimized" }
>} */
>+
>+int
>+foo (_Bool x)
>+{
>+  return x ? 0 : -1;
>+}
>
>
>   Jakub
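
The identity behind the new match.pd pattern can be checked at the source
level.  A small sketch, with function names invented for this example (they
do not appear in the patch):

```c
#include <assert.h>
#include <stdbool.h>

/* The source form from the test case.  */
static int
cond_form (bool x)
{
  return x ? 0 : -1;
}

/* What phiopt produced before the patch: -(int)!x.  */
static int
negate_not_form (bool x)
{
  return -(int) !x;
}

/* What the new pattern produces: (int) x - 1, one operation fewer.  */
static int
minus_one_form (bool x)
{
  return (int) x - 1;
}
```

For x == false all three yield -1, and for x == true all three yield 0, so
the rewrite is value-preserving for a boolean operand.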



Re: [PATCH] phiopt, v2: Handle bool in two_value_replacement [PR96232]

2020-12-05 Thread Richard Biener
On December 5, 2020 11:59:22 AM GMT+01:00, Jakub Jelinek  
wrote:
>On Sat, Dec 05, 2020 at 11:22:24AM +0100, Richard Biener wrote:
>> >two_value_replacement and didn't restrict it to !early_p.  Shall we
>> >instead
>> >do two_value_replacement only in late phiopt?
>> 
>> Yeah, I guess that would make sense.  I don't remember exactly but I
>saw
>> regressions with early doing if conversion besides min/max/abs
>detection.
>
>So that would be then following patch instead.
>Ok if it passes bootstrap/regtest?

Ok. 

Richard. 

>2020-12-05  Jakub Jelinek  
>
>   PR tree-optimization/96232
>   * tree-ssa-phiopt.c (two_value_replacement): Optimize even boolean lhs
>   cases as long as arg0 has wider precision and conditional_replacement
>   doesn't handle that case.
>   (tree_ssa_phiopt_worker): Don't call two_value_replacement during
>   early phiopt.
>
>   * gcc.dg/tree-ssa/pr96232-2.c: New test.
>
>--- gcc/tree-ssa-phiopt.c.jj   2020-12-05 11:37:39.216203475 +0100
>+++ gcc/tree-ssa-phiopt.c  2020-12-05 11:51:57.497472991 +0100
>@@ -337,7 +337,7 @@ tree_ssa_phiopt_worker (bool do_store_el
>   }
> 
> /* Do the replacement of conditional if it can be done.  */
>-if (two_value_replacement (bb, bb1, e2, phi, arg0, arg1))
>+if (!early_p && two_value_replacement (bb, bb1, e2, phi, arg0,
>arg1))
>   cfgchanged = true;
> else if (!early_p
>  && conditional_replacement (bb, bb1, e1, e2, phi,
>@@ -635,7 +635,6 @@ two_value_replacement (basic_block cond_
> 
>   if (TREE_CODE (lhs) != SSA_NAME
>   || !INTEGRAL_TYPE_P (TREE_TYPE (lhs))
>-  || TREE_CODE (TREE_TYPE (lhs)) == BOOLEAN_TYPE
>   || TREE_CODE (rhs) != INTEGER_CST)
> return false;
> 
>@@ -648,9 +647,25 @@ two_value_replacement (basic_block cond_
>   return false;
> }
> 
>+  /* Defer boolean x ? 0 : {1,-1} or x ? {1,-1} : 0 to
>+ conditional_replacement.  */
>+  if (TREE_CODE (TREE_TYPE (lhs)) == BOOLEAN_TYPE
>+  && (integer_zerop (arg0)
>+|| integer_zerop (arg1)
>+|| TREE_CODE (TREE_TYPE (arg0)) == BOOLEAN_TYPE
>+|| (TYPE_PRECISION (TREE_TYPE (arg0))
>+<= TYPE_PRECISION (TREE_TYPE (lhs)
>+return false;
>+
>   wide_int min, max;
>-  if (get_range_info (lhs, &min, &max) != VR_RANGE
>-  || min + 1 != max
>+  if (TREE_CODE (TREE_TYPE (lhs)) == BOOLEAN_TYPE)
>+{
>+  min = wi::to_wide (boolean_false_node);
>+  max = wi::to_wide (boolean_true_node);
>+}
>+  else if (get_range_info (lhs, &min, &max) != VR_RANGE)
>+return false;
>+  if (min + 1 != max
>   || (wi::to_wide (rhs) != min
> && wi::to_wide (rhs) != max))
> return false;
>--- gcc/testsuite/gcc.dg/tree-ssa/pr96232-2.c.jj   2020-12-05
>11:51:01.891103252 +0100
>+++ gcc/testsuite/gcc.dg/tree-ssa/pr96232-2.c  2020-12-05
>11:51:01.891103252 +0100
>@@ -0,0 +1,18 @@
>+/* PR tree-optimization/96232 */
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -fdump-tree-optimized" } */
>+/* { dg-final { scan-tree-dump " 38 - " "optimized" } } */
>+/* { dg-final { scan-tree-dump " \\+ 97;" "optimized" } } */
>+/* { dg-final { scan-tree-dump-not "PHI <" "optimized" } } */
>+
>+int
>+foo (_Bool x)
>+{
>+  return x ? 37 : 38;
>+}
>+
>+int
>+bar (_Bool x)
>+{
>+  return x ? 98 : 97;
>+}
>
>
>   Jakub
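
As a rough source-level picture of what two_value_replacement is expected to
do for the new test, here is a sketch.  The *_arith functions mirror the
" 38 - " and " + 97;" forms the test scans for; two_value_ok is a simplified,
hypothetical restatement of the range check in the patch using plain long
values instead of wide_int.

```c
#include <assert.h>
#include <stdbool.h>

/* foo from the test case and its expected branch-free form.  */
static int
foo_branchy (bool x)
{
  return x ? 37 : 38;
}

static int
foo_arith (bool x)
{
  return 38 - (int) x;
}

/* bar from the test case and its expected branch-free form.  */
static int
bar_branchy (bool x)
{
  return x ? 98 : 97;
}

static int
bar_arith (bool x)
{
  return (int) x + 97;
}

/* Simplified applicability check: the tested value must have exactly
   two possible values, min and min + 1, and the compared constant
   must be one of them.  For a boolean, min is 0 and max is 1.  */
static bool
two_value_ok (long min, long max, long rhs)
{
  return min + 1 == max && (rhs == min || rhs == max);
}
```

The two result constants differ by exactly one in both functions, which is
why a single add or subtract of the boolean is enough.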



Re: testsuite: Adjust target requirements for sad-vectorize and signbit

2020-12-05 Thread David Edelsohn via Gcc-patches
On Sat, Dec 5, 2020 at 6:12 AM Alan Modra  wrote:
>
> On Thu, Oct 29, 2020 at 10:10:58PM +1030, Alan Modra wrote:
> > Fixes
> > FAIL: gcc.target/powerpc/signbit-1.c scan-assembler-not stxvd2x
> > FAIL: gcc.target/powerpc/signbit-1.c scan-assembler-times mfvsrd 3
> > FAIL: gcc.target/powerpc/signbit-1.c scan-assembler-times srdi 3
> > FAIL: gcc.target/powerpc/signbit-2.c scan-assembler-times ld 1
> > FAIL: gcc.target/powerpc/signbit-2.c scan-assembler-times srdi 1
> > on powerpc-linux (or powerpc64-linux biarch -m32).
>
> David,
> This patch is fixing a regression caused by one of your testsuite
> patches.  Please review.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557443.html
>
> > signbit-1.c is quite obviously a 64-bit only testcase given the
> > scan-assembler directives, and the purpose of the testcase to verify
> > the 64-bit only UNSPEC_SIGNBIT patterns.  It could be made to pass for
> > -m32 by adding -mpowerpc64, but that option isn't very effective
> > when bi-arch testing and results in errors on rs6000-aix.  And it is
> > pointless to match -m32 stores to the stack followed by loads, which
> > is what we do at the moment.
> >
> > signbit-2.c on the other hand has more reasonable 32-bit output.
> >
> > Regression tested powerpc64-linux biarch.
> >
> >   * gcc.target/powerpc/signbit-1.c: Reinstate lp64 condition.
> >   * gcc.target/powerpc/signbit-2.c: Match 32-bit output too.

Alan,

I agree that signbit-1.c clearly checks for 64-bit only instructions,
and signbit-2.c was checking for 64-bit only prior to this patch.

The PPC port has an explosion of 128 bit options (float128, ieee128,
long double, ieee128 in hardware, float128 in vsx, float128 in quad
float library), and less than ideal clarity about the differences.

As much as possible, I would like the testcase requirements set
correctly, once, and then not repeatedly adjust the testcases manually
for targets that we later discover should or should not succeed.  We
have too many ISA/ABI/OS variations and a lot of testcases.  We don't
have the resources to continually tweak them.

I would like to differentiate among "this works on PPC64LE Linux"
versus "this works on Power8" or "Power9" or "Power10" or "MMA" versus
"IEEE128" versus "!aix && !darwin".  Not substituting PPC64 Linux or
Power9 to mean IEEE128.  Where possible we should check for the
requirements that we are testing, not convenient stand-ins.

That being said, it's now clear that there is no target test in the
testsuite that captures float128 VSX functionality.  So testing
float128_sw and lp64 is the best alternative.  Or maybe the testcase
should test for the VSX level where those instructions were
introduced?

Thanks for investigating this and creating a patch.

The patch is okay.

Thanks, David


Re: [PATCH 31/31] PR target/95294: VAX: Add test cases for MODE_CC representation

2020-12-05 Thread Maciej W. Rozycki
On Fri, 20 Nov 2020, Jeff Law wrote:

> Sweet.  OK once the prereqs are all ACK'd.

 Thank you for your review.  I have applied all the changes now, as posted 
in their most recent versions, except for 01/31 where I have noticed an 
anomaly in the included gcc/testsuite/gcc.c-torture/compile/pr58901-?.c 
test cases where the typedefs have been mistakenly swapped between them 
(obviously I meant to use `signed int' for the test case using a bit-field 
and not the other one, as otherwise the `signed' qualifier is redundant).  
I have adjusted the typedefs then, and committed an updated version as 
obviously correct (I double-checked they both still fail in the absence of 
the associated code update).

 While doing that I realised that "bit-field" is the correct spelling of 
the term both with the ISO C standard and the VAX ISA documentation, and 
also our predominantly used version in preference to "bitfield".  I have 
updated all the commit messages accordingly then; there have been no new 
use cases in actual change bodies, so no change there.

 I'll yet get through the discussions and see if there is anything I would 
like to comment on that I missed.  That may not be today though.

  Maciej


Re: Updating the backend status for h8300 on the wiki

2020-12-05 Thread John Paul Adrian Glaubitz
Hi Jeff!

On 11/30/20 8:22 PM, Jeff Law wrote:
> I also took care of updating cris and mn103 status WRT cc0.  Attached is
> what was pushed to the gcc-wwwdocs trunk.

Now that VAX has been converted as well, the entry can be updated here as well.

Now there are only AVR and CR16 that need to be converted. Great progress!

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaub...@debian.org
`. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



[C PATCH] fix atomic loads [PR 97981]

2020-12-05 Thread Uecker, Martin

Hi Joseph,

the patch to drop qualifiers during lvalue conversion
was broken, because the code to emit atomic loads did
not trigger anymore.  I now added a test that scans for
"atomic_load". 

I should have taken the new warning for

_Atomic int y;
y; // warning statement with no effect 

as a tell-tale sign that something is wrong,
although I still think the warning would be
correct. Or does an atomic load have some special
semantics which imply that it can't be
removed?

(regression tests still running)


--Martin
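
For context, a small C11 sketch of the two ways an _Atomic object gets read.
The names counter, read_implicit, read_explicit and converted_type_is_int are
invented for this example and are not part of the patch or its test.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical atomic object used only in this sketch.  */
static _Atomic int counter = 41;

/* Reading the object by name goes through lvalue conversion; for an
   _Atomic object this is an implicit sequentially consistent load.  */
static int
read_implicit (void)
{
  return counter;
}

/* The same read spelled with the <stdatomic.h> generic function.  */
static int
read_explicit (void)
{
  return atomic_load (&counter);
}

/* After lvalue conversion the value has the plain, non-atomic type:
   the operand of unary plus is converted and promoted, so the
   expression has type int.  */
static int
converted_type_is_int (void)
{
  return _Generic (+counter, int: 1, default: 0);
}
```

Both read functions return the same value; the point of the fix is that the
implicit form must still be emitted as an atomic load even though the
resulting value loses its qualifiers.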




C: Fix atomic loads. [PR97981]

To handle atomic loads correctly, we need to move the code
that drops qualifiers in lvalue conversion after the code
that handles atomics.


2020-12-05  Martin Uecker  

gcc/c/
 PR c/97981
 * c-typeck.c (convert_lvalue_to_rvalue): Move the code
 that drops qualifiers to the end of the function.  

gcc/testsuite/
 PR c/97981
 * gcc.dg/pr97981.c: New test.
 * gcc.dg/pr60195.c: Adapt test.



diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index cdc491a25fd..138af073925 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -2080,9 +2080,6 @@ convert_lvalue_to_rvalue (location_t loc, struct c_expr exp,
 exp = default_function_array_conversion (loc, exp);
   if (!VOID_TYPE_P (TREE_TYPE (exp.value)))
 exp.value = require_complete_type (loc, exp.value);
-  if (convert_p && !error_operand_p (exp.value)
-  && (TREE_CODE (TREE_TYPE (exp.value)) != ARRAY_TYPE))
-    exp.value = convert (build_qualified_type (TREE_TYPE (exp.value), TYPE_UNQUALIFIED), exp.value);
   if (really_atomic_lvalue (exp.value))
 {
   vec *params;
@@ -2119,6 +2116,9 @@ convert_lvalue_to_rvalue (location_t loc, struct c_expr exp,
   exp.value = build4 (TARGET_EXPR, nonatomic_type, tmp, func_call,
      NULL_TREE, NULL_TREE);
 }
+  if (convert_p && !error_operand_p (exp.value)
+  && (TREE_CODE (TREE_TYPE (exp.value)) != ARRAY_TYPE))
+    exp.value = convert (build_qualified_type (TREE_TYPE (exp.value), TYPE_UNQUALIFIED), exp.value);
   return exp;
 }
 
diff --git a/gcc/testsuite/gcc.dg/pr60195.c b/gcc/testsuite/gcc.dg/pr60195.c
index 8eccf7f63ad..0a50a30be25 100644
--- a/gcc/testsuite/gcc.dg/pr60195.c
+++ b/gcc/testsuite/gcc.dg/pr60195.c
@@ -15,7 +15,7 @@ atomic_int
 fn2 (void)
 {
   atomic_int y = 0;
-  y;   /* { dg-warning "statement with no effect" } */
+  y;
   return y;
 }
 
diff --git a/gcc/testsuite/gcc.dg/pr97981.c b/gcc/testsuite/gcc.dg/pr97981.c
new file mode 100644
index 00000000000..846b8755c5b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr97981.c
@@ -0,0 +1,15 @@
+/* PR c/97981 */
+/* { dg-do compile } */
+/* { dg-options "-fdump-tree-original" } */
+/* { dg-final { scan-tree-dump-times "atomic_load" 2 "original" } } */
+
+
+void f(void)
+{
+   volatile _Atomic int x;
+   x;
+   volatile _Atomic double a;
+   double b;
+   b = a;
+}
+



Re: introduce overridable clear_cache emitter

2020-12-05 Thread Alexandre Oliva
On Dec  5, 2020, Andreas Schwab  wrote:

> ../../../../libffi/src/aarch64/ffi.c: In function 'ffi_prep_closure_loc':
> ../../../../libffi/src/aarch64/ffi.c:67:3: internal compiler error: in 
> emit_library_call_value_1, at calls.c:5300
>67 |   __builtin___clear_cache (start, end);
>   |   ^~~~

Is this still aarch64-linux-gnu -mabi=ilp32?  I'm afraid I couldn't
duplicate this error using a cross compiler (without binutils, but with
HAVE_AS_MABI_OPTION forced enabled), and many variants of a manually
minimized ffi.c (to build without libc):

static inline void
ffi_clear_cache (void *start, void *end)
{
  __builtin___clear_cache (start, end);
}

#define FFI_TRAMPOLINE_SIZE 24

typedef struct closure {
  char tramp[FFI_TRAMPOLINE_SIZE / sizeof (long)];
} ffi_closure;

void
ffi_prep_closure_loc (ffi_closure *closure)
{
  static const unsigned char trampoline[16] = {
0x90, 0x00, 0x00, 0x58, /* ldr  x16, tramp+16   */
0xf1, 0xff, 0xff, 0x10, /* adr  x17, tramp+0*/
0x00, 0x02, 0x1f, 0xd6  /* br   x16 */
  };
  char *tramp = closure->tramp;
  __builtin_memcpy(tramp, trampoline, sizeof (trampoline));
  ffi_clear_cache(tramp, tramp + FFI_TRAMPOLINE_SIZE);
}


Once you confirm command line and target, I'll look into cross-building
a full toolchain, or using a machine from the compile farm.

Thanks,

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar


Re: introduce overridable clear_cache emitter

2020-12-05 Thread Jakub Jelinek via Gcc-patches
On Sat, Dec 05, 2020 at 06:01:59PM -0300, Alexandre Oliva wrote:
> On Dec  5, 2020, Andreas Schwab  wrote:
> 
> > ../../../../libffi/src/aarch64/ffi.c: In function 'ffi_prep_closure_loc':
> > ../../../../libffi/src/aarch64/ffi.c:67:3: internal compiler error: in 
> > emit_library_call_value_1, at calls.c:5300
> >67 |   __builtin___clear_cache (start, end);
> >   |   ^~~~
> 
> Is this still aarch64-linux-gnu -mabi=ilp32?  I'm afraid I couldn't
> duplicate this error using a cross compiler (without binutils, but with
> HAVE_AS_MABI_OPTION forced enabled), and many variants of a manually
> minimized ffi.c (to build without libc):

See PR98147, I've put there an untested patch, but I have no way to test it.

Jakub



Re: [PATCH] RISC-V: Canonicalize --with-arch

2020-12-05 Thread Kito Cheng via Gcc-patches
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98152

Andreas Schwab has created a bug entry for this issue.
Using awk or shell should be fine to get the same functionality, but
it might take some time,
so I plan to add a check that detects python, python2 or python3,
and skips this step if none of them is found.
Canonicalizing the arch string is not a must; it is an improvement
that tries to reduce the number of multilibs built, so I think skipping
it should not be a problem if python is not available.

On Fri, Dec 4, 2020 at 10:41 PM Matthias Klose  wrote:
>
> On 12/4/20 2:38 PM, Matthias Klose wrote:
> > On 12/4/20 9:07 AM, Kito Cheng via Gcc-patches wrote:
> >> Committed, thanks :)
> >>
> >> On Thu, Dec 3, 2020 at 8:51 AM Jim Wilson  wrote:
> >>>
> >>> On Tue, Dec 1, 2020 at 12:13 AM Kito Cheng  wrote:
> >>>>
> >>>>   - We would like to canonicalize the arch string for --with-arch for
> >>>> easier handling of multilib, so split the canonicalization part into
> >>>> a standalone script to share the logic.
> >>>>
> >>>>  gcc/ChangeLog:
> >>>>
> >>>>  * config/riscv/multilib-generator (arch_canonicalize): Move
> >>>>  code to arch-canonicalize, and call that script to canonicalize arch
> >>>>  string.
> >>>>  (canonical_order): Move code to arch-canonicalize.
> >>>>  (LONG_EXT_PREFIXES): Ditto.
> >>>>  (IMPLIED_EXT): Ditto.
> >>>>  * config/riscv/arch-canonicalize: New.
> >>>>  * config.gcc (riscv*-*-*): Canonicalize --with-arch.
> >>>
> >>>
> >>> Looks OK to me.
> >
> > that breaks the bootstrap if python is not available. The python command 
> > might
> > not be available, so please check for python3, python, or python2.
>
> same for config/riscv/arch-canonicalize


[PATCH] PR target/98152: Checking python is available before using

2020-12-05 Thread Kito Cheng
We try to canonicalize the arch string for --with-arch, and the
script that does so is written in python.  As a result, GCC ends up
requiring python to build the RISC-V port, which is not expected to
be a GCC build requirement.

So this patch makes the step optional: detect python and only use it
when it is available.  Skipping canonicalization won't break any
functionality; it just might build one more redundant multi-lib.

gcc/ChangeLog:

* config.gcc (riscv*-*-*): Check whether python, python3 or python2
is available, and skip with_arch canonicalization if no python is
available.
---
 gcc/config.gcc | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 9c7604481f1..3650b46734a 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4623,7 +4623,10 @@ case "${target}" in
exit 1
;;
esac
-   with_arch=`${srcdir}/config/riscv/arch-canonicalize 
${with_arch}`
+   PYTHON=`which python || which python3 || which python2`
+   if test "x${PYTHON}" != x; then
+   with_arch=`${PYTHON} 
${srcdir}/config/riscv/arch-canonicalize ${with_arch}`
+   fi
tm_defines="${tm_defines} 
TARGET_RISCV_DEFAULT_ARCH=${with_arch}"
 
# Make sure --with-abi is valid.  If it was not specified,
-- 
2.29.2