Re: [PATCH] Improve __builtin_add_overflow on x86 for double-word types (PR target/93141)

2020-01-05 Thread Uros Bizjak
On Sat, Jan 4, 2020 at 12:39 PM Jakub Jelinek  wrote:
>
> On Sat, Jan 04, 2020 at 12:13:50PM +0100, Uros Bizjak wrote:
> > LGTM, but I wonder if *addcarry_1 gets overmacroized, the insn
> > condition is really hard to comprehend. Perhaps it should be written
> > as a separate pattern for SImode and DImode instead?
>
> > > +(define_insn "*addcarry_1"
> > > +  [(set (reg:CCC FLAGS_REG)
> > > +   (compare:CCC
> > > + (zero_extend:
> > > +   (plus:SWI48
> > > + (plus:SWI48
> > > +   (match_operator:SWI48 5 "ix86_carry_flag_operator"
> > > + [(match_operand 3 "flags_reg_operand") (const_int 0)])
> > > +   (match_operand:SWI48 1 "nonimmediate_operand" "%0"))
> > > + (match_operand:SWI48 2 "x86_64_immediate_operand" "e")))
> > > + (plus:
> > > +   (match_operand: 6 "const_scalar_int_operand" "")
> > > +   (match_operator: 4 "ix86_carry_flag_operator"
> > > + [(match_dup 3) (const_int 0)]
> > > +   (set (match_operand:SWI48 0 "register_operand" "=r")
> > > +   (plus:SWI48 (plus:SWI48 (match_op_dup 5
> > > +[(match_dup 3) (const_int 0)])
> > > +   (match_dup 1))
> > > +   (match_dup 2)))]
> > > +  "ix86_binary_operator_ok (PLUS, mode, operands)
> > > +   && CONST_INT_P (operands[2])
> > > +   /* Check that operands[6] is operands[2] zero extended from
> > > +  mode to mode.  */
> > > +   && ((mode == SImode || INTVAL (operands[2]) >= 0)
> > > +   ? (CONST_INT_P (operands[6])
> > > + && UINTVAL (operands[6]) == (UINTVAL (operands[2])
> > > +  & GET_MODE_MASK (mode)))
> > > +   : (CONST_WIDE_INT_P (operands[6])
> > > + && CONST_WIDE_INT_NUNITS (operands[6]) == 2
> > > + && ((unsigned HOST_WIDE_INT) CONST_WIDE_INT_ELT (operands[6], 0)
> > > + == UINTVAL (operands[2]))
> > > + && CONST_WIDE_INT_ELT (operands[6], 1) == 0))"
>
> I'm afraid it wouldn't help.  For SImode it would remove the : part, sure,
> but any optimizing compiler will optimize that away too, for DImode we need
> both branches - if INTVAL isn't negative, its zero extension will still be
> a CONST_INT with the same value, if it is negative, zero extension will be a
> CONST_WIDE_INT.  It would be shorter to write
>   rtx_equal_p (operands[6],
>simplify_unary_operation (ZERO_EXTEND, mode, operands[2],
>  mode))
> but I wanted to avoid that because it is quite costly and will create a new
> rtx each time the condition is checked.
> Anyway, attached is in one file the *addcarry_1 from the patch and
> in another one the demacroized variant, the macroized one is 41 lines
> long, the demacroized one is 75 lines long, so not fully 2 times, but
> almost.  Yes, in the SImode case we can just use const_int_operand and don't
> need to test it in the condition.

Thanks for the examples!

Yes, I agree that the expanded version is not much better than
macroized (it is even worse...).

So, OK.

Thanks,
Uros.


[C++ PATCH] Avoid caching constexpr calls that allocate something that they don't deallocate or vice versa (PR c++/91369)

2020-01-05 Thread Jakub Jelinek
Hi!

The caching of constexpr calls breaks the following testcase.
The problem is that constexpr calls that allocate from heap something
that they don't deallocate or calls that deallocate something they haven't
allocated aren't stateless for the constexpr context and in each outermost
constexpr evaluation the allocations will produce different artificial heap
VAR_DECLs.  In the testcase a function allocates something that the caller
is supposed to deallocate, so when we call it second time (e.g. from
different function) with the same constant arguments, we get the cached
result that contains address of the heap VAR_DECL from the earlier
evaluation and error trying to free something we haven't allocated (in the
same outermost constexpr evaluation).  Similarly, a constexpr call could
deallocate something the caller is supposed to allocate, if we cache such
call, we wouldn't actually deallocate it and complain the allocation has not
been deallocated.

Fixed by making sure we only cache calls that either don't perform any
allocations/deallocations, or perform exactly as many allocations and
deallocations and all those new allocations (new additions to heap_vars,
to which we only add, never remove) have been deallocated (checking just
numbers is not good enough, we might perform 3 allocations and 3
deallocations, e.g. deallocate 2 of those allocations and one passed to the
call from the caller and pass the still allocated address to the caller.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-01-05  Jakub Jelinek  

PR c++/91369
* constexpr.c (struct constexpr_global_ctx): Add heap_alloc_count
member, initialize it to zero in ctor.
(cxx_eval_call_expression): Bump heap_dealloc_count when deleting
a heap object.  Don't cache calls to functions which allocate some
heap objects and don't deallocate them or deallocate some heap
objects they didn't allocate.

* g++.dg/cpp1y/constexpr-new.C: Expect an error explaining why
static_assert failed for C++2a.
* g++.dg/cpp2a/constexpr-new9.C: New test.

--- gcc/cp/constexpr.c.jj   2020-01-01 12:22:34.002475595 +0100
+++ gcc/cp/constexpr.c  2020-01-04 11:56:07.619124987 +0100
@@ -1041,8 +1041,11 @@ struct constexpr_global_ctx {
   auto_vec heap_vars;
   /* Cleanups that need to be evaluated at the end of CLEANUP_POINT_EXPR.  */
   vec *cleanups;
+  /* Number of heap VAR_DECL deallocations.  */
+  unsigned heap_dealloc_count;
   /* Constructor.  */
-  constexpr_global_ctx () : constexpr_ops_count (0), cleanups (NULL) {}
+  constexpr_global_ctx ()
+: constexpr_ops_count (0), cleanups (NULL), heap_dealloc_count (0) {}
 };
 
 /* The constexpr expansion context.  CALL is the current function
@@ -2056,6 +2059,7 @@ cxx_eval_call_expression (const constexp
{
  DECL_NAME (var) = heap_deleted_identifier;
  ctx->global->values.remove (var);
+ ctx->global->heap_dealloc_count++;
  return void_node;
}
  else if (DECL_NAME (var) == heap_deleted_identifier)
@@ -2281,6 +2285,7 @@ cxx_eval_call_expression (const constexp
 }
   else
 {
+  bool cacheable = true;
   if (result && result != error_mark_node)
/* OK */;
   else if (!DECL_SAVED_TREE (fun))
@@ -2346,6 +2351,8 @@ cxx_eval_call_expression (const constexp
  auto_vec save_exprs;
  ctx_with_save_exprs.save_exprs = &save_exprs;
  ctx_with_save_exprs.call = &new_call;
+ unsigned save_heap_alloc_count = ctx->global->heap_vars.length ();
+ unsigned save_heap_dealloc_count = ctx->global->heap_dealloc_count;
 
  tree jump_target = NULL_TREE;
  cxx_eval_constant_expression (&ctx_with_save_exprs, body,
@@ -2417,6 +2424,33 @@ cxx_eval_call_expression (const constexp
 
  /* Make the unshared function copy we used available for re-use.  */
  save_fundef_copy (fun, copy);
+
+ /* If the call allocated some heap object that hasn't been
+deallocated during the call, or if it deallocated some heap
+object it has not allocated, the call isn't really stateless
+for the constexpr evaluation and should not be cached.
+It is fine if the call allocates something and deallocates it
+too.  */
+ if (entry
+ && (save_heap_alloc_count != ctx->global->heap_vars.length ()
+ || (save_heap_dealloc_count
+ != ctx->global->heap_dealloc_count)))
+   {
+ tree heap_var;
+ unsigned int i;
+ if ((ctx->global->heap_vars.length ()
+  - ctx->global->heap_dealloc_count)
+ != save_heap_alloc_count - save_heap_dealloc_count)
+   cacheable = false;
+ else
+   FOR_EACH_VEC_ELT_FROM (ctx->global->heap_vars, i

[testsuite, Darwin, committed] Fix fail of darwin-version-1.c.

2020-01-05 Thread Iain Sandoe
Recent platform linkers will no longer accept linking for a target
OS version less than 10.4.  Recent SDKs no longer have the libgcc_s
shims used for 10.4 and 10.5.  So we need to adjust tests that expect
these.  Added some explanation too.

tested on darwin9, darwin16 and darwin18,
applied to mainline,
thanks
Iain

gcc/testsuite/ChangeLog:

2020-01-05  Iain Sandoe  

* gcc.dg/darwin-version-1.c: Adjust test to use different
options for Darwin4-9 and Darwin10+.

diff --git a/gcc/testsuite/gcc.dg/darwin-version-1.c 
b/gcc/testsuite/gcc.dg/darwin-version-1.c
index 11cfceff39..ad7f7da3b6 100644
--- a/gcc/testsuite/gcc.dg/darwin-version-1.c
+++ b/gcc/testsuite/gcc.dg/darwin-version-1.c
@@ -1,10 +1,14 @@
 /* Basic test of the -mmacosx-version-min option.  */
 
-/* { dg-options "-mmacosx-version-min=10.1" } */
+/* Darwin4 corresponds to MacOS 10.0.  */
+/* { dg-options "-mmacosx-version-min=10.1" { target *-*-darwin[456789]* } } */
+/* Later Darwin linkers decline to link for less than Darwin8/MacOS 10.4.
+   However, we need to make the link for 10.6 because the relevant libgcc_s
+   shim files for 10.4 and 10.5 are also not installed in later SDKs.  */
+/* { dg-options "-mmacosx-version-min=10.6" { target *-*-darwin[123]* } } */
 /* { dg-do link { target *-*-darwin* } } */
 
 int main()
 {
   return 0;
 }
-



[PATCH] optimize costly division in rtx_cost

2020-01-05 Thread Alexander Monakov
Hi,

I noticed there's a costly signed 64-bit division in rtx_cost on x86 as well as
any other target where UNITS_PER_WORD is implemented like TARGET_64BIT ? 8 : 4.
It's also evident that rtx_cost does redundant work for a SET rtx argument.

Obviously the variable named 'factor' rarely exceeds 1, so in the majority of
cases it can be computed with a well-predictable branch rather than a division.

This patch makes rtx_cost do the division only in case mode is wider than
UNITS_PER_WORD, and also moves a test for a SET up front to avoid redundancy.
No functional change.

Bootstrapped on x86_64, ok for trunk?

To illustrate the improvement this buys, for tramp3d -O2 compilation, I got

before:
   73887675319  instructions:u

   72438432200  cycles:u
 924298569  idq.ms_uops:u
  102603799255  uops_dispatched.thread:u

after:
   73888371724  instructions:u

   72386986612  cycles:u
 802744775  idq.ms_uops:u
  102096987220  uops_dispatched.thread:u

(this is on Sandybridge, idq.ms_uops are uops going via the microcode sequencer,
so the unneeded division is responsible for a good fraction of them)

* rtlanal.c (rtx_cost): Handle a SET up front. Avoid division if the
mode is not wider than UNITS_PER_WORD.

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 9a7afccefb8..c7ab86e228b 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -4207,18 +4207,23 @@ rtx_cost (rtx x, machine_mode mode, enum rtx_code 
outer_code,
   const char *fmt;
   int total;
   int factor;
+  unsigned mode_size;
 
   if (x == 0)
 return 0;
 
-  if (GET_MODE (x) != VOIDmode)
+  if (GET_CODE (x) == SET)
+/* A SET doesn't have a mode, so let's look at the SET_DEST to get
+   the mode for the factor.  */
+mode = GET_MODE (SET_DEST (x));
+  else if (GET_MODE (x) != VOIDmode)
 mode = GET_MODE (x);
 
+  mode_size = estimated_poly_value (GET_MODE_SIZE (mode));
+
   /* A size N times larger than UNITS_PER_WORD likely needs N times as
  many insns, taking N times as long.  */
-  factor = estimated_poly_value (GET_MODE_SIZE (mode)) / UNITS_PER_WORD;
-  if (factor == 0)
-factor = 1;
+  factor = mode_size > UNITS_PER_WORD ? mode_size / UNITS_PER_WORD : 1;
 
   /* Compute the default costs of certain things.
  Note that targetm.rtx_costs can override the defaults.  */
@@ -4243,14 +4248,6 @@ rtx_cost (rtx x, machine_mode mode, enum rtx_code 
outer_code,
   /* Used in combine.c as a marker.  */
   total = 0;
   break;
-case SET:
-  /* A SET doesn't have a mode, so let's look at the SET_DEST to get
-the mode for the factor.  */
-  mode = GET_MODE (SET_DEST (x));
-  factor = estimated_poly_value (GET_MODE_SIZE (mode)) / UNITS_PER_WORD;
-  if (factor == 0)
-   factor = 1;
-  /* FALLTHRU */
 default:
   total = factor * COSTS_N_INSNS (1);
 }


[Committed] New bit-field testcases Part 3/N

2020-01-05 Thread Andrew Pinski
Hi,
  I found some more testcases that would cause an internal compiler
error while working on my bit-field lowering pass.  Since there was no
testcase already done, I thought it would be best if it was applied to
the trunk.

Thanks,
Andrew Pinski

testsuite/ChangeLog:
* gcc.c-torture/compile/20200105-1.c: New testcase.
* gcc.c-torture/compile/20200105-2.c: New testcase.
* gcc.c-torture/compile/20200105-3.c: New testcase.
Index: gcc.c-torture/compile/20200105-1.c
===
--- gcc.c-torture/compile/20200105-1.c  (nonexistent)
+++ gcc.c-torture/compile/20200105-1.c  (working copy)
@@ -0,0 +1,12 @@
+struct mouse_button_str {
+unsigned char left  : 1;
+unsigned char right : 1;
+unsigned char middle: 1;
+};
+int g(void)
+{
+  signed char a = 0;
+  struct mouse_button_str *newbutton1 = (struct mouse_button_str*)&a;
+  newbutton1->left = 1;
+  return a;
+}
Index: gcc.c-torture/compile/20200105-2.c
===
--- gcc.c-torture/compile/20200105-2.c  (nonexistent)
+++ gcc.c-torture/compile/20200105-2.c  (working copy)
@@ -0,0 +1,12 @@
+struct mouse_button_str {
+signed char left  : 1;
+signed char right : 1;
+signed char middle: 1;
+};
+int g(void)
+{
+  unsigned char a = 0;
+  struct mouse_button_str *newbutton1 = (struct mouse_button_str*)&a;
+  newbutton1->left = 1;
+  return a;
+}
Index: gcc.c-torture/compile/20200105-3.c
===
--- gcc.c-torture/compile/20200105-3.c  (nonexistent)
+++ gcc.c-torture/compile/20200105-3.c  (working copy)
@@ -0,0 +1,12 @@
+struct mouse_button_str {
+unsigned char left  : 1;
+unsigned char right : 1;
+unsigned char middle: 1;
+};
+int g(void)
+{
+  unsigned char a = 0;
+  struct mouse_button_str *newbutton1 = (struct mouse_button_str*)&a;
+  newbutton1->left = 1;
+  return a;
+}


[PATCH] ipa-inline: Adjust condition for caller_growth_limits

2020-01-05 Thread Xiong Hu Luo
Inline should return failure either (newsize > param_large_function_insns)
OR (newsize > limit).  Sometimes newsize is larger than
param_large_function_insns, but smaller than limit, inline doesn't return
failure even if the new function is a large function.
Therefore, param_large_function_insns and param_large_function_growth should be
OR instead of AND, otherwise --param large-function-growth won't
work correctly with --param large-function-insns.

gcc/ChangeLog:

2020-01-06  Luo Xiong Hu  

ipa-inline-analysis.c (estimate_growth): Fix typo.
ipa-inline.c (caller_growth_limits): Use OR instead of AND.
---
 gcc/ipa-inline-analysis.c | 2 +-
 gcc/ipa-inline.c  | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index 711b85bd1a9..0e584a5a401 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -462,7 +462,7 @@ offline_size (struct cgraph_node *node, ipa_size_summary 
*info)
   return 0;
 }
 
-/* Estimate the growth caused by inlining NODE into all callees.  */
+/* Estimate the growth caused by inlining NODE into all callers.  */
 
 int
 estimate_growth (struct cgraph_node *node)
diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index 0f87c476dde..33945538def 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -184,8 +184,8 @@ caller_growth_limits (struct cgraph_edge *e)
  the function to shrink if it went over the limits by forced inlining.  */
   newsize = estimate_size_after_inlining (to, e);
   if (newsize >= ipa_size_summaries->get (what)->size
-  && newsize > opt_for_fn (to->decl, param_large_function_insns)
-  && newsize > limit)
+  && (newsize > opt_for_fn (to->decl, param_large_function_insns)
+ || newsize > limit))
 {
   e->inline_failed = CIF_LARGE_FUNCTION_GROWTH_LIMIT;
   return false;
-- 
2.21.0.777.g83232e3864