date:20141204

Re: [PATCH] Fix asan sanopt optimization (PR sanitizer/64170)

2014-12-04 Thread Yury Gribov


On 12/04/2014 01:07 AM, Jakub Jelinek wrote:

Hi!

The following testcase ICEs, because base_checks vector contains
stale statements, and can_remove_asan_check relies on them not to be
there anymore (assumes that all statements in the vector dominate
the current statement, if that is not true, the loop going through immediate
dominators won't reach the basic block of the stmt in the vector).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


LGTM, thanks for fixing this.

-Y
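
For readers following the thread, the dominance assumption described above can be illustrated with a toy model. This is only a sketch, not the sanopt code: blocks are plain integers, `idom[]` is a hypothetical immediate-dominator table, and `dominated_by` is an invented helper that walks that table the way the description says the loop through immediate dominators does. A stale entry whose block does not dominate the current one is simply never reached by the walk.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy CFG (hypothetical): idom[b] is the immediate dominator of block b,
   and the entry block 0 is its own immediate dominator.  */
enum { N_BLOCKS = 5 };
static const int idom[N_BLOCKS] = { 0, 0, 1, 1, 0 };

/* Return true if block DOM dominates block BB, by walking the
   immediate-dominator chain from BB up to the entry block.  */
bool
dominated_by (int bb, int dom)
{
  assert (bb >= 0 && bb < N_BLOCKS);
  for (;;)
    {
      if (bb == dom)
        return true;
      if (bb == idom[bb])   /* Reached the entry block.  */
        return false;
      bb = idom[bb];
    }
}
```

In this table block 4 hangs directly off the entry block, so a check recorded in block 1 is never found when walking up from block 4.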

Re: [Patch] Improving jump-thread pass for PR 54742

2014-12-04 Thread Sebastian Pop

Jeff Law wrote:
> >+@item max-fsm-thread-path-insns
> >+Maximum number of instructions to copy when duplicating blocks on a
> >+finite state automaton jump thread path.  The default is 100.
> >+
> >+@item max-fsm-thread-length
> >+Maximum number of basic blocks on a finite state automaton jump thread
> >+path.  The default is 10.
> >+
> >+@item max-fsm-thread-paths
> >+Maximum number of new jump thread paths to create for a finite state
> >+automaton.  The default is 50.
> Has there been any tuning on these defaults?  I don't have any
> strong opinions about what they ought to be, this is more to get any
> such information recorded on the lists for historical purposes.

I have not tuned any of these defaults other than making sure that coremark is
still jump-threaded.  gcc.dg/tree-ssa/ssa-dom-thread-7.c is a test-case that
will check that we always optimize coremark.

> I think it's worth a note in the debug dump anytime you abort
> threading when you hit a limit.

Done.

> I'm a bit worried about compile-time impacts of all the
> recursion, but I'm willing to wait and see if it turns out to be a
> problem in practice.

Done, as Richi suggested, checking the flag_expensive_optimizations.

> >@@ -689,6 +692,12 @@ simplify_control_stmt_condition (edge e,
> >  pass specific callback to try and simplify it further.  */
> >if (cached_lhs && ! is_gimple_min_invariant (cached_lhs))
> >  cached_lhs = (*simplify) (stmt, stmt);
> >+
> >+  /* We couldn't find an invariant.  But, callers of this
> >+ function may be able to do something useful with the
> >+ unmodified destination.  */
> >+  if (!cached_lhs)
> >+cached_lhs = original_lhs;
> >  }
> >else
> >  cached_lhs = NULL;
> Can't you just use COND rather than stuffing its value away into
> ORIGINAL_LHS?  CACHED_LHS may be better in some cases if it's an
> SSA_NAME (and it should be), but I doubt it matters in practice.
> 
> Or is it the case that you have to have the original condition --
> without any context sensitive equivalences used to "simplify" the
> condition.

I think we need to start the search for FSM jump-threads with the original non
simplified condition.

> >+/* Return true if there is at least one path from START_BB to END_BB.
> >+   VISITED_BBS is used to make sure we don't fall into an infinite loop.  */
> >+
> >+static bool
> >+fsm_find_thread_path (basic_block start_bb, basic_block end_bb,
> >+  vec *&path,
> >+  hash_set *visited_bbs, int n_insns)
> Update comment to indicate how PATH is used to return a path from
> START_BB to END_BB.

Done.

> >+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
> >+   gsi_next (&gsi))
> >+++n_insns;
> Probably don't want to count labels and GIMPLE_NOPs.  Probably do
> want to count non-virtual PHIs since those may end up as a copies or
> constant initializations.

Done.
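
The counting rule suggested in the review can be sketched outside of GCC as follows. This is a toy model under stated assumptions: `enum stmt_kind` and `count_copied_insns` are hypothetical names, not GIMPLE codes or functions from the patch. Labels, no-ops and virtual PHIs are skipped, while real PHIs are counted because they may end up as copies or constant initializations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical statement kinds for a toy basic block; these are not
   GCC's GIMPLE codes.  */
enum stmt_kind { STMT_LABEL, STMT_NOP, STMT_ASSIGN, STMT_CALL,
                 STMT_PHI, STMT_VIRTUAL_PHI };

/* Count the statements in STMTS[0..N-1] that are likely to cost
   something when the block is duplicated.  */
int
count_copied_insns (const enum stmt_kind *stmts, size_t n)
{
  int n_insns = 0;

  assert (stmts != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    switch (stmts[i])
      {
      case STMT_LABEL:
      case STMT_NOP:
      case STMT_VIRTUAL_PHI:
        /* Free to duplicate: do not count.  */
        break;
      default:
        ++n_insns;
        break;
      }
  return n_insns;
}
```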

> 
> >+  if (j == 0)
> >+kind = EDGE_START_FSM_THREAD;
> >+  else if (single_pred_p (e->src))
> >+kind = EDGE_NO_COPY_SRC_BLOCK;
> >+  else {
> >+kind = EDGE_COPY_SRC_JOINER_BLOCK;
> >+++joiners;
> >+  }
> Presumably the mis-formatting was added when you tracked the #
> joiners.  AFAICT that is a write-only variable and ought to be
> removed.  Along with the braces on the final ELSE which should
> restore proper formatting.

Done.

> 
> 
> >@@ -2343,6 +2493,55 @@ thread_through_all_blocks (bool may_peel_loop_headers)
> >threaded_blocks = BITMAP_ALLOC (NULL);
> >memset (&thread_stats, 0, sizeof (thread_stats));
> >
> >+  for (i = 0; i < paths.length ();)
> Comment before this loop.  I can see what you're doing, but I'm
> already very familiar with this code.  Basically what are you
> looking for in this loop and what do you do?

Done.

> Overall I think this is very very close and I really like the
> overall direction.  There's a few minor issues noted above and with
> those addressed, I think we should be ready to go.

Thanks for your careful review.  Please let me know if there still are things I
can improve in the attached patch.
The patch passes bootstrap on x86_64-linux and powerpc64-linux, and regtest
except for a fail I have not seen in the past:

FAIL: gcc.c-torture/compile/pr27571.c   -Os  (internal compiler error)

I am still investigating why this fails: as far as I can see for now this is
because in copying the FSM path we create an internal loop that is then
discovered by the loop verifier as a natural loop and is not yet in the existing
loop structures.  I will try to fix this in duplicate_seme by invalidating the
loop structure after we code generated all the FSM paths.  I will submit an
updated patch when it passes regtest.

> Removing most of tree-ssa-threadupdate.c and using SEME duplication
> would be a huge step forward for making this code more

RE: [PATCH] Fix PR 61225

2014-12-04 Thread Zhenqiang Chen


> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Jeff Law
> Sent: Tuesday, December 02, 2014 6:11 AM
> To: Zhenqiang Chen
> Cc: Steven Bosscher; gcc-patches@gcc.gnu.org; Jakub Jelinek
> Subject: Re: [PATCH] Fix PR 61225
> 
> On 08/04/14 02:24, Zhenqiang Chen wrote:
> 
> >>>
> >>> ChangeLog:
> >>> 2014-05-22  Zhenqiang Chen  
> >>>
> >>>   Part of PR rtl-optimization/61225
> >>>   * config/i386/i386-protos.h (ix86_peephole2_rtx_equal_p):
> >>> New proto.
> >>>   * config/i386/i386.c (ix86_peephole2_rtx_equal_p): New function.
> >>>   * regcprop.c (replace_oldest_value_reg): Add REG_EQUAL note when
> >>>   propagating to SET.
> >>
> >> I can't help but wonder why the new 4 insn combination code isn't
> >> presenting this as a nice big fat insn to the x86 backend which would
> >> eliminate the need for the peep2.
> >>
> >> But, assuming there's a fundamental reason why that's not kicking in...
> >
> > Current combine pass can only handle
> >
> > I0 -> I1 -> I2 -> I3.
> > I0, I1 -> I2, I2 -> I3.
> > I0 -> I2; I1, I2 -> I3.
> > I0 -> I1; I1, I2 -> I3.
> >
> > For the case, it is
> > I1 -> I2 -> I3; I2 -> INSN
> >
> > I3 and INSN looks like not related. But INSN is a COMPARE to set CC
> > and I3 can also set CC. I3 and INSN can be combined together as one
> > instruction to set CC.
> Presumably there's no dataflow between I3 and INSN because they both set
> CC (doesn't that make them anti-dependent?)
> 
> Can you show me the RTL corresponding to I1, I2, I3 and INSN?  I simply find
> it easier to look at RTL rather than guess why we don't have the appropriate
> linkage and thus are not attempting the combinations we want.
> 
> Pseudo code for the resulting I3 and INSN would help -- as I work through
> this there's some inconsistencies in how I'm interpreting a few things, and
> RTL and pseudo-rtl for the desired output RTL would help a lot.

C code:

if (!--*p)

rtl code:

6: r91:SI=[r90:SI]
7: {r88:SI=r91:SI-0x1;clobber flags:CC;}
8: [r90:SI]=r88:SI
9: flags:CCZ=cmp(r88:SI,0)

expected output:

8: {flags:CCZ=cmp([r90:SI]-0x1,0);[r90:SI]=[r90:SI]-0x1;}

in assembly, it is

  decl (%eax)
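
As a point of reference, the source-level operation being discussed can be written as a small standalone function. This is only an illustration: `dec_and_test_zero` is a hypothetical name, and whether a given compiler emits a single memory decrement that also sets the flags depends on the combine change under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Decrement *P in memory and report whether the new value is zero.
   This is the source-level shape of "if (!--*p)".  */
bool
dec_and_test_zero (int *p)
{
  assert (p != NULL);
  return !--*p;
}
```

The store of the decremented value and the comparison against zero both read the same result, which is why the two RTL insns are candidates for one flag-setting instruction.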

> >
> > ChangeLog
> > 2014-08-04  Zhenqiang Chen  
> >
> >  Part of PR rtl-optimization/61225
> >  * combine.c (refer_same_reg_p): New function.
> >  (combine_instructions): Handle I1 -> I2 -> I3; I2 -> insn.
> >  (try_combine): Add one more parameter TO_COMBINED_INSN, which is
> >  used to create a new insn parallel (TO_COMBINED_INSN, I3).
> >
> > testsuite/ChangeLog:
> > 2014-08-04  Zhenqiang Chen  
> >
> >  * gcc.target/i386/pr61225.c: New test.
> >
> > diff --git a/gcc/combine.c b/gcc/combine.c
> > index 53ac1d6..42098ab 100644
> > --- a/gcc/combine.c
> > +++ b/gcc/combine.c
> > @@ -412,7 +412,7 @@ static int cant_combine_insn_p (rtx);
> >   static int can_combine_p (rtx, rtx, rtx, rtx, rtx, rtx, rtx *, rtx *);
> >   static int combinable_i3pat (rtx, rtx *, rtx, rtx, rtx, int, int, rtx *);
> >   static int contains_muldiv (rtx);
> > -static rtx try_combine (rtx, rtx, rtx, rtx, int *, rtx);
> > +static rtx try_combine (rtx, rtx, rtx, rtx, int *, rtx, rtx);
> >   static void undo_all (void);
> >   static void undo_commit (void);
> >   static rtx *find_split_point (rtx *, rtx, bool);
> > @@ -1099,6 +1099,46 @@ insn_a_feeds_b (rtx a, rtx b)
> >   #endif
> > return false;
> >   }
> > +
> > +/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2.
> > +   It returns TRUE, if reg1 == reg2, and no other refer of reg1
> > +   except A and B.  */
> > +
> > +static bool
> > +refer_same_reg_p (rtx a, rtx b)
> > +{
> > +  rtx seta = single_set (a);
> > +  rtx setb = single_set (b);
> > +
> > +  if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b)
> > + || !seta || !setb)
> > +return false;
> Go ahead and use
>  || !seta
>  || !setb
> 
> It's a bit more vertical space, but I believe closer in line with our coding
> standards.

Updated.
 
> > +
> > +  if (GET_CODE (SET_SRC (seta)) != COMPARE
> > +  || GET_MODE_CLASS (GET_MODE (SET_DEST (seta))) != MODE_CC
> > +  || !REG_P (XEXP (SET_SRC (seta), 0))
> > +  || !const0_rtx
> > +  || !REG_P (SET_SRC (setb))
> > +  || REGNO (SET_SRC (setb)) != REGNO (XEXP (SET_SRC (seta), 0)))
> > +return false;
> What's the !const0_rtx test here?  Don't you want to test some object
> from SETA against const0_rtx?  Also note that you may need to test
> against CONST0_RTX (mode)

It's my fault. It should be

XEXP (SET_SRC (seta), 1) != const0_rtx

The updated patch is attached.
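
The nature of the mistake is easy to reproduce in isolation. The sketch below uses toy stand-ins (`struct node`, `zero_node`, `rejects_buggy` and `rejects_fixed` are hypothetical names, not GCC's rtx machinery): negating a shared constant pointer is always false, so the condition never rejects anything, whereas comparing the operand against that constant does what was intended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for rtx and const0_rtx; not GCC's types.  */
struct node { int value; };
static struct node zero_storage = { 0 };
struct node *const zero_node = &zero_storage;

/* Buggy form: "!zero_node" tests the shared constant itself, which is
   never a null pointer, so this condition can never reject OP.  */
bool
rejects_buggy (const struct node *op)
{
  (void) op;
  return !zero_node;
}

/* Intended form: reject OP unless it is the shared zero constant.  */
bool
rejects_fixed (const struct node *op)
{
  assert (op != NULL);
  return op != zero_node;
}
```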

Thanks!
-Zhenqiang

> > @@ -1431,6 +1468,50 @@ combine_instructions (rtx f, unsigned int nregs)
> >}
> >}
> >
> > + /* Try to combine a compare insn that sets CC
> > +with a preceding insn that can set CC, and maybe with its
> > +logi

[patch,doc] Remove cloog & ppl in the docs

2014-12-04 Thread FX

The attached patch removes some remaining mentions of cloog and ppl in our docs.
Tested with “make info html pdf”. OK for trunk?

FX



PS: with this, the remaining mentions of cloog or ppl are comments in 
config/isl.m4, graphite.c and graphite-blocking.c. They should probably go away 
too, but they are technical so someone with cloog/isl knowledge should do it.





2014-12-04  Francois-Xavier Coudert  

* doc/install.texi: Remove mentions of cloog and ppl.
* doc/invoke.texi: Likewise.



cloog.diff
Description: Binary data

[ping] account for register spans in expand_builtin_init_dwarf_reg_sizes

2014-12-04 Thread Olivier Hainque

Hi Jason,

ping for https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02975.html,

a proposal to address comments you made on a patch I had sent
earlier on.

> The attached patches combined bootstrap and regtest fine on x86_64-linux.
> We also have nominal test results with our 4.9 based series of compilers
> in-house, including for the e500 family of targets.

As I mentioned in earlier comments, the DWARF_REG_TO_UNWIND_COLUMN part is
essentially a noop today and is not necessary to fix the breakage. It's just
something that ISTM should be there in principle.

Thanks in advance for your feedback,

With Kind Regards,

Olivier

Re: [patch, build] Restore bootstrap in building libcc1 on darwin

2014-12-04 Thread FX

10-days ping
This restores bootstrap on a secondary target; the target maintainer is OK with it.
I think I need a build maintainer's approval, so please review.


> when the freshly built g++ is used, we need to pass the appropriate -B 
> options. As I understand it, the appropriate place for that is in the 
> toplevel configure.ac, where we already pass down the respective -L options. 
> Indeed, the attached patch restores bootstrap on x86_64-apple-darwin14 with 
> gcc as system compiler (and doesn’t break the bootstrap with clang as system 
> compiler).
> 
> OK to commit?
> 
> FX
> 
> 
> 
> PS: HJ, the reason we don’t see this on Linux is that only Darwin (AFAIK) 
> plays spec tricks with static-libstdc++ (in gcc/config/darwin.h)



libcc1.ChangeLog
Description: Binary data


libcc1.diff
Description: Binary data

Re: [Patch] Improving jump-thread pass for PR 54742

2014-12-04 Thread Sebastian Pop

Sebastian Pop wrote:
> a fail I have not seen in the past:
> 
> FAIL: gcc.c-torture/compile/pr27571.c   -Os  (internal compiler error)
> 
> I am still investigating why this fails: as far as I can see for now this is
> because in copying the FSM path we create an internal loop that is then
> discovered by the loop verifier as a natural loop and is not yet in the
> existing loop structures.  I will try to fix this in duplicate_seme by
> invalidating the
> loop structure after we code generated all the FSM paths.  I will submit an
> updated patch when it passes regtest.

We need at least this patch to fix the fail:

@@ -2518,6 +2518,7 @@ thread_through_all_blocks (bool may_peel_loop_headers)
       if (duplicate_seme_region (entry, exit, region, len - 1, NULL))
         {
           /* We do not update dominance info.  */
           free_dominance_info (CDI_DOMINATORS);
           bitmap_set_bit (threaded_blocks, entry->src->index);
+          retval = true;
         }

And this will trigger in the end of the code gen function:

  if (retval)
    loops_state_set (LOOPS_NEED_FIXUP);

That will fix the loop structures.  I'm testing this patch on top of the one I
have just sent out.

Sebastian

Re: [PATCH][ARM] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P

2014-12-04 Thread Kyrill Tkachov



On 02/12/14 22:58, Ramana Radhakrishnan wrote:

On Tue, Nov 11, 2014 at 11:55 AM, Kyrill Tkachov  wrote:

Hi all,

This is the arm implementation of the macro fusion hook.
It tries to fuse movw+movt operations together. It also tries to take lo_sum
RTXs into account since those generate movt instructions as well.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?




  if (current_tune->fuseable_ops & ARM_FUSE_MOVW_MOVT)
+{
+  /* We are trying to fuse
+ movw imm / movt imm
+ instructions as a group that gets scheduled together.  */
+

A comment here about the insn structure would be useful.


Done. It's similar to the aarch64 adrp+add case. It does make it easier 
to read, thanks.


2014-12-04  Kyrylo Tkachov  kyrylo.tkac...@arm.com

  * config/arm/arm-protos.h (tune_params): Add fuseable_ops field.
  * config/arm/arm.c (arm_macro_fusion_p): New function.
  (arm_macro_fusion_pair_p): Likewise.
  (TARGET_SCHED_MACRO_FUSION_P): Define.
  (TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise.
  (ARM_FUSE_NOTHING): Likewise.
  (ARM_FUSE_MOVW_MOVT): Likewise.
  (arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune,
  arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune,
  arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune,
  arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune,
  arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune,
  arm_cortex_a5_tune): Specify fuseable_ops value.




+  set_dest = SET_DEST (curr_set);
+  if (GET_CODE (set_dest) == ZERO_EXTRACT)
+{
+  if (CONST_INT_P (SET_SRC (curr_set))
+  && CONST_INT_P (SET_SRC (prev_set))
+  && REG_P (XEXP (set_dest, 0))
+  && REG_P (SET_DEST (prev_set))
+  && REGNO (XEXP (set_dest, 0)) == REGNO (SET_DEST (prev_set)))
+return true;
+}
+  else if (GET_CODE (SET_SRC (curr_set)) == LO_SUM
+   && REG_P (SET_DEST (curr_set))
+   && REG_P (SET_DEST (prev_set))
+   && GET_CODE (SET_SRC (prev_set)) == HIGH
+   && REGNO (SET_DEST (curr_set)) == REGNO (SET_DEST (prev_set)))
+{
+  return true;
+}

Can we add a fast path exit to be

if (GET_MODE (set_dest) != SImode)
   return false;


Done, but if/when we extend the function to handle more fusion cases it will
need to be refactored, since we will want to just bail out of this MOVW+MOVT
case rather than the whole function.
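
The shape of the predicate with the fast-path exit can be sketched on a toy instruction model. This is an assumption-laden illustration, not the arm.c code: `struct toy_insn`, its enums and `toy_fusion_pair_p` are hypothetical, and the real hook inspects RTL patterns rather than an opcode field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of an instruction.  */
enum toy_mode { TOY_SImode, TOY_DImode };
enum toy_op { TOY_MOVW, TOY_MOVT, TOY_OTHER };

struct toy_insn
{
  enum toy_op op;
  enum toy_mode mode;     /* Mode of the destination.  */
  unsigned dest_regno;    /* Destination register number.  */
};

/* Return true if PREV and CURR form a movw/movt pair writing the same
   32-bit register, so a scheduler could keep them adjacent.  */
bool
toy_fusion_pair_p (const struct toy_insn *prev, const struct toy_insn *curr)
{
  assert (prev != NULL && curr != NULL);

  /* Fast path: only 32-bit destinations are considered.  With more
     fusion cases this early return would have to become a skip of
     just the MOVW+MOVT case.  */
  if (curr->mode != TOY_SImode)
    return false;

  return (prev->op == TOY_MOVW
          && curr->op == TOY_MOVT
          && prev->dest_regno == curr->dest_regno);
}
```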




I did think whether we wanted to use reg_overlap_mentioned_p as that
may simplify the logic a bit but that's  overkill here as we still
want to restrict it to the cases above.

Otherwise OK.


Here's the updated patch. I've tested on arm-none-eabi and made sure that the
fusion still happens on the benchmarks I looked at.
Ok?

Thanks,
Kyrill



Ramana





+}
+  return false;
Thanks,
Kyrill

2014-11-11  Kyrylo Tkachov  

 * config/arm/arm-protos.h (tune_params): Add fuseable_ops field.
 * config/arm/arm.c (arm_macro_fusion_p): New function.
 (arm_macro_fusion_pair_p): Likewise.
 (TARGET_SCHED_MACRO_FUSION_P): Define.
 (TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise.
 (ARM_FUSE_NOTHING): Likewise.
 (ARM_FUSE_MOVW_MOVT): Likewise.
 (arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune,
 arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune,
 arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune,
 arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune,
 arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune,
 arm_cortex_a5_tune): Specify fuseable_ops value.
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 20cfa9f..19925e9 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -289,6 +289,8 @@ struct tune_params
   bool string_ops_prefer_neon;
   /* Maximum number of instructions to inline calls to memset.  */
   int max_insns_inline_memset;
+  /* Bitfield encoding the fuseable pairs of instructions.  */
+  unsigned int fuseable_ops;
 };
 
 extern const struct tune_params *current_tune;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 64494e8..6f847d6 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -251,6 +251,7 @@ static void arm_expand_builtin_va_start (tree, rtx);
 static tree arm_gimplify_va_arg_expr (tree, tree, gimple_seq *, gimple_seq *);
 static void arm_option_override (void);
 static unsigned HOST_WIDE_INT arm_shift_truncation_mask (machine_mode);
+static bool arm_macro_fusion_p (void);
 static bool arm_cannot_copy_insn_p (rtx_insn *);
 static int arm_issue_rate (void);
 static void arm_output_dwarf_dtprel (FILE *, int, rtx) ATTRIBUTE_UNUSED;
@@ -291,6 +292,8 @@ static int arm_cortex_m_branch_cost (bool, bool);
 static bool arm_vectorize_vec_perm_const_ok (machine_mode vmode,
 	 const unsigned char *sel);
 
+static bool aarch_macro_fusion_pair_p

RE: [PATCH] Add force option to find_best_rename_reg in regrename pass

2014-12-04 Thread Thomas Preud'homme

Hi Eric,

Sorry for the delay; for some reason, despite my being a recipient, the mail
didn't hit my inbox but only my gcc-patches box.

> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Eric Botcazou
> 
> > 2014-11-14  Thomas Preud'homme  
> >
> > * regrename.c (find_best_rename_reg): Rename to ...
> > (find_rename_reg): This. Also add a parameter to skip tick check.
> > * regrename.h: Likewise.
> > * config/c6x/c6x.c: Adapt to above renaming.
> 
> Missing function in config/c6x/c6x.c entry.

Here is the proposed ChangeLog entry:

2014-11-26 Thomas Preud'homme thomas.preudho...@arm.com

  * regrename.c (find_best_rename_reg): Rename to ...
  (find_rename_reg): This. Also add a parameter to skip tick check.
  * regrename.h: Likewise.
  * config/c6x/c6x.c (try_rename_operands): Adapt to above renaming.

> 
> Please write it like so:
> 
> if (!check_new_reg_p (old_reg, new_reg, this_head,
> *unavailable))
>   continue;
> 
> if (!best_rename)
>   return new_reg;
> 
> /* In the first pass, we force the renaming of registers that
>    don't belong to PREFERRED_CLASS to registers that do, even
>    though the latters were used not very long ago.  */
> if ((pass == 0
>      && !TEST_HARD_REG_BIT (reg_class_contents[preferred_class],
>                             best_new_reg))
>     || tick[best_new_reg] > tick[new_reg])
>   best_new_reg = new_reg;

Oh yes, much better indeed. I should have thought about this. Please find below
the patch reworked as above.
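
The control flow of the reworked loop can be sketched on a toy register table. This is a simplified model under stated assumptions: `struct toy_reg` and `toy_find_rename_reg` are hypothetical, the `usable` flag stands in for `check_new_reg_p`, and the two-pass preferred-class handling of the real function is left out. With `best_rename` false the first acceptable register is returned; with it true the acceptable register with the lowest tick wins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-register data.  */
struct toy_reg
{
  bool usable;   /* Stand-in for check_new_reg_p succeeding.  */
  int tick;      /* Time of last use; lower means used longer ago.  */
};

/* Pick a register to rename OLD_REG to among REGS[0..N_REGS-1].
   Returns OLD_REG if no better candidate exists.  */
int
toy_find_rename_reg (const struct toy_reg *regs, int n_regs, int old_reg,
                     bool best_rename)
{
  int best_new_reg = old_reg;

  assert (regs != NULL && old_reg >= 0 && old_reg < n_regs);
  for (int new_reg = 0; new_reg < n_regs; new_reg++)
    {
      if (new_reg == old_reg || !regs[new_reg].usable)
        continue;

      /* Caller only wants a register satisfying the constraints.  */
      if (!best_rename)
        return new_reg;

      if (regs[best_new_reg].tick > regs[new_reg].tick)
        best_new_reg = new_reg;
    }
  return best_new_reg;
}
```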

diff --git a/gcc/config/c6x/c6x.c b/gcc/config/c6x/c6x.c
index 06319d0..6aca1e3 100644
--- a/gcc/config/c6x/c6x.c
+++ b/gcc/config/c6x/c6x.c
@@ -3513,7 +3513,8 @@ try_rename_operands (rtx_insn *head, rtx_insn *tail, unit_req_table reqs,
   COMPL_HARD_REG_SET (unavailable, reg_class_contents[(int) super_class]);
 
   old_reg = this_head->regno;
-  best_reg = find_best_rename_reg (this_head, super_class, &unavailable, old_reg);
+  best_reg = find_rename_reg (this_head, super_class, &unavailable, old_reg,
+ true);
 
   regrename_do_replace (this_head, best_reg);
 
diff --git a/gcc/regrename.h b/gcc/regrename.h
index 03b7164..05c78ad 100644
--- a/gcc/regrename.h
+++ b/gcc/regrename.h
@@ -89,8 +89,8 @@ extern void regrename_init (bool);
 extern void regrename_finish (void);
 extern void regrename_analyze (bitmap);
 extern du_head_p regrename_chain_from_id (unsigned int);
-extern int find_best_rename_reg (du_head_p, enum reg_class, HARD_REG_SET *,
-int);
+extern int find_rename_reg (du_head_p, enum reg_class, HARD_REG_SET *, int,
+   bool);
 extern void regrename_do_replace (du_head_p, int);
 
 #endif
diff --git a/gcc/regrename.c b/gcc/regrename.c
index 66f562b..88321d0 100644
--- a/gcc/regrename.c
+++ b/gcc/regrename.c
@@ -357,11 +357,13 @@ check_new_reg_p (int reg ATTRIBUTE_UNUSED, int new_reg,
 /* For the chain THIS_HEAD, compute and return the best register to
rename to.  SUPER_CLASS is the superunion of register classes in
the chain.  UNAVAILABLE is a set of registers that cannot be used.
-   OLD_REG is the register currently used for the chain.  */
+   OLD_REG is the register currently used for the chain.  BEST_RENAME
+   controls whether the register chosen must be better than the
+   current one or just respect the given constraint.  */
 
 int
-find_best_rename_reg (du_head_p this_head, enum reg_class super_class,
- HARD_REG_SET *unavailable, int old_reg)
+find_rename_reg (du_head_p this_head, enum reg_class super_class,
+HARD_REG_SET *unavailable, int old_reg, bool best_rename)
 {
   bool has_preferred_class;
   enum reg_class preferred_class;
@@ -400,15 +402,19 @@ find_best_rename_reg (du_head_p this_head, enum reg_class super_class,
new_reg))
continue;
 
+ if (!check_new_reg_p (old_reg, new_reg, this_head, *unavailable))
+   continue;
+
+ if (!best_rename)
+   return new_reg;
+
  /* In the first pass, we force the renaming of registers that
 don't belong to PREFERRED_CLASS to registers that do, even
 though the latters were used not very long ago.  */
- if (check_new_reg_p (old_reg, new_reg, this_head,
-  *unavailable)
- && ((pass == 0
-  && !TEST_HARD_REG_BIT (reg_class_contents[preferred_class],
- best_new_reg))
- || tick[best_new_reg] > tick[new_reg]))
+ if ((pass == 0
+ && !TEST_HARD_REG_BIT (reg_class_contents[preferred_class],
+best_new_reg))
+ || tick[best_new_reg] > tick[new_reg])
best_new_reg = new_reg;
}
   if (pass == 0 &

RE: [PATCH, contrib] Reduce check_GNU_style noise

2014-12-04 Thread Thomas Preud'homme

Ping?

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Friday, November 28, 2014 3:02 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH, contrib] Reduce check_GNU_style noise
> 
> Currently check_GNU_style.sh gives the error "There should be exactly
> one space between function name and parentheses." for the following
> kind of lines: tab[(int) idx]
> 
> This patch changes the check to only warn if there is 0 or 2+ space(s)
> between an alphanumeric character and an opening parenthesis, rather
> than 2+ spaces or anything else than a single space (which also was
> redundant).
> 
> With the change, above lines are now not warned about but other
> incorrect lines are still reported.
> 
> ChangeLog entry is as follows:
> 
> *** contrib/ChangeLog ***
> 
> 2014-11-28  Thomas Preud'homme  
> 
> * check_GNU_style.sh: Warn for incorrect number of spaces in function
> call only if 0 or 2+ spaces found.
> 
> 
> diff --git a/contrib/check_GNU_style.sh b/contrib/check_GNU_style.sh
> index ef8fdda..5f90190 100755
> --- a/contrib/check_GNU_style.sh
> +++ b/contrib/check_GNU_style.sh
> @@ -113,7 +113,7 @@ g 'Sentences should end with a dot.  Dot, space, space, end of the comment.' \
>  '[[:alnum:]][[:blank:]]*\*/' $*
> 
>  vg 'There should be exactly one space between function name and parentheses.' \
> -'\#define' '[[:alnum:]]([^[:blank:]]|[[:blank:]]{2,})\(' $*
> +'\#define' '[[:alnum:]]([[:blank:]]{2,})?\(' $*
> 
>  g 'There should be no space before closing parentheses.' \
>  '[[:graph:]][[:blank:]]+\)' $*
> 
> Is this ok for trunk?
> 
> Best regards,
> 
> Thomas
> 
> 
>
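
The effect of the pattern change can be checked with POSIX `regcomp`/`regexec`. This is a sketch under the assumption that the script's patterns behave as POSIX extended regular expressions; `line_matches`, `old_pattern` and `new_pattern` are names invented for the example, and the filtering of `#define` lines done by the script is not modelled.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* The two extended regular expressions from the quoted diff.  */
const char *const old_pattern
  = "[[:alnum:]]([^[:blank:]]|[[:blank:]]{2,})\\(";
const char *const new_pattern
  = "[[:alnum:]]([[:blank:]]{2,})?\\(";

/* Return true if the POSIX extended regex PATTERN matches anywhere
   in LINE.  A pattern that fails to compile counts as no match.  */
bool
line_matches (const char *pattern, const char *line)
{
  regex_t re;
  bool matched;

  assert (pattern != NULL && line != NULL);
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  matched = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return matched;
}
```

With these patterns a line such as `x = tab[(int) idx];` is flagged by the old expression but not by the new one, while `foo(x);` and `foo  (x);` are still flagged and `foo (x);` is accepted.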

Re: [Patch, libstdc++/64140] Fix unmatched prefix in regex

2014-12-04 Thread Jonathan Wakely


On 03/12/14 20:34 -0800, Tim Shen wrote:

Committed. Does it need to be backported to 4.9? When do we usually do
the backporting?


It depends. If the patch is a bit risky sometimes we'll leave it on
the trunk for a while before backporting in case any problems arise.

I think this change is safe to go on the branch right away.

RE: [PATCH, contrib] Reduce check_GNU_style noise

2014-12-04 Thread Richard Biener

On Thu, 4 Dec 2014, Thomas Preud'homme wrote:

> Ping?

Ok.

Thanks,
Richard.

> > -Original Message-
> > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> > ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> > Sent: Friday, November 28, 2014 3:02 PM
> > To: gcc-patches@gcc.gnu.org
> > Subject: [PATCH, contrib] Reduce check_GNU_style noise
> > 
> > Currently check_GNU_style.sh gives the error "There should be exactly
> > one space between function name and parentheses." for the following
> > kind of lines: tab[(int) idx]
> > 
> > This patch changes the check to only warn if there is 0 or 2+ space(s)
> > between an alphanumeric character and an opening parenthesis, rather
> > than 2+ spaces or anything else than a single space (which also was
> > redundant).
> > 
> > With the change, above lines are now not warned about but other
> > incorrect lines are still reported.
> > 
> > ChangeLog entry is as follows:
> > 
> > *** contrib/ChangeLog ***
> > 
> > 2014-11-28  Thomas Preud'homme  
> > 
> > * check_GNU_style.sh: Warn for incorrect number of spaces in function
> > call only if 0 or 2+ spaces found.
> > 
> > 
> > diff --git a/contrib/check_GNU_style.sh b/contrib/check_GNU_style.sh
> > index ef8fdda..5f90190 100755
> > --- a/contrib/check_GNU_style.sh
> > +++ b/contrib/check_GNU_style.sh
> > @@ -113,7 +113,7 @@ g 'Sentences should end with a dot.  Dot, space, space, end of the comment.' \
> >  '[[:alnum:]][[:blank:]]*\*/' $*
> > 
> >  vg 'There should be exactly one space between function name and parentheses.' \
> > -'\#define' '[[:alnum:]]([^[:blank:]]|[[:blank:]]{2,})\(' $*
> > +'\#define' '[[:alnum:]]([[:blank:]]{2,})?\(' $*
> > 
> >  g 'There should be no space before closing parentheses.' \
> >  '[[:graph:]][[:blank:]]+\)' $*
> > 
> > Is this ok for trunk?
> > 
> > Best regards,
> > 
> > Thomas
> > 
> > 
> > 
> 
> 
> 
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284
(AG Nuernberg)
Maxfeldstrasse 5, 90409 Nuernberg, Germany

Re: [PATCH] Temporarily handle COMPOUND_EXPR in convert_* (PR c++/56493)

2014-12-04 Thread Richard Biener

On Wed, 3 Dec 2014, Jakub Jelinek wrote:

> Hi!
> 
> As discussed in the PR, ideally this should be done in the middle-end,
> but no fix materialized for it during the last 18 months and is unlikely
> to happen for GCC 5 either, so this patch just restores what the FEs used
> to produce.  Bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?

Ok.  Might want to backport this as well.

Thanks,
Richard.

> 2014-12-03  Jakub Jelinek  
> 
>   PR c++/56493
>   * convert.c (convert_to_real, convert_to_integer, convert_to_complex):
>   Handle COMPOUND_EXPR.
> 
>   * c-c++-common/pr56493.c: New test.
> 
> --- gcc/convert.c.jj  2014-11-11 00:06:25.016769050 +0100
> +++ gcc/convert.c 2014-12-03 11:22:16.193539558 +0100
> @@ -99,6 +99,15 @@ convert_to_real (tree type, tree expr)
>enum built_in_function fcode = builtin_mathfn_code (expr);
>tree itype = TREE_TYPE (expr);
>  
> +  if (TREE_CODE (expr) == COMPOUND_EXPR)
> +{
> +  tree t = convert_to_real (type, TREE_OPERAND (expr, 1));
> +  if (t == TREE_OPERAND (expr, 1))
> + return expr;
> +  return build2_loc (EXPR_LOCATION (expr), COMPOUND_EXPR, TREE_TYPE (t),
> +  TREE_OPERAND (expr, 0), t);
> +}
> +
>/* Disable until we figure out how to decide whether the functions are
>   present in runtime.  */
>/* Convert (float)sqrt((double)x) where x is float into sqrtf(x) */
> @@ -406,6 +415,15 @@ convert_to_integer (tree type, tree expr
>return error_mark_node;
>  }
>  
> +  if (ex_form == COMPOUND_EXPR)
> +{
> +  tree t = convert_to_integer (type, TREE_OPERAND (expr, 1));
> +  if (t == TREE_OPERAND (expr, 1))
> + return expr;
> +  return build2_loc (EXPR_LOCATION (expr), COMPOUND_EXPR, TREE_TYPE (t),
> +  TREE_OPERAND (expr, 0), t);
> +}
> +
>/* Convert e.g. (long)round(d) -> lround(d).  */
>/* If we're converting to char, we may encounter differing behavior
>   between converting from double->char vs double->long->char.
> @@ -926,6 +944,14 @@ convert_to_complex (tree type, tree expr
>  
>   if (TYPE_MAIN_VARIANT (elt_type) == TYPE_MAIN_VARIANT (subtype))
> return expr;
> + else if (TREE_CODE (expr) == COMPOUND_EXPR)
> +   {
> + tree t = convert_to_complex (type, TREE_OPERAND (expr, 1));
> + if (t == TREE_OPERAND (expr, 1))
> +   return expr;
> + return build2_loc (EXPR_LOCATION (expr), COMPOUND_EXPR,
> +TREE_TYPE (t), TREE_OPERAND (expr, 0), t);
> +   }
>   else if (TREE_CODE (expr) == COMPLEX_EXPR)
> return fold_build2 (COMPLEX_EXPR, type,
> convert (subtype, TREE_OPERAND (expr, 0)),
> --- gcc/testsuite/c-c++-common/pr56493.c.jj   2014-12-03 11:22:16.193539558 +0100
> +++ gcc/testsuite/c-c++-common/pr56493.c  2014-12-03 11:22:16.193539558 +0100
> @@ -0,0 +1,16 @@
> +/* PR c++/56493 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-gimple" } */
> +
> +unsigned long long bar (void);
> +int x;
> +
> +void
> +foo (void)
> +{
> +  x += bar ();
> +}
> +
> +/* Verify we narrow the addition from unsigned long long to unsigned int type.  */
> +/* { dg-final { scan-tree-dump "  (\[a-zA-Z._0-9]*) = \\(unsigned int\\) \[^;\n\r]*;.*  (\[a-zA-Z._0-9]*) = \\(unsigned int\\) \[^;\n\r]*;.* = \\1 \\+ \\2;" "gimple" { target { ilp32 || lp64 } } } } */
> +/* { dg-final { cleanup-tree-dump "gimple" } } */
> 
>   Jakub
> 
> 
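
The recursion pattern shared by the three hunks can be sketched on a toy expression tree. Everything here is hypothetical (`struct toy_expr`, `toy_build`, `toy_convert`), and the leaf "conversion" is a placeholder; the point is only that the conversion is pushed into the second operand of the comma expression and the original node is reused when nothing changed. The sketch never frees its nodes.

```c
#include <assert.h>
#include <stdlib.h>

enum toy_type { TOY_INT, TOY_REAL };
enum toy_code { TOY_CONST, TOY_COMPOUND };

struct toy_expr
{
  enum toy_code code;
  enum toy_type type;          /* Type of the expression's value.  */
  struct toy_expr *op0, *op1;  /* Operands, used by TOY_COMPOUND only.  */
};

/* Allocate a new node; allocation failure aborts via assert.  */
static struct toy_expr *
toy_build (enum toy_code code, enum toy_type type,
           struct toy_expr *op0, struct toy_expr *op1)
{
  struct toy_expr *e = malloc (sizeof *e);
  assert (e != NULL);
  e->code = code;
  e->type = type;
  e->op0 = op0;
  e->op1 = op1;
  return e;
}

/* Convert EXPR to TYPE.  For "(op0, op1)" the conversion is applied
   to op1 and a new comma expression of the converted type is built.  */
struct toy_expr *
toy_convert (enum toy_type type, struct toy_expr *expr)
{
  assert (expr != NULL);
  if (expr->code == TOY_COMPOUND)
    {
      struct toy_expr *t = toy_convert (type, expr->op1);
      if (t == expr->op1)
        return expr;
      return toy_build (TOY_COMPOUND, t->type, expr->op0, t);
    }
  if (expr->type == type)
    return expr;
  /* Placeholder leaf conversion: a fresh node of the target type.  */
  return toy_build (TOY_CONST, type, NULL, NULL);
}
```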

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284
(AG Nuernberg)
Maxfeldstrasse 5, 90409 Nuernberg, Germany

[PATCH][AArch64][test] Disable vector cost model on vect_ctz_1.c test

2014-12-04 Thread Kyrill Tkachov


Hi all,

The second scan-assembler test for clz\tv\[0-9\]+\.2s FAILs on this test
due to vector costs on A57.

The vectorisation happens for cortex-a53 and thunderx.
I think this test was supposed to test the capability of vectorising clz
rather than the tuning decision of whether to (though I imagine people
can argue the other way), so this patch adds -fno-vect-cost-model to it
to get it to always vectorise.


This way the test passes on all -mcpu options.

Ok for trunk?

Thanks,
Kyrill

2014-12-04  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * gcc.target/aarch64/vect_ctz_1.c: Add -fno-vect-cost-model to
 dg-options.

Index: gcc/testsuite/gcc.target/aarch64/vect_ctz_1.c
===
--- gcc/testsuite/gcc.target/aarch64/vect_ctz_1.c	(revision 218233)
+++ gcc/testsuite/gcc.target/aarch64/vect_ctz_1.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O3 -save-temps -fno-inline" } */
+/* { dg-options "-O3 -save-temps -fno-inline -fno-vect-cost-model" } */
 
 extern void abort ();

Re: Fix folding of EQ/NE_EXPR WRT symtab aliases

2014-12-04 Thread Richard Biener

On Thu, Dec 4, 2014 at 12:28 AM, Jan Hubicka  wrote:
> Hi,
> this patch fixes fold-const folding of EQ/NE_EXPR of ADDR_EXPRs WRT aliases and
> weaks. Similarly to my earlier nonzero_address_p patch, it moves the logic
> deciding whether two symbols can resolve to the same location to symtab.
>
> The existing code did not make much sense to me, so I tried to follow what
> alias oracle does - i.e. if we know both bases are decls, but offsets are
> different, we know the addresses are different.
> If we have offsets same (or we are unsure) we can see if the bases are known
> to be same or different.
>
> One important case that the patch fixes is folding &alias == &alias_target as
> false.  This avoids late optimization from removing many of the speculatively
> inlined functions.
>
> For virtual functions I play even a bit unsafe and make alias == &alias_target
> even if alias_target is interposable but can not change semantically.
> This helps to turn speculative inlining into nonspeculative late in compilation
> and should be safe, because virtual functions are not compared for addresses in
> other contexts.
>
> Bootstrapped/regtested x86_64-linux, seems sane?
>
> Honza
>
> * c-family/c-common.c: Refuse weak on visibility change.
> * cgraph.h (symtab_node::equal_address_to): Declare.
> * fold-const.c (fold_comparison): Use equal_address_to.
> (fold_binary_loc): Likewise.
> * symtab.c (symtab_node::equal_address_to): New predicate.
>
> * gcc.dg/addr_equal-1.c: New testcase.
> Index: c-family/c-common.c
> ===
> --- c-family/c-common.c (revision 218286)
> +++ c-family/c-common.c (working copy)
> @@ -7781,7 +7781,12 @@
>  }
>else if (TREE_CODE (*node) == FUNCTION_DECL
>|| TREE_CODE (*node) == VAR_DECL)
> -declare_weak (*node);
> +{
> +  struct symtab_node *n = symtab_node::get (*node);
> +  if (n && n->refuse_visibility_changes)
> +   error ("%+D declared weak after being used", *node);
> +  declare_weak (*node);
> +}
>else
>  warning (OPT_Wattributes, "%qE attribute ignored", name);
>
> Index: cgraph.h
> ===
> --- cgraph.h(revision 218286)
> +++ cgraph.h(working copy)
> @@ -332,6 +332,11 @@
>/* Return true if symbol is known to be nonzero.  */
>bool nonzero_address ();
>
> +  /* Return 0 if symbol is known to have different address than S2,
> + Return 1 if symbol is known to have same address as S2,
> + return 2 otherwise.   */
> +  int equal_address_to (symtab_node *s2);
> +
>/* Return symbol table node associated with DECL, if any,
>   and NULL otherwise.  */
>static inline symtab_node *get (const_tree decl)
> Index: fold-const.c
> ===
> --- fold-const.c(revision 218286)
> +++ fold-const.c(working copy)
> @@ -8980,7 +8981,7 @@
> }
> }
>/* For non-equal bases we can simplify if they are addresses
> -of local binding decls or constants.  */
> +declarations with different addresses.  */
>else if (indirect_base0 && indirect_base1
>/* We know that !operand_equal_p (base0, base1, 0)
>   because the if condition was false.  But make
> @@ -8988,16 +8989,13 @@
>&& base0 != base1
>&& TREE_CODE (arg0) == ADDR_EXPR
>&& TREE_CODE (arg1) == ADDR_EXPR
> -  && (((TREE_CODE (base0) == VAR_DECL
> -|| TREE_CODE (base0) == PARM_DECL)
> -   && (targetm.binds_local_p (base0)
> -   || CONSTANT_CLASS_P (base1)))
> -  || CONSTANT_CLASS_P (base0))
> -  && (((TREE_CODE (base1) == VAR_DECL
> -|| TREE_CODE (base1) == PARM_DECL)
> -   && (targetm.binds_local_p (base1)
> -   || CONSTANT_CLASS_P (base0)))
> -  || CONSTANT_CLASS_P (base1)))
> +  && DECL_P (base0)
> +  && DECL_P (base1)
> +  /* Watch for aliases.  */
> +  && (!decl_in_symtab_p (base0)
> +  || !decl_in_symtab_p (base1)
> +  || !symtab_node::get_create (base0)->equal_address_to
> +(symtab_node::get_create (base1
> {
>   if (code == EQ_EXPR)
> return omit_two_operands_loc (loc, type, boolean_false_node,
> @@ -12248,39 +12246,6 @@
>   && code == NE_EXPR)
>  return non_lvalue_loc (loc, fold_convert_loc (loc, type, arg0));
>
> -  /* If this is an equality comparison of the address of two non-weak,
> -unaliased symbols neither of which are extern (since we do not
> -have access to attributes for externs), then we know the result.  */
> -  if (TREE_CODE (arg0) == ADDR_EXPR
> -
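
To make the three-way result of equal_address_to (0 = different, 1 = same, 2 = unknown) concrete, here is a toy model in C. It is not the symtab.c implementation: struct toy_symbol, its fields, and the toy_* functions are hypothetical names, and the rule used for interposable symbols is a deliberate simplification.

```c
#include <assert.h>
#include <stddef.h>

enum { ADDR_DIFFERENT = 0, ADDR_SAME = 1, ADDR_UNKNOWN = 2 };

/* Hypothetical stand-in for a symbol table node.  */
struct toy_symbol
{
  const char *name;
  const struct toy_symbol *alias_target; /* NULL unless this is an alias */
  int interposable;                      /* may be replaced at link time */
};

/* Follow the alias chain to the symbol that is actually defined.  */
static const struct toy_symbol *
toy_ultimate_target (const struct toy_symbol *s)
{
  while (s->alias_target != NULL)
    s = s->alias_target;
  return s;
}

/* Three-way answer in the style described in the cgraph.h comment.  */
int
toy_equal_address_to (const struct toy_symbol *a, const struct toy_symbol *b)
{
  assert (a != NULL && b != NULL);
  if (a == b)
    return ADDR_SAME;
  /* Simplification: anything interposable is treated as unknown.  */
  if (a->interposable || b->interposable)
    return ADDR_UNKNOWN;
  return toy_ultimate_target (a) == toy_ultimate_target (b)
	 ? ADDR_SAME : ADDR_DIFFERENT;
}

/* Fold "&a == &b": 1 or 0 when known, -1 when it must stay a run-time test.  */
int
toy_fold_address_eq (const struct toy_symbol *a, const struct toy_symbol *b)
{
  int r = toy_equal_address_to (a, b);
  return r == ADDR_UNKNOWN ? -1 : r;
}
```

In this model an alias and its target compare equal, two distinct non-interposable definitions compare different, and a comparison involving an interposable symbol is left unfolded.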

[PATCH x86] Enable v64qi permutations.

2014-12-04 Thread Ilya Tocar

Hi,

As discussed in https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00473.html
This patch enables v64qi permutations.
I've checked  vshuf* tests from dg-torture.exp,
with avx512* options on sde and generated permutations are correct.

OK for trunk?

---
 gcc/config/i386/i386.c | 85 ++
 gcc/config/i386/sse.md |  4 +--
 2 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index eafc15a..f29f8ce 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21831,6 +21831,10 @@ ix86_expand_vec_perm_vpermi2 (rtx target, rtx op0, rtx mask, rtx op1,
   if (TARGET_AVX512VL && TARGET_AVX512BW)
gen = gen_avx512vl_vpermi2varv16hi3;
   break;
+case V64QImode:
+  if (TARGET_AVX512VBMI)
+   gen = gen_avx512bw_vpermi2varv64qi3;
+  break;
 case V32HImode:
   if (TARGET_AVX512BW)
gen = gen_avx512bw_vpermi2varv32hi3;
@@ -48872,6 +48876,7 @@ expand_vec_perm_broadcast_1 (struct expand_vec_perm_d *d)
emit_move_insn (d->target, gen_lowpart (d->vmode, dest));
   return true;
 
+case V64QImode:
 case V32QImode:
 case V16HImode:
 case V8SImode:
@@ -48905,6 +48910,78 @@ expand_vec_perm_broadcast (struct expand_vec_perm_d *d)
   return expand_vec_perm_broadcast_1 (d);
 }
 
+/* Implement arbitrary permutations of two V64QImode operands
+   with 2 vpermi2w, 2 vpshufb and one vpor instruction.  */
+static bool
+expand_vec_perm_vpermi2_vpshub2 (struct expand_vec_perm_d *d)
+{
+  if (!TARGET_AVX512BW || !(d->vmode == V64QImode))
+return false;
+
+  if (d->testing_p)
+return true;
+
+  struct expand_vec_perm_d ds[2];
+  rtx rperm[128], vperm, target0, target1;
+  unsigned int i, nelt;
+  machine_mode vmode;
+
+  nelt = d->nelt;
+  vmode = V64QImode;
+
+  for (i = 0; i < 2; i++)
+{
+  ds[i] = *d;
+  ds[i].vmode = V32HImode;
+  ds[i].nelt = 32;
+  ds[i].target = gen_reg_rtx (V32HImode);
+  ds[i].op0 = gen_lowpart (V32HImode, d->op0);
+  ds[i].op1 = gen_lowpart (V32HImode, d->op1);
+}
+
+  /* Prepare permutations such that the first one takes care of
+ putting the even bytes into the right positions or one higher
+ positions (ds[0]) and the second one takes care of
+ putting the odd bytes into the right positions or one below
+ (ds[1]).  */
+
+  for (i = 0; i < nelt; i++)
+{
+  ds[i & 1].perm[i / 2] = d->perm[i] / 2;
+  if (i & 1)
+   {
+ rperm[i] = constm1_rtx;
+ rperm[i + 64] = GEN_INT ((i & 14) + (d->perm[i] & 1));
+   }
+  else
+   {
+ rperm[i] = GEN_INT ((i & 14) + (d->perm[i] & 1));
+ rperm[i + 64] = constm1_rtx;
+   }
+}
+
+  bool ok = expand_vec_perm_1 (&ds[0]);
+  gcc_assert (ok);
+  ds[0].target = gen_lowpart (V64QImode, ds[0].target);
+
+  ok = expand_vec_perm_1 (&ds[1]);
+  gcc_assert (ok);
+  ds[1].target = gen_lowpart (V64QImode, ds[1].target);
+
+  vperm = gen_rtx_CONST_VECTOR (V64QImode, gen_rtvec_v (64, rperm));
+  vperm = force_reg (vmode, vperm);
+  target0 = gen_reg_rtx (V64QImode);
+  emit_insn (gen_avx512bw_pshufbv64qi3 (target0, ds[0].target, vperm));
+
+  vperm = gen_rtx_CONST_VECTOR (V64QImode, gen_rtvec_v (64, rperm + 64));
+  vperm = force_reg (vmode, vperm);
+  target1 = gen_reg_rtx (V64QImode);
+  emit_insn (gen_avx512bw_pshufbv64qi3 (target1, ds[1].target, vperm));
+
+  emit_insn (gen_iorv64qi3 (d->target, target0, target1));
+  return true;
+}
+
 /* Implement arbitrary permutation of two V32QImode and V16QImode operands
with 4 vpshufb insns, 2 vpermq and 3 vpor.  We should have already failed
all the shorter instruction sequences.  */
@@ -49079,6 +49156,9 @@ ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
   if (expand_vec_perm_vpshufb2_vpermq_even_odd (d))
 return true;
 
+  if (expand_vec_perm_vpermi2_vpshub2 (d))
+return true;
+
   /* ??? Look for narrow permutations whose element orderings would
  allow the promotion to a wider mode.  */
 
@@ -49223,6 +49303,11 @@ ix86_vectorize_vec_perm_const_ok (machine_mode vmode,
/* All implementable with a single vpermi2 insn.  */
return true;
   break;
+case V64QImode:
+  if (TARGET_AVX512BW)
+   /* Implementable with 2 vpermi2, 2 vpshufb and 1 or insn.  */
+   return true;
+  break;
 case V8SImode:
 case V8SFmode:
 case V4DFmode:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index ca5d720..6252e7e 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -10678,7 +10678,7 @@
(V8SF "TARGET_AVX2") (V4DF "TARGET_AVX2")
(V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
-   (V32HI "TARGET_AVX512BW")])
+   (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512VBMI")])
 
 (define_expand "vec_perm"
   [(match_operand:VEC_PERM_AVX2 0 "register_operand")
@@ -10700,7 +10700,7 @@
(V32QI "TARGET_AVX2") (V16H
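
The decomposition used by expand_vec_perm_vpermi2_vpshub2 (two word permutes, two byte shuffles within 16-byte lanes, one OR) can be checked with a scalar model. The following C sketch is an illustration written for this purpose, not GCC code; perm_v64qi_model is a hypothetical name, and the 0x80 control byte stands in for the -1 entries that make the byte shuffle produce zero.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model: OUT[i] = (OP0 followed by OP1)[PERM[i]], PERM[i] in 0..127,
   computed the way the patch does it.  */
void
perm_v64qi_model (const uint8_t op0[64], const uint8_t op1[64],
		  const uint8_t perm[64], uint8_t out[64])
{
  uint8_t src[128], t[2][64], ctrl[2][64];

  for (int i = 0; i < 64; i++)
    {
      src[i] = op0[i];
      src[i + 64] = op1[i];
    }

  for (int i = 0; i < 64; i++)
    {
      int which = i & 1;       /* even bytes use t[0], odd bytes use t[1] */
      int word = perm[i] / 2;  /* 16-bit source word holding the wanted byte */

      assert (perm[i] < 128);

      /* Word permute: word i/2 of t[which] receives that source word.  */
      t[which][i & ~1] = src[2 * word];
      t[which][(i & ~1) + 1] = src[2 * word + 1];

      /* Byte shuffle control: pick the low or high byte of the word that
	 now sits at this position; the other shuffle yields zero here.  */
      ctrl[which][i] = (uint8_t) ((i & 14) + (perm[i] & 1));
      ctrl[which ^ 1][i] = 0x80;
    }

  for (int i = 0; i < 64; i++)
    {
      int lane = i & ~15;  /* the shuffle only indexes within a 16-byte lane */
      uint8_t r0 = (ctrl[0][i] & 0x80) ? 0 : t[0][lane + (ctrl[0][i] & 15)];
      uint8_t r1 = (ctrl[1][i] & 0x80) ? 0 : t[1][lane + (ctrl[1][i] & 15)];
      out[i] = (uint8_t) (r0 | r1);
    }
}
```

The key point is that after the word permute the wanted byte is at most one position away from its destination, so a lane-local byte shuffle is enough to finish the job.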

Re: [PATCH] Add a configurable directory prefix for dynamic linkers.

2014-12-04 Thread Richard Biener

On Thu, Dec 4, 2014 at 8:59 AM, Benda Xu  wrote:
> Hello,
>
> libc could be installed in a directory prefix. This patch provides
> a way to specify such a prefix for gcc at configuration time.
>
> I have only tested the patch with glibc on amd64, x86 and arm.
> It is logically straightforward.  Please share your thoughts on it.

I think you are supposed to use sysroots for that.

Richard.

> Cheers,
> Benda
>
> * configure.ac: add --with-dynamic-linker-prefix
> * configure, config.in: regenerated
> * *.h under config: prepend dynamic-linkers with
> DYNAMIC_LINKER_PREFIX
> ---
>  gcc/config.in  |  6 ++
>  gcc/config/aarch64/aarch64-linux.h |  3 ++-
>  gcc/config/alpha/linux-elf.h   |  4 ++--
>  gcc/config/arm/linux-eabi.h|  6 +++---
>  gcc/config/arm/linux-elf.h |  2 +-
>  gcc/config/c6x/uclinux-elf.h   |  2 +-
>  gcc/config/cris/linux.h|  2 +-
>  gcc/config/dragonfly.h |  3 ++-
>  gcc/config/freebsd-spec.h  |  4 ++--
>  gcc/config/frv/linux.h |  2 +-
>  gcc/config/i386/gnu.h  |  2 +-
>  gcc/config/i386/kfreebsd-gnu.h |  2 +-
>  gcc/config/i386/kfreebsd-gnu64.h   |  6 +++---
>  gcc/config/i386/linux.h|  2 +-
>  gcc/config/i386/linux64.h  |  6 +++---
>  gcc/config/ia64/linux.h|  2 +-
>  gcc/config/knetbsd-gnu.h   |  2 +-
>  gcc/config/kopensolaris-gnu.h  |  2 +-
>  gcc/config/linux.h | 16 
>  gcc/config/m32r/linux.h|  2 +-
>  gcc/config/m68k/linux.h|  2 +-
>  gcc/config/microblaze/linux.h  |  2 +-
>  gcc/config/mips/linux.h| 17 +
>  gcc/config/mn10300/linux.h |  2 +-
>  gcc/config/nios2/linux.h   |  2 +-
>  gcc/config/pa/pa-linux.h   |  2 +-
>  gcc/config/rs6000/freebsd64.h  |  4 ++--
>  gcc/config/rs6000/linux64.h| 10 +-
>  gcc/config/rs6000/sysv4.h  |  4 ++--
>  gcc/config/s390/linux.h|  4 ++--
>  gcc/config/sh/linux.h  |  2 +-
>  gcc/config/sparc/linux.h   |  2 +-
>  gcc/config/sparc/linux64.h |  4 ++--
>  gcc/config/xtensa/linux.h  |  2 +-
>  gcc/configure  | 19 +++
>  gcc/configure.ac   | 10 ++
>  36 files changed, 101 insertions(+), 63 deletions(-)
>
> diff --git a/gcc/config.in b/gcc/config.in
> index 65d5e42..7a3f29e 100644
> --- a/gcc/config.in
> +++ b/gcc/config.in
> @@ -63,6 +63,12 @@
>  #endif
>
>
> +/* root directory of the dynamic linker */
> +#ifndef USED_FOR_TARGET
> +#undef DYNAMIC_LINKER_PREFIX
> +#endif
> +
> +
>  /* Define if you want assertions enabled. This is a cheap check. */
>  #ifndef USED_FOR_TARGET
>  #undef ENABLE_ASSERT_CHECKING
> diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h
> index d375624..112f8db 100644
> --- a/gcc/config/aarch64/aarch64-linux.h
> +++ b/gcc/config/aarch64/aarch64-linux.h
> @@ -21,7 +21,8 @@
>  #ifndef GCC_AARCH64_LINUX_H
>  #define GCC_AARCH64_LINUX_H
>
> -#define GLIBC_DYNAMIC_LINKER "/lib/ld-linux-aarch64%{mbig-endian:_be}%{mabi=ilp32:_ilp32}.so.1"
> +#define GLIBC_DYNAMIC_LINKER DYNAMIC_LINKER_PREFIX "/lib/ld-linux-aarch64%{mbig-endian:_be}%{mabi=ilp32:_ilp32}.so.1"
> +#define GLIBC_DYNAMIC_LINKER DYNAMIC_LINKER_PREFIX "/lib/ld-linux-aarch64%{mbig-endian:_be}%{mabi=ilp32:_ilp32}.so.1"
>
>  #undef  ASAN_CC1_SPEC
>  #define ASAN_CC1_SPEC "%{%:sanitize(address):-funwind-tables}"
> diff --git a/gcc/config/alpha/linux-elf.h b/gcc/config/alpha/linux-elf.h
> index bdefe23..e6a7082 100644
> --- a/gcc/config/alpha/linux-elf.h
> +++ b/gcc/config/alpha/linux-elf.h
> @@ -23,8 +23,8 @@ along with GCC; see the file COPYING3.  If not see
>  #define EXTRA_SPECS \
>  { "elf_dynamic_linker", ELF_DYNAMIC_LINKER },
>
> -#define GLIBC_DYNAMIC_LINKER   "/lib/ld-linux.so.2"
> -#define UCLIBC_DYNAMIC_LINKER "/lib/ld-uClibc.so.0"
> +#define GLIBC_DYNAMIC_LINKER DYNAMIC_LINKER_PREFIX "/lib/ld-linux.so.2"
> +#define UCLIBC_DYNAMIC_LINKER DYNAMIC_LINKER_PREFIX "/lib/ld-uClibc.so.0"
>  #if DEFAULT_LIBC == LIBC_UCLIBC
>  #define CHOOSE_DYNAMIC_LINKER(G, U) "%{mglibc:" G ";:" U "}"
>  #elif DEFAULT_LIBC == LIBC_GLIBC
> diff --git a/gcc/config/arm/linux-eabi.h b/gcc/config/arm/linux-eabi.h
> index f1f3448..74df9b9 100644
> --- a/gcc/config/arm/linux-eabi.h
> +++ b/gcc/config/arm/linux-eabi.h
> @@ -68,11 +68,11 @@
> GLIBC_DYNAMIC_LINKER_DEFAULT and TARGET_DEFAULT_FLOAT_ABI.  */
>
>  #undef  GLIBC_DYNAMIC_LINKER
> -#define GLIBC_DYNAMIC_LINKER_SOFT_FLOAT "/lib/ld-linux.so.3"
> -#define GLIBC_DYNAMIC_LINKER_HARD_FLOAT "/lib/ld-linux-armhf.so.3"
> +#define GLIBC_DYNAMIC_LINKER_SOFT_FLOAT DYNAMIC_LINKER_PREFIX "/lib/ld-linux.so.3"
> +#define GLIBC_DYNAMIC_LINKER_HARD_FLOAT DYNAMIC_LINKER_PREFIX "/lib/ld-linux-armhf.so.3"
>  #define GLIBC_DYNAMIC_LINKER_DEFAULT GLIBC_DYNAMIC_LINKER_SOFT_FLOAT
>
> -#define GLI
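
The patch relies on C's concatenation of adjacent string literals: a prefix macro placed in front of the existing path literal yields one combined path. A minimal sketch, assuming a made-up prefix value ("/opt/example" is hypothetical, as are the two helper functions):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical value; in the patch this would come from configure.  */
#ifndef DYNAMIC_LINKER_PREFIX
#define DYNAMIC_LINKER_PREFIX "/opt/example"
#endif

/* Adjacent string literals are merged into a single literal.  */
#define GLIBC_DYNAMIC_LINKER DYNAMIC_LINKER_PREFIX "/lib/ld-linux.so.2"

const char *
glibc_dynamic_linker (void)
{
  return GLIBC_DYNAMIC_LINKER;
}

/* Return nonzero if the combined path starts with the configured prefix.  */
int
linker_has_prefix (void)
{
  return strncmp (glibc_dynamic_linker (), DYNAMIC_LINKER_PREFIX,
		  strlen (DYNAMIC_LINKER_PREFIX)) == 0;
}
```

With an empty prefix the macro expands to the unmodified path, which is why the change is a no-op by default.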

Re: [patch,doc] Remove cloog & ppl in the docs

2014-12-04 Thread Richard Biener

On Thu, Dec 4, 2014 at 10:02 AM, FX  wrote:
> The attached patch removes some remaining mentions of cloog and ppl in our docs.
> Tested with “make info html pdf”. OK for trunk?

Ok.

Thanks,
Richard.

> FX
>
>
>
> PS: with this, the remaining mentions of cloog or ppl are comments in
> config/isl.m4, graphite.c and graphite-blocking.c. They should probably go
> away too, but they are technical so someone with cloog/isl knowledge should
> do it.
>
>
>
>
>
> 2014-12-04  Francois-Xavier Coudert  
>
> * doc/install.texi: Remove mentions of cloog and ppl.
> * doc/invoke.texi: Likewise
>

Re: [PATCH fortran/linemap] Add enough column hint to fit any possible offset

2014-12-04 Thread Tobias Burnus

Hi Manuel,

thanks for rightfully nagging - I shouldn't do reviews late at night. Regarding:

Manuel López-Ibáñez wrote:
> It is still not clear to me if line_len is the length of the line read
> or not, is it? If not, is there any way to actually get the length of
> the line?

Looking at the code in load_line, the line_len in

   int trunc = load_line (input, &line, &line_len, NULL);

is the size of the buffer. That's not too bad but usually too large. However,
the actual line length is determined one line later:

   len = gfc_wide_strlen (line);

which does a strlen on gfc_char_t, which is uint32_t to accommodate Unicode
characters.

Hence, I think

   b->location
-   = linemap_line_start (line_table, current_file->line++, 120);
+   = linemap_line_start (line_table, current_file->line++, line_len);

should use "len" instead of "line_len".
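
The difference between the two values can be sketched in C. This is an illustration only: wide_char_t and wide_strlen are hypothetical stand-ins for gfc_char_t and gfc_wide_strlen, and the buffer size of 132 is an arbitrary example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t wide_char_t;  /* stand-in for gfc_char_t */

/* Length of a zero-terminated wide string, counted in characters.  */
size_t
wide_strlen (const wide_char_t *s)
{
  size_t n = 0;

  assert (s != NULL);
  while (s[n] != 0)
    n++;
  return n;
}
```

A buffer declared as wide_char_t buf[132] that holds "abc" has capacity 132 but a string length of 3; only the latter is a useful column hint.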

 * * *

> > Namely, the previous code trims the output to show only the code around the
> > error location while the common-diagnostics code shows the whole line.

I just realized that all the space indentation got lost with Thunderbird - while
it still showed it in the edit window.

See attachment for a new attempt.

Using -fmessage-length= or COLUMNS explicitly works.

Fortran uses gcc/fortran/error.c's get_terminal_width() to determine the
terminal width, while the common code uses diagnostic_set_caret_max_width,
which does not seem to work reliably.

Playing around with it, the common approach seems to work with a normal
Xterm and gnome-terminal, but it fails with MobaXterm (Windows ssh client;
shows "echo $COLUMNS" in the shell correctly, but somehow that doesn't reach
GCC) - but using "COLUMNS=80 ~/gcc/gcc-trunk/bin/gfortran" works also with
MobaXterm. Yesterday, I tried KDE's konsole and it didn't seem to work there,
either. But I cannot check as the RHEL 6 here doesn't seem to have KDE.

Thus, it seems to be enough to transfer checks from get_terminal_width
to diagnostic_set_caret_max_width to fix this issue.
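
As an illustration of the kind of check involved, here is a small C sketch that derives a width from a COLUMNS-style string and falls back to a default. It is an assumption about one reasonable approach, not the logic of get_terminal_width or diagnostic_set_caret_max_width; width_from_columns and the upper bound of 10000 are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse COLUMNS (the text of the COLUMNS variable, or NULL if unset).
   Return FALLBACK when the text is missing, malformed or out of range.  */
int
width_from_columns (const char *columns, int fallback)
{
  char *end;
  long value;

  assert (fallback > 0);
  if (columns == NULL || *columns == '\0')
    return fallback;

  value = strtol (columns, &end, 10);
  if (*end != '\0' || value <= 0 || value > 10000)
    return fallback;
  return (int) value;
}
```

A caller would pass getenv ("COLUMNS"), which matches the observation above that setting COLUMNS=80 explicitly makes the output fit.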

Tobias
Input file - all spaces, but a tab between "aaa',"  and "'bg":
---
 print *, 
'aaa',
'bgfhkg'
 end
---

Using:
   $ ~/gcc/gcc-4.9/bin/gfortran -ffree-line-length-none -pedantic test.f90

One gets with a pipe or wide terminal:

test.f90:1.145:

 print *, 
'aaa', 'bgfhkg'

 1
Warning: Nonconforming tab character at (1)

And with a small terminal:

test.f90:1.145:

t *, 'aaa', 'bgf
1
Warning: Nonconforming tab character at (1)

And with the new code, one gets for a small window:
$ ~/gcc/gcc-trunk/bin/gfortran -ffree-line-length-none -pedantic test.f90
test.f90:1:145:

  print *, 
'aaa', 'bgfhkg'

 1
Warning: Nonconforming tab character at (1) [-Wtabs]

Re: [PATCH][AARCH64]Use selected cpu's tuning when no tuning parameter is specified.

2014-12-04 Thread Marcus Shawcroft

On 27 November 2014 at 11:27, Renlin Li  wrote:

> gcc/ChangeLog:
>
> 2014-11-27  Renlin Li  
>
> * config/aarch64/aarch64.c (aarch64_parse_cpu): Don't define
> selected_tune.
> (aarch64_override_options): Use selected_cpu's tuning.
>

OK and this is also broken in 4.9, could you prepare a backport please. /Marcus

RE: [PATCH] Add a configurable directory prefix for dynamic linkers.

2014-12-04 Thread Matthew Fortune

Richard Biener   writes:
> On Thu, Dec 4, 2014 at 8:59 AM, Benda Xu  wrote:
> > Hello,
> >
> > libc could be installed in a directory prefix. This patch provides a
> > way to specify such a prefix for gcc at configuration time.
> >
> > I have only tested the patch with glibc on amd64, x86 and arm.
> > It is logically straightforward.  Please share your thoughts on it.
> 
> I think you are supposed to use sysroots for that.

This patch would appear to allow the ABI defined locations of a dynamic
linker to be changed, which does not seem like a good idea. I.e. my
understanding is that the combination of arch+ABI+C library indicates an
ABI defined absolute path to the dynamic linker as well as the name of the
dynamic linker. I'd welcome any correction to that from someone who
lives and breathes this stuff.

I am under the impression that installing glibc with a prefix effectively
breaks the ABI paths and is primarily for testing and/or creating
sysroots for cross compilers and so forth. Such sysroots must be
installed to a target before being used, or with qemu user-mode there is
an option to say where your sysroot is so that the dynamic linker can
be located from within that path.

While I believe the dynamic linker paths are hard coded, the library
locations are more complex, especially when multilibs (MULTILIB_OSDIRNAMES)
and/or multiarch get layered on top.

I'd love to see a description of how all these pieces are intended to
sit together as I too am looking at this area for MIPS sysroots. Perhaps
that is a topic for libc-alpha though.

Thanks,
Matthew

> 
> > Cheers,
> > Benda
> >
> > * configure.ac: add --with-dynamic-linker-prefix
> > * configure, config.in: regenerated
> > * *.h under config: prepend dynamic-linkers with
> > DYNAMIC_LINKER_PREFIX
> > ---
> >  gcc/config.in  |  6 ++
> >  gcc/config/aarch64/aarch64-linux.h |  3 ++-
> >  gcc/config/alpha/linux-elf.h   |  4 ++--
> >  gcc/config/arm/linux-eabi.h|  6 +++---
> >  gcc/config/arm/linux-elf.h |  2 +-
> >  gcc/config/c6x/uclinux-elf.h   |  2 +-
> >  gcc/config/cris/linux.h|  2 +-
> >  gcc/config/dragonfly.h |  3 ++-
> >  gcc/config/freebsd-spec.h  |  4 ++--
> >  gcc/config/frv/linux.h |  2 +-
> >  gcc/config/i386/gnu.h  |  2 +-
> >  gcc/config/i386/kfreebsd-gnu.h |  2 +-
> >  gcc/config/i386/kfreebsd-gnu64.h   |  6 +++---
> >  gcc/config/i386/linux.h|  2 +-
> >  gcc/config/i386/linux64.h  |  6 +++---
> >  gcc/config/ia64/linux.h|  2 +-
> >  gcc/config/knetbsd-gnu.h   |  2 +-
> >  gcc/config/kopensolaris-gnu.h  |  2 +-
> >  gcc/config/linux.h | 16 
> >  gcc/config/m32r/linux.h|  2 +-
> >  gcc/config/m68k/linux.h|  2 +-
> >  gcc/config/microblaze/linux.h  |  2 +-
> >  gcc/config/mips/linux.h| 17 +
> >  gcc/config/mn10300/linux.h |  2 +-
> >  gcc/config/nios2/linux.h   |  2 +-
> >  gcc/config/pa/pa-linux.h   |  2 +-
> >  gcc/config/rs6000/freebsd64.h  |  4 ++--
> >  gcc/config/rs6000/linux64.h| 10 +-
> >  gcc/config/rs6000/sysv4.h  |  4 ++--
> >  gcc/config/s390/linux.h|  4 ++--
> >  gcc/config/sh/linux.h  |  2 +-
> >  gcc/config/sparc/linux.h   |  2 +-
> >  gcc/config/sparc/linux64.h |  4 ++--
> >  gcc/config/xtensa/linux.h  |  2 +-
> >  gcc/configure  | 19 +++
> >  gcc/configure.ac   | 10 ++
> >  36 files changed, 101 insertions(+), 63 deletions(-)
> >
> > diff --git a/gcc/config.in b/gcc/config.in
> > index 65d5e42..7a3f29e 100644
> > --- a/gcc/config.in
> > +++ b/gcc/config.in
> > @@ -63,6 +63,12 @@
> >  #endif
> >
> >
> > +/* root directory of the dynamic linker */
> > +#ifndef USED_FOR_TARGET
> > +#undef DYNAMIC_LINKER_PREFIX
> > +#endif
> > +
> > +
> >  /* Define if you want assertions enabled. This is a cheap check. */
> >  #ifndef USED_FOR_TARGET
> >  #undef ENABLE_ASSERT_CHECKING
> > diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h
> > index d375624..112f8db 100644
> > --- a/gcc/config/aarch64/aarch64-linux.h
> > +++ b/gcc/config/aarch64/aarch64-linux.h
> > @@ -21,7 +21,8 @@
> >  #ifndef GCC_AARCH64_LINUX_H
> >  #define GCC_AARCH64_LINUX_H
> >
> > -#define GLIBC_DYNAMIC_LINKER "/lib/ld-linux-aarch64%{mbig-endian:_be}%{mabi=ilp32:_ilp32}.so.1"
> > +#define GLIBC_DYNAMIC_LINKER DYNAMIC_LINKER_PREFIX "/lib/ld-linux-aarch64%{mbig-endian:_be}%{mabi=ilp32:_ilp32}.so.1"
> > +#define GLIBC_DYNAMIC_LINKER DYNAMIC_LINKER_PREFIX "/lib/ld-linux-aarch64%{mbig-endian:_be}%{mabi=ilp32:_ilp32}.so.1"
> >
> >  #undef  ASAN_CC1_SPEC
> >  #define ASAN_CC1_SPEC "%{%:sanitize(address):-funwind-tables}"
> > diff --git a/gcc/config/alpha/linux-elf.h
> > b/gcc/config/alpha/linux-elf.h index bdefe23..e6a7082 10

Re: [Build, Graphite] Support ISL-0.14 with GCC 4.9

2014-12-04 Thread Richard Biener

On Wed, Dec 3, 2014 at 4:24 PM, Tobias Burnus
 wrote:
> Dear all,
>
> the attached patches permit an in-tree build of GCC 4.9 using
> CLooG with ISL-0.14 backend and ISL-0.14 - as wished by Richard.
>
> The CLooG patches have been extracted from the CLooG git repository
> and have been included as courtesy for Linux distribution people.
>
>
> I do not intend to backport this patch to 4.8, but my hope would be
> that they apply without any or with only minor modifications.
>
>
> I have bootstrapped GCC with an in-tree-build of ISL and CLooG.
> I haven't done any intensive tests, but given that the CLooG patches
> come from CLooG's git repository, the #include changes are obvious, and
> the single code change comes from the trunk, it really should work.
>
> OK for the 4.9 branch?

Just trying to verify stuff - I see cloog testsuite failures with ISL 0.14
and your patch:

[  242s] Check file ./reservoir/QR.cloog \c
[  242s] (no option), \c
[  242s] generating... \c
[  243s] --- cloog_temp 2014-12-04 10:08:32.189015995 +
[  243s] +++ ./reservoir/QR.c 2013-10-11 07:27:03.0 +
[  243s] @@ -1,12 +1,6 @@
[  243s] -/* Generated from ./reservoir/QR.cloog by CLooG 0.18.1-UNKNOWN gmp bits in 0.25s. */
[  243s] +/* Generated from ../../../git/cloog/test/./reservoir/QR.cloog by CLooG 0.14.0-136-gb91ef26 gmp bits in 0.21s. */
[  243s]  if (N >= 1) {
[  243s]S1(0);
[  243s] -  if ((M <= 0) && (N >= 2)) {
[  243s] -S3(0);
[  243s] -S10(0);
[  243s] -S1(1);
[  243s] -S5(0);
[  243s] -  }
[  243s]if ((M >= 1) && (N == 1)) {
[  243s]  for (c4=0;c4<=M-1;c4++) {
[  243s]S2(0,c4);
[  243s] @@ -35,6 +29,12 @@
[  243s]  S1(1);
[  243s]  S5(0);
[  243s]}
[  243s] +  if ((M <= 0) && (N >= 2)) {
[  243s] +S3(0) ;
[  243s] +S10(0) ;
[  243s] +S1(1) ;
[  243s] +S5(0) ;
[  243s] +  }
[  243s]for (c2=2;c2<=min(M,N-1);c2++) {
[  243s]  for (c4=c2-1;c4<=N-1;c4++) {
[  243s]S6(c2-2,c4);
[  243s] FAIL: ./reservoir/QR.c is not the same

and

[  245s] Check file ./isl/jacobi-shared.cloog \c
[  245s] (options -f 4 -l -1 -override -strides 1 -sh 1 ), \c
[  245s] generating... \c
[  246s] --- cloog_temp 2014-12-04 10:08:36.081015995 +
[  246s] +++ ./isl/jacobi-shared.c 2013-10-11 07:27:03.0 +
[  246s] @@ -3,7 +3,7 @@
[  246s]if ((16*floord(t0-1,16) >= -N+g1+t0+1) && (16*floord(g1+t0-3,16) >= -N+g1+t0+1) && (32*floord(t1-1,32) >= -N+g2+t1+1) && (32*floord(g2+t1-3,32) >= t1-32)) {
[  246s]  for (c0=max(-16*floord(t0-1,16)+t0,-16*floord(g1+t0-3,16)+t0);c0<=min(32,N-g1-1);c0+=16) {
[  246s]for (c1=-32*floord(t1-1,32)+t1;c1<=min(32,N-g2-1);c1+=32) {
[  246s] -if (c1 >= 1) {
[  246s] +if ((c1 >= 1) && (c1 <= 32)) {
[  246s]S1(c0+g1-1,c1+g2-1);
[  246s]  }
[  246s]}
[  246s] FAIL: ./isl/jacobi-shared.c is not the same

they do not appear when using ISL 0.12.2.  I suppose the fails
might be harmless?  Are there other patches to cloog that fix those
that you have not sent?

So far I've tested that the patched gcc 4.9 still works with old cloog/isl.

Richard.

> NOTE: I have only tested with ISL 0.14; I have no idea whether it works
> with ISL 0.13. However, given that one reason for the patch is to make
> it possible to use a system ISL for both GCC 5 and GCC (4.8/)4.9,
> supporting only 0.12.2 and 0.14 should be sufficient: ISL 0.14 is
> required to avoid a testsuite ICE with GCC 5 and if one really wants
> to use an older version, 0.12.2 should be fine.
>
> Tobias

[PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting

2014-12-04 Thread Jiong Wang


For PR62173, the ideal solution is to resolve the problem in the tree-level
ivopts pass.

However, apart from the tree-level issue, PR 62173 also exposed two RTL-level
issues.  One of them is that we could improve RTL-level loop invariant hoisting
by re-shuffling insns.

For Seb's testcase:

void bar(int i) {
  char A[10];
  int d = 0;
  while (i > 0)
  A[d++] = i--;

  while (d > 0)
  foo(A[d--]);
}

the insn sequence that calculates A[I]'s address looks like:

(insn 76 75 77 22 (set (reg/f:DI 109)
  (plus:DI (reg/f:DI 64 sfp)
  (reg:DI 108 [ i ]))) seb-pop.c:8 84 {*adddi3_aarch64}
  (expr_list:REG_DEAD (reg:DI 108 [ i ])
  (nil)))
(insn 77 76 78 22 (set (reg:SI 110 [ D.2633 ])
  (zero_extend:SI (mem/j:QI (plus:DI (reg/f:DI 109)
  (const_int -16 [0xfff0])) [0 A S1 A8]))) seb-pop.c:8 76 
{*zero_extendqisi2_aarch64}
  (expr_list:REG_DEAD (reg/f:DI 109)
  (nil)))

Since reg + reg addressing is typical for most RISC architectures, we can
re-shuffle the instruction sequence into the following:

(insn 96 94 97 22 (set (reg/f:DI 129)
  (plus:DI (reg/f:DI 64 sfp)
  (const_int -16 [0xfff0]))) seb-pop.c:8 84 {*adddi3_aarch64}
  (nil))
(insn 97 96 98 22 (set (reg:DI 130 [ i ])
  (sign_extend:DI (reg/v:SI 97 [ i ]))) seb-pop.c:8 70 {*extendsidi2_aarch64}
  (expr_list:REG_DEAD (reg/v:SI 97 [ i ])
  (nil)))
(insn 98 97 99 22 (set (reg:SI 131 [ D.2633 ])
  (zero_extend:SI (mem/j:QI (plus:DI (reg/f:DI 129)
  (reg:DI 130 [ i ])) [0 A S1 A8]))) seb-pop.c:8 76 {*zero_extendqisi2_aarch64}
  (expr_list:REG_DEAD (reg:DI 130 [ i ])
  (expr_list:REG_DEAD (reg/f:DI 129)
  (nil

This means re-associating the constant immediate with the virtual frame pointer.

That is, transform

 RA <- fixed_reg + RC
 RD <- MEM (RA + const_offset)

  into:

 RA <- fixed_reg + const_offset
 RD <- MEM (RA + RC)

Then RA <- fixed_reg + const_offset is actually loop invariant, so the later
RTL GCSE PRE pass can catch it and do the hoisting, and thus ameliorate what
tree-level ivopts could not sort out.
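
At the source level, the effect of the re-association can be sketched in C as follows. This is an illustration of the idea only, not what the patch emits: the function names are hypothetical, "frame" plays the role of the fixed register, and the caller must make sure that all the pointers formed stay inside one array.

```c
#include <assert.h>
#include <stddef.h>

/* Before: the fixed base is combined with the loop variable first, and
   the constant offset is applied in the memory access.  Nothing in the
   address computation is loop invariant.  */
int
sum_before (const unsigned char *frame, ptrdiff_t offset, int n)
{
  int sum = 0;

  assert (frame != NULL);
  for (int i = 0; i < n; i++)
    {
      const unsigned char *ra = frame + i;  /* RA <- fixed_reg + RC */
      sum += *(ra + offset);                /* RD <- MEM (RA + const_offset) */
    }
  return sum;
}

/* After: the fixed base is combined with the constant, which does not
   depend on the loop variable and can be computed once.  */
int
sum_after (const unsigned char *frame, ptrdiff_t offset, int n)
{
  int sum = 0;
  const unsigned char *ra;

  assert (frame != NULL);
  ra = frame + offset;                      /* RA <- fixed_reg + const_offset */
  for (int i = 0; i < n; i++)
    sum += *(ra + i);                       /* RD <- MEM (RA + RC) */
  return sum;
}
```

Both functions read the same bytes; the second form simply exposes the invariant address computation so that it can be moved out of the loop.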

This patch only tries to re-shuffle instructions within a single basic block
that is an inner loop, which is performance critical.

I am reusing the loop info in fwprop because loop info is available there and
it runs before GCSE.

Verified on aarch64 and mips64: the array base address is hoisted out of the loop.

Bootstrap OK on x86-64 and aarch64.

comments?

thanks.

gcc/
  PR62173
  * fwprop.c (prepare_for_gcse_pre): New function.
  (fwprop_done): Call it.
diff --git a/gcc/fwprop.c b/gcc/fwprop.c
index 377b33c..b2a5918 100644
--- a/gcc/fwprop.c
+++ b/gcc/fwprop.c
@@ -1399,6 +1399,133 @@ forward_propagate_into (df_ref use)
   return false;
 }
 
+/* Loop invariant variable hoisting for critical code has
+   important impact on the performance.
+
+   The RTL GCSE PRE pass could detect more hoisting opportunities
+   if we re-shuffle the instructions to associate fixed registers
+   with constant.
+
+   This function try to transform
+
+ RA <- RB_fixed + RC
+ RD <- MEM (RA + const_offset)
+
+  into:
+
+ RA <- RB_fixed + const_offset
+ RD <- MEM (RA + RC)
+
+  If RA is DEAD after the second instruction.
+
+  After this change, the first instruction is loop invariant.  */
+
+static void
+prepare_for_gcse_pre ()
+{
+  struct loop *loop;
+
+  if (! current_loops)
+return;
+
+  FOR_EACH_LOOP (loop, LI_INCLUDE_ROOT)
+{
+  if (loop && loop->header && loop->latch
+	  && loop->header->index == loop->latch->index)
+	{
+	  rtx_insn *insn, *next_insn;
+	  rtx single_set1, single_set2, old_dest;
+	  rtx op0, op0_;
+	  rtx op1, op1_;
+	  rtx inner;
+	  rtx *mem_plus_loc;
+
+	  basic_block bb = BASIC_BLOCK_FOR_FN (cfun, loop->header->index);
+
+	  FOR_BB_INSNS (bb, insn)
+	{
+	  if (! NONDEBUG_INSN_P (insn))
+		continue;
+
+	  single_set1 = single_set (insn);
+
+	  if (! single_set1
+		  || GET_CODE (SET_SRC (single_set1)) != PLUS)
+		continue;
+
+	  old_dest = SET_DEST (single_set1);
+	  op0 = XEXP (SET_SRC (single_set1), 0);
+	  op1 = XEXP (SET_SRC (single_set1), 1);
+
+	  if (op1 == frame_pointer_rtx
+		  || op1 == stack_pointer_rtx
+		  || op1 == virtual_stack_vars_rtx)
+		std::swap (op0, op1);
+
+	  if (! (REG_P (old_dest) && REG_P (op0) && REG_P (op1)
+		 && (op0 == frame_pointer_rtx
+			 || op0 == stack_pointer_rtx
+			 || op0 == virtual_stack_vars_rtx)))
+		continue;
+
+	  if (! (next_insn = next_real_insn (insn)))
+		break;
+
+	  do
+		{
+		  if (DEBUG_INSN_P (next_insn))
+		continue;
+
+		  single_set2 = single_set (next_insn);
+
+		  if (!single_set2 || ! REG_P (SET_DEST (single_set2)))
+		continue;
+
+		  inner = SET_SRC (single_set2);
+
+		  if (GET_CODE (inner) == ZERO_EXTEND
+		  || GET_CODE (inner) == SIGN_EXTEND
+		  || GET_CODE (inner) == TRUNCATE)
+		inner = XEXP (inner, 0);
+
+		  if (! MEM_P (inner)
+		  || GET_CODE (XEXP (inner, 0)) != PLUS)
+		continue;
+
+		  mem_plus_loc = &XEXP (inner, 0);
+		  op0_ = XEXP (XEXP (inner, 0), 0);
+

Re: [Patch] Improving jump-thread pass for PR 54742

2014-12-04 Thread Sebastian Pop

Sebastian Pop wrote:
> Sebastian Pop wrote:
> > a fail I have not seen in the past:
> > 
> > FAIL: gcc.c-torture/compile/pr27571.c   -Os  (internal compiler error)
> > 
> > I am still investigating why this fails: as far as I can see for now this is
> > because in copying the FSM path we create an internal loop that is then
> > discovered by the loop verifier as a natural loop and is not yet in the existing
> > loop structures.  I will try to fix this in duplicate_seme by invalidating the
> > loop structure after we code generated all the FSM paths.  I will submit an
> > updated patch when it passes regtest.
> 
> We need at least this patch to fix the fail:
> 
> @@ -2518,6 +2518,7 @@ thread_through_all_blocks (bool may_peel_loop_headers)
>   if (duplicate_seme_region (entry, exit, region, len - 1, NULL))
>     {
>       /* We do not update dominance info.  */
>       free_dominance_info (CDI_DOMINATORS);
>       bitmap_set_bit (threaded_blocks, entry->src->index);
> +     retval = true;
>     }
> 
> And this will trigger in the end of the code gen function:
> 
>  if (retval)
> loops_state_set (LOOPS_NEED_FIXUP);
> 
> That will fix the loop structures.  I'm testing this patch on top of the one I
> have just sent out.

This passed bootstrap and regression test on x86_64-linux.

[PATCH] Mark explicit decls as implicit when we've seen a prototype

2014-12-04 Thread Richard Biener


Currently even when I prototype

double exp10 (double);

this function is not available to optimizers for code generation if
they just check for builtin_decl_implicit (BUILT_IN_EXP10).
Curiously, though, the function is identified as BUILT_IN_EXP10 when
used, so the middle-end assumes it has the expected exp10
semantics.  I see we already cheat with stpcpy and make it available
to optimizers by marking it implicit when we've seen a prototype.

The following patch proposes to do that for all builtins.

At least I can't see how interpreting exp10 as exp10 but then
not being allowed to use it as exp10 is sensible.

Now one could argue that declaring exp10 doesn't mean there is
an implementation available at link time (after all declaring
exp10 doesn't mean I use exp10 anywhere).  But that's true
for implicit decls as well - I might declare 'pow', not use
it and not link against libm.  So if the compiler now emits
a call to pow my link will break:

extern double pow (double, double);
double x;
int main ()
{
  return x*x*x*x*x*x*x*x*x*x*x*x*x*x*x*x*x*x;
}

links fine with -O0 but fails to link with -Os -ffast-math where
I have to supply -lm.

So the following patch extends the stpcpy assumption to all builtins.
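
To make the explicit/implicit distinction concrete, here is a minimal toy model.  It is only a sketch: the toy_* names and the table are hypothetical and do not reflect GCC's actual data structures; they just mirror the roles of builtin_decl_explicit_p and set_builtin_decl_implicit_p in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for GCC's builtin bookkeeping.  These names are
   hypothetical and only mirror the idea, not GCC's data structures.  */
enum toy_builtin { TOY_STPCPY, TOY_EXP10, TOY_POW, TOY_END };

struct toy_builtin_info
{
  bool explicit_p;  /* The compiler knows the builtin's semantics.  */
  bool implicit_p;  /* Optimizers may emit calls to it on their own.  */
};

static struct toy_builtin_info toy_info[TOY_END] = {
  [TOY_STPCPY] = { true, false },
  [TOY_EXP10]  = { true, false },
  [TOY_POW]    = { true, true },
};

/* Old behaviour: only stpcpy is promoted when a prototype is seen.  */
void
toy_seen_prototype_old (enum toy_builtin fn)
{
  assert (fn < TOY_END);
  if (fn == TOY_STPCPY && toy_info[fn].explicit_p)
    toy_info[fn].implicit_p = true;
}

/* Proposed behaviour: any explicit builtin is promoted.  */
void
toy_seen_prototype_new (enum toy_builtin fn)
{
  assert (fn < TOY_END);
  if (toy_info[fn].explicit_p)
    toy_info[fn].implicit_p = true;
}

/* What an optimizer would ask before emitting a call on its own.  */
bool
toy_may_emit_call (enum toy_builtin fn)
{
  assert (fn < TOY_END);
  return toy_info[fn].implicit_p;
}
```

In this model, seeing a prototype for exp10 changes nothing under the old rule, while under the new rule toy_may_emit_call (TOY_EXP10) becomes true.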

Ok after bootstrap / regtest?

Thanks,
Richard.

2014-12-04  Richard Biener  

c/
* c-decl.c (merge_decls): For explicit builtin functions mark
them as implicit when we have seen a compatible prototype.

cp/
* decl.c (duplicate_decls): For explicit builtin functions mark
them as implicit when we have seen a compatible prototype.

Index: gcc/c/c-decl.c
===
--- gcc/c/c-decl.c  (revision 218343)
+++ gcc/c/c-decl.c  (working copy)
@@ -2563,18 +2563,11 @@ merge_decls (tree newdecl, tree olddecl,
  if (DECL_BUILT_IN_CLASS (newdecl) == BUILT_IN_NORMAL)
{
  enum built_in_function fncode = DECL_FUNCTION_CODE (newdecl);
- switch (fncode)
-   {
- /* If a compatible prototype of these builtin functions
-is seen, assume the runtime implements it with the
-expected semantics.  */
-   case BUILT_IN_STPCPY:
- if (builtin_decl_explicit_p (fncode))
-   set_builtin_decl_implicit_p (fncode, true);
- break;
-   default:
- break;
-   }
+ /* If a compatible prototype of a builtin function
+is seen, assume the runtime implements it with the
+expected semantics.  */
+ if (builtin_decl_explicit_p (fncode))
+   set_builtin_decl_implicit_p (fncode, true);
}
}
  else
Index: gcc/cp/decl.c
===
--- gcc/cp/decl.c   (revision 218343)
+++ gcc/cp/decl.c   (working copy)
@@ -2288,18 +2288,11 @@ duplicate_decls (tree newdecl, tree oldd
  if (DECL_BUILT_IN_CLASS (newdecl) == BUILT_IN_NORMAL)
{
  enum built_in_function fncode = DECL_FUNCTION_CODE (newdecl);
- switch (fncode)
-   {
- /* If a compatible prototype of these builtin functions
-is seen, assume the runtime implements it with the
-expected semantics.  */
-   case BUILT_IN_STPCPY:
- if (builtin_decl_explicit_p (fncode))
-   set_builtin_decl_implicit_p (fncode, true);
- break;
-   default:
- break;
-   }
+ /* If a compatible prototype of a builtin function
+is seen, assume the runtime implements it with the
+expected semantics.  */
+ if (builtin_decl_explicit_p (fncode))
+   set_builtin_decl_implicit_p (fncode, true);
}
}
   if (new_defines_function)

Re: [Patch] Improving jump-thread pass for PR 54742

2014-12-04 Thread Sebastian Pop

Jeff Law wrote:
> I'm a bit worried about compile-time impacts of the all the
> recursion

I will also restrict the recursion to the loop in which we look for the FSM
thread, like this:

diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index a6fb361..9a153bb 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -959,13 +959,17 @@ thread_around_empty_blocks (edge taken_edge,
 
 /* Return true if the CFG contains at least one path from START_BB to END_BB.
When a path is found, record in PATH the blocks from END_BB to START_BB.
-   VISITED_BBS is used to make sure we don't fall into an infinite loop.  */
+   VISITED_BBS is used to make sure we don't fall into an infinite loop.  Bound
+   the recursion to basic blocks belonging to LOOP.  */
 
 static bool
 fsm_find_thread_path (basic_block start_bb, basic_block end_bb,
  vec *&path,
- hash_set *visited_bbs, int n_insns)
+ hash_set *visited_bbs, loop_p loop)
 {
+  if (loop != start_bb->loop_father)
+return false;
+
   if (start_bb == end_bb)
 {
   vec_safe_push (path, start_bb);
@@ -977,7 +981,7 @@ fsm_find_thread_path (basic_block start_bb, basic_block end_bb,
   edge e;
   edge_iterator ei;
   FOR_EACH_EDGE (e, ei, start_bb->succs)
-   if (fsm_find_thread_path (e->dest, end_bb, path, visited_bbs, n_insns))
+   if (fsm_find_thread_path (e->dest, end_bb, path, visited_bbs, loop))
  {
vec_safe_push (path, start_bb);
return true;
@@ -1035,7 +1039,8 @@ fsm_find_control_statement_thread_paths (tree expr,
{
  hash_set *visited_bbs = new hash_set;
 
- if (fsm_find_thread_path (var_bb, e->src, next_path, visited_bbs, 0))
+ if (fsm_find_thread_path (var_bb, e->src, next_path, visited_bbs,
+   e->src->loop_father))
++e_count;
 
  delete visited_bbs;
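
The effect of the new early exit can be illustrated outside of GCC with a small depth-first search.  This is a sketch under stated assumptions: struct toy_cfg, its successor table, and toy_find_path are hypothetical and stand in for basic_block, FOR_EACH_EDGE, and fsm_find_thread_path.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_BLOCKS 8
#define TOY_MAX_SUCCS 3

/* Hypothetical miniature CFG: succ[b] lists the successors of block b,
   terminated by -1; loop_of[b] is the id of the loop containing b.  */
struct toy_cfg
{
  int succ[TOY_MAX_BLOCKS][TOY_MAX_SUCCS];
  int loop_of[TOY_MAX_BLOCKS];
};

/* Return true if a path from START to END exists that stays inside LOOP.
   PATH receives the blocks from END back to START and *LEN their count.  */
bool
toy_find_path (const struct toy_cfg *cfg, int start, int end, int loop,
               bool visited[TOY_MAX_BLOCKS], int path[TOY_MAX_BLOCKS],
               int *len)
{
  assert (start >= 0 && start < TOY_MAX_BLOCKS);

  /* The added bound: never walk out of the loop being threaded.  */
  if (cfg->loop_of[start] != loop)
    return false;

  if (start == end)
    {
      path[(*len)++] = start;
      return true;
    }

  /* The visited set keeps the search from cycling forever.  */
  if (visited[start])
    return false;
  visited[start] = true;

  for (int i = 0; i < TOY_MAX_SUCCS && cfg->succ[start][i] >= 0; i++)
    if (toy_find_path (cfg, cfg->succ[start][i], end, loop,
                       visited, path, len))
      {
        path[(*len)++] = start;
        return true;
      }

  return false;
}
```

A successor that belongs to another loop is rejected immediately, so the search only explores blocks of the loop passed in by the caller.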

Re: [PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting

2014-12-04 Thread Richard Biener

On Thu, Dec 4, 2014 at 12:00 PM, Jiong Wang  wrote:
> For PR62173, the ideal solution is to resolve the problem on tree level
> ivopt pass.
>
> Apart from the tree level issue, PR 62173 also exposed another two
> RTL level issues.
> One of them is that we could apparently improve RTL level loop invariant
> hoisting by re-shuffling insns.
>
> for Seb's testcase
>
> void bar(int i) {
>   char A[10];
>   int d = 0;
>   while (i > 0)
>   A[d++] = i--;
>
>   while (d > 0)
>   foo(A[d--]);
> }
>
> the insn sequences to calculate A[I]'s address looks like:
>
> (insn 76 75 77 22 (set (reg/f:DI 109)
>   (plus:DI (reg/f:DI 64 sfp)
>   (reg:DI 108 [ i ]))) seb-pop.c:8 84 {*adddi3_aarch64}
>   (expr_list:REG_DEAD (reg:DI 108 [ i ])
>   (nil)))
> (insn 77 76 78 22 (set (reg:SI 110 [ D.2633 ])
>   (zero_extend:SI (mem/j:QI (plus:DI (reg/f:DI 109)
>   (const_int -16 [0xfff0])) [0 A S1 A8]))) seb-pop.c:8 76
> {*zero_extendqisi2_aarch64}
>   (expr_list:REG_DEAD (reg/f:DI 109)
>   (nil)))
>
> while for most RISC archs, reg + reg addressing is typical, so if we
> re-shuffle the instruction sequences into the following:
>
> (insn 96 94 97 22 (set (reg/f:DI 129)
>   (plus:DI (reg/f:DI 64 sfp)
>   (const_int -16 [0xfff0]))) seb-pop.c:8 84 {*adddi3_aarch64}
>   (nil))
> (insn 97 96 98 22 (set (reg:DI 130 [ i ])
>   (sign_extend:DI (reg/v:SI 97 [ i ]))) seb-pop.c:8 70
> {*extendsidi2_aarch64}
>   (expr_list:REG_DEAD (reg/v:SI 97 [ i ])
>   (nil)))
> (insn 98 97 99 22 (set (reg:SI 131 [ D.2633 ])
>   (zero_extend:SI (mem/j:QI (plus:DI (reg/f:DI 129)
>   (reg:DI 130 [ i ])) [0 A S1 A8]))) seb-pop.c:8 76
> {*zero_extendqisi2_aarch64}
>   (expr_list:REG_DEAD (reg:DI 130 [ i ])
>   (expr_list:REG_DEAD (reg/f:DI 129)
>   (nil))))
>
> which means re-associating the constant immediate with the virtual frame
> pointer.
>
> transform
>
>  RA <- fixed_reg + RC
>  RD <- MEM (RA + const_offset)
>
>   into:
>
>  RA <- fixed_reg + const_offset
>  RD <- MEM (RA + RC)
>
> then RA <- fixed_reg + const_offset is actually loop invariant, so the later
> RTL GCSE PRE pass could catch it and do the hoisting, and thus ameliorate
> what tree level ivopts could not sort out.
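
The transform quoted above can be mimicked at the C level.  This is only an illustration of the re-association idea, not of what the patch does on RTL; the function names and the frame/offset parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Before: the constant offset is applied last, so the whole address
   depends on the loop counter and nothing is loop invariant.  */
long
toy_sum_before (const char *frame, ptrdiff_t offset, int n)
{
  assert (n >= 0);
  long sum = 0;
  for (int i = 0; i < n; i++)
    {
      const char *ra = frame + i;   /* RA <- fixed_reg + RC */
      sum += *(ra + offset);        /* RD <- MEM (RA + const_offset) */
    }
  return sum;
}

/* After: fixed_reg + const_offset no longer depends on the counter,
   so it can be computed once outside the loop.  */
long
toy_sum_after (const char *frame, ptrdiff_t offset, int n)
{
  assert (n >= 0);
  const char *ra = frame + offset;  /* RA <- fixed_reg + const_offset */
  long sum = 0;
  for (int i = 0; i < n; i++)
    sum += *(ra + i);               /* RD <- MEM (RA + RC) */
  return sum;
}
```

Both functions read the same bytes; only the order in which the address is assembled differs, which is what exposes the invariant part.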

There is a LIM pass after gimple ivopts - if the invariantness is already
visible there why not handle it there similar to the special-cases in
rewrite_bittest and rewrite_reciprocal?

And of course similar tricks could be applied on the RTL level to
RTL invariant motion?

Thanks,
Richard.

> This patch only tries to re-shuffle instructions within a single basic
> block that forms an inner loop, which is performance critical.
>
> I am reusing the loop info in fwprop because loop info is available there
> and fwprop runs before GCSE.
>
> verified on aarch64 and mips64, the array base address hoisted out of loop.
>
> bootstrap ok on x86-64 and aarch64.
>
> comments?
>
> thanks.
>
> gcc/
>   PR62173
>   fwprop.c (prepare_for_gcse_pre): New function.
>   (fwprop_done): Call it.

Re: [PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting

2014-12-04 Thread Richard Biener

On Thu, Dec 4, 2014 at 12:07 PM, Richard Biener
 wrote:
> On Thu, Dec 4, 2014 at 12:00 PM, Jiong Wang  wrote:
>> For PR62173, the ideal solution is to resolve the problem on tree level
>> ivopt pass.
>>
>> Apart from the tree level issue, PR 62173 also exposed another two
>> RTL level issues.
>> One of them is that we could apparently improve RTL level loop invariant
>> hoisting by re-shuffling insns.
>>
>> for Seb's testcase
>>
>> void bar(int i) {
>>   char A[10];
>>   int d = 0;
>>   while (i > 0)
>>   A[d++] = i--;
>>
>>   while (d > 0)
>>   foo(A[d--]);
>> }
>>
>> the insn sequences to calculate A[I]'s address looks like:
>>
>> (insn 76 75 77 22 (set (reg/f:DI 109)
>>   (plus:DI (reg/f:DI 64 sfp)
>>   (reg:DI 108 [ i ]))) seb-pop.c:8 84 {*adddi3_aarch64}
>>   (expr_list:REG_DEAD (reg:DI 108 [ i ])
>>   (nil)))
>> (insn 77 76 78 22 (set (reg:SI 110 [ D.2633 ])
>>   (zero_extend:SI (mem/j:QI (plus:DI (reg/f:DI 109)
>>   (const_int -16 [0xfff0])) [0 A S1 A8]))) seb-pop.c:8 76
>> {*zero_extendqisi2_aarch64}
>>   (expr_list:REG_DEAD (reg/f:DI 109)
>>   (nil)))
>>
>> while for most RISC archs, reg + reg addressing is typical, so if we
>> re-shuffle the instruction sequences into the following:
>>
>> (insn 96 94 97 22 (set (reg/f:DI 129)
>>   (plus:DI (reg/f:DI 64 sfp)
>>   (const_int -16 [0xfff0]))) seb-pop.c:8 84 {*adddi3_aarch64}
>>   (nil))
>> (insn 97 96 98 22 (set (reg:DI 130 [ i ])
>>   (sign_extend:DI (reg/v:SI 97 [ i ]))) seb-pop.c:8 70
>> {*extendsidi2_aarch64}
>>   (expr_list:REG_DEAD (reg/v:SI 97 [ i ])
>>   (nil)))
>> (insn 98 97 99 22 (set (reg:SI 131 [ D.2633 ])
>>   (zero_extend:SI (mem/j:QI (plus:DI (reg/f:DI 129)
>>   (reg:DI 130 [ i ])) [0 A S1 A8]))) seb-pop.c:8 76
>> {*zero_extendqisi2_aarch64}
>>   (expr_list:REG_DEAD (reg:DI 130 [ i ])
>>   (expr_list:REG_DEAD (reg/f:DI 129)
>>   (nil))))
>>
>> which means re-associating the constant immediate with the virtual frame
>> pointer.
>>
>> transform
>>
>>  RA <- fixed_reg + RC
>>  RD <- MEM (RA + const_offset)
>>
>>   into:
>>
>>  RA <- fixed_reg + const_offset
>>  RD <- MEM (RA + RC)
>>
>> then RA <- fixed_reg + const_offset is actually loop invariant, so the later
>> RTL GCSE PRE pass could catch it and do the hoisting, and thus ameliorate
>> what tree level ivopts could not sort out.
>
> There is a LIM pass after gimple ivopts - if the invariantness is already
> visible there why not handle it there similar to the special-cases in
> rewrite_bittest and rewrite_reciprocal?
>
> And of course similar tricks could be applied on the RTL level to
> RTL invariant motion?

Oh, and the patch misses a testcase.

> Thanks,
> Richard.
>
>> This patch only tries to re-shuffle instructions within a single basic
>> block that forms an inner loop, which is performance critical.
>>
>> I am reusing the loop info in fwprop because loop info is available there
>> and fwprop runs before GCSE.
>>
>> verified on aarch64 and mips64, the array base address hoisted out of loop.
>>
>> bootstrap ok on x86-64 and aarch64.
>>
>> comments?
>>
>> thanks.
>>
>> gcc/
>>   PR62173
>>   fwprop.c (prepare_for_gcse_pre): New function.
>>   (fwprop_done): Call it.

Re: [PATCH] Mark explicit decls as implicit when we've seen a prototype

2014-12-04 Thread Alexander Monakov

On Thu, 4 Dec 2014, Richard Biener wrote:

> 
> Currently even when I prototype
> 
> double exp10 (double);
> 
> this function is not available to optimizers for code generation if
> they just check for builtin_decl_implicit (BUILT_IN_EXP10).
> Curiously, though, the function is identified as BUILT_IN_EXP10 when
> used, so the middle-end assumes it has the expected exp10
> semantics.  I see we already cheat with stpcpy and make it available
> to optimizers by marking it implicit when we've seen a prototype.
> 
> The following patch proposes to do that for all builtins.
> 
> At least I can't see how interpreting exp10 as exp10 but then
> not being allowed to use it as exp10 is sensible.
> 
> Now one could argue that declaring exp10 doesn't mean there is
> an implementation available at link time (after all declaring
> exp10 doesn't mean I use exp10 anywhere).  But that's true
> for implicit decls as well - I might declare 'pow', not use
> it and not link against libm.  So if the compiler now emits
> a call to pow my link will break:
> 
> extern double pow (double, double);
> double x;
> int main ()
> {
>   return x*x*x*x*x*x*x*x*x*x*x*x*x*x*x*x*x*x;
> }
> 
> links fine with -O0 but fails to link with -Os -ffast-math where
> I have to supply -lm.
> 
> So the following patch extends the stpcpy assumption to all builtins.

I think it would be nice if -ffreestanding could be used to disable this kind
of transformations.  Compare with folding a loop to memset in a libc function
implementing memset itself.

Thanks.

Alexander

Re: [Build, Graphite] Support ISL-0.14 with GCC 4.9

2014-12-04 Thread Richard Biener

On Wed, Dec 3, 2014 at 4:24 PM, Tobias Burnus
 wrote:
> Dear all,
>
> the attached patches permit an in-tree build of GCC 4.9 using
> CLooG with ISL-0.14 backend and ISL-0.14 - as wished by Richard.
>
> The CLooG patches have been extracted from the CLooG git repository
> and have been included as courtesy for Linux distribution people.
>
>
> I do not intend to backport this patch to 4.8, but my hope would be
> that they apply without any or with only minor modifications.
>
>
> I have bootstrapped GCC with an in-tree-build of ISL and CLooG.
> I haven't done any intensive tests, but given that the CLooG patches
> come from CLooG's git repository, the #include changes are obvious, and
> the single code change comes from the trunk, it really should work.
>
> OK for the 4.9 branch?

I think for a system 0.14 ISL build which I just checked you need to
adjust the ISL version check as well.

Thanks,
Richard.

>
> NOTE: I have only tested with ISL 0.14; I have no idea whether it works
> with ISL 0.13. However, given that one reason for the patch is to make
> it possible to use a system ISL for both GCC 5 and GCC (4.8/)4.9,
> supporting only 0.12.2 and 0.14 should be sufficient: ISL 0.14 is
> required to avoid a testsuite ICE with GCC 5 and if one really wants
> to use an older version, 0.12.2 should be fine.
>
> Tobias

Re: [PATCH x86] Enable v64qi permutations.

2014-12-04 Thread H.J. Lu

On Thu, Dec 4, 2014 at 1:49 AM, Ilya Tocar  wrote:
> Hi,
>
> As discussed in https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00473.html
> This patch enables v64qi permutations.
> I've checked  vshuf* tests from dg-torture.exp,
> with avx512* options on sde and generated permutations are correct.
>
> OK for trunk?
>

Can you add a few testcases?


-- 
H.J.

Re: [PATCH x86] Enable v64qi permutations.

2014-12-04 Thread Jakub Jelinek

On Thu, Dec 04, 2014 at 03:54:25AM -0800, H.J. Lu wrote:
> On Thu, Dec 4, 2014 at 1:49 AM, Ilya Tocar  wrote:
> > Hi,
> >
> > As discussed in https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00473.html
> > This patch enables v64qi permutations.
> > I've checked  vshuf* tests from dg-torture.exp,
> > with avx512* options on sde and generated permutations are correct.
> >
> > OK for trunk?
> >
> 
> Can you add a few testcases?

Isn't it already covered by gcc.dg/torture/vshuf* ?

Jakub

Re: [PATCH x86] Enable v64qi permutations.

2014-12-04 Thread H.J. Lu

On Thu, Dec 4, 2014 at 3:57 AM, Jakub Jelinek  wrote:
> On Thu, Dec 04, 2014 at 03:54:25AM -0800, H.J. Lu wrote:
>> On Thu, Dec 4, 2014 at 1:49 AM, Ilya Tocar  wrote:
>> > Hi,
>> >
>> > As discussed in https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00473.html
>> > This patch enables v64qi permutations.
>> > I've checked  vshuf* tests from dg-torture.exp,
>> > with avx512* options on sde and generated permutations are correct.
>> >
>> > OK for trunk?
>> >
>>
>> Can you add a few testcases?
>
> Isn't it already covered by gcc.dg/torture/vshuf* ?
>

I didn't see them fail on my machines today.


-- 
H.J.

Re: [PATCH x86] Enable v64qi permutations.

2014-12-04 Thread Jakub Jelinek

On Thu, Dec 04, 2014 at 04:00:27AM -0800, H.J. Lu wrote:
> >> Can you add a few testcases?
> >
> > Isn't it already covered by gcc.dg/torture/vshuf* ?
> >
> 
> I didn't see them fail on my machines today.

Those are executable testcases, those better should not fail.
The patch just improved code generation and the testcases test
if the improved code generation works well.
Did you mean some scan-assembler test that verifies the better code
generation?  Guess it is possible, though fragile.

Jakub

[PINGv2][PATCH] Ignore alignment by option

2014-12-04 Thread Marat Zakirov



On 11/27/2014 05:14 PM, Marat Zakirov wrote:


On 11/19/2014 06:01 PM, Marat Zakirov wrote:

Hi all!

Here is the patch which forces ASan to ignore alignment of memory 
access. It increases ASan overhead but it's still useful because some 
programs like linux kernel often cheat with alignment which may cause 
false negatives.
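
To illustrate the kind of false negative meant here, the following toy model contrasts a check that assumes the access fits in one 8-byte shadow granule with a check that tests the first and last byte separately.  It is a sketch only: the shadow encoding follows the usual ASan convention as I understand it, and the toy_* names, the tiny shadow array, and the setter are hypothetical rather than taken from asan.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy shadow map: one signed byte per 8-byte granule.
   0 means fully addressable, 1..7 means only that many leading bytes
   are addressable, negative means the whole granule is poisoned.  */
#define TOY_GRANULES 4
static int8_t toy_shadow[TOY_GRANULES];

void
toy_set_shadow (size_t granule, int8_t value)
{
  assert (granule < TOY_GRANULES);
  toy_shadow[granule] = value;
}

/* Check an access that is assumed to lie within a single granule,
   which is what a sufficiently aligned access guarantees.  */
bool
toy_bad_one_granule (uintptr_t addr, size_t size)
{
  assert (addr / 8 < TOY_GRANULES);
  int8_t k = toy_shadow[addr >> 3];
  return k != 0 && (int) ((addr & 7) + size - 1) >= k;
}

/* Alignment-agnostic check: test the first and the last byte separately,
   so an access straddling two granules consults both shadow bytes.  */
bool
toy_bad_any_alignment (uintptr_t addr, size_t size)
{
  assert (size >= 1);
  return toy_bad_one_granule (addr, 1)
         || toy_bad_one_granule (addr + size - 1, 1);
}
```

With granule 0 addressable and granule 1 poisoned, a 4-byte access at offset 6 passes the single-granule check but is caught by the two-ended one.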


--Marat




gcc/ChangeLog:

2014-11-14  Marat Zakirov  

	* asan.c (asan_expand_check_ifn): Ignore alignment by option.  
	* doc/invoke.texi: Document.  
	* params.def (asan-alignment-optimize): New.  
	* params.h: Likewise.  

gcc/testsuite/ChangeLog:

2014-11-14  Marat Zakirov  

	* c-c++-common/asan/red-align-3.c: New test.  


diff --git a/gcc/asan.c b/gcc/asan.c
index 79dede7..4f86088 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -2518,6 +2518,12 @@ asan_expand_check_ifn (gimple_stmt_iterator *iter, bool use_calls)
 
   HOST_WIDE_INT size_in_bytes
 = is_scalar_access && tree_fits_shwi_p (len) ? tree_to_shwi (len) : -1;
+  
+  if (!ASAN_ALIGNMENT_OPTIMIZE && size_in_bytes > 1)
+{
+  size_in_bytes = -1;
+  align = 1;
+}
 
   if (use_calls)
 {
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 13270bc..8f43c06 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10580,6 +10580,12 @@ is greater or equal to this number, use callbacks instead of inline checks.
 E.g. to disable inline code use
 @option{--param asan-instrumentation-with-call-threshold=0}.
 
+@item asan-alignment-optimize
+Enable asan optimization for aligned accesses.
+It is enabled by default when using @option{-fsanitize=address} option.
+To disable optimization for aligned accesses use 
+@option{--param asan-alignment-optimize=0}.
+
 @item chkp-max-ctor-size
 Static constructors generated by Pointer Bounds Checker may become very
 large and significantly increase compile time at optimization level
diff --git a/gcc/params.def b/gcc/params.def
index d2d2add..fbccf46 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1114,6 +1114,11 @@ DEFPARAM (PARAM_ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD,
  "in function becomes greater or equal to this number",
  7000, 0, INT_MAX)
 
+DEFPARAM (PARAM_ASAN_ALIGNMENT_OPTIMIZE,
+ "asan-alignment-optimize",
+ "Enable asan optimization for aligned access",
+ 1, 0, 1)
+
 DEFPARAM (PARAM_UNINIT_CONTROL_DEP_ATTEMPTS,
 	  "uninit-control-dep-attempts",
 	  "Maximum number of nested calls to search for control dependencies "
diff --git a/gcc/params.h b/gcc/params.h
index 4779e17..e2973d4 100644
--- a/gcc/params.h
+++ b/gcc/params.h
@@ -238,5 +238,7 @@ extern void init_param_values (int *params);
   PARAM_VALUE (PARAM_ASAN_USE_AFTER_RETURN)
 #define ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD \
   PARAM_VALUE (PARAM_ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD)
+#define ASAN_ALIGNMENT_OPTIMIZE \
+  PARAM_VALUE (PARAM_ASAN_ALIGNMENT_OPTIMIZE)
 
 #endif /* ! GCC_PARAMS_H */
diff --git a/gcc/testsuite/c-c++-common/asan/ignore_align.c b/gcc/testsuite/c-c++-common/asan/ignore_align.c
new file mode 100644
index 000..989958b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/asan/ignore_align.c
@@ -0,0 +1,34 @@
+/* { dg-options "-fdump-tree-sanopt --param asan-alignment-optimize=0" } */
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+void *memset (void *, int, __SIZE_TYPE__);
+#ifdef __cplusplus
+}
+#endif
+
+struct dummy {
+  int a;
+  int b;
+  int c;
+  int d;
+};
+
+volatile struct dummy * new_p;
+volatile struct dummy * old_p; 
+
+void foo(void)
+{
+  *(volatile char *)(0x12);
+  *(volatile short int *)(0x12);
+  *(volatile unsigned int *)(0x12);
+  *(volatile unsigned long long *)(0x12);
+  *new_p = *old_p;
+}
+
+/* { dg-final { scan-tree-dump-times ">> 3" 11 "sanopt" } } */
+/* { dg-final { scan-tree-dump-times "& 7" 11 "sanopt" } } */
+/* { dg-final { cleanup-tree-dump "sanopt" } } */

Re: [PATCH] Add a configurable directory prefix for dynamic linkers.

2014-12-04 Thread heroxbd

Hi Matthew, Richard,

Matthew Fortune  writes:

> Richard Biener   writes:
>> On Thu, Dec 4, 2014 at 8:59 AM, Benda Xu  wrote:
>> > Hello,
>> >
>> > libc could be installed in a directory prefix. This patch provides a
>> > way to specify such a prefix for gcc at configuration time.
>> >
>> > I have only tested the patch with glibc on amd64, x86 and arm.
>> > It is logically straightforward.  Please share your thoughts on it.
>> 
>> I think you are supposed to use sysroots for that.

@Richard, I could not.  The sysroot does not affect location of the
dynamic loader.  There are use cases (see below) where the dynamic
loader is at e.g. for x86 DL_PREFIX/lib/ld-linux.so.2.

> This patch would appear to allow the ABI defined locations of a dynamic
> linker to be changed which does not seem like a good idea. I.e. My
> understanding is that the combination of arch+ABI+Clibrary indicates an
> ABI defined absolute path to the dynamic linker as well as name of the
> dynamic linker. I'd welcome any correction to that from someone who
> lives and breathes this stuff.

> I am under the impression that installing glibc with a prefix effectively
> breaks the ABI paths and is primarily for testing and/or creating
> sysroots for cross compilers and such forth. Such sysroots must be
> installed to a target before being used or with qemu user-mode there is
> an option to say where your sysroot is so that the dynamic linker can
> be located from within that path.

Thank you for pointing out the ABI specification.

  https://refspecs.linuxbase.org/LSB_1.2.0/IA32/spec/dynamiclinking.html

  The LSB specifies the Program Interpreter to be

/lib/ld-lsb.so.1

You are right.

But after all, it is the *INTERP* entry in the ELF program headers that
encodes the dynamic linker (output of readelf -l /bin/ls on i386):

Program Headers:
  Type   Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  INTERP 0x000154 0x08048154 0x08048154 0x00013 0x00013 R   0x1
  [Requesting program interpreter: /lib/ld-linux.so.2]

This is not for cross compiling, and not for a chroot.  The compiled
programs are supposed to be loaded by the dynamic linker at
DL_PREFIX/lib/ld-linux.so.2.  The use cases are

  1. Compiling new software on shared computing clusters as normal
 users.

     Often the host glibc, for example glibc-2.5 from RHEL 5, is too
     old to build new programs such as >=openssl-0.9.8.  The
     feasible solution is to *run* a glibc and its dynamic linker inside
     the user's own home directory.

  2. A LTS environment insensitive to underlying infrastructure for
 dedicated software deployment.

 So that the environment could be copied to the subdir of a large
 variety of distributions, and run off-the-shelf with their own
 glibc and libstdc++.

  3. Running GNU userland on Android-based mobile devices.

     The rootfs of Android is a volatile initramfs.  We had better put
     the system in a directory other than /.

https://www.gentoo.org/proj/en/gentoo-alt/prefix/usecases.xml
https://wiki.gentoo.org/wiki/Project:Android

> While I believe the dynamic linker paths are hard coded, the library
> Locations are more complex especially when multilibs (MULTILIB_OSDIRNAMES)
> and/or multiarch gets layered on top.

You are right, multilib and multiarch are about library locations.  And
they have nothing to do with the location of dynamic linker.  On
multilib, the dynamic linkers are as-is, e.g. amd64/x86 system has both
/lib64/ld-linux-x86-64.so.2 and /lib/ld-linux.so.2.

On multiarch, dynamic linkers are symlinked to the traditional
locations:

  $ ls -l /lib/ld-linux.so.2 
  lrwxrwxrwx 1 root root 25 Oct 23 19:37 /lib/ld-linux.so.2 -> i386-linux-gnu/ld-2.19.so

> I'd love to see a description of how all these pieces are intended to
> sit together as I too am looking at this area for MIPS sysroots. Perhaps
> that is a topic for libc-alpha though.

If the target is the rootfs of a MIPS embedded system, then it is a
classical sysroot use case ;)

Cheers,
Benda

Re: [PING^5][PATCH] Warn about unclosed pragma omp declare target.

2014-12-04 Thread Ilya Tocar

Ping.
On 19 Nov 16:34, Ilya Tocar wrote:
> As omp target and offloading support is committed to trunk,
> I think it's reasonable to add some new warnings.
> 
> On 06 Nov 15:27, Ilya Tocar wrote:
> > Ping.
> > On 30 Oct 18:31, Ilya Tocar wrote:
> > > Ping.
> > > On 20 Oct 19:26, Ilya Tocar wrote:
> > > > Ping.
> > > > 
> > > > On 02 Oct 17:38, Ilya Tocar wrote:
> > > > > Ping.
> > > > > On 15 Aug 16:26, Ilya Tocar wrote:
> > > > > > Ping.
> > > > > > 
> > > > > > On 29 Jul 18:45, Ilya Tocar wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > As discussed here in 
> > > > > > > https://gcc.gnu.org/ml/gcc/2014-01/msg00189.html
> > > > > > > Gcc should complain about pragma omp declare target without
> > > > > > > corresponding pragma omp end declare target. This patch adds a 
> > > > > > > warning
> > > > > > > for those cases.
> > > > > > > Bootstraps/passes make-check.
> > > > > > > Ok for trunk?
> > > > > > > 
> > > > > > > ChangeLog:
> > > > > > > 
> > > > > > > 2014-07-29  Ilya Tocar  
> > > > > > > 
> > > > > > >   * c-decl.c (omp_declare_target_location_stack): New.
> > > > > > >   * c-lang.h (omp_declare_target_location_stack): Declare.
> > > > > > >   * c-parser.c (warn_unclosed_pragma_omp_target): New.
> > > > > > >   (c_parser_translation_unit): Call it.
> > > > > > >   (c_parser_omp_declare_target): Remember location.
> > > > > > >   (c_parser_omp_end_declare_target): Forget location.
> > > > > > > 
> > > > > > > And ChangeLog for testsuite:
> > > > > > > 
> > > > > > > 2014-07-29  Ilya Tocar  
> > > > > > > 
> > > > > > >   * gcc.dg/gomp/target-3.c: New testcase.
> > > > > > > 
> > > > > > > ---
> > > > > > >  gcc/c/c-decl.c   |  3 +++
> > > > > > >  gcc/c/c-lang.h   |  3 +++
> > > > > > >  gcc/c/c-parser.c | 22 +-
> > > > > > >  gcc/testsuite/gcc.dg/gomp/target-3.c | 33 
> > > > > > > +
> > > > > > >  4 files changed, 60 insertions(+), 1 deletion(-)
> > > > > > >  create mode 100644 gcc/testsuite/gcc.dg/gomp/target-3.c
> > > > > > > 
> > > > > > > diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> > > > > > > index 2a4b439..2dd5b2c 100644
> > > > > > > --- a/gcc/c/c-decl.c
> > > > > > > +++ b/gcc/c/c-decl.c
> > > > > > > @@ -158,6 +158,9 @@ enum machine_mode c_default_pointer_mode = 
> > > > > > > VOIDmode;
> > > > > > >  /* If non-zero, implicit "omp declare target" attribute is added 
> > > > > > > into the
> > > > > > > attribute lists.  */
> > > > > > >  int current_omp_declare_target_attribute;
> > > > > > > +
> > > > > > > +/* Holds locations of currently open "omp declare target" 
> > > > > > > pragmas.  */
> > > > > > > +vec omp_declare_target_location_stack;
> > > > > > >  
> > > > > > >  /* Each c_binding structure describes one binding of an 
> > > > > > > identifier to
> > > > > > > a decl.  All the decls in a scope - irrespective of namespace 
> > > > > > > - are
> > > > > > > diff --git a/gcc/c/c-lang.h b/gcc/c/c-lang.h
> > > > > > > index e974906..cef995c 100644
> > > > > > > --- a/gcc/c/c-lang.h
> > > > > > > +++ b/gcc/c/c-lang.h
> > > > > > > @@ -59,4 +59,7 @@ struct GTY(()) language_function {
> > > > > > > attribute lists.  */
> > > > > > >  extern GTY(()) int current_omp_declare_target_attribute;
> > > > > > >  
> > > > > > > +/* Holds locations of currently open "omp declare target" 
> > > > > > > pragmas.  */
> > > > > > > +extern vec omp_declare_target_location_stack;
> > > > > > > +
> > > > > > >  #endif /* ! GCC_C_LANG_H */
> > > > > > > diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
> > > > > > > index e32bf04..0b96fe9 100644
> > > > > > > --- a/gcc/c/c-parser.c
> > > > > > > +++ b/gcc/c/c-parser.c
> > > > > > > @@ -1255,6 +1255,8 @@ static bool c_parser_cilk_verify_simd 
> > > > > > > (c_parser *, enum pragma_context);
> > > > > > >  static tree c_parser_array_notation (location_t, c_parser *, 
> > > > > > > tree, tree);
> > > > > > >  static tree c_parser_cilk_clause_vectorlength (c_parser *, tree, 
> > > > > > > bool);
> > > > > > >  
> > > > > > > +static void warn_unclosed_pragma_omp_target ();
> > > > > > > +
> > > > > > >  /* Parse a translation unit (C90 6.7, C99 6.9).
> > > > > > >  
> > > > > > > translation-unit:
> > > > > > > @@ -1290,6 +1292,8 @@ c_parser_translation_unit (c_parser *parser)
> > > > > > >   }
> > > > > > >while (c_parser_next_token_is_not (parser, CPP_EOF));
> > > > > > >  }
> > > > > > > +
> > > > > > > +  warn_unclosed_pragma_omp_target ();
> > > > > > >  }
> > > > > > >  
> > > > > > >  /* Parse an external declaration (C90 6.7, C99 6.9).
> > > > > > > @@ -13068,8 +13072,10 @@ c_finish_omp_declare_simd (c_parser 
> > > > > > > *parser, tree fndecl, tree parms,
> > > > > > >  static void
> > > > > > >  c_parser_omp_declare_target (c_parser *parser)
> > > > > > >  {
> > > > > > > +  location_t loc = c_parser_peek_token (parser)->location;
> > > > > > >c_parser_skip_to_pragma_eol (parser);
> > > > >

[PATCH] Finish up string builtin folding move

2014-12-04 Thread Richard Biener


This is the last patch, moving stpcpy folding.  There are still
string function foldings left but those are exclusively GENERIC
now with no chance of inadvertently recursing from GIMPLE folding
via GENERIC folding back to GIMPLE folding.  I'll deal with those
during next stage1.
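
For readers unfamiliar with the folding being moved: the fold_builtin_stpcpy code removed below rewrites stpcpy (dest, src) with a constant-length SRC into a memcpy of length + 1 bytes and returns dest + length.  A minimal C-level sketch of that equivalence follows; toy_folded_stpcpy is a hypothetical helper, not a GCC function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper showing the shape of the folded result: when the
   length LEN of SRC is known, stpcpy (dest, src) can be expressed as
   memcpy (dest, src, LEN + 1) followed by dest + LEN.  */
char *
toy_folded_stpcpy (char *dest, const char *src, size_t len)
{
  assert (strlen (src) == len);
  memcpy (dest, src, len + 1);   /* Copy the string and its terminator.  */
  return dest + len;             /* Points at the copied '\0'.  */
}
```

The returned pointer addresses the terminating NUL in DEST, which is what distinguishes stpcpy from strcpy.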

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2014-12-04  Richard Biener  

* builtins.c (fold_builtin_0): Remove unused ignore parameter.
(fold_builtin_1): Likewise.
(fold_builtin_3): Likewise.
(fold_builtin_varargs): Likewise.
(fold_builtin_2): Likewise.  Do not fold stpcpy here.
(fold_builtin_n): Adjust.
(fold_builtin_stpcpy): Move to gimple-fold.c.
(gimple_fold_builtin_stpcpy): Moved and gimplified from builtins.c.
(gimple_fold_builtin): Fold stpcpy here.

Index: trunk/gcc/builtins.c
===
*** trunk.orig/gcc/builtins.c   2014-12-04 09:44:01.583387589 +0100
--- trunk/gcc/builtins.c2014-12-04 12:50:13.254000789 +0100
*** static tree fold_builtin_fabs (location_
*** 191,201 
  static tree fold_builtin_abs (location_t, tree, tree);
  static tree fold_builtin_unordered_cmp (location_t, tree, tree, tree, enum 
tree_code,
enum tree_code);
! static tree fold_builtin_0 (location_t, tree, bool);
! static tree fold_builtin_1 (location_t, tree, tree, bool);
! static tree fold_builtin_2 (location_t, tree, tree, tree, bool);
! static tree fold_builtin_3 (location_t, tree, tree, tree, tree, bool);
! static tree fold_builtin_varargs (location_t, tree, tree*, int, bool);
  
  static tree fold_builtin_strpbrk (location_t, tree, tree, tree);
  static tree fold_builtin_strstr (location_t, tree, tree, tree);
--- 191,201 
  static tree fold_builtin_abs (location_t, tree, tree);
  static tree fold_builtin_unordered_cmp (location_t, tree, tree, tree, enum 
tree_code,
enum tree_code);
! static tree fold_builtin_0 (location_t, tree);
! static tree fold_builtin_1 (location_t, tree, tree);
! static tree fold_builtin_2 (location_t, tree, tree, tree);
! static tree fold_builtin_3 (location_t, tree, tree, tree, tree);
! static tree fold_builtin_varargs (location_t, tree, tree*, int);
  
  static tree fold_builtin_strpbrk (location_t, tree, tree, tree);
  static tree fold_builtin_strstr (location_t, tree, tree, tree);
*** fold_builtin_exponent (location_t loc, t
*** 8657,8703 
return NULL_TREE;
  }
  
- /* Fold function call to builtin stpcpy with arguments DEST and SRC.
-Return NULL_TREE if no simplification can be made.  */
- 
- static tree
- fold_builtin_stpcpy (location_t loc, tree fndecl, tree dest, tree src)
- {
-   tree fn, len, lenp1, call, type;
- 
-   if (!validate_arg (dest, POINTER_TYPE)
-   || !validate_arg (src, POINTER_TYPE))
- return NULL_TREE;
- 
-   len = c_strlen (src, 1);
-   if (!len
-   || TREE_CODE (len) != INTEGER_CST)
- return NULL_TREE;
- 
-   if (optimize_function_for_size_p (cfun)
-   /* If length is zero it's small enough.  */
-   && !integer_zerop (len))
- return NULL_TREE;
- 
-   fn = builtin_decl_implicit (BUILT_IN_MEMCPY);
-   if (!fn)
- return NULL_TREE;
- 
-   lenp1 = size_binop_loc (loc, PLUS_EXPR,
- fold_convert_loc (loc, size_type_node, len),
- build_int_cst (size_type_node, 1));
-   /* We use dest twice in building our expression.  Save it from
-  multiple expansions.  */
-   dest = builtin_save_expr (dest);
-   call = build_call_expr_loc (loc, fn, 3, dest, src, lenp1);
- 
-   type = TREE_TYPE (TREE_TYPE (fndecl));
-   dest = fold_build_pointer_plus_loc (loc, dest, len);
-   dest = fold_convert_loc (loc, type, dest);
-   dest = omit_one_operand_loc (loc, type, dest, call);
-   return dest;
- }
- 
  /* Fold function call to builtin memchr.  ARG1, ARG2 and LEN are the
 arguments to the call, and TYPE is its return type.
 Return NULL_TREE if no simplification can be made.  */
--- 8657,8662 
*** fold_builtin_arith_overflow (location_t
*** 9857,9867 
  }
  
  /* Fold a call to built-in function FNDECL with 0 arguments.
!IGNORE is true if the result of the function call is ignored.  This
!function returns NULL_TREE if no simplification was possible.  */
  
  static tree
! fold_builtin_0 (location_t loc, tree fndecl, bool ignore ATTRIBUTE_UNUSED)
  {
tree type = TREE_TYPE (TREE_TYPE (fndecl));
enum built_in_function fcode = DECL_FUNCTION_CODE (fndecl);
--- 9816,9825 
  }
  
  /* Fold a call to built-in function FNDECL with 0 arguments.
!This function returns NULL_TREE if no simplification was possible.  */
  
  static tree
! fold_builtin_0 (location_t loc, tree fndecl)
  {
tree type = TREE_TYPE (TREE_TYPE (fndecl));
enum built_in_function fcode = DECL_FUNCTION_CODE (fndecl);
*** fold_bui

[PATCH] Fix glitch in replace_stmt_with_simplification

2014-12-04 Thread Richard Biener


Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2014-12-04  Richard Biener  

* gimple-fold.c (replace_stmt_with_simplification): Properly
fail when maybe_push_res_to_seq fails.

Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c   (revision 218343)
+++ gcc/gimple-fold.c   (working copy)
@@ -3345,8 +3408,9 @@ replace_stmt_with_simplification (gimple
   if (gimple_has_lhs (stmt))
{
  tree lhs = gimple_get_lhs (stmt);
- maybe_push_res_to_seq (rcode, TREE_TYPE (lhs),
-ops, seq, lhs);
+ if (!maybe_push_res_to_seq (rcode, TREE_TYPE (lhs),
+ ops, seq, lhs))
+   return false;
  if (dump_file && (dump_flags & TDF_DETAILS))
{
  fprintf (dump_file, "gimple_simplified to ");

Re: [PATCH 2/3] Extended if-conversion

2014-12-04 Thread Richard Biener

On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev  wrote:
> Thanks Richard for your quick reply!
>
> 1. I agree that we can combine predicate_extended_ and
> predicate_arbitrary_ to one function as you proposed.
> 2. What is your opinion about using more simple decision about
> insertion point - if bb has use of phi result insert phi predication
> before it and at the bb end otherwise. I assume that critical edge
> splitting is not a good decision.

Why not always insert before the use?  Which would be after labels,
what we do for two-arg PHIs.  That is, how can it be that you predicate
a PHI in BB1 and then for an edge predicate on one of its incoming
edges you get SSA uses with defs that are in BB1 itself?  That
can only happen for backedges but those you can't remove in any case.

Richard.

>
> Best regards.
> Yuri.
>
> 2014-12-02 16:28 GMT+03:00 Richard Biener :
>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev  wrote:
>>> Hi Richard,
>>>
>>> I resend you patch1 and patch2 with minor changes:
>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>> I also very sorry that I sent you bad patch.
>>>
>>> Now let me answer on your questions related to second patch.
>>> 1. Why we need both predicate_extended_scalar_phi and
>>> predicate_arbitrary_scalar_phi?
>>>
>>> Let's consider the following simple test-case:
>>>
>>>   #pragma omp simd safelen(8)
>>>   for (i=0; i<512; i++)
>>>   {
>>> float t = a[i];
>>> if (t > 0.0f & t < 1.0e+17f)
>>>   if (c[i] != 0)  /* c is integer array. */
>>> res += 1;
>>>   }
>>>
>>> we can see the following phi node correspondent to res:
>>>
>>> # res_1 = PHI 
>>>
>>> It is clear that we can optimize it to phi node with 2 arguments only
>>> and only one check can be used for phi predication (for reduction in
>>> our case), namely predicate of bb_5. In general case we can't do it
>>> even if we sort all phi argument values since we still have to produce
>>> a chain of cond expressions to perform phi predication (see comments
>>> for predicate_arbitrary_scalar_phi).
>>
>> How so?  We can always use !(condition) for the "last" value, thus
>> treat it as an 'else' case.  That even works for
>>
>> # res_1 = PHI 
>>
>> where the condition for edges 5 and 7 can be computed as
>> ! (condition for 3 || condition for 4).
>>
>> Of course it is worthwhile to also sort single-occurrences first
>> so your case gets just the condition for edge 5 and its inversion
>> used for edges 3 and 4 combined.
>>
>>> 2. Why we need to introduce find_insertion_point?
>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>> attached) and we can see that bb_7 and bb_9 containing phi nodes have
>>> only critical incoming edges and both contain code computing edge
>>> predicates, e.g.
>>>
>>> :
>>> # xmax_edge_30 = PHI 
>>> _46 = xmax_17 == xmax_37;
>>> _47 = xmax_17 == xmax_27;
>>> _48 = _46 & _47;
>>> _53 = xmax_17 == xmax_37;
>>> _54 = ~_53;
>>> _55 = xmax_17 == xmax_27;
>>> _56 = _54 & _55;
>>> _57 = _48 | _56;
>>> xmax_edge_19 = xmax_edge_39 + 1;
>>> goto ;
>>>
>>> It is evident that we can not put phi predication at the block
>>> beginning but need to put it after predicate computations.
>>> Note also that if there are no critical edges for phi arguments
>>> insertion point will be "after labels" Note also that phi result can
>>> have use in this block too, so we can't put predication code to the
>>> block end.
>>
>> So the issue is that predicate insertion for edge predicates does
>> not happen on the edge but somewhere else (generally impossible
>> for critical edges unless you split them).
>>
>> I think I've told you before that I prefer simple solutions to such issues,
>> like splitting the edge!  Certainly not involving a function walking
>> GENERIC expressions.
>>
>> Thanks,
>> Richard.
>>
>>> Let me know if you still have any questions.
>>>
>>> Best regards.
>>> Yuri.
>>>
>>>
>>>
>>>
>>> 2014-11-28 15:43 GMT+03:00 Richard Biener :
 On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev  
 wrote:
> Hi All,
>
> Here is the second patch related to extended predication.
> Few comments which explain a main goal of design.
>
> 1. I don't want to insert any critical edge splitting since it may
> lead to less efficient binaries.
> 2. One special case of extended PHI node predication was introduced
> when #arguments is more than 2 but only two arguments are different
> and one argument has the only occurrence. For such PHI conditional
> scalar reduction is applied.
> This is correspondent to the following statement:
> if (q1 && q2 && q3) var++
>  New function phi_has_two_different_args was introduced to detect such 
> phi.
> 3. Original algorithm for PHI predication used assumption that at
> least one incoming edge for blocks containing PHI is not critical - it
> guarantees that all computations related to predicate of normal

Re: [PATCH x86_64] Optimize access to globals in "-fpie -pie" builds with copy relocations

2014-12-04 Thread Uros Bizjak

On Wed, Dec 3, 2014 at 10:35 PM, H.J. Lu  wrote:

>> It would probably help reviewers if you pointed to actual patch
>> submission [1], which unfortunately contains the explanation in the
>> patch itself [2], which further explains that this functionality is
>> currently only supported with gold, patched with [3].
>>
>> [1] https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00645.html
>> [2] https://gcc.gnu.org/ml/gcc-patches/2014-09/txt2CHtu81P1O.txt
>> [3] https://sourceware.org/ml/binutils/2014-05/msg00092.html
>>
>> After a bit of the above detective work, I think that new gcc option
>> is not necessary. The configure should detect if new functionality is
>> supported in the linker, and auto-configure gcc to use it when
>> appropriate.
>
> I think GCC option is needed since one can use -fuse-ld= to
> change linker.

 IMO, nobody will use this highly special x86_64-only option. It would
 be best for gnu-ld to reach feature parity with gold as far as this
 functionality is concerned. In this case, the optimization would be
 auto-configured, and would fire automatically, without any user
 intervention.

>>>
>>> Let's do it.  I implemented the same feature in bfd linker on both
>>> master and 2.25 branch.
>>>
>>
>> +bool
>> +i386_binds_local_p (const_tree exp)
>> +{
>> +  /* Globals marked extern are treated as local when linker copy relocations
>> + support is available with -f{pie|PIE}.  */
>> +  if (TARGET_64BIT && ix86_copyrelocs && flag_pie
>> +  && TREE_CODE (exp) == VAR_DECL
>> +  && DECL_EXTERNAL (exp) && !DECL_WEAK (exp))
>> +return true;
>> +  return default_binds_local_p (exp);
>> +}
>> +
>>
>> It returns true with -fPIE and false without -fPIE.  It is lying to compiler.
>> Maybe legitimate_pic_address_disp_p is a better place.

Agreed.

> Something like this?

Yes.

OK, if Jakub doesn't have any objections here. Please also add
Sriraman as author to ChangeLog entry.

Thanks,
Uros.

Re: [PATCH x86] Enable v64qi permutations.

2014-12-04 Thread Uros Bizjak

On Thu, Dec 4, 2014 at 1:04 PM, Jakub Jelinek  wrote:
> On Thu, Dec 04, 2014 at 04:00:27AM -0800, H.J. Lu wrote:
>> >> Can you add a few testcases?
>> >
>> > Isn't it already covered by gcc.dg/torture/vshuf* ?
>> >
>>
>> I didn't see them fail on my machines today.
>
> Those are executable testcases, those better should not fail.
> The patch just improved code generation and the testcases test
> if the improved code generation works well.
> Did you mean some scan-assembler test that verifies the better code
> generation?  Guess it is possible, though fragile.

I think that existing executable testcases adequately cover the
functionality of the patch.

The patch is OK.

Thanks,
Uros.

Re: [PINGv2][PATCH] Ignore alignment by option

2014-12-04 Thread Dmitry Vyukov

+address-sanitizer

Please don't hurry with it.

Do you have any numbers on how frequent unaligned accesses are in the
kernel? Is it worth addressing at this cost?

size_in_bytes = -1 instrumentation is too slow to be the default in kernel.

If we want to pursue this, I propose a different scheme.
Handle 8+ byte accesses as 1/2/4 accesses. No changes to 1/2/4 access handling.
Currently, when we allocate, say, a 17-byte object, we store 0 0 1 into
shadow. An 8-byte access starting at offset 15 won't be detected,
because the corresponding shadow value is 0. Instead, we would store
0 9 1 into shadow. Then the first shadow != 0 check will fail, and the
precise size check will catch the OOB access.
Make this scheme the default for kernel (no additional flags).

This scheme has the following advantages:
- load shadow only once (as opposed to the current size_in_bytes = -1
check that loads shadow twice)
- less code in instrumentation
- accesses to beginning and middle of the object are not slowed down
(shadow still contains 0, so fast-path works); only accesses to the
very last bytes of the object are penalized.
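
The 17-byte example above can be modelled with a small sketch.  This is a
simplified model under stated assumptions, not the GCC or kernel
instrumentation: one shadow value per 8-byte granule, 0 meaning the whole
granule is addressable, a positive value k meaning that k bytes counted from
the start of the granule are addressable (k may exceed 8 in the proposed
encoding), and no redzone values.  The names access_is_reported,
shadow_current and shadow_proposed are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Shadow for a 17-byte object: current encoding and proposed encoding.  */
const signed char shadow_current[3] = { 0, 0, 1 };
const signed char shadow_proposed[3] = { 0, 9, 1 };

/* Return 1 if an access of SIZE bytes at OFFSET would be reported.
   The shadow byte is loaded once; a zero value takes the fast path.  */
int
access_is_reported (const signed char *shadow, size_t offset, size_t size)
{
  assert (size > 0);

  signed char s = shadow[offset >> 3];
  if (s == 0)
    return 0;			/* Fast path: granule fully addressable.  */

  /* Precise check: index of the last accessed byte, relative to the
     start of the granule, against the number of addressable bytes.  */
  return (long) ((offset & 7) + size - 1) >= (long) s;
}
```

With this model, the 8-byte access at offset 15 is missed under
shadow_current (the loaded value is 0) and reported under shadow_proposed
(7 + 8 - 1 = 14 >= 9), while an aligned 8-byte access at offset 8 still
passes under shadow_proposed (7 < 9), at the cost of taking the slow path.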




On Thu, Dec 4, 2014 at 3:05 PM, Marat Zakirov  wrote:
>
> On 11/27/2014 05:14 PM, Marat Zakirov wrote:
>>
>>
>> On 11/19/2014 06:01 PM, Marat Zakirov wrote:
>>>
>>> Hi all!
>>>
>>> Here is the patch which forces ASan to ignore alignment of memory access.
>>> It increases ASan overhead but it's still useful because some programs like
>>> linux kernel often cheat with alignment which may cause false negatives.
>>>
>>> --Marat
>>
>>
>
>
> gcc/ChangeLog:
>
> 2014-11-14  Marat Zakirov  
>
> * asan.c (asan_expand_check_ifn): Ignore alignment by option.
> * doc/invoke.texi: Document.
> * params.def (asan-alignment-optimize): New.
> * params.h: Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 2014-11-14  Marat Zakirov  
>
> * c-c++-common/asan/red-align-3.c: New test.
>
>
> diff --git a/gcc/asan.c b/gcc/asan.c
> index 79dede7..4f86088 100644
> --- a/gcc/asan.c
> +++ b/gcc/asan.c
> @@ -2518,6 +2518,12 @@ asan_expand_check_ifn (gimple_stmt_iterator *iter,
> bool use_calls)
>
>HOST_WIDE_INT size_in_bytes
>  = is_scalar_access && tree_fits_shwi_p (len) ? tree_to_shwi (len) : -1;
> +
> +  if (!ASAN_ALIGNMENT_OPTIMIZE && size_in_bytes > 1)
> +{
> +  size_in_bytes = -1;
> +  align = 1;
> +}
>
>if (use_calls)
>  {
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 13270bc..8f43c06 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -10580,6 +10580,12 @@ is greater or equal to this number, use callbacks
> instead of inline checks.
>  E.g. to disable inline code use
>  @option{--param asan-instrumentation-with-call-threshold=0}.
>
> +@item asan-alignment-optimize
> +Enable asan optimization for aligned accesses.
> +It is enabled by default when using @option{-fsanitize=address} option.
> +To disable optimization for aligned accesses use
> +@option{--param asan-alignment-optimize=0}.
> +
>  @item chkp-max-ctor-size
>  Static constructors generated by Pointer Bounds Checker may become very
>  large and significantly increase compile time at optimization level
> diff --git a/gcc/params.def b/gcc/params.def
> index d2d2add..fbccf46 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -1114,6 +1114,11 @@ DEFPARAM
> (PARAM_ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD,
>   "in function becomes greater or equal to this number",
>   7000, 0, INT_MAX)
>
> +DEFPARAM (PARAM_ASAN_ALIGNMENT_OPTIMIZE,
> + "asan-alignment-optimize",
> + "Enable asan optimization for aligned access",
> + 1, 0, 1)
> +
>  DEFPARAM (PARAM_UNINIT_CONTROL_DEP_ATTEMPTS,
>   "uninit-control-dep-attempts",
>   "Maximum number of nested calls to search for control dependencies
> "
> diff --git a/gcc/params.h b/gcc/params.h
> index 4779e17..e2973d4 100644
> --- a/gcc/params.h
> +++ b/gcc/params.h
> @@ -238,5 +238,7 @@ extern void init_param_values (int *params);
>PARAM_VALUE (PARAM_ASAN_USE_AFTER_RETURN)
>  #define ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD \
>PARAM_VALUE (PARAM_ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD)
> +#define ASAN_ALIGNMENT_OPTIMIZE \
> +  PARAM_VALUE (PARAM_ASAN_ALIGNMENT_OPTIMIZE)
>
>  #endif /* ! GCC_PARAMS_H */
> diff --git a/gcc/testsuite/c-c++-common/asan/ignore_align.c
> b/gcc/testsuite/c-c++-common/asan/ignore_align.c
> new file mode 100644
> index 000..989958b
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/asan/ignore_align.c
> @@ -0,0 +1,34 @@
> +/* { dg-options "-fdump-tree-sanopt --param asan-alignment-optimize=0" } */
> +/* { dg-do compile } */
> +/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +void *memset (void *, int, __SIZE_TYPE__);
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +struct dummy {
> +  int a;
> +  int b;
> +  int c;
> +  int d;
> +};
> +
> +volatile struct dummy * new_p;
> +volatile struct dummy *

[C++ Patch] PR 63985

2014-12-04 Thread Paolo Carlini


Hi,

this accepts-invalid is about an invalid loop of the form:

for (int i = 5: arr)

thus it starts with an initialized declaration, which would be legal in 
a normal for loop, but then the colon means that it can only be an 
invalid range-based for loop. Ideally, it would be nice to say something 
more detailed about the invalid declaration, but that doesn't seem 
trivial... Would something like the below be ok for now? Tested 
x86_64-linux.


Thanks,
Paolo.

/
/cp
2014-12-04  Paolo Carlini  

PR c++/63985
* parser.c (cp_parser_for_init_statement): Reject invalid declarations
in range-based for loops.

/testsuite
2014-12-04  Paolo Carlini  

PR c++/63985
* g++.dg/cpp0x/range-for29.C: New.
Index: cp/parser.c
===
--- cp/parser.c (revision 218348)
+++ cp/parser.c (working copy)
@@ -10841,6 +10841,7 @@ cp_parser_for_init_statement (cp_parser* parser, t
 {
   bool is_range_for = false;
   bool saved_colon_corrects_to_scope_p = parser->colon_corrects_to_scope_p;
+  location_t loc = cp_lexer_peek_token (parser->lexer)->location;
 
   if (cp_lexer_next_token_is (parser->lexer, CPP_NAME)
  && cp_lexer_nth_token_is (parser->lexer, 2, CPP_COLON))
@@ -10881,6 +10882,8 @@ cp_parser_for_init_statement (cp_parser* parser, t
   "-std=c++11 or -std=gnu++11");
  *decl = error_mark_node;
}
+ if (*decl == error_mark_node)
+   error_at (loc, "invalid declaration in range-based %<for%> loop");

}
   else
  /* The ';' is not consumed yet because we told
Index: testsuite/g++.dg/cpp0x/range-for29.C
===
--- testsuite/g++.dg/cpp0x/range-for29.C(revision 0)
+++ testsuite/g++.dg/cpp0x/range-for29.C(working copy)
@@ -0,0 +1,10 @@
+// PR c++/63985
+// { dg-require-effective-target c++11 }
+
+int main()
+{
+  int arr;
+  for (int i = 5: arr)  // { dg-error "invalid declaration" }
+return 1;
+  return 0;
+}

Re: [PATCH x86] Enable v64qi permutations.

2014-12-04 Thread Uros Bizjak

On Thu, Dec 4, 2014 at 1:45 PM, Uros Bizjak  wrote:
> On Thu, Dec 4, 2014 at 1:04 PM, Jakub Jelinek  wrote:
>> On Thu, Dec 04, 2014 at 04:00:27AM -0800, H.J. Lu wrote:
>>> >> Can you add a few testcases?
>>> >
>>> > Isn't it already covered by gcc.dg/torture/vshuf* ?
>>> >
>>>
>>> I didn't see them fail on my machines today.
>>
>> Those are executable testcases, those better should not fail.
>> The patch just improved code generation and the testcases test
>> if the improved code generation works well.
>> Did you mean some scan-assembler test that verifies the better code
>> generation?  Guess it is possible, though fragile.
>
> I think that existing executable testcases adequately cover the
> functionality of the patch.
>
> The patch is OK.

BTW, the ChangeLog is missing.

index ca5d720..6252e7e 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -10678,7 +10678,7 @@
(V8SF "TARGET_AVX2") (V4DF "TARGET_AVX2")
(V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
-   (V32HI "TARGET_AVX512BW")])
+   (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512VBMI")])

I don't think change for VBMI target belongs in this patch.

Uros.

Re: [C++ Patch] PR 63985

2014-12-04 Thread Paolo Carlini


... oops, sent the wrong patch. See the below instead.

Paolo.

///
Index: cp/parser.c
===
--- cp/parser.c (revision 218348)
+++ cp/parser.c (working copy)
@@ -10841,6 +10841,7 @@ cp_parser_for_init_statement (cp_parser* parser, t
 {
   bool is_range_for = false;
   bool saved_colon_corrects_to_scope_p = parser->colon_corrects_to_scope_p;
+  location_t loc = cp_lexer_peek_token (parser->lexer)->location;
 
   if (cp_lexer_next_token_is (parser->lexer, CPP_NAME)
  && cp_lexer_nth_token_is (parser->lexer, 2, CPP_COLON))
@@ -10881,6 +10882,8 @@ cp_parser_for_init_statement (cp_parser* parser, t
   "-std=c++11 or -std=gnu++11");
  *decl = error_mark_node;
}
+ else if (*decl == error_mark_node)
+   error_at (loc, "invalid declaration in range-based %<for%> loop");

}
   else
  /* The ';' is not consumed yet because we told
Index: testsuite/g++.dg/cpp0x/range-for29.C
===
--- testsuite/g++.dg/cpp0x/range-for29.C(revision 0)
+++ testsuite/g++.dg/cpp0x/range-for29.C(working copy)
@@ -0,0 +1,10 @@
+// PR c++/63985
+// { dg-require-effective-target c++11 }
+
+int main()
+{
+  int arr;
+  for (int i = 5: arr)  // { dg-error "invalid declaration" }
+return 1;
+  return 0;
+}

Re: [PATCH x86] Enable v64qi permutations.

2014-12-04 Thread H.J. Lu

On Thu, Dec 4, 2014 at 4:04 AM, Jakub Jelinek  wrote:
> On Thu, Dec 04, 2014 at 04:00:27AM -0800, H.J. Lu wrote:
>> >> Can you add a few testcases?
>> >
>> > Isn't it already covered by gcc.dg/torture/vshuf* ?
>> >
>>
>> I didn't see them fail on my machines today.
>
> Those are executable testcases, those better should not fail.
> The patch just improved code generation and the testcases test
> if the improved code generation works well.
> Did you mean some scan-assembler test that verifies the better code
> generation?  Guess it is possible, though fragile.

Well, we will never be sure that the better code is really generated
unless we visually examine the assembly code.  Any changes in
the future may disable the better code generation and we won't
notice until much later.


-- 
H.J.

Re: [Build, Graphite] Support ISL-0.14 with GCC 4.9

2014-12-04 Thread Tobias Burnus

On Thu, Dec 04, 2014 at 12:30:39PM +0100, Richard Biener wrote:
> > OK for the 4.9 branch?
> I think for a system 0.14 ISL build which I just checked you need to
> adjust the ISL version check as well.

Confirmed. With the following patch applied in addition, bootstrapping
succeeds here with
  --with-isl=/dev/shm/isl-14-inst
and
  checking for version 0.14 of ISL... yes

The other libraries (gmp, mpfr, mpc, cloog) were built in tree.

As I again (similarly to the trunk) had a race condition (this time,
an in-tree MPC wasn't there in time for gcc's configure check), I have
also updated Makefile.def, using my patch from the trunk (+ adding cloog
dependency, just to make sure).


Is the patch OK together with the previous one?

Tobias
2014-12-04  Tobias Burnus  

* configure.ac: Permit also ISL 0.14 with CLooG.
* Makefile.def: Make more dependent on mpfr, mpc, isl, and cloog.
* Makefile.in: Regenerate.
* configure: Regenerate.

diff --git a/Makefile.def b/Makefile.def
index ec2b0f2..cd99dd4 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -296,6 +296,10 @@ dependencies = { module=all-build-fixincludes; on=all-build-libiberty; };
 // Host modules specific to gcc.
 dependencies = { module=configure-gcc; on=configure-intl; };
 dependencies = { module=configure-gcc; on=all-gmp; };
+dependencies = { module=configure-gcc; on=all-mpfr; };
+dependencies = { module=configure-gcc; on=all-mpc; };
+dependencies = { module=configure-gcc; on=all-isl; };
+dependencies = { module=configure-gcc; on=all-cloog; };
 dependencies = { module=configure-gcc; on=all-lto-plugin; };
 dependencies = { module=configure-gcc; on=all-binutils; };
 dependencies = { module=configure-gcc; on=all-gas; };
diff --git a/configure.ac b/configure.ac
index 1bab680..2aee14a 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1658,6 +1658,9 @@ if test "x$with_isl" != "xno" &&
 ISL_CHECK_VERSION(0,11)
 if test "${gcc_cv_isl}" = no ; then
   ISL_CHECK_VERSION(0,12)
+  if test "${gcc_cv_isl}" = no ; then
+ISL_CHECK_VERSION(0,14)
+  fi
 fi
   fi
   dnl Only execute fail-action, if ISL has been requested.
diff --git a/Makefile.in b/Makefile.in
index bf06dce..6dd5802 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -46988,6 +46988,38 @@ configure-stage3-gcc: maybe-all-stage3-gmp
 configure-stage4-gcc: maybe-all-stage4-gmp
 configure-stageprofile-gcc: maybe-all-stageprofile-gmp
 configure-stagefeedback-gcc: maybe-all-stagefeedback-gmp
+configure-gcc: maybe-all-mpfr
+
+configure-stage1-gcc: maybe-all-stage1-mpfr
+configure-stage2-gcc: maybe-all-stage2-mpfr
+configure-stage3-gcc: maybe-all-stage3-mpfr
+configure-stage4-gcc: maybe-all-stage4-mpfr
+configure-stageprofile-gcc: maybe-all-stageprofile-mpfr
+configure-stagefeedback-gcc: maybe-all-stagefeedback-mpfr
+configure-gcc: maybe-all-mpc
+
+configure-stage1-gcc: maybe-all-stage1-mpc
+configure-stage2-gcc: maybe-all-stage2-mpc
+configure-stage3-gcc: maybe-all-stage3-mpc
+configure-stage4-gcc: maybe-all-stage4-mpc
+configure-stageprofile-gcc: maybe-all-stageprofile-mpc
+configure-stagefeedback-gcc: maybe-all-stagefeedback-mpc
+configure-gcc: maybe-all-isl
+
+configure-stage1-gcc: maybe-all-stage1-isl
+configure-stage2-gcc: maybe-all-stage2-isl
+configure-stage3-gcc: maybe-all-stage3-isl
+configure-stage4-gcc: maybe-all-stage4-isl
+configure-stageprofile-gcc: maybe-all-stageprofile-isl
+configure-stagefeedback-gcc: maybe-all-stagefeedback-isl
+configure-gcc: maybe-all-cloog
+
+configure-stage1-gcc: maybe-all-stage1-cloog
+configure-stage2-gcc: maybe-all-stage2-cloog
+configure-stage3-gcc: maybe-all-stage3-cloog
+configure-stage4-gcc: maybe-all-stage4-cloog
+configure-stageprofile-gcc: maybe-all-stageprofile-cloog
+configure-stagefeedback-gcc: maybe-all-stagefeedback-cloog
 configure-gcc: maybe-all-lto-plugin
 
 configure-stage1-gcc: maybe-all-stage1-lto-plugin
diff --git a/configure b/configure
index d1c67e5..d769d93 100755
--- a/configure
+++ b/configure
@@ -6024,6 +6024,55 @@ $as_echo "$gcc_cv_isl" >&6; }
   fi
 
 
+  if test "${gcc_cv_isl}" = no ; then
+
+  if test "${ENABLE_ISL_CHECK}" = yes ; then
+_isl_saved_CFLAGS=$CFLAGS
+_isl_saved_LDFLAGS=$LDFLAGS
+_isl_saved_LIBS=$LIBS
+
+CFLAGS="${_isl_saved_CFLAGS} ${islinc} ${gmpinc}"
+LDFLAGS="${_isl_saved_LDFLAGS} ${isllibs}"
+LIBS="${_isl_saved_LIBS} -lisl"
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for version 0.14 of ISL" >&5
+$as_echo_n "checking for version 0.14 of ISL... " >&6; }
+if test "$cross_compiling" = yes; then :
+  gcc_cv_isl=yes
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include 
+   #include 
+int
+main ()
+{
+if (strncmp (isl_version (), "isl-0.14", strlen ("isl-0.14")) != 0)
+ return 1;
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_run "$LINENO"; then :
+  gcc_cv_isl=yes
+else
+  gcc_cv_isl=no
+fi
+rm -f core *.core core.conftest.* gmon.out bb.out conftest$ac_exeext \
+  conftest.$ac_objext conftest.beam confte

Nuke CLOOGLIBS from trunk's gcc/Makefile.in

2014-12-04 Thread Tobias Burnus

Nothing sets CLOOGLIBS and CLOOGINC any more, so we can
remove them as well.

The patch survived bootstrapping.  OK?

Tobias


diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 43405a0..98cff75 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1006,7 +1006,7 @@ BUILD_LIBDEPS= $(BUILD_LIBIBERTY)
 # and the system's installed libraries.
 LIBS = @LIBS@ libcommon.a $(CPPLIB) $(LIBINTL) $(LIBICONV) $(LIBBACKTRACE) \
$(LIBIBERTY) $(LIBDECNUMBER) $(HOST_LIBS)
-BACKENDLIBS = $(CLOOGLIBS) $(ISLLIBS) $(GMPLIBS) $(PLUGINLIBS) $(HOST_LIBS) \
+BACKENDLIBS = $(ISLLIBS) $(GMPLIBS) $(PLUGINLIBS) $(HOST_LIBS) \
$(ZLIB)
 # Any system libraries needed just for GNAT.
 SYSLIBS = @GNAT_LIBEXC@
@@ -1038,7 +1038,7 @@ BUILD_ERRORS = build/errors.o
 INCLUDES = -I. -I$(@D) -I$(srcdir) -I$(srcdir)/$(@D) \
   -I$(srcdir)/../include @INCINTL@ \
   $(CPPINC) $(GMPINC) $(DECNUMINC) $(BACKTRACEINC) \
-  $(CLOOGINC) $(ISLINC)
+  $(ISLINC)
 
 COMPILE.base = $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) -o $@
 ifeq ($(CXXDEPMODE),depmode=gcc3)

Re: [PATCH 2/3] Extended if-conversion

2014-12-04 Thread Yuri Rumyantsev

Richard,

I made a simple change: the gsi iterator for each bb that has
critical edges is saved in an additional field of bb_predicate_s:

typedef struct bb_predicate_s {

  /* The condition under which this basic block is executed.  */
  tree predicate;

  /* PREDICATE is gimplified, and the sequence of statements is
 recorded here, in order to avoid the duplication of computations
 that occur in previous conditions.  See PR44483.  */
  gimple_seq predicate_gimplified_stmts;

  /* Insertion point for blocks having incoming critical edges.  */
  gimple_stmt_iterator gsi;
} *bb_predicate_p;

This iterator is saved in insert_gimplified_predicates before the
code for predicate computation is inserted. I checked that this fix
works.

Now I am implementing merging of predicate_extended.. and
predicate_arbitrary.. functions as you proposed.
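
For reference, the conditional scalar reduction from the test case quoted
below can be pictured at source level roughly as follows.  This is only an
illustrative sketch: the function names are made up, the omp simd pragma is
left out, and the real transformation happens on GIMPLE PHI nodes, not on C
source.  It assumes that loading c[i] unconditionally is safe, which holds
here because every index is within the array.

```c
#include <assert.h>
#include <stddef.h>

/* Original shape: the increment is guarded by nested conditions.  */
int
count_branchy (const float *a, const int *c, size_t n)
{
  int res = 0;
  size_t i;

  assert (a != NULL && c != NULL);
  for (i = 0; i < n; i++)
    {
      float t = a[i];
      if ((t > 0.0f) & (t < 1.0e+17f))
	if (c[i] != 0)
	  res += 1;
    }
  return res;
}

/* If-converted shape: a single predicate selects between adding 1 and
   adding 0, so the loop body has no control flow left.  */
int
count_if_converted (const float *a, const int *c, size_t n)
{
  int res = 0;
  size_t i;

  assert (a != NULL && c != NULL);
  for (i = 0; i < n; i++)
    {
      float t = a[i];
      int pred = (t > 0.0f) & (t < 1.0e+17f) & (c[i] != 0);
      res += pred ? 1 : 0;
    }
  return res;
}
```

Only one predicate is needed because all other incoming values of the PHI
leave res unchanged, which is the special case discussed in the thread.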

Best regards.
Yuri.

2014-12-04 15:41 GMT+03:00 Richard Biener :
> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev  wrote:
>> Thanks Richard for your quick reply!
>>
>> 1. I agree that we can combine predicate_extended_ and
>> predicate_arbitrary_ to one function as you proposed.
>> 2. What is your opinion about using more simple decision about
>> insertion point - if bb has use of phi result insert phi predication
>> before it and at the bb end otherwise. I assume that critical edge
>> splitting is not a good decision.
>
> Why not always insert before the use?  Which would be after labels,
> what we do for two-arg PHIs.  That is, how can it be that you predicate
> a PHI in BB1 and then for an edge predicate on one of its incoming
> edges you get SSA uses with defs that are in BB1 itself?  That
> can only happen for backedges but those you can't remove in any case.
>
> Richard.
>
>>
>> Best regards.
>> Yuri.
>>
>> 2014-12-02 16:28 GMT+03:00 Richard Biener :
>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev  wrote:
 Hi Richard,

 I resend you patch1 and patch2 with minor changes:
 1. I renamed flag_force_vectorize to aggressive_if_conv.
 2. Use static cast for the first argument of gimple_phi_arg_edge.
 I also very sorry that I sent you bad patch.

 Now let me answer on your questions related to second patch.
 1. Why we need both predicate_extended_scalar_phi and
 predicate_arbitrary_scalar_phi?

 Let's consider the following simple test-case:

   #pragma omp simd safelen(8)
   for (i=0; i<512; i++)
   {
 float t = a[i];
 if (t > 0.0f & t < 1.0e+17f)
   if (c[i] != 0)  /* c is integer array. */
 res += 1;
   }

 we can see the following phi node correspondent to res:

 # res_1 = PHI 

 It is clear that we can optimize it to phi node with 2 arguments only
 and only one check can be used for phi predication (for reduction in
 our case), namely predicate of bb_5. In general case we can't do it
 even if we sort all phi argument values since we still have to produce
 a chain of cond expressions to perform phi predication (see comments
 for predicate_arbitrary_scalar_phi).
>>>
>>> How so?  We can always use !(condition) for the "last" value, thus
>>> treat it as an 'else' case.  That even works for
>>>
>>> # res_1 = PHI 
>>>
>>> where the condition for edges 5 and 7 can be computed as
>>> ! (condition for 3 || condition for 4).
>>>
>>> Of course it is worthwhile to also sort single-occurrences first
>>> so your case gets just the condition for edge 5 and its inversion
>>> used for edges 3 and 4 combined.
>>>
 2. Why we need to introduce find_insertion_point?
  Let's consider another test-case extracted from 175.vpr ( t5.c is
 attached) and we can see that bb_7 and bb_9 containing phi nodes have
 only critical incoming edges and both contain code computing edge
 predicates, e.g.

 :
 # xmax_edge_30 = PHI 
 _46 = xmax_17 == xmax_37;
 _47 = xmax_17 == xmax_27;
 _48 = _46 & _47;
 _53 = xmax_17 == xmax_37;
 _54 = ~_53;
 _55 = xmax_17 == xmax_27;
 _56 = _54 & _55;
 _57 = _48 | _56;
 xmax_edge_19 = xmax_edge_39 + 1;
 goto ;

 It is evident that we can not put phi predication at the block
 beginning but need to put it after predicate computations.
 Note also that if there are no critical edges for phi arguments
 insertion point will be "after labels" Note also that phi result can
 have use in this block too, so we can't put predication code to the
 block end.
>>>
>>> So the issue is that predicate insertion for edge predicates does
>>> not happen on the edge but somewhere else (generally impossible
>>> for critical edges unless you split them).
>>>
>>> I think I've told you before that I prefer simple solutions to such issues,
>>> like splitting the edge!  Certainly not involving a function walking
>>> GENERIC expressions.
>>>
>>> Thanks,
>>> Richard.
>>>
 Let me know if you still have any questions.

 Best regards.
 Yuri.


>>

Re: [patch, build] Restore bootstrap in building libcc1 on darwin

2014-12-04 Thread Rainer Orth

FX  writes:

> 10-days ping
> This restores bootstrap on a secondary target, target maintainer is OK with
> it. I think I need build maintainers approval, so please review.

While in my testing, 64-bit Mac OS X 10.10.1 (x86_64-apple-darwin14.0.0)
now bootstraps, but 32-bit (i386-apple-darwin14.0.0) does not:

ld: illegal text-relocation to 'anon' in ../libiberty/pic/libiberty.a(regex.o) 
from '_byte_common_op_match_null_string_p' in 
../libiberty/pic/libiberty.a(regex.o) for architecture i386
collect2: error: ld returned 1 exit status
make[3]: *** [libcc1.la] Error 1
make[2]: *** [all] Error 2
make[1]: *** [all-libcc1] Error 2

I couldn't find a corresponding reloc in otool -rv output, though.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: Nuke CLOOGLIBS from trunk's gcc/Makefile.in

2014-12-04 Thread Richard Biener

On Thu, Dec 4, 2014 at 2:04 PM, Tobias Burnus
 wrote:
> Nothing sets any more CLOOGLIBS and CLOOGINC - thus, we can
> also remove it.
>
> The patch survived bootstrapping.  OK?

Ok.

Thanks,
Richard.

> Tobias
>
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 43405a0..98cff75 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1006,7 +1006,7 @@ BUILD_LIBDEPS= $(BUILD_LIBIBERTY)
>  # and the system's installed libraries.
>  LIBS = @LIBS@ libcommon.a $(CPPLIB) $(LIBINTL) $(LIBICONV) $(LIBBACKTRACE) \
> $(LIBIBERTY) $(LIBDECNUMBER) $(HOST_LIBS)
> -BACKENDLIBS = $(CLOOGLIBS) $(ISLLIBS) $(GMPLIBS) $(PLUGINLIBS) $(HOST_LIBS) \
> +BACKENDLIBS = $(ISLLIBS) $(GMPLIBS) $(PLUGINLIBS) $(HOST_LIBS) \
> $(ZLIB)
>  # Any system libraries needed just for GNAT.
>  SYSLIBS = @GNAT_LIBEXC@
> @@ -1038,7 +1038,7 @@ BUILD_ERRORS = build/errors.o
>  INCLUDES = -I. -I$(@D) -I$(srcdir) -I$(srcdir)/$(@D) \
>-I$(srcdir)/../include @INCINTL@ \
>$(CPPINC) $(GMPINC) $(DECNUMINC) $(BACKTRACEINC) \
> -  $(CLOOGINC) $(ISLINC)
> +  $(ISLINC)
>
>  COMPILE.base = $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) -o $@
>  ifeq ($(CXXDEPMODE),depmode=gcc3)

Re: [Build, Graphite] Support ISL-0.14 with GCC 4.9

2014-12-04 Thread Richard Biener

On Thu, Dec 4, 2014 at 2:01 PM, Tobias Burnus
 wrote:
> On Thu, Dec 04, 2014 at 12:30:39PM +0100, Richard Biener wrote:
>> > OK for the 4.9 branch?
>> I think for a system 0.14 ISL build which I just checked you need to
>> adjust the ISL version check as well.
>
> Confirmed. Using additionally the following patch survives bootstrapping
> here with
>   --with-isl=/dev/shm/isl-14-inst
> and
>   checking for version 0.14 of ISL... yes
>
> The other libraries (gmp, mpfr, mpc, cloog) were built in tree.
>
> As I again (similarly to the trunk) had a race condition (this time,
> an in-tree MPC wasn't there in time for gcc's configure check), I have
> also updated Makefile.def, using my patch from the trunk (+ adding cloog
> dependency, just to make sure).
>
>
> Is the patch OK together with the previous one?

Yes.

Thanks,
Richard.

> Tobias

Re: [patch, build] Restore bootstrap in building libcc1 on darwin

2014-12-04 Thread FX

> While in my testing, 64-bit Mac OS X 10.10.1 (x86_64-apple-darwin14.0.0)
> now bootstraps, but 32-bit (i386-apple-darwin14.0.0) does not:

Is it due to my patch, or pre-existing bootstrap failure?
How do you configure this 32-bit compiler? target/build/host/CFLAGS/CXXFLAGS/etc

FX

Re: [PATCH 2/3] Extended if-conversion

2014-12-04 Thread Richard Biener

On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev  wrote:
> Richard,
>
> I made a simple change: the gsi iterator for each bb that has
> critical edges is saved in an additional field of bb_predicate_s:
>
> typedef struct bb_predicate_s {
>
>   /* The condition under which this basic block is executed.  */
>   tree predicate;
>
>   /* PREDICATE is gimplified, and the sequence of statements is
>  recorded here, in order to avoid the duplication of computations
>  that occur in previous conditions.  See PR44483.  */
>   gimple_seq predicate_gimplified_stmts;
>
>   /* Insertion point for blocks having incoming critical edges.  */
>   gimple_stmt_iterator gsi;
> } *bb_predicate_p;
>
> and this iterator is saved in insert_gimplified_predicates before
> inserting the code for predicate computation. I checked that this fix
> works.

Huh?  I still wonder what the issue is with inserting everything
after the PHI we predicate.

Well, your updated patch will come with testcases for the testsuite
that will hopefully fail if doing that.

Richard.

>
> Now I am implementing merging of predicate_extended.. and
> predicate_arbitrary.. functions as you proposed.
>
> Best regards.
> Yuri.
>
> 2014-12-04 15:41 GMT+03:00 Richard Biener :
>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev  wrote:
>>> Thanks Richard for your quick reply!
>>>
>>> 1. I agree that we can combine predicate_extended_ and
>>> predicate_arbitrary_ to one function as you proposed.
>>> 2. What is your opinion about using a simpler decision for the
>>> insertion point: if the bb has a use of the phi result, insert the phi
>>> predication before it, and at the bb end otherwise? I assume that critical
>>> edge splitting is not a good decision.
>>
>> Why not always insert before the use?  Which would be after labels,
>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>> a PHI in BB1 and then for an edge predicate on one of its incoming
>> edges you get SSA uses with defs that are in BB1 itself?  That
>> can only happen for backedges but those you can't remove in any case.
>>
>> Richard.
>>
>>>
>>> Best regards.
>>> Yuri.
>>>
>>> 2014-12-02 16:28 GMT+03:00 Richard Biener :
>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev  wrote:
>>>>> Hi Richard,
>>>>>
>>>>> I am resending patch1 and patch2 with minor changes:
>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>> I am also very sorry that I sent you a bad patch.
>>>>>
>>>>> Now let me answer your questions related to the second patch.
>>>>> 1. Why do we need both predicate_extended_scalar_phi and
>>>>> predicate_arbitrary_scalar_phi?
>>>>>
>>>>> Let's consider the following simple test-case:
>>>>>
>>>>>   #pragma omp simd safelen(8)
>>>>>   for (i=0; i<512; i++)
>>>>>   {
>>>>>     float t = a[i];
>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>         res += 1;
>>>>>   }
>>>>>
>>>>> we can see the following phi node corresponding to res:
>>>>>
>>>>> # res_1 = PHI 
>>>>>
>>>>> It is clear that we can optimize it to a phi node with only 2 arguments,
>>>>> and only one check can be used for phi predication (for reduction in
>>>>> our case), namely the predicate of bb_5. In the general case we can't do it
>>>>> even if we sort all phi argument values since we still have to produce
>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>> for predicate_arbitrary_scalar_phi).
>>>>
>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>> treat it as an 'else' case.  That even works for
>>>>
>>>> # res_1 = PHI 
>>>>
>>>> where the condition for edges 5 and 7 can be computed as
>>>> ! (condition for 3 || condition for 4).
>>>>
>>>> Of course it is worthwhile to also sort single-occurrences first
>>>> so your case gets just the condition for edge 5 and its inversion
>>>> used for edges 3 and 4 combined.
>>>>
>>>>> 2. Why do we need to introduce find_insertion_point?
>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>> attached) and we can see that bb_7 and bb_9 containing phi nodes have
>>>>> only critical incoming edges and both contain code computing edge
>>>>> predicates, e.g.
>>>>>
>>>>> :
>>>>> # xmax_edge_30 = PHI 
>>>>> _46 = xmax_17 == xmax_37;
>>>>> _47 = xmax_17 == xmax_27;
>>>>> _48 = _46 & _47;
>>>>> _53 = xmax_17 == xmax_37;
>>>>> _54 = ~_53;
>>>>> _55 = xmax_17 == xmax_27;
>>>>> _56 = _54 & _55;
>>>>> _57 = _48 | _56;
>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>> goto ;
>>>>>
>>>>> It is evident that we cannot put phi predication at the block
>>>>> beginning but need to put it after predicate computations.
>>>>> Note also that if there are no critical edges for phi arguments,
>>>>> the insertion point will be "after labels". Note also that the phi result can
>>>>> have a use in this block too, so we can't put predication code at the
>>>>> block end.
>>>>
>>>> So the issue is that predicate insertion for edge predicates does
>>>> not h

Re: [patch, build] Restore bootstrap in building libcc1 on darwin

2014-12-04 Thread Iain Sandoe

Hi Rainer,

On 4 Dec 2014, at 13:32, Rainer Orth wrote:

> FX  writes:
> 
>> 10-days ping
>> This restores bootstrap on a secondary target, target maintainer is OK with
>> it. I think I need build maintainers approval, so please review.
> 
> While in my testing, 64-bit Mac OS X 10.10.1 (x86_64-apple-darwin14.0.0)
> now bootstraps, but 32-bit (i386-apple-darwin14.0.0) does not:
> 
> ld: illegal text-relocation to 'anon' in 
> ../libiberty/pic/libiberty.a(regex.o) from 
> '_byte_common_op_match_null_string_p' in 
> ../libiberty/pic/libiberty.a(regex.o) for architecture i386
> collect2: error: ld returned 1 exit status
> make[3]: *** [libcc1.la] Error 1
> make[2]: *** [all] Error 2
> make[1]: *** [all-libcc1] Error 2

For {i?86,ppc}-darwin* (i.e. m32 hosts) the PIC libiberty library is being 
incorrectly built.

The default BOOT_CFLAGS are: -O2 -g -mdynamic-no-pic
the libiberty pic build appends: -fno-common (and not even -fPIC) [NB -fPIC 
_won't_ override -mdynamic-no-pic, so that's not a simple way out]

This means that the PIC library is being built with non-pic relocs.

I have a local hack to allow build to proceed on m32-host-darwin (which I can 
send to you if you would like it) - however, it's not really a suitable patch 
for trunk... and I've not had time recently to try and fix this.

If you would like to raise a PR for this, I can append the analysis there.

cheers
Iain

Re: [patch, build] Restore bootstrap in building libcc1 on darwin

2014-12-04 Thread Rainer Orth

FX  writes:

>> While in my testing, 64-bit Mac OS X 10.10.1 (x86_64-apple-darwin14.0.0)
>> now bootstraps, but 32-bit (i386-apple-darwin14.0.0) does not:
>
> Is it due to my patch, or pre-existing bootstrap failure?

I can't tell: before your patch, 32-bit bootstrap was broken due to PR
bootstrap/63966 for quite some time.

> How do you configure this 32-bit compiler? 
> target/build/host/CFLAGS/CXXFLAGS/etc

--target=i386-apple-darwin14.0.0
--build=i386-apple-darwin14.0.0
--host=i386-apple-darwin14.0.0
--enable-languages=all,ada,obj-c++ 

Bootstrap compiler is gcc 4.9.1 (patched for 10.10 support)

CC='gcc -m32' 
CXX='g++ -m32'

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: [PINGv2][PATCH] Ignore alignment by option

2014-12-04 Thread Yury Gribov


On 12/04/2014 03:47 PM, Dmitry Vyukov wrote:

size_in_bytes = -1 instrumentation is too slow to be the default in kernel.

If we want to pursue this, I propose a different scheme.
Handle 8+ byte accesses as 1/2/4 accesses. No changes to 1/2/4 access handling.
Currently when we allocate, say, 17-byte object we store 0 0 1 into
shadow. An 8-byte access starting at offset 15 won't be detected,
because the corresponding shadow value is 0. Instead we start storing
0 9 1 into shadow. Then the first shadow != 0 check will fail, and the
precise size check will catch the OOB access.
Make this scheme the default for kernel (no additional flags).

This scheme has the following advantages:
- load shadow only once (as opposed to the current size_in_bytes = -1
check that loads shadow twice)
- less code in instrumentation
- accesses to beginning and middle of the object are not slowed down
(shadow still contains 0, so fast-path works); only accesses to the
very last bytes of the object are penalized.
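The 17-byte example above can be checked with a small model of the shadow encoding. This is a sketch under stated assumptions, not libasan or GCC instrumentation code: the helper names are hypothetical, redzones are not modelled, the "precise size check" is assumed to be the same comparison ASan uses for 1/2/4-byte accesses (last accessed byte within the granule compared against the shadow value), and the case where the object size is a multiple of 8 is not covered by the mail and is left unchanged here.

```python
GRANULE = 8  # application bytes covered by one shadow byte


def shadow_current(size):
    """Shadow bytes for an object of `size` bytes under the current encoding."""
    full, tail = divmod(size, GRANULE)
    return [0] * full + ([tail] if tail else [])


def shadow_proposed(size):
    """Proposed encoding: the granule before a partial tail stores 8 + tail."""
    shadow = shadow_current(size)
    full, tail = divmod(size, GRANULE)
    if full and tail:
        shadow[full - 1] = GRANULE + tail
    return shadow


def reports_error(shadow, offset, access_size):
    """Model of the check: load shadow once, fast path on 0, else precise."""
    value = shadow[offset // GRANULE]
    if value == 0:
        return False  # fast path: shadow says everything is addressable
    last_byte = offset % GRANULE + access_size - 1
    return last_byte >= value
```

With these definitions `shadow_current(17)` is `[0, 0, 1]` and `shadow_proposed(17)` is `[0, 9, 1]`. An 8-byte access at offset 15 passes unnoticed with the current shadow and is reported with the proposed one, while an 8-byte access at offset 0 still takes the fast path in both.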


Makes sense.  The scheme actually looks bullet-proof; why did the Asan team 
prefer the current (fast but imprecise) algorithm?


BTW I think we'll want this option in userspace as well, so we'll 
probably need to update libasan.


-Y

Re: [patch, build] Restore bootstrap in building libcc1 on darwin

2014-12-04 Thread Richard Biener

On Thu, Dec 4, 2014 at 2:47 PM, Richard Biener
 wrote:
> On Thu, Dec 4, 2014 at 2:43 PM, Iain Sandoe  wrote:
>> Hi Rainer,
>>
>> On 4 Dec 2014, at 13:32, Rainer Orth wrote:
>>
>>> FX  writes:
>>>
>>>> 10-days ping
>>>> This restores bootstrap on a secondary target, target maintainer is OK with
>>>> it. I think I need build maintainers approval, so please review.
>>>
>>> While in my testing, 64-bit Mac OS X 10.10.1 (x86_64-apple-darwin14.0.0)
>>> now bootstraps, but 32-bit (i386-apple-darwin14.0.0) does not:
>>>
>>> ld: illegal text-relocation to 'anon' in 
>>> ../libiberty/pic/libiberty.a(regex.o) from 
>>> '_byte_common_op_match_null_string_p' in 
>>> ../libiberty/pic/libiberty.a(regex.o) for architecture i386
>>> collect2: error: ld returned 1 exit status
>>> make[3]: *** [libcc1.la] Error 1
>>> make[2]: *** [all] Error 2
>>> make[1]: *** [all-libcc1] Error 2
>>
>> For {i?86,ppc}-darwin* (i.e. m32 hosts) the PIC libiberty library is being 
>> incorrectly built.
>>
>> The default BOOT_CFLAGS are: -O2 -g -mdynamic-no-pic
>> the libiberty pic build appends: -fno-common (and not even -fPIC) [NB -fPIC 
>> _won't_ override -mdynamic-no-pic, so that's not a simple way out]
>>
>> This means that the PIC library is being built with non-pic relocs.
>>
>> I have a local hack to allow build to proceed on m32-host-darwin (which I 
>> can send to you if you would like it) - however, it's not really a suitable 
>> patch for trunk... and I've not had time recently to try and fix this.
>>
>> If you would like to raise a PR for this, I can append the analysis there.
>
> The libiberty PIC build shouldn't use BOOT_CFLAGS.  How does
> lto-plugin get around being built for the host but as a shared library?

That is, the mistake is probably adding -mdynamic-no-pic to BOOT_CFLAGS
rather than in more contained places when building files for the host
_binaries_.

Richard.

> Richard.
>
>> cheers
>> Iain
>>

Re: [patch, build] Restore bootstrap in building libcc1 on darwin

2014-12-04 Thread Richard Biener

On Thu, Dec 4, 2014 at 2:48 PM, Richard Biener
 wrote:
> On Thu, Dec 4, 2014 at 2:47 PM, Richard Biener
>  wrote:
>> On Thu, Dec 4, 2014 at 2:43 PM, Iain Sandoe  wrote:
>>> Hi Rainer,
>>>
>>> On 4 Dec 2014, at 13:32, Rainer Orth wrote:
>>>
>>>> FX  writes:
>>>>
>>>>> 10-days ping
>>>>> This restores bootstrap on a secondary target, target maintainer is OK with
>>>>> it. I think I need build maintainers approval, so please review.
>>>>
>>>> While in my testing, 64-bit Mac OS X 10.10.1 (x86_64-apple-darwin14.0.0)
>>>> now bootstraps, but 32-bit (i386-apple-darwin14.0.0) does not:
>>>>
>>>> ld: illegal text-relocation to 'anon' in 
>>>> ../libiberty/pic/libiberty.a(regex.o) from 
>>>> '_byte_common_op_match_null_string_p' in 
>>>> ../libiberty/pic/libiberty.a(regex.o) for architecture i386
>>>> collect2: error: ld returned 1 exit status
>>>> make[3]: *** [libcc1.la] Error 1
>>>> make[2]: *** [all] Error 2
>>>> make[1]: *** [all-libcc1] Error 2
>>>
>>> For {i?86,ppc}-darwin* (i.e. m32 hosts) the PIC libiberty library is being 
>>> incorrectly built.
>>>
>>> The default BOOT_CFLAGS are: -O2 -g -mdynamic-no-pic
>>> the libiberty pic build appends: -fno-common (and not even -fPIC) [NB -fPIC 
>>> _won't_ override -mdynamic-no-pic, so that's not a simple way out]
>>>
>>> This means that the PIC library is being built with non-pic relocs.
>>>
>>> I have a local hack to allow build to proceed on m32-host-darwin (which I 
>>> can send to you if you would like it) - however, it's not really a suitable 
>>> patch for trunk... and I've not had time recently to try and fix this.
>>>
>>> If you would like to raise a PR for this, I can append the analysis there.
>>
>> The libiberty PIC build shouldn't use BOOT_CFLAGS.  How does
>> lto-plugin get around being built for the host but as a shared library?
>
> That is, the mistake is probably adding -mdynamic-no-pic to BOOT_CFLAGS
> rather than in more contained places when building files for the host
> _binaries_.

Where T_CFLAGS should have been used?

Richard.

> Richard.
>
>> Richard.
>>
>>> cheers
>>> Iain
>>>

Re: [patch, build] Restore bootstrap in building libcc1 on darwin

2014-12-04 Thread FX

> The default BOOT_CFLAGS are: -O2 -g -mdynamic-no-pic
> the libiberty pic build appends: -fno-common (and not even -fPIC) [NB -fPIC 
> _won't_ override -mdynamic-no-pic, so that's not a simple way out]
> This means that the PIC library is being built with non-pic relocs.

config/mh-darwin says that -mdynamic-no-pic is there because it “speeds 
compiles by 3-5%”. I don’t think we care about speed when the bootstrap fails, 
so can we remove it altogether?

FX




darwin.diff
Description: Binary data

Re: [patch, build] Restore bootstrap in building libcc1 on darwin

2014-12-04 Thread Richard Biener

On Thu, Dec 4, 2014 at 2:52 PM, FX  wrote:
>> The default BOOT_CFLAGS are: -O2 -g -mdynamic-no-pic
>> the libiberty pic build appends: -fno-common (and not even -fPIC) [NB -fPIC 
>> _won't_ override -mdynamic-no-pic, so that's not a simple way out]
>> This means that the PIC library is being built with non-pic relocs.
>
> config/mh-darwin says that -mdynamic-no-pic is there because it “speeds 
> compiles by 3-5%”. I don’t think we care about speed when the bootstrap 
> fails, so can we remove it altogether?

Can you try adding it as

T_CFLAGS += -mdynamic-no-pic

in gcc/config/t-darwin instead?

> FX
>
>

Re: [PATCH x86] Enable v64qi permutations.

2014-12-04 Thread Ilya Tocar

On 04 Dec 13:51, Uros Bizjak wrote:
> On Thu, Dec 4, 2014 at 1:45 PM, Uros Bizjak  wrote:
> > On Thu, Dec 4, 2014 at 1:04 PM, Jakub Jelinek  wrote:
> >> On Thu, Dec 04, 2014 at 04:00:27AM -0800, H.J. Lu wrote:
> >>> >> Can you add a few testcases?
> >>> >
> >>> > Isn't it already covered by gcc.dg/torture/vshuf* ?
> >>> >
> >>>
> >>> I didn't see them fail on my machines today.
> >>
> >> Those are executable testcases, those better should not fail.
> >> The patch just improved code generation and the testcases test
> >> if the improved code generation works well.
> >> Did you mean some scan-assembler test that verifies the better code
> >> generation?  Guess it is possible, though fragile.
> >
> > I think that existing executable testcases adequately cover the
> > functionality of the patch.
> >
> > The patch is OK.
> 
> BTW, the ChangeLog is missing.
> 
* config/i386/i386.c (ix86_expand_vec_perm_vpermi2): Handle v64qi.
(expand_vec_perm_broadcast_1): Ditto.
(expand_vec_perm_vpermi2_vpshub2): New.
(ix86_expand_vec_perm_const_1): Use it.
(ix86_vectorize_vec_perm_const_ok): Handle v64qi.
* config/i386/sse.md (VEC_PERM_AVX2): Add v64qi.
(VEC_PERM_CONST): Ditto.
> index ca5d720..6252e7e 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -10678,7 +10678,7 @@
> (V8SF "TARGET_AVX2") (V4DF "TARGET_AVX2")
> (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
> (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
> -   (V32HI "TARGET_AVX512BW")])
> +   (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512VBMI")])
> 
> I don't think change for VBMI target belongs in this patch.
>
Those changes enable non-const v64qi permutes
(via a single vpermi2b insn); should I split them into a separate patch?
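For readers unfamiliar with what a two-operand permute computes, the element selection can be modelled as follows. This is a sketch of the generic semantics (selector elements taken modulo twice the vector length, indexing the concatenation of both inputs), as GCC documents for its two-vector shuffle; it does not model the vpermi2b instruction encoding or any code in i386.c, and the name `vec_perm` is used here only for illustration.

```python
def vec_perm(a, b, sel):
    """Select elements from the concatenation of a and b according to sel."""
    n = len(a)
    if len(b) != n or len(sel) != n:
        raise ValueError("all three vectors must have the same length")
    both = list(a) + list(b)
    # Each selector element is reduced modulo 2*n before indexing.
    return [both[index % (2 * n)] for index in sel]
```

For a v64qi-sized example, take `a = list(range(64))` and `b = list(range(100, 164))`: the selector `[127 - i for i in range(64)]` returns `b` reversed, and a non-constant selector works the same way because nothing in the selection depends on the indices being known in advance.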

Re: [PINGv2][PATCH] Ignore alignment by option

2014-12-04 Thread Dmitry Vyukov

On Thu, Dec 4, 2014 at 4:48 PM, Yury Gribov  wrote:
> On 12/04/2014 03:47 PM, Dmitry Vyukov wrote:
>>
>> size_in_bytes = -1 instrumentation is too slow to be the default in
>> kernel.
>>
>> If we want to pursue this, I propose a different scheme.
>> Handle 8+ byte accesses as 1/2/4 accesses. No changes to 1/2/4 access
>> handling.
>> Currently when we allocate, say, 17-byte object we store 0 0 1 into
>> shadow. An 8-byte access starting at offset 15 won't be detected,
>> because the corresponding shadow value is 0. Instead we start storing
>> 0 9 1 into shadow. Then the first shadow != 0 check will fail, and the
>> precise size check will catch the OOB access.
>> Make this scheme the default for kernel (no additional flags).
>>
>> This scheme has the following advantages:
>> - load shadow only once (as opposed to the current size_in_bytes = -1
>> check that loads shadow twice)
>> - less code in instrumentation
>> - accesses to beginning and middle of the object are not slowed down
>> (shadow still contains 0, so fast-path works); only accesses to the
>> very last bytes of the object are penalized.
>
>
> Makes sense.  The scheme actually looks bullet-proof; why did the Asan team
> prefer the current (fast but imprecise) algorithm?
>
> BTW I think we'll want this option in userspace as well, so we'll probably
> need to update libasan.


We've discussed this scheme, but nobody has shown that it's important enough.
It bloats binary (we do have issues with binary sizes) and slows down
execution a bit. And if it is non-default mode, then it adds more
flags (which is bad) and adds more configurations to test.

For this to happen somebody needs to do research on (1) binary size
increase, (2) slowdown, (3) number of additional bugs it finds (we can
run it over extensive code base that is currently asan-clean).

Here is the issue with some notes:
https://code.google.com/p/address-sanitizer/issues/detail?id=100

Re: [PATCH] Mark explicit decls as implicit when we've seen a prototype

2014-12-04 Thread Jason Merrill


OK.

Jason

Re: [PATCH, i386] Add prefixes avoidance tuning for silvermont target

2014-12-04 Thread Ilya Enkovich

On 02 Dec 12:21, Uros Bizjak wrote:
> On Tue, Dec 2, 2014 at 12:08 PM, Ilya Enkovich  wrote:
> 
> >> > Having stage1 close to end, may we make some decision regarding this
> >> > patch? Having a couple of working variants, may we choose and use one
> >> > of them?
> >>
> >> I propose to wait for Vlad for an update about his plans on register
> >> preference algorithm that would fix this (and other "Ya*r"-type
> >> issues).
> >>
> >> In the absence of the fix, we'll go with "Yr,*x".
> >>
> >> Uros.
> >>
> >
> > Hi,
> >
> > Here is an updated patch which uses "Yr,*x" option to avoid long prefixes 
> > for Silvermont.  Bootstrapped and tested on x86_64-unknown-linux-gnu.  OK 
> > for trunk?
> >
> > Thanks,
> > Ilya
> > --
> > gcc/
> >
> > 2014-12-02  Ilya Enkovich  
> >
> > * config/i386/constraints.md (Yr): New.
> > * config/i386/i386.h (reg_class): Add NO_REX_SSE_REGS.
> > (REG_CLASS_NAMES): Likewise.
> > (REG_CLASS_CONTENTS): Likewise.
> > * config/i386/sse.md (*vec_concatv2sf_sse4_1): Add alternatives
> > which use only NO_REX_SSE_REGS.
> > (vec_set_0): Likewise.
> > (*vec_setv4sf_sse4_1): Likewise.
> > (sse4_1_insertps): Likewise.
> > (*sse4_1_extractps): Likewise.
> > (*sse4_1_mulv2siv2di3): Likewise.
> > (*_mul3): Likewise.
> > (*sse4_1_3): Likewise.
> > (*sse4_1_3): Likewise.
> > (*sse4_1_eqv2di3): Likewise.
> > (sse4_2_gtv2di3): Likewise.
> > (*vec_extractv4si): Likewise.
> > (*vec_concatv2si_sse4_1): Likewise.
> > (vec_concatv2di): Likewise.
> > (_blend): Likewise.
> > (_blendv): Likewise.
> > (_dp): Likewise.
> > (_movntdqa): Likewise.
> > (_mpsadbw): Likewise.
> > (packusdw): Likewise.
> > (_pblendvb): Likewise.
> > (sse4_1_pblendw): Likewise.
> > (sse4_1_phminposuw): Likewise.
> > (sse4_1_v8qiv8hi2): Likewise.
> > (sse4_1_v4qiv4si2): Likewise.
> > (sse4_1_v4hiv4si2): Likewise.
> > (sse4_1_v2qiv2di2): Likewise.
> > (sse4_1_v2hiv2di2): Likewise.
> > (sse4_1_v2siv2di2): Likewise.
> > (sse4_1_ptest): Likewise.
> > (_round): Likewise.
> > (sse4_1_round): Likewise.
> > * config/i386/subst.md (mask_prefix4): New.
> > * config/i386/x86-tune.def (X86_TUNE_AVOID_4BYTE_PREFIXES): New.
> >
> > gcc/testsuite/
> >
> > 2014-12-02  Ilya Enkovich  
> >
> > * gcc.target/i386/sse2-init-v2di-2.c: Adjust to changed
> > vec_concatv2di template.
> 
> OK for mainline with a change below.
> 
> @@ -6544,13 +6550,14 @@
>  })
> 
>  (define_insn_and_split "*sse4_1_extractps"
> -  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,x,x")
> +  [(set (match_operand:SF 0 "nonimmediate_operand" "=*rm,rm,x,x")
>   (vec_select:SF
> 
> Please do not change preferences of non-SSE registers. Please check
> the patch that similar changes didn't creep in.
> 
> Thanks,
> Uros.

Thanks for review!  Didn't find any other typo.  Attached is a committed 
version.

Thanks,
Ilya
diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 4e07d70..72925cd 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -108,6 +108,8 @@
 ;;  d  Integer register when integer DFmode moves are enabled
 ;;  x  Integer register when integer XFmode moves are enabled
 ;;  f  x87 register when 80387 floating point arithmetic is enabled
+;;  r  SSE regs not requiring REX prefix when prefixes avoidance is enabled
+;; and all SSE regs otherwise
 
 (define_register_constraint "Yz" "TARGET_SSE ? SSE_FIRST_REG : NO_REGS"
  "First SSE register (@code{%xmm0}).")
@@ -150,6 +152,10 @@
  "(ix86_fpmath & FPMATH_387) ? FLOAT_REGS : NO_REGS"
  "@internal Any x87 register when 80387 FP arithmetic is enabled.")
 
+(define_register_constraint "Yr"
+ "TARGET_SSE ? (X86_TUNE_AVOID_4BYTE_PREFIXES ? NO_REX_SSE_REGS : ALL_SSE_REGS) : NO_REGS"
+ "@internal Lower SSE register when avoiding REX prefix and all SSE registers otherwise.")
+
 ;; We use the B prefix to denote any number of internal operands:
 ;;  s  Sibcall memory operand, not valid for TARGET_X32
 ;;  w  Call memory operand, not valid for TARGET_X32
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 3f5f979..6d9df65 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1311,6 +1311,7 @@ enum reg_class
   FP_TOP_REG, FP_SECOND_REG,   /* %st(0) %st(1) */
   FLOAT_REGS,
   SSE_FIRST_REG,
+  NO_REX_SSE_REGS,
   SSE_REGS,
   EVEX_SSE_REGS,
   BND_REGS,
@@ -1369,6 +1370,7 @@ enum reg_class
"FP_TOP_REG", "FP_SECOND_REG",  \
"FLOAT_REGS",   \
"SSE_FIRST_REG",\
+   "NO_REX_SSE_REGS",  \
"SSE_REGS", \
"EVEX_SSE_REGS",\
"BND_REGS", \
@@ -1409,6 +1411,7 @@ enum reg_class

Re: [patch, build] Restore bootstrap in building libcc1 on darwin

2014-12-04 Thread Richard Biener

On Thu, Dec 4, 2014 at 2:43 PM, Iain Sandoe  wrote:
> Hi Rainer,
>
> On 4 Dec 2014, at 13:32, Rainer Orth wrote:
>
>> FX  writes:
>>
>>> 10-days ping
>>> This restores bootstrap on a secondary target, target maintainer is OK with
>>> it. I think I need build maintainers approval, so please review.
>>
>> While in my testing, 64-bit Mac OS X 10.10.1 (x86_64-apple-darwin14.0.0)
>> now bootstraps, but 32-bit (i386-apple-darwin14.0.0) does not:
>>
>> ld: illegal text-relocation to 'anon' in 
>> ../libiberty/pic/libiberty.a(regex.o) from 
>> '_byte_common_op_match_null_string_p' in 
>> ../libiberty/pic/libiberty.a(regex.o) for architecture i386
>> collect2: error: ld returned 1 exit status
>> make[3]: *** [libcc1.la] Error 1
>> make[2]: *** [all] Error 2
>> make[1]: *** [all-libcc1] Error 2
>
> For {i?86,ppc}-darwin* (i.e. m32 hosts) the PIC libiberty library is being 
> incorrectly built.
>
> The default BOOT_CFLAGS are: -O2 -g -mdynamic-no-pic
> the libiberty pic build appends: -fno-common (and not even -fPIC) [NB -fPIC 
> _won't_ override -mdynamic-no-pic, so that's not a simple way out]
>
> This means that the PIC library is being built with non-pic relocs.
>
> I have a local hack to allow build to proceed on m32-host-darwin (which I can 
> send to you if you would like it) - however, it's not really a suitable patch 
> for trunk... and I've not had time recently to try and fix this.
>
> If you would like to raise a PR for this, I can append the analysis there.

The libiberty PIC build shouldn't use BOOT_CFLAGS.  How does
lto-plugin get around being built for the host but as a shared library?

Richard.

> cheers
> Iain
>

Re: [patch] Simplify non-inline function definitions for std::unordered_xxx containers

2014-12-04 Thread Jonathan Wakely


On 04/12/14 11:39 +, Jonathan Wakely wrote:

On 03/12/14 23:32 +0100, François Dumont wrote:

On 03/12/2014 16:59, Jonathan Wakely wrote:

François (or anyone else), do you see any problem with this change?

It makes the code shorter and I think is much easier to read, it also
reduces the memory total shown by -ftime-report.

No, no problem. Interesting to see what benefit we can have when 
reducing number of template parameters. I will try to recall it in 
the future.


We could remove the _Value parameter from every class template that
also takes _Alloc, because we can get it back from _Alloc::value_type
(assuming we consistently rebind the allocator to the value_type
before instantiating _Hashtable, which we don't currently do).

I have a patch coming soon for PR57272 that might include a change
like that, because I already need to replace _Value with
allocator_traits<_Alloc>::pointer everywhere.


i.e. like this. Please take a look and let me know what you think.

Although this touches almost every line of the hashtable.h and
hashtable_policy.h files, it's mostly mechanical. The main purpose is
to replace every use of X* with a typedef like X_pointer, which comes
from the allocator. In the common case it's just a typedef for X*, but
the allocator can use something else. This fixes PR57272.

To make that work the _Hash_node_base and related types need to be
parameterised on the allocator's pointer, and then be fixed to not
rely on implicit conversions from pointer-to-derived to
pointer-to-base (see
http://cplusplus.github.io/LWG/lwg-active.html#2260). Once that change
is done it makes sense to replace the _Value parameter everywhere,
deriving it from the allocator instead.

(I can't help thinking the unordered container code would be much
easier to read if we just had a bundle of all the relevant template
parameters and passed that around, instead of passing the same 9
parameters to every class template!)

Something like the attached patch is necessary to fix PR57272,
although I also have an earlier version that just adds the _Ptr
parameter to the end of the template parameter list, instead of
removing _Value.  We should probably do it for GCC 5.0 if we're ever
going to do it. I have a similar patch for forward_list as well.

There is an alternative to refactoring everything to use the pointer
type, which I'm doing for std::list, std::map etc. to be
backward-compatible. That alternative is to keep the existing
non-template _Hash_node_base type but add an alternative
_Hash_node_ptr_base<_VoidPtr> which is used when the allocator has
"fancy pointers". That is extremely painful, because you need an
alternative _Hash_node<_Ptr> and _Node_iterator<_Ptr> and
_Local_iterator<_Ptr> and so on for every related type.

P.S. this patch also rewrites the __gnu_test::CustomPointerAlloc test
allocator to use a rewritten __gnu_test::CustomPointer which matches
the C++11 requirements better than __gnu_cxx::_Pointer_adapter does.
It also uses the __allocated_obj RAII type to manage
creating/destroying nodes safely without needing try/catch. Both those
changes have been very useful for all the containers when testing and
fixing PR57272.






patch.txt.bz2
Description: BZip2 compressed data

Re: [PINGv2][PATCH] Ignore alignment by option

2014-12-04 Thread Yury Gribov


On 12/04/2014 05:04 PM, Dmitry Vyukov wrote:

On Thu, Dec 4, 2014 at 4:48 PM, Yury Gribov  wrote:

On 12/04/2014 03:47 PM, Dmitry Vyukov wrote:


size_in_bytes = -1 instrumentation is too slow to be the default in
kernel.

If we want to pursue this, I propose a different scheme.
Handle 8+ byte accesses as 1/2/4 accesses. No changes to 1/2/4 access
handling.
Currently when we allocate, say, 17-byte object we store 0 0 1 into
shadow. An 8-byte access starting at offset 15 won't be detected,
because the corresponding shadow value is 0. Instead we start storing
0 9 1 into shadow. Then the first shadow != 0 check will fail, and the
precise size check will catch the OOB access.
Make this scheme the default for kernel (no additional flags).

This scheme has the following advantages:
- load shadow only once (as opposed to the current size_in_bytes = -1
check that loads shadow twice)
- less code in instrumentation
- accesses to beginning and middle of the object are not slowed down
(shadow still contains 0, so fast-path works); only accesses to the
very last bytes of the object are penalized.



Makes sense.  The scheme actually looks bullet-proof; why did the Asan team
prefer the current (fast but imprecise) algorithm?

BTW I think we'll want this option in userspace as well, so we'll probably
need to update libasan.


We've discussed this scheme, but nobody has shown that it's important enough.
It bloats the binary (we do have issues with binary sizes) and slows down
execution a bit. And if it is a non-default mode, then it adds more
flags (which is bad) and more configurations to test.

For this to happen somebody needs to do research on (1) binary size
increase, (2) slowdown, (3) number of additional bugs it finds (we can
run it over extensive code base that is currently asan-clean).


Regarding (3) - unless a codebase deliberately uses unaligned accesses
(like the kernel) this change would be of little use - all unaligned
accesses are then bugs and should already be detected and fixed with UBSan.


-Y

Re: [PATCH x86] Enable v64qi permutations.

2014-12-04 Thread Uros Bizjak

On Thu, Dec 4, 2014 at 2:53 PM, Ilya Tocar  wrote:

>> >>> >> Can you add a few testcases?
>> >>> >
>> >>> > Isn't it already covered by gcc.dg/torture/vshuf* ?
>> >>> >
>> >>>
>> >>> I didn't see them fail on my machines today.
>> >>
>> >> Those are executable testcases, those better should not fail.
>> >> The patch just improved code generation and the testcases test
>> >> if the improved code generation works well.
>> >> Did you mean some scan-assembler test that verifies the better code
>> >> generation?  Guess it is possible, though fragile.
>> >
>> > I think that existing executable testcases adequately cover the
>> > functionality of the patch.
>> >
>> > The patch is OK.
>>
>> BTW, the ChangeLog is missing.
>>
> * config/i386/i386.c (ix86_expand_vec_perm_vpermi2): Handle v64qi.
> (expand_vec_perm_broadcast_1): Ditto.
> (expand_vec_perm_vpermi2_vpshub2): New.
> (ix86_expand_vec_perm_const_1): Use it.
> (ix86_vectorize_vec_perm_const_ok): Handle v64qi.
> * config/i386/sse.md (VEC_PERM_AVX2): Add v64qi.
> (VEC_PERM_CONST): Ditto.
>> index ca5d720..6252e7e 100644
>> --- a/gcc/config/i386/sse.md
>> +++ b/gcc/config/i386/sse.md
>> @@ -10678,7 +10678,7 @@
>> (V8SF "TARGET_AVX2") (V4DF "TARGET_AVX2")
>> (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
>> (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
>> -   (V32HI "TARGET_AVX512BW")])
>> +   (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512VBMI")])
>>
>> I don't think change for VBMI target belongs in this patch.
>>
> Those changes enable non-const v64qi permutes
> (via single vpermi2b insn), should I split them into separate patch?

If they are not on the same topic, then please yes. Please don't mix
separate issues together.

Thanks,
Uros.

Re: [patch] Simplify non-inline function definitions for std::unordered_xxx containers

2014-12-04 Thread Jonathan Wakely


On 04/12/14 14:11 +, Jonathan Wakely wrote:

Although this touches almost every line of the hashtable.h and
hashtable_policy.h files, it's mostly mechanical. The main purpose is
to replace every use of X* with a typedef like X_pointer, which comes
from the allocator. In the common case it's just a typedef for X*, but
the allocator can use something else. This fixes PR57272.


And here's the equivalent for std::forward_list (which was
considerably easier, since it's a much simpler container).
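
For readers unfamiliar with the technique, here is a minimal sketch of deriving node pointers from the allocator's pointer type rather than hard-coding `X*`. It uses only standard `std::pointer_traits` and `std::allocator_traits`; the names `NodeBase`, `NodeTypes` and `links_two_nodes` are hypothetical and are not the types used in the patch.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Hypothetical node base, parameterized on the allocator's void_pointer.
template<typename VoidPtr>
struct NodeBase {
  // Pointer-to-NodeBase of the same pointer family as VoidPtr.
  using pointer =
      typename std::pointer_traits<VoidPtr>::template rebind<NodeBase>;

  pointer next = nullptr;

  // Obtain a (possibly fancy) pointer to this object.
  pointer self() { return std::pointer_traits<pointer>::pointer_to(*this); }
};

// Collects the pointer typedefs a container would take from its allocator.
template<typename T, typename Alloc = std::allocator<T>>
struct NodeTypes {
  using void_pointer = typename std::allocator_traits<Alloc>::void_pointer;
  using base = NodeBase<void_pointer>;
  using base_pointer = typename base::pointer;
};

// With std::allocator the typedef collapses to a plain raw pointer.
static_assert(std::is_same<NodeTypes<int>::base_pointer,
                           NodeBase<void*>*>::value,
              "raw pointers are the common case");

// Links two nodes using only the typedef'd pointer type.
bool links_two_nodes() {
  NodeTypes<int>::base a, b;
  a.next = b.self();
  return a.next == &b && b.next == nullptr;
}
```

An allocator whose `void_pointer` is a class type would flow through the same typedefs, provided its pointer type supplies `rebind` and `pointer_to` as `std::pointer_traits` expects.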


diff --git a/libstdc++-v3/include/bits/forward_list.h b/libstdc++-v3/include/bits/forward_list.h
index 7cf699e..f4270b9 100644
--- a/libstdc++-v3/include/bits/forward_list.h
+++ b/libstdc++-v3/include/bits/forward_list.h
@@ -38,6 +38,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 
@@ -47,27 +49,44 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
   /**
*  @brief  A helper basic node class for %forward_list.
+   *
*  This is just a linked list with nothing inside it.
*  There are purely list shuffling utility methods here.
+   *
+   *  The template is parameterized on the allocator's void_pointer type.
*/
+  template<typename _VoidPtr>
 struct _Fwd_list_node_base
 {
+  static_assert(is_void<typename pointer_traits<_VoidPtr>::element_type>{},
+		"the template argument must be a pointer to void");
+
+  using __pointer = __ptr_rebind<_VoidPtr, _Fwd_list_node_base>;
+  using __const_pointer = __ptr_rebind<_VoidPtr, const _Fwd_list_node_base>;
+  using __node_base = _Fwd_list_node_base;
+
   _Fwd_list_node_base() = default;
 
-_Fwd_list_node_base* _M_next = nullptr;
+  __pointer _M_next = nullptr;
 
-_Fwd_list_node_base*
-_M_transfer_after(_Fwd_list_node_base* __begin,
-		  _Fwd_list_node_base* __end) noexcept
+  // Get a pointer-to-base from a pointer-to-derived
+  // (fancy pointers might not support an implicit conversion)
+  __pointer
+  _M_base()
+  { return pointer_traits<__pointer>::pointer_to(*this); }
+
+  __pointer
+  _M_transfer_after(__pointer __begin,
+			__pointer __end) noexcept
   {
-  _Fwd_list_node_base* __keep = __begin->_M_next;
+	__pointer __keep = __begin->_M_next;
 	if (__end)
 	  {
 	__begin->_M_next = __end->_M_next;
 	__end->_M_next = _M_next;
 	  }
 	else
-	__begin->_M_next = 0;
+	  __begin->_M_next = nullptr;
 	_M_next = __keep;
 	return __end;
   }
@@ -75,38 +94,52 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   void
   _M_reverse_after() noexcept
   {
-  _Fwd_list_node_base* __tail = _M_next;
+	__pointer __tail = _M_next;
 	if (!__tail)
 	  return;
-  while (_Fwd_list_node_base* __temp = __tail->_M_next)
+	while (__pointer __temp = __tail->_M_next)
 	  {
-	  _Fwd_list_node_base* __keep = _M_next;
+	__pointer __keep = _M_next;
 	_M_next = __temp;
 	__tail->_M_next = __temp->_M_next;
 	_M_next->_M_next = __keep;
 	  }
   }
+
+  // Cast away constness
+  __pointer
+  _M_mutable() const noexcept
+  {
+	auto* __self = const_cast<_Fwd_list_node_base*>(this);
+	return pointer_traits<__pointer>::pointer_to(*__self);
+  }
 };
 
   /**
*  @brief  A helper node class for %forward_list.
-   *  This is just a linked list with uninitialized storage for a
-   *  data value in each node.
+   *
+   *  This is just a linked list with uninitialized storage for a data value
+   *  in each node.
*  There is a sorting utility method.
*/
-  template<typename _Tp>
+  template<typename _Ptr>
 struct _Fwd_list_node
-: public _Fwd_list_node_base
+: public _Fwd_list_node_base<__ptr_rebind<_Ptr, void>>
 {
+  using value_type = typename pointer_traits<_Ptr>::element_type;
+
+  static_assert(!is_const<value_type>{},
+		"the template argument must be a pointer to non-const");
+
   _Fwd_list_node() = default;
 
-  __gnu_cxx::__aligned_buffer<_Tp> _M_storage;
+  __gnu_cxx::__aligned_buffer<value_type> _M_storage;
 
-  _Tp*
+  value_type*
   _M_valptr() noexcept
   { return _M_storage._M_ptr(); }
 
-  const _Tp*
+  const value_type*
   _M_valptr() const noexcept
   { return _M_storage._M_ptr(); }
 };
@@ -116,15 +149,16 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
* 
*   All the functions are op overloads.
*/
-  template<typename _Tp>
+  template<typename _Ptr>
 struct _Fwd_list_iterator
 {
-  typedef _Fwd_list_iterator<_Tp>_Self;
-  typedef _Fwd_list_node<_Tp>_Node;
+  typedef _Fwd_list_iterator<_Ptr>		_Self;
+  typedef _Fwd_list_node<_Ptr>		_Node;
+  typedef typename _Node::__pointer		__base_pointer;
 
-  typedef _Tpvalue_type;
-  typedef _Tp*   pointer;
-  typedef _Tp&   reference;
+  typedef typename _Node::value_type	value_type;
+  typedef value_type*			pointer;
+  typedef value_type&			reference;
   typedef ptrdiff_t difference_type;
   typedef std::forward_

Re: [Patch] Improving jump-thread pass for PR 54742

2014-12-04 Thread Sebastian Pop

Sebastian Pop wrote:
> Jeff Law wrote:
> > I'm a bit worried about compile-time impacts of the all the
> > recursion
> 
> I will also restrict the recursion to the loop in which we look for the FSM
> thread.

The attached patch includes this change.  It passed bootstrap and regression
test on x86_64-linux.  Ok to commit?
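
To illustrate what threading jumps through a finite state automaton means, here is a hand-written sketch. It is not what GCC emits and not code from the patch: `run_switch` and `run_threaded` are hypothetical functions. The first dispatches through a `switch` on every iteration; the second jumps straight to the handler of the state that is already known at the end of each case, which is the effect the pass aims for.

```cpp
#include <cassert>

// A small state machine written as a loop around a switch.
int run_switch(const char* s) {
  int state = 0, sum = 0;
  for (; *s; ++s) {
    switch (state) {
      case 0:
        if (*s == '+') state = 1; else sum += 1;
        break;
      case 1:
        if (*s == '-') state = 0; else sum += 10;
        break;
    }
  }
  return sum;
}

// Hand-threaded equivalent: each path jumps directly to the code of the
// next state instead of going back through the switch.
int run_threaded(const char* s) {
  int sum = 0;
state0:
  if (!*s) return sum;
  if (*s++ == '+') goto state1;
  sum += 1;
  goto state0;
state1:
  if (!*s) return sum;
  if (*s++ == '-') goto state0;
  sum += 10;
  goto state1;
}
```

Both functions compute the same result for any input string; the threaded form duplicates the loop header per state, which is why the patch bounds the number of copied instructions, blocks and paths.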

Thanks,
Sebastian
>From 0e3312921c07c0e0dd5c1bf5f24050b2336475ef Mon Sep 17 00:00:00 2001
From: Sebastian Pop 
Date: Fri, 26 Sep 2014 14:54:20 -0500
Subject: [PATCH] extend jump thread for finite state automata PR 54742

Adapted from a patch from James Greenhalgh.

	* params.def (max-fsm-thread-path-insns, max-fsm-thread-length,
	max-fsm-thread-paths): New.

	* doc/invoke.texi (max-fsm-thread-path-insns, max-fsm-thread-length,
	max-fsm-thread-paths): Documented.

	* tree-cfg.c (split_edge_bb_loc): Export.
	* tree-cfg.h (split_edge_bb_loc): Declared extern.

	* tree-ssa-threadedge.c (simplify_control_stmt_condition): Restore the
	original value of cond when simplification fails.
	(fsm_find_thread_path): New.
	(fsm_find_control_statement_thread_paths): New.
	(fsm_thread_through_normal_block): Call
	find_control_statement_thread_paths.

	* tree-ssa-threadupdate.c (dump_jump_thread_path): Pretty print
	EDGE_FSM_THREAD.
	(verify_seme): New.
	(duplicate_seme_region): New.
	(thread_through_all_blocks): Generate code for EDGE_FSM_THREAD edges
	calling duplicate_seme_region.

	* tree-ssa-threadupdate.h (jump_thread_edge_type): Add EDGE_FSM_THREAD.

	* testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c: New.
	* testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c: New.
---
 gcc/doc/invoke.texi  |   12 +
 gcc/params.def   |   15 ++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c |   43 
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c |  127 ++
 gcc/tree-cfg.c   |2 +-
 gcc/tree-cfg.h   |1 +
 gcc/tree-ssa-threadedge.c|  277 +-
 gcc/tree-ssa-threadupdate.c  |  203 +++-
 gcc/tree-ssa-threadupdate.h  |1 +
 9 files changed, 678 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 89edddb..074183f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10624,6 +10624,18 @@ large and significantly increase compile time at optimization level
 @option{-O1} and higher.  This parameter is a maximum nubmer of statements
 in a single generated constructor.  Default value is 5000.
 
+@item max-fsm-thread-path-insns
+Maximum number of instructions to copy when duplicating blocks on a
+finite state automaton jump thread path.  The default is 100.
+
+@item max-fsm-thread-length
+Maximum number of basic blocks on a finite state automaton jump thread
+path.  The default is 10.
+
+@item max-fsm-thread-paths
+Maximum number of new jump thread paths to create for a finite state
+automaton.  The default is 50.
+
 @end table
 @end table
 
diff --git a/gcc/params.def b/gcc/params.def
index 9b21c07..edf3f53 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1140,6 +1140,21 @@ DEFPARAM (PARAM_CHKP_MAX_CTOR_SIZE,
 	  "Maximum number of statements to be included into a single static "
 	  "constructor generated by Pointer Bounds Checker",
 	  5000, 100, 0)
+
+DEFPARAM (PARAM_MAX_FSM_THREAD_PATH_INSNS,
+	  "max-fsm-thread-path-insns",
+	  "Maximum number of instructions to copy when duplicating blocks on a finite state automaton jump thread path",
+	  100, 1, 99)
+
+DEFPARAM (PARAM_MAX_FSM_THREAD_LENGTH,
+	  "max-fsm-thread-length",
+	  "Maximum number of basic blocks on a finite state automaton jump thread path",
+	  10, 1, 99)
+
+DEFPARAM (PARAM_MAX_FSM_THREAD_PATHS,
+	  "max-fsm-thread-paths",
+	  "Maximum number of new jump thread paths to create for a finite state automaton",
+	  50, 1, 99)
 /*
 
 Local variables:
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
new file mode 100644
index 000..bb34a74
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dom1-details" } */
+/* { dg-final { scan-tree-dump-times "FSM" 6 "dom1" } } */
+/* { dg-final { cleanup-tree-dump "dom1" } } */
+
+int sum0, sum1, sum2, sum3;
+int foo (char *s, char **ret)
+{
+  int state=0;
+  char c;
+
+  for (; *s && state != 4; s++)
+{
+  c = *s;
+  if (c == '*')
+	{
+	  s++;
+	  break;
+	}
+  switch (state)
+	{
+	case 0:
+	  if (c == '+')
+	state = 1;
+	  else if (c != '-')
+	sum0+=c;
+	  break;
+	case 1:
+	  if (c == '+')
+	state = 2;
+	  else if (c == '-')
+	state = 0;
+	  else
+	sum1+=c;
+	  break;
+	default:
+	  break;
+	}
+
+}
+

[PATCH] Update match-and-simplify docs

2014-12-04 Thread Richard Biener


Committed.

Richard.

2014-12-04  Richard Biener  

* doc/match-and-simplify.texi: Update for recent changes.

Index: doc/match-and-simplify.texi
===
--- doc/match-and-simplify.texi (revision 218352)
+++ doc/match-and-simplify.texi (working copy)
@@ -224,6 +224,46 @@ In this example the pattern will be repe
 @code{plus, minus, minus}, @code{minus, plus, plus},
 @code{minus, plus, minus}.
 
+To avoid repeating operator lists in @code{for} you can name
+them via
+
+@smallexample
+(define_operator_list pmm plus minus mult)
+@end smallexample
+
+and use them in @code{for} operator lists where they get expanded.
+
+@smallexample
+(for opa (pmm trunc_div)
+ (simplify...
+@end smallexample
+
+So this example iterates over @code{plus}, @code{minus}, @code{mult}
+and @code{trunc_div}.
+
+Using operator lists can also remove the need to explicitly write
+a @code{for}.  All operator list uses that appear in a @code{simplify}
+or @code{match} pattern in operator positions will implicitly
+be added to a new @code{for}.  For example
+
+@smallexample
+(define_operator_list SQRT BUILT_IN_SQRTF BUILT_IN_SQRT BUILT_IN_SQRTL)
+(define_operator_list POW BUILT_IN_POWF BUILT_IN_POW BUILT_IN_POWL)
+(simplify
+ (SQRT (POW @@0 @@1))
+ (POW (abs @@0) (mult @@1 @{ build_real (TREE_TYPE (@@1), dconsthalf); @})))
+@end smallexample
+
+is the same as
+
+@smallexample
+(for SQRT (BUILT_IN_SQRTF BUILT_IN_SQRT BUILT_IN_SQRTL)
+ POW (BUILT_IN_POWF BUILT_IN_POW BUILT_IN_POWL)
+ (simplify
+  (SQRT (POW @@0 @@1))
+  (POW (abs @@0) (mult @@1 @{ build_real (TREE_TYPE (@@1), dconsthalf); @}))))
+@end smallexample
+
 Another building block are @code{with} expressions in the
 result expression which nest the generated code in a new C block
 followed by its argument:
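
As a side note on the SQRT/POW pattern in the patch above: it rewrites sqrt (pow (x, y)) into pow (|x|, y * 0.5). The following sketch checks that identity numerically with the standard `<cmath>` functions. The helper names `before`, `after` and `nearly_equal` are hypothetical, and the tolerance is an arbitrary choice for illustration.

```cpp
#include <cassert>
#include <cmath>

// The expression as written by the user.
double before(double x, double y) { return std::sqrt(std::pow(x, y)); }

// The expression after the simplification.
double after(double x, double y) { return std::pow(std::fabs(x), y * 0.5); }

// Relative comparison with a small, arbitrarily chosen tolerance.
bool nearly_equal(double a, double b) {
  return std::fabs(a - b) <= 1e-12 * std::fabs(a);
}
```

The two forms agree when pow (x, y) is non-negative, for example x = 3, y = 2.5 or x = -2, y = 4. They differ when pow (x, y) is negative: for x = -2, y = 3 the original yields NaN while the rewritten form yields a finite value, so a rewrite like this is only appropriate where such differences are acceptable.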

[PATCH] Fix the rest of PR target/64056

2014-12-04 Thread Ilya Enkovich

Hi,

This patch adds a check for stpcpy function into 
gcc.target/i386/chkp-strlen-2.c test.

make check RUNTESTFLAGS="i386.exp=chkp-strlen-2.c" is OK.  OK for trunk?

Thanks,
Ilya
--
2014-12-04  Ilya Enkovich  

PR target/64056
* lib/target-supports.exp (check_effective_target_stpcpy): New.
* gcc.target/i386/chkp-strlen-2.c: Add stpcpy target check.


diff --git a/gcc/testsuite/gcc.target/i386/chkp-strlen-2.c b/gcc/testsuite/gcc.target/i386/chkp-strlen-2.c
index 1ce426d..67691ee 100644
--- a/gcc/testsuite/gcc.target/i386/chkp-strlen-2.c
+++ b/gcc/testsuite/gcc.target/i386/chkp-strlen-2.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target mpx } */
+/* { dg-require-effective-target stpcpy } */
 /* { dg-options "-fcheck-pointer-bounds -mmpx -O2 -fdump-tree-strlen" } */
 /* { dg-final { scan-tree-dump-not "strlen" "strlen" } } */
 /* { dg-final { cleanup-tree-dump "strlen" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index ac04d95..0a911c1 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5856,6 +5856,12 @@ proc check_effective_target_mempcpy {} {
 return [check_function_available "mempcpy"]
 }
 
+# Returns 1 if "stpcpy" is available on the target system.
+
+proc check_effective_target_stpcpy {} {
+return [check_function_available "stpcpy"]
+}
+
 # Check whether the vectorizer tests are supported by the target and
 # append additional target-dependent compile flags to DEFAULT_VECTCFLAGS.
 # Set dg-do-what-default to either compile or run, depending on target

Re: [PATCH] Fix the rest of PR target/64056

2014-12-04 Thread Rainer Orth

Hi Ilya,


> This patch adds a check for stpcpy function into
> gcc.target/i386/chkp-strlen-2.c test.
>
> make check RUNTESTFLAGS="i386.exp=chkp-strlen-2.c" is OK.  OK for trunk?
>
> Thanks,
> Ilya
> --
> 2014-12-04  Ilya Enkovich  
>
>   PR target/64056
>   * lib/target-supports.exp (check_effective_target_stpcpy): New.

new effective-target keywords need documentation in sourcebuild.texi.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

[patch c++]: Fix PR/64100 'A static assert using the current class in a noexcept test leads to a segfault'

2014-12-04 Thread Kai Tietz

Hi,

The issue is that lookup_destructor calls
adjust_result_of_qualified_name_lookup with a NULL_TREE decl (returned by
lookup_member), so no error message is emitted.

As already discussed in bug-tracker:

ChangeLog

2014-12-04  Kai Tietz  

PR c++/64100
* typeck.c (lookup_destructor): Handle incomplete
type.

Tested on x86_64-unknown-linux-gnu.

OK to apply?

Kai
Index: typeck.c
===
--- typeck.c	(revision 218309)
+++ typeck.c	(working copy)
@@ -2536,6 +2535,12 @@ lookup_destructor (tree object, tree scope, tree d
   expr = lookup_member (dtor_type, complete_dtor_identifier,
 /*protect=*/1, /*want_type=*/false,
 tf_warning_or_error);
+  if (!expr)
+{
+  if (complain & tf_error)
+cxx_incomplete_type_error (dtor_name, dtor_type);
+  return error_mark_node;
+}
   expr = (adjust_result_of_qualified_name_lookup
   (expr, dtor_type, object_type));
   if (scope == NULL_TREE)

Re: [PATCH] Mark explicit decls as implicit when we've seen a prototype

2014-12-04 Thread Joseph Myers

On Thu, 4 Dec 2014, Richard Biener wrote:

> Currently even when I prototype
> 
> double exp10 (double);
> 
> this function is not available to optimizers for code generation if
> they just check for builtin_decl_implicit (BUILT_IN_EXP10).
> Curiously though the function is identified as BUILT_IN_EXP10 when
> used though, thus the middle-end assumes it has expected exp10
> semantics.  I see we already cheat with stpcpy and make it available
> to optimizers by marking it implicit when we've seen a prototype.
> 
> The following patch proposed to do that for all builtins.
> 
> At least I can't see how interpreting exp10 as exp10 but then
> not being allowed to use it as exp10 is sensible.
> 
> Now one could argue that declaring exp10 doesn't mean there is
> an implementation available at link time (after all declaring
> exp10 doesn't mean I use exp10 anywhere).  But that's true
> for implicit decls as well - I might declare 'pow', not use
> it and not link against libm.  So if the compiler now emits
> a call to pow my link will break:

Logically I think the following should apply (no doubt there are various 
existing bugs in this area; at least, PR 46926 for sincos calls being 
generated in circumstances when that name is not reserved):

* If a function is assumed to have particular semantics, it can also be 
assumed to be available for code generation, *except* that implicitly 
introducing references to functions that may not be in libc is unsafe 
unless the user's program explicitly used a function from the containing 
library.  (Libraries for things that may not be in libc are as listed at 
; only 
libm for ,  and  functions is likely to be 
relevant to GCC.  "explicitly used" is as in C11 6.9#5: "used in an 
expression (other than as part of the operand of a sizeof or _Alignof 
operator whose result is an integer constant)" (appropriately extended to 
cover GNU C extensions, such as typeof with non-variably-modified 
arguments).)

I don't think there's any existing infrastructure to avoid introducing 
e.g. calls to pow (in the absence of explicit calls to some libm function) 
in cases such as you suggest (and of course pow is an ISO C90 function so 
its semantics can always be assumed - the question is just whether libm 
will be used).

* If __builtin_foo (a built-in function where there is a corresponding 
function foo, standard or otherwise) is explicitly used, except maybe for 
calls with constant arguments expected to be optimized, the function foo 
may be assumed to be available and to have the expected semantics.  This 
applies regardless of what library foo is expected to be in, and 
regardless of whether foo is defined by any standard selected with -std, 
and regardless of any target-specific configuration information about what 
functions the target libraries provide (e.g. __builtin_sincos calls can be 
expanded to sincos whether or not the target has such a function).  
Furthermore, an explicit use of __builtin_foo for a libm function can be 
taken as meaning that libm will be linked in, except maybe for calls with 
constant arguments expected to be optimized.

(The point of the constant arguments exception is that calls to 
__builtin_nan (""), for example, are OK in static initializers, and maybe 
shouldn't be taken as meaning the nan library function is available or 
libm will be linked in.  The built-in functions for generating NaNs may be 
the only case for which this needs to apply.)

* As documented for -nostdlib, memcmp, memset, memcpy and memmove may 
always be assumed to be available, even with -ffreestanding, regardless of 
what explicit calls are or are not present.

* Built-in functions enabled by the selected -std (which in the default 
mode should mean all of them, but in modes such as -std=c11 is a smaller 
set) may always be assumed to be available in libraries linked in and to 
have the expected semantics, subject to (a) any target-specific 
configuration (targetm.libc_has_function) and (b) whether there are any 
explicit calls to functions in the relevant library (not generating calls 
to pow in the absence of any explicit libm function calls).

* Built-in functions reserved but not enabled by the selected -std (e.g. 
float and long double functions for -std=c90) may be treated the same as 
those enabled by the selected -std: it's never valid for the user to use 
them with any other semantics, and targetm.libc_has_function determines 
whether the function is available to generate calls to it (together with 
the presence of any explicit calls to libm functions).

* Built-in functions not reserved or enabled by the selected -std may be 
assumed to be available and to have the expected semantics (subject to 
some functions from the relevant library being explicitly called, in the 
case of libm functions) if declared *in a system header*.  So if you use 
-std=c11 -D_GNU_SOURCE and then include a header declaring ex

[patch c++]: Fix PR/64127 ICE on invalid: tree check: expected identifier_node, have template_id_expr in cp_parser_diagnose_invalid_type_name

2014-12-04 Thread Kai Tietz

Hi,

this patch fixes an ICE on invalid code when compiling for dialects older
than C++11.  It is caused by blindly accessing the identifier without first
checking that it is a declaration.

ChangeLog

2014-12-04  Kai Tietz  

PR c++/64127
* parser.c (cp_parser_diagnose_invalid_type_name): Check
id for being a declaration before accessing identifier.

Tested on x86_64-unknown-linux-gnu.

OK to apply?

Regards,
Kai

Index: parser.c
===
--- parser.c	(revision 218309)
+++ parser.c	(working copy)
@@ -2977,6 +2977,7 @@ cp_parser_diagnose_invalid_type_name (cp_parser *p
 inform (location, "C++11 % only available with "
 "-std=c++11 or -std=gnu++11");
   else if (cxx_dialect < cxx11
+   && DECL_P (id)
&& !strcmp (IDENTIFIER_POINTER (id), "thread_local"))
 inform (location, "C++11 %<thread_local%> only available with "
 "-std=c++11 or -std=gnu++11");

Re: [PATCH] Mark explicit decls as implicit when we've seen a prototype

2014-12-04 Thread Richard Biener

On Thu, 4 Dec 2014, Joseph Myers wrote:

> On Thu, 4 Dec 2014, Richard Biener wrote:
> 
> > Currently even when I prototype
> > 
> > double exp10 (double);
> > 
> > this function is not available to optimizers for code generation if
> > they just check for builtin_decl_implicit (BUILT_IN_EXP10).
> > Curiously though the function is identified as BUILT_IN_EXP10 when
> > used though, thus the middle-end assumes it has expected exp10
> > semantics.  I see we already cheat with stpcpy and make it available
> > to optimizers by marking it implicit when we've seen a prototype.
> > 
> > The following patch proposed to do that for all builtins.
> > 
> > At least I can't see how interpreting exp10 as exp10 but then
> > not being allowed to use it as exp10 is sensible.
> > 
> > Now one could argue that declaring exp10 doesn't mean there is
> > an implementation available at link time (after all declaring
> > exp10 doesn't mean I use exp10 anywhere).  But that's true
> > for implicit decls as well - I might declare 'pow', not use
> > it and not link against libm.  So if the compiler now emits
> > a call to pow my link will break:
> 
> Logically I think the following should apply (no doubt there are various 
> existing bugs in this area; at least, PR 46926 for sincos calls being 
> generated in circumstances when that name is not reserved):
> 
> * If a function is assumed to have particular semantics, it can also be 
> assumed to be available for code generation, *except* that implicitly 
> introducing references to functions that may not be in libc is unsafe 
> unless the user's program explicitly used a function from the containing 
> library.  (Libraries for things that may not be in libc are as listed at 
> ; only 
> libm for ,  and  functions is likely to be 
> relevant to GCC.  "explicitly used" is as in C11 6.9#5: "used in an 
> expression (other than as part of the operand of a sizeof or _Alignof 
> operator whose result is an integer constant)" (appropriately extended to 
> cover GNU C extensions, such as typeof with non-variably-modified 
> arguments).)
> 
> I don't think there's any existing infrastructure to avoid introducing 
> e.g. calls to pow (in the absence of explicit calls to some libm function) 
> in cases such as you suggest (and of course pow is an ISO C90 function so 
> its semantics can always be assumed - the question is just whether libm 
> will be used).
> 
> * If __builtin_foo (a built-in function where there is a corresponding 
> function foo, standard or otherwise) is explicitly used, except maybe for 
> calls with constant arguments expected to be optimized, the function foo 
> may be assumed to be available and to have the expected semantics.  This 
> applies regardless of what library foo is expected to be in, and 
> regardless of whether foo is defined by any standard selected with -std, 
> and regardless of any target-specific configuration information about what 
> functions the target libraries provide (e.g. __builtin_sincos calls can be 
> expanded to sincos whether or not the target has such a function).  
> Furthermore, an explicit use of __builtin_foo for a libm function can be 
> taken as meaning that libm will be linked in, except maybe for calls with 
> constant arguments expected to be optimized.
> 
> (The point of the constant arguments exception is that calls to 
> __builtin_nan (""), for example, are OK in static initializers, and maybe 
> shouldn't be taken as meaning the nan library function is available or 
> libm will be linked in.  The built-in functions for generating NaNs may be 
> the only case for which this needs to apply.)
> 
> * As documented for -nostdlib, memcmp, memset, memcpy and memmove may 
> always be assumed to be available, even with -ffreestanding, regardless of 
> what explicit calls are or are not present.
> 
> * Built-in functions enabled by the selected -std (which in the default 
> mode should mean all of them, but in modes such as -std=c11 is a smaller 
> set) may always be assumed to be available in libraries linked in and to 
> have the expected semantics, subject to (a) any target-specific 
> configuration (targetm.libc_has_function) and (b) whether there are any 
> explicit calls to functions in the relevant library (not generating calls 
> to pow in the absence of any explicit libm function calls).
> 
> * Built-in functions reserved but not enabled by the selected -std (e.g. 
> float and long double functions for -std=c90) may be treated the same as 
> those enabled by the selected -std: it's never valid for the user to use 
> them with any other semantics, and targetm.libc_has_function determines 
> whether the function is available to generate calls to it (together with 
> the presence of any explicit calls to libm functions).
> 
> * Built-in functions not reserved or enabled by the selected -std may be 
> assumed to be available and to have the expected semantics (subj

Re: [PATCH] Fix the rest of PR target/64056

2014-12-04 Thread Ilya Enkovich

On 04 Dec 15:58, Rainer Orth wrote:
> Hi Ilya,
> 
> 
> > This patch adds a check for stpcpy function into
> > gcc.target/i386/chkp-strlen-2.c test.
> >
> > make check RUNTESTFLAGS="i386.exp=chkp-strlen-2.c" is OK.  OK for trunk?
> >
> > Thanks,
> > Ilya
> > --
> > 2014-12-04  Ilya Enkovich  
> >
> > PR target/64056
> > * lib/target-supports.exp (check_effective_target_stpcpy): New.
> 
> new effective-target keywords need documentation in sourcebuild.texi.
> 
>   Rainer
> 
> -- 
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University

Thanks for the notice!  I see there is also no description for the mempcpy
target check, so I have added both of them.

Thanks,
Ilya
--
gcc/

2014-12-04  Ilya Enkovich  

PR target/64056
* doc/sourcebuild.texi: Add mempcpy and stpcpy for Effective-Target 
Keywords.

gcc/testsuite/

2014-12-04  Ilya Enkovich  

PR target/64056
* lib/target-supports.exp (check_effective_target_stpcpy): New.
* gcc.target/i386/chkp-strlen-2.c: Add stpcpy target check.


diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 20a206d..797d3ee 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1761,6 +1761,9 @@ Target might have errors of a few ULP in string to floating-point
 conversion functions and overflow is not always detected correctly by
 those functions.
 
+@item mempcpy
+Target provides @code{mempcpy} function.
+
 @item mmap
 Target supports @code{mmap}.
 
@@ -1790,6 +1793,9 @@ Target has the basic signed and unsigned C types in @code{stdint.h}.
 This will be obsolete when GCC ensures a working @code{stdint.h} for
 all targets.
 
+@item stpcpy
+Target provides @code{stpcpy} function.
+
 @item trampolines
 Target supports trampolines.
 
diff --git a/gcc/testsuite/gcc.target/i386/chkp-strlen-2.c b/gcc/testsuite/gcc.target/i386/chkp-strlen-2.c
index 1ce426d..67691ee 100644
--- a/gcc/testsuite/gcc.target/i386/chkp-strlen-2.c
+++ b/gcc/testsuite/gcc.target/i386/chkp-strlen-2.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target mpx } */
+/* { dg-require-effective-target stpcpy } */
 /* { dg-options "-fcheck-pointer-bounds -mmpx -O2 -fdump-tree-strlen" } */
 /* { dg-final { scan-tree-dump-not "strlen" "strlen" } } */
 /* { dg-final { cleanup-tree-dump "strlen" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index ac04d95..0a911c1 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5856,6 +5856,12 @@ proc check_effective_target_mempcpy {} {
 return [check_function_available "mempcpy"]
 }
 
+# Returns 1 if "stpcpy" is available on the target system.
+
+proc check_effective_target_stpcpy {} {
+return [check_function_available "stpcpy"]
+}
+
 # Check whether the vectorizer tests are supported by the target and
 # append additional target-dependent compile flags to DEFAULT_VECTCFLAGS.
 # Set dg-do-what-default to either compile or run, depending on target

Re: [patch, build] Restore bootstrap in building libcc1 on darwin

2014-12-04 Thread FX

> Can you try adding it as
> 
> T_CFLAGS += -mdynamic-no-pic
> 
> in gcc/config/t-darwin instead?

Nope, doing so fails to link libgcc_s.dylib:

/Users/fx/devel/gcc/i/./gcc/xgcc -B/Users/fx/devel/gcc/i/./gcc/ 
-B/Users/fx/devel/gcc/i2/i386-apple-darwin14.0.0/bin/ 
-B/Users/fx/devel/gcc/i2/i386-apple-darwin14.0.0/lib/ -isystem 
/Users/fx/devel/gcc/i2/i386-apple-darwin14.0.0/include -isystem 
/Users/fx/devel/gcc/i2/i386-apple-darwin14.0.0/sys-include-O2  -g -O2 
-DIN_GCC-mdynamic-no-pic -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wno-format -Wstrict-prototypes -Wmissing-prototypes 
-Wold-style-definition  -isystem ./include   -pipe -fno-common -g -DIN_LIBGCC2 
-fbuilding-libgcc -fno-stack-protector  -dynamiclib -nodefaultlibs 
-install_name /Users/fx/devel/gcc/i2/lib/libgcc_s.1.dylib -single_module -o 
./libgcc_s.dylib -Wl,-exported_symbols_list,libgcc.map -compatibility_version 1 
-current_version 1.0 -g -O2 -B./ _muldi3_s.o _negdi2_s.o _lshrdi3_s.o 
_ashldi3_s.o _ashrdi3_s.o _cmpdi2_s.o _ucmpdi2_s.o _clear_cache_s.o 
_trampoline_s.o __main_s.o _absvsi2_s.o _absvdi2_s.o _addvsi3_s.o _addvdi3_s.o 
_subvsi3_s.o _subvdi3_s.o _mulvsi3_s.o _mulvdi3_s.o _negvsi2_s.o _negvdi2_s.o 
_ctors_s.o _ffssi2_s.o _ffsdi2_s.o _clz_s.o _clzsi2_s.o _clzdi2_s.o _ctzsi2_s.o 
_ctzdi2_s.o _popcount_tab_s.o _popcountsi2_s.o _popcountdi2_s.o _paritysi2_s.o 
_paritydi2_s.o _powisf2_s.o _powidf2_s.o _powixf2_s.o _powitf2_s.o _mulsc3_s.o 
_muldc3_s.o _mulxc3_s.o _multc3_s.o _divsc3_s.o _divdc3_s.o _divxc3_s.o 
_divtc3_s.o _bswapsi2_s.o _bswapdi2_s.o _clrsbsi2_s.o _clrsbdi2_s.o 
_fixunssfsi_s.o _fixunsdfsi_s.o _fixunsxfsi_s.o _fixsfdi_s.o _fixdfdi_s.o 
_fixxfdi_s.o _fixunssfdi_s.o _fixunsdfdi_s.o _fixunsxfdi_s.o _floatdisf_s.o 
_floatdidf_s.o _floatdixf_s.o _floatundisf_s.o _floatundidf_s.o 
_floatundixf_s.o _fixsfti_s.o _fixdfti_s.o _fixxfti_s.o _fixtfti_s.o 
_fixunssfti_s.o _fixunsdfti_s.o _fixunsxfti_s.o _fixunstfti_s.o _floattisf_s.o 
_floattidf_s.o _floattixf_s.o _floattitf_s.o _floatuntisf_s.o _floatuntidf_s.o 
_floatuntixf_s.o _floatuntitf_s.o _divdi3_s.o _moddi3_s.o _udivdi3_s.o 
_umoddi3_s.o _udiv_w_sdiv_s.o _udivmoddi4_s.o darwin-64_s.o cpuinfo_s.o 
tf-signs_s.o sfp-exceptions_s.o addtf3_s.o divtf3_s.o eqtf2_s.o getf2_s.o 
letf2_s.o multf3_s.o negtf2_s.o subtf3_s.o unordtf2_s.o fixtfsi_s.o 
fixunstfsi_s.o floatsitf_s.o floatunsitf_s.o fixtfdi_s.o fixunstfdi_s.o 
floatditf_s.o floatunditf_s.o extendsftf2_s.o extenddftf2_s.o extendxftf2_s.o 
trunctfsf2_s.o trunctfdf2_s.o trunctfxf2_s.o enable-execute-stack_s.o 
unwind-dw2_s.o unwind-dw2-fde-darwin_s.o unwind-sjlj_s.o unwind-c_s.o 
emutls_s.o libgcc.a -lc
ld: warning: could not create compact unwind for __Unwind_RaiseException: does 
not use EBP or ESP based frame
ld: warning: could not create compact unwind for __Unwind_ForcedUnwind: does 
not use EBP or ESP based frame
ld: warning: could not create compact unwind for __Unwind_Resume: does not use 
EBP or ESP based frame
ld: warning: could not create compact unwind for __Unwind_Resume_or_Rethrow: 
does not use EBP or ESP based frame
ld: illegal text-relocation to '4-byte-literal' in _powisf2_s.o from 
'___powisf2' in _powisf2_s.o for architecture i386
collect2: error: ld returned 1 exit status
make[3]: *** [libgcc_s.dylib] Error 1
make[2]: *** [all-stage1-target-libgcc] Error 2
make[1]: *** [stage1-bubble] Error 2
make: *** [all] Error 2

Re: [PATCH] Mark explicit decls as implicit when we've seen a prototype

2014-12-04 Thread Joseph Myers

On Thu, 4 Dec 2014, Richard Biener wrote:

> So what does this all mean in practice for optimization passes?

I don't know what it means in terms of how to fix the various existing 
problems - it's simply how I think a fixed compiler should behave.

> When b) does not apply then the given stpcpy special-casing shows
> that even then this is not enough.  So should that special-casing
> and any extension of it add
> 
>   && in_system_header_at (DECL_SOURCE_LOCATION (newdecl))
> 
> ?

I think that would be logically correct (for allowing stpcpy calls to be 
generated in optimization if the user included <string.h> with 
_POSIX_C_SOURCE=200809L defined, for example, but not if they used 
-std=c11 without feature test macros and declared their own stpcpy).  But 
the same should apply to any optimization of stpcpy calls written by the 
user - they can only be assumed to have standard stpcpy semantics if (a) 
the selected standard includes stpcpy, e.g. -std=gnu11 or (b) stpcpy was 
declared in a system header, not just by the user.
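
As a reference point for the semantics under discussion, here is a minimal C sketch of a user-provided function that behaves like POSIX stpcpy, i.e. it returns a pointer to the terminating NUL of the destination.  The names my_stpcpy and copy_and_measure are hypothetical and chosen so that they do not clash with the library declaration; this is not GCC code, only an illustration of what the optimizers would be assuming.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-provided function with POSIX stpcpy semantics:
   copy SRC (including its NUL) to DST and return a pointer to the
   terminating NUL that was written into DST.  */
char *
my_stpcpy (char *dst, const char *src)
{
  size_t n = strlen (src);
  memcpy (dst, src, n + 1);
  return dst + n;
}

/* The pattern a strcpy + strlen pair collapses to when a stpcpy with
   the expected semantics may be used: the length falls out of the
   returned pointer, so no second pass over the string is needed.  */
size_t
copy_and_measure (char *dst, const char *src)
{
  return (size_t) (my_stpcpy (dst, src) - dst);
}
```

A user-declared stpcpy that returned, say, DST itself would silently break such a transformation, which is why the origin of the declaration matters.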

-- 
Joseph S. Myers
jos...@codesourcery.com

[patch c++]: Fix PR c++/64106 ICE on valid code

2014-12-04 Thread Kai Tietz

Hi,

this patch adds INDIRECT_REF support to cxx_eval_store_expression handling.
Marek suggested a different variant, which adds an additional
operand-0 to ref; that looks wrong to me.

ChangeLog gcc/cp

2014-12-04  Kai Tietz  

PR c++/64106
* constexpr.c (cxx_eval_store_expression): Handle INDIRECT_REF.

ChangeLog testsuite

2014-12-04  Kai Tietz  

PR c++/64106
* g++.dg/pr64106.C: New file.

Tested on x86_64-unknown-linux-gnu.

Ok for apply?

Regards,
Kai

Index: constexpr.c
===
--- constexpr.c (Revision 218142)
+++ constexpr.c (Arbeitskopie)
@@ -2486,7 +2550,9 @@ cxx_eval_store_expression (const constexpr_ctx *ct
  vec_safe_push (refs, TREE_TYPE (probe));
  probe = TREE_OPERAND (probe, 0);
  break;
-
+   case INDIRECT_REF:
+ probe = TREE_OPERAND (probe, 0);
+ break;
default:
  object = probe;
  gcc_assert (DECL_P (object));
Index: gcc/gcc/testsuite/g++.dg/pr64106.C
===
--- /dev/null
+++ gcc/gcc/testsuite/g++.dg/pr64106.C
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+
+void
+f (long &c, int &lc, char *&out)
+{
+  while (lc >= 8) *out++ = (c >> (lc -= 8));
+}

Re: [PATCH] Mark explicit decls as implicit when we've seen a prototype

2014-12-04 Thread Richard Biener

On Thu, 4 Dec 2014, Joseph Myers wrote:

> On Thu, 4 Dec 2014, Richard Biener wrote:
> 
> > So what does this all mean in practice for optimization passes?
> 
> I don't know what it means in terms of how to fix the various existing 
> problems - it's simply how I think a fixed compiler should behave.
> 
> > When b) does not apply then the given stpcpy special-casing shows
> > that even then this is not enough.  So should that special-casing
> > and any extension of it add
> > 
> >   && in_system_header_at (DECL_SOURCE_LOCATION (newdecl))
> > 
> > ?
> 
> I think that would be logically correct (for allowing stpcpy calls to be 
> generated in optimization if the user included <string.h> with 
> _POSIX_C_SOURCE=200809L defined, for example, but not if they used 
> -std=c11 without feature test macros and declared their own stpcpy).  But 
> the same should apply to any optimization of stpcpy calls written by the 
> user - they can only be assumed to have standard stpcpy semantics if (a) 
> the selected standard includes stpcpy, e.g. -std=gnu11 or (b) stpcpy was 
> declared in a system header, not just by the user.

Yeah - it is that current difference that appears most odd to me.
We assume stpcpy semantics (stick BUILT_IN_STPCPY on the decl) as
soon as we see a (compatible?) decl but we do not allow ourselves to
generate a call to it because semantics may be different(!?).

OTOH this also means the user cannot provide a conforming
implementation on his own and get that used by GCC without editing
system headers or including a header with -isystem or similar
tricks.

Richard.

Re: [patch, build] Restore bootstrap in building libcc1 on darwin

2014-12-04 Thread Iain Sandoe

On 4 Dec 2014, at 15:24, FX wrote:

>> Can you try adding it as
>> 
>> T_CFLAGS += -mdynamic-no-pic
>> 
>> in gcc/config/t-darwin instead?
> 

-mdynamic-no-pic should be used to build *host* executable stuff for m32 darwin.

It is not suitable for building shared libraries (hence the problem with 
building the PIC version of libiberty) and won't work for the target libraries 
for similar reasons.

If you want a "quick fix", sure remove it from the boot cflags - but it's 
hiding a real issue which is that the pic build of libiberty does not cater for 
the possibility that the non-pic flags cannot simply be overridden by the pic 
ones.

Of course, it's possible that darwin is the only affected target - but I'd not 
want to swear to that.

Iain

Re: [PATCH] Mark explicit decls as implicit when we've seen a prototype

2014-12-04 Thread Joseph Myers

On Thu, 4 Dec 2014, Richard Biener wrote:

> OTOH this also means the user cannot provide a conforming
> implementation on his own and get that used by GCC without editing
> system headers or including a header with -isystem or similar
> tricks.

Well - you could have a pragma / attribute for that purpose (declaring 
"this program is providing a version of function X that has the semantics 
GCC expects for function X", so GCC can both generate and optimize calls).  
Such a pragma / attribute could also override targetm.libc_has_function 
(for the case of the user providing their own definition of something 
missing from their system's standard libraries).  A related case would be 
declaring somehow "I will be linking in libm, even though this translation 
unit doesn't appear to be using libm functions, so calls to libm functions 
can be implicitly generated", if GCC were made to avoid introducing uses 
of libm.  (Again, this would be a matter of providing a cleaner interface 
rather than something that currently can't be expressed at all - the 
proposed definition of what it means to use libm explicitly implies that 
"if (0) (void) sqrt (0);" says that libm is being used.)

-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH] Careful with BIT_NOT_EXPR folding (PR middle-end/56917)

2014-12-04 Thread Marek Polacek

The PR shows a case in which fold introduces undefined behavior in a
valid program, because what it does here is
-(long int) (ul + ULONG_MAX) - 1 ->
~(long int) (ul + ULONG_MAX) ->
-(long int) ul
But the latter transformation is wrong if ul is unsigned long and equals
LONG_MAX + 1UL, because that introduces -LONG_MIN, and ubsan/-ftrapv
errors on that, even though the program is valid.
(Converting LONG_MAX + 1UL to long is implementation-defined.)
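
A minimal C sketch of the problem case, assuming GCC's implementation-defined unsigned-to-signed conversion (modulo 2^N).  The helper names are hypothetical and only serve to show which operand the folded form would have to negate:

```c
#include <limits.h>

/* The operand that the folded form -(long) ul has to negate.  */
long
operand_after_fold (unsigned long ul)
{
  return (long) ul;
}

/* Negating a long overflows for exactly one value, LONG_MIN.  */
int
negation_would_overflow (long x)
{
  return x == LONG_MIN;
}

/* The expression as written in the PR.  For ul == LONG_MAX + 1UL it
   negates LONG_MAX and then subtracts 1, so no step overflows.  */
long
unfolded (unsigned long ul)
{
  return -(long) (ul + ULONG_MAX) - 1;
}
```

For ul == LONG_MAX + 1UL the source expression is well defined and yields LONG_MIN, while the folded form would have to compute -LONG_MIN.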

I tried to limit the folding a bit so that it doesn't introduce UB,
but I would really appreciate a second pair of eyes here...

Bootstrapped/regtested on ppc64-linux, ok for trunk?

2014-12-04  Marek Polacek  

PR middle-end/56917
* fold-const.c (fold_unary_loc): Don't introduce undefined behavior
when transforming ~ (A - 1) or ~ (A + -1) to -A.

* c-c++-common/ubsan/pr56917.c: New test.

diff --git gcc/fold-const.c gcc/fold-const.c
index 17eb5bb..1e56b28 100644
--- gcc/fold-const.c
+++ gcc/fold-const.c
@@ -8140,7 +8140,16 @@ fold_unary_loc (location_t loc, enum tree_code code, 
tree type, tree op0)
   && ((TREE_CODE (arg0) == MINUS_EXPR
&& integer_onep (TREE_OPERAND (arg0, 1)))
   || (TREE_CODE (arg0) == PLUS_EXPR
-  && integer_all_onesp (TREE_OPERAND (arg0, 1)))))
+  && integer_all_onesp (TREE_OPERAND (arg0, 1))))
+  /* Be careful not to introduce new overflows when
+ transforming to -A.  That might happen if type of A
+ is unsigned type for TYPE and the value of A is equal
+ TYPE_MAX + 1.  */
+  && (TYPE_PRECISION (type)
+  != TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg0, 1)))
+  || TYPE_OVERFLOW_WRAPS (type)
+  || TYPE_UNSIGNED (type)
+ == TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (arg0, 1)))))
return fold_build1_loc (loc, NEGATE_EXPR, type,
fold_convert_loc (loc, type,
  TREE_OPERAND (arg0, 0)));
diff --git gcc/testsuite/c-c++-common/ubsan/pr56917.c 
gcc/testsuite/c-c++-common/ubsan/pr56917.c
index e69de29..e676962 100644
--- gcc/testsuite/c-c++-common/ubsan/pr56917.c
+++ gcc/testsuite/c-c++-common/ubsan/pr56917.c
@@ -0,0 +1,34 @@
+/* PR middle-end/56917 */
+/* { dg-do run } */
+/* { dg-options "-fsanitize=undefined -fno-sanitize-recover=undefined" } */
+
+#define INT_MIN (-__INT_MAX__ - 1)
+#define LONG_MIN (-__LONG_MAX__ - 1L)
+#define LLONG_MIN (-__LONG_LONG_MAX__ - 1LL)
+
+int __attribute__ ((noinline,noclone))
+fn1 (unsigned int u)
+{
+  return (-(int) (u - 1U)) - 1;
+}
+
+long __attribute__ ((noinline,noclone))
+fn2 (unsigned long int ul)
+{
+  return (-(long) (ul - 1UL)) - 1L;
+}
+
+long long __attribute__ ((noinline,noclone))
+fn3 (unsigned long long int ull)
+{
+  return (-(long long) (ull - 1ULL)) - 1LL;
+}
+
+int
+main (void)
+{
+  if (fn1 (__INT_MAX__ + 1U) != INT_MIN
+  || fn2 (__LONG_MAX__ + 1UL) != LONG_MIN
+  || fn3 (__LONG_LONG_MAX__ + 1ULL) != LLONG_MIN)
+__builtin_abort ();
+}

Marek

Re: [patch c++]: Fix PR/64100 'A static assert using the current class in a noexcept test leads to a segfault'

2014-12-04 Thread Marek Polacek

On Thu, Dec 04, 2014 at 04:00:34PM +0100, Kai Tietz wrote:
> Hi,
> 
>  The issue is that lookup_destructor calls
> adjust_result_of_qualified_name_lookup
> with a NULL_TREE decl (returned by lookup_member).  So the error message
> is missing.
> 
> As already discussed in bug-tracker:
> 
> ChangeLog
> 
> 2014-12-04  Kai Tietz  
> 
> PR c++/64100
> * typeck.c (lookup_destructor): Handle incomplete
> type.
> 
> Tested on x86_64-unknown-linux-gnu.
> 
> Ok for apply?

Testcase?

Marek

Re: [patch c++]: Fix PR/64127 ICE on invalid: tree check: expected identifier_node, have template_id_expr in cp_parser_diagnose_invalid_type_name

2014-12-04 Thread Marek Polacek

On Thu, Dec 04, 2014 at 04:12:02PM +0100, Kai Tietz wrote:
> Hi,
> 
> this patch fixes an ICE happening on invalid code for < c++11.  It is
> caused by blindly accessing the identifier without checking that it is
> a declaration.
> 
> ChangeLog
> 
> 2014-12-04  Kai Tietz  
> 
> PR c++/64127
> * parser.c (cp_parser_diagnose_invalid_type_name): Check
> id for being a declaration before accessing identifier.
> 
> Tested on x86_64-unknown-linux-gnu.
> 
> Ok for apply?

Testcase?

Marek

Re: [patch c++]: Fix PR/64100 'A static assert using the current class in a noexcept test leads to a segfault'

2014-12-04 Thread Kai Tietz

2014-12-04 16:46 GMT+01:00 Marek Polacek :
> On Thu, Dec 04, 2014 at 04:00:34PM +0100, Kai Tietz wrote:
>> Hi,
>>
>>  The issue is that lookup_destructor calls
>> adjust_result_of_qualified_name_lookup
>> with a NULL_TREE decl (returned by lookup_member).  So the error message
>> is missing.
>>
>> As already discussed in bug-tracker:
>>
>> ChangeLog
>>
>> 2014-12-04  Kai Tietz  
>>
>> PR c++/64100
>> * typeck.c (lookup_destructor): Handle incomplete
>> type.
>>
>> Tested on x86_64-unknown-linux-gnu.
>>
>> Ok for apply?
>
> Testcase?
>
> Marek

Well, I left it out, as this is an invalid-code bug with a lot of
side-errors, which might change over time.

If wanted, I can add a testcase to the g++/template/ collection.

Kai

Re: [PATCH] Set stores_off_frame_dead_at_return to false if sibling call

2014-12-04 Thread John David Anglin


On 12/1/2014 11:57 AM, Jeff Law wrote:
> Prior to reload (ie, in DSE1) there's a bit of magic in that we do not 
> set frame_read on call insns.  That may in fact be wrong and possibly 
> the source of the problem.
>
>
>  /* This field is only used for the processing of const functions.
>  These functions cannot read memory, but they can read the stack
>  because that is where they may get their parms.  We need to be
>  this conservative because, like the store motion pass, we don't
>  consider CALL_INSN_FUNCTION_USAGE when processing call insns.
>  Moreover, we need to distinguish two cases:
>  1. Before reload (register elimination), the stores related to
> outgoing arguments are stack pointer based and thus deemed
> of non-constant base in this pass.  This requires special
> handling but also means that the frame pointer based stores
> need not be killed upon encountering a const function call.

The above is wrong for sibcalls.  Sibcall arguments are relative to the
incoming argument pointer.  Is this always the frame pointer?

If that is the case, then it may be possible to eliminate stack based
stores when we have a sibcall before reload.

 if (reload_completed || SIBLING_CALL_P (insn))
  insn_info->frame_read = true;

This works.

Dave

--
John David Anglin    dave.ang...@bell.net

Re: [patch c++]: Fix PR/64127 ICE on invalid: tree check: expected identifier_node, have template_id_expr in cp_parser_diagnose_invalid_type_name

2014-12-04 Thread Kai Tietz

2014-12-04 16:47 GMT+01:00 Marek Polacek :
> On Thu, Dec 04, 2014 at 04:12:02PM +0100, Kai Tietz wrote:
>> Hi,
>>
>> this patch fixes an ICE happening on invalid code for < c++11.  It is
>> caused by blindly accessing the identifier without checking that it is
>> a declaration.
>>
>> ChangeLog
>>
>> 2014-12-04  Kai Tietz  
>>
>> PR c++/64127
>> * parser.c (cp_parser_diagnose_invalid_type_name): Check
>> id for being a declaration before accessing identifier.
>>
>> Tested on x86_64-unknown-linux-gnu.
>>
>> Ok for apply?
>
> Testcase?
>
> Marek

Same as said before.  The issue is an invalid-code bug with an ICE, and
the error messages are pretty meaningless.  It would be helpful to have
in the testsuite just the opportunity to test for no ICE.

Anyway, if a testcase is requested, I can add it to the g++.dg/ collection.

Kai

Re: [patch v2, aarch64] additional bics patterns

2014-12-04 Thread Eric Botcazou

> 2014-11-19  Sandra Loosemore  
> 
>   gcc/
>   * simplify-rtx.c (simplify_relational_operation_1): Handle
>   simplification identities for BICS patterns.
> 
>   gcc/testsuite/
>   * gcc.target/aarch64/bics_4.c: New.

OK for mainline, but there are trailing spaces in the patch.

-- 
Eric Botcazou

Re: [patch c++]: Fix PR/64127 ICE on invalid: tree check: expected identifier_node, have template_id_expr in cp_parser_diagnose_invalid_type_name

2014-12-04 Thread Jakub Jelinek

On Thu, Dec 04, 2014 at 04:59:21PM +0100, Kai Tietz wrote:
> Same as said before.  The issue is an invalid-code bug with an ICE, and
> the error messages are pretty meaningless.  It would be helpful to have
> in the testsuite just the opportunity to test for no ICE.

You can add just // { dg-error "" } on all lines that are diagnosed,
the ICE should FAIL the test anyway.  Please verify it by checking that
the testcase fails without the patch and succeeds with the patch.

> Anyway, if a testcase is requested, I can add it to the g++.dg/ collection.

Yes, please.

Jakub

Re: [PATCH] Careful with BIT_NOT_EXPR folding (PR middle-end/56917)

2014-12-04 Thread Richard Biener

On December 4, 2014 4:45:25 PM CET, Marek Polacek  wrote:
>The PR shows a case in which fold introduces undefined behavior in a
>valid program, because what it does here is
>-(long int) (ul + ULONG_MAX) - 1 ->
>~(long int) (ul + ULONG_MAX) ->
>-(long int) ul
>But the latter transformation is wrong if ul is unsigned long and
>equals
>LONG_MAX + 1UL, because that introduces -LONG_MIN, and ubsan/-ftrapv
>errors on that, even though the program is valid.
>(Converting LONG_MAX + 1UL to long is implementation-defined.)

Implementation defined behavior is OK.

Rather than disabling the transform I'd rather make it defined using unsigned 
types.
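
A minimal sketch of that idea at the C level, assuming GCC's modulo-2^N conversion from unsigned long to long.  The function names are hypothetical; the point is only that the negation is carried out in the unsigned type, where wrap-around is defined, and the conversion to the signed type happens last:

```c
#include <limits.h>

/* ~(long) (ul - 1) computed as a negation in the unsigned type.
   0UL - ul wraps modulo 2^N and is never undefined; the final
   conversion to long is implementation-defined, not undefined.  */
long
negate_in_unsigned (unsigned long ul)
{
  unsigned long neg = 0UL - ul;
  return (long) neg;
}

/* The unsimplified form, for comparison.  */
long
bit_not_reference (unsigned long ul)
{
  return ~(long) (ul - 1UL);
}
```

Both functions agree for every input, including ul == LONG_MAX + 1UL, where each yields LONG_MIN without a signed negation ever taking place.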

Richard.

>I tried to limit the folding a bit so that it doesn't creep in UB,
>but I would really appreciate second pair of eyes here...
>
>Bootstrapped/regtested on ppc64-linux, ok for trunk?
>
>2014-12-04  Marek Polacek  
>
>   PR middle-end/56917
>   * fold-const.c (fold_unary_loc): Don't introduce undefined behavior
>   when transforming ~ (A - 1) or ~ (A + -1) to -A.
>
>   * c-c++-common/ubsan/pr56917.c: New test.
>
>diff --git gcc/fold-const.c gcc/fold-const.c
>index 17eb5bb..1e56b28 100644
>--- gcc/fold-const.c
>+++ gcc/fold-const.c
>@@ -8140,7 +8140,16 @@ fold_unary_loc (location_t loc, enum tree_code
>code, tree type, tree op0)
>  && ((TREE_CODE (arg0) == MINUS_EXPR
>   && integer_onep (TREE_OPERAND (arg0, 1)))
>  || (TREE_CODE (arg0) == PLUS_EXPR
>- && integer_all_onesp (TREE_OPERAND (arg0, 1)))))
>+ && integer_all_onesp (TREE_OPERAND (arg0, 1))))
>+ /* Be careful not to introduce new overflows when
>+transforming to -A.  That might happen if type of A
>+is unsigned type for TYPE and the value of A is equal
>+TYPE_MAX + 1.  */
>+ && (TYPE_PRECISION (type)
>+ != TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg0, 1)))
>+ || TYPE_OVERFLOW_WRAPS (type)
>+ || TYPE_UNSIGNED (type)
>+== TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (arg0, 1)))))
>   return fold_build1_loc (loc, NEGATE_EXPR, type,
>   fold_convert_loc (loc, type,
> TREE_OPERAND (arg0, 0)));
>diff --git gcc/testsuite/c-c++-common/ubsan/pr56917.c
>gcc/testsuite/c-c++-common/ubsan/pr56917.c
>index e69de29..e676962 100644
>--- gcc/testsuite/c-c++-common/ubsan/pr56917.c
>+++ gcc/testsuite/c-c++-common/ubsan/pr56917.c
>@@ -0,0 +1,34 @@
>+/* PR middle-end/56917 */
>+/* { dg-do run } */
>+/* { dg-options "-fsanitize=undefined -fno-sanitize-recover=undefined"
>} */
>+
>+#define INT_MIN (-__INT_MAX__ - 1)
>+#define LONG_MIN (-__LONG_MAX__ - 1L)
>+#define LLONG_MIN (-__LONG_LONG_MAX__ - 1LL)
>+
>+int __attribute__ ((noinline,noclone))
>+fn1 (unsigned int u)
>+{
>+  return (-(int) (u - 1U)) - 1;
>+}
>+
>+long __attribute__ ((noinline,noclone))
>+fn2 (unsigned long int ul)
>+{
>+  return (-(long) (ul - 1UL)) - 1L;
>+}
>+
>+long long __attribute__ ((noinline,noclone))
>+fn3 (unsigned long long int ull)
>+{
>+  return (-(long long) (ull - 1ULL)) - 1LL;
>+}
>+
>+int
>+main (void)
>+{
>+  if (fn1 (__INT_MAX__ + 1U) != INT_MIN
>+  || fn2 (__LONG_MAX__ + 1UL) != LONG_MIN
>+  || fn3 (__LONG_LONG_MAX__ + 1ULL) != LLONG_MIN)
>+__builtin_abort ();
>+}
>
>   Marek

Re: [PATCH x86_64] Optimize access to globals in "-fpie -pie" builds with copy relocations

2014-12-04 Thread H.J. Lu

On Thu, Dec 4, 2014 at 4:44 AM, Uros Bizjak  wrote:
> On Wed, Dec 3, 2014 at 10:35 PM, H.J. Lu  wrote:
>
>>> It would probably help reviewers if you pointed to the actual patch
>>> submission [1], which unfortunately contains the explanation in the
>>> patch itself [2], which further explains that this functionality is
>>> currently only supported with gold, patched with [3].
>>>
>>> [1] https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00645.html
>>> [2] https://gcc.gnu.org/ml/gcc-patches/2014-09/txt2CHtu81P1O.txt
>>> [3] https://sourceware.org/ml/binutils/2014-05/msg00092.html
>>>
>>> After a bit of the above detective work, I think that new gcc option
>>> is not necessary. The configure should detect if new functionality is
>>> supported in the linker, and auto-configure gcc to use it when
>>> appropriate.
>>
>> I think GCC option is needed since one can use -fuse-ld= to
>> change linker.
>
> IMO, nobody will use this highly special x86_64-only option. It would
> be best for gnu-ld to reach feature parity with gold as far as this
> functionality is concerned. In this case, the optimization would be
> auto-configured, and would fire automatically, without any user
> intervention.
>

 Let's do it.  I implemented the same feature in bfd linker on both
 master and 2.25 branch.

>>>
>>> +bool
>>> +i386_binds_local_p (const_tree exp)
>>> +{
>>> +  /* Globals marked extern are treated as local when linker copy 
>>> relocations
>>> + support is available with -f{pie|PIE}.  */
>>> +  if (TARGET_64BIT && ix86_copyrelocs && flag_pie
>>> +  && TREE_CODE (exp) == VAR_DECL
>>> +  && DECL_EXTERNAL (exp) && !DECL_WEAK (exp))
>>> +return true;
>>> +  return default_binds_local_p (exp);
>>> +}
>>> +
>>>
>>> It returns true with -fPIE and false without -fPIE.  It is lying to 
>>> compiler.
>>> Maybe legitimate_pic_address_disp_p is a better place.
>
> Agreed.
>
>> Something like this?
>
> Yes.
>
> OK, if Jakub doesn't have any objections here. Please also add
> Sriraman as author to ChangeLog entry.
>
> Thanks,
> Uros.

Here is the patch.   OK to install?

Thanks.

-- 
H.J.
---
Normally, with -fPIE/-fpie, GCC accesses globals that are extern to the
module using the GOT.  This is two instructions, one to get the address
of the global from the GOT and the other to get the value.  If it turns
out that the global gets defined in the executable at link-time, it still
needs to go through the GOT as it is too late then to generate a direct
access.

Examples:

foo.cc
--
int a_glob;
int main () {
  return a_glob; // defined in this file
}

With -O2 -fpie -pie, the generated code directly accesses the global via
PC-relative insn:

5e0 <main>:
   mov    0x165a(%rip),%eax        # 1c40

foo.cc
--

extern int a_glob;
int main () {
  return a_glob; // defined in another file
}

With -O2 -fpie -pie, the generated code accesses global via GOT using
two memory loads:

6f0 <main>:
   mov    0x1609(%rip),%rax   # 1d00 <_DYNAMIC+0x230>
   mov    (%rax),%eax

This is true even if in the latter case the global was defined in the
executable through a different file.

Some experiments on Google benchmarks show that the extra memory loads
affect performance by 1% to 5%.

Solution - Copy Relocations:

When the linker supports copy relocations, GCC can always assume that
the global will be defined in the executable.  For globals that are truly
extern (come from shared objects), the linker will create copy relocations
and have them defined in the executable. Result is that no global access
needs to go through the GOT and hence improves performance.

This optimization only applies to undefined, non-weak global data.
Undefined, weak global data access still must go through the GOT.

This patch checks if linker supports PIE with copy reloc, which is
enabled in gold and bfd linker in binutils 2.25, at configure time
and enables this optimization if the linker support is available.
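
The conditions described above can be summarised as a small decision function.  This is a toy model in C, not the code in legitimate_pic_address_disp_p; the struct and function names are hypothetical and only mirror the wording of the description (64-bit, PIE, linker support, non-weak, non-function):

```c
#include <stdbool.h>

/* Hypothetical description of a symbol reference.  */
struct symbol_ref
{
  bool is_function;
  bool is_weak;
  bool is_defined_locally;
};

/* Hypothetical description of the compilation target.  */
struct target_cfg
{
  bool is_64bit;
  bool pie;
  bool ld_pie_copyreloc;	/* linker supports PIE with copy reloc */
};

/* Return true if a PC-relative access may be used instead of going
   through the GOT, following the rules stated in the description.  */
bool
can_use_pc_relative (const struct target_cfg *t, const struct symbol_ref *s)
{
  /* Data defined in this module is already accessed directly.  */
  if (s->is_defined_locally)
    return true;
  /* Undefined data: only non-weak, non-function symbols qualify, and
     only in 64-bit PIE with linker copy relocation support.  */
  return (t->is_64bit && t->pie && t->ld_pie_copyreloc
	  && !s->is_function && !s->is_weak);
}
```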

gcc/

* configure.ac (HAVE_LD_PIE_COPYRELOC): Defined to 1 if
Linux/x86-64 linker supports PIE with copy reloc.
* config.in: Regenerated.
* configure: Likewise.

* config/i386/i386.c (legitimate_pic_address_disp_p): Allow
pc-relative address for undefined, non-weak, non-function
symbol reference in 64-bit PIE if linker supports PIE with
copy reloc.

* doc/sourcebuild.texi: Document pie_copyreloc target.

gcc/testsuite/

* gcc.target/i386/pie-copyrelocs-1.c: New test.
* gcc.target/i386/pie-copyrelocs-2.c: Likewise.
* gcc.target/i386/pie-copyrelocs-3.c: Likewise.
* gcc.target/i386/pie-copyrelocs-4.c: Likewise.

* lib/target-supports.exp (check_effective_target_pie_copyreloc):
New procedure.
From d5559a969c541e5375da9372f6925f40b87df5f3 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 4 Dec 2014 08:27:22 -0800
Subject: [PATCH] x86-64: Optimize access to globals in PIE with copy reloc

Normally, with -fPIE/-fpie, GCC accesses globals that are extern to the
modul

Re: [PINGv2][PATCH] Ignore alignment by option

2014-12-04 Thread Dmitry Vyukov

On Thu, Dec 4, 2014 at 5:16 PM, Yury Gribov  wrote:
> On 12/04/2014 05:04 PM, Dmitry Vyukov wrote:
>>
>> On Thu, Dec 4, 2014 at 4:48 PM, Yury Gribov  wrote:
>>>
>>> On 12/04/2014 03:47 PM, Dmitry Vyukov wrote:


 size_in_bytes = -1 instrumentation is too slow to be the default in
 kernel.

 If we want to pursue this, I propose a different scheme.
 Handle 8+ byte accesses as 1/2/4 accesses. No changes to 1/2/4 access
 handling.
 Currently when we allocate, say, 17-byte object we store 0 0 1 into
 shadow. An 8-byte access starting at offset 15 won't be detected,
 because the corresponding shadow value is 0. Instead we start storing
 0 9 1 into shadow. Then the first shadow != 0 check will fail, and the
 precise size check will catch the OOB access.
 Make this scheme the default for kernel (no additional flags).

 This scheme has the following advantages:
 - load shadow only once (as opposed to the current size_in_bytes = -1
 check that loads shadow twice)
 - less code in instrumentation
 - accesses to beginning and middle of the object are not slowed down
 (shadow still contains 0, so fast-path works); only accesses to the
 very last bytes of the object are penalized.
>>>
>>>
>>>
>>> Makes sense.  The scheme actually looks bullet-proof; why did the Asan
>>> team prefer the current (fast but imprecise) algorithm?
>>>
>>> BTW I think we'll want this option in userspace as well, so we'll probably
>>> need to update libasan.
>>
>>
>> We've discussed this scheme, but nobody has shown that it's important
>> enough.
>> It bloats binary (we do have issues with binary sizes) and slows down
>> execution a bit. And if it is non-default mode, then it adds more
>> flags (which is bad) and adds more configurations to test.
>>
>> For this to happen somebody needs to do research on (1) binary size
>> increase, (2) slowdown, (3) number of additional bugs it finds (we can
>> run it over extensive code base that is currently asan-clean).
>
>
> Regarding (3) - unless a codebase deliberately uses unaligned accesses (like
> kernel) this change would be of little use - all unaligned accesses are then
> bugs and should already be detected and fixed with UBSan.

You answered your own question about user space :)
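
For reference, the 17-byte example from the proposed scheme quoted above can be modelled as follows.  This is a sketch under stated assumptions, not the ASan runtime: the granule size of 8, the function names, and the reading of "9" as "8 + 1 bytes are accessible starting at this granule" are assumptions, and objects whose size is a multiple of 8 are left out because the quoted example does not cover them.

```c
#include <stddef.h>

enum { GRANULE = 8, POISON = -1 };

/* Fill NSHADOW shadow bytes for an object of SIZE bytes that starts on
   a granule boundary.  With EXTENDED == 0 this is the current encoding
   (17 bytes -> 0 0 1); with EXTENDED != 0 the granule preceding a
   partial one stores GRANULE + tail (17 bytes -> 0 9 1).  */
void
fill_shadow (signed char *shadow, size_t nshadow, size_t size, int extended)
{
  size_t full = size / GRANULE;
  size_t tail = size % GRANULE;
  size_t i;

  for (i = 0; i < nshadow; i++)
    shadow[i] = POISON;
  for (i = 0; i < full && i < nshadow; i++)
    shadow[i] = 0;
  if (tail != 0 && full < nshadow)
    shadow[full] = (signed char) tail;
  if (extended && tail != 0 && full > 0 && full <= nshadow)
    shadow[full - 1] = (signed char) (GRANULE + tail);
}

/* Check an access of SIZE bytes at byte offset OFF, loading only the
   shadow byte of the first granule touched.  */
int
access_ok (const signed char *shadow, size_t off, size_t size)
{
  signed char s = shadow[off / GRANULE];

  if (s == 0)
    return 1;			/* fast path: whole granule valid */
  if (s < 0)
    return 0;			/* poisoned */
  return off % GRANULE + size <= (size_t) s;	/* precise size check */
}
```

With the current encoding an 8-byte access at offset 15 passes because the shadow byte it consults is 0; with the extended encoding the same access reaches the precise check and is rejected, while accesses to the beginning of the object still take the fast path.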

Re: [PINGv2][PATCH] Ignore alignment by option

2014-12-04 Thread Yuri Gribov

On Thu, Dec 4, 2014 at 8:06 PM, 'Dmitry Vyukov' via address-sanitizer
 wrote:
> You answered your own question about user space :)

Yeah, I hoped someone would rush to overpersuade me...

-Y

Re: Fix folding of EQ/NE_EXPR WRT symtab aliases

2014-12-04 Thread Jan Hubicka

> 
> I think you want get_addr_base_and_unit_offset here.  But I really wonder

I copied what I found in tree-ssa-alias. The difference is that
get_addr_base_and_unit_offset won't give a range for variable sized accesses,
right?

> what cases this code catches that the code in fold_comparison you touched
> above does not?  (apart from previously being bogus in different kind of
> ways)
> 
> So I'd rather remove the code in fold_binary.

Well, the code in fold_binary only handles comparisons where base addresses are
known to be different, i.e. &a==&b returning false.
All the other cases are handled here, i.e. &a==&a or &a[5]==&b[7] etc.
Perhaps the code should be in fold_comparison?
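
At the source level, the kinds of comparison in question look like this.  A minimal C sketch with hypothetical names; the arrays are distinct non-alias objects, so each result is known without running the program:

```c
static int a[10];
static int b[10];

/* Same base, same offset: always equal.  */
int
same_base_same_offset (void)
{
  return &a[5] == &a[5];
}

/* Same base, different constant offsets: never equal.  */
int
same_base_different_offset (void)
{
  return &a[5] == &a[6];
}

/* Different bases, both offsets inside their objects: never equal.  */
int
different_bases (void)
{
  return &a[5] == &b[7];
}
```

The last case is where symtab aliases matter: if a and b could be aliases of one another, the comparison could no longer be folded from the declarations alone.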

Honza
> 
> Thanks,
> Richard.
> 
> > + offset_int offset0 = hwi_offset0;
> > + offset_int offset1 = hwi_offset1;
> > +
> > + /* Add constant offset of MEM_REF to OFFSET, if possible.  */
> > + if (base0 && TREE_CODE (base0) == MEM_REF
> > + && TREE_CODE (TREE_OPERAND (base0, 1)) == INTEGER_CST)
> > +   {
> > + offset0 += wi::lshift (mem_ref_offset (base0), 
> > LOG2_BITS_PER_UNIT);
> > + base0 = TREE_OPERAND (base0, 0);
> > +   }
> > + if (base1 && TREE_CODE (base1) == MEM_REF
> > + && TREE_CODE (TREE_OPERAND (base1, 1)) == INTEGER_CST)
> > +   {
> > + offset1 += wi::lshift (mem_ref_offset (base1), 
> > LOG2_BITS_PER_UNIT);
> > + base1 = TREE_OPERAND (base1, 0);
> > +   }
> > + /* If both offsets of MEM_REF are the same, just ignore them.  */
> > + if (base0 && base1
> > + && TREE_CODE (base0) == MEM_REF
> > + && TREE_CODE (base1) == MEM_REF
> > + && operand_equal_p (TREE_OPERAND (base0, 1), TREE_OPERAND 
> > (base1, 1), 0))
> > +   {
> > + base0 = TREE_OPERAND (base0, 0);
> > + base1 = TREE_OPERAND (base1, 1);
> > +   }
> > + /* If we see MEM_REF with variable offset, just modify MAX_SIZE 
> > to declare that
> > +sizes are unknown.  We can still prove that bases points to 
> > different memory
> > +locations.  */
> > + if (base0 && TREE_CODE (base0) == MEM_REF)
> > +   {
> > + base0 = TREE_OPERAND (base0, 0);
> > + max_size0 = -1;
> > +   }
> > + if (base1 && TREE_CODE (base1) == MEM_REF)
> > +   {
> > + base1 = TREE_OPERAND (base1, 0);
> > + max_size1 = -1;
> > +   }
> > +
> > + /* If bases are equal or they are both declarations, we can 
> > disprove equivalency by
> > +proving that offsets are different.  */
> > + if (base0 && base1 && (base0 == base1 || (DECL_P (base0) && 
> > DECL_P (base1))))
> > +   {
> > + if ((wi::ltu_p (offset0 + max_size0 - size0, offset1)
> > +  || (wi::ltu_p (offset1 + max_size1 - size1, offset0)))
> > + && max_size0 != -1 && max_size1 != -1
> > + && size0 != -1 && size1 != -1)
> > +   return constant_boolean_node (code != EQ_EXPR, type);
> > +   }
> > + /* If bases are equal, then addresses are equal if offsets are.
> > +We can work harder here for non-constant offsets.  */
> > + if (base0 && base0 == base1)
> > +   {
> > + if (base0 == base1
> > + && size0 != -1 && size1 != -1
> > + && (max_size0 == size0) && (max_size1 == size1))
> > +   return constant_boolean_node (code == EQ_EXPR, type);
> > +   }
> > +
> > + if (base0 && base1 && DECL_P (base0) && DECL_P (base1))
> > +   {
> > + bool in_symtab0 = decl_in_symtab_p (base0);
> > + bool in_symtab1 = decl_in_symtab_p (base1);
> > +
> > + /* Symtab and non-symtab declarations never overlap.  */
> > + if (in_symtab0 != in_symtab1)
> > +   return constant_boolean_node (code != EQ_EXPR, type);
> > + /* Non-symtab nodes never have aliases: different declaration means
> > +different memory object.  */
> > + if (!in_symtab0)
> > +   {
> > + if (base0 != base1 && TREE_CODE (base0) != TREE_CODE (base1))
> > +   return constant_boolean_node (code != EQ_EXPR, type);
> > +   }
> > + else
> > +   {
> > + struct symtab_node *symbol0 = symtab_node::get_create (base0);
> > + struct symtab_node *symbol1 = symtab_node::get_create (base1);
> > + int cmp = symbol0->equal_address_to (symbol1);
> > +
> > + if (cmp == 0)
> > +   return constant_boolean_node (code != EQ_EXPR, type);
> > + if (cmp == 1
> > + && size0 != -1 && size1 != -1
> > + && (max_siz
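
For readers following the quoted hunk, the same-base part of the reasoning can be sketched roughly as below. This is only a toy model under stated assumptions: AccessExtent and same_base_addresses_equal are hypothetical names, signed 64-bit integers stand in for the wide-int offsets (the patch itself compares with wi::ltu_p), and -1 marks an unknown size as in the patch. It is not the code from fold-const.c.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, simplified description of one address: the access of SIZE
// bits starts somewhere in [offset, offset + max_size - size] within BASE.
struct AccessExtent {
  int base_id;            // hypothetical identifier of the base object
  std::int64_t offset;    // smallest possible start offset
  std::int64_t size;      // size of the access, -1 if unknown
  std::int64_t max_size;  // size of the window it may fall in, -1 if unknown
};

// Returns true or false when equality of the two addresses is decided,
// std::nullopt when nothing can be proved.  Both must share one base.
std::optional<bool> same_base_addresses_equal(const AccessExtent& a,
                                              const AccessExtent& b) {
  assert(a.base_id == b.base_id);  // precondition of this sketch

  // Unknown sizes (for example after a variable-offset MEM_REF): give up.
  if (a.size == -1 || b.size == -1 || a.max_size == -1 || b.max_size == -1)
    return std::nullopt;

  // Largest start offset each address can take.
  const std::int64_t a_last = a.offset + a.max_size - a.size;
  const std::int64_t b_last = b.offset + b.max_size - b.size;

  // The ranges of possible start offsets do not meet: addresses differ.
  if (a_last < b.offset || b_last < a.offset)
    return false;

  // Both offsets are exact and the check above did not separate them,
  // so the offsets, and therefore the addresses, are equal.
  if (a.max_size == a.size && b.max_size == b.size)
    return true;

  return std::nullopt;
}
```

The caller would then fold the comparison to a constant only when a value is returned, and leave the expression alone otherwise.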

Re: [PATCH 2/2, PR 63814] Do not re-create expanded artificial thunks

2014-12-04 Thread Jan Hubicka

> 
> I did not really like it either (although I must say that in general I
> really dislike the need to call expand_thunk but telling it to not

Yeah, I am thinking about simply teaching the inliner to handle them.  Given the
issues with extra copies for values passed by reference, it may be a better solution.

> really expand anything if possible).  So here is another approach,
> which works by delaying expansion of thunks but only after all
> redirections within create_clone are done.  This has the advantage
> that the new method expand_all_artificial_thunks is not exposed to
> every (indirect) user of cgraph_node::create_clone (as opposed to
> requiring them all to call that function at some point after making
> any redirections at all).  However, it also has the disadvantage that
> in the unlikeliest of circumstances (strongly connected components in
> IPA-CP value dependencies connected by thunks and these thunks ceasing
> to be assembly thunks in the course of cloning at the same time),
> IPA-CP might still create extra gimple-expanded thunks.

Is this because you expand the thunks during creation and lose track of them?
> 
> The alternative not suffering from this problem would be either
> exposing expand_all_artificial_thunks to all users and requiring them
> to call it, with IPA-CP doing it for all created clones at the very
> end of propagation, or teaching the infrastructure and pass manager
> about this and then expanding them at some convenient time
> (before/after inlining?).  However, the first option seems to be too
> ugly for very little benefit (but I am willing to implement that) and
> the second does not seem to be stage3 material.
> 
> What do you think?  Bootstrapped and tested on x86_64-linux.

Hmm, I think thunks in SCC regions are rather rare and small.  Let's go with this
and try to see whether we can handle thunks more regularly in the inliner.
> 
> Martin
> 
> 
> 2014-12-02  Martin Jambor  
> 
>   * cgraph.h (cgraph_node): New method expand_all_artificial_thunks.
>   (cgraph_edge): New method redirect_callee_duplicating_thunks.
>   * cgraphclones.c (duplicate_thunk_for_node): Do not expand newly
>   created thunks.
>   (redirect_edge_duplicating_thunks): Turned into edge method
>   redirect_callee_duplicating_thunks.
>   (cgraph_node::expand_all_artificial_thunks): New method.
>   (create_clone): Call expand_all_artificial_thunks.
>   * ipa-cp.c (perhaps_add_new_callers): Call
>   redirect_callee_duplicating_thunks instead of redirect_callee.
>   Also call expand_all_artificial_thunks.

OK
Honza
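
As a rough illustration of the approach approved above (duplicate thunks without expanding them, then expand once all redirections are done), here is a toy sketch. Node and its members are hypothetical stand-ins rather than GCC's cgraph_node, and the toy expand_thunk always succeeds, which the real one need not do for assembly thunks.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy model of a call-graph node; not GCC's cgraph_node.
struct Node {
  bool thunk_p = false;        // node is still an unexpanded thunk
  bool expanded = false;       // node has been given a real body
  std::vector<Node*> callers;  // nodes that call this one

  // Stand-in for expand_thunk followed by analyze: give the thunk a body.
  bool expand_thunk() {
    assert(thunk_p);  // only meaningful for nodes that are still thunks
    thunk_p = false;
    expanded = true;
    return true;
  }

  // Expand every caller that is a thunk, then continue up the chain,
  // because a thunk may itself be called through another thunk.
  void expand_all_artificial_thunks() {
    for (Node* caller : callers)
      if (caller->thunk_p) {
        caller->expand_thunk();
        caller->expand_all_artificial_thunks();
      }
  }
};
```

In this model the cloning code would first redirect all edges, creating unexpanded thunk copies as needed, and only then call expand_all_artificial_thunks on the new clone, so that no thunk is expanded before its final callee is known.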
> 
> Index: src/gcc/cgraph.h
> ===
> --- src.orig/gcc/cgraph.h
> +++ src/gcc/cgraph.h
> @@ -908,6 +908,10 @@ public:
>   thunks that are not lowered.  */
>bool expand_thunk (bool output_asm_thunks, bool force_gimple_thunk);
>  
> +  /* Call expand_thunk on all callers that are thunks and analyze those
> +  nodes that were expanded.  */
> +  void expand_all_artificial_thunks ();
> +
>/* Assemble thunks and aliases associated to node.  */
>void assemble_thunks_and_aliases (void);
>  
> @@ -1477,6 +1481,12 @@ struct GTY((chain_next ("%h.next_caller"
>   call expression.  */
>void redirect_callee (cgraph_node *n);
>  
> +  /* If the edge does not lead to a thunk, simply redirect it to N.  Otherwise
> + create one or more equivalent thunks for N and redirect E to the first in
> + the chain.  Note that it is then necessary to call
> + n->expand_all_artificial_thunks once all callers are redirected.  */
> +  void redirect_callee_duplicating_thunks (cgraph_node *n);
> +
>/* Make an indirect edge with an unknown callee an ordinary edge leading to
>   CALLEE.  DELTA is an integer constant that is to be added to the this
>   pointer (first parameter) to compensate for skipping
> Index: src/gcc/cgraphclones.c
> ===
> --- src.orig/gcc/cgraphclones.c
> +++ src/gcc/cgraphclones.c
> @@ -370,28 +370,47 @@ duplicate_thunk_for_node (cgraph_node *t
> CGRAPH_FREQ_BASE);
>e->call_stmt_cannot_inline_p = true;
>symtab->call_edge_duplication_hooks (thunk->callees, e);
> -  if (new_thunk->expand_thunk (false, false))
> -{
> -  new_thunk->thunk.thunk_p = false;
> -  new_thunk->analyze ();
> -}
> -
>symtab->call_cgraph_duplication_hooks (thunk, new_thunk);
>return new_thunk;
>  }
>  
>  /* If E does not lead to a thunk, simply redirect it to N.  Otherwise create
> one or more equivalent thunks for N and redirect E to the first in the
> -   chain.  */
> +   chain.  Note that it is then necessary to call
> +   n->expand_all_artificial_thunks once all callers are redirected.  */
>  
>  void
> -redirect_edge_duplicating_thunks (cgraph_edge *e, cgraph_node *n)
> +cgraph_edge::redirect_callee_duplicating_thunks (cgraph_no
