Re: [patch,fortran] PR50405 - Statement function with itself as argument SEGV's

2013-05-17 Thread Tobias Burnus

Am 17.05.2013 05:22, schrieb Bud Davis:

Not too much to add beyond the title and the patch.
The test file fails before (eventually, when you run out of stack) and passes 
after the patch is applied.  No new testsuite failures.


--bud

Index: gcc/gcc/fortran/resolve.c
===
--- gcc/gcc/fortran/resolve.c(revision 198955)
+++ gcc/gcc/fortran/resolve.c(working copy)
@@ -306,6 +306,14 @@
 && !resolve_procedure_interface (sym))
  return;
  
+  if (strcmp (proc->name,sym->name) == 0)


Missing blank after the comma.


+{
+   gfc_error ("Self referential argument "
+   "'%s' at %L is not allowed", sym->name,
+   &proc->declared_at);
+   return;


Indentation is wrong. (As a friend of hyphens, I would add one between 
self and referential, but it is also fine without.)



!{ dg-do compile }
! submitted by zec...@gmail.com
!{ dg-prune-output "Obsolescent feature: Statement function at" }


Please add "! PR fortran/50405" as comment.

Instead of dg-prune-output,  you could also use: '! { dg-options "" }'. 
That will override the default setting, i.e. it removes the "-pedantic".



f(f) = 0 ! { dg-error "Self referential argument" }
end
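
For illustration, with the review comments applied the testcase might look as follows (a sketch assuming the suggested '! { dg-options "" }' variant; not necessarily the version that was committed):

! { dg-do compile }
! { dg-options "" }
! PR fortran/50405
! submitted by zec...@gmail.com
f(f) = 0 ! { dg-error "Self referential argument" }
end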

2013-05-17  Bud Davis  

 PR fortran/50405
 * resolve.c (resolve_formal_arglist): Detect error when an argument
 has the same name as the function.


OK and thanks for the patch!

Tobias

PS: Nice that you are back to (casual) gfortran development.


[ping] Re: [patch] install host specific {bits,ext}/opt_random.h headers in host specific c++ incdir

2013-05-17 Thread Matthias Klose
ping?

Am 12.05.2013 11:58, schrieb Matthias Klose:
> {bits,ext}/opt_random.h became host specific in 4.8, but are still installed
> into the host independent c++ include dir.  install them into the host 
> specific
> include dir instead.  Ok for 4.8 and trunk?
> 
>   Matthias
> 



Re: [PATCH] Pattern recognizer rotate improvement

2013-05-17 Thread Richard Biener
On Fri, 17 May 2013, Jakub Jelinek wrote:

> On Wed, May 15, 2013 at 03:24:37PM +0200, Richard Biener wrote:
> > We have the same issue in some other places where we insert invariant
> > code into the loop body - one reason there is another LIM pass
> > after vectorization.
> 
> Well, in this case it causes the shift amount to be loaded into a vector
> instead of scalar, therefore even when LIM moves it before the loop, it
> will only work with vector/vector shifts and be more expensive that way
> (need to broadcast the value in a vector).  The following patch
> improves it slightly at least for loops, by just emitting the shift amount
> stmts to loop preheader, rotate-4.c used to be only vectorizable with
> -mavx2 (which has vector/vector shifts), now also -mavx (which doesn't)
> vectorizes it.  Unfortunately this trick doesn't work for SLP vectorization,
> emitting the stmts at the start of the current bb doesn't help, because
> every stmt emits its own and thus it is vectorized with vector/vector
> shifts only anyway.
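
For illustration, a minimal sketch of the kind of rotate loop this affects (an assumption; the actual gcc.target/i386/rotate-4.c testcase is not reproduced here) -- the rotate amount x is loop invariant, so the optional cast/negation/and statements can be emitted on the loop preheader edge:

unsigned int a[1024];

void
foo (unsigned int x)
{
  int i;
  /* x is assumed to be in 1..31 so the rotate expression is well defined.  */
  for (i = 0; i < 1024; i++)
    a[i] = (a[i] << x) | (a[i] >> (32 - x));
}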
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.
 
> 2013-05-17  Jakub Jelinek  
> 
>   * tree-vect-patterns.c (vect_recog_rotate_pattern): For
>   vect_external_def oprnd1 with loop_vinfo, try to emit
>   optional cast, negation and and stmts on the loop preheader
>   edge instead of into the pattern def seq.
> 
>   * gcc.target/i386/rotate-4.c: Compile only with -mavx
>   instead of -mavx2, require only avx instead of avx2.
>   * gcc.target/i386/rotate-4a.c: Include avx-check.h instead
>   of avx2-check.h and turn into an avx runtime test instead of
>   avx2 runtime test.
> 
> --- gcc/tree-vect-patterns.c.jj   2013-05-16 13:56:08.0 +0200
> +++ gcc/tree-vect-patterns.c  2013-05-16 15:27:00.565143478 +0200
> @@ -1494,6 +1494,7 @@ vect_recog_rotate_pattern (vec *
>bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_vinfo);
>enum vect_def_type dt;
>optab optab1, optab2;
> +  edge ext_def = NULL;
>  
>if (!is_gimple_assign (last_stmt))
>  return NULL;
> @@ -1574,6 +1575,21 @@ vect_recog_rotate_pattern (vec *
>if (*type_in == NULL_TREE)
>  return NULL;
>  
> +  if (dt == vect_external_def
> +  && TREE_CODE (oprnd1) == SSA_NAME
> +  && loop_vinfo)
> +{
> +  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> +  ext_def = loop_preheader_edge (loop);
> +  if (!SSA_NAME_IS_DEFAULT_DEF (oprnd1))
> + {
> +   basic_block bb = gimple_bb (SSA_NAME_DEF_STMT (oprnd1));
> +   if (bb == NULL
> +   || !dominated_by_p (CDI_DOMINATORS, ext_def->dest, bb))
> + ext_def = NULL;
> + }
> +}
> +
>def = NULL_TREE;
>if (TREE_CODE (oprnd1) == INTEGER_CST
>|| TYPE_MODE (TREE_TYPE (oprnd1)) == TYPE_MODE (type))
> @@ -1593,7 +1609,14 @@ vect_recog_rotate_pattern (vec *
>def = vect_recog_temp_ssa_var (type, NULL);
>def_stmt = gimple_build_assign_with_ops (NOP_EXPR, def, oprnd1,
>  NULL_TREE);
> -  append_pattern_def_seq (stmt_vinfo, def_stmt);
> +  if (ext_def)
> + {
> +   basic_block new_bb
> + = gsi_insert_on_edge_immediate (ext_def, def_stmt);
> +   gcc_assert (!new_bb);
> + }
> +  else
> + append_pattern_def_seq (stmt_vinfo, def_stmt);
>  }
>stype = TREE_TYPE (def);
>  
> @@ -1618,11 +1641,19 @@ vect_recog_rotate_pattern (vec *
>def2 = vect_recog_temp_ssa_var (stype, NULL);
>def_stmt = gimple_build_assign_with_ops (NEGATE_EXPR, def2, def,
>  NULL_TREE);
> -  def_stmt_vinfo
> - = new_stmt_vec_info (def_stmt, loop_vinfo, bb_vinfo);
> -  set_vinfo_for_stmt (def_stmt, def_stmt_vinfo);
> -  STMT_VINFO_VECTYPE (def_stmt_vinfo) = vecstype;
> -  append_pattern_def_seq (stmt_vinfo, def_stmt);
> +  if (ext_def)
> + {
> +   basic_block new_bb
> + = gsi_insert_on_edge_immediate (ext_def, def_stmt);
> +   gcc_assert (!new_bb);
> + }
> +  else
> + {
> +   def_stmt_vinfo = new_stmt_vec_info (def_stmt, loop_vinfo, bb_vinfo);
> +   set_vinfo_for_stmt (def_stmt, def_stmt_vinfo);
> +   STMT_VINFO_VECTYPE (def_stmt_vinfo) = vecstype;
> +   append_pattern_def_seq (stmt_vinfo, def_stmt);
> + }
>  
>def2 = vect_recog_temp_ssa_var (stype, NULL);
>tree mask
> @@ -1630,11 +1661,19 @@ vect_recog_rotate_pattern (vec *
>def_stmt = gimple_build_assign_with_ops (BIT_AND_EXPR, def2,
>  gimple_assign_lhs (def_stmt),
>  mask);
> -  def_stmt_vinfo
> - = new_stmt_vec_info (def_stmt, loop_vinfo, bb_vinfo);
> -  set_vinfo_for_stmt (def_stmt, def_stmt_vinfo);
> -  STMT_VINFO_VECTYPE (def_stmt_vinfo) = vecstype;
> -  append_pattern_def_seq (stmt_vinfo, def_stmt);
> +  if (ext_def)
> + {
>

Re: [PATCH] Fix VEC_[LR]SHIFT_EXPR folding for big-endian (PR tree-optimization/57051)

2013-05-17 Thread Richard Biener
On Fri, 17 May 2013, Jakub Jelinek wrote:

> On Thu, May 16, 2013 at 07:59:00PM +0200, Mikael Pettersson wrote:
> > Jakub Jelinek writes:
> >  > On Thu, Apr 25, 2013 at 11:47:02PM +0200, Jakub Jelinek wrote:
> >  > > This patch adds folding of constant arguments v>> and v<<, which helps 
> > to
> >  > > optimize the testcase from the PR back into constant store after 
> > vectorized
> >  > > loop is unrolled.
> >  > 
> >  > As this fixes a regression on the 4.8 branch, I've backported it (and
> >  > minimal prerequisite for that) to 4.8 branch too.
> > 
> > Unfortunately this patch makes gcc.dg/vect/no-scevccp-outer-{7,13}.c fail
> > on powerpc64-linux:
> > 
> > +FAIL: gcc.dg/vect/no-scevccp-outer-13.c execution test
> > +FAIL: gcc.dg/vect/no-scevccp-outer-7.c execution test
> > 
> > which is a regression from 4.8-20130502.  Reverting r198580 fixes it.
> > 
> > The same FAILs also occur on trunk.
> 
> Ah right, I was confused by the fact that VEC_RSHIFT_EXPR is used
> not just on little endian targets, but on big endian as well
> (VEC_LSHIFT_EXPR is never emitted), but the important spot is
> when extracting the scalar result from the vector:
> 
>   if (BYTES_BIG_ENDIAN)
> bitpos = size_binop (MULT_EXPR,
>  bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) - 1),
>  TYPE_SIZE (scalar_type));
>   else
> bitpos = bitsize_zero_node;
> 
> Fixed thusly, ok for trunk/4.8?

Ok with a comment in front of (code == VEC_RSHIFT_EXPR) ^ 
(!BYTES_BIG_ENDIAN)
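
One possible form of such a comment (a sketch, not necessarily the wording that was committed):

/* Whether the shift moves whole vector elements towards lower or
   higher memory offsets depends both on the shift direction and on
   BYTES_BIG_ENDIAN: the scalar result is extracted from the element
   at offset 0 on little-endian targets and from the last element on
   big-endian targets.  */
if ((code == VEC_RSHIFT_EXPR) ^ (!BYTES_BIG_ENDIAN))
  offset = -offset;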

Thanks,
Richard.

> 2013-05-17  Jakub Jelinek  
> 
>   PR tree-optimization/57051
>   * fold-const.c (const_binop) <case VEC_RSHIFT_EXPR>: Fix BYTES_BIG_ENDIAN handling.
> 
> --- gcc/fold-const.c.jj   2013-05-16 12:36:28.0 +0200
> +++ gcc/fold-const.c  2013-05-17 08:38:12.575117676 +0200
> @@ -1393,7 +1393,7 @@ const_binop (enum tree_code code, tree a
> if (shiftc >= outerc || (shiftc % innerc) != 0)
>   return NULL_TREE;
> int offset = shiftc / innerc;
> -   if (code == VEC_LSHIFT_EXPR)
> +   if ((code == VEC_RSHIFT_EXPR) ^ (!BYTES_BIG_ENDIAN))
>   offset = -offset;
> tree zero = build_zero_cst (TREE_TYPE (type));
> for (i = 0; i < count; i++)
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend


Re: [PATCH] Fix incorrect discriminator assignment.

2013-05-17 Thread Richard Biener
On Wed, May 15, 2013 at 6:50 PM, Cary Coutant  wrote:
>> gcc/ChangeLog:
>>
>> * tree-cfg.c (locus_descrim_hasher::hash): Only hash lineno.
>> (locus_descrim_hasher::equal): Likewise.
>> (next_discriminator_for_locus): Likewise.
>> (assign_discriminator): Add return value.
>> (make_edges): Assign more discriminators if necessary.
>> (make_cond_expr_edges): Likewise.
>> (make_goto_expr_edges): Likewise.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.dg/debug/dwarf2/discriminator.c: New Test.
>
> Looks OK to me, but I can't approve patches for tree-cfg.c.
>
> Two comments:
>
> (1) In the C++ conversion, it looks like someone misspelled "discrim"
> in "locus_descrim_hasher".
>
> (2) Is this worth fixing in trunk when we'll eventually switch to a
> per-instruction discriminator?

> This patch fixes a common case where one line is distributed to 3 BBs,
> but only 1 discriminator is assigned.

As far as I can see the patch makes discriminators coarser (by hashing
and comparing on LOCATION_LINE and not on the location).  It also has
the changes like

-  assign_discriminator (entry_locus, then_bb);
+  if (assign_discriminator (entry_locus, then_bb))
+assign_discriminator (entry_locus, bb);

which is what the comment refers to?  I think those changes are short-sighted
because what happens if the 2nd assign_discriminator call has exactly the
same issue?  Especially with the make_goto_expr_edges change there
may be multiple predecessors where I cannot see how the change is
correct.  Yes, the change changes something and thus may fix the testcase
but it doesn't change things in a predictable way as far as I can see.

So - the change to compare/hash LOCATION_LINE looks good to me,
but the assign_discriminator change needs more explaining.

Richard.

> -cary


Re: debuggability of recog_data

2013-05-17 Thread Richard Biener
On Thu, May 16, 2013 at 11:04 PM, H.J. Lu  wrote:
> On Thu, May 16, 2013 at 2:02 PM, Richard Sandiford
>  wrote:
>> Steven Bosscher  writes:
>>> On Wed, May 15, 2013 at 12:14 AM, Mike Stump wrote:
 I don't want to bike shed.  So, I'm happy if the next poor soul that
 touches it just does so.  If people like recog_data_info, I'd be happy
 to change it to that.  Let's give the peanut gallery a day to vote on
 it.  :-)
>>>
>>> Usually we append "_d" or "_def" to structure definitions, so 
>>> recog_data_def?
>>
>> Gah, I wrote the patch from memory and forgot about the bit after the comma.
>> I'm not trying to be contrary really. :-)
>>
>> Bootstrapped & regression-tested on x86_64-linux-gnu.  OK to install?
>>
>> Thanks,
>> Richard
>>
>>
>> gcc/
>> * recog.h (Recog_data): Rename to...
>> (recog_data_t): ...this.
> ^^^
> It should be recog_data_d.

Ok with that change.

Thanks,
Richard.

>
> --
> H.J.


[PATCH] Fix extendsidi2_1 splitting (PR rtl-optimization/57281, PR rtl-optimization/57300 wrong-code, alternative)

2013-05-17 Thread Jakub Jelinek
On Thu, May 16, 2013 at 06:22:10PM +0200, Jakub Jelinek wrote:
> As discussed in the PR, there seem to be only 3 define_split
> patterns that use dead_or_set_p, one in i386.md and two in s390.md,
> but unfortunately insn splitting is done in many passes
> (combine, split{1,2,3,4,5}, dbr, pro_and_epilogue, final, sometimes mach)
> and only in combine the note problem is computed.  Computing the note
> problem in split{1,2,3,4,5} just because of the single pattern on i?86 -m32
> and one on s390x -m64 might be too expensive, and while neither of these
> targets do dbr scheduling, e.g. during final without cfg one can't
> df_analyze.
> 
> So, the following patch fixes it by doing the transformation instead
> in the peephole2 pass which computes the notes problem and has REG_DEAD
> notes up2date (and peep2_reg_dead_p is used there heavily and works).

An alternative, so far untested, patch is to let the "register is not dead"
splitter do its job always during split2 and just fix it up during peephole2
if the register was dead.

For the non-cltd case the peephole2 is always desirable, we get rid of
a register move and free one hard register for potential other uses after
peephole2 (cprop_hardreg? anything else that could benefit from that?).

For the cltd case it is questionable: while we gain a free hard register
at that spot, it isn't guaranteed any pass will benefit from that, and
cltd is 2 bytes smaller than sarl $31, %eax.  Though, not sure about the
performance.  So, the patch below doesn't use the second peephole2 for -Os.
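
For illustration, a hypothetical C snippet (an assumption, not one of the PR testcases) showing the kind of sign extension to a memory destination that the splitter and the new peephole2s deal with when compiled with -m32:

long long d;

void
foo (int x)
{
  /* With -m32 this is an extendsidi2 with a memory destination: the low
     word is stored directly and the high word is produced by
     sarl $31 (or cltd when the value lives in %eax and %edx is free).  */
  d = x;
}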

2013-05-17  Jakub Jelinek  

PR rtl-optimization/57281
PR rtl-optimization/57300
* config/i386/i386.md (extendsidi2_1 dead reg splitter): Remove.
(extendsidi2_1 peephole2s): Add instead 2 new peephole2s, that undo
what the other splitter did if the registers are dead.

* gcc.dg/pr57300.c: New test.
* gcc.c-torture/execute/pr57281.c: New test.

--- gcc/config/i386/i386.md.jj  2013-05-16 18:22:59.0 +0200
+++ gcc/config/i386/i386.md 2013-05-17 10:11:20.365455394 +0200
@@ -3332,22 +3332,8 @@ (define_insn "extendsidi2_1"
   "!TARGET_64BIT"
   "#")
 
-;; Extend to memory case when source register does die.
-(define_split
-  [(set (match_operand:DI 0 "memory_operand")
-   (sign_extend:DI (match_operand:SI 1 "register_operand")))
-   (clobber (reg:CC FLAGS_REG))
-   (clobber (match_operand:SI 2 "register_operand"))]
-  "(reload_completed
-&& dead_or_set_p (insn, operands[1])
-&& !reg_mentioned_p (operands[1], operands[0]))"
-  [(set (match_dup 3) (match_dup 1))
-   (parallel [(set (match_dup 1) (ashiftrt:SI (match_dup 1) (const_int 31)))
- (clobber (reg:CC FLAGS_REG))])
-   (set (match_dup 4) (match_dup 1))]
-  "split_double_mode (DImode, &operands[0], 1, &operands[3], &operands[4]);")
-
-;; Extend to memory case when source register does not die.
+;; Split the memory case.  If the source register doesn't die, it will stay
+;; this way, if it does die, following peephole2s take care of it.
 (define_split
   [(set (match_operand:DI 0 "memory_operand")
(sign_extend:DI (match_operand:SI 1 "register_operand")))
@@ -3376,6 +3362,48 @@ (define_split
   DONE;
 })
 
+;; Peepholes for the case where the source register does die, after
+;; being split with the above splitter.
+(define_peephole2
+  [(set (match_operand:SI 0 "memory_operand")
+   (match_operand:SI 1 "register_operand"))
+   (set (match_operand:SI 2 "register_operand") (match_dup 1))
+   (parallel [(set (match_dup 2)
+  (ashiftrt:SI (match_dup 2)
+   (match_operand:QI 3 "const_int_operand")))
+  (clobber (reg:CC FLAGS_REG))])
+   (set (match_operand:SI 4 "memory_operand") (match_dup 2))]
+  "INTVAL (operands[3]) == 31
+   && REGNO (operands[1]) != REGNO (operands[2])
+   && peep2_reg_dead_p (2, operands[1])
+   && peep2_reg_dead_p (4, operands[2])
+   && !reg_mentioned_p (operands[2], operands[4])"
+  [(set (match_dup 0) (match_dup 1))
+   (parallel [(set (match_dup 1) (ashiftrt:SI (match_dup 1) (const_int 31)))
+ (clobber (reg:CC FLAGS_REG))])
+   (set (match_dup 4) (match_dup 1))])
+
+(define_peephole2
+  [(set (match_operand:SI 0 "memory_operand")
+   (match_operand:SI 1 "register_operand"))
+   (parallel [(set (match_operand:SI 2 "register_operand")
+  (ashiftrt:SI (match_dup 1)
+   (match_operand:QI 3 "const_int_operand")))
+  (clobber (reg:CC FLAGS_REG))])
+   (set (match_operand:SI 4 "memory_operand") (match_dup 2))]
+  "INTVAL (operands[3]) == 31
+   /* cltd is shorter than sarl $31, %eax */
+   && !optimize_function_for_size_p (cfun)
+   && true_regnum (operands[1]) == AX_REG
+   && true_regnum (operands[2]) == DX_REG
+   && peep2_reg_dead_p (2, operands[1])
+   && peep2_reg_dead_p (3, operands[2])
+   && !reg_mentioned_p (operands[2], operands[4])"
+  [(set (match_dup 0) (match_dup 1))
+   (parallel [(set (ma

Re: [PATCH, rs6000] Increase MALLOC_ABI_ALIGNMENT for 32-bit PowerPC

2013-05-17 Thread Richard Biener
On Fri, May 17, 2013 at 4:40 AM, Bill Schmidt
 wrote:
> This removes two degradations in CPU2006 for 32-bit PowerPC due to lost
> vectorization opportunities.  Previously, GCC treated malloc'd arrays as
> only guaranteeing 4-byte alignment, even though the glibc implementation
> guarantees 8-byte alignment.  This raises the guarantee to 8 bytes,
> which is sufficient to permit the missed vectorization opportunities.
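
A hypothetical illustration of the affected pattern (an assumption, not taken from CPU2006): with MALLOC_ABI_ALIGNMENT raised to 64 bits the vectorizer may treat the malloc'd storage as 8-byte aligned:

#include <stdlib.h>

double *
scaled_ramp (size_t n, double s)
{
  double *a = malloc (n * sizeof *a);   /* the alignment guarantee at issue */
  if (a)
    for (size_t i = 0; i < n; i++)
      a[i] = s * i;                     /* candidate loop for vectorization */
  return a;
}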
>
> The guarantee for 64-bit PowerPC should be raised to 16-byte alignment,
> but doing so currently exposes a latent bug that degrades a 64-bit
> benchmark.  I have therefore not included that change at this time, but
> added a FIXME recording the information.
>
> Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new
> regressions.  Verified that SPEC CPU2006 degradations are fixed with no
> new degradations.  Ok for trunk?  Also, do you want any backports?

You say this is because glibc guarantees such alignment - shouldn't this
be guarded by a proper check then?  Probably AIX guarantees similar
alignment but there are also embedded elf/newlib targets, no?

Btw, the #define should possibly move to config/linux.h guarded by
OPTION_GLIBC?

Richard.

> Thanks,
> Bill
>
>
> 2013-05-16  Bill Schmidt  
>
> * config/rs6000/rs6000.h (MALLOC_ABI_ALIGNMENT): New #define.
>
>
> Index: gcc/config/rs6000/rs6000.h
> ===
> --- gcc/config/rs6000/rs6000.h  (revision 198998)
> +++ gcc/config/rs6000/rs6000.h  (working copy)
> @@ -2297,6 +2297,13 @@ extern char rs6000_reg_names[][8];   /* register nam
>  /* How to align the given loop. */
>  #define LOOP_ALIGN(LABEL)  rs6000_loop_align(LABEL)
>
> +/* Alignment guaranteed by __builtin_malloc.  */
> +/* FIXME:  128-bit alignment is guaranteed by glibc for TARGET_64BIT.
> +   However, specifying the stronger guarantee currently leads to
> +   a regression in SPEC CPU2006 437.leslie3d.  The stronger
> +   guarantee should be implemented here once that's fixed.  */
> +#define MALLOC_ABI_ALIGNMENT (64)
> +
>  /* Pick up the return address upon entry to a procedure. Used for
> dwarf2 unwind information.  This also enables the table driven
> mechanism.  */
>
>


Re: [fixincludes] solaris_pow_int_overload should use __cplusplus

2013-05-17 Thread Rainer Orth
Bruce Korb  writes:

> On 05/16/13 06:41, Rainer Orth wrote:
>> Work is going on to incorporate all applicable fixincludes fixes into
>> the Solaris headers proper.  One fix is currently problematic since it
>> uses an G++-internal macro (__GXX_EXPERIMENTAL_CXX0X__) where libstdc++
>>  already switched to testing __cplusplus.  The following patch
>> updates the fix to match .
>>
>> Tested by mainline bootstraps on i386-pc-solaris2.11,
>> sparc-sun-solaris2.11 and 4.8 bootstrap on i386-pc-solaris2.10.
>>
>> Ok for mainline and 4.8 branch if they pass?
>
> Look good to me.  Thanks.

Testing completed successfully now on both 4.8 branch and mainline, thus
I've installed the patch on mainline.

Jakub, may I install on the 4.8 branch now or better wait until 4.8.1 is
released?

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH, rs6000] Increase MALLOC_ABI_ALIGNMENT for 32-bit PowerPC

2013-05-17 Thread Jakub Jelinek
On Fri, May 17, 2013 at 10:32:11AM +0200, Richard Biener wrote:
> On Fri, May 17, 2013 at 4:40 AM, Bill Schmidt
>  wrote:
> > This removes two degradations in CPU2006 for 32-bit PowerPC due to lost
> > vectorization opportunities.  Previously, GCC treated malloc'd arrays as
> > only guaranteeing 4-byte alignment, even though the glibc implementation
> > guarantees 8-byte alignment.  This raises the guarantee to 8 bytes,
> > which is sufficient to permit the missed vectorization opportunities.
> >
> > The guarantee for 64-bit PowerPC should be raised to 16-byte alignment,
> > but doing so currently exposes a latent bug that degrades a 64-bit
> > benchmark.  I have therefore not included that change at this time, but
> > added a FIXME recording the information.
> >
> > Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new
> > regressions.  Verified that SPEC CPU2006 degradations are fixed with no
> > new degradations.  Ok for trunk?  Also, do you want any backports?
> 
> You say this is because glibc guarantees such alignment - shouldn't this
> be guarded by a proper check then?  Probably AIX guarantees similar
> alignment but there are also embedded elf/newlib targets, no?
> 
> Btw, the #define should possibly move to config/linux.h guarded by
> OPTION_GLIBC?

For that location it shouldn't be 64, but 2 * POINTER_SIZE, which is what
glibc really guarantees on most architectures (I think x32 provides more
than that, but it can override).  But changing it requires some analysis
on commonly used malloc alternatives on Linux, what exactly do they
guarantee.  Say valgrind, ElectricFence, jemalloc, tcmalloc, libasan.
Note that on most targets there is some standard type that requires
such alignment, thus at least 2 * POINTER_SIZE alignment is also a C
conformance matter.

Jakub


Re: [PATCH, rs6000] Increase MALLOC_ABI_ALIGNMENT for 32-bit PowerPC

2013-05-17 Thread David Edelsohn
On Fri, May 17, 2013 at 4:47 AM, Jakub Jelinek  wrote:
> On Fri, May 17, 2013 at 10:32:11AM +0200, Richard Biener wrote:
>> On Fri, May 17, 2013 at 4:40 AM, Bill Schmidt
>>  wrote:
>> > This removes two degradations in CPU2006 for 32-bit PowerPC due to lost
>> > vectorization opportunities.  Previously, GCC treated malloc'd arrays as
>> > only guaranteeing 4-byte alignment, even though the glibc implementation
>> > guarantees 8-byte alignment.  This raises the guarantee to 8 bytes,
>> > which is sufficient to permit the missed vectorization opportunities.
>> >
>> > The guarantee for 64-bit PowerPC should be raised to 16-byte alignment,
>> > but doing so currently exposes a latent bug that degrades a 64-bit
>> > benchmark.  I have therefore not included that change at this time, but
>> > added a FIXME recording the information.
>> >
>> > Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new
>> > regressions.  Verified that SPEC CPU2006 degradations are fixed with no
>> > new degradations.  Ok for trunk?  Also, do you want any backports?
>>
>> You say this is because glibc guarantees such alignment - shouldn't this
>> be guarded by a proper check then?  Probably AIX guarantees similar
>> alignment but there are also embedded elf/newlib targets, no?
>>
>> Btw, the #define should possibly move to config/linux.h guarded by
>> OPTION_GLIBC?
>
> For that location it shouldn't be 64, but 2 * POINTER_SIZE, which is what
> glibc really guarantees on most architectures (I think x32 provides more
> than that, but it can override).  But changing it requires some analysis
> on commonly used malloc alternatives on Linux, what exactly do they
> guarantee.  Say valgrind, ElectricFence, jemalloc, tcmalloc, libasan.
> Note that on most targets there is some standard type that requires
> such alignment, thus at least 2 * POINTER_SIZE alignment is also a C
> conformance matter.

Jakub,

As Bill wrote earlier, 2 * POINTER_SIZE causes a different performance
regression for 16 byte alignment on PPC64.  8 bytes is a work-around
for now.

- David


Re: [PATCH, rs6000] Increase MALLOC_ABI_ALIGNMENT for 32-bit PowerPC

2013-05-17 Thread Jakub Jelinek
On Fri, May 17, 2013 at 04:55:20AM -0400, David Edelsohn wrote:
> As Bill wrote earlier, 2 * POINTER_SIZE causes a different performance
> regression for 16 byte alignment on PPC64.  8 bytes is a work-around
> for now.

I was talking about the config/linux.h definition, PPC64 can of course
override it to workaround something temporarily, but because there is some
problem on PPC64 we shouldn't penalize code say on x86_64, s390x or aarch64.

Jakub


Re: [PATCH] Don't invalidate string length cache when not needed

2013-05-17 Thread Marek Polacek
On Thu, May 16, 2013 at 10:33:27PM +0200, Jakub Jelinek wrote:
> On Thu, May 16, 2013 at 06:44:03PM +0200, Marek Polacek wrote:
> > --- gcc/tree-ssa-strlen.c.mp2013-05-15 14:11:20.079707492 +0200
> > +++ gcc/tree-ssa-strlen.c   2013-05-16 17:57:33.963150006 +0200
> > @@ -1693,8 +1693,10 @@ handle_char_store (gimple_stmt_iterator
> > }
> >   else
> > {
> > - si->writable = true;
> > - si->dont_invalidate = true;
> > + /* The string might be e.g. in the .rodata section.  */
> > + si->writable = false;
> 
> No, this really should be si->writable = true; (and comment not needed).
> Ok with that change.

Done.

> The thing is, while the string might not be known to be writable before,
> i.e. we can't optimize away this store, because supposedly it should
> trap, if we notice e.g. another write to the same location (writing zero
> there again), we can optimize that other write already, because we know
> that this store stored there something.
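
A small illustration of that point (hypothetical, not one of the new testcases):

void
foo (char *p)
{
  char *q = __builtin_strchr (p, '\0');
  *q = '\0';  /* cannot be removed: p might point into .rodata and this
                 store is supposed to trap in that case */
  *q = '\0';  /* after the first store the location is known writable
                 (si->writable), so this one can be optimized away */
}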
> 
> > +#include "strlenopt.h"
> > +
> > +__attribute__((noinline, noclone)) size_t
> > +fn1 (char *p, const char *r)
> > +{
> > +  size_t len1 = __builtin_strlen (r);
> > +  char *q = __builtin_strchr (p, '\0');
> > +  *q = '\0';
> > +  return len1 - __builtin_strlen (r); // This strlen should be optimized 
> > into len1.
> 
> With strlenopt.h include you can avoid using __builtin_ prefixes, all the
> builtins needed are prototyped in that header.

Ugh, sure, that's only a remnant from my testing.  Thanks for the
reviews.  Will commit this one finally.

2013-05-17  Marek Polacek  

* tree-ssa-strlen.c (handle_char_store): Don't invalidate
cached length when doing non-zero store or storing '\0' to
'\0'.

* gcc.dg/strlenopt-25.c: New test.
* gcc.dg/strlenopt-26.c: Likewise.

--- gcc/tree-ssa-strlen.c.mp2013-05-15 14:11:20.079707492 +0200
+++ gcc/tree-ssa-strlen.c   2013-05-17 10:59:27.461231167 +0200
@@ -1694,7 +1694,8 @@ handle_char_store (gimple_stmt_iterator
  else
{
  si->writable = true;
- si->dont_invalidate = true;
+ gsi_next (gsi);
+ return false;
}
}
  else
@@ -1717,6 +1718,33 @@ handle_char_store (gimple_stmt_iterator
si->endptr = ssaname;
  si->dont_invalidate = true;
}
+  /* If si->length is non-zero constant, we aren't overwriting '\0',
+and if we aren't storing '\0', we know that the length of the
+string and any other zero terminated string in memory remains
+the same.  In that case we move to the next gimple statement and
+return to signal the caller that it shouldn't invalidate anything.  
+
+This is beneficial for cases like:
+
+char p[20];
+void foo (char *q)
+{
+  strcpy (p, "foobar");
+  size_t len = strlen (p);// This can be optimized into 6
+  size_t len2 = strlen (q);// This has to be computed
+  p[0] = 'X';
+  size_t len3 = strlen (p);// This can be optimized into 6
+  size_t len4 = strlen (q);// This can be optimized into len2
+  bar (len, len2, len3, len4);
+}
+   */ 
+  else if (si != NULL && si->length != NULL_TREE
+  && TREE_CODE (si->length) == INTEGER_CST
+  && integer_nonzerop (gimple_assign_rhs1 (stmt)))
+   {
+ gsi_next (gsi);
+ return false;
+   }
 }
   else if (idx == 0 && initializer_zerop (gimple_assign_rhs1 (stmt)))
 {
--- gcc/testsuite/gcc.dg/strlenopt-25.c.mp  2013-05-15 17:15:18.702118637 +0200
+++ gcc/testsuite/gcc.dg/strlenopt-25.c 2013-05-15 18:26:27.881030317 +0200
@@ -0,0 +1,18 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-strlen" } */
+
+#include "strlenopt.h"
+
+int
+main ()
+{
+  char p[] = "foobar";
+  int len, len2;
+  len = strlen (p);
+  p[0] = 'O';
+  len2 = strlen (p);
+  return len - len2;
+}
+
+/* { dg-final { scan-tree-dump-times "strlen \\(" 0 "strlen" } } */
+/* { dg-final { cleanup-tree-dump "strlen" } } */
--- gcc/testsuite/gcc.dg/strlenopt-26.c.mp  2013-05-16 17:33:00.302060413 +0200
+++ gcc/testsuite/gcc.dg/strlenopt-26.c 2013-05-17 10:58:17.605015608 +0200
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-strlen" } */
+
+#include "strlenopt.h"
+
+__attribute__((noinline, noclone)) size_t
+fn1 (char *p, const char *r)
+{
+  size_t len1 = strlen (r);
+  char *q = strchr (p, '\0');
+  *q = '\0';
+  return len1 - strlen (r); // This strlen should be optimized into len1.
+}
+
+int
+main (void)
+{
+  char p[] = "foobar";
+  const char *volatile q = "xyzzy";
+  fn1 (p, q);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "strlen \\(" 1 "strlen" } } */
+/* { dg-final { cleanup-tree-dump "strlen" } } */

Marek


Re: [PATCH, rs6000] Increase MALLOC_ABI_ALIGNMENT for 32-bit PowerPC

2013-05-17 Thread Richard Biener
On Fri, May 17, 2013 at 10:47 AM, Jakub Jelinek  wrote:
> On Fri, May 17, 2013 at 10:32:11AM +0200, Richard Biener wrote:
>> On Fri, May 17, 2013 at 4:40 AM, Bill Schmidt
>>  wrote:
>> > This removes two degradations in CPU2006 for 32-bit PowerPC due to lost
>> > vectorization opportunities.  Previously, GCC treated malloc'd arrays as
>> > only guaranteeing 4-byte alignment, even though the glibc implementation
>> > guarantees 8-byte alignment.  This raises the guarantee to 8 bytes,
>> > which is sufficient to permit the missed vectorization opportunities.
>> >
>> > The guarantee for 64-bit PowerPC should be raised to 16-byte alignment,
>> > but doing so currently exposes a latent bug that degrades a 64-bit
>> > benchmark.  I have therefore not included that change at this time, but
>> > added a FIXME recording the information.
>> >
>> > Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new
>> > regressions.  Verified that SPEC CPU2006 degradations are fixed with no
>> > new degradations.  Ok for trunk?  Also, do you want any backports?
>>
>> You say this is because glibc guarantees such alignment - shouldn't this
>> be guarded by a proper check then?  Probably AIX guarantees similar
>> alignment but there are also embedded elf/newlib targets, no?
>>
>> Btw, the #define should possibly move to config/linux.h guarded by
>> OPTION_GLIBC?
>
> For that location it shouldn't be 64, but 2 * POINTER_SIZE, which is what
> glibc really guarantees on most architectures (I think x32 provides more
> than that, but it can override).  But changing it requires some analysis
> on commonly used malloc alternatives on Linux, what exactly do they
> guarantee.  Say valgrind, ElectricFence, jemalloc, tcmalloc, libasan.
> Note that on most targets there is some standard type that requires
> such alignment, thus at least 2 * POINTER_SIZE alignment is also a C
> conformance matter.

Yes - note that it's called MALLOC_*ABI*_ALIGNMENT for a reason - it
is supposed to be the alignment that is required for conforming to the C ABI
on the target.  For different allocators I'd rather have a new function
attribute that tells us the alignment of the allocators allocation function
(of course replacing the allocator at runtime via ELF interposition makes
that possibly fragile as well).  We shouldn't assume just because glibc
may at some day guarantee 256bit alignment that other allocators on linux
do so, too.

So - I'd rather play safe (and the default to align to BITS_PER_WORD
probably catches the C ABI guarantee on most targets I suppose).

Richard.

> Jakub


Re: [PATCH, rs6000] Increase MALLOC_ABI_ALIGNMENT for 32-bit PowerPC

2013-05-17 Thread Jakub Jelinek
On Fri, May 17, 2013 at 11:17:26AM +0200, Richard Biener wrote:
> Yes - note that it's called MALLOC_*ABI*_ALIGNMENT for a reason - it
> is supposed to be the alignment that is required for conforming to the C ABI
> on the target.  For different allocators I'd rather have a new function
> attribute that tells us the alignment of the allocators allocation function
> (of course replacing the allocator at runtime via ELF interposition makes
> that possibly fragile as well).  We shouldn't assume just because glibc

Most of the allocators I've mentioned, if not all, provide just
malloc/calloc etc. entrypoints and are in the search scope before -lc,
so it isn't anything you can control with attributes.

I've looked at libasan so far (seems to guarantee either 16, or 64 or 128
bytes depending on flags), valgrind (seems to guarantee 8 bytes on 32-bit
arches except ppc32, and 16 bytes elsewhere), ElectricFence (no guarantees
at all, the default is 4 bytes, but can be overridden with EF_ALIGNMENT
env option to smaller/larger value, I guess let's ignore efence).

Jakub


Re: More vector folding

2013-05-17 Thread Hans-Peter Nilsson
> From: Marc Glisse 
> Date: Tue, 14 May 2013 13:47:23 +0200

> Here is what I tested during the night, I'll just rename the function.
> I took the chance to remove an unnecessary alternative in TRUTH_XOR_EXPR.
> 
> Passes bootstrap+testsuite on x86_64-linux-gnu.
> 
> 2013-05-14  Marc Glisse  
> 
> gcc/
>   * fold-const.c (fold_negate_expr): Handle vectors.
>   (fold_truth_not_expr): Make it static.
>   (fold_invert_truth): New static function.
>   (invert_truthvalue_loc): Handle vectors. Do not call
>   fold_truth_not_expr directly.
>   (fold_unary_loc) : Handle comparisons.
>   : Do not cast to boolean.
>   (fold_comparison): Handle vector constants.
>   (fold_binary_loc) : Remove redundant code.
>   (fold_ternary_loc) : Adapt more COND_EXPR optimizations.
>   * tree.h (fold_truth_not_expr): Remove declaration.
> 
> gcc/testsuite/
>   * g++.dg/ext/vector22.C: New testcase.
>   * gcc.dg/binop-xor3.c: Remove xfail.

Looks like the removed xfail caused regressions for about half
of all targets; there's now PR57313 opened for this.

Maybe an rtl test like BRANCH_COSTS or rtx_cost now matters in
the decision?

brgds, H-P


Re: rtl expansion without zero/sign extension based on VRP

2013-05-17 Thread Kugan

On 13/05/13 17:47, Richard Biener wrote:

On Mon, May 13, 2013 at 5:45 AM, Kugan
 wrote:

Hi,

This patch removes some of the redundant sign/zero
extensions using value range information generated by VRP.

When GIMPLE_ASSIGN stmts with LHS type smaller than word is expanded to
rtl, if we can prove that RHS expression value can always fit in LHS
type and there is no sign conversion, truncation and extension to fit
the type is redundant. Subreg and Zero/sign extensions are therefore
redundant.

For example, when an expression is evaluated and its value is assigned
to variable of type short, the generated rtl would look something like
the following.

(set (reg:SI 110)
   (zero_extend:SI (subreg:HI (reg:SI 117) 0)))

However, if during value range propagation we can say for certain
that the value of the expression which is present in register 117 is
within the limits of short and there is no sign conversion, we don’t
need to perform the subreg and zero_extend; instead we can generate the
following rtl.

(set (reg:SI 110)
   (reg:SI 117)))

Attached patch adds value range information to gimple stmts during value
range propagation pass and expands rtl without subreg and zero/sign
extension if the value range is within the limits of its type.
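
For illustration, a hypothetical source fragment (an assumption, not from the patch or benchmarks) that gives rise to RTL like the above -- the computed value provably fits the narrow type, so the subreg/zero_extend pair is redundant:

unsigned short
foo (unsigned short a, unsigned short b)
{
  /* VRP knows (a & 0x0f) + (b & 0x0f) is in [0, 30], which fits
     unsigned short, so no truncation/extension is needed on assignment.  */
  return (a & 0x0f) + (b & 0x0f);
}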

This change improves the geomean of the spec2k int benchmark with ref by about
~3.5% on an arm chromebook.

Tested  on X86_64 and ARM.

I would like review comments on this.


A few comments on the way you preserve this information.



Thanks Richard for reviewing it.


--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -191,6 +191,11 @@ struct GTY((chain_next ("%h.next")))
gimple_statement_base {
   in there.  */
unsigned int subcode : 16;

+  /* if an assignment gimple statement has RHS expression that can fit
+ LHS type, zero/sign extension to truncate is redundant.
+ Set this if we detect extension as redundant during VRP.  */
+  unsigned sign_zero_ext_redundant : 1;
+

this enlarges all gimple statements by 8 bytes, so it's out of the question.

My original plan to preserve value-range information was to re-use
SSA_NAME_PTR_INFO which is where we already store value-range
information for pointers:

struct GTY(()) ptr_info_def
{
   /* The points-to solution.  */
   struct pt_solution pt;

   /* Alignment and misalignment of the pointer in bytes.  Together
  align and misalign specify low known bits of the pointer.
  ptr & (align - 1) == misalign.  */

   /* When known, this is the power-of-two byte alignment of the object this
  pointer points into.  This is usually DECL_ALIGN_UNIT for decls and
  MALLOC_ABI_ALIGNMENT for allocated storage.  When the alignment is not
  known, it is zero.  Do not access directly but use functions
  get_ptr_info_alignment, set_ptr_info_alignment,
  mark_ptr_info_alignment_unknown and similar.  */
   unsigned int align;

   /* When alignment is known, the byte offset this pointer differs from the
  above alignment.  Access only through the same helper functions as align
  above.  */
   unsigned int misalign;
};

what you'd do is to make the ptr_info_def reference from tree_ssa_name a
reference to a union of the above (for pointer SSA names) and an alternate
info like

struct range_info_def {
   double_int min;
   double_int max;
};



I have now changed the way I am preserving this information as you have 
suggested.



or a more compressed format (a reference to a mode or type it fits for example).

The very specific case in question also points at the fact that we have
two conversion tree codes, NOP_EXPR and CONVERT_EXPR, and we've
tried to unify them (but didn't finish up that task)...



We can also remove sign/zero extension in cases other than NOP_EXPR and
CONVERT_EXPR, if the value range of the RHS expression fits the LHS type
without sign change.


For example, please look at the following gimple stmt and the resulting rtl:

;; c_3 = c_2(D) & 15;

;; Without the patch
  (insn 6 5 7 (set (reg:SI 116)
  (and:SI (reg/v:SI 115 [ c ])
  (const_int 15 [0xf]))) c5.c:4 -1
   (nil))

  (insn 7 6 0 (set (reg/v:SI 111 [ c ])
  (zero_extend:SI (subreg:QI (reg:SI 116) 0))) c5.c:4 -1
   (nil))

;; with the patch
  (insn 6 5 7 (set (reg:SI 116)
  (and:SI (reg/v:SI 115 [ c ])
  (const_int 15 [0xf]))) c5.c:4 -1
   (nil))

  (insn 7 6 0 (set (reg/v:SI 111 [ c ])
  (reg:SI 116)) c5.c:4 -1
   (nil))



+static void
+process_stmt_for_ext_elimination ()
+{

we already do all the analysis when looking for opportunities to eliminate
the intermediate cast in (T2)(T1)x in simplify_conversion_using_ranges,
that's the place that should handle this case.


I have changed this.


I have attached patch with the above changes.


Thanks,
Kugan

Richard.


Thanks,
Kugan


2013-05-09  Kugan Vivekanandarajah  

 * gcc/gimple.h (gimple_is_exp_fit_lhs, gimple_set_exp_fit_lhs): New
function.
 * gcc/tree-vrp.c (process_s

Re: [C++ Patch] PR 18126

2013-05-17 Thread Jason Merrill

OK.

Jason


Re: [PATCH] Error out on -fsanitize=thread -fsanitize=address

2013-05-17 Thread Dodji Seketeli
Jakub Jelinek  writes:


> 2013-05-06  Jakub Jelinek  
>
>   * gcc.c (SANITIZER_SPEC): Reject -fsanitize=address -fsanitize=thread
>   linking.

This looks OK to me, thanks.

-- 
Dodji


Re: [PATCH, rs6000] Increase MALLOC_ABI_ALIGNMENT for 32-bit PowerPC

2013-05-17 Thread Bill Schmidt


On Fri, 2013-05-17 at 01:11 -0400, David Edelsohn wrote:
> On Thu, May 16, 2013 at 10:40 PM, Bill Schmidt
>  wrote:
> > This removes two degradations in CPU2006 for 32-bit PowerPC due to lost
> > vectorization opportunities.  Previously, GCC treated malloc'd arrays as
> > only guaranteeing 4-byte alignment, even though the glibc implementation
> > guarantees 8-byte alignment.  This raises the guarantee to 8 bytes,
> > which is sufficient to permit the missed vectorization opportunities.
> >
> > The guarantee for 64-bit PowerPC should be raised to 16-byte alignment,
> > but doing so currently exposes a latent bug that degrades a 64-bit
> > benchmark.  I have therefore not included that change at this time, but
> > added a FIXME recording the information.
> >
> > Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new
> > regressions.  Verified that SPEC CPU2006 degradations are fixed with no
> > new degradations.  Ok for trunk?  Also, do you want any backports?
> 
> Okay.  If you think that this is appropriate for 4.8 branch, that is
> okay as well.

Yes, the degradation appears to have started last summer, so updating
4.8 would be good.

> 
> Is there a FSF GCC Bugzilla open for the 437.leslie3d problem?  The
> FIXME doesn't need to contain the complete explanation, but it would
> be nice to reference a longer explanation and not pretend to be
> Fermat's theorem that is too large to write in the margin.

There is now.  I opened 57309 after submitting the patch, so I can
reference that.

Thanks,
Bill

> 
> Thanks, David
> 



Re: More vector folding

2013-05-17 Thread Marc Glisse

On Fri, 17 May 2013, Hans-Peter Nilsson wrote:


From: Marc Glisse 
Date: Tue, 14 May 2013 13:47:23 +0200



Here is what I tested during the night, I'll just rename the function.
I took the chance to remove an unnecessary alternative in TRUTH_XOR_EXPR.

Passes bootstrap+testsuite on x86_64-linux-gnu.

2013-05-14  Marc Glisse  

gcc/
* fold-const.c (fold_negate_expr): Handle vectors.
(fold_truth_not_expr): Make it static.
(fold_invert_truth): New static function.
(invert_truthvalue_loc): Handle vectors. Do not call
fold_truth_not_expr directly.
(fold_unary_loc) : Handle comparisons.
: Do not cast to boolean.
(fold_comparison): Handle vector constants.
(fold_binary_loc) : Remove redundant code.
(fold_ternary_loc) : Adapt more COND_EXPR optimizations.
* tree.h (fold_truth_not_expr): Remove declaration.

gcc/testsuite/
* g++.dg/ext/vector22.C: New testcase.
* gcc.dg/binop-xor3.c: Remove xfail.


Looks like the removed xfail caused regressions for about half
of all targets; there's now PR57313 opened for this.

Maybe an rtl test like BRANCH_COSTS or rtx_cost now matters in
the decision?


Yes, LOGICAL_OP_NON_SHORT_CIRCUIT seems to be it.

What is the proper thing to do here? If I add the generic xfail back, 
we'll get xpass on some platforms, now we have fails on some platforms, 
and listing the platforms where we want the transformation to happen is 
just a pain. Shall I remove the testcase?


--
Marc Glisse


Re: [PATCH] Fix extendsidi2_1 splitting (PR rtl-optimization/57281, PR rtl-optimization/57300 wrong-code, alternative)

2013-05-17 Thread Jakub Jelinek
On Fri, May 17, 2013 at 10:25:01AM +0200, Jakub Jelinek wrote:
> Alternative, so far untested, patch is let the register is not dead splitter
> do its job always during split2 and just fix it up during peephole2, if the
> register was dead.

Now fully bootstrapped/regtested on x86_64-linux and i686-linux.

> 2013-05-17  Jakub Jelinek  
> 
>   PR rtl-optimization/57281
>   PR rtl-optimization/57300
>   * config/i386/i386.md (extendsidi2_1 dead reg splitter): Remove.
>   (extendsidi2_1 peephole2s): Add instead 2 new peephole2s, that undo
>   what the other splitter did if the registers are dead.
> 
>   * gcc.dg/pr57300.c: New test.
>   * gcc.c-torture/execute/pr57281.c: New test.

Jakub


[fortran, doc] Improve random_seed example

2013-05-17 Thread Janne Blomqvist
Hi,

the example we provide for the usage of the random_seed intrinsic
could be better. At least one user has already been tripped over by
the fact that on some targets the first call to system_clock returns
0, resulting in a poor seed. Below is an improved(?!) example program.

Ok for trunk/4.8/4.7? (Changelog + patch is of course trivial once
there is agreement on the code itself)

! Seed the random generator with some random input
module rseed_rand
contains
  subroutine rseed()
implicit none
integer, allocatable :: seed(:)
integer :: i, n, un, istat, dt(8), pid, t(2), s
integer(8) :: count, tms

call random_seed(size = n)
allocate(seed(n))
! First try if the OS provides a random number generator
open(newunit=un, file="/dev/urandom", access="stream", &
 form="unformatted", iostat=istat)
if (istat == 0) then
   read(un) seed
   close(un)
else
   ! Fallback to XOR:ing the current time and pid. The PID is
   ! useful in case one launches multiple instances of the same
   ! program in parallel.
   call system_clock(count)
   if (count /= 0) then
  t = transfer(count, t)
   else
  call date_and_time(values=dt)
  tms = (dt(1) - 1970) * 365_8 * 24 * 60 * 60 * 1000 &
   + dt(2) * 31_8 * 24 * 60 * 60 * 1000 &
   + dt(3) * 24_8 * 60 * 60 * 1000 &
   + dt(5) * 60 * 60 * 1000 &
   + dt(6) * 60 * 1000 + dt(7) * 1000 &
   + dt(8)
  t = transfer(tms, t)
   end if
   s = ieor(t(1), t(2))
   pid = getpid()
   s = ieor(s, pid)
   if (n >= 3) then
  seed(1) = t(1)
  seed(2) = t(2)
  seed(3) = pid
  if (n > 3) then
 seed(4:) = s + 37 * (/ (i, i = 0, n - 4) /)
  end if
   else
  seed = s + 37 * (/ (i, i = 0, n - 1 ) /)
   end if
end if
call random_seed(put=seed)
  end subroutine rseed
end module rseed_rand
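
A minimal usage sketch (assuming the module above is compiled into the same program; getpid is the GNU extension already used above):

program test_rseed
  use rseed_rand
  implicit none
  real :: x(4)
  call rseed()
  call random_number(x)
  print *, x
end program test_rseed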

--
Janne Blomqvist


[PATCH] Fix typo in stmt_kills_ref_p_1

2013-05-17 Thread Richard Biener

stmt_kills_ref_p_1 wants to compare MEM_REF offset but compares
the pointers instead (which are previously compared as equal).
This breaks my fix for PR57303.

Fixed thus, applied as obvious.

Richard.

2013-05-17  Richard Biener  

* tree-ssa-alias.c (stmt_kills_ref_p_1): Properly compare
MEM_REF offsets.

Index: gcc/tree-ssa-alias.c
===
--- gcc/tree-ssa-alias.c(revision 199004)
+++ gcc/tree-ssa-alias.c(working copy)
@@ -2002,8 +2002,8 @@ stmt_kills_ref_p_1 (gimple stmt, ao_ref
  if (TREE_CODE (base) == MEM_REF && TREE_CODE (ref->base) == MEM_REF
  && TREE_OPERAND (base, 0) == TREE_OPERAND (ref->base, 0))
{
- if (!tree_int_cst_equal (TREE_OPERAND (base, 0),
-  TREE_OPERAND (ref->base, 0)))
+ if (!tree_int_cst_equal (TREE_OPERAND (base, 1),
+  TREE_OPERAND (ref->base, 1)))
{
  double_int off1 = mem_ref_offset (base);
  off1 = off1.lshift (BITS_PER_UNIT == 8


Re: [PATCH] Fix typo in stmt_kills_ref_p_1

2013-05-17 Thread Jakub Jelinek
On Fri, May 17, 2013 at 02:46:56PM +0200, Richard Biener wrote:
> 
> stmt_kills_ref_p_1 wants to compare MEM_REF offset but compares
> the pointers instead (which are previously compared as equal).
> This breaks my fix for PR57303.
> 
> Fixed thus, applied as obvious.
> 
> Richard.
> 
> 2013-05-17  Richard Biener  
> 
>   * tree-ssa-alias.c (stmt_kills_ref_p_1): Properly compare
>   MEM_REF offsets.

Ugh, thanks.
As the 0th operand was equal, tree_int_cst_equal used to return always
1 and thus the if condition was never true.

> --- gcc/tree-ssa-alias.c  (revision 199004)
> +++ gcc/tree-ssa-alias.c  (working copy)
> @@ -2002,8 +2002,8 @@ stmt_kills_ref_p_1 (gimple stmt, ao_ref
> if (TREE_CODE (base) == MEM_REF && TREE_CODE (ref->base) == MEM_REF
> && TREE_OPERAND (base, 0) == TREE_OPERAND (ref->base, 0))
>   {
> -   if (!tree_int_cst_equal (TREE_OPERAND (base, 0),
> -TREE_OPERAND (ref->base, 0)))
> +   if (!tree_int_cst_equal (TREE_OPERAND (base, 1),
> +TREE_OPERAND (ref->base, 1)))
>   {
> double_int off1 = mem_ref_offset (base);
> off1 = off1.lshift (BITS_PER_UNIT == 8

Jakub


Re: More vector folding

2013-05-17 Thread Chung-Ju Wu
2013/5/17 Marc Glisse :
>
> Yes, LOGICAL_OP_NON_SHORT_CIRCUIT seems to be it.
>
> What is the proper thing to do here? If I add the generic xfail back, we'll
> get xpass on some platforms, now we have fails on some platforms, and
> listing the platforms where we want the transformation to happen is just a
> pain. Shall I remove the testcase?
>

If we cannot redesign the testcase, we should remove it.
Because it seems the testcase cannot properly demonstrate
its purpose after r198983.

Otherwise we better have some comment in the testcase
to inform target maintainers listing their targets in the xfail field.

Best regards,
jasonwucj


Re: More vector folding

2013-05-17 Thread Jakub Jelinek
On Fri, May 17, 2013 at 09:54:14PM +0800, Chung-Ju Wu wrote:
> 2013/5/17 Marc Glisse :
> >
> > Yes, LOGICAL_OP_NON_SHORT_CIRCUIT seems to be it.
> >
> > What is the proper thing to do here? If I add the generic xfail back, we'll
> > get xpass on some platforms, now we have fails on some platforms, and
> > listing the platforms where we want the transformation to happen is just a
> > pain. Shall I remove the testcase?
> >
> 
> If we cannot redesign the testcase, we should remove it.
> Because it seems the testcase cannot properly demonstrate
> its purpose after r198983.
> 
> Otherwise we better have some comment in the testcase
> to inform target maintainers listing their targets in the xfail field.

You can just limit the testcase to a few selected targets, if it includes
a few widely used architectures, it will be enough for the testing and will
be better than removing the testcase altogether.

Jakub


Re: [PATCH, ARM][1 of 2] Add epilogue dwarf info for shrink-wrap

2013-05-17 Thread Ramana Radhakrishnan

On 05/16/13 07:27, Zhenqiang Chen wrote:

On 15 May 2013 06:31, Ramana Radhakrishnan  wrote:

Sorry this had dropped off my list of patches to review somehow but
anyway here's a first cut review.

On Thu, Mar 21, 2013 at 6:58 AM, Zhenqiang Chen
 wrote:

Hi,

When shrink-wrap is enabled, the "returns" from simple-return path and
normal return path can be merged. The code is like:

  tst ...
  /  \
 |  push ...
 |  ...
 |  pop ...
  \  /
  bx lr

If the dwarf info after "pop ..." is incorrect, the dwarf checks will
fail at dwarf2cfi.c: function maybe_record_trace_start.

   /* We ought to have the same state incoming to a given trace no
  matter how we arrive at the trace.  Anything else means we've
  got some kind of optimization error.  */
   gcc_checking_assert (cfi_row_equal_p (cur_row, ti->beg_row));

The patch is to add epilogue dwarf info to make sure:

Before "bx lr",
* All registers saved on the stack have been "restored".
* .cfi_def_cfa_offset 0
* .cfi_def_cfa_register is sp even if frame_pointer_needed
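
A hand-written sketch of the required end-of-epilogue state (an assumption for illustration; register choice and directive placement are hypothetical, not output of the patch):

        @ normal-return path
        pop     {r4, fp}
        .cfi_restore r4
        .cfi_restore fp
        .cfi_def_cfa sp, 0      @ CFA is sp again, offset 0
        @ join point shared with the simple-return path
        bx      lr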

Bootstrapped and no make check regression.


in what configuration - thumb2 / arm / cortex-a15 / cortex-a9 ? what
languages and on what platform ?


Bootstrapped both arm and thumb2 on Pandaboard ES and Chromebook.
Pandaboard ES (cortex-a9):
--enable-languages=c,c++,fortran,objc,obj-c++ --with-arch=armv7-a
--with-float=hard --with-fpu=vfpv3-d16
--enable-languages=c,c++,fortran,objc,obj-c++ --with-arch=armv7-a
--with-float=hard --with-fpu=vfpv3-d16 --with-mode=thumb

Chromebook (cortext-a15):
--enable-languages=c,c++,fortran,objc,obj-c++ --with-arch=armv7-a
--with-tune=cortex-a15 --with-float=hard --with-fpu=vfpv3-d16
--enable-languages=c,c++,fortran,objc,obj-c++ --with-arch=armv7-a
--with-tune=cortex-a15 --with-float=hard --with-fpu=vfpv3-d16
--with-mode=thumb

Regression tests on Pandborad ES for both arm and thumb2 with the same
configure as bootstrap.



Is it OK?




@@ -25447,9 +25519,15 @@ arm_unwind_emit (FILE * asm_out_file, rtx insn)
   handled_one = true;
   break;



+/* The INSN is generated in epilogue.  It is set as RTX_FRAME_RELATED_P
+   to get correct dwarf info for shrink-wrap.  But We should not emit
+   unwind info for it.  */


s/dwarf/debug frame.

REplace "But We should not emit unwind info for it " with " We should
not emit unwind info for it because these are used either for pretend
arguments or .  "

I think you are correct that there is no need for unwind information
as we only have unwind information valid at call points and in this
case I wouldn't expect unwind information to need to be exact for
pretend arguments


Thanks for the comments. The patch is updated and rebased to trunk.



Ok if no regressions.

regards
Ramana



-Zhenqiang






Re: More vector folding

2013-05-17 Thread Marc Glisse

On Fri, 17 May 2013, Jakub Jelinek wrote:


On Fri, May 17, 2013 at 09:54:14PM +0800, Chung-Ju Wu wrote:

2013/5/17 Marc Glisse :


Yes, LOGICAL_OP_NON_SHORT_CIRCUIT seems to be it.

What is the proper thing to do here? If I add the generic xfail back, we'll
get xpass on some platforms, now we have fails on some platforms, and
listing the platforms where we want the transformation to happen is just a
pain. Shall I remove the testcase?



If we cannot redesign the testcase, we should remove it.
Because it seems the testcase cannot properly demonstrate
its purpose after r198983.

Otherwise we better have some comment in the testcase
to inform target maintainers listing their targets in the xfail field.


You can just limit the testcase to a few selected targets, if it includes
a few widely used architectures, it will be enough for the testing and will
be better than removing the testcase altogether.


Like this? (according to the PR, ia64, sh4 and spu would work too)

I tested with:
make check-gcc RUNTESTFLAGS=dg.exp=binop-xor3.c

and it still passes on x86_64 and appears as "unsupported" on powerpc.

2013-05-17  Marc Glisse  

PR regression/57313
* gcc.dg/binop-xor3.c: Restrict to platforms known to work (x86).


--
Marc Glisse
Index: gcc.dg/binop-xor3.c
===
--- gcc.dg/binop-xor3.c (revision 199006)
+++ gcc.dg/binop-xor3.c (working copy)
@@ -1,11 +1,11 @@
-/* { dg-do compile } */
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
 /* { dg-options "-O2 -fdump-tree-optimized" } */
 
 int
 foo (int a, int b)
 {
   return ((a && !b) || (!a && b));
 }
 
 /* { dg-final { scan-tree-dump-times "\\\^" 1 "optimized" } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */


Re: [PATCH, ARM][2 of 2] Enable shrink-wrap for ARM

2013-05-17 Thread Ramana Radhakrishnan

On 05/16/13 07:46, Zhenqiang Chen wrote:

On 15 May 2013 06:36, Ramana Radhakrishnan  wrote:

What happens to the *arm_simple_return pattern that already exists in
the backend ? Can you rebase to trunk ?


The *arm_simple_return is the same as "simple_return" used by
shrink-wrap. *arm_return and *arm_simple_return are not merged since
*arm_return is for "Often the return insn will be the same as loading
from memory", while *arm_simple_return is for "branch".

For thumb2, there is only "simple_return" pattern -- *thumb2_return.

And this is the reason why the "normal" return generated by epilogue
and "simple" return generated by shrink-wrap can be optimized as one,
which leads to dwarf info issue.

The rebased patch is attached.


Ok.

Ramana



Thanks!
-Zhenqiang



On Wed, Apr 3, 2013 at 7:50 AM, Zhenqiang Chen
 wrote:

On 2 April 2013 17:55, Ramana Radhakrishnan  wrote:

On Thu, Mar 21, 2013 at 7:03 AM, Zhenqiang Chen
 wrote:

Hi,

The patch is to enable shrink-wrap for TARGET_ARM and TARGET_THUMB2.

Bootstrapped and no make check regression.
All previous Linaro shrink-wrap bugs (http://goo.gl/6fGg5) are verified.

Is it OK?


The tests should be part of the patch attached and not just added as
separate files in your patch submission.


Thank you for t




-Zhenqiang



Thanks!
-Zhenqiang

ChangeLog:
2013-03-21 Bernd Schmidt  
Zhenqiang Chen 

 * config/arm/arm-protos.h: Add and update function protos.
 * config/arm/arm.c (use_simple_return_p): New added.
 (thumb2_expand_return): Check simple_return flag.
 * config/arm/arm.md: Add simple_return and conditional simple_return.
 * config/arm/iterators.md: Add iterator for return and simple_return.

testsuite/ChangeLog:
2013-03-21 Zhenqiang Chen 

 * gcc.dg/shrink-wrap-alloca.c: New added.
 * gcc.dg/shrink-wrap-pretend.c: New added.
 * gcc.dg/shrink-wrap-sibcall.c: New added.





Re: More vector folding

2013-05-17 Thread Jakub Jelinek
On Fri, May 17, 2013 at 04:23:08PM +0200, Marc Glisse wrote:
> 2013-05-17  Marc Glisse  
> 
>   PR regression/57313
>   * gcc.dg/binop-xor3.c: Restrict to platforms known to work (x86).

I'd say it should be PR testsuite/57313 (and the PR changed to that).

Ok with that change, if some other target maintainer sees this test passing
on his platform, it can be added to the target selector later.

> --- gcc.dg/binop-xor3.c   (revision 199006)
> +++ gcc.dg/binop-xor3.c   (working copy)
> @@ -1,11 +1,11 @@
> -/* { dg-do compile } */
> +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
>  /* { dg-options "-O2 -fdump-tree-optimized" } */
>  
>  int
>  foo (int a, int b)
>  {
>return ((a && !b) || (!a && b));
>  }
>  
>  /* { dg-final { scan-tree-dump-times "\\\^" 1 "optimized" } } */
>  /* { dg-final { cleanup-tree-dump "optimized" } } */


Jakub


Re: More vector folding

2013-05-17 Thread Marc Glisse

On Fri, 17 May 2013, Jakub Jelinek wrote:


On Fri, May 17, 2013 at 04:23:08PM +0200, Marc Glisse wrote:

2013-05-17  Marc Glisse  

PR regression/57313
* gcc.dg/binop-xor3.c: Restrict to platforms known to work (x86).


I'd say it should be PR testsuite/57313 (and the PR changed to that).


Ok, thanks.

I was actually thinking that it might be better to put the target selector 
on dg-final, as in the attached. Which do you prefer?


--
Marc Glisse

Index: gcc.dg/binop-xor3.c
===
--- gcc.dg/binop-xor3.c (revision 199006)
+++ gcc.dg/binop-xor3.c (working copy)
@@ -1,11 +1,11 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-optimized" } */
 
 int
 foo (int a, int b)
 {
   return ((a && !b) || (!a && b));
 }
 
-/* { dg-final { scan-tree-dump-times "\\\^" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\\^" 1 "optimized" { target i?86-*-* x86_64-*-* } } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */


Re: More vector folding

2013-05-17 Thread Jakub Jelinek
On Fri, May 17, 2013 at 04:37:45PM +0200, Marc Glisse wrote:
> On Fri, 17 May 2013, Jakub Jelinek wrote:
> 
> >On Fri, May 17, 2013 at 04:23:08PM +0200, Marc Glisse wrote:
> >>2013-05-17  Marc Glisse  
> >>
> >>PR regression/57313
> >>* gcc.dg/binop-xor3.c: Restrict to platforms known to work (x86).
> >
> >I'd say it should be PR testsuite/57313 (and the PR changed to that).
> 
> Ok, thanks.
> 
> I was actually thinking that it might be better to put the target
> selector on dg-final, as in the attached. Which do you prefer?

Doesn't really matter, both are fine.

Jakub


Re: [PATCH] Fix extendsidi2_1 splitting (PR rtl-optimization/57281, PR rtl-optimization/57300 wrong-code, alternative)

2013-05-17 Thread Richard Henderson
On 05/17/2013 01:25 AM, Jakub Jelinek wrote:
> +(define_peephole2
> +  [(set (match_operand:SI 0 "memory_operand")
> + (match_operand:SI 1 "register_operand"))
> +   (set (match_operand:SI 2 "register_operand") (match_dup 1))
> +   (parallel [(set (match_dup 2)
> +(ashiftrt:SI (match_dup 2)
> + (match_operand:QI 3 "const_int_operand")))
> +(clobber (reg:CC FLAGS_REG))])
> +   (set (match_operand:SI 4 "memory_operand") (match_dup 2))]
> +  "INTVAL (operands[3]) == 31

No sense in using match_operand in the pattern and INTVAL == 31
in the condition when you can just use (const_int 31) in the pattern.

Modulo those two cases, the patch is ok.


r~


Re: [PATCH, i386]: Update processor_alias_table for missing PTA_PRFCHW and PTA_FXSR flags

2013-05-17 Thread Uros Bizjak
On Fri, May 17, 2013 at 7:38 AM, Gopalasubramanian, Ganesh
 wrote:

> Thank you Uros for the patch. Could you backport this to the 4.8.0?
>
> -Original Message-
> From: Uros Bizjak [mailto:ubiz...@gmail.com]
> Sent: Wednesday, May 15, 2013 11:16 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Gopalasubramanian, Ganesh
> Subject: [PATCH, i386]: Update processor_alias_table for missing PTA_PRFCHW 
> and PTA_FXSR flags
>
> Hello!
>
> Attached patch adds missing PTA_PRFCHW and PTA_FXSR flags to x86 processor 
> alias table. PRFCHW CPUID flag is shared with 3dnow prefetch flag, so some 
> additional logic is needed to avoid generating SSE prefetches for non-SSE 
> 3dNow! targets, while still generating full set of 3dnow prefetches on 3dNow! 
> targets.
>
> 2013-05-15  Uros Bizjak  
>
> * config/i386/i386.c (ix86_option_override_internal): Update
> processor_alias_table for missing PTA_PRFCHW and PTA_FXSR flags.  Add
> PTA_POPCNT to corei7 entry and remove PTA_SSE from athlon-4 entry.
> Do not enable SSE prefetch on non-SSE 3dNow! targets.  Enable
> TARGET_PRFCHW for TARGET_3DNOW targets.
> * config/i386/i386.md (prefetch): Enable for TARGET_PRFCHW instead
> of TARGET_3DNOW.
> (*prefetch_3dnow): Enable for TARGET_PRFCHW only.
>
> Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}  
> and was committed to mainline SVN. The patch will be backported to 4.8 branch 
> in a couple of days.

Done.

I have also reverted athlon-4 change everywhere. It does have SSE after all.

Uros.


Fix pr49146 - crash in unwinder

2013-05-17 Thread Richard Henderson
The rationale is captured in the comment in the first hunk.
Committed.


r~
PR target/49146
* unwind-dw2.c (UNWIND_COLUMN_IN_RANGE): New macro.
(execute_cfa_program): Use it when storing to fs->regs.


diff --git a/libgcc/unwind-dw2.c b/libgcc/unwind-dw2.c
index 80de5ab..041f9d5 100644
--- a/libgcc/unwind-dw2.c
+++ b/libgcc/unwind-dw2.c
@@ -59,6 +59,35 @@
 #define DWARF_REG_TO_UNWIND_COLUMN(REGNO) (REGNO)
 #endif
 
+/* ??? For the public function interfaces, we tend to gcc_assert that the
+   column numbers are in range.  For the dwarf2 unwind info this does happen,
+   although so far in a case that doesn't actually matter.
+
+   See PR49146, in which a call from x86_64 ms abi to x86_64 unix abi stores
+   the call-saved xmm registers and annotates them.  We havn't bothered
+   providing support for the xmm registers for the x86_64 port primarily
+   because the 64-bit windows targets don't use dwarf2 unwind, using sjlj or
+   SEH instead.  Adding the support for unix targets would generally be a
+   waste.  However, some runtime libraries supplied with ICC do contain such
+   an unorthodox transition, as well as the unwind info to match.  This loss
+   of register restoration doesn't matter in practice, because the exception
+   is caught in the native unix abi, where all of the xmm registers are 
+   call clobbered.
+
+   Ideally, we'd record some bit to notice when we're failing to restore some
+   register recorded in the unwind info, but to do that we need annotation on
+   the unix->ms abi edge, so that we know when the register data may be
+   discarded.  And since this edge is also within the ICC library, we're
+   unlikely to be able to get the new annotation.
+
+   Barring a magic solution to restore the ms abi defined 128-bit xmm registers
+   (as distictly opposed to the full runtime width) without causing extra
+   overhead for normal unix abis, the best solution seems to be to simply
+   ignore unwind data for unknown columns.  */
+
+#define UNWIND_COLUMN_IN_RANGE(x) \
+__builtin_expect((x) <= DWARF_FRAME_REGISTERS, 1)
+
 #ifdef REG_VALUE_IN_UNWIND_CONTEXT
 typedef _Unwind_Word _Unwind_Context_Reg_Val;
 
@@ -939,14 +968,19 @@ execute_cfa_program (const unsigned char *insn_ptr,
  reg = insn & 0x3f;
  insn_ptr = read_uleb128 (insn_ptr, &utmp);
  offset = (_Unwind_Sword) utmp * fs->data_align;
- fs->regs.reg[DWARF_REG_TO_UNWIND_COLUMN (reg)].how
-   = REG_SAVED_OFFSET;
- fs->regs.reg[DWARF_REG_TO_UNWIND_COLUMN (reg)].loc.offset = offset;
+ reg = DWARF_REG_TO_UNWIND_COLUMN (reg);
+ if (UNWIND_COLUMN_IN_RANGE (reg))
+   {
+ fs->regs.reg[reg].how = REG_SAVED_OFFSET;
+ fs->regs.reg[reg].loc.offset = offset;
+   }
}
   else if ((insn & 0xc0) == DW_CFA_restore)
{
  reg = insn & 0x3f;
- fs->regs.reg[DWARF_REG_TO_UNWIND_COLUMN (reg)].how = REG_UNSAVED;
+ reg = DWARF_REG_TO_UNWIND_COLUMN (reg);
+ if (UNWIND_COLUMN_IN_RANGE (reg))
+   fs->regs.reg[reg].how = REG_UNSAVED;
}
   else switch (insn)
{
@@ -977,26 +1011,35 @@ execute_cfa_program (const unsigned char *insn_ptr,
	  insn_ptr = read_uleb128 (insn_ptr, &reg);
  insn_ptr = read_uleb128 (insn_ptr, &utmp);
  offset = (_Unwind_Sword) utmp * fs->data_align;
- fs->regs.reg[DWARF_REG_TO_UNWIND_COLUMN (reg)].how
-   = REG_SAVED_OFFSET;
- fs->regs.reg[DWARF_REG_TO_UNWIND_COLUMN (reg)].loc.offset = offset;
+ reg = DWARF_REG_TO_UNWIND_COLUMN (reg);
+ if (UNWIND_COLUMN_IN_RANGE (reg))
+   {
+ fs->regs.reg[reg].how = REG_SAVED_OFFSET;
+ fs->regs.reg[reg].loc.offset = offset;
+   }
  break;
 
case DW_CFA_restore_extended:
	  insn_ptr = read_uleb128 (insn_ptr, &reg);
  /* FIXME, this is wrong; the CIE might have said that the
 register was saved somewhere.  */
- fs->regs.reg[DWARF_REG_TO_UNWIND_COLUMN(reg)].how = REG_UNSAVED;
+ reg = DWARF_REG_TO_UNWIND_COLUMN (reg);
+ if (UNWIND_COLUMN_IN_RANGE (reg))
+   fs->regs.reg[reg].how = REG_UNSAVED;
  break;
 
case DW_CFA_same_value:
	  insn_ptr = read_uleb128 (insn_ptr, &reg);
- fs->regs.reg[DWARF_REG_TO_UNWIND_COLUMN(reg)].how = REG_UNSAVED;
+ reg = DWARF_REG_TO_UNWIND_COLUMN (reg);
+ if (UNWIND_COLUMN_IN_RANGE (reg))
+   fs->regs.reg[reg].how = REG_UNSAVED;
  break;
 
case DW_CFA_undefined:
	  insn_ptr = read_uleb128 (insn_ptr, &reg);
- fs->regs.reg[DWARF_REG_TO_UNWIND_COLUMN(reg)].how = REG_UNDEFINED;
+ reg = DWARF_REG_TO_UNWIND_COLUMN (reg);
+ if (UNWIND_COLUMN_IN_RANGE (reg))
+   fs->regs.reg[reg].how = REG_UNDEFINED;
  break;
 
case DW_CFA_nop:
@@ -1007,9 +1050,12 @@ execute_cfa_program (const unsigned char *insn_ptr,
 

Re: Patch ping

2013-05-17 Thread Jeff Law

On 05/17/2013 12:49 AM, Jakub Jelinek wrote:

Hi!

http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00282.html
Reject -fsanitize=address -fsanitize=thread linking that won't ever work at
runtime.
I thought Dodji already OK'd this.  If there's any concern about the 
scope going outside Dodji's area, then I'll approve it :-)


jeff


Re: [PATCH] Fix incorrect discriminator assignment.

2013-05-17 Thread Dehao Chen
On Fri, May 17, 2013 at 1:22 AM, Richard Biener
 wrote:
> On Wed, May 15, 2013 at 6:50 PM, Cary Coutant  wrote:
>>> gcc/ChangeLog:
>>>
>>> * tree-cfg.c (locus_descrim_hasher::hash): Only hash lineno.
>>> (locus_descrim_hasher::equal): Likewise.
>>> (next_discriminator_for_locus): Likewise.
>>> (assign_discriminator): Add return value.
>>> (make_edges): Assign more discriminator if necessary.
>>> (make_cond_expr_edges): Likewise.
>>> (make_goto_expr_edges): Likewise.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.dg/debug/dwarf2/discriminator.c: New Test.
>>
>> Looks OK to me, but I can't approve patches for tree-cfg.c.
>>
>> Two comments:
>>
>> (1) In the C++ conversion, it looks like someone misspelled "discrim"
>> in "locus_descrim_hasher".
>>
>> (2) Is this worth fixing in trunk when we'll eventually switch to a
>> per-instruction discriminator?
>
>> This patch fixes a common case where one line is distributed to 3 BBs,
>> but only 1 discriminator is assigned.
>
> As far as I can see the patch makes discriminators coarser (by hashing
> and comparing on LOCATION_LINE and not on the location).  It also has
> the changes like
>
> -  assign_discriminator (entry_locus, then_bb);
> +  if (assign_discriminator (entry_locus, then_bb))
> +assign_discriminator (entry_locus, bb);
>
> which is what the comment refers to?  I think those changes are short-sighted
> because what happens if the 2nd assign_discriminator call has exactly the
> same issue?  Especially with the make_goto_expr_edges change there
> may be multiple predecessors where I cannot see how the change is
> correct.  Yes, the change changes something and thus may fix the testcase
> but it doesn't change things in a predictable way as far as I can see.
>
> So - the change to compare/hash LOCATION_LINE looks good to me,
> but the assign_discriminator change needs more explaining.

The new discriminator assignment algorithm is:

1 FOR_EACH_BB:
2   FOR_EACH_SUCC_EDGE:
3 if (same_line(bb, succ_bb))
4   if (succ_bb has no discriminator)
5  succ_bb->discriminator = new_discriminator
6   else if (bb has no discriminator)
7  bb->discriminator = new_discriminator

Because there are so many places that handle SUCC_EDGE, the logic of
lines #3 and #4 is embedded within the assign_discriminator function, and
the logic of line #6 is encoded as the return value of
assign_discriminator.

Thanks,
Dehao

>
> Richard.
>
>> -cary


[PATCH] [tree-optimization/57124] Updated fix for 254.gap problems

2013-05-17 Thread Jeff Law
As I believe I pointed out in a follow-up message, 254.gap is depending 
on signed overflow semantics.


This patch avoids eliminating a cast feeding a conditional when the
SSA_NAME's range has overflowed unless -fstrict-overflow is in effect. 
Thus 254.gap should be built with -fno-strict-overflow.


This patch also introduces a warning when the optimization is applied 
and the ranges have overflowed.


Bootstrapped and regression tested on x86-unknown-linux-gnu.

OK for the trunk?



commit 62bbaa8de0e8d929eb3c63331b47950e9b09d801
Author: Jeff Law 
Date:   Wed May 1 12:33:20 2013 -0600

PR tree-optimization/57124
* tree-vrp.c (simplify_cond_using_ranges): Only simplify a
conversion feeding a condition if the range has an overflow
if -fstrict-overflow.  Add warnings for when we do make the
transformation.

PR tree-optimization/57124
* gcc.c-torture/execute/pr57124.c: New test.
* gcc.c-torture/execute/pr57124.x: Set -fno-strict-overflow.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 8e92c44..9320f21 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2013-05-17  Jeff Law  
+
+   PR tree-optimization/57124
+   * tree-vrp.c (simplify_cond_using_ranges): Only simplify a
+   conversion feeding a condition if the range has an overflow
+   if -fstrict-overflow.  Add warnings for when we do make the
+   transformation.
+
 2013-05-16  Rainer Orth  
 
* reorg.c (link_cc0_insns): Wrap in #ifdef HAVE_cc0.
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 879b9bc..482151c 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,9 @@
+2013-05-17  Jeff Law  
+
+   PR tree-optimization/57124
+   * gcc.c-torture/execute/pr57124.c: New test.
+   * gcc.c-torture/execute/pr57124.x: Set -fno-strict-overflow.
+
 2013-05-16  Greta Yorsh  
 
* gcc.target/arm/unaligned-memcpy-2.c: Adjust expected output.
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr57124.c 
b/gcc/testsuite/gcc.c-torture/execute/pr57124.c
new file mode 100644
index 000..835d249
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr57124.c
@@ -0,0 +1,27 @@
+__attribute__ ((noinline))
+foo(short unsigned int *p1, short unsigned int *p2)
+{
+  short unsigned int x1, x4;
+  int x2, x3, x5, x6;
+  unsigned int x7;
+  
+  x1 = *p1;
+  x2 = (int) x1;
+  x3 = x2 * 65536;
+  x4 = *p2;
+  x5 = (int) x4;
+  x6 = x3 + x4;
+  x7 = (unsigned int) x6;
+  if (x7 <= 268435455U)
+abort ();
+  exit (0);
+}
+
+main()
+{
+  short unsigned int x, y;
+  x = -5;
+  y = -10;
+  foo (&x, &y);
+}
+
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr57124.x 
b/gcc/testsuite/gcc.c-torture/execute/pr57124.x
new file mode 100644
index 000..d8cacbe
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr57124.x
@@ -0,0 +1,2 @@
+set additional_flags "-fno-strict-overflow"
+return 0
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index b5de683..b3eccf0 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -8669,8 +8669,32 @@ simplify_cond_using_ranges (gimple stmt)
  && range_fits_type_p (vr,
TYPE_PRECISION (TREE_TYPE (op0)),
TYPE_UNSIGNED (TREE_TYPE (op0)))
- && int_fits_type_p (op1, TREE_TYPE (innerop)))
+ && int_fits_type_p (op1, TREE_TYPE (innerop))
+ /* The range must not have overflowed, or if it did overflow
+we must not be wrapping/trapping overflow and optimizing
+with strict overflow semantics.  */
+ && ((!is_negative_overflow_infinity (vr->min)
+  && !is_positive_overflow_infinity (vr->max))
+ || (!flag_wrapv && !flag_trapv && flag_strict_overflow)))
{
+ /* If the range overflowed and the user has asked for warnings
+when strict overflow semantics were used to optimize code,
+issue an appropriate warning.  */
+ if ((is_negative_overflow_infinity (vr->min)
+  || is_positive_overflow_infinity (vr->max))
+	 && issue_strict_overflow_warning (WARN_STRICT_OVERFLOW_CONDITIONAL))
+   {
+ location_t location;
+
+ if (!gimple_has_location (stmt))
+   location = input_location;
+ else
+   location = gimple_location (stmt);
+ warning_at (location, OPT_Wstrict_overflow,
+ "assuming signed overflow does not occur when "
+ "simplifying conditional.");
+   }
+
  tree newconst = fold_convert (TREE_TYPE (innerop), op1);
  gimple_cond_set_lhs (stmt, innerop);
  gimple_cond_set_rhs (stmt, newconst);


Add myself to MAINTAINERS as Write After Approval

2013-05-17 Thread David Malcolm
Adding myself to the MAINTAINERS in the "Write After Approval" category
Index: ChangeLog
===
--- ChangeLog	(revision 199021)
+++ ChangeLog	(revision 199022)
@@ -1,3 +1,7 @@
+2013-05-17  David Malcolm  
+
+	* MAINTAINERS (Write After Approval): Add myself.
+
 2013-04-30  Brooks Moses  
 
 	* MAINTAINERS: Update my email; move myself from Fortran
Index: MAINTAINERS
===
--- MAINTAINERS	(revision 199021)
+++ MAINTAINERS	(revision 199022)
@@ -452,6 +452,7 @@
 Christophe Lyon	christophe.l...@st.com
 Luis Machado	luis...@br.ibm.com
 Ziga Mahkovec	ziga.mahko...@klika.si
+David Malcolm	dmalc...@redhat.com
 Simon Martin	simar...@users.sourceforge.net
 Ranjit Mathew	rmat...@hotmail.com
 Michael Matz	m...@suse.de


[wwwdocs] gcc-4.8/changes.html: mention IRA and transactional memory

2013-05-17 Thread Aldy Hernandez
Errr, how did we miss this?  Ok, I'm partly to blame for the lack of 
transactional memory in changes.html, but something as big as getting 
rid of reload?!


Would it be preferable to mention IRA in the target specific x86 section 
instead?  I figured it was an important enough change to warrant 
front-page coverage.


Is this OK?
Index: htdocs/gcc-4.8/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v
retrieving revision 1.116
diff -u -r1.116 changes.html
--- htdocs/gcc-4.8/changes.html 24 Apr 2013 15:14:26 -  1.116
+++ htdocs/gcc-4.8/changes.html 17 May 2013 17:49:19 -
@@ -143,6 +143,11 @@
-fsanitize=thread. Instructions will be instrumented to
detect data races. The ThreadSanitizer is available on x86-64
GNU/Linux.
+A new local register allocator has been implemented, which
+replaces the 26 year old reload pass and improves generated code
+quality on ia32 and x86-64 targets.
+Support for transactional memory has been implemented on
+selected architectures.
   
 
 


[PATCH, AArch64] Allow insv_imm to handle bigger immediates via masking to 16-bits

2013-05-17 Thread Ian Bolton
The MOVK instruction is currently not used when operand 2 is
more than 16 bits, which leads to sub-optimal code.

This patch improves those situations by removing the check and
instead masking down to 16 bits within the new "X" format specifier
I added recently.

OK for trunk?

Cheers,
Ian


2013-05-17  Ian Bolton  

* config/aarch64/aarch64.c (aarch64_print_operand): Change the X format
specifier to only display bottom 16 bits.

* config/aarch64/aarch64.md (insv_imm): Allow any-sized immediate
to match for operand 2, since it will be masked.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b57416c..1bdfd85 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3424,13 +3424,13 @@ aarch64_print_operand (FILE *f, rtx x, char code)
   break;
 
 case 'X':
-  /* Print integer constant in hex.  */
+  /* Print bottom 16 bits of integer constant in hex.  */
   if (GET_CODE (x) != CONST_INT)
{
  output_operand_lossage ("invalid operand for '%%%c'", code);
  return;
}
-  asm_fprintf (f, "0x%wx", UINTVAL (x));
+  asm_fprintf (f, "0x%wx", UINTVAL (x) & 0xffff);
   break;
 
 case 'w':
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b27bcda..403d717 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -858,9 +858,8 @@
  (const_int 16)
  (match_operand:GPI 1 "const_int_operand" "n"))
(match_operand:GPI 2 "const_int_operand" "n"))]
-  "INTVAL (operands[1]) < GET_MODE_BITSIZE (mode)
-   && INTVAL (operands[1]) % 16 == 0
-   && UINTVAL (operands[2]) <= 0x"
+  "UINTVAL (operands[1]) < GET_MODE_BITSIZE (mode)
+   && UINTVAL (operands[1]) % 16 == 0"
   "movk\\t%0, %X2, lsl %1"
   [(set_attr "v8type" "movk")
(set_attr "mode" "")]


Re: [patch] Fix sched-deps DEP_POSTPONED, ds_t documentation

2013-05-17 Thread Steven Bosscher
Ping

On Sun, May 12, 2013 at 5:45 PM, Steven Bosscher wrote:
> Hello,
>
> While working on a sched-deps based delay slot scheduler, I've come to
> the conclusion that the dependencies themselves must indicate whether
> the dependent ref is delayed. So I started hacking sched-deps and ran
> into trouble... It turns out there is a problem introduced along with
> DEP_POSTPONED last year, but the real problem is the complicated ds_t
> representation and the unclear documentation. The first *6* bits on a
> ds_t were reserved for the dependence type, and two more bits were
> reserved for HARD_DEP and DEP_CANCELLED:
>
> -/* First '6' stands for 4 dep type bits and the HARD_DEP and DEP_CANCELLED
> -   bits.
> -   Second '4' stands for BEGIN_{DATA, CONTROL}, BE_IN_{DATA, CONTROL}
> -   dep weakness.  */
> -#define BITS_PER_DEP_WEAK ((BITS_PER_DEP_STATUS - 6) / 4)
>
> But DEP_POSTPONED adds another bit:
>
> /* Instruction has non-speculative dependence.  This bit represents the
>property of an instruction - not the one of a dependence.
>   Therefore, it can appear only in the TODO_SPEC field of an instruction.  */
> #define HARD_DEP (DEP_CONTROL << 1)
>
> /* Set in the TODO_SPEC field of an instruction for which new_ready
>has decided not to schedule it speculatively.  */
>  #define DEP_POSTPONED (HARD_DEP << 1)
>
> /* Set if a dependency is cancelled via speculation.  */
> #define DEP_CANCELLED (DEP_POSTPONED << 1)
>
> I wanted to add another flag, DEP_DELAYED, and optimistically just
> added another bit, the compiler started warning, etc.
>
>
> So far we seem to've gotten away with this because the sign bit on a
> ds_t was unused:
>
> /* We exclude sign bit.  */
> #define BITS_PER_DEP_STATUS (HOST_BITS_PER_INT - 1)
>
>
> The attached patch extends the ds_t documentation to clarify in a
> comment how all the bits are used. I've made ds_t and dw_t unsigned
> int, because ds_t is just a collection of bits, and dw_t is unsigned.
> The target hooks I had to update are all only used by ia64.
>
> I opportunistically reserved the one left-over bit for my own purposes ;-)
>
> Bootstrapped&tested on ia64-unknown-linux-gnu and on
> powerpc64-unknown-linux-gnu unix-{,-m32}.
> OK for trunk?
>
> Ciao!
> Steven


Re: [PATCH] Use indentation in gtype.state to show nested structure

2013-05-17 Thread David Malcolm
On Wed, 2013-05-15 at 11:51 -0600, Jeff Law wrote:
> On 05/14/2013 01:07 PM, David Malcolm wrote:
> > On Tue, 2013-05-14 at 11:44 -0600, Jeff Law wrote:
> >> On 05/06/2013 03:05 PM, David Malcolm wrote:
> >
> > [...snip review and comments about auto-checking of formatting...]
> >
> >> Anyway, the patch is good to go.   Please install.
> >
> > Thanks for reviewing.
> >
> > Probably a dumb question, but by "Please install", do you mean, "please
> > commit to SVN"?
> Yes.

Thanks.  Committed to svn trunk as r199032 (using the version with the
fixed ChangeLog).




Re: web ICEs on subreg

2013-05-17 Thread Mike Stump
On May 16, 2013, at 5:26 PM, David Edelsohn  wrote:
> This patch is creating new segfaults for 32 bit POWER AIX.

Thanks for the heads up.  Fixed in r199030.

2013-05-17  Mike Stump  

PR rtl-optimization/57304
* web.c (union_match_dups): Ensure that DF_REF_LOC exists before
accessing DF_REF_REAL_LOC.

Index: web.c
===
--- web.c   (revision 199016)
+++ web.c   (working copy)
@@ -133,9 +133,10 @@ union_match_dups (rtx insn, struct web_e
   entry = type == OP_IN ? use_entry : def_entry;
   for (; *ref; ref++)
{
- if (DF_REF_LOC (*ref) == recog_data.operand_loc[op])
+ rtx *l = DF_REF_LOC (*ref);
+ if (l == recog_data.operand_loc[op])
break;
- if (DF_REF_REAL_LOC (*ref) == recog_data.operand_loc[op])
+ if (l && DF_REF_REAL_LOC (*ref) == recog_data.operand_loc[op])
break;
}
 
@@ -143,9 +144,10 @@ union_match_dups (rtx insn, struct web_e
{
  for (ref = use_link, entry = use_entry; *ref; ref++)
{
- if (DF_REF_LOC (*ref) == recog_data.operand_loc[op])
+ rtx *l = DF_REF_LOC (*ref);
+ if (l == recog_data.operand_loc[op])
break;
- if (DF_REF_REAL_LOC (*ref) == recog_data.operand_loc[op])
+ if (l && DF_REF_REAL_LOC (*ref) == recog_data.operand_loc[op])
break;
}
}
--


[patch] Cleanup the CFG after pro_and_epilogue pass

2013-05-17 Thread Steven Bosscher
Hello,

Trying to dump the CFG as a graph fails after the pro_and_epilogue
pass because it leaves unreachable basic blocks. This patch fixes that
issue.

Will commit sometime next week if no-one objects.

Ciao!
Steven


* function.c (rest_of_handle_thread_prologue_and_epilogue):
Cleanup the CFG after thread_prologue_and_epilogue_insns.

Index: function.c
===
--- function.c  (revision 199028)
+++ function.c  (working copy)
@@ -6976,6 +6976,9 @@ rest_of_handle_thread_prologue_and_epilo
  scheduling to operate in the epilogue.  */
   thread_prologue_and_epilogue_insns ();

+  /* The prologue and epilogue may have made some blocks unreachable.  */
+  cleanup_cfg (0);
+
   /* The stack usage info is finalized during prologue expansion.  */
   if (flag_stack_usage_info)
 output_stack_usage ();


Re: [patch] Cleanup the CFG after pro_and_epilogue pass

2013-05-17 Thread Jeff Law

On 05/17/2013 01:49 PM, Steven Bosscher wrote:

Hello,

Trying to dump the CFG as a graph fails after the pro_and_epilogue
pass because it leaves unreachable basic blocks. This patch fixes that
issue.

Will commit sometime next week if no-one objects.

Ciao!
Steven


 * function.c (rest_of_handle_thread_prologue_and_epilogue):
 Cleanup the CFG after thread_prologue_and_epilogue_insns.

?!?

Under what conditions does threading the prologue/epilogue make code 
unreachable?


Also note that commit after X days if no-one objects is not the project's 
policy.  Please don't take such liberties.



Jeff



Re: [Patch ARM] Fix PR target/19599

2013-05-17 Thread Richard Henderson
On 05/15/2013 04:50 AM, Ramana Radhakrishnan wrote:
>/* Cannot tail-call to long calls, since these are out of range of
>   a branch instruction.  */
> -  if (arm_is_long_call_p (decl))
> +  if (decl && arm_is_long_call_p (decl))
>  return false;

You can tail call long calls via indirection now.  I.e. load the address first.

>  (define_insn "*sibcall_insn"
> - [(call (mem:SI (match_operand:SI 0 "" "X"))
> + [(call (mem:SI (match_operand:SI 0 "call_insn_operand" "Cs,Ss"))

FWIW, "i" or "s" would work just as well, given you've already constrained by
call_insn_operand; Ss isn't really needed.

> +  if (arm_arch5 || arm_arch4t)
> + return \" bx\\t%0\\t%@ indirect register sibling call\";

Missing %?.

> +  if (arm_arch5 || arm_arch4t)
> + return \"bx\\t%1\";

Likewise.

> diff --git a/gcc/testsuite/gcc.target/arm/pr40887.c 
> b/gcc/testsuite/gcc.target/arm/pr40887.c
> index 0b5e873..5cabe3a 100644
> --- a/gcc/testsuite/gcc.target/arm/pr40887.c
> +++ b/gcc/testsuite/gcc.target/arm/pr40887.c
> @@ -2,9 +2,9 @@
>  /* { dg-options "-O2 -march=armv5te" }  */
>  /* { dg-final { scan-assembler "blx" } } */
>  
> -int (*indirect_func)();
> +int (*indirect_func)(int x);
>  
>  int indirect_call()
>  {
> -return indirect_func();
> +  return indirect_func(20) + indirect_func (40);
>  }
> 

You've made this test case 100% useless.  Of course there will always be a blx
insn, since you've two calls there, and one of them will never be tail called.
 It should have been either removed or become your new test case.


r~


[PATCH, i386]: Also pass mmx, 3dnow and sseX options with -march=native

2013-05-17 Thread Uros Bizjak
Hello!

For some reason we didn't pass mmx, 3dnow and sseX options to cc1 with
-march=native. Passing these options will help older processors that
don't get detected through the architecture recognition functionality to
get all their ISAs enabled.

2013-05-17  Uros Bizjak  

* config/i386/driver-i386.c (host_detect_local_cpu): Pass mmx, 3dnow,
sse, sse2, sse3, ssse3 and sse4a flags to options.

Tested on x86_64-pc-linux-gnu {,-m32} and committed to mainline.

Uros.
Index: config/i386/driver-i386.c
===
--- config/i386/driver-i386.c   (revision 198989)
+++ config/i386/driver-i386.c   (working copy)
@@ -786,10 +786,17 @@ const char *host_detect_local_cpu (int argc, const
 
   if (arch)
 {
+  const char *mmx = has_mmx ? " -mmmx" : " -mno-mmx";
+  const char *mmx3dnow = has_3dnow ? " -m3dnow" : " -mno-3dnow";
+  const char *sse = has_sse ? " -msse" : " -mno-sse";
+  const char *sse2 = has_sse2 ? " -msse2" : " -mno-sse2";
+  const char *sse3 = has_sse3 ? " -msse3" : " -mno-sse3";
+  const char *ssse3 = has_ssse3 ? " -mssse3" : " -mno-ssse3";
+  const char *sse4a = has_sse4a ? " -msse4a" : " -mno-sse4a";
   const char *cx16 = has_cmpxchg16b ? " -mcx16" : " -mno-cx16";
   const char *sahf = has_lahf_lm ? " -msahf" : " -mno-sahf";
   const char *movbe = has_movbe ? " -mmovbe" : " -mno-movbe";
-  const char *ase = has_aes ? " -maes" : " -mno-aes";
+  const char *aes = has_aes ? " -maes" : " -mno-aes";
   const char *pclmul = has_pclmul ? " -mpclmul" : " -mno-pclmul";
   const char *popcnt = has_popcnt ? " -mpopcnt" : " -mno-popcnt";
   const char *abm = has_abm ? " -mabm" : " -mno-abm";
@@ -817,7 +824,8 @@ const char *host_detect_local_cpu (int argc, const
   const char *xsave = has_xsave ? " -mxsave" : " -mno-xsave";
   const char *xsaveopt = has_xsaveopt ? " -mxsaveopt" : " -mno-xsaveopt";
 
-  options = concat (options, cx16, sahf, movbe, ase, pclmul,
+  options = concat (options, mmx, mmx3dnow, sse, sse2, sse3, ssse3,
+   sse4a, cx16, sahf, movbe, aes, pclmul,
popcnt, abm, lwp, fma, fma4, xop, bmi, bmi2,
tbm, avx, avx2, sse4_2, sse4_1, lzcnt, rtm,
hle, rdrnd, f16c, fsgsbase, rdseed, prfchw, adx,


Re: [PATCH:RL78] Add new insn for mulqi3 and mulhi3

2013-05-17 Thread Richard Henderson
On 05/14/2013 09:43 PM, Kaushik Phatak wrote:
> Hi,
> The below patch adds expanders and insns for QI and HI mode for the RL78 
> target.
> The QI mode uses a generic 'mulu' instruction supported by all variants, while
> the HI mode creates insn for G13 and G14 target variants using hardware 
> multiply
> instructions.
> Tested on hardware and simulator with no additional regressions.
> Kindly review the same.
> 
> Thanks,
> Kaushik
> 
> 2013-05-15  Kaushik Phatak  
>
>   * config/rl78/rl78.md (mulqi3,mulhi3): New define_expands.
>   (mulqi3_rl78,mulhi3_rl78,mulhi3_g13): New define_insns.
> 
> Index: gcc/config/rl78/rl78.md
> ===
> --- gcc/config/rl78/rl78.md   (revision 198915)
> +++ gcc/config/rl78/rl78.md   (working copy)
> @@ -235,6 +235,24 @@
>[(set_attr "valloc" "macax")]
>  )
>  
> +(define_expand "mulqi3"
> +  [(set (match_operand:QI  0 "register_operand" "=&v")
> + (mult:QI  (match_operand:QI 1 "general_operand" "+vim")
> +   (match_operand:QI 2 "nonmemory_operand" "vi")))
> +   ]
> +  "" ; mulu supported by all targets
> +  ""
> +)

No constraints on define_expand, only predicates.

> +(define_insn "mulqi3_rl78"
> +(define_insn "mulhi3_rl78"
> +(define_insn "mulhi3_g13"

These names are not used.  They should be prefixed with "*" to indicate the
name is just for documentation.


r~



Re: [patch] Cleanup the CFG after pro_and_epilogue pass

2013-05-17 Thread Steven Bosscher
On Fri, May 17, 2013 at 9:52 PM, Jeff Law  wrote:
> On 05/17/2013 01:49 PM, Steven Bosscher wrote:
>>
>> Hello,
>>
>> Trying to dump the CFG as a graph fails after the pro_and_epilogue
>> pass because it leaves unreachable basic blocks. This patch fixes that
>> issue.
>>
>> Will commit sometime next week if no-one objects.
>>
>> Ciao!
>> Steven
>>
>>
>>  * function.c (rest_of_handle_thread_prologue_and_epilogue):
>>  Cleanup the CFG after thread_prologue_and_epilogue_insns.
>
> ?!?
>
> Under what conditions does threading the prologue/epilogue make code
> unreachable?

Compiling testsuite/gcc.target/i386/pad-10.c with -fdump-rtl-all-graph.

Just before rest_of_handle_thread_prologue_and_epilogue we have:

Breakpoint 2, rest_of_handle_thread_prologue_and_epilogue () at
../../trunk/gcc/function.c:6969
6969{
(gdb) p brief_dump_cfg(stderr,-1)
;; basic block 2, loop depth 0, count 0, freq 1, maybe hot
;;  prev block 0, next block 3, flags: (REACHABLE, RTL, MODIFIED)
;;  pred:   ENTRY [100.0%]  (FALLTHRU)
;;  succ:   3 [9.2%]  (FALLTHRU)
;;  4 [90.8%]
;; basic block 3, loop depth 0, count 0, freq 922, maybe hot
;;  prev block 2, next block 4, flags: (REACHABLE, RTL, MODIFIED)
;;  pred:   2 [9.2%]  (FALLTHRU)
;;  succ:   4 [100.0%]  (FALLTHRU)
;; basic block 4, loop depth 0, count 0, freq 1, maybe hot
;;  prev block 3, next block 1, flags: (REACHABLE, RTL, MODIFIED)
;;  pred:   3 [100.0%]  (FALLTHRU)
;;  2 [90.8%]
;;  succ:   EXIT [100.0%]  (FALLTHRU)
$1 = void
(gdb) p debug_bb_n(2)
(note 6 1 4 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 4 6 31 2 NOTE_INSN_FUNCTION_BEG)
(insn 31 4 17 2 (set (reg:SI 0 ax [orig:59 D.1775 ] [59])
(reg/v:SI 4 si [orig:62 x ] [62]))
testsuite/gcc.target/i386/pad-10.c:18 85 {*movsi_internal}
 (nil))
(insn 17 31 8 2 (parallel [
(set (reg:SI 0 ax [orig:59 D.1775 ] [59])
(minus:SI (reg:SI 0 ax [orig:59 D.1775 ] [59])
(reg/v:SI 5 di [orig:61 z ] [61])))
(clobber (reg:CC 17 flags))
]) testsuite/gcc.target/i386/pad-10.c:18 295 {*subsi_1}
 (nil))
(insn 8 17 9 2 (set (reg:CCZ 17 flags)
(compare:CCZ (reg/v:SI 4 si [orig:62 x ] [62])
(const_int 1 [0x1])))
testsuite/gcc.target/i386/pad-10.c:12 7 {*cmpsi_1}
 (expr_list:REG_DEAD (reg/v:SI 4 si [orig:62 x ] [62])
(nil)))
(jump_insn 9 8 10 2 (set (pc)
(if_then_else (ne (reg:CCZ 17 flags)
(const_int 0 [0]))
(label_ref:DI 18)
(pc))) testsuite/gcc.target/i386/pad-10.c:12 602 {*jcc_1}
 (expr_list:REG_BR_PROB (const_int 9078 [0x2376])
(nil))
 -> 18)

$2 = (basic_block_def *) 0x76224410
(gdb) p debug_bb_n(3)
(note 10 9 33 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(insn 33 10 11 3 (set (mem/c:SI (plus:DI (reg/f:DI 7 sp)
(const_int 12 [0xc])) [2 %sfp+-4 S4 A32])
(reg/v:SI 5 di [orig:61 z ] [61])) 85 {*movsi_internal}
 (expr_list:REG_DEAD (reg/v:SI 5 di [orig:61 z ] [61])
(nil)))
(insn 11 33 12 3 (set (reg:QI 0 ax)
(const_int 0 [0])) testsuite/gcc.target/i386/pad-10.c:14 87
{*movqi_internal}
 (nil))
(call_insn 12 11 34 3 (call (mem:QI (symbol_ref:DI ("bar") [flags
0x41]  ) [0 bar S1 A8])
(const_int 0 [0])) testsuite/gcc.target/i386/pad-10.c:14 646 {*call}
 (expr_list:REG_DEAD (reg:QI 0 ax)
(nil))
(expr_list:REG_DEP_TRUE (use (reg:QI 0 ax))
(nil)))
(insn 34 12 5 3 (set (reg/v:SI 5 di [orig:61 z ] [61])
(mem/c:SI (plus:DI (reg/f:DI 7 sp)
(const_int 12 [0xc])) [2 %sfp+-4 S4 A32]))
testsuite/gcc.target/i386/pad-10.c:15 85 {*movsi_internal}
 (expr_list:REG_DEAD (reg:SI 65)
(nil)))
(insn 5 34 18 3 (set (reg:SI 0 ax [orig:59 D.1775 ] [59])
(reg/v:SI 5 di [orig:61 z ] [61]))
testsuite/gcc.target/i386/pad-10.c:15 85 {*movsi_internal}
 (expr_list:REG_DEAD (reg/v:SI 5 di [orig:61 z ] [61])
(nil)))

$3 = (basic_block_def *) 0x76224208
(gdb) p debug_bb_n(4)
(code_label 18 5 19 4 3 "" [1 uses])
(note 19 18 27 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(insn 27 19 0 4 (use (reg/i:SI 0 ax)) testsuite/gcc.target/i386/pad-10.c:19 -1
 (nil))

$4 = (basic_block_def *) 0x762242d8



Continuing from here leads to an ice in the graph dumper:

(gdb) cont
Continuing.
Breakpoint 1, internal_error (gmsgid=gmsgid@entry=0x119725e "in %s, at
%s:%d") at ../../trunk/gcc/diagnostic.c:1091
1091bool
(gdb) bt 10
#0  internal_error (gmsgid=gmsgid@entry=0x119725e "in %s, at %s:%d")
at ../../trunk/gcc/diagnostic.c:1091
#1  0x00dfda24 in fancy_abort (file=, line=869,
function=0xee3e60  "pre_and_rev_post_order_compute") at
../../trunk/gcc/diagnostic.c:1151
#2  0x0061dbab in pre_and_rev_post_order_compute
(pre_order=0x0, rev_post_order=0x1627850,
include_entry_exit=) at ../../trunk/gcc/cfganal.c:869
#3  0x00d28d22 in draw_cfg_nodes_no_loops (fun=, fun=, pp=0x15a4f20
) at
../../trunk/gcc/graph.c:183
#4  

Re: [patch] Cleanup the CFG after pro_and_epilogue pass

2013-05-17 Thread Jeff Law

On 05/17/2013 02:53 PM, Steven Bosscher wrote:

On Fri, May 17, 2013 at 9:52 PM, Jeff Law  wrote:

On 05/17/2013 01:49 PM, Steven Bosscher wrote:


Hello,

Trying to dump the CFG as a graph fails after the pro_and_epilogue
pass because it leaves unreachable basic blocks. This patch fixes that
issue.

Will commit sometime next week if no-one objects.

Ciao!
Steven


  * function.c (rest_of_handle_thread_prologue_and_epilogue):
  Cleanup the CFG after thread_prologue_and_epilogue_insns.


?!?

Under what conditions does threading the prologue/epilogue make code
unreachable?


Compiling testsuite/gcc.target/i386/pad-10.c with -fdump-rtl-all-graph.

Just before rest_of_handle_thread_prologue_and_epilogue we have:

Thanks.





What's happened, is that emitting the epilogue at the end of basic
block 4 (with a barrier at the end) has made the use insn 43
unreachable.
But from the description you've given, it appears that the epilogue 
itself has unreachable code, and that shouldn't be happening.  If you 
think it can happen by way of shrink-wrapping, I'd like to see the analysis.


What does the epilogue itself look like before threading it into the 
insn stream?



This reaction seems to me like an unnecessary and disappointing slap
on the wrist. Project policy does allow committing obvious patches
without review. Perhaps I should just use that policy and not give
people the chance to comment? I would think you would also know by now
that I know all CFG related code as well as anyone, and, frankly,
probably better than even you.

You do not make project policy.  It's that simple.

Yes we have an obvious patch policy, but this certainly doesn't fall 
under that policy.  I would frown upon you using the obvious patch path 
to bypass existing policy for patch review.


If you want commit without review privileges, ask for someone to sponsor 
your request and the steering committee will vote on it.


jeff


Re: [patch] Cleanup the CFG after pro_and_epilogue pass

2013-05-17 Thread Steven Bosscher
On Fri, May 17, 2013 at 11:16 PM, Jeff Law wrote:
>> What's happened, is that emitting the epilogue at the end of basic
>> block 4 (with a barrier at the end) has made the use insn 43
>> unreachable.
>
> But from the description you've given, it appears that the epilogue itself
> has unreachable code, and that shouldn't be happening.  If you think it can
> happen by way of shrink-wrapping, I'd like to see the analysis.

It is not the epilogue itself but the way shrink-wrapping emits it.
The block that is unreachable has its last predecessor edge removed in
function.c:6607:

6607  redirect_edge_and_branch_force (e, *pdest_bb);

I haven't looked at how the shrink-wrapping code works exactly. It's
Bernd's code, so perhaps he can have a look. This is now PR57320.

Ciao!
Steven


Re: [patch] Cleanup the CFG after pro_and_epilogue pass

2013-05-17 Thread Steven Bosscher
On Fri, May 17, 2013 at 9:49 PM, Steven Bosscher wrote:
> Will commit sometime next week if no-one objects.

Obviously there are objections.
Patch withdrawn.

Ciao!
Steven


Re: [patch] Cleanup the CFG after pro_and_epilogue pass

2013-05-17 Thread Jeff Law

On 05/17/2013 03:53 PM, Steven Bosscher wrote:

On Fri, May 17, 2013 at 11:16 PM, Jeff Law wrote:

What's happened, is that emitting the epilogue at the end of basic
block 4 (with a barrier at the end) has made the use insn 43
unreachable.


But from the description you've given, it appears that the epilogue itself
has unreachable code, and that shouldn't be happening.  If you think it can
happen by way of shrink-wrapping, I'd like to see the analysis.


It is not the epilogue itself but the way shrink-wrapping emits it.
The block that is unreachable has its last predecessor edge removed in
function.c:6607:

6607  redirect_edge_and_branch_force (e, *pdest_bb);

I haven't looked at how the shrink-wrapping code works exactly. It's
Bernd's code, so perhaps he can have a look. This is now PR57320.
OK.  Let's go with your patch then.  Approved with a comment that 
shrink-wrapping can result in unreachable edges in the epilogue.


jeff


Fix infinite loop in lto-partition.c

2013-05-17 Thread Jan Hubicka
Hi,
privatize_symbol_name skips privatizing of symbols where we know for some reason
that it is not needed. The loop in rename_statics, however, expects
privatize_symbol_name to always rename the var, and it can get into an infinite loop.

Bootstrapped/regtested x86_64-linux, comitted.

Honza

* lto-partition.c (privatize_symbol_name): Return true when
privatizing happened.
(rename_statics): Do not go into infinite loop when privatizing
is not needed.
Index: lto-partition.c
===
--- lto-partition.c (revision 198936)
+++ lto-partition.c (working copy)
@@ -766,7 +766,7 @@ lto_balanced_map (void)
   with symbols defined out of the LTO world.
 */
 
-static void
+static bool
 privatize_symbol_name (symtab_node node)
 {
   tree decl = node->symbol.decl;
@@ -781,7 +781,7 @@ privatize_symbol_name (symtab_node node)
fprintf (cgraph_dump_file,
"Not privatizing symbol name: %s. It privatized already.\n",
name);
-  return;
+  return false;
 }
   /* Avoid mangling of already mangled clones. 
  ???  should have a flag whether a symbol has a 'private' name already,
@@ -793,7 +793,7 @@ privatize_symbol_name (symtab_node node)
fprintf (cgraph_dump_file,
"Not privatizing symbol name: %s. Has unique name.\n",
name);
-  return;
+  return false;
 }
   change_decl_assembler_name (decl, clone_function_name (decl, "lto_priv"));
   if (node->symbol.lto_file_data)
@@ -804,6 +804,7 @@ privatize_symbol_name (symtab_node node)
 fprintf (cgraph_dump_file,
"Privatizing symbol name: %s -> %s\n",
name, IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
+  return true;
 }
 
 /* Promote variable VNODE to be static.  */
@@ -906,11 +907,12 @@ rename_statics (lto_symtab_encoder_t enc
&& (!encoder
|| lto_symtab_encoder_lookup (encoder, s) != LCC_NOT_FOUND))
   {
-privatize_symbol_name (s);
-   /* Re-start from beggining since we do not know how many symbols changed a name.  */
-   s = symtab_node_for_asm (name);
+if (privatize_symbol_name (s))
+	 /* Re-start from beggining since we do not know how many symbols changed a name.  */
+ s = symtab_node_for_asm (name);
+else s = s->symbol.next_sharing_asm_name;
   }
-   else s = s->symbol.next_sharing_asm_name;
+else s = s->symbol.next_sharing_asm_name;
 }
 
 /* Find out all static decls that need to be promoted to global because


Re: [Patch ARM] Fix PR target/19599

2013-05-17 Thread Ramana Radhakrishnan
On Fri, May 17, 2013 at 9:12 PM, Richard Henderson  wrote:
> On 05/15/2013 04:50 AM, Ramana Radhakrishnan wrote:
>>/* Cannot tail-call to long calls, since these are out of range of
>>   a branch instruction.  */
>> -  if (arm_is_long_call_p (decl))
>> +  if (decl && arm_is_long_call_p (decl))
>>  return false;
>
> You can tail call long calls via indirection now.  I.e. load the address 
> first.

>
>>  (define_insn "*sibcall_insn"
>> - [(call (mem:SI (match_operand:SI 0 "" "X"))
>> + [(call (mem:SI (match_operand:SI 0 "call_insn_operand" "Cs,Ss"))
>
> FWIW, "i" or "s" would work just as well, given you've already constrained by
> call_insn_operand; Ss isn't really needed.

True - don't know what I was thinking there.

>
>> +  if (arm_arch5 || arm_arch4t)
>> + return \" bx\\t%0\\t%@ indirect register sibling call\";
>
> Missing %?.

It's not marked predicable - if anything these %? markers ought to be
removed as well as I don't expect these to be conditionalized ever.

>
>> +  if (arm_arch5 || arm_arch4t)
>> + return \"bx\\t%1\";
>
> Likewise.
>
>> diff --git a/gcc/testsuite/gcc.target/arm/pr40887.c 
>> b/gcc/testsuite/gcc.target/arm/pr40887.c
>> index 0b5e873..5cabe3a 100644
>> --- a/gcc/testsuite/gcc.target/arm/pr40887.c
>> +++ b/gcc/testsuite/gcc.target/arm/pr40887.c
>> @@ -2,9 +2,9 @@
>>  /* { dg-options "-O2 -march=armv5te" }  */
>>  /* { dg-final { scan-assembler "blx" } } */
>>
>> -int (*indirect_func)();
>> +int (*indirect_func)(int x);
>>
>>  int indirect_call()
>>  {
>> -return indirect_func();
>> +  return indirect_func(20) + indirect_func (40);
>>  }
>>
>
> You've made this test case 100% useless.  Of course there will always be a blx
> insn, since you've two calls there, and one of them will never be tail called.
>  It should have been either removed or become your new test case.

If you haven't noticed, it has effectively moved to pr19599.c as my
testcase. This is not 100% useless - this test was to make sure we
actually put out a blx for an indirect call as per PR40887. So it
*actually* continues to test what we want as per PR40887.

Thanks,
Ramana
>
> r~


Re: [GOOGLE] Back port discriminator patches to gcc-4_8

2013-05-17 Thread Cary Coutant
> The warning was attributed to the wrong lineno. Looks like
> discriminator does not work well when it coexists with macros.

I think the problem is with same_line_p. It's using expand_location to
test whether two locations refer to the same line, but expand_location
always unwinds the macro stack so that it's looking at the line number
of the macro expansion point. That means that every token in the macro
expansion will appear to be at the same line, so the loop over the
basic_block will replace all of the locations with the new_locus that
we just allocated for the new discriminator. Thus, it's clobbering the
locus for the instructions at line 27 in the macro definition with the
new locus we create to put a discriminator at line 26, and the warning
that should appear for line 27 now says line 26.
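
A minimal illustration of the effect (a made-up example, not the test case
from the report): if f () and g () sit on different lines of the definition
of M, expand_location still reports the line of the "M (p);" use for every
expanded statement, so same_line_p treats them all as one line and the
discriminator loop can clobber all of their locations.

extern void f (void);
extern void g (void);

#define M(p)      \
  do {            \
    if (p) f ();  \
    g ();         \
  } while (0)

void
h (int p)
{
  M (p);
}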

Other comments on the patch...

In expand_location_1:

+  if (min_discriminator_location != UNKNOWN_LOCATION
+  && loc >= min_discriminator_location
+  && loc < min_discriminator_location + next_discriminator_location)

The last comparison should just be "loc < next_discriminator_location".

Likewise in has_discriminator.

In assign_discriminator:

+  location_t new_locus = location_with_discriminator (locus,
discriminator);
+  gimple_stmt_iterator gsi;
+
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+{
+  gimple stmt = gsi_stmt (gsi);
+  if (same_line_p (locus, gimple_location (stmt)))
+gimple_set_location (stmt, block ?
+  COMBINE_LOCATION_DATA (line_table, new_locus, block) :
+  LOCATION_LOCUS (new_locus));
+}

I'm not convinced the COMBINE_LOCATION_DATA is needed here, since
location_with_discriminator has already done that. And when block ==
NULL, calling LOCATION_LOCUS is effectively a no-op, isn't it? It
seems this could simply be:

  gimple_set_location (stmt, new_locus);

Also, you need to add vec.o to OBJS-libcommon in Makefile.in, since
input.o now depends on it. Other binaries, like gcov, won't build
otherwise.

I'm still not happy with a straightforward port of the old patch,
though, since discriminator assignment might allocate locations that
overlap those used for macro definitions -- there's no checking to see
if we've run out of locations in that case. I guess this could be OK
for the google branch for now, but not for trunk. I'm looking at
alternatives using the adhoc mapping -- do you think it would work if
we extend the adhoc mapping to track both a "data" (i.e., the block
pointer) and a discriminator?
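
For concreteness, a rough sketch of what such an extended ad-hoc entry might
carry (a sketch only; the field names are illustrative, not the actual
line-map.h declaration):

struct location_adhoc_data
{
  location_t locus;        /* the underlying location */
  void *data;              /* today: the BLOCK pointer */
  unsigned discriminator;  /* new: discriminator carried alongside */
};

The ad-hoc map would keep handing back a small index wrapped in a
location_t, as it already does for the block pointer, so the discriminator
itself would not need any bits in the location_t encoding.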

-cary


Re: [GOOGLE] Back port discriminator patches to gcc-4_8

2013-05-17 Thread Dehao Chen
On Fri, May 17, 2013 at 4:35 PM, Cary Coutant  wrote:
>> The warning was attributed to the wrong lineno. Looks like
>> discriminator does not work well when it coexists with macros.
>
> I think the problem is with same_line_p. It's using expand_location to
> test whether two locations refer to the same line, but expand_location
> always unwinds the macro stack so that it's looking at the line number
> of the macro expansion point. That means that every token in the macro
> expansion will appear to be at the same line, so the loop over the
> basic_block will replace all of the locations with the new_locus that
> we just allocated for the new discriminator. Thus, it's clobbering the
> locus for the instructions at line 27 in the macro definition with the
> new locus we create to put a discriminator at line 26, and the warning
> that should appear for line 27 now says line 26.

Sounds like instead of checking the macro-expansion location,
same_line_p should check the expanded macro location instead, so the
problem will be resolved?

>
> Other comments on the patch...
>
> In expand_location_1:
>
> +  if (min_discriminator_location != UNKNOWN_LOCATION
> +  && loc >= min_discriminator_location
> +  && loc < min_discriminator_location + next_discriminator_location)
>
> The last comparison should just be "loc < next_discriminator_location".
>
> Likewise in has_discriminator.

Acked, will update the patch.

>
> In assign_discriminator:
>
> +  location_t new_locus = location_with_discriminator (locus,
> discriminator);
> +  gimple_stmt_iterator gsi;
> +
> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +{
> +  gimple stmt = gsi_stmt (gsi);
> +  if (same_line_p (locus, gimple_location (stmt)))
> +gimple_set_location (stmt, block ?
> +  COMBINE_LOCATION_DATA (line_table, new_locus, block) :
> +  LOCATION_LOCUS (new_locus));
> +}
>
> I'm not convinced the COMBINE_LOCATION_DATA is needed here, since
> location_with_discriminator has already done that. And when block ==
> NULL, calling LOCATION_LOCUS is effectively a no-op, isn't it? It
> seems this could simply be:
>
>   gimple_set_location (stmt, new_locus);

But what if block != NULL?

>
> Also, you need to add vec.o to OBJS-libcommon in Makefile.in, since
> input.o now depends on it. Other binaries, like gcov, won't build
> otherwise.

Acked, will update the patch
>
> I'm still not happy with a straightforward port of the old patch,
> though, since discriminator assignment might allocate locations that
> overlap those used for macro definitions -- there's no checking to see
> if we've run out of locations in that case. I guess this could be OK
> for the google branch for now, but not for trunk. I'm looking at
> alternatives using the adhoc mapping -- do you think it would work if
> we extend the adhoc mapping to track both a "data" (i.e., the block
> pointer) and a discriminator?

Yeah, I think that works. The only concern is that it would increase
memory usage, because each locus_adhoc map entry needs to have another bit
to store the extra info?

Thanks,
Dehao

>
> -cary


Re: Minimize downward code motion during reassociation

2013-05-17 Thread Easwaran Raman
On Thu, Apr 25, 2013 at 4:42 AM, Richard Biener
 wrote:
> On Fri, Dec 7, 2012 at 9:01 PM, Easwaran Raman  wrote:
>> It seems I need to reset the debug uses of a statement before moving
>> the statement itself. The attached patch starts from the leaf to root
>> of the tree to be reassociated and places them at the point where
>> their dependences will be met after reassociation. This bootstraps and
>> I am running the tests. Ok if there are no test failures?
>
> +  if ((dep_bb != insert_bb
> +   && !dominated_by_p (CDI_DOMINATORS, insert_bb, dep_bb))
> +  || (dep_bb == insert_bb
> +  && gimple_uid (insert_stmt) < gimple_uid (dep_stmt)))
> +{
> +  insert_stmt = dep_stmt;
> +}
>
> superfluous {}
Removed.
>
> +  /* If there are any debug uses of LHS, reset them.  */
> +  FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs)
> +{
> +  if (is_gimple_debug (use_stmt))
> +{
> +  gimple_debug_bind_reset_value (use_stmt);
>
> that's only needed for debug stmts that are not dominated by insert_point, no?
You are right. I have added the check.

>
> +  /* Statements marked for throw can not be in the middle of a basic block. 
> So
> + we can not insert a statement (not marked for throw) immediately
> after.  */
> +  else if (stmt_can_throw_internal (insert_point))
> +{
>
> use stmt_ends_bb_p (insert_point) instead
>
> +  edge succ_edge = find_fallthru_edge (insert_bb->succs);
> +  insert_bb = succ_edge->dest;
> +  /* Insert STMT at the beginning of the successor basic block.  */
> +  gsi_insert = gsi_after_labels (insert_bb);
> +  gsi_move_before (&gsistmt, &gsi_insert);
>
> are you sure this is a proper insert location?  How would you even arrive in
> this situation given find_insert_point dominance checks?

Let's say the last statement in a BB which can throw feeds into a
statement that is to be reassociated. Then the insert point would be
this statement. Instead, you could always insert at the beginning of
its (lone) successor. This is independent of the find_insert_point
checks.

>
> Otherwise the patch looks ok - can you add one or two testcases please?
I have added a test case and submitted at r199048.

Thanks,
Easwaran


>
> Thanks,
> Richard.
>
>> Thanks,
>> Easwaran
>>
>> 2012-12-07   Easwaran Raman  
>> * tree-ssa-reassoc.c(find_insert_point): New function.
>> (insert_stmt_after): Likewise.
>> (get_def_stmt): Likewise.
>> (ensure_ops_are_available): Likewise.
>> (rewrite_expr_tree): Do not move statements beyond what is
>> necessary. Remove call to swap_ops_for_binary_stmt...
>> (reassociate_bb): ... and move it here.
>> (build_and_add_sum): Assign UIDs for new statements.
>> (linearize_expr): Likewise.
>> (do_reassoc): Renumber gimple statement UIDs.
>>
>>
>>
>> On Thu, Dec 6, 2012 at 1:10 AM, Richard Biener
>>  wrote:
>>> On Tue, Nov 6, 2012 at 1:54 AM, Easwaran Raman  wrote:
 I am unable to figure out the right way to handle the debug
 statements. What I tried was to find debug statements that use the SSA
 name defined by the statement I moved (using SSA_NAME_IMM_USE_NODE)
 and then moved them as well at the right place. Thus, if I have to
 move t1 = a + b down (after the definition of 'd'), I also moved all
 debug statements that use t1 after the new position of t1. That still
 caused use-before-def problems in ssa_verify. I noticed that the debug
 statements got modified behind the scenes causing these issues. Any
 hints on what is the right way to handle the debug statements would be
 very helpful.
>>>
>>> I think you cannot (and should not) move debug statements.  Instead you
>>> have to invalidate them.  Otherwise you'll introduce confusion as debug
>>> info cannot handle overlapping live ranges.
>>>
>>> But maybe Alex can clarify.
>>>
>>> Richard.


[C++ Patch] PR 10207

2013-05-17 Thread Paolo Carlini

Hi,

in this even older ;) and related issue, we reject empty initializer 
lists in compound-literals. The fix seems simple: just use 
cp_parser_braced_list instead of cp_parser_initializer_list, which 
allows for the special case of empty list. Tested x86_64-linux.


There is a nit which I don't want to hide: cp_parser_initializer_list + 
build_constructor does a little more than cp_parser_braced_list: in 
build_constructor there is a loop setting TREE_SIDE_EFFECTS and 
TREE_CONSTANT to the right value for the CONSTRUCTOR overall, which 
doesn't exist in cp_parser_braced_list. In case it matters - I don't 
think it does, and we have testcases with side effects in the testsuite 
- we can't simply add it to cp_parser_braced_list, because its many 
existing uses are perfectly fine without. A little more code would be 
needed, not a big issue.


Thanks,
Paolo.

///
/cp
2013-05-18  Paolo Carlini  

PR c++/10207
* parser.c (cp_parser_postfix_expression): Use cp_parser_braced_list
instead of cp_parser_initializer_list for compound-literals.

/testsuite
2013-05-18  Paolo Carlini  

PR c++/10207
* g++.dg/ext/complit13.C: New.
Index: cp/parser.c
===
--- cp/parser.c (revision 199043)
+++ cp/parser.c (working copy)
@@ -5719,7 +5719,7 @@ cp_parser_postfix_expression (cp_parser *parser, b
if (cp_parser_allow_gnu_extensions_p (parser)
&& cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN))
  {
-	vec<constructor_elt, va_gc> *initializer_list = NULL;
+   tree initializer = NULL_TREE;
bool saved_in_type_id_in_expr_p;
 
cp_parser_parse_tentatively (parser);
@@ -5732,21 +5732,19 @@ cp_parser_postfix_expression (cp_parser *parser, b
parser->in_type_id_in_expr_p = saved_in_type_id_in_expr_p;
/* Look for the `)'.  */
cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN);
-   /* Look for the `{'.  */
-   cp_parser_require (parser, CPP_OPEN_BRACE, RT_OPEN_BRACE);
/* If things aren't going well, there's no need to
   keep going.  */
if (!cp_parser_error_occurred (parser))
  {
-   bool non_constant_p;
-   /* Parse the initializer-list.  */
-   initializer_list
- = cp_parser_initializer_list (parser, &non_constant_p);
-   /* Allow a trailing `,'.  */
-   if (cp_lexer_next_token_is (parser->lexer, CPP_COMMA))
- cp_lexer_consume_token (parser->lexer);
-   /* Look for the final `}'.  */
-   cp_parser_require (parser, CPP_CLOSE_BRACE, RT_CLOSE_BRACE);
+   if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
+ {
+   bool non_constant_p;
+   /* Parse the brace-enclosed initializer list.  */
+   initializer = cp_parser_braced_list (parser,
+&non_constant_p);
+ }
+   else
+ cp_parser_simulate_error (parser);
  }
/* If that worked, we're definitely looking at a
   compound-literal expression.  */
@@ -5754,7 +5752,8 @@ cp_parser_postfix_expression (cp_parser *parser, b
  {
/* Warn the user that a compound literal is not
   allowed in standard C++.  */
-	    pedwarn (input_location, OPT_Wpedantic, "ISO C++ forbids compound-literals");
+   pedwarn (input_location, OPT_Wpedantic,
+"ISO C++ forbids compound-literals");
/* For simplicity, we disallow compound literals in
   constant-expressions.  We could
   allow compound literals of integer type, whose
@@ -5772,10 +5771,8 @@ cp_parser_postfix_expression (cp_parser *parser, b
  }
/* Form the representation of the compound-literal.  */
postfix_expression
- = (finish_compound_literal
-(type, build_constructor (init_list_type_node,
-  initializer_list),
- tf_warning_or_error));
+ = (finish_compound_literal (type, initializer,
+ tf_warning_or_error));
break;
  }
  }
Index: testsuite/g++.dg/ext/complit13.C
===
--- testsuite/g++.dg/ext/complit13.C(revision 0)
+++ testsuite/g++.dg/ext/complit13.C(working copy)
@@ -0,0 +1,11 @@
+// PR c++/10207
+// { dg-options "" }
+
+typedef struct { } EmptyStruct;
+typedef struct { EmptyStruct Empty; } DemoStruct;
+
+void Func()
+{
+  DemoStruct Demo;
+  Demo.Empty = (EmptyStruct) {};
+}


Re: [Patch][gcc-4_7-branch] Backport trunk revision 187838 into gcc-4_7-branch

2013-05-17 Thread Chung-Ju Wu
Ping: http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00429.html

The patch fixes a dependency issue in libgcc's Makefile.in
by adding 'rm libgcc_tm.stamp' to the clean rule.

That was fixed on main trunk (r187838) but not on gcc-4_7-branch.
Since gcc-4_7-branch is still open, is it OK to backport r187838
into gcc-4_7-branch?


Best regards,
jasonwucj


[Patch] Extend script ./contrib/download_prerequisites usage for isl and cloog

2013-05-17 Thread Chung-Ju Wu
Hi all,

With the current trunk repository, it is now possible to build
the compiler with in-tree isl and cloog.

This patch extends ./contrib/download_prerequisites to download
isl and cloog conditionally, for people who would like to build
gcc with the Graphite loop optimizations.

OK for the trunk?


contrib/ChangeLog

2013-05-18  Chung-Ju Wu  

* download_prerequisites: Download isl and cloog conditionally.



Index: contrib/download_prerequisites
===
--- contrib/download_prerequisites  (revision 199006)
+++ contrib/download_prerequisites  (working copy)
@@ -19,6 +19,12 @@
 # You should have received a copy of the GNU General Public License
 # along with this program. If not, see http://www.gnu.org/licenses/.

+# If you want to build GCC with the Graphite loop optimizations,
+# set GRAPHITE_LOOP_OPT=yes to download the optional prerequisites
+# ISL Library and CLooG.
+GRAPHITE_LOOP_OPT=no
+
+# Necessary to build GCC.
 MPFR=mpfr-2.4.2
 GMP=gmp-4.3.2
 MPC=mpc-0.8.1
@@ -36,3 +42,19 @@
 ln -sf $MPC mpc || exit 1

 rm $MPFR.tar.bz2 $GMP.tar.bz2 $MPC.tar.gz || exit 1
+
+# Necessary to build GCC with the Graphite loop optimizations.
+if [ "$GRAPHITE_LOOP_OPT" == "yes" ] ; then
+  ISL=isl-0.11.1
+  CLOOG=cloog-0.18.0
+
+  wget ftp://gcc.gnu.org/pub/gcc/infrastructure/$ISL.tar.bz2 || exit 1
+  tar xjf $ISL.tar.bz2  || exit 1
+  ln -sf $ISL isl || exit 1
+
+  wget ftp://gcc.gnu.org/pub/gcc/infrastructure/$CLOOG.tar.gz || exit 1
+  tar xzf $CLOOG.tar.gz || exit 1
+  ln -sf $CLOOG cloog || exit 1
+
+  rm $ISL.tar.bz2 $CLOOG.tar.gz || exit 1
+fi
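
As a usage sketch (hypothetical invocation, not part of the patch): since
GRAPHITE_LOOP_OPT is assigned inside the script rather than taken from the
environment, one would flip that assignment and then run the script from
the top-level GCC source directory, e.g.:

  sed -i 's/^GRAPHITE_LOOP_OPT=no$/GRAPHITE_LOOP_OPT=yes/' \
      ./contrib/download_prerequisites
  # Now also fetches and symlinks isl-0.11.1 and cloog-0.18.0.
  ./contrib/download_prerequisites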


Best regards,
jasonwucj


Re: [PATCH] Fix latent bug in RTL GCSE/PRE (PR57159)

2013-05-17 Thread Jeff Law

On 05/07/2013 09:05 AM, Julian Brown wrote:

On Mon, 6 May 2013 11:53:20 -0600

I don't know. My assumption was that a "simple" mem would be one that
the PRE pass could handle -- that clause was intended to handle simple
mems in REG_EQUAL notes the same way as simple mems appearing as the
source of a set. Maybe that is not safe though, and it would be better
to unconditionally invalidate buried mems in REG_EQUAL notes? I was
slightly wary of inhibiting genuine optimisation opportunities.
For a simple mem, we'll do the right thing, at least that's my reading 
of the code.
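
To make "buried mem" concrete, here is a hand-written sketch (simplified
RTL, not taken from the PR) of the kind of REG_EQUAL note at issue - the
note's value is not itself a simple MEM, but has one nested inside it:

  (insn 42 41 43 2 (set (reg:SI 100)
          (plus:SI (reg:SI 101) (const_int 4)))
       (expr_list:REG_EQUAL (plus:SI (mem:SI (reg:SI 102)) (const_int 4))
          (nil)))

With the committed change below, such a note value is passed to
invalidate_any_buried_refs, so the nested (mem:SI ...) is treated
conservatively by load motion instead of being ignored.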


I went ahead and put the two code fragments together and committed the 
change after a bootstrap and regression test on x86_64-unknown-linux-gnu.


For reference, attached is the patch that ultimately went into the tree.

Thanks,
Jeff




commit f473eb72a23bc82db0ee23e51fdd40b20417fb15
Author: law 
Date:   Sat May 18 03:48:18 2013 +

   * gcse.c (compute_ld_motion_mems): If a non-simple MEM is
   found in a REG_EQUAL note, invalidate it.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@199049 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 4536e62..d6eab5f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2013-05-17  Julian Brown  
+
+   * gcse.c (compute_ld_motion_mems): If a non-simple MEM is
+   found in a REG_EQUAL note, invalidate it.
+
 2013-05-17   Easwaran Raman  
 
* tree-ssa-reassoc.c (find_insert_point): New function.
diff --git a/gcc/gcse.c b/gcc/gcse.c
index 07043f7..e485985 100644
--- a/gcc/gcse.c
+++ b/gcc/gcse.c
@@ -3894,6 +3894,8 @@ compute_ld_motion_mems (void)
{
  rtx src = SET_SRC (PATTERN (insn));
  rtx dest = SET_DEST (PATTERN (insn));
+ rtx note = find_reg_equal_equiv_note (insn);
+ rtx src_eq;
 
  /* Check for a simple LOAD...  */
  if (MEM_P (src) && simple_mem (src))
@@ -3910,6 +3912,15 @@ compute_ld_motion_mems (void)
  invalidate_any_buried_refs (src);
}
 
+ if (note != 0 && REG_NOTE_KIND (note) == REG_EQUAL)
+   src_eq = XEXP (note, 0);
+ else
+   src_eq = NULL_RTX;
+
+ if (src_eq != NULL_RTX
+ && !(MEM_P (src_eq) && simple_mem (src_eq)))
+   invalidate_any_buried_refs (src_eq);
+
  /* Check for stores. Don't worry about aliased ones, they
 will block any movement we might do later. We only care
 about this exact pattern since those are the only


Re: GCC does not support *mmintrin.h with function specific opts

2013-05-17 Thread Jakub Jelinek
On Fri, May 17, 2013 at 09:00:21PM -0700, Sriraman Tallam wrote:
> --- ipa-inline.c  (revision 198950)
> +++ ipa-inline.c  (working copy)
> @@ -374,7 +374,33 @@ can_early_inline_edge_p (struct cgraph_edge *e)
>return false;
>  }
>if (!can_inline_edge_p (e, true))
> -return false;
> +{
> +  enum availability avail;
> +  struct cgraph_node *callee
> += cgraph_function_or_thunk_node (e->callee, &avail);
> +  /* Flag an error when the inlining cannot happen because of target 
> option
> +  mismatch but the callee is marked as "always_inline".  In -O0 mode
> +  this will go undetected because the error flagged in
> +  "expand_call_inline" in tree-inline.c might not execute and the
> +  inlining will not happen.  Then, the linker could complain about a
> +  missing body for the callee if it turned out that the callee was
> +  also marked "gnu_inline" with extern inline keyword as bodies of such
> +  functions are not generated.  */ 
> +  if ((!optimize
> +|| flag_no_inline)

This should be if ((!optimize || flag_no_inline) on one line.

I'd prefer also the testcase for the ICEs, something like:

/* Test case to check if AVX intrinsics and function specific target
   optimizations work together.  Check by including x86intrin.h  */

/* { dg-do compile } */
/* { dg-options "-O2 -mno-sse -mno-avx" } */

#include <x86intrin.h>

__m256 a, b, c;
void __attribute__((target ("avx")))
foo (void)
{
  a = _mm256_and_ps (b, c);
}

and another testcase that does:

/* { dg-do compile } */
#pragma GCC target ("mavx") /* { dg-error "whatever" } */

Otherwise it looks good to me, but I'd prefer the i?86 maintainers to review
it too (and Honza for ipa-inline.c?).

Jakub