Re: [PATCH] Add an intermediate coverage format for gcov

2011-10-05 Thread Sharad Singhai
Sorry about the badly formatted mail. Here is another version with a
different mailer. -Sharad


This patch was earlier submitted to google/main, but I propose it
for the trunk as well.

This patch adds an intermediate coverage format (enabled via 'gcov
-i'). This is a compact format as it does not require source files.

The new option ('gcov -i') outputs .gcov files in an intermediate text
format that can be postprocessed by lcov or other applications. It
will output a single *.gcov file per *.gcda file. No source code is
required.

The format of the intermediate .gcov file is plain text with one entry
per line

SF:source_file_name
FN:line_number,function_name
FNDA:execution_count,function_name
BA:line_num,branch_coverage_type
DA:line number,execution_count

Where the branch_coverage_type is
   0 (Branch not executed)
   1 (Branch executed, but not taken)
   2 (Branch executed and taken)

There can be multiple SF entries in an intermediate gcov file. All
entries following SF pertain to that source file until the next SF
entry.

A concrete example looks like this:

  SF:array.c
  FN:4,sum
  FNDA:0,sum
  FN:13,main
  FNDA:1,main
  DA:4,2
  BA:8,2
  DA:7,1

I have bootstrapped and tested this patch on x86_64-linux. No new test
failures were observed. 

Okay for trunk?

Thanks,
Sharad


2011-10-04   Sharad Singhai  

* doc/gcov.texi: Document gcov intermediate format.
* gcov.c (print_usage): Handle new option.
(process_args): Handle new option.
(get_gcov_file_intermediate_name): New function.
(output_intermediate_file): New function.
(generate_results): Handle new option.
* testsuite/lib/gcov.exp: Handle intermediate format.
* testsuite/g++.dg/gcov/gcov-8.C: New testcase.

Index: doc/gcov.texi
===
--- doc/gcov.texi   (revision 179475)
+++ doc/gcov.texi   (working copy)
@@ -130,6 +130,7 @@ gcov [@option{-v}|@option{--version}] [@
  [@option{-f}|@option{--function-summaries}]
  [@option{-o}|@option{--object-directory} @var{directory|file}] 
@var{sourcefiles}
  [@option{-u}|@option{--unconditional-branches}]
+ [@option{-i}|@option{--intermediate-format}]
  [@option{-d}|@option{--display-progress}]
 @c man end
 @c man begin SEEALSO
@@ -216,6 +217,32 @@ Unconditional branches are normally not 
 @itemx --display-progress
 Display the progress on the standard output.
 
+@item -i
+@itemx --intermediate-format
+Output gcov file in an intermediate text format that can be used by
+@command{lcov} or other applications. It will output a single *.gcov file per
+*.gcda file. No source code is required.
+
+The format of the intermediate @file{.gcov} file is plain text with
+one entry per line
+
+@smallexample
+SF:@var{source_file_name}
+FN:@var{line_number},@var{function_name}
+FNDA:@var{execution_count},@var{function_name}
+BA:@var{line_num},@var{branch_coverage_type}
+DA:@var{line number},@var{execution_count}
+
+Where the @var{branch_coverage_type} is
+   0 (Branch not executed)
+   1 (Branch executed, but not taken)
+   2 (Branch executed and taken)
+
+There can be multiple SF entries in an intermediate gcov file. All
+entries following SF pertain to that source file until the next SF
+entry.
+@end smallexample
+
 @end table
 
 @command{gcov} should be run with the current directory the same as that
Index: gcov.c
===
--- gcov.c  (revision 179475)
+++ gcov.c  (working copy)
@@ -39,6 +39,7 @@ along with Gcov; see the file COPYING3. 
 #include "intl.h"
 #include "diagnostic.h"
 #include "version.h"
+#include "demangle.h"
 
 #include 
 
@@ -311,6 +312,9 @@ static int flag_gcov_file = 1;
 
 static int flag_display_progress = 0;
 
+/* Output *.gcov file in intermediate format used by 'lcov'.  */
+static int flag_intermediate_format = 0;
+
 /* For included files, make the gcov output file name include the name
of the input source file.  For example, if x.h is included in a.c,
then the output file name is a.c##x.h.gcov instead of x.h.gcov.  */
@@ -436,6 +440,11 @@ print_usage (int error_p)
   fnotice (file, "  -o, --object-directory DIR|FILE Search for object files in 
DIR or called FILE\n");
   fnotice (file, "  -p, --preserve-pathsPreserve all pathname 
components\n");
   fnotice (file, "  -u, --unconditional-branchesShow unconditional branch 
counts too\n");
+  fnotice (file, "  -i, --intermediate-format   Output .gcov file in an 
intermediate text\n\
+format that can be used by 'lcov' or 
other\n\
+applications.  It will output a single\n\
+.gcov file per .gcda file.  No source 
file\n\
+is required.\n");
   fnotice (file, "  -d, --display-progress  Display progress 
information\n");
 

[PATCH, i386]: Introduce ix86_emit_binop

2011-10-05 Thread Uros Bizjak
Hello!

No functional change.

2011-10-05  Uros Bizjak  

* config/i386/i386.c (ix86_emit_binop): New static function.
(ix86_split_lea_for_addr): Use ix86_emit_binop to emit add and shl
instructions.
(x86_output_mi_thunk): Use ix86_emit_binop to emit add instructions.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32},
committed to mainline SVN.

Uros.
Index: i386.c
===
--- i386.c  (revision 179535)
+++ i386.c  (working copy)
@@ -16470,6 +16470,21 @@ ix86_avoid_lea_for_addr (rtx insn, rtx operands[])
   return !ix86_lea_outperforms (insn, regno0, regno1, regno2, split_cost);
 }
 
+/* Emit x86 binary operand CODE in mode MODE, where the first operand
+   matches destination.  RTX includes clobber of FLAGS_REG.  */
+
+static void
+ix86_emit_binop (enum rtx_code code, enum machine_mode mode,
+rtx dst, rtx src)
+{
+  rtx op, clob;
+
+  op = gen_rtx_SET (VOIDmode, dst, gen_rtx_fmt_ee (code, mode, dst, src));
+  clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG));
+  
+  emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, op, clob)));
+}
+
 /* Split lea instructions into a sequence of instructions
which are executed on ALU to avoid AGU stalls.
It is assumed that it is allowed to clobber flags register
@@ -16482,8 +16497,7 @@ ix86_split_lea_for_addr (rtx operands[], enum mach
   unsigned int regno1 = INVALID_REGNUM;
   unsigned int regno2 = INVALID_REGNUM;
   struct ix86_address parts;
-  rtx tmp, clob;
-  rtvec par;
+  rtx tmp;
   int ok, adds;
 
   ok = ix86_decompose_address (operands[1], &parts);
@@ -16515,14 +16529,7 @@ ix86_split_lea_for_addr (rtx operands[], enum mach
  gcc_assert (regno2 != regno0);
 
  for (adds = parts.scale; adds > 0; adds--)
-   {
- tmp = gen_rtx_PLUS (mode, operands[0], parts.index);
- tmp = gen_rtx_SET (VOIDmode, operands[0], tmp);
- clob = gen_rtx_CLOBBER (VOIDmode,
- gen_rtx_REG (CCmode, FLAGS_REG));
- par = gen_rtvec (2, tmp, clob);
- emit_insn (gen_rtx_PARALLEL (VOIDmode, par));
-   }
+   ix86_emit_binop (PLUS, mode, operands[0], parts.index);
}
   else
{
@@ -16531,30 +16538,14 @@ ix86_split_lea_for_addr (rtx operands[], enum mach
emit_insn (gen_rtx_SET (VOIDmode, operands[0], parts.index));
 
  /* Use shift for scaling.  */
- tmp = gen_rtx_ASHIFT (mode, operands[0],
-   GEN_INT (exact_log2 (parts.scale)));
- tmp = gen_rtx_SET (VOIDmode, operands[0], tmp);
- clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG));
- par = gen_rtvec (2, tmp, clob);
- emit_insn (gen_rtx_PARALLEL (VOIDmode, par));
+ ix86_emit_binop (ASHIFT, mode, operands[0],
+  GEN_INT (exact_log2 (parts.scale)));
 
  if (parts.base)
-   {
- tmp = gen_rtx_PLUS (mode, operands[0], parts.base);
- tmp = gen_rtx_SET (VOIDmode, operands[0], tmp);
- clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, 
FLAGS_REG));
- par = gen_rtvec (2, tmp, clob);
- emit_insn (gen_rtx_PARALLEL (VOIDmode, par));
-   }
+   ix86_emit_binop (PLUS, mode, operands[0], parts.base);
 
  if (parts.disp && parts.disp != const0_rtx)
-   {
- tmp = gen_rtx_PLUS (mode, operands[0], parts.disp);
- tmp = gen_rtx_SET (VOIDmode, operands[0], tmp);
- clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, 
FLAGS_REG));
- par = gen_rtvec (2, tmp, clob);
- emit_insn (gen_rtx_PARALLEL (VOIDmode, par));
-   }
+   ix86_emit_binop (PLUS, mode, operands[0], parts.disp);
}
 }
   else if (!parts.base && !parts.index)
@@ -16565,41 +16556,32 @@ ix86_split_lea_for_addr (rtx operands[], enum mach
   else
 {
   if (!parts.base)
-  {
-if (regno0 != regno2)
- emit_insn (gen_rtx_SET (VOIDmode, operands[0], parts.index));
-  }
+   {
+ if (regno0 != regno2)
+   emit_insn (gen_rtx_SET (VOIDmode, operands[0], parts.index));
+   }
   else if (!parts.index)
-  {
-if (regno0 != regno1)
-  emit_insn (gen_rtx_SET (VOIDmode, operands[0], parts.base));
-  }
-  else
-  {
-   if (regno0 == regno1)
- tmp = gen_rtx_PLUS (mode, operands[0], parts.index);
-   else if (regno0 == regno2)
- tmp = gen_rtx_PLUS (mode, operands[0], parts.base);
-   else
- {
+   {
+ if (regno0 != regno1)
emit_insn (gen_rtx_SET (VOIDmode, operands[0], parts.base));
-   tmp = gen_rtx_PLUS (mode, operands[0], parts.index);
- }
+   }
+  else
+   {
+ if (regno0 == regno1)
+   tmp = parts.inde

Re: [PATCH] Handle side-effects of EH optimizations of callees

2011-10-05 Thread Richard Guenther
On Tue, Oct 4, 2011 at 9:02 PM, Maxim Kuvyrkov  wrote:
> On 5/10/2011, at 1:49 AM, Richard Guenther wrote:
>
>> On Tue, Oct 4, 2011 at 9:17 AM, Maxim Kuvyrkov  
>> wrote:
>>> Richard,
>>>
>>> The following patch fixes a CFG consistency problem.
>>>
>>> When early IPA optimizations (e.g., early SRA) create a version of a 
>>> function that no longer throws, versioning machinery deletes EH landings 
>>> pads in _callers_ of the function [*].  However, the EH cfg edges are not 
>>> updated in the callers, which eventually leads to an ICE in verify_eh_edges.
>>
>> It needs to update EH edges in the callers via the common idiom
>>
>>  if (maybe_clean_or_replace_eh_stmt (stmt, stmt)
>>      && gimple_purge_dead_eh_edges (gimple_bb (stmt)))
>>    ... arrange for cfg-cleanup to be run ...
>>
>> where the cfg-cleanup part probably explains why it doesn't.  So one option
>> is to not delete the landing pads.
>
> There are several places in the compiler where code does not the above idiom 
> and calls maybe_clean[_or_replace]_eh_stmt without following it up with 
> cfg-cleanup.
>
>>  Where does it do that?
>
> The cases that I investigated are:
> - cgraphunit.c: update_call_expr().  The right thing to do /seems/ to be to 
> remove call to maybe_clean_eh_stmt_fn().

Ok, that simply replaces the fndecl of a call statement, so yes, removing
the maybe_clean_eh_stmt_fn call should fix this case.  But then you might
run into IL verification issues that you have a stmt with EH info that cannot
throw ...

Basically these kind of local IPA passes are bad :/

> - ipa-prop.c: ipa_modify_call_arguments() -> gsi_replace().  The right thing 
> to do /seems/ to be to add follow up call to gimple_purge_dead_eh_edges().

I'd say call gsi_replace with , false and copy EH info manually and
unconditionally here.

> Both cases above are triggered from early SRA.

But maybe also used by other real IPA passes?

> The other places that appear not to update CFG after calling 
> maybe_clean_or_replace_eh_stmt are:
> - tree-cfg.c: replace_uses_by()
> - tree-ssa-dce.c: eliminate_unnecessary_stmts().
>
>>  I suppose it
>> merely fails to properly transfer the EH info from the old to the new call
>> (thus, fails to forcefully do what maybe_duplicate_eh_stmt does).
>
> I don't follow you here.  The EH CFG edges that get out-of-date are in 
> _caller_ function.  It is the _callee_ function that is duplicated.

Yes, what I mean is if a stmt is replaced in the caller then it will not
copy EH info if the new call cannot throw.  But as we are not updating
the CFG we have to preserve EH info nevertheless (and allow "dead"
EH info in the verifiers).

The alternative is to, for example in update_call_expr (), remove dead
EH edges, switch cfun to the caller context, call cfg_cleanup, switch
back.  Expensive (but maybe doesn't happen often enough to worry?).

I wonder why we end up, at IPA SRA time, knowing the cloned function
won't throw anymore.  I suppose we don't.  Do we?

Thanks,
Richard.

> Thank you,
>
> --
> Maxim Kuvyrkov
> CodeSourcery / Mentor Graphics
>
>


[PATCH] Use widening_optab_handler when expanding highpart mults

2011-10-05 Thread Andreas Krebbel
Hi,

the optab_handler uses in expand_mult_highpart_optab haven't been
replaced with the widening_optab_handler yet.

Fixed with attached patch.

Tested on s390x and x86_64.

Bye,

-Andreas-


2011-10-05  Andreas Krebbel  

* expmed.c (expand_mult_highpart_optab): Replace optab_handler
with the new widening_optab_handler.


Index: gcc/expmed.c
===
*** gcc/expmed.c.orig
--- gcc/expmed.c
*** expand_mult_highpart_optab (enum machine
*** 3467,3473 
  
/* Try widening multiplication.  */
moptab = unsignedp ? umul_widen_optab : smul_widen_optab;
!   if (optab_handler (moptab, wider_mode) != CODE_FOR_nothing
&& mul_widen_cost[speed][wider_mode] < max_cost)
  {
tem = expand_binop (wider_mode, moptab, op0, narrow_op1, 0,
--- 3467,3473 
  
/* Try widening multiplication.  */
moptab = unsignedp ? umul_widen_optab : smul_widen_optab;
!   if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing
&& mul_widen_cost[speed][wider_mode] < max_cost)
  {
tem = expand_binop (wider_mode, moptab, op0, narrow_op1, 0,
*** expand_mult_highpart_optab (enum machine
*** 3504,3510 
  
/* Try widening multiplication of opposite signedness, and adjust.  */
moptab = unsignedp ? smul_widen_optab : umul_widen_optab;
!   if (optab_handler (moptab, wider_mode) != CODE_FOR_nothing
&& size - 1 < BITS_PER_WORD
&& (mul_widen_cost[speed][wider_mode] + 2 * 
shift_cost[speed][mode][size-1]
  + 4 * add_cost[speed][mode] < max_cost))
--- 3504,3510 
  
/* Try widening multiplication of opposite signedness, and adjust.  */
moptab = unsignedp ? smul_widen_optab : umul_widen_optab;
!   if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing
&& size - 1 < BITS_PER_WORD
&& (mul_widen_cost[speed][wider_mode] + 2 * 
shift_cost[speed][mode][size-1]
  + 4 * add_cost[speed][mode] < max_cost))


Re: New warning for expanded vector operations

2011-10-05 Thread Richard Guenther
On Wed, Oct 5, 2011 at 12:18 AM, Artem Shinkarov
 wrote:
> Hi
>
> Here is a patch to inform a programmer about the expanded vector operation.
> Bootstrapped on x86-unknown-linux-gnu.
>
> ChangeLog:
>
>        * gcc/tree-vect-generic.c (expand_vector_piecewise): Adjust to
>          produce the warning.
>          (expand_vector_parallel): Adjust to produce the warning.

Entries start without gcc/, they are relative to the gcc/ChangeLog file.

>          (lower_vec_shuffle): Adjust to produce the warning.
>        * gcc/common.opt: New warning Wvector-operation-expanded.
>        * gcc/doc/invoke.texi: Document the wawning.
>
>
> Ok?

I don't like the name -Wvector-operation-expanded.  We emit a
similar warning for missed inline expansions with -Winline, so
maybe -Wvector-extensions (that's the name that appears
in the C extension documentation).

+  location_t loc = gimple_location (gsi_stmt (*gsi));
+
+  warning_at (loc, OPT_Wvector_operation_expanded,
+ "vector operation will be expanded piecewise");

   v = VEC_alloc(constructor_elt, gc, (nunits + delta - 1) / delta);
   for (i = 0; i < nunits;
@@ -260,6 +264,10 @@ expand_vector_parallel (gimple_stmt_iter
   tree result, compute_type;
   enum machine_mode mode;
   int n_words = tree_low_cst (TYPE_SIZE_UNIT (type), 1) / UNITS_PER_WORD;
+  location_t loc = gimple_location (gsi_stmt (*gsi));
+
+  warning_at (loc, OPT_Wvector_operation_expanded,
+ "vector operation will be expanded in parallel");

what's the difference between 'piecewise' and 'in parallel'?

@@ -301,16 +309,15 @@ expand_vector_addition (gimple_stmt_iter
 {
   int parts_per_word = UNITS_PER_WORD
   / tree_low_cst (TYPE_SIZE_UNIT (TREE_TYPE (type)), 1);
+  location_t loc = gimple_location (gsi_stmt (*gsi));

   if (INTEGRAL_TYPE_P (TREE_TYPE (type))
   && parts_per_word >= 4
   && TYPE_VECTOR_SUBPARTS (type) >= 4)
-return expand_vector_parallel (gsi, f_parallel,
-  type, a, b, code);
+return expand_vector_parallel (gsi, f_parallel, type, a, b, code);
   else
-return expand_vector_piecewise (gsi, f,
-   type, TREE_TYPE (type),
-   a, b, code);
+return expand_vector_piecewise (gsi, f, type,
+   TREE_TYPE (type), a, b, code);
 }

 /* Check if vector VEC consists of all the equal elements and

unless i miss something loc is unused here.  Please avoid random
whitespace changes (just review your patch yourself before posting
and revert pieces that do nothing).

+@item -Wvector-operation-expanded
+@opindex Wvector-operation-expanded
+@opindex Wno-vector-operation-expanded
+Warn if vector operation is not implemented via SIMD capabilities of the
+architecture. Mainly useful for the performance tuning.

I'd mention that this is for vector operations as of the C extension
documented in "Vector Extensions".

The vectorizer can produce some operations that will need further
lowering - we probably should make sure to _not_ warn about those.
Try running the vect.exp testsuite with the new warning turned on
(eventually disabling SSE), like with

obj/gcc> make check-gcc
RUNTESTFLAGS="--target_board=unix/-Wvector-extensions/-mno-sse
vect.exp"

> P.S. It is hard to write a reasonable testcase for the patch, because
> one needs to guess which architecture would expand a given vector
> operation. But the patch is trivial.

You can create an aritificial large vector type for example, or put a
testcase under gcc.target/i386 and disable SSE.  We should have
a testcase for this.

Thanks,
Richard.


Re: Avoid optimized out references to appear in lto symbol table

2011-10-05 Thread Jan Hubicka
> Jan Hubicka  writes:
> 
> > Hi,
> > GNU LD has bug about reporting references to hidden symbol sthat has been 
> > optimized out.
> > This however made me notice that we do output into LTO symbol tables things 
> > that do
> > not belong there. In partiuclar we often output extern inline functions 
> > that has been
> > fully inlined by early opts.
> 
> 
> > This patch solves the problem by running cgraph unreachable function 
> > rmeoval before
> > IPA (currently it is done only before early opts, not after).
> 
> Will this solve the case of
> 
>  if (expression known at compile time)  (but not constant expression)
> this_is_a_static_assert_that_failed()
> 
> ?

Only if the expression is known at compile time early enough (i.e. known to the 
early optimizers).
I would say that most of such expressions should be. I would be interested in 
seeing testcases
where we still have problems.

Honza

> I had a lot of problems with that and had to add various dummy stubs
> just work around this problem.
> 
> -Andi
> -- 
> a...@linux.intel.com -- Speaking for myself only


Re: [PATCH] Use widening_optab_handler when expanding highpart mults

2011-10-05 Thread Andrew Stubbs

On Wed 05 Oct 2011 09:33:09 BST, Andreas Krebbel wrote:

Hi,

the optab_handler uses in expand_mult_highpart_optab haven't been
replaced with the widening_optab_handler yet.


Apologies, I don't know how I missed that one? :(

Andrew


Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.

2011-10-05 Thread Richard Guenther
On Tue, Oct 4, 2011 at 10:58 PM, Richard Henderson  wrote:
> On 10/04/2011 01:17 PM, Tom de Vries wrote:
>> In general, to fold vlas (which are lowered to allocas) to normal 
>> declarations,
>> if the alloca argument is constant.
>
> Ah.  Ok, I suppose.  How often are you seeing this happening?  I can imagine
> a few instances via inlining, but even there not so much...
>
>> Any guidance on what to do if we have to expand the 
>> __builtin_alloca_with_align
>> to a function call, specifically, with the second argument? This is currently
>> handled like this, which is not very nice:
>
> Don't do anything special.  Just let it fall through as with alloca.  This 
> should
> never happen, except for stupid user tricks like 
> -fno-builtin-alloca_with_align.
> And if the user does that, they get what they deserve wrt needing to implement
> that function in their runtime.

Yes, especially as we only use builtin_alloca_with_align for VLAs.  Setting
the assembler name to alloca might be nice to users at least, the
excess argument shouldn't be any problem (well, hopefully targets don't
have a special ABI for alloca ...)

Richard.

>
> r~
>


Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.

2011-10-05 Thread Richard Guenther
On Tue, Oct 4, 2011 at 6:28 PM, Tom de Vries  wrote:
> On 10/04/2011 03:03 PM, Richard Guenther wrote:
>> On Tue, Oct 4, 2011 at 9:43 AM, Tom de Vries  wrote:
>>> On 10/01/2011 05:46 PM, Tom de Vries wrote:
 On 09/30/2011 03:29 PM, Richard Guenther wrote:
> On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries  
> wrote:
>> On 09/28/2011 11:53 AM, Richard Guenther wrote:
>>> On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries  
>>> wrote:
 Richard,

 I got a patch for PR50527.

 The patch prevents the alignment of vla-related allocas to be set to
 BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after 
 folding
 the alloca.

 Bootstrapped and regtested on x86_64.

 OK for trunk?
>>>
>>> Hmm.  As gfortran with -fstack-arrays uses VLAs it's probably bad that
>>> the vectorizer then will no longer see that the arrays are properly 
>>> aligned.
>>>
>>> I'm not sure what the best thing to do is here, other than trying to 
>>> record
>>> the alignment requirement of the VLA somewhere.
>>>
>>> Forcing the alignment of the alloca replacement decl to 
>>> BIGGEST_ALIGNMENT
>>> has the issue that it will force stack-realignment which isn't free 
>>> (and the
>>> point was to make the decl cheaper than the alloca).  But that might
>>> possibly be the better choice.
>>>
>>> Any other thoughts?
>>
>> How about the approach in this (untested) patch? Using the DECL_ALIGN of 
>> the vla
>> for the new array prevents stack realignment for folded vla-allocas, 
>> also for
>> large vlas.
>>
>> This will not help in vectorizing large folded vla-allocas, but I think 
>> it's not
>> reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although that 
>> has
>> been the case up until we started to fold). If you want to trigger 
>> vectorization
>> for a vla, you can still use the aligned attribute on the declaration.
>>
>> Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, also 
>> without using
>> an attribute on the decl. This patch exploits this by setting it at the 
>> end of
>> the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective in
>> propagation though, because although the ptr_info of the lhs is 
>> propagated via
>> copy_prop afterwards, it's not propagated anymore via ccp.
>>
>> Another way to do this would be to set BIGGEST_ALIGNMENT at the end of 
>> ccp2 and
>> not fold during ccp3.
>
> Ugh, somehow I like this the least ;)
>
> How about lowering VLAs to
>
>   p = __builtin_alloca (...);
>   p = __builtin_assume_aligned (p, DECL_ALIGN (vla));
>
> and not assume anything for alloca itself if it feeds a
> __builtin_assume_aligned?
>
> Or rather introduce a __builtin_alloca_with_align () and for VLAs do
>
>  p = __builtin_alloca_with_align (..., DECL_ALIGN (vla));
>
> that's less awkward to use?
>
> Sorry for not having a clear plan here ;)
>

 Using assume_aligned is a more orthogonal way to represent this in gimple, 
 but
 indeed harder to use.

 Another possibility is to add a 'tree vla_decl' field to struct
 gimple_statement_call, which is probably the easiest to implement.

 But I think __builtin_alloca_with_align might have a use beyond vlas, so I
 decided to try this one. Attached patch implements my first stab at this  
 (now
 testing on x86_64).

 Is this an acceptable approach?

>>>
>>> bootstrapped and reg-tested (including ada) on x86_64.
>>>
>>> Ok for trunk?
>>
>> The idea is ok I think.  But
>>
>>      case BUILT_IN_ALLOCA:
>> +    case BUILT_IN_ALLOCA_WITH_ALIGN:
>>        /* If the allocation stems from the declaration of a variable-sized
>>          object, it cannot accumulate.  */
>>        target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp));
>>        if (target)
>>         return target;
>> +      if (DECL_FUNCTION_CODE (get_callee_fndecl (exp))
>> +         == BUILT_IN_ALLOCA_WITH_ALIGN)
>> +       {
>> +         tree new_call = build_call_expr_loc (EXPR_LOCATION (exp),
>> +                                              
>> built_in_decls[BUILT_IN_ALLOCA],
>> +                                              1, CALL_EXPR_ARG (exp, 0));
>> +         CALL_ALLOCA_FOR_VAR_P (new_call) = CALL_ALLOCA_FOR_VAR_P (exp);
>> +         exp = new_call;
>> +       }
>>
>> Ick.  Why can't the rest of the compiler not handle
>> BUILT_IN_ALLOCA_WITH_ALIGN the same as BUILT_IN_ALLOCA?
>> (thus, arrange things so the assembler name of alloca-with-align is alloca?)
>>
>
> We can set the assembler name in the local_define_builtin call. But that will
> still create a call alloca (12, 4). How do we deal with the second argument?
>
>> I don't see why you 

Re: [PATCH 2/2] allow certain kinds of inputs to top level asm()-s

2011-10-05 Thread Jan Hubicka



we have, like specifying the set of symbols _defined_ by a toplevel
asm, right?  I might misremember but sth like

extern void foo (void);
asm(""  "foo");

was supposed to do the trick.  Or should we treat those as outputs
(given you use inputs for symbol uses)?


I don't recall any discussion of how to deal with symbols defined by a
top level asm - I was just asked to follow the "normal" asm syntax in
having two colons in the middle instead of one as I had originally (not
expecting any use for outputs here).


Honza, do you remember if we decided on anything here?


To be honest, I don't think we actually dicussed some particular  
syntax or my memory is even worse ;). We will need to make one.
In the above example, I don't think quotes about foo is needed. We  
actualy provide definition of the declaration, so it should be just  
the decl itself, not a string.


Honza


Thanks,
Richard.









Re: [PATCH] Handle side-effects of EH optimizations of callees

2011-10-05 Thread Jan Hubicka
> On Tue, Oct 4, 2011 at 9:02 PM, Maxim Kuvyrkov  wrote:
> > On 5/10/2011, at 1:49 AM, Richard Guenther wrote:
> >
> >> On Tue, Oct 4, 2011 at 9:17 AM, Maxim Kuvyrkov  
> >> wrote:
> >>> Richard,
> >>>
> >>> The following patch fixes a CFG consistency problem.
> >>>
> >>> When early IPA optimizations (e.g., early SRA) create a version of a 
> >>> function that no longer throws, versioning machinery deletes EH landings 
> >>> pads in _callers_ of the function [*].  However, the EH cfg edges are not 
> >>> updated in the callers, which eventually leads to an ICE in 
> >>> verify_eh_edges.
> >>
> >> It needs to update EH edges in the callers via the common idiom
> >>
> >>  if (maybe_clean_or_replace_eh_stmt (stmt, stmt)
> >>      && gimple_purge_dead_eh_edges (gimple_bb (stmt)))
> >>    ... arrange for cfg-cleanup to be run ...
> >>
> >> where the cfg-cleanup part probably explains why it doesn't.  So one option
> >> is to not delete the landing pads.
> >
> > There are several places in the compiler where code does not the above 
> > idiom and calls maybe_clean[_or_replace]_eh_stmt without following it up 
> > with cfg-cleanup.
> >
> >>  Where does it do that?
> >
> > The cases that I investigated are:
> > - cgraphunit.c: update_call_expr().  The right thing to do /seems/ to be to 
> > remove call to maybe_clean_eh_stmt_fn().
> 
> Ok, that simply replaces the fndecl of a call statement, so yes, removing
> the maybe_clean_eh_stmt_fn call should fix this case.  But then you might
> run into IL verification issues that you have a stmt with EH info that cannot
> throw ...
> 
> Basically these kind of local IPA passes are bad :/

This happens at materialization. Basically we should 1) materialize, 2) 
redirect callers and render some blocks unreachable, 3) inline, 4) cleanup cfg 
and get rid of them before updating SSA form.
So it seems safe to me.
Honza
> 
> > - ipa-prop.c: ipa_modify_call_arguments() -> gsi_replace().  The right 
> > thing to do /seems/ to be to add follow up call to 
> > gimple_purge_dead_eh_edges().
> 
> I'd say call gsi_replace with , false and copy EH info manually and
> unconditionally here.
> 
> > Both cases above are triggered from early SRA.
> 
> But maybe also used by other real IPA passes?
> 
> > The other places that appear not to update CFG after calling 
> > maybe_clean_or_replace_eh_stmt are:
> > - tree-cfg.c: replace_uses_by()
> > - tree-ssa-dce.c: eliminate_unnecessary_stmts().
> >
> >>  I suppose it
> >> merely fails to properly transfer the EH info from the old to the new call
> >> (thus, fails to forcefully do what maybe_duplicate_eh_stmt does).
> >
> > I don't follow you here.  The EH CFG edges that get out-of-date are in 
> > _caller_ function.  It is the _callee_ function that is duplicated.
> 
> Yes, what I mean is if a stmt is replaced in the caller then it will not
> copy EH info if the new call cannot throw.  But as we are not updating
> the CFG we have to preserve EH info nevertheless (and allow "dead"
> EH info in the verifiers).
> 
> The alternative is to, for example in update_call_expr (), remove dead
> EH edges, switch cfun to the caller context, call cfg_cleanup, switch
> back.  Expensive (but maybe doesn't happen often enough to worry?).
> 
> I wonder why we end up, at IPA SRA time, knowing the cloned function
> won't throw anymore.  I suppose we don't.  Do we?
> 
> Thanks,
> Richard.
> 
> > Thank you,
> >
> > --
> > Maxim Kuvyrkov
> > CodeSourcery / Mentor Graphics
> >
> >


Re: Ping #1: [Patch,AVR]: Clean-up some SP insns

2011-10-05 Thread Georg-Johann Lay

Ping #1: http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01690.html

Georg-Johann Lay wrote:

> This is just a code clean-up.
> 
> The bulky code from *addhi3_sp_R_pc2 and *addhi3_sp_R_pc3 is done by a small C
> function that does the same (except that it prints some comment depending on
> -dp or -fverbose-asm).
> 
> *movhi_sp is an insn that should not be there and go away because it is a move
> insn and there should be at most one move insn per mode (HImode in this case).
> 
> stack_register_operand, i.e. (REG:HI 32), is in register class STACK_REG, thus
> not in NO_REGS, thus element of register_operand, general_regs and
> nonimmediate_operand which are the predicates/condition of *movhi.  *movhi
> already knows to handle "q,r" and "r,q" moves, same applies to the output
> function output_movhi.
> 
> The patch passes the test suite, of course.
> 
> Ok?
> 
> Moreover, I'd like to remove constraint "R" which is just used in one insn now
> and replace it by 3-letter constraint C.. so that prefix R is free.
> 
> R is of absolutely no use in inline assembly and the constraint can be
> renamed/removed from documentation, IMO.
> 
> Johann
> 
>   * config/avr/avr-protos.h (avr_out_addto_sp): New prototype.
>   * config/avr/avr.c (avr_out_addto_sp): New function.
>   (adjust_insn_length): Handle ADJUST_LEN_ADDTO_SP.
>   * config/avr/avr.md (adjust_len): Add "addto_sp".
>   (*movhi_sp): Remove insn.
>   (*addhi3_sp_R_pc2, *addhi3_sp_R_pc3): Merge to *addhi3_sp_R.



Re: [wwwdocs] IA-32/x86-64 Changes for upcoming 4.7.0 series

2011-10-05 Thread Kirill Yukhin
Thank you guys for your support!

K


Commit: RX: Codegen bug fixes

2011-10-05 Thread Nick Clifton
Hi Guys,

  I am applying the patch below to fix a couple of bugs in the RX's
  machine description patterns.  The first concerns the tablejump
  pattern, which needs to include a label to be referenced by the
  ASM_OUTPUT_ADDR_DIFF_ELT macro.

  The second problem is the ADDDI3 spiltter which was not marking its
  first output operand (operand 0) as early clobbered.  This meant that
  it could be used as one of the inputs to the second part of the
  pattern (operands 4,5), causing chaos.

  The final fix was pointed out by Richard Henderson.  The recently
  added support for narrow mode min and max instructions did not work
  for the SMAX insn, as the RX does not have narrow mode versions of
  this insn.

Cheers
  Nick

gcc/ChangeLog
2011-10-05  Nick Clifton  

* config/rx/rx.md (tablejump): Add missing label.
(adddi3_internal): Mark operand 0 as early-clobbered.
(smaxsi3): Revert previous delta.
(adc_internal): Fix whitespace in generated asm.
(adc_flags): Likewise.

Index: gcc/config/rx/rx.md
===
--- gcc/config/rx/rx.md (revision 179540)
+++ gcc/config/rx/rx.md (working copy)
@@ -332,7 +332,7 @@
   ""
   { return flag_pic ? (TARGET_AS100_SYNTAX ? "\n?:\tbra\t%0"
   : "\n1:\tbra\t%0")
-  : "jmp\t%0";
+  : "\n1:jmp\t%0";
   }
   [(set_attr "timings" "33")
(set_attr "length" "2")]
@@ -901,7 +901,7 @@
  (match_operand:SI   2 "rx_source_operand" 
"r,Sint08,Sint16,Sint24,i,Q")))
 (clobber (reg:CC CC_REG))]
   "reload_completed"
-  "adc %2,%0"
+  "adc\t%2, %0"
   [(set_attr "timings" "11,11,11,11,11,33")
(set_attr "length"   "3,4,5,6,7,6")]
 )
@@ -922,7 +922,7 @@
(match_dup 2))
  (const_int 0)))]
   "reload_completed && rx_match_ccmode (insn, CC_ZSCmode)"
-  "adc %2,%0"
+  "adc\t%2, %0"
   [(set_attr "timings" "11,11,11,11,11,33")
(set_attr "length"   "3,4,5,6,7,6")]
 )
@@ -980,7 +980,7 @@
 })
 
 (define_insn_and_split "adddi3_internal"
-  [(set (match_operand:SI  0 "register_operand"  "=r")
+  [(set (match_operand:SI  0 "register_operand"  "=&r")
(plus:SI (match_operand:SI 2 "register_operand"  "r")
 (match_operand:SI 3 "rx_source_operand" "riQ")))
(set (match_operand:SI  1 "register_operand"  "=r")
@@ -1163,11 +1163,11 @@
(set_attr "timings" "22,44")]
 )
 
-(define_insn "smax3"
-  [(set (match_operand:int_modes 0 "register_operand" 
"=r,r,r,r,r,r")
-   (smax:int_modes (match_operand:int_modes 1 "register_operand" 
"%0,0,0,0,0,0")
-   (match_operand:int_modes 2 "rx_source_operand"
-
"r,Sint08,Sint16,Sint24,i,Q")))]
+(define_insn "smaxsi3"
+  [(set (match_operand:SI  0 "register_operand" "=r,r,r,r,r,r")
+   (smax:SI (match_operand:SI 1 "register_operand" "%0,0,0,0,0,0")
+(match_operand:SI 2 "rx_source_operand"
+  "r,Sint08,Sint16,Sint24,i,Q")))]
   ""
   "max\t%Q2, %0"
   [(set_attr "length" "3,4,5,6,7,6")


Re: Ping #1: [Patch,AVR]: Clean-up some SP insns

2011-10-05 Thread Denis Chertykov
2011/10/5 Georg-Johann Lay :
>
> Ping #1: http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01690.html
>
> Georg-Johann Lay wrote:
>
>> This is just a code clean-up.
>>
>> The bulky code from *addhi3_sp_R_pc2 and *addhi3_sp_R_pc3 is done by a small 
>> C
>> function that does the same (except that it prints some comment depending on
>> -dp or -fverbose-asm).
>>
>> *movhi_sp is an insn that should not be there and go away because it is a 
>> move
>> insn and there should be at most one move insn per mode (HImode in this 
>> case).
>>
>> stack_register_operand, i.e. (REG:HI 32), is in register class STACK_REG, 
>> thus
>> not in NO_REGS, thus element of register_operand, general_regs and
>> nonimmediate_operand which are the predicates/condition of *movhi.  *movhi
>> already knows to handle "q,r" and "r,q" moves, same applies to the output
>> function output_movhi.
>>
>> The patch passes the test suite, of course.
>>
>> Ok?
>>
>> Moreover, I'd like to remove constraint "R" which is just used in one insn 
>> now
>> and replace it by 3-letter constraint C.. so that prefix R is free.
>>
>> R is of absolutely no use in inline assembly and the constraint can be
>> renamed/removed from documentation, IMO.
>>
>> Johann
>>
>>       * config/avr/avr-protos.h (avr_out_addto_sp): New prototype.
>>       * config/avr/avr.c (avr_out_addto_sp): New function.
>>       (adjust_insn_length): Handle ADJUST_LEN_ADDTO_SP.
>>       * config/avr/avr.md (adjust_len): Add "addto_sp".
>>       (*movhi_sp): Remove insn.
>>       (*addhi3_sp_R_pc2, *addhi3_sp_R_pc3): Merge to *addhi3_sp_R.

Please commit.

Denis.


[PATCH] Fix COND_EXPR valueization

2011-10-05 Thread Richard Guenther

When making COND_EXPR/VEC_COND_EXPR a gimple ternary op I forgot
to update ternary handling in gimple_fold_stmt_to_constant_1 to also
valueize and fold the embedded comparison.  Done so.  This also
adjusts SCCVN to accept a SSA name result from that folder, when
simplifying A > 1 ? 2 : C to C, which is certainly useful
(observed with a testcase which triggers vector lowering).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2011-10-05  Richard Guenther  

* gimple-fold.c (gimple_fold_stmt_to_constant_1): For
ternary ops with an embedded expression valueize and fold
that as well.
* tree-ssa-sccvn.c (try_to_simplify): Also allow SSA name
results from gimple_fold_stmt_to_constant_1.

Index: gcc/gimple-fold.c
===
*** gcc/gimple-fold.c   (revision 179540)
--- gcc/gimple-fold.c   (working copy)
*** gimple_fold_stmt_to_constant_1 (gimple s
*** 2569,2574 
--- 2569,2587 
tree op1 = (*valueize) (gimple_assign_rhs2 (stmt));
tree op2 = (*valueize) (gimple_assign_rhs3 (stmt));
  
+ /* Fold embedded expressions in ternary codes.  */
+ if ((subcode == COND_EXPR
+  || subcode == VEC_COND_EXPR)
+ && COMPARISON_CLASS_P (op0))
+   {
+ tree op00 = (*valueize) (TREE_OPERAND (op0, 0));
+ tree op01 = (*valueize) (TREE_OPERAND (op0, 1));
+ tree tem = fold_binary_loc (loc, TREE_CODE (op0),
+ TREE_TYPE (op0), op00, op01);
+ if (tem)
+   op0 = tem;
+   }
+ 
return fold_ternary_loc (loc, subcode,
   gimple_expr_type (stmt), op0, op1, op2);
  }
Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c(revision 179539)
--- gcc/tree-ssa-sccvn.c(working copy)
*** simplify_unary_expression (gimple stmt)
*** 2967,2993 
  static tree
  try_to_simplify (gimple stmt)
  {
tree tem;
  
/* For stores we can end up simplifying a SSA_NAME rhs.  Just return
   in this case, there is no point in doing extra work.  */
!   if (gimple_assign_copy_p (stmt)
!   && TREE_CODE (gimple_assign_rhs1 (stmt)) == SSA_NAME)
  return NULL_TREE;
  
/* First try constant folding based on our current lattice.  */
!   tem = gimple_fold_stmt_to_constant (stmt, vn_valueize);
!   if (tem)
  return tem;
  
/* If that didn't work try combining multiple statements.  */
!   switch (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)))
  {
  case tcc_reference:
!   /* Fallthrough for some codes that can operate on registers.  */
!   if (!(TREE_CODE (gimple_assign_rhs1 (stmt)) == REALPART_EXPR
!   || TREE_CODE (gimple_assign_rhs1 (stmt)) == IMAGPART_EXPR
!   || TREE_CODE (gimple_assign_rhs1 (stmt)) == VIEW_CONVERT_EXPR))
break;
/* We could do a little more with unary ops, if they expand
 into binary ops, but it's debatable whether it is worth it. */
--- 2967,2995 
  static tree
  try_to_simplify (gimple stmt)
  {
+   enum tree_code code = gimple_assign_rhs_code (stmt);
tree tem;
  
/* For stores we can end up simplifying a SSA_NAME rhs.  Just return
   in this case, there is no point in doing extra work.  */
!   if (code == SSA_NAME)
  return NULL_TREE;
  
/* First try constant folding based on our current lattice.  */
!   tem = gimple_fold_stmt_to_constant_1 (stmt, vn_valueize);
!   if (tem
!   && (TREE_CODE (tem) == SSA_NAME
! || is_gimple_min_invariant (tem)))
  return tem;
  
/* If that didn't work try combining multiple statements.  */
!   switch (TREE_CODE_CLASS (code))
  {
  case tcc_reference:
!   /* Fallthrough for some unary codes that can operate on registers.  */
!   if (!(code == REALPART_EXPR
!   || code == IMAGPART_EXPR
!   || code == VIEW_CONVERT_EXPR))
break;
/* We could do a little more with unary ops, if they expand
 into binary ops, but it's debatable whether it is worth it. */


[C++ Patch] PR 38980

2011-10-05 Thread Paolo Carlini

Hi,

as analyzed by Jakub in the audit trail, after changes made by Mark back 
in 2005, constant_value_1 doesn't return aggregate constants for fear of 
inadvertent copies (in general their addresses must be the same 
everywhere). However, for the purpose of format checking in 
check_format_arg it is safe to do that, and necessary, thus Jakub 
suggested adding a parameter to decl_constant_value controlling that 
behavior. Which is what I tried to do with the below, tested x86_64-linux.


Ok for mainline?

Thanks,
Paolo.

/
/c-family
2011-10-05  Paolo Carlini  

PR c++/38980
* c-common.h (decl_constant_value): Add bool parameter.
* c-common.c (decl_constant_value_for_optimization): Adjust call.
* c-format.c (check_format_arg): Likewise.

2011-10-05  Paolo Carlini  

PR c++/38980
* c-typeck.c (decl_constant_value): Adjust definition.

/cp
2011-10-05  Paolo Carlini  

PR c++/38980
* typeck.c (decay_conversion): Adjust decl_constant_value call.
* call.c (convert_like_real): Likewise.
* init.c (constant_value_1): Add bool parameter.
(integral_constant_value): Adjust call.
(decl_constant_value): Adjust definition.

/testsuite
2011-10-05  Paolo Carlini  

PR c++/38980
* g++.dg/warn/format5.C: New.
Index: c-family/c-format.c
===
--- c-family/c-format.c (revision 179540)
+++ c-family/c-format.c (working copy)
@@ -1529,7 +1529,7 @@ check_format_arg (void *ctx, tree format_tree,
 format_tree = TREE_OPERAND (format_tree, 0);
   if (TREE_CODE (format_tree) == VAR_DECL
   && TREE_CODE (TREE_TYPE (format_tree)) == ARRAY_TYPE
-  && (array_init = decl_constant_value (format_tree)) != format_tree
+  && (array_init = decl_constant_value (format_tree, true)) != format_tree
   && TREE_CODE (array_init) == STRING_CST)
 {
   /* Extract the string constant initializer.  Note that this may include
Index: c-family/c-common.c
===
--- c-family/c-common.c (revision 179540)
+++ c-family/c-common.c (working copy)
@@ -1443,7 +1443,7 @@ decl_constant_value_for_optimization (tree exp)
   || DECL_MODE (exp) == BLKmode)
 return exp;
 
-  ret = decl_constant_value (exp);
+  ret = decl_constant_value (exp, false);
   /* Avoid unwanted tree sharing between the initializer and current
  function's body where the tree can be modified e.g. by the
  gimplifier.  */
Index: c-family/c-common.h
===
--- c-family/c-common.h (revision 179540)
+++ c-family/c-common.h (working copy)
@@ -886,7 +886,7 @@ extern tree default_conversion (tree);
 
 extern tree common_type (tree, tree);
 
-extern tree decl_constant_value (tree);
+extern tree decl_constant_value (tree, bool);
 
 /* Handle increment and decrement of boolean types.  */
 extern tree boolean_increment (enum tree_code, tree);
Index: testsuite/g++.dg/warn/format5.C
===
--- testsuite/g++.dg/warn/format5.C (revision 0)
+++ testsuite/g++.dg/warn/format5.C (revision 0)
@@ -0,0 +1,12 @@
+// PR c++/38980
+// { dg-options "-Wformat" }
+
+extern "C"
+int printf(const char *format, ...) __attribute__((format(printf, 1, 2) ));
+
+const char fmt1[] = "Hello, %s";
+
+void f()
+{
+  printf(fmt1, 3); // { dg-warning "expects argument" }
+}
Index: cp/typeck.c
===
--- cp/typeck.c (revision 179540)
+++ cp/typeck.c (working copy)
@@ -1827,7 +1827,7 @@ decay_conversion (tree exp)
   /* FIXME remove? at least need to remember that this isn't really a
  constant expression if EXP isn't decl_constant_var_p, like with
  C_MAYBE_CONST_EXPR.  */
-  exp = decl_constant_value (exp);
+  exp = decl_constant_value (exp, /*return_aggregate_cst_ok_p=*/false);
   if (error_operand_p (exp))
 return error_mark_node;
 
Index: cp/init.c
===
--- cp/init.c   (revision 179540)
+++ cp/init.c   (working copy)
@@ -1794,10 +1794,11 @@ build_offset_ref (tree type, tree member, bool add
constant initializer, return the initializer (or, its initializers,
recursively); otherwise, return DECL.  If INTEGRAL_P, the
initializer is only returned if DECL is an integral
-   constant-expression.  */
+   constant-expression.  If RETURN_AGGREGATE_CST_OK_P, it is ok to
+   return an aggregate constant.  */
 
 static tree
-constant_value_1 (tree decl, bool integral_p)
+constant_value_1 (tree decl, bool integral_p, bool return_aggregate_cst_ok_p)
 {
   while (TREE_CODE (decl) == CONST_DECL
 || (integral_p
@@ -1834,11 +1835,12 @@ static tree
   if (!init
  || !TREE_TYPE (init)
  || !TREE_CONSTANT (init)
- || (!integral_p
- /* Do not r

Re: New warning for expanded vector operations

2011-10-05 Thread Artem Shinkarov
On Wed, Oct 5, 2011 at 9:40 AM, Richard Guenther
 wrote:
> On Wed, Oct 5, 2011 at 12:18 AM, Artem Shinkarov
>  wrote:
>> Hi
>>
>> Here is a patch to inform a programmer about the expanded vector operation.
>> Bootstrapped on x86-unknown-linux-gnu.
>>
>> ChangeLog:
>>
>>        * gcc/tree-vect-generic.c (expand_vector_piecewise): Adjust to
>>          produce the warning.
>>          (expand_vector_parallel): Adjust to produce the warning.
>
> Entries start without gcc/, they are relative to the gcc/ChangeLog file.

Sure, sorry.

>>          (lower_vec_shuffle): Adjust to produce the warning.
>>        * gcc/common.opt: New warning Wvector-operation-expanded.
>>        * gcc/doc/invoke.texi: Document the wawning.
>>
>>
>> Ok?
>
> I don't like the name -Wvector-operation-expanded.  We emit a
> similar warning for missed inline expansions with -Winline, so
> maybe -Wvector-extensions (that's the name that appears
> in the C extension documentation).

Hm, I don't care much about the name, unless it gets clear what the
warning is used for.  I am not really sure that Wvector-extensions
makes it clear.  Also, I don't see anything bad if the warning will
pop up during the vectorisation. Any vector operation performed
outside the SIMD accelerator looks suspicious, because it actually
doesn't improve performance.  Such a warning during the vectorisation
could mean that a programmer forgot some flag, or the constant
propagation failed to deliver a constant, or something else.

Conceptually the text I am producing is not really a warning, it is
more like an information, but I am not aware of the mechanisms that
would allow me to introduce a flag triggering inform () or something
similar.

What I think we really need to avoid is including this warning in the
standard Ox.

> +  location_t loc = gimple_location (gsi_stmt (*gsi));
> +
> +  warning_at (loc, OPT_Wvector_operation_expanded,
> +             "vector operation will be expanded piecewise");
>
>   v = VEC_alloc(constructor_elt, gc, (nunits + delta - 1) / delta);
>   for (i = 0; i < nunits;
> @@ -260,6 +264,10 @@ expand_vector_parallel (gimple_stmt_iter
>   tree result, compute_type;
>   enum machine_mode mode;
>   int n_words = tree_low_cst (TYPE_SIZE_UNIT (type), 1) / UNITS_PER_WORD;
> +  location_t loc = gimple_location (gsi_stmt (*gsi));
> +
> +  warning_at (loc, OPT_Wvector_operation_expanded,
> +             "vector operation will be expanded in parallel");
>
> what's the difference between 'piecewise' and 'in parallel'?

Parallel is a little bit better for performance than piecewise.

> @@ -301,16 +309,15 @@ expand_vector_addition (gimple_stmt_iter
>  {
>   int parts_per_word = UNITS_PER_WORD
>                       / tree_low_cst (TYPE_SIZE_UNIT (TREE_TYPE (type)), 1);
> +  location_t loc = gimple_location (gsi_stmt (*gsi));
>
>   if (INTEGRAL_TYPE_P (TREE_TYPE (type))
>       && parts_per_word >= 4
>       && TYPE_VECTOR_SUBPARTS (type) >= 4)
> -    return expand_vector_parallel (gsi, f_parallel,
> -                                  type, a, b, code);
> +    return expand_vector_parallel (gsi, f_parallel, type, a, b, code);
>   else
> -    return expand_vector_piecewise (gsi, f,
> -                                   type, TREE_TYPE (type),
> -                                   a, b, code);
> +    return expand_vector_piecewise (gsi, f, type,
> +                                   TREE_TYPE (type), a, b, code);
>  }
>
>  /* Check if vector VEC consists of all the equal elements and
>
> unless i miss something loc is unused here.  Please avoid random
> whitespace changes (just review your patch yourself before posting
> and revert pieces that do nothing).

Yes you are right, sorry.

> +@item -Wvector-operation-expanded
> +@opindex Wvector-operation-expanded
> +@opindex Wno-vector-operation-expanded
> +Warn if vector operation is not implemented via SIMD capabilities of the
> +architecture. Mainly useful for the performance tuning.
>
> I'd mention that this is for vector operations as of the C extension
> documented in "Vector Extensions".
>
> The vectorizer can produce some operations that will need further
> lowering - we probably should make sure to _not_ warn about those.
> Try running the vect.exp testsuite with the new warning turned on
> (eventually disabling SSE), like with
>
> obj/gcc> make check-gcc
> RUNTESTFLAGS="--target_board=unix/-Wvector-extensions/-mno-sse
> vect.exp"

Again, see the comment above. I think, if the warning can be triggered
only manually, then we are fine. But I'll check anyway how many
warnings I'll get from vect.exp.

>> P.S. It is hard to write a reasonable testcase for the patch, because
>> one needs to guess which architecture would expand a given vector
>> operation. But the patch is trivial.
>
> You can create an aritificial large vector type for example, or put a
> testcase under gcc.target/i386 and disable SSE.  We should have
> a testcase for this.

Yeah, disabling SSE should help.


Thanks,
Artem.
> Thanks,
> Richard

[PATCH] Fix ICE in find_equal_ptrs (PR tree-optimization/50613)

2011-10-05 Thread Jakub Jelinek
Hi!

I didn't consider that the rhs1 of a gimple cast might be something
other than SSA_NAME.  Fixed thusly, bootstrapped/regtested on x86_64-linux
and i686-linux, ok for trunk?

2011-10-05  Jakub Jelinek  

PR tree-optimization/50613
* tree-ssa-strlen.c (find_equal_ptrs): If CASE_CONVERT
operand is ADDR_EXPR, fallthru into ADDR_EXPR handling,
and if it is neither that not SSA_NAME, give up.

* gcc.dg/pr50613.c: New test.

--- gcc/tree-ssa-strlen.c.jj2011-10-05 08:13:55.0 +0200
+++ gcc/tree-ssa-strlen.c   2011-10-05 08:19:16.0 +0200
@@ -692,6 +692,14 @@ find_equal_ptrs (tree ptr, int idx)
{
case SSA_NAME:
  break;
+   CASE_CONVERT:
+ if (!POINTER_TYPE_P (TREE_TYPE (ptr)))
+   return;
+ if (TREE_CODE (ptr) == SSA_NAME)
+   break;
+ if (TREE_CODE (ptr) != ADDR_EXPR)
+   return;
+ /* FALLTHRU */
case ADDR_EXPR:
  {
int *pidx = addr_stridxptr (TREE_OPERAND (ptr, 0));
@@ -699,10 +707,6 @@ find_equal_ptrs (tree ptr, int idx)
  *pidx = idx;
return;
  }
-   CASE_CONVERT:
- if (POINTER_TYPE_P (TREE_TYPE (ptr)))
-   break;
- return;
default:
  return;
}
--- gcc/testsuite/gcc.dg/pr50613.c.jj   2011-10-05 08:22:32.0 +0200
+++ gcc/testsuite/gcc.dg/pr50613.c  2011-10-05 08:21:31.0 +0200
@@ -0,0 +1,20 @@
+/* PR tree-optimization/50613 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-tree-ccp" } */
+
+#include "strlenopt.h"
+
+char buf[26];
+
+static inline void
+bar (char *__restrict dest, const char *__restrict src)
+{
+  strcpy (dest, src);
+}
+
+void
+foo (char *p)
+{
+  if (strlen (p) < 50)
+bar (buf, p);
+}

Jakub


Re: New warning for expanded vector operations

2011-10-05 Thread Richard Guenther
On Wed, Oct 5, 2011 at 1:28 PM, Artem Shinkarov
 wrote:
> On Wed, Oct 5, 2011 at 9:40 AM, Richard Guenther
>  wrote:
>> On Wed, Oct 5, 2011 at 12:18 AM, Artem Shinkarov
>>  wrote:
>>> Hi
>>>
>>> Here is a patch to inform a programmer about the expanded vector operation.
>>> Bootstrapped on x86-unknown-linux-gnu.
>>>
>>> ChangeLog:
>>>
>>>        * gcc/tree-vect-generic.c (expand_vector_piecewise): Adjust to
>>>          produce the warning.
>>>          (expand_vector_parallel): Adjust to produce the warning.
>>
>> Entries start without gcc/, they are relative to the gcc/ChangeLog file.
>
> Sure, sorry.
>
>>>          (lower_vec_shuffle): Adjust to produce the warning.
>>>        * gcc/common.opt: New warning Wvector-operation-expanded.
>>>        * gcc/doc/invoke.texi: Document the wawning.
>>>
>>>
>>> Ok?
>>
>> I don't like the name -Wvector-operation-expanded.  We emit a
>> similar warning for missed inline expansions with -Winline, so
>> maybe -Wvector-extensions (that's the name that appears
>> in the C extension documentation).
>
> Hm, I don't care much about the name, unless it gets clear what the
> warning is used for.  I am not really sure that Wvector-extensions
> makes it clear.  Also, I don't see anything bad if the warning will
> pop up during the vectorisation. Any vector operation performed
> outside the SIMD accelerator looks suspicious, because it actually
> doesn't improve performance.  Such a warning during the vectorisation
> could mean that a programmer forgot some flag, or the constant
> propagation failed to deliver a constant, or something else.
>
> Conceptually the text I am producing is not really a warning, it is
> more like an information, but I am not aware of the mechanisms that
> would allow me to introduce a flag triggering inform () or something
> similar.
>
> What I think we really need to avoid is including this warning in the
> standard Ox.
>
>> +  location_t loc = gimple_location (gsi_stmt (*gsi));
>> +
>> +  warning_at (loc, OPT_Wvector_operation_expanded,
>> +             "vector operation will be expanded piecewise");
>>
>>   v = VEC_alloc(constructor_elt, gc, (nunits + delta - 1) / delta);
>>   for (i = 0; i < nunits;
>> @@ -260,6 +264,10 @@ expand_vector_parallel (gimple_stmt_iter
>>   tree result, compute_type;
>>   enum machine_mode mode;
>>   int n_words = tree_low_cst (TYPE_SIZE_UNIT (type), 1) / UNITS_PER_WORD;
>> +  location_t loc = gimple_location (gsi_stmt (*gsi));
>> +
>> +  warning_at (loc, OPT_Wvector_operation_expanded,
>> +             "vector operation will be expanded in parallel");
>>
>> what's the difference between 'piecewise' and 'in parallel'?
>
> Parallel is a little bit better for performance than piecewise.

I see.  That difference should probably be documented, maybe with
an example.

Richard.

>> @@ -301,16 +309,15 @@ expand_vector_addition (gimple_stmt_iter
>>  {
>>   int parts_per_word = UNITS_PER_WORD
>>                       / tree_low_cst (TYPE_SIZE_UNIT (TREE_TYPE (type)), 1);
>> +  location_t loc = gimple_location (gsi_stmt (*gsi));
>>
>>   if (INTEGRAL_TYPE_P (TREE_TYPE (type))
>>       && parts_per_word >= 4
>>       && TYPE_VECTOR_SUBPARTS (type) >= 4)
>> -    return expand_vector_parallel (gsi, f_parallel,
>> -                                  type, a, b, code);
>> +    return expand_vector_parallel (gsi, f_parallel, type, a, b, code);
>>   else
>> -    return expand_vector_piecewise (gsi, f,
>> -                                   type, TREE_TYPE (type),
>> -                                   a, b, code);
>> +    return expand_vector_piecewise (gsi, f, type,
>> +                                   TREE_TYPE (type), a, b, code);
>>  }
>>
>>  /* Check if vector VEC consists of all the equal elements and
>>
>> unless i miss something loc is unused here.  Please avoid random
>> whitespace changes (just review your patch yourself before posting
>> and revert pieces that do nothing).
>
> Yes you are right, sorry.
>
>> +@item -Wvector-operation-expanded
>> +@opindex Wvector-operation-expanded
>> +@opindex Wno-vector-operation-expanded
>> +Warn if vector operation is not implemented via SIMD capabilities of the
>> +architecture. Mainly useful for the performance tuning.
>>
>> I'd mention that this is for vector operations as of the C extension
>> documented in "Vector Extensions".
>>
>> The vectorizer can produce some operations that will need further
>> lowering - we probably should make sure to _not_ warn about those.
>> Try running the vect.exp testsuite with the new warning turned on
>> (eventually disabling SSE), like with
>>
>> obj/gcc> make check-gcc
>> RUNTESTFLAGS="--target_board=unix/-Wvector-extensions/-mno-sse
>> vect.exp"
>
> Again, see the comment above. I think, if the warning can be triggered
> only manually, then we are fine. But I'll check anyway how many
> warnings I'll get from vect.exp.
>
>>> P.S. It is hard to write a reasonable testcase for the patch, because
>>> one needs to guess which architecture would expand a 

Re: [PATCH] Fix ICE in find_equal_ptrs (PR tree-optimization/50613)

2011-10-05 Thread Richard Guenther
On Wed, Oct 5, 2011 at 1:31 PM, Jakub Jelinek  wrote:
> Hi!
>
> I didn't consider that the rhs1 of a gimple cast might be something
> other than SSA_NAME.  Fixed thusly, bootstrapped/regtested on x86_64-linux
> and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.

> 2011-10-05  Jakub Jelinek  
>
>        PR tree-optimization/50613
>        * tree-ssa-strlen.c (find_equal_ptrs): If CASE_CONVERT
>        operand is ADDR_EXPR, fallthru into ADDR_EXPR handling,
>        and if it is neither that not SSA_NAME, give up.
>
>        * gcc.dg/pr50613.c: New test.
>
> --- gcc/tree-ssa-strlen.c.jj    2011-10-05 08:13:55.0 +0200
> +++ gcc/tree-ssa-strlen.c       2011-10-05 08:19:16.0 +0200
> @@ -692,6 +692,14 @@ find_equal_ptrs (tree ptr, int idx)
>        {
>        case SSA_NAME:
>          break;
> +       CASE_CONVERT:
> +         if (!POINTER_TYPE_P (TREE_TYPE (ptr)))
> +           return;
> +         if (TREE_CODE (ptr) == SSA_NAME)
> +           break;
> +         if (TREE_CODE (ptr) != ADDR_EXPR)
> +           return;
> +         /* FALLTHRU */
>        case ADDR_EXPR:
>          {
>            int *pidx = addr_stridxptr (TREE_OPERAND (ptr, 0));
> @@ -699,10 +707,6 @@ find_equal_ptrs (tree ptr, int idx)
>              *pidx = idx;
>            return;
>          }
> -       CASE_CONVERT:
> -         if (POINTER_TYPE_P (TREE_TYPE (ptr)))
> -           break;
> -         return;
>        default:
>          return;
>        }
> --- gcc/testsuite/gcc.dg/pr50613.c.jj   2011-10-05 08:22:32.0 +0200
> +++ gcc/testsuite/gcc.dg/pr50613.c      2011-10-05 08:21:31.0 +0200
> @@ -0,0 +1,20 @@
> +/* PR tree-optimization/50613 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-tree-ccp" } */
> +
> +#include "strlenopt.h"
> +
> +char buf[26];
> +
> +static inline void
> +bar (char *__restrict dest, const char *__restrict src)
> +{
> +  strcpy (dest, src);
> +}
> +
> +void
> +foo (char *p)
> +{
> +  if (strlen (p) < 50)
> +    bar (buf, p);
> +}
>
>        Jakub
>


[PATCH] Propagate vector CONSTRUCTOR and element extract in SCCVN

2011-10-05 Thread Richard Guenther

Noticed from vector lowering testcases.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2011-10-05  Richard Guenther  

* tree-ssa-sccvn.c (vn_get_expr_for): Handle CONSTRUCTOR of
vector type.
(simplify_unary_expression): Handle BIT_FIELD_REFs.
(try_to_simplify): Handle BIT_FIELD_REFs.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 179543)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -275,6 +275,13 @@ vn_get_expr_for (tree name)
  gimple_assign_rhs2 (def_stmt));
   break;
 
+case tcc_exceptional:
+  if (code == CONSTRUCTOR
+ && TREE_CODE
+  (TREE_TYPE (gimple_assign_rhs1 (def_stmt))) == VECTOR_TYPE)
+   expr = gimple_assign_rhs1 (def_stmt);
+  break;
+
 default:;
 }
   if (expr == NULL_TREE)
@@ -2922,7 +2929,8 @@ simplify_unary_expression (gimple stmt)
  GIMPLE_ASSIGN_SINGLE codes.  */
   if (code == REALPART_EXPR
   || code == IMAGPART_EXPR
-  || code == VIEW_CONVERT_EXPR)
+  || code == VIEW_CONVERT_EXPR
+  || code == BIT_FIELD_REF)
 op0 = TREE_OPERAND (op0, 0);
 
   if (TREE_CODE (op0) != SSA_NAME)
@@ -2934,7 +2942,8 @@ simplify_unary_expression (gimple stmt)
   else if (CONVERT_EXPR_CODE_P (code)
   || code == REALPART_EXPR
   || code == IMAGPART_EXPR
-  || code == VIEW_CONVERT_EXPR)
+  || code == VIEW_CONVERT_EXPR
+  || code == BIT_FIELD_REF)
 {
   /* We want to do tree-combining on conversion-like expressions.
  Make sure we feed only SSA_NAMEs or constants to fold though.  */
@@ -2943,6 +2952,7 @@ simplify_unary_expression (gimple stmt)
  || BINARY_CLASS_P (tem)
  || TREE_CODE (tem) == VIEW_CONVERT_EXPR
  || TREE_CODE (tem) == SSA_NAME
+ || TREE_CODE (tem) == CONSTRUCTOR
  || is_gimple_min_invariant (tem))
op0 = tem;
 }
@@ -2951,7 +2961,14 @@ simplify_unary_expression (gimple stmt)
   if (op0 == orig_op0)
 return NULL_TREE;
 
-  result = fold_unary_ignore_overflow (code, gimple_expr_type (stmt), op0);
+  if (code == BIT_FIELD_REF)
+{
+  tree rhs = gimple_assign_rhs1 (stmt);
+  result = fold_ternary (BIT_FIELD_REF, TREE_TYPE (rhs),
+op0, TREE_OPERAND (rhs, 1), TREE_OPERAND (rhs, 2));
+}
+  else
+result = fold_unary_ignore_overflow (code, gimple_expr_type (stmt), op0);
   if (result)
 {
   STRIP_USELESS_TYPE_CONVERSION (result);
@@ -2989,7 +3006,8 @@ try_to_simplify (gimple stmt)
   /* Fallthrough for some unary codes that can operate on registers.  */
   if (!(code == REALPART_EXPR
|| code == IMAGPART_EXPR
-   || code == VIEW_CONVERT_EXPR))
+   || code == VIEW_CONVERT_EXPR
+   || code == BIT_FIELD_REF))
break;
   /* We could do a little more with unary ops, if they expand
 into binary ops, but it's debatable whether it is worth it. */
@@ -3159,7 +3177,8 @@ visit_use (tree use)
  /* VOP-less references can go through unary case.  */
  if ((code == REALPART_EXPR
   || code == IMAGPART_EXPR
-  || code == VIEW_CONVERT_EXPR)
+  || code == VIEW_CONVERT_EXPR
+  || code == BIT_FIELD_REF)
  && TREE_CODE (TREE_OPERAND (rhs1, 0)) == SSA_NAME)
{
  changed = visit_nary_op (lhs, stmt);


[PATCH, testsuite]: "Fix" avx256-unaligned-{load,store}-3.c scan-assembler failures

2011-10-05 Thread Uros Bizjak
Hello!

Atom does not vectorize DFmode arrays by default, so add
-mtune=generic to dg-options to fix scan-assembler failures [1].

2011-10-05  Uros Bizjak  

* gcc.target/i386/avx256-unaligned-load-3.c (dg-options): Add
-mtune=generic.
* gcc.target/i386/avx256-unaligned-store-3.c (dg-options): Ditto.

Tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN.

[1] http://gcc.gnu.org/ml/gcc-testresults/2011-10/msg00530.html

Uros.
Index: gcc.target/i386/avx256-unaligned-store-3.c
===
--- gcc.target/i386/avx256-unaligned-store-3.c  (revision 179535)
+++ gcc.target/i386/avx256-unaligned-store-3.c  (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -dp -mavx -mavx256-split-unaligned-store" } */
+/* { dg-options "-O3 -dp -mavx -mavx256-split-unaligned-store -mtune=generic" 
} */
 
 #define N 1024
 
Index: gcc.target/i386/avx256-unaligned-load-3.c
===
--- gcc.target/i386/avx256-unaligned-load-3.c   (revision 179535)
+++ gcc.target/i386/avx256-unaligned-load-3.c   (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -dp -mavx -mavx256-split-unaligned-load" } */
+/* { dg-options "-O3 -dp -mavx -mavx256-split-unaligned-load -mtune=generic" } 
*/
 
 #define N 1024
 


Re: Modify gcc for use with gdb (issue5132047)

2011-10-05 Thread Diego Novillo

On 11-09-27 03:37 , Richard Guenther wrote:

On Tue, Sep 27, 2011 at 9:35 AM, Richard Guenther
  wrote:

On Tue, Sep 27, 2011 at 9:14 AM, Jakub Jelinek  wrote:

On Mon, Sep 26, 2011 at 03:05:00PM -0700, Lawrence Crowl wrote:

There a non-transparent change in behavior that may affect some users.
The inline functions will introduce additional lines in a sequence of
gdb 'step' commands.  Use 'next' instead.


That is IMHO a serious obstackle.  If anything, the inlines should
use always_inline, artificial attributes, but don't know if GDB treats them
already specially and doesn't step into them with step.
Also, I'm afraid it will significantly grow cc1plus/cc1 debug info.

The .gdbinit macro redefinition Paolo posted sounds to be a better approach.


Yeah, I already see me typing sfinish  gazillion of times when
trying to step into a function call that produces a function argument
... there is
already the very very very annoying tree_code_length function that you get for
each TREE_OPERAND (...) macro on function arguments.  I'd be very opposed
to any change that makes this situation worse.


tree_operand_length actually.  Please produce a patch that makes this function
transparent to gdb, then I might be convinced converting the other macros to
such function might be worthwhile.

Thanks,
Richard.


I think we need to find a solution for this situation.  The suggestions 
I see in this thread are nothing but workarounds to cope with current 
debugging limitations.  We have learned to live with them and hack 
around them, but debugging GCC is already a daunting task, and I think 
we can improve it.


Richard, yes, stepping into one-liner inline functions can be 
aggravating (particularly with some of the 2-3 embedded function calls 
we have in some places).  However, the ability to easily inspect state 
of data structures using the standard accessors is fundamental. 
Particularly, for developers that may not have the extensive knowledge 
of internals that you do.


Jakub, is size of a debuggable cc1/cc1plus really all that important? 
Yes, more debug info makes for bigger binaries.  But when debugging, 
that hardly matters.  Unless we were talking about a multi-Gb binary, 
which we aren't.


I would like to solve this problem for all inline functions that we may 
not care to step into.  There is a whole bunch of one-liners that 
generally annoy me: tree_operand_length, gimple_op, ... in fact, all the 
gimple accessors, etc.


How about one of these two ideas?

1- Add -g to the list of supported settings in #pragma GCC optimize (or 
create a new #pragma GCC no_debug).  This way, we can brace all these 
one liners with:


#pragma push_options
#pragma GCC optimize ("-g0")
[ ... inline functions ... ]
#pragma pop_options


2- Similar to #1, but with __attribute__ in each function declaration. 
I think I prefer #1 since it's simpler for the user to specify.



This would also let us generate smaller debug binaries, since the 
bracketed functions would not get any debug info at all.


Any reason why that scheme couldn't work?  It works well for separate TUs.


Thanks.  Diego.


Re: Modify gcc for use with gdb (issue5132047)

2011-10-05 Thread Richard Guenther
On Wed, Oct 5, 2011 at 3:18 PM, Diego Novillo  wrote:
> On 11-09-27 03:37 , Richard Guenther wrote:
>>
>> On Tue, Sep 27, 2011 at 9:35 AM, Richard Guenther
>>   wrote:
>>>
>>> On Tue, Sep 27, 2011 at 9:14 AM, Jakub Jelinek  wrote:

 On Mon, Sep 26, 2011 at 03:05:00PM -0700, Lawrence Crowl wrote:
>
> There a non-transparent change in behavior that may affect some users.
> The inline functions will introduce additional lines in a sequence of
> gdb 'step' commands.  Use 'next' instead.

 That is IMHO a serious obstackle.  If anything, the inlines should
 use always_inline, artificial attributes, but don't know if GDB treats
 them
 already specially and doesn't step into them with step.
 Also, I'm afraid it will significantly grow cc1plus/cc1 debug info.

 The .gdbinit macro redefinition Paolo posted sounds to be a better
 approach.
>>>
>>> Yeah, I already see me typing sfinish  gazillion of times
>>> when
>>> trying to step into a function call that produces a function argument
>>> ... there is
>>> already the very very very annoying tree_code_length function that you
>>> get for
>>> each TREE_OPERAND (...) macro on function arguments.  I'd be very opposed
>>> to any change that makes this situation worse.
>>
>> tree_operand_length actually.  Please produce a patch that makes this
>> function
>> transparent to gdb, then I might be convinced converting the other macros
>> to
>> such function might be worthwhile.
>>
>> Thanks,
>> Richard.
>
> I think we need to find a solution for this situation.  The suggestions I
> see in this thread are nothing but workarounds to cope with current
> debugging limitations.  We have learned to live with them and hack around
> them, but debugging GCC is already a daunting task, and I think we can
> improve it.
>
> Richard, yes, stepping into one-liner inline functions can be aggravating
> (particularly with some of the 2-3 embedded function calls we have in some
> places).  However, the ability to easily inspect state of data structures
> using the standard accessors is fundamental. Particularly, for developers
> that may not have the extensive knowledge of internals that you do.

It is much more important to optimize my debugging time as experienced
developer resources are more scarce than some random unexperienced
guy that happens to dig into GCC.

;)

or not really ;).

> Jakub, is size of a debuggable cc1/cc1plus really all that important? Yes,
> more debug info makes for bigger binaries.  But when debugging, that hardly
> matters.  Unless we were talking about a multi-Gb binary, which we aren't.
>
> I would like to solve this problem for all inline functions that we may not
> care to step into.  There is a whole bunch of one-liners that generally
> annoy me: tree_operand_length, gimple_op, ... in fact, all the gimple
> accessors, etc.
>
> How about one of these two ideas?
>
> 1- Add -g to the list of supported settings in #pragma GCC optimize (or
> create a new #pragma GCC no_debug).  This way, we can brace all these one
> liners with:
>
> #pragma push_options
> #pragma GCC optimize ("-g0")
> [ ... inline functions ... ]
> #pragma pop_options
>
>
> 2- Similar to #1, but with __attribute__ in each function declaration. I
> think I prefer #1 since it's simpler for the user to specify.
>
>
> This would also let us generate smaller debug binaries, since the bracketed
> functions would not get any debug info at all.
>
> Any reason why that scheme couldn't work?  It works well for separate TUs.

If you crash inside those -g0 marked functions, what happens?  Why
not use the artificial attribute on them instead?  At least what is documented
is exactly what we want (well, at least it seems to me).

Thus, produce a patch that improves tree_operand_length with either
approach so we can test it.

Richard.

>
> Thanks.  Diego.
>


[PATCH] Fix PR38885

2011-10-05 Thread Richard Guenther

I'm testing a pair of patches to fix PR38885 (for constants)
and PR38884 (for non-constants) stores to complex/vector memory
and CSE of component accesses from SCCVN.

This is the piece that handles stores from constants and partial
reads of it.  We can conveniently re-use fold-const native
encode/interpret code for this.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2011-10-05  Richard Guenther  

PR tree-optimization/38885
* tree-ssa-sccvn.c (vn_reference_lookup_3): Handle partial reads
from constants.

* gcc.dg/tree-ssa/ssa-fre-33.c: New testcase.

Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c(revision 179549)
--- gcc/tree-ssa-sccvn.c(working copy)
*** vn_reference_lookup_3 (ao_ref *ref, tree
*** 1442,1448 
}
  }
  
!   /* 3) For aggregate copies translate the reference through them if
   the copy kills ref.  */
else if (vn_walk_kind == VN_WALKREWRITE
   && gimple_assign_single_p (def_stmt)
--- 1442,1495 
}
  }
  
!   /* 3) Assignment from a constant.  We can use folds native encode/interpret
!  routines to extract the assigned bits.  */
!   else if (CHAR_BIT == 8 && BITS_PER_UNIT == 8
!  && ref->size == maxsize
!  && maxsize % BITS_PER_UNIT == 0
!  && offset % BITS_PER_UNIT == 0
!  && is_gimple_reg_type (vr->type)
!  && gimple_assign_single_p (def_stmt)
!  && is_gimple_min_invariant (gimple_assign_rhs1 (def_stmt)))
! {
!   tree base2;
!   HOST_WIDE_INT offset2, size2, maxsize2;
!   base2 = get_ref_base_and_extent (gimple_assign_lhs (def_stmt),
!  &offset2, &size2, &maxsize2);
!   if (maxsize2 != -1
! && maxsize2 == size2
! && size2 % BITS_PER_UNIT == 0
! && offset2 % BITS_PER_UNIT == 0
! && operand_equal_p (base, base2, 0)
! && offset2 <= offset
! && offset2 + size2 >= offset + maxsize)
!   {
! /* We support up to 512-bit values (for V8DFmode).  */
! unsigned char buffer[64];
! int len;
! 
! len = native_encode_expr (gimple_assign_rhs1 (def_stmt),
!   buffer, sizeof (buffer));
! if (len > 0)
!   {
! tree val = native_interpret_expr (vr->type,
!   buffer
!   + ((offset - offset2)
!  / BITS_PER_UNIT),
!   ref->size / BITS_PER_UNIT);
! if (val)
!   {
! unsigned int value_id = get_or_alloc_constant_value_id (val);
! return vn_reference_insert_pieces
!  (vuse, vr->set, vr->type,
!   VEC_copy (vn_reference_op_s, heap, vr->operands),
!   val, value_id);
!   }
!   }
!   }
! }
! 
!   /* 4) For aggregate copies translate the reference through them if
   the copy kills ref.  */
else if (vn_walk_kind == VN_WALKREWRITE
   && gimple_assign_single_p (def_stmt)
*** vn_reference_lookup_3 (ao_ref *ref, tree
*** 1540,1546 
return NULL;
  }
  
!   /* 4) For memcpy copies translate the reference through them if
   the copy kills ref.  */
else if (vn_walk_kind == VN_WALKREWRITE
   && is_gimple_reg_type (vr->type)
--- 1587,1593 
return NULL;
  }
  
!   /* 5) For memcpy copies translate the reference through them if
   the copy kills ref.  */
else if (vn_walk_kind == VN_WALKREWRITE
   && is_gimple_reg_type (vr->type)
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-33.c
===
*** gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-33.c  (revision 0)
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-33.c  (revision 0)
***
*** 0 
--- 1,21 
+ /* { dg-do compile } */
+ /* { dg-options "-O -fdump-tree-fre1-details" } */
+ 
+ #define vector __attribute__((vector_size(16) ))
+ 
+ struct {
+ float i;
+ vector float global_res;
+ } s;
+ float x;
+ int main(int argc)
+ {
+   vector float res = (vector float){0.0f,0.0f,0.0f,1.0f};
+   res += (vector float){1.0f,2.0f,3.0f,4.0f};
+   s.global_res = res;
+   x = *((float*)&s.global_res + 1);
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump "Replaced BIT_FIELD_REF.*with 2" "fre1" } } */
+ /* { dg-final { cleanup-tree-dump "fre1" } } */


Re: Modify gcc for use with gdb (issue5132047)

2011-10-05 Thread Diego Novillo
On Wed, Oct 5, 2011 at 09:45, Richard Guenther
 wrote:
> On Wed, Oct 5, 2011 at 3:18 PM, Diego Novillo  wrote:
>> On 11-09-27 03:37 , Richard Guenther wrote:
>>>
>>> On Tue, Sep 27, 2011 at 9:35 AM, Richard Guenther
>>>   wrote:

 On Tue, Sep 27, 2011 at 9:14 AM, Jakub Jelinek  wrote:
>
> On Mon, Sep 26, 2011 at 03:05:00PM -0700, Lawrence Crowl wrote:
>>
>> There a non-transparent change in behavior that may affect some users.
>> The inline functions will introduce additional lines in a sequence of
>> gdb 'step' commands.  Use 'next' instead.
>
> That is IMHO a serious obstackle.  If anything, the inlines should
> use always_inline, artificial attributes, but don't know if GDB treats
> them
> already specially and doesn't step into them with step.
> Also, I'm afraid it will significantly grow cc1plus/cc1 debug info.
>
> The .gdbinit macro redefinition Paolo posted sounds to be a better
> approach.

 Yeah, I already see me typing sfinish  gazillion of times
 when
 trying to step into a function call that produces a function argument
 ... there is
 already the very very very annoying tree_code_length function that you
 get for
 each TREE_OPERAND (...) macro on function arguments.  I'd be very opposed
 to any change that makes this situation worse.
>>>
>>> tree_operand_length actually.  Please produce a patch that makes this
>>> function
>>> transparent to gdb, then I might be convinced converting the other macros
>>> to
>>> such function might be worthwhile.
>>>
>>> Thanks,
>>> Richard.
>>
>> I think we need to find a solution for this situation.  The suggestions I
>> see in this thread are nothing but workarounds to cope with current
>> debugging limitations.  We have learned to live with them and hack around
>> them, but debugging GCC is already a daunting task, and I think we can
>> improve it.
>>
>> Richard, yes, stepping into one-liner inline functions can be aggravating
>> (particularly with some of the 2-3 embedded function calls we have in some
>> places).  However, the ability to easily inspect state of data structures
>> using the standard accessors is fundamental. Particularly, for developers
>> that may not have the extensive knowledge of internals that you do.
>
> It is much more important to optimize my debugging time as experienced
> developer resources are more scarce than some random unexperienced
> guy that happens to dig into GCC.
>
> ;)
>
> or not really ;).

You are being facetious, I hope.  Part of the reason that experienced
developers are scarce is precisely because dealing with GCC's code
base is so daunting.  We should be trying to attract those random
inexperienced developers, not scare them away.

The experienced developers will retire, eventually.  Who is going to
replace them?


>> Jakub, is size of a debuggable cc1/cc1plus really all that important? Yes,
>> more debug info makes for bigger binaries.  But when debugging, that hardly
>> matters.  Unless we were talking about a multi-Gb binary, which we aren't.
>>
>> I would like to solve this problem for all inline functions that we may not
>> care to step into.  There is a whole bunch of one-liners that generally
>> annoy me: tree_operand_length, gimple_op, ... in fact, all the gimple
>> accessors, etc.
>>
>> How about one of these two ideas?
>>
>> 1- Add -g to the list of supported settings in #pragma GCC optimize (or
>> create a new #pragma GCC no_debug).  This way, we can brace all these one
>> liners with:
>>
>> #pragma push_options
>> #pragma GCC optimize ("-g0")
>> [ ... inline functions ... ]
>> #pragma pop_options
>>
>>
>> 2- Similar to #1, but with __attribute__ in each function declaration. I
>> think I prefer #1 since it's simpler for the user to specify.
>>
>>
>> This would also let us generate smaller debug binaries, since the bracketed
>> functions would not get any debug info at all.
>>
>> Any reason why that scheme couldn't work?  It works well for separate TUs.
>
> If you crash inside those -g0 marked functions, what happens?

You don't see the body of the function, but you can go up into the
exact call site.

> Why
> not use the artificial attribute on them instead?  At least what is documented
> is exactly what we want (well, at least it seems to me).

Yes, I forgot to mention in my reply.  I tried it, but you still step
into the functions.  If this is a bug with the attribute, then it
could be fixed.

> Thus, produce a patch that improves tree_operand_length with either
> approach so we can test it.

I have a slight preference for making these functions totally opaque
to the debugger.  But may be there is a setting we can use that will
not affect step and still give us some debug info in the presence of
crashes.

Tom, Cary, Ian, any suggestions?  We are trying to figure out a
compromise for tiny inline functions that are generally a nuisance
when debugging.  The scenario is a call like this: big_function_fo

Re: Modify gcc for use with gdb (issue5132047)

2011-10-05 Thread Jakub Jelinek
On Wed, Oct 05, 2011 at 10:10:44AM -0400, Diego Novillo wrote:
> > Why
> > not use the artificial attribute on them instead?  At least what is 
> > documented
> > is exactly what we want (well, at least it seems to me).
> 
> Yes, I forgot to mention in my reply.  I tried it, but you still step
> into the functions.  If this is a bug with the attribute, then it
> could be fixed.

If it doesn't work, then something should be done about it in gdb.
Hiding the inlines altogether from the debugger is undesirable, it is
better if the debugger has choice what to do, still the default should
be that it ignores the artificial inlines in the backtraces as well as when
stepping through, e.g. glibc headers use artificial attribute heavily
for inlines which are of no interest to users to step through too.

Jakub


Re: Modify gcc for use with gdb (issue5132047)

2011-10-05 Thread Ian Lance Taylor
Diego Novillo  writes:

>> It is much more important to optimize my debugging time as experienced
>> developer resources are more scarce than some random unexperienced
>> guy that happens to dig into GCC.
>>
>> ;)
>>
>> or not really ;).
>
> You are being facetious, I hope.  Part of the reason that experienced
> developers are scarce is precisely because dealing with GCC's code
> base is so daunting.  We should be trying to attract those random
> inexperienced developers, not scare them away.
>
> The experienced developers will retire, eventually.  Who is going to
> replace them?

Yes.  We experienced gcc developers can adapt to the tools, and we can
modify the tools to work better.  We should not hold up code
improvements because of tool deficiencies.  We should fix both problems,
but we don't have to fix them in sequence.


> Tom, Cary, Ian, any suggestions?  We are trying to figure out a
> compromise for tiny inline functions that are generally a nuisance
> when debugging.  The scenario is a call like this: big_function_foo
> (inlined_f (x), inlined_g (y));
> We want to use 's' to step inside the call to big_function_foo(), but
> we don't want to step into either inlined_f() or inlined_g().

This is a general problem when debugging C++ programs, so any solution
to this gcc-specific issue will be of general use.  However, I don't
know of a way to make it work today without changing gdb.  There is a
lot I don't know about gdb, so it is possible that there is some
approach that will work.

Basically, gdb's relatively new support for debugging inline functions
is sometimes nice, but sometimes it just gets in the way.  I guess the
most general solution is a way to mark the inline function in the source
as uninteresting.  I don't really understand the doc for the
"artificial" function attribute, but it looks like it was intended to
serve this purpose.  So it seems to me that there is a bug in gdb:
"step" should not step into a function with the "artificial" attribute.
I agree that it does not currently work that way.  Looking at the gdb
sources, it seems to me that it currently ignores DW_AT_artificial on an
ordinary function.

Ian


Re: Modify gcc for use with gdb (issue5132047)

2011-10-05 Thread Richard Guenther
On Wed, Oct 5, 2011 at 4:10 PM, Diego Novillo  wrote:
> On Wed, Oct 5, 2011 at 09:45, Richard Guenther
>  wrote:
>> On Wed, Oct 5, 2011 at 3:18 PM, Diego Novillo  wrote:
>>> On 11-09-27 03:37 , Richard Guenther wrote:

 On Tue, Sep 27, 2011 at 9:35 AM, Richard Guenther
   wrote:
>
> On Tue, Sep 27, 2011 at 9:14 AM, Jakub Jelinek  wrote:
>>
>> On Mon, Sep 26, 2011 at 03:05:00PM -0700, Lawrence Crowl wrote:
>>>
>>> There a non-transparent change in behavior that may affect some users.
>>> The inline functions will introduce additional lines in a sequence of
>>> gdb 'step' commands.  Use 'next' instead.
>>
>> That is IMHO a serious obstackle.  If anything, the inlines should
>> use always_inline, artificial attributes, but don't know if GDB treats
>> them
>> already specially and doesn't step into them with step.
>> Also, I'm afraid it will significantly grow cc1plus/cc1 debug info.
>>
>> The .gdbinit macro redefinition Paolo posted sounds to be a better
>> approach.
>
> Yeah, I already see me typing sfinish  gazillion of times
> when
> trying to step into a function call that produces a function argument
> ... there is
> already the very very very annoying tree_code_length function that you
> get for
> each TREE_OPERAND (...) macro on function arguments.  I'd be very opposed
> to any change that makes this situation worse.

 tree_operand_length actually.  Please produce a patch that makes this
 function
 transparent to gdb, then I might be convinced converting the other macros
 to
 such function might be worthwhile.

 Thanks,
 Richard.
>>>
>>> I think we need to find a solution for this situation.  The suggestions I
>>> see in this thread are nothing but workarounds to cope with current
>>> debugging limitations.  We have learned to live with them and hack around
>>> them, but debugging GCC is already a daunting task, and I think we can
>>> improve it.
>>>
>>> Richard, yes, stepping into one-liner inline functions can be aggravating
>>> (particularly with some of the 2-3 embedded function calls we have in some
>>> places).  However, the ability to easily inspect state of data structures
>>> using the standard accessors is fundamental. Particularly, for developers
>>> that may not have the extensive knowledge of internals that you do.
>>
>> It is much more important to optimize my debugging time as experienced
>> developer resources are more scarce than some random unexperienced
>> guy that happens to dig into GCC.
>>
>> ;)
>>
>> or not really ;).
>
> You are being facetious, I hope.  Part of the reason that experienced
> developers are scarce is precisely because dealing with GCC's code
> base is so daunting.  We should be trying to attract those random
> inexperienced developers, not scare them away.
>
> The experienced developers will retire, eventually.  Who is going to
> replace them?
>
>
>>> Jakub, is size of a debuggable cc1/cc1plus really all that important? Yes,
>>> more debug info makes for bigger binaries.  But when debugging, that hardly
>>> matters.  Unless we were talking about a multi-Gb binary, which we aren't.
>>>
>>> I would like to solve this problem for all inline functions that we may not
>>> care to step into.  There is a whole bunch of one-liners that generally
>>> annoy me: tree_operand_length, gimple_op, ... in fact, all the gimple
>>> accessors, etc.
>>>
>>> How about one of these two ideas?
>>>
>>> 1- Add -g to the list of supported settings in #pragma GCC optimize (or
>>> create a new #pragma GCC no_debug).  This way, we can brace all these one
>>> liners with:
>>>
>>> #pragma push_options
>>> #pragma GCC optimize ("-g0")
>>> [ ... inline functions ... ]
>>> #pragma pop_options
>>>
>>>
>>> 2- Similar to #1, but with __attribute__ in each function declaration. I
>>> think I prefer #1 since it's simpler for the user to specify.
>>>
>>>
>>> This would also let us generate smaller debug binaries, since the bracketed
>>> functions would not get any debug info at all.
>>>
>>> Any reason why that scheme couldn't work?  It works well for separate TUs.
>>
>> If you crash inside those -g0 marked functions, what happens?
>
> You don't see the body of the function, but you can go up into the
> exact call site.
>
>> Why
>> not use the artificial attribute on them instead?  At least what is 
>> documented
>> is exactly what we want (well, at least it seems to me).
>
> Yes, I forgot to mention in my reply.  I tried it, but you still step
> into the functions.  If this is a bug with the attribute, then it
> could be fixed.

Did you also mark the function with always_inline?  That's a requirement
as artificial only works for inlined function bodies.

>> Thus, produce a patch that improves tree_operand_length with either
>> approach so we can test it.
>
> I have a slight preference for making these functions totally opaque
> to the debugger.  But may

Re: [PATCH] Add support for more sparc VIS 3.0 instructions.

2011-10-05 Thread Marc Glisse

On Tue, 4 Oct 2011, David Miller wrote:


There are only a few VIS 3.0 instructions left after this,


Hello,

I am happy to see all this work, but I was wondering: are these 
instructions documented somewhere? It makes sense for gcc to only provide 
a list in extend.texi, but I couldn't find the VIS 3.0 documentation on 
the Oracle website, which makes it rather hard to use those builtins.



+int64_t __builtin_vis_umulxhi (int64_t, int64_t);


'u' looks like "unsigned", I guess that doesn't matter for a builtin?

--
Marc Glisse


Commit: RX: Add PID support

2011-10-05 Thread Nick Clifton
Hi Guys,

  I am applying the attached patch to add support for position
  independent data to the RX backend.  When this mode is enabled
  constant data is referenced via an offset from a base address held in
  a fixed register.  This allows the position of this data to be
  determined at run-time, rather than link-time, and without the
  overhead of storing relocations in the executable image.

  The code was written by DJ Delorie for Renesas and it is now being
  contributed back to the FSF.

Cheers
  Nick

gcc/ChangeLog
2011-10-05  DJ Delorie  
Nick Clifton  

* config/rx/rx.opt (mpid): Define.
* config/rx/t-rx (MULTILIB_OPTIONS): Add -mpid
(MULTILIB_DIRNAMES): Add pid.
* config/rx/rx.c (rx_gp_base_regnum_val, rx_pid_base_regnum_val)
(rx_num_interrupt_regs): New variable.
(rx_gp_base_regnum): New function.  Returns the number of the
small data area register.
(rx_pid_base_regnum): New function.  Returns the number of the pid
base register.
(rx_decl_for_addr): New function.  Returns the symbolic part of a
MEM.
(rx_pid_data_operand): New function.  Returns whether an object is
in the position independent data area.
(rx_legitimize_address): New function.  Puts undecided PID
objects in the PID data area.
(rx_is_legitimate_address): Add support for PID operands.
(rx_print_operand_address): Likewise.
(rx_print_operand): Likewise.
(rx_maybe_pidify_operand): New function.  Determine if an operand
is suitable for PID addressing.
(rx_gen_move_template): Add PID support.
(rx_conditional_register_usage): Likewise.
(rx_option_override): Initialise rx_num_interrupt_regs.
(rx_is_legitimate_constant): Add support for PID constants.
(TARGET_LEGITIMIZE_ADDRESS): Define.
* config/rx/constraints.md (Rpid): Define.
(Rpda): Define.
* config/rx/rx.md (UNSPEC_PID_ADDR): Define.
(tablejump): Add PID support.
(mov<>): Likewise.
(mov<>_internal): Likewise.
(addsi3): Convert to an expander.  Add PID support.
(pid_addr): New pattern.
* config/rx/rx.h (CPP_SPEC): Define.
(ASM_SPEC): Pass -mpid and -mint-register on to assembler.
(CASE_VECTOR_PC_RELATIVE): Define.
(JUMP_TABLES_IN_TEXT_SECTION): Enable for PID mode.
* config/rx/rx-protos.h (rx_maybe_pidify_operand): Prototype.
* doc/invoke.texi (RX Options): Document -mpid command line
option.



rx.gcc.pid.patch.bz2
Description: BZip2 compressed data


Re: Initial shrink-wrapping patch

2011-10-05 Thread Richard Guenther
On Wed, Oct 5, 2011 at 12:29 AM, Richard Henderson  wrote:
> On 10/04/2011 03:10 PM, Bernd Schmidt wrote:
>>       * doc/invoke.texi (-fshrink-wrap): Document.
>>       * opts.c (default_options_table): Add it.
>>       * common.opt (fshrink-wrap): Add.
>>       * function.c (emit_return_into_block): Remove useless declaration.
>>       (record_hard_reg_uses_1, record_hard_reg_uses, frame_required_for_rtx,
>>       requires_stack_frame_p, gen_return_pattern): New static functions.
>>       (emit_return_into_block): New arg simple_p.  All callers changed.
>>       Use gen_return_pattern.
>>       (thread_prologue_and_epilogue_insns): Implement shrink-wrapping.
>>       * config/i386/i386.md (return): Expand into a simple_return.
>>       (simple_return): New expander):
>>       (simple_return_internal, simple_return_internal_long,
>>       simple_return_pop_internal_long, simple_return_indirect_internal):
>>       Renamed from return_internal, return_internal_long,
>>       return_pop_internal_long and return_indirect_internal; changed to use
>>       simple_return.
>>       * config/i386/i386.c (ix86_expand_epilogue): Adjust to expand
>>       simple returns.
>>       (ix86_pad_returns): Likewise.
>>       * function.h (struct rtl_data): Add member shrink_wrapped.
>>       * cfgcleanup.c (outgoing_edges_match): If shrink-wrapped, edges that
>>       are not jumps or sibcalls can't be compared.
>>
>>       * gcc.target/i386/sw-1.c: New test.
>
> Ok.
>
> As a followup, I think this option needs to be disabled for profiling
> and profile_after_prologue.  Should be a mere matter of frobbing the
> options at startup.

This breaks bootstrap on x86_64-linux.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50621

Richard.

>
> r~
>


Re: Fix pr50607 bconstp-3.c failure

2011-10-05 Thread Joseph S. Myers
On Wed, 5 Oct 2011, Artem Shinkarov wrote:

> On Tue, Oct 4, 2011 at 11:51 PM, Joseph S. Myers
>  wrote:
> > On Tue, 4 Oct 2011, Artem Shinkarov wrote:
> >
> >> Hi
> >>
> >> Here is the patch tho fix bconstp-3.c failure in the bug 50607. The
> >> failure was cause because the new parser routine did not consider
> >> original_tree_code of the expression.
> >>
> >> The patch is bootstrapped on x86-unknown-linux-gnu and is being tested.
> >
> > Please repost the patch for review without the unrelated whitespace
> > changes.
> >
> > --
> > Joseph S. Myers
> > jos...@codesourcery.com
> >
> 
> Sure. The new version is in the attachment.

This version is OK, provided that it has passed bootstrap with no 
regressions and that it fixes the failures in question.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Modify gcc for use with gdb (issue5132047)

2011-10-05 Thread Tom Tromey
> "Diego" == Diego Novillo  writes:

Diego> Tom, Cary, Ian, any suggestions?  We are trying to figure out a
Diego> compromise for tiny inline functions that are generally a nuisance
Diego> when debugging.  The scenario is a call like this: big_function_foo
Diego> (inlined_f (x), inlined_g (y));
Diego> We want to use 's' to step inside the call to big_function_foo(), but
Diego> we don't want to step into either inlined_f() or inlined_g().

FWIW, I wrote the "macro define" stuff that Paolo posted back when I was
actively hacking on gcc.  I consider it to be mildly superior to the
inline approach, because it circumvents the runtime type checking --
this is a plus because it means that if I type the wrong thing to gdb it
doesn't cause cc1 to abort.  At least, this is a plus for me, since I
make mistakes like that with reasonable frequency.

Diego> I proposed extending #pragma GCC options to bracket these functions
Diego> with -g0.  This would help reduce the impact of debug info size.

I think this is fixing the wrong component: it means making a
one-size-fits-all decision in the gcc build, instead of just making the
debugger be more flexible.

If you want to pursue the inline function approach, I suggest
resurrecting this gdb patch:

http://sourceware.org/bugzilla/show_bug.cgi?id=8287
http://sourceware.org/ml/gdb-patches/2010-06/msg00417.html

Then you could add the appropriate blacklisting commands to gcc's
.gdbinit by default.

Tom


Re: Modify gcc for use with gdb (issue5132047)

2011-10-05 Thread Diego Novillo
On Wed, Oct 5, 2011 at 10:51, Richard Guenther
 wrote:

> Did you also mark the function with always_inline?  That's a requirement
> as artificial only works for inlined function bodies.

Yeah.  It doesn't quite work as I expect it to.  It steps into the
function at odd places.


Diego.


Re: Modify gcc for use with gdb (issue5132047)

2011-10-05 Thread Paolo Carlini

On 10/05/2011 05:27 PM, Tom Tromey wrote:
FWIW, I wrote the "macro define" stuff that Paolo posted back when I 
was actively hacking on gcc.
Yes, thanks Tom! Actually, I suspected that, but couldn't remember where 
I actually got it from, maybe you posted it on a public discussion 
thread somewhere...


Paolo.


Re: Modify gcc for use with gdb (issue5132047)

2011-10-05 Thread Diego Novillo
On Wed, Oct 5, 2011 at 11:27, Tom Tromey  wrote:

> Diego> I proposed extending #pragma GCC options to bracket these functions
> Diego> with -g0.  This would help reduce the impact of debug info size.
>
> I think this is fixing the wrong component: it means making a
> one-size-fits-all decision in the gcc build, instead of just making the
> debugger be more flexible.

That's true.

> If you want to pursue the inline function approach, I suggest
> resurrecting this gdb patch:
>
>    http://sourceware.org/bugzilla/show_bug.cgi?id=8287
>    http://sourceware.org/ml/gdb-patches/2010-06/msg00417.html
>
> Then you could add the appropriate blacklisting commands to gcc's
> .gdbinit by default.

I think this could work.  I'm not sure I like the idea of having to
specify all these blacklist commands, but I appreciate how it can make
debugging more flexible.

Is this patch stalled?  the last I see is Justin's reply to review
feedback, but no indication of whether it will be accepted.

Richi, Jakub, Lawrence, would you be OK with this approach?  IIUC,
this means we'd have to add a bunch of blacklist commands to
gcc/gdbinit.in.

This does ties us to gdb, but I don't think that's a problem.

How does this work with C++?  Do the functions need to be specified
with their mangled names?


Diego.


RFC: ARM: Add comments to emitted .eabi_attribute directives

2011-10-05 Thread Nick Clifton
Hi Richard, Hi Paul, Hi Ramana,

  Well my idea to have a header file containing all of the possible
  binary file attributes does not seem to have taken off, so here is a
  much simpler patch for the GCC ARM backend.  It just adds comments to
  the .eabi_directives (when -dA or -fverbose-asm are in effect) emitted
  by GCC.  It does not have any dependencies upon header files, nor does
  it attempt to make use of the symbolic names in the actual
  .eabi_directives.

  Any objections to this version of the patch ?

Cheers
  Nick

gcc/ChangeLog
2011-10-05  Nick Clifton  

* config/arm/arm.c (EMIT_EABI_ATTRIBUTE): New macro.  Used to
emit a .eabi_attribute assembler directive, possibly with a
comment attached.
(asm_file_start): Use the new macro.


Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c(revision 179554)
+++ gcc/config/arm/arm.c(working copy)
@@ -22243,6 +22243,21 @@
 asm_fprintf (stream, "%U%s", name);
 }
 
+/* This macro is used to emit an EABI tag and its associated value.
+   We emit the numerical value of the tag in case the assembler does not
+   support textual tags.  (Eg gas prior to 2.20).  If requested we include
+   the tag name in a comment so that anyone reading the assembler output
+   will know which tag is being set.  */
+#define EMIT_EABI_ATTRIBUTE(NAME,NUM,VAL)  \
+  do   \
+{  \
+  asm_fprintf (asm_out_file, "\t.eabi_attribute %d, %d", NUM, VAL); \
+  if (flag_verbose_asm || flag_debug_asm)  \
+   asm_fprintf (asm_out_file, "\t%s " #NAME, ASM_COMMENT_START);   \
+  asm_fprintf (asm_out_file, "\n");
\
+}  \
+  while (0)
+
 static void
 arm_file_start (void)
 {
@@ -22274,9 +22289,9 @@
  if (arm_fpu_desc->model == ARM_FP_MODEL_VFP)
{
  if (TARGET_HARD_FLOAT)
-   asm_fprintf (asm_out_file, "\t.eabi_attribute 27, 3\n");
+   EMIT_EABI_ATTRIBUTE (Tag_ABI_HardFP_use, 27, 3);
  if (TARGET_HARD_FLOAT_ABI)
-   asm_fprintf (asm_out_file, "\t.eabi_attribute 28, 1\n");
+   EMIT_EABI_ATTRIBUTE (Tag_ABI_VFP_args, 28, 1);
}
}
   asm_fprintf (asm_out_file, "\t.fpu %s\n", fpu_name);
@@ -22285,31 +22300,24 @@
  are used.  However we don't have any easy way of figuring this out.
 Conservatively record the setting that would have been used.  */
 
-  /* Tag_ABI_FP_rounding.  */
   if (flag_rounding_math)
-   asm_fprintf (asm_out_file, "\t.eabi_attribute 19, 1\n");
+   EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_rounding, 19, 1);
+
   if (!flag_unsafe_math_optimizations)
{
- /* Tag_ABI_FP_denomal.  */
- asm_fprintf (asm_out_file, "\t.eabi_attribute 20, 1\n");
- /* Tag_ABI_FP_exceptions.  */
- asm_fprintf (asm_out_file, "\t.eabi_attribute 21, 1\n");
+ EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_denormal, 20, 1);
+ EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_exceptions, 21, 1);
}
-  /* Tag_ABI_FP_user_exceptions.  */
   if (flag_signaling_nans)
-   asm_fprintf (asm_out_file, "\t.eabi_attribute 22, 1\n");
-  /* Tag_ABI_FP_number_model.  */
-  asm_fprintf (asm_out_file, "\t.eabi_attribute 23, %d\n", 
-  flag_finite_math_only ? 1 : 3);
+   EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_user_exceptions, 22, 1);
 
-  /* Tag_ABI_align8_needed.  */
-  asm_fprintf (asm_out_file, "\t.eabi_attribute 24, 1\n");
-  /* Tag_ABI_align8_preserved.  */
-  asm_fprintf (asm_out_file, "\t.eabi_attribute 25, 1\n");
-  /* Tag_ABI_enum_size.  */
-  asm_fprintf (asm_out_file, "\t.eabi_attribute 26, %d\n",
-  flag_short_enums ? 1 : 2);
+  EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_number_model, 23,
+  flag_finite_math_only ? 1 : 3);
 
+  EMIT_EABI_ATTRIBUTE (Tag_ABI_align8_needed, 24, 1);
+  EMIT_EABI_ATTRIBUTE (Tag_ABI_align8_preserved, 25, 1);
+  EMIT_EABI_ATTRIBUTE (Tag_ABI_enum_size, 26, flag_short_enums ? 1 : 2);
+
   /* Tag_ABI_optimization_goals.  */
   if (optimize_size)
val = 4;
@@ -22319,16 +22327,12 @@
val = 1;
   else
val = 6;
-  asm_fprintf (asm_out_file, "\t.eabi_attribute 30, %d\n", val);
+  EMIT_EABI_ATTRIBUTE (Tag_ABI_optimization_goals, 30, val);
 
-  /* Tag_CPU_unaligned_access.  */
-  asm_fprintf (asm_out_file, "\t.eabi_attribute 34, %d\n",
-  unaligned_access);
+  EMIT_EABI_ATTRIBUTE (Tag_CPU_unaligned_access, 34, unaligned_access);
 
-  /* Tag_ABI_FP_16bit_format.  */
   if (arm_fp16_format)
-   asm_fprintf (asm_out_file, "\t.eabi_att

Re: Modify gcc for use with gdb (issue5132047)

2011-10-05 Thread Jakub Jelinek
On Wed, Oct 05, 2011 at 11:42:51AM -0400, Diego Novillo wrote:
> Richi, Jakub, Lawrence, would you be OK with this approach?  IIUC,
> this means we'd have to add a bunch of blacklist commands to
> gcc/gdbinit.in.

I don't mind if it goes into gdb, but IMHO the blacklisting should
definitely default to blacklisting DW_AT_artificial inline functions
(and allowing to unblacklist them), because the artificial attribute
has been designed for that purpose already more than 4 years ago
and many headers use it.

Jakub


Re: Initial shrink-wrapping patch

2011-10-05 Thread Bernd Schmidt
On 10/05/11 17:13, Richard Guenther wrote:
> On Wed, Oct 5, 2011 at 12:29 AM, Richard Henderson  wrote:
>> On 10/04/2011 03:10 PM, Bernd Schmidt wrote:
>>>   * doc/invoke.texi (-fshrink-wrap): Document.
>>>   * opts.c (default_options_table): Add it.
>>>   * common.opt (fshrink-wrap): Add.
>>>   * function.c (emit_return_into_block): Remove useless declaration.
>>>   (record_hard_reg_uses_1, record_hard_reg_uses, frame_required_for_rtx,
>>>   requires_stack_frame_p, gen_return_pattern): New static functions.
>>>   (emit_return_into_block): New arg simple_p.  All callers changed.
>>>   Use gen_return_pattern.
>>>   (thread_prologue_and_epilogue_insns): Implement shrink-wrapping.
>>>   * config/i386/i386.md (return): Expand into a simple_return.
>>>   (simple_return): New expander):
>>>   (simple_return_internal, simple_return_internal_long,
>>>   simple_return_pop_internal_long, simple_return_indirect_internal):
>>>   Renamed from return_internal, return_internal_long,
>>>   return_pop_internal_long and return_indirect_internal; changed to use
>>>   simple_return.
>>>   * config/i386/i386.c (ix86_expand_epilogue): Adjust to expand
>>>   simple returns.
>>>   (ix86_pad_returns): Likewise.
>>>   * function.h (struct rtl_data): Add member shrink_wrapped.
>>>   * cfgcleanup.c (outgoing_edges_match): If shrink-wrapped, edges that
>>>   are not jumps or sibcalls can't be compared.
>>>
>>>   * gcc.target/i386/sw-1.c: New test.
>>
>> Ok.
>>
>> As a followup, I think this option needs to be disabled for profiling
>> and profile_after_prologue.  Should be a mere matter of frobbing the
>> options at startup.
> 
> This breaks bootstrap on x86_64-linux.
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50621

Caused by the x86_64 expand_epilogue not generating REG_CFA_RESTORE
notes, and in another case by queuing but not emitting them.

Bootstrapping the following now. Ok? (Alternatively, could keep the
redzone logic, but make it depend on !flag_shrink_wrap).


Bernd
* config/i386/i386.c (ix86_add_cfa_restore_note): Lose CFA_OFFSET
argument.  All callers changed.  Always emit a note.
(ix86_expand_epilogue): Ensure queued cfa_adjust notes are attached
to an insn.

Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c  (revision 179553)
+++ gcc/config/i386/i386.c  (working copy)
@@ -9126,17 +9126,11 @@ ix86_emit_save_sse_regs_using_mov (HOST_
 static GTY(()) rtx queued_cfa_restores;
 
 /* Add a REG_CFA_RESTORE REG note to INSN or queue them until next stack
-   manipulation insn.  The value is on the stack at CFA - CFA_OFFSET.
-   Don't add the note if the previously saved value will be left untouched
-   within stack red-zone till return, as unwinders can find the same value
-   in the register and on the stack.  */
+   manipulation insn.  */
 
 static void
-ix86_add_cfa_restore_note (rtx insn, rtx reg, HOST_WIDE_INT cfa_offset)
+ix86_add_cfa_restore_note (rtx insn, rtx reg)
 {
-  if (cfa_offset <= cfun->machine->fs.red_zone_offset)
-return;
-
   if (insn)
 {
   add_reg_note (insn, REG_CFA_RESTORE, reg);
@@ -10298,7 +10292,7 @@ ix86_emit_restore_reg_using_pop (rtx reg
   struct machine_function *m = cfun->machine;
   rtx insn = emit_insn (gen_pop (reg));
 
-  ix86_add_cfa_restore_note (insn, reg, m->fs.sp_offset);
+  ix86_add_cfa_restore_note (insn, reg);
   m->fs.sp_offset -= UNITS_PER_WORD;
 
   if (m->fs.cfa_reg == crtl->drap_reg
@@ -10383,8 +10377,7 @@ ix86_emit_leave (void)
   add_reg_note (insn, REG_CFA_DEF_CFA,
plus_constant (stack_pointer_rtx, m->fs.sp_offset));
   RTX_FRAME_RELATED_P (insn) = 1;
-  ix86_add_cfa_restore_note (insn, hard_frame_pointer_rtx,
-m->fs.fp_offset);
+  ix86_add_cfa_restore_note (insn, hard_frame_pointer_rtx);
 }
 }
 
@@ -10421,7 +10414,7 @@ ix86_emit_restore_regs_using_mov (HOST_W
m->fs.drap_valid = true;
  }
else
- ix86_add_cfa_restore_note (NULL_RTX, reg, cfa_offset);
+ ix86_add_cfa_restore_note (NULL_RTX, reg);
 
cfa_offset -= UNITS_PER_WORD;
   }
@@ -10446,7 +10439,7 @@ ix86_emit_restore_sse_regs_using_mov (HO
set_mem_align (mem, 128);
emit_move_insn (reg, mem);
 
-   ix86_add_cfa_restore_note (NULL_RTX, reg, cfa_offset);
+   ix86_add_cfa_restore_note (NULL_RTX, reg);
 
cfa_offset -= 16;
   }
@@ -10738,6 +10731,8 @@ ix86_expand_epilogue (int style)
 GEN_INT (m->fs.sp_offset - UNITS_PER_WORD),
 style, true);
 }
+  else
+ix86_add_queued_cfa_restore_notes (get_last_insn ());
 
   /* Sibcall epilogues don't want a return instruction.  */
   if (style == 0)


Ping! Re: [RFA/ARM][Patch 01/02]: Thumb2 epilogue in RTL

2011-10-05 Thread Sameera Deshpande
Ping!

http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01854.html

On Wed, 2011-09-28 at 17:15 +0100, Sameera Deshpande wrote:
> Hi!
> 
> This patch generates Thumb2 epilogues in RTL form.
> 
> The work involves defining new functions, predicates and patterns along with
> few changes in existing code:
> * The load_multiple_operation predicate was found to be too restrictive for
> integer loads as it required consecutive destination regs, so this
> restriction was lifted.
> * Variations of load_multiple_operation were required to handle cases 
>- where SP must be the base register 
>- where FP values were being loaded (which do require consecutive
> destination registers)
>- where PC can be in register-list (which requires return pattern along
> with register loads).
>   Hence, the common code was factored out into a new function in arm.c and
> parameterised to show 
>- whether consecutive destination regs are needed
>- the data type being loaded 
>- whether the base register has to be SP
>- whether PC is in register-list
> 
> The patch is tested with arm-eabi with no regressions.
> 
> ChangeLog:
> 
> 2011-09-28  Ian Bolton 
> Sameera Deshpande  
>
>* config/arm/arm-protos.h (load_multiple_operation_p): New
> declaration.
>  (thumb2_expand_epilogue): Likewise.
>  (thumb2_output_return): Likewise
>  (thumb2_expand_return): Likewise.
>  (thumb_unexpanded_epilogue): Rename to... 
>  (thumb1_unexpanded_epilogue): ...this 
>* config/arm/arm.c (load_multiple_operation_p): New function. 
>  (thumb2_emit_multi_reg_pop): Likewise.
>  (thumb2_emit_vfp_multi_reg_pop): Likewise.
>  (thumb2_expand_return): Likewise. 
>  (thumb2_expand_epilogue): Likewise. 
>  (thumb2_output_return): Likewise
>  (thumb_unexpanded_epilogue): Rename to...
>  ( thumb1_unexpanded_epilogue): ...this
>* config/arm/arm.md (pop_multiple_with_stack_update): New pattern. 
>  (pop_multiple_with_stack_update_and_return): Likewise.
>  (thumb2_ldr_with_return): Likewise.
>  (floating_point_pop_multiple_with_stack_update): Likewise.
>  (return): Update condition and code for pattern.
>  (arm_return): Likewise.
>  (epilogue_insns): Likewise.
>* config/arm/predicates.md (load_multiple_operation): Update
> predicate.
>  (load_multiple_operation_stack_and_return): New predicate. 
>  (load_multiple_operation_stack): Likewise.
>  (load_multiple_operation_stack_fp): Likewise.
>* config/arm/thumb2.md (thumb2_return): Remove.
>  (thumb2_rtl_epilogue_return): New pattern.
> 
> 
> - Thanks and regards,
>   Sameera D.

-- 




Ping! Re: [RFA/ARM][Patch 02/02]: ARM epilogues in RTL

2011-10-05 Thread Sameera Deshpande
Ping!

http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01855.html

On Wed, 2011-09-28 at 17:15 +0100, Sameera Deshpande wrote:
> Hi!
> 
> This patch generates ARM epilogue in RTL form.
> 
> The work defines new functions and reuses most of the static functions and
> patterns defined in the previous patch (Thumb2 epilogues in RTL) with minor
> changes to handle mode specific details. 
> Hence, this patch depends completely on previous patch.
> 
> It is tested with arm-eabi with no regression.
> 
> ChangeLog:
> 
> 2011-09-28  Sameera Deshpande  
> 
> 
>* config/arm/arm-protos.h (arm_expand_epilogue): New declarations. 
>  (arm_expand_return): Likewise.
>  (thumb2_expand_epilogue): Add new boolean argument. 
>* config/arm/arm.c (print_multi_reg): Remove.
>  (vfp_output_fldmd): Likewise.
> 
>  (arm_output_epilogue): Likewise.
>  (output_return_instruction): Update the function. 
>  (thumb2_emit_multi_reg_pop): Rename to...
>  (arm_emit_multi_reg_pop): ...this 
>  (thumb2_emit_vfp_multi_reg_pop): Rename to...
>  (arm_emit_vfp_multi_reg_pop): ...this
>  (arm_emit_vfp_multi_reg_pop): Add new argument base_reg.
>  (arm_expand_return): New function.
> 
>  (arm_expand_epilogue): Likewise.
>  (thumb2_expand_epilogue): Add new argument is_sibling.
>* config/arm/arm.md (pop_multiple_with_stack_update): Update 
>  condition and code for pattern.
>  (arm_return): Likewise.
>  (pop_multiple_with_stack_update_and_return): Likewise.
>  (floating_point_pop_multiple_with_stack_update): Likewise.
>  (thumb2_ldr_with_return): Rename to...
>  (ldr_with_return): ...this
>  (ldr_with_return): Update condition.
>  (cond_return): Remove.
>  (cond_return_inverted): Likewise.
>  (return): Update code.
>  (epilogue): Likewise. 
>  (sibcall_epilogue): Likewise.
>  (epilogue_insns): Update condition and code.
> 
> 
> - Thanks and regards,
>   Sameera D.

-- 




[PATCH] Fix PR46556 (poor address generation)

2011-10-05 Thread William J. Schmidt
This patch addresses the poor code generation in PR46556 for the
following code:

struct x
{
  int a[16];
  int b[16];
  int c[16];
};

extern void foo (int, int, int);

void
f (struct x *p, unsigned int n)
{
  foo (p->a[n], p->c[n], p->b[n]);
}

Prior to the fix for PR32698, gcc calculated the offset for accessing
the array elements as:  n*4; 64+n*4; 128+n*4.

Following that fix, the offsets are calculated as:  n*4; (n+16)*4; (n
+32)*4.  This led to poor code generation on powerpc64 targets, among
others.

The poor code generation was observed to not occur in loops, as the
IVOPTS code does a good job of lowering these expressions to MEM_REFs.
It was previously suggested that perhaps a general pass to lower memory
accesses to MEM_REFs in GIMPLE would solve not only this, but other
similar problems.  I spent some time looking into various approaches to
this, and reviewing some previous attempts to do similar things.  In the
end, I've concluded that this is a bad idea in practice because of the
loss of useful aliasing information.  In particular, early lowering of
component references causes us to lose the ability to disambiguate
non-overlapping references in the same structure, and there is no simple
way to carry the necessary aliasing information along with the
replacement MEM_REFs to avoid this.  While some performance gains are
available with GIMPLE lowering of memory accesses, there are also
offsetting performance losses, and I suspect this would just be a
continuous source of bug reports into the future.

Therefore the current patch is a much simpler approach to solve the
specific problem noted in the PR.  There are two pieces to the patch:

 * The offending addressing pattern is matched in GIMPLE and transformed
into a restructured MEM_REF that distributes the multiply, so that (n
+32)*4 becomes 4*n+128 as before.  This is done during the reassociation
pass, for reasons described below.  The transformation only occurs in
non-loop blocks, since IVOPTS does a good job on such things within
loops.
 * A tweak is added to the RTL forward-propagator to avoid propagating
into memory references based on a single base register with no offset,
under certain circumstances.  This improves sharing of base registers
for accesses within the same structure and slightly lowers register
pressure.

It would be possible to separate these into two patches if that's
preferred.  I chose to combine them because together they provide the
ideal code generation that the new test cases test for.

I initially implemented the pattern matcher during expand, but I found
that the expanded code for two accesses to the same structure was often
not being CSEd well.  So I moved it back into the GIMPLE phases prior to
DOM to take advantage of its CSE.  To avoid an additional complete pass
over the IL, I chose to piggyback on the reassociation pass.  This
transformation is not technically a reassociation, but it is related
enough to not be a complete wart.

One noob question about this:  It would probably be preferable to have
this transformation only take place during the second reassociation
pass, so the ARRAY_REFs are seen by earlier optimization phases.  Is
there an easy way to detect that it's the second pass without having to
generate a separate pass entry point?

One other general question about the pattern-match transformation:  Is
this an appropriate transformation for all targets, or should it be
somehow gated on available addressing modes on the target processor?

Bootstrapped and regression tested on powerpc64-linux-gnu.  Verified no
performance degradations on that target for SPEC CPU2000 and CPU2006.

I'm looking for eventual approval for trunk after any comments are
resolved.  Thanks!

Bill


2011-10-05  Bill Schmidt  

gcc:

PR rtl-optimization/46556
* fwprop.c (fwprop_bb_aux_d): New struct.
(MEM_PLUS_REGS): New macro.
(record_mem_plus_reg): New function.
(record_mem_plus_regs): Likewise.
(single_def_use_enter_block): Record mem-plus-reg patterns.
(build_single_def_use_links): Allocate aux storage.
(locally_poor_mem_replacement): New function.
(forward_propagate_and_simplify): Call
locally_poor_mem_replacement.
(fwprop_init): Free storage.
* tree.h (copy_ref_info): Expose existing function.
* tree-ssa-loop-ivopts.c (copy_ref_info): Remove static token.
* tree-ssa-reassoc.c (restructure_base_and_offset): New function.
(restructure_mem_ref): Likewise.
(reassociate_bb): Look for opportunities to call
restructure_mem_ref; clean up immediate use lists.

gcc/testsuite:

PR rtl-optimization/46556
* gcc.target/powerpc/ppc-pr46556-1.c: New testcase.
* gcc.target/powerpc/ppc-pr46556-2.c: Likewise.
* gcc.target/powerpc/ppc-pr46556-3.c: Likewise.
* gcc.target/powerpc/ppc-pr46556-4.c: Likewise.
* gcc.dg/tree-ssa/pr46556-1.c: Likewise.
 

Re: Modify gcc for use with gdb (issue5132047)

2011-10-05 Thread Tom Tromey
> "Diego" == Diego Novillo  writes:

Tom>    http://sourceware.org/bugzilla/show_bug.cgi?id=8287

Diego> I think this could work.  I'm not sure I like the idea of having to
Diego> specify all these blacklist commands, but I appreciate how it can make
Diego> debugging more flexible.

Yeah, that's a drawback, since you have to remember to update .gdbinit.

Diego> Is this patch stalled?  the last I see is Justin's reply to review
Diego> feedback, but no indication of whether it will be accepted.

The thread continues in the next month:

http://sourceware.org/ml/gdb-patches/2010-07/threads.html#00318

The patch needed some updates.  I don't know what happened.

Diego> How does this work with C++?  Do the functions need to be specified
Diego> with their mangled names?

I would hope not, but I don't remember any more.

Tom


Re: [RFC] Split -mrecip

2011-10-05 Thread Michael Matz
Hi,

On Mon, 19 Sep 2011, Uros Bizjak wrote:

> Looking at this topic again, I'd propose that x86 adopts approach from 
> rs6000. The rs6000 approach is more extensible, and offers the same 
> flexibility, due to "!".
> 
> So, x86 could have "-mrecip=", with all, default, none, div,
> vec-div, divf, vec-divf, rsqrt, etc ... combinations,

Like so.  Currently regstrapping on x86_64-linux.  Okay if that succeeds?


Ciao,
Michael.

* i386/i386.opt (recip_mask, recip_mask_explicit,
x_recip_mask_explicit): New variables and cl_target member.
(mrecip=): New option.
* i386/i386.h (RECIP_MASK_DIV, RECIP_MASK_SQRT, RECIP_MASK_VEC_DIV,
RECIP_MASK_VEC_SQRT, RECIP_MASK_ALL): New bitmasks.
(TARGET_RECIP_DIV, TARGET_RECIP_SQRT, TARGET_RECIP_VEC_DIV,
TARGET_RECIP_VEC_SQRT): New tests.
* i386/i386.md (divsf3): Check TARGET_RECIP_DIV.
(sqrt2): Check TARGET_RECIP_SQRT.
* i386/sse.md (div3): Check TARGET_RECIP_VEC_DIV.
(sqrt2): Check TARGET_RECIP_VEC_SQRT.
* i386/i386.c (ix86_option_override_internal): Set recip_mask
for -mrecip and -mrecip=options.
(ix86_function_specific_save): Save recip_mask_explicit.
(ix86_function_specific_restore): Restore recip_mask_explicit.

* doc/invoke.texi (ix86 Options): Document the new option.

Index: config/i386/i386.h
===
--- config/i386/i386.h  (revision 179558)
+++ config/i386/i386.h  (working copy)
@@ -2315,6 +2315,18 @@ extern void debug_dispatch_window (int);
((FLAGS) & (IX86_CALLCVT_CDECL | IX86_CALLCVT_STDCALL \
| IX86_CALLCVT_FASTCALL | IX86_CALLCVT_THISCALL))
 
+#define RECIP_MASK_DIV 0x01
+#define RECIP_MASK_SQRT0x02
+#define RECIP_MASK_VEC_DIV 0x04
+#define RECIP_MASK_VEC_SQRT0x08
+#define RECIP_MASK_ALL (RECIP_MASK_DIV | RECIP_MASK_SQRT \
+| RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT)
+
+#define TARGET_RECIP_DIV   ((recip_mask & RECIP_MASK_DIV) != 0)
+#define TARGET_RECIP_SQRT  ((recip_mask & RECIP_MASK_SQRT) != 0)
+#define TARGET_RECIP_VEC_DIV   ((recip_mask & RECIP_MASK_VEC_DIV) != 0)
+#define TARGET_RECIP_VEC_SQRT  ((recip_mask & RECIP_MASK_VEC_SQRT) != 0)
+
 /*
 Local variables:
 version-control: t
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 179558)
+++ config/i386/i386.md (working copy)
@@ -7062,7 +7062,9 @@ (define_expand "divsf3"
   "(TARGET_80387 && X87_ENABLE_ARITH (SFmode))
 || TARGET_SSE_MATH"
 {
-  if (TARGET_SSE_MATH && TARGET_RECIP && optimize_insn_for_speed_p ()
+  if (TARGET_SSE_MATH
+  && TARGET_RECIP_DIV
+  && optimize_insn_for_speed_p ()
   && flag_finite_math_only && !flag_trapping_math
   && flag_unsafe_math_optimizations)
 {
@@ -13438,7 +13440,9 @@ (define_expand "sqrt2"
|| (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)"
 {
   if (mode == SFmode
-  && TARGET_SSE_MATH && TARGET_RECIP && !optimize_function_for_size_p 
(cfun)
+  && TARGET_SSE_MATH
+  && TARGET_RECIP_SQRT
+  && !optimize_function_for_size_p (cfun)
   && flag_finite_math_only && !flag_trapping_math
   && flag_unsafe_math_optimizations)
 {
Index: config/i386/sse.md
===
--- config/i386/sse.md  (revision 179558)
+++ config/i386/sse.md  (working copy)
@@ -779,7 +779,9 @@ (define_expand "div3"
 {
   ix86_fixup_binary_operands_no_copy (DIV, mode, operands);
 
-  if (TARGET_SSE_MATH && TARGET_RECIP && !optimize_insn_for_size_p ()
+  if (TARGET_SSE_MATH
+  && TARGET_RECIP_VEC_DIV
+  && !optimize_insn_for_size_p ()
   && flag_finite_math_only && !flag_trapping_math
   && flag_unsafe_math_optimizations)
 {
@@ -857,7 +859,9 @@ (define_expand "sqrt2"
(sqrt:VF1 (match_operand:VF1 1 "nonimmediate_operand" "")))]
   "TARGET_SSE"
 {
-  if (TARGET_SSE_MATH && TARGET_RECIP && !optimize_insn_for_size_p ()
+  if (TARGET_SSE_MATH
+  && TARGET_RECIP_VEC_SQRT
+  && !optimize_insn_for_size_p ()
   && flag_finite_math_only && !flag_trapping_math
   && flag_unsafe_math_optimizations)
 {
Index: config/i386/i386.opt
===
--- config/i386/i386.opt(revision 179558)
+++ config/i386/i386.opt(working copy)
@@ -31,6 +31,15 @@ HOST_WIDE_INT ix86_isa_flags = TARGET_64
 Variable
 HOST_WIDE_INT ix86_isa_flags_explicit
 
+TargetVariable
+int recip_mask
+
+Variable
+int recip_mask_explicit
+
+TargetSave
+int x_recip_mask_explicit
+
 ;; Definitions to add to the cl_target_option structure
 ;; -march= processor
 TargetSave
@@ -373,6 +382,10 @@ mrecip
 Target Report Mask(RECIP) Save
 Generate reciprocals instead of divss and sqrtss.
 
+mrecip=
+Target Report RejectNegative Joined Var(ix86_recip_name) Save
+Control gene

Re: Modify gcc for use with gdb (issue5132047)

2011-10-05 Thread Tom Tromey
> "Jakub" == Jakub Jelinek  writes:

Jakub> I don't mind if it goes into gdb, but IMHO the blacklisting should
Jakub> definitely default to blacklisting DW_AT_artificial inline functions
Jakub> (and allowing to unblacklist them), because the artificial attribute
Jakub> has been designed for that purpose already more than 4 years ago
Jakub> and many headers use it.

Could you file a gdb bug report for this?

In general I'd appreciate it if folks making debug changes to gcc filed
bug reports against gdb.  I do follow the gcc list looking for things
like this, but I miss things sometimes, and this isn't really the best
way.

Tom


Re: Initial shrink-wrapping patch

2011-10-05 Thread Richard Henderson
On 10/05/2011 08:59 AM, Bernd Schmidt wrote:
> Bootstrapping the following now. Ok? (Alternatively, could keep the
> redzone logic, but make it depend on !flag_shrink_wrap).

It's a good space-saving optimization, that redzone logic.  We
should be able to keep it based on crtl->shrink_wrapped; that
does appear to get set before we actually emit the epilogue.


r~


Re: [PATCH] Fix PR46556 (poor address generation)

2011-10-05 Thread Paolo Bonzini

On 10/05/2011 06:13 PM, William J. Schmidt wrote:

One other general question about the pattern-match transformation:  Is
this an appropriate transformation for all targets, or should it be
somehow gated on available addressing modes on the target processor?

Bootstrapped and regression tested on powerpc64-linux-gnu.  Verified no
performance degradations on that target for SPEC CPU2000 and CPU2006.

I'm looking for eventual approval for trunk after any comments are
resolved.  Thanks!


How do the costs look like for the two transforms you mention in the 
head comment of locally_poor_mem_replacement?


Paolo


Re: Fix pr50607 bconstp-3.c failure

2011-10-05 Thread Artem Shinkarov
On Wed, Oct 5, 2011 at 4:22 PM, Joseph S. Myers  wrote:
> On Wed, 5 Oct 2011, Artem Shinkarov wrote:
>
>> On Tue, Oct 4, 2011 at 11:51 PM, Joseph S. Myers
>>  wrote:
>> > On Tue, 4 Oct 2011, Artem Shinkarov wrote:
>> >
>> >> Hi
>> >>
>> >> Here is the patch tho fix bconstp-3.c failure in the bug 50607. The
>> >> failure was cause because the new parser routine did not consider
>> >> original_tree_code of the expression.
>> >>
>> >> The patch is bootstrapped on x86-unknown-linux-gnu and is being tested.
>> >
>> > Please repost the patch for review without the unrelated whitespace
>> > changes.
>> >
>> > --
>> > Joseph S. Myers
>> > jos...@codesourcery.com
>> >
>>
>> Sure. The new version is in the attachment.
>
> This version is OK, provided that it has passed bootstrap with no
> regressions and that it fixes the failures in question.
>
> --
> Joseph S. Myers
> jos...@codesourcery.com
>

Thanks, I am going to apply it then, when the regtesting is over in
case it would be successfull.

Joseph, is it possible to commit the patch together with the space fixes?


Thanks,
Artem.


Re: Fix pr50607 bconstp-3.c failure

2011-10-05 Thread Joseph S. Myers
On Wed, 5 Oct 2011, Artem Shinkarov wrote:

> Joseph, is it possible to commit the patch together with the space fixes?

You should not commit whitespace fixes to lines not otherwise modified by 
a patch, except in a separate commit that *only* has whitespace or other 
formatting fixes, so that "svn blame" results are more meaningful.  That 
is: commit whitespace fixes (assuming they are genuinely fixes rather than 
making things worse) *separately* in a commit whose commit message makes 
clear it is just whitespace fixes.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Fix PR46556 (poor address generation)

2011-10-05 Thread Steven Bosscher
On Wed, Oct 5, 2011 at 6:13 PM, William J. Schmidt
 wrote:
>        * tree-ssa-loop-ivopts.c (copy_ref_info): Remove static token.

Rather than this, why not move the function to common code somewhere?

Ciao!
Steven


Re: Fix pr50607 bconstp-3.c failure

2011-10-05 Thread Artem Shinkarov
On Wed, Oct 5, 2011 at 5:28 PM, Joseph S. Myers  wrote:
> On Wed, 5 Oct 2011, Artem Shinkarov wrote:
>
>> Joseph, is it possible to commit the patch together with the space fixes?
>
> You should not commit whitespace fixes to lines not otherwise modified by
> a patch, except in a separate commit that *only* has whitespace or other
> formatting fixes, so that "svn blame" results are more meaningful.  That
> is: commit whitespace fixes (assuming they are genuinely fixes rather than
> making things worse) *separately* in a commit whose commit message makes
> clear it is just whitespace fixes.

Ok, I see.  In the given patch all the fixes are mainly auto-generated
and remove the trailing spaces.  But I see your point.

Thanks,
Artem.
> --
> Joseph S. Myers
> jos...@codesourcery.com
>


Re: fix for c++/44473, mangling of decimal types, checked in

2011-10-05 Thread Peter Bergner
On Fri, 2011-09-30 at 10:37 -0700, Janis Johnson wrote:
> Patch http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00625.html was
> approved by Jason last December but I never got around to checking
> it in.  Paolo Carlini said in PR44473 that it was already approved
> and doesn't need a new approval, so I checked it in after a
> bootstrap and regtest of c,c++ for i686-pc-linux-gnu.

Hi Janis,

Thanks for committing this to mainline!  One note though, you asked
for permission to commit this also to the 4.5 release branch and Jason
approved that.  So do you have plans for committing it there too (and to
the 4.6 branch since that was created between when you submitted that
patch and committed it)?  I'd like to see this fixed on both of those
FSF release branches.

Peter






Re: Modify gcc for use with gdb (issue5132047)

2011-10-05 Thread Diego Novillo
On Wed, Oct 5, 2011 at 11:28, Diego Novillo  wrote:
> On Wed, Oct 5, 2011 at 10:51, Richard Guenther
>  wrote:
>
>> Did you also mark the function with always_inline?  That's a requirement
>> as artificial only works for inlined function bodies.
>
> Yeah.  It doesn't quite work as I expect it to.  It steps into the
> function at odd places.

So, I played with this some more with this, and there seems to be some
inconsistency in how these attributes get handled.
http://sourceware.org/bugzilla/show_bug.cgi?id=13263

static inline int foo (int) __attribute__((always_inline,artificial));

static inline int foo (int x)
{
  int y  = x - 3;
  return y;
}

int bar (int y)
{
  return y == 0;
}

main ()
{
  foo (10);
  return bar (foo (3));
}

With GCC 4.7, the stand alone call foo(10) is not ignored by 'step'.
However, the embedded call bar(foo(3)) is ignored as I was expecting.


Diego.


Re: [PATCH] Add an intermediate coverage format for gcov

2011-10-05 Thread Mike Stump
On Oct 5, 2011, at 12:47 AM, Sharad Singhai wrote:
> This patch adds an intermediate coverage format (enabled via 'gcov
> -i'). This is a compact format as it does not require source files.

I don't like any of the tags, I think if you showed them to 100 people that had 
used gcov before, very few of them would be able to figure out the meaning 
without reading the manual.  Why make it completely cryptic?  A binary encoded 
compress format is smaller and just as readable.

SF, sounds like single float to me, I'd propose file.
FN, sounds like filename, I'd propose line.
FNDA, can't even guess at what it means, I'd propose fcount.
BA, I'd propose branch, for 0, notexe, for 1, nottaken, for 2 taken.
DA, I'd propose lcount

I'd propose fcount be listed as fname,ecount, to match the ordering of lcount.  
Also, implicit in the name, is the ordering f, then count, l, then count.

I think if you showed the above to 100 people that had used gcov before, I 
think we'd be up past 90% of the people able to figure it out, without reading 
the doc.


Re: [RFC] Split -mrecip

2011-10-05 Thread Uros Bizjak
On Wed, Oct 5, 2011 at 6:19 PM, Michael Matz  wrote:

>> Looking at this topic again, I'd propose that x86 adopts approach from
>> rs6000. The rs6000 approach is more extensible, and offers the same
>> flexibility, due to "!".
>>
>> So, x86 could have "-mrecip=", with all, default, none, div,
>> vec-div, divf, vec-divf, rsqrt, etc ... combinations,
>
> Like so.  Currently regstrapping on x86_64-linux.  Okay if that succeeds?

OK, with a nit - I'd introduce RECIP_MASK_NONE and use it in place of
0 in a couple of places.

Thanks,
Uros.


Re: [PATCH] Fix PR46556 (poor address generation)

2011-10-05 Thread William J. Schmidt
On Wed, 2011-10-05 at 18:29 +0200, Steven Bosscher wrote:
> On Wed, Oct 5, 2011 at 6:13 PM, William J. Schmidt
>  wrote:
> >* tree-ssa-loop-ivopts.c (copy_ref_info): Remove static token.
> 
> Rather than this, why not move the function to common code somewhere?
> 
> Ciao!
> Steven

An alternative would be to move it into tree-ssa-address.c, where there
is already a simpler version called copy_mem_ref_info.  I'm open to that
if it's preferable.

Bill



Re: Initial shrink-wrapping patch

2011-10-05 Thread Bernd Schmidt
On 10/05/11 18:21, Richard Henderson wrote:
> On 10/05/2011 08:59 AM, Bernd Schmidt wrote:
>> Bootstrapping the following now. Ok? (Alternatively, could keep the
>> redzone logic, but make it depend on !flag_shrink_wrap).
> 
> It's a good space-saving optimization, that redzone logic.  We
> should be able to keep it based on crtl->shrink_wrapped; that
> does appear to get set before we actually emit the epilogue.

I've committed the following after a x86_64-linux bootstrap.


Bernd

PR bootstrap/50621
* config/i386/i386.c (ix86_add_cfa_restore_note): Omit notes only
if the function was not shrink-wrapped.
(ix86_expand_epilogue): Ensure queued cfa_adjust notes are attached
to an insn.
* function.c (thread_prologue_and_epilogue_insns): Make sure the
shrink_wrapped flag is set even if there is no dump file.

Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c  (revision 179553)
+++ gcc/config/i386/i386.c  (working copy)
@@ -9134,7 +9134,8 @@ static GTY(()) rtx queued_cfa_restores;
 static void
 ix86_add_cfa_restore_note (rtx insn, rtx reg, HOST_WIDE_INT cfa_offset)
 {
-  if (cfa_offset <= cfun->machine->fs.red_zone_offset)
+  if (!crtl->shrink_wrapped
+  && cfa_offset <= cfun->machine->fs.red_zone_offset)
 return;
 
   if (insn)
@@ -10738,6 +10739,8 @@ ix86_expand_epilogue (int style)
 GEN_INT (m->fs.sp_offset - UNITS_PER_WORD),
 style, true);
 }
+  else
+ix86_add_queued_cfa_restore_notes (get_last_insn ());
 
   /* Sibcall epilogues don't want a return instruction.  */
   if (style == 0)
Index: gcc/function.c
===
--- gcc/function.c  (revision 179553)
+++ gcc/function.c  (working copy)
@@ -5741,10 +5741,11 @@ thread_prologue_and_epilogue_insns (void
  if (dump_file)
fprintf (dump_file, "Shrink-wrapping aborted due to clobber.\n");
}
-  else if (dump_file && entry_edge != orig_entry_edge)
+  else if (entry_edge != orig_entry_edge)
{
  crtl->shrink_wrapped = true;
- fprintf (dump_file, "Performing shrink-wrapping.\n");
+ if (dump_file)
+   fprintf (dump_file, "Performing shrink-wrapping.\n");
}
 
 fail_shrinkwrap:


Re: Initial shrink-wrapping patch

2011-10-05 Thread Bernd Schmidt
On 10/05/11 00:29, Richard Henderson wrote:
> As a followup, I think this option needs to be disabled for profiling
> and profile_after_prologue.  Should be a mere matter of frobbing the
> options at startup.

The other code seems to test crtl->profile rather than an option flag,
so how's this? Bootstrapped along with the x86_64 fix.


Bernd
* function.c (thread_prologue_and_epilogue_insns): Don't shrink-wrap
if profiling after the prologue.

Index: gcc/function.c
===
--- gcc/function.c  (revision 179553)
+++ gcc/function.c  (working copy)
@@ -5571,6 +5571,7 @@ thread_prologue_and_epilogue_insns (void
   }
   
   if (flag_shrink_wrap && HAVE_simple_return
+  && (targetm.profile_before_prologue () || !crtl->profile)
   && nonempty_prologue && !crtl->calls_eh_return)
 {
   HARD_REG_SET prologue_clobbered, prologue_used, live_on_edge;


[PATCH, i386]: A couple of cleanups.

2011-10-05 Thread Uros Bizjak
Hello!

No functional change.

2011-10-05  Uros Bizjak  

* config/i386/i386.c (distance_non_agu_define): Simplify calculation
of "found".  Simplify return value calculation.
(distance_agu_use): Ditto.

Tested on x86_64-pc-linux-gnu {,-m32} with shrink-wrapping disabled,
committed to mainline SVN.

Uros.
Index: i386.c
===
--- i386.c  (revision 179558)
+++ i386.c  (working copy)
@@ -16139,19 +16139,20 @@ distance_non_agu_define (unsigned int regno1, unsi
 
  FOR_EACH_EDGE (e, ei, bb->preds)
{
- int bb_dist = distance_non_agu_define_in_bb (regno1, regno2,
-  insn, distance,
-  BB_END (e->src),
-  &found_in_bb);
+ int bb_dist
+   = distance_non_agu_define_in_bb (regno1, regno2,
+insn, distance,
+BB_END (e->src),
+&found_in_bb);
  if (found_in_bb)
{
  if (shortest_dist < 0)
shortest_dist = bb_dist;
  else if (bb_dist > 0)
shortest_dist = MIN (bb_dist, shortest_dist);
-   }
 
- found = found || found_in_bb;
+ found = true;
+   }
}
 
  distance = shortest_dist;
@@ -16164,11 +16165,9 @@ distance_non_agu_define (unsigned int regno1, unsi
   extract_insn_cached (insn);
 
   if (!found)
-distance = -1;
-  else
-distance = distance >> 1;
+return -1;
 
-  return distance;
+  return distance >> 1;
 }
 
 /* Return the distance in half-cycles between INSN and the next
@@ -16181,9 +16180,9 @@ distance_non_agu_define (unsigned int regno1, unsi
found and false otherwise.  */
 
 static int
-distance_agu_use_in_bb(unsigned int regno,
-  rtx insn, int distance, rtx start,
-  bool *found, bool *redefined)
+distance_agu_use_in_bb (unsigned int regno,
+   rtx insn, int distance, rtx start,
+   bool *found, bool *redefined)
 {
   basic_block bb = start ? BLOCK_FOR_INSN (start) : NULL;
   rtx next = start;
@@ -16268,18 +16267,19 @@ distance_agu_use (unsigned int regno0, rtx insn)
 
  FOR_EACH_EDGE (e, ei, bb->succs)
{
- int bb_dist = distance_agu_use_in_bb (regno0, insn,
-   distance, BB_HEAD (e->dest),
-   &found_in_bb, 
&redefined_in_bb);
+ int bb_dist
+   = distance_agu_use_in_bb (regno0, insn,
+ distance, BB_HEAD (e->dest),
+ &found_in_bb, &redefined_in_bb);
  if (found_in_bb)
{
  if (shortest_dist < 0)
shortest_dist = bb_dist;
  else if (bb_dist > 0)
shortest_dist = MIN (bb_dist, shortest_dist);
-   }
 
- found = found || found_in_bb;
+ found = true;
+   }
}
 
  distance = shortest_dist;
@@ -16287,11 +16287,9 @@ distance_agu_use (unsigned int regno0, rtx insn)
 }
 
   if (!found || redefined)
-distance = -1;
-  else
-distance = distance >> 1;
+return -1;
 
-  return distance;
+  return distance >> 1;
 }
 
 /* Define this macro to tune LEA priority vs ADD, it take effect when
@@ -16346,7 +16344,7 @@ ix86_lea_outperforms (rtx insn, unsigned int regno
false otherwise.  */
 
 static bool
-ix86_ok_to_clobber_flags(rtx insn)
+ix86_ok_to_clobber_flags (rtx insn)
 {
   basic_block bb = BLOCK_FOR_INSN (insn);
   df_ref *use;


Re: [PATCH] Fix PR46556 (poor address generation)

2011-10-05 Thread William J. Schmidt
On Wed, 2011-10-05 at 18:21 +0200, Paolo Bonzini wrote:
> On 10/05/2011 06:13 PM, William J. Schmidt wrote:
> > One other general question about the pattern-match transformation:  Is
> > this an appropriate transformation for all targets, or should it be
> > somehow gated on available addressing modes on the target processor?
> >
> > Bootstrapped and regression tested on powerpc64-linux-gnu.  Verified no
> > performance degradations on that target for SPEC CPU2000 and CPU2006.
> >
> > I'm looking for eventual approval for trunk after any comments are
> > resolved.  Thanks!
> 
> How do the costs look like for the two transforms you mention in the 
> head comment of locally_poor_mem_replacement?
> 
> Paolo
> 

I don't know off the top of my head -- I'll have to gather that
information.  The issue is that the profitability is really
context-sensitive, so just the isolated costs of insns aren't enough.
The forward propagation of the add into (mem (reg REG)) looks like a
slam dunk in the absence of other information, but if there are other
nearby references using nonzero offsets from REG, this just extends the
lifetimes of X and Y without eliminating the need for REG.



[PATCH, testsuite]: Fix UNRESOLVED: gcc.dg/vect/vec-scal-opt*.c scan-tree-dump-times veclower

2011-10-05 Thread Uros Bizjak
Hello!

2011-10-05  Uros Bizjak  

* gcc.dg/vect/vect.exp (VEC_CFLAGS): Move initialization after
DEFAULT_VECTFLAGS initialization.

Tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN.

Uros.
Index: gcc.dg/vect/vect.exp
===
--- gcc.dg/vect/vect.exp(revision 179561)
+++ gcc.dg/vect/vect.exp(working copy)
@@ -39,15 +39,15 @@
 return
 }
 
-global VEC_FLAGS
-set VEC_FLAGS $DEFAULT_VECTCFLAGS
-
 # These flags are used for all targets.
 lappend DEFAULT_VECTCFLAGS "-ftree-vectorize" "-fno-vect-cost-model"
 
 # Initialize `dg'.
 dg-init
 
+global VEC_FLAGS
+set VEC_FLAGS $DEFAULT_VECTCFLAGS
+
 global O1_VECTCFLAGS
 set O1_VECTCFLAGS $DEFAULT_VECTCFLAGS
 lappend O1_VECTCFLAGS "-O1"


[PATCH 0/3] Fix vector shuffle problems

2011-10-05 Thread Richard Henderson
The first problem is that the generic lowering to scalar implementation
didn't match the documentation in that the shuffle indicies are to be
masked to the range of the inputs.  Or, perhaps more exactly, the generic
lowering didn't match the SSE implementation which does do the masking.
The OpenCL spec from whence this feature is taken does explicitly say
that the masking should happen; our current documentation in extend.texi
isn't quite as clear on this point.

The second problem is that the SSE implementation was broken.  It simply
didn't work much of the time.  This was masked by...

The third problem is that the test cases weren't testing what they were
intended to test.  In particular, with optimization the compiler was 
able to constant-propagate away many of the shuffling and tests.  Which
might be an interesting missed-optimization test, had they actually been
constructed differently.  But what actually needed testing first is that
the operation actually works at all.

Tested on x86_64 with

  check-gcc//unix/{,-mssse3,-msse4}

Hopefully one of the AMD guys can test on a bulldozer with -mxop?

Committed.


r~


Richard Henderson (3):
  Fix lower_vec_shuffle.
  i386: Rewrite ix86_expand_vshuffle.
  Fix vect-shuffle-* test cases.

 gcc/ChangeLog  |   14 +
 gcc/config/i386/i386-protos.h  |2 +-
 gcc/config/i386/i386.c |  208 +++
 gcc/config/i386/sse.md |4 +-
 gcc/testsuite/ChangeLog|   11 +
 .../gcc.c-torture/execute/vect-shuffle-1.c |   98 +---
 .../gcc.c-torture/execute/vect-shuffle-2.c |   96 +---
 .../gcc.c-torture/execute/vect-shuffle-3.c |   90 ---
 .../gcc.c-torture/execute/vect-shuffle-4.c |   99 
 .../gcc.c-torture/execute/vect-shuffle-5.c |  113 
 .../gcc.c-torture/execute/vect-shuffle-6.c |   64 +
 .../gcc.c-torture/execute/vect-shuffle-7.c |   70 +
 .../gcc.c-torture/execute/vect-shuffle-8.c |   55 
 gcc/tree-vect-generic.c|  276 +++-
 14 files changed, 698 insertions(+), 502 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-6.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-7.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-8.c

-- 
1.7.6.4



[PATCH 1/3] Fix lower_vec_shuffle.

2011-10-05 Thread Richard Henderson
1: It can never fail.
2: It should mask the input indicies.
---
 gcc/ChangeLog   |8 ++
 gcc/tree-vect-generic.c |  276 +--
 2 files changed, 107 insertions(+), 177 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 4bdda3d..a88854d 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2011-10-05  Richard Henderson  
+
+   * tree-vect-generic.c (vector_element): Never fail.  Use
+   build_zero_cst.  Tidy up type references.
+   (lower_vec_shuffle): Never fail.  Mask shuffle indicies.  Reduce
+   code duplication.  Do update_stmt here ...
+   (expand_vector_operations_1): ... not here.
+
 2011-10-05  DJ Delorie  
Nick Clifton  
 
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index 1a6d1f1..ef03b35 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -512,22 +512,29 @@ type_for_widest_vector_mode (enum machine_mode 
inner_mode, optab op, int satp)
 static tree
 vector_element (gimple_stmt_iterator *gsi, tree vect, tree idx, tree *ptmpvec)
 {
-  tree type;
+  tree vect_type, vect_elt_type;
   gimple asgn;
   tree tmpvec;
   tree arraytype;
   bool need_asgn = true;
+  unsigned int elements;
 
-  gcc_assert (TREE_CODE (TREE_TYPE (vect)) == VECTOR_TYPE);
+  vect_type = TREE_TYPE (vect);
+  vect_elt_type = TREE_TYPE (vect_type);
+  elements = TYPE_VECTOR_SUBPARTS (vect_type);
 
-  type = TREE_TYPE (vect);
   if (TREE_CODE (idx) == INTEGER_CST)
 {
   unsigned HOST_WIDE_INT index;
 
-  if (!host_integerp (idx, 1)
-   || (index = tree_low_cst (idx, 1)) > TYPE_VECTOR_SUBPARTS (type)-1)
-return error_mark_node;
+  /* Given that we're about to compute a binary modulus,
+we don't care about the high bits of the value.  */
+  index = TREE_INT_CST_LOW (idx);
+  if (!host_integerp (idx, 1) || index >= elements)
+   {
+ index &= elements - 1;
+ idx = build_int_cst (TREE_TYPE (idx), index);
+   }
 
   if (TREE_CODE (vect) == VECTOR_CST)
 {
@@ -536,33 +543,30 @@ vector_element (gimple_stmt_iterator *gsi, tree vect, 
tree idx, tree *ptmpvec)
  for (i = 0; vals; vals = TREE_CHAIN (vals), ++i)
if (i == index)
   return TREE_VALUE (vals);
- return error_mark_node;
+ return build_zero_cst (vect_elt_type);
 }
   else if (TREE_CODE (vect) == CONSTRUCTOR)
 {
   unsigned i;
-  VEC (constructor_elt, gc) *vals = CONSTRUCTOR_ELTS (vect);
-  constructor_elt *elt;
+  tree elt_i, elt_v;
 
-  for (i = 0; VEC_iterate (constructor_elt, vals, i, elt); i++)
-if (operand_equal_p (elt->index, idx, 0))
-  return elt->value;
-  return fold_convert (TREE_TYPE (type), integer_zero_node);
+ FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (vect), i, elt_i, elt_v)
+if (operand_equal_p (elt_i, idx, 0))
+  return elt_v;
+  return build_zero_cst (vect_elt_type);
 }
-  else if (TREE_CODE (vect) == SSA_NAME)
+  else
 {
- tree size = TYPE_SIZE (TREE_TYPE (type));
+ tree size = TYPE_SIZE (vect_elt_type);
   tree pos = fold_build2 (MULT_EXPR, TREE_TYPE (idx), idx, size);
-  return fold_build3 (BIT_FIELD_REF, TREE_TYPE (type), vect, size, 
pos);
+  return fold_build3 (BIT_FIELD_REF, vect_elt_type, vect, size, pos);
 }
-  else
-   return error_mark_node;
 }
 
   if (!ptmpvec)
-tmpvec = create_tmp_var (TREE_TYPE (vect), "vectmp");
+tmpvec = create_tmp_var (vect_type, "vectmp");
   else if (!*ptmpvec)
-tmpvec = *ptmpvec = create_tmp_var (TREE_TYPE (vect), "vectmp");
+tmpvec = *ptmpvec = create_tmp_var (vect_type, "vectmp");
   else
 {
   tmpvec = *ptmpvec;
@@ -576,17 +580,14 @@ vector_element (gimple_stmt_iterator *gsi, tree vect, 
tree idx, tree *ptmpvec)
   gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
 }
 
-  arraytype = build_array_type_nelts (TREE_TYPE (type),
- TYPE_VECTOR_SUBPARTS (TREE_TYPE (vect)));
-
-  return build4 (ARRAY_REF, TREE_TYPE (type),
+  arraytype = build_array_type_nelts (vect_elt_type, elements);
+  return build4 (ARRAY_REF, vect_elt_type,
  build1 (VIEW_CONVERT_EXPR, arraytype, tmpvec),
  idx, NULL_TREE, NULL_TREE);
 }
 
 /* Check if VEC_SHUFFLE_EXPR within the given setting is supported
-   by hardware, or lower it piecewise.  Function returns false when
-   the expression must be replaced with TRAP_RETURN, true otherwise.
+   by hardware, or lower it piecewise.
 
When VEC_SHUFFLE_EXPR has the same first and second operands:
VEC_SHUFFLE_EXPR  the lowered version would be
@@ -597,162 +598,96 @@ vector_element (gimple_stmt_iterator *gsi, tree vect, 
tree idx, tree *ptmpvec)
{mask[0] < len(v0) ? v0[mask[0]] : v1[mask[0]], ...}
V0 and V1 must have the 

[PATCH 2/3] i386: Rewrite ix86_expand_vshuffle.

2011-10-05 Thread Richard Henderson
1: Handle TARGET_XOP.
2: Reduce code duplication.
3: Use ASHIFT instead of MULT for scaling.
4: Fix errors in building convert-to-v16qi indicies.
5: Handle v2di without sse4.1.
---
 gcc/ChangeLog |6 +
 gcc/config/i386/i386-protos.h |2 +-
 gcc/config/i386/i386.c|  208 -
 gcc/config/i386/sse.md|4 +-
 4 files changed, 109 insertions(+), 111 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index a88854d..4b5816d 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -6,6 +6,12 @@
code duplication.  Do update_stmt here ...
(expand_vector_operations_1): ... not here.
 
+   * config/i386/i386.c (ix86_expand_vshuffle): Never fail.  Handle
+   TARGET_XOP.  Fix pshufb constant vector creation.  Reduce code
+   duplication.  Handle V2DI without SSE4.1.
+   * config/i386/i386-protos.h (ix86_expand_vshuffle): Update decl.
+   * config/i386/i386.md (vshuffle): Remove assert for ok.
+
 2011-10-05  DJ Delorie  
Nick Clifton  
 
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 99327ed..0bbfa9b 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -123,7 +123,7 @@ extern bool ix86_expand_int_movcc (rtx[]);
 extern bool ix86_expand_fp_movcc (rtx[]);
 extern bool ix86_expand_fp_vcond (rtx[]);
 extern bool ix86_expand_int_vcond (rtx[]);
-extern bool ix86_expand_vshuffle (rtx[]);
+extern void ix86_expand_vshuffle (rtx[]);
 extern void ix86_expand_sse_unpack (rtx[], bool, bool);
 extern bool ix86_expand_int_addcc (rtx[]);
 extern rtx ix86_expand_call (rtx, rtx, rtx, rtx, rtx, bool);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4c1db3a..80a9e73 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -19236,145 +19236,139 @@ ix86_expand_int_vcond (rtx operands[])
   return true;
 }
 
-bool
+void
 ix86_expand_vshuffle (rtx operands[])
 {
   rtx target = operands[0];
   rtx op0 = operands[1];
   rtx op1 = operands[2];
   rtx mask = operands[3];
-  rtx new_mask, vt, t1, t2, w_vector;
+  rtx vt, vec[16];
   enum machine_mode mode = GET_MODE (op0);
   enum machine_mode maskmode = GET_MODE (mask);
-  enum machine_mode maskinner = GET_MODE_INNER (mode);
-  rtx vec[16];
-  int w, i, j;
-  bool one_operand_shuffle = op0 == op1;
+  int w, e, i;
+  bool one_operand_shuffle = rtx_equal_p (op0, op1);
 
-  gcc_assert ((TARGET_SSSE3 || TARGET_AVX) && GET_MODE_BITSIZE (mode) == 128);
+  gcc_checking_assert (GET_MODE_BITSIZE (mode) == 128);
 
   /* Number of elements in the vector.  */
-  w = GET_MODE_BITSIZE (maskmode) / GET_MODE_BITSIZE (maskinner);
-
-  /* generate w_vector = {w, w, ...}  */
-  for (i = 0; i < w; i++)
-vec[i] = GEN_INT (w);
-  w_vector = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (w, vec));
-
-  /* mask = mask & {w-1, w-1, w-1,...} */
-  for (i = 0; i < w; i++)
-vec[i] = GEN_INT (w - 1);
+  w = GET_MODE_NUNITS (mode);
+  e = GET_MODE_UNIT_SIZE (mode);
 
-  vt = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (w, vec));
-  new_mask = expand_simple_binop (maskmode, AND, mask, vt,
- NULL_RTX, 0, OPTAB_DIRECT);
-
-  /* If the original vector mode is V16QImode, we can just
- use pshufb directly.  */
-  if (mode == V16QImode && one_operand_shuffle)
+  if (TARGET_XOP)
 {
-  t1 = gen_reg_rtx (V16QImode);
-  emit_insn (gen_ssse3_pshufbv16qi3 (t1, op0, new_mask));
-  emit_insn (gen_rtx_SET (VOIDmode, target, t1));
-  return true;
+  /* The XOP VPPERM insn supports three inputs.  By ignoring the 
+one_operand_shuffle special case, we avoid creating another
+set of constant vectors in memory.  */
+  one_operand_shuffle = false;
+
+  /* mask = mask & {2*w-1, ...} */
+  vt = GEN_INT (2*w - 1);
 }
-  else if (mode == V16QImode)
+  else
 {
-  rtx xops[6];
-
-  t1 = gen_reg_rtx (V16QImode);
-  t2 = gen_reg_rtx (V16QImode);
-  emit_insn (gen_ssse3_pshufbv16qi3 (t1, op0, new_mask));
-  emit_insn (gen_ssse3_pshufbv16qi3 (t2, op1, new_mask));
-
-  /* mask = mask & {w, w, ...}  */
-  mask = expand_simple_binop (V16QImode, AND, mask, w_vector,
- NULL_RTX, 0, OPTAB_DIRECT);
-  xops[0] = target;
-  xops[1] = operands[1];
-  xops[2] = operands[2];
-  xops[3] = gen_rtx_EQ (mode, mask, w_vector);
-  xops[4] = t1;
-  xops[5] = t2;
-
-  return ix86_expand_int_vcond (xops);
+  /* mask = mask & {w-1, ...} */
+  vt = GEN_INT (w - 1);
 }
 
-  /* mask = mask * {w, w, ...}  */
-  new_mask = expand_simple_binop (maskmode, MULT, new_mask, w_vector,
- NULL_RTX, 0, OPTAB_DIRECT);
-
-  /* Convert mask to vector of chars.  */
-  new_mask = simplify_gen_subreg (V16QImode, new_mask, maskmode, 0);
-  new_mask = force_reg (V16QImode, new_mask);
-
-  /* Build a helper mask wich we will use in pshufb
- (v4si) --> {0,0

[PATCH 3/3] Fix vect-shuffle-* test cases.

2011-10-05 Thread Richard Henderson
---
 gcc/testsuite/ChangeLog|   11 ++
 .../gcc.c-torture/execute/vect-shuffle-1.c |   98 +++---
 .../gcc.c-torture/execute/vect-shuffle-2.c |   96 +++--
 .../gcc.c-torture/execute/vect-shuffle-3.c |   90 ++--
 .../gcc.c-torture/execute/vect-shuffle-4.c |   99 +-
 .../gcc.c-torture/execute/vect-shuffle-5.c |  113 ++--
 .../gcc.c-torture/execute/vect-shuffle-6.c |   64 +++
 .../gcc.c-torture/execute/vect-shuffle-7.c |   70 
 .../gcc.c-torture/execute/vect-shuffle-8.c |   55 ++
 9 files changed, 482 insertions(+), 214 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-6.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-7.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-8.c

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 4a09d43..eff558f 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,14 @@
+2011-10-05  Richard Henderson  
+
+   * gcc.c-torture/execute/vect-shuffle-1.c: Rewrite.
+   * gcc.c-torture/execute/vect-shuffle-2.c: Rewrite.
+   * gcc.c-torture/execute/vect-shuffle-3.c: Rewrite.
+   * gcc.c-torture/execute/vect-shuffle-4.c: Rewrite.
+   * gcc.c-torture/execute/vect-shuffle-5.c: Rewrite.
+   * gcc.c-torture/execute/vect-shuffle-6.c: New test.
+   * gcc.c-torture/execute/vect-shuffle-7.c: New test.
+   * gcc.c-torture/execute/vect-shuffle-8.c: New test.
+
 2011-10-05  Richard Guenther  
 
PR tree-optimization/38885
diff --git a/gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c 
b/gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c
index 20f0261..3b83636 100644
--- a/gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c
+++ b/gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c
@@ -1,46 +1,68 @@
-#define vector(elcount, type)  \
-__attribute__((vector_size((elcount)*sizeof(type type
+#if __SIZEOF_INT__ == 4
+typedef unsigned int V __attribute__((vector_size(16), may_alias));
 
-#define vidx(type, vec, idx) (*(((type *) &(vec)) + idx))
+struct S
+{
+  V in, mask, out;
+};
 
-#define shufcompare(type, count, vres, v0, mask) \
-do { \
-int __i; \
-for (__i = 0; __i < count; __i++) { \
-if (vidx(type, vres, __i) != vidx(type, v0, vidx(type, mask, __i))) \
-__builtin_abort (); \
-} \
-} while (0)
+struct S tests[] = {
+  {
+{ 0x, 0x, 0x, 0x },
+{ 0, 1, 2, 3 },
+{ 0x, 0x, 0x, 0x },
+  },
+  {
+{ 0x, 0x, 0x, 0x },
+{ 0+1*4, 1+2*4, 2+3*4, 3+4*4 },
+{ 0x, 0x, 0x, 0x },
+  },
+  {
+{ 0x, 0x, 0x, 0x },
+{ 3, 2, 1, 0 },
+{ 0x, 0x, 0x, 0x },
+  },
+  {
+{ 0x, 0x, 0x, 0x },
+{ 0, 3, 2, 1 },
+{ 0x, 0x, 0x, 0x },
+  },
+  {
+{ 0x, 0x, 0x, 0x },
+{ 0, 2, 1, 3 },
+{ 0x, 0x, 0x, 0x },
+  },
+  {
+{ 0x11223344, 0x55667788, 0x99aabbcc, 0xddeeff00 },
+{ 3, 1, 2, 0 },
+{ 0xddeeff00, 0x55667788, 0x99aabbcc, 0x11223344 },
+  },
+  {
+{ 0x11223344, 0x55667788, 0x99aabbcc, 0xddeeff00 },
+{ 0, 0, 0, 0 },
+{ 0x11223344, 0x11223344, 0x11223344, 0x11223344 },
+  },
+  {
+{ 0x11223344, 0x55667788, 0x99aabbcc, 0xddeeff00 },
+{ 1, 2, 1, 2 },
+{ 0x55667788, 0x99aabbcc, 0x55667788, 0x99aabbcc },
+  }
+};
 
+extern void abort(void);
 
-int main (int argc, char *argv[]) {
-/*vector (8, short) v0 = {argc, 1,2,3,4,5,6,7};
-vector (8, short) v1 = {argc, 1,argc,3,4,5,argc,7};
-vector (8, short) v2;
+int main()
+{
+  int i;
 
-vector (8, short) smask = {0,0,1,2,3,4,5,6};
+  for (i = 0; i < sizeof(tests)/sizeof(tests[0]); ++i)
+{
+  V r = __builtin_shuffle(tests[i].in, tests[i].mask);
+  if (__builtin_memcmp(&r, &tests[i].out, sizeof(V)) != 0)
+   abort();
+}
 
-v2 = __builtin_shuffle (v0,  smask);
-shufcompare (short, 8, v2, v0, smask);
-v2 = __builtin_shuffle (v0, v1);
-shufcompare (short, 8, v2, v0, v1);
-v2 = __builtin_shuffle (smask, v0);
-shufcompare (short, 8, v2, smask, v0);*/
-
-vector (4, int) i0 = {argc, 1,2,3};
-vector (4, int) i1 = {argc, 1, argc, 3};
-vector (4, int) i2;
-
-vector (4, int) imask = {0,3,2,1};
-
-/*i2 = __builtin_shuffle (i0, imask);
-shufcompare (int, 4, i2, i0, imask);*/
-i2 = __builtin_shuffle (i0, i1);
-shufcompare (int, 4, i2, i0, i1);
-
-i2 = __builtin_shuffle (imask, i0);
-shufcompare (int, 4, i2, imask, i0);
-
-return 0;
+  return 0;
 }
 
+#endif /* SIZEOF_INT */
diff --git a/gcc/testsuite/gcc.c-torture/execute/vect-

Re: Initial shrink-wrapping patch

2011-10-05 Thread Richard Henderson
On 10/05/2011 10:19 AM, Bernd Schmidt wrote:
>   * function.c (thread_prologue_and_epilogue_insns): Don't shrink-wrap
>   if profiling after the prologue.

Ok.


r~


[PATCH] Fix memory leak in vect_pattern_recog_1

2011-10-05 Thread Jakub Jelinek
Hi!

If vect_recog_func fails (or the other spot where vect_pattern_recog_1
returns early), the vector allocated in the function isn't freed, leading
to memory leak.  But, more importantly, doing a VEC_alloc + VEC_free
num_stmts_in_loop * NUM_PATTERNS times seems to be completely unnecessary,
the following patch allocates just one vector for that purpose in the caller
and only performs VEC_truncate in each call to make it independent from
previous uses of the vector.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-10-05  Jakub Jelinek  

* tree-vect-patterns.c (vect_pattern_recog_1): Add stmts_to_replace
argument, truncate it at the beginning instead of allocating there
and freeing at the end.
(vect_pattern_recog): Allocate stmts_to_replace here and free at end,
pass its address to vect_pattern_recog_1.

--- gcc/tree-vect-patterns.c.jj 2011-09-26 14:06:52.0 +0200
+++ gcc/tree-vect-patterns.c2011-10-05 15:57:38.0 +0200
@@ -1281,7 +1281,8 @@ vect_mark_pattern_stmts (gimple orig_stm
 static void
 vect_pattern_recog_1 (
gimple (* vect_recog_func) (VEC (gimple, heap) **, tree *, tree *),
-   gimple_stmt_iterator si)
+   gimple_stmt_iterator si,
+   VEC (gimple, heap) **stmts_to_replace)
 {
   gimple stmt = gsi_stmt (si), pattern_stmt;
   stmt_vec_info stmt_info;
@@ -1291,14 +1292,14 @@ vect_pattern_recog_1 (
   enum tree_code code;
   int i;
   gimple next;
-  VEC (gimple, heap) *stmts_to_replace = VEC_alloc (gimple, heap, 1);
 
-  VEC_quick_push (gimple, stmts_to_replace, stmt);
-  pattern_stmt = (* vect_recog_func) (&stmts_to_replace, &type_in, &type_out);
+  VEC_truncate (gimple, *stmts_to_replace, 0);
+  VEC_quick_push (gimple, *stmts_to_replace, stmt);
+  pattern_stmt = (* vect_recog_func) (stmts_to_replace, &type_in, &type_out);
   if (!pattern_stmt)
 return;
 
-  stmt = VEC_last (gimple, stmts_to_replace);
+  stmt = VEC_last (gimple, *stmts_to_replace);
   stmt_info = vinfo_for_stmt (stmt);
   loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
  
@@ -1363,8 +1364,8 @@ vect_pattern_recog_1 (
   /* It is possible that additional pattern stmts are created and inserted in
  STMTS_TO_REPLACE.  We create a stmt_info for each of them, and mark the
  relevant statements.  */
-  for (i = 0; VEC_iterate (gimple, stmts_to_replace, i, stmt)
-  && (unsigned) i < (VEC_length (gimple, stmts_to_replace) - 1);
+  for (i = 0; VEC_iterate (gimple, *stmts_to_replace, i, stmt)
+ && (unsigned) i < (VEC_length (gimple, *stmts_to_replace) - 1);
i++)
 {
   stmt_info = vinfo_for_stmt (stmt);
@@ -1377,8 +1378,6 @@ vect_pattern_recog_1 (
 
   vect_mark_pattern_stmts (stmt, pattern_stmt, NULL_TREE);
 }
-
-  VEC_free (gimple, heap, stmts_to_replace);
 }
 
 
@@ -1468,6 +1467,7 @@ vect_pattern_recog (loop_vec_info loop_v
   gimple_stmt_iterator si;
   unsigned int i, j;
   gimple (* vect_recog_func_ptr) (VEC (gimple, heap) **, tree *, tree *);
+  VEC (gimple, heap) *stmts_to_replace = VEC_alloc (gimple, heap, 1);
 
   if (vect_print_dump_info (REPORT_DETAILS))
 fprintf (vect_dump, "=== vect_pattern_recog ===");
@@ -1483,8 +1483,11 @@ vect_pattern_recog (loop_vec_info loop_v
   for (j = 0; j < NUM_PATTERNS; j++)
 {
   vect_recog_func_ptr = vect_vect_recog_func_ptrs[j];
-  vect_pattern_recog_1 (vect_recog_func_ptr, si);
+ vect_pattern_recog_1 (vect_recog_func_ptr, si,
+   &stmts_to_replace);
 }
 }
 }
+
+  VEC_free (gimple, heap, stmts_to_replace);
 }

Jakub


Re: [PATCH, testsuite, i386] FMA3 testcases + typo fix in MD

2011-10-05 Thread Uros Bizjak
On Wed, Oct 5, 2011 at 12:49 PM, Kirill Yukhin  wrote:

> We're prepared and bunch of tests which checks autogeneration of FMA3
> instructions family.
> FMA3 typo in .md file is fixed as well (it was catched by tests).
>
> ChangeLog entry:
>
> 2011-10-04  Kirill Yukhin 
>            Yakovlev Vladimir 
>
>        * config/i386/sse.md (fma_fnmsub_): Fix a typo.
>
> testsuite/ChangeLog entry:
>
> 2011-10-04  Kirill Yukhin 
>            Yakovlev Vladimir 
>
>        * gcc.target/i386/fma_1.h: New test.
>        * gcc.target/i386/fma_2.h: Ditto.
>        * gcc.target/i386/fma_3.h: Ditto.
>        * gcc.target/i386/fma_4.h: Ditto.
>        * gcc.target/i386/fma_5.h: Ditto.
>        * gcc.target/i386/fma_6.h: Ditto.

Just say "New file." for various headers. They are not standalone tests...

>        * gcc.target/i386/fma_double_1.c: Ditto.
>        * gcc.target/i386/fma_double_2.c: Ditto.
>        * gcc.target/i386/fma_double_3.c: Ditto.
>        * gcc.target/i386/fma_double_4.c: Ditto.
>        * gcc.target/i386/fma_double_5.c: Ditto.
>        * gcc.target/i386/fma_double_6.c: Ditto.
>        * gcc.target/i386/fma_float_1.c: Ditto.
>        * gcc.target/i386/fma_float_2.c: Ditto.
>        * gcc.target/i386/fma_float_3.c: Ditto.
>        * gcc.target/i386/fma_float_4.c: Ditto.
>        * gcc.target/i386/fma_float_5.c: Ditto.
>        * gcc.target/i386/fma_float_6.c: Ditto.
>        * gcc.target/i386/fma_main.h: Ditto.
>        * gcc.target/i386/fma_run_double_1.c: Ditto.
>        * gcc.target/i386/fma_run_double_2.c: Ditto.
>        * gcc.target/i386/fma_run_double_3.c: Ditto.
>        * gcc.target/i386/fma_run_double_4.c: Ditto.
>        * gcc.target/i386/fma_run_double_5.c: Ditto.
>        * gcc.target/i386/fma_run_double_6.c: Ditto.
>        * gcc.target/i386/fma_run_double_results_1.h: Ditto.
>        * gcc.target/i386/fma_run_double_results_2.h: Ditto.
>        * gcc.target/i386/fma_run_double_results_3.h: Ditto.
>        * gcc.target/i386/fma_run_double_results_4.h: Ditto.
>        * gcc.target/i386/fma_run_double_results_5.h: Ditto.
>        * gcc.target/i386/fma_run_double_results_6.h: Ditto.
>        * gcc.target/i386/fma_run_float_1.c: Ditto.
>        * gcc.target/i386/fma_run_float_2.c: Ditto.
>        * gcc.target/i386/fma_run_float_3.c: Ditto.
>        * gcc.target/i386/fma_run_float_4.c: Ditto.
>        * gcc.target/i386/fma_run_float_5.c: Ditto.
>        * gcc.target/i386/fma_run_float_6.c: Ditto.
>        * gcc.target/i386/fma_run_float_results_1.h: Ditto.
>        * gcc.target/i386/fma_run_float_results_2.h: Ditto.
>        * gcc.target/i386/fma_run_float_results_3.h: Ditto.
>        * gcc.target/i386/fma_run_float_results_4.h: Ditto.
>        * gcc.target/i386/fma_run_float_results_5.h: Ditto.
>        * gcc.target/i386/fma_run_float_results_6.h: Ditto.
>        * gcc.target/i386/l_fma_1.h: Ditto.
>        * gcc.target/i386/l_fma_2.h: Ditto.
>        * gcc.target/i386/l_fma_3.h: Ditto.
>        * gcc.target/i386/l_fma_4.h: Ditto.
>        * gcc.target/i386/l_fma_5.h: Ditto.
>        * gcc.target/i386/l_fma_6.h: Ditto.
>        * gcc.target/i386/l_fma_double_1.c: Ditto.
>        * gcc.target/i386/l_fma_double_2.c: Ditto.
>        * gcc.target/i386/l_fma_double_3.c: Ditto.
>        * gcc.target/i386/l_fma_double_4.c: Ditto.
>        * gcc.target/i386/l_fma_double_5.c: Ditto.
>        * gcc.target/i386/l_fma_double_6.c: Ditto.
>        * gcc.target/i386/l_fma_float_1.c: Ditto.
>        * gcc.target/i386/l_fma_float_2.c: Ditto.
>        * gcc.target/i386/l_fma_float_3.c: Ditto.
>        * gcc.target/i386/l_fma_float_4.c: Ditto.
>        * gcc.target/i386/l_fma_float_5.c: Ditto.
>        * gcc.target/i386/l_fma_float_6.c: Ditto.
>        * gcc.target/i386/l_fma_main.h: Ditto.
>        * gcc.target/i386/l_fma_run_double_1.c: Ditto.
>        * gcc.target/i386/l_fma_run_double_2.c: Ditto.
>        * gcc.target/i386/l_fma_run_double_3.c: Ditto.
>        * gcc.target/i386/l_fma_run_double_4.c: Ditto.
>        * gcc.target/i386/l_fma_run_double_5.c: Ditto.
>        * gcc.target/i386/l_fma_run_double_6.c: Ditto.
>        * gcc.target/i386/l_fma_run_float_1.c: Ditto.
>        * gcc.target/i386/l_fma_run_float_2.c: Ditto.
>        * gcc.target/i386/l_fma_run_float_3.c: Ditto.
>        * gcc.target/i386/l_fma_run_float_4.c: Ditto.
>        * gcc.target/i386/l_fma_run_float_5.c: Ditto.
>        * gcc.target/i386/l_fma_run_float_6.c: Ditto.
>
> Could you please have a look?

+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-require-effective-target fma } */
+/* { dg-options "-O2 -mfma" } */

{ dg-require-effective-target fma } directive is not needed for
compile-only tests, as they stop at generating assembly file. Also,
you have disabled all tests on ia32 - unconditionally use "-O2 -mfma
-mfpmath=sse" for dg-options, and these instructions will magically
appear on all targets.

Otherwise, the patch looks OK.

Uros.


Re: Modify gcc for use with gdb (issue5132047)

2011-10-05 Thread Mike Stump
On Oct 5, 2011, at 6:18 AM, Diego Novillo wrote:
> I think we need to find a solution for this situation.

The solution Apple found and implemented is a __nodebug__ attribute, as can be 
seen in Apple's gcc.

We use it like so:

#define __always_inline__ __always_inline__, __nodebug__
#undef __always_inline__

in headers like mmintrn.h:

__STATIC_INLINE void __attribute__((__always_inline__))
/* APPLE LOCAL end radar 5618945 */
_mm_empty (void)
{
  __builtin_ia32_emms ();
}

to disappear the debug information for all the routines, so that the context is 
the context in which the routine is called (because the routine is always 
inlined).  It is implemented and works great.  Easy to use and understand.  
Since we use the #define, #undef around the functions, it is mostly equivalent 
to #pragma, though, it does attach a little closer to the function.  You can 
strip #.* from the .i files, and not have it removed (which is nice).


Re: Modify gcc for use with gdb (issue5132047)

2011-10-05 Thread Diego Novillo
On Wed, Oct 5, 2011 at 14:20, Mike Stump  wrote:
> On Oct 5, 2011, at 6:18 AM, Diego Novillo wrote:
>> I think we need to find a solution for this situation.
>
> The solution Apple found and implemented is a __nodebug__ attribute, as can 
> be seen in Apple's gcc.
>
> We use it like so:
>
> #define __always_inline__ __always_inline__, __nodebug__
> #undef __always_inline__
>
> in headers like mmintrn.h:
>
> __STATIC_INLINE void __attribute__((__always_inline__))
> /* APPLE LOCAL end radar 5618945 */
> _mm_empty (void)
> {
>  __builtin_ia32_emms ();
> }

Ah, nice.  Though, one of the things I am liking more and more about
the blacklist solution is that it (a) does not need any modifications
to the source code, and (b) it works with no-inline functions as well.

This gives total control to the developer.  I would blacklist a bunch
of functions I never care to go into, for instance.  Others may choose
to blacklist a different set.  And you can change that from debug
session to the next.

I agree with Jakub that artificial functions should be blacklisted
automatically, however.

Richi, Jakub, if the blacklist solution was implemented in GCC would
you agree with promoting these macros into inline functions?  This is
orthogonal to http://sourceware.org/bugzilla/show_bug.cgi?id=13263, of
course.


Thanks.  Diego.


Re: [PATCH] Fix PR46556 (poor address generation)

2011-10-05 Thread Paolo Bonzini

On 10/05/2011 07:22 PM, William J. Schmidt wrote:

I don't know off the top of my head -- I'll have to gather that
information.  The issue is that the profitability is really
context-sensitive, so just the isolated costs of insns aren't enough.
The forward propagation of the add into (mem (reg REG)) looks like a
slam dunk in the absence of other information, but if there are other
nearby references using nonzero offsets from REG, this just extends the
lifetimes of X and Y without eliminating the need for REG.


True, however there are other passes that do this kind of un-CSE and 
lifetime reduction.


Paolo


Re: [PATCH, testsuite]: Fix UNRESOLVED: gcc.dg/vect/vec-scal-opt*.c scan-tree-dump-times veclower

2011-10-05 Thread Uros Bizjak
On Wed, Oct 5, 2011 at 7:42 PM, Uros Bizjak  wrote:

> 2011-10-05  Uros Bizjak  
>
>        * gcc.dg/vect/vect.exp (VEC_CFLAGS): Move initialization after
>        DEFAULT_VECTFLAGS initialization.
>

Actually, these testcases need a bit more surgery...

2011-10-05  Uros Bizjak  

* gcc.dg/vect/vect.exp (VEC_CFLAGS): Move initialization after
DEFAULT_VECTFLAGS initialization.  Append "-fdump-tree-veclower2".
* gcc.dg/vect/vec-scal-opt.c: Scan and cleanup veclower2 tree dump.
* gcc.dg/vect/vec-scal-opt1.c: Ditto.
* gcc.dg/vect/vec-scal-opt2.c: Ditto.

Attached additional patch tested on x86_64-pc-linux-gnu {,-m32},
committed to mainline SVN.

Uros.
Index: gcc.dg/vect/vec-scal-opt1.c
===
--- gcc.dg/vect/vec-scal-opt1.c (revision 179565)
+++ gcc.dg/vect/vec-scal-opt1.c (working copy)
@@ -17,5 +17,5 @@
return vidx(short, r1, 0);
 }
 
-/* { dg-final { scan-tree-dump-times ">> 2" 1 "veclower" { target 
vect_shift_scalar } } } */
-/* { dg-final { cleanup-tree-dump "veclower" } } */
+/* { dg-final { scan-tree-dump-times ">> 2" 1 "veclower2" { target 
vect_shift_scalar } } } */
+/* { dg-final { cleanup-tree-dump "veclower2" } } */
Index: gcc.dg/vect/vec-scal-opt2.c
===
--- gcc.dg/vect/vec-scal-opt2.c (revision 179565)
+++ gcc.dg/vect/vec-scal-opt2.c (working copy)
@@ -16,5 +16,5 @@
return vidx(short, r1, 0);
 }
 
-/* { dg-final { scan-tree-dump-times ">> 2" 1 "veclower" { target 
vect_shift_scalar } } } */
-/* { dg-final { cleanup-tree-dump "veclower" } } */
+/* { dg-final { scan-tree-dump-times ">> 2" 1 "veclower2" { target 
vect_shift_scalar } } } */
+/* { dg-final { cleanup-tree-dump "veclower2" } } */
Index: gcc.dg/vect/vec-scal-opt.c
===
--- gcc.dg/vect/vec-scal-opt.c  (revision 179565)
+++ gcc.dg/vect/vec-scal-opt.c  (working copy)
@@ -19,5 +19,5 @@
return vidx(short, r1, 0);
 }
 
-/* { dg-final { scan-tree-dump-times ">> k.\[0-9_\]*" 1 "veclower" { target 
vect_shift_scalar } } } */
-/* { dg-final { cleanup-tree-dump "veclower" } } */
+/* { dg-final { scan-tree-dump-times ">> k.\[0-9_\]*" 1 "veclower2" { target 
vect_shift_scalar } } } */
+/* { dg-final { cleanup-tree-dump "veclower2" } } */
Index: gcc.dg/vect/vect.exp
===
--- gcc.dg/vect/vect.exp(revision 179565)
+++ gcc.dg/vect/vect.exp(working copy)
@@ -64,8 +64,8 @@
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/nodump-*.\[cS\]]]  \
"" $DEFAULT_VECTCFLAGS
 
-# "-O -fdump-tree-veclower"
-lappend VEC_FLAGS "-O" "-fdump-tree-veclower"
+# "-O -fdump-tree-veclower2"
+lappend VEC_FLAGS "-O" "-fdump-tree-veclower2"
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vec-scal-*.\[cS\]]]  \
 "" $VEC_FLAGS
 


Fix htab lookup bug in reregister_specialization (issue5190046)

2011-10-05 Thread Diego Novillo

I found this bug while debugging a failure in pph images with template
classes.  When writing the decl_specializations table, the writer
calls htab_elements() to output the length of the table.  It then
traverses the table with htab_traverse_noresize() to emit all the
entries.

The reader uses that value to know how many entries it needs to read.
However, the table was out of sync wrt empty entries because
reregister_specialization does a lookup using INSERT instead of
NO_INSERT, so it was leaving a bunch of empty entries in it (thanks
Jakub for helping me diagnose this).

Since empty entries are never traversed by htab_traverse, they were
never written out and the reader was trying to read more entries than
there really were.

Jason, I don't think this is affecting any bugs in trunk, but it looks
worth fixing.  OK for trunk?


Diego.


* pt.c (reregister_specialization): Call htab_find_slot with
NO_INSERT.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 04e7767..2366dc9 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -1648,8 +1648,8 @@ reregister_specialization (tree spec, tree tinfo, tree 
new_spec)
   elt.args = TI_ARGS (tinfo);
   elt.spec = NULL_TREE;
 
-  slot = (spec_entry **) htab_find_slot (decl_specializations, &elt, INSERT);
-  if (*slot)
+  slot = (spec_entry **) htab_find_slot (decl_specializations, &elt, 
NO_INSERT);
+  if (slot && *slot)
 {
   gcc_assert ((*slot)->spec == spec || (*slot)->spec == new_spec);
   gcc_assert (new_spec != NULL_TREE);

--
This patch is available for review at http://codereview.appspot.com/5190046


[pph] Fix template class tables streaming (issue5199041)

2011-10-05 Thread Diego Novillo

This fixes the problem I described in
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00376.html

The tests are now failing assembly comparisons because the elements in
these tables are emitted in different sequence in the pph and non-pph
case.  Other than that, the assembly produced is identical.

Tested on x86_64.  Committed to branch.


Diego.

cp/ChangeLog.pph

* pt.c (reregister_specialization):  Call htab_find_slot with
NO_INSERT.

testsuite/ChangeLog.pph

* g++.dg/pph/x1tmplclass2.cc: Mark partially fixed.
* g++.dg/pph/x4tmplclass2.cc: Likewise.
* g++.dg/pph/z4tmplclass2.cc: Likewise.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 04e7767..2366dc9 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -1648,8 +1648,8 @@ reregister_specialization (tree spec, tree tinfo, tree 
new_spec)
   elt.args = TI_ARGS (tinfo);
   elt.spec = NULL_TREE;
 
-  slot = (spec_entry **) htab_find_slot (decl_specializations, &elt, INSERT);
-  if (*slot)
+  slot = (spec_entry **) htab_find_slot (decl_specializations, &elt, 
NO_INSERT);
+  if (slot && *slot)
 {
   gcc_assert ((*slot)->spec == spec || (*slot)->spec == new_spec);
   gcc_assert (new_spec != NULL_TREE);
diff --git a/gcc/testsuite/g++.dg/pph/x1tmplclass2.cc 
b/gcc/testsuite/g++.dg/pph/x1tmplclass2.cc
index f04335d..a254106 100644
--- a/gcc/testsuite/g++.dg/pph/x1tmplclass2.cc
+++ b/gcc/testsuite/g++.dg/pph/x1tmplclass2.cc
@@ -1,5 +1,3 @@
-// { dg-xfail-if "BOGUS" { "*-*-*" } { "-fpph-map=pph.map" } }
-// { dg-bogus "x1tmplclass2.cc:1:0: internal compiler error: in 
pph_in_record_marker, at cp/pph-streamer.h" "" { xfail *-*-* } 0 }
-
+// pph asm xdiff 37711
 #include "x0tmplclass23.h"
 #include "a0tmplclass2_u.h"
diff --git a/gcc/testsuite/g++.dg/pph/x4tmplclass2.cc 
b/gcc/testsuite/g++.dg/pph/x4tmplclass2.cc
index 585d4c0..e605e40 100644
--- a/gcc/testsuite/g++.dg/pph/x4tmplclass2.cc
+++ b/gcc/testsuite/g++.dg/pph/x4tmplclass2.cc
@@ -1,6 +1,4 @@
-// { dg-xfail-if "BOGUS" { "*-*-*" } { "-fpph-map=pph.map" } }
-// { dg-bogus "x4tmplclass2.cc:1:0: internal compiler error: in 
pph_in_record_marker, at cp/pph-streamer.h" "" { xfail *-*-* } 0 }
-
+// pph asm xdiff 49533
 #include "x0tmplclass21.h"
 #include "x0tmplclass22.h"
 #include "a0tmplclass2_u.h"
diff --git a/gcc/testsuite/g++.dg/pph/z4tmplclass2.cc 
b/gcc/testsuite/g++.dg/pph/z4tmplclass2.cc
index 0243829..d025942 100644
--- a/gcc/testsuite/g++.dg/pph/z4tmplclass2.cc
+++ b/gcc/testsuite/g++.dg/pph/z4tmplclass2.cc
@@ -1,6 +1,4 @@
-// { dg-xfail-if "BOGUS" { "*-*-*" } { "-fpph-map=pph.map" } }
-// { dg-bogus "z4tmplclass2.cc:1:0: internal compiler error: in 
pph_in_record_marker, at cp/pph-streamer.h" "" { xfail *-*-* } 0 }
-
+// pph asm xdiff 59292
 #include "x0tmplclass23.h"
 #include "x0tmplclass24.h"
 #include "a0tmplclass2_u.h"

--
This patch is available for review at http://codereview.appspot.com/5199041


Re: Fix htab lookup bug in reregister_specialization (issue5190046)

2011-10-05 Thread Jakub Jelinek
On Wed, Oct 05, 2011 at 03:49:38PM -0400, Diego Novillo wrote:
> Jason, I don't think this is affecting any bugs in trunk, but it looks
> worth fixing.  OK for trunk?

Well, it can cause problems.  Consider 3 entries in the hash table
with the same hash. (1) inserted first, then (2), then (3), then (2)
gets htab_remove_elt (decl_specializations has deletions on it too),
so (2) in the hash table is overwritten with HTAB_DELETED_ENTRY.
Next reregister_specialization is called with the same hash.
It will find the slot (where (2) used to be stored), overwrites
the HTAB_DELETED_ENTRY with HTAB_EMPTY_ENTRY (aka NULL) and return
that slot, but as reregister_specialization doesn't store anything there,
afterwards htab_find won't be able to find (3).

BTW, register_specialization has the same problems.  If slot != NULL and fn
== NULL, it can still return without storing non-NULL into *slot
(when optimize_specialization_lookup_p (tmpl) returns non-zero).
You can do else if (slot != NULL && fn == NULL) htab_clear_slot 
(decl_specializations, slot);

And, IMHO slot should be void **, otherwise there is aliasing violation.

> --- a/gcc/cp/pt.c
> +++ b/gcc/cp/pt.c
> @@ -1648,8 +1648,8 @@ reregister_specialization (tree spec, tree tinfo, tree 
> new_spec)
>elt.args = TI_ARGS (tinfo);
>elt.spec = NULL_TREE;
>  
> -  slot = (spec_entry **) htab_find_slot (decl_specializations, &elt, INSERT);
> -  if (*slot)
> +  slot = (spec_entry **) htab_find_slot (decl_specializations, &elt, 
> NO_INSERT);
> +  if (slot && *slot)
>  {
>gcc_assert ((*slot)->spec == spec || (*slot)->spec == new_spec);
>gcc_assert (new_spec != NULL_TREE);

If you never insert, it should be htab_find only (with spec_entry * instead
of spec_entry ** variable).

Jakub


Re: [PATCH] Fix PR46556 (poor address generation)

2011-10-05 Thread William J. Schmidt
On Wed, 2011-10-05 at 21:01 +0200, Paolo Bonzini wrote:
> On 10/05/2011 07:22 PM, William J. Schmidt wrote:
> > I don't know off the top of my head -- I'll have to gather that
> > information.  The issue is that the profitability is really
> > context-sensitive, so just the isolated costs of insns aren't enough.
> > The forward propagation of the add into (mem (reg REG)) looks like a
> > slam dunk in the absence of other information, but if there are other
> > nearby references using nonzero offsets from REG, this just extends the
> > lifetimes of X and Y without eliminating the need for REG.
> 
> True, however there are other passes that do this kind of un-CSE and 
> lifetime reduction.
> 
> Paolo

OK, I see.  If there's a better place downstream to make a swizzle, I'm
certainly fine with that.

I disabled locally_poor_mem_replacement and added some dump information
in should_replace_address to show the costs for the replacement I'm
trying to avoid:

In should_replace_address:
  old_rtx = (reg/f:DI 125 [ D.2036 ])
  new_rtx = (plus:DI (reg/v/f:DI 126 [ p ])
(reg:DI 128))
  address_cost (old_rtx) = 0
  address_cost (new_rtx) = 0
  set_src_cost (old_rtx) = 0
  set_src_cost (new_rtx) = 4

In insn 11, replacing
 (mem/s:SI (reg/f:DI 125 [ D.2036 ]) [2 p_1(D)->a S4 A32])
 with (mem/s:SI (plus:DI (reg/v/f:DI 126 [ p ])
(reg:DI 128)) [2 p_1(D)->a S4 A32])
Changed insn 11
deferring rescan insn with uid = 11.
deferring rescan insn with uid = 11.

Hope this helps,
Bill



Find more shrink-wrapping opportunities

2011-10-05 Thread Bernd Schmidt
This adds a little mini-pass to shrink-wrapping, to eliminate a common
case that often makes shrink-wrapping unavailable. If a move insn copies
an argument registers to a call-saved register, the prologue must be
emitted before this insn. We should therefore try to delay such moves
for as long as possible.

A followup patch will add an extra cprop pass, which replaces uses of
the call-saved register with uses of the incoming argument register,
thereby allowing us to move the copies in even more cases.

Bootstrapped and tested on i686-linux. Ok?


Bernd
* function.c (prepare_shrink_wrap, bb_active_p): New function.
(thread_prologue_and_epilogue_insns): Use bb_active_p.
Call prepare_shrink_wrap, then recompute bb_active_p for the
last block.

Index: gcc/function.c
===
--- gcc/function.c  (revision 179577)
+++ gcc/function.c  (working copy)
@@ -5352,6 +5352,127 @@ requires_stack_frame_p (rtx insn, HARD_R
   return true;
   return false;
 }
+
+/* Look for sets of call-saved registers in the first block of the
+   function, and move them down into successor blocks if the register
+   is used only on one path.  This exposes more opportunities for
+   shrink-wrapping.
+   These kinds of sets often occur when incoming argument registers are
+   moved to call-saved registers because their values are live across
+   one or more calls during the function.  */
+
+static void
+prepare_shrink_wrap (basic_block entry_block)
+{
+  rtx insn, curr;
+  FOR_BB_INSNS_SAFE (entry_block, insn, curr)
+{
+  basic_block next_bb;
+  edge e, live_edge;
+  edge_iterator ei;
+  rtx set, scan;
+  unsigned destreg, srcreg;
+
+  if (!NONDEBUG_INSN_P (insn))
+   continue;
+  set = single_set (insn);
+  if (!set)
+   continue;
+
+  if (!REG_P (SET_SRC (set)) || !REG_P (SET_DEST (set)))
+   continue;
+  srcreg = REGNO (SET_SRC (set));
+  destreg = REGNO (SET_DEST (set));
+  if (hard_regno_nregs[srcreg][GET_MODE (SET_SRC (set))] > 1
+ || hard_regno_nregs[destreg][GET_MODE (SET_DEST (set))] > 1)
+   continue;
+
+  next_bb = entry_block;
+  scan = insn;
+
+  for (;;)
+   {
+ live_edge = NULL;
+ FOR_EACH_EDGE (e, ei, next_bb->succs)
+   {
+ if (REGNO_REG_SET_P (df_get_live_in (e->dest), destreg))
+   {
+ if (live_edge)
+   {
+ live_edge = NULL;
+ break;
+   }
+ live_edge = e;
+   }
+   }
+ if (!live_edge)
+   break;
+ /* We can sometimes encounter dead code.  Don't try to move it
+into the exit block.  */
+ if (live_edge->dest == EXIT_BLOCK_PTR)
+   break;
+ if (EDGE_COUNT (live_edge->dest->preds) > 1)
+   break;
+ while (scan != BB_END (next_bb))
+   {
+ scan = NEXT_INSN (scan);
+ if (NONDEBUG_INSN_P (scan))
+   {
+ rtx link;
+ HARD_REG_SET set_regs;
+
+ CLEAR_HARD_REG_SET (set_regs);
+ note_stores (PATTERN (scan), record_hard_reg_sets,
+  &set_regs);
+ if (CALL_P (scan))
+   IOR_HARD_REG_SET (set_regs, call_used_reg_set);
+ for (link = REG_NOTES (scan); link; link = XEXP (link, 1))
+   if (REG_NOTE_KIND (link) == REG_INC)
+ record_hard_reg_sets (XEXP (link, 0), NULL, &set_regs);
+
+ if (TEST_HARD_REG_BIT (set_regs, srcreg)
+ || reg_referenced_p (SET_DEST (set),
+  PATTERN (scan)))
+   {
+ scan = NULL_RTX;
+ break;
+   }
+ if (CALL_P (scan))
+   {
+ rtx link = CALL_INSN_FUNCTION_USAGE (scan);
+ while (link)
+   {
+ rtx tmp = XEXP (link, 0);
+ if (GET_CODE (tmp) == USE
+ && reg_referenced_p (SET_DEST (set), tmp))
+   break;
+ link = XEXP (link, 1);
+   }
+ if (link)
+   {
+ scan = NULL_RTX;
+ break;
+   }
+   }
+   }
+   }
+ if (!scan)
+   break;
+ next_bb = live_edge->dest;
+   }
+
+  if (next_bb != entry_block)
+   {
+ rtx after = BB_HEAD (next_bb);
+ while (!NOTE_P (after)
+|| NOTE_KIND (after) != NOTE_INSN_BASIC_BLOCK)
+   after = NEXT_INSN (after);
+ emit_insn_after (PATTERN (insn), af

Re: Fix htab lookup bug in reregister_specialization (issue5190046)

2011-10-05 Thread Jason Merrill

On 10/05/2011 04:05 PM, Jakub Jelinek wrote:

BTW, register_specialization has the same problems.  If slot != NULL and fn
== NULL, it can still return without storing non-NULL into *slot
(when optimize_specialization_lookup_p (tmpl) returns non-zero).
You can do else if (slot != NULL&&  fn == NULL) htab_clear_slot 
(decl_specializations, slot);


I don't think there's a problem in that function; if fn == NULL, then we 
store spec in its place.



If you never insert, it should be htab_find only (with spec_entry * instead
of spec_entry ** variable).


Makes sense.

Jason



Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.

2011-10-05 Thread Tom de Vries
On 10/05/2011 10:46 AM, Richard Guenther wrote:
> On Tue, Oct 4, 2011 at 6:28 PM, Tom de Vries  wrote:
>> On 10/04/2011 03:03 PM, Richard Guenther wrote:
>>> On Tue, Oct 4, 2011 at 9:43 AM, Tom de Vries  wrote:
 On 10/01/2011 05:46 PM, Tom de Vries wrote:
> On 09/30/2011 03:29 PM, Richard Guenther wrote:
>> On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries  
>> wrote:
>>> On 09/28/2011 11:53 AM, Richard Guenther wrote:
 On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries 
  wrote:
> Richard,
>
> I got a patch for PR50527.
>
> The patch prevents the alignment of vla-related allocas to be set to
> BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after 
> folding
> the alloca.
>
> Bootstrapped and regtested on x86_64.
>
> OK for trunk?

 Hmm.  As gfortran with -fstack-arrays uses VLAs it's probably bad that
 the vectorizer then will no longer see that the arrays are properly 
 aligned.

 I'm not sure what the best thing to do is here, other than trying to 
 record
 the alignment requirement of the VLA somewhere.

 Forcing the alignment of the alloca replacement decl to 
 BIGGEST_ALIGNMENT
 has the issue that it will force stack-realignment which isn't free 
 (and the
 point was to make the decl cheaper than the alloca).  But that might
 possibly be the better choice.

 Any other thoughts?
>>>
>>> How about the approach in this (untested) patch? Using the DECL_ALIGN 
>>> of the vla
>>> for the new array prevents stack realignment for folded vla-allocas, 
>>> also for
>>> large vlas.
>>>
>>> This will not help in vectorizing large folded vla-allocas, but I think 
>>> it's not
>>> reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although 
>>> that has
>>> been the case up until we started to fold). If you want to trigger 
>>> vectorization
>>> for a vla, you can still use the aligned attribute on the declaration.
>>>
>>> Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, also 
>>> without using
>>> an attribute on the decl. This patch exploits this by setting it at the 
>>> end of
>>> the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective 
>>> in
>>> propagation though, because although the ptr_info of the lhs is 
>>> propagated via
>>> copy_prop afterwards, it's not propagated anymore via ccp.
>>>
>>> Another way to do this would be to set BIGGEST_ALIGNMENT at the end of 
>>> ccp2 and
>>> not fold during ccp3.
>>
>> Ugh, somehow I like this the least ;)
>>
>> How about lowering VLAs to
>>
>>   p = __builtin_alloca (...);
>>   p = __builtin_assume_aligned (p, DECL_ALIGN (vla));
>>
>> and not assume anything for alloca itself if it feeds a
>> __builtin_assume_aligned?
>>
>> Or rather introduce a __builtin_alloca_with_align () and for VLAs do
>>
>>  p = __builtin_alloca_with_align (..., DECL_ALIGN (vla));
>>
>> that's less awkward to use?
>>
>> Sorry for not having a clear plan here ;)
>>
>
> Using assume_aligned is a more orthogonal way to represent this in 
> gimple, but
> indeed harder to use.
>
> Another possibility is to add a 'tree vla_decl' field to struct
> gimple_statement_call, which is probably the easiest to implement.
>
> But I think __builtin_alloca_with_align might have a use beyond vlas, so I
> decided to try this one. Attached patch implements my first stab at this  
> (now
> testing on x86_64).
>
> Is this an acceptable approach?
>

 bootstrapped and reg-tested (including ada) on x86_64.

 Ok for trunk?
>>>
>>> The idea is ok I think.  But
>>>
>>>  case BUILT_IN_ALLOCA:
>>> +case BUILT_IN_ALLOCA_WITH_ALIGN:
>>>/* If the allocation stems from the declaration of a variable-sized
>>>  object, it cannot accumulate.  */
>>>target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp));
>>>if (target)
>>> return target;
>>> +  if (DECL_FUNCTION_CODE (get_callee_fndecl (exp))
>>> + == BUILT_IN_ALLOCA_WITH_ALIGN)
>>> +   {
>>> + tree new_call = build_call_expr_loc (EXPR_LOCATION (exp),
>>> +  
>>> built_in_decls[BUILT_IN_ALLOCA],
>>> +  1, CALL_EXPR_ARG (exp, 0));
>>> + CALL_ALLOCA_FOR_VAR_P (new_call) = CALL_ALLOCA_FOR_VAR_P (exp);
>>> + exp = new_call;
>>> +   }
>>>
>>> Ick.  Why can't the rest of the compiler not handle
>>> BUILT_IN_ALLOCA_WITH_ALIGN the same as BUILT_IN_ALLOCA?
>>> (thus, arrange things so the assembler name of alloca-with-align is alloca?)
>>>

Re: [C++-11] User defined literals

2011-10-05 Thread Jason Merrill

On 10/05/2011 11:55 AM, Ed Smith-Rowland wrote:

+ int num_parms = TREE_VEC_LENGTH (parameter_list);
+ if (num_parms != 1)
+   ok = false;
+ else
+   {
+ tree parm_list = TREE_VEC_ELT (parameter_list, 0);
+ tree parm = INNERMOST_TEMPLATE_PARMS (parm_list);


This is backwards; you want to do INNERMOST_TEMPLATE_PARMS before 
looking at the number of parms or pulling out the one at index 0.



+struct GTY(()) tree_userdef_literal {
+  struct tree_common common;


I think you can use tree_base instead of tree_common here, since you 
aren't using TREE_TYPE or TREE_CHAIN.



One thing doesn't work.  Earlier I had said that friend declarations failed.  
Not true.  What fails is defining the function body in the class definition.  
The function is simply not recorded.  I'm trying to track this down but 
pointers would be helpful.  If I write the function body outside the class 
definition friend works perfectly.



About friends.  The lookup for user-defined literal operators isn't ADL.  They are a 
lot like normal functions because of the suffix-id as opposed to operator<< in 
this regard.  Thus a friend literal operator should be accessible both as an explicit 
operator call and via a literal.  Both are tested and except as noted above about 
inline defs the tests pass.


A function that is only declared in a friend declaration can only be 
found by ADL, so that doesn't sound like a failure to me; it sounds like 
it's working the way it should.  However, if you declare the function 
earlier and then define it in the friend declaration it ought to work.


Jason


Re: Fix htab lookup bug in reregister_specialization (issue5190046)

2011-10-05 Thread Jakub Jelinek
On Wed, Oct 05, 2011 at 05:03:45PM -0400, Jason Merrill wrote:
> On 10/05/2011 04:05 PM, Jakub Jelinek wrote:
> >BTW, register_specialization has the same problems.  If slot != NULL and fn
> >== NULL, it can still return without storing non-NULL into *slot
> >(when optimize_specialization_lookup_p (tmpl) returns non-zero).
> >You can do else if (slot != NULL&&  fn == NULL) htab_clear_slot 
> >(decl_specializations, slot);
> 
> I don't think there's a problem in that function; if fn == NULL,
> then we store spec in its place.

If optimize_specialization_lookup_p (tmpl) doesn't change return value in
between the two calls, then you are right.  But perhaps in that case
you could avoid the second call and use slot != NULL instead.

Jakub


Re: Find more shrink-wrapping opportunities

2011-10-05 Thread Steven Bosscher
On Wed, Oct 5, 2011 at 10:48 PM, Bernd Schmidt  wrote:
> Bootstrapped and tested on i686-linux. Ok?

> +/* Return true if BB has any active insns.  */
> +static bool
> +bb_active_p (basic_block bb)
> +{
> +  rtx label;
> +
> +  /* Test whether there are active instructions in the last block.  */
> +  label = BB_END (bb);
> +  while (label && !LABEL_P (label))
> +{
> +  if (active_insn_p (label))
> + break;
> +  label = PREV_INSN (label);
> +}
> +  return BB_HEAD (bb) != label || !LABEL_P (label);
> +}

This assumes that all basic blocks start with a label, I don't think
that's true. It seems better to use FOR_BB_INSNS_REVERSE here.

Ciao!
Steven


[PATCH] Remove OUTPUT_ADDR_CONST_EXTRA macro

2011-10-05 Thread Anatoly Sokolov
Hi.

  No one back end does not use OUTPUT_ADDR_CONST_EXTRA macro now, this patch 
remove it. The TARGET_ASM_OUTPUT_ADDR_CONST_EXTRA target hook should be use 
instead.

  The patch has been bootstrapped on and regression tested on
x86_64-unknown-linux-gnu for c and c++.

  This patch is pre-approved and should be committed within a week if no
objections.

* system.h (OUTPUT_ADDR_CONST_EXTRA): Poison.
* doc/tm.texi.in (OUTPUT_ADDR_CONST_EXTRA): Remove documentation.
* doc/tm.texi: Regenerate.
* target.def (output_addr_const_extra): Use
hook_bool_FILEptr_rtx_false.
* targhooks.c (default_asm_output_addr_const_extra): Remove.
* targhooks.h (default_asm_output_addr_const_extra): Remove.
* hooks.c (hook_bool_FILEptr_rtx_false): New functions.
* hooks.h (hook_bool_FILEptr_rtx_false): Declare.


Index: gcc/doc/tm.texi
===
--- gcc/doc/tm.texi (revision 179476)
+++ gcc/doc/tm.texi (working copy)
@@ -7530,18 +7530,6 @@
 return @code{true}.
 @end deftypefn
 
-@defmac OUTPUT_ADDR_CONST_EXTRA (@var{stream}, @var{x}, @var{fail})
-A C statement to recognize @var{rtx} patterns that
-@code{output_addr_const} can't deal with, and output assembly code to
-@var{stream} corresponding to the pattern @var{x}.  This may be used to
-allow machine-dependent @code{UNSPEC}s to appear within constants.
-
-If @code{OUTPUT_ADDR_CONST_EXTRA} fails to recognize a pattern, it must
-@code{goto fail}, so that a standard error message is printed.  If it
-prints an error message itself, by calling, for example,
-@code{output_operand_lossage}, it may just complete normally.
-@end defmac
-
 @defmac ASM_OUTPUT_ASCII (@var{stream}, @var{ptr}, @var{len})
 A C statement to output to the stdio stream @var{stream} an assembler
 instruction to assemble a string constant containing the @var{len}
Index: gcc/doc/tm.texi.in
===
--- gcc/doc/tm.texi.in  (revision 179476)
+++ gcc/doc/tm.texi.in  (working copy)
@@ -7446,18 +7446,6 @@
 return @code{true}.
 @end deftypefn
 
-@defmac OUTPUT_ADDR_CONST_EXTRA (@var{stream}, @var{x}, @var{fail})
-A C statement to recognize @var{rtx} patterns that
-@code{output_addr_const} can't deal with, and output assembly code to
-@var{stream} corresponding to the pattern @var{x}.  This may be used to
-allow machine-dependent @code{UNSPEC}s to appear within constants.
-
-If @code{OUTPUT_ADDR_CONST_EXTRA} fails to recognize a pattern, it must
-@code{goto fail}, so that a standard error message is printed.  If it
-prints an error message itself, by calling, for example,
-@code{output_operand_lossage}, it may just complete normally.
-@end defmac
-
 @defmac ASM_OUTPUT_ASCII (@var{stream}, @var{ptr}, @var{len})
 A C statement to output to the stdio stream @var{stream} an assembler
 instruction to assemble a string constant containing the @var{len}
Index: gcc/targhooks.c
===
--- gcc/targhooks.c (revision 179476)
+++ gcc/targhooks.c (working copy)
@@ -371,21 +371,6 @@
   return get_identifier (stripped);
 }
 
-/* The default implementation of TARGET_ASM_OUTPUT_ADDR_CONST_EXTRA.  */
-
-bool
-default_asm_output_addr_const_extra (FILE *file ATTRIBUTE_UNUSED,
-rtx x ATTRIBUTE_UNUSED)
-{
-#ifdef OUTPUT_ADDR_CONST_EXTRA
-  OUTPUT_ADDR_CONST_EXTRA (file, x, fail);
-  return true;
-
-fail:
-#endif
-  return false;
-}
-
 /* True if MODE is valid for the target.  By "valid", we mean able to
be manipulated in non-trivial ways.  In particular, this means all
the arithmetic is supported.
Index: gcc/targhooks.h
===
--- gcc/targhooks.h (revision 179476)
+++ gcc/targhooks.h (working copy)
@@ -67,8 +67,6 @@
 extern bool default_print_operand_punct_valid_p (unsigned char);
 extern tree default_mangle_assembler_name (const char *);
 
-extern bool default_asm_output_addr_const_extra (FILE *, rtx);
-
 extern bool default_scalar_mode_supported_p (enum machine_mode);
 extern bool targhook_words_big_endian (void);
 extern bool targhook_float_words_big_endian (void);
Index: gcc/hooks.c
===
--- gcc/hooks.c (revision 179476)
+++ gcc/hooks.c (working copy)
@@ -132,6 +132,14 @@
 {
 }
 
+/* Generic hook that takes (FILE *, rtx) and returns false.  */
+bool
+hook_bool_FILEptr_rtx_false (FILE *a ATTRIBUTE_UNUSED,
+rtx b ATTRIBUTE_UNUSED)
+{
+  return false;
+}
+
 /* Used for the TARGET_ASM_CAN_OUTPUT_MI_THUNK hook.  */
 bool
 hook_bool_const_tree_hwi_hwi_const_tree_false (const_tree a ATTRIBUTE_UNUSED,
Index: gcc/hooks.h
===
--- gcc/hooks.h (revision 179476)
+++ gcc/hooks.h (working copy)
@@ -63,6 +63,7 @@
 extern void hook_v

Re: Find more shrink-wrapping opportunities

2011-10-05 Thread Bernd Schmidt
On 10/05/11 23:23, Steven Bosscher wrote:
> On Wed, Oct 5, 2011 at 10:48 PM, Bernd Schmidt  
> wrote:
>> Bootstrapped and tested on i686-linux. Ok?
> 
>> +/* Return true if BB has any active insns.  */
>> +static bool
>> +bb_active_p (basic_block bb)
>> +{
>> +  rtx label;
>> +
>> +  /* Test whether there are active instructions in the last block.  */
>> +  label = BB_END (bb);
>> +  while (label && !LABEL_P (label))
>> +{
>> +  if (active_insn_p (label))
>> +break;
>> +  label = PREV_INSN (label);
>> +}
>> +  return BB_HEAD (bb) != label || !LABEL_P (label);
>> +}
> 
> This assumes that all basic blocks start with a label, I don't think
> that's true. It seems better to use FOR_BB_INSNS_REVERSE here.

It's the same code as before... and if you get a block without a label
and without active insns, then cfgcleanup hasn't done its job.


Bernd


Re: [Patch, Fortran] PR 35831: [F95] Shape mismatch check missing for dummy procedure argument

2011-10-05 Thread Janus Weil
>> If you have a cute idea how to elegantly introduce warnings into this
>> mechanism, I'm all ears.
> I'm not sure that it qualifies as cute, but we could produce multi-line
> diagnostics in the same way c++ does (for template candidates for example),
> like:
> error/warning: #the error/warning
> note: in actual argument at <> # the context

Maybe not the worst idea. The question is, how would we do this
technically? Can it be handled with the existing infrastructure
(gfc_error_... )?

Alternatives might be:
 * simply throw the warning without context
 * have 'check_dummy_characteristics' not return SUCCESS/FAILURE, but
something like SUCCESS/WARNING/ERROR and then react on this
appropriately by throwing either an error or a warning (more tedious)

Cheers,
Janus


Re: Initial shrink-wrapping patch

2011-10-05 Thread Ian Lance Taylor
On Wed, Oct 5, 2011 at 10:17 AM, Bernd Schmidt  wrote:
>
> I've committed the following after a x86_64-linux bootstrap.

This patch appears to have broken the Go bootstrap when compiling a C
file in libgo.  Here is a reduced test case:

static struct
{
  int n;
  unsigned int m;
  _Bool f;
} g;

_Bool
f (int s)
{
  unsigned int b, m;

  if (!g.f)
return 0;
  b = 1 << s;
  while (1)
{
  m = g.m;
  if (m & b)
break;
  if (__sync_bool_compare_and_swap (&g.m, m, m|b))
{
  if (m == 0)
f2 (&g.n);
  break;
}
}
  return 1;
}

Compiling this file with -O2 -fsplit-stack causes an ICE:

foo.c:29:1: internal compiler error: in maybe_record_trace_start, at
dwarf2cfi.c:2243

Compiling the file with -O2 -fsplit-stack -fno-shrink-wrap works.

Ian


Re: [PATCH] Remove OUTPUT_ADDR_CONST_EXTRA macro

2011-10-05 Thread Richard Henderson
On 10/05/2011 02:28 PM, Anatoly Sokolov wrote:
> * system.h (OUTPUT_ADDR_CONST_EXTRA): Poison.
> * doc/tm.texi.in (OUTPUT_ADDR_CONST_EXTRA): Remove documentation.
> * doc/tm.texi: Regenerate.
> * target.def (output_addr_const_extra): Use
> hook_bool_FILEptr_rtx_false.
> * targhooks.c (default_asm_output_addr_const_extra): Remove.
> * targhooks.h (default_asm_output_addr_const_extra): Remove.
> * hooks.c (hook_bool_FILEptr_rtx_false): New functions.
> * hooks.h (hook_bool_FILEptr_rtx_false): Declare.

Ok.


r~


Re: Commit: RX: Codegen bug fixes

2011-10-05 Thread Richard Henderson
On 10/05/2011 03:23 AM, Nick Clifton wrote:
>   The final fix was pointed out by Richard Henderson.  The recently
>   added support for narrow mode min and max instructions did not work
>   for the SMAX insn, as the RX does not have narrow mode versions of
>   this insn.

The SMIN pattern has the same problem.


r~


Re: [v3] versioned-namespaces spelling/soname change

2011-10-05 Thread Benjamin Kosnik

> I'm going to let this chill a bit on mainline and then check in to
> 4.6.x.

Seems fine so I dropped this into the 4_6-branch

tested x86/linux

-benjamin
2011-10-05  Benjamin Kosnik  

PR libstdc++/48698
* acinclude.m4 (GLIBCXX_ENABLE_SYMVERS): Set libtool_VERSION here.
* configure.ac: Move AC_SUBST of libtool_VERSION past call to
GLIBCXX_ENABLE_SYMVERS.
* configure: Regenerate.
* include/bits/c++config: Use __7 as versioned namespace name.
* config/abi/pre/gnu-versioned-namespace.ver: Change mangling as
per above.
* include/c_global/cwchar: Adjust nested namespaces.
* testsuite/20_util/bind/48698.cc: Add test case.
* testsuite/ext/profile/mutex_extensions_neg.cc: Change line number.


Index: configure.ac
===
--- configure.ac	(revision 179579)
+++ configure.ac	(working copy)
@@ -11,10 +11,6 @@
 # exported.  Only used at the end of this file.
 ### am handles this now?  ORIGINAL_LD_FOR_MULTILIBS=$LD
 
-# For libtool versioning info, format is CURRENT:REVISION:AGE
-libtool_VERSION=6:16:0
-AC_SUBST(libtool_VERSION)
-
 # Find the rest of the source tree framework.
 AM_ENABLE_MULTILIB(, ..)
 
@@ -303,6 +299,8 @@
 
 # This depends on GLIBCXX CHECK_LINKER_FEATURES, but without it assumes no.
 GLIBCXX_ENABLE_SYMVERS([yes])
+AC_SUBST(libtool_VERSION)
+
 GLIBCXX_ENABLE_VISIBILITY([yes])
 
 ac_ldbl_compat=no
Index: include/bits/locale_facets.tcc
===
--- include/bits/locale_facets.tcc	(revision 179579)
+++ include/bits/locale_facets.tcc	(working copy)
@@ -635,15 +635,11 @@
 
 	  const char_type __c = *__beg;
 
-	  if (!__donef)
-		__testf = __c == __lc->_M_falsename[__n];
-
+	  __testf = __c == __lc->_M_falsename[__n];
 	  if (!__testf && __donet)
 		break;
 
-	  if (!__donet)
-		__testt = __c == __lc->_M_truename[__n];
-
+	  __testt = __c == __lc->_M_truename[__n];
 	  if (!__testt && __donef)
 		break;
 
Index: include/bits/c++config
===
--- include/bits/c++config	(revision 179579)
+++ include/bits/c++config	(working copy)
@@ -162,41 +162,42 @@
 
 
 // Defined if inline namespaces are used for versioning.
-#define _GLIBCXX_INLINE_VERSION
+#define _GLIBCXX_INLINE_VERSION 
 
 // Inline namespace for symbol versioning.
 #if _GLIBCXX_INLINE_VERSION
+
 namespace std
 {
-  inline namespace _6 { }
+  inline namespace __7 { }
 
-  namespace rel_ops { inline namespace _6 { } }
+  namespace rel_ops { inline namespace __7 { } }
 
   namespace tr1
   {
-inline namespace _6 { }
-namespace placeholders { inline namespace _6 { } }
-namespace regex_constants { inline namespace _6 { } }
-namespace __detail { inline namespace _6 { } }
+inline namespace __7 { }
+namespace placeholders { inline namespace __7 { } }
+namespace regex_constants { inline namespace __7 { } }
+namespace __detail { inline namespace __7 { } }
   }
 
-  namespace decimal { inline namespace _6 { } }
+  namespace decimal { inline namespace __7 { } }
 
-  namespace chrono { inline namespace _6 { } }
-  namespace placeholders { inline namespace _6 { } }
-  namespace regex_constants { inline namespace _6 { } }
-  namespace this_thread { inline namespace _6 { } }
+  namespace chrono { inline namespace __7 { } }
+  namespace placeholders { inline namespace __7 { } }
+  namespace regex_constants { inline namespace __7 { } }
+  namespace this_thread { inline namespace __7 { } }
 
-  namespace __detail { inline namespace _6 { } }
-  namespace __regex { inline namespace _6 { } }
+  namespace __detail { inline namespace __7 { } }
+  namespace __regex { inline namespace __7 { } }
 }
 
 namespace __gnu_cxx
 {
-  inline namespace _6 { }
-  namespace __detail { inline namespace _6 { } }
+  inline namespace __7 { }
+  namespace __detail { inline namespace __7 { } }
 }
-# define _GLIBCXX_BEGIN_NAMESPACE_VERSION namespace _6 {
+# define _GLIBCXX_BEGIN_NAMESPACE_VERSION namespace __7 {
 # define _GLIBCXX_END_NAMESPACE_VERSION }
 #else
 # define _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -213,7 +214,7 @@
   namespace __cxx1998
   {
 #if _GLIBCXX_INLINE_VERSION
- inline namespace _6 { }
+ inline namespace __7 { }
 #endif
   }
 
Index: include/c_global/cwchar
===
--- include/c_global/cwchar	(revision 179579)
+++ include/c_global/cwchar	(working copy)
@@ -136,6 +136,8 @@
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
   using ::wint_t;
 
   using ::btowc;
@@ -207,8 +209,6 @@
   using ::wcsstr;
   using ::wmemchr;
 
-_GLIBCXX_BEGIN_NAMESPACE_VERSION
-
 #ifndef __CORRECT_ISO_CPP_WCHAR_H_PROTO
   inline wchar_t*
   wcschr(wchar_t* __p, wchar_t __c)
Index: testsuite/ext/profile/mutex_extensions_neg.cc
===

Re: Initial shrink-wrapping patch

2011-10-05 Thread Bernd Schmidt
On 10/06/11 01:04, Ian Lance Taylor wrote:
> On Wed, Oct 5, 2011 at 10:17 AM, Bernd Schmidt  
> wrote:
>>
>> I've committed the following after a x86_64-linux bootstrap.
> 
> This patch appears to have broken the Go bootstrap when compiling a C
> file in libgo.  Here is a reduced test case:
> 
> static struct
> {
>   int n;
>   unsigned int m;
>   _Bool f;
> } g;
> 
> _Bool
> f (int s)
> {
>   unsigned int b, m;
> 
>   if (!g.f)
> return 0;
>   b = 1 << s;
>   while (1)
> {
>   m = g.m;
>   if (m & b)
> break;
>   if (__sync_bool_compare_and_swap (&g.m, m, m|b))
> {
>   if (m == 0)
> f2 (&g.n);
>   break;
> }
> }
>   return 1;
> }
> 
> Compiling this file with -O2 -fsplit-stack causes an ICE:
> 
> foo.c:29:1: internal compiler error: in maybe_record_trace_start, at
> dwarf2cfi.c:2243
> 
> Compiling the file with -O2 -fsplit-stack -fno-shrink-wrap works.

This appears to be because the split prologue contains a jump, which
means the find_many_sub_blocks call reorders the block numbers, and our
indices into bb_flags are off.

I'm testing the following patch, but it won't finish today - feel free
to test and check it in, or to just disable shrink-wrapping with split
prologues.


Bernd
Index: gcc/function.c
===
--- gcc/function.c  (revision 179577)
+++ gcc/function.c  (working copy)
@@ -5455,7 +5455,7 @@ thread_prologue_and_epilogue_insns (void
   basic_block last_bb;
   bool last_bb_active;
 #ifdef HAVE_simple_return
-  bool unconverted_simple_returns = false;
+  VEC (basic_block, heap) *unconverted_simple_returns = NULL;
   basic_block simple_return_block_hot = NULL;
   basic_block simple_return_block_cold = NULL;
   bool nonempty_prologue;
@@ -5876,7 +5876,8 @@ thread_prologue_and_epilogue_insns (void
{
 #ifdef HAVE_simple_return
  if (simple_p)
-   unconverted_simple_returns = true;
+   VEC_safe_push (basic_block, heap,
+  unconverted_simple_returns, bb);
 #endif
  continue;
}
@@ -5894,7 +5895,8 @@ thread_prologue_and_epilogue_insns (void
{
 #ifdef HAVE_simple_return
  if (simple_p)
-   unconverted_simple_returns = true;
+   VEC_safe_push (basic_block, heap,
+  unconverted_simple_returns, bb);
 #endif
  continue;
}
@@ -6042,10 +6044,11 @@ epilogue_done:
  convert to conditional simple_returns, but couldn't for some
  reason, create a block to hold a simple_return insn and redirect
  those remaining edges.  */
-  if (unconverted_simple_returns)
+  if (!VEC_empty (basic_block, unconverted_simple_returns))
 {
-  edge_iterator ei2;
   basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
+  basic_block src_bb;
+  int i;
 
   gcc_assert (entry_edge != orig_entry_edge);
 
@@ -6062,19 +6065,12 @@ epilogue_done:
simple_return_block_cold = e->dest;
}
 
-restart_scan:
-  for (ei2 = ei_start (last_bb->preds); (e = ei_safe_edge (ei2)); )
+  FOR_EACH_VEC_ELT (basic_block, unconverted_simple_returns, i, src_bb)
{
- basic_block bb = e->src;
+ edge e = find_edge (src_bb, last_bb);
  basic_block *pdest_bb;
 
- if (bb == ENTRY_BLOCK_PTR
- || bitmap_bit_p (&bb_flags, bb->index))
-   {
- ei_next (&ei2);
- continue;
-   }
- if (BB_PARTITION (e->src) == BB_HOT_PARTITION)
+ if (BB_PARTITION (src_bb) == BB_HOT_PARTITION)
pdest_bb = &simple_return_block_hot;
  else
pdest_bb = &simple_return_block_cold;
@@ -6094,8 +6090,8 @@ epilogue_done:
  make_edge (bb, EXIT_BLOCK_PTR, 0);
}
  redirect_edge_and_branch_force (e, *pdest_bb);
- goto restart_scan;
}
+  VEC_free (basic_block, heap, unconverted_simple_returns);
 }
 #endif
 


Re: [v3] versioned-namespaces spelling/soname change

2011-10-05 Thread Paolo Carlini

Hi,

the below hunk seems spurious?!?

Paolo.

/

Index: include/bits/locale_facets.tcc
===
--- include/bits/locale_facets.tcc  (revision 179579)
+++ include/bits/locale_facets.tcc  (working copy)
@@ -635,15 +635,11 @@

  const char_type __c = *__beg;

- if (!__donef)
-   __testf = __c == __lc->_M_falsename[__n];
-
+ __testf = __c == __lc->_M_falsename[__n];
  if (!__testf&&  __donet)
break;

- if (!__donet)
-   __testt = __c == __lc->_M_truename[__n];
-
+ __testt = __c == __lc->_M_truename[__n];
  if (!__testt&&  __donef)
break;

 





[v3] Avoid pod_char_traits.h warnings in C++0x mode

2011-10-05 Thread Paolo Carlini

Hi,

committed to mainline.

Paolo.


2011-10-05  Paolo Carlini  

* include/ext/pod_char_traits.h: Avoid warnings in C++0x mode
when int_type is unsigned.
Index: include/ext/pod_char_traits.h
===
--- include/ext/pod_char_traits.h   (revision 179582)
+++ include/ext/pod_char_traits.h   (working copy)
@@ -1,6 +1,7 @@
 // POD character, std::char_traits specialization -*- C++ -*-
 
-// Copyright (C) 2002, 2003, 2004, 2005, 2007, 2009 Free Software Foundation, 
Inc.
+// Copyright (C) 2002, 2003, 2004, 2005, 2007, 2009, 2010, 2011
+// Free Software Foundation, Inc.
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
 // software; you can redistribute it and/or modify it under the
@@ -32,6 +33,8 @@
 #ifndef _POD_CHAR_TRAITS_H
 #define _POD_CHAR_TRAITS_H 1
 
+#pragma GCC system_header
+
 #include 
 
 namespace __gnu_cxx _GLIBCXX_VISIBILITY(default)


Re: Fix htab lookup bug in reregister_specialization (issue5190046)

2011-10-05 Thread Jason Merrill

On 10/05/2011 05:15 PM, Jakub Jelinek wrote:

If optimize_specialization_lookup_p (tmpl) doesn't change return value in
between the two calls, then you are right.  But perhaps in that case
you could avoid the second call and use slot != NULL instead.


That makes sense too.

Jason



Re: [v3] add max_size and rebind to __alloc_traits

2011-10-05 Thread Paolo Carlini

Hi,

2011-10-04  Jonathan Wakely

* include/ext/alloc_traits.h (__alloc_traits::max_size): Define.
(__alloc_traits::rebind): Define.
* include/bits/stl_vector.h: Use them.
* testsuite/util/testsuite_allocator.h (SimpleAllocator): Define.
* testsuite/23_containers/vector/allocator/minimal.cc: New.
* testsuite/23_containers/vector/requirements/dr438/assign_neg.cc:
Adjust dg-error line numbers.
* testsuite/23_containers/vector/requirements/dr438/insert_neg.cc:
Likewise.
* testsuite/23_containers/vector/requirements/dr438/
constructor_1_neg.cc: Likewise.
* testsuite/23_containers/vector/requirements/dr438/
constructor_2_neg.cc: Likewise.

today I ran the whole testsuite in C++0x mode and I'm pretty sure that 
23_containers/vector/modifiers/swap/3.cc, which is now failing, wasn't a 
couple of days ago (I ran the whole testsuite like that in order to 
validate my std::list changes). When you have time, could you please 
double check? (maybe after all we *do* want it to fail in C++0x mode, 
but I'd like to understand if the behavior changed inadvertently!)


Thanks in advance,
Paolo.


Re: Fix pr50607 bconstp-3.c failure

2011-10-05 Thread Artem Shinkarov
On Wed, Oct 5, 2011 at 5:32 PM, Artem Shinkarov
 wrote:
> On Wed, Oct 5, 2011 at 5:28 PM, Joseph S. Myers  
> wrote:
>> On Wed, 5 Oct 2011, Artem Shinkarov wrote:
>>
>>> Joseph, is it possible to commit the patch together with the space fixes?
>>
>> You should not commit whitespace fixes to lines not otherwise modified by
>> a patch, except in a separate commit that *only* has whitespace or other
>> formatting fixes, so that "svn blame" results are more meaningful.  That
>> is: commit whitespace fixes (assuming they are genuinely fixes rather than
>> making things worse) *separately* in a commit whose commit message makes
>> clear it is just whitespace fixes.
>
> Ok, I see.  In the given patch all the fixes are mainly auto-generated
> and remove the trailing spaces.  But I see your point.
>
> Thanks,
> Artem.
>> --
>> Joseph S. Myers
>> jos...@codesourcery.com
>>
>

Successfully regtested on x86-unknown-linux-gnu. Committed to the
mainline with the revision 179588.

ChangeLog:
2011-10-06  Artjoms Sinkarovs  
* c-tree.h (c_expr_t): New typedef for struct c_expr.
(C_EXPR_APPEND): New macro.
* c-parser.c (c_parser_get_builtin_args): Preserve
original_tree_code of c_expr structure. Fixes bconstp-3.c
failure of PR50607.
(c_parser_postfix_expression): Adjust to the new function.


Artem.


Re: Fix pr50607 bconstp-3.c failure

2011-10-05 Thread Hans-Peter Nilsson
On Thu, 6 Oct 2011, Artem Shinkarov wrote:
> Successfully regtested on x86-unknown-linux-gnu. Committed to the
> mainline with the revision 179588.
>
> ChangeLog:
> 2011-10-06  Artjoms Sinkarovs  
> * c-tree.h (c_expr_t): New typedef for struct c_expr.
> (C_EXPR_APPEND): New macro.
> * c-parser.c (c_parser_get_builtin_args): Preserve
> original_tree_code of c_expr structure. Fixes bconstp-3.c
> failure of PR50607.
> (c_parser_postfix_expression): Adjust to the new function.

Write that changelog entry as:

PR middle-end/50607
* c-tree.h (c_expr_t): New typedef for struct c_expr.
(C_EXPR_APPEND): New macro.
* c-parser.c (c_parser_get_builtin_args): Preserve
original_tree_code of c_expr structure.
(c_parser_postfix_expression): Adjust to the new function.

(note top marker and without the "Fixes...")
and there'll be an automatic entry in bugzilla, like for example
PR50609.  You still have to close the bugzilla entry manually
(not all noteworthy fixes are reason to close a bug report).

Mostly as a note for the future but you could fix that now, when
adding the missing empty line after the date+email line. ;)

brgds, H-P


Re: Fix pr50607 bconstp-3.c failure

2011-10-05 Thread Artem Shinkarov
On Thu, Oct 6, 2011 at 3:27 AM, Hans-Peter Nilsson  wrote:
> On Thu, 6 Oct 2011, Artem Shinkarov wrote:
>> Successfully regtested on x86-unknown-linux-gnu. Committed to the
>> mainline with the revision 179588.
>>
>> ChangeLog:
>> 2011-10-06  Artjoms Sinkarovs  
>>         * c-tree.h (c_expr_t): New typedef for struct c_expr.
>>         (C_EXPR_APPEND): New macro.
>>         * c-parser.c (c_parser_get_builtin_args): Preserve
>>         original_tree_code of c_expr structure. Fixes bconstp-3.c
>>         failure of PR50607.
>>         (c_parser_postfix_expression): Adjust to the new function.
>
> Write that changelog entry as:
>
>        PR middle-end/50607
>        * c-tree.h (c_expr_t): New typedef for struct c_expr.
>        (C_EXPR_APPEND): New macro.
>        * c-parser.c (c_parser_get_builtin_args): Preserve
>        original_tree_code of c_expr structure.
>        (c_parser_postfix_expression): Adjust to the new function.
>
> (note top marker and without the "Fixes...")
> and there'll be an automatic entry in bugzilla, like for example
> PR50609.  You still have to close the bugzilla entry manually
> (not all noteworthy fixes are reason to close a bug report).
>
> Mostly as a note for the future but you could fix that now, when
> adding the missing empty line after the date+email line. ;)
>
> brgds, H-P
>

Thanks for the advice, I didn't know about the automatic mail. Fixed
with commit 179589.  As for closing bugzilla entry, I would prefer to
make sure that the patch really fixes the problem.  It would be very
helpful if you could confirm this.


Thanks,
Artem.


Merge to gccgo branch

2011-10-05 Thread Ian Lance Taylor
I merged mainlined revision 179565 to the gccgo branch.  I included a
patch disabling -fshrink-wrap if -fstack-prologue is enabled.

Ian


  1   2   >