Re: [tsan] Instrument atomics

2012-11-27 Thread Dmitry Vyukov
On Fri, Nov 23, 2012 at 6:05 PM, Jakub Jelinek  wrote:
> Hi!
>
> This patch attempts to instrument __atomic_* and __sync_* builtins.
> Unfortunately for none of the builtins there is 1:1 mapping to the tsan
> replacements, tsan uses weirdo memory model encoding (instead of values
> from 0 to 5 apparently 1 << 0 to 1 << 5, as if one could have more than
> one memory model at the same time), so even for the easy cases GCC
> has to transform them.  More importantly, there is no way to pass through
> __ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE.  Right now the patch just gives
> up and doesn't replace anything if e.g.
> __atomic_store_n (p, 0, __ATOMIC_HLE_RELEASE | __ATOMIC_RELEASE);
> is seen, perhaps one could ignore the hle bits which are just an
> optimization, but there is no way to find out in the generic code
> whether the extra bits are just an optimization or change the behavior
> of the builtin.
>
> Another issue is that libtsan apparently internally uses just the deprecated
> __sync_* builtins (or no builtins at all for load/store).  That in some
> cases works (because __sync_* is usually equivalent to __atomic_* with
> __ATOMIC_SEQ_CST memmodel), but for the load/store cases and for atomic
> exchange it doesn't (the former don't provide any barrier, the latter
> in __sync_lock_test_and_set is only __ATOMIC_ACQUIRE barrier).
> Can libtsan at least conditionally, when built with GCC 4.7 or later,
> use __atomic_* builtins instead of __sync_*?  One still probably has to
> use __ATOMIC_SEQ_CST model anyway, otherwise there would need to be a switch
> based on the mem model, as  only constant arguments to __atomic_* builtins
> do something other than sequential consistency.
>
> Another issue is that there are no fetch + nand (or nand + fetch) tsan
> entrypoints, I could transform those into a loop using cas, but that looks
> overkill, then it would be better to change libtsan.
>
> The most important problem is that __tsan_atomic*_compare_exchange has just
> a single memory model argument, while the __atomic_* builtins have two (one
> for success, another one for failure).  Right now the patch just ignores
> the failure memory model, but that might actually lead into not detecting
> a race in a program when it should have been detected (note the failure
> memory model can't be stronger than success memory model).  Would it be
> possible to add another argument to those, and use the failure mode instead
> of success mode if the atomic operation fails?
>
> Apparently there are also no entry points for op and fetch (as opposed to
> fetch and op), but the patch translates those into the corresponding
> fetch and op libcalls with + val (etc.) on the result.
>
> Oh, and there are no 16-byte atomics provided by libtsan, the library
> would need to be built with -mcx16 on x86_64 for that and have the
> additional entry points for unsigned __int128.  The patch doesn't touch
> the 16-byte atomics, leaves them as is.


Hi,

I've added 128-bit atomic ops:
http://llvm.org/viewvc/llvm-project?view=rev&revision=168683

Refactored atomic ops so that the atomic operation itself is done
under the mutex:
http://llvm.org/viewvc/llvm-project?view=rev&revision=168682

And added atomic nand operation:
http://llvm.org/viewvc/llvm-project?view=rev&revision=168584


Re: [tsan] Instrument atomics

2012-11-27 Thread Dmitry Vyukov
On Fri, Nov 23, 2012 at 6:05 PM, Jakub Jelinek  wrote:
> Hi!
>
> This patch attempts to instrument __atomic_* and __sync_* builtins.
> Unfortunately for none of the builtins there is 1:1 mapping to the tsan
> replacements, tsan uses weirdo memory model encoding (instead of values
> from 0 to 5 apparently 1 << 0 to 1 << 5, as if one could have more than
> one memory model at the same time), so even for the easy cases GCC
> has to transform them.  More importantly, there is no way to pass through
> __ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE.  Right now the patch just gives
> up and doesn't replace anything if e.g.
> __atomic_store_n (p, 0, __ATOMIC_HLE_RELEASE | __ATOMIC_RELEASE);
> is seen, perhaps one could ignore the hle bits which are just an
> optimization, but there is no way to find out in the generic code
> whether the extra bits are just an optimization or change the behavior
> of the builtin.
>
> Another issue is that libtsan apparently internally uses just the deprecated
> __sync_* builtins (or no builtins at all for load/store).  That in some
> cases works (because __sync_* is usually equivalent to __atomic_* with
> __ATOMIC_SEQ_CST memmodel), but for the load/store cases and for atomic
> exchange it doesn't (the former don't provide any barrier, the latter
> in __sync_lock_test_and_set is only __ATOMIC_ACQUIRE barrier).
> Can libtsan at least conditionally, when built with GCC 4.7 or later,
> use __atomic_* builtins instead of __sync_*?  One still probably has to
> use __ATOMIC_SEQ_CST model anyway, otherwise there would need to be a switch
> based on the mem model, as  only constant arguments to __atomic_* builtins
> do something other than sequential consistency.
>
> Another issue is that there are no fetch + nand (or nand + fetch) tsan
> entrypoints, I could transform those into a loop using cas, but that looks
> overkill, then it would be better to change libtsan.
>
> The most important problem is that __tsan_atomic*_compare_exchange has just
> a single memory model argument, while the __atomic_* builtins have two (one
> for success, another one for failure).  Right now the patch just ignores
> the failure memory model, but that might actually lead into not detecting
> a race in a program when it should have been detected (note the failure
> memory model can't be stronger than success memory model).  Would it be
> possible to add another argument to those, and use the failure mode instead
> of success mode if the atomic operation fails?
>
> Apparently there are also no entry points for op and fetch (as opposed to
> fetch and op), but the patch translates those into the corresponding
> fetch and op libcalls with + val (etc.) on the result.
>
> Oh, and there are no 16-byte atomics provided by libtsan, the library
> would need to be built with -mcx16 on x86_64 for that and have the
> additional entry points for unsigned __int128.  The patch doesn't touch
> the 16-byte atomics, leaves them as is.
>
> This patch is on top of the patch from yesterday which introduced use of
> BUILT_IN_* functions in asan and builds them if the FE doesn't.
>
> I've used the attached testcase to verify it compiles, and quickly skimmed
> the changes it is doing, but haven't actually tested it more than that.
>
> As for local statics mentioned in the PR too, those are in GCC handled by
> __cxa_guard_{acquire,release,abort} library functions, so either libtsan would
> need to intercept those calls, or we'd need to add some instrumentation
> before and/or after that (what would be needed?).
>
> And, I'm also wondering about string/memory builtins asan is instrumenting,
> wouldn't it be useful to instrument those similarly for tsan (say also
> just the first byte accessed and last one)?  I know you are overriding most
> of them in libtsan, but some of them are often expanded inline (e.g. memcpy,
> strlen, etc.) and thus the libtsan replacements aren't executed.
> Or better yet, is there some libtsan entry point to report memory access
> range?
>
> 2012-11-23  Jakub Jelinek  
>
> PR sanitizer/55439
> * Makefile.in (tsan.o): Depend on tree-ssa-propagate.h.
> * sanitizer.def: Add __tsan_atomic* builtins.
> * asan.c (initialize_sanitizer_builtins): Adjust to also
> initialize __tsan_atomic* builtins.
> * tsan.c: Include tree-ssa-propagate.h.
> (enum tsan_atomic_action): New enum.
> (tsan_atomic_table): New table.
> (instrument_builtin_call): New function.
> (instrument_gimple): Take pointer to gimple_stmt_iterator
> instead of gimple_stmt_iterator.  Call instrument_builtin_call
> on builtin call stmts.
> (instrument_memory_accesses): Adjust instrument_gimple caller.
> * builtin-types.def (BT_FN_BOOL_VPTR_PTR_I1_INT,
> BT_FN_BOOL_VPTR_PTR_I2_INT, BT_FN_BOOL_VPTR_PTR_I4_INT,
> BT_FN_BOOL_VPTR_PTR_I8_INT): New.
>
> --- gcc/Makefile.in.jj  2012-11-23 10:31:37.861377311 +0100
> +++ gcc/Makefile.in 201

Re: [tsan] Instrument atomics

2012-11-27 Thread Jakub Jelinek
On Tue, Nov 27, 2012 at 12:13:42PM +0400, Dmitry Vyukov wrote:
> I've added 128-bit atomic ops:
> http://llvm.org/viewvc/llvm-project?view=rev&revision=168683

Thanks.
> 
> Refactored atomic ops so that the atomic operation itself is done
> under the mutex:
> http://llvm.org/viewvc/llvm-project?view=rev&revision=168682

This is IMHO very wrong.  It is fine if you hold some mutex over the atomic
operation, but you do need to perform the operation atomicly, e.g. you can't
rely that all atomic accesses to that memory location are performed from
-fsanitize=thread compiled code (it could be from non-instrumented code
from e.g. other libraries, could be from user written assembly, etc.).
By not doing the operation atomicly, the other code might observe
inconsistent state, or the tsan non-atomic code might be doing the wrong
thing.  One thing is that the tsan analysis will be right, but another
thing is that the program itself might misbehave.

> And added atomic nand operation:
> http://llvm.org/viewvc/llvm-project?view=rev&revision=168584

Thanks.  Can you please also add the memory range read/write functions?
That could be used both for builtins (e.g. if user writes
  __builtin_memcpy (&a, &b, sizeof (a));
in his code, no -fno-builtin or similar actually disables that), compiler
generated builtins - e.g. structure copies
struct large a, b;
  ...
  a = b;
and also for accesses of sizes that aren't supported by the library
(consider 32-byte or 64-byte vectors etc.).

Jakub


Re: PATCH: lto/55474: global-buffer-overflow in lto-wrapper.c

2012-11-27 Thread Markus Trippelsdorf
On 2012.11.26 at 13:58 -0800, H.J. Lu wrote:
> Hi,
> 
> OPT_SPECIAL_unknown, OPT_SPECIAL_ignore, OPT_SPECIAL_program_name and
> OPT_SPECIAL_input_file are special options, which aren't in cl_options.
> This patch avoids
> 
> if (!(cl_options[foption->opt_index].flags & CL_TARGET))
> 
> on them.  This patch skips them.  OK to install?

The same fix is necessary for gcc/lto-opts.c.
This solves the issue from PR54795 comment 5, 13 and 20.

2012-11-27  Markus Trippelsdorf  

PR lto/54795
* lto-opts.c (lto_write_options): Also handle
OPT_SPECIAL_unknown, OPT_SPECIAL_ignore and
OPT_SPECIAL_program_name.


diff --git a/gcc/lto-opts.c b/gcc/lto-opts.c
index a235f41..a61a26f 100644
--- a/gcc/lto-opts.c
+++ b/gcc/lto-opts.c
@@ -93,6 +93,20 @@ lto_write_options (void)
 {
   struct cl_decoded_option *option = &save_decoded_options[i];
 
+  /* Skip explicitly some common options that we do not need.  */
+  switch (option->opt_index)
+  {
+   case OPT_dumpbase:
+   case OPT_SPECIAL_unknown:
+   case OPT_SPECIAL_ignore:
+   case OPT_SPECIAL_program_name:
+   case OPT_SPECIAL_input_file:
+ continue;
+
+   default:
+ break;
+  }
+
   /* Skip frontend and driver specific options here.  */
   if (!(cl_options[option->opt_index].flags & 
(CL_COMMON|CL_TARGET|CL_LTO)))
continue;
@@ -108,17 +122,6 @@ lto_write_options (void)
   if (cl_options[option->opt_index].flags & (CL_DRIVER|CL_WARNING))
continue;
 
-  /* Skip explicitly some common options that we do not need.  */
-  switch (option->opt_index)
-   {
-   case OPT_dumpbase:
-   case OPT_SPECIAL_input_file:
- continue;
-
-   default:
- break;
-   }
-
   for (j = 0; j < option->canonical_option_num_elements; ++j)
append_to_collect_gcc_options (&temporary_obstack, &first_p,
   option->canonical_option[j]);
-- 
Markus


Re: [tsan] Instrument atomics

2012-11-27 Thread Dmitry Vyukov
On Tue, Nov 27, 2012 at 12:23 PM, Jakub Jelinek  wrote:
> On Tue, Nov 27, 2012 at 12:13:42PM +0400, Dmitry Vyukov wrote:
>> I've added 128-bit atomic ops:
>> http://llvm.org/viewvc/llvm-project?view=rev&revision=168683
>
> Thanks.
>>
>> Refactored atomic ops so that the atomic operation itself is done
>> under the mutex:
>> http://llvm.org/viewvc/llvm-project?view=rev&revision=168682
>
> This is IMHO very wrong.  It is fine if you hold some mutex over the atomic
> operation, but you do need to perform the operation atomicly, e.g. you can't
> rely that all atomic accesses to that memory location are performed from
> -fsanitize=thread compiled code (it could be from non-instrumented code
> from e.g. other libraries, could be from user written assembly, etc.).
> By not doing the operation atomicly, the other code might observe
> inconsistent state, or the tsan non-atomic code might be doing the wrong
> thing.  One thing is that the tsan analysis will be right, but another
> thing is that the program itself might misbehave.


Yes, you are right.
I think I've done them atomically initially because of things like
FUTEX_WAKE_OP. I will fix that.


>> And added atomic nand operation:
>> http://llvm.org/viewvc/llvm-project?view=rev&revision=168584
>
> Thanks.  Can you please also add the memory range read/write functions?
> That could be used both for builtins (e.g. if user writes
>   __builtin_memcpy (&a, &b, sizeof (a));
> in his code, no -fno-builtin or similar actually disables that), compiler
> generated builtins - e.g. structure copies
> struct large a, b;
>   ...
>   a = b;
> and also for accesses of sizes that aren't supported by the library
> (consider 32-byte or 64-byte vectors etc.).

Done:
http://llvm.org/viewvc/llvm-project?view=rev&revision=168692


Re: [tsan] Instrument atomics

2012-11-27 Thread Dmitry Vyukov
On Tue, Nov 27, 2012 at 12:47 PM, Dmitry Vyukov  wrote:
> On Tue, Nov 27, 2012 at 12:23 PM, Jakub Jelinek  wrote:
>> On Tue, Nov 27, 2012 at 12:13:42PM +0400, Dmitry Vyukov wrote:
>>> I've added 128-bit atomic ops:
>>> http://llvm.org/viewvc/llvm-project?view=rev&revision=168683
>>
>> Thanks.
>>>
>>> Refactored atomic ops so that the atomic operation itself is done
>>> under the mutex:
>>> http://llvm.org/viewvc/llvm-project?view=rev&revision=168682
>>
>> This is IMHO very wrong.  It is fine if you hold some mutex over the atomic
>> operation, but you do need to perform the operation atomicly, e.g. you can't
>> rely that all atomic accesses to that memory location are performed from
>> -fsanitize=thread compiled code (it could be from non-instrumented code
>> from e.g. other libraries, could be from user written assembly, etc.).
>> By not doing the operation atomicly, the other code might observe
>> inconsistent state, or the tsan non-atomic code might be doing the wrong
>> thing.  One thing is that the tsan analysis will be right, but another
>> thing is that the program itself might misbehave.
>
>
> Yes, you are right.
> I think I've done them atomically initially because of things like
> FUTEX_WAKE_OP. I will fix that.
>
>
>>> And added atomic nand operation:
>>> http://llvm.org/viewvc/llvm-project?view=rev&revision=168584
>>
>> Thanks.  Can you please also add the memory range read/write functions?
>> That could be used both for builtins (e.g. if user writes
>>   __builtin_memcpy (&a, &b, sizeof (a));
>> in his code, no -fno-builtin or similar actually disables that), compiler
>> generated builtins - e.g. structure copies
>> struct large a, b;
>>   ...
>>   a = b;
>> and also for accesses of sizes that aren't supported by the library
>> (consider 32-byte or 64-byte vectors etc.).
>
> Done:
> http://llvm.org/viewvc/llvm-project?view=rev&revision=168692


Kostya is going to do gcc<->llvm library sync today.


[PATCH] Fix PR55331

2012-11-27 Thread Richard Biener

This is an alternate fix for PR55331, bootstrapped and tested on
x86_64-unknown-linux-gnu, applied.

Richard.

2012-11-16  Richard Biener  

PR middle-end/55331
* gimple-fold.c (gimplify_and_update_call_from_tree): Replace
stmt with a NOP instead of removing it.

* g++.dg/opt/pr55331.C: New testcase.

Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c   (revision 193816)
+++ gcc/gimple-fold.c   (working copy)
@@ -607,7 +607,7 @@ gimplify_and_update_call_from_tree (gimp
  unlink_stmt_vdef (stmt);
  release_defs (stmt);
}
- gsi_remove (si_p, true);
+ gsi_replace (si_p, gimple_build_nop (), true);
  return;
}
 }
Index: gcc/testsuite/g++.dg/opt/pr55331.C
===
--- gcc/testsuite/g++.dg/opt/pr55331.C  (revision 0)
+++ gcc/testsuite/g++.dg/opt/pr55331.C  (working copy)
@@ -0,0 +1,14 @@
+// PR tree-optimization/55331
+// { dg-do compile }
+// { dg-options "-O2 -fno-tree-fre" }
+
+struct A {};
+
+void
+foo (A *p, bool x)
+{
+  A a;
+  char *e = (char *) (&a + 1);
+  if (x)
+__builtin_memmove (p, &a, e - (char *) &a);
+}


Re: [PATCH] Fix the bug of comparing goto_locus to UNKNOWN_LOCATION

2012-11-27 Thread Eric Botcazou
> gcc/ChangeLog:
> 2012-11-26  Dehao Chen  
> 
>* cfgrtl.c (rtl_merge_blocks): Check with UNKNOWN_LOCATION correctly.
>(cfg_layout_merge_blocks): Likewise.

OK, thanks.

-- 
Eric Botcazou


Re: [RFC] Wrong register numbers in .dwarf_frame on Linux/PowerPC

2012-11-27 Thread Mark Wielaard
On Mon, 2012-11-26 at 20:10 +0100, Ulrich Weigand wrote:
> I noticed what appears to be a long-standing bug in generating .dwarf_frame
> sections with GCC on Linux on PowerPC.
> 
> It had been my understanding that .dwarf_frame is supposed to differ from
> .eh_frame on PowerPC w.r.t. register numbers: .eh_frame should use GCC
> internal numbers, while .dwarf_frame should use the DWARF register numbers
> documented in the PowerPC ELF ABI.  However, in actual fact, .dwarf_frame
> does not use the DWARF numbers; and it does not use the GCC numbers either,
> but a weird mixture: it uses GCC numbers for everything except for the
> condition code register, for which it uses the DWARF number (64).
> [...]
> Unfortunately, "use a newer version of
> GCC" isn't really quite right any more: the only versions of GCC that
> ever did it correctly were 4.0 and 4.1, it would appear.

Aha. Thanks for the investigation. I remember being very confused when
hacking on the Systemtap unwinder for ppc64. This explains it (RHEL5
derived distributions use GCC 4.1, but most others use something much
older or much newer).

> So I'm wondering where to go from here.  I guess we could:
> 
> 1. Bring GCC (and gas) behaviour in compliance with the documented ABI
>by removing the #undef DBX_REGISTER_NUMBER and changing gas's
>md_reg_eh_frame_to_debug_frame to the original implementation from
>Jakub's patch.  That would make GDB work well on new files, but
>there are a large number of binaries out there where we continue
>to have the same behaviour as today ...
> 
> 2. Leave GCC and gas as-is and modify GDB to expect GCC numbering in
>.dwarf_frame, except for the condition code register.  This would
>break debugging of files built with GCC 4.0 and 4.1 unless we
>want to add a special hack for that.
> 
> 3. Like 2., but remove the condition code hack: simply use identical
>numbers in .eh_frame and .dwarf_frame.  This would make PowerPC
>like other Linux platforms in that respect.
> 
> Thoughts?

Which other unwinders are out there, that might rely on the current
numbering? The Systemtap runtime unwinder (*) currently is incomplete
(and in one case wrong since the numbering overlaps), so it doesn't
really matter much which solution you pick (we will just have to watch
out and fix things to be as consistent as possible when your change goes
through). If you do change the numbering it would be ideal if there was
a way to detect which one was in place (although it is probably hopeless
because depending on which GCC version is in use there can already be
different numberings). BTW. The reason the systemtap runtime unwinder is
a little wrong here is because on all other architectures we assume
eh_frame and debug_frame DWARF register numberings are equal, is ppc
really the only architecture for which that isn't true, or were we just
lucky?

Thanks,

Mark

(*) The Systemtap runtime unwinder has this comment that explains (or
maybe confuses things even more...):

/* These are slightly strange since they don't really use dwarf register
   mappings, but gcc internal register numbers. There is some confusion about
   the numbering see http://gcc.gnu.org/ml/gcc/2004-01/msg00025.html
   We just handle the 32 fixed point registers, mq, count and link and
   ignore status registers, floating point, vectors and special registers
   (most of which aren't available in pt_regs anyway). Also we placed nip
   last since we use that as UNW_PC register and it needs to be filled in.
   Note that we handle both the .eh_frame and .debug_frame numbering at
   the same time. There is potential overlap though. 64 maps to cr in one
   and mq in the other...
   Everything else is mapped to an invalid register number . */




Re: [PATCH] Cleanup last_location and update input_location in ipa_prop

2012-11-27 Thread Richard Biener
On Mon, Nov 26, 2012 at 11:54 PM, Dehao Chen  wrote:
> The new patch is attached. Bootstrapped and passed gcc regression test.
>
> Ok for trunk?

Hmm ... this merely avoids some UNKNOWN_LOCATIONs which should
be ok.  I think the issue is that gimplify_expr does:

  saved_location = input_location;
  if (save_expr != error_mark_node
  && EXPR_HAS_LOCATION (*expr_p))
input_location = EXPR_LOCATION (*expr_p);

thus it retains input_location from previous recursive invocations (that's ok).

But it doesn't seem to be the case that input_location is UNKNOWN_LOCATION
during GIMPLE passes (which it really should be ...).  So to fix the
gimplification
issue I think that gimplify_ctx should have a location (initialized to
UNKNOWN_LOCATION), which it saves/restores across its push/pop
operation.

Or of course that inside the pass manager before executing passes assert
that input_location is UNKNOWN_LOCATION and fix up things according
to that.  First offender is in cgraphunit.c in cgraph_analyze_function:

  location_t saved_loc = input_location;
  input_location = DECL_SOURCE_LOCATION (decl);

that should be totally unnecessary (input_location shoud be and stay
UNKNOWN_LOCATION).  And finalize_compilation_unit (the last thing
the frontends should call) should have input_location = UNKNOWN_LOCATION
right at the point it clears current_function_decl / cfun.

I'd prefer the 2nd approach (maybe without the assert as we are in stage3
already, but the assert would certainly help).  And places in the middle-end
that set input_location for purpose of (re-)gimplifying should use the location
in the gimplify ctx (which would need to be added) instead of setting
input_location.

Maybe as a first try set input_location to UNKNOWN_LOCATION in
finalize_compilation_unit.

Thanks,
Richard.

> Thanks,
> Dehao
>
> gcc/ChangeLog:
> 2010-11-05  Dehao Chen  
>
> * ipa-prop.c (ipa_modify_call_arguments): Set loc correctly.
> * emit-rtl.c (last_location): Remove unused variable.
>
> Index: gcc/emit-rtl.c
> ===
> --- gcc/emit-rtl.c (revision 193203)
> +++ gcc/emit-rtl.c (working copy)
> @@ -5937,7 +5937,7 @@ location_t epilogue_location;
>  /* Hold current location information and last location information, so the
> datastructures are built lazily only when some instructions in given
> place are needed.  */
> -static location_t curr_location, last_location;
> +static location_t curr_location;
>
>  /* Allocate insn location datastructure.  */
>  void
> @@ -5945,7 +5945,6 @@ insn_locations_init (void)
>  {
>prologue_location = epilogue_location = 0;
>curr_location = UNKNOWN_LOCATION;
> -  last_location = UNKNOWN_LOCATION;
>  }
>
>  /* At the end of emit stage, clear current location.  */
> Index: gcc/ipa-prop.c
> ===
> --- gcc/ipa-prop.c (revision 193203)
> +++ gcc/ipa-prop.c (working copy)
> @@ -2870,7 +2870,8 @@ ipa_modify_call_arguments (struct cgraph_edge *cs,
>
>gcc_checking_assert (adj->offset % BITS_PER_UNIT == 0);
>base = gimple_call_arg (stmt, adj->base_index);
> -  loc = EXPR_LOCATION (base);
> +  loc = DECL_P (base) ? DECL_SOURCE_LOCATION (base)
> +  : EXPR_LOCATION (base);
>
>if (TREE_CODE (base) != ADDR_EXPR
>&& POINTER_TYPE_P (TREE_TYPE (base)))


RE: [PING Updated]: [PATCH GCC/ARM] Fix problem that hardreg_cprop opportunities are missed on thumb1

2012-11-27 Thread Bin Cheng


> -Original Message-
> From: Ramana Radhakrishnan [mailto:ramana@googlemail.com]
> Sent: Saturday, November 24, 2012 7:20 AM
> To: Bin Cheng
> Cc: gcc-patches@gcc.gnu.org; Ramana Radhakrishnan; Richard Earnshaw;
Richard
> Sandiford
> Subject: Re: [PING Updated]: [PATCH GCC/ARM] Fix problem that
hardreg_cprop
> opportunities are missed on thumb1
> 
> Ok if no regressions.
>
 
I re-tested it against trunk and committed it as r193841

Thanks.


> Ramana
> 
> On Wed, Oct 10, 2012 at 9:57 AM, Bin Cheng  wrote:
> > Ping^2
> >
> >> -Original Message-
> >> From: gcc-patches-ow...@gcc.gnu.org
> >> [mailto:gcc-patches-ow...@gcc.gnu.org]
> > On
> >> Behalf Of Bin Cheng
> >> Sent: Monday, October 08, 2012 2:36 PM
> >> To: gcc-patches@gcc.gnu.org
> >> Cc: Ramana Radhakrishnan; Richard Earnshaw; 'Richard Sandiford'
> >> Subject: RE: [PING Updated]: [PATCH GCC/ARM] Fix problem that
> > hardreg_cprop
> >> opportunities are missed on thumb1
> >>
> >> Ping.
> >>
> >> > -Original Message-
> >> > From: gcc-patches-ow...@gcc.gnu.org
> >> > [mailto:gcc-patches-ow...@gcc.gnu.org]
> >> On
> >> > Behalf Of Bin Cheng
> >> > Sent: Tuesday, September 25, 2012 4:00 PM
> >> > To: 'Richard Sandiford'
> >> > Cc: Ramana Radhakrishnan; Richard Earnshaw; gcc-patches@gcc.gnu.org
> >> > Subject: RE: [Updated]: [PATCH GCC/ARM] Fix problem that
> >> > hardreg_cprop opportunities are missed on thumb1
> >> >
> >> >
> >> > > -Original Message-
> >> > > From: Richard Sandiford [mailto:rdsandif...@googlemail.com]
> >> > > Sent: Wednesday, September 05, 2012 6:09 AM
> >> > > To: Bin Cheng
> >> > > Cc: Ramana Radhakrishnan; 'Eric Botcazou';
> >> > > gcc-patches@gcc.gnu.org
> >> > > Subject: Re: Ping: [PATCH GCC/ARM] Fix problem that hardreg_cprop
> >> > > opportunities are missed on thumb1
> >> >
> >> > > Subtraction of zero isn't canonical rtl though.  Passes after
> >> > > peephole2
> >> > would
> >> > > be well within their rights to simplify the expression back to a
move.
> >> > > From that point of view, making the passes recognise (plus X 0)
> >> > > and (minus
> >> > X 0)
> >> > > as special cases would be inconsistent.
> >> > >
> >> > > Rather than make the Thumb 1 CC usage implicit in the rtl stream,
> >> > > and
> >> > carry
> >> > > the current state around in cfun->machine, it seems like it would
> >> > > be
> >> > better to
> >> > > get md_reorg to rewrite the instructions into a form that makes
> >> > > the use of condition codes explicit.
> >> > >
> >> > > md_reorg also sounds like a better place in the pipeline than
> >> > > peephole2 to
> >> > be
> >> > > doing this kind of transformation, although I admit I have zero
> >> > > evidence
> >> > to
> >> > > back that up...
> >> > >
> >> >
> >> > Hi Richard,
> >> >
> >> > This is the updated patch according to your suggestions. I removed
> >> > the
> >> > peephole2 patterns and introduced function thumb1_reorg to rewrite
> >> > instructions in md_reorg pass.
> >> >
> >> > In addition to missed propagation, this patch also detects
> >> > following
> > case:
> >> >   mov r5, r0
> >> >   str r0, [r4]   <---miscellaneous irrelevant instructions
> >> >   [cmp r0, 0]<---saved
> >> >   bne  .Lxxx
> >> >
> >> > Patch tested on arm-none-eabi/cortex-m0, no regressions introduced.
> >> >
> >> > Is it OK?
> >> >
> >> > Thanks.
> >> >
> >> > 2012-09-25  Bin Cheng  
> >> >
> >> > * config/arm/arm.c (thumb1_reorg): New function.
> >> > (arm_reorg): Call thumb1_reorg.
> >> > (thumb1_final_prescan_insn): Record src operand in thumb1_cc_op0.
> >> > * config/arm/arm.md : Remove peephole2 patterns which rewrites
move
> >> > into subtract of ZERO.
> >>
> >>
> >>





Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)

2012-11-27 Thread Eric Botcazou
> Or rather this one. Same hammer, different color. It turns out that
> the rtlanal.c change caused problems, so I've got to use a home-brewn
> equivalent of remove_reg_equal_equiv_notes_for_regno...
> 
> Test case is unchanged so I'm omitting it here.
> 
> Ciao!
> Steven
> 
> 
> gcc/
>   * loop-unroll.c (struct iv_to_split): Add new 'orig_var' member.
>   (analyze_iv_to_split_insn): Record it.
>   (maybe_strip_eq_note_for_split_iv): New function to remove REG_EQUAL
>   notes that refer to IVs that are being split.
>   (apply_opt_in_copies): Use maybe_strip_eq_note_for_split_iv.  Twice.
>   Use FOR_BB_INSNS_SAFE.

That's fine with me, thanks.  You might want to defer applying it until the 
reason why it isn't apparently sufficient for aermod.f90 is understood though.

-- 
Eric Botcazou


Ping[2]: [PATCH v2] PR tree-optimization/55079: Don't remove all exits of a loop during loop unroll

2012-11-27 Thread Siddhesh Poyarekar
Ping!

Siddhesh

On Wed, Nov 21, 2012 at 12:42:13PM +0530, Siddhesh Poyarekar wrote:
> Hi,
> 
> Ping!
> 
> 
> Siddhesh
> 
> On Thu, 15 Nov 2012 19:05:38 +0530, Siddhesh wrote:
> 
> > Hi,
> > 
> > Here's an updated version of the patch which warns the user if the
> > removing of redundant exits results in an infinite loop.  I have added
> > an additional flag in struct loop called external_exits to record if
> > an exit edge is moved outside the loop body.  This currently happens
> > in the loop-unswitch pass and was the root cause of the regression in
> > torture/pr49518.c that I talked about earlier.  The patch now passes
> > all regression tests except a mudflap case (fail37-frag).  The test is
> > already broken due to removal of all exits so I haven't attempted to
> > fix it as part of this patch.  How does this version look?
> > 
> > Regards,
> > Siddhesh
> > 
> > gcc/ChangeLog:
> > 
> > * cfgloop.h (struct loop): New member EXTERNAL_EXITS.
> > * tree-ssa-loop-ivcanon.c (remove_redundant_iv_tests) Warn
> > when loop is left without any exits.
> > * tree-ssa-loop-unswitch.c (tree_unswitch_single_loop): Set
> > EXTERNAL_EXITS when moving a statement with an exit edge out
> > of the loop body.
> 


Ping[2]: [PATCH] Disable libsanitizer if C++ is not being built

2012-11-27 Thread Siddhesh Poyarekar
Ping!

Siddhesh

On Wed, Nov 21, 2012 at 12:43:10PM +0530, Siddhesh Poyarekar wrote:
> Hi,
> 
> Ping!
> 
> Siddhesh
> 
> On Thu, 15 Nov 2012 19:52:21 +0530, Siddhesh wrote:
> 
> > Hi,
> > 
> > Current HEAD fails build when --enable-languages=c, i.e. C++ is not
> > being built.  Attached patch fixes this.
> > 
> > Regards,
> > Siddhesh
> > 
> > ChangeLog:
> > 
> > 2012-11-15  Siddhesh Poyarekar  
> > 
> > * configure.ac: Disable libsanitizer if we're not building
> > C++.
> > * configure: Regenerate.
> > 
> 


Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-11-27 Thread Richard Biener
On Tue, Nov 27, 2012 at 1:06 AM, Kenneth Zadeck
 wrote:
> Richard,
>
> I spent a good part of the afternoon talking to Mike about this.  He is on
> the c++ standards committee and is a much more seasoned c++ programmer than
> I am.
>
> He convinced me that with a large amount of engineering and c++
> "foolishness" that it was indeed possible to get your proposal to POSSIBLY
> work as well as what we did.
>
> But now the question is why would any want to do this?
>
> At the very least you are talking about instantiating two instances of
> wide-ints, one for the stack allocated uses and one for the places where we
> just move a pointer from the tree or the rtx. Then you are talking about
> creating connectors so that the stack allocated functions can take
> parameters of pointer version and visa versa.
>
> Then there is the issue that rather than just saying that something is a
> wide int, that the programmer is going to have to track it's origin.   In
> particular,  where in the code right now i say.
>
> wide_int foo = wide_int::from_rtx (r1);
> wide_int bar = wide_int::from_rtx (r2) + foo;
>
> now i would have to say
>
> wide_int_ptr foo = wide_int_ptr::from_rtx (r1);
> wide_int_stack bar = wide_int_ptr::from_rtx (r2) + foo;

No, you'd say

wide_int foo = wide_int::from_rtx (r1);

and the static, non-templated from_rtx method would automagically
return (always!) a "wide_int_ptr" kind.  The initialization then would
use the assignment operator that mediates between wide_int and
"wide_int_ptr", doing the copying.

The user should get a 'stack' kind by default when specifying wide_int,
like implemented with

struct wide_int_storage_stack;
struct wide_int_storage_ptr;

template 
class wide_int : public storage
{
...
   static wide_int  from_rtx (rtx);
}

the whole point of the exercise is to make from_rtx and from_tree avoid
the copying (and excessive stack space allocation) for the rvalue case
like in

 wide_int res = wide_int::from_rtx (x) + 1;

if you save the result into a wide_int temporary first then you are lost
of course (modulo some magic GCC optimization being able to elide
the copy somehow).

And of course for code like VRP that keeps a lattice of wide_ints to
be able to reduce its footprint by using ptr storage and explicit allocations
(that's a secondary concern, of course).  And for VRP to specify that
it needs more than the otherwise needed MAX_INT_MODE_SIZE.
ptr storage would not have this arbitrary limitation, only embedded
storage (should) have.

> then when i want to call some function using a wide_int ref that function
> now must be either overloaded to take both or i have to choose one of the
> two instantiations (presumably based on which is going to be more common)
> and just have the compiler fix up everything (which it is likely to do).

Nope, they'd be

class wide_int ...
{
   template 
   wide_int operator+(wide_int  a, wide_int b)
   {
  return wide_int::plus_worker (a.precision, a. , a.get_storage_ptr (),
b.precision, ...,
b.get_storage_ptr ());
   }


> And so what is the payoff:
> 1) No one except the c++ elite is going to understand the code. The rest of
> the community will hate me and curse the ground that i walk on.

Maybe for the implementation - but look at hash-table and vec ... not for
usage certainly.

> 2) I will end up with a version of wide-int that can be used as a medium
> life container (where i define medium life as not allowed to survive a gc
> since they will contain pointers into rtxes and trees.)
> 3) An no clients that actually wanted to do this!!I could use as an
> example one of your favorite passes, tree-vrp.   The current double-int
> could have been a medium lifetime container since it has a smaller
> footprint, but in fact tree-vrp converts those double-ints back into trees
> for medium storage.   Why, because it needs the other fields of a tree-cst
> to store the entire state.  Wide-ints also "suffer" this problem.  their
> only state are the data, and the three length fields.   They have no type
> and none of the other tree info so the most obvious client for a medium
> lifetime object is really not going to be a good match even if you "solve
> the storage problem".
>
> The fact is that wide-ints are an excellent short term storage class that
> can be very quickly converted into our two long term storage classes.  Your
> proposal is requires a lot of work, will not be easy to use and as far as i
> can see has no payoff on the horizon.   It could be that there could be
> future clients for a medium lifetime value, but asking for this with no
> clients in hand is really beyond the scope of a reasonable review.
>
> I remind you that the purpose of these patches is to solve problems that
> exist in the current compiler that we have papered over for years.   If
> someone needs wide-ints in some way that is not foreseen then they can
> change it.

The patches introduce a lot more temporary wide-int

Re: [PATCH] Disable libsanitizer if C++ is not being built

2012-11-27 Thread Jakub Jelinek
On Thu, Nov 15, 2012 at 07:52:21PM +0530, Siddhesh Poyarekar wrote:
> 2012-11-15  Siddhesh Poyarekar  
> 
>   * configure.ac: Disable libsanitizer if we're not building C++.
>   * configure: Regenerate.

Ok, thanks.

Jakub


Re: PATCH: lto/55474: global-buffer-overflow in lto-wrapper.c

2012-11-27 Thread Richard Biener
On Tue, Nov 27, 2012 at 9:32 AM, Markus Trippelsdorf
 wrote:
> On 2012.11.26 at 13:58 -0800, H.J. Lu wrote:
>> Hi,
>>
>> OPT_SPECIAL_unknown, OPT_SPECIAL_ignore, OPT_SPECIAL_program_name and
>> OPT_SPECIAL_input_file are special options, which aren't in cl_options.
>> This patch avoids
>>
>> if (!(cl_options[foption->opt_index].flags & CL_TARGET))
>>
>> on them.  This patch skips them.  OK to install?
>
> The same fix is necessary for gcc/lto-opts.c.
> This solves the issue from PR54795 comment 5, 13 and 20.

Ok for both patches.

Thanks,
Richard.

> 2012-11-27  Markus Trippelsdorf  
>
> PR lto/54795
> * lto-opts.c (lto_write_options): Also handle
> OPT_SPECIAL_unknown, OPT_SPECIAL_ignore and
> OPT_SPECIAL_program_name.
>
>
> diff --git a/gcc/lto-opts.c b/gcc/lto-opts.c
> index a235f41..a61a26f 100644
> --- a/gcc/lto-opts.c
> +++ b/gcc/lto-opts.c
> @@ -93,6 +93,20 @@ lto_write_options (void)
>  {
>struct cl_decoded_option *option = &save_decoded_options[i];
>
> +  /* Skip explicitly some common options that we do not need.  */
> +  switch (option->opt_index)
> +  {
> +   case OPT_dumpbase:
> +   case OPT_SPECIAL_unknown:
> +   case OPT_SPECIAL_ignore:
> +   case OPT_SPECIAL_program_name:
> +   case OPT_SPECIAL_input_file:
> + continue;
> +
> +   default:
> + break;
> +  }
> +
>/* Skip frontend and driver specific options here.  */
>if (!(cl_options[option->opt_index].flags & 
> (CL_COMMON|CL_TARGET|CL_LTO)))
> continue;
> @@ -108,17 +122,6 @@ lto_write_options (void)
>if (cl_options[option->opt_index].flags & (CL_DRIVER|CL_WARNING))
> continue;
>
> -  /* Skip explicitly some common options that we do not need.  */
> -  switch (option->opt_index)
> -   {
> -   case OPT_dumpbase:
> -   case OPT_SPECIAL_input_file:
> - continue;
> -
> -   default:
> - break;
> -   }
> -
>for (j = 0; j < option->canonical_option_num_elements; ++j)
> append_to_collect_gcc_options (&temporary_obstack, &first_p,
>option->canonical_option[j]);
> --
> Markus


Re: [PATCH] Fix substitute_and_fold ICE (PR tree-optimization/55331)

2012-11-27 Thread Richard Biener
On Mon, Nov 26, 2012 at 4:27 PM, Jakub Jelinek  wrote:
> On Mon, Nov 26, 2012 at 04:24:41PM +0100, Richard Biener wrote:
>> On Thu, Nov 15, 2012 at 9:09 PM, Jakub Jelinek  wrote:
>> > On the following testcase substitute_and_fold ICEs because memmove
>> > of length 1 on an empty class is optimized away, and this function wasn't
>> > prepared to handle that.
>> >
>> > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
>> > trunk/4.7?
>>
>> I think the bug is that the stmt is removed.  fold_stmt is not supposed to
>> "remove" the stmt in any case - it may replace it with a gimple_nop at most.
>>
>> Thus, gimplify_and_update_call_from_tree is at fault here.
>>
>> I am testing a patch that fixes it.
>
> Note that fold_stmt_1 also has code to handle a call folding into nothing,
> so perhaps that could be removed too if you change it to fold into a nop
> instead.

Yeah, I'm testing a patch to remove that.

Richard.

> Jakub


[PATCH ARM] Disable "-fira-hoist-pressure" on Thumb2

2012-11-27 Thread Bin Cheng
Hi,
I committed the patch implementing register pressure directed hoist in TRUNK
and enabled the option "-fira-hoist-pressure" for all targets by reviewer's
suggestion, although it has no obvious effect on Thumb2/x86/x86_64. Now I
collected performance data with Os and found it causes performance
regression in some cases on Thumb2, so this patch disables the option on
Thumb2 and modify the corresponding test cases.

Tested on arm-none-eabi/M3, is it OK?

Thanks.


2012-11-21  Bin Cheng  

* config/arm/arm.c (arm_option_override): Disable option
-fira-hoist-pressure on Thumb2.

2012-11-21  Bin Cheng  

* gcc.dg/hoist-register-pressure-1.c: Skip on ARM Thumb2.
* gcc.dg/hoist-register-pressure-2.c: Ditto.
* gcc.dg/hoist-register-pressure-3.c: Ditto.






Re: [PATCH][RFC] Bug handling SUBREG (MEM) - MEM having side-effects?

2012-11-27 Thread Tejas Belagod

Richard Sandiford wrote:


After the discussions Richard Earnshaw had on IRC with Andrew Pinski, it was
felt that it was best to fix the standard predicate
'general_operand' to not allow SUBREG (MEM) with side-effects as it has known
issues associated with it - particularly reload not being able to deal with such
cases. 'general_operand' currently outlaws cases like paradoxical SUBREG (MEM)
on targets with insn scheduling and SUBREG (MEM) with non-zero SUBREG_BYTE. This
is another case it should not allow. Here is a patch that tightens
general_operands to not allow SUBREG (MEM) with MEM having side-effects.

I would like to hear some thoughts on this.


LGTM.

It would be good to get rid of subreg mem operands altogether at some point,
but that's a little too drastic for stage 3.  This patch looks like a strict
improvement.



Thanks for looking at this. Is this OK for 4.7 as well?

Tejas Belagod
ARM.



Re: [PATCH] Disable libsanitizer if C++ is not being built

2012-11-27 Thread Siddhesh Poyarekar
On Tue, Nov 27, 2012 at 11:05:47AM +0100, Jakub Jelinek wrote:
> On Thu, Nov 15, 2012 at 07:52:21PM +0530, Siddhesh Poyarekar wrote:
> > 2012-11-15  Siddhesh Poyarekar  
> > 
> > * configure.ac: Disable libsanitizer if we're not building C++.
> > * configure: Regenerate.
> 
> Ok, thanks.
> 

Could you please commit it for me?  I don't have write access.

Thanks,
Siddhesh


Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)

2012-11-27 Thread Steven Bosscher
On Tue, Nov 27, 2012 at 10:55 AM, Eric Botcazou  wrote:
>> Or rather this one. Same hammer, different color. It turns out that
>> the rtlanal.c change caused problems, so I've got to use a home-brewn
>> equivalent of remove_reg_equal_equiv_notes_for_regno...
>>
>> Test case is unchanged so I'm omitting it here.
>>
>> Ciao!
>> Steven
>>
>>
>> gcc/
>>   * loop-unroll.c (struct iv_to_split): Add new 'orig_var' member.
>>   (analyze_iv_to_split_insn): Record it.
>>   (maybe_strip_eq_note_for_split_iv): New function to remove REG_EQUAL
>>   notes that refer to IVs that are being split.
>>   (apply_opt_in_copies): Use maybe_strip_eq_note_for_split_iv.  Twice.
>>   Use FOR_BB_INSNS_SAFE.
>
> That's fine with me, thanks.  You might want to defer applying it until the
> reason why it isn't apparently sufficient for aermod.f90 is understood though.

Thanks. Yes, I'm first going to try and figure out why this doesn't
fix aermod for Dominique. I suspect the problem is in variable
expansion in the unroller. A previous patch of mine updates REG_EQUAL
notes in variable expansion, but it doesn't clean up notes that refer
to maybe dead registers.

I have to say, I've got a uncomfortable feeling about REG_EQUAL notes
not being allowed to refer to dead registers. In the case at hand, the
code basically does:

S1: r1 = r2 + 0x4  // r2 is the reg from split_iv, r1 was the original IV
S2: r3 = mem[r1]
S3: if ... goto L1;
S4: r4 = r3 // REG_EQUAL mem[r1]
L1:
S5: r1 = r2 + 0x8

At S4, r1 is not live in the usual meaning of register liveness, but
the DEF from S1 still reaches the REG_EQUAL note. This situation is
not only created by loop unrolling. At least gcse.c (PRE),
loop-invariant.c, cse.c, dce.c and combine.c can create situations
like the above. ifcvt.c jumps through hoops to avoid it (see e.g.
PR46315, which you fixed).

Most of the problems are papered over by occasional calls to
df_note_add_problem from some passes, so that df_remove_dead_eq_notes
cleans up any notes referencing dead registers. But if we really want
to declare this kind of REG_EQUAL note reference to a dead register
invalid RTL (which is something at least you, Paolo, and me agree on),
and we expect passes to clean up their own mess by either updating or
removing any notes they invalidate, then df_remove_dead_eq_notes
shouldn't be necessary for correctness because all notes it encounters
should be valid (being updated by passes).

Removing df_remove_dead_eq_notes breaks bootstrap on the four targets
I tried it on (x86_64, powerpc64, ia64, and hppa). That is, breaks
*without* -funroll-loops, and *without* -fweb. With a hack to disable
pass_initialize_regs, bootstrap still fails, but other files are
miscompiled. With another hack on top to disable CSE2, bootstrap still
fails, and yet other files are miscompiled.

What I'm really trying to say, is that even when I fix this particular
issue with aermod (which is apparently 3 issues: the one I fixed last
month for x86_64, the one fixed with the patch in this thread for
SPARC, and the yet-to-be-identified problem Dominique is still seeing
on darwin10) then it's likely other, similar bugs will show up. Bugs
that are hard to reproduce, dependent on the target's RTL, and
difficult to debug as they are usually wrong-code bugs on larger test
cases.

We really need a more robust solution. I've added Jeff and rth to the
CC, looking for opinions/thoughts/suggestions/$0.02s.

Ciao!
Steven


Re: RFA: Enable both ld and gold in GCC 4.8

2012-11-27 Thread Matthias Klose
Am 26.11.2012 17:53, schrieb H.J. Lu:
> Hi,
> 
> There is a patch to enable both ld and gold in GCC:
> 
> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg02076.html
> 
> and Diego asked if there is "anything in particular blocking
> this patch in trunk":

I think it's lack of review.

> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg02121.html
> 
> I'd like to see it in GCC 4.8.

yes, currently this selection can only be done by providing the linkers in
different directories, and then use -B to select one. The new option would
make this independent of the directory name.

  Matthias



Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver

2012-11-27 Thread Hans-Peter Nilsson
I quoted the whole discussion, see a single line below.

On Mon, 19 Nov 2012, Eric Botcazou wrote:
> > Unfortunately, it causes regressions; read on for a very brief
> > analysis.
> >
> > For x86_64-linux (base multilib):
> >
> > Running
> > /home/hp/gcctop/tmp/pr55030-2n/gcc/gcc/testsuite/gcc.dg/guality/guality.exp
> > ... ...
> > FAIL: gcc.dg/guality/pr36728-1.c  -O1  line 12 arg7 == 30
> > [...]
> >
> > I looked into gcc.dg/guality/pr36728-1.c at base -O1.
> >
> > The generated assembly code modulo debug info, is the same.  The
> > value of arg7 is optimized out, says gdb in gcc.log.  The
> > problem doesn't seem to be any md-generated
> > frame-related-barrier as was my first guess, but the volatile
> > asms in the source(!).  The first one looks like this (and the
> > second one similar) in pr36728-1.c.168r.cse1 (r193583):
> >
> > (insn 26 25 27 2 (parallel [
> > (set (mem/c:SI (plus:DI (reg/f:DI 20 frame)
> > (const_int -32 [0xffe0])) [0 y+0 S4
> > A256]) (asm_operands/v:SI ("") ("=m") 0 [
> > (mem/c:SI (plus:DI (reg/f:DI 20 frame)
> > (const_int -32 [0xffe0])) [0 y+0
> > S4 A256]) ]
> >  [
> > (asm_input:SI ("m") (null):0)
> > ]
> >  []
> > /home/hp/gcctop/tmp/nbasf/gcc/gcc/testsuite/gcc.dg/guality/pr36728-1.c:12))
> > (clobber (reg:QI 18 fpsr))
> > (clobber (reg:QI 17 flags))
> > ])
> > /home/hp/gcctop/tmp/nbasf/gcc/gcc/testsuite/gcc.dg/guality/pr36728-1.c:12
> > -1 (nil))
> >
> > It's not caught by the previous test:
> > -  && GET_CODE (PATTERN (insn)) == ASM_OPERANDS
> > -  && MEM_VOLATILE_P (PATTERN (insn)))
> >
> > ...but since it is a volatile asm (as seen in the source and by
> > the /v on the asm_operands) volatile_insn_p does catch it (as
> > expected and intended) and down the road for some reason, gcc
> > can't recover the arg7 contents.  Somewhat expected, but this
> > volatile asm is also more volatile than intended; a clobber list
> > for example as seen above inserted by the md, is now redundant.
>
> Thanks for the analysis.  I don't think that the redundancy of the clobber
> list matters here: the clobbers are added to all asm statements, volatile or
> not, so they aren't intended to make the volatile statements more precise in
> the hope of optimizing around them.

IIIUC, Jakub disagrees on this point, see
.

Sigh, hoping we can get consensus on this point, or at least
that gcc has a consistent view...

> > I'm not sure what to do with this.  Try changing volatile_insn_p
> > adding a parameter to optionally allow volatile asms with
> > outputs to pass?  But then, when *should* that distinction be
> > done, to let such volatile asms be allowed to pass as
> > not-really-as-volatile-as-we-look-for?  I'd think "never" for
> > any of the the patched code, or maybe "only when looking at
> > effects on memory".
>
> Yes, weakening volatile_insn_p seems also dangerous to me, and I'd rather lean
> towards the opposite, conservative side.
>
> We apparently have a small conflict between the meaning of volatile asms with
> operands at the source level and volatile_insn_p.  However, I think that the
> latter interpretation is the correct one: if your asm statements have effects
> that can be described, then you should use output constraints instead of
> volatile; otherwise, you should use volatile and the output constraints have
> essentially no meaning.
>
> The gcc.dg/guality/pr36728-[12].c testcases use volatile asms in an artificial
> way to coax the compiler into following a certain behaviour, so I don't think
> that they should be taken into account to judge the correctness of the change.
> Therefore, I'd go ahead and apply the full patch below, possibly adjusting
> both testcases to cope with it.  Tentative (and untested) patch attached.
>
> I've also CCed Jakub though, as he might have a different opinion.
>
> > gcc:
> > PR middle-end/55030
> > * builtins.c (expand_builtin_setjmp_receiver): Update comment
> > regarding purpose of blockage.
> > * emit-rtl.c [!HAVE_blockage] (gen_blockage): Similarly for
> > the head comment.
> > * rtlanal.c (volatile_insn_p): Ditto.
> > * doc/md.texi (blockage): Update similarly.  Change wording to
> > require one of two forms, rather than implying a wider choice.
> > * cse.c (cse_insn): Where checking for blocking insns, use
> > volatile_insn_p instead of manual check for volatile ASM.
> > * dse.c (scan_insn): Ditto.
> > * cselib.c (cselib_process_insn): Ditto.
>
> --
> Eric Botcazou



Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)

2012-11-27 Thread Steven Bosscher
On Tue, Nov 27, 2012 at 11:34 AM, Steven Bosscher wrote:
>>> gcc/
>>>   * loop-unroll.c (struct iv_to_split): Add new 'orig_var' member.
>>>   (analyze_iv_to_split_insn): Record it.
>>>   (maybe_strip_eq_note_for_split_iv): New function to remove REG_EQUAL
>>>   notes that refer to IVs that are being split.
>>>   (apply_opt_in_copies): Use maybe_strip_eq_note_for_split_iv.  Twice.
>>>   Use FOR_BB_INSNS_SAFE.
>>
>> That's fine with me, thanks.  You might want to defer applying it until the
>> reason why it isn't apparently sufficient for aermod.f90 is understood 
>> though.
>
> Thanks. Yes, I'm first going to try and figure out why this doesn't
> fix aermod for Dominique. I suspect the problem is in variable
> expansion in the unroller. A previous patch of mine updates REG_EQUAL
> notes in variable expansion, but it doesn't clean up notes that refer
> to maybe dead registers.

Well, that's not it. But what I said below:


> I have to say, I've got a uncomfortable feeling about REG_EQUAL notes
> not being allowed to refer to dead registers.

applies even more after analyzing Dominique's bug.

At the start of RTL PRE we have:

 2000 {r719:DI=r719:DI+0x4;clobber flags:CC;}
  REG_UNUSED: flags:CC

where r719:DI+0x4 is found to be partially redundant. So PRE optimizes this to:

 2889 r719:DI=r1562:DI
  REG_EQUAL: r719:DI+0x4
...
 2913 r1562:DI=r719:DI+0x4

This self-reference in the REG_EQUAL note is the problem. The SET
kills r719, but there is a USE in the PRE'ed expression that keeps
r719 alive. But this use is copy-propagated away in CPROP2:

 2913 r1562:DI=r1562:DI+0x4

Now that USE of r719 is a use of a dead register, rendering the
REG_EQUAL invalid. From there on the problem is the same as the ones I
had to "fix" in loop-unroll.c. First the webizer puts the USE in a
different web and renames the USE to r1591:

 2889 r719:DI=r1562:DI
  REG_EQUAL: r1591:DI+0x4

CSE2 uses this and the equivalence of r719 and r1562 to "optimize" the
PRE expression:

 2913 r1562:DI=r1591:DI+0x8

Then init-regs notices that this reg is quasi-used uninitialized:

 3122 r1591:DI=0
 2913 r1562:DI=r1591:DI+0x8
  REG_DEAD: r1591:DI

And combine finally produces:

 2913 r1562:DI=0x8


I'm not sure what to name as the "root cause" for all of this. It all
starts with PRE creating a REG_EQUAL note that references the register
that's SET in the insn the note is attached to, but the register is
live after the insn so from that point of view the note is not
invalid. CPROP2 kills r179 by propagating it out and making the note
invalid. I think CPROP2 could do that to any register, and if passes
should make sure REG_EQUAL notes are updated or removed then this is a
bug in RTL CPROP.

Very, very messy...

Ciao!
Steven


Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver

2012-11-27 Thread Jakub Jelinek
On Tue, Nov 27, 2012 at 06:44:23AM -0500, Hans-Peter Nilsson wrote:
> > We apparently have a small conflict between the meaning of volatile asms 
> > with
> > operands at the source level and volatile_insn_p.  However, I think that the
> > latter interpretation is the correct one: if your asm statements have 
> > effects
> > that can be described, then you should use output constraints instead of
> > volatile; otherwise, you should use volatile and the output constraints have
> > essentially no meaning.

I strongly disagree with this.  Outputs and clobbers have significant
meaning even on volatile asms, asm volatile doesn't mean all registers and
all memory are supposed to be considered clobbered.  For memory you have
"memory" clobber for that, for registers it is users responsibility to
describe exactly what changes, either in clobbers or in outputs.
The difference between non-volatile and volatile asm is, as the
documentation states:

The `volatile' keyword indicates that the instruction has important
side-effects.  GCC will not delete a volatile `asm' if it is reachable.

Volatile asm acts as an optimization barrier to some extent, but it really
can't modify registers or memory that aren't described as modified in the
asm pattern.  The important side-effects are of some other kind than
modifying registers or memory visible from the current function.
Ditto for UNSPEC_VOLATILE.

So, at least from var-tracking POV which doesn't attempt to perform any
optimizations across any insn, just tries to track where values live, IMHO a
volatile asm acts exactly the same as non-volatile, that is why I'm testing
following patch right now.

But the question is also what effects your patch can have e.g. on RTL DSE.

2012-11-26  Jakub Jelinek  

PR debug/36728
PR debug/55467
* cselib.c (cselib_process_insn): If cselib_preserve_constants,
don't reset table and exit early on volatile insns and setjmp.
Reset table afterwards on setjmp.

* gcc.dg/guality/pr36728-1.c: Include "../nop.h", make sure the asm
are non-empty and add dependency between the first and second asm.
* gcc.dg/guality/pr36728-2.c: Likewise.
* gcc.dg/guality/pr36728-3.c: New test.
* gcc.dg/guality/pr36728-4.c: New test.

--- gcc/cselib.c.jj 2012-11-26 10:14:26.0 +0100
+++ gcc/cselib.c2012-11-26 20:01:10.488304558 +0100
@@ -2626,11 +2626,12 @@ cselib_process_insn (rtx insn)
   cselib_current_insn = insn;
 
   /* Forget everything at a CODE_LABEL, a volatile insn, or a setjmp.  */
-  if (LABEL_P (insn)
-  || (CALL_P (insn)
- && find_reg_note (insn, REG_SETJMP, NULL))
-  || (NONJUMP_INSN_P (insn)
- && volatile_insn_p (PATTERN (insn
+  if ((LABEL_P (insn)
+   || (CALL_P (insn)
+  && find_reg_note (insn, REG_SETJMP, NULL))
+   || (NONJUMP_INSN_P (insn)
+  && volatile_insn_p (PATTERN (insn
+  && !cselib_preserve_constants)
 {
   cselib_reset_table (next_uid);
   cselib_current_insn = NULL_RTX;
@@ -2668,9 +2669,18 @@ cselib_process_insn (rtx insn)
   /* Look for any CLOBBERs in CALL_INSN_FUNCTION_USAGE, but only
  after we have processed the insn.  */
   if (CALL_P (insn))
-for (x = CALL_INSN_FUNCTION_USAGE (insn); x; x = XEXP (x, 1))
-  if (GET_CODE (XEXP (x, 0)) == CLOBBER)
-   cselib_invalidate_rtx (XEXP (XEXP (x, 0), 0));
+{
+  for (x = CALL_INSN_FUNCTION_USAGE (insn); x; x = XEXP (x, 1))
+   if (GET_CODE (XEXP (x, 0)) == CLOBBER)
+ cselib_invalidate_rtx (XEXP (XEXP (x, 0), 0));
+  /* Flush evertything on setjmp.  */
+  if (cselib_preserve_constants
+ && find_reg_note (insn, REG_SETJMP, NULL))
+   {
+ cselib_preserve_only_values ();
+ cselib_reset_table (next_uid);
+   }
+}
 
   /* On setter of the hard frame pointer if frame_pointer_needed,
  invalidate stack_pointer_rtx, so that sp and {,h}fp based
--- gcc/testsuite/gcc.dg/guality/pr36728-1.c.jj 2012-11-26 10:14:25.0 
+0100
+++ gcc/testsuite/gcc.dg/guality/pr36728-1.c2012-11-27 10:01:14.517080157 
+0100
@@ -1,7 +1,11 @@
 /* PR debug/36728 */
 /* { dg-do run } */
 /* { dg-options "-g" } */
-int a;
+
+#include "../nop.h"
+
+int a, b;
+
 int __attribute__((noinline))
 foo (int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7)
 {
@@ -9,9 +13,9 @@ foo (int arg1, int arg2, int arg3, int a
   int __attribute__ ((aligned(32))) y;
 
   y = 2;
-  asm ("" : "=m" (y) : "m" (y));
+  asm (NOP : "=m" (y), "=m" (b) : "m" (y));
   x[0] = 25;
-  asm ("" : "=m" (x[0]), "=m" (a) : "m" (x[0]));
+  asm (NOP : "=m" (x[0]), "=m" (a) : "m" (x[0]), "m" (b));
   return y;
 }
 
@@ -21,23 +25,23 @@ foo (int arg1, int arg2, int arg3, int a
and arg2.  So it is expected that these values are unavailable in
some of these tests.  */
 
-/* { dg-final { gdb-test 12 "arg1" "1" { target { ! "s390*-*-*" } } } }*/
-/* { dg-final { gdb-test 12 "arg2" "2" { targe

[PATCH] Remove dead code from fold_stmt_1

2012-11-27 Thread Richard Biener

This removes dead code as suggested by Jakub.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.

2012-11-27  Richard Biener  

* gimple-fold.c (fold_stmt_1): Remove unnecessary code.

Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c   (revision 193839)
+++ gcc/gimple-fold.c   (working copy)
@@ -1282,14 +1282,6 @@ fold_stmt_1 (gimple_stmt_iterator *gsi,
 default:;
 }
 
-  /* If stmt folds into nothing and it was the last stmt in a bb,
- don't call gsi_stmt.  */
-  if (gsi_end_p (*gsi))
-{
-  gcc_assert (next_stmt == NULL);
-  return changed;
-}
-
   stmt = gsi_stmt (*gsi);
 
   /* Fold *& on the lhs.  Don't do this if stmt folded into nothing,


Re: [PATCH] Remove dead code from fold_stmt_1

2012-11-27 Thread Jakub Jelinek
On Tue, Nov 27, 2012 at 01:00:05PM +0100, Richard Biener wrote:
> 
> This removes dead code as suggested by Jakub.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.

Looks as partial removal only.  IMHO
  gimple_stmt_iterator gsinext = *gsi;
  gimple next_stmt;

  gsi_next (&gsinext);
  next_stmt = gsi_end_p (gsinext) ? NULL : gsi_stmt (gsinext);
can go too and
  /* Fold *& on the lhs.  Don't do this if stmt folded into nothing,
 as we'd changing the next stmt.  */
  if (gimple_has_lhs (stmt) && stmt != next_stmt)
should be:
  /* Fold *& on the lhs.  */
  if (gimple_has_lhs (stmt))

> 2012-11-27  Richard Biener  
> 
>   * gimple-fold.c (fold_stmt_1): Remove unnecessary code.
> 
> Index: gcc/gimple-fold.c
> ===
> --- gcc/gimple-fold.c (revision 193839)
> +++ gcc/gimple-fold.c (working copy)
> @@ -1282,14 +1282,6 @@ fold_stmt_1 (gimple_stmt_iterator *gsi,
>  default:;
>  }
>  
> -  /* If stmt folds into nothing and it was the last stmt in a bb,
> - don't call gsi_stmt.  */
> -  if (gsi_end_p (*gsi))
> -{
> -  gcc_assert (next_stmt == NULL);
> -  return changed;
> -}
> -
>stmt = gsi_stmt (*gsi);
>  
>/* Fold *& on the lhs.  Don't do this if stmt folded into nothing,

Jakub


Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver

2012-11-27 Thread Hans-Peter Nilsson
On Tue, 27 Nov 2012, Jakub Jelinek wrote:
> On Tue, Nov 27, 2012 at 06:44:23AM -0500, Hans-Peter Nilsson wrote:

JFTR: No I didn't, Eric wrote the below.  But, it made sense to me. :)

> > > We apparently have a small conflict between the meaning of volatile asms 
> > > with
> > > operands at the source level and volatile_insn_p.  However, I think that 
> > > the
> > > latter interpretation is the correct one: if your asm statements have 
> > > effects
> > > that can be described, then you should use output constraints instead of
> > > volatile; otherwise, you should use volatile and the output constraints 
> > > have
> > > essentially no meaning.
>
> I strongly disagree with this.
> [...]

As long as volatile asms and UNSPEC_VOLATILE insns (aka.
barriers) are handled the same way and consistently throughout
gcc, I'm fine.  It seems your patch does that, so thanks!

> But the question is also what effects your patch can have e.g. on RTL DSE.

Looks like the patch caused a bootstrap for s390x.

Eagerly awaiting a PR for that, but whoever is interested
on that, please try Jakub's patch first...

> 2012-11-26  Jakub Jelinek  
>
>   PR debug/36728
>   PR debug/55467
>   * cselib.c (cselib_process_insn): If cselib_preserve_constants,
>   don't reset table and exit early on volatile insns and setjmp.
>   Reset table afterwards on setjmp.
>
>   * gcc.dg/guality/pr36728-1.c: Include "../nop.h", make sure the asm
>   are non-empty and add dependency between the first and second asm.
>   * gcc.dg/guality/pr36728-2.c: Likewise.
>   * gcc.dg/guality/pr36728-3.c: New test.
>   * gcc.dg/guality/pr36728-4.c: New test.

brgds, H-P


Re: [tsan] Instrument atomics

2012-11-27 Thread Jakub Jelinek
On Tue, Nov 27, 2012 at 09:23:30AM +0100, Jakub Jelinek wrote:
> On Tue, Nov 27, 2012 at 12:13:42PM +0400, Dmitry Vyukov wrote:
> > I've added 128-bit atomic ops:
> > http://llvm.org/viewvc/llvm-project?view=rev&revision=168683
> 
> Thanks.

+#if (defined(__clang__) && defined(__clang_major__) \
+  && defined(__clang_minor__) && __clang__ >= 1 && __clang_major__ >= 3 \
+  && __clang_minor__ >= 3) \
+|| (defined(__GNUC__) && defined(__GNUC_MINOR__) \
+  && defined(__GNUC_PATCHLEVEL__) && __GNUC__ >= 4 && __GNUC_MINOR__ >= 6 \
+  && __GNUC_PATCHLEVEL__ >= 3)

is wrong, one thing is that __int128 is available only on a couple of
architectures (i?86/x86_64/ia64 or so), and more importantly, the above
isn't true for say GCC 4.7.0, because __GNUC_PATCHLEVEL__ is then < 3.
So, either you want something like
#define GCC_VERSION ((__GNUC__) * 1 + (__GNUC_MINOR__) * 100 + 
(__GNUC_PATCHLEVEL__))
and then you can test like #if GCC_VERSION >= 40603
or, for the int128 case, much better just to test
defined(__GNUC__) && defined(__SIZEOF_INT128__)
(no idea if clang doesn't define the same macro, if it does, you could
just test for presence of the sizeof macro).

Jakub


Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)

2012-11-27 Thread Steven Bosscher
On Tue, Nov 27, 2012 at 1:00 PM, Steven Bosscher wrote:
> Now that USE of r719 is a use of a dead register, rendering the
> REG_EQUAL invalid. From there on the problem is the same as the ones I
> had to "fix" in loop-unroll.c. First the webizer puts the USE in a
> different web and renames the USE to r1591:
>
>  2889 r719:DI=r1562:DI
>   REG_EQUAL: r1591:DI+0x4

With this patch, the self-referencing EQ_USE is kept together with its DEF:

 2889 r719:DI=r1562:DI
  REG_EQUAL: r719:DI+0x4

Dominique, could you give this a try and see if it helps?
(But as I said up-thread: I'm not sure this is a proper fix, or just
another band-aid...)

Ciao!
Steven


* web.c (union_eq_uses): New function to keep self-referencing
notes intact.
(web_main): Call it.

Index: web.c
===
--- web.c   (revision 193394)
+++ web.c   (working copy)
@@ -148,6 +148,35 @@ union_match_dups (rtx insn, struct web_e
 }
 }

+/* If INSN has a REG_EQUAL note that references a DEF of INSN, union
+   them.  FUN is the function that does the union.  */
+
+static void
+union_eq_uses (rtx insn, struct web_entry *def_entry,
+  struct web_entry *use_entry,
+  bool (*fun) (struct web_entry *, struct web_entry *))
+{
+  struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
+  df_ref *def_link = DF_INSN_INFO_DEFS (insn_info);
+  df_ref *eq_use_link = DF_INSN_INFO_EQ_USES (insn_info);
+
+  if (! (def_link && eq_use_link))
+return;
+
+  while (*eq_use_link)
+{
+  df_ref *def_rec = def_link;
+  while (*def_rec)
+   {
+ if (DF_REF_REAL_REG (*eq_use_link) == DF_REF_REAL_REG (*def_rec))
+   (*fun) (use_entry + DF_REF_ID (*eq_use_link),
+   def_entry + DF_REF_ID (*def_rec));
+ def_rec++;
+   }
+  eq_use_link++;
+}
+}
+
 /* For each use, all possible defs reaching it must come in the same
register, union them.
FUN is the function that does the union.
@@ -390,6 +419,7 @@ web_main (void)
  if (DF_REF_REGNO (use) >= FIRST_PSEUDO_REGISTER)
union_defs (use, def_entry, used, use_entry, unionfind_union);
}
+ union_eq_uses (insn, def_entry, use_entry, unionfind_union);
}
 }


Re: [tsan] Instrument atomics

2012-11-27 Thread Dmitry Vyukov
On Tue, Nov 27, 2012 at 4:27 PM, Jakub Jelinek  wrote:
> On Tue, Nov 27, 2012 at 09:23:30AM +0100, Jakub Jelinek wrote:
>> On Tue, Nov 27, 2012 at 12:13:42PM +0400, Dmitry Vyukov wrote:
>> > I've added 128-bit atomic ops:
>> > http://llvm.org/viewvc/llvm-project?view=rev&revision=168683
>>
>> Thanks.
>
> +#if (defined(__clang__) && defined(__clang_major__) \
> +  && defined(__clang_minor__) && __clang__ >= 1 && __clang_major__ >= 3 \
> +  && __clang_minor__ >= 3) \
> +|| (defined(__GNUC__) && defined(__GNUC_MINOR__) \
> +  && defined(__GNUC_PATCHLEVEL__) && __GNUC__ >= 4 && __GNUC_MINOR__ >= 
> 6 \
> +  && __GNUC_PATCHLEVEL__ >= 3)
>
> is wrong, one thing is that __int128 is available only on a couple of
> architectures (i?86/x86_64/ia64 or so), and more importantly, the above
> isn't true for say GCC 4.7.0, because __GNUC_PATCHLEVEL__ is then < 3.
> So, either you want something like
> #define GCC_VERSION ((__GNUC__) * 1 + (__GNUC_MINOR__) * 100 + 
> (__GNUC_PATCHLEVEL__))
> and then you can test like #if GCC_VERSION >= 40603
> or, for the int128 case, much better just to test
> defined(__GNUC__) && defined(__SIZEOF_INT128__)
> (no idea if clang doesn't define the same macro, if it does, you could
> just test for presence of the sizeof macro).

clang does not support the macro.
what about
#if defined(__SIZEOF_INT128__) || defined(__clang__)
?

thanks!


Re: [tsan] Instrument atomics

2012-11-27 Thread Jakub Jelinek
On Tue, Nov 27, 2012 at 04:35:33PM +0400, Dmitry Vyukov wrote:
> On Tue, Nov 27, 2012 at 4:27 PM, Jakub Jelinek  wrote:
> > On Tue, Nov 27, 2012 at 09:23:30AM +0100, Jakub Jelinek wrote:
> >> On Tue, Nov 27, 2012 at 12:13:42PM +0400, Dmitry Vyukov wrote:
> >> > I've added 128-bit atomic ops:
> >> > http://llvm.org/viewvc/llvm-project?view=rev&revision=168683
> >>
> >> Thanks.
> >
> > +#if (defined(__clang__) && defined(__clang_major__) \
> > +  && defined(__clang_minor__) && __clang__ >= 1 && __clang_major__ >= 
> > 3 \
> > +  && __clang_minor__ >= 3) \
> > +|| (defined(__GNUC__) && defined(__GNUC_MINOR__) \
> > +  && defined(__GNUC_PATCHLEVEL__) && __GNUC__ >= 4 && __GNUC_MINOR__ 
> > >= 6 \
> > +  && __GNUC_PATCHLEVEL__ >= 3)
> >
> > is wrong, one thing is that __int128 is available only on a couple of
> > architectures (i?86/x86_64/ia64 or so), and more importantly, the above
> > isn't true for say GCC 4.7.0, because __GNUC_PATCHLEVEL__ is then < 3.
> > So, either you want something like
> > #define GCC_VERSION ((__GNUC__) * 1 + (__GNUC_MINOR__) * 100 + 
> > (__GNUC_PATCHLEVEL__))
> > and then you can test like #if GCC_VERSION >= 40603
> > or, for the int128 case, much better just to test
> > defined(__GNUC__) && defined(__SIZEOF_INT128__)
> > (no idea if clang doesn't define the same macro, if it does, you could
> > just test for presence of the sizeof macro).
> 
> clang does not support the macro.
> what about
> #if defined(__SIZEOF_INT128__) || defined(__clang__)
> ?

Then for __clang__ you need to do a version check I guess (and, the same
what I wrote applies, consider clang 4.0; don't care about that though),
but for GCC sure, just the #ifdef __SIZEOF_INT128__ is what lots of tests do.

Jakub


Re: PR55124 - tentative patch for ICE in PRE

2012-11-27 Thread Richard Biener
On Mon, Nov 19, 2012 at 3:33 AM, Tom de Vries  wrote:
> Richard,
>
> Consider the PR55124 example test.c:
> ...
> int a, b;
> long c;
>
> static void inline
> f2 (void)
> {
>   unsigned long k = 1;
>
>   foo (b ? k = 0 : 0);
>
>   b = (((c = b)
> ? (k ?: (c = 0))
> : a)
>* c);
> }
>
> void
> f1 (void)
> {
>   f2 ();
>   a = b | c;
> }
> ...
>
> when compiling with -O2, we're running into the following assertion in pre:
> ...
> test.c:18:1: internal compiler error: in find_or_generate_expression, at
> tree-ssa-pre.c:2802
>  f1 (void)
>  ^
> 0xcf41d3 find_or_generate_expression
> gcc/tree-ssa-pre.c:2802
> 0xcf42f6 create_expression_by_pieces
> gcc/tree-ssa-pre.c:2861
> 0xcf4193 find_or_generate_expression
> gcc/tree-ssa-pre.c:2799
> 0xcf42f6 create_expression_by_pieces
> gcc/tree-ssa-pre.c:2861
> 0xcf4e28 insert_into_preds_of_block
> gcc/tree-ssa-pre.c:3096
> 0xcf5c7d do_regular_insertion
> gcc/tree-ssa-pre.c:3386
> ...
>
> We're hitting the assert at the end of find_or_generate_expression:
> ...
> static tree
> find_or_generate_expression (basic_block block, tree op, gimple_seq *stmts)
> {
>   pre_expr expr = get_or_alloc_expr_for (op);
>   unsigned int lookfor = get_expr_value_id (expr);
>   pre_expr leader = bitmap_find_leader (AVAIL_OUT (block), lookfor);
>   if (leader)
> {
>   if (leader->kind == NAME)
> return PRE_EXPR_NAME (leader);
>   else if (leader->kind == CONSTANT)
> return PRE_EXPR_CONSTANT (leader);
> }
>
>   /* It must be a complex expression, so generate it recursively.  */
>   bitmap exprset = VEC_index (bitmap, value_expressions, lookfor);
>   bitmap_iterator bi;
>   unsigned int i;
>   EXECUTE_IF_SET_IN_BITMAP (exprset, 0, i, bi)
> {
>   pre_expr temp = expression_for_id (i);
>   if (temp->kind != NAME)
> return create_expression_by_pieces (block, temp, stmts,
> get_expr_type (expr));
> }
>
>   gcc_unreachable ();
> }
> ...
>
> The state in which we're asserting is the following:
> ...
> #5  0x00cf41d4 in find_or_generate_expression (block=0x76210f08,
> op=0x762384c8, stmts=0x7fffdb78) at gcc/tree-ssa-pre.c:2802
> 2802  gcc_unreachable ();
> (gdb) p block.index
> $13 = 4
> (gdb) call debug_generic_expr (op)
> b.4_3
> (gdb) call debug_pre_expr (expr)
> b.4_3
> (gdb) p lookfor
> $11 = 7
> (gdb) call debug_bitmap_set (((bb_value_sets_t) ((block)->aux))->avail_out)
> debug[0] := { b.4_8 (0012), a.10_13 (0013), _14 (0014), iftmp.5_15 (0015) }
> (gdb) p leader
> $12 = (pre_expr) 0x0
> (gdb) call debug_bitmap ( exprset )
> first = 0x21fb058 current = 0x21fb058 indx = 0
> 0x21fb058 next = (nil) prev = (nil) indx = 0
> bits = { 22 25 }
> (gdb) call debug_pre_expr (expression_for_id (22))
> b.4_3
> (gdb) call debug_pre_expr (expression_for_id (25))
> b.4_31
> ...
> We're trying to find or generate an expr with value-id 0007 in block 4. We 
> can't
> find it (there's no leader) and we can't generate it because there are no 
> exprs
> with that value that are not names.
>
> Higher up in the call stack and skipping create_expression_by_pieces, the 
> state
> is as follows:
> ...
> #7  0x00cf4194 in find_or_generate_expression (block=0x76210f08,
> op=0x76238558, stmts=0x7fffdb78) at gcc/tree-ssa-pre.c:2799
> 2799get_expr_type (expr));
> (gdb) call debug_generic_expr (op)
> c.6_5
> (gdb) call debug_pre_expr (expr)
> c.6_5
> (gdb) p lookfor
> $14 = 9
> (gdb) p leader
> $15 = (pre_expr) 0x0
> (gdb) call debug_bitmap ( exprset )
> first = 0x21fb0f8 current = 0x21fb0f8 indx = 0
> 0x21fb0f8 next = (nil) prev = (nil) indx = 0
> bits = { 23 24 26 27 }
> (gdb) call debug_pre_expr (temp)
> {nop_expr,b.4_3}
> (gdb) call debug_pre_expr (expression_for_id (23))
> c.6_5
> (gdb) call debug_pre_expr (expression_for_id (24))
> {nop_expr,b.4_3}
> (gdb) call debug_pre_expr (expression_for_id (26))
> c.6_32
> (gdb) call debug_pre_expr (expression_for_id (27))
> {mem_ref<0B>,addr_expr<&c>}@.MEM_28
> ...
> We're trying to find or generate an expr with value-id 0009 (in block 4). We
> can't find it. We're trying to generate it using {nop_expr,b.4_3}, but as 
> we've
> seen above that won't work. The generation using expr 27 would work though.
>
> Again higher up in the call stack and skipping create_expression_by_pieces, 
> the
> state is as follows:
> ...
> (gdb) up
> #9  0x00cf4e29 in insert_into_preds_of_block (block=0x76210f70,
> exprnum=19, avail=0x22102e0) at gcc/tree-ssa-pre.c:3096
> 3096   &stmts, type);
> (gdb) l
> 3091  eprime = VEC_index (pre_expr, avail, pred->dest_idx);
> 3092
> 3093  if (eprime->kind != NAME && eprime->kind != CONSTANT)
> 3094{
> 3095  builtexpr = create_expression_by_pieces (bprime, e

Re: [tsan] Instrument atomics

2012-11-27 Thread Dmitry Vyukov
On Tue, Nov 27, 2012 at 4:39 PM, Jakub Jelinek  wrote:
> On Tue, Nov 27, 2012 at 04:35:33PM +0400, Dmitry Vyukov wrote:
>> On Tue, Nov 27, 2012 at 4:27 PM, Jakub Jelinek  wrote:
>> > On Tue, Nov 27, 2012 at 09:23:30AM +0100, Jakub Jelinek wrote:
>> >> On Tue, Nov 27, 2012 at 12:13:42PM +0400, Dmitry Vyukov wrote:
>> >> > I've added 128-bit atomic ops:
>> >> > http://llvm.org/viewvc/llvm-project?view=rev&revision=168683
>> >>
>> >> Thanks.
>> >
>> > +#if (defined(__clang__) && defined(__clang_major__) \
>> > +  && defined(__clang_minor__) && __clang__ >= 1 && __clang_major__ >= 
>> > 3 \
>> > +  && __clang_minor__ >= 3) \
>> > +|| (defined(__GNUC__) && defined(__GNUC_MINOR__) \
>> > +  && defined(__GNUC_PATCHLEVEL__) && __GNUC__ >= 4 && __GNUC_MINOR__ 
>> > >= 6 \
>> > +  && __GNUC_PATCHLEVEL__ >= 3)
>> >
>> > is wrong, one thing is that __int128 is available only on a couple of
>> > architectures (i?86/x86_64/ia64 or so), and more importantly, the above
>> > isn't true for say GCC 4.7.0, because __GNUC_PATCHLEVEL__ is then < 3.
>> > So, either you want something like
>> > #define GCC_VERSION ((__GNUC__) * 1 + (__GNUC_MINOR__) * 100 + 
>> > (__GNUC_PATCHLEVEL__))
>> > and then you can test like #if GCC_VERSION >= 40603
>> > or, for the int128 case, much better just to test
>> > defined(__GNUC__) && defined(__SIZEOF_INT128__)
>> > (no idea if clang doesn't define the same macro, if it does, you could
>> > just test for presence of the sizeof macro).
>>
>> clang does not support the macro.
>> what about
>> #if defined(__SIZEOF_INT128__) || defined(__clang__)
>> ?
>
> Then for __clang__ you need to do a version check I guess (and, the same
> what I wrote applies, consider clang 4.0; don't care about that though),
> but for GCC sure, just the #ifdef __SIZEOF_INT128__ is what lots of tests do.


I've wrote:

#if defined(__SIZEOF_INT128__) \
|| (__clang_major__ * 100 + __clang_minor__ >= 302)

thanks!


Re: [PATCH, RFC] PR 55415 : Pessimistic misalignment from eipa_sra pass

2012-11-27 Thread Richard Biener
On Wed, Nov 21, 2012 at 5:58 PM, Martin Jambor  wrote:
> Hi,
>
> On Tue, Nov 20, 2012 at 09:24:20AM -0800, Richard Henderson wrote:
>> The get_pointer_alignment function can indicate that it does not know
>> what the alignment should be, and it always fills in worst-case values
>> for that case.  We should not use these worst-case values to "optimize"
>> the interface of a function.
>>
>> At minimum I think something like the following would be good.  But
>> I'm unsure why we would *ever* want to lower the alignment at a function
>> interface.  It seems to me that we'd simply want the caller to handle
>> copying the data to an aligned location?
>>
>> What was the use case of this code in the first place?
>
> PR 52402, and not surprisingly, that testcase fails on x86_64-linux
> with your patch.  Furthermore, misalignment due to MEM_REF offset has
> to be accounted for whether we know the alignment of base or not.
>
> I was hoping that we could do something along the lines of
> build_ref_for_offset tree-sra.c but that would not work for cases like
> the one from PR 52890, comment 7 when there is no dereference in the
> caller, we tranform:
>
>   D.2537 = &this->surfaces;
>   sPtr.0 = ggTrain::_ZNK7ggTrainIP9mrSurfaceEixEi.isra.0 (D.2537, 
> 1);
>
> into
>
>   D.2537 = &this->surfaces;
>   D.2554 = MEM[(const struct ggTrain *)D.2537];
>   sPtr.0 = ggTrain::_ZNK7ggTrainIP9mrSurfaceEixEi.isra.0 (D.2537, 
> 1);
>
> At the moment I hope I'll be able to:
>
> 1) Bring back tree_non_aligned_mem_p from 4.6 to be used in
>access_precludes_ipa_sra_p.  This way, any knowingly misaligned
>load in the callee should will not be transferred to the caller, if
>there is none.
>
> 2) In ipa_modify_call_arguments, be optimistic in like in your patch
>except for the case when we are looking at a dereference (i.e. we
>are turning a BLK dereference into a smaller scalar type one).  In
>that case, use its alignment like in build_ref_for_offset.
>
> But I'd like to think about this a bit more.  I believe we should ask
> for Richi's approval when he comes back and recovers anyway.  But this
> is now my highest priority (finally).

The issue here is that IPA SRA, when seeing

foo (T *addr)
{
  *addr;
}

 foo (&mem);

wanting to transform it to

foo' (T value)
{
  value;
}

 value = *(T *)mem;
 foo (value);

has to use the _same_ alignment for the access to mem as it was used
inside foo.  That's because of the fact that alignment information in GIMPLE
is solely present at memory accesses and _not_ in any way associated
with pointer types (as opposed to more strict rules imposed by some languages
such as C, often enough violated by users, *(char *)(int *)p is an access
aligned to at least 4 bytes in C).

IPA SRA can use bigger alignment if it knows that is safe (thus
get_pointer_alignment returns something larger than what the actual
access in foo was using).  What IPA SRA at the moment cannot do
is "propagate" larger alignment used in foo to the call site (AFAIK).
Thus, technically IPA SRA can use MAX (get_pointer_alignment (call site),
get_object_alignment (original dereference site)).

For fixing pessimizations caused by IPA SRA I suggest to simply punt
(not do the transform) if get_pointer_alignment_1 returns false for the
actual call argument.  Or implement the propagation.

Richard.

> Thanks,
>
> Martin
>
>
>>
>>
>>
>> r~
>
>> diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
>> index 3150bd6..d117389 100644
>> --- a/gcc/ipa-prop.c
>> +++ b/gcc/ipa-prop.c
>> @@ -2956,15 +2956,17 @@ ipa_modify_call_arguments (struct cgraph_edge *cs, 
>> gimple stmt,
>> unsigned int align;
>> unsigned HOST_WIDE_INT misalign;
>>
>> -   get_pointer_alignment_1 (base, &align, &misalign);
>> -   misalign += (tree_to_double_int (off)
>> -.sext (TYPE_PRECISION (TREE_TYPE (off))).low
>> -* BITS_PER_UNIT);
>> -   misalign = misalign & (align - 1);
>> -   if (misalign != 0)
>> - align = (misalign & -misalign);
>> -   if (align < TYPE_ALIGN (type))
>> - type = build_aligned_type (type, align);
>> +   if (get_pointer_alignment_1 (base, &align, &misalign))
>> + {
>> +   misalign += (tree_to_double_int (off)
>> +.sext (TYPE_PRECISION (TREE_TYPE (off))).low
>> +* BITS_PER_UNIT);
>> +   misalign = misalign & (align - 1);
>> +   if (misalign != 0)
>> + align = (misalign & -misalign);
>> +   if (align < TYPE_ALIGN (type))
>> + type = build_aligned_type (type, align);
>> + }
>> expr = fold_build2_loc (loc, MEM_REF, type, base, off);
>>   }
>> else
>


Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

2012-11-27 Thread Kenneth Zadeck
i will discuss this with mike when he wakes up.he lives on the west 
pole so that will not be until after you go to bed.


the one point that i will take exception to is that the copying 
operation is, in practice, any more time expensive than the pointer 
copy.   I never bother to initialize the storage in the array, i only 
copy the elements that are live.This is with almost always 1 hwi 
because either most types are small or most constants of large types 
compress to 1 hwi.So even if a compilation does a zillion 
::from_trees, you will most likely never see the difference in time.


kenny


On 11/27/2012 05:03 AM, Richard Biener wrote:

On Tue, Nov 27, 2012 at 1:06 AM, Kenneth Zadeck
 wrote:

Richard,

I spent a good part of the afternoon talking to Mike about this.  He is on
the c++ standards committee and is a much more seasoned c++ programmer than
I am.

He convinced me that with a large amount of engineering and c++
"foolishness" that it was indeed possible to get your proposal to POSSIBLY
work as well as what we did.

But now the question is why would any want to do this?

At the very least you are talking about instantiating two instances of
wide-ints, one for the stack allocated uses and one for the places where we
just move a pointer from the tree or the rtx. Then you are talking about
creating connectors so that the stack allocated functions can take
parameters of pointer version and visa versa.

Then there is the issue that rather than just saying that something is a
wide int, that the programmer is going to have to track it's origin.   In
particular,  where in the code right now i say.

wide_int foo = wide_int::from_rtx (r1);
wide_int bar = wide_int::from_rtx (r2) + foo;

now i would have to say

wide_int_ptr foo = wide_int_ptr::from_rtx (r1);
wide_int_stack bar = wide_int_ptr::from_rtx (r2) + foo;

No, you'd say

wide_int foo = wide_int::from_rtx (r1);

and the static, non-templated from_rtx method would automagically
return (always!) a "wide_int_ptr" kind.  The initialization then would
use the assignment operator that mediates between wide_int and
"wide_int_ptr", doing the copying.

The user should get a 'stack' kind by default when specifying wide_int,
like implemented with

struct wide_int_storage_stack;
struct wide_int_storage_ptr;

template 
class wide_int : public storage
{
...
static wide_int  from_rtx (rtx);
}

the whole point of the exercise is to make from_rtx and from_tree avoid
the copying (and excessive stack space allocation) for the rvalue case
like in

  wide_int res = wide_int::from_rtx (x) + 1;

if you save the result into a wide_int temporary first then you are lost
of course (modulo some magic GCC optimization being able to elide
the copy somehow).

And of course for code like VRP that keeps a lattice of wide_ints to
be able to reduce its footprint by using ptr storage and explicit allocations
(that's a secondary concern, of course).  And for VRP to specify that
it needs more than the otherwise needed MAX_INT_MODE_SIZE.
ptr storage would not have this arbitrary limitation, only embedded
storage (should) have.


then when i want to call some function using a wide_int ref that function
now must be either overloaded to take both or i have to choose one of the
two instantiations (presumably based on which is going to be more common)
and just have the compiler fix up everything (which it is likely to do).

Nope, they'd be

class wide_int ...
{
template 
wide_int operator+(wide_int  a, wide_int b)
{
   return wide_int::plus_worker (a.precision, a. , a.get_storage_ptr (),
 b.precision, ...,
b.get_storage_ptr ());
}



And so what is the payoff:
1) No one except the c++ elite is going to understand the code. The rest of
the community will hate me and curse the ground that i walk on.

Maybe for the implementation - but look at hash-table and vec ... not for
usage certainly.


2) I will end up with a version of wide-int that can be used as a medium
life container (where i define medium life as not allowed to survive a gc
since they will contain pointers into rtxes and trees.)
3) An no clients that actually wanted to do this!!I could use as an
example one of your favorite passes, tree-vrp.   The current double-int
could have been a medium lifetime container since it has a smaller
footprint, but in fact tree-vrp converts those double-ints back into trees
for medium storage.   Why, because it needs the other fields of a tree-cst
to store the entire state.  Wide-ints also "suffer" this problem.  their
only state are the data, and the three length fields.   They have no type
and none of the other tree info so the most obvious client for a medium
lifetime object is really not going to be a good match even if you "solve
the storage problem".

The fact is that wide-ints are an excellent short term storage class that
can be very quickly converted into our two long term storage classes.  Your
propos

Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)

2012-11-27 Thread Steven Bosscher
On Tue, Nov 27, 2012 at 1:28 PM, Steven Bosscher wrote:
> Dominique, could you give this a try and see if it helps?
> (But as I said up-thread: I'm not sure this is a proper fix, or just
> another band-aid...)

And more band-aid, but this time I'm not even sure where things go
wrong. In any case, we end up with a REG_EQUAL note referencing a
SUBREG of a dead register. This leaked because df_remove_dead_eq_notes
uses loc_mentioned_in_p not on the note but on the first operand of
the note.

Index: df-problems.c
===
--- df-problems.c   (revision 193394)
+++ df-problems.c   (working copy)
@@ -2907,9 +2907,10 @@ df_remove_dead_eq_notes (rtx insn, bitma
if (DF_REF_REGNO (use) > FIRST_PSEUDO_REGISTER
&& DF_REF_LOC (use)
&& (DF_REF_FLAGS (use) & DF_REF_IN_NOTE)
-   && ! bitmap_bit_p (live, DF_REF_REGNO (use))
-   && loc_mentioned_in_p (DF_REF_LOC (use), XEXP (link, 0)))
+   && ! bitmap_bit_p (live, DF_REF_REGNO (use)))
  {
+   /* Make sure that DF_SCAN is up-to-date.  */
+   gcc_assert (loc_mentioned_in_p (DF_REF_LOC (use), link));
deleted = true;
break;
  }


Re: [patch] RFA: slim RTL printing cleanups

2012-11-27 Thread Richard Biener
On Tue, Nov 27, 2012 at 12:20 AM, Steven Bosscher  wrote:
> Hello,
>
> This patch performs some necessary TLC on slim RTL printing in sched-vis.c:
>
> * Make it independent of the scheduler. Actually it already was,
> mostly. This patch completes the job.
>
> * Harmonize dumping templates for INSN_UID.
>
> * Always print the pattern of a CALL_INSN.
>
> * Don't print "jump" for sched dumps only.
>
> * Harmonize function names:
>   - print_* print to a "char *" buffer
>   - dump_* print to a given "FILE *"
>   - debug_* print to stderr.
>
> Some of the ideas come from Richard S.'s
> http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00284.html.
>
> The next patch addresses a bigger problem: The print_* variants should
> use something better than a fixed-size buffer. I plan to use
> pretty-print. An obstack would also work; but I'd like to keep one
> layer of abstraction between the file output and the printers, so that
> users can post-process the string if necessary (the case I'm thinking
> of, is the graph dumpers, which need to escape some characters before
> writing to a file). Any thoughts on this from anyone?
>
> Bootstrapped&tested on powerpc64-unknown-linux-gnu. OK for trunk?
> (Yes, I know it's stage3, but look at it this way: the patch does no
> harm, and good dumps help for debugging real bugs ;-)

Ok!

Thanks,
Richard.

> Ciao!
> Steven


Re: bit_field_ref of constructor of vectors

2012-11-27 Thread Richard Biener
On Mon, Nov 12, 2012 at 2:31 PM, Marc Glisse  wrote:
> Hello,
>
> this patch lets us fold a bit_field_ref of a constructor even when the
> elements are vectors. Writing a testcase is not that convenient because of
> the lack of a dead code elimination pass after forwprop4 (RTL doesn't always
> remove everything either), but I see a clear difference on this test on
> x86_64 without AVX.
>
> typedef double vec __attribute__((vector_size(4*sizeof(double;
> void f(vec*x){
>   *x+=*x+*x;
> }
> double g(vec*x){
>   return (*x+*x)[3];
> }
> void h(vec*x, double a, double b, double c, double d){
>   vec y={a,b,c,d};
>   *x+=y;
> }
>
>
> I don't know if the patch (assuming it is ok) is suitable for stage 3, it
> may have to wait. It passes bootstrap+testsuite.
>
> I am still not quite sure why we even have a valid_gimple_rhs_p function
> (after which we usually give up if it says no) instead of gimplifying, so I
> may have missed a reason why CONSTRUCTOR or BIT_FIELD_REF shouldn't be ok.

Boostrapped and tested on ... ?

Ok if bootstrap / testing passes.

Thanks,
Richard.

>
> 2012-11-12  Marc Glisse  
>
> * fold-const.c (fold_ternary_loc) [BIT_FIELD_REF]: Handle
> CONSTRUCTOR with vector elements.
> * tree-ssa-propagate.c (valid_gimple_rhs_p): Handle CONSTRUCTOR
> and BIT_FIELD_REF.
>
> --
> Marc Glisse
> Index: gcc/fold-const.c
> ===
> --- gcc/fold-const.c(revision 193429)
> +++ gcc/fold-const.c(working copy)
> @@ -14054,65 +14054,75 @@ fold_ternary_loc (location_t loc, enum t
>   unsigned HOST_WIDE_INT n = tree_low_cst (arg1, 1);
>   unsigned HOST_WIDE_INT idx = tree_low_cst (op2, 1);
>
>   if (n != 0
>   && (idx % width) == 0
>   && (n % width) == 0
>   && ((idx + n) / width) <= TYPE_VECTOR_SUBPARTS (TREE_TYPE
> (arg0)))
> {
>   idx = idx / width;
>   n = n / width;
> - if (TREE_CODE (type) == VECTOR_TYPE)
> +
> + if (TREE_CODE (arg0) == VECTOR_CST)
> {
> - if (TREE_CODE (arg0) == VECTOR_CST)
> -   {
> - tree *vals = XALLOCAVEC (tree, n);
> - unsigned i;
> - for (i = 0; i < n; ++i)
> -   vals[i] = VECTOR_CST_ELT (arg0, idx + i);
> - return build_vector (type, vals);
> -   }
> - else
> -   {
> - VEC(constructor_elt, gc) *vals;
> - unsigned i;
> - if (CONSTRUCTOR_NELTS (arg0) == 0)
> -   return build_constructor (type, NULL);
> - if (TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT (arg0,
> -0)->value))
> - != VECTOR_TYPE)
> -   {
> - vals = VEC_alloc (constructor_elt, gc, n);
> - for (i = 0;
> -  i < n && idx + i < CONSTRUCTOR_NELTS (arg0);
> -  ++i)
> -   CONSTRUCTOR_APPEND_ELT (vals, NULL_TREE,
> -   CONSTRUCTOR_ELT
> - (arg0, idx +
> i)->value);
> - return build_constructor (type, vals);
> -   }
> -   }
> + if (n == 1)
> +   return VECTOR_CST_ELT (arg0, idx);
> +
> + tree *vals = XALLOCAVEC (tree, n);
> + for (unsigned i = 0; i < n; ++i)
> +   vals[i] = VECTOR_CST_ELT (arg0, idx + i);
> + return build_vector (type, vals);
> +   }
> +
> + /* Constructor elements can be subvectors.  */
> + unsigned HOST_WIDE_INT k = 1;
> + if (CONSTRUCTOR_NELTS (arg0) != 0)
> +   {
> + tree cons_elem = TREE_TYPE (CONSTRUCTOR_ELT (arg0,
> 0)->value);
> + if (TREE_CODE (cons_elem) == VECTOR_TYPE)
> +   k = TYPE_VECTOR_SUBPARTS (cons_elem);
> }
> - else if (n == 1)
> +
> + /* We keep an exact subset of the constructor elements.  */
> + if ((idx % k) == 0 && (n % k) == 0)
> {
> - if (TREE_CODE (arg0) == VECTOR_CST)
> -   return VECTOR_CST_ELT (arg0, idx);
> - else if (CONSTRUCTOR_NELTS (arg0) == 0)
> -   return build_zero_cst (type);
> - else if (TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT (arg0,
> -
> 0)->value))
> -  != VECTOR_TYPE)
> + VEC(constructor_elt, gc) *vals;
> + if (CONSTRUCTOR_NELTS (arg0) == 0)
> +   return build_con

[PATCH] Remove bogus assertion from the vectorizer (PR tree-optimization/55110)

2012-11-27 Thread Jakub Jelinek
Hi!

This patch removes a bogus assertion (that predates introduction
of STMT_VINFO_PATTERN_DEF_SEQ (and STMT_VINFO_PATTERN_DEF_STMT
before that) in 4.7).  stmt might be not just STMT_VINFO_RELATED_STMT
of orig_stmt, bt also could be anywhere in STMT_VINFO_PATTERN_DEF_SEQ
of it.  We don't vectorize this anyway for multiple reasons (e.g.
orig_code is TRUNC_DIV_EXPR which isn't handled reduction code,
and even the pattern recognized stmts include MULT_HIGHPART_EXPR
which isn't handled either).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2012-11-27  Jakub Jelinek  

PR tree-optimization/55110
* tree-vect-loop.c (vectorizable_reduction): Don't assert
that STMT_VINFO_RELATED_STMT of orig_stmt is stmt.

* gcc.dg/pr55110.c: New test.

--- gcc/tree-vect-loop.c.jj 2012-11-21 16:00:12.0 +0100
+++ gcc/tree-vect-loop.c2012-11-26 11:24:42.903995009 +0100
@@ -4624,7 +4624,6 @@ vectorizable_reduction (gimple stmt, gim
   if (orig_stmt)
 {
   orig_stmt_info = vinfo_for_stmt (orig_stmt);
-  gcc_assert (STMT_VINFO_RELATED_STMT (orig_stmt_info) == stmt);
   gcc_assert (STMT_VINFO_IN_PATTERN_P (orig_stmt_info));
   gcc_assert (!STMT_VINFO_IN_PATTERN_P (stmt_info));
 }
--- gcc/testsuite/gcc.dg/pr55110.c.jj   2012-11-26 11:31:32.008587313 +0100
+++ gcc/testsuite/gcc.dg/pr55110.c  2012-11-26 11:31:09.0 +0100
@@ -0,0 +1,13 @@
+/* PR tree-optimization/55110 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -ftree-vectorize" } */
+
+int
+foo (int x)
+{
+  int a, b;
+  for (b = 0; b < 8; b++)
+for (a = 0; a < 2; a++)
+  x /= 3;
+  return x;
+}

Jakub


[PATCH] Fix invalid asm goto error recovery (PR middle-end/52650)

2012-11-27 Thread Jakub Jelinek
Hi!

delete_insn_and_edges doesn't DTRT here, there is __builtin_unreachable
after the invalid inline asm goto.  Rather than changing the CFG and trying
to figure out what to do to remove the bogus asm goto successfully, it seems
much easier to just turn it into asm goto ("" : : : clobbers... : lab1...);
i.e. clear template and inputs.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2012-11-27  Jakub Jelinek  

PR middle-end/52650
* function.c (instantiate_virtual_regs_in_insn): Don't delete invalid
asm gotos, instead just clear their template and inputs.

--- gcc/function.c.jj   2012-11-21 16:00:12.0 +0100
+++ gcc/function.c  2012-11-26 16:47:24.907472155 +0100
@@ -1738,7 +1738,18 @@ instantiate_virtual_regs_in_insn (rtx in
   if (!check_asm_operands (PATTERN (insn)))
{
  error_for_asm (insn, "impossible constraint in %");
- delete_insn_and_edges (insn);
+ /* For asm goto, instead of fixing up all the edges
+just clear the template and clear input operands
+(asm goto doesn't have any output operands).  */
+ if (JUMP_P (insn))
+   {
+ rtx asm_op = extract_asm_operands (PATTERN (insn));
+ ASM_OPERANDS_TEMPLATE (asm_op) = ggc_strdup ("");
+ ASM_OPERANDS_INPUT_VEC (asm_op) = rtvec_alloc (0);
+ ASM_OPERANDS_INPUT_CONSTRAINT_VEC (asm_op) = rtvec_alloc (0);
+   }
+ else
+   delete_insn (insn);
}
 }
   else

Jakub


Re: [PATCH] A pass which merges constant stores to bitfields

2012-11-27 Thread Richard Biener
On Tue, Nov 13, 2012 at 1:50 AM, Andrew Pinski
 wrote:
> Hi,
>   I know we are in stage3, I thought I would send this out now for
> review as I finally have it ready for consumption as I finally got
> around to removing the limitation of it only working on big-endian.
> This pass was originally written by Adam Nemet while he was at Cavium.
>  I modified it to work on little-endian and also update the code to
> use the aliasing oracle and some of the new VEC interface.
>
> Yes I know I forgot to add documentation for the new option and for
> the new pass.  I will add it soon.

Note that I think the patch doesn't try to honor the C++ memory model
nor arms strict volatile bitfields.

My plan was to get similar results as your patch by lowering bitfield
loads and stores using DECL_BIT_FIELD_REPRESENTATIVE and
then let DCE/DSE and tree combine merge things.

That is, you can replace a bit-field a.b.c.d with
BIT_FIELD_REF 
and perform stores via read-modify-write.  The DECL_BIT_FIELD_REPRESENTATIVE
loads can be CSEd leaving a combining opportunity, stores will end up
being redundant.

The advantage is that using DECL_BIT_FIELD_REPRESENTATIVE will
make you honor the memory model issues automatically - you basically
perform what expand would do.

Instead of generating the component-ref with the representative you can of
course also lower the whole thing to a MEM_REF (if the ref does not
contain variable indexes).

Now, you have a pass that might be able to figure out when this lowering
would be profitable - that's good, because that is what I was missing with
the very simple approach of performing the above lowering from insinde
gimplification.

For reference I attached the BITFIELD_COMPOSE tree expression
patch.

Richard.

> Thanks,
> Andrew Pinski
>
> ChangeLog:
> * tree-merge-const-bfstores.c: New file.
> * tree-pass.h (pass_merge_const_bfstores): Add pass.
> * opts.c (default_options_table): Add OPT_fmerge_const_bfstores at -O2
> and above.
> * timevar.def (TV_MERGE_CONST_BFSTORES): New timevar.
> * common.opt (fmerge-const-bfstores): New option.
> * Makefile.in (OBJS): Add tree-merge-const-bfstores.o.
> (tree-merge-const-bfstores.o): New target.
> * passes.c (init_optimization_passes): Add pass_merge_const_bfstores
> right after the last pass_phiopt.


bit-field-expr
Description: Binary data


Re: [PATCH] Remove bogus assertion from the vectorizer (PR tree-optimization/55110)

2012-11-27 Thread Richard Biener
On Tue, Nov 27, 2012 at 2:23 PM, Jakub Jelinek  wrote:
> Hi!
>
> This patch removes a bogus assertion (that predates introduction
> of STMT_VINFO_PATTERN_DEF_SEQ (and STMT_VINFO_PATTERN_DEF_STMT
> before that) in 4.7).  stmt might be not just STMT_VINFO_RELATED_STMT
> of orig_stmt, bt also could be anywhere in STMT_VINFO_PATTERN_DEF_SEQ
> of it.  We don't vectorize this anyway for multiple reasons (e.g.
> orig_code is TRUNC_DIV_EXPR which isn't handled reduction code,
> and even the pattern recognized stmts include MULT_HIGHPART_EXPR
> which isn't handled either).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.

> 2012-11-27  Jakub Jelinek  
>
> PR tree-optimization/55110
> * tree-vect-loop.c (vectorizable_reduction): Don't assert
> that STMT_VINFO_RELATED_STMT of orig_stmt is stmt.
>
> * gcc.dg/pr55110.c: New test.
>
> --- gcc/tree-vect-loop.c.jj 2012-11-21 16:00:12.0 +0100
> +++ gcc/tree-vect-loop.c2012-11-26 11:24:42.903995009 +0100
> @@ -4624,7 +4624,6 @@ vectorizable_reduction (gimple stmt, gim
>if (orig_stmt)
>  {
>orig_stmt_info = vinfo_for_stmt (orig_stmt);
> -  gcc_assert (STMT_VINFO_RELATED_STMT (orig_stmt_info) == stmt);
>gcc_assert (STMT_VINFO_IN_PATTERN_P (orig_stmt_info));
>gcc_assert (!STMT_VINFO_IN_PATTERN_P (stmt_info));
>  }
> --- gcc/testsuite/gcc.dg/pr55110.c.jj   2012-11-26 11:31:32.008587313 +0100
> +++ gcc/testsuite/gcc.dg/pr55110.c  2012-11-26 11:31:09.0 +0100
> @@ -0,0 +1,13 @@
> +/* PR tree-optimization/55110 */
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -ftree-vectorize" } */
> +
> +int
> +foo (int x)
> +{
> +  int a, b;
> +  for (b = 0; b < 8; b++)
> +for (a = 0; a < 2; a++)
> +  x /= 3;
> +  return x;
> +}
>
> Jakub


Re: [PATCH] Fix invalid asm goto error recovery (PR middle-end/52650)

2012-11-27 Thread Richard Biener
On Tue, Nov 27, 2012 at 2:26 PM, Jakub Jelinek  wrote:
> Hi!
>
> delete_insn_and_edges doesn't DTRT here, there is __builtin_unreachable
> after the invalid inline asm goto.  Rather than changing the CFG and trying
> to figure out what to do to remove the bogus asm goto successfully, it seems
> much easier to just turn it into asm goto ("" : : : clobbers... : lab1...);
> i.e. clear template and inputs.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.

> 2012-11-27  Jakub Jelinek  
>
> PR middle-end/52650
> * function.c (instantiate_virtual_regs_in_insn): Don't delete invalid
> asm gotos, instead just clear their template and inputs.
>
> --- gcc/function.c.jj   2012-11-21 16:00:12.0 +0100
> +++ gcc/function.c  2012-11-26 16:47:24.907472155 +0100
> @@ -1738,7 +1738,18 @@ instantiate_virtual_regs_in_insn (rtx in
>if (!check_asm_operands (PATTERN (insn)))
> {
>   error_for_asm (insn, "impossible constraint in %");
> - delete_insn_and_edges (insn);
> + /* For asm goto, instead of fixing up all the edges
> +just clear the template and clear input operands
> +(asm goto doesn't have any output operands).  */
> + if (JUMP_P (insn))
> +   {
> + rtx asm_op = extract_asm_operands (PATTERN (insn));
> + ASM_OPERANDS_TEMPLATE (asm_op) = ggc_strdup ("");
> + ASM_OPERANDS_INPUT_VEC (asm_op) = rtvec_alloc (0);
> + ASM_OPERANDS_INPUT_CONSTRAINT_VEC (asm_op) = rtvec_alloc (0);
> +   }
> + else
> +   delete_insn (insn);
> }
>  }
>else
>
> Jakub


Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver

2012-11-27 Thread Jakub Jelinek
On Tue, Nov 27, 2012 at 01:03:47PM +0100, Jakub Jelinek wrote:
> So, at least from var-tracking POV which doesn't attempt to perform any
> optimizations across any insn, just tries to track where values live, IMHO a
> volatile asm acts exactly the same as non-volatile, that is why I'm testing
> following patch right now.
> 
> 2012-11-26  Jakub Jelinek  
> 
>   PR debug/36728
>   PR debug/55467
>   * cselib.c (cselib_process_insn): If cselib_preserve_constants,
>   don't reset table and exit early on volatile insns and setjmp.
>   Reset table afterwards on setjmp.
> 
>   * gcc.dg/guality/pr36728-1.c: Include "../nop.h", make sure the asm
>   are non-empty and add dependency between the first and second asm.
>   * gcc.dg/guality/pr36728-2.c: Likewise.
>   * gcc.dg/guality/pr36728-3.c: New test.
>   * gcc.dg/guality/pr36728-4.c: New test.

Bootstrapped/regtested fine on x86_64-linux and i686-linux (and fixes the
pr36728-[34].c failures that appear without the patch, which are of the
wrong debug kind).

Jakub


Re: bit_field_ref of constructor of vectors

2012-11-27 Thread Marc Glisse

On Tue, 27 Nov 2012, Richard Biener wrote:


On Mon, Nov 12, 2012 at 2:31 PM, Marc Glisse  wrote:

Hello,

this patch lets us fold a bit_field_ref of a constructor even when the
elements are vectors. Writing a testcase is not that convenient because of
the lack of a dead code elimination pass after forwprop4 (RTL doesn't always
remove everything either), but I see a clear difference on this test on
x86_64 without AVX.

typedef double vec __attribute__((vector_size(4*sizeof(double;
void f(vec*x){
  *x+=*x+*x;
}
double g(vec*x){
  return (*x+*x)[3];
}
void h(vec*x, double a, double b, double c, double d){
  vec y={a,b,c,d};
  *x+=y;
}


I don't know if the patch (assuming it is ok) is suitable for stage 3, it
may have to wait. It passes bootstrap+testsuite.


... on x86_64-linux-gnu, sorry for the lack of precision (default 
languages, Xeon E5520, graphite enabled).



I am still not quite sure why we even have a valid_gimple_rhs_p function
(after which we usually give up if it says no) instead of gimplifying, so I
may have missed a reason why CONSTRUCTOR or BIT_FIELD_REF shouldn't be ok.


Boostrapped and tested on ... ?

Ok if bootstrap / testing passes.


Thanks, I'll retest it to make sure it still works.


2012-11-12  Marc Glisse  

* fold-const.c (fold_ternary_loc) [BIT_FIELD_REF]: Handle
CONSTRUCTOR with vector elements.
* tree-ssa-propagate.c (valid_gimple_rhs_p): Handle CONSTRUCTOR
and BIT_FIELD_REF.


--
Marc Glisse


Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)

2012-11-27 Thread Dominique Dhumieres
> Dominique, could you give this a try and see if it helps?

With the patch, the miscompilation of aermod.f90 is gone.

Thanks,

Dominique


Re: PATCH: Define x86_64_fallback_frame_state only for glibc

2012-11-27 Thread H.J. Lu
On Wed, Mar 28, 2012 at 6:15 PM, H.J. Lu  wrote:
> Hi,
>
> We shouldn't assume that we can define x86_64_fallback_frame_state
> for other x86-64 C libraries, like Bionic.  OK for trunk?
>
> Thanks.
>
>
> H.J.
> ---
> 2012-03-27  H.J. Lu  
>
> * config/i386/linux-unwind.h (x86_64_fallback_frame_state): Define
> only for glibc.
>
> diff --git a/gcc/config/i386/linux-unwind.h b/gcc/config/i386/linux-unwind.h
> index c5f7ea0..61b4ebf 100644
> --- a/libgcc/config/i386/linux-unwind.h
> +++ b/libgcc/config/i386/linux-unwind.h
> @@ -28,11 +28,17 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
> If not, see
>
>  #ifndef inhibit_libc
>
> -#ifdef __x86_64__
> +/* There's no sys/ucontext.h for glibc 2.0, so no
> +   signal-turned-exceptions for them.  There's also no configure-run for
> +   the target, so we can't check on (e.g.) HAVE_SYS_UCONTEXT_H.  Using the
> +   target libc version macro should be enough.  */
> +#if defined __GLIBC__ && !(__GLIBC__ == 2 && __GLIBC_MINOR__ == 0)
>
>  #include 
>  #include 
>
> +#ifdef __x86_64__
> +
>  #define MD_FALLBACK_FRAME_STATE_FOR x86_64_fallback_frame_state
>
>  static _Unwind_Reason_Code
> @@ -102,15 +108,6 @@ x86_64_fallback_frame_state (struct _Unwind_Context 
> *context,
>
>  #else /* ifdef __x86_64__  */
>
> -/* There's no sys/ucontext.h for glibc 2.0, so no
> -   signal-turned-exceptions for them.  There's also no configure-run for
> -   the target, so we can't check on (e.g.) HAVE_SYS_UCONTEXT_H.  Using the
> -   target libc version macro should be enough.  */
> -#if defined __GLIBC__ && !(__GLIBC__ == 2 && __GLIBC_MINOR__ == 0)
> -
> -#include 
> -#include 
> -
>  #define MD_FALLBACK_FRAME_STATE_FOR x86_fallback_frame_state
>
>  static _Unwind_Reason_Code
> @@ -191,6 +188,6 @@ x86_frob_update_context (struct _Unwind_Context *context,
>  _Unwind_SetSignalFrame (context, 1);
>  }
>
> -#endif /* not glibc 2.0 */
>  #endif /* ifdef __x86_64__  */
> +#endif /* not glibc 2.0 */
>  #endif /* ifdef inhibit_libc  */

I'd like to backport it to 4.7 branch.  OK for 4.7?

Thanks.

-- 
H.J.


Re: libsanitizer merge 168699

2012-11-27 Thread Jakub Jelinek
On Tue, Nov 27, 2012 at 05:46:55PM +0400, Konstantin Serebryany wrote:
> This is libsanitizer merge from upstream 168699.
> Tested on x86_64 Ubuntu 12.04 with:
> rm -rf */{*/,}libsanitizer \
>   && make -j 50 \
>   && make -C gcc check-g{cc,++}
> RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} asan.exp'
> 
> Ok to commit?

Ok (though I hope the tsan atomics will be fixed up soon).

Jakub


Re: libsanitizer merge 168699

2012-11-27 Thread Konstantin Serebryany
Done, r193849.
Now that the merging became 95% automated, I can do another merge upon request.

On Tue, Nov 27, 2012 at 5:53 PM, Jakub Jelinek  wrote:
> On Tue, Nov 27, 2012 at 05:46:55PM +0400, Konstantin Serebryany wrote:
>> This is libsanitizer merge from upstream 168699.
>> Tested on x86_64 Ubuntu 12.04 with:
>> rm -rf */{*/,}libsanitizer \
>>   && make -j 50 \
>>   && make -C gcc check-g{cc,++}
>> RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} asan.exp'
>>
>> Ok to commit?
>
> Ok (though I hope the tsan atomics will be fixed up soon).
>
> Jakub


[asan, tsan] Don't ICE with -Og -fsanitize=address

2012-11-27 Thread Jakub Jelinek
Hi!

I've noticed we don't schedule asan/tsan passes for -Og, which results in
ICEs.  For -O0 that is handled by pass_[at]san_O0, but for -Og which also
doesn't use the normal optimization queue there is nothing.

Ok for trunk?

2012-11-27  Jakub Jelinek  

* passes.c (init_optimization_passes): Add pass_asan and pass_tsan
to -Og optimization passes.

--- gcc/passes.c.jj 2012-11-27 12:08:02.0 +0100
+++ gcc/passes.c2012-11-27 15:04:59.722410179 +0100
@@ -1536,6 +1536,8 @@ init_optimization_passes (void)
   /* Copy propagation also copy-propagates constants, this is necessary
  to forward object-size results properly.  */
   NEXT_PASS (pass_copy_prop);
+  NEXT_PASS (pass_asan);
+  NEXT_PASS (pass_tsan);
   NEXT_PASS (pass_rename_ssa_copies);
   NEXT_PASS (pass_dce);
   /* Fold remaining builtins.  */

Jakub


Re: Sparc ASAN

2012-11-27 Thread Konstantin Serebryany
On Wed, Nov 21, 2012 at 9:05 PM, Konstantin Serebryany
 wrote:
> On Wed, Nov 21, 2012 at 8:40 PM, David Miller  wrote:
>> From: Konstantin Serebryany 
>> Date: Wed, 21 Nov 2012 19:39:52 +0400
>>
>>> There are various other things that asan library does not support.
>>
>> I'm trying to understand why making the page size variable
>> is such a difficult endeavour.
>
> Maybe it's not *that* difficult.
> Looking at it carefully, the major problem I can see is that some
> other constants are defined based on this one.
> Give me a few days to resolve it...
> http://code.google.com/p/address-sanitizer/issues/detail?id=128

 http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=193849 removes the
kPageSize constant in favor of a function call.
Please give it a try.

BTW, libsanitizer now has a usable process to quickly pull the upstream changes.
It should make it easier for us to iterate on platform-specific patches.

--kcc


Re: [asan, tsan] Don't ICE with -Og -fsanitize=address

2012-11-27 Thread Richard Biener
On Tue, Nov 27, 2012 at 3:09 PM, Jakub Jelinek  wrote:
> Hi!
>
> I've noticed we don't schedule asan/tsan passes for -Og, which results in
> ICEs.  For -O0 that is handled by pass_[at]san_O0, but for -Og which also
> doesn't use the normal optimization queue there is nothing.
>
> Ok for trunk?

Ok.

Thanks,
Richard.

> 2012-11-27  Jakub Jelinek  
>
> * passes.c (init_optimization_passes): Add pass_asan and pass_tsan
> to -Og optimization passes.
>
> --- gcc/passes.c.jj 2012-11-27 12:08:02.0 +0100
> +++ gcc/passes.c2012-11-27 15:04:59.722410179 +0100
> @@ -1536,6 +1536,8 @@ init_optimization_passes (void)
>/* Copy propagation also copy-propagates constants, this is necessary
>   to forward object-size results properly.  */
>NEXT_PASS (pass_copy_prop);
> +  NEXT_PASS (pass_asan);
> +  NEXT_PASS (pass_tsan);
>NEXT_PASS (pass_rename_ssa_copies);
>NEXT_PASS (pass_dce);
>/* Fold remaining builtins.  */
>
> Jakub


Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)

2012-11-27 Thread Dominique Dhumieres
> And more band-aid, ...

The gcc_assert triggers at bootstrap when compiling gcc/ada/ali.adb:

+===GNAT BUG DETECTED==+
| 4.8.0 20121127 (experimental) [trunk revision 193848p10] 
(x86_64-apple-darwin10.8.0) GCC error:|
| in df_remove_dead_eq_notes, at df-problems.c:2917|
| Error detected around ../../work/gcc/ada/ali.adb:2682:8  |
| Please submit a bug report; see http://gcc.gnu.org/bugs.html.|
| Use a subject line meaningful to you and us to track the bug.|
| Include the entire contents of this bug box in the report.   |
| Include the exact gcc or gnatmake command that you entered.  |
| Also include sources listed below in gnatchop format |
| (concatenated together with no headers between files).   |
+==+

Dominique


Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)

2012-11-27 Thread Paolo Bonzini
Il 27/11/2012 13:00, Steven Bosscher ha scritto:
> It all
> starts with PRE creating a REG_EQUAL note that references the register
> that's SET in the insn the note is attached to, but the register is
> live after the insn so from that point of view the note is not
> invalid.

This note seems very very weird.  For one thing, it becomes invalid on
the very instruction where it is created.  I would say that it should
not be there.

Paolo


Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)

2012-11-27 Thread Steven Bosscher
On Tue, Nov 27, 2012 at 3:25 PM, Dominique Dhumieres  wrote:
>> And more band-aid, ...
>
> The gcc_assert triggers at bootstrap when compiling gcc/ada/ali.adb:
>
> +===GNAT BUG DETECTED==+
> | 4.8.0 20121127 (experimental) [trunk revision 193848p10] 
> (x86_64-apple-darwin10.8.0) GCC error:|
> | in df_remove_dead_eq_notes, at df-problems.c:2917|
> | Error detected around ../../work/gcc/ada/ali.adb:2682:8  |
> | Please submit a bug report; see http://gcc.gnu.org/bugs.html.|
> | Use a subject line meaningful to you and us to track the bug.|
> | Include the entire contents of this bug box in the report.   |
> | Include the exact gcc or gnatmake command that you entered.  |
> | Also include sources listed below in gnatchop format |
> | (concatenated together with no headers between files).   |
> +==+

Yes, I found that one already, too.

And, oh joy, we have pseudos in REG_EQUAL notes after LRA! (Probably
also after reload, btw.).  In the ICE I got, a pseudo's live range got
split and an inheritance move is injected, but REG_EQUAL notes were
not updated or removed. Finding and removing the notes is hard in IRA
and LRA because they don't use the DF caches.

Ciao!
Steven


Re: [PATCH] Cleanup last_location and update input_location in ipa_prop

2012-11-27 Thread Dehao Chen
On Tue, Nov 27, 2012 at 1:46 AM, Richard Biener
 wrote:
> On Mon, Nov 26, 2012 at 11:54 PM, Dehao Chen  wrote:
>> The new patch is attached. Bootstrapped and passed gcc regression test.
>>
>> Ok for trunk?
>
> Hmm ... this merely avoids some UNKNOWN_LOCATIONs which should
> be ok.  I think the issue is that gimplify_expr does:
>
>   saved_location = input_location;
>   if (save_expr != error_mark_node
>   && EXPR_HAS_LOCATION (*expr_p))
> input_location = EXPR_LOCATION (*expr_p);
>
> thus it retains input_location from previous recursive invocations (that's 
> ok).
>
> But it doesn't seem to be the case that input_location is UNKNOWN_LOCATION
> during GIMPLE passes (which it really should be ...).  So to fix the
> gimplification
> issue I think that gimplify_ctx should have a location (initialized to
> UNKNOWN_LOCATION), which it saves/restores across its push/pop
> operation.
>
> Or of course that inside the pass manager before executing passes assert
> that input_location is UNKNOWN_LOCATION and fix up things according
> to that.  First offender is in cgraphunit.c in cgraph_analyze_function:
>
>   location_t saved_loc = input_location;
>   input_location = DECL_SOURCE_LOCATION (decl);
>
> that should be totally unnecessary (input_location shoud be and stay
> UNKNOWN_LOCATION).  And finalize_compilation_unit (the last thing
> the frontends should call) should have input_location = UNKNOWN_LOCATION
> right at the point it clears current_function_decl / cfun.
>
> I'd prefer the 2nd approach (maybe without the assert as we are in stage3
> already, but the assert would certainly help).  And places in the middle-end
> that set input_location for purpose of (re-)gimplifying should use the 
> location
> in the gimplify ctx (which would need to be added) instead of setting
> input_location.
>
> Maybe as a first try set input_location to UNKNOWN_LOCATION in
> finalize_compilation_unit.
>
> Thanks,
> Richard.

Shall we check in this patch first, and have another patch to reset
input_location?

Thanks,
Dehao


Re: [PATCH] Cleanup last_location and update input_location in ipa_prop

2012-11-27 Thread Richard Biener
On Tue, Nov 27, 2012 at 3:54 PM, Dehao Chen  wrote:
> On Tue, Nov 27, 2012 at 1:46 AM, Richard Biener
>  wrote:
>> On Mon, Nov 26, 2012 at 11:54 PM, Dehao Chen  wrote:
>>> The new patch is attached. Bootstrapped and passed gcc regression test.
>>>
>>> Ok for trunk?
>>
>> Hmm ... this merely avoids some UNKNOWN_LOCATIONs which should
>> be ok.  I think the issue is that gimplify_expr does:
>>
>>   saved_location = input_location;
>>   if (save_expr != error_mark_node
>>   && EXPR_HAS_LOCATION (*expr_p))
>> input_location = EXPR_LOCATION (*expr_p);
>>
>> thus it retains input_location from previous recursive invocations (that's 
>> ok).
>>
>> But it doesn't seem to be the case that input_location is UNKNOWN_LOCATION
>> during GIMPLE passes (which it really should be ...).  So to fix the
>> gimplification
>> issue I think that gimplify_ctx should have a location (initialized to
>> UNKNOWN_LOCATION), which it saves/restores across its push/pop
>> operation.
>>
>> Or of course that inside the pass manager before executing passes assert
>> that input_location is UNKNOWN_LOCATION and fix up things according
>> to that.  First offender is in cgraphunit.c in cgraph_analyze_function:
>>
>>   location_t saved_loc = input_location;
>>   input_location = DECL_SOURCE_LOCATION (decl);
>>
>> that should be totally unnecessary (input_location shoud be and stay
>> UNKNOWN_LOCATION).  And finalize_compilation_unit (the last thing
>> the frontends should call) should have input_location = UNKNOWN_LOCATION
>> right at the point it clears current_function_decl / cfun.
>>
>> I'd prefer the 2nd approach (maybe without the assert as we are in stage3
>> already, but the assert would certainly help).  And places in the middle-end
>> that set input_location for purpose of (re-)gimplifying should use the 
>> location
>> in the gimplify ctx (which would need to be added) instead of setting
>> input_location.
>>
>> Maybe as a first try set input_location to UNKNOWN_LOCATION in
>> finalize_compilation_unit.
>>
>> Thanks,
>> Richard.
>
> Shall we check in this patch first, and have another patch to reset
> input_location?

Sure, that works for me.  It might be harder for you to verify that
doing that also fixes the issue though ;)

Richard.

> Thanks,
> Dehao


[PATCH] Fix PR35634

2012-11-27 Thread Richard Biener

This re-surrects Josephs patch to fix PR35634 - the fact that
the C frontend family does not properly promote the expression
operands of self-modify expressions.  Accounting for this during
gimplification is indeed easiest (as long as fold isn't too clever
with the expressions in unpromoted form).  The patch, instead of
promoting to int/unsigned int uses an unsigned type for the
computation where otherwise undefined behavior would arise from
the computation in a type with lower precision.  That avoids
most of the vectorizer regressions with the earlier patch.
The remaining three regressions are fixed with the
vect_can_advance_ivs_p hunk.

Bootstrapped and tested on x86_64-unknown-linux-gnu without
the vect_can_advance_ivs_p hunk, re-testing with that.

Ok for trunk?  (I don't think this kind of patch qualifies for
backporting)

Thanks,
Richard.

2008-04-17  Richard Guenther  

PR c/35634
* gimple.h (gimplify_self_mod_expr): Declare.
* gimplify.c (gimplify_self_mod_expr): Export.  Take a different
type for performing the arithmetic in.
(gimplify_expr): Adjust.
* tree-vect-loop-manip.c (vect_can_advance_ivs_p): Strip
sign conversions we can re-apply after adjusting the IV.

c-family/
* c-gimplify.c (c_gimplify_expr): Gimplify self-modify expressions
here and use a type with proper overflow behavior for types that would
need to be promoted for the arithmetic.

* gcc.dg/torture/pr35634.c: New testcase.
* g++.dg/torture/pr35634.C: Likewise.
* gcc.dg/vect/pr18536.c: Mark worker function noinline.

Index: gcc/testsuite/g++.dg/torture/pr35634.C
===
*** /dev/null   1970-01-01 00:00:00.0 +
--- gcc/testsuite/g++.dg/torture/pr35634.C  2012-11-27 15:49:46.387585847 
+0100
***
*** 0 
--- 1,19 
+ /* { dg-do run } */
+ 
+ extern "C" void abort (void);
+ extern "C" void exit (int);
+ 
+ void foo (int i)
+ {
+ static int n;
+ if (i < -128 || i > 127)
+ abort ();
+ if (++n > 1000)
+ exit (0);
+ }
+ 
+ int main ()
+ {
+ char c;
+ for (c = 0; ; c++) foo (c);
+ }
Index: gcc/testsuite/gcc.dg/torture/pr35634.c
===
*** /dev/null   1970-01-01 00:00:00.0 +
--- gcc/testsuite/gcc.dg/torture/pr35634.c  2012-11-27 15:49:46.424585859 
+0100
***
*** 0 
--- 1,19 
+ /* { dg-do run } */
+ 
+ void abort (void);
+ void exit (int);
+ 
+ void foo (int i)
+ {
+ static int n;
+ if (i < -128 || i > 127)
+ abort ();
+ if (++n > 1000)
+ exit (0);
+ }
+ 
+ int main ()
+ {
+ char c;
+ for (c = 0; ; c++) foo (c);
+ }
Index: gcc/gimple.h
===
*** gcc/gimple.h.orig   2012-11-27 14:27:24.0 +0100
--- gcc/gimple.h2012-11-27 15:49:46.424585859 +0100
*** extern enum gimplify_status gimplify_exp
*** 979,984 
--- 979,986 
   bool (*) (tree), fallback_t);
  extern void gimplify_type_sizes (tree, gimple_seq *);
  extern void gimplify_one_sizepos (tree *, gimple_seq *);
+ enum gimplify_status gimplify_self_mod_expr (tree *, gimple_seq *, gimple_seq 
*,
+bool, tree);
  extern bool gimplify_stmt (tree *, gimple_seq *);
  extern gimple gimplify_body (tree, bool);
  extern void push_gimplify_context (struct gimplify_ctx *);
Index: gcc/gimplify.c
===
*** gcc/gimplify.c.orig 2012-11-27 14:27:24.0 +0100
--- gcc/gimplify.c  2012-11-27 15:49:46.425585857 +0100
*** gimplify_compound_lval (tree *expr_p, gi
*** 2319,2327 
  WANT_VALUE is nonzero iff we want to use the value of this expression
in another expression.  */
  
! static enum gimplify_status
  gimplify_self_mod_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
!   bool want_value)
  {
enum tree_code code;
tree lhs, lvalue, rhs, t1;
--- 2319,2327 
  WANT_VALUE is nonzero iff we want to use the value of this expression
in another expression.  */
  
! enum gimplify_status
  gimplify_self_mod_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
!   bool want_value, tree arith_type)
  {
enum tree_code code;
tree lhs, lvalue, rhs, t1;
*** gimplify_self_mod_expr (tree *expr_p, gi
*** 2382,2408 
return ret;
  }
  
/* For POINTERs increment, use POINTER_PLUS_EXPR.  */
if (POINTER_TYPE_P (TREE_TYPE (lhs)))
  {
rhs = convert_to_ptrofftype_loc (loc, rhs);
if (arith_code == MINUS_EXPR)
rhs = fold_build1_loc (loc, NEGATE_EXPR, TREE_TYPE (rhs), rhs);
!   arith_code = POINTER_PLUS_EXPR;
  }
  
if (postfix)
  {

[PATCH] Fix PR55489, memory-hog bug in GCSE

2012-11-27 Thread Paolo Bonzini
Hi,

this bug triggers in the compilation of QEMU with GCC 4.7.2.  It is
latent on trunk because reg_known_value is completely broken.  I'll
send a separate patch, but this one applies there too.

The problem arises when you have -fPIE (or -fPIC) and a huge function
with a lot of references to global variables.  Canonicalization of
position-independent addresses is then done over and over for the
same addresses, resulting in quadratic time and memory complexity for
GCSE's compute_transp; hundreds of megabytes of memory are allocated
in plus_constant,

The fix is to canonicalize the addresses outside the loop, similar to
what is done by the RTL DSE pass.

gcc 4.4.6:
 PRE :   3.83 (24%) usr   0.15 (17%) sys   3.99 (24%) wall  267307 kB (33%) ggc

gcc 4.7.2:
 PRE :   7.95 (41%) usr   0.40 (40%) sys   8.31 (41%) wall  821017 kB (80%) ggc

gcc 4.8.0:
 PRE :   6.94 (26%) usr   0.02 ( 4%) sys   6.98 (26%) wall 731 kB ( 0%) ggc

gcc 4.7.2 + patch:
 PRE :   5.90 (34%) usr   0.02 ( 3%) sys   6.41 (35%) wall1670 kB ( 1%) ggc

Note that the bug is present on older branches too, but it became much
worse sometime between 4.4 and 4.7.

Bootstrap finished on x86_64-pc-linux-gnu, regtest in progress; ok for
4.7 and trunk if it passes?

Paolo

2012-11-26  Paolo Bonzini  

PR rtl-optimization/55489
* gcse.c (compute_transp): Precompute a canonical version
of XEXP (x, 0), and pass it to canon_true_dependence.

Index: gcse.c
===
--- gcse.c  (revisione 193848)
+++ gcse.c  (copia locale)
@@ -1658,7 +1658,11 @@ compute_transp (const_rtx x, int indx, sbitmap *bm
{
  bitmap_iterator bi;
  unsigned bb_index;
+ rtx x_addr;
 
+ x_addr = get_addr (XEXP (x, 0));
+ x_addr = canon_rtx (x_addr);
+
  /* First handle all the blocks with calls.  We don't need to
 do any list walking for them.  */
  EXECUTE_IF_SET_IN_BITMAP (blocks_with_calls, 0, bb_index, bi)
@@ -1683,7 +1687,7 @@
rtx dest_addr = pair->dest_addr;
 
if (canon_true_dependence (dest, GET_MODE (dest),
-  dest_addr, x, NULL_RTX))
+  dest_addr, x, x_addr))
  RESET_BIT (bmap[bb_index], indx);
  }
  }


[PATCH] Fix allocation of reg_known_value

2012-11-27 Thread Paolo Bonzini
When changing reg_known_value to a VEC, Steven sneaked in a small
semantic change.  That change has the advantage of improving compilation
time substantially on some testcases (including PR55489), but it is a
bit heavy-handed: it also makes set_known_reg_value a no-op, while
get_known_reg_value will always return NULL.

This patch fixes the VEC usage.  Bootstrap/regtest in progress, but I'm
going to commit this as obvious as soon as bootstrapping finishes.

Paolo

2012-11-26  Paolo Bonzini  

* alias.c (init_alias_analysis): Fix allocation of reg_known_value.

Index: ../../gcc/alias.c
===
--- ../../gcc/alias.c   (revisione 193853)
+++ ../../gcc/alias.c   (copia locale)
@@ -2808,7 +2808,7 @@ init_alias_analysis (void)
 
   timevar_push (TV_ALIAS_ANALYSIS);
 
-  vec_alloc (reg_known_value, maxreg - FIRST_PSEUDO_REGISTER);
+  vec_safe_grow_cleared (reg_known_value, maxreg - FIRST_PSEUDO_REGISTER);
   reg_known_equiv_p = sbitmap_alloc (maxreg - FIRST_PSEUDO_REGISTER);
 
   /* If we have memory allocated from the previous run, use it.  */


Re: RFA: Enable both ld and gold in GCC 4.8

2012-11-27 Thread H.J. Lu
On Tue, Nov 27, 2012 at 2:46 AM, Matthias Klose  wrote:
> Am 26.11.2012 17:53, schrieb H.J. Lu:
>> Hi,
>>
>> There is a patch to enable both ld and gold in GCC:
>>
>> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg02076.html
>>
>> and Diego asked if there is "anything in particular blocking
>> this patch in trunk":
>
> I think it's lack of review.
>
>> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg02121.html
>>

There is no reply for

http://gcc.gnu.org/ml/gcc-patches/2011-06/msg02121.html


-- 
H.J.


Re: [PATCH] Fix allocation of reg_known_value

2012-11-27 Thread Steven Bosscher
On Tue, Nov 27, 2012 at 5:00 PM, Paolo Bonzini wrote:
> When changing reg_known_value to a VEC, Steven sneaked in a small
> semantic change.

I don't sneak, I just tend to do stupid things from time to time ;-)

Thanks for catching this, and sorry for the breakage!

Ciao!
Steven


[AARCH64-4.7] Merge from upstream gcc-4_7-branch r193800

2012-11-27 Thread Sofiane Naci
Hi,

I have just merged upstream gcc-4_7-branch into ARM/aarch64-4.7-branch, up
to r193800.
This merge didn't cause any regressions.

Thanks
Sofiane






Re: [PATCH] Fix PR55489, memory-hog bug in GCSE

2012-11-27 Thread Steven Bosscher
On Tue, Nov 27, 2012 at 5:00 PM, Paolo Bonzini wrote:
> Note that the bug is present on older branches too, but it became much
> worse sometime between 4.4 and 4.7.

Probably around the time we (i.e. you and me) taught gcse.c to handle
REG_EQUAL expressions?

> 2012-11-26  Paolo Bonzini  <>
>
> PR rtl-optimization/55489
> * gcse.c (compute_transp): Precompute a canonical version
> of XEXP (x, 0), and pass it to canon_true_dependence.
>
> Index: gcse.c
> ===
> --- gcse.c  (revisione 193848)
> +++ gcse.c  (copia locale)
> @@ -1658,7 +1658,11 @@ compute_transp (const_rtx x, int indx, sbitmap *bm
> {
>   bitmap_iterator bi;
>   unsigned bb_index;
> + rtx x_addr;
>
> + x_addr = get_addr (XEXP (x, 0));
> + x_addr = canon_rtx (x_addr);
> +
>   /* First handle all the blocks with calls.  We don't need to
>  do any list walking for them.  */
>   EXECUTE_IF_SET_IN_BITMAP (blocks_with_calls, 0, bb_index, bi)
> @@ -1683,7 +1687,7 @@
> rtx dest_addr = pair->dest_addr;
>
> if (canon_true_dependence (dest, GET_MODE (dest),
> -  dest_addr, x, NULL_RTX))
> +  dest_addr, x, x_addr))
>   RESET_BIT (bmap[bb_index], indx);
>   }
>   }

I can't approve it, but this looks OK to me.

Maybe properly re-indent this block of code while at it?

Ciao!
Steven


[patch] Fix gnat.dg regressions and sizetype change fallouts

2012-11-27 Thread Eric Botcazou
Hi,

this fixes the couple of gnat.dg regressions on all platforms introduced by 
the sizetype changes, as well as finishes the conversion of the Ada compiler 
to the new scheme by invoking valid_constant_size_p and size_binop in the 
appropriate places.

The 2 regressions were introduced by the kludge added to layout_type to deal 
with overflowed zeroes.  As such, the kludge is wrong, since something like:

  type Arr is array(Long_Integer) of Boolean;

in Ada already yields a really overflowed zero.  So the patch adds a kludge to 
the kludge, with the appropriate ??? comment this time.

Tested on i586-suse-linux and x86-64-suse-linux, OK for the mainline?


2012-11-27  Eric Botcazou  

* stor-layout.c (layout_type) : Do not clear TREE_OVERFLOW
on overflowed zeroes, except in one specific case.
ada/
* gcc-interface/decl.c (gnat_to_gnu_entity) : Use
valid_constant_size_p to detect too large objects.
: Likewise for too large return types.
(allocatable_size_p): Call valid_constant_size_p in the fixed case.
(annotate_value) : Adjust to sizetype changes.
* gcc-interface/trans.c (gnat_to_gnu) : Use
valid_constant_size_p to detect too large objects on the LHS.
* gcc-interface/misc.c (default_pass_by_ref): Likewise for large types.
And use TYPE_SIZE_UNIT throughout.
(must_pass_by_ref): Likewise.
* gcc-interface/utils.c (max_size) : Split from common case.
: Likewise.  Call size_binop instead of fold_build2.
: Simplify.
* gcc-interface/utils2.c (build_allocator): Use valid_constant_size_p
to detect too large allocations.


2012-11-27  Eric Botcazou  

* gnat.dg/object_overflow.adb: Rename to...
* gnat.dg/object_overflow1.adb: ...this.
* gnat.dg/object_overflow2.adb: New test.
* gnat.dg/object_overflow3.adb: Likewise.
* gnat.dg/object_overflow4.adb: Likewise.


-- 
Eric BotcazouIndex: stor-layout.c
===
--- stor-layout.c	(revision 193837)
+++ stor-layout.c	(working copy)
@@ -2233,12 +2233,12 @@ layout_type (tree type)
 	  size_binop (MINUS_EXPR, ub, lb)));
 	  }
 
-	/* If we arrived at a length of zero ignore any overflow
-	   that occurred as part of the calculation.  There exists
-	   an association of the plus one where that overflow would
-	   not happen.  */
+	/* ??? We have no way to distinguish a null-sized array from an
+	   array spanning the whole sizetype range, so we arbitrarily
+	   decide that [0, -1] is the only valid representation.  */
 	if (integer_zerop (length)
-		&& TREE_OVERFLOW (length))
+	&& TREE_OVERFLOW (length)
+		&& integer_zerop (lb))
 	  length = size_zero_node;
 
 	TYPE_SIZE (type) = size_binop (MULT_EXPR, element_size,
Index: ada/gcc-interface/utils.c
===
--- ada/gcc-interface/utils.c	(revision 193837)
+++ ada/gcc-interface/utils.c	(working copy)
@@ -3024,59 +3024,67 @@ max_size (tree exp, bool max_p)
   return max_p ? size_one_node : size_zero_node;
 
 case tcc_unary:
+  if (code == NON_LVALUE_EXPR)
+	return max_size (TREE_OPERAND (exp, 0), max_p);
+ 
+  return fold_build1 (code, type,
+			  max_size (TREE_OPERAND (exp, 0),
+code == NEGATE_EXPR ? !max_p : max_p));
+
 case tcc_binary:
+  {
+	tree lhs = max_size (TREE_OPERAND (exp, 0), max_p);
+	tree rhs = max_size (TREE_OPERAND (exp, 1),
+			 code == MINUS_EXPR ? !max_p : max_p);
+
+	/* Special-case wanting the maximum value of a MIN_EXPR.
+	   In that case, if one side overflows, return the other.  */
+	if (max_p && code == MIN_EXPR)
+	  {
+	if (TREE_CODE (rhs) == INTEGER_CST && TREE_OVERFLOW (rhs))
+	  return lhs;
+
+	if (TREE_CODE (lhs) == INTEGER_CST && TREE_OVERFLOW (lhs))
+	  return rhs;
+	  }
+
+	/* Likewise, handle a MINUS_EXPR or PLUS_EXPR with the LHS
+	   overflowing and the RHS a variable.  */
+	if ((code == MINUS_EXPR || code == PLUS_EXPR)
+	&& TREE_CODE (lhs) == INTEGER_CST
+	&& TREE_OVERFLOW (lhs)
+	&& !TREE_CONSTANT (rhs))
+	  return lhs;
+
+	return size_binop (code, lhs, rhs);
+  }
+
 case tcc_expression:
   switch (TREE_CODE_LENGTH (code))
 	{
 	case 1:
 	  if (code == SAVE_EXPR)
 	return exp;
-	  else if (code == NON_LVALUE_EXPR)
-	return max_size (TREE_OPERAND (exp, 0), max_p);
-	  else
-	return
-	  fold_build1 (code, type,
-			   max_size (TREE_OPERAND (exp, 0),
- code == NEGATE_EXPR ? !max_p : max_p));
+
+	  return fold_build1 (code, type,
+			  max_size (TREE_OPERAND (exp, 0), max_p));
 
 	case 2:
 	  if (code == COMPOUND_EXPR)
 	return max_size (TREE_OPERAND (exp, 1), max_p);
 
-	  {
-	tree lhs = max_size (TREE_OPERAND (exp, 0), max_p);
-	tree rhs = max_size (TREE_OPERAND (exp, 1),
- code == MINUS_EXPR ? !max_p : max_p);
-
-	/* Spec

[patch][google/gcc-4_7] Extend expiration date for pr54127 and Google ref b/6983319 (issue6858082)

2012-11-27 Thread Simon Baldwin
Extend expiration date for pr54127 and Google ref b/6983319.

Okay for google/gcc-4_7 branch?  Thanks.


ChangeLog.google-4_7
2012-11-27  Simon Baldwin  

* contrib/testsuite-management/powerpc-grtev3-linux-gnu.xfail:
Extend expiration date for pr54127 and Google ref b/6983319.


Index: contrib/testsuite-management/powerpc-grtev3-linux-gnu.xfail
===
--- contrib/testsuite-management/powerpc-grtev3-linux-gnu.xfail (revision 
193854)
+++ contrib/testsuite-management/powerpc-grtev3-linux-gnu.xfail (working copy)
@@ -1,9 +1,9 @@
 # Temporarily ignore gcc pr54127.
-expire=20121231 | FAIL: gcc.dg/torture/pr53589.c -O3 -g  (test for excess 
errors)
-expire=20121231 | FAIL: gcc.dg/torture/pr53589.c  -O3 -g  (internal compiler 
error)
+expire=20151231 | FAIL: gcc.dg/torture/pr53589.c -O3 -g  (test for excess 
errors)
+expire=20151231 | FAIL: gcc.dg/torture/pr53589.c  -O3 -g  (internal compiler 
error)
 # Temporarily ignore Google ref b/6983319.
-expire=20121231 | FAIL: gcc.target/powerpc/regnames-1.c (test for excess 
errors)
-expire=20121231 | FAIL: gcc.target/powerpc/regnames-1.c (internal compiler 
error)
+expire=20151231 | FAIL: gcc.target/powerpc/regnames-1.c (test for excess 
errors)
+expire=20151231 | FAIL: gcc.target/powerpc/regnames-1.c (internal compiler 
error)
 
 FAIL: gfortran.dg/bessel_6.f90  -O0  execution test
 FAIL: gfortran.dg/bessel_6.f90  -O1  execution test

--
This patch is available for review at http://codereview.appspot.com/6858082


Re: [patch][google/gcc-4_7] Extend expiration date for pr54127 and Google ref b/6983319 (issue6858082)

2012-11-27 Thread Diego Novillo
On Tue, Nov 27, 2012 at 11:45 AM, Simon Baldwin  wrote:
> Extend expiration date for pr54127 and Google ref b/6983319.
>
> Okay for google/gcc-4_7 branch?  Thanks.
>
>
> ChangeLog.google-4_7
> 2012-11-27  Simon Baldwin  
>
> * contrib/testsuite-management/powerpc-grtev3-linux-gnu.xfail:
> Extend expiration date for pr54127 and Google ref b/6983319.

OK.

Diego.


[Patch AArch64] Add support for vectorizable standard math patterns.

2012-11-27 Thread James Greenhalgh

Hi,

This patch adds support for vectorizing across some of the rounding
functions in the C math library to the AArch64 back-end.

In particular, we add support for vectorizing across:

ceil (), ceilf (), lceil (),
floor (), floorf (), lfloor (),
round (), roundf (),
nearbyint (), nearbyintf (),
trunc (), truncf ()

We add testcases ensuring that each of the expected functions are
vectorized. As the i386 and rs6000 backends both ostensibly support
these optimisations we add these tests to the generic testsuites, but
only wire them up for AArch64. As a target may support any subset of
these vectorizations we need a check_effective_target macro for
each of them.

Because of this change to the generic test code I've CCed Janis
Johnson and Mike Stump.

Is this patch OK to commit?

Thanks,
James

---
gcc/

2012-11-27  James Greenhalgh  

* gcc/config/aarch64/aarch64-builtins.c
(aarch64_builtin_vectorized_function): New.
* gcc/config/aarch64/aarch64-protos.h
(aarch64_builtin_vectorized_function): Declare.
* gcc/config/aarch64/aarch64-simd-builtins.def (frintz, frintp): Add.
(frintm, frinti, frintx, frinta, fcvtzs, fcvtzu): Likewise.
(fcvtas, fcvtau, fcvtps, fcvtpu, fcvtms, fcvtmu): Likewise.
* gcc/config/aarch64/aarch64-simd.md
(aarch64_frint_): New.
(2): Likewise.
(aarch64_fcvt): Likewise.
(l2): Likewise.
* gcc/config/aarch64/aarch64.c (TARGET_VECTORIZE_BUILTINS): Define.
(TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION): Likewise.
* gcc/config/aarch64/aarch64.md
(btrunc2, ceil2, floor2)
(round2, rint2, nearbyint2): Consolidate as...
(2): ...this.
(lceil2, lfloor2)
(lround2)
(lrint2): Consolidate as...
(l2): ... this.
* gcc/config/aarch64/iterators.md (fcvt_target): New.
(FCVT_TARGET): Likewise.
(FRINT): Likewise.
(FCVT): Likewise.
(frint_pattern): Likewise.
(frint_suffix): Likewise.
(fcvt_pattern): Likewise.

gcc/testsuite/

2012-11-27  James Greenhalgh  

* gcc/testsuite/gcc.dg/vect/vect-rounding-btrunc.c: New test.
* gcc/testsuite/gcc.dg/vect/vect-rounding-btruncf.c: Likewise.
* gcc/testsuite/gcc.dg/vect/vect-rounding-ceil.c: Likewise.
* gcc/testsuite/gcc.dg/vect/vect-rounding-ceilf.c: Likewise.
* gcc/testsuite/gcc.dg/vect/vect-rounding-floor.c: Likewise.
* gcc/testsuite/gcc.dg/vect/vect-rounding-floorf.c: Likewise.
* gcc/testsuite/gcc.dg/vect/vect-rounding-lceil.c: Likewise.
* gcc/testsuite/gcc.dg/vect/vect-rounding-lfloor.c: Likewise.
* gcc/testsuite/gcc.dg/vect/vect-rounding-nearbyint.c: Likewise.
* gcc/testsuite/gcc.dg/vect/vect-rounding-nearbyintf.c: Likewise.
* gcc/testsuite/gcc.dg/vect/vect-rounding-round.c: Likewise.
* gcc/testsuite/gcc.dg/vect/vect-rounding-roundf.c: Likewise.
* gcc/testsuite/lib/target-supports.exp
(check_effective_target_vect_call_btrunc): New.
(check_effective_target_vect_call_btruncf): Likewise.
(check_effective_target_vect_call_ceil): Likewise.
(check_effective_target_vect_call_ceilf): Likewise.
(check_effective_target_vect_call_floor): Likewise.
(check_effective_target_vect_call_floorf): Likewise.
(check_effective_target_vect_call_lceil): Likewise.
(check_effective_target_vect_call_lfloor): Likewise.
(check_effective_target_vect_call_nearbyint): Likewise.
(check_effective_target_vect_call_nearbyintf): Likewise.
(check_effective_target_vect_call_round): Likewise.
(check_effective_target_vect_call_roundf): Likewise.
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 2cdda0f..a683afd 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -1221,3 +1221,70 @@ aarch64_expand_builtin (tree exp,
 
   return NULL_RTX;
 }
+
+tree
+aarch64_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
+{
+  enum machine_mode in_mode, out_mode;
+  int in_n, out_n;
+
+  if (TREE_CODE (type_out) != VECTOR_TYPE
+  || TREE_CODE (type_in) != VECTOR_TYPE)
+return NULL_TREE;
+
+  out_mode = TYPE_MODE (TREE_TYPE (type_out));
+  out_n = TYPE_VECTOR_SUBPARTS (type_out);
+  in_mode = TYPE_MODE (TREE_TYPE (type_in));
+  in_n = TYPE_VECTOR_SUBPARTS (type_in);
+
+#undef AARCH64_CHECK_BUILTIN_MODE
+#define AARCH64_CHECK_BUILTIN_MODE(C, N) 1
+#define AARCH64_FIND_FRINT_VARIANT(N) \
+  (AARCH64_CHECK_BUILTIN_MODE (2, D) \
+? aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_##N##v2df] \
+: (AARCH64_CHECK_BUILTIN_MODE (4, S) \
+	? aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_##N##v4sf] \
+	: (AARCH64_CHECK_BUILTIN_MODE (2, S) \
+	   ? aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_##N##v2sf] \
+	   : NULL_TREE)))
+  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
+{
+  enum

Re: [PATCH] Fix PR55489, memory-hog bug in GCSE

2012-11-27 Thread Paolo Bonzini
Il 27/11/2012 17:40, Steven Bosscher ha scritto:
> > Note that the bug is present on older branches too, but it became much
> > worse sometime between 4.4 and 4.7.
> 
> Probably around the time we (i.e. you and me) taught gcse.c to handle
> REG_EQUAL expressions?

Hmm, no, that was much earlier.  Like 4.2 or 4.3, roughly the same time
when fwprop went in.

Paolo


Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)

2012-11-27 Thread Eric Botcazou
> This note seems very very weird.  For one thing, it becomes invalid on
> the very instruction where it is created.  I would say that it should
> not be there.

Agreed.

-- 
Eric Botcazou


Re: bit_field_ref of constructor of vectors

2012-11-27 Thread Marc Glisse

On Tue, 27 Nov 2012, Marc Glisse wrote:


On Tue, 27 Nov 2012, Richard Biener wrote:


It passes bootstrap+testsuite.


... on x86_64-linux-gnu, sorry for the lack of precision (default languages, 
Xeon E5520, graphite enabled).

[...]

Boostrapped and tested on ... ?

Ok if bootstrap / testing passes.


Thanks, I'll retest it to make sure it still works.


Duh, it uses VEC, of course it doesn't work anymore...
Here is what I tested instead (I copied the 2 vec lines from somewhere 
else in the file). I'll probably commit it later today.



2012-11-12  Marc Glisse  

* fold-const.c (fold_ternary_loc) [BIT_FIELD_REF]: Handle
CONSTRUCTOR with vector elements.
* tree-ssa-propagate.c (valid_gimple_rhs_p): Handle CONSTRUCTOR
and BIT_FIELD_REF.


--
Marc GlisseIndex: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 193847)
+++ gcc/fold-const.c(working copy)
@@ -14074,66 +14074,76 @@ fold_ternary_loc (location_t loc, enum t
  unsigned HOST_WIDE_INT n = tree_low_cst (arg1, 1);
  unsigned HOST_WIDE_INT idx = tree_low_cst (op2, 1);
 
  if (n != 0
  && (idx % width) == 0
  && (n % width) == 0
  && ((idx + n) / width) <= TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))
{
  idx = idx / width;
  n = n / width;
- if (TREE_CODE (type) == VECTOR_TYPE)
+
+ if (TREE_CODE (arg0) == VECTOR_CST)
{
- if (TREE_CODE (arg0) == VECTOR_CST)
-   {
- tree *vals = XALLOCAVEC (tree, n);
- unsigned i;
- for (i = 0; i < n; ++i)
-   vals[i] = VECTOR_CST_ELT (arg0, idx + i);
- return build_vector (type, vals);
-   }
- else
-   {
- vec *vals;
- unsigned i;
- if (CONSTRUCTOR_NELTS (arg0) == 0)
-   return build_constructor (type,
- NULL);
- if (TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT (arg0,
-0)->value))
- != VECTOR_TYPE)
-   {
- vec_alloc (vals, n);
- for (i = 0;
-  i < n && idx + i < CONSTRUCTOR_NELTS (arg0);
-  ++i)
-   CONSTRUCTOR_APPEND_ELT (vals, NULL_TREE,
-   CONSTRUCTOR_ELT
- (arg0, idx + i)->value);
- return build_constructor (type, vals);
-   }
-   }
+ if (n == 1)
+   return VECTOR_CST_ELT (arg0, idx);
+
+ tree *vals = XALLOCAVEC (tree, n);
+ for (unsigned i = 0; i < n; ++i)
+   vals[i] = VECTOR_CST_ELT (arg0, idx + i);
+ return build_vector (type, vals);
}
- else if (n == 1)
+
+ /* Constructor elements can be subvectors.  */
+ unsigned HOST_WIDE_INT k = 1;
+ if (CONSTRUCTOR_NELTS (arg0) != 0)
{
- if (TREE_CODE (arg0) == VECTOR_CST)
-   return VECTOR_CST_ELT (arg0, idx);
- else if (CONSTRUCTOR_NELTS (arg0) == 0)
-   return build_zero_cst (type);
- else if (TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT (arg0,
- 0)->value))
-  != VECTOR_TYPE)
+ tree cons_elem = TREE_TYPE (CONSTRUCTOR_ELT (arg0, 0)->value);
+ if (TREE_CODE (cons_elem) == VECTOR_TYPE)
+   k = TYPE_VECTOR_SUBPARTS (cons_elem);
+   }
+
+ /* We keep an exact subset of the constructor elements.  */
+ if ((idx % k) == 0 && (n % k) == 0)
+   {
+ if (CONSTRUCTOR_NELTS (arg0) == 0)
+   return build_constructor (type, NULL);
+ idx /= k;
+ n /= k;
+ if (n == 1)
{
  if (idx < CONSTRUCTOR_NELTS (arg0))
return CONSTRUCTOR_ELT (arg0, idx)->value;
  return build_zero_cst (type);
}
+
+ vec *vals;
+ vec_alloc (vals, n);
+ for (unsigned i = 0;
+  i < n && idx + i < CONSTRUCTOR_NELTS (arg0);
+  ++i)
+   CONSTRUCTOR_APPEND_ELT (vals, NULL_TREE,
+   CONSTRUCTOR_ELT
+ (ar

Re: [0/8] Add optabs alternatives for insv, extv and extzv

2012-11-27 Thread Ramana Radhakrishnan



Tested on x86_64-linux-gnu, powerpc64-linux-gnu and mipsisa64-elf.
Also tested by making sure that there were no changes in assembly
output for a set of gcc .ii files.  On the other hand, the -march=octeon
output for a set of mips64-linux-gnu gcc .ii files showed the optimisation
kicking in as hoped.


This sequence of patches caused a regression in 
gcc.target/arm/volatile-bitfields-1.c . I haven't reviewed the patch 
stack in great detail yet, but it looks like this sequence of patches 
doesn't take the ARM volatile bitfields into account. (193600 is fine, 
193606 is not).



regards,
Ramana



Richard







Re: RFA: Enable both ld and gold in GCC 4.8

2012-11-27 Thread Paolo Bonzini
Il 27/11/2012 17:07, H.J. Lu ha scritto:
> There is no reply for
> 
> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg02121.html

That counts as lack of review...

Paolo



Re: RFA: Enable both ld and gold in GCC 4.8

2012-11-27 Thread H.J. Lu
On Tue, Nov 27, 2012 at 9:20 AM, Paolo Bonzini  wrote:
> Il 27/11/2012 17:07, H.J. Lu ha scritto:
>> There is no reply for
>>
>> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg02121.html
>
> That counts as lack of review...
>

That is no response from patch submitter.  It is different from
lack of review.


-- 
H.J.


Re: [PATCH] Fix PR35634

2012-11-27 Thread Joseph S. Myers
On Tue, 27 Nov 2012, Richard Biener wrote:

>   c-family/
>   * c-gimplify.c (c_gimplify_expr): Gimplify self-modify expressions
>   here and use a type with proper overflow behavior for types that would
>   need to be promoted for the arithmetic.

The front-end changes are OK.

> *** gcc/gimplify.c.orig   2012-11-27 14:27:24.0 +0100
> --- gcc/gimplify.c2012-11-27 15:49:46.425585857 +0100
> *** gimplify_compound_lval (tree *expr_p, gi
> *** 2319,2327 
>   WANT_VALUE is nonzero iff we want to use the value of this expression
>   in another expression.  */
>   
> ! static enum gimplify_status
>   gimplify_self_mod_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
> ! bool want_value)
>   {
> enum tree_code code;
> tree lhs, lvalue, rhs, t1;
> --- 2319,2327 
>   WANT_VALUE is nonzero iff we want to use the value of this expression
>   in another expression.  */
>   
> ! enum gimplify_status
>   gimplify_self_mod_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
> ! bool want_value, tree arith_type)

The comment needs updating to explain the new parameter.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH, RFC] Dumping expanded MD files

2012-11-27 Thread Richard Henderson
On 11/22/2012 09:48 AM, Kirill Yukhin wrote:
> +.PHONY: s-mddump
> +s-mddump: $(BUILD_RTL) $(MD_DEPS) build/genmddump$(build_exeext)
> + $(RUN_GEN) build/genmddump$(build_exeext) $(md_file) 2> tmp-mddump.md

I think just 

mddump: ...
$(RUN_GEN) ... > mddump

will be sufficient.  This is not actually used by the build at all, so we
don't need to play games with stamp files etc.

There's no need for top-level makefile changes at all.  When you want to
use this, simply cd into the gcc subdirectory.

> +/* Dump all available rtl queues.  */
> +void
> +dump_expanded_md (void)

Why?  Seems to me that you can just have genmddump.c simply use the
generic read_md_rtx interface, dumping as it goes.  You might also
consider dumping the pattern_lineno argument as a comment before the
pattern.  Otherwise it might be tricky to match up the dump pattern
with the original input file patterns.


r~


Re: [PATCH, generic] Fix for define_subst

2012-11-27 Thread Richard Henderson
On 11/22/2012 09:36 AM, Kirill Yukhin wrote:
> @@ -2668,7 +2668,7 @@ add_c_test (const char *expr, int value)
>  {
>struct c_test *test;
>  
> -  if (expr[0] == 0)
> +  if (!expr || expr[0] == 0)
>  return;

This looks like it's working around a bug elsewhere.


r~


Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver

2012-11-27 Thread Eric Botcazou
> I strongly disagree with this.  Outputs and clobbers have significant
> meaning even on volatile asms, asm volatile doesn't mean all registers and
> all memory are supposed to be considered clobbered.  For memory you have
> "memory" clobber for that, for registers it is users responsibility to
> describe exactly what changes, either in clobbers or in outputs.
> The difference between non-volatile and volatile asm is, as the
> documentation states:
> 
> The `volatile' keyword indicates that the instruction has important
> side-effects.  GCC will not delete a volatile `asm' if it is reachable.
> 
> Volatile asm acts as an optimization barrier to some extent, but it really
> can't modify registers or memory that aren't described as modified in the
> asm pattern.  The important side-effects are of some other kind than
> modifying registers or memory visible from the current function.
> Ditto for UNSPEC_VOLATILE.

Well, the last sentence would essentially void the entire argument I think.  
It's well established in the RTL middle-end that UNSPEC_VOLATILE can do pretty 
much everything behind the back of the compiler.

Now I agree that the discussion exists for volatile asms.  But you have for 
example in the unmodified cse.c:

  /* A volatile ASM invalidates everything.  */
  if (NONJUMP_INSN_P (insn)
  && GET_CODE (PATTERN (insn)) == ASM_OPERANDS
  && MEM_VOLATILE_P (PATTERN (insn)))
flush_hash_table ();

and the comment of volatile_insn_p is rather explicit:

/* Nonzero if X contains any volatile instructions.  These are instructions
   which may cause unpredictable machine state instructions, and thus no
   instructions should be moved or combined across them.  This includes
   only volatile asms and UNSPEC_VOLATILE instructions.  */

The problem is that the various RTL passes don't have a consistent view on 
that so the patch attempts to tidy this up in a conservative manner.

I think that a compromise could be to say that volatile asms without outputs 
are full barriers (like UNSPEC_VOLATILE) but other volatile asms aren't.
That's what the unmodified cse.c, cselib.c and dse.c currently implement.
But implementing it consistently would mean weakening volatile_insn_p.

> So, at least from var-tracking POV which doesn't attempt to perform any
> optimizations across any insn, just tries to track where values live, IMHO a
> volatile asm acts exactly the same as non-volatile, that is why I'm testing
> following patch right now.
> 
> But the question is also what effects your patch can have e.g. on RTL DSE.

It will make all volatile asms be treated as volatile asms without outputs.

-- 
Eric Botcazou


Re: [PATCH] Fix PR55489, memory-hog bug in GCSE

2012-11-27 Thread Eric Botcazou
> Bootstrap finished on x86_64-pc-linux-gnu, regtest in progress; ok for
> 4.7 and trunk if it passes?
> 
> Paolo
> 
> 2012-11-26  Paolo Bonzini  
> 
>   PR rtl-optimization/55489
>   * gcse.c (compute_transp): Precompute a canonical version
>   of XEXP (x, 0), and pass it to canon_true_dependence.

OK, thanks.

-- 
Eric Botcazou


Re: [PATCH ARM] Disable "-fira-hoist-pressure" on Thumb2

2012-11-27 Thread Richard Earnshaw

On 27/11/12 10:10, Bin Cheng wrote:

Hi,
I committed the patch implementing register pressure directed hoist in TRUNK
and enabled the option "-fira-hoist-pressure" for all targets by reviewer's
suggestion, although it has no obvious effect on Thumb2/x86/x86_64. Now I
collected performance data with Os and found it causes performance
regression in some cases on Thumb2, so this patch disables the option on
Thumb2 and modify the corresponding test cases.

Tested on arm-none-eabi/M3, is it OK?

Thanks.


2012-11-21  Bin Cheng  

* config/arm/arm.c (arm_option_override): Disable option
-fira-hoist-pressure on Thumb2.

2012-11-21  Bin Cheng  

* gcc.dg/hoist-register-pressure-1.c: Skip on ARM Thumb2.
* gcc.dg/hoist-register-pressure-2.c: Ditto.
* gcc.dg/hoist-register-pressure-3.c: Ditto.







No patch.

R.



Re: Tighten checking of vector comparisons

2012-11-27 Thread Marc Glisse

On Mon, 26 Nov 2012, Richard Biener wrote:


Thus you are rejecting a boolean valued vector comparison which we
otherwise happily accept.  I suppose that makes sense (even though
at least equality compares can make sense).


I agree that boolean equality comparison of vectors makes sense, but I 
don't think the code is ready to properly expand it. I guess we'll have to 
revisit this at some point.



Thus that hunk is ok as well.


Done, thanks.

--
Marc Glisse


Re: [PATCH] gcc-{ar,nm,ranlib}: Find binutils binaries relative to self

2012-11-27 Thread Meador Inge
On 11/26/2012 09:05 AM, Richard Biener wrote:

> On Wed, Nov 7, 2012 at 10:51 PM, Meador Inge  wrote:
>> Ping ^ 4.
> 
> Ok.

Thanks for the review.  Committed to trunk.

-- 
Meador Inge
CodeSourcery / Mentor Embedded
http://www.mentor.com/embedded-software


Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver

2012-11-27 Thread Jakub Jelinek
On Tue, Nov 27, 2012 at 06:35:52PM +0100, Eric Botcazou wrote:
> Now I agree that the discussion exists for volatile asms.  But you have for 
> example in the unmodified cse.c:
> 
>   /* A volatile ASM invalidates everything.  */
>   if (NONJUMP_INSN_P (insn)
>   && GET_CODE (PATTERN (insn)) == ASM_OPERANDS
>   && MEM_VOLATILE_P (PATTERN (insn)))
> flush_hash_table ();

That was weird indeed, because
  asm volatile ("" : : "r" (x));
flushed (for targets that don't add any implicit clobbers) but e.g.
  asm volatile ("" : : "r" (x) : "r23");
wouldn't (then the pattern is a parallel with ASM_OPERANDS and some
CLOBBERs).  The effect of that is that it never triggered on i?86/x86_64,
because those have the implicit fprs/flags clobbers.

I was mainly arguing with the sentence that for asm volatile, the output
constraints (did you mean outputs and clobbers?) have essentially no
meaning.  While for some optimizations perhaps it might be desirable
to treat asm volatile as full optimization barrier, I'm not sure about all
of them (i.e. it would be important to look for performance degradations
caused by that change), and for var-tracking I'd argue that asm vs. asm
volatile is completely unimportant, if the asm volatile pattern (or even
UNSPEC_VOLATILE) doesn't say it clobbers or sets hard register XYZ, then it
can't change it (well, it could but is responsible of restoring it),
similarly for memory, if it doesn't clobber "memory" and doesn't set some
memory, it can't do that.

So, do you object even to the var-tracking change (which would mean if
var-tracking found that say some variable's value lives in hard register 12, 
when
encountering asm volatile it would mean to forget about that even when
hard register 12 isn't clobbered by the asm, nor set)?  For now var-tracking
seems to be the most important issue, as we generate invalid debug info
there (latent before, but never hit on inline asm on x86_64/i686 except
on setjmp calls).

As for other passes, the r193802 change e.g. regresses slightly modified
pr44194-1.c testcase on x86_64,
struct ints { int a, b, c; } foo();
void bar(int a, int b);

void func() {
  struct ints s = foo();
  int x;
  asm volatile ("" : "=r" (x));
  bar(s.a, s.b);
}

 func:
 .LFB0:
-   xorl%eax, %eax
subq$24, %rsp
 .LCFI0:
+   xorl%eax, %eax
callfoo
-   movq%rax, %rcx
-   shrq$32, %rcx
-   movq%rcx, %rsi
-   movl%eax, %edi
+   movq%rax, (%rsp)
+   movl%edx, 8(%rsp)
+   movl4(%rsp), %esi
+   movl(%rsp), %edi
addq$24, %rsp
 .LCFI1:
jmp bar

To me it looks like an unnecessary pessimization, the volatile there
I understand that just it doesn't want to be moved around (e.g. scheduled
before the foo call or after bar), that it can't be DCEd, but as it doesn't
mention it modifies memory, I don't understand why it should force some
registers to stack and back when it has no way to know the compiler would be
emitting anything like that at all.
Compare that to
  asm volatile ("" : "=r" (x) : : "memory");
in the testcase, where the r193802 commit makes no difference, the
stores/loads are done in both cases.

For CSE, I'd agree it should treat asm volatile as barriers, so for cselib.c
we probably need some additional cselib_init flag how we want to treat
volatile asms...  For UNSPEC_VOLATILE, perhaps even DSE should stay
conservative, but for var-tracking.c I still don't see a reason for that.

Jakub


[committed] Fix bootstrap with --enable-gather-detailed-mem-stats

2012-11-27 Thread Diego Novillo
This patch restores bootstrap when detailed memory stats are enabled.
No functional changes.

Paolo, could you try again with r193864?

Tested on x86_64.  Committed to trunk.

2012-11-27  Diego Novillo  

* vec.h: Replace 'class vec' with 'struct vec' everywhere.
(ggc_internal_cleared_alloc_stat): Remove.
(va_gc::reserve): Add PASS_MEM_STAT to ggc_realloc_stat call.
(va_stack::reserve): Add PASS_MEM_STAT to va_heap::reserve call.
(vec::copy): Replace ALONE_MEM_STAT_DECL with
ALONE_CXX_MEM_STAT_INFO.
(vec_safe_reserve): Replace MEM_STAT_DECL with CXX_MEM_STAT_INFO.
(vec_safe_reserve_exact): Likewise.
(vec_alloc): Likewise.
(vec_safe_grow): Likewise.
(vec_safe_grow_cleared): Likewise.
(vec_safe_push): Likewise.
(vec_safe_insert): Likewise.
(vec_safe_splice): Likewise.
(vec_alloc): Likewise.
(vec_check_alloc): Likewise.
---
 gcc/ChangeLog |   19 +++
 gcc/vec.h |   57 +
 2 files changed, 48 insertions(+), 28 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index da5ccea..be00e64 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,22 @@
+2012-11-27  Diego Novillo  
+
+   * vec.h: Replace 'class vec' with 'struct vec' everywhere.
+   (ggc_internal_cleared_alloc_stat): Remove.
+   (va_gc::reserve): Add PASS_MEM_STAT to ggc_realloc_stat call.
+   (va_stack::reserve): Add PASS_MEM_STAT to va_heap::reserve call.
+   (vec::copy): Replace ALONE_MEM_STAT_DECL with
+   ALONE_CXX_MEM_STAT_INFO.
+   (vec_safe_reserve): Replace MEM_STAT_DECL with CXX_MEM_STAT_INFO.
+   (vec_safe_reserve_exact): Likewise.
+   (vec_alloc): Likewise.
+   (vec_safe_grow): Likewise.
+   (vec_safe_grow_cleared): Likewise.
+   (vec_safe_push): Likewise.
+   (vec_safe_insert): Likewise.
+   (vec_safe_splice): Likewise.
+   (vec_alloc): Likewise.
+   (vec_check_alloc): Likewise.
+
 2012-11-27  Dehao Chen  
 
* cfgrtl.c (rtl_merge_blocks): Check with UNKNOWN_LOCATION correctly.
diff --git a/gcc/vec.h b/gcc/vec.h
index bfc1811..a7b3b4c 100644
--- a/gcc/vec.h
+++ b/gcc/vec.h
@@ -51,8 +51,6 @@ along with GCC; see the file COPYING3.  If not see
  has not been included.  Sigh.  */
   extern void ggc_free (void *);
   extern size_t ggc_round_alloc_size (size_t requested_size);
-  extern void *ggc_internal_cleared_alloc_stat (size_t MEM_STAT_DECL)
-ATTRIBUTE_MALLOC;
   extern void *ggc_realloc_stat (void *, size_t MEM_STAT_DECL);
 #  endif  // GCC_GGC_H
 #endif // VEC_GC_ENABLED
@@ -230,7 +228,7 @@ struct vec_prefix
 
  To compensate, we make vec_prefix a field inside vec and make
  vec a friend class of vec_prefix so it can access its fields.  */
-  template  friend class vec;
+  template  friend struct vec;
 
   /* The allocator types also need access to our internals.  */
   friend struct va_gc;
@@ -242,7 +240,7 @@ struct vec_prefix
   unsigned num_;
 };
 
-template class vec;
+template struct vec;
 
 /* Valid vector layouts
 
@@ -367,7 +365,7 @@ va_gc::reserve (vec *&v, unsigned reserve, 
bool exact
   size_t size = vec::embedded_size (alloc);
 
   /* Ask the allocator how much space it will really give us.  */
-  size = ggc_round_alloc_size (size);
+  size = ::ggc_round_alloc_size (size);
 
   /* Adjust the number of slots accordingly.  */
   size_t vec_offset = sizeof (vec_prefix);
@@ -378,7 +376,8 @@ va_gc::reserve (vec *&v, unsigned reserve, 
bool exact
   size = vec_offset + alloc * elt_size;
 
   unsigned nelem = v ? v->length () : 0;
-  v = static_cast  *> (ggc_realloc_stat (v, size));
+  v = static_cast  *> (::ggc_realloc_stat (v, size
+  PASS_MEM_STAT));
   v->embedded_init (alloc, nelem);
 }
 
@@ -447,7 +446,7 @@ va_stack::reserve (vec *&v, unsigned 
nelems, bool exact
 {
   /* V is already on the heap.  */
   va_heap::reserve (reinterpret_cast *&> (v),
-   nelems, exact);
+   nelems, exact PASS_MEM_STAT);
   return;
 }
 
@@ -456,7 +455,7 @@ va_stack::reserve (vec *&v, unsigned 
nelems, bool exact
   vec *oldvec = v;
   v = NULL;
   va_heap::reserve (reinterpret_cast *&>(v), nelems,
-   exact);
+   exact PASS_MEM_STAT);
   if (v && oldvec)
 {
   v->vecpfx_.num_ = oldvec->length ();
@@ -506,7 +505,7 @@ va_stack::release (vec *&v)
 template
-class GTY((user)) vec
+struct GTY((user)) vec
 {
 };
 
@@ -549,7 +548,7 @@ extern vnull vNULL;
- It requires 2 words of storage (prior to vector allocation).  */
 
 template
-class GTY((user)) vec
+struct GTY((user)) vec
 {
 public:
   unsigned allocated (void) const { return vecpfx_.alloc_; }
@@ -563,7 +562,7 @@ public:
   bool space (unsigned) const;
   bool iterate (unsigned, T *) const;
   bool iterate (unsigned, T **) const;
-  vec *copy

[patch][google/gcc-4_7] Permanently ignore pr54127 and Google ref b/6983319 (issue6856104)

2012-11-27 Thread Simon Baldwin
Permanently ignore pr54127 and Google ref b/6983319.

Okay for google/gcc-4_7 branch?  Thanks.


ChangeLog.google-4_7
2012-11-27  Simon Baldwin  

* contrib/testsuite-management/powerpc-grtev3-linux-gnu.xfail:
Permanently ignore pr54127 and Google ref b/6983319.


Index: contrib/testsuite-management/powerpc-grtev3-linux-gnu.xfail
===
--- contrib/testsuite-management/powerpc-grtev3-linux-gnu.xfail (revision 
193860)
+++ contrib/testsuite-management/powerpc-grtev3-linux-gnu.xfail (working copy)
@@ -1,9 +1,9 @@
-# Temporarily ignore gcc pr54127.
-expire=20151231 | FAIL: gcc.dg/torture/pr53589.c -O3 -g  (test for excess 
errors)
-expire=20151231 | FAIL: gcc.dg/torture/pr53589.c  -O3 -g  (internal compiler 
error)
-# Temporarily ignore Google ref b/6983319.
-expire=20151231 | FAIL: gcc.target/powerpc/regnames-1.c (test for excess 
errors)
-expire=20151231 | FAIL: gcc.target/powerpc/regnames-1.c (internal compiler 
error)
+# Ignore gcc pr54127.
+FAIL: gcc.dg/torture/pr53589.c -O3 -g  (test for excess errors)
+FAIL: gcc.dg/torture/pr53589.c  -O3 -g  (internal compiler error)
+# Ignore Google ref b/6983319.
+FAIL: gcc.target/powerpc/regnames-1.c (test for excess errors)
+FAIL: gcc.target/powerpc/regnames-1.c (internal compiler error)
 
 FAIL: gfortran.dg/bessel_6.f90  -O0  execution test
 FAIL: gfortran.dg/bessel_6.f90  -O1  execution test

--
This patch is available for review at http://codereview.appspot.com/6856104


Re: [patch][google/gcc-4_7] Permanently ignore pr54127 and Google ref b/6983319 (issue6856104)

2012-11-27 Thread Diego Novillo
On Tue, Nov 27, 2012 at 1:22 PM, Simon Baldwin  wrote:

> 2012-11-27  Simon Baldwin  
>
> * contrib/testsuite-management/powerpc-grtev3-linux-gnu.xfail:
> Permanently ignore pr54127 and Google ref b/6983319.

Isn't giving up the sweetest of choices?

OK.


Diego.


[Patch, Fortran] No-op Patch - a.k.a. FINAL wrapper update

2012-11-27 Thread Tobias Burnus

Dear all,

effectively, this patch doesn't do anything. Except, it updates the – 
deactivated – finalization wrapper.



Note: This patch does not include any code to actually call the 
finalization wrapper. Nor is the modified code ever called in gfortran. 
However, that patch paves the road to a proper finalization (and 
polymorphic deallocation) support. When I mention below that I tested 
the patch: That was with the larger but incomplete 
final-2012-11-27-v2.diff patch, available at 
https://userpage.physik.fu-berlin.de/~tburnus/final/ Note that the patch 
there has known issues and does not incorporate all of Janus changes.



Changes relative to the trunk:

* Properly handles coarray components: Those may not be finalized for 
intrinsic assignment; with this patch there is now a generated "IF" 
condition to ensure this in the wrapper.


* While arrays arguments to the wrapper have to be contiguous, the new 
version takes a "stride" argument which allows noncontiguity in the 
lowest dimension. That is: One can pass a contiguous array directly to 
the parent's finalizer even if it then isn't anymore contiguous (for the 
parent type). If the finalizers are all elemental (or scalar), no 
copy-in/copy-out is needed. However, if it is passed to an array final 
subroutine, the array is packed using the following code:


if (stride == STORAGE_SIZE (array)/NUMERIC_STORAGE_SIZE
|| 0 == STORAGE_SIZE (array)) then
call final_rank3 (array)
else
block
type(t) :: tmp(shape (array))

do i = 0, size (array)-1
addr = transfer (c_loc (array), addr) + i * stride
call c_f_pointer (transfer (addr, cptr), ptr)

addr = transfer (c_loc (tmp), addr)
+ i * STORAGE_SIZE (array)/NUMERIC_STORAGE_SIZE
call c_f_pointer (transfer (addr, cptr), ptr2)
ptr2 = ptr
end do
call final_rank3 (tmp)
end block
end if


Build and regtested on x86-64-gnu-linux.
OK for the trunk?

Tobias

PS: I don't know when I will have time to continue working on the patch. 
The next steps from my side are: First, submit some smaller bits from 
the final-2012-11-27-v2.diff patch, even if they will be unused. 
Secondly, do some cleanup and fix a few issues and merge Janus' patch. 
(My patch is based on the 2012-10-26 version of the patch, Janus' latest 
patch was 2012-11-04.) At that point, one might consider enabling the 
FINAL feature partially (e.g. only polymorphic deallocation by not 
allowing FINAL) or fully.


PPS: The patch was successfully tested with the following test case (and 
some small variations of it):


module m
type t
integer :: i
contains
final :: fini
end type t
type, extends(t) :: t2
integer :: j
contains
final :: fini2
end type t2
contains
subroutine fini(x)
! type(t), intent(in) :: x(:,:)
type(t), intent(inout) :: x(:,:)
print *, 'SHAPE:', shape(x)
print *, x
end subroutine fini
impure elemental subroutine fini2(x)
type(t2), intent(inout) :: x
print *, 'FINI2 - elemental: ', x%i
x%i = x%i+10*x%i
end subroutine fini2
end module m

use m
class(t2), allocatable :: x(:,:)
allocate(t2 :: x(2,3))
x(:,:)%i = reshape([1,2,3,4,5,6],[2,3])
print *, 'HELLO: ', x%i
deallocate(x)
end
2012-11-27  Tobias Burnus  

	PR fortran/37336
	* class.c (find_derived_vtab): New static function.
	(gfc_get_derived_vtab): Renamed from gfc_find_derived_vtab.
	(gfc_find_derived_vtab): New function.
	(gfc_class_null_initializer, get_unique_hashed_string,
	gfc_build_class_symbol, copy_vtab_proc_comps,
	): Use gfc_get_derived_vtab instead
	of gfc_find_derived_vtab.
	(finalizer_insert_packed_call): New static function.
	(finalize_component, generate_finalization_wrapper):
	Fix coarray handling and packing.
	* gfortran.h (gfc_get_derived_vtab): New prototype.
	* check.c (gfc_check_move_alloc): Use it.
	* expr.c (gfc_check_pointer_assign): Ditto.
	* interface.c (compare_parameter): Ditto.
	* iresolve.c (gfc_resolve_extends_type_of): Ditto.
	* trans-decl.c (gfc_get_symbol_decl): Ditto.
	* trans-expr.c (gfc_conv_derived_to_class,
	gfc_trans_class_assign): Ditto.
	* trans-intrinsic.c (conv_intrinsic_move_alloc): Ditto.
	* trans-stmt.c (gfc_trans_allocate,
	gfc_trans_deallocate): Ditto.
	* resolve.c (resolve_typebound_function,
	resolve_typebound_subroutine, resolve_allocate_expr,
	resolve_select_type, gfc_resolve_finalizers,
	resolve_typebound_procedures, resolve_fl_derived): Ditto.
	(resolve_symbol): Return early if attr.artificial.

diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index a490238..20d6bbd 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -2801,7 +2801,7 @@ gfc_check_move_alloc (gfc_expr *from, gfc_expr *to)
 
   /* CLASS arguments: Make sure the vtab of from is present.  */
   if (to->ts.type == BT_CLASS)
-gfc_find_derived_vtab (from->ts.u.derived);
+gfc_get_derived_vtab (from->ts.u.derived);
 
   return SUCCESS;
 }
diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c
index 2e347cb..ab3bcc1 100644
--- a/gcc/fortran/class.c
+++ b/gcc/fortran/class.c
@@ -416,7 +416,7 @@ gfc_class_null_initializer (gfc_typespec *ts)
 {
   gfc

PATCH: Enable both ld and gold in GCC 4.8

2012-11-27 Thread H.J. Lu
On Tue, Nov 27, 2012 at 9:23 AM, H.J. Lu  wrote:
> On Tue, Nov 27, 2012 at 9:20 AM, Paolo Bonzini  wrote:
>> Il 27/11/2012 17:07, H.J. Lu ha scritto:
>>> There is no reply for
>>>
>>> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg02121.html
>>
>> That counts as lack of review...
>>
>
> That is no response from patch submitter.  It is different from
> lack of review.
>

I updated patch for trunk.  OK to install?

Thanks.

-- 
H.J.
--
2011-06-27   Doug Kwan  

Google ref 41164-p2
Backport upstream patch under review.

2011-01-19   Nick Clifton  
 Matthias Klose 

* configure.ac (gcc_cv_gold_srcdir): New cached variable -
contains the location of the gold sources.
(ORIGINAL_GOLD_FOR_TARGET): New substituted variable - contains
the name of the locally built gold executable.
* configure: Regenerate.
* collect2.c (main): Detect the -use-gold and -use-ld switches
and select the appropriate linker, if found.
If a linker cannot be found and collect2 is executing in
verbose mode then report the search paths examined.
* exec-tool.in: Detect the -use-gold and -use-ld switches and
select the appropriate linker, if found.
Add support for -v switch.
Report problems locating linker executable.
* gcc.c (LINK_COMMAND_SPEC): Translate -fuse-ld=gold into
-use-gold and -fuse-ld=bfd into -use-ld.
* common.opt: Add fuse-ld=gold and fuse-ld=bfd.
* opts.c (comman_handle_option): Ignore -fuse-ld=gold and
-fuse-ld=bfd.
* doc/invoke.texi: Document the new options.


0001-Enable-both-ld-and-gold-in-gcc.patch
Description: Binary data


[asan] Fix some asan ICEs

2012-11-27 Thread Jakub Jelinek
Hi!

This fixes a couple of asan ICEs found while running make check
with RUNTESTFLAGS='unix/-fsanitize=address'.
The last two hunks fix ICEs while instrumenting atomics with non-standard
sizes, those are always turned into library calls, and the first argument
is the length, not a pointer.
The other issues fixed are if one uses non-prototyped builtins with wrong
types of arguments, it can result in verify_gimple ICEs (e.g. if last
argument to non-prototyped memcmp is int), or worse (say if instead of
pointer it is a double).  The patch just punts for bogus builtins,
and when len is integral, but not of the right type, it casts it to the
right type (for constant just builds the right offset as constant,
otherwise adds extra stmts as needed).

Ok for trunk?

2012-11-27  Jakub Jelinek  

* asan.c (instrument_mem_region_access): Don't instrument
if base doesn't have pointer type or len integral type.
Add cast if len doesn't have size_t compatible type.
(instrument_builtin_call): Don't instrument BUILT_IN_ATOMIC_LOAD,
BUILT_IN_ATOMIC_TEST_AND_SET, BUILT_IN_ATOMIC_CLEAR,
BUILT_IN_ATOMIC_EXCHANGE, BUILT_IN_ATOMIC_COMPARE_EXCHANGE
and BUILT_IN_ATOMIC_STORE.

--- gcc/asan.c.jj   2012-11-23 15:21:49.0 +0100
+++ gcc/asan.c  2012-11-27 17:37:10.948281831 +0100
@@ -849,7 +849,9 @@ instrument_mem_region_access (tree base,
  gimple_stmt_iterator *iter,
  location_t location, bool is_store)
 {
-  if (integer_zerop (len))
+  if (!POINTER_TYPE_P (TREE_TYPE (base))
+  || !INTEGRAL_TYPE_P (TREE_TYPE (len))
+  || integer_zerop (len))
 return;
 
   gimple_stmt_iterator gsi = *iter;
@@ -902,20 +904,41 @@ instrument_mem_region_access (tree base,
 
   /* offset = len - 1;  */
   len = unshare_expr (len);
-  gimple offset =
-gimple_build_assign_with_ops (TREE_CODE (len),
- make_ssa_name (TREE_TYPE (len), NULL),
- len, NULL);
-  gimple_set_location (offset, location);
-  gsi_insert_before (&gsi, offset, GSI_NEW_STMT);
-
-  offset =
-gimple_build_assign_with_ops (MINUS_EXPR,
- make_ssa_name (size_type_node, NULL),
- gimple_assign_lhs (offset),
- build_int_cst (size_type_node, 1));
-  gimple_set_location (offset, location);
-  gsi_insert_after (&gsi, offset, GSI_NEW_STMT);
+  tree offset;
+  gimple_seq seq = NULL;
+  if (TREE_CODE (len) == INTEGER_CST)
+offset = fold_build2 (MINUS_EXPR, size_type_node,
+ fold_convert (size_type_node, len),
+ build_int_cst (size_type_node, 1));
+  else
+{
+  gimple g;
+  tree t;
+
+  if (TREE_CODE (len) != SSA_NAME)
+   {
+ t = make_ssa_name (TREE_TYPE (len), NULL);
+ g = gimple_build_assign_with_ops (TREE_CODE (len), t, len, NULL);
+ gimple_set_location (g, location);
+ gimple_seq_add_stmt_without_update (&seq, g);
+ len = t;
+   }
+  if (!useless_type_conversion_p (size_type_node, TREE_TYPE (len)))
+   {
+ t = make_ssa_name (size_type_node, NULL);
+ g = gimple_build_assign_with_ops (NOP_EXPR, t, len, NULL);
+ gimple_set_location (g, location);
+ gimple_seq_add_stmt_without_update (&seq, g);
+ len = t;
+   }
+
+  t = make_ssa_name (size_type_node, NULL);
+  g = gimple_build_assign_with_ops (MINUS_EXPR, t, len,
+   build_int_cst (size_type_node, 1));
+  gimple_set_location (g, location);
+  gimple_seq_add_stmt_without_update (&seq, g);
+  offset = gimple_assign_lhs (g);
+}
 
   /* _1 = base;  */
   base = unshare_expr (base);
@@ -924,14 +947,16 @@ instrument_mem_region_access (tree base,
  make_ssa_name (TREE_TYPE (base), NULL),
  base, NULL);
   gimple_set_location (region_end, location);
-  gsi_insert_after (&gsi, region_end, GSI_NEW_STMT);
+  gimple_seq_add_stmt_without_update (&seq, region_end);
+  gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
+  gsi_prev (&gsi);
 
   /* _2 = _1 + offset;  */
   region_end =
 gimple_build_assign_with_ops (POINTER_PLUS_EXPR,
  make_ssa_name (TREE_TYPE (base), NULL),
  gimple_assign_lhs (region_end),
- gimple_assign_lhs (offset));
+ offset);
   gimple_set_location (region_end, location);
   gsi_insert_after (&gsi, region_end, GSI_NEW_STMT);
 
@@ -1089,7 +1114,6 @@ instrument_builtin_call (gimple_stmt_ite
These are handled differently from the classical memory memory
access builtins above.  */
 
-case BUILT_IN_ATOMIC_LOAD:
 case BUILT_IN_ATOMIC_LOAD_1:
 case BUILT_IN_ATOMIC_LOAD_2:
 case BUILT_IN_ATOMIC_LOAD_4:
@@ -11

Re: [RFC] Wrong register numbers in .dwarf_frame on Linux/PowerPC

2012-11-27 Thread Ulrich Weigand
Mark Kettenis wrote:
> > Date: Mon, 26 Nov 2012 20:10:06 +0100 (CET)
> > From: "Ulrich Weigand" 
> > 
> > Hello,
> > 
> > I noticed what appears to be a long-standing bug in generating .dwarf_frame
> > sections with GCC on Linux on PowerPC.
> > 
> > ...
> > 
> > So I'm wondering where to go from here.  I guess we could:
> > 
> > 1. Bring GCC (and gas) behaviour in compliance with the documented ABI
> >by removing the #undef DBX_REGISTER_NUMBER and changing gas's
> >md_reg_eh_frame_to_debug_frame to the original implementation from
> >Jakub's patch.  That would make GDB work well on new files, but
> >there are a large number of binaries out there where we continue
> >to have the same behaviour as today ...
> > 
> > 2. Leave GCC and gas as-is and modify GDB to expect GCC numbering in
> >.dwarf_frame, except for the condition code register.  This would
> >break debugging of files built with GCC 4.0 and 4.1 unless we
> >want to add a special hack for that.
> > 
> > 3. Like 2., but remove the condition code hack: simply use identical
> >numbers in .eh_frame and .dwarf_frame.  This would make PowerPC
> >like other Linux platforms in that respect.
> > 
> > Thoughts?
> 
> What do other compilers (in particular XLC) do?  From a GDB standpoint
> it would be a major PITA if different compilers would use different
> encodings for .dwarf_frame.

In my tests XLC (version 12.1 on Linux) seems to consistently use the
GCC register numbering in both .eh_frame and .dwarf_frame.  LLVM also
consistently uses the GCC register numbering.  Looks like this would
be another argument for variant 3 ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com



[asan] Another ICE fix

2012-11-27 Thread Jakub Jelinek
Hi!

E.g. on c_char_tests.f03 we have a stmt like
  _2 = VIEW_CONVERT_EXPR(121)[1]{lb: 1 sz: 1};
after inlining (wonder why we never fold that to 121), which asan
incorrectly considered to be a load and thus attempted to instrument it,
by taking address of the rhs.

Fixed thusly, ok for trunk?

2012-11-27  Jakub Jelinek  

* asan.c (instrument_assignment): Instrument lhs only
for gimple_store_p and rhs1 only for gimple_assign_load_p.

--- gcc/asan.c.jj   2012-11-27 17:37:10.0 +0100
+++ gcc/asan.c  2012-11-27 18:37:21.316247456 +0100
@@ -1358,10 +1358,12 @@ instrument_assignment (gimple_stmt_itera
 
   gcc_assert (gimple_assign_single_p (s));
 
-  instrument_derefs (iter, gimple_assign_lhs (s),
-gimple_location (s), true);
-  instrument_derefs (iter, gimple_assign_rhs1 (s),
-gimple_location (s), false);
+  if (gimple_store_p (s))
+instrument_derefs (iter, gimple_assign_lhs (s),
+  gimple_location (s), true);
+  if (gimple_assign_load_p (s))
+instrument_derefs (iter, gimple_assign_rhs1 (s),
+  gimple_location (s), false);
 }
 
 /* Instrument the function call pointed to by the iterator ITER, if it

Jakub


Re: [RFC] Wrong register numbers in .dwarf_frame on Linux/PowerPC

2012-11-27 Thread Ulrich Weigand
Mark Wielaard wrote:

> Which other unwinders are out there, that might rely on the current
> numbering?

Well, runtime unwinders using .eh_frame should be fine, since this
uses (and has always used) consistently the GCC numbering.  I don't
know if there are other unwinders using .dwarf_frame ...

> The Systemtap runtime unwinder (*) currently is incomplete
> (and in one case wrong since the numbering overlaps), so it doesn't
> really matter much which solution you pick (we will just have to watch
> out and fix things to be as consistent as possible when your change goes
> through). If you do change the numbering it would be ideal if there was
> a way to detect which one was in place (although it is probably hopeless
> because depending on which GCC version is in use there can already be
> different numberings).

The change will most likely be to consistently use GCC numbering in
.dwarf_frame as well, which changes only the encoding of the condition
code register.  Since you're not using that at all in systemtap, you
shouldn't be affected.

> BTW. The reason the systemtap runtime unwinder is
> a little wrong here is because on all other architectures we assume
> eh_frame and debug_frame DWARF register numberings are equal, is ppc
> really the only architecture for which that isn't true, or were we just
> lucky?

As far as Linux goes, yes, ppc was the only architecture with a
different encoding between .eh_frame and .dwarf_frame.  The only
other such platforms I'm aware of are Darwin on 32-bit i386, and
some other operating systems on ppc (AIX, Darwin, BSD).

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com



Re: [RFC] Wrong register numbers in .dwarf_frame on Linux/PowerPC

2012-11-27 Thread Ulrich Weigand
David Edelsohn wrote:
> On Mon, Nov 26, 2012 at 2:10 PM, Ulrich Weigand  wrote:
> 
> > So I'm wondering where to go from here.  I guess we could:
> >
> > 1. Bring GCC (and gas) behaviour in compliance with the documented ABI
> >by removing the #undef DBX_REGISTER_NUMBER and changing gas's
> >md_reg_eh_frame_to_debug_frame to the original implementation from
> >Jakub's patch.  That would make GDB work well on new files, but
> >there are a large number of binaries out there where we continue
> >to have the same behaviour as today ...
> >
> > 2. Leave GCC and gas as-is and modify GDB to expect GCC numbering in
> >.dwarf_frame, except for the condition code register.  This would
> >break debugging of files built with GCC 4.0 and 4.1 unless we
> >want to add a special hack for that.
> >
> > 3. Like 2., but remove the condition code hack: simply use identical
> >numbers in .eh_frame and .dwarf_frame.  This would make PowerPC
> >like other Linux platforms in that respect.
> >
> > Thoughts?
> 
> I vote for (3).

I'd agree, in particular given that XLC and LLVM seem to match this
behaviour as well.

Looking into this further, it turns out that on Linux not only .debug_frame
is affected, but also .debug_info and all the other .debug_... sections.
DBX_REGISTER_NUMBER is used for register numbers in those sections too ...

This again doesn't match what GDB is expecting:  For regular debug info
(not frame info), GDB only distinguished between stabs and DWARF, and
assumes GCC numbering for stabs, and DWARF numbering for DWARF.  This
holds for any PowerPC operating system.

However, looking at GCC behaviour, we have instead GCC numbering used
in either stabs or DWARF on Linux, but DWARF numbering apparently used
in either stabs or DWARF on AIX/BSD/Darwin.

Here, comparison with other compilers is less clear.  I wasn't able to
get XLC on Linux to generate any .debug_info containing a register
number for non-GPR/FPR registers (it would always put such variables
on the stack).  The XLC on AIX I have access to is quite old and only
generates stabs; again I wasn't able to see any non-GPR register
assignments.  LLVM consistently uses the GCC numbering on all operating
systems it supports (I think that's Linux, Darwin, and FreeBSD).

As far as Linux is concerned, leaving the compilers as-is and changing
GDB to expect GCC numbering might be the best option.  Not sure about
other operating systems ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com



Re: [RFC] Wrong register numbers in .dwarf_frame on Linux/PowerPC

2012-11-27 Thread Mark Kettenis
> Date: Tue, 27 Nov 2012 19:43:40 +0100 (CET)
> From: "Ulrich Weigand" 
> 
> Mark Kettenis wrote:
> > > Date: Mon, 26 Nov 2012 20:10:06 +0100 (CET)
> > > From: "Ulrich Weigand" 
> > > 
> > > Hello,
> > > 
> > > I noticed what appears to be a long-standing bug in generating 
> > > .dwarf_frame
> > > sections with GCC on Linux on PowerPC.
> > > 
> > > ...
> > > 
> > > So I'm wondering where to go from here.  I guess we could:
> > > 
> > > 1. Bring GCC (and gas) behaviour in compliance with the documented ABI
> > >by removing the #undef DBX_REGISTER_NUMBER and changing gas's
> > >md_reg_eh_frame_to_debug_frame to the original implementation from
> > >Jakub's patch.  That would make GDB work well on new files, but
> > >there are a large number of binaries out there where we continue
> > >to have the same behaviour as today ...
> > > 
> > > 2. Leave GCC and gas as-is and modify GDB to expect GCC numbering in
> > >.dwarf_frame, except for the condition code register.  This would
> > >break debugging of files built with GCC 4.0 and 4.1 unless we
> > >want to add a special hack for that.
> > > 
> > > 3. Like 2., but remove the condition code hack: simply use identical
> > >numbers in .eh_frame and .dwarf_frame.  This would make PowerPC
> > >like other Linux platforms in that respect.
> > > 
> > > Thoughts?
> > 
> > What do other compilers (in particular XLC) do?  From a GDB standpoint
> > it would be a major PITA if different compilers would use different
> > encodings for .dwarf_frame.
> 
> In my tests XLC (version 12.1 on Linux) seems to consistently use the
> GCC register numbering in both .eh_frame and .dwarf_frame.  LLVM also
> consistently uses the GCC register numbering.  Looks like this would
> be another argument for variant 3 ...

Probably.  Certainly the most practical solution.  Although I'd say
that the fact that people have been able to live with the non-matching
register numbering schemes for so many years means that variant 1
wouldn't hurt people too badly.  It's a bit of a shame that on one of
the few architectures that bothered to provide a definition of the
DWARF register numbers we wouldn't use it :(.


Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver

2012-11-27 Thread Uros Bizjak
Hello!

> On Tue, 27 Nov 2012, Jakub Jelinek wrote:
> > On Tue, Nov 27, 2012 at 06:44:23AM -0500, Hans-Peter Nilsson wrote:
>
> JFTR: No I didn't, Eric wrote the below.  But, it made sense to me. :)
>
> > > > We apparently have a small conflict between the meaning of volatile 
> > > > asms with
> > > > operands at the source level and volatile_insn_p.  However, I think 
> > > > that the
> > > > latter interpretation is the correct one: if your asm statements have 
> > > > effects
> > > > that can be described, then you should use output constraints instead of
> > > > volatile; otherwise, you should use volatile and the output constraints 
> > > > have
> > > > essentially no meaning.
> >
> > I strongly disagree with this.
> > [...]
>
> As long as volatile asms and UNSPEC_VOLATILE insns (aka.
> barriers) are handled the same way and consistently throughout
> gcc, I'm fine.  It seems your patch does that, so thanks!
>
> > But the question is also what effects your patch can have e.g. on RTL DSE.
>
> Looks like the patch caused a bootstrap for s390x.
>
> Eagerly awaiting a PR for that, but whoever is interested
> on that, please try Jakub's patch first...
>
> > 2012-11-26  Jakub Jelinek  
> >
> > PR debug/36728
> > PR debug/55467
> > * cselib.c (cselib_process_insn): If cselib_preserve_constants,
> > don't reset table and exit early on volatile insns and setjmp.
> > Reset table afterwards on setjmp.
> >
> > * gcc.dg/guality/pr36728-1.c: Include "../nop.h", make sure the asm
> > are non-empty and add dependency between the first and second asm.
> > * gcc.dg/guality/pr36728-2.c: Likewise.
> > * gcc.dg/guality/pr36728-3.c: New test.
> > * gcc.dg/guality/pr36728-4.c: New test.

I have hit the same ICE on alpha-linux-gnu bootstrap. Jakub's patch
fixes the ICE, and allows bootstrap to pass well into stage2 now.
However, it takes ~10 hours for full bootstrap+regtest to finish, will
report back tomorrow morning (CET).

Uros.


[Patch, Fortran] PR55476 - fix bogus "Pointer might outlive the pointer target"

2012-11-27 Thread Tobias Burnus
The problem is that the symbol gets the host-associated flag as soon as 
it is host associated even in the host's namespace. Solution: Test 
additionally whether they have been declared in the same namespace.


(I wonder whether there is a case where the host-association is attr is 
set and the namespace is different but the pointer won't outlive the 
target. I tried to create such a test case in the PR – and failed. I 
think it is not possible BLOCK or a module variable; and as there is 
only a single level of nesting, I think the attached patch is correct.)


Bootstrapped and regtested on x86-64-gnu-linux.
OK for the trunk?

Tobias

PS: Pending patches on my side - I have no overview about other patches:
- This patch ;-)
- show_locus out-of-bounds fix, 
http://gcc.gnu.org/ml/fortran/2012-11/msg00084.html
- I/O leakage with iostat=, 
http://gcc.gnu.org/ml/fortran/2012-11/msg00083.html

- FINAL wrapper, http://gcc.gnu.org/ml/fortran/2012-11/msg00086.html
2012-11-27  Tobias Burnus  

	PR fortran/55476
	* expr.c (gfc_check_pointer_assign): Fix check
	pointer-might-outlive-target check for host_assoc.

2012-11-27  Tobias Burnus  

	PR fortran/55476
	* gfortran.dg/warn_target_lifetime_3.f90: New.

diff --git a/gcc/fortran/expr.c b/gcc/fortran/expr.c
index 211f304..15570af 100644
--- a/gcc/fortran/expr.c
+++ b/gcc/fortran/expr.c
@@ -3693,7 +3693,9 @@ gfc_check_pointer_assign (gfc_expr *lvalue, gfc_expr *rvalue)
   warn = lvalue->symtree->n.sym->attr.dummy
 	 || lvalue->symtree->n.sym->attr.result
 	 || lvalue->symtree->n.sym->attr.function
-	 || lvalue->symtree->n.sym->attr.host_assoc
+	 || (lvalue->symtree->n.sym->attr.host_assoc
+		 && lvalue->symtree->n.sym->ns
+		!= rvalue->symtree->n.sym->ns)
 	 || lvalue->symtree->n.sym->attr.use_assoc
 	 || lvalue->symtree->n.sym->attr.in_common;

diff --git a/gcc/testsuite/gfortran.dg/warn_target_lifetime_3.f90 b/gcc/testsuite/gfortran.dg/warn_target_lifetime_3.f90
new file mode 100644
index 000..3ea4f7a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/warn_target_lifetime_3.f90
@@ -0,0 +1,29 @@
+! { dg-do compile }
+! { dg-options "-Wall" }
+!
+! PR fortran/55476
+!
+! Contribued by  Janus Weil
+!
+subroutine test
+  integer, pointer :: p
+  integer, target :: t
+  p => t
+contains
+  subroutine sub()
+if (p /= 0) return
+  end subroutine
+end subroutine
+
+module m
+  integer, pointer :: p2
+contains
+  subroutine test
+integer, target :: t2
+p2 => t2 ! { dg-warning "Pointer at .1. in pointer assignment might outlive the pointer target" }
+  contains
+subroutine sub()
+  if (p2 /= 0) return
+end subroutine
+  end subroutine
+end module m


Re: [PATCH] Section anchors and thread-local storage

2012-11-27 Thread Richard Sandiford
David Edelsohn  writes:
> Below is the implementation as a new target hook.

Looks good to me.  The documentation string needs filling in though,
with a hook placeholder in tm.texi.in.

Richard


Re: PR 55438: Incorrect use of BIGGEST_ALIGNMENT

2012-11-27 Thread Richard Sandiford
Eric Botcazou  writes:
>> OK, how about:
>> 
>>   /* ??? For historical reasons, reject modes that would normally
>>  receive greater alignment, even if unaligned accesses are
>>  acceptable.  This has both advantages and disadvantages.
>
> Fine with me.
>
>> I think here we really do want unit (i.e. the GET_MODE_BITSIZE).
>> We're dividing the bitfield into unit-sized chunks and want to know
>> whether those chunks are aligned or not.  If they aren't aligned,
>> we need to know whether unaligned accesses are OK.
>
> We could conceivably have an aligned mode with bitsize 32 and alignment 16 
> (i.e. STRICT_ALIGNMENT 1 and BIGGEST_ALIGNMENT 16), meaning that we can use 
> 16-bit aligned 32-bit chunks in this mode.  Why would the bitsize matter 
> (assuming that it's a multiple of the alignment)?

I suppose I was putting too much store by the expmed.c code.  How does
this version look?  Tested on x86_64-linux-gnu, powerpc64-linux-gnu
and cris-elf.

Richard


gcc/
PR middle-end/55438
* expmed.c (simple_mem_bitfield_p): New function, extracted from
store_bit_field_1 and extract_bit_field_1.  Use GET_MODE_ALIGNMENT
rather than bitsize when checking the alignment.
* stor-layout.c (bit_field_mode_iterator::bit_field_mode_iterator):
Don't limit ALIGN_.  Assume that memory is mapped in chunks of at
least word size, regardless of BIGGEST_ALIGNMENT.
(bit_field_mode_iterator::get_mode): Use GET_MODE_ALIGNMENT rather
than unit when checking the alignment.
(get_best_mode): Use GET_MODE_ALIGNMENT.

Index: gcc/expmed.c
===
--- gcc/expmed.c2012-11-20 19:49:07.0 +
+++ gcc/expmed.c2012-11-27 17:27:38.381139339 +
@@ -416,6 +416,21 @@ lowpart_bit_field_p (unsigned HOST_WIDE_
   else
 return bitnum % BITS_PER_WORD == 0;
 }
+
+/* Return true if OP is a memory and if a bitfield of size BITSIZE at
+   bit number BITNUM can be treated as a simple value of mode MODE.  */
+
+static bool
+simple_mem_bitfield_p (rtx op0, unsigned HOST_WIDE_INT bitsize,
+  unsigned HOST_WIDE_INT bitnum, enum machine_mode mode)
+{
+  return (MEM_P (op0)
+ && bitnum % BITS_PER_UNIT == 0
+ && bitsize == GET_MODE_BITSIZE (mode)
+ && (!SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (op0))
+ || (bitnum % GET_MODE_ALIGNMENT (mode) == 0
+ && MEM_ALIGN (op0) % GET_MODE_ALIGNMENT (mode) == 0)));
+}
 
 /* Try to use instruction INSV to store VALUE into a field of OP0.
BITSIZE and BITNUM are as for store_bit_field.  */
@@ -624,12 +639,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
   /* If the target is memory, storing any naturally aligned field can be
  done with a simple store.  For targets that support fast unaligned
  memory, any naturally sized, unit aligned field can be done directly.  */
-  if (MEM_P (op0)
-  && bitnum % BITS_PER_UNIT == 0
-  && bitsize == GET_MODE_BITSIZE (fieldmode)
-  && (!SLOW_UNALIGNED_ACCESS (fieldmode, MEM_ALIGN (op0))
- || (bitnum % bitsize == 0
- && MEM_ALIGN (op0) % bitsize == 0)))
+  if (simple_mem_bitfield_p (op0, bitsize, bitnum, fieldmode))
 {
   op0 = adjust_bitfield_address (op0, fieldmode, bitnum / BITS_PER_UNIT);
   emit_move_insn (op0, value);
@@ -1454,12 +1464,7 @@ extract_bit_field_1 (rtx str_rtx, unsign
 
   /* Extraction of a full MODE1 value can be done with a load as long as
  the field is on a byte boundary and is sufficiently aligned.  */
-  if (MEM_P (op0)
-  && bitnum % BITS_PER_UNIT == 0
-  && bitsize == GET_MODE_BITSIZE (mode1)
-  && (!SLOW_UNALIGNED_ACCESS (mode1, MEM_ALIGN (op0))
- || (bitnum % bitsize == 0
- && MEM_ALIGN (op0) % bitsize == 0)))
+  if (simple_mem_bitfield_p (op0, bitsize, bitnum, mode1))
 {
   op0 = adjust_bitfield_address (op0, mode1, bitnum / BITS_PER_UNIT);
   return convert_extracted_bit_field (op0, mode, tmode, unsignedp);
Index: gcc/stor-layout.c
===
--- gcc/stor-layout.c   2012-11-24 12:16:28.947902915 +
+++ gcc/stor-layout.c   2012-11-26 23:16:36.273308430 +
@@ -2643,15 +2643,17 @@ fixup_unsigned_type (tree type)
   unsigned int align, bool volatilep)
 : mode_ (GET_CLASS_NARROWEST_MODE (MODE_INT)), bitsize_ (bitsize),
   bitpos_ (bitpos), bitregion_start_ (bitregion_start),
-  bitregion_end_ (bitregion_end), align_ (MIN (align, BIGGEST_ALIGNMENT)),
+  bitregion_end_ (bitregion_end), align_ (align),
   volatilep_ (volatilep), count_ (0)
 {
   if (!bitregion_end_)
 {
-  /* We can assume that any aligned chunk of ALIGN_ bits that overlaps
+  /* We can assume that any aligned chunk of UNITS bits that overlaps
 the bitfield is mapped and won't trap.  */
-  bitregion_end_ = bitpos + bitsize + align_ - 1;
- 

  1   2   >