Re: [PATCH, generic] Fix for define_subst

2012-11-30 Thread Kirill Yukhin
Hello,
> Ok.
Thanks, checked in: http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00937.html

Thanks, K


Re: [PATCH] Filter out -fsanitize=address if not in combined tree in libiberty

2012-11-30 Thread Richard Biener
On Thu, Nov 29, 2012 at 6:40 PM, H.J. Lu  wrote:
> Hi,
>
> When GCC is configured with --with-build-config="bootstrap-asan", all
> -flto tests will fail since -fsanitize=address is used to compile host
> libiberty, which is used to create liblto_plugin.so, and linker isn't
> compiled with -fsanitize=address.  This patch filters out
> -fsanitize=address from CFLAGS if we aren't in a combined tree with
> binutils.  OK to install?

Why not simply ensure that only host _executables_ are sanitized?

Richard.

> Thanks.
>
> H.J.
> ---
> 2012-11-21  H.J. Lu  
>
> * Makefile.in (CFLAGS): Filter out -fsanitize=address if in GCC
> tree, but not in a combined tree with binutils.
> * configure.ac (COMBINED_TREE_FALSE): New AC_SUBST.
> * configure: Regenerated.
>
> diff --git a/libiberty/Makefile.in b/libiberty/Makefile.in
> index 1ba8cf1..2d357d7 100644
> --- a/libiberty/Makefile.in
> +++ b/libiberty/Makefile.in
> @@ -63,6 +63,10 @@ PERL = @PERL@
>
>  PICFLAG = @PICFLAG@
>
> +# Filter out -fsanitize=address if we are in GCC tree, but aren't in a
> +# combined tree with binutils.
> +@COMBINED_TREE_FALSE@override CFLAGS := $(filter-out 
> -fsanitize=address,$(CFLAGS))
> +
>  MAKEOVERRIDES =
>
>  TARGETLIB = ./libiberty.a
> diff --git a/libiberty/configure b/libiberty/configure
> index 5367027..4869cd5 100755
> --- a/libiberty/configure
> +++ b/libiberty/configure
> @@ -590,6 +590,7 @@ ac_includes_default="\
>
>  ac_subst_vars='LTLIBOBJS
>  INSTALL_DEST
> +COMBINED_TREE_FALSE
>  pexecute
>  target_header_dir
>  CHECK
> @@ -6917,6 +6918,20 @@ esac
>  fi
>
>
> +# Check if this is in GCC tree, but aren't in a combined tree with
> +# binutils.
> +if test -e ${srcdir}/../gcc/gcc.c; then
> +  if test -e ${srcdir}/../ld/ldmain.c -o \
> + -e ${top_srcdir}/../gold/version.cc; then
> +COMBINED_TREE_FALSE='#'
> +  else
> +COMBINED_TREE_FALSE=''
> +  fi
> +else
> +  COMBINED_TREE_FALSE='#'
> +fi
> +
> +
>  # Install a library built with a cross compiler in $(tooldir) rather
>  # than $(libdir).
>  if test -z "${with_cross_host}"; then
> diff --git a/libiberty/configure.ac b/libiberty/configure.ac
> index c763894..7661752 100644
> --- a/libiberty/configure.ac
> +++ b/libiberty/configure.ac
> @@ -670,6 +670,20 @@ AC_SUBST(pexecute)
>
>  libiberty_AC_FUNC_STRNCMP
>
> +# Check if this is in GCC tree, but aren't in a combined tree with
> +# binutils.
> +if test -e ${srcdir}/../gcc/gcc.c; then
> +  if test -e ${srcdir}/../ld/ldmain.c -o \
> + -e ${top_srcdir}/../gold/version.cc; then
> +COMBINED_TREE_FALSE='#'
> +  else
> +COMBINED_TREE_FALSE=''
> +  fi
> +else
> +  COMBINED_TREE_FALSE='#'
> +fi
> +AC_SUBST(COMBINED_TREE_FALSE)
> +
>  # Install a library built with a cross compiler in $(tooldir) rather
>  # than $(libdir).
>  if test -z "${with_cross_host}"; then
> --
> 1.7.11.7
>


Re: [PATCH] Don't bypass blocks with multiple latch edges (PR middle-end/54838)

2012-11-30 Thread Richard Biener
On Thu, Nov 29, 2012 at 6:43 PM, Eric Botcazou  wrote:
>> Yikes, sorry, it wasn't clear to me what PROP_loops really does.  Anyway,
>> I think I have a better fix now.  The problem is just that when removing
>> BB 4 (which was a header), we have to zap ->header and ->latch.  We
>> already have code for this:
>>
>>   if (current_loops != NULL
>>   && e->src->loop_father->latch == e->src)
>> {
>>   /* ???  Now we are creating (or may create) a loop
>>  with multiple entries.  Simply mark it for
>>  removal.  Alternatively we could not do this
>>  threading.  */
>>   e->src->loop_father->header = NULL;
>>   e->src->loop_father->latch = NULL;
>> }
>>
>> but the thing is that when there are multiple latch edges, then
>> ->latch is NULL.  So we need to keep track of how many latch edges
>> the header has.  Regtested/bootstrapped on x86_64, ok for trunk?
>>
>> Can I get rid of may_be_loop_header (and just use n_latch_edges > 0
>> instead at that one place) in a followup?
>>
>> 2012-11-29  Marek Polacek  
>>
>>   PR middle-end/54838
>>   * cprop.c (bypass_block): Set header and latch to NULL when
>>   BB has more than one latch edge.
>>   (n_latches): New variable.
>
> This looks good on principle, but can't we do better now that we have the loop
> structure?   Can't we compute is_loop_header instead of may_be_loop_header and
> simplify the condition under which we mark the loop for removal?  Or maybe we
> should call disambiguate_loops_with_multiple_latches on entry of the pass?
>
> Richard, what would be the "modern" approach to solving the problem here?

RTL cprop seems to run both before and after RTL loop optimizers (currently
after RTL loop optimizers we throw away loops - an arbitrary chosen point
before IRA across which I could not get things to work).  Thus you could do

  if (current_loops)
is_loop_header = bb == bb->loop_father->header;
  else
{
  may_be_loop_header = false;
  FOR_EACH_EDGE (e, ei, bb->preds)
if (e->flags & EDGE_DFS_BACK)
  {
may_be_loop_header = true;
break;
  }
}

I don't understand

  /* The irreducible loops created by redirecting of edges entering the
 loop from outside would decrease effectiveness of some of the
 following optimizations, so prevent this.  */
  if (may_be_loop_header
  && !(e->flags & EDGE_DFS_BACK))
{
  ei_next (&ei);
  continue;
}

why isn't this simply

  if (may_be_loop_header)
{
  ei_next (&ei);
  continue;
}

?  It looks like the code tries to allow "rotating" a loop - but that's only
good if bb has exactly two predecessors (one entry and one latch edge).
And even then it requires to manually update the loop structures (update
what the new header and latch blocks are).

That said, removing the !(e->flags & EDGE_DFS_BACK) condition seems
to fix the ICE.  Threading across a loop header is in fact complicated
(see the special routine tree-ssa-threadupdate.c:thread_through_loop_header
necessary for that).  Let's declare the GIMPLE level did all interesting
threadings through headers.

Btw, if a pass wants to look at anything loop related _besides_ loop headers
it has to call loop_optimizer_init.  The only thing that is guaranteed
to be kept
up-to-date when PROP_loops is set (and thus current_loops != NULL) is
the header field in the loop tree.

Richard.

> --
> Eric Botcazou


Re: [PATCH, AARCH64] Implement vector alignment target hooks.

2012-11-30 Thread Marcus Shawcroft

On 28/11/12 16:34, Tejas Belagod wrote:


Hi,

The attached patch implements vector alignment hooks

TARGET_VECTOR_ALIGNMENT
TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE
TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT

Regression tested on aarch64-none-elf. OK for aarch64-4.7 and trunk?


OK provided it was regression tested on both.
/Marcus



Re: [PATCH, RFC] Dumping expanded MD files

2012-11-30 Thread Michael Zolotukhin
I didn't find any dump_rtx - instead, I used print_inline_rtx which
seems to be what we need to output to stdout, instead of stderr.
Updated version of the patch is attached. Ok for trunk?

Changelog:
2012-11-30  Michael Zolotukhin  

* Makefile.in: Add target mddump, build/genmddump.o.  Extend
genprogrtl with mddump.
* genmddump.c: New.

---
Thanks, Michael

On 29 November 2012 22:36, Richard Henderson  wrote:
> On 2012-11-28 23:22, Michael Zolotukhin wrote:
>> +  debug_rtx (desc);
>
> ... oh, I see.  You wanted dump_rtx here, so you can output to stdout.
>
>
> r~



-- 
---
Best regards,
Michael V. Zolotukhin,
Software Engineer
Intel Corporation.


mddump-3.patch
Description: Binary data


Re: [patch] use pretty-print for slim RTL printing

2012-11-30 Thread Richard Biener
On Thu, Nov 29, 2012 at 11:58 PM, Steven Bosscher  wrote:
> Hello,
>
> This patch rewrites sched-vis.c, and graph.c, to use the pretty-print
> machinery. This makes it easier to add GIMPLE cfg dumping also,
> because all tree/gimple dumping is done via pretty-print. It also
> removes the ugly (and unsafe?) static print buffers that sched-vis.c
> and its users had.
>
> Bootstrapped&tested on powerpc64-unknown-linux-gnu. OK for trunk?

Nice!

Ok.

Thanks,
Richard.

> Ciao!
> Steven


[PATCH] Remove pointless iteration in VN

2012-11-30 Thread Richard Biener

This removes iterating propagation of value-ids (I verified
it never needs iteration in practice - certainly it does not
need iteration by design).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-11-30  Richard Biener  

* tree-ssa-sccvn.c (run_scc_vn): Remove iteration propagating
value_ids.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 193932)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -3983,7 +3983,6 @@ run_scc_vn (vn_lookup_kind default_vn_wa
 {
   size_t i;
   tree param;
-  bool changed = true;
 
   default_vn_walk_kind = default_vn_walk_kind_;
 
@@ -4028,25 +4027,18 @@ run_scc_vn (vn_lookup_kind default_vn_wa
info->value_id = get_or_alloc_constant_value_id (info->valnum);
 }
 
-  /* Propagate until they stop changing.  */
-  while (changed)
+  /* Propagate.  */
+  for (i = 1; i < num_ssa_names; ++i)
 {
-  changed = false;
-  for (i = 1; i < num_ssa_names; ++i)
-   {
- tree name = ssa_name (i);
- vn_ssa_aux_t info;
- if (!name)
-   continue;
- info = VN_INFO (name);
- if (TREE_CODE (info->valnum) == SSA_NAME
- && info->valnum != name
- && info->value_id != VN_INFO (info->valnum)->value_id)
-   {
- changed = true;
- info->value_id = VN_INFO (info->valnum)->value_id;
-   }
-   }
+  tree name = ssa_name (i);
+  vn_ssa_aux_t info;
+  if (!name)
+   continue;
+  info = VN_INFO (name);
+  if (TREE_CODE (info->valnum) == SSA_NAME
+ && info->valnum != name
+ && info->value_id != VN_INFO (info->valnum)->value_id)
+   info->value_id = VN_INFO (info->valnum)->value_id;
 }
 
   set_hashtable_value_ids ();


Re: [PATCH] asan_test.cc from llvm

2012-11-30 Thread Konstantin Serebryany
Jakub,
Great result!

Ideally, I would like to limit the differences from upstream.
I'll put some of your changes upstream, for others I'd ask you to
consider other choices.

-#include "gtest/gtest.h"
+#include "dejagnu-gtest.h"

Maybe like this?

#if ASAN_USE_DEJAGNU_GTEST
#include "dejagnu-gtest.h"
#else
#include "gtest/gtest.h"
#endif


Can we have gcc_asan_test.C which will #include the unchanged (modulo
the comment header) asan_test.cc
and have dejagnu lines there?

Like this:
// { dg-do run { target { mmap && pthread } } }
...
#include asan_test.cc


+#elif defined(__SANITIZE_ADDRESS__)
+  bool asan = 1;

I'll put this upstream.

+#ifdef __SANITIZE_ADDRESS__
+  // Avoid this test during dejagnu testing, it is too expensive
+  if (getenv ("GCC_TEST_RUN_EXPENSIVE") == NULL)
+return;
+#endif

I'd prefer to simply put this w/o ifdef.

-# error "please define ASAN_HAS_BLACKLIST"
+//# error "please define ASAN_HAS_BLACKLIST"

You can add -DASAN_HAS_BLACKLIST=0 to the command line.
If/when gcc gets blacklist support, we'll redefine it to 1


+#if __has_feature(address_sanitizer) || defined(__SANITIZE_ADDRESS__)

this is upstreamable

+#ifdef __GNUC__
+# define break_optimization(arg) __asm__ __volatile__ ("" : : "r"
(arg) : "memory")
+#endif
+

That's a nice piece of magic, let me use this too.

Cool!

--kcc

On Fri, Nov 30, 2012 at 12:46 AM, Jakub Jelinek  wrote:
> Hi!
>
> On Wed, Nov 28, 2012 at 03:13:14PM +0400, Konstantin Serebryany wrote:
>> That's a bit scary (and will be slower than with gtest).
>> But if we can limit the changes by replacing
>> asan/tests/asan_test_config.h (and maybe some minimal other changes)
>> that may work.
>
> So, here is a rough port of asan_test.cc (haven't added the few smaller
> other tests yet until this one is resolved).
>
> I'm attaching both gcc patch and diff between the current llvm files
> and the files added in the patch to make it clear what changes were done.
> Primarily it is replacing the gtest/gtest.h include with new dejagnu-gtest.h
> header that contains needed macros, adding some tcl stuff around it and
> a few dejagnu lines at the beginning.
>
> I've disabled the ManyThreads test, because it seems to be almost completely
> library bound and was very slow (order of minutes), many people run make
> check for gcc with very high -jN factors and having some test spawn 1000
> threads and in each allocate lots of memory is undersirable for normal
> testing.
>
> Attaching also make check-g++ RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} 
> asan.exp'
> result, there are some failures that need analysis (some of it might be
> because of the missing __asan_handle_no_return instrumentation from GCC,
> waiting there for review of the prerequisite patch, another thing is
> instrumentation of bitfields).
> But for -m32/-m64 together that is still:
>
> # of expected passes2301
> # of unexpected failures61
> # of unsupported tests  18
>
> so not that bad.  Both -m32 and -m64 testing together took around 90 seconds
> (without the ManyThreads tests, with it (GCC_TEST_RUN_EXPENSIVE=1 in
> environment) more than 4 minutes).
>
> Jakub


Re: [PATCH] Fix for PR55492 : __atomic_load doesn't match ACQUIRE memory model

2012-11-30 Thread Marcus Shawcroft

On 29/11/12 15:42, Yvan Roux wrote:

Hi,

on ARMv7, the code generated for the __atomic_load builtins in the
__ATOMIC_ACQUIRE memory model, puts a memory barrier before the load, whereas
the semantic of the acquire memory model implies a barrier after.

The issue seems to be in expand_atomic_load which puts a memory fence before
the load in any memory model. The attached patch fixes the problem.

Thanks,
Yvan



The same issue exists in the 4.7 branch.

Can the fix be back ported?

/Marcus



RE: [PATCH ARM] Disable "-fira-hoist-pressure" on Thumb2

2012-11-30 Thread Bin Cheng


> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
On
> Behalf Of Bin Cheng
> Sent: Wednesday, November 28, 2012 9:48 AM
> To: Richard Earnshaw
> Cc: gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH ARM] Disable "-fira-hoist-pressure" on Thumb2
> 
> > -Original Message-
> > From: Richard Earnshaw
> > Sent: Wednesday, November 28, 2012 2:02 AM
> > To: Bin Cheng
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH ARM] Disable "-fira-hoist-pressure" on Thumb2
> >
> > On 27/11/12 10:10, Bin Cheng wrote:
> > > Hi,
> > > I committed the patch implementing register pressure directed hoist
> > > in TRUNK and enabled the option "-fira-hoist-pressure" for all
> > > targets by reviewer's suggestion, although it has no obvious effect
> > > on Thumb2/x86/x86_64. Now I collected performance data with Os and
> > > found it causes performance regression in some cases on Thumb2, so
> > > this patch disables the option on
> > > Thumb2 and modify the corresponding test cases.
> > >
> > > Tested on arm-none-eabi/M3, is it OK?
> > >
> > > Thanks.
> > >
> > >
> > > 2012-11-21  Bin Cheng  
> > >
> > >   * config/arm/arm.c (arm_option_override): Disable option
> > >   -fira-hoist-pressure on Thumb2.
> > >
> > > 2012-11-21  Bin Cheng  
> > >
> > >   * gcc.dg/hoist-register-pressure-1.c: Skip on ARM Thumb2.
> > >   * gcc.dg/hoist-register-pressure-2.c: Ditto.
> > >   * gcc.dg/hoist-register-pressure-3.c: Ditto.
> > >
> > >
> > >
> > >
> > >
> >
> > No patch.
> 
> Very sorry for missing the patch.
> 

Ping.





Re: [PATCH] Fix for PR55492 : __atomic_load doesn't match ACQUIRE memory model

2012-11-30 Thread Yvan Roux
> Can the fix be back ported?

Yes, the back port on 4.7 is straightforward, it just needs to be commited.

Yvan


Re: [PATCH] asan_test.cc from llvm

2012-11-30 Thread Jakub Jelinek
On Fri, Nov 30, 2012 at 01:32:52PM +0400, Konstantin Serebryany wrote:
> Ideally, I would like to limit the differences from upstream.
> I'll put some of your changes upstream, for others I'd ask you to
> consider other choices.
> 
> -#include "gtest/gtest.h"
> +#include "dejagnu-gtest.h"
> 
> Maybe like this?
> 
> #if ASAN_USE_DEJAGNU_GTEST
> #include "dejagnu-gtest.h"
> #else
> #include "gtest/gtest.h"
> #endif

Sure, I'm fine with that.

> Can we have gcc_asan_test.C which will #include the unchanged (modulo
> the comment header) asan_test.cc
> and have dejagnu lines there?
> 
> Like this:
> // { dg-do run { target { mmap && pthread } } }
> ...
> #include asan_test.cc

Yeah, I can do that, advantage will be that the testcase is named the same,
asan.exp currently runs only *.C testcases (so that *.cc is left to helper
files etc.).  So there will be a small asan_test.C with the dg markup.

> +#elif defined(__SANITIZE_ADDRESS__)
> +  bool asan = 1;
> 
> I'll put this upstream.

Thanks, if GCC ever introduces the __has_feature magic macro, it can be
dropped, but for now...  Note elsewhere I'm using the __SANITIZE_ADDRESS__
define as a test whether this is for GCC or other compilers, because
I believe clang defines __GNUC__ macro or similar, I'd need to
#if defined (__GNUC__) && !defined (__clang__) or something?

> +#ifdef __SANITIZE_ADDRESS__
> +  // Avoid this test during dejagnu testing, it is too expensive
> +  if (getenv ("GCC_TEST_RUN_EXPENSIVE") == NULL)
> +return;
> +#endif
> 
> I'd prefer to simply put this w/o ifdef.

Then even in llvm the expensive test won't run unless you put
GCC_TEST_RUN_EXPENSIVE=1 into environment.  That would be weird
for a llvm testcase to use GCC in the name.  Alternative would be
to use some other env var name, like
ASAN_TEST_RUN_EXPENSIVE, and I could adjust asan.exp to set that
env var, or it could be a macro instead, guarding the whole test,
#ifndef ASAN_AVOID_EXPENSIVE_TESTS
(TEST(AddressSanitizer, ManyThreadsTest) {
...
}
#endif
and I could in asan_test.C add:
// { dg-additional-options "-DASAN_AVOID_EXPENSIVE_TESTS" { target { ! 
run_expensive_tests } } }

Or, if you want even upstream to avoid it by default, it could
be a positive test, #ifdef ASAN_RUN_EXPENSIVE_TESTS or similar.

> 
> -# error "please define ASAN_HAS_BLACKLIST"
> +//# error "please define ASAN_HAS_BLACKLIST"
> 
> You can add -DASAN_HAS_BLACKLIST=0 to the command line.
> If/when gcc gets blacklist support, we'll redefine it to 1

Okay.

> +#if __has_feature(address_sanitizer) || defined(__SANITIZE_ADDRESS__)
> 
> this is upstreamable
> 
> +#ifdef __GNUC__
> +# define break_optimization(arg) __asm__ __volatile__ ("" : : "r"
> (arg) : "memory")
> +#endif
> +
> 
> That's a nice piece of magic, let me use this too.

For pointers/integers it can be even as simple as
__asm__ ("" : "+g" (val));
making compiler forget about val, obviously that doesn't work for
aggregates.  Anyway, the reason for the macro is both that I wanted to
avoid compiling in another file, and for LTO it wouldn't work anyway.

BTW, I had to add -Wno-unused-but-set-variable option because without
that there was a warning about one of the variables, I think it was
stack_string in StrLenOOBTest.  Adding (void) stack_string; somewhere
would shut it up.

Jakub


Re: [Patch, Fortran] No-op Patch - a.k.a. FINAL wrapper update

2012-11-30 Thread Janus Weil
Hi,

>> one thing that I do not like about your patch is the modification of
>> "gfc_find_derived_vtab": You create two versions of it, one of which creates
>> the vtab if it does not exist, while the other version does not do this.
>> [...] can you explain to me why this would be necessary?
>
>
> Well, strictly speaking it is not necessary. However, I use it in the
> to-be-submitted calling part of the patch:
>
>   else if (al->expr->ts.type == BT_DERIVED)
> {
>   gfc_symbol *vtab = gfc_find_derived_vtab
> (al->expr->ts.u.derived);
>   if (vtab)
>
> Here, I do not want to force the generation of a vtab which wouldn't
> otherwise exist. Otherwise, one had to at least guard it by checks for
> nonextensible derived types (sequence, bind(C)).

I don't think it is a good idea to base the decision whether to call a
finalizer on the presence of a vtab. In my version of the patch I
introduced a routine 'gfc_is_finalizable' to perform this decision.


>> [Moreover, the problem is that your new "gfc_find_derived_vtab" behaves
>> different from the old one but has the same name, while your new
>> "gfc_get_derived_vtab" behaves like the old "gfc_find_derived_vtab".
>
>
> That's because of the bad choice of the current name. The other "find"
> functions do not generate the symbol if it does not exist, the "get"
> functions do. But otherwise I concur that changing the name is confusing.
>
>
>> Therefore, the places where you change the behavior by keeping the call to
>> "gfc_find_derived_vtab" are not visible in the patch.
>
>
> That should not happen. When I created the patch, I first renamed all
> existing versions, though it seems as if I there are currently three new
> ones which the current patch misses.
>
> However, if you insist on the current meaning, can you provide a good name?
> Otherwise, I could use gfc_really_find_derived_vtab ;-)

I do not oppose to renaming gfc_find_derived_vtab to
gfc_get_derived_vtab. My main point is that we do not need a variant
which only searches for the vtab but does not generate it.

Cheers,
Janus


Re: PR55124 - tentative patch for ICE in PRE

2012-11-30 Thread Richard Biener
On Wed, Nov 28, 2012 at 3:05 PM, Tom de Vries  wrote:
> On 27/11/12 13:41, Richard Biener wrote:
>> On Mon, Nov 19, 2012 at 3:33 AM, Tom de Vries  wrote:
>>> Richard,
>>>
>>> Consider the PR55124 example test.c:
>>> ...
>>> int a, b;
>>> long c;
>>>
>>> static void inline
>>> f2 (void)
>>> {
>>>   unsigned long k = 1;
>>>
>>>   foo (b ? k = 0 : 0);
>>>
>>>   b = (((c = b)
>>> ? (k ?: (c = 0))
>>> : a)
>>>* c);
>>> }
>>>
>>> void
>>> f1 (void)
>>> {
>>>   f2 ();
>>>   a = b | c;
>>> }
>>> ...
>>>
>>> when compiling with -O2, we're running into the following assertion in pre:
>>> ...
>>> test.c:18:1: internal compiler error: in find_or_generate_expression, at
>>> tree-ssa-pre.c:2802
>>>  f1 (void)
>>>  ^
>>> 0xcf41d3 find_or_generate_expression
>>> gcc/tree-ssa-pre.c:2802
>>> 0xcf42f6 create_expression_by_pieces
>>> gcc/tree-ssa-pre.c:2861
>>> 0xcf4193 find_or_generate_expression
>>> gcc/tree-ssa-pre.c:2799
>>> 0xcf42f6 create_expression_by_pieces
>>> gcc/tree-ssa-pre.c:2861
>>> 0xcf4e28 insert_into_preds_of_block
>>> gcc/tree-ssa-pre.c:3096
>>> 0xcf5c7d do_regular_insertion
>>> gcc/tree-ssa-pre.c:3386
>>> ...
>>>
>>> We're hitting the assert at the end of find_or_generate_expression:
>>> ...
>>> static tree
>>> find_or_generate_expression (basic_block block, tree op, gimple_seq *stmts)
>>> {
>>>   pre_expr expr = get_or_alloc_expr_for (op);
>>>   unsigned int lookfor = get_expr_value_id (expr);
>>>   pre_expr leader = bitmap_find_leader (AVAIL_OUT (block), lookfor);
>>>   if (leader)
>>> {
>>>   if (leader->kind == NAME)
>>> return PRE_EXPR_NAME (leader);
>>>   else if (leader->kind == CONSTANT)
>>> return PRE_EXPR_CONSTANT (leader);
>>> }
>>>
>>>   /* It must be a complex expression, so generate it recursively.  */
>>>   bitmap exprset = VEC_index (bitmap, value_expressions, lookfor);
>>>   bitmap_iterator bi;
>>>   unsigned int i;
>>>   EXECUTE_IF_SET_IN_BITMAP (exprset, 0, i, bi)
>>> {
>>>   pre_expr temp = expression_for_id (i);
>>>   if (temp->kind != NAME)
>>> return create_expression_by_pieces (block, temp, stmts,
>>> get_expr_type (expr));
>>> }
>>>
>>>   gcc_unreachable ();
>>> }
>>> ...
>>>
>>> The state in which we're asserting is the following:
>>> ...
>>> #5  0x00cf41d4 in find_or_generate_expression (block=0x76210f08,
>>> op=0x762384c8, stmts=0x7fffdb78) at gcc/tree-ssa-pre.c:2802
>>> 2802  gcc_unreachable ();
>>> (gdb) p block.index
>>> $13 = 4
>>> (gdb) call debug_generic_expr (op)
>>> b.4_3
>>> (gdb) call debug_pre_expr (expr)
>>> b.4_3
>>> (gdb) p lookfor
>>> $11 = 7
>>> (gdb) call debug_bitmap_set (((bb_value_sets_t) ((block)->aux))->avail_out)
>>> debug[0] := { b.4_8 (0012), a.10_13 (0013), _14 (0014), iftmp.5_15 (0015) }
>>> (gdb) p leader
>>> $12 = (pre_expr) 0x0
>>> (gdb) call debug_bitmap ( exprset )
>>> first = 0x21fb058 current = 0x21fb058 indx = 0
>>> 0x21fb058 next = (nil) prev = (nil) indx = 0
>>> bits = { 22 25 }
>>> (gdb) call debug_pre_expr (expression_for_id (22))
>>> b.4_3
>>> (gdb) call debug_pre_expr (expression_for_id (25))
>>> b.4_31
>>> ...
>>> We're trying to find or generate an expr with value-id 0007 in block 4. We 
>>> can't
>>> find it (there's no leader) and we can't generate it because there are no 
>>> exprs
>>> with that value that are not names.
>>>
>>> Higher up in the call stack and skipping create_expression_by_pieces, the 
>>> state
>>> is as follows:
>>> ...
>>> #7  0x00cf4194 in find_or_generate_expression (block=0x76210f08,
>>> op=0x76238558, stmts=0x7fffdb78) at gcc/tree-ssa-pre.c:2799
>>> 2799get_expr_type (expr));
>>> (gdb) call debug_generic_expr (op)
>>> c.6_5
>>> (gdb) call debug_pre_expr (expr)
>>> c.6_5
>>> (gdb) p lookfor
>>> $14 = 9
>>> (gdb) p leader
>>> $15 = (pre_expr) 0x0
>>> (gdb) call debug_bitmap ( exprset )
>>> first = 0x21fb0f8 current = 0x21fb0f8 indx = 0
>>> 0x21fb0f8 next = (nil) prev = (nil) indx = 0
>>> bits = { 23 24 26 27 }
>>> (gdb) call debug_pre_expr (temp)
>>> {nop_expr,b.4_3}
>>> (gdb) call debug_pre_expr (expression_for_id (23))
>>> c.6_5
>>> (gdb) call debug_pre_expr (expression_for_id (24))
>>> {nop_expr,b.4_3}
>>> (gdb) call debug_pre_expr (expression_for_id (26))
>>> c.6_32
>>> (gdb) call debug_pre_expr (expression_for_id (27))
>>> {mem_ref<0B>,addr_expr<&c>}@.MEM_28
>>> ...
>>> We're trying to find or generate an expr with value-id 0009 (in block 4). We
>>> can't find it. We're trying to generate it using {nop_expr,b.4_3}, but as 
>>> we've
>>> seen above that won't work. The generation using expr 27 would work though.
>>>
>>> Again higher up in the call stack and skipping create_expression_by_pieces, 
>>> the
>>> state is as follows:
>>> ...
>>> (gdb) up
>>> #9  0x00cf4e29 in insert_into_preds_of_bloc

Re: [Patch, Fortran] No-op Patch - a.k.a. FINAL wrapper update

2012-11-30 Thread Janus Weil
2012/11/30 Janus Weil :
> Hi,
>
>>> one thing that I do not like about your patch is the modification of
>>> "gfc_find_derived_vtab": You create two versions of it, one of which creates
>>> the vtab if it does not exist, while the other version does not do this.
>>> [...] can you explain to me why this would be necessary?
>>
>>
>> Well, strictly speaking it is not necessary. However, I use it in the
>> to-be-submitted calling part of the patch:
>>
>>   else if (al->expr->ts.type == BT_DERIVED)
>> {
>>   gfc_symbol *vtab = gfc_find_derived_vtab
>> (al->expr->ts.u.derived);
>>   if (vtab)
>>
>> Here, I do not want to force the generation of a vtab which wouldn't
>> otherwise exist. Otherwise, one had to at least guard it by checks for
>> nonextensible derived types (sequence, bind(C)).
>
> I don't think it is a good idea to base the decision whether to call a
> finalizer on the presence of a vtab. In my version of the patch I
> introduced a routine 'gfc_is_finalizable' to perform this decision.

Forgot to mention: My last version of the patch is available at

http://gcc.gnu.org/ml/fortran/2012-11/msg9.html


Btw, one prerequisite for the implementation of finalization would be
to have the following bug fixed:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55207

(which is about automatic deallocation in the main program).

Cheers,
Janus


Re: [PATCH] Fix allocation of reg_known_value

2012-11-30 Thread Paolo Bonzini
Il 29/11/2012 23:47, Uros Bizjak ha scritto:
> This one-liner causes following runtime test failure [1] for
> alphaev68-linux-gnu:
> 
> FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2
> FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2
> -fomit-frame-pointer -finline-functions
> FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2
> -fomit-frame-pointer -finline-functions -funroll-loops
> FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2
> -fbounds-check
> 
> The patch miscompiles libgfortran library.
> 
> I will provide more info tomorrow, any hint what/where should I look
> for differences?

Anywhere.  The bug disabled a large part of alias analysis.

Perhaps you can bisect it and backport the fix along the steps.

Paolo



Re: [PATCH] Fix allocation of reg_known_value

2012-11-30 Thread Steven Bosscher
On Fri, Nov 30, 2012 at 11:39 AM, Paolo Bonzini  wrote:
> Il 29/11/2012 23:47, Uros Bizjak ha scritto:
>> This one-liner causes following runtime test failure [1] for
>> alphaev68-linux-gnu:
>>
>> FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2
>> FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2
>> -fomit-frame-pointer -finline-functions
>> FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2
>> -fomit-frame-pointer -finline-functions -funroll-loops
>> FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2
>> -fbounds-check
>>
>> The patch miscompiles libgfortran library.
>>
>> I will provide more info tomorrow, any hint what/where should I look
>> for differences?
>
> Anywhere.  The bug disabled a large part of alias analysis.
>
> Perhaps you can bisect it and backport the fix along the steps.

Or open a PR and assign it to me, ultimately I'm responsible for this breakage.

Ciao!
Steven


Re: [PATCH] asan_test.cc from llvm

2012-11-30 Thread Konstantin Serebryany
I've upstreamed most of your changes, please take a look.
Looks like we will be able to use libsanitizer/merge.sh to update the
test code as well.

On Fri, Nov 30, 2012 at 2:14 PM, Jakub Jelinek  wrote:
> On Fri, Nov 30, 2012 at 01:32:52PM +0400, Konstantin Serebryany wrote:
>> Ideally, I would like to limit the differences from upstream.
>> I'll put some of your changes upstream, for others I'd ask you to
>> consider other choices.
>>
>> -#include "gtest/gtest.h"
>> +#include "dejagnu-gtest.h"
>>
>> Maybe like this?
>>
>> #if ASAN_USE_DEJAGNU_GTEST
>> #include "dejagnu-gtest.h"
>> #else
>> #include "gtest/gtest.h"
>> #endif
>
> Sure, I'm fine with that.
>
>> Can we have gcc_asan_test.C which will #include the unchanged (modulo
>> the comment header) asan_test.cc
>> and have dejagnu lines there?
>>
>> Like this:
>> // { dg-do run { target { mmap && pthread } } }
>> ...
>> #include asan_test.cc
>
> Yeah, I can do that, advantage will be that the testcase is named the same,
> asan.exp currently runs only *.C testcases (so that *.cc is left to helper
> files etc.).  So there will be a small asan_test.C with the dg markup.
>
>> +#elif defined(__SANITIZE_ADDRESS__)
>> +  bool asan = 1;
>>
>> I'll put this upstream.
>
> Thanks, if GCC ever introduces the __has_feature magic macro, it can be
> dropped, but for now...  Note elsewhere I'm using the __SANITIZE_ADDRESS__
> define as a test whether this is for GCC or other compilers, because
> I believe clang defines __GNUC__ macro or similar, I'd need to
> #if defined (__GNUC__) && !defined (__clang__) or something?

I'd prefer to avoid such hairy ifdefs...


>
>> +#ifdef __SANITIZE_ADDRESS__
>> +  // Avoid this test during dejagnu testing, it is too expensive
>> +  if (getenv ("GCC_TEST_RUN_EXPENSIVE") == NULL)
>> +return;
>> +#endif
>>
>> I'd prefer to simply put this w/o ifdef.
>
> Then even in llvm the expensive test won't run unless you put
> GCC_TEST_RUN_EXPENSIVE=1 into environment.  That would be weird
> for a llvm testcase to use GCC in the name.  Alternative would be
> to use some other env var name, like
> ASAN_TEST_RUN_EXPENSIVE, and I could adjust asan.exp to set that
> env var, or it could be a macro instead, guarding the whole test,
> #ifndef ASAN_AVOID_EXPENSIVE_TESTS
> (TEST(AddressSanitizer, ManyThreadsTest) {
> ...
> }
> #endif
> and I could in asan_test.C add:
> // { dg-additional-options "-DASAN_AVOID_EXPENSIVE_TESTS" { target { ! 
> run_expensive_tests } } }

I can add ASAN_ALLOW_SLOW_TESTS to asan_test_config.h and then #if
ASAN_ALLOW_SLOW_TESTS around ManyThreadsTest.
I did not do it yet, because I would like to understand why it's slow.
On my box (with clang+gtest) it runs in 0.6 seconds, and the entire
test suite takes 60 seconds.
Note that when running gtest tests in llvm test harness the tests are
sharded, so on a multicore machine the test takes < 10 seconds.


>
> Or, if you want even upstream to avoid it by default, it could
> be a positive test, #ifdef ASAN_RUN_EXPENSIVE_TESTS or similar.
>
>>
>> -# error "please define ASAN_HAS_BLACKLIST"
>> +//# error "please define ASAN_HAS_BLACKLIST"
>>
>> You can add -DASAN_HAS_BLACKLIST=0 to the command line.
>> If/when gcc gets blacklist support, we'll redefine it to 1
>
> Okay.
>
>> +#if __has_feature(address_sanitizer) || defined(__SANITIZE_ADDRESS__)
>>
>> this is upstreamable
>>
>> +#ifdef __GNUC__
>> +# define break_optimization(arg) __asm__ __volatile__ ("" : : "r"
>> (arg) : "memory")
>> +#endif
>> +
>>
>> That's a nice piece of magic, let me use this too.
>
> For pointers/integers it can be even as simple as
> __asm__ ("" : "+g" (val));
> making compiler forget about val, obviously that doesn't work for
> aggregates.  Anyway, the reason for the macro is both that I wanted to
> avoid compiling in another file, and for LTO it wouldn't work anyway.
>
> BTW, I had to add -Wno-unused-but-set-variable option because without
> that there was a warning about one of the variables, I think it was
> stack_string in StrLenOOBTest.  Adding (void) stack_string; somewhere
> would shut it up.

I added break_optimization instead.

>
> Jakub


Re: [Patch, Fortran] No-op Patch - a.k.a. FINAL wrapper update

2012-11-30 Thread Tobias Burnus

Am 30.11.2012 11:22, schrieb Janus Weil:
In my version of the patch I introduced a routine 'gfc_is_finalizable' 
to perform this decision.


Okay. How about the following patch? It's the same without the renaming.

Build an regtested on x86-64-linux.*
OK for the trunk?

* * *

I will submit your gfc_is_finalizable together with some other auxiliary 
changes after that patch has been accepted.


I know that even with the auxiliary functions added and the updated 
finalization wrapper, it will take some work to get the remaining issues 
fixed. I am not sure that the automatic deallocation in the main program 
is really the most pressing issue with regards to finalization, but it 
surely one of the items.


Tobias

* I really like the GCC Build farm.
2012-11-27  Tobias Burnus  

	PR fortran/37336
	* class.c (finalizer_insert_packed_call): New static function.
	(finalize_component, generate_finalization_wrapper):
	Fix coarray handling and packing.

diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c
index 2e347cb..1271300 100644
--- a/gcc/fortran/class.c
+++ b/gcc/fortran/class.c
@@ -731,7 +731,7 @@ has_finalizer_component (gfc_symbol *derived)
 
 static void
 finalize_component (gfc_expr *expr, gfc_symbol *derived, gfc_component *comp,
-		gfc_expr *stat, gfc_code **code)
+		gfc_symbol *stat, gfc_symbol *fini_coarray, gfc_code **code)
 {
   gfc_expr *e;
   gfc_ref *ref;
@@ -779,12 +779,36 @@ finalize_component (gfc_expr *expr, gfc_symbol *derived, gfc_component *comp,
   e->rank = ref->next->u.ar.as->rank;
 }
 
+  /* Call DEALLOCATE (comp, stat=ignore).  */
   if (comp->attr.allocatable
   || (comp->ts.type == BT_CLASS && CLASS_DATA (comp)
 	  && CLASS_DATA (comp)->attr.allocatable))
 {
-  /* Call DEALLOCATE (comp, stat=ignore).  */
-  gfc_code *dealloc;
+  gfc_code *dealloc, *block = NULL;
+
+  /* Add IF (fini_coarray).  */
+  if (comp->attr.codimension
+	  || (comp->ts.type == BT_CLASS && CLASS_DATA (comp)
+	  && CLASS_DATA (comp)->attr.allocatable))
+	{
+	  block = XCNEW (gfc_code);
+	  if (*code)
+	{
+	  (*code)->next = block;
+	  (*code) = (*code)->next;
+	}
+	  else
+	  (*code) = block;
+
+	  block->loc = gfc_current_locus;
+	  block->op = EXEC_IF;
+
+	  block->block = XCNEW (gfc_code);
+	  block = block->block;
+	  block->loc = gfc_current_locus;
+	  block->op = EXEC_IF;
+	  block->expr1 = gfc_lval_expr_from_sym (fini_coarray);
+	}
 
   dealloc = XCNEW (gfc_code);
   dealloc->op = EXEC_DEALLOCATE;
@@ -792,9 +816,11 @@ finalize_component (gfc_expr *expr, gfc_symbol *derived, gfc_component *comp,
 
   dealloc->ext.alloc.list = gfc_get_alloc ();
   dealloc->ext.alloc.list->expr = e;
+  dealloc->expr1 = gfc_lval_expr_from_sym (stat);
 
-  dealloc->expr1 = stat;
-  if (*code)
+  if (block)
+	block->next = dealloc;
+  else if (*code)
 	{
 	  (*code)->next = dealloc;
 	  (*code) = (*code)->next;
@@ -839,7 +865,7 @@ finalize_component (gfc_expr *expr, gfc_symbol *derived, gfc_component *comp,
   gfc_component *c;
 
   for (c = comp->ts.u.derived->components; c; c = c->next)
-	finalize_component (e, c->ts.u.derived, c, stat, code);
+	finalize_component (e, comp->ts.u.derived, c, stat, fini_coarray, code);
   gfc_free_expr (e);
 }
 }
@@ -847,12 +873,11 @@ finalize_component (gfc_expr *expr, gfc_symbol *derived, gfc_component *comp,
 
 /* Generate code equivalent to
CALL C_F_POINTER (TRANSFER (TRANSFER (C_LOC (array, cptr), c_intptr)
-		 + idx * STORAGE_SIZE (array)/NUMERIC_STORAGE_SIZE., c_ptr),
-		 ptr).  */
+		 + idx * stride, c_ptr), ptr).  */
 
 static gfc_code *
 finalization_scalarizer (gfc_symbol *idx, gfc_symbol *array, gfc_symbol *ptr,
-			 gfc_namespace *sub_ns)
+			 gfc_expr *stride, gfc_namespace *sub_ns)
 {
   gfc_code *block;
   gfc_expr *expr, *expr2, *expr3;
@@ -919,40 +944,13 @@ finalization_scalarizer (gfc_symbol *idx, gfc_symbol *array, gfc_symbol *ptr,
   expr->ts.kind = gfc_index_integer_kind;
   expr2->value.function.actual->expr = expr;
 
-  /* STORAGE_SIZE (...) / NUMERIC_STORAGE_SIZE.  */
-  block->ext.actual->expr = gfc_get_expr ();
-  expr = block->ext.actual->expr;
-  expr->expr_type = EXPR_OP;
-  expr->value.op.op = INTRINSIC_DIVIDE;
-
-  /* STORAGE_SIZE (array,kind=c_intptr_t).  */
-  expr->value.op.op1 = gfc_get_expr ();
-  expr->value.op.op1->expr_type = EXPR_FUNCTION;
-  expr->value.op.op1->value.function.isym
-		= gfc_intrinsic_function_by_id (GFC_ISYM_STORAGE_SIZE);
-  gfc_get_sym_tree ("storage_size", sub_ns, &expr->value.op.op1->symtree,
-		false);
-  expr->value.op.op1->symtree->n.sym->attr.flavor = FL_PROCEDURE;
-  expr->value.op.op1->symtree->n.sym->attr.intrinsic = 1;
-  gfc_commit_symbol (expr->value.op.op1->symtree->n.sym);
-  expr->value.op.op1->value.function.actual = gfc_get_actual_arglist ();
-  expr->value.op.op1->value.function.actual->expr
-		= gfc_lval_expr_from_sym (array);
-  expr->value.op.op1->value.function.actual->n

Re: [PATCH] Fix allocation of reg_known_value

2012-11-30 Thread Uros Bizjak
On Fri, Nov 30, 2012 at 11:42 AM, Steven Bosscher  wrote:
> On Fri, Nov 30, 2012 at 11:39 AM, Paolo Bonzini  wrote:
>> Il 29/11/2012 23:47, Uros Bizjak ha scritto:
>>> This one-liner causes following runtime test failure [1] for
>>> alphaev68-linux-gnu:
>>>
>>> FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2
>>> FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2
>>> -fomit-frame-pointer -finline-functions
>>> FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2
>>> -fomit-frame-pointer -finline-functions -funroll-loops
>>> FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2
>>> -fbounds-check
>>>
>>> The patch miscompiles libgfortran library.
>>>
>>> I will provide more info tomorrow, any hint what/where should I look
>>> for differences?
>>
>> Anywhere.  The bug disabled a large part of alias analysis.
>>
>> Perhaps you can bisect it and backport the fix along the steps.
>
> Or open a PR and assign it to me, ultimately I'm responsible for this 
> breakage.

I have found the problem. Luckily, it is the testcase that is
miscompiled. The problem is in sched1 pass that moves write to an
address "addrX" that happens to be aliased with AND mutilated address
"addrY & -7". The improved alias analysis does not notice that the
write is inside (addr & -7) region, allowing write to be moved after
read.

I'm preparing a detailed PR.

Thanks,
Uros.


Re: [i386] scalar ops that preserve the high part of a vector

2012-11-30 Thread Marc Glisse

On Sun, 14 Oct 2012, Marc Glisse wrote:


On Sun, 14 Oct 2012, Uros Bizjak wrote:


On Sat, Oct 13, 2012 at 10:52 AM, Marc Glisse  wrote:

Hello,

this patch provides an alternate pattern to let combine recognize scalar
operations that preserve the high part of a vector. If the strategy is all
right, I could do the same for more operations (mul, div, ...). Something
similar is also possible for V4SF (different pattern though), but probably
not as useful.


But, we _do_ have vec_merge pattern that describes the operation.
Adding another one to each operation just to satisfy combine is IMO
not correct approach.


At some point I wondered about _replacing_ the existing pattern, so there 
would only be one ;-)


The vec_merge pattern takes as argument 2 vectors instead of a vector and a 
scalar, and describes the operation as a vector operation where we drop half 
of the result, instead of a scalar operation where we re-add the top half of 
the vector. I don't know if that's the most convenient choice. Adding code in 
simplify-rtx to replace vec_merge with vec_concat / vec_select might be 
easier than the other way around.



If the middle-end somehow gave us:
(plus X (vec_concat Y 0))
it would seem a bit strange to add an optimization that turns it into:
(vec_merge (plus X (subreg:V2DF Y)) X 1)
but then producing:
(vec_concat (plus (vec_select X 0) Y) (vec_select X 1))
would be strange as well.
(ignoring the signed zero issues here)


I'd rather see generic RTX simplification that
simplifies your proposed pattern to vec_merge pattern.


Ok, I'll see what I can do.


Also, as you mention in PR54855, Comment #5, the approach is too fragile...


I am not sure I can make the RTX simplification much less fragile... Whenever 
I see (vec_concat X (vec_select Y 1)), I would have to check whether X is 
some (possibly large) tree of scalar computations involving Y[0], move it all 
to vec_merge computations, and fix other users of some of those scalars to 
now use S[0]. Seems too hard, I would stop at single-operation X that is used 
only once. Besides, the gain is larger in proportion when there is a single 
operation :-)


Thank you for your comments,


Hello,

I experimented with the simplify-rtx transformation you suggested, see:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855

It works when the argument is a register, but not for memory (which is 
where the constant is in the testcase). And the description of the 
operation in sse.md does seem problematic. It says the second argument is:


(match_operand:VF_128 2 "nonimmediate_operand" "xm,xm"))

but Intel's documentation says "The source operand can be an XMM register 
or a 64-bit memory location", not quite the same.


Do you think the .md description should really stay this way, or could we 
change it to something that better reflects "64-bit memory location"?


--
Marc Glisse


[PATCH] Do not allocate PRE value-ids in get_expr_value_id

2012-11-30 Thread Richard Biener

Still verifying some of my PRE assumtions by making the code less
obscure ...

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-11-30  Richard Biener  

* tree-ssa-pre.c (get_expr_value_id): Do not allocate value-ids
here.

Index: gcc/tree-ssa-pre.c
===
--- gcc/tree-ssa-pre.c  (revision 193990)
+++ gcc/tree-ssa-pre.c  (working copy)
@@ -616,7 +616,7 @@ get_expr_value_id (pre_expr expr)
   switch (expr->kind)
 {
 case CONSTANT:
-  id = get_or_alloc_constant_value_id (PRE_EXPR_CONSTANT (expr));
+  id = get_constant_value_id (PRE_EXPR_CONSTANT (expr));
   break;
 case NAME:
   id = VN_INFO (PRE_EXPR_NAME (expr))->value_id;


Re: [PATCH, RFC] PR 55415 : Pessimistic misalignment from eipa_sra pass

2012-11-30 Thread Martin Jambor
On Thu, Nov 29, 2012 at 11:06:41AM +0100, Martin Jambor wrote:
> Hi,
> 
> thanks for the review.  When writing a reply I realized I indeed made
> a mistake or two in the part concerning prev_base and the code was not
> what it intended to be.  I'll re-write it today.

OK, this is it.  The part in tree-sra.c, in which I make IPA-SRA punt
when the alignment of the loads in the callee are not as big as
alignment of the required types of the new arguments, remained the
same.

In ipa-prop.h, I now use maximum of the alignment from
get_pointer_alignment_1 and the type alignment propagated from the
callee.  I now also recognize cases where the dereference is buried
below a COMPONENT_REF or ARRAY_REF (this is checked by new testcase
ipa-sra-9.c which fails at least on on sparc64 when things go wrong).

The patch has passed bootstrap and testing on x86_64-linux,
ppc64-linux and sparc64-linux.

Thanks,

Martin


2012-11-29  Martin Jambor  

PR middle-end/52890
PR tree-optimization/55415
PR tree-optimization/54386
PR target/55448
* ipa-prop.c (ipa_modify_call_arguments): Be optimistic when
get_pointer_alignment_1 returns false and the base was not a
dereference.
* tree-sra.c (access_precludes_ipa_sra_p): New parameter req_align,
added check for required alignment.  Update the user.

* testsuite/gcc.dg/ipa/ipa-sra-7.c: New test.
* testsuite/gcc.dg/ipa/ipa-sra-8.c: Likewise.
* testsuite/gcc.dg/ipa/ipa-sra-9.c: Likewise.
* testsuite/gcc.target/i386/pr55448.c: Likewise.


*** /tmp/cY007b_ipa-prop.c  Fri Nov 30 12:26:26 2012
--- gcc/ipa-prop.c  Thu Nov 29 16:05:04 2012
*** ipa_modify_call_arguments (struct cgraph
*** 2888,2893 
--- 2888,2895 
{
  tree expr, base, off;
  location_t loc;
+ unsigned int deref_align;
+ bool deref_base = false;
  
  /* We create a new parameter out of the value of the old one, we can
 do the following kind of transformations:
*** ipa_modify_call_arguments (struct cgraph
*** 2920,2928 
{
  HOST_WIDE_INT base_offset;
  tree prev_base;
  
  if (TREE_CODE (base) == ADDR_EXPR)
!   base = TREE_OPERAND (base, 0);
  prev_base = base;
  base = get_addr_base_and_unit_offset (base, &base_offset);
  /* Aggregate arguments can have non-invariant addresses.  */
--- 2922,2936 
{
  HOST_WIDE_INT base_offset;
  tree prev_base;
+ bool addrof;
  
  if (TREE_CODE (base) == ADDR_EXPR)
!   {
! base = TREE_OPERAND (base, 0);
! addrof = true;
!   }
! else
!   addrof = false;
  prev_base = base;
  base = get_addr_base_and_unit_offset (base, &base_offset);
  /* Aggregate arguments can have non-invariant addresses.  */
*** ipa_modify_call_arguments (struct cgraph
*** 2934,2939 
--- 2942,2952 
}
  else if (TREE_CODE (base) == MEM_REF)
{
+ if (!addrof)
+   {
+ deref_base = true;
+ deref_align = TYPE_ALIGN (TREE_TYPE (base));
+   }
  off = build_int_cst (adj->alias_ptr_type,
   base_offset
   + adj->offset / BITS_PER_UNIT);
*** ipa_modify_call_arguments (struct cgraph
*** 2956,2962 
  unsigned int align;
  unsigned HOST_WIDE_INT misalign;
  
! get_pointer_alignment_1 (base, &align, &misalign);
  misalign += (tree_to_double_int (off)
   .sext (TYPE_PRECISION (TREE_TYPE (off))).low
   * BITS_PER_UNIT);
--- 2969,2985 
  unsigned int align;
  unsigned HOST_WIDE_INT misalign;
  
! if (deref_base)
!   {
! align = deref_align;
! misalign = 0;
!   }
! else
!   {
! get_pointer_alignment_1 (base, &align, &misalign);
! if (TYPE_ALIGN (type) > align)
!   align = TYPE_ALIGN (type);
!   }
  misalign += (tree_to_double_int (off)
   .sext (TYPE_PRECISION (TREE_TYPE (off))).low
   * BITS_PER_UNIT);
*** /dev/null   Thu Oct 25 13:49:30 2012
--- gcc/testsuite/gcc.dg/ipa/ipa-sra-7.cWed Nov 28 14:19:30 2012
***
*** 0 
--- 1,42 
+ /* { dg-do run } */
+ /* { dg-options "-O3" } */
+ 
+ typedef unsigned int myint __attribute__((aligned(1)));
+ 
+ typedef struct __attribute__((packed)) S {
+   unsigned a, b, c;
+ } SS;
+ 
+ typedef SS __attribute__((aligned(1))) SSS;

Re: [PATCH] Fix allocation of reg_known_value

2012-11-30 Thread Uros Bizjak
On Fri, Nov 30, 2012 at 1:29 PM, Uros Bizjak  wrote:

 This one-liner causes following runtime test failure [1] for
 alphaev68-linux-gnu:

 FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2
 FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2
 -fomit-frame-pointer -finline-functions
 FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2
 -fomit-frame-pointer -finline-functions -funroll-loops
 FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2
 -fbounds-check

 The patch miscompiles libgfortran library.

 I will provide more info tomorrow, any hint what/where should I look
 for differences?
>>>
>>> Anywhere.  The bug disabled a large part of alias analysis.
>>>
>>> Perhaps you can bisect it and backport the fix along the steps.
>>
>> Or open a PR and assign it to me, ultimately I'm responsible for this 
>> breakage.
>
> I have found the problem. Luckily, it is the testcase that is
> miscompiled. The problem is in sched1 pass that moves write to an
> address "addrX" that happens to be aliased with AND mutilated address
> "addrY & -7". The improved alias analysis does not notice that the
> write is inside (addr & -7) region, allowing write to be moved after
> read.
>
> I'm preparing a detailed PR.

Reported as PR55547.

Uros.


Re: [PATCH] Filter out -fsanitize=address if not in combined tree in libiberty

2012-11-30 Thread H.J. Lu
On Fri, Nov 30, 2012 at 12:45 AM, Richard Biener
 wrote:
> On Thu, Nov 29, 2012 at 6:40 PM, H.J. Lu  wrote:
>> Hi,
>>
>> When GCC is configured with --with-build-config="bootstrap-asan", all
>> -flto tests will fail since -fsanitize=address is used to compile host
>> libiberty, which is used to create liblto_plugin.so, and linker isn't
>> compiled with -fsanitize=address.  This patch filters out
>> -fsanitize=address from CFLAGS if we aren't in a combined tree with
>> binutils.  OK to install?
>
> Why not simply ensure that only host _executables_ are sanitized?
>

Host libiberty library is used to create both host executables and
host plugins. We don't have separate host libiberty libraries for them.


-- 
H.J.


Re: [PATCH, RFC] PR 55415 : Pessimistic misalignment from eipa_sra pass

2012-11-30 Thread Richard Biener
On Fri, Nov 30, 2012 at 2:02 PM, Martin Jambor  wrote:
> On Thu, Nov 29, 2012 at 11:06:41AM +0100, Martin Jambor wrote:
>> Hi,
>>
>> thanks for the review.  When writing a reply I realized I indeed made
>> a mistake or two in the part concerning prev_base and the code was not
>> what it intended to be.  I'll re-write it today.
>
> OK, this is it.  The part in tree-sra.c, in which I make IPA-SRA punt
> when the alignment of the loads in the callee are not as big as
> alignment of the required types of the new arguments, remained the
> same.
>
> In ipa-prop.h, I now use maximum of the alignment from
> get_pointer_alignment_1 and the type alignment propagated from the
> callee.  I now also recognize cases where the dereference is buried
> below a COMPONENT_REF or ARRAY_REF (this is checked by new testcase
> ipa-sra-9.c which fails at least on on sparc64 when things go wrong).
>
> The patch has passed bootstrap and testing on x86_64-linux,
> ppc64-linux and sparc64-linux.

Works for me.

Thanks,
Richard.

> Thanks,
>
> Martin
>
>
> 2012-11-29  Martin Jambor  
>
> PR middle-end/52890
> PR tree-optimization/55415
> PR tree-optimization/54386
> PR target/55448
> * ipa-prop.c (ipa_modify_call_arguments): Be optimistic when
> get_pointer_alignment_1 returns false and the base was not a
> dereference.
> * tree-sra.c (access_precludes_ipa_sra_p): New parameter req_align,
> added check for required alignment.  Update the user.
>
> * testsuite/gcc.dg/ipa/ipa-sra-7.c: New test.
> * testsuite/gcc.dg/ipa/ipa-sra-8.c: Likewise.
> * testsuite/gcc.dg/ipa/ipa-sra-9.c: Likewise.
> * testsuite/gcc.target/i386/pr55448.c: Likewise.
>
>
> *** /tmp/cY007b_ipa-prop.c  Fri Nov 30 12:26:26 2012
> --- gcc/ipa-prop.c  Thu Nov 29 16:05:04 2012
> *** ipa_modify_call_arguments (struct cgraph
> *** 2888,2893 
> --- 2888,2895 
> {
>   tree expr, base, off;
>   location_t loc;
> + unsigned int deref_align;
> + bool deref_base = false;
>
>   /* We create a new parameter out of the value of the old one, we can
>  do the following kind of transformations:
> *** ipa_modify_call_arguments (struct cgraph
> *** 2920,2928 
> {
>   HOST_WIDE_INT base_offset;
>   tree prev_base;
>
>   if (TREE_CODE (base) == ADDR_EXPR)
> !   base = TREE_OPERAND (base, 0);
>   prev_base = base;
>   base = get_addr_base_and_unit_offset (base, &base_offset);
>   /* Aggregate arguments can have non-invariant addresses.  */
> --- 2922,2936 
> {
>   HOST_WIDE_INT base_offset;
>   tree prev_base;
> + bool addrof;
>
>   if (TREE_CODE (base) == ADDR_EXPR)
> !   {
> ! base = TREE_OPERAND (base, 0);
> ! addrof = true;
> !   }
> ! else
> !   addrof = false;
>   prev_base = base;
>   base = get_addr_base_and_unit_offset (base, &base_offset);
>   /* Aggregate arguments can have non-invariant addresses.  */
> *** ipa_modify_call_arguments (struct cgraph
> *** 2934,2939 
> --- 2942,2952 
> }
>   else if (TREE_CODE (base) == MEM_REF)
> {
> + if (!addrof)
> +   {
> + deref_base = true;
> + deref_align = TYPE_ALIGN (TREE_TYPE (base));
> +   }
>   off = build_int_cst (adj->alias_ptr_type,
>base_offset
>+ adj->offset / BITS_PER_UNIT);
> *** ipa_modify_call_arguments (struct cgraph
> *** 2956,2962 
>   unsigned int align;
>   unsigned HOST_WIDE_INT misalign;
>
> ! get_pointer_alignment_1 (base, &align, &misalign);
>   misalign += (tree_to_double_int (off)
>.sext (TYPE_PRECISION (TREE_TYPE (off))).low
>* BITS_PER_UNIT);
> --- 2969,2985 
>   unsigned int align;
>   unsigned HOST_WIDE_INT misalign;
>
> ! if (deref_base)
> !   {
> ! align = deref_align;
> ! misalign = 0;
> !   }
> ! else
> !   {
> ! get_pointer_alignment_1 (base, &align, &misalign);
> ! if (TYPE_ALIGN (type) > align)
> !   align = TYPE_ALIGN (type);
> !   }
>   misalign += (tree_to_double_int (off)
>.sext (TYPE_PRECISION (TREE_TYPE (off))).low
>* BITS_PER_UNIT);
> *** /dev/null   Thu Oct 25 13:49:30 2012
> --- gcc/testsuite/gcc.dg/ipa/ipa-sra-7.c   

Re: [PATCH] Filter out -fsanitize=address if not in combined tree in libiberty

2012-11-30 Thread Richard Biener
On Fri, Nov 30, 2012 at 2:17 PM, H.J. Lu  wrote:
> On Fri, Nov 30, 2012 at 12:45 AM, Richard Biener
>  wrote:
>> On Thu, Nov 29, 2012 at 6:40 PM, H.J. Lu  wrote:
>>> Hi,
>>>
>>> When GCC is configured with --with-build-config="bootstrap-asan", all
>>> -flto tests will fail since -fsanitize=address is used to compile host
>>> libiberty, which is used to create liblto_plugin.so, and linker isn't
>>> compiled with -fsanitize=address.  This patch filters out
>>> -fsanitize=address from CFLAGS if we aren't in a combined tree with
>>> binutils.  OK to install?
>>
>> Why not simply ensure that only host _executables_ are sanitized?
>>
>
> Host libiberty library is used to create both host executables and
> host plugins. We don't have separate host libiberty libraries for them.

So don't instrument libiberty then.

Richard.

>
> --
> H.J.


Re: [PATCH] Enable non-complex math builtins from C99 for Bionic

2012-11-30 Thread Richard Biener
On Fri, Nov 30, 2012 at 2:17 PM, Alexander Ivchenko  wrote:
> Bionic has the support of almost all built-in functions from C99 iso
> standard except for only complex math functions.
> (I assume because bionic wants to be as small as possible and nobody
> wants to do complex arithmetic on Android).
> GCC however checks whether the runtime has the full support of C99 or
> not, and if not it completely turns off all C99 builtins.
>
> It seems that we are missing the optimization opportunity here and
> this patch (attached) is intended to turn on built-ins that are
> supported
> in bionic.

If you want to split TARGET_C99_FUNCTIONS then split it properly,
don't add TARGET_HAS_BIONIC to the selection.

Joseph may provide some guidance here.

Richard.

> bootstrap/tests are ok.


Re: [i386] scalar ops that preserve the high part of a vector

2012-11-30 Thread Uros Bizjak
On Fri, Nov 30, 2012 at 1:34 PM, Marc Glisse  wrote:

> Hello,
>
> I experimented with the simplify-rtx transformation you suggested, see:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855
>
> It works when the argument is a register, but not for memory (which is where
> the constant is in the testcase). And the description of the operation in
> sse.md does seem problematic. It says the second argument is:
>
> (match_operand:VF_128 2 "nonimmediate_operand" "xm,xm"))
>
> but Intel's documentation says "The source operand can be an XMM register or
> a 64-bit memory location", not quite the same.
>
> Do you think the .md description should really stay this way, or could we
> change it to something that better reflects "64-bit memory location"?

For reference, we are talking about:

(define_insn "_vm3"
  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
(vec_merge:VF_128
  (plusminus:VF_128
(match_operand:VF_128 1 "register_operand" "0,x")
(match_operand:VF_128 2 "nonimmediate_operand" "xm,xm"))
  (match_dup 1)
  (const_int 1)))]
  "TARGET_SSE"
  "@
   \t{%2, %0|%0, %2}
   v\t{%2, %1, %0|%0, %1, %2}"
  [(set_attr "isa" "noavx,avx")
   (set_attr "type" "sseadd")
   (set_attr "prefix" "orig,vex")
   (set_attr "mode" "")])

No, looking at your description, the operand 2 should be scalar
operand (we use _s{s,d} scalar instruction here), and for doubles this
should refer to 64bit memory location. I don't remember all the
details about vec_merge scalar instructions, but it looks to me that
canonical representation should be more like your proposal:

+(define_insn "*sse2_vmv2df3"
+  [(set (match_operand:V2DF 0 "register_operand" "=x,x")
+(vec_concat:V2DF
+  (plusminus:DF
+(vec_select:DF
+  (match_operand:V2DF 1 "register_operand" "0,x")
+  (parallel [(const_int 0)]))
+(match_operand:DF 2 "nonimmediate_operand" "xm,xm"))
+  (vec_select:DF (match_dup 1) (parallel [(const_int 1)]]
+  "TARGET_SSE2"

Uros.


Re: [PATCH] Filter out -fsanitize=address if not in combined tree in libiberty

2012-11-30 Thread H.J. Lu
On Fri, Nov 30, 2012 at 5:30 AM, Richard Biener
 wrote:
> On Fri, Nov 30, 2012 at 2:17 PM, H.J. Lu  wrote:
>> On Fri, Nov 30, 2012 at 12:45 AM, Richard Biener
>>  wrote:
>>> On Thu, Nov 29, 2012 at 6:40 PM, H.J. Lu  wrote:
 Hi,

 When GCC is configured with --with-build-config="bootstrap-asan", all
 -flto tests will fail since -fsanitize=address is used to compile host
 libiberty, which is used to create liblto_plugin.so, and linker isn't
 compiled with -fsanitize=address.  This patch filters out
 -fsanitize=address from CFLAGS if we aren't in a combined tree with
 binutils.  OK to install?
>>>
>>> Why not simply ensure that only host _executables_ are sanitized?
>>>
>>
>> Host libiberty library is used to create both host executables and
>> host plugins. We don't have separate host libiberty libraries for them.
>
> So don't instrument libiberty then.
>

When you bootstrap, all host libraries are compiled with the
same CFLAGS as gcc itself in 3 stages.  That means stage2
and stage3 libiberty are compiled with -fsanitize=address if
it is added to stage2/stage3 CFLAGS for gcc.  My patch filters it
out when compiling stage2/stage3 libiberty.

-- 
H.J.


Merge from trunk to gccgo branch

2012-11-30 Thread Ian Lance Taylor
I merged trunk revision 193985 to the gccgo branch.

Ian


Re: [RFA 7/8] validate_failures.py: New options --manifest_subdir, --manifest_name

2012-11-30 Thread Diego Novillo
On Thu, Nov 29, 2012 at 6:12 PM, Doug Evans  wrote:
> Diego Novillo writes:
>  > On Sat, Nov 24, 2012 at 5:58 PM, Doug Evans  wrote:
>  > > Hi.
>  > > This seventh patch adds new options --manifest_subdir, --manifest_name.
>  > > Useful when using validate_failures.py with a different tool, instead of 
> gcc.
>  > >
>  > > Ok to check in?
>  > >
>  > > 2012-11-24  Doug Evans  
>  > >
>  > > * testsuite-management/validate_failures.py: New options 
> --manifest_subdir,
>  > > --manifest_name.
>  >
>  > OK.
>
> This is another baby-steps kinda patch.
> Instead of adding --manifest_{subdir,name},
> this patch just parameterizes the gcc-specific manifest subdir,
> and provides a utility to compute the manifest path.
>
> Ok to check in?
>
> 2012-11-29  Doug Evans  
>
> * testsuite-management/validate_failures.py: Add function
> GetManifestPath.  New global _MANIFEST_SUBDIR.

OK.


Diego.


Re: [PATCH] asan_test.cc from llvm

2012-11-30 Thread Jakub Jelinek
On Fri, Nov 30, 2012 at 02:43:12PM +0400, Konstantin Serebryany wrote:
> I've upstreamed most of your changes, please take a look.
> Looks like we will be able to use libsanitizer/merge.sh to update the
> test code as well.

Thanks.

> > #ifndef ASAN_AVOID_EXPENSIVE_TESTS
> > (TEST(AddressSanitizer, ManyThreadsTest) {
> > ...
> > }
> > #endif
> > and I could in asan_test.C add:
> > // { dg-additional-options "-DASAN_AVOID_EXPENSIVE_TESTS" { target { ! 
> > run_expensive_tests } } }
> 
> I can add ASAN_ALLOW_SLOW_TESTS to asan_test_config.h and then #if
> ASAN_ALLOW_SLOW_TESTS around ManyThreadsTest.
> I did not do it yet, because I would like to understand why it's slow.
> On my box (with clang+gtest) it runs in 0.6 seconds, and the entire
> test suite takes 60 seconds.
> Note that when running gtest tests in llvm test harness the tests are
> sharded, so on a multicore machine the test takes < 10 seconds.

Seems it is quite random, often
./asan_test.exe AddressSanitizer_ManyThreadsTest
completes within a second, but from time to time, it takes 30 seconds
instead, and occassionally even much more than that (those 4 minutes I saw).
I'm using the default distro ulimit -u:
1024
and I have certainly more than 24 other processes running around, and
furthermore the test doesn't bother to check return values of e.g.
pthread_create (which is IMHO quite important, otherwise the thread handle
is some random garbage and calling pthread_join on it is risky).

So, perhaps we shouldn't disable the test completely for
#ifdef ASAN_AVOID_EXPENSIVE_TESTS, but rather use 32 even for 64-bit
architectures if it is defined.  And/or do getrlimit (RLIMIT_PROC, &rl)
and max the number of threads with say half or quarter of the maximum.
I remember the go testsuite did something similar at one point, also wanted
to create around 1000 threads, and that resulted for months in weirdo
random testsuite failures elsewhere during parallel testing, when the go
testcase created too many threads and then there were no further user
processes allowed for the user that was testing it and so fork failed
somewhere.  My preference would be both using 32 by default and
more only when requesting expensive tests, and limit it with RLIMIT_PROC
half or quarter when running expensive tests.  And definitely, please
adjust all test to handle pthread_create failures.  In this particular
test it should just stop the loop and do the next loop only as many threads
as have been successfully created.  In other cases it might be just return;
or similar.

Jakub


Re: [PATCH] Fix allocation of reg_known_value

2012-11-30 Thread Steven Bosscher
On Fri, Nov 30, 2012 at 1:29 PM, Uros Bizjak wrote:
> On Fri, Nov 30, 2012 at 11:42 AM, Steven Bosscher wrote:
>> On Fri, Nov 30, 2012 at 11:39 AM, Paolo Bonzini wrote:
>>> Il 29/11/2012 23:47, Uros Bizjak ha scritto:
 This one-liner causes following runtime test failure [1] for
 alphaev68-linux-gnu:

 FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2
 FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2
 -fomit-frame-pointer -finline-functions
 FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2
 -fomit-frame-pointer -finline-functions -funroll-loops
 FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2
 -fbounds-check

 The patch miscompiles libgfortran library.

 I will provide more info tomorrow, any hint what/where should I look
 for differences?
>>>
>>> Anywhere.  The bug disabled a large part of alias analysis.
>>>
>>> Perhaps you can bisect it and backport the fix along the steps.
>>
>> Or open a PR and assign it to me, ultimately I'm responsible for this 
>> breakage.
>
> I have found the problem. Luckily, it is the testcase that is
> miscompiled. The problem is in sched1 pass that moves write to an
> address "addrX" that happens to be aliased with AND mutilated address
> "addrY & -7". The improved alias analysis does not notice that the
> write is inside (addr & -7) region, allowing write to be moved after
> read.

Based on this description of the problem, I remembered these patches
from Alexandre:

2012-07-06  Alexandre Oilva  <>

PR rtl-optimization/53827
PR debug/53671
PR debug/49888
* alias.c (memrefs_conflict_p): Adjust offset and size by the
same amount for alignment ANDs.

2012-06-21  Alexandre Oliva  <>

PR debug/53671
PR debug/49888
* alias.c (memrefs_conflict_p): Improve handling of AND for alignment.


They are r188868 and r189325. Can you try revert these and see if that
fixes your problem (probably at the expense of some other problem
returning, but at least it'd be a good place to start looking
further...).

Ciao!
Steven


Re: [PATCH] Fix allocation of reg_known_value

2012-11-30 Thread Steven Bosscher
On Fri, Nov 30, 2012 at 3:51 PM, Steven Bosscher wrote:
> They are r188868 and r189325.

This'd be the diff:
http://gcc.gnu.org/viewcvs/trunk/gcc/alias.c?r1=188448&r2=189325&pathrev=189325

Ciao!
Steven


[PATCH] Stream profile summary histogram through LTO files (issue6782131)

2012-11-30 Thread Teresa Johnson
Revised patch to ensure that histograms from the profile summary are streamed
through the LTO files so that the working set can be computed for use in
downstream optimizations.

Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk?

Thanks,
Teresa

2012-11-29  Teresa Johnson  

* lto-cgraph.c (output_profile_summary): Stream out sum_all
and histogram.
(input_profile_summary): Stream in sum_all and histogram.
(merge_profile_summaries): Merge sum_all and histogram, and
change to use RDIV.
(input_symtab): Call compute_working_sets after merging
summaries.
* gcov-io.c (gcov_histo_index): Make extern for compiler.
* gcov-io.h (gcov_histo_index): Ditto.
* profile.c (compute_working_sets): Remove static keyword.
* profile.h (compute_working_sets): Ditto.
* Makefile.in (lto-cgraph.o): Depend on profile.h.

Index: lto-cgraph.c
===
--- lto-cgraph.c(revision 193909)
+++ lto-cgraph.c(working copy)
@@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-streamer.h"
 #include "gcov-io.h"
 #include "tree-pass.h"
+#include "profile.h"
 
 static void output_cgraph_opt_summary (void);
 static void input_cgraph_opt_summary (vec  nodes);
@@ -593,14 +594,39 @@ lto_output_ref (struct lto_simple_output_block *ob
 static void
 output_profile_summary (struct lto_simple_output_block *ob)
 {
+  unsigned h_ix;
+  struct bitpack_d bp;
+
   if (profile_info)
 {
-  /* We do not output num, sum_all and run_max, they are not used by
-GCC profile feedback and they are difficult to merge from multiple
-units.  */
+  /* We do not output num and run_max, they are not used by
+ GCC profile feedback and they are difficult to merge from multiple
+ units.  */
   gcc_assert (profile_info->runs);
   streamer_write_uhwi_stream (ob->main_stream, profile_info->runs);
   streamer_write_uhwi_stream (ob->main_stream, profile_info->sum_max);
+
+  /* sum_all is needed for computing the working set with the
+ histogram.  */
+  streamer_write_uhwi_stream (ob->main_stream, profile_info->sum_all);
+
+  /* Create and output a bitpack of non-zero histogram entries indices.  */
+  bp = bitpack_create (ob->main_stream);
+  for (h_ix = 0; h_ix < GCOV_HISTOGRAM_SIZE; h_ix++)
+bp_pack_value (&bp, profile_info->histogram[h_ix].num_counters > 0, 1);
+  streamer_write_bitpack (&bp);
+  /* Now stream out only those non-zero entries.  */
+  for (h_ix = 0; h_ix < GCOV_HISTOGRAM_SIZE; h_ix++)
+{
+  if (!profile_info->histogram[h_ix].num_counters)
+continue;
+  streamer_write_uhwi_stream (ob->main_stream,
+  
profile_info->histogram[h_ix].num_counters);
+  streamer_write_uhwi_stream (ob->main_stream,
+  profile_info->histogram[h_ix].min_value);
+  streamer_write_uhwi_stream (ob->main_stream,
+  profile_info->histogram[h_ix].cum_value);
+}
 }
   else
 streamer_write_uhwi_stream (ob->main_stream, 0);
@@ -1227,11 +1253,38 @@ static void
 input_profile_summary (struct lto_input_block *ib,
   struct lto_file_decl_data *file_data)
 {
+  unsigned h_ix;
+  struct bitpack_d bp;
   unsigned int runs = streamer_read_uhwi (ib);
   if (runs)
 {
   file_data->profile_info.runs = runs;
   file_data->profile_info.sum_max = streamer_read_uhwi (ib);
+  file_data->profile_info.sum_all = streamer_read_uhwi (ib);
+
+  memset (file_data->profile_info.histogram, 0,
+  sizeof (gcov_bucket_type) * GCOV_HISTOGRAM_SIZE);
+  /* Input the bitpack of non-zero histogram indices.  */
+  bp = streamer_read_bitpack (ib);
+  /* Read in and unpack the full bitpack, flagging non-zero
+ histogram entries by setting the num_counters non-zero.  */
+  for (h_ix = 0; h_ix < GCOV_HISTOGRAM_SIZE; h_ix++)
+{
+  file_data->profile_info.histogram[h_ix].num_counters
+  = bp_unpack_value (&bp, 1);
+}
+  for (h_ix = 0; h_ix < GCOV_HISTOGRAM_SIZE; h_ix++)
+{
+  if (!file_data->profile_info.histogram[h_ix].num_counters)
+continue;
+
+  file_data->profile_info.histogram[h_ix].num_counters
+  = streamer_read_uhwi (ib);
+  file_data->profile_info.histogram[h_ix].min_value
+  = streamer_read_uhwi (ib);
+  file_data->profile_info.histogram[h_ix].cum_value
+  = streamer_read_uhwi (ib);
+}
 }
 
 }
@@ -1242,10 +1295,13 @@ static void
 merge_profile_summaries (struct lto_file_decl_data **file_data_vec)
 {
   struct lto_file_decl_data *file_data;
-  unsigned int j;
+  unsigned int j, h_ix;
   gcov_unsigned_t max_runs = 0;
   struct cgr

Re: [PATCH] Fix allocation of reg_known_value

2012-11-30 Thread Uros Bizjak
On Fri, Nov 30, 2012 at 3:51 PM, Steven Bosscher  wrote:

> This one-liner causes following runtime test failure [1] for
> alphaev68-linux-gnu:
>
> FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2
> FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2
> -fomit-frame-pointer -finline-functions
> FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2
> -fomit-frame-pointer -finline-functions -funroll-loops
> FAIL: gfortran.fortran-torture/execute/save_1.f90 execution,  -O2
> -fbounds-check
>
> The patch miscompiles libgfortran library.
>
> I will provide more info tomorrow, any hint what/where should I look
> for differences?

 Anywhere.  The bug disabled a large part of alias analysis.

 Perhaps you can bisect it and backport the fix along the steps.
>>>
>>> Or open a PR and assign it to me, ultimately I'm responsible for this 
>>> breakage.
>>
>> I have found the problem. Luckily, it is the testcase that is
>> miscompiled. The problem is in sched1 pass that moves write to an
>> address "addrX" that happens to be aliased with AND mutilated address
>> "addrY & -7". The improved alias analysis does not notice that the
>> write is inside (addr & -7) region, allowing write to be moved after
>> read.
>
> Based on this description of the problem, I remembered these patches
> from Alexandre:
>
> 2012-07-06  Alexandre Oilva  <>
>
> PR rtl-optimization/53827
> PR debug/53671
> PR debug/49888
> * alias.c (memrefs_conflict_p): Adjust offset and size by the
> same amount for alignment ANDs.
>
> 2012-06-21  Alexandre Oliva  <>
>
> PR debug/53671
> PR debug/49888
> * alias.c (memrefs_conflict_p): Improve handling of AND for alignment.
>
>
> They are r188868 and r189325. Can you try revert these and see if that
> fixes your problem (probably at the expense of some other problem
> returning, but at least it'd be a good place to start looking
> further...).

Yes backing out these revisions with attached patch fixes the runtime failure!

Uros.


alias.diff
Description: Binary data


[PATCH][Committed] Add self to write-after-approval section of MAINTAINERS file

2012-11-30 Thread Kyrylo Tkachov
Hi all,

I have committed the attached patch, adding myself to the Write After
Approval section of the MAINTAINERS file.

Thanks,
Kyrill

ChangeLog:

2012-11-30  Kyrylo Tkachov  

* MAINTAINERS (Write After Approval): Add myself.Index: MAINTAINERS
===
--- MAINTAINERS (revision 193996)
+++ MAINTAINERS (working copy)
@@ -529,6 +529,7 @@
 Samuel Tardieu s...@rfc1149.net
 Kresten Krab Thorupk...@gcc.gnu.org
 Caroline Tice  ct...@apple.com
+Kyrylo Tkachov kyrylo.tkac...@arm.com
 Konrad Trifunovic  konrad.trifuno...@inria.fr
 David Ung  dav...@mips.com
 Neil Vachharajani  nvach...@gmail.com


Re: [PATCH] asan_test.cc from llvm

2012-11-30 Thread Jakub Jelinek
Hi!

Here is updated patch, I've added also -fno-builtin which cured some
failures (will need to analyze exactly later on if it is just assumptions
of the test that are not valid with optimized memory/string etc. builtins
or if we have issues on the GCC side), the only change in the imported
files besides the two line header changes is a temporary change in the
ManyThreads test, and had to tweak slightly also the symbolizer if
asan prints [stack] in its backtrace, because that results in tcl errors.

OT, seems e.g. the pthread_create interceptor doesn't use
ENSURE_ASAN_INITED(), so when one links a threaded program not compiled with
-fsanitize=address and just LD_PRELOADs libasan.so.0 (or links against it),
then asan crashes.

Jakub


Re: [PATCH] Stream profile summary histogram through LTO files (issue6782131)

2012-11-30 Thread Jan Hubicka
> Revised patch to ensure that histograms from the profile summary are streamed
> through the LTO files so that the working set can be computed for use in
> downstream optimizations.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk?
> 
> Thanks,
> Teresa
> 
> 2012-11-29  Teresa Johnson  
> 
>   * lto-cgraph.c (output_profile_summary): Stream out sum_all
> and histogram.
>   (input_profile_summary): Stream in sum_all and histogram.
>   (merge_profile_summaries): Merge sum_all and histogram, and
> change to use RDIV.
>   (input_symtab): Call compute_working_sets after merging
> summaries.
>   * gcov-io.c (gcov_histo_index): Make extern for compiler.
>   * gcov-io.h (gcov_histo_index): Ditto.
>   * profile.c (compute_working_sets): Remove static keyword.
>   * profile.h (compute_working_sets): Ditto.
>   * Makefile.in (lto-cgraph.o): Depend on profile.h.

OK,
thanks!
Honza


Re: [PATCH] Don't bypass blocks with multiple latch edges (PR middle-end/54838)

2012-11-30 Thread Marek Polacek
On Fri, Nov 30, 2012 at 10:01:37AM +0100, Richard Biener wrote:
> RTL cprop seems to run both before and after RTL loop optimizers (currently
> after RTL loop optimizers we throw away loops - an arbitrary chosen point
> before IRA across which I could not get things to work).  Thus you could do
> 
>   if (current_loops)
> is_loop_header = bb == bb->loop_father->header;
>   else
> {
>   may_be_loop_header = false;
>   FOR_EACH_EDGE (e, ei, bb->preds)
> if (e->flags & EDGE_DFS_BACK)
>   {
> may_be_loop_header = true;
> break;
>   }
> }

I can do this as a followup.
 
> I don't understand
> 
>   /* The irreducible loops created by redirecting of edges entering the
>  loop from outside would decrease effectiveness of some of the
>  following optimizations, so prevent this.  */
>   if (may_be_loop_header
>   && !(e->flags & EDGE_DFS_BACK))
> {
>   ei_next (&ei);
>   continue;
> }
> 
> why isn't this simply
> 
>   if (may_be_loop_header)
> {
>   ei_next (&ei);
>   continue;
> }
> 
> ?  It looks like the code tries to allow "rotating" a loop - but that's only
> good if bb has exactly two predecessors (one entry and one latch edge).
> And even then it requires to manually update the loop structures (update
> what the new header and latch blocks are).
> 
> That said, removing the !(e->flags & EDGE_DFS_BACK) condition seems
> to fix the ICE.  Threading across a loop header is in fact complicated
> (see the special routine tree-ssa-threadupdate.c:thread_through_loop_header
> necessary for that).  Let's declare the GIMPLE level did all interesting
> threadings through headers.

Agreed.  This is the fix I had some time ago, but at that time it
didn't seem like such a great idea.  Done this time around.
Regtested/bootstrapped on x86_64-linux, ok for trunk?

2012-11-30  Marek Polacek  

PR middle-end/54838
* cprop.c (bypass_block): Skip header edges.

* gcc.dg/pr54838.c: New test.

--- gcc/cprop.c.mp  2012-11-29 15:49:53.120524295 +0100
+++ gcc/cprop.c 2012-11-30 10:30:23.509501957 +0100
@@ -1539,8 +1539,7 @@ bypass_block (basic_block bb, rtx setcc,
   /* The irreducible loops created by redirecting of edges entering the
 loop from outside would decrease effectiveness of some of the
 following optimizations, so prevent this.  */
-  if (may_be_loop_header
- && !(e->flags & EDGE_DFS_BACK))
+  if (may_be_loop_header)
{
  ei_next (&ei);
  continue;
--- gcc/testsuite/gcc.dg/pr54838.c.mp   2012-11-26 14:48:43.783980854 +0100
+++ gcc/testsuite/gcc.dg/pr54838.c  2012-11-29 17:43:19.397737779 +0100
@@ -0,0 +1,24 @@
+/* PR middle-end/54838 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-forward-propagate -ftracer" } */
+
+void bar (void);
+
+void
+foo (void *b, int *c)
+{
+again:
+  switch (*c)
+{
+case 1:
+  if (!b)
+   {
+ bar ();
+ return;
+   }
+  goto again;
+case 3:
+  if (!b)
+   goto again;
+}
+}

Marek


Re: [ARM] Turning off 64bits ops in Neon and gfortran/modulo-scheduling problem

2012-11-30 Thread Christophe Lyon
On 29 November 2012 21:59, Joseph S. Myers  wrote:
> On Thu, 29 Nov 2012, Christophe Lyon wrote:
>
>> 2012-11-28  Christophe Lyon  
>>
>> gcc/
>> * config/arm/arm-protos.h (tune_params): Add
>> prefer_neon_for_64bits field.
>> * config/arm/arm.c (prefer_neon_for_64bits): New variable.
>> (arm_slowmul_tune): Default prefer_neon_for_64bits to false.
>> (arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune): Ditto.
>> (arm_9e_tune, arm_v6t2_tune, arm_cortex_tune): Ditto.
>> (arm_cortex_a5_tune, arm_cortex_a15_tune): Ditto.
>> (arm_cortex_a9_tune, arm_fa726te_tune): Ditto.
>> (arm_option_override): Handle -mneon-for-64bits new option.
>> * config/arm/arm.h (TARGET_PREFER_NEON_64BITS): New macro.
>> (prefer_neon_for_64bits): Declare new variable.
>> * config/arm/arm.md (arch): Rename neon_onlya8 and neon_nota8 to
>> avoid_neon_for_64bits and neon_for_64bits.
>> (arch_enabled): Handle new arch types.
>> (one_cmpldi2): Use new arch names.
>> * config/arm/neon.md (adddi3_neon, subdi3_neon, iordi3_neon)
>> (anddi3_neon, xordi3_neon, ashldi3_neon, di3_neon): Use
>> neon_for_64bits instead of nota8 and avoid_neon_for_64bits instead
>> of onlya8.
>
> This ChangeLog entry doesn't appear to mention the arm.opt change.
> Furthermore, the patch seems to be missing any .texi change to document
> the option; any new option needs documentation.  You are also missing
> testcases for the testsuite to verify that both enabled and disabled
> states of the option work properly.
>
Indeed, I forgot about the documentation; here is an updated patch.

Regarding the testcases, as this patch disables transformations
recently introduced, I would have appreciated if testcases had been
associated with them in the 1st place This requirement should be
enforced :-)

Tested with qemu on target arm-none-linux-gnueabi.


2012-11-30  Christophe Lyon  

gcc/
* config/arm/arm-protos.h (tune_params): Add
prefer_neon_for_64bits field.
* config/arm/arm.c (prefer_neon_for_64bits): New variable.
(arm_slowmul_tune): Default prefer_neon_for_64bits to false.
(arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune): Ditto.
(arm_9e_tune, arm_v6t2_tune, arm_cortex_tune): Ditto.
(arm_cortex_a5_tune, arm_cortex_a15_tune): Ditto.
(arm_cortex_a9_tune, arm_fa726te_tune): Ditto.
(arm_option_override): Handle -mneon-for-64bits new option.
* config/arm/arm.h (TARGET_PREFER_NEON_64BITS): New macro.
(prefer_neon_for_64bits): Declare new variable.
* config/arm/arm.md (arch): Rename neon_onlya8 and neon_nota8 to
avoid_neon_for_64bits and neon_for_64bits.
(arch_enabled): Handle new arch types.
(one_cmpldi2): Use new arch names.
* config/arm/arm.opt (mneon-for-64bits): Add option.
* config/arm/neon.md (adddi3_neon, subdi3_neon, iordi3_neon)
(anddi3_neon, xordi3_neon, ashldi3_neon, di3_neon): Use
neon_for_64bits instead of nota8 and avoid_neon_for_64bits instead
of onlya8.
* doc/invoke.texi (-mneon-for-64bits): Document.

gcc/testsuite/
* gcc.target/arm/neon-for-64bits-1.c: New tests.
* gcc.target/arm/neon-for-64bits-2.c: Likewise.
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index d942c5b..c92f055 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -247,6 +247,8 @@ struct tune_params
  performance. The first element covers Thumb state and the second one
  is for ARM state.  */
   bool logical_op_non_short_circuit[2];
+  /* Prefer Neon for 64-bit bitops.  */
+  bool prefer_neon_for_64bits;
 };
 
 extern const struct tune_params *current_tune;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 286a6c5..9efd215 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -816,6 +816,10 @@ int arm_arch_thumb2;
 int arm_arch_arm_hwdiv;
 int arm_arch_thumb_hwdiv;
 
+/* Nonzero if we should use Neon to handle 64-bits operations rather
+   than core registers.  */
+int prefer_neon_for_64bits = 0;
+
 /* In case of a PRE_INC, POST_INC, PRE_DEC, POST_DEC memory reference,
we must report the mode of the memory reference from
TARGET_PRINT_OPERAND to TARGET_PRINT_OPERAND_ADDRESS.  */
@@ -895,6 +899,7 @@ const struct tune_params arm_slowmul_tune =
   arm_default_branch_cost,
   false,   /* Prefer LDRD/STRD.  */
   {true, true},/* Prefer non short 
circuit.  */
+  false /* Prefer Neon for 64-bits 
bitops.  */
 };
 
 const struct tune_params arm_fastmul_tune =
@@ -908,6 +913,7 @@ const struct tune_params arm_fastmul_tune =
   arm_default_branch_cost,
   false,   /* Prefer LDRD/STRD.  */
   {true, true},/* Prefer non short 
circuit.  */
+  false /* Prefer Neon for 64-bits 
bitops.  */
 };
 
 /* Stron

Re: [tsan] Instrument atomics

2012-11-30 Thread Jakub Jelinek
On Tue, Nov 27, 2012 at 12:47:50PM +0400, Dmitry Vyukov wrote:
> Yes, you are right.
> I think I've done them atomically initially because of things like
> FUTEX_WAKE_OP. I will fix that.

Any progress on that?

BTW, the current
template T func_nand(T v, T op) {
  return ~v & op;
}
is wrong not just by not being atomic (similarly to others), but
furthermore because __sync_fetch_and_nand (and __atomic etc.) are
  return ~(v & op);
instead (GCC < 4.4 did it wrongly as ~v & op; though).

Jakub


Re: [tsan] Instrument atomics

2012-11-30 Thread Dmitry Vyukov
On Fri, Nov 30, 2012 at 8:38 PM, Jakub Jelinek  wrote:
> On Tue, Nov 27, 2012 at 12:47:50PM +0400, Dmitry Vyukov wrote:
>> Yes, you are right.
>> I think I've done them atomically initially because of things like
>> FUTEX_WAKE_OP. I will fix that.
>
> Any progress on that?
>
> BTW, the current
> template T func_nand(T v, T op) {
>   return ~v & op;
> }
> is wrong not just by not being atomic (similarly to others), but
> furthermore because __sync_fetch_and_nand (and __atomic etc.) are
>   return ~(v & op);
> instead (GCC < 4.4 did it wrongly as ~v & op; though).


Hi,

No progress for now. It's in my todo list, but I am busy with other tasks.
It should not block you on compiler side, and I think work most of the
time, so I will fix in near future.


Re: [tsan] Instrument atomics

2012-11-30 Thread Dmitry Vyukov
and thanks for the nand catch!

On Fri, Nov 30, 2012 at 9:00 PM, Dmitry Vyukov  wrote:
> On Fri, Nov 30, 2012 at 8:38 PM, Jakub Jelinek  wrote:
>> On Tue, Nov 27, 2012 at 12:47:50PM +0400, Dmitry Vyukov wrote:
>>> Yes, you are right.
>>> I think I've done them atomically initially because of things like
>>> FUTEX_WAKE_OP. I will fix that.
>>
>> Any progress on that?
>>
>> BTW, the current
>> template T func_nand(T v, T op) {
>>   return ~v & op;
>> }
>> is wrong not just by not being atomic (similarly to others), but
>> furthermore because __sync_fetch_and_nand (and __atomic etc.) are
>>   return ~(v & op);
>> instead (GCC < 4.4 did it wrongly as ~v & op; though).
>
>
> Hi,
>
> No progress for now. It's in my todo list, but I am busy with other tasks.
> It should not block you on compiler side, and I think work most of the
> time, so I will fix in near future.


patch to add storage classes to wide int.

2012-11-30 Thread Kenneth Zadeck

Richi

this patch is our attempt to implement the storage classes for wide int 
as you suggested.   The patch currently does not work for reasons that 
will be mentioned below, but we stopped work on it because it is clear 
that this is a terrible idea.


1) Having more than one storage class is like having more than one 
register in an architecture:   1 register register allocation is easy, 
more than one register is np complete.  More than one storage class 
makes the code unreasonably complex.While it is true that in many 
cases, this can be "hidden" with c++ magic, in a lot of places it cannot 
and the notion that all of this is free is just a myth.


The storage classes mechanism that you have suggested, is not 
polymorphic.   It is two separate classes for wide it, one that does 
pointer copying and one that stack allocates.   From the client's 
perspective, this does not appear too bad, just a lot of <> here and there.


But there are a lot of places where, at run time, it is not clear which 
of the two alternatives is going to be available.  Consider the 
following code:


wide_int<>::foo = wide_int::from_rtx (x);

if (...)
   foo = foo + wide_int::from_rtx (y);

Which type of wide-int does foo contain after the if-then.   It really 
could be either one and so one of the two has to be converted into the 
other one before the next use of foo.   I claimed, i believe correctly 
that data copying was not going to be any more expensive than pointer 
copying because the amount of data was so small.But now, because we 
have more than one type, we have to deal with conversion.


2) The patch does not work for rtxes at all.   Rtxes have to copied.   
Trees could be pointer copied.
The problem is that CONST_INTs are not canonized in a way that wide-ints 
are or that trees could be.
This comes from the use of the GEN_INT macro that does not take a 
mode.   Without a mode, you do not get the integers into proper form: 
i.e. sign extended.  At the tree level, this is not a problem because 
the INT_CST constructors take a type.  But the best that can be done at 
the rtl level is to do the sign extension when we convert inside 
from_rtx (which takes a precision or a mode).


Fixing this is on Richard Sandiford's and my "it would be nice list" but 
is certainly out of the question to do as a prereq for this patch.


3) The patch seems to require that there be nothing in wide-int.c and 
everything in wide-int.h.This is a lot of code to go into a header 
file that is likely to be included into almost every c file.   The 
current patch very carefully divides the implementation of each function 
into a fast, inlined  small case that handles all types that fit in an 
HWI and a larger general implementation that handles the rarely used 
large types.   That seems to have to go away to make the two different 
implementations of wide-int work.


4) I consider the resulting wide-int.h to be unreadable.I am sure 
that after the proper indoctrination period, i will just accept this 
glop, but to the people in the community who have some apprehension 
about what gcc will turn into now that C++ is allowed, this seems like 
the poster child of where they did not want this to go.   The amount of 
c++ trickery employed here for what seems like i truly marginal gain 
seems to be out of proportion.


Mike and I gave this a good try, but in the end i think that it is 
better to not do this change.  I get it that having only one storage 
class is not going to give you a medium term large int rep.  But there 
are other ways to go
to get this.   We can simply define a class that is a medium term holder 
and copy the values into it.Again, the copying is not that bad 
because it is almost always 1 element.


Kenny



Re: patch to add storage classes to wide int.

2012-11-30 Thread Kenneth Zadeck

forgot the patch
On 11/30/2012 01:03 PM, Kenneth Zadeck wrote:

Richi

this patch is our attempt to implement the storage classes for wide 
int as you suggested.   The patch currently does not work for reasons 
that will be mentioned below, but we stopped work on it because it is 
clear that this is a terrible idea.


1) Having more than one storage class is like having more than one 
register in an architecture:   1 register register allocation is easy, 
more than one register is np complete.  More than one storage class 
makes the code unreasonably complex.While it is true that in many 
cases, this can be "hidden" with c++ magic, in a lot of places it 
cannot and the notion that all of this is free is just a myth.


The storage classes mechanism that you have suggested, is not 
polymorphic.   It is two separate classes for wide it, one that does 
pointer copying and one that stack allocates.   From the client's 
perspective, this does not appear too bad, just a lot of <> here and 
there.


But there are a lot of places where, at run time, it is not clear 
which of the two alternatives is going to be available.  Consider the 
following code:


wide_int<>::foo = wide_int::from_rtx (x);

if (...)
   foo = foo + wide_int::from_rtx (y);

Which type of wide-int does foo contain after the if-then.   It really 
could be either one and so one of the two has to be converted into the 
other one before the next use of foo.   I claimed, i believe correctly 
that data copying was not going to be any more expensive than pointer 
copying because the amount of data was so small.But now, because 
we have more than one type, we have to deal with conversion.


2) The patch does not work for rtxes at all.   Rtxes have to copied.   
Trees could be pointer copied.
The problem is that CONST_INTs are not canonized in a way that 
wide-ints are or that trees could be.
This comes from the use of the GEN_INT macro that does not take a 
mode.   Without a mode, you do not get the integers into proper form: 
i.e. sign extended.  At the tree level, this is not a problem because 
the INT_CST constructors take a type.  But the best that can be done 
at the rtl level is to do the sign extension when we convert inside 
from_rtx (which takes a precision or a mode).


Fixing this is on Richard Sandiford's and my "it would be nice list" 
but is certainly out of the question to do as a prereq for this patch.


3) The patch seems to require that there be nothing in wide-int.c and 
everything in wide-int.h.This is a lot of code to go into a header 
file that is likely to be included into almost every c file.   The 
current patch very carefully divides the implementation of each 
function into a fast, inlined  small case that handles all types that 
fit in an HWI and a larger general implementation that handles the 
rarely used large types.   That seems to have to go away to make the 
two different implementations of wide-int work.


4) I consider the resulting wide-int.h to be unreadable.I am sure 
that after the proper indoctrination period, i will just accept this 
glop, but to the people in the community who have some apprehension 
about what gcc will turn into now that C++ is allowed, this seems like 
the poster child of where they did not want this to go.   The amount 
of c++ trickery employed here for what seems like i truly marginal 
gain seems to be out of proportion.


Mike and I gave this a good try, but in the end i think that it is 
better to not do this change.  I get it that having only one storage 
class is not going to give you a medium term large int rep.  But there 
are other ways to go
to get this.   We can simply define a class that is a medium term 
holder and copy the values into it.Again, the copying is not that 
bad because it is almost always 1 element.


Kenny





wide-1.diffs.bz2
Description: application/bzip


Re: [PATCH, RFC] Dumping expanded MD files

2012-11-30 Thread Richard Henderson
On 2012-11-30 01:03, Michael Zolotukhin wrote:
> Changelog:
> 2012-11-30  Michael Zolotukhin  
> 
> * Makefile.in: Add target mddump, build/genmddump.o.  Extend
> genprogrtl with mddump.
> * genmddump.c: New.

Ok.


r~


RE: [PATCH, AArch64 4.7] Backport of __builtin_bswap16 optimisation

2012-11-30 Thread Ian Bolton
It turned out that this patch depended on another one from
earlier, so I have backported that to ARM/aarch64-4.7-branch too.

http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00452.html


Cheers,
Ian

> -Original Message-
> From: Ian Bolton [mailto:ian.bol...@arm.com]
> Sent: 23 November 2012 18:09
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH, AArch64 4.7] Backport of __builtin_bswap16
> optimisation
> 
> I had already committed my testcase for this for aarch64, but
> it depends on this patch that doesn't yet exist in 4.7, so I
> backported to our ARM/aarch64-4.7-branch.
> 
> Cheers,
> Ian
> 
> 
> 
> From:
> http://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=f811051bf87b1de7804c19
> c8192d0d099d157145
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index be34843..ce08fce 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,8 @@
> +2012-09-26  Christophe Lyon 
> +
> + * tree-ssa-math-opts.c (bswap_stats): Add found_16bit field.
> + (execute_optimize_bswap): Add support for builtin_bswap16.
> +
>  2012-09-26  Richard Guenther  
> 
>   * tree.h (DECL_IS_BUILTIN): Compare LOCATION_LOCUS.
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index 3aad841..7c96949 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,7 @@
> +2012-09-26  Christophe Lyon 
> +
> + * gcc.target/arm/builtin-bswap16-1.c: New testcase.
> +
>  2012-09-25  Segher Boessenkool  
> 
>   PR target/51274
> diff --git a/gcc/testsuite/gcc.target/arm/builtin-bswap16-1.c
> b/gcc/testsuite/gcc.target/arm/builtin-bswap16-1.c
> new file mode 100644
> index 000..6920f00
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/builtin-bswap16-1.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-require-effective-target arm_arch_v6_ok } */
> +/* { dg-add-options arm_arch_v6 } */
> +/* { dg-final { scan-assembler-not "orr\[ \t\]" } } */
> +
> +unsigned short swapu16_1 (unsigned short x)
> +{
> +  return (x << 8) | (x >> 8);
> +}
> +
> +unsigned short swapu16_2 (unsigned short x)
> +{
> +  return (x >> 8) | (x << 8);
> +}
> diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
> index 16ff397..d9f4e9e 100644
> --- a/gcc/tree-ssa-math-opts.c
> +++ b/gcc/tree-ssa-math-opts.c
> @@ -154,6 +154,9 @@ static struct
> 
>  static struct
>  {
> +  /* Number of hand-written 16-bit bswaps found.  */
> +  int found_16bit;
> +
>/* Number of hand-written 32-bit bswaps found.  */
>int found_32bit;
> 
> @@ -1803,9 +1806,9 @@ static unsigned int
>  execute_optimize_bswap (void)
>  {
>basic_block bb;
> -  bool bswap32_p, bswap64_p;
> +  bool bswap16_p, bswap32_p, bswap64_p;
>bool changed = false;
> -  tree bswap32_type = NULL_TREE, bswap64_type = NULL_TREE;
> +  tree bswap16_type = NULL_TREE, bswap32_type = NULL_TREE,
> bswap64_type = NULL_TREE;
> 
>if (BITS_PER_UNIT != 8)
>  return 0;
> @@ -1813,17 +1816,25 @@ execute_optimize_bswap (void)
>if (sizeof (HOST_WIDEST_INT) < 8)
>  return 0;
> 
> +  bswap16_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP16)
> +&& optab_handler (bswap_optab, HImode) !=
> CODE_FOR_nothing);
>bswap32_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP32)
>  && optab_handler (bswap_optab, SImode) !=
> CODE_FOR_nothing);
>bswap64_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP64)
>  && (optab_handler (bswap_optab, DImode) !=
> CODE_FOR_nothing
>  || (bswap32_p && word_mode == SImode)));
> 
> -  if (!bswap32_p && !bswap64_p)
> +  if (!bswap16_p && !bswap32_p && !bswap64_p)
>  return 0;
> 
>/* Determine the argument type of the builtins.  The code later on
>   assumes that the return and argument type are the same.  */
> +  if (bswap16_p)
> +{
> +  tree fndecl = builtin_decl_explicit (BUILT_IN_BSWAP16);
> +  bswap16_type = TREE_VALUE (TYPE_ARG_TYPES (TREE_TYPE (fndecl)));
> +}
> +
>if (bswap32_p)
>  {
>tree fndecl = builtin_decl_explicit (BUILT_IN_BSWAP32);
> @@ -1863,6 +1874,13 @@ execute_optimize_bswap (void)
> 
> switch (type_size)
>   {
> + case 16:
> +   if (bswap16_p)
> + {
> +   fndecl = builtin_decl_explicit (BUILT_IN_BSWAP16);
> +   bswap_type = bswap16_type;
> + }
> +   break;
>   case 32:
> if (bswap32_p)
>   {
> @@ -1890,7 +1908,9 @@ execute_optimize_bswap (void)
>   continue;
> 
> changed = true;
> -   if (type_size == 32)
> +   if (type_size == 16)
> + bswap_stats.found_16bit++;
> +   else if (type_size == 32)
>   bswap_stats.found_32bit++;
> else
>   bswap_stats.found_64bit++;
> @@ -1935,6 +1955,8 @@ execute_optimize_bswap (void)
>   }
>  }
> 
> +  statistics_counter_event (cfun, "16-bit bswap implementations
> found",
> + bswap_stats.found_16bit);
>statistics_counter_event 

[PATCH] PR bootstrap/55552: --enable-gold=default doesn't work with in-tree binutils

2012-11-30 Thread H.J. Lu
Hi,

Toplevel configure supports:

# Handle --enable-gold, --enable-ld.
# --disable-gold [--enable-ld]
# Build only ld.  Default option.
# --enable-gold [--enable-ld]
# Build both gold and ld.  Install gold as "ld.gold", install ld
# as "ld.bfd" and "ld".
# --enable-gold=default [--enable-ld]
# Build both gold and ld.  Install gold as "ld.gold" and "ld",
# install ld as "ld.bfd".
# --enable-gold[=default] --disable-ld
# Build only gold, which is then installed as both "ld.gold" and
# "ld".
# --enable-gold --enable-ld=default
# Build both gold (installed as "ld.gold") and ld (installed as "ld"
# and ld.bfd).
# In other words, ld is default
# --enable-gold=default --enable-ld=default
# Error.

However, gcc directory doesn't handle --enable-gold=default properly.
This patch fixes --enable-gold=default.  Tested on Linux/x86-64 with
GCC + binutils using:

--enable-plugins --enable-threads --enable-gold=default

OK to to install?

Thanks.


H.J.
---
2012-11-30  H.J. Lu  

PR bootstrap/2
* configure.ac (install_gold_as_default): New.  Set to yes for
--disable-ld or --enable-gold=default.
(gcc_cv_ld_gold_srcdir): New.
(gcc_cv_ld): Also check in-tree gold if install_gold_as_default
is yes.
* configure: Regenerated.

* exec-tool.in (dir) [collect-ld && ../gold/ld-new]: Set to gold.
(fast_install) [collect-ld && ../gold/ld-new]: Set to yes.

diff --git a/gcc/configure.ac b/gcc/configure.ac
index c6f57bd..48191c8 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -1999,6 +1999,34 @@ else
   in_tree_gas=no
 fi
 
+default_ld=
+AC_ARG_ENABLE(ld,
+[[  --enable-ld[=ARG]   build ld [ARG={default,yes,no}]]],
+[case "${enableval}" in
+ no)
+   default_ld=ld.gold
+   ;;
+ esac])
+
+AC_ARG_ENABLE(gold,
+[[  --enable-gold[=ARG] build gold [ARG={default,yes,no}]]],
+[case "${enableval}" in
+ default)
+   install_gold_as_default=yes
+   ;;
+ yes)
+   if test x${default_ld} != x; then
+ install_gold_as_default=yes
+   fi
+   ;;
+ no)
+   ;;
+ *)
+   AC_MSG_ERROR([invalid --enable-gold argument])
+   ;;
+ esac],
+[install_gold_as_default=no])
+
 # Identify the linker which will work hand-in-glove with the newly
 # built GCC, so that we can examine its features.  This is the linker
 # which will be driven by the driver program.
@@ -2009,11 +2037,17 @@ fi
 gcc_cv_gld_major_version=
 gcc_cv_gld_minor_version=
 gcc_cv_ld_gld_srcdir=`echo $srcdir | sed -e 's,/gcc$,,'`/ld
+gcc_cv_ld_gold_srcdir=`echo $srcdir | sed -e 's,/gcc$,,'`/gold
 gcc_cv_ld_bfd_srcdir=`echo $srcdir | sed -e 's,/gcc$,,'`/bfd
 
 AS_VAR_SET_IF(gcc_cv_ld,, [
 if test -x "$DEFAULT_LINKER"; then
gcc_cv_ld="$DEFAULT_LINKER"
+elif test $install_gold_as_default = yes \
+ && test -f $gcc_cv_ld_gold_srcdir/configure.ac \
+ && test -f ../gold/Makefile \
+ && test x$build = x$host; then
+   gcc_cv_ld=../gold/ld-new$build_exeext
 elif test -f $gcc_cv_ld_gld_srcdir/configure.in \
  && test -f ../ld/Makefile \
  && test x$build = x$host; then
diff --git a/gcc/exec-tool.in b/gcc/exec-tool.in
index 8a10775..6e1e710 100644
--- a/gcc/exec-tool.in
+++ b/gcc/exec-tool.in
@@ -44,7 +44,13 @@ case "$invoked" in
   original=$ORIGINAL_LD_FOR_TARGET
 fi
 prog=ld-new$exeext
-dir=ld
+if test "$original" = ../gold/ld-new$exeext; then
+  dir=gold
+  # No need to handle relink since gold doesn't use libtool.
+  fast_install=yes
+else
+  dir=ld
+fi
 id=ld
 ;;
   nm)
-- 
1.7.11.7



[C++ Patch] PR 54170

2012-11-30 Thread Paolo Carlini

Hi,

this wrong code PR is some sort of continuation of PR 52988, where we 
discard side-effects of expression of nullptr_t. Here too, in 
cp_convert_to_pointer and in build_ptrmemfunc, we don't check for 
side-effects and we replace the expression with a plain int_cst or 
nullptr_node. The testcase in the PR exercises only plain pointers but 
we have also to handle correctly pointers to members, thus I extended it 
and the patch became slightly more complex: for the case of pointer to 
data member in particular I have to call by hand build2 and build a 
COMPOUND_EXPR to host the side-effects (similarly to what I found used 
in cp_build_addr_expr_1).


Tested x86_64-linux.

Thanks,
Paolo.

/
/cp
2012-11-30  Paolo Carlini  

PR c++/54170
* cvt.c (cp_convert_to_pointer): Don't discard side-effects from
expressions of nullptr_t.
* typeck.c (build_ptrmemfunc): Likewise.

/testsuite
2012-11-30  Paolo Carlini  

PR c++/54170
* g++.dg/cpp0x/lambda/lambda-nullptr.C: New.

Index: cp/cvt.c
===
--- cp/cvt.c(revision 193995)
+++ cp/cvt.c(working copy)
@@ -219,10 +219,15 @@ cp_convert_to_pointer (tree type, tree expr, tsubs
{
  /* A NULL pointer-to-member is represented by -1, not by
 zero.  */
- expr = build_int_cst_type (type, -1);
+ expr = (TREE_SIDE_EFFECTS (expr)
+ ? build2 (COMPOUND_EXPR, type, expr,
+   build_int_cst_type (type, -1))
+ : build_int_cst_type (type, -1));
}
   else
-   expr = build_int_cst (type, 0);
+   expr = (TREE_SIDE_EFFECTS (expr)
+   ? build_nop (type, expr)
+   : build_int_cst (type, 0));
 
   return expr;
 }
Index: cp/typeck.c
===
--- cp/typeck.c (revision 193995)
+++ cp/typeck.c (working copy)
@@ -7567,7 +7567,9 @@ build_ptrmemfunc (tree type, tree pfn, int force,
   /* Handle null pointer to member function conversions.  */
   if (null_ptr_cst_p (pfn))
 {
-  pfn = build_c_cast (input_location, type, nullptr_node);
+  pfn = (TREE_SIDE_EFFECTS (pfn)
+? build_nop (type, pfn)
+: build_c_cast (input_location, type, nullptr_node));
   return build_ptrmemfunc1 (to_type,
integer_zero_node,
pfn);
Index: testsuite/g++.dg/cpp0x/lambda/lambda-nullptr.C
===
--- testsuite/g++.dg/cpp0x/lambda/lambda-nullptr.C  (revision 0)
+++ testsuite/g++.dg/cpp0x/lambda/lambda-nullptr.C  (working copy)
@@ -0,0 +1,47 @@
+// PR c++/54170
+// { dg-do run { target c++11 } }
+
+#include  
+
+struct A;
+typedef A* ptr;
+typedef int (A::*pmf) (int);
+typedef int (A::*pdm);
+
+int total;
+
+void add(int n)
+{
+  total += n;
+}
+
+template 
+RType Call(Callable native_func, int arg)
+{
+  return native_func(arg);
+}
+
+template 
+RType do_test(int delta)
+{
+  return Call([=](int delta) { add(delta); return nullptr; }, delta);
+}
+
+template 
+void test()
+{
+  total = 0;
+  assert (!do_test(5));
+  assert (total == 5);
+  assert (!do_test(20));
+  assert (total == 25);
+  assert (!do_test(-256));
+  assert (total == -231);
+}
+
+int main()
+{
+  test();
+  test();
+  test();
+}


Re: [PATCH] Enable non-complex math builtins from C99 for Bionic

2012-11-30 Thread Joseph S. Myers
On Fri, 30 Nov 2012, Richard Biener wrote:

> If you want to split TARGET_C99_FUNCTIONS then split it properly,
> don't add TARGET_HAS_BIONIC to the selection.
> 
> Joseph may provide some guidance here.

My suggested interface would be a target hook such as 
targetm.libc_has_function, taking two arguments, one of which is an enum 
value describing the general class of function (e.g. 
function_c99_misc, function_c99_math_complex, function_gnu) and the other 
of which is a string with the exact function name.  This could replace 
both the target macros TARGET_C99_FUNCTIONS and TARGET_HAS_SINCOS.

I think the default version of the hook should be that all C99 functions 
are present, but not sincos, though that requires some care in the 
conversion to keep the settings for all systems the same, since the 
present default for TARGET_C99_FUNCTIONS is 0.  There would then be an 
always-true version of the hook for non-Linux glibc systems, and one that 
checks which C library is in use for Linux systems that might or might not 
be using glibc; this last one is where the Bionic check would go.

(Given this is mainly about the target OS rather than the target CPU 
architecture, I guess the macro choosing a hook implementation would get 
defined in .h files rather than directly in .c.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Don't bypass blocks with multiple latch edges (PR middle-end/54838)

2012-11-30 Thread Eric Botcazou
> RTL cprop seems to run both before and after RTL loop optimizers (currently
> after RTL loop optimizers we throw away loops - an arbitrary chosen point
> before IRA across which I could not get things to work).  Thus you could do
> 
>   if (current_loops)
> is_loop_header = bb == bb->loop_father->header;
>   else
> {
>   may_be_loop_header = false;
>   FOR_EACH_EDGE (e, ei, bb->preds)
> if (e->flags & EDGE_DFS_BACK)
>   {
> may_be_loop_header = true;
> break;
>   }
> }

OK, let's do something like this then:

  if (current_loops)
may_be_loop_header = (bb == bb->loop_father->header);
  else
{
   may_be_loop_header = false;
   FOR_EACH_EDGE (e, ei, bb->preds)
   if (e->flags & EDGE_DFS_BACK)
 {
  may_be_loop_header = true;
  break;
 }
}

since we cannot always be sure that it's a header.

> I don't understand
> 
>   /* The irreducible loops created by redirecting of edges entering the
>  loop from outside would decrease effectiveness of some of the
>  following optimizations, so prevent this.  */
>   if (may_be_loop_header
>   && !(e->flags & EDGE_DFS_BACK))
> {
>   ei_next (&ei);
>   continue;
> }
> 
> why isn't this simply
> 
>   if (may_be_loop_header)
> {
>   ei_next (&ei);
>   continue;
> }
> 
> ?  It looks like the code tries to allow "rotating" a loop - but that's only
> good if bb has exactly two predecessors (one entry and one latch edge). And
> even then it requires to manually update the loop structures (update what
> the new header and latch blocks are).

That's the bug.  We have a loop with 2 latch edges, but the loop fixup code in 
bypass_block only handles one.

> That said, removing the !(e->flags & EDGE_DFS_BACK) condition seems
> to fix the ICE.  Threading across a loop header is in fact complicated
> (see the special routine tree-ssa-threadupdate.c:thread_through_loop_header
> necessary for that).

Scary enough indeed.  But this seems to work fine here if the loop has exactly 
one latch edge, so we don't need to change that.  Removing the above condition 
is equivalent to early returning from bypass_block for all potential headers.

> Let's declare the GIMPLE level did all interesting threadings through
> headers.

The testcase is precisely a counterexample with 2 latch edges.

But OK, let's punt if there is more than a single (potential) latch edge.

-- 
Eric Botcazou


[patch] reorg.c janitor: remove epilogue_delay_list

2012-11-30 Thread Steven Bosscher
Hello,

This epilogue_delay_list probably existed only for text epilogues, but
it is now unused.

Tested by building a set of cc1-i files at -O2 for SPARC with and
without the patch, and verifying that there are no code gen changes.

OK for trunk?

Ciao!
Steven


reorg_janitor_4.diff
Description: Binary data


Re: [PATCH] Don't bypass blocks with multiple latch edges (PR middle-end/54838)

2012-11-30 Thread Eric Botcazou
> Yikes, sorry, it wasn't clear to me what PROP_loops really does.  Anyway,
> I think I have a better fix now.  The problem is just that when removing
> BB 4 (which was a header), we have to zap ->header and ->latch.  We
> already have code for this:
> 
>   if (current_loops != NULL
>   && e->src->loop_father->latch == e->src)
> {
>   /* ???  Now we are creating (or may create) a loop
>  with multiple entries.  Simply mark it for
>  removal.  Alternatively we could not do this
>  threading.  */
>   e->src->loop_father->header = NULL;
>   e->src->loop_father->latch = NULL;
> }
> 
> but the thing is that when there are multiple latch edges, then
> ->latch is NULL.  So we need to keep track of how many latch edges
> the header has.  Regtested/bootstrapped on x86_64, ok for trunk?
> 
> Can I get rid of may_be_loop_header (and just use n_latch_edges > 0
> instead at that one place) in a followup?
> 
> 2012-11-29  Marek Polacek  
> 
>   PR middle-end/54838
>   * cprop.c (bypass_block): Set header and latch to NULL when
>   BB has more than one latch edge.
>   (n_latches): New variable.

OK, let's tweak the patch as follows:
 1) when current_loops is not NULL, we compute may_be_loop_header and whether 
the loop has more than 1 latch edge exactly,
 2) when current_loops is NULL, we use your above method to do the same,
 3) once this is done, we return from the function before entering the loop if 
this is a (potential) header with more than 1 (potential) latch edge.  The 
comment can say that threading through a loop header with more than 1 latch 
edge is delicate and cite tree-threadupdate.c:thread_through_loop_header.

-- 
Eric Botcazou


Re: [i386] scalar ops that preserve the high part of a vector

2012-11-30 Thread Marc Glisse

On Fri, 30 Nov 2012, Uros Bizjak wrote:


For reference, we are talking about:

(define_insn "_vm3"
 [(set (match_operand:VF_128 0 "register_operand" "=x,x")
(vec_merge:VF_128
  (plusminus:VF_128
(match_operand:VF_128 1 "register_operand" "0,x")
(match_operand:VF_128 2 "nonimmediate_operand" "xm,xm"))
  (match_dup 1)
  (const_int 1)))]
 "TARGET_SSE"
 "@
  \t{%2, %0|%0, %2}
  v\t{%2, %1, %0|%0, %1, %2}"
 [(set_attr "isa" "noavx,avx")
  (set_attr "type" "sseadd")
  (set_attr "prefix" "orig,vex")
  (set_attr "mode" "")])

No, looking at your description, the operand 2 should be scalar
operand (we use _s{s,d} scalar instruction here), and for doubles this
should refer to 64bit memory location. I don't remember all the
details about vec_merge scalar instructions, but it looks to me that
canonical representation should be more like your proposal:

+(define_insn "*sse2_vmv2df3"
+  [(set (match_operand:V2DF 0 "register_operand" "=x,x")
+(vec_concat:V2DF
+  (plusminus:DF
+(vec_select:DF
+  (match_operand:V2DF 1 "register_operand" "0,x")
+  (parallel [(const_int 0)]))
+(match_operand:DF 2 "nonimmediate_operand" "xm,xm"))
+  (vec_select:DF (match_dup 1) (parallel [(const_int 1)]]
+  "TARGET_SSE2"


Thank you.

Among the following possible patterns, my choice (if nobody objects) is to 
use 4) for V2DF and 3) (rewritten without iterators) for V4SF. The 
question is then what should be done about the builtins and intrinsics. 
_mm_add_sd takes two __m128. If I change the signature of 
__builtin_ia32_addsd, I can make _mm_add_sd pass __B[0] as second 
argument, but I don't know if I am allowed to change that signature. 
Otherwise I guess I'll need to keep a separate expander for it (I'd rather 
not). And then there are several other operations than +/- to handle.



1) Current pattern:

  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
(vec_merge:VF_128
  (plusminus:VF_128
(match_operand:VF_128 1 "register_operand" "0,x")
(match_operand:VF_128 2 "nonimmediate_operand" "xm,xm"))
  (match_dup 1)
  (const_int 1)))]

2) Minimal fix:

  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
(vec_merge:VF_128
  (plusminus:VF_128
(match_operand:VF_128 1 "register_operand" "0,x")
(vec_duplicate:VF_128
  (match_operand: 2 "nonimmediate_operand" "xm,xm")))
  (match_dup 1)
  (const_int 1)))]

3) With the operation in scalar mode:

  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
(vec_merge:VF_128
  (vec_duplicate:VF_128
(plusminus:
  (vec_select:
(match_operand:VF_128 1 "register_operand" "0,x")
(parallel [(const_int 0)]))
  (match_operand: 2 "nonimmediate_operand" 
"xm,xm"
  (match_dup 1)
  (const_int 1)))]

4) Special version which only makes sense for vectors of 2 elements:

  [(set (match_operand:V2DF 0 "register_operand" "=x,x")
(vec_concat:V2DF
  (plusminus:DF
(vec_select:DF
  (match_operand:V2DF 1 "register_operand" "0,x")
  (parallel [(const_int 0)]))
(match_operand:DF 2 "nonimmediate_operand" "xm,xm"))
  (vec_select:DF (match_dup 1) (parallel [(const_int 1)]]

--
Marc Glisse


Fix segfault on degenerate bitfield case

2012-11-30 Thread Eric Botcazou
This is a segfault on a degenerate bitfield case introduced by the rewrite of 
the bitfield machinery.  In Ada, we have bitfields of size zero and we ask the 
middle-end to generate accesses to them.  This doesn't work anymore because 
get_best_mode now returns VOIDmode instead of QImode in this case, which 
wreaks havoc later.

The patchlet just restores the previous behaviour.  It also makes the comment 
describing the computation of bitregion_end_ more explicit, as the original 
formulation is a bit terse on second reading, even for the reviewer. :-)

Bootstrapped/regtested on x86-64/Linux, applied on the mainline as obvious.


2012-11-30  Eric Botcazou  

* stor-layout.c (bit_field_mode_iterator::bit_field_mode_iterator): Deal
with degenerate cases where the bitsize isn't positive.  Rework comment.


2012-11-30  Eric Botcazou  

* gnat.dg/specs/pack9.ads: New test.


-- 
Eric BotcazouIndex: stor-layout.c
===
--- stor-layout.c	(revision 193985)
+++ stor-layout.c	(working copy)
@@ -2648,10 +2648,14 @@ bit_field_mode_iterator
 {
   if (!bitregion_end_)
 {
-  /* We can assume that any aligned chunk of UNITS bits that overlaps
-	 the bitfield is mapped and won't trap.  */
-  unsigned HOST_WIDE_INT units = MIN (align, MAX (BIGGEST_ALIGNMENT,
-		  BITS_PER_WORD));
+  /* We can assume that any aligned chunk of ALIGN bits that overlaps
+	 the bitfield is mapped and won't trap, provided that ALIGN isn't
+	 too large.  The cap is the biggest required alignment for data,
+	 or at least the word size.  And force one such chunk at least.  */
+  unsigned HOST_WIDE_INT units
+	= MIN (align, MAX (BIGGEST_ALIGNMENT, BITS_PER_WORD));
+  if (bitsize <= 0)
+	bitsize = 1;
   bitregion_end_ = bitpos + bitsize + units - 1;
   bitregion_end_ -= bitregion_end_ % units + 1;
 }-- { dg-do compile }

package Pack9 is

  subtype Zero is Natural range 0 .. 0;

  type Rec (D : Boolean) is record
case D is
   when True => Z : Zero;
   when False => null;
end case;
  end record;
  pragma Pack (Rec);

end Pack9;

[C++ patch, committed] Fix make_ith_pack_parameter_name ICE (PR c++/55542)

2012-11-30 Thread Jakub Jelinek
Hi!

If the pack is unnamed, make_ith_pack_parameter_name will crash.
Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
acked by Jason in the PR, committed to trunk so far.

2012-12-01  Jakub Jelinek  

PR c++/55542
* pt.c (make_ith_pack_parameter_name): Return NULL if
name is NULL.
(tsubst_decl): Call make_ith_pack_parameter_name even if
DECL_NAME is NULL.

* g++.dg/cpp0x/vt-55542.C: New test.

--- gcc/cp/pt.c.jj  2012-11-19 14:41:16.0 +0100
+++ gcc/cp/pt.c 2012-11-30 09:24:45.716620002 +0100
@@ -2905,6 +2905,8 @@ make_ith_pack_parameter_name (tree name,
   char* newname;
   int newname_len;
 
+  if (name == NULL_TREE)
+return name;
   snprintf (numbuf, NUMBUF_LEN, "%i", i);
   newname_len = IDENTIFIER_LENGTH (name)
+ strlen (numbuf) + 2;
@@ -10267,10 +10269,9 @@ tsubst_decl (tree t, tree args, tsubst_f
 /* Get the Ith type.  */
 type = TREE_VEC_ELT (expanded_types, i);
 
-if (DECL_NAME (r))
-  /* Rename the parameter to include the index.  */
-  DECL_NAME (r) =
-make_ith_pack_parameter_name (DECL_NAME (r), i);
+   /* Rename the parameter to include the index.  */
+   DECL_NAME (r)
+ = make_ith_pack_parameter_name (DECL_NAME (r), i);
   }
 else if (!type)
   /* We're dealing with a normal parameter.  */
--- gcc/testsuite/g++.dg/cpp0x/vt-55542.C.jj2012-11-30 09:37:14.042107481 
+0100
+++ gcc/testsuite/g++.dg/cpp0x/vt-55542.C   2012-11-30 09:35:02.0 
+0100
@@ -0,0 +1,22 @@
+// PR c++/55542
+// { dg-options "-std=c++11" }
+
+template 
+struct B
+{
+  template 
+  B (O *o, void (O::*f) (P ... p)) {}
+};
+class C
+{
+  void foo (void *, int);
+  template 
+  void bar (A ... a);
+  B  c;
+  B  d;
+  C (int) : c (this, &C::bar), d (this, &C::foo) {}
+};
+template 
+void C::bar (A ...)
+{
+}

Jakub


[tsan] Instrument atomics (take 2)

2012-11-30 Thread Jakub Jelinek
On Fri, Nov 30, 2012 at 09:00:30PM +0400, Dmitry Vyukov wrote:
> and thanks for the nand catch!

Here is updated atomics instrumentation patch, including the 16-byte
atomics, nand, success/failure parameters for cas and 0-5 arguments
instead of 1<<0 through 1<<5 for memory models.

Again, on top of the http://gcc.gnu.org/ml/gcc-patches/2012-11/msg01875.html
patch.

Tested just by compiling and eyeballing the large atomics testcase.

2012-12-01  Jakub Jelinek  

PR sanitizer/55439
* Makefile.in (tsan.o): Depend on tree-ssa-propagate.h.
* sanitizer.def: Add __tsan_atomic* builtins.
* asan.c (initialize_sanitizer_builtins): Adjust to also
initialize __tsan_atomic* builtins.
* tsan.c: Include tree-ssa-propagate.h.
(enum tsan_atomic_action): New enum.
(tsan_atomic_table): New table.
(instrument_builtin_call): New function.
(instrument_gimple): Take pointer to gimple_stmt_iterator
instead of gimple_stmt_iterator.  Call instrument_builtin_call
on builtin call stmts.
(instrument_memory_accesses): Adjust instrument_gimple caller.
* builtin-types.def (BT_FN_BOOL_VPTR_PTR_I1_INT_INT,
BT_FN_BOOL_VPTR_PTR_I2_INT_INT, BT_FN_BOOL_VPTR_PTR_I4_INT_INT,
BT_FN_BOOL_VPTR_PTR_I8_INT_INT, BT_FN_BOOL_VPTR_PTR_I16_INT_INT): New.

--- gcc/Makefile.in.jj  2012-11-30 17:26:20.468778282 +0100
+++ gcc/Makefile.in 2012-11-30 17:26:51.289584561 +0100
@@ -2234,7 +2234,8 @@ tsan.o : $(CONFIG_H) $(SYSTEM_H) $(TREE_
$(TM_H) coretypes.h $(TREE_DUMP_H) $(TREE_PASS_H) $(CGRAPH_H) $(GGC_H) \
$(BASIC_BLOCK_H) $(FLAGS_H) $(FUNCTION_H) \
$(TM_P_H) $(TREE_FLOW_H) $(DIAGNOSTIC_CORE_H) $(GIMPLE_H) tree-iterator.h \
-   intl.h cfghooks.h output.h options.h c-family/c-common.h tsan.h asan.h
+   intl.h cfghooks.h output.h options.h c-family/c-common.h tsan.h asan.h \
+   tree-ssa-propagate.h
 tree-ssa-tail-merge.o: tree-ssa-tail-merge.c \
$(SYSTEM_H) $(CONFIG_H) coretypes.h $(TM_H) $(BITMAP_H) \
$(FLAGS_H) $(TM_P_H) $(BASIC_BLOCK_H) \
--- gcc/sanitizer.def.jj2012-11-30 17:26:20.452779362 +0100
+++ gcc/sanitizer.def   2012-11-30 17:47:27.385161273 +0100
@@ -57,3 +57,196 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_WRIT
  BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST)
 DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_WRITE16, "__tsan_write16",
  BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST)
+
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC8_LOAD,
+ "__tsan_atomic8_load",
+ BT_FN_I1_CONST_VPTR_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC16_LOAD,
+ "__tsan_atomic16_load",
+ BT_FN_I2_CONST_VPTR_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC32_LOAD,
+ "__tsan_atomic32_load",
+ BT_FN_I4_CONST_VPTR_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC64_LOAD,
+ "__tsan_atomic64_load",
+ BT_FN_I8_CONST_VPTR_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC128_LOAD,
+ "__tsan_atomic128_load",
+ BT_FN_I16_CONST_VPTR_INT, ATTR_NOTHROW_LEAF_LIST)
+
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC8_STORE,
+ "__tsan_atomic8_store",
+ BT_FN_VOID_VPTR_I1_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC16_STORE,
+ "__tsan_atomic16_store",
+ BT_FN_VOID_VPTR_I2_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC32_STORE,
+ "__tsan_atomic32_store",
+ BT_FN_VOID_VPTR_I4_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC64_STORE,
+ "__tsan_atomic64_store",
+ BT_FN_VOID_VPTR_I8_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC128_STORE,
+ "__tsan_atomic128_store",
+ BT_FN_VOID_VPTR_I16_INT, ATTR_NOTHROW_LEAF_LIST)
+
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC8_EXCHANGE,
+ "__tsan_atomic8_exchange",
+ BT_FN_I1_VPTR_I1_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC16_EXCHANGE,
+ "__tsan_atomic16_exchange",
+ BT_FN_I2_VPTR_I2_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC32_EXCHANGE,
+ "__tsan_atomic32_exchange",
+ BT_FN_I4_VPTR_I4_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC64_EXCHANGE,
+ "__tsan_atomic64_exchange",
+ BT_FN_I8_VPTR_I8_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC128_EXCHANGE,
+ "__tsan_atomic128_exchange",
+ BT_FN_I16_VPTR_I16_INT

Re: [RFA:] fix PR55030, wrong code from __builtin_setjmp_receiver

2012-11-30 Thread Eric Botcazou
> I was mainly arguing with the sentence that for asm volatile, the output
> constraints (did you mean outputs and clobbers?) have essentially no
> meaning.  While for some optimizations perhaps it might be desirable
> to treat asm volatile as full optimization barrier, I'm not sure about all
> of them (i.e. it would be important to look for performance degradations
> caused by that change), and for var-tracking I'd argue that asm vs. asm
> volatile is completely unimportant, if the asm volatile pattern (or even
> UNSPEC_VOLATILE) doesn't say it clobbers or sets hard register XYZ, then it
> can't change it (well, it could but is responsible of restoring it),
> similarly for memory, if it doesn't clobber "memory" and doesn't set some
> memory, it can't do that.

Yes, I meant outputs and clobbers.  The origin of all this is:

`blockage'
 This pattern defines a pseudo insn that prevents the instruction
 scheduler and other passes from moving instructions and using
 register equivalences across the boundary defined by the blockage
 insn.  This needs to be an UNSPEC_VOLATILE pattern or a volatile
 ASM.

Now the typical blockage insn is that of the x86:

;; UNSPEC_VOLATILE is considered to use and clobber all hard registers and
;; all of memory.  This blocks insns from being moved across this point.

(define_insn "blockage"
  [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)]
  ""
  ""
  [(set_attr "length" "0")])

[Note the pretty adamant comment about UNSPEC_VOLATILE here...]

So, if UNSPEC_VOLATILE is not treated specially, the blockage insn of most 
ports won't block hard register equivalences.  And blockages are precisely 
used to do that, see e.g. the untyped_call pattern of the x86:

  /* The optimizer does not know that the call sets the function value
 registers we stored in the result block.  We avoid problems by
 claiming that all hard registers are used and clobbered at this
 point.  */
  emit_insn (gen_blockage ());

That being said, I agree that volatile asms are a separate discussion.

> So, do you object even to the var-tracking change (which would mean if
> var-tracking found that say some variable's value lives in hard register 12,
> when encountering asm volatile it would mean to forget about that even when
> hard register 12 isn't clobbered by the asm, nor set)?  For now
> var-tracking seems to be the most important issue, as we generate invalid
> debug info there (latent before, but never hit on inline asm on x86_64/i686
> except on setjmp calls).

For volatile asms, no, thanks for fixing the fallout on this side.

> As for other passes, the r193802 change e.g. regresses slightly modified
> pr44194-1.c testcase on x86_64,
> struct ints { int a, b, c; } foo();
> void bar(int a, int b);
> 
> void func() {
>   struct ints s = foo();
>   int x;
>   asm volatile ("" : "=r" (x));
>   bar(s.a, s.b);
> }
> 
>  func:
>  .LFB0:
> - xorl%eax, %eax
>   subq$24, %rsp
>  .LCFI0:
> + xorl%eax, %eax
>   callfoo
> - movq%rax, %rcx
> - shrq$32, %rcx
> - movq%rcx, %rsi
> - movl%eax, %edi
> + movq%rax, (%rsp)
> + movl%edx, 8(%rsp)
> + movl4(%rsp), %esi
> + movl(%rsp), %edi
>   addq$24, %rsp
>  .LCFI1:
>   jmp bar
> 
> To me it looks like an unnecessary pessimization, the volatile there
> I understand that just it doesn't want to be moved around (e.g. scheduled
> before the foo call or after bar), that it can't be DCEd, but as it doesn't
> mention it modifies memory, I don't understand why it should force some
> registers to stack and back when it has no way to know the compiler would be
> emitting anything like that at all.
> Compare that to
>   asm volatile ("" : "=r" (x) : : "memory");
> in the testcase, where the r193802 commit makes no difference, the
> stores/loads are done in both cases.

OK, I agree that the fallout for DSE, and the effect on memory in general, is 
undesirable, especially for volatile asms.

> For CSE, I'd agree it should treat asm volatile as barriers, so for cselib.c
> we probably need some additional cselib_init flag how we want to treat
> volatile asms...  For UNSPEC_VOLATILE, perhaps even DSE should stay
> conservative, but for var-tracking.c I still don't see a reason for that.

Thank you (as well as the others) for the detailed feedback.  Clearly my 
initial stance on this was a bit extreme and needs to be watered down.
I'll ponder about this over the week-end and propose something next week.

-- 
Eric Botcazou


[tsan] Small bugfix

2012-11-30 Thread Jakub Jelinek
Hi!

When I've tried to compile the attached testcase (I was trying to see
if tsan could discover the emutls.c data race), I got ICEs because
expr_ptr in certain cases wasn't is_gimple_val and thus was invalid to
pass it directly to a call as argument, fixed thusly.

Unfortunately, trying to compile it dynamically against libtsan.so
doesn't work (apparently it is using -fvisibility=hidden, but not
saying the public entry points have default visibility), compiling it
by hand statically against libtsan.a (we don't have -static-libtsan yet)
failed at runtime, complaining the binary isn't a PIE - can't it really
support normal executables?, and when compiled/linked as PIE, I got
==
WARNING: ThreadSanitizer: thread leak (pid=31150)
  Thread 3 (tid=31153, finished) created at:
#0 pthread_create ??:0 (exe+0xe4dc)
#1 main ??:0 (exe+0x505f)

==
==
WARNING: ThreadSanitizer: thread leak (pid=31150)
  Thread 4 (tid=31155, finished) created at:
#0 pthread_create ??:0 (exe+0xe4dc)
#1 main ??:0 (exe+0x505f)

==
==
WARNING: ThreadSanitizer: thread leak (pid=31150)
  Thread 5 (tid=31156, finished) created at:
#0 pthread_create ??:0 (exe+0xe4dc)
#1 main ??:0 (exe+0x505f)

==
ThreadSanitizer: reported 3 warnings

which is probably not what I was expecting to see.

2012-12-01  Jakub Jelinek  

* tsan.c (instrument_expr): If expr_ptr isn't a gimple val, first
store it into a SSA_NAME.

--- gcc/tsan.c.jj   2012-11-30 19:17:13.0 +0100
+++ gcc/tsan.c  2012-11-30 21:50:54.695392123 +0100
@@ -93,10 +93,11 @@ is_vptr_store (gimple stmt, tree expr, b
 static bool
 instrument_expr (gimple_stmt_iterator gsi, tree expr, bool is_write)
 {
-  tree base, rhs, expr_type, expr_ptr, builtin_decl;
+  tree base, rhs, expr_ptr, builtin_decl;
   basic_block bb;
   HOST_WIDE_INT size;
   gimple stmt, g;
+  gimple_seq seq;
   location_t loc;
 
   size = int_size_in_bytes (TREE_TYPE (expr));
@@ -139,21 +140,25 @@ instrument_expr (gimple_stmt_iterator gs
   rhs = is_vptr_store (stmt, expr, is_write);
   gcc_checking_assert (rhs != NULL || is_gimple_addressable (expr));
   expr_ptr = build_fold_addr_expr (unshare_expr (expr));
-  if (rhs == NULL)
+  seq = NULL;
+  if (!is_gimple_val (expr_ptr))
 {
-  expr_type = TREE_TYPE (expr);
-  while (TREE_CODE (expr_type) == ARRAY_TYPE)
-   expr_type = TREE_TYPE (expr_type);
-  size = int_size_in_bytes (expr_type);
-  g = gimple_build_call (get_memory_access_decl (is_write, size),
-1, expr_ptr);
+  g = gimple_build_assign (make_ssa_name (TREE_TYPE (expr_ptr), NULL),
+  expr_ptr);
+  expr_ptr = gimple_assign_lhs (g);
+  gimple_set_location (g, loc);
+  gimple_seq_add_stmt_without_update (&seq, g);
 }
+  if (rhs == NULL)
+g = gimple_build_call (get_memory_access_decl (is_write, size),
+  1, expr_ptr);
   else
 {
   builtin_decl = builtin_decl_implicit (BUILT_IN_TSAN_VPTR_UPDATE);
   g = gimple_build_call (builtin_decl, 1, expr_ptr);
 }
   gimple_set_location (g, loc);
+  gimple_seq_add_stmt_without_update (&seq, g);
   /* Instrumentation for assignment of a function result
  must be inserted after the call.  Instrumentation for
  reads of function arguments must be inserted before the call.
@@ -170,13 +175,13 @@ instrument_expr (gimple_stmt_iterator gs
  bb = gsi_bb (gsi);
  e = find_fallthru_edge (bb->succs);
  if (e)
-   gsi_insert_seq_on_edge_immediate (e, g);
+   gsi_insert_seq_on_edge_immediate (e, seq);
}
   else
-   gsi_insert_after (&gsi, g, GSI_NEW_STMT);
+   gsi_insert_seq_after (&gsi, seq, GSI_NEW_STMT);
 }
   else
-gsi_insert_before (&gsi, g, GSI_SAME_STMT);
+gsi_insert_before (&gsi, seq, GSI_SAME_STMT);
 
   return true;
 }

Jakub
#include 
#include 
#include 
#include 

pthread_key_t key;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

struct S
{
  uintptr_t offset;
  size_t size;
};

struct T
{
  uintptr_t size;
  void **data[];
};

void
destroy (void *x)
{
  struct T *p = x;
  uintptr_t size = p->size;
  uintptr_t i;

  for (i = 0; i < size; ++i)
free (p->data[i]);

  free (x);
}

void
init (void)
{
  if (pthread_key_create (&key, destroy))
exit (0);
}

static uintptr_t size;

void *
foo (struct S *x)
{
  uintptr_t offset = x->offset;
  if (__builtin_expect (offset == 0, 0))
{
  static pthread_once_t once = PTHREAD_ONCE_INIT;
  pthread_once (&once, init);
  pthread_mutex_lock (&lock);
  offset = x->offset;
  if (offset == 0)
{
  offset = ++size;
  x->offset = offset;
}
  pthread_mutex_unlock (&lock);
}

  struct T *p = (struct T *) pthread_getspecific (key);
  if (__builtin_expect (p == NULL, 0))
{
  

Go patch committed: Fix bug converting unnamed types

2012-11-30 Thread Ian Lance Taylor
This patch to the Go frontend fixes a bug converting unnamed types to
GIMPLE.  It's important to convert all identical unnamed Go types to the
same GIMPLE type, to avoid middle-end errors in assignments and type
conversions.  This is done via a hash table.  Unfortunately
Type::get_backend_placeholder was not using the hash table, so it was
possible to get two different placeholder types for two identical
unnamed Go types, leading in some cases to getting two different GIMPLE
types.  This would then cause an ICE if the two types appeared in an
assignment of some sort.

This patch fixes the problem by recording placeholder information in the
hash table, so that get_backend_placeholder can use it.  Bootstrapped
and ran Go testsuite on x86_64-unknown-linux-gnu.  Committed to mainline
and 4.7 branch.

Ian

diff -r b7d0a78bc2df go/types.cc
--- a/go/types.cc	Thu Nov 29 23:01:48 2012 -0800
+++ b/go/types.cc	Fri Nov 30 15:36:47 2012 -0800
@@ -45,8 +45,7 @@
 // Class Type.
 
 Type::Type(Type_classification classification)
-  : classification_(classification), btype_is_placeholder_(false),
-btype_(NULL), type_descriptor_var_(NULL)
+  : classification_(classification), btype_(NULL), type_descriptor_var_(NULL)
 {
 }
 
@@ -910,11 +909,7 @@
 Type::get_backend(Gogo* gogo)
 {
   if (this->btype_ != NULL)
-{
-  if (this->btype_is_placeholder_ && gogo->named_types_are_converted())
-	this->finish_backend(gogo);
-  return this->btype_;
-}
+return this->btype_;
 
   if (this->forward_declaration_type() != NULL
   || this->named_type() != NULL)
@@ -928,20 +923,36 @@
   // that.  There is no need to use the hash table for named types, as
   // named types are only identical to themselves.
 
-  std::pair val(this, NULL);
+  std::pair val;
+  val.first = this;
+  val.second.btype = NULL;
+  val.second.is_placeholder = false;
   std::pair ins =
 Type::type_btypes.insert(val);
-  if (!ins.second && ins.first->second != NULL)
-{
-  if (gogo != NULL && gogo->named_types_are_converted())
-	this->btype_ = ins.first->second;
-  return ins.first->second;
+  if (!ins.second && ins.first->second.btype != NULL)
+{
+  // Note that GOGO can be NULL here, but only when the GCC
+  // middle-end is asking for a frontend type.  That will only
+  // happen for simple types, which should never require
+  // placeholders.
+  if (!ins.first->second.is_placeholder)
+	this->btype_ = ins.first->second.btype;
+  else if (gogo->named_types_are_converted())
+	{
+	  this->finish_backend(gogo, ins.first->second.btype);
+	  ins.first->second.is_placeholder = false;
+	}
+
+  return ins.first->second.btype;
 }
 
   Btype* bt = this->get_btype_without_hash(gogo);
 
-  if (ins.first->second == NULL)
-ins.first->second = bt;
+  if (ins.first->second.btype == NULL)
+{
+  ins.first->second.btype = bt;
+  ins.first->second.is_placeholder = false;
+}
   else
 {
   // We have already created a backend representation for this
@@ -949,10 +960,9 @@
   // a named type which in turns uses an identical unnamed type.
   // Use the tree we created earlier and ignore the one we just
   // built.
-  bt = ins.first->second;
-  if (gogo == NULL || !gogo->named_types_are_converted())
-	return bt;
-  this->btype_ = bt;
+  if (this->btype_ == bt)
+	this->btype_ = ins.first->second.btype;
+  bt = ins.first->second.btype;
 }
 
   return bt;
@@ -1019,6 +1029,37 @@
   // These are simple types that can just be created directly.
   return this->get_backend(gogo);
 
+case TYPE_MAP:
+case TYPE_CHANNEL:
+  // All maps and channels have the same backend representation.
+  return this->get_backend(gogo);
+
+case TYPE_NAMED:
+case TYPE_FORWARD:
+  // Named types keep track of their own dependencies and manage
+  // their own placeholders.
+  return this->get_backend(gogo);
+
+case TYPE_INTERFACE:
+  if (this->interface_type()->is_empty())
+	return Interface_type::get_backend_empty_interface_type(gogo);
+  break;
+
+default:
+  break;
+}
+
+  std::pair val;
+  val.first = this;
+  val.second.btype = NULL;
+  val.second.is_placeholder = false;
+  std::pair ins =
+Type::type_btypes.insert(val);
+  if (!ins.second && ins.first->second.btype != NULL)
+return ins.first->second.btype;
+
+  switch (this->classification_)
+{
 case TYPE_FUNCTION:
   {
 	Location loc = this->function_type()->location();
@@ -1061,37 +1102,36 @@
 	}
   break;
 	
-case TYPE_MAP:
-case TYPE_CHANNEL:
-  // All maps and channels have the same backend representation.
-  return this->get_backend(gogo);
-
 case TYPE_INTERFACE:
-  if (this->interface_type()->is_empty())
-	return Interface_type::get_backend_empty_interface_type(gogo);
-  else
-	{
-	  std::vector bfields;
-	  get_backend_interface_fields(gogo, this->interface_type(), true,
-   &bfields);
-	  bt = gogo-

[patch, libgcc] Bug in fp-bit.c when NO_NANS is defined

2012-11-30 Thread Steve Ellcey

While investigating some soft-float issues I tried compiling fp-bit.c in
libgcc with NO_NANS defined.  I wound up with an undefined reference to
makenan.

Here is my attempt at a patch to fix the problem but I am not sure if
returning 0 is the right thing to do for b/0 when NO_NANS is defined.

isinf is always going to be false when NO_NANS is defined so it
is only a question of what to do for iszero(a) that we need to worry
about. b at this point may or may not be zero.

OK to checkin or should I return something else?

Steve Ellcey
sell...@mips.com


2012-11-30  Steve Ellcey  

* fp-bit.c (_fpdiv_parts): Do not call makenan if NO_NANS defined.


diff --git a/libgcc/fp-bit.c b/libgcc/fp-bit.c
index 7509f76..10a6b3a 100644
--- a/libgcc/fp-bit.c
+++ b/libgcc/fp-bit.c
@@ -979,8 +979,10 @@ _fpdiv_parts (fp_number_type * a,
 
   if (isinf (a) || iszero (a))
 {
+#ifndef NO_NANS
   if (a->class == b->class)
return makenan ();
+#endif
   return a;
 }
 


Re: [patch, libgcc] Bug in fp-bit.c when NO_NANS is defined

2012-11-30 Thread Joseph S. Myers
On Fri, 30 Nov 2012, Steve Ellcey  wrote:

> While investigating some soft-float issues I tried compiling fp-bit.c in
> libgcc with NO_NANS defined.  I wound up with an undefined reference to
> makenan.

Unless anything in the tree defines NO_NANS, I suggest we remove all 
support for such a macro from fp-bit.  As I noted in 
, I believe all 
the support in fp-bit for the IRIX variant of IBM long double can also be 
removed, and the associated code in real.[ch].

-- 
Joseph S. Myers
jos...@codesourcery.com


[RFA:] fix group-loads of VOIDmode constants, expr.c:emit_group_load_1

2012-11-30 Thread Hans-Peter Nilsson
(No cans of worms opened here, I believe...)

For MMIX with the default, Knuth's ABI (returning values in register
$0, through the register stack), function return values are prepared
in a register-swapped fashion due to the way the register stack works,
which has to be expressed as:

(parallel [
(expr_list:REG_DEP_TRUE (reg:DI 0 $0)
(const_int 8 [0x8]))
(expr_list:REG_DEP_TRUE (reg:DI 1 $1)
(const_int 0 [0]))
])

i.e. a group load.  Compare to the more intuitive "GNU ABI", passing
parameters straightforwardly, with the order in the called and the
calling function being the same, starting with global register $231.

Of course this matters only to >64bit (i.e. >registersize) values like
TImode, alias __int128.  The problem here is that group-loading a
constant for a function return-value doesn't work; it's passed to
simplify_gen_subreg which horks on the VOIDmode constant.  Thankfully,
the code below the context handles this case, twice the register-mode,
just fine, so let's just gate the simplify_gen_subreg call with a test
for a VOIDmode source.  Bootstrapped and checked for
x86_64-*-linux-gnu and checked for mmix-knuth-mmixware, both -mabi=gnu
multilib (no regressions) and the base multilib with the Knuth ABI,
where the patch below fixes:

Running 
/home/hp/gcctop/tmp/mbasf/gcc/gcc/testsuite/gcc.c-torture/execute/execute.exp 
...
...
FAIL: gcc.c-torture/execute/pr54471.c compilation,  -O1  (internal compiler 
error)
FAIL: gcc.c-torture/execute/pr54471.c compilation,  -O2  (internal compiler 
error)
FAIL: gcc.c-torture/execute/pr54471.c compilation,  -O3 -fomit-frame-pointer  
(internal compiler error)
FAIL: gcc.c-torture/execute/pr54471.c compilation,  -O3 -fomit-frame-pointer 
-funroll-loops  (internal compiler error)
FAIL: gcc.c-torture/execute/pr54471.c compilation,  -O3 -fomit-frame-pointer 
-funroll-all-loops -finline-functions  (internal compiler error)
FAIL: gcc.c-torture/execute/pr54471.c compilation,  -O3 -g  (internal compiler 
error)
FAIL: gcc.c-torture/execute/pr54471.c compilation,  -Os  (internal compiler 
error)
FAIL: gcc.c-torture/execute/pr54471.c compilation,  -O2 -flto 
-fno-use-linker-plugin -flto-partition=none  (internal compiler error)
FAIL: gcc.c-torture/execute/pr54471.c compilation,  -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error)
...
Running /home/hp/gcctop/tmp/mbasf/gcc/gcc/testsuite/gcc.dg/dg.exp ...
...
FAIL: gcc.dg/pr32912-2.c (internal compiler error)
FAIL: gcc.dg/pr32912-2.c (test for excess errors)
FAIL: gcc.dg/pr32912-3.c (internal compiler error)
FAIL: gcc.dg/pr32912-3.c (test for excess errors)
...
Running 
/home/hp/gcctop/tmp/mbasf/gcc/gcc/testsuite/gcc.dg/torture/dg-torture.exp ...
...
FAIL: c-c++-common/torture/vector-compare-2.c  -O1  (internal compiler error)
FAIL: c-c++-common/torture/vector-compare-2.c  -O1  (test for excess errors)
FAIL: c-c++-common/torture/vector-compare-2.c  -O2  (internal compiler error)
FAIL: c-c++-common/torture/vector-compare-2.c  -O2  (test for excess errors)
FAIL: c-c++-common/torture/vector-compare-2.c  -O3 -fomit-frame-pointer  
(internal compiler error)
FAIL: c-c++-common/torture/vector-compare-2.c  -O3 -fomit-frame-pointer  (test 
for excess errors)
FAIL: c-c++-common/torture/vector-compare-2.c  -O3 -fomit-frame-pointer 
-funroll-loops  (internal compiler error)
FAIL: c-c++-common/torture/vector-compare-2.c  -O3 -fomit-frame-pointer 
-funroll-loops  (test for excess errors)
FAIL: c-c++-common/torture/vector-compare-2.c  -O3 -fomit-frame-pointer 
-funroll-all-loops -finline-functions  (internal compiler error)
FAIL: c-c++-common/torture/vector-compare-2.c  -O3 -fomit-frame-pointer 
-funroll-all-loops -finline-functions  (test for excess errors)
FAIL: c-c++-common/torture/vector-compare-2.c  -O3 -g  (internal compiler error)
FAIL: c-c++-common/torture/vector-compare-2.c  -O3 -g  (test for excess errors)
FAIL: c-c++-common/torture/vector-compare-2.c  -Os  (internal compiler error)
FAIL: c-c++-common/torture/vector-compare-2.c  -Os  (test for excess errors)
FAIL: c-c++-common/torture/vector-compare-2.c  -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (internal compiler error)
FAIL: c-c++-common/torture/vector-compare-2.c  -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (test for excess errors)

(ditto the c++ testsuite for c-c++-common/torture/vector-compare-2.c)

(FWIW, the failing gcc.dg/pr32912-3.c is a regression.)

Ok?

gcc:
* expr.c (emit_group_load_1): Don't call simplify_gen_subreg for
a VOIDmode src.

Index: gcc/expr.c
===
--- gcc/expr.c  (revision 192677)
+++ gcc/expr.c  (working copy)
@@ -1739,7 +1739,9 @@ emit_group_load_1 (rtx *tmps, rtx dst, r
  emit_move_insn (mem, src);
  tmps[i] = adjust_address (mem, mode, (int) bytepos);
}
-  else if (CONSTANT_P (src) && GET_MODE (dst) != BLKmode
+  else if (CONSTANT_P (src)
+   

[PATCH i386] Allow cltd/cqto etc on modern CPUs

2012-11-30 Thread Xinliang David Li
Compiling the following code with O2

typedef unsigned long ulong;
typedef __SIZE_TYPE__ size_t;
long woo_i(long a, long b) { return a/b; }

GCC generates:

.LFB0:
.cfi_startproc
movq%rdi, %rdx
movq%rdi, %rax
sarq$63, %rdx
idivq   %rsi
ret

but both ICC and LLVM generate smaller and faster version:

movq  %rdi, %rax
cqto
idivq %rsi
ret

for reference see
http://www.agner.org/optimize/instruction_tables.pdf.  On Pentium, the
latency of the instruction is 3 cycles while on modern CPUs, the
instruction has only one uOp with 1 cycle latency.

The following proposed patch fixed the problem. Note that for Atom,
only the CWD instruction is slow with 5 cycle latency, the rest sign
extension instructions are fast -- the fix for Atom needs finer grain
control and can be done separately.

Ok to install after testing?

Index: config/i386/i386.c
===
--- config/i386/i386.c (revision 193861)
+++ config/i386/i386.c (working copy)
@@ -1822,7 +1822,7 @@ static unsigned int initial_ix86_tune_fe
   m_K6,

   /* X86_TUNE_USE_CLTD */
-  ~(m_PENT | m_CORE2I7 | m_ATOM | m_K6 | m_GENERIC),
+  ~(m_PENT | m_ATOM | m_K6),

   /* X86_TUNE_USE_XCHGB: Use xchgb %rh,%rl instead of rolw/rorw $8,rx.  */
   m_PENT4,

2010-11-30  Xinliang David Li  

* config/i386/i386.c: Allow sign extend instructions (cltd etc)
on modern CPUs.


thanks,

David


Re: [C++ Patch] PR 54170

2012-11-30 Thread Jason Merrill

On 11/30/2012 04:05 PM, Paolo Carlini wrote:

@@ -219,10 +219,15 @@ cp_convert_to_pointer (tree type, tree expr, tsubs
-   expr = build_int_cst (type, 0);
+   expr = (TREE_SIDE_EFFECTS (expr)
+   ? build_nop (type, expr)
+   : build_int_cst (type, 0));


This seems to rely on a nop being sufficient to convert from any null 
pointer constant to the appropriate pointer type, which I don't think is 
safe if the integer is smaller than a pointer.


I'm also not sure if we want to rely on nullptr_t expressions actually 
having the value 0.



-  pfn = build_c_cast (input_location, type, nullptr_node);
+  pfn = (TREE_SIDE_EFFECTS (pfn)
+? build_nop (type, pfn)
+: build_c_cast (input_location, type, nullptr_node));


Here we should be able to just convert pfn to the appropriate function 
pointer type so that we don't need to duplicate the logic.


Jason