date:20130405

Re: Fix PR 56077

2013-04-05 Thread Jakub Jelinek

On Fri, Apr 05, 2013 at 10:54:48AM +0400, Andrey Belevantsev wrote:
> I am testing the revert of this backport for 4.6 and will commit it
> in about an hour or so.  However, I am surprised we don't hit this

Ok, thanks.

> either on 4.7, 4.8 or trunk.  Some flush_pending_lists calls are
> protected from debug insns as they check CALL_P or JUMP_P, but not
> all of them.  It looks like flush_pending_lists should not be called
> on debug insns at all.  And indeed, the attached patch fixes
> Leonid's test case.
> 
> Jakub, you don't happen to remember any changes in this area that
> could hide the problem for 4.7 and later?

No, but Alex or Vlad could know better.  In any case, perhaps it could be
bisected (I only have x86_64 compilers around for bisecting seed though,
I'm afraid the testcase is mips only and hasn't been even posted).

> *** gcc/sched-deps.c  (revision 197492)
> --- gcc/sched-deps.c  (working copy)
> *** sched_analyze_insn (struct deps_desc *de
> *** 3044,3050 
>   
> /* Don't flush pending lists on speculative checks for
>selective scheduling.  */
> !   if (!sel_sched_p () || !sel_insn_is_speculation_check (insn))
>   flush_pending_lists (deps, insn, true, true);
>   
> if (!deps->readonly)
> --- 3044,3050 
>   
> /* Don't flush pending lists on speculative checks for
>selective scheduling.  */
> !   if (NONDEBUG_INSN_P (insn) && (!sel_sched_p () || 
> !sel_insn_is_speculation_check (insn)))

Too long line.  Start && below NONDEBUG.

>   flush_pending_lists (deps, insn, true, true);
>   
> if (!deps->readonly)


Jakub

Re: [PATCH] Loop distribution improvements

2013-04-05 Thread Richard Biener

Jakub Jelinek  wrote:

>On Thu, Apr 04, 2013 at 08:37:47PM +0200, Richard Biener wrote:
>> Can you factor out a function that returns
>> A proper qimode value if possible or null and
>> Use it in both places?
>
>Like this?

You should be able to remove zero, minus one and constructor special casing, 
no? Ok, maybe not constructor handling, but at least move handling of it to the 
function.

Richard.

>2013-04-04  Jakub Jelinek  
>
>   * tree-loop-distribution.c (const_with_all_bytes_same): New function.
>   (generate_memset_builtin): Only handle integer_all_onesp as -1 val if
>   TYPE_PRECISION is equal to mode bitsize.  Use
>const_with_all_bytes_same
>   if possible to compute val.
>   (classify_partition): Verify CONSTRUCTOR doesn't have any elts.
>   For QImode integers don't require anything about precision.  Use
>   const_with_all_bytes_same to find out if the constant doesn't have
>   repeated bytes in it.
>
>   * gcc.dg/pr56837.c: New test.
>
>--- gcc/tree-loop-distribution.c.jj2013-04-04 15:03:28.0 +0200
>+++ gcc/tree-loop-distribution.c   2013-04-04 20:49:14.295546543 +0200
>@@ -297,6 +297,27 @@ build_addr_arg_loc (location_t loc, data
>return fold_build_pointer_plus_loc (loc, DR_BASE_ADDRESS (dr),
>addr_base);
> }
> 
>+/* If VAL memory representation contains the same value in all bytes,
>+   return that value, otherwise return -1.
>+   E.g. for 0x24242424 return 0x24, for IEEE double
>+   747708026454360457216.0 return 0x44, etc.  */
>+
>+static int
>+const_with_all_bytes_same (tree val)
>+{
>+  unsigned char buf[64];
>+  int i, len;
>+  if (CHAR_BIT != 8 || BITS_PER_UNIT != 8)
>+return -1;
>+  len = native_encode_expr (val, buf, sizeof (buf));
>+  if (len == 0)
>+return -1;
>+  for (i = 1; i < len; i++)
>+if (buf[i] != buf[0])
>+  return -1;
>+  return buf[0];
>+}
>+
> /* Generate a call to memset for PARTITION in LOOP.  */
> 
> static void
>@@ -331,11 +352,18 @@ generate_memset_builtin (struct loop *lo
>   || real_zerop (val)
>   || TREE_CODE (val) == CONSTRUCTOR)
> val = integer_zero_node;
>-  else if (integer_all_onesp (val))
>+  else if (integer_all_onesp (val)
>+ && TYPE_PRECISION (TREE_TYPE (val))
>+== GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (val
> val = build_int_cst (integer_type_node, -1);
>   else
> {
>-  if (TREE_CODE (val) == INTEGER_CST)
>+  /* Handle constants like 0x15151515 and similarly
>+   floating point constants etc. where all bytes are the same.  */
>+  int bytev = const_with_all_bytes_same (val);
>+  if (bytev != -1)
>+  val = build_int_cst (integer_type_node, bytev);
>+  else if (TREE_CODE (val) == INTEGER_CST)
>   val = fold_convert (integer_type_node, val);
>else if (!useless_type_conversion_p (integer_type_node, TREE_TYPE
>(val)))
>   {
>@@ -944,15 +972,15 @@ classify_partition (loop_p loop, struct
>   if (!(integer_zerop (rhs)
>   || real_zerop (rhs)
>   || (TREE_CODE (rhs) == CONSTRUCTOR
>-  && !TREE_CLOBBER_P (rhs))
>-  || ((integer_all_onesp (rhs)
>-   || (INTEGRAL_TYPE_P (TREE_TYPE (rhs))
>-   && (TYPE_MODE (TREE_TYPE (rhs))
>-   == TYPE_MODE (unsigned_char_type_node
>-  /* For stores of a non-zero value require that the precision
>- of the value matches its actual size.  */
>+  && !TREE_CLOBBER_P (rhs)
>+  && CONSTRUCTOR_NELTS (rhs) == 0)
>+  || (integer_all_onesp (rhs)
>   && (TYPE_PRECISION (TREE_TYPE (rhs))
>-  == GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (rhs)))
>+  == GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (rhs)
>+  || (INTEGRAL_TYPE_P (TREE_TYPE (rhs))
>+  && (TYPE_MODE (TREE_TYPE (rhs))
>+  == TYPE_MODE (unsigned_char_type_node)))
>+  || const_with_all_bytes_same (rhs) != -1))
>   return;
>   if (TREE_CODE (rhs) == SSA_NAME
> && !SSA_NAME_IS_DEFAULT_DEF (rhs)
>--- gcc/testsuite/gcc.dg/pr56837.c.jj  2013-04-04 17:37:58.458675152
>+0200
>+++ gcc/testsuite/gcc.dg/pr56837.c 2013-04-04 17:36:40.0 +0200
>@@ -0,0 +1,67 @@
>+/* Limit this test to selected targets with IEEE double, 8-byte long
>long,
>+   supported 4x int vectors, 4-byte int.  */
>+/* { dg-do compile { target { i?86-*-* x86_64-*-* powerpc*-*-* } } }
>*/
>+/* { dg-options "-O3 -fdump-tree-optimized" } */
>+/* { dg-additional-options "-msse2" { target ia32 } } */
>+/* { dg-additional-options "-mvsx -maltivec" { target powerpc*-*-* } }
>*/
>+
>+typedef int V __attribute__((__vector_size__ (16)));
>+#define N 1024
>+double d[N];
>+long long int l[N];
>+_Bool b[N];
>+_Complex double c[N];
>+V v[N];
>+
>+void
>+fd (void)
>+{
>+  int i;
>+  for (i = 0; i < N; i++)
>+d[i] = 747708026454360457216.0;
>+}
>+
>+void
>+fl (void)
>+{
>+  int i;
>+  for (i = 0; i < N; i++)
>+l[i] = 0x7c7c7c7c

Re: [PATCH] Loop distribution improvements

2013-04-05 Thread Jakub Jelinek

On Fri, Apr 05, 2013 at 09:21:16AM +0200, Richard Biener wrote:
> Jakub Jelinek  wrote:
> 
> >On Thu, Apr 04, 2013 at 08:37:47PM +0200, Richard Biener wrote:
> >> Can you factor out a function that returns
> >> A proper qimode value if possible or null and
> >> Use it in both places?
> >
> >Like this?
> 
> You should be able to remove zero, minus one and constructor special
> casing, no?  Ok, maybe not constructor handling, but at least move

No, because the function is only handling BITS_PER_UNIT == 8 && CHAR_BIT == 8,
plus is unnecessarily expensive for the common case of storing 0.

But if you want, I can move all that integer_zerop / real_zerop /
CONSTRUCTOR / integer_all_onesp handling into the function.

BTW, the integer_all_onesp stuff is broken for this from what I can see: for
complex numbers it returns true for -1 + 0i, where not all bytes are 0xff, so
we need to rule out COMPLEX_CSTs (or do integer_all_onesp on each part instead).
And TYPE_PRECISION on VECTOR_CSTs won't be what we are looking for.

Jakub
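
To make the byte-pattern test discussed above concrete, here is a minimal sketch in plain C++. It is not the GCC code: `all_bytes_same` is a hypothetical stand-in for the patch's const_with_all_bytes_same that looks at the object representation of an ordinary value rather than encoding a tree with native_encode_expr, and it assumes 8-bit bytes and, for the floating-point case, IEEE double.

```cpp
#include <cassert>      // used only by the usage checks described below
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper (not part of GCC): returns the byte value shared by
// every byte of VALUE's in-memory representation, or -1 if the bytes differ.
template <typename T>
int all_bytes_same(const T &value)
{
  static_assert(std::is_trivially_copyable<T>::value,
                "the object representation must be copyable with memcpy");
  unsigned char buf[sizeof(T)];
  std::memcpy(buf, &value, sizeof(T));   // read the object representation
  for (std::size_t i = 1; i < sizeof(T); ++i)
    if (buf[i] != buf[0])
      return -1;
  return buf[0];
}
```

With this sketch, `all_bytes_same(std::uint32_t{0x24242424u})` yields 0x24 and, on an IEEE double target, `all_bytes_same(747708026454360457216.0)` yields 0x44, matching the examples in the patch comment. A store loop whose value passes this test is one that a single memset could replace.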

Re: [PATCH] Loop distribution improvements

2013-04-05 Thread Marc Glisse


On Fri, 5 Apr 2013, Jakub Jelinek wrote:

> On Fri, Apr 05, 2013 at 09:21:16AM +0200, Richard Biener wrote:
>> Jakub Jelinek  wrote:
>>
>>> On Thu, Apr 04, 2013 at 08:37:47PM +0200, Richard Biener wrote:
>>>> Can you factor out a function that returns
>>>> A proper qimode value if possible or null and
>>>> Use it in both places?
>>>
>>> Like this?
>>
>> You should be able to remove zero, minus one and constructor special
>> casing, no?  Ok, maybe not constructor handling, but at least move
>
> No, because the function is only handling BITS_PER_UNIT == 8 && CHAR_BIT == 8,
> plus is unnecessarily expensive for the common case of storing 0.
>
> But if you want, I can move all that integer_zerop / real_zerop /
> CONSTRUCTOR / integer_all_onesp handling into the function.
>
> BTW, the integer_all_onesp stuff is broken for this from what I can see, for
> complex numbers it returns true for -1 + 0i where all bytes aren't 0xff, so we
> need to rule out COMPLEX_CSTs (or do integer_all_onesp on each part instead).
> And TYPE_PRECISION on VECTOR_CSTs won't be what we are looking for.


Shouldn't we change integer_all_onesp to do what its name says and create 
a separate integer_minus_onep for the single place I could find where it 
would break, the folding of x * -1 ?


--
Marc Glisse
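
The distinction Marc draws between "all ones" and "minus one" can be illustrated outside of GCC. The following is only a sketch under stated assumptions: `ComplexInt`, `is_minus_one` and `is_all_ones` are hypothetical names, the struct is a toy model of an integer complex constant, and two's complement integers without padding are assumed.

```cpp
#include <cassert>      // used only by the usage checks described below
#include <cstddef>
#include <cstring>

// Hypothetical toy model of an integer complex constant (not a GCC type).
struct ComplexInt { int re; int im; };

// "Minus one" in the arithmetic sense: the value equals -1 + 0i.
bool is_minus_one(const ComplexInt &c)
{
  return c.re == -1 && c.im == 0;
}

// "All ones" in the representation sense: every byte of the object is 0xff.
bool is_all_ones(const ComplexInt &c)
{
  unsigned char buf[sizeof(ComplexInt)];
  std::memcpy(buf, &c, sizeof buf);
  for (std::size_t i = 0; i < sizeof buf; ++i)
    if (buf[i] != 0xff)
      return false;
  return true;
}
```

For `ComplexInt{-1, 0}` the first predicate is true and the second is false, which is the case Jakub points out; folding x * -1 wants the first predicate, while a memset-style transformation wants the second.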

Re: Fix PR 56077

2013-04-05 Thread Eric Botcazou

> Jakub, you don't happen to remember any changes in this area that could
> hide the problem for 4.7 and later?

We do have regressions on the 4.7 branch in the scheduler (CCed Olivier who 
has more information).

-- 
Eric Botcazou

Re: Fill more delay slots in conditional returns

2013-04-05 Thread Eric Botcazou

> Thinking about this some more: This could be fixed by inserting a
> machine-specific pass just after delayed-branch scheduling, like in
> the attached patch. I think the same is possible with the dbr_schedule
> call in the MIPS backend.
> 
> Eric, what do you think of this approach?

No objections on principle from a SPARC viewpoint.  But can we really control 
when the pass is run with register_pass?  Because it needs to be run _before_ 
branch shortening.

> With those two dbr_schedule calls out of the way, it will be a lot
> easier to change things such that pass_free_cfg can run after
> pass_machine_reorg (and after pass_cleanup_barriers that can be
> simplified if there's still a CFG around). It will also help make the
> DELAY_SLOTS hack in cfgrtl.c:rest_of_pass_free_cfg redundant.

I agree that we should get rid of MIPS/SPARC's dbr_schedule shuffling.

-- 
Eric Botcazou

[patch tree-ssa-structalias.c]: Small finding in find_func_aliases function

2013-04-05 Thread Kai Tietz

Hello,

While debugging I found that in find_func_aliases rhsop might be used
while NULL for gimple_assign_single_p statements.  For the
gimple_assign_single_p case, the rhs1 item should instead be passed
directly as the argument to get_constraint_for_rhs.

ChangeLog

2013-04-05  Kai Tietz

* tree-ssa-structalias.c (find_func_aliases): Special-case
gimple_assign_single_p handling.

OK to apply?

Regards,
Kai


Index: tree-ssa-structalias.c
===
--- tree-ssa-structalias.c  (revision 197495)
+++ tree-ssa-structalias.c  (working copy)
@@ -4667,9 +4667,10 @@ find_func_aliases (gimple origt)
}
  else if ((CONVERT_EXPR_CODE_P (code)
&& !(POINTER_TYPE_P (gimple_expr_type (t))
-&& !POINTER_TYPE_P (TREE_TYPE (rhsop
-  || gimple_assign_single_p (t))
+&& !POINTER_TYPE_P (TREE_TYPE (rhsop)
get_constraint_for_rhs (rhsop, &rhsc);
+ else if (gimple_assign_single_p (t))
+   get_constraint_for_rhs (gimple_assign_rhs1 (t), &rhsc);
  else if (code == COND_EXPR)
{
  /* The result is a merge of both COND_EXPR arms.  */

Re: [patch] replace a bunch of equivalent checks for asm operands with a new function

2013-04-05 Thread Eric Botcazou

> Hmm, what do you have in mind for such a situation?
> 
> If extract_asm_operands returns NULL then asm_noperands will return -1.
> 
> If extract_asm_operands returns non-NULL then asm_noperands deep-dives
> the PATTERN of the insn (just like extract_asm_operands) and returns
> >= 0 unless the insn is invalid.

I don't think that we want to replace calls to extract_asm_operands by calls 
to asm_noperands because that will make the compiler slower and less robust on 
invalid inputs.

Why can't we write insn_with_asm_operands_p as

  GET_CODE (body) == ASM_INPUT || extract_asm_operands (body) != NULL

and replace most of the cases with a call to it?

-- 
Eric Botcazou

Re: [PATCH 2/3] libstdc++-v3: ::tmpnam depends on uClibc SUSV4_LEGACY

2013-04-05 Thread Rainer Orth

Gabriel Dos Reis  writes:

>> diff --git a/libstdc++-v3/include/c_global/cstdio 
>> b/libstdc++-v3/include/c_global/cstdio
>> index fcbec0c..037a668 100644
>> --- a/libstdc++-v3/include/c_global/cstdio
>> +++ b/libstdc++-v3/include/c_global/cstdio
>> @@ -131,7 +131,9 @@ namespace std
>>using ::sprintf;
>>using ::sscanf;
>>using ::tmpfile;
>> +#if !defined __UCLIBC__ || defined __UCLIBC_SUSV4_LEGACY__
>>using ::tmpnam;
>> +#endif
>>using ::ungetc;
>>using ::vfprintf;
>>using ::vprintf;
>> --
>> 1.7.10.4
>>
>
> Sounds good to me.

Do we really want to use target-specific macros directly instead of
defining something more abstract either via a configure test or a define
in config/os/uclibc?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: [PATCH 2/3] libstdc++-v3: ::tmpnam depends on uClibc SUSV4_LEGACY

2013-04-05 Thread Gabriel Dos Reis

On Fri, Apr 5, 2013 at 4:01 AM, Rainer Orth  
wrote:
> Gabriel Dos Reis  writes:
>
>>> diff --git a/libstdc++-v3/include/c_global/cstdio 
>>> b/libstdc++-v3/include/c_global/cstdio
>>> index fcbec0c..037a668 100644
>>> --- a/libstdc++-v3/include/c_global/cstdio
>>> +++ b/libstdc++-v3/include/c_global/cstdio
>>> @@ -131,7 +131,9 @@ namespace std
>>>using ::sprintf;
>>>using ::sscanf;
>>>using ::tmpfile;
>>> +#if !defined __UCLIBC__ || defined __UCLIBC_SUSV4_LEGACY__
>>>using ::tmpnam;
>>> +#endif
>>>using ::ungetc;
>>>using ::vfprintf;
>>>using ::vprintf;
>>> --
>>> 1.7.10.4
>>>
>>
>> Sounds good to me.
>
> Do we really want to use target-specific macros directly instead of
> defining something more abstract either via a configure test or a define
> in config/os/uclibc?
>
> Rainer

What would your suggestion be for defining something more abstract that reliably
says whether the feature is deprecated or absent?


>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University

[PATCH v2] inline fail reporting: reporting inline fail caused by overwritable function

2013-04-05 Thread zhouzhouyi

From: Zhouyi Zhou 

When inlining fails because the callee is overwritable, gcc does not report it
in the dump file as it does for other inline failures. This patch corrects that.

ChangeLog:
2013-04-05  Zhouyi Zhou 

* cif-code.def (OVERWRITABLE): Correct the comment for overwritable
function.
* ipa-inline.c (can_inline_edge_p): Let the dump mechanism report the
inline failure caused by an overwritable function.


Signed-off-by: Zhouyi Zhou 

---
Index: gcc/cif-code.def
===
--- gcc/cif-code.def(revision 197506)
+++ gcc/cif-code.def(working copy)
@@ -48,7 +48,7 @@ DEFCIFCODE(REDEFINED_EXTERN_INLINE,
 /* Function is not inlinable.  */
 DEFCIFCODE(FUNCTION_NOT_INLINABLE, N_("function not inlinable"))
 
-/* Function is not overwritable.  */
+/* Function is overwritable.  */
 DEFCIFCODE(OVERWRITABLE, N_("function body can be overwritten at link time"))
 
 /* Function is not an inlining candidate.  */
Index: gcc/ipa-inline.c
===
--- gcc/ipa-inline.c(revision 197506)
+++ gcc/ipa-inline.c(working copy)
@@ -266,7 +266,7 @@ can_inline_edge_p (struct cgraph_edge *e
   else if (avail <= AVAIL_OVERWRITABLE)
 {
   e->inline_failed = CIF_OVERWRITABLE;
-  return false;
+  inlinable = false;
 }
   else if (e->call_stmt_cannot_inline_p)
 {
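
Why replacing `return false` with `inlinable = false` makes the failure show up in the dump can be sketched with a much simplified model. This is not the GCC function: `Edge`, `dump_lines`, `can_inline_before` and `can_inline_after` are hypothetical, and the sketch assumes that the real function reports failures at a single point near its end, which the early return skips.

```cpp
#include <cassert>      // used only by the usage checks described below
#include <string>
#include <vector>

// Hypothetical, simplified model of a call-graph edge.
struct Edge {
  bool not_inlinable = false;
  bool overwritable = false;
  const char *inline_failed = nullptr;
};

std::vector<std::string> dump_lines;   // stands in for the dump file

// Before the patch: the overwritable case returns early and is never reported.
bool can_inline_before(Edge &e, bool report)
{
  bool inlinable = true;
  if (e.not_inlinable) {
    e.inline_failed = "function not inlinable";
    inlinable = false;
  } else if (e.overwritable) {
    e.inline_failed = "function body can be overwritten at link time";
    return false;                      // skips the reporting code below
  }
  if (!inlinable && report)
    dump_lines.push_back(e.inline_failed);
  return inlinable;
}

// After the patch: the overwritable case falls through to the common report.
bool can_inline_after(Edge &e, bool report)
{
  bool inlinable = true;
  if (e.not_inlinable) {
    e.inline_failed = "function not inlinable";
    inlinable = false;
  } else if (e.overwritable) {
    e.inline_failed = "function body can be overwritten at link time";
    inlinable = false;
  }
  if (!inlinable && report)
    dump_lines.push_back(e.inline_failed);
  return inlinable;
}
```

Both variants return false for an overwritable callee; only the second one leaves a line in the dump.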

Re: [PATCH 2/3] libstdc++-v3: ::tmpnam depends on uClibc SUSV4_LEGACY

2013-04-05 Thread Rainer Orth

Gabriel Dos Reis  writes:

> On Fri, Apr 5, 2013 at 4:01 AM, Rainer Orth  
> wrote:
>> Gabriel Dos Reis  writes:
>>
>>>> diff --git a/libstdc++-v3/include/c_global/cstdio
>>>> b/libstdc++-v3/include/c_global/cstdio
>>>> index fcbec0c..037a668 100644
>>>> --- a/libstdc++-v3/include/c_global/cstdio
>>>> +++ b/libstdc++-v3/include/c_global/cstdio
>>>> @@ -131,7 +131,9 @@ namespace std
>>>>    using ::sprintf;
>>>>    using ::sscanf;
>>>>    using ::tmpfile;
>>>> +#if !defined __UCLIBC__ || defined __UCLIBC_SUSV4_LEGACY__
>>>>    using ::tmpnam;
>>>> +#endif
>>>>    using ::ungetc;
>>>>    using ::vfprintf;
>>>>    using ::vprintf;
>>>> --
>>>> 1.7.10.4
>>>>
>>>
>>> Sounds good to me.
>>
>> Do we really want to use target-specific macros directly instead of
>> defining something more abstract either via a configure test or a define
>> in config/os/uclibc?
>>
>> Rainer
>
> What would your suggestion be for defining something more abstract that reliably
> says whether the feature is deprecated or absent?

It seems _GLIBCXX_USE_TMPNAM would be in line with the other macros I
see.  Then either configure could test if tmpnam() is available without
special additional macros or config/os/uclibc/os_config.h could define
it to 0, with a default of 1 (best decided by the libstdc++
maintainers).

The configure route seems cleaner to me, especially given that
Bernhard's rationale for uClibc no longer providing it by default
suggests that other systems might follow in the foreseeable future.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University
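
As a rough illustration of the more abstract guard proposed above, the header could test a single feature macro instead of uClibc-specific ones. The sketch below is hypothetical: `DEMO_USE_TMPNAM` stands in for the suggested _GLIBCXX_USE_TMPNAM, the `demo` namespace stands in for `std`, and the default of 1 is an assumption that a configure test or config/os/uclibc/os_config.h would override.

```cpp
#include <cassert>      // used only by the usage check described below
#include <cstdio>

// Hypothetical stand-in for the proposed _GLIBCXX_USE_TMPNAM.  A configure
// test or an OS-specific config header would define it to 0 where the C
// library does not provide tmpnam().
#ifndef DEMO_USE_TMPNAM
# define DEMO_USE_TMPNAM 1   // default: assume tmpnam() is available
#endif

namespace demo
{
  using ::tmpfile;
#if DEMO_USE_TMPNAM
  using ::tmpnam;            // exported only when the feature macro is set
#endif

  // Lets callers (and the check below) see which way the guard went.
  constexpr bool has_tmpnam = DEMO_USE_TMPNAM != 0;
}
```

The header then no longer mentions any particular C library, so another system dropping tmpnam() would only need a different configure result.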

Re: [PATCH 2/3] libstdc++-v3: ::tmpnam depends on uClibc SUSV4_LEGACY

2013-04-05 Thread Gabriel Dos Reis

On Fri, Apr 5, 2013 at 4:13 AM, Rainer Orth  
wrote:
> Gabriel Dos Reis  writes:
>
>> On Fri, Apr 5, 2013 at 4:01 AM, Rainer Orth  
>> wrote:
>>> Gabriel Dos Reis  writes:
>>>
>>>>> diff --git a/libstdc++-v3/include/c_global/cstdio
>>>>> b/libstdc++-v3/include/c_global/cstdio
>>>>> index fcbec0c..037a668 100644
>>>>> --- a/libstdc++-v3/include/c_global/cstdio
>>>>> +++ b/libstdc++-v3/include/c_global/cstdio
>>>>> @@ -131,7 +131,9 @@ namespace std
>>>>>    using ::sprintf;
>>>>>    using ::sscanf;
>>>>>    using ::tmpfile;
>>>>> +#if !defined __UCLIBC__ || defined __UCLIBC_SUSV4_LEGACY__
>>>>>    using ::tmpnam;
>>>>> +#endif
>>>>>    using ::ungetc;
>>>>>    using ::vfprintf;
>>>>>    using ::vprintf;
>>>>> --
>>>>> 1.7.10.4
>>>>>
>>>>
>>>> Sounds good to me.
>>>
>>> Do we really want to use target-specific macros directly instead of
>>> defining something more abstract either via a configure test or a define
>>> in config/os/uclibc?
>>>
>>> Rainer
>>
>> What would your suggestion be for defining something more abstract that reliably
>> says whether the feature is deprecated or absent?
>
> It seems _GLIBCXX_USE_TMPNAM would be in line with the other macros I
> see.  Then either configure could test if tmpnam() is available without
> special additional macros or config/os/uclibc/os_config.h could define
> it to 0, with a default of 1 (best decided by the libstdc++
> maintainers).
>
> The configure route seems cleaner to me, especially given that
> Bernhard's rationale for uClibc no longer providing it by default
> suggests that other systems might follow in the foreseeable future.
>
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University

sounds reasonable; Bernhard, would you mind amending your patch in
that direction?

Re: [PATCH 3/3] libsanitizer: add LFS guards

2013-04-05 Thread Bernhard Reutner-Fischer

On 5 April 2013 08:42, Konstantin Serebryany
 wrote:
>
> On Fri, Apr 5, 2013 at 10:37 AM, Jakub Jelinek  wrote:
> > On Thu, Apr 04, 2013 at 09:53:30PM +0200, Bernhard Reutner-Fischer wrote:
> >> uClibc can be built without Largefile support, add the corresponding
> >> guards. uClibc does not have __libc_malloc()/__libc_free(), add guard.
> >
> > Ugh, this is very ugly.  In addition to the stuff mentioned by Konstantin
> > that this really should go into upstream first:
> >
> >> --- a/libsanitizer/interception/interception_type_test.cc
> >> +++ b/libsanitizer/interception/interception_type_test.cc
> >> @@ -22,7 +22,7 @@ COMPILER_CHECK(sizeof(SSIZE_T) == sizeof(ssize_t));
> >>  COMPILER_CHECK(sizeof(PTRDIFF_T) == sizeof(ptrdiff_t));
> >>  COMPILER_CHECK(sizeof(INTMAX_T) == sizeof(intmax_t));
> >>
> >> -#ifndef __APPLE__
> >> +#if !defined __APPLE__ && (defined __USE_LARGEFILE64 && defined 
> >> __off64_t_defined)
> >
> > Using the internal implementation detail of __USE_LARGEFILE64 is very ugly,
> > but why __off64_t_defined?  That macro is there just to avoid typedefing it
> > multiple times, if you include more than one of the sys/types.h, stdio.h and
> > unistd.h headers.  If you include any of those headers, it will be defined
> > when __USE_LARGEFILE64 is defined.  Or is uClibc not guaranteeing that?


It does guarantee that, let me see if i can drop that  && defined
__off64_t_defined.
>
> >
> >> --- a/libsanitizer/sanitizer_common/sanitizer_allocator.cc
> >> +++ b/libsanitizer/sanitizer_common/sanitizer_allocator.cc
> >> @@ -9,11 +9,13 @@
> >>  // run-time libraries.
> >>  // This allocator that is used inside run-times.
> >>  
> >> //===--===//
> >> +
> >> +#include 
>
> I overlooked this.
> The sanitizer files (other than *_linux.cc and such) may not include
> *any* system header files.
> We've been there, it cost us lots of pain and lots of work to get rid of.


So how do you suggest i should deal with it then?
I do not have a CPP token inside of the compiler to denote the libc
type, AFAICS.

thanks,
>
>
> --kcc
>
>
> >
> > I'm afraid features.h won't exist on many targets, it isn't a standard
> > header.  I'd say you want to include some standard header instead (stdio.h?)
> > or guard this.
> >
> > Jakub

Re: [PATCH 2/3] libstdc++-v3: ::tmpnam depends on uClibc SUSV4_LEGACY

2013-04-05 Thread Bernhard Reutner-Fischer

On 5 April 2013 11:23, Gabriel Dos Reis  wrote:
> On Fri, Apr 5, 2013 at 4:13 AM, Rainer Orth  
> wrote:
>> Gabriel Dos Reis  writes:
>>
>>> On Fri, Apr 5, 2013 at 4:01 AM, Rainer Orth  
>>> wrote:
>>>> Gabriel Dos Reis  writes:
>>>>
>>>>>> diff --git a/libstdc++-v3/include/c_global/cstdio
>>>>>> b/libstdc++-v3/include/c_global/cstdio
>>>>>> index fcbec0c..037a668 100644
>>>>>> --- a/libstdc++-v3/include/c_global/cstdio
>>>>>> +++ b/libstdc++-v3/include/c_global/cstdio
>>>>>> @@ -131,7 +131,9 @@ namespace std
>>>>>>    using ::sprintf;
>>>>>>    using ::sscanf;
>>>>>>    using ::tmpfile;
>>>>>> +#if !defined __UCLIBC__ || defined __UCLIBC_SUSV4_LEGACY__
>>>>>>    using ::tmpnam;
>>>>>> +#endif
>>>>>>    using ::ungetc;
>>>>>>    using ::vfprintf;
>>>>>>    using ::vprintf;
>>>>>> --
>>>>>> 1.7.10.4
>>>>>>
>>>>>
>>>>> Sounds good to me.
>>>>
>>>> Do we really want to use target-specific macros directly instead of
>>>> defining something more abstract either via a configure test or a define
>>>> in config/os/uclibc?
>>>>
>>>> Rainer
>>>
>>> What would your suggestion be for defining something more abstract that
>>> reliably says whether the feature is deprecated or absent?
>>
>> It seems _GLIBCXX_USE_TMPNAM would be in line with the other macros I
>> see.  Then either configure could test if tmpnam() is available without
>> special additional macros or config/os/uclibc/os_config.h could define
>> it to 0, with a default of 1 (best decided by the libstdc++
>> maintainers).
>>
>> The configure route seems cleaner to me, especially given that
>> Bernhard's rationale for uClibc no longer providing it by default
>> suggests that other systems might follow in the foreseeable future.

> sounds reasonable; Bernhard, would you mind amending your patch in
> that direction?

I'll have a look.
Thanks,

Re: [PATCH 3/3] libsanitizer: add LFS guards

2013-04-05 Thread Jakub Jelinek

On Fri, Apr 05, 2013 at 11:46:44AM +0200, Bernhard Reutner-Fischer wrote:
> > >> --- a/libsanitizer/sanitizer_common/sanitizer_allocator.cc
> > >> +++ b/libsanitizer/sanitizer_common/sanitizer_allocator.cc
> > >> @@ -9,11 +9,13 @@
> > >>  // run-time libraries.
> > >>  // This allocator that is used inside run-times.
> > >>  
> > >> //===--===//
> > >> +
> > >> +#include 
> >
> > I overlooked this.
> > The sanitizer files (other than *_linux.cc and such) may not include
> > *any* system header files.
> > We've been there, it cost us lots of pain and lots of work to get rid of.
> 
> 
> So how do you suggest i should deal with it then?
> I do not have a CPP token inside of the compiler to denote the libc
> type, AFAICS.

autoconf, and pass some -DHAS___LIBC_MALLOC or similar down to the compiler
(or just -DUCLIBC or whatever; -DUCLIBC would have the advantage that,
provided llvm doesn't support uClibc, you'd only need to change gcc's
configury)?

Jakub
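
A sketch of what the configure-driven guard suggested above might look like on the source side, so that the file needs no system header to learn the libc flavour. Everything here is an assumption made for illustration: `DEMO_HAS___LIBC_MALLOC` is a hypothetical macro that the build system would pass with -D, `raw_alloc` and `raw_free` are hypothetical wrappers, and the fallback to plain malloc/free is chosen for this sketch only, not necessarily what libsanitizer would do.

```cpp
#include <cassert>      // used only by the usage check described below
#include <cstddef>
#include <cstdlib>

// DEMO_HAS___LIBC_MALLOC is hypothetical: a configure check would pass
// -DDEMO_HAS___LIBC_MALLOC only when the C library exports these symbols.
#ifdef DEMO_HAS___LIBC_MALLOC
extern "C" void *__libc_malloc(std::size_t size);
extern "C" void __libc_free(void *ptr);

inline void *raw_alloc(std::size_t size) { return __libc_malloc(size); }
inline void raw_free(void *ptr) { __libc_free(ptr); }
#else
// Fallback used by this sketch when the symbols are not available.
inline void *raw_alloc(std::size_t size) { return std::malloc(size); }
inline void raw_free(void *ptr) { std::free(ptr); }
#endif
```

The decision is made once at configure time and the source only tests one macro.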

Re: [patch] C++11: Observers for the three 'handler functions'

2013-04-05 Thread Jonathan Wakely

This should fix the handlers for platforms without __atomic_exchange
for pointers by using a mutex. I used the old __mutex type not
std::mutex because it's available on more platforms and works for the
'single' thread model too.  I didn't try to optimise away the atomic
ops for accessing the handlers because throwing exceptions and
getting/setting these handlers should not be on the fast path of
performance sensitive code.
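
The shape of the fallback described above can be sketched in portable C++11. This is not the committed code: it uses std::atomic and std::mutex where the patch uses the __atomic builtins and the older __gnu_cxx::__mutex, and the names in namespace `demo` are hypothetical.

```cpp
#include <atomic>
#include <cassert>      // used only by the usage checks described below
#include <mutex>

namespace demo
{
  using handler = void (*)();

#if ATOMIC_POINTER_LOCK_FREE > 1
  // Pointer atomics are always lock-free: exchange and load atomically.
  std::atomic<handler> current{nullptr};

  handler set_handler(handler func) noexcept
  {
    return current.exchange(func, std::memory_order_acq_rel);
  }

  handler get_handler() noexcept
  {
    return current.load(std::memory_order_acquire);
  }
#else
  // No lock-free pointer atomics: protect a plain pointer with a mutex.
  handler current = nullptr;
  std::mutex mx;

  handler set_handler(handler func)
  {
    std::lock_guard<std::mutex> lock(mx);
    handler old = current;
    current = func;
    return old;
  }

  handler get_handler()
  {
    std::lock_guard<std::mutex> lock(mx);
    return current;
  }
#endif
}
```

In both branches set_handler returns the previously installed handler, so callers see the same behaviour whichever path the target selects.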

PR libstdc++/56841
* libsupc++/eh_ptr.cc (rethrow_exception): Use get_unexpected() and
get_terminate() accessors.
* libsupc++/eh_throw.cc (__cxa_throw): Likewise.
* libsupc++/eh_terminate.cc: Use mutex when atomic builtins not
available.
* libsupc++/new_handler.cc: Likewise.

Tested x86_64-linux and hppa2.0-linux, committed to trunk.
commit 5a24add997a6f4052657855a4573ace82fd59ea2
Author: Jonathan Wakely 
Date:   Fri Apr 5 00:39:58 2013 +0100

PR libstdc++/56841
* libsupc++/eh_ptr.cc (rethrow_exception): Use get_unexpected() and
get_terminate() accessors.
* libsupc++/eh_throw.cc (__cxa_throw): Likewise.
* libsupc++/eh_terminate.cc: Use mutex when atomic builtins not
available.
* libsupc++/new_handler.cc: Likewise.

diff --git a/libstdc++-v3/libsupc++/eh_ptr.cc b/libstdc++-v3/libsupc++/eh_ptr.cc
index f0183ce..6bc3311 100644
--- a/libstdc++-v3/libsupc++/eh_ptr.cc
+++ b/libstdc++-v3/libsupc++/eh_ptr.cc
@@ -212,8 +212,8 @@ std::rethrow_exception(std::exception_ptr ep)
   dep->primaryException = obj;
   __atomic_add_fetch (&eh->referenceCount, 1,  __ATOMIC_ACQ_REL);
 
-  dep->unexpectedHandler = __unexpected_handler;
-  dep->terminateHandler = __terminate_handler;
+  dep->unexpectedHandler = get_unexpected ();
+  dep->terminateHandler = get_terminate ();
   __GXX_INIT_DEPENDENT_EXCEPTION_CLASS(dep->unwindHeader.exception_class);
   dep->unwindHeader.exception_cleanup = __gxx_dependent_exception_cleanup;
 
diff --git a/libstdc++-v3/libsupc++/eh_terminate.cc 
b/libstdc++-v3/libsupc++/eh_terminate.cc
index bc38e1d..b31d2e2 100644
--- a/libstdc++-v3/libsupc++/eh_terminate.cc
+++ b/libstdc++-v3/libsupc++/eh_terminate.cc
@@ -27,6 +27,15 @@
 #include 
 #include "unwind-cxx.h"
 #include 
+#include 
+
+#if ATOMIC_POINTER_LOCK_FREE < 2
+#include 
+namespace
+{
+  __gnu_cxx::__mutex mx;
+}
+#endif
 
 using namespace __cxxabiv1;
 
@@ -65,7 +74,13 @@ std::terminate_handler
 std::set_terminate (std::terminate_handler func) throw()
 {
   std::terminate_handler old;
+#if ATOMIC_POINTER_LOCK_FREE > 1
   __atomic_exchange (&__terminate_handler, &func, &old, __ATOMIC_ACQ_REL);
+#else
+  __gnu_cxx::__scoped_lock l(mx);
+  old = __terminate_handler;
+  __terminate_handler = func;
+#endif
   return old;
 }
 
@@ -73,7 +88,12 @@ std::terminate_handler
 std::get_terminate () noexcept
 {
   std::terminate_handler func;
+#if ATOMIC_POINTER_LOCK_FREE > 1
   __atomic_load (&__terminate_handler, &func, __ATOMIC_ACQUIRE);
+#else
+  __gnu_cxx::__scoped_lock l(mx);
+  func = __terminate_handler;
+#endif
   return func;
 }
 
@@ -81,7 +101,13 @@ std::unexpected_handler
 std::set_unexpected (std::unexpected_handler func) throw()
 {
   std::unexpected_handler old;
+#if ATOMIC_POINTER_LOCK_FREE > 1
   __atomic_exchange (&__unexpected_handler, &func, &old, __ATOMIC_ACQ_REL);
+#else
+  __gnu_cxx::__scoped_lock l(mx);
+  old = __unexpected_handler;
+  __unexpected_handler = func;
+#endif
   return old;
 }
 
@@ -89,6 +115,11 @@ std::unexpected_handler
 std::get_unexpected () noexcept
 {
   std::unexpected_handler func;
+#if ATOMIC_POINTER_LOCK_FREE > 1
   __atomic_load (&__unexpected_handler, &func, __ATOMIC_ACQUIRE);
+#else
+  __gnu_cxx::__scoped_lock l(mx);
+  func = __unexpected_handler;
+#endif
   return func;
 }
diff --git a/libstdc++-v3/libsupc++/eh_throw.cc 
b/libstdc++-v3/libsupc++/eh_throw.cc
index a79a025..5d37698 100644
--- a/libstdc++-v3/libsupc++/eh_throw.cc
+++ b/libstdc++-v3/libsupc++/eh_throw.cc
@@ -68,8 +68,8 @@ __cxxabiv1::__cxa_throw (void *obj, std::type_info *tinfo,
   header->referenceCount = 1;
   header->exc.exceptionType = tinfo;
   header->exc.exceptionDestructor = dest;
-  header->exc.unexpectedHandler = __unexpected_handler;
-  header->exc.terminateHandler = __terminate_handler;
+  header->exc.unexpectedHandler = std::get_unexpected ();
+  header->exc.terminateHandler = std::get_terminate ();
   __GXX_INIT_PRIMARY_EXCEPTION_CLASS(header->exc.unwindHeader.exception_class);
   header->exc.unwindHeader.exception_cleanup = __gxx_exception_cleanup;
 
diff --git a/libstdc++-v3/libsupc++/new_handler.cc 
b/libstdc++-v3/libsupc++/new_handler.cc
index 2f6bb5e..5253cfd 100644
--- a/libstdc++-v3/libsupc++/new_handler.cc
+++ b/libstdc++-v3/libsupc++/new_handler.cc
@@ -24,6 +24,15 @@
 // .
 
 #include "new"
+#include 
+
+#if ATOMIC_POINTER_LOCK_FREE < 2
+#include 
+namespace
+{
+  __gnu_cxx::__mutex mx;
+}
+#endif
 
 const std::nothrow_t std::nothrow = { };
 
@@ -37,8 +46,14 @@

[committed] Another no-dist case (in 4.6 this time) (PR other/43620)

2013-04-05 Thread Jakub Jelinek

Hi!

I've noticed another place where distdir: goal was present, in
boehm-gc/include/Makefile.in on 4.6 branch.  Fixed thusly, committed to 4.6.

2013-04-05  Jakub Jelinek  

PR other/43620 
* Makefile.am (AUTOMAKE_OPTIONS): Add no-dist.
* include/Makefile.am (AUTOMAKE_OPTIONS): Likewise.
* Makefile.in: Regenerated.
* include/Makefile.in: Regenerated.

--- boehm-gc/Makefile.am(revision 197510)
+++ boehm-gc/Makefile.am(working copy)
@@ -4,7 +4,7 @@
 ## files that should be in the distribution are not mentioned in this
 ## Makefile.am.
 
-AUTOMAKE_OPTIONS = cygnus subdir-objects
+AUTOMAKE_OPTIONS = cygnus subdir-objects no-dist
 ACLOCAL_AMFLAGS = -I .. -I ../config
 
 SUBDIRS = include
--- boehm-gc/include/Makefile.am(revision 197510)
+++ boehm-gc/include/Makefile.am(working copy)
@@ -1,4 +1,4 @@
-AUTOMAKE_OPTIONS = foreign
+AUTOMAKE_OPTIONS = foreign no-dist
 
 noinst_HEADERS = gc.h gc_backptr.h gc_local_alloc.h \
   gc_pthread_redirects.h gc_cpp.h
--- boehm-gc/Makefile.in(revision 197510)
+++ boehm-gc/Makefile.in(working copy)
@@ -283,7 +283,7 @@ toolexeclibdir = @toolexeclibdir@
 top_build_prefix = @top_build_prefix@
 top_builddir = @top_builddir@
 top_srcdir = @top_srcdir@
-AUTOMAKE_OPTIONS = cygnus subdir-objects
+AUTOMAKE_OPTIONS = cygnus subdir-objects no-dist
 ACLOCAL_AMFLAGS = -I .. -I ../config
 SUBDIRS = include
 noinst_LTLIBRARIES = libgcjgc.la libgcjgc_convenience.la
--- boehm-gc/include/Makefile.in(revision 197510)
+++ boehm-gc/include/Makefile.in(working copy)
@@ -36,9 +36,9 @@ build_triplet = @build@
 host_triplet = @host@
 target_triplet = @target@
 subdir = include
-DIST_COMMON = $(noinst_HEADERS) $(srcdir)/Makefile.am \
-   $(srcdir)/Makefile.in $(srcdir)/gc_config.h.in \
-   $(srcdir)/gc_ext_config.h.in
+DIST_COMMON = $(srcdir)/Makefile.in $(srcdir)/Makefile.am \
+   $(srcdir)/gc_config.h.in $(srcdir)/gc_ext_config.h.in \
+   $(noinst_HEADERS)
 ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
 am__aclocal_m4_deps = $(top_srcdir)/../config/acx.m4 \
$(top_srcdir)/../config/depstand.m4 \
@@ -55,11 +55,9 @@ CONFIG_HEADER = gc_config.h gc_ext_confi
 CONFIG_CLEAN_FILES =
 CONFIG_CLEAN_VPATH_FILES =
 SOURCES =
-DIST_SOURCES =
 HEADERS = $(noinst_HEADERS)
 ETAGS = etags
 CTAGS = ctags
-DISTFILES = $(DIST_COMMON) $(DIST_SOURCES) $(TEXINFOS) $(EXTRA_DIST)
 ACLOCAL = @ACLOCAL@
 AMTAR = @AMTAR@
 AM_CPPFLAGS = @AM_CPPFLAGS@
@@ -199,7 +197,7 @@ toolexeclibdir = @toolexeclibdir@
 top_build_prefix = @top_build_prefix@
 top_builddir = @top_builddir@
 top_srcdir = @top_srcdir@
-AUTOMAKE_OPTIONS = foreign
+AUTOMAKE_OPTIONS = foreign no-dist
 noinst_HEADERS = gc.h gc_backptr.h gc_local_alloc.h \
   gc_pthread_redirects.h gc_cpp.h
 
@@ -322,37 +320,6 @@ GTAGS:
 
 distclean-tags:
-rm -f TAGS ID GTAGS GRTAGS GSYMS GPATH tags
-
-distdir: $(DISTFILES)
-   @srcdirstrip=`echo "$(srcdir)" | sed 's/[].[^$$\\*]/&/g'`; \
-   topsrcdirstrip=`echo "$(top_srcdir)" | sed 's/[].[^$$\\*]/&/g'`; \
-   list='$(DISTFILES)'; \
- dist_files=`for file in $$list; do echo $$file; done | \
- sed -e "s|^$$srcdirstrip/||;t" \
- -e "s|^$$topsrcdirstrip/|$(top_builddir)/|;t"`; \
-   case $$dist_files in \
- */*) $(MKDIR_P) `echo "$$dist_files" | \
-  sed '/\//!d;s|^|$(distdir)/|;s,/[^/]*$$,,' | \
-  sort -u` ;; \
-   esac; \
-   for file in $$dist_files; do \
- if test -f $$file || test -d $$file; then d=.; else d=$(srcdir); fi; \
- if test -d $$d/$$file; then \
-   dir=`echo "/$$file" | sed -e 's,/[^/]*$$,,'`; \
-   if test -d "$(distdir)/$$file"; then \
- find "$(distdir)/$$file" -type d ! -perm -700 -exec chmod u+rwx {} \;; \
-   fi; \
-   if test -d $(srcdir)/$$file && test $$d != $(srcdir); then \
- cp -fpR $(srcdir)/$$file "$(distdir)$$dir" || exit 1; \
- find "$(distdir)/$$file" -type d ! -perm -700 -exec chmod u+rwx {} \;; \
-   fi; \
-   cp -fpR $$d/$$file "$(distdir)$$dir" || exit 1; \
- else \
-   test -f "$(distdir)/$$file" \
-   || cp -p $$d/$$file "$(distdir)/$$file" \
-   || exit 1; \
- fi; \
-   done
 check-am: all-am
 check: check-am
 all-am: Makefile $(HEADERS) gc_config.h gc_ext_config.h
@@ -452,16 +419,15 @@ uninstall-am:
 
 .PHONY: CTAGS GTAGS all all-am check check-am clean clean-generic \
clean-libtool ctags distclean distclean-generic distclean-hdr \
-   distclean-libtool distclean-tags distdir dvi dvi-am html \
-   html-am info info-am install install-am install-data \
-   install-data-am install-dvi install-dvi-am install-exec \
-   install-exec-am install-html install-html-am install-info \
-   install-info-am install-man install-pdf install-pdf-am \
-   install-ps install-ps-am install-strip install

Re: Fix PR 56077

2013-04-05 Thread Eric Botcazou

>  I don't know whether backporting this would be better than reverting
>  the offending change as just done on 4.7.

I presume that you meant on the 4.6 branch.

-- 
Eric Botcazou

Re: Fix PR 56077

2013-04-05 Thread Olivier Hainque


On Apr 5, 2013, at 12:21 , Eric Botcazou  wrote:

>> I don't know whether backporting this would be better than reverting
>> the offending change as just done on 4.7.
> 
> I presume that you meant on the 4.6 branch.

 Arf, indeed, thanks for correcting :)

Document cortex-a53 in invoke.texi

2013-04-05 Thread Ramana Radhakrishnan


Joseph pointed out the cortex-a53 wasn't documented in invoke.texi.

Fixed thusly.

Ramana


2013-04-05  Ramana Radhakrishnan  

   * doc/invoke.texi (ARM Options): Document cortex-a53 support.

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 197510)
+++ gcc/doc/invoke.texi (working copy)
@@ -11266,8 +11266,8 @@
 @samp{arm1136j-s}, @samp{arm1136jf-s}, @samp{mpcore}, @samp{mpcorenovfp},
 @samp{arm1156t2-s}, @samp{arm1156t2f-s}, @samp{arm1176jz-s}, @samp{arm1176jzf-s},
 @samp{cortex-a5}, @samp{cortex-a7}, @samp{cortex-a8}, @samp{cortex-a9}, 
-@samp{cortex-a15}, @samp{cortex-r4}, @samp{cortex-r4f}, @samp{cortex-r5},
-@samp{cortex-r7}, @samp{cortex-m4}, @samp{cortex-m3},
+@samp{cortex-a15}, @samp{cortex-a53}, @samp{cortex-r4}, @samp{cortex-r4f},
+@samp{cortex-r5}, @samp{cortex-r7}, @samp{cortex-m4}, @samp{cortex-m3},
 @samp{cortex-m1},
 @samp{cortex-m0},
 @samp{cortex-m0plus},

Re: [PATCH] Loop distribution improvements

2013-04-05 Thread Richard Biener

Jakub Jelinek  wrote:

>On Fri, Apr 05, 2013 at 09:21:16AM +0200, Richard Biener wrote:
>> Jakub Jelinek  wrote:
>> 
>> >On Thu, Apr 04, 2013 at 08:37:47PM +0200, Richard Biener wrote:
>> >> Can you factor out a function that returns
>> >> A proper qimode value if possible or null and
>> >> Use it in both places?
>> >
>> >Like this?
>> 
>> You should be able to remove zero, minus one and constructor special
>> casing, no?  Ok, maybe not constructor handling, but at least move
>
>No, because the function is only handling BITS_PER_UNIT == 8 &&
>CHAR_BIT == 8,
>plus is unnecessarily expensive for the common case of storing 0.
>
>But if you want, I can move all that integer_zerop / real_zerop /
>CONSTRUCTOR / integer_all_onesp handling into the function.

Please.

>BTW, the integer_all_onesp stuff is broken for this from what I can
>see, for complex
>numbers it returns true for -1 + 0i where all bytes aren't 0xff, so we
>need
>to rule out COMPLEX_CSTs (or do integer_all_onesp on each part
>instead).
>And TYPE_PRECISION on VECTOR_CSTs won't be what we are looking for.

Hmm, indeed.  Or remove the -1 special casing altogether. Marc is probably 
right with his note as well.

Richard.

>   Jakub

Re: Fix PR 56077

2013-04-05 Thread Andrey Belevantsev


On 05.04.2013 14:10, Olivier Hainque wrote:

> On Apr 5, 2013, at 10:13 , Eric Botcazou  wrote:
>
>> We do have regressions on the 4.7 branch in the scheduler (CCed Olivier who
>> has more information).
>
>   Right: we do see a SEGV while compiling the attached monitor.i (preprocessed
>   output from a qemu tree) with -O2 -g.
>
>    ./cc1 -m32 -O2 -g -quiet monitor.i
>
>   .../monitor.c: In function ‘memory_dump’:
>   .../monitor.c:1109:1: internal compiler error: Segmentation fault
>
>   As already mentioned upthread, this is triggered by a call to
>   flush_pending_lists with a DEBUG_INSN. We get into:
>
>   if (for_write)
>     {
>       add_dependence_list_and_free (deps, insn, &deps->pending_read_insns,
>                                     1, REG_DEP_ANTI);
>       if (!deps->readonly)
>         {
>           free_EXPR_LIST_list (&deps->pending_read_mems);
>           deps->pending_read_list_length = 0;
>         }
>     }
>
>   add_dependence_list_and_free doesn't free *LISTP when
>   operating on DEBUG_INSNs, so we end up with pending_read_mems freed together
>   with pending_read_insns not freed.
>
>   This was cured on mainline by:
>
> Author: mkuvyrkov
> Date:   Mon Aug 27 22:11:48 2012 +
>
> * sched-deps.c (add_dependence_list_and_free): Simplify.
> (flush_pending_list_and_free): Fix a hack that was fixing a hack.  Free
> lists when add_dependence_list_and_free doesn't free them.
>
> (svn+ssh://gcc.gnu.org/svn/gcc/trunk@190733)
>
>   http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01625.html
>
>   I don't know whether backporting this would be better than reverting
>   the offending change as just done on 4.7.


I'd say for 4.6 the best way is to revert.  PR 56077 is not that important, 
and this 4.6 release will be the last one.  For 4.7, we can additionally 
backport Maxim's patch or revert this one.  I'm fine with both options, but 
I'll test 4.7 backport too just to be ready for that.
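
The failure mode discussed in this thread can be reproduced outside GCC. The following is only a minimal sketch, assuming two parallel lists that must always stay the same length (as pending_read_insns and pending_read_mems are expected to); every name in it (deps_sketch, flush_buggy, flush_guarded, and so on) is a hypothetical stand-in and not the sched-deps.c code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an INSN_LIST / EXPR_LIST node.  */
struct node
{
  struct node *next;
};

/* Two lists that are meant to be kept in lock step.  */
struct deps_sketch
{
  struct node *read_insns;
  struct node *read_mems;
  int read_list_length;
};

static void
push (struct node **list)
{
  struct node *n = malloc (sizeof *n);
  if (n == NULL)
    abort ();
  n->next = *list;
  *list = n;
}

static void
free_list (struct node **list)
{
  while (*list != NULL)
    {
      struct node *next = (*list)->next;
      free (*list);
      *list = next;
    }
}

static int
list_length (const struct node *list)
{
  int n = 0;
  for (; list != NULL; list = list->next)
    n++;
  return n;
}

/* Mimics the reported behaviour: for a debug insn the list is left alone.  */
static void
add_deps_and_free (struct node **list, int is_debug_insn)
{
  if (is_debug_insn)
    return;
  free_list (list);
}

/* Unconditionally frees the second list, so the pair can get out of sync.  */
static void
flush_buggy (struct deps_sketch *d, int is_debug_insn)
{
  add_deps_and_free (&d->read_insns, is_debug_insn);
  free_list (&d->read_mems);
  d->read_list_length = 0;
}

/* Guarded variant: debug insns never reach the flush at all.  */
static void
flush_guarded (struct deps_sketch *d, int is_debug_insn)
{
  if (is_debug_insn)
    return;
  flush_buggy (d, 0);
}

static int
lists_consistent (const struct deps_sketch *d)
{
  return list_length (d->read_insns) == list_length (d->read_mems)
	 && d->read_list_length == list_length (d->read_insns);
}
```

With one element on each list, flush_buggy called for a debug insn leaves one list populated and the other empty, while flush_guarded keeps the pair consistent in both the debug and the non-debug case.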


Andrey

Re: [patch] C++11: Observers for the three 'handler functions'

2013-04-05 Thread Jonathan Wakely

On 5 April 2013 11:13, Jonathan Wakely wrote:
> This should fix the handlers for platforms without __atomic_exchange
> for pointers by using a mutex. I used the old __mutex type not
> std::mutex because it's available on more platforms and works for the
> 'single' thread model too.  I didn't try to optimise away the atomic
> ops for accessing the handlers because throwing exceptions and
> getting/setting these handlers should not be on the fast path of
> performance sensitive code.
>
> PR libstdc++/56841
> * libsupc++/eh_ptr.cc (rethrow_exception): Use get_unexpected() and
> get_terminate() accessors.
> * libsupc++/eh_throw.cc (__cxa_throw): Likewise.
> * libsupc++/eh_terminate.cc: Use mutex when atomic builtins not
> available.
> * libsupc++/new_handler.cc: Likewise.
>
> Tested x86_64-linux and hppa2.0-linux, committed to trunk.

I did think about adding a new accessor for internal use:

std::pair __get_handlers();

which could be used in eh_throw.cc and eh_ptr.cc to avoid doing two
separate PIC calls to get_unexpected() and get_terminate(), and so
that platforms using a mutex would only lock it once.  We can
reconsider doing that later, it's only an optimisation not needed for
correctness.

I think we could also remove the extern declarations of
__unexpected_handler and __terminate_handler and make them internal to
eh_terminate.cc, as I already did for __new_handler.  Although those
names were visible to users they should not be used, are not part of
the IA-64 ABI, and are not exported from the shared lib.
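
The "lock once, fetch both" idea can be sketched in plain C. This is only an illustration under stated assumptions: it uses a POSIX mutex rather than libstdc++'s __mutex type, and handler_pair, set_unexpected_sketch, set_terminate_sketch and get_handlers_sketch are hypothetical names, not anything in libsupc++.

```c
#include <pthread.h>
#include <stddef.h>

typedef void (*handler_fn) (void);

/* Both handlers travel together so one lock acquisition fetches the pair.  */
struct handler_pair
{
  handler_fn unexpected;
  handler_fn terminate;
};

static pthread_mutex_t handler_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct handler_pair current_handlers = { NULL, NULL };

/* Install a new handler and return the previous one, under the mutex.  */
static handler_fn
set_unexpected_sketch (handler_fn h)
{
  handler_fn old;
  pthread_mutex_lock (&handler_mutex);
  old = current_handlers.unexpected;
  current_handlers.unexpected = h;
  pthread_mutex_unlock (&handler_mutex);
  return old;
}

static handler_fn
set_terminate_sketch (handler_fn h)
{
  handler_fn old;
  pthread_mutex_lock (&handler_mutex);
  old = current_handlers.terminate;
  current_handlers.terminate = h;
  pthread_mutex_unlock (&handler_mutex);
  return old;
}

/* Single accessor: one lock/unlock pair instead of two separate calls.  */
static struct handler_pair
get_handlers_sketch (void)
{
  struct handler_pair copy;
  pthread_mutex_lock (&handler_mutex);
  copy = current_handlers;
  pthread_mutex_unlock (&handler_mutex);
  return copy;
}

/* Placeholder handlers used only to exercise the accessors.  */
static void sample_unexpected (void) {}
static void sample_terminate (void) {}
```

A caller that needs both handlers takes the mutex once through get_handlers_sketch, which is the saving described above; as noted, this is an optimisation and not needed for correctness.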

Re: patch to fix constant math - 4th patch - the wide-int class - patch ping for the next stage 1

2013-04-05 Thread Kenneth Zadeck


Richard,

There has been something that has bothered me about your proposal for the 
storage manager and i think i can now characterize that problem.  Say i 
want to compute the expression


(a + b) / c

converting from tree values, using wide-int as the engine and then 
storing the result in a tree.   (A very common operation for the various 
simplifiers in gcc.)


in my version of wide-int where there is only the stack-allocated fixed-size 
allocation for the data, the compiler arranges for 6 instances of 
wide-int that are "statically" allocated on the stack when the function 
is entered.  There would be 3 copies of the precision and data to get 
things started, one allocation of a variable-sized object at the end when 
the INT_CST is built, and one copy to put it back.   As i have argued, 
these copies are of negligible size.


In your world, to get things started, you would do 3 pointer copies to 
get the values out of the tree to set the expression leaves but then you 
will call the allocator 3 times to get space to hold the intermediate 
nodes before you get to pointer copy the result back into the result cst 
which still needs an allocation to build it. I am assuming that we can 
play the same game at the tree level that we do at the rtl level where 
we do 1 variable sized allocation to get the entire INT_CST rather than 
doing 1 fixed sized allocation and 1 variable sized one.


even if we take the simpler example of a + b, you still lose.   The 
cost of the extra allocation and its subsequent recovery is more than 
my copies.   In fact, even in the simplest case of someone going from a 
HWI thru wide_int into tree, you have 2 allocations vs my 1.


I just do not see the cost savings and if there are no cost savings, you 
certainly cannot say that having these templates is simpler than not 
having the templates.
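
For the a + b case, the fixed-size approach can be sketched as follows. This is a hedged illustration only: wide_sketch, wide_from_u64 and wide_add are hypothetical names, the two-limb size is an arbitrary assumption, and the result is not masked to the precision as a real implementation would need to do.

```c
#include <stdint.h>

#define WIDE_SKETCH_LIMBS 2

/* Fixed-size value: lives on the stack and is copied by value.  */
struct wide_sketch
{
  unsigned int precision;
  uint64_t limb[WIDE_SKETCH_LIMBS];	/* least significant limb first */
};

static struct wide_sketch
wide_from_u64 (uint64_t v, unsigned int precision)
{
  struct wide_sketch w = { precision, { v, 0 } };
  return w;
}

/* a + b with carry propagation; no heap allocation anywhere.  */
static struct wide_sketch
wide_add (struct wide_sketch a, struct wide_sketch b)
{
  struct wide_sketch r;
  uint64_t carry = 0;
  int i;

  r.precision = a.precision;
  for (i = 0; i < WIDE_SKETCH_LIMBS; i++)
    {
      uint64_t sum = a.limb[i] + b.limb[i];
      uint64_t carry_from_sum = sum < a.limb[i];
      r.limb[i] = sum + carry;
      carry = carry_from_sum | (r.limb[i] < sum);
    }
  return r;
}
```

Every operand and the result are plain struct copies, so the only allocation left in such a scheme is the one that builds the final constant.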


Kenny

On 04/02/2013 11:04 AM, Richard Biener wrote:

> On Wed, Feb 27, 2013 at 2:59 AM, Kenneth Zadeck wrote:
>> This patch contains a large number of the changes requested by Richi.   It
>> does not contain any of the changes that he requested to abstract the
>> storage layer.   That suggestion appears to be quite unworkable.
>
> I of course took this claim as a challenge ... with the following result.  It is
> of course quite workable ;)
>
> The attached patch implements the core wide-int class and three storage
> models (fixed size for things like plain HWI and double-int, variable size
> similar to how your wide-int works and an adaptor for the double-int as
> contained in trees).  With that you can now do
>
> HOST_WIDE_INT
> wi_test (tree x)
> {
>    // template argument deduction doesn't do the magic we want it to do
>    // to make this kind of implicit conversions work
>    // overload resolution considers this kind of conversions so we
>    // need some magic that combines both ... but seeding the overload
>    // set with some instantiations doesn't seem to be possible :/
>    // wide_int<> w = x + 1;
>    wide_int<> w;
>    w += x;
>    w += 1;
>    // template argument deduction doesn't deduce the return value type,
>    // not considering the template default argument either ...
>    // w = wi (x) + 1;
>    // we could support this by providing rvalue-to-lvalue promotion
>    // via a traits class?
>    // otoh it would lead to sub-optimal code anyway so we should
>    // make the result available as reference parameter and only support
>    // wide_int <> res; add (res, x, 1); ?
>    w = wi (x).operator+<wide_int<> >(1);
>    wide_int<>::add(w, x, 1);
>    return w.to_hwi ();
> }
>
> we are somewhat limited with C++ unless we want to get really fancy.
> Eventually providing operator+ just doesn't make much sense for
> generic wide-int combinations (though then the issue is its operands
> are no longer commutative which I think is the case with your wide-int
> or double-int as well - they don't support "1 + wide_int" for obvious reasons).
>
> So there are implementation design choices left undecided.
>
> Oh, and the operation implementations are crap (they compute nonsense).
>
> But you should get the idea.
>
> Richard.

[PATCH][ARM][testsuite] Fix testsuite options for testing rounding vectorisation on ARMv8

2013-04-05 Thread Kyrylo Tkachov

Hi all,

With r197491 I added testsuite support for vectorisation of rounding
functions on ARMv8 NEON, but the options set up for vect.exp result in
the testsuite trying to run all the vect tests with ARMv8 NEON, which
does not work on ARMv7 targets and simulators that don't support ARMv8
(like qemu).

But if we run the tests using v7 NEON, the newly enabled vect-rounding*
tests will FAIL because they need ARMv8 NEON options.

Therefore this patch reverts most of that and instead copies the rounding
vectorisation tests to gcc.target/arm
where the correct options can be set.

Tested arm-none-eabi on qemu to make sure that the execution tests come back
and use v7 NEON

Ok for trunk?

Thanks,
Kyrill

gcc/testsuite
2013-04-05  Kyrylo Tkachov  

* lib/target-supports.exp (add_options_for_arm_v8_neon):
Add -march=armv8-a when we use v8 NEON.
(check_effective_target_vect_call_btruncf): Remove arm-*-*-*.
(check_effective_target_vect_call_ceilf): Likewise.
(check_effective_target_vect_call_floorf): Likewise.
(check_effective_target_vect_call_roundf): Likewise.
(check_vect_support_and_set_flags): Remove check for arm_v8_neon.
* gcc.target/arm/vect-rounding-btruncf.c: New testcase.
* gcc.target/arm/vect-rounding-ceilf.c: Likewise.
* gcc.target/arm/vect-rounding-floorf.c: Likewise.
* gcc.target/arm/vect-rounding-roundf.c: Likewise.

neon-v8-testsuite.patch
Description: Binary data

[gomp4] Disallow class iterators in omp simd and omp for simd loops

2013-04-05 Thread Jakub Jelinek

Hi!

I've missed that OpenMP 4.0 rc2 in 2.6's last restriction mentions:
"For C++, in the simd construct the only random access iterator type that are
for var are pointer types."

The following patch implements that restriction (no testcase yet until simd
is fully supported), committed to branch.

2013-04-05  Jakub Jelinek  

* semantics.c (finish_omp_for): Disallow class iterators for
OMP_SIMD and OMP_FOR_SIMD loops.

--- gcc/cp/semantics.c.jj   2013-03-27 13:01:09.0 +0100
+++ gcc/cp/semantics.c  2013-04-05 14:35:07.967622671 +0200
@@ -5090,6 +5090,13 @@ finish_omp_for (location_t locus, enum t
 
   if (CLASS_TYPE_P (TREE_TYPE (decl)))
{
+ if (code == OMP_SIMD || code == OMP_FOR_SIMD)
+   {
+ error_at (elocus, "%<#pragma omp%s simd%> used with class "
+   "iteration variable %qE",
+   code == OMP_FOR_SIMD ? " for" : "", decl);
+ return NULL;
+   }
  if (handle_omp_for_class_iterator (i, locus, declv, initv, condv,
 incrv, &body, &pre_body, clauses))
return NULL;


Jakub

Re: Fix PR 56077

2013-04-05 Thread Olivier Hainque


On Apr 5, 2013, at 13:22 , Andrey Belevantsev  wrote:
>>  http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01625.html
>> 
>>  I don't know whether backporting this would be better than reverting
>>  the offending change as just done on 4.7.
> 
> I'd say for 4.6 the best way is to revert.  PR 56077 is not that important, 
> and this 4.6 release will be the last one.  For 4.7, we can additionally 
> backport Maxim's patch or revert this one.  I'm fine with both options, but 
> I'll test 4.7 backport too just to be ready for that.

 Understood, thanks. Whose decision is it to pick one track or the other for 
4.7 ?

 RMs in addition to the maintainers of this particular area ?

Re: [PATCH] Loop distribution improvements

2013-04-05 Thread Jakub Jelinek

On Fri, Apr 05, 2013 at 12:46:48PM +0200, Richard Biener wrote:
> >BTW, the integer_all_onesp stuff is broken for this from what I can
> >see, for complex
> >numbers it returns true for -1 + 0i where all bytes aren't 0xff, so we
> >need
> >to rule out COMPLEX_CSTs (or do integer_all_onesp on each part
> >instead).
> >And TYPE_PRECISION on VECTOR_CSTs won't be what we are looking for.
> 
> Hmm, indeed.  Or remove the -1 special casing altogether.

Ok, zero/CONSTRUCTOR moved into the function, all_onesp handling removed (so
only on the CHAR_BIT == 8 hosts and BITS_PER_UNIT == 8 targets it will be
optimized).  Ok for trunk?

> Marc is probably right with his note as well.

I'll defer that to Marc ;)

2013-04-05  Jakub Jelinek  

* tree-loop-distribution.c (const_with_all_bytes_same): New function.
(generate_memset_builtin): Only handle integer_all_onesp as -1 val if
TYPE_PRECISION is equal to mode bitsize.  Use const_with_all_bytes_same
if possible to compute val.
(classify_partition): Verify CONSTRUCTOR doesn't have any elts.
For QImode integers don't require anything about precision.  Use
const_with_all_bytes_same to find out if the constant doesn't have
repeated bytes in it.

* gcc.dg/pr56837.c: New test.
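
The byte-equality test at the heart of const_with_all_bytes_same can be tried on ordinary C objects. This is only a sketch: all_bytes_same is a hypothetical helper that inspects the host memory image directly, whereas the patch works on trees via native_encode_expr; it also assumes 8-bit bytes and an IEEE double.

```c
#include <stddef.h>

/* Return the repeated byte value if every byte of the LEN-byte object
   at OBJ is identical, otherwise -1 (also -1 for an empty object).  */
static int
all_bytes_same (const void *obj, size_t len)
{
  const unsigned char *p = obj;
  size_t i;

  if (len == 0)
    return -1;
  for (i = 1; i < len; i++)
    if (p[i] != p[0])
      return -1;
  return p[0];
}
```

Applied to the examples from the patch comment, a 32-bit 0x24242424 yields 0x24 and the double 747708026454360457216.0 yields 0x44. A pair of ints holding { -1, 0 }, standing in for the complex constant -1 + 0i, yields -1, which matches the observation that such a value cannot be stored with a single memset byte.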

--- gcc/tree-loop-distribution.c.jj 2013-04-04 15:03:28.0 +0200
+++ gcc/tree-loop-distribution.c	2013-04-05 15:21:10.641668895 +0200
@@ -297,6 +297,36 @@ build_addr_arg_loc (location_t loc, data
   return fold_build_pointer_plus_loc (loc, DR_BASE_ADDRESS (dr), addr_base);
 }
 
+/* If VAL memory representation contains the same value in all bytes,
+   return that value, otherwise return -1.
+   E.g. for 0x24242424 return 0x24, for IEEE double
+   747708026454360457216.0 return 0x44, etc.  */
+
+static int
+const_with_all_bytes_same (tree val)
+{
+  unsigned char buf[64];
+  int i, len;
+
+  if (integer_zerop (val)
+  || real_zerop (val)
+  || (TREE_CODE (val) == CONSTRUCTOR
+  && !TREE_CLOBBER_P (val)
+  && CONSTRUCTOR_NELTS (val) == 0))
+return 0;
+
+  if (CHAR_BIT != 8 || BITS_PER_UNIT != 8)
+return -1;
+
+  len = native_encode_expr (val, buf, sizeof (buf));
+  if (len == 0)
+return -1;
+  for (i = 1; i < len; i++)
+if (buf[i] != buf[0])
+  return -1;
+  return buf[0];
+}
+
 /* Generate a call to memset for PARTITION in LOOP.  */
 
 static void
@@ -327,24 +357,20 @@ generate_memset_builtin (struct loop *lo
 
   /* This exactly matches the pattern recognition in classify_partition.  */
   val = gimple_assign_rhs1 (stmt);
-  if (integer_zerop (val)
-  || real_zerop (val)
-  || TREE_CODE (val) == CONSTRUCTOR)
-val = integer_zero_node;
-  else if (integer_all_onesp (val))
-val = build_int_cst (integer_type_node, -1);
-  else
-{
-  if (TREE_CODE (val) == INTEGER_CST)
-   val = fold_convert (integer_type_node, val);
-  else if (!useless_type_conversion_p (integer_type_node, TREE_TYPE (val)))
-   {
- gimple cstmt;
- tree tem = make_ssa_name (integer_type_node, NULL);
- cstmt = gimple_build_assign_with_ops (NOP_EXPR, tem, val, NULL_TREE);
- gsi_insert_after (&gsi, cstmt, GSI_CONTINUE_LINKING);
- val = tem;
-   }
+  /* Handle constants like 0x15151515 and similarly
+ floating point constants etc. where all bytes are the same.  */
+  int bytev = const_with_all_bytes_same (val);
+  if (bytev != -1)
+val = build_int_cst (integer_type_node, bytev);
+  else if (TREE_CODE (val) == INTEGER_CST)
+val = fold_convert (integer_type_node, val);
+  else if (!useless_type_conversion_p (integer_type_node, TREE_TYPE (val)))
+{
+  gimple cstmt;
+  tree tem = make_ssa_name (integer_type_node, NULL);
+  cstmt = gimple_build_assign_with_ops (NOP_EXPR, tem, val, NULL_TREE);
+  gsi_insert_after (&gsi, cstmt, GSI_CONTINUE_LINKING);
+  val = tem;
 }
 
   fn = build_fold_addr_expr (builtin_decl_implicit (BUILT_IN_MEMSET));
@@ -354,10 +380,8 @@ generate_memset_builtin (struct loop *lo
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
   fprintf (dump_file, "generated memset");
-  if (integer_zerop (val))
+  if (bytev == 0)
fprintf (dump_file, " zero\n");
-  else if (integer_all_onesp (val))
-   fprintf (dump_file, " minus one\n");
   else
fprintf (dump_file, "\n");
 }
@@ -941,18 +965,10 @@ classify_partition (loop_p loop, struct
 {
   gimple stmt = DR_STMT (single_store);
   tree rhs = gimple_assign_rhs1 (stmt);
-  if (!(integer_zerop (rhs)
-   || real_zerop (rhs)
-   || (TREE_CODE (rhs) == CONSTRUCTOR
-   && !TREE_CLOBBER_P (rhs))
-   || ((integer_all_onesp (rhs)
-|| (INTEGRAL_TYPE_P (TREE_TYPE (rhs))
-&& (TYPE_MODE (TREE_TYPE (rhs))
-== TYPE_MODE (unsigned_char_type_node
-   /* For stores of a non-zero value req

Re: Fix PR 56077

2013-04-05 Thread Jakub Jelinek

On Fri, Apr 05, 2013 at 03:28:11PM +0200, Olivier Hainque wrote:
> 
> On Apr 5, 2013, at 13:22 , Andrey Belevantsev  wrote:
> >>  http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01625.html
> >> 
> >>  I don't know whether backporting this would be better than reverting
> >>  the offending change as just done on 4.7.
> > 
> > I'd say for 4.6 the best way is to revert.  PR 56077 is not that important, 
> > and this 4.6 release will be the last one.  For 4.7, we can additionally 
> > backport Maxim's patch or revert this one.  I'm fine with both options, but 
> > I'll test 4.7 backport too just to be ready for that.
> 
>  Understood, thanks. Who's decision is it to pick one track or the other for 
> 4.7 ?
> 
>  RMs in addition to the maintainers of this particular area ?

As written in PR56848, the patch should be reverted for 4.7.3
and reapplied together with the additional fix after 4.7.3 is released
(before 4.7.3 release there is just too short time to do anything else,
while before 4.7.4 there will be plenty of time to test it sufficiently).

Jakub

Re: Fix PR 56077

2013-04-05 Thread Olivier Hainque


On Apr 5, 2013, at 15:40 , Jakub Jelinek  wrote:
> As written in PR56848, the patch should be reverted for 4.7.3
> and reapplied together with the additional fix after 4.7.3 is released
> (before 4.7.3 release there is just too short time to do anything else,
> while before 4.7.4 there will be plenty of time to test it sufficiently).

 OK, thanks for the guidance Jakub.

 Andrey, could you please take care of this ?

 Many thanks in advance,

 Olivier

Re: [PATCH][ARM][testsuite] Fix testsuite options for testing rounding vectorisation on ARMv8

2013-04-05 Thread Ramana Radhakrishnan


On 04/05/13 14:06, Kyrylo Tkachov wrote:

> Hi all,
>
> With r197491 I added testsuite support for vectorisation of rounding
> functions on ARMv8 NEON, but the options set up
> for vect.exp results in the testsuite trying to test all the vect tests with
> ARMv8 NEON which does not work on
> ARMv7 targets and simulators that don't support ARMv8 (like qemu).
>
> But if we run the tests using v7 NEON, the newly enabled vect-rounding*
> tests will FAIL because they need ARMv8 NEON options.
>
> Therefore this patch reverts most of that and instead copies the rounding
> vectorisation tests to gcc.target/arm
> where the correct options can be set.
>
> Tested arm-none-eabi on qemu to make sure that the execution tests come back
> and use v7 NEON
>
> Ok for trunk?


Ok by me but I'd like Mike to have another look.

It's a bit unfortunate we need to copy these tests over for the 
architecture levels we need - there is a good project to restructure 
gcc.target/arm to test properly at different arch. levels but for the 
minute this is the best compromise.


Ramana





> Thanks,
> Kyrill
>
> gcc/testsuite
> 2013-04-05  Kyrylo Tkachov  
>
> * lib/target-supports.exp (add_options_for_arm_v8_neon):
> Add -march=armv8-a when we use v8 NEON.
> (check_effective_target_vect_call_btruncf): Remove arm-*-*-*.
> (check_effective_target_vect_call_ceilf): Likewise.
> (check_effective_target_vect_call_floorf): Likewise.
> (check_effective_target_vect_call_roundf): Likewise.
> (check_vect_support_and_set_flags): Remove check for arm_v8_neon.
> * gcc.target/arm/vect-rounding-btruncf.c: New testcase.
> * gcc.target/arm/vect-rounding-ceilf.c: Likewise.
> * gcc.target/arm/vect-rounding-floorf.c: Likewise.
> * gcc.target/arm/vect-rounding-roundf.c: Likewise.

Re: [patch] Fix node weight updates during ipa-cp (issue7812053)

2013-04-05 Thread Teresa Johnson

On Thu, Mar 28, 2013 at 2:27 AM, Richard Biener wrote:
> On Wed, Mar 27, 2013 at 6:22 PM, Teresa Johnson  wrote:
>> I found that the node weight updates on cloned nodes during ipa-cp were
>> leading to incorrect/insane weights. Both the original and new node weight
>> computations used truncating divides, leading to a loss of total node weight.
>> I have fixed this by making both rounding integer divides.
>>
>> Bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for trunk?
>
> I'm sure we can outline a rounding integer divide inline function on
> gcov_type.  To gcov-io.h, I suppose.
>
> Otherwise this looks ok to me.

Thanks. I went ahead and worked on outlining this functionality. In
the process of doing so, I discovered that there was already a method
in basic-block.h to do part of this: apply_probability(), which does
the rounding divide by REG_BR_PROB_BASE. There is a related function
combine_probabilities() that takes 2 int probabilities instead of a
gcov_type and an int probability. I decided to use apply_probability()
in ipa-cp, and add a new macro GCOV_COMPUTE_SCALE to basic-block.h to
compute the scale factor/probability via a rounding divide. So the
ipa-cp changes I made use both GCOV_COMPUTE_SCALE and
apply_probability.
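
The difference between the truncating and rounding forms is easy to see with small numbers. The following is a minimal sketch, not the GCC helpers themselves: rdiv_sketch, scale_trunc, scale_round, apply_trunc and apply_round are hypothetical names, the base of 10000 is an assumption standing in for REG_BR_PROB_BASE, and only non-negative counts are considered.

```c
#include <stdint.h>

/* Assumed stand-in for REG_BR_PROB_BASE.  */
#define PROB_BASE_SKETCH 10000

/* Rounding integer divide for non-negative operands.  */
static int64_t
rdiv_sketch (int64_t x, int64_t y)
{
  return (x + y / 2) / y;
}

/* Scale factor num/den expressed in units of PROB_BASE_SKETCH.  */
static int64_t
scale_trunc (int64_t num, int64_t den)
{
  return num * PROB_BASE_SKETCH / den;
}

static int64_t
scale_round (int64_t num, int64_t den)
{
  return rdiv_sketch (num * PROB_BASE_SKETCH, den);
}

/* Apply a scale factor to a count.  */
static int64_t
apply_trunc (int64_t count, int64_t scale)
{
  return count * scale / PROB_BASE_SKETCH;
}

static int64_t
apply_round (int64_t count, int64_t scale)
{
  return rdiv_sketch (count * scale, PROB_BASE_SKETCH);
}
```

Splitting an edge count of 100 between a clone that takes 1/3 of the original node's count and a remainder that keeps 2/3 gives 33 + 66 = 99 with the truncating helpers, so one unit of weight is lost; the rounding helpers give 33 + 67 = 100.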

I then went through all the code to look for instances where we were
computing scale factors/probabilities and performing scaling. I found
a mix of existing uses of apply/combine_probabilities, uses of RDIV,
inlined rounding divides, and truncating divides. I think it would be
good to unify all of this. As a first step, I replaced all inline code
sequences that were already doing rounding divides to compute scale
factors/probabilities or do the scaling, to instead use the
appropriate helper function/macro described above. For these
locations, there should be no change to behavior.

There are a number of places where there are truncating divides right
now. Since changing those may impact the resulting behavior, for this
patch I simply added a comment as to which helper they should use. As
soon as this patch goes in I am planning to change those to use the
appropriate helper and test performance, and then will send that patch
for review. So for this patch, the only place where behavior is
changed is in ipa-cp which was my original change.

New patch is attached. Bootstrapped (both bootstrap and
profiledbootstrap) and tested on x86-64-unknown-linux-gnu. Ok for
trunk?

Thanks,
Teresa

>
> Thanks,
> Richard.
>
>> 2013-03-27  Teresa Johnson  
>>
>> * ipa-cp.c (update_profiling_info): Perform rounding integer
>> division when updating weights instead of truncating.
>> (update_specialized_profile): Ditto.
>>
>> Index: ipa-cp.c
>> ===
>> --- ipa-cp.c	(revision 197118)
>> +++ ipa-cp.c	(working copy)
>> @@ -2588,14 +2588,18 @@ update_profiling_info (struct cgraph_node *orig_no
>>
>>for (cs = new_node->callees; cs ; cs = cs->next_callee)
>>  if (cs->frequency)
>> -  cs->count = cs->count * (new_sum * REG_BR_PROB_BASE
>> -  / orig_node_count) / REG_BR_PROB_BASE;
>> +  cs->count = (cs->count
>> +   * ((new_sum * REG_BR_PROB_BASE + orig_node_count/2)
>> +  / orig_node_count)
>> +   + REG_BR_PROB_BASE/2) / REG_BR_PROB_BASE;
>>  else
>>cs->count = 0;
>>
>>for (cs = orig_node->callees; cs ; cs = cs->next_callee)
>> -cs->count = cs->count * (remainder * REG_BR_PROB_BASE
>> -/ orig_node_count) / REG_BR_PROB_BASE;
>> +cs->count = (cs->count
>> + * ((remainder * REG_BR_PROB_BASE + orig_node_count/2)
>> +/ orig_node_count)
>> + + REG_BR_PROB_BASE/2) / REG_BR_PROB_BASE;
>>
>>if (dump_file)
>>  dump_profile_updates (orig_node, new_node);
>> @@ -2627,14 +2631,19 @@ update_specialized_profile (struct cgraph_node *ne
>>
>>for (cs = new_node->callees; cs ; cs = cs->next_callee)
>>  if (cs->frequency)
>> -  cs->count += cs->count * redirected_sum / new_node_count;
>> +  cs->count += (cs->count
>> +* ((redirected_sum * REG_BR_PROB_BASE
>> ++ new_node_count/2) / new_node_count)
>> ++ REG_BR_PROB_BASE/2) / REG_BR_PROB_BASE;
>>  else
>>cs->count = 0;
>>
>>for (cs = orig_node->callees; cs ; cs = cs->next_callee)
>>  {
>> -  gcov_type dec = cs->count * (redirected_sum * REG_BR_PROB_BASE
>> -  / orig_node_count) / REG_BR_PROB_BASE;
>> +  gcov_type dec = (cs->count
>> +   * ((redirected_sum * REG_BR_PROB_BASE
>> +   + orig_node_count/2) / orig_node_count)
>> +   + REG_BR_PROB_BASE/2) / REG_BR_PROB_BASE;
>>if (dec < cs->count)
>> cs->count -= dec;
>>else
>>
>> --
>> This patch is available fo

RE: [PATCH][ARM][testsuite] Fix testsuite options for testing rounding vectorisation on ARMv8

2013-04-05 Thread Kyrylo Tkachov

> -----Original Message-----
> From: Ramana Radhakrishnan
> Sent: 05 April 2013 15:06
> To: Kyrylo Tkachov
> Cc: gcc-patches@gcc.gnu.org; mikest...@comcast.net
> Subject: Re: [PATCH][ARM][testsuite] Fix testsuite options for testing
> rounding vectorisation on ARMv8
> 
> On 04/05/13 14:06, Kyrylo Tkachov wrote:
> > Hi all,
> >
> > With r197491 I added testsuite support for vectorisation of rounding
> > functions on ARMv8 NEON, but the options set up
> > for vect.exp results in the testsuite trying to test all the vect
> tests with
> > ARMv8 NEON which does not work on
> > ARMv7 targets and simulators that don't support ARMv8 (like qemu).
> >
> > But if we run the tests using v7 NEON, the newly enabled vect-
> rounding*
> > tests will FAIL because they need ARMv8 NEON options.
> >
> > Therefore this patch reverts most of that and instead copies the
> rounding
> > vectorisation tests to gcc.target/arm
> > where the correct options can be set.
> >
> > Tested arm-none-eabi on qemu to make sure that the execution tests
> come back
> > and use v7 NEON
> >
> > Ok for trunk?
> 
> Ok by me but I'd like Mike to have another look.
> 
> It's a bit unfortunate we need to copy these tests over for the
> architecture levels we need - there is a good project to restructure
> gcc.target/arm to test properly at different arch. levels but for the
> minute this is the best compromise.

While differentiating between architecture levels in gcc.target/arm is a
good idea, I think in this case the problem is that when testing
gcc.dg/vect/ we use check_vect_support_and_set_flags to set a common set
of flags for all the tests in that directory.

But when, for the same target (i.e. arm-none-eabi, which could be ARMv7 or
ARMv8), different FPU options provide different vectorisation capabilities,
things get messy when we want to test something that one FPU supports and
another doesn't. Since the vect.exp tests are common, we cannot add
arm-specific FPU options there.

Kyrill

> 
> Ramana
> 
> 
> 
> >
> > Thanks,
> > Kyrill
> >
> > gcc/testsuite
> > 2013-04-05  Kyrylo Tkachov  
> >
> > * lib/target-supports.exp (add_options_for_arm_v8_neon):
> > Add -march=armv8-a when we use v8 NEON.
> > (check_effective_target_vect_call_btruncf): Remove arm-*-*-*.
> > (check_effective_target_vect_call_ceilf): Likewise.
> > (check_effective_target_vect_call_floorf): Likewise.
> > (check_effective_target_vect_call_roundf): Likewise.
> > (check_vect_support_and_set_flags): Remove check for arm_v8_neon.
> > * gcc.target/arm/vect-rounding-btruncf.c: New testcase.
> > * gcc.target/arm/vect-rounding-ceilf.c: Likewise.
> > * gcc.target/arm/vect-rounding-floorf.c: Likewise.
> > * gcc.target/arm/vect-rounding-roundf.c: Likewise.
> >

Re: [PATCH][ARM] Fix signed-unsigned comparison warning

2013-04-05 Thread Ramana Radhakrishnan


On 04/05/13 15:55, Kyrylo Tkachov wrote:

> Hi all
>
> This patch fixes a warning in arm.c about a comparison between signed and
> unsigned integers.
> This is usually harmless, but during bootstrap we compile with -Werror and
> this turns
> into an error. The fix is a one-liner.
>
> Tested to make sure warning goes away and did a regtest run for
> arm-none-eabi.
>
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2013-04-05  Kyrylo Tkachov  
>
> * config/arm/arm.c (arm_expand_builtin): Change fcode
> type to unsigned int.



Ok.

Ramana

[PATCH][ARM] Fix signed-unsigned comparison warning

2013-04-05 Thread Kyrylo Tkachov

Hi all

This patch fixes a warning in arm.c about a comparison between signed and
unsigned integers.
This is usually harmless, but during bootstrap we compile with -Werror and
this turns
into an error. The fix is a one-liner.

Tested to make sure warning goes away and did a regtest run for
arm-none-eabi.

Ok for trunk?

Thanks,
Kyrill

2013-04-05  Kyrylo Tkachov  

* config/arm/arm.c (arm_expand_builtin): Change fcode
type to unsigned int.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 231a27f..1558fb0 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -21489,7 +21489,7 @@ arm_expand_builtin (tree exp,
   rtx   op1;
   rtx   op2;
   rtx   pat;
-  int   fcode = DECL_FUNCTION_CODE (fndecl);
+  unsigned int  fcode = DECL_FUNCTION_CODE (fndecl);
   size_t   i;
   enum machine_mode tmode;
   enum machine_mode mode0;
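
As a rough illustration of why a signed/unsigned comparison can misbehave, here is a minimal C sketch. The helper names are hypothetical and are not taken from arm.c; the explicit cast spells out the conversion that the compiler would otherwise perform implicitly (and warn about with -Wsign-compare).

```c
/* Hypothetical helper: compare a signed code against an unsigned bound.
   The cast makes explicit the conversion that C applies implicitly when
   an int is compared with an unsigned int.  */
static int
signed_code_below_bound (int code, unsigned int bound)
{
  /* A negative CODE becomes a very large unsigned value here.  */
  return (unsigned int) code < bound;
}

/* Same test with both operands unsigned: no conversion takes place.  */
static int
unsigned_code_below_bound (unsigned int code, unsigned int bound)
{
  return code < bound;
}
```

With a negative code such as -1 the first helper reports "not below" for any bound, which is the surprise the warning guards against. Declaring the variable as unsigned int from the start, as the patch does for fcode, removes the mixed comparison.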

[PATCH][Backport 4.7][ARM] Fix PR 56720

2013-04-05 Thread Kyrylo Tkachov

Hi all,

This patch is a backport of the fix for PR 56720, where we would ICE on
arm-*-* when trying to expand vcond with floating-point unordered
comparisons. The patch is almost identical to the trunk patch at:
http://gcc.gnu.org/ml/gcc-patches/2013-03/msg00652.html
except that it adds explicit handling of the LTGT case, which does not
require explicit handling in 4.8 and onwards.

PR 56720 itself has been fixed on trunk and 4.8 but is still present in
4.7 and 4.6.

Ok for 4.7 now or when it reopens?

Tested arm-none-eabi on qemu.

Thanks,
Kyrill

gcc/ChangeLog
2013-04-05  Kyrylo Tkachov  

PR target/56720
* config/arm/iterators.md (v_cmp_result): New mode attribute.
* config/arm/neon.md (vcond): Handle unordered cases.


gcc/testsuite/ChangeLog
2013-04-05  Kyrylo Tkachov  

PR target/56720
* gcc.target/arm/neon-vcond-gt.c: New test.
* gcc.target/arm/neon-vcond-ltgt.c: Likewise.
* gcc.target/arm/neon-vcond-unordered.c: Likewise.

neon-vcond-4.7.patch
Description: Binary data

Re: [PATCH,ARM][1/n] New patterns for subtract with carry

2013-04-05 Thread Ramana Radhakrishnan

On 04/05/13 16:26, Greta Yorsh wrote:

-Original Message-
From: Richard Earnshaw
Sent: 22 February 2013 16:30
To: Greta Yorsh
Cc: GCC Patches; Ramana Radhakrishnan; ni...@redhat.com;
p...@codesourcery.com
Subject: Re: [PATCH,ARM][1/n] New patterns for subtract with carry

On 18/02/13 18:35, Greta Yorsh wrote:

Add patterns to handle various subtract with carry operations.

These patterns match RTL insns emitted by splitters
for DImode operations such as subdi, negdi, and cmpdi.

gcc/

2013-02-14  Greta Yorsh  

  * config/arm/arm.md (subsi3_carryin, subsi3_carryin_const): New
  patterns.
  (subsi3_carryin_compare,subsi3_carryin_compare_const): Likewise.
  (subsi3_carryin_shift,rsbsi3_carryin_shift): Likewise.

Not ok.  RSC does not exist in Thumb state.

R.

I'm attaching an updated patch. I changed the condition of rsbsi3_carryin_shift pattern 
and added "arch" attribute to subsi3_carryin as appropriate.

I have also tested the patch again on the recent trunk along with all other
patches in this series, which have already been approved. No regressions.

Ok for trunk?

Ok

ramana

[PATCH] Fix PR48182

2013-04-05 Thread Marek Polacek

This patch prevents segfault when using --param min-crossjump-insns=0.
What can happen in that case is that flow_find_cross_jump returns 0,
thus nmatch is 0, then
nmatch < PARAM_VALUE (PARAM_MIN_CROSSJUMP_INSNS)
doesn't hold, thus we continue, but we segfault later on when
doing split_block.  I think it's better to just bail out in that
case; moreover setting min-crossjump-insns to 0 isn't very common...
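
The control flow described above can be modelled with a small C sketch. This is a toy model only: the function names and the at_block_head flag (standing in for newpos1 == BB_HEAD (src1)) are hypothetical, and the real test lives in try_crossjump_to_edge.

```c
/* Toy model of the bail-out test before the patch.  Returns nonzero
   when the cross-jump attempt should be abandoned.  */
static int
bail_out_before_patch (int nmatch, int min_insns, int at_block_head)
{
  return nmatch < min_insns && !at_block_head;
}

/* Toy model of the test after the patch: additionally bail out
   whenever the minimum is configured as 0.  */
static int
bail_out_after_patch (int nmatch, int min_insns, int at_block_head)
{
  return (nmatch < min_insns && !at_block_head) || min_insns == 0;
}
```

With nmatch == 0 and a minimum of 0, the first form does not bail out (0 < 0 is false), so processing continues with nothing matched; the second form returns early in that case.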

Regtested/bootstrapped on x86_64-linux, ok for trunk/4.8?

2013-04-05  Marek Polacek  

PR rtl-optimization/48182
* cfgcleanup.c (try_crossjump_to_edge): Bail out if
PARAM_MIN_CROSSJUMP_INSNS is 0.

* gcc.dg/pr48182.c: New test.

--- gcc/testsuite/gcc.dg/pr48182.c.mp   2013-04-05 14:58:06.373269392 +0200
+++ gcc/testsuite/gcc.dg/pr48182.c  2013-04-05 14:57:47.867211373 +0200
@@ -0,0 +1,11 @@
+/* PR rtl-optimization/48182 */
+/* { dg-do compile } */
+/* { dg-options "-fcrossjumping --param min-crossjump-insns=0" } */
+
+extern int bar (void);
+
+int
+foo (int x)
+{
+  return x && bar ();
+}
--- gcc/cfgcleanup.c.mp 2013-04-05 14:55:01.634675751 +0200
+++ gcc/cfgcleanup.c    2013-04-05 16:33:23.701814048 +0200
@@ -1929,8 +1929,9 @@ try_crossjump_to_edge (int mode, edge e1
  of matching instructions or the 'from' block was totally matched
  (such that its predecessors will hopefully be redirected and the
  block removed).  */
-  if ((nmatch < PARAM_VALUE (PARAM_MIN_CROSSJUMP_INSNS))
+  if ((nmatch < PARAM_VALUE (PARAM_MIN_CROSSJUMP_INSNS)
   && (newpos1 != BB_HEAD (src1)))
+  || PARAM_VALUE (PARAM_MIN_CROSSJUMP_INSNS) == 0)
 return false;
 
   /* Avoid deleting preserve label when redirecting ABNORMAL edges.  */

Marek

Re: [PATCH, ARM] ARM Linux kernel-assisted atomic operation helpers vs. libcall argument promotion

2013-04-05 Thread Ramana Radhakrishnan


On 03/15/13 18:16, Julian Brown wrote:

Hi,

At present, the libcall helpers implementing atomic operations
(__sync_val_compare_and_swap_X) for char and short types suffer from
a type mismatch. This is leading to test failures, i.e.:

FAIL: gcc.dg/atomic-compare-exchange-1.c execution test
FAIL: gcc.dg/atomic-compare-exchange-2.c execution test

On investigation, these tests pass if the values used in the tests are
tweaked so that they are in the range representable by both signed and
unsigned chars, i.e. 0 to 127, rather than ~0. The failures are
happening because libcall expansion is sign-extending sub-word-size
arguments (e.g. EXPECTED, DESIRED in
optabs.c:expand_atomic_compare_and_swap), but the functions
implementing the operations are written to take unsigned arguments,
zero-extended, and the unexpected out-of-range values cause them to
fail.

The sign-extension happens because in calls.c:emit_library_call_value_1
we have:

mode = promote_function_mode (NULL_TREE, mode, &unsigned_p, NULL_TREE, 0);
argvec[count].mode = mode;
argvec[count].value = convert_modes (mode, GET_MODE (val), val, unsigned_p);
argvec[count].reg = targetm.calls.function_arg (args_so_far, mode,
NULL_TREE, true);

This calls back into arm.c:arm_promote_function_mode, which promotes
less-than-four-byte integral values to SImode, but never modifies the
PUNSIGNEDP argument. So, such values always get sign extended when being
passed to libcalls.
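
To make the mismatch concrete, here is a minimal C sketch of the two promotions of a byte to a 32-bit word. The helper names are hypothetical; they only model sign extension versus zero extension and are not the code in calls.c or linux-atomic.c.

```c
#include <stdint.h>

/* Promote BYTE to 32 bits by sign extension, as happens when the
   promoted mode is treated as signed.  */
static uint32_t
promote_sign_extended (uint8_t byte)
{
  int32_t wide = byte >= 0x80 ? (int32_t) byte - 256 : (int32_t) byte;
  return (uint32_t) wide;
}

/* Promote BYTE to 32 bits by zero extension, which is what a callee
   taking an unsigned char argument expects.  */
static uint32_t
promote_zero_extended (uint8_t byte)
{
  return (uint32_t) byte;
}
```

For values in the range 0 to 127 both helpers agree, which matches the observation that the tests pass for such values. For a byte such as 0xff the sign-extended word is 0xffffffff while the zero-extended word is 0x000000ff, so a callee comparing full words sees a value it did not expect.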

The simplest fix for this (since libcalls don't have proper tree types
to inspect for the actual argument types) is just to define the
linux-atomic.c functions to use signed char/short instead of unsigned
char/unsigned short, approximately reversing the change in this earlier
patch:


It is unfortunate that we need to reverse this change given that we'd 
like things to be as unsigned as possible but I don't see how this would 
work otherwise right now.


Changing to an unsigned interface everywhere appears to cause more 
issues than is worth fixing right now given that it possibly makes life 
more difficult in the fixed point arithmetic function.


Thanks for the clarification on irc.



http://gcc.gnu.org/ml/gcc-patches/2010-08/msg00492.html

A slight change is also required to the
__sync_val_compare_and_swap_* implementation in order to treat the
signed OLDVAL argument correctly (I believe the other macros are OK).

Tested cross to ARM Linux, default & thumb multilibs. The
above-mentioned tests change from FAIL to PASS. OK to apply?



Ok, and this should go to all afflicted release branches, modulo their
locked(ness) and watching trunk for a day or so (keeping an eye on
gcc-testresults for armv5te would be good enough).


regards
Ramana



Thanks,

Julian

ChangeLog

 libgcc/
 * config/arm/linux-atomic.c (SUBWORD_SYNC_OP, SUBWORD_VAL_CAS)
 (SUBWORD_TEST_AND_SET): Use signed char/short types instead of
 unsigned char/unsigned short.
 (__sync_val_compare_and_swap_{1,2}): Handle signed argument.

Re: [PATCH][ARM][testsuite] Fix testsuite options for testing rounding vectorisation on ARMv8

2013-04-05 Thread Ramana Radhakrishnan

On 04/05/13 15:44, Kyrylo Tkachov wrote:

-Original Message-

From: Ramana Radhakrishnan
Sent: 05 April 2013 15:06
To: Kyrylo Tkachov
Cc: gcc-patches@gcc.gnu.org; mikest...@comcast.net
Subject: Re: [PATCH][ARM][testsuite] Fix testsuite options for testing
rounding vectorisation on ARMv8

On 04/05/13 14:06, Kyrylo Tkachov wrote:

Hi all,

With r197491 I added testsuite support for vectorisation of rounding
functions on ARMv8 NEON, but the options set up for vect.exp result in
the testsuite trying to run all the vect tests with ARMv8 NEON, which
does not work on ARMv7 targets and simulators that don't support ARMv8
(like qemu).

But if we run the tests using v7 NEON, the newly enabled vect-rounding*
tests will FAIL because they need ARMv8 NEON options.

Therefore this patch reverts most of that and instead copies the
rounding vectorisation tests to gcc.target/arm, where the correct
options can be set.

Tested arm-none-eabi on qemu to make sure that the execution tests come
back and use v7 NEON.

Ok for trunk?

Ok by me but I'd like Mike to have another look.

It's a bit unfortunate we need to copy these tests over for the
architecture levels we need - there is a good project to restructure
gcc.target/arm to test properly at different arch. levels but for the
minute this is the best compromise.

While differentiating between architecture levels in gcc.target/arm is a
good idea, I think in this case the problem is that when testing
gcc.dg/vect/ we use check_vect_support_and_set_flags to set a common set
of flags for all the tests in that directory.

But when, for the same target (i.e. arm-none-eabi, which could be ARMv7
or ARMv8), different FPU options provide different vectorisation
capabilities, things get messy when we want to test something that one
FPU supports and another doesn't. Since the vect.exp tests are common,
we cannot add arm-specific FPU options there.

Yeah ok then.

regards
Ramana

Kyrill

Ramana

Thanks,
Kyrill

gcc/testsuite
2013-04-05  Kyrylo Tkachov  

* lib/target-supports.exp (add_options_for_arm_v8_neon):
Add -march=armv8-a when we use v8 NEON.
(check_effective_target_vect_call_btruncf): Remove arm-*-*-*.
(check_effective_target_vect_call_ceilf): Likewise.
(check_effective_target_vect_call_floorf): Likewise.
(check_effective_target_vect_call_roundf): Likewise.
(check_vect_support_and_set_flags): Remove check for arm_v8_neon.
* gcc.target/arm/vect-rounding-btruncf.c: New testcase.
* gcc.target/arm/vect-rounding-ceilf.c: Likewise.
* gcc.target/arm/vect-rounding-floorf.c: Likewise.
* gcc.target/arm/vect-rounding-roundf.c: Likewise.

Re: Fill more delay slots in conditional returns

2013-04-05 Thread Steven Bosscher

On Fri, Apr 5, 2013 at 10:22 AM, Eric Botcazou wrote:
>> Thinking about this some more: This could be fixed by inserting a
>> machine-specific pass just after delayed-branch scheduling, like in
>> the attached patch. I think the same is possible with the dbr_schedule
>> call in the MIPS backend.
>>
>> Eric, what do you think of this approach?
>
> No objections on principle from a SPARC viewpoint.  But can we really control
> when the pass is run with register_pass?  Because it needs to be run _before_
> branch shortening.

Yes, we can control that. In this particular case, register_pass() is
told to insert the pass immediately after the first (and only)
instance of the pass named "dbr" i.e. pass_delay_slots.
position_pass() will look through the pass list and perform the
insertion in the right place. The new pass is inserted between
pass_delay_slots and pass_split_for_shorten_branches.

Without the inserted pass, before the patch, the pass chain looks like this:

  NEXT_PASS (pass_machine_reorg);
  // sparc_reorg calls cleanup_barriers and dbr_schedule
  NEXT_PASS (pass_cleanup_barriers);
  NEXT_PASS (pass_delay_slots);
  NEXT_PASS (pass_split_for_shorten_branches);
  NEXT_PASS (pass_convert_to_eh_region_ranges);
  NEXT_PASS (pass_shorten_branches);

After the patch, it looks like this:

  NEXT_PASS (pass_machine_reorg); // now a NOP for sparc
  NEXT_PASS (pass_cleanup_barriers);
  NEXT_PASS (pass_delay_slots);
  NEXT_PASS (pass_work_around_errata);
  NEXT_PASS (pass_split_for_shorten_branches);
  NEXT_PASS (pass_convert_to_eh_region_ranges);
  NEXT_PASS (pass_shorten_branches);

I've confirmed this also by printing the name for each pass in the chain.
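
The insertion rule described above ("immediately after the first pass with a given name") can be sketched as a toy C model. This is not GCC's pass manager: struct pass_list and insert_pass_after are hypothetical names, and the pass list is reduced to an array of strings.

```c
#include <string.h>

#define MAX_PASSES 16

/* Toy pass list: an ordered array of pass names.  */
struct pass_list
{
  const char *names[MAX_PASSES];
  int count;
};

/* Insert NEW_PASS immediately after the first pass named REF.
   Return 1 on success, 0 if REF is not found or the list is full.  */
static int
insert_pass_after (struct pass_list *list, const char *ref,
                   const char *new_pass)
{
  int i, j;

  if (list->count >= MAX_PASSES)
    return 0;

  for (i = 0; i < list->count; i++)
    if (strcmp (list->names[i], ref) == 0)
      {
        /* Shift the tail up by one slot to make room at I + 1.  */
        for (j = list->count; j > i + 1; j--)
          list->names[j] = list->names[j - 1];
        list->names[i + 1] = new_pass;
        list->count++;
        return 1;
      }

  return 0;
}
```

Applied to a list that contains "dbr" followed by the split-for-shorten-branches pass, inserting a new entry after "dbr" places it between the two, which mirrors the ordering shown in the second pass chain above.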

Ciao!
Steven

Re: [google/gcc-4_8]Regenerate Makefile.in

2013-04-05 Thread Diego Novillo


On 2013-04-04 19:32 , Jing Yu wrote:

OK for google/gcc-4_8?


OK.


Diego.

Re: [patch, AVR] Add new ATmegaRFR devices

2013-04-05 Thread Georg-Johann Lay


Joerg Wunsch wrote:

The attached patch adds the new ATmega*RFR* devices to AVR-GCC.
[...]


Supply the auto generated files, too.  Cf. t-avr, avr-mcus.def etc.

Johann

[PATCH, PowerPC] Fix PR 56843

2013-04-05 Thread Bill Schmidt

This patch improves code generation for Newton-Raphson reciprocal
estimates for divide and square root on PowerPC
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56843).

For the divide case, we formerly had specialized routines for two- and
three-pass estimates.  Rather than add new routines for one- and
four-pass estimates, I removed those and rewrote the algorithm to be
general for any number of passes.  This unfortunately makes the patch
hard to read.  It will probably be easiest to review by applying it to a
tree and looking at the whole rs6000_emit_swdiv function.
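
For readers unfamiliar with the technique, here is a minimal C sketch of a Newton-Raphson reciprocal refinement that works for any number of passes. It is an illustration under stated assumptions, not the sequence emitted by rs6000_emit_swdiv: the function name nr_divide is hypothetical, the initial estimate is passed in by the caller rather than coming from a hardware estimate instruction, and plain double arithmetic stands in for the fused multiply-add forms.

```c
#include <assert.h>

/* Refine the estimate X0 of 1/D with PASSES Newton-Raphson steps and
   return N * (1/D).  Each step roughly squares the relative error.  */
static double
nr_divide (double n, double d, double x0, int passes)
{
  double x = x0;
  int i;

  assert (passes >= 1);

  for (i = 0; i < passes; i++)
    {
      double e = 1.0 - d * x;   /* Residual error of the estimate.  */
      x = x + x * e;            /* Same as x * (2 - d * x).  */
    }

  return n * x;
}
```

Because the error is roughly squared on every pass, the number of passes needed depends on the accuracy of the starting estimate and on the target precision, which is why a loop over a variable pass count can replace separate two- and three-pass routines.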

Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new
regressions.  Ok for trunk?

Thanks,
Bill


gcc:

2013-04-05  Bill Schmidt  

PR target/56843
* config/rs6000/rs6000.c (rs6000_emit_swdiv_high_precision): Remove.
(rs6000_emit_swdiv_low_precision): Remove.
(rs6000_emit_swdiv): Rewrite to handle between one and four
iterations of Newton-Raphson generally; modify required number of
iterations for some cases.
* config/rs6000/rs6000.h (RS6000_RECIP_HIGH_PRECISION_P): Remove.

gcc/testsuite:

2013-04-05  Bill Schmidt  

PR target/56843
* gcc.target/powerpc/recip-1.c: Modify expected output.
* gcc.target/powerpc/recip-3.c: Likewise.
* gcc.target/powerpc/recip-4.c: Likewise.
* gcc.target/powerpc/recip-5.c: Add expected output for iterations.


Index: gcc/testsuite/gcc.target/powerpc/recip-1.c
===
--- gcc/testsuite/gcc.target/powerpc/recip-1.c  (revision 197486)
+++ gcc/testsuite/gcc.target/powerpc/recip-1.c  (working copy)
@@ -3,8 +3,8 @@
 /* { dg-options "-O2 -mrecip -ffast-math -mcpu=power6" } */
 /* { dg-final { scan-assembler-times "frsqrte" 2 } } */
 /* { dg-final { scan-assembler-times "fmsub" 2 } } */
-/* { dg-final { scan-assembler-times "fmul" 8 } } */
-/* { dg-final { scan-assembler-times "fnmsub" 4 } } */
+/* { dg-final { scan-assembler-times "fmul" 6 } } */
+/* { dg-final { scan-assembler-times "fnmsub" 3 } } */
 
 double
 rsqrt_d (double a)
Index: gcc/testsuite/gcc.target/powerpc/recip-3.c
===
--- gcc/testsuite/gcc.target/powerpc/recip-3.c  (revision 197486)
+++ gcc/testsuite/gcc.target/powerpc/recip-3.c  (working copy)
@@ -7,8 +7,8 @@
 /* { dg-final { scan-assembler-times "xsnmsub.dp\|fnmsub\ " 2 } } */
 /* { dg-final { scan-assembler-times "frsqrtes" 1 } } */
 /* { dg-final { scan-assembler-times "fmsubs" 1 } } */
-/* { dg-final { scan-assembler-times "fmuls" 4 } } */
-/* { dg-final { scan-assembler-times "fnmsubs" 2 } } */
+/* { dg-final { scan-assembler-times "fmuls" 2 } } */
+/* { dg-final { scan-assembler-times "fnmsubs" 1 } } */
 
 double
 rsqrt_d (double a)
Index: gcc/testsuite/gcc.target/powerpc/recip-4.c
===
--- gcc/testsuite/gcc.target/powerpc/recip-4.c  (revision 197486)
+++ gcc/testsuite/gcc.target/powerpc/recip-4.c  (working copy)
@@ -7,8 +7,8 @@
 /* { dg-final { scan-assembler-times "xvnmsub.dp" 2 } } */
 /* { dg-final { scan-assembler-times "xvrsqrtesp" 1 } } */
 /* { dg-final { scan-assembler-times "xvmsub.sp" 1 } } */
-/* { dg-final { scan-assembler-times "xvmulsp" 4 } } */
-/* { dg-final { scan-assembler-times "xvnmsub.sp" 2 } } */
+/* { dg-final { scan-assembler-times "xvmulsp" 2 } } */
+/* { dg-final { scan-assembler-times "xvnmsub.sp" 1 } } */
 
 #define SIZE 1024
 
Index: gcc/testsuite/gcc.target/powerpc/recip-5.c
===
--- gcc/testsuite/gcc.target/powerpc/recip-5.c  (revision 197486)
+++ gcc/testsuite/gcc.target/powerpc/recip-5.c  (working copy)
@@ -6,6 +6,14 @@
 /* { dg-final { scan-assembler-times "xvresp" 5 } } */
 /* { dg-final { scan-assembler-times "xsredp" 2 } } */
 /* { dg-final { scan-assembler-times "fres" 2 } } */
+/* { dg-final { scan-assembler-times "fmuls" 2 } } */
+/* { dg-final { scan-assembler-times "fnmsubs" 2 } } */
+/* { dg-final { scan-assembler-times "xsmuldp" 2 } } */
+/* { dg-final { scan-assembler-times "xsnmsub.dp" 4 } } */
+/* { dg-final { scan-assembler-times "xvmulsp" 7 } } */
+/* { dg-final { scan-assembler-times "xvnmsub.sp" 5 } } */
+/* { dg-final { scan-assembler-times "xvmuldp" 6 } } */
+/* { dg-final { scan-assembler-times "xvnmsub.dp" 8 } } */
 
 #include 
 
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 197486)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -26913,54 +26913,26 @@ rs6000_emit_nmsub (rtx dst, rtx m1, rtx m2, rtx a)
   emit_insn (gen_rtx_SET (VOIDmode, dst, r));
 }
 
-/* Newton-Raphson approximation of floating point divide with just 2 passes
-   (either single precision floating point, or newer machines with higher
-   accuracy estimates).  Support both scalar and vector divide.  Assumes no
-   trapping m

Re: [PATCH] Loop distribution improvements

2013-04-05 Thread Richard Biener

Jakub Jelinek  wrote:

>On Fri, Apr 05, 2013 at 12:46:48PM +0200, Richard Biener wrote:
>> >BTW, the integer_all_onesp stuff is broken for this from what I can
>> >see, for complex
>> >numbers it returns true for -1 + 0i where all bytes aren't 0xff, so
>we
>> >need
>> >to rule out COMPLEX_CSTs (or do integer_all_onesp on each part
>> >instead).
>> >And TYPE_PRECISION on VECTOR_CSTs won't be what we are looking for.
>> 
>> Hmm, indeed.  Or remove the -1 special casing altogether.
>
>Ok, zero/CONSTRUCTOR moved into the function, all_onesp handling
>removed (so
>only on the CHAR_BIT == 8 hosts and BITS_PER_UNIT == 8 targets it will
>be
>optimized).  Ok for trunk?


Ok.

Thanks,
Richard.

>> Marc is probably right with his note as well.
>
>I'll defer that to Marc ;)
>
>2013-04-05  Jakub Jelinek  
>
>   * tree-loop-distribution.c (const_with_all_bytes_same): New function.
>   (generate_memset_builtin): Only handle integer_all_onesp as -1 val if
>   TYPE_PRECISION is equal to mode bitsize.  Use
>const_with_all_bytes_same
>   if possible to compute val.
>   (classify_partition): Verify CONSTRUCTOR doesn't have any elts.
>   For QImode integers don't require anything about precision.  Use
>   const_with_all_bytes_same to find out if the constant doesn't have
>   repeated bytes in it.
>
>   * gcc.dg/pr56837.c: New test.
>
>--- gcc/tree-loop-distribution.c.jj   2013-04-04 15:03:28.0 +0200
>+++ gcc/tree-loop-distribution.c   2013-04-05 15:21:10.641668895 +0200
>@@ -297,6 +297,36 @@ build_addr_arg_loc (location_t loc, data
>return fold_build_pointer_plus_loc (loc, DR_BASE_ADDRESS (dr),
>addr_base);
> }
> 
>+/* If VAL memory representation contains the same value in all bytes,
>+   return that value, otherwise return -1.
>+   E.g. for 0x24242424 return 0x24, for IEEE double
>+   747708026454360457216.0 return 0x44, etc.  */
>+
>+static int
>+const_with_all_bytes_same (tree val)
>+{
>+  unsigned char buf[64];
>+  int i, len;
>+
>+  if (integer_zerop (val)
>+  || real_zerop (val)
>+  || (TREE_CODE (val) == CONSTRUCTOR
>+  && !TREE_CLOBBER_P (val)
>+  && CONSTRUCTOR_NELTS (val) == 0))
>+return 0;
>+
>+  if (CHAR_BIT != 8 || BITS_PER_UNIT != 8)
>+return -1;
>+
>+  len = native_encode_expr (val, buf, sizeof (buf));
>+  if (len == 0)
>+return -1;
>+  for (i = 1; i < len; i++)
>+if (buf[i] != buf[0])
>+  return -1;
>+  return buf[0];
>+}
>+
> /* Generate a call to memset for PARTITION in LOOP.  */
> 
> static void
>@@ -327,24 +357,20 @@ generate_memset_builtin (struct loop *lo
> 
>/* This exactly matches the pattern recognition in classify_partition. 
>*/
>   val = gimple_assign_rhs1 (stmt);
>-  if (integer_zerop (val)
>-  || real_zerop (val)
>-  || TREE_CODE (val) == CONSTRUCTOR)
>-val = integer_zero_node;
>-  else if (integer_all_onesp (val))
>-val = build_int_cst (integer_type_node, -1);
>-  else
>-{
>-  if (TREE_CODE (val) == INTEGER_CST)
>-  val = fold_convert (integer_type_node, val);
>-  else if (!useless_type_conversion_p (integer_type_node,
>TREE_TYPE (val)))
>-  {
>-gimple cstmt;
>-tree tem = make_ssa_name (integer_type_node, NULL);
>-cstmt = gimple_build_assign_with_ops (NOP_EXPR, tem, val,
>NULL_TREE);
>-gsi_insert_after (&gsi, cstmt, GSI_CONTINUE_LINKING);
>-val = tem;
>-  }
>+  /* Handle constants like 0x15151515 and similarly
>+ floating point constants etc. where all bytes are the same.  */
>+  int bytev = const_with_all_bytes_same (val);
>+  if (bytev != -1)
>+val = build_int_cst (integer_type_node, bytev);
>+  else if (TREE_CODE (val) == INTEGER_CST)
>+val = fold_convert (integer_type_node, val);
>+  else if (!useless_type_conversion_p (integer_type_node, TREE_TYPE
>(val)))
>+{
>+  gimple cstmt;
>+  tree tem = make_ssa_name (integer_type_node, NULL);
>+  cstmt = gimple_build_assign_with_ops (NOP_EXPR, tem, val,
>NULL_TREE);
>+  gsi_insert_after (&gsi, cstmt, GSI_CONTINUE_LINKING);
>+  val = tem;
> }
> 
>   fn = build_fold_addr_expr (builtin_decl_implicit (BUILT_IN_MEMSET));
>@@ -354,10 +380,8 @@ generate_memset_builtin (struct loop *lo
>   if (dump_file && (dump_flags & TDF_DETAILS))
> {
>   fprintf (dump_file, "generated memset");
>-  if (integer_zerop (val))
>+  if (bytev == 0)
>   fprintf (dump_file, " zero\n");
>-  else if (integer_all_onesp (val))
>-  fprintf (dump_file, " minus one\n");
>   else
>   fprintf (dump_file, "\n");
> }
>@@ -941,18 +965,10 @@ classify_partition (loop_p loop, struct
> {
>   gimple stmt = DR_STMT (single_store);
>   tree rhs = gimple_assign_rhs1 (stmt);
>-  if (!(integer_zerop (rhs)
>-  || real_zerop (rhs)
>-  || (TREE_CODE (rhs) == CONSTRUCTOR
>-  && !TREE_CLOBBER_P (rhs))
>-  || ((integer_all_onesp (rhs)
>-   || (INTEGRAL_TYPE_P (TREE_TYPE (rhs))
>-
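
The idea behind const_with_all_bytes_same can be tried outside the compiler with a small C sketch. This is only an analogue under stated assumptions: all_bytes_same is a hypothetical name, and it inspects the host memory representation of an object directly instead of going through native_encode_expr on a tree constant.

```c
#include <stddef.h>

/* If every byte of the LEN-byte object at PTR has the same value,
   return that value (0..255); otherwise return -1.  */
static int
all_bytes_same (const void *ptr, size_t len)
{
  const unsigned char *buf = (const unsigned char *) ptr;
  size_t i;

  if (len == 0)
    return -1;

  for (i = 1; i < len; i++)
    if (buf[i] != buf[0])
      return -1;

  return buf[0];
}
```

A result other than -1 means a loop storing that constant could in principle be expressed as a memset with the returned byte; for an unsigned int holding 0x24242424 the helper returns 0x24.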

Re: [PATCH, PowerPC] Fix PR 56843

2013-04-05 Thread David Edelsohn

On Fri, Apr 5, 2013 at 1:49 PM, Bill Schmidt
 wrote:
> This patch improves code generation for Newton-Raphson reciprocal
> estimates for divide and square root on PowerPC
> (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56843).
>
> For the divide case, we formerly had specialized routines for two- and
> three-pass estimates.  Rather than add new routines for one- and
> four-pass estimates, I removed those and rewrote the algorithm to be
> general for any number of passes.  This unfortunately makes the patch
> hard to read.  It will probably be easiest to review by applying it to a
> tree and looking at the whole rs6000_emit_swdiv function.
>
> Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new
> regressions.  Ok for trunk?
>
> Thanks,
> Bill
>
>
> gcc:
>
> 2013-04-05  Bill Schmidt  
>
> PR target/56843
> * config/rs6000/rs6000.c (rs6000_emit_swdiv_high_precision): Remove.
> (rs6000_emit_swdiv_low_precision): Remove.
> (rs6000_emit_swdiv): Rewrite to handle between one and four
> iterations of Newton-Raphson generally; modify required number of
> iterations for some cases.
> * config/rs6000/rs6000.h (RS6000_RECIP_HIGH_PRECISION_P): Remove.
>
> gcc/testsuite:
>
> 2013-04-05  Bill Schmidt  
>
> PR target/56843
> * gcc.target/powerpc/recip-1.c: Modify expected output.
> * gcc.target/powerpc/recip-3.c: Likewise.
> * gcc.target/powerpc/recip-4.c: Likewise.
> * gcc.target/powerpc/recip-5.c: Add expected output for iterations.

Okay.

Thanks, David

Re: [PATCH] Loop distribution improvements

2013-04-05 Thread Marc Glisse


On Fri, 5 Apr 2013, Marc Glisse wrote:

Shouldn't we change integer_all_onesp to do what its name says and create a 
separate integer_minus_onep for the single place I could find where it would 
break, the folding of x * -1 ?


2013-04-05  Marc Glisse  

* tree.c (integer_all_onesp) : Test that both
components are all 1s.
(integer_minus_onep): New function.
* tree.h (integer_minus_onep): Declare it.
* fold-const.c (fold_binary_loc) : Test
integer_minus_onep instead of integer_all_onesp.

It passes bootstrap+testsuite on x86_64-linux-gnu, but if someone else 
wants to go through the (not that long) list of integer_all_onesp to check 
for things that might break... I did not change places where the name "-1" 
might make more sense than "all 1s" but the type cannot be complex.


--
Marc Glisse

Index: gcc/fold-const.c
===
--- gcc/fold-const.c    (revision 197532)
+++ gcc/fold-const.c    (working copy)
@@ -10802,21 +10802,21 @@ fold_binary_loc (location_t loc,
 
   if (! FLOAT_TYPE_P (type))
{
  if (integer_zerop (arg1))
return omit_one_operand_loc (loc, type, arg1, arg0);
  if (integer_onep (arg1))
return non_lvalue_loc (loc, fold_convert_loc (loc, type, arg0));
  /* Transform x * -1 into -x.  Make sure to do the negation
 on the original operand with conversions not stripped
 because we can only strip non-sign-changing conversions.  */
- if (integer_all_onesp (arg1))
+ if (integer_minus_onep (arg1))
return fold_convert_loc (loc, type, negate_expr (op0));
  /* Transform x * -C into -x * C if x is easily negatable.  */
  if (TREE_CODE (arg1) == INTEGER_CST
  && tree_int_cst_sgn (arg1) == -1
  && negate_expr_p (arg0)
  && (tem = negate_expr (arg1)) != arg1
  && !TREE_OVERFLOW (tem))
return fold_build2_loc (loc, MULT_EXPR, type,
fold_convert_loc (loc, type,
  negate_expr (arg0)),
Index: gcc/tree.c
===
--- gcc/tree.c  (revision 197532)
+++ gcc/tree.c  (working copy)
@@ -1774,33 +1774,33 @@ integer_onep (const_tree expr)
  if (!integer_onep (VECTOR_CST_ELT (expr, i)))
return false;
return true;
   }
 default:
   return false;
 }
 }
 
 /* Return 1 if EXPR is an integer containing all 1's in as much precision as
-   it contains.  Likewise for the corresponding complex constant.  */
+   it contains, or a complex or vector whose subparts are such integers.  */
 
 int
 integer_all_onesp (const_tree expr)
 {
   int prec;
   int uns;
 
   STRIP_NOPS (expr);
 
   if (TREE_CODE (expr) == COMPLEX_CST
   && integer_all_onesp (TREE_REALPART (expr))
-  && integer_zerop (TREE_IMAGPART (expr)))
+  && integer_all_onesp (TREE_IMAGPART (expr)))
 return 1;
 
   else if (TREE_CODE (expr) == VECTOR_CST)
 {
   unsigned i;
   for (i = 0; i < VECTOR_CST_NELTS (expr); ++i)
if (!integer_all_onesp (VECTOR_CST_ELT (expr, i)))
  return 0;
   return 1;
 }
@@ -1832,20 +1832,34 @@ integer_all_onesp (const_tree expr)
   else
high_value = ((HOST_WIDE_INT) 1 << shift_amount) - 1;
 
   return (TREE_INT_CST_LOW (expr) == ~(unsigned HOST_WIDE_INT) 0
  && TREE_INT_CST_HIGH (expr) == high_value);
 }
   else
 return TREE_INT_CST_LOW (expr) == ((unsigned HOST_WIDE_INT) 1 << prec) - 1;
 }
 
+/* Return 1 if EXPR is the integer constant minus one.  */
+
+int
+integer_minus_onep (const_tree expr)
+{
+  STRIP_NOPS (expr);
+
+  if (TREE_CODE (expr) == COMPLEX_CST)
+return (integer_all_onesp (TREE_REALPART (expr))
+   && integer_zerop (TREE_IMAGPART (expr)));
+  else
+return integer_all_onesp (expr);
+}
+
 /* Return 1 if EXPR is an integer constant that is a power of 2 (i.e., has only
one bit on).  */
 
 int
 integer_pow2p (const_tree expr)
 {
   int prec;
   unsigned HOST_WIDE_INT high, low;
 
   STRIP_NOPS (expr);
Index: gcc/tree.h
===
--- gcc/tree.h  (revision 197532)
+++ gcc/tree.h  (working copy)
@@ -5303,20 +5303,25 @@ extern int integer_zerop (const_tree);
 
 /* integer_onep (tree x) is nonzero if X is an integer constant of value 1.  */
 
 extern int integer_onep (const_tree);
 
 /* integer_all_onesp (tree x) is nonzero if X is an integer constant
all of whose significant bits are 1.  */
 
 extern int integer_all_onesp (const_tree);
 
+/* integer_minus_onep (tree x) is nonzero if X is an integer constant of
+   value -1.  */
+
+extern int integer_minus_onep (const_tree);
+
 /* integer_pow2p (tree x) is nonzero is X is an integer constant with
exactly one bit 1.  */
 
 extern int integer_pow2p (const_

Re: [patch tree-ssa-structalias.c]: Small finding in find_func_aliases function

2013-04-05 Thread Jeff Law


On 04/05/2013 02:29 AM, Kai Tietz wrote:

Hello,

while debugging I found that in find_func_aliases rhsop might be used
as NULL for gimple_assign_single_p items.  For the
gimple_assign_single_p case it should instead pass the rhs1 item
directly as the argument to get_constraint_for_rhs.

ChangeLog

2013-04-05  Kai Tietz

 * tree-ssa-structalias.c (find_func_aliases): Special-case
 gimple_assign_single_p handling.

Ok for apply?

Yes.  OK for the trunk.

Do you have a testcase?

jeff

Re: [PATCH][ARM][testsuite] Fix testsuite options for testing rounding vectorisation on ARMv8

2013-04-05 Thread Mike Stump

On Apr 5, 2013, at 7:05 AM, Ramana Radhakrishnan  wrote:
> Ok by me but I'd like Mike to have another look.

Ok by me.

Re: functional and type_traits cleanup

2013-04-05 Thread François Dumont


On 04/05/2013 12:20 AM, Jonathan Wakely wrote:

On 4 April 2013 21:16, François Dumont wrote:

I think this is mostly very good, thanks for cleaning it up. The
indentation of the closing brace for __is_assignable_helper looks
wrong. Is there a reason that __is_assignable_helper::__test uses a
default template argument but __is_convertible_helper::__test uses
decltype(expr, type) in the function return type? I think the
decltype(__test_aux<_Tp1>(...)) expression would work as a default
template argument too, which I find easier to read because it doesn't
clutter up the return type.


In fact my first attempt was a very simple one:

  template<typename _From, typename _To>
    class __is_convertible_helper<_From, _To, false>
    {
      template<typename _To1>
        static true_type
        __test(_To1);

      template<typename>
        static false_type
        __test(...);

    public:
      typedef decltype(__test<_To>(std::declval<_From>())) type;
    };

But some tests failed like:
In file included from 
/home/fdt/dev/gcc/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/move.h:57:0,
 from 
/home/fdt/dev/gcc/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/stl_pair.h:59,
 from 
/home/fdt/dev/gcc/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/utility:70,
 from 
/home/fdt/dev/gcc/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/tuple:38,
 from 
/home/fdt/dev/gcc/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/functional:55,
 from 
/home/fdt/dev/gcc/src/libstdc++-v3/testsuite/20_util/bind/38889.cc:23:
/home/fdt/dev/gcc/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/type_traits: 
In instantiation of 'struct std::__is_convertible_helperstd::tuple >&, std::_Placeholder<1>, false>':
/home/fdt/dev/gcc/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/type_traits:1321:12: 
required from 'struct std::is_convertiblestd::tuple >&, std::_Placeholder<1> >'
/home/fdt/dev/gcc/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/type_traits:111:12: 
required from 'struct std::__and_std::tuple >&, std::_Placeholder<1> > >'
/home/fdt/dev/gcc/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/tuple:400:40: 
required from 'struct std::_Bind))(int)>'
/home/fdt/dev/gcc/src/libstdc++-v3/testsuite/20_util/bind/38889.cc:28:41: required 
from here
/home/fdt/dev/gcc/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/type_traits:1316:30: 
error: 'std::_Placeholder<1>' is an inaccessible base of 
'std::tuple >'

   typedef decltype(__test<_To>(std::declval<_From>())) type;
  ^
/home/fdt/dev/gcc/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/type_traits:1309:2: 
error:   initializing argument 1 of 'static std::true_type 
std::__is_convertible_helper<_From, _To, false>::__test(_To1) [with _To1 
= std::_Placeholder<1>; _From = const std::tuple 
>&; _To = std::_Placeholder<1>; std::true_type = 
std::integral_constant]'

  __test(_To1);
  ^

From my point of view this is another example of a use case for which
gcc is not SFINAE friendly enough, no?


But the version with the default template parameter is fine and more
consistent with the implementation of the other helpers, so adopted! Here
is another version of the patch for validation.


Daniel, I agree that inheritance from integral_constant is not as
obvious as before, but it is still there and it is just what the compiler
needs. I even hope that it also simplifies the job for the compiler a
(very) little bit.


Ok to commit ?

François

Index: include/std/functional
===
--- include/std/functional	(revision 197307)
+++ include/std/functional	(working copy)
@@ -185,38 +185,6 @@
 : _Weak_result_type_impl::type>
 { };
 
-  /// Determines if the type _Tp derives from unary_function.
-  template
-struct _Derives_from_unary_function : __sfinae_types
-{
-private:
-  template
-	static __one __test(const volatile unary_function<_T1, _Res>*);
-
-  // It's tempting to change "..." to const volatile void*, but
-  // that fails when _Tp is a function type.
-  static __two __test(...);
-
-public:
-  static const bool value = sizeof(__test((_Tp*)0)) == 1;
-};
-
-  /// Determines if the type _Tp derives from binary_function.
-  template
-struct _Derives_from_binary_function : __sfinae_types
-{
-private:
-  template
-	static __one __test(const volatile binary_function<_T1, _T2, _Res>*);
-
-  // It's tempting to change "..." to const volatile void*, but
-  // that fails when _Tp is a function type.
-  static __two __test(...);
-
-public:
-  static const bool value = sizeof(__test((_Tp*)0)) == 1;
-};
-
   /**
* Invoke a function object, which may be either a member pointer or a
* function object. The first parameter will tell which.
Index: include/std/type_traits
===
--- include/std/type_t

[ira-improv] patch for new code dealing with hard reg preferences

2013-04-05 Thread Vladimir Makarov


  The following patch adds new functionality to IRA to improve RA in
the presence of hard registers in RTL.  IRA already had some mechanism
dealing with hard regs but it affected only pseudos in *the same insn*
containing the hard register.  This technique was not good enough to
remove the regmove pass code making transformations for matching
constraints (e.g. 2-op insn machines): IRA can make better decisions
for pseudos but was still not good enough for hard registers.  So
just removing the code in regmove resulted in worse generated code
performance.

  On the other hand there are performance PRs which are a result of bad
decisions in the premature optimization of the regmove pass for matching
constraints.

  We already have an IRA code propagating pseudo preferences to other
pseudos through net of copies.  We need the same code for better
dealing with hard register preferences.  The patch adds such code.  I
could modify structure ira_copy for this but decided to use new
separate structures as the copy structure is too big and we don't need
most of its fields for hard-register preferences.
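
  To make the idea concrete, here is a rough sketch of propagating one
hard-register preference through a net of copies.  It is only an
illustration under stated assumptions: CopyGraph, the cost table and
propagate_hard_reg_pref are hypothetical and are not the structures or
the cost formula used in the patch.  The preference lowers the cost of
one hard register for the starting pseudo, and a weaker bonus (divided
by hop_divisor per hop) reaches the pseudos connected to it by copies.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical model: pseudos are numbered 0..n-1, copies[p] lists the
// pseudos connected to p by a copy, and cost[p][r] is the cost of giving
// hard register r to pseudo p (lower is better).
struct CopyGraph
{
  std::vector<std::vector<int> > copies;
  std::vector<std::vector<int> > cost;
};

// Lower the cost of HARD_REG for START and for every pseudo reachable
// through copies.  The bonus is divided by HOP_DIVISOR at each hop and
// propagation stops once it reaches zero.
void propagate_hard_reg_pref(CopyGraph &g, int start, int hard_reg,
                             int bonus, int hop_divisor)
{
  assert(hop_divisor > 1);
  std::vector<bool> seen(g.copies.size(), false);
  std::queue<std::pair<int, int> > work;
  work.push(std::make_pair(start, bonus));
  seen[start] = true;
  while (!work.empty())
    {
      const int pseudo = work.front().first;
      const int b = work.front().second;
      work.pop();
      g.cost[pseudo][hard_reg] -= b;
      const int next = b / hop_divisor;
      if (next == 0)
        continue;
      for (std::size_t i = 0; i < g.copies[pseudo].size(); ++i)
        {
          const int other = g.copies[pseudo][i];
          if (!seen[other])
            {
              seen[other] = true;
              work.push(std::make_pair(other, next));
            }
        }
    }
}
```

  In a chain of copies 0-1-2-3 with a bonus of 16 and a divisor of 4,
pseudo 0 gets the full bonus, pseudos 1 and 2 get 4 and 1, and pseudo 3
is not affected.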

  By the way, LRA already has code for dealing with hard register
preferences but in a simplified way as we work on RTL mostly locally.

  This is not a final patch as a big part of the regmove code and the old
IRA code for hard register preferences are not removed.  This code is
just switched off by the -fira-hard-reg-pref option.  I am going to play a
bit more with the new code and when I decide to submit it to trunk, I
will remove the unnecessary code and the option.

  Right now, the results look pretty good for SPEC2000 on x86/x86-64.
The compiler is always faster (as the regmove RTL pass is switched
off), generates on average smaller code (the best is a 0.05% decrease
for x86-64 SPECFP2000), and generates on average better code (0.6%
and 0.4% SPECFP2000 improvement on x86 and x86-64 respectively).
The results were obtained on an Intel Core i7-2600 in -O3 optimization
mode.

  The patch was successfully bootstrapped on x86/x86-64 with the new
code switched on.

Committed as rev. 197525.


2013-04-05  Vladimir Makarov  

* common.opt (fira-hard-reg-pref): New.
* regmove.c (regmove_optimize): Don't call regmove_forward_pass
for flag_ira_hard_reg_pref.
* ira-int.h (ira_pref_t, allocno_prefs, ALLOCNO_PREFS): New.
(struct ira_allocno_pref, ira_prefs, ira_prefs_num): New.
(ira_debug_pref, ira_debug_prefs, ira_debug_allocno_prefs): New.
(ira_create_pref, ira_create_copy): New.
(ira_add_allocno_copy_to_list): Remove.
(ira_swap_allocno_copy_ends_if_necessary): Ditto.
(ira_pref_iterator, ira_pref_iter_init, ira_pref_iter_cond): New.
(FOR_EACH_PREF): New.
* ira-build.c (ira_prefs, ira_prefs_num): New.
(ira_create_allocno): Reset preferences.
(pref_pool, pref_vec, initiate_prefs, find_allocno_pref)
(ira_create_pref, add_allocno_pref_to_list, ira_add_allocno_pref)
(print_pref, ira_debug_pref, print_prefs, ira_debug_prefs)
(print_allocno_prefs, ira_debug_allocno_prefs, finish_pref)
(finish_prefs): New.
(ira_add_allocno_copy_to_list): Rename to
add_allocno_copy_to_list.  Make static.
(ira_swap_allocno_copy_ends_if_necessary): Rename to
swap_allocno_copy_ends_if_necessary.  Make static.
(ira_build, ira_destroy): Initialize and finish the prefs.
* ira-color.c (update_allocno_cost, update_costs_from_allocno):
New.
(update_copy_costs): Rename to update_costs_from_copies.  Use
update_costs_from_allocno.
(update_costs_from_prefs, update_costs_from_copies): New.
(assign_hard_reg): Call update_costs_from_prefs.
(color_allocnos, color_pass): Ditto.
* ira-costs.c (find_costs_and_classes): Improve code.  Add code for
hard reg preferences.
(process_bb_node_for_hard_reg_moves): Add code for hard reg
preferences.

Index: ChangeLog
===
--- ChangeLog   (revision 195439)
+++ ChangeLog   (working copy)
@@ -1,3 +1,40 @@
+2013-04-05  Vladimir Makarov  
+
+   * common.opt (fira-hard-reg-pref): New.
+   * regmove.c (regmove_optimize): Don't call regmove_forward_pass
+   for flag_ira_hard_reg_pref.
+   * ira-int.h (ira_pref_t, allocno_prefs, ALLOCNO_PREFS): New.
+   (struct ira_allocno_pref, ira_prefs, ira_prefs_num): New.
+   (ira_debug_pref, ira_debug_prefs, ira_debug_allocno_prefs): New.
+   (ira_create_pref, ira_create_copy): New.
+   (ira_add_allocno_copy_to_list): Remove.
+   (ira_swap_allocno_copy_ends_if_necessary): Ditto.
+   (ira_pref_iterator, ira_pref_iter_init, ira_pref_iter_cond): New.
+   (FOR_EACH_PREF): New.
+   * ira-build.c (ira_prefs, ira_prefs_num): New.
+   (ira_create_allocno): Reset preferences.
+   (pref_pool, pref_vec, initiate_prefs, find_allocno_pref)
+   (ira_create_pref

Re: [PATCH] Fix PR48182

2013-04-05 Thread Jeff Law


On 04/05/2013 09:22 AM, Marek Polacek wrote:
> This patch prevents segfault when using --param min-crossjump-insns=0.
> What can happen in that case is that flow_find_cross_jump returns 0,
> thus nmatch is 0, then
> nmatch < PARAM_VALUE (PARAM_MIN_CROSSJUMP_INSNS)
> doesn't hold, thus we continue, but we segfault later on when
> doing split_block.  I think it's better to just bail out in that
> case; moreover setting min-crossjump-insns to 0 isn't very common...
>
> Regtested/bootstrapped on x86_64-linux, ok for trunk/4.8?
>
> 2013-04-05  Marek Polacek
>
> PR rtl-optimization/48182
> * cfgcleanup.c (try_crossjump_to_edge): Bail out if
> PARAM_MIN_CROSSJUMP_INSNS is 0.
>
> * gcc.dg/pr48182.c: New test.

OK for the trunk.  Release manager's decision for 4.8.

jeff

Re: [PATCH] Fix PR48182

2013-04-05 Thread Jakub Jelinek

On Fri, Apr 05, 2013 at 02:21:57PM -0600, Jeff Law wrote:
> On 04/05/2013 09:22 AM, Marek Polacek wrote:
> >This patch prevents segfault when using --param min-crossjump-insns=0.
> >What can happen in that case is that flow_find_cross_jump returns 0,
> >thus nmatch is 0, then
> >nmatch < PARAM_VALUE (PARAM_MIN_CROSSJUMP_INSNS)
> >doesn't hold, thus we continue, but we segfault later on when
> >doing split_block.  I think it's better to just bail out in that
> >case; moreover setting min-crossjump-insns to 0 isn't very common...
> >
> >Regtested/bootstrapped on x86_64-linux, ok for trunk/4.8?
> >
> >2013-04-05  Marek Polacek  
> >
> > PR rtl-optimization/48182
> > * cfgcleanup.c (try_crossjump_to_edge): Bail out if
> > PARAM_MIN_CROSSJUMP_INSNS is 0.
> >
> > * gcc.dg/pr48182.c: New test.
> OK for the trunk.  Release manager's decision for 4.8.

Wouldn't it be better to change params.def to instead say:
 5, 1, 0)
Because with the cfgcleanup.c change, --param min-crossjump-insns=0
is handled as =infinity rather than something smaller than 0.
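
The three numbers read as default, minimum and maximum.  A minimal
sketch of the range check this relies on, assuming (as the "1, 0"
suggests) that a maximum no greater than the minimum means "no upper
bound".  ParamRange and param_value_ok are hypothetical names used only
for illustration.

```cpp
// Hypothetical description of one --param entry.
struct ParamRange
{
  int default_value;
  int min_value;
  int max_value;   // assumed: max_value <= min_value means "no upper bound"
};

// Return true when VALUE is acceptable for the given range.
bool param_value_ok(const ParamRange &r, int value)
{
  if (value < r.min_value)
    return false;
  if (r.max_value > r.min_value && value > r.max_value)
    return false;
  return true;
}
```

With the range { 5, 1, 0 } a value of 0 is rejected up front, so the
crossjumping code never sees it, while any value from 1 upwards is
still accepted.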

Jakub

Re: [PATCH] Fix PR48182

2013-04-05 Thread Jeff Law


On 04/05/2013 02:33 PM, Jakub Jelinek wrote:
> On Fri, Apr 05, 2013 at 02:21:57PM -0600, Jeff Law wrote:
>> On 04/05/2013 09:22 AM, Marek Polacek wrote:
>>> This patch prevents segfault when using --param min-crossjump-insns=0.
>>> What can happen in that case is that flow_find_cross_jump returns 0,
>>> thus nmatch is 0, then
>>> nmatch < PARAM_VALUE (PARAM_MIN_CROSSJUMP_INSNS)
>>> doesn't hold, thus we continue, but we segfault later on when
>>> doing split_block.  I think it's better to just bail out in that
>>> case; moreover setting min-crossjump-insns to 0 isn't very common...
>>>
>>> Regtested/bootstrapped on x86_64-linux, ok for trunk/4.8?
>>>
>>> 2013-04-05  Marek Polacek
>>>
>>> PR rtl-optimization/48182
>>> * cfgcleanup.c (try_crossjump_to_edge): Bail out if
>>> PARAM_MIN_CROSSJUMP_INSNS is 0.
>>>
>>> * gcc.dg/pr48182.c: New test.
>>
>> OK for the trunk.  Release manager's decision for 4.8.
>
> Wouldn't it be better to change params.def to instead say:
>   5, 1, 0)
> Because with the cfgcleanup.c change, --param min-crossjump-insns=0
> is handled as =infinity rather than something smaller than 0.

?  I must be missing something, the change causes an early bail out from 
try_crossjump_to_edge.


We don't want to raise the min to > 0 as that doesn't allow the user to 
turn on this specific transformation.


jeff

maintainer-scripts/update_web_docs_libstdcxx_svn: add error detection

2013-04-05 Thread Gerald Pfeifer

So, I was debugging why the nightly run of this script did not actually
ever update anything.  As part of that I manually ran the script on
gcc.gnu.org. 

Let's say the output was not particularly helpful. ;-)

This patch addresses that and does not simply ignore _all_ output any
more, plus it explicitly issues an error message if there was any problem.

http://gcc.gnu.org/ml/gccadmin/2013-q2/msg4.html shows exemplary
output.


I have not committed this yet, but plan on doing that unless there are
any objections.  Thoughts?

Gerald

2013-04-05  Gerald Pfeifer  

* update_web_docs_libstdcxx_svn: No longer ignore all output from
the actual copy process.
Check the exit code of the actual copy process; diagnose problems.

Index: update_web_docs_libstdcxx_svn
===
--- update_web_docs_libstdcxx_svn   (revision 197262)
+++ update_web_docs_libstdcxx_svn   (working copy)
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/bash
 
 
 # "sh update_web_docs_libstdcxx.sh"
@@ -39,8 +39,13 @@
 
 # copy the tree to the onlinedocs area, preserve directory structure
 #find . -depth -print | cpio -pdv $WWWDIR
-find . -depth -print | cpio -pd $WWWDIR > /dev/null 2>&1
+find . -depth -print | cpio -pd $WWWDIR 2>&1 | grep -v "newer or same age version exists"
 
+err=${PIPESTATUS[1]}
+if [ $err -gt 0 ]; then
+printf "\nCopying failed with error code %d.\n" $err
+fi
+
 cd /
 /bin/rm -rf $WORKDIR

Re: [ira-improv] patch for new code dealing with hard reg preferences

2013-04-05 Thread Jeff Law


On 04/05/2013 02:14 PM, Vladimir Makarov wrote:
>   The following patch adds new functionality to IRA to improve RA in
> the presence of hard registers in RTL.  IRA already had some mechanism
> dealing with hard regs but it affected only pseudos in *the same insn*
> containing the hard register.  This technique was not good enough to
> remove the regmove pass code making transformations for matching
> constraints (e.g. 2-op insn machines): IRA can make better decisions
> for pseudos but was still not good enough for hard registers.  So
> just removing the code in regmove resulted in worse generated code
> performance.
>
>   On the other hand there are performance PRs which are a result of bad
> decisions in the premature optimization of the regmove pass for matching
> constraints.
>
>   We already have an IRA code propagating pseudo preferences to other
> pseudos through net of copies.  We need the same code for better
> dealing with hard register preferences.  The patch adds such code.  I
> could modify structure ira_copy for this but decided to use new
> separate structures as the copy structure is too big and we don't need
> most of its fields for hard-register preferences.
>
>   By the way, LRA already has code for dealing with hard register
> preferences but in a simplified way as we work on RTL mostly locally.
>
>   This is not a final patch as a big part of the regmove code and the old
> IRA code for hard register preferences are not removed.  This code is
> just switched off by the -fira-hard-reg-pref option.  I am going to play a
> bit more with the new code and when I decide to submit it to trunk, I
> will remove the unnecessary code and the option.
>
>   Right now, the results look pretty good for SPEC2000 on x86/x86-64.
> The compiler is always faster (as the regmove RTL pass is switched
> off), generates on average smaller code (the best is a 0.05% decrease
> for x86-64 SPECFP2000), and generates on average better code (0.6%
> and 0.4% SPECFP2000 improvement on x86 and x86-64 respectively).
> The results were obtained on an Intel Core i7-2600 in -O3 optimization
> mode.
>
>   The patch was successfully bootstrapped on x86/x86-64 with the new
> code switched on.
>
> Committed as rev. 197525.

Very cool.  Presumably this is meant to kill optimize_reg_copy_*. 
That'd leave the fixup_match bits (which might be dead with the SLSR in 
the mainline).  Does that just leave try_auto_increment?


Jeff

Re: [PATCH] Fix PR48182

2013-04-05 Thread Jakub Jelinek

On Fri, Apr 05, 2013 at 02:42:19PM -0600, Jeff Law wrote:
> ?  I must be missing something, the change causes an early bail out
> from try_crossjump_to_edge.
> 
> We don't want to raise the min to > 0 as that doesn't allow the user
> to turn on this specific transformation.

The condition is
  if (nmatch < PARAM_VALUE (PARAM_MIN_CROSSJUMP_INSNS))
return false; // aka "don't crossjump"
So, the smaller the N in --param min-crossjump-insns=N is, the more likely
we crossjump.  Thus N=0 should mean that it is most likely we crossjump,
and as N=1 requires that at least one insn matches, N=0 would mean that
even zero insns can match.  If we for --param min-crossjump-insns=0
always return false, it means we never crossjump, so it is least likely
that we crossjump, which corresponds to largest possible N, not smallest
one.
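
A tiny sketch of that reading, with a hypothetical helper that only
mirrors the quoted comparison; it is not code from cfgcleanup.c:

```cpp
// Mirrors the quoted test: give up unless at least MIN_INSNS insns matched.
bool rejects_crossjump(int nmatch, int min_insns)
{
  return nmatch < min_insns;
}
```

With N=1 a pair of blocks with zero matching insns is rejected, but with
N=0 it is not, which is how nmatch == 0 gets past the check and reaches
split_block.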

Jakub

Re: [PATCH] Fix PR48182

2013-04-05 Thread Jeff Law


On 04/05/2013 02:50 PM, Jakub Jelinek wrote:
> On Fri, Apr 05, 2013 at 02:42:19PM -0600, Jeff Law wrote:
>> ?  I must be missing something, the change causes an early bail out
>> from try_crossjump_to_edge.
>>
>> We don't want to raise the min to > 0 as that doesn't allow the user
>> to turn on this specific transformation.
>
> The condition is
>    if (nmatch < PARAM_VALUE (PARAM_MIN_CROSSJUMP_INSNS))
>      return false; // aka "don't crossjump"
> So, the smaller the N in --param min-crossjump-insns=N is, the more likely
> we crossjump.  Thus N=0 should mean that it is most likely we crossjump,
> and as N=1 requires that at least one insn matches, N=0 would mean that
> even zero insns can match.  If we for --param min-crossjump-insns=0
> always return false, it means we never crossjump, so it is least likely
> that we crossjump, which corresponds to largest possible N, not smallest
> one.

Yes, the smaller the N, the more likely we are to crossjump; of course 
the value 0 would make no sense (I'm clearly out of practice on reviews :-).


Yea, changing the min value in params.def to 1 would be a better way to 
fix.  Consider that patch pre-approved.


jeff

Re: Fix PR 56077

2013-04-05 Thread Eric Botcazou

>  Andrey, could you please take care of this ?

I've reverted the patch after bootstrapping/regtesting on x86-64/Linux.

-- 
Eric Botcazou

Re: RFC: color diagnostics markers

2013-04-05 Thread Manuel López-Ibáñez

On 2 April 2013 11:14, Jakub Jelinek  wrote:
>
> Yeah, IMHO we definitely want to support GCC_COLORS env var or similar, with
> same syntax as e.g. GREP_COLORS, but with different names of the (two
> letter?) color names.

The attached patch adds support for customization via GCC_COLORS
following grep's implementation. Grep's manual page talks about
terminfo capabilities, but the source code says that they don't use
that in order to avoid adding a dependency on terminfo. So what they
call capabilities is actually labels for whatever can be customized.
There is no need to restrict the labels to two letters. My changes to
invoke.texi explain the options in detail. The comments in
diagnostics-color.c explain the choice for the default colors in
detail. The comments are taken from grep's source code but they make
sense in general, so I chose the same colors they use by default.

An important difference from grep and ls is that setting GCC_COLORS=""
disables colorization.
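
To illustrate the kind of value this implies, here is a minimal sketch
of parsing a grep-style, colon-separated list of label=value items.  It
is not the code from the attached patch: parse_color_spec,
env_allows_colors and the labels used in the example are hypothetical,
and skipping items without '=' is an assumption.

```cpp
#include <map>
#include <string>

// Split "label=sgr:label=sgr" into a label -> SGR-string map.
// Items without '=' or with an empty label are skipped (an assumption).
std::map<std::string, std::string> parse_color_spec(const std::string &spec)
{
  std::map<std::string, std::string> out;
  std::string::size_type pos = 0;
  while (pos <= spec.size())
    {
      std::string::size_type end = spec.find(':', pos);
      if (end == std::string::npos)
        end = spec.size();
      const std::string item = spec.substr(pos, end - pos);
      const std::string::size_type eq = item.find('=');
      if (eq != std::string::npos && eq > 0)
        out[item.substr(0, eq)] = item.substr(eq + 1);
      pos = end + 1;
    }
  return out;
}

// VALUE is what getenv ("GCC_COLORS") returned: null when unset.
// An unset variable leaves the decision to the command-line option;
// an empty one disables colors.
bool env_allows_colors(const char *value)
{
  return value == 0 || value[0] != '\0';
}
```

For example, a value such as "error=01;31:warning=01;35" would yield
two entries, one per label.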

>> This patch only allows two options enable/disable colors (and defaults
>> to disabled), but grep has auto/never/always, and I can easily extend
>> the patch in that way.
>
> IMO we also want that autodetection and default to auto.

So following grep and ls, I added the following forms:
-fdiagnostics-color == -fdiagnostics-color=always
-fno-diagnostics-color == -fdiagnostics-color=never
-fdiagnostics-color=auto

In this patch the default is "never", because for some reason "auto"
triggers colorization during regression testing. I have not found a
way to avoid this. If we decide that we can live with it, I can simply
use an explicit -fno-diagnostics-color in the testsuite like we
currently do -fno-diagnostics-show-caret and set the default to auto.
The auto setting disables colors for pipes/redirections and dumb
terminals (like emacs).
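
The decision described above can be sketched as follows.  This is a
simplified illustration rather than the patch itself: ColorMode,
parse_color_option and should_colorize are hypothetical names, and
treating an unset TERM like a dumb terminal is an assumption.

```cpp
#include <string>

enum ColorMode { COLOR_NEVER, COLOR_ALWAYS, COLOR_AUTO };

// Map the option spellings listed above to a mode.  Returns false for
// anything else and leaves *MODE untouched.
bool parse_color_option(const std::string &opt, ColorMode *mode)
{
  if (opt == "-fdiagnostics-color" || opt == "-fdiagnostics-color=always")
    *mode = COLOR_ALWAYS;
  else if (opt == "-fno-diagnostics-color" || opt == "-fdiagnostics-color=never")
    *mode = COLOR_NEVER;
  else if (opt == "-fdiagnostics-color=auto")
    *mode = COLOR_AUTO;
  else
    return false;
  return true;
}

// STDERR_IS_TTY and TERM are passed in so the decision is easy to test.
bool should_colorize(ColorMode mode, bool stderr_is_tty, const char *term)
{
  switch (mode)
    {
    case COLOR_NEVER:
      return false;
    case COLOR_ALWAYS:
      return true;
    case COLOR_AUTO:
      return stderr_is_tty && term != 0 && std::string(term) != "dumb";
    }
  return false;
}
```

A caller on a POSIX system would typically pass isatty (STDERR_FILENO)
and getenv ("TERM") for the last two arguments.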

Is the patch starting to look like something that could be approved?
If so, I will fix the formatting and write a changelog and submit a
proper version.

The testsuite runs fine except for some strange errors that seem unrelated:

unix//-m64: c-c++-common/asan/global-overflow-1.c  -O0  output pattern
test, is ==22835== ERROR: AddressSanitizer failed to allocate
0xdfff0001000 (15392894357504) bytes at address 0x02008fff7000 (12)
unix//-m64: c-c++-common/asan/global-overflow-1.c  -O0  output pattern
test, is ==6702== ERROR: AddressSanitizer failed to allocate
0xdfff0001000 (15392894357504) bytes at address 0x02008fff7000 (12)
unix//-m64: c-c++-common/asan/global-overflow-1.c  -O1  output pattern
test, is ==23050== ERROR: AddressSanitizer failed to allocate
0xdfff0001000 (15392894357504) bytes at address 0x02008fff7000 (12)

Cheers,

Manuel

color-markers-2.diff
Description: Binary data

[patch][sparc] define_c_enum for UNSPEC/UNSPECV

2013-04-05 Thread Steven Bosscher

Hello,

Almost trivial, but it makes the dumps look so much better if UNSPEC
names are printed instead of just numbers.
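
A rough sketch of why this helps, assuming the generator emits an enum
plus a parallel table of names that a dump routine can index.  The
sketch_ identifiers are hypothetical and only the first three names from
the patch are shown.

```cpp
#include <string>

// Enumerators are numbered consecutively from 0.
enum sketch_unspec
{
  UNSPEC_MOVE_PIC,
  UNSPEC_UPDATE_RETURN,
  UNSPEC_LOAD_PCREL_SYM,
  SKETCH_UNSPEC_COUNT
};

// Parallel name table, indexed by enumerator value.
static const char *const sketch_unspec_strings[] =
{
  "UNSPEC_MOVE_PIC",
  "UNSPEC_UPDATE_RETURN",
  "UNSPEC_LOAD_PCREL_SYM"
};

// Print the name when one is known, else fall back to the bare number,
// which is all a dump can show for plain numeric constants.
std::string sketch_print_unspec(int code)
{
  if (code >= 0 && code < SKETCH_UNSPEC_COUNT)
    return sketch_unspec_strings[code];
  return std::to_string(code);
}
```

Note that consecutive numbering means some values differ from the old
hand-assigned ones; UNSPEC_MOVE_PIC_LABEL, for instance, was 5 and
becomes the fifth entry, i.e. 4.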

OK for trunk?

Ciao!
Steven


* config/sparc/sparc.md: Use define_c_enum for "unspec" and "unspecv".

Index: config/sparc/sparc.md
===
--- config/sparc/sparc.md   (revision 197536)
+++ config/sparc/sparc.md   (working copy)
@@ -22,88 +22,88 @@
 
 ;;- See file "rtl.def" for documentation on define_insn, match_*, et. al.
 
-(define_constants
-  [(UNSPEC_MOVE_PIC 0)
-   (UNSPEC_UPDATE_RETURN   1)
-   (UNSPEC_LOAD_PCREL_SYM  2)
-   (UNSPEC_FRAME_BLOCKAGE  3)
-   (UNSPEC_MOVE_PIC_LABEL  5)
-   (UNSPEC_SETH44  6)
-   (UNSPEC_SETM44  7)
-   (UNSPEC_SETHH   9)
-   (UNSPEC_SETLM   10)
-   (UNSPEC_EMB_HISUM   11)
-   (UNSPEC_EMB_TEXTUHI 13)
-   (UNSPEC_EMB_TEXTHI  14)
-   (UNSPEC_EMB_TEXTULO 15)
-   (UNSPEC_EMB_SETHM   18)
-   (UNSPEC_MOVE_GOTDATA 19)
-
-   (UNSPEC_MEMBAR  20)
-   (UNSPEC_ATOMIC  21)
-
-   (UNSPEC_TLSGD   30)
-   (UNSPEC_TLSLDM  31)
-   (UNSPEC_TLSLDO  32)
-   (UNSPEC_TLSIE   33)
-   (UNSPEC_TLSLE   34)
-   (UNSPEC_TLSLD_BASE  35)
-
-   (UNSPEC_FPACK16 40)
-   (UNSPEC_FPACK32 41)
-   (UNSPEC_FPACKFIX 42)
-   (UNSPEC_FEXPAND 43)
-   (UNSPEC_MUL16AU 44)
-   (UNSPEC_MUL16AL 45)
-   (UNSPEC_MUL8UL  46)
-   (UNSPEC_MULDUL  47)
-   (UNSPEC_ALIGNDATA   48)
-   (UNSPEC_FCMP 49)
-   (UNSPEC_PDIST   50)
-   (UNSPEC_EDGE8   51)
-   (UNSPEC_EDGE8L  52)
-   (UNSPEC_EDGE16  53)
-   (UNSPEC_EDGE16L 54)
-   (UNSPEC_EDGE32  55)
-   (UNSPEC_EDGE32L 56)
-   (UNSPEC_ARRAY8  57)
-   (UNSPEC_ARRAY16 58)
-   (UNSPEC_ARRAY32 59)
-
-   (UNSPEC_SP_SET  60)
-   (UNSPEC_SP_TEST 61)
-
-   (UNSPEC_EDGE8N  70)
-   (UNSPEC_EDGE8LN 71)
-   (UNSPEC_EDGE16N 72)
-   (UNSPEC_EDGE16LN 73)
-   (UNSPEC_EDGE32N 74)
-   (UNSPEC_EDGE32LN 75)
-   (UNSPEC_BSHUFFLE 76)
-   (UNSPEC_CMASK8  77)
-   (UNSPEC_CMASK16 78)
-   (UNSPEC_CMASK32 79)
-   (UNSPEC_FCHKSM16 80)
-   (UNSPEC_PDISTN  81)
-   (UNSPEC_FUCMP   82)
-   (UNSPEC_FHADD   83)
-   (UNSPEC_FHSUB   84)
-   (UNSPEC_XMUL 85)
-   (UNSPEC_MUL8 86)
-   (UNSPEC_MUL8SU  87)
-   (UNSPEC_MULDSU  88)
-  ])
-
-(define_constants
-  [(UNSPECV_BLOCKAGE   0)
-   (UNSPECV_FLUSHW 1)
-   (UNSPECV_FLUSH  4)
-   (UNSPECV_SAVEW  6)
-   (UNSPECV_CAS 8)
-   (UNSPECV_SWAP   9)
-   (UNSPECV_LDSTUB 10)
-   (UNSPECV_PROBE_STACK_RANGE  11)
-  ])
+(define_c_enum "unspec" [
+  UNSPEC_MOVE_PIC
+  UNSPEC_UPDATE_RETURN
+  UNSPEC_LOAD_PCREL_SYM
+  UNSPEC_FRAME_BLOCKAGE
+  UNSPEC_MOVE_PIC_LABEL
+  UNSPEC_SETH44
+  UNSPEC_SETM44
+  UNSPEC_SETHH
+  UNSPEC_SETLM
+  UNSPEC_EMB_HISUM
+  UNSPEC_EMB_TEXTUHI
+  UNSPEC_EMB_TEXTHI
+  UNSPEC_EMB_TEXTULO
+  UNSPEC_EMB_SETHM
+  UNSPEC_MOVE_GOTDATA
+
+  UNSPEC_MEMBAR
+  UNSPEC_ATOMIC
+
+  UNSPEC_TLSGD
+  UNSPEC_TLSLDM
+  UNSPEC_TLSLDO
+  UNSPEC_TLSIE
+  UNSPEC_TLSLE
+  UNSPEC_TLSLD_BASE
+
+  UNSPEC_FPACK16
+  UNSPEC_FPACK32
+  UNSPEC_FPACKFIX
+  UNSPEC_FEXPAND
+  UNSPEC_MUL16AU
+  UNSPEC_MUL16AL
+  UNSPEC_MUL8UL
+  UNSPEC_MULDUL
+  UNSPEC_ALIGNDATA
+  UNSPEC_FCMP
+  UNSPEC_PDIST
+  UNSPEC_EDGE8
+  UNSPEC_EDGE8L
+  UNSPEC_EDGE16
+  UNSPEC_EDGE16L
+  UNSPEC_EDGE32
+  UNSPEC_EDGE32L
+  UNSPEC_ARRAY8
+  UNSPEC_ARRAY16
+  UNSPEC_ARRAY32
+
+  UNSPEC_SP_SET
+  UNSPEC_SP_TEST
+
+  UNSPEC_EDGE8N
+  UNSPEC_EDGE8LN
+  UNSPEC_EDGE16N
+  UNSPEC_EDGE16LN
+  UNSPEC_EDGE32N
+  UNSPEC_EDGE32LN
+  UNSPEC_BSHUFFLE
+  UNSPEC_CMASK8
+  UNSPEC_CMASK16
+  UNSPEC_CMASK32
+  UNSPEC_FCHKSM16
+  UNSPEC_PDISTN
+  UNSPEC_FUCMP
+  UNSPEC_FHADD
+  UNSPEC_FHSUB
+  UNSPEC_XMUL
+  UNSPEC_MUL8
+  UNSPEC_MUL8SU
+  UNSPEC_MULDSU
+])
+
+(define_c_enum "unspecv" [
+  UNSPECV_BLOCKAGE
+  UNSPECV_FLUSHW
+  UNSPECV_FLUSH
+  UNSPECV_SAVEW
+  UNSPECV_CAS
+  UNSPECV_SWAP
+  UNSPECV_LDSTUB
+  UNSPECV_PROBE_STACK_RANGE
+])
 
 (define_constants
  [(G0_REG  0)

Re: [ira-improv] patch for new code dealing with hard reg preferences

2013-04-05 Thread Vladimir Makarov


On 13-04-05 4:48 PM, Jeff Law wrote:
> On 04/05/2013 02:14 PM, Vladimir Makarov wrote:
>>   The following patch adds new functionality to IRA to improve RA in
>> the presence of hard registers in RTL.  IRA already had some mechanism
>> dealing with hard regs but it affected only pseudos in *the same insn*
>> containing the hard register.  This technique was not good enough to
>> remove the regmove pass code making transformations for matching
>> constraints (e.g. 2-op insn machines): IRA can make better decisions
>> for pseudos but was still not good enough for hard registers.  So
>> just removing the code in regmove resulted in worse generated code
>> performance.
>>
>>   On the other hand there are performance PRs which are a result of bad
>> decisions in the premature optimization of the regmove pass for matching
>> constraints.
>>
>>   We already have an IRA code propagating pseudo preferences to other
>> pseudos through net of copies.  We need the same code for better
>> dealing with hard register preferences.  The patch adds such code.  I
>> could modify structure ira_copy for this but decided to use new
>> separate structures as the copy structure is too big and we don't need
>> most of its fields for hard-register preferences.
>>
>>   By the way, LRA already has code for dealing with hard register
>> preferences but in a simplified way as we work on RTL mostly locally.
>>
>>   This is not a final patch as a big part of the regmove code and the old
>> IRA code for hard register preferences are not removed.  This code is
>> just switched off by the -fira-hard-reg-pref option.  I am going to play a
>> bit more with the new code and when I decide to submit it to trunk, I
>> will remove the unnecessary code and the option.
>>
>>   Right now, the results look pretty good for SPEC2000 on x86/x86-64.
>> The compiler is always faster (as the regmove RTL pass is switched
>> off), generates on average smaller code (the best is a 0.05% decrease
>> for x86-64 SPECFP2000), and generates on average better code (0.6%
>> and 0.4% SPECFP2000 improvement on x86 and x86-64 respectively).
>> The results were obtained on an Intel Core i7-2600 in -O3 optimization
>> mode.
>>
>>   The patch was successfully bootstrapped on x86/x86-64 with the new
>> code switched on.
>>
>> Committed as rev. 197525.
>
> Very cool.  Presumably this is meant to kill optimize_reg_copy_*. 
> That'd leave the fixup_match bits (which might be dead with the SLSR 
> in the mainline).  Does that just leave try_auto_increment?


Yes, I think so.  Unfortunately, neither IRA nor LRA has analogous code.  As I 
remember, this code is still important for targets with small displacements 
and even for x86/x86-64: although it has practically no constraints on 
displacements, it affects x86/x86-64 code size and as a consequence the 
performance too (through better code locality).


Still, removing one pass through RTL is a good result although removing 2 
passes would be better.

Re: mips SNaN/QNaN is swapped

2013-04-05 Thread Maciej W. Rozycki

On Fri, 5 Apr 2013, Thomas Schwinge wrote:

> As I understand it (and I may add that this is the first time ever I'm
> looking at soft-fp internals), that appears to be a bug in soft-fp, in
> this very code added ten years ago ;-), which is invoked by means of
> _df_to_tf.o:__extenddftf2 for this conversion operation.  (Forgive my
> ignorance of MIPS ISA floating-point details, and not looking it up
> myself at this late time of day -- why use soft-fp for that; isn't there
> anything in hardware available to do it?)

 There's no direct MIPS hardware support for any FP data type wider than 
double.

> > Index: gcc/config/fp-bit.c
> > ===
> > RCS file: /cvs/uberbaum/gcc/config/fp-bit.c,v
> > retrieving revision 1.39
> > diff -u -p -r1.39 fp-bit.c
> > --- gcc/config/fp-bit.c 26 Jan 2003 10:06:57 - 1.39
> > +++ gcc/config/fp-bit.c 1 Apr 2003 21:35:00 -
> > @@ -210,7 +210,11 @@ pack_d ( fp_number_type *  src)
> >exp = EXPMAX;
> >if (src->class == CLASS_QNAN || 1)
> > {
> > +#ifdef QUIET_NAN_NEGATED
> > + fraction |= QUIET_NAN - 1;
> > +#else
> >   fraction |= QUIET_NAN;
> > +#endif
> > }
> >  }
> >else if (isinf (src))
> > @@ -521,7 +525,11 @@ unpack_d (FLO_union_type * src, fp_numbe
> >else
> > {
> >   /* Nonzero fraction, means nan */
> > +#ifdef QUIET_NAN_NEGATED
> > + if ((fraction & QUIET_NAN) == 0)
> > +#else
> >   if (fraction & QUIET_NAN)
> > +#endif
> > {
> >   dst->class = CLASS_QNAN;
> > }
> 
> With the fix applied, we get the expected result:
> 
> 7fbf
> 7ff7 
> 7ff7 e000
> 7ff7 
> 7fff7fff   
> 7fff7fff   
> 7fff7fff   
> 7fff7fff   
> 7fff7fff   
> 
> Automated testing is still running; in case nothing turns up, does this
> look OK to check in?
> 
> Index: libgcc/fp-bit.c
> ===
> --- libgcc/fp-bit.c   (revision 402061)
> +++ libgcc/fp-bit.c   (working copy)
> @@ -217,6 +217,9 @@ pack_d (const fp_number_type *src)
>if (src->class == CLASS_QNAN || 1)
>   {
>  #ifdef QUIET_NAN_NEGATED
> +   /* Mask out the quiet/signaling bit.  */
> +   fraction &= ~QUIET_NAN;
> +   /* Set the remainder of the fraction to a non-zero value.  */
> fraction |= QUIET_NAN - 1;
>  #else
> fraction |= QUIET_NAN;

 I think the intent of this code is to preserve a NaN's payload (it 
certainly does for non-QUIET_NAN_NEGATED targets), so corrected code 
should IMHO look like:

#ifdef QUIET_NAN_NEGATED
	  fraction &= QUIET_NAN - 1;
	  if (fraction == 0)
	    fraction |= QUIET_NAN - 1;
#else
	  fraction |= QUIET_NAN;
#endif

or suchalike -- making sure a NaN is not accidentally converted to 
infinity where qNaNs are denoted by a zero bit (in which case the 
canonical qNaN is returned instead -- the code above is correct for MIPS 
legacy targets where the canonical qNaN has an all-ones payload; can't 
speak of HP-PA).  Complementing the change above I think it will also make 
sense to clear the qNaN bit when extracting a payload from fraction in 
unpack_d as the class of a NaN being handled is stored separately.
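
 For illustration only, the same logic as a self-contained function, 
assuming a 52-bit fraction field whose top bit is the quiet/signalling 
bit.  kQuietBit and pack_nan_fraction are hypothetical names, the 
compile-time QUIET_NAN_NEGATED choice is turned into a run-time flag so 
both variants can be exercised, and the input is assumed to fit in the 
fraction field.

```cpp
#include <cstdint>

// Assumed layout: the quiet/signalling bit is the top bit of a 52-bit
// fraction field.
const std::uint64_t kQuietBit = static_cast<std::uint64_t>(1) << 51;

// Return the fraction to store for a quiet NaN, keeping the payload.
std::uint64_t pack_nan_fraction(std::uint64_t fraction, bool quiet_nan_negated)
{
  if (quiet_nan_negated)
    {
      // Quiet NaNs have the bit clear: drop it and keep the payload.
      fraction &= kQuietBit - 1;
      // An all-zero fraction would encode infinity, so fall back to the
      // all-ones payload.
      if (fraction == 0)
        fraction |= kQuietBit - 1;
    }
  else
    // Quiet NaNs have the bit set: setting it preserves the payload.
    fraction |= kQuietBit;
  return fraction;
}
```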

 Also I find the "|| 1" clause in the condition immediately above the 
pack_d piece concerned suspicious -- why is a qNaN returned for sNaN 
input?  Likewise why are __thenan_sf, etc. encoded as sNaNs rather than 
qNaNs?  Does anybody know?

  Maciej

Re: Fix PR 56077

2013-04-05 Thread Olivier Hainque


On Apr 5, 2013, at 11:18 PM, Eric Botcazou wrote:

>> Andrey, could you please take care of this ?
> 
> I've reverted the patch after bootstrapping/regtesting on x86-64/Linux.

 Thanks Eric.
