Re: [PATCH] fixup libobjc usage of PCC_BITFIELD_TYPE_MATTERS

2015-05-04 Thread Trevor Saunders
On Sun, May 03, 2015 at 10:59:46AM +0200, Andreas Schwab wrote:
> tbsaunde+...@tbsaunde.org writes:
> 
> > +AC_DEFUN([gt_BITFIELD_TYPE_MATTERS],
> > +[
> > +  AC_CACHE_CHECK([if the type of bitfields matters], gt_cv_bitfield_type_matters,
> > +  [
> > +AC_TRY_COMPILE(
> > +  [struct foo1 { char x; char :0; char y; };
> > +struct foo2 { char x; int :0; char y; };
> > +int foo1test[ sizeof (struct foo1) == 2 ? 1 : -1 ];
> > +int foo2test[ sizeof (struct foo2) == 5 ? 1 : -1]; ],
> > +  [], gt_cv_bitfield_type_matters=yes, gt_cv_bitfield_type_matters=no)
> > +  ])
> > +  if test $gt_cv_bitfield_type_matters = yes; then
> > +AC_DEFINE(HAVE_BITFIELD_TYPE_MATTERS, 1,
> > +  [Define if the type of bitfields affects alignment.])
> > +  fi
> > +])
> 
> gcc/config/aarch64/aarch64.h:#define PCC_BITFIELD_TYPE_MATTERS  1
> 
> configure:11554: /opt/gcc/gcc-20150503/Build/./gcc/xgcc -B/opt/gcc/gcc-20150503/Build/./gcc/ -B/usr/aarch64-suse-linux/bin/ -B/usr/aarch64-suse-linux/lib/ -isystem /usr/aarch64-suse-linux/include -isystem /usr/aarch64-suse-linux/sys-include -c -O2 -g  conftest.c >&5
> conftest.c:27:5: error: size of array 'foo2test' is negative
>  int foo2test[ sizeof (struct foo2) == 5 ? 1 : -1];
>  ^
> configure:11554: $? = 1

OK, a quick test seems to show that Jakub's version of the test works in this
case, so let's try that.
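
For reference, the configure probe quoted above combines two ideas: a zero-width bit-field pushes the next member to a boundary of the bit-field's declared type on targets where that type matters, and an array type with a negative size is rejected at compile time. The sketch below restates the first idea as a runtime check and uses the second as a compile-time assertion. The helper names `bitfield_type_matters` and `report_layout` are made up for illustration and are not part of the patch. On some ABIs the `int :0` member may also raise the alignment of the whole struct, so `sizeof (struct foo2)` need not be exactly 5, which would be consistent with the failure Andreas reported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Same layouts as in the configure test.  */
struct foo1 { char x; char :0; char y; };
struct foo2 { char x; int :0; char y; };

/* Compile-time assertion via a negative array size, as in the probe.
   Two char members always need at least two bytes, so this compiles.  */
typedef char foo1_has_two_chars[sizeof (struct foo1) >= 2 ? 1 : -1];

/* Hypothetical helper: return 1 if "int :0" moved member y further
   than "char :0" did, i.e. the declared bit-field type matters.  */
int
bitfield_type_matters (void)
{
  return offsetof (struct foo2, y) > offsetof (struct foo1, y);
}

/* Hypothetical helper: print the layout so ABIs can be compared.  */
void
report_layout (void)
{
  assert (sizeof (struct foo1) >= 2);
  printf ("foo1: size %zu, y at %zu\n",
          sizeof (struct foo1), offsetof (struct foo1, y));
  printf ("foo2: size %zu, y at %zu\n",
          sizeof (struct foo2), offsetof (struct foo2, y));
  printf ("type matters: %d\n", bitfield_type_matters ());
}
```

Comparing member offsets instead of hard-coding `sizeof (struct foo2) == 5` avoids depending on how a given ABI pads the end of the struct. This is only one possible formulation and not necessarily the one Jakub proposed.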

Trev

> 
> Andreas.
> 
> -- 
> Andreas Schwab, sch...@linux-m68k.org
> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
> "And now for something completely different."


Re: [PATCH, PR65915] Fix float conversion split.

2015-05-04 Thread Uros Bizjak
On Thu, Apr 30, 2015 at 5:18 PM, H.J. Lu  wrote:
> On Thu, Apr 30, 2015 at 8:15 AM, Ilya Tocar  wrote:
>>> Hi,
>>>
>>> Looks like I missed some splits, which caused PR65915.
>>> Patch below fixes it.
>>> Ok for trunk?
>>>
>>> 2015-04-28  Ilya Tocar  
>>>
>>>   * config/i386/i386.md (define_split): Check for xmm16+,
>>>   when splitting scalar float conversion.
>>>
>>>
>>> ---
>>>  gcc/config/i386/i386.md | 8 ++--
>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
>>> index 937871a..af1cd9b 100644
>>> --- a/gcc/config/i386/i386.md
>>> +++ b/gcc/config/i386/i386.md
>>> @@ -4897,7 +4897,9 @@
>>>"TARGET_SSE2 && TARGET_SSE_MATH
>>> && TARGET_USE_VECTOR_CONVERTS && optimize_function_for_speed_p (cfun)
>>> && reload_completed && SSE_REG_P (operands[0])
>>> -   && (MEM_P (operands[1]) || TARGET_INTER_UNIT_MOVES_TO_VEC)"
>>> +   && (MEM_P (operands[1]) || TARGET_INTER_UNIT_MOVES_TO_VEC)
>>> +   && (!EXT_REX_SSE_REG_P (operands[0])
>>> +   || TARGET_AVX512VL)"
>>>[(const_int 0)]
>>>  {
>>>operands[3] = simplify_gen_subreg (mode, operands[0],
>>> @@ -4921,7 +4923,9 @@
>>>"TARGET_SSE2 && TARGET_SSE_MATH
>>> && TARGET_SSE_PARTIAL_REG_DEPENDENCY
>>> && optimize_function_for_speed_p (cfun)
>>> -   && reload_completed && SSE_REG_P (operands[0])"
>>> +   && reload_completed && SSE_REG_P (operands[0])
>>> +   && (!EXT_REX_SSE_REG_P (operands[0])
>>> +   || TARGET_AVX512VL)"
>>>[(const_int 0)]
>>>  {
>>>const machine_mode vmode = mode;
>>> --
>>> 1.8.3.1
>>>
>>
>> Updated version below (now with test).
>>
>> ---
>>  gcc/config/i386/i386.md | 8 ++--
>>  gcc/config/i386/sse.md  | 6 +++---
>>  gcc/testsuite/gcc.target/i386/pr65915.c | 6 ++
>>  3 files changed, 15 insertions(+), 5 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr65915.c
>>
>> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
>> index 937871a..af1cd9b 100644
>> --- a/gcc/config/i386/i386.md
>> +++ b/gcc/config/i386/i386.md
>> @@ -4897,7 +4897,9 @@
>>"TARGET_SSE2 && TARGET_SSE_MATH
>> && TARGET_USE_VECTOR_CONVERTS && optimize_function_for_speed_p (cfun)
>> && reload_completed && SSE_REG_P (operands[0])
>> -   && (MEM_P (operands[1]) || TARGET_INTER_UNIT_MOVES_TO_VEC)"
>> +   && (MEM_P (operands[1]) || TARGET_INTER_UNIT_MOVES_TO_VEC)
>> +   && (!EXT_REX_SSE_REG_P (operands[0])
>> +   || TARGET_AVX512VL)"
>>[(const_int 0)]
>>  {
>>operands[3] = simplify_gen_subreg (mode, operands[0],
>> @@ -4921,7 +4923,9 @@
>>"TARGET_SSE2 && TARGET_SSE_MATH
>> && TARGET_SSE_PARTIAL_REG_DEPENDENCY
>> && optimize_function_for_speed_p (cfun)
>> -   && reload_completed && SSE_REG_P (operands[0])"
>> +   && reload_completed && SSE_REG_P (operands[0])
>> +   && (!EXT_REX_SSE_REG_P (operands[0])
>> +   || TARGET_AVX512VL)"
>>[(const_int 0)]
>>  {
>>const machine_mode vmode = mode;
>> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
>> index 9b7009a..c61098d 100644
>> --- a/gcc/config/i386/sse.md
>> +++ b/gcc/config/i386/sse.md
>> @@ -4258,11 +4258,11 @@
>> (set_attr "mode" "TI")])
>>
>>  (define_insn "sse2_cvtsi2sd"
>> -  [(set (match_operand:V2DF 0 "register_operand" "=x,x,x")
>> +  [(set (match_operand:V2DF 0 "register_operand" "=x,x,v")
>> (vec_merge:V2DF
>>   (vec_duplicate:V2DF
>> (float:DF (match_operand:SI 2 "nonimmediate_operand" "r,m,rm")))
>> - (match_operand:V2DF 1 "register_operand" "0,0,x")
>> + (match_operand:V2DF 1 "register_operand" "0,0,v")
>>   (const_int 1)))]
>>"TARGET_SSE2"
>>"@
>> @@ -4275,7 +4275,7 @@
>> (set_attr "amdfam10_decode" "vector,double,*")
>> (set_attr "bdver1_decode" "double,direct,*")
>> (set_attr "btver2_decode" "double,double,double")
>> -   (set_attr "prefix" "orig,orig,vex")
>> +   (set_attr "prefix" "orig,orig,maybe_evex")
>> (set_attr "mode" "DF")])
>>
>>  (define_insn "sse2_cvtsi2sdq"
>> diff --git a/gcc/testsuite/gcc.target/i386/pr65915.c b/gcc/testsuite/gcc.target/i386/pr65915.c
>> new file mode 100644
>> index 000..990c5aa
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/pr65915.c
>> @@ -0,0 +1,6 @@
>> +/* { dg-do run } */
>> +/* { dg-options "-O2 -mavx512f -fpic -mcmodel=medium" } */
>> +/* { dg-require-effective-target avx512f } */
>> +/* { dg-require-effective-target lp64 } */
>> +
>> +#include "avx512f-vrndscalepd-2.c"
>
> Missing testcases for
>
> FAIL: gcc.target/i386/avx512f-vrndscaleps-2.c (test for excess errors)
> FAIL: gcc.target/i386/avx512vl-vrndscaleps-2.c (internal compiler error)

The attached test is OK, since these two would test for the same problem.

> as well as ChangeLog entries.

The ChangeLog is missing. Please add the PR number and describe *each* change
accurately. You can say (vector convert to float splitter) for this
particular nameless splitter.
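
As an aside, the extra condition added to both splitters in the patch can be read as a simple predicate. The following is a plain-C model of it, not GCC code: the function names are hypothetical, and the assumption that the "extended REX" SSE registers are xmm16 through xmm31 is mine, based on the "xmm16+" wording in the ChangeLog.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of EXT_REX_SSE_REG_P: assumed true for
   xmm16..xmm31, the registers the ChangeLog calls "xmm16+".  */
bool
ext_rex_sse_reg_p (int xmm_regno)
{
  return xmm_regno >= 16 && xmm_regno <= 31;
}

/* Model of the added splitter condition:
     !EXT_REX_SSE_REG_P (operands[0]) || TARGET_AVX512VL
   The split is allowed for xmm0..xmm15 unconditionally, and for
   xmm16+ only when AVX512VL is available.  */
bool
vector_convert_split_ok (int dest_xmm_regno, bool target_avx512vl)
{
  return !ext_rex_sse_reg_p (dest_xmm_regno) || target_avx512vl;
}
```

Under this model, a destination in xmm16 with only -mavx512f (as in the new pr65915.c test options) keeps the unsplit scalar conversion.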

Please repost the patc

Re: [PATCH] Fix eipa_sra AAPCS issue (PR target/65956)

2015-05-04 Thread Richard Biener
On Sat, 2 May 2015, Jakub Jelinek wrote:

> Hi!
> 
> This is an attempt to fix the following testcase (reduced from gsoap),
> similarly to how you fixed another issue with r221795, among other AAPCS
> regressions introduced with the r221348 change.
> This patch passed bootstrap/regtest on
> {x86_64,i686,armv7hl,aarch64,powerpc64{,le},s390{,x}}-linux.
> 
> Though, it still doesn't fix profiledbootstrap on armv7hl that is broken
> since r221348, so other issues are lurking in there, and I must say
> I'm not entirely sure about this, because it changes alignment even when
> the original access had higher alignment.
> 
> I was trying something like:
> struct B { char *a, *b; };
> typedef struct B C __attribute__((aligned (8)));
> struct A { C a; int b; long long c; };
> char v[3];
> 
> __attribute__((noinline, noclone)) void
> fn1 (C x, C y)
> {
>   if (x.a != &v[1] || y.a != &v[2])
> __builtin_abort ();
>   v[1]++;
> }
> 
> __attribute__((noinline, noclone)) int
> fn2 (C x)
> {
>   asm volatile ("" : "+g" (x.a) : : "memory");
>   asm volatile ("" : "+g" (x.b) : : "memory");
>   return x.a == &v[0];
> }
> 
> __attribute__((noinline, noclone)) void
> fn3 (const char *x)
> {
>   if (x[0] != 0)
> __builtin_abort ();
> }
> 
> static struct A
> foo (const char *x, struct A y, struct A z)
> {
>   struct A r = { { 0, 0 }, 0, 0 };
>   if (y.b && z.b)
> {
>   if (fn2 (y.a) && fn2 (z.a))
>   switch (x[0])
> {
> case '|':
>   break;
> default:
>   fn3 (x);
> }
>   fn1 (y.a, z.a);
> }
>   return r;
> }
> 
> __attribute__((noinline, noclone)) int
> bar (int x, struct A *y)
> {
>   switch (x)
> {
> case 219:
>   foo ("+", y[-2], y[0]);
> case 220:
>   foo ("-", y[-2], y[0]);
> }
> }
> 
> int
> main ()
> {
>   struct A a[3] = { { { &v[1], &v[0] }, 1, 1LL },
>   { { &v[0], &v[0] }, 0, 0LL },
>   { { &v[2], &v[0] }, 2, 2LL } };
>   bar (220, a + 2);
>   if (v[1] != 1)
> __builtin_abort ();
>   return 0;
> }
> 
> and this patch indeed changes the register passing, even though it probably
> shouldn't (though, the testcase doesn't fail).  Wouldn't it be possible to
> preserve the original type (before we call build_aligned_type on it)
> somewhere in SRA data structures, perhaps keep expr (the new MEM_REF) use
> the aligned type, but type field be the non-aligned one?

Not sure how this helps when SRA tears apart the parameter.  That is,
isn't the important thing that both the IPA modified function argument
types/decls have the same type as the types of the parameters SRA ends
up passing?  (as far as alignment goes?)

Yes, of course using "natural" alignment makes sure that the backend
can handle alignment properly and we don't run into oddball bugs here.
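
To make the "natural" versus declared alignment distinction concrete, here is a small sketch using the same `struct B` and over-aligned typedef `C` as Jakub's testcase. The accessor names are hypothetical, and `__attribute__((aligned))` is a GNU extension, so this assumes GCC or a compatible compiler. On a 32-bit target such as armv7hl the natural alignment of `struct B` would typically be 4, while `C` asks for 8; on a 64-bit target the two usually coincide.

```c
#include <assert.h>
#include <stddef.h>

struct B { char *a, *b; };
/* Over-aligned variant of struct B, as in the testcase.  */
typedef struct B C __attribute__((aligned (8)));

/* Alignment of the main variant: what the members alone require.  */
size_t
natural_align (void)
{
  return _Alignof (struct B);
}

/* Alignment of the typedef'd variant: what the attribute requests.  */
size_t
declared_align (void)
{
  return _Alignof (C);
}
```

Whether a by-value argument is treated according to the first or the second number is the kind of difference the discussion above is about.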

> 2015-05-02  Jakub Jelinek  
> 
>   PR target/65956
>   * tree-sra.c (turn_representatives_into_adjustments): For
>   adj.type, use TYPE_MAIN_VARIANT of repr->type with TYPE_QUALS.
> 
>   * gcc.c-torture/execute/pr65956.c: New test.
> 
> --- gcc/tree-sra.c.jj 2015-04-20 14:35:47.0 +0200
> +++ gcc/tree-sra.c2015-05-01 01:08:34.092636496 +0200
> @@ -4427,7 +4427,11 @@ turn_representatives_into_adjustments (v
> gcc_assert (repr->base == parm);
> adj.base_index = index;
> adj.base = repr->base;
> -   adj.type = repr->type;
> +   /* Drop any special alignment on the type if it's not on the
> +  main variant.  This avoids issues with weirdo ABIs like
> +  AAPCS.  */
> +   adj.type = build_qualified_type (TYPE_MAIN_VARIANT (repr->type),
> +TYPE_QUALS (repr->type));

So - this changes the function argument type of the clone?  Does it
also change the type of the value we pass to the function?  That is,
why drop the alignment here but not avoid attaching it to repr->type
in the first place as my fix for the other issue did?

Doesn't the above just make it inconsistent by default?

There is also the correctness issue of under-aligned types (which
was what the original code using build_aligned_type cared for - before
I "fixed" it to also preserve over-alignment).

That said - somewhere we create the register we use for passing the
argument, and only the type of that register needs fixing IMHO.

We also have

  ptype = adj->type;
  if (is_gimple_reg_type (ptype))
    {
      unsigned malign = GET_MODE_ALIGNMENT (TYPE_MODE (ptype));
      if (TYPE_ALIGN (ptype) < malign)
        ptype = build_aligned_type (ptype, malign);

in ipa_modify_formal_parameters.  That looks odd for by-value passing
as well.  When modifying the function bodies we simply take what was
set in ->new_decl which we'd populate above in 
ipa_modify_formal_parameters.  It seems to me that ipa_modify_expr
should look to preserve alignment at the callers site (for loading
into the

Re: [PR testsuite/65205, libgomp/65993] Fix dg-shouldfail usage in OpenACC libgomp tests

2015-05-04 Thread Thomas Schwinge
Hi!

On Thu, 30 Apr 2015 14:47:03 +0200, I wrote:
> Here is a patch, prepared by Jim Norris, to fix dg-shouldfail usage in
> OpenACC libgomp tests.  It introduces two regressions (that is, makes the
> existing errors visible), which shall then be fixed later on:
> libgomp.oacc-c-c++-common/lib-3.c, and
> libgomp.oacc-c-c++-common/lib-42.c.
> 
> As obvious, committed to trunk in r222620: [...]

So much for "obvious" ;-) -- .

Dave, would you please test the following patch, and report the
regression status compared to before r222620?  (Compared to your existing
r222021 results, as posted in the PR, for example.)

In addition to coping with the "%p" format specifier printing a "0x" prefix
or not, I've also changed the expected "(nil)" output for NULL pointers to
instead match basically anything.
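
As a rough illustration of what the relaxed pattern accepts, the sketch below checks a string against the character class `[0-9a-fA-FxX]+` used in the updated dg-output directives. The function name `matches_pointer_class` is hypothetical and this is not how DejaGnu evaluates the regular expression; it only mirrors the character class.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if S is non-empty and every character is a hex digit
   or one of 'x'/'X', i.e. S matches [0-9a-fA-FxX]+ as a whole.  */
bool
matches_pointer_class (const char *s)
{
  static const char allowed[] = "0123456789abcdefABCDEFxX";
  size_t len = strlen (s);
  return len > 0 && strspn (s, allowed) == len;
}
```

Both "0x7ffe12ab" and "7FFE12AB" pass, so the expected output no longer depends on whether "%p" prints a prefix. A string such as "(nil)" does not match this class, which is why NULL pointers need the separate, looser pattern mentioned above.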

 libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-2.c  | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/data-already-1.c | 4 ++--
 libgomp/testsuite/libgomp.oacc-c-c++-common/data-already-2.c | 4 ++--
 libgomp/testsuite/libgomp.oacc-c-c++-common/data-already-8.c | 4 ++--
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-16.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-17.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-18.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-20.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-21.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-22.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-23.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-25.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-26.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-27.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-28.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-29.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-30.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-34.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-35.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-36.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-39.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-40.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-42.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-43.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-44.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-47.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-48.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-52.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-53.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-54.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-57.c | 2 +-
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-58.c | 2 +-
 libgomp/testsuite/libgomp.oacc-fortran/data-already-1.f  | 2 +-
 libgomp/testsuite/libgomp.oacc-fortran/data-already-2.f  | 2 +-
 libgomp/testsuite/libgomp.oacc-fortran/data-already-8.f  | 2 +-
 35 files changed, 38 insertions(+), 38 deletions(-)

diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-2.c
index fec2214..c0a5d00 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-2.c
@@ -64,5 +64,5 @@ main (int argc, char **argv)
 
 return 0;
 }
-/* { dg-output "Trying to map into device \\\[0x\[0-9a-f\]+..0x\[0-9a-f\]+\\\) object when \\\[0x\[0-9a-f\]+..0x\[0-9a-f\]+\\\) is already mapped" }
+/* { dg-output "Trying to map into device \\\[\[0-9a-fA-FxX\]+..\[0-9a-fA-FxX\]+\\\) object when \\\[\[0-9a-fA-FxX\]+..\[0-9a-fA-FxX\]+\\\) is already mapped" } */
 /* { dg-shouldfail "" } */
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/data-already-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/data-already-1.c
index 83c0a42..0c61a66 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/data-already-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-already-1.c
@@ -15,5 +15,5 @@ main (int argc, char *argv[])
   return 0;
 }
 
-/* { dg-shouldfail "" }
-   { dg-output "Trying to map into device .* object when .* is already mapped" } */
+/* { dg-output "Trying to map into device \\\[\[0-9a-fA-FxX\]+..\[0-9a-fA-FxX\]+\\\) object when \\\[\[0-9a-fA-FxX\]+..\[0-9a-fA-FxX\]+\\\) is already mapped" } */
+/* { dg-shouldfail "" } */
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/data-already-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/data-already-2.c
index 137d8ce..cd9fea3 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/data-already-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-already-2.c
@@ -12,5 +12,5 @@ main (int argc, char 

Re: [rs6000] Fix compare debug failure on AIX

2015-05-04 Thread Richard Biener
On Mon, May 4, 2015 at 2:32 AM, David Edelsohn  wrote:
> On Sat, May 2, 2015 at 6:04 AM, Eric Botcazou  wrote:
>>> Why should GCC unnecessarily create stack frames to avoid
>>> compare-debug testcase failures?
>>
>> I'm not sure I understand the question... compare-debug failures are failures
>> (-g is not supposed to change the generated code and this XCOFF-specific bug
>> was reported to us) so they need to be fixed.
>>
>> From there on, as Alan said, there are 2 cases: either AIX needs a frame for
>> debugging or it doesn't.  If the latter, then the lines can simply be deleted.
>> If the former, we have to draw a line somewhere; Alan suggests always creating
>> a frame while I suggest creating it only at -O0 and -Og.
>
> I believe that AIX does need a frame for debugging.  I don't remember
> the exact reason off hand.
>
> I'm sorry that XCOFF debugging changes the generated code (only in the
> sense of allocating a frame), but that is a system dependency.  It's
> been this way for over 20 years.  I see no reason to produce worse
> code at -O0 when not debugging simply to make testcases happier.

The simple reason is that it is GCC policy to generate the same
code with -g0 and -g.  You can't simply say you don't care.

You never want to run into the situation that you miscompile a program
with -g0 but not with -g because that's very much no fun to debug.

Yes, I don't think we have this policy written down anywhere - something
we should improve on.

Richard.

> By the way, I'm still waiting for the DWARF debugging patches from
> Adacore compatible with AIX as and ld.  DWARF debugging would not
> require pushing a frame, and would resolve the failure when testing
> with DWARF.  The patch would be adjusted to only push a frame when
> writing XCOFF debugging.
>
> - David


Re: [patch] Perform anonymous constant propagation during inlining

2015-05-04 Thread Richard Biener
On Fri, May 1, 2015 at 8:09 PM, Eric Botcazou  wrote:
>> OK, how aggressive then?  We could as well do the substitution for all
>> copies:
>>
>>   /* For EXPAND_INITIALIZER try harder to get something simpler.
>>Otherwise, substitute copies on the RHS, this can propagate
>>constants at -O0 and thus simplify arithmetic operations.  */
>>   if (g == NULL
>>       && !SSA_NAME_IS_DEFAULT_DEF (exp)
>>       && (optimize || DECL_IGNORED_P (SSA_NAME_VAR (exp)))
>>       && (modifier == EXPAND_INITIALIZER
>>           || (modifier != EXPAND_WRITE
>>               && gimple_assign_copy_p (SSA_NAME_DEF_STMT (exp))))
>>       && stmt_is_replaceable_p (SSA_NAME_DEF_STMT (exp)))
>>     g = SSA_NAME_DEF_STMT (exp);
>
> This doesn't work (this generates wrong code because this creates overlapping
> live ranges for SSA_NAMEs with the same base variable).  Here's the latest
> working version, all the predicates and accessors used are inlined.

Hum, the fact that your earlier version created wrong code
(get_gimple_for_ssa_name already returned false here) points at some
issues with EXPAND_INITIALIZER as well, no...?

That said, the path you add is certainly safe (though maybe we want to
change get_gimple_for_ssa_name to return tcc_constant single-use defs
even if TER is disabled, thus at -O0, and only at -O0, since otherwise it
shouldn't happen).  That would cover more cases of
get_gimple_for_ssa_name uses (I can see optimize_bitfield_expansion, for
example...)

So, your patch is OK for trunk unless you want to explore the
get_gimple_for_ssa_name improvement suggestion.

I also wonder about EXPAND_INITIALIZER creating overlapping live ranges
(or moving loads across stores).

Thanks,
Richard.

> Tested on x86_64-suse-linux, OK for the mainline?
>
>
> 2015-05-01  Eric Botcazou  
>
> * expr.c (expand_expr_real_1) : Try to substitute constants
> on the RHS of expressions.
> * gimple-expr.h (is_gimple_constant): Reorder.
>
>
> --
> Eric Botcazou


[PING^4] [PATCH] [AArch64, NEON] Improve vmulX intrinsics

2015-05-04 Thread Jiangjiji

Hi,
  This is a ping for: https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00772.html
  Regtested with aarch64-linux-gnu on QEMU.
  This patch also shows no regressions on the aarch64_be-linux-gnu big-endian target.
  OK for the trunk?

Thanks.
Jiang jiji







Re: PR 64454: Improve VRP for %

2015-05-04 Thread Richard Biener
On Sat, May 2, 2015 at 12:46 AM, Marc Glisse  wrote:
> Hello,
>
> this patch tries to tighten a bit the range estimate for x%y. slp-perm-7.c
> started failing by vectorizing more than expected, I assumed it was a good
> thing and updated the test. I am less conservative than Jakub with division
> by 0, but I still don't really understand how empty ranges are supposed to
> be represented in VRP.
>
> Bootstrap+testsuite on x86_64-linux-gnu.

Hmm, so I don't like how you (continue to) use trees for the constant
computations.  wide-ints would be a better fit today.  I also notice that
fold_unary_to_constant can return NULL_TREE and neither the old nor your
code handles that.

"empty" ranges are basically UNDEFINED.

Aren't you pessimizing the case where the old code used
value_range_nonnegative_p () by just using TYPE_UNSIGNED?
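
For readers following along, the range claimed by the new vrp97.c test (a in [-3, 13], b in [-6, 9], so a % b in [-3, 8]) can be checked with a small standalone program. The sketch below is not the VRP code from the patch: `struct irange` and both functions are hypothetical. It compares a brute-force enumeration with a simple conservative bound built from two facts that also appear in the patch comments: ABS (A % B) < ABS (B), and the result takes the sign of the dividend. It assumes small ranges, a divisor range containing at least one nonzero value, and no INT_MIN operands.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical closed integer interval [lo, hi].  */
struct irange { int lo, hi; };

/* Exact range of a % b by enumeration, skipping b == 0.  */
struct irange
mod_range_exact (int a_lo, int a_hi, int b_lo, int b_hi)
{
  struct irange r = { INT_MAX, INT_MIN };
  for (int a = a_lo; a <= a_hi; a++)
    for (int b = b_lo; b <= b_hi; b++)
      {
        if (b == 0)
          continue;
        int c = a % b;
        if (c < r.lo)
          r.lo = c;
        if (c > r.hi)
          r.hi = c;
      }
  return r;
}

/* Conservative bound: |a % b| <= max (|b_lo|, |b_hi|) - 1, the
   result has the sign of a, and |a % b| <= |a|.  */
struct irange
mod_range_bound (int a_lo, int a_hi, int b_lo, int b_hi)
{
  int abs_lo = b_lo < 0 ? -b_lo : b_lo;
  int abs_hi = b_hi < 0 ? -b_hi : b_hi;
  int limit = (abs_lo > abs_hi ? abs_lo : abs_hi) - 1;
  struct irange r;
  r.lo = a_lo < 0 ? (a_lo > -limit ? a_lo : -limit) : 0;
  r.hi = a_hi > 0 ? (a_hi < limit ? a_hi : limit) : 0;
  return r;
}
```

For the test's inputs both functions give [-3, 8]; in general the bound is only an over-approximation of the exact range.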

Thanks,
Richard.

> 2015-05-02  Marc Glisse  
>
> PR tree-optimization/64454
> gcc/
> * tree-vrp.c (extract_range_from_binary_expr_1) :
> Rewrite.
> gcc/testsuite/
> * gcc.dg/tree-ssa/vrp97.c: New file.
> * gcc.dg/vect/slp-perm-7.c: Update.
>
> --
> Marc Glisse
> Index: gcc/testsuite/gcc.dg/tree-ssa/vrp97.c
> ===
> --- gcc/testsuite/gcc.dg/tree-ssa/vrp97.c   (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/vrp97.c   (working copy)
> @@ -0,0 +1,13 @@
> +/* PR tree-optimization/64454 */
> +/* { dg-options "-O2 -fdump-tree-vrp1" } */
> +
> +int f(int a, int b)
> +{
> +if (a < -3 || a > 13) __builtin_unreachable();
> +if (b < -6 || b > 9) __builtin_unreachable();
> +int c = a % b;
> +return c >= -3 && c <= 8;
> +}
> +
> +/* { dg-final { scan-tree-dump "return 1;" "vrp1" } } */
> +/* { dg-final { cleanup-tree-dump "vrp1" } } */
> Index: gcc/testsuite/gcc.dg/vect/slp-perm-7.c
> ===
> --- gcc/testsuite/gcc.dg/vect/slp-perm-7.c  (revision 222708)
> +++ gcc/testsuite/gcc.dg/vect/slp-perm-7.c  (working copy)
> @@ -63,15 +63,15 @@ int main (int argc, const char* argv[])
>
>foo (input, output, input2, output2);
>
>for (i = 0; i < N; i++)
>   if (output[i] != check_results[i] || output2[i] != check_results2[i])
> abort ();
>
>return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_perm } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect"  { target vect_perm } } } */
>  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_perm } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
>
> Index: gcc/tree-vrp.c
> ===
> --- gcc/tree-vrp.c  (revision 222708)
> +++ gcc/tree-vrp.c  (working copy)
> @@ -3189,40 +3189,83 @@ extract_range_from_binary_expr_1 (value_
> }
> }
>else
> {
>   extract_range_from_multiplicative_op_1 (vr, code, &vr0, &vr1);
>   return;
> }
>  }
>else if (code == TRUNC_MOD_EXPR)
>  {
> -  if (vr1.type != VR_RANGE
> - || range_includes_zero_p (vr1.min, vr1.max) != 0
> - || vrp_val_is_min (vr1.min))
> +  if (range_is_null (&vr1))
> +   {
> + set_value_range_to_undefined (vr);
> + return;
> +   }
> +  // Some propagation of symbolic ranges should be possible
> +  // at least in the unsigned case.
> +  bool has_vr0 = vr0.type == VR_RANGE && !symbolic_range_p (&vr0);
> +  bool has_vr1 = vr1.type == VR_RANGE && !symbolic_range_p (&vr1);
> +  if (!has_vr0 && !has_vr1)
> {
>   set_value_range_to_varying (vr);
>   return;
> }
>type = VR_RANGE;
> -  /* Compute MAX <|vr1.min|, |vr1.max|> - 1.  */
> -  max = fold_unary_to_constant (ABS_EXPR, expr_type, vr1.min);
> -  if (tree_int_cst_lt (max, vr1.max))
> -   max = vr1.max;
> -  max = int_const_binop (MINUS_EXPR, max, build_int_cst (TREE_TYPE (max), 1));
> -  /* If the dividend is non-negative the modulus will be
> -non-negative as well.  */
> -  if (TYPE_UNSIGNED (expr_type)
> - || value_range_nonnegative_p (&vr0))
> -   min = build_int_cst (TREE_TYPE (max), 0);
> +  if (TYPE_UNSIGNED (expr_type))
> +   {
> + // A % B is at most A and smaller than B.
> + min = build_int_cst (expr_type, 0);
> + if (has_vr0 && (!has_vr1 || tree_int_cst_lt (vr0.max, vr1.max)))
> +   max = vr0.max;
> + else
> +   max = int_const_binop (MINUS_EXPR, vr1.max,
> +  build_int_cst (expr_type, 1));
> +   }
>else
> -   min = fold_unary_to_constant (NEGATE_EXPR, expr_type, max);
> +   {
> + tree min1 = NULL_TREE;
> + tree max1 = NULL_TREE;
> + if (has_vr1)
> +   {
> + // ABS (A % B) < ABS (B)
> 

Re: [PATCH, AArch64] Add Cortex-A53 erratum 843419 configure-time option

2015-05-04 Thread Yvan Roux
Hi Marcus,

On 1 May 2015 at 17:18, Marcus Shawcroft  wrote:
> On 1 May 2015 at 14:56, Yvan Roux  wrote:
>
>> 2015-05-01  Yvan Roux  
>>
>>  * configure.ac: Add --enable-fix-cortex-a53-843419 option.
>>  * configure: Regenerate.
>>  * config/aarch64/aarch64-elf-raw.h (CA53_ERR_843419_SPEC): Define.
>>  (LINK_SPEC): Include CA53_ERR_843419_SPEC.
>>  * config/aarch64/aarch64-linux.h (CA53_ERR_843419_SPEC): Define.
>>  (LINK_SPEC): Include CA53_ERR_843419_SPEC.
>>  * doc/install.texi (aarch64*-*-*): Document
>>  new --enable-fix-cortex-a53-843419 option
>>  * config/aarch64/aarch64.opt (mfix-cortex-a53-843419): New option.
>>  * doc/invoke.texi (AArch64 Options): Document -mfix-cortex-a53-843419
>>  and -mno-fix-cortex-a53-843419 options.
>>
>
> +@option{--enable-fix-cortex-a53-843419} option.  This erratum workaround is
> +made at link time and enabling it by default in GCC will only pass the
>
> How about something like "The workaround is applied at link time.
> Enabling the workaround will cause GCC to pass the relevant option to
> the linker." ?

Yes, this is a better formulation.

> +corresponding flag to the linker.  It can be explicitly disabled during
> +compilation by passing the @option{-mno-fix-cortex-a53-835769} option.
>
> Copy paste error here with the previous errata number.

Here is the patch with the modifications.  Does it need to be backported
to the 4.9 and 5.1 branches?
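
To spell out what the two CA53_ERR_843419_SPEC variants in the patch do, here is a plain-C model of the driver's decision. It is a sketch, not driver code: the enum and function names are hypothetical, and it only models a command line with at most one of the two options (how the driver resolves both being given is not covered).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the user passed to the compiler driver.  */
enum user_flag { FLAG_NONE, FLAG_MFIX, FLAG_MNO_FIX };

/* Return true if --fix-cortex-a53-843419 would be passed to the linker.
   CONFIGURED_DEFAULT_ON models TARGET_FIX_ERR_A53_843419_DEFAULT being
   defined, i.e. GCC configured with --enable-fix-cortex-a53-843419.  */
bool
linker_gets_fix_843419 (bool configured_default_on, enum user_flag flag)
{
  if (configured_default_on)
    /* " %{!mno-fix-cortex-a53-843419:--fix-cortex-a53-843419}":
       pass the option unless the user explicitly disabled it.  */
    return flag != FLAG_MNO_FIX;

  /* " %{mfix-cortex-a53-843419:--fix-cortex-a53-843419}":
     pass the option only if the user explicitly asked for it.  */
  return flag == FLAG_MFIX;
}
```

This matches Marcus's suggested wording: the compiler itself does nothing different, it only decides whether the linker sees the workaround option.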

Cheers,
Yvan
diff --git a/gcc/config/aarch64/aarch64-elf-raw.h b/gcc/config/aarch64/aarch64-elf-raw.h
index ebeeb50..bd5e51c 100644
--- a/gcc/config/aarch64/aarch64-elf-raw.h
+++ b/gcc/config/aarch64/aarch64-elf-raw.h
@@ -35,10 +35,19 @@
   " %{mfix-cortex-a53-835769:--fix-cortex-a53-835769}"
 #endif
 
+#ifdef TARGET_FIX_ERR_A53_843419_DEFAULT
+#define CA53_ERR_843419_SPEC \
+  " %{!mno-fix-cortex-a53-843419:--fix-cortex-a53-843419}"
+#else
+#define CA53_ERR_843419_SPEC \
+  " %{mfix-cortex-a53-843419:--fix-cortex-a53-843419}"
+#endif
+
 #ifndef LINK_SPEC
 #define LINK_SPEC "%{mbig-endian:-EB} %{mlittle-endian:-EL} -X \
   -maarch64elf%{mabi=ilp32*:32}%{mbig-endian:b}" \
-  CA53_ERR_835769_SPEC
+  CA53_ERR_835769_SPEC \
+  CA53_ERR_843419_SPEC
 #endif
 
 #endif /* GCC_AARCH64_ELF_RAW_H */
diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h
index 9abb252..7973268 100644
--- a/gcc/config/aarch64/aarch64-linux.h
+++ b/gcc/config/aarch64/aarch64-linux.h
@@ -49,8 +49,17 @@
   " %{mfix-cortex-a53-835769:--fix-cortex-a53-835769}"
 #endif
 
+#ifdef TARGET_FIX_ERR_A53_843419_DEFAULT
+#define CA53_ERR_843419_SPEC \
+  " %{!mno-fix-cortex-a53-843419:--fix-cortex-a53-843419}"
+#else
+#define CA53_ERR_843419_SPEC \
+  " %{mfix-cortex-a53-843419:--fix-cortex-a53-843419}"
+#endif
+
 #define LINK_SPEC LINUX_TARGET_LINK_SPEC \
-  CA53_ERR_835769_SPEC
+  CA53_ERR_835769_SPEC \
+  CA53_ERR_843419_SPEC
 
 #define GNU_USER_TARGET_MATHFILE_SPEC \
   "%{Ofast|ffast-math|funsafe-math-optimizations:crtfastmath.o%s}"
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index f2ef124..6d72ac2 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -71,6 +71,10 @@ mfix-cortex-a53-835769
 Target Report Var(aarch64_fix_a53_err835769) Init(2)
 Workaround for ARM Cortex-A53 Erratum number 835769
 
+mfix-cortex-a53-843419
+Target Report
+Workaround for ARM Cortex-A53 Erratum number 843419
+
 mlittle-endian
 Target Report RejectNegative InverseMask(BIG_END)
 Assume target CPU is configured as little endian
diff --git a/gcc/configure b/gcc/configure
index 84f58ce..e563e94 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -923,6 +923,7 @@ enable_gnu_indirect_function
 enable_initfini_array
 enable_comdat
 enable_fix_cortex_a53_835769
+enable_fix_cortex_a53_843419
 with_glibc_version
 enable_gnu_unique_object
 enable_linker_build_id
@@ -1648,6 +1649,14 @@ Optional Features:
   disable workaround for AArch64 Cortex-A53 erratum
   835769 by default
 
+
+  --enable-fix-cortex-a53-843419
+  enable workaround for AArch64 Cortex-A53 erratum
+  843419 by default
+  --disable-fix-cortex-a53-843419
+  disable workaround for AArch64 Cortex-A53 erratum
+  843419 by default
+
   --enable-gnu-unique-object
   enable the use of the @gnu_unique_object ELF
   extension on glibc systems
@@ -18153,7 +18162,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18156 "configure"
+#line 18165 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -18259,7 +18268,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18262 "configure"
+#line 18271 "c

Re: [RFA] More type narrowing in match.pd V2

2015-05-04 Thread Richard Biener
On Sat, May 2, 2015 at 2:36 AM, Jeff Law  wrote:
> Here's an updated patch to add more type narrowing to match.pd.
>
> Changes since the last version:
>
> Slight refactoring of the condition by using types_match as suggested by
> Richi.  I also applied the new types_match to 2 other patterns in match.pd
> where it seemed clearly appropriate.
>
> Additionally the transformation is restricted by using the new single_use
> predicate.  I didn't change other patterns in match.pd to use the new
> single_use predicate.  But some probably could be changed.
>
> This (of course) continues to pass the bootstrap and regression check for
> x86-linux-gnu.
>
> There's still a ton of work to do in this space.  This is meant to be an
> incremental stand-alone improvement.
>
> OK now?

Ok with the {gimple,generic}-match-head.c changes mentioned in the ChangeLog.
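
As background on why this kind of narrowing is valid, the sketch below checks exhaustively, for 8-bit unsigned operands, that widening to int, adding or subtracting, and masking with 0xff gives the same value as doing the operation with 8-bit wraparound. It illustrates the general idea behind the new (bit_and (plus/minus (convert @0) (convert @1)) mask) simplifier as described in the ChangeLog; it is not the match.pd pattern, and the function name is hypothetical. Because C promotes `unsigned char` operands to int, the cast back to `unsigned char` is what models the narrow, wrapping arithmetic here.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if, for every pair of 8-bit unsigned values, the masked
   wide result equals the result of wrapping 8-bit arithmetic.  */
bool
narrowing_is_safe_for_u8 (void)
{
  for (unsigned a = 0; a < 256; a++)
    for (unsigned b = 0; b < 256; b++)
      {
        unsigned char na = (unsigned char) a;
        unsigned char nb = (unsigned char) b;

        /* Wide form: convert, operate in int, then mask.  */
        int wide_plus = ((int) na + (int) nb) & 0xff;
        int wide_minus = ((int) na - (int) nb) & 0xff;

        /* Narrow form: the cast models 8-bit unsigned wraparound.  */
        unsigned char narrow_plus = (unsigned char) (na + nb);
        unsigned char narrow_minus = (unsigned char) (na - nb);

        if (wide_plus != narrow_plus || wide_minus != narrow_minus)
          return false;
      }
  return true;
}
```

The low bits of a sum or difference depend only on the low bits of the operands, so as long as the mask fits in the narrow type the wide intermediate is unnecessary.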

Thanks,
Richard.

>
>
> Jeff
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index e006b26..5ee89de 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,8 @@
> +2015-05-01  Jeff Law  
> +
> +   * match.pd (bit_and (plus/minus (convert @0) (convert @1) mask): New
> +   simplifier to narrow arithmetic.
> +
>  2015-05-01  Rasmus Villemoes  
>
> * match.pd: New simplification patterns.
> diff --git a/gcc/generic-match-head.c b/gcc/generic-match-head.c
> index daa56aa..303b237 100644
> --- a/gcc/generic-match-head.c
> +++ b/gcc/generic-match-head.c
> @@ -70,4 +70,20 @@ along with GCC; see the file COPYING3.  If not see
>  #include "dumpfile.h"
>  #include "generic-match.h"
>
> +/* Routine to determine if the types T1 and T2 are effectively
> +   the same for GENERIC.  */
>
> +inline bool
> +types_match (tree t1, tree t2)
> +{
> +  return TYPE_MAIN_VARIANT (t1) == TYPE_MAIN_VARIANT (t2);
> +}
> +
> +/* Return if T has a single use.  For GENERIC, we assume this is
> +   always true.  */
> +
> +inline bool
> +single_use (tree t)
> +{
> +  return true;
> +}
> diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
> index c7b2f95..dc13218 100644
> --- a/gcc/gimple-match-head.c
> +++ b/gcc/gimple-match-head.c
> @@ -861,3 +861,21 @@ do_valueize (tree (*valueize)(tree), tree op)
>return op;
>  }
>
> +/* Routine to determine if the types T1 and T2 are effectively
> +   the same for GIMPLE.  */
> +
> +inline bool
> +types_match (tree t1, tree t2)
> +{
> +  return types_compatible_p (t1, t2);
> +}
> +
> +/* Return if T has a single use.  For GIMPLE, we also allow any
> +   non-SSA_NAME (ie constants) and zero uses to cope with uses
> +   that aren't linked up yet.  */
> +
> +inline bool
> +single_use (tree t)
> +{
> +  return TREE_CODE (t) != SSA_NAME || has_zero_uses (t) || has_single_use
> (t);
> +}
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 87ecaf1..51a950a 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -289,8 +289,7 @@ along with GCC; see the file COPYING3.  If not see
>(if (((TREE_CODE (@1) == INTEGER_CST
>  && INTEGRAL_TYPE_P (TREE_TYPE (@0))
>  && int_fits_type_p (@1, TREE_TYPE (@0)))
> -   || (GIMPLE && types_compatible_p (TREE_TYPE (@0), TREE_TYPE (@1)))
> -   || (GENERIC && TREE_TYPE (@0) == TREE_TYPE (@1)))
> +   || types_match (TREE_TYPE (@0), TREE_TYPE (@1)))
> /* ???  This transform conflicts with fold-const.c doing
>   Convert (T)(x & c) into (T)x & (T)c, if c is an integer
>   constants (if x has signed type, the sign bit cannot be set
> @@ -949,8 +948,7 @@ along with GCC; see the file COPYING3.  If not see
>  /* Unordered tests if either argument is a NaN.  */
>  (simplify
>   (bit_ior (unordered @0 @0) (unordered @1 @1))
> - (if ((GIMPLE && types_compatible_p (TREE_TYPE (@0), TREE_TYPE (@1)))
> -  || (GENERIC && TREE_TYPE (@0) == TREE_TYPE (@1)))
> + (if (types_match (TREE_TYPE (@0), TREE_TYPE (@1)))
>(unordered @0 @1)))
>  (simplify
>   (bit_ior:c (unordered @0 @0) (unordered:c@2 @0 @1))
> @@ -1054,7 +1052,7 @@ along with GCC; see the file COPYING3.  If not see
> operation and convert the result to the desired type.  */
>  (for op (plus minus)
>(simplify
> -(convert (op (convert@2 @0) (convert@3 @1)))
> +(convert (op@4 (convert@2 @0) (convert@3 @1)))
>  (if (INTEGRAL_TYPE_P (type)
>  /* We check for type compatibility between @0 and @1 below,
> so there's no need to check that @1/@3 are integral types.  */
> @@ -1070,15 +1068,45 @@ along with GCC; see the file COPYING3.  If not see
>  && TYPE_PRECISION (type) == GET_MODE_PRECISION (TYPE_MODE (type))
>  /* The inner conversion must be a widening conversion.  */
>  && TYPE_PRECISION (TREE_TYPE (@2)) > TYPE_PRECISION (TREE_TYPE
> (@0))
> -&& ((GENERIC
> - && (TYPE_MAIN_VARIANT (TREE_TYPE (@0))
> - == TYPE_MAIN_VARIANT (TREE_TYPE (@1)))
> - && (TYPE_MAIN_VARIANT (TREE_TYPE (@0))
> - == TYPE_MAIN_VARIANT (type)))
> -|| (GIMPLE
> -&& types_compatib
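
As a rough illustration of the arithmetic identity behind the new bit_and
narrowing simplifier (a sketch in plain C, not code from the patch): when both
operands are widened from the same unsigned type and the mask fits in that
narrower type, doing the plus/minus in the narrow type gives the same masked
result.  The function names are invented for this example, and
uint32_t/uint64_t merely stand in for the narrow and wide types.

```c
#include <stdint.h>

/* Wide form: widen both operands, operate, then mask.  */
uint64_t
wide_add_mask (uint32_t a, uint32_t b, uint32_t mask)
{
  return ((uint64_t) a + (uint64_t) b) & mask;
}

/* Narrow form: operate in the narrow unsigned type, then mask.  */
uint64_t
narrow_add_mask (uint32_t a, uint32_t b, uint32_t mask)
{
  return (uint32_t) (a + b) & mask;
}

/* Same pair for subtraction; the wide difference may wrap in 64 bits,
   but its low 32 bits still equal the wrapped 32-bit difference.  */
uint64_t
wide_sub_mask (uint32_t a, uint32_t b, uint32_t mask)
{
  return ((uint64_t) a - (uint64_t) b) & mask;
}

uint64_t
narrow_sub_mask (uint32_t a, uint32_t b, uint32_t mask)
{
  return (uint32_t) (a - b) & mask;
}
```

The two forms agree because the mask discards every bit above the narrow
precision, so any carry or borrow into the upper half cannot be observed.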

[PATCH, ARM] Fix testcases that require Thumb2 effective target.

2015-05-04 Thread Yvan Roux
Hi,

This patch fixes two ARM testcases that require a Thumb-2 effective
target.  One is built for Cortex-M3, and the purpose of the second
is to generate the thumb2_addsi3_compare0_scratch insn; both fail
when compiled for armv5t, for instance.

Built and regtested.  Is it OK for trunk?

Thanks,
Yvan

2015-05-04  Yvan Roux  

* gcc.target/arm/pr65067.c: Require Thumb2 effective target.
* gcc.target/arm/pr65924.c: Likewise.
diff --git a/gcc/testsuite/gcc.target/arm/pr65067.c 
b/gcc/testsuite/gcc.target/arm/pr65067.c
index 9ddd7bb..05da294 100644
--- a/gcc/testsuite/gcc.target/arm/pr65067.c
+++ b/gcc/testsuite/gcc.target/arm/pr65067.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target arm_thumb2_ok } */
 /* { dg-options "-mthumb -mcpu=cortex-m3 -O2" } */
 
 struct tmp {
diff --git a/gcc/testsuite/gcc.target/arm/pr65924.c 
b/gcc/testsuite/gcc.target/arm/pr65924.c
index 746749f..e1ad394 100644
--- a/gcc/testsuite/gcc.target/arm/pr65924.c
+++ b/gcc/testsuite/gcc.target/arm/pr65924.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target arm_thumb2_ok } */
 /* { dg-options "-O2 -mthumb" } */
 
 int a, b, c;


Re: [committed, gcc-5-branch] Set DEV-PHASE to prerelease

2015-05-04 Thread Rainer Orth
Jakub Jelinek  writes:

> On Thu, Apr 23, 2015 at 04:31:52PM -0700, H.J. Lu wrote:
>> Hi,
>> 
>> I checked this patch into gcc-5-branch.
>
> That's wrong according to https://gcc.gnu.org/develop.html#num_scheme

HJ has a point, though: with DEV-PHASE remaining empty, all post-5.1.0
versions of gcc identify as 5.1.1, with no way of telling them apart,
like datestamp and revision.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [committed, gcc-5-branch] Set DEV-PHASE to prerelease

2015-05-04 Thread Jakub Jelinek
On Mon, May 04, 2015 at 11:13:51AM +0200, Rainer Orth wrote:
> Jakub Jelinek  writes:
> 
> > On Thu, Apr 23, 2015 at 04:31:52PM -0700, H.J. Lu wrote:
> >> Hi,
> >> 
> >> I checked this patch into gcc-5-branch.
> >
> > That's wrong according to https://gcc.gnu.org/develop.html#num_scheme
> 
> HJ has a point, though: with DEV-PHASE remaining empty, all post-5.1.0
> versions of gcc identify as 5.1.1, with no way of telling them apart,
> like datestamp and revision.

That suggests we should change
DATESTAMP_s := "\"$(if $(DEVPHASE_c), $(DATESTAMP_c))\""
so that it would expand to DATESTAMP_c also if DEVPHASE_c is empty,
but BASEVER_c does not end with .0

Jakub


Re: [PATCH, x86] Add TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE hook

2015-05-04 Thread Christian Bruel

> Hi Christian,
> I noticed case gcc.dg/ipa/iinline-attr.c failed on aarch64.  The
> original patch is x86 specific, while the case is added as general
> one.  Could you please have a look at this?
> 
> FAIL: gcc.dg/ipa/iinline-attr.c scan-ipa-dump inline
> "hooray[^\\n]*inline copy in test"
> 

That is the same latent bug on aarch64: alignment flags are not
propagated with attribute optimize ("O2").

I am testing the attached patch.

Christian


Index: config/aarch64/aarch64.c
===
--- config/aarch64/aarch64.c	(revision 222627)
+++ config/aarch64/aarch64.c	(working copy)
@@ -6908,18 +6908,6 @@
 #endif
 }
 
-  /* If not opzimizing for size, set the default
- alignment to what the target wants */
-  if (!optimize_size)
-{
-  if (align_loops <= 0)
-	align_loops = aarch64_tune_params->loop_align;
-  if (align_jumps <= 0)
-	align_jumps = aarch64_tune_params->jump_align;
-  if (align_functions <= 0)
-	align_functions = aarch64_tune_params->function_align;
-}
-
   if (AARCH64_TUNE_FMA_STEERING)
 aarch64_register_fma_steering ();
 
@@ -6935,6 +6923,18 @@
 flag_omit_leaf_frame_pointer = false;
   else if (flag_omit_leaf_frame_pointer)
 flag_omit_frame_pointer = true;
+
+  /* If not opzimizing for size, set the default
+ alignment to what the target wants */
+  if (!optimize_size)
+{
+  if (align_loops <= 0)
+	align_loops = aarch64_tune_params->loop_align;
+  if (align_jumps <= 0)
+	align_jumps = aarch64_tune_params->jump_align;
+  if (align_functions <= 0)
+	align_functions = aarch64_tune_params->function_align;
+}
 }
 
 static struct machine_function *
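
For readers following along, the block being moved can be read as the small
self-contained sketch below.  The struct and function names are hypothetical
stand-ins for the real GCC globals (align_loops, align_jumps, align_functions)
and for aarch64_tune_params; only the control flow is taken from the patch.

```c
/* Hypothetical stand-in for the per-core tuning table.  */
struct tune_params
{
  int loop_align;
  int jump_align;
  int function_align;
};

/* Hypothetical stand-in for the -falign-* option state; a value <= 0
   means no specific alignment was requested.  */
struct align_opts
{
  int loops;
  int jumps;
  int functions;
};

/* Fill in target defaults for any alignment left unset, unless we are
   optimizing for size.  */
void
apply_default_alignment (struct align_opts *opts,
                         const struct tune_params *tune, int optimize_size)
{
  if (optimize_size)
    return;
  if (opts->loops <= 0)
    opts->loops = tune->loop_align;
  if (opts->jumps <= 0)
    opts->jumps = tune->jump_align;
  if (opts->functions <= 0)
    opts->functions = tune->function_align;
}
```

The assumption here is that the point of the move is to have this logic run
again whenever the optimization options change (for example under attribute
optimize), so that the defaults are re-applied rather than lost.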


Re: [committed, gcc-5-branch] Set DEV-PHASE to prerelease

2015-05-04 Thread Richard Biener
On Mon, 4 May 2015, Jakub Jelinek wrote:

> On Mon, May 04, 2015 at 11:13:51AM +0200, Rainer Orth wrote:
> > Jakub Jelinek  writes:
> > 
> > > On Thu, Apr 23, 2015 at 04:31:52PM -0700, H.J. Lu wrote:
> > >> Hi,
> > >> 
> > >> I checked this patch into gcc-5-branch.
> > >
> > > That's wrong according to https://gcc.gnu.org/develop.html#num_scheme
> > 
> > HJ has a point, though: with DEV-PHASE remaining empty, all post-5.1.0
> > versions of gcc identify as 5.1.1, with no way of telling them apart,
> > like datestamp and revision.
> 
> That suggests we should change
> DATESTAMP_s := "\"$(if $(DEVPHASE_c), $(DATESTAMP_c))\""
> so that it would expand to DATESTAMP_c also if DEVPHASE_c is empty,
> but BASEVER_c does not end with .0

Yes.

Richard.

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild,
Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)


Re: [patch] Perform anonymous constant propagation during inlining

2015-05-04 Thread Eric Botcazou
> Hum, the fact that your earlier version created wrong code
> (get_gimple_for_ssa_name
> already returned false here) points at some issues with
> EXPAND_INITIALIZER as well, no...?

Theoretically yes but, in practice, EXPAND_INITIALIZER is used in varasm.c and 
for debugging stuff only, so I don't think that's a real concern.

> That said, the path you add is certainly safe (though maybe we want to
> change get_gimple_for_ssa_name to return tcc_constant single-use defs even
> if TER is disabled
> (thus at -O0 - and only at -O0, otherwise it shouldn't happen).  That
> would cover
> more cases of get_gimple_for_ssa_name uses (I can see
> optimize_bitfield_expansion
> for example...)

optimize_bitfield_assignment_op is only interested in loads from bitfields 
though.  The get_gimple_for_ssa_name route would be interesting to bypass the 
stmt_is_replaceable_p test, i.e. to bypass the single-use test, but this could 
be counter-productive at -O0 so I'm not sure it's worth the trouble.

-- 
Eric Botcazou


[PATCH, AArch64] [4.8] Backport PR64304 fix (miscompilation with -mgeneral-regs-only )

2015-05-04 Thread Chen Shanyao
As you suggested, I have split the backports of PR64304 into two
emails; this one is for the 4.8 branch.
This patch backports the fix for PR target/64304 (miscompilation with
-mgeneral-regs-only) to the 4.8 branch from trunk r219844.  Tested on
x86_64 using qemu for aarch64.

OK for 4.8?


diff -rupN gcc-4.8-20150226/gcc/ChangeLog 
gcc-4.8-20150226.pr64304//gcc/ChangeLog

--- gcc-4.8-20150226/gcc/ChangeLog2015-03-04 21:13:46.0 -0500
+++ gcc-4.8-20150226.pr64304//gcc/ChangeLog2015-03-04 
21:19:49.0 -0500

@@ -1,3 +1,13 @@
+2015-03-05  Shanyao Chen  
+
+Backported from mainline
+2015-01-19  Jiong Wang  
+Andrew Pinski  
+
+PR target/64304
+* config/aarch64/aarch64.md (define_insn "*ashl3_insn"): 
Deleted.

+(ashl3): Don't expand if operands[2] is not constant.
+
 2015-02-26  Peter Bergner  

 Backport from mainline
diff -rupN gcc-4.8-20150226/gcc/config/aarch64/aarch64.md 
gcc-4.8-20150226.pr64304//gcc/config/aarch64/aarch64.md
--- gcc-4.8-20150226/gcc/config/aarch64/aarch64.md2015-03-04 
21:14:29.0 -0500
+++ gcc-4.8-20150226.pr64304//gcc/config/aarch64/aarch64.md 2015-03-04 
21:21:54.0 -0500

@@ -2612,6 +2612,8 @@
 DONE;
   }
   }
+else
+  FAIL;
   }
 )

@@ -2681,16 +2683,6 @@
(set_attr "mode" "SI")]
 )

-(define_insn "*ashl3_insn"
-  [(set (match_operand:SHORT 0 "register_operand" "=r")
-(ashift:SHORT (match_operand:SHORT 1 "register_operand" "r")
-  (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "rUss")))]
-  ""
-  "lsl\\t%0, %1, %2"
-  [(set_attr "v8type" "shift")
-   (set_attr "mode" "")]
-)
-
 (define_insn "*3_insn"
   [(set (match_operand:SHORT 0 "register_operand" "=r")
 (ASHIFT:SHORT (match_operand:SHORT 1 "register_operand" "r")
diff -rupN gcc-4.8-20150226/gcc/testsuite/ChangeLog 
gcc-4.8-20150226.pr64304//gcc/testsuite/ChangeLog
--- gcc-4.8-20150226/gcc/testsuite/ChangeLog2015-03-04 
21:16:54.0 -0500
+++ gcc-4.8-20150226.pr64304//gcc/testsuite/ChangeLog2015-03-04 
21:22:58.0 -0500

@@ -1,3 +1,10 @@
+2015-03-05  Shanyao chen  
+
+Backported from mainline
+2015-01-19  Jiong Wang  
+
+* gcc.target/aarch64/pr64304.c: New testcase.
+
 2015-02-26  Peter Bergner  

 Backport from mainline
diff -rupN gcc-4.8-20150226/gcc/testsuite/gcc.target/aarch64/pr64304.c 
gcc-4.8-20150226.pr64304//gcc/testsuite/gcc.target/aarch64/pr64304.c
--- gcc-4.8-20150226/gcc/testsuite/gcc.target/aarch64/pr64304.c 
1969-12-31 19:00:00.0 -0500
+++ gcc-4.8-20150226.pr64304//gcc/testsuite/gcc.target/aarch64/pr64304.c 
2015-03-04 21:12:15.0 -0500

@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 --save-temps" } */
+
+unsigned char byte = 0;
+
+void
+set_bit (unsigned int bit, unsigned char value)
+{
+  unsigned char mask = (unsigned char) (1 << (bit & 7));
+
+  if (! value)
+byte &= (unsigned char)~mask;
+  else
+byte |= mask;
+/* { dg-final { scan-assembler "and\tw\[0-9\]+, w\[0-9\]+, 7" } } */
+}
+
+/* { dg-final { cleanup-saved-temps } } */


Re: [committed, gcc-5-branch] Set DEV-PHASE to prerelease

2015-05-04 Thread Jakub Jelinek
On Mon, May 04, 2015 at 11:31:11AM +0200, Richard Biener wrote:
> On Mon, 4 May 2015, Jakub Jelinek wrote:
> 
> > On Mon, May 04, 2015 at 11:13:51AM +0200, Rainer Orth wrote:
> > > Jakub Jelinek  writes:
> > > 
> > > > On Thu, Apr 23, 2015 at 04:31:52PM -0700, H.J. Lu wrote:
> > > >> Hi,
> > > >> 
> > > >> I checked this patch into gcc-5-branch.
> > > >
> > > > That's wrong according to https://gcc.gnu.org/develop.html#num_scheme
> > > 
> > > HJ has a point, though: with DEV-PHASE remaining empty, all post-5.1.0
> > > versions of gcc identify as 5.1.1, with no way of telling them apart,
> > > like datestamp and revision.
> > 
> > That suggests we should change
> > DATESTAMP_s := "\"$(if $(DEVPHASE_c), $(DATESTAMP_c))\""
> > so that it would expand to DATESTAMP_c also if DEVPHASE_c is empty,
> > but BASEVER_c does not end with .0
> 
> Yes.

Here is a patch to do that, ok for trunk/5?

2015-05-04  Jakub Jelinek  

* Makefile.in (PATCHLEVEL_c): New variable.
(DATESTAMP_s, REVISION_s): If PATCHLEVEL_c is not 0,
expand the same way as if DEVPHASE_c was non-empty.

--- gcc/Makefile.in.jj  2015-04-12 21:50:12.0 +0200
+++ gcc/Makefile.in 2015-05-04 12:03:03.394797230 +0200
@@ -828,14 +828,20 @@ endif
 
 version := $(BASEVER_c)
 
+PATCHLEVEL_c := \
+  $(shell echo $(BASEVER_c) | sed -e 's/^[0-9]*\.[0-9]*\.\([0-9]*\)$$/\1/')
+
+
 # For use in version.c - double quoted strings, with appropriate
 # surrounding punctuation and spaces, and with the datestamp and
 # development phase collapsed to the empty string in release mode
-# (i.e. if DEVPHASE_c is empty).  The space immediately after the
-# comma in the $(if ...) constructs is significant - do not remove it.
+# (i.e. if DEVPHASE_c is empty and PATCHLEVEL_c is 0).  The space
+# immediately after the comma in the $(if ...) constructs is
+# significant - do not remove it.
 BASEVER_s   := "\"$(BASEVER_c)\""
 DEVPHASE_s  := "\"$(if $(DEVPHASE_c), ($(DEVPHASE_c)))\""
-DATESTAMP_s := "\"$(if $(DEVPHASE_c), $(DATESTAMP_c))\""
+DATESTAMP_s := \
+  "\"$(if $(DEVPHASE_c)$(filter-out 0,$(PATCHLEVEL_c)), $(DATESTAMP_c))\""
 PKGVERSION_s:= "\"@PKGVERSION@\""
 BUGURL_s:= "\"@REPORT_BUGS_TO@\""
 
@@ -843,7 +849,8 @@ PKGVERSION  := @PKGVERSION@
 BUGURL_TEXI := @REPORT_BUGS_TEXI@
 
 ifdef REVISION_c
-REVISION_s  := "\"$(if $(DEVPHASE_c), $(REVISION_c))\""
+REVISION_s  := \
+  "\"$(if $(DEVPHASE_c)$(filter-out 0,$(PATCHLEVEL_c)), $(REVISION_c))\""
 else
 REVISION_s  := "\"\""
 endif


Jakub
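
To make the intended condition easier to check by hand, here is a minimal
sketch in C of the decision the Makefile change encodes: show the datestamp
(and revision) if DEVPHASE_c is non-empty, or if the third component of
BASEVER_c is something other than 0.  The function names are invented for the
example; the handling of strings that are not of the form X.Y.Z follows what
the sed expression would leave behind, which is an assumption about the
intent rather than something stated in the thread.

```c
#include <ctype.h>
#include <string.h>

/* Return the third component of an "X.Y.Z" string made of digits only.
   If BASEVER does not have that shape, return it unchanged, mirroring a
   sed substitution that does not match.  */
const char *
patchlevel (const char *basever)
{
  const char *p = basever;
  for (int dots = 0; dots < 2; dots++)
    {
      while (isdigit ((unsigned char) *p))
        p++;
      if (*p != '.')
        return basever;
      p++;
    }
  const char *third = p;
  while (isdigit ((unsigned char) *p))
    p++;
  return *p == '\0' ? third : basever;
}

/* Non-zero if the datestamp should be part of the version string.  */
int
want_datestamp (const char *devphase, const char *basever)
{
  const char *pl = patchlevel (basever);
  return devphase[0] != '\0' || (pl[0] != '\0' && strcmp (pl, "0") != 0);
}
```

With an empty DEVPHASE, "5.1.0" yields no datestamp while "5.1.1" does, which
is the behaviour asked for earlier in the thread.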


Re: [committed, gcc-5-branch] Set DEV-PHASE to prerelease

2015-05-04 Thread Richard Biener
On Mon, 4 May 2015, Jakub Jelinek wrote:

> On Mon, May 04, 2015 at 11:31:11AM +0200, Richard Biener wrote:
> > On Mon, 4 May 2015, Jakub Jelinek wrote:
> > 
> > > On Mon, May 04, 2015 at 11:13:51AM +0200, Rainer Orth wrote:
> > > > Jakub Jelinek  writes:
> > > > 
> > > > > On Thu, Apr 23, 2015 at 04:31:52PM -0700, H.J. Lu wrote:
> > > > >> Hi,
> > > > >> 
> > > > >> I checked this patch into gcc-5-branch.
> > > > >
> > > > > That's wrong according to https://gcc.gnu.org/develop.html#num_scheme
> > > > 
> > > > HJ has a point, though: with DEV-PHASE remaining empty, all post-5.1.0
> > > > versions of gcc identify as 5.1.1, with no way of telling them apart,
> > > > like datestamp and revision.
> > > 
> > > That suggests we should change
> > > DATESTAMP_s := "\"$(if $(DEVPHASE_c), $(DATESTAMP_c))\""
> > > so that it would expand to DATESTAMP_c also if DEVPHASE_c is empty,
> > > but BASEVER_c does not end with .0
> > 
> > Yes.
> 
> Here is a patch to do that, ok for trunk/5?

Looks good to me.

Thanks,
Richard.

> 2015-05-04  Jakub Jelinek  
> 
>   * Makefile.in (PATCHLEVEL_c): New variable.
>   (DATESTAMP_s, REVISION_s): If PATCHLEVEL_c is not 0,
>   expand the same way as if DEVPHASE_c was non-empty.
> 
> --- gcc/Makefile.in.jj2015-04-12 21:50:12.0 +0200
> +++ gcc/Makefile.in   2015-05-04 12:03:03.394797230 +0200
> @@ -828,14 +828,20 @@ endif
>  
>  version := $(BASEVER_c)
>  
> +PATCHLEVEL_c := \
> +  $(shell echo $(BASEVER_c) | sed -e 's/^[0-9]*\.[0-9]*\.\([0-9]*\)$$/\1/')
> +
> +
>  # For use in version.c - double quoted strings, with appropriate
>  # surrounding punctuation and spaces, and with the datestamp and
>  # development phase collapsed to the empty string in release mode
> -# (i.e. if DEVPHASE_c is empty).  The space immediately after the
> -# comma in the $(if ...) constructs is significant - do not remove it.
> +# (i.e. if DEVPHASE_c is empty and PATCHLEVEL_c is 0).  The space
> +# immediately after the comma in the $(if ...) constructs is
> +# significant - do not remove it.
>  BASEVER_s   := "\"$(BASEVER_c)\""
>  DEVPHASE_s  := "\"$(if $(DEVPHASE_c), ($(DEVPHASE_c)))\""
> -DATESTAMP_s := "\"$(if $(DEVPHASE_c), $(DATESTAMP_c))\""
> +DATESTAMP_s := \
> +  "\"$(if $(DEVPHASE_c)$(filter-out 0,$(PATCHLEVEL_c)), $(DATESTAMP_c))\""
>  PKGVERSION_s:= "\"@PKGVERSION@\""
>  BUGURL_s:= "\"@REPORT_BUGS_TO@\""
>  
> @@ -843,7 +849,8 @@ PKGVERSION  := @PKGVERSION@
>  BUGURL_TEXI := @REPORT_BUGS_TEXI@
>  
>  ifdef REVISION_c
> -REVISION_s  := "\"$(if $(DEVPHASE_c), $(REVISION_c))\""
> +REVISION_s  := \
> +  "\"$(if $(DEVPHASE_c)$(filter-out 0,$(PATCHLEVEL_c)), $(REVISION_c))\""
>  else
>  REVISION_s  := "\"\""
>  endif
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild,
Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)


[PATCH, AArch64] [4.9] Backport PR64304 fix (miscompilation with -mgeneral-regs-only )

2015-05-04 Thread Chen Shanyao
As you suggested, I have split the backports of PR64304 into two
emails; this one is for the 4.9 branch.
This patch backports the fix for PR target/64304 (miscompilation with
-mgeneral-regs-only) to the 4.9 branch from trunk r219844.  Tested on
x86_64 using qemu for aarch64.

OK for 4.9?

diff -rupN gcc-4.9-20150225/gcc/ChangeLog 
gcc-4.9-20150225.pr64304//gcc/ChangeLog

--- gcc-4.9-20150225/gcc/ChangeLog2015-03-04 20:48:30.0 -0500
+++ gcc-4.9-20150225.pr64304//gcc/ChangeLog2015-03-04 
20:55:59.0 -0500

@@ -1,3 +1,13 @@
+2015-03-05  Shanyao Chen  
+
+Backported from mainline
+2015-01-19  Jiong Wang  
+Andrew Pinski  
+
+PR target/64304
+* config/aarch64/aarch64.md (define_insn "*ashl3_insn"): Deleted.
+(ashl3): Don't expand if operands[2] is not constant.
+
 2015-02-25  Kai Tietz  

 PR tree-optimization/61917
diff -rupN gcc-4.9-20150225/gcc/config/aarch64/aarch64.md 
gcc-4.9-20150225.pr64304//gcc/config/aarch64/aarch64.md
--- gcc-4.9-20150225/gcc/config/aarch64/aarch64.md2015-03-04 
20:41:03.0 -0500
+++ gcc-4.9-20150225.pr64304//gcc/config/aarch64/aarch64.md 2015-03-04 
20:46:44.0 -0500

@@ -2719,6 +2719,8 @@
 DONE;
   }
   }
+else
+  FAIL;
   }
 )

@@ -2947,15 +2949,6 @@
   [(set_attr "type" "shift_reg")]
 )

-(define_insn "*ashl3_insn"
-  [(set (match_operand:SHORT 0 "register_operand" "=r")
-(ashift:SHORT (match_operand:SHORT 1 "register_operand" "r")
-  (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "rUss")))]
-  ""
-  "lsl\\t%0, %1, %2"
-  [(set_attr "type" "shift_reg")]
-)
-
 (define_insn "*3_insn"
   [(set (match_operand:SHORT 0 "register_operand" "=r")
 (ASHIFT:SHORT (match_operand:SHORT 1 "register_operand" "r")
diff -rupN gcc-4.9-20150225/gcc/testsuite/ChangeLog 
gcc-4.9-20150225.pr64304//gcc/testsuite/ChangeLog
--- gcc-4.9-20150225/gcc/testsuite/ChangeLog2015-03-04 
21:00:24.0 -0500
+++ gcc-4.9-20150225.pr64304//gcc/testsuite/ChangeLog2015-03-04 
21:03:21.0 -0500

@@ -1,3 +1,10 @@
+2015-03-05  Shanyao chen  
+
+Backported from mainline
+2015-01-19  Jiong Wang  
+
+* gcc.target/aarch64/pr64304.c: New testcase.
+
 2015-02-25  Kai Tietz  

 Backported from mainline
diff -rupN gcc-4.9-20150225/gcc/testsuite/gcc.target/aarch64/pr64304.c 
gcc-4.9-20150225.pr64304//gcc/testsuite/gcc.target/aarch64/pr64304.c
--- gcc-4.9-20150225/gcc/testsuite/gcc.target/aarch64/pr64304.c 
1969-12-31 19:00:00.0 -0500
+++ gcc-4.9-20150225.pr64304//gcc/testsuite/gcc.target/aarch64/pr64304.c 
2015-03-04 20:59:24.0 -0500

@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 --save-temps" } */
+
+unsigned char byte = 0;
+
+void
+set_bit (unsigned int bit, unsigned char value)
+{
+  unsigned char mask = (unsigned char) (1 << (bit & 7));
+
+  if (! value)
+byte &= (unsigned char)~mask;
+  else
+byte |= mask;
+/* { dg-final { scan-assembler "and\tw\[0-9\]+, w\[0-9\]+, 7" } } */
+}
+
+/* { dg-final { cleanup-saved-temps } } */


[PATCH] Fix PR65965

2015-05-04 Thread Richard Biener

We don't support vectorizing group stores with gaps - so the natural
thing is to just split groups at such boundaries which enables
more BB vectorization (and likely loop vectorization as well, though
that would be some weird cases I suspect).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2015-05-04  Richard Biener  

PR tree-optimization/65965
* tree-vect-data-refs.c (vect_analyze_data_ref_accesses): Split
store groups at gaps.

* gcc.dg/vect/bb-slp-33.c: New testcase.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   (revision 222758)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -2602,6 +2602,15 @@ vect_analyze_data_ref_accesses (loop_vec
  if ((init_b - init_a) % type_size_a != 0)
break;
 
+ /* If we have a store, the accesses are adjacent.  This splits
+groups into chunks we support (we don't support vectorization
+of stores with gaps).  */
+ if (!DR_IS_READ (dra)
+ && (((unsigned HOST_WIDE_INT)init_b
+ - TREE_INT_CST_LOW (DR_INIT (datarefs_copy[i-1])))
+ != type_size_a))
+   break;
+
  /* The step (if not zero) is greater than the difference between
 data-refs' inits.  This splits groups into suitable sizes.  */
  HOST_WIDE_INT step = tree_to_shwi (DR_STEP (dra));
Index: gcc/testsuite/gcc.dg/vect/bb-slp-33.c
===
--- gcc/testsuite/gcc.dg/vect/bb-slp-33.c   (revision 0)
+++ gcc/testsuite/gcc.dg/vect/bb-slp-33.c   (working copy)
@@ -0,0 +1,49 @@
+/* { dg-require-effective-target vect_int } */
+
+#include "tree-vect.h"
+
+extern void abort (void);
+
+void __attribute__((noinline,noclone))
+test(int *__restrict__ a, int *__restrict__ b)
+{
+  a[0] = b[0];
+  a[1] = b[1];
+  a[2] = b[2];
+  a[3] = b[3];
+  a[5] = 0;
+  a[6] = 0;
+  a[7] = 0;
+  a[8] = 0;
+}
+
+int main()
+{
+  int a[9];
+  int b[4];
+  b[0] = 1;
+  __asm__ volatile ("");
+  b[1] = 2;
+  __asm__ volatile ("");
+  b[2] = 3;
+  __asm__ volatile ("");
+  b[3] = 4;
+  __asm__ volatile ("");
+  a[4] = 7;
+  check_vect ();
+  test(a, b);
+  if (a[0] != 1
+  || a[1] != 2
+  || a[2] != 3
+  || a[3] != 4
+  || a[4] != 7
+  || a[5] != 0
+  || a[6] != 0
+  || a[7] != 0
+  || a[8] != 0)
+abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "slp2" { 
target { vect_element_align || vect_hw_misalign } } } } */
+/* { dg-final { cleanup-tree-dump "slp2" } } */
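
The splitting rule can be sketched outside the vectorizer as follows.  This is
only an illustration under simplifying assumptions: the store offsets are
already sorted, every element has the same size, and the function name and
signature are invented for the example rather than taken from
tree-vect-data-refs.c.

```c
#include <stddef.h>

/* Split N sorted store offsets INIT (in bytes) into groups of adjacent
   accesses.  A new group starts whenever the distance to the previous
   store is not exactly ELEM_SIZE, i.e. at a gap.  The index of the first
   store of each group is written to GROUP_START (room for N entries);
   the number of groups is returned.  */
size_t
split_store_groups (const long *init, size_t n, long elem_size,
                    size_t *group_start)
{
  size_t ngroups = 0;
  for (size_t i = 0; i < n; i++)
    if (i == 0 || init[i] - init[i - 1] != elem_size)
      group_start[ngroups++] = i;
  return ngroups;
}
```

Applied to the stores in bb-slp-33.c (a[0..3] and a[5..8] with 4-byte ints,
so offsets 0, 4, 8, 12, 20, 24, 28, 32), this yields two groups, which lines
up with the testcase expecting two SLP vectorizations.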


[PATCH] Fix PR65935

2015-05-04 Thread Richard Biener

The following fixes PR65935, where the vectorizer, after swapping SLP
operands, is confused by stmts in the IL that still have unswapped
operands.  As we already swap for different def-kinds, just swap
for the other swaps as well.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2015-05-04  Richard Biener  

PR tree-optimization/65935
* tree-vect-slp.c (vect_build_slp_tree): If we swapped operands
then make sure to apply that swapping to the IL.

* gcc.dg/vect/pr65935.c: New testcase.

Index: gcc/tree-vect-slp.c
===
*** gcc/tree-vect-slp.c (revision 222758)
--- gcc/tree-vect-slp.c (working copy)
*** vect_build_slp_tree (loop_vec_info loop_
*** 1081,1093 
dump_printf (MSG_NOTE, "%d ", j);
  }
  dump_printf (MSG_NOTE, "\n");
! /* And try again ... */
  if (vect_build_slp_tree (loop_vinfo, bb_vinfo, &child,
   group_size, max_nunits, loads,
   vectorization_factor,
!  matches, npermutes, &this_tree_size,
   max_tree_size))
{
  oprnd_info->def_stmts = vNULL;
  SLP_TREE_CHILDREN (*node).quick_push (child);
  continue;
--- 1081,1105 
dump_printf (MSG_NOTE, "%d ", j);
  }
  dump_printf (MSG_NOTE, "\n");
! /* And try again with scratch 'matches' ... */
! bool *tem = XALLOCAVEC (bool, group_size);
  if (vect_build_slp_tree (loop_vinfo, bb_vinfo, &child,
   group_size, max_nunits, loads,
   vectorization_factor,
!  tem, npermutes, &this_tree_size,
   max_tree_size))
{
+ /* ... so if successful we can apply the operand swapping
+to the GIMPLE IL.  This is necessary because for example
+vect_get_slp_defs uses operand indexes and thus expects
+canonical operand order.  */
+ for (j = 0; j < group_size; ++j)
+   if (!matches[j])
+ {
+   gimple stmt = SLP_TREE_SCALAR_STMTS (*node)[j];
+   swap_ssa_operands (stmt, gimple_assign_rhs1_ptr (stmt),
+  gimple_assign_rhs2_ptr (stmt));
+ }
  oprnd_info->def_stmts = vNULL;
  SLP_TREE_CHILDREN (*node).quick_push (child);
  continue;
Index: gcc/testsuite/gcc.dg/vect/pr65935.c
===
*** gcc/testsuite/gcc.dg/vect/pr65935.c (revision 0)
--- gcc/testsuite/gcc.dg/vect/pr65935.c (working copy)
***
*** 0 
--- 1,63 
+ /* { dg-do run } */
+ /* { dg-additional-options "-O3" } */
+ /* { dg-require-effective-target vect_double } */
+ 
+ #include "tree-vect.h"
+ 
+ extern void abort (void);
+ extern void *malloc (__SIZE_TYPE__);
+ 
+ struct site {
+ struct {
+   struct {
+   double real;
+   double imag;
+   } e[3][3];
+ } link[32];
+ double phase[32];
+ } *lattice;
+ int sites_on_node;
+ 
+ void rephase (void)
+ {
+   int i,j,k,dir;
+   struct site *s;
+   for(i=0,s=lattice;i<sites_on_node;i++,s++)
+ for(dir=0;dir<32;dir++)
+   for(j=0;j<3;j++)for(k=0;k<3;k++)
+   {
+ s->link[dir].e[j][k].real *= s->phase[dir];
+ s->link[dir].e[j][k].imag *= s->phase[dir];
+   }
+ }
+ 
+ int main()
+ {
+   int i,j,k;
+   check_vect ();
+   sites_on_node = 1;
+   lattice = malloc (sizeof (struct site) * sites_on_node);
+   for (i = 0; i < 32; ++i)
+ {
+   lattice->phase[i] = i;
+   for (j = 0; j < 3; ++j)
+   for (k = 0; k < 3; ++k)
+ {
+   lattice->link[i].e[j][k].real = 1.0;
+   lattice->link[i].e[j][k].imag = 1.0;
+   __asm__ volatile ("" : : : "memory");
+ }
+ }
+   rephase ();
+   for (i = 0; i < 32; ++i)
+ for (j = 0; j < 3; ++j)
+   for (k = 0; k < 3; ++k)
+   if (lattice->link[i].e[j][k].real != i
+   || lattice->link[i].e[j][k].imag != i)
+ abort ();
+   return 0;
+ }
+ 
+ /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "slp1" } } */
+ /* { dg-final { cleanup-tree-dump "slp1" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
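
The step the patch adds after a successful retry can be pictured with the toy
model below.  It is a sketch only: struct stmt and apply_operand_swaps are
hypothetical names, with two integer fields standing in for the rhs1/rhs2
operands of a GIMPLE assignment, and matches[] playing the role of the
per-statement flags that record which lanes were built with swapped operands.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a two-operand statement.  */
struct stmt
{
  int rhs1;
  int rhs2;
};

/* For every statement whose lane did not match in the original operand
   order, swap its two operands so that the statements themselves reflect
   the order the SLP tree was built with.  */
void
apply_operand_swaps (struct stmt *stmts, const bool *matches,
                     size_t group_size)
{
  for (size_t j = 0; j < group_size; j++)
    if (!matches[j])
      {
        int tmp = stmts[j].rhs1;
        stmts[j].rhs1 = stmts[j].rhs2;
        stmts[j].rhs2 = tmp;
      }
}
```

The recursive retry gets a scratch array instead of matches[] so that the
original flags are still intact when this loop runs.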


Re: [PATCH] Fix PR65935

2015-05-04 Thread H.J. Lu
On Mon, May 4, 2015 at 4:15 AM, Richard Biener  wrote:
>
> The following fixes PR65935 where the vectorizer is confused after
> SLP operands swapping to see the stmts in the IL with unswapped
> operands.  As we already swap for different def-kinds just swap
> for other swaps as well.
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>
> Richard.
>
> 2015-05-04  Richard Biener  
>
> PR tree-optimization/65935
> * tree-vect-slp.c (vect_build_slp_tree): If we swapped operands
> then make sure to apply that swapping to the IL.
>
> * gcc.dg/vect/pr65935.c: New testcase.
>
> Index: gcc/tree-vect-slp.c
> ===
> *** gcc/tree-vect-slp.c (revision 222758)
> --- gcc/tree-vect-slp.c (working copy)
> *** vect_build_slp_tree (loop_vec_info loop_
> *** 1081,1093 
> dump_printf (MSG_NOTE, "%d ", j);
>   }
>   dump_printf (MSG_NOTE, "\n");
> ! /* And try again ... */
>   if (vect_build_slp_tree (loop_vinfo, bb_vinfo, &child,
>group_size, max_nunits, loads,
>vectorization_factor,
> !  matches, npermutes, &this_tree_size,
>max_tree_size))
> {
>   oprnd_info->def_stmts = vNULL;
>   SLP_TREE_CHILDREN (*node).quick_push (child);
>   continue;
> --- 1081,1105 
> dump_printf (MSG_NOTE, "%d ", j);
>   }
>   dump_printf (MSG_NOTE, "\n");
> ! /* And try again with scratch 'matches' ... */
> ! bool *tem = XALLOCAVEC (bool, group_size);
>   if (vect_build_slp_tree (loop_vinfo, bb_vinfo, &child,
>group_size, max_nunits, loads,
>vectorization_factor,
> !  tem, npermutes, &this_tree_size,
>max_tree_size))
> {
> + /* ... so if successful we can apply the operand swapping
> +to the GIMPLE IL.  This is necessary because for example
> +vect_get_slp_defs uses operand indexes and thus expects
> +canonical operand order.  */
> + for (j = 0; j < group_size; ++j)
> +   if (!matches[j])
> + {
> +   gimple stmt = SLP_TREE_SCALAR_STMTS (*node)[j];
> +   swap_ssa_operands (stmt, gimple_assign_rhs1_ptr (stmt),
> +  gimple_assign_rhs2_ptr (stmt));
> + }
>   oprnd_info->def_stmts = vNULL;
>   SLP_TREE_CHILDREN (*node).quick_push (child);
>   continue;
> Index: gcc/testsuite/gcc.dg/vect/pr65935.c
> ===
> *** gcc/testsuite/gcc.dg/vect/pr65935.c (revision 0)
> --- gcc/testsuite/gcc.dg/vect/pr65935.c (working copy)
> ***
> *** 0 
> --- 1,63 
> + /* { dg-do run } */
> + /* { dg-additional-options "-O3" } */
> + /* { dg-require-effective-target vect_double } */
> +
> + #include "tree-vect.h"
> +
> + extern void abort (void);
> + extern void *malloc (__SIZE_TYPE__);
> +
> + struct site {
> + struct {
> +   struct {
> +   double real;
> +   double imag;
> +   } e[3][3];
> + } link[32];
> + double phase[32];
> + } *lattice;
> + int sites_on_node;
> +
> + void rephase (void)
> + {
> +   int i,j,k,dir;
> +   struct site *s;
> +   for(i=0,s=lattice;i<sites_on_node;i++,s++)
> + for(dir=0;dir<32;dir++)
> +   for(j=0;j<3;j++)for(k=0;k<3;k++)
> +   {
> + s->link[dir].e[j][k].real *= s->phase[dir];
> + s->link[dir].e[j][k].imag *= s->phase[dir];
> +   }
> + }
> +
> + int main()
> + {
> +   int i,j,k;
> +   check_vect ();
> +   sites_on_node = 1;
> +   lattice = malloc (sizeof (struct site) * sites_on_node);
> +   for (i = 0; i < 32; ++i)
> + {
> +   lattice->phase[i] = i;
> +   for (j = 0; j < 3; ++j)
> +   for (k = 0; k < 3; ++k)
> + {
> +   lattice->link[i].e[j][k].real = 1.0;
> +   lattice->link[i].e[j][k].imag = 1.0;
> +   __asm__ volatile ("" : : : "memory");
> + }
> + }
> +   rephase ();
> +   for (i = 0; i < 32; ++i)
> + for (j = 0; j < 3; ++j)
> +   for (k = 0; k < 3; ++k)
> +   if (lattice->link[i].e[j][k].real != i
> +   || lattice->link[i].e[j][k].imag != i)
> + abort ();
> +   return 0;
> + }
> +
> + /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "slp1" } } */
> + /* { dg-final { cleanup-tree-dump "slp1" } } */
> + /* { dg-final { cleanup-tree-dump "vect" } } */

Need for these when it is a run-time test.

-- 
H.J.


[PATCH] Remove dead code.

2015-05-04 Thread Dominik Vogt
This patch removes a "write only" variable from the C++ code.

ChangeLog:

--

2015-05-04  Dominik Vogt  

* call.c (print_z_candidates): Remove dead code.

--

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
From 6943ad84a5a5b69c7cf5df1ea5bb6ab5fd254825 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Mon, 4 May 2015 12:46:21 +0100
Subject: [PATCH] Remove dead code.

---
 gcc/cp/call.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 31d2b9c..55350f8 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -3436,7 +3436,6 @@ print_z_candidates (location_t loc, struct z_candidate *candidates)
 {
   struct z_candidate *cand1;
   struct z_candidate **cand2;
-  int n_candidates;
 
   if (!candidates)
 return;
@@ -3478,9 +3477,6 @@ print_z_candidates (location_t loc, struct z_candidate *candidates)
 	}
 }
 
-  for (n_candidates = 0, cand1 = candidates; cand1; cand1 = cand1->next)
-n_candidates++;
-
   for (; candidates; candidates = candidates->next)
 print_z_candidate (loc, "candidate:", candidates);
 }
-- 
2.3.0



Re: [rs6000] Fix compare debug failure on AIX

2015-05-04 Thread Tristan Gingold

> On 04 May 2015, at 02:32, David Edelsohn  wrote:
> 
> On Sat, May 2, 2015 at 6:04 AM, Eric Botcazou  wrote:
>>> Why should GCC unnecessarily create stack frames to avoid
>>> compare-debug testcase failures?
>> 
>> I'm not sure I understand the question... compare-debug failures are failures
>> (-g is not supposed to change the generated code and this XCOFF-specific bug
>> was reported to us) so they need to be fixed.
>> 
>> From there on, as Alan said, there are 2 cases: either AIX needs a frame for
>> debugging or it doesn't.  If the latter, then the lines can simply be 
>> deleted.
>> If the former, we have to draw a line somewhere; Alan suggests always 
>> creating
>> a frame while I suggest creating it only at -O0 and -Og.
> 
> I believe that AIX does need a frame for debugging.  I don't remember
> the exact reason off hand.
> 
> I'm sorry that XCOFF debugging changes the generated code (only in the
> sense of allocating a frame), but that is a system dependency.  It's
> been this way for over 20 years.  I see no reason to produce worse
> code at -O0 when not debugging simply to make testcases happier.
> 
> By the way, I'm still waiting for the DWARF debugging patches from
> Adacore compatible with AIX as and ld.  DWARF debugging would not
> require pushing a frame, and would resolve the failure when testing
> with DWARF.  The patch would be adjusted to only push a frame when
> writing XCOFF debugging.

Sorry, but we don’t have these patches.  We have a tiny patch to generate
DWARF debug info on XCOFF platforms, but that requires GNU as and ld.

Tristan.



Re: [PR testsuite/65205, libgomp/65993] Fix dg-shouldfail usage in OpenACC libgomp tests

2015-05-04 Thread Rainer Orth
Thomas Schwinge  writes:

> Additionally to the "%p" format specifier printing a "0x" prefix vs. not
> doing that, I've also changed the expected "(nil)" output for NULL
> pointers to instead match basically everything.

You cannot expect printf to print "(nil)" or variant for NULL pointers.
E.g. on Solaris 10 you get a SEGV instead.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: PR 64454: Improve VRP for %

2015-05-04 Thread Marc Glisse

On Mon, 4 May 2015, Richard Biener wrote:

> On Sat, May 2, 2015 at 12:46 AM, Marc Glisse  wrote:
>> Hello,
>>
>> this patch tries to tighten a bit the range estimate for x%y. slp-perm-7.c
>> started failing by vectorizing more than expected, I assumed it was a good
>> thing and updated the test. I am less conservative than Jakub with division
>> by 0, but I still don't really understand how empty ranges are supposed to
>> be represented in VRP.
>>
>> Bootstrap+testsuite on x86_64-linux-gnu.
>
> Hmm, so I don't like how you (continue to) use trees for the constant
> computations. wide-ints would be a better fit today.  I also notice that
> fold_unary_to_constant can return NULL_TREE and neither the old nor your
> code handles that.

You are right. I was lazy and tried to keep this part of the old code, I
shouldn't have...

> "empty" ranges are basically UNDEFINED.

Cool, that's what I did. But I don't see code adding calls to
__builtin_unreachable() when an empty range is detected. Maybe that almost
never happens?

> Aren't you pessimizing the case where the old code used
> value_range_nonnegative_p() by just using TYPE_UNSIGNED?

I don't think so. The old code only handled signed types in the positive
case, while I have a more complete handling of signed types, which should
do at least as well as the old one even in the positive case.
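
To make the range reasoning for x%y concrete, here is a minimal C sketch.
It is not the patch's code: struct range and mod_range are hypothetical
names, and the only assumption is C99 truncating division (the result has
the sign of x, with |x%y| < |y| and |x%y| <= |x|).

```c
#include <assert.h>

/* Hypothetical closed interval [lo, hi]; this is not GCC's value_range.  */
struct range { long lo, hi; };

/* Bound x % y for x in [x.lo, x.hi] and any y with 1 <= |y| <= ymax_abs.
   Relies on C99 semantics: the result has the sign of x, its magnitude
   is smaller than |y| and never larger than |x|.  */
static struct range
mod_range (struct range x, long ymax_abs)
{
  struct range r;
  long m = ymax_abs - 1;	/* largest possible |x % y| */

  assert (ymax_abs >= 1 && x.lo <= x.hi);
  /* Negative results need a negative x and are no smaller than
     max (-m, x.lo).  */
  r.lo = x.lo >= 0 ? 0 : (x.lo > -m ? x.lo : -m);
  /* Positive results need a positive x and are no larger than
     min (m, x.hi).  */
  r.hi = x.hi <= 0 ? 0 : (x.hi < m ? x.hi : m);
  return r;
}
```

For example, x in [-5, 100] with |y| <= 10 yields [-5, 9], and x in [3, 7]
with |y| <= 100 yields [0, 7].  The bound is sound but not always the
tightest possible one.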


--
Marc Glisse


[PATCH] Fix(?) PR66002

2015-05-04 Thread Richard Biener

This fixes a missed vectorization of a function in paq8p.  Without
merged PHI nodes phiopt doesn't recognize adjacent MIN/MAX_EXPRs.
Certainly no other pass I schedule mergephi over cares for merged
PHIs (DCE might even be confused here).
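
For illustration only, the clamping idiom involved can be sketched in C as
two equivalent functions (hypothetical names).  Whether phiopt actually
turns the first form into adjacent MIN_EXPR/MAX_EXPR depends on the pass
ordering discussed in this mail.

```c
#include <assert.h>

/* Saturate to the range of a 16-bit signed value using two branches,
   as the testcase in this patch does.  */
static int
clamp16_branches (int wt)
{
  if (wt < -32768) wt = -32768;
  if (wt > 32767) wt = 32767;
  return wt;
}

/* The same computation spelled as MIN (MAX (wt, -32768), 32767), the
   shape that can map onto vector min/max operations.  */
static int
clamp16_minmax (int wt)
{
  int lo = wt > -32768 ? wt : -32768;	/* MAX_EXPR */
  assert (lo >= -32768);
  return lo < 32767 ? lo : 32767;	/* MIN_EXPR */
}
```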

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2015-05-04  Richard Biener  

PR tree-optimization/66002
* passes.def: Schedule pass_merge_phi after VRP, right before
ifcombine and phiopt.

* gcc.dg/vect/vect-125.c: New testcase.

Index: gcc/passes.def
===
*** gcc/passes.def  (revision 222760)
--- gcc/passes.def  (working copy)
*** along with GCC; see the file COPYING3.
*** 168,174 
NEXT_PASS (pass_build_alias);
NEXT_PASS (pass_return_slot);
NEXT_PASS (pass_fre);
-   NEXT_PASS (pass_merge_phi);
NEXT_PASS (pass_vrp);
NEXT_PASS (pass_chkp_opt);
NEXT_PASS (pass_dce);
--- 168,173 
*** along with GCC; see the file COPYING3.
*** 176,181 
--- 175,181 
NEXT_PASS (pass_call_cdce);
NEXT_PASS (pass_cselim);
NEXT_PASS (pass_copy_prop);
+   NEXT_PASS (pass_merge_phi);
NEXT_PASS (pass_tree_ifcombine);
NEXT_PASS (pass_phiopt);
NEXT_PASS (pass_tail_recursion);
Index: gcc/testsuite/gcc.dg/vect/vect-125.c
===
*** gcc/testsuite/gcc.dg/vect/vect-125.c(revision 0)
--- gcc/testsuite/gcc.dg/vect/vect-125.c(working copy)
***
*** 0 
--- 1,19 
+ /* { dg-do compile } */
+ /* { dg-require-effective-target vect_int } */
+ /* { dg-require-effective-target vect_pack_trunc } */
+ /* { dg-require-effective-target vect_unpack } */
+ 
+ void train(short *t, short *w, int n, int err)
+ {
+   n=(n+7)&-8;
+   for (int i=0; i<n; ++i)
+ {
+   int wt=w[i]+((t[i]*err*2>>16)+1>>1);
+   if (wt<-32768) wt=-32768;
+   if (wt>32767) wt=32767;
+   w[i]=wt;
+ }
+ }
+ 
+ /* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail vect_no_int_max } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */


Re: [PR testsuite/65205, libgomp/65993] Fix dg-shouldfail usage in OpenACC libgomp tests

2015-05-04 Thread John David Anglin

On 2015-05-04 4:32 AM, Thomas Schwinge wrote:

Dave, would you please test the following patch, and report the
regression status compared to before r222620?  (Compared to your existing
r222021 results, as posted in the PR, for example.)

With patch, we have the following fails on hppa2.0w-hp-hpux11.11:

FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/lib-3.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 output pattern test, is
libgomp: no device found
, should match device [0-9]+\([0-9]+\) is initialized
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/lib-42.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 output pattern test, is , should match \[[0-9a-fA-FxX]+,256\] is not mapped
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/lib-62.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 output pattern test, is , should match invalid size
Running /test/gnu/gcc/gcc/libgomp/testsuite/libgomp.oacc-c++/c++.exp ...
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/lib-3.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 output pattern test, is
libgomp: no device found
, should match device [0-9]+\([0-9]+\) is initialized
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/lib-42.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 output pattern test, is , should match \[[0-9a-fA-FxX]+,256\] is not mapped
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/lib-62.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 output pattern test, is , should match invalid size


Note this is a 32-bit build and not the 64-bit build reported in PR.
However, I would expect similar printf support.  Don't have a 64-bit
build handy.

Dave

--
John David Anglin  dave.ang...@bell.net



Re: Extend verify_type to check various uses of TYPE_MINVAL

2015-05-04 Thread Rainer Orth
Jan Hubicka  writes:

> Hi,
> this patch extends verify_type to check various uses of TYPE_MINVAL. 
> I also added check that MIN_VALUE have compatible type with T:
>  useless_type_conversion_p (const_cast <tree> (t), TREE_TYPE (TYPE_MIN_VALUE (t)))
> but that one fails interesting ways for C sizetype. I will try to look
> into this and thus this patch omits it.
>
> The main motivation is to check that various frontend overrides of TYPE_MINVAL
> are under control.
>
> Bootstrapped/regtested x86_64-linux, will commit it as obvious.

Not obvious enough, it seems: this patch broke gnat.dg/lto* tests at
least on i386-pc-solaris2.10.  E.g.

FAIL: gnat.dg/lto1.adb (test for excess errors)
WARNING: gnat.dg/lto1.adb compilation failed to produce executable

FAIL: gnat.dg/lto1.adb (test for excess errors)
Excess errors:
/vol/gcc/src/hg/trunk/solaris/gcc/testsuite/gnat.dg/lto1_pkg.adb:23:1: error: 
TYPE_MIN_VALUE is not constant
 
unit size 
align 32 symtab 0 alias set -1 canonical type fea16000 precision 32 min 
 max >
   >
 
unit size 
align 32 symtab 0 alias set -1 canonical type fea16000 precision 32 min 
 max >
sizes-gimplified visited SI size  unit size 

align 32 symtab 0 alias set -1 canonical type feb67ba0 precision 32 min 
 max 
index type 
unit size 
align 8 symtab 0 alias set -1 canonical type feb67960 precision 8 
min  max 
values 
value 
chain 
value 
chain 
value 
chain  value  context 

chain >
QI size  unit size 
align 8 symtab 0 alias set -1 canonical type feb67b40 precision 8 min 
 max  RM min  RM max >
chain >

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PR testsuite/65205, libgomp/65993] Fix dg-shouldfail usage in OpenACC libgomp tests

2015-05-04 Thread Andreas Schwab
Rainer Orth  writes:

> You cannot expect printf to print "(nil)" or variant for NULL pointers.
> E.g. on Solaris 10 you get a SEGV instead.

You are probably mixing it up with %s.  %p is required to handle NULL
like any other valid pointer value.
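
A minimal C sketch of the distinction, for illustration (format_pointer is
a hypothetical helper, not something from the testsuite): a null pointer
is a valid argument for %p, and only the text it produces is
implementation-defined, whereas passing NULL for %s is undefined behavior.

```c
#include <assert.h>
#include <stdio.h>

/* Format pointer P with %p into BUF.  A null P is fine for %p; only
   the exact text is implementation-defined ("(nil)", "0x0", "0", ...).
   Passing NULL for %s, by contrast, is undefined behavior and may
   crash.  Returns what snprintf returns.  */
static int
format_pointer (char *buf, size_t size, void *p)
{
  assert (buf != NULL && size > 0);
  return snprintf (buf, size, "%p", p);
}
```

This is why an expected-output pattern that accepts any text for the
pointer is more portable than one that hard-codes "(nil)".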

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [PR testsuite/65205, libgomp/65993] Fix dg-shouldfail usage in OpenACC libgomp tests

2015-05-04 Thread Rainer Orth
Andreas Schwab  writes:

> Rainer Orth  writes:
>
>> You cannot expect printf to print "(nil)" or variant for NULL pointers.
>> E.g. on Solaris 10 you get a SEGV instead.
>
> You are probably mixing it up with %s.  %p is required to handle NULL
> like any other valid pointer value.

Seems so.  Sorry for the noise.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-04 Thread Michael Matz
Hi,

On Thu, 30 Apr 2015, Sriraman Tallam wrote:

> We noticed that one of our benchmarks sped-up by ~1% when we eliminated 
> PLT stubs for some of the hot external library functions like memcmp, 
> pow.  The win was from better icache and itlb performance. The main 
> reason was that the PLT stubs had no spatial locality with the 
> call-sites. I have started looking at ways to tell the compiler to 
> eliminate PLT stubs (in-effect inline them) for specified external 
> functions, for x86_64. I have a proposal and a patch and I would like to 
> hear what you think.
> 
> This comes with caveats.  This cannot be generally done for all 
> functions marked extern as it is impossible for the compiler to say if a 
> function is "truly extern" (defined in a shared library). If a function 
> is not truly extern(ends up defined in the final executable), then 
> calling it indirectly is a performance penalty as it could have been a 
> direct call.

This can be fixed by Alan's idea.

> Further, the newly created GOT entries are fixed up at 
> start-up and do not get lazily bound.

And this can be fixed by some enhancements in the linker and dynamic 
linker.  The idea is to still generate a PLT stub and make its GOT entry 
point to it initially (like a normal got.plt slot).  Then the first 
indirect call will use the address of PLT entry (starting lazy resolution) 
and update the GOT slot with the real address, so further indirect calls 
will directly go to the function.

This requires a new asm marker (and hence new reloc) as normally if 
there's a GOT slot it's filled by the real symbol's address, unlike if 
there's only a got.plt slot.  E.g. a

  call *foo@GOTPLT(%rip)

would generate a GOT slot (and fill its address into above call insn), but 
generate a JUMP_SLOT reloc in the final executable, not a GLOB_DAT one.
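
The intended run-time behaviour can be modelled in plain C, purely as an
illustration: a function pointer stands in for the GOT slot and an
ordinary function stands in for the PLT entry plus the dynamic linker's
lazy resolver.  All names here are hypothetical; nothing in this sketch is
real linker or loader code.

```c
#include <assert.h>

/* Stand-in for the function that lives in a shared library.  */
static int real_foo (int x) { return x + 1; }

static int foo_lazy_stub (int x);

/* Stand-in for the GOT slot: it initially points at the stub, the way
   a got.plt slot initially points back into its PLT entry.  */
static int (*foo_got_slot) (int) = foo_lazy_stub;

/* Counts how often the "dynamic linker" had to resolve foo.  */
static int foo_resolutions;

/* Stand-in for the PLT entry plus lazy resolver: look up the real
   address, patch the slot, then forward the call.  */
static int
foo_lazy_stub (int x)
{
  ++foo_resolutions;
  foo_got_slot = real_foo;
  return foo_got_slot (x);
}

/* What a call site compiled as an indirect call through the slot does.  */
static int
call_foo (int x)
{
  assert (foo_got_slot != 0);
  return foo_got_slot (x);
}
```

Only the first call_foo goes through the stub; every later call reaches
real_foo directly through the patched slot.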


Ciao,
Michael.


Re: Extend verify_type to check various uses of TYPE_MINVAL

2015-05-04 Thread Eric Botcazou
> Not obvious enough, it seems: this patch broke gnat.dg/lto* tests at
> least on i386-pc-solaris2.10.  E.g.
> 
> FAIL: gnat.dg/lto1.adb (test for excess errors)
> WARNING: gnat.dg/lto1.adb compilation failed to produce executable
> 
> FAIL: gnat.dg/lto1.adb (test for excess errors)
> Excess errors:
> /vol/gcc/src/hg/trunk/solaris/gcc/testsuite/gnat.dg/lto1_pkg.adb:23:1:
> error: TYPE_MIN_VALUE is not constant 

TYPE_MIN_VALUE can be arbitrary in Ada, with or without LTO.  For

package Q is

   function LB return Natural;
   function UB return Natural;

end Q;
with Q;

package P is

   type Arr1 is array (Natural range <>) of Boolean;

   subtype Arr2 is Arr1 (Q.LB .. Q.UB);

end P;

the TYPE_DOMAIN of Arr2 is

domain 
sizes-gimplified visited DI size  unit 
size 
align 64 symtab 0 alias set -1 canonical type 0x769be000 precision 
64 min  max 

-- 
Eric Botcazou


[Patch, fortran, 64674, v1] [OOP] ICE in ASSOCIATE with class array

2015-05-04 Thread Andre Vehreschild
Hi all,

I would like to present a first patch for using class arrays in associate. Up
to now gfortran crashed when a class array section or element was selected in
an associate. This patch fixes this for class array sections as well as for
single elements.

The story of the patch is told quite shortly: 

- parse.c::parse_associate() needs to gather more information about what the
  target is like. Previously the target's rank and array_spec were not
  computed, which disallowed the use of further array refs in the associate
  body:

    associate (vec => class_matrix(2:3, 2))
      vec(1) = ... ! <- Unclassifiable statement, because no array_spec was
                   !    attached to vec.

  This is fixed by the second hunk of the patch.

- The third hunk in primary.c prevents setting the dimension attribute on a
  class object's symbol.

- The hunks in resolve.c take care of adding dummy full array_refs and, in
  resolve_assoc_var, correct the class type when the target expression's rank
  is 0. Previously the symbol would have an array valued type when the
  target's base type was array valued. But for a scalar target this needed some
  polishing.

- Additionally a test was added.

Bootstraps and regtests ok on x86_64-linux-gnu/f21.

Ok for trunk?

Note, this patch was diffed from a trunk with my older patches for

PR65548, v3 https://gcc.gnu.org/ml/fortran/2015-04/msg00123.html and
PR44672, v5 https://gcc.gnu.org/ml/fortran/2015-04/msg00124.html

applied.

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


pr64674_1.clog
Description: Binary data
diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c
index 2c7c554..05b8d3d 100644
--- a/gcc/fortran/parse.c
+++ b/gcc/fortran/parse.c
@@ -3960,6 +3960,8 @@ parse_associate (void)
   for (a = new_st.ext.block.assoc; a; a = a->next)
 {
   gfc_symbol* sym;
+  gfc_ref *ref;
+  gfc_array_ref *array_ref;
 
   if (gfc_get_sym_tree (a->name, NULL, &a->st, false))
 	gcc_unreachable ();
@@ -3976,6 +3978,84 @@ parse_associate (void)
 	 for parsing component references on the associate-name
 	 in case of association to a derived-type.  */
   sym->ts = a->target->ts;
+
+  /* Check if the target expression is array valued.  This can not always
+	 be done by looking at target.rank, because that might not have been
+	 set yet.  Therefore traverse the chain of refs, looking for the last
+	 array ref and evaluate that.  */
+  array_ref = NULL;
+  for (ref = a->target->ref; ref; ref = ref->next)
+	if (ref->type == REF_ARRAY)
+	  array_ref = &ref->u.ar;
+  if (array_ref || a->target->rank)
+	{
+	  gfc_array_spec *as;
+	  int dim, rank = 0;
+	  if (array_ref)
+	{
+	  /* Count the dimension, that have a non-scalar extend.  */
+	  for (dim = 0; dim < array_ref->dimen; ++dim)
+		if (array_ref->dimen_type[dim] != DIMEN_ELEMENT
+		&& !(array_ref->dimen_type[dim] == DIMEN_UNKNOWN
+			 && array_ref->end[dim] == NULL
+			 && array_ref->start[dim] != NULL))
+		  ++rank;
+	}
+	  else
+	rank = a->target->rank;
+	  /* When the rank is greater than zero then sym will be an array.  */
+	  if (sym->ts.type == BT_CLASS)
+	{
+	  if ((!CLASS_DATA (sym)->as && rank != 0)
+		  || (CLASS_DATA (sym)->as
+		  && CLASS_DATA (sym)->as->rank != rank))
+		{
+		  /* Don't just (re-)set the attr and as in the sym.ts,
+		 because this modifies the target's attr and as.  Copy the
+		 data and do a build_class_symbol.  */
+		  symbol_attribute attr = CLASS_DATA (a->target)->attr;
+		  int corank = gfc_get_corank (a->target);
+		  gfc_typespec type;
+
+		  if (rank || corank)
+		{
+		  as = gfc_get_array_spec ();
+		  as->type = AS_DEFERRED;
+		  as->rank = rank;
+		  as->corank = corank;
+		  attr.dimension = rank ? 1 : 0;
+		  attr.codimension = corank ? 1 : 0;
+		}
+		  else
+		{
+		  as = NULL;
+		  attr.dimension = attr.codimension = 0;
+		}
+		  attr.class_ok = 0;
+		  type = CLASS_DATA (sym)->ts;
+		  if (!gfc_build_class_symbol (&type,
+	   &attr, &as))
+		gcc_unreachable ();
+		  sym->ts = type;
+		  sym->ts.type = BT_CLASS;
+		  sym->attr.class_ok = 1;
+		}
+	  else
+		sym->attr.class_ok = 1;
+	}
+	  else if ((!sym->as && rank != 0)
+		   || (sym->as && sym->as->rank != rank))
+	{
+	  as = gfc_get_array_spec ();
+	  as->type = AS_DEFERRED;
+	  as->rank = rank;
+	  as->corank = gfc_get_corank (a->target);
+	  sym->as = as;
+	  sym->attr.dimension = 1;
+	  if (as->corank)
+		sym->attr.codimension = 1;
+	}
+	}
 }
 
   accept_statement (ST_ASSOCIATE);
diff --git a/gcc/fortran/primary.c b/gcc/fortran/primary.c
index e9ced7e..46810de 100644
--- a/gcc/fortran/primary.c
+++ b/gcc/fortran/primary.c
@@ -1860,7 +1860,8 @@ gfc_match_varspec (gfc_expr *primary, int equiv_flag, bool sub_flag,
   if (sym->assoc && gfc_peek_ascii_char () == '('
   && !(sym->assoc->dangling && sym->assoc->st
 	   && sym->assoc->st->n.sym
-	   &

Re: [PATCH] Fix eipa_sra AAPCS issue (PR target/65956)

2015-05-04 Thread Jakub Jelinek
On Mon, May 04, 2015 at 10:11:13AM +0200, Richard Biener wrote:
> Not sure how this helps when SRA tears apart the parameter.  That is,
> isn't the important thing that both the IPA modified function argument
> types/decls have the same type as the types of the parameters SRA ends
> up passing?  (as far as alignment goes?)
> 
> Yes, of course using "natural" alignment makes sure that the backend
> can handle alignment properly and we don't run into oddball bugs here.

On IRC we were discussing making

 /* Return true if mode/type need doubleword alignment.  */
 static bool
 arm_needs_doubleword_align (machine_mode mode, const_tree type)
 {
   return (GET_MODE_ALIGNMENT (mode) > PARM_BOUNDARY
- || (type && TYPE_ALIGN (type) > PARM_BOUNDARY));
+ || (type && TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) > PARM_BOUNDARY));
 }


Looking at

struct S { char a[16]; }; 
typedef struct S T;
typedef struct S U __attribute__((aligned (16))); 
struct V { U u; T v; };
typedef int N __attribute__((aligned (16)));

T t1;
U u1;
int a[3];

void
f5 (__builtin_va_list *ap)
{
  t1 = __builtin_va_arg (*ap, T);
  a[0] = __builtin_va_arg (*ap, int);
  u1 = __builtin_va_arg (*ap, U);
  a[1] = __builtin_va_arg (*ap, int);
  a[2] = __builtin_va_arg (*ap, N);
}

void f6 (int, N, int, U);

void
f7 (void)
{
  U u = {};
  f6 (0, (N) 1, 0, u);
}

and s/16/8/g output, it seems that neither i?86 nor x86_64 care about
the alignment for any passing, ppc64le cares about aggregates, but not
scalars apparently (with a warning that the passing changed), arm cares
about both.  And the f7 function shows that for non-aggregates, what arm
does is simply never going to work, because there is no way to pass down
the scalars aligned, f6 is still called with 1 in int type rather than N.

So at least changing arm_needs_doubleword_align for non-aggregates would
likely not break anything that hasn't been broken already and would unbreak
the majority of cases.
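
As a rough illustration of why looking at the main variant changes the
answer, the sketch below compares the alignment of the typedefs from the
example above with that of their underlying types.  It assumes GCC's
__attribute__ ((aligned)) and __alignof__ extensions; the helper function
names are hypothetical, and the mapping to TYPE_ALIGN and
TYPE_MAIN_VARIANT in the comments is only approximate.

```c
#include <assert.h>
#include <stddef.h>

struct S { char a[16]; };
typedef struct S T;
typedef struct S U __attribute__ ((aligned (16)));
typedef int N __attribute__ ((aligned (16)));

/* Alignment requested through the typedef (roughly TYPE_ALIGN (type)).  */
static size_t align_of_U (void) { return __alignof__ (U); }
static size_t align_of_N (void) { return __alignof__ (N); }

/* Alignment of the underlying type (roughly what TYPE_ALIGN gives after
   TYPE_MAIN_VARIANT).  */
static size_t align_of_T (void) { return __alignof__ (T); }
static size_t align_of_int (void) { return __alignof__ (int); }

/* Nonzero if the aligned typedef asks for more than the plain type.  */
static int
typedef_raises_alignment_p (void)
{
  assert (__alignof__ (T) == __alignof__ (struct S));
  return align_of_U () > align_of_T ();
}
```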

The following testcase shows that eipa_sra changes alignment even for the
aggregates.  Change aligned (8) to aligned (4) to see another possibility.

/* PR target/65956 */

struct B { char *a, *b; };
typedef struct B C __attribute__((aligned (8)));
struct A { C a; int b; long long c; };
char v[3];

__attribute__((noinline, noclone)) void
fn1 (int v, ...)
{
  __builtin_va_list ap;
  __builtin_va_start (ap, v);
  C c, d;
  c = __builtin_va_arg (ap, C);
  __builtin_va_arg (ap, int);
  d = __builtin_va_arg (ap, C);
  __builtin_va_end (ap);
  if (c.a != &v[1] || d.a != &v[2])
__builtin_abort ();
  v[1]++;
}

__attribute__((noinline, noclone)) int
fn2 (C x)
{
  asm volatile ("" : "+g" (x.a) : : "memory");
  asm volatile ("" : "+g" (x.b) : : "memory");
  return x.a == &v[0];
}

__attribute__((noinline, noclone)) void
fn3 (const char *x)
{
  if (x[0] != 0)
__builtin_abort ();
}

static struct A
foo (const char *x, struct A y, struct A z)
{
  struct A r = { { 0, 0 }, 0, 0 };
  if (y.b && z.b)
{
  if (fn2 (y.a) && fn2 (z.a))
switch (x[0])
  {
  case '|':
break;
  default:
fn3 (x);
  }
  fn1 (0, y.a, 0, z.a);
}
  return r;
}

__attribute__((noinline, noclone)) int
bar (int x, struct A *y)
{
  switch (x)
{
case 219:
  foo ("+", y[-2], y[0]);
case 220:
  foo ("-", y[-2], y[0]);
}
}

int
main ()
{
  struct A a[3] = { { { &v[1], &v[0] }, 1, 1LL },
{ { &v[0], &v[0] }, 0, 0LL },
{ { &v[2], &v[0] }, 2, 2LL } };
  bar (220, a + 2);
  if (v[1] != 1)
__builtin_abort ();
  return 0;
}

Jakub


Re: [C++17] Implement N3928 - Extending static_assert

2015-05-04 Thread Marek Polacek
On Sat, May 02, 2015 at 04:16:18PM -0400, Ed Smith-Rowland wrote:
> This extends static_assert to not require a message string.
> I elected to make this work also for C++11 and C++14 and warn only with
> -pedantic.
> I think many people just write
>   static_assert(thing, "");
> .
> 
> I took the path of building an empty string in the parser in this case.
> I wasn't sure if setting message to NULL_TREE would cause sadness later on
> or not.
> 
> I also, perhaps in a fit of overzealousness made finish_static_assert not
> print the extra ": " and an empty message in this case.
> 
> I didn't modify _Static_assert for C.

I'm not aware of any C DR that is asking for _Static_assert (cst-expr), so
I suppose there's no need to change C at this point.

Marek


[Committed] Restore bootstrap for ARM

2015-05-04 Thread Andreas Tobler

All,

I committed the below as obvious.

Andreas

2015-05-04  Andreas Tobler  

* config/arm/arm.c: Restore bootstrap.


Index: config/arm/arm.c
===
--- config/arm/arm.c(revision 222767)
+++ config/arm/arm.c(working copy)
@@ -150,7 +150,7 @@
 static void assign_minipool_offsets (Mfix *);
 static void arm_print_value (FILE *, rtx);
 static void dump_minipool (rtx_insn *);
-static int arm_barrier_cost (rtx);
+static int arm_barrier_cost (rtx_insn *);
 static Mfix *create_fix_barrier (Mfix *, HOST_WIDE_INT);
 static void push_minipool_barrier (rtx_insn *, HOST_WIDE_INT);
 static void push_minipool_fix (rtx_insn *, HOST_WIDE_INT, rtx *,


PIC calls without PLT, generic implementation

2015-05-04 Thread Alexander Monakov
A recent post by Sriraman prompts me to post my -fno-plt approach sooner rather
than later; I was working on no-PLT PIC codegen in the last few days too.
Although I'm posting a patch series, half of it is i386 backend tuning and can
go in independently.  Except one patch where it's noted specifically, the
patches were bootstrapped and regtested together, not separately, on x86-64.
Likewise the improvement claimed below is obtained with GCC with all patches
applied, the difference being only in -fno-plt flag.

The approach taken here is different.  Instead of adjusting call expansion in
the back end, I force callee address to be loaded into a pseudo at RTL
expansion time, similar to "function CSE", which is not enabled on most
targets.  The address load (which loads from GOT) can be moved out of loops,
scheduled, or, on x86, re-fused with indirect jump by peepholes.  On 32-bit
x86, it also allows the compiler to use registers other than %ebx for GOT
pointer (which can be a win since %ebx is callee-saved).

The benefit of PLT is the possibility of lazy relocation.  It is not possible
with BIND_NOW, in particular when -z relro -z now flags were used at link time
as a security hardening measure.  Performance-critical executables do not
particularly need PLT and lazy relocation either, except if they are used very
frequently, with each individual run time extremely small -- but in that case
they can benefit massively from static linking or less massively from
prelinking, and with prelinking they can get the benefit of no-plt.

I've used LLVM/Clang to evaluate performance impact of PLT-less PIC codegen.
I configured with
  cmake -DLLVM_ENABLE_PIC=ON -DBUILD_SHARED_LIBS=ON \
  -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=OFF
from 3.6 release branch; this configuration mimics non-static build that e.g.
OpenSUSE is using, and produces Clang dependent on 112 clang/llvm shared
libraries, with roughly 24000 externally visible functions.

Without input files time is mostly spent on dynamic linking, so without
prelink there's a predictable regression, from 55 to 140 ms.  On C++ hello
world, I get:
         PLT      no-PLT   PLT+BIND_NOW
[32bit]  430 ms   535 ms   590 ms
[64bit]  410 ms   495 ms   555 ms

So no-PLT is >20% slower than default, but already >10% faster when non-lazy
binding is forced.

On tramp3d compilation with -O2 -g I get:
         PLT      no-PLT
[32bit]  49.0 s   43.3 s
[64bit]  41.6 s   36.8 s

So on long-running compiles -fno-plt is a very significant win.  Note that I'm
using Clang as (perhaps extreme) example of PIC-call-intensive code, but the
argument about -fno-plt being useful for performance should apply generally.

When looking at code size changes, there's a 1% improvement on 32-bit
libstdc++ and a small regression on 64-bit.  On LLVM/Clang, there's overall size
regression on both 32-bit and 64-bit; I've tried to analyze it and so far came
up with one possible cause, which is detailed in IRA REG_EQUIV patch.

Thanks.
Alexander


[PATCH i386] Move CLOBBERED_REGS earlier in register class list

2015-05-04 Thread Alexander Monakov
On 32-bit x86, register class CLOBBERED_REGS is a proper subset of
LEGACY_REGS, which causes IRA not to consider it separately for register
allocation, even when it has lower cost than other classes.  This patch is
useful to fix code generation problem that appears with no-PLT PIC tailcalls.

Was there a specific reason for CLOBBERED_REGS class to be listed as late as
it is?  On 32-bit this class contains only EAX, ECX, EDX.
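
The subset relation can be checked directly on the first word of the
register masks from REG_CLASS_CONTENTS.  The sketch below is only an
illustration: the MASK_* names and helper functions are hypothetical, the
values are copied from the patch, and 0x01 for AREG is inferred from
AD_REGS being 0x03 and DREG being 0x02.

```c
#include <assert.h>

/* First word of the register-set masks, as listed in
   REG_CLASS_CONTENTS in the patch.  */
enum
{
  MASK_AREG = 0x01,		/* inferred, see above */
  MASK_DREG = 0x02,
  MASK_CREG = 0x04,
  MASK_CLOBBERED_REGS = 0x07,
  MASK_Q_REGS = 0x0f,
  MASK_LEGACY_REGS = 0x1100ff
};

/* Nonzero if every register in SUB is also in SUPER.  */
static int
class_subset_p (unsigned sub, unsigned super)
{
  return (sub & ~super) == 0;
}

/* Nonzero if SUB is a subset of SUPER and the two differ.  */
static int
class_proper_subset_p (unsigned sub, unsigned super)
{
  assert (sub != 0);
  return class_subset_p (sub, super) && sub != super;
}
```

With these values CLOBBERED_REGS is exactly AREG | DREG | CREG and a
proper subset of both Q_REGS and LEGACY_REGS.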

OK?
* config/i386/i386.h (enum reg_class): Move CLOBBERED_REGS before 
Q_REGS.
(REG_CLASS_NAMES): Ditto.
(REG_CLASS_CONTENTS): Ditto.

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 1e755d3..75071ac 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1300,17 +1300,17 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 
 enum reg_class
 {
   NO_REGS,
   AREG, DREG, CREG, BREG, SIREG, DIREG,
   AD_REGS, /* %eax/%edx for DImode */
+  CLOBBERED_REGS,  /* call-clobbered integer registers */
   Q_REGS,  /* %eax %ebx %ecx %edx */
   NON_Q_REGS,  /* %esi %edi %ebp %esp */
   INDEX_REGS,  /* %eax %ebx %ecx %edx %esi %edi %ebp */
   LEGACY_REGS, /* %eax %ebx %ecx %edx %esi %edi %ebp %esp */
-  CLOBBERED_REGS,  /* call-clobbered integer registers */
   GENERAL_REGS,/* %eax %ebx %ecx %edx %esi %edi %ebp %esp
   %r8 %r9 %r10 %r11 %r12 %r13 %r14 %r15 */
   FP_TOP_REG, FP_SECOND_REG,   /* %st(0) %st(1) */
   FLOAT_REGS,
   SSE_FIRST_REG,
   NO_REX_SSE_REGS,
@@ -1361,16 +1361,16 @@ enum reg_class
 
 #define REG_CLASS_NAMES \
 {  "NO_REGS",  \
"AREG", "DREG", "CREG", "BREG", \
"SIREG", "DIREG",   \
"AD_REGS",  \
+   "CLOBBERED_REGS",   \
"Q_REGS", "NON_Q_REGS", \
"INDEX_REGS",   \
"LEGACY_REGS",  \
-   "CLOBBERED_REGS",   \
"GENERAL_REGS", \
"FP_TOP_REG", "FP_SECOND_REG",  \
"FLOAT_REGS",   \
"SSE_FIRST_REG",\
"NO_REX_SSE_REGS",  \
"SSE_REGS", \
@@ -1400,17 +1400,17 @@ enum reg_class
   { 0x02,   0x0,0x0 },   /* DREG */  \
   { 0x04,   0x0,0x0 },   /* CREG */  \
   { 0x08,   0x0,0x0 },   /* BREG */  \
   { 0x10,   0x0,0x0 },   /* SIREG */ \
   { 0x20,   0x0,0x0 },   /* DIREG */ \
   { 0x03,   0x0,0x0 },   /* AD_REGS */   \
+  { 0x07,   0x0,0x0 },   /* CLOBBERED_REGS */\
   { 0x0f,   0x0,0x0 },   /* Q_REGS */\
   { 0x1100f0,0x1fe0,0x0 },   /* NON_Q_REGS */\
   { 0x7f,0x1fe0,0x0 },   /* INDEX_REGS */\
   { 0x1100ff,   0x0,0x0 },   /* LEGACY_REGS */   \
-  { 0x07,   0x0,0x0 },   /* CLOBBERED_REGS */\
   { 0x1100ff,0x1fe0,0x0 },   /* GENERAL_REGS */  \
  { 0x100,   0x0,0x0 },   /* FP_TOP_REG */\
 { 0x0200,   0x0,0x0 },   /* FP_SECOND_REG */ \
 { 0xff00,   0x0,0x0 },   /* FLOAT_REGS */\
   { 0x20,   0x0,0x0 },   /* SSE_FIRST_REG */ \
 { 0x1fe0,  0x00,0x0 },   /* NO_REX_SSE_REGS */   \


[PATCH i386] PR65753: allow PIC tail calls via function pointers

2015-05-04 Thread Alexander Monakov
In the i386 backend, tailcalls are incorrectly disallowed in PIC mode for
calls via function pointers on the basis that indirect calls, like direct
calls, would go via PLT and thus require %ebx to point to GOT -- but that is
not true.  Quoting Rich Felker who reported the bug,

  "For PLT slots in the non-PIE main executable, %ebx is not required at all.
  PLT slots in PIE or shared libraries need %ebx, but a function pointer can
  never evaluate to such a PLT slot; it always evaluates to the nominal address
  of the function which is the same in all DSOs and therefore fundamentally
  cannot depend on the address of the GOT in the calling DSO"

As far as I can see it's simply a mistake that was there from day 1 (comment 4
in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65753 points to original patch).

Bootstrapped and regtested on 32-bit x86, OK for trunk?
(the comment before the condition will need to be adjusted too, i.e.
s/optimize any indirect call, or a direct call/optimize any direct call/ )
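
For illustration only (this is not part of the patch, and all names are
made up), the kind of call affected looks roughly like the sketch below.
Built as 32-bit PIC (something like -O2 -m32 -fPIC), the call through the
pointer is in tail position, so it should become a sibcall candidate once
indirect calls are no longer rejected:

```c
/* Illustrative sketch, not taken from the patch; names are hypothetical.  */

int
twice (int x)
{
  return 2 * x;
}

int
call_via_pointer (int (*fp) (int), int x)
{
  /* Indirect call in tail position.  For a call like this, DECL is NULL
     in ix86_function_ok_for_sibcall, which is the case the old condition
     rejected under flag_pic.  */
  return fp (x);
}
```

Whether a jmp is actually emitted still depends on the remaining checks in
ix86_function_ok_for_sibcall (stack alignment and so on).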

PR target/65753
* config/i386/i386.c (ix86_function_ok_for_sibcall): Allow PIC sibcalls
via function pointers.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 3263656..f29e053 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5448,13 +5448,13 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
   /* If we are generating position-independent code, we cannot sibcall
  optimize any indirect call, or a direct call to a global function,
  as the PLT requires %ebx be live. (Darwin does not have a PLT.)  */
   if (!TARGET_MACHO
   && !TARGET_64BIT
   && flag_pic
-  && (!decl || !targetm.binds_local_p (decl)))
+  && (decl && !targetm.binds_local_p (decl)))
 return false;
 
   /* If we need to align the outgoing stack, then sibcalling would
  unalign the stack, which may break the called function.  */
   if (ix86_minimum_incoming_stack_boundary (true)
   < PREFERRED_STACK_BOUNDARY)


[PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-04 Thread Alexander Monakov
With -fno-plt, we don't have to reject even direct calls as sibcall
candidates.

This patch depends on '-fplt' flag that is introduced in another patch.

This patch requires that with -fno-plt all sibcall candidates go through
prepare_call_address that transforms the call to a GOT lookup.

OK?
* config/i386/i386.c (ix86_function_ok_for_sibcall): Check flag_plt.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index f29e053..b734350 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5448,12 +5448,13 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
   /* If we are generating position-independent code, we cannot sibcall
  optimize any indirect call, or a direct call to a global function,
  as the PLT requires %ebx be live. (Darwin does not have a PLT.)  */
   if (!TARGET_MACHO
   && !TARGET_64BIT
   && flag_pic
+  && flag_plt
   && (decl && !targetm.binds_local_p (decl)))
 return false;
 
   /* If we need to align the outgoing stack, then sibcalling would
  unalign the stack, which may break the called function.  */
   if (ix86_minimum_incoming_stack_boundary (true)


[PATCH i386] Extend sibcall peepholes to allow source in %eax

2015-05-04 Thread Alexander Monakov
On i386, peepholes that transform memory load and register-indirect jump into
memory-indirect jump are overly restrictive in that they don't allow combining
when the jump target is loaded into %eax, and the called function returns a
value (also in %eax, so it's not dead after the call).  Fix this by checking
for same source and output register operands separately.
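
As a rough illustration (not from the patch; the names are hypothetical),
this is the shape of code I have in mind: the jump target is loaded from
memory, the call is in tail position, and the callee returns a value.
Whether the target actually ends up in %eax is up to register allocation,
so treat this as a sketch rather than a guaranteed reproducer:

```c
/* Illustrative sketch only; names are hypothetical.  */

struct ops
{
  int (*get) (int);
};

int
add_one (int x)
{
  return x + 1;
}

int
dispatch (const struct ops *o, int x)
{
  /* Load of o->get followed by an indirect tail call whose result is
     returned, i.e. a memory load plus register-indirect jump that the
     peepholes could combine into a memory-indirect jump.  */
  return o->get (x);
}
```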

OK?
* config/i386/i386.md (sibcall_value_memory): Extend peepholes to
allow memory address in %eax.
(sibcall_value_pop_memory): Likewise.

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 729db75..7f81bcc 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -11872,13 +11872,14 @@
   [(set (match_operand:W 0 "register_operand")
(match_operand:W 1 "memory_operand"))
(set (match_operand 2)
(call (mem:QI (match_dup 0))
 (match_operand 3)))]
   "!TARGET_X32 && SIBLING_CALL_P (peep2_next_insn (1))
-   && peep2_reg_dead_p (2, operands[0])"
+   && (REGNO (operands[2]) == REGNO (operands[0])
+   || peep2_reg_dead_p (2, operands[0]))"
   [(parallel [(set (match_dup 2)
   (call (mem:QI (match_dup 1))
 (match_dup 3)))
  (unspec [(const_int 0)] UNSPEC_PEEPSIB)])])
 
 (define_peephole2
@@ -11886,13 +11887,14 @@
(match_operand:W 1 "memory_operand"))
(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)
(set (match_operand 2)
(call (mem:QI (match_dup 0))
  (match_operand 3)))]
   "!TARGET_X32 && SIBLING_CALL_P (peep2_next_insn (2))
-   && peep2_reg_dead_p (3, operands[0])"
+   && (REGNO (operands[2]) == REGNO (operands[0])
+   || peep2_reg_dead_p (3, operands[0]))"
   [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)
(parallel [(set (match_dup 2)
   (call (mem:QI (match_dup 1))
 (match_dup 3)))
  (unspec [(const_int 0)] UNSPEC_PEEPSIB)])])
 
@@ -11951,13 +11953,14 @@
   (call (mem:QI (match_dup 0))
 (match_operand 3)))
  (set (reg:SI SP_REG)
   (plus:SI (reg:SI SP_REG)
(match_operand:SI 4 "immediate_operand")))])]
   "!TARGET_64BIT && SIBLING_CALL_P (peep2_next_insn (1))
-   && peep2_reg_dead_p (2, operands[0])"
+   && (REGNO (operands[2]) == REGNO (operands[0])
+   || peep2_reg_dead_p (2, operands[0]))"
   [(parallel [(set (match_dup 2)
   (call (mem:QI (match_dup 1))
 (match_dup 3)))
  (set (reg:SI SP_REG)
   (plus:SI (reg:SI SP_REG)
(match_dup 4)))
@@ -11971,13 +11974,14 @@
   (call (mem:QI (match_dup 0))
 (match_operand 3)))
  (set (reg:SI SP_REG)
   (plus:SI (reg:SI SP_REG)
(match_operand:SI 4 "immediate_operand")))])]
   "!TARGET_64BIT && SIBLING_CALL_P (peep2_next_insn (2))
-   && peep2_reg_dead_p (3, operands[0])"
+   && (REGNO (operands[2]) == REGNO (operands[0])
+   || peep2_reg_dead_p (3, operands[0]))"
   [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)
(parallel [(set (match_dup 2)
   (call (mem:QI (match_dup 1))
 (match_dup 3)))
  (set (reg:SI SP_REG)
   (plus:SI (reg:SI SP_REG)


[PATCH] Expand PIC calls without PLT with -fno-plt

2015-05-04 Thread Alexander Monakov
This patch introduces option -fno-plt that allows to expand calls that would
go via PLT to load the address of the function immediately at call site (which
introduces a GOT load).  Cover letter explains the motivation for this patch.

New option documentation for invoke.texi is missing from the patch; if this is
accepted I'll be happy to send a v2 with documentation added.

* calls.c (prepare_call_address): Transform PLT call to GOT lookup and
indirect call by forcing address into a pseudo with -fno-plt.
* common.opt (flag_plt): New option.

diff --git a/gcc/calls.c b/gcc/calls.c
index 970415d..0c3b9aa 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -222,12 +222,18 @@ prepare_call_address (tree fndecl_or_type, rtx funexp, rtx static_chain_value,
 /* If we are using registers for parameters, force the
function address into a register now.  */
 funexp = ((reg_parm_seen
   && targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
  ? force_not_mem (memory_address (FUNCTION_MODE, funexp))
  : memory_address (FUNCTION_MODE, funexp));
+  else if (flag_pic && !flag_plt && fndecl_or_type
+  && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
+  && !targetm.binds_local_p (fndecl_or_type))
+{
+  funexp = force_reg (Pmode, funexp);
+}
   else if (! sibcallp)
 {
 #ifndef NO_FUNCTION_CSE
   if (optimize && ! flag_no_function_cse)
funexp = force_reg (Pmode, funexp);
 #endif
diff --git a/gcc/common.opt b/gcc/common.opt
index b49ac46..cd8b256 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1773,12 +1773,16 @@ Common Report Var(flag_pic,1) Negative(fpie)
 Generate position-independent code if possible (small mode)
 
 fpie
 Common Report Var(flag_pie,1) Negative(fPIC)
 Generate position-independent code for executables if possible (small mode)
 
+fplt
+Common Report Var(flag_plt) Init(1)
+Use PLT for PIC calls (-fno-plt: load the address from GOT at call site)
+
 fplugin=
 Common Joined RejectNegative Var(common_deferred_options) Defer
 Specify a plugin to load
 
 fplugin-arg-
 Common Joined RejectNegative Var(common_deferred_options) Defer


[RFC PATCH] ira: accept loads via argp rtx in validate_equiv_mem

2015-05-04 Thread Alexander Monakov
With this patch at hand, I'd like to discuss a code generation problem, which
my patch solves only partially.  FWIW, it passes bootstrap/regtest on x86-64.

With other patches in series applied, GCC with -fno-plt can generate tail
calls in PIC mode more frequently, but sometimes poorer code is generated.
I've tried to look for possible causes, and found one issue so far.

Consider the following testcase:

void foo1(int a, int b, int c, int d, int e, int f, int g, int h);
int bar(int x);
void foo2(int a, int b, int c, int d, int e, int f, int g, int h)
{
  bar(a);
  foo1(a, b, c, d, e, f, g, h);
}

Comparing x86 code generation with -O2 -m32 and with/without -fPIC, you can
see that -fPIC happens to produce smaller code.  Without -fPIC, GCC
saves/restores all arguments before/after call to 'bar'.

The reason for that is without -fPIC, GCC performs tail call optimization on
'foo1', and that causes it to drop REG_EQUIV notes for incoming arguments in
fixup_tail_calls.  After that, code generation diverges at IRA stage, where
lack of equivalences prevents loads of pseudos to be moved to the point of
first use.

The patch tries to repair the problem by allowing REG_EQUIV notes to be
resynthesized at ira init for loads that happen via `argp' rtx.  It helps for
the simple testcase above, but not for problematic Clang/LLVM functions where
I noticed the issue.

I hope there's a way around the 'big hammer' approach of fixup_tail_calls.
Might it be possible instead of dropping REG_EQUIV notes, to copy incoming
arguments into other pseudos just prior to stack pointer adjustment in
preparation for tailcall?

diff --git a/gcc/ira.c b/gcc/ira.c
index ea2b69f..e6b82e2 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -3001,13 +3001,16 @@ validate_equiv_mem (rtx_insn *start, rtx reg, rtx memref)
 
   /* This used to ignore readonly memory and const/pure calls.  The problem
 is the equivalent form may reference a pseudo which gets assigned a
 call clobbered hard reg.  When we later replace REG with its
 equivalent form, the value in the call-clobbered reg has been
 changed and all hell breaks loose.  */
-  if (CALL_P (insn))
+  rtx addr = XEXP (memref, 0);
+  if (GET_CODE (addr) == PLUS && GET_CODE (XEXP (addr, 1)) == CONST_INT)
+   addr = XEXP (addr, 0);
+  if (CALL_P (insn) && addr != arg_pointer_rtx)
return 0;
 
   note_stores (PATTERN (insn), validate_equiv_mem_from_store, NULL);
 
   /* If a register mentioned in MEMREF is modified via an
 auto-increment, we lose the equivalence.  Do the same if one


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-04 Thread Xinliang David Li
The use case proposed by Sri allows user to selectively eliminate PLT
overhead for hot external calls only. In such scenarios, lazy binding
won't be something that matters to the user.

David

On Mon, May 4, 2015 at 7:45 AM, Michael Matz  wrote:
> Hi,
>
> On Thu, 30 Apr 2015, Sriraman Tallam wrote:
>
>> We noticed that one of our benchmarks sped-up by ~1% when we eliminated
>> PLT stubs for some of the hot external library functions like memcmp,
>> pow.  The win was from better icache and itlb performance. The main
>> reason was that the PLT stubs had no spatial locality with the
>> call-sites. I have started looking at ways to tell the compiler to
>> eliminate PLT stubs (in-effect inline them) for specified external
>> functions, for x86_64. I have a proposal and a patch and I would like to
>> hear what you think.
>>
>> This comes with caveats.  This cannot be generally done for all
>> functions marked extern as it is impossible for the compiler to say if a
>> function is "truly extern" (defined in a shared library). If a function
>> is not truly extern(ends up defined in the final executable), then
>> calling it indirectly is a performance penalty as it could have been a
>> direct call.
>
> This can be fixed by Alan's idea.
>
>> Further, the newly created GOT entries are fixed up at
>> start-up and do not get lazily bound.
>
> And this can be fixed by some enhancements in the linker and dynamic
> linker.  The idea is to still generate a PLT stub and make its GOT entry
> point to it initially (like a normal got.plt slot).  Then the first
> indirect call will use the address of PLT entry (starting lazy resolution)
> and update the GOT slot with the real address, so further indirect calls
> will directly go to the function.
>
> This requires a new asm marker (and hence new reloc) as normally if
> there's a GOT slot it's filled by the real symbol's address, unlike if
> there's only a got.plt slot.  E.g. a
>
>   call *foo@GOTPLT(%rip)
>
> would generate a GOT slot (and fill its address into above call insn), but
> generate a JUMP_SLOT reloc in the final executable, not a GLOB_DAT one.
>
>
> Ciao,
> Michael.


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-04 Thread Michael Matz
Hi,

On Mon, 4 May 2015, Xinliang David Li wrote:

> The use case proposed by Sri allows user to selectively eliminate PLT
> overhead for hot external calls only.

Yes, but only _because_ his approach doesn't use lazy binding.  With the 
full solution such restriction to a subset of functions isn't necessary.
And we should strive for going the full way, instead of adding hacks, 
shouldn't we?


Ciao,
Michael.


Re: [RFA] More type narrowing in match.pd V2

2015-05-04 Thread Jeff Law

On 05/02/2015 03:17 PM, Bernhard Reutner-Fischer wrote:


I should find time to commit the already approved auto-wipe dump file patch.
So let's assume I'll get to it maybe next weekend and nobody will notice the 2 
leftover .original dumps in this patch :)
Doh!  Not sure how there'd be a .original dump left lying around, but as 
posted it'll definitely leave a .optimized lying around.  I'll fix that 
before committing.


Thanks for pointing it out.

jeff


Re: [PATCH] Remove dead code.

2015-05-04 Thread Jeff Law

On 05/04/2015 05:50 AM, Dominik Vogt wrote:

This patch removes a "write only" variable from the C++ code.

ChangeLog:

--

2015-05-04  Dominik Vogt  

* call.c (print_z_candidates): Remove dead code.

OK.  Please install.

FWIW, removing a write-only variable seems like it ought to fall under 
the obvious rule.


jeff


Re: [PATCH 00/13] further rtx_insn *ification

2015-05-04 Thread Jeff Law

On 05/02/2015 03:01 PM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

Hi,

This set of patches changes rtx to rtx_insn * in many places where it's fairly
trivial to do so.

each was bootstrapped + regtested on x86_64-linux-gnu, and the series was run
through config-list.mk.  I believe this all falls under Jeff's preapproval from
last year for this sort of thing which I assume is still valid, so committing
to trunk.
And just to be explicit, it does fall under that preapproval for such 
changes.


Jeff



Re: [PATCH 1/4] libcpp: Improvements to comments in line-map.h/c

2015-05-04 Thread Jeff Law

On 05/01/2015 06:56 PM, David Malcolm wrote:

This patch updates and expands some comments in libcpp, adding
a big table to try to clarify what an individual source_location
value can mean.

libcpp/ChangeLog:
* include/line-map.h: Fix comment at the top of the file.
(source_location): Rewrite and expand the comment for this
typedef, adding an ascii-art table to clarify how source_location
values are allocated.
* line-map.c: Fix comment at the top of the file.

OK.
jeff



Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-04 Thread Xinliang David Li
yes -- a full solution that supports lazy binding will be nice.

David

On Mon, May 4, 2015 at 9:58 AM, Michael Matz  wrote:
> Hi,
>
> On Mon, 4 May 2015, Xinliang David Li wrote:
>
>> The use case proposed by Sri allows user to selectively eliminate PLT
>> overhead for hot external calls only.
>
> Yes, but only _because_ his approach doesn't use lazy binding.  With the
> full solution such restriction to a subset of functions isn't necessary.
> And we should strive for going the full way, instead of adding hacks,
> shouldn't we?
>
>
> Ciao,
> Michael.


Re: [rfc, stage 1] default to -fno-delete-null-pointer-checks on nios2-elf

2015-05-04 Thread Jeff Law

On 05/01/2015 02:33 PM, Sandra Loosemore wrote:

Re https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01510.html :

On 04/15/2015 10:42 PM, Jeff Law wrote:

It looks very sane to me.  This is probably how the AVR and CR16 should
have been handled to begin with IMHO.

FWIW, I generally discourage ports overriding default options, but this
is a case where I believe it makes some sense.

Please move forward with an official submission.


I've now bootstrapped and regression-tested the previously posted patch
on x86_64-linux-gnu, as well as retesting it on nios2-elf after updating
my source tree to current mainline head.

Are the target-independent parts OK to commit?

Yes.  Please install.

Thanks,
Jeff


Re: [PATCH] Expand PIC calls without PLT with -fno-plt

2015-05-04 Thread Jeff Law

On 05/04/2015 10:37 AM, Alexander Monakov wrote:

This patch introduces option -fno-plt that allows to expand calls that would
go via PLT to load the address of the function immediately at call site (which
introduces a GOT load).  Cover letter explains the motivation for this patch.

New option documentation for invoke.texi is missing from the patch; if this is
accepted I'll be happy to send a v2 with documentation added.

* calls.c (prepare_call_address): Transform PLT call to GOT lookup and
indirect call by forcing address into a pseudo with -fno-plt.
* common.opt (flag_plt): New option.

OK once you cobble together the invoke.texi changes.

Jeff




Re: [RFC PATCH] ira: accept loads via argp rtx in validate_equiv_mem

2015-05-04 Thread Jeff Law

On 05/04/2015 10:37 AM, Alexander Monakov wrote:

With this patch at hand, I'd like to discuss a code generation problem, which
my patch solves only partially.  FWIW, it passes bootstrap/regtest on x86-64.

With other patches in series applied, GCC with -fno-plt can generate tail
calls in PIC mode more frequently, but sometimes poorer code is generated.
I've tried to look for possible causes, and found one issue so far.

Consider the following testcase:

void foo1(int a, int b, int c, int d, int e, int f, int g, int h);
int bar(int x);
void foo2(int a, int b, int c, int d, int e, int f, int g, int h)
{
   bar(a);
   foo1(a, b, c, d, e, f, g, h);
}

Comparing x86 code generation with -O2 -m32 and with/without -fPIC, you can
see that -fPIC happens to produce smaller code.  Without -fPIC, GCC
saves/restores all arguments before/after call to 'bar'.

The reason for that is without -fPIC, GCC performs tail call optimization on
'foo1', and that causes it to drop REG_EQUIV notes for incoming arguments in
fixup_tail_calls.  After that, code generation diverges at IRA stage, where
lack of equivalences prevents loads of pseudos to be moved to the point of
first use.

The patch tries to repair the problem by allowing REG_EQUIV notes to be
resynthesized at ira init for loads that happen via `argp' rtx.  It helps for
the simple testcase above, but not for problematic Clang/LLVM functions where
I noticed the issue.

I hope there's a way around the 'big hammer' approach of fixup_tail_calls.
Might it be possible instead of dropping REG_EQUIV notes, to copy incoming
arguments into other pseudos just prior to stack pointer adjustment in
preparation for tailcall?
Isn't the whole point of dropping the notes to indicate that those 
argument slots are no longer guaranteed to hold the value at all points 
throughout the function?


That can certainly be relaxed, but you'll have to have some kind of code 
to analyze the data in the argument slots to ensure they haven't 
changed.  You can't just blindly put the notes back if I remember this 
stuff correctly.


Jeff



Re: [PATCH] Expand PIC calls without PLT with -fno-plt

2015-05-04 Thread Jakub Jelinek
On Mon, May 04, 2015 at 11:34:05AM -0600, Jeff Law wrote:
> On 05/04/2015 10:37 AM, Alexander Monakov wrote:
> >This patch introduces option -fno-plt that allows to expand calls that would
> >go via PLT to load the address of the function immediately at call site 
> >(which
> >introduces a GOT load).  Cover letter explains the motivation for this patch.
> >
> >New option documentation for invoke.texi is missing from the patch; if this 
> >is
> >accepted I'll be happy to send a v2 with documentation added.
> >
> > * calls.c (prepare_call_address): Transform PLT call to GOT lookup and
> > indirect call by forcing address into a pseudo with -fno-plt.
> > * common.opt (flag_plt): New option.
> OK once you cobble together the invoke.texi changes.

Isn't what Michael/Alan suggested better?  I mean as/ld/compiler changes to
inline the plt slot's first part, then lazy binding will work fine.

Jakub


Re: [PATCH] Expand PIC calls without PLT with -fno-plt

2015-05-04 Thread Jeff Law

On 05/04/2015 11:39 AM, Jakub Jelinek wrote:

On Mon, May 04, 2015 at 11:34:05AM -0600, Jeff Law wrote:

On 05/04/2015 10:37 AM, Alexander Monakov wrote:

This patch introduces option -fno-plt that allows to expand calls that would
go via PLT to load the address of the function immediately at call site (which
introduces a GOT load).  Cover letter explains the motivation for this patch.

New option documentation for invoke.texi is missing from the patch; if this is
accepted I'll be happy to send a v2 with documentation added.

* calls.c (prepare_call_address): Transform PLT call to GOT lookup and
indirect call by forcing address into a pseudo with -fno-plt.
* common.opt (flag_plt): New option.

OK once you cobble together the invoke.texi changes.


Isn't what Michael/Alan suggested better?  I mean as/ld/compiler changes to
inline the plt slot's first part, then lazy binding will work fine.

I must have missed Alan/Michael's message.

ISTM the win here is that by going through the GOT, you can CSE the GOT 
reference and possibly get some more register allocation freedom.  Is 
that still the case with Alan/Michael's approach?


jeff


Re: [PATCH] fixup libobjc usage of PCC_BITFIELD_TYPE_MATTERS

2015-05-04 Thread Jeff Law

On 05/01/2015 09:30 PM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

Hi,

This adds a configure check to libobjc to find out if types of bitfields affect
their layout, and uses it to replace the rather broken usage of
PCC_BITFIELD_TYPE_MATTERS.

bootstrapped + regtested x86_64-linux-gnu, bootstrapped on ppc64le-linux-gnu
and ran check-objc there without failures, and checked the correct part of the
ifdef is used on a cross to m68k-linux-elf.  ok?  I'm sure I've gotten
something wrong since this is a bunch of auto tools ;-)

Trev

libobjc/ChangeLog:

2015-05-01  Trevor Saunders  

* acinclude.m4: Include bitfields.m4.
* config.h.in: Regenerate.
* configure: Likewise.
* configure.ac: Invoke gt_BITFIELD_TYPE_MATTERS.
* encoding.c: Check HAVE_BITFIELD_TYPE_MATTERS.
OK with the general direction here.  If Jakub's test is better, then go 
with it as a follow-up.


jeff


[C++ Patch] PR 66007

2015-05-04 Thread Paolo Carlini

Hi,

unfortunately we have to return to these few lines of code :(

This regression is a more subtle variant of c++/65858: if the user 
passes -Wno-error=narrowing, the pedwarn doesn't result in an actual error 
(even though we are forcing -pedantic-errors around it) but still produces 
a warning and thus returns true, so ok isn't set to true and we have a 
miscompilation in this case too. Jakub suggested simply checking 
errorcount by hand, which passes all my tests.


Thanks,
Paolo.


/cp
2015-05-04  Paolo Carlini  
Jakub Jelinek  

PR c++/66007
* typeck2.c (check_narrowing): Check by-hand that the pedwarn didn't
result in an actual error.

/testsuite
2015-05-04  Paolo Carlini  
Jakub Jelinek  

PR c++/66007
* g++.dg/cpp0x/Wnarrowing4.C: New.
Index: cp/typeck2.c
===
--- cp/typeck2.c	(revision 222767)
+++ cp/typeck2.c	(working copy)
@@ -958,10 +958,12 @@ check_narrowing (tree type, tree init, tsubst_flag
}
   else if (complain & tf_error)
{
+ int savederrorcount = errorcount;
  global_dc->pedantic_errors = 1;
- if (!pedwarn (EXPR_LOC_OR_LOC (init, input_location), OPT_Wnarrowing,
-   "narrowing conversion of %qE from %qT to %qT "
-   "inside { }", init, ftype, type))
+ pedwarn (EXPR_LOC_OR_LOC (init, input_location), OPT_Wnarrowing,
+  "narrowing conversion of %qE from %qT to %qT "
+  "inside { }", init, ftype, type);
+ if (errorcount == savederrorcount)
ok = true;
  global_dc->pedantic_errors = flag_pedantic_errors;
}
Index: testsuite/g++.dg/cpp0x/Wnarrowing4.C
===
--- testsuite/g++.dg/cpp0x/Wnarrowing4.C	(revision 0)
+++ testsuite/g++.dg/cpp0x/Wnarrowing4.C	(working copy)
@@ -0,0 +1,14 @@
+// PR c++/66007
+// { dg-do run { target c++11 } }
+// { dg-options "-Wno-error=narrowing" }
+
+extern "C" void abort();
+
+int main()
+{
+  unsigned foo[] = { 1, -1, 3 };
+  if (foo[0] != 1 || foo[1] != __INT_MAX__ * 2U + 1 || foo[2] != 3)
+abort();
+}
+
+// { dg-prune-output "narrowing conversion" }


[PATCH] Fix ubsan non-call-exceptions ICE (PR tree-optimization/65984)

2015-05-04 Thread Jakub Jelinek
Hi!

The code I've added in r217755 was assuming that stmt_could_throw_p
memory read will always end a bb, but that is clearly not the case.
Thus, the following patch uses stmt_ends_bb_p instead.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/5?

2015-05-04  Jakub Jelinek  

PR tree-optimization/65984
* ubsan.c: Include tree-cfg.h.
(instrument_bool_enum_load): Use stmt_ends_bb_p instead of
stmt_could_throw_p test, rename can_throw variable to ends_bb.

* c-c++-common/ubsan/pr65984.c: New test.

--- gcc/ubsan.c.jj  2015-04-09 21:49:59.0 +0200
+++ gcc/ubsan.c 2015-05-04 17:17:34.273661884 +0200
@@ -87,6 +87,7 @@ along with GCC; see the file COPYING3.
 #include "builtins.h"
 #include "tree-object-size.h"
 #include "tree-eh.h"
+#include "tree-cfg.h"
 
 /* Map from a tree to a VAR_DECL tree.  */
 
@@ -1420,7 +1421,7 @@ instrument_bool_enum_load (gimple_stmt_i
   || TREE_CODE (gimple_assign_lhs (stmt)) != SSA_NAME)
 return;
 
-  bool can_throw = stmt_could_throw_p (stmt);
+  bool ends_bb = stmt_ends_bb_p (stmt);
   location_t loc = gimple_location (stmt);
   tree lhs = gimple_assign_lhs (stmt);
   tree ptype = build_pointer_type (TREE_TYPE (rhs));
@@ -1432,7 +1433,7 @@ instrument_bool_enum_load (gimple_stmt_i
   tree mem = build2 (MEM_REF, utype, gimple_assign_lhs (g),
 build_int_cst (atype, 0));
   tree urhs = make_ssa_name (utype);
-  if (can_throw)
+  if (ends_bb)
 {
   gimple_assign_set_lhs (stmt, urhs);
   g = gimple_build_assign (lhs, NOP_EXPR, urhs);
@@ -1469,7 +1470,7 @@ instrument_bool_enum_load (gimple_stmt_i
   gimple_set_location (g, loc);
   gsi_insert_after (gsi, g, GSI_NEW_STMT);
 
-  if (!can_throw)
+  if (!ends_bb)
 {
   gimple_assign_set_rhs_with_ops (&gsi2, NOP_EXPR, urhs);
   update_stmt (stmt);
--- gcc/testsuite/c-c++-common/ubsan/pr65984.c.jj   2015-05-04 14:16:59.655378975 +0200
+++ gcc/testsuite/c-c++-common/ubsan/pr65984.c  2015-05-04 17:19:55.875447821 +0200
@@ -0,0 +1,23 @@
+/* PR tree-optimization/65984 */
+/* { dg-do compile } */
+/* { dg-options "-fnon-call-exceptions -fsanitize=bool,enum" } */
+
+#ifndef __cplusplus
+#define bool _Bool
+#endif
+
+enum E { E0, E1, E2 };
+enum E e[2];
+bool *b;
+
+int
+foo (int i)
+{
+  return e[i];
+}
+
+int
+bar (int i)
+{
+  return b[i];
+}

Jakub


Re: [PATCH 2/4] libcpp: Replace macro usage with C++ constructs

2015-05-04 Thread Jeff Law

On 05/01/2015 06:56 PM, David Malcolm wrote:

libcpp makes extensive use of the C preprocessor.  Whilst this has a
pleasingly self-referential quality, I find the code hard-to-read;
implementing source location support in my JIT branch was much harder than
I felt it should have been.

In an attempt at making the code easier to follow, and to build towards
a followup patch, this patch converts most of these macros to C++
equivalents: using "const" for compile-time constants, and inline
functions where macros aren't used as lvalues.

This effectively documents the expected types of the params, and makes
them available from the debugger e.g.:

   (gdb) p LINEMAP_FILE ($3)
   $1 = 0x13b8b37 ""

and indeed the constants also:

   (gdb) p IS_ADHOC_LOC(MAX_SOURCE_LOCATION)
   $2 = false
   (gdb) p IS_ADHOC_LOC(MAX_SOURCE_LOCATION + 1)
   $3 = true

[I didn't mark the inline functions as "static"; should they be?]

[FWIW, I posted a reduced version of this patch about a year ago as:
   https://gcc.gnu.org/ml/gcc-patches/2014-05/msg01092.html
which covered a smaller subset of the macros].

libcpp/ChangeLog:
* include/line-map.h (MAX_SOURCE_LOCATION): Convert from a macro
to a const source_location.
(RESERVED_LOCATION_COUNT): Likewise.
(linemap_check_ordinary): Convert from a macro to a pair of inline
functions, for const/non-const arguments.
(MAP_START_LOCATION): Likewise.
(ORDINARY_MAP_STARTING_LINE_NUMBER): Likewise.
(ORDINARY_MAP_INCLUDER_FILE_INDEX): Likewise.
(ORDINARY_MAP_IN_SYSTEM_HEADER_P): Likewise.
(ORDINARY_MAP_NUMBER_OF_COLUMN_BITS): Convert from a macro to a
pair of inline functions, for const/non-const arguments, where the
latter is named...
(SET_ORDINARY_MAP_NUMBER_OF_COLUMN_BITS): New function.
(ORDINARY_MAP_FILE_NAME): Convert from a macro to a pair of inline
functions, for const/non-const arguments.
(MACRO_MAP_MACRO): Likewise.
(MACRO_MAP_NUM_MACRO_TOKENS): Likewise.
(MACRO_MAP_LOCATIONS): Likewise.
(MACRO_MAP_EXPANSION_POINT_LOCATION): Likewise.
(LINEMAPS_MAP_INFO): Likewise.
(LINEMAPS_MAPS): Likewise.
(LINEMAPS_ALLOCATED): Likewise.
(LINEMAPS_USED): Likewise.
(LINEMAPS_CACHE): Likewise.
(LINEMAPS_ORDINARY_CACHE): Likewise.
(LINEMAPS_MACRO_CACHE): Likewise.
(LINEMAPS_MAP_AT): Convert from a macro to an inline function.
(LINEMAPS_LAST_MAP): Likewise.
(LINEMAPS_LAST_ALLOCATED_MAP): Likewise.
(LINEMAPS_ORDINARY_MAPS): Likewise.
(LINEMAPS_ORDINARY_MAP_AT): Likewise.
(LINEMAPS_ORDINARY_ALLOCATED): Likewise.
(LINEMAPS_ORDINARY_USED): Likewise.
(LINEMAPS_LAST_ORDINARY_MAP): Likewise.
(LINEMAPS_LAST_ALLOCATED_ORDINARY_MAP): Likewise.
(LINEMAPS_MACRO_MAPS): Likewise.
(LINEMAPS_MACRO_MAP_AT): Likewise.
(LINEMAPS_MACRO_ALLOCATED): Likewise.
(LINEMAPS_MACRO_USED): Likewise.
(LINEMAPS_MACRO_LOWEST_LOCATION): Likewise.
(LINEMAPS_LAST_MACRO_MAP): Likewise.
(LINEMAPS_LAST_ALLOCATED_MACRO_MAP): Likewise.
(IS_ADHOC_LOC): Likewise.
(COMBINE_LOCATION_DATA): Likewise.
(SOURCE_LINE): Likewise.
(SOURCE_COLUMN): Likewise.
(LAST_SOURCE_LINE_LOCATION): Likewise.
(LAST_SOURCE_LINE): Likewise.
(LAST_SOURCE_COLUMN): Likewise.
(LAST_SOURCE_LINE_LOCATION)
(INCLUDED_FROM): Likewise.
(MAIN_FILE_P): Likewise.
(LINEMAP_FILE): Likewise.
(LINEMAP_LINE): Likewise.
(LINEMAP_SYSP): Likewise.
(linemap_location_before_p): Likewise.
* line-map.c (linemap_check_files_exited): Make local "map" const.
(linemap_add): Use SET_ORDINARY_MAP_NUMBER_OF_COLUMN_BITS.
(linemap_line_start): Likewise.
---
-#define MAP_START_LOCATION(MAP) (MAP)->start_location
+#if defined ENABLE_CHECKING && (GCC_VERSION >= 2007)
+
+/* Assertion macro to be used in line-map code.  */
+#define linemap_assert(EXPR)  \
+  do {\
+if (! (EXPR)) \
+  abort ();   \
+  } while (0)
+
+/* Assert that becomes a conditional expression when checking is disabled at
+   compilation time.  Use this for conditions that should not happen but if
+   they happen, it is better to handle them gracefully rather than crash
+   randomly later.
+   Usage:
+
+   if (linemap_assert_fails(EXPR)) handle_error(); */
+#define linemap_assert_fails(EXPR) __extension__ \
+  ({linemap_assert (EXPR); false;})
+
+#else
+/* Include EXPR, so that unused variable warnings do not occur.  */
+#define linemap_assert(EXPR) ((void)(0 && (EXPR)))
+#define linemap_assert_fails(EXPR) (! (EXPR))
+#endif
So if we're generally trying to get away from #define programming, then 
this part seems like a bit of a step backwards.

Re: [PATCH 3/4] libcpp/input.c: Add a way to visualize the linemaps

2015-05-04 Thread Jeff Law

On 05/01/2015 06:56 PM, David Malcolm wrote:

As a relative newcomer to GCC, one of the issues I had was
becoming comfortable with the linemap API and its internal
representation.

To familiarize myself with it, I wrote a dumping routine
to try to visualize how the source_location space is carved
up between line maps, and what each number can mean.

It struck me that this would benefit others, so this patch
adds this visualization, via an undocumented option
-fdump-locations, and adds a text file to libcpp's sources
documenting a simple example of compiling a small C file,
with a header and macro expansions (built using the
-fdump-locations option and a little hand-editing).

gcc/ChangeLog:
* common.opt (fdump-locations): New option.
* input.c: Include diagnostic-core.h.
(get_end_location): New function.
(write_digit): New function.
(write_digit_row): New function.
(dump_location_range): New function.
(dump_labelled_location_range): New function.
(dump_location_info): New function.
* input.h (dump_location_info): New prototype.
* toplev.c (compile_file): Handle flag_dump_locations.

libcpp/ChangeLog:
* include/line-map.h (source_location): Add a reference to
location-example.txt to the descriptive comment.
* location-example.txt: New file.
Maybe "dump-internal-locations"?  Not sure I want to bikeshed on the 
name any more than that.   If you feel strongly about the option name, 
then I won't stress about it.





+void
+dump_location_info (FILE *stream)
+{
+  if (0)
+line_table_dump (stream,
+line_table,
+LINEMAPS_ORDINARY_USED (line_table),
+LINEMAPS_MACRO_USED (line_table));

Should the if (0) code go away?


+
+  /* A brute-force visualization: emit a warning at every location.  */
+  if (0)
+for (source_location loc = 0; loc < line_table->highest_location; loc++)
+  warning_at (loc, 0, "this is location %i", loc);
+  /* Alternatively, we could use inform (), though this
+also shows lots of locations in stdc-predef.h */

And again.


So I think with removing the if (0) code and the possible option name 
change this is good to go.


Jeff


Re: [PATCH] Fix ubsan non-call-exceptions ICE (PR tree-optimization/65984)

2015-05-04 Thread Jeff Law

On 05/04/2015 12:16 PM, Jakub Jelinek wrote:

Hi!

The code I've added in r217755 was assuming that stmt_could_throw_p
memory read will always end a bb, but that is clearly not the case.
Thus, the following patch uses stmt_ends_bb_p instead.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/5?

2015-05-04  Jakub Jelinek  

PR tree-optimization/65984
* ubsan.c: Include tree-cfg.h.
(instrument_bool_enum_load): Use stmt_ends_bb_p instead of
stmt_could_throw_p test, rename can_throw variable to ends_bb.

* c-c++-common/ubsan/pr65984.c: New test.

OK.
Jeff



Fix PR48052: loop not vectorized if index is "unsigned int"

2015-05-04 Thread Abderrazek Zaafrani
This is an old thread and we are still running into similar issues:
Code is not being vectorized on 64-bit target due to scev not being
able to optimally analyze overflow condition.

While the original test case shown here seems to work now, it does not
work if the start value is not a constant and the loop index variable
is of unsigned type. For example:

void loop2( double const * __restrict__ x_in, double * __restrict__
x_out, double const * __restrict__ c, unsigned int N, unsigned int
start) {
 for(unsigned int i=start; i!=N; ++i)
   x_out[i] = c[i]*x_in[i];
}
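
To illustrate why the unsigned index is a problem on a 64-bit target, here is a small sketch (the function names are made up for this example). The 32-bit index wraps modulo 2^32 before it is widened for addressing, so the widened value is not simply "previous value plus one" unless the analysis can prove that no wrap occurs.

```cpp
#include <cstdint>

// Offset as the loop really computes it: increment in 32 bits
// (wrapping modulo 2^32), then zero-extend to 64 bits.
inline std::uint64_t widened_next(std::uint32_t i)
{
  return static_cast<std::uint64_t>(static_cast<std::uint32_t>(i + 1u));
}

// What a plain affine evolution {i, +, 1} in the 64-bit type would give.
inline std::uint64_t affine_next(std::uint32_t i)
{
  return static_cast<std::uint64_t>(i) + 1u;
}
```

The two functions agree for every index except UINT32_MAX, where the first yields 0 and the second yields 2^32. Ruling out that one case is what the overflow analysis has to do.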

Here is our unit test:

int foo(int* A, int* B, unsigned start, unsigned BS)
{
  int s;
  for (unsigned k = start; k < start + BS; k++)
    s += A[k] * B[k];

  return s;
}

From eedbcd1ef6a81bb9c000e0dba9ff2a6c524576ac Mon Sep 17 00:00:00 2001
From: Abderrazek Zaafrani 
Date: Mon, 4 May 2015 11:00:12 -0500
Subject: [PATCH] scev for vectorization

PR optimization/48052
* tree-ssa-loop-niter.c (variable_appears_in_loop_exit_condition): New.
(scev_probably_wraps_p): Handle unsigned convert expressions to a 
larger type
than the basic induction variable.

* gcc.dg/vect/pr48052.c: New.
---
 gcc/testsuite/gcc.dg/vect/pr48052.c | 27 
 gcc/tree-ssa-loop-niter.c   | 84 +
 2 files changed, 111 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr48052.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr48052.c 
b/gcc/testsuite/gcc.dg/vect/pr48052.c
new file mode 100644
index 000..8e406d7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr48052.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
+int foo(int* A, int* B,  unsigned start, unsigned BS)
+{
+  int s;
+  for (unsigned k = start;  k < start + BS; k++)
+{
+  s += A[k] * B[k];
+}
+
+  return s;
+}
+
+int bar(int* A, int* B, unsigned BS)
+{
+  int s;
+  for (unsigned k = 0;  k < BS; k++)
+{
+  s += A[k] * B[k];
+}
+
+  return s;
+}
+
diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 042f8df..345fb93 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -3773,6 +3773,30 @@ nowrap_type_p (tree type)
   return false;
 }
 
+/* Returns true when T appears in the exit condition of LOOP.  */
+
+static bool
+variable_appears_in_loop_exit_condition (tree t, struct loop *loop)
+{
+  struct nb_iter_bound *bound;
+
+  /* For now, we are only interested in loops with one exit condition.  */
+  if (loop->bounds == NULL || loop->bounds->next != NULL)
+  return false;
+
+  for (bound = loop->bounds; bound; bound = bound->next)
+{
+  if (gimple_code (bound->stmt) != GIMPLE_COND)
+return false;
+
+  if (t == gimple_cond_lhs(bound->stmt)
+ || t == gimple_cond_rhs(bound->stmt))
+return true;
+}
+
+  return false;
+}
+
 /* Return false only when the induction variable BASE + STEP * I is
known to not overflow: i.e. when the number of iterations is small
enough with respect to the step and initial condition in order to
@@ -3879,6 +3903,66 @@ scev_probably_wraps_p (tree base, tree step,
 
   fold_undefer_and_ignore_overflow_warnings ();
 
+  /* At this point, we could not determine that the current scalar
+ evolution composed of base and step does not overflow.  In order
+ to improve this analysis, go back to the context of this scev,
+ i.e., statement and loop, and determine from there if we can
+ deduce that there is no overflow.
+
+ We are so far interested in convert statement of this form
+
+ _1 = (some cast) I;
+
+ where I is a basic induction variable.  This case is common when
+ computing addresses for 64-bit targets.  */
+  if (loop != NULL && loop->nb_iterations != NULL && loop->bounds != NULL
+  && at_stmt != NULL && integer_onep (step))
+{
+  enum tree_code nbi_code = TREE_CODE (loop->nb_iterations);
+  enum gimple_code stmt_code = gimple_code (at_stmt);
+
+  if (nbi_code != SCEV_NOT_KNOWN && stmt_code == GIMPLE_ASSIGN)
+{
+  tree rhs1 = gimple_assign_rhs1 (at_stmt);
+  enum tree_code tree_code = gimple_assign_rhs_code (at_stmt);
+  tree rhs2 = gimple_assign_rhs2 (at_stmt);
+
+  /* If at_stmt is a convert statement: _1 = (some cast) I;  */
+  if (rhs1 != NULL && rhs2 == NULL
+  && (tree_code == CONVERT_EXPR || tree_code == NOP_EXPR))
+{
+  tree stmt_type = TREE_TYPE (gimple_assign_lhs (at_stmt));
+  int stmt_type_size = tree_to_uhwi (TYPE_SIZE(stmt_type));
+  int rhs1_type_size = tree_to_uhwi (TYPE_SIZE(TREE_TYPE(rhs1)));
+  gimple def_rhs1 = SSA_NAME_DEF_STMT (rhs1);
+
+  if (gimple_code (def_rhs1) == GIMPLE_PHI
+ && gimple_phi_num_args (def_rhs1) == 2
+ && stmt_type_size > rhs1_type_size)
+  

[PATCH, i386]: Some trivial const_wide_int/const_double related cleanups

2015-05-04 Thread Uros Bizjak
Hello!

2015-05-04  Uros Bizjak  

* config/i386/i386.c: Change GET_CODE (...) == CONST_DOUBLE check
to CONST_DOUBLE_P predicate.
(standard_sse_constant_p): Return 0 for !TARGET_SSE.
(ix86_legitimate_constant_p) <case CONST_WIDE_INT>: For 32bit targets,
allow only operands that satisfy standard_sse_constant_p predicate.
* config/i386/i386.md: Change GET_CODE (...) == CONST_DOUBLE check
to CONST_DOUBLE_P predicate.

Tested on x86_64-linux-gnu {,-m32} and committed to mainline SVN.

Uros.
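
As a toy illustration of the predicate-style check (everything prefixed toy_ or TOY_ is invented here and is much simpler than GCC's real rtx machinery), the change replaces an open-coded comparison of the code with a named predicate macro that expands to the same test:

```cpp
// Toy model only: GCC's real rtx type and code enum are far richer.
enum toy_rtx_code { TOY_CONST_INT, TOY_CONST_DOUBLE, TOY_REG };
struct toy_rtx { toy_rtx_code code; };

#define TOY_GET_CODE(X) ((X)->code)
// Predicate macro in the style of CONST_DOUBLE_P.
#define TOY_CONST_DOUBLE_P(X) (TOY_GET_CODE (X) == TOY_CONST_DOUBLE)

// Old style: compare the code by hand at every use.
inline bool is_fp_constant_old(const toy_rtx *x)
{
  return TOY_GET_CODE (x) == TOY_CONST_DOUBLE;
}

// New style: use the predicate.
inline bool is_fp_constant_new(const toy_rtx *x)
{
  return TOY_CONST_DOUBLE_P (x);
}

// Sample objects for trying the two checks.
const toy_rtx toy_double = { TOY_CONST_DOUBLE };
const toy_rtx toy_reg = { TOY_REG };
```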
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 222767)
+++ config/i386/i386.c  (working copy)
@@ -9368,7 +9368,7 @@ standard_80387_constant_p (rtx x)
 
   REAL_VALUE_TYPE r;
 
-  if (!(X87_FLOAT_MODE_P (mode) && (GET_CODE (x) == CONST_DOUBLE)))
+  if (!(CONST_DOUBLE_P (x) && X87_FLOAT_MODE_P (mode)))
 return -1;
 
   if (x == CONST0_RTX (mode))
@@ -9469,9 +9469,14 @@ standard_80387_constant_rtx (int idx)
 int
 standard_sse_constant_p (rtx x)
 {
-  machine_mode mode = GET_MODE (x);
+  machine_mode mode;
 
-  if (x == const0_rtx || x == CONST0_RTX (GET_MODE (x)))
+  if (!TARGET_SSE)
+return 0;
+
+  mode = GET_MODE (x);
+  
+  if (x == const0_rtx || x == CONST0_RTX (mode))
 return 1;
   if (vector_all_ones_operand (x, mode))
 switch (mode)
@@ -13078,9 +13083,7 @@ ix86_legitimate_constant_p (machine_mode, rtx x)
   break;
 
 case CONST_WIDE_INT:
-  if (GET_MODE (x) == TImode
- && x != CONST0_RTX (TImode)
-  && !TARGET_64BIT)
+  if (!TARGET_64BIT && !standard_sse_constant_p (x))
return false;
   break;
 
@@ -15903,7 +15906,7 @@ ix86_print_operand (FILE *file, rtx x, int code)
output_address (x);
 }
 
-  else if (GET_CODE (x) == CONST_DOUBLE && GET_MODE (x) == SFmode)
+  else if (CONST_DOUBLE_P (x) && GET_MODE (x) == SFmode)
 {
   REAL_VALUE_TYPE r;
   long l;
@@ -15921,7 +15924,7 @@ ix86_print_operand (FILE *file, rtx x, int code)
fprintf (file, "0x%08x", (unsigned int) l);
 }
 
-  else if (GET_CODE (x) == CONST_DOUBLE && GET_MODE (x) == DFmode)
+  else if (CONST_DOUBLE_P (x) && GET_MODE (x) == DFmode)
 {
   REAL_VALUE_TYPE r;
   long l[2];
@@ -15935,7 +15938,7 @@ ix86_print_operand (FILE *file, rtx x, int code)
 }
 
   /* These float cases don't actually occur as immediate operands.  */
-  else if (GET_CODE (x) == CONST_DOUBLE && GET_MODE (x) == XFmode)
+  else if (CONST_DOUBLE_P (x) && GET_MODE (x) == XFmode)
 {
   char dstr[30];
 
@@ -17364,8 +17367,7 @@ ix86_expand_move (machine_mode mode, rtx operands[
op1 = copy_to_mode_reg (mode, op1);
 
   if (can_create_pseudo_p ()
- && FLOAT_MODE_P (mode)
- && GET_CODE (op1) == CONST_DOUBLE)
+ && CONST_DOUBLE_P (op1))
{
  /* If we are loading a floating point constant to a register,
 force the value to memory now, since we'll get better code
@@ -19563,7 +19565,7 @@ ix86_expand_copysign (rtx operands[])
   else
 vmode = mode;
 
-  if (GET_CODE (op0) == CONST_DOUBLE)
+  if (CONST_DOUBLE_P (op0))
 {
   rtx (*copysign_insn)(rtx, rtx, rtx, rtx);
 
@@ -22632,7 +22634,7 @@ ix86_split_to_parts (rtx operand, rtx *parts, mach
  for (i = 1; i < size; i++)
parts[i] = adjust_address (operand, SImode, 4 * i);
}
- else if (GET_CODE (operand) == CONST_DOUBLE)
+ else if (CONST_DOUBLE_P (operand))
{
  REAL_VALUE_TYPE r;
  long l[4];
@@ -22683,7 +22685,7 @@ ix86_split_to_parts (rtx operand, rtx *parts, mach
  parts[0] = operand;
  parts[1] = adjust_address (operand, upper_mode, 8);
}
- else if (GET_CODE (operand) == CONST_DOUBLE)
+ else if (CONST_DOUBLE_P (operand))
{
  REAL_VALUE_TYPE r;
  long l[4];
@@ -41208,7 +41210,7 @@ ix86_preferred_reload_class (rtx x, reg_class_t re
 return SSE_CLASS_P (regclass) ? regclass : NO_REGS;
 
   /* Floating-point constants need more complex checks.  */
-  if (GET_CODE (x) == CONST_DOUBLE && GET_MODE (x) != VOIDmode)
+  if (CONST_DOUBLE_P (x))
 {
   /* General regs can load everything.  */
   if (reg_class_subset_p (regclass, GENERAL_REGS))
@@ -44551,9 +44553,9 @@ ix86_expand_vector_init (bool mmx_ok, rtx target,
   for (i = 0; i < n_elts; ++i)
 {
   x = XVECEXP (vals, 0, i);
-  if (!(CONST_INT_P (x)
-   || GET_CODE (x) == CONST_DOUBLE
-   || GET_CODE (x) == CONST_FIXED))
+  if (!(CONST_SCALAR_INT_P (x)
+   || CONST_DOUBLE_P (x)
+   || CONST_FIXED_P (x)))
n_var++, one_var = i;
   else if (x != CONST0_RTX (inner_mode))
all_const_zero = false;
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 222767)
+++ config/i386/i386.md (working copy)
@@ -2955,7 +2955,

Re: [RFA] More type narrowing in match.pd V2

2015-05-04 Thread H.J. Lu
I think this caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66009

H.J.


On Mon, May 4, 2015 at 2:02 AM, Richard Biener
 wrote:
> On Sat, May 2, 2015 at 2:36 AM, Jeff Law  wrote:
>> Here's an updated patch to add more type narrowing to match.pd.
>>
>> Changes since the last version:
>>
>> Slight refactoring of the condition by using types_match as suggested by
>> Richi.  I also applied the new types_match to 2 other patterns in match.pd
>> where it seemed clearly appropriate.
>>
>> Additionally the transformation is restricted by using the new single_use
>> predicate.  I didn't change other patterns in match.pd to use the new
>> single_use predicate.  But some probably could be changed.
>>
>> This (of course) continues to pass the bootstrap and regression check for
>> x86-linux-gnu.
>>
>> There's still a ton of work to do in this space.  This is meant to be an
>> incremental stand-alone improvement.
>>
>> OK now?
>
> Ok with the {gimple,generic}-match-head.c changes mentioned in the ChangeLog.
>
> Thanks,
> Richard.
>
>>
>>
>> Jeff
>>
>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>> index e006b26..5ee89de 100644
>> --- a/gcc/ChangeLog
>> +++ b/gcc/ChangeLog
>> @@ -1,3 +1,8 @@
>> +2015-05-01  Jeff Law  
>> +
>> +   * match.pd (bit_and (plus/minus (convert @0) (convert @1) mask): New
>> +   simplifier to narrow arithmetic.
>> +
>>  2015-05-01  Rasmus Villemoes  
>>
>> * match.pd: New simplification patterns.
>> diff --git a/gcc/generic-match-head.c b/gcc/generic-match-head.c
>> index daa56aa..303b237 100644
>> --- a/gcc/generic-match-head.c
>> +++ b/gcc/generic-match-head.c
>> @@ -70,4 +70,20 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "dumpfile.h"
>>  #include "generic-match.h"
>>
>> +/* Routine to determine if the types T1 and T2 are effectively
>> +   the same for GENERIC.  */
>>
>> +inline bool
>> +types_match (tree t1, tree t2)
>> +{
>> +  return TYPE_MAIN_VARIANT (t1) == TYPE_MAIN_VARIANT (t2);
>> +}
>> +
>> +/* Return if T has a single use.  For GENERIC, we assume this is
>> +   always true.  */
>> +
>> +inline bool
>> +single_use (tree t)
>> +{
>> +  return true;
>> +}
>> diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
>> index c7b2f95..dc13218 100644
>> --- a/gcc/gimple-match-head.c
>> +++ b/gcc/gimple-match-head.c
>> @@ -861,3 +861,21 @@ do_valueize (tree (*valueize)(tree), tree op)
>>return op;
>>  }
>>
>> +/* Routine to determine if the types T1 and T2 are effectively
>> +   the same for GIMPLE.  */
>> +
>> +inline bool
>> +types_match (tree t1, tree t2)
>> +{
>> +  return types_compatible_p (t1, t2);
>> +}
>> +
>> +/* Return if T has a single use.  For GIMPLE, we also allow any
>> +   non-SSA_NAME (ie constants) and zero uses to cope with uses
>> +   that aren't linked up yet.  */
>> +
>> +inline bool
>> +single_use (tree t)
>> +{
>> +  return TREE_CODE (t) != SSA_NAME || has_zero_uses (t) || has_single_use
>> (t);
>> +}
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index 87ecaf1..51a950a 100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -289,8 +289,7 @@ along with GCC; see the file COPYING3.  If not see
>>(if (((TREE_CODE (@1) == INTEGER_CST
>>  && INTEGRAL_TYPE_P (TREE_TYPE (@0))
>>  && int_fits_type_p (@1, TREE_TYPE (@0)))
>> -   || (GIMPLE && types_compatible_p (TREE_TYPE (@0), TREE_TYPE (@1)))
>> -   || (GENERIC && TREE_TYPE (@0) == TREE_TYPE (@1)))
>> +   || types_match (TREE_TYPE (@0), TREE_TYPE (@1)))
>> /* ???  This transform conflicts with fold-const.c doing
>>   Convert (T)(x & c) into (T)x & (T)c, if c is an integer
>>   constants (if x has signed type, the sign bit cannot be set
>> @@ -949,8 +948,7 @@ along with GCC; see the file COPYING3.  If not see
>>  /* Unordered tests if either argument is a NaN.  */
>>  (simplify
>>   (bit_ior (unordered @0 @0) (unordered @1 @1))
>> - (if ((GIMPLE && types_compatible_p (TREE_TYPE (@0), TREE_TYPE (@1)))
>> -  || (GENERIC && TREE_TYPE (@0) == TREE_TYPE (@1)))
>> + (if (types_match (TREE_TYPE (@0), TREE_TYPE (@1)))
>>(unordered @0 @1)))
>>  (simplify
>>   (bit_ior:c (unordered @0 @0) (unordered:c@2 @0 @1))
>> @@ -1054,7 +1052,7 @@ along with GCC; see the file COPYING3.  If not see
>> operation and convert the result to the desired type.  */
>>  (for op (plus minus)
>>(simplify
>> -(convert (op (convert@2 @0) (convert@3 @1)))
>> +(convert (op@4 (convert@2 @0) (convert@3 @1)))
>>  (if (INTEGRAL_TYPE_P (type)
>>  /* We check for type compatibility between @0 and @1 below,
>> so there's no need to check that @1/@3 are integral types.  */
>> @@ -1070,15 +1068,45 @@ along with GCC; see the file COPYING3.  If not see
>>  && TYPE_PRECISION (type) == GET_MODE_PRECISION (TYPE_MODE (type))
>>  /* The inner conversion must be a widening conversion.  */
>>  && TYPE_PRECISION (TREE_TYPE (@2)) > TYPE_PRECISION (TREE_TYPE
>> (@0))
>> -&& ((GENERIC
>> -   

Re: [C++ Patch] PR 66007

2015-05-04 Thread Jason Merrill

On 05/04/2015 01:17 PM, Paolo Carlini wrote:

This regression is a more subtle variant of c++/65858: if the user
passes -Wno-error=narrowing the pedwarn didn't result in an actual error
(even if we are forcing -pedantic-errors around it) but produces anyway
a warning, thus returns true, and ok isn't set to true, thus we have a
miscompilation in this case too. Jakub suggested simply checking by hand
errorcount, which passes all my tests.


OK.

Jason




Demangle symbols in debug assertion messages

2015-05-04 Thread François Dumont

Hi

Here is the patch to demangle symbols in debug messages. I have 
also simplified the code in formatter.h.


Here is an example of assertion message:

/home/fdt/dev/gcc/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/debug/functions.h:213:
error: function requires a valid iterator range [__first, __last).

Objects involved in the operation:
iterator "__first" @ 0x0x7fff165d68b0 {
  type = __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iteratorstd::__cxx1998::vector > >, 
std::__debug::vector > > (mutable iterator);

  state = dereferenceable;
  references sequence with type `std::__debug::vectorstd::allocator >' @ 0x0x7fff165d69d0

}
iterator "__last" @ 0x0x7fff165d68e0 {
  type = __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iteratorstd::__cxx1998::vector > >, 
std::__debug::vector > > (mutable iterator);

  state = dereferenceable;
  references sequence with type `std::__debug::vectorstd::allocator >' @ 0x0x7fff165d69d0

}


* include/debug/formatter.h (_GLIBCXX_TYPEID): New macro to simplify
usage of typeid.
(_Error_formatter::_M_print_type): New.
* src/c++11/debug.cc
(_Error_formatter::_Parameter::_M_print_field): Use latter.
(_Error_formatter::_M_print_type): Implement latter using
__cxxabiv1::__cxa_demangle to print demangled type name.

I just hope that __cxa_demangle is portable.

Ok to commit ?

François
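
For reference, a minimal standalone sketch of the demangling call (this is not the code from the patch; demangle_or_raw is a name invented for the example). It assumes a toolchain that provides <cxxabi.h>, as GCC and Clang do, and falls back to the mangled string when demangling fails:

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>
#include <typeinfo>

// Return a readable name for MANGLED, or MANGLED itself when
// demangling fails (non-zero status).
inline std::string demangle_or_raw(const char *mangled)
{
  int status = 0;
  // With a null output buffer, __cxa_demangle allocates the result
  // with malloc; the caller has to free it.
  char *demangled = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || demangled == nullptr)
    return mangled;
  std::string result(demangled);
  std::free(demangled);
  return result;
}
```

Called as demangle_or_raw (typeid (obj).name ()), this gives the kind of readable type name shown in the assertion message above.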

diff --git a/libstdc++-v3/include/debug/formatter.h b/libstdc++-v3/include/debug/formatter.h
index 6767cd9..32dcf92 100644
--- a/libstdc++-v3/include/debug/formatter.h
+++ b/libstdc++-v3/include/debug/formatter.h
@@ -31,7 +31,17 @@
 
 #include <bits/c++config.h>
 #include <bits/cpp_type_traits.h>
-#include <typeinfo>
+
+#if __cpp_rtti
+# include <typeinfo>
+# define _GLIBCXX_TYPEID(_Type) &typeid(_Type)
+#else
+namespace std
+{
+  class type_info;
+}
+# define _GLIBCXX_TYPEID(_Type) 0
+#endif
 
 namespace __gnu_debug
 {
@@ -218,21 +228,13 @@ namespace __gnu_debug
 	{
 	  _M_variant._M_iterator._M_name = __name;
 	  _M_variant._M_iterator._M_address = &__it;
-#if __cpp_rtti
-	  _M_variant._M_iterator._M_type = &typeid(__it);
-#else
-	  _M_variant._M_iterator._M_type = 0;
-#endif
+	  _M_variant._M_iterator._M_type = _GLIBCXX_TYPEID(__it);
 	  _M_variant._M_iterator._M_constness =
 	std::__are_same<_Safe_iterator<_Iterator, _Sequence>,
 			typename _Sequence::iterator>::
 	  __value ? __mutable_iterator : __const_iterator;
 	  _M_variant._M_iterator._M_sequence = __it._M_get_sequence();
-#if __cpp_rtti
-	  _M_variant._M_iterator._M_seq_type = &typeid(_Sequence);
-#else
-	  _M_variant._M_iterator._M_seq_type = 0;
-#endif
+	  _M_variant._M_iterator._M_seq_type = _GLIBCXX_TYPEID(_Sequence);
 
 	  if (__it._M_singular())
 	_M_variant._M_iterator._M_state = __singular;
@@ -256,21 +258,13 @@ namespace __gnu_debug
 	{
 	  _M_variant._M_iterator._M_name = __name;
 	  _M_variant._M_iterator._M_address = &__it;
-#if __cpp_rtti
-	  _M_variant._M_iterator._M_type = &typeid(__it);
-#else
-	  _M_variant._M_iterator._M_type = 0;
-#endif
+	  _M_variant._M_iterator._M_type = _GLIBCXX_TYPEID(__it);
 	  _M_variant._M_iterator._M_constness =
 	std::__are_same<_Safe_local_iterator<_Iterator, _Sequence>,
 			typename _Sequence::local_iterator>::
 	  __value ? __mutable_iterator : __const_iterator;
 	  _M_variant._M_iterator._M_sequence = __it._M_get_sequence();
-#if __cpp_rtti
-	  _M_variant._M_iterator._M_seq_type = &typeid(_Sequence);
-#else
-	  _M_variant._M_iterator._M_seq_type = 0;
-#endif
+	  _M_variant._M_iterator._M_seq_type = _GLIBCXX_TYPEID(_Sequence);
 
 	  if (__it._M_singular())
 	_M_variant._M_iterator._M_state = __singular;
@@ -291,11 +285,7 @@ namespace __gnu_debug
 	{
 	  _M_variant._M_iterator._M_name = __name;
 	  _M_variant._M_iterator._M_address = &__it;
-#if __cpp_rtti
-	  _M_variant._M_iterator._M_type = &typeid(__it);
-#else
-	  _M_variant._M_iterator._M_type = 0;
-#endif
+	  _M_variant._M_iterator._M_type = _GLIBCXX_TYPEID(__it);
 	  _M_variant._M_iterator._M_constness = __mutable_iterator;
 	  _M_variant._M_iterator._M_state = __it? __unknown_state : __singular;
 	  _M_variant._M_iterator._M_sequence = 0;
@@ -308,11 +298,7 @@ namespace __gnu_debug
 	{
 	  _M_variant._M_iterator._M_name = __name;
 	  _M_variant._M_iterator._M_address = &__it;
-#if __cpp_rtti
-	  _M_variant._M_iterator._M_type = &typeid(__it);
-#else
-	  _M_variant._M_iterator._M_type = 0;
-#endif
+	  _M_variant._M_iterator._M_type = _GLIBCXX_TYPEID(__it);
 	  _M_variant._M_iterator._M_constness = __const_iterator;
 	  _M_variant._M_iterator._M_state = __it? __unknown_state : __singular;
 	  _M_variant._M_iterator._M_sequence = 0;
@@ -325,11 +311,7 @@ namespace __gnu_debug
 	{
 	  _M_variant._M_iterator._M_name = __name;
 	  _M_variant._M_iterator._M_address = &__it;
-#if __cpp_rtti
-	  _M_variant._M_iterator._M_type = &typeid(__it);
-#else
-	  _M_variant._M_iterator._M_type = 0;
-#endif
+	  _M_variant._M_iterator._M_type = _GLIBCXX_TYPEID(__it);
 	  _M_variant._M_iterator._M_constness = __unknown_constness;
 	  _M

Re: [PATCH, RFC]: Next stage1, refactoring: propagating rtx subclasses

2015-05-04 Thread Mikhail Maltsev
(the original message was bounced by the mailing list, resending with
compressed attachment)

On 30.04.2015 8:00, Jeff Law wrote:
> 
> Can you please check the changes to do_jump_1, the indention looked 
> weird in the patch.  If it's correct, just say so.
It is ok. Probably that's because the surrounding code is indented with
spaces.

> The definition of PEEP2_EOB looks wrong.  I don't see how you can
> safely cast pc_rtx to an rtx_insn * since it's an RTX rather than rtx
> chain object.  Maybe you're getting away with it because it's used as
> marker. But it still feels wrong.
Yes, FWIW, it is only needed for assertions in peep2_regno_dead_p and
peep2_reg_dead_p which check it against NULL (they are intended to
verify that live_before field in peep2_insn_data struct is valid). At
least, when I removed the assertions and changed PEEP2_EOB to NULL (as
an experiment), the testsuite passed without regressions.

> You'd probably be better off creating a unique rtx_insn * object and
> using that as the marker.
OK. Fixed the patch. Rebased and tested on x86_64-linux (fortunately, it
did not conflict with Trevor's series of rtx_insn-related patches).

I'm trying to continue and the next patch (peep_split.patch,
peep_split.cl) is addressing the same task in some of the generated code
(namely, gen_peephole2_* and gen_split_* series of functions).

> If you're going to continue this work, you should probably get
> write-after-approval access so that you can commit your own approved
> changes.
Is it OK to mention you as a maintainer who can approve my request for
write access?

-- 
Regards,
Mikhail Maltsev



as_insn.tar.gz
Description: GNU Zip compressed data


Re: [PATCH 4/4] Replace line_map union with C++ class hierarchy

2015-05-04 Thread Jeff Law

On 05/01/2015 06:56 PM, David Malcolm wrote:

This patch eliminates the union in struct line_map in favor of
a simple class hierarchy, making struct line_map a base class,
with line_map_ordinary and line_map_macro subclasses.

The patch eliminates all usage of linemap_check_ordinary and
linemap_check_macro from line-map.h, updating return types and
signatures throughout libcpp and gcc's usage of it to use the
appropriate subclasses.

This moves the checking of linemap kind from run-time to
compile-time, and also implicitly documents everywhere where
the code is expecting an ordinary map vs a macro map vs
either kind of map.  I believe it makes the code significantly
simpler: most of the accessor functions in line-map.h become
trivial field-lookups.
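
A much-reduced sketch of the idea follows. The struct names match the patch, but the fields and helper functions are illustrative assumptions, not the real libcpp definitions. An accessor that only makes sense for one kind of map says so in its parameter type, so passing the wrong kind fails to compile instead of failing a run-time check.

```cpp
// Base class: data common to both kinds of map.
struct line_map { unsigned start_location; };

// Illustrative fields only; the real structs carry more.
struct line_map_ordinary : line_map { const char *to_file; unsigned to_line; };
struct line_map_macro : line_map { unsigned n_tokens; };

// Only meaningful for ordinary maps: a line_map_macro * is rejected
// by the compiler.
inline unsigned map_line(const line_map_ordinary *ord_map)
{
  return ord_map->to_line;
}

// Only meaningful for macro maps.
inline unsigned map_token_count(const line_map_macro *macro_map)
{
  return macro_map->n_tokens;
}

// Valid for either kind of map.
inline unsigned map_start(const line_map *map)
{
  return map->start_location;
}

// Hypothetical helper to build an ordinary map for experiments.
inline line_map_ordinary make_ordinary(unsigned start, const char *file,
                                       unsigned line)
{
  line_map_ordinary m;
  m.start_location = start;
  m.to_file = file;
  m.to_line = line;
  return m;
}
```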

I attempted to use templates for maps_info, but was stymied by
gengtype, so in the end I simply split it manually into
maps_info_ordinary and maps_info_macro.  In theory it's just
a vec<>, but vec.h is in gcc, and thus not available
for use from libcpp.

In a similar vein, gcc/is-a.h is presumably not usable
from within libcpp.  If it were, there would be the following
rough equivalences:

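---------------------------------  ---------------------------------
line-map.h                         is-a.h
---------------------------------  ---------------------------------
linemap_check_ordinary (m)         as_a <line_map_ordinary *> (m)
linemap_check_macro (m)            as_a <line_map_macro *> (m)
linemap_macro_expansion_map_p (m)  (M ? is_a <line_map_macro *> (m)
                                      : false)
---------------------------------  ---------------------------------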

There are numerous places in libcpp that offset a
line_map * using array notation to get the next/prev line_map of the
same kind, e.g.:
MAP_START_LOCATION (&cached[1])
which breaks due to the different sizes of line_map vs its subclasses.

On x86_64 host, before:
(gdb) p sizeof(line_map)
$1 = 40

after:
(gdb) p sizeof(line_map)
$1 = 8
(gdb) p sizeof(line_map_ordinary)
$2 = 32
(gdb) p sizeof(line_map_macro)
$3 = 40

Tracking down all of these array-based offsets to use a pointer to the
appropriate subclass (and thus use the correct offset) was rather
involved, but I believe the patch fixes them all now.
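
The stride problem can be sketched as follows, using toy types named for this example rather than the real libcpp structs. Indexing an array of subclass objects has to go through a pointer to the subclass; a pointer to the base class would step by sizeof of the base and land in the middle of an element.

```cpp
#include <cstddef>

// Toy types; the real line_map structs have more fields.
struct base_map { unsigned start_location; };
struct ordinary_map : base_map { const char *file; unsigned line; };

// Correct: the pointer type is the subclass, so &m[1] advances by
// sizeof (ordinary_map).
inline const ordinary_map *next_ordinary(const ordinary_map *m)
{
  return &m[1];
}

// The stride a base_map * would use, versus the real element size.
inline std::size_t base_stride() { return sizeof(base_map); }
inline std::size_t element_size() { return sizeof(ordinary_map); }
```

Because base_stride () is smaller than element_size (), an expression like &cached[1] is only right when cached already has the subclass type.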

(the patch thus also gives a very modest saving of 8 bytes per ordinary
line map).

I've tried to use the naming convention "ord_map" and "macro_map"
whenever the typesystem ensures we're dealing with such a map,
wherever this is doable without needing to touch lines of code that
would otherwise not need touching by the patch.

gcc/ChangeLog:
* diagnostic.c (diagnostic_report_current_module): Strengthen
local "new_map" from const line_map * to
const line_map_ordinary *.
* genmatch.c (error_cb): Likewise for local "map".
(output_line_directive): Likewise for local "map".
* input.c (expand_location_1): Likewise for local "map".
Pass NULL rather than &map to
linemap_unwind_to_first_non_reserved_loc, since the value is never
read from there, and the value written back not read from here.
(is_location_from_builtin_token): Strengthen local "map" from
const line_map * to const line_map_ordinary *.
(dump_location_info): Strengthen locals "map" from
line_map *, one to const line_map_ordinary *, the other
to const line_map_macro *.
* tree-diagnostic.c (loc_map_pair): Strengthen field "map" from
const line_map * to const line_map_macro *.
(maybe_unwind_expanded_macro_loc): Add a call to
linemap_check_macro when writing to the "map" field of the
loc_map_pair.
Introduce local const line_map_ordinary * "ord_map", using it in
place of "map" in the part of the function where we know we have
an ordinary map.  Strengthen local "m" from const line_map * to
const line_map_ordinary *.

gcc/ada/ChangeLog:
* gcc-interface/trans.c (Sloc_to_locus1): Strengthen local "map"
from line_map * to line_map_ordinary *.

gcc/c-family/ChangeLog:
* c-common.h (fe_file_change): Strengthen param from
const line_map * to const line_map_ordinary *.
(pp_file_change): Likewise.
* c-lex.c (fe_file_change): Likewise.
(cb_define): Use linemap_check_ordinary when invoking
SOURCE_LINE.
(cb_undef): Likewise.
* c-opts.c (c_finish_options): Use linemap_check_ordinary when
invoking cb_file_change.
(c_finish_options): Likewise.
(push_command_line_include): Likewise.
(cb_file_change): Strengthen param "new_map" from
const line_map * to const line_map_ordinary *.
* c-ppoutput.c (cb_define): Likewise for local "map".
(pp_file_change): Likewise for param "map" and local "from".

gcc/fortran/ChangeLog:
* cpp.c (maybe_print_line): Strengthen local "map" from
const line_map * to const line_map_ordinary *.
(cb_file_change): Likewise for param "map" and local "from".

Re: [PING 2][PATCH] libgcc: Add CFI directives to the soft floating point support code for ARM

2015-05-04 Thread Martin Galvan
Hi Ramana! Sorry to bother, but I looked at the repository and didn't
see this committed. As I don't have write access could you please
commit this for me?

Thanks a lot!

On Tue, Apr 28, 2015 at 2:07 PM, Martin Galvan
 wrote:
> Thanks a lot. I don't have write access to the repository, could you
> commit this for me?
>
> On Tue, Apr 28, 2015 at 1:21 PM, Ramana Radhakrishnan
>  wrote:
>> On Tue, Apr 28, 2015 at 4:19 PM, Martin Galvan
>>  wrote:
>>> This patch adds CFI directives to the soft floating point support code for 
>>> ARM.
>>>
>>> Previously, if we tried to do a backtrace from that code in a debug session 
>>> we'd
>>> get something like this:
>>>
>>> (gdb) bt
>>> #0  __nedf2 () at 
>>> ../../../../../../gcc-4.9.2/libgcc/config/arm/ieee754-df.S:1082
>>> #1  0x0db6 in __aeabi_cdcmple () at 
>>> ../../../../../../gcc-4.9.2/libgcc/config/arm/ieee754-df.S:1158
>>> #2  0xf5c28f5c in ?? ()
>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>>>
>>> Now we'll get something like this:
>>>
>>> (gdb) bt
>>> #0  __nedf2 () at 
>>> ../../../../../../gcc-4.9.2/libgcc/config/arm/ieee754-df.S:1156
>>> #1  0x0db6 in __aeabi_cdcmple () at 
>>> ../../../../../../gcc-4.9.2/libgcc/config/arm/ieee754-df.S:1263
>>> #2  0x0dc8 in __aeabi_dcmpeq () at 
>>> ../../../../../../gcc-4.9.2/libgcc/config/arm/ieee754-df.S:1285
>>> #3  0x0504 in main ()
>>>
>>> I have a company-wide copyright assignment. I don't have commit access, 
>>> though, so it would be great if anyone could commit this for me.
>>>
>>> Thanks a lot!
>>>
>>
>> this is OK , thanks. Sorry about the delay in reviewing this.
>>
>> Ramana


Re: [PATCH, RFC]: Next stage1, refactoring: propagating rtx subclasses

2015-05-04 Thread Trevor Saunders
> OK. Fixed the patch. Rebased and tested on x86_64-linux (fortunately, it
> did not conflict with Trevor's series of rtx_insn-related patches).

good :) fwiw I have another series that'll probably be ready about the
end of the week (the punishment for writing small patches is making the
testing box spin for days ;-)

> I'm trying to continue and the next patch (peep_split.patch,
> peep_split.cl) is addressing the same task in some of the generated code
> (namely, gen_peephole2_* and gen_split_* series of functions).

ok, I've stayed away from the generators and just done more "trivial"
changes of rtx -> rtx_insn * in arguments.

Trev

> > If you're going to continue this work, you should probably get
> > write-after-approval access so that you can commit your own approved
> > changes.
> Is it OK to mention you as a maintainer who can approve my request for
> write access?
> 
> -- 
> Regards,
> Mikhail Maltsev
> 




Re: [patch] Perform anonymous constant propagation during inlining

2015-05-04 Thread Eric Botcazou
> 2015-05-01  Eric Botcazou  
> 
>   * expr.c (expand_expr_real_1) : Try to substitute constants
>   on the RHS of expressions.
>   * gimple-expr.h (is_gimple_constant): Reorder.

Bummer.  This breaks C++ debugging:

+FAIL: gdb.cp/class2.exp: print alpha at marker return 0
+FAIL: gdb.cp/class2.exp: print beta at marker return 0
+FAIL: gdb.cp/class2.exp: print * aap at marker return 0
+FAIL: gdb.cp/class2.exp: print * bbp at marker return 0
+FAIL: gdb.cp/class2.exp: print * abp at marker return 0, s-p-o off
+FAIL: gdb.cp/class2.exp: print * (B *) abp at marker return 0
+FAIL: gdb.cp/class2.exp: p acp
+FAIL: gdb.cp/class2.exp: p acp->c1
+FAIL: gdb.cp/class2.exp: p acp->c2

because C++ is apparently relying on the assignment to the anonymous return 
object to preserve the debug info attached to a return statement.

Would you be OK with a slight variation of your earlier idea, i.e. calling 
fold_stmt with a specific valueizer from fold_marked_statements instead of the 
implicit no_follow_ssa_edges in the inliner?  Something like:

tree
follow_anonymous_single_use_edges (tree val)
{
  if (TREE_CODE (val) == SSA_NAME
      && (!SSA_NAME_VAR (val) || DECL_IGNORED_P (SSA_NAME_VAR (val)))
      && has_single_use (val))
    return val;
  return NULL_TREE;
}

-- 
Eric Botcazou


Re: [PATCH/libiberty] fix build of gdb/binutils with clang.

2015-05-04 Thread Yunlian Jiang
There was a similar discussion here
https://gcc.gnu.org/ml/gcc/2005-11/msg01190.html

The problem is that at the configure stage _GNU_SOURCE is not defined,
so configure cannot find the declaration of asprintf, and libiberty.h
then declares asprintf itself.  floatformat.c, however, is compiled
with _GNU_SOURCE defined, so it picks up another asprintf from
/usr/include/bits/stdio2.h, and it also includes libiberty.h.  The two
asprintf declarations conflict when __USE_FORTIFY_LEVEL is set.

On Sat, May 2, 2015 at 11:58 AM, Ian Lance Taylor  wrote:
> On Fri, May 1, 2015 at 4:45 PM, Yunlian Jiang  wrote:
>> The test case does not have #define _GNU_SOURCE, so it says
>> error: ‘asprintf’ undeclared (first use in this function)
>
> OK, then my next question is: why does the test case (I assume you
> mean the test case for whether to set HAVE_DECL_ASPRINTF) not have
> #define _GNU_SOURCE?
>
> What is the background here?
>
> Ian
>
>> On Fri, May 1, 2015 at 3:45 PM, Ian Lance Taylor  wrote:
>>> On Tue, Apr 28, 2015 at 2:59 PM, Yunlian Jiang  wrote:
 I believe this is the same problem as
 https://gcc.gnu.org/ml/gcc-patches/2008-07/msg00292.html

 The asprintf declaration is messed up when using clang to build gdb.

 diff --git a/include/libiberty.h b/include/libiberty.h
 index b33dd65..a294903 100644
 --- a/include/libiberty.h
 +++ b/include/libiberty.h
 @@ -625,8 +625,10 @@ extern int pwait (int, int *, int);
  /* Like sprintf but provides a pointer to malloc'd storage, which must
 be freed by the caller.  */

 +#ifndef asprintf
  extern int asprintf (char **, const char *, ...) ATTRIBUTE_PRINTF_2;
  #endif
 +#endif

  /* Like asprintf but allocates memory without fail. This works like
 xmalloc.  */
>>>
>>> Why is HAVE_DECL_ASPRINTF not defined?
>>>
>>> Ian


[patch committed SH] Fix PR target/65987

2015-05-04 Thread Kaz Kojima
I've committed the attached patch to fix PR target/65987
which is a 6 regression.  The recent stdarg change reveals
the target problem for section crossing jumps.
Some SH specific jump optimizations don't take into account
such jumps.  The attached patch is a minimal fix to solve
the above PR.  Tested on sh4-unknown-linux-gnu.

Regards,
kaz
--
2015-05-04  Kaz Kojima  

PR target/65987
* config/sh/sh.c (output_far_jump): Take into account crossing jumps.
(split_branches): Likewise.

diff --git a/config/sh/sh.c b/config/sh/sh.c
index 1cf6ed0..a4c9c4c 100644
--- a/config/sh/sh.c
+++ b/config/sh/sh.c
@@ -2747,7 +2747,8 @@ output_far_jump (rtx_insn *insn, rtx op)
 
   if (TARGET_SH2
   && offset >= -32764
-  && offset - get_attr_length (insn) <= 32766)
+  && offset - get_attr_length (insn) <= 32766
+  && ! CROSSING_JUMP_P (insn))
 {
   far = 0;
   jump =   "mov.w  %O0,%1" "\n"
@@ -6753,6 +6754,13 @@ split_branches (rtx_insn *first)
 
if (type == TYPE_JUMP)
  {
+   if (CROSSING_JUMP_P (insn))
+ {
+   emit_insn_before (gen_block_branch_redirect (const0_rtx),
+ insn);
+   continue;
+ }
+
 far_label = as_a <rtx_code_label *> (
  XEXP (SET_SRC (PATTERN (insn)), 0));
dest_uid = get_dest_uid (far_label, max_uid);


match.pd patch reverted

2015-05-04 Thread Jeff Law


I've reverted my latest match.pd change.  It's causing a bootstrap 
failure on i686.


Jeff


Re: Extend verify_type to check various uses of TYPE_MINVAL

2015-05-04 Thread Jan Hubicka
> > Not obvious enough, it seems: this patch broke gnat.dg/lto* tests at
> > least on i386-pc-solaris2.10.  E.g.
> > 
> > FAIL: gnat.dg/lto1.adb (test for excess errors)
> > WARNING: gnat.dg/lto1.adb compilation failed to produce executable
> > 
> > FAIL: gnat.dg/lto1.adb (test for excess errors)
> > Excess errors:
> > /vol/gcc/src/hg/trunk/solaris/gcc/testsuite/gnat.dg/lto1_pkg.adb:23:1:
> > error: TYPE_MIN_VALUE is not constant 
> 
> TYPE_MIN_VALUE can be arbitrary in Ada, with or without LTO.  For
> 
> package Q is
> 
>function LB return Natural;
>function UB return Natural;
> 
> end Q;
> with Q;
> 
> package P is
> 
>type Arr1 is array (Natural range <>) of Boolean;
> 
>subtype Arr2 is Arr1 (Q.LB .. Q.UB);
> 
> end P;
> 
> the TYPE_DOMAIN of Arr2 is
> 
> domain  sizetype>
> sizes-gimplified visited DI size  unit 
> size 
> align 64 symtab 0 alias set -1 canonical type 0x769be000 
> precision 
> 64 min  max 

Thanks, I just noticed the failures.  I will revert that check, it is indeed 
valid
for min values to not be constants (and even in C max values may be variable)
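
A minimal sketch of the C case, assuming a compiler with C99 variable-length
array support; the helper name is made up for illustration and is not taken
from GCC.  The array's index range is 0 .. n-1, so its upper bound is only
known at run time, which is why a verifier cannot insist on a constant there.

```c
#include <stddef.h>

/* Hypothetical helper: size in bytes of a VLA with N elements (N > 0).
   The array's maximum index, N - 1, is a run-time value.  */
static size_t
vla_bytes (int n)
{
  int a[n];         /* variable-length array (C99; optional in C11) */
  return sizeof a;  /* for a VLA, sizeof is evaluated at run time */
}
```

Calling vla_bytes (5) yields 5 * sizeof (int).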

Honza


Re: [PATCH/libiberty] fix build of gdb/binutils with clang.

2015-05-04 Thread Ian Lance Taylor
On Mon, May 4, 2015 at 3:49 PM, Yunlian Jiang  wrote:
> There was a similar discussion here
> https://gcc.gnu.org/ml/gcc/2005-11/msg01190.html

That was a discussion about libiberty.  Your subject says you have
trouble building gdb.

Can you describe the exact problem that you are having?  What
precisely are you doing?  What precisely happens?


> The problem is that at the configure stage _GNU_SOURCE is not defined,
> so configure cannot find the declaration of asprintf, and libiberty.h
> then declares asprintf itself.  floatformat.c, however, is compiled
> with _GNU_SOURCE defined, so it picks up another asprintf from
> /usr/include/bits/stdio2.h, and it also includes libiberty.h.  The two
> asprintf declarations conflict when __USE_FORTIFY_LEVEL is set.

I think the basic guideline should be that HAVE_DECL_ASPRINTF should
be correct.  If libiberty compiled with _GNU_SOURCE defined, then it
should test HAVE_DECL_ASPRINTF with _GNU_SOURCE defined.  If not, then
not.  So perhaps the problem is that libiberty is compiling some files
with _GNU_SOURCE defined and some not.
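
The guideline above can be sketched as follows.  This is only an
illustration, assuming a glibc-style libc that declares asprintf when
_GNU_SOURCE is defined; the HAVE_DECL_ASPRINTF default and the helper
format_pair are stand-ins invented for the example, not libiberty code.
The point is that the feature macro is defined before any system header,
and the fallback declaration is emitted only when the configure result
says the libc does not provide one.

```c
#define _GNU_SOURCE   /* must come before every system header */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the configure result (assumption: the check was run
   with _GNU_SOURCE defined, so it found the libc declaration).  */
#ifndef HAVE_DECL_ASPRINTF
#define HAVE_DECL_ASPRINTF 1
#endif

/* Fallback declaration, used only when the libc does not declare it.  */
#if !HAVE_DECL_ASPRINTF
extern int asprintf (char **, const char *, ...);
#endif

/* Hypothetical helper: returns a malloc'd "name=value" string, or NULL.  */
static char *
format_pair (const char *name, int value)
{
  char *out = NULL;
  if (asprintf (&out, "%s=%d", name, value) < 0)
    return NULL;
  return out;
}
```

If the test and the library sources disagree about _GNU_SOURCE, the
fallback declaration and the header's own declaration can both be visible
in the same file, which is the conflict described in the quoted message.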

Ian


Re: [Patch,microblaze]: Optimized usage of fint instruction.

2015-05-04 Thread Michael Eager

On 03/04/2015 08:20 AM, Michael Eager wrote:

On 03/04/15 03:53, Ajit Kumar Agarwal wrote:



-Original Message-
From: Michael Eager [mailto:ea...@eagerm.com]
Sent: Thursday, February 26, 2015 4:33 AM
To: Ajit Kumar Agarwal; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,microblaze]: Optimized usage of fint instruction.

On 02/25/15 02:20, Ajit Kumar Agarwal wrote:

Hello All:

Please find the patch for the optimized usage of fint instruction
changes. No regression is seen in the deja GNU tests.

commit ed4dc0b96bf43c200cacad97f73a98ab7048e51b
Author: Ajit Kumar Agarwal 
Date:   Wed Feb 25 15:36:29 2015 +0530

  [Patch,microblaze]: Optimized usage of fint instruction.

  The changes are made in the patch for optimized usage of fint instruction.
  The sequence of fint/cond_branch is replaced with fcmp/cond_branch. The
  fint instruction takes 6/7 cycles as compared to fcmp instruction which
  takes 1 cycles. The conversion from float to int with fint instruction
  is not required and can directly compared with fcmp instruction which
  takes 1 cycle as compared to 6/7 cycles with fint instruction.

  ChangeLog:
  2015-02-25  Ajit Agarwal  

  * config/microblaze/microblaze.md (peephole2): New.





+emit_insn (gen_cstoresf4 (comp_reg, operands[2],
+  gen_rtx_REG(SFmode,REGNO(cmp_op0)),
+  gen_rtx_REG(SFmode,REGNO(cmp_op1))));


Spaces before left parens and after comma in last two lines.


Changes are incorporated. Please find the log for updated patch.

commit 492b0d0b67a5b12d2dc239de3215630c8838edea
Author: Ajit Kumar Agarwal 
Date:   Wed Mar 4 17:15:16 2015 +0530

 [Patch,microblaze]: Optimized usage of fint instruction.

 The changes are made in the patch for optimized usage of fint instruction.
 The sequence of fint/cond_branch is replaced with fcmp/cond_branch. The
 fint instruction takes 6/7 cycles as compared to fcmp instruction which
 takes 1 cycles. The conversion from float to int with fint instruction
 is not required and can directly compared with fcmp instruction which
 takes 1 cycle as compared to 6/7 cycles with fint instruction.

 ChangeLog:
 2015-03-04  Ajit Agarwal  

 * config/microblaze/microblaze.md (peephole2): New.

 Signed-off-by:Ajit Agarwal ajit...@xilinx.com

Thanks & Regards
Ajit


OK.


Committed revision 222790.


--
Michael Eager    ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077


Re: [Patch,microblaze]: Optimized usage of pcmp conditional instruction.

2015-05-04 Thread Michael Eager

On 03/06/2015 07:33 AM, Michael Eager wrote:

On 03/05/15 21:12, Ajit Kumar Agarwal wrote:





Changes are  incorporated. Please find the log of the updated patch.

commit 91f275c144165320850ddf18e3a1e059a66c
Author: Ajit Kumar Agarwal 
Date:   Fri Mar 6 09:55:11 2015 +0530

 [Patch,microblaze]: Optimized usage of pcmp conditional instruction.

 The changes are made in the patch for optimized usage of pcmpne/pcmpeq
 instructions. The xor with register to register is replaced with pcmpeq
 /pcmpne instructions and for immediate check still the xori will be used.
 The purpose of the change is to achieve the aggressive usage of pcmpne
 /pcmpeq instructions instead of xor being used for comparison.

 ChangeLog:
 2015-03-06  Ajit Agarwal  

 * config/microblaze/microblaze.md (cbranchsi4): Added immediate
 constraints.
 (cbranchsi4_reg): New.
 * config/microblaze/microblaze.c
 (microblaze_expand_conditional_branch_reg): New.
 * config/microblaze/microblaze-protos.h
 (microblaze_expand_conditional_branch_reg): New prototype.

 Signed-off-by:Ajit Agarwal ajit...@xilinx.com

Thanks & Regards
Ajit


OK.


Committed revision 222791.



--
Michael Eager    ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077


Re: Extend verify_type to check various uses of TYPE_MINVAL

2015-05-04 Thread Jan Hubicka
Hi,
if my wifi connection allows, I will commit the following patch, which I
tested in the meantime.  It also adds sanity checking for TYPE_MAXVAL, which
no longer seems to trigger any issues.

From type_non_common it remains to check values and binfo.  I hope to kill all
those fields and move them to derived structures where they belong, but it is
harder than it seems because of the way obj-c++ shares data structures with
the C++ and C FEs and abuses these fields in interesting ways.  (I got stuck
on these last stage1.)

Honza

Index: ChangeLog
===
--- ChangeLog   (revision 222791)
+++ ChangeLog   (working copy)
@@ -1,3 +1,9 @@
+2015-05-02  Jan Hubicka  
+
+   * tree.c (verify_type): Check various uses of TYPE_MAXVAL;
+   fix overactive TYPE_MIN_VALUE check and add FIXME for type
+   compatibility problems.
+
 2015-05-04  Ajit Agarwal  
 
* config/microblaze/microblaze.md (cbranchsi4): Added immediate
Index: tree.c
===
--- tree.c  (revision 222753)
+++ tree.c  (working copy)
@@ -12621,14 +12621,9 @@ verify_type (const_tree t)
 }
   else if (INTEGRAL_TYPE_P (t) || TREE_CODE (t) == REAL_TYPE || TREE_CODE (t) 
== FIXED_POINT_TYPE)
 {
-  if (!TYPE_MIN_VALUE (t))
-   ;
-  else if (!TREE_CONSTANT (TYPE_MIN_VALUE (t)))
-{
- error ("TYPE_MIN_VALUE is not constant");
- debug_tree (TYPE_MIN_VALUE (t));
- error_found = true;
-}
+  /* FIXME: The following check should pass:
+ useless_type_conversion_p (const_cast <tree> (t), TREE_TYPE (TYPE_MIN_VALUE (t))
+but does not for C sizetypes in LTO.  */
 }
   else if (TYPE_MINVAL (t))
 {
@@ -12637,6 +12632,62 @@ verify_type (const_tree t)
   error_found = true;
 }
 
+  /* Check various uses of TYPE_MAXVAL.  */
+  if (RECORD_OR_UNION_TYPE_P (t))
+{
+  if (TYPE_METHODS (t) && TREE_CODE (TYPE_METHODS (t)) != FUNCTION_DECL
+ && TREE_CODE (TYPE_METHODS (t)) != TEMPLATE_DECL)
+   {
+ error ("TYPE_METHODS is not FUNCTION_DECL nor TEMPLATE_DECL");
+ debug_tree (TYPE_METHODS (t));
+ error_found = true;
+   }
+}
+  else if (TREE_CODE (t) == FUNCTION_TYPE || TREE_CODE (t) == METHOD_TYPE)
+{
+  if (TYPE_METHOD_BASETYPE (t)
+ && TREE_CODE (TYPE_METHOD_BASETYPE (t)) != RECORD_TYPE
+ && TREE_CODE (TYPE_METHOD_BASETYPE (t)) != UNION_TYPE)
+   {
+ error ("TYPE_METHOD_BASETYPE is not record nor union");
+ debug_tree (TYPE_METHOD_BASETYPE (t));
+ error_found = true;
+   }
+}
+  else if (TREE_CODE (t) == OFFSET_TYPE)
+{
+  if (TYPE_OFFSET_BASETYPE (t)
+ && TREE_CODE (TYPE_OFFSET_BASETYPE (t)) != RECORD_TYPE
+ && TREE_CODE (TYPE_OFFSET_BASETYPE (t)) != UNION_TYPE)
+   {
+ error ("TYPE_OFFSET_BASETYPE is not record nor union");
+ debug_tree (TYPE_OFFSET_BASETYPE (t));
+ error_found = true;
+   }
+}
+  else if (INTEGRAL_TYPE_P (t) || TREE_CODE (t) == REAL_TYPE || TREE_CODE (t) 
== FIXED_POINT_TYPE)
+{
+  /* FIXME: The following check should pass:
+ useless_type_conversion_p (const_cast <tree> (t), TREE_TYPE (TYPE_MAX_VALUE (t))
+but does not for C sizetypes in LTO.  */
+}
+  else if (TREE_CODE (t) == ARRAY_TYPE)
+{
+  if (TYPE_ARRAY_MAX_SIZE (t)
+ && TREE_CODE (TYPE_ARRAY_MAX_SIZE (t)) != INTEGER_CST)
+{
+ error ("TYPE_ARRAY_MAX_SIZE not INTEGER_CST");
+ debug_tree (TYPE_ARRAY_MAX_SIZE (t));
+ error_found = true;
+} 
+}
+  else if (TYPE_MAXVAL (t))
+{
+  error ("TYPE_MAXVAL non-NULL");
+  debug_tree (TYPE_MAXVAL (t));
+  error_found = true;
+}
+
 
   if (error_found)
 {


Re: [patch] libstdc++/56117 make std::async launch new threads by default

2015-05-04 Thread Jonathan Wakely

On 02/05/15 19:56 +0100, Jonathan Wakely wrote:

One last patch before I head to Lenexa, this fixes the long standing
not-a-bug that our default launch policy is launch::deferred.

This way std::async with no explicit policy or with any policy that
contains launch::async will run in a new thread.

Apparently libc++ does the same and they aren't getting lots of
complaints about fork-bombs, so let's try the same thing. If people
don't like it we have plenty of time in stage 1 to reconsider.

Tested x86_64-linux and powerpc64le-linux, I'm going to commit this to
trunk unless someone strongly objects.


Committed to trunk.


[debug-early] fix problem with template parameter packs

2015-05-04 Thread Aldy Hernandez
The code handling parameter DIEs needed a little tweaking for variable 
length template arguments.  I've relaxed the original assert, but this 
may require tweaking at branch review time-- hopefully later this week.


Committing to branch.

Aldy

p.s. Richi/Jason: Winter is coming.  Down to 1 GCC regression which is 
actually a missed DIE optimization which I hope I can fix post merge.
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index c51cea1..a5b155f 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -18018,8 +18018,20 @@ gen_formal_parameter_die (tree node, tree origin, bool 
emit_name_p,
 DW_AT_abstract_origin.  */
   if (parm_die && parm_die->die_parent != context_die)
{
- gcc_assert (!DECL_ABSTRACT_P (node));
- parm_die = NULL;
+ if (!DECL_ABSTRACT_P (node))
+   {
+ gcc_assert (!DECL_ABSTRACT_P (node));
+ parm_die = NULL;
+   }
+ else
+   {
+ /* Reuse DIE even with a differing context.  This
+happens when called through
+dwarf2out_abstract_function for
+formal parameter packs.  */
+ gcc_assert (parm_die->die_parent->die_tag
+ == DW_TAG_GNU_formal_parameter_pack);
+   }
}
 
   if (parm_die && parm_die->die_parent == NULL)


RE: [PATCH, combine] Try REG_EQUAL for nonzero_bits

2015-05-04 Thread Thomas Preud'homme
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Tuesday, April 28, 2015 12:27 AM
> OK.  No need for heroics -- give it a shot, but don't burn an insane
> amount of time on it.  If we can't get to a reasonable testcase, then so
> be it.

Ok, I tried but really didn't manage to create a testcase. I did, however,
understand the condition when this patch is helpful. In the function
reg_nonzero_bits_for_combine () in combine.c there is a test to check if
last_set_nonzero_bits for a given register is still valid.

In the case I'm considering, the test evaluates to false because:

(i) the register rX whose nonzero bits are being evaluated was set in an
earlier basic block than the one containing the instruction that uses rX
(hence rsp->last_set_label < label_tick)
(ii) the predecessor of the basic block for that same insn is not the
previous basic block analyzed by combine_instructions (hence
label_tick_ebb_start == label_tick)
(iii) the register rX is set multiple times (hence
REG_N_SETS (REGNO (x)) != 1)

Yet, the block being processed is dominated by the SET for rX so there
is a REG_EQUAL available to narrow down the set of nonzero bits.
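
The three conditions above can be put into a small model.  This is a
simplified sketch, not the code in combine.c: the struct and function
names are invented for illustration, and the mode and pseudo-register
checks that the real test also performs are left out.  It only shows why
the recorded last_set information is rejected when (i), (ii) and (iii)
all hold.

```c
#include <stdbool.h>

/* Hypothetical per-register record, loosely modeled on what combine
   tracks for each register.  */
struct reg_model
{
  int last_set_label;  /* label tick of the block that last set the reg */
  int last_set_luid;   /* position of that set within its block */
  int n_sets;          /* how many times the register is set overall */
};

/* Returns true if the recorded last-set information may be trusted at
   the current point (simplified model).  */
static bool
last_set_info_valid (const struct reg_model *r, int label_tick,
                     int label_tick_ebb_start, int subst_low_luid)
{
  /* Set in an earlier block of the same extended basic block.  */
  bool set_earlier_in_ebb = (r->last_set_label >= label_tick_ebb_start
                             && r->last_set_label < label_tick);
  /* Set earlier in the current block.  */
  bool set_earlier_in_block = (r->last_set_label == label_tick
                               && r->last_set_luid < subst_low_luid);
  /* Set exactly once anywhere.  */
  bool single_set = r->n_sets == 1;

  return set_earlier_in_ebb || set_earlier_in_block || single_set;
}
```

With last_set_label < label_tick, label_tick_ebb_start == label_tick and
n_sets != 1, all three alternatives are false, so the model falls through
to the point where a REG_EQUAL note would be the only remaining source of
information.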

Based on my understanding of your answer quoted above, I'll commit
it as is, despite not having been able to come up with a testcase. I'll
wait until tomorrow to do so, though, in case you change your mind about it.

Best regards,

Thomas




[PATCH] Improve the test in bitfields.m4

2015-05-04 Thread tbsaunde+gcc
From: Trevor Saunders 

Hi,

here's what I committed.  bootstrapped + regtested x86_64-linux-gnu.

Trev

Using a named bitfield with a width more than 0 means we won't hit
weirdness caused by the bitfield not really needing to exist.  Changing
int to long long means we won't have trouble with some arch where size
of int is 1 or 2.
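
As a rough illustration of what the new test measures, here is a sketch
that evaluates the same comparison at run time instead of through a
negative array size at compile time.  The function name is made up for
the example; the two structs are the ones from the patch.  A bit-field
declared as char is accepted by GCC as an extension, and the result is
target dependent, so no particular value is claimed here.

```c
#include <stddef.h>

struct foo1 { char x; char y:1; char z; };
struct foo2 { char x; long long int y:1; char z; };

/* Returns 1 if the declared type of a bit-field affects struct layout
   on this target, 0 otherwise.  */
static int
bitfield_type_matters (void)
{
  return sizeof (struct foo1) < sizeof (struct foo2);
}
```

On a target where the bit-field's declared type influences alignment,
struct foo2 is padded out to the alignment of long long and the function
returns 1; otherwise both structs have the same size and it returns 0.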

libobjc/ChangeLog:

2015-05-04  Trevor Saunders  

* configure: Regenerate.

config/ChangeLog:

2015-05-04  Trevor Saunders  

* bitfields.m4: Change int to long long, and use bitfields of
width 1 instead of 0.
---
 config/bitfields.m4 | 7 +++
 libobjc/configure   | 7 +++
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/config/bitfields.m4 b/config/bitfields.m4
index ee8f3b5..8185cd3 100644
--- a/config/bitfields.m4
+++ b/config/bitfields.m4
@@ -13,10 +13,9 @@ AC_DEFUN([gt_BITFIELD_TYPE_MATTERS],
   AC_CACHE_CHECK([if the type of bitfields matters], 
gt_cv_bitfield_type_matters,
   [
 AC_TRY_COMPILE(
-  [struct foo1 { char x; char :0; char y; };
-struct foo2 { char x; int :0; char y; };
-int foo1test[ sizeof (struct foo1) == 2 ? 1 : -1 ];
-int foo2test[ sizeof (struct foo2) == 5 ? 1 : -1]; ],
+  [struct foo1 { char x; char y:1; char z; };
+struct foo2 { char x; long long int y:1; char z; };
+int foo1test[ sizeof (struct foo1) < sizeof (struct foo2) ? 1 : -1 ]; ],
   [], gt_cv_bitfield_type_matters=yes, gt_cv_bitfield_type_matters=no)
   ])
   if test $gt_cv_bitfield_type_matters = yes; then
diff --git a/libobjc/configure b/libobjc/configure
index 0547f91..2f71735 100755
--- a/libobjc/configure
+++ b/libobjc/configure
@@ -11539,10 +11539,9 @@ else
 
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
-struct foo1 { char x; char :0; char y; };
-struct foo2 { char x; int :0; char y; };
-int foo1test[ sizeof (struct foo1) == 2 ? 1 : -1 ];
-int foo2test[ sizeof (struct foo2) == 5 ? 1 : -1];
+struct foo1 { char x; char y:1; char z; };
+struct foo2 { char x; long long int y:1; char z; };
+int foo1test[ sizeof (struct foo1) < sizeof (struct foo2) ? 1 : -1 ];
 int
 main ()
 {
-- 
2.4.0



[PATCH, i386]: Fix PR65871, add *bmi_andn_<mode>_ccno pattern

2015-05-04 Thread Uros Bizjak
Hello!

Another pattern that seems useful.

2015-05-05  Uros Bizjak  

PR target/65871
* config/i386/i386.md (*bmi_andn_<mode>_ccno): New pattern.

testsuite/ChangeLog:

2015-05-05  Uros Bizjak  

PR target/65871
* gcc.target/i386/pr65871-3.c: New test.

Tested on x86_64-linux-gnu {,-m32} and committed to mainline SVN.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 222774)
+++ config/i386/i386.md (working copy)
@@ -12565,11 +12564,25 @@
(set_attr "btver2_decode" "direct, double")
   (set_attr "mode" "<MODE>")])
 
+(define_insn "*bmi_andn_<mode>_ccno"
+  [(set (reg FLAGS_REG)
+   (compare
+ (and:SWI48
+   (not:SWI48 (match_operand:SWI48 1 "register_operand" "r,r"))
+   (match_operand:SWI48 2 "nonimmediate_operand" "r,m"))
+ (const_int 0)))
+   (clobber (match_scratch:SWI48 0 "=r,r"))]
+  "TARGET_BMI && ix86_match_ccmode (insn, CCNOmode)"
+  "andn\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "bitmanip")
+   (set_attr "btver2_decode" "direct, double")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "bmi_bextr_<mode>"
   [(set (match_operand:SWI48 0 "register_operand" "=r,r")
 (unspec:SWI48 [(match_operand:SWI48 1 "nonimmediate_operand" "r,m")
-   (match_operand:SWI48 2 "register_operand" "r,r")]
-   UNSPEC_BEXTR))
+   (unspec:SWI48 [(match_operand:SWI48 1 "nonimmediate_operand" "r,m")
+  (match_operand:SWI48 2 "register_operand" "r,r")]
+ UNSPEC_BEXTR))
(clobber (reg:CC FLAGS_REG))]
   "TARGET_BMI"
   "bextr\t{%2, %1, %0|%0, %1, %2}"
Index: testsuite/gcc.target/i386/pr65871-3.c
===
--- testsuite/gcc.target/i386/pr65871-3.c   (revision 0)
+++ testsuite/gcc.target/i386/pr65871-3.c   (working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mbmi" } */
+
+int foo (int x, int y)
+{
+  if (~x & y)
+return 1;
+
+  return 0;
+}
+
+int bar (int x, int y)
+{
+  if ((~x & y) > 0)
+return 1;
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-not "test" } } */


Re: [patch] Perform anonymous constant propagation during inlining

2015-05-04 Thread Richard Biener
On May 4, 2015 11:38:42 PM GMT+02:00, Eric Botcazou  
wrote:
>> 2015-05-01  Eric Botcazou  
>> 
>>  * expr.c (expand_expr_real_1) : Try to substitute
>constants
>>  on the RHS of expressions.
>>  * gimple-expr.h (is_gimple_constant): Reorder.
>
>Bummer.  This breaks C++ debugging:
>
>+FAIL: gdb.cp/class2.exp: print alpha at marker return 0
>+FAIL: gdb.cp/class2.exp: print beta at marker return 0
>+FAIL: gdb.cp/class2.exp: print * aap at marker return 0
>+FAIL: gdb.cp/class2.exp: print * bbp at marker return 0
>+FAIL: gdb.cp/class2.exp: print * abp at marker return 0, s-p-o off
>+FAIL: gdb.cp/class2.exp: print * (B *) abp at marker return 0
>+FAIL: gdb.cp/class2.exp: p acp
>+FAIL: gdb.cp/class2.exp: p acp->c1
>+FAIL: gdb.cp/class2.exp: p acp->c2
>
>because C++ is apparently relying on the assignment to the anonymous
>return 
>object to preserve the debug info attached to a return statement.
>
>Would you be OK with a slight variation of your earlier idea, i.e.
>calling 
>fold_stmt with a specific valueizer from fold_marked_statements instead
>of the 
>implicit no_follow_ssa_edges in the inliner?  Something like:
>
>tree
>follow_anonymous_single_use_edges (tree val)
>{
>  if (TREE_CODE (val) == SSA_NAME
>      && (!SSA_NAME_VAR (val) || DECL_IGNORED_P (SSA_NAME_VAR (val)))
>      && has_single_use (val))
>    return val;
>  return NULL_TREE;
>}

Yes, that works for me as well.

Richard.



Re: [AArch64][PR65375] Fix RTX cost for vector SET

2015-05-04 Thread James Greenhalgh
On Sat, Apr 25, 2015 at 12:26:16AM +0100, Kugan wrote:
> 
> Thanks for the review. I have updated the patch based on the comments
> with some other minor changes. Bootstrapped and regression tested on
> aarch64-none-linux-gnu with no-new regressions. Is this OK for trunk?
> 
> 
> Thanks,
> Kugan
> 
> 
> gcc/ChangeLog:
> 
> 2015-04-24  Kugan Vivekanandarajah  
>   Jim Wilson  
> 
>   * config/arm/aarch-common-protos.h (struct mem_cost_table): Added
>   new  fields loadv and storev.
>   * config/aarch64/aarch64-cost-tables.h (thunderx_extra_costs):
>   Initialize loadv and storev.
>   * config/arm/aarch-cost-tables.h (generic_extra_costs): Likewise.
>   (cortexa53_extra_costs): Likewise.
>   (cortexa57_extra_costs): Likewise.
>   (xgene1_extra_costs): Likewise.
>   * config/aarch64/aarch64.c (aarch64_rtx_costs): Update vector
>   rtx_costs.

Hi Kugan,

Just a few style comments regarding the placement of comments in single-line
if statements. I know the current code does not necessarily always follow the
comments below; I'll write a patch cleaning that up at some point when I'm back
at my desk.

Thanks,
James

> @@ -5667,6 +5668,14 @@ aarch64_rtx_costs (rtx x, int code, int outer 
> ATTRIBUTE_UNUSED,
>  case NEG:
>op0 = XEXP (x, 0);
>  
> +  if (VECTOR_MODE_P (mode))
> + {
> +   if (speed)
> + /* FNEG.  */
> + *cost += extra_cost->vect.alu;
> +   return false;
> + }
> +
>if (GET_MODE_CLASS (GET_MODE (x)) == MODE_INT)
> {
>if (GET_RTX_CLASS (GET_CODE (op0)) == RTX_COMPARE

Personally, I find commented if statements without braces hard to
quickly parse. Something like this is much faster for me:

  if (speed)
{
  /* FNEG.  */
  *cost += extra_cost->vect.alu;
}

> @@ -5844,7 +5872,10 @@ cost_minus:
>  
>   if (speed)
> {
> - if (GET_MODE_CLASS (mode) == MODE_INT)
> + if (VECTOR_MODE_P (mode))
> +   /* Vector SUB.  */
> +   *cost += extra_cost->vect.alu;
> + else if (GET_MODE_CLASS (mode) == MODE_INT)
> /* SUB(S).  */
> *cost += extra_cost->alu.arith;
>   else if (GET_MODE_CLASS (mode) == MODE_FLOAT)

As above.

> @@ -5888,7 +5919,6 @@ cost_plus:
> {
>   if (speed)
> *cost += extra_cost->alu.arith_shift;
> -
>   *cost += rtx_cost (XEXP (XEXP (op0, 0), 0),
>  (enum rtx_code) GET_CODE (op0),
>  0, speed);

Drop this whitespace change.

> @@ -5913,7 +5943,10 @@ cost_plus:
>  
>   if (speed)
> {
> - if (GET_MODE_CLASS (mode) == MODE_INT)
> + if (VECTOR_MODE_P (mode))
> +   /* Vector ADD.  */
> +   *cost += extra_cost->vect.alu;
> + else if (GET_MODE_CLASS (mode) == MODE_INT)
> /* ADD.  */
> *cost += extra_cost->alu.arith;
>   else if (GET_MODE_CLASS (mode) == MODE_FLOAT)

As above.

> @@ -6013,10 +6061,15 @@ cost_plus:
>return false;
>  
>  case NOT:
> -  /* MVN.  */
>if (speed)
> - *cost += extra_cost->alu.logical;
> -
> + {
> +   /* Vector NOT.  */
> +   if (VECTOR_MODE_P (mode))
> + *cost += extra_cost->vect.alu;
> +   /* MVN.  */
> +   else
> + *cost += extra_cost->alu.logical;
> + }
>/* The logical instruction could have the shifted register form,
>   but the cost is the same if the shift is processed as a separate
>   instruction, so we don't bother with it here.  */

As above.

> @@ -6055,10 +6108,15 @@ cost_plus:
> return true;
>   }
>  
> -  /* UXTB/UXTH.  */
>if (speed)
> - *cost += extra_cost->alu.extend;
> -
> + {
> +   if (VECTOR_MODE_P (mode))
> + /* UMOV.  */
> + *cost += extra_cost->vect.alu;
> +   else
> + /* UXTB/UXTH.  */
> + *cost += extra_cost->alu.extend;
> + }
>return false;
>  
>  case SIGN_EXTEND:

And again :)

> @@ -6087,10 +6150,16 @@ cost_plus:
>  
>if (CONST_INT_P (op1))
>  {
> -   /* LSL (immediate), UBMF, UBFIZ and friends.  These are all
> -  aliases.  */
> if (speed)
> - *cost += extra_cost->alu.shift;
> + {
> +   /* Vector shift (immediate).  */
> +   if (VECTOR_MODE_P (mode))
> + *cost += extra_cost->vect.alu;
> +   /* LSL (immediate), UBMF, UBFIZ and friends.  These are all
> +  aliases.  */
> +   else
> + *cost += extra_cost->alu.shift;
> + }
>  
>/* We can incorporate zero/sign extend for free.  */
>if (GET_CODE (op0) == ZERO_EXTEND

Again, the comment here makes it very difficult to spot the form of
the if/else statement.

> @@ -6102,10 +6171,15 @@ cost_plus:
>  }
>else
>  {
> -   /* LSLV. 

[Patch, fortran] PR 37131, inline matmul

2015-05-04 Thread Thomas Koenig
Hello world,

this is an update of the matmul inline patch.  The only difference to
the last version is that it has the ubound simplification taken out.

Any further comments?  OK for trunk?

Thomas

2015-05-05  Thomas Koenig  

PR fortran/37131
* gfortran.h (gfc_isym_id):  Add GFC_ISYM_FE_RUNTIME_ERROR.
(gfc_array_spec):  Add resolved flag.
(gfc_intrinsic_sym):  Add vararg.
* intrinsic.h (gfc_check_fe_runtime_error):  Add prototype.
(gfc_resolve_fe_runtime_error):  Likewise.
Add prototype for gfc_is_reallocatable_lhs.
* array.c (gfc_resolve_array_spec):  Do not resolve if it has
already been resolved.
* trans-array.h (gfc_is_reallocatable_lhs):  Remove prototype.
* check.c (gfc_check_fe_runtime_error):  New function.
* intrinsic.c (add_sym_1p):  New function.
(make_vararg):  New function.
(add_subroutines):  Add fe_runtime_error.
(gfc_intrinsic_sub_interface): Skip sorting for variable number
of arguments.
* iresolve.c (gfc_resolve_fe_runtime_error):  New function.
* lang.opt (inline-matmul-limit):  New option.
(gfc_post_options): If no inline matmul limit has been set and
BLAS is called externally, use the BLAS limit.
* frontend-passes.c:  Include intrinsic.h.
(var_num):  New global counter for naming temporary variables.
(matrix_case):  Enum for differentiating the different matmul
cases.
(realloc_string_callback):  Add "trim" to the variable name.
(create_var): Add optional argument vname as part of the name.
Use var_num. Set dimension of result correctly. Split off block
creation into
(insert_block): New function.
(cfe_expr_0): Use "fcn" as part of temporary variable name.
(optimize_namespace): Also set gfc_current_ns. Call
inline_matmul_assign.
(combine_array_constructor):  Use "constr" as part of
temporary name.
(get_array_inq_function):  New function.
(build_logical_expr):  New function.
(get_operand):  New function.
(inline_limit_check):  New function.
(runtime_error_ne):  New function.
(matmul_lhs_realloc):  New function.
(is_function_or_op):  New function.
(has_function_or_op):  New function.
(freeze_expr):  New function.
(freeze_references):  New function.
(convert_to_index_kind):  New function.
(create_do_loop):  New function.
(get_size_m1):  New function.
(scalarized_expr):  New function.
(inline_matmul_assign):  New function.
* simplify.c (simplify_bound):  Simplify the case of the
lower bound of an assumed-shape argument.

2015-05-05  Thomas Koenig  

PR fortran/37131
* gfortran.dg/dependency_26.f90: Add option to suppress inlining
matmul.
* gfortran.dg/function_optimize_1.f90:  Likewise.
* gfortran.dg/function_optimize_2.f90:  Likewise.
* gfortran.dg/function_optimize_5.f90:  Likewise.
* gfortran.dg/function_optimize_7.f90:  Likewise.
* gfortran.dg/inline_matmul_1.f90:  New test.
* gfortran.dg/inline_matmul_2.f90:  New test.
* gfortran.dg/inline_matmul_3.f90:  New test.
* gfortran.dg/inline_matmul_4.f90:  New test.
* gfortran.dg/inline_matmul_5.f90:  New test.
Index: fortran/array.c
===
--- fortran/array.c	(Revision 222771)
+++ fortran/array.c	(Arbeitskopie)
@@ -338,6 +338,9 @@ gfc_resolve_array_spec (gfc_array_spec *as, int ch
   if (as == NULL)
 return true;
 
+  if (as->resolved)
+return true;
+
   for (i = 0; i < as->rank + as->corank; i++)
 {
   e = as->lower[i];
@@ -364,6 +367,8 @@ gfc_resolve_array_spec (gfc_array_spec *as, int ch
 	}
 }
 
+  as->resolved = true;
+
   return true;
 }
 
Index: fortran/check.c
===
--- fortran/check.c	(Revision 222771)
+++ fortran/check.c	(Arbeitskopie)
@@ -5527,7 +5527,37 @@ gfc_check_random_seed (gfc_expr *size, gfc_expr *p
   return true;
 }
 
+bool
+gfc_check_fe_runtime_error (gfc_actual_arglist *a)
+{
+  gfc_expr *e;
+  int len, i;
+  int num_percent, nargs;
 
+  e = a->expr;
+  if (e->expr_type != EXPR_CONSTANT)
+return true;
+
+  len = e->value.character.length;
+  if (e->value.character.string[len-1] != '\0')
+gfc_internal_error ("fe_runtime_error string must be null terminated");
+
+  num_percent = 0;
+  for (i=0; i<len-1; i++)
+    if (e->value.character.string[i] == '%')
+  num_percent ++;
+
+  nargs = 0;
+  for (; a; a = a->next)
+nargs ++;
+
+  if (nargs -1 != num_percent)
+gfc_internal_error ("fe_runtime_error: Wrong number of arguments (%d instead of %d)",
+			nargs, num_percent++);
+
+  return true;
+}
+
 bool
 gfc_check_second_sub (gfc_expr *time)
 {
Index: fortran/frontend-passes.c

Re: [PING^3] [PATCH] [AArch64, NEON] Improve vmulX intrinsics

2015-05-04 Thread James Greenhalgh
On Sat, Apr 11, 2015 at 11:37:47AM +0100, Jiangjiji wrote:
> Hi,
>   This is a ping for: https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00772.html
>   Regtested with aarch64-linux-gnu on QEMU.
>   This patch has no regressions for aarch64_be-linux-gnu big-endian target 
> too.
>   OK for the trunk?
> 
> Thanks.
> Jiang jiji
> 
> 
> --
> Re: [PING^2] [PATCH] [AArch64, NEON] Improve vmulX intrinsics
> 
> Hi, Kyrill
>   Thank you for your suggestion.
>   I fixed it and regtested with aarch64-linux-gnu on QEMU.
>   This patch has no regressions for aarch64_be-linux-gnu big-endian target 
> too.
>   OK for the trunk?

Hi Jiang,

I'm sorry that I've taken so long to get to this, I've been out of office
for several weeks. I have one comment.

> +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
> +vmul_n_f32 (float32x2_t __a, float32_t __b)
> +{
> +  return __builtin_aarch64_mul_nv2sf (__a, __b);
> +}
> +

For vmul_n_* intrinsics, is there a reason we don't want to use the
GCC vector extension syntax to allow us to write these as:

  __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
  vmul_n_f32 (float32x2_t __a, float32_t __b)
  {
return __a * __b;
  }

It would be great if we could make that work.
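
A minimal sketch of the idea, assuming a compiler that supports the GCC
vector extensions (GCC or Clang).  The type and function names below are
stand-ins chosen for the example so that it does not depend on
arm_neon.h; they are not the real intrinsic definitions.  With these
extensions, a binary operation between a vector and a scalar broadcasts
the scalar across the lanes.

```c
/* Stand-in for float32x2_t: two floats in an 8-byte vector.  */
typedef float sketch_float32x2_t __attribute__ ((vector_size (8)));

/* Multiply every lane of A by the scalar B using plain C operators
   instead of a target builtin.  */
static inline sketch_float32x2_t
sketch_vmul_n_f32 (sketch_float32x2_t a, float b)
{
  return a * b;  /* the scalar is broadcast to both lanes */
}
```

Written this way, the multiplication is visible to the middle end as an
ordinary vector multiply rather than as an opaque builtin call.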

Thanks,
James