Re: [PATCH] PR target/48904 x86_64-knetbsd-gnu missing defs

2015-05-08 Thread Bernhard Reutner-Fischer
On 1 May 2015 at 01:23, Trevor Saunders  wrote:
> On Thu, Apr 30, 2015 at 11:58:09PM +0200, Bernhard Reutner-Fischer wrote:
>> On April 30, 2015 5:53:02 PM GMT+02:00, Jeff Law  wrote:
>> >On 04/30/2015 01:58 AM, Bernhard Reutner-Fischer wrote:
>> >> Hi,
>> >>
>> >> On 30 April 2015 at 07:00, Jeff Law  wrote:
>> >>> On 04/29/2015 02:01 AM, Bernhard Reutner-Fischer wrote:
>> 
>>  2012-09-21  H.J. Lu  
>> 
>>   PR target/48904
>>   * config.gcc (x86_64-*-knetbsd*-gnu): Add
>>   i386/knetbsd-gnu64.h.
>>   * config/i386/knetbsd-gnu64.h: New file
>> >>>
>> >>> OK.  Please install on the trunk.

Applied to trunk as r222903
Thanks,


Re: PATCH: PR target/48904: x86_64-knetbsd-gnu fails to build

2015-05-08 Thread Bernhard Reutner-Fischer
On 21 September 2012 at 21:11, H.J. Lu  wrote:
> Hi,
>
> This patch adds i386/knetbsd-gnu64.h for x86_64-knetbsd-gnu.  OK to
> install?

I now installed this to trunk as r222903 after Jeff's approval.

Thanks!
>
> Thanks.
>
> H.J.
> ---
> 2012-09-21  H.J. Lu  
>
> PR target/48904
> * config.gcc (tm_file): Add i386/knetbsd-gnu64.h for
> x86_64-*-knetbsd*-gnu.
>
> * config/i386/knetbsd-gnu64.h: New file.
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index d6c8153..00db1b4 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -1275,7 +1275,7 @@ x86_64-*-linux* | x86_64-*-kfreebsd*-gnu | x86_64-*-knetbsd*-gnu)
> tm_file="${tm_file} kfreebsd-gnu.h i386/kfreebsd-gnu64.h"
> ;;
> x86_64-*-knetbsd*-gnu)
> -   tm_file="${tm_file} knetbsd-gnu.h"
> +   tm_file="${tm_file} knetbsd-gnu.h i386/knetbsd-gnu64.h"
> ;;
> esac
> tmake_file="${tmake_file} i386/t-linux64"
> diff --git a/gcc/config/i386/knetbsd-gnu64.h b/gcc/config/i386/knetbsd-gnu64.h
> new file mode 100644
> index 000..d621bbe
> --- /dev/null
> +++ b/gcc/config/i386/knetbsd-gnu64.h
> @@ -0,0 +1,27 @@
> +/* Definitions for AMD x86-64 running kNetBSD-based GNU systems with ELF format
> +   Copyright (C) 2012
> +   Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 3, or (at your option)
> +any later version.
> +
> +GCC is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +GNU General Public License for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#define GNU_USER_LINK_EMULATION32 "elf_i386"
> +#define GNU_USER_LINK_EMULATION64 "elf_x86_64"
> +#define GNU_USER_LINK_EMULATIONX32 "elf32_x86_64"
> +
> +#define GNU_USER_DYNAMIC_LINKER32 "/lib/ld.so.1"
> +#define GNU_USER_DYNAMIC_LINKER64 "/lib/ld-knetbsd-x86-64.so.1"
> +#define GNU_USER_DYNAMIC_LINKERX32 "/lib/ld-knetbsd-x32.so.1"


Re: Enhance std::hash for pointers

2015-05-08 Thread Richard Biener
On Wed, May 6, 2015 at 10:10 PM, François Dumont  wrote:
> Hi
>
> Following Marc Glisse comment #4
> on:https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65641 I would like to
> propose this enhancement to the hash functor for pointers. It simply gets
> rid of the irrelevant bits on pointers hash code based on memory alignment
> of the pointed type. The only drawback I can think of is that the type needs
> to be complete at std::hash instantiation time but is it really an issue ?
>
> IMO it is quite obvious that the resulting hash code will be better but

If you use a real hashing function that's not true.  That is, something
other than GCC's pointer_hash (void *p) { return (uintptr_t)p >>3; }.

Richard.

> if anyone has a good method to prove it I can try to implement it. The test
> I have added in quality.cc is very basic and just reflect enhancement
> following Marc's comment.
>
> 2015-05-05  François Dumont 
>
> * include/bits/functional_hash.h
> (std::__detail::_Lowest_power_of_two): New.
> (std::hash<_Tp*>::operator()): Use latter.
> * testsuite/20_util/hash/quality.cc (pointer_quality_test): New.
>
> Tested under Linux x86_64.
>
> François
>
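
To make the idea in this thread concrete, here is a minimal sketch in C. It is
not the libstdc++ patch: the helper name hash_aligned_ptr and the explicit
align parameter are assumptions for illustration, whereas the proposal derives
the shift from the alignment of the pointed-to type. As Richard notes, a hash
that really mixes its input would not need the shift at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: discard the low bits that are always zero for a
   pointer aligned to ALIGN bytes.  ALIGN must be a power of two.  */
static uintptr_t
hash_aligned_ptr (const void *p, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);

  unsigned int shift = 0;
  while (align > 1)
    {
      align >>= 1;
      shift++;
    }
  return (uintptr_t) p >> shift;
}
```

With align == 8, two neighbouring 8-byte-aligned addresses such as 64 and 72
map to 8 and 9, which is the same effect as the ">> 3" in the pointer_hash
quoted above.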


Re: [PATCH][tree-ssa-math-opts] Expand pow (x, CONST) using square roots when possible

2015-05-08 Thread Kyrill Tkachov

Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00071.html

Thanks,
Kyrill
On 01/05/15 17:02, Kyrill Tkachov wrote:

Hi all,

GCC has some logic to expand calls to pow (x, 0.75), pow (x, 0.25) and
pow (x, (int)k + 0.5) using square roots. So, for the above examples it would
generate sqrt (x) * sqrt (sqrt (x)), sqrt (sqrt (x)) and powi (x, k) * sqrt (x)
(assuming k > 0; for k < 0 it will calculate the reciprocal of that).

However, the implementation of these optimisations is done on a bit of an
ad-hoc basis with the 0.25, 0.5, 0.75 cases hardcoded.
Judging by
https://gcc.gnu.org/wiki/summit2010?action=AttachFile&do=get&target=meissner2.pdf
these are the most commonly used exponents (at least in SPEC ;))

This patch generalises this optimisation into a (hopefully) more robust
algorithm. In particular, it expands calls to pow (x, CST) by expanding the
integer part of CST using a powi, like it does already, and then expanding the
fractional part as a product of repeated applications of a square root if the
fractional part can be expressed as a multiple of a power of 0.5.

I try to explain the algorithm in more detail in the comments in the patch but,
for example:

pow (x, 5.625) is not currently handled, but with this patch will be expanded
to powi (x, 5) * sqrt (x) * sqrt (sqrt (sqrt (x))) because
5.625 == 5.0 + 0.5 + 0.5**3
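
The decomposition behind that example can be sketched in C as follows. This is
only an illustration of the arithmetic, not the code in the patch; the function
name decompose_exponent and the bitmask encoding are assumptions made here.

```c
#include <assert.h>

/* Hypothetical helper, not GCC's implementation.  Splits a non-negative
   exponent C into an integer part and a bitmask of square-root depths:
   bit (d - 1) set means the factor x**(0.5**d) is needed.  Returns 1 on
   success, 0 if the fraction is not a multiple of 0.5**MAX_DEPTH.  */
static int
decompose_exponent (double c, int max_depth, long *int_part,
                    unsigned int *depth_mask)
{
  assert (c >= 0.0 && max_depth >= 0 && max_depth < 32);

  long n = (long) c;            /* truncation equals floor for c >= 0 */
  double frac = c - (double) n;
  unsigned int mask = 0;
  double term = 0.5;

  for (int d = 1; d <= max_depth; d++, term *= 0.5)
    if (frac >= term)
      {
        frac -= term;
        mask |= 1u << (d - 1);
      }

  if (frac != 0.0)
    return 0;                   /* needs a deeper chain than allowed */

  *int_part = n;
  *depth_mask = mask;
  return 1;
}
```

For 5.625 with a maximum depth of 3 this yields integer part 5 and mask 0x5,
i.e. depths 1 and 3, matching powi (x, 5) * sqrt (x) * sqrt (sqrt (sqrt (x))).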

Negative exponents are handled in either of two ways, depending on the
exponent value:
* Using a simple reciprocal.
For example:
pow (x, -5.625) == 1.0 / pow (x, 5.625)
  --> 1.0 / (powi (x, 5) * sqrt (x) * sqrt (sqrt (sqrt (x))))

* For pow (x, EXP) with negative exponent EXP with integer part INT and
fractional part FRAC:
pow (x, 1.0 - FRAC) / powi (x, ceil (abs (EXP))).
For example:
pow (x, -5.875) == pow (x, 0.125) / powi (x, 6)
  --> sqrt (sqrt (sqrt (x))) / (powi (x, 6))
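
The second form follows from x**-(INT + FRAC) == x**(1.0 - FRAC) / x**(INT + 1).
A small C sketch of that bookkeeping is below; the helper name
split_negative_exponent is an assumption for illustration and the patch may
organise this differently.

```c
#include <assert.h>

/* Hypothetical helper: for a negative, non-integer exponent EXP, compute the
   pieces of  pow (x, EXP) == pow (x, *pos_frac) / powi (x, *denom_power).  */
static void
split_negative_exponent (double exp, double *pos_frac, long *denom_power)
{
  assert (exp < 0.0);

  double mag = -exp;
  long int_part = (long) mag;          /* truncation is floor for mag > 0 */
  double frac = mag - (double) int_part;
  assert (frac != 0.0);                /* integer exponents need no sqrt */

  *denom_power = int_part + 1;         /* ceil (abs (EXP)) */
  *pos_frac = 1.0 - frac;              /* 1.0 - FRAC */
}
```

For EXP == -5.875 this gives 0.125 and 6, i.e. pow (x, 0.125) / powi (x, 6),
which needs a single chain of three square roots instead of the three separate
factors that 0.875 would require.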


Since hardware square root instructions tend to be expensive, we may want to
reduce the number of square roots we are willing to calculate. Since we reuse
intermediate square root results, this boils down to restricting the depth of
the square root chains. In all the examples above that depth is 3. I've made
this maximum depth parametrisable in params.def. By adjusting that parameter we
can adjust the resolution of this optimisation. So, if it's set to '4' then we
will synthesize every exponent that is a multiple of 0.5**4 == 0.0625,
including negative multiples. Currently, GCC will not try to expand negative
multiples of anything other than 0.5.

I have tried to keep the existing functionality intact and activate this only
for -funsafe-math-optimizations and only when the target has a sqrt
instruction. An exception to that is pow (x, 0.5) which we prefer to transform
to sqrt even when a hardware sqrt is not available, presumably because the
library function for sqrt is usually faster than pow (?).


Having seen the glibc implementation of a fully IEEE-754-compliant pow function,
I think we would prefer synthesising the pow call whenever we can for
-ffast-math.

I have seen this optimisation trigger a few times in SPEC2k6, in particular in
447.dealII and 481.wrf where it replaced calls to powf (x, -0.25),
pow (x, 0.125) and pow (x, 0.875) with square roots, multiplies and, in the
case of -0.25, divides.
On 481.wrf I saw it remove a total of 22 out of 322 calls to pow.

On 481.wrf on aarch64 I saw about a 1% improvement.
The cycle count on x86_64 was also smaller, but not by a significant amount
(the same calls to pow were eliminated).

In general, I think this can shine if multiple expandable calls to pow appear
together.
So, for example for code:
double
baz (double a)
{
  return __builtin_pow (a, -1.25) + __builtin_pow (a, 5.75)
         - __builtin_pow (a, 3.375);
}

we can generate:
baz:
  fsqrt   d3, d0
  fmul    d4, d0, d0
  fmov    d5, 1.0e+0
  fmul    d6, d0, d4
  fsqrt   d2, d3
  fmul    d1, d0, d2
  fsqrt   d0, d2
  fmul    d3, d3, d2
  fdiv    d1, d5, d1
  fmul    d3, d3, d6
  fmul    d2, d2, d0
  fmadd   d0, d4, d3, d1
  fmsub   d0, d6, d2, d0
  ret

reusing the sqrt results and doing more optimisations rather than the current:
baz:
  stp     x29, x30, [sp, -48]!
  fmov    d1, -1.25e+0
  add     x29, sp, 0
  stp     d8, d9, [sp, 16]
  fmov    d9, d0
  str     d10, [sp, 32]
  bl      pow
  fmov    d8, d0
  fmov    d0, d9
  fmov    d1, 5.75e+0
  bl      pow
  fmov    d10, d0
  fmov    d0, d9
  fmov    d1, 3.375e+0
  bl      pow
  fadd    d8, d8, d10
  ldr     d10, [sp, 32]
  fsub    d0, d8, d0
  ldp     d8, d9, [sp, 16]
  ldp     x29, x30, [sp], 48
  ret


Of course gcc could already do that if the exponents were to fall in the set 
{0.25, 0.75, k+0.5}
but with this patch that set can be

Re: [PATCH 2/13] musl libc config

2015-05-08 Thread Kyrill Tkachov


On 07/05/15 19:02, Jeff Law wrote:

On 05/06/2015 05:24 AM, Szabolcs Nagy wrote:

On 29/04/15 00:30, Joseph Myers wrote:

On Mon, 20 Apr 2015, Szabolcs Nagy wrote:


* config/linux.opt (mmusl): New option.

New -m options need documenting in invoke.texi.


Patch v3.

Now with documentation in invoke.texi.

Based on previous discussion I assume it is
OK to commit now.

gcc/Changelog

2015-05-06  Gregor Richards  
Szabolcs Nagy  

* config.gcc (LIBC_MUSL): New tm_defines macro.
* config/linux.h (OPTION_MUSL): Define.
(MUSL_DYNAMIC_LINKER, MUSL_DYNAMIC_LINKER32,)
(MUSL_DYNAMIC_LINKER64, MUSL_DYNAMIC_LINKERX32,)
(INCLUDE_DEFAULTS_MUSL_GPP, INCLUDE_DEFAULTS_MUSL_LOCAL,)
(INCLUDE_DEFAULTS_MUSL_PREFIX, INCLUDE_DEFAULTS_MUSL_CROSS,)
(INCLUDE_DEFAULTS_MUSL_TOOL, INCLUDE_DEFAULTS_MUSL_NATIVE): Define.

* config/linux.opt (mmusl): New option.
* doc/invoke.texi (GNU/Linux Options): Document -mmusl.
* configure.ac (gcc_cv_libc_provides_ssp): Add *-*-musl*.
(gcc_cv_target_dl_iterate_phdr): Add *-linux-musl*.

* configure: Regenerate.

OK.
Jeff



I've committed this on Szabolcs' behalf with r222904.

Kyrill



Re: [PATCH, ARM] attribute target (thumb,arm) [4/6] respin (4th)

2015-05-08 Thread Ramana Radhakrishnan


I'm still playing with the code, so this is a partial review.

We should prevent inlining of ARM state functions into functions we know 
will be T16 if !TARGET_SOFT_FLOAT on the grounds that the architecture 
doesn't have floating point instruction encodings in the T16 ISA 
(Thumb1). We'll just cause internal compiler errors if we allow this.



On 07/05/15 00:03, Sandra Loosemore wrote:

On 05/06/2015 08:24 AM, Christian Bruel wrote:

diff '--exclude=.svn' -ruN gnu_trunk.p3/gcc/gcc/doc/extend.texi gnu_trunk.p4/gcc/gcc/doc/extend.texi
--- gnu_trunk.p3/gcc/gcc/doc/extend.texi  2015-05-06 09:00:31.232943164 +0200
+++ gnu_trunk.p4/gcc/gcc/doc/extend.texi  2015-05-06 14:50:05.632612233 +0200
@@ -3419,6 +3419,25 @@
  the compiler rejects attempts to specify an alternative.
  @end table

+@item target (@var{options})
+@cindex @code{target} function attribute
+As discussed in @ref{Common Function Attributes}, this attribute
+allows specification of target-specific compilation options.
+
+On ARM, the following options are allowed:
+
+@table @samp
+@item thumb
+@cindex @code{target("thumb")} function attribute, ARM
+Force Thumb1 Thumb2 code generation depending on the architecture.


"Force Thumb or Thumb-2 code generation, depending on the architecture."


I'd rather it said something like

"Force code generation in the Thumb (T16/T32) ISA. The exact
instructions chosen depend on the architecture level selected."



+
+@item arm
+@cindex @code{target("arm")} function attribute, ARM
+Force ARM code generation.


"Force code generation in the ARM (A32) ISA."


+@end table
+
+Functions from different modes can be inlined using the caller mode.


Rewrite this based on the review comment about inlining in the Thumb16 
state from ARM state.





"...the caller's mode."


+
  @node AVR Function Attributes
  @subsection AVR Function Attributes

@@ -18436,8 +18455,9 @@
  @xref{Function Attributes}, for more information about the
  @code{target} attribute and the attribute syntax.

-The @code{#pragma GCC target} pragma is presently implemented for
-x86, PowerPC, and Nios II targets only.
+The @code{#pragma GCC target} pragma is implemented for
+ARM, x86, PowerPC, and Nios II targets.
+


I'd rather say this once we have proper support with arch, cpu and fpu 
options enabled. Until such a time I think this hunk is a bit premature.



Ramana
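
For readers unfamiliar with the attribute under review, a usage sketch in C is
below. It assumes an ARM GCC that implements target ("thumb") and
target ("arm") as proposed in this series; the macro and function names are
made up for illustration, and on any other compiler or target the macros
expand to nothing so the file still builds.

```c
#include <assert.h>

/* Assumption: an ARM GCC implementing the attribute proposed in this
   series.  Elsewhere the macros are empty and the code is plain C.  */
#if defined (__GNUC__) && !defined (__clang__) && defined (__arm__)
# define IN_THUMB __attribute__ ((target ("thumb")))
# define IN_ARM   __attribute__ ((target ("arm")))
#else
# define IN_THUMB
# define IN_ARM
#endif

/* Compiled as Thumb code where the attribute is supported.  */
IN_THUMB static int
add_thumb (int a, int b)
{
  return a + b;
}

/* Compiled as ARM code; calls across the two states.  */
IN_ARM static int
add_arm (int a, int b)
{
  return add_thumb (a, b) + 1;
}

static void
mode_self_check (void)
{
  assert (add_thumb (2, 3) == 5);
  assert (add_arm (2, 3) == 6);
}
```

Whether add_thumb may be inlined into add_arm, or the reverse, is exactly the
question raised in the review above, so the sketch makes no assumption about
it.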


Re: [PATCH 0/13] Add musl support to GCC

2015-05-08 Thread Kyrill Tkachov


On 07/05/15 19:07, Jeff Law wrote:

On 05/06/2015 05:36 AM, Szabolcs Nagy wrote:

On 30/04/15 00:18, Joseph Myers wrote:

On Wed, 29 Apr 2015, Szabolcs Nagy wrote:

only affects [u]int_fastN_t types
(on 64bit systems for N=16,32 musl uses int but glibc uses long)

i can fix glibc-stdint.h, but it's yet another way in which the
compiler is tied to a particular libc.


...

(i'd prefer if the compiler did not know about these types, but

...

The compiler also needs to know these types for the Fortran C bindings.


This is a workaround patch so that -mmusl or a default musl libc
changes the [U]INT_FAST{16,32}_TYPE macro definitions.

The undef/define logic is needed because glibc-stdint.h is
used on non-linux targets where OPTION_MUSL would not be
defined and it is used both before and after config/linux.h on
various linux targets.

I did not find any cleaner workaround. (Separate musl-stdint.h
would need significant changes in config.gcc.)
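
The choice being encoded can be modelled in a few lines of C. This is a
hypothetical model of the behaviour described earlier in the thread (for
N = 16 and 32, glibc uses long on 64-bit systems while musl uses int), not the
text of glibc-stdint.h; the function name and parameters are assumptions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model: which C type backs int_fast16_t / int_fast32_t,
   given the width of long and whether musl is the selected libc.  */
static const char *
int_fast_type_name (int long_type_bits, int option_musl)
{
  assert (long_type_bits == 32 || long_type_bits == 64);
  return (long_type_bits == 64 && !option_musl) ? "long int" : "int";
}

static void
int_fast_demo (void)
{
  assert (strcmp (int_fast_type_name (64, 0), "long int") == 0);
  assert (strcmp (int_fast_type_name (64, 1), "int") == 0);
  assert (strcmp (int_fast_type_name (32, 0), "int") == 0);
}
```

Only the 64-bit, non-musl case differs, which is why the workaround can be
confined to the [U]INT_FAST{16,32}_TYPE definitions.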

gcc/Changelog:

2015-05-06  Szabolcs Nagy  

* config/glibc-stdint.h (OPTION_MUSL): Define.
(INT_FAST16_TYPE, INT_FAST32_TYPE, UINT_FAST16_TYPE, UINT_FAST32_TYPE):
Change the definition based on OPTION_MUSL for 64 bit targets.

* config/linux.h (OPTION_MUSL): Redefine.
* config/alpha/linux.h (OPTION_MUSL): Redefine.
* config/rs6000/linux.h (OPTION_MUSL): Redefine.
* config/rs6000/linux64.h (OPTION_MUSL): Redefine.

I really don't like the MUSL bits inside glibc-stdint.h.  But I don't
see an easy way to avoid it.

OK for the trunk.


I've committed this on Szabolcs' behalf with r222905.

Kyrill



jeff





Re: [libgomp, testsuite] Support parallel testing in libgomp (PR libgomp/66005)

2015-05-08 Thread Thomas Schwinge
Hi!

On Thu, 7 May 2015 13:39:40 +0200, Jakub Jelinek  wrote:
> On Thu, May 07, 2015 at 01:26:57PM +0200, Rainer Orth wrote:
> > As reported in the PR, with the addition of all those OpenACC tests,
> > libgomp make check times have skyrocketed since the testsuite is still
> > run sequentially.

ACK.  And, thanks for looking into that!

> > Fixing this proved trivial: I managed to almost literally copy the
> > solution from libstdc++-v3/testsuite/Makefile.am, with a minimal change
> > to libgomp.exp so the generated libgomp-test-support.exp file is found
> > in both the sequential and parallel cases.  This isn't an issue in
> > libstdc++ since all necessary variables are stored in a single
> > site.exp.
> 
> It is far from trivial though.
> The point is that most of the OpenMP tests are parallelized with the
> default OMP_NUM_THREADS, so running the tests in parallel oversubscribes the
> machine a lot, the higher number of hw threads the more.

Do you agree that we have two classes of test cases in libgomp: 1) test
cases that don't place a considerably higher load on the machine compared
to "normal" (single-threaded) execution tests, because they're just
testing some functionality that is not expected to actively depend
on/interfere with parallelism.  If needed, and/or if not already done,
such test cases can be parameterized (OMP_NUM_THREADS, OpenACC num_gangs,
num_workers, vector_length clauses, and so on) for low parallelism
levels.  And, 2) test cases that place a considerably higher load on the
machine compared to "normal" (single-threaded) execution tests, because
they're testing some functionality that actively depends on/interferes
with some kind of parallelism.  What about marking such tests specially,
such that DejaGnu will only ever schedule one of them for execution at
the same time?  For example, a new dg-* directive to run them wrapped
through »flock [libgomp/testsuite/serial.lock] [a.out]« or some such?

> If we go forward with some parallelization of the tests, we at least should
> try to export something like OMP_WAIT_POLICY=passive so that the
> oversubscribed machine would at least not spend too much time in spinning.

(Will again have the problem that DejaGnu doesn't provide infrastructure
to communicate environment variables to boards in remote testing.)

> And perhaps reconsider running all OpenACC threads 3 times, just allow
> user to select which offloading target they want to test (host fallback,
> the host nonshm hack, PTX, XeonPHI in the future?), and test just that
> (that is pretty much how OpenMP offloading testing works).

My rationale is: if you configure GCC to support a set of offloading
devices (more than one), you'll also want to get the test coverage that
indeed all these work as expected.  (It currently doesn't matter, but...)
that's something I'd like to see improved in the libgomp OpenMP
offloading testing (once it supports more than one architecture for
offloading).

> For tests that
> always want to test host fallback, I hope OpenACC offers clauses to force
> the host fallback.

Yes.


Regards,
 Thomas




Re: [PATCH, doc] fix match-and-simplify API doc errors

2015-05-08 Thread Richard Biener
On Fri, May 8, 2015 at 1:37 AM, Jim Wilson  wrote:
> I noticed this while reading the match-and-simplify docs.  The
> gimple_simplify API list has two built-in function cases with three
> tree args.  The last one is supposed to have four tree args for a
> ternary function (3 inputs and one output).  Similarly, in the
> gimple_build API list, the built-in ternary function case with four
> tree args is missing.

Oops - thanks.

Richard.


> Jim


genrecog: Address -Wsign-compare diagnostics (was: Mostly rewrite genrecog)

2015-05-08 Thread Thomas Schwinge
Hi!

On Mon, 27 Apr 2015 11:20:30 +0100, Richard Sandiford wrote:
> This patch [...] by replacing most of genrecog [...]

OK to commit?

Is it a bug that I'm seeing these warnings only in the stage 1 build with
the bootstrap GCC 4.6 compiler, but not anymore later on?  (I have not
verified the C++ standard on the rules for »comparison between signed and
unsigned integer expressions«.)
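
As background on what the diagnostic guards against, here is a small C sketch
(function names are illustrative only). In both C and C++ the usual arithmetic
conversions turn the int operand of a mixed int/unsigned int comparison into
unsigned int, so a negative value compares as a very large one. This does not
answer whether the differing behaviour between compiler versions is a bug.

```c
#include <assert.h>

/* What "s < u" means after the usual arithmetic conversions: the int
   operand is converted to unsigned int before the comparison.  */
static int
less_converted (int s, unsigned int u)
{
  return (unsigned int) s < u;
}

/* A comparison that respects the mathematical values.  */
static int
less_by_value (int s, unsigned int u)
{
  return s < 0 || (unsigned int) s < u;
}

static void
sign_compare_demo (void)
{
  assert (!less_converted (-1, 6u));   /* -1 becomes UINT_MAX */
  assert (less_by_value (-1, 6u));
  assert (less_converted (5, 6u));     /* non-negative values are unaffected */
}
```

The constants in genrecog.c are all non-negative, so the comparisons there
give the expected results either way; making them unsigned int simply removes
the mixed-sign comparison.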

commit efef4f38205a13da90ca19b6eec1a6526756b433
Author: Thomas Schwinge 
Date:   Fri May 8 10:55:19 2015 +0200

genrecog: Address -Wsign-compare diagnostics.

g++-4.6 [...] [...]/gcc/genrecog.c
[...]/gcc/genrecog.c: In function 'state_size find_subroutines(routine_type, state*, vec&)':
[...]/gcc/genrecog.c:3338:35: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
[...]/gcc/genrecog.c:3347:37: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
[...]/gcc/genrecog.c:3359:29: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
[...]/gcc/genrecog.c:3365:32: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]

3305   state_size size;
 [...]
3337   state_size to_size = find_subroutines (type, trans->to, procs);
3338   if (d->next && to_size.depth > MAX_DEPTH)
 [...]
3347   if (to_size.num_statements < MIN_NUM_STATEMENTS)
 [...]
3359   if (size.num_statements > MAX_NUM_STATEMENTS)
 [...]
3365  && size.num_statements > MAX_NUM_STATEMENTS)

 175 static const int MAX_DEPTH = 6;
 [...]
 179 static const int MIN_NUM_STATEMENTS = 5;
 [...]
 185 static const int MAX_NUM_STATEMENTS = 200;
 [...]
3258 struct state_size
3259 {
 [...]
3261   unsigned int num_statements;
 [...]
3265   unsigned int depth;
3266 };

gcc/
* genrecog.c (MAX_DEPTH, MIN_NUM_STATEMENTS, MAX_NUM_STATEMENTS):
Change to unsigned int.
---
 gcc/genrecog.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git gcc/genrecog.c gcc/genrecog.c
index 73e7995..653f753 100644
--- gcc/genrecog.c
+++ gcc/genrecog.c
@@ -172,17 +172,17 @@ static const bool force_unique_params_p = true;
 /* The maximum (approximate) depth of block nesting that an individual
routine or subroutine should have.  This limit is about keeping the
output readable rather than reducing compile time.  */
-static const int MAX_DEPTH = 6;
+static const unsigned int MAX_DEPTH = 6;
 
 /* The minimum number of pseudo-statements that a state must have before
we split it out into a subroutine.  */
-static const int MIN_NUM_STATEMENTS = 5;
+static const unsigned int MIN_NUM_STATEMENTS = 5;
 
 /* The number of pseudo-statements a state can have before we consider
splitting out substates into subroutines.  This limit is about avoiding
compile-time problems with very big functions (and also about keeping
functions within --param optimization limits, etc.).  */
-static const int MAX_NUM_STATEMENTS = 200;
+static const unsigned int MAX_NUM_STATEMENTS = 200;
 
 /* The minimum number of pseudo-statements that can be used in a pattern
routine.  */


Regards,
 Thomas




Re: PR 64454: Improve VRP for %

2015-05-08 Thread Richard Biener
On Mon, May 4, 2015 at 3:57 PM, Marc Glisse  wrote:
> On Mon, 4 May 2015, Richard Biener wrote:
>
>> On Sat, May 2, 2015 at 12:46 AM, Marc Glisse  wrote:
>>>
>>> Hello,
>>>
>>> this patch tries to tighten a bit the range estimate for x%y.
>>> slp-perm-7.c
>>> started failing by vectorizing more than expected, I assumed it was a
>>> good
>>> thing and updated the test. I am less conservative than Jakub with
>>> division
>>> by 0, but I still don't really understand how empty ranges are supposed
>>> to
>>> be represented in VRP.
>>>
>>> Bootstrap+testsuite on x86_64-linux-gnu.
>>
>>
>> Hmm, so I don't like how you (continue to) use trees for the constant
>> computations. wide-ints would be a better fit today.  I also notice that
>> fold_unary_to_constant can return NULL_TREE and neither the old nor your
>> code handles that.
>
>
> You are right. I was lazy and tried to keep this part of the old code, I
> shouldn't have...
>
>> "empty" ranges are basically UNDEFINED.
>
>
> Cool, that's what I did. But I don't see code adding calls to
> __builtin_unreachable() when an empty range is detected. Maybe that almost
> never happens?

No, it's just nobody bothered to implement it.  You also have to be careful
as you can't replace reproducers of UNDEFINED but only uses that still
result in UNDEFINED result (for example 0 * UNDEFINED is defined again).

What I'd like to do at some point is have some common code that you
can query for what OP1 tree_code OP2 evaluates to if either op is UNDEFINED.
For example UNDEF + X == UNDEF but UNDEF * X == 0 (as an optimistic
result, of course - the only not undefined case is for X == 0).  UNDEF << X == 0
but UNDEF >> X == signed(x) ? UNDEF : 0.

We have multiple passes that duplicate only parts of those "optimizations".

Having a central place implementing this correctly would be nice.
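
A sketch of what such a central query could look like, written as standalone
C rather than against GCC's tree codes. Everything here is hypothetical: the
enum names, the function name, and the reading of "signed(x)" as the
signedness of the shifted operand's type are assumptions.

```c
#include <assert.h>

enum binop { OP_PLUS, OP_MULT, OP_LSHIFT, OP_RSHIFT };
enum lattice { VAL_UNDEF, VAL_ZERO };

/* Hypothetical query: optimistic result of "UNDEF <code> X".  For the
   right shift, LHS_SIGNED is assumed to describe the type of the shifted
   operand.  */
static enum lattice
fold_undef_lhs (enum binop code, int lhs_signed)
{
  switch (code)
    {
    case OP_PLUS:
      return VAL_UNDEF;                 /* UNDEF + X == UNDEF */
    case OP_MULT:
    case OP_LSHIFT:
      return VAL_ZERO;                  /* UNDEF * X == 0, UNDEF << X == 0 */
    case OP_RSHIFT:
      return lhs_signed ? VAL_UNDEF : VAL_ZERO;
    }
  assert (!"unknown operation");
  return VAL_UNDEF;
}
```

Passes would then ask this one function instead of each carrying a partial
copy of the rules.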

>> Aren't you pessimizing the case where the old code used
>> value_range_nonnegative_p() by just using TYPE_UNSIGNED?
>
>
> I don't think so. The old code only handled signed types in the positive
> case, while I have a more complete handling of signed types, which should do
> at least as good as the old one even in the positive case.

Ok, I see.

Richard.

> --
> Marc Glisse


Re: [PATCH, ARM] attribute target (thumb,arm) [5/6] respin (4th)

2015-05-08 Thread Ramana Radhakrishnan



On 06/05/15 15:27, Christian Bruel wrote:

Implements the hooks for #pragma GCC target

A test included to check that macros were correctly defined/undefined on
pragma regions.

Thanks

Christian




Missing the hooks - this only appears to have the test.

Ramana


Re: [PATCH] Simple optimization for MASK_STORE.

2015-05-08 Thread Richard Biener
On Wed, May 6, 2015 at 4:04 PM, Yuri Rumyantsev  wrote:
> Hi All,
>
> Here is a patch which gives us a significant speed-up on HASWELL for
> tests containing masked stores. The main goal of the patch is to
> avoid a HW hazard for maskmove instructions by inserting an
> additional check for a zero mask and putting all masked store statements
> into a separate block on the false edge. All MASK_STORE statements having
> the same mask are put into one block. Any comments will be appreciated.

Hmm.  I'm not very happy with this "optimization" happening at the
GIMPLE level - it feels more like a mdreorg thing...

The testcase you add doesn't end up with invalid addresses - so what's
the testcase you are inventing this for?

Looking into the implementation I don't see where you are validating
data dependences of any sort but you are moving stores (and possibly
loads when sinking definition stmts of stored values).  The code-sinking
part should be handled by the existing pass.  Your simple testcase
contains a single masked store, so why does simply conditionalizing
each masked store in mdreorg not work?  It's a hazard (hopefully
fixed eventually), thus not really worth optimizing 100%.

The target hook name is awful.

You don't need an extra flag in struct loop - the vectorizer scans all
insns so it can perfectly well re-compute it.

What this all feels like is more like an un-if-conversion pass which might
be useful for aggressively if-converted vectorized code as well (thus
lots of vec_cond expressions for example).

Richard.

> ChangeLog:
> 2015-05-06  Yuri Rumyantsev  
>
> * cfgloop.h (has_mask_store): Add new field to struct loop.
> * config/i386/i386.c: Include files stringpool.h and tree-ssanames.h.
> (ix86_vectorize_zero_vector): New function.
> (TARGET_VECTORIZE_ZERO_VECTOR): New target macro
> * doc/tm.texi.in: Add @hook TARGET_VECTORIZE_ZERO_VECTOR.
> * doc/tm.texi: Updated.
> * target.def (zero_vector): New DEFHOOK.
> * tree-if-conv.c (predicate_mem_writes): Set has_mask_store for loop.
> * tree-vect-stmts.c : Include tree-into-ssa.h.
> (optimize_mask_stores): New function.
> * tree-vectorizer.c (vectorize_loops): Zero has_mask_store field for
> non-vectorized loops and invoke optimize_mask_stores function.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test.


Re: [patch 3/28] fixincludes: Use automake-1.11.6 (across the tree)

2015-05-08 Thread Michael Haubenwallner

On 05/08/2015 12:10 AM, Bruce Korb wrote:
> On 05/06/15 01:58, Michael Haubenwallner wrote:
>> Trivial patch for fixincludes.
> 
> A) sufficiently trivial that explicit permission ought not be required

Agreed for the actual code change - more important is to notify the automake 
revbump.

> B) it is now officially blessed that we can coalesce year lists.
>Let's do so, okay?

Not for aclocal.m4, which is generated by automake - where 1.11.6 does not 
coalesce years.

/haubi/


Re: [PATCH][tree-ssa-math-opts] Expand pow (x, CONST) using square roots when possible

2015-05-08 Thread Richard Biener
On Fri, May 1, 2015 at 6:02 PM, Kyrill Tkachov
 wrote:
> Hi all,
>
> GCC has some logic to expand calls to pow (x, 0.75), pow (x, 0.25) and pow (x,
> (int)k + 0.5)
> using square roots. So, for the above examples it would generate sqrt (x) *
> sqrt (sqrt (x)),
> sqrt (sqrt (x)) and powi (x, k) * sqrt (x) (assuming k > 0. For k < 0 it
> will calculate the
> reciprocal of that).
>
> However, the implementation of these optimisations is done on a bit of an
> ad-hoc basis with
> the 0.25, 0.5, 0.75 cases hardcoded.
> Judging by
> https://gcc.gnu.org/wiki/summit2010?action=AttachFile&do=get&target=meissner2.pdf
> these are the most commonly used exponents (at least in SPEC ;))
>
> This patch generalises this optimisation into a (hopefully) more robust
> algorithm.
> In particular, it expands calls to pow (x, CST) by expanding the integer
> part of CST
> using a powi, like it does already, and then expanding the fractional part
> as a product
> of repeated applications of a square root if the fractional part can be
> expressed
> as a multiple of a power of 0.5.
>
> I try to explain the algorithm in more detail in the comments in the patch
> but, for example:
>
> pow (x, 5.625) is not currently handled, but with this patch will be
> expanded
> to powi (x, 5) * sqrt (x) * sqrt (sqrt (sqrt (x))) because 5.625 == 5.0 +
> 0.5 + 0.5**3
>
> Negative exponents are handled in either of two ways, depending on the
> exponent value:
> * Using a simple reciprocal.
>   For example:
>   pow (x, -5.625) == 1.0 / pow (x, 5.625)
> --> 1.0 / (powi (x, 5) * sqrt (x) * sqrt (sqrt (sqrt (x))))
>
> * For pow (x, EXP) with negative exponent EXP with integer part INT and
> fractional part FRAC:
> pow (x, 1.0 - FRAC) / powi (x, ceil (abs (EXP))).
>   For example:
>   pow (x, -5.875) == pow (x, 0.125) / powi (X, 6)
> --> sqrt (sqrt (sqrt (x))) / (powi (x, 6))
>
>
> Since hardware square root instructions tend to be expensive, we may want to
> reduce the number
> of square roots we are willing to calculate. Since we reuse intermediate
> square root results,
> this boils down to restricting the depth of the square root chains. In all
> the examples above
> that depth is 3. I've made this maximum depth parametrisable in params.def.
> By adjusting that
> parameter we can adjust the resolution of this optimisation. So, if it's set
> to '4' then we
> will synthesize every exponent that is a multiple of 0.5**4 == 0.0625,
> including negative
> multiples. Currently, GCC will not try to expand negative multiples of
> anything else than 0.5
>
> I have tried to keep the existing functionality intact and activate this
> only for
> -funsafe-math-optimizations and only when the target has a sqrt instruction.
>  An exception to that is pow (x, 0.5) which we prefer to transform to sqrt
> even
> when a hardware sqrt is not available, presumably because the library
> function for
> sqrt is usually faster than pow (?).

Yes.  It's also a safe transform - which you seem to put under
flag_unsafe_math_optimizations only with your patch.

It would be clearer to just leave the special-case

-  /* Optimize pow(x,0.5) = sqrt(x).  This replacement is always safe
- unless signed zeros must be maintained.  pow(-0,0.5) = +0, while
- sqrt(-0) = -0.  */
-  if (sqrtfn
-  && REAL_VALUES_EQUAL (c, dconsthalf)
-  && !HONOR_SIGNED_ZEROS (mode))
-return build_and_insert_call (gsi, loc, sqrtfn, arg0);

in as-is.

You also removed the Os constraint which you should put back in.
Basically if !optimize_function_for_speed_p then generate at most
two calls to sqrt (iff the HW has a sqrt instruction).

You fail to add a testcase that checks that the optimization applies.

Otherwise the idea looks good though there must be a better way
to compute the series than by using real-arithmetic and forcefully
trying out all possibilities...

Richard.

>
>
> Having seen the glibc implementation of a fully IEEE-754-compliant pow
> function, I think we
> would prefer synthesising the pow call whenever we can for -ffast-math.
>
> I have seen this optimisation trigger a few times in SPEC2k6, in particular
> in 447.dealII
> and 481.wrf where it replaced calls to powf (x, -0.25), pow (x, 0.125) and
> pow (x, 0.875)
> with square roots, multiplies and, in the case of -0.25, divides.
> On 481.wrf I saw it remove a total of 22 out of 322 calls to pow
>
> On 481.wrf on aarch64 I saw about a 1% improvement.
> The cycle count on x86_64 was also smaller, but not by a significant amount
> (the same calls to
> pow were eliminated).
>
> In general, I think this can shine if multiple expandable calls to pow
> appear together.
> So, for example for code:
> double
> baz (double a)
> {
>   return __builtin_pow (a, -1.25) + __builtin_pow (a, 5.75) - __builtin_pow
> (a, 3.375);
> }
>
> we can generate:
> baz:
> fsqrt   d3, d0
> fmul    d4, d0, d0
> fmov    d5, 1.0e+0
> fmul    d6, d0, d4
> fsqrt   d2, d3
> fmul    d1, d0, d2
> fsqrt   d0, d

Re: [patch 1/10] debug-early merge: Ada front-end

2015-05-08 Thread Eric Botcazou
> @@ -5204,28 +5199,6 @@ gnat_write_global_declarations (void)
> types_used_by_var_decl_insert (t, dummy_global);
>   }
>  }
> -
> -  /* Output debug information for all global type declarations first.  This
> - ensures that global types whose compilation hasn't been finalized yet,
> - for example pointers to Taft amendment types, have their compilation
> - finalized in the right context.  */
> -  FOR_EACH_VEC_SAFE_ELT (global_decls, i, iter)
> -if (TREE_CODE (iter) == TYPE_DECL && !DECL_IGNORED_P (iter))
> -  debug_hooks->global_decl (iter);
> -
> -  /* Proceed to optimize and emit assembly. */
> -  symtab->finalize_compilation_unit ();
> -
> -  /* After cgraph has had a chance to emit everything that's going to
> - be emitted, output debug information for the rest of globals.  */
> -  if (!seen_error ())
> -{
> -  timevar_push (TV_SYMOUT);
> -  FOR_EACH_VEC_SAFE_ELT (global_decls, i, iter)
> - if (TREE_CODE (iter) != TYPE_DECL && !DECL_IGNORED_P (iter))
> -   debug_hooks->global_decl (iter);
> -  timevar_pop (TV_SYMOUT);
> -}
>  }

What's the replacement mechanism for the first pass on global_decls?  The 
comment explains that generating debug info must be delayed in this case.

-- 
Eric Botcazou


Re: Remove mode argument from gen_rtx_SET

2015-05-08 Thread Franz Sirl

Am 2015-05-07 um 13:37 schrieb Richard Sandiford:

One problem with the automatically-generated gen_rtx_FOO () macros
is that they always have a mode parameter, even for codes like SET
where the mode should always be VOIDmode.  This inevitably leads to
cases where a caller accidentally passes something other than VOIDmode.
E.g. when expanding an SImode move, the temptation is to make everything
SImode, even the SETs.  This in turn can cause two instructions to appear
different simply because their SETs have different modes, even though the
SET_DEST and SET_SRC are identical.

E.g. for gcc/testsuite/g++.dg/torture/pr34651.C on lm32-elf we have
the following before jump2:

   (jump_insn 42 191 43 5 (set (pc)
  (if_then_else (eq:SI (reg:SI 13 r13 [orig:43 inHotKey$4+-3 ] [43])
  (const_int 0 [0]))
  (label_ref 53)
  (pc))) gcc/testsuite/g++.dg/torture/pr34651.C:22 22 {*beq}
(expr_list:REG_DEAD (reg:SI 13 r13 [orig:43 inHotKey$4+-3 ] [43])
  (int_list:REG_BR_PROB 5000 (nil)))
-> 53)
   (note 43 42 48 6 [bb 6] NOTE_INSN_BASIC_BLOCK)
   (insn 48 43 47 6 (set (reg:SI 2 r2)
  (mem/u/c:SI (reg:SI 1 r1) [4  S4 A32])) 
gcc/testsuite/g++.dg/torture/pr34651.C:22 7 {movsi_insn}
(expr_list:REG_DEAD (reg:SI 1 r1)
  (nil)))
   [...]
   (code_label 53 169 54 7 6 "" [1 uses])
   (note 54 53 12 7 [bb 7] NOTE_INSN_BASIC_BLOCK)
   (insn 12 54 57 7 (set:SI (reg/f:SI 2 r2 [orig:46 D.2050 ] [46])
  (mem/u/c:SI (reg:SI 1 r1) [4  S4 A32])) 
gcc/testsuite/g++.dg/torture/pr34651.C:22 7 {movsi_insn}
(expr_list:REG_DEAD (reg:SI 1 r1)
  (expr_list:REG_EQUAL (symbol_ref/f:SI ("*.LC3") [flags 0x2]  )
  (nil

where insns 12 and 48 are identical except for the :SI on the SET.
This difference prevents us from merging the heads of the two blocks;
after removing it we replace the two loads with a single load before
the branch.
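
To make the failure mode concrete, here is a small self-contained model of it.  The toy_rtx type and the toy_* functions are invented for illustration and are not GCC's rtx representation or API; the sketch only assumes that structural comparison looks at the mode of every node, which is what makes a stray :SI on a SET matter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for RTL codes and machine modes (hypothetical names).  */
enum toy_code { TOY_REG, TOY_MEM, TOY_SET };
enum toy_mode { TOY_VOIDmode, TOY_SImode };

struct toy_rtx
{
  enum toy_code code;
  enum toy_mode mode;
  int regno;                    /* Used by TOY_REG only.  */
  const struct toy_rtx *op0;    /* MEM address or SET destination.  */
  const struct toy_rtx *op1;    /* SET source.  */
};

/* Structural equality: code, mode and operands must all match.  */
bool
toy_rtx_equal (const struct toy_rtx *a, const struct toy_rtx *b)
{
  if (a == b)
    return true;
  if (a == NULL || b == NULL)
    return false;
  if (a->code != b->code || a->mode != b->mode)
    return false;
  if (a->code == TOY_REG)
    return a->regno == b->regno;
  return toy_rtx_equal (a->op0, b->op0) && toy_rtx_equal (a->op1, b->op1);
}

/* A constructor without a mode parameter: callers cannot pass a wrong one.  */
struct toy_rtx
toy_gen_set (const struct toy_rtx *dest, const struct toy_rtx *src)
{
  struct toy_rtx set = { TOY_SET, TOY_VOIDmode, 0, dest, src };
  return set;
}
```

Two SETs built through toy_gen_set with the same operands always compare equal, whereas a SET whose mode was forced to TOY_SImode compares unequal to an otherwise identical one, mirroring insns 12 and 48 above.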

This patch removes the mode argument from gen_rtx_SET and updates
all callers.  I used a script to (try to) make sure that all callers
really had been caught.  I also built one target per CPU just in case.
There were some changes in gcc.dg, g++.dg and gcc.c-torture assembly
code for c6x-elf, lm32-elf and v850-elf, but all of them seemed to be
code improvements from removing duplicated instructions.  (Other ports
also passed spurious modes but apparently not in a way that affects
the tests I'd tried.)  Also tested on x86_64-linux-gnu.  OK to install?

BTW, I've split the patch up into two, the last bit being a mechanical
removal of modes.  (I did it by hand though to try to keep things
properly formatted.)

Thanks,
Richard


gcc/
* rtl.h (always_void_p): New function.
* gengenrtl.c (always_void_p): Likewise.
(genmacro): Don't add a mode parameter to gen_rtx_foo if rtxes
with code foo are always VOIDmode.
* genemit.c (gen_exp): Update gen_rtx_foo calls accordingly.
* builtins.c, caller-save.c, calls.c, cfgexpand.c, combine.c,
compare-elim.c, config/aarch64/aarch64.c,
config/aarch64/aarch64.md, config/alpha/alpha.c,
config/alpha/alpha.md, config/arc/arc.c, config/arc/arc.md,
config/arm/arm-fixed.md, config/arm/arm.c, config/arm/arm.md,
config/arm/ldrdstrd.md, config/arm/thumb2.md, config/arm/vfp.md,
config/avr/avr.c, config/bfin/bfin.c, config/c6x/c6x.c,
config/c6x/c6x.md, config/cr16/cr16.c, config/cris/cris.c,
config/cris/cris.md, config/darwin.c, config/epiphany/epiphany.c,
config/epiphany/epiphany.md, config/fr30/fr30.c, config/frv/frv.c,
config/frv/frv.md, config/h8300/h8300.c, config/i386/i386.c,
config/i386/i386.md, config/i386/sse.md, config/ia64/ia64.c,
config/ia64/vect.md, config/iq2000/iq2000.c,
config/iq2000/iq2000.md, config/lm32/lm32.c, config/lm32/lm32.md,
config/m32c/m32c.c, config/m32r/m32r.c, config/m68k/m68k.c,
config/m68k/m68k.md, config/mcore/mcore.c, config/mcore/mcore.md,
config/mep/mep.c, config/microblaze/microblaze.c,
config/mips/mips.c, config/mips/mips.md, config/mmix/mmix.c,
config/mn10300/mn10300.c, config/msp430/msp430.c,
config/nds32/nds32-memory-manipulation.c, config/nds32/nds32.c,
config/nds32/nds32.md, config/nios2/nios2.c, config/nvptx/nvptx.c,
config/pa/pa.c, config/pa/pa.md, config/rl78/rl78.c,
config/rs6000/altivec.md, config/rs6000/rs6000.c,
config/rs6000/rs6000.md, config/rs6000/vector.md,
config/rs6000/vsx.md, config/rx/rx.c, config/rx/rx.md,
config/s390/s390.c, config/s390/s390.md, config/sh/sh.c,
config/sh/sh.md, config/sh/sh_treg_combine.cc,
config/sparc/sparc.c, config/sparc/sparc.md, config/spu/spu.c,
config/spu/spu.md, config/stormy16/stormy16.c,
config/tilegx/tilegx.c, config/tilegx/tilegx.md,
config/tilepro/tilepro.c, config/tilepro/tilepro.md,
config/v

Re: [patch 1/10] debug-early merge: Ada front-end

2015-05-08 Thread Richard Biener
On Fri, May 8, 2015 at 12:26 PM, Eric Botcazou  wrote:
>> @@ -5204,28 +5199,6 @@ gnat_write_global_declarations (void)
>> types_used_by_var_decl_insert (t, dummy_global);
>>   }
>>  }
>> -
>> -  /* Output debug information for all global type declarations first.  This
>> - ensures that global types whose compilation hasn't been finalized yet,
>> - for example pointers to Taft amendment types, have their compilation
>> - finalized in the right context.  */
>> -  FOR_EACH_VEC_SAFE_ELT (global_decls, i, iter)
>> -if (TREE_CODE (iter) == TYPE_DECL && !DECL_IGNORED_P (iter))
>> -  debug_hooks->global_decl (iter);

Shouldn't that have used ->type_decl (iter) anyway?  That is, are they not
already processed via rest_of_type_compilation or does the Ada FE not
use that?

>> -  /* Proceed to optimize and emit assembly. */
>> -  symtab->finalize_compilation_unit ();
>> -
>> -  /* After cgraph has had a chance to emit everything that's going to
>> - be emitted, output debug information for the rest of globals.  */
>> -  if (!seen_error ())
>> -{
>> -  timevar_push (TV_SYMOUT);
>> -  FOR_EACH_VEC_SAFE_ELT (global_decls, i, iter)
>> - if (TREE_CODE (iter) != TYPE_DECL && !DECL_IGNORED_P (iter))
>> -   debug_hooks->global_decl (iter);
>> -  timevar_pop (TV_SYMOUT);
>> -}
>>  }
>
> What's the replacement mechanism for the first pass on global_decls?  The
> comment explains that generating debug info must be delayed in this case.

But yes, I don't think the patches add any replacement for processing
TYPE_DECLs that happen to be in global_decls.

Richard.

> --
> Eric Botcazou


Re: [AArch64] Fix predicate and constraint mismatch in logical atomic operations

2015-05-08 Thread Richard Biener
On Tue, Nov 4, 2014 at 11:44 AM, Marcus Shawcroft
 wrote:
> On 25 September 2014 04:45, Michael Collison
>  wrote:
>> On certain patterns in atomics.md the constraint 'n' is used in combination
>> with the predicate atomic_op_operand. The constraint is too general and
>> allows constants that are disallowed by the predicate. This causes an ICE in
>> final_scan_insn when the insn cannot be split because the constraint and
>> predicate do not match.
>>
>> Tested on aarch64-none-elf, aarch64-linux-gnu. Additionally, the original
>> reporter of the bug, (d...@ubuntu.com), applied the patch and successfully
>> bootstrapped and tested with no regressions.
>>
>> 2014-09-23  Michael Collison 
>>
>> * config/aarch64/iterators.md (lconst_atomic): New mode attribute to
>> support constraints for CONST_INT in atomic operations.
>> * config/aarch64/atomics.md
>> (atomic_): Use lconst_atomic constraint.
>> (atomic_nand): Likewise.
>> (atomic_fetch_): Likewise.
>> (atomic_fetch_nand): Likewise.
>> (atomic__fetch): Likewise.
>> (atomic_nand_fetch): Likewise.
>
> OK Thanks.  /Marcus

Can you please backport this to all release branches as well?

Thanks,
Richard.


[PATCH PR65447]Improve IV handling by grouping address type uses with same base and step

2015-05-08 Thread Bin Cheng
Hi,
GCC's IVO currently handles every IV use independently, which the cases
reported in PR65447 show to be the wrong approach.

The rationale is:
1) Lots of address type IVs refer to the same memory object, share a similar
base and have the same step.  We should handle these IVs as a group in order
to maximize CSE opportunities and prefer the reg+offset addressing mode.
2) GCC's IVO algorithm is expensive and is only run when the candidate set is
small enough.  By grouping uses of the same family, we can decrease the number
of both uses and candidates.  Before this patch, the number of candidates for
PR65447 is too big to run the expensive IVO algorithm, resulting in bad
assembly code on targets like AArch64 and MIPS.
3) Even in cases where the assembly code isn't improved, we still get a
compilation time benefit from this patch.
4) This is a prerequisite for enabling auto-increment support in IVO on
AArch64.

For now, this is only done for address type IVs; in the future I may extend
it to general IVs too.
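
The grouping idea can be sketched in isolation as follows.  This is a simplified model under assumptions, not the code in the patch: struct addr_use, toy_group_uses and the integer base_id are hypothetical stand-ins for the real iv_use data, and the real implementation also has to decide which offsets fit the addressing mode.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, flattened view of an address-type use.  */
struct addr_use
{
  int base_id;                  /* Identifies the base with the offset stripped.  */
  long step;                    /* Step of the induction variable.  */
  long offset;                  /* Constant offset stripped from the base.  */
  int group;                    /* Filled in by toy_group_uses.  */
};

/* Order by base, then step, then offset.  */
static int
cmp_addr_use (const void *pa, const void *pb)
{
  const struct addr_use *a = pa;
  const struct addr_use *b = pb;

  if (a->base_id != b->base_id)
    return (a->base_id > b->base_id) - (a->base_id < b->base_id);
  if (a->step != b->step)
    return (a->step > b->step) - (a->step < b->step);
  return (a->offset > b->offset) - (a->offset < b->offset);
}

/* Sort USES and give uses with the same base and step the same group
   number.  Returns the number of groups.  */
size_t
toy_group_uses (struct addr_use *uses, size_t n)
{
  size_t ngroups = 0;

  qsort (uses, n, sizeof *uses, cmp_addr_use);
  for (size_t i = 0; i < n; i++)
    {
      if (i == 0
          || uses[i].base_id != uses[i - 1].base_id
          || uses[i].step != uses[i - 1].step)
        ngroups++;
      uses[i].group = (int) (ngroups - 1);
    }
  return ngroups;
}
```

After grouping, only one representative per group needs to go through candidate selection, and the other members can be expressed as that representative plus a constant offset.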

For AArch64:
Benchmarks 470.lbm/spec2k6 and 173.applu/spec2k are clearly improved by
this patch.  A couple of cases from spec2k/fp appear regressed.  I looked
into the generated assembly code and can confirm the regressions are false
alarms except for one case (189.lucas).  For that case, I think it's another
issue exposed by this patch (GCC failed to CSE candidate setup code, resulting
in a bloated loop header).  Anyway, I also fine-tuned the patch to minimize
the impact.

For AArch32, this patch seems to be able to improve spec2kfp too, but I
didn't look deeply into it.  I guess the reason is that it makes life easier
for auto-increment support in IVO.

One defect of this patch is that the computation of the max offset in
compute_max_addr_offset is basically borrowed from get_address_cost.  The
comment says we should find a better way to compute all the information.
People have also complained that we need to refactor that part of the code.
I don't have a good solution to that yet, though I did try my best to keep
compute_max_addr_offset simple.

I believe this is a generally wanted change.  Bootstrapped and tested on
x86_64 and AArch64.  Is it OK?


2015-05-08  Bin Cheng  

PR tree-optimization/65447
* tree-ssa-loop-ivopts.c (struct iv_use): New fields.
(dump_use, dump_uses): Support to dump sub use.
(record_use): New parameters to support sub use.  Remove call to
dump_use.
(record_sub_use, record_group_use): New functions.
(compute_max_addr_offset, split_all_small_groups): New functions.
(group_address_uses, rewrite_use_address): New functions.
(strip_offset): New declaration.
(find_interesting_uses_address): Call record_group_use.
(add_candidate): New assertion.
(infinite_cost_p): Move definition forward.
(add_costs): Check INFTY cost and return immediately.
(get_computation_cost_at): Clear setup cost and dependent bitmap
for sub uses.
(determine_use_iv_cost_address): Compute cost for sub uses.
(rewrite_use_address_1): Rename from old rewrite_use_address.
(free_loop_data): Free sub uses.
(tree_ssa_iv_optimize_loop): Call group_address_uses.

gcc/testsuite/ChangeLog
2015-05-08  Bin Cheng  

PR tree-optimization/65447
* gcc.dg/tree-ssa/pr65447.c: New test.
Index: gcc/tree-ssa-loop-ivopts.c
===
--- gcc/tree-ssa-loop-ivopts.c  (revision 222758)
+++ gcc/tree-ssa-loop-ivopts.c  (working copy)
@@ -226,6 +226,7 @@ struct cost_pair
 struct iv_use
 {
   unsigned id; /* The id of the use.  */
+  unsigned sub_id; /* The id of the sub use.  */
   enum use_type type;  /* Type of the use.  */
   struct iv *iv;   /* The induction variable it is based on.  */
   gimple stmt; /* Statement in that it occurs.  */
@@ -239,6 +240,11 @@ struct iv_use
 
   struct iv_cand *selected;
/* The selected candidate.  */
+
+  struct iv_use *next; /* The next sub use.  */
+  tree addr_base;  /* Base address with const offset stripped.  */
+  unsigned HOST_WIDE_INT addr_offset;
+   /* Const offset stripped from base address.  */
 };
 
 /* The position where the iv is computed.  */
@@ -556,8 +562,12 @@ dump_iv (FILE *file, struct iv *iv)
 void
 dump_use (FILE *file, struct iv_use *use)
 {
-  fprintf (file, "use %d\n", use->id);
+  fprintf (file, "use %d", use->id);
+  if (use->sub_id)
+fprintf (file, ".%d", use->sub_id);
 
+  fprintf (file, "\n");
+
   switch (use->type)
 {
 case USE_NONLINEAR_EXPR:
@@ -605,8 +615,12 @@ dump_uses (FILE *file, struct ivopts_data *data)
   for (i = 0; i < n_iv_uses (data); i++)
 {
   use = iv_use (data, i);
-
-  dump_use (file, use);
+  do
+   {
+ dump_use (file, use);
+ use = use->next;
+   }
+  while (use);
   fprintf (file, "\n");
 }
 }
@@ -1327,33 +1341,88 @@ find_induction_variables (struct ivopts_data *data
   return tru
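
The sub-use chain that the dump changes above walk can be modelled in a few lines.  This is a sketch with invented names (toy_use, toy_use_label, toy_group_size); it only mirrors the "use N" versus "use N.M" labelling and the do/while traversal over next, not the rest of the patch.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical cut-down version of the use record.  */
struct toy_use
{
  unsigned id;                  /* Id of the use (group).  */
  unsigned sub_id;              /* Id of the sub use; 0 for the first one.  */
  struct toy_use *next;         /* Next sub use, or NULL.  */
};

/* Write "use N" or "use N.M" into BUF.  Returns what snprintf returns.  */
int
toy_use_label (char *buf, size_t size, const struct toy_use *use)
{
  if (use->sub_id)
    return snprintf (buf, size, "use %u.%u", use->id, use->sub_id);
  return snprintf (buf, size, "use %u", use->id);
}

/* Count the uses chained together, starting at a non-NULL USE.  */
unsigned
toy_group_size (const struct toy_use *use)
{
  unsigned n = 0;

  do
    {
      n++;
      use = use->next;
    }
  while (use);
  return n;
}
```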

Re: [Patch, Fortran, PR58586, v3] ICE with derived type with allocatable component passed by value

2015-05-08 Thread Andre Vehreschild
Hi Mikael,

thanks for the review. I still have some questions/remarks before committing:

On Thu, 07 May 2015 12:14:59 +0200
Mikael Morin  wrote:

> > @@ -2158,6 +2158,8 @@ build_function_decl (gfc_symbol * sym, bool global)
> >  gfc_set_decl_assembler_name (fndecl, gfc_sym_mangled_function_id
> > (sym)); 
> >sym->backend_decl = fndecl;
> > +  if (sym == sym->result && !sym->result->backend_decl)
> > +sym->result->backend_decl = result_decl;
> 
> Something is seriously misbehaving if the condition is true, and setting
> sym->backend_decl to result_decl doesn't seem any better than keeping it
> NULL.
> So, please remove this change

Did that. I think this was a relic from the start of me trying to understand
what was the issue and how to fix it. Later I didn't check, if it was still
necessary. Sorry for that.

> > @@ -5898,8 +5900,21 @@ gfc_generate_function_code (gfc_namespace * ns)
> >  
> >if (TREE_TYPE (DECL_RESULT (fndecl)) != void_type_node)
> >  {
> > +  bool artificial_result_decl = false;
> >tree result = get_proc_result (sym);
> >  
> > +  /* Make sure that a function returning an object with
> > +alloc/pointer_components always has a result, where at least
> > +the allocatable/pointer components are set to zero.  */
> > +  if (result == NULL_TREE && sym->attr.function
> > + && sym->ts.type == BT_DERIVED
> > + && (sym->ts.u.derived->attr.alloc_comp
> > + || sym->ts.u.derived->attr.pointer_comp))
> > +   {
> > + artificial_result_decl = true;
> > + result = gfc_get_fake_result_decl (sym, 0);
> > +   }
> 
> I expect the "fake" result decl to be needed in more cases.
> For example, if type is BT_CLASS.
> Here is a variant of alloc_comp_class_4.f03:c_init for such a case.
> 
>   class(c) function c_init2()
> allocatable :: c_init2
>   end function
> 
> or even without class:
> 
>   type(t) function t_init()
> allocatable :: t_init
>   end function
> 
> for any type t.
> 
> So, remove the check for alloc_comp/pointer_comp and permit BT_CLASS.
> One minor thing, check sym->result's type and attribute instead of sym's
> here.  It should not make a difference, but I think it's more correct.

I agree with checking sym->result, but I am not happy with removing the
checks for alloc_comp|pointer_comp. If I understood you correctly, you propose
the if to be like this:

  if (result == NULL_TREE && sym->attr.function
  && (sym->result->ts.type == BT_DERIVED
  || sym->result->ts.type == BT_CLASS))

Removing the attribute checks means initializing every derived/class type
result, which may change the semantics of the code more than intended. Look,
for example, at this code:

  type t
    integer :: i = 5
  end type

  type(t) function static_t_init()
  end function

When one compiles this code with -Wreturn-type, then the warning of an
uninitialized return value is issued at the function declaration. Nevertheless
the result of static_t_init is validly initialized and i is 5. This may
confuse users.

I therefore came to the very ugly solution to make this:

  if (result == NULL_TREE && sym->attr.function
      && ((sym->result->ts.type == BT_DERIVED
           && (sym->result->attr.allocatable
               || sym->result->ts.u.derived->attr.alloc_comp
               || sym->result->ts.u.derived->attr.pointer_comp))
          || (sym->result->ts.type == BT_CLASS
              && (CLASS_DATA (sym->result)->attr.allocatable
                  || CLASS_DATA (sym->result)->attr.alloc_comp
                  || CLASS_DATA (sym->result)->attr.pointer_comp))))

(I am not yet sure whether the pointer attribute needs to be added, too.) With
the code above, the result of static_t_init is not initialized, with all the
consequences.
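
To make the intent of that condition easier to check, here is a stand-alone model of the decision.  It is a sketch only: toy_result and toy_needs_fake_result are hypothetical, and the model flattens the derived and class cases into one set of flags, which the real symbol data does not do (the class case goes through CLASS_DATA).

```c
#include <stdbool.h>

/* Hypothetical, flattened description of a function result.  */
enum toy_type { TOY_INTEGER, TOY_DERIVED, TOY_CLASS };

struct toy_result
{
  enum toy_type type;
  bool allocatable;             /* The result itself is allocatable.  */
  bool alloc_comp;              /* Its type has allocatable components.  */
  bool pointer_comp;            /* Its type has pointer components.  */
};

/* Return true if an artificial result declaration should be created:
   no result exists yet, the symbol is a function, and the result is a
   derived or class object with something that must be set to zero.  */
bool
toy_needs_fake_result (bool have_result, bool is_function,
                       const struct toy_result *res)
{
  if (have_result || !is_function)
    return false;
  if (res->type != TOY_DERIVED && res->type != TOY_CLASS)
    return false;
  return res->allocatable || res->alloc_comp || res->pointer_comp;
}
```

With this predicate, a plain type(t) result like the one in static_t_init, with no allocatable or pointer parts, is left alone, while an allocatable class or derived result gets the artificial declaration.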

So what do you propose to do here?

Btw, I think I found an additional bug during testing: 
  type(t) function t_init()
    allocatable :: t_init
  end function
 
when called by:
  type(t), allocatable :: temp
  temp = t_init()

a segfault occurs, because the result of t_init() is NULL, which is
dereferenced by the caller in this pseudo-code:

  if (temp != 0B) goto L.12;
  temp = (struct t *) __builtin_malloc (4);
L.12:;
  *temp = *t_init (); <-- This obviously is problematic.

> The rest looks good.
> The patch is OK with the suggested changes above.  Thanks.
> I don't think the test functions above work well enough to be
> incorporated in a testcase for now.

I don't follow you there. What do you mean? Do you think the
alloc_comp_class_3/4.* tests are not correctly testing the issue? Any idea of
how to test this better? I mean, the PR is about these artificial constructs. I
merely stumbled on it while searching for a PR about allocatable components.

Attached is a version of the patch that I currently use. Note the testcase
alloc_comp_class_4.f03 fails currently, because of the error noted above in
line 94.

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad 

RE: [patch 0/27] RFC: Use automake-1.11.6 across the tree

2015-05-08 Thread Bernd Edlinger
Hi Michael,

On Thu, 7 May 2015 18:52:52, Michael Haubenwallner wrote:
>
> Hi Bernd,
>
> On 05/06/2015 03:01 PM, Bernd Edlinger wrote:
>> On Tue, 5 May 2015 18:03:15, Michael Haubenwallner wrote:
>>>
>>> Now that gcc-5 is out, what about an automake-1.11.6 update for gcc-6?
>>>
>>> BTW, the actual commands I use to re-run automake for everything (I found) 
>>> is:
>>> $ export AUTOMAKE='automake-1.11 --add-missing --copy --force-missing'
>>> $ /src/gcc-trunk/configure --prefix=/install \
>>> --enable-languages=c,c++,fortran,go,java,lto,objc,obj-c++ \
>>> --enable-liboffloadmic=target \
>>> --enable-libmpx \
>>> --enable-maintainer-mode
>>> $ make bootstrap
>>>
>>
>> And for completeness: ada missing here?
>
> This starts to become tricky here on my quite up-to-date Gentoo stable amd64 
> box:
>
> The normal host compiler is: gcc version 4.8.4 configured to 
> --enable-languages=c,c++
> while the gnat compiler is: gnatgcc version 4.3.5 configured to 
> --enable-languages=c,ada
>
> But: How do I tell the gcc-trunk/configure to use gcc/g++ for C/C++ and 
> gnatgcc for Ada?
>
> I've thought of using CC=gnatgcc, but then I also would need something like 
> CXX=gnatg++
> OTOH, seems like Gentoo never has enabled ada for the normal host gcc.
>
> Is this a problem I should fix with Gentoo, or is it me missing anything here?
>
> Thanks!
> /haubi/
>
> $ gcc -v
> Using built-in specs.
> COLLECT_GCC=/usr/x86_64-pc-linux-gnu/gcc-bin/4.8.4/gcc
> COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/4.8.4/lto-wrapper
> Target: x86_64-pc-linux-gnu
> Configured with: 
> /var/tmp/portage/sys-devel/gcc-4.8.4/work/gcc-4.8.4/configure 
> --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/usr 
> --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.8.4 
> --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.4/include 
> --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.8.4 
> --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.8.4/man 
> --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.8.4/info 
> --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.4/include/g++-v4 
> --with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/4.8.4/python 
> --enable-languages=c,c++ --enable-obsolete --enable-secureplt 
> --disable-werror --with-system-zlib --disable-nls --enable-checking=release 
> --with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo 4.8.4 p1.4, 
> pie-0.6.1' --enable-libstdcxx-time --enable-shared --enable-threads=posix 
> --enable-__cxa_atexit --enable-clocale=gnu --enable-multilib 
> --with-multilib-list=m32,m64 --disable-altivec --disable-fixed-point --en
> able-targets=all --disable-libgcj --enable-libgomp --disable-libmudflap 
> --disable-libssp --disable-libquadmath --enable-lto --without-cloog 
> --enable-libsanitizer
> Thread model: posix
> gcc version 4.8.4 (Gentoo 4.8.4 p1.4, pie-0.6.1)
>
> $ gnatgcc -v
> Using built-in specs.
> Target: x86_64-pc-linux-gnu
> Configured with: 
> /var/tmp/portage/dev-lang/gnat-gcc-4.3.5/work/gcc-4.3.5/configure 
> --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gnat-gcc-bin/4.3 
> --includedir=/usr/lib64/gnat-gcc/x86_64-pc-linux-gnu/4.3/include 
> --libdir=/usr/lib64/gnat-gcc/x86_64-pc-linux-gnu/4.3 
> --libexecdir=/usr/libexec/gnat-gcc/x86_64-pc-linux-gnu/4.3 
> --datadir=/usr/share/gnat-gcc-data/x86_64-pc-linux-gnu/4.3 
> --mandir=/usr/share/gnat-gcc-data/x86_64-pc-linux-gnu/4.3/man 
> --infodir=/usr/share/gnat-gcc-data/x86_64-pc-linux-gnu/4.3/info 
> --program-prefix=gnat --enable-languages=c,ada --with-gcc 
> --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --disable-nls 
> --with-system-zlib --disable-checking --disable-werror --disable-libgomp 
> --disable-libmudflap --disable-libssp --disable-libunwind-exceptions 
> --enable-libada --enable-threads=gnat --enable-shared=boehm-gc,ada,libada 
> --enable-multilib --enable-__cxa_atexit --enable-clocale=gnu
> Thread model: gnat
> gcc version 4.3.5

Hmm...

I don't think the bootstrap can work if the "gcc" driver program does not
understand Ada and C++ at the same time.

This can be a bit tricky:  if you cannot find a working gcc with Ada and C/C++
for your machine, then you will need to bootstrap that on a different host
first.  That is possible if you copy the so-called system root files, that is,
all the necessary glibc headers and glibc binaries, from your target system to
$PREFIX/x86_64-pc-linux-gnu/include and $PREFIX/x86_64-pc-linux-gnu/lib on the
build machine after the binutils install but before the gcc bootstrap begins.


Bernd.
  

Re: Fix logic error in Fortran OpenACC parsing

2015-05-08 Thread Ilmir Usmanov

Hi!

On 06.05.2015 14:38, Thomas Schwinge wrote:

Hi!

On Tue, 5 May 2015 15:38:03 -0400, David Malcolm  wrote:

On Wed, 2015-04-29 at 14:10 +0200, Mikael Morin wrote:

Le 29/04/2015 02:02, David Malcolm a écrit :

diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c
index 2c7c554..30e4eab 100644
--- a/gcc/fortran/parse.c
+++ b/gcc/fortran/parse.c
@@ -4283,7 +4283,7 @@ parse_oacc_structured_block (gfc_statement acc_st)
unexpected_eof ();
else if (st != acc_end_st)
gfc_error ("Expecting %s at %C", gfc_ascii_statement (acc_end_st));
-   reject_statement ();
+  reject_statement ();
  }
while (st != acc_end_st);
  

I think this one is a bug; there should be braces around 'gfc_error' and
'reject_statement'.
If 'st' is 'acc_end_st', as it shall be, the statement is rejected. So, 
this is a bug.





At least that's the pattern in 'parse_oacc_loop', and how the
'unexpected_statement' function is used.

FWIW, Jeff had approved that patch, so I've committed the patch to trunk
(as r222823), making the indentation reflect the block structure.

Thomas:  should the
   reject_statement ();
call in the above be guarded by the
  else if (st != acc_end_st)
clause?

Indeed, this seems to be a bug that has been introduced very early in the
OpenACC Fortran front end development -- see how the
parse_oacc_structured_block function evolved in the patches posted in

and following (Ilmir, CCed "just in case").  I also see that the
corresponding OpenMP code, parse_omp_structured_block, just calls
unexpected_statement, which Ilmir's initial patch also did, but at some
point, he then changed this to the current code: gfc_error followed by
reject_statement, as cited above -- I would guess for the reason to get a
better error message?  (Tobias, should this thus also be done for OpenMP,
and/or extend unexpected_statement accordingly?)

That's true.
I've checked the abandoned openacc-1_0-branch, and I used
unexpected_statement there (the odd *_acc_* naming is still present there
instead of the new-and-shiny *_oacc_* one), but, as you mentioned, I
changed this for better error reporting... and introduced the bug.




And then, I'm a bit confused: is it "OK" that despite this presumed logic
error, which affects all (?) valid executions of this parsing code, we're
not running into any issues with the OpenACC Fortran front end test
cases?
I think this is OK, since this is an !$ACC END _smth_ statement and it
shall not be present in the AST. So, it is discarded later anyway ;)  (if I
remember correctly, during the gfc_clear_new_st call). Although the bug does
not affect the logic, it is still a bug.
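
The difference between the two forms can be reproduced outside the front end.  The sketch below uses invented names (toy_statement, toy_outcome, toy_parse_unbraced, toy_parse_braced) and plain counters in place of gfc_error and reject_statement; unlike the real unexpected_eof path, the toy version simply falls through.

```c
/* Hypothetical stand-ins for statement kinds.  */
enum toy_statement { TOY_ST_NONE, TOY_ST_OTHER, TOY_ST_END };

struct toy_outcome
{
  int errors;                   /* Stands in for diagnostics issued.  */
  int rejected;                 /* Stands in for reject_statement calls.  */
};

/* Shape of the code before the fix: the rejection is not part of the
   else-if, so it runs on every path, including the valid one.  */
struct toy_outcome
toy_parse_unbraced (enum toy_statement st, enum toy_statement end_st)
{
  struct toy_outcome out = { 0, 0 };

  if (st == TOY_ST_NONE)
    out.errors++;
  else if (st != end_st)
    out.errors++;
  out.rejected++;
  return out;
}

/* Shape of the code after the fix: error and rejection go together.  */
struct toy_outcome
toy_parse_braced (enum toy_statement st, enum toy_statement end_st)
{
  struct toy_outcome out = { 0, 0 };

  if (st == TOY_ST_NONE)
    out.errors++;
  else if (st != end_st)
    {
      out.errors++;
      out.rejected++;
    }
  return out;
}
```

Calling both with st equal to end_st shows the point made above: the unbraced form rejects the statement without reporting any error, while the braced form leaves it alone.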



OK for trunk?

From my point of view, OK.



commit 068eebfa63b2b4c8849ed5fd2c9d0a130586dfb0
Author: Thomas Schwinge 
Date:   Wed May 6 13:18:18 2015 +0200

 Fix logic error in Fortran OpenACC parsing
 
 	gcc/fortran/

* parse.c (parse_oacc_structured_block): Fix logic error.
Reported by Mikael Morin .
---
  gcc/fortran/parse.c |    6 ++++--
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git gcc/fortran/parse.c gcc/fortran/parse.c
index 30e4eab..e977498 100644
--- gcc/fortran/parse.c
+++ gcc/fortran/parse.c
@@ -4282,8 +4282,10 @@ parse_oacc_structured_block (gfc_statement acc_st)
if (st == ST_NONE)
unexpected_eof ();
else if (st != acc_end_st)
-   gfc_error ("Expecting %s at %C", gfc_ascii_statement (acc_end_st));
-  reject_statement ();
+   {
+ gfc_error ("Expecting %s at %C", gfc_ascii_statement (acc_end_st));
+ reject_statement ();
+   }
  }
while (st != acc_end_st);
  



Grüße,
  Thomas

--
Ilmir.


RE: [patch 1/28] top-level: Use automake-1.11.6

2015-05-08 Thread Bernd Edlinger
Hi,

On Thu, 7 May 2015 15:25:14, Joseph S. Myers wrote:
>
> On Thu, 7 May 2015, Bernd Edlinger wrote:
>
>> But that is not the case for other tool scripts.  I think these should
>> be in-sync with the automake version that creates the configure scripts
>> that make use of them.
>
> At least some of these scripts are also usable other than from
> automake-generated code (I don't know if they're used like that in GCC,
> but some projects use them like that). New versions should be compatible
> with older automake, and I don't think we should be downgrading these
> scripts to older versions (which is what this patch would do).
>

Yes, but the world is not as perfect as it should be.

One example where there is an incompatibility is "missing":

Formerly it had code that emulated the missing "flex" by
creating a dummy lex.yy.c from the hopefully installed
pre-compiled flex output file.  But the version from the
trunk does nothing, which breaks all configure scripts
that used AM_PROG_LEX.  I do assume that the
automake scripts just use a different way to achieve
the same goal, if flex is not installed.

See https://gcc.gnu.org/ml/gcc-patches/2014-11/msg03007.html
for an example of what can happen if the tool scripts are
updated but automake is not.


Bernd.
  

Re: [patch] Implement ISO/IEC TS 18822 C++ File system TS

2015-05-08 Thread Rainer Orth
Jonathan Wakely  writes:

> I've committed the two changes attached (only tested on linux again).
>
> patch2.txt should fix the mingw-w64 errors above, as well as the
> issues Daniel reported, and should fix the error on Solaris 10
>
> Rainer, would you be able to test with
> --enable-libstdcxx-filesystem-ts before we re-enable it to build by
> default on solaris* ?

I just did: a bootstrap on i386-pc-solaris2.10 completed successfully
and the experimental/filesystem tests all PASSed.  Given that Solaris 11
was fine even before, I think it's safe to re-enable it by default on
Solaris again.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [Patch, fortran, pr65894, v1] [6 Regression] severe regression in gfortran 6.0.0

2015-05-08 Thread Andre Vehreschild
Hi Mikael,

at first I tried to fix this issue with the scalarizer, too, but I could not
grasp how the scalarizer works. Do you have any documentation on how it is
meant to work? I mean, I have read the comments in the code, but those are
sparse, and the multitude of routines the scalarizer is split up into doesn't
help either.

Anyway, because not a single line of code from my patch is left, this has to be
your patch now. Thanks for finding a better solution. 

I do not have the privileges to do a review so I can't help you there. Good
luck finding a reviewer.

Regards,
Andre

On Thu, 07 May 2015 18:35:19 +0200
Mikael Morin  wrote:

> Le 07/05/2015 11:52, Andre Vehreschild a écrit :
> > Hi all,
> > 
> > my work on pr60322 caused a regression on trunk. This patch fixes it. The
> > regression had two causes:
> > 1. Not taking the correct attribute for BT_CLASS objects with allocatable
> >components into account (chunk 1), and
> > 2. taking the address of an address (chunk 2). When a class or derived typed
> >scalar object is to be returned as a reference and a scalarizer is
> > present, then the address of the address of the object was returned. The
> > former code was meant to return the address of an array element for which
> > taking the address was ok. The patch now prevents taking the additional
> > address when the object is scalar.
> > 
> Hello,
> 
> The "chunk 2" fix should go in gfc_conv_expr, so that
> gfc_add_loop_ss_code's "can_be_null_ref" condition matches the one in
> gfc_conv_expr.  Both functions work together, if references are
> generated in gfc_add_loop_ss_code, they should be used as reference in
> gfc_conv_expr.  Same if values are generated.
> 
> About the condition of the first chunk, I don't understand what it's
> good for.
> 
> So I propose the attached patch instead.
> It creates a new function to decide between reference and value, so that
> gfc_add_loop_ss_code and gfc_conv_expr are kept in sync.
> As the new function needs information about the dummy argument, the
> dummy symbol is saved to a new field in gfc_ss_info.
> And the "chunk 1" condition is reverted to its previous state.
> The testcase is yours.
> 
> regression tested on x86_64-unknown-linux-gnu.  OK for trunk?
> 
> Mikael
> 
> 
> 
> 
> 


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Re: [patch 7/10] debug-early merge: LTO

2015-05-08 Thread Richard Biener
On Fri, May 8, 2015 at 2:37 AM, Aldy Hernandez  wrote:
>

Ok.

Thanks,
Richard.


Re: [patch 0/27] RFC: Use automake-1.11.6 across the tree

2015-05-08 Thread Andreas Schwab
Michael Haubenwallner  writes:

> This starts to become tricky here on my quite up-to-date Gentoo stable amd64 
> box:
>
> The normal host compiler is: gcc version 4.8.4 configured to 
> --enable-languages=c,c++
> while the gnat compiler is:  gnatgcc version 4.3.5 configured to 
> --enable-languages=c,ada
>
> But: How do I tell the gcc-trunk/configure to use gcc/g++ for C/C++ and 
> gnatgcc for Ada?

You can find a working Ada compiler here:

http://software.opensuse.org/package/gcc-ada

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [patch 3/10] debug-early merge: C++ front-end

2015-05-08 Thread Richard Biener
On Fri, May 8, 2015 at 2:34 AM, Aldy Hernandez  wrote:
>

Maybe you can split out the Java aliases stuff (that annoyed me multiple times
when trying to refactor the FE - middle-end interface).  It looks
unrelated enough.

Thanks,
Richard.


Re: Remove mode argument from gen_rtx_SET

2015-05-08 Thread Segher Boessenkool
On Fri, May 08, 2015 at 12:32:30PM +0200, Franz Sirl wrote:
> this patch (r222882 is fine, r222883 fails) breaks bootstrap for me on 
> x86_64-linux-gnu:

i386.md has "set:BND" twice; replace that with just "set", and all
should be fine.

Maybe gen* should warn on this; maybe it already does.


Segher


Re: [patch 0/10] debug-early merge

2015-05-08 Thread Richard Biener
On Fri, May 8, 2015 at 2:30 AM, Aldy Hernandez  wrote:
> Hi folks.
>
> I have divided the patches into 10 pieces.  The patches are interdependent
> and cannot be applied independently.  I am merely dividing them up to aid
> the relevant reviewers.
>
> As I've mentioned elsewhere, the patchset as posted has been bootstrapped
> and GCC tested on:
>
> x86_64-unknown-linux-gnu
> powerpc-ibm-aix7.1.2.0
> powerpc64-unknown-linux-gnu
> aarch64-unknown-linux-gnu
>
> I have also GDB tested the patchset on x86_64-linux.

I've looked over all but the middle-end changes (heh), and I wonder if most
of the other stuff (Java method aliases cleanup, global-decl-wrapup and other
FE specific stuff) can be split out from the main work and reviewed / committed
independently.

I'll go over the "meat" of the changes early next week.

Thanks for doing all this work!

Richard.

> Thanks for your help in this ordeal.
>
> Bring it on!
> Aldy


Re: [patch 4/10] debug-early merge: Fortran front-end

2015-05-08 Thread Tobias Burnus
Aldy Hernandez wrote:
> gcc/fortran/
>
>   * f95-lang.c (gfc_write_global_declarations): Remove.
>   (LANG_HOOKS_WRITE_GLOBALS): Remove.
>   (gfc_write_global_declarations): Move code from here to...
>   (gfc_be_parse_file): ...here.
>   Call global_decl_processing.
>   * trans-decl.c (gfc_emit_parameter_debug_info): Rename global_decl
>   to early_global_decl.

I don't have a real overview how those parts interact. However, the code
looks good to me. In addition, it looks much nicer than the previous
code.

Thanks,

Tobias


Re: [RFC] Elimination of zext/sext - type promotion pass

2015-05-08 Thread Richard Biener
On Fri, May 1, 2015 at 6:41 AM, Kugan  wrote:
>
>>> Thanks for the comments. Here is a prototype patch that implements a
>>> type promotion pass. This pass records SSA variables that will have
>>> values in higher bits (than the original type precision) if promoted and
>>> uses this information in inserting appropriate truncations and
>>> extensions. This pass also classifies some of the stmts that sets ssa's
>>> to be unsafe to promote. Here is a gimple difference for the type
>>> promotion as compared to previous dump for a testcase.
>>
>> Note that while GIMPLE has a way to zero-extend (using BIT_AND_EXPR)
>> it has no convenient way to sign-extend other than truncating to a signed
>> (non-promoted) type and then extending to the promoted type.  Thus
>> I think such pass should be accompanied with a new tree code,
>> SEXT_EXPR.  Otherwise we end up with "spurious" un-promoted
>> signed types which later optimizations may be confused about.
>>
>> Not sure if that is the actual issue though.
>>
>> Instead of "prmt" and "prmtn" I'd spell out promote, and tree-type-prmtn
>> should be gimple-ssa-type-promote.c.  In the end all targets with
>> non-trivial PROMOTE_MODE should run the pass as a lowering step
>> so it should be enabled even at -O0 (and not disablable).
>>
>> I'd definitely run the pass _after_ pass_lower_vector_ssa (and in the
>> end I'd like to run it before IVOPTs ... which means moving IVOPTs
>> later, after VRP which should be the pass optimizing away some of
>> the extensions).
>>
>> In get_promoted_type I don't understand why you preserve qualifiers.
>> Also even for targets without PROMOTE_MODE it may be
>> beneficial to expose truncations required by expanding bit-precision
>> arithmetic earlier (that is, if !PROMOTE_MODE at least promote
>> to GET_MODE_PRECISION (TYPE_MODE (type))).  A testcase
>> for that is for example
>>
>> struct { long i : 33; long j : 33; } a;
>> return a.i + a.j;
>>
>> where bitfields of type > int do not promote so you get a
>> 33 bit add which we expand to a 64bit add plus a sign-extension
>> (and nothing optimizes that later usually).
>>
>> insert_next_bb sounds like you want to use insert_on_edge
>> somewhere.
>>
>> in assign_rhs_promotable_p you handle comparisons special
>> but the ternary COND_EXPR and VEC_COND_EXPR can have
>> comparisons embedded in their first operand.  The comment
>> confuses me though - with proper sign- or zero-extensions inserted
>> you should be able to promote them anyway?
>>
>> You seem to miss that a GIMPLE_ASSIGN can have 3 operands
>> in promote_cst_in_stmt as well.
>>
>> In promote_assign_stmt_use I consider a default: case that ends
>> up doing nothing dangerous ;)  Please either use gcc_unreachable ()
>> or do the safe thing (fix = true;?).  You seem to be working with
>> a lattice of some kind - fixing up stmt uses the way you do - walking
>> over immediate uses - is not very cache friendly.  Why not use
>> a lattice for this - record promoted vars to be used for old SSA names
>> and walk over all stmts instead, replacing SSA uses on them?
>> Btw, you don't need to call update_stmt if you SET_USE and not
>> replace an SSA name with a constant.
>>
>> You seem to "fix" with a single stmt but I don't see where you insert
>> zero- or sign-extensions for ssa_overflows_p cases?
>>
>> Note that at least for SSA names with !SSA_NAME_VAR (thus
>> anonymous vars) you want to do a cheaper promotion by not
>> allocating a new SSA name but simply "fixing" its type by
>> assigning to its TREE_TYPE.   For SSA names with SSA_NAME_VAR
>> there is of course debug-info to consider and thus doing what you
>> do is better (but probably still will wreck debuginfo?).
>>
>> GIMPLE_NOPs are not only used for parameters but also uninitialized
>> uses - for non-parameters you should simply adjust their type.  No
>> need to fixup their value.
>>
>> The pass needs more comments.
>>
>> It looks like you are not promoting all variables but only those
>> where compensation code (zero-/sign-extensions) is not necessary?
>>
>
> Thanks for the comments. Please find an updated version of this which
> addresses your review comments above. I am still to do full benchmarking
> on this, but tried with few small benchmarks. I will do proper
> benchmarking after getting feedback on the implementation. I have
> however bootstrapped on x86-64-none-linux and regression tested on
> x86-64, ARM and AArch64.
>
> I am also not clear with how I should handle the gimple debug statements
> when the intermediate temporary variable that maps to the original
> variable is promoted.

A few notes.

+/*  Sign-extend operation.  */
+DEFTREECODE (SEXT_EXPR, "sext_expr", tcc_binary, 2)

this needs an extended comment documenting the operands.
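
For reference while writing that comment, here is a minimal sketch of what
"sign-extend from a given precision" computes on plain integers. It assumes
the two operands are the value and the precision to extend from, which the
quoted hunk does not confirm; sext_from and zext_from are hypothetical names
and are not part of the patch. The zero-extension is the BIT_AND_EXPR form
mentioned earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extend the low PREC bits of VALUE (the BIT_AND_EXPR form).  */
uint64_t
zext_from (uint64_t value, unsigned prec)
{
  assert (prec >= 1 && prec <= 64);
  if (prec == 64)
    return value;
  return value & ((UINT64_C (1) << prec) - 1);
}

/* Sign-extend the low PREC bits of VALUE: flip the sign bit, then
   subtract it again so that it propagates into the upper bits.  The
   final conversion to int64_t assumes the usual two's complement
   behaviour.  */
int64_t
sext_from (uint64_t value, unsigned prec)
{
  assert (prec >= 1 && prec <= 64);
  if (prec == 64)
    return (int64_t) value;
  uint64_t sign = UINT64_C (1) << (prec - 1);
  uint64_t low = zext_from (value, prec);
  return (int64_t) ((low ^ sign) - sign);
}
```

With prec == 33 this is the extension needed after the 64-bit add in the
bitfield example quoted above.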

+case SEXT_EXPR:
+   {
+ rtx op0 = expand_normal (treeop0);
+ rtx temp;
+ if (!target)
+   target = gen_reg_rtx (TYPE_MODE (TREE_TYPE (treeop0)));
+
+ machine_mode inner_mode = smallest_mode_for_s

Re: Remove mode argument from gen_rtx_SET

2015-05-08 Thread Franz Sirl

On 2015-05-08 at 13:57, Segher Boessenkool wrote:
> On Fri, May 08, 2015 at 12:32:30PM +0200, Franz Sirl wrote:
> > this patch (r222882 is fine, r222883 fails) breaks bootstrap for me on
> > x86_64-linux-gnu:
>
> i386.md has "set:BND" twice; replace that with just "set", and all
> should be fine.
>
> Maybe gen* should warn on this; maybe it already does.


I didn't see a warning in the logs at least. But your suggestion fixes 
the bootstrap for me.


Franz.





Index: gcc/config/i386/i386.md
===
--- gcc/config/i386/i386.md (revision 222909)
+++ gcc/config/i386/i386.md (working copy)
@@ -18879,7 +18879,7 @@
   [(set_attr "type" "mpxchk")])
 
 (define_expand "_ldx"
-  [(parallel [(set:BND (match_operand:BND 0 "register_operand")
+  [(parallel [(set (match_operand:BND 0 "register_operand")
(unspec:BND
 [(mem:
   (match_par_dup 3
@@ -18909,7 +18909,7 @@
 })
 
 (define_insn "*_ldx"
-  [(parallel [(set:BND (match_operand:BND 0 "register_operand" "=w")
+  [(parallel [(set (match_operand:BND 0 "register_operand" "=w")
(unspec:BND
 [(match_operator: 3 "bnd_mem_operator"
   [(unspec:


Re: [PATCH 5/14][AArch64] Add basic fp16 support

2015-05-08 Thread Alan Lawrence


Joseph Myers wrote:
> I'd think it would be desirable to share tests between ARM and AArch64 as
> far as possible (where applicable to both - so not the tests for the
> alternative format, and some of the gcc.target/arm/fp16-* tests using
> scan-assembler might need adapting to work for AArch64).


I agree the most desirable outcome is for the ACLE spec to be normalized between 
the two architectures! In the meantime this implements the specification that we 
have...


I attach a new patch that adds common ARM / AArch64 tests in 
gcc.target/aarch64/fp16 (i.e. beside the shared 
gcc.target/aarch64/advsimd-intrinsics). I've adapted two of the previous tests 
such that they pass on both ARM and AArch64. I'd like to propose this as a patch 
5a, and to drop those tests from the original patch 5.


I'll follow-up with a sort-through of the ARM tests, moving only those that can 
be shared, in due course.


Cheers, Alan
diff --git a/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_1.c b/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_1.c
new file mode 100644
index ..a1c95fd28d14668c5cfa9cfb419c945878d7ac2b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_1.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+/* { dg-additional-options "-mfp16-format=ieee" {target "arm*-*-*"} } */
+
+extern void abort (void);
+
+#define EPSILON 0.0001
+
+int
+main (int argc, char **argv)
+{
+  float f1 = 3.14159f;
+  float f2 = 2.718f;
+  /* This 'assembler' statement should be portable between ARM and AArch64.  */
+  asm volatile ("" : : : "memory");
+  __fp16 in1 = f1;
+  __fp16 in2 = f2;
+
+  /* Do the addition on __fp16's (implicitly converts both operands to
+ float32, adds, converts back to f16, then we convert back to f32).  */
+  __fp16 res1 = in1 + in2;
+  asm volatile ("" : : : "memory");
+  float f_res_1 = res1;
+
+  /* Do the addition on float32's (we convert both operands to f32, and add,
+ as above, but skip the final conversion f32 -> f16 -> f32).  */
+  float f1a = in1;
+  float f2a = in2;
+  float f_res_2 = f1a + f2a;
+
+  if (__builtin_fabs (f_res_2 - f_res_1) > EPSILON)
+abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_2.c b/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_2.c
new file mode 100644
index ..6aa3e59c15e0eb85595871b47e8d8aa937cca47e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_2.c
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+/* { dg-additional-options "-mfp16-format=ieee" {target "arm*-*-*"} } */
+
+extern void abort (void);
+
+#define EPSILON 0.0001
+
+int
+main (int argc, char **argv)
+{
+  int i1 = 3;
+  int i2 = 2;
+  /*  This 'assembler' should be portable across ARM and AArch64.  */
+  asm volatile ("" : : : "memory");
+
+  __fp16 in1 = i1;
+  __fp16 in2 = i2;
+
+  /* Do the addition on __fp16's (implicitly converts both operands to
+ float32, adds, converts back to f16, then we convert to int).  */
+  __fp16 res1 = in1 + in2;
+  asm volatile ("" : : : "memory");
+  int result1 = res1;
+
+  /* Do the addition on int's (we convert both operands directly to int, add,
+ and we're done).  */
+  int result2 = ((int) in1) + ((int) in2);
+
+  if (__builtin_abs (result2 - result1) > EPSILON)
+abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/fp16/fp16.exp b/gcc/testsuite/gcc.target/aarch64/fp16/fp16.exp
new file mode 100644
index ..7dc8d654a34004d280a1e9f6b9f39d868a60464a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/fp16/fp16.exp
@@ -0,0 +1,43 @@
+# Tests of 16-bit floating point (__fp16), for both ARM and AArch64.
+# Copyright (C) 2015 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Exit immediately if this isn't an ARM or AArch64 target.
+if {![istarget arm*-*-*]
+&& ![istarget aarch64*-*-*]} then {
+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_CFLAGS
+if ![info exists DEFAULT_CFLAGS] then {
+set DEFAULT_CFLAGS " -ansi -pedantic-errors"
+}
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+dg-runtest [lsort [glob -nocomplain $srcdir/$su

Re: [PATCH 0/14][ARM/AArch64] __FP16 support, vectors, intrinsics, testsuite

2015-05-08 Thread Alan Lawrence

Alan Lawrence wrote:
This patch series adds support for ARM Neon float16x4_t and float16x8_t vector 
types and intrinsics, and the __fp16 type, on both ARM and AArch64, and extends 
the tests in Christophe Lyon's advsimd-intrinsics testsuite to cover these. (I 
chose to extend the existing tests rather than add new ones, as the majority of 
f16 intrinsics are just moving blocks of 16-bits around and do not depend on HW 
support; I added new files for the conversion intrinsics.)


The ARM parts were previously posted at 
https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01434.html but have had some fixes 
following the testsuite additions. Also, the ARM patches depend upon my ARM 
lane-checking improvements at 
https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01422.html , which I have just pinged.


I've cross-tested baremetal arm-none-eabi, aarch64-none-elf and 
aarch64_be-none-elf with most patches individually, and bootstrapped each patch 
in the series on (the relevant one of) arm-none-linux-gnueabihf and 
aarch64-none-linux-gnu.


OK for trunk?

Cheers, Alan




Ping (ARM, AArch64, Testsuite).



Re: [PATCH 1/2][ARM] PR/63870: Add qualifier to check lane bounds in expand

2015-05-08 Thread Alan Lawrence

Alan Lawrence wrote:

Ping (https://gcc.gnu.org/ml/gcc-patches/2015-01/msg01422.html).

These are required for float16 patches posted at 
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01332.html .


Bootstrapped + check-gcc on arm-none-linux-gnueabihf.

Alan Lawrence wrote:
This is based loosely upon svn r217440, "[AArch64] Add bounds checking to 
vqdm_lane intrinsics...", but applies to more intrinsics (including e.g. 
vget_lane), and does not do the endianness-flipping present on AArch64: the 
objective is to exactly preserve behaviour on all valid code. (Yes, the new 
qualifier may perhaps give us a location for flipping lanes according to 
endianness in the future, but I'm not doing that here.) Checks for lanes being 
in range for many insns are thus moved from assembly to expand time, with 
inlining history. For example, previous error message:


vqrdmulh_lane_s16_indices_1.c: In function 'test1':
vqrdmulh_lane_s16_indices_1.c:9:1: error: lane out of range
}
^

becomes:

In file included vqrdmulh_lane_s16_indices_1.c:3:0:
In function 'vqrdmulh_lane_s16',
inlined from 'test1' at 
gcc/testsuite/gcc.target/aarch64/simd/vqrdmulh_lane_s16_indices_1.c:8:10:
.../install/lib/gcc/arm-none-eabi/5.0.0/include/arm_neon.h:6882:10: error: lane 
-1 out of range 0 - 3

return (int16x4_t)builtin_neon_vqrdmulh_lanev4hi (a, b, c);

Note the question of how to common up tests with those in 
gcc.target/aarch64/simd/*_indices_1.c is not resolved by this patch.


Cross-tested check-gcc on arm-none-eabi
Bootstrapped on arm-none-linux-gnueabihf cortex-a15

gcc/ChangeLog:

 * config/arm/arm-builtins.c (enum arm_type_qualifiers):
 Add qualifier_lane_index.
 (arm_binop_imm_qualifiers, BINOP_IMM_QUALIFIERS): New.
 (arm_getlane_qualifiers): Use qualifier_lane_index.
 (arm_lanemac_qualifiers): Rename to...
 (arm_mac_n_qualifiers): ...this.
 (LANEMAC_QUALIFIERS): Rename to...
 (MAC_N_QUALIFIERS): ...this.
 (arm_mac_lane_qualifiers, MAC_LANE_QUALIFIERS): New.
 (arm_setlane_qualifiers): Use qualifier_lane_index.
 (arm_ternop_imm_qualifiers, TERNOP_IMM_QUALIFIERS): New.
 (enum builtin_arg): Add NEON_ARG_LANE_INDEX.
 (arm_expand_neon_args): Handle NEON_ARG_LANE_INDEX.
 (arm_expand_neon_builtin): Handle qualifier_lane_index.

 * config/arm/arm-protos.h (neon_lane_bounds): Add const_tree parameter.
 * config/arm/arm.c (bounds_check): Likewise, improve error message.
 (neon_lane_bounds, neon_const_bounds): Add arguments to bounds_check.
 * config/arm/arm_neon_builtins.def (vshrs_n, vshru_n, vrshrs_n,
 vrshru_n, vshrn_n, vrshrn_n, vqshrns_n, vqshrnu_n, vqrshrns_n,
 vqrshrnu_n, vqshrun_n, vqrshrun_n, vshl_n, vqshl_s_n, vqshl_u_n,
 vqshlu_n, vshlls_n, vshllu_n): Change qualifiers to BINOP_IMM.
 (vsras_n, vsrau_n, vrsras_n, vrsrau_n, vsri_n, vsli_n): Change
 qualifiers to TERNOP_IMM.
 (vdup_lane): Change qualifiers to GETLANE.
 (vmla_lane, vmlals_lane, vmlalu_lane, vqdmlal_lane, vmls_lane,
 vmlsls_lane, vmlslu_lane, vqdmlsl_lane): Change qualifiers to MAC_LANE.
 (vmla_n, vmlals_n, vmlalu_n, vqdmlal_n, vmls_n, vmlsls_n, vmlslu_n,
 vqdmlsl_n): Change qualifiers to MAC_N.

 * config/arm/neon.md (neon_vget_lane, neon_vget_laneu,
 neon_vget_lanedi, neon_vget_lanev2di, neon_vset_lane,
 neon_vset_lanedi, neon_vdup_lane, neon_vdup_lanedi,
 neon_vdup_lanev2di, neon_vmul_lane, neon_vmul_lane,
 neon_vmull_lane, neon_vqdmull_lane,
 neon_vqdmulh_lane, neon_vqdmulh_lane,
 neon_vmla_lane, neon_vmla_lane, neon_vmlal_lane,
 neon_vqdmlal_lane, neon_vmls_lane, neon_vmls_lane,
 neon_vmlsl_lane, neon_vqdmlsl_lane):
 Remove call to neon_lane_bounds.




Ping^2.



[Patch, fortran] Fix elemental optional dummy argument handling

2015-05-08 Thread Mikael Morin
Hello,

I found an (unrelated) bug while playing with Andre's PR65894 patch.
The dummy argument can get out of sync with the actual argument when
there is an (optional) argument missing.
I plan to commit the attached fix as obvious later today (after testing).
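
To illustrate the kind of desynchronisation involved, here is a small
standalone sketch in C. It is not the gfortran code: struct actual,
struct dummy and dummy_for_nth_present are hypothetical stand-ins for the
actual and dummy argument lists. It also uses a different remedy than a
goto label: both lists advance in the for-increment, so a 'continue' for
an absent actual cannot skip the dummy step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the actual and dummy argument lists.  */
struct actual { int present; struct actual *next; };
struct dummy { const char *name; struct dummy *next; };

/* Return the name of the dummy paired with the Nth present actual
   argument (N counts from 0), or NULL if there is none.  */
const char *
dummy_for_nth_present (const struct actual *arg, const struct dummy *dummy,
		       int n)
{
  assert (n >= 0);
  for (; arg != NULL;
       arg = arg->next, dummy = dummy ? dummy->next : NULL)
    {
      if (!arg->present)
	continue;	/* Safe: the for-increment still advances DUMMY.  */
      if (n-- == 0)
	return dummy ? dummy->name : NULL;
    }
  return NULL;
}
```

With the actual list (a, <absent>, i, opt) and the dummy list (nonopt1,
opt1, nonopt2, opt2), the second present actual pairs with nonopt2. If the
dummy pointer were advanced only at the bottom of the loop body, the
'continue' would leave it one entry behind and the pairing would be opt1.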

Mikael


2015-05-08  Mikael Morin  

* trans-array.c (gfc_walk_elemental_function_args):
Don't skip the advance to the next dummy argument when skipping
absent optional args.

diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index a17f431..00334b1 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -9092,7 +9092,7 @@ gfc_walk_elemental_function_args (gfc_ss * ss, gfc_actual_arglist *arg,
   for (; arg; arg = arg->next)
 {
   if (!arg->expr || arg->expr->expr_type == EXPR_NULL)
-	continue;
+	goto loop_continue;
 
   newss = gfc_walk_subexpr (head, arg->expr);
   if (newss == head)
@@ -9122,6 +9122,7 @@ gfc_walk_elemental_function_args (gfc_ss * ss, gfc_actual_arglist *arg,
 tail = tail->next;
 }
 
+loop_continue:
   if (dummy_arg != NULL)
 	dummy_arg = dummy_arg->next;
 }


! { dg-do run }
!
! The handling of scalar optional arguments passed to elemental procedure
! did not keep actual arguments and dummy arguments synchronized while
! walking them in gfc_walk_elemental_function_args, leading to a
! null pointer dereference in the generated code.
!
  implicit none

  integer, parameter :: n = 3

  call do_test

contains

  elemental function five(nonopt1, opt1, nonopt2, opt2)
integer, intent(in), optional :: opt1, opt2
integer, intent(in) :: nonopt1, nonopt2
integer :: five

if (.not. present(opt1) .and. .not. present(opt2)) then
  five = 5
else
  five = -7
end if
  end function five

  subroutine do_test(opt)
integer, optional :: opt
integer :: i = -1, a(n) = (/ (i, i=1,n) /)
integer :: b(n)

b = five(a, nonopt2=i, opt2=opt)
if (any(b /= 5)) call abort
  end subroutine do_test

end




Re: [Patch, Fortran, PR58586, v3] ICE with derived type with allocatable component passed by value

2015-05-08 Thread Mikael Morin
On 08/05/2015 12:54, Andre Vehreschild wrote:
> Hi Mikael,
> 
> thanks for the review. I still have some questions/remarks before committing:
> 
>>> @@ -5898,8 +5900,21 @@ gfc_generate_function_code (gfc_namespace * ns)
>>>  
>>>if (TREE_TYPE (DECL_RESULT (fndecl)) != void_type_node)
>>>  {
>>> +  bool artificial_result_decl = false;
>>>tree result = get_proc_result (sym);
>>>  
>>> +  /* Make sure that a function returning an object with
>>> +alloc/pointer_components always has a result, where at least
>>> +the allocatable/pointer components are set to zero.  */
>>> +  if (result == NULL_TREE && sym->attr.function
>>> + && sym->ts.type == BT_DERIVED
>>> + && (sym->ts.u.derived->attr.alloc_comp
>>> + || sym->ts.u.derived->attr.pointer_comp))
>>> +   {
>>> + artificial_result_decl = true;
>>> + result = gfc_get_fake_result_decl (sym, 0);
>>> +   }
>>
>> I expect the "fake" result decl to be needed in more cases.
>> For example, if type is BT_CLASS.
>> Here is a variant of alloc_comp_class_4.f03:c_init for such a case.
>>
>>   class(c) function c_init2()
>> allocatable :: c_init2
>>   end function
>>
>> or even without class:
>>
>>   type(t) function t_init()
>> allocatable :: t_init
>>   end function
>>
>> for any type t.
>>
>> So, remove the check for alloc_comp/pointer_comp and permit BT_CLASS.
>> One minor thing, check sym->result's type and attribute instead of sym's
>> here.  It should not make a difference, but I think it's more correct.
> 
> I agree with checking sym->result, but I am not happy with removing the
> checks for alloc_comp|pointer_comp. If I understood you correctly, you propose
> the if to be like this:
> 
>   if (result == NULL_TREE && sym->attr.function
> && (sym->result->ts.type == BT_DERIVED
> || sym->result->ts.type == BT_CLASS))
> 
> Removing the attribute checks means initializing every derived/class type
> result, which may change the semantics of the code more than intended. Look
> for example at this code:
> 
>   type t
> integer :: i = 5
>   end type
> 
>   type(t) function static_t_init()
>   end function
> 
> When one compiles this code with -Wreturn-type, then the warning of an
> uninitialized return value is issued at the function declaration. Nevertheless
> the result of static_t_init is validly initialized and i is 5. This may
> confuse users.
> 
> I therefore came to the very ugly solution to make this:
> 
>   if (result == NULL_TREE && sym->attr.function
> && ((sym->result->ts.type == BT_DERIVED
>  && (sym->result->attr.allocatable
>  || sym->result->ts.u.derived->attr.alloc_comp
>  || sym->result->ts.u.derived->attr.pointer_comp))
> || (sym->result->ts.type == BT_CLASS
> && (CLASS_DATA (sym->result)->attr.allocatable
> || CLASS_DATA (sym->result)->attr.alloc_comp
> || CLASS_DATA (sym->result)->attr.pointer_comp))))
> 
> (I am not yet sure whether the pointer attribute needs to be added too.) With
> the code above the result of static_t_init is not initialized with all the
> consequences. 
> 
> So what do you propose to do here?

To be honest, I don't know this part of the code very well.
I'll think about it some more.

> Btw, I think I found an additional bug during testing: 
>   type(t) function t_init()
> allocatable :: t_init
>   end function
>  
> when called by:
>   type(t), allocatable :: temp
>   temp = t_init()
> 
> a segfault occurs, because the result of t_init() is NULL, which is
> dereferenced by the caller in this pseudo-code:
> 
>   if (temp != 0B) goto L.12;
>   temp = (struct t *) __builtin_malloc (4);
> L.12:;
>   *temp = *t_init (); <-- This obviously is problematic.
> 
>> The rest looks good.
>> The patch is OK with the suggested changes above.  Thanks.
>> I don't think the test functions above work well enough to be
>> incorporated in a testcase for now.
> 
> ?? I don't get you there? What do you mean? Do you think the
> alloc_comp_class_3/4.* are not correctly testing the issue? Any idea of how to
> test this better? I mean the pr is about these artificial constructs. I merely
> struck it in search of a pr about allocatable components. 

I was talking about the bug you found with t_init above.  :-)
The compiler is not ready to accept that function in a testcase.
The alloc_comp_class_3/4 are fine.

Mikael


[Patch, Fortran, 66035, v1] [5/6 Regression] gfortran ICE segfault

2015-05-08 Thread Andre Vehreschild
Hi all,

please find attached a patch for 66035. An ICE occurred when, in a structure
constructor, an allocatable component of type class was initialized with an
existing class object. There were three problems:

- the size of the memory to allocate for the component was miscalculated,
- the vptr was not set correctly, and
- when the class object used for the initialization was already allocatable,
  it was copied, wasting some memory, instead of a view_convert being inserted.

All of the above are fixed by the attached patch.

Bootstraps and regtests ok on x86_64-linux-gnu/f21 for trunk and gcc-5-trunk.

Ok for trunk and gcc-5-trunk?

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


pr66035_1.clog
Description: Binary data
diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index cf607d0..402d9b9 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -6881,6 +6881,30 @@ alloc_scalar_allocatable_for_subcomponent_assignment (stmtblock_t *block,
    TREE_TYPE (tmp), tmp,
    fold_convert (TREE_TYPE (tmp), size));
 }
+  else if (cm->ts.type == BT_CLASS)
+{
+  gcc_assert (expr2->ts.type == BT_CLASS || expr2->ts.type == BT_DERIVED);
+  if (expr2->ts.type == BT_DERIVED)
+	{
+	  tmp = gfc_get_symbol_decl (gfc_find_vtab (&expr2->ts));
+	  tmp = gfc_build_addr_expr (NULL_TREE, tmp);
+	  size = fold_convert (size_type_node, gfc_vptr_size_get (tmp));
+	}
+  else
+	{
+	  gfc_expr *e2vtab;
+	  gfc_se se;
+	  e2vtab = gfc_find_and_cut_at_last_class_ref (expr2);
+	  gfc_add_vptr_component (e2vtab);
+	  gfc_add_size_component (e2vtab);
+	  gfc_init_se (&se, NULL);
+	  gfc_conv_expr (&se, e2vtab);
+	  gfc_add_block_to_block (block, &se.pre);
+	  size = fold_convert (size_type_node, se.expr);
+	  gfc_free_expr (e2vtab);
+	}
+  size_in_bytes = size;
+}
   else
 {
   /* Otherwise use the length in bytes of the rhs.  */
@@ -7008,7 +7032,9 @@ gfc_trans_subcomponent_assign (tree dest, gfc_component * cm, gfc_expr * expr,
   gfc_add_expr_to_block (&block, tmp);
 }
   else if (init && (cm->attr.allocatable
-	   || (cm->ts.type == BT_CLASS && CLASS_DATA (cm)->attr.allocatable)))
+	   || (cm->ts.type == BT_CLASS && CLASS_DATA (cm)->attr.allocatable
+	   && (expr->ts.type != BT_CLASS
+		   || CLASS_DATA (expr)->attr.allocatable))))
 {
   /* Take care about non-array allocatable components here.  The alloc_*
 	 routine below is motivated by the alloc_scalar_allocatable_for_
@@ -7052,6 +7078,14 @@ gfc_trans_subcomponent_assign (tree dest, gfc_component * cm, gfc_expr * expr,
 	  tmp = gfc_build_memcpy_call (tmp, se.expr, size);
 	  gfc_add_expr_to_block (&block, tmp);
 	}
+  else if (cm->ts.type == BT_CLASS && expr->ts.type == BT_CLASS)
+	{
+	  tmp = gfc_copy_class_to_class (se.expr, dest, integer_one_node,
+   CLASS_DATA (cm)->attr.unlimited_polymorphic);
+	  gfc_add_expr_to_block (&block, tmp);
+	  gfc_add_modify (&block, gfc_class_vptr_get (dest),
+			  gfc_class_vptr_get (se.expr));
+	}
   else
 	gfc_add_modify (&block, tmp,
 			fold_convert (TREE_TYPE (tmp), se.expr));
diff --git a/gcc/testsuite/gfortran.dg/structure_constructor_13.f03 b/gcc/testsuite/gfortran.dg/structure_constructor_13.f03
new file mode 100644
index 000..c74e325
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/structure_constructor_13.f03
@@ -0,0 +1,28 @@
+! { dg-do run }
+!
+! Contributed by Melven Roehrig-Zoellner  
+! PR fortran/66035
+
+program test_pr66035
+  type t
+  end type t
+  type w
+class(t), allocatable :: c
+  end type w
+
+  type(t) :: o
+
+  call test(o)
+contains
+  subroutine test(o)
+class(t), intent(inout) :: o
+type(w), dimension(:), allocatable :: list
+
+select type (o)
+  class is (t)
+list = [w(o)] ! This caused an ICE
+  class default
+call abort()
+end select
+  end subroutine
+end program


Re: [Patch, Fortran, PR58586, v3] ICE with derived type with allocatable component passed by value

2015-05-08 Thread Andre Vehreschild
Hi Mikael,

> > ?? I don't get you there? What do you mean? Do you think the
> > alloc_comp_class_3/4.* are not correctly testing the issue? Any idea of how
> > to test this better? I mean the pr is about these artificial constructs. I
> > merely struck it in search of a pr about allocatable components. 
> 
> I was talking about the bug you found with t_init above.  :-)
> The compiler is not ready to accept that function in a testcase.
> The alloc_comp_class_3/4 are fine.

Oh, sorry, I misunderstood you there. Now let's see, where that one is hiding.

- Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


[PATCH] rs6000: Fix peephole

2015-05-08 Thread Segher Boessenkool
This peephole transforms

  lis a,HI ; ori a,a,LO
  cmpw c,a,b ; beq c,...

to

  xoris a,b,HI1
  cmpwi c,a,LO1 ; beq c,...

when a and c are dead after this.  But it forgets to check that a and b
are not the same reg, generating nonsensical code.  This patch fixes that.

Tested etc.; is this okay for trunk?

(This peephole caused some FAILs in the testsuite after an unrelated change;
gone after this patch).
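
For readers who do not have the arithmetic in their head, here is a small
standalone C model of why the two sequences agree when a and b are
different registers. It is a sketch under assumptions: split_constant,
equal_direct and equal_peephole are hypothetical names, and the way the
two immediates are derived is how I would compute them here, not a quote
of rs6000.md.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Split C into SEXTC, the value cmpwi sees after sign-extending its
   16-bit immediate, and XORV, the value xoris XORs in (its low 16 bits
   are always zero).  */
static void
split_constant (uint32_t c, uint32_t *xorv, int32_t *sextc)
{
  *sextc = (int32_t) ((c & 0xffff) ^ 0x8000) - 0x8000;
  *xorv = c ^ (uint32_t) *sextc;
  assert ((*xorv & 0xffff) == 0);
}

/* lis a,HI ; ori a,a,LO ; cmpw c,a,b ; beq c,...  */
bool
equal_direct (uint32_t b, uint32_t c)
{
  uint32_t a = c;
  return a == b;
}

/* xoris a,b,HI1 ; cmpwi c,a,LO1 ; beq c,...  */
bool
equal_peephole (uint32_t b, uint32_t c)
{
  uint32_t xorv;
  int32_t sextc;
  split_constant (c, &xorv, &sextc);
  uint32_t a = b ^ xorv;
  return (int32_t) a == sextc;
}
```

The model keeps a and b as separate variables, which is exactly the
property the added REGNO test requires of the registers.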


Segher


2015-05-08  Segher Boessenkool  

* config/rs6000/rs6000.md: Require operand inequality in one
of the peepholes.

---
 gcc/config/rs6000/rs6000.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 0178bf4..463bd3c 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -11954,7 +11954,8 @@ (define_peephole2
   (match_operand 7 "" "")
   (match_operand 8 "" "")))]
   "peep2_reg_dead_p (3, operands[0])
-   && peep2_reg_dead_p (4, operands[4])"
+   && peep2_reg_dead_p (4, operands[4])
+   && REGNO (operands[0]) != REGNO (operands[5])"
  [(set (match_dup 0) (xor:SI (match_dup 5) (match_dup 9)))
   (set (match_dup 4) (compare:CC (match_dup 0) (match_dup 10)))
   (set (pc) (if_then_else (match_dup 6) (match_dup 7) (match_dup 8)))]
-- 
1.8.1.4



Re: [PATCH 6/13] mips musl support

2015-05-08 Thread H.J. Lu
On Mon, Apr 27, 2015 at 7:40 AM, Szabolcs Nagy  wrote:
>
>
> On 21/04/15 15:59, Matthew Fortune wrote:
>> Rich Felker  writes:
>>> On Tue, Apr 21, 2015 at 01:58:02PM +, Matthew Fortune wrote:
 There does however appear to be both soft and hard float variants
>
> Patch v2.
>
> Now all the ABI variants musl plans to support are represented.
>
> gcc/Changelog:
>
> 2015-04-27  Gregor Richards  
> Szabolcs Nagy  
>
> * config/mips/linux.h (MUSL_DYNAMIC_LINKER32): Define.
> (MUSL_DYNAMIC_LINKER64, MUSL_DYNAMIC_LINKERN32): Define.
> (GNU_USER_DYNAMIC_LINKERN32): Update.

You checked in config/linux.h CHOOSE_DYNAMIC_LINKER change
without config/mips/linux.h change.  Now linux-mips is broken.

-- 
H.J.


Re: [PATCH][tree-ssa-math-opts] Expand pow (x, CONST) using square roots when possible

2015-05-08 Thread Kyrill Tkachov


On 08/05/15 11:18, Richard Biener wrote:

On Fri, May 1, 2015 at 6:02 PM, Kyrill Tkachov
 wrote:

Hi all,

GCC has some logic to expand calls to pow (x, 0.75), pow (x, 0.25) and
pow (x, (int)k + 0.5) using square roots. So, for the above examples it would
generate sqrt (x) * sqrt (sqrt (x)), sqrt (sqrt (x)) and powi (x, k) * sqrt (x)
(assuming k > 0; for k < 0 it will calculate the reciprocal of that).

However, the implementation of these optimisations is done on a bit of an
ad-hoc basis with
the 0.25, 0.5, 0.75 cases hardcoded.
Judging by
https://gcc.gnu.org/wiki/summit2010?action=AttachFile&do=get&target=meissner2.pdf
these are the most commonly used exponents (at least in SPEC ;))

This patch generalises this optimisation into a (hopefully) more robust
algorithm.
In particular, it expands calls to pow (x, CST) by expanding the integer
part of CST
using a powi, like it does already, and then expanding the fractional part
as a product
of repeated applications of a square root if the fractional part can be
expressed
as a multiple of a power of 0.5.

I try to explain the algorithm in more detail in the comments in the patch
but, for example:

pow (x, 5.625) is not currently handled, but with this patch will be
expanded
to powi (x, 5) * sqrt (x) * sqrt (sqrt (sqrt (x))) because 5.625 == 5.0 +
0.5 + 0.5**3
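
As an illustration of the decomposition described above (a sketch only, not the code from the patch; the struct and function names are made up for this example), the fractional part can be split greedily into powers of 0.5, recording which square-root depths are needed:

```c
/* Illustrative only: hypothetical names, not taken from the patch.  */
struct sqrt_chain
{
  unsigned mask;   /* bit (d - 1) set => the factor x**(0.5**d) is used */
  int depth;       /* deepest sqrt nesting that is needed */
  int factors;     /* number of sqrt-derived factors in the product */
};

/* Try to write FRAC (0 <= FRAC < 1) as a sum of distinct powers of 0.5
   no smaller than 0.5**MAX_DEPTH (MAX_DEPTH assumed to be below 32).
   Returns 1 and fills *OUT on success, 0 if FRAC is not a multiple of
   0.5**MAX_DEPTH.  */
int
decompose_fraction (double frac, int max_depth, struct sqrt_chain *out)
{
  double weight = 0.5;

  out->mask = 0;
  out->depth = 0;
  out->factors = 0;
  if (frac < 0.0 || frac >= 1.0)
    return 0;

  for (int d = 1; d <= max_depth && frac > 0.0; d++)
    {
      if (frac >= weight)
        {
          /* sqrt applied d times to x contributes x**weight.  */
          frac -= weight;
          out->mask |= 1u << (d - 1);
          out->depth = d;
          out->factors++;
        }
      weight *= 0.5;
    }
  return frac == 0.0;
}
```

For 0.625 with a maximum depth of 3 this selects depths 1 and 3, i.e. sqrt (x) * sqrt (sqrt (sqrt (x))); with a maximum depth of 2 it reports failure.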

Negative exponents are handled in either of two ways, depending on the
exponent value:
* Using a simple reciprocal.
   For example:
   pow (x, -5.625) == 1.0 / pow (x, 5.625)
 --> 1.0 / (powi (x, 5) * sqrt (x) * sqrt (sqrt (sqrt (x))))

* For pow (x, EXP) with negative exponent EXP with integer part INT and
fractional part FRAC:
pow (x, 1.0 - FRAC) / powi (x, ceil (abs (EXP))).
   For example:
   pow (x, -5.875) == pow (x, 0.125) / powi (x, 6)
 --> sqrt (sqrt (sqrt (x))) / (powi (x, 6))
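
The arithmetic behind the second form can be sketched as follows (again an illustration with a hypothetical helper name, not code from the patch): for a negative exponent, round the magnitude up to the next integer N and keep the small positive remainder, so that EXP == REST - N.

```c
/* Illustrative only: split a negative exponent EXP into an integer
   divisor power N = ceil (|EXP|) and a remainder REST = N - |EXP|,
   so that pow (x, EXP) == pow (x, REST) / powi (x, N) with
   0 <= REST < 1.  Assumes EXP < 0 and |EXP| fits in a long.  */
void
split_negative_exponent (double exp, long *n, double *rest)
{
  double mag = -exp;
  long whole = (long) mag;   /* truncation equals floor for mag >= 0 */

  *n = (mag > (double) whole) ? whole + 1 : whole;
  *rest = (double) *n - mag;
}
```

For -5.875 this gives N == 6 and REST == 0.125, matching the example above; REST is then the part that has to be expressible as a square-root chain.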


Since hardware square root instructions tend to be expensive, we may want to
reduce the number
of square roots we are willing to calculate. Since we reuse intermediate
square root results,
this boils down to restricting the depth of the square root chains. In all
the examples above
that depth is 3. I've made this maximum depth parametrisable in params.def.
By adjusting that
parameter we can adjust the resolution of this optimisation. So, if it's set
to '4' then we
will synthesize every exponent that is a multiple of 0.5**4 == 0.0625,
including negative
multiples. Currently, GCC will not try to expand negative multiples of
anything other than 0.5.

I have tried to keep the existing functionality intact and activate this
only for
-funsafe-math-optimizations and only when the target has a sqrt instruction.
  An exception to that is pow (x, 0.5) which we prefer to transform to sqrt
even
when a hardware sqrt is not available, presumably because the library
function for
sqrt is usually faster than pow (?).

Yes.  It's also a safe transform - which you seem to put under
flag_unsafe_math_optimizations only with your patch.

It would be clearer to just leave the special-case

-  /* Optimize pow(x,0.5) = sqrt(x).  This replacement is always safe
- unless signed zeros must be maintained.  pow(-0,0.5) = +0, while
- sqrt(-0) = -0.  */
-  if (sqrtfn
-  && REAL_VALUES_EQUAL (c, dconsthalf)
-  && !HONOR_SIGNED_ZEROS (mode))
-return build_and_insert_call (gsi, loc, sqrtfn, arg0);

in as-is.


Ok, I'll leave that case explicit.



You also removed the Os constraint which you should put back in.
Basically if !optimize_function_for_speed_p then generate at most
two calls to sqrt (iff the HW has a sqrt instruction).


I tried to move that logic into expand_with_sqrts but
I'll move it outside it. It seems that this boils down to
only 0.25, as any other 2xsqrt chain will also involve a
multiply or a divide which we currently avoid.



You fail to add a testcase that checks that the optimization applies.


I'll add one to scan the sincos dump.
I notice that we don't have a testsuite check that the target has
a hw sqrt instruction. Would you like me to add one? Or can I make
the testcase aarch64-specific?



Otherwise the idea looks good though there must be a better way
to compute the series than by using real-arithmetic and forcefully
trying out all possibilities...


I get that feeling too. What I need is not only a way
of figuring out if the fractional part of the exponent can be
represented in this way, but also compute the depth of the
sqrt chain and the number of multiplies...
That being said, the current approach is O(maximum depth) and
I don't expect the depth to go much beyond 3 or 4 in practice.

Thanks for looking at it!
I'll respin the patch.

Kyrill



Richard.



Having seen the glibc implementation of a fully IEEE-754-compliant pow
function, I think we
would prefer synthesising the pow call whenever we can for -ffast-math.

I have seen this optimisation trigger a few times in SPEC2k6, in particular
in 447.dealII
and 481.wrf where it replaced calls to powf (x, -0.25), pow (x, 0.125) and
pow (x, 0.875)
with squar

Re: Remove mode argument from gen_rtx_SET

2015-05-08 Thread Richard Sandiford
Franz Sirl  writes:
> Am 2015-05-08 um 13:57 schrieb Segher Boessenkool:
>> On Fri, May 08, 2015 at 12:32:30PM +0200, Franz Sirl wrote:
>>> this patch (r222882 is fine, r222883 fails) breaks bootstrap for me on
>>> x86_64-linux-gnu:
>>
>> i386.md has "set:BND" twice; replace that with just "set", and all
>> should be fine.
>>
>> Maybe gen* should warn on this; maybe it already does.
>
> I didn't see a warning in the logs at least. But your suggestion fixes 
> the bootstrap for me.

Thanks.  I installed this as obvious after testing that x86_64-linux-gnu
built with --enable-libmpx and that rx-elf could handle:

  void f(long long *a) { a[0] = a[1]; }

when -mlra was passed.

There's also one in a comment in msp430.md:

; This pattern is identical to the truncsipsi2 pattern except
; that it uses a SUBREG instead of a TRUNC.  It is needed in
; order to prevent reload from converting (set:SI (SUBREG:PSI (SI)))
; into (SET:PSI (PSI)).

I'm not sure what that's supposed to mean (what's an SI set of a PSI
subreg?), but I suspect removing the mode would lose information,
so I left it alone.

I'll follow up with a patch to make the generators raise an error
for this, as well as to restore the "missing mode" diagnostics
mentioned in the genrecog thread.

Sorry for the breakage.

Richard


gcc/
* config/i386/i386.md (_ldx, *_ldx): Remove mode
from (set ...).
* config/rx/rx.md (movdi, movdf): Likewise.
Likewise for define_peephole2s.

Index: gcc/config/i386/i386.md
===
--- gcc/config/i386/i386.md 2015-05-08 14:42:57.823310127 +0100
+++ gcc/config/i386/i386.md 2015-05-08 14:43:12.515140307 +0100
@@ -18879,13 +18879,13 @@ (define_insn "*_"
   [(set_attr "type" "mpxchk")])
 
 (define_expand "_ldx"
-  [(parallel [(set:BND (match_operand:BND 0 "register_operand")
-   (unspec:BND
-[(mem:
-  (match_par_dup 3
-[(match_operand: 1 
"address_mpx_no_index_operand")
- (match_operand: 2 "register_operand")]))]
-UNSPEC_BNDLDX))
+  [(parallel [(set (match_operand:BND 0 "register_operand")
+   (unspec:BND
+[(mem:
+  (match_par_dup 3
+[(match_operand: 1 
"address_mpx_no_index_operand")
+ (match_operand: 2 "register_operand")]))]
+UNSPEC_BNDLDX))
   (use (mem:BLK (match_dup 1)))])]
   "TARGET_MPX"
 {
@@ -18909,14 +18909,14 @@ (define_expand "_ldx"
 })
 
 (define_insn "*_ldx"
-  [(parallel [(set:BND (match_operand:BND 0 "register_operand" "=w")
-   (unspec:BND
-[(match_operator: 3 "bnd_mem_operator"
-  [(unspec:
-[(match_operand: 1 
"address_mpx_no_index_operand" "Ti")
- (match_operand: 2 "register_operand" 
"l")]
-   UNSPEC_BNDLDX_ADDR)])]
-UNSPEC_BNDLDX))
+  [(parallel [(set (match_operand:BND 0 "register_operand" "=w")
+  (unspec:BND
+[(match_operator: 3 "bnd_mem_operator"
+  [(unspec:
+[(match_operand: 1 
"address_mpx_no_index_operand" "Ti")
+ (match_operand: 2 "register_operand" "l")]
+   UNSPEC_BNDLDX_ADDR)])]
+UNSPEC_BNDLDX))
   (use (mem:BLK (match_dup 1)))])]
   "TARGET_MPX"
   "bndldx\t{%3, %0|%0, %3}"
Index: gcc/config/rx/rx.md
===
--- gcc/config/rx/rx.md 2015-05-08 14:42:57.823310127 +0100
+++ gcc/config/rx/rx.md 2015-05-08 14:43:12.515140307 +0100
@@ -1734,9 +1734,9 @@ (define_peephole2
 (match_dup 2)))
  (clobber (reg:CC CC_REG))])]
   "peep2_regno_dead_p (2, REGNO (operands[0])) && (optimize < 3 || 
optimize_size)"
-  [(parallel [(set:SI (match_dup 2)
- (memex_commutative:SI (match_dup 2)
-   (extend_types:SI (match_dup 1
+  [(parallel [(set (match_dup 2)
+  (memex_commutative:SI (match_dup 2)
+(extend_types:SI (match_dup 1
  (clobber (reg:CC CC_REG))])]
 )
 
@@ -1748,9 +1748,9 @@ (define_peephole2
 (match_dup 0)))
  (clobber (reg:CC CC_REG))])]
   "peep2_regno_dead_p (2, REGNO (operands[0])) && (optimize < 3 || 
optimize_size)"
-  [(parallel [(set:SI (match_dup 2)
- (memex_commutative:SI (match_dup 2)
-   (extend_types:SI (match_dup 1
+  [(parallel [(set (match_dup 2)
+  (memex_commutative:SI (match_dup 2)
+

Re: [Patch, fortran, pr65894, v1] [6 Regression] severe regression in gfortran 6.0.0

2015-05-08 Thread Mikael Morin
Le 08/05/2015 13:54, Andre Vehreschild a écrit :
> Hi Mikael,
> 
> at first I tried to fix this issue with the scalarizer, too, but I could not
> grasp how the scalarizer was working. Do you have any documentation, how it is
> meant to be? I mean, I have read the comments in the code, but those are 
> sparse
> and the multitude of routines the scalarizer is split up into doesn't help
> either.

If you haven't already, you can have a look at:
https://gcc.gnu.org/wiki/GFortranScalarizer
Most of it is still relevant.

Mikael


Re: [Patch, Fortran, PR58586, v3] ICE with derived type with allocatable component passed by value

2015-05-08 Thread Andre Vehreschild
Hi,

so attached is a quick and dirty solution for the allocatable return value
problem. I personally don't like it. It makes a special case of assigning
a function result to a variable. Maybe you have a better idea how to do
this in gfortran style.

- Andre


On Fri, 8 May 2015 15:31:46 +0200
Andre Vehreschild  wrote:

> Hi Mikael,
> 
> > > ?? I don't get you there? What do you mean? Do you think the
> > > alloc_comp_class_3/4.* are not correctly testing the issue? Any idea of
> > > how to test this better? I mean the pr is about this artificial
> > > constructs. I merely struck it in search of a pr about allocatable
> > > components. 
> > 
> > I was talking about the bug you found with t_init above.  :-)
> > the compiler is not ready to accept that function in a testcase.
> > The alloc_comp_class_3/4 are fine.
> 
> Oh, sorry, I misunderstood you there. Now let's see, where that one is hiding.
> 
> - Andre


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 402d9b9..87e2cde 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -9043,6 +9043,7 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag,
   stmtblock_t body;
   bool l_is_temp;
   bool scalar_to_array;
+  bool alloc_to_alloc;
   tree string_length;
   int n;
 
@@ -9156,6 +9157,18 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag,
   else
 gfc_conv_expr (&lse, expr1);
 
+  alloc_to_alloc = expr1->expr_type == EXPR_VARIABLE
+  && expr1->symtree->n.sym->ts.type == BT_DERIVED
+  && expr1->symtree->n.sym->attr.allocatable
+  && expr2->expr_type == EXPR_FUNCTION
+  && expr2->ts.type == BT_DERIVED
+  && expr2->value.function.esym->attr.allocatable;
+  if (alloc_to_alloc)
+{
+  rse.expr = gfc_build_addr_expr (NULL_TREE, rse.expr);
+  lse.expr = gfc_build_addr_expr (NULL_TREE, lse.expr);;
+}
+
   /* Assignments of scalar derived types with allocatable components
  to arrays must be done with a deep copy and the rhs temporary
  must have its components deallocated afterwards.  */
@@ -9208,7 +9221,8 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag,
   if (lss == gfc_ss_terminator)
 {
   /* F2003: Add the code for reallocation on assignment.  */
-  if (flag_realloc_lhs && is_scalar_reallocatable_lhs (expr1))
+  if (flag_realloc_lhs && !alloc_to_alloc
+	  && is_scalar_reallocatable_lhs (expr1))
 	alloc_scalar_allocatable_for_assignment (&block, string_length,
 		 expr1, expr2);
 


Re: [PATCH] rs6000: Fix peephole

2015-05-08 Thread David Edelsohn
On Fri, May 8, 2015 at 9:38 AM, Segher Boessenkool
 wrote:
> This peephole transforms
>
>   lis a,HI ; ori a,a,LO
>   cmpw c,a,b ; beq c,...
>
> to
>
>   xoris a,b,HI1
>   cmpwi c,a,LO1 ; beq c,...
>
> when a and c are dead after this.  But it forgets to check that a and b
> are not the same reg, generating non-sensical code.  This patch fixes that.
>
> Tested etc.; is this okay for trunk?
>
> (This peephole caused some FAILs in the testsuite after an unrelated change;
> gone after this patch).
>
>
> Segher
>
>
> 2015-05-08  Segher Boessenkool  
>
> * config/rs6000/rs6000.md: Require operand inequality in one
> of the peepholes.

Okay.

Is there an artificial testcase?

Thanks, David
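
For readers following along, here is a small C model of why the peephole needs a != b. It is only a sketch: the function names are invented, and the split of the constant into HI1/LO1 is written the way I understand the xoris/cmpwi pairing (cmpwi takes a sign-extended 16-bit immediate), not copied from rs6000.md.

```c
#include <stdint.h>

/* Split C so that (b ^ hi1_shifted) == lo1 exactly when b == C.
   LO1 is the low half sign-extended, as cmpwi would see it.  */
static void
split_constant (uint32_t c, uint32_t *hi1_shifted, int32_t *lo1)
{
  *lo1 = (int32_t) ((c & 0xffffu) ^ 0x8000u) - 0x8000;
  *hi1_shifted = c ^ (uint32_t) *lo1;   /* low 16 bits end up zero */
}

/* Original sequence.  B_BEFORE is what register b holds on entry;
   SAME_REG models a and b being the same register.  */
int
original_seq (uint32_t b_before, uint32_t c, int same_reg)
{
  uint32_t a = c;                          /* lis a,HI ; ori a,a,LO */
  uint32_t b = same_reg ? a : b_before;    /* b is clobbered if a == b */
  return a == b;                           /* cmpw c,a,b */
}

/* Peephole output.  It reads b before anything is written, so the
   result is the same whether or not a and b share a register.  */
int
peephole_seq (uint32_t b_before, uint32_t c)
{
  uint32_t hi1_shifted;
  int32_t lo1;

  split_constant (c, &hi1_shifted, &lo1);
  uint32_t a = b_before ^ hi1_shifted;     /* xoris a,b,HI1 */
  return a == (uint32_t) lo1;              /* cmpwi c,a,LO1 */
}
```

With distinct registers the two sequences agree for every b. When a and b are the same register, the original always compares equal, while the rewritten sequence compares the stale register contents against the constant, which is the nonsensical code the patch rules out.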


Re: [PATCH] rs6000: Fix peephole

2015-05-08 Thread Segher Boessenkool
On Fri, May 08, 2015 at 10:18:46AM -0400, David Edelsohn wrote:
> > 2015-05-08  Segher Boessenkool  
> >
> > * config/rs6000/rs6000.md: Require operand inequality in one
> > of the peepholes.
> 
> Okay.
> 
> Is there an artificial testcase?

I don't have one.  Peepholes require optimisation to be enabled, and
it is hard to get stupid code like this then (a compare of X with X).
But it did trigger a few times in the testsuite with this patch:


diff --git a/gcc/combine.c b/gcc/combine.c
index 46cd6db..8ba14cc 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -3892,8 +3892,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
   && XVECLEN (newpat, 0) == 2
   && GET_CODE (XVECEXP (newpat, 0, 0)) == SET
   && GET_CODE (XVECEXP (newpat, 0, 1)) == SET
-  && (i1 || set_noop_p (XVECEXP (newpat, 0, 0))
- || set_noop_p (XVECEXP (newpat, 0, 1)))
   && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != ZERO_EXTRACT
   && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != STRICT_LOW_PART
   && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != ZERO_EXTRACT


Re: [PATCH 6/13] mips musl support

2015-05-08 Thread Szabolcs Nagy
On 08/05/15 14:56, H.J. Lu wrote:
> On Mon, Apr 27, 2015 at 7:40 AM, Szabolcs Nagy  wrote:
>> On 21/04/15 15:59, Matthew Fortune wrote:
>>> Rich Felker  writes:
 On Tue, Apr 21, 2015 at 01:58:02PM +, Matthew Fortune wrote:
> There does however appear to be both soft and hard float variants
>>
>> Patch v2.
>>
>> Now all the ABI variants musl plans to support are represented.
>>
>> gcc/Changelog:
>>
>> 2015-04-27  Gregor Richards  
>> Szabolcs Nagy  
>>
>> * config/mips/linux.h (MUSL_DYNAMIC_LINKER32): Define.
>> (MUSL_DYNAMIC_LINKER64, MUSL_DYNAMIC_LINKERN32): Define.
>> (GNU_USER_DYNAMIC_LINKERN32): Update.
> 
> You checked in config/linux.h CHOOSE_DYNAMIC_LINKER change
> without config/mips/linux.h change.  Now linux-mips is broken.
> 

sorry, i cannot roll back the change right now or
provide fix up patches.

i thought i did the tests without the target patches..
but not mips (only mips is affected).

while i'm waiting for the ppl with commit rights..
is it better to roll back the commit or do a single
line fix to config/mips/linux.h?

(the single line fix is to add a "/dev/null" fourth
argument to CHOOSE_DYNAMIC_LINKER macro in the definition
of GNU_USER_DYNAMIC_LINKERN32 in config/mips/linux.h)

i don't know why i got the mail with such a big delay :(



RE: [PATCH 6/13] mips musl support

2015-05-08 Thread Matthew Fortune
H.J. Lu  writes:
> On Mon, Apr 27, 2015 at 7:40 AM, Szabolcs Nagy 
> wrote:
> >
> >
> > On 21/04/15 15:59, Matthew Fortune wrote:
> >> Rich Felker  writes:
> >>> On Tue, Apr 21, 2015 at 01:58:02PM +, Matthew Fortune wrote:
>  There does however appear to be both soft and hard float variants
> >
> > Patch v2.
> >
> > Now all the ABI variants musl plans to support are represented.
> >
> > gcc/Changelog:
> >
> > 2015-04-27  Gregor Richards  
> > Szabolcs Nagy  
> >
> > * config/mips/linux.h (MUSL_DYNAMIC_LINKER32): Define.
> > (MUSL_DYNAMIC_LINKER64, MUSL_DYNAMIC_LINKERN32): Define.
> > (GNU_USER_DYNAMIC_LINKERN32): Update.
> 
> You checked in config/linux.h CHOOSE_DYNAMIC_LINKER change without
> config/mips/linux.h change.  Now linux-mips is broken.

The MIPS patch is OK. I am concerned that you are aiming for one
dynamic linker per ABI variant in musl but are not accounting for
soft-float up front in n32/n64. There is time to reconsider this
before any of this code gets to a versioned GCC release though.

I.e. as it stands this patch is not OK for backporting to GCC 5
without further discussion.

There is also the perspective that we should be able to aim for
an ABI variant agnostic dynamic linker at some point over the next
year by working towards a build that truly uses no float and is
hence compatible with all the ABI variants.

Thanks,
Matthew


[RFA] libiberty/mkstemps.c: Include <time.h> if <sys/time.h> not available.

2015-05-08 Thread Joel Brobecker
Hello,

Attempting to build libiberty on LynxOS-178 fails trying to compile
mkstemps.c with the following error:

mkstemps.c:84:18: error: storage size of 'tv' isn't known
   struct timeval tv;
  ^

This file would normally include <sys/time.h> to get the type's
definition, but unfortunately LynxOS-178 does not want us to use
<sys/time.h>, only <time.h>. The configure script correctly finds
this out and generates a config.h file where HAVE_SYS_TIME_H is
undefined:

/* Define to 1 if you have the <sys/time.h> header file. */
/* #undef HAVE_SYS_TIME_H */

This patch fixes the build issue by falling back on including <time.h>
if <sys/time.h> could not be included (and provided that HAVE_TIME_H
is defined, of course).

libiberty/ChangeLog:

* mkstemps.c: #include <time.h> if HAVE_TIME_H is defined
but not HAVE_SYS_TIME_H.

OK to commit?

Thank you,
-- 
Joel

---
 libiberty/mkstemps.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libiberty/mkstemps.c b/libiberty/mkstemps.c
index a0e68a7..0e06fe1 100644
--- a/libiberty/mkstemps.c
+++ b/libiberty/mkstemps.c
@@ -35,6 +35,8 @@
 #endif
 #ifdef HAVE_SYS_TIME_H
 #include <sys/time.h>
+#elif HAVE_TIME_H
+#include <time.h>
 #endif
 #include "ansidecl.h"
 
-- 
1.9.1



Re: [PATCH 6/13] mips musl support

2015-05-08 Thread Rich Felker
On Fri, May 08, 2015 at 02:25:11PM +, Matthew Fortune wrote:
> H.J. Lu  writes:
> > On Mon, Apr 27, 2015 at 7:40 AM, Szabolcs Nagy 
> > wrote:
> > >
> > >
> > > On 21/04/15 15:59, Matthew Fortune wrote:
> > >> Rich Felker  writes:
> > >>> On Tue, Apr 21, 2015 at 01:58:02PM +, Matthew Fortune wrote:
> >  There does however appear to be both soft and hard float variants
> > >
> > > Patch v2.
> > >
> > > Now all the ABI variants musl plans to support are represented.
> > >
> > > gcc/Changelog:
> > >
> > > 2015-04-27  Gregor Richards  
> > > Szabolcs Nagy  
> > >
> > > * config/mips/linux.h (MUSL_DYNAMIC_LINKER32): Define.
> > > (MUSL_DYNAMIC_LINKER64, MUSL_DYNAMIC_LINKERN32): Define.
> > > (GNU_USER_DYNAMIC_LINKERN32): Update.
> > 
> > You checked in config/linux.h CHOOSE_DYNAMIC_LINKER change without
> > config/mips/linux.h change.  Now linux-mips is broken.
> 
> The MIPS patch is OK. I am concerned that you are aiming for one
> dynamic linker per ABI variant in musl but are not accounting for
> soft-float up front in n32/n64. There is time to reconsider this
> before any of this code gets to a versioned GCC release though.

I'm not aware of whether there are mips64 chips for which softfloat
would be desirable, so I don't know if it's an ABI we'll ever have,
but I'm not opposed to adding it here just to be safe (in case we need
it).

> I.e. as it stands this patch is not OK for backporting to GCC 5
> without further discussion.
> 
> There is also the perspective that we should be able to aim for
> an ABI variant agnostic dynamic linker at some point over the next
> year by working towards a build that truly uses no float and is
> hence compatible with all the ABI variants.

For musl that's not going to happen. The dynamic linker and shared
libc are one file, which therefore has lots of public interfaces that
depend on the argument passing ABI.

Rich


Re: [PATCH 6/13] mips musl support

2015-05-08 Thread Szabolcs Nagy


On 08/05/15 15:25, Matthew Fortune wrote:
> H.J. Lu  writes:
>> On Mon, Apr 27, 2015 at 7:40 AM, Szabolcs Nagy 
>> wrote:
>>>
>>>
>>> On 21/04/15 15:59, Matthew Fortune wrote:
 Rich Felker  writes:
> On Tue, Apr 21, 2015 at 01:58:02PM +, Matthew Fortune wrote:
>> There does however appear to be both soft and hard float variants
>>>
>>> Patch v2.
>>>
>>> Now all the ABI variants musl plans to support are represented.
>>>
>>> gcc/Changelog:
>>>
>>> 2015-04-27  Gregor Richards  
>>> Szabolcs Nagy  
>>>
>>> * config/mips/linux.h (MUSL_DYNAMIC_LINKER32): Define.
>>> (MUSL_DYNAMIC_LINKER64, MUSL_DYNAMIC_LINKERN32): Define.
>>> (GNU_USER_DYNAMIC_LINKERN32): Update.
>>
>> You checked in config/linux.h CHOOSE_DYNAMIC_LINKER change without
>> config/mips/linux.h change.  Now linux-mips is broken.
> 
> The MIPS patch is OK. I am concerned that you are aiming for one
> dynamic linker per ABI variant in musl but are not accounting for
> soft-float up front in n32/n64. There is time to reconsider this
> before any of this code gets to a versioned GCC release though.
> 

i thought musl would not want to support soft float variants
of those abis, but now i think it does not hurt to add the -sf
there too.

if you think that's ok, i can now submit the patch with
%{msoft-float:-sf} added to all abi variants.

> I.e. as it stands this patch is not OK for backporting to GCC 5
> without further discussion.
> 
> There is also the perspective that we should be able to aim for
> an ABI variant agnostic dynamic linker at some point over the next
> year by working towards a build that truly uses no float and is
> hence compatible with all the ABI variants.

i'm not sure what you mean by 'a build that truly uses no float'

i thought the direction is to have a potentially hard float abi
with kernel emulation when the fpu is not present.

> 
> Thanks,
> Matthew
> 



Re: genrecog: Address -Wsign-compare diagnostics

2015-05-08 Thread Richard Sandiford
Thomas Schwinge  writes:
> Hi!
>
> On Mon, 27 Apr 2015 11:20:30 +0100, Richard Sandiford
>  wrote:
>> This patch [...] by replacing most of genrecog [...]
>
> OK to commit?

Looks good to me FWIW.  Probably counts as obvious.

Thanks,
Richard



RE: [PATCH 6/13] mips musl support

2015-05-08 Thread Matthew Fortune
Szabolcs Nagy  writes:
> On 08/05/15 15:25, Matthew Fortune wrote:
> > H.J. Lu  writes:
> >> On Mon, Apr 27, 2015 at 7:40 AM, Szabolcs Nagy
> >> 
> >> wrote:
> >>>
> >>>
> >>> On 21/04/15 15:59, Matthew Fortune wrote:
>  Rich Felker  writes:
> > On Tue, Apr 21, 2015 at 01:58:02PM +, Matthew Fortune wrote:
> >> There does however appear to be both soft and hard float variants
> >>>
> >>> Patch v2.
> >>>
> >>> Now all the ABI variants musl plans to support are represented.
> >>>
> >>> gcc/Changelog:
> >>>
> >>> 2015-04-27  Gregor Richards  
> >>> Szabolcs Nagy  
> >>>
> >>> * config/mips/linux.h (MUSL_DYNAMIC_LINKER32): Define.
> >>> (MUSL_DYNAMIC_LINKER64, MUSL_DYNAMIC_LINKERN32): Define.
> >>> (GNU_USER_DYNAMIC_LINKERN32): Update.
> >>
> >> You checked in config/linux.h CHOOSE_DYNAMIC_LINKER change without
> >> config/mips/linux.h change.  Now linux-mips is broken.
> >
> > The MIPS patch is OK. I am concerned that you are aiming for one
> > dynamic linker per ABI variant in musl but are not accounting for
> > soft-float up front in n32/n64. There is time to reconsider this
> > before any of this code gets to a versioned GCC release though.
> >
> 
> i thought musl would not want to support soft float variants of those
> abis, but now i think it does not hurt to add the -sf there too.
> 
> if you think that's ok, i can now submit the patch with %{msoft-float:-
> sf} added to all abi variants.

That's fine. Go ahead.

> > I.e. as it stands this patch is not OK for backporting to GCC 5
> > without further discussion.
> >
> > There is also the perspective that we should be able to aim for an ABI
> > variant agnostic dynamic linker at some point over the next year by
> > working towards a build that truly uses no float and is hence
> > compatible with all the ABI variants.
> 
> i'm not sure what you mean by 'a build that truly uses no float'
> 
> i thought the direction is to have a potentially hard float abi with
> kernel emulation when the fpu is not present.

With MIPS having such a rich matrix of ABI variants the need to build code
to target all variants is quite costly. We currently have to do this
regardless of whether the code in it is affected by the differences. We are
looking at ways to create more generic objects so that some libraries can
get away with fewer build variations. The major variation for MIPS is the
set of floating-point extensions so knowing that a module has no interest
in floating point code is quite valuable.

Since Rich has just pointed out that the dynamic linker and C library are
one and the same for musl then this will not be of as much value to musl.

Thanks,
Matthew

> 
> >
> > Thanks,
> > Matthew
> >



Re: [patch 0/10] debug-early merge

2015-05-08 Thread David Malcolm
On Thu, 2015-05-07 at 17:30 -0700, Aldy Hernandez wrote:
> Hi folks.
> 
> I have divided the patches into 10 pieces.  The patches are 
> interdependent and cannot be applied independently.  I am merely 
> dividing them up to aid the relevant reviewers.
> 
> As I've mentioned elsewhere, the patchset as posted has been 
> bootstrapped and GCC tested on:
> 
>   x86_64-unknown-linux-gnu
>   powerpc-ibm-aix7.1.2.0
>   powerpc64-unknown-linux-gnu
>   aarch64-unknown-linux-gnu
> 
> I have also GDB tested the patchset on x86_64-linux.
> 
> Thanks for your help in this ordeal.

Aldy: did you test the jit with this?  Specifically, is gdb still able
to step through the generated code/inspect values? 
etc; see e.g:
https://gcc.gnu.org/onlinedocs/jit/intro/tutorial04.html#single-stepping-through-the-generated-code

Sadly the jit testsuite doesn't yet automatically verify that sane
debuginfo is generated; I've only ever hand-tested that (this is PR
jit/64196).

Alternatively, I guess I can try your branch if that's easier (exactly
which should I test?)

Dave



Re: [PATCH 6/13] mips musl support

2015-05-08 Thread Rich Felker
On Fri, May 08, 2015 at 03:41:31PM +0100, Szabolcs Nagy wrote:
> > I.e. as it stands this patch is not OK for backporting to GCC 5
> > without further discussion.
> > 
> > There is also the perspective that we should be able to aim for
> > an ABI variant agnostic dynamic linker at some point over the next
> > year by working towards a build that truly uses no float and is
> > hence compatible with all the ABI variants.
> 
> i'm not sure what you mean by 'a build that truly uses no float'
> 
> i thought the direction is to have a potentially hard float abi
> with kernel emulation when the fpu is not present.

I think Matthew's idea was that the dynamic linker could be agnostic
since it doesn't need floating point arithmetic itself, then load
appropriate libraries depending on the ABI of the application
(presumably determined by some flags in _DYNAMIC or perhaps the main
ELF header). Of course with some familiarity with musl it becomes
clear why this is not an option, but to answer things like this we
need to think from a standpoint of non-familiarity with musl. :-)

Rich


Re: PR 64454: Improve VRP for %

2015-05-08 Thread Marc Glisse

Hello,

here is a rewrite of the patch, using wide_int, which also improves the
result a bit. Same ChangeLog, tested again on x86_64-linux-gnu.
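
To make the bounds easier to check by hand, here is a plain-int sketch of the signed case (hypothetical names; it mirrors the idea in the patch, namely ABS (A % B) < ABS (B) and A % B lying between 0 and A, but it is not the wide_int code itself and assumes small ranges with no overflow):

```c
struct range { int lo, hi; };

static int imax (int a, int b) { return a > b ? a : b; }
static int imin (int a, int b) { return a < b ? a : b; }

/* Bounds for a % b with a in [a_lo, a_hi] and b in [b_lo, b_hi],
   using C's truncating %.  The all-zero divisor range is assumed to
   be handled separately, as in the patch.  */
struct range
mod_range (int a_lo, int a_hi, int b_lo, int b_hi)
{
  struct range r;

  /* |a % b| < |b|, so the magnitude is at most max (|b|) - 1.  */
  r.hi = imax (b_hi - 1, -1 - b_lo);
  r.lo = -r.hi;
  /* a % b lies between 0 and a, so clamp with the dividend's range.  */
  r.lo = imax (r.lo, imin (a_lo, 0));
  r.hi = imin (r.hi, imax (a_hi, 0));
  return r;
}

/* Brute-force check that every a % b (b != 0) falls inside the
   computed range; returns 1 if so.  */
int
mod_range_holds (int a_lo, int a_hi, int b_lo, int b_hi)
{
  struct range r = mod_range (a_lo, a_hi, b_lo, b_hi);

  for (int x = a_lo; x <= a_hi; x++)
    for (int y = b_lo; y <= b_hi; y++)
      if (y != 0 && (x % y < r.lo || x % y > r.hi))
        return 0;
  return 1;
}
```

For a in [-3, 13] and b in [-6, 9] this yields [-3, 8], which is the range the first function in the new testcase checks for.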


--
Marc Glisse
Index: gcc/testsuite/gcc.dg/tree-ssa/vrp97.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/vrp97.c   (revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/vrp97.c   (working copy)
@@ -0,0 +1,19 @@
+/* PR tree-optimization/64454 */
+/* { dg-options "-O2 -fdump-tree-vrp1" } */
+
+int f(int a, int b)
+{
+if (a < -3 || a > 13) __builtin_unreachable();
+if (b < -6 || b > 9) __builtin_unreachable();
+int c = a % b;
+return c >= -3 && c <= 8;
+}
+
+int g(int a, int b)
+{
+  int c = a % b;
+  return c != -__INT_MAX__ - 1;
+}
+
+/* { dg-final { scan-tree-dump-times "return 1;" 2 "vrp1" } } */
+/* { dg-final { cleanup-tree-dump "vrp1" } } */
Index: gcc/testsuite/gcc.dg/vect/slp-perm-7.c
===
--- gcc/testsuite/gcc.dg/vect/slp-perm-7.c  (revision 222906)
+++ gcc/testsuite/gcc.dg/vect/slp-perm-7.c  (working copy)
@@ -63,15 +63,15 @@ int main (int argc, const char* argv[])
 
   foo (input, output, input2, output2);
 
   for (i = 0; i < N; i++)
  if (output[i] != check_results[i] || output2[i] != check_results2[i])
abort ();
 
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target 
vect_perm } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect"  { target 
vect_perm } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { 
target vect_perm } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
 
Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 222906)
+++ gcc/tree-vrp.c  (working copy)
@@ -3189,40 +3189,73 @@ extract_range_from_binary_expr_1 (value_
}
}
   else
{
  extract_range_from_multiplicative_op_1 (vr, code, &vr0, &vr1);
  return;
}
 }
   else if (code == TRUNC_MOD_EXPR)
 {
-  if (vr1.type != VR_RANGE
- || range_includes_zero_p (vr1.min, vr1.max) != 0
- || vrp_val_is_min (vr1.min))
+  if (range_is_null (&vr1))
{
- set_value_range_to_varying (vr);
+ set_value_range_to_undefined (vr);
  return;
}
+  // ABS (A % B) < ABS (B) and either 0 <= A % B <= A or A <= A % B <= 0.
   type = VR_RANGE;
-  /* Compute MAX <|vr1.min|, |vr1.max|> - 1.  */
-  max = fold_unary_to_constant (ABS_EXPR, expr_type, vr1.min);
-  if (tree_int_cst_lt (max, vr1.max))
-   max = vr1.max;
-  max = int_const_binop (MINUS_EXPR, max, build_int_cst (TREE_TYPE (max), 
1));
-  /* If the dividend is non-negative the modulus will be
-non-negative as well.  */
-  if (TYPE_UNSIGNED (expr_type)
- || value_range_nonnegative_p (&vr0))
-   min = build_int_cst (TREE_TYPE (max), 0);
+  signop sgn = TYPE_SIGN (expr_type);
+  unsigned int prec = TYPE_PRECISION (expr_type);
+  wide_int wmin, wmax, tmp;
+  wide_int zero = wi::zero (prec);
+  wide_int one = wi::one (prec);
+  if (vr1.type == VR_RANGE && !symbolic_range_p (&vr1))
+   {
+ wmax = wi::sub (vr1.max, one);
+ if (sgn == SIGNED)
+   {
+ tmp = wi::sub (wi::minus_one (prec), vr1.min);
+ wmax = wi::smax (wmax, tmp);
+   }
+   }
+  else
+   {
+ wmax = wi::max_value (prec, sgn);
+ // X % INT_MIN may be INT_MAX.
+ if (sgn == UNSIGNED)
+   wmax = wmax - one;
+   }
+
+  if (sgn == UNSIGNED)
+   wmin = zero;
   else
-   min = fold_unary_to_constant (NEGATE_EXPR, expr_type, max);
+   {
+ wmin = -wmax;
+ if (vr0.type == VR_RANGE && TREE_CODE (vr0.min) == INTEGER_CST)
+   {
+ tmp = vr0.min;
+ if (wi::gts_p (tmp, zero))
+   tmp = zero;
+ wmin = wi::smax (wmin, tmp);
+   }
+   }
+
+  if (vr0.type == VR_RANGE && TREE_CODE (vr0.max) == INTEGER_CST)
+   {
+ tmp = vr0.max;
+ if (sgn == SIGNED && wi::neg_p (tmp))
+   tmp = zero;
+ wmax = wi::min (wmax, tmp, sgn);
+   }
+
+  min = wide_int_to_tree (expr_type, wmin);
+  max = wide_int_to_tree (expr_type, wmax);
 }
   else if (code == BIT_AND_EXPR || code == BIT_IOR_EXPR || code == 
BIT_XOR_EXPR)
 {
   bool int_cst_range0, int_cst_range1;
   wide_int may_be_nonzero0, may_be_nonzero1;
   wide_int must_be_nonzero0, must_be_nonzero1;
 
   int_cst_range0 = zero_nonzero_bits_from_vr (expr_type, &vr0,
  &may_be_nonzero0,
  &must_be_nonzero0);


Re: [patch 0/10] debug-early merge

2015-05-08 Thread Aldy Hernandez

On 05/08/2015 07:41 AM, David Malcolm wrote:

On Thu, 2015-05-07 at 17:30 -0700, Aldy Hernandez wrote:

Hi folks.

I have divided the patches into 10 pieces.  The patches are
interdependent and cannot be applied independently.  I am merely
dividing them up to aid the relevant reviewers.

As I've mentioned elsewhere, the patchset as posted has been
bootstrapped and GCC tested on:

x86_64-unknown-linux-gnu
powerpc-ibm-aix7.1.2.0
powerpc64-unknown-linux-gnu
aarch64-unknown-linux-gnu

I have also GDB tested the patchset on x86_64-linux.

Thanks for your help in this ordeal.


Aldy: did you test the jit with this?  Specifically, is gdb still able
to step through the generated code, inspect values, etc.?  See e.g.:
https://gcc.gnu.org/onlinedocs/jit/intro/tutorial04.html#single-stepping-through-the-generated-code

Sadly the jit testsuite doesn't yet automatically verify that sane
debuginfo is generated; I've only ever hand-tested that (this is PR
jit/64196).


If the jit doesn't have a testsuite that runs from "make check", then no.



Alternatively, I guess I can try your branch if that's easier (exactly
which should I test?)


I will not be upset if you can test :).  Just checkout the branch 
(origin/aldyh/debug-early) and build/use it normally.  There should not 
be any differences to mainline.  If there are, it's a bug that must be 
fixed.


Let me know, and thanks!

Aldy


[PATCH] Vectorize strided group loads

2015-05-08 Thread Richard Biener

Currently the vectorizer forces unrolling for grouped loads that
have a non-constant DR_STEP, so that the elements are loaded using
strided load support.  The following patch enhances that machinery to
deal with SLP used groups that have non-constant DR_STEP, avoiding the
excessive unrolling (and (un-)packing).
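
For readers who want a picture of the access pattern, here is a minimal
sketch of a grouped load with a non-constant step.  It is a hypothetical
example written for this note, not the slp-41.c testcase from the patch,
and the function name sum_group2 is made up.  Whether a given GCC
actually vectorizes it depends on the target and the compiler version.

```c
/* Two adjacent elements are loaded per iteration (a group of size 2),
   and the distance between consecutive groups is the runtime value
   STRIDE, so the step of the data references is not a compile-time
   constant.  */
void
sum_group2 (int *restrict out, const int *restrict in, int stride, int n)
{
  for (int i = 0; i < n; i++)
    {
      out[2 * i] = in[i * stride] + 1;         /* first group member */
      out[2 * i + 1] = in[i * stride + 1] + 2; /* second group member */
    }
}
```

The two loads from in[] form the group; the stores to out[] have a
constant step and are ordinary grouped stores.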

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-05-08  Richard Biener  

* tree-vect-data-refs.c (vect_compute_data_ref_alignment):
Handle strided group loads.
(vect_verify_datarefs_alignment): Likewise.
(vect_enhance_data_refs_alignment): Likewise.
(vect_analyze_group_access): Likewise.
(vect_analyze_data_ref_access): Likewise.
(vect_analyze_data_ref_accesses): Likewise.
* tree-vect-stmts.c (vect_model_load_cost): Likewise.
(vectorizable_load): Likewise.

* gcc.dg/vect/slp-41.c: New testcase.

Index: gcc/tree-vect-data-refs.c
===
*** gcc/tree-vect-data-refs.c.orig  2015-05-08 13:24:31.797746925 +0200
--- gcc/tree-vect-data-refs.c   2015-05-08 13:26:23.839725349 +0200
*** vect_compute_data_ref_alignment (struct
*** 671,677 
tree vectype;
tree base, base_addr;
bool base_aligned;
!   tree misalign;
tree aligned_to;
unsigned HOST_WIDE_INT alignment;
  
--- 671,677 
tree vectype;
tree base, base_addr;
bool base_aligned;
!   tree misalign = NULL_TREE;
tree aligned_to;
unsigned HOST_WIDE_INT alignment;
  
*** vect_compute_data_ref_alignment (struct
*** 687,696 
  
/* Strided loads perform only component accesses, misalignment information
   is irrelevant for them.  */
!   if (STMT_VINFO_STRIDE_LOAD_P (stmt_info))
  return true;
  
!   misalign = DR_INIT (dr);
aligned_to = DR_ALIGNED_TO (dr);
base_addr = DR_BASE_ADDRESS (dr);
vectype = STMT_VINFO_VECTYPE (stmt_info);
--- 687,698 
  
/* Strided loads perform only component accesses, misalignment information
   is irrelevant for them.  */
!   if (STMT_VINFO_STRIDE_LOAD_P (stmt_info)
!   && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
  return true;
  
!   if (tree_fits_shwi_p (DR_STEP (dr)))
! misalign = DR_INIT (dr);
aligned_to = DR_ALIGNED_TO (dr);
base_addr = DR_BASE_ADDRESS (dr);
vectype = STMT_VINFO_VECTYPE (stmt_info);
*** vect_compute_data_ref_alignment (struct
*** 704,712 
if (loop && nested_in_vect_loop_p (loop, stmt))
  {
tree step = DR_STEP (dr);
-   HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step);
  
!   if (dr_step % GET_MODE_SIZE (TYPE_MODE (vectype)) == 0)
  {
if (dump_enabled_p ())
  dump_printf_loc (MSG_NOTE, vect_location,
--- 706,714 
if (loop && nested_in_vect_loop_p (loop, stmt))
  {
tree step = DR_STEP (dr);
  
!   if (tree_fits_shwi_p (step)
! && tree_to_shwi (step) % GET_MODE_SIZE (TYPE_MODE (vectype)) == 0)
  {
if (dump_enabled_p ())
  dump_printf_loc (MSG_NOTE, vect_location,
*** vect_compute_data_ref_alignment (struct
*** 732,740 
if (!loop)
  {
tree step = DR_STEP (dr);
-   HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step);
  
!   if (dr_step % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0)
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
--- 734,742 
if (!loop)
  {
tree step = DR_STEP (dr);
  
!   if (tree_fits_shwi_p (step)
! && tree_to_shwi (step) % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0)
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
*** vect_verify_datarefs_alignment (loop_vec
*** 964,970 
  
/* Strided loads perform only component accesses, alignment is
 irrelevant for them.  */
!   if (STMT_VINFO_STRIDE_LOAD_P (stmt_info))
continue;
  
supportable_dr_alignment = vect_supportable_dr_alignment (dr, false);
--- 966,973 
  
/* Strided loads perform only component accesses, alignment is
 irrelevant for them.  */
!   if (STMT_VINFO_STRIDE_LOAD_P (stmt_info)
! && !STMT_VINFO_GROUPED_ACCESS (stmt_info))
continue;
  
supportable_dr_alignment = vect_supportable_dr_alignment (dr, false);
*** vect_enhance_data_refs_alignment (loop_v
*** 1431,1437 
  
/* Strided loads perform only component accesses, alignment is
 irrelevant for them.  */
!   if (STMT_VINFO_STRIDE_LOAD_P (stmt_info))
continue;
  
supportable_dr_alignment = vect_supportable_dr_alignment (dr, true);
--- 1434,1441 
  
/* Strided loads perform only component accesses, alignment is
 irrelevant for them.  */
!   if (STMT_VINFO_STRIDE_LOAD_P (stmt_info)
! && !STMT_

Re: [PATCH][tree-ssa-math-opts] Expand pow (x, CONST) using square roots when possible

2015-05-08 Thread Kyrill Tkachov


On 08/05/15 14:56, Kyrill Tkachov wrote:

On 08/05/15 11:18, Richard Biener wrote:

On Fri, May 1, 2015 at 6:02 PM, Kyrill Tkachov
 wrote:

Hi all,

GCC has some logic to expand calls to pow (x, 0.75), pow (x, 0.25) and
pow (x, (int)k + 0.5) using square roots. So, for the above examples it
would generate sqrt (x) * sqrt (sqrt (x)), sqrt (sqrt (x)) and
powi (x, k) * sqrt (x) (assuming k > 0; for k < 0 it will calculate the
reciprocal of that).

However, the implementation of these optimisations is done on a bit of an
ad-hoc basis with the 0.25, 0.5, 0.75 cases hardcoded.
Judging by
https://gcc.gnu.org/wiki/summit2010?action=AttachFile&do=get&target=meissner2.pdf
these are the most commonly used exponents (at least in SPEC ;))

This patch generalises this optimisation into a (hopefully) more robust
algorithm. In particular, it expands calls to pow (x, CST) by expanding
the integer part of CST using a powi, like it does already, and then
expanding the fractional part as a product of repeated applications of a
square root if the fractional part can be expressed as a multiple of a
power of 0.5.

I try to explain the algorithm in more detail in the comments in the patch
but, for example:

pow (x, 5.625) is not currently handled, but with this patch will be
expanded to powi (x, 5) * sqrt (x) * sqrt (sqrt (sqrt (x))) because
5.625 == 5.0 + 0.5 + 0.5**3
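
As a rough illustration of the decomposition of the fractional part (a
minimal sketch, not the code from the patch; the helper name
decompose_fraction and the bitmask encoding are assumptions made up for
this example), the fraction can be peeled off one binary digit at a
time.  Bit d-1 of the resulting mask means that a square root chain of
depth d takes part in the product.

```c
/* Returns 1 and stores the depth mask in *MASK if FRAC (0 <= FRAC < 1)
   is a multiple of 0.5**MAX_DEPTH, otherwise returns 0 and leaves
   *MASK untouched.  */
int
decompose_fraction (double frac, int max_depth, unsigned *mask)
{
  unsigned m = 0;

  if (frac < 0.0 || frac >= 1.0 || max_depth < 0 || max_depth > 31)
    return 0;

  for (int depth = 1; depth <= max_depth; depth++)
    {
      /* Shift the next binary digit of the fraction into the units.  */
      frac *= 2.0;
      if (frac >= 1.0)
        {
          m |= 1u << (depth - 1);   /* 0.5**depth is part of the sum */
          frac -= 1.0;
        }
    }

  /* Anything left over needs a deeper chain than we are allowed.  */
  if (frac != 0.0)
    return 0;

  *mask = m;
  return 1;
}
```

For the fractional part 0.625 with a maximum depth of 3 this gives the
mask 0x5, i.e. chains of depth 1 and 3, which corresponds to
sqrt (x) * sqrt (sqrt (sqrt (x))).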

Negative exponents are handled in either of two ways, depending on the
exponent value:
* Using a simple reciprocal.
For example:
pow (x, -5.625) == 1.0 / pow (x, 5.625)
  --> 1.0 / (powi (x, 5) * sqrt (x) * sqrt (sqrt (sqrt (x))))

* For pow (x, EXP) with negative exponent EXP with integer part INT and
fractional part FRAC:
pow (x, 1.0 - FRAC) / powi (x, ceil (abs (EXP))).
For example:
pow (x, -5.875) == pow (x, 0.125) / powi (x, 6)
  --> sqrt (sqrt (sqrt (x))) / (powi (x, 6))
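
The arithmetic behind the second form can be sketched as follows.  This
is only an illustration under my own assumptions (the helper name
split_negative_exponent and the range limit are invented here, nothing
like it appears in the patch); it merely computes the two pieces used in
pow (x, FRAC) / powi (x, N).

```c
/* For a negative exponent E, compute N = ceil (abs (E)) and
   FRAC = N - abs (E), so that pow (x, E) == pow (x, FRAC) / powi (x, N).
   Returns 0 if E is not negative or is too large in magnitude for this
   sketch, 1 otherwise.  */
int
split_negative_exponent (double e, long *n, double *frac)
{
  if (!(e < 0.0) || e < -1000000.0)
    return 0;

  double a = -e;              /* abs (E) */
  long whole = (long) a;      /* truncation is floor for positive values */
  if ((double) whole < a)
    whole++;                  /* round up to get ceil (abs (E)) */

  *n = whole;
  *frac = (double) whole - a;
  return 1;
}
```

For E == -5.875 this yields N == 6 and FRAC == 0.125, as in the example
above.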


Since hardware square root instructions tend to be expensive, we may want
to reduce the number of square roots we are willing to calculate. Since we
reuse intermediate square root results, this boils down to restricting the
depth of the square root chains. In all the examples above that depth is 3.
I've made this maximum depth parametrisable in params.def. By adjusting
that parameter we can adjust the resolution of this optimisation. So, if
it's set to '4' then we will synthesize every exponent that is a multiple
of 0.5**4 == 0.0625, including negative multiples. Currently, GCC will not
try to expand negative multiples of anything else than 0.5.

I have tried to keep the existing functionality intact and activate this
only for -funsafe-math-optimizations and only when the target has a sqrt
instruction. An exception to that is pow (x, 0.5) which we prefer to
transform to sqrt even when a hardware sqrt is not available, presumably
because the library function for sqrt is usually faster than pow (?).

Yes.  It's also a safe transform - which you seem to put under
flag_unsafe_math_optimizations only with your patch.

It would be clearer to just leave the special-case

-  /* Optimize pow(x,0.5) = sqrt(x).  This replacement is always safe
- unless signed zeros must be maintained.  pow(-0,0.5) = +0, while
- sqrt(-0) = -0.  */
-  if (sqrtfn
-  && REAL_VALUES_EQUAL (c, dconsthalf)
-  && !HONOR_SIGNED_ZEROS (mode))
-return build_and_insert_call (gsi, loc, sqrtfn, arg0);

in as-is.

Ok, I'll leave that case explicit.


You also removed the Os constraint which you should put back in.
Basically if !optimize_function_for_speed_p then generate at most
two calls to sqrt (iff the HW has a sqrt instruction).

I tried to move that logic into expand_with_sqrts but
I'll move it outside it. It seems that this boils down to
only 0.25, as any other 2xsqrt chain will also involve a
multiply or a divide which we currently avoid.


You fail to add a testcase that checks that the optimization applies.

I'll add one to scan the sincos dump.
I notice that we don't have a testsuite check that the target has
a hw sqrt instruction. Would you like me to add one? Or can I make
the testcase aarch64-specific?


Otherwise the idea looks good though there must be a better way
to compute the series than by using real-arithmetic and forcefully
trying out all possibilities...

I get that feeling too. What I need is not only a way
of figuring out if the fractional part of the exponent can be
represented in this way, but also compute the depth of the
sqrt chain and the number of multiplies...
That being said, the current approach is O(maximum depth) and
I don't expect the depth to go much beyond 3 or 4 in practice.

Thanks for looking at it!
I'll respin the patch.


And here it is, with my above comments implemented.
Bootstrapped on x86_64 and tested on aarch64.
Full testing on arm and aarch64 ongoing.

Is this ok if testing comes clean?

Thanks,
Kyrill


Kyrill


Richard.


Having seen the glibc implementation of a fully IEEE-754-compliant pow
function, I think we
would prefer synthesising the

[PATCH][PR66013] Update address_taken after ifn_va_arg expansion

2015-05-08 Thread Tom de Vries

Hi,

this patch fixes PR66013.


I.

Consider this test-case, with a va_list passed from f2 to f2_1:
...
#include <stdarg.h>

inline int __attribute__((always_inline))
f2_1 (va_list ap)
{
  return va_arg (ap, int);
}

int
f2 (int i, ...)
{
  int res;
  va_list ap;

  va_start (ap, i);
  res = f2_1 (ap);
  va_end (ap);

  return res;
}
...

When compiling at -O2 with -m32, before pass_stdarg we see that va_start and 
va_arg (in the same function after inlining) use different aps (with the same 
value though):

...
  # .MEM_2 = VDEF <.MEM_1(D)>
  # USE = nonlocal escaped
  # CLB = nonlocal escaped { D.1809 }
  __builtin_va_startD.1021 (&apD.1809, 0);

  # VUSE <.MEM_2>
  # PT = nonlocal
  ap.0_3 = apD.1809;

  # .MEM_4 = VDEF <.MEM_2>
  apD.1820 = ap.0_3;

  # .MEM_8 = VDEF <.MEM_4>
  # USE = nonlocal null { D.1820 } (escaped)
  # CLB = nonlocal null { D.1820 } (escaped)
  _7 = VA_ARG (&apD.1820, 0B, 1);
...

After expand_ifn_va_arg_1, we have this representation, and that's the one the 
pass_stdarg optimization operates on:

...
  # .MEM_2 = VDEF <.MEM_1(D)>
  # USE = nonlocal escaped
  # CLB = nonlocal escaped { D.1809 }
  __builtin_va_startD.1021 (&apD.1809, 0);

  # VUSE <.MEM_2>
  # PT = nonlocal
  ap.0_3 = apD.1809;

  # .MEM_4 = VDEF <.MEM_2>
  apD.1820 = ap.0_3;

  # VUSE <.MEM_4>
  ap.4_9 = apD.1820;

  ap.5_10 = ap.4_9 + 4;

  # .MEM_11 = VDEF <.MEM_4>
  apD.1820 = ap.5_10;

  # VUSE <.MEM_11>
  _7 = MEM[(intD.1 *)ap.4_9];
...

The optimization in pass_stdarg fails:
...
f2: va_list escapes 1, needs to save all GPR units and all FPR units.
...

The optimization fails because this assignment makes the va_list escape:
...
va_list escapes in # .MEM_4 = VDEF <.MEM_2>
apD.1820 = ap.0_3;
...


II.

By recalculating address_taken after expanding the ifn_va_arg, we get instead:
...
  # .MEM_2 = VDEF <.MEM_1(D)>
  # USE = nonlocal escaped
  # CLB = nonlocal escaped { D.1809 }
  __builtin_va_startD.1021 (&apD.1809, 0);

  # VUSE <.MEM_2>
  # PT = nonlocal
  ap.0_3 = apD.1809;

  ap_11 = ap.0_3;

  ap.4_9 = ap_11;

  ap.5_10 = ap.4_9 + 4;

  ap_4 = ap.5_10;

  # VUSE <.MEM_2>
  _7 = MEM[(intD.1 *)ap.4_9];
...

and the pass_stdarg optimization succeeds now:
...
f2: va_list escapes 0, needs to save 4 GPR units and all FPR units.
...

Bootstrapped and reg-tested on x86_64 with and without -m32.

OK for trunk?

Thanks,
- Tom
Update address_taken after ifn_va_arg expansion

2015-05-08  Tom de Vries  

	PR tree-optimization/66013
	* tree-stdarg.c: Include tree-ssa.h.
	(expand_ifn_va_arg_1): Call execute_update_addresses_taken after
	TODO_update_ssa.

	* gcc.dg/tree-ssa/stdarg-2.c: Add ia32 scan for 'va_list escapes 0'.
---
 gcc/testsuite/gcc.dg/tree-ssa/stdarg-2.c | 2 ++
 gcc/tree-stdarg.c| 2 ++
 2 files changed, 4 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/stdarg-2.c b/gcc/testsuite/gcc.dg/tree-ssa/stdarg-2.c
index f09b5de..3b1bc2c 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/stdarg-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/stdarg-2.c
@@ -300,6 +300,8 @@ f15 (int i, ...)
 /* We may be able to improve upon this after fixing PR66010/PR66013.  */
 /* { dg-final { scan-tree-dump "f15: va_list escapes 1, needs to save all GPR units and all FPR units" "stdarg" { target alpha*-*-linux* } } } */
 
+/* { dg-final { scan-tree-dump "f15: va_list escapes 0" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
+
 /* { dg-final { scan-tree-dump-not "f15: va_list escapes 0, needs to save 0 GPR units" "stdarg" { target { { i?86-*-* x86_64-*-* } && ia32 } } } } */
 /* { dg-final { scan-tree-dump-not "f15: va_list escapes 0, needs to save 0 GPR units" "stdarg" { target ia64-*-* } } } */
 /* { dg-final { scan-tree-dump-not "f15: va_list escapes 0, needs to save 0 GPR units" "stdarg" { target { powerpc*-*-* && lp64 } } } } */
diff --git a/gcc/tree-stdarg.c b/gcc/tree-stdarg.c
index 1356374..64e6224 100644
--- a/gcc/tree-stdarg.c
+++ b/gcc/tree-stdarg.c
@@ -62,6 +62,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-cfg.h"
 #include "tree-pass.h"
 #include "tree-stdarg.h"
+#include "tree-ssa.h"
 
 /* A simple pass that attempts to optimize stdarg functions on architectures
that need to save register arguments to stack on entry to stdarg functions.
@@ -1108,6 +1109,7 @@ expand_ifn_va_arg_1 (function *fun)
 
   free_dominance_info (CDI_DOMINATORS);
   update_ssa (TODO_update_ssa);
+  execute_update_addresses_taken ();
 }
 
 /* Expand IFN_VA_ARGs in FUN, if necessary.  */
-- 
1.9.1



Re: [PATCH 6/13] mips musl support

2015-05-08 Thread Szabolcs Nagy
On 08/05/15 15:46, Matthew Fortune wrote:
> Szabolcs Nagy  writes:
>> if you think that's ok, i can now submit the patch with %{msoft-float:-
>> sf} added to all abi variants.
> 
> That's fine. Go ahead.
> 

the patch for the record.

Changelog:

2015-05-08  Gregor Richards  
Szabolcs Nagy  

* config/mips/linux.h (MUSL_DYNAMIC_LINKER32): Define.
(MUSL_DYNAMIC_LINKER64, MUSL_DYNAMIC_LINKERN32): Define.
(GNU_USER_DYNAMIC_LINKERN32): Update.
diff --git a/gcc/config/mips/linux.h b/gcc/config/mips/linux.h
index 91df261..fb358e2 100644
--- a/gcc/config/mips/linux.h
+++ b/gcc/config/mips/linux.h
@@ -37,7 +37,13 @@ along with GCC; see the file COPYING3.  If not see
 #define UCLIBC_DYNAMIC_LINKERN32 \
   "%{mnan=2008:/lib32/ld-uClibc-mipsn8.so.0;:/lib32/ld-uClibc.so.0}"
 
+#undef MUSL_DYNAMIC_LINKER32
+#define MUSL_DYNAMIC_LINKER32 "/lib/ld-musl-mips%{EL:el}%{msoft-float:-sf}.so.1"
+#undef MUSL_DYNAMIC_LINKER64
+#define MUSL_DYNAMIC_LINKER64 "/lib/ld-musl-mips64%{EL:el}%{msoft-float:-sf}.so.1"
+#define MUSL_DYNAMIC_LINKERN32 "/lib/ld-musl-mipsn32%{EL:el}%{msoft-float:-sf}.so.1"
+
 #define BIONIC_DYNAMIC_LINKERN32 "/system/bin/linker32"
 #define GNU_USER_DYNAMIC_LINKERN32 \
   CHOOSE_DYNAMIC_LINKER (GLIBC_DYNAMIC_LINKERN32, UCLIBC_DYNAMIC_LINKERN32, \
- BIONIC_DYNAMIC_LINKERN32)
+ BIONIC_DYNAMIC_LINKERN32, MUSL_DYNAMIC_LINKERN32)
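
For the record, the spec strings above expand as follows: %{EL:el}
contributes "el" when -EL is given and %{msoft-float:-sf} contributes
"-sf" when -msoft-float is given.  A minimal sketch of the resulting
names in plain C (the function musl_mips_linker and its parameters are
hypothetical, written only to illustrate the expansion; the driver does
this through its spec machinery, not through code like this):

```c
#include <stdio.h>
#include <string.h>

/* Write the musl dynamic linker path for the given ABI suffix
   ("" for o32, "64" or "n32") into BUF.  Returns the snprintf result,
   or -1 for an ABI suffix this sketch does not know.  */
int
musl_mips_linker (char *buf, size_t size, const char *abi,
                  int little_endian, int soft_float)
{
  if (strcmp (abi, "") != 0
      && strcmp (abi, "64") != 0
      && strcmp (abi, "n32") != 0)
    return -1;

  return snprintf (buf, size, "/lib/ld-musl-mips%s%s%s.so.1",
                   abi,
                   little_endian ? "el" : "",   /* %{EL:el} */
                   soft_float ? "-sf" : "");    /* %{msoft-float:-sf} */
}
```

So a little-endian soft-float n32 link would use
/lib/ld-musl-mipsn32el-sf.so.1, and a big-endian hard-float o32 link
would use /lib/ld-musl-mips.so.1.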


Re: [PATCH 6/13] mips musl support

2015-05-08 Thread Kyrill Tkachov


On 08/05/15 16:13, Szabolcs Nagy wrote:

On 08/05/15 15:46, Matthew Fortune wrote:

Szabolcs Nagy  writes:

if you think that's ok, i can now submit the patch with %{msoft-float:-
sf} added to all abi variants.

That's fine. Go ahead.


the patch for the record.


I've committed this on Szabolcs' behalf with r222915.

Kyrill


Changelog:

2015-05-08  Gregor Richards  
Szabolcs Nagy  

* config/mips/linux.h (MUSL_DYNAMIC_LINKER32): Define.
(MUSL_DYNAMIC_LINKER64, MUSL_DYNAMIC_LINKERN32): Define.
(GNU_USER_DYNAMIC_LINKERN32): Update.




Re: [patch 1/10] debug-early merge: Ada front-end

2015-05-08 Thread Aldy Hernandez

On 05/08/2015 03:35 AM, Richard Biener wrote:

On Fri, May 8, 2015 at 12:26 PM, Eric Botcazou  wrote:

@@ -5204,28 +5199,6 @@ gnat_write_global_declarations (void)
 types_used_by_var_decl_insert (t, dummy_global);
   }
  }
-
-  /* Output debug information for all global type declarations first.  This
- ensures that global types whose compilation hasn't been finalized yet,
- for example pointers to Taft amendment types, have their compilation
- finalized in the right context.  */
-  FOR_EACH_VEC_SAFE_ELT (global_decls, i, iter)
-if (TREE_CODE (iter) == TYPE_DECL && !DECL_IGNORED_P (iter))
-  debug_hooks->global_decl (iter);


Shouldn't that have used ->type_decl (iter) anyway?  That is, are they not
already processed via rest_of_type_compilation or does the Ada FE not
use that?


My question exactly.  Perhaps that was my confusion.  Why is this using 
->global_decl?


For example, the C front-end uses rest_of_type_compilation (see 
finish_struct() in c/c-decl.c) which calls ->type_decl(), or it calls 
->type_decl() from record_builtin_type().





-  /* Proceed to optimize and emit assembly. */
-  symtab->finalize_compilation_unit ();
-
-  /* After cgraph has had a chance to emit everything that's going to
- be emitted, output debug information for the rest of globals.  */
-  if (!seen_error ())
-{
-  timevar_push (TV_SYMOUT);
-  FOR_EACH_VEC_SAFE_ELT (global_decls, i, iter)
- if (TREE_CODE (iter) != TYPE_DECL && !DECL_IGNORED_P (iter))
-   debug_hooks->global_decl (iter);
-  timevar_pop (TV_SYMOUT);
-}
  }


What's the replacement mechanism for the first pass on global_decls?  The
comment explains that generating debug info must be delayed in this case.


But yes, I don't think the patches add any replacement for processing
TYPE_DECLs that happen to be in global_decls.


I can put the code back, but calling ->type_decl()?  Assuming you folks 
(Ada) don't want to use rest_of_type_compilation().


Aldy



Re: [RFC 0/6] Flags outputs for asms

2015-05-08 Thread Richard Henderson
On 05/07/2015 06:20 PM, H. Peter Anvin wrote:
> This is a separate issue which really shouldn't have anything to do with
> this, but is there a specific reason why:
> 
> void good1(int x, int y)
> {
>   _Bool pf;
> 
>   asm("cmpl %2,%1"
>   : "=@ccp" (pf)
>   : "r" (x), "g" (y));
> 
>   if (pf)
> beta();
> }
> 
> ... ends up generating a jump to a jump?
> 
>  :
>0:   39 f7   cmp%esi,%edi
>2:   7a 0c   jp 10 
>4:   f3 c3   repz retq
>6:   66 2e 0f 1f 84 00 00nopw   %cs:0x0(%rax,%rax,1)
>d:   00 00 00
>   10:   e9 00 00 00 00  jmpq   15 
> 11: R_X86_64_PC32   beta-0x4
>   15:   66 66 2e 0f 1f 84 00data32 nopw %cs:0x0(%rax,%rax,1)
>   1c:   00 00 00 00
> 

Yes, the i386 backend has not implemented conditional sibcalls.  AFAIK the only
targets that have done that are ones with predication: ia64 and maybe arm32.

It could certainly be done; I've no idea off hand how difficult it might be.  I
suspect that some new code has to be written generically in order to enable it.


r~



Re: [Patch, fortran, pr65894, v1] [6 Regression] severe regression in gfortran 6.0.0

2015-05-08 Thread Steve Kargl
On Fri, May 08, 2015 at 01:54:17PM +0200, Andre Vehreschild wrote:
> 
> I do not have the privileges to do a review so I can't help you there. Good
> luck finding a reviewer.
> 

You probably understand this area of code as well as anyone
else, and your contributions to gfortran over the last few
months certainly merit "reviewer privilege".

Mikael, if Andre believes the patch is correct and you've
done the regression testing, then I see no reason to not
commit it.

-- 
Steve


Re: PR 64454: Improve VRP for %

2015-05-08 Thread Richard Biener
On Fri, May 8, 2015 at 4:59 PM, Marc Glisse  wrote:
> Hello,
>
> here is a rewrite of the patch, using wide_int, and improving a bit the
> result. Same ChangeLog, tested again on x86_64-linux-gnu.

Please use /* */ for comments.

Otherwise ok!

Thanks,
Richard.

> --
> Marc Glisse
> Index: gcc/testsuite/gcc.dg/tree-ssa/vrp97.c
> ===
> --- gcc/testsuite/gcc.dg/tree-ssa/vrp97.c   (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/vrp97.c   (working copy)
> @@ -0,0 +1,19 @@
> +/* PR tree-optimization/64454 */
> +/* { dg-options "-O2 -fdump-tree-vrp1" } */
> +
> +int f(int a, int b)
> +{
> +if (a < -3 || a > 13) __builtin_unreachable();
> +if (b < -6 || b > 9) __builtin_unreachable();
> +int c = a % b;
> +return c >= -3 && c <= 8;
> +}
> +
> +int g(int a, int b)
> +{
> +  int c = a % b;
> +  return c != -__INT_MAX__ - 1;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "return 1;" 2 "vrp1" } } */
> +/* { dg-final { cleanup-tree-dump "vrp1" } } */
> Index: gcc/testsuite/gcc.dg/vect/slp-perm-7.c
> ===
> --- gcc/testsuite/gcc.dg/vect/slp-perm-7.c  (revision 222906)
> +++ gcc/testsuite/gcc.dg/vect/slp-perm-7.c  (working copy)
> @@ -63,15 +63,15 @@ int main (int argc, const char* argv[])
>
>foo (input, output, input2, output2);
>
>for (i = 0; i < N; i++)
>   if (output[i] != check_results[i] || output2[i] != check_results2[i])
> abort ();
>
>return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  {
> target vect_perm } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect"  {
> target vect_perm } } } */
>  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"
> { target vect_perm } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
>
> Index: gcc/tree-vrp.c
> ===
> --- gcc/tree-vrp.c  (revision 222906)
> +++ gcc/tree-vrp.c  (working copy)
> @@ -3189,40 +3189,73 @@ extract_range_from_binary_expr_1 (value_
> }
> }
>else
> {
>   extract_range_from_multiplicative_op_1 (vr, code, &vr0, &vr1);
>   return;
> }
>  }
>else if (code == TRUNC_MOD_EXPR)
>  {
> -  if (vr1.type != VR_RANGE
> - || range_includes_zero_p (vr1.min, vr1.max) != 0
> - || vrp_val_is_min (vr1.min))
> +  if (range_is_null (&vr1))
> {
> - set_value_range_to_varying (vr);
> + set_value_range_to_undefined (vr);
>   return;
> }
> +  // ABS (A % B) < ABS (B) and either 0 <= A % B <= A or A <= A % B <=
> 0.
>type = VR_RANGE;
> -  /* Compute MAX <|vr1.min|, |vr1.max|> - 1.  */
> -  max = fold_unary_to_constant (ABS_EXPR, expr_type, vr1.min);
> -  if (tree_int_cst_lt (max, vr1.max))
> -   max = vr1.max;
> -  max = int_const_binop (MINUS_EXPR, max, build_int_cst (TREE_TYPE
> (max), 1));
> -  /* If the dividend is non-negative the modulus will be
> -non-negative as well.  */
> -  if (TYPE_UNSIGNED (expr_type)
> - || value_range_nonnegative_p (&vr0))
> -   min = build_int_cst (TREE_TYPE (max), 0);
> +  signop sgn = TYPE_SIGN (expr_type);
> +  unsigned int prec = TYPE_PRECISION (expr_type);
> +  wide_int wmin, wmax, tmp;
> +  wide_int zero = wi::zero (prec);
> +  wide_int one = wi::one (prec);
> +  if (vr1.type == VR_RANGE && !symbolic_range_p (&vr1))
> +   {
> + wmax = wi::sub (vr1.max, one);
> + if (sgn == SIGNED)
> +   {
> + tmp = wi::sub (wi::minus_one (prec), vr1.min);
> + wmax = wi::smax (wmax, tmp);
> +   }
> +   }
> +  else
> +   {
> + wmax = wi::max_value (prec, sgn);
> + // X % INT_MIN may be INT_MAX.
> + if (sgn == UNSIGNED)
> +   wmax = wmax - one;
> +   }
> +
> +  if (sgn == UNSIGNED)
> +   wmin = zero;
>else
> -   min = fold_unary_to_constant (NEGATE_EXPR, expr_type, max);
> +   {
> + wmin = -wmax;
> + if (vr0.type == VR_RANGE && TREE_CODE (vr0.min) == INTEGER_CST)
> +   {
> + tmp = vr0.min;
> + if (wi::gts_p (tmp, zero))
> +   tmp = zero;
> + wmin = wi::smax (wmin, tmp);
> +   }
> +   }
> +
> +  if (vr0.type == VR_RANGE && TREE_CODE (vr0.max) == INTEGER_CST)
> +   {
> + tmp = vr0.max;
> + if (sgn == SIGNED && wi::neg_p (tmp))
> +   tmp = zero;
> + wmax = wi::min (wmax, tmp, sgn);
> +   }
> +
> +  min = wide_int_to_tree (expr_type, wmin);
> +  max = wide_int_to_tree (expr_type, wmax);
>  }
>else if (code == BIT_AND_EXPR || code == BIT_IOR_EXPR || code ==
> BIT_XOR_EXPR)
>  {
>bool int_cst_range0, int_cst_range1;
>
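
To make the bounds in the quoted TRUNC_MOD_EXPR hunk easier to follow,
here is a minimal sketch of the same computation for the signed case,
using plain long values instead of wide_int.  The names struct irange,
mod_range and mod_range_brute are made up for this illustration, the
ranges are assumed to be small constant ranges (no overflow handling),
and the brute-force helper is only there to cross-check the result.

```c
#include <limits.h>

struct irange { long min, max; };

/* Bound A % B for signed A in [a.min, a.max] and B in [b.min, b.max],
   following the steps of the quoted patch: ABS (A % B) < ABS (B), and
   the result has the sign of A and is no larger in magnitude.  */
struct irange
mod_range (struct irange a, struct irange b)
{
  long wmax = b.max - 1;        /* largest remainder for positive B */
  long tmp = -1 - b.min;        /* largest remainder for negative B */
  if (tmp > wmax)
    wmax = tmp;

  long wmin = -wmax;
  tmp = a.min > 0 ? 0 : a.min;  /* A >= 0 implies A % B >= 0 */
  if (tmp > wmin)
    wmin = tmp;

  tmp = a.max < 0 ? 0 : a.max;  /* A <= 0 implies A % B <= 0 */
  if (tmp < wmax)
    wmax = tmp;

  struct irange r = { wmin, wmax };
  return r;
}

/* Exhaustively compute the exact range of A % B, skipping B == 0.  */
struct irange
mod_range_brute (struct irange a, struct irange b)
{
  struct irange r = { LONG_MAX, LONG_MIN };
  for (long x = a.min; x <= a.max; x++)
    for (long y = b.min; y <= b.max; y++)
      {
        if (y == 0)
          continue;
        long c = x % y;
        if (c < r.min)
          r.min = c;
        if (c > r.max)
          r.max = c;
      }
  return r;
}
```

With a in [-3, 13] and b in [-6, 9], as in function f of vrp97.c, both
helpers give [-3, 8], which is the range the testcase expects VRP to
prove.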

Re: [PATCH][PR66013] Update address_taken after ifn_va_arg expansion

2015-05-08 Thread Richard Biener
On Fri, May 8, 2015 at 5:13 PM, Tom de Vries  wrote:
> Hi,
>
> this patch fixes PR66013.
>
>
> I.
>
> Consider this test-case, with a va_list passed from f2 to f2_1:
> ...
> #include <stdarg.h>
>
> inline int __attribute__((always_inline))
> f2_1 (va_list ap)
> {
>   return va_arg (ap, int);
> }
>
> int
> f2 (int i, ...)
> {
>   int res;
>   va_list ap;
>
>   va_start (ap, i);
>   res = f2_1 (ap);
>   va_end (ap);
>
>   return res;
> }
> ...
>
> When compiling at -O2 with -m32, before pass_stdarg we see that va_start and
> va_arg (in the same function after inlining) use different aps (with the
> same value though):
> ...
>   # .MEM_2 = VDEF <.MEM_1(D)>
>   # USE = nonlocal escaped
>   # CLB = nonlocal escaped { D.1809 }
>   __builtin_va_startD.1021 (&apD.1809, 0);
>
>   # VUSE <.MEM_2>
>   # PT = nonlocal
>   ap.0_3 = apD.1809;
>
>   # .MEM_4 = VDEF <.MEM_2>
>   apD.1820 = ap.0_3;
>
>   # .MEM_8 = VDEF <.MEM_4>
>   # USE = nonlocal null { D.1820 } (escaped)
>   # CLB = nonlocal null { D.1820 } (escaped)
>   _7 = VA_ARG (&apD.1820, 0B, 1);
> ...
>
> After expand_ifn_va_arg_1, we have this representation, and that's the one
> the pass_stdarg optimization operates on:
> ...
>   # .MEM_2 = VDEF <.MEM_1(D)>
>   # USE = nonlocal escaped
>   # CLB = nonlocal escaped { D.1809 }
>   __builtin_va_startD.1021 (&apD.1809, 0);
>
>   # VUSE <.MEM_2>
>   # PT = nonlocal
>   ap.0_3 = apD.1809;
>
>   # .MEM_4 = VDEF <.MEM_2>
>   apD.1820 = ap.0_3;
>
>   # VUSE <.MEM_4>
>   ap.4_9 = apD.1820;
>
>   ap.5_10 = ap.4_9 + 4;
>
>   # .MEM_11 = VDEF <.MEM_4>
>   apD.1820 = ap.5_10;
>
>   # VUSE <.MEM_11>
>   _7 = MEM[(intD.1 *)ap.4_9];
> ...
>
> The optimization in pass_stdarg fails:
> ...
> f2: va_list escapes 1, needs to save all GPR units and all FPR units.
> ...
>
> The optimization fails because this assignment makes the va_list escape:
> ...
> va_list escapes in # .MEM_4 = VDEF <.MEM_2>
> apD.1820 = ap.0_3;
> ...
>
>
> II.
>
> By recalculating address_taken after expanding the ifn_va_arg, we get
> instead:
> ...
>   # .MEM_2 = VDEF <.MEM_1(D)>
>   # USE = nonlocal escaped
>   # CLB = nonlocal escaped { D.1809 }
>   __builtin_va_startD.1021 (&apD.1809, 0);
>
>   # VUSE <.MEM_2>
>   # PT = nonlocal
>   ap.0_3 = apD.1809;
>
>   ap_11 = ap.0_3;
>
>   ap.4_9 = ap_11;
>
>   ap.5_10 = ap.4_9 + 4;
>
>   ap_4 = ap.5_10;
>
>   # VUSE <.MEM_2>
>   _7 = MEM[(intD.1 *)ap.4_9];
> ...
>
> and the pass_stdarg optimization succeeds now:
> ...
> f2: va_list escapes 0, needs to save 4 GPR units and all FPR units.
> ...
>
> Bootstrapped and reg-tested on x86_64 with and without -m32.
>
> OK for trunk?

As noted in one of the PRs I think that it is the proper time to
re-implement the stdarg optimization on the un-lowered form which
should also fix this.

Thanks,
Richard.

>
> Thanks,
> - Tom


Re: [RFC 0/6] Flags outputs for asms

2015-05-08 Thread Jay Foad
On 8 May 2015 at 16:23, Richard Henderson  wrote:
> Yes, the i386 backend has not implemented conditional sibcalls.

See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60159

Jay.


Re: [RFC 0/6] Flags outputs for asms

2015-05-08 Thread Jeff Law

On 05/08/2015 09:23 AM, Richard Henderson wrote:

On 05/07/2015 06:20 PM, H. Peter Anvin wrote:

This is a separate issue which really shouldn't have anything to do with
this, but is there a specific reason why:

void good1(int x, int y)
{
   _Bool pf;

   asm("cmpl %2,%1"
   : "=@ccp" (pf)
   : "r" (x), "g" (y));

   if (pf)
 beta();
}

... ends up generating a jump to a jump?

 :
0:   39 f7   cmp%esi,%edi
2:   7a 0c   jp 10 
4:   f3 c3   repz retq
6:   66 2e 0f 1f 84 00 00nopw   %cs:0x0(%rax,%rax,1)
d:   00 00 00
   10:   e9 00 00 00 00  jmpq   15 
 11: R_X86_64_PC32   beta-0x4
   15:   66 66 2e 0f 1f 84 00data32 nopw %cs:0x0(%rax,%rax,1)
   1c:   00 00 00 00



Yes, the i386 backend has not implemented conditional sibcalls.  AFAIK the only
targets that have done that are ones with predication: ia64 and maybe arm32.

It could certainly be done; I've no idea off hand how difficult it might be.  I
suspect that some new code has to be written generically in order to enable it.

Kai looked at this last year; it's possible, but rather tedious to do in
GCC due to the separation of JUMP_INSN vs CALL_INSN and the need to
duplicate the conditional jumps as conditional sibcalls.


There's a BZ about this, I'm not sure if Kai put all his thoughts on the 
topic into the BZ or not.


jeff


Re: [RFC 0/6] Flags outputs for asms

2015-05-08 Thread Richard Henderson
On 05/07/2015 06:15 PM, H. Peter Anvin wrote:
> /* This case really should produce good code in both cases */
> 
> void good1(int x, int y)
> {
>   _Bool pf;
> 
>   asm("cmpl %2,%1"
>   : "=@ccp" (pf)
>   : "r" (x), "g" (y));
> 
>   if (pf)
> beta();
> }
> 
> void bad1(int x, int y)
> {
>   _Bool le, pf;
> 
>   asm("cmpl %3,%2"
>   : "=@ccle" (le), "=@ccp" (pf)
>   : "r" (x), "g" (y));
> 
>   if (le)
> alpha();
>   else if (pf)
> beta();
> }

I have a feeling I know why these didn't get merged.

The global optimizers aren't allowed to operate on hard registers lest they
extend the lifetime of the hard register such that it creates an impossible
situation for the register allocator.  Think what would happen if EAX were
suddenly live across the entire function.

Similarly, combine is allowed to merge insns with hard registers if the insns
are sequential.  But if the insns aren't sequential, we're lengthening the
lifetime of the hard register.  Now, I thought this didn't apply to fixed
registers like esp or flags, but perhaps not.

Note what happens if you swap the order of le and pf in the asm:

  asm("cmpl %3,%2" : "=@ccp" (pf), "=@ccle" (le) : "r" (x), "g" (y));

the order of the two setcc insns is reversed, and then the setle is in fact
merged with the branch.

Anyway, I'll look into whether the branch around alpha can be optimized, but
I'd be shocked if I'd be able to do anything about the branch around beta.
True, there's nothing in between that will clobber the flags so it would be an
excellent improvement, but combine doesn't work across basic blocks and
changing that would be a major task.


> /* This case really is too much to ask... */
> 
> _Bool good2(int x, int y)
> {
>   _Bool le;
> 
>   asm("cmpl %2,%1"
>   : "=@ccle" (le)
>   : "r" (x), "g" (y));
> 
>   return le;
> }
> 
> _Bool bad2(int x, int y)
> {
>   _Bool zf, of, sf;
> 
>   asm("cmpl %4,%3"
>   : "=@ccz" (zf), "=@cco" (of), "=@ccs" (sf)
>   : "r" (x), "g" (y));
> 
>   return zf | (sf ^ of);
> }

Haha, yes.
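
For anyone wondering why good2 and bad2 are expected to agree: "le" is
by definition ZF | (SF ^ OF) after the compare.  A minimal portable
sketch that models the three flags of a 32-bit x - y and recombines them
(the names cmp_flags, cmp_flags_of and le_from_flags are invented for
this illustration; this is a model in plain C, not the asm flag-output
feature itself):

```c
#include <stdbool.h>
#include <stdint.h>

struct cmp_flags { bool zf, sf, of; };

/* Model the ZF, SF and OF results of a 32-bit compare of X with Y,
   i.e. of the subtraction X - Y.  */
struct cmp_flags
cmp_flags_of (int32_t x, int32_t y)
{
  uint32_t ux = (uint32_t) x, uy = (uint32_t) y;
  uint32_t r = ux - uy;         /* wraps modulo 2**32, like the hardware */
  struct cmp_flags f;

  f.zf = (r == 0);
  f.sf = (r >> 31) != 0;
  /* Signed overflow: the operands differ in sign and the result's sign
     differs from X's.  */
  f.of = (((ux ^ uy) & (ux ^ r)) >> 31) != 0;
  return f;
}

/* The "le" condition expressed through the individual flags.  */
bool
le_from_flags (struct cmp_flags f)
{
  return f.zf | (f.sf ^ f.of);
}
```

For any pair of inputs le_from_flags (cmp_flags_of (x, y)) equals
x <= y, including the overflowing cases such as INT32_MIN compared
with 1.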

> /* One should expect this shouldn't produce *worse* code than the above... */
> 
> int good3(int x, int y, int a, int b)
> {
>   _Bool le;
> 
>   asm("cmpl %2,%1"
>   : "=@ccle" (le)
>   : "r" (x), "g" (y));
> 
>   return le ? b : a;
> }
> 
> int bad3(int x, int y, int a, int b)
> {
>   _Bool zf, of, sf;
> 
>   asm("cmpl %4,%3"
>   : "=@ccz" (zf), "=@cco" (of), "=@ccs" (sf)
>   : "r" (x), "g" (y));
> 
>   return zf | (sf ^ of) ? b : a;
> }

This is a case of the optimizers thinking they're helping you by not folding
too much computation into a condition.

If you use -mbranch-cost=4 you'll get the cmovne that you expect.


r~


Re: [patch 1/10] debug-early merge: Ada front-end

2015-05-08 Thread Aldy Hernandez

On 05/08/2015 03:26 AM, Eric Botcazou wrote:

@@ -5204,28 +5199,6 @@ gnat_write_global_declarations (void)
  types_used_by_var_decl_insert (t, dummy_global);
}
  }
-
-  /* Output debug information for all global type declarations first.  This
-     ensures that global types whose compilation hasn't been finalized yet,
-     for example pointers to Taft amendment types, have their compilation
-     finalized in the right context.  */
-  FOR_EACH_VEC_SAFE_ELT (global_decls, i, iter)
-if (TREE_CODE (iter) == TYPE_DECL && !DECL_IGNORED_P (iter))
-  debug_hooks->global_decl (iter);
-
-  /* Proceed to optimize and emit assembly. */
-  symtab->finalize_compilation_unit ();
-
-  /* After cgraph has had a chance to emit everything that's going to
- be emitted, output debug information for the rest of globals.  */
-  if (!seen_error ())
-{
-  timevar_push (TV_SYMOUT);
-  FOR_EACH_VEC_SAFE_ELT (global_decls, i, iter)
-   if (TREE_CODE (iter) != TYPE_DECL && !DECL_IGNORED_P (iter))
- debug_hooks->global_decl (iter);
-  timevar_pop (TV_SYMOUT);
-}
  }


What's the replacement mechanism for the first pass on global_decls?  The
comment explains that generating debug info must be delayed in this case.



Ah, I see what you mean.  I'll address this and repost.

Thanks.
Aldy


Re: [C/C++ PATCH] Implement -Wshift-negative-value (PR c/65179)

2015-05-08 Thread Steve Ellcey
On Thu, 2015-05-07 at 21:15 +0200, Marek Polacek wrote:
> On Thu, May 07, 2015 at 12:00:20PM -0600, Jeff Law wrote:
> > OK.  Please install if you haven't already.
> 
> I have not, so will do momentarily.  Thanks,
> 
>   Marek

Marek,

This patch has broken the glibc build.  I am not sure if the problem is
a bug in your patch or a bug in the code used by glibc.  Here is a
cutdown test case from glibc (timezone/scheck.c).  This code compiled
before your patch but now it fails with:

x.c:4:3: error: initializer element is not constant
   ((((time_t) -1) < 0)



__extension__ typedef long int __time_t;
typedef __time_t time_t;
static time_t const time_t_min =
  ((((time_t) -1) < 0)
   ? (time_t) -1 << (8 * sizeof (time_t) - 1)
   : 0)



Steve Ellcey
sell...@imgtec.com



Re: [C/C++ PATCH] Implement -Wshift-negative-value (PR c/65179)

2015-05-08 Thread Markus Trippelsdorf
On 2015.05.08 at 09:38 -0700, Steve Ellcey wrote:
> 
> This patch has broken the glibc build.  I am not sure if the problem is
> a bug in your patch or a bug in the code used by glibc.  Here is a
> cutdown test case from glibc (timezone/scheck.c).  This code compiled
> before your patch but now it fails with:
> 
> x.c:4:3: error: initializer element is not constant
>    ((((time_t) -1) < 0)
> 
> 
> 
> __extension__ typedef long int __time_t;
> typedef __time_t time_t;
> static time_t const time_t_min =
>   ((((time_t) -1) < 0)
>    ? (time_t) -1 << (8 * sizeof (time_t) - 1)
>    : 0)
> 

See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66066

-- 
Markus


RE: [patch 1/28] top-level: Use automake-1.11.6

2015-05-08 Thread Joseph Myers
On Fri, 8 May 2015, Bernd Edlinger wrote:

> One example where there is an incompatibility is "missing":
> 
> Formerly it had code that emulated the missing "flex" by
> creating a dummy lex.yy.c from the hopefully installed
> pre-compiled flex output file.  But the version from the
> trunk does nothing, which breaks all configure scripts
> that used AM_PROG_LEX.  I do assume that the
> automake scripts just use a different way to achieve
> the same goal, if flex is not installed.

It seems like a bug to me that "missing" changed its interface.  However, 
since GCC doesn't use flex in any directory that uses, or is a 
subdirectory of a directory that uses, automake, clearly that change is of 
no relevance to the version of automake used in GCC.  In any case, GCC 
release tarballs should always have timestamps in the right order for 
non-checked-in generated files, and contrib/gcc_update should always be 
used when checking out checked-in generated files to get the timestamps in 
the right order, so no supported case of building GCC should ever get as 
far as trying to use "missing" to regenerate something unless there are 
bugs in the makefiles, gcc_update etc.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: Remove mode argument from gen_rtx_SET

2015-05-08 Thread DJ Delorie

> ; This pattern is identical to the truncsipsi2 pattern except
> ; that it uses a SUBREG instead of a TRUNC.  It is needed in
> ; order to prevent reload from converting (set:SI (SUBREG:PSI (SI)))
> ; into (SET:PSI (PSI)).
> 
> I'm not sure what that's supposed to mean (what's an SI set of a PSI
> subreg?), but I suspect removing the mode would lose information,
> so I left it alone.

MSP430 has 20-bit registers (PSImode-sized).  One register can hold an
HI or PSI sized value, but if you have an SI value it's stored as two
HI registers.

Thus, a PSImode value in a register is *not* just the 20 LSB of an
SImode value.  Also, a PSImode subset of an SI value is stored
differently than a PSImode value on its own.

Thus, consider code like this:

(set (reg:SI 1)
 (subreg:PSI (reg:SI 2)))

(set (reg:PSI 1)
 (reg:PSI 2))

On most architectures, you'd say "these do the same thing" but on
MSP430 they don't.
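
The difference can be illustrated with a toy register-file model.  This is only a sketch: the demo_* names are hypothetical, and the layout (low 16 bits in register n, high 16 bits in register n+1) is an assumption made for the illustration, not something taken from the MSP430 back end.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_REG_MASK 0xFFFFFu           /* each register is 20 bits wide */

typedef struct { uint32_t r[16]; } demo_regs;

/* An SImode value occupies two registers, 16 bits (HImode) in each.  */
static void demo_store_si(demo_regs *f, int n, uint32_t v)
{
    f->r[n]     = v & 0xFFFFu;           /* low half */
    f->r[n + 1] = (v >> 16) & 0xFFFFu;   /* high half */
}

/* (reg:PSI n): whatever 20 bits are held in the single register n.  */
static uint32_t demo_read_psi(const demo_regs *f, int n)
{
    return f->r[n] & DEMO_REG_MASK;
}

/* (subreg:PSI (reg:SI n)): the low 20 bits of the SImode value, which
   have to be gathered from both registers of the pair.  */
static uint32_t demo_subreg_psi_of_si(const demo_regs *f, int n)
{
    uint32_t v = f->r[n] | (f->r[n + 1] << 16);
    return v & DEMO_REG_MASK;
}
```

With 0x000A1234 stored as an SImode value in the pair starting at register 2, reading register 2 as PSImode yields 0x01234, while the PSImode subreg of the SImode value is 0xA1234.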


[PATCH, alpha]: Remove dead (HOST_BITS_PER_WIDE_INT < 64) code

2015-05-08 Thread Uros Bizjak
... to prepare alpha for TARGET_SUPPORTS_WIDE_INT switch.

2015-05-08  Uros Bizjak  

* config/alpha/alpha.c (alpha_emit_set_const_1)
(alpha_emit_set_long_const, alpha_extract_integer)
(alpha_legitimate_constant_p, alpha_split_const_mov)
(alpha_expand_block_clear, alpha_expand_zap_mask, print_operand):
[HOST_BITS_PER_WIDE_INT < 64]: Remove dead code.
* config/alpha/predicates.md (mode_mask_operand): Do not match
const_double RTX.
[HOST_BITS_PER_WIDE_INT < 64]: Remove dead code.
* config/alpha/alpha.md (abstf, *abstf_internal, UNSPEC_ZAP splitter)
[HOST_BITS_PER_WIDE_INT < 64]: Remove dead code.
(*negtf_internal): Use gen_int_mode instead of immed_double_const.

Tested on alphaev68-linux-gnu.  Will commit to mainline SVN in a couple of days.

Uros.
Index: config/alpha/predicates.md
===================================================================
--- config/alpha/predicates.md  (revision 222905)
+++ config/alpha/predicates.md  (working copy)
@@ -110,26 +110,19 @@
 ;; Return 1 if OP is a constant that is a mask of ones of width of an
 ;; integral machine mode not larger than DImode.
 (define_predicate "mode_mask_operand"
-  (match_code "const_int,const_double")
+  (match_code "const_int")
 {
-  if (CONST_INT_P (op))
-{
-  HOST_WIDE_INT value = INTVAL (op);
+  HOST_WIDE_INT value = INTVAL (op);
 
-  if (value == 0xff)
-   return 1;
-  if (value == 0xffff)
-   return 1;
-  if (value == 0xffffffff)
-   return 1;
-  if (value == -1)
-   return 1;
-}
-  else if (HOST_BITS_PER_WIDE_INT == 32 && GET_CODE (op) == CONST_DOUBLE)
-{
-  if (CONST_DOUBLE_LOW (op) == 0xffffffff && CONST_DOUBLE_HIGH (op) == 0)
-   return 1;
-}
+  if (value == 0xff)
+return 1;
+  if (value == 0xffff)
+return 1;
+  if (value == 0xffffffff)
+return 1;
+  if (value == -1)
+return 1;
+
   return 0;
 })
 
Index: config/alpha/alpha.md
===================================================================
--- config/alpha/alpha.md   (revision 222905)
+++ config/alpha/alpha.md   (working copy)
@@ -951,7 +951,7 @@
   [(set (match_operand:DI 0 "register_operand")
(and:DI (match_operand:DI 1 "register_operand")
(match_operand:DI 2 "const_int_operand")))]
-  "HOST_BITS_PER_WIDE_INT == 64 && ! and_operand (operands[2], DImode)"
+  "! and_operand (operands[2], DImode)"
   [(set (match_dup 0) (and:DI (match_dup 1) (match_dup 3)))
(set (match_dup 0) (and:DI (match_dup 0) (match_dup 4)))]
 {
@@ -1509,8 +1509,7 @@
(and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
   (match_operand:DI 2 "mul8_operand" "I"))
(match_operand:DI 3 "immediate_operand" "i")))]
-  "HOST_BITS_PER_WIDE_INT == 64
-   && CONST_INT_P (operands[3])
+  "CONST_INT_P (operands[3])
&& (((unsigned HOST_WIDE_INT) 0xff << INTVAL (operands[2])
 == (unsigned HOST_WIDE_INT) INTVAL (operands[3]))
|| ((unsigned HOST_WIDE_INT) 0xffff << INTVAL (operands[2])
@@ -1518,7 +1517,6 @@
|| ((unsigned HOST_WIDE_INT) 0xffffffff << INTVAL (operands[2])
 == (unsigned HOST_WIDE_INT) INTVAL (operands[3])))"
 {
-#if HOST_BITS_PER_WIDE_INT == 64
   if ((unsigned HOST_WIDE_INT) 0xff << INTVAL (operands[2])
   == (unsigned HOST_WIDE_INT) INTVAL (operands[3]))
 return "insbl %1,%s2,%0";
@@ -1528,7 +1526,7 @@
   if ((unsigned HOST_WIDE_INT) 0xffffffff << INTVAL (operands[2])
   == (unsigned HOST_WIDE_INT) INTVAL (operands[3]))
 return "insll %1,%s2,%0";
-#endif
+
   gcc_unreachable ();
 }
   [(set_attr "type" "shift")])
@@ -1619,13 +1617,7 @@
   (abs:TF (match_operand:TF 1 "reg_or_0_operand")))
  (use (match_dup 2))])]
   "TARGET_HAS_XFLOATING_LIBS"
-{
-#if HOST_BITS_PER_WIDE_INT >= 64
-  operands[2] = force_reg (DImode, GEN_INT ((HOST_WIDE_INT) 1 << 63));
-#else
-  operands[2] = force_reg (DImode, immed_double_const (0, 0x80000000, DImode));
-#endif
-})
+  "operands[2] = force_reg (DImode, GEN_INT (HOST_WIDE_INT_1 << 63));")
 
 (define_insn_and_split "*abstf_internal"
   [(set (match_operand:TF 0 "register_operand" "=r")
@@ -1649,13 +1641,7 @@
   (neg:TF (match_operand:TF 1 "reg_or_0_operand")))
  (use (match_dup 2))])]
   "TARGET_HAS_XFLOATING_LIBS"
-{
-#if HOST_BITS_PER_WIDE_INT >= 64
-  operands[2] = force_reg (DImode, GEN_INT ((HOST_WIDE_INT) 1 << 63));
-#else
-  operands[2] = force_reg (DImode, immed_double_const (0, 0x80000000, DImode));
-#endif
-})
+  "operands[2] = force_reg (DImode, GEN_INT (HOST_WIDE_INT_1 << 63));")
 
 (define_insn_and_split "*negtf_internal"
   [(set (match_operand:TF 0 "register_operand" "=r")
@@ -5440,7 +5426,7 @@
(match_operand:DI 2 "reg_or_8bit_operand")]
   ""
 {
-  rtx mask = immed_double_const (0xffffffff, 0, DImode);
+  rtx mask = gen_int_mode (0xffffffff, DImode);
   emit_insn (gen_mskxl (operands[0], operands[1], mask, operands[2]));
   DONE;
 })
@@ -5542,16 +

Re: [RFA] libiberty/mkstemps.c: Include <time.h> if <sys/time.h> not available.

2015-05-08 Thread DJ Delorie

> * mkstemps.c: #include <time.h> if HAVE_TIME_H is defined
> but not HAVE_SYS_TIME_H.

Ok.


Re: [PATCH 6/13] mips musl support

2015-05-08 Thread Joseph Myers
On Fri, 8 May 2015, Rich Felker wrote:

> On Fri, May 08, 2015 at 03:41:31PM +0100, Szabolcs Nagy wrote:
> > > I.e. as it stands this patch is not OK for backporting to GCC 5
> > > without further discussion.
> > > 
> > > There is also the perspective that we should be able to aim for
> > > an ABI variant agnostic dynamic linker at some point over the next
> > > year by working towards a build that truly uses no float and is
> > > hence compatible with all the ABI variants.
> > 
> > i'm not sure what you mean by 'a build that truly uses no float'
> > 
> > i thought the direction is to have a potentially hard float abi
> > with kernel emulation when the fpu is not present.
> 
> I think Matthew's idea was that the dynamic linker could be agnostic
> since it doesn't need floating point arithmetic itself, then load

Note that however the dynamic linker does properly need to save and 
restore call-clobbered registers used for argument passing (because of 
IFUNCs, user-provided malloc, audit hooks etc. that might affect them even 
if the dynamic linker itself doesn't); see 
.  So any 
floating-point-agnostic dynamic linker would, if fixing the bugs around 
not saving / restoring such registers, need to have runtime-conditional 
code to save and restore them rather than simple compile-time 
conditionals.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Patch, fortran] PR 37131, inline matmul

2015-05-08 Thread H.J. Lu
On Mon, May 4, 2015 at 11:25 PM, Thomas Koenig  wrote:
> Hello world,
>
> this is an update of the matmul inline patch.  The only difference to
> the last version is that it has the ubound simplification taken out.
>
> Any further comments?  OK for trunk?
>
> Thomas
>
> 2015-05-05  Thomas Koenig  
>
> PR fortran/37131
> * gfortran.h (gfc_isym_id):  Add GFC_ISYM_FE_RUNTIME_ERROR.
> (gfc_array_spec):  Add resolved flag.
> (gfc_intrinsic_sym):  Add vararg.
> * intrinsic.h (gfc_check_fe_runtime_error):  Add prototype.
> (gfc_resolve_fe_runtime_error):  Likewise.
> Add prototype for gfc_is_reallocatable_lhs.
> * array.c (gfc_resolve_array_spec):  Do not resolve if it has
> already been resolved.
> * trans-array.h (gfc_is_reallocatable_lhs):  Remove prototype.
> * check.c (gfc_check_fe_runtime_error):  New function.
> * intrinsic.c (add_sym_1p):  New function.
> (make_vararg):  New function.
> (add_subroutines):  Add fe_runtime_error.
> (gfc_intrinsic_sub_interface): Skip sorting for variable number
> of arguments.
> * iresolve.c (gfc_resolve_fe_runtime_error):  New function.
> * lang.opt (inline-matmul-limit):  New option.
> (gfc_post_options): If no inline matmul limit has been set and
> BLAS is called externally, use the BLAS limit.
> * frontend-passes.c:  Include intrinsic.h.
> (var_num):  New global counter for naming temporary variables.
> (matrix_case):  Enum for differentiating the different matmul
> cases.
> (realloc_string_callback):  Add "trim" to the variable name.
> (create_var): Add optional argument vname as part of the name.
> Use var_num. Set dimension of result correctly. Split off block
> creation into
> (insert_block): New function.
> (cfe_expr_0): Use "fcn" as part of temporary variable name.
> (optimize_namespace): Also set gfc_current_ns. Call
> inline_matmul_assign.
> (combine_array_constructor):  Use "constr" as part of
> temporary name.
> (get_array_inq_function):  New function.
> (build_logical_expr):  New function.
> (get_operand):  New function.
> (inline_limit_check):  New function.
> (runtime_error_ne):  New function.
> (matmul_lhs_realloc):  New function.
> (is_function_or_op):  New function.
> (has_function_or_op):  New function.
> (freeze_expr):  New function.
> (freeze_references):  New function.
> (convert_to_index_kind):  New function.
> (create_do_loop):  New function.
> (get_size_m1):  New function.
> (scalarized_expr):  New function.
> (inline_matmul_assign):  New function.
> * simplify.c (simplify_bound):  Simplify the case of the
> lower bound of an assumed-shape argument.
>

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66073


-- 
H.J.


Re: [C/C++ PATCH] Implement -Wshift-negative-value (PR c/65179)

2015-05-08 Thread Joseph Myers
On Fri, 8 May 2015, Steve Ellcey wrote:

> On Thu, 2015-05-07 at 21:15 +0200, Marek Polacek wrote:
> > On Thu, May 07, 2015 at 12:00:20PM -0600, Jeff Law wrote:
> > > OK.  Please install if you haven't already.
> > 
> > I have not, so will do momentarily.  Thanks,
> > 
> > Marek
> 
> Marek,
> 
> This patch has broken the glibc build.  I am not sure if the problem is
> a bug in your patch or a bug in the code used by glibc.  Here is a
> cutdown test case from glibc (timezone/scheck.c).  This code compiled
> before your patch but now it fails with:
> 
> x.c:4:3: error: initializer element is not constant
>    ((((time_t) -1) < 0)
> 
> 
> 
> __extension__ typedef long int __time_t;
> typedef __time_t time_t;
> static time_t const time_t_min =
>   ((((time_t) -1) < 0)
>    ? (time_t) -1 << (8 * sizeof (time_t) - 1)
>    : 0)

Paul, although glibc's copy of parts of tzcode is a bit out of date, it 
looks like the current https://github.com/eggert/tz.git still has the 
problematic code in private.h, relying on left-shifting -1 which has 
undefined behavior in C99/C11 (implementation-defined in C90, as per 
DR#081).
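
One way to get the same constant without shifting a negative value is to build the maximum from a positive shift and derive the minimum from it.  This is only a sketch, not what tzcode or glibc actually does: the DEMO_* macros and demo_time_t are hypothetical names, and the code assumes a two's complement type without padding bits.

```c
#include <assert.h>
#include <limits.h>

typedef long int demo_time_t;            /* stand-in for time_t */

#define DEMO_TYPE_SIGNED(t) (((t) -1) < 0)
#define DEMO_TYPE_BITS(t)   (sizeof (t) * CHAR_BIT)

/* Maximum of a signed type, computed so that no intermediate result
   overflows: (2^(bits-2) - 1) * 2 + 1 == 2^(bits-1) - 1.  */
#define DEMO_SIGNED_MAX(t) \
  ((((t) 1 << (DEMO_TYPE_BITS (t) - 2)) - 1) * 2 + 1)

/* Minimum of a two's complement signed type: -max - 1.  */
#define DEMO_SIGNED_MIN(t) (-DEMO_SIGNED_MAX (t) - 1)

/* Still an integer constant expression, so it is valid as a static
   initializer, and only a positive value is ever shifted.  */
static demo_time_t const demo_time_t_min =
  (DEMO_TYPE_SIGNED (demo_time_t)
   ? DEMO_SIGNED_MIN (demo_time_t)
   : 0);
```

Since demo_time_t is long int here, the result should equal LONG_MIN.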

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 6/13] mips musl support

2015-05-08 Thread Rich Felker
On Fri, May 08, 2015 at 04:50:28PM +0000, Joseph Myers wrote:
> On Fri, 8 May 2015, Rich Felker wrote:
> 
> > On Fri, May 08, 2015 at 03:41:31PM +0100, Szabolcs Nagy wrote:
> > > > I.e. as it stands this patch is not OK for backporting to GCC 5
> > > > without further discussion.
> > > > 
> > > > There is also the perspective that we should be able to aim for
> > > > an ABI variant agnostic dynamic linker at some point over the next
> > > > year by working towards a build that truly uses no float and is
> > > > hence compatible with all the ABI variants.
> > > 
> > > i'm not sure what you mean by 'a build that truly uses no float'
> > > 
> > > i thought the direction is to have a potentially hard float abi
> > > with kernel emulation when the fpu is not present.
> > 
> > I think Matthew's idea was that the dynamic linker could be agnostic
> > since it doesn't need floating point arithmetic itself, then load
> 
> Note that however the dynamic linker does properly need to save and 
> restore call-clobbered registers used for argument passing (because of 
> IFUNCs, user-provided malloc, audit hooks etc. that might affect them even 
> if the dynamic linker itself doesn't); see 
> .  So any 
> floating-point-agnostic dynamic linker would, if fixing the bugs around 
> not saving / restoring such registers, need to have runtime-conditional 
> code to save and restore them rather than simple compile-time 
> conditionals.

FWIW, this also doesn't apply to musl; we don't do lazy binding and
there's no resolver function. The dynamic linker never calls into code
provided by the application except for executing init/fini functions.
IFUNC may be provided at some point but it wouldn't be lazy, so
call-clobbered registers aren't relevant; right now the lack of any
specification for what an IFUNC callback is permitted to do (in the
form of "if you do anything else, the behavior is undefined") is
what's blocking support.

Rich


Re: [RFA] libiberty/mkstemps.c: Include <time.h> if <sys/time.h> not available.

2015-05-08 Thread Joel Brobecker
> > * mkstemps.c: #include <time.h> if HAVE_TIME_H is defined
> > but not HAVE_SYS_TIME_H.
> 
> Ok.

Thank you, DJ. Pushed to both GCC and binutils-gdb.

-- 
Joel


Re: PING^3: [PATCH]: New configure options that make the compiler use -fPIE and -pie as default option

2015-05-08 Thread H.J. Lu
On Thu, May 7, 2015 at 2:17 PM, Joseph Myers  wrote:
> On Fri, 6 Mar 2015, H.J. Lu wrote:
>
>> +# We don't want to compile the compiler with -fPIE, it makes PCH fail.
>> +COMPILER += @NO_PIE_CFLAGS@
>> +
>> +# Link with -no-pie since we compile the compiler with -fno-PIE.
>> +LINKER += @NO_PIE_FLAG@
>
> As I understand it, what we don't want is the compiler to be a PIE.  That
> is, it must be linked -no-pie (and given that the compiler is not a PIE,
> compiling -fPIE would be pointless, although it wouldn't actually break
> things to have PIE objects in the compiler as long as it's linked for a
> fixed address).
>
>> +#if defined ENABLE_DEFAULT_PIE
>> +#define GNU_USER_TARGET_STARTFILE_SPEC \
>> +  "%{!shared: %{pg|p|profile:gcrt1.o%s;: \
>> +%{" PIE_SPEC ":Scrt1.o%s} %{" NO_PIE_SPEC ":crt1.o%s}}} \
>> +   crti.o%s %{static:crtbeginT.o%s;: %{shared:crtbeginS.o%s} \
>> +   %{" PIE_SPEC ":crtbeginS.o%s} \
>> +   %{" NO_PIE_SPEC ":crtbegin.o%s}}" \
>> +   FVTABLE_VERIFY_SPEC
>> +#else
>> +#define GNU_USER_TARGET_STARTFILE_SPEC \
>> +  "%{!shared: %{pg|p|profile:gcrt1.o%s;pie:Scrt1.o%s;:crt1.o%s}} \
>> +   crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s}" \
>> +   FVTABLE_VERIFY_SPEC
>> +#endif
>
> With appropriate definitions of PIE_SPEC and NO_PIE_SPEC, shouldn't a
> single definition of GNU_USER_TARGET_STARTFILE_SPEC be able to work for
> both ENABLE_DEFAULT_PIE and !ENABLE_DEFAULT_PIE?

Yes.

>  noted a
> possible issue with MIPS.  Actually, rather more config/*.h and
> config/*/*.h headers contain specs testing for (-fpie, -fPIE, -fno-pie,
> -fno-PIE, -pie) options, which would be affected by these changes.  I'd
> say this patch should include an initial attempt at adjusting those config
> headers, which should be an essentially mechanical change not requiring
> understanding anything target-specific.  For link-time specs, that may
> mean using PIE_SPEC and NO_PIE_SPEC.  For compile-time specs, similar new
> macros would be added.  Given such adjustments included in the patch and
> the relevant target maintainers CC:ed, I might then be inclined to approve
> the patch on the basis of allowing a week for target maintainers to test
> the changes for their targets before commit, as I don't see any major
> problems with it beyond the need to update the target-specific specs.
>

Here is the updated patch.  I will post patches for cris, mips, powerpc
and sparc separately.  The target maintainers should be able to adjust
backend ASM_SPEC with FPIE_OR_FPIC_SPEC and
NO_FPIE_AND_FPIC_SPEC.
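
From the user side, one rough way to see whether a given compiler defaults to PIE is to look at the predefined macro __PIE__, which GCC defines to 1 for -fpie and 2 for -fPIE.  The helper below is only a sketch; demo_pie_level is a hypothetical name, not something added by the patch.

```c
#include <assert.h>

/* Returns 0 when the translation unit is not compiled as PIE,
   1 for -fpie and 2 for -fPIE, based on the predefined macro.  */
static int demo_pie_level(void)
{
#if defined(__PIE__)
    return __PIE__;
#else
    return 0;
#endif
}
```

With a compiler configured with --enable-default-pie one would expect a nonzero result even without any explicit -fpie/-fPIE on the command line, and 0 again when -fno-PIE is passed.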

OK for trunk?

Thanks.


-- 
H.J.
---
Add --enable-default-pie option to configure GCC to generate PIE by
default.

gcc/

2015-03-06  Magnus Granberg  
   H.J. Lu  

* Makefile.in (COMPILER): Add @NO_PIE_CFLAGS@.
(BUILD_CFLAGS): Likewise.
(BUILD_CXXFLAGS): Likewise.
(LINKER): Add @NO_PIE_FLAG@.
(BUILD_LDFLAGS): Likewise.
(libgcc.mvars): Set NO_PIE_CFLAGS to -fno-PIE for
--enable-default-pie.
* common.opt (fPIE): Initialize to -1.
(fpie): Likewise.
(no-pie): New option.
(pie): Replace "Negative(shared)" with "Negative(no-pie)".
* configure.ac: Add --enable-default-pie.
(NO_PIE_CFLAGS): New.  Check if -fno-PIE works.  AC_SUBST.
(NO_PIE_FLAG): New.  Check if -no-pie works.  AC_SUBST.
* defaults.h (DEFAULT_FLAG_PIE): New.  Default PIE to -fPIE.
* gcc.c (NO_PIE_SPEC): New.
(PIE_SPEC): Likewise.
(NO_FPIE_SPEC): Likewise.
(FPIE_SPEC): Likewise.
(NO_FPIE_AND_FPIC_SPEC): Likewise.
(FPIE_OR_FPIC_SPEC): Likewise.
(LD_PIE_SPEC): Likewise.
(LINK_PIE_SPEC): Handle -no-pie.  Use PIE_SPEC and LD_PIE_SPEC.
* opts.c (DEFAULT_FLAG_PIE): New.  Set to 0 if ENABLE_DEFAULT_PIE
is undefined.
(finish_options): Update opts->x_flag_pie if it is -1.
* config/gnu-user.h (GNU_USER_TARGET_STARTFILE_SPEC): Use
PIE_SPEC and NO_PIE_SPEC if HAVE_LD_PIE is defined.
* doc/install.texi: Document --enable-default-pie.
* doc/invoke.texi: Document -no-pie.
* config.in: Regenerated.
* configure: Likewise.

gcc/ada/

2015-03-06  H.J. Lu  

* gcc-interface/Makefile.in (TOOLS_LIBS): Add @NO_PIE_FLAG@.

libgcc/

2015-03-06  H.J. Lu  

* Makefile.in (CRTSTUFF_CFLAGS): Add $(NO_PIE_CFLAGS).
From 57e2d527af4891a4bf05b57b01d9ac97d336e959 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Fri, 6 Mar 2015 09:07:54 -0800
Subject: [PATCH 1/5] Add --enable-default-pie option to GCC configure


[PATCH 2/5] Use NO_FPIE_AND_FPIC_SPEC in NO_SHARED_SPECS

2015-05-08 Thread H.J. Lu
OK for trunk?

* config/mips/gnu-user.h (NO_SHARED_SPECS): Use
NO_FPIE_AND_FPIC_SPEC.
---
 gcc/config/mips/gnu-user.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/mips/gnu-user.h b/gcc/config/mips/gnu-user.h
index 28b00ed..dd4cf11 100644
--- a/gcc/config/mips/gnu-user.h
+++ b/gcc/config/mips/gnu-user.h
@@ -100,7 +100,7 @@ along with GCC; see the file COPYING3.  If not see
 #ifdef HAVE_AS_NO_SHARED
 /* Default to -mno-shared for non-PIC.  */
 # define NO_SHARED_SPECS \
-  " %{mshared|mno-shared|fpic|fPIC|fpie|fPIE:;:-mno-shared}"
+  " %{mshared|mno-shared:;:%{" NO_FPIE_AND_FPIC_SPEC ":-mno-shared}}"
 #else
 # define NO_SHARED_SPECS ""
 #endif
-- 
1.9.3



[PATCH 3/5] Use FPIE_OR_FPIC_SPEC in CRIS_ASM_SUBTARGET_SPEC

2015-05-08 Thread H.J. Lu
OK for trunk?


H.J.

* config/cris/linux.h (CRIS_ASM_SUBTARGET_SPEC): Use
FPIE_OR_FPIC_SPEC.
---
 gcc/config/cris/linux.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/cris/linux.h b/gcc/config/cris/linux.h
index bd57986..262aac5 100644
--- a/gcc/config/cris/linux.h
+++ b/gcc/config/cris/linux.h
@@ -79,13 +79,13 @@ along with GCC; see the file COPYING3.  If not see
  "--em=criself \
   %{!march=*:%{!mcpu=*:--march=v32}} \
   %{!fleading-underscore:--no-underscore}\
-  %{fPIC|fpic|fPIE|fpie: --pic}"
+  %{" FPIE_OR_FPIC_SPEC ": --pic}"
 #else
 # define CRIS_ASM_SUBTARGET_SPEC \
  "--em=criself \
   %{!march=*:%{!mcpu=*:--march=v10}} \
   %{!fleading-underscore:--no-underscore}\
-  %{fPIC|fpic|fPIE|fpie: --pic}"
+  %{" FPIE_OR_FPIC_SPEC ": --pic}"
 #endif
 
 /* Previously controlled by target_flags.  */
-- 
1.9.3



[PATCH 5/5] Use FPIE_OR_FPIC_SPEC in ASM_SPEC

2015-05-08 Thread H.J. Lu
OK for trunk?

H.J.

* config/sparc/linux.h (ASM_SPEC): Use FPIE_OR_FPIC_SPEC.
* config/sparc/linux64.h (ASM_SPEC): Likewise.
---
 gcc/config/sparc/linux.h   | 2 +-
 gcc/config/sparc/linux64.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/sparc/linux.h b/gcc/config/sparc/linux.h
index 56def4b..17e1e86 100644
--- a/gcc/config/sparc/linux.h
+++ b/gcc/config/sparc/linux.h
@@ -98,7 +98,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 #undef ASM_SPEC
 #define ASM_SPEC "\
 -s \
-%{fpic|fPIC|fpie|fPIE:-K PIC} \
+%{" FPIE_OR_FPIC_SPEC ":-K PIC} \
 %{!.c:%{findirect-dispatch:-K PIC}} \
 %(asm_cpu) %(asm_relax)"
 
diff --git a/gcc/config/sparc/linux64.h b/gcc/config/sparc/linux64.h
index fa805fd..43da848 100644
--- a/gcc/config/sparc/linux64.h
+++ b/gcc/config/sparc/linux64.h
@@ -208,7 +208,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 #undef ASM_SPEC
 #define ASM_SPEC "\
 -s \
-%{fpic|fPIC|fpie|fPIE:-K PIC} \
+%{" FPIE_OR_FPIC_SPEC ":-K PIC} \
 %{!.c:%{findirect-dispatch:-K PIC}} \
 %(asm_cpu) %(asm_arch) %(asm_relax)"
 
-- 
1.9.3



[PATCH 4/5] Use FPIE_OR_FPIC_SPEC in ASM_SPEC

2015-05-08 Thread H.J. Lu
OK for trunk?

H.J.

* config/rs6000/sysv4.h (ASM_SPEC): Use FPIE_OR_FPIC_SPEC.
---
 gcc/config/rs6000/sysv4.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/sysv4.h b/gcc/config/rs6000/sysv4.h
index 9917c2f..1041648 100644
--- a/gcc/config/rs6000/sysv4.h
+++ b/gcc/config/rs6000/sysv4.h
@@ -530,7 +530,7 @@ extern int fixuplabelno;
 #undef ASM_SPEC
 #define ASM_SPEC "%(asm_cpu) \
 %{,assembler|,assembler-with-cpp: %{mregnames} %{mno-regnames}} \
-%{mrelocatable} %{mrelocatable-lib} %{fpic|fpie|fPIC|fPIE:-K PIC} \
+%{mrelocatable} %{mrelocatable-lib} %{" FPIE_OR_FPIC_SPEC ":-K PIC} \
 %{memb|msdata=eabi: -memb}" \
 ENDIAN_SELECT(" -mbig", " -mlittle", DEFAULT_ASM_ENDIAN)
 
-- 
1.9.3



Re: [RFC 0/6] Flags outputs for asms

2015-05-08 Thread H. Peter Anvin
On 05/08/2015 08:54 AM, Richard Henderson wrote:
> 
> Anyway, I'll look into whether the branch around alpha can be optimized, but
> I'd be shocked if I'd be able to do anything about the branch around beta.
> True, there's nothing in between that will clobber the flags so it would be an
> excellent improvement, but combine doesn't work across basic blocks and
> changing that would be a major task.
> 

Either way... optimization is something that can be done gradually.
Once we start using the feature we can figure out where it makes sense
to do further optimizations.

-hpa




Re: [patch 8/10] debug-early merge: Objective-C front-end

2015-05-08 Thread Mike Stump
On May 7, 2015, at 5:38 PM, Aldy Hernandez  wrote:

Ok.


Re: [patch 9/10] debug-early merge: testsuite changes

2015-05-08 Thread Mike Stump
On May 7, 2015, at 5:39 PM, Aldy Hernandez  wrote:

I don’t feel there is anything in there for me to review; I’d like the 
front-end maintainer to review it.







Question about patch for PR bootstrap/65150 (identical functions)

2015-05-08 Thread Steve Ellcey
Jan and Martin,

I just noticed that your patch for PR bootstrap/65150 broke one of the
MIPS tests (gcc.target/mips/branch-1.c).  I can fix the test with no
problem but I am wondering if the change I am seeing with your patch 
is intended or not.

A cutdown version of the test is:

void bar (void);
void f1 (int x) { if (x & 4) bar (); }
void f2 (int x) { if ((x >> 2) & 1) bar (); }

After your change GCC sees that the code for f1 and f2 are identical
so it replaced the body of f2 with a call to f1.  This optimization will
save space but it is not going to be faster because any call to f2 will
now include an extra call/return.  Do other platforms have this same issue
or is there a way to make f2 an alias for f1 on other targets so no extra
call is needed?  I looked around to see if there was a target function or
macro that is used to make one function an alias of another but I didn't
see anything.
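
For illustration, this is roughly what "f2 as an alias for f1" looks like at the source level.  It is only a sketch of the idea, not of what the identical-function pass emits: demo_bar_calls is a made-up counter, and the alias attribute is used only on ELF targets with a GNU-compatible compiler, with an ordinary definition as the fallback elsewhere.

```c
#include <assert.h>

static int demo_bar_calls;               /* hypothetical counter for the demo */

void bar (void) { demo_bar_calls++; }

void f1 (int x) { if (x & 4) bar (); }

#if defined(__GNUC__) && defined(__ELF__)
/* f2 becomes a second name for the code of f1, so calling f2 costs no
   extra call/return.  */
void f2 (int x) __attribute__ ((alias ("f1")));
#else
/* Fallback: an ordinary, separately compiled body.  */
void f2 (int x) { if ((x >> 2) & 1) bar (); }
#endif
```

Either way f1 and f2 behave identically, since (x & 4) and ((x >> 2) & 1) test the same bit; the alias form just avoids both the duplicated body and the wrapper call.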

Steve Ellcey
sell...@imgtec.com


Re: [debug-early] fix -fdump-go-spec

2015-05-08 Thread Ian Lance Taylor
On Wed, Apr 29, 2015 at 5:56 PM, Aldy Hernandez  wrote:
>
> Despite what Go thinks:
>
>   /* The debug hooks are used to implement -fdump-go-spec because it
>  gives a simple and stable API for all the information we need to
>  dump.  */
>
> ...the debug hooks are not stable... :).

Alas.


> With this patch I have done my best to give Go what it wants without
> recreating what the front-ends were doing.  I've made the go_decl() call
> work from within the early_global_decl() hook which gets called as we parse
> (rest_of_decl_compilation).  As far as I understand, this hack is a one-time
> thing for use internally in the build process, so we don't care whether
> go_decl() will receive location information??

That is true: the goal is to output the declarations in Go syntax;
location information is irrelevant.


> Is there not a more modern way of Go getting the DECLs it needs without
> abusing the debug_hook machinery?

I don't know.

Thanks for working on this.  Have you tried building with
--enable-languages=go?  On a GNU/Linux system it should build and pass
all tests with no extra effort.

Ian


Re: [patch 5/10] debug-early merge: Go front-end

2015-05-08 Thread Ian Lance Taylor
This is fine if it works.  Thanks.

Ian

On Thu, May 7, 2015 at 5:36 PM, Aldy Hernandez  wrote:
>


Re: [PATCH 1/6] Only resolve_asm_operand_names once

2015-05-08 Thread Jeff Law

On 05/07/2015 03:38 PM, Richard Henderson wrote:

We do it in the front end already; no need to repeat.
---
  gcc/cfgexpand.c | 2 --
  gcc/stmt.c  | 7 ---
  2 files changed, 4 insertions(+), 5 deletions(-)

Any reason this shouldn't go into the tree immediately?  Seems like it 
stands on its own.


jeff



Re: [PATCH 3/6] Canonicalize asm volatility earlier

2015-05-08 Thread Jeff Law

On 05/07/2015 03:38 PM, Richard Henderson wrote:

If gimple_asm_volatile_p is correct, no point re-checking.
This is also done by the C and C++ front ends, but not Ada.
So we can't yet trust ASM_VOLATILE_P from the front end.
---
  gcc/cfgexpand.c | 11 +++
  gcc/gimplify.c  |  2 +-
  2 files changed, 4 insertions(+), 9 deletions(-)

Also seems like it ought to be able to go forward independently now 
rather than waiting.


jeff



Re: [PATCH 5/6] i386: Add CCPmode

2015-05-08 Thread Jeff Law

On 05/07/2015 03:38 PM, Richard Henderson wrote:

For testing parity coming out of asm flags.
---
  gcc/config/i386/i386-modes.def |  2 ++
  gcc/config/i386/i386.c | 19 +++
  2 files changed, 13 insertions(+), 8 deletions(-)

Seems like it ought to move forward now.

Oh yea, I guess you should consider my prior messages as reviews with 
the following note.


Need ChangeLog entry and testing.  With those good for the trunk.

jeff



Re: update docs for --enable-languages

2015-05-08 Thread Jeff Law

On 05/07/2015 04:15 PM, Jim Wilson wrote:

ping

https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01690.html

OK.  Must have missed it when it flew by -- sorry.

jeff


