date:20151103

Re: [v3 PATCH] Make the default constructors of tuple and pair conditionally explicit.

2015-11-03 Thread Paolo Carlini


Hi,

On 11/03/2015 06:01 AM, Ville Voutilainen wrote:

On 2 November 2015 at 23:07, Paolo Carlini  wrote:

Great, thanks a lot. Thinking more about this detail, I wonder if we should
therefore apply the below too? Anything I'm missing?

Tested again on Linux-PPC64. Ok for trunk?

Go ahead!

Paolo.

[PATCH, testsuite]: Move x86 specific tests to gcc.target/i386 directory

2015-11-03 Thread Uros Bizjak

... and add ifunc effective target requirement where necessary.

2015-11-03  Uros Bizjak  

* gcc.dg/mvc1.c: Move to ...
* gcc.target/i386/mvc1.c: ... here.  Require ifunc.
* gcc.dg/mvc2.c: Move to ...
* gcc.target/i386/mvc2.c: ... here.
* gcc.dg/mvc3.c: Move to ...
* gcc.target/i386/mvc3.c: ... here.
* gcc.dg/mvc4.c: Move to ...
* gcc.target/i386/mvc4.c: ... here.  Require ifunc.
* gcc.dg/mvc5.c: Move to ...
* gcc.target/i386/mvc5.c: ... here.
* gcc.dg/mvc6.c: Move to ...
* gcc.target/i386/mvc6.c: ... here.
* gcc.dg/mvc7.c: Move to ...
* gcc.target/i386/mvc7.c: ... here.

* g++.dg/ext/mvc1.C: Require ifunc.

Tested on x86_64-linux-gnu {,-m32} CentOS 5.11 (no ifunc
functionality) and committed to mainline SVN.

Uros.
Index: gcc.target/i386/mvc3.c
===
--- gcc.target/i386/mvc3.c  (revision 229652)
+++ gcc.target/i386/mvc3.c  (working copy)
@@ -1,4 +1,4 @@
-/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-do compile } */
 
 __attribute__((target_clones("avx","arch=slm","arch=core-avx2")))
 int foo (); /* { dg-error "default target was not set" } */
Index: gcc.target/i386/mvc4.c
===
--- gcc.target/i386/mvc4.c  (revision 229652)
+++ gcc.target/i386/mvc4.c  (working copy)
@@ -1,4 +1,5 @@
-/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-do run } */
+/* { dg-require-ifunc "" } */
 
 __attribute__((target_clones("default","avx","default")))
 int
Index: gcc.target/i386/mvc5.c
===
--- gcc.target/i386/mvc5.c  (revision 229652)
+++ gcc.target/i386/mvc5.c  (working copy)
@@ -1,4 +1,4 @@
-/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-do compile } */
 /* { dg-options "-fno-inline" } */
 /* { dg-final { scan-assembler-times "foo.ifunc" 6 } } */
 
Index: gcc.target/i386/mvc6.c
===
--- gcc.target/i386/mvc6.c  (revision 229652)
+++ gcc.target/i386/mvc6.c  (working copy)
@@ -1,4 +1,4 @@
-/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-do compile } */
 /* { dg-options "-O3" } */
 /* { dg-final { scan-assembler "vpshufb" } } */
 /* { dg-final { scan-assembler "punpcklbw" } } */
Index: gcc.target/i386/mvc7.c
===
--- gcc.target/i386/mvc7.c  (revision 229652)
+++ gcc.target/i386/mvc7.c  (working copy)
@@ -1,4 +1,4 @@
-/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-do compile } */
 /* { dg-final { scan-assembler-times "foo.ifunc" 4 } } */
 
 __attribute__((target_clones("avx","default","arch=slm","arch=core-avx2")))
Index: gcc.target/i386/mvc1.c
===
--- gcc.target/i386/mvc1.c  (revision 229652)
+++ gcc.target/i386/mvc1.c  (working copy)
@@ -1,4 +1,5 @@
-/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-do run } */
+/* { dg-require-ifunc "" } */
 
 __attribute__((target_clones("avx","arch=slm","arch=core-avx2","default")))
 int
Index: gcc.target/i386/mvc2.c
===
--- gcc.target/i386/mvc2.c  (revision 229652)
+++ gcc.target/i386/mvc2.c  (working copy)
@@ -1,4 +1,4 @@
-/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-do compile } */
 
 __attribute__((target_clones("avx","arch=slm","arch=core-avx2")))
 int foo ();
Index: gcc.dg/mvc3.c
===
--- gcc.dg/mvc3.c   (revision 229652)
+++ gcc.dg/mvc3.c   (working copy)
@@ -1,10 +0,0 @@
-/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-
-__attribute__((target_clones("avx","arch=slm","arch=core-avx2")))
-int foo (); /* { dg-error "default target was not set" } */
-
-int
-bar ()
-{
-  return foo();
-}
Index: gcc.dg/mvc7.c
===
--- gcc.dg/mvc7.c   (revision 229652)
+++ gcc.dg/mvc7.c   (working copy)
@@ -1,10 +0,0 @@
-/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-final { scan-assembler-times "foo.ifunc" 4 } } */
-
-__attribute__((target_clones("avx","default","arch=slm","arch=core-avx2")))
-int foo ();
-
-int main()
-{
-  return foo();
-}
Index: gcc.dg/mvc4.c
===
--- gcc.dg/mvc4.c   (revision 229652)
+++ gcc.dg/mvc4.c   (working copy)
@@ -1,26 +0,0 @@
-/* { dg-do run { target i?86-*-* x86_64-*-* } } */
-
-__attribute__((target_clones("default","avx","default")))
-int
-foo ()
-{
-  return -2;
-}
-
-int
-bar ()
-{
-  return 2;
-}
-
-int
-main ()
-{
-  int r = 0;
-  r += bar ();
-  r += foo ();
-  r += bar ();
-  r += foo ();
-  r += bar ();
-  return r - 2;
-}
Index: gcc.dg/mvc1.c
==

Re: Add VIEW_CONVERT_EXPR to operand_equal_p

2015-11-03 Thread Eric Botcazou

> This fails on ia64.

gnat.dg/discr44.adb and gnat.dg/discr45.adb are supposed to fail everywhere 
(and there are also a few ACATS failures everywhere).  Sorry for the awkward 
situation but they are testcases exposing the recent type system breakage.

-- 
Eric Botcazou

Re: [ping] Fix PR debug/66728

2015-11-03 Thread Richard Sandiford

Mike Stump  writes:
> On Nov 2, 2015, at 12:55 PM, Richard Sandiford
>  wrote:
>> This was:
>> 
>>  ... Sometimes structure decls
>>  have BLKmode but are assigned an integer-mode rtl (e.g. when passing
>>  3-byte structures by value to functions).
>>  [...]
>>  loc_descriptor refuses to use CONST_INT for BLKmode decls (which aren't
>>  actually integers at the source level).  That seems like the right
>>  behaviour
>
> I’ll plead ignorance here, but why do you think that?  The dwarf standard 
> says:
>
> There are six forms of constants. There are fixed length constant data
> forms for one, two, four and eight byte values (respectively,
> DW_FORM_data1, DW_FORM_data2, DW_FORM_data4, and DW_FORM_data8). There
> are also variable length constant data forms encoded using LEB128
> numbers (see below). Both signed (DW_FORM_sdata) and unsigned
> (DW_FORM_udata) variable length constants are available.
>
> The data in DW_FORM_data1, DW_FORM_data2, DW_FORM_data4 and
> DW_FORM_data8 can be anything. Depending on context, it may be a signed
> integer, an unsigned integer, a floating-point constant, or anything
> else. A consumer must use context to know how to interpret the bits,
> which if they are target machine data (such as an integer or floating
> point constant) will be in target machine byte-order.
>
> Certainly supplying the known byte values of a constant is preferable to
> throwing up our hands and saying, I know, but I’m not telling.  Given
> the text above, it seems like these forms can be used for content where
> the compiler knows the values of the bits that comprise the content.
> I’d ask, is the backing of your position supported by the dwarf
> standard?  If yes, what part?
>
> I think you think that this describes the type, these do not.  There is
> a separate system to describe the type.  For example, DW_ATE_UTF
> describes the bytes as forming a UTF value.  A wide int (or a CONST_INT)
> can be used to describe a unicode character, and it would have a
> DW_ATE_UTF encoding on it for the debugger to use to formulate an idea
> of how to display those bytes.  Further, a mythical front end could have
> a 3 byte unicode character, and these can be modeless as there is no 3
> byte machine mode for them.  Code-gen would be BLKmode, the type would
> be DW_ATE_UTF, and one could form constants with CONST_INT.  In a 152
> bit UTF character in that front end, CONST_INT, generally speaking,
> isn’t big enough, so a CONST_WIDE_INT would be formed.  The argument is
> the same.  That a machine has a native 3 byte type or not, is of no
> consequence, so _any_ decision based upon the mode in this way is
> flawed.

This isn't just an argument about the DWARF standard though.  It's an
argument about GCC internals.  Presumably these hypothetical BLKmode
types would need to support addition, but plus:BLK is not well formed,
and wouldn't distinguish between your 3-byte and 152-bit cases.  I don't
think const_int and const_wide_int are logically different.  There's the
historical decision that const_int doesn't have a stored mode, but I
don't think that was because we wanted to support const_ints that are
conceptually BLKmode.

I think from an rtl perspective the only sensible way for frontends to
cope with integers whose size doesn't match an rtl mode is to promote
to the next widest mode, which is what the stor-layout.c code I quoted
does.  Obviously if your 3 byte type is actually 3 bytes in memory rather
than 4, and no 3-byte mode is available, you can't just load and store
the value using a normal rtl move.  You have to use bitfield extraction
and insertion instead.
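
As a rough illustration of that last point, here is a minimal C sketch of
handling a value that occupies exactly 3 bytes in memory by widening it to
a 32-bit integer. The helper names (load24, store24, demo24) and the
little-endian layout are assumptions made for the example; this is not GCC
code and not what stor-layout.c emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not GCC code: read a 3-byte little-endian value
   and widen it to the next wider integer type.  */
uint32_t
load24 (const unsigned char *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8) | ((uint32_t) p[2] << 16);
}

/* Hypothetical helper: write back only the low 3 bytes, leaving the
   neighbouring byte untouched.  */
void
store24 (unsigned char *p, uint32_t v)
{
  p[0] = v & 0xff;
  p[1] = (v >> 8) & 0xff;
  p[2] = (v >> 16) & 0xff;
}

/* Increment a 3-byte value in place; returns 1 if the fourth byte of
   the buffer survived the store.  */
int
demo24 (void)
{
  unsigned char buf[4] = { 0x11, 0x22, 0x33, 0xee };
  uint32_t v = load24 (buf);   /* 0x332211 */
  store24 (buf, v + 1);        /* 0x332212 */
  return buf[0] == 0x12 && buf[3] == 0xee;
}
```

A plain 4-byte load or store in demo24 would also touch buf[3], which is
why the narrower access has to be assembled from pieces.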

I picked this PR up because it was wide-int-related, even though
(as is probably all too obvious from this thread) I'm not familiar
with the dwarf2out.c code.  It's actually your commit that I'm trying
to fix here (r201707).  Would you mind taking the PR over and handling
it the way you think it should be handled?

Thanks,
Richard

[gomp4, committed] Remove shadowing declaration in oacc_entry_exit_ok_1

2015-11-03 Thread Tom de Vries

[ was: Re: [committed, gomp4, 2/3] Handle sequential code in kernels 
region ]


On 12/10/15 19:26, Tom de Vries wrote:

On 12/10/15 19:12, Tom de Vries wrote:

Hi,

I've committed the following patch series.

  1  Add get_bbs_in_oacc_kernels_region
  2  Handle sequential code in kernels region
  3  Handle sequential code in kernels region - Testcases

The patch series adds detection of whether sequential code (that is,
code in the oacc kernels region before and after the loop that is to be
parallelized), is safe to execute in parallel.

Bootstrapped and reg-tested on x86_64.

I'll post the patches individually, in reply to this email.


This patch checks in parloops, for each non-loop stmt in the oacc
kernels region, that it's not a load aliasing with a store anywhere in
the region, and vice versa.

The exceptions are loads and stores for reductions, which are later
transformed into an atomic update.



I ran into an ICE in oacc kernels testcases when doing a non-bootstrap 
build and test. The ICE was caused by an uninitialized variable, which 
was uninitialized because the intended initialization was absorbed by a 
shadowing variable declaration.


This patch removes the shadowing declaration.
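
For readers unfamiliar with this failure mode, here is a small C sketch of
the general pattern. The names (init_ref, with_shadow, without_shadow) are
made up for the illustration and are not taken from tree-parloops.c; the
outer variable gets a sentinel value so that the sketch stays well defined,
whereas in the real code it was simply left uninitialized.

```c
#include <assert.h>

/* Hypothetical stand-in for an initializer such as ao_ref_init.  */
void
init_ref (int *out, int value)
{
  *out = value;
}

/* Buggy shape: the inner declaration shadows the outer 'ref', so the
   initialization never reaches the variable read after the block.  */
int
with_shadow (int is_store)
{
  int ref = -1;   /* Sentinel; stands in for "uninitialized".  */
  if (is_store)
    {
      int ref;    /* Shadows the outer 'ref'.  */
      init_ref (&ref, 42);
      (void) ref;
    }
  return ref;     /* The outer 'ref' was never written.  */
}

/* Fixed shape: without the inner declaration the outer 'ref' is set.  */
int
without_shadow (int is_store)
{
  int ref = -1;
  if (is_store)
    init_ref (&ref, 42);
  return ref;
}
```

Compiling with -Wshadow should flag the inner declaration in with_shadow.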

Committed to gomp-4_0-branch.

Thanks,
- Tom
Remove shadowing declaration in oacc_entry_exit_ok_1

2015-11-03  Tom de Vries  

	* tree-parloops.c (oacc_entry_exit_ok_1): Remove shadowing declaration
	of ref.
---
 gcc/tree-parloops.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index a144f2d..f14cf8a 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2976,7 +2976,6 @@ oacc_entry_exit_ok_1 (bitmap in_loop_bbs, vec region_bbs,
 	}
 	  else if (gimple_store_p (stmt))
 	{
-	  ao_ref ref;
 	  ao_ref_init (&ref, gimple_assign_lhs (stmt));
 	  ref_is_store = true;
 	}
-- 
1.9.1

Re: [PATCH][RTL-ifcvt] PR rtl-optimization/67749: Do not emit separate SET insn in IF-ELSE case

2015-11-03 Thread Kyrill Tkachov


Hi Jeff,

On 02/11/15 22:46, Jeff Law wrote:

On 10/27/2015 08:49 AM, Kyrill Tkachov wrote:

Hi all,

This patch fixes the gcc.dg/ifcvt-2.c test for x86_64 where we were
failing to if-convert. This was because my patch at
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=228194
tried to emit a SET to move the source of insn_a or insn_b (that came
from the test block) into a temporary. A SET, however, is not always
enough. For example, on x86_64 in order for the resulting insn to be
recognised it frequently needs to be in PARALLEL with a (clobber
(reg:CC FLAGS_REG)). This leads to the insn not being recognised.

I don't think it affects the approach you've chosen, but I'm mentioning it for 
future reference.

gcse.c has some helper code (that probably ought to move into a more generic 
file) that will test for this situation.  Search for the instances of recog.  
It essentially does something like

emit_insn (gen_rtx_SET (...))

And tries to recognize the result to determine if it's valid.


you mean compute_can_copy and can_copy_p? I was not aware of those. 
Interesting, they look like they
can be useful in places indeed. I'll keep them in mind for any future work.






So this patch removes that SET and instead generates a couple of
temporary pseudos that get passed on a bit later to the code that
loads the operands into registers when they're not general_operand.
This way we just modify the existing (recognisable) sets, allowing us
to if-convert the testcase.

That sounds much more reasonable, assuming that the original destinations were 
just used once and those uses are guaranteed to be going away.





Bootstrapped and tested on x86_64, arm, aarch64.

Ok for trunk?


What happens in the case where noce_emit_bb returns a failure? We've modified 
the original insns to use the new pseudos.  Won't this result in the original 
pseudo's uses using undefined values?


We should be fine because we don't modify the original insns. We create a copy 
of them i.e.
rtx_insn *copy_of_a = as_a <rtx_insn *> (copy_rtx (insn_a));

and modify the SET_DEST of that. The original insn should still remain intact 
if any step in
noce_try_cmove_arith fails, so we can revert back to the original sequence.
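
The copy-then-modify idea can be sketched in plain C as follows. This is
only an analogy under stated assumptions: struct mini_insn and
try_retarget are hypothetical names invented for the example, and the
real code works on RTL insns inside noce_try_cmove_arith rather than on a
struct like this.

```c
#include <assert.h>

/* Hypothetical miniature of an insn: a destination and a source.  */
struct mini_insn
{
  int dest;
  int src;
};

/* Retarget a copy of *orig to NEW_DEST.  On success the modified copy is
   written to *out and 1 is returned.  On failure 0 is returned and *out
   is left alone.  *orig is never modified, so the caller can always fall
   back to the original sequence.  */
int
try_retarget (const struct mini_insn *orig, int new_dest, int step_ok,
              struct mini_insn *out)
{
  struct mini_insn copy = *orig;   /* Plays the role of copy_rtx.  */
  copy.dest = new_dest;            /* Plays the role of changing SET_DEST.  */
  if (!step_ok)
    return 0;                      /* A later step failed: give up.  */
  *out = copy;
  return 1;
}
```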

Thanks,
Kyrill



jeff

Re: [PATCH] Fix warning in tree-diagnostic.h.

2015-11-03 Thread Dominik Vogt

On Mon, Nov 02, 2015 at 09:57:22AM -0700, Jeff Law wrote:
> On 11/02/2015 06:26 AM, Dominik Vogt wrote:
> >The attached patch fixes the annoying warnings generated by
> >diagnostic_set_last_function.
> Can you point out what warning you're fixing?

Sure.  toplev.c calls diagnostic_set_last_function with a NULL
pointer as the second argument, and cp/typeck.c complains about
it.  Ah, it seems the *current* version of GCC does not generate
the warning, but the old one (4.8.5) I used to compile does.  So,
it may or may not be worth applying the patch.

-- snip --
unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual 
-Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long \
-Wno-variadic-macros -Wno-overlength-strings -DHAVE_CONFIG_H -I. -I. 
-I../../gcc -I../../gcc/. -I../../gcc/../include -I../../gcc/../libc\
pp/include -I../../gcc/../libdecnumber -I../../gcc/../libdecnumber/dpd 
-I../libdecnumber -I../../gcc/../libbacktrace -o trans-mem.o -MT \
trans-mem.o -MMD -MP -MF ./.deps/trans-mem.TPo ../../gcc/trans-mem.c
In file included from ../../gcc/toplev.c:42:0:
../../gcc/toplev.c: In function ‘void announce_function(tree)’:
../../gcc/diagnostic.h:202:50: warning: invalid access to non-static data 
member ‘diagnostic_info::x_data’ of NULL object \
[-Winvalid-offsetof]
#define diagnostic_info_auxiliary_data(DI) (DI)->x_data
^
../../gcc/tree-diagnostic.h:28:11: note: in expansion of macro 
‘diagnostic_info_auxiliary_data’
((tree) diagnostic_info_auxiliary_data (DI))
^
../../gcc/tree-diagnostic.h:47:17: note: in expansion of macro 
‘diagnostic_abstract_origin’
= (((DI) && diagnostic_abstract_origin (DI)) \
^
../../gcc/toplev.c:233:7: note: in expansion of macro 
‘diagnostic_set_last_function’
diagnostic_set_last_function (global_dc, (diagnostic_info *) NULL);
^
../../gcc/diagnostic.h:202:50: warning: (perhaps the ‘offsetof’ 
macro was used incorrectly) [-Winvalid-offsetof]
#define diagnostic_info_auxiliary_data(DI) (DI)->x_data
^
../../gcc/tree-diagnostic.h:28:11: note: in expansion of macro 
‘diagnostic_info_auxiliary_data’
((tree) diagnostic_info_auxiliary_data (DI))
^
../../gcc/tree-diagnostic.h:47:17: note: in expansion of macro 
‘diagnostic_abstract_origin’
= (((DI) && diagnostic_abstract_origin (DI)) \
^
../../gcc/toplev.c:233:7: note: in expansion of macro 
‘diagnostic_set_last_function’
diagnostic_set_last_function (global_dc, (diagnostic_info *) NULL);
^
../../gcc/diagnostic.h:202:50: warning: invalid access to non-static data 
member ‘diagnostic_info::x_data’ of NULL object \
[-Winvalid-offsetof]
#define diagnostic_info_auxiliary_data(DI) (DI)->x_data
-- snip --

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

Re: [RFC] Combine vectorized loops with its scalar remainder.

2015-11-03 Thread Richard Henderson


On 10/28/2015 11:45 AM, Yuri Rumyantsev wrote:

Hi All,

Here is a preliminary patch to combine a vectorized loop with its scalar
remainder, a draft of which was proposed by Kirill Yukhin a month ago:
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01435.html
It was tested with the '-mavx2' option to run on a Haswell processor.
The main goal of it is to improve performance of vectorized loops for AVX512.


Ought this really be enabled for avx2?  While it's nice for testing to be able 
to use normal vcond patterns to be able to test with current hardware, I have 
trouble imagining that it's an improvement without the real masked operations.


I tried to have a look myself at what kind of output we'd be getting, but the 
very first test I tried produced an ICE:


void foo(float *a, float *b, int n)
{
  int i;
  for (i = 0; i < n; ++i)
    a[i] += b[i];
}

$ ./cc1 -O3 -mavx2 z.c
 foo
Analyzing compilation unit
Performing interprocedural optimizations
 <*free_lang_data>
  
 
Assembling functions:

  foo
z.c: In function ‘foo’:
z.c:1:6: error: bogus comparison result type
 void foo(float *a, float *b, int n)
  ^
vector(8) signed int
vect_vec_mask_.24_116 = vect_vec_iv_.22_112 < vect_cst_.23_115;
z.c:1:6: internal compiler error: verify_gimple failed
0xb20d17 verify_gimple_in_cfg(function*, bool)
../../git-master/gcc/tree-cfg.c:5082
0xa16d77 execute_function_todo
../../git-master/gcc/passes.c:1940
0xa1769b execute_todo
../../git-master/gcc/passes.c:1995
Please submit a full bug report,


r~

Re: [PATCH] tree-scalar-evolution.c: Handle LSHIFT by constant

2015-11-03 Thread Alan Lawrence

On 27/10/15 22:27, H.J. Lu wrote:
>
> It caused:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68112

Bah :(.

So yes, in the general case, we can't rewrite (a << 1) to (a * 2) as for signed
types (0x7f...f) << 1 == -2 whereas (0x7f...f * 2) is undefined behaviour.
Oh well :(...

I don't have a really good fix for this. The best way I can see would be to try
to make definedness of overflow a property of either the type, or maybe of the
chrec, and settable on a finer granularity than at present, rather than
TYPE_OVERFLOW_UNDEFINED = (type is signed) && !(a bunch of global flags).
However, I don't think I'm going to have time for that patch before end of
stage 1.

So, I've reverted my r229437. There is a simpler fix: to only apply the rewrite
for unsigned types. I attach that patch, which I've bootstrapped on x86; but
although I think this way is correct, I'm not really sure whether this is
something that should go in. Thoughts?
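
To illustrate why restricting the rewrite to unsigned types is safe, here
is a small C sketch. The function name shift_matches_multiply is made up
for the example. It relies only on the C rule that unsigned arithmetic
wraps modulo 2^N, so both forms are well defined for every input.

```c
#include <assert.h>
#include <limits.h>

/* For unsigned operands both the shift and the multiplication wrap
   modulo 2^N, so they agree for every input, including UINT_MAX.
   Returns 1 when the two forms produce the same value.  */
int
shift_matches_multiply (unsigned a)
{
  return (a << 1) == (a * 2u);
}
```

No signed counterpart is shown on purpose: evaluating INT_MAX * 2 is
undefined behaviour, which is exactly why the signed rewrite is unsafe.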

--Alan
---
 gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c | 33 
 gcc/tree-scalar-evolution.c  | 19 ++
 2 files changed, 52 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c 
b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
new file mode 100644
index 000..40e6561
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
@@ -0,0 +1,33 @@
+/* PR tree-optimization/65963.  */
+#include "tree-vect.h"
+
+#define N 512
+
+int in[2*N], out[N];
+
+__attribute__ ((noinline)) void
+loop (void)
+{
+  for (unsigned i = 0; i < N; i++)
+out[i] = in[i << 1] + 7;
+}
+
+int
+main (int argc, char **argv)
+{
+  check_vect ();
+  for (int i = 0; i < 2*N; i++)
+{
+  in[i] = i;
+  __asm__ volatile ("" : : : "memory");
+}
+  loop ();
+  __asm__ volatile ("" : : : "memory");
+  for (int i = 0; i < N; i++)
+{
+  if (out[i] != i*2 + 7)
+   abort ();
+}
+  return 0;
+}
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" 
{ target { vect_strided2 } } } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 0753bf3..d8f3d46 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -1840,6 +1840,25 @@ interpret_rhs_expr (struct loop *loop, gimple *at_stmt,
   res = chrec_fold_multiply (type, chrec1, chrec2);
   break;
 
+case LSHIFT_EXPR:
+  if (!TYPE_OVERFLOW_UNDEFINED (type))
+   {
+ /* Handle A<

Re: [PATCH] clarify documentation of -Q --help=optimizers

2015-11-03 Thread Alexander Monakov

On Thu, 22 Oct 2015, Martin Sebor wrote:

> [Sending to the right list this time]
> 
> The documentation of the -Q --help=optimizers options leads some
> to expect that options reported as enabled imply that the
> corresponding optimization will take place.  (See the following
> question on gcc-help:
> https://gcc.gnu.org/ml/gcc-help/2015-10/msg00133.html)
> 
> The patch below tries to make it clear that that's not always
> the case.

Hi,

The issue is due to optimization passes being skipped at -O0, and yet
corresponding optimization options not explicitly disabled.  The effect of -O
is an old source of confusion, and now the intro to "Optimization Options"
says,

Most optimizations are only enabled if an -O level is set on the command
line.  Otherwise they are disabled, even if individual optimization flags
are specified.

(added with this patch:
https://gcc.gnu.org/ml/gcc-patches/2009-10/msg00739.html )

As we observe, it's not visible enough, and I'm not sure saying that again in
the documentation (in a different section) is a good way to go.  Maybe we'd
warn for attempts to enable optimizations at -O0, but that's not trivial.
Perhaps go with Richard's suggestion in the end of this mail ("Thus, at the
end of --help-optimizers print ...")?
https://gcc.gnu.org/ml/gcc-patches/2012-05/msg00113.html

Thanks.
Alexander

Re: [PATCH][ARM] Fix costing of vmul+vcvt combine pattern

2015-11-03 Thread Kyrill Tkachov


Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02898.html

Thanks,
Kyrill

On 27/10/15 13:55, Kyrill Tkachov wrote:

Hi all,

This patch allows us to handle the *combine_vcvtf2i pattern in rtx costs by 
properly identifying it
as a toint conversion. Before this I saw a pattern like:
(set (reg/i:SI 0 r0)
(fix:SI (fix:SF (mult:SF (reg:SF 16 s0 [ a ])
(const_double:SF 3.2e+1 [0x0.8p+6])

being assigned a cost of 40 because the costs blindly recursed into the 
operands.
With this patch for -mcpu=cortex-a57 I see it being assigned a cost of 4.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill

2015-10-27  Kyrylo Tkachov  

* config/arm/arm.c (arm_new_rtx_costs, FIX case): Handle
combine_vcvtf2i pattern.

Re: Add VIEW_CONVERT_EXPR to operand_equal_p

2015-11-03 Thread Eric Botcazou

> I suggest to re-instantiate the canonical type checks for the aggregate type
> case.

OK, thanks, this fixes all the known ICEs so far.

Tested on x86_64-suse-linux, OK for the mainline?


2015-11-03  Eric Botcazou  

* gimple-expr.c (useless_type_conversion_p): Reinstate type canonical
check for aggregate types and beef up comment for mode check.


2015-11-03  Eric Botcazou  

* gnat.dg/discr45.adb: Only compile the test.

-- 
Eric Botcazou
Index: gimple-expr.c
===
--- gimple-expr.c	(revision 229616)
+++ gimple-expr.c	(working copy)
@@ -84,7 +84,15 @@ useless_type_conversion_p (tree outer_ty
   if (inner_type == outer_type)
 return true;
 
-  /* Changes in machine mode are never useless conversions unless.  */
+  /* For aggregate types, if we know the canonical types, compare them.  This
+ is important for the remapping of variably-modified types.  */
+  if (AGGREGATE_TYPE_P (inner_type)
+  && TYPE_CANONICAL (inner_type)
+  && TYPE_CANONICAL (inner_type) == TYPE_CANONICAL (outer_type))
+return true;
+
+  /* Changes in machine mode are never useless conversions because the RTL
+ middle-end expects explicit conversions between modes.  */
   if (TYPE_MODE (inner_type) != TYPE_MODE (outer_type))
 return false;
 
Index: testsuite/gnat.dg/discr45.adb
===
--- testsuite/gnat.dg/discr45.adb	(revision 229630)
+++ testsuite/gnat.dg/discr45.adb	(working copy)
@@ -1,4 +1,4 @@
--- { dg-do run }
+-- { dg-do compile }
 -- { dg-options "-O2 -gnatws" }
 
 procedure Discr45 is

Re: [PATCH 1/6]tree-ssa-dom.c: Normalize exprs, starting with ARRAY_REF to MEM_REF

2015-11-03 Thread Alan Lawrence

On 30/10/15 05:35, Jeff Law wrote:
> On 10/29/2015 01:18 PM, Alan Lawrence wrote:
>> This patch just teaches DOM that ARRAY_REFs can be equivalent to MEM_REFs 
>> (with
>> pointer type to the array element type).
>>
>> gcc/ChangeLog:
>>
>> * tree-ssa-dom.c (dom_normalize_single_rhs): New.
>> (dom_normalize_gimple_stmt): New.
>> (lookup_avail_expr): Call dom_normalize_gimple_stmt.
> Has this been tested?  Do you have tests where it can be shown to make a
> difference independent of the changes to tree-sra.c?
>
> The implementation looks fine, I just want to have some basic tests in the
> testsuite that show the benefit of this normalization.

I'll look at the implementation and Richard's comments soon, but as to tests -
well I had tried before and not had much luck but OK you make me try harder ;).

I'll put these in with the appropriate patches, but ssa-dom-cse-{5,6}.c test
the ARRAY_REF -> MEM_REF part (one of these has to disable SRA from optimizing
the whole test away, the other...ends up waiting until dom2 for the whole loop
to have been unrolled, so I'd not be surprised if this proved a bit fragile,
as in not spotting when some other phase starts doing the optimization instead).
ssa-dom-cse-7.c tests the normalization-of-MEM_REFs part; but AFAICT, the
*only* place making exprs like the problematic MEM[(int[8] *)&a ...], is SRA.
That is, ssa-dom-cse-7.c passes (and the patch series solves PR/63679) if
instead of my patch 2 (normalization of MEM_REFs) we have this:

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 4327990..2889a96 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -1697,7 +1697,7 @@ build_ref_for_offset (location_t loc, tree base, 
HOST_WIDE_INT offset,
 }
   else
 {
-  off = build_int_cst (reference_alias_ptr_type (base),
+  off = build_int_cst (build_pointer_type (exp_type),
   base_offset + offset / BITS_PER_UNIT);
   base = build_fold_addr_expr (unshare_expr (base));
 }

...I'll test that fully but I have to wonder what the right path is here!

Cheers,
Alan
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-5.c | 17 +
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-6.c | 16 
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-7.c | 18 ++
 3 files changed, 51 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-5.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-6.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-7.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-5.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-5.c
new file mode 100644
index 000..cfbb85f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-5.c
@@ -0,0 +1,17 @@
+/* Test normalization of ARRAY_REF expressions to MEM_REFs in dom.  */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-dom2" } */
+
+#define N 8
+
+int
+main (int argc, char **argv)
+{
+  int a[N];
+  for (int i = 0; i < N; i++)
+a[i] = 2*i + 1;
+  int *p = &a[0];
+  return *(++p);
+}
+
+/* { dg-final { scan-tree-dump-times "return 3;" 1 "dom2"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-6.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-6.c
new file mode 100644
index 000..c387fa3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-6.c
@@ -0,0 +1,16 @@
+/* Test normalization of ARRAY_REF expressions to MEM_REFs in dom.  */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fno-tree-sra -fno-tree-fre -fdump-tree-dom1" } */
+
+int
+main (int argc, char **argv)
+{
+  union {
+int a[8];
+int b[2];
+  } u = { .a = { 1, 42, 3, 4, 5, 6, 7, 8 } };
+
+  return u.b[1];
+}
+
+/* { dg-final { scan-tree-dump-times "return 42;" 1 "dom1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-7.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-7.c
new file mode 100644
index 000..3f4ca17
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-7.c
@@ -0,0 +1,18 @@
+/* Test normalization of MEM_REF expressions in dom.  */
+/* { dg-do compile } */
+/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized" } */
+
+typedef struct {
+  int a[8];
+} foo;
+
+foo f;
+
+int test ()
+{
+  foo g = { { 1, 2, 3, 4, 5, 6, 7, 8 } };
+  f=g;
+  return f.a[2];
+}
+
+/* { dg-final { scan-tree-dump-times "return 3;" 1 "optimized" } } */
-- 
1.9.1

Re: [RFC] Combine vectorized loops with its scalar remainder.

2015-11-03 Thread Yuri Rumyantsev

This is an expected failure, since this patch is not in sync with the
latest patches related to masking support for AVX512.
I am waiting for the masked load/store support, which is not yet
integrated into trunk. To get a workable version of the compiler, use a
revision before r229128.

2015-11-03 13:08 GMT+03:00 Richard Henderson :
> On 10/28/2015 11:45 AM, Yuri Rumyantsev wrote:
>>
>> Hi All,
>>
>> Here is a preliminary patch to combine a vectorized loop with its scalar
>> remainder, a draft of which was proposed by Kirill Yukhin a month ago:
>> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01435.html
>> It was tested with the '-mavx2' option to run on a Haswell processor.
>> The main goal of it is to improve performance of vectorized loops for
>> AVX512.
>
>
> Ought this really be enabled for avx2?  While it's nice for testing to be
> able to use normal vcond patterns to be able to test with current hardware,
> I have trouble imagining that it's an improvement without the real masked
> operations.
>
> I tried to have a look myself at what kind of output we'd be getting, but
> the very first test I tried produced an ICE:
>
> void foo(float *a, float *b, int n)
> {
>   int i;
>   for (i = 0; i < n; ++i)
>     a[i] += b[i];
> }
>
> $ ./cc1 -O3 -mavx2 z.c
>  foo
> Analyzing compilation unit
> Performing interprocedural optimizations
>  <*free_lang_data>   
>  
> 
> Assembling functions:
>   foo
> z.c: In function ‘foo’:
> z.c:1:6: error: bogus comparison result type
>  void foo(float *a, float *b, int n)
>   ^
> vector(8) signed int
> vect_vec_mask_.24_116 = vect_vec_iv_.22_112 < vect_cst_.23_115;
> z.c:1:6: internal compiler error: verify_gimple failed
> 0xb20d17 verify_gimple_in_cfg(function*, bool)
> ../../git-master/gcc/tree-cfg.c:5082
> 0xa16d77 execute_function_todo
> ../../git-master/gcc/passes.c:1940
> 0xa1769b execute_todo
> ../../git-master/gcc/passes.c:1995
> Please submit a full bug report,
>
>
> r~

Re: [PATCH] libitm: Support sized delete.

2015-11-03 Thread Torvald Riegel

On Fri, 2015-10-30 at 10:19 -0700, Richard Henderson wrote:
> >  #define _ZnwX  S(_Znw,MANGLE_SIZE_T)
> >  #define _ZnaX  S(_Zna,MANGLE_SIZE_T)
> > +#define _ZdlPvX  S(_ZdlPv,MANGLE_SIZE_T)
> >  #define _ZnwXRKSt9nothrow_t  S(S(_Znw,MANGLE_SIZE_T),RKSt9nothrow_t)
> >  #define _ZnaXRKSt9nothrow_t  S(S(_Zna,MANGLE_SIZE_T),RKSt9nothrow_t)
> > +#define _ZdlPvXRKSt9nothrow_t  
> > S(S(_ZdlPv,MANGLE_SIZE_T),RKSt9nothrow_t)

These are symbols that are provided by libstdc++ if it is available
(otherwise, we use dummy implementations, see the other parts of the
patch)...
  
> >  #define _ZGTtnwX   S(_ZGTtnw,MANGLE_SIZE_T)
> >  #define _ZGTtnaX   S(_ZGTtna,MANGLE_SIZE_T)
> > +#define _ZGTtdlPvX S(_ZGTtdlPv,MANGLE_SIZE_T)
> >  #define _ZGTtnwXRKSt9nothrow_t 
> > S(S(_ZGTtnw,MANGLE_SIZE_T),RKSt9nothrow_t)
> >  #define _ZGTtnaXRKSt9nothrow_t 
> > S(S(_ZGTtna,MANGLE_SIZE_T),RKSt9nothrow_t)
> > +#define _ZGTtdlPvXRKSt9nothrow_t 
> > S(S(_ZGTtdlPv,MANGLE_SIZE_T),RKSt9nothrow_t)
> 
> One more thing... Why are there 4 new symbols here...
> 
> > +/* Wrap: operator delete(void* ptr, std::size_t sz)  */
> > +void
> > +_ZGTtdlPvX (void *ptr, size_t sz)
> > +{
> > +  if (ptr)
> > +gtm_thr()->forget_allocation (ptr, sz, _ZdlPvX);
> > +}
> > +
> > +/* Wrap: operator delete (void *ptr, std::size_t sz, const std::nothrow_t&)  */
> > +void
> > +_ZGTtdlPvXRKSt9nothrow_t (void *ptr, size_t sz, c_nothrow_p nt UNUSED)
> > +{
> > +  if (ptr)
> > +gtm_thr()->forget_allocation (ptr, sz, delsz_opnt);
> > +}
> > +

... while those are the transactional wrappers that we actually want to
provide and export from libitm.

I'll probably revise the patch in a v2 though depending on how we decide
to handle allocations of exception objects.
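
For readers unfamiliar with the macro scheme quoted above, here is a minimal sketch of how such names can be assembled by token pasting. It assumes that S is a two-level paste macro and that MANGLE_SIZE_T expands to `m` (the Itanium ABI mangling of `unsigned long`, typical for LP64 targets); both are assumptions for illustration, not taken from the patch. SIZED_DELETE, SIZED_DELETE_NOTHROW, STR and check_mangled_names are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Assumption: size_t is unsigned long, which mangles as "m".  */
#define MANGLE_SIZE_T m

/* Two-level paste, so that MANGLE_SIZE_T is expanded before ## is applied.  */
#define S1(x, y) x##y
#define S(x, y)  S1(x, y)

/* Two-level stringify, used here only to inspect the pasted identifiers.  */
#define STR1(x) #x
#define STR(x)  STR1(x)

/* Hypothetical stand-ins for the _ZdlPvX style macros in the patch.  */
#define SIZED_DELETE          S(_ZdlPv, MANGLE_SIZE_T)
#define SIZED_DELETE_NOTHROW  S(S(_ZdlPv, MANGLE_SIZE_T), RKSt9nothrow_t)

/* Checks that the pasted identifiers are the expected mangled names.  */
static void check_mangled_names(void)
{
    assert(strcmp(STR(SIZED_DELETE), "_ZdlPvm") == 0);
    assert(strcmp(STR(SIZED_DELETE_NOTHROW), "_ZdlPvmRKSt9nothrow_t") == 0);
}
```

On a target where size_t is `unsigned int`, the same macros would paste `j` instead of `m`, which is why the size_t part is kept as a separate macro.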

Re: [AARCH64][PATCH 1/3] Implementing the variants of the vmulx_ NEON intrinsic

2015-11-03 Thread James Greenhalgh

On Fri, Oct 30, 2015 at 09:29:35AM +, Bilyan Borisov wrote:
> In this patch from the series, a single new md pattern is added: the one for
> fmulx, from which all necessary __builtin functions are derived.
> 
> Several intrinsics were refactored to use the new __builtin functions as some
> of them already had an assembly block implementation. The rest, which had no
> existing implementation, were also added. A single intrinsic was removed:
> vmulx_lane_f32, since there was no test case that covered it and, moreover,
> its implementation was wrong: it was in fact implementing vmulxq_lane_f32.
> 
> In addition, test cases for all new intrinsics were added. Tested on targets
> aarch64-none-elf and aarch64_be-none-elf.

This is OK for trunk.

As you don't yet have commit rights, I've committed it on your behalf
as revision 229702.

I made some small modifications to your ChangeLog. We normally use
"Likewise." rather than "Same." And the content of each line is a short
description of what changed, rather than why it changed. Additionally,
it is usually best to put the name of the thing that was changed, as this
will be what people grep for.

In the end, this was the log I committed:

gcc/

2015-11-03  Bilyan Borisov  

* config/aarch64/aarch64-simd-builtins.def (fmulx): New.
* config/aarch64/aarch64-simd.md (aarch64_fmulx): New.
* config/aarch64/arm_neon.h (vmulx_f32): Rewrite to call fmulx
builtin.
(vmulxq_f32): Likewise.
(vmulx_f64): New.
(vmulxq_f64): Rewrite to call fmulx builtin.
(vmulxs_f32): Likewise.
(vmulxd_f64): Likewise.
(vmulx_lane_f32): Remove.
* config/aarch64/iterators.md (UNSPEC): Add fmulx.

gcc/testsuite/

2015-11-03  Bilyan Borisov  

* gcc/testsuite/gcc.target/aarch64/simd/vmulx_f32_1.c: New.
* gcc/testsuite/gcc.target/aarch64/simd/vmulx_f64_1.c: New.
* gcc/testsuite/gcc.target/aarch64/simd/vmulxq_f32_1.c: New.
* gcc/testsuite/gcc.target/aarch64/simd/vmulxq_f64_1.c: New.
* gcc/testsuite/gcc.target/aarch64/simd/vmulxs_f32_1.c: New.
* gcc/testsuite/gcc.target/aarch64/simd/vmulxd_f64_1.c: New.

Thanks,
James

Re: [PATCH 1/6]tree-ssa-dom.c: Normalize exprs, starting with ARRAY_REF to MEM_REF

2015-11-03 Thread Alan Lawrence

On 3 November 2015 at 10:27, Alan Lawrence  wrote:
> That is, ssa-dom-cse-7.c passes (and the patch series solves PR/63679) if
> instead of my patch 2 (normalization of MEM_REFs) we have this:
>
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 4327990..2889a96 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -1697,7 +1697,7 @@ build_ref_for_offset (location_t loc, tree base, HOST_WIDE_INT offset,
>  }
>else
>  {
> -  off = build_int_cst (reference_alias_ptr_type (base),
> +  off = build_int_cst (build_pointer_type (exp_type),
>base_offset + offset / BITS_PER_UNIT);
>base = build_fold_addr_expr (unshare_expr (base));
>  }
>
> ...I'll test that fully but I have to wonder what the right path is here!

So with also changing the other reference_alias_ptr_type in the first
case of build_ref_for_offset, it breaks Ada ACATS (on x86):

c52101a "CHECK THAT ARRAY SUBTYPE CONVERSION IS APPLIED AFTER AN ARRAY
VALUE IS DETERMINED"
cc70003
cxac004 (stream access, stream functions)

I'll not dig any further unless you think that change to SRA is
the right avenue to investigate!

Cheers, Alan

Re: [AARCH64][PATCH 2/3] Implementing vmulx_lane NEON intrinsic variants

2015-11-03 Thread James Greenhalgh

On Fri, Oct 30, 2015 at 09:31:08AM +, Bilyan Borisov wrote:
> In this patch from the series, all vmulx_lane variants have been implemented
> as a vdup followed by a vmulx. Existing implementations of intrinsics were
> refactored to use this new approach.
> 
> Several new nameless md patterns are added that will enable the combine pass
> to pick up the dup/fmulx combination and replace it with a proper fmulx[lane]
> instruction.
> 
> In addition, test cases for all new intrinsics were added. Tested on targets
> aarch64-none-elf and aarch64_be-none-elf.

Hi,

I have a small style comment below.

> 
> gcc/
> 
> 2015-XX-XX  Bilyan Borisov  
> 
>   * config/aarch64/arm_neon.h (vmulx_lane_f32): New.
>   (vmulx_lane_f64): New.
>   (vmulxq_lane_f32): Refactored & moved.
>   (vmulxq_lane_f64): Refactored & moved.
>   (vmulx_laneq_f32): New.
>   (vmulx_laneq_f64): New.
>   (vmulxq_laneq_f32): New.
>   (vmulxq_laneq_f64): New.
>   (vmulxs_lane_f32): New.
>   (vmulxs_laneq_f32): New.
>   (vmulxd_lane_f64): New.
>   (vmulxd_laneq_f64): New.

>   * config/aarch64/aarch64-simd.md (*aarch64_combine_dupfmulx1,
>   VDQSF): New pattern.
>   (*aarch64_combine_dupfmulx2, VDQF): New pattern.
>   (*aarch64_combine_dupfmulx3): New pattern.
>   (*aarch64_combine_vgetfmulx1, VDQF_DF): New pattern.

I'm not sure I like the use of 1,2,3 for this naming scheme. Elsewhere in
the file, this convention points to the number of operands a pattern
requires (for example add3).

I think elsewhere in the file we use:


  "*aarch64_mul3_elt"
  "*aarch64_mul3_elt_"
  "*aarch64_mul3_elt_to_128df"
  "*aarch64_mul3_elt_to_64v2df"

Is there a reason not to follow that pattern?

Thanks,
James
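
As an illustration of the "vdup followed by a vmulx" scheme described in the quoted patch summary, here is a scalar C model. It is only a sketch: it models two-lane float vectors as plain arrays, and the helper names (is_inf, sign_bit, mulx, mulx_lane2) are hypothetical and are not the arm_neon.h implementation. The one FMULX-specific behaviour modelled is that zero times infinity gives 2.0 with the sign being the XOR of the operand signs, instead of NaN.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* True for +inf and -inf; NaN compares unequal to both.  */
static int is_inf(float x)
{
    return x == INFINITY || x == -INFINITY;
}

/* Extracts the IEEE sign bit, so that -0.0f is distinguished from 0.0f.  */
static int sign_bit(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (int) (bits >> 31);
}

/* Scalar model of FMULX: an ordinary multiply, except that
   (+-0) * (+-inf) yields +-2.0 rather than NaN.  */
static float mulx(float a, float b)
{
    if ((a == 0.0f && is_inf(b)) || (is_inf(a) && b == 0.0f))
        return (sign_bit(a) ^ sign_bit(b)) ? -2.0f : 2.0f;
    return a * b;
}

/* Model of a lane variant on 2-lane vectors: broadcast v[lane] (the "vdup"
   step), then apply mulx element-wise (the "vmulx" step).  */
static void mulx_lane2(float out[2], const float a[2], const float v[2],
                       int lane)
{
    assert(lane >= 0 && lane < 2);
    float dup[2] = { v[lane], v[lane] };
    for (int i = 0; i < 2; i++)
        out[i] = mulx(a[i], dup[i]);
}
```

In the compiler, the point of the new combine patterns is that this two-step form can be matched and emitted as a single by-element instruction; the model above only shows the value semantics.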

Re: [Boolean Vector, patch 1/5] Introduce boolean vector to be used as a vector comparison type

2015-11-03 Thread Richard Biener

On Mon, Nov 2, 2015 at 8:41 PM, Jeff Law  wrote:
> On 10/29/2015 07:08 AM, Ilya Enkovich wrote:
>>
>> On 28 Oct 22:37, Ilya Enkovich wrote:
>>>
>>> Seems the problem occurs in this check in expand_vector_operations_1:
>>>
>>>/* A scalar operation pretending to be a vector one.  */
>>>if (VECTOR_BOOLEAN_TYPE_P (type)
>>>&& !VECTOR_MODE_P (TYPE_MODE (type))
>>>&& TYPE_MODE (type) != BLKmode)
>>>  return;
>>>
>>> This is to filter out scalar operations on boolean vectors.
>>> The problem here is that TYPE_MODE (type) doesn't return
>>> V4SImode assigned to the type but calls vector_type_mode
>>> instead which tries to find an integer mode for it and returns
>>> TImode. This causes function exit and we don't expand vector
>>> comparison.
>>>
>>> I suppose a simple option to fix it is to change the default
>>> get_mask_mode hook to return BLKmode in case the chosen integer vector
>>> mode is not vector_mode_supported_p.
>>>
>>> Thanks,
>>> Ilya
>>>
>>
>> Here is a patch which fixes the problem on ARM (and on i386 with -mno-sse
>> also).  I checked it fixes the problem on ARM and also bootstrapped and
>> checked it on x86_64-unknown-linux-gnu.  Is it OK?
>>
>> Thanks,
>> Ilya
>> --
>> gcc/
>>
>> 2015-10-29  Ilya Enkovich  
>>
>> * targhooks.c (default_get_mask_mode): Use BLKmode in
>> case target doesn't support required vector mode.
>> * stor-layout.c (layout_type): Check for BLKmode.
>
> VOIDmode would probably be a better choice than BLKmode to signal when the
> target doesn't support the required vector mode.

Though we're using BLKmode vectors in all other cases to signal that.

Richard.

>
> Jeff
>

Re: OpenACC atomic directive

2015-11-03 Thread Thomas Schwinge

Hi!

On Mon, 2 Nov 2015 14:40:54 +0100, Jakub Jelinek  wrote:
> On Mon, Nov 02, 2015 at 02:09:38PM +0100, Thomas Schwinge wrote:
> > The OpenACC atomic directive matches OpenMP's atomic directive (got that
> > clarified by the OpenACC committee), so they can share the same
> > implementation.  OK for trunk?
> 
> Ok.

Thanks for the speedy review!

Testing the x86 -m32 multilib, I noticed one problem:

$ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/ source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-1.c -fopenacc -foffload=disable -O2 -m32 -march=i486 -c
source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-1.c: In function 'main':
source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-1.c:564:9: internal compiler error: Segmentation fault
 #pragma acc atomic capture
 ^
0xaf041f crash_signal
[...]/source-gcc/gcc/toplev.c:336
0xd7c908 contains_struct_check(tree_node*, tree_node_structure_enum, char const*, int, char const*)
[...]/source-gcc/gcc/tree.h:3032
0xd7c908 build_call_expr_loc_array(unsigned int, tree_node*, int, tree_node**)
[...]/source-gcc/gcc/tree.c:10991
0xd7cb46 build_call_expr(tree_node*, int, ...)
[...]/source-gcc/gcc/tree.c:11041
0x9eaf7b expand_omp_atomic_mutex
[...]/source-gcc/gcc/omp-low.c:11903
0x9eaf7b expand_omp_atomic
[...]/source-gcc/gcc/omp-low.c:11990
[...]

gcc/omp-low.c:

 11892  expand_omp_atomic_mutex (basic_block load_bb, basic_block store_bb,
 11893   tree addr, tree loaded_val, tree stored_val)
 11894  {
 [...]
 11902t = builtin_decl_explicit (BUILT_IN_GOMP_ATOMIC_START);
 11903t = build_call_expr (t, 0);

As discussed in

a while ago (and as present on gomp-4_0-branch ever since then), I fixed
this as follows:

--- gcc/builtins.def
+++ gcc/builtins.def
@@ -182,7 +182,8 @@ along with GCC; see the file COPYING3.  If not see
 #define DEF_GOMP_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
   DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,\
false, true, true, ATTRS, false, \
-  (flag_openmp \
+  (flag_openacc \
+   || flag_openmp \
|| flag_tree_parallelize_loops > 1 \
|| flag_cilkplus \
|| flag_offload_abi != OFFLOAD_ABI_UNSET))

With that added, committed in r229703:

commit 9e10bfb7c761a1343ecf51cfff60e03839561962
Author: tschwinge 
Date:   Tue Nov 3 11:28:22 2015 +

OpenACC atomic directive

gcc/c-family/
* c-pragma.c (oacc_pragmas): Add "atomic".
* c-pragma.h (pragma_kind): Add PRAGMA_OACC_ATOMIC.
gcc/c/
* c-parser.c (c_parser_omp_construct): Handle PRAGMA_OACC_ATOMIC.
gcc/cp/
* parser.c (cp_parser_omp_construct, cp_parser_pragma): Handle
PRAGMA_OACC_ATOMIC.
gcc/fortran/
* gfortran.h (gfc_statement): Add ST_OACC_ATOMIC,
ST_OACC_END_ATOMIC.
(gfc_exec_op): Add EXEC_OACC_ATOMIC.
* match.h (gfc_match_oacc_atomic): New prototype.
* openmp.c (gfc_match_omp_atomic, gfc_match_oacc_atomic): New
wrapper functions around...
(gfc_match_omp_oacc_atomic): ... this new function.
(oacc_code_to_statement, gfc_resolve_oacc_directive): Handle
EXEC_OACC_ATOMIC.
* parse.c (decode_oacc_directive): Handle "atomic", "end atomic".
(case_exec_markers): Add ST_OACC_ATOMIC.
(gfc_ascii_statement): Handle ST_OACC_ATOMIC, ST_OACC_END_ATOMIC.
(parse_omp_atomic): Rename to...
(parse_omp_oacc_atomic): ... this new function.  Add omp_p formal
parameter.  Adjust all users.
(parse_executable): Handle ST_OACC_ATOMIC.
(is_oacc): Handle EXEC_OACC_ATOMIC.
* resolve.c (gfc_resolve_blocks, gfc_resolve_code): Handle
EXEC_OACC_ATOMIC.
* st.c (gfc_free_statement): Handle EXEC_OACC_ATOMIC.
* trans-openmp.c (gfc_trans_oacc_directive): Handle
EXEC_OACC_ATOMIC.
* trans.c (trans_code): Handle EXEC_OACC_ATOMIC.
gcc/
* builtins.def (DEF_GOMP_BUILTIN): Enable for flag_openacc.
* omp-low.c (check_omp_nesting_restrictions): Allow
GIMPLE_OMP_ATOMIC_LOAD, GIMPLE_OMP_ATOMIC_STORE inside OpenACC
contexts.
gcc/testsuite/
* c-c++-common/goacc-gomp/nesting-fail-1.c: Move "atomic" tests
from here to...
* c-c++-common/goacc-gomp/nesting-1.c: ... here, and expect them
to succeed.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/atomic_capture-1.c: New
file.
* testsuite/libgomp.oacc-c-c++-common/atomic_capture-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/atomic_rw-1.c: Likewise.
* tes

[PATCH] remove unused config/arm/coff.h

2015-11-03 Thread tbsaunde+gcc

From: Trevor Saunders 

Hi,

$subject, nothing refers to this header so we might as well remove it.

Tested that I can still build on x86_64-linux-gnu, not that I would expect
anything else or that it is particularly relevant. OK?

Trev

gcc/ChangeLog:

2015-11-03  Trevor Saunders  

* config/arm/coff.h: Remove.
---
 gcc/config/arm/coff.h | 82 ---
 1 file changed, 82 deletions(-)
 delete mode 100644 gcc/config/arm/coff.h

diff --git a/gcc/config/arm/coff.h b/gcc/config/arm/coff.h
deleted file mode 100644
index 526f9b9..000
--- a/gcc/config/arm/coff.h
+++ /dev/null
@@ -1,82 +0,0 @@
-/* Definitions of target machine for GNU compiler.
-   For ARM with COFF object format.
-   Copyright (C) 1995-2015 Free Software Foundation, Inc.
-   Contributed by Doug Evans (dev...@cygnus.com).
-   
-   This file is part of GCC.
-
-   GCC is free software; you can redistribute it and/or modify it
-   under the terms of the GNU General Public License as published
-   by the Free Software Foundation; either version 3, or (at your
-   option) any later version.
-
-   GCC is distributed in the hope that it will be useful, but WITHOUT
-   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
-   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
-   License for more details.
-
-   You should have received a copy of the GNU General Public License
-   along with GCC; see the file COPYING3.  If not see
-   .  */
-
-/* Note - it is important that this definition matches the one in tcoff.h.  */
-#undef  USER_LABEL_PREFIX
-#define USER_LABEL_PREFIX "_"
-
-
-/* Run-time Target Specification.  */
-#undef  TARGET_DEFAULT_FLOAT_ABI
-#define TARGET_DEFAULT_FLOAT_ABI ARM_FLOAT_ABI_SOFT
-
-#undef  TARGET_DEFAULT
-#define TARGET_DEFAULT (MASK_APCS_FRAME)
-
-#ifndef MULTILIB_DEFAULTS
-#define MULTILIB_DEFAULTS \
-  { "marm", "mlittle-endian", "mfloat-abi=soft", "mno-thumb-interwork" }
-#endif
-
-/* This is COFF, but prefer stabs.  */
-#define SDB_DEBUGGING_INFO 1
-
-#define PREFERRED_DEBUGGING_TYPE DBX_DEBUG
-
-
-#define TARGET_ASM_FILE_START_APP_OFF true
-
-/* Switch into a generic section.  */
-#define TARGET_ASM_NAMED_SECTION  default_coff_asm_named_section
-
-/* Support the ctors/dtors and other sections.  */
-
-#undef INIT_SECTION_ASM_OP
-
-/* Define this macro if jump tables (for `tablejump' insns) should be
-   output in the text section, along with the assembler instructions.
-   Otherwise, the readonly data section is used.  */
-/* We put ARM and Thumb-2 jump tables in the text section, because it makes
-   the code more efficient, but for Thumb-1 it's better to put them out of
-   band unless we are generating compressed tables.  */
-#define JUMP_TABLES_IN_TEXT_SECTION\
-   (TARGET_32BIT || (TARGET_THUMB && (optimize_size || flag_pic)))
-
-#undef  READONLY_DATA_SECTION_ASM_OP
-#define READONLY_DATA_SECTION_ASM_OP   "\t.section .rdata"
-#undef  CTORS_SECTION_ASM_OP
-#define CTORS_SECTION_ASM_OP   "\t.section .ctors,\"x\""
-#undef  DTORS_SECTION_ASM_OP
-#define DTORS_SECTION_ASM_OP   "\t.section .dtors,\"x\""
-
-/* Support the ctors/dtors sections for g++.  */
-
-/* __CTOR_LIST__ and __DTOR_LIST__ must be defined by the linker script.  */
-#define CTOR_LISTS_DEFINED_EXTERNALLY
-
-#undef DO_GLOBAL_CTORS_BODY
-#undef DO_GLOBAL_DTORS_BODY
-
-/* The ARM development system defines __main.  */
-#define NAME__MAIN  "__gccmain"
-#define SYMBOL__MAIN __gccmain
-
-#define SUPPORTS_INIT_PRIORITY 0
-- 
2.6.2

Re: [PATCH] tree-scalar-evolution.c: Handle LSHIFT by constant

2015-11-03 Thread Richard Biener

On Tue, Nov 3, 2015 at 11:15 AM, Alan Lawrence  wrote:
> On 27/10/15 22:27, H.J. Lu wrote:
>>
>> It caused:
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68112
>
> Bah :(.
>
> So yes, in general case, we can't rewrite (a << 1) to (a * 2) as for signed
> types (0x7f...f) << 1 == -2 whereas (0x7f...f * 2) is undefined behaviour.
> Oh well :(...
>
> I don't have a really good fix for this. The best way I can see would be to
> try to make definedness of overflow a property of either the type, or maybe of
> the chrec, and settable on a finer granularity than at present, rather than
> TYPE_OVERFLOW_UNDEFINED = (type is signed) && !(a bunch of global flags).
> However, I don't think I'm going to have time for that patch before end of
> stage 1.
>
> So, I've reverted my r229437. There is a simpler fix: to only apply the
> rewrite for unsigned types. I attach that patch, which I've bootstrapped on
> x86; but although I think this way is correct, I'm not really sure whether
> this is something that should go in. Thoughts?
>
> --Alan
> ---
>  gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c | 33 
> 
>  gcc/tree-scalar-evolution.c  | 19 ++
>  2 files changed, 52 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
> new file mode 100644
> index 000..40e6561
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
> @@ -0,0 +1,33 @@
> +/* PR tree-optimization/65963.  */
> +#include "tree-vect.h"
> +
> +#define N 512
> +
> +int in[2*N], out[N];
> +
> +__attribute__ ((noinline)) void
> +loop (void)
> +{
> +  for (unsigned i = 0; i < N; i++)
> +out[i] = in[i << 1] + 7;
> +}
> +
> +int
> +main (int argc, char **argv)
> +{
> +  check_vect ();
> +  for (int i = 0; i < 2*N; i++)
> +{
> +  in[i] = i;
> +  __asm__ volatile ("" : : : "memory");
> +}
> +  loop ();
> +  __asm__ volatile ("" : : : "memory");
> +  for (int i = 0; i < N; i++)
> +{
> +  if (out[i] != i*2 + 7)
> +   abort ();
> +}
> +  return 0;
> +}
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target { vect_strided2 } } } } */
> diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
> index 0753bf3..d8f3d46 100644
> --- a/gcc/tree-scalar-evolution.c
> +++ b/gcc/tree-scalar-evolution.c
> @@ -1840,6 +1840,25 @@ interpret_rhs_expr (struct loop *loop, gimple *at_stmt,
>res = chrec_fold_multiply (type, chrec1, chrec2);
>break;
>
> +case LSHIFT_EXPR:
> +  if (!TYPE_OVERFLOW_UNDEFINED (type))

I think this should simply re-write A << B to (type) (unsigned-type) A
* (1U << B).

Does that then still vectorize the signed case?

> +   {
> + /* Handle A<<B as A * (1<<B).  */
> + chrec1 = analyze_scalar_evolution (loop, rhs1);
> + chrec2 = analyze_scalar_evolution (loop, rhs2);
> + chrec1 = chrec_convert (type, chrec1, at_stmt);
> + chrec1 = instantiate_parameters (loop, chrec1);
> + chrec2 = instantiate_parameters (loop, chrec2);
> +
> + chrec2 = fold_build2 (LSHIFT_EXPR, type,
> +   build_int_cst (TREE_TYPE (rhs1), 1),
> +   chrec2);
> + res = chrec_fold_multiply (type, chrec1, chrec2);
> +   }
> +  else
> +   res = chrec_dont_know;
> +  break;
> +
>  CASE_CONVERT:
>/* In case we have a truncation of a widened operation that in
>   the truncated type has undefined overflow behavior analyze
> --
> 1.9.1
>
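
The rewrite suggested in the reply above, A << B as (type) (unsigned-type) A * (1U << B), can be sketched in plain C as follows. This is only an illustration of the arithmetic, not the scalar-evolution code; shl_via_unsigned_mul is a hypothetical name. It relies on the fact that unsigned multiplication wraps, and on GCC's documented behaviour that converting an out-of-range unsigned value to int reduces modulo 2^N (ISO C leaves that conversion implementation-defined).

```c
#include <assert.h>
#include <limits.h>

/* Computes a << b without signed overflow: the multiply is done in
   unsigned arithmetic, where wraparound is well defined, and the result
   is converted back to int afterwards.  */
static int shl_via_unsigned_mul(int a, unsigned b)
{
    /* The shift count must be smaller than the width of unsigned.  */
    assert(b < sizeof(unsigned) * CHAR_BIT);
    return (int) ((unsigned) a * (1U << b));
}
```

With this form the signed case from the thread, where a is the largest int and b is 1, gives -2 on GCC targets, whereas the signed multiply by 2 would be undefined behaviour.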

Re: [PATCH] Fix declaration of pthread-structs in s-osinte-rtems.ads

2015-11-03 Thread Jan Sommer

On Monday 02 November 2015, 12:39:57, Sebastian Huber wrote:
> 
> On 31/10/15 16:47, Jan Sommer wrote:
> > Hi,
> >
> > This patch changes the Ada declaration of the pthread-related structs such
> > as pthread_attr_t from a field-equivalent declaration to just reserving the
> > right amount of memory.
> > It is only RTEMS-related and essentially copies the way the types are
> > defined in s-osinte-linux.ads. It makes the declarations independent of a
> > particular newlib version and fixes the bug I filed here:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68169
> 
> [...]
> 
> >  -
> >  -- Signals --
> > @@ -448,6 +450,7 @@ package System.OS_Interface is
> > ss_low_priority : int;
> > ss_replenish_period : timespec;
> > ss_initial_budget   : timespec;
> > +  sched_ss_max_repl   : int;
> >  end record;
> >  pragma Convention (C, struct_sched_param);
> 
> Why is this structure not changed to an opaque size + alignment type 
> like the other structures?
> 

There is no corresponding size constant in s-oscons.ads. The linux version of 
s-osinte.ads uses a record declaration too.

> >   
> > @@ -621,43 +624,34 @@ private
> >  end record;
> >  pragma Convention (C, timespec);
> >   
> > -   CLOCK_REALTIME :  constant clockid_t := 1;
> > -   CLOCK_MONOTONIC : constant clockid_t := 4;
> > +   CLOCK_REALTIME :  constant clockid_t := System.OS_Constants.CLOCK_REALTIME;
> > +   CLOCK_MONOTONIC : constant clockid_t := System.OS_Constants.CLOCK_MONOTONIC;
> > +
> > +   subtype char_array is Interfaces.C.char_array;
> >   
> >  type pthread_attr_t is record
> > -  is_initialized  : int;
> > -  stackaddr   : System.Address;
> > -  stacksize   : int;
> > -  contentionscope : int;
> > -  inheritsched: int;
> > -  schedpolicy : int;
> > -  schedparam  : struct_sched_param;
> > -  cputime_clocked_allowed : int;
> > -  detatchstate: int;
> > +  Data : char_array (1 .. OS_Constants.PTHREAD_ATTR_SIZE);
> >  end record;
> >  pragma Convention (C, pthread_attr_t);
> > +   for pthread_attr_t'Alignment use Interfaces.C.unsigned_long'Alignment;
> >   
> >  type pthread_condattr_t is record
> > -  flags   : int;
> > -  process_shared  : int;
> > +  Data : char_array (1 .. OS_Constants.PTHREAD_CONDATTR_SIZE);
> >  end record;
> >  pragma Convention (C, pthread_condattr_t);
> > +   for pthread_condattr_t'Alignment use Interfaces.C.int'Alignment;
> >   
> >  type pthread_mutexattr_t is record
> > -  is_initialized  : int;
> > -  process_shared  : int;
> > -  prio_ceiling: int;
> > -  protocol: int;
> > -  mutex_type  : int;
> > -  recursive   : int;
> > -   end record;
> > +  Data : char_array (1 .. OS_Constants.PTHREAD_MUTEXATTR_SIZE);
> > +   end  record;
> >  pragma Convention (C, pthread_mutexattr_t);
> > +   for pthread_mutexattr_t'Alignment use Interfaces.C.int'Alignment;
> [...]
> 
> The alignment is sometimes int and sometimes unsigned long. I would 
> change this to long long or double throughout, e.g. if we change the CPU 
> mask type to uint64_t, then the alignment specified here is no longer 
> correct.
> 

Thanks for the tip. I will change that.
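
For comparison, the same "opaque size plus alignment" idea can be sketched in C11. This is a hypothetical analogue of the Ada declarations, not part of the patch: OPAQUE_ATTR_SIZE stands in for a constant such as OS_Constants.PTHREAD_ATTR_SIZE, and opaque_attr_t and check_opaque_layout are made-up names. The storage is aligned to long long throughout, as suggested in the review.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size; a real binding would take this from a generated
   constants file rather than hard-coding it.  */
#define OPAQUE_ATTR_SIZE 40

typedef struct {
    /* Aligned to long long so that the storage remains suitably aligned
       even if the underlying C structure later gains a 64-bit member.  */
    _Alignas(long long) char data[OPAQUE_ATTR_SIZE];
} opaque_attr_t;

/* Checks that the opaque type reserves enough, suitably aligned, storage.  */
static void check_opaque_layout(void)
{
    assert(sizeof(opaque_attr_t) >= OPAQUE_ATTR_SIZE);
    assert(_Alignof(opaque_attr_t) >= _Alignof(long long));
}
```

The design trade-off is the usual one for opaque bindings: fields are no longer visible, but the declaration no longer depends on the field layout of a particular C library version.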

Best regards,

   Jan

Re: Add VIEW_CONVERT_EXPR to operand_equal_p

2015-11-03 Thread Richard Biener

On Tue, Nov 3, 2015 at 11:25 AM, Eric Botcazou  wrote:
>> I suggest to re-instantiate the canonical type checks for the aggregate type
>> case.
>
> OK, thanks, this fixes all the known ICEs so far.
>
> Tested on x86_64-suse-linux, OK for the mainline?

Please instead do the change here:

  /* For aggregates compare only the size.  Accesses to fields do have
 a type information by themselves and thus we only care if we can i.e.
 use the types in move operations.  */
  else if (AGGREGATE_TYPE_P (inner_type)
   && TREE_CODE (inner_type) == TREE_CODE (outer_type))
return (TYPE_MODE (outer_type) != BLKmode
|| operand_equal_p (TYPE_SIZE (inner_type),
TYPE_SIZE (outer_type), 0));

to

  return TYPE_CANONICAL (inner_type)
&& TYPE_CANONICAL (outer_type) == TYPE_CANONICAL (inner_type)

Ok with that change.

Richard.

>
> 2015-11-03  Eric Botcazou  
>
> * gimple-expr.c (useless_type_conversion_p): Reinstate type canonical
> check for aggregate types and beef up comment for mode check.
>
>
> 2015-11-03  Eric Botcazou  
>
> * gnat.dg/discr45.adb: Only compile the test.
>
> --
> Eric Botcazou

[gomp4, committed] Backport make_restrict_var_constraints fixes from trunk

2015-11-03 Thread Tom de Vries


Hi,

I've ported two recent commits to make_restrict_var_constraints in
tree-ssa-structalias.c from trunk to gomp-4_0-branch.


Committed as attached to gomp-4_0-branch.

Thanks,
- Tom
Backport make_restrict_var_constraints fixes from trunk

2015-11-03  Tom de Vries  

	backport from trunk:
	2015-11-03  Tom de Vries  

	* tree-ssa-structalias.c (make_restrict_var_constraints): Rename to ...
	(make_param_constraints): ... this.  Add and handle restrict_name
	parameter.  Handle is_full_var case.
	(intra_create_variable_infos): Use make_param_constraints.

	* tree-ssa-structalias.c (make_restrict_var_constraints): Replace
	make_copy_constraint call with make_constraint_from call.
---
 gcc/ChangeLog  | 12 
 gcc/tree-ssa-structalias.c | 33 ++---
 2 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 70895a6..b28bf54 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,15 @@
+2015-11-03  Tom de Vries  
+
+	* tree-ssa-structalias.c (make_restrict_var_constraints): Rename to ...
+	(make_param_constraints): ... this.  Add and handle restrict_name
+	parameter.  Handle is_full_var case.
+	(intra_create_variable_infos): Use make_param_constraints.
+
+2015-11-03  Tom de Vries  
+
+	* tree-ssa-structalias.c (make_restrict_var_constraints): Replace
+	make_copy_constraint call with make_constraint_from call.
+
 2015-08-29  Anatoly Sokolov  
 
 	* config/mcore/mcore.h (REG_OK_FOR_BASE_P, REG_OK_FOR_INDEX_P,
diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index d409727..f4c875f 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -5892,19 +5892,22 @@ debug_solution_for_var (unsigned int var)
   dump_solution_for_var (stderr, var);
 }
 
-/* Register the constraints for restrict var VI.  */
+/* Register the constraints for function parameter related VI.  Use RESTRICT_NAME
+   as the base name of created restrict vars.  */
 
 static void
-make_restrict_var_constraints (varinfo_t vi)
+make_param_constraints (varinfo_t vi, const char *restrict_name)
 {
   for (; vi; vi = vi_next (vi))
-if (vi->may_have_pointers)
-  {
-	if (vi->only_restrict_pointers)
-	  make_constraint_from_global_restrict (vi, "GLOBAL_RESTRICT", true);
-	else
-	  make_copy_constraint (vi, nonlocal_id);
-  }
+{
+  if (vi->only_restrict_pointers)
+	make_constraint_from_global_restrict (vi, restrict_name, true);
+  else if (vi->may_have_pointers)
+	make_constraint_from (vi, nonlocal_id);
+
+  if (vi->is_full_var)
+	break;
+}
 }
 
 /* Create varinfo structures for all of the variables in the
@@ -5941,19 +5944,11 @@ intra_create_variable_infos (struct function *fn)
 	  vi->is_restrict_var = 1;
 	  insert_vi_for_tree (heapvar, vi);
 	  make_constraint_from (p, vi->id);
-	  make_restrict_var_constraints (vi);
+	  make_param_constraints (vi, "GLOBAL_RESTRICT");
 	  continue;
 	}
 
-  for (; p; p = vi_next (p))
-	{
-	  if (p->only_restrict_pointers)
-	make_constraint_from_global_restrict (p, "PARM_RESTRICT", true);
-	  else if (p->may_have_pointers)
-	make_constraint_from (p, nonlocal_id);
-	  if (p->is_full_var)
-	break;
-	}
+  make_param_constraints (p, "PARM_RESTRICT");
 }
 
   /* Add a constraint for a result decl that is passed by reference.  */
-- 
1.9.1

Re: [PATCH] New attribute to create target clones

2015-11-03 Thread Evgeny Stupachenko

Some tests in the patch were already updated (ifunc requirement
added) by Uros's commit: ac39b078992c27934ea53cb580dbd79f75b6c727

I'll ask for the attached patch to be committed.
x86 bootstrap and make check passed.

2015-11-03  Evgeny Stupachenko  

gcc/
* multiple_target.c (create_dispatcher_calls): Add target check
on ifunc.
(create_target_clone): Change assembler name for versioned declarations.

gcc/testsuite
* g++.dg/ext/mvc4.C: Add dg-require-ifunc condition.
* gcc.target/i386/mvc5.c: Ditto.
* gcc.target/i386/mvc7.c: Add dg-require-ifunc condition and checks on
resolver.

Thanks,
Evgeny

On Mon, Nov 2, 2015 at 8:02 PM, Jeff Law  wrote:
> On 11/02/2015 07:50 AM, Evgeny Stupachenko wrote:
>>
>> Yes, that is exactly what should fix the tests.
>> Unfortunately I don't have access to darwin machine right now.
>> Can you please test if the patch (attached) fixes the tests?
>>
>> gcc/
>>  * multiple_target.c (create_dispatcher_calls): Add target check
>>  on ifunc.
>>  (create_target_clone): Change assembler name for versioned
>> declarations.
>>
>> gcc/testsuite
>>  * gcc.dg/mvc1.c: Add dg-require-ifunc condition.
>>  * gcc.dg/mvc4.c: Ditto.
>>  * gcc.dg/mvc5.c: Ditto.
>>  * g++.dg/ext/mvc1.C: Ditto.
>>  * g++.dg/ext/mvc4.C: Ditto.
>>  * gcc.dg/mvc7.c: Add dg-require-ifunc condition and checks on
>> resolver.
>>
> OK.
>
> jeff
>


target_clones_tests_fix.patch
Description: Binary data

Re: [RFC] Combine vectorized loops with its scalar remainder.

2015-11-03 Thread Richard Biener

On Wed, Oct 28, 2015 at 11:45 AM, Yuri Rumyantsev  wrote:
> Hi All,
>
> Here is a preliminary patch to combine a vectorized loop with its scalar
> remainder, a draft of which was proposed by Kirill Yukhin a month ago:
> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01435.html
> It was tested with the '-mavx2' option to run on a Haswell processor.
> The main goal is to improve the performance of vectorized loops for AVX512.
> Note that only loads/stores and simple reductions with binary operations are
> converted to masked form, e.g. load --> masked load and reduction like
> r1 = f  r2 --> t = f  r2; r1 = m ? t : r2. Masking is performed through
> creation of a new vector induction variable initialized with consecutive
> values from 0 .. VF-1, a new constant vector upper bound which contains the
> number of iterations, and the result of their comparison, which is used as
> the mask vector.
> This implementation has several restrictions:
>
> 1. Multiple types are not supported.
> 2. SLP is not supported.
> 3. Gather/Scatter's are also not supported.
> 4. Vectorization of the loops with low trip count is not implemented yet since
>    it requires additional design and tuning.
>
> We are planning to eliminate all these restrictions in GCCv7.
>
> This patch will be extended to include a cost model to reject unprofitable
> transformations, e.g. the new vector body cost will be evaluated through a new
> target hook which estimates the cost of masking different vector statements. A
> new threshold parameter will be introduced which determines the permissible
> cost increase; it will be tuned on an AVX512 machine.
> This patch is not in sync with changes of Ilya Enkovich for AVX512 masked
> load/store support since only part of them is in trunk compiler.
>
> Any comments will be appreciated.

As stated in the previous discussion I don't think the extra mask IV is a
good idea and we instead should have a masked final iteration for the
epilogue (yes, that's not really "combined" then).  This is because in the
end we'd not only want AVX512 to benefit from this work but also other ISAs
that can do unaligned or masked operations (we can overlap the epilogue work
with the vectorized work or use masked loads/stores available with AVX).
Note that the same applies to the alignment prologue if present, I can't see
how you can handle that with the in-loop approach.

Richard.
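
To make the "masked final iteration for the epilogue" idea concrete, here is a scalar C model of the loop a[i] += b[i]. It is a sketch under stated assumptions, not vectorizer output: VF is a hypothetical vectorization factor, each inner loop stands for one vector operation, and add_masked_epilogue is a made-up name. The mask is formed by comparing each lane's index against the iteration count.

```c
#include <assert.h>
#include <stddef.h>

#define VF 8  /* hypothetical vectorization factor (lanes per vector) */

/* Scalar model of a[i] += b[i] with an unmasked vector body followed by a
   single masked iteration that handles the remaining n % VF elements.  */
static void add_masked_epilogue(float *a, const float *b, size_t n)
{
    size_t i = 0;

    assert(n == 0 || (a != NULL && b != NULL));

    /* Main vector body: only full chunks of VF elements, no mask needed.  */
    for (; i + VF <= n; i += VF)
        for (size_t lane = 0; lane < VF; lane++)
            a[i + lane] += b[i + lane];

    /* Masked epilogue: one more "vector" iteration in which a lane is
       active only if its element index is still below n.  */
    if (i < n) {
        for (size_t lane = 0; lane < VF; lane++) {
            int mask = (i + lane) < n;
            if (mask)
                a[i + lane] += b[i + lane];
        }
    }
}
```

The main body carries no mask overhead here; only the last, partial chunk is masked, which is the difference from the in-loop mask IV scheme in the proposed patch.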

Re: [PATCH 1/6]tree-ssa-dom.c: Normalize exprs, starting with ARRAY_REF to MEM_REF

2015-11-03 Thread Richard Biener

On Tue, Nov 3, 2015 at 12:13 PM, Alan Lawrence  wrote:
> On 3 November 2015 at 10:27, Alan Lawrence  wrote:
>> That is, ssa-dom-cse-7.c passes (and the patch series solves PR/63679) if
>> instead of my patch 2 (normalization of MEM_REFs) we have this:
>>
>> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>> index 4327990..2889a96 100644
>> --- a/gcc/tree-sra.c
>> +++ b/gcc/tree-sra.c
>> @@ -1697,7 +1697,7 @@ build_ref_for_offset (location_t loc, tree base, HOST_WIDE_INT offset,
>>  }
>>else
>>  {
>> -  off = build_int_cst (reference_alias_ptr_type (base),
>> +  off = build_int_cst (build_pointer_type (exp_type),
>>base_offset + offset / BITS_PER_UNIT);
>>base = build_fold_addr_expr (unshare_expr (base));
>>  }
>>
>> ...I'll test that fully but I have to wonder what the right path is here!
>
> So with also changing the other reference_alias_ptr_type in the first
> case of build_ref_for_offset, it breaks Ada ACATS (on x86):
>
> c52101a "CHECK THAT ARRAY SUBTYPE CONVERSION IS APPLIED AFTER AN ARRAY
> VALUE IS DETERMINED"
> cc70003
> cxac004 (stream access, stream functions)
>
> I'll not dig any further unless you think that change to SRA is
> the right avenue to investigate!

Nope, that change looks wrong to me.

Richard.

> Cheers, Alan

Re: [RFC] Combine vectorized loops with its scalar remainder.

2015-11-03 Thread Yuri Rumyantsev

Richard,

It looks like there is a misunderstanding - we assumed that for GCCv6 the
simple scheme for the remainder would be used, through introducing a new IV:
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01435.html

Is that right, or did we miss something?
Now we are testing vectorization of loops with small non-constant trip count.
Yuri.

2015-11-03 14:47 GMT+03:00 Richard Biener :
> On Wed, Oct 28, 2015 at 11:45 AM, Yuri Rumyantsev  wrote:
>> Hi All,
>>
>> Here is a preliminary patch to combine a vectorized loop with its scalar
>> remainder, a draft of which was proposed by Kirill Yukhin a month ago:
>> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01435.html
>> It was tested with the '-mavx2' option to run on a Haswell processor.
>> The main goal of it is to improve performance of vectorized loops for AVX512.
>> Note that only loads/stores and simple reductions with binary operations are
>> converted to masked form, e.g. load --> masked load and reduction like
>> r1 = f <op> r2 --> t = f <op> r2; r1 = m ? t : r2. Masking is performed
>> through creation of a new vector induction variable initialized with
>> consecutive values 0 .. VF-1, a new constant vector upper bound which
>> contains the number of iterations, and the result of their comparison, which
>> is used as the mask vector.
>> This implementation has several restrictions:
>>
>> 1. Multiple types are not supported.
>> 2. SLP is not supported.
>> 3. Gather/Scatter's are also not supported.
>> 4. Vectorization of the loops with low trip count is not implemented yet
>>    since it requires additional design and tuning.
>>
>> We are planning to eliminate all these restrictions in GCCv7.
>>
>> This patch will be extended to include a cost model to reject unprofitable
>> transformations, e.g. the new vector body cost will be evaluated through a
>> new target hook which estimates the cost of masking different vector
>> statements. A new threshold parameter will be introduced which determines the
>> permissible cost increase and which will be tuned on an AVX512 machine.
>> This patch is not in sync with changes of Ilya Enkovich for AVX512 masked
>> load/store support since only part of them is in trunk compiler.
>>
>> Any comments will be appreciated.
>
> As stated in the previous discussion I don't think the extra mask IV is a
> good idea and we instead should have a masked final iteration for the
> epilogue (yes, that's not really "combined" then).  This is because in the
> end we'd not only want AVX512 to benefit from this work but also other ISAs
> that can do unaligned or masked operations (we can overlap the epilogue work
> with the vectorized work or use masked loads/stores available with AVX).
> Note that the same applies to the alignment prologue if present, I can't see
> how you can handle that with the in-loop approach.
>
> Richard.
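
For readers following the thread, the masking scheme described in the quoted
proposal can be illustrated with a scalar sketch. This is only a model under
stated assumptions, not the patch's implementation: the names lane_mask and
masked_sum and the value VF = 8 are hypothetical, and ordinary C++ lanes stand
in for vector registers. Each lane index (0 .. VF-1 plus the loop base) is
compared against the trip count, and the resulting mask gates both the load
and the reduction update (r1 = m ? t : r2).

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Assumed vectorization factor; chosen only for illustration.
constexpr std::size_t VF = 8;

// Mask for the vector iteration starting at `base`: lane i is active
// when base + i is still below the trip count n.
std::array<bool, VF> lane_mask(std::size_t base, std::size_t n) {
    std::array<bool, VF> m{};
    for (std::size_t i = 0; i < VF; ++i)
        m[i] = base + i < n;
    return m;
}

// Sum reduction with no scalar remainder loop: the last (partial) iteration
// runs with some lanes masked off.
long masked_sum(const std::vector<int>& a) {
    std::array<long, VF> acc{};
    const std::size_t n = a.size();
    for (std::size_t base = 0; base < n; base += VF) {
        const std::array<bool, VF> m = lane_mask(base, n);
        for (std::size_t i = 0; i < VF; ++i) {
            // Masked load plus select: inactive lanes never touch memory
            // and keep their previous accumulator value.
            acc[i] = m[i] ? acc[i] + a[base + i] : acc[i];
        }
    }
    long total = 0;
    for (long v : acc)
        total += v;  // horizontal add of the per-lane accumulators
    return total;
}
```

With 11 elements and VF = 8, the second iteration runs with only lanes 0..2
active, which is the case the separate scalar epilogue would otherwise handle.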

[PATCH] New plugin events when evaluating constexpr expressions.

2015-11-03 Thread Andres Tiraboschi

 Hi
 This patch adds two plugin events, raised when a call expression and when
an init or modify expression is evaluated in a constexpr context.
 The goal of this patch is to allow plugins to analyze and/or
modify the evaluation of constant expressions.

 This patch also adds an event that is called when the parsing of a
file is finished.

Thanks,
Andrés.
diff --git a/gccOrig/gcc/cp/constexpr.c b/gccMod/gcc/cp/constexpr.c
index e250726..1c5431a 100644
--- a/gccOrig/gcc/cp/constexpr.c
+++ b/gccMod/gcc/cp/constexpr.c
@@ -42,6 +42,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "tree-inline.h"
 #include "ubsan.h"
+#include "plugin-api.h"
+#include "plugin.h"
 
 static bool verify_constant (tree, bool, bool *, bool *);
 #define VERIFY_CONSTANT(X) \
@@ -123,13 +125,6 @@ ensure_literal_type_for_constexpr_object (tree decl)
   return decl;
 }
 
-/* Representation of entries in the constexpr function definition table.  */
-
-struct GTY((for_user)) constexpr_fundef {
-  tree decl;
-  tree body;
-};
-
 struct constexpr_fundef_hasher : ggc_hasher
 {
   static hashval_t hash (constexpr_fundef *);
@@ -855,67 +850,17 @@ explain_invalid_constexpr_fn (tree fun)
   input_location = save_loc;
 }
 
-/* Objects of this type represent calls to constexpr functions
-   along with the bindings of parameters to their arguments, for
-   the purpose of compile time evaluation.  */
-
-struct GTY((for_user)) constexpr_call {
-  /* Description of the constexpr function definition.  */
-  constexpr_fundef *fundef;
-  /* Parameter bindings environment.  A TREE_LIST where each TREE_PURPOSE
- is a parameter _DECL and the TREE_VALUE is the value of the parameter.
- Note: This arrangement is made to accommodate the use of
- iterative_hash_template_arg (see pt.c).  If you change this
- representation, also change the hash calculation in
- cxx_eval_call_expression.  */
-  tree bindings;
-  /* Result of the call.
-   NULL means the call is being evaluated.
-   error_mark_node means that the evaluation was erroneous;
-   otherwise, the actuall value of the call.  */
-  tree result;
-  /* The hash of this call; we remember it here to avoid having to
- recalculate it when expanding the hash table.  */
-  hashval_t hash;
-};
-
 struct constexpr_call_hasher : ggc_hasher
 {
   static hashval_t hash (constexpr_call *);
   static bool equal (constexpr_call *, constexpr_call *);
 };
 
-/* The constexpr expansion context.  CALL is the current function
-   expansion, CTOR is the current aggregate initializer, OBJECT is the
-   object being initialized by CTOR, either a VAR_DECL or a _REF.  VALUES
-   is a map of values of variables initialized within the expression.  */
-
-struct constexpr_ctx {
-  /* The innermost call we're evaluating.  */
-  constexpr_call *call;
-  /* Values for any temporaries or local variables within the
- constant-expression. */
-  hash_map *values;
-  /* The CONSTRUCTOR we're currently building up for an aggregate
- initializer.  */
-  tree ctor;
-  /* The object we're building the CONSTRUCTOR for.  */
-  tree object;
-  /* Whether we should error on a non-constant expression or fail quietly.  */
-  bool quiet;
-  /* Whether we are strictly conforming to constant expression rules or
- trying harder to get a constant value.  */
-  bool strict;
-};
-
 /* A table of all constexpr calls that have been evaluated by the
compiler in this translation unit.  */
 
 static GTY (()) hash_table *constexpr_call_table;
 
-static tree cxx_eval_constant_expression (const constexpr_ctx *, tree,
- bool, bool *, bool *, tree * = NULL);
-
 /* Compute a hash value for a constexpr call representation.  */
 
 inline hashval_t
@@ -1303,6 +1248,19 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
   bool non_constant_args = false;
   cxx_bind_parameters_in_call (ctx, t, &new_call,
   non_constant_p, overflow_p, &non_constant_args);
+  constexpr_call_info call_info;
+  call_info.function = t;
+  call_info.lval = lval;
+  call_info.call = &new_call;
+  call_info.call_stack = call_stack;
+  call_info.non_constant_args = &non_constant_args;
+  call_info.non_const_p = non_constant_p;
+  call_info.ctx = ctx;
+  call_info.result = NULL_TREE;
+  invoke_plugin_callbacks (PLUGIN_EVAL_CALL_CONSTEXPR, &call_info);
+  if (call_info.result != NULL_TREE)
+return unshare_expr (call_info.result);
+
   if (*non_constant_p)
 return t;
 
@@ -2636,6 +2594,19 @@ cxx_eval_store_expression (const constexpr_ctx *ctx, tree t,
   target = cxx_eval_constant_expression (ctx, target,
 true,
 non_constant_p, overflow_p);
+
+  constexpr_modify_info mod_info;
+  mod_info.target = target;
+  mod_info.expr = t;
+  mod_info.ctx = ctx;
+  mod_info.init = init;
+  mod_info.overflow = overflow_p;
+  mod_info.non_c

[PATCH] Add configure flag for operator new (std::nothrow)

2015-11-03 Thread Aurelio Remonda

Currently, whenever operator new (std::nothrow) fails to allocate memory, it'll
check if there is a new-handler function available. If there is, it'll call
the handler and then try to allocate again. Otherwise, it'll return a null
pointer.

This retrying behavior may not always be desirable. If the handler cannot fix
the memory allocation issue, we may end up stuck in an infinite loop. In that
case, returning nullptr may be a valid alternative to calling the new_handler
over and over. As a workaround to end the loop, we would have to call
std::set_new_handler(nullptr) from within the handler itself, which gets
complicated if the handler has to be set again afterwards.

This patch adds the --enable-new-opnt-no-allocation-retry configure flag, which,
if enabled, changes the retrying behavior of operator new (std::nothrow) so that
it only calls the handler once when it fails to allocate memory and then returns
nullptr.
I have a company-wide copyright assignment, but I don't have commit access.

---
 ChangeLog | 9 +
 libstdc++-v3/acinclude.m4 | 9 +
 libstdc++-v3/configure.ac | 1 +
 libstdc++-v3/doc/xml/manual/configure.xml | 7 +++
 libstdc++-v3/libsupc++/new_opnt.cc| 4 
 5 files changed, 30 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index 5b16ca2..a1cd0d3 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@
+2015-10-30  Aurelio Remonda  
+
+   * libstdc++-v3/acinclude.m4: add enable_new_opnt_no_allocation_retry
+   flag definition.
+   * libstdc++-v3/configure.ac: add option flag
+   GLIBCXX_ENABLE_NEW_OPNT_NO_ALLOCATION_RETRY
+   * libstdc++-v3/libsupc++/new_opnt.cc use the defined macro
+   * libstdc++-v3/doc/xml/manual/configure.xml
+
 2015-10-09  Martin Liska  
 
* MAINTAINERS (Write After Approval): Add myself.
diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index abf2e93..c8f7a75 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -2629,6 +2629,15 @@ AC_DEFUN([GLIBCXX_ENABLE_LONG_LONG], [
   AC_MSG_RESULT([$enable_long_long])
 ])
 
+AC_DEFUN([GLIBCXX_ENABLE_NEW_OPNT_NO_ALLOCATION_RETRY], [
+  GLIBCXX_ENABLE(new-opnt-no-allocation-retry,$1,,[enable new nothrow no allocation retry condition])
+  if test $enable_new_opnt_no_allocation_retry = yes; then
+AC_DEFINE(_GLIBCXX_NEW_OPNT_NO_ALLOCATION_RETRY, 1,
+[Define if operator new (std::nothrow) should not retry allocation after calling the new handler.])
+  fi
+  AC_MSG_CHECKING([whether no-throw operator new should skip retrying allocation after calling the new handler])
+  AC_MSG_RESULT([$enable_new_opnt_no_allocation_retry])
+])
 
 dnl
 dnl Check for decimal floating point.
diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac
index 3456348..2f3aaa9 100644
--- a/libstdc++-v3/configure.ac
+++ b/libstdc++-v3/configure.ac
@@ -165,6 +165,7 @@ GLIBCXX_ENABLE_CLOCALE
 GLIBCXX_ENABLE_ALLOCATOR
 GLIBCXX_ENABLE_CHEADERS($c_model)  dnl c_model from configure.host
 GLIBCXX_ENABLE_LONG_LONG([yes])
+GLIBCXX_ENABLE_NEW_OPNT_NO_ALLOCATION_RETRY([no])
 GLIBCXX_ENABLE_WCHAR_T([yes])
 GLIBCXX_ENABLE_C99([yes])
 GLIBCXX_ENABLE_CONCEPT_CHECKS([no])
diff --git a/libstdc++-v3/doc/xml/manual/configure.xml b/libstdc++-v3/doc/xml/manual/configure.xml
index 2f558d2..7787bc3 100644
--- a/libstdc++-v3/doc/xml/manual/configure.xml
+++ b/libstdc++-v3/doc/xml/manual/configure.xml
@@ -278,6 +278,13 @@
  
  
 
+--enable-new-opnt-no-allocation-retry 
+ This option will cause operator new (std::nothrow) not 
+   to retry allocation if a handler has been set.
+   The purpose of this is to call the handler just once and return.
+ 
+ 
+
  --enable-fully-dynamic-string
  This option enables a special version of basic_string avoiding
the optimization that allocates empty objects in static memory.
diff --git a/libstdc++-v3/libsupc++/new_opnt.cc b/libstdc++-v3/libsupc++/new_opnt.cc
index a9eb465..fac86bc 100644
--- a/libstdc++-v3/libsupc++/new_opnt.cc
+++ b/libstdc++-v3/libsupc++/new_opnt.cc
@@ -40,7 +40,11 @@ operator new (std::size_t sz, const std::nothrow_t&) _GLIBCXX_USE_NOEXCEPT
   if (sz == 0)
 sz = 1;
 
+#ifdef _GLIBCXX_NEW_OPNT_NO_ALLOCATION_RETRY
+  if (__builtin_expect ((p = malloc (sz)) == 0, false))
+#else
   while (__builtin_expect ((p = malloc (sz)) == 0, false))
+#endif
 {
   new_handler handler = std::get_new_handler ();
   if (! handler)
-- 
1.9.1
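
The difference between the two loop shapes in the patch above can be modelled
outside the library. The following is a minimal sketch, not the libstdc++
code: nothrow_alloc_sketch, Outcome and simulate are hypothetical names, the
allocator and handler are passed in as parameters so the failure path can be
exercised, and plain try/catch replaces the library's __try/__catch macros.
With retry set, it behaves like the existing `while` loop; without it, like
the proposed `if`.

```cpp
#include <cstddef>
#include <functional>
#include <new>

// Hypothetical model of the allocation loop.  `alloc` stands in for malloc
// and `handler` for the installed new-handler (an empty function = none set).
void* nothrow_alloc_sketch(std::size_t sz, bool retry,
                           const std::function<void*(std::size_t)>& alloc,
                           const std::function<void()>& handler) {
    if (sz == 0)
        sz = 1;
    void* p;
    while ((p = alloc(sz)) == nullptr) {
        if (!handler)
            return nullptr;          // no handler installed: give up
        try {
            handler();               // let the handler try to free memory
        } catch (const std::bad_alloc&) {
            return nullptr;
        }
        if (!retry)
            return nullptr;          // proposed mode: one handler call, no retry
    }
    return p;
}

struct Outcome {
    bool got_memory;
    int handler_calls;
};

// Runs the sketch against an allocator that fails `failures` times
// before succeeding, counting how often the handler is invoked.
Outcome simulate(bool retry, int failures) {
    static char storage;
    int calls = 0;
    auto alloc = [&](std::size_t) -> void* {
        return failures-- > 0 ? nullptr : &storage;
    };
    auto handler = [&] { ++calls; };
    void* p = nothrow_alloc_sketch(16, retry, alloc, handler);
    return {p != nullptr, calls};
}
```

For an allocator that fails three times, the retrying mode calls the handler
three times and eventually succeeds, while the no-retry mode calls it once and
reports failure; a handler that never frees anything would keep the retrying
mode looping indefinitely.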

Re: [PATCH] replace BITS_PER_UNIT with __CHAR_BIT__ in target libs

2015-11-03 Thread Joseph Myers

On Mon, 2 Nov 2015, Jeff Law wrote:

> Based on Bernd's comments, I think this is fine.  Any sense of how much work
> there is left to cleanup the runtime's inclusion of gcc's config/ target
> headers?

See .  I think most 
of that page other than the list of target macros describes stuff that was 
done some time ago (which should be verified, and then the obsolete pieces 
removed), and the list of host-side target macros used in target-side code 
may not be fully up to date, but it should be indicative of what needs 
fixing.  (The division by possible approaches for fixing each macro is 
extremely rough, however - probably several macros would best be fixed in 
some way other than the initial guess I put on that page.  And if moving 
macros to libgcc_tm.h, note my warning at (b)(ii) in 
 about some libgcc files 
not including libgcc_tm.h.)

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [patch 4/3] Header file reduction - Tools for contrib - second cut

2015-11-03 Thread Andrew MacLeod


On 11/03/2015 01:06 AM, Jeff Law wrote:

On 10/14/2015 09:14 AM, Andrew MacLeod wrote:

Here's the latest version of the tools for a sub directory in contrib.
I've handled all the feedback, except I have not fully commented the
python code in the tools, nor followed any particular coding
convention...   Documentation has been handled, and I've added some
additional comments to the places which were noted as being unclear. I've
also removed all tabs from the source files.

I've also updated show-headers slightly to be a little more
error-resistant and to put some emphasis on any header files specified
on the command line as being of interest (when there are 140 shown, it can
be hard to find the one you are looking for sometimes).

Do we wish to impose anything in particular on the source for tools
going into this sub-directory of contrib? The other tools in contrib
don't seem to have much in the way of coding standards. I also
wonder if anyone other than me will look at them much :-)

I'm certainly interested in them.

Do you have any sense of whether or not coverage of the tools has 
improved over short time since we started squashing out conditional 
compilation?  I was running the header file reordering bits on the 
trunk and was a bit surprised of how many things they're still 
changing.  But that would make sense if some files are now being 
processed that weren't before because we've squashed out the 
conditional compilation.


Hmm. No, I don't have a feel for that.  And to be fair, I didn't run the
tools on every file in trunk.  I limited it to the ones in backend.h,
and took out even a few of those that were troublesome in some way or
other at some point.   I wouldn't expect the conditional stuff to affect
reordering much.  Reducing...  we might start to see things like tm.h or
target.h included less.


A further enhancement in line with that would be to teach the reducer
about a couple of special files, like the relationship between
options.h, tm.h and target.h.   Sometimes target.h was included when in
fact options.h was the only thing actually needed.  During the
flattening process I manually handled this by flattening tm.h out of
target.h and options.h into anything that included tm.h...  so every
file had options.h, tm.h and target.h explicitly included, and then the
reducer would just pick the "minimum".  Of course, the reorder tool
works against this by combining them again :-)


However, the tool could be taught that when it sees target.h, for instance,
and it can't be removed, it could try replacing it with options.h and, if
that fails, tm.h.  That sort of thing could automatically remove
headers that aren't needed because target macros have been turned into
hooks or something.  I suppose that could even be generalized to trying
to replace each header that includes other headers...  I wonder how
safe that would be. Hum.




It certainly is true that the total result is smaller than any of the 
backend, config/ or languages changes that you posted, and I'm running 
it across the entire source tree, but I'm still surprised at how much 
churn I'm seeing.


If it weren't for the level of churn, I'd probably be suggesting we 
just have this stuff run regularly (weekly, monthly, whatever) and 
commit the result after a sanity looksie.  I've yet to see this tool 
botch anything and if we're not unnecessarily churning the sources, 
keeping us clean WRT canonical ordering and duplicate removal 
automatically seems like a good place to be.


It can botch one of the go files: go has a backend.h of its own,
which buggers things up quite nicely since it doesn't include a bunch of
the headers gcc's backend.h does :-)


The reordering tool is likely safer to run across the board, especially
if we can determine the very small subset it shouldn't be run on.


Right now it triggers off the presence of system.h: if system.h is not
present, it won't do anything to the file. I haven't tried running it
against *.c to see if there are any other failures; perhaps that's not a
bad idea.   That will also provide us with a list of files which have
headers included within conditional compilation... there are a few of
those :-P  and maybe they could be fixed.  By default it won't do
anything to those either.


Anyway, if we run it against everything and check it in, then in theory
there isn't any reason you couldn't spot run it at some interval; there
shouldn't be much churn then.


Maybe do another commit of the reordering output and evaluate again in 
a month?


I don't think we're quite there on the reducer and it obviously 
requires more infrastructure in place to test.  But it'd be nice to 
get to a similar state on that tool.


Yeah, the reducer still needs some tweaks to be generally runnable, I
think.   In particular, how to deal with externally supplied macros it
can't really see.  I'm still thinking about that one.


Which reminds me, you ought to add a VMS target to your tests.

Re: [PATCH] Add configure flag for operator new (std::nothrow)

2015-11-03 Thread Paolo Carlini


Hi,

On 11/03/2015 01:35 PM, Aurelio Remonda wrote:

diff --git a/ChangeLog b/ChangeLog
index 5b16ca2..a1cd0d3 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@
+2015-10-30  Aurelio Remonda  
+
+   * libstdc++-v3/acinclude.m4: add enable_new_opnt_no_allocation_retry
+   flag definition.
+   * libstdc++-v3/configure.ac: add option flag
+   GLIBCXX_ENABLE_NEW_OPNT_NO_ALLOCATION_RETRY
+   * libstdc++-v3/libsupc++/new_opnt.cc use the defined macro
+   * libstdc++-v3/doc/xml/manual/configure.xml
+
Three minor comments. First, ChangeLog entries aren't normally submitted 
as part of the patch. Second, since the ChangeLog is under libstdc++-v3, 
the ChangeLog entries should not have libstdc++-v3 in the paths (eg, 
just * acinclude.m4: ...). Finally, since you are touching acinclude.m4 
you should normally run autoreconf, mention in the ChangeLog the changed 
regenerated files and eventually commit those changes too (like the 
ChangeLog entries, those aren't normally part of the posted patch) About 
the three issues, you have plenty of examples in the mailing list.


Otherwise, about the substance of the patch, I think we want to wait for 
Jonathan to be back.


Paolo.

Re: [PATCH] PR/67682, break SLP groups up if only some elements match

2015-11-03 Thread Richard Biener

On Tue, Oct 27, 2015 at 6:38 PM, Alan Lawrence  wrote:
> On 26/10/15 15:04, Richard Biener wrote:
>>
>>
>> apart from the fact that you'll post a new version you need to adjust
>> GROUP_GAP.  You also seem to somewhat "confuse" "first I stmts" and
>> "a group of size I", those are not the same when the group has gaps.
>> I'd say "a group of size i" makes the most sense here thus I suggest
>> to adjust the function comment accordingly.
>
>
> Ok, thanks for pointing this out. My objective had been to only split the
> store groups - which in BB vectorization, always seem to have gap 0 1 1 1.
> I didn't come up with a good scheme for how to split load groups, but it
> seemed that I didn't need to do anything there if I restricted to BB
> vectorization only. For example, consider (ignoring that we could multiply
> the first four elements by 1 and add 0 to the last four):
>
> a[0] = b[I] + 1;
> a[1] = b[J] + 2;
> a[2] = b[K] + 3;
> a[3] = b[L] + 4;
> a[4] = b[M] * 3;
> a[5] = b[N] * 4;
> a[6] = b[O] * 5;
> a[7] = b[P] * 7;
>
>
> with constants I,J,K,L,M,N,O,P. Even with those being a sequence 2 0 1 1 3 0
> 2 1 with overlaps and repetitions, this works fine for BB SLP (two subgroups
> of stores, *sharing* a load group but with different permutations). Likewise
> 0 1 2 3 0 2 4 6.
>
> For loop SLP, yes it looks like the load group needs to be split. So how;
> and what constraints to impose on those constants? (There is no single right
> answer!)
>
> A fairly-strict scheme could be that (I,J,K,L) must be within a contiguous
> block of memory, that does not overlap with the contiguous block containing
> (M,N,O,P). Then, splitting the load group on the boundary seems reasonable,
> and updating the gaps as you suggest. However, when you say "the group first
> elements GROUP_GAP is the gap at the _end_ of the whole group" - the gap at
> the end is the gap that comes after the last element and up to what?
>
> Say I...P are consecutive, the input would have gaps 0 1 1 1 1 1 1 1. If we
> split the load group, we would want subgroups with gaps 0 1 1 1 and 0 1 1 1?
> (IIUC, you suggest  and 0111?)

As said on IRC it should be 4 1 1 1 and 4 1 1 1.

> If they are disjoint sets, but overlapping blocks of memory, say 0 2 4 6 1 3
> 5 7...then do we create two load groups, with gap 0 2 2 2 and 0 2 2 2 again?
> Does something record that the load groups access overlapping areas, and
> record the offset against each other?

No, I don't think we can split load groups that way.  So I think if
splitting store groups works well (with having larger load groups) then
that's the way to go (even for loop vect).

> If there are repeated elements (as in the BB SLP case mentioned above), I'm
> not clear how we can split this effectively...so may have to rule out that
> case. (Moreover, if we are considering hybrid SLP, it may not be clear what
> the loop accesses are, we may be presented only with the SLP accesses. Do we
> necessarily want to pull those out of a load group?)
>
> So I expect I may resolve some of these issues as I progress, but I'm
> curious as to whether (and why) the patch was really broken (wrt gaps) as it
> stood...

Yes, the gaps were clearly bogously constructed in general.  If you have an
existing group you can only split it into non-overlapping groups.  Thus for
two load SLP nodes loading from 0 2 4 6 and from 1 3 5 7 you will have
a single "group" (0 1 2 3 4 5 6 7) and you can at most split it as
0 1 2 3, 4 5 6 7 which won't help in this case (but would be actually worse).

So I think restricting the splitting to the stores should work fine.

Richard.

> Thanks,
> Alan
>
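
The gap arithmetic discussed above (0 1 1 1 1 1 1 1 becoming 4 1 1 1 and
4 1 1 1) can be checked with a small model. This is a sketch under the
convention used in this thread, not GCC's data structures: the Gaps vector,
span and split_group are hypothetical names. Entry 0 holds the gap after the
last member of the group (before the next iteration's first member); every
other entry holds the distance from the previous member, with 1 meaning
adjacent. The split point k is assumed to satisfy 0 < k < size.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// gaps[0]   : elements skipped after the last member until the group repeats
// gaps[i>0] : distance from the previous member (1 = adjacent)
using Gaps = std::vector<unsigned>;

// Number of element slots covered from the first to the last member.
unsigned span(const Gaps& g) {
    unsigned s = 1;
    for (std::size_t i = 1; i < g.size(); ++i)
        s += g[i];
    return s;
}

// Split a group into its first k members and the rest.  Both halves keep
// the stride of the original group, so each half's trailing gap must now
// also skip over the slots that belong to the other half.
std::pair<Gaps, Gaps> split_group(const Gaps& gaps, std::size_t k) {
    const unsigned stride = span(gaps) + gaps[0];
    const auto mid = gaps.begin() + static_cast<std::ptrdiff_t>(k);
    Gaps first(gaps.begin(), mid);
    Gaps second(mid, gaps.end());
    first[0] = stride - span(first);
    second[0] = stride - span(second);
    return {first, second};
}
```

For eight consecutive elements split in the middle, each half covers four
slots of a stride of eight, so each half's first entry becomes 4, matching the
"4 1 1 1 and 4 1 1 1" answer given in the reply.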

Re: [mask-load, patch 1/2] Use boolean predicate for masked loads and store

2015-11-03 Thread Richard Biener

On Wed, Oct 28, 2015 at 4:23 PM, Ilya Enkovich  wrote:
> On 23 Oct 13:36, Ilya Enkovich wrote:
>> 2015-10-23 13:32 GMT+03:00 Richard Biener :
>> >
>> > No, we'd get
>> >
>> >   mask_1 = bool != 1;
>> >
>> > and the 'mask' variable should have been simplified to 'bool'
>> > (yes, we'd insert a dead stmt).  gimple_build simplifies
>> > stmts via the match-and-simplify machinery and match.pd
>> > knows how to invert conditions.
>> >
>>
>> Thanks! I'll try it.
>>
>> Ilya
>
> Hi,
>
> Here is a new version.  The changes you suggested cause BIT_NOT_EXPR to be
> used for the generated mask (instead of the != 1 used before).  It required a
> small fix to get it vectorized and avoid regressions.  Is this version OK?

Ok.

Thanks,
Richard.

> Thanks,
> Ilya
> --
> gcc/
>
> 2015-10-28  Ilya Enkovich  
>
> * internal-fn.c (expand_MASK_LOAD): Adjust to maskload optab changes.
> (expand_MASK_STORE): Adjust to maskstore optab changes.
> * optabs-query.c (can_vec_mask_load_store_p): Add MASK_MODE arg.
>  Adjust to maskload, maskstore optab changes.
> * optabs-query.h (can_vec_mask_load_store_p): Add MASK_MODE arg.
> * optabs.def (maskload_optab): Transform into convert optab.
> (maskstore_optab): Likewise.
> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Adjust to
> can_vec_mask_load_store_p signature change.
> (predicate_mem_writes): Use boolean mask.
> * tree-vect-stmts.c (vectorizable_mask_load_store): Adjust to
> can_vec_mask_load_store_p signature change.  Allow invariant masks.
> (vectorizable_operation): Ignore type precision for boolean vectors.
>
> gcc/testsuite/
>
> 2015-10-28  Ilya Enkovich  
>
> * gcc.target/i386/avx2-vec-mask-bit-not.c: New test.
>
>
> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> index f12d3af..2317e20 100644
> --- a/gcc/internal-fn.c
> +++ b/gcc/internal-fn.c
> @@ -1901,7 +1901,9 @@ expand_MASK_LOAD (gcall *stmt)
>create_output_operand (&ops[0], target, TYPE_MODE (type));
>create_fixed_operand (&ops[1], mem);
>create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
> -  expand_insn (optab_handler (maskload_optab, TYPE_MODE (type)), 3, ops);
> +  expand_insn (convert_optab_handler (maskload_optab, TYPE_MODE (type),
> + TYPE_MODE (TREE_TYPE (maskt))),
> +  3, ops);
>  }
>
>  static void
> @@ -1924,7 +1926,9 @@ expand_MASK_STORE (gcall *stmt)
>create_fixed_operand (&ops[0], mem);
>create_input_operand (&ops[1], reg, TYPE_MODE (type));
>create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
> -  expand_insn (optab_handler (maskstore_optab, TYPE_MODE (type)), 3, ops);
> +  expand_insn (convert_optab_handler (maskstore_optab, TYPE_MODE (type),
> + TYPE_MODE (TREE_TYPE (maskt))),
> +  3, ops);
>  }
>
>  static void
> diff --git a/gcc/optabs-query.c b/gcc/optabs-query.c
> index 254089f..c20597c 100644
> --- a/gcc/optabs-query.c
> +++ b/gcc/optabs-query.c
> @@ -466,7 +466,9 @@ can_mult_highpart_p (machine_mode mode, bool uns_p)
>  /* Return true if target supports vector masked load/store for mode.  */
>
>  bool
> -can_vec_mask_load_store_p (machine_mode mode, bool is_load)
> +can_vec_mask_load_store_p (machine_mode mode,
> +  machine_mode mask_mode,
> +  bool is_load)
>  {
>optab op = is_load ? maskload_optab : maskstore_optab;
>machine_mode vmode;
> @@ -474,7 +476,7 @@ can_vec_mask_load_store_p (machine_mode mode, bool is_load)
>
>/* If mode is vector mode, check it directly.  */
>if (VECTOR_MODE_P (mode))
> -return optab_handler (op, mode) != CODE_FOR_nothing;
> +return convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing;
>
>/* Otherwise, return true if there is some vector mode with
>   the mask load/store supported.  */
> @@ -485,7 +487,12 @@ can_vec_mask_load_store_p (machine_mode mode, bool is_load)
>if (!VECTOR_MODE_P (vmode))
>  return false;
>
> -  if (optab_handler (op, vmode) != CODE_FOR_nothing)
> +  mask_mode = targetm.vectorize.get_mask_mode (GET_MODE_NUNITS (vmode),
> +  GET_MODE_SIZE (vmode));
> +  if (mask_mode == VOIDmode)
> +return false;
> +
> +  if (convert_optab_handler (op, vmode, mask_mode) != CODE_FOR_nothing)
>  return true;
>
>vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
> @@ -496,8 +503,10 @@ can_vec_mask_load_store_p (machine_mode mode, bool is_load)
>if (cur <= GET_MODE_SIZE (mode))
> continue;
>vmode = mode_for_vector (mode, cur / GET_MODE_SIZE (mode));
> +  mask_mode = targetm.vectorize.get_mask_mode (GET_MODE_NUNITS (vmode),
> +  cur);
>if (VECTOR_MODE_P (vmode)
> - && optab_handler (op, vmode) != CODE_FOR_nothing)
> + && convert_optab_handl

Re: [PATCH] remove unused config/arm/coff.h

2015-11-03 Thread Richard Earnshaw

On 03/11/15 11:31, tbsaunde+...@tbsaunde.org wrote:
> From: Trevor Saunders 
> 
> Hi,
> 
> $subject, nothing refers to this header so we might as well remove it.
> 
> tested I can still build on x86_64-linux-gnu, not that I would expect anything
> else or that it is particularly relevant, ok?
> 

OK.

I think it was used by the pe-coff/wince port, but that was obsoleted
some time back.

R.

> Trev
> 
> gcc/ChangeLog:
> 
> 2015-11-03  Trevor Saunders  
> 
>   * config/arm/coff.h: Remove.
> ---
>  gcc/config/arm/coff.h | 82 
> ---
>  1 file changed, 82 deletions(-)
>  delete mode 100644 gcc/config/arm/coff.h
> 
> diff --git a/gcc/config/arm/coff.h b/gcc/config/arm/coff.h
> deleted file mode 100644
> index 526f9b9..000
> --- a/gcc/config/arm/coff.h
> +++ /dev/null
> @@ -1,82 +0,0 @@
> -/* Definitions of target machine for GNU compiler.
> -   For ARM with COFF object format.
> -   Copyright (C) 1995-2015 Free Software Foundation, Inc.
> -   Contributed by Doug Evans (dev...@cygnus.com).
> -   
> -   This file is part of GCC.
> -
> -   GCC is free software; you can redistribute it and/or modify it
> -   under the terms of the GNU General Public License as published
> -   by the Free Software Foundation; either version 3, or (at your
> -   option) any later version.
> -
> -   GCC is distributed in the hope that it will be useful, but WITHOUT
> -   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> -   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> -   License for more details.
> -
> -   You should have received a copy of the GNU General Public License
> -   along with GCC; see the file COPYING3.  If not see
> -   .  */
> -
> -/* Note - it is important that this definition matches the one in tcoff.h.  */
> -#undef  USER_LABEL_PREFIX
> -#define USER_LABEL_PREFIX "_"
> -
> -
> -/* Run-time Target Specification.  */
> -#undef  TARGET_DEFAULT_FLOAT_ABI
> -#define TARGET_DEFAULT_FLOAT_ABI ARM_FLOAT_ABI_SOFT
> -
> -#undef  TARGET_DEFAULT
> -#define TARGET_DEFAULT (MASK_APCS_FRAME)
> -
> -#ifndef MULTILIB_DEFAULTS
> -#define MULTILIB_DEFAULTS \
> -  { "marm", "mlittle-endian", "mfloat-abi=soft", "mno-thumb-interwork" }
> -#endif
> -
> -/* This is COFF, but prefer stabs.  */
> -#define SDB_DEBUGGING_INFO 1
> -
> -#define PREFERRED_DEBUGGING_TYPE DBX_DEBUG
> -
> -
> -#define TARGET_ASM_FILE_START_APP_OFF true
> -
> -/* Switch into a generic section.  */
> -#define TARGET_ASM_NAMED_SECTION  default_coff_asm_named_section
> -
> -/* Support the ctors/dtors and other sections.  */
> -
> -#undef INIT_SECTION_ASM_OP
> -
> -/* Define this macro if jump tables (for `tablejump' insns) should be
> -   output in the text section, along with the assembler instructions.
> -   Otherwise, the readonly data section is used.  */
> -/* We put ARM and Thumb-2 jump tables in the text section, because it makes
> -   the code more efficient, but for Thumb-1 it's better to put them out of
> -   band unless we are generating compressed tables.  */
> -#define JUMP_TABLES_IN_TEXT_SECTION  \
> -   (TARGET_32BIT || (TARGET_THUMB && (optimize_size || flag_pic)))
> -
> -#undef  READONLY_DATA_SECTION_ASM_OP
> -#define READONLY_DATA_SECTION_ASM_OP "\t.section .rdata"
> -#undef  CTORS_SECTION_ASM_OP
> -#define CTORS_SECTION_ASM_OP "\t.section .ctors,\"x\""
> -#undef  DTORS_SECTION_ASM_OP
> -#define DTORS_SECTION_ASM_OP "\t.section .dtors,\"x\""
> -
> -/* Support the ctors/dtors sections for g++.  */
> -
> -/* __CTOR_LIST__ and __DTOR_LIST__ must be defined by the linker script.  */
> -#define CTOR_LISTS_DEFINED_EXTERNALLY
> -
> -#undef DO_GLOBAL_CTORS_BODY
> -#undef DO_GLOBAL_DTORS_BODY
> -
> -/* The ARM development system defines __main.  */
> -#define NAME__MAIN  "__gccmain"
> -#define SYMBOL__MAIN __gccmain
> -
> -#define SUPPORTS_INIT_PRIORITY 0
>

Re: [Boolean Vector, patch 1/5] Introduce boolean vector to be used as a vector comparison type

2015-11-03 Thread Jeff Law


On 11/03/2015 04:26 AM, Richard Biener wrote:

On Mon, Nov 2, 2015 at 8:41 PM, Jeff Law  wrote:

On 10/29/2015 07:08 AM, Ilya Enkovich wrote:


On 28 Oct 22:37, Ilya Enkovich wrote:


Seems the problem occurs in this check in expand_vector_operations_1:

/* A scalar operation pretending to be a vector one.  */
if (VECTOR_BOOLEAN_TYPE_P (type)
&& !VECTOR_MODE_P (TYPE_MODE (type))
&& TYPE_MODE (type) != BLKmode)
  return;

This is to filter out scalar operations on boolean vectors.
The problem here is that TYPE_MODE (type) doesn't return
V4SImode assigned to the type but calls vector_type_mode
instead which tries to find an integer mode for it and returns
TImode. This causes function exit and we don't expand vector
comparison.

I suppose a simple option to fix it is to change the default get_mask_mode
hook to return BLKmode in case the chosen integer vector mode is not
vector_mode_supported_p.

Thanks,
Ilya



Here is a patch which fixes the problem on ARM (and on i386 with -mno-sse
also).  I checked it fixes the problem on ARM and also bootstrapped and
checked it on x86_64-unknown-linux-gnu.  Is it OK?

Thanks,
Ilya
--
gcc/

2015-10-29  Ilya Enkovich  

 * targhooks.c (default_get_mask_mode): Use BLKmode in
 case target doesn't support required vector mode.
 * stor-layout.c (layout_type): Check for BLKmode.


VOIDmode would probably be a better choice than BLKmode to signal when the
target doesn't support the required vector mode.


Though we're using BLKmode vectors in all other cases to signal that.
If that's the case in the vectorizer then let's stay consistent with that
existing practice in the vectorizer.


jeff

Re: [PATCH] Use signed boolean type for boolean vectors

2015-11-03 Thread Richard Biener

On Wed, Oct 28, 2015 at 4:30 PM, Ilya Enkovich  wrote:
> 2015-10-28 18:21 GMT+03:00 Richard Biener :
>> On Wed, Oct 28, 2015 at 2:13 PM, Ilya Enkovich  wrote:
>>> Hi,
>>>
>>> Testing boolean vector conversions I found several runtime regressions
>>> and investigation showed it's due to incorrect conversion caused by
>>> unsigned boolean type.  When boolean vector is represented as an
>>> integer vector on target it's a signed integer actually.  Unsigned
>>> boolean type was chosen due to possible single bit values, but for
>>> multiple bit values it causes wrong casting.  The easiest way to fix
>>> it is to use signed boolean value.  The following patch does this and
>>> fixes my problems with conversion.  Bootstrapped and tested on
>>> x86_64-unknown-linux-gnu.  Is it OK?
>>
>> Hmm.  Actually formally the "boolean" vectors were always 0 or -1
>> (all bits set).  That is also true for a signed boolean with precision 1
>> but with higher precision what makes sure to sign-extend 'true'?
>>
>> So it's far from an obvious change, esp as you don't change the
>> precision == 1 case.  [I still think we should have precision == 1
>> for all boolean types]
>>
>> Richard.
>>
>
> For a signed type with 1-bit precision, the value 1 is out of range, right?
> This might break in many places because 1 is used as the true value.

For vectors -1 is true.  Did you try whether it breaks many places?
build_int_cst (type, 1) should still work fine.

Richard.

>
> Ilya
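
The point about 'true' being -1 (all bits set) in a vector mask can be illustrated outside GCC. What follows is a minimal sketch in plain C, not code from the patch; the function names are made up for illustration. It shows that a lane widened as a signed value keeps the all-ones pattern, while the same bits widened as unsigned do not.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build a 32-bit lane mask from a comparison.
   A true lane is all bits set (-1), a false lane is 0.  */
int32_t
lane_mask_signed (int a, int b)
{
  return -(int32_t) (a < b);
}

/* Widen an 8-bit lane as a signed value: sign extension keeps the
   all-ones pattern, so 'true' stays -1.  */
int32_t
widen_signed (int8_t lane)
{
  return (int32_t) lane;
}

/* Widen the same bits as an unsigned value: zero extension yields
   0x000000ff, which is no longer an all-ones mask.  */
int32_t
widen_unsigned (uint8_t lane)
{
  return (int32_t) lane;
}
```

With a signed element type, converting a mask between lane widths preserves the 0 / -1 convention; with an unsigned one the widened 'true' value is 255 rather than -1.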

Re: [PATCH] Pass manager: add support for termination of pass list

2015-11-03 Thread Richard Biener

On Fri, Oct 30, 2015 at 1:53 PM, Martin Liška  wrote:
> On 10/30/2015 01:13 PM, Richard Biener wrote:
>> So I suggest to do the push/pop of cfun there.
>> do_per_function_toporder can be made static btw.
>>
>> Richard.
>
> Right, I've done that and it works (bootstrap has been currently running),
> feasible for HSA branch too.
>
> tree-pass.h:
>
> /* Declare for plugins.  */
> extern void do_per_function_toporder (void (*) (function *, void *), void *);
>
> Attaching the patch that I'm going to test.

Err.

+  cgraph_node::get (current_function_decl)->release_body ();
+
+  current_function_decl = NULL;
+  set_cfun (NULL);

I'd have expected

  tree fn = cfun->decl;
  pop_cfun ();
  gcc_assert (!cfun);
  cgraph_node::get (fn)->release_body ();

here.

> Martin
>
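
The ordering suggested above (save the decl, pop, assert that nothing is current, then release the body) can be sketched with a toy context stack. This is only a model under stated assumptions: every name beginning with toy_ is hypothetical and none of it is GCC's actual push_cfun/pop_cfun implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a function context; not GCC's struct function.  */
struct toy_function
{
  const char *decl;
  int body_released;
};

#define TOY_STACK_MAX 8

struct toy_function *toy_cfun;                  /* current context */
struct toy_function *toy_stack[TOY_STACK_MAX];  /* saved outer contexts */
size_t toy_depth;

/* Remember the outer context and make FN current.  */
void
toy_push_cfun (struct toy_function *fn)
{
  assert (toy_depth < TOY_STACK_MAX);
  toy_stack[toy_depth++] = toy_cfun;
  toy_cfun = fn;
}

/* Restore the context that was current before the matching push.  */
void
toy_pop_cfun (void)
{
  assert (toy_depth > 0);
  toy_cfun = toy_stack[--toy_depth];
}

/* Save the handle first, pop, check that no context is current,
   and only then release the body through the saved handle.  */
struct toy_function *
toy_finish_function (void)
{
  struct toy_function *fn = toy_cfun;
  toy_pop_cfun ();
  assert (toy_cfun == NULL);
  fn->body_released = 1;
  return fn;
}

/* Drive the sequence once; returns 1 when the state is as expected.  */
int
toy_demo (void)
{
  struct toy_function f = { "foo", 0 };
  toy_push_cfun (&f);
  struct toy_function *done = toy_finish_function ();
  return done == &f && f.body_released == 1 && toy_cfun == NULL;
}
```

The design point is that the release step never has to consult the "current" globals, so clearing them is left entirely to the pop.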

Re: [PATCH, 2/2] Handle recursive restrict in function parameter

2015-11-03 Thread Tom de Vries


On 01/11/15 19:20, Tom de Vries wrote:

On 01/11/15 19:03, Tom de Vries wrote:

So, the new patch series is:

  1  Rename make_restrict_var_constraints to make_param_constraints
  2  Handle recursive restrict in function parameter

I'll repost in reply to this message.



This patch adds handling of all the restrict qualifiers in the type of a
function parameter.



And reposting an updated version, now that the toplevel parameter in 
make_param_constraints has been eliminated.
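
As a rough, runnable companion to the new tests, the sketch below uses the same shape as restrict-7.c: every pointer level of the first parameter is restrict-qualified, so the store through it can be assumed not to touch *b. The function names and the caller are assumptions made for illustration, and whether a given compiler actually folds the return value to 1 depends on version and options.

```c
#include <assert.h>

/* Same shape as the function in restrict-7.c: all three pointer
   levels of A are restrict-qualified (GNU spelling __restrict__).  */
int
store_through (int *__restrict__ *__restrict__ *__restrict__ a, int *b)
{
  *b = 1;
  ***a = 2;   /* may be assumed not to alias *b */
  return *b;
}

/* Hypothetical caller that respects the restrict contract: X is only
   reached through the PX chain, Y only through B.  */
int
call_without_aliasing (void)
{
  int x = 0, y = 0;
  int *__restrict__ px = &x;
  int *__restrict__ *__restrict__ ppx = &px;
  int r = store_through (&ppx, &y);
  return r == 1 && *px == 2 && y == 1;
}
```

Because the caller never aliases the two arguments, the result is the same whether or not the optimizer uses the restrict information.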


Thanks,
- Tom

Handle recursive restrict in function parameter

	* tree-ssa-structalias.c (struct fieldoff): Add restrict_var field.
	(push_fields_onto_fieldstack): Add and handle handle_param parameter.
	(create_variable_info_for_1): Add and handle
	handle_param parameter.  Add extra arg to call to
	push_fields_onto_fieldstack.  Handle restrict pointer fields.
	(create_variable_info_for): Call create_variable_info_for_1 with extra
	arg.
	(make_param_constraints): Drop restrict_name parameter.  Ignore
	vi->only_restrict_pointers.
	(intra_create_variable_infos): Call create_variable_info_for_1 with
	extra arg.  Remove restrict handling.  Call make_param_constraints with
	one less arg.

	* gcc.dg/tree-ssa/restrict-7.c: New test.
	* gcc.dg/tree-ssa/restrict-8.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/restrict-7.c | 12 
 gcc/testsuite/gcc.dg/tree-ssa/restrict-8.c | 17 ++
 gcc/tree-ssa-structalias.c | 90 ++
 3 files changed, 83 insertions(+), 36 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/restrict-7.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/restrict-8.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/restrict-7.c b/gcc/testsuite/gcc.dg/tree-ssa/restrict-7.c
new file mode 100644
index 000..f7a68c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/restrict-7.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+int
+f (int *__restrict__ *__restrict__ *__restrict__ a, int *b)
+{
+  *b = 1;
+  ***a  = 2;
+  return *b;
+}
+
+/* { dg-final { scan-tree-dump-times "return 1" 1 "fre1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/restrict-8.c b/gcc/testsuite/gcc.dg/tree-ssa/restrict-8.c
new file mode 100644
index 000..b0ab164
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/restrict-8.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+struct s
+{
+  int *__restrict__ *__restrict__ pp;
+};
+
+int
+f (struct s s, int *b)
+{
+  *b = 1;
+  **s.pp = 2;
+  return *b;
+}
+
+/* { dg-final { scan-tree-dump-times "return 1" 1 "fre1" } } */
diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index ded5a1e..3c65db8 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -307,6 +307,7 @@ static varinfo_t first_or_preceding_vi_for_offset (varinfo_t,
 		   unsigned HOST_WIDE_INT);
 static varinfo_t lookup_vi_for_tree (tree);
 static inline bool type_can_have_subvars (const_tree);
+static void make_param_constraints (varinfo_t);
 
 /* Pool of variable info structures.  */
 static object_allocator variable_info_pool
@@ -393,6 +394,7 @@ new_var_info (tree t, const char *name, bool add_id)
   return ret;
 }
 
+static varinfo_t create_variable_info_for_1 (tree, const char *, bool, bool);
 
 /* A map mapping call statements to per-stmt variables for uses
and clobbers specific to the call.  */
@@ -5195,6 +5197,8 @@ struct fieldoff
   unsigned may_have_pointers : 1;
 
   unsigned only_restrict_pointers : 1;
+
+  varinfo_t restrict_var;
 };
 typedef struct fieldoff fieldoff_s;
 
@@ -5289,11 +5293,12 @@ field_must_have_pointers (tree t)
OFFSET is used to keep track of the offset in this entire
structure, rather than just the immediately containing structure.
Returns false if the caller is supposed to handle the field we
-   recursed for.  */
+   recursed for.  If HANDLE_PARAM is set, we're handling part of a function
+   parameter.  */
 
 static bool
 push_fields_onto_fieldstack (tree type, vec *fieldstack,
-			 HOST_WIDE_INT offset)
+			 HOST_WIDE_INT offset, bool handle_param)
 {
   tree field;
   bool empty_p = true;
@@ -5319,7 +5324,7 @@ push_fields_onto_fieldstack (tree type, vec *fieldstack,
 	|| TREE_CODE (field_type) == UNION_TYPE)
 	  push = true;
 	else if (!push_fields_onto_fieldstack
-		(field_type, fieldstack, offset + foff)
+		(field_type, fieldstack, offset + foff, handle_param)
 		 && (DECL_SIZE (field)
 		 && !integer_zerop (DECL_SIZE (field
 	  /* Empty structures may have actual size, like in C++.  So
@@ -5340,7 +5345,8 @@ push_fields_onto_fieldstack (tree type, vec *fieldstack,
 	if (!pair
 		&& offset + foff != 0)
 	  {
-		fieldoff_s e = {0, offset + foff, false, false, false, false};
+		fieldoff_s e = {0, offset + foff, false, false, false, false,
+NULL};
 		pair = fieldstack->safe_push (e);
 	  }
 
@@ -5374,6 +5380,19 @@ push_fields_onto_fieldstack (tree type, ve

Re: [PATCH] remove unused config/arm/coff.h

2015-11-03 Thread Jeff Law


On 11/03/2015 04:31 AM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

Hi,

$subject, nothing refers to this header so we might as well remove it.

Tested that I can still build on x86_64-linux-gnu, not that I would expect
anything else or that it is particularly relevant.  OK?

Trev

gcc/ChangeLog:

2015-11-03  Trevor Saunders  

* config/arm/coff.h: Remove.

More generally, if we have a header file that's not used, I'd consider
removing it to be obvious enough to commit without approval.


We could/should probably do the same with unused functions, with the
only wrinkle being that things which are useful for debugging but
otherwise unused should be kept around.


jeff

[gomp4,committed] Handle recursive restrict in function parameter

2015-11-03 Thread Tom de Vries


[ was: Re: [PATCH, 2/2] Handle recursive restrict in function parameter ]

On 03/11/15 14:46, Tom de Vries wrote:

On 01/11/15 19:20, Tom de Vries wrote:

On 01/11/15 19:03, Tom de Vries wrote:

So, the new patch series is:

  1  Rename make_restrict_var_constraints to
     make_param_constraints
  2  Handle recursive restrict in function parameter

I'll repost in reply to this message.



This patch adds handling of all the restrict qualifiers in the type of a
function parameter.



And reposting an updated version, now that the toplevel parameter in
make_param_constraints has been eliminated.



And committed to gomp-4_0-branch.

Thanks,
- Tom


0001-Handle-recursive-restrict-in-function-parameter.patch


Handle recursive restrict in function parameter

* tree-ssa-structalias.c (struct fieldoff): Add restrict_var field.
(push_fields_onto_fieldstack): Add and handle handle_param parameter.
(create_variable_info_for_1): Add and handle
handle_param parameter.  Add extra arg to call to
push_fields_onto_fieldstack.  Handle restrict pointer fields.
(create_variable_info_for): Call create_variable_info_for_1 with extra
arg.
(make_param_constraints): Drop restrict_name parameter.  Ignore
vi->only_restrict_pointers.
(intra_create_variable_infos): Call create_variable_info_for_1 with
extra arg.  Remove restrict handling.  Call make_param_constraints with
one less arg.

* gcc.dg/tree-ssa/restrict-7.c: New test.
* gcc.dg/tree-ssa/restrict-8.c: New test.
---
  gcc/testsuite/gcc.dg/tree-ssa/restrict-7.c | 12 
  gcc/testsuite/gcc.dg/tree-ssa/restrict-8.c | 17 ++
  gcc/tree-ssa-structalias.c | 90 ++
  3 files changed, 83 insertions(+), 36 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/restrict-7.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/restrict-8.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/restrict-7.c b/gcc/testsuite/gcc.dg/tree-ssa/restrict-7.c
new file mode 100644
index 000..f7a68c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/restrict-7.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+int
+f (int *__restrict__ *__restrict__ *__restrict__ a, int *b)
+{
+  *b = 1;
+  ***a  = 2;
+  return *b;
+}
+
+/* { dg-final { scan-tree-dump-times "return 1" 1 "fre1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/restrict-8.c b/gcc/testsuite/gcc.dg/tree-ssa/restrict-8.c
new file mode 100644
index 000..b0ab164
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/restrict-8.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+struct s
+{
+  int *__restrict__ *__restrict__ pp;
+};
+
+int
+f (struct s s, int *b)
+{
+  *b = 1;
+  **s.pp = 2;
+  return *b;
+}
+
+/* { dg-final { scan-tree-dump-times "return 1" 1 "fre1" } } */
diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index ded5a1e..3c65db8 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -307,6 +307,7 @@ static varinfo_t first_or_preceding_vi_for_offset (varinfo_t,
   unsigned HOST_WIDE_INT);
  static varinfo_t lookup_vi_for_tree (tree);
  static inline bool type_can_have_subvars (const_tree);
+static void make_param_constraints (varinfo_t);

  /* Pool of variable info structures.  */
  static object_allocator variable_info_pool
@@ -393,6 +394,7 @@ new_var_info (tree t, const char *name, bool add_id)
return ret;
  }

+static varinfo_t create_variable_info_for_1 (tree, const char *, bool, bool);

  /* A map mapping call statements to per-stmt variables for uses
 and clobbers specific to the call.  */
@@ -5195,6 +5197,8 @@ struct fieldoff
unsigned may_have_pointers : 1;

unsigned only_restrict_pointers : 1;
+
+  varinfo_t restrict_var;
  };
  typedef struct fieldoff fieldoff_s;

@@ -5289,11 +5293,12 @@ field_must_have_pointers (tree t)
 OFFSET is used to keep track of the offset in this entire
 structure, rather than just the immediately containing structure.
 Returns false if the caller is supposed to handle the field we
-   recursed for.  */
+   recursed for.  If HANDLE_PARAM is set, we're handling part of a function
+   parameter.  */

  static bool
  push_fields_onto_fieldstack (tree type, vec *fieldstack,
-HOST_WIDE_INT offset)
+HOST_WIDE_INT offset, bool handle_param)
  {
tree field;
bool empty_p = true;
@@ -5319,7 +5324,7 @@ push_fields_onto_fieldstack (tree type, vec *fieldstack,
|| TREE_CODE (field_type) == UNION_TYPE)
  push = true;
else if (!push_fields_onto_fieldstack
-   (field_type, fieldstack, offset + foff)
+   (field_type, fieldstack, offset + foff, handle_param)
 && (DECL_SIZE (field)
 && !in

Re: [patch 4/3] Header file reduction - Tools for contrib - second cut

2015-11-03 Thread Jeff Law


On 11/03/2015 06:24 AM, Andrew MacLeod wrote:


Do you have any sense of whether or not coverage of the tools has
improved over the short time since we started squashing out conditional
compilation?  I was running the header file reordering bits on the
trunk and was a bit surprised of how many things they're still
changing.  But that would make sense if some files are now being
processed that weren't before because we've squashed out the
conditional compilation.


Hmm, no, I don't have a feel for that.  And to be fair, I didn't run the
tools on every file in trunk.  I limited it to the ones in backend.h,
and took out even a few of those that were troublesome in some way or
other at some point.  I wouldn't expect the conditional stuff to affect
reordering much.  Reducing... we might start to see things like tm.h or
target.h included less.

Well, the reorder tool will punt if it sees conditional compilation in
the headers, so I was kind of hoping that some of the churn would be
explainable by the ongoing removal of conditional compilation causing
files to be processed now that weren't before.  But it appears it's
other factors.




A further enhancement in line with that would be to teach the reducer
about a couple of special files, like the relationship between
options.h, tm.h and target.h.  Sometimes target.h was included when in
fact options.h was the only thing actually needed.  During the
flattening process I manually handled this by flattening tm.h out of
target.h and options.h into anything that included tm.h, so every
file had options.h, tm.h and target.h explicitly included, and then the
reducer would just pick the "minimum".  Of course, the reorder tool
works against this by combining them again :-)

A fair amount of the churn was options.h related.  I'll run it again and
look closer to see how much exactly.




If it weren't for the level of churn, I'd probably be suggesting we
just have this stuff run regularly (weekly, monthly, whatever) and
commit the result after a sanity looksie.  I've yet to see this tool
botch anything and if we're not unnecessarily churning the sources,
keeping us clean WRT canonical ordering and duplicate removal
automatically seems like a good place to be.


It can botch one of the go files: go has a backend.h of its own,
which buggers things up quite nicely since it doesn't include a bunch of
the headers gcc's backend.h does :-)

Cute.



The reordering tool is likely safer to run across the board, especially
if we can determine the very small subset it shouldn't be run on.

go, the gen* files, perhaps a few others.  Blacklisting and running
regularly is probably the way to go then.




Right now it triggers off the presence of system.h: if system.h is not
present, it won't do anything to the file.  I haven't tried running it
against *.c to see if there are any other failures; perhaps that's not a
bad idea.  That will also provide us with a list of files which have
headers included within conditional compilation (there are a few of
those :-P) and maybe they could be fixed.  By default it won't do
anything to those either.

I didn't know it keyed on system.h.  I'd manually blacklisted testsuite/
but otherwise let it run wild just for giggles.  Knowing it keys on
system.h is helpful in that we don't have to blacklist nearly as much stuff.


And yes, there's a few files with conditional headers.  It wasn't 
terrible and makes a nice todo list for someone new to tackle.




Anyway, if we run it against everything and check it in, then in theory
there isn't any reason you couldn't spot run it at some interval; there
shouldn't be much churn then.

That's the idea and obviously the more automated the better.


Yeah, the reducer still needs some tweaks to be generally runnable, I
think.  In particular, how to deal with externally supplied macros it
can't really see.  I'm still thinking about that one.

Well, the solution is obvious, we continue the move away from
conditionally compiled code so that those macros don't matter in the end :-)






Which reminds me, you ought to add a VMS target to your tests.  The
reducer botched vmsdbgout.c.


That's one of the reasons vmsdbgout.c wasn't in the list of things I
reduced :-)

Ahem, but vmsdbgout.c was part of the commit on Friday...



Back to reordering... the gen files are a bit of a pain too because of
the rtl.h conditional inclusions, which I never really found a good
solution for.  Maybe we should have a brtl.h which is used in concert
with any source which uses bconfig.h.  brtl.h could verify that bconfig.h
has been included and then include those headers it needs, followed by
rtl.h itself, and the tool could confirm the right pairing of
config.h/rtl.h and bconfig.h/brtl.h is used.  Hmm.

I think initially we could blacklist the gen* files.  I'm less concerned
about the generators than I am the compiler proper.


jeff

Re: [PATCH 6/6] Make SRA replace constant-pool loads

2015-11-03 Thread Richard Biener

On Thu, Oct 29, 2015 at 8:18 PM, Alan Lawrence  wrote:
> This has changed quite a bit since the previous revision
> (https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01484.html), mostly due to Ada
> and specifically Ada on ARM.
>
> I didn't find a good alternative to scanning for constant-pool accesses "as we
> go" through the function, and although I didn't find any of these accesses
> being disqualified after we'd found them in C, some awkward constant-pool
> expressions in Ada forced me to disqualify them for a new reason (inability to
> constant-propagate). Hence, I introduced a new bitmap of
> disqualified_constants, rather than an additional pass through the function.
>
> Next, the approach of using a replacement_expr (with the constant value) in
> place of the old replacement_decl, also failed (the decl could be used in an
> expression where substituting in the expr produced ill-formed gimple). Hence,
> constant-pool entries now use replacement_decls just like everything else, for
> which I generate initializers in initialize_constant_pool_replacements.
>
> However, I was able to avoid generating the replacement constants early, and
> to do so late in initialize_constant_pool_replacements; the main trick here
> was to use fold_array_ctor_reference, kindly pointed out by Richie :), to fold
> the MEM_REFs built in analyze_access_subtree.
>
> Finally, I found completely_scalarize was still failing to fold all the
> constant expressions, because Ada was putting VIEW_CONVERT_EXPRs in the
> constant pool, and fold-const.c did not deal with ARRAY_REFs of these.
> However, requiring completely_scalarize to be able to fold everything seemed
> fragile; instead I decoupled from fold-const by allowing it to fail.
>
> With these changes, ssa-dom-cse-2.c is fixed on all platforms with appropriate
> --param.

Hum.  I still wonder why we need all this complication ...  I would expect
that if we simply decided to completely scalarize a constant pool aggregate
load then we can always do that.  SRA should then simply emit the _same_ IL as
it does for other complete scalarization element accesses.  The only difference
is that the result should be simplifiable to omit the element load from the
constant pool.

1) The FRE pass following SRA should do that for you.

2) You should be able to use fold_ctor_reference directly (in place of all
your code in case offset and size are readily available - don't remember
exactly how complete scalarization "walks" elements).  Alternatively use
fold_const_aggregate_ref.

3) You can simplify the stmt SRA generated by simply calling fold_stmt on it,
that will do a bit more (wasted) work compared to 2) but may be easier.

I wouldn't bother with the case where we for some reason do not simplify
the constant pool load.

That is, I'd like to see the patch greatly simplified to just consider constant
pool complete scalarization as if it were a regular variable.

Richard.
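
For readers without the testcase at hand, the kind of source that leads to a constant-pool aggregate load looks roughly like the sketch below. It is an illustration only, not ssa-dom-cse-2.c itself, and the function name is made up. Whether the initializer is really emitted as a constant-pool object and block-copied into the local array depends on the target and options.

```c
#include <assert.h>

/* On some targets the initializer below is emitted as a constant-pool
   object and copied into A as a whole.  If the copy is completely
   scalarized into per-element loads, each load can be folded to the
   known constant and the function reduces to returning 5.  */
int
sum_ends (void)
{
  int a[4] = { 1, 2, 3, 4 };
  return a[0] + a[3];
}
```

The observable result is the same either way; the discussion is only about whether the intermediate loads from the constant pool survive optimization.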

>
> There are still a few remaining test failures, on AArch64:
> gcc.dg/guality/pr54970.c   -O1  line 15 a[0] == 1
> gcc.dg/guality/pr54970.c   -O1  line 20 a[0] == 1
> gcc.dg/guality/pr54970.c   -O1  line 25 a[0] == 1
> (also at -O2, -O2 -flto with/without plugin, -O3 -g and -Os).
> ...which I'm working on. I also tested with the hack at 
> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01483.html . This revealed one 
> new failure in advsimd-intrinsics/vldX.c on AArch64 (fixed by 
> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02777.html), and fixed a bunch 
> of guality failures on ARM:
> gcc.dg/guality/pr54970.c   -O1  line 31 a[0] == 4
> gcc.dg/guality/pr54970.c   -O1  line 36 a[0] == 4
> gcc.dg/guality/pr54970.c   -O1  line 45 a[0] == 4
> gcc.dg/guality/pr54970.c   -O1  line 45 p[-2] == 4
> gcc.dg/guality/pr54970.c   -O1  line 45 q[-1] == 4
> (also at -O2, -O2 -flto with/without plugin, -O3 -g and -Os)
>
> --Alan
>
> gcc/ChangeLog:
>
> * tree-sra.c (disqualified_constants): New.
> (sra_initialize): Initialize it.
> (sra_deinitialize): Deallocate it.
> (disqualify_candidate): Set bit in disqualified_constants.
> (subst_constant_pool_initial): New.
> (create_access): Scan for constant-pool entries as we go.
> (scalarizable_type_p): Disallow types containing placeholders.
> (completely_scalarize): Return bool to allow failure.
> (scalarize_elem): Likewise; check we can generate constant 
> replacements.
> (maybe_add_sra_candidate): Allow constant-pool entries.
> (analyze_access_subtree): Checking-assert that we can fold any refs
> built for constant-pool entries.
> (analyze_all_variable_accesses): Deal with completely_scalarize 
> failing.
> (initialize_constant_pool_replacements): New.
> (sra_modify_function_body): Call previous.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/ssa-dom-cse-2.c: Add sra-max-scalarization-size 
> param,
> remove xfail.
> ---
>  gcc/testsuite/gcc.dg/

Re: [PATCH] Add configure flag for operator new (std::nothrow)

2015-11-03 Thread Aurelio Remonda

On Tue, Nov 3, 2015 at 10:26 AM, Paolo Carlini  wrote:
> Hi,
>
> On 11/03/2015 01:35 PM, Aurelio Remonda wrote:
>>
>> diff --git a/ChangeLog b/ChangeLog
>> index 5b16ca2..a1cd0d3 100644
>> --- a/ChangeLog
>> +++ b/ChangeLog
>> @@ -1,3 +1,12 @@
>> +2015-10-30  Aurelio Remonda  
>> +
>> +   * libstdc++-v3/acinclude.m4: add
>> enable_new_opnt_no_allocation_retry
>> +   flag definition.
>> +   * libstdc++-v3/configure.ac: add option flag
>> +   GLIBCXX_ENABLE_NEW_OPNT_NO_ALLOCATION_RETRY
>> +   * libstdc++-v3/libsupc++/new_opnt.cc use the defined macro
>> +   * libstdc++-v3/doc/xml/manual/configure.xml
>> +
>
> Three minor comments. First, ChangeLog entries aren't normally submitted as
> part of the patch. Second, since the ChangeLog is under libstdc++-v3, the
> ChangeLog entries should not have libstdc++-v3 in the paths (eg, just *
> acinclude.m4: ...).

Ok, so ChangeLog modifications should be another patch?

> Finally, since you are touching acinclude.m4 you should
> normally run autoreconf, mention in the ChangeLog the changed regenerated
> files and eventually commit those changes too (like the ChangeLog entries,
> those aren't normally part of the posted patch).  About the three issues, you
> have plenty of examples in the mailing list.

I have a problem with autoreconf: when I run it with autoconf 2.69 it
says I need exactly autoconf 2.64, so I installed that and tried to run
autoreconf with 2.64, and this is what I get:

aurelio-remonda@Remonda-PC:~/gcc/libstdc++-v3$ autoreconf
configure.ac:74: error: Autoconf version 2.65 or higher is required

You can see I am using my recently installed autoconf:
aurelio-remonda@Remonda-PC:~/gcc/libstdc++-v3$ which autoconf
/home/aurelio-remonda/autoconf-2.64/install/bin/autoconf

I even tried 2.65:
aurelio-remonda@Remonda-PC:~/gcc/libstdc++-v3$ which autoreconf
/home/aurelio-remonda/autoconf-2.65/install/bin/autoreconf

and got this:
aurelio-remonda@Remonda-PC:~/gcc/libstdc++-v3$ autoreconf
configure.ac:4: error: Please use exactly Autoconf 2.64 instead of 2.65.
../config/override.m4:12: _GCC_AUTOCONF_VERSION_CHECK is expanded from...
configure.ac:4: the top level
autom4te: /usr/bin/m4 failed with exit status: 1
aclocal: error: echo failed with exit status: 1
autoreconf: aclocal failed with exit status: 1

So I changed the version here:
/src/gcc/libstdc++-v3/configure.ac:3
AC_PREREQ(2.69)

And here:
/src/config/override.m4:59
m4_if(m4_defn([m4_PACKAGE_VERSION]), [2.69]

Then I ran autoreconf with 2.69 and that worked.
But Jonathan told me that I don't have to make those changes.
-- 
Aurelio Remonda

Software Engineer

San Lorenzo 47, 3rd Floor, Office 5
Córdoba, Argentina
Phone: +54-351-4217888 / 4218211

RE: [PATCH, MIPS, PR/61114] Migrate to reduc_..._scal optabs.

2015-11-03 Thread Moore, Catherine



> -Original Message-
> From: Simon Dardis [mailto:simon.dar...@imgtec.com]
> Sent: Wednesday, October 07, 2015 6:51 AM
> To: Alan Lawrence; Matthew Fortune; Moore, Catherine
> Cc: gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH, MIPS, PR/61114] Migrate to reduc_..._scal optabs.
> 
> On the change from smin/smax it was a deliberate change as I managed to
> confuse myself of the mode patterns, correct version follows. Reverted back
> to VWHB for smax/smin. Stylistic point addressed.
> 
> No new regression, ok for commit?
> 

Yes, OK to commit.  Sorry for the delay in review.
Catherine

> 
> Index: config/mips/loongson.md
> ===
> --- config/mips/loongson.md   (revision 228282)
> +++ config/mips/loongson.md   (working copy)
> @@ -852,58 +852,66 @@
>"dsrl\t%0,%1,%2"
>[(set_attr "type" "fcvt")])
> 
> -(define_expand "reduc_uplus_"
> -  [(match_operand:VWH 0 "register_operand" "")
> -   (match_operand:VWH 1 "register_operand" "")]
> +(define_insn "vec_loongson_extract_lo_"
> +  [(set (match_operand: 0 "register_operand" "=r")
> +(vec_select:
> +  (match_operand:VWHB 1 "register_operand" "f")
> +  (parallel [(const_int 0)])))]
>"TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
> -{
> -  mips_expand_vec_reduc (operands[0], operands[1], gen_add3);
> -  DONE;
> -})
> +  "mfc1\t%0,%1"
> +  [(set_attr "type" "mfc")])
> 
> -; ??? Given that we're not describing a widening reduction, we should
> -; not have separate optabs for signed and unsigned.
> -(define_expand "reduc_splus_"
> -  [(match_operand:VWHB 0 "register_operand" "")
> +(define_expand "reduc_plus_scal_"
> +  [(match_operand: 0 "register_operand" "")
> (match_operand:VWHB 1 "register_operand" "")]
>"TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
>  {
> -  emit_insn (gen_reduc_uplus_(operands[0], operands[1]));
> +  rtx tmp = gen_reg_rtx (GET_MODE (operands[1]));
> +  mips_expand_vec_reduc (tmp, operands[1], gen_add3);
> +  emit_insn (gen_vec_loongson_extract_lo_ (operands[0], tmp));
>DONE;
>  })
> 
> -(define_expand "reduc_smax_"
> -  [(match_operand:VWHB 0 "register_operand" "")
> +(define_expand "reduc_smax_scal_"
> +  [(match_operand: 0 "register_operand" "")
> (match_operand:VWHB 1 "register_operand" "")]
>"TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
>  {
> -  mips_expand_vec_reduc (operands[0], operands[1], gen_smax3);
> +  rtx tmp = gen_reg_rtx (GET_MODE (operands[1]));
> +  mips_expand_vec_reduc (tmp, operands[1], gen_smax3);
> +  emit_insn (gen_vec_loongson_extract_lo_ (operands[0], tmp));
>DONE;
>  })
> 
> -(define_expand "reduc_smin_"
> -  [(match_operand:VWHB 0 "register_operand" "")
> +(define_expand "reduc_smin_scal_"
> +  [(match_operand: 0 "register_operand" "")
> (match_operand:VWHB 1 "register_operand" "")]
>"TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
>  {
> -  mips_expand_vec_reduc (operands[0], operands[1], gen_smin3);
> +  rtx tmp = gen_reg_rtx (GET_MODE (operands[1]));
> +  mips_expand_vec_reduc (tmp, operands[1], gen_smin3);
> +  emit_insn (gen_vec_loongson_extract_lo_ (operands[0], tmp));
>DONE;
>  })
> 
> -(define_expand "reduc_umax_"
> -  [(match_operand:VB 0 "register_operand" "")
> +(define_expand "reduc_umax_scal_"
> +  [(match_operand: 0 "register_operand" "")
> (match_operand:VB 1 "register_operand" "")]
>"TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
>  {
> -  mips_expand_vec_reduc (operands[0], operands[1],
> gen_umax3);
> +  rtx tmp = gen_reg_rtx (GET_MODE (operands[1]));
> +  mips_expand_vec_reduc (tmp, operands[1], gen_umax3);
> +  emit_insn (gen_vec_loongson_extract_lo_ (operands[0], tmp));
>DONE;
>  })
> 
> -(define_expand "reduc_umin_"
> -  [(match_operand:VB 0 "register_operand" "")
> +(define_expand "reduc_umin_scal_"
> +  [(match_operand: 0 "register_operand" "")
> (match_operand:VB 1 "register_operand" "")]
>"TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
>  {
> -  mips_expand_vec_reduc (operands[0], operands[1], gen_umin3);
> +  rtx tmp = gen_reg_rtx (GET_MODE (operands[1]));
> +  mips_expand_vec_reduc (tmp, operands[1], gen_umin3);
> +  emit_insn (gen_vec_loongson_extract_lo_ (operands[0], tmp));
>DONE;
>  })
> 
> 
> -Original Message-
> From: Alan Lawrence [mailto:alan.lawre...@arm.com]
> Sent: 06 October 2015 11:12
> To: Simon Dardis; Matthew Fortune; Moore, Catherine
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH, MIPS, PR/61114] Migrate to reduc_..._scal optabs.
> 
> Thanks for working on this, Simon!
> 
> On 01/10/15 15:43, Simon Dardis wrote:
> > -(define_expand "reduc_smax_"
> > -  [(match_operand:VWHB 0 "register_operand" "")
> > -   (match_operand:VWHB 1 "register_operand" "")]
> > +(define_expand "reduc_smax_scal_"
> > +  [(match_operand:HI 0 "register_operand" "")
> > +   (match_operand:VH 1 "register_operand" "")]
> 
> 
> > -(define_expand "reduc_smin_"
> > -  [(match_operand:VWHB 0 "register_operand" "")
> > -   (match_ope

Re: [PATCH] Add configure flag for operator new (std::nothrow)

2015-11-03 Thread Andreas Schwab

Aurelio Remonda  writes:

> aurelio-remonda@Remonda-PC:~/gcc/libstdc++-v3$ autoreconf
> configure.ac:74: error: Autoconf version 2.65 or higher is required

Make sure you have automake 1.11.6.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

Re: [PATCH] Pass manager: add support for termination of pass list

2015-11-03 Thread Martin Liška

On 11/03/2015 02:46 PM, Richard Biener wrote:
> On Fri, Oct 30, 2015 at 1:53 PM, Martin Liška  wrote:
>> On 10/30/2015 01:13 PM, Richard Biener wrote:
>>> So I suggest to do the push/pop of cfun there.
>>> do_per_function_toporder can be made static btw.
>>>
>>> Richard.
>>
>> Right, I've done that and it works (bootstrap is currently running),
>> feasible for HSA branch too.
>>
>> tree-pass.h:
>>
>> /* Declare for plugins.  */
>> extern void do_per_function_toporder (void (*) (function *, void *), void *);
>>
>> Attaching the patch that I'm going to test.
> 
> Err.
> 
> +  cgraph_node::get (current_function_decl)->release_body ();
> +
> +  current_function_decl = NULL;
> +  set_cfun (NULL);
> 
> I'd have expected
> 
>   tree fn = cfun->decl;
>   pop_cfun ();
>   gcc_assert (!cfun);
>   cgraph_node::get (fn)->release_body ();
> 
> here.

Yeah, that works, but we have to add the following hunk:

diff --git a/gcc/function.c b/gcc/function.c
index aaf49a4..4718fe1 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -4756,6 +4756,13 @@ push_cfun (struct function *new_cfun)
 void
 pop_cfun (void)
 {
+  if (cfun_stack.is_empty ())
+{
+  set_cfun (NULL);
+  current_function_decl = NULL_TREE;
+  return;
+}
+
   struct function *new_cfun = cfun_stack.pop ();
   /* When in_dummy_function, we do have a cfun but current_function_decl is
  NULL.  We also allow pushing NULL cfun and subsequently changing


If you are fine with that, it looks like we've fixed all issues related to the
change, right?
An updated version of the patch is attached.
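
To illustrate why the extra hunk is needed, here is a minimal stand-alone sketch of the push/pop discipline. It is not GCC code: the toy_* names and the fixed-size stack are assumptions made up for the example, and only the shape of push_cfun/pop_cfun is mirrored. The per-function driver pushes once, the discarding pass pops once, and the driver's own pop then finds an empty stack.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for GCC's function context.  All names are hypothetical
   and only mirror the shape of push_cfun/pop_cfun.  */
struct toy_function { const char *name; };

#define TOY_STACK_MAX 16

static struct toy_function *toy_cfun;                 /* current context */
static struct toy_function *toy_stack[TOY_STACK_MAX]; /* saved contexts */
static size_t toy_depth;

/* Save the current context and switch to NEW_FN.  */
static void
toy_push_cfun (struct toy_function *new_fn)
{
  assert (toy_depth < TOY_STACK_MAX);
  toy_stack[toy_depth++] = toy_cfun;
  toy_cfun = new_fn;
}

/* Restore the previous context.  With an empty stack there is nothing
   to restore, so just clear the current context, as in the hunk.  */
static void
toy_pop_cfun (void)
{
  if (toy_depth == 0)
    {
      toy_cfun = NULL;
      return;
    }
  toy_cfun = toy_stack[--toy_depth];
}

/* Returns 1 if the discard sequence leaves no current function.  */
static int
toy_discard_demo (void)
{
  struct toy_function fn = { "foo" };
  toy_cfun = NULL;
  toy_depth = 0;
  toy_push_cfun (&fn);   /* done by the per-function driver */
  toy_pop_cfun ();       /* done when a pass discards the function */
  int cleared = (toy_cfun == NULL);
  toy_pop_cfun ();       /* the driver's own pop: the stack is empty now */
  return cleared && toy_cfun == NULL;
}
```

Without the early return, the second pop in this sketch would read below the bottom of the stack.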

Martin

> 
>> Martin
>>

>From 0921507773eedadca1216a9edca954af240b7a49 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 22 Oct 2015 12:46:16 +0200
Subject: [PATCH] Pass manager: add support for termination of pass list

gcc/ChangeLog:

2015-10-30  Martin Liska  

	* function.c (pop_cfun): Set cfun and current_function_decl to
	NULL if the cfun_stack is empty.
	* passes.c (do_per_function_toporder): Push to cfun before
	calling the pass manager.
	(execute_one_pass): Handle TODO_discard_function.
	(execute_pass_list_1): Terminate if current function is null.
	(execute_pass_list): Do not push and pop function.
	* tree-pass.h: Define new TODO_discard_function.
---
 gcc/function.c  |  7 +++
 gcc/passes.c| 32 
 gcc/tree-pass.h |  3 +++
 4 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/gcc/function.c b/gcc/function.c
index aaf49a4..4718fe1 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -4756,6 +4756,13 @@ push_cfun (struct function *new_cfun)
 void
 pop_cfun (void)
 {
+  if (cfun_stack.is_empty ())
+{
+  set_cfun (NULL);
+  current_function_decl = NULL_TREE;
+  return;
+}
+
   struct function *new_cfun = cfun_stack.pop ();
   /* When in_dummy_function, we do have a cfun but current_function_decl is
  NULL.  We also allow pushing NULL cfun and subsequently changing
diff --git a/gcc/passes.c b/gcc/passes.c
index d9af93a..d764a22 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -1706,7 +1706,12 @@ do_per_function_toporder (void (*callback) (function *, void *data), void *data)
 	  order[i] = NULL;
 	  node->process = 0;
 	  if (node->has_gimple_body_p ())
-	callback (DECL_STRUCT_FUNCTION (node->decl), data);
+	{
+	  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
+	  push_cfun (fn);
+	  callback (fn, data);
+	  pop_cfun ();
+	}
 	}
   symtab->remove_cgraph_removal_hook (hook);
 }
@@ -2347,6 +2352,23 @@ execute_one_pass (opt_pass *pass)
 
   current_pass = NULL;
 
+  if (todo_after & TODO_discard_function)
+{
+  gcc_assert (cfun);
+  /* As cgraph_node::release_body expects dominators info to be released,
+	 we have to release it.  */
+  if (dom_info_available_p (CDI_DOMINATORS))
+	free_dominance_info (CDI_DOMINATORS);
+
+  if (dom_info_available_p (CDI_POST_DOMINATORS))
+	free_dominance_info (CDI_POST_DOMINATORS);
+
+  tree fn = cfun->decl;
+  pop_cfun ();
+  gcc_assert (!cfun);
+  cgraph_node::get (fn)->release_body ();
+}
+
   /* Signal this is a suitable GC collection point.  */
   if (!((todo_after | pass->todo_flags_finish) & TODO_do_not_ggc_collect))
 ggc_collect ();
@@ -2361,6 +2383,9 @@ execute_pass_list_1 (opt_pass *pass)
 {
   gcc_assert (pass->type == GIMPLE_PASS
 		  || pass->type == RTL_PASS);
+
+  if (cfun == NULL)
+	return;
   if (execute_one_pass (pass) && pass->sub)
 execute_pass_list_1 (pass->sub);
 
@@ -2372,14 +2397,13 @@ execute_pass_list_1 (opt_pass *pass)
 void
 execute_pass_list (function *fn, opt_pass *pass)
 {
-  push_cfun (fn);
+  gcc_assert (fn == cfun);
   execute_pass_list_1 (pass);
-  if (fn->cfg)
+  if (cfun && fn->cfg)
 {
   free_dominance_info (CDI_DOMINATORS);
   free_dominance_info (CDI_POST_DOMINATORS);
 }
-  pop_cfun ();
 }
 
 /* Write out all LTO data.  */
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index c03e

[PATCH] Fix typo.

2015-11-03 Thread Yulia Koval

Hi,

This patch fixes a typo: PROCESSOT -> PROCESSOR. Ok for trunk?

Yulia


patch
Description: Binary data

Re: [PATCH] remove unused config/arm/coff.h

2015-11-03 Thread Richard Earnshaw

On 03/11/15 13:49, Jeff Law wrote:
> On 11/03/2015 04:31 AM, tbsaunde+...@tbsaunde.org wrote:
>> From: Trevor Saunders 
>>
>> Hi,
>>
>> $subject, nothing refers to this header so we might as well remove it.
>>
>> tested I can still build on x86_64-linux-gnu, not that I would expect
>> anything
>> else or that it is particularly relevant, ok?
>>
>> Trev
>>
>> gcc/ChangeLog:
>>
>> 2015-11-03  Trevor Saunders  
>>
>> * config/arm/coff.h: Remove.
> More generally, if we have a header file that's not used, I'd consider
> removing it to be obvious-enough to commit without approval.
> 
> We could/should probably do the same with unused functions, with the
> only wrinkle being things that are useful for debugging but which are
> otherwise unused should be kept around.
> 

I'd go as far as to say that such functions should be commented to that
effect.

R.

> jeff
>

Re: [patch 4/3] Header file reduction - Tools for contrib - second cut

2015-11-03 Thread Andrew MacLeod


On 11/03/2015 09:00 AM, Jeff Law wrote:



yeah, the reducer still needs some tweaks to be generally runnable I
think.   In particular, how to deal with externally supplied macros it
can't really see.  I'm still thinking about that one.
Well, the solution is obvious, we continue the move away from 
conditionally compiled code so that those macros don't matter in the 
end :-)


yeah but in the meantime it's an issue.   I *think* I can simply provide
the tool with a set of macros to define on the build command whenever it
tries building a file.. we'll see.


 It should also be possible to extract, after reduction, a list of
macros that were used in the source file in conditional compilation, but
which never saw a definition in any of the files. That could also be
useful information.  In fact, that could be a stand-alone analysis
pretty easily I think...
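
A rough sketch of what such a stand-alone analysis might look like, assuming a single in-memory source buffer. It is not part of the contrib tools; the function and type names are invented for illustration. It ignores comments, line continuations, included headers and macros supplied with -D, so treat it as a starting point only.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAMES 64
#define MAX_LEN 64

/* Small fixed-size set of identifier names (hypothetical helper type).  */
struct name_set
{
  char names[MAX_NAMES][MAX_LEN];
  size_t count;
};

static int
set_contains (const struct name_set *s, const char *name)
{
  for (size_t i = 0; i < s->count; i++)
    if (strcmp (s->names[i], name) == 0)
      return 1;
  return 0;
}

static void
set_add (struct name_set *s, const char *name)
{
  /* NAME is always shorter than MAX_LEN because read_ident truncates.  */
  if (!set_contains (s, name) && s->count < MAX_NAMES)
    strcpy (s->names[s->count++], name);
}

/* If *P starts an identifier, copy it to BUF, advance *P, return 1.  */
static int
read_ident (const char **p, char *buf)
{
  const char *s = *p;
  size_t n = 0;
  if (!isalpha ((unsigned char) *s) && *s != '_')
    return 0;
  for (; isalnum ((unsigned char) *s) || *s == '_'; s++)
    if (n < MAX_LEN - 1)
      buf[n++] = *s;
  buf[n] = '\0';
  *p = s;
  return 1;
}

/* Collect into OUT every macro tested by #if/#elif/#ifdef/#ifndef in SRC
   that has no #define anywhere in SRC.  Returns the number found.  */
static size_t
undefined_conditional_macros (const char *src, struct name_set *out)
{
  struct name_set used = { .count = 0 }, defined = { .count = 0 };
  char word[MAX_LEN];

  for (const char *line = src; *line; )
    {
      const char *end = strchr (line, '\n');
      if (end == NULL)
        end = line + strlen (line);
      const char *p = line;
      while (p < end && (*p == ' ' || *p == '\t'))
        p++;
      if (p < end && *p == '#')
        {
          p++;
          while (p < end && (*p == ' ' || *p == '\t'))
            p++;
          if (read_ident (&p, word))
            {
              int is_define = strcmp (word, "define") == 0;
              int is_ifdef = (strcmp (word, "ifdef") == 0
                              || strcmp (word, "ifndef") == 0);
              int is_if = (strcmp (word, "if") == 0
                           || strcmp (word, "elif") == 0);
              while ((is_define || is_ifdef || is_if) && p < end)
                {
                  if (read_ident (&p, word))
                    {
                      if (is_define)
                        set_add (&defined, word);
                      else if (strcmp (word, "defined") != 0)
                        set_add (&used, word);
                      if (!is_if)
                        break;  /* only the first name matters here */
                    }
                  else if (isdigit ((unsigned char) *p))
                    {
                      /* Skip numbers such as 0x10 as a whole.  */
                      while (p < end && isalnum ((unsigned char) *p))
                        p++;
                    }
                  else
                    p++;
                }
            }
        }
      line = (*end != '\0') ? end + 1 : end;
    }

  out->count = 0;
  for (size_t i = 0; i < used.count; i++)
    if (!set_contains (&defined, used.names[i]))
      set_add (out, used.names[i]);
  return out->count;
}
```

Run over the concatenation of a reduced source file and the headers it still includes, the result would be the list of macros that must come from the build command or the target configuration.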





Which reminds me, you ought to add a VMS target to your tests.  The
reducer botched vmsdbgout.c.


That's one of the reasons vmsdbgout.c wasn't in the list of things I
reduced :-)

Ahem, but vmsdbgout.c was part of the commit on Friday...


ahh oops. it snuck back in over time :-P  sorry.





back to reordering...  the gen files are a bit of a pain too because of
the rtl.h conditional inclusions.. which I never really found a good
solution for...   maybe we should have a brtl.h which is used in concert
with any source which uses bconfig.h.. brtl.h could verify bconfig.h
has been included and then include those headers it needs, followed by
rtl.h itself.. and the tool could confirm the right pairing of
config.h/rtl.h  bconfig.h/brtl.h   is used.   hmm.
I think initially we could blacklist the gen* files.  I'm less 
concerned about the generators than I am the compiler proper.


yeah, it's just annoying from a more abstract level (and results from a
few of the tools)  for rtl.h to have to conditionally include a bunch of 
stuff provided by coretypes.h.



jeff

[gomp4, committed] Implement -foffload-alias

2015-11-03 Thread Tom de Vries


[ was: Re: [gomp4, WIP] Implement -foffload-alias ]

On 28/09/15 17:38, Tom de Vries wrote:

Hi,

this work-in-progress patch implements a new option
-foffload-alias=[none|pointer|all].


The option -foffload-alias=none instructs the compiler to assume that
object references and pointer dereferences in an offload region do not
alias.

The option -foffload-alias=pointer instructs the compiler to assume that
object references in an offload region do not alias.

The option -foffload-alias=all instructs the compiler to make no
assumptions about aliasing in offload regions.

The default value is -foffload-alias=none.


The patch works by adding restrict to the types of the fields used to
pass data to an offloading region.
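
As a rough illustration of the idea, the sketch below shows a hand-written record with restrict-qualified pointer fields. The struct and function names are made up for the example and are not what omp-low.c generates; the sketch only shows where the qualifier sits, and makes no claim about which optimizations it enables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of the kind used to pass data into an offload
   region.  Qualifying the pointer fields with restrict is a promise
   that the objects reached through a, b and c do not overlap.  */
struct toy_offload_data
{
  const float *restrict a;
  const float *restrict b;
  float *restrict c;
  size_t n;
};

/* Stand-in for the outlined body of the offload region.  */
static void
toy_offload_body (struct toy_offload_data *data)
{
  for (size_t i = 0; i < data->n; i++)
    data->c[i] = data->a[i] + data->b[i];
}

/* Runs the body on three distinct arrays and returns the last element.  */
static float
toy_offload_demo (void)
{
  float a[4] = { 1, 2, 3, 4 };
  float b[4] = { 10, 20, 30, 40 };
  float c[4] = { 0 };
  struct toy_offload_data d = { a, b, c, 4 };
  toy_offload_body (&d);
  return c[3];
}
```

Passing overlapping arrays through such a record would break the promise, which is why -foffload-alias=all exists as the conservative setting.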



Updated patch attached, committed to gomp-4_0-branch.


Atm, the kernels-loop-offload-alias-ptr.c test-case passes, but the
kernels-loop-offload-alias-none.c test-case fails.


I've dropped the two testcases from this patch, I'll commit in a 
follow-up patch.



For the latter, the
required amount of restrict is added, but it has no effect. I've
reported this in a more basic form in PR67742: "3rd-level restrict
ignored".


I've committed a fix for that PR as reported here: 
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00204.html .


Furthermore, I've added support for the option in the 'mask & 4' case in 
install_var_field, I ran into this when trying out some Fortran test-cases.


Thanks,
- Tom

Implement -foffload-alias

2015-09-28  Tom de Vries  

	* common.opt (foffload-alias): New option.
	* flag-types.h (enum offload_alias): New enum.
	* omp-low.c (install_var_field): Handle flag_offload_alias.
	* doc/invoke.texi (@item Code Generation Options): Add -foffload-alias.
	(@item -foffload-alias): New item.
---
 gcc/common.opt  | 16 
 gcc/doc/invoke.texi | 11 +++
 gcc/flag-types.h|  7 +++
 gcc/omp-low.c   | 28 ++--
 4 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index c85ab49..135e777 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1738,6 +1738,22 @@ Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_ILP32)
 EnumValue
 Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64)
 
+foffload-alias=
+Common Joined RejectNegative Enum(offload_alias) Var(flag_offload_alias) Init(OFFLOAD_ALIAS_NONE)
+-foffload-alias=[all|pointer|none] Assume non-aliasing in an offload region
+
+Enum
+Name(offload_alias) Type(enum offload_alias) UnknownError(unknown offload aliasing %qs)
+
+EnumValue
+Enum(offload_alias) String(all) Value(OFFLOAD_ALIAS_ALL)
+
+EnumValue
+Enum(offload_alias) String(pointer) Value(OFFLOAD_ALIAS_POINTER)
+
+EnumValue
+Enum(offload_alias) String(none) Value(OFFLOAD_ALIAS_NONE)
+
 fomit-frame-pointer
 Common Report Var(flag_omit_frame_pointer) Optimization
 When possible do not generate stack frames.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5a07512..8967f88 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1142,6 +1142,7 @@ See S/390 and zSeries Options.
 -finstrument-functions-exclude-function-list=@var{sym},@var{sym},@dots{} @gol
 -finstrument-functions-exclude-file-list=@var{file},@var{file},@dots{} @gol
 -fno-common  -fno-ident @gol
+-foffload-alias=@r{[}none@r{|}pointer@r{|}all@r{]} @gol
 -fpcc-struct-return  -fpic  -fPIC -fpie -fPIE -fno-plt @gol
 -fno-jump-tables @gol
 -frecord-gcc-switches @gol
@@ -23842,6 +23843,16 @@ The options @option{-ftrapv} and @option{-fwrapv} override each other, so using
 using @option{-ftrapv} @option{-fwrapv} @option{-fno-wrapv} on the command-line
 results in @option{-ftrapv} being effective.
 
+@item -foffload-alias=@r{[}none@r{|}pointer@r{|}all@r{]}
+@opindex -foffload-alias
+The option @option{-foffload-alias=none} instructs the compiler to assume that
+object references and pointer dereferences in an offload region do not alias.
+The option @option{-foffload-alias=pointer} instructs the compiler to assume that
+object references in an offload region do not alias.  The option
+@option{-foffload-alias=all} instructs the compiler to make no assumptions about
+aliasing in offload regions.  The default value is
+@option{-foffload-alias=none}.
+
 @item -fexceptions
 @opindex fexceptions
 Enable exception handling.  Generates extra code needed to propagate
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index 6301cea..87b1677 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -293,5 +293,12 @@ enum gfc_convert
   GFC_FLAG_CONVERT_LITTLE
 };
 
+enum offload_alias
+{
+  OFFLOAD_ALIAS_ALL,
+  OFFLOAD_ALIAS_POINTER,
+  OFFLOAD_ALIAS_NONE
+};
+
 
 #endif /* ! GCC_FLAG_TYPES_H */
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 3543785..6bac074 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1441,6 +1441,14 @@ install_var_field (tree var, bool by_ref, int mask, omp_context *ctx)
   tree field, type, sfield = NULL_TREE;
   splay_tree_key key = (splay_tree_key) var;
 
+  /* We use flag_offload_alias only for the oacc

[PATCH] Fix declaration of pthread-structs in s-osinte-rtems.ads

2015-11-03 Thread Jan Sommer

Hi,

Let's try again. This time I made the diff against trunk with the changes 
Sebastian recommended, included a ChangeLog and used svn-diff.
If this patch goes through, please let me know how the backporting works.

Best regards,

   Jan

Index: ChangeLog
===
--- ChangeLog   (Revision 229709)
+++ ChangeLog   (Arbeitskopie)
@@ -1,3 +1,10 @@
+
+2015-11-03  Jan Sommer 
+
+   Use opaque types for Ada-declaration of pthread types
+   in Gnat for Rtems.
+   PR ada/68169
+
 2015-10-23  Steve Ellcey  
 
* MAINTAINERS: Update email address.
Index: gcc/ada/s-oscons-tmplt.c
===
--- gcc/ada/s-oscons-tmplt.c(Revision 229709)
+++ gcc/ada/s-oscons-tmplt.c(Arbeitskopie)
@@ -157,7 +157,7 @@ pragma Style_Checks ("M32766");
 # include <_types.h>
 #endif
 
-#if defined (__linux__) || defined (__ANDROID__)
+#if defined (__linux__) || defined (__ANDROID__) || defined (__rtems__)
 # include 
 # include 
 #endif
@@ -1458,7 +1458,7 @@ CNS(CLOCK_RT_Ada, "")
 #endif
 
 #if defined (__APPLE__) || defined (__linux__) || defined (__ANDROID__) \
-  || defined (DUMMY)
+  || defined (__rtems__) || defined (DUMMY)
 /*
 
--  Sizes of pthread data types
@@ -1501,7 +1501,7 @@ CND(PTHREAD_RWLOCKATTR_SIZE, "pthread_rwlockattr_t
 CND(PTHREAD_RWLOCK_SIZE, "pthread_rwlock_t")
 CND(PTHREAD_ONCE_SIZE,   "pthread_once_t")
 
-#endif /* __APPLE__ || __linux__ || __ANDROID__ */
+#endif /* __APPLE__ || __linux__ || __ANDROID__ || __rtems__ */
 
 /*
 
Index: gcc/ada/s-osinte-rtems.ads
===
--- gcc/ada/s-osinte-rtems.ads  (Revision 229709)
+++ gcc/ada/s-osinte-rtems.ads  (Arbeitskopie)
@@ -51,6 +51,8 @@
 --  It is designed to be a bottom-level (leaf) package.
 
 with Interfaces.C;
+with System.OS_Constants;
+
 package System.OS_Interface is
pragma Preelaborate;
 
@@ -60,6 +62,7 @@ package System.OS_Interface is
subtype rtems_id   is Interfaces.C.unsigned;
 
subtype intis Interfaces.C.int;
+   subtype char   is Interfaces.C.char;
subtype short  is Interfaces.C.short;
subtype long   is Interfaces.C.long;
subtype unsigned   is Interfaces.C.unsigned;
@@ -68,7 +71,6 @@ package System.OS_Interface is
subtype unsigned_char  is Interfaces.C.unsigned_char;
subtype plain_char is Interfaces.C.plain_char;
subtype size_t is Interfaces.C.size_t;
-
---
-- Errno --
---
@@ -76,11 +78,11 @@ package System.OS_Interface is
function errno return int;
pragma Import (C, errno, "__get_errno");
 
-   EAGAIN: constant := 11;
-   EINTR : constant := 4;
-   EINVAL: constant := 22;
-   ENOMEM: constant := 12;
-   ETIMEDOUT : constant := 116;
+   EAGAIN: constant := System.OS_Constants.EAGAIN;
+   EINTR : constant := System.OS_Constants.EINTR;
+   EINVAL: constant := System.OS_Constants.EINVAL;
+   ENOMEM: constant := System.OS_Constants.ENOMEM;
+   ETIMEDOUT : constant := System.OS_Constants.ETIMEDOUT;
 
-
-- Signals --
@@ -448,6 +450,7 @@ package System.OS_Interface is
   ss_low_priority : int;
   ss_replenish_period : timespec;
   ss_initial_budget   : timespec;
+  sched_ss_max_repl   : int;
end record;
pragma Convention (C, struct_sched_param);
 
@@ -621,43 +624,34 @@ private
end record;
pragma Convention (C, timespec);
 
-   CLOCK_REALTIME :  constant clockid_t := 1;
-   CLOCK_MONOTONIC : constant clockid_t := 4;
+   CLOCK_REALTIME :  constant clockid_t := System.OS_Constants.CLOCK_REALTIME;
+   CLOCK_MONOTONIC : constant clockid_t := System.OS_Constants.CLOCK_MONOTONIC;
 
+   subtype char_array is Interfaces.C.char_array;
+
type pthread_attr_t is record
-  is_initialized  : int;
-  stackaddr   : System.Address;
-  stacksize   : int;
-  contentionscope : int;
-  inheritsched: int;
-  schedpolicy : int;
-  schedparam  : struct_sched_param;
-  cputime_clocked_allowed : int;
-  detatchstate: int;
+  Data : char_array (1 .. OS_Constants.PTHREAD_ATTR_SIZE);
end record;
pragma Convention (C, pthread_attr_t);
+   for pthread_attr_t'Alignment use Interfaces.C.double'Alignment;
 
type pthread_condattr_t is record
-  flags   : int;
-  process_shared  : int;
+  Data : char_array (1 .. OS_Constants.PTHREAD_CONDATTR_SIZE);
end record;
pragma Convention (C, pthread_condattr_t);
+   for pthread_condattr_t'Alignment use Interfaces.C.double'Alignment;
 
type pthread_mutexattr_t is record
-  is_initialized  : int;
-  process_shared  : int;
-  prio_ceiling: int;
-  protocol: int;
-  mutex_type  : int;
-  recursive   : int;
-   end record;
+  Data : char_array (1 .. OS_Constants.PTHREAD_MUTEXATTR_SIZE);

[PATCH] PR 68192 Export AIX TLS symbols

2015-11-03 Thread David Edelsohn

TLS symbols in AIX display a new, different symbol type in nm output.
Libtool explicitly creates a list of exported symbols for shared
libraries using nm and does not recognize the new TLS symbols, so
those symbols are not exported.

This is a regression for TLS support on AIX.

This patch updates libtool.m4 in GCC and configure for libstdc++-v3,
libgfortran, and libgomp.  I would like to apply the patch to GCC
while I simultaneously work with the Libtool community to correct the
bug upstream.  I also would like to backport this to GCC 5.2 and GCC
4.9.x.

I have not been able to run the correct versions of autoconf to
regenerate configure directly.  I either can edit the files directly
or I would appreciate someone helping me to regenerate configure in
all library directories.

Bootstrapped on powerpc-ibm-aix7.1.0.0.

* libtool.m4 (export_symbols_cmds) [AIX]: Add global TLS "L" symbols.
* libstdc++-v3/configure: Regenerate.
* libgfortran/configure: Regenerate.
* libgomp/configure: Regenerate.

Thanks, David


ZZ
Description: Binary data

Re: [gomp4 06/14] omp-low: copy omp_data_o to shared memory on NVPTX

2015-11-03 Thread Alexander Monakov

Hello,

Here's an alternative patch that does not depend on exposure of shared-memory
address space, and does not try to use pass_late_lower_omp.  It's based on
Bernd's suggestion to transform

  (use .omp_data_o)
  GOMP_parallel (fn, &omp_data_o, ...);
  .omp_data_o = {CLOBBER};

to

  .omp_data_o_ptr = __internal_omp_alloc_shared (&.omp_data_o, sizeof ...);
  (use (*.omp_data_o_ptr) instead of .omp_data_o)
  GOMP_parallel (fn, .omp_data_o_ptr, ...);
  __internal_omp_free_shared (.omp_data_o_ptr);
  .omp_data_o = {CLOBBER};

Every target except nvptx can lower free_shared to nothing and alloc_shared to
just returning the first argument, and nvptx can select storage in shared
memory or global memory.  For now it simply uses malloc/free.
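
The transformation can be sketched in plain C as follows. This is only a model: toy_alloc_shared, toy_free_shared and the TOY_TARGET_NEEDS_SHARED_COPY macro are hypothetical names standing in for the internal functions and the eventual target hook, and the direct call to toy_child_fn stands in for GOMP_parallel.

```c
#include <assert.h>
#include <stdlib.h>

/* TOY_TARGET_NEEDS_SHARED_COPY is a hypothetical switch that plays the
   role of the target hook: defined for an nvptx-like target, undefined
   for everything else.  */

/* Return storage for the data record that other threads can reach.  */
static void *
toy_alloc_shared (void *local, size_t size)
{
#ifdef TOY_TARGET_NEEDS_SHARED_COPY
  (void) local;
  void *p = malloc (size);
  assert (p != NULL);
  return p;
#else
  (void) size;
  return local;   /* the automatic variable is already reachable */
#endif
}

/* Release storage obtained from toy_alloc_shared.  */
static void
toy_free_shared (void *ptr)
{
#ifdef TOY_TARGET_NEEDS_SHARED_COPY
  free (ptr);
#else
  (void) ptr;     /* nothing to do */
#endif
}

struct toy_shared_data { int x; int result; };

/* Stand-in for the outlined child function.  */
static void
toy_child_fn (void *arg)
{
  struct toy_shared_data *d = arg;
  d->result = d->x * 2;
}

static int
toy_parallel_demo (int x)
{
  struct toy_shared_data omp_data_o;
  struct toy_shared_data *omp_data_o_ptr
    = toy_alloc_shared (&omp_data_o, sizeof omp_data_o);
  omp_data_o_ptr->x = x;          /* use *ptr instead of omp_data_o */
  toy_child_fn (omp_data_o_ptr);  /* stands in for GOMP_parallel */
  int result = omp_data_o_ptr->result;
  toy_free_shared (omp_data_o_ptr);
  return result;
}
```

Because every access goes through the pointer, the same caller code works whether the storage is the automatic variable or a separate allocation.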

Sanity-checked by running the libgomp testsuite.  I realize the #ifdef in
internal-fn.c is not appropriate: it's there to make the patch smaller, I'll
replace it with a target hook if otherwise this approach is ok.

Thanks.
Alexander

diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index bf0f23e..3145a8d 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -175,6 +175,38 @@ expand_GOMP_SIMD_LAST_LANE (gcall *)
   gcc_unreachable ();
 }
 
+static void
+expand_GOMP_ALLOC_SHARED (gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+
+  /* XXX PoC only, needs to be a target hook.  */
+#ifdef GCC_NVPTX_H
+  tree fndecl = builtin_decl_explicit (BUILT_IN_MALLOC);
+  tree t = build_call_expr (fndecl, 1, gimple_call_arg (stmt, 1));
+
+  expand_call (t, target, 0);
+#else
+  tree rhs = gimple_call_arg (stmt, 0);
+
+  rtx src = expand_normal (rhs);
+
+  emit_move_insn (target, src);
+#endif
+}
+
+static void
+expand_GOMP_FREE_SHARED (gcall *stmt)
+{
+#ifdef GCC_NVPTX_H
+  tree fndecl = builtin_decl_explicit (BUILT_IN_FREE);
+  tree t = build_call_expr (fndecl, 1, gimple_call_arg (stmt, 0));
+
+  expand_call (t, NULL_RTX, 1);
+#endif
+}
+
 /* This should get expanded in the sanopt pass.  */
 
 static void
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 0db03f1..0c8e76a 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -44,6 +44,8 @@ DEF_INTERNAL_FN (STORE_LANES, ECF_CONST | ECF_LEAF, NULL)
 DEF_INTERNAL_FN (GOMP_SIMD_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMD_VF, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMD_LAST_LANE, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (GOMP_ALLOC_SHARED, ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (GOMP_FREE_SHARED, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (LOOP_VECTORIZED, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (MASK_LOAD, ECF_PURE | ECF_LEAF, NULL)
 DEF_INTERNAL_FN (MASK_STORE, ECF_LEAF, NULL)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 696889d..225bf20 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -5870,7 +5870,8 @@ expand_omp_taskreg (struct omp_region *region)
 a function call that has been inlined, the original PARM_DECL
 .OMP_DATA_I may have been converted into a different local
 variable.  In which case, we need to keep the assignment.  */
-  if (gimple_omp_taskreg_data_arg (entry_stmt))
+  tree data_arg = gimple_omp_taskreg_data_arg (entry_stmt);
+  if (data_arg)
{
  basic_block entry_succ_bb
= single_succ_p (entry_bb) ? single_succ (entry_bb)
@@ -5894,9 +5895,10 @@ expand_omp_taskreg (struct omp_region *region)
  /* We're ignore the subcode because we're
 effectively doing a STRIP_NOPS.  */
 
- if (TREE_CODE (arg) == ADDR_EXPR
- && TREE_OPERAND (arg, 0)
-   == gimple_omp_taskreg_data_arg (entry_stmt))
+ if ((TREE_CODE (arg) == ADDR_EXPR
+  && TREE_OPERAND (arg, 0) == data_arg)
+ || (TREE_CODE (data_arg) == INDIRECT_REF
+ && TREE_OPERAND (data_arg, 0) == arg))
{
  parcopy_stmt = stmt;
  break;
@@ -11835,27 +11837,44 @@ lower_omp_taskreg (gimple_stmt_iterator *gsi_p, 
omp_context *ctx)
   record_vars_into (ctx->block_vars, child_fn);
   record_vars_into (gimple_bind_vars (par_bind), child_fn);
 
+  ilist = NULL;
+  tree sender_decl = NULL_TREE;
+
   if (ctx->record_type)
 {
-  ctx->sender_decl
+  sender_decl
= create_tmp_var (ctx->srecord_type ? ctx->srecord_type
  : ctx->record_type, ".omp_data_o");
-  DECL_NAMELESS (ctx->sender_decl) = 1;
-  TREE_ADDRESSABLE (ctx->sender_decl) = 1;
+  DECL_NAMELESS (sender_decl) = 1;
+  TREE_ADDRESSABLE (sender_decl) = 1;
+
+  /* Instead of using the automatic variable .omp_data_o directly, build
+ .omp_data_o_ptr = GOMP_ALLOC_SHARED (&.omp_data_o, sizeof .omp_data_o)
+ ... and replace SENDER_DECL with indirect ref *.omp_data_o_ptr.

[PATCH][i386]Migrate reduction optabs to reduc__scal

2015-11-03 Thread Alan Lawrence

This migrates the various reduction optabs in sse.md to use the reduce-to-scalar
form. I took the straightforward approach (equivalent to the migration code in
expr.c/optabs.c) of generating a vector temporary, using the existing code to
reduce to that, and extracting lane 0, in each pattern.
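
In scalar C terms the approach amounts to the sketch below. The toy_* names and the 4-lane struct are assumptions for illustration, not the sse.md expanders: the old-style reduction leaves its result in a vector, and the new-style wrapper reduces into a temporary and returns lane 0.

```c
#include <assert.h>

#define TOY_LANES 4

/* Hypothetical 4-lane float vector.  */
typedef struct { float lane[TOY_LANES]; } toy_v4sf;

/* Old-style reduction: the result is a vector whose lane 0 holds the
   sum.  The other lanes hold partial sums in this model.  */
static toy_v4sf
toy_reduc_splus (toy_v4sf v)
{
  toy_v4sf r = v;
  /* Fold the upper half onto the lower half, halving the width.  */
  for (int width = TOY_LANES / 2; width >= 1; width /= 2)
    for (int i = 0; i < width; i++)
      r.lane[i] += r.lane[i + width];
  return r;
}

/* Model of a vec_extract: read one lane of a vector.  */
static float
toy_vec_extract (toy_v4sf v, int lane)
{
  assert (lane >= 0 && lane < TOY_LANES);
  return v.lane[lane];
}

/* New-style reduction: reduce into a vector temporary, then extract
   lane 0 as the scalar result.  */
static float
toy_reduc_plus_scal (toy_v4sf v)
{
  toy_v4sf tmp = toy_reduc_splus (v);
  return toy_vec_extract (tmp, 0);
}
```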

Bootstrapped + check-gcc + check-g++.

Ok for trunk?

gcc/ChangeLog:

* config/i386/sse.md (reduc_splus_v8df): Rename to...
(reduc_plus_scal_v8df): ...here; reduce to temp and extract scalar.

(reduc_splus_v4df): Rename to...
(reduc_plus_scal_v4df): ...here; reduce to temp and extract scalar.

(reduc_splus_v2df): Rename to...
(reduc_plus_scal_v2df): ...here; reduce to temp and extract scalar.

(reduc_splus_v16sf): Rename to...
(reduc_plus_scal_v16sf): ...here; reduce to temp and extract scalar.

(reduc_splus_v8sf): Rename to...
(reduc_plus_scal_v8sf): ...here; reduce to temp and extract scalar.

(reduc_splus_v4sf): Rename to...
(reduc_plus_scal_v4sf): ...here; reduce to temp and extract scalar.

(reduc__, all 3 variants): Rename each to...
(reduc__scal_): ...here; reduce to temp and extract scalar.

(reduc_umin_v8hf): Rename to...
(reduc_umin_scal_v8hf): ...here; reduce to temp and extract scalar.
---
 gcc/config/i386/sse.md | 82 +++---
 1 file changed, 51 insertions(+), 31 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 43dcc6a..041e514 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -2424,73 +2424,85 @@
(set_attr "prefix_rep" "1,*")
(set_attr "mode" "V4SF")])
 
-(define_expand "reduc_splus_v8df"
-  [(match_operand:V8DF 0 "register_operand")
+(define_expand "reduc_plus_scal_v8df"
+  [(match_operand:DF 0 "register_operand")
(match_operand:V8DF 1 "register_operand")]
   "TARGET_AVX512F"
 {
-  ix86_expand_reduc (gen_addv8df3, operands[0], operands[1]);
+  rtx tmp = gen_reg_rtx (V8DFmode);
+  ix86_expand_reduc (gen_addv8df3, tmp, operands[1]);
+  emit_insn (gen_vec_extractv8df (operands[0], tmp, const0_rtx));
   DONE;
 })
 
-(define_expand "reduc_splus_v4df"
-  [(match_operand:V4DF 0 "register_operand")
+(define_expand "reduc_plus_scal_v4df"
+  [(match_operand:DF 0 "register_operand")
(match_operand:V4DF 1 "register_operand")]
   "TARGET_AVX"
 {
   rtx tmp = gen_reg_rtx (V4DFmode);
   rtx tmp2 = gen_reg_rtx (V4DFmode);
+  rtx vec_res = gen_reg_rtx (V4DFmode);
   emit_insn (gen_avx_haddv4df3 (tmp, operands[1], operands[1]));
   emit_insn (gen_avx_vperm2f128v4df3 (tmp2, tmp, tmp, GEN_INT (1)));
-  emit_insn (gen_addv4df3 (operands[0], tmp, tmp2));
+  emit_insn (gen_addv4df3 (vec_res, tmp, tmp2));
+  emit_insn (gen_vec_extractv4df (operands[0], vec_res, const0_rtx));
   DONE;
 })
 
-(define_expand "reduc_splus_v2df"
-  [(match_operand:V2DF 0 "register_operand")
+(define_expand "reduc_plus_scal_v2df"
+  [(match_operand:DF 0 "register_operand")
(match_operand:V2DF 1 "register_operand")]
   "TARGET_SSE3"
 {
-  emit_insn (gen_sse3_haddv2df3 (operands[0], operands[1], operands[1]));
+  rtx tmp = gen_reg_rtx (V2DFmode);
+  emit_insn (gen_sse3_haddv2df3 (tmp, operands[1], operands[1]));
+  emit_insn (gen_vec_extractv2df (operands[0], tmp, const0_rtx));
   DONE;
 })
 
-(define_expand "reduc_splus_v16sf"
-  [(match_operand:V16SF 0 "register_operand")
+(define_expand "reduc_plus_scal_v16sf"
+  [(match_operand:SF 0 "register_operand")
(match_operand:V16SF 1 "register_operand")]
   "TARGET_AVX512F"
 {
-  ix86_expand_reduc (gen_addv16sf3, operands[0], operands[1]);
+  rtx tmp = gen_reg_rtx (V16SFmode);
+  ix86_expand_reduc (gen_addv16sf3, tmp, operands[1]);
+  emit_insn (gen_vec_extractv16sf (operands[0], tmp, const0_rtx));
   DONE;
 })
 
-(define_expand "reduc_splus_v8sf"
-  [(match_operand:V8SF 0 "register_operand")
+(define_expand "reduc_plus_scal_v8sf"
+  [(match_operand:SF 0 "register_operand")
(match_operand:V8SF 1 "register_operand")]
   "TARGET_AVX"
 {
   rtx tmp = gen_reg_rtx (V8SFmode);
   rtx tmp2 = gen_reg_rtx (V8SFmode);
+  rtx vec_res = gen_reg_rtx (V8SFmode);
   emit_insn (gen_avx_haddv8sf3 (tmp, operands[1], operands[1]));
   emit_insn (gen_avx_haddv8sf3 (tmp2, tmp, tmp));
   emit_insn (gen_avx_vperm2f128v8sf3 (tmp, tmp2, tmp2, GEN_INT (1)));
-  emit_insn (gen_addv8sf3 (operands[0], tmp, tmp2));
+  emit_insn (gen_addv8sf3 (vec_res, tmp, tmp2));
+  emit_insn (gen_vec_extractv8sf (operands[0], vec_res, const0_rtx));
   DONE;
 })
 
-(define_expand "reduc_splus_v4sf"
-  [(match_operand:V4SF 0 "register_operand")
+(define_expand "reduc_plus_scal_v4sf"
+  [(match_operand:SF 0 "register_operand")
(match_operand:V4SF 1 "register_operand")]
   "TARGET_SSE"
 {
+  rtx vec_res = gen_reg_rtx (V4SFmode);
   if (TARGET_SSE3)
 {
   rtx tmp = gen_reg_rtx (V4SFmode);
   emit_insn (gen_sse3_haddv4sf3 (tmp, operands[1], operands[1]));
-  emit_insn (gen_sse3_haddv4sf3 (ope

[PATCH/RFTesting][MIPS] Migrate reduction optabs in mips-ps-3d.md

2015-11-03 Thread Alan Lawrence

There are still a few uses of the old reduc_[us](plus|min|max)_ optabs
remaining. This migrates the instances in mips-ps-3d.md.

This seemed straightforward, as mips-ps-3d.md also provides a vec_extractv2sf.
I tried to be conservative and handle all the possible cases for endianness;
this may be overkill. Also, I believe TARGET_MIPS3D implies
TARGET_PAIRED_SINGLE_FLOAT.

I've built stage-1 compilers for mips64el and mips64, and verified that both
did at least compile the affected code. However, I haven't tested any
further, as I'm not sure what command-line options etc. to use for this. I wonder
if one of the kind MIPS folk would be able to assist with testing?

--Alan

gcc/ChangeLog:

* config/mips/mips-ps-3d.md (reduc_splus_v2sf): Remove.
(reduc_plus_scal_v2sf): New.
---
 gcc/config/mips/mips-ps-3d.md | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/mips/mips-ps-3d.md b/gcc/config/mips/mips-ps-3d.md
index 8bc7608..01e6753 100644
--- a/gcc/config/mips/mips-ps-3d.md
+++ b/gcc/config/mips/mips-ps-3d.md
@@ -371,13 +371,17 @@
   [(set_attr "type" "fadd")
(set_attr "mode" "SF")])
 
-(define_insn "reduc_splus_v2sf"
-  [(set (match_operand:V2SF 0 "register_operand" "=f")
-   (unspec:V2SF [(match_operand:V2SF 1 "register_operand" "f")
- (match_dup 1)]
-UNSPEC_ADDR_PS))]
+(define_expand "reduc_plus_scal_v2sf"
+  [(match_operand:SF 0 "register_operand" "=f")
+   (match_operand:V2SF 1 "register_operand" "f")]
   "TARGET_HARD_FLOAT && TARGET_MIPS3D"
-  "")
+  {
+rtx temp = gen_reg_rtx (V2SFmode);
+emit_insn (gen_mips_addr_ps (temp, operands[1], operands[1]));
+rtx lane = BYTES_BIG_ENDIAN ? const1_rtx : const0_rtx;
+emit_insn (gen_vec_extractv2sf (operands[0], temp, lane));
+DONE;
+  })
 
 ; cvt.pw.ps - Floating Point Convert Paired Single to Paired Word
 (define_insn "mips_cvt_pw_ps"
-- 
1.9.1

Re: [PATCH] Fix typo.

2015-11-03 Thread Uros Bizjak

On Tue, Nov 3, 2015 at 3:16 PM, Yulia Koval  wrote:
> Hi,
>
> This patch fixes a typo: PROCESSOT -> PROCESSOR. Ok for trunk?

Trivial patch, OK with a suitable ChangeLog.

Uros.

Re: [0/7] Type promotion pass and elimination of zext/sext

2015-11-03 Thread Richard Biener

On Mon, Nov 2, 2015 at 10:17 AM, Kugan
 wrote:
>
>
> On 29/10/15 02:45, Richard Biener wrote:
>> On Tue, Oct 27, 2015 at 1:50 AM, kugan
>>  wrote:
>>>
>>>
>>> On 23/10/15 01:23, Richard Biener wrote:

 On Thu, Oct 22, 2015 at 12:50 PM, Kugan
  wrote:
>
>
>
> On 21/10/15 23:45, Richard Biener wrote:
>>
>> On Tue, Oct 20, 2015 at 10:03 PM, Kugan
>>  wrote:
>>>
>>>
>>>
>>> On 07/09/15 12:53, Kugan wrote:


 This a new version of the patch posted in
 https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00226.html. I have done
 more testing and spitted the patch to make it more easier to review.
 There are still couple of issues to be addressed and I am working on
 them.

 1. AARCH64 bootstrap now fails with the commit
 94f92c36a83d66a893c3bc6f00a038ba3dbe2a6f. simplify-rtx.c is
 mis-compiled
 in stage2 and fwprop.c is failing. It looks to me that there is a
 latent
 issue which gets exposed my patch. I can also reproduce this in x86_64
 if I use the same PROMOTE_MODE which is used in aarch64 port. For the
 time being, I am using  patch
 0006-temporary-workaround-for-bootstrap-failure-due-to-co.patch as a
 workaround. This meeds to be fixed before the patches are ready to be
 committed.

 2. vector-compare-1.c from c-c++-common/torture fails to assemble with
 -O3 -g Error: unaligned opcodes detected in executable segment. It
 works
 fine if I remove the -g. I am looking into it and it needs to be fixed as
 well.
>>>
>>>
>>> Hi Richard,
>>>
>>> Now that stage 1 is going to close, I would like to get these patches
>>> accepted for stage1. I will try my best to address your review comments
>>> ASAP.
>>
>>
>> Ok, can you make the whole patch series available so I can poke at the
>> implementation a bit?  Please state the revision it was rebased on
>> (or point me to a git/svn branch the work resides on).
>>
>
> Thanks. Please find the patches rebased against trunk@229156. I have
> skipped the test-case readjustment patches.


 Some quick observations.  On x86_64 when building
>>>
>>>
>>> Hi Richard,
>>>
>>> Thanks for the review.
>>>

 short bar (short y);
 int foo (short x)
 {
short y = bar (x) + 15;
return y;
 }

 with -m32 -O2 -mtune=pentiumpro (which ends up promoting HImode regs)
 I get

:
_1 = (int) x_10(D);
_2 = (_1) sext (16);
_11 = bar (_2);
_5 = (int) _11;
_12 = (unsigned int) _5;
_6 = _12 & 65535;
_7 = _6 + 15;
_13 = (int) _7;
_8 = (_13) sext (16);
_9 = (_8) sext (16);
return _9;

 which looks fine but the VRP optimization doesn't trigger for the
 redundant sext
 (ranges are computed correctly but the 2nd extension is not removed).
>
> Thanks for the comments. Please find the attached patches with which I
> am now getting
> cat .192t.optimized
>
> ;; Function foo (foo, funcdef_no=0, decl_uid=1406, cgraph_uid=0,
> symbol_order=0)
>
> foo (short int x)
> {
>   signed int _1;
>   int _2;
>   signed int _5;
>   unsigned int _6;
>   unsigned int _7;
>   signed int _8;
>   int _9;
>   short int _11;
>   unsigned int _12;
>   signed int _13;
>
>   :
>   _1 = (signed int) x_10(D);
>   _2 = _1;
>   _11 = bar (_2);
>   _5 = (signed int) _11;
>   _12 = (unsigned int) _11;
>   _6 = _12 & 65535;
>   _7 = _6 + 15;
>   _13 = (signed int) _7;
>   _8 = (_13) sext (16);
>   _9 = _8;
>   return _9;
>
> }
>
>
> There are still some redundancies. The asm difference after RTL
> optimizations is
>
> -   addl    $15, %eax
> +   addw    $15, %ax
>
>

 This also makes me notice trivial match.pd patterns are missing, like
 for example

 (simplify
   (sext (sext@2 @0 @1) @3)
   (if (tree_int_cst_compare (@1, @3) <= 0)
@2
(sext @0 @3)))

 as VRP doesn't run at -O1 we must rely on those to remove redundant
 extensions,
 otherwise generated code might get worse compared to without the pass(?)
>>>
>>>
>>> Do you think that we should enable this pass only when VRP is enabled?
>>> Otherwise, even when we do the simple optimizations you mentioned below, we
>>> might not be able to remove all the redundancies.
>>>

 I also notice that the 'short' argument does not get its sign-extension
 removed
 as redundant either even though we have

 _1 = (int) x_8(D);
 Found new range for _1: [-32768, 32767]

>>>
>>> I am looking into it.
>>>
 In the end I suspect that keeping track of the "simple" cases in the
 promotion
 pass itself (by keeping a lattice) might be a good idea (after we fix VRP
 to do
 its work).  In some way whether the ABI guarantees pr
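
The nested-extension simplification suggested above can be checked on plain integers. The following is a minimal sketch, not the pass's implementation: it assumes that "(x) sext (N)" means "sign-extend the low N bits of x", and the helper names `sext`, `sext_twice_simplified` and `nested_sext_matches` are hypothetical. It models the rule that `sext (sext (x, inner), outer)` equals `sext (x, inner)` when `inner <= outer` and `sext (x, outer)` otherwise.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low `width` bits of `value` to a 32-bit signed integer.
// Hypothetical helper modelling the "(x) sext (N)" notation in the dumps.
int32_t sext(int32_t value, unsigned width)
{
  assert(width >= 1 && width <= 32);
  if (width == 32)
    return value;
  const uint32_t mask = (uint32_t{1} << width) - 1;  // low `width` bits
  const uint32_t sign = uint32_t{1} << (width - 1);  // sign bit of the field
  const uint32_t low = static_cast<uint32_t>(value) & mask;
  // (low ^ sign) - sign copies the field's sign bit into all higher bits.
  return static_cast<int32_t>((low ^ sign) - sign);
}

// The simplified form of sext (sext (x, inner), outer).
int32_t sext_twice_simplified(int32_t x, unsigned inner, unsigned outer)
{
  return inner <= outer ? sext(x, inner) : sext(x, outer);
}

// True when the nested form and the simplified form agree.
bool nested_sext_matches(int32_t x, unsigned inner, unsigned outer)
{
  return sext(sext(x, inner), outer) == sext_twice_simplified(x, inner, outer);
}
```

When the inner width is not larger than the outer one, the second extension only re-copies a sign bit that is already replicated, which is why the inner result can be reused as-is.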

Re: [gomp4, committed] Implement -foffload-alias

2015-11-03 Thread Tom de Vries


On 03/11/15 15:19, Tom de Vries wrote:

I've dropped the two testcases from this patch, I'll commit in a
follow-up patch.


Committed to gomp-4_0-branch, as attached.

Thanks,
- Tom
Add goacc/kernels-loop-offload-alias-{none,ptr}.c

2015-11-03  Tom de Vries  

	* c-c++-common/goacc/kernels-loop-offload-alias-none.c: New test.
	* c-c++-common/goacc/kernels-loop-offload-alias-ptr.c: New test.
---
 .../goacc/kernels-loop-offload-alias-none.c| 61 ++
 .../goacc/kernels-loop-offload-alias-ptr.c | 44 
 2 files changed, 105 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-none.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-ptr.c

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-none.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-none.c
new file mode 100644
index 0000000..bb96330
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-none.c
@@ -0,0 +1,61 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+/* { dg-additional-options "-fdump-tree-alias-all" } */
+/* { dg-additional-options "-foffload-alias=none" } */
+
+#include <stdlib.h>
+
+#define N (1024 * 512)
+#define COUNTERTYPE unsigned int
+
+static void
+foo (unsigned int *a, unsigned int *b, unsigned int *c)
+{
+  for (COUNTERTYPE i = 0; i < N; i++)
+a[i] = i * 2;
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+b[i] = i * 4;
+
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+  {
+for (COUNTERTYPE ii = 0; ii < N; ii++)
+  c[ii] = a[ii] + b[ii];
+  }
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+if (c[i] != a[i] + b[i])
+  abort ();
+}
+
+int
+main (void)
+{
+  unsigned int *a;
+  unsigned int *b;
+  unsigned int *c;
+
+  a = (unsigned int *)malloc (N * sizeof (unsigned int));
+  b = (unsigned int *)malloc (N * sizeof (unsigned int));
+  c = (unsigned int *)malloc (N * sizeof (unsigned int));
+
+  foo (a, b, c);
+
+  free (a);
+  free (b);
+  free (c);
+
+  return 0;
+}
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*foo._omp_fn.0" 1 "optimized" } } */
+
+/* { dg-final { scan-tree-dump-times "clique 1 base 1" 3 "alias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 2" 1 "alias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 3" 1 "alias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 4" 1 "alias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 5" 1 "alias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 6" 1 "alias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 7" 1 "alias" } } */
+/* { dg-final { scan-tree-dump-times "(?n)clique .* base .*" 9 "alias" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-ptr.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-ptr.c
new file mode 100644
index 0000000..de4f45a
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-ptr.c
@@ -0,0 +1,44 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+/* { dg-additional-options "-fdump-tree-alias-all" } */
+/* { dg-additional-options "-foffload-alias=pointer" } */
+
+#include <stdlib.h>
+
+#define N (1024 * 512)
+#define COUNTERTYPE unsigned int
+
+unsigned int a[N];
+unsigned int b[N];
+unsigned int c[N];
+
+int
+main (void)
+{
+  for (COUNTERTYPE i = 0; i < N; i++)
+a[i] = i * 2;
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+b[i] = i * 4;
+
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+  {
+for (COUNTERTYPE ii = 0; ii < N; ii++)
+  c[ii] = a[ii] + b[ii];
+  }
+
+  for (COUNTERTYPE i = 0; i < N; i++)
+if (c[i] != a[i] + b[i])
+  abort ();
+
+  return 0;
+}
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
+
+/* { dg-final { scan-tree-dump-times "clique 1 base 1" 3 "alias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 2" 1 "alias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 3" 1 "alias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 4" 1 "alias" } } */
+/* { dg-final { scan-tree-dump-times "(?n)clique .* base .*" 6 "alias" } } */
-- 
1.9.1

Re: [v3 PATCH] Make the default constructors of tuple and pair conditionally explicit.

2015-11-03 Thread Jonathan Wakely

On 3 November 2015 at 02:37, Paolo Carlini wrote:
> Hi,
>
> On 11/02/2015 09:20 PM, Ville Voutilainen wrote:
>>
>> On 2 November 2015 at 21:20, Paolo Carlini 
>> wrote:
>>>
>>> Can we follow the terse style already used elsewhere (eg,
>>> __is_direct_constructible_new_safe) thus directly inherit from __and_ and
>>> avoid explicit integral_constant? Otherwise patch looks good to me.
>>
>>
>> Sure. Tested again on Linux-PPC64, tests adjusted due to line changes,
>> Changelog entry updated to have a correct date on it.
>
> Great, thanks a lot. Thinking more about this detail, I wonder if we should
> therefore apply the below too? Anything I'm missing?

I have a weak preference for deriving from xxx::type rather than xxx,
so that the traits derive directly from either true_type or
false_type, not indirectly via some other type that derives from
true_type or false_type, but it probably isn't important.
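
The difference between the two styles can be sketched with standard traits only. This is an illustration under assumptions, not the libstdc++ code: `both` is a hypothetical stand-in for an `__and_`-style combinator, and `indirect_trait` / `direct_trait` are made-up names. Deriving from the combinator keeps it in the hierarchy; deriving from its `::type` makes `true_type` or `false_type` the immediate base.

```cpp
#include <type_traits>

// Hypothetical two-argument conjunction: inherits from B when A holds,
// otherwise from A.
template <typename A, typename B>
struct both : std::conditional<A::value, B, A>::type {};

// The condition shared by both styles below (an arbitrary example).
template <typename T>
using small_integral_check =
    both<std::is_integral<T>, std::integral_constant<bool, (sizeof(T) <= 4)>>;

// Style 1: derive from the combinator; true_type/false_type is an
// indirect base reached through `both`.
template <typename T>
struct indirect_trait : small_integral_check<T> {};

// Style 2: derive from the combinator's ::type; the base class is
// std::true_type or std::false_type itself.
template <typename T>
struct direct_trait : small_integral_check<T>::type {};
```

Both traits yield the same `value`; only the shape of the inheritance chain differs, which matches the remark that the choice is probably not important.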

Re: [PATCH] Pass manager: add support for termination of pass list

2015-11-03 Thread Richard Biener

On Tue, Nov 3, 2015 at 3:13 PM, Martin Liška  wrote:
> On 11/03/2015 02:46 PM, Richard Biener wrote:
>> On Fri, Oct 30, 2015 at 1:53 PM, Martin Liška  wrote:
>>> On 10/30/2015 01:13 PM, Richard Biener wrote:
 So I suggest to do the push/pop of cfun there.
 do_per_function_toporder can be made static btw.

 Richard.
>>>
>>> Right, I've done that and it works (bootstrap has been currently running),
>>> feasible for HSA branch too.
>>>
>>> tree-pass.h:
>>>
>>> /* Declare for plugins.  */
>>> extern void do_per_function_toporder (void (*) (function *, void *), void 
>>> *);
>>>
>>> Attaching the patch that I'm going to test.
>>
>> Err.
>>
>> +  cgraph_node::get (current_function_decl)->release_body ();
>> +
>> +  current_function_decl = NULL;
>> +  set_cfun (NULL);
>>
>> I'd have expected
>>
>>   tree fn = cfun->decl;
>>   pop_cfun ();
>>   gcc_assert (!cfun);
>>   cgraph_node::get (fn)->release_body ();
>>
>> here.
>
> Yeah, that works, but we have to add following hunk:
>
> diff --git a/gcc/function.c b/gcc/function.c
> index aaf49a4..4718fe1 100644
> --- a/gcc/function.c
> +++ b/gcc/function.c
> @@ -4756,6 +4756,13 @@ push_cfun (struct function *new_cfun)
>  void
>  pop_cfun (void)
>  {
> +  if (cfun_stack.is_empty ())
> +{
> +  set_cfun (NULL);
> +  current_function_decl = NULL_TREE;
> +  return;
> +}
> +
>struct function *new_cfun = cfun_stack.pop ();
>/* When in_dummy_function, we do have a cfun but current_function_decl is
>   NULL.  We also allow pushing NULL cfun and subsequently changing

Why again?  cfun should be set via push_cfun here so what's having cfun == NULL
at the pop_cfun point?  Or rather, what code used set_cfun () instead
of push_cfun ()?
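
The pairing being asked about can be modelled with a toy context stack. This is only a sketch: `function_ctx`, `current`, `saved`, `set_current`, `push_current` and `pop_current` are hypothetical stand-ins for `struct function`, `cfun`, `cfun_stack`, `set_cfun`, `push_cfun` and `pop_cfun`, and the real functions do more work than this.

```cpp
#include <cassert>
#include <vector>

struct function_ctx { int id; };       // stand-in for struct function

function_ctx *current = nullptr;       // plays the role of cfun
std::vector<function_ctx *> saved;     // plays the role of cfun_stack

// Overwrites the current context without recording the previous one.
void set_current(function_ctx *f) { current = f; }

// Records the previous context so that a later pop can restore it.
void push_current(function_ctx *f)
{
  saved.push_back(current);
  current = f;
}

// Restores whatever was current before the matching push.
void pop_current()
{
  assert(!saved.empty() && "pop without a matching push");
  current = saved.back();
  saved.pop_back();
}
```

In this model a pop that finds the stack empty means the context was installed with `set_current` rather than `push_current`, which is the situation the question above is probing.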

>
> If you are fine with that, it looks like we've fixed all issues related to the 
> change, right?
> Updated version of the patch is attached.
>
> Martin
>
>>
>>> Martin
>>>
>

Re: [PATCH] PR 68192 Export AIX TLS symbols

2015-11-03 Thread Richard Biener

On Tue, Nov 3, 2015 at 3:23 PM, David Edelsohn  wrote:
> TLS symbols in AIX display a new, different symbol type in nm output.
> Libtool explicitly creates a list of exported symbols for shared
> libraries using nm and does not recognize the new TLS symbols, so
> those symbols are not exported.
>
> This is a regression for TLS support on AIX.
>
> This patch updates libtool.m4 in GCC and configure for libstdc++-v3,
> libgfortran, and libgomp.  I would like to apply the patch to GCC
> while I simultaneously work with the Libtool community to correct the
> bug upstream.  I also would like to backport this to GCC 5.2 and GCC
> 4.9.x.

What does this do to the set of exported symbols, esp. on branches?
Does AIX have library versions?

> I have not been able to run the correct versions of autoconf to
> regenerate configure directly.  I either can edit the files directly
> or I would appreciate someone helping me to regenerate configure in
> all library directories.
>
> Bootstrapped on powerpc-ibm-aix7.1.0.0.
>
> * libtool.m4 (export_symbols_cmds) [AIX]: Add global TLS "L" symbols.
> * libstdc++-v3/configure: Regenerate.
> * libgfortran/configure: Regenerate.
> * libgomp/configure: Regenerate.
>
> Thanks, David

Re: [v3 PATCH] Make the default constructors of tuple and pair conditionally explicit.

2015-11-03 Thread Ville Voutilainen

On 3 November 2015 at 16:42, Jonathan Wakely  wrote:
> On 3 November 2015 at 02:37, Paolo Carlini wrote:
>> Hi,
>>
>> On 11/02/2015 09:20 PM, Ville Voutilainen wrote:
>>>
>>> On 2 November 2015 at 21:20, Paolo Carlini 
>>> wrote:

 Can we follow the terse style already used elsewhere (eg,
 __is_direct_constructible_new_safe) thus directly inherit from __and_ and
 avoid explicit integral_constant? Otherwise patch looks good to me.
>>>
>>>
>>> Sure. Tested again on Linux-PPC64, tests adjusted due to line changes,
>>> Changelog entry updated to have a correct date on it.
>>
>> Great, thanks a lot. Thinking more about this detail, I wonder if we should
>> therefore apply the below too? Anything I'm missing?
>
> I have a weak preference for deriving from xxx::type rather than xxx,
> so that the traits derive directly from either true_type or
> false_type, not indirectly via some other type that derives from
> true_type or false_type, but it probably isn't important.

I expect the inheritance hierarchies of these things to be linear, so
probably not
a huge matter. I did push the patch already. :)

Re: [PATCH, 2/2] Handle recursive restrict in function parameter

2015-11-03 Thread Richard Biener

On Tue, 3 Nov 2015, Tom de Vries wrote:

> On 01/11/15 19:20, Tom de Vries wrote:
> > On 01/11/15 19:03, Tom de Vries wrote:
> > > So, the new patch series is:
> > > 
> > >   1Rename make_restrict_var_constraints to make_param_constraints
> > >   2Handle recursive restrict in function parameter
> > > 
> > > I'll repost in reply to this message.
> > > 
> > 
> > This patch adds handling of all the restrict qualifiers in the type of a
> > function parameter.
> > 
> 
> And reposting an updated version, now that the toplevel parameter in
> make_param_constraints has been eliminated.

@@ -5195,6 +5197,8 @@ struct fieldoff
   unsigned may_have_pointers : 1;

   unsigned only_restrict_pointers : 1;
+
+  varinfo_t restrict_var;
 };

store the varinfo ID here, 'unsigned int restrict_var' which ends
up not changing fieldoff size.  get_varinfo (restrict_var) will get
you the varinfo_t.
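
A rough illustration of the size argument, under assumptions: the two structs below only loosely mirror `fieldoff` (the `offset` and `size` members are guessed, and `varinfo`, `varmap` and `lookup_varinfo` are hypothetical stand-ins for the real record, table and `get_varinfo`). On typical 64-bit ABIs the `unsigned int` ID fits into padding that follows the bit-fields, while a pointer member needs its own aligned slot.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct varinfo { unsigned id; const char *name; };   // stand-in record

// Variant that stores a pointer to the record.
struct fieldoff_with_pointer {
  long long offset;
  unsigned long long size;
  unsigned may_have_pointers : 1;
  unsigned only_restrict_pointers : 1;
  varinfo *restrict_var;
};

// Variant that stores the record's ID instead.
struct fieldoff_with_id {
  long long offset;
  unsigned long long size;
  unsigned may_have_pointers : 1;
  unsigned only_restrict_pointers : 1;
  unsigned int restrict_var;
};

std::vector<varinfo> varmap;   // hypothetical ID-indexed table

// Turns a stored ID back into a pointer, in the spirit of get_varinfo.
varinfo *lookup_varinfo(unsigned id)
{
  assert(id < varmap.size());
  return &varmap[id];
}
```

The exact sizes are ABI-dependent, so the only portable claim is that the ID variant is never larger than the pointer variant.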

@@ -5374,6 +5380,19 @@ push_fields_onto_fieldstack (tree type, 
vec *fieldstack,
  = (!has_unknown_size
 && POINTER_TYPE_P (field_type)
 && TYPE_RESTRICT (field_type));
+   if (handle_param
+   && e.only_restrict_pointers
+   && !type_contains_placeholder_p (TREE_TYPE 
(field_type)))
+ {
+   varinfo_t rvi;
+   tree heapvar = build_fake_var_decl (TREE_TYPE 
(field_type));
+   DECL_EXTERNAL (heapvar) = 1;
+   rvi = create_variable_info_for_1 (heapvar, 
"PARM_NOALIAS",
+ true, true);
+   rvi->is_restrict_var = 1;
+   insert_vi_for_tree (heapvar, rvi);
+   e.restrict_var = rvi;
+ }

hmm, can you delay this to the point we actually will use field-sensitive
stuff?  That is, until create_variable_info_for_1 decided to use a
multi-field variable?  Say, here:

+  if (handle_param
+ && newvi->only_restrict_pointers
+ && fo->restrict_var != NULL)
+   {
+ make_constraint_from (newvi, fo->restrict_var->id);
+ make_param_constraints (fo->restrict_var);
+   }

?  Looks like then you don't need the new field at all.

Thanks,
Richard.

Re: libgo patch committed: Update to Go 1.5 release

2015-11-03 Thread Ian Lance Taylor

On Mon, Nov 2, 2015 at 11:48 PM, Uros Bizjak  wrote:
>
>> I have committed a patch to libgo to update it to the Go 1.5 release.
>>
>> As usual for libgo updates, the actual patch is too large to attach to
>> this e-mail message.  I've attached the changes to the gccgo-specific
>> files.
>>
>> Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
>> to mainline.
>>
>> This may cause trouble on non-GNU/Linux operating systems.  Please let
>> me know about any problems you encounter.
>
> There is one new testsuite failure on CentOS 5.11 (kernel 2.6.18),
> where namespaces are not supported:
>
> exec_linux_test.go:29:23: error: reference to undefined identifier
> 'syscall.CLONE_NEWUSER'
>Cloneflags: syscall.CLONE_NEWUSER,
>^
> FAIL: syscall
>
> The test would be skipped, since "/proc/self/ns/user" doesn't exist,
> however, the test doesn't compile due to missing CLONE_NEWUSER define.

Thanks.  I committed this patch which should fix the problem.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 229686)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-28fbc7f42702ce081ef5f3ce9a1dbc1ed3f3c89e
+10e0f935ac369f8403c198b05c909e42e565c1e5
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/mksysinfo.sh
===
--- libgo/mksysinfo.sh  (revision 229676)
+++ libgo/mksysinfo.sh  (working copy)
@@ -1444,6 +1444,11 @@ grep '^type _inotify_event ' gen-sysinfo
 # The GNU/Linux CLONE flags.
 grep '^const _CLONE_' gen-sysinfo.go | \
   sed -e 's/^\(const \)_\(CLONE_[^= ]*\)\(.*\)$/\1\2 = _\2/' >> ${OUT}
+# We need some CLONE constants that are not defined in older versions
+# of glibc.
+if ! grep '^const CLONE_NEWUSER ' ${OUT} > /dev/null 2>&1; then
+  echo "const CLONE_NEWUSER = 0x10000000" >> ${OUT}
+fi
 
 # Struct sizes.
 set cmsghdr Cmsghdr ip_mreq IPMreq ip_mreqn IPMreqn ipv6_mreq IPv6Mreq \

Re: [1/2] OpenACC routine support

2015-11-03 Thread Jakub Jelinek

On Mon, Nov 02, 2015 at 02:21:43PM -0500, Nathan Sidwell wrote:
> --- gcc/c/c-parser.c  (revision 229667)
> +++ gcc/c/c-parser.c  (working copy)
> @@ -1160,7 +1160,8 @@ enum c_parser_prec {
>  static void c_parser_external_declaration (c_parser *);
>  static void c_parser_asm_definition (c_parser *);
>  static void c_parser_declaration_or_fndef (c_parser *, bool, bool, bool,
> -bool, bool, tree *, vec);
> +bool, bool, tree *, vec,
> +tree);

Wonder if this shouldn't be tree = NULL_TREE, then you'd avoid most of the
c_parser_declaration_or_fndef caller changes.
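
A minimal sketch of that suggestion, with made-up names: `parser`, `handle`, `parse_declaration` and the two callers are hypothetical stand-ins for `c_parser`, `tree`, `c_parser_declaration_or_fndef` and its call sites. Defaulting the new trailing parameter in the declaration lets existing calls compile unchanged.

```cpp
#include <cassert>

struct parser { int depth; };      // stand-in for c_parser
using handle = const char *;       // stand-in for tree

// The new trailing parameter is defaulted in the declaration.
int parse_declaration(parser *p, bool fndef_ok, handle extra = nullptr);

// Existing call site: untouched, the default is supplied implicitly.
int legacy_caller(parser *p) { return parse_declaration(p, true); }

// New call site: passes the extra argument explicitly.
int new_caller(parser *p, handle h) { return parse_declaration(p, true, h); }

// The definition does not repeat the default argument.
int parse_declaration(parser *p, bool fndef_ok, handle extra)
{
  assert(p != nullptr);
  return p->depth + (fndef_ok ? 1 : 0) + (extra != nullptr ? 10 : 0);
}
```

Only the callers that actually need the extra argument have to change.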

Otherwise, LGTM.

Jakub

Re: [PATCH] remove unused config/arm/coff.h

2015-11-03 Thread Trevor Saunders

On Tue, Nov 03, 2015 at 06:49:20AM -0700, Jeff Law wrote:
> On 11/03/2015 04:31 AM, tbsaunde+...@tbsaunde.org wrote:
> >From: Trevor Saunders 
> >
> >Hi,
> >
> >$subject, nothing refers to this header so we might as well remove it.
> >
> >tested I can still build on x86_64-linux-gnu, not that I would expect 
> >anything
> >else or that it is particularly relevant, ok?
> >
> >Trev
> >
> >gcc/ChangeLog:
> >
> >2015-11-03  Trevor Saunders  
> >
> > * config/arm/coff.h: Remove.
> More generally, if we have a header file that's not used, I'd consider
> removing it to be obvious-enough to commit without approval.
> 
> We could/should probably do the same with unused functions, with the only
> wrinkle being things that are useful for debugging but which are otherwise
> unused should be kept around.

I'd agree removing things that are dead is obvious, but sometimes things
are so convoluted it's hard to be sure they are in fact dead, and then
it's nice to have some confirmation you didn't miss something :)

Trev

> 
> jeff
>

Re: [2/2] OpenACC routine support

2015-11-03 Thread Jakub Jelinek

On Mon, Nov 02, 2015 at 02:23:19PM -0500, Nathan Sidwell wrote:
> Here are the tests for the routine support.  The compiler tests check
> invalid combinations of gang, worker, vector & seq.  The libgomp execution
> tests check the expected partitioning occurs within loops.  As with the
> reduction tests, these ones are taken from the execution model loop tests.

I find the testsuite coverage insufficient, e.g. you don't have equivalent
of first half of declare-simd-2.C or declare-simd-2.c
(everything above #pragma omp declare simd inbranch notinbranch),
to verify that if acc routine is used without the (fnname) in it, then
it can't be followed by var definition and various other tokens.

Jakub

Re: Multiply Optimization in match and Simplify

2015-11-03 Thread Richard Biener

On Tue, Nov 3, 2015 at 6:12 AM, Hurugalawadi, Naveen
 wrote:
> Hi,
>
> Thanks for the review and suggestions.
>
>>> Please do not drop A - B -> A + (-B) from fold-const as match.pd
>>> doesn't implement all of fold-const.c negate_expr_p support.
>
> Done.
>
>>> which is more expensive.  This means that we miss a
>>> (bit_and (bit_not @0) INTEGER_CST@1)
>
> Should we have this pattern implemented in match.pd?

I think so.  Please combine with

+/* Fold X & (X ^ Y) as X & ~Y.  */
+(simplify
+ (bit_and:c (convert? @0) (convert? (bit_xor:c @0 @1)))
+  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
+   (convert (bit_and @0 (bit_not @1)


>>> negate_expr_p doesn't capture everything
>>> fold-const.c does so moving the above isn't a good idea.
>
> Dropped the pattern. Was working on some more patterns
> that had negate_expr_p. Will drop all of them.
>
>>> fold-const.c only handles constant C, so we only need the 2nd pattern.
>
> Yeah. Thought that even having variable would be optimized in a similar
> manner and hence had that pattern.

+/* Convert (A + A) * C -> A * 2 * C.  */
+(simplify
+ (mult (convert? (plus @0 @0)) INTEGER_CST@1)
+  (mult (convert @0) (mult { build_int_cst (TREE_TYPE (@1), 2); } @1)))

so you dropped the nop-conversion check but not the converts ...  This should
now simply match

 (mult (plus @0 @0) INTEGER_CST@1)

but I'd rather (as Marc pointed out) enable the

/* Convert x+x into x*2.0.  */
(simplify
 (plus @0 @0)
 (if (SCALAR_FLOAT_TYPE_P (type))
  (mult @0 { build_real (type, dconst2); })))

pattern also for integral types.  That should make the pattern redundant
(and the fold code can be removed anyway).  Like with

@@ -1606,7 +1619,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (simplify
  (plus @0 @0)
  (if (SCALAR_FLOAT_TYPE_P (type))
-  (mult @0 { build_real (type, dconst2); })))
+  (mult @0 { build_real (type, dconst2); })
+  (if (INTEGRAL_TYPE_P (type))
+   (mult @0 { build_int_cst (type, 2); }

 (simplify
  (minus integer_zerop @1)

Richard.

> Please find attached the modified pattern as per suggestions.
> Please review the patch and let me know if there should be any further
> modifications in it.
>
> Thanks,
> Naveen
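
The integer identities behind the patterns discussed in this message can be sanity-checked on unsigned 32-bit values, where wrap-around is well defined. This is a sketch only: the function names are hypothetical, and it says nothing about the type and conversion conditions the real match.pd patterns must check.

```cpp
#include <cassert>
#include <cstdint>

// X & (X ^ Y)  versus  X & ~Y
uint32_t and_xor_original(uint32_t x, uint32_t y) { return x & (x ^ y); }
uint32_t and_xor_folded(uint32_t x, uint32_t y) { return x & ~y; }

// (A + A) * C  versus  A * (2 * C)
uint32_t double_then_scale(uint32_t a, uint32_t c) { return (a + a) * c; }
uint32_t scale_folded(uint32_t a, uint32_t c) { return a * (2u * c); }

// x + x  versus  x * 2, the integral counterpart of the x*2.0 pattern
uint32_t plus_self(uint32_t x) { return x + x; }
uint32_t times_two(uint32_t x) { return x * 2u; }

// Asserts all three identities for one pair of inputs.
void check_identities(uint32_t x, uint32_t y)
{
  assert(and_xor_original(x, y) == and_xor_folded(x, y));
  assert(double_then_scale(x, y) == scale_folded(x, y));
  assert(plus_self(x) == times_two(x));
}
```

Once `x + x` is canonicalized to `x * 2` for integral types, the `(A + A) * C` case reduces to two nested multiplications by constants, which is why a dedicated pattern becomes redundant.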

[PATCH][AArch64][v2] Improve comparison with complex immediates followed by branch/cset

2015-11-03 Thread Kyrill Tkachov


Hi all,

This patch slightly improves sequences where we want to compare against a 
complex immediate and branch against the result
or perform a cset on it.
This means transforming sequences of mov+movk+cmp+branch into sub+subs+branch.
Similar for cset. Unfortunately I can't just do this by simply matching a 
(compare (reg) (const_int)) rtx because
this transformation is only valid for equal/not equal comparisons, not greater 
than/less than ones but the compare instruction
pattern only has the general CC mode. We need to also match the use of the 
condition code.

I've done this by creating a splitter for the conditional jump where the 
condition is the comparison between the register
and the complex immediate and splitting it into the sub+subs+condjump sequence. 
Similar for the cstore pattern.
Thankfully we don't split immediate moves until later in the optimization 
pipeline so combine can still try the right patterns.
With this patch for the example code:
void g(void);
void f8(int x)
{
   if (x != 0x123456) g();
}

I get:
f8:
        sub     w0, w0, #1191936
        subs    w0, w0, #1110
        beq     .L1
        b       g
        .p2align 3
.L1:
        ret

instead of the previous:
f8:
        mov     w1, 13398
        movk    w1, 0x12, lsl 16
        cmp     w0, w1
        beq     .L1
        b       g
        .p2align 3
.L1:
        ret


The condjump case triggered 130 times across all of SPEC2006 which is, 
admittedly, not much
whereas the cstore case didn't trigger at all. However, the included testcase 
in the patch
demonstrates the kind of code that it would trigger on.
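
The arithmetic behind the sub+subs sequence can be sketched in plain code. The names `split_imm`, `split_imm24` and `equals_via_two_subtractions` are hypothetical; the masks are the ones used in the patch comments. The constant is split into a high 12-bit chunk (shifted left by 12) and a low 12-bit chunk, and `x` equals the constant exactly when subtracting both chunks leaves zero.

```cpp
#include <cassert>
#include <cstdint>

struct split_imm { uint32_t hi; uint32_t lo; };

// Split a 24-bit constant into the two 12-bit chunks that fit the
// immediate field of sub/subs.
split_imm split_imm24(uint32_t cst)
{
  assert(cst <= 0xffffffu);
  return { cst & 0xfff000u, cst & 0x000fffu };
}

// Equality test via two subtractions, mirroring sub followed by subs.
bool equals_via_two_subtractions(uint32_t x, uint32_t cst)
{
  const split_imm parts = split_imm24(cst);
  uint32_t tmp = x - parts.hi;   // sub:  tmp = x - (cst & 0xfff000)
  tmp -= parts.lo;               // subs: sets Z when the result is zero
  return tmp == 0;
}
```

For `0x123456` the chunks are 1191936 and 1110, the immediates in the generated assembly. Only the zero result is meaningful here: the carry and overflow of the final subtraction describe `tmp` against the low chunk, not `x` against the full constant, which is consistent with the restriction to equal/not-equal comparisons.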

Bootstrapped and tested on aarch64.

Ok for trunk?

Thanks,
Kyrill


2015-11-03  Kyrylo Tkachov  

* config/aarch64/aarch64.md (*condjump): Rename to...
(condjump): ... This.
(*compare_condjump): New define_insn_and_split.
(*compare_cstore_insn): Likewise.
(*cstore_insn): Rename to...
(aarch64_cstore): ... This.
* config/aarch64/iterators.md (CMP): Handle ne code.
* config/aarch64/predicates.md (aarch64_imm24): New predicate.

2015-11-03  Kyrylo Tkachov  

* gcc.target/aarch64/cmpimm_branch_1.c: New test.
* gcc.target/aarch64/cmpimm_cset_1.c: Likewise.
commit 7df013a391532f39932b80c902e3b4bbd841710f
Author: Kyrylo Tkachov 
Date:   Mon Sep 21 10:56:47 2015 +0100

[AArch64] Improve comparison with complex immediates

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 126c9c2..1bfc870 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -369,7 +369,7 @@ (define_expand "mod3"
   }
 )
 
-(define_insn "*condjump"
+(define_insn "condjump"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
 			[(match_operand 1 "cc_register" "") (const_int 0)])
 			   (label_ref (match_operand 2 "" ""))
@@ -394,6 +394,40 @@ (define_insn "*condjump"
 		  (const_int 1)))]
 )
 
+;; For a 24-bit immediate CST we can optimize the compare for equality
+;; and branch sequence from:
+;; mov	x0, #imm1
+;; movk	x0, #imm2, lsl 16 /* x0 contains CST.  */
+;; cmp	x1, x0
+;; b .Label
+;; into the shorter:
+;; sub	x0, #(CST & 0xfff000)
+;; subs	x0, #(CST & 0x000fff)
+;; b .Label
+(define_insn_and_split "*compare_condjump"
+  [(set (pc) (if_then_else (EQL
+			  (match_operand:GPI 0 "register_operand" "r")
+			  (match_operand:GPI 1 "aarch64_imm24" "n"))
+			   (label_ref:P (match_operand 2 "" ""))
+			   (pc)))]
+  "!aarch64_move_imm (INTVAL (operands[1]), mode)
+   && !aarch64_plus_operand (operands[1], mode)"
+  "#"
+  "&& true"
+  [(const_int 0)]
+  {
+HOST_WIDE_INT lo_imm = UINTVAL (operands[1]) & 0xfff;
+HOST_WIDE_INT hi_imm = UINTVAL (operands[1]) & 0xfff000;
+rtx tmp = gen_reg_rtx (mode);
+emit_insn (gen_add3 (tmp, operands[0], GEN_INT (-hi_imm)));
+emit_insn (gen_add3_compare0 (tmp, tmp, GEN_INT (-lo_imm)));
+rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+rtx cmp_rtx = gen_rtx_fmt_ee (, mode, cc_reg, const0_rtx);
+emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2]));
+DONE;
+  }
+)
+
 (define_expand "casesi"
   [(match_operand:SI 0 "register_operand" "")	; Index
(match_operand:SI 1 "const_int_operand" "")	; Lower bound
@@ -2898,7 +2932,7 @@ (define_expand "cstore4"
   "
 )
 
-(define_insn "*cstore_insn"
+(define_insn "aarch64_cstore"
   [(set (match_operand:ALLI 0 "register_operand" "=r")
 	(match_operator:ALLI 1 "aarch64_comparison_operator"
 	 [(match_operand 2 "cc_register" "") (const_int 0)]))]
@@ -2907,6 +2941,39 @@ (define_insn "*cstore_insn"
   [(set_attr "type" "csel")]
 )
 
+;; For a 24-bit immediate CST we can optimize the compare for equality
+;; and branch sequence from:
+;; mov	x0, #imm1
+;; movk	x0, #imm2, lsl 16 /* x0 contains CST.  */
+;; cmp	x1, x0
+;; cset	x2, 
+;; into the shorter:
+;; sub	x0, #(CST & 0xfff000)
+;; subs	x0, #(CST & 0x000fff)
+;; cset x1, .
+(define_insn_and_split "*compare_cstore_insn"
+  [(set (match_operand:GPI 0 "register_ope

Re: [1/3] OpenACC reductions

2015-11-03 Thread Jakub Jelinek

On Mon, Nov 02, 2015 at 11:18:37AM -0500, Nathan Sidwell wrote:
> This is the core execution bits of OpenACC reductions.
> 
> We have a new internal fn 'IFN_GOACC_REDUCTION' and a new target hook
> goacc.reduction, to lower it on the target compiler.

So, let me start with a few questions:
1) does OpenACC allow UDRs or only the built-in reductions?  If it
   does not allow UDRs, do you have it covered by testcases that you
   disallow parsing of them (e.g. when you have
#pragma omp declare reduction (xyz: struct S: omp_out.x += omp_in.y) 
initializer (omp_priv = { 5 })
#pragma acc parallel reduction (xyz: var_with_type_S)
   )?
2) how do you expand the reductions in the end when targeting host fallback
   or when targeting non-PTX targets?

Jakub

Re: libgo patch committed: Update to Go 1.5 release

2015-11-03 Thread Lynn A. Boger

We are seeing failures on all the libgo tests when gccgo is built with 
the latest trunk

on ppc64 (BE) and when running the testsuite for 64 bit.  The failures
do not occur if run on ppc64 BE with m32 and do not occur on ppc64le.

The messages say this:

make[3]: Entering directory 
`/home/boger/gccgo.work/trunk/bld/powerpc64-linux/libgo'

gotest: warning: no tests matching Test([^a-z].*)? in _gotest_.o _xtest_.o
FAIL: bufio
make[3]: *** [bufio/check] Error 1
gotest: warning: no tests matching Test([^a-z].*)? in _gotest_.o _xtest_.o
FAIL: bytes
make[3]: *** [bytes/check] Error 1
gotest: warning: no tests matching Test([^a-z].*)? in _gotest_.o _xtest_.o
FAIL: errors

..  same message for all

On 11/03/2015 09:07 AM, Ian Lance Taylor wrote:

On Mon, Nov 2, 2015 at 11:48 PM, Uros Bizjak  wrote:

I have committed a patch to libgo to update it to the Go 1.5 release.

As usual for libgo updates, the actual patch is too large to attach to
this e-mail message.  I've attached the changes to the gccgo-specific
files.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

This may cause trouble on non-GNU/Linux operating systems.  Please let
me know about any problems you encounter.

There is one new testsuite failure on CentOS 5.11 (kernel 2.6.18),
where namespaces are not supported:

exec_linux_test.go:29:23: error: reference to undefined identifier
'syscall.CLONE_NEWUSER'
Cloneflags: syscall.CLONE_NEWUSER,
^
FAIL: syscall

The test would be skipped, since "/proc/self/ns/user" doesn't exist,
however, the test doesn't compile due to missing CLONE_NEWUSER define.

Thanks.  I committed this patch which should fix the problem.

Ian

Re: [PATCH] replace BITS_PER_UNIT with __CHAR_BIT__ in target libs

2015-11-03 Thread Trevor Saunders

On Tue, Nov 03, 2015 at 01:20:26PM +0000, Joseph Myers wrote:
> On Mon, 2 Nov 2015, Jeff Law wrote:
> 
> > Based on Bernd's comments, I think this is fine.  Any sense of how much work
> > there is left to cleanup the runtime's inclusion of gcc's config/ target
> > headers?
> 
> See .  I think most 
> of that page other than the list of target macros describes stuff that was 
> done some time ago (which should be verified, and then the obsolete pieces 
> removed), and the list of host-side target macros used in target-side code 
> may not be fully up to date, but it should be indicative of what needs 
> fixing.  (The division by possible approaches for fixing each macro is 
> extremely rough, however - probably several macros would best be fixed in 
> some way other than the initial guess I put on that page.  And if moving 
> macros to libgcc_tm.h, note my warning at (b)(ii) in 
>  about some libgcc files 
> not including libgcc_tm.h.)

yeah, there's a fair bit of work left, I'm kind of hopeful it might be
done some time next year, but that assumes I can make myself keep
working on it, refactoring code that is hard to test isn't that much fun
:)

Trev

> 
> -- 
> Joseph S. Myers
> jos...@codesourcery.com

[PATCH] Refactor BB vectorization

2015-11-03 Thread Richard Biener


This refactors BB vectorization in preparation to make it work
on sub-BB granularity.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-11-03  Richard Biener  

* tree-vect-data-refs.c (vect_analyze_data_refs): Do not collect
data references here.
* tree-vect-loop.c: Include cgraph.h.
(vect_analyze_loop_2): Collect data references here.
* tree-vect-slp.c (find_bb_location): Inline ...
(vect_slp_bb): ... here.  Renamed from vect_slp_analyze_bb.
Factor in vect_slp_transform_bb.
(vect_slp_transform_bb): Removed.
(vect_slp_analyze_bb_1): Collect data references here.
* tree-vectorizer.c (pass_slp_vectorize::execute): Call
vect_slp_bb.
* tree-vectorizer.h (vect_slp_bb): Declare.
(vect_slp_analyze_bb): Remove.
(vect_slp_transform_bb): Remove.
(find_bb_location): Remove.
(vect_analyze_data_refs): Remove stmt count reference parameter.

Index: trunk/gcc/tree-vect-data-refs.c
===
*** trunk.orig/gcc/tree-vect-data-refs.c2015-11-03 12:57:31.538588865 
+0100
--- trunk/gcc/tree-vect-data-refs.c 2015-11-03 15:05:51.537766823 +0100
*** vect_check_gather_scatter (gimple *stmt,
*** 3286,3297 
  */
  
  bool
! vect_analyze_data_refs (vec_info *vinfo, int *min_vf, unsigned *n_stmts)
  {
struct loop *loop = NULL;
-   basic_block bb = NULL;
unsigned int i;
-   vec datarefs;
struct data_reference *dr;
tree scalar_type;
  
--- 3286,3295 
  */
  
  bool
! vect_analyze_data_refs (vec_info *vinfo, int *min_vf)
  {
struct loop *loop = NULL;
unsigned int i;
struct data_reference *dr;
tree scalar_type;
  
*** vect_analyze_data_refs (vec_info *vinfo,
*** 3300,3405 
   "=== vect_analyze_data_refs ===\n");
  
if (loop_vec_info loop_vinfo = dyn_cast  (vinfo))
! {
!   basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
! 
!   loop = LOOP_VINFO_LOOP (loop_vinfo);
!   datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
!   if (!find_loop_nest (loop, &LOOP_VINFO_LOOP_NEST (loop_vinfo)))
!   {
! if (dump_enabled_p ())
!   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
!"not vectorized: loop contains function calls"
!" or data references that cannot be analyzed\n");
! return false;
!   }
! 
!   for (i = 0; i < loop->num_nodes; i++)
!   {
! gimple_stmt_iterator gsi;
! 
! for (gsi = gsi_start_bb (bbs[i]); !gsi_end_p (gsi); gsi_next (&gsi))
!   {
! gimple *stmt = gsi_stmt (gsi);
! if (is_gimple_debug (stmt))
!   continue;
! ++*n_stmts;
! if (!find_data_references_in_stmt (loop, stmt, &datarefs))
!   {
! if (is_gimple_call (stmt) && loop->safelen)
!   {
! tree fndecl = gimple_call_fndecl (stmt), op;
! if (fndecl != NULL_TREE)
!   {
! struct cgraph_node *node = cgraph_node::get (fndecl);
! if (node != NULL && node->simd_clones != NULL)
!   {
! unsigned int j, n = gimple_call_num_args (stmt);
! for (j = 0; j < n; j++)
!   {
! op = gimple_call_arg (stmt, j);
! if (DECL_P (op)
! || (REFERENCE_CLASS_P (op)
! && get_base_address (op)))
!   break;
!   }
! op = gimple_call_lhs (stmt);
! /* Ignore #pragma omp declare simd functions
!if they don't have data references in the
!call stmt itself.  */
! if (j == n
! && !(op
!  && (DECL_P (op)
!  || (REFERENCE_CLASS_P (op)
!  && get_base_address (op)
!   continue;
!   }
!   }
!   }
! LOOP_VINFO_DATAREFS (loop_vinfo) = datarefs;
! if (dump_enabled_p ())
!   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
!"not vectorized: loop contains function "
!"calls or data references that cannot "
!"be analyzed\n");
! return false;
!   }
!   }
!   }
! 
!   LOOP

Re: [1/2] OpenACC routine support

2015-11-03 Thread Nathan Sidwell


On 11/03/15 10:35, Jakub Jelinek wrote:

On Mon, Nov 02, 2015 at 02:21:43PM -0500, Nathan Sidwell wrote:

--- gcc/c/c-parser.c(revision 229667)
+++ gcc/c/c-parser.c(working copy)
@@ -1160,7 +1160,8 @@ enum c_parser_prec {
  static void c_parser_external_declaration (c_parser *);
  static void c_parser_asm_definition (c_parser *);
  static void c_parser_declaration_or_fndef (c_parser *, bool, bool, bool,
-  bool, bool, tree *, vec);
+  bool, bool, tree *, vec,
+  tree);


Wonder if this shouldn't be tree = NULL_TREE, then you'd avoid most of the
c_parser_declaration_or_fndef caller changes.


yeah, I guess that'd be less invasive.  I'm fine with it.
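
For reference, a minimal sketch of why a defaulted trailing parameter avoids touching existing callers. Everything here is a stand-in: `parse_declaration_sketch`, `node` and the reduced parameter list are hypothetical and do not match the real c_parser_declaration_or_fndef signature.

```cpp
// Hypothetical stand-ins: 'node' and 'tree' only loosely mimic GCC's tree
// type, and parse_declaration_sketch is not the real c-parser.c function.
struct node { int code; };
typedef node *tree;
#define NULL_TREE ((tree) 0)

// The new trailing parameter gets a default of NULL_TREE, so callers
// written before the parameter existed still compile unchanged.
static int
parse_declaration_sketch (bool fndef_ok, bool empty_ok,
                          tree oacc_routine_data = NULL_TREE)
{
  (void) fndef_ok;
  (void) empty_ok;
  // Report whether routine data was supplied: 1 if so, 0 otherwise.
  return oacc_routine_data != NULL_TREE ? 1 : 0;
}

// An "old" caller that knows nothing about the new parameter.
static int
old_caller (void)
{
  return parse_declaration_sketch (true, false);
}

// A "new" caller that passes the extra argument explicitly.
static int
new_caller (tree data)
{
  return parse_declaration_sketch (true, false, data);
}
```

Only the call sites that actually have routine data to pass need editing; the rest pick up NULL_TREE implicitly.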

Re: [1/3] OpenACC reductions

2015-11-03 Thread Nathan Sidwell


On 11/03/15 10:46, Jakub Jelinek wrote:

On Mon, Nov 02, 2015 at 11:18:37AM -0500, Nathan Sidwell wrote:

This is the core execution bits of OpenACC reductions.

We have a new internal fn 'IFN_GOACC_REDUCTION' and a new target hook
goacc.reduction, to lower it on the target compiler.


So, let me start with a few questions:
1) does OpenACC allow UDRs or only the built-in reductions?  If it
does not allow UDRs, do you have it covered by testcases that you
disallow parsing of them (e.g. when you have


no UDR reductions.  Will check test cases for that.


#pragma omp declare reduction (xyz: struct S: omp_out.x += omp_in.y) 
initializer (omp_priv = { 5 })
#pragma acc parallel reduction (xyz: var_with_type_S)
)?



2) how do you expand the reductions in the end when targeting host fallback
or when targeting non-PTX targets?


That's what default_goacc_reduction is doing.

(I see its comment hasn't caught up with the changes I made during the merge. 
Will fix)


   LHS-opt = IFN_RED (KIND, RES_PTR, VAR, LEVEL, OP, OFFSET)
   If RES_PTR is not integer-zerop:
   SETUP - emit 'LHS = *RES_PTR', LHS = NULL
   TEARDOWN - emit '*RES_PTR = VAR'
   If LHS is not NULL
   emit 'LHS = VAR'

This is the correct behaviour for a single-threaded loop.  Of course the loop 
could go on to be parallelized in the normal way, or be converted to OpenMP 
constructs along the same lines as we discussed for the GOACC_LOOP function.
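
The rules above can be modelled as a toy that executes the assignments directly instead of emitting GIMPLE. This is only a sketch under assumptions: SETUP and TEARDOWN are taken from the description, INIT and FINI are assumed names for the remaining kinds, and `default_reduction_sketch` is a hypothetical name, not the real hook.

```cpp
// Toy model of the lowering described above.  A null RES_PTR models the
// "integer-zerop" case where there is no receiver object.
enum reduction_kind { SETUP, INIT, FINI, TEARDOWN };

// Returns the value that would end up assigned to LHS.
static int
default_reduction_sketch (reduction_kind kind, int *res_ptr, int var)
{
  if (res_ptr != 0)
    {
      if (kind == SETUP)
        return *res_ptr;   // LHS = *RES_PTR, and 'LHS = VAR' is skipped
      if (kind == TEARDOWN)
        *res_ptr = var;    // *RES_PTR = VAR
    }
  return var;              // LHS = VAR
}

// A single-threaded sum of 1..n driven through the four steps in order.
static int
sum_with_reduction (int *result, int n)
{
  int v = 0;
  v = default_reduction_sketch (SETUP, result, v);
  v = default_reduction_sketch (INIT, result, v);
  for (int i = 1; i <= n; i++)
    v += i;
  v = default_reduction_sketch (FINI, result, v);
  v = default_reduction_sketch (TEARDOWN, result, v);
  return v;
}
```

With a receiver object the incoming value is picked up at SETUP and written back at TEARDOWN; without one, every step is a plain copy of VAR.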


Does that help?

nathan

Re: [Boolean Vector, patch 1/5] Introduce boolean vector to be used as a vector comparison type

2015-11-03 Thread Jeff Law


On 10/29/2015 07:08 AM, Ilya Enkovich wrote:

On 28 Oct 22:37, Ilya Enkovich wrote:

Seems the problem occurs in this check in expand_vector_operations_1:

   /* A scalar operation pretending to be a vector one.  */
   if (VECTOR_BOOLEAN_TYPE_P (type)
   && !VECTOR_MODE_P (TYPE_MODE (type))
   && TYPE_MODE (type) != BLKmode)
 return;

This is to filter out scalar operations on boolean vectors.
The problem here is that TYPE_MODE (type) doesn't return
V4SImode assigned to the type but calls vector_type_mode
instead which tries to find an integer mode for it and returns
TImode. This causes function exit and we don't expand vector
comparison.

I suppose a simple option to fix it is to change the default get_mask_mode
hook to return BLKmode in case the chosen integer vector mode is not
vector_mode_supported_p.

Thanks,
Ilya



Here is a patch which fixes the problem on ARM (and on i386 with -mno-sse 
also).  I checked it fixes the problem on ARM and also bootstrapped and checked 
it on x86_64-unknown-linux-gnu.  Is it OK?

Thanks,
Ilya
--
gcc/

2015-10-29  Ilya Enkovich  

* targhooks.c (default_get_mask_mode): Use BLKmode in
case target doesn't support required vector mode.
* stor-layout.c (layout_type): Check for BLKmode.
And just to be clear, since Richi pointed out that we're already using 
BLKmode for this kind of situation, this patch is OK.
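
As an illustration of why the BLKmode fallback helps, here is a small model of the two pieces involved. The mode enumeration, the `have_simd` flag and all function names are hypothetical simplifications; the real hook and the real check operate on GCC's machine modes and types.

```cpp
// Toy machine modes; the real set is generated per target.
enum mode_sketch { BLKmode_s, TImode_s, V4SImode_s };

static bool
vector_mode_p (mode_sketch m)
{
  return m == V4SImode_s;
}

// Hypothetical target query: V4SImode is usable only with SIMD enabled.
static bool
vector_mode_supported (mode_sketch m, bool have_simd)
{
  return have_simd && vector_mode_p (m);
}

// Sketch of the patched default hook: fall back to BLKmode when the
// chosen integer vector mode is not supported by the target.
static mode_sketch
get_mask_mode_sketch (bool have_simd)
{
  mode_sketch m = V4SImode_s;
  return vector_mode_supported (m, have_simd) ? m : BLKmode_s;
}

// Shape of the check quoted from expand_vector_operations_1: a boolean
// vector whose mode is neither a vector mode nor BLKmode is treated as
// a scalar operation and not expanded.
static bool
scalar_pretending_to_be_vector (mode_sketch type_mode)
{
  return !vector_mode_p (type_mode) && type_mode != BLKmode_s;
}
```

A type that ends up with TImode is skipped by the check, while one with BLKmode is not, so the vector comparison still gets expanded.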


Jeff

Re: [PATCH] Fix typo.

2015-11-03 Thread Yulia Koval

Here it is.

gcc/
* config/i386/i386.c (m_SKYLAKE_AVX512): Fix typo.

Yulia

On Tue, Nov 3, 2015 at 5:32 PM, Uros Bizjak  wrote:
> On Tue, Nov 3, 2015 at 3:16 PM, Yulia Koval  wrote:
>> Hi,
>>
>> This patch fixes a typo: PROCESSOT -> PROCESSOR. Ok for trunk?
>
> Trivial patch, OK with a suitable ChangeLog.
>
> Uros.

Re: [2/2] OpenACC routine support

2015-11-03 Thread Nathan Sidwell


On 11/03/15 10:38, Jakub Jelinek wrote:

On Mon, Nov 02, 2015 at 02:23:19PM -0500, Nathan Sidwell wrote:

Here are the tests for the routine support.  The compiler tests check
invalid combinations of gang, worker, vector & seq.  The libgomp execution
tests check the expected partitioning occurs within loops.  As with the
reduction tests, these ones are taken from the execution model loop tests.


I find the testsuite coverage insufficient, e.g. you don't have equivalent
of first half of declare-simd-2.C or declare-simd-2.c
(everything above #pragma omp declare simd inbranch notinbranch),
to verify that if acc routine is used without the (fnname) in it, then
it can't be followed by var definition and various other tokens.


d'oh! forgot to port those tests.  Easy fix.

ok with that added?

nathan

Re: [2/2] OpenACC routine support

2015-11-03 Thread Jakub Jelinek

On Tue, Nov 03, 2015 at 10:56:37AM -0500, Nathan Sidwell wrote:
> On 11/03/15 10:38, Jakub Jelinek wrote:
> >On Mon, Nov 02, 2015 at 02:23:19PM -0500, Nathan Sidwell wrote:
> >>Here are the tests for the routine support.  The compiler tests check
> >>invalid combinations of gang, worker, vector & seq.  The libgomp execution
> >>tests check the expected partitioning occurs within loops.  As with the
> >>reduction tests, these ones are taken from the execution model loop tests.
> >
> >I find the testsuite coverage insufficient, e.g. you don't have equivalent
> >of first half of declare-simd-2.C or declare-simd-2.c
> >(everything above #pragma omp declare simd inbranch notinbranch),
> >to verify that if acc routine is used without the (fnname) in it, then
> >it can't be followed by var definition and various other tokens.
> 
> d'oh! forgot to port those tests.  Easy fix.
> 
> ok with that added?

Yes.

Jakub

Re: [PATCH] PR 68192 Export AIX TLS symbols

2015-11-03 Thread David Edelsohn

On Tue, Nov 3, 2015 at 9:47 AM, Richard Biener
 wrote:
> On Tue, Nov 3, 2015 at 3:23 PM, David Edelsohn  wrote:
>> TLS symbols in AIX display a new, different symbol type in nm output.
>> Libtool explicitly creates a list of exported symbols for shared
>> libraries using nm and does not recognize the new TLS symbols, so
>> those symbols are not exported.
>>
>> This is a regression for TLS support on AIX.
>>
>> This patch updates libtool.m4 in GCC and configure for libstdc++-v3,
>> libgfortran, and libgomp.  I would like to apply the patch to GCC
>> while I simultaneously work with the Libtool community to correct the
>> bug upstream.  I also would like to backport this to GCC 5.2 and GCC
>> 4.9.x.
>
> What does this do to the set of exported symbols, esp. on branches?
> Does AIX have library versions?

The patch restores symbols that were omitted by the "nm" change.  The
symbols are a superset of the current symbols and were present before,
which is why this is a regression.

Thanks, David

Re: [OpenACC] declare directive

2015-11-03 Thread James Norris


On 10/27/2015 03:18 PM, James Norris wrote:

Hi!

 This patch adds the processing of OpenACC declare directive in C
 and C++. (Note: Support in Fortran is already in trunk.)
 Commentary on the changes is included as an attachment (NOTES).

 All of the code is in the gomp-4_0-branch.

 Regtested on x86_64-linux.

 Thanks!
 Jim


Ping!

I've revised the patch since I originally submitted it for review
(https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02967.html). The
revision is due to the OpenMP 4.5 work by Jakub et al. in the area of
'omp declare target'. I now exploit that functionality and have
revised the patch accordingly.

Updated ChangeLog, patch, and commentary (NOTES) are attached.

Regtested on x86_64-linux

Thanks!
Jim

Background
The declare directive is used to allocate device memory for the
entire scope of a variable / array within a program, function,
or subroutine. Consider the following example.

int a[10];
#pragma acc declare create (a)

void func (int *a)
{
  int b[10];
#pragma acc declare create (b)

#pragma acc parallel present (a, b)
  {
    int i;

    for (i = 0; i < 10; i++)
      {
        b[i] = a[i];
        a[i] = b[i] + i;
      }
  }

  return;
}

int main (int argc, char **argv)
{
  func (&a[0]);

  return 0;
}

In the example, array 'a' will be allocated on the device at the
outset of device activity and be available for the duration.
Array 'b', by contrast, will only be available while 'func' is
executing: it will be allocated at the outset of execution of 'func'
and deallocated at the return from 'func'. In some instances, the
clause may require that the host copy of a variable / array be
updated prior to a return from a function or subroutine or exiting
of the program.

C and C++ front-ends

Definitions for use in C and C++ were added to identify the
declare directive pragma and its valid clauses. After the
clauses have been validated, if the declare directive is for a
global variable, then an attribute is created and chained.
These attributes will be used during gimplification.

Once the user-specified clauses have been parsed, the clauses
have to be examined and potentially altered and/or added to.
As mentioned in the previous section, with some clauses, e.g.,
copy, movement of data has to occur at the entry to
something like a function as well as at exit. Hence the need
to examine/modify/add to the clauses so as to effect the
correct data movement.

For all instances of the declare directive, there is at least
one set of 'entry' clauses. If the clauses pertain to global
variables, a constructor is created. This constructor will
'register' the variables / arrays so that at the beginning of
the OpenACC runtime the variables / arrays will be allocated and
made available throughout program execution.
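
A rough sketch of the constructor-based registration idea, for illustration only. The registry, `register_declared_var` and `register_globals` are hypothetical names; the actual runtime entry point used by the patch is not shown in these notes. The `constructor` attribute is the existing GCC extension that runs a function before main.

```cpp
#include <cstddef>

// Hypothetical registry standing in for whatever the runtime keeps.
struct declared_var
{
  void *host_addr;
  std::size_t size;
};

static declared_var registry[8];
static int registry_count = 0;

// Hypothetical registration routine: remember the host address and size
// so the variable can be allocated on the device later.
static void
register_declared_var (void *addr, std::size_t size)
{
  if (registry_count < 8)
    {
      registry[registry_count].host_addr = addr;
      registry[registry_count].size = size;
      registry_count++;
    }
}

// A global array, as in '#pragma acc declare create (a)'.
int a[10];

// GCC extension: this function runs before main, which is how the
// global gets registered ahead of any device activity.
static void register_globals (void) __attribute__ ((constructor));

static void
register_globals (void)
{
  register_declared_var (a, sizeof a);
}
```

By the time main starts, the registry already holds one entry describing 'a'.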

If on the other hand, the 'entry' clauses are not found to be
of a global type, then a node is created and the clauses are
associated with it. Also note that the 'return' clauses are
also associated with the node. Notice that there are 'return'
clauses only for non-global variables / arrays. The clauses
available for global variables / arrays only allow for data
movement at the initiation of program execution.

Middle-end

The OACC_DECLARE node is handled much the same as other OpenACC
nodes that represent directives. However, there is one thing
unique to declare, and that is the handling of the 'return'
clauses. The 'return' clauses are scanned and then a gimple
statement is created, but is not added immediately. Instead, it is
saved to be added after the body has been gimplified.

The intent of this last-minute addition is to allow this statement
to be executed prior to returning from a function. JAKUB: While
this has been working, I'm not completely sure this is the proper
means by which to do this in order to guarantee this statement
is the last one executed. Please advise otherwise.

Callgraph

The 'make offload' functionality has been refactored to handle
OpenACC variables / arrays. Whether a variable is OpenACC declare'd
is not known at the time the varpool node is created. This
requires that a check of the offloadable bit to determine
whether make_offloadable s

[Patch AArch64] Switch constant pools to separate rodata sections.

2015-11-03 Thread Ramana Radhakrishnan

Hi,

Now that PR63304 is fixed and we have an option to address
any part of the memory using adrp / add or adrp / ldr instructions
it makes sense to switch out literal pools into their own
mergeable sections by default.

This would mean that by default we could now start getting
the benefits of constant sharing across the board, potentially
improving code size. The other advantage of doing so, for the
security conscious, is that this prevents intermingling of literal
pools with code.
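
The selection rule the patch implements can be summarised with a small model. The enums and `select_rtx_section_sketch` are simplified stand-ins; the real function returns a `section *` and also receives the mode, rtx and alignment.

```cpp
// Simplified stand-ins for the AArch64 code models and section choices.
enum code_model_sketch { CMODEL_TINY, CMODEL_SMALL, CMODEL_LARGE };
enum section_choice { FUNCTION_SECTION, DEFAULT_RODATA_SECTION };

// Constants stay inline with the function text when PC-relative literal
// loads are still in use or when the large code model is selected;
// otherwise the default ELF handling places them in (mergeable) rodata.
static section_choice
select_rtx_section_sketch (bool nopcrelative_literal_loads,
                           code_model_sketch cmodel)
{
  if (!nopcrelative_literal_loads || cmodel == CMODEL_LARGE)
    return FUNCTION_SECTION;

  return DEFAULT_RODATA_SECTION;
}
```

Only the combination of non-PC-relative literal loads and a non-large code model moves the constants out of the text section.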

Wilco's kindly done some performance measurements and suggests that
there is not really a performance regression in doing this.
I've looked at the code size for SPEC2k6 today at -Ofast and
in general there is a good code size improvement as expected
by sharing said constants.

Tested on aarch64-none-elf with no regressions and bootstrapped 
and regression tested in my tree for a number of days now.

Ok to commit ?

regards
Ramana

   * config/aarch64/aarch64.c (aarch64_select_rtx_section): Switch
to default section handling for non PC relative literal loads.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5c8604f..9d709e5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5244,13 +5244,22 @@ aarch64_use_blocks_for_constant_p (machine_mode mode ATTRIBUTE_UNUSED,
   return false;
 }
 
+/* Force all constant pool entries into the current function section.
+   In the large model we cannot reliably address all the address space
+   thus for now, inline this with the text.  */
 static section *
-aarch64_select_rtx_section (machine_mode mode ATTRIBUTE_UNUSED,
-   rtx x ATTRIBUTE_UNUSED,
-   unsigned HOST_WIDE_INT align ATTRIBUTE_UNUSED)
-{
-  /* Force all constant pool entries into the current function section.  */
-  return function_section (current_function_decl);
+aarch64_select_rtx_section (machine_mode mode,
+   rtx x,
+   unsigned HOST_WIDE_INT align)
+{
+  /* Force all constant pool entries into the current function section.
+ In the large model we cannot reliably address all the address space
+ thus for now, inline this with the text.  */
+  if (!aarch64_nopcrelative_literal_loads
+  || aarch64_cmodel == AARCH64_CMODEL_LARGE)
+return function_section (current_function_decl);
+
+  return default_elf_select_rtx_section (mode, x, align);
 }
 
 
-- 
1.9.1

Re: [PATCH, 2/2] Handle recursive restrict in function parameter

2015-11-03 Thread Tom de Vries


On 03/11/15 16:08, Richard Biener wrote:

On Tue, 3 Nov 2015, Tom de Vries wrote:


On 01/11/15 19:20, Tom de Vries wrote:

On 01/11/15 19:03, Tom de Vries wrote:

So, the new patch series is:

   1. Rename make_restrict_var_constraints to make_param_constraints
   2. Handle recursive restrict in function parameter

I'll repost in reply to this message.



This patch adds handling of all the restrict qualifiers in the type of a
function parameter.
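
To make the intent concrete, here is a sketch modelled on the new restrict-7.c test, written with the `__restrict__` spelling so it also compiles as C++. `store_through_restrict` and `call_without_aliasing` are illustrative names. With every level of 'a' restrict-qualified, the store through `***a` may be assumed not to alias `*b`, which is what would let the compiler fold the final load to the constant 1.

```cpp
// Same shape as the function in restrict-7.c: all three pointer levels
// of 'a' are restrict-qualified.
int
store_through_restrict (int *__restrict__ *__restrict__ *__restrict__ a,
                        int *b)
{
  *b = 1;
  ***a = 2;
  return *b;
}

// A caller that honours the restrict contract: 'target' and 'other' are
// distinct objects, so the two stores cannot overlap.
static int
call_without_aliasing (int *final_target)
{
  int target = 0, other = 0;
  int *__restrict__ p1 = &target;
  int *__restrict__ *__restrict__ p2 = &p1;
  int result = store_through_restrict (&p2, &other);
  *final_target = target;
  return result;
}
```

Whether the fold actually happens depends on the optimization level and on the alias analysis handling the nested qualifiers, which is the subject of the patch.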



And reposting an updated version, now that the toplevel parameter in
make_param_constraints has been eliminated.


@@ -5195,6 +5197,8 @@ struct fieldoff
unsigned may_have_pointers : 1;

unsigned only_restrict_pointers : 1;
+
+  varinfo_t restrict_var;
  };

store the varinfo ID here, 'unsigned int restrict_var' which ends
up not changing fieldoff size.  get_varinfo (restrict_var) will get
you the varinfo_t.
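
A sketch of the suggestion, with made-up types. The struct layouts only approximate `struct fieldoff`, and `varmap_sketch`, `new_varinfo_sketch` and `get_varinfo_sketch` are hypothetical stand-ins for the real varinfo table; slot 0 is assumed here to mean "no variable".

```cpp
#include <vector>

// Stand-in for variable_info; only the fields needed for the sketch.
struct variable_info_sketch
{
  unsigned int id;
  bool is_restrict_var;
};

// Stand-in for the global table that maps IDs to variable infos.
static std::vector<variable_info_sketch> varmap_sketch;

// Create a new entry and return its ID.  ID 0 is reserved as "none".
static unsigned int
new_varinfo_sketch (bool is_restrict)
{
  if (varmap_sketch.empty ())
    {
      variable_info_sketch none = { 0, false };
      varmap_sketch.push_back (none);
    }
  unsigned int id = (unsigned int) varmap_sketch.size ();
  variable_info_sketch vi = { id, is_restrict };
  varmap_sketch.push_back (vi);
  return id;
}

// Look an entry up by ID, mirroring the role of get_varinfo.
static variable_info_sketch *
get_varinfo_sketch (unsigned int id)
{
  return &varmap_sketch[id];
}

// Variant storing the ID: on common LP64 ABIs the unsigned int can sit
// next to the bit-fields without growing the struct.
struct fieldoff_with_id
{
  long long offset;
  unsigned long long size;
  unsigned has_unknown_size : 1;
  unsigned must_have_pointers : 1;
  unsigned may_have_pointers : 1;
  unsigned only_restrict_pointers : 1;
  unsigned int restrict_var;
};

// Variant storing a pointer: typically needs an extra aligned word.
struct fieldoff_with_ptr
{
  long long offset;
  unsigned long long size;
  unsigned has_unknown_size : 1;
  unsigned must_have_pointers : 1;
  unsigned may_have_pointers : 1;
  unsigned only_restrict_pointers : 1;
  variable_info_sketch *restrict_var;
};
```

Comparing `sizeof (fieldoff_with_id)` with `sizeof (fieldoff_with_ptr)` on a given target shows the difference; the exact numbers are ABI dependent.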


Done, attached.



@@ -5374,6 +5380,19 @@ push_fields_onto_fieldstack (tree type,
vec *fieldstack,
   = (!has_unknown_size
  && POINTER_TYPE_P (field_type)
  && TYPE_RESTRICT (field_type));
+   if (handle_param
+   && e.only_restrict_pointers
+   && !type_contains_placeholder_p (TREE_TYPE
(field_type)))
+ {
+   varinfo_t rvi;
+   tree heapvar = build_fake_var_decl (TREE_TYPE
(field_type));
+   DECL_EXTERNAL (heapvar) = 1;
+   rvi = create_variable_info_for_1 (heapvar,
"PARM_NOALIAS",
+ true, true);
+   rvi->is_restrict_var = 1;
+   insert_vi_for_tree (heapvar, rvi);
+   e.restrict_var = rvi;
+ }

hmm, can you delay this to the point we actually will use field-sensitive
stuff?  That is, until create_variable_info_for_1 decided to use a
multi-field variable?


AFAIU your concern is that in the current patch we're creating heapvars 
that may end up being ignored, e.g. if we hit the 
MAX_FIELDS_FOR_FIELD_SENSITIVE threshold?



 Say, here:

+  if (handle_param
+ && newvi->only_restrict_pointers
+ && fo->restrict_var != NULL)
+   {
+ make_constraint_from (newvi, fo->restrict_var->id);
+ make_param_constraints (fo->restrict_var);
+   }

?  Looks like then you don't need the new field at all.



The build_fake_var_decl call needs TREE_TYPE (field_type), the type the 
restrict pointer field points to.


The field type is no longer available once we've abstracted the struct 
type into a field stack in create_variable_info_for_1.


I think I can postpone the creation of the heapvar till where you 
suggest in create_variable_info_for_1, but I'd still need a means
to communicate the TREE_TYPE (field_type) from 
push_fields_onto_fieldstack to create_variable_info_for_1.


A simple implementation would be a new field:
...
@@ -5195,6 +5197,8 @@ struct fieldoff
unsigned may_have_pointers : 1;

unsigned only_restrict_pointers : 1;
+
+ tree restrict_pointed_type;
};
...
Which AFAIU will change fieldoff size.

Thanks,
- Tom

Handle recursive restrict in function parameter

	* tree-ssa-structalias.c (struct fieldoff): Add restrict_var field.
	(push_fields_onto_fieldstack): Add and handle handle_param parameter.
	(create_variable_info_for_1): Add and handle
	handle_param parameter.  Add extra arg to call to
	push_fields_onto_fieldstack.  Handle restrict pointer fields.
	(create_variable_info_for): Call create_variable_info_for_1 with extra
	arg.
	(make_param_constraints): Drop restrict_name parameter.  Ignore
	vi->only_restrict_pointers.
	(intra_create_variable_infos): Call create_variable_info_for_1 with
	extra arg.  Remove restrict handling.  Call make_param_constraints with
	one less arg.

	* gcc.dg/tree-ssa/restrict-7.c: New test.
	* gcc.dg/tree-ssa/restrict-8.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/restrict-7.c | 12 
 gcc/testsuite/gcc.dg/tree-ssa/restrict-8.c | 17 ++
 gcc/tree-ssa-structalias.c | 91 ++
 3 files changed, 84 insertions(+), 36 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/restrict-7.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/restrict-8.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/restrict-7.c b/gcc/testsuite/gcc.dg/tree-ssa/restrict-7.c
new file mode 100644
index 000..f7a68c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/restrict-7.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+int
+f (int *__restrict__ *__restrict__ *__restrict__ a, int *b)
+{
+  *b = 1;
+  ***a  = 2;
+  return *b;
+}
+
+/* { dg-final { scan-tree-dump-times "return 1" 1 "fre1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/restrict-8.c b/gcc/testsuite/gcc.dg/tree-ssa/restrict-8.c
new file mode 100644
index 000..b0ab164
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/restrict-8.c
@@ -0,0 +1

Re: [PATCH] remove unused config/arm/coff.h

2015-11-03 Thread Richard Earnshaw

On 03/11/15 15:37, Trevor Saunders wrote:
> On Tue, Nov 03, 2015 at 06:49:20AM -0700, Jeff Law wrote:
>> On 11/03/2015 04:31 AM, tbsaunde+...@tbsaunde.org wrote:
>>> From: Trevor Saunders 
>>>
>>> Hi,
>>>
>>> $subject, nothing refers to this header so we might as well remove it.
>>>
>>> tested I can still build on x86_64-linux-gnu, not that I would expect
>>> anything else or that it is particularly relevant, ok?
>>>
>>> Trev
>>>
>>> gcc/ChangeLog:
>>>
>>> 2015-11-03  Trevor Saunders  
>>>
>>> * config/arm/coff.h: Remove.
>> More generally, if we have a header file that's not used, I'd consider
>> removing it to be obvious-enough to commit without approval.
>>
>> We could/should probably do the same with unused functions, with the only
>> wrinkle being things that are useful for debugging but which are otherwise
>> unused should be kept around.
> 
> I'd agree removing things that are dead is obvious, but sometimes things
> are so convoluted it's hard to be sure they are in fact dead, and then
> it's nice to have some confirmation you didn't miss something :)
> 

Caution is to be commended.  In this sort of case, however, you can
probably claim obviousness but then self-impose a pre-commit timeout: I
think this is obvious but I'll wait 24 hours before committing.

R.

> Trev
> 
>>
>> jeff
>>

[c++-delayed-folding] Introduce convert_to_real_nofold

2015-11-03 Thread Marek Polacek

The last piece for convert.c.  Since convert_to_real uses fold ()
rather than fold_buildN, I defined a new macro to keep the code
more compact.
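
The pattern can be shown in isolation with a toy expression type. Only the `maybe_fold` macro has the same shape as in the patch; `expr_sketch`, the toy `fold` and the two wrapper functions are assumptions made for the sketch, with the wrappers guessed by analogy with the other _nofold functions in convert.c.

```cpp
// Toy expression: a value plus a flag recording whether fold () ran.
struct expr_sketch
{
  double value;
  bool folded;
};

// Toy fold: just marks the expression as folded.
static expr_sketch
fold (expr_sketch e)
{
  e.folded = true;
  return e;
}

// Same shape as the macro added by the patch.
#define maybe_fold(FOLD_P, EXPR) \
  ((FOLD_P) ? fold (EXPR) : (EXPR))

// Worker taking the FOLD_P flag, in the style of convert_to_real_1.
static expr_sketch
convert_to_real_1_sketch (int input, bool fold_p)
{
  expr_sketch e = { (double) input, false };
  return maybe_fold (fold_p, e);
}

// Assumed thin wrappers: one folds, one does not.
static expr_sketch
convert_to_real_sketch (int input)
{
  return convert_to_real_1_sketch (input, true);
}

static expr_sketch
convert_to_real_nofold_sketch (int input)
{
  return convert_to_real_1_sketch (input, false);
}
```

Threading the flag through the recursive worker keeps a single copy of the conversion logic, and the macro keeps each call site to one line.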

With this committed, convert.c should be dealt with.  If there's
anything else I could help with, please let me know.

Bootstrapped/regtested on x86_64-linux, ok for branch?

diff --git gcc/convert.c gcc/convert.c
index ec6ff37..3e593db 100644
--- gcc/convert.c
+++ gcc/convert.c
@@ -37,6 +37,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "ubsan.h"
 
+#define maybe_fold(FOLD_P, EXPR) \
+  ((FOLD_P) ? fold (EXPR) : (EXPR))
 #define maybe_fold_build1_loc(FOLD_P, LOC, CODE, TYPE, EXPR) \
   ((FOLD_P) ? fold_build1_loc (LOC, CODE, TYPE, EXPR)   \
: build1_loc (LOC, CODE, TYPE, EXPR))
@@ -119,17 +121,19 @@ convert_to_pointer_nofold (tree type, tree expr)
 /* Convert EXPR to some floating-point type TYPE.
 
EXPR must be float, fixed-point, integer, or enumeral;
-   in other cases error is called.  */
+   in other cases error is called.  If FOLD_P is true, try to fold
+   the expression.  */
 
-tree
-convert_to_real (tree type, tree expr)
+static tree
+convert_to_real_1 (tree type, tree expr, bool fold_p)
 {
   enum built_in_function fcode = builtin_mathfn_code (expr);
   tree itype = TREE_TYPE (expr);
+  location_t loc = EXPR_LOCATION (expr);
 
   if (TREE_CODE (expr) == COMPOUND_EXPR)
 {
-  tree t = convert_to_real (type, TREE_OPERAND (expr, 1));
+  tree t = convert_to_real_1 (type, TREE_OPERAND (expr, 1), fold_p);
   if (t == TREE_OPERAND (expr, 1))
return expr;
   return build2_loc (EXPR_LOCATION (expr), COMPOUND_EXPR, TREE_TYPE (t),
@@ -237,14 +241,13 @@ convert_to_real (tree type, tree expr)
  || TYPE_MODE (newtype) == TYPE_MODE (float_type_node)))
{
  tree fn = mathfn_built_in (newtype, fcode);
-
  if (fn)
- {
-   tree arg = fold (convert_to_real (newtype, arg0));
-   expr = build_call_expr (fn, 1, arg);
-   if (newtype == type)
- return expr;
- }
+   {
+ tree arg = convert_to_real_1 (newtype, arg0, fold_p);
+ expr = build_call_expr (fn, 1, maybe_fold (fold_p, arg));
+ if (newtype == type)
+   return expr;
+   }
}
}
default:
@@ -263,9 +266,11 @@ convert_to_real (tree type, tree expr)
  if (!flag_rounding_math
  && FLOAT_TYPE_P (itype)
  && TYPE_PRECISION (type) < TYPE_PRECISION (itype))
-   return build1 (TREE_CODE (expr), type,
-  fold (convert_to_real (type,
- TREE_OPERAND (expr, 0;
+   {
+ tree arg = convert_to_real_1 (type, TREE_OPERAND (expr, 0),
+   fold_p);
+ return build1 (TREE_CODE (expr), type, maybe_fold (fold_p, arg));
+   }
  break;
/* Convert (outertype)((innertype0)a+(innertype1)b)
   into ((newtype)a+(newtype)b) where newtype
@@ -301,8 +306,14 @@ convert_to_real (tree type, tree expr)
  || newtype == dfloat128_type_node)
{
  expr = build2 (TREE_CODE (expr), newtype,
-fold (convert_to_real (newtype, arg0)),
-fold (convert_to_real (newtype, arg1)));
+maybe_fold (fold_p,
+convert_to_real_1 (newtype,
+   arg0,
+   fold_p)),
+maybe_fold (fold_p,
+convert_to_real_1 (newtype,
+   arg1,
+   fold_p)));
  if (newtype == type)
return expr;
  break;
@@ -341,8 +352,14 @@ convert_to_real (tree type, tree expr)
  && !excess_precision_type (newtype
{
  expr = build2 (TREE_CODE (expr), newtype,
-fold (convert_to_real (newtype, arg0)),
-fold (convert_to_real (newtype, arg1)));
+maybe_fold (fold_p,
+convert_to_real_1 (newtype,
+   arg0,
+   fold_p)),
+maybe_fold (fold_p

[PATCH][RFC] Remove warning for SET VOIDmode -> BLKmode.

2015-11-03 Thread Dominik Vogt

The attached patch removes the messages "warning: source missing a
mode?" and "warning: operand ... missing mode?" (genrecog.c) for
the case that the DEST of a SET rtx has BLKmode and SRC has
VOIDmode.  The mvcle instruction on s390 has a pretty weird format
that takes the lowest eight bits of an address with displacement
as the fill pattern.  Suggestions are welcome.

One of the patterns we're talking about is

(define_insn "*setmem_long"
  [(clobber (match_operand: 0 "register_operand" "=d"))
   (set (mem:BLK (subreg:P (match_operand: 3 "register_operand" "0") 0))
(match_operand 2 "shift_count_or_setmem_operand" "Y"))
 ^^^
   (use (match_dup 3))
   (use (match_operand: 1 "register_operand" "d"))
   (clobber (reg:CC CC_REGNUM))]
  "TARGET_64BIT || !TARGET_ZARCH"
  "mvcle\t%0,%1,%Y2\;jo\t.-4"
  [(set_attr "length" "8")
   (set_attr "type" "vs")])

("shift_count_or_setmem_operand" is the fill byte coded into the
instruction's operand in the form "D2(B2)".)
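
As a small illustration of the encoding described above, the fill byte is the low eight bits of the base-plus-displacement value. `mvcle_fill_byte_sketch` is a hypothetical helper written for this note, not something from the s390 backend.

```cpp
#include <cstdint>

// The "address" D2(B2) is never dereferenced; only the low eight bits
// of base + displacement are used as the fill pattern.
static std::uint8_t
mvcle_fill_byte_sketch (std::uint64_t base, std::uint32_t displacement)
{
  return (std::uint8_t) ((base + displacement) & 0xff);
}
```

This is why the source operand carries no mode of its own: it is an address-shaped operand standing in for a byte value.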

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

gcc/ChangeLog

+   * genrecog.c (validate_pattern): Allow "set VOIDmode -> BLKmode" without
+   warnings.

From ee8e5aacb020e08b71ad879af53654039f75c929 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Tue, 3 Nov 2015 16:42:37 +0100
Subject: [PATCH] Remove warning for SET VOIDmode -> BLKmode.

---
 gcc/genrecog.c | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/gcc/genrecog.c b/gcc/genrecog.c
index 599121f..38770d1 100644
--- a/gcc/genrecog.c
+++ b/gcc/genrecog.c
@@ -545,7 +545,7 @@ validate_pattern (rtx pattern, md_rtx_info *info, rtx set, int set_code)
 	  }
 
 	/* A MATCH_OPERAND that is a SET should have an output reload.  */
-	else if (set && constraints0)
+	else if (set_code && constraints0)
 	  {
 		if (set_code == '+')
 		  {
@@ -596,7 +596,7 @@ validate_pattern (rtx pattern, md_rtx_info *info, rtx set, int set_code)
 	/* Allowing non-lvalues in destinations -- particularly CONST_INT --
 	   while not likely to occur at runtime, results in less efficient
 	   code from insn-recog.c.  */
-	if (set && pred && pred->allows_non_lvalue)
+	if (set_code && pred && pred->allows_non_lvalue)
 	  error_at (info->loc, "destination operand %d allows non-lvalue",
 		XINT (pattern, 0));
 
@@ -616,8 +616,13 @@ validate_pattern (rtx pattern, md_rtx_info *info, rtx set, int set_code)
 	&& pred->allows_non_const
 	&& strstr (c_test, "operands") == NULL
 	&& ! (set
+		  && set_code
 		  && GET_CODE (set) == SET
-		  && GET_CODE (SET_SRC (set)) == CALL))
+		  && GET_CODE (SET_SRC (set)) == CALL)
+	&& ! (set
+		  && set_code == 0
+		  && GET_CODE (set) == SET
+		  && GET_MODE (SET_DEST (set)) == BLKmode))
 	  message_at (info->loc, "warning: operand %d missing mode?",
 		  XINT (pattern, 0));
 	return;
@@ -666,6 +671,7 @@ validate_pattern (rtx pattern, md_rtx_info *info, rtx set, int set_code)
 	/* If only one of the operands is VOIDmode, and PC or CC0 is
 	   not involved, it's probably a mistake.  */
 	else if (dmode != smode
+		 && dmode != BLKmode
 		 && GET_CODE (dest) != PC
 		 && GET_CODE (dest) != CC0
 		 && GET_CODE (src) != PC
@@ -676,13 +682,13 @@ validate_pattern (rtx pattern, md_rtx_info *info, rtx set, int set_code)
 	  {
 	const char *which;
 	which = (dmode == VOIDmode ? "destination" : "source");
-	message_at (info->loc, "warning: %s missing a mode?", which);
+	message_at (info->loc, "warning: %s missing a mode? %d %d", which, dmode, smode);
 	  }
 
 	if (dest != SET_DEST (pattern))
 	  validate_pattern (dest, info, pattern, '=');
 	validate_pattern (SET_DEST (pattern), info, pattern, '=');
-validate_pattern (SET_SRC (pattern), info, NULL_RTX, 0);
+validate_pattern (SET_SRC (pattern), info, pattern, 0);
 return;
   }
 
@@ -691,13 +697,15 @@ validate_pattern (rtx pattern, md_rtx_info *info, rtx set, int set_code)
   return;
 
 case ZERO_EXTRACT:
-  validate_pattern (XEXP (pattern, 0), info, set, set ? '+' : 0);
+  validate_pattern (XEXP (pattern, 0), info,
+			set_code ? set : NULL_RTX, set_code ? '+' : 0);
   validate_pattern (XEXP (pattern, 1), info, NULL_RTX, 0);
   validate_pattern (XEXP (pattern, 2), info, NULL_RTX, 0);
   return;
 
 case STRICT_LOW_PART:
-  validate_pattern (XEXP (pattern, 0), info, set, set ? '+' : 0);
+  validate_pattern (XEXP (pattern, 0), info,
+			set_code ? set : NULL_RTX, set_code ? '+' : 0);
   return;
 
 case LABEL_REF:
-- 
2.3.0

Re: [Patch AArch64] Switch constant pools to separate rodata sections.

2015-11-03 Thread James Greenhalgh

On Tue, Nov 03, 2015 at 04:36:45PM +, Ramana Radhakrishnan wrote:
> Hi,
> 
>   Now that PR63304 is fixed and we have an option to address
> any part of the memory using adrp / add or adrp / ldr instructions
> it makes sense to switch out literal pools into their own
> mergeable sections by default.
> 
> This would mean that by default we could now start getting
> the benefits of constant sharing across the board, potentially
> improving code size. The other advantage of doing so, for the
> security conscious is that this prevents intermingling of literal
> pools with code.
> 
> Wilco's kindly done some performance measurements and suggests that
> there is not really a performance regression in doing this.
> I've looked at the code size for SPEC2k6 today at -Ofast and
> in general there is a good code size improvement as expected
> by sharing said constants.
> 
> Tested on aarch64-none-elf with no regressions and bootstrapped 
> and regression tested in my tree for a number of days now.
> 
> Ok to commit ?

OK with the comment nits below fixed up.

>* config/aarch64/aarch64.c (aarch64_select_rtx_section): Switch
> to default section handling for non PC relative literal loads.

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 5c8604f..9d709e5 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -5244,13 +5244,22 @@ aarch64_use_blocks_for_constant_p (machine_mode mode 
> ATTRIBUTE_UNUSED,
>return false;
>  }
>  
> +/* Force all constant pool entries into the current function section.

Is this comment accurate now? I think it only applied to -mcmodel=large
but maybe I'm misunderstanding?

> +   In the large model we cannot reliably address all the address space
> +   thus for now, inline this with the text.  */
>  static section *
> +aarch64_select_rtx_section (machine_mode mode,
> + rtx x,
> + unsigned HOST_WIDE_INT align)
> +{
> +  /* Force all constant pool entries into the current function section.
> + In the large model we cannot reliably address all the address space
> + thus for now, inline this with the text.  */
> +  if (!aarch64_nopcrelative_literal_loads
> +  || aarch64_cmodel == AARCH64_CMODEL_LARGE)
> +return function_section (current_function_decl);

This is just a copy paste of the text above (and probably the better place
for it).

I think we'd want a more general comment at the top of the function,
then this can stay.

Thanks,
James
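
For readers following the thread, the decision made by the quoted hunk can be modelled in isolation. The sketch below is a standalone toy, not GCC code: the enum and function names are hypothetical, and the "default section" branch is an assumption taken from the ChangeLog wording, since the quoted hunk does not show what follows the early return.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the code models and section kinds; these are
   not the real GCC types.  */
enum toy_cmodel { TOY_CMODEL_SMALL, TOY_CMODEL_LARGE };
enum toy_section { TOY_FUNCTION_SECTION, TOY_DEFAULT_SECTION };

/* Mirrors the condition in the quoted hunk: constants stay with the
   function text unless PC-relative literal loads are disabled and the
   code model is not the large one.  */
enum toy_section
toy_select_rtx_section (bool nopcrelative_literal_loads,
                        enum toy_cmodel cmodel)
{
  if (!nopcrelative_literal_loads || cmodel == TOY_CMODEL_LARGE)
    return TOY_FUNCTION_SECTION;

  /* Assumed from the ChangeLog text: fall back to default handling.  */
  return TOY_DEFAULT_SECTION;
}
```

Under these assumptions only the combination "literal loads not PC-relative, code model not large" reaches the default (mergeable) section handling; every other combination keeps the constants inline with the text.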

Re: [PATCH] Fix declaration of pthread-structs in s-osinte-rtems.ads

2015-11-03 Thread Arnaud Charlet

> Let's try again. This time I made the diff against trunk with the changes
> Sebastian recommended, included a ChangeLog and used svn-diff.
> If this patch goes through, please let me know how the backporting works.

Your ChangeLog entry is not in the proper format, see sections 6.8.1 and
6.8.2 from http://www.gnu.org/prep/standards/standards.html

The diff itself is OK.

You can use svn merge to merge changes on other branches, or try to
apply your diff using the "patch" command, and adjust for any merge
conflict.

Arno

Re: [PATCH][RFC] Remove warning for SET VOIDmode -> BLKmode.

2015-11-03 Thread Dominik Vogt

(Debug code removed from patch.)

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* genrecog.c (validate_pattern): Allow "set VOIDmode -> BLKmode" without
warnings.
From 04376919c108c42a2e9835dd1809b198bc47513f Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Tue, 3 Nov 2015 16:42:37 +0100
Subject: [PATCH] Remove warning for SET VOIDmode -> BLKmode.

---
 gcc/genrecog.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/gcc/genrecog.c b/gcc/genrecog.c
index 599121f..2c1fb47 100644
--- a/gcc/genrecog.c
+++ b/gcc/genrecog.c
@@ -545,7 +545,7 @@ validate_pattern (rtx pattern, md_rtx_info *info, rtx set, int set_code)
 	  }
 
 	/* A MATCH_OPERAND that is a SET should have an output reload.  */
-	else if (set && constraints0)
+	else if (set_code && constraints0)
 	  {
 		if (set_code == '+')
 		  {
@@ -596,7 +596,7 @@ validate_pattern (rtx pattern, md_rtx_info *info, rtx set, int set_code)
 	/* Allowing non-lvalues in destinations -- particularly CONST_INT --
 	   while not likely to occur at runtime, results in less efficient
 	   code from insn-recog.c.  */
-	if (set && pred && pred->allows_non_lvalue)
+	if (set_code && pred && pred->allows_non_lvalue)
 	  error_at (info->loc, "destination operand %d allows non-lvalue",
 		XINT (pattern, 0));
 
@@ -616,8 +616,13 @@ validate_pattern (rtx pattern, md_rtx_info *info, rtx set, int set_code)
 	&& pred->allows_non_const
 	&& strstr (c_test, "operands") == NULL
 	&& ! (set
+		  && set_code
 		  && GET_CODE (set) == SET
-		  && GET_CODE (SET_SRC (set)) == CALL))
+		  && GET_CODE (SET_SRC (set)) == CALL)
+	&& ! (set
+		  && set_code == 0
+		  && GET_CODE (set) == SET
+		  && GET_MODE (SET_DEST (set)) == BLKmode))
 	  message_at (info->loc, "warning: operand %d missing mode?",
 		  XINT (pattern, 0));
 	return;
@@ -666,6 +671,7 @@ validate_pattern (rtx pattern, md_rtx_info *info, rtx set, int set_code)
 	/* If only one of the operands is VOIDmode, and PC or CC0 is
 	   not involved, it's probably a mistake.  */
 	else if (dmode != smode
+		 && dmode != BLKmode
 		 && GET_CODE (dest) != PC
 		 && GET_CODE (dest) != CC0
 		 && GET_CODE (src) != PC
@@ -682,7 +688,7 @@ validate_pattern (rtx pattern, md_rtx_info *info, rtx set, int set_code)
 	if (dest != SET_DEST (pattern))
 	  validate_pattern (dest, info, pattern, '=');
 	validate_pattern (SET_DEST (pattern), info, pattern, '=');
-validate_pattern (SET_SRC (pattern), info, NULL_RTX, 0);
+validate_pattern (SET_SRC (pattern), info, pattern, 0);
 return;
   }
 
@@ -691,13 +697,15 @@ validate_pattern (rtx pattern, md_rtx_info *info, rtx set, int set_code)
   return;
 
 case ZERO_EXTRACT:
-  validate_pattern (XEXP (pattern, 0), info, set, set ? '+' : 0);
+  validate_pattern (XEXP (pattern, 0), info,
+			set_code ? set : NULL_RTX, set_code ? '+' : 0);
   validate_pattern (XEXP (pattern, 1), info, NULL_RTX, 0);
   validate_pattern (XEXP (pattern, 2), info, NULL_RTX, 0);
   return;
 
 case STRICT_LOW_PART:
-  validate_pattern (XEXP (pattern, 0), info, set, set ? '+' : 0);
+  validate_pattern (XEXP (pattern, 0), info,
+			set_code ? set : NULL_RTX, set_code ? '+' : 0);
   return;
 
 case LABEL_REF:
-- 
2.3.0

Re: [PATCH] Add configure flag for operator new (std::nothrow)

2015-11-03 Thread Aurelio Remonda

On Tue, Nov 3, 2015 at 10:26 AM, Paolo Carlini  wrote:
> Finally, since you are touching acinclude.m4 you should
> normally run autoreconf, mention in the ChangeLog the changed regenerated
> files and eventually commit those changes too (like the ChangeLog entries,
> those aren't normally part of the posted patch) About the three issues, you
> have plenty of examples in the mailing list.
>
> Otherwise, about the substance of the patch, I think we want to wait for
> Jonathan to be back.
>
> Paolo.

Just to be sure we are on the same page: I don't have commit access.
On the other hand, I'm not quite sure I follow:
do the changes to the regenerated files have to be part of the patch
(i.e. the configure file changes after running autoreconf)?
Thank you!
-- 
Aurelio Remonda

Software Engineer

San Lorenzo 47, 3rd Floor, Office 5
Córdoba, Argentina
Phone: +54-351-4217888 / 4218211

[PATCH] S/390: Fix warning in "*movstr" pattern.

2015-11-03 Thread Dominik Vogt

The attached patch fixes the message "warning: dest missing a
mode?" from s390.md.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog
 
* config/s390/s390.md ("*movstr"): Fix warning.
From e03251cacf2004e4cb302de03b44bb1a3f6ad827 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Tue, 3 Nov 2015 18:03:02 +0100
Subject: [PATCH] S/390: Fix warning in "*movstr" pattern.

---
 gcc/config/s390/s390.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index ea65c74..8a6f31d 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -2936,7 +2936,7 @@
(set (mem:BLK (match_operand:P 1 "register_operand" "0"))
 	(mem:BLK (match_operand:P 3 "register_operand" "2")))
(set (match_operand:P 0 "register_operand" "=d")
-	(unspec [(mem:BLK (match_dup 1))
+	(unspec:P [(mem:BLK (match_dup 1))
 		 (mem:BLK (match_dup 3))
 		 (reg:SI 0)] UNSPEC_MVST))
(clobber (reg:CC CC_REGNUM))]
-- 
2.3.0

Re: libgo patch committed: Update to Go 1.5 release

2015-11-03 Thread Ian Lance Taylor

On Tue, Nov 3, 2015 at 7:48 AM, Lynn A. Boger
 wrote:
>
> We are seeing failures on all the libgo tests when gccgo is built with the
> latest trunk
> on ppc64 (BE) and when running the testsuite for 64 bit.  The failures
> do not occur if run on ppc64 BE with m32 and do not occur on ppc64le.
>
> The messages say this:
>
> make[3]: Entering directory
> `/home/boger/gccgo.work/trunk/bld/powerpc64-linux/libgo'
> gotest: warning: no tests matching Test([^a-z].*)? in _gotest_.o _xtest_.o
> FAIL: bufio
> make[3]: *** [bufio/check] Error 1
> gotest: warning: no tests matching Test([^a-z].*)? in _gotest_.o _xtest_.o
> FAIL: bytes
> make[3]: *** [bytes/check] Error 1
> gotest: warning: no tests matching Test([^a-z].*)? in _gotest_.o _xtest_.o
> FAIL: errors

I don't know that I have access to a big-endian PPC GNU/Linux machine any more.

My first guess would be that somehow this case in libgo/testsuite/gotest

text="T"
case "$GOARCH" in
ppc64*) text="[TD]" ;;
esac

is not triggering.  Although it checks for ppc64*, I think it's only
required on PPC64 ABI v1 (big-endian) and is not required for ABI v2
(little endian).

You could try changing GOARCH there to goarch to see if it helps,
although as far as I can see either should work.

Otherwise, cd to the libgo working directory, run "make bufio/check",
figure out how it is running gotest, and run "bash -xv
gotest_invocation" and send it here.

Ian

Re: [ARM] Fix PR middle-end/65958

2015-11-03 Thread Richard Earnshaw

On 06/10/15 11:11, Eric Botcazou wrote:
>> Thanks - I have no further comments on this patch. We probably need to
>> implement the same on AArch64 too in order to avoid similar problems.
> 
> Here's the implementation for aarch64, very similar but simpler since there is
> no shortage of scratch registers; the only thing to note is the new blockage
> pattern.  This was tested on real hardware but not with Linux, instead with 
> Darwin (experimental port of the toolchain to iOS) and makes it possible to 
> pass ACATS (Ada conformance testsuite which requires stack checking).
> 
> There is also a couple of tweaks for the ARM implementation: a cosmetic one 
> for the probe_stack pattern and one for the output_probe_stack_range loop.
> 
> 
> 2015-10-06  Tristan Gingold  
> Eric Botcazou  
> 
> PR middle-end/65958
>   * config/aarch64/aarch64-protos.h (aarch64_output_probe_stack-range):
>   Declare.
>   * config/aarch64/aarch64.md: Declare UNSPECV_BLOCKAGE and
>   UNSPEC_PROBE_STACK_RANGE.
>   (blockage): New instruction.
>   (probe_stack_range): Likewise.
>   * config/aarch64/aarch64.c (aarch64_emit_probe_stack_range): New
>   function.
>   (aarch64_output_probe_stack_range): Likewise.
>   (aarch64_expand_prologue): Invoke aarch64_emit_probe_stack_range if
>   static builtin stack checking is enabled.
>   * config/aarch64/aarch64-linux.h (STACK_CHECK_STATIC_BUILTIN):
>   Define.
> 
>   * config/arm/arm.c (arm_emit_probe_stack_range): Adjust comment.
>   (output_probe_stack_range): Rotate the loop and simplify.
>   (thumb1_expand_prologue): Tweak sorry message.
>   * config/arm/arm.md (probe_stack): Use bare string.
> 
> 
> 2015-10-06  Eric Botcazou  
> 
> * gcc.target/aarch64/stack-checking.c: New test.
> 

Unless there really is common code between the two patches, this should
be separated out into two posts, one for ARM and one for AArch64.

More comments inline.

> 
> pr65958-2.diff
> 
> 
> Index: config/aarch64/aarch64-linux.h
> ===
> --- config/aarch64/aarch64-linux.h(revision 228512)
> +++ config/aarch64/aarch64-linux.h(working copy)
> @@ -88,4 +88,7 @@
>  #undef TARGET_BINDS_LOCAL_P
>  #define TARGET_BINDS_LOCAL_P default_binds_local_p_2
>  
> +/* Define this to be nonzero if static stack checking is supported.  */
> +#define STACK_CHECK_STATIC_BUILTIN 1
> +
>  #endif  /* GCC_AARCH64_LINUX_H */
> Index: config/aarch64/aarch64-protos.h
> ===
> --- config/aarch64/aarch64-protos.h   (revision 228512)
> +++ config/aarch64/aarch64-protos.h   (working copy)
> @@ -316,6 +316,7 @@ void aarch64_asm_output_labelref (FILE *
>  void aarch64_cpu_cpp_builtins (cpp_reader *);
>  void aarch64_elf_asm_named_section (const char *, unsigned, tree);
>  const char * aarch64_gen_far_branch (rtx *, int, const char *, const char *);
> +const char * aarch64_output_probe_stack_range (rtx, rtx);
>  void aarch64_err_no_fpadvsimd (machine_mode, const char *);
>  void aarch64_expand_epilogue (bool);
>  void aarch64_expand_mov_immediate (rtx, rtx);
> Index: config/aarch64/aarch64.c
> ===
> --- config/aarch64/aarch64.c  (revision 228512)
> +++ config/aarch64/aarch64.c  (working copy)
> @@ -76,6 +76,7 @@
>  #include "sched-int.h"
>  #include "cortex-a57-fma-steering.h"
>  #include "target-globals.h"
> +#include "common/common-target.h"
>  
>  /* This file should be included last.  */
>  #include "target-def.h"
> @@ -2144,6 +2145,167 @@ aarch64_libgcc_cmp_return_mode (void)
>return SImode;
>  }
>  
> +#define PROBE_INTERVAL (1 << STACK_CHECK_PROBE_INTERVAL_EXP)
> +
> +#if (PROBE_INTERVAL % 4096) != 0
> +#error Cannot use indexed address calculation for stack probing
> +#endif
> +
> +#if PROBE_INTERVAL > 4096
> +#error Cannot use indexed addressing mode for stack probing
> +#endif
> +

Hmm, so if PROBE_INTERVAL != 4096 we barf!

While that's safe and probably right for Linux, on some OSes there might
be a minimum page size of 16k or even 64k.  It would be nice if we could
support that.

> +/* Emit code to probe a range of stack addresses from FIRST to FIRST+SIZE,
> +   inclusive.  These are offsets from the current stack pointer.  */
> +
> +static void
> +aarch64_emit_probe_stack_range (HOST_WIDE_INT first, HOST_WIDE_INT size)
> +{
> +  rtx reg9 = gen_rtx_REG (Pmode, 9);

Ug!  Manifest constants should be moved to pre-defines.
PROBE_STACK_BASE_REG?

> +
> +  /* The following code uses indexed address calculation on FIRST.  */
> +  gcc_assert ((first % 4096) == 0);

where's 4096 come from?

> +
> +  /* See if we have a constant small number of probes to generate.  If so,
> + that's the easy case.  */
> +  if (size <= PROBE_INTERVAL)
> +{
> +  emit_set_insn (reg9,
> +  plus_constant (Pmode, stack_pointer_rtx,
> +

Re: [PATCH] Fix declaration of pthread-structs in s-osinte-rtems.ads

2015-11-03 Thread Jan Sommer

On Tuesday 03 November 2015, 18:10:53, Arnaud Charlet wrote:
> > Let's try again. This time I made the diff against trunk with the changes
> > Sebastian recommended, included a ChangeLog and used svn-diff.
> > If this patch goes through, please let me know how the backporting works.
> 
> Your ChangeLog entry is not in the proper format, see sections 6.8.1 and
> 6.8.2 from http://www.gnu.org/prep/standards/standards.html
> 
> The diff itself is OK.
> 

Ok, fixed this. See the new diff below.

> You can use svn merge to merge changes on other branches, or try to
> apply your diff using the "patch" command, and adjust for any merge
> conflict.
> 

Will do. Do I send a patch for each branch to the list or do I add these 
changes to this patch?

Best regards,

   Jan


Index: gcc/ada/ChangeLog
===
--- gcc/ada/ChangeLog   (Revision 229715)
+++ gcc/ada/ChangeLog   (Arbeitskopie)
@@ -1,3 +1,8 @@
+2015-11-03  Jan Sommer 
+
+   * s-oscons-tmplt.c: Generate pthread constants for RTEMS
+   * s-osinte-rtems.ads: Declare pthread structs as opaque types in Ada
+
 2015-10-29  Andrew MacLeod  
 
* gcc-interface/decl.c: Reorder #include's and remove duplicates.
Index: gcc/ada/s-oscons-tmplt.c
===
--- gcc/ada/s-oscons-tmplt.c(Revision 229715)
+++ gcc/ada/s-oscons-tmplt.c(Arbeitskopie)
@@ -157,7 +157,7 @@ pragma Style_Checks ("M32766");
 # include <_types.h>
 #endif
 
-#if defined (__linux__) || defined (__ANDROID__)
+#if defined (__linux__) || defined (__ANDROID__) || defined (__rtems__)
 # include 
 # include 
 #endif
@@ -1458,7 +1458,7 @@ CNS(CLOCK_RT_Ada, "")
 #endif
 
 #if defined (__APPLE__) || defined (__linux__) || defined (__ANDROID__) \
-  || defined (DUMMY)
+  || defined (__rtems__) || defined (DUMMY)
 /*
 
--  Sizes of pthread data types
@@ -1501,7 +1501,7 @@ CND(PTHREAD_RWLOCKATTR_SIZE, "pthread_rwlockattr_t
 CND(PTHREAD_RWLOCK_SIZE, "pthread_rwlock_t")
 CND(PTHREAD_ONCE_SIZE,   "pthread_once_t")
 
-#endif /* __APPLE__ || __linux__ || __ANDROID__ */
+#endif /* __APPLE__ || __linux__ || __ANDROID__ || __rtems__ */
 
 /*
 
Index: gcc/ada/s-osinte-rtems.ads
===
--- gcc/ada/s-osinte-rtems.ads  (Revision 229715)
+++ gcc/ada/s-osinte-rtems.ads  (Arbeitskopie)
@@ -51,6 +51,8 @@
 --  It is designed to be a bottom-level (leaf) package.
 
 with Interfaces.C;
+with System.OS_Constants;
+
 package System.OS_Interface is
pragma Preelaborate;
 
@@ -60,6 +62,7 @@ package System.OS_Interface is
subtype rtems_id   is Interfaces.C.unsigned;
 
subtype intis Interfaces.C.int;
+   subtype char   is Interfaces.C.char;
subtype short  is Interfaces.C.short;
subtype long   is Interfaces.C.long;
subtype unsigned   is Interfaces.C.unsigned;
@@ -68,7 +71,6 @@ package System.OS_Interface is
subtype unsigned_char  is Interfaces.C.unsigned_char;
subtype plain_char is Interfaces.C.plain_char;
subtype size_t is Interfaces.C.size_t;
-
---
-- Errno --
---
@@ -76,11 +78,11 @@ package System.OS_Interface is
function errno return int;
pragma Import (C, errno, "__get_errno");
 
-   EAGAIN: constant := 11;
-   EINTR : constant := 4;
-   EINVAL: constant := 22;
-   ENOMEM: constant := 12;
-   ETIMEDOUT : constant := 116;
+   EAGAIN: constant := System.OS_Constants.EAGAIN;
+   EINTR : constant := System.OS_Constants.EINTR;
+   EINVAL: constant := System.OS_Constants.EINVAL;
+   ENOMEM: constant := System.OS_Constants.ENOMEM;
+   ETIMEDOUT : constant := System.OS_Constants.ETIMEDOUT;
 
-
-- Signals --
@@ -448,6 +450,7 @@ package System.OS_Interface is
   ss_low_priority : int;
   ss_replenish_period : timespec;
   ss_initial_budget   : timespec;
+  sched_ss_max_repl   : int;
end record;
pragma Convention (C, struct_sched_param);
 
@@ -621,43 +624,34 @@ private
end record;
pragma Convention (C, timespec);
 
-   CLOCK_REALTIME :  constant clockid_t := 1;
-   CLOCK_MONOTONIC : constant clockid_t := 4;
+   CLOCK_REALTIME :  constant clockid_t := System.OS_Constants.CLOCK_REALTIME;
+   CLOCK_MONOTONIC : constant clockid_t := System.OS_Constants.CLOCK_MONOTONIC;
 
+   subtype char_array is Interfaces.C.char_array;
+
type pthread_attr_t is record
-  is_initialized  : int;
-  stackaddr   : System.Address;
-  stacksize   : int;
-  contentionscope : int;
-  inheritsched: int;
-  schedpolicy : int;
-  schedparam  : struct_sched_param;
-  cputime_clocked_allowed : int;
-  detatchstate: int;
+  Data : char_array (1 .. OS_Constants.PTHREAD_ATTR_SIZE);
end record;
pragma Convention (C, pthread_attr_t);
+   for pthread_attr_t'Alignment use Interfaces.C.double'

Re: [PATCH] S/390: Fix warning in "*movstr" pattern.

2015-11-03 Thread Ulrich Weigand

Dominik Vogt wrote:

> @@ -2936,7 +2936,7 @@
> (set (mem:BLK (match_operand:P 1 "register_operand" "0"))
>   (mem:BLK (match_operand:P 3 "register_operand" "2")))
> (set (match_operand:P 0 "register_operand" "=d")
> - (unspec [(mem:BLK (match_dup 1))
> + (unspec:P [(mem:BLK (match_dup 1))
>(mem:BLK (match_dup 3))
>(reg:SI 0)] UNSPEC_MVST))
> (clobber (reg:CC CC_REGNUM))]

Don't you have to change the expander too?  Otherwise the
pattern will no longer match ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [AArch64] Update comments on the usage of X30 in FIXED_REGISTERS and CALL_USED_REGISTERS

2015-11-03 Thread Jiong Wang




On 02/11/15 14:52, Richard Earnshaw wrote:

On 02/11/15 12:58, Jiong Wang wrote:


On 02/11/15 12:01, Richard Earnshaw wrote:

On 16/10/15 15:36, Jiong Wang wrote:

The patch https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02654.html
from last year changed the definition of LR in CALL_USED_REGISTERS,
but didn't update the comment above the #define to reflect the new
usage.

This patch brings the comment in line with the implementation.

OK for trunk?

Thanks.

2015-10-16  Jiong. Wang  

gcc/
* config/aarch64/aarch64.h: Update the comments on usage of X30.


fix-comment.patch


diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 5a8db76..1eaaca0 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -210,14 +210,17 @@ extern unsigned aarch64_architecture_version;
  significant bits.  Unlike AArch32 S1 is not packed into D0,
  etc.  */
-/* Note that we don't mark X30 as a call-clobbered register.  The idea is
-   that it's really the call instructions themselves which clobber X30.
-   We don't care what the called function does with it afterwards.
-
-   This approach makes it easier to implement sibcalls.  Unlike normal
-   calls, sibcalls don't clobber X30, so the register reaches the
-   called function intact.  EPILOGUE_USES says that X30 is useful
-   to the called function.  */
+/* We don't mark X30 as a fixed register while we mark it as a caller-saved
+   register.  The idea is we want X30 to be allocable as a caller-saved
+   register when possible.
+
+   NOTE: although X30 is marked as caller-saved, it's callee-saved at the same
+   time.  The caller-saved attribute makes sure if X30 is allocated as free
+   register to hold any temporary value then the value is saved properly across
+   function call.  While on AArch64, the call instruction writes the return
+   address to LR.  If the called function is a non-leaf function, it is the
+   responsibility of the callee to save and restore LR appropriately in it's
+   prologue / epilogue.  */
   

Sorry, but I find that just confusing.

Wouldn't it be easier just to say:

   X30 is clobbered by call instructions, so must be treated as a
   caller-saved register.

Richard, thanks for the review, but I am not convinced by your change.

"caller-saved" in gcc just means that if the live range of the register
spans a function call, then the caller will make sure it is saved and
restored properly. This is purely a calling convention concept and
has no relationship with how the call instruction works.

So, we mark X30 as caller-saved not because it will be clobbered by the
call instructions but because we relax it as a free register and want it
to be saved by the caller whenever it's allocated by register allocation and
its live range spans a function call.

And from my understanding, if a register is clobbered by the call
instruction, it need not be treated as a caller-saved register;
instead it must be treated as a *callee-saved* register. Because the call
instruction actually assigns a new value to the register and then jumps
to the callee atomically, there is no save/restore of this "new value" by
the caller; the callee is fully responsible for it. The
"NOTE" part in the patch is trying to highlight this so that the following extra
check in aarch64_layout_frame will be easier for others to understand.

   /* ... and any callee saved register that dataflow says is live. */
   for (regno = R0_REGNUM; regno <= R30_REGNUM; regno++)
     if (df_regs_ever_live_p (regno)
         && (regno == R30_REGNUM   /* <--- X30, a caller-save, is callee-save as well.  */
             || !call_used_regs[regno]))
       cfun->machine->frame.reg_offset[regno] = SLOT_REQUIRED;

Regards,
Jiong
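
The effect of the quoted check can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the GCC implementation: the function name and the bitmask inputs are hypothetical stand-ins for df_regs_ever_live_p and call_used_regs, with bit N describing register XN.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_R30_REGNUM 30

/* Toy model of the quoted condition.  LIVE_MASK and CALL_USED_MASK are
   hypothetical bitmasks (bit N describes register XN); REGNO must be in
   the range 0..30.  Returns true if the callee would need a save slot.  */
bool
toy_needs_save_slot (int regno, uint32_t live_mask, uint32_t call_used_mask)
{
  bool live = (live_mask >> regno) & 1u;
  bool call_used = (call_used_mask >> regno) & 1u;

  if (!live)
    return false;

  /* X30 gets a slot whenever it is live, even though it is marked
     call-used; other registers only when they are not call-used.  */
  return regno == TOY_R30_REGNUM || !call_used;
}
```

In this model a live X30 always receives a slot, while any other live register that is marked call-used (X9, say) does not, which is the asymmetry the message above describes.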

Right, I think I now understand what you are trying to say, but I still
think the wording does not convey that.

We have two statements of fact
1) On entry to a function LR contains the return address (by the
architecture)
2) LR cannot retain values across a function call (it is a caller-saved
register by the PCS)

We then have an implementation perspective on how to use LR given these
constraints: we treat the register as a callee-saved register and put
explicit clobbers on all call instructions.

So how does the following sound?

/* Technically, LR should be treated as a caller-saved register (since
it is modified during a subroutine call to contain the return address).
  However, from the compiler's perspective, it is best to treat it as a
callee-saved register and then to put explicit clobber instructions on
each call instruction to ensure that live values are not retained in it
across call instructions.  This allows us to use the register as a
scratch register between function calls.  */


Interesting... I feel this new comment views the behavior of X30
from a different perspective.

By just reading this comment, I would have thought the implementation on
AArch64 is:
1. X30 is set to 0 in FIXED_REGISTERS
2. X30 is set to 0

Re: [ARM] Fix PR middle-end/65958

2015-11-03 Thread Eric Botcazou

> Unless there really is common code between the two patches, this should
> be separated out into two posts, one for ARM and one for AArch64.

The ARM bits were approved by Ramana and installed right away.

> Hmm, so if PROBE_INTERVAL != 4096 we barf!

Yes, but that's not unusual: ARM and SPARC have it too. 4096 happens to be the
limit of reg+off addressing mode on several architectures.

> While that's safe and probably right for Linux, on some OSes there might
> be a minimum page size of 16k or even 64k.  It would be nice if we could
> support that.

OK, but we cannot test anything at the moment.

> Ug!  Manifest constants should be moved to pre-defines.
> PROBE_STACK_BASE_REG?

OK.

> > +
> > +  /* The following code uses indexed address calculation on FIRST.  */
> > +  gcc_assert ((first % 4096) == 0);
> 
> where's 4096 come from?

It's the same constraint as above:

#if (PROBE_INTERVAL % 4096) != 0
#error Cannot use indexed address calculation for stack probing
#endif

to be able to use the 12-bit shifted immediate instructions. 
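
The "12-bit shifted immediate" constraint can be sketched as a small predicate. This is a minimal illustration, assuming the AArch64 ADD/SUB immediate form (an unsigned 12-bit field, optionally shifted left by 12 bits); the function name is hypothetical and this is not code from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if VALUE fits the AArch64 ADD/SUB immediate form: an unsigned
   12-bit field, optionally shifted left by 12 bits.  */
bool
toy_fits_addsub_imm (uint64_t value)
{
  if (value <= 0xfff)
    return true;                          /* plain imm12 */

  /* imm12 with LSL #12: low 12 bits clear, the rest fits in 12 bits.  */
  return (value & 0xfff) == 0 && (value >> 12) <= 0xfff;
}
```

Under that assumption, any offset that is a multiple of 4096 and at most 0xfff000 fits in a single instruction, which is why requiring FIRST and PROBE_INTERVAL to be multiples of 4096 keeps the address arithmetic simple.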

> More manifest constants.

Yeah, consistency first. ;-)

> This should be annotated with the sequence length.

OK, thanks for the review, I'll adjust.

-- 
Eric Botcazou
