Re: PING: Re: [patch 6/10] debug-early merge: Java front-end

2015-05-21 Thread Andrew Haley
On 20/05/15 23:32, Aldy Hernandez wrote:
> Perhaps I should've sent this to the java-patches list.
> 
> PING.

OK, I believe it.

Andrew.



Re: [nvptx] Re: Mostly rewrite genrecog

2015-05-21 Thread Thomas Schwinge
Hi!

On Thu, 7 May 2015 11:14:37 +0200, Jakub Jelinek  wrote:
> On Thu, May 07, 2015 at 10:59:01AM +0200, Thomas Schwinge wrote:
> >  build/genrecog [...]/source-gcc/gcc/common.md \
> >    [...]/source-gcc/gcc/config/nvptx/nvptx.md \
> >    insn-conditions.md > tmp-recog.c
> > -[...]/source-gcc/gcc/config/nvptx/nvptx.md:1206: warning: operand 0 missing mode?
> > -[...]/source-gcc/gcc/config/nvptx/nvptx.md:1206: warning: operand 1 missing mode?
> > 
> > gcc/config/nvptx/nvptx.md:
> > 
> > 1206 (define_insn "allocate_stack"
> > 1207   [(set (match_operand 0 "nvptx_register_operand" "=R")
> > 1208         (unspec [(match_operand 1 "nvptx_register_operand" "R")]
> > 1209                 UNSPEC_ALLOCA))]
> > 1210   ""
> > 1211   "%.\\tcall (%0), %%alloca, (%1);")
> > 
> > Are these two (former) warnings a) something that should still be
> > reported by genrecog, 
> 
> Yes.



> > and b) something that should be addressed (Bernd)?
> 
> Yes.  Supposedly you want :P on both match_operand and unspec too, but
> as this serves not just as an insn pattern, but also as an expander that
> needs to have this particular name, supposedly you want:
> 
> (define_expand "allocate_stack"
>   [(match_operand 0 "nvptx_register_operand")
>(match_operand 1 "nvptx_register_operand")]
>   ""
> {
>   if (TARGET_ABI64)
>     emit_insn (gen_allocate_stack_di (operands[0], operands[1]));
>   else
>     emit_insn (gen_allocate_stack_si (operands[0], operands[1]));
>   DONE;
> })
> 
> (define_insn "allocate_stack_<mode>"
>   [(set (match_operand:P 0 "nvptx_register_operand" "=R")
>         (unspec:P [(match_operand:P 1 "nvptx_register_operand" "R")]
>                   UNSPEC_ALLOCA))]
>   ""
>   "%.\\tcall (%0), %%alloca, (%1);")
> 
> or so.

OK to commit?

commit 004e521e8dd1c0236a55e9a69a17ccc2a41d
Author: Thomas Schwinge 
Date:   Thu May 7 11:30:26 2015 +0200

[nvptx] Address genrecog warnings

2015-05-21  Jakub Jelinek  

gcc/
* config/nvptx/nvptx.md (allocate_stack): Rename to...
(allocate_stack_<mode>): ... this, and add :P on both
match_operand and unspec.
(allocate_stack): New expander.
---
 gcc/config/nvptx/nvptx.md |   20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git gcc/config/nvptx/nvptx.md gcc/config/nvptx/nvptx.md
index c30de36..a49786c 100644
--- gcc/config/nvptx/nvptx.md
+++ gcc/config/nvptx/nvptx.md
@@ -1203,10 +1203,22 @@
   sorry ("target cannot support nonlocal goto.");
 })
 
-(define_insn "allocate_stack"
-  [(set (match_operand 0 "nvptx_register_operand" "=R")
-        (unspec [(match_operand 1 "nvptx_register_operand" "R")]
-                UNSPEC_ALLOCA))]
+(define_expand "allocate_stack"
+  [(match_operand 0 "nvptx_register_operand")
+   (match_operand 1 "nvptx_register_operand")]
+  ""
+{
+  if (TARGET_ABI64)
+    emit_insn (gen_allocate_stack_di (operands[0], operands[1]));
+  else
+    emit_insn (gen_allocate_stack_si (operands[0], operands[1]));
+  DONE;
+})
+
+(define_insn "allocate_stack_<mode>"
+  [(set (match_operand:P 0 "nvptx_register_operand" "=R")
+        (unspec:P [(match_operand:P 1 "nvptx_register_operand" "R")]
+                  UNSPEC_ALLOCA))]
   ""
   "%.\\tcall (%0), %%alloca, (%1);")
 


> Of course, as even latest Cuda drop doesn't support alloca, this is
> quite dubious, perhaps better would be sorry on it.
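For reference, a hedged sketch of the sorry ()-based alternative mentioned here, modeled on the nonlocal_goto expander visible earlier in nvptx.md; this is illustrative only, not part of the committed patch:

```lisp
;; Reject dynamic stack allocation outright instead of emitting an
;; alloca call the CUDA toolchain cannot honor (sketch, untested):
(define_expand "allocate_stack"
  [(match_operand 0 "nvptx_register_operand")
   (match_operand 1 "nvptx_register_operand")]
  ""
{
  sorry ("dynamic stack allocation is not supported");
  DONE;
})
```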
> 
> BTW, with Cuda 7.0, even printf doesn't work anymore, is that known?

I have not yet used that version of CUDA, so don't know about this.  :-|


Grüße,
 Thomas




Re: [match-and-simplify] reject expanding operator-list to implicit 'for'

2015-05-21 Thread Richard Biener
On Wed, 20 May 2015, Prathamesh Kulkarni wrote:

> On 20 May 2015 at 18:18, Richard Biener  wrote:
> > On Wed, 20 May 2015, Prathamesh Kulkarni wrote:
> >
> >> On 20 May 2015 at 17:01, Richard Biener  wrote:
> >> > On Wed, 20 May 2015, Prathamesh Kulkarni wrote:
> >> >
> >> >> On 20 May 2015 at 16:17, Prathamesh Kulkarni
> >> >>  wrote:
> >> >> > Hi,
> >> >> > This patch rejects expanding operator-list to implicit 'for'.
> >> >> On second thoughts, should we reject expansion of operator-list _only_
> >> >> if it's mixed with 'for' ?
> >> >
> >> > At least that, yes.
> Well, I suppose we could extend it to allow mixing with 'for'?
> Add the operator lists to the inner-most 'for'.
> eg:
> (define_operator_list olist ...)
> 
> (for op (...)
>   (simplify
> (op (olist ...
> 
> would be equivalent to:
> 
> (for op (...)
>   temp (olist)
>   (simplify
> (op (temp ...
> 
> operator-list expansion can be said to be simply a short-hand for a single
> 'for' with number of iterators = number of operator-lists.
> If the operator-lists are enclosed within 'for', add them to the
> innermost 'for'.

Yes, but I think this use is confusing as to whether the operator lists
form a new for (like currently(?)) or if they append to the enclosing
for.  What we do currently is consistent (always create a new for) but
it is confusing behavior - as you noted initially.
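A sketch of the current reading, with hypothetical list and operator names:

```lisp
;; (define_operator_list olist BUILT_IN_SQRT BUILT_IN_SQRTF)

;; A bare operator list in a pattern ...
(simplify
 (olist @0)
 @0)
;; ... currently expands as if an implicit 'for' with a fresh
;; iterator had been written around the simplify:
(for tmp (olist)
 (simplify
  (tmp @0)
  @0))
```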

Richard.

> Thanks,
> Prathamesh
> 
> >> >
> >> >> We could define multiple operator-lists in simplify to be the same as
> >> >> enclosing the simplify in 'for' with number of iterators
> >> >> equal to number of operator-lists.
> >> >> So we could allow
> >> >> (define_operator_list op1 ...)
> >> >> (define_operator_list op2 ...)
> >> >>
> >> >> (simplify
> >> >>   (op1 (op2 ... )))
> >> >>
> >> >> is equivalent to:
> >> >> (for  temp1 (op1)
> >> >>temp2 (op2)
> >> >>   (simplify
> >> >> (temp1 (temp2 ...
> >> >>
> >> >> I think we have patterns like these in match-builtin.pd in the
> >> >> match-and-simplify branch
> >> >> And reject mixing of 'for' and operator-lists.
> >> >> Admittedly the implicit 'for' behavior is not obvious from the syntax
> >> >> ;-(
> >> >
> >> > Hmm, indeed we have for example
> >> >
> >> > /* Optimize pow(1.0,y) = 1.0.  */
> >> > (simplify
> >> >  (POW real_onep@0 @1)
> >> >  @0)
> >> >
> >> > and I remember wanting that implicit for to make those less ugly.
> >> >
> >> > So can you rework only rejecting it within for?
> >> This patch rejects expanding operator-list inside 'for'.
> >> OK for trunk after bootstrap+testing ?
> >
> > Ok.
> >
> > Thanks,
> > Richard.
> >
> >> Thanks,
> >> Prathamesh
> >> >
> >> > Thanks,
> >> > Richard.
> >> >
> >> >
> >> >> Thanks,
> >> >> Prathamesh
> >> >> > OK for trunk after bootstrap+testing ?
> >> >> >
> >> >> > Thanks,
> >> >> > Prathamesh
> >> >>
> >> >>
> >> >
> >> > --
> >> > Richard Biener 
> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, 
> >> > Graham Norton, HRB 21284 (AG Nuernberg)
> >>
> >
> > --
> > Richard Biener 
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, 
> > Graham Norton, HRB 21284 (AG Nuernberg)
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham 
Norton, HRB 21284 (AG Nuernberg)


Re: OpenACC: initialization with unsupported acc_device_t

2015-05-21 Thread Jakub Jelinek
On Thu, May 21, 2015 at 08:55:59AM +0200, Thomas Schwinge wrote:
> Thanks, looks good to me -- Jakub?

Ok for trunk.

> > libgomp/
> > * oacc-init.c (resolve_device): Add FAIL_IS_ERROR argument. Update
> > function comment. Only call gomp_fatal if new argument is true.
> > (acc_dev_num_out_of_range): New function.
> > (acc_init_1, acc_shutdown_1): Update call to resolve_device. Call
> > acc_dev_num_out_of_range as appropriate.
> > (acc_get_num_devices, acc_set_device_type, acc_get_device_type)
> > (acc_get_device_num, acc_set_device_num): Update calls to 
> > resolve_device.
> > * testsuite/libgomp.oacc-c-c++-common/lib-4.c: Update expected test
> > output.

Jakub


Re: [obvious fix] fix off-by-one error when printing the caret character

2015-05-21 Thread Dodji Seketeli
Manuel López-Ibáñez  writes:


> Index: ChangeLog
> ===================================================================
> --- ChangeLog   (revision 223445)
> +++ ChangeLog   (working copy)
> @@ -1,3 +1,8 @@
> +2015-05-20  Manuel López-Ibáñez  
> +
> +   * diagnostic.c (diagnostic_print_caret_line): Fix off-by-one error
> +   when printing the caret character.
> +

This is OK, thanks!

Cheers,

-- 
Dodji


Re: [patch, libgomp] Re-factor GOMP_MAP_POINTER handling

2015-05-21 Thread Chung-Lin Tang
Ping x2.

On 15/5/11 7:19 PM, Chung-Lin Tang wrote:
> Ping.
> 
> On 2015/4/21 08:21 PM, Chung-Lin Tang wrote:
>> Hi,
>> while investigating some issues in the variable mapping code, I observed
>> that the GOMP_MAP_POINTER handling is essentially duplicated under the PSET 
>> case.
>> This patch abstracts and unifies the handling code, basically just a cleanup
>> patch. Ran libgomp tests to ensure no regressions, ok for trunk?
>>
>> Thanks,
>> Chung-Lin
>>
>> 2015-04-21  Chung-Lin Tang  
>>
>> libgomp/
>> * target.c (gomp_map_pointer): New function abstracting out
>> GOMP_MAP_POINTER handling.
>> (gomp_map_vars): Remove GOMP_MAP_POINTER handling code and use
>> gomp_map_pointer().
>>
> 



[PATCH, CHKP] Fix PR middle-end/66221: lto1: error: type variant has different TYPE_ARG_TYPES

2015-05-21 Thread Ilya Enkovich
Hi,

This patch fixes PR66221 by using build_distinct_type_copy instead of copy_node 
to copy a function type for an instrumented function.  Bootstrapped and regtested 
on x86_64-unknown-linux-gnu.  Applied to trunk.  Is it OK for gcc-5?

Thanks,
Ilya
--
gcc/

2015-05-21  Ilya Enkovich  

PR middle-end/66221
* ipa-chkp.c (chkp_copy_function_type_adding_bounds): Use
build_distinct_type_copy to copy bounds.

gcc/testsuite/

2015-05-21  Ilya Enkovich  

PR middle-end/66221
* gcc.dg/lto/pr66221_0.c: New test.
* gcc.dg/lto/pr66221_1.c: New test.


diff --git a/gcc/ipa-chkp.c b/gcc/ipa-chkp.c
index ac5eb35..c710291 100644
--- a/gcc/ipa-chkp.c
+++ b/gcc/ipa-chkp.c
@@ -308,7 +308,7 @@ chkp_copy_function_type_adding_bounds (tree orig_type)
   if (!arg_type)
 return orig_type;
 
-  type = copy_node (orig_type);
+  type = build_distinct_type_copy (orig_type);
   TYPE_ARG_TYPES (type) = copy_list (TYPE_ARG_TYPES (type));
 
   for (arg_type = TYPE_ARG_TYPES (type);
diff --git a/gcc/testsuite/gcc.dg/lto/pr66221_0.c 
b/gcc/testsuite/gcc.dg/lto/pr66221_0.c
new file mode 100644
index 000..dbb9282
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/pr66221_0.c
@@ -0,0 +1,10 @@
+/* { dg-lto-do link } */
+/* { dg-require-effective-target mpx } */
+/* { dg-lto-options { { -O2 -flto -fcheck-pointer-bounds -mmpx } } } */
+
+int test1 (const char *);
+
+int main (int argc, const char **argv)
+{
+  return test1 (argv[0]);
+}
diff --git a/gcc/testsuite/gcc.dg/lto/pr66221_1.c 
b/gcc/testsuite/gcc.dg/lto/pr66221_1.c
new file mode 100644
index 000..4c94544
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/pr66221_1.c
@@ -0,0 +1,4 @@
+int test1 (const char *p)
+{
+  return (int)(*p);
+}


Re: [Patch AArch64] PR target/66200 - gcc / libstdc++ TLC for weak memory models.

2015-05-21 Thread Ramana Radhakrishnan
And here's an additional patch for the testsuite which was missed in the 
original posting.


This is a testism that's testing code generation as per 
TARGET_RELAXED_ORDERING being false and therefore needs to be adjusted 
as attached.


Ramana

PR target/66200
* g++.dg/abi/aarch64_guard1.C: Adjust testcase.

diff --git a/gcc/testsuite/g++.dg/abi/aarch64_guard1.C 
b/gcc/testsuite/g++.dg/abi/aarch64_guard1.C
index ca1778b..e78f93c 100644
--- a/gcc/testsuite/g++.dg/abi/aarch64_guard1.C
+++ b/gcc/testsuite/g++.dg/abi/aarch64_guard1.C
@@ -13,5 +13,4 @@ int *foo ()
 }
 
 // { dg-final { scan-assembler _ZGVZ3foovE1x,8,8 } }
-// { dg-final { scan-tree-dump "_ZGVZ3foovE1x & 1" "original" } }
 // { dg-final { cleanup-tree-dump "original" } }


Re: [patch, testsuite, ARM] don't try to execute simd.exp tests on targets without NEON

2015-05-21 Thread Kyrill Tkachov

Hi Sandra,

On 21/05/15 06:43, Sandra Loosemore wrote:

This is another patch aimed at fixing bugs relating to trying to execute
NEON code on a target that doesn't support it revealed by my
arm-none-eabi testing on a gazillion different multilibs.  Inspired by
what vect.exp does and my other patch in this group to fix
advsimd-intrinsics.exp, I've hacked simd.exp to test for NEON
compilation and execution support and use set dg-do-what-default to
either "compile" or "run" as appropriate, or skip the whole set of tests
if neither is present.  And, I've removed the explicit "dg-do run" and
arm_neon_ok test (which only tests for compilation support, not
execution support) from all the individual test cases.

OK to commit?


This is ok and there is one less headache with NEON testing :)
Thanks,
Kyrill



-Sandra





Re: [PATCH] [PATCH][ARM] Fix sibcall testcases.

2015-05-21 Thread Ramana Radhakrishnan
On Wed, May 20, 2015 at 9:11 PM, Joseph Myers  wrote:
> On Wed, 20 May 2015, Alex Velenko wrote:
>
>> Hi,
>>
>> This patch prevents arm_thumb1_ok XPASS in sibcall-3.c and sibcall-4.c
>> testcases. Sibcalls are not ok for Thumb1 and testcases need to be fixed.
>
> arm_thumb1_ok means "this is an ARM target where -mthumb causes Thumb-1 to
> be used".  It only ever makes sense to use it in tests that use an
> explicit -mthumb, which these tests don't.
>
> If you want to check "is this test being built for Thumb-1 by the multilib
> options", use arm_thumb1.
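A sketch of how the two effective targets differ in use (directive placement is illustrative, not from the patch):

```c
/* Skip a test that cannot work when the multilib builds Thumb-1 code: */
/* { dg-skip-if "sibcalls not supported on Thumb-1" { arm_thumb1 } } */

/* Explicitly compile for Thumb-1, only where -mthumb means Thumb-1: */
/* { dg-do compile { target arm_thumb1_ok } } */
/* { dg-options "-mthumb" } */
```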
>

Alex, so while you are here - why don't you improve the documentation
in sourcebuild.texi by

1. documenting arm_thumb1
2. distinguishing that from arm_thumb1_ok which just says
`ARM target generates Thumb-1 code for @code{-mthumb}.'

and that is just meaningless.

regards
Ramana



> --
> Joseph S. Myers
> jos...@codesourcery.com


Re: [patch, testsuite] don't specify "dg-do run" explicitly for vect test cases

2015-05-21 Thread Richard Biener
On Thu, May 21, 2015 at 7:12 AM, Sandra Loosemore
 wrote:
> On targets such as ARM, some arches are compatible with options needed to
> enable compilation with vectorization, but the specific hardware (or
> simulator or BSP) available for execution tests may not implement or enable
> those features.  The vect.exp test harness already includes some magic to
> determine whether the target hw can execute vectorized code and sets
> dg-do-what-default to compile the tests only if they can't be executed.
> It's a mistake for individual tests to explicitly say "dg-do run" because
> this overrides the harness's magic default and forces the test to be
> executed, even if doing so just ends up wedging the target.
>
> I already committed two patches last fall (r215627 and r218427) to address
> this, but people keep adding new vect test cases with the same problem, so
> here is yet another installment to clean them up.  I tested this on
> arm-none-eabi with a fairly large collection of multilibs.  OK to commit?

Huh... I thought we have the check_vect () stuff for that...?
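For reference, a rough sketch of the runtime-guard pattern Richard refers to, as used by existing vect testcases (check_vect () comes from the testsuite helper tree-vect.h; details hedged):

```c
/* check_vect () exits early with success when the running hardware
   cannot execute the vector instructions the test was compiled for.  */
#include "tree-vect.h"

#define N 16
int a[N], b[N], c[N];

int
main (void)
{
  check_vect ();   /* bail out gracefully on non-vector hardware */

  for (int i = 0; i < N; i++)
    c[i] = a[i] + b[i];   /* the loop under test */
  return 0;
}
```

This guards execution at runtime, whereas dg-do-what-default guards it at the harness level; the point of the patch is that an explicit "dg-do run" bypasses the latter.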

> -Sandra
>


Re: [RFA] Restore combine.c split point for multiply-accumulate instructions

2015-05-21 Thread Richard Biener
On Thu, May 21, 2015 at 7:38 AM, Jeff Law  wrote:
>
> find_split_point will tend to favor splitting complex insns in such a way as
> to encourage multiply-add insns.  It does this by splitting an
> unrecognizable insn at the (plus (mult)).
>
> Now that many MULTs are canonicalized as ASHIFT, that code to prefer the
> multiply-add is no longer triggering when it could/should.  This ultimately
> results in splitting at the ASHIFT rather than the containing PLUS and thus
> we generate distinct shift and add insns rather than a single shadd insn on
> the PA (and probably other architectures).
>
> This patch will treat (plus (ashift)) just like (plus (mult)) which
> encourages creation of shift-add insns.
>
> This has been bootstrapped and regression tested on x86_64-unknown-linux-gnu
> and with an hppa2.0w-hp-hpux11.00 cross compiler on the hppa.exp testsuite
> (full disclosure -- hppa.exp only has two tests, so it's far from
> extensive).
>
> I've also verified this is one of the changes ultimately necessary to
> resolve the code generation regressions caused by Venkat's combine.c change
> on the PA across my 300+ testfiles for a PA cross compiler.
>
> OK for the trunk?

Sounds reasonable.

Thanks,
Richard.

>
>
> Jeff
>
>
>
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 490386e..250fa0a 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,5 +1,8 @@
>  2015-05-20  Jeff Law  
>
> +   * combine.c (find_split_point): Handle ASHIFT like MULT to encourage
> +   multiply-accumulate/shift-add insn generation.
> +
> * config/pa/pa.c (pa_print_operand): New 'o' output modifier.
> (pa_mem_shadd_constant_p): Renamed from pa_shadd_constant_p.
> (pa_shadd_constant_p): Allow constants for shadd insns rather
> diff --git a/gcc/combine.c b/gcc/combine.c
> index a90849e..ab6de3a 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -5145,7 +5163,9 @@ find_split_point (rtx *loc, rtx_insn *insn, bool
> set_src)
>/* Split at a multiply-accumulate instruction.  However if this is
>   the SET_SRC, we likely do not have such an instruction and it's
>   worthless to try this split.  */
> -  if (!set_src && GET_CODE (XEXP (x, 0)) == MULT)
> +  if (!set_src
> +      && (GET_CODE (XEXP (x, 0)) == MULT
> +          || GET_CODE (XEXP (x, 0)) == ASHIFT))
>  return loc;
>
>  default:
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index f20a131..bac0973 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,5 +1,7 @@
>  2015-05-20  Jeff Law  
>
> +   * gcc.target/hppa/shadd-2.c: New test.
> +
> * gcc.target/hppa/hppa.exp: New target test driver.
> * gcc.target/hppa/shadd-1.c: New test.
>
> diff --git a/gcc/testsuite/gcc.target/hppa/shadd-2.c
> b/gcc/testsuite/gcc.target/hppa/shadd-2.c
> new file mode 100644
> index 000..34708e5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/hppa/shadd-2.c
> @@ -0,0 +1,49 @@
> +/* { dg-do compile }  */
> +/* { dg-options "-O2" }  */
> +/* { dg-final { scan-assembler-times "sh.add" 2 } }  */
> +
> +typedef struct rtx_def *rtx;
> +typedef const struct rtx_def *const_rtx;
> +enum machine_mode
> +{
> +  VOIDmode, BLKmode, CCmode, CCGCmode, CCGOCmode, CCNOmode, CCAmode, CCCmode,
> +  CCOmode, CCSmode, CCZmode, CCFPmode, CCFPUmode, BImode, QImode, HImode,
> +  SImode, DImode, TImode, OImode, QQmode, HQmode, SQmode, DQmode, TQmode,
> +  UQQmode, UHQmode, USQmode, UDQmode, UTQmode, HAmode, SAmode, DAmode,
> +  TAmode, UHAmode, USAmode, UDAmode, UTAmode, SFmode, DFmode, XFmode,
> +  TFmode, SDmode, DDmode, TDmode, CQImode, CHImode, CSImode, CDImode,
> +  CTImode, COImode, SCmode, DCmode, XCmode, TCmode, V2QImode, V4QImode,
> +  V2HImode, V1SImode, V8QImode, V4HImode, V2SImode, V1DImode, V16QImode,
> +  V8HImode, V4SImode, V2DImode, V1TImode, V32QImode, V16HImode, V8SImode,
> +  V4DImode, V2TImode, V64QImode, V32HImode, V16SImode, V8DImode, V4TImode,
> +  V2SFmode, V4SFmode, V2DFmode, V8SFmode, V4DFmode, V2TFmode, V16SFmode,
> +  V8DFmode, V4TFmode, MAX_MACHINE_MODE, NUM_MACHINE_MODES = MAX_MACHINE_MODE
> +};
> +struct rtx_def
> +{
> +  __extension__ enum machine_mode mode:8;
> +};
> +struct target_regs
> +{
> +  unsigned char x_hard_regno_nregs[53][MAX_MACHINE_MODE];
> +};
> +extern void oof (void);
> +extern int rhs_regno (rtx);
> +
> +extern struct target_regs default_target_regs;
> +__inline__ unsigned int
> +end_hard_regno (enum machine_mode mode, unsigned int regno)
> +{
> +  return regno +
> +    ((&default_target_regs)->x_hard_regno_nregs)[regno][(int) mode];
> +}
> +
> +void
> +note_btr_set (rtx dest, const_rtx set
> +              __attribute__ ((__unused__)), void *data)
> +{
> +  int regno, end_regno;
> +  end_regno = end_hard_regno (((dest)->mode), (rhs_regno (dest)));
> +  for (; regno < end_regno; regno++)
> +    oof ();
> +}
>


Re: [PATCH, CHKP] Fix PR middle-end/66221: lto1: error: type variant has different TYPE_ARG_TYPES

2015-05-21 Thread Richard Biener
On Thu, May 21, 2015 at 10:38 AM, Ilya Enkovich  wrote:
> Hi,
>
> This patch fixes PR66221 by using build_distinct_type_copy instead of 
> copy_node to copy a function type for instrumented function.  Bootstrapped 
> and regtested for x86_64-unknown-linux-gnu.  Applied to trunk.  Is it OK for 
> gcc-5?

Ok.

Thanks,
Richard.

> Thanks,
> Ilya
> --
> gcc/
>
> 2015-05-21  Ilya Enkovich  
>
> PR middle-end/66221
> * ipa-chkp.c (chkp_copy_function_type_adding_bounds): Use
> build_distinct_type_copy to copy bounds.
>
> gcc/testsuite/
>
> 2015-05-21  Ilya Enkovich  
>
> PR middle-end/66221
> * gcc.dg/lto/pr66221_0.c: New test.
> * gcc.dg/lto/pr66221_1.c: New test.
>
>
> diff --git a/gcc/ipa-chkp.c b/gcc/ipa-chkp.c
> index ac5eb35..c710291 100644
> --- a/gcc/ipa-chkp.c
> +++ b/gcc/ipa-chkp.c
> @@ -308,7 +308,7 @@ chkp_copy_function_type_adding_bounds (tree orig_type)
>if (!arg_type)
>  return orig_type;
>
> -  type = copy_node (orig_type);
> +  type = build_distinct_type_copy (orig_type);
>TYPE_ARG_TYPES (type) = copy_list (TYPE_ARG_TYPES (type));
>
>for (arg_type = TYPE_ARG_TYPES (type);
> diff --git a/gcc/testsuite/gcc.dg/lto/pr66221_0.c 
> b/gcc/testsuite/gcc.dg/lto/pr66221_0.c
> new file mode 100644
> index 000..dbb9282
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/lto/pr66221_0.c
> @@ -0,0 +1,10 @@
> +/* { dg-lto-do link } */
> +/* { dg-require-effective-target mpx } */
> +/* { dg-lto-options { { -O2 -flto -fcheck-pointer-bounds -mmpx } } } */
> +
> +int test1 (const char *);
> +
> +int main (int argc, const char **argv)
> +{
> +  return test1 (argv[0]);
> +}
> diff --git a/gcc/testsuite/gcc.dg/lto/pr66221_1.c 
> b/gcc/testsuite/gcc.dg/lto/pr66221_1.c
> new file mode 100644
> index 000..4c94544
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/lto/pr66221_1.c
> @@ -0,0 +1,4 @@
> +int test1 (const char *p)
> +{
> +  return (int)(*p);
> +}


[gomp4.1] Taskloop support

2015-05-21 Thread Jakub Jelinek
Hi!

This patch finishes the C #pragma omp taskloop support on the gomp 4.1
branch, including library support.

2015-05-21  Jakub Jelinek  

* tree.h (OMP_STANDALONE_CLAUSES): Adjust to cover
OMP_TARGET_{ENTER,EXIT}_DATA.
(OMP_CLAUSE_SHARED_FIRSTPRIVATE): Define.
* gimplify.c (gimplify_scan_omp_clauses): Add lastprivate
clause to outer taskloop if needed.
(gimplify_omp_for): Fix a typo.  Fixup OMP_TASKLOOP
gimplification.
* omp-low.c (omp_copy_decl_2): If var is TREE_ADDRESSABLE
listed in task_shared_vars, clear TREE_ADDRESSABLE on the
copy.
(build_outer_var_ref): Add lastprivate argument, pass it through
recursively.  Handle lastprivate on taskloop construct.
(install_var_field): Allow multiple fields for a single
decl - one for firstprivate, another for shared clauses
on task.
(scan_sharing_clauses): Handle OMP_CLAUSE_SHARED_FIRSTPRIVATE.
(add_taskreg_looptemp_clauses): Add one more _looptemp_ clause
for taskloop GIMPLE_OMP_TASK, if it is collapse > 1 with
non-constant iteration count and there is lastprivate clause
on the inner GIMPLE_OMP_FOR.
(finish_taskreg_scan): Handle OMP_CLAUSE_SHARED_FIRSTPRIVATE.
(lower_rec_input_clauses): Likewise.  Ignore all
OMP_CLAUSE_LASTPRIVATE_FIRSTPRIVATE clauses on taskloop construct.
(lower_lastprivate_clauses): For OMP_CLAUSE_LASTPRIVATE_FIRSTPRIVATE
on taskloop lookup decl in outer context.  Pass true
to build_outer_var_ref lastprivate argument.
(lower_send_clauses): Handle OMP_CLAUSE_SHARED_FIRSTPRIVATE.
(lower_send_shared_vars): Ignore fields with NULL or
FIELD_DECL abstract origin.
(expand_task_call): Use GOMP_TASK_* defines instead of
hardcoded integers.
(expand_omp_simd): Handle addressable fd->loop.v.
(expand_omp_taskloop_for_outer): Initialize the last
_looptemp_ with total iteration count if needed.
(expand_omp_taskloop_for_inner): Handle bias and broken_loop.
(lower_omp_for_lastprivate): Use last _looptemp_ clause
on taskloop for comparison.
(create_task_copyfn): Handle OMP_CLAUSE_SHARED_FIRSTPRIVATE.
gcc/c-family/
* c-omp.c (c_finish_omp_for): Clear DECL_INITIAL.
gcc/testsuite/
* gcc.dg/gomp/taskloop-1.c: New test.
include/
* gomp-constants.h (GOMP_TASK_FLAG_UNTIED, GOMP_TASK_FLAG_FINAL,
GOMP_TASK_FLAG_MERGEABLE, GOMP_TASK_FLAG_DEPEND, GOMP_TASK_FLAG_UP,
GOMP_TASK_FLAG_GRAINSIZE, GOMP_TASK_FLAG_IF, GOMP_TASK_FLAG_NOGROUP):
Define.
libgomp/
* libgomp.map (GOMP_4.1): Export GOMP_taskloop and GOMP_taskloop_ull.
* task.c: Include gomp-constants.h.  Include taskloop.c twice
with appropriate macros.
(GOMP_task): Use GOMP_TASK_FLAG_* defines instead of hardcoded
constants.
* taskloop.c: New file.
* testsuite/libgomp.c/for-4.c: New test.
* testsuite/libgomp.c/taskloop-1.c: New test.
* testsuite/libgomp.c/taskloop-2.c: New test.
* testsuite/libgomp.c/taskloop-3.c: New test.

--- gcc/tree.h.jj   2015-05-19 18:56:50.982256719 +0200
+++ gcc/tree.h  2015-05-19 19:04:52.496759752 +0200
@@ -1206,7 +1206,7 @@ extern void protected_set_expr_location
 
 /* Generic accessors for OMP nodes that keep clauses as operand 0.  */
 #define OMP_STANDALONE_CLAUSES(NODE) \
-  TREE_OPERAND (TREE_RANGE_CHECK (NODE, OACC_CACHE, OMP_TARGET_UPDATE), 0)
+  TREE_OPERAND (TREE_RANGE_CHECK (NODE, OACC_CACHE, OMP_TARGET_EXIT_DATA), 0)
 
 #define OACC_PARALLEL_BODY(NODE) \
   TREE_OPERAND (OACC_PARALLEL_CHECK (NODE), 0)
@@ -1366,6 +1366,12 @@ extern void protected_set_expr_location
 #define OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ(NODE) \
   (OMP_CLAUSE_CHECK (NODE))->omp_clause.gimple_reduction_init
 
+/* True on a SHARED clause if a FIRSTPRIVATE clause for the same
+   decl is present in the chain (this can happen only for taskloop
+   with FIRSTPRIVATE/LASTPRIVATE on it originally).  */
+#define OMP_CLAUSE_SHARED_FIRSTPRIVATE(NODE) \
+  (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_SHARED)->base.public_flag)
+
 #define OMP_CLAUSE_FINAL_EXPR(NODE) \
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_FINAL), 0)
 #define OMP_CLAUSE_IF_EXPR(NODE) \
--- gcc/gimplify.c.jj   2015-05-19 19:02:52.230632257 +0200
+++ gcc/gimplify.c  2015-05-20 19:07:01.317440243 +0200
@@ -6167,6 +6167,12 @@ gimplify_scan_omp_clauses (tree *list_p,
 (splay_tree_key) decl) == NULL)
omp_add_variable (outer_ctx, decl, GOVD_SHARED | GOVD_SEEN);
  else if (outer_ctx
+  && (outer_ctx->region_type & ORT_TASK) != 0
+  && outer_ctx->combined_loop
+  && splay_tree_lookup (outer_ctx->variables,
+(splay_tree_key) decl) == NULL)
+   omp_add_var

[Patch ARM] Fix PR target/65937

2015-05-21 Thread Ramana Radhakrishnan
Testism introduced by the last commit to fix PR26702 on arm-*-linux* 
targets. The fix is to restore the target selector to arm*-*-eabi*, as the 
target macro changes only affect arm*-*-eabi*.


Applied to trunk as obvious

Ramana

* gcc.target/arm/pr26702.c: Adjust target selector.
Index: gcc.target/arm/pr26702.c
===================================================================
--- gcc.target/arm/pr26702.c(revision 223444)
+++ gcc.target/arm/pr26702.c(working copy)
@@ -1,4 +1,4 @@
-/* { dg-do compile { target arm_eabi } } */
+/* { dg-do compile { target arm*-*-eabi* } } */
 /* { dg-final { scan-assembler "\\.size\[\\t \]+static_foo, 4" } } */
 int foo;
 static int static_foo;


RE: [PATCH, ping 1] Move insns without introducing new temporaries in loop2_invariant

2015-05-21 Thread Uros Bizjak
Hello!

>> From: Jeff Law [mailto:l...@redhat.com]
>> Sent: Wednesday, May 13, 2015 4:05 AM
>> OK for the trunk.
>>
>> Thanks for your patience,
>
> Thanks. Committed with the added "PR rtl-optimization/64616" to both
> ChangeLog entries.

This patch caused PR66236 [1].

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66236

Uros.


Re: Add statistics to alias.c

2015-05-21 Thread Richard Biener
On Thu, 21 May 2015, Jan Hubicka wrote:

> Hi,
> this patch extends statistics from tree-ssa-alias to also cover the TBAA oracle.
> This is useful to keep track of aliasing effectiveness.  For example, the hack
> in alias.c globbing all pointers into one alias set costs about 20% of all
> answers on firefox, i.e. from 15500978 disambiguations/23744267 queries
> (with the hack removed) to 12932078 disambiguations/27256455 queries.
> 
> Bootstrapped x86_64-linux, OK?

Ok with the spelling fix and the same_type_for_tbaa hunk.

Thanks,
Richard.

> Honza
> 
>   * alias.c (alias_stats): New static var.
>   (alias_sets_conflict_p, alias_sets_must_conflict_p): Update stats.
>   (dump_alias_stats_in_alias_c): New function.
>   * alias.h (dump_alias_stats_in_alias_c): Declare.
>   * tree-ssa-alias.c (dump_alias_stats): Call it.
> Index: alias.c
> ===================================================================
> --- alias.c   (revision 223444)
> +++ alias.c   (working copy)
> @@ -213,6 +213,19 @@ static int write_dependence_p (const_rtx
>  
>  static void memory_modified_1 (rtx, const_rtx, void *);
>  
> +/* Query statistics for the different low-level disambiguators.
> +   A high-level query may trigger multiple of them.  */
> +
> +static struct {
> +  unsigned long long num_alias_zero;
> +  unsigned long long num_same_alias_set;
> +  unsigned long long num_same_objects;
> +  unsigned long long num_volatile;
> +  unsigned long long num_dag;
> +  unsigned long long num_disambiguated;
> +} alias_stats;
> +
> +
>  /* Set up all info needed to perform alias analysis on memory references.  */
>  
>  /* Returns the size in bytes of the mode of X.  */
> @@ -471,13 +484,20 @@ alias_sets_conflict_p (alias_set_type se
>ase = get_alias_set_entry (set1);
>if (ase != 0
>&& ase->children->get (set2))
> -return 1;
> +{
> +  ++alias_stats.num_dag;
> +  return 1;
> +}
>  
>/* Now do the same, but with the alias sets reversed.  */
>ase = get_alias_set_entry (set2);
>if (ase != 0
>&& ase->children->get (set1))
> -return 1;
> +{
> +  ++alias_stats.num_dag;
> +  return 1;
> +}
> +  ++alias_stats.num_disambiguated;
>  
>/* The two alias sets are distinct and neither one is the
>   child of the other.  Therefore, they cannot conflict.  */
> @@ -489,8 +509,16 @@ alias_sets_conflict_p (alias_set_type se
>  int
>  alias_sets_must_conflict_p (alias_set_type set1, alias_set_type set2)
>  {
> -  if (set1 == 0 || set2 == 0 || set1 == set2)
> -return 1;
> +  if (set1 == 0 || set2 == 0)
> +{
> +  ++alias_stats.num_alias_zero;
> +  return 1;
> +}
> +  if (set1 == set2)
> +{
> +  ++alias_stats.num_same_alias_set;
> +  return 1;
> +}
>  
>return 0;
>  }
> @@ -512,10 +540,17 @@ objects_must_conflict_p (tree t1, tree t
>  return 0;
>  
>/* If they are the same type, they must conflict.  */
> -  if (t1 == t2
> -  /* Likewise if both are volatile.  */
> -  || (t1 != 0 && TYPE_VOLATILE (t1) && t2 != 0 && TYPE_VOLATILE (t2)))
> -return 1;
> +  if (t1 == t2)
> +{
> +  ++alias_stats.num_same_objects;
> +  return 1;
> +}
> +  /* Likewise if both are volatile.  */
> +  if (t1 != 0 && TYPE_VOLATILE (t1) && t2 != 0 && TYPE_VOLATILE (t2))
> +{
> +  ++alias_stats.num_volatile;
> +  return 1;
> +}
>  
>set1 = t1 ? get_alias_set (t1) : 0;
>set2 = t2 ? get_alias_set (t2) : 0;
> @@ -3043,4 +3051,21 @@ end_alias_analysis (void)
>sbitmap_free (reg_known_equiv_p);
>  }
>  
> +void
> +dump_alias_stats_in_alias_c (FILE *s)
> +{
> +  fprintf (s, "  TBAA oracle: %llu disambiguations %llu queries\n"
> +   "   %llu are in alias set 0\n"
> +   "   %llu queries asked about the same object\n"
> +   "   %llu quaries asked about the same alias set\n"
> +   "   %llu access volatile\n"
> +   "   %llu are dependent in the DAG\n",
> +alias_stats.num_disambiguated,
> +alias_stats.num_alias_zero + alias_stats.num_same_alias_set
> ++ alias_stats.num_same_objects + alias_stats.num_volatile
> ++ alias_stats.num_dag,
> +alias_stats.num_alias_zero, alias_stats.num_same_alias_set,
> ++ alias_stats.num_same_objects, alias_stats.num_volatile,
> ++ alias_stats.num_dag);
> +}
>  #include "gt-alias.h"
> Index: alias.h
> ===================================================================
> --- alias.h   (revision 223444)
> +++ alias.h   (working copy)
> @@ -41,6 +41,7 @@ extern int alias_sets_conflict_p (alias_
>  extern int alias_sets_must_conflict_p (alias_set_type, alias_set_type);
>  extern int objects_must_conflict_p (tree, tree);
>  extern int nonoverlapping_memrefs_p (const_rtx, const_rtx, bool);
> +extern void dump_alias_stats_in_alias_c (FILE *s);
>  tree reference_alias_ptr_type (tree);
>  bool alias_ptr_types_compatible_p (tree,

Re: [patch, testsuite, ARM] don't try to execute advsimd-intrinsics tests on hardware without NEON

2015-05-21 Thread Christophe Lyon
On 21 May 2015 at 07:33, Sandra Loosemore  wrote:
> ARM testing shares the AArch64 advsimd-intrinsics execution tests.  On ARM,
> though, the NEON support being tested is optional -- some arches are
> compatible with the NEON compilation options but hardware available for
> testing might or might not be able to execute those instructions. In
> arm-none-eabi testing of a long list of multilibs, I found that this problem
> caused some of the multilibs to get stuck for days because every one of
> these execution tests was wandering off into the weeds and timing out.
>
> The vect.exp tests already handle this by setting dg-do-what-default to
> either "run" or "compile", depending on whether we have target hardware
> execution support (arm_neon_hw) for NEON, or only compilation support
> (arm_neon_ok).  So, I've adapted that logic for advsimd-intrinsics.exp too.

Indeed it makes sense.

>
> It also appeared that the main loop over the test cases was running them all
> twice with the torture options -- once using c-torture-execute and once
> using gcc-dg-runtest.  I deleted the former since it appears to ignore
> dg-do-what-default and always try to execute no matter what.  My dejagnu-fu
> isn't the strongest and this is pretty confusing to me; am I missing
> something here?  Otherwise, OK to commit?

As noted by Alan in https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01348.html
the sets of options covered by gcc-dg-runtest and c-torture-execute
are slightly different.

That was the reason I kept both.
We can probably live with no longer testing "-Og -g" as Alan says.
OTOH, are the 2 option sets supposed to be the same, or are there any
plans to make them differ substantially in the future?

Christophe.

> -Sandra
>


Re: Demangle symbols in debug assertion messages

2015-05-21 Thread Jonathan Wakely

On 20/05/15 21:45 +0200, François Dumont wrote:

On 20/05/2015 12:19, Jonathan Wakely wrote:
Does this fix https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65392 ?


With the patch, the code from the bug report generates the following 
debug message:


/home/fdt/dev/gcc/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/debug/safe_iterator.h:395:
   error: attempt to retreat a past-the-end iterator 2 steps, which falls
   outside its valid range.

Objects involved in the operation:
iterator @ 0x0x7fff32365c50 {
 type = __gnu_debug::_Safe_iterator<std::_Deque_iterator<int, int&, int*>,
 std::__debug::deque<int, std::allocator<int> > > (mutable iterator);

 state = past-the-end;
 references sequence with type `std::__debug::deque<int, std::allocator<int> >' @ 0x0x7fff32365cd0

}

which looks nice.

However, I wouldn't say that bug is fixed, because the debug mode does not 
generate the mangled name itself; it simply relies on typeid to get it. 
Shouldn't the bug report say so?  Whatever, the symbol generated by typeid 
can be demangled by __cxa_demangle, so it mustn't be that bad.


I was trying to demangle the names with c++filt, which failed. Users
should not have to write a C++ program using __cxa_demangle to read
the output.

If they are automatically demangled now then the bug is fixed.



Re: C/C++ PATCH to allow deprecating enum values (PR c/47043)

2015-05-21 Thread Marek Polacek
I'm pinging the C++ parts.

Thanks,

> On Thu, May 07, 2015 at 06:22:40PM +0200, Marek Polacek wrote:
> > This (third) version of the patch entails the change in tsubst_enum Ed
> > suggested + new testcase.
> > 
> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> > 
> > 2015-05-07  Marek Polacek  
> > Edward Smith-Rowland  <3dw...@verizon.net>
> > 
> > PR c/47043
> > * c-common.c (handle_deprecated_attribute): Allow CONST_DECL.
> > 
> > * c-parser.c (c_parser_enum_specifier): Parse and apply enumerator
> > attributes.
> > 
> > * cp-tree.h (build_enumerator): Update declaration.
> > * decl.c (build_enumerator): Add attributes parameter.  Call
> > cplus_decl_attributes.
> > * init.c (constant_value_1): Pass 0 to mark_used.
> > * parser.c (cp_parser_enumerator_definition): Parse attributes and
> > pass them down to build_enumerator.
> > * pt.c (tsubst_enum): Pass decl attributes to build_enumerator.
> > * semantics.c (finish_id_expression): Don't warn_deprecated_use here.
> > 
> > * doc/extend.texi (Enumerator Attributes): New section.
> > Document syntax of enumerator attributes.
> > 
> > * c-c++-common/attributes-enum-1.c: New test.
> > * c-c++-common/attributes-enum-2.c: New test.
> > * g++.dg/cpp0x/attributes-enum-1.C: New test.
> > * g++.dg/cpp1y/attributes-enum-1.C: New test.
> > 
> > diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
> > index ada8e8a..36968e5 100644
> > --- gcc/c-family/c-common.c
> > +++ gcc/c-family/c-common.c
> > @@ -8810,6 +8810,7 @@ handle_deprecated_attribute (tree *node, tree name,
> >   || TREE_CODE (decl) == VAR_DECL
> >   || TREE_CODE (decl) == FUNCTION_DECL
> >   || TREE_CODE (decl) == FIELD_DECL
> > + || TREE_CODE (decl) == CONST_DECL
> >   || objc_method_decl (TREE_CODE (decl)))
> > TREE_DEPRECATED (decl) = 1;
> >else
> > diff --git gcc/c/c-parser.c gcc/c/c-parser.c
> > index bf0e4c57..889e6d7 100644
> > --- gcc/c/c-parser.c
> > +++ gcc/c/c-parser.c
> > @@ -2516,6 +2516,13 @@ c_parser_declspecs (c_parser *parser, struct 
> > c_declspecs *specs,
> > enumerator:
> >   enumeration-constant
> >   enumeration-constant = constant-expression
> > +
> > +   GNU Extensions:
> > +
> > +   enumerator:
> > + enumeration-constant attributes[opt]
> > + enumeration-constant attributes[opt] = constant-expression
> > +
> >  */
> >  
> >  static struct c_typespec
> > @@ -2575,6 +2582,8 @@ c_parser_enum_specifier (c_parser *parser)
> >   c_parser_set_source_position_from_token (token);
> >   decl_loc = value_loc = token->location;
> >   c_parser_consume_token (parser);
> > + /* Parse any specified attributes.  */
> > + tree enum_attrs = c_parser_attributes (parser);
> >   if (c_parser_next_token_is (parser, CPP_EQ))
> > {
> >   c_parser_consume_token (parser);
> > @@ -2584,7 +2593,9 @@ c_parser_enum_specifier (c_parser *parser)
> >   else
> > enum_value = NULL_TREE;
> >   enum_decl = build_enumerator (decl_loc, value_loc,
> > -   &the_enum, enum_id, enum_value);
> > +   &the_enum, enum_id, enum_value);
> > + if (enum_attrs)
> > +   decl_attributes (&TREE_PURPOSE (enum_decl), enum_attrs, 0);
> >   TREE_CHAIN (enum_decl) = values;
> >   values = enum_decl;
> >   seen_comma = false;
> > diff --git gcc/cp/cp-tree.h gcc/cp/cp-tree.h
> > index e0fbf5e..6b26cb1 100644
> > --- gcc/cp/cp-tree.h
> > +++ gcc/cp/cp-tree.h
> > @@ -5400,7 +5400,7 @@ extern bool xref_basetypes(tree, 
> > tree);
> >  extern tree start_enum (tree, tree, tree, 
> > bool, bool *);
> >  extern void finish_enum_value_list (tree);
> >  extern void finish_enum(tree);
> > -extern void build_enumerator   (tree, tree, tree, 
> > location_t);
> > +extern void build_enumerator   (tree, tree, tree, 
> > tree, location_t);
> >  extern tree lookup_enumerator  (tree, tree);
> >  extern bool start_preparsed_function   (tree, tree, int);
> >  extern bool start_function (cp_decl_specifier_seq *,
> > diff --git gcc/cp/decl.c gcc/cp/decl.c
> > index 261a12d..ebbd585 100644
> > --- gcc/cp/decl.c
> > +++ gcc/cp/decl.c
> > @@ -13067,11 +13067,12 @@ finish_enum (tree enumtype)
> >  
> >  /* Build and install a CONST_DECL for an enumeration constant of the
> > enumeration type ENUMTYPE whose NAME and VALUE (if any) are provided.
> > -   LOC is the location of NAME.
> > +   Apply ATTRIBUTES if available.  LOC is the location of NAME.
> > Assignment of sequential values by default is handled here.  */
> >  
> >  void
> > -build_enumerator (tree name, tree value, tree enumtype, location_t loc)
> > +build_enumerator (tree name, tree value, tree enumtype, tree attributes,
> > + location_t loc)

Re: [Patch AArch64] Add cpu_defines.h for AArch64.

2015-05-21 Thread Szabolcs Nagy
On 19/05/15 17:03, Ramana Radhakrishnan wrote:
> On Tue, May 19, 2015 at 4:54 PM,   wrote:
>>> On May 19, 2015, at 5:54 AM, Ramana Radhakrishnan 
>>>  wrote:
>>> Like the ARM port, the AArch64 ports needs to set glibc_integral_traps to 
>>> false as integer divide instructions do not trap.
>>>
>>> Bootstrapped and regression tested on aarch64-none-linux-gnu
>>>
>>> Ok to apply ?
>>
>> Not really questioning your patch but questioning libstdc++'s defaults.
>>  I wonder if this should be the default, as most targets don't trap; only a
>> few do. And wouldn't it be the safer default to say they don't trap?
> 
> How about we #error out if targets do *not* define some of these
> defaults in libstdc++?

__glibcxx_integral_traps seems to be used for the 'traps'
numeric_limits member.

i think it can only be meaningful if LIA-1 is properly supported
(eg. div-by-zero is never optimized away) otherwise the standard
dictates UB and anything can happen. (Note that LIA-1 also requires
well-defined semantics for signed int overflow).

i don't see a way for conforming c++ code to use traps, or for
the library to guarantee either traps==true or traps==false on
any machine.



Re: acc_on_device for device_type_host_nonshm

2015-05-21 Thread Thomas Schwinge
Hi!

On Thu, 7 May 2015 19:32:26 +0100, Julian Brown  wrote:
> Here's a new version of the patch [...]

> OK for trunk?

Makes sense to me (with just a request to drop the testsuite changes, see
below), to get the existing regressions under control.  Jakub?

> PR libgomp/65742
> 
> gcc/
> * builtins.c (expand_builtin_acc_on_device): Don't use open-coded
> sequence for !ACCEL_COMPILER.
> 
> libgomp/
> * oacc-init.c (plugin/plugin-host.h): Include.
> (acc_on_device): Check whether we're in an offloaded region for
> host_nonshm
> plugin. Don't use __builtin_acc_on_device.
> * plugin/plugin-host.c (GOMP_OFFLOAD_openacc_parallel): Set
> nonshm_exec flag in thread-local data.
> (GOMP_OFFLOAD_openacc_create_thread_data): Allocate thread-local
> data for host_nonshm plugin.
> (GOMP_OFFLOAD_openacc_destroy_thread_data): Free thread-local data
> for host_nonshm plugin.
> * plugin/plugin-host.h: New.
> * testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c: Remove
> -fno-builtin-acc_on_device flag.
> * testsuite/libgomp.oacc-c-c++-common/if-1.c: Likewise.
> * testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Remove
> comment re: acc_on_device builtin.
> * testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise.
> * testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise.
>
> commit adccf2e7d313263d585f63e752a4d36653d47811

> Author: Julian Brown 
> Date:   Tue Apr 21 12:40:45 2015 -0700
> 
> Non-SHM acc_on_device fixes
> 
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index 6fe1456..5930fe4 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -5917,6 +5917,7 @@ expand_stack_save (void)
>  static rtx
>  expand_builtin_acc_on_device (tree exp, rtx target)
>  {
> +#ifdef ACCEL_COMPILER
>if (!validate_arglist (exp, INTEGER_TYPE, VOID_TYPE))
>  return NULL_RTX;
>  
> @@ -5925,13 +5926,8 @@ expand_builtin_acc_on_device (tree exp, rtx target)
>/* Return (arg == v1 || arg == v2) ? 1 : 0.  */
>machine_mode v_mode = TYPE_MODE (TREE_TYPE (arg));
>rtx v = expand_normal (arg), v1, v2;
> -#ifdef ACCEL_COMPILER
>v1 = GEN_INT (GOMP_DEVICE_NOT_HOST);
>v2 = GEN_INT (ACCEL_COMPILER_acc_device);
> -#else
> -  v1 = GEN_INT (GOMP_DEVICE_NONE);
> -  v2 = GEN_INT (GOMP_DEVICE_HOST);
> -#endif
>machine_mode target_mode = TYPE_MODE (integer_type_node);
>if (!target || !register_operand (target, target_mode))
>  target = gen_reg_rtx (target_mode);
> @@ -5945,6 +5941,9 @@ expand_builtin_acc_on_device (tree exp, rtx target)
>emit_label (done_label);
>  
>return target;
> +#else
> +  return NULL;
> +#endif
>  }
>  
>  
> diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
> index 335ffd4..157147a 100644
> --- a/libgomp/oacc-init.c
> +++ b/libgomp/oacc-init.c
> @@ -29,6 +29,7 @@
>  #include "libgomp.h"
>  #include "oacc-int.h"
>  #include "openacc.h"
> +#include "plugin/plugin-host.h"
>  #include 
>  #include 
>  #include 
> @@ -611,11 +612,18 @@ ialias (acc_set_device_num)
>  int
>  acc_on_device (acc_device_t dev)
>  {
> -  if (acc_get_device_type () == acc_device_host_nonshm)
> +  struct goacc_thread *thr = goacc_thread ();
> +
> +  /* We only want to appear to be the "host_nonshm" plugin from "offloaded"
> + code -- i.e. within a parallel region.  Test a flag set by the
> + openacc_parallel hook of the host_nonshm plugin to determine that.  */
> +  if (acc_get_device_type () == acc_device_host_nonshm
> +  && thr && thr->target_tls
> +  && ((struct nonshm_thread *)thr->target_tls)->nonshm_exec)
>  return dev == acc_device_host_nonshm || dev == acc_device_not_host;
>  
> -  /* Just rely on the compiler builtin.  */
> -  return __builtin_acc_on_device (dev);
> +  /* For OpenACC, libgomp is only built for the host, so this is sufficient.  */
> +  return dev == acc_device_host || dev == acc_device_none;
>  }
>  
>  ialias (acc_on_device)
> diff --git a/libgomp/plugin/plugin-host.c b/libgomp/plugin/plugin-host.c
> index 1faf5bc..3cb4dab 100644
> --- a/libgomp/plugin/plugin-host.c
> +++ b/libgomp/plugin/plugin-host.c
> @@ -44,6 +44,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #ifdef HOST_NONSHM_PLUGIN
>  #define STATIC
> @@ -55,6 +56,10 @@
>  #define SELF "host: "
>  #endif
>  
> +#ifdef HOST_NONSHM_PLUGIN
> +#include "plugin-host.h"
> +#endif
> +
>  STATIC const char *
>  GOMP_OFFLOAD_get_name (void)
>  {
> @@ -174,7 +179,10 @@ GOMP_OFFLOAD_openacc_parallel (void (*fn) (void *),
>  void *targ_mem_desc __attribute__ ((unused)))
>  {
>  #ifdef HOST_NONSHM_PLUGIN
> +  struct nonshm_thread *thd = GOMP_PLUGIN_acc_thread ();
> +  thd->nonshm_exec = true;
>fn (devaddrs);
> +  thd->nonshm_exec = false;
>  #else
>fn (hostaddrs);
>  #endif
> @@ -232,11 +240,20 @@ STATIC void *
>  GOMP_OFFLOAD_openacc_create_thread_data (int ord
>__attribute__ ((unused)))
>  {
> +#ifdef HOST_NON

Re: acc_on_device for device_type_host_nonshm

2015-05-21 Thread Jakub Jelinek
On Thu, May 21, 2015 at 01:02:12PM +0200, Thomas Schwinge wrote:
> Hi!
> 
> On Thu, 7 May 2015 19:32:26 +0100, Julian Brown  
> wrote:
> > Here's a new version of the patch [...]
> 
> > OK for trunk?
> 
> Makes sense to me (with just a request to drop the testsuite changes, see
> below), to get the existing regressions under control.  Jakub?

Ok for trunk.
> 
> > PR libgomp/65742
> > 
> > gcc/
> > * builtins.c (expand_builtin_acc_on_device): Don't use open-coded
> > sequence for !ACCEL_COMPILER.
> > 
> > libgomp/
> > * oacc-init.c (plugin/plugin-host.h): Include.
> > (acc_on_device): Check whether we're in an offloaded region for
> > host_nonshm
> > plugin. Don't use __builtin_acc_on_device.
> > * plugin/plugin-host.c (GOMP_OFFLOAD_openacc_parallel): Set
> > nonshm_exec flag in thread-local data.
> > (GOMP_OFFLOAD_openacc_create_thread_data): Allocate thread-local
> > data for host_nonshm plugin.
> > (GOMP_OFFLOAD_openacc_destroy_thread_data): Free thread-local data
> > for host_nonshm plugin.
> > * plugin/plugin-host.h: New.
> > * testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c: Remove
> > -fno-builtin-acc_on_device flag.
> > * testsuite/libgomp.oacc-c-c++-common/if-1.c: Likewise.
> > * testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Remove
> > comment re: acc_on_device builtin.
> > * testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise.
> > * testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise.
> >
> > commit adccf2e7d313263d585f63e752a4d36653d47811

Jakub


Re: [nvptx] Re: Mostly rewrite genrecog

2015-05-21 Thread Bernd Schmidt

On 05/21/2015 09:12 AM, Thomas Schwinge wrote:


OK to commit?

gcc/
* config/nvptx/nvptx.md (allocate_stack): Rename to...
(allocate_stack_<mode>): ... this, and add :P on both
match_operand and unspec.
(allocate_stack): New expander.


If you really want to. It doesn't work yet in ptxas so it's a little 
pointless to spend effort on it.



Bernd



[gomp4] Vector-single predication

2015-05-21 Thread Bernd Schmidt
This uses the patch I committed yesterday which introduces warp 
broadcasts to implement the vector-single predication needed for 
OpenACC. Outside a loop with vector parallelism, only one of the threads 
representing a vector must execute, the others follow along. So we skip 
the real work in each basic block for the inactive threads, then 
broadcast the direction to take in the control flow graph from the 
active one, and jump as a group.


This will get extended with similar functionality for worker-single. 
Julian is working on some patches on top of that to ensure the later 
optimizers don't destroy the control flow - we really need the threads 
to reconverge and perform the broadcast/jump in lockstep.


Committed on gomp-4_0-branch.


Bernd
Index: gcc/ChangeLog.gomp
===
--- gcc/ChangeLog.gomp	(revision 223444)
+++ gcc/ChangeLog.gomp	(working copy)
@@ -1,5 +1,15 @@
 2015-05-20  Bernd Schmidt  
 
+	* omp-low.c (struct omp_region): Add a gwv_this field.
+	(bb_region_map): New variable.
+	(find_omp_for_region_data, find_omp_target_region_data): New static
+	functions.
+	(build_omp_regions_1): Call them.  Build the bb_region_map.
+	(enclosing_target_region, requires_vector_predicate,
+	generate_vector_broadcast, predicate_bb, find_predicatable_bbs,
+	predicate_omp_regions): New static functions.
+	(execute_expand_omp): Allocate and free bb_region_map.
+
 	* config/nvptx/nvptx.c: Include "dumpfile.h".
 	(condition_unidirectional_p): New static function.
 	(nvptx_print_operand): Use it for new 'U' handling.
Index: gcc/omp-low.c
===
--- gcc/omp-low.c	(revision 223442)
+++ gcc/omp-low.c	(working copy)
@@ -159,6 +159,9 @@ struct omp_region
 
   /* True if this is a combined parallel+workshare region.  */
   bool is_combined_parallel;
+
+  /* For an OpenACC loop, the level of parallelism requested.  */
+  int gwv_this;
 };
 
 /* Levels of parallelism as defined by OpenACC.  Increasing numbers
@@ -9961,7 +9964,6 @@ expand_omp_target (struct omp_region *re
 update_ssa (TODO_update_ssa_only_virtuals);
 }
 
-
 /* Expand the parallel region tree rooted at REGION.  Expansion
proceeds in depth-first order.  Innermost regions are expanded
first.  This way, parallel regions that require a new function to
@@ -9984,7 +9986,7 @@ expand_omp (struct omp_region *region)
   if (region->type == GIMPLE_OMP_FOR
 	  && gimple_omp_for_combined_p (last_stmt (region->entry)))
 	inner_stmt = last_stmt (region->inner->entry);
-
+ 
   if (region->inner)
 	expand_omp (region->inner);
 
@@ -10041,6 +10043,44 @@ expand_omp (struct omp_region *region)
 }
 }
 
+/* Map each basic block to an omp_region.  */
+static hash_map *bb_region_map;
+
+/* Fill in additional data for a region REGION associated with an
+   OMP_FOR STMT.  */
+
+static void
+find_omp_for_region_data (struct omp_region *region, gimple stmt)
+{
+  if (!is_gimple_omp_oacc (stmt))
+return;
+
+  tree clauses = gimple_omp_for_clauses (stmt);
+  if (find_omp_clause (clauses, OMP_CLAUSE_GANG))
+region->gwv_this |= MASK_GANG;
+  if (find_omp_clause (clauses, OMP_CLAUSE_WORKER))
+region->gwv_this |= MASK_WORKER;
+  if (find_omp_clause (clauses, OMP_CLAUSE_VECTOR))
+region->gwv_this |= MASK_VECTOR;
+}
+
+/* Fill in additional data for a region REGION associated with an
+   OMP_TARGET STMT.  */
+
+static void
+find_omp_target_region_data (struct omp_region *region, gimple stmt)
+{
+  if (!is_gimple_omp_oacc (stmt))
+return;
+
+  tree clauses = gimple_omp_target_clauses (stmt);
+  if (find_omp_clause (clauses, OMP_CLAUSE_NUM_GANGS))
+region->gwv_this |= MASK_GANG;
+  if (find_omp_clause (clauses, OMP_CLAUSE_NUM_WORKERS))
+region->gwv_this |= MASK_WORKER;
+  if (find_omp_clause (clauses, OMP_CLAUSE_VECTOR_LENGTH))
+region->gwv_this |= MASK_VECTOR;
+}
 
 /* Helper for build_omp_regions.  Scan the dominator tree starting at
block BB.  PARENT is the region that contains BB.  If SINGLE_TREE is
@@ -10055,6 +10095,8 @@ build_omp_regions_1 (basic_block bb, str
   gimple stmt;
   basic_block son;
 
+  bb_region_map->put (bb, parent);
+
   gsi = gsi_last_bb (bb);
   if (!gsi_end_p (gsi) && is_gimple_omp (gsi_stmt (gsi)))
 {
@@ -10107,6 +10149,7 @@ build_omp_regions_1 (basic_block bb, str
 		case GF_OMP_TARGET_KIND_OACC_PARALLEL:
 		case GF_OMP_TARGET_KIND_OACC_KERNELS:
 		case GF_OMP_TARGET_KIND_OACC_DATA:
+		  find_omp_target_region_data (region, stmt);
 		  break;
 		case GF_OMP_TARGET_KIND_UPDATE:
 		case GF_OMP_TARGET_KIND_OACC_UPDATE:
@@ -10118,6 +10161,8 @@ build_omp_regions_1 (basic_block bb, str
 		  gcc_unreachable ();
 		}
 	}
+	  else if (code == GIMPLE_OMP_FOR)
+	find_omp_for_region_data (region, stmt);
 	  /* ..., this directive becomes the parent for a new region.  */
 	  if (region)
 	parent = region;
@@ -10156,7 +10201,7 @@ omp_expand_local (basic_block head)
   dump_omp_

Re: [gomp4] Vector-single predication

2015-05-21 Thread Jakub Jelinek
On Thu, May 21, 2015 at 01:42:11PM +0200, Bernd Schmidt wrote:
> This uses the patch I committed yesterday which introduces warp broadcasts
> to implement the vector-single predication needed for OpenACC. Outside a
> loop with vector parallelism, only one of the threads representing a vector
> must execute, the others follow along. So we skip the real work in each
> basic block for the inactive threads, then broadcast the direction to take
> in the control flow graph from the active one, and jump as a group.
> 
> This will get extended with similar functionality for worker-single. Julian
> is working on some patches on top of that to ensure the later optimizers
> don't destroy the control flow - we really need the threads to reconverge
> and perform the broadcast/jump in lockstep.
> 
> Committed on gomp-4_0-branch.

What do you do with function calls?
Do you call them just in the (tid.x & 31) == 0 threads (then they can't use
vectorization), or for all threads (then it is an ABI change, they
would need to know whether they are called this way and depending on that
handle it similarly (skip all the real work, except for function calls, for
(tid.x & 31) != 0, unless it is a vectorized region).
Or is OpenACC restricting this to statements in the constructs directly
(rather than anywhere in the region)?
Haven't seen any accompanying testcases for this, so it is unclear to me how
do you express this in OpenACC.

Jakub


Re: [PATCH 4/7] don't compare ARG_FRAME_POINTER_REGNUM and FRAME_POINTER_REGNUM with the preprocessor

2015-05-21 Thread Jeff Law

On 05/20/2015 08:09 PM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

gcc/ChangeLog:

2015-05-20  Trevor Saunders  

* *.c: Remove comparison of ARG_FRAME_POINTER_REGNUM and
FRAME_POINTER_REGNUM with the preprocessor.
This only hits a handful of files.  If you could go ahead and list them 
in the ChangeLog that'd probably be better than *.c :-)



@@ -3781,16 +3778,14 @@ df_exit_block_uses_collect (struct df_collection_rec 
*collection_rec, bitmap exi
  df_ref_record (DF_REF_ARTIFICIAL, collection_rec, regno_reg_rtx[i], NULL,
   EXIT_BLOCK_PTR_FOR_FN (cfun), NULL, DF_REF_REG_USE, 0);

-#if FRAME_POINTER_REGNUM != ARG_POINTER_REGNUM
/* It is deliberate that this is not put in the exit block uses but
   I do not know why.  */
-  if (reload_completed
+  if (FRAME_POINTER_REGNUM != ARG_POINTER_REGNUM && reload_completed
Minor nit, go ahead and put the && reload_completed on the next line. 
While it fits in 80 columns, ISTM it more naturally (in GNU style) 
belongs on its own line.  Interestingly enough this is the only instance 
where you formatted this way -- all the others have the FP/AP comparison 
on its own line.


OK for the trunk.

jeff



Re: [PATCH 5/7] always define HAVE_conditional_move

2015-05-21 Thread Jeff Law

On 05/20/2015 08:09 PM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

gcc/ChangeLog:

2015-05-20  Trevor Saunders  

* genconfig.c (main): Always define HAVE_conditional_move.
* *.c: Don't check if HAVE_conditional_move is defined.
Again, you're hitting just a handful of files, if you could go ahead and 
list them it'd be appreciated.


OK for the trunk.
jeff



Re: [PATCH 6/7] remove #if HAVE_conditional_move

2015-05-21 Thread Jeff Law

On 05/20/2015 08:09 PM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

gcc/ChangeLog:

2015-05-20  Trevor Saunders  

* *.c, *.h: Don't check HAVE_conditional_move with the preprocessor.
You know what I'm going to say here :-)  FWIW, I think just mentioning 
the filename is fine for these kinds of mechanical changes -- no need to 
list each function that got twiddled.


OK for the trunk.

Jeff



Re: [PATCH 3/7] move default for STACK_PUSH_CODE to defaults.h

2015-05-21 Thread Jeff Law

On 05/20/2015 08:09 PM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

gcc/ChangeLog:

2015-05-20  Trevor Saunders  

* defaults.h: Add default for STACK_PUSH_CODE.
* expr.c: Don't redefine STACK_PUSH_CODE.
* recog.c: Likewise.

OK.
jeff



Re: [PATCH 2/7] remove most ifdef STACK_GROWS_DOWNWARD

2015-05-21 Thread Jeff Law

On 05/20/2015 08:09 PM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

gcc/c-family/ChangeLog:

2015-05-20  Trevor Saunders  

* c-cppbuiltin.c (c_cpp_builtins): Use if instead of #if with
STACK_GROWS_DOWNWARD.

gcc/ChangeLog:

2015-05-20  Trevor Saunders  

* *.c: Use if instead of preprocessor checks with
STACK_GROWS_DOWNWARD.
---
  gcc/ChangeLog   |  5 
  gcc/builtins.c  | 30 +++
  gcc/c-family/ChangeLog  |  5 
  gcc/c-family/c-cppbuiltin.c |  5 ++--
  gcc/dwarf2cfi.c | 12 +-
  gcc/explow.c| 33 --
  gcc/expr.c  | 58 +++--
  gcc/recog.c |  8 ++-
  gcc/sched-deps.c|  9 ---
  9 files changed, 78 insertions(+), 87 deletions(-)

OK with the usual request to list filenames in the ChangeLogs.

jeff



Re: [PATCH 1/7] always define STACK_GROWS_DOWNWARD

2015-05-21 Thread Jeff Law

On 05/20/2015 08:09 PM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

gcc/c-family/ChangeLog:

2015-05-20  Trevor Saunders  

* c-cppbuiltin.c (c_cpp_builtins): Check the value of
STACK_GROWS_DOWNWARD rather than if it is defined.

gcc/ChangeLog:

2015-05-20  Trevor Saunders  

* *.c: Check the value of STACK_GROWS_DOWNWARD rather than if it
is defined.
* config/**/*.h: Define STACK_GROWS_DOWNWARD to an integer.
* defaults.h: Provide default for STACK_GROWS_DOWNWARD.
---
  gcc/ChangeLog  |  7 +++
  gcc/builtins.c |  6 +++---
  gcc/c-family/ChangeLog |  5 +
  gcc/c-family/c-cppbuiltin.c|  2 +-
  gcc/calls.c|  8 
  gcc/combine-stack-adj.c|  8 
  gcc/config/alpha/alpha.h   |  2 +-
  gcc/config/arc/arc.h   |  2 +-
  gcc/config/avr/avr.h   |  2 +-
  gcc/config/bfin/bfin.h |  2 +-
  gcc/config/c6x/c6x.h   |  2 +-
  gcc/config/cr16/cr16.h |  2 +-
  gcc/config/cris/cris.h |  2 +-
  gcc/config/epiphany/epiphany.h |  2 +-
  gcc/config/h8300/h8300.h   |  2 +-
  gcc/config/i386/i386.h |  2 +-
  gcc/config/iq2000/iq2000.h |  2 +-
  gcc/config/m32r/m32r.h |  2 +-
  gcc/config/mcore/mcore.h   |  2 +-
  gcc/config/microblaze/microblaze.h |  2 +-
  gcc/config/mips/mips.h |  2 +-
  gcc/config/mmix/mmix.h |  2 +-
  gcc/config/mn10300/mn10300.h   |  2 +-
  gcc/config/moxie/moxie.h   |  2 +-
  gcc/config/nds32/nds32.h   |  2 +-
  gcc/config/nios2/nios2.h   |  2 +-
  gcc/config/nvptx/nvptx.h   |  2 +-
  gcc/config/pdp11/pdp11.h   |  2 +-
  gcc/config/rs6000/rs6000.h |  2 +-
  gcc/config/s390/s390.h |  2 +-
  gcc/config/sh/sh.h |  2 +-
  gcc/config/sparc/sparc.h   |  2 +-
  gcc/config/spu/spu.h   |  2 +-
  gcc/config/tilegx/tilegx.h |  2 +-
  gcc/config/tilepro/tilepro.h   |  2 +-
  gcc/config/v850/v850.h |  2 +-
  gcc/config/vax/vax.h   |  2 +-
  gcc/config/xtensa/xtensa.h |  2 +-
  gcc/defaults.h |  4 
  gcc/dwarf2cfi.c|  4 ++--
  gcc/explow.c   | 10 +-
  gcc/expr.c | 20 
  gcc/ira-color.c|  8 
  gcc/lower-subreg.c |  7 ---
  gcc/lra-spills.c   |  8 
  gcc/recog.c|  6 +++---
  gcc/sched-deps.c   |  2 +-
  47 files changed, 71 insertions(+), 98 deletions(-)

OK.  Not going to require each filename to be listed in the ChangeLog :-)

Thanks for taking care of this stuff!

Jeff


Re: [patch, testsuite] don't specify "dg-do run" explicitly for vect test cases

2015-05-21 Thread Jeff Law

On 05/20/2015 11:12 PM, Sandra Loosemore wrote:

On targets such as ARM, some arches are compatible with options needed
to enable compilation with vectorization, but the specific hardware (or
simulator or BSP) available for execution tests may not implement or
enable those features.  The vect.exp test harness already includes some
magic to determine whether the target hw can execute vectorized code and
sets dg-do-what-default to compile the tests only if they can't be
executed.  It's a mistake for individual tests to explicitly say "dg-do
run" because this overrides the harness's magic default and forces the
test to be executed, even if doing so just ends up wedging the target.

I already committed two patches last fall (r215627 and r218427) to
address this, but people keep adding new vect test cases with the same
problem, so here is yet another installment to clean them up.  I tested
this on arm-none-eabi with a fairly large collection of multilibs.  OK
to commit?

-Sandra


vect.log


2015-05-20  Sandra Loosemore

gcc/testsuite/
* gcc.dg/vect/bb-slp-pr65935.c: Remove explicit "dg-do run".
* gcc.dg/vect/pr59354.c: Likewise.
* gcc.dg/vect/pr64252.c: Likewise.
* gcc.dg/vect/pr64404.c: Likewise.
* gcc.dg/vect/pr64493.c: Likewise.
* gcc.dg/vect/pr64495.c: Likewise.
* gcc.dg/vect/pr64844.c: Likewise.
* gcc.dg/vect/pr65518.c: Likewise.
* gcc.dg/vect/vect-aggressive-1.c: Likewise.

OK.
jeff



Re: [SH][committed] Fix gcc.target/sh/pr54236-2.c failures

2015-05-21 Thread Oleg Endo
On Tue, 2015-05-19 at 10:04 +0200, Oleg Endo wrote:
> Since a recent change to the tree optimizers
> https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00089.html
> some related SH patterns stopped working.  The attached patch fixes
> this.
> 
> Tested briefly with 'make all' and with
> make -k check-gcc RUNTESTFLAGS="sh.exp=pr54236* --target_board=sh-sim
> \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
> 
> Committed as r223346.

This is a follow-up patch which fixes two oversights in the above
change.  Tested as above, committed as r223479.

Cheers,
Oleg

gcc/ChangeLog:
PR target/54236
* config/sh/sh.md (*round_int_even): Reject pattern if operands[0] and
operands[1] are the same.

testsuite/ChangeLog:
PR target/54236
* gcc.target/sh/pr54236-2.c: Fix typo in comment.
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 223478)
+++ gcc/config/sh/sh.md	(working copy)
@@ -2011,7 +2011,8 @@
 	(and:SI (plus:SI (match_operand:SI 1 "arith_reg_operand")
 			 (const_int 1))
 		(const_int -2)))]
-  "TARGET_SH1 && !TARGET_SH2A && can_create_pseudo_p ()"
+  "TARGET_SH1 && !TARGET_SH2A && can_create_pseudo_p ()
+   && !reg_overlap_mentioned_p (operands[0], operands[1])"
   "#"
   "&& 1"
   [(set (match_dup 0) (const_int -2))
Index: gcc/testsuite/gcc.target/sh/pr54236-2.c
===
--- gcc/testsuite/gcc.target/sh/pr54236-2.c	(revision 223478)
+++ gcc/testsuite/gcc.target/sh/pr54236-2.c	(working copy)
@@ -133,7 +133,7 @@
 test_016 (int a, int b, int c, int d)
 {
   // non-SH2A: 1x add #1, 1x mov #-2, 1x and
-  // SH2A: 1x add #1, 1x blcr #0
+  // SH2A: 1x add #1, 1x bclr #0
   return a + (a & 1);
 }
 


[Patch]: libbacktrace - add support of PE/COFF

2015-05-21 Thread Tristan Gingold
Hello,

this patch adds basic support to libbacktrace for PE32 and PE32+ (Windows and 
Windows64 object formats).
Support is ‘basic’ because neither DLLs nor PIE (if that exists on Windows) are handled.  
Furthermore, there are no Windows versions of mmapio.c and mmap.c.
Finally, I have disabled support for data symbols on PE because I wasn’t 
able to pass ‘make check’ with it: the symbol ‘_global’ is at the same address as 
a symbol defined by the linker, and I haven’t found any way to discard the 
latter.  As I think data symbol support isn’t a required feature, I have 
preferred to disable it on PE.

The new file, pecoff.c, mostly follows the structure of elf.c

Tested on both windows and windows64.
No regression on Gnu/Linux x86.

Tristan.


2015-05-21  Tristan Gingold  

* pecoff.c: New file.
* Makefile.am (FORMAT_FILES): Add pecoff.c and dependencies.
* Makefile.in: Regenerate.
* filetype.awk: Detect pecoff.
* configure.ac: Define BACKTRACE_SUPPORTS_DATA on elf platforms.
Add pecoff.
* btest.c (test5): Test enabled only if BACKTRACE_SUPPORTS_DATA is
true.
* backtrace-supported.h.in (BACKTRACE_SUPPORTS_DATA): Define.
* configure: Regenerate.
* pecoff.c: New file.


commit ac17f650356728fc07121c71213401e1e159df2f
Author: Tristan Gingold 
Date:   Thu May 21 14:29:44 2015 +0200

Add support for PE/COFF to libbacktrace.

diff --git a/libbacktrace/ChangeLog b/libbacktrace/ChangeLog
index c6604d9..139521a 100644
--- a/libbacktrace/ChangeLog
+++ b/libbacktrace/ChangeLog
@@ -1,3 +1,17 @@
+2015-05-21  Tristan Gingold  
+
+   * pecoff.c: New file.
+   * Makefile.am (FORMAT_FILES): Add pecoff.c and dependencies.
+   * Makefile.in: Regenerate.
+   * filetype.awk: Detect pecoff.
+   * configure.ac: Define BACKTRACE_SUPPORTS_DATA on elf platforms.
+   Add pecoff.
+   * btest.c (test5): Test enabled only if BACKTRACE_SUPPORTS_DATA is
+   true.
+   * backtrace-supported.h.in (BACKTRACE_SUPPORTS_DATA): Define.
+   * configure: Regenerate.
+   * pecoff.c: New file.
+
 2015-05-13  Michael Haubenwallner  
 
* Makefile.in: Regenerated with automake-1.11.6.
diff --git a/libbacktrace/Makefile.am b/libbacktrace/Makefile.am
index a93b82a..c5f0dcb 100644
--- a/libbacktrace/Makefile.am
+++ b/libbacktrace/Makefile.am
@@ -56,6 +56,7 @@ BACKTRACE_FILES = \
 
 FORMAT_FILES = \
elf.c \
+   pecoff.c \
unknown.c
 
 VIEW_FILES = \
@@ -124,6 +125,7 @@ fileline.lo: config.h backtrace.h internal.h
 mmap.lo: config.h backtrace.h internal.h
 mmapio.lo: config.h backtrace.h internal.h
 nounwind.lo: config.h internal.h
+pecoff.lo: config.h backtrace.h internal.h
 posix.lo: config.h backtrace.h internal.h
 print.lo: config.h backtrace.h internal.h
 read.lo: config.h backtrace.h internal.h
diff --git a/libbacktrace/Makefile.in b/libbacktrace/Makefile.in
index a949f29..b434d76e 100644
--- a/libbacktrace/Makefile.in
+++ b/libbacktrace/Makefile.in
@@ -299,6 +299,7 @@ BACKTRACE_FILES = \
 
 FORMAT_FILES = \
elf.c \
+   pecoff.c \
unknown.c
 
 VIEW_FILES = \
@@ -753,6 +754,7 @@ fileline.lo: config.h backtrace.h internal.h
 mmap.lo: config.h backtrace.h internal.h
 mmapio.lo: config.h backtrace.h internal.h
 nounwind.lo: config.h internal.h
+pecoff.lo: config.h backtrace.h internal.h
 posix.lo: config.h backtrace.h internal.h
 print.lo: config.h backtrace.h internal.h
 read.lo: config.h backtrace.h internal.h
diff --git a/libbacktrace/backtrace-supported.h.in b/libbacktrace/backtrace-supported.h.in
index 5115ce1..4574635 100644
--- a/libbacktrace/backtrace-supported.h.in
+++ b/libbacktrace/backtrace-supported.h.in
@@ -59,3 +59,8 @@ POSSIBILITY OF SUCH DAMAGE.  */
as 0.  */
 
 #define BACKTRACE_SUPPORTS_THREADS @BACKTRACE_SUPPORTS_THREADS@
+
+/* BACKTRACE_SUPPORTS_DATA will be #define'd as 1 if the backtrace library
+   also handles data symbols, 0 if not.  */
+
+#define BACKTRACE_SUPPORTS_DATA @BACKTRACE_SUPPORTS_DATA@
diff --git a/libbacktrace/btest.c b/libbacktrace/btest.c
index 9424a92..9821e34 100644
--- a/libbacktrace/btest.c
+++ b/libbacktrace/btest.c
@@ -616,6 +616,8 @@ f33 (int f1line, int f2line)
   return failures;
 }
 
+#if BACKTRACE_SUPPORTS_DATA
+
 int global = 1;
 
 static int
@@ -684,6 +686,8 @@ test5 (void)
   return failures;
 }
 
+#endif /* BACKTRACE_SUPPORTS_DATA  */
+
 static void
 error_callback_create (void *data ATTRIBUTE_UNUSED, const char *msg,
   int errnum)
@@ -708,8 +712,10 @@ main (int argc ATTRIBUTE_UNUSED, char **argv)
   test2 ();
   test3 ();
   test4 ();
+#if BACKTRACE_SUPPORTS_DATA
   test5 ();
 #endif
+#endif
 
   exit (failures ? EXIT_FAILURE : EXIT_SUCCESS);
 }
diff --git a/libbacktrace/configure b/libbacktrace/configure
index fa81659..19418c9 100755
--- a/libbacktrace/configure
+++ b/libbacktrace/configure
@@ -607,6 +607,7 @@ NATIVE_TRUE
 BACKTRACE_USES_MALLOC
 ALLOC_FILE
 VIEW_FILE
+BACKTRACE_SUPPORTS_DATA
 

[PATCH] PR target/66232: -fPIC -fno-plt -mx32 fails to generate indirect branch via GOT

2015-05-21 Thread H.J. Lu
X32 doesn't support indirect branch via 32-bit memory slot since
indirect branch will load 64-bit address from 64-bit memory slot.
Since x32 GOT slot is 64-bit, we should allow indirect branch via GOT
slot for x32.

I am testing it on x32.  OK for master if there is no regression?

Thanks.


H.J.
--
gcc/

PR target/66232
* config/i386/constraints.md (Bg): Add a constraint for x32
call and sibcall memory operand.
* config/i386/i386.md (*call_x32): New pattern.
(*sibcall_x32): Likewise.
(*call_value_x32): Likewise.
(*sibcall_value_x32): Likewise.
* config/i386/predicates.md (x32_sibcall_memory_operand): New
predicate.
(x32_call_insn_operand): Likewise.
(x32_sibcall_insn_operand): Likewise.

gcc/testsuite/

PR target/66232
* gcc.target/i386/pr66232-1.c: New test.
* gcc.target/i386/pr66232-2.c: Likewise.
* gcc.target/i386/pr66232-3.c: Likewise.
* gcc.target/i386/pr66232-4.c: Likewise.
---
 gcc/config/i386/constraints.md|  6 ++
 gcc/config/i386/i386.md   | 36 +++
 gcc/config/i386/predicates.md | 26 ++
 gcc/testsuite/gcc.target/i386/pr66232-1.c | 13 +++
 gcc/testsuite/gcc.target/i386/pr66232-2.c | 14 
 gcc/testsuite/gcc.target/i386/pr66232-3.c | 13 +++
 gcc/testsuite/gcc.target/i386/pr66232-4.c | 13 +++
 7 files changed, 121 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-4.c

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 2271bd1..7be8917 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -146,10 +146,16 @@
 "@internal Lower SSE register when avoiding REX prefix and all SSE registers otherwise.")
 
 ;; We use the B prefix to denote any number of internal operands:
+;;  g  Call and sibcall memory operand, valid for TARGET_X32
 ;;  s  Sibcall memory operand, not valid for TARGET_X32
 ;;  w  Call memory operand, not valid for TARGET_X32
 ;;  z  Constant call address operand.
 
+(define_constraint "Bg"
+  "@internal Call/sibcall memory operand for x32."
+  (and (match_test "TARGET_X32")
+   (match_operand 0 "x32_sibcall_memory_operand")))
+
 (define_constraint "Bs"
   "@internal Sibcall memory operand."
   (and (not (match_test "TARGET_X32"))
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index aefca43..a1ae05a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -11659,6 +11659,14 @@
   "* return ix86_output_call_insn (insn, operands[0]);"
   [(set_attr "type" "call")])
 
+(define_insn "*call_x32"
+  [(call (mem:QI (zero_extend:DI
+  (match_operand:SI 0 "x32_call_insn_operand" "Bg")))
+(match_operand 1))]
+  "TARGET_X32 && !SIBLING_CALL_P (insn)"
+  "* return ix86_output_call_insn (insn, operands[0]);"
+  [(set_attr "type" "call")])
+
 (define_insn "*sibcall"
   [(call (mem:QI (match_operand:W 0 "sibcall_insn_operand" "UBsBz"))
 (match_operand 1))]
@@ -11666,6 +11674,14 @@
   "* return ix86_output_call_insn (insn, operands[0]);"
   [(set_attr "type" "call")])
 
+(define_insn "*sibcall_x32"
+  [(call (mem:QI (zero_extend:DI
+  (match_operand:SI 0 "x32_sibcall_insn_operand" "Bg")))
+(match_operand 1))]
+  "TARGET_X32 && SIBLING_CALL_P (insn)"
+  "* return ix86_output_call_insn (insn, operands[0]);"
+  [(set_attr "type" "call")])
+
 (define_insn "*sibcall_memory"
   [(call (mem:QI (match_operand:W 0 "memory_operand" "m"))
 (match_operand 1))
@@ -11825,6 +11841,16 @@
   "* return ix86_output_call_insn (insn, operands[1]);"
   [(set_attr "type" "callv")])
 
+(define_insn "*call_value_x32"
+  [(set (match_operand 0)
+   (call (mem:QI
+   (zero_extend:DI
+ (match_operand:SI 1 "x32_call_insn_operand" "Bg")))
+ (match_operand 2)))]
+  "TARGET_X32 && !SIBLING_CALL_P (insn)"
+  "* return ix86_output_call_insn (insn, operands[1]);"
+  [(set_attr "type" "callv")])
+
 (define_insn "*sibcall_value"
   [(set (match_operand 0)
(call (mem:QI (match_operand:W 1 "sibcall_insn_operand" "UBsBz"))
@@ -11833,6 +11859,16 @@
   "* return ix86_output_call_insn (insn, operands[1]);"
   [(set_attr "type" "callv")])
 
+(define_insn "*sibcall_value_x32"
+  [(set (match_operand 0)
+   (call (mem:QI
+   (zero_extend:DI
+ (match_operand:SI 1 "x32_sibcall_insn_operand" "Bg")))
+ (match_operand 2)))]
+  "TARGET_X32 && SIBLING_CALL_P (insn)"
+  "* return ix86_output_call_insn (insn, operands[1]);"
+  [(set_attr "type" "callv")])
+
 (define_insn "*sibcall_value_memory"
   [(set (match_operand 0)
(call (mem:QI (match_operand

Re: [patch, libgomp] Re-factor GOMP_MAP_POINTER handling

2015-05-21 Thread Thomas Schwinge
Hi!

Jakub, for avoidance of doubt, the proposed refactoring makes sense to
me, but does need your approval:

On Thu, 21 May 2015 16:30:40 +0800, Chung-Lin Tang  
wrote:
> Ping x2.
> 
> On 15/5/11 7:19 PM, Chung-Lin Tang wrote:
> > Ping.
> > 
> > On 2015/4/21 08:21 PM, Chung-Lin Tang wrote:
> >> Hi,
> >> while investigating some issues in the variable mapping code, I observed
> >> that the GOMP_MAP_POINTER handling is essentially duplicated under the 
> >> PSET case.
> >> This patch abstracts and unifies the handling code, basically just a 
> >> cleanup
> >> patch. Ran libgomp tests to ensure no regressions, ok for trunk?
> >>
> >> Thanks,
> >> Chung-Lin
> >>
> >> 2015-04-21  Chung-Lin Tang  
> >>
> >> libgomp/
> >> * target.c (gomp_map_pointer): New function abstracting out
> >> GOMP_MAP_POINTER handling.
> >> (gomp_map_vars): Remove GOMP_MAP_POINTER handling code and use
> >> gomp_map_pointer().


Grüße,
 Thomas


signature.asc
Description: PGP signature


Re: [gomp4] Vector-single predication

2015-05-21 Thread Julian Brown
On Thu, 21 May 2015 13:57:00 +0200
Jakub Jelinek  wrote:

> On Thu, May 21, 2015 at 01:42:11PM +0200, Bernd Schmidt wrote:
> > This uses the patch I committed yesterday which introduces warp
> > broadcasts to implement the vector-single predication needed for
> > OpenACC. Outside a loop with vector parallelism, only one of the
> > threads representing a vector must execute, the others follow
> > along. So we skip the real work in each basic block for the
> > inactive threads, then broadcast the direction to take in the
> > control flow graph from the active one, and jump as a group.
> > 
> > This will get extended with similar functionality for
> > worker-single. Julian is working on some patches on top of that to
> > ensure the later optimizers don't destroy the control flow - we
> > really need the threads to reconverge and perform the
> > broadcast/jump in lockstep.
> > 
> > Committed on gomp-4_0-branch.
> 
> What do you do with function calls?
> Do you call them just in the (tid.x & 31) == 0 threads (then they
> can't use vectorization), or for all threads (then it is an ABI
> change, they would need to know whether they are called this way and
> depending on that handle it similarly (skip all the real work, except
> for function calls, for (tid.x & 31) != 0, unless it is a vectorized
> region). Or is OpenACC restricting this to statements in the
> constructs directly (rather than anywhere in the region)?

OpenACC handles function calls specially (calling them "routines" -- of
varying sorts, gang, worker, vector or seq, affecting where they can be
invoked from). The plan is that all threads will call such routines --
and then some threads will be "neutered" as appropriate within the
routines themselves.

That's not actually implemented yet, though.

Julian


Re: [PATCH] PR target/66232: -fPIC -fno-plt -mx32 fails to generate indirect branch via GOT

2015-05-21 Thread Uros Bizjak
On Thu, May 21, 2015 at 2:59 PM, H.J. Lu  wrote:
> X32 doesn't support indirect branch via 32-bit memory slot since
> indirect branch will load 64-bit address from 64-bit memory slot.
> Since x32 GOT slot is 64-bit, we should allow indirect branch via GOT
> slot for x32.
>
> I am testing it on x32.  OK for master if there is no regression?
>
> Thanks.
>
>
> H.J.
> --
> gcc/
>
> PR target/66232
> * config/i386/constraints.md (Bg): Add a constraint for x32
> call and sibcall memory operand.
> * config/i386/i386.md (*call_x32): New pattern.
> (*sibcall_x32): Likewise.
> (*call_value_x32): Likewise.
> (*sibcall_value_x32): Likewise.
> * config/i386/predicates.md (x32_sibcall_memory_operand): New
> predicate.
> (x32_call_insn_operand): Likewise.
> (x32_sibcall_insn_operand): Likewise.
>
> gcc/testsuite/
>
> PR target/66232
> * gcc.target/i386/pr66232-1.c: New test.
> * gcc.target/i386/pr66232-2.c: Likewise.
> * gcc.target/i386/pr66232-3.c: Likewise.
> * gcc.target/i386/pr66232-4.c: Likewise.

OK.

Maybe you should use match_code some more in x32_sibcall_memory_operand, e.g.

(match_code "constant" "0")
(match_code "unspec" "00")

But it is up to you, since XINT doesn't fit in this scheme...

Thanks,
Uros.

>  gcc/config/i386/constraints.md|  6 ++
>  gcc/config/i386/i386.md   | 36 +++
>  gcc/config/i386/predicates.md | 26 ++
>  gcc/testsuite/gcc.target/i386/pr66232-1.c | 13 +++
>  gcc/testsuite/gcc.target/i386/pr66232-2.c | 14 
>  gcc/testsuite/gcc.target/i386/pr66232-3.c | 13 +++
>  gcc/testsuite/gcc.target/i386/pr66232-4.c | 13 +++
>  7 files changed, 121 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-4.c
>
> diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
> index 2271bd1..7be8917 100644
> --- a/gcc/config/i386/constraints.md
> +++ b/gcc/config/i386/constraints.md
> @@ -146,10 +146,16 @@
>   "@internal Lower SSE register when avoiding REX prefix and all SSE 
> registers otherwise.")
>
>  ;; We use the B prefix to denote any number of internal operands:
> +;;  g  Call and sibcall memory operand, valid for TARGET_X32
>  ;;  s  Sibcall memory operand, not valid for TARGET_X32
>  ;;  w  Call memory operand, not valid for TARGET_X32
>  ;;  z  Constant call address operand.
>
> +(define_constraint "Bg"
> +  "@internal Call/sibcall memory operand for x32."
> +  (and (match_test "TARGET_X32")
> +   (match_operand 0 "x32_sibcall_memory_operand")))
> +
>  (define_constraint "Bs"
>"@internal Sibcall memory operand."
>(and (not (match_test "TARGET_X32"))
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index aefca43..a1ae05a 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -11659,6 +11659,14 @@
>"* return ix86_output_call_insn (insn, operands[0]);"
>[(set_attr "type" "call")])
>
> +(define_insn "*call_x32"
> +  [(call (mem:QI (zero_extend:DI
> +  (match_operand:SI 0 "x32_call_insn_operand" "Bg")))
> +(match_operand 1))]
> +  "TARGET_X32 && !SIBLING_CALL_P (insn)"
> +  "* return ix86_output_call_insn (insn, operands[0]);"
> +  [(set_attr "type" "call")])
> +
>  (define_insn "*sibcall"
>[(call (mem:QI (match_operand:W 0 "sibcall_insn_operand" "UBsBz"))
>  (match_operand 1))]
> @@ -11666,6 +11674,14 @@
>"* return ix86_output_call_insn (insn, operands[0]);"
>[(set_attr "type" "call")])
>
> +(define_insn "*sibcall_x32"
> +  [(call (mem:QI (zero_extend:DI
> +  (match_operand:SI 0 "x32_sibcall_insn_operand" "Bg")))
> +(match_operand 1))]
> +  "TARGET_X32 && SIBLING_CALL_P (insn)"
> +  "* return ix86_output_call_insn (insn, operands[0]);"
> +  [(set_attr "type" "call")])
> +
>  (define_insn "*sibcall_memory"
>[(call (mem:QI (match_operand:W 0 "memory_operand" "m"))
>  (match_operand 1))
> @@ -11825,6 +11841,16 @@
>"* return ix86_output_call_insn (insn, operands[1]);"
>[(set_attr "type" "callv")])
>
> +(define_insn "*call_value_x32"
> +  [(set (match_operand 0)
> +   (call (mem:QI
> +   (zero_extend:DI
> + (match_operand:SI 1 "x32_call_insn_operand" "Bg")))
> + (match_operand 2)))]
> +  "TARGET_X32 && !SIBLING_CALL_P (insn)"
> +  "* return ix86_output_call_insn (insn, operands[1]);"
> +  [(set_attr "type" "callv")])
> +
>  (define_insn "*sibcall_value"
>[(set (match_operand 0)
> (call (mem:QI (match_operand:W 1 "sibcall_insn_operand" "UBsBz"))
> @@ -11833,6 +11859,16 @@
>"* return ix86_output_call_insn (insn, operands[1]);"

Re: [PATCH, PR target/65103, 2/3] Propagate address constants into loops for i386

2015-05-21 Thread Ilya Enkovich
Ping

2015-05-05 14:05 GMT+03:00 Ilya Enkovich :
> 2015-04-21 8:52 GMT+03:00 Jeff Law :
>> On 04/17/2015 02:34 AM, Ilya Enkovich wrote:
>>>
>>> On 15 Apr 14:07, Ilya Enkovich wrote:

 2015-04-14 8:22 GMT+03:00 Jeff Law :
>
> On 03/15/2015 02:30 PM, Richard Sandiford wrote:
>>
>>
>> Ilya Enkovich  writes:
>>>
>>>
>>> This patch allows propagation of loop invariants for i386 if
>>> propagated
>>> value is a constant to be used in address operand.  Bootstrapped and
>>> tested on x86_64-unknown-linux-gnu.  OK for trunk or stage 1?
>>
>>
>>
>> Is it necessary for this to be a target hook?  The concept doesn't seem
>> particularly target-specific.  We should only propagate into the
>> address
>> if the new cost is no greater than the old cost, but if the address
>> meets that condition and if propagating at this point in the pipeline
>> is
>> a win on x86, then wouldn't it be a win for other targets too?
>
>
> I agree with Richard here.  I can't see a strong reason why this should
> be a
> target hook.
>
> Perhaps part of the issue here is the address costing metrics may not
> have
> enough context to make good decisions.  In which case what context do
> they
> need?


 At this point I don't insist on a target hook.  The main reasoning was
 to not affect other targets. If we extend propagation for non constant
 values different aspects may appear. E.g. possible register pressure
 changes may significantly affect ia32. I just wanted to have an
 instrument to play with a propagation on x86 not affecting other
 targets. I don't have an opportunity to test possible performance
 implications on non-x86 targets. Don't expect (significant)
 regressions there but who knows...

 I'll remove the hook from this patch. Will probably introduce it later
 if some target specific cases are found.

 Thanks,
 Ilya

>
> Jeff
>>>
>>>
>>> Here is a version with no hook.  Bootstrapped and tested on
>>> x86_64-unknown-linux-gnu.  Is it OK for trunk?
>>>
>>> Thanks,
>>> Ilya
>>> --
>>> gcc/
>>>
>>> 2015-04-17  Ilya Enkovich  
>>>
>>> PR target/65103
>>> * fwprop.c (forward_propagate_into): Propagate loop
>>> invariants if a target says so.
>>>
>>> gcc/testsuite/
>>>
>>> 2015-04-17  Ilya Enkovich  
>>>
>>> PR target/65103
>>> * gcc.target/i386/pr65103-2.c: New.
>>
>> It seems to me there's a key piece missing here -- metrics.
>>
>> When is this profitable, when is it not profitable.   Just blindly undoing
>> LICM seems wrong here.
>>
>> The first thought is to look at register pressure through the loop.  I
>> thought we had some infrastructure for this kind of query available. It'd
>> probably be wise to re-use it.  In fact, one might reasonably ask if LICM
>> should have hoisted the expression to start with.
>>
>>
>> I'd also think the cost of the constant may come into play here.  A really
>> cheap constant probably should not have been hoisted by LICM to start with
>> -- but the code may have been written in such a way that some low cost
>> constants are pulled out as loop invariants at the source level.  So this
>> isn't strictly an issue of un-doing bad LICM
>>
>> So I think to go forward we need to be working on solving the "when is this
>> a profitable transformation to make".
>
> This patch doesn't force propagation.  The patch just allows
> propagation and regular fwprop cost estimation is used to compute if
> this is profitable.  For i386 I don't see cases when we shouldn't
> propagate. We remove instruction, reduce register pressure and having
> constant in memory operand is free which is reflected in address_cost
> hook.
>
> Ilya
>
>>
>> jeff


[PATCH] Fix PR66211

2015-05-21 Thread Richard Biener

The following papers over the C++ FE issue that it doesn't track
lvalueness before folding stuff.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-05-21  Richard Biener  

PR c++/66211
* match.pd: Guard pattern optimizing (int)(float)int
conversions to apply only on GIMPLE.

* g++.dg/conversion/pr66211.C: New testcase.
* gcc.dg/tree-ssa/forwprop-18.c: Adjust.

Index: gcc/testsuite/g++.dg/conversion/pr66211.C
===================================================================
*** gcc/testsuite/g++.dg/conversion/pr66211.C   (revision 0)
--- gcc/testsuite/g++.dg/conversion/pr66211.C   (working copy)
***
*** 0 
--- 1,11 
+ // PR c++/66211
+ // { dg-do compile }
+ 
+ void f(int&){}
+ 
+ int main()
+ {
+   int x = 0;
+   double y = 1;
+   f(1 > 0 ? x : y); // { dg-error "from an rvalue" }
+ }
Index: gcc/match.pd
===================================================================
--- gcc/match.pd(revision 223348)
+++ gcc/match.pd(working copy)
@@ -791,7 +791,8 @@ (define_operator_list inverted_tcc_compa
/* If we are converting an integer to a floating-point that can
   represent it exactly and back to an integer, we can skip the
   floating-point conversion.  */
-   (if (inside_int && inter_float && final_int &&
+   (if (GIMPLE /* PR66211 */
+   && inside_int && inter_float && final_int &&
(unsigned) significand_size (TYPE_MODE (inter_type))
>= inside_prec - !inside_unsignedp)
 (convert @0))
Index: gcc/testsuite/gcc.dg/tree-ssa/forwprop-18.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/forwprop-18.c (revision 223348)
+++ gcc/testsuite/gcc.dg/tree-ssa/forwprop-18.c (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-forwprop1" } */
+/* { dg-options "-O -fdump-tree-cddce1" } */
 
 signed char f1(signed char n)
 {
@@ -19,6 +19,6 @@ signed char g2(unsigned long long n)
   return (float)n;
 }
 
-/* { dg-final { scan-tree-dump-times "\\\(float\\\)" 2 "forwprop1" } } */
-/* { dg-final { scan-tree-dump-not "\\\(long double\\\)" "forwprop1" } } */
-/* { dg-final { cleanup-tree-dump "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "\\\(float\\\)" 2 "cddce1" } } */
+/* { dg-final { scan-tree-dump-not "\\\(long double\\\)" "cddce1" } } */
+/* { dg-final { cleanup-tree-dump "cddce1" } } */


Re: [gomp4] Vector-single predication

2015-05-21 Thread Jakub Jelinek
On Thu, May 21, 2015 at 02:05:12PM +0100, Julian Brown wrote:
> OpenACC handles function calls specially (calling them "routines" -- of
> varying sorts, gang, worker, vector or seq, affecting where they can be
> invoked from). The plan is that all threads will call such routines --
> and then some threads will be "neutered" as appropriate within the
> routines themselves, as appropriate.

All functions will behave that way, or just some using some magic attribute
etc.?  Say will newlib functions behave this way (math functions, printf,
...)?  For math functions e.g. it would be nice if they could behave both ways
(perhaps as separate entrypoints), so have the possibility to say how many
threads from the warp will perform the operation and then work on array
arguments and array return value (kind like OpenMP or Cilk+ elemental
functions, just perhaps with different argument/return value passing
conventions).

Jakub


[Ada] Allow constants in SPARK contracts

2015-05-21 Thread Arnaud Charlet
This patch permits constants to appear in the following SPARK annotations:

   Depends
   Global
   Initializes
   Part_Of
   Refined_Depends
   Refined_Global
   Refined_State


-- Source --


--  legal_usage.ads

package Legal_Usage
  with SPARK_Mode,
   Abstract_State=> State,
   Initializes   => (C1, C2),
   Initial_Condition => (C1 = 1 and then C2 = 2)
is
   C1 : constant Integer := 1;
   C2 : constant Integer := 2;

   function Func (Formal : Integer) return Integer
 with Global  => (Input => (C1, C2, State)),
  Depends => (Func'Result => (Formal, C1, C2, State));

private
   C3 : constant Integer := 3 with Part_Of => State;
end Legal_Usage;

--  legal_usage.adb

package body Legal_Usage
  with SPARK_Mode,
   Refined_State => (State => (C3, C4))
is
   C4 : constant Integer := 4;

   function Func (Formal : Integer) return Integer
 with Refined_Global  => (Input => (C1, C2, C3, C4)),
  Refined_Depends => (Func'Result => (Formal, C1, C2, C3, C4))
   is
   begin
  return Formal + C1 + C2 + C3 + C4;
   end Func;
end Legal_Usage;

--  illegal_usage.ads

package Illegal_Usage is
   C1 : constant Integer := 1;

   procedure Error_1
 with Global => (In_Out => C1);

   procedure Error_2
 with Global => (Output => C1);

   procedure Error_3 (Formal : Integer)
 with Depends => (C1 => Formal);

   procedure Error_4 (Formal : Integer)
 with Global  => (Input => C1),
  Depends => (C1 => Formal);
end Illegal_Usage;


-- Compilation and output --


$ gcc -c legal_usage.adb
$ gcc -c illegal_usage.adb
illegal_usage.ads:5:32: constant "C1" cannot act as output
illegal_usage.ads:8:32: constant "C1" cannot act as output
illegal_usage.ads:11:23: read-only constant "C1" cannot appear as output in
  dependence relation
illegal_usage.ads:14:32: constant "C1" must appear in at least one input
  dependence list
illegal_usage.ads:15:23: read-only constant "C1" cannot appear as output in
  dependence relation

Tested on x86_64-pc-linux-gnu, committed on trunk

2015-05-21  Hristian Kirtchev  

* einfo.adb (Contract): This attribute now applies to constants.
(Set_Contract): This attribute now applies to constants.
(Write_Field34_Name): Add output for constants.
* einfo.ads Attribute Contract now applies to constants.
* sem_ch3.adb (Analyze_Object_Contract): Constants now have
their Part_Of indicator verified.
* sem_prag.adb (Analyze_Constituent): A constant is now a valid
constituent.
(Analyze_Global_Item): A constant cannot act as an output.
(Analyze_Initialization_Item): Constants are now a valid
initialization item.
(Analyze_Initializes_In_Decl_Part): Rename
global variable States_And_Vars to States_And_Objs and update
all its occurrences.
(Analyze_Input_Item): Constants are now a
valid initialization item. Remove SPARK RM references from error
messages.
(Analyze_Pragma): Indicator Part_Of can now apply to a constant.
(Collect_Body_States): Collect both source constants
and variables.
(Collect_States_And_Objects): Collect both source constants and
variables.
(Collect_States_And_Variables): Rename
to Collect_States_And_Objects and update all its occurrences.
(Collect_Visible_States): Do not collect constants and variables
used to map generic formals to actuals.
(Find_Role): The role of a constant is that of an input. Separate the
role of a variable from that of a constant.
(Report_Unused_Constituents): Add specialized wording for constants.
(Report_Unused_States): Add specialized wording for constants.
* sem_util.adb (Add_Contract_Item): Add processing for constants.
* sem_util.ads (Add_Contract_Item): Update the comment on usage.
(Find_Placement_In_State_Space): Update the comment on usage.

Index: sem_ch3.adb
===================================================================
--- sem_ch3.adb (revision 223477)
+++ sem_ch3.adb (working copy)
@@ -3205,6 +3205,8 @@
  return;
   end if;
 
+  --  Constant related checks
+
   if Ekind (Obj_Id) = E_Constant then
 
  --  A constant cannot be effectively volatile. This check is only
@@ -3224,6 +3226,8 @@
 Error_Msg_N ("constant cannot be volatile", Obj_Id);
  end if;
 
+  --  Variable related checks
+
   else pragma Assert (Ekind (Obj_Id) = E_Variable);
 
  --  The following checks are only relevant when SPARK_Mode is on as
@@ -3323,15 +3327,15 @@
  if Seen then
 Check_External_Properties (Obj_Id, AR_Val, AW_Val, ER_Val, EW_Val);
  end if;
+  end if;
 
- --  Check whether the lack of indicator Part_Of agrees with the
- --  placement of the variable with respect to 

Re: [patch, libgomp] Re-factor GOMP_MAP_POINTER handling

2015-05-21 Thread Jakub Jelinek
On Thu, May 21, 2015 at 03:00:16PM +0200, Thomas Schwinge wrote:
> Jakub, for avoidance of doubt, the proposed refactoring makes sense to
> me, but does need your approval:

This is ok for trunk.

Jakub


Re: [PATCH] PR target/66232: -fPIC -fno-plt -mx32 fails to generate indirect branch via GOT

2015-05-21 Thread H.J. Lu
On Thu, May 21, 2015 at 6:11 AM, Uros Bizjak  wrote:
> On Thu, May 21, 2015 at 2:59 PM, H.J. Lu  wrote:
>> X32 doesn't support indirect branch via 32-bit memory slot since
>> indirect branch will load 64-bit address from 64-bit memory slot.
>> Since x32 GOT slot is 64-bit, we should allow indirect branch via GOT
>> slot for x32.
>>
>> I am testing it on x32.  OK for master if there is no regression?
>>
>> Thanks.
>>
>>
>> H.J.
>> --
>> gcc/
>>
>> PR target/66232
>> * config/i386/constraints.md (Bg): Add a constraint for x32
>> call and sibcall memory operand.
>> * config/i386/i386.md (*call_x32): New pattern.
>> (*sibcall_x32): Likewise.
>> (*call_value_x32): Likewise.
>> (*sibcall_value_x32): Likewise.
>> * config/i386/predicates.md (x32_sibcall_memory_operand): New
>> predicate.
>> (x32_call_insn_operand): Likewise.
>> (x32_sibcall_insn_operand): Likewise.
>>
>> gcc/testsuite/
>>
>> PR target/66232
>> * gcc.target/i386/pr66232-1.c: New test.
>> * gcc.target/i386/pr66232-2.c: Likewise.
>> * gcc.target/i386/pr66232-3.c: Likewise.
>> * gcc.target/i386/pr66232-4.c: Likewise.
>
> OK.
>
> maybe you should use match_code some more in x32_sibcall_memory_operand, e.g.
>
> (match_code "constant" "0")
> (match_code "unspec" "00")
>
> But it is up to you, since XINT doesn't fit in this scheme...
>

>>
>> +;; Return true if OP is a memory operand that can be used in x32 calls
>> +;; and sibcalls.  Only the 64-bit GOT slot is allowed.
>> +(define_predicate "x32_sibcall_memory_operand"
>> +  (and (match_operand 0 "memory_operand")
>> +   (match_test "CONSTANT_P (XEXP (op, 0))")
>> +   (match_test "GET_CODE (XEXP (XEXP (op, 0), 0)) == UNSPEC")
>> +   (match_test "XINT (XEXP (XEXP (op, 0), 0), 1) == UNSPEC_GOTPCREL")))
>> +

Since "match_code" doesn't support "constant" (it takes RTX codes, and

#define CONSTANT_P(X)   \
  (GET_RTX_CLASS (GET_CODE (X)) == RTX_CONST_OBJ)

tests a whole class of codes), I will keep it as is.

Thanks.

-- 
H.J.


Re: [gomp4] Vector-single predication

2015-05-21 Thread Julian Brown
On Thu, 21 May 2015 15:21:54 +0200
Jakub Jelinek  wrote:

> On Thu, May 21, 2015 at 02:05:12PM +0100, Julian Brown wrote:
> > OpenACC handles function calls specially (calling them "routines"
> > -- of varying sorts, gang, worker, vector or seq, affecting where
> > they can be invoked from). The plan is that all threads will call
> > such routines -- and then some threads will be "neutered" as
> > appropriate within the routines themselves, as appropriate.
> 
> All functions will behave that way, or just some using some magic
> attribute etc.?  Say will newlib functions behave this way (math
> functions, printf, ...)? 

It's actually unclear at this point if "regular" functions are
supported by OpenACC at all (the spec says nothing about them). They
probably raise "interesting" questions about re-entrancy,
synchronisation, and so on.

> For math functions e.g. it would be nice if
> they could behave both ways (perhaps as separate entrypoints), so
> have the possibility to say how many threads from the warp will
> perform the operation and then work on array arguments and array
> return value (kind like OpenMP or Cilk+ elemental functions, just
> perhaps with different argument/return value passing conventions).

And that's something that's way outside the spec as currently defined,
AFAIK.

Julian


Re: [RFA] Restore combine.c split point for multiply-accumulate instructions

2015-05-21 Thread Segher Boessenkool
On Wed, May 20, 2015 at 11:38:44PM -0600, Jeff Law wrote:
> I've also verified this is one of the changes ultimately necessary to 
> resolve the code generation regressions caused by Venkat's combine.c 
> change on the PA across my 300+ testfiles for a PA cross compiler.

How much does it help, do you know?

> OK for the trunk?

Yes, please commit.  Thanks.  (One tiny comment below).


Segher


> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 490386e..250fa0a 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,5 +1,8 @@
>  2015-05-20  Jeff Law  
>  
> + * combine.c (find_split_point): Handle ASHIFT like MULT to encourage
> + multiply-accumulate/shift-add insn generation.
> +
>   * config/pa/pa.c (pa_print_operand): New 'o' output modifier.
>   (pa_mem_shadd_constant_p): Renamed from pa_shadd_constant_p.
>   (pa_shadd_constant_p): Allow constants for shadd insns rather
> diff --git a/gcc/combine.c b/gcc/combine.c
> index a90849e..ab6de3a 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -5145,7 +5163,9 @@ find_split_point (rtx *loc, rtx_insn *insn, bool set_src)
>/* Split at a multiply-accumulate instruction.  However if this is
>   the SET_SRC, we likely do not have such an instruction and it's
>   worthless to try this split.  */
> -  if (!set_src && GET_CODE (XEXP (x, 0)) == MULT)
> +  if (!set_src
> +   && (GET_CODE (XEXP (x, 0)) == MULT
> +   || GET_CODE (XEXP (x, 0)) == ASHIFT))
>  return loc;

It might be better to also check if it is shifting by a CONST_INT.
I doubt it matters much, but it is closer to the original.
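For reference, the tightening Segher suggests might look like the following hypothetical sketch against the posted hunk (not a committed change; CONST_INT_P is the usual RTL predicate for integer-constant operands):

```c
/* Hypothetical sketch: only treat ASHIFT as a split point when it
   shifts by a constant, which is what shift-add insns such as the
   PA's shadd actually match.  */
if (!set_src
    && (GET_CODE (XEXP (x, 0)) == MULT
        || (GET_CODE (XEXP (x, 0)) == ASHIFT
            && CONST_INT_P (XEXP (XEXP (x, 0), 1)))))
  return loc;
```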


Segher


Re: [gomp4] Vector-single predication

2015-05-21 Thread Julian Brown
On Thu, 21 May 2015 14:38:19 +0100
Julian Brown  wrote:

> On Thu, 21 May 2015 15:21:54 +0200
> Jakub Jelinek  wrote:
> 
> > On Thu, May 21, 2015 at 02:05:12PM +0100, Julian Brown wrote:
> > > OpenACC handles function calls specially (calling them "routines"
> > > -- of varying sorts, gang, worker, vector or seq, affecting where
> > > they can be invoked from). The plan is that all threads will call
> > > such routines -- and then some threads will be "neutered" as
> > > appropriate within the routines themselves.
> > 
> > All functions will behave that way, or just some using some magic
> > attribute etc.?  Say will newlib functions behave this way (math
> > functions, printf, ...)? 
> 
> It's actually unclear at this point if "regular" functions are
> supported by OpenACC at all (the spec says nothing about them). They
> probably raise "interesting" questions about re-entrancy,
> synchronisation, and so on.

...actually, replied too soon: regular math functions, etc. will be
handled the same as routines declared with "seq". They won't contain
partitioned loops, and can be called from anywhere in an offloaded
region.

Julian


[C++ PATCH] Minor fix for warn_args_num

2015-05-21 Thread Marek Polacek
I've just noticed that we print "note: declared here" even for builtins.
E.g.:

void
foo (void)
{
  __builtin_return ();
}

q.cc: In function ‘void foo()’:
q.cc:4:21: error: too few arguments to function ‘void __builtin_return(void*)’
   __builtin_return ();
 ^
: note: declared here

That doesn't seem to be too useful and the C FE doesn't do it.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-05-21  Marek Polacek  

* typeck.c (warn_args_num): Don't print "declared here" for builtins.

diff --git gcc/cp/typeck.c gcc/cp/typeck.c
index ba99c30..8aadeca 100644
--- gcc/cp/typeck.c
+++ gcc/cp/typeck.c
@@ -3598,8 +3598,8 @@ warn_args_num (location_t loc, tree fndecl, bool too_many_p)
  ? G_("too many arguments to function %q#D")
  : G_("too few arguments to function %q#D"),
  fndecl);
-  inform (DECL_SOURCE_LOCATION (fndecl),
- "declared here");
+  if (!DECL_BUILT_IN (fndecl))
+   inform (DECL_SOURCE_LOCATION (fndecl), "declared here");
 }
   else
 {

Marek


Re: [gomp4] Vector-single predication

2015-05-21 Thread Jakub Jelinek
On Thu, May 21, 2015 at 02:38:19PM +0100, Julian Brown wrote:
> > All functions will behave that way, or just some using some magic
> > attribute etc.?  Say will newlib functions behave this way (math
> > functions, printf, ...)? 
> 
> It's actually unclear at this point if "regular" functions are
> supported by OpenACC at all (the spec says nothing about them). They
> probably raise "interesting" questions about re-entrancy,
> synchronisation, and so on.
> 
> > For math functions e.g. it would be nice if
> > they could behave both ways (perhaps as separate entrypoints), so
> > have the possibility to say how many threads from the warp will
> > perform the operation and then work on array arguments and array
> > return value (kind of like OpenMP or Cilk+ elemental functions, just
> > perhaps with different argument/return value passing conventions).
> 
> And that's something that's way outside the spec as currently defined,
> AFAIK.

Not necessarily.  GCC uses the elemental functions not just in OpenMP/Cilk+
simd regions, but also in auto-vectorized code.  So if auto-vectorization
for the NVPTX target were to just use the extra threads, this would be
relevant to OpenACC as well.  Not to mention that OpenMP is also relevant
to NVPTX.

Jakub


[PATCH][1/n] Reduction vectorization improvements

2015-05-21 Thread Richard Biener

I'm currently tearing apart a large patch that adds support for
vectorizing reductions in basic blocks, as well as making loop
vectorization of reduction chains with patterns work.

This is a first piece - allow the reduction patterns to be detected
when reduction detection didn't run, and remove an assert in favor
of a check.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2015-05-21  Richard Biener  

* tree-vect-patterns.c (vect_recog_dot_prod_pattern): Replace
assert with guard, remove check on detected reduction.
(vect_recog_sad_pattern): Likewise.
(vect_recog_widen_sum_pattern): Likewise.

Index: gcc/tree-vect-patterns.c
===
--- gcc/tree-vect-patterns.c(revision 223470)
+++ gcc/tree-vect-patterns.c(working copy)
@@ -318,6 +318,11 @@ vect_recog_dot_prod_pattern (vec
 
   loop = LOOP_VINFO_LOOP (loop_info);
 
+  /* We don't allow changing the order of the computation in the inner-loop
+ when doing outer-loop vectorization.  */
+  if (loop && nested_in_vect_loop_p (loop, last_stmt))
+return NULL;
+
   if (!is_gimple_assign (last_stmt))
 return NULL;
 
@@ -366,8 +371,6 @@ vect_recog_dot_prod_pattern (vec
 {
   gimple def_stmt;
 
-  if (STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_reduction_def)
-return NULL;
   oprnd0 = gimple_assign_rhs1 (last_stmt);
   oprnd1 = gimple_assign_rhs2 (last_stmt);
   if (!types_compatible_p (TREE_TYPE (oprnd0), type)
@@ -469,10 +472,6 @@ vect_recog_dot_prod_pattern (vec
   dump_printf (MSG_NOTE, "\n");
 }
 
-  /* We don't allow changing the order of the computation in the inner-loop
- when doing outer-loop vectorization.  */
-  gcc_assert (!nested_in_vect_loop_p (loop, last_stmt));
-
   return pattern_stmt;
 }
 
@@ -533,6 +532,11 @@ vect_recog_sad_pattern (vec *stm
 
   loop = LOOP_VINFO_LOOP (loop_info);
 
+  /* We don't allow changing the order of the computation in the inner-loop
+ when doing outer-loop vectorization.  */
+  if (loop && nested_in_vect_loop_p (loop, last_stmt))
+return NULL;
+
   if (!is_gimple_assign (last_stmt))
 return NULL;
 
@@ -586,8 +590,6 @@ vect_recog_sad_pattern (vec *stm
 {
   gimple def_stmt;
 
-  if (STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_reduction_def)
-return NULL;
   plus_oprnd0 = gimple_assign_rhs1 (last_stmt);
   plus_oprnd1 = gimple_assign_rhs2 (last_stmt);
   if (!types_compatible_p (TREE_TYPE (plus_oprnd0), sum_type)
@@ -703,10 +705,6 @@ vect_recog_sad_pattern (vec *stm
   dump_printf (MSG_NOTE, "\n");
 }
 
-  /* We don't allow changing the order of the computation in the inner-loop
- when doing outer-loop vectorization.  */
-  gcc_assert (!nested_in_vect_loop_p (loop, last_stmt));
-
   return pattern_stmt;
 }
 
@@ -1201,6 +1199,11 @@ vect_recog_widen_sum_pattern (vec

Re: [RFA] Restore combine.c split point for multiply-accumulate instructions

2015-05-21 Thread Jeff Law

On 05/21/2015 07:40 AM, Segher Boessenkool wrote:

On Wed, May 20, 2015 at 11:38:44PM -0600, Jeff Law wrote:

I've also verified this is one of the changes ultimately necessary to
resolve the code generation regressions caused by Venkat's combine.c
change on the PA across my 300+ testfiles for a PA cross compiler.


How much does it help, do you know?
It resolves the remaining missed opportunities to create shadd insns 
across those 300+ files.


There's one more combine.c patch on the way to canonicalize in one more 
place -- which fixes a missed CSE due to a MULT in one context and 
ASHIFT in another.


Then it's strictly cleanup on the PA port to kill the old MULT patterns.



It might be better to also check if it is shifting by a CONST_INT.
I doubt it matters much, but it is closer to the original.

Sure, that's not a problem at all.  Will do after the usual testing.

jeff


[PATCH][2/n] Reduction vectorization improvements

2015-05-21 Thread Richard Biener

This is part #2; it fixes wrong code caused by a bogus reduction operand
being used, and factors out common code (which I'd otherwise need to
duplicate once more...).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2015-05-21  Richard Biener  

* tree-vect-loop.c (get_reduction_op): New function.
(vect_model_reduction_cost): Use it, add reduc_index parameter.
Make ready for BB reductions.
(vect_create_epilog_for_reduction): Use get_reduction_op.
(vectorizable_reduction): Init reduc_index to a valid value.
Adjust vect_model_reduction_cost call.
* tree-vect-slp.c (vect_get_constant_vectors): Use the proper
operand for reduction defaults.  Add SAD_EXPR support.
Assert we have a neutral op for SLP reductions.
* tree-vect-stmts.c (vect_mark_stmts_to_be_vectorized): When
walking pattern stmt ops only recurse to SSA names.

Index: gcc/tree-vect-loop.c
===
*** gcc/tree-vect-loop.c(revision 223482)
--- gcc/tree-vect-loop.c(working copy)
*** have_whole_vector_shift (enum machine_mo
*** 3166,3171 
--- 3166,3194 
return true;
  }
  
+ /* Return the reduction operand (with index REDUC_INDEX) of STMT.  */
+ 
+ static tree
+ get_reduction_op (gimple stmt, int reduc_index)
+ {
+   switch (get_gimple_rhs_class (gimple_assign_rhs_code (stmt)))
+ {
+ case GIMPLE_SINGLE_RHS:
+   gcc_assert (TREE_OPERAND_LENGTH (gimple_assign_rhs1 (stmt))
+ == ternary_op);
+   return TREE_OPERAND (gimple_assign_rhs1 (stmt), reduc_index);
+ case GIMPLE_UNARY_RHS:
+   return gimple_assign_rhs1 (stmt);
+ case GIMPLE_BINARY_RHS:
+   return (reduc_index
+ ? gimple_assign_rhs2 (stmt) : gimple_assign_rhs1 (stmt));
+ case GIMPLE_TERNARY_RHS:
+   return gimple_op (stmt, reduc_index + 1);
+ default:
+   gcc_unreachable ();
+ }
+ }
+ 
  /* TODO: Close dependency between vect_model_*_cost and vectorizable_*
 functions. Design better to avoid maintenance issues.  */
  
*** have_whole_vector_shift (enum machine_mo
*** 3177,3183 
  
  static bool
  vect_model_reduction_cost (stmt_vec_info stmt_info, enum tree_code reduc_code,
!  int ncopies)
  {
int prologue_cost = 0, epilogue_cost = 0;
enum tree_code code;
--- 3200,3206 
  
  static bool
  vect_model_reduction_cost (stmt_vec_info stmt_info, enum tree_code reduc_code,
!  int ncopies, int reduc_index)
  {
int prologue_cost = 0, epilogue_cost = 0;
enum tree_code code;
*** vect_model_reduction_cost (stmt_vec_info
*** 3187,3218 
tree reduction_op;
machine_mode mode;
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
!   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
!   void *target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
  
/* Cost of reduction op inside loop.  */
unsigned inside_cost = add_stmt_cost (target_cost_data, ncopies, 
vector_stmt,
stmt_info, 0, vect_body);
stmt = STMT_VINFO_STMT (stmt_info);
  
!   switch (get_gimple_rhs_class (gimple_assign_rhs_code (stmt)))
! {
! case GIMPLE_SINGLE_RHS:
!   gcc_assert (TREE_OPERAND_LENGTH (gimple_assign_rhs1 (stmt)) == 
ternary_op);
!   reduction_op = TREE_OPERAND (gimple_assign_rhs1 (stmt), 2);
!   break;
! case GIMPLE_UNARY_RHS:
!   reduction_op = gimple_assign_rhs1 (stmt);
!   break;
! case GIMPLE_BINARY_RHS:
!   reduction_op = gimple_assign_rhs2 (stmt);
!   break;
! case GIMPLE_TERNARY_RHS:
!   reduction_op = gimple_assign_rhs3 (stmt);
!   break;
! default:
!   gcc_unreachable ();
! }
  
vectype = get_vectype_for_scalar_type (TREE_TYPE (reduction_op));
if (!vectype)
--- 3210,3232 
tree reduction_op;
machine_mode mode;
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
!   struct loop *loop = NULL;
!   void *target_cost_data;
! 
!   if (loop_vinfo)
! {
!   loop = LOOP_VINFO_LOOP (loop_vinfo);
!   target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
! }
!   else
! target_cost_data = BB_VINFO_TARGET_COST_DATA (STMT_VINFO_BB_VINFO 
(stmt_info));
  
/* Cost of reduction op inside loop.  */
unsigned inside_cost = add_stmt_cost (target_cost_data, ncopies, 
vector_stmt,
stmt_info, 0, vect_body);
stmt = STMT_VINFO_STMT (stmt_info);
  
!   reduction_op = get_reduction_op (stmt, reduc_index);
  
vectype = get_vectype_for_scalar_type (TREE_TYPE (reduction_op));
if (!vectype)
*** vect_model_reduction_cost (stmt_vec_info
*** 3245,3251 
   We have a reduction operator that will reduce the vector in one 
statement.
   Also requires scalar extract.  */
  
!   if (!nested_in_vect_loop_p (loop, orig_st

Re: Fix alignment propagation

2015-05-21 Thread Martin Jambor
Hi,

I have not managed to ping this, get it approved and commit it in time
for gcc 5 but it is a useful cleanup that clarifies a number of things
and something I'd like to base further cleanups on.  It still applies
cleanly and I have re-tested and re-bootstrapped the patch on
x86_64-linux without any issues.

So, OK for trunk now?

Thanks,

Martin


On Wed, Feb 25, 2015 at 08:38:26PM +0100, Martin Jambor wrote:
> Hi,
> 
> On Fri, Feb 20, 2015 at 07:22:02PM +0100, Jan Hubicka wrote:
> > > > +/* Decrease alignment info DEST to be at most CUR.  */
> > > > +
> > > > +static bool
> > > > +decrease_alignment (ipa_alignment *dest, ipa_alignment cur)
> > > > +{
> > > > +  bool changed = false;
> > > > +
> > > > +  if (!cur.known)
> > > > +return false;
> > > 
> > > I really think this should be return set_alignment_to_bottom (dest);
> > > 
> > > If some known alignment has been already propagated to DEST along a
> > > different edge and now along the current edge an unknown alignment is
> > > coming in, then the result value of the lattice must be BOTTOM and not
> > > the previous alignment this code leaves in place.
> > 
> > Well, because this is an optimistic propagation now, !cur.known means TOP,
> > that is, "as good an alignment as you can think of".
> > You have one known alignment in DEST and TOP in the other; the result is TOP.
> 
> It seems to be clear now that the fact that I used the same structure
> for the alignment information in the jump function and for the
> alignment lattice (and so for example known meant pessimistic
> assumptions in the former but optimistic in the latter) was really a
> confusing idea.  So, at the risk of proposing a slightly larger patch
> at this late stage, let me backtrack and come up with a real lattice,
> with bottom and top which are called that way and with a real meet
> operation.  Otherwise, the functionality the same as Honza's patch,
> with the increase_alignment function ditched, because it would never
> be used anyway.  We can revisit that in the next stage1, just as we
> can perhaps make the storage more compact.  At this point I wanted to
> minimize risk.
> 
> The decrease_alignment is now called meet_with and I hope it is now
> clear why I requested the changes.
> 
> The patch is currently undergoing bootstrap and testing, Honza
> promised to test on Firefox, it would be great if people burnt by the
> second bug in PR 65028 could run their tests too.
> 
> It's likely there will be comments I'll need to incorporate, but I
> would like to commit this soon to avoid the confusion the multiple
> uses of ipa_alignment structure apparently caused.
> 
> Thanks,
> 
> Martin
> 
> 
> 2015-02-25  Martin Jambor  
>   Jan Hubicka  
> 
>   * ipa-cp.c (ipcp_alignment_lattice): New type.
>   (ipcp_param_lattices): Use the above to represent alignment.
>   (ipcp_alignment_lattice::print): New function.
>   (print_all_lattices): Use it to print alignment information.
>   (ipcp_alignment_lattice::top_p): New function.
>   (ipcp_alignment_lattice::bottom_p): Likewise.
>   (ipcp_alignment_lattice::set_to_bottom): Likewise.
>   (ipcp_alignment_lattice::meet_with_1): Likewise.
>   (ipcp_alignment_lattice::meet_with): Two new overloaded functions.
>   (set_all_contains_variable): Use set_to_bottom of alignment lattice.
>   (initialize_node_lattices): Likewise.
>   (propagate_alignment_accross_jump_function): Work with the new class
>   for alignment lattices.
>   (propagate_constants_accross_call): Pass only the alignment lattice to
>   propagate_alignment_accross_jump_function.
>   (ipcp_store_alignment_results): Work with the new class for alignment
>   lattices.
> 
> testsuite/
>   * gcc.dg/ipa/propalign-4.c: New test.
>   * gcc.dg/ipa/propalign-5.c: Likewise.
> 
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index bfe4d97..5ebe04a 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -257,6 +257,36 @@ public:
>struct ipcp_agg_lattice *next;
>  };
>  
> +/* Lattice of pointer alignment.  Unlike the previous types of lattices, this
> +   one is only capable of holding one value.  */
> +
> +class ipcp_alignment_lattice
> +{
> +public:
> +  /* If bottom and top are both false, these two fields hold values as given 
> by
> + ptr_info_def and get_pointer_alignment_1.  */
> +  unsigned align;
> +  unsigned misalign;
> +
> +  inline bool bottom_p () const;
> +  inline bool top_p () const;
> +  inline bool set_to_bottom ();
> +  bool meet_with (unsigned new_align, unsigned new_misalign);
> +  bool meet_with (const ipcp_alignment_lattice &other, HOST_WIDE_INT offset);
> +  void print (FILE * f);
> +private:
> +  /* If set, this lattice is bottom and all other fields should be
> + disregarded.  */
> +  bool bottom;
> +  /* If bottom and not_top are false, the lattice is TOP.  If not_top is 
> true,
> + the known alignment is stored in the fields align and misalign.  The 
> field
> + is negated so 

[PATCH, fixincludes] AIX headers and extern "C"

2015-05-21 Thread David Edelsohn
The AIX port of GCC is one of the few ports that does not define
NO_IMPLICIT_EXTERN_C.  A user reported a problem that we tracked to an
AIX header that explicitly used C++ features (bracketed by #ifdef
__cplusplus).

AIX headers have included some C++ features, mostly protected by

#if defined (__cplusplus) && defined (__IBMCPP__)

but a few only by __cplusplus.  Because of the way the particular
header is structured, adding a test for __IBMCPP__ causes other
problems.

At the encouragement of Jonathan (implicit extern "C" is an
abomination) Wakely, I tried to bootstrap GCC with
NO_IMPLICIT_EXTERN_C defined on AIX.  That failed horribly.  Some AIX
headers use C++ features while others are not C++ safe.  Sigh.

This patch adds a new fix to wrap the failing AIX headers necessary
for GCC bootstrap in extern "C". I can bootstrap with
NO_IMPLICIT_EXTERN_C defined.

I don't see a clear way to discover which of the headers are C++ safe
and which are not if there is no clear intention from IBM AIX Brand to
make the headers C++ safe.  I don't see a robust and low-risk way to
enable NO_IMPLICIT_EXTERN_C on AIX.

So, this patch also adds an extern "C++" block around the C++ code in
sys/socket.h that caused the initial failure.  I could not find other
headers that used C++ features without __IBMCPP__.

Bootstrapped on powerpc-ibm-aix7.1.0.0.

Okay?

Thanks, David

* inclhack.def (aix_externc): New fix.
(aix_externcpp[12]): New fix.
* fixincl.x: Regenerate.
* test/base/ctype.h [AIX_EXTERNC_CHECK]: New test.
* test/base/sys/socket.h [AIX_EXTERNCPP[12]_CHECK]: New test.
* test/base/fcntl.h: New file.




Re: [C++ PATCH] Minor fix for warn_args_num

2015-05-21 Thread Jason Merrill

On 05/21/2015 09:44 AM, Marek Polacek wrote:

+  if (!DECL_BUILT_IN (fndecl))


I think you want DECL_IS_BUILTIN.  OK with that change.

Jason



Re: PING^3: [PATCH]: New configure options that make the compiler use -fPIE and -pie as default option

2015-05-21 Thread Rainer Orth
"H.J. Lu"  writes:

> Here is the complete patch.  Tested on Linux/x86-64.  It is also
> available on hjl/pie/master branch in git mirror.

As always, please keep generated files like configure and config.in out
of the submission: it simplifies review.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index ab9b637..e429274 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -253,6 +253,12 @@ LINKER = $(CC)
 LINKER_FLAGS = $(CFLAGS)
 endif
 
+# We don't want to compile the compiler with -fPIE, it make PCH fail.
^s
+COMPILER += @NO_PIE_CFLAGS@

@@ -750,6 +756,8 @@ CC_FOR_BUILD = @CC_FOR_BUILD@
 CXX_FOR_BUILD = @CXX_FOR_BUILD@
 BUILD_CFLAGS= @BUILD_CFLAGS@ -DGENERATOR_FILE
 BUILD_CXXFLAGS = @BUILD_CXXFLAGS@ -DGENERATOR_FILE
+BUILD_CFLAGS += @NO_PIE_CFLAGS@
+BUILD_CXXFLAGS += @NO_PIE_CFLAGS@
 
Here and in several other places, you use += instead of just adding
@NO_PIE_CFLAGS@ to the existing BUILD_CFLAGS variable.  Please let's
keep to the existing idiom instead of randomly introducing another.

@@ -761,6 +769,7 @@ BUILD_LINKERFLAGS = $(BUILD_CXXFLAGS)
 
 # Native linker and preprocessor flags.  For x-fragment overrides.
 BUILD_LDFLAGS=@BUILD_LDFLAGS@
+BUILD_LDFLAGS += @NO_PIE_FLAG@

Likewise.

 BUILD_CPPFLAGS= -I. -I$(@D) -I$(srcdir) -I$(srcdir)/$(@D) \
-I$(srcdir)/../include @INCINTL@ $(CPPINC) $(CPPFLAGS)
 
@@ -1864,6 +1873,12 @@ libgcc.mvars: config.status Makefile specs xgcc$(exeext)
echo GCC_CFLAGS = '$(GCC_CFLAGS)' >> tmp-libgcc.mvars
echo INHIBIT_LIBC_CFLAGS = '$(INHIBIT_LIBC_CFLAGS)' >> tmp-libgcc.mvars
echo TARGET_SYSTEM_ROOT = '$(TARGET_SYSTEM_ROOT)' >> tmp-libgcc.mvars
+   if test @enable_default_pie@ = yes; then \
+ NO_PIE_CFLAGS="-fno-PIE"; \

Why literal -fno-PIE instead of @NO_PIE_CFLAGS@?

+   else \
+ NO_PIE_CFLAGS=; \
+   fi; \
+   echo NO_PIE_CFLAGS = "$$NO_PIE_CFLAGS" >> tmp-libgcc.mvars
 
mv tmp-libgcc.mvars libgcc.mvars
 
Besides, we're trying to get away from libgcc.mvars, moving the
detection to libgcc proper.  It would be nice to do so here.

diff --git a/gcc/ada/gcc-interface/Makefile.in b/gcc/ada/gcc-interface/Makefile.in
index ecc443e..90aedb5 100644
--- a/gcc/ada/gcc-interface/Makefile.in
+++ b/gcc/ada/gcc-interface/Makefile.in
@@ -267,6 +267,9 @@ TOOLS_LIBS = ../link.o ../targext.o ../../ggc-none.o ../../libcommon-target.a \
   ../../libcommon.a ../../../libcpp/libcpp.a $(LIBGNAT) $(LIBINTL) $(LIBICONV) \
   ../$(LIBBACKTRACE) ../$(LIBIBERTY) $(SYSLIBS) $(TGT_LIB)
 
+# Add -no-pie to TOOLS_LIBS since some of them are compiled with -fno-PIE.
+TOOLS_LIBS += @NO_PIE_FLAG@

Again, avoid +=

diff --git a/gcc/config/sol2.h b/gcc/config/sol2.h
index 4dceb16..adf6f3b 100644
--- a/gcc/config/sol2.h
+++ b/gcc/config/sol2.h
@@ -127,7 +127,7 @@ along with GCC; see the file COPYING3.  If not see
 #define ASM_SPEC_BASE \
 "%{v:-V} %{Qy:} %{!Qn:-Qy} %{Ym,*} -s %(asm_cpu)"
 
-#define ASM_PIC_SPEC " %{fpic|fpie|fPIC|fPIE:-K PIC}"
+#define ASM_PIC_SPEC " %{" FPIE_OR_FPIC_SPEC ":-K PIC}"
 
 #undef ASM_CPU_DEFAULT_SPEC
 #define ASM_CPU_DEFAULT_SPEC \

This is ok once the rest goes in.  I haven't reviewed the other
target-specific parts, though.

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 04332c1..437a534 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1585,6 +1585,9 @@ not be built.
 Specify that the run-time libraries for stack smashing protection
 should not be built.
 
+@item --enable-default-pie
+Turn on @option{-fPIE} and @option{-pie} by default.
+
 @item --disable-libquadmath
 Specify that the GCC quad-precision math library should not be built.
 On some systems, the library is required to be linkable when building

This option was added in a seemingly completely random place, between
options to enable/disable runtime libs.  Please find a better place.

diff --git a/gcc/opts.c b/gcc/opts.c
index 9deb8df..4b6d978 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -739,8 +739,22 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
   opts->x_flag_section_anchors = 0;
 }
 
+#ifndef ENABLE_DEFAULT_PIE
+#undef DEFAULT_FLAG_PIE
+#define DEFAULT_FLAG_PIE 0
+#endif
+

Couldn't this be done in defaults.h, too?  It seems confusing to provide
DEFAULT_FLAG_PIE defaults both here and in defaults.h.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH, fixincludes] AIX headers and extern "C"

2015-05-21 Thread Bruce Korb
OK.  You might consider updating autogen.  It seems 5.18 doesn't
handle the version test quite right.  Any 5.18.n should do fine.  I
guess I didn't test the version test with older versions. :)

On Thu, May 21, 2015 at 6:58 AM, David Edelsohn  wrote:
> The AIX port of GCC is one of the few ports that does not define
> NO_IMPLICIT_EXTERN_C.  A user reported a problem that we tracked to an
> AIX header that explicitly used C++ features (bracketed by #ifdef
> __cplusplus).
>
> AIX headers have included some C++ features, mostly protected by
>
> #if defined (__cplusplus) && defined (__IBMCPP__)
>
> but a few only by __cplusplus.  Because of the way the particular
> header is structured, adding a test for __IBMCPP__ causes other
> problems.
>
> At the encouragement of Jonathan (implicit extern "C" is an
> abomination) Wakely, I tried to bootstrap GCC with
> NO_IMPLICIT_EXTERN_C defined on AIX.  That failed horribly.  Some AIX
> headers use C++ features while others are not C++ safe.  Sigh.
>
> This patch adds a new fix to wrap the failing AIX headers necessary
> for GCC bootstrap in extern "C". I can bootstrap with
> NO_IMPLICIT_EXTERN_C defined.
>
> I don't see a clear way to discover which of the headers are C++ safe
> and which are not if there is no clear intention from IBM AIX Brand to
> make the headers C++ safe.  I don't see a robust and low-risk way to
> enable NO_IMPLICIT_EXTERN_C on AIX.
>
> So, this patch also adds an extern "C++" block around the C++ code in
> sys/socket.h that caused the initial failure.  I could not find other
> headers that used C++ features without __IBMCPP__.
>
> Bootstrapped on powerpc-ibm-aix7.1.0.0.
>
> Okay?
>
> Thanks, David
>
> * inclhack.def (aix_externc): New fix.
> (aix_externcpp[12]): New fix.
> * fixincl.x: Regenerate.
> * test/base/ctype.h [AIX_EXTERNC_CHECK]: New test.
> * test/base/sys/socket.h [AIX_EXTERNCPP[12]_CHECK]: New test.
> * test/base/fcntl.h: New file.


[PATCH][AArch64] Add __extension__ and __always_inline__ to crypto intrinsics

2015-05-21 Thread Kyrill Tkachov

Hi all,

The crypto intrinsics are missing an __extension__ and an __always_inline__ 
attribute that all the other
intrinsics have. I don't see any reason for them to be different and the 
always_inline attribute will be needed
if we decide to wrap the intrinsics inside a target SIMD pragma.

Tested aarch64-none-elf.

Ok for trunk?

Thanks,
Kyrill

2015-05-21  Kyrylo Tkachov  

* config/aarch64/arm_neon.h (vaeseq_u8): Add __extension__ and
__always_inline__ attribute.
(vaesdq_u8): Likewise.
(vaesmcq_u8): Likewise.
(vaesimcq_u8): Likewise.
(vsha1cq_u32): Likewise.
(vsha1mq_u32): Likewise.
(vsha1pq_u32): Likewise.
(vsha1h_u32): Likewise.
(vsha1su0q_u32): Likewise.
(vsha1su1q_u32): Likewise.
(vsha256hq_u32): Likewise.
(vsha256h2q_u32): Likewise.
(vsha256su0q_u32): Likewise.
(vsha256su1q_u32): Likewise.
(vmull_p64): Likewise.
(vmull_high_p64): Likewise.
commit 92dc194bb26ae3a9c05b86d78e749a31d320ceae
Author: Kyrylo Tkachov 
Date:   Thu May 14 16:26:04 2015 +0100

[AArch64] Add __always_inline__ attribute to crypto intrinsics

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 9896e8c..114994e 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -11400,25 +11400,25 @@ vbslq_u64 (uint64x2_t __a, uint64x2_t __b, uint64x2_t __c)
 
 /* vaes  */
 
-static __inline uint8x16_t
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vaeseq_u8 (uint8x16_t data, uint8x16_t key)
 {
   return __builtin_aarch64_crypto_aesev16qi_uuu (data, key);
 }
 
-static __inline uint8x16_t
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vaesdq_u8 (uint8x16_t data, uint8x16_t key)
 {
   return __builtin_aarch64_crypto_aesdv16qi_uuu (data, key);
 }
 
-static __inline uint8x16_t
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vaesmcq_u8 (uint8x16_t data)
 {
   return __builtin_aarch64_crypto_aesmcv16qi_uu (data);
 }
 
-static __inline uint8x16_t
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vaesimcq_u8 (uint8x16_t data)
 {
   return __builtin_aarch64_crypto_aesimcv16qi_uu (data);
@@ -21053,72 +21053,74 @@ vrsrad_n_u64 (uint64_t __a, uint64_t __b, const int __c)
 
 /* vsha1  */
 
-static __inline uint32x4_t
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vsha1cq_u32 (uint32x4_t hash_abcd, uint32_t hash_e, uint32x4_t wk)
 {
   return __builtin_aarch64_crypto_sha1cv4si_ (hash_abcd, hash_e, wk);
 }
-static __inline uint32x4_t
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vsha1mq_u32 (uint32x4_t hash_abcd, uint32_t hash_e, uint32x4_t wk)
 {
   return __builtin_aarch64_crypto_sha1mv4si_ (hash_abcd, hash_e, wk);
 }
-static __inline uint32x4_t
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vsha1pq_u32 (uint32x4_t hash_abcd, uint32_t hash_e, uint32x4_t wk)
 {
   return __builtin_aarch64_crypto_sha1pv4si_ (hash_abcd, hash_e, wk);
 }
 
-static __inline uint32_t
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
 vsha1h_u32 (uint32_t hash_e)
 {
   return __builtin_aarch64_crypto_sha1hsi_uu (hash_e);
 }
 
-static __inline uint32x4_t
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vsha1su0q_u32 (uint32x4_t w0_3, uint32x4_t w4_7, uint32x4_t w8_11)
 {
   return __builtin_aarch64_crypto_sha1su0v4si_ (w0_3, w4_7, w8_11);
 }
 
-static __inline uint32x4_t
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vsha1su1q_u32 (uint32x4_t tw0_3, uint32x4_t w12_15)
 {
   return __builtin_aarch64_crypto_sha1su1v4si_uuu (tw0_3, w12_15);
 }
 
-static __inline uint32x4_t
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vsha256hq_u32 (uint32x4_t hash_abcd, uint32x4_t hash_efgh, uint32x4_t wk)
 {
   return __builtin_aarch64_crypto_sha256hv4si_ (hash_abcd, hash_efgh, wk);
 }
 
-static __inline uint32x4_t
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vsha256h2q_u32 (uint32x4_t hash_efgh, uint32x4_t hash_abcd, uint32x4_t wk)
 {
   return __builtin_aarch64_crypto_sha256h2v4si_ (hash_efgh, hash_abcd, wk);
 }
 
-static __inline uint32x4_t
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vsha256su0q_u32 (uint32x4_t w0_3, uint32x4_t w4_7)
 {
   return __builtin_aarch64_crypto_sha256su0v4si_uuu (w0_3, w4_7);
 }
 
-static __inline uint32x4_t
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vsha256su1q_u32 (uint32x4_t tw0_3, uint32x4_t w8_11, uint32x4_t w12_15)
 {
   return __builtin_aarch64_crypto_sha256su1v4si_ (tw0_3, w8_11, w12_15);
 }
 
-static __inline poly128_t
+__extension__ static __inline poly128_t __attribute__ ((__always_inline__))
 vmull_p64 (poly64_t a, poly64_t b)
 {
   return
 __builtin_aarc

Re: [C++ PATCH] Minor fix for warn_args_num

2015-05-21 Thread Marek Polacek
On Thu, May 21, 2015 at 10:11:41AM -0400, Jason Merrill wrote:
> On 05/21/2015 09:44 AM, Marek Polacek wrote:
> >+  if (!DECL_BUILT_IN (fndecl))
> 
> I think you want DECL_IS_BUILTIN.  OK with that change.

Right.  With DECL_IS_BUILTIN we print

q.c:1:5: note: declared here
 int printf (const char *, ...);
 ^
even for

int printf (const char *, ...);
void
foo (void)
{
  printf ();
}

C FE uses DECL_BUILT_IN so it doesn't print the note, but I think
we want it in this case, so I'll fix it up there.  Thanks,

Marek


Re: RFA: PATCH to use -std=c++98 in stage 1 of bootstrap

2015-05-21 Thread Jason Merrill

On 05/20/2015 06:11 PM, Alexandre Oliva wrote:

The only serious problem with the patch is that it changes Makefile.in,
but not the corresponding part of Makefile.tpl from which it is
generated.  Ok with that change.

Now, if you'd also update the comments just before it, that still
suggest we build only C in stage1, that would be appreciated.


Sure, here's what I'm applying.


commit 09fbb5f79eae1e95ce64b0f746e9085567e86fbf
Author: Jason Merrill 
Date:   Mon May 18 23:58:41 2015 -0400

	* configure.ac: Add -std=c++98 to stage1_cxxflags.
	* Makefile.tpl (STAGE1_CXXFLAGS): And substitute it.
	* Makefile.in, configure: Regenerate.

diff --git a/Makefile.in b/Makefile.in
index c221a0b..7ae2a40 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -482,13 +482,12 @@ STAGEfeedback_TFLAGS = $(STAGE_TFLAGS)
 STAGEfeedback_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS)
 
 
-# Only build the C compiler for stage1, because that is the only one that
-# we can guarantee will build with the native compiler, and also it is the
-# only thing useful for building stage2. STAGE1_CFLAGS (via CFLAGS),
-# MAKEINFO and MAKEINFOFLAGS are explicitly passed here to make them
-# overrideable (for a bootstrap build stage1 also builds gcc.info).
+# By default, C and C++ are the only stage1 languages, because they are the
+# only ones we require to build with the bootstrap compiler, and also the
+# only ones useful for building stage2.
 
 STAGE1_CFLAGS = @stage1_cflags@
+STAGE1_CXXFLAGS = @stage1_cxxflags@
 STAGE1_CHECKING = @stage1_checking@
 STAGE1_LANGUAGES = @stage1_languages@
 # * We force-disable intermodule optimizations, even if
@@ -677,7 +676,9 @@ CXX_FOR_TARGET_FLAG_TO_PASS = \
 	$(shell if echo "$(CXX_FOR_TARGET)" | grep " -funconfigured-" > /dev/null; then :; else echo '"CXX_FOR_TARGET=$(CXX_FOR_TARGET)"'; fi)
 @endif target-libstdc++-v3
 
-# Flags to pass down to all sub-makes.
+# Flags to pass down to all sub-makes. STAGE*FLAGS,
+# MAKEINFO and MAKEINFOFLAGS are explicitly passed here to make them
+# overrideable (for a bootstrap build stage1 also builds gcc.info).
 BASE_FLAGS_TO_PASS = \
 	"DESTDIR=$(DESTDIR)" \
 	"RPATH_ENVVAR=$(RPATH_ENVVAR)" \
diff --git a/Makefile.tpl b/Makefile.tpl
index ec53b59..914196f 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -435,13 +435,12 @@ STAGE[+id+]_TFLAGS = $(STAGE_TFLAGS)
 STAGE[+id+]_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS)
 [+ ENDFOR bootstrap-stage +]
 
-# Only build the C compiler for stage1, because that is the only one that
-# we can guarantee will build with the native compiler, and also it is the
-# only thing useful for building stage2. STAGE1_CFLAGS (via CFLAGS),
-# MAKEINFO and MAKEINFOFLAGS are explicitly passed here to make them
-# overrideable (for a bootstrap build stage1 also builds gcc.info).
+# By default, C and C++ are the only stage1 languages, because they are the
+# only ones we require to build with the bootstrap compiler, and also the
+# only ones useful for building stage2.
 
 STAGE1_CFLAGS = @stage1_cflags@
+STAGE1_CXXFLAGS = @stage1_cxxflags@
 STAGE1_CHECKING = @stage1_checking@
 STAGE1_LANGUAGES = @stage1_languages@
 # * We force-disable intermodule optimizations, even if
@@ -579,7 +578,9 @@ CXX_FOR_TARGET_FLAG_TO_PASS = \
 	$(shell if echo "$(CXX_FOR_TARGET)" | grep " -funconfigured-" > /dev/null; then :; else echo '"CXX_FOR_TARGET=$(CXX_FOR_TARGET)"'; fi)
 @endif target-libstdc++-v3
 
-# Flags to pass down to all sub-makes.
+# Flags to pass down to all sub-makes. STAGE*FLAGS,
+# MAKEINFO and MAKEINFOFLAGS are explicitly passed here to make them
+# overrideable (for a bootstrap build stage1 also builds gcc.info).
 BASE_FLAGS_TO_PASS =[+ FOR flags_to_pass +][+ IF optional +] \
 	"`echo '[+flag+]=$([+flag+])' | sed -e s'/[^=][^=]*=$$/XFOO=/'`"[+ ELSE optional +] \
 	"[+flag+]=$([+flag+])"[+ ENDIF optional+][+ ENDFOR flags_to_pass +][+ FOR bootstrap-stage +] \
diff --git a/configure b/configure
index d804329..07aba3e 100755
--- a/configure
+++ b/configure
@@ -559,6 +559,7 @@ compare_exclusions
 host_shared
 stage2_werror_flag
 stage1_checking
+stage1_cxxflags
 stage1_cflags
 MAINT
 MAINTAINER_MODE_FALSE
@@ -14755,6 +14756,13 @@ case $build in
   *) stage1_cflags="-g -J" ;;
 esac ;;
 esac
+stage1_cxxflags='$(STAGE1_CFLAGS)'
+if test "$GCC" = yes; then
+  # Build stage 1 in C++98 mode to ensure that a C++98 compiler can still
+  # start the bootstrap.
+  stage1_cxxflags="$stage1_cxxflags -std=c++98"
+fi
+
 
 
 
diff --git a/configure.ac b/configure.ac
index 4da04b7..5808eda 100644
--- a/configure.ac
+++ b/configure.ac
@@ -3476,8 +3476,15 @@ case $build in
   *) stage1_cflags="-g -J" ;;
 esac ;;
 esac
+stage1_cxxflags='$(STAGE1_CFLAGS)'
+if test "$GCC" = yes; then
+  # Build stage 1 in C++98 mode to ensure that a C++98 compiler can still
+  # start the bootstrap.
+  stage1_cxxflags="$stage1_cxxflags -std=c++98"
+fi
 
 AC_SUBST(stage1_cflags)
+AC_SUBST(stage1_cxxflags)
 
 # Enable --enable-checking in stage1 of the compiler.
 AC_ARG_ENABLE

v3 PATCH to avoid -Wsized-deallocation warnings with C++14 compiler

2015-05-21 Thread Jason Merrill
When GCC defaults to C++14, it gives -Wsized-deallocation warnings for 
the non-sized operator deletes.  We dealt with this for the sized ones 
by passing -Wno-sized-deallocation on the command line, but using 
#pragma GCC diagnostic seems cleaner to me.


Applying to trunk.
commit 35cc1a3b6855a3e2e798d389674c98b6d5c24ffa
Author: Jason Merrill 
Date:   Wed May 20 14:21:48 2015 -0400

	* libsupc++/del_opv.cc: Suppress -Wsized-deallocation.
	* libsupc++/del_op.cc: Likewise.

diff --git a/libstdc++-v3/libsupc++/del_op.cc b/libstdc++-v3/libsupc++/del_op.cc
index 06eb2a0..8e7aa2f 100644
--- a/libstdc++-v3/libsupc++/del_op.cc
+++ b/libstdc++-v3/libsupc++/del_op.cc
@@ -40,6 +40,9 @@ _GLIBCXX_END_NAMESPACE_VERSION
 
 #include "new"
 
+// The sized deletes are defined in other files.
+#pragma GCC diagnostic ignored "-Wsized-deallocation"
+
 _GLIBCXX_WEAK_DEFINITION void
 operator delete(void* ptr) _GLIBCXX_USE_NOEXCEPT
 {
diff --git a/libstdc++-v3/libsupc++/del_opv.cc b/libstdc++-v3/libsupc++/del_opv.cc
index 6fc1710..0a050bb 100644
--- a/libstdc++-v3/libsupc++/del_opv.cc
+++ b/libstdc++-v3/libsupc++/del_opv.cc
@@ -26,6 +26,9 @@
 #include 
 #include "new"
 
+// The sized deletes are defined in other files.
+#pragma GCC diagnostic ignored "-Wsized-deallocation"
+
 _GLIBCXX_WEAK_DEFINITION void
 operator delete[] (void *ptr) _GLIBCXX_USE_NOEXCEPT
 {


Re: [PATCH] PR target/66224 _GLIBC_READ_MEM_BARRIER

2015-05-21 Thread Steven Munroe
On Wed, 2015-05-20 at 14:40 -0400, David Edelsohn wrote:
> The current definition of _GLIBC_READ_MEM_BARRIER in libstdc++ is too
> weak for an ACQUIRE FENCE, which is what it is intended to be. The
> original code emitted an "isync" instead of "lwsync".
> 
> All of the guard acquire and set code needs to be cleaned up to use
> GCC atomic intrinsics, but this is necessary for correctness.
> 
> Steve, any comment about the Linux part?
> 
This is correct for the PowerISA V2 (POWER4 and later) processors.

I assume the #ifdef __NO_LWSYNC guard is only set for older (ISA V1)
processors.

Thanks




[RFA] Fix combine to canonicalize (mult X pow2)) more often

2015-05-21 Thread Jeff Law


When combine needs to split a complex insn, it will canonicalize a 
simple (mult X (const_int Y)) where Y is a power of 2 into the expected 
(ashift X (const_int Y')) if the (mult ...) is selected as a split point.


However, if the split point is (plus (mult X (const_int Y)) Z), combine 
fails to canonicalize.


In this particular testcase we end up with two expressions which produce 
the same value, one is in canonical form using ASHIFT, the other in 
non-canonical form with a MULT.  Because the two forms differ, we fail 
to find the common subexpression in postreload-gcse.


Bootstrapped and regression tested on x86_64-unknown-linux-gnu.  Also 
successfully ran hppa.exp with hppa2.0w-hp-hpux11 cross compiler.  With 
this change we generate the same code for my 300+ files on the PA that 
we had prior to Venkat's combine changes.


The remaining patches in this series will just be cleaning up the PA 
backend, in particular removing all cases where we might generate 
non-canonical forms.  You could legitimately ask if that would eliminate 
the need for this change.  It will not because the MULT form is still 
canonical inside a MEM and we might need to split out an address 
calculation from the MEM to make an insn recognizable.


OK for the trunk?

Jeff

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index f4012b7..425813c 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,8 @@
 2015-05-21  Jeff Law  
 
+   * combine.c (try_combine): Canonicalize (plus (mult X pow2) Y) into
+   (plus (ashift X log2) Y) if it is a split point.
+
* combine.c (find_split_point): Handle ASHIFT like MULT to encourage
multiply-accumulate/shift-add insn generation.
 
diff --git a/gcc/combine.c b/gcc/combine.c
index 8c527a7..2cb9fd2 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -3749,6 +3749,21 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
  split_code = GET_CODE (*split);
}
 
+ /* Similarly for (plus (mult FOO (const_int pow2))).  */
+ if (split_code == PLUS
+ && GET_CODE (XEXP (*split, 0)) == MULT
+ && CONST_INT_P (XEXP (XEXP (*split, 0), 1))
+ && INTVAL (XEXP (XEXP (*split, 0), 1)) > 0
+ && (i = exact_log2 (UINTVAL (XEXP (XEXP (*split, 0), 1)))) >= 0)
+   {
+ rtx nsplit = XEXP (*split, 0);
+ SUBST (XEXP (*split, 0), gen_rtx_ASHIFT (GET_MODE (nsplit),
+XEXP (nsplit, 0), GEN_INT (i)));
+ /* Update split_code because we may not have a multiply
+anymore.  */
+ split_code = GET_CODE (*split);
+   }
+
 #ifdef INSN_SCHEDULING
  /* If *SPLIT is a paradoxical SUBREG, when we split it, it should
 be written as a ZERO_EXTEND.  */
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 283644c..41a09bad 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,5 +1,6 @@
 2015-05-21  Jeff Law  
 
+   * gcc.target/hppa/shadd-3.c: New test.
* gcc.target/hppa/shadd-2.c: New test.
 
 2015-05-21  Oleg Endo  
diff --git a/gcc/testsuite/gcc.target/hppa/shadd-3.c 
b/gcc/testsuite/gcc.target/hppa/shadd-3.c
new file mode 100644
index 000..f0443ea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/hppa/shadd-3.c
@@ -0,0 +1,41 @@
+/* { dg-do compile }  */
+/* { dg-options "-O2" }  */
+/* In this test we want to verify that combine canonicalizes the
+   MULT into an ASHIFT which in turn allows postreload-gcse to
+   find the common subexpression.
+
+   Neither pass dumps stuff in a format that is particularly good
+   for parsing here, so we count the shadd insns.  More is not
+   necessarily better in this test.  If this test is too fragile
+   over time we'll have to revisit the combine and/or postreload
+   dumps.  */
+/* { dg-final { scan-assembler-times "sh.add" 5 } }  */
+
+extern void oof (void);
+typedef struct simple_bitmap_def *sbitmap;
+struct simple_bitmap_def
+{
+  unsigned char *popcount;
+  unsigned int n_bits;
+  unsigned long elms[1];
+};
+__inline__ void
+SET_BIT (sbitmap map, unsigned int bitno)
+{
+  if (map->popcount)
+{
+  unsigned char oldbit;
+  oldbit =
+   ((map)->elms[bitno / 64]);
+  if (!oldbit)
+   oof ();
+}
+  map->elms[bitno / 64] |= 1;
+}
+
+void
+fix_bb_placements (int indx1, int indx2, sbitmap in_queue)
+{
+  SET_BIT (in_queue, indx1);
+  SET_BIT (in_queue, indx2);
+}


[PATCH] Fix (ocvt (icvt@1 @0)) simplification (PR tree-optimization/66233)

2015-05-21 Thread Jakub Jelinek
Hi!

We ICE on the following testcase at -O3 on x86_64-linux, because
gimple folding attempts to simplify FLOAT_EXPR conversion of
signed V4SI to V4SF feeding FIX_TRUNC_EXPR to unsigned V4SI
into a FIX_TRUNC_EXPR with unsigned V4SI lhs and signed V4SI rhs1,
which is invalid GIMPLE.
All the other simplifications in the same iterator block don't
optimize anything for vector types, and I can't find out any case
where something like this would be beneficial for vector types.
These days we represent source level casts of vectors to same sized
integers as VIEW_CONVERT_EXPR, which isn't handled in here,
and *_prec doesn't really mean what it tests for vector types
(it is log2 of number of elements), vector integer or float widening
is not represented using convert/float/fix_trunc, but using VEC_PERM_EXPR,
VEC_UNPACK*_{LO,HI}_EXPR etc.
I've bootstrapped/regtested with a logging variant and if
(inside_vec || inter_vec || final_vec) is true, we (mis)optimize
anything only on the testcase included in the patch and on the
gfortran.dg/stfunc_4.f90 testcase; in both cases it is
V4SI -> V4SF -> V4SI, which we really shouldn't be optimizing,
because SF mode obviously can't represent all integers exactly.

So, this patch disables optimizing vectors.
Ok for trunk/5.2 if bootstrap/regtest succeeds?

For 4.9/4.8 a similar patch will be needed, but to
fold-const.c/tree-ssa-forwprop.c instead of match.pd.

2015-05-21  Jakub Jelinek  

PR tree-optimization/66233
* match.pd (ocvt (icvt@1 @0)): Don't handle vector types.
Simplify.

* gcc.c-torture/execute/pr66233.c: New test.

--- gcc/match.pd.jj 2015-05-19 15:53:43.0 +0200
+++ gcc/match.pd	2015-05-21 16:21:35.627916502 +0200
@@ -730,16 +730,12 @@ (define_operator_list inverted_tcc_compa
   (for integers).  Avoid this if the final type is a pointer since
   then we sometimes need the middle conversion.  Likewise if the
   final type has a precision not equal to the size of its mode.  */
-   (if (((inter_int && inside_int)
-|| (inter_float && inside_float)
-|| (inter_vec && inside_vec))
+   (if (((inter_int && inside_int) || (inter_float && inside_float))
+   && (final_int || final_float)
&& inter_prec >= inside_prec
-   && (inter_float || inter_vec
-   || inter_unsignedp == inside_unsignedp)
-   && ! (final_prec != GET_MODE_PRECISION (element_mode (type))
- && element_mode (type) == element_mode (inter_type))
-   && ! final_ptr
-   && (! final_vec || inter_prec == inside_prec))
+   && (inter_float || inter_unsignedp == inside_unsignedp)
+   && ! (final_prec != GET_MODE_PRECISION (TYPE_MODE (type))
+ && TYPE_MODE (type) == TYPE_MODE (inter_type)))
 (ocvt @0))
 
/* If we have a sign-extension of a zero-extended value, we can
--- gcc/testsuite/gcc.c-torture/execute/pr66233.c.jj2015-05-21 
17:13:32.639713225 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr66233.c   2015-05-21 
17:10:57.0 +0200
@@ -0,0 +1,22 @@
+/* PR tree-optimization/66233 */
+
+unsigned int v[8];
+
+__attribute__((noinline, noclone)) void
+foo (void)
+{
+  int i;
+  for (i = 0; i < 8; i++)
+v[i] = (float) i;
+}
+
+int
+main ()
+{
+  unsigned int i;
+  foo ();
+  for (i = 0; i < 8; i++)
+if (v[i] != i)
+  __builtin_abort ();
+  return 0;
+}

Jakub


[committed] Tweak inform_declaration

2015-05-21 Thread Marek Polacek
See  for
the rationale.

Bootstrapped/regtested on x86_64-linux, applying to trunk.

2015-05-21  Marek Polacek  

* c-typeck.c (inform_declaration): Use DECL_IS_BUILTIN instead of
DECL_BUILT_IN.

diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
index ba8797b..f55d4c6 100644
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -2853,9 +2853,10 @@ build_function_call (location_t loc, tree function, tree 
params)
 
 /* Give a note about the location of the declaration of DECL.  */
 
-static void inform_declaration (tree decl)
+static void
+inform_declaration (tree decl)
 {
-  if (decl && (TREE_CODE (decl) != FUNCTION_DECL || !DECL_BUILT_IN (decl)))
+  if (decl && (TREE_CODE (decl) != FUNCTION_DECL || !DECL_IS_BUILTIN (decl)))
 inform (DECL_SOURCE_LOCATION (decl), "declared here");
 }
 

Marek


Re: Don't dump low gimple functions in gimple dump

2015-05-21 Thread Thomas Schwinge
Hi!

It's just been a year.  ;-P

In early March, I (hopefully correctly) adapted Tom's patch to apply to
then-current GCC trunk sources; posting this here.  Is the general
approach OK?

On Tue, 20 May 2014 10:16:45 +0200, Tom de Vries  wrote:
> Honza,
> 
> Consider this program:
> ...
> int
> main(void)
> {
> #pragma omp parallel
>   {
> extern void foo(void);
> foo ();
>   }
>   return 0;
> }
> ...
> 
> When compiling this program with -fopenmp, the ompexp pass splits off a new
> function called main._omp_fn.0 containing the call to foo.  The new function 
> is
> then dumped into the gimple dump by analyze_function.
> 
> There are two problems with this:
> - the new function is in low gimple, and is dumped next to high gimple
>   functions
> - since it's already low, the new function is not lowered, and 'goes missing'
>   in the dumps following the gimple dump, until it reappears again after the
>   last lowering dump.
>   [ http://gcc.gnu.org/ml/gcc/2014-03/msg00312.html ]
> 
> This patch fixes the problems by ensuring that analyze_function only dumps the
> new function to the gimple dump after gimplification (specifically, by moving
> the dump_function call into gimplify_function_tree.  That makes the call to
> dump_function in finalize_size_functions superfluous).
> 
> That also requires us to add a call to dump_function in finalize_task_copyfn,
> where we split off a new high gimple function.
> 
> And in expand_omp_taskreg and expand_omp_target, where we split off a new low
> gimple function, we now dump the new function into the current (ompexp) dump
> file, which is the last lowering dump.
> 
> Finally, we dump an information statement at the start of
> cgraph_add_new_function to give a better idea when and what kind of function 
> is
> created.
> 
> Bootstrapped and reg-tested on x86_64.
> 
> OK for trunk ?
> 
> Thanks,
> - Tom

commit b925b393c3d975a9281789d97aff8a91a8b53be0
Author: Thomas Schwinge 
Date:   Sun Mar 1 15:05:15 2015 +0100

Don't dump low gimple functions in gimple dump

id:"537b0f6d.7060...@mentor.com" or id:"53734dc5.90...@mentor.com"

2014-05-19  Tom de Vries  

* cgraphunit.c (cgraph_add_new_function): Dump message on new function.
(analyze_function): Don't dump function to gimple dump file.
* gimplify.c: Add tree-dump.h include.
(gimplify_function_tree): Dump function to gimple dump file.
* omp-low.c: Add tree-dump.h include.
(finalize_task_copyfn): Dump new function to gimple dump file.
(expand_omp_taskreg, expand_omp_target): Dump new function to dump file.
* stor-layout.c (finalize_size_functions): Don't dump function to gimple
dump file.

* gcc.dg/gomp/dump-task.c: New test.
---
 gcc/cgraphunit.c  | 15 ++-
 gcc/gimplify.c|  3 +++
 gcc/omp-low.c |  6 ++
 gcc/stor-layout.c |  1 -
 gcc/testsuite/gcc.dg/gomp/dump-task.c | 33 +
 5 files changed, 56 insertions(+), 2 deletions(-)

diff --git gcc/cgraphunit.c gcc/cgraphunit.c
index 8280fc4..0860c86 100644
--- gcc/cgraphunit.c
+++ gcc/cgraphunit.c
@@ -501,6 +501,20 @@ cgraph_node::add_new_function (tree fndecl, bool lowered)
 {
   gcc::pass_manager *passes = g->get_passes ();
   cgraph_node *node;
+
+  if (dump_file)
+{
+  const char *function_type = ((gimple_has_body_p (fndecl))
+  ? (lowered
+ ? "low gimple"
+ : "high gimple")
+  : "to-be-gimplified");
+  fprintf (dump_file,
+  "Added new %s function %s to callgraph\n",
+  function_type,
+  fndecl_name (fndecl));
+}
+
   switch (symtab->state)
 {
   case PARSING:
@@ -629,7 +643,6 @@ cgraph_node::analyze (void)
 body.  */
   if (!gimple_has_body_p (decl))
gimplify_function_tree (decl);
-  dump_function (TDI_generic, decl);
 
   /* Lower the function.  */
   if (!lowered)
diff --git gcc/gimplify.c gcc/gimplify.c
index 9214648..d6c500d 100644
--- gcc/gimplify.c
+++ gcc/gimplify.c
@@ -87,6 +87,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-low.h"
 #include "cilk.h"
 #include "gomp-constants.h"
+#include "tree-dump.h"
 
 #include "langhooks-def.h" /* FIXME: for lhd_set_decl_assembler_name */
 #include "tree-pass.h" /* FIXME: only for PROP_gimple_any */
@@ -9435,6 +9436,8 @@ gimplify_function_tree (tree fndecl)
   cfun->curr_properties = PROP_gimple_any;
 
   pop_cfun ();
+
+  dump_function (TDI_generic, fndecl);
 }
 
 /* Return a dummy expression of type TYPE in order to keep going after an
diff --git gcc/omp-low.c gcc/omp-low.c
index fac32b3..2839d8f 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -109,6 +109,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "lt

Re: Cleanup and improve canonical type construction in LTO

2015-05-21 Thread Jan Hubicka
> On Wed, 20 May 2015, Jan Hubicka wrote:
> 
> > Richard,
> > this is my attempt to make sense of TYPE_CANONICAL at LTO.  My 
> > understanding is
> > that gimple_canonical_types_compatible_p needs to return true for all pairs 
> > of
> > types that are considered compatible across compilation unit for any of
> > languages we support (and in a sane way for cross language, too) and 
> > moreover
> > it needs to form an equivalence so it can be used to do canonical type 
> > merging.
> > 
> > Now C definition of type compatibility ignores type names and only boils 
> > down
> > to structural compare (which we get wrong for unions, but I will look into 
> > that
> > incrementally; also C explicitly requires field names to match, which we 
> > don't)
> > and it of course says that an incomplete type can match a complete one.
> 
> field-names are difficult to match cross-language.

Yep, I suppose that may be tricky.  
One thing that may make sense is to detect if the whole program is C/C++ only and 
switch
to the standard compliant matching here.
> 
> > This is a bit generous on structures and unions, because every incomplete
> > RECORD_TYPE is compatible with every RECORD_TYPE in program and similarly
> > incomplete UNION_TYPE is compatible with every UNION_TYPE in program.
> > 
> > Now from the fact that gimple_canonical_types_compatible_p must be 
> > an equivalence
> > (and thus transitive) we immediately get that there is no way to make
> > a difference between two RECORD_TYPEs (or UNION_TYPEs) at all: there always 
> > may
> > be an incomplete type that forces them equivalent.
> > 
> > This is not how the code works. gimple_canonical_types_compatible_p will not
> > match complete type with incomplete and this is not a problem only because
> > TYPE_CANONICAL matters for complete types only. TBAA machinery never needs
> > alias sets of an incomplete type (modulo bugs). 
> 
> Correct.
> 
> > More precisely we have two equivalences:
> >  1) full structural equivalence matching fields, array sizes and function
> > parameters, where pointer types are however recursively matched only 
> > with 2)
> 
> Not sure about function parameters (well, function types at all - they
> don't play a role in TBAA) - function "members" are always pointers, so 
> see 2)

OK, we compute the canonical types for functions just to ignore them, indeed.
> 
> >  2) structural equivalence ignoring any info from complete types:
> > here all RECORD_TYPEs are equal, so are UNION_TYPEs, for functions we
> > can only match return value (because of existence of non-prototypes),
> > for arrays only TREE_TYPE.
> > In this equivalence we also can't match TYPE_MODE of aggregates/arrays
> > because it may not be set for incomplete ones.
> > 
> > Now our implementation somehow computes only 1), and 2) is approximated by
> > matching TREE_CODE of the pointer-to type.  This is unnecessarily pessimistic.
> > Pointer to pointer to int does not need to match pointer to pointer to
> > structure. 
> 
> Note that you have (a lot of!) pointer members that point to structures
> in various state of completeness.  A pointer to an incomplete type
> needs to match all other pointer types (well, the current code tries
> to make the exception that a pointer to an aggregate stays a pointer
> to an aggregate - thus the matching of pointed-to type - sorry to
> only remember now the connection to incompleteness ...)

Yes, that is actually consistent with what the C standard says - the incomplete
types (cross language) depend only on the tag, and the tag is basically the
ARRAY_TYPE/RECORD_TYPE/UNION_TYPE distinction.
> 
> > The patch below changes it in the following way:
> > 
> >  a) it adds MATCH_INCOMPLETE_TYPES parameter to
> > gimple_canonical_types_compatible_p and gimple_canonical_type_hash
> > to determine whether we compute equivalence 1) or 2).
> > 
> > The way we handle pointers is updated so we set MATCH_INCOMPLETE_TYPES
> > when recursing down to pointer type.  This makes it possible for
> > complete structure referring incomplete pointer type to be equivalent 
> > with
> > a complete structure referring complete pointer type.
> 
> But does this really end up getting more equivalence classes than the
> crude approach matching TREE_CODE?

We can distinguish different pointers to pointers and pointers to types
that are always complete (integers/reals).  It makes a relatively small difference
on Firefox (about 2-5% more disambiguations with my patch), but I think it is
more correct and extensible.

For example, now it is quite clear how to handle anonymous C++ types
incrementally.
> 
> > I believe that in this definition we do best possible equivalence
> > passing the rules above and we do not need to care about SCC - the
> > only way a type can refer to itself is via a pointer, and that will make us
> > drop to MATCH_INCOMPLETE_TYPES.
> >  b) it disables TYPE_CANONICAL calculation for incomplete types and 
> > functions
> > types. I

Re: [PATCH] Fix memory orders description in atomic ops built-ins docs.

2015-05-21 Thread Matthew Wahab

On 19/05/15 20:20, Torvald Riegel wrote:

On Mon, 2015-05-18 at 17:36 +0100, Matthew Wahab wrote:

Hello,

On 15/05/15 17:22, Torvald Riegel wrote:

This patch improves the documentation of the built-ins for atomic
operations.


The "memory model" to "memory order" change does improve things but I think that
the patch has some problems. As it is now, it makes some of the descriptions
quite difficult to understand and seems to assume more familiarity with details
of the C++11 specification than might be expected.


I'd say that's a side effect of the C++11 memory model being the
reference specification of the built-ins.


Generally, the memory order descriptions seem to be targeted towards language
designers, but don't provide much for anybody trying to understand how to implement or
to use the built-ins.


I agree that the current descriptions aren't a tutorial on the C++11
memory model.  However, given that the model is not GCC-specific, we
aren't really in a need to provide a tutorial, in the same way that we
don't provide a C++ tutorial.  Users can pick the C++11 memory model
educational material of their choice, and we need to document what's
missing to apply the C++11 knowledge to the built-ins we provide.



We seem to have different views about the purpose of the manual page. I'm treating it 
as a description of the built-in functions provided by gcc to generate the code 
needed to implement the C++11 model. That is, the built-ins are distinct from C++11 
and their descriptions should be, as far as possible, independent of the methods used 
in the C++11 specification to describe the C++11 memory model.


I understand of course that the __atomics were added in order to support C++11 but 
that doesn't make them part of C++11 and, since __atomic functions can be made 
available when C11/C++11 may not be, it seems to make sense to try for stand-alone 
descriptions.


I'm also concerned that the patch, by describing things in terms of formal C++11 
concepts, makes it more difficult for people to know what the built-ins can be 
expected to do, and so make the built-ins more difficult to use. There is a danger that 
rather than take a risk with uncertainty about the behaviour of the __atomics, people 
will fall back to the __sync functions simply because their expected behaviour is 
easier to work out.


I don't think that linking to external sites will help either, unless people already 
want to know C++11. Anybody who just wants to (e.g.) add a memory barrier will take 
one look at the __sync manual page and use the closest match from there instead.


Note that none of this requires a tutorial of any kind. I'm just suggesting that the 
manual should describe what behaviour should be expected of the code generated for 
the functions. For the memory orders, that would mean describing what constraints 
need to be met by the generated code. The requirement that the atomics should support 
C++11 could be met by making sure that the description of the expected behaviour is 
sufficient for C++11.



There are several resources for implementers, for example the mappings
maintained by the Cambridge research group.  I guess it would be
sufficient to have such material on the wiki.  Is there something
specific that you'd like to see documented for implementers?
[...]
I agree it's not described in the manual, but we're implementing C++11.


(As above) I believe we're supporting the implementation of C++11 and that the 
distinction is important.



However, I don't see why happens-before semantics wouldn't apply to
GCC's implementation of the built-ins; there may be cases where we
guarantee more, but if one uses the builtins in way allowed by the C++11
model, one certainly gets behavior and happens-before relationships as
specified by C++11.



My understanding is that happens-before is a relation used in the C++11 specification 
with a specific meaning. I believe that it's used to decide whether something is or is 
not a data race, so saying that it applies to a gcc built-in would be wrong. Using the 
gcc built-in rather than the equivalent C++11 library function would result in 
a program that C++11 regards as invalid. (Again, as I understand it.)





diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 6004681..5b2ded8 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8853,19 +8853,19 @@ are not prevented from being speculated to before the 
barrier.

   [...]  If the data type size maps to one
-of the integral sizes that may have lock free support, the generic
-version uses the lock free built-in function.  Otherwise an
+of the integral sizes that may support lock-freedom, the generic
+version uses the lock-free built-in function.  Otherwise an
   external call is left to be resolved at run time.

=
This is a slightly awkward sentence. Maybe it could be replaced with something
along the lines of "The generic function uses the lock-free built-in function when
the data-type size makes that possible, othe

Re: [gomp4] Lack of OpenACC NVPTX devices is not an error during scanning

2015-05-21 Thread Thomas Schwinge
Hi Julian!

On Tue, 19 May 2015 11:36:58 +0100, Julian Brown  
wrote:
> This patch fixes an oversight whereby if the CUDA libraries are
> available for some reason on a system that doesn't actually contain an
> nVidia card, an OpenACC program will raise an error if the NVPTX
> backend is picked as a default instead of falling back to some other
> device instead.

Thanks for fixing this!  (Has already been committed to trunk in r223352,
and to gomp-4_0-branch in r223351.)

Your patch:

> --- a/libgomp/plugin/plugin-nvptx.c
> +++ b/libgomp/plugin/plugin-nvptx.c
> @@ -781,7 +781,13 @@ nvptx_get_num_devices (void)
>   until cuInit has been called.  Just call it now (but don't yet do any
>   further initialization).  */
>if (instantiated_devices == 0)
> -cuInit (0);
> +{
> +  r = cuInit (0);
> +  /* This is not an error: e.g. we may have CUDA libraries installed but
> + no devices available.  */
> +  if (r != CUDA_SUCCESS)
> +return 0;
> +}
>  
>r = cuDeviceGetCount (&n);
>if (r!= CUDA_SUCCESS)

In early March, I had noticed the same problem, and came up with the
following patch -- but :-( unfortunately never got around to pushing it
upstream.  I'm now posting my patch just for completeness; I think yours
is sufficient/better: no safe-guard should be needed to the cuInit call
in nvptx_init, because when that is called, we're rightfully expecting to
be able to initialize a PTX device, and in nvptx_get_num_devices, yours
is "more conservative" in doing the right thing ("no PTX offloading
device available") for all kinds of cuInit errors.

commit 6032dde185d0d45d779a1bbf0a5baee7131c0b8c
Author: Thomas Schwinge 
Date:   Sun Mar 1 14:36:02 2015 +0100

libgomp nvptx plugin: Gracefully handle CUDA_ERROR_NO_DEVICE.
---
 libgomp/plugin/plugin-nvptx.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
index 78e705f..0c1e826 100644
--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -592,6 +592,8 @@ nvptx_init (void)
 return -1;
 
   r = cuInit (0);
+  if (r == CUDA_ERROR_NO_DEVICE)
+r = CUDA_SUCCESS;
   if (r != CUDA_SUCCESS)
 GOMP_PLUGIN_fatal ("cuInit error: %s", cuda_error (r));
 
@@ -715,7 +717,13 @@ nvptx_get_num_devices (void)
  until cuInit has been called.  Just call it now (but don't yet do any
  further initialization).  */
   if (!ptx_inited)
-cuInit (0);
+{
+  r = cuInit (0);
+  if (r == CUDA_ERROR_NO_DEVICE)
+   return 0;
+  if (r != CUDA_SUCCESS)
+   GOMP_PLUGIN_fatal ("cuInit error: %s", cuda_error (r));
+}
 
   r = cuDeviceGetCount (&n);
   if (r!= CUDA_SUCCESS)


Greetings,
 Thomas


signature.asc
Description: PGP signature


Re: [PATCH] Fix (ocvt (icvt@1 @0)) simplification (PR tree-optimization/66233)

2015-05-21 Thread Richard Biener
On May 21, 2015 5:28:14 PM GMT+02:00, Jakub Jelinek  wrote:
>Hi!
>
>We ICE on the following testcase at -O3 on x86_64-linux, because
>gimple folding attempts to simplify FLOAT_EXPR conversion of
>signed V4SI to V4SF feeding FIX_TRUNC_EXPR to unsigned V4SI
>into a FIX_TRUNC_EXPR with unsigned V4SI lhs and signed V4SI rhs1,
>which is invalid GIMPLE.
>All the other simplifications in the same iterator block don't
>optimize anything for vector types, and I can't find out any case
>where something like this would be beneficial for vector types.
>These days we represent source level casts of vectors to same sized
>integers as VIEW_CONVERT_EXPR, which isn't handled in here,
>and *_prec doesn't really mean what it tests for vector types
>(it is log2 of number of elements), vector integer or float widening
>is not represented using convert/float/fix_trunc, but using
>VEC_PERM_EXPR,
>VEC_UNPACK*_{LO,HI}_EXPR etc.
>I've bootstrapped/regtested with a logging variant and if
>(inside_vec || inter_vec || final_vec) is true, we (mis)optimize
>anything only on the testcase included in the patch and on
>gfortran.dg/stfunc_4.f90 testcase, in both cases it is
>V4SI -> V4SF -> V4SI, which we really shouldn't be optimizing,
>because SF mode obviously can't represent all integers exactly.
>
>So, this patch disables optimizing vectors.
>Ok for trunk/5.2 if bootstrap/regtest succeeds?

OK.

Thanks,
Richard.

>For 4.9/4.8 a similar patch will be needed, but to
>fold-const.c/tree-ssa-forwprop.c instead of match.pd.
>
>2015-05-21  Jakub Jelinek  
>
>   PR tree-optimization/66233
>   * match.pd (ocvt (icvt@1 @0)): Don't handle vector types.
>   Simplify.
>
>   * gcc.c-torture/execute/pr66233.c: New test.
>
>--- gcc/match.pd.jj2015-05-19 15:53:43.0 +0200
>+++ gcc/match.pd   2015-05-21 16:21:35.627916502 +0200
>@@ -730,16 +730,12 @@ (define_operator_list inverted_tcc_compa
>   (for integers).  Avoid this if the final type is a pointer since
>   then we sometimes need the middle conversion.  Likewise if the
>  final type has a precision not equal to the size of its mode.  */
>-   (if (((inter_int && inside_int)
>-   || (inter_float && inside_float)
>-   || (inter_vec && inside_vec))
>+   (if (((inter_int && inside_int) || (inter_float && inside_float))
>+  && (final_int || final_float)
>   && inter_prec >= inside_prec
>-  && (inter_float || inter_vec
>-  || inter_unsignedp == inside_unsignedp)
>-  && ! (final_prec != GET_MODE_PRECISION (element_mode (type))
>-&& element_mode (type) == element_mode (inter_type))
>-  && ! final_ptr
>-  && (! final_vec || inter_prec == inside_prec))
>+  && (inter_float || inter_unsignedp == inside_unsignedp)
>+  && ! (final_prec != GET_MODE_PRECISION (TYPE_MODE (type))
>+&& TYPE_MODE (type) == TYPE_MODE (inter_type)))
> (ocvt @0))
> 
>/* If we have a sign-extension of a zero-extended value, we can
>--- gcc/testsuite/gcc.c-torture/execute/pr66233.c.jj   2015-05-21
>17:13:32.639713225 +0200
>+++ gcc/testsuite/gcc.c-torture/execute/pr66233.c  2015-05-21
>17:10:57.0 +0200
>@@ -0,0 +1,22 @@
>+/* PR tree-optimization/66233 */
>+
>+unsigned int v[8];
>+
>+__attribute__((noinline, noclone)) void
>+foo (void)
>+{
>+  int i;
>+  for (i = 0; i < 8; i++)
>+v[i] = (float) i;
>+}
>+
>+int
>+main ()
>+{
>+  unsigned int i;
>+  foo ();
>+  for (i = 0; i < 8; i++)
>+if (v[i] != i)
>+  __builtin_abort ();
>+  return 0;
>+}
>
>   Jakub




[PATCH 1/3][AArch64] Strengthen barriers for sync-fetch-op builtins.

2015-05-21 Thread Matthew Wahab

On AArch64, the __sync builtins are implemented using the __atomic operations
and barriers. This makes the __sync builtins inconsistent with their
documentation, which requires stronger barriers than those for the __atomic
builtins.

The difference between __sync and __atomic builtins is that the restrictions
imposed by a __sync operation's barrier apply to all memory references while the
restrictions of an __atomic operation's barrier only need to apply to a
subset. This affects AArch64 in particular because, although its implementation
of __atomic builtins is correct, the barriers generated are too weak for the
__sync builtins.

The affected __sync builtins are the __sync_fetch_and_op (and
__sync_op_and_fetch) functions, __sync_compare_and_swap and
__sync_lock_test_and_set. This and a following patch modify the code generated
for these functions to weaken the initial load-acquire to a simple load and to
add a final fence to prevent code-hoisting. The last patch will add tests for
the code generated by the AArch64 backend for the __sync builtins.

- Full barriers:  __sync_fetch_and_op, __sync_op_and_fetch
  __sync_*_compare_and_swap

  [load-acquire; code; store-release]
  becomes
  [load; code ; store-release; fence].

- Acquire barriers:  __sync_lock_test_and_set

  [load-acquire; code; store]
  becomes
  [load; code; store; fence]

The code generated for release barriers and for the __atomic builtins is
unchanged.

This patch changes the code generated for the __sync_fetch_and_op and
__sync_op_and_fetch builtins.

Tested with check-gcc for aarch64-none-linux-gnu.

Ok for trunk?
Matthew

gcc/
2015-05-21  Matthew Wahab  

* config/aarch64/aarch64.c (aarch64_emit_post_barrier): New.
(aarch64_split_atomic_op): Check for __sync memory models, emit
appropriate initial and final barriers.


From 2092902d2738b0c24a6272e0b3480bb9cffd275c Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 15 May 2015 09:26:28 +0100
Subject: [PATCH 1/3] [AArch64] Strengthen barriers for sync-fetch-op builtin.

Change-Id: I3342a572d672163ffc703e4e51603744680334fc
---
 gcc/config/aarch64/aarch64.c | 30 +-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 7f0cc0d..778571f 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9249,6 +9249,22 @@ aarch64_expand_compare_and_swap (rtx operands[])
   emit_insn (gen_rtx_SET (bval, x));
 }
 
+/* Emit a post-operation barrier.  */
+
+static void
+aarch64_emit_post_barrier (enum memmodel model)
+{
+  const enum memmodel base_model = memmodel_base (model);
+
+  if (is_mm_sync (model)
+  && (base_model == MEMMODEL_ACQUIRE
+	  || base_model == MEMMODEL_ACQ_REL
+	  || base_model == MEMMODEL_SEQ_CST))
+{
+  emit_insn (gen_mem_thread_fence (GEN_INT (MEMMODEL_SEQ_CST)));
+}
+}
+
 /* Split a compare and swap pattern.  */
 
 void
@@ -9311,12 +9327,20 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
 {
   machine_mode mode = GET_MODE (mem);
   machine_mode wmode = (mode == DImode ? DImode : SImode);
+  const enum memmodel model = memmodel_from_int (INTVAL (model_rtx));
+  const bool is_sync = is_mm_sync (model);
+  rtx load_model_rtx = model_rtx;
   rtx_code_label *label;
   rtx x;
 
   label = gen_label_rtx ();
   emit_label (label);
 
+  /* A __sync operation will emit a final fence to stop code hoisting, so the
+ load can be relaxed.  */
+  if (is_sync)
+load_model_rtx = GEN_INT (MEMMODEL_RELAXED);
+
   if (new_out)
 new_out = gen_lowpart (wmode, new_out);
   if (old_out)
@@ -9325,7 +9349,7 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
 old_out = new_out;
   value = simplify_gen_subreg (wmode, value, mode, 0);
 
-  aarch64_emit_load_exclusive (mode, old_out, mem, model_rtx);
+  aarch64_emit_load_exclusive (mode, old_out, mem, load_model_rtx);
 
   switch (code)
 {
@@ -9361,6 +9385,10 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
 			gen_rtx_LABEL_REF (Pmode, label), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
+
+  /* Emit any fence needed for a __sync operation.  */
+  if (is_sync)
+aarch64_emit_post_barrier (model);
 }
 
 static void
-- 
1.9.1



[AArch64][PATCH 2/3] Strengthen barriers for sync-compare-swap builtins.

2015-05-21 Thread Matthew Wahab

This patch changes the code generated for __sync_*_compare_and_swap to

  ldxr reg; cmp; bne label; stlxr; cbnz; label: dmb ish; mov .., reg

This removes the acquire-barrier from the load and ends the operation with a
fence to prevent memory references appearing after the __sync operation from
being moved ahead of the store-release.

This also strengthens the acquire barrier generated for __sync_lock_test_and_set
(which, like compare-and-swap, is implemented as a form of atomic exchange):

  ldaxr; stxr; cbnz
becomes
  ldxr; stxr; cbnz; dmb ish

Tested with check-gcc for aarch64-none-linux-gnu.

Ok for trunk?
Matthew

2015-05-21  Matthew Wahab  

* config/aarch64/aarch64.c (aarch64_split_compare_and_swap): Check
for __sync memory models, emit appropriate initial and final
barriers.

From 6f748034d25b75ea7829192d94e54189c2fbf99e Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 15 May 2015 09:31:06 +0100
Subject: [PATCH 2/3] [AArch64] Strengthen barriers for sync-compare-swap
 builtins.

Change-Id: I335771f2f42ea951d227f20f6cb9daa07330614d
---
 gcc/config/aarch64/aarch64.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 778571f..11a8cd0 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9275,14 +9275,19 @@ aarch64_split_compare_and_swap (rtx operands[])
   bool is_weak;
   rtx_code_label *label1, *label2;
   rtx x, cond;
+  enum memmodel model;
+  rtx model_rtx;
+  rtx load_model_rtx;
 
   rval = operands[0];
   mem = operands[1];
   oldval = operands[2];
   newval = operands[3];
   is_weak = (operands[4] != const0_rtx);
+  model_rtx = operands[5];
   scratch = operands[7];
   mode = GET_MODE (mem);
+  model = memmodel_from_int (INTVAL (model_rtx));
 
   label1 = NULL;
   if (!is_weak)
@@ -9292,7 +9297,13 @@ aarch64_split_compare_and_swap (rtx operands[])
 }
   label2 = gen_label_rtx ();
 
-  aarch64_emit_load_exclusive (mode, rval, mem, operands[5]);
+  /* A __sync operation will end with a fence so the load can be relaxed.  */
+  if (is_mm_sync (model))
+load_model_rtx = GEN_INT (MEMMODEL_RELAXED);
+  else
+load_model_rtx = model_rtx;
+
+  aarch64_emit_load_exclusive (mode, rval, mem, load_model_rtx);
 
   cond = aarch64_gen_compare_reg (NE, rval, oldval);
   x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
@@ -9300,7 +9311,7 @@ aarch64_split_compare_and_swap (rtx operands[])
 			gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
 
-  aarch64_emit_store_exclusive (mode, scratch, mem, newval, operands[5]);
+  aarch64_emit_store_exclusive (mode, scratch, mem, newval, model_rtx);
 
   if (!is_weak)
 {
@@ -9317,6 +9328,10 @@ aarch64_split_compare_and_swap (rtx operands[])
 }
 
   emit_label (label2);
+
+  /* A __sync operation may need a final fence.  */
+  if (is_mm_sync (model))
+aarch64_emit_post_barrier (model);
 }
 
 /* Split an atomic operation.  */
-- 
1.9.1



[PATCH 3/3][Aarch64] Add tests for __sync_builtins.

2015-05-21 Thread Matthew Wahab

This patch adds tests for the code generated by the AArch64 backend for the
__sync builtins.

Tested aarch64-none-linux-gnu with check-gcc.

Ok for trunk?
Matthew

gcc/testsuite/
2015-05-21  Matthew Wahab  

* gcc.target/aarch64/sync-comp-swap.c: New.
* gcc.target/aarch64/sync-comp-swap.x: New.
* gcc.target/aarch64/sync-op-acquire.c: New.
* gcc.target/aarch64/sync-op-acquire.x: New.
* gcc.target/aarch64/sync-op-full.c: New.
* gcc.target/aarch64/sync-op-full.x: New.
* gcc.target/aarch64/sync-op-release.c: New.
* gcc.target/aarch64/sync-op-release.x: New.

From 74738b2c0ceb9d5cae281b9609c134fde1d459e9 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 15 May 2015 09:31:42 +0100
Subject: [PATCH 3/3] [Aarch64] Add tests for __sync_builtins.

Change-Id: I9f7cde85613dfe2cb6df55cbc732e683092f14d8
---
 gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c  |  8 +++
 gcc/testsuite/gcc.target/aarch64/sync-comp-swap.x  | 13 
 gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c |  8 +++
 gcc/testsuite/gcc.target/aarch64/sync-op-acquire.x |  7 +++
 gcc/testsuite/gcc.target/aarch64/sync-op-full.c|  8 +++
 gcc/testsuite/gcc.target/aarch64/sync-op-full.x| 73 ++
 gcc/testsuite/gcc.target/aarch64/sync-op-release.c |  6 ++
 gcc/testsuite/gcc.target/aarch64/sync-op-release.x |  7 +++
 8 files changed, 130 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-comp-swap.x
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-acquire.x
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-full.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-full.x
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-release.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sync-op-release.x

diff --git a/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
new file mode 100644
index 000..126b997
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-ipa-icf" } */
+
+#include "sync-comp-swap.x"
+
+/* { dg-final { scan-assembler-times "ldxr\tw\[0-9\]+, \\\[x\[0-9\]+\\\]" 2 } } */
+/* { dg-final { scan-assembler-times "stlxr\tw\[0-9\]+, w\[0-9\]+, \\\[x\[0-9\]+\\\]" 2 } } */
+/* { dg-final { scan-assembler-times "dmb\tish" 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.x b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.x
new file mode 100644
index 000..eda52e40
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-comp-swap.x
@@ -0,0 +1,13 @@
+int v = 0;
+
+int
+sync_bool_compare_swap (int a, int b)
+{
+  return __sync_bool_compare_and_swap (&v, &a, &b);
+}
+
+int
+sync_val_compare_swap (int a, int b)
+{
+  return __sync_val_compare_and_swap (&v, &a, &b);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
new file mode 100644
index 000..2639f9f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#include "sync-op-acquire.x"
+
+/* { dg-final { scan-assembler-times "ldxr\tw\[0-9\]+, \\\[x\[0-9\]+\\\]" 1 } } */
+/* { dg-final { scan-assembler-times "stxr\tw\[0-9\]+, w\[0-9\]+, \\\[x\[0-9\]+\\\]" 1 } } */
+/* { dg-final { scan-assembler-times "dmb\tish" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.x b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.x
new file mode 100644
index 000..4c4548c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-acquire.x
@@ -0,0 +1,7 @@
+int v;
+
+int
+sync_lock_test_and_set (int a)
+{
+  return __sync_lock_test_and_set (&v, a);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-full.c b/gcc/testsuite/gcc.target/aarch64/sync-op-full.c
new file mode 100644
index 000..10fc8fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-full.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#include "sync-op-full.x"
+
+/* { dg-final { scan-assembler-times "ldxr\tw\[0-9\]+, \\\[x\[0-9\]+\\\]" 12 } } */
+/* { dg-final { scan-assembler-times "stlxr\tw\[0-9\]+, w\[0-9\]+, \\\[x\[0-9\]+\\\]" 12 } } */
+/* { dg-final { scan-assembler-times "dmb\tish" 12 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sync-op-full.x b/gcc/testsuite/gcc.target/aarch64/sync-op-full.x
new file mode 100644
index 000..c24223d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sync-op-full.x
@@ -0,0 +1,73 @@
+int v = 0;
+
+int
+sync_fetch_and_add (int a)
+{
+  return __sync_fetch_and_add (&v, a);
+}
+
+int
+sync_fetch_and_sub (int a)
+{
+  return __sync_fetch_and_sub (&v, a);
+}
+
+int
+sync_fetch_and_and (int a)
+{
+  return __sync_fetch_and_and (&v, a

Fix hashing of basetypes of methods

2015-05-21 Thread Jan Hubicka
Hi,
this patch drops TYPE_METHOD_BASETYPE from hash_canonical_type.  It is not
compared by gimple_canonical_types_compatible_p and thus can only corrupt
the hash table by allowing two entries that are equal to have different
hashes.

Theoretically we may later want to distinguish method pointers by basetype,
but the THIS pointer has the proper type anyway, and this all makes sense
only if we start handling pointers properly. We will definitely need to do
that in both functions, not only here.

Bootstrapped/regtested x86_64-linux, OK?

Honza

* lto.c (hash_canonical_type): Do not hash TYPE_METHOD_BASETYPE.
Index: lto/lto.c
===
--- lto/lto.c   (revision 223490)
+++ lto/lto.c   (working copy)
@@ -372,10 +376,6 @@ hash_canonical_type (tree type)
   unsigned na;
   tree p;
 
-  /* For method types also incorporate their parent class.  */
-  if (TREE_CODE (type) == METHOD_TYPE)
-   iterative_hash_canonical_type (TYPE_METHOD_BASETYPE (type), hstate);
-
   iterative_hash_canonical_type (TREE_TYPE (type), hstate);
 
   for (p = TYPE_ARG_TYPES (type), na = 0; p; p = TREE_CHAIN (p))


Re: [patch, testsuite] don't specify "dg-do run" explicitly for vect test cases

2015-05-21 Thread Sandra Loosemore

On 05/21/2015 03:08 AM, Richard Biener wrote:

On Thu, May 21, 2015 at 7:12 AM, Sandra Loosemore
 wrote:

On targets such as ARM, some arches are compatible with options needed to
enable compilation with vectorization, but the specific hardware (or
simulator or BSP) available for execution tests may not implement or enable
those features.  The vect.exp test harness already includes some magic to
determine whether the target hw can execute vectorized code and sets
dg-do-what-default to compile the tests only if they can't be executed.
It's a mistake for individual tests to explicitly say "dg-do run" because
this overrides the harness's magic default and forces the test to be
executed, even if doing so just ends up wedging the target.

I already committed two patches last fall (r215627 and r218427) to address
this, but people keep adding new vect test cases with the same problem, so
here is yet another installment to clean them up.  I tested this on
arm-none-eabi with a fairly large collection of multilibs.  OK to commit?


Huh... I thought we have the check_vect () stuff for that...?


We do; this is what sets dg-do-what-default.  But, if the test case 
specifies dg-do whatever explicitly, that overrides the default.  So, 
don't do that!  :-P


-Sandra


Do not compare type attributes in gimple_canonical_types_compatible_p

2015-05-21 Thread Jan Hubicka
Hi,
this patch removes the call to comp_type_attributes (which happens for
METHOD_TYPE and FUNCTION_TYPE only). The call does not make sense, because
type attributes may change between variants and the pointers should still
be considered compatible.

We did not get any trouble from this only because we do not really use
canonical types of functions for anything.

Bootstrapped/regtested x86_64-linux, OK?

Honza

* tree.c (gimple_canonical_types_compatible_p) Do not compare
type attributes.
(verify_type): Drop METHOD_TYPE FIXME; update FUNCTION_TYPE FIXME.
Index: tree.c
===
--- tree.c  (revision 223490)
+++ tree.c  (working copy)
@@ -12837,9 +12837,6 @@ gimple_canonical_types_compatible_p (con
trust_type_canonical))
return false;
 
-  if (!comp_type_attributes (t1, t2))
-   return false;
-
   if (TYPE_ARG_TYPES (t1) == TYPE_ARG_TYPES (t2))
return true;
   else
@@ -12939,10 +12936,9 @@ verify_type (const_tree t)
   /* Method and function types can not be used to address memory and thus
  TYPE_CANONICAL really matters only for determining useless conversions.
 
- FIXME: C++ FE does not agree with gimple_canonical_types_compatible_p
- here.  gimple_canonical_types_compatible_p calls comp_type_attributes
- while for C++ FE the attributes does not make difference.  */
-  else if (TREE_CODE (t) == FUNCTION_TYPE || TREE_CODE (t) == METHOD_TYPE)
+ FIXME: C++ FE produce declarations of builtin functions that are not
+ compatible with main variants.  */
+  else if (TREE_CODE (t) == FUNCTION_TYPE)
 ;
   else if (t != ct
   /* FIXME: gimple_canonical_types_compatible_p can not compare types


RE: Fix PR48052: loop not vectorized if index is "unsigned int"

2015-05-21 Thread Aditya K

I tested this patch and it passes bootstrap and no extra failures.

Thanks
-Aditya


Symbolically evaluate conditionals, and subtraction when additional constraints 
are provided.

Adding this evaluation mechanism helps vectorize some loops on 64-bit
machines, because on 64-bit targets a typecast appears that causes scev to
bail out.

gcc/ChangeLog:

2015-05-21  hiraditya  
2015-05-21 Sebastian Pop  
2015-05-21 Abderrazek Zaafrani 

    * gcc.dg/vect/pr48052.c: New test.
    * tree-ssa-loop-niter.c (fold_binary_cond_p): Fold a conditional 
operation when additional constraints are
    available.
    (fold_binary_minus_p): Fold a subtraction operations of the form (A - B 
-1) when additional constraints are
    available.
    (scev_probably_wraps_p): Use the above two functions to find whether 
valid_niter>= loop->nb_iterations.


diff --git a/gcc/testsuite/gcc.dg/vect/pr48052.c 
b/gcc/testsuite/gcc.dg/vect/pr48052.c
new file mode 100644
index 000..8e406d7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr48052.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
+int foo(int* A, int* B,  unsigned start, unsigned BS)
+{
+  int s;
+  for (unsigned k = start;  k < start + BS; k++)
+    {
+  s += A[k] * B[k];
+    }
+
+  return s;
+}
+
+int bar(int* A, int* B, unsigned BS)
+{
+  int s;
+  for (unsigned k = 0;  k < BS; k++)
+    {
+  s += A[k] * B[k];
+    }
+
+  return s;
+}
+


> From: hiradi...@msn.com
> To: gcc-patches@gcc.gnu.org; a.zaafr...@samsung.com; seb...@gmail.com; 
> l...@redhat.com; richard.guent...@gmail.com
> Subject: Fix PR48052: loop not vectorized if index is "unsigned int"
> Date: Tue, 19 May 2015 16:12:26 +
>
> w.r.t. the PR48052, here is the patch which finds out if scev would wrap or 
> not.
> The patch symbolically evaluates if valid_niter>= loop->nb_iterations is 
> true. In that case the scev would not wrap (??).
> Currently, we only look for two special 'patterns', which are sufficient to 
> analyze the simple test cases.
>
> valid_niter = ~s (= UNIT_MAX - s)
> We have to prove that valid_niter>= loop->nb_iterations
>
> Pattern1 loop->nb_iterations: s>= e ? s - e : 0
> Pattern2 loop->nb_iterations: (e - s) -1
>
> In the first case we prove that valid_niter>= loop->nb_iterations in both the 
> cases i.e., when s>=e and when not.
> In the second case we prove valid_niter>= loop->nb_iterations, by simple 
> analysis that  UINT_MAX>= e is true in all cases.
>
> I haven't tested this patch completely. I'm looking for feedback and any 
> scope for improvement.
>
>
> hth,
> -Aditya
>
>
>
> Vectorize loops which has typecast.
>
> 2015-05-19  hiraditya  
> 2015-05-19 Sebastian Pop  
> 2015-05-19 Abderrazek Zaafrani 
>
> * gcc.dg/vect/pr48052.c: New test.
>
> gcc/ChangeLog:
>
> 2015-05-19  hiraditya  
>
> * tree-ssa-loop-niter.c (fold_binary_cond_p): Fold a conditional 
> operation when additional constraints are
> available.
> (fold_binary_minus_p): Fold a subtraction operations of the form (A - 
> B -1) when additional constraints are
> available.
> (scev_probably_wraps_p): Use the above two functions to find whether 
> valid_niter>= loop->nb_iterations.
>
>
> diff --git a/gcc/testsuite/gcc.dg/vect/pr48052.c 
> b/gcc/testsuite/gcc.dg/vect/pr48052.c
> new file mode 100644
> index 000..8e406d7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr48052.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O3" } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> +
> +int foo(int* A, int* B,  unsigned start, unsigned BS)
> +{
> +  int s;
> +  for (unsigned k = start;  k < start + BS; k++)
> +{
> +  s += A[k] * B[k];
> +}
> +
> +  return s;
> +}
> +
> +int bar(int* A, int* B, unsigned BS)
> +{
> +  int s;
> +  for (unsigned k = 0;  k < BS; k++)
> +{
> +  s += A[k] * B[k];
> +}
> +
> +  return s;
> +}
> +
> diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
> index 042f8df..ddc00cc 100644
> --- a/gcc/tree-ssa-loop-niter.c
> +++ b/gcc/tree-ssa-loop-niter.c
> @@ -3773,6 +3773,117 @@ nowrap_type_p (tree type)
>return false;
>  }
>
> +/* Return true when op0>= op1.
> +   For example:
> +   Where, op0 = ~start_3(D);
> +   op1 = start_3(D) <= stop_6(D) ? stop_6(D) - start_3(D) : 0;
> +   In this case op0 = UINT_MAX - start_3(D);
> +   So, op0>= op1 in all cases because UINT_MAX>= stop_6(D),
> +   when TREE_TYPE(stop_6(D)) == unsigned int;  */
> +bool
> +fold_binary_cond_p (enum tree_code code, tree type, tree op0, tree op1)
> +{
> +  gcc_assert (type == boolean_type_node);
> +
> +  if (TREE_TYPE (op0) != TREE_TYPE (op1))
> +return false;
> +
> +  /* TO

Re: [patch, testsuite, ARM] don't try to execute advsimd-intrinsics tests on hardware without NEON

2015-05-21 Thread Sandra Loosemore

On 05/21/2015 03:48 AM, Christophe Lyon wrote:

On 21 May 2015 at 07:33, Sandra Loosemore  wrote:

ARM testing shares the AArch64 advsimd-intrinsics execution tests.  On ARM,
though, the NEON support being tested is optional -- some arches are
compatible with the NEON compilation options but hardware available for
testing might or might not be able to execute those instructions. In
arm-none-eabi testing of a long list of multilibs, I found that this problem
caused some of the multilibs to get stuck for days because every one of
these execution tests was wandering off into the weeds and timing out.

The vect.exp tests already handle this by setting dg-do-what-default to
either "run" or "compile", depending on whether we have target hardware
execution support (arm_neon_hw) for NEON, or only compilation support
(arm_neon_ok).  So, I've adapted that logic for advsimd-intrinsics.exp too.


Indeed it makes sense.



It also appeared that the main loop over the test cases was running them all
twice with the torture options -- once using c-torture-execute and once
using gcc-dg-runtest.  I deleted the former since it appears to ignore
dg-do-what-default and always tries to execute no matter what.  My dejagnu-fu
isn't the strongest and this is pretty confusing to me; am I missing
something here?  Otherwise, OK to commit?


As noted by Alan in https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01348.html
the sets of options covered by gcc-dg-runtest and c-torture-execute
are slightly different.

That was the reason I kept both.
We can probably live with no longer testing "-Og -g" as Alan says.
OTOH, are the 2 option sets supposed to be the same, or are there any
plans to make them differ substantially in the future?


Richard, adding "-Og -g" was your change:

https://gcc.gnu.org/ml/gcc-patches/2012-09/msg01367.html

Is it an oversight that the torture option lists in c-torture.exp, 
objc-torture.exp, and gcc-dg.exp are not consistent?  Maybe we should 
have a separate patch to unify them?


-Sandra





Re: Fix hashing of basetypes of methods

2015-05-21 Thread Richard Biener
On May 21, 2015 6:02:27 PM GMT+02:00, Jan Hubicka  wrote:
>Hi,
>this patch drops TYPE_METHOD_BASETYPE from hash_canonical_type.  It is
>not
>compared by gimple_canonical_types_compatible_p and thus it can only
>corrupt the hashtable by having two entries that are equal but having
>different
>hash.
>
>Theoretically we may want later distinguish the method pointer by
>basetypes,
>but the THIS pointer has proper type anyway and this all makes sense
>only
>if we start handling pointers properly. Definitely we will need to do
>that
>in both functions, not only here.
>
>Bootstrapped/regtested x86_64-linux, OK?

OK 

Richard.

>Honza
>
>   * lto.c (hash_canonical_type): Do not hash TYPE_METHOD_BASETYPE.
>Index: lto/lto.c
>===
>--- lto/lto.c  (revision 223490)
>+++ lto/lto.c  (working copy)
>@@ -372,10 +376,6 @@ hash_canonical_type (tree type)
>   unsigned na;
>   tree p;
> 
>-  /* For method types also incorporate their parent class.  */
>-  if (TREE_CODE (type) == METHOD_TYPE)
>-  iterative_hash_canonical_type (TYPE_METHOD_BASETYPE (type), hstate);
>-
>   iterative_hash_canonical_type (TREE_TYPE (type), hstate);
> 
>   for (p = TYPE_ARG_TYPES (type), na = 0; p; p = TREE_CHAIN (p))




Re: Do not compare type attributes in gimple_canonical_types_compatible_p

2015-05-21 Thread Richard Biener
On May 21, 2015 6:06:18 PM GMT+02:00, Jan Hubicka  wrote:
>Hi,
>this patch removes call to comp_type_attributes (wich happens for
>METHOD_TYPE and FUNCTION_TYPE only). This does not make sense, because
>type attributes may change in variants and pointers should be
>considered
>compatible.
>
>We did not get any trouble from this only because we do not really use
>canonical types of functions for anything.
>
>Bootstrapped/regtested x86_64-linux, OK?

OK.

Richard.


>Honza
>
>   * tree.c (gimple_canonical_types_compatible_p) Do not compare
>   type attributes.
>   (verify_type): Drop METHOD_TYPE FIXME; update FUNCTION_TYPE FIXME.
>Index: tree.c
>===
>--- tree.c (revision 223490)
>+++ tree.c (working copy)
>@@ -12837,9 +12837,6 @@ gimple_canonical_types_compatible_p (con
>   trust_type_canonical))
>   return false;
> 
>-  if (!comp_type_attributes (t1, t2))
>-  return false;
>-
>   if (TYPE_ARG_TYPES (t1) == TYPE_ARG_TYPES (t2))
>   return true;
>   else
>@@ -12939,10 +12936,9 @@ verify_type (const_tree t)
>/* Method and function types can not be used to address memory and thus
>TYPE_CANONICAL really matters only for determining useless conversions.
> 
>- FIXME: C++ FE does not agree with
>gimple_canonical_types_compatible_p
>- here.  gimple_canonical_types_compatible_p calls
>comp_type_attributes
>- while for C++ FE the attributes does not make difference.  */
>-  else if (TREE_CODE (t) == FUNCTION_TYPE || TREE_CODE (t) ==
>METHOD_TYPE)
>+ FIXME: C++ FE produce declarations of builtin functions that are
>not
>+ compatible with main variants.  */
>+  else if (TREE_CODE (t) == FUNCTION_TYPE)
> ;
>   else if (t != ct
>  /* FIXME: gimple_canonical_types_compatible_p can not compare types




[AArch64][TLSLE][5/N] Recognize -mtls-size

2015-05-21 Thread Jiong Wang

This patch adds a -mtls-size option for AArch64. The option lets the user
exercise finer control over code generation for the various TLS models on
AArch64.

For example, for TLS LE, the user can specify a smaller tls-size, for
example 4K (which is quite common), to let the AArch64 backend generate
more efficient instruction sequences.

Currently, -mtls-size accepts any integer and translates it into 12 (4K),
24 (16M), 32 (4G) or 48 (256TB) based on the value.

No functional change.

ok for trunk?

2015-05-20  Jiong Wang  

gcc/
  * config/aarch64/aarch64.opt (mtls-size): New entry.
  * config/aarch64/aarch64.c (initialize_aarch64_tls_size): New function.
  * doc/invoke.texi (AArch64 Options): Document -mtls-size.
  
-- 
Regards,
Jiong

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 55b166c..e6aa0e1 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6835,6 +6835,7 @@ aarch64_add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind,
 }
 
 static void initialize_aarch64_code_model (void);
+static void initialize_aarch64_tls_size (void);
 
 /* Parse the architecture extension string.  */
 
@@ -7068,6 +7069,7 @@ aarch64_override_options (void)
 #endif
 
   initialize_aarch64_code_model ();
+  initialize_aarch64_tls_size ();
 
   aarch64_build_bitmask_table ();
 
@@ -7173,6 +7175,36 @@ initialize_aarch64_code_model (void)
  aarch64_cmodel = aarch64_cmodel_var;
 }
 
+/* A checking mechanism for the implementation of the tls size.  */
+
+static void
+initialize_aarch64_tls_size (void)
+{
+  switch (aarch64_cmodel_var)
+{
+case AARCH64_CMODEL_TINY:
+  /* The maximum TLS size allowed under tiny is 1M.  */
+  if (aarch64_tls_size > 20)
+	aarch64_tls_size = 20;
+  break;
+case AARCH64_CMODEL_SMALL:
+  /* The maximum TLS size allowed under small is 4G.  */
+  if (aarch64_tls_size > 32)
+	aarch64_tls_size = 32;
+  break;
+case AARCH64_CMODEL_LARGE:
+  /* The maximum TLS size allowed under large is 16E.
+	 FIXME: 16E should be 64bit, we only support 48bit offset now.  */
+  if (aarch64_tls_size > 48)
+	aarch64_tls_size = 48;
+  break;
+default:
+  gcc_unreachable ();
+}
+
+  return;
+}
+
 /* Return true if SYMBOL_REF X binds locally.  */
 
 static bool
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 6d72ac2..e87a1f5 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -95,6 +95,11 @@ mtls-dialect=
 Target RejectNegative Joined Enum(tls_type) Var(aarch64_tls_dialect) Init(TLS_DESCRIPTORS)
 Specify TLS dialect
 
+mtls-size=
+Target RejectNegative Joined UInteger Var(aarch64_tls_size) Init(24)
+Specifies size of the TLS data area, default size is 16M. Accept any integer, but the value
+will be transformed into 12(4K), 24(16M), 32(4G), 48(256TB)
+
 march=
 Target RejectNegative ToLower Joined Var(aarch64_arch_string)
 -march=ARCH	Use features of architecture ARCH
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 117b5d9..1f96a4f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -513,6 +513,7 @@ Objective-C and Objective-C++ Dialects}.
 -mstrict-align @gol
 -momit-leaf-frame-pointer  -mno-omit-leaf-frame-pointer @gol
 -mtls-dialect=desc  -mtls-dialect=traditional @gol
+-mtls-size=@var{size} @gol
 -mfix-cortex-a53-835769  -mno-fix-cortex-a53-835769 @gol
 -mfix-cortex-a53-843419  -mno-fix-cortex-a53-843419 @gol
 -march=@var{name}  -mcpu=@var{name}  -mtune=@var{name}}
@@ -12390,6 +12391,13 @@ of TLS variables.  This is the default.
 Use traditional TLS as the thread-local storage mechanism for dynamic accesses
 of TLS variables.
 
+@item -mtls-size=@var{size}
+@opindex mtls-size
+Specify the size of TLS area. You can specify smaller value to get better code
+generation for TLS variable access. Currently, we accept any integer, but will
+turn them into 12(4K), 24(16M), 32(4G), 48(256TB) according to the integer
+value.
+
 @item -mfix-cortex-a53-835769
 @itemx -mno-fix-cortex-a53-835769
 @opindex mfix-cortex-a53-835769


[gomp4.1] Add taskloop-4.c testcase

2015-05-21 Thread Jakub Jelinek
Hi!

I've committed another testcase, which tests the computation of
number of iterations for each task and number of tasks.

2015-05-21  Jakub Jelinek  

* testsuite/libgomp.c/taskloop-4.c: New test.

--- libgomp/testsuite/libgomp.c/taskloop-4.c(revision 0)
+++ libgomp/testsuite/libgomp.c/taskloop-4.c(working copy)
@@ -0,0 +1,97 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fopenmp" } */
+
+int u[64], v;
+
+__attribute__((noinline, noclone)) int
+test (int a, int b, int c, int d, void (*fn) (int, int, int, int),
+  int *num_tasks, int *min_iters, int *max_iters)
+{
+  int i, t = 0;
+  __builtin_memset (u, 0, sizeof u);
+  v = 0;
+  fn (a, b, c, d);
+  *min_iters = 0;
+  *max_iters = 0;
+  *num_tasks = v;
+  if (v)
+{
+  *min_iters = u[0];
+  *max_iters = u[0];
+  t = u[0];
+  for (i = 1; i < v; i++)
+   {
+ if (*min_iters > u[i])
+   *min_iters = u[i];
+ if (*max_iters < u[i])
+   *max_iters = u[i];
+ t += u[i];
+   }
+}
+  return t;
+}
+
+void
+grainsize (int a, int b, int c, int d)
+{
+  int i, j = 0, k = 0;
+  #pragma omp taskloop firstprivate (j, k) grainsize(d)
+  for (i = a; i < b; i += c)
+{
+  if (j == 0)
+   {
+ #pragma omp atomic capture
+   k = v++;
+ if (k >= 64)
+   __builtin_abort ();
+   }
+  u[k] = ++j;
+}
+}
+
+void
+num_tasks (int a, int b, int c, int d)
+{
+  int i, j = 0, k = 0;
+  #pragma omp taskloop firstprivate (j, k) num_tasks(d)
+  for (i = a; i < b; i += c)
+{
+  if (j == 0)
+   {
+ #pragma omp atomic capture
+   k = v++;
+ if (k >= 64)
+   __builtin_abort ();
+   }
+  u[k] = ++j;
+}
+}
+
+int
+main ()
+{
+  #pragma omp parallel
+#pragma omp single
+  {
+   int min_iters, max_iters, ntasks;
+   /* If grainsize is present, # of task loop iters is >= grainsize && < 2 
* grainsize,
+  unless # of loop iterations is smaller than grainsize.  */
+   if (test (0, 79, 1, 17, grainsize, &ntasks, &min_iters, &max_iters) != 
79
+   || min_iters < 17 || max_iters >= 17 * 2)
+ __builtin_abort ();
+   if (test (-49, 2541, 7, 28, grainsize, &ntasks, &min_iters, &max_iters) 
!= 370
+   || min_iters < 28 || max_iters >= 28 * 2)
+ __builtin_abort ();
+   if (test (7, 21, 2, 15, grainsize, &ntasks, &min_iters, &max_iters) != 7
+   || ntasks != 1 || min_iters != 7 || max_iters != 7)
+ __builtin_abort ();
+   /* If num_tasks is present, # of task loop iters is min (# of loop 
iters, num_tasks).  */
+   if (test (-51, 2500, 48, 9, num_tasks, &ntasks, &min_iters, &max_iters) 
!= 54
+   || ntasks != 9)
+ __builtin_abort ();
+   if (test (0, 25, 2, 17, num_tasks, &ntasks, &min_iters, &max_iters) != 
13
+   || ntasks != 13)
+ __builtin_abort ();
+  }
+  return 0;
+}

Jakub


[AArch64][TLSLE][N/N] Implement local executable mode for all memory model

2015-05-21 Thread Jiong Wang

Four instruction sequences can be implemented for the AArch64 TLS LE model,
based on the relocations provided.

These instruction sequences are the same for tiny/small/large; we just need
to choose the most efficient one to use according to the TLS size.

The 12-bit version gives us a 4K TLS size, 24-bit gives us 16M, 32-bit gives
us 4G, while 48-bit gives us 256TB.

sequence 1
==
  add  t0, tp, #:tprel_lo12:x1   R_AARCH64_TLSLE_ADD_TPREL_LO12   x1


sequence 2
==
  add  t0, tp, #:tprel_hi12:x1, lsl #12  R_AARCH64_TLSLE_ADD_TPREL_HI12   x2
  add  t0, #:tprel_lo12_nc:x1R_AARCH64_TLSLE_ADD_TPREL_LO12_NCx2

sequence 3
==
  movz t0, #:tprel_g1:x3 R_AARCH64_TLSLE_MOVW_TPREL_G1x3
  movk t0, #:tprel_g0_nc:x3  R_AARCH64_TLSLE_MOVW_TPREL_G0_NC x3
  add  t0, tp, t0

sequence 4
==
  movz t0, #:tprel_g2:x4 R_AARCH64_TLSLE_MOVW_TPREL_G2x4
  movk t0, #:tprel_g1_nc:x4  R_AARCH64_TLSLE_MOVW_TPREL_G1_NC x4
  movk t0, #:tprel_g0_nc:x4  R_AARCH64_TLSLE_MOVW_TPREL_G0_NC x4
  add  t0, t0, tp

OK for trunk?

2015-05-14  Jiong Wang  
gcc/
  * config/aarch64/aarch64.c (aarch64_print_operand): Support tls_size.
  * config/aarch64/aarch64.md (tlsle): Choose proper instruction
  sequences.
  (tlsle_): New define_insn.
  (tlsle_movsym_): Ditto.
  * config/aarch64/constraints.md (Uta): New constraint.
  (Utb): Ditto.
  (Utc): Ditto.
  (Utd): Ditto.

gcc/testsuite/
  * gcc.target/aarch64/tlsle.c: New test source.
  * gcc.target/aarch64/tlsle12.c: New testcase.
  * gcc.target/aarch64/tlsle24.c: New testcase.
  * gcc.target/aarch64/tlsle32.c: New testcase.
-- 
Regards,
Jiong

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e6aa0e1..569f22d 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4450,7 +4450,11 @@ aarch64_print_operand (FILE *f, rtx x, char code)
 	  break;
 
 	case SYMBOL_TLSLE:
-	  asm_fprintf (asm_out_file, ":tprel_lo12_nc:");
+	  if (aarch64_tls_size <= 12)
+	/* Make sure TLS offset fit into 12bit.  */
+	asm_fprintf (asm_out_file, ":tprel_lo12:");
+	  else
+	asm_fprintf (asm_out_file, ":tprel_lo12_nc:");
 	  break;
 
 	case SYMBOL_TINY_GOT:
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b1425a3..8b061ba 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4304,21 +4304,60 @@
   ""
 {
   machine_mode mode = GET_MODE (operands[0]);
-  emit_insn ((mode == DImode
-	  ? gen_tlsle_di
-	  : gen_tlsle_si) (operands[0], operands[1], operands[2]));
+  rtx (*gen_tlsle_si_special) (rtx , rtx , rtx);
+  rtx (*gen_tlsle_di_special) (rtx , rtx , rtx);
+
+  /* For tls offset <=24, utilize add to load 12bit offset.  */
+  if (aarch64_tls_size <= 24)
+{
+  gen_tlsle_si_special = gen_tlsle_si;
+  gen_tlsle_di_special = gen_tlsle_di;
+
+  emit_insn ((mode == DImode
+		  ? gen_tlsle_di_special
+		  : gen_tlsle_si_special) (operands[0], operands[1],
+	   operands[2]));
+  DONE;
+}
+  /* Load the sym's offset into operands[0].  */
+  else if (aarch64_tls_size <= 48)
+emit_insn (mode == DImode
+	   ? gen_tlsle_movsym_di (operands[0], operands[2])
+	   : gen_tlsle_movsym_si (operands[0], operands[2]));
+  else
+gcc_unreachable ();
+
+  /* Add base address from tp.  */
+  emit_insn (mode == DImode
+	 ? gen_adddi3 (operands[0], operands[0], operands[1])
+	 : gen_addsi3 (operands[0], operands[0], operands[1]));
+
   DONE;
 })
 
 (define_insn "tlsle_"
-  [(set (match_operand:P 0 "register_operand" "=r")
-(unspec:P [(match_operand:P 1 "register_operand" "r")
-   (match_operand 2 "aarch64_tls_le_symref" "S")]
+  [(set (match_operand:P 0 "register_operand" "=r, r")
+(unspec:P [(match_operand:P 1 "register_operand" "r, r")
+   (match_operand 2 "aarch64_tls_le_symref" "Uta, Utb")]
 		   UNSPEC_TLSLE))]
   ""
-  "add\\t%0, %1, #%G2, lsl #12\;add\\t%0, %0, #%L2"
-  [(set_attr "type" "alu_sreg")
-   (set_attr "length" "8")]
+  "@
+   add\\t%0, %1, #%L2
+   add\\t%0, %1, #%G2, lsl #12\;add\\t%0, %0, #%L2"
+  [(set_attr "type" "alu_sreg, multiple")
+   (set_attr "length" "4, 8")]
+)
+
+(define_insn "tlsle_movsym_"
+  [(set (match_operand:P 0 "register_operand" "=r, r")
+(unspec:P [(match_operand 1 "aarch64_tls_le_symref" "Utc, Utd")]
+		   UNSPEC_TLSLE))]
+  ""
+  "@
+   movz\\t%0, #:tprel_g1:%1\;movk\\t%0, #:tprel_g0_nc:%1
+   movz\\t%0, #:tprel_g2:%1\;movk\\t%0, #:tprel_g1_nc:%1\;movk\\t%0, #:tprel_g0_nc:%1"
+  [(set_attr "type" "multiple, multiple")
+   (set_attr "length" "8, 12")]
 )
 
 (define_insn "tlsdesc_small_"
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index 5b189ea..58fe082 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -109,6 +109,30 @@
   A constraint that matches the immediate constant 

Re: [PATCH 1/7] always define STACK_GROWS_DOWNWARD

2015-05-21 Thread Joseph Myers
This patch needs to update tm.texi.in and regenerate tm.texi to describe 
the new semantics of STACK_GROWS_DOWNWARD.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][DRIVER] Wrong C++ include paths when configuring with "--with-sysroot=/"

2015-05-21 Thread Joseph Myers
On Thu, 21 May 2015, Yvan Roux wrote:

> There is this old patch submitted by Matthias on that same issue, if
> its logic is the right one for you Joseph I can rebase/validate it
> Joseph.
> 
> https://gcc.gnu.org/ml/gcc-patches/2012-02/msg00320.html

Yes, that seems better.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][ARM] Handle UNSPEC_VOLATILE in rtx costs and don't recurse inside the unspec

2015-05-21 Thread Kyrill Tkachov

Ping^3.

Thanks,
Kyrill
On 12/05/15 10:08, Kyrill Tkachov wrote:

Ping^2.

Thanks,
Kyrill
On 30/04/15 13:01, Kyrill Tkachov wrote:

Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01047.html

Thanks,
Kyrill

On 20/04/15 17:28, Kyrill Tkachov wrote:

Hi all,

A pet project of mine is to get to the point where backend rtx costs functions
won't have to handle rtxes that don't match down to any patterns/expanders we
have. Or at least limit such cases.
A case dealt with in this patch is QImode PLUS. We don't actually generate or
handle these anywhere in the arm backend *except* in sync.md where, for
example, atomic_ matches:
(set (match_operand:QHSD 0 "mem_noofs_operand" "+Ua")
(unspec_volatile:QHSD
  [(syncop:QHSD (match_dup 0)
 (match_operand:QHSD 1 "" ""))
   (match_operand:SI 2 "const_int_operand")];; model
  VUNSPEC_ATOMIC_OP))

Here QHSD can contain QImode and HImode while syncop can be PLUS.
Now, immediately during splitting in arm_split_atomic_op we convert that
QImode PLUS into an SImode one, so we never actually generate any kind of
QImode add operations (how would we? we don't have define_insns for such
things), but the RTL optimisers will get hold of the UNSPEC_VOLATILE in the
meantime and ask for its cost (for example, cse when building libatomic).
Currently we don't handle UNSPEC_VOLATILE (VUNSPEC_ATOMIC_OP), so the arm rtx
costs function just recurses into the QImode PLUS, which I'd like to avoid.
This patch stops that by passing the VUNSPEC_ATOMIC_OP into arm_unspec_cost
and handling it there (very straightforwardly, just returning COSTS_N_INSNS
(2); there's no indication that we want to do anything smarter here) and
stopping the recursion.

This is a small step in the direction of not having to care about obviously
useless rtxes in the backend.
The astute reader might notice that in sync.md we also have the pattern
atomic_fetch_ which expands to/matches this:
(set (match_operand:QHSD 0 "s_register_operand" "=&r")
(match_operand:QHSD 1 "mem_noofs_operand" "+Ua"))
   (set (match_dup 1)
(unspec_volatile:QHSD
  [(syncop:QHSD (match_dup 1)
 (match_operand:QHSD 2 "" ""))
   (match_operand:SI 3 "const_int_operand")];; model
  VUNSPEC_ATOMIC_OP))


Here the QImode PLUS is in a PARALLEL together with the UNSPEC, so it might
have rtx costs called on it as well. This will always be a (plus (reg) (mem))
rtx, which is unlike any other normal rtx we generate in the arm backend. I'll
try to get a patch to handle that case, but I'm still thinking about how best
to do that.

Tested on arm-none-eabi; I didn't see any codegen differences in some compiled
codebases.

Ok for trunk?

P.S. I know that expmed creates all kinds of irregular rtxes and asks for
their costs. I'm hoping to clean that up at some point...

2015-04-20  Kyrylo Tkachov  

* config/arm/arm.c (arm_new_rtx_costs): Handle UNSPEC_VOLATILE.
(arm_unspec_cost): Allow UNSPEC_VOLATILE.  Do not recurse inside
unknown unspecs.




Re: [PATCH][ARM/AArch64] Properly cost rev16 operand

2015-05-21 Thread Kyrill Tkachov

Ping.

Thanks,
Kyrill
On 12/05/15 10:07, Kyrill Tkachov wrote:

Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00013.html

Thanks,
Kyrill
On 01/05/15 09:24, Kyrill Tkachov wrote:

Hi all,

It occurs to me that in the IOR-of-shifts form of the rev16 operation we
should be costing the operand properly. For that we'd want to reuse the
aarch_rev16_p function that does all the heavy lifting and get it to write out
the innermost operand of the rev16 for further costing. In the process we
relax that function a bit to accept any rtx as the operand, not just REGs, so
that we can calculate the cost of moving them into a register appropriately.

This patch does just that and updates the arm and aarch64 callsites 
appropriately so that the operands are
processed properly.

In practice I don't expect this to make much difference since this pattern
occurs rarely anyway, but it seems like the 'right thing to do' (TM).

Bootstrapped and tested on arm,aarch64.

Ok for trunk?

Thanks,
Kyrill

2015-05-01  Kyrylo Tkachov  

   * config/arm/aarch-common-protos.h (aarch_rev16_p): Update signature
   to take a second argument.
   * config/arm/aarch-common.c (aarch_rev16_p): Add second argument.
   Write inner-most rev16 argument to it if recognised.
   (aarch_rev16_p_1): Likewise.
   * config/arm/arm.c (arm_new_rtx_costs): Properly cost rev16 operand
   in the IOR case.
   * config/aarch64/aarch64.c (aarch64_rtx_costs): Likewise.




Re: [PATCH][ARM][stage-1] Initialise cost to COSTS_N_INSNS (1) and increment in arm rtx costs

2015-05-21 Thread Kyrill Tkachov

Ping^3.

Thanks,
Kyrill

On 12/05/15 10:09, Kyrill Tkachov wrote:

Ping^2.

Thanks,
Kyrill

On 30/04/15 13:00, Kyrill Tkachov wrote:

Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01130.html

Thanks,
Kyrill

On 21/04/15 10:11, Kyrill Tkachov wrote:

Hi all,

This is the first of a series to clean up and simplify the arm rtx costs 
function.
This patch initialises the cost to COSTS_N_INSNS (1) at the top and increments 
it when appropriate
in the rest of the function. This makes it more similar to the aarch64 rtx 
costs function and saves
us the trouble of having to remember to initialise the cost to COSTS_N_INSNS 
(1) in each case of the
switch statement.

Bootstrapped and tested arm-none-linux-gnueabihf.
Compiled some large programs with no codegen difference, except some DIV 
synthesis algorithms were changed,
presumably due to the cost of SDIV/UDIV, which is now being correctly 
calculated (before it was missing the
baseline COSTS_N_INSNS (1)).

Ok for trunk?

Thanks,
Kyrill

2015-04-21  Kyrylo Tkachov  

* config/arm/arm.c (arm_new_rtx_costs): Initialise cost to
COSTS_N_INSNS (1) and increment it appropriately throughout the
function.




Re: [PATCH] PR target/66232: -fPIC -fno-plt -mx32 fails to generate indirect branch via GOT

2015-05-21 Thread H.J. Lu
On Thu, May 21, 2015 at 6:33 AM, H.J. Lu  wrote:
> On Thu, May 21, 2015 at 6:11 AM, Uros Bizjak  wrote:
>> On Thu, May 21, 2015 at 2:59 PM, H.J. Lu  wrote:
>>> X32 doesn't support indirect branch via 32-bit memory slot since
>>> indirect branch will load 64-bit address from 64-bit memory slot.
>>> Since x32 GOT slot is 64-bit, we should allow indirect branch via GOT
>>> slot for x32.
>>>
>>> I am testing it on x32.  OK for master if there is no regression?
>>>
>>> Thanks.
>>>
>>>
>>> H.J.
>>> --
>>> gcc/
>>>
>>> PR target/66232
>>> * config/i386/constraints.md (Bg): Add a constraint for x32
>>> call and sibcall memory operand.
>>> * config/i386/i386.md (*call_x32): New pattern.
>>> (*sibcall_x32): Likewise.
>>> (*call_value_x32): Likewise.
>>> (*sibcall_value_x32): Likewise.
>>> * config/i386/predicates.md (x32_sibcall_memory_operand): New
>>> predicate.
>>> (x32_call_insn_operand): Likewise.
>>> (x32_sibcall_insn_operand): Likewise.
>>>
>>> gcc/testsuite/
>>>
>>> PR target/66232
>>> * gcc.target/i386/pr66232-1.c: New test.
>>> * gcc.target/i386/pr66232-2.c: Likewise.
>>> * gcc.target/i386/pr66232-3.c: Likewise.
>>> * gcc.target/i386/pr66232-4.c: Likewise.
>>
>> OK.
>>
>> maybe you should use match_code some more in x32_sibcall_memory_operand, e.g.
>>
>> (match_code "constant" "0")
>> (match_code "unspec" "00")
>>
>> But it is up to you, since XINT doesn't fit in this scheme...
>>
>
>>>
>>> +;; Return true if OP is a memory operand that can be used in x32 calls
>>> +;; and sibcalls.  Only the 64-bit GOT slot is allowed.
>>> +(define_predicate "x32_sibcall_memory_operand"
>>> +  (and (match_operand 0 "memory_operand")
>>> +   (match_test "CONSTANT_P (XEXP (op, 0))")
>>> +   (match_test "GET_CODE (XEXP (XEXP (op, 0), 0)) == UNSPEC")
>>> +   (match_test "XINT (XEXP (XEXP (op, 0), 0), 1) == UNSPEC_GOTPCREL")))
>>> +
>
> Since "match_code" doesn't support "constant" neither
>
> #define CONSTANT_P(X)   \
>   (GET_RTX_CLASS (GET_CODE (X)) == RTX_CONST_OBJ)
>
> I will keep it asis.

Here is the updated patch.  It limits the memory operand to the GOT slot
only.  It uses a single pattern to cover both call and sibcall since only the
GOT slot is allowed.

OK for master if there is no regression?

Thanks.


-- 
H.J.
---
X32 doesn't support indirect branch via 32-bit memory slot since
indirect branch will load 64-bit address from 64-bit memory slot.
Since x32 GOT slot is 64-bit, we should allow indirect branch via GOT
slot for x32.

gcc/

PR target/66232
* config/i386/constraints.md (Bg): Add a constraint for x32
call and sibcall memory operand.
* config/i386/i386.md (*call_got_x32): New pattern.
(*call_value_got_x32): Likewise.
* config/i386/predicates.md (x32_call_got_memory_operand): New
predicate.
(x32_call_insn_got_operand): Likewise.

gcc/testsuite/

PR target/66232
* gcc.target/i386/pr66232-1.c: New test.
* gcc.target/i386/pr66232-2.c: Likewise.
* gcc.target/i386/pr66232-3.c: Likewise.
* gcc.target/i386/pr66232-4.c: Likewise.
* gcc.target/i386/pr66232-5.c: Likewise.
From 8eb3d88948e95b25ae889fcec5502a2a8dba6347 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 21 May 2015 05:50:14 -0700
Subject: [PATCH] Allow indirect branch via GOT slot for x32

X32 doesn't support indirect branch via 32-bit memory slot since
indirect branch will load 64-bit address from 64-bit memory slot.
Since x32 GOT slot is 64-bit, we should allow indirect branch via GOT
slot for x32.

gcc/

	PR target/66232
	* config/i386/constraints.md (Bg): Add a constraint for x32
	call and sibcall memory operand.
	* config/i386/i386.md (*call_got_x32): New pattern.
	(*call_value_got_x32): Likewise.
	* config/i386/predicates.md (x32_call_got_memory_operand): New
	predicate.
	(x32_call_insn_got_operand): Likewise.

gcc/testsuite/

	PR target/66232
	* gcc.target/i386/pr66232-1.c: New test.
	* gcc.target/i386/pr66232-2.c: Likewise.
	* gcc.target/i386/pr66232-3.c: Likewise.
	* gcc.target/i386/pr66232-4.c: Likewise.
	* gcc.target/i386/pr66232-5.c: Likewise.
---
 gcc/config/i386/constraints.md|  6 ++
 gcc/config/i386/i386.md   | 20 
 gcc/config/i386/predicates.md | 14 ++
 gcc/testsuite/gcc.target/i386/pr66232-1.c | 13 +
 gcc/testsuite/gcc.target/i386/pr66232-2.c | 14 ++
 gcc/testsuite/gcc.target/i386/pr66232-3.c | 13 +
 gcc/testsuite/gcc.target/i386/pr66232-4.c | 13 +
 gcc/testsuite/gcc.target/i386/pr66232-5.c | 16 
 8 files changed, 109 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-5.c

diff --git a/gcc/config/i386/constra

Re: Check canonical types in verify_type

2015-05-21 Thread Jan Hubicka
> > Hmm, I see, interesting hack.  For the first part of comment, I see that
> > qualifiers needs to be ignored, but I do not see why we put
> > short * and int * pointers to same class.
> 
> For the reason that people are very lazy.  For example GCC has code
> punning void ** and void * and void * to Type * (and vice versa).  People
> just don't expect pointers to be in different alias sets (and there is
> little gained with doing that).

I think this observation may be out of date.  Removing that code (punting all
pointers to ptr_type_node) increases the number of TBAA disambiguations for
Firefox by 20%. It also drops the number of queries, so in reality the number
is higher.

So it seems it may be worth tracking this in a more sensible way.  This was
tested with LTO, where most pointer alias sets are the same anyway, so the
increase should be bigger in a non-LTO build. I will gather some stats.

The issue is that modern C++ code packs everything into instances, puts
pointers to instances everywhere and, as we inline everything into one glob,
we could track a lot of data if we did not get lost in pointers.

We may want to have a flag disabling that, and we may want to throw away some
precision without throwing it all away.

I tried just throwing away the qualifier from pointers (i.e. making a pointer
to a qualified type be globbed to the pointer to the unqualified type). This
seems to work: GCC bootstraps with one bug (in ipa-icf) and the testsuite
passes.
> 
> > The complete/incomplete type fun with LTO can be solved for C++ but indeed I
> > see why pointer to incomplete type needs to be considered compatible with 
> > every
> > other pointer to structure.  Can't this be dealt with by adding correct 
> > edges
> > into the aliasing dag?
> 
> I don't think so, without the dag degenerating.  We've had this mess
> before (complete vs. incomplete structs and so).  It didn't work very 
> well.

Can you point me to the code/issues, please?
> 
> I don't remember if I added a testcase, so no.  It's just very clear to me
> that the look of pointer members may be very different (consider 
> reference vs. pointer or pointer to array vs pointer to element).

Yep, depending on how many cross-language data structures we want to permit.
We could definitely unpeel those differences just like I did with qualifiers.
However, this trick is kind of weird because it does not propagate up to
aggregates built from these.

In a way, some of this should probably be handled better at canonical type
calculation time, if we really want to permit mixing up aggregates with those
differences.

For example, I think it would make sense to look through ARRAY_TYPEs when
determining a pointer's canonical type (so that pointer to array and pointer
to element are the same).  We may also ignore the TREE_CODE match for
POINTER_TYPE_P if we really do have languages that mix them.

Reference types are used by Ada and Fortran (not Java), so it depends on how
these interface to C. I suppose you are right that these ought to be
compatible with C pointers.  I will dig into the respective language
standards.
> 
> > > > Other issue I run into is that for Ada bootstrap we have variadic type 
> > > > whose
> > > > canonical types are having different temporary set as size.  I think 
> > > > this is
> > > > valid and perhaps gimple_canonical_types_compatible_p should consider
> > > > variadic arrays to be compatible with any array of compatible type?
> > > 
> > > Those are all local types and thus the strict equality compare should be
> > > ok?  Not sure if we can do in C sth like
> > > 
> > > void foo (int n)
> > > {
> > >   struct { int i; int a[n]; } a;
> > >   struct { int i; int a[n]; } b;
> > > }
> > > 
> > > and if we'd have to assign the same alias-sets to 'a' and 'b'.
> > 
> > No idea here either. I wonder if the types are intedned to be TBAA 
> > compatible
> > across two calls of the function.  In that case we may introduce multiple 
> > copies
> > of body by early inlining already that may be a problem.
> 
> No idea.  Well, the canonical type machinery was supposed to be 
> conservative but here we're obviously not 100% conservative.

I will look into producing a testcases and lets see.
> 
> > > > I am not quite convinced we get variadic types right at LTO time, 
> > > > because
> > > > they bypass canonical type calculation anyway and their canonical type
> > > > is set by TYPE_MAIN_VARIANT in lto_read_body_or_constructor which I 
> > > > think
> > > > is not safe.  I will look for a testcase.
> > > 
> > > That is because if they are streamed locally they do not enter type
> > > merging, but they still go via gimple_register_canonical_type, so I'm not
> > > sure where you see they always get their main variant as canonical type.
> > 
> > I tought these are handled by:
> >   /* And fixup types we streamed locally.  */
> > {
> >   struct streamer_tree_cache_d *cache = data_in->reader_cache;
> >   unsigned len = cache->nodes.length ();
> > 

Re: PING^3: [PATCH]: New configure options that make the compiler use -fPIE and -pie as default option

2015-05-21 Thread Joseph Myers
On Thu, 21 May 2015, Rainer Orth wrote:

> @@ -1864,6 +1873,12 @@ libgcc.mvars: config.status Makefile specs 
> xgcc$(exeext)
>   echo GCC_CFLAGS = '$(GCC_CFLAGS)' >> tmp-libgcc.mvars
>   echo INHIBIT_LIBC_CFLAGS = '$(INHIBIT_LIBC_CFLAGS)' >> tmp-libgcc.mvars
>   echo TARGET_SYSTEM_ROOT = '$(TARGET_SYSTEM_ROOT)' >> tmp-libgcc.mvars
> + if test @enable_default_pie@ = yes; then \
> +   NO_PIE_CFLAGS="-fno-PIE"; \
> 
> Why literal -fno-PIE instead of @NO_PIE_CFLAGS@?

Because this is for the target, but @NO_PIE_CFLAGS@ is for the host.

-- 
Joseph S. Myers
jos...@codesourcery.com


Avoid non-canonical RTL in splitter for adding a large constant to register on the PA

2015-05-21 Thread Jeff Law


The PA has a splitter to optimize the addition of certain constants to a
register.  One of the cases the splitter handles is a constant that requires
two insns to generate, but is divisible by 2, 4, or 8 and, once divided by
2, 4, or 8, needs only a single insn to generate.


Obviously the splitter reduces the constant by a scaling factor and 
loads that value, then uses a shift-add insn to scale it back up for the 
addition to the other operand.  That saves us one insn.


That splitter generates the non-canonical MULT form of a shadd.  This 
patch changes it to use ASHIFT form instead.  This fixes all the 
regressions seen in my testcase of 300+ files when I remove the 
non-canonical shift-add using MULT patterns.


We still have other places that can produce the non-canonical form, 
they're just not triggering in that suite of 300+ files.  So I'm not 
removing the non-canonical shift-add using MULT patterns just yet.


Tested on hppa.exp and the 300 files noted above.  Installed on the trunk.
commit 147771b2a45499d118e4c68a7953881f182e1b97
Author: Jeff Law 
Date:   Thu May 21 11:01:59 2015 -0600

* config/pa/pa.md (add-with-constant splitter): Use ASHIFT rather
than MULT for shadd sequences.

* gcc.target/hppa/shadd-4.c: New test.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 3ec7255..48472bc 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2015-05-21  Jeff Law  
+
+   * config/pa/pa.md (add-with-constant splitter): Use ASHIFT rather
+   than MULT for shadd sequences.
+
 2015-05-08  Jan Hubicka  
 
* alias.c (alias_stats): New static var.
diff --git a/gcc/config/pa/pa.md b/gcc/config/pa/pa.md
index 73c8f6b..aaec27d 100644
--- a/gcc/config/pa/pa.md
+++ b/gcc/config/pa/pa.md
@@ -5132,7 +5132,7 @@
(clobber (match_operand:SI 4 "register_operand" ""))]
   "! pa_cint_ok_for_move (INTVAL (operands[2]))"
   [(set (match_dup 4) (match_dup 2))
-   (set (match_dup 0) (plus:SI (mult:SI (match_dup 4) (match_dup 3))
+   (set (match_dup 0) (plus:SI (ashift:SI (match_dup 4) (match_dup 3))
   (match_dup 1)))]
   "
 {
@@ -5147,17 +5147,17 @@
   if (intval % 2 == 0 && pa_cint_ok_for_move (intval / 2))
 {
   operands[2] = GEN_INT (intval / 2);
-  operands[3] = const2_rtx;
+  operands[3] = const1_rtx;
 }
   else if (intval % 4 == 0 && pa_cint_ok_for_move (intval / 4))
 {
   operands[2] = GEN_INT (intval / 4);
-  operands[3] = GEN_INT (4);
+  operands[3] = const2_rtx;
 }
   else if (intval % 8 == 0 && pa_cint_ok_for_move (intval / 8))
 {
   operands[2] = GEN_INT (intval / 8);
-  operands[3] = GEN_INT (8);
+  operands[3] = GEN_INT (3);
 }
   else if (pa_cint_ok_for_move (-intval))
 {
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 46a6bb7..20a4379 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2015-05-21  Jeff Law  
+
+   * gcc.target/hppa/shadd-4.c: New test.
+
 2015-05-08  Michael Matz  
 
* gcc.dg/vect/vect-strided-store.c: New test.
diff --git a/gcc/testsuite/gcc.target/hppa/shadd-4.c 
b/gcc/testsuite/gcc.target/hppa/shadd-4.c
new file mode 100644
index 000..e25d148
--- /dev/null
+++ b/gcc/testsuite/gcc.target/hppa/shadd-4.c
@@ -0,0 +1,8 @@
+/* { dg-do compile }  */
+/* { dg-options "-O2" }  */
+/* { dg-final { scan-assembler-times "sh.add" 1 } }  */
+unsigned int
+oof (int uid)
+{
+  return (174 << 7) + uid;
+}


Re: [PATCH][AArch64] Add __extension__ and __always_inline__ to crypto intrinsics

2015-05-21 Thread James Greenhalgh
On Thu, May 21, 2015 at 03:42:33PM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> The crypto intrinsics are missing an __extension__ and an __always_inline__
> attribute that all the other intrinsics have. I don't see any reason for them
> to be different and the always_inline attribute will be needed if we decide
> to wrap the intrinsics inside a target SIMD pragma.
> 
> Tested aarch64-none-elf.
> 
> Ok for trunk?

OK!

Thanks,
James


Re: PING^3: [PATCH]: New configure options that make the compiler use -fPIE and -pie as default option

2015-05-21 Thread H.J. Lu
On Thu, May 21, 2015 at 7:13 AM, Rainer Orth
 wrote:
> "H.J. Lu"  writes:
>
>> Here is the complete patch.  Tested on Linux/x86-64.  It is also
>> available on hjl/pie/master branch in git mirror.
>
> As always, please keep generated files like configure and config.in out
> of the submission: it simplifies review.

Will do.

> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index ab9b637..e429274 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -253,6 +253,12 @@ LINKER = $(CC)
>  LINKER_FLAGS = $(CFLAGS)
>  endif
>
> +# We don't want to compile the compiler with -fPIE, it make PCH fail.
> ^s
> +COMPILER += @NO_PIE_CFLAGS@
>
> @@ -750,6 +756,8 @@ CC_FOR_BUILD = @CC_FOR_BUILD@
>  CXX_FOR_BUILD = @CXX_FOR_BUILD@
>  BUILD_CFLAGS= @BUILD_CFLAGS@ -DGENERATOR_FILE
>  BUILD_CXXFLAGS = @BUILD_CXXFLAGS@ -DGENERATOR_FILE
> +BUILD_CFLAGS += @NO_PIE_CFLAGS@
> +BUILD_CXXFLAGS += @NO_PIE_CFLAGS@
>
> Here and in several other places, you use += instead of just adding
> @NO_PIE_CFLAGS@ to the existing BUILD_CFLAGS variable.  Please lets
> keep to the existing idiom instead of randomly introducing another.

I used += because

1.  There are constructs like this:

# library is not introduced.  If HOST_LIBS is not set, link with
# $(CXX) to pick up -lstdc++.
ifeq ($(HOST_LIBS),)
LINKER = $(CXX)
LINKER_FLAGS = $(CXXFLAGS)
else
LINKER = $(CC)
LINKER_FLAGS = $(CFLAGS)
endif

Using "+=" needs only one line.

2. There are many usages of  "+=" in Makefile.in already.

> @@ -761,6 +769,7 @@ BUILD_LINKERFLAGS = $(BUILD_CXXFLAGS)
>
>  # Native linker and preprocessor flags.  For x-fragment overrides.
>  BUILD_LDFLAGS=@BUILD_LDFLAGS@
> +BUILD_LDFLAGS += @NO_PIE_FLAG@
>
> Likewise.
>
>  BUILD_CPPFLAGS= -I. -I$(@D) -I$(srcdir) -I$(srcdir)/$(@D) \
> -I$(srcdir)/../include @INCINTL@ $(CPPINC) $(CPPFLAGS)
>
> @@ -1864,6 +1873,12 @@ libgcc.mvars: config.status Makefile specs 
> xgcc$(exeext)
> echo GCC_CFLAGS = '$(GCC_CFLAGS)' >> tmp-libgcc.mvars
> echo INHIBIT_LIBC_CFLAGS = '$(INHIBIT_LIBC_CFLAGS)' >> 
> tmp-libgcc.mvars
> echo TARGET_SYSTEM_ROOT = '$(TARGET_SYSTEM_ROOT)' >> tmp-libgcc.mvars
> +   if test @enable_default_pie@ = yes; then \
> + NO_PIE_CFLAGS="-fno-PIE"; \
>
> Why literal -fno-PIE instead of @NO_PIE_CFLAGS@?

Joseph already commented on it.

> +   else \
> + NO_PIE_CFLAGS=; \
> +   fi; \
> +   echo NO_PIE_CFLAGS = "$$NO_PIE_CFLAGS" >> tmp-libgcc.mvars
>
> mv tmp-libgcc.mvars libgcc.mvars
>
> Besides, we're trying to get away from libgcc.mvars, moving the
> detection to libgcc proper.  It would be nice to do so here.

It will happen when libgcc.mvars is removed/moved. I don't want
to duplicate the logic in libgcc/configure now.

> diff --git a/gcc/ada/gcc-interface/Makefile.in 
> b/gcc/ada/gcc-interface/Makefile.in
> index ecc443e..90aedb5 100644
> --- a/gcc/ada/gcc-interface/Makefile.in
> +++ b/gcc/ada/gcc-interface/Makefile.in
> @@ -267,6 +267,9 @@ TOOLS_LIBS = ../link.o ../targext.o ../../ggc-none.o 
> ../../libcommon-target.a \
>../../libcommon.a ../../../libcpp/libcpp.a $(LIBGNAT) $(LIBINTL) 
> $(LIBICONV) \
>../$(LIBBACKTRACE) ../$(LIBIBERTY) $(SYSLIBS) $(TGT_LIB)
>
> +# Add -no-pie to TOOLS_LIBS since some of them are compiled with -fno-PIE.
> +TOOLS_LIBS += @NO_PIE_FLAG@
>
> Again, avoid +=
>
> diff --git a/gcc/config/sol2.h b/gcc/config/sol2.h
> index 4dceb16..adf6f3b 100644
> --- a/gcc/config/sol2.h
> +++ b/gcc/config/sol2.h
> @@ -127,7 +127,7 @@ along with GCC; see the file COPYING3.  If not see
>  #define ASM_SPEC_BASE \
>  "%{v:-V} %{Qy:} %{!Qn:-Qy} %{Ym,*} -s %(asm_cpu)"
>
> -#define ASM_PIC_SPEC " %{fpic|fpie|fPIC|fPIE:-K PIC}"
> +#define ASM_PIC_SPEC " %{" FPIE_OR_FPIC_SPEC ":-K PIC}"
>
>  #undef ASM_CPU_DEFAULT_SPEC
>  #define ASM_CPU_DEFAULT_SPEC \
>
> This is ok once the rest goes in.  I haven't reviewed the other
> target-specific parts, though.
>
> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
> index 04332c1..437a534 100644
> --- a/gcc/doc/install.texi
> +++ b/gcc/doc/install.texi
> @@ -1585,6 +1585,9 @@ not be built.
>  Specify that the run-time libraries for stack smashing protection
>  should not be built.
>
> +@item --enable-default-pie
> +Turn on @option{-fPIE} and @option{-pie} by default.
> +
>  @item --disable-libquadmath
>  Specify that the GCC quad-precision math library should not be built.
>  On some systems, the library is required to be linkable when building
>
> This option was added in a seemingly completely random place, between
> options to enable/disable runtime libs.  Please find a better place.

I will move it to before "@item --enable-secureplt".  If you aren't
happy with it, can you suggest a better place?

> diff --git a/gcc/opts.c b/gcc/opts.c
> index 9deb8df..4b6d978 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -739,8 +739,22 @@ finish_options (struct gcc_options *opts, struct 
> gcc_options *opts_set,
>opts->x_flag_

Ping ** 0.5 [patch, fortran] Inline matmul with conjugate complex numbers

2015-05-21 Thread Thomas Koenig
Am 18.05.2015 um 00:05 schrieb Thomas Koenig:
> this patch extends the inline matmul functionality to conjugate
> complex numbers.
> 
> Regression-tested. OK for trunk?

OK (with the trivial change in the follow-up e-mail)?

I'd like to start extending this to TRANSPOSE(CONJG(A)) :-)

Thomas


Re: [AArch64][TLSLE][4/N] Recognize -mtls-size

2015-05-21 Thread Jiong Wang

Jiong Wang writes:

> This patch adds a -mtls-size option for AArch64. This option lets the user
> exercise finer control over code generation for the various TLS models on AArch64.
>
> For example, for TLS LE, user can specify smaller tls-size, for example
> 4K which is quite usual, to let AArch64 backend generate more efficient
> instruction sequences.
>
> Currently, -mtls-size accepts any integer and translates it into
> 12 (4K), 24 (16M), 32 (4G), or 48 (256TB) based on the value.
>
> No functional change.
>
> ok for trunk?
>
> 2015-05-20  Jiong Wang  
>
> gcc/
>   * config/aarch64/aarch64.opt (mtls-size): New entry.
>   * config/aarch64/aarch64.c (initialize_aarch64_tls_size): New function.
>   * doc/invoke.texi (AArch64 Options): Document -mtls-size.

Rename summary from "5/N" to "4/N".

The fourth patch was a binutils patch at:
  https://sourceware.org/ml/binutils/2015-05/msg00181.html
  
-- 
Regards,
Jiong



Re: C/C++ PATCH to allow deprecating enum values (PR c/47043)

2015-05-21 Thread Jason Merrill

On 05/07/2015 12:22 PM, Marek Polacek wrote:

-  mark_used (decl);
+  mark_used (decl, 0);


This should use tf_none rather than 0.


+  build_enumerator (DECL_NAME (decl), value, newtag,
+   DECL_ATTRIBUTES (decl), DECL_SOURCE_LOCATION (decl));


This is assuming that enumerators can't have dependent attributes.  I 
guess that's currently true, but please add a comment about it.


OK with those changes.

Jason




Re: [patch] testsuite enable PIE tests on FreeBSD

2015-05-21 Thread Andreas Tobler

On 20.05.15 22:30, Jeff Law wrote:

On 05/20/2015 11:04 AM, Andreas Tobler wrote:

Hi,

the attached patch enables some PIE tests on FreeBSD.

Ok for trunk?

Thanks,
Andreas

2015-05-20  Andreas Tobler  

  * gcc.target/i386/pr32219-1.c: Enable test on FreeBSD.
  * gcc.target/i386/pr32219-2.c: Likewise.
  * gcc.target/i386/pr32219-3.c: Likewise.
  * gcc.target/i386/pr32219-4.c: Likewise.
  * gcc.target/i386/pr32219-5.c: Likewise.
  * gcc.target/i386/pr32219-6.c: Likewise.
  * gcc.target/i386/pr32219-7.c: Likewise.
  * gcc.target/i386/pr32219-8.c: Likewise.
  * gcc.target/i386/pr39013-1.c: Likewise.
  * gcc.target/i386/pr39013-2.c: Likewise.
  * gcc.target/i386/pr64317.c: Likewise.

Wouldn't it be better to remove the target selector and instead add:

/* { dg-require-effective-target pie } */

In each of those tests?

While the net effect is the same today, it means there's only one place
to change if another x86 target gains PIE support in the future.

Pre-approved using that style.


Thanks!

Tested on amd64-freebsd and CentOS.

Andreas


This is what I committed:

2015-05-21  Andreas Tobler  

* gcc.target/i386/pr32219-1.c: Use 'dg-require-effective-target pie'
instead of listing several targets on its own.
* gcc.target/i386/pr32219-2.c: Likewise.
* gcc.target/i386/pr32219-3.c: Likewise.
* gcc.target/i386/pr32219-4.c: Likewise.
* gcc.target/i386/pr32219-5.c: Likewise.
* gcc.target/i386/pr32219-6.c: Likewise.
* gcc.target/i386/pr32219-7.c: Likewise.
* gcc.target/i386/pr32219-8.c: Likewise.
* gcc.target/i386/pr39013-1.c: Likewise.
* gcc.target/i386/pr39013-2.c: Likewise.
* gcc.target/i386/pr64317.c: Likewise.




Index: pr32219-1.c
===
--- pr32219-1.c (revision 223448)
+++ pr32219-1.c (working copy)
@@ -1,4 +1,5 @@
-/* { dg-do compile { target *-*-linux* } } */
+/* { dg-do compile } */
+/* { dg-require-effective-target pie } */
 /* { dg-options "-O2 -fpie" } */
 
 /* Initialized common symbol with -fpie.  */
Index: pr32219-2.c
===
--- pr32219-2.c (revision 223448)
+++ pr32219-2.c (working copy)
@@ -1,4 +1,5 @@
-/* { dg-do compile { target *-*-linux* } } */
+/* { dg-do compile } */
+/* { dg-require-effective-target pie } */
 /* { dg-options "-O2 -fpic" } */
 
 /* Common symbol with -fpic.  */
Index: pr32219-3.c
===
--- pr32219-3.c (revision 223448)
+++ pr32219-3.c (working copy)
@@ -1,4 +1,5 @@
-/* { dg-do compile { target *-*-linux* } } */
+/* { dg-do compile } */
+/* { dg-require-effective-target pie } */
 /* { dg-options "-O2 -fpie" } */
 
 /* Weak common symbol with -fpie.  */
Index: pr32219-4.c
===
--- pr32219-4.c (revision 223448)
+++ pr32219-4.c (working copy)
@@ -1,4 +1,5 @@
-/* { dg-do compile { target *-*-linux* } } */
+/* { dg-do compile } */
+/* { dg-require-effective-target pie } */
 /* { dg-options "-O2 -fpic" } */
 
 /* Weak common symbol with -fpic.  */
Index: pr32219-5.c
===
--- pr32219-5.c (revision 223448)
+++ pr32219-5.c (working copy)
@@ -1,4 +1,5 @@
-/* { dg-do compile { target *-*-linux* } } */
+/* { dg-do compile } */
+/* { dg-require-effective-target pie } */
 /* { dg-options "-O2 -fpie" } */
 
 /* Initialized symbol with -fpie.  */
Index: pr32219-6.c
===
--- pr32219-6.c (revision 223448)
+++ pr32219-6.c (working copy)
@@ -1,4 +1,5 @@
-/* { dg-do compile { target *-*-linux* } } */
+/* { dg-do compile } */
+/* { dg-require-effective-target pie } */
 /* { dg-options "-O2 -fpic" } */
 
 /* Initialized symbol with -fpic.  */
Index: pr32219-7.c
===
--- pr32219-7.c (revision 223448)
+++ pr32219-7.c (working copy)
@@ -1,4 +1,5 @@
-/* { dg-do compile { target *-*-linux* } } */
+/* { dg-do compile } */
+/* { dg-require-effective-target pie } */
 /* { dg-options "-O2 -fpie" } */
 
 /* Weak initialized symbol with -fpie.  */
Index: pr32219-8.c
===
--- pr32219-8.c (revision 223448)
+++ pr32219-8.c (working copy)
@@ -1,4 +1,5 @@
-/* { dg-do compile { target *-*-linux* } } */
+/* { dg-do compile } */
+/* { dg-require-effective-target pie } */
 /* { dg-options "-O2 -fpic" } */
 
 /* Weak initialized symbol with -fpic.  */
Index: pr39013-1.c
===
--- pr39013-1.c (revision 223448)
+++ pr39013-1.c (working copy)
@@ -1,5 +1,6 @@
 /* PR target/39013 */
-/* { dg-do compile { target *-*-linux* *-*-gnu* } } */
+/* { dg-do compile } */
+/* { dg-require-effective-target pie } */
 /* { dg-options "-O2 -fpie 

Calculate TYPE_CANONICAL only for types that can be accessed in memory

2015-05-21 Thread Jan Hubicka
Hi,
this is the next part of the series.  It disables canonical type calculation for
incomplete types, with the exception of arrays, based on the claim that we do not
have a good notion of canonical types for those.

I can bootstrap this with additional checks in alias.c that canonical types are
always present with LTO, but I need a fix to ICF, which compares alias sets of
types it does not need to and trips over incomplete types otherwise.  I will push
out those fixes separately and incrementally add the checks.  The purpose of the
checks is to avoid alias.c degenerating to the structural-equality path for no
good reason.

I tried the alternative of disabling it on ARRAY_TYPEs too and avoiding
recursion into them for fields.  This does not fly because we can have
ARRAY_REFs of incomplete types:
 
unit size 
align 8 symtab -158253232 alias set 0 canonical type 0x76adb498 
precision 8 min  max 
pointer_to_this >
readonly unsigned DI
size 
unit size 
align 64 symtab 0 alias set -1 canonical type 0x76af37e0
pointer_to_this >
readonly
arg 0 
BLK
align 64 symtab 0 alias set -1 structural equality
pointer_to_this >
   
arg 0 
constant arg 0 >
arg 1 >
arg 1 
unit size 
align 32 symtab -158421968 alias set 3 canonical type 
0x76adb690 precision 32 min  max 

pointer_to_this  reference_to_this 
>
visiteddef_stmt _103 = (int) _101;

version 103
ptr-info 0x769f04a0>
../../gcc/print-rtl.c:173:4>

and we compute alias set for it via:
#0  internal_error (gmsgid=0x1b86c8f "in %s, at %s:%d") at 
../../gcc/diagnostic.c:1271
#1  0x015e2416 in fancy_abort (file=0x167ea2a "../../gcc/alias.c", 
line=823, function=0x167f7d6  
"get_alias_set")
at ../../gcc/diagnostic.c:1341
#2  0x007109b9 in get_alias_set (t=0x7694b2a0) at 
../../gcc/alias.c:823
#3  0x0070fecf in component_uses_parent_alias_set_from 
(t=0x769c2968) at ../../gcc/alias.c:607
#4  0x00710497 in reference_alias_ptr_type_1 (t=0x7fffe068) at 
../../gcc/alias.c:719
#5  0x007107e8 in get_alias_set (t=0x769c2968) at 
../../gcc/alias.c:799
#6  0x00ebca97 in vn_reference_lookup (op=0x769c2968, 
vuse=0x769ca798, kind=VN_WALKREWRITE, vnresult=0x0) at 
../../gcc/tree-ssa-sccvn.c:2217
#7  0x00ebea99 in visit_reference_op_load (lhs=0x769c5678, 
op=0x769c2968, stmt=0x769cf730) at ../../gcc/tree-ssa-sccvn.c:3030
#8  0x00ec05ec in visit_use (use=0x769c5678) at 
../../gcc/tree-ssa-sccvn.c:3685
#9  0x00ec1047 in process_scc (scc=...) at 
../../gcc/tree-ssa-sccvn.c:3927
#10 0x00ec1679 in extract_and_process_scc_for_name 
(name=0x769c5678) at ../../gcc/tree-ssa-sccvn.c:4013
#11 0x00ec1848 in DFS (name=0x769c5678) at 
../../gcc/tree-ssa-sccvn.c:4065
#12 0x00ec26d1 in cond_dom_walker::before_dom_children 
(this=0x7fffe5a0, bb=0x769b9888) at ../../gcc/tree-ssa-sccvn.c:4345
#13 0x014c05c0 in dom_walker::walk (this=0x7fffe5a0, 
bb=0x769b9888) at ../../gcc/domwalk.c:188
#14 0x00ec2b0e in run_scc_vn (default_vn_walk_kind_=VN_WALKREWRITE) at 
../../gcc/tree-ssa-sccvn.c:4436
#15 0x00e98d59 in (anonymous namespace)::pass_fre::execute 
(this=0x1f621b0, fun=0x7698db28) at ../../gcc/tree-ssa-pre.c:4972
#16 0x00bb6c8f in execute_one_pass (pass=0x1f621b0) at 
../../gcc/passes.c:2317
#17 0x00bb6ede in execute_pass_list_1 (pass=0x1f621b0) at 
../../gcc/passes.c:2370
#18 0x00bb6f0f in execute_pass_list_1 (pass=0x1f61d90) at 
../../gcc/passes.c:2371
#19 0x00bb6f51 in execute_pass_list (fn=0x7698db28, pass=0x1f61cd0) 
at ../../gcc/passes.c:2381
#20 0x007bb3f6 in cgraph_node::expand (this=0x7695b000) at 
../../gcc/cgraphunit.c:1895
#21 0x007bba15 in expand_all_functions () at ../../gcc/cgraphunit.c:2031
#22 0x007bc4e9 in symbol_table::compile (this=0x76adb000) at 
../../gcc/cgraphunit.c:2384
#23 0x006f846c in lto_main () at ../../gcc/lto/lto.c:3315
#24 0x00cb465f in compile_file () at ../../gcc/toplev.c:594
#25 0x00cb6bb8 in do_compile () at ../../gcc/toplev.c:2081
#26 0x00cb6e04 in toplev::main (this=0x7fffe860, argc=33, 
argv=0x7fffe968) at ../../gcc/toplev.c:2182
#27 0x015c9739 in main (argc=33, argv=0x7fffe968) at 
../../gcc/main.c:39

Though a few lines down alias.c falls back to the element type:
if (TREE_CODE (t) == ARRAY_TYPE && !TYPE_NONALIASED_COMPONENT (t))
  set = get_alias_set (TREE_TYPE (t));

I suppose we can move it up and then skip the calculation for those, but the
solution below makes sense to me.  A type has TYPE_CANONICAL if it (or its
parts) can be accessed.  Incomplete array fields can be accessed and thus need
TYPE_CANONICAL.

LTO bootstrapped on x86_64-linux, regtested with other patches, I am re-testing
in isolation 
