Re: [4/9] Add a dedicated rtx union member for REGs

2015-05-19 Thread Richard Sandiford
Jeff Law  writes:
> On 05/18/2015 12:19 PM, Richard Sandiford wrote:
>> This patch replaces the current REG "i0" format with a dedicated structure,
>> so that we can make use of the extra 32 bits in the "i" field.
>>
>> Of the places that iterate on formats and do something for 'i's,
>> most already handled REGs specially before the format walk and
>> so don't need to check for 'r'.  Otherwise it's mostly just a case
>> of adding dummy 'r' cases in order avoid the default gcc_unreachable ()
>> (in cases where we do the same for 'i').  The main exceptions are the
>> cselib.c and lra-constraints.c changes.
>>
>> final.c:leaf_renumber_regs_insn handled REGs specially but then
>> went on to do a no-op walk of the format.  I just added an early
>> exit instead of an empty 'r' case.
>>
>>
>> gcc/
>>  * rtl.def (REG): Change format to "r".
>>  * rtl.h (rtunion): Remove rt_reg.
>>  (reg_info): New structure.
>>  (rtx_def): Add reg field to main union.
>>  (X0REGATTR): Delete.
>>  (REG_CHECK): New macro.
>>  (SET_REGNO_RAW, rhs_regno, REG_ATTRS): Use it.
>>  * rtl.c (rtx_format): Document "r".
>>  (rtx_code_size): Handle REG specially.
>>  * gengenrtl.c (special_format): Return true for formats
>>  that include 'r'.
>>  * gengtype.c (adjust_field_rtx_def): Handle 'r' fields.
>>  Deal with REG_ATTRS after the field loop.
>>  * emit-rtl.c (gen_raw_REG): Call rtx_alloc_stat directly.
>>  * expmed.c (init_expmed): Call gen_raw_REG instead of
>>  gen_rtx_raw_REG.
>>  * expr.c (init_expr_target): Likewise.
>>  * regcprop.c (maybe_mode_change): Likewise.
>>  * varasm.c (make_decl_rtl): Likewise.
>>  * final.c (leaf_renumber_regs_insn): Return early after
>>  handling REGs.
>>  * genemit.c (gen_exp): Handle 'r' fields.
>>  * genpeep.c (match_rtx): Likewise.
>>  * gensupport.c (subst_pattern_match): Likewise.
>>  (get_alternatives_number, collect_insn_data, alter_predicate_for_insn)
>>  (alter_constraints, subst_dup): Likewise.
>>  * read-rtl.c (read_rtx_code): Likewise.
>>  * print-rtl.c (print_rtx): Likewise.
>>  * genrecog.c (find_operand, find_matching_operand): Likewise.
>>  (validate_pattern, match_pattern_2): Likewise.
>>  (parameter::UINT, rtx_test::REGNO_FIELD): New enum values.
>>  (rtx_test::regno_field): New function.
>>  (operator ==, safe_to_hoist_p, transition_parameter_type)
>>  (parameter_type_string, print_parameter_value)
>>  (print_nonbool_test, print_test): Handle new enum values.
>>  * cselib.c (rtx_equal_for_cselib_1): Handle REG specially.
>>  * lra-constraints.c (operands_match_p): Likewise.
> Just to confirm, this doesn't change the size of a REG object, right? 
> If it doesn't change the size, then it's OK.

It doesn't change the size for LP64 hosts.  It makes it 32 bits bigger
for ILP32 hosts.  See https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01598.html
for discussion about that.

If we make the opposite call this time and say that it's better to have
two alternative rtx layouts rather than grow an rtx for ILP32 hosts,
then there are things we could do.  E.g. I doubt an ILP32 host has
enough addressable memory to cope with 1<<24 registers (and associated
instructions), so for ILP32 we could split original_regno into an 8/24
bitfield, using the :8 for the number of registers and the :24 for the
original_regno.  original_regno is so rarely used that an extra shift
or mask shouldn't matter.

Or we could simply leave ILP32 hosts accessing hard_regno_nregs each
time.  The problem then would be that an incorrectly-cached number
of registers would only show up on one type of host.  It would also mean
that REG_NREGS would need to be defined in regs.h for ILP32 at least,
since hard_regno_nregs isn't visible inside rtl.h.

Thanks,
Richard



Re: [4/9] Add a dedicated rtx union member for REGs

2015-05-19 Thread Jeff Law

On 05/19/2015 01:00 AM, Richard Sandiford wrote:

Just to confirm, this doesn't change the size of a REG object, right?
If it doesn't change the size, then it's OK.


It doesn't change the size for LP64 hosts.  It makes it 32 bits bigger
for ILP32 hosts.  See https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01598.html
for discussion about that.

If we make the opposite call this time and say that it's better to have
two alternative rtx layouts rather than grow an rtx for ILP32 hosts,
then there are things we could do.  E.g. I doubt an ILP32 host has
enough addressable memory to cope with 1<<24 registers (and associated
instructions), so for ILP32 we could split original_regno into an 8/24
bitfield, using the :8 for the number of registers and the :24 for the
original_regno.  original_regno is so rarely used that an extra shift
or mask shouldn't matter.

Or we could simply leave ILP32 hosts accessing hard_regno_nregs each
time.  The problem then would be that an incorrectly-cached number
of registers would only show up on one type of host.  It would also mean
that REG_NREGS would need to be defined in regs.h for ILP32 at least,
since hard_regno_nregs isn't visible inside rtl.h.
I can live with increasing on ilp32 hosts with lp64 staying the same -- 
ilp32 hosts are less and less important every day.


jeff



Re: [PATCH, RFC] New memory usage statistics infrastructure

2015-05-19 Thread Jeff Law

On 05/15/2015 08:38 AM, Martin Liška wrote:

Hello.

Following patch attempts to rewrite memory reports for GCC's internal
allocations
so that it uses a new template type. The type shares parts which are
currently duplicated,
adds support for special 'counters' and introduces new support for
hash-{set,map,table}.

Transformation of the current code is a bit tricky as we internally used
hash-table as main
data structure which takes care of location-related allocations. As I
want to add support even
for hash tables (and all derived types), header files inclusion and
forward declaration is utilized.

Feel free to comment the patch, as well as missing features one may want
to track by location sensitive
memory allocation.

Attachment contains sample output taken from tramp3d-v4.cpp.

Thanks,
Martin




0001-New-memory-allocation-statistics-infrastructure.patch


 From 7a9048ef36bddf17209acadc1dcab9fc48a7fd63 Mon Sep 17 00:00:00 2001
From: mliska
Date: Tue, 5 May 2015 11:34:16 +0200
Subject: [PATCH] New memory allocation statistics infrastructure.

gcc/ChangeLog:

2015-05-05  Martin Liska

* Makefile.in: Add additional dependencies related to memory report
enhancement.
* alloc-pool.c (allocate_pool_descriptor): Use new ctor.
* bitmap.c (struct bitmap_descriptor_d): Remove.
(struct loc): Likewise.
(struct bitmap_desc_hasher): Likewise.
(bitmap_desc_hasher::hash): Likewise.
(bitmap_desc_hasher::equal): Likewise.
(get_bitmap_descriptor): Likewise.
(bitmap_register): User new memory descriptor API.
(register_overhead): Likewise.
(bitmap_find_bit): Register nsearches and search_iter statistics.
(struct bitmap_output_info): Remove.
(print_statistics): Likewise.
(dump_bitmap_statistics): Use new memory descriptor.
* bitmap.h (struct bitmap_usage): New class.
* genmatch.c: Extend header file inclusion.
* genpreds.c: Likewise.
* ggc-common.c (struct ggc_usage): New class.
(struct ggc_loc_desc_hasher): Remove.
(ggc_loc_desc_hasher::hash): Likewise.
(ggc_loc_desc_hasher::equal): Likewise.
(struct ggc_ptr_hash_entry): Likewise.
(struct ptr_hash_hasher): Likewise.
(ptr_hash_hasher::hash): Likewise.
(ptr_hash_hasher::equal): Likewise.
(make_loc_descriptor): Likewise.
(ggc_prune_ptr): Likewise.
(dump_ggc_loc_statistics): Use new memory descriptor.
(ggc_record_overhead): Likewise.
(ggc_free_overhead): Likewise.
(final_cmp_statistic): Remove.
(cmp_statistic): Likewise.
(ggc_add_statistics): Liekwise.
(ggc_prune_overhead_list): Likewise.
* hash-map-traits.h: New file.
* hash-map.h (struct default_hashmap_traits): Move the traits to a
separate header file.
* hash-set.h: Pass memory statistics info to ctor.
* hash-table.c (void dump_hash_table_loc_statistics): New function.
* hash-table.h (hash_table::hash_table): Add new ctor arguments.
(hash_table::~hash_table): Register memory release operation.
(hash_table::alloc_entries): Handle memory allocation operation.
(hash_table::expand): Likewise.
* inchash.c (iterative_hash_hashval_t): Move implementation to header
file.
(iterative_hash_host_wide_int): Likewise.
* inchash.h (class hash): Likewise.
* mem-stats-traits.h: New file.
* mem-stats.h: New file.
(mem_location): Add new class.
(mem_usage): Likewise.
(mem_alloc_description): Likewise.
* sese.c: Add new header file inclusision.
* toplev.c (dump_memory_report): Add report for hash_table, hash_map
and hash_set.
* tree-sra.c: Add new header file inclusision.
* vec.c (struct vec_descriptor): Remove.
(hash_descriptor): Likewise.
(struct vec_usage): Likewise.
(struct ptr_hash_entry): Likewise.
(hash_ptr): Likewise.
(eq_ptr): Likewise.
(vec_prefix::register_overhead): Use new memory descriptor API.
(vec_prefix::release_overhead): Likewise.
(add_statistics): Remove.
(dump_vec_loc_statistics): Use new memory descriptor API.
* vec.h (struct vec_prefix): Likewise.
(va_heap::reserve): Likewise.
(va_heap::release): Likewise.
Please check for overly-long lines.  I spotted several, but didn't keep 
them handy as I was working through the patch.  ISTM that using the 
script from contrib/ might help identify the overly long lines.


Overall it looks good -- I like the consistency in reporting.  I think 
if there's things missing, we ought to be able to add them incrementally.


My recommendation would be to take care of the line wrapping issues, run 
it through the usual bootstrap and testing cycles and commit assuming 
nothing is amiss.


jeff


Re: [gomp4] bootstrap broken, function enclosing_target_ctx defined but not used

2015-05-19 Thread Tom de Vries

[ moved to gcc-patches ml ]

On 18-05-15 17:31, Tom de Vries wrote:

Thomas,

In ran into this bootstrap failure with branch gomp-4_0-branch:
...
src/gcc-gomp-4_0-branch/gcc/omp-low.c:2897:1: error: 'omp_context*
enclosing_target_ctx(omp_context*)' defined but not used 
[-Werror=unused-function]
  enclosing_target_ctx (omp_context *ctx)
  ^
cc1plus: all warnings being treated as errors
make[3]: *** [omp-low.o] Error 1
...



This patch fixes bootstrap by commenting out the unused function 
enclosing_target_ctx.


The patch just comments it out, since I'm not sure whether:
- the function needs to be removed, or
- a user of the function will soon be committed.

Committed to fix bootstrap.

Thanks,
- Tom

Comment out unused enclosing_target_ctx

2015-05-19  Tom de Vries  

* omp-low.c (enclosing_target_ctx): Comment out.
---
 gcc/omp-low.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 914549c..3414ab5 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -2893,6 +2893,7 @@ finish_taskreg_scan (omp_context *ctx)
 }


+#if 0
 static omp_context *
 enclosing_target_ctx (omp_context *ctx)
 {
@@ -2902,6 +2903,7 @@ enclosing_target_ctx (omp_context *ctx)
   gcc_assert (ctx != NULL);
   return ctx;
 }
+#endif

 static bool
 oacc_loop_or_target_p (gimple stmt)



[SH][committed] Fix gcc.target/sh/pr54236-2.c failures

2015-05-19 Thread Oleg Endo
Hi,

Since a recent change to the tree optimizers
https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00089.html
some related SH patterns stopped working.  The attached patch fixes
this.

Tested briefly with 'make all' and with
make -k check-gcc RUNTESTFLAGS="sh.exp=pr54236* --target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

Committed as r223346.

Cheers,
Oleg

gcc/ChangeLog:
PR target/54236
* config/sh/sh.md (*round_int_even): New insn_and_split and
accompanying new unnamed split.

gcc/testsuite/ChangeLog:
PR target/54236
* gcc.target/sh/pr54236-2.c: Adjust expected insn counts.
Index: gcc/testsuite/gcc.target/sh/pr54236-2.c
===
--- gcc/testsuite/gcc.target/sh/pr54236-2.c	(revision 223273)
+++ gcc/testsuite/gcc.target/sh/pr54236-2.c	(working copy)
@@ -4,12 +4,19 @@
 /* { dg-do compile }  */
 /* { dg-options "-O1" } */
 /* { dg-skip-if "" { "sh*-*-*" } { "-m5*"} { "" } } */
-/* { dg-final { scan-assembler-times "addc" 37 } } */
-/* { dg-final { scan-assembler-times "shlr" 23 } } */
+/* { dg-final { scan-assembler-times "addc" 36 } } */
+/* { dg-final { scan-assembler-times "shlr" 22 } } */
 /* { dg-final { scan-assembler-times "shll" 14 } } */
-/* { dg-final { scan-assembler-times "add\t" 12 } } */
+/* { dg-final { scan-assembler-times "add\tr" 12 } } */
 /* { dg-final { scan-assembler-not "movt" } } */
 
+/* { dg-final { scan-assembler-times "add\t#1" 1 } } */
+
+/* { dg-final { scan-assembler-times "mov\t#-2" 1 { target { ! sh2a } } } } */
+/* { dg-final { scan-assembler-times "and\tr" 1 { target { ! sh2a } } } } */
+
+/* { dg-final { scan-assembler-times "bclr\t#0" 1 { target { sh2a } } } } */
+
 int
 test_000 (int a, int c, int b, int d)
 {
@@ -125,7 +132,8 @@
 int
 test_016 (int a, int b, int c, int d)
 {
-  // 1x shlr, 1x addc
+  // non-SH2A: 1x add #1, 1x mov #-2, 1x and
+  // SH2A: 1x add #1, 1x blcr #0
   return a + (a & 1);
 }
 
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 223274)
+++ gcc/config/sh/sh.md	(working copy)
@@ -1998,6 +1998,44 @@
   [(set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))
(set (match_dup 0) (plus:SI (match_dup 0) (const_int 1)))])
 
+
+;; The tree optimiziers canonicalize 
+;;reg + (reg & 1)
+;; into
+;;(reg + 1) & -2
+;;
+;; On SH2A an add-bclr sequence will be used to handle this.
+;; On non-SH2A re-emit the add-and sequence to improve register utilization.
+(define_insn_and_split "*round_int_even"
+  [(set (match_operand:SI 0 "arith_reg_dest")
+	(and:SI (plus:SI (match_operand:SI 1 "arith_reg_operand")
+			 (const_int 1))
+		(const_int -2)))]
+  "TARGET_SH1 && !TARGET_SH2A && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0) (const_int -2))
+   (set (match_dup 2) (plus:SI (match_dup 1) (const_int 1)))
+   (set (match_dup 0) (and:SI (match_dup 0) (match_dup 2)))]
+{
+  operands[2] = gen_reg_rtx (SImode);
+})
+
+;; If the *round_int_even pattern is combined with another plus,
+;; convert it into an addc pattern to emit an shlr-addc sequence.
+;; This split is taken by combine on non-SH2A and SH2A.
+(define_split
+  [(set (match_operand:SI 0 "arith_reg_dest")
+	(plus:SI (and:SI (plus:SI (match_operand:SI 1 "arith_reg_operand")
+  (const_int 1))
+			 (const_int -2))
+		 (match_operand:SI 2 "arith_reg_operand")))]
+  "TARGET_SH1 && can_create_pseudo_p ()"
+  [(parallel [(set (match_dup 0)
+		   (plus:SI (plus:SI (match_dup 1) (match_dup 2))
+			(and:SI (match_dup 1) (const_int 1
+	  (clobber (reg:SI T_REG))])])
+
 ;; Split 'reg + T' into 'reg + 0 + T' to utilize the addc insn.
 ;; If the 0 constant can be CSE-ed, this becomes a one instruction
 ;; operation, as opposed to sequences such as


Re: [match-and-simplify] fix incorrect code-gen in 'for' pattern

2015-05-19 Thread Richard Biener
On Mon, 18 May 2015, Prathamesh Kulkarni wrote:

> On 18 May 2015 at 14:12, Richard Biener  wrote:
> > On Sat, 16 May 2015, Prathamesh Kulkarni wrote:
> >
> >> Hi,
> >> genmatch generates incorrect code for following (artificial) pattern:
> >>
> >> (for op (plus)
> >>   op2 (op)
> >>   (simplify
> >> (op @x @y)
> >> (op2 @x @y)
> >>
> >> generated gimple code: http://pastebin.com/h1uau9qB
> >> 'op' is not replaced in the generated code on line 33:
> >> *res_code = op;
> >>
> >> I think it would be a better idea to make op2 iterate over same set
> >> of operators (op2->substitutes = op->substitutes).
> >> I have attached patch for the same.
> >> Bootstrap + testing in progress on x86_64-unknown-linux-gnu.
> >> OK for trunk after bootstrap+testing completes ?
> >
> > Hmm, but then the example could as well just use 'op'.  I think we
> > should instead reject this.
> >
> > Consider
> >
> >   (for op (plus minus)
> > (for op2 (op)
> >   (simplify ...
> >
> > where it is not clear what would be desired.  Simple replacement
> > of 'op's value would again just mean you could have used 'op' in
> > the first place.  Doing what you propose would get you
> >
> >   (for op (plus minus)
> > (for op2 (plus minus)
> >   (simplify ...
> >
> > thus a different iteration.
> >
> >> I wonder if we really need is_oper_list flag in user_id ?
> >> We can determine if user_id is an operator list
> >> if user_id::substitutes is not empty ?
> >
> > After your change yes.
> >
> >> That will lose the ability to distinguish between user-defined operator
> >> list and list-iterator in for like op/op2, but I suppose we (so far) don't
> >> need to distinguish between them ?
> >
> > Well, your change would simply make each list-iterator a (temporary)
> > user-defined operator list as well as the current iterator element
> > (dependent on context - see the nested for example).  I think that
> > adds to confusion.
> >
> > So - can you instead reject this use?
> Well my intention was to have support for walking operator list in reverse.
> For eg:
> (for bitop (bit_and bit_ior)
>   rbitop (bit_ior bit_and)
>...)
> Could be replaced by sth like:
> (for bitop (bit_and bit_ior)
>   rbitop (~bitop))
>...)
> 
> where "~bitop" would indicate walking (bit_and bit_ior) in reverse.
> Would that be a good idea ? For symmetry, I thought
> (for op (list)
>   op2 (op))
> should be supported too.

Hmm, but is this really a useful extension?  To me it just complicates
the syntax for the occasional reader.

Richard.

> 
> Thanks,
> Prathamesh
> 
> 
> >
> > Thanks,
> > Richard.
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham 
Norton, HRB 21284 (AG Nuernberg)


Re: [match-and-simplify] fix incorrect code-gen in 'for' pattern

2015-05-19 Thread Marek Polacek
On Tue, May 19, 2015 at 10:33:08AM +0200, Richard Biener wrote:
> > Would that be a good idea ? For symmetry, I thought
> > (for op (list)
> >   op2 (op))
> > should be supported too.
> 
> Hmm, but is this really a useful extension?  To me it just complicates
> the syntax for the occasional reader.

FWIW, I'd prefer this to be rejected than supported.

Marek


Re: [PATCH 01/13] recog: Increased max number of alternatives - v2

2015-05-19 Thread Andreas Krebbel
On 05/18/2015 04:19 PM, Richard Biener wrote:
> On Mon, May 18, 2015 at 3:41 PM, Andreas Krebbel
>  wrote:
>> The new version also changes the type for the alternative_mask to unsigned 
>> HOST_WIDE_INT.
>>
>> Bootstrapped without regressions on x86-64.
>>
>> Ok to apply?
> 
> Please use uint64_t instead.

Done. Ok with that change?

-Andreas-




Re: [PATCH 02/13] optabs: Fix vec_perm -> V16QI middle end lowering.

2015-05-19 Thread Andreas Krebbel
On 05/18/2015 07:35 PM, Richard Henderson wrote:
> On 05/11/2015 06:23 AM, Andreas Krebbel wrote:
>> @@ -6784,14 +6784,18 @@ expand_vec_perm (machine_mode mode, rtx v0, rtx v1, 
>> rtx sel, rtx target)
>>  {
>>/* Multiply each element by its byte size.  */
>>machine_mode selmode = GET_MODE (sel);
>> +  /* We cannot re-use SEL as a temp operand since it might by in
>> + read-only storage.  */
>> +  rtx sel_reg = gen_reg_rtx (selmode);
>> +
>>if (u == 2)
>> -sel = expand_simple_binop (selmode, PLUS, sel, sel,
>> -   sel, 0, OPTAB_DIRECT);
>> +sel_reg = expand_simple_binop (selmode, PLUS, sel, sel,
>> +   sel_reg, 0, OPTAB_DIRECT);
>>else
> 
> You needn't allocate sel_reg explicitly; expand_simple_binop will do that for
> you if the TARGET parameter is NULL.
> 
> Thus this patch should be an 8 character change on those two calls.

Right. Thanks!

Ok to apply with that change?

-Andreas-



Re: Check canonical types in verify_type

2015-05-19 Thread Richard Biener
On Mon, 18 May 2015, Jan Hubicka wrote:

> > > +  if (!comp_type_attributes (t1, t2))
> > > + return false;
> > > 
> > > Because I think the TBAA does not care about attribute lists.  I suppose 
> > > this
> > > is kind-of harmless because of:
> > 
> > I think that was about attributes like 'packed', but those should have
> > their effects applied at stor-layout time.  So yes, I think this can be
> > dropped.
> 
> It does test the attribute list for functions only.  I think packed should
> be detected from memory layout, so I will test patch dropping this.
> > 
> > > +  /* For canonical type comparisons we do not want to build SCCs
> > > +  so we cannot compare pointed-to types.  But we can, for now,
> > > +  require the same pointed-to type kind and match what
> > > +  useless_type_conversion_p would do.  */
> > > +  if (POINTER_TYPE_P (t1))
> > > + {
> > > +   if (TYPE_ADDR_SPACE (TREE_TYPE (t1))
> > > +   != TYPE_ADDR_SPACE (TREE_TYPE (t2)))
> > > + return false;
> > > +
> > > +   if (TREE_CODE (TREE_TYPE (t1)) != TREE_CODE (TREE_TYPE (t2)))
> > > + return false;
> > > + }
> > > 
> > > Is the reason for not making differences between differnt pointer types
> > > really just lazyness to not deal with SCCs?  For that it is easy to add
> > > set of visited type pairs, just like odr_types_equivalent_p does.
> > 
> > Well - TBAA doesn't care about pointed-to types.  See get_alias_set
> > 
> >   else if (POINTER_TYPE_P (t)
> >&& t != ptr_type_node)
> > set = get_alias_set (ptr_type_node);
> 
> Hmm, I see, interesting hack.  For the first part of comment, I see that
> qualifiers needs to be ignored, but I do not see why we put
> short * and int * pointers to same class.

For the reason that people are very lazy.  For example GCC has code
punning void ** and void * and void * to Type * (and vice versa).  People
just don't expect pointers to be in different alias sets (and there is
little gained with doing that).

> The complete/incomplete type fun with LTO can be solved for C++ but indeed I
> see why pointer to incomplete type needs to be considered compatible with 
> every
> other pointer to structure.  Can't this be dealt with by adding correct edges
> into the aliasing dag?

I don't think so, without the dag degenerating.  We've had this mess
before (complete vs. incomplete structs and so).  It didn't work very 
well.

> > 
> > so the above would at most matter for pointer-type fields of aggregates.
> > And no, we don't want to deal with SCCs - I doubt it would bring any 
> > benefit and the issue here is that I am worried about SCCs being present
> > dependent on the order of streaming (well, the code pre-dates SCC
> > streaming).
> > 
> > > Because we now stream in SCC order anyway, most of time we won't even
> > > need to populate it.  Shall I implement this?
> > 
> > You can certainly try - but I think for cross-language interoperability
> > we want to treat
> > 
> >   struct X { void *p; }
> >   struct Y { void *p; }
> > 
> > and
> > 
> >   struct X { Y *p; }
> >   struct Y { X *q; } 
> > 
> > as compatible so I don't think we can use SCCs to partition alias sets
> > further (that is, we _do_ have to treat pointers conservatively).  The
> > current code is as far as we can get IMHO (well, even the current code
> > is too strict in making struct { void *p } and struct { int *p } not
> > have the same alias-set.
> 
> I will give it a try.  Do you have some examples where the above matters
> for cross-language TBAA?

I don't remember if I added a testcase, so no.  It's just very clear to me
that the look of pointer members may be very different (consider 
reference vs. pointer or pointer to array vs pointer to element).

> > > Other issue I run into is that for Ada bootstrap we have variadic type 
> > > whose
> > > canonical types are having different temporary set as size.  I think this 
> > > is
> > > valid and perhaps gimple_canonical_types_compatible_p should consider
> > > variadic arrays to be compatible with any array of compatible type?
> > 
> > Those are all local types and thus the strict equality compare should be
> > ok?  Not sure if we can do in C sth like
> > 
> > void foo (int n)
> > {
> >   struct { int i; int a[n]; } a;
> >   struct { int i; int a[n]; } b;
> > }
> > 
> > and if we'd have to assign the same alias-sets to 'a' and 'b'.
> 
> No idea here either. I wonder if the types are intedned to be TBAA compatible
> across two calls of the function.  In that case we may introduce multiple 
> copies
> of body by early inlining already that may be a problem.

No idea.  Well, the canonical type machinery was supposed to be 
conservative but here we're obviously not 100% conservative.

> > > I am not quite convinced we get variadic types right at LTO time, because
> > > they bypass canonical type calculation anyway and their canonical type
> > > is set by TYPE_MAIN_VARIANT in lto_read_body_or_constructor which I think
> > > is not safe.  I will look for 

Re: [Patch, fortran, pr65548, 2nd take, v5] [5/6 Regression] gfc_conv_procedure_call

2015-05-19 Thread Andre Vehreschild
Hi all,

find attached latest version to fix 65548.

Bootstraps and regtests ok on x86_64-linux-gnu/f21.

- Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


pr65548_5.clog
Description: Binary data
diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c
index 814bdde..6d565ae 100644
--- a/gcc/fortran/trans-stmt.c
+++ b/gcc/fortran/trans-stmt.c
@@ -5088,7 +5088,7 @@ tree
 gfc_trans_allocate (gfc_code * code)
 {
   gfc_alloc *al;
-  gfc_expr *expr;
+  gfc_expr *expr, *e3rhs = NULL;
   gfc_se se, se_sz;
   tree tmp;
   tree parm;
@@ -5109,6 +5109,7 @@ gfc_trans_allocate (gfc_code * code)
   stmtblock_t post;
   tree nelems;
   bool upoly_expr, tmp_expr3_len_flag = false, al_len_needs_set;
+  gfc_symtree *newsym = NULL;
 
   if (!code->ext.alloc.list)
 return NULL_TREE;
@@ -5148,14 +5149,11 @@ gfc_trans_allocate (gfc_code * code)
   TREE_USED (label_finish) = 0;
 }
 
-  /* When an expr3 is present, try to evaluate it only once.  In most
- cases expr3 is invariant for all elements of the allocation list.
- Only exceptions are arrays.  Furthermore the standards prevent a
- dependency of expr3 on the objects in the allocate list.  Therefore
- it is safe to pre-evaluate expr3 for complicated expressions, i.e.
- everything not a variable or constant.  When an array allocation
- is wanted, then the following block nevertheless evaluates the
- _vptr, _len and element_size for expr3.  */
+  /* When an expr3 is present evaluate it only once.  The standards prevent a
+ dependency of expr3 on the objects in the allocate list.  An expr3 can
+ be pre-evaluated in all cases.  One just has to make sure, to use the
+ correct way, i.e., to get the descriptor or to get a reference
+ expression.  */
   if (code->expr3)
 {
   bool vtab_needed = false;
@@ -5168,75 +5166,77 @@ gfc_trans_allocate (gfc_code * code)
 	   al = al->next)
 	vtab_needed = (al->expr->ts.type == BT_CLASS);
 
-  /* A array expr3 needs the scalarizer, therefore do not process it
-	 here.  */
-  if (code->expr3->expr_type != EXPR_ARRAY
-	  && (code->expr3->rank == 0
-	  || code->expr3->expr_type == EXPR_FUNCTION)
-	  && (!code->expr3->symtree
-	  || !code->expr3->symtree->n.sym->as)
-	  && !gfc_is_class_array_ref (code->expr3, NULL))
-	{
-	  /* When expr3 is a variable, i.e., a very simple expression,
+  /* When expr3 is a variable, i.e., a very simple expression,
 	 then convert it once here.  */
-	  if ((code->expr3->expr_type == EXPR_VARIABLE)
-	  || code->expr3->expr_type == EXPR_CONSTANT)
-	{
-	  if (!code->expr3->mold
-		  || code->expr3->ts.type == BT_CHARACTER
-		  || vtab_needed)
-		{
-		  /* Convert expr3 to a tree.  */
-		  gfc_init_se (&se, NULL);
-		  se.want_pointer = 1;
-		  gfc_conv_expr (&se, code->expr3);
-		  if (!code->expr3->mold)
-		expr3 = se.expr;
-		  else
-		expr3_tmp = se.expr;
-		  expr3_len = se.string_length;
-		  gfc_add_block_to_block (&block, &se.pre);
-		  gfc_add_block_to_block (&post, &se.post);
-		}
-	  /* else expr3 = NULL_TREE set above.  */
-	}
-	  else
+  if (code->expr3->expr_type == EXPR_VARIABLE
+	  || code->expr3->expr_type == EXPR_ARRAY
+	  || code->expr3->expr_type == EXPR_CONSTANT)
+	{
+	  if (!code->expr3->mold
+	  || code->expr3->ts.type == BT_CHARACTER
+	  || vtab_needed)
 	{
-	  /* In all other cases evaluate the expr3 and create a
-		 temporary.  */
+	  /* Convert expr3 to a tree.  */
 	  gfc_init_se (&se, NULL);
-	  if (code->expr3->rank != 0
-		  && code->expr3->expr_type == EXPR_FUNCTION
-		  && code->expr3->value.function.isym)
+	  /* For all "simple" expression just get the descriptor or the
+		 reference, respectively, depending on the rank of the expr.  */
+	  if (code->expr3->rank != 0)
 		gfc_conv_expr_descriptor (&se, code->expr3);
 	  else
 		gfc_conv_expr_reference (&se, code->expr3);
-	  if (code->expr3->ts.type == BT_CLASS)
-		gfc_conv_class_to_class (&se, code->expr3,
-	 code->expr3->ts,
-	 false, true,
-	 false, false);
+	  if (!code->expr3->mold)
+		expr3 = se.expr;
+	  else
+		expr3_tmp = se.expr;
+	  expr3_len = se.string_length;
 	  gfc_add_block_to_block (&block, &se.pre);
 	  gfc_add_block_to_block (&post, &se.post);
-	  /* Prevent aliasing, i.e., se.expr may be already a
+	}
+	  /* else expr3 = NULL_TREE set above.  */
+	}
+  else
+	{
+	  /* In all other cases evaluate the expr3 and create a
+		 temporary.  */
+	  gfc_init_se (&se, NULL);
+	  symbol_attribute attr;
+	  /* Get the descriptor for all arrays, that are not allocatable or
+	 pointer, because the latter are descriptors already.  */
+	  attr = gfc_expr_attr (code->expr3);
+	  if (code->expr3->rank != 0 && !attr.allocatable && !attr.pointer)
+	gfc_conv_expr_descriptor (&se, code->expr3);
+	  else
+	gfc_conv_expr_reference (&se, code->expr3);
+	  if (code->expr3->ts.type == BT_CLASS)
+	gfc

Re: [PATCH] Fix match.pd narrowing opt (PR tree-optimization/66187)

2015-05-19 Thread Richard Biener
On Mon, 18 May 2015, Jakub Jelinek wrote:

> Hi!
> 
> As the testcases show, for signed types we really should use SIGNED rather
> than UNSIGNED as tree_int_cst_min_precision argument, that function doesn't
> really do the desired thing with UNSIGNED for negative values and with
> -fwrapv we just want the narrowing cast to not lose anything from the
> number.  Additionally, as for TYPE_OVERFLOW_WRAPS case we perform the +/-/& in
> unsigned type,  we have to avoid all negative masks, because those surely
> have the higher bits set.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.

> For plus and unsigned type, we could improve it even further, as in that
> case we could just check if the bit immediately above precision is clear
> in the mask, higher bits than that are uninteresting, because the addition
> will have always zeros there.  Perhaps as a follow-up?
> I mean say uchar foo (uchar a, uchar b) { return (x + y) & 0x72fe; }
> can be done as uchar addition & 0xfe, but not e.g. & 0x71fe.
> 
> 2015-05-18  Jakub Jelinek  
> 
>   PR tree-optimization/66187
>   * match.pd ((bit_and (plus/minus (convert @0) (convert @1)) mask)):
>   Pass TYPE_SIGN to tree_int_cst_min_precision.  If
>   !TYPE_OVERFLOW_WRAPS, ensure @4 is non-negative.
> 
>   * gcc.c-torture/execute/pr66187.c: New test.
>   * gcc.dg/pr66187-1.c: New test.
>   * gcc.dg/pr66187-2.c: New test.
> 
> --- gcc/match.pd.jj   2015-05-18 09:46:39.0 +0200
> +++ gcc/match.pd  2015-05-18 10:20:24.944475737 +0200
> @@ -1115,8 +1115,10 @@ (define_operator_list CBRT BUILT_IN_CBRT
>/* The inner conversion must be a widening conversion.  */
>&& TYPE_PRECISION (TREE_TYPE (@2)) > TYPE_PRECISION (TREE_TYPE (@0))
>&& types_match (@0, @1)
> -  && (tree_int_cst_min_precision (@4, UNSIGNED)
> +  && (tree_int_cst_min_precision (@4, TYPE_SIGN (TREE_TYPE (@0)))
><= TYPE_PRECISION (TREE_TYPE (@0)))
> +  && (TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))
> +  || tree_int_cst_sgn (@4) >= 0)
>&& single_use (@5))
>(if (TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
>   (with { tree ntype = TREE_TYPE (@0); }
> --- gcc/testsuite/gcc.c-torture/execute/pr66187.c.jj  2015-05-18 
> 10:53:43.953654893 +0200
> +++ gcc/testsuite/gcc.c-torture/execute/pr66187.c 2015-05-18 
> 10:53:28.0 +0200
> @@ -0,0 +1,16 @@
> +/* PR tree-optimization/66187 */
> +
> +int a = 1, e = -1;
> +short b, f;
> +
> +int
> +main ()
> +{
> +  f = e;
> +  int g = b < 0 ? 0 : f + b;
> +  if ((g & -4) < 0)
> +a = 0;
> +  if (a)
> +__builtin_abort ();
> +  return 0;
> +}
> --- gcc/testsuite/gcc.dg/pr66187-1.c.jj   2015-05-18 10:54:30.677906029 
> +0200
> +++ gcc/testsuite/gcc.dg/pr66187-1.c  2015-05-18 12:30:17.0 +0200
> @@ -0,0 +1,97 @@
> +/* PR tree-optimization/66187 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fno-wrapv" } */
> +
> +__attribute__((noinline, noclone)) int
> +f0 (unsigned char x, unsigned char y)
> +{
> +  return (x + y) & 0x2ff;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f1 (unsigned char x, unsigned char y)
> +{
> +  return (x - y) & 0x2ff;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f2 (signed char x, signed char y)
> +{
> +  return (x + y) & -4;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f3 (signed char x, signed char y)
> +{
> +  return (x + y) & 0xf8;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f4 (signed char x, signed char y)
> +{
> +  return (x + y) & 0x78;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f5 (unsigned char x, unsigned char y)
> +{
> +  int a = x;
> +  int b = y;
> +  int c = a + b;
> +  return c & 0x2ff;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f6 (unsigned char x, unsigned char y)
> +{
> +  int a = x;
> +  int b = y;
> +  int c = a - b;
> +  return c & 0x2ff;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f7 (signed char x, signed char y)
> +{
> +  int a = x;
> +  int b = y;
> +  int c = a + b;
> +  return c & -4;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f8 (signed char x, signed char y)
> +{
> +  int a = x;
> +  int b = y;
> +  int c = a + b;
> +  return c & 0xf8;
> +}
> +
> +__attribute__((noinline, noclone)) int
> +f9 (signed char x, signed char y)
> +{
> +  int a = x;
> +  int b = y;
> +  int c = a + b;
> +  return c & 0x78;
> +}
> +
> +int
> +main ()
> +{
> +  if (__SCHAR_MAX__ != 127 || sizeof (int) != 4)
> +return 0;
> +  if (f0 (0xff, 0xff) != 0xfe
> +  || f1 (0, 1) != 0x2ff
> +  || f2 (-2, 1) != -4
> +  || f3 (-2, 1) != 0xf8
> +  || f4 (-2, 1) != 0x78
> +  || f5 (0xff, 0xff) != 0xfe
> +  || f6 (0, 1) != 0x2ff
> +  || f7 (-2, 1) != -4
> +  || f8 (-2, 1) != 0xf8
> +  || f9 (-2, 1) != 0x78)
> +__builtin_abort ();
> +  return 0;
> +}
> --- gcc/testsuite/gcc.dg/pr66187-2.c.jj   2015-05-18 12:31:39.398176973 
> +0200
> +++ gcc/testsuite/gcc.dg/pr66187-2.c  2015-0

Re: [5/9] Create sensible dummy registers

2015-05-19 Thread Eric Botcazou
> Some pieces of code create a temporary REG or MEM and only fill it
> in later when they're testing the cost of a particular rtx.  This patch
> makes sure that even the dummy REG or MEM is valid, rather than force
> the gen_* code to handle garbage values.
> 
> 
> gcc/
>   * caller-save.c (init_caller_save): Use word_mode and
>   FIRST_PSEUDO_REGISTER when creating temporary rtxes.
>   * expr.c (init_expr_target): Likewise.
>   * ira.c (setup_prohibited_mode_move_regs): Likewise.
>   * postreload.c (reload_cse_regs_1): Likewise.

Isn't LAST_VIRTUAL_REGISTER + 1 the canonical regno to be used in this case?

-- 
Eric Botcazou


Re: [match-and-simplify] report error for invalid operator-lists

2015-05-19 Thread Richard Biener
On Tue, 19 May 2015, Prathamesh Kulkarni wrote:

> Hi,
> genmatch segfaults on:
> (define_operator_list op (plus))
> 
> The above syntax is invalid, and it segfaults on:
> fatal_error (token, "operator list is empty");
> because token is NULL.
> 
> The patch puts a check for CPP_CLOSE_PAREN after parsing id-list.
> OK for trunk after bootstrap+testing completes ?

Ok.

Thanks,
Richard.


Re: [match-and-simplify] fix incorrect code-gen in 'for' pattern

2015-05-19 Thread Richard Biener
On Tue, 19 May 2015, Prathamesh Kulkarni wrote:

> On 18 May 2015 at 20:17, Prathamesh Kulkarni
>  wrote:
> > On 18 May 2015 at 14:12, Richard Biener  wrote:
> >> On Sat, 16 May 2015, Prathamesh Kulkarni wrote:
> >>
> >>> Hi,
> >>> genmatch generates incorrect code for following (artificial) pattern:
> >>>
> >>> (for op (plus)
> >>>   op2 (op)
> >>>   (simplify
> >>> (op @x @y)
> >>> (op2 @x @y)
> >>>
> >>> generated gimple code: http://pastebin.com/h1uau9qB
> >>> 'op' is not replaced in the generated code on line 33:
> >>> *res_code = op;
> >>>
> >>> I think it would be a better idea to make op2 iterate over same set
> >>> of operators (op2->substitutes = op->substitutes).
> >>> I have attached patch for the same.
> >>> Bootstrap + testing in progress on x86_64-unknown-linux-gnu.
> >>> OK for trunk after bootstrap+testing completes ?
> >>
> >> Hmm, but then the example could as well just use 'op'.  I think we
> >> should instead reject this.
> >>
> >> Consider
> >>
> >>   (for op (plus minus)
> >> (for op2 (op)
> >>   (simplify ...
> >>
> >> where it is not clear what would be desired.  Simple replacement
> >> of 'op's value would again just mean you could have used 'op' in
> >> the first place.  Doing what you propose would get you
> >>
> >>   (for op (plus minus)
> >> (for op2 (plus minus)
> >>   (simplify ...
> >>
> >> thus a different iteration.
> >>
> >>> I wonder if we really need is_oper_list flag in user_id ?
> >>> We can determine if user_id is an operator list
> >>> if user_id::substitutes is not empty ?
> >>
> >> After your change yes.
> >>
> >>> That will lose the ability to distinguish between user-defined operator
> >>> list and list-iterator in for like op/op2, but I suppose we (so far) don't
> >>> need to distinguish between them ?
> >>
> >> Well, your change would simply make each list-iterator a (temporary)
> >> user-defined operator list as well as the current iterator element
> >> (dependent on context - see the nested for example).  I think that
> >> adds to confusion.
> AFAIU, the way it's implemented in lower_for, the iterator is handled
> the same as a user-defined operator
> list. I was wondering if we should get rid of 'for' altogether and
> have it replaced
> by operator-list ?
> 
> IMHO having two different things - iterator and operator-list is
> unnecessary and we could
> brand iterator as a "local" operator-list. We could extend syntax of 
> 'simplify'
> to accommodate "local" operator-lists.
> 
> So we can say, using an operator-list within 'match' replaces it by
> corresponding operators in that list.
> Operator-lists can be "global" (visible to all patterns), or local to
> a particular pattern.
> 
> eg:
> a) single for
> (for op (...)
>   (simplify
> (op ...)))
> 
> can be written as:
> (simplify
>   op (...)  // define "local" operator-list op.
>   (op ...)) // proceed here the same way as for lowering "global" operator 
> list.

it's not shorter and it's harder to parse.  And you can't share the
operator list with multiple simplifies like

 (for op (...)
   (simplify
 ...)
   (simplify
 ...))

which is already done I think.

> b) multiple iterators:
> (for op1 (...)
>   op2 (...)
>   (simplify
> (op1 (op2 ...
> 
> can be written as:
> (simplify
>   op1 (...)
>   op2 (...)
>   (op1 (op2 ...)))
> 
> c) nested for
> (for op1 (...)
> (for op2 (...)
>   (simplify
> (op1 (op2 ...
> 
> can be written as:
> 
> (simplify
>   op1 (...)
>   (simplify
> op2 (...)
> (op1 (op2 ...
> 
> My rationale behind removing 'for' is we don't need to distinguish
> between an "operator-list" and "iterator",
> and only have an operator-list -;)
> Also we can reuse parser::parse_operator_list (in parser::parse_for
> parsing oper-list is duplicated)
> and get rid of 'parser::parse_for'.
> We don't need to change lowering, since operator-lists are handled
> the same way as 'for' (we can keep lowering of simplify::for_vec as it is).
> 
> Does it sound reasonable ?

I dont' think the proposed syntax is simpler or more powerful.

Richard.

> Thanks,
> Prathamesh
> >>
> >> So - can you instead reject this use?
> > Well my intention was to have support for walking operator list in reverse.
> > For eg:
> > (for bitop (bit_and bit_ior)
> >   rbitop (bit_ior bit_and)
> >...)
> > Could be replaced by sth like:
> > (for bitop (bit_and bit_ior)
> >   rbitop (~bitop))
> >...)
> >
> > where "~bitop" would indicate walking (bit_and bit_ior) in reverse.
> > Would that be a good idea ? For symmetry, I thought
> > (for op (list)
> >   op2 (op))
> > should be supported too.
> >
> >
> > Thanks,
> > Prathamesh
> >
> >
> >>
> >> Thanks,
> >> Richard.
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham 
Norton, HRB 21284 (AG Nuernberg)


[PATCH] Fix PR66185

2015-05-19 Thread Richard Biener

I forgot to roll back SLP child state when ending up building vectors
from scalars.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-05-19  Richard Biener  

PR tree-optimization/66185
* tree-vect-slp.c (vect_build_slp_tree): Properly roll back
when building the SLP node from scalars.

* gcc.dg/torture/pr66185.c: New testcase.

Index: gcc/tree-vect-slp.c
===
*** gcc/tree-vect-slp.c (revision 223288)
--- gcc/tree-vect-slp.c (working copy)
*** vect_build_slp_tree (loop_vec_info loop_
*** 1103,1108 
--- 1103,1118 
 scalar version.  */
  && !is_pattern_stmt_p (vinfo_for_stmt (stmt)))
{
+ unsigned int j;
+ slp_tree grandchild;
+ 
+ /* Roll back.  */
+ *max_nunits = old_max_nunits;
+ loads->truncate (old_nloads);
+ FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (child), j, grandchild)
+   vect_free_slp_tree (grandchild);
+ SLP_TREE_CHILDREN (child).truncate (0);
+ 
  dump_printf_loc (MSG_NOTE, vect_location,
   "Building vector operands from scalars\n");
  oprnd_info->def_stmts = vNULL;
Index: gcc/testsuite/gcc.dg/torture/pr66185.c
===
*** gcc/testsuite/gcc.dg/torture/pr66185.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr66185.c  (working copy)
***
*** 0 
--- 1,13 
+ /* { dg-do compile } */
+ 
+ unsigned int a;
+ int b[5], c;
+ 
+ int
+ main ()
+ {
+   for (c = 0; c < 4; c++)
+ b[c] = b[c+1] > ((b[0] > 0) > a);
+ 
+   return 0;
+ }


[PATCH] Fix PR66165

2015-05-19 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-05-19  Richard Biener  

PR tree-optimization/66165
* tree-vect-slp.c (vect_supported_load_permutation_p): Add guard
for no load permutation.

* gcc.dg/torture/pr66165.c: New testcase.

Index: gcc/tree-vect-slp.c
===
*** gcc/tree-vect-slp.c (revision 223288)
--- gcc/tree-vect-slp.c (working copy)
*** vect_supported_load_permutation_p (slp_i
*** 1400,1405 
--- 1400,1407 
   no permutation is necessary.  */
FOR_EACH_VEC_ELT (SLP_INSTANCE_LOADS (slp_instn), i, node)
  {
+ if (!SLP_TREE_LOAD_PERMUTATION (node).exists ())
+   continue;
  bool subchain_p = true;
next_load = NULL;
FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), j, load)
Index: gcc/testsuite/gcc.dg/torture/pr66165.c
===
*** gcc/testsuite/gcc.dg/torture/pr66165.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr66165.c  (working copy)
***
*** 0 
--- 1,11 
+ /* { dg-do compile } */
+ 
+ void foo(double *d, double *a)
+ {
+   d[0] += d[2];
+   d[1] += d[3];
+   d[2] += d[4];
+   d[3] += d[5];
+   a[0] = d[0];
+   a[1] = d[1];
+ }


Re: Refactor gimple_expr_type

2015-05-19 Thread Richard Biener
On Tue, May 19, 2015 at 12:04 AM, Aditya K  wrote:
>
>
> 
>> Date: Mon, 18 May 2015 12:08:58 +0200
>> Subject: Re: Refactor gimple_expr_type
>> From: richard.guent...@gmail.com
>> To: hiradi...@msn.com
>> CC: tbsau...@tbsaunde.org; gcc-patches@gcc.gnu.org
>>
>> On Sun, May 17, 2015 at 5:31 PM, Aditya K  wrote:
>>>
>>>
>>> 
 Date: Sat, 16 May 2015 11:53:57 -0400
 From: tbsau...@tbsaunde.org
 To: hiradi...@msn.com
 CC: gcc-patches@gcc.gnu.org
 Subject: Re: Refactor gimple_expr_type

 On Fri, May 15, 2015 at 07:13:35AM +, Aditya K wrote:
> Hi,
> I have tried to refactor gimple_expr_type to make it more readable. 
> Removed the switch block and redundant if.
>
> Please review this patch.

 for some reason your mail client seems to be inserting non breaking
 spaces all over the place. Please either configure it to not do that,
 or use git send-email for patches.
>>>
>>> Please see the updated patch.
>>
>> Ok if this passed bootstrap and regtest. (I wish if gimple_expr_type
>> didn't exist btw...)
>
> Thanks for the review. Do you have any suggestions on how to remove 
> gimple_expr_type. Are there any alternatives to it?
> I can look into refactoring more (if it is not too complicated) since I'm 
> already doing this.

Look at each caller - usually they should be fine with using TREE_TYPE
(gimple_get_lhs ()) (or a more specific one
dependent on what stmts are expected at the place).  You might want to
first refactor the code

  else if (code == GIMPLE_COND)
gcc_unreachable ();

and deal with the fallout in callers (similar for the void_type_node return).

Richard.


> -Aditya
>
>>
>> Thanks,
>> Richard.
>>
>>> gcc/ChangeLog:
>>>
>>> 2015-05-15 hiraditya 
>>>
>>> * gimple.h (gimple_expr_type): Refactor to make it concise. Remove 
>>> redundant if.
>>>
>>> diff --git a/gcc/gimple.h b/gcc/gimple.h
>>> index 95e4fc8..3a83e8f 100644
>>> --- a/gcc/gimple.h
>>> +++ b/gcc/gimple.h
>>> @@ -5717,36 +5717,26 @@ static inline tree
>>> gimple_expr_type (const_gimple stmt)
>>> {
>>> enum gimple_code code = gimple_code (stmt);
>>> -
>>> - if (code == GIMPLE_ASSIGN || code == GIMPLE_CALL)
>>> + /* In general we want to pass out a type that can be substituted
>>> + for both the RHS and the LHS types if there is a possibly
>>> + useless conversion involved. That means returning the
>>> + original RHS type as far as we can reconstruct it. */
>>> + if (code == GIMPLE_CALL)
>>> {
>>> - tree type;
>>> - /* In general we want to pass out a type that can be substituted
>>> - for both the RHS and the LHS types if there is a possibly
>>> - useless conversion involved. That means returning the
>>> - original RHS type as far as we can reconstruct it. */
>>> - if (code == GIMPLE_CALL)
>>> - {
>>> - const gcall *call_stmt = as_a  (stmt);
>>> - if (gimple_call_internal_p (call_stmt)
>>> - && gimple_call_internal_fn (call_stmt) == IFN_MASK_STORE)
>>> - type = TREE_TYPE (gimple_call_arg (call_stmt, 3));
>>> - else
>>> - type = gimple_call_return_type (call_stmt);
>>> - }
>>> + const gcall *call_stmt = as_a  (stmt);
>>> + if (gimple_call_internal_p (call_stmt)
>>> + && gimple_call_internal_fn (call_stmt) == IFN_MASK_STORE)
>>> + return TREE_TYPE (gimple_call_arg (call_stmt, 3));
>>> + else
>>> + return gimple_call_return_type (call_stmt);
>>> + }
>>> + else if (code == GIMPLE_ASSIGN)
>>> + {
>>> + if (gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR)
>>> + return TREE_TYPE (gimple_assign_rhs1 (stmt));
>>> else
>>> - switch (gimple_assign_rhs_code (stmt))
>>> - {
>>> - case POINTER_PLUS_EXPR:
>>> - type = TREE_TYPE (gimple_assign_rhs1 (stmt));
>>> - break;
>>> -
>>> - default:
>>> - /* As fallback use the type of the LHS. */
>>> - type = TREE_TYPE (gimple_get_lhs (stmt));
>>> - break;
>>> - }
>>> - return type;
>>> + /* As fallback use the type of the LHS. */
>>> + return TREE_TYPE (gimple_get_lhs (stmt));
>>> }
>>> else if (code == GIMPLE_COND)
>>> return boolean_type_node;
>>>
>>>
>>> Thanks,
>>> -Aditya
>>>
>>>
>>>
>>>
>>>

>
> Thanks,
> -Aditya
>
>
> gcc/ChangeLog:
>
> 2015-05-15 hiraditya 
>
> * gimple.h (gimple_expr_type): Refactor to make it concise. Remove 
> redundant if.
>
> diff --git a/gcc/gimple.h b/gcc/gimple.h
> index 95e4fc8..168d3ba 100644
> --- a/gcc/gimple.h
> +++ b/gcc/gimple.h
> @@ -5717,35 +5717,28 @@ static inline tree
> gimple_expr_type (const_gimple stmt)
> {
> enum gimple_code code = gimple_code (stmt);
> -
> - if (code == GIMPLE_ASSIGN || code == GIMPLE_CALL)
> + tree type;
> + /* In general we want to pass out a type that can be substituted
> + for both the RHS and the LHS types if there is a possibly
> + useless conversion involved. That means returning the
> + original RHS type as far as we can reconstruct it. */
> + if (code == GIMPLE_CALL)
> {
>

Re: [RFC] COMDAT Safe Module Level Multi versioning

2015-05-19 Thread Richard Biener
On Tue, May 19, 2015 at 8:16 AM, Sriraman Tallam  wrote:
> We have the following problem with selectively compiling modules with
> -m options and I have provided a solution to solve this.  I would
> like to hear what you think.
>
> Multi versioning at module granularity is done by compiling a subset
> of modules with advanced ISA instructions, supported on later
> generations of the target architecture, via -m options and
> invoking the functions defined in these modules with explicit checks
> for the ISA support via builtin functions,  __builtin_cpu_supports.
> This mechanism has the unfortunate side-effect that generated COMDAT
> candidates from these modules can contain these advanced instructions
> and potentially “violate” ODR assumptions.  Choosing such a COMDAT
> candidate over a generic one from a different module can cause SIGILL
> on platforms where the advanced ISA is not supported.
>
> Here is a slightly contrived  example to illustrate:
>
>
> matrixdouble.h
> 
> // Template (Comdat) function definition in a header:
>
> template
> __attribute__((noinline))
> void matrixDouble (T *a) {
>   for (int i = 0 ; i < 16; ++i)  //Vectorizable Loop
> a[i] = a[i] * 2;
> }
>
> avx.cc  (Compile with -mavx -O2)
> -
>
> #include "matrixdouble.h"
> void getDoubleAVX(int *a) {
>  matrixDouble(a);  // Instantiated with vectorized AVX instructions
> }
>
>
> non_avx.cc (Compile with -mno-avx -O2)
> ---
>
> #include “matrixdouble.h”
> void
> getDouble(int *a) {
>  matrixDouble(a); // Instantiated with non-AVX instructions
> }
>
>
> main.cc
> ---
>
> void getDoubleAVX(int *a);
> void getDouble(int *a);
>
> int a[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
> int main () {
>  // The AVX call is appropriately guarded.
>  if (__builtin_cpu_supports(“avx”))
>getDoubleAVX(a);
>  else
>getDouble(a);
>  return a[0];
> }
>
>
> In the above code, function “getDoubleAVX” is only called when the
> run-time CPU supports AVX instructions.  This code looks clean but
> suffers from the COMDAT ODR violation.  Two copies of COMDAT function
> “matrixDouble” are generated.  One copy is generated in object file
> “avx.o” with AVX instructions and another copy exists in “non_avx.o”
> without AVX instruction.  At link time, in a link order where object
> file avx.o is seen ahead of  non_avx.o,  the COMDAT copy of function
> “matrixDouble” that contains AVX instructions is kept leading to
> SIGILL on unsupported platforms.  To reproduce the SIGILL,
>
>
> $  g++ -c -O2 -mavx avx.cc
> $ g++ -c -O2 -mno-avx non_avx.cc
> $  g++ main.cc avx.o non_avx.o
> $ ./a.out   # on a non-AVX machine
> Illegal Instruction
>
>
> To solve this, I propose introducing a new compiler option, say
> -fodr-unsafe-comdats, to let the user tag objects that use specialized
> options and let the linker choose the comdat candidate to be linked
> wisely.  The root cause of the above problem is that comdat functions
> in common headers may not be properly guarded and the linker picks the
> first candidate it sees.  A link order where the object with the
> specialized comdat functions appear first causes these comdats to be
> picked leading to SIGILL on unsupported arches.  With the objects
> tagged, the linker can be made to pick other comdat candidates when
> possible.
>
> More details:
>
> This option is user specified when using arch specific options like
> -m.  It is an indicator to the compiler that any comdat bodies
> generated are potentially unsafe for execution.  Note that the COMDAT
> bodies however have to be generated as there are no guarantees that
> other modules will do so.  The compiler then emits a specially named
> section, like “.gnu.odr.unsafe”, in the object file.  When the linker
> tries to pick a COMDAT candidate from several choices, it must avoid
> COMDAT copies from objects with sections named “.gnu.odr.unsafe” when
> presented with a choice to pick a candidate from an object that does
> not have the “.gnu.odr.unsafe” section.  Note that it may not be
> possible to do that in which case the linker must pick the unsafe
> copy, it could explicitly warn when this happens.
>
> Alternately,  the compiler can bind locally any emitted comdat version
> from a specialized module, which could also be guarded by an option.
> This will solve the problem but this may not be always possible
> especially when addresses of any such comdat version is taken.

Hm.  But which options are unsafe?  Also wouldn't it be better to simply
_not_ have unsafe options produce comdats but always make local clones
for them (thus emit the comdat with "unsafe" flags dropped)?

Richard.

>
> Thanks
> Sri


Re: [CHKP, PATCH] Fix instrumented indirect calls with propagated pointers

2015-05-19 Thread Ilya Enkovich
Ping

2015-05-05 11:05 GMT+03:00 Ilya Enkovich :
> Ping
>
> 2015-04-14 17:35 GMT+03:00 Ilya Enkovich :
>> On 10 Apr 03:27, Jan Hubicka wrote:
>>> >
>>> > +  /* We might propagate instrumented function pointer into
>>> > + not instrumented function and vice versa.  In such a
>>> > + case we need to either fix function declaration or
>>> > + remove bounds from call statement.  */
>>> > +  if (flag_check_pointer_bounds && callee)
>>> > +skip_bounds = chkp_redirect_edge (e);
>>>
>>> I think this gets wrong the case where the edge is speculative and the new
>>> direct call needs adjustement.  You probably need to do the right think in
>>> the if (e->speculative) branch so direct call is updated by indirect is not
>>> or at least give an explanation why this is not needed :)
>>>
>>> The speculative edge handling works in a way that the runtime conditoinal is
>>> built and then the edge is updated to the direct path, so perhaps
>>> you can just move all this after the ocnditoinal?
>>
>> I think you are right, it should be OK to move it after apeculative call 
>> processing.
>>
>>>
>>> > diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
>>> > index 404cb68..ffb6ad7 100644
>>> > --- a/gcc/lto-wrapper.c
>>> > +++ b/gcc/lto-wrapper.c
>>> > @@ -273,6 +273,7 @@ merge_and_complain (struct cl_decoded_option 
>>> > **decoded_options,
>>> > case OPT_fwrapv:
>>> > case OPT_fopenmp:
>>> > case OPT_fopenacc:
>>> > +   case OPT_fcheck_pointer_bounds:
>>> >   /* For selected options we can merge conservatively.  */
>>> >   for (j = 0; j < *decoded_options_count; ++j)
>>> > if ((*decoded_options)[j].opt_index == foption->opt_index)
>>> > @@ -503,6 +504,7 @@ append_compiler_options (obstack *argv_obstack, 
>>> > struct cl_decoded_option *opts,
>>> > case OPT_Ofast:
>>> > case OPT_Og:
>>> > case OPT_Os:
>>> > +   case OPT_fcheck_pointer_bounds:
>>>
>>> Hmm, will this make mixed units contaiing some check bounds and some 
>>> non-check bounds to work?
>>> Perhaps these should be function specific? Does things like inlining bounds 
>>> checked function into
>>> non-checked function work?
>>
>> This actually should make it work better because solves a possible problem 
>> with uninitialized static bounds data (chkp static constructors are 
>> generated only when OPT_fcheck_pointer_bounds is passed).
>>
>> Inlining of instrumentation thunks is not supported (similar to all other 
>> thunks).  But we may have a not instrumented call in an instrumented 
>> function and do inlining for it.
>>
>>>
>>> Otherwise the patch seems resonable.
>>> Honza
>>
>>
>> Here is a fixed version with chkp redirection moved.  Bootstrap and testing 
>> passed.  Is it OK for trunk and later for GCC 5?
>>
>> Thanks,
>> Ilya
>> --
>> gcc/
>>
>> 2015-04-14  Ilya Enkovich  
>>
>> PR target/65527
>> * cgraph.c (cgraph_edge::redirect_call_stmt_to_callee): Add
>> redirection for instrumented calls.
>> * lto-wrapper.c (merge_and_complain): Merge -fcheck-pointer-bounds.
>> (append_compiler_options): Append -fcheck-pointer-bounds.
>> * tree-chkp.h (chkp_copy_call_skip_bounds): New.
>> (chkp_redirect_edge): New.
>> * tree-chkp.c (chkp_copy_call_skip_bounds): New.
>> (chkp_redirect_edge): New.
>>
>> gcc/testsuite/
>>
>> 2015-04-14  Ilya Enkovich  
>>
>> PR target/65527
>> * gcc.target/i386/mpx/chkp-fix-calls-1.c: New.
>> * gcc.target/i386/mpx/chkp-fix-calls-2.c: New.
>> * gcc.target/i386/mpx/chkp-fix-calls-3.c: New.
>> * gcc.target/i386/mpx/chkp-fix-calls-4.c: New.
>>
>>
>> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
>> index 85531c8..38e71fc 100644
>> --- a/gcc/cgraph.c
>> +++ b/gcc/cgraph.c
>> @@ -1281,6 +1281,7 @@ cgraph_edge::redirect_call_stmt_to_callee (void)
>>tree lhs = gimple_call_lhs (e->call_stmt);
>>gcall *new_stmt;
>>gimple_stmt_iterator gsi;
>> +  bool skip_bounds = false;
>>  #ifdef ENABLE_CHECKING
>>cgraph_node *node;
>>  #endif
>> @@ -1389,8 +1390,16 @@ cgraph_edge::redirect_call_stmt_to_callee (void)
>> }
>>  }
>>
>> +  /* We might propagate instrumented function pointer into
>> + not instrumented function and vice versa.  In such a
>> + case we need to either fix function declaration or
>> + remove bounds from call statement.  */
>> +  if (flag_check_pointer_bounds && e->callee)
>> +skip_bounds = chkp_redirect_edge (e);
>> +
>>if (e->indirect_unknown_callee
>> -  || decl == e->callee->decl)
>> +  || (decl == e->callee->decl
>> + && !skip_bounds))
>>  return e->call_stmt;
>>
>>  #ifdef ENABLE_CHECKING
>> @@ -1415,13 +1424,19 @@ cgraph_edge::redirect_call_stmt_to_callee (void)
>> }
>>  }
>>
>> -  if (e->callee->clone.combined_args_to_skip)
>> +  if (e->callee->clone.combined_args_to_skip
>> +  || skip_bounds)
>>  {
>>int lp_nr;
>>
>> -  new_stmt
>> -   = gimple_call_copy_skip_args (e->ca

Re: [CHKP, PATCH] Fix LTO cgraph merge for instrumented functions

2015-05-19 Thread Ilya Enkovich
Ping

2015-05-05 11:06 GMT+03:00 Ilya Enkovich :
> Ping
>
> 2015-04-14 12:14 GMT+03:00 Ilya Enkovich :
>> On 10 Apr 03:15, Jan Hubicka wrote:
>>> >
>>> > References are not streamed out for nodes which are referenced in a
>>> > partition but don't belong to it ('continue' condition in output_refs
>>> > loop).
>>>
>>> Yeah, but it already special cases aliases (because we now always preserve 
>>> IPA_REF_ALIAS link
>>> in the boundary, so I think making IPA_REF_CHKP to be rpeserved should go 
>>> the same path)
>>> so we can special case the instrumentation thunks too?
>>
>> Any cgraph_node having instrumented clone should have it, not only 
>> instrumentation thunks.  Surely we can make an exception for IPA_REF_CHKP.
>>
>>> >
>>> > >
>>> > > I suppose we still need to
>>> > >  1) write_symbol and lto-symtab should know that transparent aliases 
>>> > > are not real
>>> > > symbols and ignore them. We have predicate 
>>> > > symtab_node::real_symbol_p,
>>> > > I suppose it should return true.
>>> > >
>>> > > I think cgraph_merge and varpool_merge in lto-symtab needs to know 
>>> > > what to do
>>> > > when merging two symbols with transparent aliases.
>>> > >  2) At some point we need to populate transparent alias links as 
>>> > > discussed in the
>>> > > other thread.
>>> > >  3) For safety, I guess we want 
>>> > > symbol_table::change_decl_assembler_name to either
>>> > > sanity check there are no trasparent alias links pointing to old 
>>> > > assembler
>>> > > names or update it.
>>> >
>>> > Wouldn't search for possible referring via transparent aliases nodes
>>> > be too expensive?
>>>
>>> Changing of decl_assembler_name is not very common and if we set up the 
>>> links
>>> only after renaming of statics, i think we are safe. But it would be better 
>>> to
>>> keep verify it at some point.
>>>
>>> I suppose verify_node check that there is no transparent alias link on 
>>> symbols
>>> were it is not supposed to be and that there is link when it is supposed to 
>>> be (i.e.
>>> symtab_state is >=EXPANSION and symbol is either weakref on tragets w/o 
>>> native weakrefs
>>> or instrumentation cone.
>>>
>>> Would you please mind adding the test and making sure it triggers when 
>>> things are
>>> out of sync?
>>>
>>
>> OK, but I suppose it should be a part of transparent alias links rework 
>> discussed in another thread.  BTW are you going to intoroduce transparent 
>> aliases in cgraph soon?
>>
>>> >
>>> > >  4) I think we want verify_node to check that transparent alias links 
>>> > > and chkp points
>>> > > as intended and only on those symbols where they need to be
>>> > >
>>> > > There is also logic in lto-partition to always insert alias target
>>> > >> > OK, so you have the chkp references that links to 
>>> > >> > instrumented_version.
>>> > >> > You do not stream them for boundary symbols but for some reason they 
>>> > >> > are needed,
>>> > >> > but when you can end up with wrong CHKP link?
>>> > >>
>>> > >> Wrong links may appear when cgraph nodes are merged.
>>> > >
>>> > > I see, I think you really want to fix these at merging time as sugested 
>>> > > in 1).
>>> > > In this case even the node->instrumented_version pointer may be out of 
>>> > > date and
>>> > > point to a node that lost in merging?
>>> >
>>> > node->instrumented_version is redirected and never points to lost
>>> > symbol. But during merge we may have situations when
>>> > (node->instrumented_version->instrumented_version != node) because of
>>> > multiple not merged (yet) symbols referencing the same merged target.
>>>
>>> OK, which should not be a problem if we stream the links instead of 
>>> rebuilding them, right?
>>
>> IPA_REF_CHKP refs don't affect instrumented_version merging. And I actually 
>> don't see here any problems, instrumented_version links always come into 
>> consistent state when symbol merging finishes.
>>
>>
>> Here is a new patch version.  It has no chkp_maybe_fix_chkp_ref function,  
>> IPA_REF_CHKP references are streamed out instead.  Bootstrapped and tested 
>> on x86_64-unknown-linux-gnu.  OK for trunk?  I also want to port it to GCC 5 
>> branch after release.
>>
>> Thanks,
>> Ilya
>> --
>> gcc/
>>
>> 2015-04-14  Ilya Enkovich  
>>
>> * ipa.c (symbol_table::remove_unreachable_nodes): Don't
>> remove instumentation thunks calling reachable functions.
>> * lto-cgraph.c (output_refs): Always output IPA_REF_CHKP.
>> * lto/lto-partition.c (privatize_symbol_name_1): New.
>> (privatize_symbol_name): Privatize both decl and orig_decl
>> names for instrumented functions.
>>
>> gcc/testsuite/
>>
>> 2015-04-14  Ilya Enkovich  
>>
>> * gcc.dg/lto/chkp-privatize-1_0.c: New.
>> * gcc.dg/lto/chkp-privatize-1_1.c: New.
>> * gcc.dg/lto/chkp-privatize-2_0.c: New.
>> * gcc.dg/lto/chkp-privatize-2_1.c: New.
>>
>>
>> diff --git a/gcc/ipa.c b/gcc/ipa.c
>> index b3752de..3054afe 100644
>> --- a/gcc/ipa.c

Re: [PATCH 01/13] recog: Increased max number of alternatives - v2

2015-05-19 Thread Richard Biener
On Tue, May 19, 2015 at 10:40 AM, Andreas Krebbel
 wrote:
> On 05/18/2015 04:19 PM, Richard Biener wrote:
>> On Mon, May 18, 2015 at 3:41 PM, Andreas Krebbel
>>  wrote:
>>> The new version also changes the type for the alternative_mask to unsigned 
>>> HOST_WIDE_INT.
>>>
>>> Bootstrapped without regressions on x86-64.
>>>
>>> Ok to apply?
>>
>> Please use uint64_t instead.
>
> Done. Ok with that change?

Yes.

Richard.

> -Andreas-
>
>


[PATCH][wwwdocs] Mention native CPU detection in aarch64 notes for GCC 6

2015-05-19 Thread Kyrill Tkachov

Hi all,

This patch adds a mention of the new native cpu detection feature in aarch64 
GNU/Linux.
Gerald, this is a patch against htdocs/gcc-6/changes.html and I thought I had 
seen the 'changes' link
in gcc.gnu.org earlier but I don't see it now (there's only a release criteria 
link).
Is this a bug?

Ok to apply?

Thanks,
Kyrill
Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.7
diff -U 3 -r1.7 changes.html
--- changes.html	14 May 2015 22:07:28 -	1.7
+++ changes.html	19 May 2015 10:11:57 -
@@ -66,10 +66,19 @@
 
 
 
-
-
-
+New Targets and Target Specific Improvements
 
+AArch64
+   
+ 
+   The new command line options -march=native,
+   -mcpu=native and -mtune=native are now
+   available on native AArch64 GNU/Linux systems.  Specifying
+   these options will cause GCC to auto-detect the host CPU and
+   rewrite these options to the optimal setting for that system.
+   If GCC is unable to detect the host CPU these options have no effect.
+ 
+   
 
 
 


[PATCH] Fix PR66168: ICE due to incorrect invariant register info

2015-05-19 Thread Thomas Preud'homme
Hi,

r223113 made it possible for invariant to actually be moved rather than
moving the source to a new pseudoregister. However, when doing so
the inv->reg is not set up properly: in case of a subreg destination it
holds the inner register rather than the subreg expression.

This patch fixes that.


ChangeLog entries are as follow:

*** gcc/ChangeLog ***

2015-05-18  Thomas Preud'homme  

PR rtl-optimization/66168
* loop-invariant.c (move_invariant_reg): Set inv->reg to destination
of inv->insn when moving an invariant without introducing a temporary
register.

*** gcc/testsuite/ChangeLog ***

2015-05-18  Thomas Preud'homme  

PR rtl-optimization/66168
* gcc.c-torture/compile/pr66168.c: New test.


diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index 76a009f..30e1945 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -1642,9 +1642,13 @@ move_invariant_reg (struct loop *loop, unsigned invno)
 
  emit_insn_after (gen_move_insn (dest, reg), inv->insn);
}
-  else if (dump_file)
-   fprintf (dump_file, "Invariant %d moved without introducing a new "
-   "temporary register\n", invno);
+  else
+   {
+ reg = SET_DEST (set);
+ if (dump_file)
+   fprintf (dump_file, "Invariant %d moved without introducing a new "
+   "temporary register\n", invno);
+   }
   reorder_insns (inv->insn, inv->insn, BB_END (preheader));
 
   /* If there is a REG_EQUAL note on the insn we just moved, and the
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr66168.c 
b/gcc/testsuite/gcc.c-torture/compile/pr66168.c
new file mode 100644
index 000..d6bfc7b
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr66168.c
@@ -0,0 +1,15 @@
+int a, b;
+
+void
+fn1 ()
+{
+  for (;;)
+{
+  for (b = 0; b < 3; b++)
+   {
+ char e[2];
+ char f = e[1];
+ a ^= f ? 1 / f : 0;
+   }
+}
+}


Tested by bootstrapping on x86_64-linux-gnu and building an arm-none-eabi
cross-compiler. Testsuite run shows no regression for both of them.

Ok for trunk?

Best regards,

Thomas




Re: [Patch, fortran, PR44672, v6] [F08] ALLOCATE with SOURCE and no array-spec

2015-05-19 Thread Andre Vehreschild
Hi all,

update based on latest 65548 (v5) patch and current trunk. Description and
issue addressed unchanged (see cite below).

Bootstrapped and regtested on x86_64-linux-gnu/f21.

Any volunteers to review? The initial version dates back to March 30. 2015. Not
a single comment so far!

- Andre



On Thu, 30 Apr 2015 16:17:42 +0200
Andre Vehreschild  wrote:

> Hi all,
> 
> and also for this bug, I like to present an updated patch. It was brought to
> my attention, that the previous patch did not fix statements like:
> 
> allocate(m, source=[(I, I=1, n)])
> 
> where n is a variable and
> 
> type p
>   class(*), allocatable :: m(:,:)
> end type
> real mat(2,3)
> type(P) :: o
> allocate(o%m, source=mat)
> 
> The new version of the patch fixes those issue now also and furthermore
> addresses some issues (most probably not all) where the rank of the
> source=-variable and the rank of the array to allocate differ. For example,
> when one is do:
> 
> real v(:)
> allocate(v, source= arr(1,2:3))
> 
> where arr has a rank of 2 and only the source=-expression a rank of one, which
> is then compatible with v. Nevertheless did this need addressing, when setting
> up the descriptor of the v and during data copy.
> 
> Bootstrap ok on x86_64-linux-gnu/f21.
> Regtests with one regression in gfortran.dg/alloc_comp_constructor_1.f90,
> which is addressed in the patch for pr58586, whose final version is in
> preparation.
> 
> Ok for trunk in combination with 58586 once both are reviewed?
> 
> Regards,
>   Andre
> 
> 
> On Wed, 29 Apr 2015 17:23:58 +0200
> Andre Vehreschild  wrote:
> 
> > Hi all,
> > 
> > this is the fourth version of the patch, adapting to the current state of
> > trunk. This patch is based on my patch for 65584 version 2 and needs that
> > patch applied beforehand to apply cleanly. The patch for 65548 is available
> > from:
> > 
> > https://gcc.gnu.org/ml/fortran/2015-04/msg00121.html
> > 
> > Scope:
> > 
> > Allow allocate of arrays w/o having to give an array-spec as specified in
> > F2008:C633. An example is:
> > 
> > integer, dimension(:) :: arr
> > allocate(arr, source = [1,2,3])
> > 
> > Solution:
> > 
> > While resolving an allocate, the objects to allocate are analyzed whether
> > they carry an array-spec, if not the array-spec of the source=-expression is
> > transferred. Unfortunately some source=-expressions are not easy to handle
> > and have to be assigned to a temporary variable first. Only with the
> > temporary variable the gfc_trans_allocate() is then able to compute the
> > array descriptor correctly and allocate with correct array bounds.
> > 
> > Side notes:
> > 
> > This patch creates a regression in alloc_comp_constructor_1.f90 where two
> > free()'s are gone missing. This will be fixed by the patch for pr58586 and
> > therefore not repeated here.
> > 
> > Bootstraps and regtests ok on x86_64-linux-gnu/f21.
> > 
> > Ok for trunk?
> > 
> > Regards,
> > Andre
> > 
> 
> 


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


pr44672_6.clog
Description: Binary data
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index aaa4e89..a7d862b 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -2396,6 +2396,9 @@ typedef struct gfc_code
 {
   gfc_typespec ts;
   gfc_alloc *list;
+  /* Take the array specification from expr3 to allocate arrays
+	 without an explicit array specification.  */
+  unsigned arr_spec_from_expr3:1;
 }
 alloc;
 
diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index fbf260f..6678138 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -6804,7 +6804,7 @@ conformable_arrays (gfc_expr *e1, gfc_expr *e2)
have a trailing array reference that gives the size of the array.  */
 
 static bool
-resolve_allocate_expr (gfc_expr *e, gfc_code *code)
+resolve_allocate_expr (gfc_expr *e, gfc_code *code, bool *array_alloc_wo_spec)
 {
   int i, pointer, allocatable, dimension, is_abstract;
   int codimension;
@@ -7103,13 +7103,24 @@ resolve_allocate_expr (gfc_expr *e, gfc_code *code)
   if (!ref2 || ref2->type != REF_ARRAY || ref2->u.ar.type == AR_FULL
   || (dimension && ref2->u.ar.dimen == 0))
 {
-  gfc_error ("Array specification required in ALLOCATE statement "
-		 "at %L", &e->where);
-  goto failure;
+  /* F08:C633.  */
+  if (code->expr3)
+	{
+	  if (!gfc_notify_std (GFC_STD_F2008, "Array specification required "
+			   "in ALLOCATE statement at %L", &e->where))
+	goto failure;
+	  *array_alloc_wo_spec = true;
+	}
+  else
+	{
+	  gfc_error ("Array specification required in ALLOCATE statement "
+		 "at %L", &e->where);
+	  goto failure;
+	}
 }
 
   /* Make sure that the array section reference makes sense in the
-context of an ALLOCATE specification.  */
+ context of an ALLOCATE specification.  */
 
   ar = &ref2->u.ar;
 
@@ -7124,7 +7135,7 @@ resolve_allocate_expr (gfc_expr *e, gfc_code *code)
 
   for (i = 0; i < ar->dimen; i++)
 {
- 

[gomp4] Add OpenACC vector-single/vector-partitioned tests

2015-05-19 Thread Julian Brown
Hi,

This patch adds several tests of vector-single/vector-partitioned mode,
as part of work implementing the OpenACC execution model.

Pre-approved for gomp4 branch. I will apply there shortly.

Thanks,

Julian

ChangeLog

libgomp/
* testsuite/libgomp.oacc-c-c++-common/vec-single-{1,2,3,4,5,6}.c:
New tests.
* testsuite/libgomp.oacc-c-c++-common/vec-partn-{1,2,3,4,5,6}.c:
New tests.
commit b2bb572cef2b6b0984d65995e070dc424b03a525
Author: jbrown 
Date:   Mon May 11 16:04:48 2015 +

Add vector-single/vector-partitioned tests.

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-1.c
new file mode 100644
index 000..b21e588
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-1.c
@@ -0,0 +1,30 @@
+#include 
+
+/* Test basic vector-partitioned mode transitions.  */
+
+int
+main (int argc, char *argv[])
+{
+  int n = 0, arr[32], i;
+
+  for (i = 0; i < 32; i++)
+arr[i] = 0;
+
+  #pragma acc parallel copy(n, arr) num_gangs(1) num_workers(1) \
+		   vector_length(32)
+  {
+int j;
+n++;
+#pragma acc loop vector
+for (j = 0; j < 32; j++)
+  arr[j]++;
+n++;
+  }
+
+  assert (n == 2);
+
+  for (i = 0; i < 32; i++)
+assert (arr[i] == 1);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-2.c
new file mode 100644
index 000..1ff222d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-2.c
@@ -0,0 +1,43 @@
+#include 
+
+/* Test vector-partitioned, gang-partitioned mode.  */
+
+int
+main (int argc, char *argv[])
+{
+  int n[32], arr[1024], i;
+  
+  for (i = 0; i < 1024; i++)
+arr[i] = 0;
+
+  for (i = 0; i < 32; i++)
+n[i] = 0;
+
+  #pragma acc parallel copy(n, arr) num_gangs(32) num_workers(1) \
+		   vector_length(32)
+  {
+int j, k;
+
+#pragma acc loop gang(static:*)
+for (j = 0; j < 32; j++)
+  n[j]++;
+
+#pragma acc loop gang
+for (j = 0; j < 32; j++)
+  #pragma acc loop vector
+  for (k = 0; k < 32; k++)
+	arr[j * 32 + k]++;
+
+#pragma acc loop gang(static:*)
+for (j = 0; j < 32; j++)
+  n[j]++;
+  }
+
+  for (i = 0; i < 32; i++)
+assert (n[i] == 2);
+
+  for (i = 0; i < 1024; i++)
+assert (arr[i] == 1);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-3.c
new file mode 100644
index 000..7908d4c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-3.c
@@ -0,0 +1,54 @@
+#include 
+
+/* Test conditional vector-partitioned loops.  */
+
+int
+main (int argc, char *argv[])
+{
+  int n[32], arr[1024], i;
+
+  for (i = 0; i < 1024; i++)
+arr[i] = 0;
+
+  for (i = 0; i < 32; i++)
+n[i] = 0;
+
+  #pragma acc parallel copy(n, arr) num_gangs(32) num_workers(1) \
+		   vector_length(32)
+  {
+int j, k;
+
+#pragma acc loop gang(static:*)
+for (j = 0; j < 32; j++)
+  n[j]++;
+
+#pragma acc loop gang
+for (j = 0; j < 32; j++)
+  {
+	if ((j % 2) == 0)
+	  {
+	#pragma acc loop vector
+	for (k = 0; k < 32; k++)
+	  arr[j * 32 + k]++;
+	  }
+	else
+	  {
+	#pragma acc loop vector
+	for (k = 0; k < 32; k++)
+	  arr[j * 32 + k]--;
+	  }
+  }
+
+#pragma acc loop gang(static:*)
+for (j = 0; j < 32; j++)
+  n[j]++;
+  }
+
+  for (i = 0; i < 32; i++)
+assert (n[i] == 2);
+
+  for (i = 0; i < 1024; i++)
+assert (arr[i] == (i % 64) < 32 ? 1 : -1);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-4.c
new file mode 100644
index 000..4ea3bf2
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-4.c
@@ -0,0 +1,46 @@
+#include 
+
+/* Test conditions inside vector-partitioned loops.  */
+
+int
+main (int argc, char *argv[])
+{
+  int n[32], arr[1024], i;
+
+  for (i = 0; i < 1024; i++)
+arr[i] = i;
+
+  for (i = 0; i < 32; i++)
+n[i] = 0;
+
+  #pragma acc parallel copy(n, arr) num_gangs(32) num_workers(1) \
+		   vector_length(32)
+  {
+int j, k;
+
+#pragma acc loop gang(static:*)
+for (j = 0; j < 32; j++)
+  n[j]++;
+
+#pragma acc loop gang
+for (j = 0; j < 32; j++)
+  {
+	#pragma acc loop vector
+	for (k = 0; k < 32; k++)
+	  if ((arr[j * 32 + k] % 2) != 0)
+	arr[j * 32 + k] *= 2;
+  }
+
+#pragma acc loop gang(static:*)
+for (j = 0; j < 32; j++)
+  n[j]++;
+  }
+
+  for (i = 0; i < 32; i++)
+assert (n[i] == 2);
+
+  for (i = 0; i < 1024; i++)
+assert (arr[i] == ((i % 2) == 0 ? i : i * 2));
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-5.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-5.c
new file mode 100644
index 000..86b742a
--- /dev/null
+++ b/libgomp/testsuite/li

[gomp4] Lack of OpenACC NVPTX devices is not an error during scanning

2015-05-19 Thread Julian Brown
Hi,

This patch fixes an oversight whereby if the CUDA libraries are
available for some reason on a system that doesn't actually contain an
nVidia card, an OpenACC program will raise an error if the NVPTX
backend is picked as a default instead of falling back to some other
device instead.

OK for gomp4 branch? For trunk?

Thanks,

Julian

ChangeLog

libgomp/
* plugin/plugin-nvptx.c (nvptx_get_num_devices): Return zero
on cuInit failure.commit 696a0d7e22bb8217ff581886cdf0979bfc2e85bb
Author: Julian Brown 
Date:   Fri May 15 03:22:56 2015 -0700

Lack of PTX devices is not an error during scanning.

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index b36691a..d09a91c 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -781,7 +781,13 @@ nvptx_get_num_devices (void)
  until cuInit has been called.  Just call it now (but don't yet do any
  further initialization).  */
   if (instantiated_devices == 0)
-cuInit (0);
+{
+  r = cuInit (0);
+  /* This is not an error: e.g. we may have CUDA libraries installed but
+ no devices available.  */
+  if (r != CUDA_SUCCESS)
+return 0;
+}
 
   r = cuDeviceGetCount (&n);
   if (r!= CUDA_SUCCESS)


Re: [gomp4] Lack of OpenACC NVPTX devices is not an error during scanning

2015-05-19 Thread Jakub Jelinek
On Tue, May 19, 2015 at 11:36:58AM +0100, Julian Brown wrote:
> This patch fixes an oversight whereby if the CUDA libraries are
> available for some reason on a system that doesn't actually contain an
> nVidia card, an OpenACC program will raise an error if the NVPTX
> backend is picked as a default instead of falling back to some other
> device instead.
> 
> OK for gomp4 branch? For trunk?
> 
> Thanks,
> 
> Julian
> 
> ChangeLog
> 
> libgomp/
> * plugin/plugin-nvptx.c (nvptx_get_num_devices): Return zero
> on cuInit failure.

LGTM.

Jakub


Re: Fix PR48052: loop not vectorized if index is "unsigned int"

2015-05-19 Thread Bin.Cheng
On Wed, May 6, 2015 at 7:02 PM, Richard Biener
 wrote:
> On Mon, May 4, 2015 at 9:47 PM, Abderrazek Zaafrani
>  wrote:
>> This is an old thread and we are still running into similar issues:
>> Code is not being vectorized on 64-bit target due to scev not being
>> able to optimally analyze overflow condition.
>>
>> While the original test case shown here seems to work now, it does not
>> work if the start value is not a constant and the loop index variable
>> is of unsigned type: Ex
>>
>> void loop2( double const * __restrict__ x_in, double * __restrict__
>> x_out, double const * __restrict__ c, unsigned int N, unsigned int
>> start) {
>>  for(unsigned int i=start; i!=N; ++i)
>>x_out[i] = c[i]*x_in[i];
>> }
>>
>> Here is our unit test:
>>
>> int foo(int* A, int* B, unsigned start, unsigned B)
>> {
>>   int s;
>>   for (unsigned k = start; k > s += A[k] * B[k];
>>   return s;
>> }
>>
>> Our unit test case is extracted from a matrix multiply of a
>> two-dimensional array and all loops are blocked by hand by a factor of
>> B. Even though a bit modified, above loop corresponds to the innermost
>> loop of the blocked matrix multiply.
>>
>> We worked on patch to solve the problem (see attachment.)
>> The attached patch passed bootstrap and make check on x86_64-linux.
>> Ok for trunk?
>
> Apart from coding style / API issues the case you handle is very special
> (IVs with step 1 only?!) I believe it is also wrong - the assumption that
> if there is a symbolic or constant expression for the number of iterations
> a BIV will not wrap is not true.  niter analysis can very well compute
> the number of iterations for a loop with wrapping IVs.  For your unit test
> this only works because of the special-casing of step 1 IVs.
I happen to look into similar issue right now.  scev_probably_wraps_p
and thus chrec_convert_1 should be improved using niter information.
Actually all information (and the wrap behavior) has already been
computed in tree-ssa-loop-niter.c.  We just need to find a way to used
it.

>
> Technically it might be more interesting to compute wrapping of IVs
> during niter analysis in some more generic way (we have iv->no_overflow
> computed by simple_iv, but that is rather not useful here).

For it iv->no_overflow is computed in simple_iv as below:
  tmp = analyze_scalar_evolution (use_loop, ev);
  ev = resolve_mixers (use_loop, tmp);

  if (folded_casts && tmp != ev)
*folded_casts = true;

It's inaccurate because calling resolve_mixers doesn't mean the result
scev will wrap.  resolve_mixers could have just done exact the same
transformation as instantiate_parameters.  Also
chrec_convert_aggressive is incomplete and need to revised too.

Thanks,
bin
>
> Richard.
>
>> Thanks,
>> Abderrazek Zaafrani


Re: [PATCH] Handle multiple vector sizes in BB vectorization

2015-05-19 Thread Rainer Orth
Richard Biener  writes:

> Well, not really - but at least don't fail vectorization because of that
> but allow it to proceed the "build up from scalar pieces" path.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

The testcase FAILs on Solaris/SPARC:

FAIL: gcc.dg/vect/bb-slp-35.c -flto -ffat-lto-objects  scan-tree-dump slp2 
"basic block vectorized"
FAIL: gcc.dg/vect/bb-slp-35.c scan-tree-dump slp2 "basic block vectorized"

The dump

/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/bb-slp-35.c:6:11: note: 
not vectorized: unsupported unaligned store.*p_6(D)
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/bb-slp-35.c:6:11: note: 
not vectorized: unsupported alignment in basic block.

suggests that the following adjustment is needed.  Tested on
sparc-sun-solaris2.11 on x86_64-unknown-linux-gnu.

Ok for mainline?

Rainer


2015-05-19  Rainer Orth  

* gcc.dg/vect/bb-slp-35.c: Adjust.

# HG changeset patch
# Parent 7e4562f46f5c81f1894e9efc36a5f6bd409b5a41
Fix gcc.dg/vect/bb-slp-35.c on SPARC

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-35.c b/gcc/testsuite/gcc.dg/vect/bb-slp-35.c
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-35.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-35.c
@@ -9,5 +9,5 @@ void foo (int * __restrict__ p, short * 
   p[3] = q[3] + 1;
 }
 
-/* { dg-final { scan-tree-dump "basic block vectorized" "slp2" } } */
+/* { dg-final { scan-tree-dump "basic block vectorized" "slp2" { target vect_hw_misalign } } } */
 /* { dg-final { cleanup-tree-dump "slp2" } } */

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH][AArch64] PR target/65491: Classify V1TF vectors as AAPCS64 short vectors rather than composite types

2015-05-19 Thread James Greenhalgh
On Mon, Apr 20, 2015 at 11:16:02AM +0100, Kyrylo Tkachov wrote:
> Hi all,
> 
> The ICE in the PR happens when we pass a 1x(128-bit float) vector as an
> argument.
> The aarch64 backend erroneously classifies it as a composite type when in
> fact it
> is a short vector according to AAPCS64
> (section 4.1.2 from
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.p
> df).

Agreed.

> 
> The solution in this patch is to check aarch64_composite_type_p for a short
> vector with
> aarch64_short_vector_p rather than the other way around (check for
> aarch64_short_vector_p
> in aarch64_composite_type_p).

I think I understand what you are saying, but your patch does the
opposite (ADDS a check for aarch64_short_vector_p in
aarch64_composite_type_p, REMOVES a check for aarch64_composite_type_p,
in aarch64_short_vector_p)...

This logic is pretty hairy, and I'm struggling to convince myself that
your change only hits the bug you described above. I think I've worked
it through and it does, but if you can find any additional ABI tests
which stress the Vector/Floating-Point passing rules that would help
settle my nerves.

The patch is OK. I wouldn't think we would want to backport it to
release branches as there is no regression to fix.

Thanks,
James

> 2015-04-20  Kyrylo Tkachov  
> 
> PR target/65491
> * config/aarch64/aarch64.c (aarch64_short_vector_p): Move above
> aarch64_composite_type_p.  Remove check for aarch64_composite_type_p.
> (aarch64_composite_type_p): Return false if given type and mode are
> for a short vector.
> 
> 2015-04-20  Kyrylo Tkachov  
> 
> PR target/65491
> * gcc.target/aarch64/pr65491_1.c: New test.
> * gcc.target/aarch64/aapcs64/type-def.h (vlf1_t): New typedef.
> * gcc.target/aarch64/aapcs64/func-ret-1.c: Add test for vlf1_t.



Re: [PATCH] Fix PR66168: ICE due to incorrect invariant register info

2015-05-19 Thread Steven Bosscher
On Tue, May 19, 2015 at 12:17 PM, Thomas Preud'homme wrote:
> 2015-05-18  Thomas Preud'homme
>
> PR rtl-optimization/66168
> * loop-invariant.c (move_invariant_reg): Set inv->reg to destination
> of inv->insn when moving an invariant without introducing a temporary
> register.

Not OK.
This will break in move_invariants() when it looks at REGNO (inv->reg).

Ciao!
Steven


Re: [PATCH] Handle multiple vector sizes in BB vectorization

2015-05-19 Thread Richard Biener
On Tue, 19 May 2015, Rainer Orth wrote:

> Richard Biener  writes:
> 
> > Well, not really - but at least don't fail vectorization because of that
> > but allow it to proceed the "build up from scalar pieces" path.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> 
> The testcase FAILs on Solaris/SPARC:
> 
> FAIL: gcc.dg/vect/bb-slp-35.c -flto -ffat-lto-objects  scan-tree-dump slp2 
> "basic block vectorized"
> FAIL: gcc.dg/vect/bb-slp-35.c scan-tree-dump slp2 "basic block vectorized"
> 
> The dump
> 
> /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/bb-slp-35.c:6:11: note: 
> not vectorized: unsupported unaligned store.*p_6(D)
> /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/bb-slp-35.c:6:11: note: 
> not vectorized: unsupported alignment in basic block.
> 
> suggests that the following adjustment is needed.  Tested on
> sparc-sun-solaris2.11 on x86_64-unknown-linux-gnu.
> 
> Ok for mainline?

Ok.

Thanks,
Richard.

>   Rainer
> 
> 
> 2015-05-19  Rainer Orth  
> 
>   * gcc.dg/vect/bb-slp-35.c: Adjust.


Commit: MSP430: Enhance the zero_extendhisi2 pattern

2015-05-19 Thread Nick Clifton
Hi Guys,

  I am applying the patch below to enhance the zero_extendhisi2 pattern
  in the MSP430 backend so that it can cope with separate source and
  destination registers.  This makes zero extending into another
  register more efficient and it also helps to work around a reload bug
  reported in PR 66156.

Cheers
  Nick

gcc/ChangeLog
2015-05-19  Nick Clifton  

PR target/66156
* config/msp430/msp430.md (zero_extendhisi2): Add support for
separate source and destination registers.

Index: gcc/config/msp430/msp430.md
===
--- gcc/config/msp430/msp430.md (revision 223348)
+++ gcc/config/msp430/msp430.md (working copy)
@@ -588,10 +588,12 @@
 ;; patterns.  Doing these manually allows for alternate optimization
 ;; paths.
 (define_insn "zero_extendhisi2"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=rm")
-   (zero_extend:SI (match_operand:HI 1 "nonimmediate_operand" "0")))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=rm,r")
+   (zero_extend:SI (match_operand:HI 1 "nonimmediate_operand" "0,r")))]
   "msp430x"
-  "MOV.W\t#0,%H0"
+  "@
+  MOV.W\t#0,%H0
+  MOV.W\t%1,%L0 { MOV.W\t#0,%H0"
 )
 
 (define_insn "zero_extendhisipsi2"


Re: [Patch, fortran, 64674, v2] [OOP] ICE in ASSOCIATE with class array

2015-05-19 Thread Andre Vehreschild
Hi,

so here is the update on pr 64674. Besides adapting to current trunk nothing
has changed from the previous version. The links for getting the patches this
one depends on are:

PR65548 v5: https://gcc.gnu.org/ml/fortran/2015-05/msg00123.html and
PR44672 v6: https://gcc.gnu.org/ml/fortran/2015-05/msg00124.html

Bootstraps and regtests fine on x86_64-linux-gnu/f21.

Any comments?

- Andre

On Mon, 4 May 2015 16:53:15 +0200
Andre Vehreschild  wrote:

> Hi all,
> 
> I like to present here a first patch for using class arrays in associate. Upto
> now gfortran crashed, when a class array-section/element was selected in an
> associate. This patch fixes this now for class array sections as well as for
> single elements.
> 
> The story of the patch is told quite shortly: 
> 
> - parse.c::parse_associate() needs to gather more information about what the
>   target is like. Previously the target's rank and array_spec was not
> computed, which disallowed the use of further array refs in the associate
> body: associate (vec => class_matrix(2:3, 2))
> vec(1) = ... ! <- Unclassifiable statement, because no array_spec was
>   attached to vec. This is fixed by the second hunk of the patch.
> 
> - The third hunk in primary.c prevents setting the dimension attribute on a
>   class object's symbol.
> 
> - The hunks in resolve.c take care about adding dummy full array_refs and in
>   resolve_assoc_var correct the class type, when the target expression's rank
>   is 0. Previously the symbol would have an array valued type, when the
>   target's base type was array valued. But for a scalar target this needed
> some polishing.
> 
> - Additionally a test was added.
> 
> Bootstraps and regtests ok on x86_64-linux-gnu/f21.
> 
> Ok for trunk?
> 
> Note, this patch was diffed from a trunk with my older patches for
> 
> PR65548, v3 https://gcc.gnu.org/ml/fortran/2015-04/msg00123.html and
> PR44672, v5 https://gcc.gnu.org/ml/fortran/2015-04/msg00124.html
> 
> applied.
> 
> Regards,
>   Andre


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


pr64787_v2.clog
Description: Binary data
diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c
index 786876c..455aa69 100644
--- a/gcc/fortran/class.c
+++ b/gcc/fortran/class.c
@@ -234,6 +234,9 @@ gfc_add_component_ref (gfc_expr *e, const char *name)
 }
   if (*tail != NULL && strcmp (name, "_data") == 0)
 next = *tail;
+  else
+/* Avoid losing memory.  */
+gfc_free_ref_list (*tail);
   (*tail) = gfc_get_ref();
   (*tail)->next = next;
   (*tail)->type = REF_COMPONENT;
@@ -2562,13 +2565,19 @@ find_intrinsic_vtab (gfc_typespec *ts)
 	  c->attr.access = ACCESS_PRIVATE;
 
 	  /* Build a minimal expression to make use of
-		 target-memory.c/gfc_element_size for 'size'.  */
+		 target-memory.c/gfc_element_size for 'size'.  Special handling
+		 for character arrays, that are not constant sized: to support
+		 len(str)*kind, only the kind information is stored in the
+		 vtab.  */
 	  e = gfc_get_expr ();
 	  e->ts = *ts;
 	  e->expr_type = EXPR_VARIABLE;
 	  c->initializer = gfc_get_int_expr (gfc_default_integer_kind,
 		 NULL,
-		 (int)gfc_element_size (e));
+		 ts->type == BT_CHARACTER
+		 && charlen == 0 ?
+		   ts->kind :
+		   (int)gfc_element_size (e));
 	  gfc_free_expr (e);
 
 	  /* Add component _extends.  */
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index f55c691..f4fa9c8 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -3168,6 +3168,7 @@ void gfc_add_component_ref (gfc_expr *, const char *);
 void gfc_add_class_array_ref (gfc_expr *);
 #define gfc_add_data_component(e) gfc_add_component_ref(e,"_data")
 #define gfc_add_vptr_component(e) gfc_add_component_ref(e,"_vptr")
+#define gfc_add_len_component(e)  gfc_add_component_ref(e,"_len")
 #define gfc_add_hash_component(e) gfc_add_component_ref(e,"_hash")
 #define gfc_add_size_component(e) gfc_add_component_ref(e,"_size")
 #define gfc_add_def_init_component(e) gfc_add_component_ref(e,"_def_init")
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 54f8f4a..697a17a 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -4975,8 +4975,7 @@ static tree
 gfc_array_init_size (tree descriptor, int rank, int corank, tree * poffset,
 		 gfc_expr ** lower, gfc_expr ** upper, stmtblock_t * pblock,
 		 stmtblock_t * descriptor_block, tree * overflow,
-		 tree expr3_elem_size, tree *nelems, gfc_expr *expr3,
-		 gfc_typespec *ts)
+		 tree expr3_elem_size, tree *nelems, gfc_expr *expr3)
 {
   tree type;
   tree tmp;
@@ -5002,7 +5001,7 @@ gfc_array_init_size (tree descriptor, int rank, int corank, tree * poffset,
 
   /* Set the dtype.  */
   tmp = gfc_conv_descriptor_dtype (descriptor);
-  gfc_add_modify (descriptor_block, tmp, gfc_get_dtype (TREE_TYPE (descriptor)));
+  gfc_add_modify (descriptor_block, tmp, gfc_get_dtype (type));
 
   or_expr = boolean_false

Re: [PATCH] plugin event for C/C++ function definitions

2015-05-19 Thread Andres Tiraboschi
2015-05-18 16:51 GMT-03:00  :
> Hi, this patch adds two new plugin events PLUGIN_START_PARSE_FUNCTION and 
> PLUGIN_FINISH_PARSE_FUNCTION. These events are invoked at start_function and 
> finish_function in gcc/c/c-decl.c and gcc/cp/decl.c respectively in the C and 
> C++ frontends.
> PLUGIN_START_PARSE_FUNCTION is called before parsing a function body.
> PLUGIN_FINISH_PARSE_FUNCTION is called after parsing a function definition.
> This patch has been implemented in gcc 5.1.0.
>
> changelog:
>
> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> index e28a294..fcc849d 100644
> --- a/gcc/c/c-decl.c
> +++ b/gcc/c/c-decl.c
> @@ -8235,6 +8235,7 @@ start_function (struct c_declspecs *declspecs, struct 
> c_declarator *declarator,
>
>decl1 = grokdeclarator (declarator, declspecs, FUNCDEF, true, NULL,
>   &attributes, NULL, NULL, DEPRECATED_NORMAL);
> +  invoke_plugin_callbacks (PLUGIN_START_PARSE_FUNCTION, decl1);
>
>/* If the declarator is not suitable for a function definition,
>   cause a syntax error.  */
> @@ -9050,6 +9051,7 @@ finish_function (void)
>   It's still in DECL_STRUCT_FUNCTION, and we'll restore it in
>   tree_rest_of_compilation.  */
>set_cfun (NULL);
> +  invoke_plugin_callbacks (PLUGIN_FINISH_PARSE_FUNCTION, 
> current_function_decl);
>current_function_decl = NULL;
>  }
>
> diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
> index c4731ae..bde92cc 100644
> --- a/gcc/cp/decl.c
> +++ b/gcc/cp/decl.c
> @@ -13727,6 +13727,7 @@ start_function (cp_decl_specifier_seq *declspecs,
>tree decl1;
>
>decl1 = grokdeclarator (declarator, declspecs, FUNCDEF, 1, &attrs);
> +  invoke_plugin_callbacks (PLUGIN_START_PARSE_FUNCTION, decl1);
>if (decl1 == error_mark_node)
>  return false;
>/* If the declarator is not suitable for a function definition,
> @@ -14365,6 +14366,7 @@ finish_function (int flags)
>vec_free (deferred_mark_used_calls);
>  }
>
> +  invoke_plugin_callbacks (PLUGIN_FINISH_PARSE_FUNCTION, fndecl);
>return fndecl;
>  }
>
> diff --git a/gcc/doc/plugins.texi b/gcc/doc/plugins.texi
> index c6caa19..d50f25c 100644
> --- a/gcc/doc/plugins.texi
> +++ b/gcc/doc/plugins.texi
> @@ -174,6 +174,8 @@ Callbacks can be invoked at the following pre-determined 
> events:
>  @smallexample
>  enum plugin_event
>  @{
> +  PLUGIN_START_PARSE_FUNCTION,  /* Called before parsing the body of a 
> function. */
> +  PLUGIN_FINISH_PARSE_FUNCTION, /* After finishing parsing a function. */
>PLUGIN_PASS_MANAGER_SETUP,/* To hook into pass manager.  */
>PLUGIN_FINISH_TYPE,   /* After finishing parsing a type.  */
>PLUGIN_FINISH_DECL,   /* After finishing parsing a declaration. */
> diff --git a/gcc/plugin.c b/gcc/plugin.c
> index d924438..628833f 100644
> --- a/gcc/plugin.c
> +++ b/gcc/plugin.c
> @@ -441,6 +441,8 @@ register_callback (const char *plugin_name,
> return;
>   }
>/* Fall through.  */
> +  case PLUGIN_START_PARSE_FUNCTION:
> +  case PLUGIN_FINISH_PARSE_FUNCTION:
>case PLUGIN_FINISH_TYPE:
>case PLUGIN_FINISH_DECL:
>case PLUGIN_START_UNIT:
> @@ -519,6 +521,8 @@ invoke_plugin_callbacks_full (int event, void *gcc_data)
> gcc_assert (event >= PLUGIN_EVENT_FIRST_DYNAMIC);
> gcc_assert (event < event_last);
>/* Fall through.  */
> +  case PLUGIN_START_PARSE_FUNCTION:
> +  case PLUGIN_FINISH_PARSE_FUNCTION:
>case PLUGIN_FINISH_TYPE:
>case PLUGIN_FINISH_DECL:
>case PLUGIN_START_UNIT:
> diff --git a/gcc/plugin.def b/gcc/plugin.def
> index 98c988a..2a7e4c2 100644
> --- a/gcc/plugin.def
> +++ b/gcc/plugin.def
> @@ -17,6 +17,11 @@ You should have received a copy of the GNU General Public 
> License
>  along with GCC; see the file COPYING3.  If not see
>  .  */
>
> +/* Called before parsing the body of a function.  */
> +DEFEVENT (PLUGIN_START_PARSE_FUNCTION)
> +
> +/* After finishing parsing a function. */
> +DEFEVENT (PLUGIN_FINISH_PARSE_FUNCTION)
>
>  /* To hook into pass manager.  */
>  DEFEVENT (PLUGIN_PASS_MANAGER_SETUP)
> diff --git a/gcc/testsuite/g++.dg/plugin/def-plugin-test.C 
> b/gcc/testsuite/g++.dg/plugin/def-plugin-test.C
> new file mode 100644
> index 000..b7f2d3d
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/plugin/def-plugin-test.C
> @@ -0,0 +1,13 @@
> +int global = 12;
> +
> +int function1(void);
> +
> +int function2(int a) // { dg-warning "Start fndef function2" }
> +{
> +  return function1() + a;
> +} //  { dg-warning "Finish fndef function2" }
> +
> +int function1(void) // { dg-warning "Start fndef function1" }
> +{
> +  return global + 1;
> +} //  { dg-warning "Finish fndef function1" }
> diff --git a/gcc/testsuite/g++.dg/plugin/def_plugin.c 
> b/gcc/testsuite/g++.dg/plugin/def_plugin.c
> new file mode 100644
> index 000..63983c5
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/plugin/def_plugin.c
> @@ -0,0 +1,45 @@
> +/* A plugin example that shows w

[Patch ARM] Add cpu_defines.h for ARM

2015-05-19 Thread Ramana Radhakrishnan
Hardware Integer divide instructions do not trap. Define this to be so 
for the ARM port.


Applied to trunk after a build and test across architecture ranges and a 
bootstrap and regression run on a Cortex-A15 - a v7ve core that has 
hardware divide instructions.


A patch for AArch64 follows.

regards
Ramana

2015-05-17  Ramana Radhakrishnan  

* configure.host: Define cpu_defines_dir for ARM.
* config/cpu/arm/cpu_defines.h: New file.

Index: ChangeLog
===
--- ChangeLog   (revision 223359)
+++ ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2015-05-17  Ramana Radhakrishnan  
+
+   * configure.host: Define cpu_defines_dir for ARM.
+   * config/cpu/arm/cpu_defines.h: New file.
+
 2015-05-17  François Dumont  
 
* include/bits/unordered_map.h (unordered_map, unordered_multimap): Add
Index: config/cpu/arm/cpu_defines.h
===
--- config/cpu/arm/cpu_defines.h(revision 0)
+++ config/cpu/arm/cpu_defines.h(working copy)
@@ -0,0 +1,40 @@
+// Specific definitions for generic platforms  -*- C++ -*-
+
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+/** @file bits/cpu_defines.h
+ *  This is an internal header file, included by other library headers.
+ *  Do not attempt to use it directly. @headername{iosfwd}
+ */
+
+#ifndef _GLIBCXX_CPU_DEFINES
+#define _GLIBCXX_CPU_DEFINES 1
+
+// Integer divide instructions don't trap on ARM.
+#ifdef __ARM_ARCH_EXT_IDIV__
+#define __glibcxx_integral_traps false
+#else
+#define __glibcxx_integral_traps true
+#endif
+
+#endif
Index: configure.host
===
--- configure.host  (revision 223359)
+++ configure.host  (working copy)
@@ -143,6 +143,9 @@
 # Set specific CPU overrides for cpu_defines_dir. Most can just use generic.
 # THIS TABLE IS SORTED.  KEEP IT THAT WAY.
 case "${host_cpu}" in
+  arm*)
+cpu_defines_dir=cpu/arm
+;;
   powerpc* | rs6000)
 cpu_defines_dir=cpu/powerpc
 ;;


[Patch AArch64] Add cpu_defines.h for AArch64.

2015-05-19 Thread Ramana Radhakrishnan

Hi,

Like the ARM port, the AArch64 ports needs to set glibc_integral_traps 
to false as integer divide instructions do not trap.


Bootstrapped and regression tested on aarch64-none-linux-gnu

Ok to apply ?

regards
Ramana


2015-05-17  Ramana Radhakrishnan  

* configure.host: Define cpu_defines_dir for AArch64
* config/cpu/aarch64/cpu_defines.h: New file.
>From 1e38b2a848a313e5b98494094b198b7a1e34c59c Mon Sep 17 00:00:00 2001
From: Ramana Radhakrishnan 
Date: Mon, 18 May 2015 15:45:49 +0100
Subject: [PATCH 2/2] Do the same for AArch64.

---
 libstdc++-v3/config/cpu/aarch64/cpu_defines.h | 36 +++
 libstdc++-v3/configure.host   |  3 +++
 2 files changed, 39 insertions(+)
 create mode 100644 libstdc++-v3/config/cpu/aarch64/cpu_defines.h

diff --git a/libstdc++-v3/config/cpu/aarch64/cpu_defines.h b/libstdc++-v3/config/cpu/aarch64/cpu_defines.h
new file mode 100644
index 000..d5a6fd0
--- /dev/null
+++ b/libstdc++-v3/config/cpu/aarch64/cpu_defines.h
@@ -0,0 +1,36 @@
+// Specific definitions for generic platforms  -*- C++ -*-
+
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+/** @file bits/cpu_defines.h
+ *  This is an internal header file, included by other library headers.
+ *  Do not attempt to use it directly. @headername{iosfwd}
+ */
+
+#ifndef _GLIBCXX_CPU_DEFINES
+#define _GLIBCXX_CPU_DEFINES 1
+
+// Integer divide instructions don't trap on AArch64.
+#define __glibcxx_integral_traps false
+
+#endif
diff --git a/libstdc++-v3/configure.host b/libstdc++-v3/configure.host
index 465a40a..b1ca7b7 100644
--- a/libstdc++-v3/configure.host
+++ b/libstdc++-v3/configure.host
@@ -143,6 +143,9 @@ cpu_include_dir=cpu/${try_cpu}
 # Set specific CPU overrides for cpu_defines_dir. Most can just use generic.
 # THIS TABLE IS SORTED.  KEEP IT THAT WAY.
 case "${host_cpu}" in
+  aarch64*)
+cpu_defines_dir=cpu/aarch64
+;;
   arm*)
 cpu_defines_dir=cpu/arm
 ;;
-- 
1.9.1



[patch, avr] Restore base register if not marked dead/unused

2015-05-19 Thread Sivanupandi, Pitchumani
Test gcc.c-torture/execute/memcpy-bi.c (-O2) failed for attiny40 device.
Cause seems to be in "load from memory" as it is not restoring base
register after load instructions generated.

Function avr_out_load_psi_reg_no_disp_tiny in avr.c:
It returns just after emitting instructions to load from memory to 
registers. It is important to restore the base register if it is 
not marked dead/unused after that insn.

Code to restore base register is present already. Below patch let 
the function do the restore before return.

If OK, could someone commit? I do not have commit access.

diff --git a/gcc/config/avr/avr.c b/gcc/config/avr/avr.c
index 4e83de8..b653858 100644
--- a/gcc/config/avr/avr.c
+++ b/gcc/config/avr/avr.c
@@ -4365,9 +4365,9 @@ avr_out_load_psi_reg_no_disp_tiny (rtx insn, rtx *op, int 
*plen)
 }
   else
 {
-  return avr_asm_len ("ld %A0,%1+"  CR_TAB
-  "ld %B0,%1+"  CR_TAB
-  "ld %C0,%1", op, plen, -3);
+  avr_asm_len ("ld %A0,%1+"  CR_TAB
+   "ld %B0,%1+"  CR_TAB
+   "ld %C0,%1", op, plen, -3);
 
   if (reg_dest != reg_base - 2 &&
   !reg_unused_after (insn, base))

Regards,
Pitchumani

gcc/ChangeLog
2015-05-19  Pitchumani Sivanupandi  

* config/avr/avr.c (avr_out_load_psi_reg_no_disp_tiny): Restore base
register if not marked dead/unused, before return.



Re: [Patch ARM-AArch64/testsuite 00/13] Neon intrinsics executable tests

2015-05-19 Thread James Greenhalgh
On Tue, May 12, 2015 at 09:30:48PM +0100, Christophe Lyon wrote:
> This patch series is a follow-up to the tests I already contributed,
> converted from my original testsuite.
> 
> This series consists in 13 new files, which can be committed
> independently.
> 
> Another series (hopefully final) will follow.
> 
> Tested with qemu on arm*linux, aarch64-linux. I couldn't test on 
> aarch64_be-none-elf because my build is currently broken (see PR 66018).
> 
> 2015-05-12  Christophe Lyon  
> 
>   * gcc.target/aarch64/neon-intrinsics/vqmovn.c: New file.
>   * gcc.target/aarch64/neon-intrinsics/vqmovun.c: Likewise.
>   * gcc.target/aarch64/neon-intrinsics/vqrdmulh.c: Likewise.
>   * gcc.target/aarch64/neon-intrinsics/vqrdmulh_lane.c: Likewise.
>   * gcc.target/aarch64/neon-intrinsics/vqrdmulh_n.c: Likewise.
>   * gcc.target/aarch64/neon-intrinsics/vqrshl.c: Likewise.
>   * gcc.target/aarch64/neon-intrinsics/vqrshrn_n.c: Likewise.
>   * gcc.target/aarch64/neon-intrinsics/vqrshrun_n.c: Likewise.
>   * gcc.target/aarch64/neon-intrinsics/vqshl.c: Likewise.
>   * gcc.target/aarch64/neon-intrinsics/vqshl_n.c: Likewise.
>   * gcc.target/aarch64/neon-intrinsics/vqshlu_n.c: Likewise.
>   * gcc.target/aarch64/neon-intrinsics/vqshrn_n.c: Likewise.
>   * gcc.target/aarch64/neon-intrinsics/vqshrun_n.c: Likewise.

Hi Christophe,

This patch set looks good to me. The patch set is OK, please apply it.

One small nit, could you run through and check the alignment of the
trailing \ in some of the macro definitions? It might be the mail
clients, but I see (for example):

+  /* Basic test: v2=vqshlu_n(v1,v), then store the result.  */
+#define TEST_VQSHLU_N2(INSN, Q, T1, T2, T3, T4, W, N, V, 
EXPECTED_CUMULATIVE_SAT, CMT) \
+  Set_Neon_Cumulative_Sat(0, VECT_VAR(vector_res, T3, W, N));  \
+  VECT_VAR(vector_res, T3, W, N) = \
+INSN##Q##_n_##T2##W(VECT_VAR(vector, T1, W, N),\
+   V); \
+  vst1##Q##_##T4##W(VECT_VAR(result, T3, W, N),\
+   VECT_VAR(vector_res, T3, W, N));\
+  CHECK_CUMULATIVE_SAT(TEST_MSG, T1, W, N, EXPECTED_CUMULATIVE_SAT, CMT)
+

in patch 11/13.

Also, if you could look out for aarch64_be fallout once the build
starts going, that would be great.

Thanks,
James

> 
> Christophe Lyon (13):
>   Add vqmovn tests.
>   Add vqmovun tests.
>   Add vqrdmulh tests.
>   Add vqrdmulh_lane tests.
>   Add vqrdmulh_n tests.
>   Add vqrshl tests.
>   Add vqrshrn_n tests.
>   Add vqrshrun_n tests.
>   Add vqshl tests.
>   Add vqshl_n tests.
>   Add vqshlu_n tests.
>   Add vqshrn_n tests.
>   Add vqshrun_n tests.
> 
>  .../gcc.target/aarch64/advsimd-intrinsics/vqmovn.c |  134 +++
>  .../aarch64/advsimd-intrinsics/vqmovun.c   |   93 ++
>  .../aarch64/advsimd-intrinsics/vqrdmulh.c  |  161 +++
>  .../aarch64/advsimd-intrinsics/vqrdmulh_lane.c |  169 +++
>  .../aarch64/advsimd-intrinsics/vqrdmulh_n.c|  155 +++
>  .../gcc.target/aarch64/advsimd-intrinsics/vqrshl.c | 1090 
> 
>  .../aarch64/advsimd-intrinsics/vqrshrn_n.c |  174 
>  .../aarch64/advsimd-intrinsics/vqrshrun_n.c|  189 
>  .../gcc.target/aarch64/advsimd-intrinsics/vqshl.c  |  829 +++
>  .../aarch64/advsimd-intrinsics/vqshl_n.c   |  234 +
>  .../aarch64/advsimd-intrinsics/vqshlu_n.c  |  263 +
>  .../aarch64/advsimd-intrinsics/vqshrn_n.c  |  177 
>  .../aarch64/advsimd-intrinsics/vqshrun_n.c |  133 +++
>  13 files changed, 3801 insertions(+)
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqmovn.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqmovun.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmulh.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmulh_lane.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmulh_n.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrshl.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrshrn_n.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrshrun_n.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqshl.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqshl_n.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqshlu_n.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqshrn_n.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqshrun_n.c
> 
> -- 
> 2.1.4
> 


Re: [Patch, Fortran, PR58586, v4] ICE with derived type with allocatable component passed by value

2015-05-19 Thread Andre Vehreschild
Hi,

attached is the most recent version of the patch for 58586. It adapts to
recent trunk and addresses the caveats so far, i.e. the testcases in the
comments now compile and run again w/o errors.

Bootstraps and regtests fine on x86_64-linux-gnu/f21.

Comments?

- Andre

On Fri, 8 May 2015 16:11:11 +0200
Andre Vehreschild  wrote:

> Hi,
> 
> so attached is a quick and dirty solution for the allocatable return value
> problem. I personally don't like it. It is making a special case from the
> assign a function result to a variable. May be you have a better idea how to
> do this in gfortran style.
> 
> - Andre
> 
> 
> On Fri, 8 May 2015 15:31:46 +0200
> Andre Vehreschild  wrote:
> 
> > Hi Mikael,
> > 
> > > > ?? I don't get you there? What do you mean? Do you think the
> > > > alloc_comp_class_3/4.* are not correctly testing the issue? Any idea of
> > > > how to test this better? I mean the pr is about this artificial
> > > > constructs. I merely struck it in search of a pr about allocatable
> > > > components. 
> > > 
> > > I was talking about the bug you found with t_init above.  :-)
> > > the compiler is not ready to accept that function in a testcase.
> > > The alloc_omp_class_3/4 are fine.
> > 
> > Oh, sorry, I misunderstood you there. Now let's see, where that one is
> > hiding.
> > 
> > - Andre
> 
> 


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
gcc/testsuite/ChangeLog:

2015-05-19  Andre Vehreschild  

* gfortran.dg/alloc_comp_class_3.f03: New test.
* gfortran.dg/alloc_comp_class_4.f03: New test.


gcc/fortran/ChangeLog:

2015-05-19  Andre Vehreschild  

PR fortran/58586
* resolve.c (resolve_symbol): Non-private functions in modules
with allocatable or pointer components are marked referenced
now. Furthermore is the default init especially for those
components now done in gfc_conf_procedure_call preventing
duplicate code.
* trans-decl.c (gfc_generate_function_code): Generate a fake
result decl for functions returning an object with allocatable
components and initialize them.
* trans-expr.c (gfc_conv_procedure_call): For value typed trees
use the tree without indirect ref. And for non-decl trees
add a temporary variable to prevent evaluating the tree
multiple times (prevent multiple function evaluations).
* trans.h: Made gfc_trans_structure_assign () protoype
available, which is now needed by trans-decl.c:gfc_generate_
function_code(), too.

diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index fc11d23..e1b5762 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -14094,10 +14094,15 @@ resolve_symbol (gfc_symbol *sym)
 
   if ((!a->save && !a->dummy && !a->pointer
 	   && !a->in_common && !a->use_assoc
-	   && (a->referenced || a->result)
-	   && !(a->function && sym != sym->result))
+	   && !a->result && !a->function)
 	  || (a->dummy && a->intent == INTENT_OUT && !a->pointer))
 	apply_default_init (sym);
+  else if (a->function && sym->result && a->access != ACCESS_PRIVATE
+	   && (sym->ts.u.derived->attr.alloc_comp
+		   || sym->ts.u.derived->attr.pointer_comp))
+	/* Mark the result symbol to be referenced, when it has allocatable
+	   components.  */
+	sym->result->attr.referenced = 1;
 }
 
   if (sym->ts.type == BT_CLASS && sym->ns == gfc_current_ns
diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index 4c18920..f9a91c6 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -5896,9 +5896,33 @@ gfc_generate_function_code (gfc_namespace * ns)
   tmp = gfc_trans_code (ns->code);
   gfc_add_expr_to_block (&body, tmp);
 
-  if (TREE_TYPE (DECL_RESULT (fndecl)) != void_type_node)
+  if (TREE_TYPE (DECL_RESULT (fndecl)) != void_type_node
+  || (sym->result && sym->result != sym
+	  && sym->result->ts.type == BT_DERIVED
+	  && sym->result->ts.u.derived->attr.alloc_comp))
 {
+  bool artificial_result_decl = false;
   tree result = get_proc_result (sym);
+  gfc_symbol *rsym = sym == sym->result ? sym : sym->result;
+
+  /* Make sure that a function returning an object with
+	 alloc/pointer_components always has a result, where at least
+	 the allocatable/pointer components are set to zero.  */
+  if (result == NULL_TREE && sym->attr.function
+	  && ((sym->result->ts.type == BT_DERIVED
+	   && (sym->attr.allocatable
+		   || sym->attr.pointer
+		   || sym->result->ts.u.derived->attr.alloc_comp
+		   || sym->result->ts.u.derived->attr.pointer_comp))
+	  || (sym->result->ts.type == BT_CLASS
+		  && (CLASS_DATA (sym)->attr.allocatable
+		  || CLASS_DATA (sym)->attr.class_pointer
+		  || CLASS_DATA (sym->result)->attr.alloc_comp
+		  || CLASS_DATA (sym->result)->attr.pointer_comp
+	{
+	  artificial_result_decl = true;
+	  result = gfc_get_fake_result_decl (sym, 0);
+	}
 
   if (result != NULL_TREE && sym->attr.function && !sym->attr.

Re: [Patch, fortran, pr65548, 2nd take, v5] [5/6 Regression] gfc_conv_procedure_call

2015-05-19 Thread Mikael Morin
Le 19/05/2015 10:50, Andre Vehreschild a écrit :
> Hi all,
> 
> find attached latest version to fix 65548.
> 
> Bootstraps and regtests ok on x86_64-linux-gnu/f21.
> 
OK. Thanks.

Mikael


[patch] Optimize std::list when using new ABI

2015-05-19 Thread Jonathan Wakely

This fixes some missed optimizations I should have made when adding
the new std::__cxx11::list, making use of the O(1) list::size() when
it saves work.

In the equality comparisons two lists can't be equal if their sizes
differ.

When resizing a list we don't need to walk the list to find whether
we're growing or shrinking, and when shrinking by less than 50% it is
faster to find the first element to erase by moving backwards from the
end rather than starting at the beginning.

Tested powerpc64-linux.

I plan to commit this to trunk and the gcc-5-branch.

commit 9d6539fec972296694c30b073dd068cfbcdae8a5
Author: Jonathan Wakely 
Date:   Tue May 19 13:28:16 2015 +0100

	* include/bits/stl_list.h (_M_resize_pos(size_type&)): Declare.
	(operator==(const list&, const list&)): If size() is O(1) compare
	sizes before comparing each element.
	* include/bits/list.tcc (list::_M_resize_pos(size_type&)): Define.
	(list::resize): Use _M_resize_pos.

diff --git a/libstdc++-v3/include/bits/list.tcc b/libstdc++-v3/include/bits/list.tcc
index a9c8a55..c5d2ab4 100644
--- a/libstdc++-v3/include/bits/list.tcc
+++ b/libstdc++-v3/include/bits/list.tcc
@@ -157,6 +157,52 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   return __ret;
 }
 
+  // Return a const_iterator indicating the position to start inserting or
+  // erasing elements (depending whether the list is growing or shrinking),
+  // and set __new_size to the number of new elements that must be appended.
+  // Equivalent to the following, but performed optimally:
+  // if (__new_size < size()) {
+  //   __new_size = 0;
+  //   return std::next(begin(), __new_size);
+  // } else {
+  //   __newsize -= size();
+  //   return end();
+  // }
+  template
+typename list<_Tp, _Alloc>::const_iterator
+list<_Tp, _Alloc>::
+_M_resize_pos(size_type& __new_size) const
+{
+  const_iterator __i;
+#if _GLIBCXX_USE_CXX11_ABI
+  const size_type __len = size();
+  if (__new_size < __len)
+	{
+	  if (__new_size <= __len / 2)
+	{
+	  __i = begin();
+	  std::advance(__i, __new_size);
+	}
+	  else
+	{
+	  __i = end();
+	  ptrdiff_t __num_erase = __len - __new_size;
+	  std::advance(__i, -__num_erase);
+	}
+	  __new_size = 0;
+	  return __i;
+	}
+  else
+	__i = end();
+#else
+  size_type __len = 0;
+  for (__i = begin(); __i != end() && __len < __new_size; ++__i, ++__len)
+;
+#endif
+  __new_size -= __len;
+  return __i;
+}
+
 #if __cplusplus >= 201103L
   template
 void
@@ -182,14 +228,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 list<_Tp, _Alloc>::
 resize(size_type __new_size)
 {
-  iterator __i = begin();
-  size_type __len = 0;
-  for (; __i != end() && __len < __new_size; ++__i, ++__len)
-;
-  if (__len == __new_size)
+  const_iterator __i = _M_resize_pos(__new_size);
+  if (__new_size)
+	_M_default_append(__new_size);
+  else
 erase(__i, end());
-  else  // __i == end()
-	_M_default_append(__new_size - __len);
 }
 
   template
@@ -197,14 +240,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 list<_Tp, _Alloc>::
 resize(size_type __new_size, const value_type& __x)
 {
-  iterator __i = begin();
-  size_type __len = 0;
-  for (; __i != end() && __len < __new_size; ++__i, ++__len)
-;
-  if (__len == __new_size)
+  const_iterator __i = _M_resize_pos(__new_size);
+  if (__new_size)
+insert(end(), __new_size, __x);
+  else
 erase(__i, end());
-  else  // __i == end()
-insert(end(), __new_size - __len, __x);
 }
 #else
   template
@@ -212,14 +252,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 list<_Tp, _Alloc>::
 resize(size_type __new_size, value_type __x)
 {
-  iterator __i = begin();
-  size_type __len = 0;
-  for (; __i != end() && __len < __new_size; ++__i, ++__len)
-;
-  if (__len == __new_size)
-erase(__i, end());
-  else  // __i == end()
-insert(end(), __new_size - __len, __x);
+  const_iterator __i = _M_resize_pos(__new_size);
+  if (__new_size)
+insert(end(), __new_size, __x);
+  else
+erase(__i._M_const_cast(), end());
 }
 #endif
 
diff --git a/libstdc++-v3/include/bits/stl_list.h b/libstdc++-v3/include/bits/stl_list.h
index 3401e5b..a26859e 100644
--- a/libstdc++-v3/include/bits/stl_list.h
+++ b/libstdc++-v3/include/bits/stl_list.h
@@ -1789,6 +1789,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 	_S_do_it(_M_get_Node_allocator(), __x._M_get_Node_allocator()))
 	  __builtin_abort();
   }
+
+  // Used to implement resize.
+  const_iterator
+  _M_resize_pos(size_type& __new_size) const;
 };
 _GLIBCXX_END_NAMESPACE_CXX11
 
@@ -1806,6 +1810,11 @@ _GLIBCXX_END_NAMESPACE_CXX11
 inline bool
 operator==(const list<_Tp, _Alloc>& __x, const list<_Tp, _Alloc>& __y)
 {

[PATCH] Fix duplicated warning with __attribute__((format)) (PR c/64223)

2015-05-19 Thread Marek Polacek
This PR points out that we output same -Wformat warning twice when using
__attribute__ ((format)).  The problem was that attribute_value_equal
(called when processing merge_attributes) got two lists:
"format printf, 1, 2" and "__format__ __printf__, 1, 2", these should be
equal.  But since attribute_value_equal uses simple_cst_list_equal when
it sees a TREE_LISTs, it doesn't consider "__printf__" and "printf" as
the same, so it said that the two lists aren't same.  That means that the
type then contains two same format attributes and we warn twice.
Fixed by handling the format attribute specially.  (The patch doesn't
consider the printf and the gnu_printf archetypes as the same, so we still
might get duplicate warnings when combining printf and gnu_printf.)

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-05-19  Marek Polacek  

PR c/64223
* tree.c (attribute_value_equal): Handle attribute format.
(cmp_attrib_identifiers): Factor out of lookup_ident_attribute.

* gcc.dg/pr64223-1.c: New test.
* gcc.dg/pr64223-2.c: New test.

diff --git gcc/testsuite/gcc.dg/pr64223-1.c gcc/testsuite/gcc.dg/pr64223-1.c
index e69de29..015bfd8 100644
--- gcc/testsuite/gcc.dg/pr64223-1.c
+++ gcc/testsuite/gcc.dg/pr64223-1.c
@@ -0,0 +1,12 @@
+/* PR c/64223: Test for duplicated warnings.  */
+/* { dg-do compile } */
+/* { dg-options "-Wformat" } */
+
+int printf (const char *, ...) __attribute__ ((__format__ (__printf__, 1, 2)));
+
+void
+foo (void)
+{
+  printf ("%d\n", 0UL); /* { dg-bogus "expects argument of type.*expects 
argument of type" } */
+ /* { dg-warning "expects argument of type" "" { target *-*-* } 10 } */
+}
diff --git gcc/testsuite/gcc.dg/pr64223-2.c gcc/testsuite/gcc.dg/pr64223-2.c
index e69de29..2a1627e 100644
--- gcc/testsuite/gcc.dg/pr64223-2.c
+++ gcc/testsuite/gcc.dg/pr64223-2.c
@@ -0,0 +1,13 @@
+/* PR c/64223: Test for duplicated warnings.  */
+/* { dg-do compile } */
+/* { dg-options "-Wformat" } */
+
+int myprintf (const char *, ...) __attribute__ ((__format__ (printf, 1, 2)));
+int myprintf (const char *, ...) __attribute__ ((__format__ (__printf__, 1, 
2)));
+
+void
+foo (void)
+{
+  myprintf ("%d\n", 0UL); /* { dg-bogus "expects argument of type.*expects 
argument of type" } */
+ /* { dg-warning "expects argument of type" "" { target *-*-* } 11 } */
+}
diff --git gcc/tree.c gcc/tree.c
index 6297f04..a58ad7b 100644
--- gcc/tree.c
+++ gcc/tree.c
@@ -4871,9 +4871,53 @@ simple_cst_list_equal (const_tree l1, const_tree l2)
   return l1 == l2;
 }
 
+/* Compare two identifier nodes representing attributes.  Either one may
+   be in prefixed __ATTR__ form.  Return true if they are the same, false
+   otherwise.  */
+
+static bool
+cmp_attrib_identifiers (const_tree attr1, const_tree attr2)
+{
+  /* Make sure we're dealing with IDENTIFIER_NODEs.  */
+  gcc_checking_assert (TREE_CODE (attr1) == IDENTIFIER_NODE
+  && TREE_CODE (attr2) == IDENTIFIER_NODE);
+
+  /* Identifiers can be compared directly for equality.  */
+  if (attr1 == attr2)
+return true;
+
+  /* If they are not equal, they may still be one in the form
+ 'text' while the other one is in the form '__text__'.  TODO:
+ If we were storing attributes in normalized 'text' form, then
+ this could all go away and we could take full advantage of
+ the fact that we're comparing identifiers. :-)  */
+  const size_t attr1_len = IDENTIFIER_LENGTH (attr1);
+  const size_t attr2_len = IDENTIFIER_LENGTH (attr2);
+
+  if (attr2_len == attr1_len + 4)
+{
+  const char *p = IDENTIFIER_POINTER (attr2);
+  const char *q = IDENTIFIER_POINTER (attr1);
+  if (p[0] == '_' && p[1] == '_'
+ && p[attr2_len - 2] == '_' && p[attr2_len - 1] == '_'
+ && strncmp (q, p + 2, attr1_len) == 0)
+   return true;;
+}
+  else if (attr2_len + 4 == attr1_len)
+{
+  const char *p = IDENTIFIER_POINTER (attr2);
+  const char *q = IDENTIFIER_POINTER (attr1);
+  if (q[0] == '_' && q[1] == '_'
+ && q[attr1_len - 2] == '_' && q[attr1_len - 1] == '_'
+ && strncmp (q + 2, p, attr2_len) == 0)
+   return true;
+}
+
+  return false;
+}
+
 /* Compare two attributes for their value identity.  Return true if the
-   attribute values are known to be equal; otherwise return false.
-*/
+   attribute values are known to be equal; otherwise return false.  */
 
 bool
 attribute_value_equal (const_tree attr1, const_tree attr2)
@@ -4883,10 +4927,25 @@ attribute_value_equal (const_tree attr1, const_tree 
attr2)
 
   if (TREE_VALUE (attr1) != NULL_TREE
   && TREE_CODE (TREE_VALUE (attr1)) == TREE_LIST
-  && TREE_VALUE (attr2) != NULL
+  && TREE_VALUE (attr2) != NULL_TREE
   && TREE_CODE (TREE_VALUE (attr2)) == TREE_LIST)
-return (simple_cst_list_equal (TREE_VALUE (attr1),
-  TREE_VALUE (attr2)) == 1);
+{
+  /* Handle attribute format.  */
+  if (is_attribute_p ("format", TREE_PURPOSE 

Re: [patch] libstdc++/66055 add missing constructors to unordered containers

2015-05-19 Thread Jonathan Wakely

On 17/05/15 22:21 +0200, François Dumont wrote:
Ok, I just commit fixing some other lines length except those having a 
long hyperlink, I didn't want to break those.


Yep, thanks. I think we should backport Nathan's patch and your one to
the gcc-5-branch too.

I'll make a note to do that before the 5.2 release.


Re: [PATCH 2/4] prologue and epilogue tidy and -mno-vrsave bug fix

2015-05-19 Thread David Edelsohn
On Sun, May 17, 2015 at 10:54 PM, Alan Modra  wrote:
> This patch tidies the prologue and epilogue altivec code a little.
> A number of places using info->altivec_size unnecessarily also test
> TARGET_ALTIVEC_ABI, when rs6000_stack_info() guarantees that
> info->altivec_size is zero if !TARGET_ALTIVEC_ABI.
>
> Similarly by inspection of rs6000_stack_info() code,
> TARGET_ALTIVEC_VRSAVE && info->vrsave_mask != 0, used when deciding to
> save or restore vrsave, can be replaced with info->vrsave_size.  I
> also removed the TARGET_ALTIVEC test used with save/restore of vrsave.
> I believe it is redundant because compute_vrsave_mask() will return 0
> when no altivec registers are used (and of course you can't use then
> without TARGET_ALTIVEC), except for Darwin where TARGET_ALTIVEC is
> forced.  The vrsave changes make the code actually doing the save or
> restore visually consistent with code that sets up a frame register
> for vrsave.
>
> Finally, I've changed two places that use info->vrsave_mask to test
> whether vrsave is saved or restored, to use info->vrsave_size.  This
> is a bug fix for -mno-vrsave.
>
> * config/rs6000/rs6000.c (struct rs6000_stack): Correct comments.
> (rs6000_stack_info): Don't zero offsets when not saving registers.
> (debug_stack_info): Adjust to omit printing unused offsets,
> as before.
> (direct_return): Test vrsave_size rather than vrsave_mask.
> (rs6000_emit_prologue): Likewise.  Remove redundant altivec tests.
> (rs6000_emit_epilogue): Likewise.

This patch is okay.

My only concern is Patch 1 causing a regression for the PR that I mentioned.

Thanks, David


Re: [PATCH 4/4] Split-stack arg pointer init refinement

2015-05-19 Thread David Edelsohn
This small refinement to the -fsplit-stack prologue arg pointer
initialization improves code generation.  Compare the -O2
gcc/testsuite/gcc.dg/split-3.c code for down() below.

beforeafter
mflr 0mflr 0
std 31,-8(1)std 31,-8(1)
std 0,16(1)mr 12,1
stdu 1,-10144(1)std 0,16(1)
addi 12,1,10144stdu 1,-10144(1)
bge 7,.L7bge 7,.L7
mr 12,29mr 12,29
.L7:.L7:

* config/rs6000/rs6000.c (rs6000_emit_allocate_stack): Return
stack adjusting insn.  Formatting.
(rs6000_emit_prologue): Track stack adjusting insn, and use of
r12.  If possible, emit first -fsplit-stack arg pointer insn
before stack adjust.  Don't use r12 to save cr if split-stack.

This patch is okay.  Nice improvement.

Thanks, David


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Michael Matz
Hi,

On Fri, 15 May 2015, Rich Felker wrote:

> Forget lazy binding. It's dead anyway because serious distros want
> PIE+relro+bindnow+...

You keep saying this, but I can't help the feeling it's mostly because 
musl doesn't support it ;-)

No, you don't have to use bindnow to get the effects of relro.  Sure 
there's more parts of the GOT protected with it, but if that's really that 
much more hardened is up for debate.

> If people really want lazy binding, they can use options which support 
> it, but I don't want to keep suffering the codegen cost of lazy binding 
> despite never using it.

> There should be an option to generate optimal code equivalent to what 
> you get with Alexander Monakov's patches for those of us who aren't 
> trying to support this legacy feature that precludes good performance 
> and precludes hardening.

H.J.'s branch is for _improving_ code on top of the no-plt code, it's not 
replacing it or an alternative for it.


Ciao,
Michael.


Re: [PATCH 02/13] optabs: Fix vec_perm -> V16QI middle end lowering.

2015-05-19 Thread Richard Henderson
On 05/19/2015 01:41 AM, Andreas Krebbel wrote:
> On 05/18/2015 07:35 PM, Richard Henderson wrote:
>> On 05/11/2015 06:23 AM, Andreas Krebbel wrote:
>>> @@ -6784,14 +6784,18 @@ expand_vec_perm (machine_mode mode, rtx v0, rtx v1, 
>>> rtx sel, rtx target)
>>>  {
>>>/* Multiply each element by its byte size.  */
>>>machine_mode selmode = GET_MODE (sel);
>>> +  /* We cannot re-use SEL as a temp operand since it might by in
>>> +read-only storage.  */
>>> +  rtx sel_reg = gen_reg_rtx (selmode);
>>> +
>>>if (u == 2)
>>> -   sel = expand_simple_binop (selmode, PLUS, sel, sel,
>>> -  sel, 0, OPTAB_DIRECT);
>>> +   sel_reg = expand_simple_binop (selmode, PLUS, sel, sel,
>>> +  sel_reg, 0, OPTAB_DIRECT);
>>>else
>>
>> You needn't allocate sel_reg explicitly; expand_simple_binop will do that for
>> you if the TARGET parameter is NULL.
>>
>> Thus this patch should be an 8 character change on those two calls.
> 
> Right. Thanks!
> 
> Ok to apply with that change?

Yes, thanks.


r~



Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Jeff Law

On 05/19/2015 08:43 AM, Michael Matz wrote:

Hi,

On Fri, 15 May 2015, Rich Felker wrote:


Forget lazy binding. It's dead anyway because serious distros want
PIE+relro+bindnow+...


You keep saying this, but I can't help the feeling it's mostly because
musl doesn't support it ;-)
FWIW, Red Hat is pushing PIE & partial RELRO deeper and deeper into the 
distribution.  It's not clear yet how far bindnow will go though.


jeff



[PATCH, alpha]: Some cleanups

2015-05-19 Thread Uros Bizjak
No functional changes.

2015-05-18  Uros Bizjak  

* config/alpha/alpha.c (alpha_legitimize_reload_address)
(alpha_preferred_reload_class, alpha_legitimate_constant_p): Use
CONST_INT_P, CONST_SCALAR_INT_P and CONST_DOUBLE_P predicates.
(alpha_split_reload_pair) :
Use CASE_CONST_SCALAR_INT.
(print_operand) : Use mode_width_operand to check the
value of the constant.
* config/alpha/alpha.md (movti): Use CONST_SCALAR_INT_P predicate.
* config/alpha/predicates.md (input_operand): Use general_operand
instead of match_code as operand check.
(symbolic_operand): Use match_code with subexpression digits.
* config/alpha/constraints.md (Q): Ditto.

Tested on alpha-linux-gnu and committed to mainline SVN.

Uros.


Re: [PATCH, alpha]: Some cleanups

2015-05-19 Thread Uros Bizjak
On Tue, May 19, 2015 at 5:10 PM, Uros Bizjak  wrote:
> No functional changes.
>
> 2015-05-18  Uros Bizjak  
>
> * config/alpha/alpha.c (alpha_legitimize_reload_address)
> (alpha_preferred_reload_class, alpha_legitimate_constant_p): Use
> CONST_INT_P, CONST_SCALAR_INT_P and CONST_DOUBLE_P predicates.
> (alpha_split_reload_pair) :
> Use CASE_CONST_SCALAR_INT.
> (print_operand) : Use mode_width_operand to check the
> value of the constant.
> * config/alpha/alpha.md (movti): Use CONST_SCALAR_INT_P predicate.
> * config/alpha/predicates.md (input_operand): Use general_operand
> instead of match_code as operand check.
> (symbolic_operand): Use match_code with subexpression digits.
> * config/alpha/constraints.md (Q): Ditto.
>
> Tested on alpha-linux-gnu and committed to mainline SVN.

... now with a patch.

Uros.
Index: config/alpha/alpha.c
===
--- config/alpha/alpha.c(revision 223268)
+++ config/alpha/alpha.c(working copy)
@@ -1352,7 +1352,7 @@ alpha_legitimize_reload_address (rtx x,
   && REG_P (XEXP (x, 0))
   && REGNO (XEXP (x, 0)) < FIRST_PSEUDO_REGISTER
   && REGNO_OK_FOR_BASE_P (REGNO (XEXP (x, 0)))
-  && GET_CODE (XEXP (x, 1)) == CONST_INT)
+  && CONST_INT_P (XEXP (x, 1)))
 {
   HOST_WIDE_INT val = INTVAL (XEXP (x, 1));
   HOST_WIDE_INT low = ((val & 0x) ^ 0x8000) - 0x8000;
@@ -1644,9 +1644,8 @@ alpha_preferred_reload_class(rtx x, enum reg_class
 return rclass;
 
   /* These sorts of constants we can easily drop to memory.  */
-  if (CONST_INT_P (x)
-  || GET_CODE (x) == CONST_WIDE_INT
-  || GET_CODE (x) == CONST_DOUBLE
+  if (CONST_SCALAR_INT_P (x)
+  || CONST_DOUBLE_P (x)
   || GET_CODE (x) == CONST_VECTOR)
 {
   if (rclass == FLOAT_REGS)
@@ -2133,7 +2132,7 @@ alpha_legitimate_constant_p (machine_mode mode, rt
 
 case CONST:
   if (GET_CODE (XEXP (x, 0)) == PLUS
- && GET_CODE (XEXP (XEXP (x, 0), 1)) == CONST_INT)
+ && CONST_INT_P (XEXP (XEXP (x, 0), 1)))
x = XEXP (XEXP (x, 0), 0);
   else
return true;
@@ -3283,8 +3282,7 @@ alpha_split_tmode_pair (rtx operands[4], machine_m
   operands[2] = adjust_address (operands[1], DImode, 0);
   break;
 
-case CONST_INT:
-case CONST_WIDE_INT:
+CASE_CONST_SCALAR_INT:
 case CONST_DOUBLE:
   gcc_assert (operands[1] == CONST0_RTX (mode));
   operands[2] = operands[3] = const0_rtx;
@@ -5257,9 +5255,7 @@ print_operand (FILE *file, rtx x, int code)
 
 case 'M':
   /* 'b', 'w', 'l', or 'q' as the value of the constant.  */
-  if (!CONST_INT_P (x)
- || (INTVAL (x) != 8 && INTVAL (x) != 16
- && INTVAL (x) != 32 && INTVAL (x) != 64))
+  if (!mode_width_operand (x, VOIDmode))
output_operand_lossage ("invalid %%M value");
 
   fprintf (file, "%s",
Index: config/alpha/alpha.md
===
--- config/alpha/alpha.md   (revision 223268)
+++ config/alpha/alpha.md   (working copy)
@@ -4153,8 +4153,7 @@
   /* We must put 64-bit constants in memory.  We could keep the
  32-bit constants in TImode and rely on the splitter, but
  this doesn't seem to be worth the pain.  */
-  else if (CONST_INT_P (operands[1])
-  || GET_CODE (operands[1]) == CONST_WIDE_INT)
+  else if (CONST_SCALAR_INT_P (operands[1]))
 {
   rtx in[2], out[2], target;
 
Index: config/alpha/constraints.md
===
--- config/alpha/constraints.md (revision 223298)
+++ config/alpha/constraints.md (working copy)
@@ -97,7 +97,7 @@
 (define_memory_constraint "Q"
   "@internal A normal_memory_operand"
   (and (match_code "mem")
-   (not (match_test "GET_CODE (XEXP (op, 0)) == AND"
+   (not (match_code "and" "0"
 
 (define_constraint "R"
   "@internal A direct_call_operand"
Index: config/alpha/predicates.md
===
--- config/alpha/predicates.md  (revision 223298)
+++ config/alpha/predicates.md  (working copy)
@@ -72,7 +72,7 @@
 ;; Return 1 if the operand is a non-symbolic, nonzero constant operand.
 (define_predicate "non_zero_const_operand"
   (and (match_code "const_int,const_wide_int,const_double,const_vector")
-   (match_test "op != CONST0_RTX (mode)")))
+   (not (match_test "op == CONST0_RTX (mode)"
 
 ;; Return 1 if OP is the constant 4 or 8.
 (define_predicate "const48_operand"
@@ -150,8 +150,7 @@
 
 ;; Return 1 if OP is a valid operand for the source of a move insn.
 (define_predicate "input_operand"
-  (match_code "label_ref,symbol_ref,const,high,reg,subreg,mem,
-  const_double,const_vector,const_int,const_wide_int")
+  (match_operand 0 "general_operand")
 {
   switch (GET_CODE (op))
 {
@@ -273,8 +272,8 @@
 (define_predicate "call_operand"
   (ior (match_code "sy

breakage with series "[0/9] Record number of hard registers in a REG"

2015-05-19 Thread Hans-Peter Nilsson
> From: Richard Sandiford 
> Date: Mon, 18 May 2015 20:09:19 +0200

> While looking at a profile of gcc, I noticed one thing fairly high
> up the list was a loop iterating over all the registers in a REG,
> apparently due to the delay in computing the index for hard_regno_nregs
> and then loading the value (which would often be an L1 cache miss).

> Each patch in the series was individually bootstrapped & regression-tested
> on x86_64-linux-gnu.
> 
> Thanks,
> Richard
> 

Please also make use of config-list.mk or a subset affecting
targets.  Build succeded for cris-elf last at r223334.  Build
failed at r223355, r223364, r223366, it seems from a commit in
this patch series:

...
g++ -c   -g -O2 -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE  -fno-exceptions 
-fno-rtti -fasynchronous-unwind-tables -W -Wall -Wwrite-strings -Wcast-qual 
-Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long 
-Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. 
-I. -I/tmp/hpautotest-gcc1/gcc/gcc -I/tmp/hpautotest-gcc1/gcc/gcc/. 
-I/tmp/hpautotest-gcc1/gcc/gcc/../include 
-I/tmp/hpautotest-gcc1/gcc/gcc/../libcpp/include 
-I/tmp/hpautotest-gcc1/cris-elf/gccobj/./gmp -I/tmp/hpautotest-gcc1/gcc/gmp 
-I/tmp/hpautotest-gcc1/cris-elf/gccobj/./mpfr -I/tmp/hpautotest-gcc1/gcc/mpfr 
-I/tmp/hpautotest-gcc1/gcc/mpc/src  
-I/tmp/hpautotest-gcc1/gcc/gcc/../libdecnumber 
-I/tmp/hpautotest-gcc1/gcc/gcc/../libdecnumber/dpd -I../libdecnumber 
-I/tmp/hpautotest-gcc1/gcc/gcc/../libbacktrace   -o cris.o -MT cris.o -MMD -MP 
-MF ./.deps/cris.TPo /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c
In file included from /tmp/hpautotest-gcc1/gcc/gcc/rtl.h:25,
 from /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:25:
/tmp/hpautotest-gcc1/gcc/gcc/input.h:37: warning: comparison between signed and 
unsigned integer expressions
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c: In function 'void 
cris_expand_prologue()':
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3141: error: 'gen_rtx_raw_REG' 
was not declared in this scope
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3165: error: 'gen_rtx_raw_REG' 
was not declared in this scope
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3263: error: 'gen_rtx_raw_REG' 
was not declared in this scope
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c: In function 'void 
cris_expand_epilogue()':
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3429: error: 'gen_rtx_raw_REG' 
was not declared in this scope
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3515: error: 'gen_rtx_raw_REG' 
was not declared in this scope
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3548: error: 'gen_rtx_raw_REG' 
was not declared in this scope
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3573: error: 'gen_rtx_raw_REG' 
was not declared in this scope
make[2]: *** [cris.o] Error 1

brgds, H-P


Re: PING^3: [PATCH]: New configure options that make the compiler use -fPIE and -pie as default option

2015-05-19 Thread Joseph Myers
On Mon, 18 May 2015, H.J. Lu wrote:

> > Have updates for all affected specs for all targets been posted?  I just
> > saw a small and apparently arbitrary subset of targets with patches, and
> > no explanation of how those targets were identified or why the other
> > targets with specs mentioning the options in question did not need
> > updates.
> >
> 
> I only posted patches for an  arbitrary subset of targets because
> 
> 1. Not everyone is interested in --enable-default-pie.
> 2. I can't tests all targets myself.
> 
> If patches for all targets is the only blocker before the patch
> will be approved or target maintainers will help me test the patch,
> I will post patches for each target affected.

I think the whole thing should be posted as one patch, with both the 
target-independent changes and the target-specific changes for all 
targets.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: PING^3: [PATCH]: New configure options that make the compiler use -fPIE and -pie as default option

2015-05-19 Thread H.J. Lu
On Tue, May 19, 2015 at 8:21 AM, Joseph Myers  wrote:
> On Mon, 18 May 2015, H.J. Lu wrote:
>
>> > Have updates for all affected specs for all targets been posted?  I just
>> > saw a small and apparently arbitrary subset of targets with patches, and
>> > no explanation of how those targets were identified or why the other
>> > targets with specs mentioning the options in question did not need
>> > updates.
>> >
>>
>> I only posted patches for an  arbitrary subset of targets because
>>
>> 1. Not everyone is interested in --enable-default-pie.
>> 2. I can't tests all targets myself.
>>
>> If patches for all targets is the only blocker before the patch
>> will be approved or target maintainers will help me test the patch,
>> I will post patches for each target affected.
>
> I think the whole thing should be posted as one patch, with both the
> target-independent changes and the target-specific changes for all
> targets.
>

That is what makes me concerned.  I have some simple target-specified
patches which weren't reviewed for years. What will happen if no one
reviews some simple target-specified changes due to

1. Reviewers don't have access to those targets.
2. Target maintainers aren't review them.
3. There are no clear maintainers for those targets.

As the result, my patch may go nowhere.

-- 
H.J.


Re: PING^3: [PATCH]: New configure options that make the compiler use -fPIE and -pie as default option

2015-05-19 Thread Joseph Myers
On Tue, 19 May 2015, H.J. Lu wrote:

> > I think the whole thing should be posted as one patch, with both the
> > target-independent changes and the target-specific changes for all
> > targets.
> >
> 
> That is what makes me concerned.  I have some simple target-specified
> patches which weren't reviewed for years. What will happen if no one

For any unreviewed patch, keep pinging weekly.

> reviews some simple target-specified changes due to
> 
> 1. Reviewers don't have access to those targets.
> 2. Target maintainers aren't review them.
> 3. There are no clear maintainers for those targets.

I've already said in 
 that, given 
target maintainers CC:ed, I might be inclined to approve the patch on the 
basis of allowing them a week to test their target changes.

-- 
Joseph S. Myers
jos...@codesourcery.com


RE: PING^3: [PATCH]: New configure options that make the compiler use -fPIE and -pie as default option

2015-05-19 Thread Paul_Koning


From: gcc-patches-ow...@gcc.gnu.org [gcc-patches-ow...@gcc.gnu.org] on behalf 
of H.J. Lu [hjl.to...@gmail.com]
Sent: Tuesday, May 19, 2015 11:27 AM
To: Joseph Myers
Cc: Magnus Granberg; GCC Patches
Subject: Re: PING^3: [PATCH]: New configure options that make the compiler use 
-fPIE and -pie as default option

On Tue, May 19, 2015 at 8:21 AM, Joseph Myers  wrote:
> ...
> I think the whole thing should be posted as one patch, with both the
> target-independent changes and the target-specific changes for all
> targets.
>

That is what makes me concerned.  I have some simple target-specified
patches which weren't reviewed for years. What will happen if no one
reviews some simple target-specified changes due to

1. Reviewers don't have access to those targets.
2. Target maintainers aren't review them.
3. There are no clear maintainers for those targets.

As the result, my patch may go nowhere.
---

But that hasn't stopped others from posting patches like that, or getting them 
approved.  And we also have global maintainers who can approve things.  It 
feels a bit like a hypothetical issue is being used as a reason to do part of 
the job.


paul


Re: PING^3: [PATCH]: New configure options that make the compiler use -fPIE and -pie as default option

2015-05-19 Thread H.J. Lu
On Tue, May 19, 2015 at 8:33 AM,   wrote:
>
> 
> From: gcc-patches-ow...@gcc.gnu.org [gcc-patches-ow...@gcc.gnu.org] on behalf 
> of H.J. Lu [hjl.to...@gmail.com]
> Sent: Tuesday, May 19, 2015 11:27 AM
> To: Joseph Myers
> Cc: Magnus Granberg; GCC Patches
> Subject: Re: PING^3: [PATCH]: New configure options that make the compiler 
> use -fPIE and -pie as default option
>
> On Tue, May 19, 2015 at 8:21 AM, Joseph Myers  wrote:
>> ...
>> I think the whole thing should be posted as one patch, with both the
>> target-independent changes and the target-specific changes for all
>> targets.
>>
>
> That is what makes me concerned.  I have some simple target-specified
> patches which weren't reviewed for years. What will happen if no one
> reviews some simple target-specified changes due to
>
> 1. Reviewers don't have access to those targets.
> 2. Target maintainers aren't review them.
> 3. There are no clear maintainers for those targets.
>
> As the result, my patch may go nowhere.
> ---
>
> But that hasn't stopped others from posting patches like that, or getting 
> them approved.  And we also have global maintainers who can approve things.  
> It feels a bit like a hypothetical issue is being used as a reason to do part 
> of the job.

It is not hypothetical.  See:

https://gcc.gnu.org/ml/gcc-patches/2012-09/msg01558.html

It happened before.  I don't want to make it harder for global maintainers
to review a patch which has zero-impact on a target if --enable-default-pie
isn't used.

-- 
H.J.


Re: [Patch] [AArch64] PR target 66049: fix add/extend gcc test suite failures

2015-05-19 Thread Kyrill Tkachov

Hi Venkat,

On 19/05/15 16:37, Kumar, Venkataramanan wrote:

Hi Maintainers,

Please find the attached patch, that fixes add/extend gcc test suite failures 
in Aarch64 target.
Ref: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66049

These tests started to fail after we prevented combiner from converting shift 
RTX to mult RTX, when the RTX is not inside a memory operation (r222874) .
Now I have added new add/extend patterns which are based on shift operations,  
to fix these cases.

Testing status with the patch.

(1) GCC bootstrap on AArch64 successful.
(2)  SPEC2006 INT runs did not show any degradation.


Does that mean there was no performance regression? Or no codegen difference?
What I'd expect from this patch is that the codegen would be the same as before 
the combine patch
(r222874). A performance difference can sometimes be hard to measure even at 
worse code quality.
Can you please confirm that on SPEC2006 INT the adds and shifts are now back to 
being combined
into a single instruction?

Thanks,
Kyrill


(3) gcc regression testing passed.

(-Snip-)
# Comparing 3 common sum files
## /bin/sh ./gcc-fsf-trunk/contrib/compare_tests  /tmp/gxx-sum1.24998 
/tmp/gxx-sum2.24998
Tests that now work, but didn't before:

gcc.target/aarch64/adds1.c scan-assembler adds\tw[0-9]+, w[0-9]+, w[0-9]+, lsl 3
gcc.target/aarch64/adds1.c scan-assembler adds\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 3
gcc.target/aarch64/adds3.c scan-assembler-times adds\tx[0-9]+, x[0-9]+, 
x[0-9]+, sxtw 2
gcc.target/aarch64/extend.c scan-assembler add\tw[0-9]+,.*uxth #?1
gcc.target/aarch64/extend.c scan-assembler add\tx[0-9]+,.*uxtw #?3
gcc.target/aarch64/extend.c scan-assembler sub\tw[0-9]+,.*uxth #?1
gcc.target/aarch64/extend.c scan-assembler sub\tx[0-9]+,.*uxth #?1
gcc.target/aarch64/extend.c scan-assembler sub\tx[0-9]+,.*uxtw #?3
gcc.target/aarch64/subs1.c scan-assembler subs\tw[0-9]+, w[0-9]+, w[0-9]+, lsl 3
gcc.target/aarch64/subs1.c scan-assembler subs\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 3
gcc.target/aarch64/subs3.c scan-assembler-times subs\tx[0-9]+, x[0-9]+, 
x[0-9]+, sxtw 2

# No differences found in 3 common sum files
(-Snip-)

The patterns are fixing the regressing tests, so I have not added any new tests.
Regarding  removal of the old patterns based on "mults",  I am planning to do 
it as a separate work.

Is this OK for trunk ?

gcc/ChangeLog

2015-05-19  Venkataramanan Kumar  

 * config/aarch64/aarch64.md
 (*adds_shift_imm_):  New pattern.
 (*subs_shift_imm_):  Likewise.
 (*adds__shift_):  Likewise.
 (*subs__shift_): Likewise.
 (*add_uxt_shift2): Likewise.
 (*add_uxtsi_shift2_uxtw): Likewise.
(*sub_uxt_shift2): Likewise.
(*sub_uxtsi_shift2_uxtw): Likewise.


Regards,
Venkat.
   



  




[Patch] [AArch64] PR target 66049: fix add/extend gcc test suite failures

2015-05-19 Thread Kumar, Venkataramanan
Hi Maintainers, 

Please find the attached patch, that fixes add/extend gcc test suite failures 
in Aarch64 target.  
Ref: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66049

These tests started to fail after we prevented combiner from converting shift 
RTX to mult RTX, when the RTX is not inside a memory operation (r222874) .
Now I have added new add/extend patterns which are based on shift operations,  
to fix these cases.

Testing status with the patch.

(1) GCC bootstrap on AArch64 successful. 
(2)  SPEC2006 INT runs did not show any degradation.
(3) gcc regression testing passed.

(-Snip-)
# Comparing 3 common sum files
## /bin/sh ./gcc-fsf-trunk/contrib/compare_tests  /tmp/gxx-sum1.24998 
/tmp/gxx-sum2.24998
Tests that now work, but didn't before:

gcc.target/aarch64/adds1.c scan-assembler adds\tw[0-9]+, w[0-9]+, w[0-9]+, lsl 3
gcc.target/aarch64/adds1.c scan-assembler adds\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 3
gcc.target/aarch64/adds3.c scan-assembler-times adds\tx[0-9]+, x[0-9]+, 
x[0-9]+, sxtw 2
gcc.target/aarch64/extend.c scan-assembler add\tw[0-9]+,.*uxth #?1
gcc.target/aarch64/extend.c scan-assembler add\tx[0-9]+,.*uxtw #?3
gcc.target/aarch64/extend.c scan-assembler sub\tw[0-9]+,.*uxth #?1
gcc.target/aarch64/extend.c scan-assembler sub\tx[0-9]+,.*uxth #?1
gcc.target/aarch64/extend.c scan-assembler sub\tx[0-9]+,.*uxtw #?3
gcc.target/aarch64/subs1.c scan-assembler subs\tw[0-9]+, w[0-9]+, w[0-9]+, lsl 3
gcc.target/aarch64/subs1.c scan-assembler subs\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 3
gcc.target/aarch64/subs3.c scan-assembler-times subs\tx[0-9]+, x[0-9]+, 
x[0-9]+, sxtw 2

# No differences found in 3 common sum files  
(-Snip-)

The patterns are fixing the regressing tests, so I have not added any new 
tests. 
Regarding  removal of the old patterns based on "mults",  I am planning to do 
it as a separate work.

Is this OK for trunk ? 

gcc/ChangeLog 

2015-05-19  Venkataramanan Kumar  

* config/aarch64/aarch64.md
(*adds_shift_imm_):  New pattern.
(*subs_shift_imm_):  Likewise.
(*adds__shift_):  Likewise.
(*subs__shift_): Likewise.
(*add_uxt_shift2): Likewise.
(*add_uxtsi_shift2_uxtw): Likewise.
   (*sub_uxt_shift2): Likewise.
   (*sub_uxtsi_shift2_uxtw): Likewise.


Regards,
Venkat.
  


 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 1dbadc0..d0d6a6a 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1539,6 +1539,38 @@
   [(set_attr "type" "alus_sreg,alus_imm,alus_imm")]
 )
 
+(define_insn "*adds_shift_imm_"
+  [(set (reg:CC_NZ CC_REGNUM)
+   (compare:CC_NZ
+(plus:GPI (ASHIFT:GPI 
+   (match_operand:GPI 1 "register_operand" "r")
+   (match_operand:QI 2 "aarch64_shift_imm_" "n"))
+  (match_operand:GPI 3 "register_operand" "r"))
+(const_int 0)))
+   (set (match_operand:GPI 0 "register_operand" "=r")
+   (plus:GPI (ASHIFT:GPI (match_dup 1) (match_dup 2))
+ (match_dup 3)))]
+  ""
+  "adds\\t%0, %3, %1,  %2"
+  [(set_attr "type" "alus_shift_imm")]
+)
+
+(define_insn "*subs_shift_imm_"
+  [(set (reg:CC_NZ CC_REGNUM)
+   (compare:CC_NZ
+(minus:GPI (match_operand:GPI 1 "register_operand" "r")
+   (ASHIFT:GPI
+(match_operand:GPI 2 "register_operand" "r")
+(match_operand:QI 3 "aarch64_shift_imm_" "n")))
+(const_int 0)))
+   (set (match_operand:GPI 0 "register_operand" "=r")
+   (minus:GPI (match_dup 1)
+  (ASHIFT:GPI (match_dup 2) (match_dup 3]
+  ""
+  "subs\\t%0, %1, %2,  %3"
+  [(set_attr "type" "alus_shift_imm")]
+)
+
 (define_insn "*adds_mul_imm_"
   [(set (reg:CC_NZ CC_REGNUM)
(compare:CC_NZ
@@ -1599,6 +1631,42 @@
   [(set_attr "type" "alus_ext")]
 )
 
+(define_insn "*adds__shift_"
+  [(set (reg:CC_NZ CC_REGNUM)
+   (compare:CC_NZ
+(plus:GPI (ashift:GPI 
+   (ANY_EXTEND:GPI 
+(match_operand:ALLX 1 "register_operand" "r"))
+   (match_operand 2 "aarch64_imm3" "Ui3"))
+  (match_operand:GPI 3 "register_operand" "r"))
+(const_int 0)))
+   (set (match_operand:GPI 0 "register_operand" "=rk")
+   (plus:GPI (ashift:GPI (ANY_EXTEND:GPI (match_dup 1))
+ (match_dup 2))
+ (match_dup 3)))]
+  ""
+  "adds\\t%0, %3, %1, xt %2"
+  [(set_attr "type" "alus_ext")]
+)
+
+(define_insn "*subs__shift_"
+  [(set (reg:CC_NZ CC_REGNUM)
+   (compare:CC_NZ
+(minus:GPI (match_operand:GPI 1 "register_operand" "r")
+   (ashift:GPI 
+(ANY_EXTEND:GPI
+ (match_operand:ALLX 2 "register_operand" "r"))
+(match_operand 3 "aarch64_imm3" "Ui3")))
+(const_int 0)))
+   (set (match_operand:GPI 0 "register_operand" "=rk")
+   (minus:GPI (match_dup 1)
+  (ashift:GPI 

Re: breakage with series "[0/9] Record number of hard registers in a REG"

2015-05-19 Thread Richard Sandiford
Hans-Peter Nilsson  writes:
> g++ -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions
> -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wwrite-strings
> -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic
> -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common
> -DHAVE_CONFIG_H -I. -I. -I/tmp/hpautotest-gcc1/gcc/gcc
> -I/tmp/hpautotest-gcc1/gcc/gcc/. -I/tmp/hpautotest-gcc1/gcc/gcc/../include
> -I/tmp/hpautotest-gcc1/gcc/gcc/../libcpp/include
> -I/tmp/hpautotest-gcc1/cris-elf/gccobj/./gmp
> -I/tmp/hpautotest-gcc1/gcc/gmp
> -I/tmp/hpautotest-gcc1/cris-elf/gccobj/./mpfr
> -I/tmp/hpautotest-gcc1/gcc/mpfr -I/tmp/hpautotest-gcc1/gcc/mpc/src
> -I/tmp/hpautotest-gcc1/gcc/gcc/../libdecnumber
> -I/tmp/hpautotest-gcc1/gcc/gcc/../libdecnumber/dpd -I../libdecnumber
> -I/tmp/hpautotest-gcc1/gcc/gcc/../libbacktrace -o cris.o -MT cris.o -MMD
> -MP -MF ./.deps/cris.TPo /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c
> In file included from /tmp/hpautotest-gcc1/gcc/gcc/rtl.h:25,
>  from /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:25:
> /tmp/hpautotest-gcc1/gcc/gcc/input.h:37: warning: comparison between
> signed and unsigned integer expressions
> /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c: In function 'void
> cris_expand_prologue()':
> /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3141: error:
> gen_rtx_raw_REG' was not declared in this scope
> /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3165: error:
> gen_rtx_raw_REG' was not declared in this scope
> /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3263: error:
> gen_rtx_raw_REG' was not declared in this scope
> /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c: In function 'void
> cris_expand_epilogue()':
> /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3429: error:
> gen_rtx_raw_REG' was not declared in this scope
> /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3515: error:
> gen_rtx_raw_REG' was not declared in this scope
> /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3548: error:
> gen_rtx_raw_REG' was not declared in this scope
> /tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:3573: error:
> gen_rtx_raw_REG' was not declared in this scope
> make[2]: *** [cris.o] Error 1

Installed as obvious after testing that cris-elf, microblaze-elf
and sparc-linux-gnu now build.  Obviously I mustn't have used
the usual recursive grep.

Thanks,
Richard


gcc/
* config/cris/cris.c (cris_expand_prologue): Use gen_raw_REG
instead of gen_rtx_raw_REG.
(cris_expand_epilogue): Likewise.
* config/microblaze/microblaze.c (microblaze_classify_address):
Likewise.
* config/sparc/sparc.md: Likewise.

Index: gcc/config/cris/cris.c
===
--- gcc/config/cris/cris.c  2015-05-19 16:40:34.734003511 +0100
+++ gcc/config/cris/cris.c  2015-05-19 16:40:34.890001679 +0100
@@ -3138,7 +3138,7 @@ cris_expand_prologue (void)
 
  mem = gen_rtx_MEM (SImode, stack_pointer_rtx);
  set_mem_alias_set (mem, get_varargs_alias_set ());
- insn = emit_move_insn (mem, gen_rtx_raw_REG (SImode, regno));
+ insn = emit_move_insn (mem, gen_raw_REG (SImode, regno));
 
  /* Note the absence of RTX_FRAME_RELATED_P on the above insn:
 the value isn't restored, so we don't want to tell dwarf2
@@ -3162,7 +3162,7 @@ cris_expand_prologue (void)
 
   mem = gen_rtx_MEM (SImode, stack_pointer_rtx);
   set_mem_alias_set (mem, get_frame_alias_set ());
-  insn = emit_move_insn (mem, gen_rtx_raw_REG (SImode, CRIS_SRP_REGNUM));
+  insn = emit_move_insn (mem, gen_raw_REG (SImode, CRIS_SRP_REGNUM));
   RTX_FRAME_RELATED_P (insn) = 1;
   framesize += 4;
 }
@@ -3260,7 +3260,7 @@ cris_expand_prologue (void)
 
  mem = gen_rtx_MEM (SImode, stack_pointer_rtx);
  set_mem_alias_set (mem, get_frame_alias_set ());
- insn = emit_move_insn (mem, gen_rtx_raw_REG (SImode, regno));
+ insn = emit_move_insn (mem, gen_raw_REG (SImode, regno));
  RTX_FRAME_RELATED_P (insn) = 1;
 
  framesize += 4 + size;
@@ -3426,7 +3426,7 @@ cris_expand_epilogue (void)
mem = gen_rtx_MEM (SImode, gen_rtx_POST_INC (SImode,
 stack_pointer_rtx));
set_mem_alias_set (mem, get_frame_alias_set ());
-   insn = emit_move_insn (gen_rtx_raw_REG (SImode, regno), mem);
+   insn = emit_move_insn (gen_raw_REG (SImode, regno), mem);
 
/* Whenever we emit insns with post-incremented addresses
   ourselves, we must add a post-inc note manually.  */
@@ -3512,7 +3512,7 @@ cris_expand_epilogue (void)
{
  rtx mem;
  rtx insn;
- rtx srpreg = gen_rtx_raw_REG (SImode, CRIS_SRP_REGNUM);
+ rtx srpreg = gen_raw_REG (SImode, CRIS_SRP_REGNUM);
  mem = gen_rtx_MEM (SImode,
 gen_rtx_POST_INC (SImode,

Re: [Patch AArch64] Add cpu_defines.h for AArch64.

2015-05-19 Thread pinskia




> On May 19, 2015, at 5:54 AM, Ramana Radhakrishnan 
>  wrote:
> 
> Hi,
> 
> Like the ARM port, the AArch64 ports needs to set glibc_integral_traps to 
> false as integer divide instructions do not trap.
> 
> Bootstrapped and regression tested on aarch64-none-linux-gnu
> 
> Ok to apply ?

Not really questioning your patch but questioning libstdc++'s defaults. I 
wonder if this should be the default as most targets don't trap, only a few 
that does. And it should be safer default to say they don't trap too?

Thanks,
Andrew


> 
> regards
> Ramana
> 
> 
> 2015-05-17  Ramana Radhakrishnan  
> 
>* configure.host: Define cpu_defines_dir for AArch64
>* config/cpu/aarch64/cpu_defines.h: New file.
> <0002-Do-the-same-for-AArch64.patch>


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Michael Matz
Hi,

On Tue, 19 May 2015, Jeff Law wrote:

> > > Forget lazy binding. It's dead anyway because serious distros want 
> > > PIE+relro+bindnow+...
> > 
> > You keep saying this, but I can't help the feeling it's mostly because 
> > musl doesn't support it ;-)
> 
> FWIW, Red Hat is pushing PIE & partial RELRO deeper and deeper into the 
> distribution.

Yeah, us as well, though I don't necessarily see the point for most 
packages; feels a bit like a checkmark item :)


Ciao,
Michael.


Re: [Patch AArch64] Add cpu_defines.h for AArch64.

2015-05-19 Thread Ramana Radhakrishnan
On Tue, May 19, 2015 at 4:54 PM,   wrote:
>
>
>
>
>> On May 19, 2015, at 5:54 AM, Ramana Radhakrishnan 
>>  wrote:
>>
>> Hi,
>>
>> Like the ARM port, the AArch64 ports needs to set glibc_integral_traps to 
>> false as integer divide instructions do not trap.
>>
>> Bootstrapped and regression tested on aarch64-none-linux-gnu
>>
>> Ok to apply ?
>
> Not really questioning your patch but questioning libstdc++'s defaults.
>  I wonder if this should be the default as most targets don't trap, only a 
> few that does. And it should be safer default to say they don't trap too?


How about we  #error out if targets do *not* define some of these
defaults in libstdc++  ? There are far more ports with weak memory
models, and the defaults for _GLIBCXX_READ/WRITE_BARRIER also appear
unsafe . I was toying with a patch like that to force targets to
define this sort of thing but I need to read more of configure.host
before I make up my mind.



regards
Ramana


>
> Thanks,
> Andrew
>
>
>>
>> regards
>> Ramana
>>
>>
>> 2015-05-17  Ramana Radhakrishnan  
>>
>>* configure.host: Define cpu_defines_dir for AArch64
>>* config/cpu/aarch64/cpu_defines.h: New file.
>> <0002-Do-the-same-for-AArch64.patch>


Re: [Patch ARM-AArch64/testsuite 00/13] Neon intrinsics executable tests

2015-05-19 Thread Christophe Lyon
On 19 May 2015 at 15:32, James Greenhalgh  wrote:
> On Tue, May 12, 2015 at 09:30:48PM +0100, Christophe Lyon wrote:
>> This patch series is a follow-up to the tests I already contributed,
>> converted from my original testsuite.
>>
>> This series consists in 13 new files, which can be committed
>> independently.
>>
>> Another series (hopefully final) will follow.
>>
>> Tested with qemu on arm*linux, aarch64-linux. I couldn't test on 
>> aarch64_be-none-elf because my build is currently broken (see PR 66018).
>>
>> 2015-05-12  Christophe Lyon  
>>
>>   * gcc.target/aarch64/neon-intrinsics/vqmovn.c: New file.
>>   * gcc.target/aarch64/neon-intrinsics/vqmovun.c: Likewise.
>>   * gcc.target/aarch64/neon-intrinsics/vqrdmulh.c: Likewise.
>>   * gcc.target/aarch64/neon-intrinsics/vqrdmulh_lane.c: Likewise.
>>   * gcc.target/aarch64/neon-intrinsics/vqrdmulh_n.c: Likewise.
>>   * gcc.target/aarch64/neon-intrinsics/vqrshl.c: Likewise.
>>   * gcc.target/aarch64/neon-intrinsics/vqrshrn_n.c: Likewise.
>>   * gcc.target/aarch64/neon-intrinsics/vqrshrun_n.c: Likewise.
>>   * gcc.target/aarch64/neon-intrinsics/vqshl.c: Likewise.
>>   * gcc.target/aarch64/neon-intrinsics/vqshl_n.c: Likewise.
>>   * gcc.target/aarch64/neon-intrinsics/vqshlu_n.c: Likewise.
>>   * gcc.target/aarch64/neon-intrinsics/vqshrn_n.c: Likewise.
>>   * gcc.target/aarch64/neon-intrinsics/vqshrun_n.c: Likewise.
>
> Hi Christophe,
>
> This patch set looks good to me. The patch set is OK, please apply it.
>
> One small nit, could you run through and check the alignment of the
> trailing \ in some of the macro definitions? It might be the mail
> clients, but I see (for example):
>
> +  /* Basic test: v2=vqshlu_n(v1,v), then store the result.  */
> +#define TEST_VQSHLU_N2(INSN, Q, T1, T2, T3, T4, W, N, V, 
> EXPECTED_CUMULATIVE_SAT, CMT) \
> +  Set_Neon_Cumulative_Sat(0, VECT_VAR(vector_res, T3, W, N));  \
> +  VECT_VAR(vector_res, T3, W, N) = \
> +INSN##Q##_n_##T2##W(VECT_VAR(vector, T1, W, N),\
> +   V); \
> +  vst1##Q##_##T4##W(VECT_VAR(result, T3, W, N),\
> +   VECT_VAR(vector_res, T3, W, N));\
> +  CHECK_CUMULATIVE_SAT(TEST_MSG, T1, W, N, EXPECTED_CUMULATIVE_SAT, CMT)
> +
>
> in patch 11/13.

Done, the trailing \ is now further to the right in some of the patches.

>
> Also, if you could look out for aarch64_be fallout once the build
> starts going, that would be great.
>
Sure, my automatic validations will take care of that.

Thanks

Christophe.

> Thanks,
> James
>
>>
>> Christophe Lyon (13):
>>   Add vqmovn tests.
>>   Add vqmovun tests.
>>   Add vqrdmulh tests.
>>   Add vqrdmulh_lane tests.
>>   Add vqrdmulh_n tests.
>>   Add vqrshl tests.
>>   Add vqrshrn_n tests.
>>   Add vqrshrun_n tests.
>>   Add vqshl tests.
>>   Add vqshl_n tests.
>>   Add vqshlu_n tests.
>>   Add vqshrn_n tests.
>>   Add vqshrun_n tests.
>>
>>  .../gcc.target/aarch64/advsimd-intrinsics/vqmovn.c |  134 +++
>>  .../aarch64/advsimd-intrinsics/vqmovun.c   |   93 ++
>>  .../aarch64/advsimd-intrinsics/vqrdmulh.c  |  161 +++
>>  .../aarch64/advsimd-intrinsics/vqrdmulh_lane.c |  169 +++
>>  .../aarch64/advsimd-intrinsics/vqrdmulh_n.c|  155 +++
>>  .../gcc.target/aarch64/advsimd-intrinsics/vqrshl.c | 1090 
>> 
>>  .../aarch64/advsimd-intrinsics/vqrshrn_n.c |  174 
>>  .../aarch64/advsimd-intrinsics/vqrshrun_n.c|  189 
>>  .../gcc.target/aarch64/advsimd-intrinsics/vqshl.c  |  829 +++
>>  .../aarch64/advsimd-intrinsics/vqshl_n.c   |  234 +
>>  .../aarch64/advsimd-intrinsics/vqshlu_n.c  |  263 +
>>  .../aarch64/advsimd-intrinsics/vqshrn_n.c  |  177 
>>  .../aarch64/advsimd-intrinsics/vqshrun_n.c |  133 +++
>>  13 files changed, 3801 insertions(+)
>>  create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqmovn.c
>>  create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqmovun.c
>>  create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmulh.c
>>  create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmulh_lane.c
>>  create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrdmulh_n.c
>>  create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrshl.c
>>  create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrshrn_n.c
>>  create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqrshrun_n.c
>>  create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqshl.c
>>  create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqshl_n.c
>>  create mode 100644 
>> gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqshlu_n.c
>>  create mode 100644 
>> gcc/testsuite/g

Re: [RFC] COMDAT Safe Module Level Multi versioning

2015-05-19 Thread Xinliang David Li
>
> Hm.  But which options are unsafe?  Also wouldn't it be better to simply
> _not_ have unsafe options produce comdats but always make local clones
> for them (thus emit the comdat with "unsafe" flags dropped)?

Always localize comdat functions may lead to text size increase. It
does not work if the comdat function is a virtual function for
instance.

David


>
> Richard.
>
>>
>> Thanks
>> Sri


Fix PR48052: loop not vectorized if index is "unsigned int"

2015-05-19 Thread Aditya K
w.r.t. the PR48052, here is the patch which finds out if scev would wrap or not.
The patch symbolically evaluates if valid_niter>= loop->nb_iterations is true. 
In that case the scev would not wrap (??).
Currently, we only look for two special 'patterns', which are sufficient to 
analyze the simple test cases.

valid_niter = ~s (= UNIT_MAX - s)
We have to prove that valid_niter>= loop->nb_iterations

Pattern1 loop->nb_iterations: s>= e ? s - e : 0
Pattern2 loop->nb_iterations: (e - s) -1

In the first case we prove that valid_niter>= loop->nb_iterations in both the 
cases i.e., when s>=e and when not.
In the second case we prove valid_niter>= loop->nb_iterations, by simple 
analysis that  UINT_MAX>= e is true in all cases.

I haven't tested this patch completely. I'm looking for feedback and any scope 
for improvement.


hth,
-Aditya



Vectorize loops which has typecast.

2015-05-19  hiraditya  

    * gcc.dg/vect/pr48052.c: New test.

gcc/ChangeLog:

2015-05-19  hiraditya  

    * tree-ssa-loop-niter.c (fold_binary_cond_p): Fold a conditional 
operation when additional constraints are
    available.
    (fold_binary_minus_p): Fold a subtraction operations of the form (A - B 
-1) when additional constraints are
    available.
    (scev_probably_wraps_p): Use the above two functions to find whether 
valid_niter>= loop->nb_iterations.


diff --git a/gcc/testsuite/gcc.dg/vect/pr48052.c 
b/gcc/testsuite/gcc.dg/vect/pr48052.c
new file mode 100644
index 000..8e406d7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr48052.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
+int foo(int* A, int* B,  unsigned start, unsigned BS)
+{
+  int s;
+  for (unsigned k = start;  k < start + BS; k++)
+    {
+  s += A[k] * B[k];
+    }
+
+  return s;
+}
+
+int bar(int* A, int* B, unsigned BS)
+{
+  int s;
+  for (unsigned k = 0;  k < BS; k++)
+    {
+  s += A[k] * B[k];
+    }
+
+  return s;
+}
+
diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 042f8df..ddc00cc 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -3773,6 +3773,117 @@ nowrap_type_p (tree type)
   return false;
 }
 
+/* Return true when op0>= op1.
+   For example:
+   Where, op0 = ~start_3(D);
+   op1 = start_3(D) <= stop_6(D) ? stop_6(D) - start_3(D) : 0;
+   In this case op0 = UINT_MAX - start_3(D);
+   So, op0>= op1 in all cases because UINT_MAX>= stop_6(D),
+   when TREE_TYPE(stop_6(D)) == unsigned int;  */
+bool
+fold_binary_cond_p (enum tree_code code, tree type, tree op0, tree op1)
+{
+  gcc_assert (type == boolean_type_node);
+
+  if (TREE_TYPE (op0) != TREE_TYPE (op1))
+    return false;
+
+  /* TODO: Handle other operations.  */
+  if (code != GE_EXPR)
+    return false;
+  // The type of op0 and op1 should be unsigned.
+  if (!TYPE_UNSIGNED (TREE_TYPE(op0)))
+    return false;
+  if ((TREE_CODE (op0) != BIT_NOT_EXPR) || (TREE_CODE (op1) != COND_EXPR))
+    return false;
+
+  /* We have to show that in both the cases,
+ (when cond is true and when cond is false) op (op0, op1) is true.  */
+   tree neg_op0 = TREE_OPERAND (op0, 0);
+   tree cond_op1 = TREE_OPERAND (op1, 0);
+   tree true_op1 = TREE_OPERAND (op1, 1);
+   tree false_op1 = TREE_OPERAND (op1, 2);
+   gcc_assert(neg_op0 && cond_op1 && true_op1 && false_op1);
+
+  /* When cond is false. Evaluate op (op0, false_op1).  */
+  tree running_exp = fold_binary (code, boolean_type_node, op0, false_op1);
+  if (running_exp == NULL || integer_zerop (running_exp))
+    /* TODO: Handle more cases here. */
+    return false;
+
+  /* When cond is true. Evaluate op (op0, true_op1).  */
+  running_exp = fold_binary (code, boolean_type_node, op0, true_op1);
+  if (running_exp != NULL && integer_nonzerop (running_exp))
+    return true;
+
+  tree smaller, bigger;
+  if (TREE_CODE (cond_op1) == LE_EXPR)
+    {
+  smaller = TREE_OPERAND (cond_op1, 0);
+  bigger = TREE_OPERAND (cond_op1, 1);
+    } else return false;
+
+  if (TREE_CODE (true_op1) == MINUS_EXPR)
+    {
+  tree minuend = TREE_OPERAND (true_op1, 0);
+  tree subtrahend = TREE_OPERAND (true_op1, 1);
+  if (subtrahend == neg_op0 && subtrahend == smaller && minuend == bigger)
+    {
+  tree extreme = upper_bound_in_type (TREE_TYPE (neg_op0),
+  TREE_TYPE (neg_op0));
+  running_exp = fold_binary (code, boolean_type_node, extreme, 
minuend);
+  return running_exp != NULL && integer_nonzerop(running_exp);
+    } else return false;
+    } else return false;
+}
+
+/* Return true when op0>= op1 and
+   op0 is ~start3(D) or, UINT_MAX - start3(D)
+   op1 is (_21 - start_3(D)) - 1; */
+bool
+fold_binary_minus_p (enum tree_code code, tree type, tree op0, tree op1)
+{
+  gcc_assert (type == boolean_type_node);
+
+  if (TREE_TYPE (op0

RE: Refactor gimple_expr_type

2015-05-19 Thread Aditya K



> Date: Tue, 19 May 2015 11:33:16 +0200
> Subject: Re: Refactor gimple_expr_type
> From: richard.guent...@gmail.com
> To: hiradi...@msn.com
> CC: tbsau...@tbsaunde.org; gcc-patches@gcc.gnu.org
>
> On Tue, May 19, 2015 at 12:04 AM, Aditya K  wrote:
>>
>>
>> 
>>> Date: Mon, 18 May 2015 12:08:58 +0200
>>> Subject: Re: Refactor gimple_expr_type
>>> From: richard.guent...@gmail.com
>>> To: hiradi...@msn.com
>>> CC: tbsau...@tbsaunde.org; gcc-patches@gcc.gnu.org
>>>
>>> On Sun, May 17, 2015 at 5:31 PM, Aditya K  wrote:


 
> Date: Sat, 16 May 2015 11:53:57 -0400
> From: tbsau...@tbsaunde.org
> To: hiradi...@msn.com
> CC: gcc-patches@gcc.gnu.org
> Subject: Re: Refactor gimple_expr_type
>
> On Fri, May 15, 2015 at 07:13:35AM +, Aditya K wrote:
>> Hi,
>> I have tried to refactor gimple_expr_type to make it more readable. 
>> Removed the switch block and redundant if.
>>
>> Please review this patch.
>
> for some reason your mail client seems to be inserting non breaking
> spaces all over the place. Please either configure it to not do that,
> or use git send-email for patches.

 Please see the updated patch.
>>>
>>> Ok if this passed bootstrap and regtest. (I wish if gimple_expr_type
>>> didn't exist btw...)
>>
>> Thanks for the review. Do you have any suggestions on how to remove 
>> gimple_expr_type. Are there any alternatives to it?
>> I can look into refactoring more (if it is not too complicated) since I'm 
>> already doing this.
>
> Look at each caller - usually they should be fine with using TREE_TYPE
> (gimple_get_lhs ()) (or a more specific one
> dependent on what stmts are expected at the place). You might want to
> first refactor the code
>
> else if (code == GIMPLE_COND)
> gcc_unreachable ();
>
> and deal with the fallout in callers (similar for the void_type_node return).

Thanks for the suggestions. I looked at the use cases there are 47 usages in 
different files. That might be a lot of changes I assume, and would take some 
time.
This patch passes bootstrap and make check (although I'm not very confident 
that my way of make check ran all the regtests)

If this patch is okay to merge please do that. I'll continue working on 
removing gimle_expr_type.

Thanks,
-Aditya


>
> Richard.
>
>
>> -Aditya
>>
>>>
>>> Thanks,
>>> Richard.
>>>
 gcc/ChangeLog:

 2015-05-15 hiraditya 

 * gimple.h (gimple_expr_type): Refactor to make it concise. Remove 
 redundant if.

 diff --git a/gcc/gimple.h b/gcc/gimple.h
 index 95e4fc8..3a83e8f 100644
 --- a/gcc/gimple.h
 +++ b/gcc/gimple.h
 @@ -5717,36 +5717,26 @@ static inline tree
 gimple_expr_type (const_gimple stmt)
 {
 enum gimple_code code = gimple_code (stmt);
 -
 - if (code == GIMPLE_ASSIGN || code == GIMPLE_CALL)
 + /* In general we want to pass out a type that can be substituted
 + for both the RHS and the LHS types if there is a possibly
 + useless conversion involved. That means returning the
 + original RHS type as far as we can reconstruct it. */
 + if (code == GIMPLE_CALL)
 {
 - tree type;
 - /* In general we want to pass out a type that can be substituted
 - for both the RHS and the LHS types if there is a possibly
 - useless conversion involved. That means returning the
 - original RHS type as far as we can reconstruct it. */
 - if (code == GIMPLE_CALL)
 - {
 - const gcall *call_stmt = as_a  (stmt);
 - if (gimple_call_internal_p (call_stmt)
 - && gimple_call_internal_fn (call_stmt) == IFN_MASK_STORE)
 - type = TREE_TYPE (gimple_call_arg (call_stmt, 3));
 - else
 - type = gimple_call_return_type (call_stmt);
 - }
 + const gcall *call_stmt = as_a  (stmt);
 + if (gimple_call_internal_p (call_stmt)
 + && gimple_call_internal_fn (call_stmt) == IFN_MASK_STORE)
 + return TREE_TYPE (gimple_call_arg (call_stmt, 3));
 + else
 + return gimple_call_return_type (call_stmt);
 + }
 + else if (code == GIMPLE_ASSIGN)
 + {
 + if (gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR)
 + return TREE_TYPE (gimple_assign_rhs1 (stmt));
 else
 - switch (gimple_assign_rhs_code (stmt))
 - {
 - case POINTER_PLUS_EXPR:
 - type = TREE_TYPE (gimple_assign_rhs1 (stmt));
 - break;
 -
 - default:
 - /* As fallback use the type of the LHS. */
 - type = TREE_TYPE (gimple_get_lhs (stmt));
 - break;
 - }
 - return type;
 + /* As fallback use the type of the LHS. */
 + return TREE_TYPE (gimple_get_lhs (stmt));
 }
 else if (code == GIMPLE_COND)
 return boolean_type_node;


 Thanks,
 -Aditya





>
>>
>> Thanks,
>> -Aditya
>>
>>
>> gcc/ChangeLog:
>>
>> 2015-05-1

[ping**2] Handle MULTILIB_REUSE in auto-generated SYSROOT_SUFFIX_SPEC macro

2015-05-19 Thread Sandra Loosemore

Re-pinging a patch from last year that never got reviewed:

https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00511.html

This problem still exists in GCC 5.1 and the above patch still fixes it. 
 I haven't tried mainline head yet, but it doesn't look like anything 
else has touched this since we branched GCC 5.


-Sandra



[committed] linear/lastprivate clause fixes

2015-05-19 Thread Jakub Jelinek
Hi!

When working on taskloop, I've noticed various issues in the OpenMP 4.0
handling of the linear/lastprivate (explicit as well as implicit) clauses.
Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk,
plan to backport to 5/4.9 after a while.

2015-05-19  Jakub Jelinek  

PR middle-end/66199
* tree.h (OMP_TEAMS_COMBINED): Define.
* gimplify.c (enum gimplify_omp_var_data): Add
GOVD_LINEAR_LASTPRIVATE_NO_OUTER.
(enum omp_region_type): Add ORT_COMBINED_TEAMS.
(omp_notice_variable): Accept both ORT_TEAMS
and ORT_COMBINED_TEAMS.  Don't recurse if
GOVD_LINEAR_LASTPRIVATE_NO_OUTER is set and either
GOVD_LINEAR is set, or GOVD_LASTPRIVATE without
GOVD_FIRSTPRIVATE.
(omp_no_lastprivate): New function.
(gimplify_scan_omp_clauses): For OMP_CLAUSE_LASTPRIVATE
and OMP_CLAUSE_LINEAR, if omp_no_lastprivate, don't
notice_outer and set appropriate bits, otherwise make
sure default(none) combined constructs won't complain.
(gimplify_adjust_omp_clauses): Remove OMP_CLAUSE_LINEAR
outer special casing, for OMP_CLAUSE_LASTPRIVATE if
omp_no_lastprivate either remove the clause or turn it
into OMP_CLAUSE_PRIVATE.
(gimplify_omp_for): Fix up handling of implicit
lastprivate or linear iterators.
(gimplify_omp_workshare): For OMP_TEAMS_COMBINED use
ORT_COMBINED_TEAMS.
* omp-low.c (lower_omp_for_lastprivate): For combined
for simd use fd.loop.n2 from the for rather than simd.
gcc/c/
* c-parser.c (c_parser_omp_for_loop): Don't add
OMP_CLAUSE_SHARED to OMP_PARALLEL_CLAUSES when moving
OMP_CLAUSE_LASTPRIVATE clause to OMP_FOR_CLAUSES.
(c_parser_omp_teams): Set OMP_TEAMS_COMBINED for combined
constructs.
gcc/cp/
* parser.c (cp_parser_omp_for_loop): Don't add
OMP_CLAUSE_SHARED to OMP_PARALLEL_CLAUSES when moving
OMP_CLAUSE_LASTPRIVATE clause to OMP_FOR_CLAUSES.
(cp_parser_omp_teams): Set OMP_TEAMS_COMBINED for combined
constructs.
gcc/fortran/
* trans-openmp.c (gfc_trans_omp_teams): Set OMP_TEAMS_COMBINED for
combined constructs.
(gfc_trans_omp_target): Make sure BIND_EXPR has non-NULL
BIND_EXPR_BLOCK.
libgomp/
* testsuite/libgomp.c/pr66199-1.c: New test.
* testsuite/libgomp.c/pr66199-2.c: New test.
* testsuite/libgomp.c++/pr66199-1.C: New test.
* testsuite/libgomp.c++/pr66199-2.C: New test.
* testsuite/libgomp.fortran/pr66199-1.f90: New test.
* testsuite/libgomp.fortran/pr66199-2.f90: New test.

--- gcc/tree.h.jj   2015-05-18 09:46:37.0 +0200
+++ gcc/tree.h  2015-05-18 15:07:16.029386340 +0200
@@ -1326,6 +1326,11 @@ extern void protected_set_expr_location
 #define OMP_PARALLEL_COMBINED(NODE) \
   (OMP_PARALLEL_CHECK (NODE)->base.private_flag)
 
+/* True on an OMP_TEAMS statement if it represents an explicit
+   combined teams distribute constructs.  */
+#define OMP_TEAMS_COMBINED(NODE) \
+  (OMP_TEAMS_CHECK (NODE)->base.private_flag)
+
 /* True if OMP_ATOMIC* is supposed to be sequentially consistent
as opposed to relaxed.  */
 #define OMP_ATOMIC_SEQ_CST(NODE) \
--- gcc/gimplify.c.jj   2015-05-13 18:57:44.0 +0200
+++ gcc/gimplify.c  2015-05-19 13:48:14.019466801 +0200
@@ -111,6 +111,9 @@ enum gimplify_omp_var_data
   /* Flag for GOVD_MAP: don't copy back.  */
   GOVD_MAP_TO_ONLY = 8192,
 
+  /* Flag for GOVD_LINEAR or GOVD_LASTPRIVATE: no outer reference.  */
+  GOVD_LINEAR_LASTPRIVATE_NO_OUTER = 16384,
+
   GOVD_DATA_SHARE_CLASS = (GOVD_SHARED | GOVD_PRIVATE | GOVD_FIRSTPRIVATE
   | GOVD_LASTPRIVATE | GOVD_REDUCTION | GOVD_LINEAR
   | GOVD_LOCAL)
@@ -126,6 +129,7 @@ enum omp_region_type
   ORT_TASK = 4,
   ORT_UNTIED_TASK = 5,
   ORT_TEAMS = 8,
+  ORT_COMBINED_TEAMS = 9,
   /* Data region.  */
   ORT_TARGET_DATA = 16,
   /* Data region with offloading.  */
@@ -5870,7 +5874,7 @@ omp_notice_variable (struct gimplify_omp
 DECL_NAME (lang_hooks.decls.omp_report_decl (decl)));
  error_at (ctx->location, "enclosing task");
}
- else if (ctx->region_type == ORT_TEAMS)
+ else if (ctx->region_type & ORT_TEAMS)
{
  error ("%qE not specified in enclosing teams construct",
 DECL_NAME (lang_hooks.decls.omp_report_decl (decl)));
@@ -5963,6 +5967,13 @@ omp_notice_variable (struct gimplify_omp
  need to propagate anything to an outer context.  */
   if ((flags & GOVD_PRIVATE) && !(flags & GOVD_PRIVATE_OUTER_REF))
 return ret;
+  if ((flags & (GOVD_LINEAR | GOVD_LINEAR_LASTPRIVATE_NO_OUTER))
+  == (GOVD_LINEAR | GOVD_LINEAR_LASTPRIVATE_NO_OUTER))
+return ret;
+  if ((flags & (GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE
+   | GOVD_LINEAR_LASTPRIVATE_NO_OUTER))
+  == (GOVD_LASTP

Re: [RFC] COMDAT Safe Module Level Multi versioning

2015-05-19 Thread Sriraman Tallam
On Tue, May 19, 2015 at 2:39 AM, Richard Biener
 wrote:
> On Tue, May 19, 2015 at 8:16 AM, Sriraman Tallam  wrote:
>> We have the following problem with selectively compiling modules with
>> -m options and I have provided a solution to solve this.  I would
>> like to hear what you think.
>>
>> Multi versioning at module granularity is done by compiling a subset
>> of modules with advanced ISA instructions, supported on later
>> generations of the target architecture, via -m options and
>> invoking the functions defined in these modules with explicit checks
>> for the ISA support via builtin functions,  __builtin_cpu_supports.
>> This mechanism has the unfortunate side-effect that generated COMDAT
>> candidates from these modules can contain these advanced instructions
>> and potentially “violate” ODR assumptions.  Choosing such a COMDAT
>> candidate over a generic one from a different module can cause SIGILL
>> on platforms where the advanced ISA is not supported.
>>
>> Here is a slightly contrived  example to illustrate:
>>
>>
>> matrixdouble.h
>> 
>> // Template (Comdat) function definition in a header:
>>
>> template
>> __attribute__((noinline))
>> void matrixDouble (T *a) {
>>   for (int i = 0 ; i < 16; ++i)  //Vectorizable Loop
>> a[i] = a[i] * 2;
>> }
>>
>> avx.cc  (Compile with -mavx -O2)
>> -
>>
>> #include "matrixdouble.h"
>> void getDoubleAVX(int *a) {
>>  matrixDouble(a);  // Instantiated with vectorized AVX instructions
>> }
>>
>>
>> non_avx.cc (Compile with -mno-avx -O2)
>> ---
>>
>> #include “matrixdouble.h”
>> void
>> getDouble(int *a) {
>>  matrixDouble(a); // Instantiated with non-AVX instructions
>> }
>>
>>
>> main.cc
>> ---
>>
>> void getDoubleAVX(int *a);
>> void getDouble(int *a);
>>
>> int a[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
>> int main () {
>>  // The AVX call is appropriately guarded.
>>  if (__builtin_cpu_supports(“avx”))
>>getDoubleAVX(a);
>>  else
>>getDouble(a);
>>  return a[0];
>> }
>>
>>
>> In the above code, function “getDoubleAVX” is only called when the
>> run-time CPU supports AVX instructions.  This code looks clean but
>> suffers from the COMDAT ODR violation.  Two copies of COMDAT function
>> “matrixDouble” are generated.  One copy is generated in object file
>> “avx.o” with AVX instructions and another copy exists in “non_avx.o”
>> without AVX instruction.  At link time, in a link order where object
>> file avx.o is seen ahead of  non_avx.o,  the COMDAT copy of function
>> “matrixDouble” that contains AVX instructions is kept leading to
>> SIGILL on unsupported platforms.  To reproduce the SIGILL,
>>
>>
>> $  g++ -c -O2 -mavx avx.cc
>> $ g++ -c -O2 -mno-avx non_avx.cc
>> $  g++ main.cc avx.o non_avx.o
>> $ ./a.out   # on a non-AVX machine
>> Illegal Instruction
>>
>>
>> To solve this, I propose introducing a new compiler option, say
>> -fodr-unsafe-comdats, to let the user tag objects that use specialized
>> options and let the linker choose the comdat candidate to be linked
>> wisely.  The root cause of the above problem is that comdat functions
>> in common headers may not be properly guarded and the linker picks the
>> first candidate it sees.  A link order where the object with the
>> specialized comdat functions appear first causes these comdats to be
>> picked leading to SIGILL on unsupported arches.  With the objects
>> tagged, the linker can be made to pick other comdat candidates when
>> possible.
>>
>> More details:
>>
>> This option is user specified when using arch specific options like
>> -m.  It is an indicator to the compiler that any comdat bodies
>> generated are potentially unsafe for execution.  Note that the COMDAT
>> bodies however have to be generated as there are no guarantees that
>> other modules will do so.  The compiler then emits a specially named
>> section, like “.gnu.odr.unsafe”, in the object file.  When the linker
>> tries to pick a COMDAT candidate from several choices, it must avoid
>> COMDAT copies from objects with sections named “.gnu.odr.unsafe” when
>> presented with a choice to pick a candidate from an object that does
>> not have the “.gnu.odr.unsafe” section.  Note that it may not be
>> possible to do that in which case the linker must pick the unsafe
>> copy, it could explicitly warn when this happens.
>>
>> Alternately,  the compiler can bind locally any emitted comdat version
>> from a specialized module, which could also be guarded by an option.
>> This will solve the problem but this may not be always possible
>> especially when addresses of any such comdat version is taken.
>
> Hm.  But which options are unsafe?  Also wouldn't it be better to simp

In general, should that be any option that affects code gen and is
only *applied to a subset of modules* is potentially unsafe as the
comdat copies generated from those modules are not identical to the
copies from other modules.  Tagging such modules with
-fodr-unsafe-comdats, even conserva

Re: [RFC] COMDAT Safe Module Level Multi versioning

2015-05-19 Thread Yury Gribov

On 05/19/2015 09:16 AM, Sriraman Tallam wrote:

We have the following problem with selectively compiling modules with
-m options and I have provided a solution to solve this.  I would
like to hear what you think.

Multi versioning at module granularity is done by compiling a subset
of modules with advanced ISA instructions, supported on later
generations of the target architecture, via -m options and
invoking the functions defined in these modules with explicit checks
for the ISA support via builtin functions,  __builtin_cpu_supports.
This mechanism has the unfortunate side-effect that generated COMDAT
candidates from these modules can contain these advanced instructions
and potentially “violate” ODR assumptions.  Choosing such a COMDAT
candidate over a generic one from a different module can cause SIGILL
on platforms where the advanced ISA is not supported.

Here is a slightly contrived  example to illustrate:


matrixdouble.h

// Template (Comdat) function definition in a header:

template
__attribute__((noinline))
void matrixDouble (T *a) {
   for (int i = 0 ; i < 16; ++i)  //Vectorizable Loop
 a[i] = a[i] * 2;
}

avx.cc  (Compile with -mavx -O2)
-

#include "matrixdouble.h"
void getDoubleAVX(int *a) {
  matrixDouble(a);  // Instantiated with vectorized AVX instructions
}


non_avx.cc (Compile with -mno-avx -O2)
---

#include “matrixdouble.h”
void
getDouble(int *a) {
  matrixDouble(a); // Instantiated with non-AVX instructions
}


main.cc
---

void getDoubleAVX(int *a);
void getDouble(int *a);

int a[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
int main () {
  // The AVX call is appropriately guarded.
  if (__builtin_cpu_supports(“avx”))
getDoubleAVX(a);
  else
getDouble(a);
  return a[0];
}


In the above code, function “getDoubleAVX” is only called when the
run-time CPU supports AVX instructions.  This code looks clean but
suffers from the COMDAT ODR violation.  Two copies of COMDAT function
“matrixDouble” are generated.  One copy is generated in object file
“avx.o” with AVX instructions and another copy exists in “non_avx.o”
without AVX instruction.  At link time, in a link order where object
file avx.o is seen ahead of  non_avx.o,  the COMDAT copy of function
“matrixDouble” that contains AVX instructions is kept leading to
SIGILL on unsupported platforms.  To reproduce the SIGILL,


$  g++ -c -O2 -mavx avx.cc
$ g++ -c -O2 -mno-avx non_avx.cc
$  g++ main.cc avx.o non_avx.o
$ ./a.out   # on a non-AVX machine
Illegal Instruction


To solve this, I propose introducing a new compiler option, say
-fodr-unsafe-comdats, to let the user tag objects that use specialized
options and let the linker choose the comdat candidate to be linked
wisely.  The root cause of the above problem is that comdat functions
in common headers may not be properly guarded and the linker picks the
first candidate it sees.  A link order where the object with the
specialized comdat functions appear first causes these comdats to be
picked leading to SIGILL on unsupported arches.  With the objects
tagged, the linker can be made to pick other comdat candidates when
possible.

More details:

This option is user specified when using arch specific options like
-m.  It is an indicator to the compiler that any comdat bodies
generated are potentially unsafe for execution.  Note that the COMDAT
bodies however have to be generated as there are no guarantees that
other modules will do so.  The compiler then emits a specially named
section, like “.gnu.odr.unsafe”, in the object file.  When the linker
tries to pick a COMDAT candidate from several choices, it must avoid
COMDAT copies from objects with sections named “.gnu.odr.unsafe” when
presented with a choice to pick a candidate from an object that does
not have the “.gnu.odr.unsafe” section.  Note that it may not be
possible to do that in which case the linker must pick the unsafe
copy, it could explicitly warn when this happens.

Alternately,  the compiler can bind locally any emitted comdat version
from a specialized module, which could also be guarded by an option.
This will solve the problem but this may not be always possible
especially when addresses of any such comdat version is taken.


Can IFUNC relocations be used to properly select optimal version of code 
at runtime?


-Y



Re: [RFC]: Remove Mem/address type assumption in combiner

2015-05-19 Thread Jeff Law

On 05/16/2015 07:55 PM, Hans-Peter Nilsson wrote:

On Sat, 16 May 2015, Segher Boessenkool wrote:

On Sat, May 16, 2015 at 12:36:38PM -0400, Hans-Peter Nilsson wrote:

On Sat, 16 May 2015, Segher Boessenkool wrote:

On Fri, May 15, 2015 at 10:40:48PM -0400, Hans-Peter Nilsson wrote:

I confess the test-case-"guarded" addi pattern should have been
expressed with a shift in addition to the multiplication.


But they wouldn't ever match so they might very well have bitrotted
by now :-(


It seems you're saying that the canonicalization to "ashift"
didn't work *at all*, when starting with an expression from an
address?  I knew it failed in part, but always thought it was
just a partial failure.


With a plus or minus combine would always write it as a mult.
I don't think any other pass would create this combination.  I
haven't tested it though.


Ports probably also generate that internally at various RTL
passes, something that takes a bit more than an at-a-glance code
inspection.
Correct.  THe PA port for example has a ton of this kind of RTL 
rewriting to exploit the shift-add insns and scaled indexed addressing 
modes (and correct for some oddities in the PA chip where the scaled 
modes don't exist in every context where you'd think they should).


And you still have to to worry about things like combine taking a (mem 
(plus (mult))), selecting the (plus (mult)) as a split point and failing 
to canonicalize it into the ashift form.


I ran into that while fixing up the PA for these changes.  The good news 
is with two trivial combine changes and the expected changes to the PA 
backend, I can get code generation back to where it was before across my 
sample testfiles.



Jeff


Re: [PATCH] Add SPECIAL_FLOAT_MODE to enable adding IEEE 128-bit floating point to PowerPC

2015-05-19 Thread Michael Meissner
On Fri, May 08, 2015 at 01:05:59PM -0600, Jeff Law wrote:
> On 05/06/2015 11:29 AM, Michael Meissner wrote:
> >On Wed, May 06, 2015 at 04:03:00PM +0100, Richard Sandiford wrote:
> >>Jeff Law  writes:
> >>>So my worry here is that folks writing these loops to iterate over modes
> >>>are going to easily miss the != VOIDmode terminator, or not know when to
> >>>use GET_MODE_WIDER_SPECIAL.
> >>>
> >>>We can certainly go with the patch as-is since you've done the work to
> >>>sort out which GET_MODE_WIDER to use and added the appropriate
> >>>termination checks.   But do we want to try to future proof this a
> >>>little and define two iterators for folks to use rather than write the
> >>>loops by hand every time and probably getting it wrong -- and wrong in
> >>>such a way that it only breaks on PPC, forcing someone to regularly be
> >>>fixing this stuff
> >
> >If they miss the != VOIDmode, the program will hang since it will never exit
> >the loop (VOIDmode is the wider type for VOIDmode).
> OK.  And presumably if someone is adding a new loop over the modes
> they'll actually be testing that code on whatever target they're
> using.  So it's unlikely they'll introduce an infinite loop on other
> targets by accident.

In looking over the patch, I learned a lot more about the internal mode
handling.  I discovered that I can use FRACTIONAL_FLOAT_MODE to add the new
types (IFmode for IBM double-double with 106 fracational bits, and KFmode for
IEEE 128-bit with 113 bits, and TFmode will become either of these depending on
the defaults and switches).  What I was missing when I previously tried to use
FRACTIONAL_FLOAT_MODE is when I create these new modes, in the case where they
are not active, I have to make sure that HARD_REGNO_NREGS never returns true
for any register (including GPRs), and the mode will be skipped.  I don't need
to add the special tests against VOIDmode during the widening rules.

So, I don't need any special machine independent support for adding IEEE
128-bit floating point to the compiler.  Thanks for pushing me to
re-investigate the changes and clean them up.

During my investigation, I did come up with a FOR_EACH iterator for emit-rtl.c
and a few other places that make the iteration over all modes easier, and I
will clean that up and submit it as a separate patch.

Some of the other mode widening places in the machine independent parts of the
compiler could also be cleaned up (lto-streamer in particular), that I may or
may not submit patches for.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



ODR merging and implicit typedefs

2015-05-19 Thread Jan Hubicka

Jason,
I just noticed that there are bogus ODR violation warnings during LTO-bootstrap
(that breaks -Werror builds).  It was caused by my work-around for 
type_in_anonymous_namespace
for the issue discussed in:
https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01245.html
(i.e. the TYPE_STUB_DECL disucssion).

I simply added a loop that for type that looks anonymous by
if (TYPE_STUB_DECL (t) && !TREE_PUBLIC (TYPE_STUB_DECL (t)))
it walks up the context and looks for actual anonymous NAMESPACE_DECL.

Now I however run into type merging issues as follows:

for following type:
../../gcc/postreload-gcse.c:107:1: error: type �struct � violates one 
definition rule [-Werror=odr]
 {
  ^
../../gcc/postreload.c:721:3: note: a different type is defined in another 
translation unit
   {
^
../../gcc/postreload-gcse.c:108:7: note: the first difference of corresponding 
definitions is field �moves_inserted�
   int moves_inserted;
^
../../gcc/postreload.c:722:51: note: a field with different name is defined in 
another translation unit
 struct reg_use reg_use[RELOAD_COMBINE_MAX_USES];
^
(gdb) p debug_tree (t1->type_common.name->decl_with_vis.assembler_name)
 

So the problem is that at compile time we think the type is ODR type but it 
does not really have name:

  constant 96>
unit size  constant 12>
align 32 symtab 0 alias set -1 canonical type 0x746d2bd0
fields 
unit size 
align 32 symtab 0 alias set -1 canonical type 0x76adb690 
precision 32 min  max 
pointer_to_this  reference_to_this 
>
nonlocal SI file ../../gcc/postreload-gcse.c line 108 col 7 size 
 unit size 
align 32 offset_align 128
offset 
bit offset  context 
chain 
nonlocal SI file ../../gcc/postreload-gcse.c line 109 col 7 size 
 unit size 
align 32 offset_align 128 offset  bit 
offset  context  
chain >> context 

chain >

 
unit size 
align 32 symtab 0 alias set -1 canonical type 0x746d2bd0
fields 
nonlocal SI file ../../gcc/postreload-gcse.c line 108 col 7
size 
unit size 
align 32 offset_align 128
offset 
bit offset  context 
 chain > 
context 
pointer_to_this  chain >
VOID file ../../gcc/postreload-gcse.c line 107 col 1
align 8 context >

I tracked down that those are implicit typedef created by 
create_implicit_typedef.
My patch made them no longer anonymous that in turn triggers the bogus 
diagnostics.
I do not think it is fully correct though - those types are not anonymous.
(I also wonder we we need to introdce a type name "._134") and pass it all the 
way down
to LTO.

I tried to make type_with_linkage_p to return false on those, but that causes 
problem
witht fact that polymorphic call analysis expects all class types to have 
linkage
and working ODR equivalency on these. 

Is there a way to associate them with the real named type they correspond to and
arrange them to have same name mangling? (and perhaps the mangler to ICE on 
attempt
to try to use local name like this?)

I bootstrapped/regtested on x86_64-linux the patch bellow. If it will work for 
Firefox
and Chrome I will go ahead with it at least temporarily.

Honza

* ipa-devirt.c (type_in_anonymous_namespace_p): Return true
or implicit declarations.
(odr_type_p): Check that TYPE_NAME is TYPE_DECL before looking
into it.
(get_odr_type): Check type has linkage before adding bases.
(register_odr_type): Check that type has linkage before adding it.
(type_known_to_have_no_deriavations_p): Rename to ..
(type_known_to_have_no_derivations_p): This one.
* ipa-utils.h (type_known_to_have_no_deriavations_p): Rename to ..
(type_known_to_have_no_derivations_p): This one.
* ipa-polymorphic-call.c
(ipa_polymorphic_call_context::restrict_to_inner_type): Check that
type has linkage.
Index: ipa-devirt.c
===
--- ipa-devirt.c(revision 223390)
+++ ipa-devirt.c(working copy)
@@ -269,6 +269,8 @@ type_in_anonymous_namespace_p (const_tre
 
   if (TYPE_STUB_DECL (t) && !TREE_PUBLIC (TYPE_STUB_DECL (t)))
 {
+  if (DECL_ARTIFICIAL (TYPE_NAME (t)))
+   return true;
   tree ctx = DECL_CONTEXT (TYPE_NAME (t));
   while (ctx)
{
@@ -296,7 +298,7 @@ odr_type_p (const_tree t)
  to care, since it is used only for type merging.  */
   gcc_checking_assert (in_lto_p || flag_lto);
 
-  return (TYPE_NAME (t)
+  return (TYPE_NAME (t) && TREE_CODE (TYPE_NAME (t)) == TYPE_DECL
   && (DECL_ASSEMBLER_NAME_SET_P (TYPE_NAME (t;
 }
 
@@ -2124,6 +2126,7 @@ get_odr_type (tree type, bool insert)
 }
 
   if (build_bases && TREE_CODE (type) == RECORD_TYPE && TYPE_BINFO (type)
+  && type_with_linkage_p (type)

Re: [Patch][loop-invariant.c] Fix a couple of bugs regarding loop invariant motion discovered by spec2k6 on aarch64

2015-05-19 Thread Jeff Law

On 05/18/2015 02:16 AM, David Sherwood wrote:

Hi Jeff,

Thanks for the suggestion. I did a bootstrap x86_64 build before and after my
patch and looked for differences in the last stage object files and there were
plenty of them. I chose a nice simple function (check_callers) from
ipa-inline-analysis.c and reduced it to a small test case. Hope this is ok.

Testing done:

  *  aarch64 built, "make check" no regressions
  *  aarch64_be built, "make check" no regressions
  *  x86_64 built, "make check" no regressions

ChangeLog:
 2015-05-15  David Sherwood  

 * loop-invariant.c (create_new_invariant): Don't calculate address cost
 if mode is not a scalar integer.
  (get_inv_cost): Increase computational cost for unused invariants.
 * testsuite/gcc.dg/loop-invariant.c: New testcase.
Thanks.  Installed on the trunk after fixing a couple trivial whitespace 
nits.


Jeff



Re: ODR merging and implicit typedefs

2015-05-19 Thread Markus Trippelsdorf
On 2015.05.19 at 19:33 +0200, Jan Hubicka wrote:
> 
> Jason,
> I just noticed that there are bogus ODR violation warnings during 
> LTO-bootstrap
> (that breaks -Werror builds).  It was caused by my work-around for 
> type_in_anonymous_namespace
> for the issue discussed in:
> https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01245.html
> (i.e. the TYPE_STUB_DECL disucssion).

There are also many bogus ODR warnings when building LLVM with LTO:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66180

-- 
Markus


Use a couple of macros in stor-layout.c

2015-05-19 Thread Eric Botcazou
Self-explanatory, tested on x86_64-suse-linux, applied as obvious.


2015-05-19  Eric Botcazou  

* stor-layout.c (finalize_type_size): Use AGGREGATE_TYPE_P.
(layout_type): Use RECORD_OR_UNION_TYPE_P.


-- 
Eric BotcazouIndex: stor-layout.c
===
--- stor-layout.c	(revision 223349)
+++ stor-layout.c	(working copy)
@@ -1757,12 +1757,9 @@ finalize_type_size (tree type)
  However, where strict alignment is not required, avoid
  over-aligning structures, since most compilers do not do this
  alignment.  */
-
-  if (TYPE_MODE (type) != BLKmode && TYPE_MODE (type) != VOIDmode
-  && (STRICT_ALIGNMENT
-	  || (TREE_CODE (type) != RECORD_TYPE && TREE_CODE (type) != UNION_TYPE
-	  && TREE_CODE (type) != QUAL_UNION_TYPE
-	  && TREE_CODE (type) != ARRAY_TYPE)))
+  if (TYPE_MODE (type) != BLKmode
+  && TYPE_MODE (type) != VOIDmode
+  && (STRICT_ALIGNMENT || !AGGREGATE_TYPE_P (type)))
 {
   unsigned mode_align = GET_MODE_ALIGNMENT (TYPE_MODE (type));
 
@@ -2431,9 +2428,7 @@ layout_type (tree type)
   /* Compute the final TYPE_SIZE, TYPE_ALIGN, etc. for TYPE.  For
  records and unions, finish_record_layout already called this
  function.  */
-  if (TREE_CODE (type) != RECORD_TYPE
-  && TREE_CODE (type) != UNION_TYPE
-  && TREE_CODE (type) != QUAL_UNION_TYPE)
+  if (!RECORD_OR_UNION_TYPE_P (type))
 finalize_type_size (type);
 
   /* We should never see alias sets on incomplete aggregates.  And we


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Rich Felker
On Tue, May 19, 2015 at 04:43:53PM +0200, Michael Matz wrote:
> Hi,
> 
> On Fri, 15 May 2015, Rich Felker wrote:
> 
> > Forget lazy binding. It's dead anyway because serious distros want
> > PIE+relro+bindnow+...
> 
> You keep saying this, but I can't help the feeling it's mostly because 
> musl doesn't support it ;-)

Well the reasons musl doesn't support it are partly the above, and
partly that it's been a continuous source of subtle bugs in glibc --
things like clobbering new vector registers, missing synchronization,
failures to be async-signal-safe, etc. So it's not that I think lazy
binding is bad because musl doesn't support it, but rather that musl
doesn't support lazy binding because I think it's bad. :-)

> No, you don't have to use bindnow to get the effects of relro.  Sure 
> there's more parts of the GOT protected with it, but if that's really that 
> much more hardened is up for debate.

Normally it's function addresses that you care about protecting --
they're the easy vector for arbitrary code execution -- and they're
unprotected without bindnow. Addresses of global data could also be an
attack vector, but a more difficult one to exploit.

> > If people really want lazy binding, they can use options which support 
> > it, but I don't want to keep suffering the codegen cost of lazy binding 
> > despite never using it.
> 
> > There should be an option to generate optimal code equivalent to what 
> > you get with Alexander Monakov's patches for those of us who aren't 
> > trying to support this legacy feature that precludes good performance 
> > and precludes hardening.
> 
> H.J.'s branch is for _improving_ code on top of the no-plt code, it's not 
> replacing it or an alternative for it.

Thanks for the clarification -- this was the part I was failing to
understand. I'm still mildly worried that concerns for supporting
relaxation might lead to decisions not to optimize code in ways that
would be difficult to relax (e.g. certain types of address load
reordering or hoisting) but I don't understand GCC internals
sufficiently to know if this concern is warranted or not. As long as
his work isn't interfering with the ability of -fno-plt to generate
optimal code, I agree it's both inappropriate and counter-productive
for me to be objecting to part or all of it.

I would still like to see the @GOTPCREL stuff added and used instead
of @GOT, as I mentioned earlier in the thread, but I agree that's
independent of relaxation support and shouldn't block it.

Rich


Re: [PATCH/libiberty] fix build of gdb/binutils with clang.

2015-05-19 Thread Yunlian Jiang
I could do that and it make the compilation of libiberty passes.
However, I  have some other problem when using clang to build gdb
because of libiberty.

Some c file from other component may include 'libiberty.h' which contains
the following

#if !HAVE_DECL_ASPRINTF
/* Like sprintf but provides a pointer to malloc'd storage, which must
   be freed by the caller.  */

extern int asprintf (char **, const char *, ...) ATTRIBUTE_PRINTF_2;
#endif

The HAVE_DECL_ASPRINTF is defined in config.h under libiberty directory.
If the other c file only includes libiberty.h and does not include the
libiberty/config.h and
at the same time, _GNU_SOURCE is defind, the same error happens.


On Mon, May 18, 2015 at 4:52 PM, Ian Lance Taylor  wrote:
> On Mon, May 18, 2015 at 4:26 PM, Yunlian Jiang  wrote:
>>
>> Yes, the problem is  libiberty is compiling some files
>> with _GNU_SOURCE defined and some not. So the configure
>> file does not include "#define _GNU_SOURCE".
>
> As far as I can see it should be fine to define _GNU_SOURCE when
> compiling all libiberty files.
>
> Ian


Re: [PATCH/libiberty] fix build of gdb/binutils with clang.

2015-05-19 Thread DJ Delorie

> If the other c file only includes libiberty.h and does not include the
> libiberty/config.h and

In general, such "other c files" should have their own config.h that
does the same test and has its own HAVE_DECL_ASPRINTF.

That way, the config.h matches the compiler options being used, and
not the compiler options libiberty's build used.


Re: [PATCH/libiberty] fix build of gdb/binutils with clang.

2015-05-19 Thread Ian Lance Taylor
On Tue, May 19, 2015 at 11:08 AM, Yunlian Jiang  wrote:
>
> I could do that and it make the compilation of libiberty passes.
> However, I  have some other problem when using clang to build gdb
> because of libiberty.
>
> Some c file from other component may include 'libiberty.h' which contains
> the following
>
> #if !HAVE_DECL_ASPRINTF
> /* Like sprintf but provides a pointer to malloc'd storage, which must
>be freed by the caller.  */
>
> extern int asprintf (char **, const char *, ...) ATTRIBUTE_PRINTF_2;
> #endif
>
> The HAVE_DECL_ASPRINTF is defined in config.h under libiberty directory.
> If the other c file only includes libiberty.h and does not include the
> libiberty/config.h and
> at the same time, _GNU_SOURCE is defind, the same error happens.

Probably if HAVE_DECL_ASPRINTF is not defined at all, we should not
declare asprintf in libiberty.h.

Ian


Re: [RFC] COMDAT Safe Module Level Multi versioning

2015-05-19 Thread Sriraman Tallam
On Tue, May 19, 2015 at 10:22 AM, Yury Gribov  wrote:
> On 05/19/2015 09:16 AM, Sriraman Tallam wrote:
>>
>> We have the following problem with selectively compiling modules with
>> -m options and I have provided a solution to solve this.  I would
>> like to hear what you think.
>>
>> Multi versioning at module granularity is done by compiling a subset
>> of modules with advanced ISA instructions, supported on later
>> generations of the target architecture, via -m options and
>> invoking the functions defined in these modules with explicit checks
>> for the ISA support via builtin functions,  __builtin_cpu_supports.
>> This mechanism has the unfortunate side-effect that generated COMDAT
>> candidates from these modules can contain these advanced instructions
>> and potentially “violate” ODR assumptions.  Choosing such a COMDAT
>> candidate over a generic one from a different module can cause SIGILL
>> on platforms where the advanced ISA is not supported.
>>
>> Here is a slightly contrived  example to illustrate:
>>
>>
>> matrixdouble.h
>> 
>> // Template (Comdat) function definition in a header:
>>
>> template
>> __attribute__((noinline))
>> void matrixDouble (T *a) {
>>for (int i = 0 ; i < 16; ++i)  //Vectorizable Loop
>>  a[i] = a[i] * 2;
>> }
>>
>> avx.cc  (Compile with -mavx -O2)
>> -
>>
>> #include "matrixdouble.h"
>> void getDoubleAVX(int *a) {
>>   matrixDouble(a);  // Instantiated with vectorized AVX instructions
>> }
>>
>>
>> non_avx.cc (Compile with -mno-avx -O2)
>> ---
>>
>> #include “matrixdouble.h”
>> void
>> getDouble(int *a) {
>>   matrixDouble(a); // Instantiated with non-AVX instructions
>> }
>>
>>
>> main.cc
>> ---
>>
>> void getDoubleAVX(int *a);
>> void getDouble(int *a);
>>
>> int a[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
>> int main () {
>>   // The AVX call is appropriately guarded.
>>   if (__builtin_cpu_supports(“avx”))
>> getDoubleAVX(a);
>>   else
>> getDouble(a);
>>   return a[0];
>> }
>>
>>
>> In the above code, function “getDoubleAVX” is only called when the
>> run-time CPU supports AVX instructions.  This code looks clean but
>> suffers from the COMDAT ODR violation.  Two copies of COMDAT function
>> “matrixDouble” are generated.  One copy is generated in object file
>> “avx.o” with AVX instructions and another copy exists in “non_avx.o”
>> without AVX instruction.  At link time, in a link order where object
>> file avx.o is seen ahead of  non_avx.o,  the COMDAT copy of function
>> “matrixDouble” that contains AVX instructions is kept leading to
>> SIGILL on unsupported platforms.  To reproduce the SIGILL,
>>
>>
>> $  g++ -c -O2 -mavx avx.cc
>> $ g++ -c -O2 -mno-avx non_avx.cc
>> $  g++ main.cc avx.o non_avx.o
>> $ ./a.out   # on a non-AVX machine
>> Illegal Instruction
>>
>>
>> To solve this, I propose introducing a new compiler option, say
>> -fodr-unsafe-comdats, to let the user tag objects that use specialized
>> options and let the linker choose the comdat candidate to be linked
>> wisely.  The root cause of the above problem is that comdat functions
>> in common headers may not be properly guarded and the linker picks the
>> first candidate it sees.  A link order where the object with the
>> specialized comdat functions appear first causes these comdats to be
>> picked leading to SIGILL on unsupported arches.  With the objects
>> tagged, the linker can be made to pick other comdat candidates when
>> possible.
>>
>> More details:
>>
>> This option is user specified when using arch specific options like
>> -m.  It is an indicator to the compiler that any comdat bodies
>> generated are potentially unsafe for execution.  Note that the COMDAT
>> bodies however have to be generated as there are no guarantees that
>> other modules will do so.  The compiler then emits a specially named
>> section, like “.gnu.odr.unsafe”, in the object file.  When the linker
>> tries to pick a COMDAT candidate from several choices, it must avoid
>> COMDAT copies from objects with sections named “.gnu.odr.unsafe” when
>> presented with a choice to pick a candidate from an object that does
>> not have the “.gnu.odr.unsafe” section.  Note that it may not be
>> possible to do that in which case the linker must pick the unsafe
>> copy, it could explicitly warn when this happens.
>>
>> Alternately,  the compiler can bind locally any emitted comdat version
>> from a specialized module, which could also be guarded by an option.
>> This will solve the problem but this may not be always possible
>> especially when addresses of any such comdat version is taken.
>
>
> Can IFUNC relocations be used to properly select optimal version of code at
> runtime?

Yes, we do want a solution like this but all the dispatching code for
IFUNC needs to be generated at link-time.   Here is an example header
file with target-specific functionalities :

https://bitbucket.org/eigen/eigen/src/6ed647a644b8e3924800f0916a4ce4addf9e7739/Eigen/Co

[PING] [RFC 12/13] S/390 Vector ABI GNU Attribute.

2015-05-19 Thread Andreas Krebbel
On 05/11/2015 03:23 PM, Andreas Krebbel wrote:
> With this patch .gnu_attribute is used to mark binaries with a vector
> ABI tag.  This is required since the z13 vector support breaks the ABI
> of existing vector_size attribute generated vector types:
> 
> 1. vector_size(16) and bigger vectors are aligned to 8 byte
> boundaries (formerly vectors were always naturally aligned)
> 
> 2. vector_size(16) or smaller vectors are passed via VR if available
> or by value on the stack (formerly vector were passed on the stack by
> reference).
> 
> The .gnu_attribute will be used by ld to emit a warning if binaries
> with incompatible ABIs are being linked together:
> https://sourceware.org/ml/binutils/2015-04/msg00316.html
> 
> And it will be used by GDB to perform inferior function calls using a
> vector ABI which fits to the binary being debugged:
> https://sourceware.org/ml/gdb-patches/2015-04/msg00833.html
> 
> The current implementation tries to only set the attribute if the
> vector types are really used in ABI relevant contexts in order to
> avoid false positives during linking.
> 
> However, this unfortunately has some limitations like in the following
> case where an ABI relevant context cannot be detected properly:
> 
> typedef int __attribute__((vector_size(16))) v4si;
> struct A
> {
>   char x;
>   v4si y;
> };
> char a[sizeof(struct A)];
> 
> The number of elements in a depends on the ABI (24 with -mvx and 32
> with -mno-vx).  However, the implementation is not able to detect this
> since the struct type is not used anywhere else and consequently does
> not survive until the checking code is able to see it.
> 
> Ideas about how to improve the implementation without creating too
> many false postives are welcome.
> 
> In particular we do not want to set the attribute for local uses of
> vector types as they would be natural for ifunc optimizations.

Any ideas how this could be improved? That's the only patch of the IBM z13 
series I did not apply yet.

-Andreas-



Re: [patch, gcc 5 regression] re-enable biarch for powerpc-linux-gnu

2015-05-19 Thread David Edelsohn
This seems reasonable to me.

Alan, any thoughts from you?

Thanks, David


On Mon, May 18, 2015 at 12:22 PM, Sandra Loosemore
 wrote:
> We've found that configuring a powerpc-linux-gnu cross toolchain with
> --enable-targets=all no longer enables -m64 support in GCC 5, due to the
> patch for PR target/65286.  (We want to build with a -m64 multilib, in
> particular.)
>
> The attached patch seems to fix the breakage, but I'm not sure that it might
> not break some other configuration.  If this isn't the right fix, can one of
> the target experts suggest a better one?
>
> Here's a link to the discussion of the patch that caused the breakage:
>
> https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00321.html
>
> -Sandra
>


RE: [PATCH, MIPS]: Fix internal compiler error: in check_bool_attrs, at recog.c:2218 for micromips attribute

2015-05-19 Thread Robert Suchanek
Hi,

The original patch had a missing declaration of micromips_globals in mips.h 
that appears to be the cause of segmentation faults when building 
mips-mti-linux-gnu.
I didn't get any failures just before the submission neither on 
mips-img-linux-gnu
nor mips64el-linux-gnu and the test case is too trivial to trigger the ICE.

Below is the missing line. With this change mips-mti-linux-gnu builds fine.
The trunk is unstable and needed another patch from PR66181 to build
Glibc. Ok to commit?

>  We could add -mflip-micromips complementing -mflip-mips16 and use that
> for testing too.  Chances are it'd reveal further issues.  Looking at how
> -mflip-mips16 has been implemented it does not appear to me adding
> -mflip-micromips would be a lot of effort.

I'm in favour of adding such a switch since the testsuite doesn't cover 
a mixture of MIPS and microMIPS code.

Regards,
Robert

gcc/
* config/mips/mips.h (micromips_globals): Declare.
---
 gcc/config/mips/mips.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 0ea4e6d..85c8a97 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -3108,6 +3108,7 @@ extern const struct mips_cpu_info *mips_arch_info;
 extern const struct mips_cpu_info *mips_tune_info;
 extern unsigned int mips_base_compression_flags;
 extern GTY(()) struct target_globals *mips16_globals;
+extern GTY(()) struct target_globals *micromips_globals;
 #endif
 
 /* Enable querying of DFA units.  */
-- 
2.2.2


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Richard Henderson
On 05/19/2015 11:06 AM, Rich Felker wrote:
> I'm still mildly worried that concerns for supporting
> relaxation might lead to decisions not to optimize code in ways that
> would be difficult to relax (e.g. certain types of address load
> reordering or hoisting) but I don't understand GCC internals
> sufficiently to know if this concern is warranted or not.

It is.  The relaxation that HJ is working on requires that the reads from the
got not be hoisted.  I'm not especially convinced that what he's working on is
a win.

With LTO, the compiler can do the same job that he's attempting in the linker,
without an extra nop.  Without LTO, leaving it to the linker means that you
can't hoist the load and hide the memory latency.

> I would still like to see the @GOTPCREL stuff added and used instead
> of @GOT, as I mentioned earlier in the thread, but I agree that's
> independent of relaxation support and shouldn't block it.

I don't think that @GOTPCREL for 32-bit is a good idea.  This is the scheme
that Darwin uses, so we do have some experience with it.

In order for it to work you've got to have a pointer to a random address in the
function.  It means that you can only "easily" compute the address once.  If
you need the value again you wind up with the same "extra" addl insn that we
have with the current GOT pointer.

We've just started to do inter-function register allocation.  The next step
along those lines is to share the computation of GOT between multiple
functions.  At which point it really helps to have one global base address to
talk about.


r~


Move dependency for shared libgcc

2015-05-19 Thread Eric Botcazou
Probably a misapplied patch: the dependency of the shared libgcc on the shared 
libunwind is in a wrong place in Makefile.  The patch also removes a useless 
endif/ifneq pair.

Tested on x86_64-suse-linux and ia64-suse-linux, applied as obvious.


2015-05-19  Eric Botcazou  

* Makefile.in (LIBUNWIND): Move dependency for shared libgcc.
Remove useless endif/ifneq ($(enable_shared),yes) pair.


-- 
Eric BotcazouIndex: Makefile.in
===
--- Makefile.in	(revision 223349)
+++ Makefile.in	(working copy)
@@ -910,17 +910,14 @@ all: libgcc.a libgcov.a
 
 ifneq ($(LIBUNWIND),)
 all: libunwind.a
-libgcc_s$(SHLIB_EXT): libunwind$(SHLIB_EXT)
 endif
 
 ifeq ($(enable_shared),yes)
 all: libgcc_eh.a libgcc_s$(SHLIB_EXT)
 ifneq ($(LIBUNWIND),)
 all: libunwind$(SHLIB_EXT)
+libgcc_s$(SHLIB_EXT): libunwind$(SHLIB_EXT)
 endif
-endif
-
-ifeq ($(enable_shared),yes)
 
 # Map-file generation.
 ifneq ($(SHLIB_MKMAP),)


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread H.J. Lu
On Tue, May 19, 2015 at 11:59 AM, Richard Henderson  wrote:
> On 05/19/2015 11:06 AM, Rich Felker wrote:
>> I'm still mildly worried that concerns for supporting
>> relaxation might lead to decisions not to optimize code in ways that
>> would be difficult to relax (e.g. certain types of address load
>> reordering or hoisting) but I don't understand GCC internals
>> sufficiently to know if this concern is warranted or not.
>
> It is.  The relaxation that HJ is working on requires that the reads from the
> got not be hoisted.  I'm not especially convinced that what he's working on is
> a win.
>
> With LTO, the compiler can do the same job that he's attempting in the linker,
> without an extra nop.  Without LTO, leaving it to the linker means that you
> can't hoist the load and hide the memory latency.
>

My relax approach won't take away any optimization done by compiler.
It simply turns indirect branch into direct branch with a nop prefix at
link-time.  I am having a hard time to understand why we shouldn't do it.


-- 
H.J.


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Rich Felker
On Tue, May 19, 2015 at 06:01:07PM +0200, Michael Matz wrote:
> Hi,
> 
> On Tue, 19 May 2015, Jeff Law wrote:
> 
> > > > Forget lazy binding. It's dead anyway because serious distros want 
> > > > PIE+relro+bindnow+...
> > > 
> > > You keep saying this, but I can't help the feeling it's mostly because 
> > > musl doesn't support it ;-)
> > 
> > FWIW, Red Hat is pushing PIE & partial RELRO deeper and deeper into the 
> > distribution.
> 
> Yeah, us as well, though I don't necessarily see the point for most 
> packages; feels a bit like a checkmark item :)

These days it's fairly rare to have software which does not interact
at all with untrusted data. Consider how much user-facing application
software that was not previously considered security-critical is
making network connections using complex protocols (e.g. anything with
TLS, IM protocols, ...), opening image files from random sources
(attachments, files that happen to be in a browsed-to directory, on
USB sticks, etc.), and so on. I think it's smart to be hardening
everything, at least for distros providing all sorts of random
unvetted software.

Rich


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Richard Henderson
On 05/19/2015 12:06 PM, H.J. Lu wrote:
> On Tue, May 19, 2015 at 11:59 AM, Richard Henderson  wrote:
>> On 05/19/2015 11:06 AM, Rich Felker wrote:
>>> I'm still mildly worried that concerns for supporting
>>> relaxation might lead to decisions not to optimize code in ways that
>>> would be difficult to relax (e.g. certain types of address load
>>> reordering or hoisting) but I don't understand GCC internals
>>> sufficiently to know if this concern is warranted or not.
>>
>> It is.  The relaxation that HJ is working on requires that the reads from the
>> got not be hoisted.  I'm not especially convinced that what he's working on 
>> is
>> a win.
>>
>> With LTO, the compiler can do the same job that he's attempting in the 
>> linker,
>> without an extra nop.  Without LTO, leaving it to the linker means that you
>> can't hoist the load and hide the memory latency.
>>
> 
> My relax approach won't take away any optimization done by compiler.
> It simply turns indirect branch into direct branch with a nop prefix at
> link-time.  I am having a hard time to understand why we shouldn't do it.

I well understand what you're doing.

But my point is that the only time the compiler should present you with the
form of indirect branch you're looking for is when there's no place to hoist
the load.

At which point, is it really worth adding a new relocation to the ABI?  Is it
really worth adding new code to the linker that won't be exercised often?


r~



Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread H.J. Lu
On Tue, May 19, 2015 at 12:11 PM, Richard Henderson  wrote:
> On 05/19/2015 12:06 PM, H.J. Lu wrote:
>> On Tue, May 19, 2015 at 11:59 AM, Richard Henderson  wrote:
>>> On 05/19/2015 11:06 AM, Rich Felker wrote:
 I'm still mildly worried that concerns for supporting
 relaxation might lead to decisions not to optimize code in ways that
 would be difficult to relax (e.g. certain types of address load
 reordering or hoisting) but I don't understand GCC internals
 sufficiently to know if this concern is warranted or not.
>>>
>>> It is.  The relaxation that HJ is working on requires that the reads from 
>>> the
>>> got not be hoisted.  I'm not especially convinced that what he's working on 
>>> is
>>> a win.
>>>
>>> With LTO, the compiler can do the same job that he's attempting in the 
>>> linker,
>>> without an extra nop.  Without LTO, leaving it to the linker means that you
>>> can't hoist the load and hide the memory latency.
>>>
>>
>> My relax approach won't take away any optimization done by compiler.
>> It simply turns indirect branch into direct branch with a nop prefix at
>> link-time.  I am having a hard time to understand why we shouldn't do it.
>
> I well understand what you're doing.
>
> But my point is that the only time the compiler should present you with the
> form of indirect branch you're looking for is when there's no place to hoist
> the load.
>
> At which point, is it really worth adding a new relocation to the ABI?  Is it
> really worth adding new code to the linker that won't be exercised often?

I believe there are plenty of indirect branches via GOT when compiling
PIE/PIC with -fno-plt:

[hjl@gnu-6 gcc]$ cat /tmp/x.c
extern void foo (void);

void
bar (void)
{
  foo ();
}
[hjl@gnu-6 gcc]$ ./xgcc -B./ -fPIC -O3 -S /tmp/x.c -fno-plt
[hjl@gnu-6 gcc]$ cat x.s
.file "x.c"
.section .text.unlikely,"ax",@progbits
.LCOLDB0:
.text
.LHOTB0:
.p2align 4,,15
.globl bar
.type bar, @function
bar:
.LFB0:
.cfi_startproc
jmp *foo@GOTPCREL(%rip)
.cfi_endproc
.LFE0:
.size bar, .-bar

-- 
H.J.


Re: [PATCH] Fix memory orders description in atomic ops built-ins docs.

2015-05-19 Thread Torvald Riegel
On Mon, 2015-05-18 at 17:36 +0100, Matthew Wahab wrote:
> Hello,
> 
> On 15/05/15 17:22, Torvald Riegel wrote:
> > This patch improves the documentation of the built-ins for atomic
> > operations.
> 
> The "memory model" to "memory order" change does improve things but I think 
> that
> the patch has some problems. As it is now, it makes some of the descriptions
> quite difficult to understand and seems to assume more familiarity with 
> details
> of the C++11 specification then might be expected.

I'd say that's a side effect of the C++11 memory model being the
reference specification of the built-ins.

> Generally, the memory order descriptions seem to be targeted towards language
> designers but don't provide for anybody trying to understand how to implement 
> or
> to use the built-ins.

I agree that the current descriptions aren't a tutorial on the C++11
memory model.  However, given that the model is not GCC-specific, we
aren't really in a need to provide a tutorial, in the same way that we
don't provide a C++ tutorial.  Users can pick the C++11 memory model
educational material of their choice, and we need to document what's
missing to apply the C++11 knowledge to the built-ins we provide.

There are several resources for implementers, for example the mappings
maintained by the Cambridge research group.  I guess it would be
sufficient to have such material on the wiki.  Is there something
specific that you'd like to see documented for implementers?

> Adding a less formal, programmers view to some of the
> descriptions would help.

Yes, probably.  However, I'm not aware of a good C++11 memory model
tutorial that we could link too (OTOH, I haven't really looked for one).
I don't plan to write one, and doing so would certainly take some time
and require *much* more text than what we have now.

> That implies the descriptions would be more than just
> illustrative, but I'd suggest that would be appropriate for the GCC manual.

The danger I see there is that if we claim to define semantics instead
of just illustrating them, then we need to do a really good job of
defining/explaining them.  I worry that we may end up replicating what's
in the standard or Batty et al.'s formalization of the memory model, and
having to give a tutorial too.  I'm not sure anyone is volunteering to
do that amount of work.

> I'm also not sure that the use of C++11 terms in the some of the
> descriptions. In particular, using happens-before seems wrong because
> happens-before isn't described anywhere in the GCC manual and because it has a
> specific meaning in the C++11 specification that doesn't apply to the GCC
> built-ins (which C++11 doesn't know about).

I agree it's not described in the manual, but we're implementing C++11.
However, I don't see why happens-before semantics wouldn't apply to
GCC's implementation of the built-ins; there may be cases where we
guarantee more, but if one uses the builtins in way allowed by the C++11
model, one certainly gets behavior and happens-before relationships as
specified by C++11.


> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 6004681..5b2ded8 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -8853,19 +8853,19 @@ are not prevented from being speculated to before the 
> barrier.
> 
>   [...]  If the data type size maps to one
> -of the integral sizes that may have lock free support, the generic
> -version uses the lock free built-in function.  Otherwise an
> +of the integral sizes that may support lock-freedom, the generic
> +version uses the lock-free built-in function.  Otherwise an
>   external call is left to be resolved at run time.
> 
> =
> This is a slightly awkward sentence. Maybe it could be replaced with something
> on the lines of "The generic function uses the lock-free built-in function 
> when
> the data-type size makes that possible, otherwise an external call is left to 
> be
> resolved at run-time."
> =

Changed to:
"It uses the lock-free built-in function if the specific data type size
makes that possible; otherwise, an external call is left to be resolved
at run time."

> 
> -The memory models integrate both barriers to code motion as well as
> -synchronization requirements with other threads.  They are listed here
> -in approximately ascending order of strength.
> +An atomic operation can both constrain code motion by the compiler and
> +be mapped to a hardware instruction for synchronization between threads
> +(e.g., a fence).  [...]
> 
> =
> This is a little unclear (and inaccurate, aarch64 can use two instructions
> for fences). I also thought that atomic operations constrain code motion by 
> the
> hardware. Maybe break the link with the compiler and hardware: "An atomic
> operation can both constrain code motion and act as a synchronization point
> between threads".
> =

I removed "by the compiler" and used "hardware instruction_s_".

> 
>   @table  @code
>   @item __ATOMIC_RELAXED
> -No barriers or synchro

Re: PING^3: [PATCH]: New configure options that make the compiler use -fPIE and -pie as default option

2015-05-19 Thread H.J. Lu
On Tue, May 19, 2015 at 8:33 AM, Joseph Myers  wrote:
> On Tue, 19 May 2015, H.J. Lu wrote:
>
>> > I think the whole thing should be posted as one patch, with both the
>> > target-independent changes and the target-specific changes for all
>> > targets.
>> >
>>
>> That is what makes me concerned.  I have some simple target-specified
>> patches which weren't reviewed for years. What will happen if no one
>
> For any unreviewed patch, keep pinging weekly.
>
>> reviews some simple target-specified changes due to
>>
>> 1. Reviewers don't have access to those targets.
>> 2. Target maintainers aren't review them.
>> 3. There are no clear maintainers for those targets.
>
> I've already said in
>  that, given
> target maintainers CC:ed, I might be inclined to approve the patch on the
> basis of allowing them a week to test their target changes.
>

Here is the complete patch.  Tested on Linux/x86-64.  It is also
available on hjl/pie/master branch in git mirror.

-- 
H.J.

Add --enable-default-pie option to configure GCC to generate PIE by
default.

gcc/

2015-05-19  Magnus Granberg  
   H.J. Lu  

* Makefile.in (COMPILER): Add @NO_PIE_CFLAGS@.
(BUILD_CFLAGS): Likewise.
(BUILD_CXXFLAGS): Likewise.
(LINKER): Add @NO_PIE_FLAG@.
(BUILD_LDFLAGS): Likewise.
(libgcc.mvars): Set NO_PIE_CFLAGS to -fno-PIE for
--enable-default-pie.
* common.opt (fPIE): Initialize to -1.
(fpie): Likewise.
(no-pie): New option.
(pie): Replace "Negative(shared)" with "Negative(no-pie)".
* configure.ac: Add --enable-default-pie.
(NO_PIE_CFLAGS): New.  Check if -fno-PIE works.  AC_SUBST.
(NO_PIE_FLAG): New.  Check if -no-pie works.  AC_SUBST.
* defaults.h (DEFAULT_FLAG_PIE): New.  Default PIE to -fPIE.
* gcc.c (NO_PIE_SPEC): New.
(PIE_SPEC): Likewise.
(NO_FPIE1_SPEC): Likewise.
(FPIE1_SPEC): Likewise.
(NO_FPIE2_SPEC): Likewise.
(FPIE2_SPEC): Likewise.
(NO_FPIE2_SPEC): Likewise.
(FPIE_SPEC): Likewise.
(NO_FPIE_SPEC): Likewise.
(NO_FPIC1_SPEC): Likewise.
(FPIC1_SPEC): Likewise.
(NO_FPIC2_SPEC): Likewise.
(FPIC2_SPEC): Likewise.
(NO_FPIC2_SPEC): Likewise.
(FPIC_SPEC): Likewise.
(NO_FPIC_SPEC): Likewise.
(NO_FPIE1_AND_FPIC1_SPEC): Likewise.
(FPIE1_OR_FPIC1_SPEC): Likewise.
(NO_FPIE2_AND_FPIC2_SPEC): Likewise.
(FPIE2_OR_FPIC2_SPEC): Likewise.
(NO_FPIE_AND_FPIC_SPEC): Likewise.
(FPIE_OR_FPIC_SPEC): Likewise.
(LD_PIE_SPEC): Likewise.
(LINK_PIE_SPEC): Handle -no-pie.  Use PIE_SPEC and LD_PIE_SPEC.
* opts.c (DEFAULT_FLAG_PIE): New.  Set to 0 if ENABLE_DEFAULT_PIE
is undefined.
(finish_options): Update opts->x_flag_pie if it is -1.
* config/darwin.h (PIE_SPEC): Renamed to ...
(DARWIN_PIE_SPEC): This.
(LINK_SPEC): Replace PIE_SPEC with DARWIN_PIE_SPEC.
* config/darwin9.h (PIE_SPEC): Renamed to ...
(DARWIN_PIE_SPEC): This.
* config/gnu-user.h (GNU_USER_TARGET_STARTFILE_SPEC): Use
PIE_SPEC and NO_PIE_SPEC if HAVE_LD_PIE is defined.
* config/openbsd.h (ASM_SPEC): Use FPIE1_OR_FPIC1_SPEC and
FPIE2_OR_FPIC2_SPEC.
* config/m68k/netbsd-elf.h (ASM_SPEC): Likewise.
* config/m68k/openbsd.h (ASM_SPEC): Likewise.
* gcc/config/sol2.h (ASM_PIC_SPEC): Likewise.
* config/arm/freebsd.h (SUBTARGET_EXTRA_ASM_SPEC): Likewise.
* config/arm/netbsd-elf.h (SUBTARGET_EXTRA_ASM_SPEC): Likewise.
* config/arm/semi.h (SUBTARGET_EXTRA_ASM_SPEC): Likewise.
* config/cris/linux.h (CRIS_ASM_SUBTARGET_SPEC): Likewise.
* config/m32r/m32r.h (ASM_SPEC): Likewise.
* config/m68k/uclinux.h (DRIVER_SELF_SPECS): Likewise.
* config/rs6000/linux64.h (ASM_SPEC32): Likewise.
* config/rs6000/sysv4.h (ASM_SPEC): Likewise.
* config/sparc/freebsd.h (ASM_SPEC): Likewise.
* config/sparc/linux.h (ASM_SPEC): Likewise.
* config/sparc/linux64.h (ASM_SPEC): Likewise.
* config/sparc/netbsd-elf.h (ASM_SPEC): Likewise.
* config/sparc/openbsd64.h (ASM_SPEC): Likewise.
* config/sparc/sp-elf.h (ASM_SPEC): Likewise.
* config/sparc/sp64-elf.h (ASM_SPEC): Likewise.
* config/sparc/sparc.h (ASM_SPEC): Likewise.
* config/sparc/sysv4.h (ASM_SPEC): Likewise.
* config/sparc/vxworks.h (ASM_SPEC): Likewise.
* config/c6x/elf-common.h (ASM_SPEC): Use NO_FPIC2_SPEC,
FPIC2_SPEC, FPIC1_SPEC and FPIC2_SPEC.
* config/c6x/uclinux-elf.h (LINK_SPEC): Use FPIE_SPEC.
* config/frv/frv.h (DRIVER_SELF_SPECS): Use FPIC_SPEC,
NO_FPIC_SPEC and NO_FPIE1_AND_FPIC1_SPEC.
(ASM_SPEC): Use FPIE1_OR_FPIC1_SPEC and FPIE2_OR_FPIC2_SPEC.
* config/m68k/m68k.h (ASM_PCREL_SPEC): Use FPIC_SPEC.
* config/mips/gnu-user.h (NO_SHARED_SPECS): Use
NO_FPIE_AND_FPIC_SPEC.
* config/mips/vxworks.h (SUBTARGET_ASM_SPEC): Use FPIC_SPEC.
* config/rs6000/freebsd64.h (ASM_SPEC32): Likewise.
* config/rs6000/vxworks.h (ASM_SPEC): Likewise.
* config/vax/linux.h (ASM_SPEC): Likewise.
* doc/install.texi: Document --enable-default-pie.
* doc/invoke.texi: Document -no-pie.
* config.in: Regenerated.
* configure: Likewise.

gcc/ada/

2015-05-19  H.J. Lu  

* gcc-interface/Makefile.in (TOOLS_LIBS): Add @NO_PIE_FLAG@.

libgcc/

2015-05-19  H.J. Lu  

* Makefile.in (CRTSTUFF_CFLAGS): Add $(NO_PIE_CFLAGS).
From 15d2cee44ff03e1892

[C PATCH] Use AGGREGATE_TYPE_P

2015-05-19 Thread Marek Polacek
Bootstrapped/regtested on x86_64-linux, applying to trunk.

2015-05-19  Marek Polacek  

* c-typeck.c (start_init): Use AGGREGATE_TYPE_P.

diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
index 7f54490..cf5322f 100644
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -7126,10 +7126,7 @@ start_init (tree decl, tree asmspec_tree 
ATTRIBUTE_UNUSED, int top_level)
= ((TREE_STATIC (decl) || (pedantic && !flag_isoc99))
   /* For a scalar, you can always use any value to initialize,
  even within braces.  */
-  && (TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE
-  || TREE_CODE (TREE_TYPE (decl)) == RECORD_TYPE
-  || TREE_CODE (TREE_TYPE (decl)) == UNION_TYPE
-  || TREE_CODE (TREE_TYPE (decl)) == QUAL_UNION_TYPE));
+  && AGGREGATE_TYPE_P (TREE_TYPE (decl)));
   locus = identifier_to_locale (IDENTIFIER_POINTER (DECL_NAME (decl)));
 }
   else

Marek


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Rich Felker
On Tue, May 19, 2015 at 11:59:00AM -0700, Richard Henderson wrote:
> On 05/19/2015 11:06 AM, Rich Felker wrote:
> > I'm still mildly worried that concerns for supporting
> > relaxation might lead to decisions not to optimize code in ways that
> > would be difficult to relax (e.g. certain types of address load
> > reordering or hoisting) but I don't understand GCC internals
> > sufficiently to know if this concern is warranted or not.
> 
> It is.  The relaxation that HJ is working on requires that the reads from the
> got not be hoisted.  I'm not especially convinced that what he's working on is
> a win.

Well as long as -fno-plt actually generates a load from the GOT like
what would be done for data access, and does not go out of its way to
produce something compatible with relaxation, my hope is that it would
not affected by the pessimization. I'm not sure if that's the case
though.

> With LTO, the compiler can do the same job that he's attempting in the linker,
> without an extra nop.  Without LTO, leaving it to the linker means that you
> can't hoist the load and hide the memory latency.

Yes, this is my feeling too. Alexander Monakov have been discussing it
on #musl a bit and I think the conclusion we reached is that
relaxation is possibly a significant real-world win for non-PIC main
executables, where it's very likely that addresses will be resolved at
ld-time and for the programmer not to specifically annotate this with
protected visibility. In such a case, you get either a direct call or
a direct address load and indirect call, rather than hitting an extra
cache line in the PLT thunk to do the address load and indirect call.
Note that, being non-PIC, there is no GOT register involved here.

> > I would still like to see the @GOTPCREL stuff added and used instead
> > of @GOT, as I mentioned earlier in the thread, but I agree that's
> > independent of relaxation support and shouldn't block it.
> 
> I don't think that @GOTPCREL for 32-bit is a good idea.  This is the scheme
> that Darwin uses, so we do have some experience with it.
> 
> In order for it to work you've got to have a pointer to a random address in 
> the
> function.  It means that you can only "easily" compute the address once.  If
> you need the value again you wind up with the same "extra" addl insn that we
> have with the current GOT pointer.

Why would you recompute it (this requires a fairly expensive call that
reads or pops its own return address) rather than simply spilling the
already-computed value and reloading it from the stack?

The only example I can think of where it might make sense is when you
don't want to load the address unconditionally because there are
shrink-wrappable code paths that don't need it, but multple code paths
that do, in which case they would each load different values. Is this
the concern you have in mind?

> We've just started to do inter-function register allocation.  The next step
> along those lines is to share the computation of GOT between multiple
> functions.  At which point it really helps to have one global base address to
> talk about.

I see -- that would be another case where it simplifies things.

Rich


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Richard Henderson
On 05/19/2015 12:17 PM, H.J. Lu wrote:
>> But my point is that the only time the compiler should present you with the
>> form of indirect branch you're looking for is when there's no place to hoist
>> the load.
>>
>> At which point, is it really worth adding a new relocation to the ABI?  Is it
>> really worth adding new code to the linker that won't be exercised often?
> 
> I believe there are plenty of indirect branches via GOT when compiling
> PIE/PIC with -fno-plt:
> 
> [hjl@gnu-6 gcc]$ cat /tmp/x.c
> extern void foo (void);
> 
> void
> bar (void)
> {
>   foo ();
> }

Sure, as I said, when there's no place to hoist the load.

Try anything more complicated,

void bar (void)
{
  int i;
  for (i = 0; i < 10; ++i)
foo ();
}

void baz (void)
{
  foo ();
  foo ();
}

and you'll not see the call *foo@GOTPCREL(%rip) form.

Of course there's also plenty of times where combine recreates exactly that
form when perhaps the scheduler might have preferred otherwise.  Those are
optimization choices to be addressed under separate cover.

My point that we can already do what you want via LTO, without adding new
relocations, is still relevant.


r~


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Richard Henderson
On 05/19/2015 12:35 PM, Rich Felker wrote:
> Why would you recompute it (this requires a fairly expensive call that
> reads or pops its own return address) rather than simply spilling the
> already-computed value and reloading it from the stack?
> 
> The only example I can think of where it might make sense is when you
> don't want to load the address unconditionally because there are
> shrink-wrappable code paths that don't need it, but multple code paths
> that do, in which case they would each load different values. Is this
> the concern you have in mind?

That too.  I was thinking of exception landing pads, i.e. catches and cleanups,
where in the past we've had to re-compute the GOT address.  Though now that I
think on that more, it wasn't x86 that had that particular landing pad trouble.


r~


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Rich Felker
On Tue, May 19, 2015 at 12:17:18PM -0700, H.J. Lu wrote:
> On Tue, May 19, 2015 at 12:11 PM, Richard Henderson  wrote:
> > On 05/19/2015 12:06 PM, H.J. Lu wrote:
> >> On Tue, May 19, 2015 at 11:59 AM, Richard Henderson  
> >> wrote:
> >>> On 05/19/2015 11:06 AM, Rich Felker wrote:
>  I'm still mildly worried that concerns for supporting
>  relaxation might lead to decisions not to optimize code in ways that
>  would be difficult to relax (e.g. certain types of address load
>  reordering or hoisting) but I don't understand GCC internals
>  sufficiently to know if this concern is warranted or not.
> >>>
> >>> It is.  The relaxation that HJ is working on requires that the reads from 
> >>> the
> >>> got not be hoisted.  I'm not especially convinced that what he's working 
> >>> on is
> >>> a win.
> >>>
> >>> With LTO, the compiler can do the same job that he's attempting in the 
> >>> linker,
> >>> without an extra nop.  Without LTO, leaving it to the linker means that 
> >>> you
> >>> can't hoist the load and hide the memory latency.
> >>>
> >>
> >> My relax approach won't take away any optimization done by compiler.
> >> It simply turns indirect branch into direct branch with a nop prefix at
> >> link-time.  I am having a hard time to understand why we shouldn't do it.
> >
> > I well understand what you're doing.
> >
> > But my point is that the only time the compiler should present you with the
> > form of indirect branch you're looking for is when there's no place to hoist
> > the load.
> >
> > At which point, is it really worth adding a new relocation to the ABI?  Is 
> > it
> > really worth adding new code to the linker that won't be exercised often?
> 
> I believe there are plenty of indirect branches via GOT when compiling
> PIE/PIC with -fno-plt:
> 
> [hjl@gnu-6 gcc]$ cat /tmp/x.c
> extern void foo (void);
> 
> void
> bar (void)
> {
>   foo ();
> }
> [hjl@gnu-6 gcc]$ ./xgcc -B./ -fPIC -O3 -S /tmp/x.c -fno-plt
> [hjl@gnu-6 gcc]$ cat x.s
> ..file "x.c"
> ..section .text.unlikely,"ax",@progbits
> ..LCOLDB0:
> ..text
> ..LHOTB0:
> ..p2align 4,,15
> ..globl bar
> ..type bar, @function
> bar:
> ..LFB0:
> ..cfi_startproc
> jmp *foo@GOTPCREL(%rip)
> ..cfi_endproc
> ..LFE0:
> ..size bar, .-bar

I agree these exist. What I question is whether the savings from the
linker being able to relax this to a direct call in the case where the
programmer failed to let the compiler make it a direct call to begin
with (by using hidden or protected visibility) are worth the cost of
not being able to hoist the load out of loops or schedule it earlier
in cases where relaxation is not possible because the call target is
not defined in the same DSO.

Rich


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread H.J. Lu
On Tue, May 19, 2015 at 1:15 PM, Rich Felker  wrote:
> On Tue, May 19, 2015 at 12:17:18PM -0700, H.J. Lu wrote:
>> On Tue, May 19, 2015 at 12:11 PM, Richard Henderson  wrote:
>> > On 05/19/2015 12:06 PM, H.J. Lu wrote:
>> >> On Tue, May 19, 2015 at 11:59 AM, Richard Henderson  
>> >> wrote:
>> >>> On 05/19/2015 11:06 AM, Rich Felker wrote:
>>  I'm still mildly worried that concerns for supporting
>>  relaxation might lead to decisions not to optimize code in ways that
>>  would be difficult to relax (e.g. certain types of address load
>>  reordering or hoisting) but I don't understand GCC internals
>>  sufficiently to know if this concern is warranted or not.
>> >>>
>> >>> It is.  The relaxation that HJ is working on requires that the reads 
>> >>> from the
>> >>> got not be hoisted.  I'm not especially convinced that what he's working 
>> >>> on is
>> >>> a win.
>> >>>
>> >>> With LTO, the compiler can do the same job that he's attempting in the 
>> >>> linker,
>> >>> without an extra nop.  Without LTO, leaving it to the linker means that 
>> >>> you
>> >>> can't hoist the load and hide the memory latency.
>> >>>
>> >>
>> >> My relax approach won't take away any optimization done by compiler.
>> >> It simply turns indirect branch into direct branch with a nop prefix at
>> >> link-time.  I am having a hard time to understand why we shouldn't do it.
>> >
>> > I well understand what you're doing.
>> >
>> > But my point is that the only time the compiler should present you with the
>> > form of indirect branch you're looking for is when there's no place to 
>> > hoist
>> > the load.
>> >
>> > At which point, is it really worth adding a new relocation to the ABI?  Is 
>> > it
>> > really worth adding new code to the linker that won't be exercised often?
>>
>> I believe there are plenty of indirect branches via GOT when compiling
>> PIE/PIC with -fno-plt:
>>
>> [hjl@gnu-6 gcc]$ cat /tmp/x.c
>> extern void foo (void);
>>
>> void
>> bar (void)
>> {
>>   foo ();
>> }
>> [hjl@gnu-6 gcc]$ ./xgcc -B./ -fPIC -O3 -S /tmp/x.c -fno-plt
>> [hjl@gnu-6 gcc]$ cat x.s
>> ..file "x.c"
>> ..section .text.unlikely,"ax",@progbits
>> ..LCOLDB0:
>> ..text
>> ..LHOTB0:
>> ..p2align 4,,15
>> ..globl bar
>> ..type bar, @function
>> bar:
>> ..LFB0:
>> ..cfi_startproc
>> jmp *foo@GOTPCREL(%rip)
>> ..cfi_endproc
>> ..LFE0:
>> ..size bar, .-bar
>
> I agree these exist. What I question is whether the savings from the
> linker being able to relax this to a direct call in the case where the
> programmer failed to let the compiler make it a direct call to begin
> with (by using hidden or protected visibility) are worth the cost of
> not being able to hoist the load out of loops or schedule it earlier
> in cases where relaxation is not possible because the call target is
> not defined in the same DSO.

Just for fun.  I compiled binutils as PIE with -fno-plt -flto:

[hjl@gnu-mic-2 gas]$ file as-new
as-new: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV),
dynamically linked (uses shared libs), for GNU/Linux 2.6.32, not
stripped
[hjl@gnu-mic-2 gas]$

There are 43:

ff 25 21 93 2d 00 jmpq   *0x2d9321(%rip)# 3d5f58 <_DYNAMIC+0x1e8>

and 1983

ff 15 eb f4 38 00 callq  *0x38f4eb(%rip)# 3d60e0 <_DYNAMIC+0x370>

-- 
H.J.


  1   2   >